feat: use postgres
+101
-48
@@ -155,7 +155,9 @@ Intentional runtime omissions in v1:
 
 Required:
 
-- `NOTIFICATION_REDIS_ADDR`
+- `NOTIFICATION_REDIS_MASTER_ADDR`
+- `NOTIFICATION_REDIS_PASSWORD`
+- `NOTIFICATION_POSTGRES_PRIMARY_DSN`
 - `NOTIFICATION_USER_SERVICE_BASE_URL`
 
 Primary configuration groups:
@@ -168,12 +170,18 @@ Primary configuration groups:
 - `NOTIFICATION_INTERNAL_HTTP_READ_HEADER_TIMEOUT` with default `2s`
 - `NOTIFICATION_INTERNAL_HTTP_READ_TIMEOUT` with default `10s`
 - `NOTIFICATION_INTERNAL_HTTP_IDLE_TIMEOUT` with default `1m`
-- Redis connectivity:
-- `NOTIFICATION_REDIS_USERNAME`
-- `NOTIFICATION_REDIS_PASSWORD`
+- Redis connectivity (master/replica/password shape; the deprecated
+`NOTIFICATION_REDIS_ADDR`, `NOTIFICATION_REDIS_USERNAME`, and
+`NOTIFICATION_REDIS_TLS_ENABLED` env vars are rejected at startup):
+- `NOTIFICATION_REDIS_REPLICA_ADDRS` (optional, comma-separated)
 - `NOTIFICATION_REDIS_DB`
-- `NOTIFICATION_REDIS_TLS_ENABLED`
 - `NOTIFICATION_REDIS_OPERATION_TIMEOUT`
+- PostgreSQL connectivity:
+- `NOTIFICATION_POSTGRES_REPLICA_DSNS` (optional, comma-separated)
+- `NOTIFICATION_POSTGRES_OPERATION_TIMEOUT`
+- `NOTIFICATION_POSTGRES_MAX_OPEN_CONNS`
+- `NOTIFICATION_POSTGRES_MAX_IDLE_CONNS`
+- `NOTIFICATION_POSTGRES_CONN_MAX_LIFETIME`
 - stream names:
 - `NOTIFICATION_INTENTS_STREAM` with default `notification:intents`
 - `NOTIFICATION_INTENTS_READ_BLOCK_TIMEOUT` with default `2s`
@@ -186,9 +194,13 @@ Primary configuration groups:
 - `NOTIFICATION_ROUTE_BACKOFF_MIN` with default `1s`
 - `NOTIFICATION_ROUTE_BACKOFF_MAX` with default `5m`
 - `NOTIFICATION_ROUTE_LEASE_TTL` with default `5s`
-- `NOTIFICATION_DEAD_LETTER_TTL` with default `720h`
-- `NOTIFICATION_RECORD_TTL` with default `720h`
 - `NOTIFICATION_IDEMPOTENCY_TTL` with default `168h`
+- retention (periodic SQL retention worker; replaces the previous
+`NOTIFICATION_DEAD_LETTER_TTL` and `NOTIFICATION_RECORD_TTL` Redis-EXPIRE
+knobs):
+- `NOTIFICATION_RECORD_RETENTION` with default `720h`
+- `NOTIFICATION_MALFORMED_INTENT_RETENTION` with default `2160h`
+- `NOTIFICATION_CLEANUP_INTERVAL` with default `1h`
 - `User Service` enrichment:
 - `NOTIFICATION_USER_SERVICE_TIMEOUT` with default `1s`
 - administrator routing:
@@ -472,52 +484,90 @@ Materialization rules:
 The service-local aggregate notification status is derived from routes and is
 not a separate durable source of truth.
 
-## Redis Logical Model
+## Persistence Model
 
+Durable storage is split between PostgreSQL (table-shaped business state)
+and Redis (streams, runtime coordination). The architectural rules live in
+[`ARCHITECTURE.md §Persistence Backends`](../ARCHITECTURE.md#persistence-backends);
+the per-service decision record is
+[`docs/postgres-migration.md`](docs/postgres-migration.md).
+
+### PostgreSQL durable state
+
+The service owns the `notification` schema. Migrations are embedded in the
+binary (`internal/adapters/postgres/migrations`) and applied at startup via
+`pkg/postgres.RunMigrations` strictly before any HTTP listener becomes
+ready. Every time-valued column is `timestamptz`, normalised to UTC by the
+adapter on bind and scan.
+
+| Table | Frozen columns |
+| --- | --- |
+| `records` | `notification_id`, `notification_type`, `producer`, `audience_kind`, `recipient_user_ids` (jsonb), `payload_json`, `idempotency_key`, `request_fingerprint`, `request_id`, `trace_id`, `occurred_at`, `accepted_at`, `updated_at`, `idempotency_expires_at`; `UNIQUE (producer, idempotency_key)` |
+| `routes` | `notification_id`, `route_id`, `channel`, `recipient_ref`, `status`, `attempt_count`, `max_attempts`, `next_attempt_at`, `resolved_email`, `resolved_locale`, `last_error_classification`, `last_error_message`, `last_error_at`, `created_at`, `updated_at`, `published_at`, `dead_lettered_at`, `skipped_at`; PRIMARY KEY `(notification_id, route_id)` |
+| `dead_letters` | `notification_id`, `route_id`, `channel`, `recipient_ref`, `final_attempt_count`, `max_attempts`, `failure_classification`, `failure_message`, `recovery_hint`, `created_at`; PRIMARY KEY `(notification_id, route_id)` cascading from `routes` |
+| `malformed_intents` | `stream_entry_id`, `notification_type`, `producer`, `idempotency_key`, `failure_code`, `failure_message`, `raw_fields` (jsonb), `recorded_at` |
+
 Storage rules:
 
-- durable records are stored as strict JSON blobs
-- timestamps are stored in Unix milliseconds
-- dynamic Redis key segments are base64url-encoded
-- `notification:route_schedule` is one shared sorted set for both `push` and
-`email`
+- the durable `records` row IS the idempotency reservation; the
+`(producer, idempotency_key)` UNIQUE constraint surfaces conflicts as
+`acceptintent.ErrConflict`
+- `next_attempt_at` is non-NULL only while the route is a scheduling
+candidate (`status=pending|failed`); the partial index `routes_due_idx`
+drives the publishers' `ListDueRoutes` scan
+- `payload_json` stores the canonical normalized JSON string used for
+idempotency fingerprinting; `recipient_user_ids` is JSONB and omitted
+for `audience_kind=admin_email`
+- terminal transitions clear `next_attempt_at` and stamp the appropriate
+terminal column (`published_at` / `dead_lettered_at` / `skipped_at`)
+- record-level retention deletes cascade to `routes` and `dead_letters`
+via `ON DELETE CASCADE`
 
+### Redis runtime-coordination state
+
 | Logical artifact | Redis key |
 | --- | --- |
-| `notification_record` | `notification:records:<notification_id>` |
-| `notification_route` | `notification:routes:<notification_id>:<route_id>` |
 | temporary route lease | `notification:route_leases:<notification_id>:<route_id>` |
-| `notification_idempotency_record` | `notification:idempotency:<producer>:<idempotency_key>` |
-| `notification_dead_letter_entry` | `notification:dead_letters:<notification_id>:<route_id>` |
-| malformed intent record | `notification:malformed_intents:<stream_entry_id>` |
 | stream offset record | `notification:stream_offsets:<stream>` |
 | ingress stream | `notification:intents` |
-| route schedule sorted set | `notification:route_schedule` |
 
-| Record | Frozen fields |
-| --- | --- |
-| `notification_record` | `notification_id`, `notification_type`, `producer`, `audience_kind`, normalized `recipient_user_ids`, normalized `payload_json`, `idempotency_key`, `request_fingerprint`, optional `request_id`, optional `trace_id`, `occurred_at_ms`, `accepted_at_ms`, `updated_at_ms` |
-| `notification_route` | `notification_id`, `route_id`, `channel`, `recipient_ref`, `status`, `attempt_count`, `max_attempts`, `next_attempt_at_ms`, optional `resolved_email`, optional `resolved_locale`, optional `last_error_classification`, optional `last_error_message`, optional `last_error_at_ms`, `created_at_ms`, `updated_at_ms`, optional `published_at_ms`, optional `dead_lettered_at_ms`, optional `skipped_at_ms` |
-| `notification_idempotency_record` | `producer`, `idempotency_key`, `notification_id`, `request_fingerprint`, `created_at_ms`, `expires_at_ms` |
-| `notification_dead_letter_entry` | `notification_id`, `route_id`, `channel`, `recipient_ref`, `final_attempt_count`, `max_attempts`, `failure_classification`, `failure_message`, `created_at_ms`, optional `recovery_hint` |
-| malformed intent record | `stream_entry_id`, optional `notification_type`, optional `producer`, optional `idempotency_key`, `failure_code`, `failure_message`, `raw_fields_json`, `recorded_at_ms` |
-| stream offset record | `stream`, `last_processed_entry_id`, `updated_at_ms` |
+Storage rules:
 
-`notification_record.recipient_user_ids` stores a normalized array of unique
-`user_id` values and is omitted for `audience_kind=admin_email`.
-`notification_record.payload_json` stores the canonical normalized JSON string
-used for idempotency fingerprinting.
-Temporary route lease keys store one opaque worker token and use
-`NOTIFICATION_ROUTE_LEASE_TTL`; they are service-local coordination state
-rather than durable records.
-`notification:route_schedule` stores one member per scheduled route where score
-= `next_attempt_at_ms` and member = full Redis route key with encoded dynamic
-segments.
-Newly accepted publishable routes enter the schedule immediately with
-`status=pending` and `next_attempt_at_ms = accepted_at_ms`.
-`failed` routes remain scheduled for retry.
-`published`, `dead_letter`, and `skipped` are absent from the schedule.
-Only the current lease holder may finalize one due publication attempt.
+- dynamic Redis key segments are base64url-encoded
+- temporary route lease keys store one opaque worker token and use
+`NOTIFICATION_ROUTE_LEASE_TTL`; they are service-local coordination
+state rather than durable records, retained on Redis as a per-replica
+exclusivity hint atop the SQL claim
+- stream offset records persist plain-XREAD consumer progress for
+`notification:intents` and never expire
+- the outbound streams `gateway:client-events` and `mail:delivery_commands`
+remain Redis Streams owned by Gateway and Mail Service respectively;
+Notification Service emits one entry through `XADD` before committing
+the route's PostgreSQL state transition
+
+### Publisher claim and lease coordination
+
+`Push` and `Email` publishers share the same scheduling pattern:
+
+- `routes_due_idx` (the partial index on `next_attempt_at`) replaces the
+former `notification:route_schedule` ZSET; the SQL query
+`SELECT notification_id, route_id FROM routes WHERE next_attempt_at IS
+NOT NULL AND next_attempt_at <= now() ORDER BY next_attempt_at ASC LIMIT
+N` returns the next due batch
+- `push` publishers filter for `route_id` prefix `push:`; `email`
+publishers filter for prefix `email:` so the two workers do not contend
+- `push` and `email` replicas coordinate through
+`notification:route_leases:<notification_id>:<route_id>` with
+`NOTIFICATION_ROUTE_LEASE_TTL`
+- only the current lease holder finalises one due publication attempt;
+the durable transition is a `Complete*` SQL transaction with optimistic
+concurrency on `routes.updated_at` so a stale lease cannot overwrite a
+fresher row state
+- newly accepted publishable routes enter the partial index immediately
+with `status=pending` and `next_attempt_at = accepted_at`
+- `failed` routes remain in the partial index for retry
+- `published`, `dead_letter`, and `skipped` clear `next_attempt_at` and
+drop out of the index
 
 ## Retry And Dead-Letter Policy
 
@@ -550,12 +600,15 @@ Rules:
 
 Retention rules:
 
-- `notification_record` and `notification_route` use
-`NOTIFICATION_RECORD_TTL`
-- `notification_idempotency_record` uses `NOTIFICATION_IDEMPOTENCY_TTL`
-- `notification_dead_letter_entry` and malformed intent records use
-`NOTIFICATION_DEAD_LETTER_TTL`
-- stream offset records do not use TTL
+- `records` and their cascaded `routes` / `dead_letters` use
+`NOTIFICATION_RECORD_RETENTION` (deleted by the periodic SQL retention
+worker after the configured window; cascade clears dependent rows)
+- the per-record idempotency window (`records.idempotency_expires_at`)
+uses `NOTIFICATION_IDEMPOTENCY_TTL`
+- `malformed_intents` use `NOTIFICATION_MALFORMED_INTENT_RETENTION`
+(independent retention pass)
+- the retention worker runs once per `NOTIFICATION_CLEANUP_INTERVAL`
+- stream offset records do not expire
 
 ## Observability
 