feat: use postgres

Author: Ilia Denisov
Date: 2026-04-26 20:34:39 +02:00
Committed by: GitHub
Parent: 48b0056b49
Commit: fe829285a6
365 changed files with 29223 additions and 24049 deletions
+3 -1
@@ -8,7 +8,9 @@ placeholders unless explicitly stated otherwise.
Minimal local runtime:
```dotenv
NOTIFICATION_REDIS_MASTER_ADDR=127.0.0.1:6379
NOTIFICATION_REDIS_PASSWORD=integration
NOTIFICATION_POSTGRES_PRIMARY_DSN=postgres://notificationservice:notificationservice@127.0.0.1:5432/galaxy?search_path=notification&sslmode=disable
NOTIFICATION_INTERNAL_HTTP_ADDR=:8092
NOTIFICATION_USER_SERVICE_BASE_URL=http://127.0.0.1:8091
```
+265
@@ -0,0 +1,265 @@
# PostgreSQL Migration
PG_PLAN.md §5 migrated `galaxy/notification` from a Redis-only durable store
to the steady-state split codified in `ARCHITECTURE.md §Persistence
Backends`: PostgreSQL is the source of truth for table-shaped notification
state, and Redis keeps only the inbound `notification:intents` stream, the
two outbound streams (`gateway:client-events`, `mail:delivery_commands`),
the persisted consumer offset, and the short-lived per-route exclusivity
lease.
This document records the schema decisions and the non-obvious agreements
behind them. Use it together with the migration script
(`internal/adapters/postgres/migrations/00001_init.sql`) and the runtime
wiring (`internal/app/runtime.go`).
## Outcomes
- Schema `notification` (provisioned externally) holds the durable state:
`records`, `routes`, `dead_letters`, `malformed_intents`.
- The runtime opens one PostgreSQL pool via `pkg/postgres.OpenPrimary`,
applies embedded goose migrations strictly before any HTTP listener
becomes ready, and exits non-zero when migration or ping fails.
- The runtime opens one shared `*redis.Client` via
`pkg/redisconn.NewMasterClient` and passes it to the intent consumer, the
publishers (outbound XADDs), the route lease store, and the persisted
stream offset store.
- The Redis adapter package (`internal/adapters/redisstate/`) is reduced to
the surviving `LeaseStore`, `StreamOffsetStore`, and a slim `Keyspace`
exposing only `RouteLease(notificationID, routeID)`,
`StreamOffset(stream)`, and `Intents()`. The Lua-backed atomic writer,
  the route-state mutation scripts, the
  records/routes/idempotency/dead-letters/malformed-intents keyspace, and
  the per-record TTL constants are
gone.
- Configuration drops `NOTIFICATION_REDIS_USERNAME` /
`NOTIFICATION_REDIS_TLS_ENABLED` / `NOTIFICATION_REDIS_ADDR` and
introduces `NOTIFICATION_REDIS_MASTER_ADDR` /
`NOTIFICATION_REDIS_REPLICA_ADDRS` plus `NOTIFICATION_POSTGRES_*`. The
retention knobs `NOTIFICATION_RECORD_TTL` /
`NOTIFICATION_DEAD_LETTER_TTL` are renamed to
`NOTIFICATION_RECORD_RETENTION` /
`NOTIFICATION_MALFORMED_INTENT_RETENTION`, and a new
`NOTIFICATION_CLEANUP_INTERVAL` drives the periodic SQL retention
worker.
## Decisions
### 1. One schema, externally-provisioned role
**Decision.** The `notification` schema and the matching
`notificationservice` role are created outside the migration sequence (in
tests, by
`integration/internal/harness/postgres_container.go::EnsureRoleAndSchema`;
in production, by an ops init script not in scope for this stage). The
embedded migration `00001_init.sql` only contains DDL for tables and
indexes and assumes it runs as the schema owner with
`search_path=notification`.
**Why.** Mixing role creation, schema creation, and table DDL into one
script forces every consumer of the migration to run as a superuser. The
schema-per-service architectural rule
(`ARCHITECTURE.md §Persistence Backends`) lines up neatly with the
operational split: ops provisions roles and schemas, the service applies
schema-scoped migrations.
### 2. Idempotency record IS the records row
**Decision.** The `records` table carries `producer`, `idempotency_key`,
`request_fingerprint`, and `idempotency_expires_at` columns and a
`UNIQUE (producer, idempotency_key)` constraint. Acceptance flows insert
the row directly; a duplicate request races on the UNIQUE constraint and
surfaces as `acceptintent.ErrConflict`. There is no separate idempotency
table.
**Why.** PG_PLAN.md §3 fixed this rule for every PG-backed service. With
the reservation living on the durable record, recovery is a single fact —
the row either exists or it does not — so no Redis-loss window can make a
duplicate sneak through. The `records.accepted_at` value doubles as the
`IdempotencyRecord.CreatedAt` returned to the service layer.
### 3. `recipient_user_ids` as JSONB
**Decision.** `records.recipient_user_ids` stores the normalized recipient
user-id list as a JSONB column. The codec round-trips a nil slice as `[]`
to keep the column NOT NULL while letting the read path return a nil slice
when the audience is not user-targeted.
**Why.** The list is opaque to queries (we never element-filter on it).
JSONB lines up with the "everything outside primary fields is JSON"
pattern Mail Stage 4 already established; PostgreSQL will accept a future
GIN index on `recipient_user_ids jsonb_path_ops` if a recipient-filtered
operator UI ever lands. `text[]` would have forced a `pgtype.Array[string]`
boundary type and a different scan path with no functional benefit today.
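The nil-slice round trip described above can be sketched with plain `encoding/json`; the helper names are illustrative, not the codec's real identifiers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeRecipients writes a nil slice as "[]" so the JSONB column can stay
// NOT NULL even when the audience is not user-targeted.
func encodeRecipients(ids []string) ([]byte, error) {
	if ids == nil {
		ids = []string{}
	}
	return json.Marshal(ids)
}

// decodeRecipients returns nil for an empty list, preserving the read-path
// convention that nil means "not user-targeted".
func decodeRecipients(raw []byte) ([]string, error) {
	var ids []string
	if err := json.Unmarshal(raw, &ids); err != nil {
		return nil, err
	}
	if len(ids) == 0 {
		return nil, nil
	}
	return ids, nil
}

func main() {
	raw, _ := encodeRecipients(nil)
	fmt.Println(string(raw)) // [] — the column value is never NULL
	ids, _ := decodeRecipients(raw)
	fmt.Println(ids == nil) // true — the read path restores the nil slice
}
```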
### 4. Timestamps are uniformly `timestamptz` and always UTC at the boundary
**Decision.** Every time-valued column on every Stage 5 table uses
PostgreSQL's `timestamptz`. The domain model continues to use `time.Time`;
the adapter normalises every `time.Time` parameter to UTC at the binding
site (`record.X.UTC()` or the `nullableTime` helper that wraps a possibly
zero-valued `time.Time`), and re-wraps every scanned `time.Time` with
`.UTC()` (directly or via `timeFromNullable` for nullable columns) before
it leaves the adapter. The architecture-wide form of this rule lives in
`ARCHITECTURE.md §Persistence Backends → Timestamp handling`.
**Why.** PG_PLAN.md §5 originally specified `_ms` epoch-millisecond
columns. User Service Stage 3 and Mail Service Stage 4 already use
`timestamptz` for every table and the runtime contract tests expect
Go-level `time.Time` semantics throughout. Keeping the same shape across
services reduces adapter-layer complexity and avoids two parallel encoding
paths in the notificationstore. The deviation from the literal plan is
intentional and is documented here. The defensive `.UTC()` rule on both
sides eliminates the class of bug where the pgx driver returns scanned
values in `time.Local`, which silently breaks equality tests, JSON
formatting, and comparison against pointer fields.
### 5. Scheduler claim is non-locking; transitions use optimistic concurrency on `updated_at`
**Decision.** `ListDueRoutes(ctx, now, limit)` is a non-locking
`SELECT notification_id, route_id FROM routes WHERE next_attempt_at IS
NOT NULL AND next_attempt_at <= $1 ORDER BY next_attempt_at ASC LIMIT $2`.
The publisher then takes a Redis lease (`route_leases:*`), reads the
route, emits the outbound stream entry, and calls one of
`CompleteRoutePublished` / `CompleteRouteFailed` /
`CompleteRouteDeadLetter`. Each `Complete*` transaction issues
`UPDATE routes SET ... WHERE notification_id = $a AND route_id = $b AND
updated_at = $expectedUpdatedAt`; a zero `RowsAffected` count surfaces as
`routestate.ErrConflict`, which the publisher treats as a no-op (some other
replica progressed the row since the worker loaded it).
**Why.** A `FOR UPDATE` held across the publisher's whole publish window
would serialise concurrent publishers and block the outbound stream emit.
Per-row optimistic concurrency on `updated_at` keeps the lock duration
inside the SQL transaction itself; the lease bounds duplicates atop that.
The explicit `next_attempt_at` column (set to `NULL` for terminal states)
keeps the partial index `routes_due_idx` narrow and avoids the "schedule
out of sync with row" failure mode of the previous Redis ZSET +
JSON-payload pair.
### 6. Outbound XADD precedes SQL completion (at-least-once across the dual-system boundary)
**Decision.** The publisher emits the outbound stream entry through
`*redis.Client.XAdd` *before* the route's SQL state transition is
committed. If the XADD succeeds and the SQL update later fails, the next
replica retries — same notification gets a second outbound entry; the
consumer side (Gateway, Mail) deduplicates on the entry id. If the XADD
fails, `recordFailure` records a publication failure with classification
`gateway_stream_publish_failed` or `mail_stream_publish_failed` and
schedules a retry.
**Why.** PG_PLAN.md §5 explicitly endorses this ordering by saying the
lease is "atop the SQL claim" rather than replacing it. The lease bounds
duplicate emission to one replica per route per lease window; the
consumer-side dedupe handles the rare cross-window case. A transactional
outbox would solve the duplicate but is out of Stage 5 scope; revisit if
duplicate-traffic ever becomes an operational concern.
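The at-least-once ordering and its consumer-side safety net can be sketched with in-memory stand-ins. Everything here is illustrative, including the dedupe key: the real services derive their own stable identifier, and the real publisher talks to Redis and PostgreSQL rather than slices.

```go
package main

import "fmt"

// entry stands in for an outbound stream entry; dedupeKey models whatever
// stable identifier the consumer uses to collapse re-emissions.
type entry struct{ dedupeKey, payload string }

type publisher struct {
	stream     []entry
	commitFail bool // simulate the SQL transition failing after the XADD
}

// publish appends the outbound entry first (the XADD), then "commits" the SQL
// state transition; when the commit fails the route stays due, so the next
// replica re-emits the same logical entry.
func (p *publisher) publish(key, payload string) error {
	p.stream = append(p.stream, entry{key, payload}) // XADD before SQL commit
	if p.commitFail {
		p.commitFail = false
		return fmt.Errorf("sql transition failed; route remains due")
	}
	return nil
}

// dedupe is the consumer side: keep the first entry per key.
func dedupe(stream []entry) []entry {
	seen := map[string]bool{}
	var out []entry
	for _, e := range stream {
		if !seen[e.dedupeKey] {
			seen[e.dedupeKey] = true
			out = append(out, e)
		}
	}
	return out
}

func main() {
	p := &publisher{commitFail: true}
	_ = p.publish("n1/push", "notify") // XADD succeeds, commit fails
	_ = p.publish("n1/push", "notify") // retry emits a second entry
	fmt.Println(len(p.stream), len(dedupe(p.stream))) // 2 1
}
```

The duplicate crosses the dual-system boundary exactly as described: two entries on the stream, one effect after consumer-side dedupe.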
### 7. Lease stays on Redis as a hint
**Decision.** The lease key `notification:route_leases:<notificationID>:<routeID>`
keeps its existing SETNX/Lua-release semantics, lifted into a dedicated
`redisstate.LeaseStore`. The composite
`internal/adapters/postgres/routepublisher.Store` wires the SQL state
store and the Redis lease store behind the existing publisher-worker
interfaces (`PushRouteStateStore`, `EmailRouteStateStore`).
**Why.** PG_PLAN.md §5 retains the lease as a "short-lived, per-process
exclusivity hint atop the SQL claim". Without the lease, two replicas
selecting overlapping due batches would each XADD before either commits
the SQL transition — duplicating outbound traffic during contention. The
lease bounds emission rate to one-per-route-per-lease-TTL even when scans
overlap. Keeping the abstraction inside `LeaseStore` (separate from the
SQL store) keeps the architectural split visible.
### 8. Periodic SQL retention replaces Redis EXPIRE
**Decision.** A new `worker.SQLRetentionWorker` runs the two DELETE
statements driven by config:
- `DELETE FROM records WHERE accepted_at < now() - $record_retention`
cascades to `routes` and `dead_letters` via `ON DELETE CASCADE`.
- `DELETE FROM malformed_intents WHERE recorded_at < now() -
$malformed_intent_retention` is a standalone retention pass.
Three new env vars (`NOTIFICATION_RECORD_RETENTION`,
`NOTIFICATION_MALFORMED_INTENT_RETENTION`,
`NOTIFICATION_CLEANUP_INTERVAL`) drive the worker.
`NOTIFICATION_IDEMPOTENCY_TTL` survives unchanged: the service layer
materialises it on each row as `idempotency_expires_at`.
**Why.** PostgreSQL maintains its own indexes; the previous per-key Redis
EXPIRE TTL semantics translate to a periodic batch DELETE. The two-knob
shape mirrors Mail Stage 4 (`MAIL_DELIVERY_RETENTION` +
`MAIL_MALFORMED_COMMAND_RETENTION`). The legacy
`NOTIFICATION_RECORD_TTL` / `NOTIFICATION_DEAD_LETTER_TTL` env vars are
intentionally retired without a backward-compat shim — keeping the names
would mislead operators reading the runbook because the eviction
mechanism genuinely changed.
### 9. Shared Redis client with consumer-driven shutdown
**Decision.** `internal/app/runtime.go` constructs one
`redisconn.NewMasterClient(cfg.Redis.Conn)` (via the thin
`redisadapter.NewClient` wrapper) and passes it to the intent consumer,
the lease store, the stream offset store, and both publishers (for their
outbound XADDs). The runtime cleanup tolerates `redis.ErrClosed` so a
double-close from any consumer is benign.
**Why.** Each subsequent PG_PLAN stage (Lobby) ships the same pattern;
sharing one client is the shape we want all stages to converge on. A
dedicated client per consumer was an artefact of the Redis-only
architecture; one shared client means fewer TCP connections, ping
targets, and OpenTelemetry instrumentation hooks, and per-consumer
clients would buy no functional benefit today.
### 10. Query layer is `go-jet/jet/v2`
**Decision.** All `notificationstore` packages build SQL through the
jet builder API (`pgtable.<Table>.INSERT/SELECT/UPDATE/DELETE` plus
the `pg.AND/OR/SET/MIN/COUNT/...` DSL). `cmd/jetgen` (invoked via
`make jet`) brings up a transient PostgreSQL container, applies the
embedded migrations, and runs
`github.com/go-jet/jet/v2/generator/postgres.GenerateDB` against the
provisioned schema; the generated table/model code lives under
`internal/adapters/postgres/jet/notification/{model,table}/*.go` and
is committed to the repo, so build consumers do not need Docker.
Statements are run through the `database/sql` API
(`stmt.Sql() → db/tx.Exec/Query/QueryRow`); manual `rowScanner`
helpers preserve the codecs.go boundary translations and domain-type
mapping.
**Why.** Aligns with `PG_PLAN.md` §Library stack ("Query layer:
`github.com/go-jet/jet/v2` (PostgreSQL dialect). Generated code lives
under each service `internal/adapters/postgres/jet/`, regenerated via
a `make jet` target and committed to the repo"). Constructs that the
jet builder does not cover natively (`MIN(timestamptz)` aggregates, the
optimistic-concurrency `WHERE updated_at = $expected` guard, and JSONB
parameters) are expressed through the dialect helpers (`pg.MIN(...)`,
`pg.TimestampzT(...)`, and direct `[]byte`/string parameters for JSONB
columns).
## Cross-References
- `PG_PLAN.md §5` (Stage 5 — Notification Service migration).
- `ARCHITECTURE.md §Persistence Backends`.
- `internal/adapters/postgres/migrations/00001_init.sql` and
`internal/adapters/postgres/migrations/migrations.go`.
- `internal/adapters/postgres/notificationstore/{store,records,routes,
acceptance,scheduler,dead_letters,malformed_intents,retention,codecs,
helpers}.go` plus the testcontainers-backed unit suite under
`notificationstore/{harness,store}_test.go`.
- `internal/adapters/postgres/jet/notification/{model,table}/*.go`
(committed generated code) plus `cmd/jetgen/main.go` and the
`make jet` Makefile target that regenerate it.
- `internal/adapters/postgres/routepublisher/store.go` (composite
PG state + Redis lease behind the publisher contracts).
- `internal/service/routestate/types.go` (storage-agnostic value types).
- `internal/config/{config,env}.go` (`PostgresConfig` plus the
`redisconn.Config`-shaped `RedisConfig` envelope).
- `internal/app/runtime.go` (shared Redis client + PG pool open + migration
+ notificationstore wiring + retention worker startup).
- `internal/worker/sqlretention.go` (periodic SQL retention worker).
- `internal/adapters/redisstate/{keyspace,codecs,errors,lease_store,
stream_offset_store}.go` (surviving slim Redis surface).
- `integration/internal/harness/notificationservice.go`
(per-suite Postgres container + `notification`/`notificationservice`
provisioning).
+19 -6
@@ -7,10 +7,16 @@ This runbook covers startup, steady-state verification, shutdown, and common
Before starting the process, confirm:
- `NOTIFICATION_REDIS_MASTER_ADDR` points to the Redis master deployment
that hosts the inbound `notification:intents` stream, the persisted
consumer offset, the outbound `gateway:client-events` and
`mail:delivery_commands` streams, and the temporary `route_leases:*` keys
- `NOTIFICATION_REDIS_PASSWORD` matches the connection password
(mandatory; the deprecated `NOTIFICATION_REDIS_USERNAME` /
`NOTIFICATION_REDIS_TLS_ENABLED` env vars are rejected at startup)
- `NOTIFICATION_POSTGRES_PRIMARY_DSN` points to the PostgreSQL primary
hosting the `notification` schema; the role must own
`records`, `routes`, `dead_letters`, and `malformed_intents`
- `NOTIFICATION_USER_SERVICE_BASE_URL` points to the trusted internal
`User Service`
- `NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM` matches the stream consumed by
@@ -19,11 +25,18 @@ Before starting the process, confirm:
`Mail Service`
- administrator email variables are populated for notification types that
should notify administrators
- retention knobs (`NOTIFICATION_RECORD_RETENTION`,
`NOTIFICATION_MALFORMED_INTENT_RETENTION`,
`NOTIFICATION_CLEANUP_INTERVAL`) are sized for the expected operator
history window
- OpenTelemetry exporter settings point at the intended collector when traces
or metrics are expected outside the process
At startup the process performs a bounded Redis `PING`, opens the
PostgreSQL pool, runs the embedded goose migrations, and only then starts
the internal HTTP probe. Startup fails fast if configuration validation,
Redis connectivity, PostgreSQL connectivity, or migration application
fails.
Known startup caveats:
+20 -7
@@ -129,7 +129,9 @@ Startup fails fast on invalid configuration or unavailable Redis.
Required:
- `NOTIFICATION_REDIS_MASTER_ADDR`
- `NOTIFICATION_REDIS_PASSWORD`
- `NOTIFICATION_POSTGRES_PRIMARY_DSN`
- `NOTIFICATION_USER_SERVICE_BASE_URL`
Core process config:
@@ -144,12 +146,12 @@ Internal HTTP config:
- `NOTIFICATION_INTERNAL_HTTP_READ_TIMEOUT` with default `10s`
- `NOTIFICATION_INTERNAL_HTTP_IDLE_TIMEOUT` with default `1m`
Redis connectivity (master/replica/password shape; the deprecated
`NOTIFICATION_REDIS_ADDR`, `NOTIFICATION_REDIS_USERNAME`, and
`NOTIFICATION_REDIS_TLS_ENABLED` env vars are rejected at startup):
- `NOTIFICATION_REDIS_PASSWORD`
- `NOTIFICATION_REDIS_REPLICA_ADDRS` (optional, comma-separated)
- `NOTIFICATION_REDIS_DB`
- `NOTIFICATION_REDIS_OPERATION_TIMEOUT`
- `NOTIFICATION_INTENTS_STREAM`
- `NOTIFICATION_INTENTS_READ_BLOCK_TIMEOUT`
@@ -157,6 +159,14 @@ Redis connectivity:
- `NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM_MAX_LEN`
- `NOTIFICATION_MAIL_DELIVERY_COMMANDS_STREAM`
PostgreSQL connectivity:
- `NOTIFICATION_POSTGRES_REPLICA_DSNS` (optional, comma-separated)
- `NOTIFICATION_POSTGRES_OPERATION_TIMEOUT`
- `NOTIFICATION_POSTGRES_MAX_OPEN_CONNS`
- `NOTIFICATION_POSTGRES_MAX_IDLE_CONNS`
- `NOTIFICATION_POSTGRES_CONN_MAX_LIFETIME`
Retry and retention:
- `NOTIFICATION_PUSH_RETRY_MAX_ATTEMPTS`
@@ -164,9 +174,12 @@ Retry and retention:
- `NOTIFICATION_ROUTE_BACKOFF_MIN`
- `NOTIFICATION_ROUTE_BACKOFF_MAX`
- `NOTIFICATION_ROUTE_LEASE_TTL`
- `NOTIFICATION_IDEMPOTENCY_TTL`
- `NOTIFICATION_RECORD_RETENTION` (replaces the legacy
`NOTIFICATION_RECORD_TTL`; cascades to `routes` and `dead_letters`)
- `NOTIFICATION_MALFORMED_INTENT_RETENTION` (replaces the legacy
`NOTIFICATION_DEAD_LETTER_TTL`)
- `NOTIFICATION_CLEANUP_INTERVAL` (period of the SQL retention worker)
User enrichment: