PostgreSQL Migration

PG_PLAN.md §5 migrated galaxy/notification from a Redis-only durable store to the steady-state split codified in ARCHITECTURE.md §Persistence Backends: PostgreSQL is the source of truth for table-shaped notification state, and Redis keeps only the inbound notification:intents stream, the two outbound streams (gateway:client-events, mail:delivery_commands), the persisted consumer offset, and the short-lived per-route exclusivity lease.

This document records the schema decisions and the non-obvious agreements behind them. Use it together with the migration script (internal/adapters/postgres/migrations/00001_init.sql) and the runtime wiring (internal/app/runtime.go).

Outcomes

  • Schema notification (provisioned externally) holds the durable state: records, routes, dead_letters, malformed_intents.
  • The runtime opens one PostgreSQL pool via pkg/postgres.OpenPrimary, applies embedded goose migrations strictly before any HTTP listener becomes ready, and exits non-zero when migration or ping fails.
  • The runtime opens one shared *redis.Client via pkg/redisconn.NewMasterClient and passes it to the intent consumer, the publishers (outbound XADDs), the route lease store, and the persisted stream offset store.
  • The Redis adapter package (internal/adapters/redisstate/) is reduced to the surviving LeaseStore, StreamOffsetStore, and a slim Keyspace exposing only RouteLease(notificationID, routeID), StreamOffset(stream), and Intents(). The Lua-backed atomic writer, the route-state mutation scripts, the records/routes/idempotency/dead-letters/malformed-intents keyspace, and the per-record TTL constants are gone.
  • Configuration drops NOTIFICATION_REDIS_USERNAME / NOTIFICATION_REDIS_TLS_ENABLED / NOTIFICATION_REDIS_ADDR and introduces NOTIFICATION_REDIS_MASTER_ADDR / NOTIFICATION_REDIS_REPLICA_ADDRS plus NOTIFICATION_POSTGRES_*. The retention knobs NOTIFICATION_RECORD_TTL / NOTIFICATION_DEAD_LETTER_TTL are renamed to NOTIFICATION_RECORD_RETENTION / NOTIFICATION_MALFORMED_INTENT_RETENTION, and a new NOTIFICATION_CLEANUP_INTERVAL drives the periodic SQL retention worker.

Decisions

1. One schema, externally provisioned role

Decision. The notification schema and the matching notificationservice role are created outside the migration sequence (in tests, by integration/internal/harness/postgres_container.go::EnsureRoleAndSchema; in production, by an ops init script not in scope for this stage). The embedded migration 00001_init.sql only contains DDL for tables and indexes and assumes it runs as the schema owner with search_path=notification.

Why. Mixing role creation, schema creation, and table DDL into one script forces every consumer of the migration to run as a superuser. The schema-per-service architectural rule (ARCHITECTURE.md §Persistence Backends) lines up neatly with the operational split: ops provisions roles and schemas, the service applies schema-scoped migrations.
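
For concreteness, the external provisioning step amounts to something like the following sketch, run on a privileged connection. The function name and exact statements are illustrative; the real ones live in postgres_container.go::EnsureRoleAndSchema (tests) and the ops init script (production), and existence checks and grants are elided here.

```go
import (
	"context"
	"database/sql"
)

// ensureRoleAndSchema is illustrative only: it provisions the role and the
// schema so the embedded migration can later run as the schema owner with
// search_path=notification.
func ensureRoleAndSchema(ctx context.Context, admin *sql.DB) error {
	for _, stmt := range []string{
		`CREATE ROLE notificationservice LOGIN`,
		`CREATE SCHEMA IF NOT EXISTS notification AUTHORIZATION notificationservice`,
		`ALTER ROLE notificationservice SET search_path = notification`,
	} {
		if _, err := admin.ExecContext(ctx, stmt); err != nil {
			return err
		}
	}
	return nil
}
```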

2. Idempotency record IS the records row

Decision. The records table carries producer, idempotency_key, request_fingerprint, and idempotency_expires_at columns and a UNIQUE (producer, idempotency_key) constraint. Acceptance flows insert the row directly; a duplicate request races on the UNIQUE constraint and surfaces as acceptintent.ErrConflict. There is no separate idempotency table.

Why. PG_PLAN.md §3 fixed this rule for every PG-backed service. With the reservation living on the durable record, recovery is a single fact — the row either exists or it does not — so no Redis-loss window can make a duplicate sneak through. The records.accepted_at value doubles as the IdempotencyRecord.CreatedAt returned to the service layer.
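
A minimal sketch of the conflict mapping, assuming the pgx stdlib driver; the SQL, the function name, and the local errDuplicate sentinel (standing in for acceptintent.ErrConflict) are illustrative, and the real statement is jet-built:

```go
import (
	"context"
	"database/sql"
	"errors"
	"time"

	"github.com/jackc/pgerrcode"
	"github.com/jackc/pgx/v5/pgconn"
)

// errDuplicate stands in for acceptintent.ErrConflict in this sketch.
var errDuplicate = errors.New("duplicate (producer, idempotency_key)")

const insertRecordSQL = `
INSERT INTO records (producer, idempotency_key, request_fingerprint, idempotency_expires_at, accepted_at)
VALUES ($1, $2, $3, $4, now())`

func insertRecord(ctx context.Context, tx *sql.Tx, producer, key, fingerprint string, expiresAt time.Time) error {
	_, err := tx.ExecContext(ctx, insertRecordSQL, producer, key, fingerprint, expiresAt.UTC())
	var pgErr *pgconn.PgError
	if errors.As(err, &pgErr) && pgErr.Code == pgerrcode.UniqueViolation {
		// The UNIQUE (producer, idempotency_key) race loses here.
		return errDuplicate
	}
	return err
}
```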

3. recipient_user_ids as JSONB

Decision. records.recipient_user_ids stores the normalized recipient user-id list as a JSONB column. The codec round-trips a nil slice as [] to keep the column NOT NULL while letting the read path return a nil slice when the audience is not user-targeted.

Why. The list is opaque to queries (we never element-filter on it). JSONB lines up with the "everything outside primary fields is JSON" pattern Mail Stage 4 already established; PostgreSQL will accept a future GIN index on recipient_user_ids jsonb_path_ops if a recipient-filtered operator UI ever lands. text[] would have forced a pgtype.Array[string] boundary type and a different scan path with no functional benefit today.
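
A sketch of the round-trip described above; the function names are illustrative, the real codecs live in codecs.go:

```go
import "encoding/json"

// encodeRecipientUserIDs stores a nil slice as the JSON literal [] so the
// JSONB column can stay NOT NULL.
func encodeRecipientUserIDs(ids []string) ([]byte, error) {
	if ids == nil {
		return []byte("[]"), nil
	}
	return json.Marshal(ids)
}

// decodeRecipientUserIDs maps [] back to a nil slice so non-user-targeted
// audiences surface as nil on the read path.
func decodeRecipientUserIDs(raw []byte) ([]string, error) {
	var ids []string
	if err := json.Unmarshal(raw, &ids); err != nil {
		return nil, err
	}
	if len(ids) == 0 {
		return nil, nil
	}
	return ids, nil
}
```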

4. Timestamps are uniformly timestamptz and always UTC at the boundary

Decision. Every time-valued column on every Stage 5 table uses PostgreSQL's timestamptz. The domain model continues to use time.Time; the adapter normalises every time.Time parameter to UTC at the binding site (record.X.UTC() or the nullableTime helper that wraps a possibly zero-valued time.Time), and re-wraps every scanned time.Time with .UTC() (directly or via timeFromNullable for nullable columns) before it leaves the adapter. The architecture-wide form of this rule lives in ARCHITECTURE.md §Persistence Backends → Timestamp handling.

Why. PG_PLAN.md §5 originally specified _ms epoch-millisecond columns. User Service Stage 3 and Mail Service Stage 4 already use timestamptz for every table and the runtime contract tests expect Go-level time.Time semantics throughout. Keeping the same shape across services reduces adapter-layer complexity and avoids two parallel encoding paths in the notificationstore. The deviation from the literal plan is intentional and is documented here. The defensive .UTC() rule on both sides eliminates the class of bug where the pgx driver returns scanned values in time.Local, which silently breaks equality tests, JSON formatting, and comparison against pointer fields.
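
One plausible shape for the two helpers named above; the real signatures live in the adapter's codecs.go/helpers.go:

```go
import (
	"database/sql"
	"time"
)

// nullableTime binds a possibly zero-valued time.Time as NULL, normalising
// non-zero values to UTC at the binding site.
func nullableTime(t time.Time) sql.NullTime {
	if t.IsZero() {
		return sql.NullTime{}
	}
	return sql.NullTime{Time: t.UTC(), Valid: true}
}

// timeFromNullable re-wraps a scanned nullable timestamptz in UTC before it
// leaves the adapter; NULL scans back to the zero time.Time.
func timeFromNullable(nt sql.NullTime) time.Time {
	if !nt.Valid {
		return time.Time{}
	}
	return nt.Time.UTC()
}
```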

5. Scheduler claim is non-locking; transitions use optimistic concurrency on updated_at

Decision. ListDueRoutes(ctx, now, limit) is a non-locking SELECT notification_id, route_id FROM routes WHERE next_attempt_at IS NOT NULL AND next_attempt_at <= $1 ORDER BY next_attempt_at ASC LIMIT $2. The publisher then takes a Redis lease (route_leases:*), reads the route, emits the outbound stream entry, and calls one of CompleteRoutePublished / CompleteRouteFailed / CompleteRouteDeadLetter. Each Complete* transaction issues UPDATE routes SET ... WHERE notification_id = $a AND route_id = $b AND updated_at = $expectedUpdatedAt; a zero RowsAffected count surfaces as routestate.ErrConflict, which the publisher treats as a no-op (some other replica progressed the row since the worker loaded it).

Why. A FOR UPDATE held across the publisher's whole publish window would serialise concurrent publishers and block the outbound stream emit. Per-row optimistic concurrency on updated_at keeps the lock duration inside the SQL transaction itself; the lease bounds duplicates atop that. The explicit next_attempt_at column (set to NULL for terminal states) keeps the partial index routes_due_idx narrow and avoids the "schedule out of sync with row" failure mode of the previous Redis ZSET + JSON-payload pair.
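
A sketch of one Complete* transition under this scheme. The SQL and names are illustrative (the real statements are jet-built), and the local errConflict sentinel stands in for routestate.ErrConflict:

```go
import (
	"context"
	"database/sql"
	"errors"
	"time"
)

// errConflict stands in for routestate.ErrConflict in this sketch.
var errConflict = errors.New("route progressed by another replica")

const completePublishedSQL = `
UPDATE routes
SET    state = 'published', next_attempt_at = NULL, updated_at = now()
WHERE  notification_id = $1 AND route_id = $2 AND updated_at = $3`

func completeRoutePublished(ctx context.Context, db *sql.DB, notificationID, routeID string, expectedUpdatedAt time.Time) error {
	res, err := db.ExecContext(ctx, completePublishedSQL, notificationID, routeID, expectedUpdatedAt.UTC())
	if err != nil {
		return err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return err
	}
	if n == 0 {
		return errConflict // the publisher treats this as a no-op
	}
	return nil
}
```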

6. Outbound XADD precedes SQL completion (at-least-once across the dual-system boundary)

Decision. The publisher emits the outbound stream entry through *redis.Client.XAdd before the route's SQL state transition is committed. If the XADD succeeds and the SQL update later fails, the next replica retries — same notification gets a second outbound entry; the consumer side (Gateway, Mail) deduplicates on the entry id. If the XADD fails, recordFailure records a publication failure with classification gateway_stream_publish_failed or mail_stream_publish_failed and schedules a retry.

Why. PG_PLAN.md §5 explicitly endorses this ordering by saying the lease is "atop the SQL claim" rather than replacing it. The lease bounds duplicate emission to one replica per route per lease window; the consumer-side dedupe handles the rare cross-window case. A transactional outbox would solve the duplicate but is out of Stage 5 scope; revisit if duplicate-traffic ever becomes an operational concern.
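
A sketch of the ordering for the gateway route; the stream name comes from this document, everything else (the routeCompleter interface, the payload shape) is illustrative:

```go
import (
	"context"

	"github.com/redis/go-redis/v9"
)

// routeCompleter stands in for the SQL route-state store in this sketch.
type routeCompleter interface {
	CompleteRoutePublished(ctx context.Context, notificationID, routeID, entryID string) error
}

// publishGatewayRoute emits the outbound entry first, then commits the SQL
// transition. A SQL failure after a successful XADD means the next replica
// retries and emits a second entry; consumers dedupe on the entry id.
func publishGatewayRoute(ctx context.Context, rdb *redis.Client, store routeCompleter, notificationID, routeID string, payload []byte) error {
	entryID, err := rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: "gateway:client-events",
		Values: map[string]interface{}{"payload": payload},
	}).Result()
	if err != nil {
		// In the real publisher this is recorded via recordFailure with the
		// gateway_stream_publish_failed classification and retried.
		return err
	}
	return store.CompleteRoutePublished(ctx, notificationID, routeID, entryID)
}
```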

7. Lease stays on Redis as a hint

Decision. The lease key notification:route_leases:<notificationID>:<routeID> keeps its existing SETNX/Lua-release semantics, lifted into a dedicated redisstate.LeaseStore. The composite internal/adapters/postgres/routepublisher.Store wires the SQL state store and the Redis lease store behind the existing publisher-worker interfaces (PushRouteStateStore, EmailRouteStateStore).

Why. PG_PLAN.md §5 retains the lease as a "short-lived, per-process exclusivity hint atop the SQL claim". Without the lease, two replicas selecting overlapping due batches would each XADD before either commits the SQL transition — duplicating outbound traffic during contention. The lease bounds emission rate to one-per-route-per-lease-TTL even when scans overlap. Keeping the abstraction inside LeaseStore (separate from the SQL store) keeps the architectural split visible.
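
The SETNX/Lua-release semantics amount to the classic token-guarded lease; a sketch (token handling and struct shape illustrative, the real type is redisstate.LeaseStore):

```go
import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// releaseScript deletes the lease only if it still holds our token, so a
// lease that expired and was reacquired elsewhere is never released by the
// previous holder.
var releaseScript = redis.NewScript(`
if redis.call("GET", KEYS[1]) == ARGV[1] then
	return redis.call("DEL", KEYS[1])
end
return 0`)

type LeaseStore struct{ client *redis.Client }

// Acquire returns true only for the replica that set the key first.
func (s *LeaseStore) Acquire(ctx context.Context, key, token string, ttl time.Duration) (bool, error) {
	return s.client.SetNX(ctx, key, token, ttl).Result()
}

func (s *LeaseStore) Release(ctx context.Context, key, token string) error {
	return releaseScript.Run(ctx, s.client, []string{key}, token).Err()
}
```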

8. Periodic SQL retention replaces Redis EXPIRE

Decision. A new worker.SQLRetentionWorker runs the two DELETE statements driven by config:

  • DELETE FROM records WHERE accepted_at < now() - $record_retention cascades to routes and dead_letters via ON DELETE CASCADE.
  • DELETE FROM malformed_intents WHERE recorded_at < now() - $malformed_intent_retention is a standalone retention pass.

The renamed retention knobs (NOTIFICATION_RECORD_RETENTION, NOTIFICATION_MALFORMED_INTENT_RETENTION) plus the new NOTIFICATION_CLEANUP_INTERVAL drive the worker. NOTIFICATION_IDEMPOTENCY_TTL survives unchanged: the service layer materialises it on each row as idempotency_expires_at.

Why. PostgreSQL maintains its own indexes; the previous per-key Redis EXPIRE TTL semantics translate to a periodic batch DELETE. The two-knob shape mirrors Mail Stage 4 (MAIL_DELIVERY_RETENTION + MAIL_MALFORMED_COMMAND_RETENTION). The legacy NOTIFICATION_RECORD_TTL / NOTIFICATION_DEAD_LETTER_TTL env vars are intentionally retired without a backward-compat shim — keeping the names would mislead operators reading the runbook because the eviction mechanism genuinely changed.
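
A sketch of the worker loop; the struct fields are illustrative, and the cutoffs are computed in Go here where the real statements may use now() - interval arithmetic server-side:

```go
import (
	"context"
	"database/sql"
	"log/slog"
	"time"
)

type SQLRetentionWorker struct {
	db                       *sql.DB
	cleanupInterval          time.Duration // NOTIFICATION_CLEANUP_INTERVAL
	recordRetention          time.Duration // NOTIFICATION_RECORD_RETENTION
	malformedIntentRetention time.Duration // NOTIFICATION_MALFORMED_INTENT_RETENTION
	log                      *slog.Logger
}

func (w *SQLRetentionWorker) Run(ctx context.Context) {
	ticker := time.NewTicker(w.cleanupInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			now := time.Now().UTC()
			// Cascades to routes and dead_letters via ON DELETE CASCADE.
			if _, err := w.db.ExecContext(ctx,
				`DELETE FROM records WHERE accepted_at < $1`,
				now.Add(-w.recordRetention)); err != nil {
				w.log.Error("record retention pass failed", "err", err)
			}
			if _, err := w.db.ExecContext(ctx,
				`DELETE FROM malformed_intents WHERE recorded_at < $1`,
				now.Add(-w.malformedIntentRetention)); err != nil {
				w.log.Error("malformed-intent retention pass failed", "err", err)
			}
		}
	}
}
```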

9. Shared Redis client with consumer-driven shutdown

Decision. internal/app/runtime.go constructs one redisconn.NewMasterClient(cfg.Redis.Conn) (via the thin redisadapter.NewClient wrapper) and passes it to the intent consumer, the lease store, the stream offset store, and both publishers (for their outbound XADDs). The runtime cleanup tolerates redis.ErrClosed so a double-close from any consumer is benign.

Why. Each subsequent PG_PLAN stage (Lobby) ships a similar pattern; sharing one client is the shape we want all stages to converge on. A dedicated client per consumer was an artefact of the Redis-only architecture; keeping one shared client avoids multiplying TCP connections, ping points, and OpenTelemetry instrumentation hooks for no functional benefit.
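
The double-close tolerance amounts to a cleanup like this sketch (function name illustrative):

```go
import (
	"errors"

	"github.com/redis/go-redis/v9"
)

// closeSharedRedis is benign when another consumer already closed the client.
func closeSharedRedis(client *redis.Client) error {
	if err := client.Close(); err != nil && !errors.Is(err, redis.ErrClosed) {
		return err
	}
	return nil
}
```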

10. Query layer is go-jet/jet/v2

Decision. All notificationstore packages build SQL through the jet builder API (pgtable.<Table>.INSERT/SELECT/UPDATE/DELETE plus the pg.AND/OR/SET/MIN/COUNT/... DSL). cmd/jetgen (invoked via make jet) brings up a transient PostgreSQL container, applies the embedded migrations, and runs github.com/go-jet/jet/v2/generator/postgres.GenerateDB against the provisioned schema; the generated table/model code lives under internal/adapters/postgres/jet/notification/{model,table}/*.go and is committed to the repo, so build consumers do not need Docker. Statements are run through the database/sql API (stmt.Sql() → db/tx.Exec/Query/QueryRow); manual rowScanner helpers preserve the codecs.go boundary translations and domain-type mapping.

Why. Aligns with PG_PLAN.md §Library stack ("Query layer: github.com/go-jet/jet/v2 (PostgreSQL dialect). Generated code lives under each service internal/adapters/postgres/jet/, regenerated via a make jet target and committed to the repo"). Constructs beyond the core builder surface (MIN(timestamptz) aggregates, the optimistic-concurrency WHERE updated_at = $expected predicate, JSONB parameters) are still expressed through the DSL's helpers (pg.MIN(...), pg.TimestampzT(...), direct []byte/string params for JSONB columns).
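
An illustrative jet rendering of the ListDueRoutes query from Decision 5, following the builder → Sql() → database/sql flow described above; the generated column names are assumptions, not verified against the committed pgtable code:

```go
import (
	"context"
	"database/sql"
	"time"

	pg "github.com/go-jet/jet/v2/postgres"
)

// listDueRoutes sketches the non-locking due-route scan; pgtable is the
// committed generated package.
func listDueRoutes(ctx context.Context, db *sql.DB, now time.Time, limit int64) (*sql.Rows, error) {
	stmt := pgtable.Routes.
		SELECT(pgtable.Routes.NotificationID, pgtable.Routes.RouteID).
		WHERE(
			pgtable.Routes.NextAttemptAt.IS_NOT_NULL().
				AND(pgtable.Routes.NextAttemptAt.LT_EQ(pg.TimestampzT(now.UTC()))),
		).
		ORDER_BY(pgtable.Routes.NextAttemptAt.ASC()).
		LIMIT(limit)

	query, args := stmt.Sql()
	return db.QueryContext(ctx, query, args...)
}
```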

Cross-References

  • PG_PLAN.md §5 (Stage 5 — Notification Service migration).
  • ARCHITECTURE.md §Persistence Backends.
  • internal/adapters/postgres/migrations/00001_init.sql and internal/adapters/postgres/migrations/migrations.go.
  • internal/adapters/postgres/notificationstore/{store,records,routes,acceptance,scheduler,dead_letters,malformed_intents,retention,codecs,helpers}.go plus the testcontainers-backed unit suite under notificationstore/{harness,store}_test.go.
  • internal/adapters/postgres/jet/notification/{model,table}/*.go (committed generated code) plus cmd/jetgen/main.go and the make jet Makefile target that regenerate it.
  • internal/adapters/postgres/routepublisher/store.go (composite PG state + Redis lease behind the publisher contracts).
  • internal/service/routestate/types.go (storage-agnostic value types).
  • internal/config/{config,env}.go (PostgresConfig plus the redisconn.Config-shaped RedisConfig envelope).
  • internal/app/runtime.go (shared Redis client + PG pool open + migration + notificationstore wiring + retention worker startup).
  • internal/worker/sqlretention.go (periodic SQL retention worker).
  • internal/adapters/redisstate/{keyspace,codecs,errors,lease_store,stream_offset_store}.go (surviving slim Redis surface).
  • integration/internal/harness/notificationservice.go (per-suite Postgres container + notification/notificationservice provisioning).