PostgreSQL Migration
PG_PLAN.md §5 migrated galaxy/notification from a Redis-only durable store
to the steady-state split codified in ARCHITECTURE.md §Persistence Backends: PostgreSQL is the source of truth for table-shaped notification
state, and Redis keeps only the inbound notification:intents stream, the
two outbound streams (gateway:client-events, mail:delivery_commands),
the persisted consumer offset, and the short-lived per-route exclusivity
lease.
This document records the schema decisions and the non-obvious agreements
behind them. Use it together with the migration script
(internal/adapters/postgres/migrations/00001_init.sql) and the runtime
wiring (internal/app/runtime.go).
Outcomes
- Schema notification (provisioned externally) holds the durable state:
  records, routes, dead_letters, malformed_intents.
- The runtime opens one PostgreSQL pool via pkg/postgres.OpenPrimary,
  applies embedded goose migrations strictly before any HTTP listener
  becomes ready, and exits non-zero when migration or ping fails (see the
  sketch after this list).
- The runtime opens one shared *redis.Client via
  pkg/redisconn.NewMasterClient and passes it to the intent consumer, the
  publishers (outbound XADDs), the route lease store, and the persisted
  stream offset store.
- The Redis adapter package (internal/adapters/redisstate/) is reduced to
  the surviving LeaseStore, StreamOffsetStore, and a slim Keyspace exposing
  only RouteLease(notificationID, routeID), StreamOffset(stream), and
  Intents(). The Lua-backed atomic writer, the route-state mutation scripts,
  the records/routes/idempotency/dead-letters/malformed-intents keyspace,
  and the per-record TTL constants are gone.
- Configuration drops NOTIFICATION_REDIS_USERNAME,
  NOTIFICATION_REDIS_TLS_ENABLED, and NOTIFICATION_REDIS_ADDR and introduces
  NOTIFICATION_REDIS_MASTER_ADDR / NOTIFICATION_REDIS_REPLICA_ADDRS plus
  NOTIFICATION_POSTGRES_*. The retention knobs NOTIFICATION_RECORD_TTL /
  NOTIFICATION_DEAD_LETTER_TTL are renamed to NOTIFICATION_RECORD_RETENTION /
  NOTIFICATION_MALFORMED_INTENT_RETENTION, and a new
  NOTIFICATION_CLEANUP_INTERVAL drives the periodic SQL retention worker.
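The migrate-before-ready contract is small enough to sketch. A minimal
illustration, assuming goose v3 and the pgx stdlib driver; the real wiring in
internal/app/runtime.go and pkg/postgres.OpenPrimary differs in detail, and
migrateBeforeReady is a hypothetical name:

```go
// Sketch only: embeds the migration directory and refuses to continue when
// ping or migration fails, so the listener never becomes ready on a bad DB.
package app

import (
	"context"
	"database/sql"
	"embed"
	"fmt"

	_ "github.com/jackc/pgx/v5/stdlib" // database/sql driver for PostgreSQL
	"github.com/pressly/goose/v3"
)

//go:embed migrations/*.sql
var embeddedMigrations embed.FS

func migrateBeforeReady(ctx context.Context, db *sql.DB) error {
	if err := db.PingContext(ctx); err != nil {
		return fmt.Errorf("postgres ping: %w", err)
	}
	goose.SetBaseFS(embeddedMigrations)
	if err := goose.SetDialect("postgres"); err != nil {
		return fmt.Errorf("set dialect: %w", err)
	}
	if err := goose.Up(db, "migrations"); err != nil {
		return fmt.Errorf("apply migrations: %w", err)
	}
	return nil
}
```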
Decisions
1. One schema, externally-provisioned role
Decision. The notification schema and the matching
notificationservice role are created outside the migration sequence (in
tests, by
integration/internal/harness/postgres_container.go::EnsureRoleAndSchema;
in production, by an ops init script not in scope for this stage). The
embedded migration 00001_init.sql only contains DDL for tables and
indexes and assumes it runs as the schema owner with
search_path=notification.
Why. Mixing role creation, schema creation, and table DDL into one
script forces every consumer of the migration to run as a superuser. The
schema-per-service architectural rule
(ARCHITECTURE.md §Persistence Backends) lines up neatly with the
operational split: ops provisions roles and schemas, the service applies
schema-scoped migrations.
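For illustration, a hedged sketch of what an EnsureRoleAndSchema-style
provisioning step runs on an admin connection before the service's own
migrations. The statements, password handling, and idempotency guards are
assumptions (CREATE ROLE has no IF NOT EXISTS), not the harness's actual code:

```go
// Hypothetical provisioning sketch: runs as an admin role, so the service
// role itself never needs superuser. Production init scripts differ.
func ensureRoleAndSchema(ctx context.Context, admin *sql.DB) error {
	stmts := []string{
		`CREATE ROLE notificationservice LOGIN PASSWORD 'dev-only'`,
		`CREATE SCHEMA IF NOT EXISTS notification AUTHORIZATION notificationservice`,
		`ALTER ROLE notificationservice SET search_path = notification`,
	}
	for _, stmt := range stmts {
		if _, err := admin.ExecContext(ctx, stmt); err != nil {
			return fmt.Errorf("provision: %w", err)
		}
	}
	return nil
}
```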
2. Idempotency record IS the records row
Decision. The records table carries producer, idempotency_key,
request_fingerprint, and idempotency_expires_at columns and a
UNIQUE (producer, idempotency_key) constraint. Acceptance flows insert
the row directly; a duplicate request races on the UNIQUE constraint and
surfaces as acceptintent.ErrConflict. There is no separate idempotency
table.
Why. PG_PLAN.md §3 fixed this rule for every PG-backed service. With
the reservation living on the durable record, recovery is a single fact —
the row either exists or it does not — so no Redis-loss window can make a
duplicate sneak through. The records.accepted_at value doubles as the
IdempotencyRecord.CreatedAt returned to the service layer.
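A hedged sketch of the error mapping, assuming the pgx driver
(github.com/jackc/pgx/v5/pgconn) and an illustrative Record struct and column
list; SQLSTATE 23505 is unique_violation:

```go
// Sketch: the duplicate-request signal is the UNIQUE (producer,
// idempotency_key) violation itself; no pre-read, no separate table.
func insertRecord(ctx context.Context, db *sql.DB, r Record) error {
	_, err := db.ExecContext(ctx, `
		INSERT INTO records (id, producer, idempotency_key, request_fingerprint,
		                     idempotency_expires_at, accepted_at)
		VALUES ($1, $2, $3, $4, $5, $6)`,
		r.ID, r.Producer, r.IdempotencyKey, r.RequestFingerprint,
		r.IdempotencyExpiresAt.UTC(), r.AcceptedAt.UTC())
	var pgErr *pgconn.PgError
	if errors.As(err, &pgErr) && pgErr.Code == "23505" {
		return acceptintent.ErrConflict // duplicate raced us on the constraint
	}
	return err
}
```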
3. recipient_user_ids as JSONB
Decision. records.recipient_user_ids stores the normalized recipient
user-id list as a JSONB column. The codec round-trips a nil slice as []
to keep the column NOT NULL while letting the read path return a nil slice
when the audience is not user-targeted.
Why. The list is opaque to queries (we never element-filter on it).
JSONB lines up with the "everything outside primary fields is JSON"
pattern Mail Stage 4 already established; PostgreSQL will accept a future
GIN index on recipient_user_ids jsonb_path_ops if a recipient-filtered
operator UI ever lands. text[] would have forced a pgtype.Array[string]
boundary type and a different scan path with no functional benefit today.
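The codec rule is easy to state in code. A minimal sketch, assuming
encoding/json and a []string domain type (function names are illustrative):

```go
// Write side: nil becomes the JSON array [], so the NOT NULL column is
// always satisfied even when the audience is not user-targeted.
func encodeRecipientUserIDs(ids []string) ([]byte, error) {
	if ids == nil {
		return []byte("[]"), nil
	}
	return json.Marshal(ids)
}

// Read side: an empty array decodes back to a nil slice.
func decodeRecipientUserIDs(raw []byte) ([]string, error) {
	var ids []string
	if err := json.Unmarshal(raw, &ids); err != nil {
		return nil, err
	}
	if len(ids) == 0 {
		return nil, nil
	}
	return ids, nil
}
```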
4. Timestamps are uniformly timestamptz and always UTC at the boundary
Decision. Every time-valued column on every Stage 5 table uses
PostgreSQL's timestamptz. The domain model continues to use time.Time;
the adapter normalises every time.Time parameter to UTC at the binding
site (record.X.UTC() or the nullableTime helper that wraps a possibly
zero-valued time.Time), and re-wraps every scanned time.Time with
.UTC() (directly or via timeFromNullable for nullable columns) before
it leaves the adapter. The architecture-wide form of this rule lives in
ARCHITECTURE.md §Persistence Backends → Timestamp handling.
Why. PG_PLAN.md §5 originally specified _ms epoch-millisecond
columns. User Service Stage 3 and Mail Service Stage 4 already use
timestamptz for every table and the runtime contract tests expect
Go-level time.Time semantics throughout. Keeping the same shape across
services reduces adapter-layer complexity and avoids two parallel encoding
paths in the notificationstore. The deviation from the literal plan is
intentional and is documented here. The defensive .UTC() rule on both
sides eliminates the class of bug where the pgx driver returns scanned
values in time.Local, which silently breaks equality tests, JSON
formatting, and comparison against pointer fields.
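A sketch of the two boundary helpers named above; the real signatures in
codecs.go may differ:

```go
// nullableTime maps a possibly zero time.Time to a NULL-able bind
// parameter, normalised to UTC on the way in.
func nullableTime(t time.Time) sql.NullTime {
	if t.IsZero() {
		return sql.NullTime{}
	}
	return sql.NullTime{Time: t.UTC(), Valid: true}
}

// timeFromNullable re-anchors a scanned value to UTC before it leaves the
// adapter, so pgx's time.Local results never escape.
func timeFromNullable(nt sql.NullTime) time.Time {
	if !nt.Valid {
		return time.Time{}
	}
	return nt.Time.UTC()
}
```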
5. Scheduler claim is non-locking; transitions use optimistic concurrency on updated_at
Decision. ListDueRoutes(ctx, now, limit) is a non-locking
SELECT notification_id, route_id FROM routes WHERE next_attempt_at IS NOT NULL AND next_attempt_at <= $1 ORDER BY next_attempt_at ASC LIMIT $2.
The publisher then takes a Redis lease (route_leases:*), reads the
route, emits the outbound stream entry, and calls one of
CompleteRoutePublished / CompleteRouteFailed /
CompleteRouteDeadLetter. Each Complete* transaction issues
UPDATE routes SET ... WHERE notification_id = $a AND route_id = $b AND updated_at = $expectedUpdatedAt; a zero RowsAffected count surfaces as
routestate.ErrConflict, which the publisher treats as a no-op (some other
replica progressed the row since the worker loaded it).
Why. A FOR UPDATE held across the publisher's whole publish window
would serialise concurrent publishers and block the outbound stream emit.
Per-row optimistic concurrency on updated_at keeps the lock duration
inside the SQL transaction itself; the lease bounds duplicates atop that.
The explicit next_attempt_at column (set to NULL for terminal states)
keeps the partial index routes_due_idx narrow and avoids the "schedule
out of sync with row" failure mode of the previous Redis ZSET +
JSON-payload pair.
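A hedged sketch of one transition; the status column and value are
illustrative, but the updated_at guard and the RowsAffected check follow the
decision above:

```go
func completeRoutePublished(ctx context.Context, db *sql.DB,
	notificationID, routeID string, expectedUpdatedAt, now time.Time) error {
	res, err := db.ExecContext(ctx, `
		UPDATE routes
		   SET status = 'published', next_attempt_at = NULL, updated_at = $1
		 WHERE notification_id = $2 AND route_id = $3 AND updated_at = $4`,
		now.UTC(), notificationID, routeID, expectedUpdatedAt.UTC())
	if err != nil {
		return err
	}
	affected, err := res.RowsAffected()
	if err != nil {
		return err
	}
	if affected == 0 {
		return routestate.ErrConflict // another replica progressed the row
	}
	return nil
}
```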
6. Outbound XADD precedes SQL completion (at-least-once across the dual-system boundary)
Decision. The publisher emits the outbound stream entry through
*redis.Client.XAdd before the route's SQL state transition is
committed. If the XADD succeeds and the SQL update later fails, the next
replica retries — same notification gets a second outbound entry; the
consumer side (Gateway, Mail) deduplicates on the entry id. If the XADD
fails, recordFailure records a publication failure with classification
gateway_stream_publish_failed or mail_stream_publish_failed and
schedules a retry.
Why. PG_PLAN.md §5 explicitly endorses this ordering by describing the lease as sitting "atop the SQL claim" rather than replacing it. The lease bounds duplicate emission to one replica per route per lease window; consumer-side dedupe handles the rare cross-window case. A transactional outbox would eliminate the duplicates entirely but is out of Stage 5 scope; revisit if duplicate traffic ever becomes an operational concern.
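The ordering compresses to a few lines. A sketch assuming go-redis v9
(github.com/redis/go-redis/v9); publishThenComplete and completePublished are
hypothetical names standing in for the publisher worker and the decision-5
transition:

```go
func publishThenComplete(ctx context.Context, rdb *redis.Client,
	payload map[string]any, completePublished func(context.Context) error) error {
	// Step 1: emit the outbound entry first. Consumers dedupe on the entry
	// id, so a crash after this point produces a duplicate, never a loss.
	if err := rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: "gateway:client-events",
		Values: payload,
	}).Err(); err != nil {
		// recordFailure path: classified as gateway_stream_publish_failed
		// and the route is rescheduled.
		return err
	}
	// Step 2: only now commit the SQL transition. If this fails, the route
	// stays due and the next replica re-emits.
	return completePublished(ctx)
}
```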
7. Lease stays on Redis as a hint
Decision. The lease key notification:route_leases:<notificationID>:<routeID>
keeps its existing SETNX/Lua-release semantics, lifted into a dedicated
redisstate.LeaseStore. The composite
internal/adapters/postgres/routepublisher.Store wires the SQL state
store and the Redis lease store behind the existing publisher-worker
interfaces (PushRouteStateStore, EmailRouteStateStore).
Why. PG_PLAN.md §5 retains the lease as a "short-lived, per-process
exclusivity hint atop the SQL claim". Without the lease, two replicas
selecting overlapping due batches would each XADD before either commits
the SQL transition — duplicating outbound traffic during contention. The
lease bounds emission rate to one-per-route-per-lease-TTL even when scans
overlap. Keeping the abstraction inside LeaseStore (separate from the
SQL store) keeps the architectural split visible.
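The SETNX/Lua-release pair is the classic token-guarded lease. A minimal
sketch assuming go-redis v9; the key comes from Keyspace.RouteLease, and the
function names here are illustrative:

```go
// The release script deletes the key only when it still holds our token, so
// an expired lease re-acquired by another process is never released by us.
var releaseScript = redis.NewScript(`
if redis.call("GET", KEYS[1]) == ARGV[1] then
	return redis.call("DEL", KEYS[1])
end
return 0`)

func acquireLease(ctx context.Context, rdb *redis.Client, key, token string, ttl time.Duration) (bool, error) {
	// SETNX: acquisition succeeds only when no other holder exists.
	return rdb.SetNX(ctx, key, token, ttl).Result()
}

func releaseLease(ctx context.Context, rdb *redis.Client, key, token string) error {
	return releaseScript.Run(ctx, rdb, []string{key}, token).Err()
}
```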
8. Periodic SQL retention replaces Redis EXPIRE
Decision. A new worker.SQLRetentionWorker runs the two DELETE
statements driven by config:
- DELETE FROM records WHERE accepted_at < now() - $record_retention
  cascades to routes and dead_letters via ON DELETE CASCADE.
- DELETE FROM malformed_intents WHERE recorded_at < now() - $malformed_intent_retention
  is a standalone retention pass.
Three new env vars (NOTIFICATION_RECORD_RETENTION,
NOTIFICATION_MALFORMED_INTENT_RETENTION,
NOTIFICATION_CLEANUP_INTERVAL) drive the worker.
NOTIFICATION_IDEMPOTENCY_TTL survives unchanged: the service layer
materialises it on each row as idempotency_expires_at.
Why. PostgreSQL maintains its own indexes; the previous per-key Redis
EXPIRE TTL semantics translate to a periodic batch DELETE. The two-knob
shape mirrors Mail Stage 4 (MAIL_DELIVERY_RETENTION +
MAIL_MALFORMED_COMMAND_RETENTION). The legacy
NOTIFICATION_RECORD_TTL / NOTIFICATION_DEAD_LETTER_TTL env vars are
intentionally retired without a backward-compat shim — keeping the names
would mislead operators reading the runbook because the eviction
mechanism genuinely changed.
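A sketch of the worker loop under those knobs; make_interval converts the Go
durations at the SQL boundary, and the real worker in
internal/worker/sqlretention.go adds structured logging and metrics:

```go
func runRetention(ctx context.Context, db *sql.DB,
	recordRetention, malformedRetention, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// ON DELETE CASCADE removes the matching routes and
			// dead_letters rows alongside each expired record.
			if _, err := db.ExecContext(ctx,
				`DELETE FROM records WHERE accepted_at < now() - make_interval(secs => $1)`,
				recordRetention.Seconds()); err != nil {
				log.Printf("record retention: %v", err)
			}
			if _, err := db.ExecContext(ctx,
				`DELETE FROM malformed_intents WHERE recorded_at < now() - make_interval(secs => $1)`,
				malformedRetention.Seconds()); err != nil {
				log.Printf("malformed intent retention: %v", err)
			}
		}
	}
}
```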
9. Shared Redis client with consumer-driven shutdown
Decision. internal/app/runtime.go constructs one
redisconn.NewMasterClient(cfg.Redis.Conn) (via the thin
redisadapter.NewClient wrapper) and passes it to the intent consumer,
the lease store, the stream offset store, and both publishers (for their
outbound XADDs). The runtime cleanup tolerates redis.ErrClosed so a
double-close from any consumer is benign.
Why. Each subsequent PG_PLAN stage (Lobby) ships the same pattern; sharing one client is the shape all stages should converge on. A dedicated client per consumer was an artefact of the Redis-only architecture; separate clients multiply TCP connections, ping targets, and OpenTelemetry instrumentation hooks for no functional benefit.
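The tolerance is one line of cleanup. A sketch assuming go-redis v9, with an
illustrative function name:

```go
// Any consumer may have closed the shared client first; redis.ErrClosed on
// a second Close is therefore treated as success, not failure.
func closeSharedRedis(client *redis.Client) error {
	if err := client.Close(); err != nil && !errors.Is(err, redis.ErrClosed) {
		return err
	}
	return nil
}
```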
10. Query layer is go-jet/jet/v2
Decision. All notificationstore packages build SQL through the
jet builder API (pgtable.<Table>.INSERT/SELECT/UPDATE/DELETE plus
the pg.AND/OR/SET/MIN/COUNT/... DSL). cmd/jetgen (invoked via
make jet) brings up a transient PostgreSQL container, applies the
embedded migrations, and runs
github.com/go-jet/jet/v2/generator/postgres.GenerateDB against the
provisioned schema; the generated table/model code lives under
internal/adapters/postgres/jet/notification/{model,table}/*.go and
is committed to the repo, so build consumers do not need Docker.
Statements are run through the database/sql API
(stmt.Sql() → db/tx.Exec/Query/QueryRow); manual rowScanner
helpers preserve the codecs.go boundary translations and domain-type
mapping.
Why. Aligns with PG_PLAN.md §Library stack ("Query layer:
github.com/go-jet/jet/v2 (PostgreSQL dialect). Generated code lives
under each service internal/adapters/postgres/jet/, regenerated via
a make jet target and committed to the repo"). Constructs that the jet
builder does not cover natively (MIN(timestamptz) aggregates, the
optimistic-concurrency WHERE updated_at = $expected, JSONB params)
are expressed through the dialect DSL helpers (pg.MIN(...),
pg.TimestampzT(...), and direct []byte/string params for JSONB
columns).
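For flavour, a hedged sketch of the ListDueRoutes statement through the
builder, using the pg/pgtable aliases from above; the generated column names
are assumptions derived from the schema described in this document:

```go
func listDueRoutesSQL(now time.Time, limit int64) (string, []interface{}) {
	stmt := pgtable.Routes.
		SELECT(pgtable.Routes.NotificationID, pgtable.Routes.RouteID).
		WHERE(pg.AND(
			pgtable.Routes.NextAttemptAt.IS_NOT_NULL(),
			pgtable.Routes.NextAttemptAt.LT_EQ(pg.TimestampzT(now.UTC())),
		)).
		ORDER_BY(pgtable.Routes.NextAttemptAt.ASC()).
		LIMIT(limit)
	// Sql() yields the query text and bind args for db.QueryContext.
	return stmt.Sql()
}
```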
Cross-References
- PG_PLAN.md §5 (Stage 5 — Notification Service migration).
- ARCHITECTURE.md §Persistence Backends.
- internal/adapters/postgres/migrations/00001_init.sql and
  internal/adapters/postgres/migrations/migrations.go.
- internal/adapters/postgres/notificationstore/{store,records,routes,
  acceptance,scheduler,dead_letters,malformed_intents,retention,codecs,
  helpers}.go plus the testcontainers-backed unit suite under
  notificationstore/{harness,store}_test.go.
- internal/adapters/postgres/jet/notification/{model,table}/*.go (committed
  generated code) plus cmd/jetgen/main.go and the make jet Makefile target
  that regenerate it.
- internal/adapters/postgres/routepublisher/store.go (composite PG state +
  Redis lease behind the publisher contracts).
- internal/service/routestate/types.go (storage-agnostic value types).
- internal/config/{config,env}.go (PostgresConfig plus the
  redisconn.Config-shaped RedisConfig envelope).
- internal/app/runtime.go (shared Redis client + PG pool open + migration +
  notificationstore wiring + retention worker startup).
- internal/worker/sqlretention.go (periodic SQL retention worker).
- internal/adapters/redisstate/{keyspace,codecs,errors,lease_store,
  stream_offset_store}.go (surviving slim Redis surface).
- integration/internal/harness/notificationservice.go (per-suite Postgres
  container + notification/notificationservice provisioning).