feat: use postgres
# PostgreSQL Migration

PG_PLAN.md §5 migrated `galaxy/notification` from a Redis-only durable store
to the steady-state split codified in `ARCHITECTURE.md §Persistence
Backends`: PostgreSQL is the source of truth for table-shaped notification
state, and Redis keeps only the inbound `notification:intents` stream, the
two outbound streams (`gateway:client-events`, `mail:delivery_commands`),
the persisted consumer offset, and the short-lived per-route exclusivity
lease.

This document records the schema decisions and the non-obvious agreements
behind them. Use it together with the migration script
(`internal/adapters/postgres/migrations/00001_init.sql`) and the runtime
wiring (`internal/app/runtime.go`).

## Outcomes

- Schema `notification` (provisioned externally) holds the durable state:
  `records`, `routes`, `dead_letters`, `malformed_intents`.
- The runtime opens one PostgreSQL pool via `pkg/postgres.OpenPrimary`,
  applies embedded goose migrations strictly before any HTTP listener
  becomes ready, and exits non-zero when migration or ping fails.
- The runtime opens one shared `*redis.Client` via
  `pkg/redisconn.NewMasterClient` and passes it to the intent consumer, the
  publishers (outbound XADDs), the route lease store, and the persisted
  stream offset store.
- The Redis adapter package (`internal/adapters/redisstate/`) is reduced to
  the surviving `LeaseStore`, `StreamOffsetStore`, and a slim `Keyspace`
  exposing only `RouteLease(notificationID, routeID)`,
  `StreamOffset(stream)`, and `Intents()`. The Lua-backed atomic writer,
  the route-state mutation scripts, the
  records/routes/idempotency/dead-letters/malformed-intents keyspace, and
  the per-record TTL constants are gone.
- Configuration drops `NOTIFICATION_REDIS_USERNAME` /
  `NOTIFICATION_REDIS_TLS_ENABLED` / `NOTIFICATION_REDIS_ADDR` and
  introduces `NOTIFICATION_REDIS_MASTER_ADDR` /
  `NOTIFICATION_REDIS_REPLICA_ADDRS` plus `NOTIFICATION_POSTGRES_*`. The
  retention knobs `NOTIFICATION_RECORD_TTL` /
  `NOTIFICATION_DEAD_LETTER_TTL` are renamed to
  `NOTIFICATION_RECORD_RETENTION` /
  `NOTIFICATION_MALFORMED_INTENT_RETENTION`, and a new
  `NOTIFICATION_CLEANUP_INTERVAL` drives the periodic SQL retention
  worker.

## Decisions

### 1. One schema, externally-provisioned role

**Decision.** The `notification` schema and the matching
`notificationservice` role are created outside the migration sequence (in
tests, by
`integration/internal/harness/postgres_container.go::EnsureRoleAndSchema`;
in production, by an ops init script not in scope for this stage). The
embedded migration `00001_init.sql` contains only DDL for tables and
indexes and assumes it runs as the schema owner with
`search_path=notification`.

**Why.** Mixing role creation, schema creation, and table DDL into one
script forces every consumer of the migration to run as a superuser. The
schema-per-service architectural rule
(`ARCHITECTURE.md §Persistence Backends`) lines up neatly with the
operational split: ops provisions roles and schemas, the service applies
schema-scoped migrations.

### 2. Idempotency record IS the records row

**Decision.** The `records` table carries `producer`, `idempotency_key`,
`request_fingerprint`, and `idempotency_expires_at` columns and a
`UNIQUE (producer, idempotency_key)` constraint. Acceptance flows insert
the row directly; a duplicate request races on the UNIQUE constraint and
surfaces as `acceptintent.ErrConflict`. There is no separate idempotency
table.

**Why.** PG_PLAN.md §3 fixed this rule for every PG-backed service. With
the reservation living on the durable record, recovery is a single fact —
the row either exists or it does not — so no Redis-loss window can make a
duplicate sneak through. The `records.accepted_at` value doubles as the
`IdempotencyRecord.CreatedAt` returned to the service layer.

### 3. `recipient_user_ids` as JSONB

**Decision.** `records.recipient_user_ids` stores the normalized recipient
user-id list as a JSONB column. The codec round-trips a nil slice as `[]`
to keep the column NOT NULL while letting the read path return a nil slice
when the audience is not user-targeted.

**Why.** The list is opaque to queries (we never element-filter on it).
JSONB lines up with the "everything outside primary fields is JSON"
pattern Mail Stage 4 already established; PostgreSQL will accept a future
GIN index on `recipient_user_ids jsonb_path_ops` if a recipient-filtered
operator UI ever lands. `text[]` would have forced a `pgtype.Array[string]`
boundary type and a different scan path with no functional benefit today.

### 4. Timestamps are uniformly `timestamptz` and always UTC at the boundary

**Decision.** Every time-valued column on every Stage 5 table uses
PostgreSQL's `timestamptz`. The domain model continues to use `time.Time`;
the adapter normalises every `time.Time` parameter to UTC at the binding
site (`record.X.UTC()` or the `nullableTime` helper that wraps a possibly
zero-valued `time.Time`), and re-wraps every scanned `time.Time` with
`.UTC()` (directly or via `timeFromNullable` for nullable columns) before
it leaves the adapter. The architecture-wide form of this rule lives in
`ARCHITECTURE.md §Persistence Backends → Timestamp handling`.

**Why.** PG_PLAN.md §5 originally specified `_ms` epoch-millisecond
columns. User Service Stage 3 and Mail Service Stage 4 already use
`timestamptz` for every table, and the runtime contract tests expect
Go-level `time.Time` semantics throughout. Keeping the same shape across
services reduces adapter-layer complexity and avoids two parallel encoding
paths in the notificationstore. The deviation from the literal plan is
intentional and is documented here. The defensive `.UTC()` rule on both
sides eliminates the class of bug where the pgx driver returns scanned
values in `time.Local`, which silently breaks equality tests, JSON
formatting, and comparison against pointer fields.

### 5. Scheduler claim is non-locking; transitions use optimistic concurrency on `updated_at`

**Decision.** `ListDueRoutes(ctx, now, limit)` is a non-locking
`SELECT notification_id, route_id FROM routes WHERE next_attempt_at IS
NOT NULL AND next_attempt_at <= $1 ORDER BY next_attempt_at ASC LIMIT $2`.
The publisher then takes a Redis lease (`route_leases:*`), reads the
route, emits the outbound stream entry, and calls one of
`CompleteRoutePublished` / `CompleteRouteFailed` /
`CompleteRouteDeadLetter`. Each `Complete*` transaction issues
`UPDATE routes SET ... WHERE notification_id = $a AND route_id = $b AND
updated_at = $expectedUpdatedAt`; a zero `RowsAffected` count surfaces as
`routestate.ErrConflict`, which the publisher treats as a no-op (some other
replica progressed the row since the worker loaded it).

**Why.** A `FOR UPDATE` held across the publisher's whole publish window
would serialise concurrent publishers and block the outbound stream emit.
Per-row optimistic concurrency on `updated_at` keeps the lock duration
inside the SQL transaction itself; the lease bounds duplicates atop that.
The explicit `next_attempt_at` column (set to `NULL` for terminal states)
keeps the partial index `routes_due_idx` narrow and avoids the "schedule
out of sync with row" failure mode of the previous Redis ZSET +
JSON-payload pair.

### 6. Outbound XADD precedes SQL completion (at-least-once across the dual-system boundary)

**Decision.** The publisher emits the outbound stream entry through
`*redis.Client.XAdd` *before* the route's SQL state transition is
committed. If the XADD succeeds and the SQL update later fails, the next
replica retries and the same notification gets a second outbound entry;
the consumer side (Gateway, Mail) deduplicates on the entry id. If the
XADD fails, `recordFailure` records a publication failure with
classification `gateway_stream_publish_failed` or
`mail_stream_publish_failed` and schedules a retry.

**Why.** PG_PLAN.md §5 explicitly endorses this ordering by saying the
lease is "atop the SQL claim" rather than replacing it. The lease bounds
duplicate emission to one replica per route per lease window; the
consumer-side dedupe handles the rare cross-window case. A transactional
outbox would solve the duplicate but is out of Stage 5 scope; revisit if
duplicate traffic ever becomes an operational concern.

### 7. Lease stays on Redis as a hint

**Decision.** The lease key `notification:route_leases:<notificationID>:<routeID>`
keeps its existing SETNX/Lua-release semantics, lifted into a dedicated
`redisstate.LeaseStore`. The composite
`internal/adapters/postgres/routepublisher.Store` wires the SQL state
store and the Redis lease store behind the existing publisher-worker
interfaces (`PushRouteStateStore`, `EmailRouteStateStore`).

**Why.** PG_PLAN.md §5 retains the lease as a "short-lived, per-process
exclusivity hint atop the SQL claim". Without the lease, two replicas
selecting overlapping due batches would each XADD before either commits
the SQL transition, duplicating outbound traffic during contention. The
lease bounds emission rate to one-per-route-per-lease-TTL even when scans
overlap. Keeping the abstraction inside `LeaseStore` (separate from the
SQL store) keeps the architectural split visible.

### 8. Periodic SQL retention replaces Redis EXPIRE

**Decision.** A new `worker.SQLRetentionWorker` runs the two DELETE
statements driven by config:

- `DELETE FROM records WHERE accepted_at < now() - $record_retention`
  cascades to `routes` and `dead_letters` via `ON DELETE CASCADE`.
- `DELETE FROM malformed_intents WHERE recorded_at < now() -
  $malformed_intent_retention` is a standalone retention pass.

Three new env vars (`NOTIFICATION_RECORD_RETENTION`,
`NOTIFICATION_MALFORMED_INTENT_RETENTION`,
`NOTIFICATION_CLEANUP_INTERVAL`) drive the worker.
`NOTIFICATION_IDEMPOTENCY_TTL` survives unchanged: the service layer
materialises it on each row as `idempotency_expires_at`.

**Why.** PostgreSQL maintains its own indexes; the previous per-key Redis
EXPIRE TTL semantics translate to a periodic batch DELETE. The two-knob
shape mirrors Mail Stage 4 (`MAIL_DELIVERY_RETENTION` +
`MAIL_MALFORMED_COMMAND_RETENTION`). The legacy
`NOTIFICATION_RECORD_TTL` / `NOTIFICATION_DEAD_LETTER_TTL` env vars are
intentionally retired without a backward-compat shim — keeping the names
would mislead operators reading the runbook because the eviction
mechanism genuinely changed.

### 9. Shared Redis client with consumer-driven shutdown

**Decision.** `internal/app/runtime.go` constructs one
`redisconn.NewMasterClient(cfg.Redis.Conn)` (via the thin
`redisadapter.NewClient` wrapper) and passes it to the intent consumer,
the lease store, the stream offset store, and both publishers (for their
outbound XADDs). The runtime cleanup tolerates `redis.ErrClosed` so a
double-close from any consumer is benign.

**Why.** Each subsequent PG_PLAN stage (Lobby) ships a similar pattern;
sharing one client is the shape we want all stages to converge on. A
dedicated client per consumer is an artefact of the Redis-only
architecture; sharing one client means fewer TCP connections, ping
targets, and OpenTelemetry instrumentation hooks, with no functional
downside.

### 10. Query layer is `go-jet/jet/v2`

**Decision.** All `notificationstore` packages build SQL through the
jet builder API (`pgtable.<Table>.INSERT/SELECT/UPDATE/DELETE` plus
the `pg.AND/OR/SET/MIN/COUNT/...` DSL). `cmd/jetgen` (invoked via
`make jet`) brings up a transient PostgreSQL container, applies the
embedded migrations, and runs
`github.com/go-jet/jet/v2/generator/postgres.GenerateDB` against the
provisioned schema; the generated table/model code lives under
`internal/adapters/postgres/jet/notification/{model,table}/*.go` and
is committed to the repo, so build consumers do not need Docker.
Statements are run through the `database/sql` API
(`stmt.Sql() → db/tx.Exec/Query/QueryRow`); manual `rowScanner`
helpers preserve the codecs.go boundary translations and domain-type
mapping.

**Why.** Aligns with `PG_PLAN.md` §Library stack ("Query layer:
`github.com/go-jet/jet/v2` (PostgreSQL dialect). Generated code lives
under each service `internal/adapters/postgres/jet/`, regenerated via
a `make jet` target and committed to the repo"). Constructs that the
jet builder does not cover natively (`MIN(timestamptz)` aggregates,
optimistic-concurrency `WHERE updated_at = $expected`, JSONB params)
are expressed through the dialect's DSL helpers (`pg.MIN(...)`,
`pg.TimestampzT(...)`, direct `[]byte`/string params for JSONB
columns).

## Cross-References

- `PG_PLAN.md §5` (Stage 5 — Notification Service migration).
- `ARCHITECTURE.md §Persistence Backends`.
- `internal/adapters/postgres/migrations/00001_init.sql` and
  `internal/adapters/postgres/migrations/migrations.go`.
- `internal/adapters/postgres/notificationstore/{store,records,routes,
  acceptance,scheduler,dead_letters,malformed_intents,retention,codecs,
  helpers}.go` plus the testcontainers-backed unit suite under
  `notificationstore/{harness,store}_test.go`.
- `internal/adapters/postgres/jet/notification/{model,table}/*.go`
  (committed generated code) plus `cmd/jetgen/main.go` and the
  `make jet` Makefile target that regenerate it.
- `internal/adapters/postgres/routepublisher/store.go` (composite
  PG state + Redis lease behind the publisher contracts).
- `internal/service/routestate/types.go` (storage-agnostic value types).
- `internal/config/{config,env}.go` (`PostgresConfig` plus the
  `redisconn.Config`-shaped `RedisConfig` envelope).
- `internal/app/runtime.go` (shared Redis client + PG pool open + migration
  + notificationstore wiring + retention worker startup).
- `internal/worker/sqlretention.go` (periodic SQL retention worker).
- `internal/adapters/redisstate/{keyspace,codecs,errors,lease_store,
  stream_offset_store}.go` (surviving slim Redis surface).
- `integration/internal/harness/notificationservice.go`
  (per-suite Postgres container + `notification`/`notificationservice`
  provisioning).