# PostgreSQL Migration

Stage 5 of PG_PLAN.md migrated `galaxy/notification` from a Redis-only
durable store to the steady-state split codified in `ARCHITECTURE.md
§Persistence Backends`: PostgreSQL is the source of truth for table-shaped
notification state, and Redis keeps only the inbound `notification:intents`
stream, the two outbound streams (`gateway:client-events`,
`mail:delivery_commands`), the persisted consumer offset, and the
short-lived per-route exclusivity lease.

This document records the schema decisions and the non-obvious agreements
behind them. Use it together with the migration script
(`internal/adapters/postgres/migrations/00001_init.sql`) and the runtime
wiring (`internal/app/runtime.go`).
## Outcomes
- Schema `notification` (provisioned externally) holds the durable state:
  `records`, `routes`, `dead_letters`, `malformed_intents`.
- The runtime opens one PostgreSQL pool via `pkg/postgres.OpenPrimary`,
  applies embedded goose migrations strictly before any HTTP listener
  becomes ready, and exits non-zero when migration or ping fails.
- The runtime opens one shared `*redis.Client` via
  `pkg/redisconn.NewMasterClient` and passes it to the intent consumer, the
  publishers (outbound XADDs), the route lease store, and the persisted
  stream offset store.
- The Redis adapter package (`internal/adapters/redisstate/`) is reduced to
  the surviving `LeaseStore`, `StreamOffsetStore`, and a slim `Keyspace`
  exposing only `RouteLease(notificationID, routeID)`,
  `StreamOffset(stream)`, and `Intents()`. The Lua-backed atomic writer,
  the route-state mutation scripts, the
  records/routes/idempotency/dead-letters/malformed-intents keyspace, and
  the per-record TTL constants are gone.
- Configuration drops `NOTIFICATION_REDIS_USERNAME` /
  `NOTIFICATION_REDIS_TLS_ENABLED` / `NOTIFICATION_REDIS_ADDR` and
  introduces `NOTIFICATION_REDIS_MASTER_ADDR` /
  `NOTIFICATION_REDIS_REPLICA_ADDRS` plus `NOTIFICATION_POSTGRES_*`. The
  retention knobs `NOTIFICATION_RECORD_TTL` /
  `NOTIFICATION_DEAD_LETTER_TTL` are renamed to
  `NOTIFICATION_RECORD_RETENTION` /
  `NOTIFICATION_MALFORMED_INTENT_RETENTION`, and a new
  `NOTIFICATION_CLEANUP_INTERVAL` drives the periodic SQL retention
  worker.
## Decisions
### 1. One schema, externally-provisioned role

**Decision.** The `notification` schema and the matching
`notificationservice` role are created outside the migration sequence (in
tests, by
`integration/internal/harness/postgres_container.go::EnsureRoleAndSchema`;
in production, by an ops init script not in scope for this stage). The
embedded migration `00001_init.sql` contains only DDL for tables and
indexes and assumes it runs as the schema owner with
`search_path=notification`.

**Why.** Mixing role creation, schema creation, and table DDL into one
script forces every consumer of the migration to run as a superuser. The
schema-per-service architectural rule
(`ARCHITECTURE.md §Persistence Backends`) lines up neatly with the
operational split: ops provisions roles and schemas, the service applies
schema-scoped migrations.
### 2. Idempotency record IS the records row

**Decision.** The `records` table carries `producer`, `idempotency_key`,
`request_fingerprint`, and `idempotency_expires_at` columns and a
`UNIQUE (producer, idempotency_key)` constraint. Acceptance flows insert
the row directly; a duplicate request races on the UNIQUE constraint and
surfaces as `acceptintent.ErrConflict`. There is no separate idempotency
table.

**Why.** PG_PLAN.md §3 fixed this rule for every PG-backed service. With
the reservation living on the durable record, recovery is a single fact —
the row either exists or it does not — so no Redis-loss window can make a
duplicate sneak through. The `records.accepted_at` value doubles as the
`IdempotencyRecord.CreatedAt` returned to the service layer.
### 3. `recipient_user_ids` as JSONB

**Decision.** `records.recipient_user_ids` stores the normalized recipient
user-id list as a JSONB column. The codec round-trips a nil slice as `[]`
to keep the column NOT NULL while letting the read path return a nil slice
when the audience is not user-targeted.

**Why.** The list is opaque to queries (we never element-filter on it).
JSONB lines up with the "everything outside primary fields is JSON"
pattern Mail Stage 4 already established; PostgreSQL will accept a future
GIN index on `recipient_user_ids jsonb_path_ops` if a recipient-filtered
operator UI ever lands. `text[]` would have forced a `pgtype.Array[string]`
boundary type and a different scan path with no functional benefit today.
### 4. Timestamps are uniformly `timestamptz` and always UTC at the boundary

**Decision.** Every time-valued column on every Stage 5 table uses
PostgreSQL's `timestamptz`. The domain model continues to use `time.Time`;
the adapter normalises every `time.Time` parameter to UTC at the binding
site (`record.X.UTC()` or the `nullableTime` helper that wraps a possibly
zero-valued `time.Time`), and re-wraps every scanned `time.Time` with
`.UTC()` (directly or via `timeFromNullable` for nullable columns) before
it leaves the adapter. The architecture-wide form of this rule lives in
`ARCHITECTURE.md §Persistence Backends → Timestamp handling`.

**Why.** PG_PLAN.md §5 originally specified `_ms` epoch-millisecond
columns. User Service Stage 3 and Mail Service Stage 4 already use
`timestamptz` for every table, and the runtime contract tests expect
Go-level `time.Time` semantics throughout. Keeping the same shape across
services reduces adapter-layer complexity and avoids two parallel encoding
paths in the notificationstore. The deviation from the literal plan is
intentional and is documented here. The defensive `.UTC()` rule on both
sides eliminates the class of bug where the pgx driver returns scanned
values in `time.Local`, which silently breaks equality tests, JSON
formatting, and comparison against pointer fields.
### 5. Scheduler claim is non-locking; transitions use optimistic concurrency on `updated_at`

**Decision.** `ListDueRoutes(ctx, now, limit)` is a non-locking
`SELECT notification_id, route_id FROM routes WHERE next_attempt_at IS
NOT NULL AND next_attempt_at <= $1 ORDER BY next_attempt_at ASC LIMIT $2`.
The publisher then takes a Redis lease (`route_leases:*`), reads the
route, emits the outbound stream entry, and calls one of
`CompleteRoutePublished` / `CompleteRouteFailed` /
`CompleteRouteDeadLetter`. Each `Complete*` transaction issues
`UPDATE routes SET ... WHERE notification_id = $a AND route_id = $b AND
updated_at = $expectedUpdatedAt`; a zero `RowsAffected` count surfaces as
`routestate.ErrConflict`, which the publisher treats as a no-op (some other
replica progressed the row since the worker loaded it).

**Why.** A `FOR UPDATE` held across the publisher's whole publish window
would serialise concurrent publishers and block the outbound stream emit.
Per-row optimistic concurrency on `updated_at` keeps the lock duration
inside the SQL transaction itself; the lease bounds duplicates atop that.
The explicit `next_attempt_at` column (set to `NULL` for terminal states)
keeps the partial index `routes_due_idx` narrow and avoids the "schedule
out of sync with row" failure mode of the previous Redis ZSET +
JSON-payload pair.
### 6. Outbound XADD precedes SQL completion (at-least-once across the dual-system boundary)

**Decision.** The publisher emits the outbound stream entry through
`*redis.Client.XAdd` *before* the route's SQL state transition is
committed. If the XADD succeeds and the SQL update later fails, the next
replica retries — the same notification gets a second outbound entry, and
the consumer side (Gateway, Mail) deduplicates on the entry id. If the
XADD fails, `recordFailure` records a publication failure with
classification `gateway_stream_publish_failed` or
`mail_stream_publish_failed` and schedules a retry.

**Why.** PG_PLAN.md §5 explicitly endorses this ordering by saying the
lease is "atop the SQL claim" rather than replacing it. The lease bounds
duplicate emission to one replica per route per lease window; the
consumer-side dedupe handles the rare cross-window case. A transactional
outbox would eliminate the duplicates but is out of Stage 5 scope; revisit
if duplicate traffic ever becomes an operational concern.
### 7. Lease stays on Redis as a hint

**Decision.** The lease key `notification:route_leases:<notificationID>:<routeID>`
keeps its existing SETNX/Lua-release semantics, lifted into a dedicated
`redisstate.LeaseStore`. The composite
`internal/adapters/postgres/routepublisher.Store` wires the SQL state
store and the Redis lease store behind the existing publisher-worker
interfaces (`PushRouteStateStore`, `EmailRouteStateStore`).

**Why.** PG_PLAN.md §5 retains the lease as a "short-lived, per-process
exclusivity hint atop the SQL claim". Without the lease, two replicas
selecting overlapping due batches would each XADD before either commits
the SQL transition — duplicating outbound traffic during contention. The
lease bounds emission rate to one-per-route-per-lease-TTL even when scans
overlap. Keeping the abstraction inside `LeaseStore` (separate from the
SQL store) keeps the architectural split visible.
### 8. Periodic SQL retention replaces Redis EXPIRE

**Decision.** A new `worker.SQLRetentionWorker` periodically runs two
DELETE statements whose retention windows come from config:

- `DELETE FROM records WHERE accepted_at < now() - $record_retention`
  cascades to `routes` and `dead_letters` via `ON DELETE CASCADE`.
- `DELETE FROM malformed_intents WHERE recorded_at < now() - $malformed_intent_retention`
  is a standalone retention pass.

Three new env vars (`NOTIFICATION_RECORD_RETENTION`,
`NOTIFICATION_MALFORMED_INTENT_RETENTION`,
`NOTIFICATION_CLEANUP_INTERVAL`) drive the worker.
`NOTIFICATION_IDEMPOTENCY_TTL` survives unchanged: the service layer
materialises it on each row as `idempotency_expires_at`.

**Why.** PostgreSQL maintains its own indexes; the previous per-key Redis
EXPIRE semantics translate to a periodic batch DELETE. The two-knob
shape mirrors Mail Stage 4 (`MAIL_DELIVERY_RETENTION` +
`MAIL_MALFORMED_COMMAND_RETENTION`). The legacy
`NOTIFICATION_RECORD_TTL` / `NOTIFICATION_DEAD_LETTER_TTL` env vars are
intentionally retired without a backward-compat shim — keeping the names
would mislead operators reading the runbook because the eviction
mechanism genuinely changed.
### 9. Shared Redis client with consumer-driven shutdown

**Decision.** `internal/app/runtime.go` constructs one
`redisconn.NewMasterClient(cfg.Redis.Conn)` (via the thin
`redisadapter.NewClient` wrapper) and passes it to the intent consumer,
the lease store, the stream offset store, and both publishers (for their
outbound XADDs). The runtime cleanup tolerates `redis.ErrClosed` so a
double-close from any consumer is benign.

**Why.** Each subsequent PG_PLAN stage (Lobby) ships a similar pattern;
sharing one client is the shape we want all stages to converge on. A
dedicated client per consumer was an artefact of the Redis-only
architecture; a single shared client means fewer TCP connections, ping
points, and OpenTelemetry instrumentation hooks, with no functional cost.
### 10. Query layer is `go-jet/jet/v2`

**Decision.** All `notificationstore` packages build SQL through the
jet builder API (`pgtable.<Table>.INSERT/SELECT/UPDATE/DELETE` plus
the `pg.AND/OR/SET/MIN/COUNT/...` DSL). `cmd/jetgen` (invoked via
`make jet`) brings up a transient PostgreSQL container, applies the
embedded migrations, and runs
`github.com/go-jet/jet/v2/generator/postgres.GenerateDB` against the
provisioned schema; the generated table/model code lives under
`internal/adapters/postgres/jet/notification/{model,table}/*.go` and
is committed to the repo, so build consumers do not need Docker.
Statements are run through the `database/sql` API
(`stmt.Sql() → db/tx.Exec/Query/QueryRow`); manual `rowScanner`
helpers preserve the codecs.go boundary translations and domain-type
mapping.

**Why.** Aligns with `PG_PLAN.md` §Library stack ("Query layer:
`github.com/go-jet/jet/v2` (PostgreSQL dialect). Generated code lives
under each service `internal/adapters/postgres/jet/`, regenerated via
a `make jet` target and committed to the repo"). Constructs that the
jet builder does not cover out of the box (`MIN(timestamptz)`
aggregates, optimistic-concurrency `WHERE updated_at = $expected`,
JSONB params) are expressed through the dialect helpers (`pg.MIN(...)`,
`pg.TimestampzT(...)`, direct `[]byte`/string params for JSONB
columns).
## Cross-References

- `PG_PLAN.md §5` (Stage 5 — Notification Service migration).
- `ARCHITECTURE.md §Persistence Backends`.
- `internal/adapters/postgres/migrations/00001_init.sql` and
  `internal/adapters/postgres/migrations/migrations.go`.
- `internal/adapters/postgres/notificationstore/{store,records,routes,
  acceptance,scheduler,dead_letters,malformed_intents,retention,codecs,
  helpers}.go` plus the testcontainers-backed unit suite under
  `notificationstore/{harness,store}_test.go`.
- `internal/adapters/postgres/jet/notification/{model,table}/*.go`
  (committed generated code) plus `cmd/jetgen/main.go` and the
  `make jet` Makefile target that regenerate it.
- `internal/adapters/postgres/routepublisher/store.go` (composite
  PG state + Redis lease behind the publisher contracts).
- `internal/service/routestate/types.go` (storage-agnostic value types).
- `internal/config/{config,env}.go` (`PostgresConfig` plus the
  `redisconn.Config`-shaped `RedisConfig` envelope).
- `internal/app/runtime.go` (shared Redis client + PG pool open + migration
  + notificationstore wiring + retention worker startup).
- `internal/worker/sqlretention.go` (periodic SQL retention worker).
- `internal/adapters/redisstate/{keyspace,codecs,errors,lease_store,
  stream_offset_store}.go` (surviving slim Redis surface).
- `integration/internal/harness/notificationservice.go`
  (per-suite Postgres container + `notification`/`notificationservice`
  provisioning).