PostgreSQL Migration
PG_PLAN.md §4 migrated galaxy/mail from a Redis-only durable store to the
steady-state split codified in ARCHITECTURE.md §Persistence Backends:
PostgreSQL is the source of truth for table-shaped business state, and Redis
keeps only the inbound mail:delivery_commands stream and its persisted
consumer offset.
This document records the schema decisions and the non-obvious agreements
behind them. Use it together with the migration script
(internal/adapters/postgres/migrations/00001_init.sql) and the runtime
wiring (internal/app/runtime.go).
Outcomes
- Schema mail (provisioned externally) holds the durable state: deliveries, delivery_recipients, attempts, dead_letters, delivery_payloads, malformed_commands.
- The runtime opens one PostgreSQL pool via pkg/postgres.OpenPrimary, applies embedded goose migrations strictly before any HTTP listener becomes ready, and exits non-zero when migration or ping fails.
- The runtime opens one shared *redis.Client via pkg/redisconn.NewMasterClient and passes it to the command consumer and the stream offset store; both stores no longer hold their own connection topology fields.
- The Redis adapter package (internal/adapters/redisstate/) is reduced to the surviving StreamOffsetStore plus a slim Keyspace exposing only StreamOffset(stream) and DeliveryCommands(). The Lua-backed atomic writer, the secondary index keys, the recipient/template/status indexes, the idempotency keyspace, and the per-record TTL constants are gone.
- Configuration drops MAIL_REDIS_USERNAME, MAIL_REDIS_TLS_ENABLED, MAIL_REDIS_ATTEMPT_SCHEDULE_KEY, MAIL_REDIS_DEAD_LETTER_PREFIX, MAIL_DELIVERY_TTL, and MAIL_ATTEMPT_TTL. MAIL_REDIS_ADDR becomes MAIL_REDIS_MASTER_ADDR plus an optional MAIL_REDIS_REPLICA_ADDRS. PostgreSQL-specific knobs live under MAIL_POSTGRES_*. New retention knobs (MAIL_DELIVERY_RETENTION, MAIL_MALFORMED_COMMAND_RETENTION, MAIL_CLEANUP_INTERVAL) drive a periodic SQL retention worker.
Decisions
1. One schema, externally-provisioned role
Decision. The mail schema and the matching mailservice role are
created outside the migration sequence (in tests, by
integration/internal/harness/postgres_container.go::EnsureRoleAndSchema;
in production, by an ops init script not in scope for this stage). The
embedded migration 00001_init.sql only contains DDL for tables and
indexes and assumes it runs as the schema owner with search_path=mail.
Why. Mixing role creation, schema creation, and table DDL into one
script forces every consumer of the migration to run as a superuser. The
schema-per-service architectural rule
(ARCHITECTURE.md §Persistence Backends) lines up neatly with the
operational split: ops provisions roles and schemas, the service applies
schema-scoped migrations.
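The provisioning split can be sketched as the few privileged statements that ops (or the test harness) would run once before the service starts, after which the service's own migrations run as the role with search_path pinned to the schema. provisionStatements and quoteIdent are hypothetical helpers, not the code of EnsureRoleAndSchema. Password handling and an idempotency guard around CREATE ROLE are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent double-quotes a PostgreSQL identifier and doubles any embedded
// double quote, which is how PostgreSQL escapes quoted identifiers.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// provisionStatements returns the run-once, privileged DDL. The migration
// file never contains these statements, so it can run as the schema owner.
func provisionStatements(role, schema string) []string {
	r, s := quoteIdent(role), quoteIdent(schema)
	return []string{
		fmt.Sprintf("CREATE ROLE %s LOGIN", r),
		fmt.Sprintf("CREATE SCHEMA IF NOT EXISTS %s AUTHORIZATION %s", s, r),
		fmt.Sprintf("ALTER ROLE %s SET search_path = %s", r, s),
	}
}

func main() {
	for _, stmt := range provisionStatements("mailservice", "mail") {
		fmt.Println(stmt + ";")
	}
}
```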
2. Idempotency record IS the deliveries row
Decision. The deliveries table carries source,
idempotency_key, request_fingerprint, and idempotency_expires_at
columns and a UNIQUE (source, idempotency_key) constraint. Acceptance
flows insert the row directly; a duplicate request races on the UNIQUE
constraint and surfaces as acceptauthdelivery.ErrConflict /
acceptgenericdelivery.ErrConflict. There is no separate idempotency
table.
Why. PG_PLAN.md §3 fixed this rule for every PG-backed service. With
the reservation living on the durable record, recovery is a single fact
("the row either exists or it does not"); no Redis-loss window can make a
duplicate sneak through. Resend deliveries store an empty
request_fingerprint and a synthetic far-future idempotency_expires_at;
the read helper treats those rows as non-idempotent so future operator
queries cannot mistake a clone for a hit.
3. Recipients live in a normalised side table
Decision. A delivery_recipients(delivery_id, kind, position, email)
table stores envelope addresses with a kind CHECK constraint
('to'|'cc'|'bcc'|'reply_to') and an email index that excludes
reply_to. The deliveries row does not embed envelope JSON.
Why. PG_PLAN.md §4 prescribed "INDEX on … recipient as needed". A
normalised table lets future recipient-filtered listings slot in without
schema work and lets the existing operator listing implement the
recipient filter as delivery_id IN (SELECT … FROM delivery_recipients WHERE … lower(email) = lower($1)). The Redis adapter previously
maintained one index key per recipient; the same observable behaviour
now comes for free from the PostgreSQL row layout plus a single index.
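One way to flatten an envelope into delivery_recipients rows, sketched with hypothetical Envelope and RecipientRow types. It assumes position is a zero-based index within each kind, which is enough to rebuild the envelope in its original order; the numbering used by mailstore may differ.

```go
package main

import "fmt"

// Envelope is an illustrative stand-in for the domain envelope.
type Envelope struct {
	To, Cc, Bcc, ReplyTo []string
}

// RecipientRow mirrors one delivery_recipients row (delivery_id omitted).
type RecipientRow struct {
	Kind     string // 'to' | 'cc' | 'bcc' | 'reply_to', enforced by the CHECK constraint
	Position int    // zero-based order within its kind (assumption)
	Email    string // the email index covers every kind except reply_to
}

// flattenEnvelope emits one row per address, preserving per-kind order.
func flattenEnvelope(e Envelope) []RecipientRow {
	var rows []RecipientRow
	add := func(kind string, addrs []string) {
		for i, addr := range addrs {
			rows = append(rows, RecipientRow{Kind: kind, Position: i, Email: addr})
		}
	}
	add("to", e.To)
	add("cc", e.Cc)
	add("bcc", e.Bcc)
	add("reply_to", e.ReplyTo)
	return rows
}

func main() {
	rows := flattenEnvelope(Envelope{
		To:      []string{"alice@example.com", "bob@example.com"},
		Bcc:     []string{"audit@example.com"},
		ReplyTo: []string{"support@example.com"},
	})
	for _, r := range rows {
		fmt.Printf("%s\t%d\t%s\n", r.Kind, r.Position, r.Email)
	}
}
```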
4. Timestamps are uniformly timestamptz and always UTC at the boundary
Decision. Every time-valued column on every Stage 4 table uses
PostgreSQL's timestamptz. The domain model continues to use
time.Time / *time.Time; the adapter normalises every time.Time
parameter to UTC at the binding site (record.X.UTC() or the
nullableTime helper that wraps *time.Time), and re-wraps every
scanned time.Time with .UTC() (directly or via timeFromNullable)
before it leaves the adapter. The architecture-wide form of this rule
lives in ARCHITECTURE.md §Persistence Backends → Timestamp handling.
Why. PG_PLAN.md §4 originally specified mixed encodings
(timestamptz on deliveries, bigint epoch-ms on attempts/dead_letters/
malformed_commands). User Service Stage 3 already uses timestamptz for
every table and the runtime contract tests expect Go-level time.Time
semantics throughout. Keeping the same shape across services reduces
adapter-layer complexity and avoids two parallel encoding paths in the
mailstore. The deviation from the literal plan is intentional and is
documented here. The defensive UTC rule on both sides eliminates the
class of bug where the pgx driver returns scanned values in
time.Local, which silently breaks equality tests, JSON formatting,
and comparison against pointer fields.
5. Attempt scheduler reads via FOR UPDATE SKIP LOCKED
Decision. The attempt scheduler uses two indexed queries:
- SELECT delivery_id FROM deliveries WHERE next_attempt_at IS NOT NULL AND next_attempt_at <= $now ORDER BY next_attempt_at ASC LIMIT $n surfaces due deliveries (partial index deliveries_due_idx).
- SELECT … FROM deliveries WHERE delivery_id = $id AND status IN ('queued','rendered') AND next_attempt_at IS NOT NULL AND next_attempt_at <= $now FOR UPDATE SKIP LOCKED runs inside the claim transaction.
The next_attempt_at column is maintained explicitly: acceptance and
attempt-commit transactions write it from the active scheduled attempt;
claim sets it to NULL (the row is now sending and stops being a
scheduling candidate); a recovery commit re-populates it for the next
attempt.
Why. FOR UPDATE SKIP LOCKED lets multiple scheduler instances run
concurrently without serialising work on a single sorted set. Maintaining
next_attempt_at in lockstep with the active attempt keeps the partial
index small and avoids reading attempt rows during the hot-path schedule
query. The previous Redis ZSET sort key was implicit; the SQL column is
explicit, which removes a class of "the index is out of sync with the
record" bugs that Lua-coordinated mutations made possible.
6. Recovery uses the most-recent attempt by exact attempt_no
Decision. LoadWorkItem(deliveryID) reads the delivery row and then
the attempt row whose attempt_no = delivery.attempt_count. Concurrent
commits that update the count and insert a new attempt are tolerated:
the load lookup uses an exact key and never observes a partial state.
Why. A naive ORDER BY attempt_no DESC LIMIT 1 can race with a concurrent
attempt commit: if that commit lands after the delivery row has been read
but before the attempt query runs, the query returns attempt_no=count+1
while the delivery row already in hand still reports count. Keying the
read by the count deterministically returns the delivery's view of its
own active attempt even under concurrent worker progress.
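The race can be illustrated in memory. The delivery was read with attempt_count = 2, and another worker committed attempt 3 before the attempts read ran. Attempt, activeAttempt, and latestAttempt are hypothetical stand-ins for the exact-key and ORDER BY … LIMIT 1 query shapes.

```go
package main

import "fmt"

// Attempt is a minimal stand-in for an attempts row.
type Attempt struct {
	No int
}

// activeAttempt mirrors the exact-key lookup: attempt_no = attempt_count.
func activeAttempt(attempts []Attempt, attemptCount int) (Attempt, bool) {
	for _, a := range attempts {
		if a.No == attemptCount {
			return a, true
		}
	}
	return Attempt{}, false
}

// latestAttempt mirrors ORDER BY attempt_no DESC LIMIT 1.
func latestAttempt(attempts []Attempt) (Attempt, bool) {
	var best Attempt
	found := false
	for _, a := range attempts {
		if !found || a.No > best.No {
			best, found = a, true
		}
	}
	return best, found
}

func main() {
	deliveryAttemptCount := 2 // value on the delivery row already loaded
	attempts := []Attempt{{No: 1}, {No: 2}, {No: 3}} // attempt 3 committed in between

	exact, _ := activeAttempt(attempts, deliveryAttemptCount)
	naive, _ := latestAttempt(attempts)
	fmt.Println(exact.No, naive.No) // prints: 2 3
}
```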
7. Periodic SQL retention replaces Redis index cleanup
Decision. A new worker.SQLRetentionWorker runs two DELETE
statements driven by config:
- DELETE FROM deliveries WHERE created_at < now() - $delivery_retention cascades to attempts, dead_letters, delivery_payloads, and delivery_recipients via ON DELETE CASCADE.
- DELETE FROM malformed_commands WHERE recorded_at < now() - $malformed_retention is a standalone retention pass.
Three new env vars (MAIL_DELIVERY_RETENTION, MAIL_MALFORMED_COMMAND_RETENTION,
MAIL_CLEANUP_INTERVAL) drive the worker. MAIL_IDEMPOTENCY_TTL survives
unchanged: it controls the per-acceptance idempotency_expires_at column
the service layer materialises on each row.
Why. PostgreSQL maintains its own indexes; the previous
redisstate.IndexCleaner had nothing to do once secondary index keys
were gone. A per-table retention worker is the simplest model that keeps
the mail database from accumulating audit history forever, while leaving
the per-acceptance idempotency window controlled by its existing knob.
8. Shared Redis client with consumer-driven shutdown
Decision. internal/app/runtime.go constructs one
redisconn.NewMasterClient(cfg.Redis.Conn) and passes it to both the
stream offset store and the command consumer. The consumer's Shutdown
closes the shared client to break the in-flight blocking XREAD; the
runtime's cleanup function tolerates redis.ErrClosed so a double-close
is benign.
Why. Each subsequent PG_PLAN stage (Notification, Lobby) ships a similar pattern; sharing one client is the shape we want all stages to converge on. The dedicated client for the consumer was an artefact of the Redis-only architecture and multiplied TCP connections, ping points, and OpenTelemetry instrumentation hooks for no functional benefit.
9. Query layer is go-jet/jet/v2
Decision. All mailstore packages build SQL through the jet
builder API (pgtable.<Table>.INSERT/SELECT/UPDATE/DELETE plus the
pg.AND/OR/SET/IN/... DSL). cmd/jetgen (invoked via make jet)
brings up a transient PostgreSQL container, applies the embedded
migrations, and runs
github.com/go-jet/jet/v2/generator/postgres.GenerateDB against the
provisioned schema; the generated table/model code lives under
internal/adapters/postgres/jet/mail/{model,table}/*.go and is
committed to the repo, so build consumers do not need Docker.
Statements are run through the database/sql API
(stmt.Sql() → db/tx.Exec/Query/QueryRow); manual scanners preserve
the codecs.go boundary translations and domain-type mapping.
Why. Aligns with PG_PLAN.md §Library stack ("Query layer:
github.com/go-jet/jet/v2 (PostgreSQL dialect). Generated code lives
under each service internal/adapters/postgres/jet/, regenerated via
a make jet target and committed to the repo"). Constructs that go beyond
the basic statement builders (FOR UPDATE, FOR UPDATE SKIP LOCKED,
keyset-pagination row comparison, JSONB params, LOWER(...) on subselects)
are expressed through the corresponding DSL helpers (.FOR(pg.UPDATE()),
.FOR(pg.UPDATE().SKIP_LOCKED()), pg.LOWER) or, for keyset cursors,
through an explicit OR/AND expansion of the row comparison.
Cross-References
- PG_PLAN.md §4 (Stage 4: Mail Service migration).
- ARCHITECTURE.md §Persistence Backends.
- internal/adapters/postgres/migrations/00001_init.sql and internal/adapters/postgres/migrations/migrations.go.
- internal/adapters/postgres/mailstore/{store,deliveries,auth_acceptance,generic_acceptance,render,operator,attempt_execution,malformed_command,codecs,helpers}.go plus the testcontainers-backed unit suite under mailstore/{harness,store}_test.go.
- internal/adapters/postgres/jet/mail/{model,table}/*.go (committed generated code) plus cmd/jetgen/main.go and the make jet Makefile target that regenerate it.
- internal/config/{config,env,validation}.go (PostgresConfig plus the redisconn.Config-shaped Redis envelope).
- internal/app/{runtime,bootstrap}.go (shared Redis client, PG pool open, migration, and mailstore wiring).
- internal/worker/sqlretention.go (periodic SQL retention worker).
- internal/adapters/redisstate/{keyspace,offset_codec,stream_offset_store}.go (surviving slim Redis surface).
- integration/internal/harness/mailservice.go (per-suite Postgres container plus mail/mailservice provisioning).