# PostgreSQL Migration
PG_PLAN.md §3 migrated `galaxy/user` from a Redis-only durable store to the
steady-state split codified in ARCHITECTURE.md §Persistence Backends:
PostgreSQL is the source of truth for table-shaped business state, and Redis
keeps only the two streams that publish auxiliary domain events
(`user:domain_events`) and trusted user-lifecycle events
(`user:lifecycle_events`).

This document records the schema decisions and the non-obvious agreements
behind them. Use it together with the migration script
(`internal/adapters/postgres/migrations/00001_init.sql`) and the runtime
wiring (`internal/app/runtime.go`).
## Outcomes
- Schema `user` (provisioned externally) holds the durable state: `accounts`,
  `blocked_emails`, `entitlement_records`, `entitlement_snapshots`,
  `sanction_records`, `sanction_active`, `limit_records`, `limit_active`.
- The runtime opens one PostgreSQL pool via `pkg/postgres.OpenPrimary`,
  applies embedded goose migrations strictly before any HTTP listener becomes
  ready, and exits non-zero when migration or ping fails.
- The runtime opens one shared `*redis.Client` via
  `pkg/redisconn.NewMasterClient` and passes it to both stream publishers
  (`internal/adapters/redis/domainevents`,
  `internal/adapters/redis/lifecycleevents`); the publishers no longer hold
  their own connection-topology fields.
- `internal/adapters/redis/userstore/` and the entire
  `internal/adapters/redisstate/` package are removed. The Redis Lua scripts,
  Watch/Multi optimistic-concurrency loops, and ZSET indexes are gone.
- Configuration drops `USERSERVICE_REDIS_USERNAME`,
  `USERSERVICE_REDIS_TLS_ENABLED`, and `USERSERVICE_REDIS_KEYSPACE_PREFIX`.
  `USERSERVICE_REDIS_ADDR` is replaced by `USERSERVICE_REDIS_MASTER_ADDR`
  plus optional `USERSERVICE_REDIS_REPLICA_ADDRS`. Postgres-specific knobs
  live under `USERSERVICE_POSTGRES_*` per the architectural rule.
## Decisions
### 1. One schema, externally-provisioned role
Decision. The `user` schema and the matching `userservice` role are
created outside the migration sequence (in tests, by
`integration/internal/harness/postgres_container.go::EnsureRoleAndSchema`;
in production, by an ops init script not in scope for this stage). The
embedded migration `00001_init.sql` contains only DDL for tables and
indexes and assumes it runs as the schema owner with `search_path=user`.
Why. Mixing role creation, schema creation, and table DDL into one
script forces every consumer of the migration to run as a superuser. The
schema-per-service architectural rule
(ARCHITECTURE.md §Persistence Backends) lines up neatly with the
operational split: ops provisions roles and schemas, the service applies
schema-scoped migrations.
### 2. `entitlement_snapshots` stays denormalised
Decision. A dedicated `entitlement_snapshots` table holds exactly one
row per `user_id`, mirroring the current effective fields (`plan_code`,
`is_paid`, `starts_at`, `ends_at`, `source`, `actor_*`, `reason_code`,
`updated_at`). Lifecycle operations (Grant, Extend, Revoke,
RepairExpired) write the history row and the snapshot row inside one
transaction.

Why. The lobby-eligibility hot path reads exactly one row per user; a
JOIN over `entitlement_records` to compute the current segment would add
latency and wire-format complexity. Keeping the snapshot denormalised
matches the previous Redis shape, where the hot read returned a
pre-materialised JSON blob; this preserves the existing service-layer
contract and the public REST envelope.
### 3. `sanction_active` / `limit_active` are the source of truth for "active"
Decision. The active state of a sanction or a user-specific limit is
expressed by a small dedicated table (`sanction_active`, `limit_active`)
whose primary key is `(user_id, code)`. Each row references the matching
history record by `record_id`. Lifecycle operations maintain both tables
inside one transaction.

Why. The lobby-eligibility hot path needs to enumerate active
sanctions/limits without scanning the full history. Encoding "active"
as a partial index on `removed_at IS NULL` would still require
deduplication, because a user can apply, remove, and re-apply the same
code. Two narrow tables let the same predicates that the Redis adapter
encoded as active keys remain index-only.
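The apply/remove/re-apply semantics can be sketched in memory. The type and method names below are illustrative, not the real store API; `chat_ban` is a hypothetical sanction code.

```go
// In-memory model of sanction_records plus sanction_active: one active row
// per (user_id, code), each pointing at the latest history record.
package main

import "fmt"

type key struct{ userID, code string }

type historyRow struct {
	recordID int
	removed  bool
}

type policyStore struct {
	nextID  int
	history []historyRow // sanction_records: full apply/remove history
	active  map[key]int  // sanction_active: (user_id, code) -> record_id
}

// applySanction appends a history row and points the active row at it; the
// (user_id, code) primary key means a re-apply simply overwrites the pointer.
func (s *policyStore) applySanction(userID, code string) int {
	s.nextID++
	s.history = append(s.history, historyRow{recordID: s.nextID})
	s.active[key{userID, code}] = s.nextID
	return s.nextID
}

// removeSanction deletes the active row; the history row survives as removed.
func (s *policyStore) removeSanction(userID, code string) {
	id := s.active[key{userID, code}]
	for i := range s.history {
		if s.history[i].recordID == id {
			s.history[i].removed = true
		}
	}
	delete(s.active, key{userID, code})
}

func main() {
	s := &policyStore{active: map[key]int{}}
	s.applySanction("u1", "chat_ban")
	s.removeSanction("u1", "chat_ban")
	id := s.applySanction("u1", "chat_ban") // same code re-applied: new record
	// Enumerating active sanctions reads the small table only, never history.
	fmt.Println(id, len(s.active), len(s.history))
}
```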
### 4. Eligibility flags are computed predicates, not stored columns
Decision. No `can_login`, `can_create_private_game`, or `can_join_game`
columns or indexes exist. The admin listing surface (and the lobby
eligibility snapshot) compute these from `entitlement_snapshots` and
`sanction_active` at read time.

Why. Stage 21 expanded the eligibility marker catalogue and Stage 22
added `permanent_block`. Each addition would have required schema work
plus a backfill if eligibility flags were materialised columns. Computed
predicates push that complexity into one place (the SQL query) and keep
the schema small.
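A minimal sketch of the computed-predicate idea, in memory rather than SQL. Only `permanent_block` comes from this document; the paid-plan rule and the snapshot field are hypothetical examples of how a predicate layers on more inputs.

```go
// Eligibility derived at read time from the snapshot and the set of active
// sanction codes; nothing here is persisted as a flag column.
package main

import "fmt"

type snapshot struct{ isPaid bool }

// canLogin is false whenever a blocking sanction is active.
func canLogin(active map[string]bool) bool {
	return !active["permanent_block"]
}

// canCreatePrivateGame layers a paid-plan requirement on top (hypothetical
// rule, for illustration only).
func canCreatePrivateGame(snap snapshot, active map[string]bool) bool {
	return canLogin(active) && snap.isPaid
}

func main() {
	blocked := map[string]bool{"permanent_block": true}
	// Adding a new marker to the catalogue changes only these predicates:
	// no new column, no index, no backfill.
	fmt.Println(canLogin(blocked), canCreatePrivateGame(snapshot{isPaid: true}, map[string]bool{}))
}
```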
### 5. Atomic flows use explicit `BEGIN … COMMIT` with per-row `FOR UPDATE`
Decision. Composite operations (`AuthDirectoryStore.{Resolve, Ensure,
Block*}`, `EntitlementLifecycleStore.{Grant, Extend, Revoke,
RepairExpired}`, `PolicyLifecycleStore.{ApplySanction, RemoveSanction,
SetLimit, RemoveLimit}`) execute inside `store.withTx` and acquire row
locks with `SELECT … FOR UPDATE` on the rows they intend to mutate.
Optimistic-replacement guards (`Expected*Record`, `Expected*Snapshot`)
are validated against the locked rows before the write goes through;
mismatches surface as `ports.ErrConflict`.

Why. PostgreSQL's default READ COMMITTED isolation plus row-level
locks gives us the serialisation property the previous Redis
WATCH/MULTI loops achieved, without needing the application to retry on
optimistic-failure errors. The explicit `FOR UPDATE` keeps intent
visible; ad-hoc CTE patterns would obscure the locking shape.
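The commit-on-success, rollback-on-error shape of `withTx` can be shown against a recording fake. The interfaces and fake below are illustrative; the real helper wraps `database/sql`, and the closure issues `SELECT … FOR UPDATE` before mutating.

```go
// Minimal withTx sketch: begin, run the closure, commit on success, roll
// back on error so a failed Expected* guard leaves no partial writes.
package main

import (
	"errors"
	"fmt"
)

type tx interface {
	Commit() error
	Rollback() error
}

type db interface{ Begin() (tx, error) }

func withTx(d db, fn func(tx) error) error {
	t, err := d.Begin()
	if err != nil {
		return err
	}
	if err := fn(t); err != nil {
		_ = t.Rollback() // e.g. an optimistic-replacement guard mismatched
		return err
	}
	return t.Commit()
}

// fakeTx records which path ran.
type fakeTx struct{ outcome string }

func (t *fakeTx) Commit() error   { t.outcome = "commit"; return nil }
func (t *fakeTx) Rollback() error { t.outcome = "rollback"; return nil }

type fakeDB struct{ last *fakeTx }

func (d *fakeDB) Begin() (tx, error) { d.last = &fakeTx{}; return d.last, nil }

var errConflict = errors.New("conflict") // stands in for ports.ErrConflict

func main() {
	d := &fakeDB{}
	_ = withTx(d, func(_ tx) error { return nil })
	fmt.Println(d.last.outcome) // commit
	_ = withTx(d, func(_ tx) error { return errConflict })
	fmt.Println(d.last.outcome) // rollback
}
```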
### 6. Query layer is `go-jet/jet/v2`
Decision. All userstore packages build SQL through the jet
builder API (`pgtable.<Table>.INSERT/SELECT/UPDATE/DELETE` plus the
`pg.AND/OR/SET/...` DSL). `cmd/jetgen` (invoked via `make jet`) brings
up a transient PostgreSQL container, applies the embedded migrations,
and runs `github.com/go-jet/jet/v2/generator/postgres.GenerateDB`
against the provisioned schema; the generated table/model code lives
under `internal/adapters/postgres/jet/user/{model,table}/*.go` and is
committed to the repo, so build consumers do not need Docker.
Statements are run through the `database/sql` API
(`stmt.Sql()` → `db.Exec/Query/QueryRow`); manual `rowScanner` helpers
preserve domain-type marshalling.

Why. This aligns with PG_PLAN.md §Library stack ("Query layer:
github.com/go-jet/jet/v2 (PostgreSQL dialect). Generated code lives
under each service internal/adapters/postgres/jet/, regenerated via
a make jet target and committed to the repo"). Constructs the jet
builder does not cover natively (`FOR UPDATE`, keyset-pagination
row comparison, partial `UNIQUE … WHERE` in `CREATE INDEX`) are
expressed through the per-DSL helpers (`.FOR(pg.UPDATE())`, OR/AND
expansion of `(created_at, user_id) < (…)`). The ports contract and
the schema do not change.
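The OR/AND expansion of the row comparison mentioned above can be made explicit. The helper below renders the SQL text only, so the expansion is visible; the real code builds the equivalent expression with the jet DSL, and the placeholder style (`$1`, `$2`) is assumed.

```go
// keysetBefore expands a two-column row comparison
// (col1, col2) < ($1, $2) into the equivalent OR/AND predicate that a
// builder without native tuple comparison can express.
package main

import "fmt"

func keysetBefore(col1, col2 string) string {
	return fmt.Sprintf("(%[1]s < $1) OR (%[1]s = $1 AND %[2]s < $2)", col1, col2)
}

func main() {
	fmt.Println(keysetBefore("created_at", "user_id"))
}
```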
### 7. Redis publishers share one `*redis.Client`
Decision. `internal/app/runtime.go` constructs one
`redisconn.NewMasterClient(cfg.Redis.Conn)` and passes it to both
`domainevents.New(client, cfg)` and `lifecycleevents.New(client, cfg)`.
The publishers no longer carry connection-topology fields and no longer
close the client; the runtime owns it.

Why. Each subsequent PG_PLAN stage (Mail, Notification, Lobby) ships a
similar duo of stream publishers; sharing one client is the shape we
want all stages to converge on. Per-publisher clients multiplied TCP
connections, ping points, and OpenTelemetry instrumentation hooks for
no functional benefit.
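The ownership split can be sketched with a stand-in client. `fakeClient` stands in for `*redis.Client`; the constructor shape mirrors the `domainevents.New`/`lifecycleevents.New` wiring described above but is not the real signature.

```go
// Wiring sketch: the runtime constructs one client, hands the same pointer
// to both publishers, and keeps ownership of Close.
package main

import "fmt"

type fakeClient struct{ closed bool }

func (c *fakeClient) Close() error { c.closed = true; return nil }

type publisher struct {
	client *fakeClient
	stream string
}

// newPublisher borrows the client; it must never call Close on it.
func newPublisher(c *fakeClient, stream string) *publisher {
	return &publisher{client: c, stream: stream}
}

func main() {
	client := &fakeClient{}
	defer client.Close() // the runtime, not a publisher, closes the client

	domain := newPublisher(client, "user:domain_events")
	lifecycle := newPublisher(client, "user:lifecycle_events")

	// One connection pool, one ping point, one instrumentation hook.
	fmt.Println(domain.client == lifecycle.client)
}
```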
### 8. Mandatory Redis password in tests as well
Decision. Unit tests for the publishers configure
`miniredis.RequireAuth("integration")` and pass a matching password
through their direct `redis.NewClient(...)` construction. The runtime
contract test
(`runtime_contract_test.go::newRuntimeContractHarness`) does the same,
plus boots a Postgres container.

Why. The architectural rule forbids password-less Redis connections;
carrying the constraint into tests prevents the rule from drifting.
### 9. Listing surface keeps storage-thin pagination
Decision. `UserListStore.ListUserIDs` paginates only on
`(created_at DESC, user_id DESC)` with keyset cursors carried by the
opaque page token. Filter matrix evaluation (`paid_state`,
`declared_country`, `sanction_code`, `limit_code`, `can_*`) is performed
by the service-layer `adminusers.Lister`, which loads each candidate
through the per-user loader. This mirrors the previous Redis behaviour
exactly.

Why. Pushing the filter matrix into SQL is desirable, since it
eliminates candidate over-fetching, but doing it without changing the
public `UserListStore.ListUserIDs` contract (which returns a page of
`UserID`, not full records) requires a JOIN-driven query. That work
is a non-breaking optimisation and is intentionally deferred so this
stage focuses on the storage cut-over rather than throughput
improvements. The page-token wire format is preserved bit-for-bit so
already-issued tokens keep working.
## Cross-References
- PG_PLAN.md §3 (Stage 3, User Service migration / pilot).
- ARCHITECTURE.md §Persistence Backends.
- `internal/adapters/postgres/migrations/00001_init.sql` and
  `internal/adapters/postgres/migrations/migrations.go`.
- `internal/adapters/postgres/userstore/{store,accounts,blocked_emails,auth_directory,entitlement_store,policy_store,list_store,page_token,helpers}.go`
  plus the testcontainers-backed unit suite under
  `userstore/{harness,store}_test.go`.
- `internal/adapters/postgres/jet/user/{model,table}/*.go` (committed
  generated code) plus `cmd/jetgen/main.go` and the `make jet` Makefile
  target that regenerate it.
- `internal/config/config.go` (`PostgresConfig`, `RedisConfig` reshape).
- `internal/app/runtime.go` (PG pool open + migration + shared Redis client
  wiring).
- `internal/adapters/redis/{domainevents,lifecycleevents}/publisher.go`
  (refactored to accept the shared `*redis.Client`).
- `runtime_contract_test.go::startPostgresForContractTest` (shows the inline
  Postgres bootstrap used by the existing runtime contract).