galaxy-game/user/docs/postgres-migration.md
2026-04-26 20:34:39 +02:00
PostgreSQL Migration

PG_PLAN.md §3 migrated galaxy/user from a Redis-only durable store to the steady-state split codified in ARCHITECTURE.md §Persistence Backends: PostgreSQL is the source of truth for table-shaped business state, and Redis keeps only the two streams that publish auxiliary domain events (user:domain_events) and trusted user-lifecycle events (user:lifecycle_events).

This document records the schema decisions and the non-obvious agreements behind them. Use it together with the migration script (internal/adapters/postgres/migrations/00001_init.sql) and the runtime wiring (internal/app/runtime.go).

Outcomes

  • Schema user (provisioned externally) holds the durable state: accounts, blocked_emails, entitlement_records, entitlement_snapshots, sanction_records, sanction_active, limit_records, limit_active.
  • The runtime opens one PostgreSQL pool via pkg/postgres.OpenPrimary, applies embedded goose migrations strictly before any HTTP listener becomes ready, and exits non-zero when migration or ping fails.
  • The runtime opens one shared *redis.Client via pkg/redisconn.NewMasterClient and passes it to both stream publishers (internal/adapters/redis/domainevents, internal/adapters/redis/lifecycleevents); the publishers no longer hold their own connection topology fields.
  • internal/adapters/redis/userstore/ and the entire internal/adapters/redisstate/ package are removed. The Redis Lua scripts, Watch/Multi optimistic-concurrency loops, and ZSET indexes are gone.
  • Configuration drops USERSERVICE_REDIS_USERNAME, USERSERVICE_REDIS_TLS_ENABLED, and USERSERVICE_REDIS_KEYSPACE_PREFIX. USERSERVICE_REDIS_ADDR is replaced by USERSERVICE_REDIS_MASTER_ADDR + optional USERSERVICE_REDIS_REPLICA_ADDRS. Postgres-specific knobs live under USERSERVICE_POSTGRES_* per the architectural rule.
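
To illustrate the resulting configuration surface, a local environment might look like the following. The USERSERVICE_POSTGRES_* names shown are hypothetical placeholders; internal/config/config.go is authoritative.

```shell
# Redis: one master plus optional replicas; the password remains mandatory (see Decision 8).
export USERSERVICE_REDIS_MASTER_ADDR=localhost:6379
export USERSERVICE_REDIS_REPLICA_ADDRS=localhost:6380,localhost:6381

# Hypothetical knobs under the USERSERVICE_POSTGRES_* prefix --
# check internal/config/config.go for the real variable names.
export USERSERVICE_POSTGRES_HOST=localhost
export USERSERVICE_POSTGRES_DATABASE=galaxy
```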

Decisions

1. One schema, externally provisioned role

Decision. The user schema and the matching userservice role are created outside the migration sequence (in tests, by integration/internal/harness/postgres_container.go::EnsureRoleAndSchema; in production, by an ops init script not in scope for this stage). The embedded migration 00001_init.sql contains only DDL for tables and indexes and assumes it runs as the schema owner with search_path=user.

Why. Mixing role creation, schema creation, and table DDL into one script forces every consumer of the migration to run as a superuser. The schema-per-service architectural rule (ARCHITECTURE.md §Persistence Backends) lines up neatly with the operational split: ops provisions roles and schemas, the service applies schema-scoped migrations.
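
A minimal sketch of the out-of-band provisioning step (the harness in integration/internal/harness/postgres_container.go is the authoritative version; the password placeholder is intentionally elided):

```sql
-- Run once by ops (or the test harness) as a privileged role,
-- before the service applies its embedded migrations.
CREATE ROLE userservice LOGIN PASSWORD '...';
-- "user" is a reserved word in PostgreSQL, so the schema name needs quoting.
CREATE SCHEMA IF NOT EXISTS "user" AUTHORIZATION userservice;
-- Migrations then run as userservice with search_path pinned to the schema.
ALTER ROLE userservice SET search_path = "user";
```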

2. entitlement_snapshots stays denormalised

Decision. A dedicated entitlement_snapshots table holds exactly one row per user_id mirroring the current effective fields (plan_code, is_paid, starts_at, ends_at, source, actor_*, reason_code, updated_at). Lifecycle operations (Grant, Extend, Revoke, RepairExpired) write the history row and the snapshot row inside one transaction.

Why. The lobby-eligibility hot-path reads exactly one row per user; a JOIN over entitlement_records to compute the current segment would add latency and wire-format complexity. Keeping the snapshot denormalised matches the previous Redis shape where the hot read returned a pre-materialised JSON blob, which preserves the existing service-layer contract and the public REST envelope.
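
A sketch of the transactional write pattern; the column lists are assumptions drawn from the field names above (the actor_* columns are omitted for brevity), and the real statements are built through jet per Decision 6:

```sql
BEGIN;
-- History row: append-only record of the lifecycle operation.
INSERT INTO entitlement_records (user_id, plan_code, is_paid, starts_at,
                                 ends_at, source, reason_code)
VALUES ($1, $2, $3, $4, $5, $6, $7);
-- Snapshot row: exactly one per user_id, upserted in the same transaction.
INSERT INTO entitlement_snapshots (user_id, plan_code, is_paid, starts_at,
                                   ends_at, source, reason_code, updated_at)
VALUES ($1, $2, $3, $4, $5, $6, $7, now())
ON CONFLICT (user_id) DO UPDATE
SET plan_code = EXCLUDED.plan_code, is_paid = EXCLUDED.is_paid,
    starts_at = EXCLUDED.starts_at, ends_at = EXCLUDED.ends_at,
    source = EXCLUDED.source, reason_code = EXCLUDED.reason_code,
    updated_at = EXCLUDED.updated_at;
COMMIT;
```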

3. sanction_active / limit_active are the source of truth for "active"

Decision. The active state of a sanction or a user-specific limit is expressed by a small dedicated table (sanction_active, limit_active) whose primary key is (user_id, code). Each row references the matching history record by record_id. Lifecycle operations maintain both tables inside one transaction.

Why. The lobby-eligibility hot path needs to enumerate active sanctions/limits without scanning the full history. Encoding "active" as a partial index on removed_at IS NULL would still require deduplication, because a user can apply, remove, and re-apply the same code. Two narrow tables keep those reads index-only while preserving the predicates the Redis adapter previously encoded as dedicated active keys.
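
An illustrative shape for the active tables; the column types are assumptions and 00001_init.sql is authoritative:

```sql
CREATE TABLE sanction_active (
    user_id   uuid NOT NULL,
    code      text NOT NULL,
    -- Points at the history row that made this sanction active.
    record_id uuid NOT NULL REFERENCES sanction_records (id),
    -- At most one active row per (user, code); re-applying replaces it.
    PRIMARY KEY (user_id, code)
);
-- limit_active mirrors this shape against limit_records.
```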

4. Eligibility flags are computed predicates, not stored columns

Decision. No can_login, can_create_private_game, can_join_game columns or indexes exist. The admin listing surface (and the lobby eligibility snapshot) compute these from entitlement_snapshots and sanction_active at read time.

Why. Stage 21 expanded the eligibility marker catalogue and Stage 22 added permanent_block. Each addition would have required schema work plus a backfill if eligibility flags were materialised columns. Computed predicates push that complexity into one place — the SQL query — and keep the schema small.
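
As an illustration, a read-time predicate might look like the following; the exact eligibility catalogue lives in the service layer, so treat the sanction code and column names here as assumptions (permanent_block is the Stage 22 marker mentioned above):

```sql
SELECT a.user_id,
       -- can_login computed at read time, never stored.
       NOT EXISTS (SELECT 1
                   FROM sanction_active s
                   WHERE s.user_id = a.user_id
                     AND s.code = 'permanent_block') AS can_login
FROM accounts a
WHERE a.user_id = $1;
```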

5. Atomic flows use explicit BEGIN … COMMIT with per-row FOR UPDATE

Decision. Composite operations (AuthDirectoryStore.{Resolve, Ensure, Block*}, EntitlementLifecycleStore.{Grant, Extend, Revoke, RepairExpired}, PolicyLifecycleStore.{ApplySanction, RemoveSanction, SetLimit, RemoveLimit}) execute inside store.withTx and acquire row locks with SELECT … FOR UPDATE on the rows they intend to mutate. Optimistic-replacement guards (Expected*Record, Expected*Snapshot) are validated against the locked rows before the write goes through; mismatches surface as ports.ErrConflict.

Why. PostgreSQL's default READ COMMITTED isolation, combined with explicit row-level locks, gives the same serialisation property the previous Redis WATCH/MULTI loops achieved, without the application-level retry loop those optimistic failures required. The explicit FOR UPDATE keeps the locking intent visible; ad-hoc CTE patterns would obscure the locking shape.
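
A sketch of the flow, using a RemoveSanction-shaped operation as the example; the column names are assumptions and the jet-built statements in the store packages are authoritative:

```sql
BEGIN;
-- Lock the rows this operation intends to mutate; concurrent writers queue here.
SELECT id, code, removed_at
FROM sanction_records
WHERE user_id = $1 AND id = $2
FOR UPDATE;
-- The Expected* guard is validated against the locked row in application code;
-- a mismatch rolls back and surfaces as ports.ErrConflict.
UPDATE sanction_records SET removed_at = now() WHERE id = $2;
DELETE FROM sanction_active WHERE user_id = $1 AND code = $3;
COMMIT;
```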

6. Query layer is go-jet/jet/v2

Decision. All userstore packages build SQL through the jet builder API (pgtable.<Table>.INSERT/SELECT/UPDATE/DELETE plus the pg.AND/OR/SET/... DSL). cmd/jetgen (invoked via make jet) brings up a transient PostgreSQL container, applies the embedded migrations, and runs github.com/go-jet/jet/v2/generator/postgres.GenerateDB against the provisioned schema; the generated table/model code lives under internal/adapters/postgres/jet/user/{model,table}/*.go and is committed to the repo, so build consumers do not need Docker. Statements are run through the database/sql API (stmt.Sql() → db.Exec/Query/QueryRow); manual rowScanner helpers preserve domain-type marshalling.

Why. Aligns with PG_PLAN.md §Library stack ("Query layer: github.com/go-jet/jet/v2 (PostgreSQL dialect). Generated code lives under each service internal/adapters/postgres/jet/, regenerated via a make jet target and committed to the repo"). Constructs the jet builder does not cover natively (FOR UPDATE, keyset-pagination row-comparison, partial UNIQUE WHERE in CREATE INDEX) are expressed through the per-DSL helpers (.FOR(pg.UPDATE()), OR/AND expansion of (created_at, user_id) < (…)). The ports contract and the schema do not change.
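
The row-comparison expansion mentioned above is equivalent to the following sketch, which jet assembles from OR/AND nodes:

```sql
-- (created_at, user_id) < ($1, $2), expanded for a DESC keyset page:
WHERE created_at < $1
   OR (created_at = $1 AND user_id < $2)
ORDER BY created_at DESC, user_id DESC
LIMIT $3;
```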

7. Redis publishers share one *redis.Client

Decision. internal/app/runtime.go constructs one redisconn.NewMasterClient(cfg.Redis.Conn) and passes it to both domainevents.New(client, cfg) and lifecycleevents.New(client, cfg). The publishers no longer carry connection-topology fields and no longer close the client; the runtime owns it.

Why. Each subsequent PG_PLAN stage (Mail, Notification, Lobby) ships a similar duo of stream publishers; sharing one client is the shape we want all stages to converge on. Per-publisher clients multiplied TCP connections, ping points, and OpenTelemetry instrumentation hooks for no functional benefit.

8. Mandatory Redis password in tests as well

Decision. Unit tests for the publishers configure miniredis.RequireAuth("integration") and pass a matching password through their direct redis.NewClient(...) construction. The runtime contract test (runtime_contract_test.go::newRuntimeContractHarness) does the same and additionally boots a Postgres container.

Why. The architectural rule forbids password-less Redis connections; carrying the constraint into tests prevents the rule from drifting.

9. Listing surface keeps storage-thin pagination

Decision. UserListStore.ListUserIDs paginates only on (created_at DESC, user_id DESC) with keyset cursors carried by the opaque page token. Filter matrix evaluation (paid_state, declared_country, sanction_code, limit_code, can_*) is performed by the service-layer adminusers.Lister, which loads each candidate through the per-user loader. This mirrors the previous Redis behaviour exactly.

Why. Pushing the filter matrix into SQL is desirable — it eliminates candidate over-fetching — but doing it without changing the public UserListStore.ListUserIDs contract (which returns a page of UserID, not full records) requires a JOIN-driven query. That work is a non-breaking optimisation and is intentionally deferred so this stage focuses on the storage cut-over rather than throughput improvements. The page-token wire format is preserved bit-for-bit so already-issued tokens keep working.

Cross-References

  • PG_PLAN.md §3 (Stage 3 — User Service migration / pilot).
  • ARCHITECTURE.md §Persistence Backends.
  • internal/adapters/postgres/migrations/00001_init.sql and internal/adapters/postgres/migrations/migrations.go.
  • internal/adapters/postgres/userstore/{store,accounts,blocked_emails,auth_directory,entitlement_store,policy_store,list_store,page_token,helpers}.go plus the testcontainers-backed unit suite under userstore/{harness,store}_test.go.
  • internal/adapters/postgres/jet/user/{model,table}/*.go (committed generated code) plus cmd/jetgen/main.go and the make jet Makefile target that regenerate it.
  • internal/config/config.go (PostgresConfig, RedisConfig reshape).
  • internal/app/runtime.go (PG pool open + migration + shared Redis client wiring).
  • internal/adapters/redis/{domainevents,lifecycleevents}/publisher.go (refactored to accept the shared *redis.Client).
  • runtime_contract_test.go::startPostgresForContractTest (shows the inline Postgres bootstrap used by the existing runtime contract).