# PostgreSQL Migration

PG_PLAN.md §6A migrated the four core enrollment entities of Game Lobby Service — `Game`, `Application`, `Invite`, `Membership` — from Redis-only durable storage to the steady-state Redis + PostgreSQL split codified in `ARCHITECTURE.md §Persistence Backends`. PG_PLAN.md §6B then moved the Race Name Directory onto PostgreSQL, retiring the Redis Lua scripts and canonical-lookup cache that backed it. PG_PLAN.md §6C confirmed which runtime-coordination state intentionally stays on Redis (per-game `game_turn_stats`, `gap_activated_at`, `capability_evaluation:done:*`, `stream_offsets:*`, plus the event-bus streams themselves) and pruned the remaining `redisstate` keyspace.

This document records the schema decisions and the non-obvious agreements behind them. Use it together with the migration scripts under `internal/adapters/postgres/migrations/` and the runtime wiring (`internal/app/runtime.go`).

## Outcomes

- Schema `lobby` (provisioned externally) holds four tables: `games`, `applications`, `invites`, `memberships`. A partial UNIQUE index on `applications(applicant_user_id, game_id) WHERE status <> 'rejected'` enforces the single-active-application constraint at the database level.
- The runtime opens one PostgreSQL pool via `pkg/postgres.OpenPrimary`, applies embedded goose migrations strictly before any HTTP listener becomes ready, and exits non-zero when migration or ping fails.
- The runtime opens one shared `*redis.Client` via `pkg/redisconn.NewMasterClient` and passes it to the per-game stats / gap-activation / evaluation-guard / stream-offset stores, the consumer pipelines, and the notification-intent publisher.
- The Redis adapter package (`internal/adapters/redisstate/`) keeps the surviving stores (`gameturnstatsstore`, `gapactivationstore`, `evaluationguardstore`, `streamoffsetstore`, `streamlagprobe`) and the keyspace methods that back them; the game/application/invite/membership stores, codecs, tests, and per-record TTL constants are gone.
- Configuration drops `LOBBY_REDIS_ADDR`, `LOBBY_REDIS_USERNAME`, `LOBBY_REDIS_TLS_ENABLED` and introduces `LOBBY_REDIS_MASTER_ADDR`, `LOBBY_REDIS_REPLICA_ADDRS`, `LOBBY_REDIS_PASSWORD`, `LOBBY_POSTGRES_PRIMARY_DSN`, `LOBBY_POSTGRES_REPLICA_DSNS`, plus the standard `LOBBY_POSTGRES_*` pool tuning knobs. Setting any of the three retired Redis env vars now fails fast at startup via the shared `pkg/redisconn.LoadFromEnv` rejection path.

## Decisions

### 1. One schema, externally provisioned role

**Decision.** The `lobby` schema and the matching `lobbyservice` role are created outside the migration sequence (in tests, by `integration/internal/harness/postgres_container.go::EnsureRoleAndSchema`; in production, by an ops init script not in scope for this stage). The embedded migration `00001_init.sql` contains only DDL for tables and indexes and assumes it runs as the schema owner with `search_path=lobby`.

**Why.** Mirrors the precedent set by Notification Stage 5 and Mail Stage 4 and matches the schema-per-service architectural rule (`ARCHITECTURE.md §Persistence Backends`). Mixing role + schema + table DDL into one script would force every consumer of the migration to run as a superuser; splitting them lines up with the operational split (ops provisions roles and schemas, the service applies schema-scoped migrations).

### 2. Single-active application = partial UNIQUE on `applications`

**Decision.** `applications` carries a partial UNIQUE index on `(applicant_user_id, game_id) WHERE status <> 'rejected'`.
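A hypothetical sketch of that index's DDL may make the shape concrete. Table and column names here are assumed from the outcomes above; the authoritative version is `00001_init.sql`:

```sql
-- Illustrative excerpt only; the real migration may use different
-- column types, names, and index naming.
CREATE TABLE applications (
    application_id    text PRIMARY KEY,
    game_id           text NOT NULL REFERENCES games (game_id) ON DELETE CASCADE,
    applicant_user_id text NOT NULL,
    status            text NOT NULL,
    created_at        timestamptz NOT NULL DEFAULT now()
);

-- At most one non-rejected application per (applicant, game);
-- rejected rows never participate in the index, so resubmission works.
CREATE UNIQUE INDEX applications_active_uniq
    ON applications (applicant_user_id, game_id)
    WHERE status <> 'rejected';
```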
INSERT attempts that violate the constraint are surfaced to the service layer as `application.ErrConflict` via the shared `sqlx.IsUniqueViolation` helper.

**Why.** Replaces the Redis lookup key `lobby:user_game_application:*:*` with a deterministic database-level invariant. Multiple `rejected` rows are intentionally allowed (one applicant may submit, get rejected, and resubmit); the UNIQUE fires only on a second concurrent submitted/approved row for the same `(user, game)`. The constraint is race-safe: under concurrent submission attempts one INSERT wins and the others fail with a conflict.

### 3. Public games carry an empty `owner_user_id`; partial index excludes them

**Decision.** `games.owner_user_id` is `text NOT NULL DEFAULT ''`, and the secondary `games_owner_idx` is partial: `WHERE game_type = 'private'`. Public games (admin-owned) carry an empty owner string and are excluded from the index entirely.

**Why.** Mirrors the previous Redis behaviour, where `games_by_owner:*` sets were created only for private games. The partial index keeps the owner lookup tight (only private-game rows participate) while letting the column stay non-nullable and consistent with the domain model.

### 4. JSONB columns for runtime snapshot and runtime binding

**Decision.** `games.runtime_snapshot` is `jsonb NOT NULL DEFAULT '{}'::jsonb`; `games.runtime_binding` is `jsonb NULL`. The JSON shapes used inside both columns are stable and live in `internal/adapters/postgres/gamestore/codecs.go`. `runtime_binding` binds NULL when the domain pointer is nil, otherwise an object with `container_id`, `engine_endpoint`, `runtime_job_id`, and `bound_at_ms` fields.

**Why.** Both fields are opaque to queries — Lobby never filters on their internals. JSONB matches the "everything outside primary fields is JSON" pattern Notification Stage 5 already established and allows a future GIN index without a schema rewrite.
The `bound_at_ms` field inside the binding stays in Unix milliseconds so the encoded payload is directly comparable across Redis and PostgreSQL audits during the transition window.

### 5. Optimistic concurrency via current-status compare-and-swap

**Decision.** `UpdateStatus` on every store is implemented as `UPDATE … WHERE id = $X AND status = $expected`. A zero-rows result is disambiguated with a follow-up `SELECT status` probe — missing rows map to the per-domain `ErrNotFound`, mismatches map to `ErrConflict`. Snapshot/binding overrides on `games` use the same pattern but guard only on the primary key (no expected-status gate).

**Why.** Mirrors the previous Redis WATCH/TxPipelined behaviour without holding a `SELECT … FOR UPDATE` lock across application logic. The compare-and-swap is local to one statement, never spans more than one network round trip, and produces the same observable error semantics the service layer already depends on.

### 6. Memberships store `race_name` and `canonical_key` side by side

**Decision.** `memberships` carries both `race_name` (original casing) and `canonical_key` (policy-derived form) as separate `text NOT NULL` columns. There is no UNIQUE constraint on `canonical_key`.

**Why.** Downstream consumers — capability evaluation and the user-lifecycle cascade — read the canonical form directly without re-deriving it from `race_name`, the same arrangement the Redis JSON record had. Race-name uniqueness across the platform remains the responsibility of the Race Name Directory; enforcing a UNIQUE on memberships' `canonical_key` would duplicate the RND invariant and create deadlock potential between the two stores.

### 7. ON DELETE CASCADE from games to children

**Decision.** Each child table (`applications`, `invites`, `memberships`) declares its `game_id` as `REFERENCES games(game_id) ON DELETE CASCADE`.

**Why.** Lobby code never deletes games today — every terminal status is a soft state — so the cascade has no live trigger.
It exists for two future paths: scheduled cleanup of `cancelled` games far past retention, and explicit operator/test resets. CASCADE keeps those paths trivial and free of dangling references.

### 8. Listing order: most-recent-first for games, oldest-first for child tables

**Decision.** `GetByStatus` and `GetByOwner` on `games` order by `created_at DESC, game_id DESC`. The per-game/per-user listings on `applications`, `invites`, `memberships` order by `created_at ASC` (memberships order by `joined_at ASC`).

**Why.** Game listings serve user-facing feeds where most-recent-first is the natural expectation, matching the previous Redis sorted-set score and the `accounts.created_at DESC` convention from User Stage 3. Child-table listings serve administrative and cascade flows where chronological order helps operators reason about the sequence of events. The ports doc explicitly says "order is adapter-defined", so either convention is contract-compatible.

### 9. Heavy `runtime_test.go` / `runtime_smoke_test.go` deleted; integration coverage

**Decision.** The service-local `internal/app/runtime_test.go` and `runtime_smoke_test.go` were removed. Black-box runtime coverage moves to the `integration/lobbyuser` and `integration/lobbynotification` suites, which now spin up both a PostgreSQL container (via `harness.StartLobbyServicePersistence`) and the existing Redis container.

**Why.** Mirrors the Mail Stage 4 / Notification Stage 5 precedent. Booting a full Lobby runtime now requires both PostgreSQL and Redis, which is the integration-suite shape; duplicating that bootstrap inside `internal/app/` would be heavy and fragile. The remaining service-local tests cover units that do not require the full runtime.

### 10. Query layer is `go-jet/jet/v2`

**Decision.** All four PG-store packages build SQL through the jet builder API (`pgtable.INSERT/SELECT/UPDATE/DELETE` plus the `pg.AND/OR/SET/COALESCE/...` DSL).
Generated table models live under `internal/adapters/postgres/jet/lobby/{model,table}/` and are regenerated by `make jet` (which spins up a transient PostgreSQL via testcontainers, applies the embedded goose migrations, and runs jet's generator). Generated code is committed.

**Why.** Aligns with `PG_PLAN.md` §Library stack ("Query layer: `github.com/go-jet/jet/v2` (PostgreSQL dialect). Generated code lives under each service `internal/adapters/postgres/jet/`, regenerated via a `make jet` target and committed to the repo"). PostgreSQL constructs that the jet builder does not cover natively (`FOR UPDATE`, `COALESCE`, `LOWER` on subselects, JSONB params) are expressed through the per-DSL helpers (`.FOR(pg.UPDATE())`, `pg.COALESCE`, `pg.LOWER`, direct `[]byte`/string params for JSONB columns). Manual `rowScanner` helpers (`scanGame`, `scanApplication`, `scanInvite`, `scanMembership`) preserve the codecs.go boundary translations and domain-type mapping; jet only owns SQL construction.

## Out of scope for §6A

- Read routing through `LOBBY_POSTGRES_REPLICA_DSNS` — config exposes the field, runtime ignores it.
- Production provisioning of the `lobby` schema and `lobbyservice` role — operational concern handled outside the service binary.

## §6B — Race Name Directory on PostgreSQL

§6B replaces the Redis-backed Race Name Directory (one Lua script + a canonical-lookup cache + a pending-index ZSET + per-binding string keys) with a single PostgreSQL table `race_names` whose rows back all three binding kinds (`registered`, `reservation`, `pending_registration`). The `race_names` DDL lives in `00001_init.sql` next to the four core enrollment tables (it was originally introduced as a separate `00002_race_names.sql`; PG_PLAN.md §9 collapsed the two files into one init migration during the pre-launch development window). The adapter `internal/adapters/postgres/racenamedir/directory.go` is the canonical reference; the architecture rule is unchanged from §6A.

### 11. One table, composite primary key `(canonical_key, game_id)`

**Decision.** `race_names` carries one row per binding under the composite primary key `(canonical_key, game_id)`. Reservations and pending_registrations write the actual game id; registered rows write `game_id = ''` and keep the source game in `source_game_id`. A partial UNIQUE index on `(canonical_key)` filtered to `binding_kind = 'registered'` enforces the single-registered-per-canonical rule.

**Why.** PG_PLAN.md §6B sketched the table as `(canonical_key PK, …)`, but the existing port semantics (`testReserveCrossGame`, `testReleaseReservationKeepsCrossGame` in `internal/ports/racenamedirtest/suite.go`) require the same user to hold several per-game reservations on one canonical key concurrently. A flat single-PK table cannot model that without losing the per-game identity. The composite PK satisfies both invariants — at most one row per (canonical, game) and at most one registered row per canonical — without splitting the data into two tables (which would force every write operation to touch two unrelated indexes and reproduce the old canonical-lookup cache invariant manually).

### 12. Concurrency: PostgreSQL transactional advisory locks

**Decision.** Every write operation (`Reserve`, `MarkPendingRegistration`, `Register`, `ReleaseReservation`, the per-row branch of `ExpirePendingRegistrations`) opens a `BEGIN; …; COMMIT` and acquires `pg_advisory_xact_lock(hashtextextended($canonical_key, 0))` as the very first statement. The lock auto-releases on commit or rollback. `ReleaseAllByUser` is a single `DELETE WHERE holder_user_id = $1` and takes no advisory lock — it runs on permanent_blocked / deleted lifecycle events, so the user being deleted cannot be a concurrent writer on those bindings.

**Why.** PG_PLAN.md §6B explicitly authorised either `SELECT … FOR UPDATE` or advisory locks. `SELECT … FOR UPDATE` cannot serialize against not-yet-existing rows (e.g. concurrent first-time `Reserve`s for the same canonical), so advisory locks are required for race-free INSERTs. Hashing through `hashtextextended` produces a 64-bit lock key covering arbitrary canonical strings, sidestepping the 32-bit truncation that the older `hashtext` would impose. Holding the lock for one transaction keeps the contention surface tight and matches the Notification §5 "narrow CAS, no application-logic-bound row locks" precedent.

### 13. `binding_kind` values match `ports.Kind*` verbatim

**Decision.** `race_names.binding_kind` stores `"registered"`, `"reservation"`, or `"pending_registration"` — the same string literals exported by `ports.KindRegistered`, `ports.KindReservation`, `ports.KindPendingRegistration`. The adapter returns the raw value directly through `Availability.Kind` without translation. A `CHECK` constraint on the column rejects anything else.

**Why.** Avoids one boundary translation and one synonym ("reserved" vs "reservation") that the Redis adapter carried internally as `reservationStatusReserved = "reserved"`. With the port-equivalent literals on disk, future operator-side queries (`SELECT … WHERE binding_kind = 'reservation'`) match the Go-level constants 1:1, and the adapter saves a `switch` per `Check` call.

### 14. `Check` returns the strongest binding via in-process priority

**Decision.** `Check` issues `SELECT holder_user_id, binding_kind FROM race_names WHERE canonical_key = $1` and picks the strongest binding in Go using the priority rank `registered > pending_registration > reservation`. There is no SQL `CASE` expression in the ORDER BY.

**Why.** The dataset per canonical is bounded (at most one registered row plus one row per active game) and is read frequently by every `Check`. The Go-side rank avoids a SQL DSL detour that go-jet/v2 would express via raw SQL anyway, and it keeps the query plan a single index scan on `canonical_key`.

### 15. `ExpirePendingRegistrations` scans then locks per row

**Decision.** The expirer first runs an indexed scan `WHERE binding_kind = 'pending_registration' AND eligible_until_ms <= $cutoff` (served by `race_names_pending_eligible_idx`), then re-reads each candidate inside its own advisory-locked transaction, asserts the binding is still pending and still expired, and DELETEs it. A concurrent `Register` or `ReleaseReservation` simply causes the per-row branch to skip without error.

**Why.** Mirrors the Redis adapter's two-phase `ZRANGEBYSCORE` + per-member release loop. A bulk `DELETE … WHERE eligible_until_ms <= …` would not produce the per-entry `ports.ExpiredPending` slice the worker needs for telemetry, and would race with `Register` (which targets the same row).

### 16. Shared port test suite stays on PostgreSQL via a serial harness

**Decision.** The shared `racenamedirtest` suite no longer calls `t.Parallel()` from its subtests. Every subtest goes through the factory; the factory truncates the lobby tables and constructs a fresh adapter against the package-shared testcontainers PostgreSQL.

**Why.** The PostgreSQL adapter relies on `pgtest.TruncateAll` between factory invocations; running subtests in parallel against one shared container would race the truncate against other subtests' INSERTs. Spinning up a per-subtest schema would multiply container provisioning cost significantly (the PG generation step alone takes minutes per fresh container), and the suite is fast enough serially. The Redis backend retired in §6B no longer needs the parallelism either; only the in-process stub remains in scope, and it has trivial setup cost.

## §6C — Workers, ephemeral stores, cleanup

§6C closes the Lobby migration: it confirms what intentionally stays on Redis, prunes the dead Redis adapter code, and finalises the service-layer documentation.

### 17. Workers stayed on ports — no functional change

**Decision.** The four Lobby workers (`pendingregistration`, `gmevents`, `runtimejobresult`, `userlifecycle`) and the `enrollmentautomation` worker shipped in §6A already consume their storage through ports. After §6B the `RaceNameDirectory` port resolves to the PostgreSQL adapter; no worker required code changes.

**Why.** §6A established the port-on-storage seam for `GameStore`, `ApplicationStore`, `InviteStore`, `MembershipStore`. §6B kept the same contract for `RaceNameDirectory`. Worker logic depends on the contract, not the backend, so the migration completes via a wiring switch in `internal/app/wiring.go::buildRaceNameDirectory` without re-touching worker code.

### 18. `redisstate` retains only runtime-coordination adapters

**Decision.** After §6C the `internal/adapters/redisstate/` package implements only `GameTurnStatsStore`, `GapActivationStore`, `EvaluationGuardStore`, `StreamOffsetStore`, and the `StreamLagProbe`. The legacy `racenamedir.go`, `racenamedir_lua.go`, `racenamedir_test.go`, `codecs_racename.go`, and the dead game codecs (`codecs.go`'s `MarshalGame`/`UnmarshalGame`) are removed. The `Keyspace` type only builds keys for the surviving adapters (`GapActivatedAt`, `StreamOffset`, `GameTurnStat`, `GameTurnStatsByGame`, `CapabilityEvaluationGuard`).

**Why.** Architectural rule (`ARCHITECTURE.md §Persistence Backends`): Redis owns runtime-coordination state, PostgreSQL owns durable business state. The retained Redis stores back ephemeral per-game aggregates (`game_turn_stats`), short-lived sentinels (`gap_activated_at`, `capability_evaluation:done:*`), and the consumer-offset coordination state (`stream_offsets:*`) — all rebuildable or losable without durability impact. Streams stay on Redis because they *are* the event bus.

### 19. Default Race Name Directory backend is `postgres`

**Decision.** `LOBBY_RACE_NAME_DIRECTORY_BACKEND` defaults to `"postgres"`.
The accepted values are `postgres` (production) and `stub` (in-process, for unit tests that do not need a real PostgreSQL). The `redis` value, the corresponding `RaceNameDirectoryBackendRedis` constant, and the wiring branch are removed.

**Why.** The Redis adapter is gone; keeping the value in the validator would produce a misleading "configuration accepted, but startup fails when wiring resolves the directory" path. Leaving `stub` as a valid backend lets per-service unit tests run against a small, fast in-process directory; integration suites use `postgres` via the testcontainers harness.