feat: use postgres
+32
-14
@@ -6,10 +6,14 @@ and timestamps with values that match the deployment under inspection.
 ## Example `.env`

 A minimum-viable `LOBBY_*` set for a local run against a single Redis
-container. The full list with defaults lives in `../README.md` §Configuration.
+container plus a PostgreSQL container with the `lobby` schema and the
+`lobbyservice` role provisioned. The full list with defaults lives in
+`../README.md` §Configuration.

 ```bash
-LOBBY_REDIS_ADDR=127.0.0.1:6379
+LOBBY_REDIS_MASTER_ADDR=127.0.0.1:6379
+LOBBY_REDIS_PASSWORD=local
+LOBBY_POSTGRES_PRIMARY_DSN=postgres://lobbyservice:lobbyservice@127.0.0.1:5432/galaxy?search_path=lobby&sslmode=disable
 LOBBY_USER_SERVICE_BASE_URL=http://127.0.0.1:8083
 LOBBY_GM_BASE_URL=http://127.0.0.1:8096

@@ -19,7 +23,7 @@ LOBBY_INTERNAL_HTTP_ADDR=:8095
 LOBBY_LOG_LEVEL=info
 LOBBY_SHUTDOWN_TIMEOUT=30s

-LOBBY_RACE_NAME_DIRECTORY_BACKEND=redis
+LOBBY_RACE_NAME_DIRECTORY_BACKEND=postgres
 LOBBY_ENROLLMENT_AUTOMATION_INTERVAL=30s
 LOBBY_RACE_NAME_EXPIRATION_INTERVAL=1h

@@ -115,16 +119,36 @@ curl -s http://localhost:8095/api/v1/internal/games/game-01HZ...
 curl -s http://localhost:8095/api/v1/internal/games/game-01HZ.../memberships
 ```

-## Redis Examples
+## Storage Inspection Examples

-### Inspect a game record
+### Inspect a game record (PostgreSQL)

 ```bash
-redis-cli GET lobby:games:game-01HZ...
+psql "$LOBBY_POSTGRES_PRIMARY_DSN" -c \
+  "SELECT * FROM lobby.games WHERE game_id = 'game-01HZ...'"
 ```

-The value is a strict JSON blob with the fields documented in
-`../README.md` §Game Record Model.
+The columns mirror the fields documented in `../README.md` §Game Record Model.

+### Inspect open enrollment games (sorted by created_at)
+
+```bash
+psql "$LOBBY_POSTGRES_PRIMARY_DSN" -c \
+  "SELECT game_id, game_name, created_at FROM lobby.games
+   WHERE status = 'enrollment_open'
+   ORDER BY created_at DESC"
+```
+
+### Inspect a Race Name Directory binding
+
+```bash
+psql "$LOBBY_POSTGRES_PRIMARY_DSN" -c \
+  "SELECT canonical_key, game_id, holder_user_id, race_name, binding_kind,
+          source_game_id, eligible_until_ms, registered_at_ms
+   FROM lobby.race_names WHERE race_name = 'Aurora'"
+```
+
+## Redis Examples
+
 ### Publish a runtime job result (Runtime Manager simulation)

@@ -162,12 +186,6 @@ redis-cli XADD gm:lobby_events '*' \
 finished_at_ms 1714123456789
 ```

-### Inspect open enrollment games (sorted by created_at)
-
-```bash
-redis-cli ZRANGE lobby:games_by_status:enrollment_open 0 -1 WITHSCORES
-```
-
 ## Notification Intent Format

 Lobby produces every notification through `pkg/notificationintent` and
@@ -0,0 +1,386 @@
# PostgreSQL Migration

PG_PLAN.md §6A migrated the four core enrollment entities of Game Lobby
Service — `Game`, `Application`, `Invite`, `Membership` — from Redis-only
durable storage to the steady-state Redis + PostgreSQL split codified in
`ARCHITECTURE.md §Persistence Backends`. PG_PLAN.md §6B then moved the
Race Name Directory onto PostgreSQL, retiring the Redis Lua scripts and
canonical-lookup cache that backed it. PG_PLAN.md §6C confirmed which
runtime-coordination state intentionally stays on Redis (per-game
`game_turn_stats`, `gap_activated_at`, `capability_evaluation:done:*`,
`stream_offsets:*`, plus the event-bus streams themselves) and pruned the
remaining `redisstate` keyspace.

This document records the schema decisions and the non-obvious agreements
behind them. Use it together with the migration scripts under
`internal/adapters/postgres/migrations/` and the runtime wiring
(`internal/app/runtime.go`).

## Outcomes

- Schema `lobby` (provisioned externally) holds four tables: `games`,
  `applications`, `invites`, `memberships` (§6B later adds `race_names`).
  A partial UNIQUE index on
  `applications(applicant_user_id, game_id) WHERE status <> 'rejected'`
  enforces the single-active-application constraint at the database
  level.
- The runtime opens one PostgreSQL pool via `pkg/postgres.OpenPrimary`,
  applies embedded goose migrations strictly before any HTTP listener
  becomes ready, and exits non-zero when migration or ping fails.
- The runtime opens one shared `*redis.Client` via
  `pkg/redisconn.NewMasterClient` and passes it to the Race Name
  Directory adapter (Redis-backed at this stage; §6B moves it to
  PostgreSQL), the per-game stats / gap-activation /
  evaluation-guard / stream-offset stores, the consumer pipelines, and
  the notification-intent publisher.
- The Redis adapter package (`internal/adapters/redisstate/`) keeps the
  surviving stores (`racenamedir`, which is removed once §6B and §6C
  land, `gameturnstatsstore`,
  `gapactivationstore`, `evaluationguardstore`, `streamoffsetstore`,
  `streamlagprobe`) and the keyspace methods that back them; the
  game/application/invite/membership stores, codecs, tests, and
  per-record TTL constants are gone.
- Configuration drops `LOBBY_REDIS_ADDR`, `LOBBY_REDIS_USERNAME`,
  `LOBBY_REDIS_TLS_ENABLED` and introduces `LOBBY_REDIS_MASTER_ADDR`,
  `LOBBY_REDIS_REPLICA_ADDRS`, `LOBBY_REDIS_PASSWORD`,
  `LOBBY_POSTGRES_PRIMARY_DSN`, `LOBBY_POSTGRES_REPLICA_DSNS`, plus
  the standard `LOBBY_POSTGRES_*` pool tuning knobs. Setting either
  `LOBBY_REDIS_USERNAME` or `LOBBY_REDIS_TLS_ENABLED` now fails fast at
  startup via the shared `pkg/redisconn.LoadFromEnv` rejection path.

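The fail-fast rule for retired variables can be pictured with a small Go sketch. It is an illustration only: `rejectRetired` and the map-based environment are hypothetical stand-ins, not the actual `pkg/redisconn.LoadFromEnv` code, and it assumes the two rejected variables are `LOBBY_REDIS_USERNAME` and `LOBBY_REDIS_TLS_ENABLED`, as the runbook changes in this commit state.

```go
package main

import (
	"fmt"
	"sort"
)

// retiredRedisVars lists the variables assumed to be rejected at startup.
// The names come from this document; the check itself is hypothetical.
var retiredRedisVars = []string{"LOBBY_REDIS_USERNAME", "LOBBY_REDIS_TLS_ENABLED"}

// rejectRetired returns an error naming every retired variable that is
// present in env, or nil when none of them is set.
func rejectRetired(env map[string]string) error {
	var found []string
	for _, name := range retiredRedisVars {
		if _, ok := env[name]; ok {
			found = append(found, name)
		}
	}
	if len(found) == 0 {
		return nil
	}
	sort.Strings(found)
	return fmt.Errorf("retired Redis env vars set: %v", found)
}

func main() {
	env := map[string]string{
		"LOBBY_REDIS_MASTER_ADDR": "127.0.0.1:6379",
		"LOBBY_REDIS_USERNAME":    "legacy",
	}
	// A non-nil error here is what would abort startup.
	fmt.Println(rejectRetired(env))
}
```

Rejecting the variable outright, instead of silently ignoring it, keeps a stale deployment manifest from appearing to work while its Redis settings are no longer honoured.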
## Decisions

### 1. One schema, externally-provisioned role

**Decision.** The `lobby` schema and the matching `lobbyservice` role
are created outside the migration sequence (in tests, by
`integration/internal/harness/postgres_container.go::EnsureRoleAndSchema`;
in production, by an ops init script not in scope for this stage). The
embedded migration `00001_init.sql` only contains DDL for tables and
indexes and assumes it runs as the schema owner with
`search_path=lobby`.

**Why.** Mirrors the precedent set by Notification Stage 5 and Mail
Stage 4 and matches the schema-per-service architectural rule
(`ARCHITECTURE.md §Persistence Backends`). Mixing role + schema + table
DDL into one script would force every consumer of the migration to run
as a superuser; splitting them lines up with the operational split
(ops provisions roles and schemas, the service applies schema-scoped
migrations).

### 2. Single-active application = partial UNIQUE on `applications`

**Decision.** `applications` carries a partial UNIQUE index on
`(applicant_user_id, game_id) WHERE status <> 'rejected'`. INSERT
attempts that violate the constraint are surfaced to the service layer
as `application.ErrConflict` via the shared
`sqlx.IsUniqueViolation` helper.

**Why.** Replaces the Redis lookup key `lobby:user_game_application:*:*`
with a deterministic database-level invariant. Multiple `rejected`
rows are intentionally allowed (one applicant may submit, get rejected,
and resubmit), and the UNIQUE only fires on the second simultaneous
submitted/approved row for the same `(user, game)`. The constraint is
race-safe: under concurrent submission attempts one INSERT wins, the
others fail with conflict.

### 3. Public games carry an empty `owner_user_id`; partial index excludes them

**Decision.** `games.owner_user_id` is `text NOT NULL DEFAULT ''`, and
the secondary `games_owner_idx` is partial: `WHERE game_type = 'private'`.
Public games (admin-owned) carry an empty owner string and are excluded
from the index entirely.

**Why.** Mirrors the previous Redis behaviour where `games_by_owner:*`
sets were created only for private games. The partial index keeps the
owner lookup tight (only private-game rows participate) while letting
the column stay non-nullable and consistent with the domain model.

### 4. JSONB columns for runtime snapshot and runtime binding

**Decision.** `games.runtime_snapshot` is `jsonb NOT NULL DEFAULT
'{}'::jsonb`; `games.runtime_binding` is `jsonb NULL`. The JSON shapes
used inside both columns are stable and live in
`internal/adapters/postgres/gamestore/codecs.go`. `runtime_binding`
binds NULL when the domain pointer is nil, otherwise an object with
`container_id`, `engine_endpoint`, `runtime_job_id`, `bound_at_ms`
fields.

**Why.** Both fields are opaque to queries — Lobby never element-filters
on their internals. JSONB matches the "everything outside primary
fields is JSON" pattern Notification Stage 5 already established and
allows a future GIN index without a schema rewrite. The `bound_at_ms`
field inside the binding stays in Unix milliseconds so the encoded
payload can be compared directly across Redis and PostgreSQL audits
during the transition window.

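A minimal Go sketch of the nil-to-NULL binding rule described above. The struct and `encodeBinding` are hypothetical illustrations built from the four field names in the text; the real shapes live in `codecs.go` and may differ. It relies on `database/sql` binding an untyped `nil` argument as SQL NULL.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RuntimeBinding mirrors the four JSON fields named in the decision.
// The Go field names and types are assumptions for illustration.
type RuntimeBinding struct {
	ContainerID    string `json:"container_id"`
	EngineEndpoint string `json:"engine_endpoint"`
	RuntimeJobID   string `json:"runtime_job_id"`
	BoundAtMs      int64  `json:"bound_at_ms"` // Unix milliseconds
}

// encodeBinding returns the query parameter for games.runtime_binding:
// nil (bound as SQL NULL) for a nil pointer, JSON bytes otherwise.
func encodeBinding(b *RuntimeBinding) (any, error) {
	if b == nil {
		return nil, nil
	}
	raw, err := json.Marshal(b)
	if err != nil {
		return nil, err
	}
	return raw, nil
}

func main() {
	unbound, _ := encodeBinding(nil)
	fmt.Println(unbound == nil)

	bound, _ := encodeBinding(&RuntimeBinding{
		ContainerID:    "c1",
		EngineEndpoint: "http://127.0.0.1:9000",
		RuntimeJobID:   "job-1",
		BoundAtMs:      1714123456789,
	})
	fmt.Printf("%s\n", bound)
}
```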
### 5. Optimistic concurrency via current-status compare-and-swap

**Decision.** `UpdateStatus` on every store is implemented as `UPDATE …
WHERE id = $X AND status = $expected`. A zero-rows result is
disambiguated with a follow-up `SELECT status` probe — missing rows map
to the per-domain `ErrNotFound`, mismatches map to `ErrConflict`.
Snapshot/binding overrides on `games` use the same pattern but only
guard on the primary key (no expected-status gate).

**Why.** Mirrors the previous Redis WATCH/TxPipelined behaviour without
holding a `SELECT … FOR UPDATE` lock across application logic. The
compare-and-swap is local to one statement, never spans more than one
network round trip, and produces the same observable error semantics
the service layer already depends on.

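The zero-rows disambiguation can be sketched in Go against an in-memory stand-in. Everything here is hypothetical: the `statusTable` interface, `memTable`, and the example status transition are illustrations of the pattern, not the actual store code, which issues the two SQL statements named above.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrNotFound = errors.New("not found")
	ErrConflict = errors.New("conflict")
)

// statusTable abstracts the two statements the decision describes.
type statusTable interface {
	// casUpdate models UPDATE … WHERE id = $id AND status = $expected
	// and returns the number of affected rows.
	casUpdate(id, expected, next string) int64
	// probe models the follow-up SELECT status … WHERE id = $id.
	probe(id string) (string, bool)
}

// UpdateStatus applies the compare-and-swap and, on zero rows, probes
// to tell a missing row from a status mismatch.
func UpdateStatus(t statusTable, id, expected, next string) error {
	if t.casUpdate(id, expected, next) == 1 {
		return nil
	}
	if _, ok := t.probe(id); !ok {
		return ErrNotFound
	}
	return ErrConflict
}

// memTable is an in-memory fake keyed by id, used only for this sketch.
type memTable map[string]string

func (m memTable) casUpdate(id, expected, next string) int64 {
	if cur, ok := m[id]; ok && cur == expected {
		m[id] = next
		return 1
	}
	return 0
}

func (m memTable) probe(id string) (string, bool) {
	s, ok := m[id]
	return s, ok
}

func main() {
	tbl := memTable{"game-1": "enrollment_open"}
	fmt.Println(UpdateStatus(tbl, "game-1", "enrollment_open", "running"))
	fmt.Println(UpdateStatus(tbl, "game-1", "enrollment_open", "running"))
	fmt.Println(UpdateStatus(tbl, "missing", "enrollment_open", "running"))
}
```

The probe only runs on the failure path, so the common successful transition stays a single statement.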
### 6. Memberships store `race_name` and `canonical_key` side by side

**Decision.** `memberships` carries both `race_name` (original casing)
and `canonical_key` (policy-derived form) as separate `text NOT NULL`
columns. There is no UNIQUE constraint on `canonical_key`.

**Why.** Downstream consumers — capability evaluation and the
user-lifecycle cascade — read the canonical form directly without
re-deriving it from `race_name`, which is the same arrangement the
Redis JSON record had. Race-name uniqueness across the platform
remains the responsibility of the Race Name Directory (RND); enforcing
a UNIQUE on `memberships.canonical_key` now would duplicate the RND
invariant and create deadlock potential between the two stores.

### 7. ON DELETE CASCADE from games to children

**Decision.** Each child table (`applications`, `invites`,
`memberships`) declares its `game_id` as `REFERENCES games(game_id) ON
DELETE CASCADE`.

**Why.** Lobby code never deletes games today (every terminal status
is a soft state), so the cascade has no live trigger. It exists for
two future paths: scheduled cleanup of `cancelled` games far past
retention, and explicit operator/test resets. CASCADE keeps those paths
trivial and free of dangling references.

### 8. Listing order: most-recent-first for games, oldest-first for child tables

**Decision.** `GetByStatus` and `GetByOwner` on `games` order by
`created_at DESC, game_id DESC`. The per-game/per-user listings on
`applications`, `invites`, `memberships` order by `created_at ASC,
<id> ASC` (memberships order by `joined_at ASC`).

**Why.** Game listings serve user-facing feeds where most-recent-first
is the natural expectation, matching the previous Redis sorted-set
score and the `accounts.created_at DESC` convention from User Stage 3.
Child-table listings serve administrative and cascade flows where the
chronological order helps operators reason about the sequence of
events. The ports doc explicitly says "order is adapter-defined", so
either convention is contract-compatible.

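For an in-process adapter such as a test stub, the same game ordering could be reproduced in Go as below. This is a sketch under assumptions: the `game` struct is hypothetical, and `CreatedAtMs` uses Unix milliseconds as a simplification of the `created_at` column.

```go
package main

import (
	"fmt"
	"sort"
)

// game is a hypothetical, trimmed-down record for this sketch.
type game struct {
	ID          string
	CreatedAtMs int64
}

// sortMostRecentFirst orders games like
// ORDER BY created_at DESC, game_id DESC.
func sortMostRecentFirst(games []game) {
	sort.Slice(games, func(i, j int) bool {
		if games[i].CreatedAtMs != games[j].CreatedAtMs {
			return games[i].CreatedAtMs > games[j].CreatedAtMs
		}
		// The game_id tie-break keeps the order deterministic when
		// two games share a creation timestamp.
		return games[i].ID > games[j].ID
	})
}

func main() {
	games := []game{
		{ID: "game-a", CreatedAtMs: 100},
		{ID: "game-c", CreatedAtMs: 200},
		{ID: "game-b", CreatedAtMs: 200},
	}
	sortMostRecentFirst(games)
	fmt.Println(games)
}
```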
### 9. Heavy `runtime_test.go` / `runtime_smoke_test.go` deleted; integration suites cover the runtime

**Decision.** The service-local `internal/app/runtime_test.go` and
`runtime_smoke_test.go` were removed. Black-box runtime coverage moves
to the `integration/lobbyuser` and `integration/lobbynotification`
suites, which now spin up both a PostgreSQL container (via
`harness.StartLobbyServicePersistence`) and the existing Redis
container.

**Why.** Mirrors the Mail Stage 4 / Notification Stage 5 precedent.
Booting a full Lobby runtime now requires both PostgreSQL and Redis,
which is the integration-suite shape; duplicating that bootstrap
inside `internal/app/` would be heavy and fragile. The remaining
service-local tests cover units that do not require the full runtime.

### 10. Query layer is `go-jet/jet/v2`

**Decision.** All four PG-store packages build SQL through the jet
builder API (`pgtable.<Table>.INSERT/SELECT/UPDATE/DELETE` plus the
`pg.AND/OR/SET/COALESCE/...` DSL). Generated table models live under
`internal/adapters/postgres/jet/lobby/{model,table}/` and are
regenerated by `make jet` (which spins up a transient PostgreSQL via
testcontainers, applies the embedded goose migrations, and runs jet's
generator). Generated code is committed.

**Why.** Aligns with `PG_PLAN.md` §Library stack ("Query layer:
`github.com/go-jet/jet/v2` (PostgreSQL dialect). Generated code lives
under each service `internal/adapters/postgres/jet/`, regenerated via
a `make jet` target and committed to the repo"). PostgreSQL constructs
that the jet builder does not cover natively (`FOR UPDATE`,
`COALESCE`, `LOWER` on subselects, JSONB params) are expressed through
the per-DSL helpers (`.FOR(pg.UPDATE())`, `pg.COALESCE`, `pg.LOWER`,
direct `[]byte`/string params for JSONB columns). Manual `rowScanner`
helpers (`scanGame`, `scanApplication`, `scanInvite`,
`scanMembership`) preserve the codecs.go boundary translations and
domain-type mapping; jet only owns SQL construction.

## Out of scope for §6A

- Read routing through `LOBBY_POSTGRES_REPLICA_DSNS` — config exposes
  the field, runtime ignores it.
- Production provisioning of the `lobby` schema and `lobbyservice`
  role — operational concern handled outside the service binary.

## §6B — Race Name Directory on PostgreSQL

§6B replaces the Redis-backed Race Name Directory (one Lua script + a
canonical-lookup cache + a pending-index ZSET + per-binding string keys)
with a single PostgreSQL table `race_names` whose rows back all three
binding kinds (`registered`, `reservation`, `pending_registration`).
The `race_names` DDL lives in `00001_init.sql` next to the four core
enrollment tables (it was originally introduced as a separate
`00002_race_names.sql`; PG_PLAN.md §9 collapsed the two files into one
init migration during the pre-launch development window). The adapter
`internal/adapters/postgres/racenamedir/directory.go` is the canonical
reference; the architecture rule is unchanged from §6A.

### 11. One table, composite primary key `(canonical_key, game_id)`

**Decision.** `race_names` carries one row per binding under the
composite primary key `(canonical_key, game_id)`. Reservations and
pending_registrations write the actual game id; registered rows write
`game_id = ''` and keep the source game in `source_game_id`. A partial
UNIQUE index on `(canonical_key)` filtered to `binding_kind =
'registered'` enforces the single-registered-per-canonical rule.

**Why.** PG_PLAN.md §6B sketched the table as `(canonical_key PK, …)`,
but the existing port semantics (`testReserveCrossGame`,
`testReleaseReservationKeepsCrossGame` in
`internal/ports/racenamedirtest/suite.go`) require the same user to hold
several per-game reservations on one canonical key concurrently. A flat
single-PK table cannot model that without losing the per-game
identity. The composite PK matches both invariants — at most one row per
(canonical, game) and at most one registered row per canonical — without
splitting the data into two tables (which would force every write
operation to touch two unrelated indexes and reproduce the old
canonical-lookup cache invariant manually).

### 12. Concurrency: PostgreSQL transactional advisory locks

**Decision.** Every write operation (`Reserve`, `MarkPendingRegistration`,
`Register`, `ReleaseReservation`, the per-row branch of
`ExpirePendingRegistrations`) opens a transaction (`BEGIN; …; COMMIT`)
and acquires
`pg_advisory_xact_lock(hashtextextended($canonical_key, 0))` as the very
first statement. The lock auto-releases on commit or rollback.
`ReleaseAllByUser` is a single `DELETE WHERE holder_user_id = $1` and
takes no advisory lock — it runs on permanent_blocked / deleted
lifecycle events, so the user being deleted cannot be a concurrent
writer on those bindings.

**Why.** PG_PLAN.md §6B explicitly authorised either `SELECT … FOR
UPDATE` or advisory locks. `SELECT … FOR UPDATE` cannot serialize
against not-yet-existing rows (e.g. concurrent first-time `Reserve`s for
the same canonical), so advisory locks are required for race-free
INSERTs. Hashing through `hashtextextended` produces a 64-bit lock key
covering arbitrary canonical strings, avoiding the 32-bit key space of
the older `hashtext`. Holding the lock for one transaction
keeps the contention surface tight and matches the Notification §5
"narrow CAS, no application-logic-bound row locks" precedent.

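A minimal sketch of the lock-first transaction shape, using only `database/sql`. `withCanonicalLock` is a hypothetical helper written for illustration; the actual adapter in `directory.go` builds its SQL through jet and may be structured differently.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// lockCanonicalSQL takes a transaction-scoped advisory lock keyed by the
// 64-bit hash of the canonical key. PostgreSQL releases it on commit or
// rollback.
const lockCanonicalSQL = `SELECT pg_advisory_xact_lock(hashtextextended($1, 0))`

// withCanonicalLock runs fn inside a transaction whose first statement
// acquires the advisory lock for canonicalKey. Hypothetical helper.
func withCanonicalLock(ctx context.Context, db *sql.DB, canonicalKey string, fn func(tx *sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("begin: %w", err)
	}
	// Rollback after a successful Commit is a harmless no-op.
	defer tx.Rollback()

	if _, err := tx.ExecContext(ctx, lockCanonicalSQL, canonicalKey); err != nil {
		return fmt.Errorf("advisory lock: %w", err)
	}
	if err := fn(tx); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	// No database is opened here; the sketch only shows the statement
	// that every write path would issue first.
	fmt.Println(lockCanonicalSQL)
}
```

Because the lock key derives from the canonical key and not from a row, two first-time `Reserve` calls for the same name serialize even though no row exists yet.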
### 13. `binding_kind` values match `ports.Kind*` verbatim

**Decision.** `race_names.binding_kind` stores `"registered"`,
`"reservation"`, or `"pending_registration"` — the same string literals
exported by `ports.KindRegistered`, `ports.KindReservation`,
`ports.KindPendingRegistration`. The adapter returns the raw value
directly through `Availability.Kind` without translation. A `CHECK`
constraint on the column rejects anything else.

**Why.** Avoids one boundary translation and one synonym ("reserved" vs
"reservation") that the Redis adapter carried internally as
`reservationStatusReserved = "reserved"`. With the port-equivalent
literals on disk, future operator-side queries (`SELECT … WHERE
binding_kind = 'reservation'`) match the Go-level constants 1:1, and
the adapter saves a `switch` per `Check` call.

### 14. `Check` returns the strongest binding via in-process priority

**Decision.** `Check` issues `SELECT holder_user_id, binding_kind FROM
race_names WHERE canonical_key = $1` and picks the strongest binding in
Go using a priority rank `registered > pending_registration >
reservation`. There is no SQL `CASE` expression in the ORDER BY.

**Why.** The dataset per canonical is bounded (at most one registered +
one row per active game) and is read frequently by every `Check`. The
Go-side rank avoids a SQL DSL detour that go-jet/v2 would express via
raw SQL anyway, and it keeps the query plan a single index scan on
`canonical_key`.

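The Go-side rank can be sketched as follows. The constant values match the literals quoted in decision 13; the `binding` struct, `rank`, and `strongest` are hypothetical names for illustration, not the adapter's actual code.

```go
package main

import "fmt"

// Literals as quoted in the document for ports.Kind*.
const (
	KindRegistered          = "registered"
	KindPendingRegistration = "pending_registration"
	KindReservation         = "reservation"
)

// binding is a hypothetical row shape: the two selected columns.
type binding struct {
	HolderUserID string
	Kind         string
}

// rank implements registered > pending_registration > reservation.
// Unknown kinds rank lowest.
func rank(kind string) int {
	switch kind {
	case KindRegistered:
		return 3
	case KindPendingRegistration:
		return 2
	case KindReservation:
		return 1
	default:
		return 0
	}
}

// strongest returns the highest-ranked binding, or false for no rows.
func strongest(rows []binding) (binding, bool) {
	var best binding
	found := false
	for _, r := range rows {
		if !found || rank(r.Kind) > rank(best.Kind) {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	rows := []binding{
		{HolderUserID: "user-1", Kind: KindReservation},
		{HolderUserID: "user-2", Kind: KindPendingRegistration},
	}
	fmt.Println(strongest(rows))
}
```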
### 15. `ExpirePendingRegistrations` scans then locks per row

**Decision.** The expirer first runs an indexed scan
`WHERE binding_kind = 'pending_registration' AND eligible_until_ms <=
$cutoff` (served by `race_names_pending_eligible_idx`), then re-reads
each candidate inside its own advisory-locked transaction, asserts the
binding is still pending and still expired, and DELETEs it. Concurrent
`Register` or `ReleaseReservation` simply causes the per-row branch to
skip without error.

**Why.** Mirrors the Redis adapter's two-phase `ZRANGEBYSCORE` +
per-member release loop. A bulk `DELETE … WHERE eligible_until_ms <= …`
would not produce the per-entry `ports.ExpiredPending` slice the worker
needs for telemetry, and would race with `Register` (which targets the
same row).

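The scan-then-recheck shape can be illustrated with an in-memory model. This is a sketch only: `memDirectory` is hypothetical, and a single mutex stands in for both the indexed scan and the per-canonical advisory-locked transactions of the real adapter.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

const kindPending = "pending_registration"

// bindingKey mirrors the composite primary key (canonical_key, game_id).
type bindingKey struct{ CanonicalKey, GameID string }

type bindingRow struct {
	Kind            string
	EligibleUntilMs int64
}

// memDirectory is an in-memory stand-in for the race_names table.
type memDirectory struct {
	mu   sync.Mutex
	rows map[bindingKey]bindingRow
}

// expirePending returns the keys it deleted, one entry per expired row.
func (d *memDirectory) expirePending(cutoffMs int64) []bindingKey {
	// Phase 1: collect candidates (the indexed SELECT in the adapter).
	d.mu.Lock()
	var candidates []bindingKey
	for k, r := range d.rows {
		if r.Kind == kindPending && r.EligibleUntilMs <= cutoffMs {
			candidates = append(candidates, k)
		}
	}
	d.mu.Unlock()
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].CanonicalKey != candidates[j].CanonicalKey {
			return candidates[i].CanonicalKey < candidates[j].CanonicalKey
		}
		return candidates[i].GameID < candidates[j].GameID
	})

	// Phase 2: re-read each candidate under the lock and delete it only
	// if it is still pending and still expired; otherwise skip quietly.
	var expired []bindingKey
	for _, k := range candidates {
		d.mu.Lock()
		r, ok := d.rows[k]
		if ok && r.Kind == kindPending && r.EligibleUntilMs <= cutoffMs {
			delete(d.rows, k)
			expired = append(expired, k)
		}
		d.mu.Unlock()
	}
	return expired
}

func main() {
	d := &memDirectory{rows: map[bindingKey]bindingRow{
		{"aurora", "game-1"}:   {kindPending, 1000},
		{"borealis", "game-2"}: {kindPending, 5000},
	}}
	fmt.Println(d.expirePending(2000))
}
```

Releasing the lock between the two phases is what lets a concurrent `Register` win; the recheck in phase 2 then skips the row.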
### 16. Shared port test suite stays on PostgreSQL via a serial harness

**Decision.** The shared `racenamedirtest` suite no longer calls
`t.Parallel()` from its subtests. Every subtest goes through the
factory, the factory truncates the lobby tables and constructs a fresh
adapter against the package-shared testcontainers PostgreSQL.

**Why.** The PostgreSQL adapter relies on `pgtest.TruncateAll` between
factory invocations; running subtests in parallel against one shared
container would race truncate against other subtests' INSERTs. Spinning
up a per-subtest schema would multiply container provisioning cost
significantly (PG generation step alone takes minutes per fresh
container), and the suite is fast enough serially. The Redis-only
backend retired in §6B no longer needs the parallelism either; only the
in-process stub remains in scope and has trivial setup cost.

## §6C — Workers, ephemeral stores, cleanup

§6C closes the Lobby migration: it confirms what intentionally stays on
Redis, prunes the dead Redis adapter code, and finalises the
service-layer documentation.

### 17. Workers stayed on ports — no functional change

**Decision.** The four Lobby workers (`pendingregistration`,
`gmevents`, `runtimejobresult`, `userlifecycle`) and the
`enrollmentautomation` worker shipped in §6A already consume their
storage through ports. After §6B the `RaceNameDirectory` port resolves
to the PostgreSQL adapter; no worker required code changes.

**Why.** §6A established the port-on-storage seam for `GameStore`,
`ApplicationStore`, `InviteStore`, `MembershipStore`. §6B kept the same
contract for `RaceNameDirectory`. Worker logic depends on the contract,
not the backend, so the migration completes via a wiring switch in
`internal/app/wiring.go::buildRaceNameDirectory` without re-touching
worker code.

### 18. `redisstate` retains only runtime-coordination adapters

**Decision.** After §6C the `internal/adapters/redisstate/` package
implements only `GameTurnStatsStore`, `GapActivationStore`,
`EvaluationGuardStore`, `StreamOffsetStore`, and the `StreamLagProbe`.
The legacy `racenamedir.go`, `racenamedir_lua.go`,
`racenamedir_test.go`, `codecs_racename.go`, and the dead game
codecs (`codecs.go`'s `MarshalGame`/`UnmarshalGame`) are removed. The
`Keyspace` type only builds keys for the surviving adapters
(`GapActivatedAt`, `StreamOffset`, `GameTurnStat`,
`GameTurnStatsByGame`, `CapabilityEvaluationGuard`).

**Why.** Architectural rule (`ARCHITECTURE.md §Persistence Backends`):
Redis owns runtime-coordination state, PostgreSQL owns durable business
state. The retained Redis stores back ephemeral per-game aggregates
(`game_turn_stats`), short-lived sentinels (`gap_activated_at`,
`capability_evaluation:done:*`), and the consumer-offset coordination
state (`stream_offsets:*`) — all rebuildable or losable without
durability impact. Streams stay on Redis because they *are* the event
bus.

### 19. Default Race Name Directory backend is `postgres`

**Decision.** `LOBBY_RACE_NAME_DIRECTORY_BACKEND` defaults to
`"postgres"`. The accepted values are `postgres` (production) and
`stub` (in-process for unit tests that do not need a real PostgreSQL).
The `redis` value, the corresponding `RaceNameDirectoryBackendRedis`
constant, and the wiring branch are removed.

**Why.** The Redis adapter is gone; keeping the value in the validator
would produce a misleading "configuration accepted, but startup fails
when wiring resolves the directory" path. Leaving `stub` as a valid
backend lets per-service unit tests run against a small, fast
in-process directory; integration suites use `postgres` via the
testcontainers harness.

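The validator behaviour described in decision 19 could look like the following sketch. `resolveBackend` is a hypothetical function for illustration; the real config loader and its error wording are not shown in this commit.

```go
package main

import "fmt"

// resolveBackend maps the raw LOBBY_RACE_NAME_DIRECTORY_BACKEND value to
// an accepted backend. An empty value falls back to the default.
func resolveBackend(raw string) (string, error) {
	switch raw {
	case "":
		return "postgres", nil
	case "postgres", "stub":
		return raw, nil
	default:
		// "redis" lands here too: rejecting it at validation time
		// avoids a later failure when wiring resolves the directory.
		return "", fmt.Errorf("unsupported race name directory backend %q", raw)
	}
}

func main() {
	for _, raw := range []string{"", "stub", "redis"} {
		backend, err := resolveBackend(raw)
		fmt.Printf("%q -> %q, err=%v\n", raw, backend, err)
	}
}
```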
+47
-18
@@ -7,8 +7,23 @@ readiness, shutdown, and the handful of recovery paths specific to Lobby.

 Before starting the process, confirm:

-- `LOBBY_REDIS_ADDR` points to the Redis deployment used for state and the
-  five Lobby-related streams.
+- `LOBBY_REDIS_MASTER_ADDR` and `LOBBY_REDIS_PASSWORD` point to the Redis
+  deployment used for the runtime-coordination state that intentionally
+  stays on Redis: stream consumers/publishers, stream offsets, per-game
+  turn-stats aggregates, gap-activation timestamps, and the
+  capability-evaluation guard. The deprecated `LOBBY_REDIS_ADDR`,
+  `LOBBY_REDIS_USERNAME`, and `LOBBY_REDIS_TLS_ENABLED` env vars were
+  retired in PG_PLAN.md §6A; setting either of the latter two now fails
+  fast at startup.
+- `LOBBY_POSTGRES_PRIMARY_DSN` points to the PostgreSQL primary that
+  hosts the `lobby` schema. The DSN must include `search_path=lobby` and
+  `sslmode=disable`. Embedded goose migrations apply at startup before
+  any HTTP listener opens; a migration or ping failure terminates the
+  process with a non-zero exit. After PG_PLAN.md §6A the schema holds
+  `games`, `applications`, `invites`, `memberships`; after §6B it also
+  holds `race_names`. The schema and the `lobbyservice` role are
+  provisioned externally (operator init script in production, the
+  testcontainers harness in tests).
 - `LOBBY_USER_SERVICE_BASE_URL` and `LOBBY_GM_BASE_URL` are reachable from
   the network the Lobby pods run in. Lobby does not ping these at boot,
   but transport failures against them will surface as request errors.
@@ -19,11 +34,13 @@ Before starting the process, confirm:
 - `LOBBY_RUNTIME_JOB_RESULTS_STREAM` (default `runtime:job_results`)
 - `LOBBY_USER_LIFECYCLE_STREAM` (default `user:lifecycle_events`)
 - `LOBBY_NOTIFICATION_INTENTS_STREAM` (default `notification:intents`)
-- `LOBBY_RACE_NAME_DIRECTORY_BACKEND` is `redis` for production; the
-  `stub` value is only for unit tests.
+- `LOBBY_RACE_NAME_DIRECTORY_BACKEND` is `postgres` for production
+  (the default after PG_PLAN.md §6B); the `stub` value is only for
+  unit tests that do not need a real PostgreSQL.

-At startup the process performs a bounded `PING` against Redis. Startup
-fails fast if the ping fails. There are no liveness checks against User
+At startup the process opens the PostgreSQL pool, applies migrations,
+pings PostgreSQL, then opens the Redis client and pings Redis. Startup
+fails fast if any step fails. There are no liveness checks against User
 Service or Game Master at boot; those are surfaced at request time.

 Expected listener state after a healthy start:
@@ -160,11 +177,15 @@ is reachable again.
 To inspect the backlog:

 ```bash
-redis-cli ZRANGE lobby:race_names:pending_index 0 -1 WITHSCORES
+psql -c "SELECT canonical_key, game_id, holder_user_id, eligible_until_ms
+         FROM lobby.race_names
+         WHERE binding_kind = 'pending_registration'
+         ORDER BY eligible_until_ms ASC"
 ```

-Entries with `score < now()` (Unix milliseconds) are expirable on the next
-tick.
+Rows whose `eligible_until_ms` is at or below `extract(epoch from now()) * 1000`
+are expirable on the next tick. The partial index
+`race_names_pending_eligible_idx` keeps this scan cheap.

 ## Cascade Release Operator Notes

@@ -195,26 +216,34 @@ out-of-band.

 ## Diagnostic Queries

-A handful of Redis CLI snippets help during incidents:
+Durable enrollment state and Race Name Directory bindings live in
+PostgreSQL; runtime coordination state stays in Redis. A handful of CLI
+snippets help during incidents:

 ```bash
-# Live game count by status
-redis-cli ZCARD lobby:games_by_status:enrollment_open
-redis-cli ZCARD lobby:games_by_status:running
+# Live game count by status (PostgreSQL)
+psql -c "SELECT status, COUNT(*) FROM lobby.games GROUP BY status"

 # Inspect a specific game record
-redis-cli GET lobby:games:<game_id>
+psql -c "SELECT * FROM lobby.games WHERE game_id = '<game_id>'"

 # Member roster for a game
-redis-cli SMEMBERS lobby:game_memberships:<game_id>
+psql -c "SELECT user_id, race_name, status, joined_at
+         FROM lobby.memberships
+         WHERE game_id = '<game_id>'
+         ORDER BY joined_at"

 # Race name pending entries (oldest first)
-redis-cli ZRANGE lobby:race_names:pending_index 0 -1 WITHSCORES
+psql -c "SELECT canonical_key, game_id, holder_user_id, eligible_until_ms
+         FROM lobby.race_names
+         WHERE binding_kind = 'pending_registration'
+         ORDER BY eligible_until_ms ASC"

-# Stream lag inspection
+# Stream lag inspection (Redis)
 redis-cli XINFO STREAM gm:lobby_events
 redis-cli GET lobby:stream_offsets:gm_events
 ```

 The gauges and counters surfaced through OpenTelemetry are the primary
-observability surface; raw Redis access is for last-resort triage.
+observability surface; raw PostgreSQL and Redis access is for last-resort
+triage.

+19
-11
@@ -56,9 +56,10 @@ flowchart LR

 Notes:

-- `cmd/lobby` refuses startup when Redis connectivity is misconfigured. User
-  Service and Game Master reachability are not verified at boot; transport
-  failures surface as request errors.
+- `cmd/lobby` refuses startup when Redis connectivity is misconfigured, when
+  PostgreSQL is unreachable, or when the embedded goose migrations fail to
+  apply. User Service and Game Master reachability are not verified at boot;
+  transport failures surface as request errors.
 - Both HTTP listeners expose `/healthz` and `/readyz` independently so health
   checks can target either port.
 - `register-runtime` is an outgoing call from Lobby to Game Master after the
@@ -85,7 +86,7 @@ Probe routes:

 - `GET /healthz` returns `{"status":"ok"}`
 - `GET /readyz` returns `{"status":"ready"}` once startup wiring completes.
-- Neither probe performs a live Redis ping per request.
+- Neither probe performs a live Redis or PostgreSQL ping per request.
 - There is no `/metrics` route. Metrics flow through OpenTelemetry exporters.

 ## Background Workers
@@ -130,13 +131,20 @@ lags or stalls, the gauge climbs and stays high.
 The full env-var list with defaults lives in `../README.md` §Configuration.
 The groups below summarize the structure:

-- **Required** — `LOBBY_REDIS_ADDR`, `LOBBY_USER_SERVICE_BASE_URL`,
+- **Required** — `LOBBY_REDIS_MASTER_ADDR`, `LOBBY_REDIS_PASSWORD`,
+  `LOBBY_POSTGRES_PRIMARY_DSN`, `LOBBY_USER_SERVICE_BASE_URL`,
   `LOBBY_GM_BASE_URL`.
 - **Process and logging** — `LOBBY_SHUTDOWN_TIMEOUT`, `LOBBY_LOG_LEVEL`.
 - **HTTP listeners** — `LOBBY_PUBLIC_HTTP_*`, `LOBBY_INTERNAL_HTTP_*`.
-- **Redis connectivity** — `LOBBY_REDIS_USERNAME`, `LOBBY_REDIS_PASSWORD`,
-  `LOBBY_REDIS_DB`, `LOBBY_REDIS_TLS_ENABLED`,
-  `LOBBY_REDIS_OPERATION_TIMEOUT`.
+- **Redis connectivity** — `LOBBY_REDIS_MASTER_ADDR`,
+  `LOBBY_REDIS_REPLICA_ADDRS`, `LOBBY_REDIS_PASSWORD`, `LOBBY_REDIS_DB`,
+  `LOBBY_REDIS_OPERATION_TIMEOUT` (legacy `LOBBY_REDIS_ADDR`,
+  `LOBBY_REDIS_TLS_ENABLED`, `LOBBY_REDIS_USERNAME` removed in PG_PLAN.md
+  §6A).
+- **PostgreSQL connectivity** — `LOBBY_POSTGRES_PRIMARY_DSN`,
+  `LOBBY_POSTGRES_REPLICA_DSNS`, `LOBBY_POSTGRES_OPERATION_TIMEOUT`,
+  `LOBBY_POSTGRES_MAX_OPEN_CONNS`, `LOBBY_POSTGRES_MAX_IDLE_CONNS`,
+  `LOBBY_POSTGRES_CONN_MAX_LIFETIME`.
 - **Streams** — `LOBBY_GM_EVENTS_STREAM`, `LOBBY_RUNTIME_START_JOBS_STREAM`,
   `LOBBY_RUNTIME_STOP_JOBS_STREAM`, `LOBBY_RUNTIME_JOB_RESULTS_STREAM`,
   `LOBBY_NOTIFICATION_INTENTS_STREAM`, `LOBBY_USER_LIFECYCLE_STREAM`.
@@ -152,9 +160,9 @@ The groups below summarize the structure:

 - `Game Lobby` owns platform game state. Game Master may cache snapshots but
   is not the source of truth.
-- The Race Name Directory ships a Redis adapter and an in-process stub; the
-  stub is intended for unit tests and is selected via
-  `LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub`.
+- The Race Name Directory ships a PostgreSQL adapter (default after
+  PG_PLAN.md §6B) and an in-process stub. The stub is intended for unit
+  tests and is selected via `LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub`.
 - A `permanent_block` or `deleted` event from User Service fans out
   asynchronously through the `user:lifecycle_events` consumer; in-flight
   games owned by the affected user receive a stop-job and transition to