feat: use postgres
+47 -18
@@ -7,8 +7,23 @@ readiness, shutdown, and the handful of recovery paths specific to Lobby.
 
 Before starting the process, confirm:
 
-- `LOBBY_REDIS_ADDR` points to the Redis deployment used for state and the
-five Lobby-related streams.
+- `LOBBY_REDIS_MASTER_ADDR` and `LOBBY_REDIS_PASSWORD` point to the Redis
+deployment used for the runtime-coordination state that intentionally
+stays on Redis: stream consumers/publishers, stream offsets, per-game
+turn-stats aggregates, gap-activation timestamps, and the
+capability-evaluation guard. The deprecated `LOBBY_REDIS_ADDR`,
+`LOBBY_REDIS_USERNAME`, and `LOBBY_REDIS_TLS_ENABLED` env vars were
+retired in PG_PLAN.md §6A; setting either of the latter two now fails
+fast at startup.
+- `LOBBY_POSTGRES_PRIMARY_DSN` points to the PostgreSQL primary that
+hosts the `lobby` schema. The DSN must include `search_path=lobby` and
+`sslmode=disable`. Embedded goose migrations apply at startup before
+any HTTP listener opens; a migration or ping failure terminates the
+process with a non-zero exit. After PG_PLAN.md §6A the schema holds
+`games`, `applications`, `invites`, `memberships`; after §6B it also
+holds `race_names`. The schema and the `lobbyservice` role are
+provisioned externally (operator init script in production, the
+testcontainers harness in tests).
 - `LOBBY_USER_SERVICE_BASE_URL` and `LOBBY_GM_BASE_URL` are reachable from
 the network the Lobby pods run in. Lobby does not ping these at boot,
 but transport failures against them will surface as request errors.
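Review note: the checklist in this hunk could be rehearsed before a deploy with a small environment check. The sketch below is hypothetical and not part of the Lobby codebase. The function name `check_lobby_env` is invented here, it assumes "set" means "non-empty", and the DSN test is a coarse substring match (it would also accept, for example, `search_path=lobby_other`). It only inspects variables and never connects to PostgreSQL or Redis.

```shell
# Hypothetical preflight helper; mirrors the checklist, not the service's own
# validation. Prints one line per problem and returns non-zero if any exist.
# The body runs in a subshell so its scratch variables do not leak.
check_lobby_env() (
  problems=0
  dsn="${LOBBY_POSTGRES_PRIMARY_DSN:-}"

  if [ -z "$dsn" ]; then
    echo "LOBBY_POSTGRES_PRIMARY_DSN is not set"
    problems=$((problems + 1))
  else
    # The runbook requires both settings to appear in the DSN.
    for token in search_path=lobby sslmode=disable; do
      case "$dsn" in
        *"$token"*) ;;
        *) echo "LOBBY_POSTGRES_PRIMARY_DSN is missing $token"
           problems=$((problems + 1)) ;;
      esac
    done
  fi

  # Variables the runbook expects to be configured (assumed non-empty).
  for name in LOBBY_REDIS_MASTER_ADDR LOBBY_REDIS_PASSWORD; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || { echo "$name is not set"; problems=$((problems + 1)); }
  done

  # Retired variables that now make startup fail fast.
  for name in LOBBY_REDIS_USERNAME LOBBY_REDIS_TLS_ENABLED; do
    eval "value=\${$name:-}"
    [ -z "$value" ] || { echo "$name is retired and must be unset"; problems=$((problems + 1)); }
  done

  [ "$problems" -eq 0 ]
)
```

Running `check_lobby_env || exit 1` in a deploy script would stop the rollout before the process itself rejects the configuration.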
@@ -19,11 +34,13 @@ Before starting the process, confirm:
 - `LOBBY_RUNTIME_JOB_RESULTS_STREAM` (default `runtime:job_results`)
 - `LOBBY_USER_LIFECYCLE_STREAM` (default `user:lifecycle_events`)
 - `LOBBY_NOTIFICATION_INTENTS_STREAM` (default `notification:intents`)
-- `LOBBY_RACE_NAME_DIRECTORY_BACKEND` is `redis` for production; the
-`stub` value is only for unit tests.
+- `LOBBY_RACE_NAME_DIRECTORY_BACKEND` is `postgres` for production
+(the default after PG_PLAN.md §6B); the `stub` value is only for
+unit tests that do not need a real PostgreSQL.
 
-At startup the process performs a bounded `PING` against Redis. Startup
-fails fast if the ping fails. There are no liveness checks against User
+At startup the process opens the PostgreSQL pool, applies migrations,
+pings PostgreSQL, then opens the Redis client and pings Redis. Startup
+fails fast if any step fails. There are no liveness checks against User
 Service or Game Master at boot; those are surfaced at request time.
 
 Expected listener state after a healthy start:
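Review note: the new startup order (PostgreSQL first, then Redis, stopping at the first failure) can be approximated from an operator shell when reproducing a failed boot. This is a hypothetical sketch, not the service's startup code. `run_in_order`, `ping_postgres`, and `ping_redis` are names invented here, the probes assume `pg_isready` and `redis-cli` are installed and pick up their connection settings from the environment, and the migration step is not reproduced.

```shell
# run_in_order runs each named step in sequence and stops at the first
# failure, returning that step's exit status (fail-fast, like the service).
run_in_order() {
  for step in "$@"; do
    echo "step: $step"
    "$step" || { rc=$?; echo "failed: $step (exit $rc)" >&2; return "$rc"; }
  done
}

# Hypothetical external probes. pg_isready reads PGHOST/PGPORT and friends;
# redis-cli needs -h/-p (and credentials) if Redis is not on localhost.
ping_postgres() { pg_isready -q; }
ping_redis() { [ "$(redis-cli PING)" = "PONG" ]; }

# Usage (PostgreSQL before Redis, matching the documented order):
#   run_in_order ping_postgres ping_redis
```

If `ping_postgres` fails, `ping_redis` is never attempted, which matches the documented behavior that a PostgreSQL problem is reported before Redis is touched.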
@@ -160,11 +177,15 @@ is reachable again.
 To inspect the backlog:
 
 ```bash
-redis-cli ZRANGE lobby:race_names:pending_index 0 -1 WITHSCORES
+psql -c "SELECT canonical_key, game_id, holder_user_id, eligible_until_ms
+FROM lobby.race_names
+WHERE binding_kind = 'pending_registration'
+ORDER BY eligible_until_ms ASC"
 ```
 
-Entries with `score < now()` (Unix milliseconds) are expirable on the next
-tick.
+Rows whose `eligible_until_ms` is at or below `extract(epoch from now()) * 1000`
+are expirable on the next tick. The partial index
+`race_names_pending_eligible_idx` keeps this scan cheap.
 
 ## Cascade Release Operator Notes
 
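Review note: the "at or below now in Unix milliseconds" rule from this hunk can be checked client-side when reading the backlog output. The helpers below are a hypothetical sketch. `now_ms`, `is_expirable`, and `expirable_sql` are names invented here, `date +%s` is assumed to be available (GNU and BSD `date` both support it), and the clock is the operator's machine rather than the database server, so results near the boundary may differ from what the service sees.

```shell
# now_ms prints the current time as Unix milliseconds (second resolution).
now_ms() {
  echo $(( $(date +%s) * 1000 ))
}

# is_expirable ELIGIBLE_UNTIL_MS NOW_MS
# Succeeds when the deadline is at or below NOW_MS, as described above.
is_expirable() {
  [ "$1" -le "$2" ]
}

# expirable_sql NOW_MS
# Prints the backlog query narrowed to rows that are already expirable.
expirable_sql() {
  printf "SELECT canonical_key, game_id, holder_user_id, eligible_until_ms FROM lobby.race_names WHERE binding_kind = 'pending_registration' AND eligible_until_ms <= %s ORDER BY eligible_until_ms ASC\n" "$1"
}

# Usage:
#   psql -c "$(expirable_sql "$(now_ms)")"
```

Filtering in SQL keeps the comparison next to the same column the partial index covers; `is_expirable` is only a convenience for spot-checking a single value copied from the unfiltered output.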
@@ -195,26 +216,34 @@ out-of-band.
 
 ## Diagnostic Queries
 
-A handful of Redis CLI snippets help during incidents:
+Durable enrollment state and Race Name Directory bindings live in
+PostgreSQL; runtime coordination state stays in Redis. A handful of CLI
+snippets help during incidents:
 
 ```bash
-# Live game count by status
-redis-cli ZCARD lobby:games_by_status:enrollment_open
-redis-cli ZCARD lobby:games_by_status:running
+# Live game count by status (PostgreSQL)
+psql -c "SELECT status, COUNT(*) FROM lobby.games GROUP BY status"
 
 # Inspect a specific game record
-redis-cli GET lobby:games:<game_id>
+psql -c "SELECT * FROM lobby.games WHERE game_id = '<game_id>'"
 
 # Member roster for a game
-redis-cli SMEMBERS lobby:game_memberships:<game_id>
+psql -c "SELECT user_id, race_name, status, joined_at
+FROM lobby.memberships
+WHERE game_id = '<game_id>'
+ORDER BY joined_at"
 
 # Race name pending entries (oldest first)
-redis-cli ZRANGE lobby:race_names:pending_index 0 -1 WITHSCORES
+psql -c "SELECT canonical_key, game_id, holder_user_id, eligible_until_ms
+FROM lobby.race_names
+WHERE binding_kind = 'pending_registration'
+ORDER BY eligible_until_ms ASC"
 
-# Stream lag inspection
+# Stream lag inspection (Redis)
 redis-cli XINFO STREAM gm:lobby_events
 redis-cli GET lobby:stream_offsets:gm_events
 ```
 
 The gauges and counters surfaced through OpenTelemetry are the primary
-observability surface; raw Redis access is for last-resort triage.
+observability surface; raw PostgreSQL and Redis access is for last-resort
+triage.