# Runtime and Components

The diagram below focuses on the deployed `galaxy/lobby` process and its
runtime dependencies.

```mermaid
flowchart LR
    subgraph Clients
        Gateway["Edge Gateway"]
        Admin["Admin Service"]
        GM["Game Master"]
    end

    subgraph Lobby["Game Lobby process"]
        PublicHTTP["Public HTTP listener\n:8094 /healthz /readyz"]
        InternalHTTP["Internal HTTP listener\n:8095 /healthz /readyz"]
        EnrollAuto["Enrollment automation worker"]
        RTJobsConsumer["runtime:job_results consumer"]
        GMEventsConsumer["gm:lobby_events consumer"]
        PendingExpirer["Pending registration expirer"]
        ULConsumer["user:lifecycle_events consumer"]
        IntentPublisher["notification:intents publisher"]
        Telemetry["Logs, traces, metrics"]
    end

    User["User Service"]
    Redis["Redis\nKV + Streams"]

    Gateway --> PublicHTTP
    Admin --> InternalHTTP
    GM --> InternalHTTP

    PublicHTTP --> User
    InternalHTTP --> User
    PublicHTTP -. register-runtime .-> GM
    InternalHTTP -. register-runtime .-> GM

    EnrollAuto --> Redis
    RTJobsConsumer --> Redis
    GMEventsConsumer --> Redis
    PendingExpirer --> Redis
    ULConsumer --> Redis
    IntentPublisher --> Redis

    PublicHTTP --> Redis
    InternalHTTP --> Redis

    PublicHTTP --> Telemetry
    InternalHTTP --> Telemetry
    EnrollAuto --> Telemetry
    RTJobsConsumer --> Telemetry
    GMEventsConsumer --> Telemetry
    PendingExpirer --> Telemetry
    ULConsumer --> Telemetry
```

Notes:

- `cmd/lobby` refuses startup when Redis connectivity is misconfigured, when
  PostgreSQL is unreachable, or when the embedded goose migrations fail to
  apply. User Service and Game Master reachability are not verified at boot;
  transport failures surface as request errors.
- Both HTTP listeners expose `/healthz` and `/readyz` independently so health
  checks can target either port.
- `register-runtime` is an outgoing call from Lobby to Game Master after the
  container start completes. Lobby does not expose an inbound endpoint of the
  same name.
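
The fail-fast boot behaviour in the first note can be pictured as an ordered list of checks where the first failure aborts startup. The following is a minimal sketch under that assumption, not the actual `cmd/lobby` wiring; `startupCheck`, `runStartupChecks`, and `errUnreachable` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable is a placeholder failure for the example checks (hypothetical).
var errUnreachable = errors.New("dependency unreachable")

// startupCheck is a hypothetical named boot-time check, for example
// "redis", "postgres", or "migrations".
type startupCheck struct {
	name string
	run  func() error
}

// runStartupChecks runs the checks in order and stops at the first failure,
// so the process can refuse to start instead of serving in a broken state.
// User Service and Game Master would deliberately not appear in this list,
// because their reachability is not verified at boot.
func runStartupChecks(checks []startupCheck) error {
	for _, c := range checks {
		if err := c.run(); err != nil {
			return fmt.Errorf("startup check %q failed: %w", c.name, err)
		}
	}
	return nil
}
```

A caller would typically log the returned error and exit with a non-zero status.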

## Listeners

| Listener | Default addr | Purpose |
| --- | --- | --- |
| Public HTTP | `:8094` | Authenticated user routes; gateway-facing |
| Internal HTTP | `:8095` | Admin-mirrored routes + Game Master read paths |

Shared listener defaults:

- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`

Public-port routes carry an `X-User-ID` header injected by Edge Gateway;
internal-port routes admit the admin actor without the header.

Probe routes:

- `GET /healthz` returns `{"status":"ok"}`.
- `GET /readyz` returns `{"status":"ready"}` once startup wiring completes.
- Neither probe performs a live Redis or PostgreSQL ping per request.
- There is no `/metrics` route. Metrics flow through OpenTelemetry exporters.
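
The listener defaults and probe routes above could be wired with the standard `net/http` package roughly as follows. This is a sketch, not the service's code: `newListener`, `newProbeMux`, and `probeBody` are hypothetical names, and the `503` response before readiness is an assumption, since the source only specifies the ready response.

```go
package main

import (
	"net/http"
	"sync/atomic"
	"time"
)

// probeBody returns the status code and JSON body for a probe path.
// It never touches Redis or PostgreSQL.
func probeBody(path string, ready bool) (int, string) {
	if path == "/healthz" {
		return http.StatusOK, `{"status":"ok"}`
	}
	if ready {
		return http.StatusOK, `{"status":"ready"}`
	}
	// Assumption: the pre-readiness response is not documented;
	// 503 is used here as a common convention.
	return http.StatusServiceUnavailable, `{"status":"not_ready"}`
}

// newProbeMux registers /healthz and /readyz. The caller flips ready to
// true once startup wiring completes.
func newProbeMux(ready *atomic.Bool) *http.ServeMux {
	mux := http.NewServeMux()
	for _, path := range []string{"/healthz", "/readyz"} {
		path := path // capture per iteration for older Go versions
		mux.HandleFunc(path, func(w http.ResponseWriter, r *http.Request) {
			if r.Method != http.MethodGet {
				w.WriteHeader(http.StatusMethodNotAllowed)
				return
			}
			code, body := probeBody(path, ready.Load())
			w.Header().Set("Content-Type", "application/json")
			w.WriteHeader(code)
			_, _ = w.Write([]byte(body))
		})
	}
	return mux
}

// newListener applies the shared listener defaults from the list above.
func newListener(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           h,
		ReadHeaderTimeout: 2 * time.Second,
		ReadTimeout:       10 * time.Second,
		IdleTimeout:       time.Minute,
	}
}
```

Under this sketch the public and internal listeners would each get their own `newListener(":8094", ...)` and `newListener(":8095", ...)` call, which is what lets health checks target either port.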

## Background Workers

| Worker | Trigger | Function |
| --- | --- | --- |
| Enrollment automation | Periodic tick (`LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`) | Closes enrollment when the deadline or the gap window is exhausted. |
| `runtime:job_results` consumer | Redis `XREAD` | Drives `starting` to `running`/`paused`/`start_failed` based on Runtime Manager outcomes. |
| `gm:lobby_events` consumer | Redis `XREAD` | Applies runtime snapshot updates and game-finish events from Game Master; hands `game_finished` events off to capability evaluation. |
| Pending registration expirer | Periodic tick (`LOBBY_RACE_NAME_EXPIRATION_INTERVAL`) | Releases `pending_registration` entries past their 30-day window. |
| `user:lifecycle_events` consumer | Redis `XREAD` | Fans out the cascade for `permanent_blocked` and `deleted` user events (Race Name Directory release, membership block, application/invite cancel, owned-game cancel). |
| `notification:intents` publisher | Synchronous from services | Wraps every notification publish with metric instrumentation; producer-side failures degrade notifications without rolling back business state. |
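
As an illustration of the tick-driven workers in the table, the sketch below pairs a generic ticker loop with the pending registration expiry rule. It is a sketch under stated assumptions: `runPeriodic`, `pendingRegistration`, `expiredNames`, and `exampleSweep` are hypothetical names, and measuring the 30-day window from a stored `Since` timestamp is an assumption about how the window is tracked.

```go
package main

import (
	"context"
	"time"
)

// pendingWindow is the 30-day window named in the table above.
const pendingWindow = 30 * 24 * time.Hour

// pendingRegistration is a hypothetical view of one pending entry.
type pendingRegistration struct {
	Name  string
	Since time.Time // assumed start of the 30-day window
}

// expiredNames returns the entries whose window has fully elapsed at now.
func expiredNames(entries []pendingRegistration, now time.Time) []string {
	var out []string
	for _, e := range entries {
		if now.Sub(e.Since) > pendingWindow {
			out = append(out, e.Name)
		}
	}
	return out
}

// runPeriodic calls sweep on every tick until ctx is cancelled. The interval
// would come from a setting such as LOBBY_RACE_NAME_EXPIRATION_INTERVAL.
func runPeriodic(ctx context.Context, interval time.Duration, sweep func(now time.Time)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-ticker.C:
			sweep(now)
		}
	}
}

// exampleSweep runs one sweep over two sample entries.
func exampleSweep() []string {
	now := time.Date(2024, 3, 1, 0, 0, 0, 0, time.UTC)
	entries := []pendingRegistration{
		{Name: "old", Since: now.Add(-31 * 24 * time.Hour)},
		{Name: "fresh", Since: now.Add(-24 * time.Hour)},
	}
	return expiredNames(entries, now)
}
```

In `exampleSweep`, only the entry that is 31 days old is selected for release; the one-day-old entry stays pending.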

## Synchronous Upstream Clients

| Client | Endpoint | Failure mapping |
| --- | --- | --- |
| `User Service` eligibility | `POST {LOBBY_USER_SERVICE_BASE_URL}/api/v1/internal/users/{user_id}/lobby-eligibility` | Network or non-2xx → `503 service_unavailable`; `permanent_block` → `404 subject_not_found`. |
| `Game Master` register-runtime | `POST {LOBBY_GM_BASE_URL}/api/v1/internal/games/{game_id}/register-runtime` | Network or non-2xx → forced-pause path (`paused` + `lobby.runtime_paused_after_start`). |
| `Game Master` liveness probe | `GET {LOBBY_GM_BASE_URL}/api/v1/internal/healthz` | Used during `lobby.game.resume`; failure surfaces as `503 service_unavailable`. |
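
The User Service row of the table can be read as a small pure mapping from the upstream outcome to the error returned to the caller. The sketch below illustrates that reading; `eligibilityResult`, `mapEligibilityFailure`, and `eligibilityURL` are hypothetical names, and the shape of the upstream response (a boolean `PermanentBlock` flag) is an assumption.

```go
package main

import (
	"net/http"
	"net/url"
	"strings"
)

// eligibilityURL builds the endpoint from the table, escaping the user ID.
func eligibilityURL(baseURL, userID string) string {
	return strings.TrimRight(baseURL, "/") +
		"/api/v1/internal/users/" + url.PathEscape(userID) + "/lobby-eligibility"
}

// eligibilityResult is a hypothetical summary of one upstream call.
type eligibilityResult struct {
	TransportFailed bool // network error, timeout, and similar
	StatusCode      int  // upstream HTTP status when the call completed
	PermanentBlock  bool // assumed flag decoded from a 2xx response body
}

// mapEligibilityFailure returns the HTTP status and error code Lobby would
// surface, or (0, "") when the user is eligible.
func mapEligibilityFailure(r eligibilityResult) (int, string) {
	switch {
	case r.TransportFailed || r.StatusCode < 200 || r.StatusCode > 299:
		return http.StatusServiceUnavailable, "service_unavailable"
	case r.PermanentBlock:
		return http.StatusNotFound, "subject_not_found"
	default:
		return 0, ""
	}
}
```

Keeping the mapping separate from the HTTP call makes each row of the table testable without a running User Service.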

## Stream Offsets

Each consumer persists its position under a dedicated key so that a process
restart preserves stream progress.

| Stream | Offset key | Read block timeout env |
| --- | --- | --- |
| `gm:lobby_events` | `lobby:stream_offsets:gm_events` | `LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT` |
| `runtime:job_results` | `lobby:stream_offsets:runtime_results` | `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` |
| `user:lifecycle_events` | `lobby:stream_offsets:user_lifecycle` | `LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT` |

Stream lag is exposed through the observable gauges
`lobby.gm_events.oldest_unprocessed_age_ms`,
`lobby.runtime_results.oldest_unprocessed_age_ms`, and
`lobby.user_lifecycle.oldest_unprocessed_age_ms`. The probe samples the
oldest entry whose ID is greater than the persisted offset; when a consumer
lags or stalls, the gauge climbs and stays high.
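
One way to derive such an age from a sampled entry is to read the millisecond prefix of its Redis stream ID, which for auto-generated IDs has the form `<ms>-<seq>`. The sketch below assumes auto-generated IDs and a caller-supplied current time; `oldestUnprocessedAgeMs` is a hypothetical helper, and the source does not state how the real probe computes the value.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// oldestUnprocessedAgeMs returns how old, in milliseconds, the entry with
// the given stream ID is at nowMs. Negative ages (clock skew) clamp to 0.
func oldestUnprocessedAgeMs(entryID string, nowMs int64) (int64, error) {
	msPart, _, found := strings.Cut(entryID, "-")
	if !found {
		return 0, fmt.Errorf("malformed stream ID %q", entryID)
	}
	ms, err := strconv.ParseInt(msPart, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("malformed stream ID %q: %w", entryID, err)
	}
	if age := nowMs - ms; age > 0 {
		return age, nil
	}
	return 0, nil
}
```

For example, an entry with ID `1700000000000-3` sampled at `1700000005000` yields an age of `5000` ms. If no entry exists past the persisted offset, a gauge built this way would presumably report `0`, which is also an assumption.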

## Configuration Groups

The full env-var list with defaults lives in `../README.md` §Configuration.
The groups below summarize the structure:

- **Required**: `LOBBY_REDIS_MASTER_ADDR`, `LOBBY_REDIS_PASSWORD`,
  `LOBBY_POSTGRES_PRIMARY_DSN`, `LOBBY_USER_SERVICE_BASE_URL`,
  `LOBBY_GM_BASE_URL`.
- **Process and logging**: `LOBBY_SHUTDOWN_TIMEOUT`, `LOBBY_LOG_LEVEL`.
- **HTTP listeners**: `LOBBY_PUBLIC_HTTP_*`, `LOBBY_INTERNAL_HTTP_*`.
- **Redis connectivity**: `LOBBY_REDIS_MASTER_ADDR`,
  `LOBBY_REDIS_REPLICA_ADDRS`, `LOBBY_REDIS_PASSWORD`, `LOBBY_REDIS_DB`,
  `LOBBY_REDIS_OPERATION_TIMEOUT`. The legacy `LOBBY_REDIS_ADDR`,
  `LOBBY_REDIS_TLS_ENABLED`, and `LOBBY_REDIS_USERNAME` variables were
  removed in PG_PLAN.md §6A.
- **PostgreSQL connectivity**: `LOBBY_POSTGRES_PRIMARY_DSN`,
  `LOBBY_POSTGRES_REPLICA_DSNS`, `LOBBY_POSTGRES_OPERATION_TIMEOUT`,
  `LOBBY_POSTGRES_MAX_OPEN_CONNS`, `LOBBY_POSTGRES_MAX_IDLE_CONNS`,
  `LOBBY_POSTGRES_CONN_MAX_LIFETIME`.
- **Streams**: `LOBBY_GM_EVENTS_STREAM`, `LOBBY_RUNTIME_START_JOBS_STREAM`,
  `LOBBY_RUNTIME_STOP_JOBS_STREAM`, `LOBBY_RUNTIME_JOB_RESULTS_STREAM`,
  `LOBBY_NOTIFICATION_INTENTS_STREAM`, `LOBBY_USER_LIFECYCLE_STREAM`.
- **Upstream clients**: `LOBBY_USER_SERVICE_TIMEOUT`, `LOBBY_GM_TIMEOUT`.
- **Workers**: `LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`,
  `LOBBY_RACE_NAME_EXPIRATION_INTERVAL`,
  `LOBBY_RACE_NAME_DIRECTORY_BACKEND`.
- **Telemetry**: standard `OTEL_*` plus
  `LOBBY_OTEL_STDOUT_TRACES_ENABLED`,
  `LOBBY_OTEL_STDOUT_METRICS_ENABLED`.
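
A loader for the **Required** group might validate all five variables up front and report every missing one at once. The sketch below shows that idea only; `requiredVars` and `missingRequired` are hypothetical names, and treating an empty value as missing is an assumption rather than documented behaviour.

```go
package main

// requiredVars lists the Required group from the section above.
var requiredVars = []string{
	"LOBBY_REDIS_MASTER_ADDR",
	"LOBBY_REDIS_PASSWORD",
	"LOBBY_POSTGRES_PRIMARY_DSN",
	"LOBBY_USER_SERVICE_BASE_URL",
	"LOBBY_GM_BASE_URL",
}

// missingRequired returns every required variable that lookup reports as
// unset or empty (assumption: empty counts as missing). The lookup
// parameter has the same signature as os.LookupEnv, so tests can pass a
// map-backed function instead of touching the process environment.
func missingRequired(lookup func(key string) (string, bool)) []string {
	var missing []string
	for _, key := range requiredVars {
		if value, ok := lookup(key); !ok || value == "" {
			missing = append(missing, key)
		}
	}
	return missing
}
```

In a real process this would be called as `missingRequired(os.LookupEnv)`, with a non-empty result turned into a startup error.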

## Runtime Notes

- `Game Lobby` owns platform game state. Game Master may cache snapshots but
  is not the source of truth.
- The Race Name Directory ships a PostgreSQL adapter (default after
  PG_PLAN.md §6B) and an in-process stub. The stub is intended for unit
  tests and is selected via `LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub`.
- A `permanent_block` or `deleted` event from User Service fans out
  asynchronously through the `user:lifecycle_events` consumer; in-flight
  games owned by the affected user receive a stop-job and transition to
  `cancelled` via the `external_block` trigger.
- `notification:intents` publishes are best-effort: a failed publish is
  logged and counted but does not roll back the committed business state.
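
The best-effort publish in the last note can be sketched as a wrapper that records failures without returning them to the caller. This is an illustration only: `instrumentedPublisher`, `PublishBestEffort`, and `errStreamDown` are hypothetical names, and the atomic counter stands in for whatever OpenTelemetry instrument the service actually uses.

```go
package main

import (
	"errors"
	"log"
	"sync/atomic"
)

// errStreamDown is a placeholder producer-side failure (hypothetical).
var errStreamDown = errors.New("notification stream unavailable")

// instrumentedPublisher wraps a publish function with failure accounting.
type instrumentedPublisher struct {
	publish  func(intent string) error
	failures atomic.Int64 // stand-in for a metrics counter
}

// PublishBestEffort is meant to be called after the business state has been
// committed. A failed publish is logged and counted, and no error is
// returned, so the caller has nothing to roll back.
func (p *instrumentedPublisher) PublishBestEffort(intent string) {
	if err := p.publish(intent); err != nil {
		p.failures.Add(1)
		log.Printf("notification intent %q not published: %v", intent, err)
	}
}
```

Because the method returns nothing, a service that calls it after committing a state change cannot accidentally tie the outcome of the change to the health of the notification stream.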