# Runtime and Components

The diagram below focuses on the deployed `galaxy/lobby` process and its runtime dependencies.

```mermaid
flowchart LR
    subgraph Clients
        Gateway["Edge Gateway"]
        Admin["Admin Service"]
        GM["Game Master"]
    end

    subgraph Lobby["Game Lobby process"]
        PublicHTTP["Public HTTP listener\n:8094 /healthz /readyz"]
        InternalHTTP["Internal HTTP listener\n:8095 /healthz /readyz"]
        EnrollAuto["Enrollment automation worker"]
        RTJobsConsumer["runtime:job_results consumer"]
        GMEventsConsumer["gm:lobby_events consumer"]
        PendingExpirer["Pending registration expirer"]
        ULConsumer["user:lifecycle_events consumer"]
        IntentPublisher["notification:intents publisher"]
        Telemetry["Logs, traces, metrics"]
    end

    User["User Service"]
    Redis["Redis\nKV + Streams"]

    Gateway --> PublicHTTP
    Admin --> InternalHTTP
    GM --> InternalHTTP
    PublicHTTP --> User
    InternalHTTP --> User
    PublicHTTP -. register-runtime .-> GM
    InternalHTTP -. register-runtime .-> GM
    EnrollAuto --> Redis
    RTJobsConsumer --> Redis
    GMEventsConsumer --> Redis
    PendingExpirer --> Redis
    ULConsumer --> Redis
    IntentPublisher --> Redis
    PublicHTTP --> Redis
    InternalHTTP --> Redis
    PublicHTTP --> Telemetry
    InternalHTTP --> Telemetry
    EnrollAuto --> Telemetry
    RTJobsConsumer --> Telemetry
    GMEventsConsumer --> Telemetry
    PendingExpirer --> Telemetry
    ULConsumer --> Telemetry
```

Notes:

- `cmd/lobby` refuses startup when Redis connectivity is misconfigured, when PostgreSQL is unreachable, or when the embedded goose migrations fail to apply. User Service and Game Master reachability are not verified at boot; transport failures surface as request errors.
- Both HTTP listeners expose `/healthz` and `/readyz` independently so health checks can target either port.
- `register-runtime` is an outgoing call from Lobby to Game Master after the container start completes. Lobby does not expose an inbound endpoint of the same name.
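The fail-fast startup rule in the notes above can be pictured as an ordered list of boot-time checks that stops at the first failure. The following is only a minimal sketch under assumptions: `startupCheck`, `runStartupChecks`, and `errUnreachable` are hypothetical names, and the check bodies are stubs rather than real Redis, PostgreSQL, or goose calls. It is not taken from `cmd/lobby`.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable is a hypothetical error used to simulate a failed dependency.
var errUnreachable = errors.New("connection refused")

// startupCheck is a hypothetical name for one boot-time precondition.
type startupCheck struct {
	name string
	run  func() error
}

// runStartupChecks runs the checks in order and returns the first failure,
// so the process can exit before any listener or worker is started.
func runStartupChecks(checks []startupCheck) error {
	for _, c := range checks {
		if err := c.run(); err != nil {
			return fmt.Errorf("startup check %q failed: %w", c.name, err)
		}
	}
	return nil
}

func main() {
	checks := []startupCheck{
		{name: "redis config", run: func() error { return nil }},
		{name: "postgres ping", run: func() error { return errUnreachable }},
		{name: "goose migrations", run: func() error { return nil }},
		// Deliberately no check for User Service or Game Master:
		// per the notes, their reachability is not verified at boot.
	}

	if err := runStartupChecks(checks); err != nil {
		fmt.Println("refusing to start:", err)
		return
	}
	fmt.Println("listeners starting")
}
```

With the simulated PostgreSQL failure, the sketch prints `refusing to start: startup check "postgres ping" failed: connection refused` and never reaches the migration step.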
## Listeners

| Listener | Default addr | Purpose |
| --- | --- | --- |
| Public HTTP | `:8094` | Authenticated user routes; gateway-facing |
| Internal HTTP | `:8095` | Admin-mirrored routes + Game Master read paths |

Shared listener defaults:

- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`

Public-port routes carry an `X-User-ID` header injected by Edge Gateway; internal-port routes admit the admin actor without the header.

Probe routes:

- `GET /healthz` returns `{"status":"ok"}`
- `GET /readyz` returns `{"status":"ready"}` once startup wiring completes.
- Neither probe performs a live Redis or PostgreSQL ping per request.
- There is no `/metrics` route. Metrics flow through OpenTelemetry exporters.

## Background Workers

| Worker | Trigger | Function |
| --- | --- | --- |
| Enrollment automation | Periodic tick (`LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`) | Closes enrollment when the deadline or the gap window is exhausted. |
| `runtime:job_results` consumer | Redis `XREAD` | Drives `starting` to `running`/`paused`/`start_failed` based on Runtime Manager outcomes. |
| `gm:lobby_events` consumer | Redis `XREAD` | Applies runtime snapshot updates and game-finish events from Game Master; hands `game_finished` events off to capability evaluation. |
| Pending registration expirer | Periodic tick (`LOBBY_RACE_NAME_EXPIRATION_INTERVAL`) | Releases `pending_registration` entries past their 30-day window. |
| `user:lifecycle_events` consumer | Redis `XREAD` | Fans out the cascade for `permanent_blocked` and `deleted` user events (RND release, membership block, application/invite cancel, owned-game cancel). |
| `notification:intents` publisher | Synchronous from services | Wraps every notification publish with metric instrumentation; producer-side failures degrade notifications without rolling back business state. |

## Synchronous Upstream Clients

| Client | Endpoint | Failure mapping |
| --- | --- | --- |
| `User Service` eligibility | `POST {LOBBY_USER_SERVICE_BASE_URL}/api/v1/internal/users/{user_id}/lobby-eligibility` | Network or non-2xx → `503 service_unavailable`; `permanent_block` → `404 subject_not_found`. |
| `Game Master` register-runtime | `POST {LOBBY_GM_BASE_URL}/api/v1/internal/games/{game_id}/register-runtime` | Network or non-2xx → forced-pause path (`paused` + `lobby.runtime_paused_after_start`). |
| `Game Master` liveness probe | `GET {LOBBY_GM_BASE_URL}/api/v1/internal/healthz` | Used during `lobby.game.resume`; failure surfaces as `503 service_unavailable`. |

## Stream Offsets

Each consumer persists its position under a dedicated key so process restart preserves stream progress.

| Stream | Offset key | Read block timeout env |
| --- | --- | --- |
| `gm:lobby_events` | `lobby:stream_offsets:gm_events` | `LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT` |
| `runtime:job_results` | `lobby:stream_offsets:runtime_results` | `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` |
| `user:lifecycle_events` | `lobby:stream_offsets:user_lifecycle` | `LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT` |

Stream lag is exposed through observable gauges `lobby.gm_events.oldest_unprocessed_age_ms`, `lobby.runtime_results.oldest_unprocessed_age_ms`, and `lobby.user_lifecycle.oldest_unprocessed_age_ms`. The probe samples the oldest entry whose ID is greater than the persisted offset; when a consumer lags or stalls, the gauge climbs and stays high.

## Configuration Groups

The full env-var list with defaults lives in `../README.md` §Configuration. The groups below summarize the structure:

- **Required** — `LOBBY_REDIS_MASTER_ADDR`, `LOBBY_REDIS_PASSWORD`, `LOBBY_POSTGRES_PRIMARY_DSN`, `LOBBY_USER_SERVICE_BASE_URL`, `LOBBY_GM_BASE_URL`.
- **Process and logging** — `LOBBY_SHUTDOWN_TIMEOUT`, `LOBBY_LOG_LEVEL`.
- **HTTP listeners** — `LOBBY_PUBLIC_HTTP_*`, `LOBBY_INTERNAL_HTTP_*`.
- **Redis connectivity** — `LOBBY_REDIS_MASTER_ADDR`, `LOBBY_REDIS_REPLICA_ADDRS`, `LOBBY_REDIS_PASSWORD`, `LOBBY_REDIS_DB`, `LOBBY_REDIS_OPERATION_TIMEOUT` (legacy `LOBBY_REDIS_ADDR`, `LOBBY_REDIS_TLS_ENABLED`, `LOBBY_REDIS_USERNAME` removed in PG_PLAN.md §6A).
- **PostgreSQL connectivity** — `LOBBY_POSTGRES_PRIMARY_DSN`, `LOBBY_POSTGRES_REPLICA_DSNS`, `LOBBY_POSTGRES_OPERATION_TIMEOUT`, `LOBBY_POSTGRES_MAX_OPEN_CONNS`, `LOBBY_POSTGRES_MAX_IDLE_CONNS`, `LOBBY_POSTGRES_CONN_MAX_LIFETIME`.
- **Streams** — `LOBBY_GM_EVENTS_STREAM`, `LOBBY_RUNTIME_START_JOBS_STREAM`, `LOBBY_RUNTIME_STOP_JOBS_STREAM`, `LOBBY_RUNTIME_JOB_RESULTS_STREAM`, `LOBBY_NOTIFICATION_INTENTS_STREAM`, `LOBBY_USER_LIFECYCLE_STREAM`.
- **Upstream clients** — `LOBBY_USER_SERVICE_TIMEOUT`, `LOBBY_GM_TIMEOUT`.
- **Workers** — `LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`, `LOBBY_RACE_NAME_EXPIRATION_INTERVAL`, `LOBBY_RACE_NAME_DIRECTORY_BACKEND`.
- **Telemetry** — standard `OTEL_*` plus `LOBBY_OTEL_STDOUT_TRACES_ENABLED`, `LOBBY_OTEL_STDOUT_METRICS_ENABLED`.

## Runtime Notes

- `Game Lobby` owns platform game state. Game Master may cache snapshots but is not the source of truth.
- The Race Name Directory ships a PostgreSQL adapter (default after PG_PLAN.md §6B) and an in-process implementation in `lobby/internal/adapters/racenameinmem/`. The in-memory backend is intended for unit tests and small local deployments and is selected via `LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub` (the config token name is preserved for backward compatibility).
- A `permanent_block` or `deleted` event from User Service fans out asynchronously through the `user:lifecycle_events` consumer; in-flight games owned by the affected user receive a stop-job and transition to `cancelled` via the `external_block` trigger.
- `notification:intents` publishes are best-effort: a failed publish is logged and counted but does not roll back the committed business state.
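The best-effort publish rule in the last note can be illustrated with a small sketch: commit the state change first, then attempt the publish, and on failure only log and count it. Everything here is an assumption made for illustration: `intentPublisher`, `countingPublisher`, `failingPublisher`, `gameStore`, and `forcePause` are hypothetical names, the map stands in for the real store, and `lobby.runtime_paused_after_start` is treated as an intent type purely as an example.

```go
package main

import (
	"errors"
	"fmt"
)

// intentPublisher is a hypothetical interface over the notification:intents stream.
type intentPublisher interface {
	Publish(intent string) error
}

// countingPublisher wraps another publisher and counts outcomes.
// The plain counters stand in for real metric instruments.
type countingPublisher struct {
	next      intentPublisher
	published int
	failed    int
}

func (p *countingPublisher) Publish(intent string) error {
	if err := p.next.Publish(intent); err != nil {
		p.failed++
		return err
	}
	p.published++
	return nil
}

// failingPublisher simulates the stream being unavailable.
type failingPublisher struct{}

func (failingPublisher) Publish(string) error {
	return errors.New("redis unavailable")
}

// gameStore is an in-memory stand-in for committed business state.
type gameStore map[string]string

// forcePause commits the status change first, then publishes best-effort.
// A publish failure is logged but never undoes the committed status.
func forcePause(store gameStore, pub intentPublisher, gameID string) error {
	store[gameID] = "paused" // committed business state

	if err := pub.Publish("lobby.runtime_paused_after_start"); err != nil {
		fmt.Println("notification publish failed (state kept):", err)
	}
	return nil
}

func main() {
	store := gameStore{}
	pub := &countingPublisher{next: failingPublisher{}}

	_ = forcePause(store, pub, "game-1")
	fmt.Println("status:", store["game-1"], "failed publishes:", pub.failed)
}
```

Keeping the counter in a wrapper rather than in each caller mirrors the description of the publisher as the single place where publishes are instrumented, though the real wiring may differ.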