# Runtime and Components

The diagram below focuses on the deployed `galaxy/lobby` process and its runtime dependencies.

```mermaid
flowchart LR
    subgraph Clients
        Gateway["Edge Gateway"]
        Admin["Admin Service"]
        GM["Game Master"]
    end

    subgraph Lobby["Game Lobby process"]
        PublicHTTP["Public HTTP listener\n:8094 /healthz /readyz"]
        InternalHTTP["Internal HTTP listener\n:8095 /healthz /readyz"]
        EnrollAuto["Enrollment automation worker"]
        RTJobsConsumer["runtime:job_results consumer"]
        GMEventsConsumer["gm:lobby_events consumer"]
        PendingExpirer["Pending registration expirer"]
        ULConsumer["user:lifecycle_events consumer"]
        IntentPublisher["notification:intents publisher"]
        Telemetry["Logs, traces, metrics"]
    end

    User["User Service"]
    Redis["Redis\nKV + Streams"]

    Gateway --> PublicHTTP
    Admin --> InternalHTTP
    GM --> InternalHTTP
    PublicHTTP --> User
    InternalHTTP --> User
    PublicHTTP -. register-runtime .-> GM
    InternalHTTP -. register-runtime .-> GM
    EnrollAuto --> Redis
    RTJobsConsumer --> Redis
    GMEventsConsumer --> Redis
    PendingExpirer --> Redis
    ULConsumer --> Redis
    IntentPublisher --> Redis
    PublicHTTP --> Redis
    InternalHTTP --> Redis
    PublicHTTP --> Telemetry
    InternalHTTP --> Telemetry
    EnrollAuto --> Telemetry
    RTJobsConsumer --> Telemetry
    GMEventsConsumer --> Telemetry
    PendingExpirer --> Telemetry
    ULConsumer --> Telemetry
```

Notes:

- `cmd/lobby` refuses startup when Redis connectivity is misconfigured. User Service and Game Master reachability are not verified at boot; transport failures surface as request errors.
- Both HTTP listeners expose `/healthz` and `/readyz` independently so health checks can target either port.
- `register-runtime` is an outgoing call from Lobby to Game Master after the container start completes. Lobby does not expose an inbound endpoint of the same name.

## Listeners

| Listener | Default addr | Purpose |
| --- | --- | --- |
| Public HTTP | `:8094` | Authenticated user routes; gateway-facing |
| Internal HTTP | `:8095` | Admin-mirrored routes + Game Master read paths |

Shared listener defaults:

- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`

Public-port routes carry an `X-User-ID` header injected by Edge Gateway; internal-port routes admit the admin actor without the header.

Probe routes:

- `GET /healthz` returns `{"status":"ok"}`
- `GET /readyz` returns `{"status":"ready"}` once startup wiring completes.
- Neither probe performs a live Redis ping per request.
- There is no `/metrics` route. Metrics flow through OpenTelemetry exporters.

## Background Workers

| Worker | Trigger | Function |
| --- | --- | --- |
| Enrollment automation | Periodic tick (`LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`) | Closes enrollment when the deadline or the gap window is exhausted. |
| `runtime:job_results` consumer | Redis `XREAD` | Drives `starting` to `running`/`paused`/`start_failed` based on Runtime Manager outcomes. |
| `gm:lobby_events` consumer | Redis `XREAD` | Applies runtime snapshot updates and game-finish events from Game Master; hands `game_finished` events off to capability evaluation. |
| Pending registration expirer | Periodic tick (`LOBBY_RACE_NAME_EXPIRATION_INTERVAL`) | Releases `pending_registration` entries past their 30-day window. |
| `user:lifecycle_events` consumer | Redis `XREAD` | Fans out the cascade for `permanent_blocked` and `deleted` user events (RND release, membership block, application/invite cancel, owned-game cancel). |
| `notification:intents` publisher | Synchronous from services | Wraps every notification publish with metric instrumentation; producer-side failures degrade notifications without rolling back business state. |

## Synchronous Upstream Clients

| Client | Endpoint | Failure mapping |
| --- | --- | --- |
| `User Service` eligibility | `POST {LOBBY_USER_SERVICE_BASE_URL}/api/v1/internal/users/{user_id}/lobby-eligibility` | Network or non-2xx → `503 service_unavailable`; `permanent_block` → `404 subject_not_found`. |
| `Game Master` register-runtime | `POST {LOBBY_GM_BASE_URL}/api/v1/internal/games/{game_id}/register-runtime` | Network or non-2xx → forced-pause path (`paused` + `lobby.runtime_paused_after_start`). |
| `Game Master` liveness probe | `GET {LOBBY_GM_BASE_URL}/api/v1/internal/healthz` | Used during `lobby.game.resume`; failure surfaces as `503 service_unavailable`. |

## Stream Offsets

Each consumer persists its position under a dedicated key so process restart preserves stream progress.

| Stream | Offset key | Read block timeout env |
| --- | --- | --- |
| `gm:lobby_events` | `lobby:stream_offsets:gm_events` | `LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT` |
| `runtime:job_results` | `lobby:stream_offsets:runtime_results` | `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` |
| `user:lifecycle_events` | `lobby:stream_offsets:user_lifecycle` | `LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT` |

Stream lag is exposed through observable gauges `lobby.gm_events.oldest_unprocessed_age_ms`, `lobby.runtime_results.oldest_unprocessed_age_ms`, and `lobby.user_lifecycle.oldest_unprocessed_age_ms`. The probe samples the oldest entry whose ID is greater than the persisted offset; when a consumer lags or stalls, the gauge climbs and stays high.

## Configuration Groups

The full env-var list with defaults lives in `../README.md` §Configuration. The groups below summarize the structure:

- **Required** — `LOBBY_REDIS_ADDR`, `LOBBY_USER_SERVICE_BASE_URL`, `LOBBY_GM_BASE_URL`.
- **Process and logging** — `LOBBY_SHUTDOWN_TIMEOUT`, `LOBBY_LOG_LEVEL`.
- **HTTP listeners** — `LOBBY_PUBLIC_HTTP_*`, `LOBBY_INTERNAL_HTTP_*`.
- **Redis connectivity** — `LOBBY_REDIS_USERNAME`, `LOBBY_REDIS_PASSWORD`, `LOBBY_REDIS_DB`, `LOBBY_REDIS_TLS_ENABLED`, `LOBBY_REDIS_OPERATION_TIMEOUT`.
- **Streams** — `LOBBY_GM_EVENTS_STREAM`, `LOBBY_RUNTIME_START_JOBS_STREAM`, `LOBBY_RUNTIME_STOP_JOBS_STREAM`, `LOBBY_RUNTIME_JOB_RESULTS_STREAM`, `LOBBY_NOTIFICATION_INTENTS_STREAM`, `LOBBY_USER_LIFECYCLE_STREAM`.
- **Upstream clients** — `LOBBY_USER_SERVICE_TIMEOUT`, `LOBBY_GM_TIMEOUT`.
- **Workers** — `LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`, `LOBBY_RACE_NAME_EXPIRATION_INTERVAL`, `LOBBY_RACE_NAME_DIRECTORY_BACKEND`.
- **Telemetry** — standard `OTEL_*` plus `LOBBY_OTEL_STDOUT_TRACES_ENABLED`, `LOBBY_OTEL_STDOUT_METRICS_ENABLED`.

## Runtime Notes

- `Game Lobby` owns platform game state. Game Master may cache snapshots but is not the source of truth.
- The Race Name Directory ships a Redis adapter and an in-process stub; the stub is intended for unit tests and is selected via `LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub`.
- A `permanent_block` or `deleted` event from User Service fans out asynchronously through the `user:lifecycle_events` consumer; in-flight games owned by the affected user receive a stop-job and transition to `cancelled` via the `external_block` trigger.
- `notification:intents` publishes are best-effort: a failed publish is logged and counted but does not roll back the committed business state.