Runtime and Components
The diagram below focuses on the deployed galaxy/lobby process and its
runtime dependencies.
```mermaid
flowchart LR
    subgraph Clients
        Gateway["Edge Gateway"]
        Admin["Admin Service"]
        GM["Game Master"]
    end

    subgraph Lobby["Game Lobby process"]
        PublicHTTP["Public HTTP listener\n:8094 /healthz /readyz"]
        InternalHTTP["Internal HTTP listener\n:8095 /healthz /readyz"]
        EnrollAuto["Enrollment automation worker"]
        RTJobsConsumer["runtime:job_results consumer"]
        GMEventsConsumer["gm:lobby_events consumer"]
        PendingExpirer["Pending registration expirer"]
        ULConsumer["user:lifecycle_events consumer"]
        IntentPublisher["notification:intents publisher"]
        Telemetry["Logs, traces, metrics"]
    end

    User["User Service"]
    Redis["Redis\nKV + Streams"]

    Gateway --> PublicHTTP
    Admin --> InternalHTTP
    GM --> InternalHTTP
    PublicHTTP --> User
    InternalHTTP --> User
    PublicHTTP -. register-runtime .-> GM
    InternalHTTP -. register-runtime .-> GM
    EnrollAuto --> Redis
    RTJobsConsumer --> Redis
    GMEventsConsumer --> Redis
    PendingExpirer --> Redis
    ULConsumer --> Redis
    IntentPublisher --> Redis
    PublicHTTP --> Redis
    InternalHTTP --> Redis
    PublicHTTP --> Telemetry
    InternalHTTP --> Telemetry
    EnrollAuto --> Telemetry
    RTJobsConsumer --> Telemetry
    GMEventsConsumer --> Telemetry
    PendingExpirer --> Telemetry
    ULConsumer --> Telemetry
```
Notes:
- `cmd/lobby` refuses startup when Redis connectivity is misconfigured. User Service and Game Master reachability are not verified at boot; transport failures surface as request errors.
- Both HTTP listeners expose `/healthz` and `/readyz` independently so health checks can target either port.
- `register-runtime` is an outgoing call from Lobby to Game Master after the container start completes. Lobby does not expose an inbound endpoint of the same name.
Listeners
| Listener | Default addr | Purpose |
|---|---|---|
| Public HTTP | `:8094` | Authenticated user routes; gateway-facing |
| Internal HTTP | `:8095` | Admin-mirrored routes + Game Master read paths |
Shared listener defaults:
- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`
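
The shared defaults above map directly onto the timeout fields of Go's standard `http.Server`. The following is a minimal sketch of that mapping, not the service's actual wiring; the `newListener` helper and its signature are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newListener builds an http.Server carrying the shared timeout defaults.
// The helper name and signature are hypothetical.
func newListener(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           handler,
		ReadHeaderTimeout: 2 * time.Second,  // read-header timeout
		ReadTimeout:       10 * time.Second, // read timeout
		IdleTimeout:       time.Minute,      // idle timeout
	}
}

func main() {
	public := newListener(":8094", http.NewServeMux())
	internal := newListener(":8095", http.NewServeMux())
	for _, srv := range []*http.Server{public, internal} {
		fmt.Println(srv.Addr, srv.ReadHeaderTimeout, srv.ReadTimeout, srv.IdleTimeout)
	}
}
```

Setting `ReadHeaderTimeout` separately from `ReadTimeout` bounds how long a slow client can hold a connection before sending its headers.
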
Public-port routes carry an X-User-ID header injected by Edge Gateway;
internal-port routes admit the admin actor without the header.
Probe routes:
- `GET /healthz` returns `{"status":"ok"}`
- `GET /readyz` returns `{"status":"ready"}` once startup wiring completes.
- Neither probe performs a live Redis ping per request.
- There is no `/metrics` route. Metrics flow through OpenTelemetry exporters.
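
The probe behaviour above can be sketched with the standard library alone. This is an illustration under assumptions: the `probeMux`, `writeStatus`, and `probe` helpers are hypothetical, method filtering is omitted, and the response returned by `/readyz` before startup wiring completes (503 with `not_ready`) is a guess, since the source does not describe it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeStatus emits the small JSON body used by both probes.
func writeStatus(w http.ResponseWriter, code int, status string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	fmt.Fprintf(w, `{"status":%q}`, status)
}

// probeMux registers the two probe routes. Neither handler touches Redis;
// readiness only reflects whether startup wiring has completed.
func probeMux(isReady func() bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		writeStatus(w, http.StatusOK, "ok")
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if !isReady() {
			// Assumption: the pre-ready response is not specified in the source.
			writeStatus(w, http.StatusServiceUnavailable, "not_ready")
			return
		}
		writeStatus(w, http.StatusOK, "ready")
	})
	return mux
}

// probe issues an in-memory GET and returns the status code and body.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	ready := false
	mux := probeMux(func() bool { return ready })
	fmt.Println(probe(mux, "/readyz"))
	ready = true // startup wiring finished
	fmt.Println(probe(mux, "/healthz"))
	fmt.Println(probe(mux, "/readyz"))
	fmt.Println(probe(mux, "/metrics")) // not registered on this mux
}
```
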
Background Workers
| Worker | Trigger | Function |
|---|---|---|
| Enrollment automation | Periodic tick (`LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`) | Closes enrollment when the deadline or the gap window is exhausted. |
| `runtime:job_results` consumer | Redis `XREAD` | Drives `starting` to `running`/`paused`/`start_failed` based on Runtime Manager outcomes. |
| `gm:lobby_events` consumer | Redis `XREAD` | Applies runtime snapshot updates and game-finish events from Game Master; hands `game_finished` events off to capability evaluation. |
| Pending registration expirer | Periodic tick (`LOBBY_RACE_NAME_EXPIRATION_INTERVAL`) | Releases `pending_registration` entries past their 30-day window. |
| `user:lifecycle_events` consumer | Redis `XREAD` | Fans out the cascade for `permanent_blocked` and `deleted` user events (RND release, membership block, application/invite cancel, owned-game cancel). |
| `notification:intents` publisher | Synchronous from services | Wraps every notification publish with metric instrumentation; producer-side failures degrade notifications without rolling back business state. |
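
The last row describes a wrapper that instruments every publish and absorbs producer-side failures. A minimal sketch of that pattern follows; the `PublishFunc` signature, the `BestEffortPublisher` type, and the plain integer counters standing in for metrics are all assumptions, and the sketch is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// PublishFunc appends one intent to the notification stream. The signature
// is hypothetical; the real publisher writes to a Redis stream.
type PublishFunc func(intent string) error

// BestEffortPublisher counts outcomes and swallows publish failures so the
// caller's already-committed business state is never rolled back.
type BestEffortPublisher struct {
	next      PublishFunc
	published int // stand-in for a success counter metric
	failed    int // stand-in for a failure counter metric
}

// Publish deliberately returns no error: a failure is logged and counted only.
func (p *BestEffortPublisher) Publish(intent string) {
	if err := p.next(intent); err != nil {
		p.failed++
		log.Printf("notification intent %q dropped: %v", intent, err)
		return
	}
	p.published++
}

var errStreamUnavailable = errors.New("stream unavailable")

func main() {
	calls := 0
	flaky := func(string) error { // fails on every second call
		calls++
		if calls%2 == 0 {
			return errStreamUnavailable
		}
		return nil
	}
	p := &BestEffortPublisher{next: flaky}
	for _, intent := range []string{"intent-1", "intent-2", "intent-3"} {
		p.Publish(intent)
	}
	fmt.Printf("published=%d failed=%d\n", p.published, p.failed)
}
```
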
Synchronous Upstream Clients
| Client | Endpoint | Failure mapping |
|---|---|---|
| User Service eligibility | `POST {LOBBY_USER_SERVICE_BASE_URL}/api/v1/internal/users/{user_id}/lobby-eligibility` | Network or non-2xx → `503 service_unavailable`; `permanent_block` → `404 subject_not_found`. |
| Game Master register-runtime | `POST {LOBBY_GM_BASE_URL}/api/v1/internal/games/{game_id}/register-runtime` | Network or non-2xx → forced-pause path (`paused` + `lobby.runtime_paused_after_start`). |
| Game Master liveness probe | `GET {LOBBY_GM_BASE_URL}/api/v1/internal/healthz` | Used during `lobby.game.resume`; failure surfaces as `503 service_unavailable`. |
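
As an illustration of the eligibility row, the failure mapping can be written as a small pure function. This is a sketch under assumptions: the `EligibilityResult` type and its fields are hypothetical, and the ordering (transport and status failures checked before the block flag) assumes the flag is only decoded from a successful response.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// EligibilityResult is a hypothetical summary of one User Service call.
type EligibilityResult struct {
	TransportErr   error // non-nil when the request never completed
	StatusCode     int   // upstream HTTP status when a response arrived
	PermanentBlock bool  // decoded from a successful response body (assumed)
}

// mapEligibility returns the HTTP status and error code surfaced to the
// caller; an empty code means the request may proceed.
func mapEligibility(r EligibilityResult) (status int, code string) {
	switch {
	case r.TransportErr != nil, r.StatusCode < 200 || r.StatusCode > 299:
		return http.StatusServiceUnavailable, "service_unavailable"
	case r.PermanentBlock:
		return http.StatusNotFound, "subject_not_found"
	default:
		return http.StatusOK, ""
	}
}

var errDial = errors.New("dial tcp: connection refused")

func main() {
	cases := []EligibilityResult{
		{TransportErr: errDial},
		{StatusCode: 500},
		{StatusCode: 200, PermanentBlock: true},
		{StatusCode: 200},
	}
	for _, c := range cases {
		status, code := mapEligibility(c)
		fmt.Printf("%d %q\n", status, code)
	}
}
```
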
Stream Offsets
Each consumer persists its position under a dedicated key so that a process restart preserves stream progress.
| Stream | Offset key | Read block timeout env |
|---|---|---|
| `gm:lobby_events` | `lobby:stream_offsets:gm_events` | `LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT` |
| `runtime:job_results` | `lobby:stream_offsets:runtime_results` | `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` |
| `user:lifecycle_events` | `lobby:stream_offsets:user_lifecycle` | `LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT` |
Stream lag is exposed through observable gauges
lobby.gm_events.oldest_unprocessed_age_ms,
lobby.runtime_results.oldest_unprocessed_age_ms, and
lobby.user_lifecycle.oldest_unprocessed_age_ms. The probe samples the
oldest entry whose ID is greater than the persisted offset; when a consumer
lags or stalls, the gauge climbs and stays high.
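
To make the offset and lag semantics concrete, here is a self-contained sketch that uses an in-memory slice and map in place of Redis. Several details are assumptions rather than the service's implementation: a missing offset is treated as `0-0`, the offset is saved after each successfully handled entry, and the age is derived from the millisecond part of the entry ID (which holds for auto-generated Redis stream IDs). All type and function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Entry is a simplified stream entry; real entries carry field maps.
type Entry struct{ ID, Payload string }

// parseID splits a stream ID of the form "<ms>-<seq>".
func parseID(id string) (ms, seq uint64, err error) {
	msPart, seqPart, ok := strings.Cut(id, "-")
	if !ok {
		return 0, 0, fmt.Errorf("malformed stream ID %q", id)
	}
	if ms, err = strconv.ParseUint(msPart, 10, 64); err != nil {
		return 0, 0, err
	}
	seq, err = strconv.ParseUint(seqPart, 10, 64)
	return ms, seq, err
}

// after reports whether ID a sorts strictly after ID b.
func after(a, b string) bool {
	ams, aseq, errA := parseID(a)
	bms, bseq, errB := parseID(b)
	if errA != nil || errB != nil {
		return false
	}
	return ams > bms || (ams == bms && aseq > bseq)
}

// consume handles every entry past the stored offset and persists the new
// position after each success, so a restart resumes where it stopped.
func consume(offsets map[string]string, key string, stream []Entry, handle func(Entry) error) error {
	last, ok := offsets[key]
	if !ok {
		last = "0-0" // assumption: no stored offset means start of stream
	}
	for _, e := range stream {
		if !after(e.ID, last) {
			continue
		}
		if err := handle(e); err != nil {
			return err // offset stays at the last handled entry
		}
		last = e.ID
		offsets[key] = last
	}
	return nil
}

// oldestUnprocessedAgeMs returns the age of the first entry past the offset,
// or 0 when the consumer is caught up. The stream must be ordered by ID.
func oldestUnprocessedAgeMs(offset string, stream []Entry, nowMs uint64) uint64 {
	for _, e := range stream {
		if after(e.ID, offset) {
			ms, _, _ := parseID(e.ID)
			if nowMs < ms {
				return 0
			}
			return nowMs - ms
		}
	}
	return 0
}

var errTransient = errors.New("transient handler failure")

func main() {
	const key = "lobby:stream_offsets:gm_events"
	offsets := map[string]string{}
	stream := []Entry{{"1000-0", "a"}, {"1000-1", "b"}, {"2500-0", "c"}}

	// First run stalls on the third entry; the offset stops just before it.
	err := consume(offsets, key, stream, func(e Entry) error {
		if e.Payload == "c" {
			return errTransient
		}
		return nil
	})
	fmt.Println(err, offsets[key], oldestUnprocessedAgeMs(offsets[key], stream, 4000))

	// A "restarted" consumer resumes from the stored offset and catches up.
	err = consume(offsets, key, stream, func(Entry) error { return nil })
	fmt.Println(err, offsets[key], oldestUnprocessedAgeMs(offsets[key], stream, 4000))
}
```

While the consumer is stalled the computed age keeps growing with the clock, which matches the "climbs and stays high" behaviour described for the gauges.
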
Configuration Groups
The full env-var list with defaults lives in ../README.md §Configuration.
The groups below summarize the structure:
- Required: `LOBBY_REDIS_ADDR`, `LOBBY_USER_SERVICE_BASE_URL`, `LOBBY_GM_BASE_URL`.
- Process and logging: `LOBBY_SHUTDOWN_TIMEOUT`, `LOBBY_LOG_LEVEL`.
- HTTP listeners: `LOBBY_PUBLIC_HTTP_*`, `LOBBY_INTERNAL_HTTP_*`.
- Redis connectivity: `LOBBY_REDIS_USERNAME`, `LOBBY_REDIS_PASSWORD`, `LOBBY_REDIS_DB`, `LOBBY_REDIS_TLS_ENABLED`, `LOBBY_REDIS_OPERATION_TIMEOUT`.
- Streams: `LOBBY_GM_EVENTS_STREAM`, `LOBBY_RUNTIME_START_JOBS_STREAM`, `LOBBY_RUNTIME_STOP_JOBS_STREAM`, `LOBBY_RUNTIME_JOB_RESULTS_STREAM`, `LOBBY_NOTIFICATION_INTENTS_STREAM`, `LOBBY_USER_LIFECYCLE_STREAM`.
- Upstream clients: `LOBBY_USER_SERVICE_TIMEOUT`, `LOBBY_GM_TIMEOUT`.
- Workers: `LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`, `LOBBY_RACE_NAME_EXPIRATION_INTERVAL`, `LOBBY_RACE_NAME_DIRECTORY_BACKEND`.
- Telemetry: standard `OTEL_*` plus `LOBBY_OTEL_STDOUT_TRACES_ENABLED`, `LOBBY_OTEL_STDOUT_METRICS_ENABLED`.
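
A startup check for the required group might look like the sketch below. It is an assumption-laden illustration: the `missingRequired` helper is hypothetical, blank values are treated as missing, and the example values in `main` are made up. The lookup function is injected so the same check works against `os.LookupEnv` or a fake environment.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredVars lists the variables from the "Required" group.
var requiredVars = []string{
	"LOBBY_REDIS_ADDR",
	"LOBBY_USER_SERVICE_BASE_URL",
	"LOBBY_GM_BASE_URL",
}

// missingRequired returns the required variables that are unset or blank.
// In a real process the lookup argument would be os.LookupEnv.
func missingRequired(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range requiredVars {
		if v, ok := lookup(name); !ok || strings.TrimSpace(v) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Fake environment with made-up values, for a deterministic demo.
	env := map[string]string{
		"LOBBY_REDIS_ADDR":            "redis:6379",
		"LOBBY_USER_SERVICE_BASE_URL": "   ",
	}
	lookup := func(name string) (string, bool) {
		v, ok := env[name]
		return v, ok
	}
	if missing := missingRequired(lookup); len(missing) > 0 {
		fmt.Println("missing required configuration:", strings.Join(missing, ", "))
		return
	}
	fmt.Println("required configuration present")
}
```
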
Runtime Notes
- Game Lobby owns platform game state. Game Master may cache snapshots but is not the source of truth.
- The Race Name Directory ships a Redis adapter and an in-process stub; the stub is intended for unit tests and is selected via `LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub`.
- A `permanent_block` or `deleted` event from User Service fans out asynchronously through the `user:lifecycle_events` consumer; in-flight games owned by the affected user receive a stop-job and transition to `cancelled` via the `external_block` trigger.
- `notification:intents` publishes are best-effort: a failed publish is logged and counted but does not roll back the committed business state.