Runtime and Components
The diagram below focuses on the deployed galaxy/lobby process and its
runtime dependencies.
```mermaid
flowchart LR
    subgraph Clients
        Gateway["Edge Gateway"]
        Admin["Admin Service"]
        GM["Game Master"]
    end
    subgraph Lobby["Game Lobby process"]
        PublicHTTP["Public HTTP listener\n:8094 /healthz /readyz"]
        InternalHTTP["Internal HTTP listener\n:8095 /healthz /readyz"]
        EnrollAuto["Enrollment automation worker"]
        RTJobsConsumer["runtime:job_results consumer"]
        GMEventsConsumer["gm:lobby_events consumer"]
        PendingExpirer["Pending registration expirer"]
        ULConsumer["user:lifecycle_events consumer"]
        IntentPublisher["notification:intents publisher"]
        Telemetry["Logs, traces, metrics"]
    end
    User["User Service"]
    Redis["Redis\nKV + Streams"]
    Gateway --> PublicHTTP
    Admin --> InternalHTTP
    GM --> InternalHTTP
    PublicHTTP --> User
    InternalHTTP --> User
    PublicHTTP -. register-runtime .-> GM
    InternalHTTP -. register-runtime .-> GM
    EnrollAuto --> Redis
    RTJobsConsumer --> Redis
    GMEventsConsumer --> Redis
    PendingExpirer --> Redis
    ULConsumer --> Redis
    IntentPublisher --> Redis
    PublicHTTP --> Redis
    InternalHTTP --> Redis
    PublicHTTP --> Telemetry
    InternalHTTP --> Telemetry
    EnrollAuto --> Telemetry
    RTJobsConsumer --> Telemetry
    GMEventsConsumer --> Telemetry
    PendingExpirer --> Telemetry
    ULConsumer --> Telemetry
```
Notes:
- `cmd/lobby` refuses startup when Redis connectivity is misconfigured, when PostgreSQL is unreachable, or when the embedded goose migrations fail to apply. User Service and Game Master reachability are not verified at boot; transport failures surface as request errors.
- Both HTTP listeners expose `/healthz` and `/readyz` independently, so health checks can target either port.
- `register-runtime` is an outgoing call from Lobby to Game Master, made after the container start completes. Lobby does not expose an inbound endpoint of the same name.
Listeners
| Listener | Default addr | Purpose |
|---|---|---|
| Public HTTP | `:8094` | Authenticated user routes; gateway-facing |
| Internal HTTP | `:8095` | Admin-mirrored routes + Game Master read paths |
Shared listener defaults:
- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`
Public-port routes carry an X-User-ID header injected by Edge Gateway;
internal-port routes admit the admin actor without the header.
Probe routes:
- `GET /healthz` returns `{"status":"ok"}`.
- `GET /readyz` returns `{"status":"ready"}` once startup wiring completes.
- Neither probe performs a live Redis or PostgreSQL ping per request.
- There is no `/metrics` route. Metrics flow through OpenTelemetry exporters.
Background Workers
| Worker | Trigger | Function |
|---|---|---|
| Enrollment automation | Periodic tick (`LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`) | Closes enrollment when the deadline or the gap window is exhausted. |
| `runtime:job_results` consumer | Redis `XREAD` | Moves games from `starting` to `running`, `paused`, or `start_failed` based on Runtime Manager outcomes. |
| `gm:lobby_events` consumer | Redis `XREAD` | Applies runtime snapshot updates and game-finish events from Game Master; hands `game_finished` events off to capability evaluation. |
| Pending registration expirer | Periodic tick (`LOBBY_RACE_NAME_EXPIRATION_INTERVAL`) | Releases `pending_registration` entries past their 30-day window. |
| `user:lifecycle_events` consumer | Redis `XREAD` | Fans out the cascade for `permanent_blocked` and `deleted` user events (RND release, membership block, application/invite cancel, owned-game cancel). |
| `notification:intents` publisher | Synchronous from services | Wraps every notification publish with metric instrumentation; producer-side failures degrade notifications without rolling back business state. |
Synchronous Upstream Clients
| Client | Endpoint | Failure mapping |
|---|---|---|
| User Service eligibility | `POST {LOBBY_USER_SERVICE_BASE_URL}/api/v1/internal/users/{user_id}/lobby-eligibility` | Network error or non-2xx → `503 service_unavailable`; `permanent_block` → `404 subject_not_found`. |
| Game Master register-runtime | `POST {LOBBY_GM_BASE_URL}/api/v1/internal/games/{game_id}/register-runtime` | Network error or non-2xx → forced-pause path (`paused` + `lobby.runtime_paused_after_start`). |
| Game Master liveness probe | `GET {LOBBY_GM_BASE_URL}/api/v1/internal/healthz` | Used during `lobby.game.resume`; failure surfaces as `503 service_unavailable`. |
Stream Offsets
Each consumer persists its position under a dedicated key so process restart preserves stream progress.
| Stream | Offset key | Read block timeout env |
|---|---|---|
| `gm:lobby_events` | `lobby:stream_offsets:gm_events` | `LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT` |
| `runtime:job_results` | `lobby:stream_offsets:runtime_results` | `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` |
| `user:lifecycle_events` | `lobby:stream_offsets:user_lifecycle` | `LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT` |
Stream lag is exposed through observable gauges
lobby.gm_events.oldest_unprocessed_age_ms,
lobby.runtime_results.oldest_unprocessed_age_ms, and
lobby.user_lifecycle.oldest_unprocessed_age_ms. The probe samples the
oldest entry whose ID is greater than the persisted offset; when a consumer
lags or stalls, the gauge climbs and stays high.
Configuration Groups
The full env-var list with defaults lives in ../README.md §Configuration.
The groups below summarize the structure:
- Required: `LOBBY_REDIS_MASTER_ADDR`, `LOBBY_REDIS_PASSWORD`, `LOBBY_POSTGRES_PRIMARY_DSN`, `LOBBY_USER_SERVICE_BASE_URL`, `LOBBY_GM_BASE_URL`.
- Process and logging: `LOBBY_SHUTDOWN_TIMEOUT`, `LOBBY_LOG_LEVEL`.
- HTTP listeners: `LOBBY_PUBLIC_HTTP_*`, `LOBBY_INTERNAL_HTTP_*`.
- Redis connectivity: `LOBBY_REDIS_MASTER_ADDR`, `LOBBY_REDIS_REPLICA_ADDRS`, `LOBBY_REDIS_PASSWORD`, `LOBBY_REDIS_DB`, `LOBBY_REDIS_OPERATION_TIMEOUT` (the legacy `LOBBY_REDIS_ADDR`, `LOBBY_REDIS_TLS_ENABLED`, and `LOBBY_REDIS_USERNAME` variables were removed in PG_PLAN.md §6A).
- PostgreSQL connectivity: `LOBBY_POSTGRES_PRIMARY_DSN`, `LOBBY_POSTGRES_REPLICA_DSNS`, `LOBBY_POSTGRES_OPERATION_TIMEOUT`, `LOBBY_POSTGRES_MAX_OPEN_CONNS`, `LOBBY_POSTGRES_MAX_IDLE_CONNS`, `LOBBY_POSTGRES_CONN_MAX_LIFETIME`.
- Streams: `LOBBY_GM_EVENTS_STREAM`, `LOBBY_RUNTIME_START_JOBS_STREAM`, `LOBBY_RUNTIME_STOP_JOBS_STREAM`, `LOBBY_RUNTIME_JOB_RESULTS_STREAM`, `LOBBY_NOTIFICATION_INTENTS_STREAM`, `LOBBY_USER_LIFECYCLE_STREAM`.
- Upstream clients: `LOBBY_USER_SERVICE_TIMEOUT`, `LOBBY_GM_TIMEOUT`.
- Workers: `LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`, `LOBBY_RACE_NAME_EXPIRATION_INTERVAL`, `LOBBY_RACE_NAME_DIRECTORY_BACKEND`.
- Telemetry: standard `OTEL_*` variables plus `LOBBY_OTEL_STDOUT_TRACES_ENABLED` and `LOBBY_OTEL_STDOUT_METRICS_ENABLED`.
Runtime Notes
- Game Lobby owns platform game state. Game Master may cache snapshots but is not the source of truth.
- The Race Name Directory ships a PostgreSQL adapter (the default since PG_PLAN.md §6B) and an in-process stub. The stub is intended for unit tests and is selected via `LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub`.
- A `permanent_block` or `deleted` event from User Service fans out asynchronously through the `user:lifecycle_events` consumer; in-flight games owned by the affected user receive a stop-job and transition to `cancelled` via the `external_block` trigger.
- `notification:intents` publishes are best-effort: a failed publish is logged and counted but does not roll back the committed business state.