galaxy-game/lobby/docs/runtime.md
2026-04-25 23:20:55 +02:00

Runtime and Components

The diagram below focuses on the deployed galaxy/lobby process and its runtime dependencies.

```mermaid
flowchart LR
    subgraph Clients
        Gateway["Edge Gateway"]
        Admin["Admin Service"]
        GM["Game Master"]
    end

    subgraph Lobby["Game Lobby process"]
        PublicHTTP["Public HTTP listener\n:8094 /healthz /readyz"]
        InternalHTTP["Internal HTTP listener\n:8095 /healthz /readyz"]
        EnrollAuto["Enrollment automation worker"]
        RTJobsConsumer["runtime:job_results consumer"]
        GMEventsConsumer["gm:lobby_events consumer"]
        PendingExpirer["Pending registration expirer"]
        ULConsumer["user:lifecycle_events consumer"]
        IntentPublisher["notification:intents publisher"]
        Telemetry["Logs, traces, metrics"]
    end

    User["User Service"]
    Redis["Redis\nKV + Streams"]

    Gateway --> PublicHTTP
    Admin --> InternalHTTP
    GM --> InternalHTTP

    PublicHTTP --> User
    InternalHTTP --> User
    PublicHTTP -. register-runtime .-> GM
    InternalHTTP -. register-runtime .-> GM

    EnrollAuto --> Redis
    RTJobsConsumer --> Redis
    GMEventsConsumer --> Redis
    PendingExpirer --> Redis
    ULConsumer --> Redis
    IntentPublisher --> Redis

    PublicHTTP --> Redis
    InternalHTTP --> Redis

    PublicHTTP --> Telemetry
    InternalHTTP --> Telemetry
    EnrollAuto --> Telemetry
    RTJobsConsumer --> Telemetry
    GMEventsConsumer --> Telemetry
    PendingExpirer --> Telemetry
    ULConsumer --> Telemetry
```

Notes:

  • cmd/lobby refuses startup when Redis connectivity is misconfigured. User Service and Game Master reachability are not verified at boot; transport failures surface as request errors.
  • Both HTTP listeners expose /healthz and /readyz independently so health checks can target either port.
  • register-runtime is an outgoing call from Lobby to Game Master after the container start completes. Lobby does not expose an inbound endpoint of the same name.

Listeners

| Listener | Default addr | Purpose |
| --- | --- | --- |
| Public HTTP | :8094 | Authenticated user routes; gateway-facing |
| Internal HTTP | :8095 | Admin-mirrored routes + Game Master read paths |

Shared listener defaults:

  • read-header timeout: 2s
  • read timeout: 10s
  • idle timeout: 1m
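
As a minimal sketch, the shared defaults above map directly onto fields of Go's standard http.Server. The helper name newListener is hypothetical and is not taken from the Lobby code base; only the three timeout values come from this document.

```go
package main

import (
	"net/http"
	"time"
)

// newListener is a hypothetical helper that applies the shared listener
// defaults documented above to a standard-library HTTP server.
func newListener(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           handler,
		ReadHeaderTimeout: 2 * time.Second,  // read-header timeout: 2s
		ReadTimeout:       10 * time.Second, // read timeout: 10s
		IdleTimeout:       time.Minute,      // idle timeout: 1m
	}
}
```

Under this sketch the public and internal listeners would be built as newListener(":8094", publicMux) and newListener(":8095", internalMux), each with its own handler.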

Public-port routes carry an X-User-ID header injected by Edge Gateway; internal-port routes admit the admin actor without the header.
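
The header rule for the public port could be enforced with a small middleware, sketched below. The middleware name, the 401 response for a missing header, and the statusFor helper are all assumptions for illustration; this document does not state how Lobby rejects a request that lacks X-User-ID.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// requireUserID is a hypothetical middleware for public-port routes: it
// rejects requests that lack the X-User-ID header injected by Edge Gateway.
func requireUserID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-User-ID") == "" {
			// Assumption: the real status code is not documented here.
			http.Error(w, "missing X-User-ID", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor wraps a trivial 204 handler with requireUserID and reports the
// status code for an in-memory request. An empty userID sends no header.
func statusFor(userID string) int {
	inner := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if userID != "" {
		req.Header.Set("X-User-ID", userID)
	}
	rec := httptest.NewRecorder()
	requireUserID(inner).ServeHTTP(rec, req)
	return rec.Code
}
```

Internal-port routes would simply not be wrapped, which matches the statement that the admin actor is admitted without the header.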

Probe routes:

  • GET /healthz returns {"status":"ok"}
  • GET /readyz returns {"status":"ready"} once startup wiring completes.
  • Neither probe performs a live Redis ping per request.
  • There is no /metrics route. Metrics flow through OpenTelemetry exporters.
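
The probe behavior above can be sketched with an in-process readiness flag, so that no Redis round trip happens per request. The type and helper names are hypothetical, method filtering is omitted for brevity, and the 503 body returned before readiness is an assumption, since this document only specifies the responses for the healthy and ready states.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// probes holds a flag that startup code flips once wiring completes.
type probes struct {
	ready atomic.Bool
}

func writeJSON(w http.ResponseWriter, code int, body string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	io.WriteString(w, body)
}

// routes registers both probes. Neither handler touches Redis.
func (p *probes) routes() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		writeJSON(w, http.StatusOK, `{"status":"ok"}`)
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if !p.ready.Load() {
			// Assumption: the not-ready response is not documented.
			writeJSON(w, http.StatusServiceUnavailable, `{"status":"not_ready"}`)
			return
		}
		writeJSON(w, http.StatusOK, `{"status":"ready"}`)
	})
	return mux
}

// probe issues an in-memory GET and returns the status code and body.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}
```

Because each listener exposes the probes independently, the same mux fragment would be mounted on both ports in this sketch.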

Background Workers

| Worker | Trigger | Function |
| --- | --- | --- |
| Enrollment automation | Periodic tick (LOBBY_ENROLLMENT_AUTOMATION_INTERVAL) | Closes enrollment when the deadline passes or the gap window is exhausted. |
| runtime:job_results consumer | Redis XREAD | Drives starting to running/paused/start_failed based on Runtime Manager outcomes. |
| gm:lobby_events consumer | Redis XREAD | Applies runtime snapshot updates and game-finish events from Game Master; hands game_finished events off to capability evaluation. |
| Pending registration expirer | Periodic tick (LOBBY_RACE_NAME_EXPIRATION_INTERVAL) | Releases pending_registration entries past their 30-day window. |
| user:lifecycle_events consumer | Redis XREAD | Fans out the cascade for permanent_blocked and deleted user events (Race Name Directory release, membership block, application/invite cancel, owned-game cancel). |
| notification:intents publisher | Synchronous from services | Wraps every notification publish with metric instrumentation; producer-side failures degrade notifications without rolling back business state. |

Synchronous Upstream Clients

| Client | Endpoint | Failure mapping |
| --- | --- | --- |
| User Service eligibility | POST {LOBBY_USER_SERVICE_BASE_URL}/api/v1/internal/users/{user_id}/lobby-eligibility | Network or non-2xx → 503 service_unavailable; permanent_block → 404 subject_not_found. |
| Game Master register-runtime | POST {LOBBY_GM_BASE_URL}/api/v1/internal/games/{game_id}/register-runtime | Network or non-2xx → forced-pause path (paused + lobby.runtime_paused_after_start). |
| Game Master liveness probe | GET {LOBBY_GM_BASE_URL}/api/v1/internal/healthz | Used during lobby.game.resume; failure surfaces as 503 service_unavailable. |
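
The transport part of the eligibility mapping (network error or non-2xx becomes 503 service_unavailable) can be sketched as a pure function. The apiError type and both function names are hypothetical; the real client may model errors differently, and the business-level outcomes in the table are deliberately left out.

```go
package main

import "net/http"

// apiError is a hypothetical error shape carrying the HTTP status and the
// machine-readable code that Lobby returns to its own caller.
type apiError struct {
	Status int
	Code   string
}

func (e *apiError) Error() string { return e.Code }

// isTransportFailure reports whether an upstream call failed at the
// transport level: either the request errored or the status was not 2xx.
func isTransportFailure(status int, err error) bool {
	return err != nil || status < 200 || status > 299
}

// eligibilityFailure maps a failed User Service eligibility call to
// 503 service_unavailable and returns nil when the call succeeded.
func eligibilityFailure(status int, err error) *apiError {
	if isTransportFailure(status, err) {
		return &apiError{Status: http.StatusServiceUnavailable, Code: "service_unavailable"}
	}
	return nil
}
```

The same isTransportFailure predicate would fit the register-runtime row, where a true result selects the forced-pause path instead of an HTTP error.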

Stream Offsets

Each consumer persists its position under a dedicated key so process restart preserves stream progress.

| Stream | Offset key | Read block timeout env |
| --- | --- | --- |
| gm:lobby_events | lobby:stream_offsets:gm_events | LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT |
| runtime:job_results | lobby:stream_offsets:runtime_results | LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT |
| user:lifecycle_events | lobby:stream_offsets:user_lifecycle | LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT |
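
One way to picture the offset persistence is a single consumer iteration: load the stored ID, read entries after it, handle each one, then save its ID. The sketch below uses hypothetical interfaces and in-memory fakes in place of a Redis client. Saving the offset only after the handler succeeds (at-least-once delivery) and starting a fresh consumer at "0-0" are assumptions, not documented Lobby behavior.

```go
package main

import "context"

// entry is a simplified stream entry.
type entry struct {
	ID      string
	Payload string
}

// stream abstracts the read side: return entries with IDs after lastID,
// which is what a blocking XREAD provides. Hypothetical interface.
type stream interface {
	ReadAfter(ctx context.Context, lastID string) ([]entry, error)
}

// offsetStore abstracts the per-consumer offset key. Hypothetical interface.
type offsetStore interface {
	Load(ctx context.Context, key string) (string, error)
	Save(ctx context.Context, key, id string) error
}

// consumeOnce runs one read-handle-persist iteration and returns the number
// of entries that were handled and had their offset saved.
func consumeOnce(ctx context.Context, s stream, offsets offsetStore, key string, handle func(entry) error) (int, error) {
	last, err := offsets.Load(ctx, key)
	if err != nil {
		return 0, err
	}
	if last == "" {
		last = "0-0" // assumption: a consumer with no offset starts at the beginning
	}
	batch, err := s.ReadAfter(ctx, last)
	if err != nil {
		return 0, err
	}
	done := 0
	for _, e := range batch {
		if err := handle(e); err != nil {
			return done, err // offset stays at the last successful entry
		}
		if err := offsets.Save(ctx, key, e.ID); err != nil {
			return done, err
		}
		done++
	}
	return done, nil
}

// memStream and memOffsets are in-memory fakes for experimenting.
type memStream struct{ entries []entry }

func (m *memStream) ReadAfter(_ context.Context, lastID string) ([]entry, error) {
	if lastID == "0-0" {
		return m.entries, nil
	}
	for i, e := range m.entries {
		if e.ID == lastID {
			return m.entries[i+1:], nil
		}
	}
	return nil, nil // unknown offset: the fake returns nothing
}

type memOffsets map[string]string

func (m memOffsets) Load(_ context.Context, key string) (string, error) { return m[key], nil }
func (m memOffsets) Save(_ context.Context, key, id string) error      { m[key] = id; return nil }
```

With this shape, a restarted process calls consumeOnce with the same key and resumes after the last saved ID, which is the property the offset keys in the table exist to provide.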

Stream lag is exposed through observable gauges lobby.gm_events.oldest_unprocessed_age_ms, lobby.runtime_results.oldest_unprocessed_age_ms, and lobby.user_lifecycle.oldest_unprocessed_age_ms. The probe samples the oldest entry whose ID is greater than the persisted offset; when a consumer lags or stalls, the gauge climbs and stays high.
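
The gauge value can be illustrated with a small calculation. Redis auto-generated stream IDs have the form `<unix-milliseconds>-<sequence>`, so the age of an entry can be derived from its ID alone. Whether Lobby computes the age this way or from a payload field is not stated here, so the function below is an assumption-laden sketch with a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// oldestUnprocessedAgeMs returns how old, in milliseconds, the oldest entry
// after the persisted offset is. oldestID is that entry's stream ID; an
// empty string means nothing is pending. nowMs is the current Unix time in
// milliseconds.
func oldestUnprocessedAgeMs(oldestID string, nowMs int64) (int64, error) {
	if oldestID == "" {
		return 0, nil // assumption: an empty backlog reports zero
	}
	msPart, _, ok := strings.Cut(oldestID, "-")
	if !ok {
		return 0, fmt.Errorf("malformed stream ID %q", oldestID)
	}
	ms, err := strconv.ParseInt(msPart, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("malformed stream ID %q: %w", oldestID, err)
	}
	age := nowMs - ms
	if age < 0 {
		age = 0 // guard against clock skew between Redis and the process
	}
	return age, nil
}
```

A caller would pass time.Now().UnixMilli() as nowMs. For a stalled consumer the oldest pending ID never changes, so successive samples grow with wall-clock time, which is the "climbs and stays high" behavior described above.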

Configuration Groups

The full env-var list with defaults lives in ../README.md §Configuration. The groups below summarize the structure:

  • Required: LOBBY_REDIS_ADDR, LOBBY_USER_SERVICE_BASE_URL, LOBBY_GM_BASE_URL.
  • Process and logging: LOBBY_SHUTDOWN_TIMEOUT, LOBBY_LOG_LEVEL.
  • HTTP listeners: LOBBY_PUBLIC_HTTP_*, LOBBY_INTERNAL_HTTP_*.
  • Redis connectivity: LOBBY_REDIS_USERNAME, LOBBY_REDIS_PASSWORD, LOBBY_REDIS_DB, LOBBY_REDIS_TLS_ENABLED, LOBBY_REDIS_OPERATION_TIMEOUT.
  • Streams: LOBBY_GM_EVENTS_STREAM, LOBBY_RUNTIME_START_JOBS_STREAM, LOBBY_RUNTIME_STOP_JOBS_STREAM, LOBBY_RUNTIME_JOB_RESULTS_STREAM, LOBBY_NOTIFICATION_INTENTS_STREAM, LOBBY_USER_LIFECYCLE_STREAM.
  • Upstream clients: LOBBY_USER_SERVICE_TIMEOUT, LOBBY_GM_TIMEOUT.
  • Workers: LOBBY_ENROLLMENT_AUTOMATION_INTERVAL, LOBBY_RACE_NAME_EXPIRATION_INTERVAL, LOBBY_RACE_NAME_DIRECTORY_BACKEND.
  • Telemetry: standard OTEL_* plus LOBBY_OTEL_STDOUT_TRACES_ENABLED, LOBBY_OTEL_STDOUT_METRICS_ENABLED.
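
A startup check for the required group might look like the sketch below. The function names and the error wording are hypothetical, and it is an assumption that Lobby treats an empty value the same as an unset one. The lookup parameter keeps the check testable; in a real process it would typically be os.Getenv.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredVars lists the variables from the Required group above.
var requiredVars = []string{
	"LOBBY_REDIS_ADDR",
	"LOBBY_USER_SERVICE_BASE_URL",
	"LOBBY_GM_BASE_URL",
}

// missingRequired returns the required variables for which lookup yields an
// empty value, in the order they are declared.
func missingRequired(lookup func(string) string) []string {
	var missing []string
	for _, name := range requiredVars {
		if strings.TrimSpace(lookup(name)) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

// validateRequired turns the missing list into a single startup error.
func validateRequired(lookup func(string) string) error {
	if missing := missingRequired(lookup); len(missing) > 0 {
		return fmt.Errorf("missing required configuration: %s", strings.Join(missing, ", "))
	}
	return nil
}
```

Calling validateRequired(os.Getenv) early in main would let the process exit with one message that names every missing variable rather than failing on the first.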

Runtime Notes

  • Game Lobby owns platform game state. Game Master may cache snapshots but is not the source of truth.
  • The Race Name Directory ships a Redis adapter and an in-process stub; the stub is intended for unit tests and is selected via LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub.
  • A permanent_block or deleted event from User Service fans out asynchronously through the user:lifecycle_events consumer; in-flight games owned by the affected user receive a stop-job and transition to cancelled via the external_block trigger.
  • notification:intents publishes are best-effort: a failed publish is logged and counted but does not roll back the committed business state.
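
The best-effort publish rule in the last note can be sketched as a wrapper that never returns an error to the calling service. The type name, the sentinel error, and the atomic counter (standing in for an OpenTelemetry counter) are assumptions for illustration, and context propagation is omitted for brevity.

```go
package main

import (
	"errors"
	"log"
	"sync/atomic"
)

// errStreamUnavailable is an illustrative failure used when experimenting.
var errStreamUnavailable = errors.New("stream unavailable")

// instrumentedPublisher wraps a raw publish function. failures stands in for
// a metrics counter in this sketch.
type instrumentedPublisher struct {
	publish  func(intent string) error
	failures atomic.Int64
}

// Publish attempts the publish and swallows any error after logging and
// counting it, so the caller's committed business state is never rolled back.
func (p *instrumentedPublisher) Publish(intent string) {
	if err := p.publish(intent); err != nil {
		p.failures.Add(1)
		log.Printf("notification intent %q not published: %v", intent, err)
	}
}
```

Because Publish has no error return, a service that has already committed a state change cannot accidentally treat a notification failure as a reason to undo it.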