feat: game lobby service
# Runtime and Components

The diagram below focuses on the deployed `galaxy/lobby` process and its
runtime dependencies.

```mermaid
flowchart LR
    subgraph Clients
        Gateway["Edge Gateway"]
        Admin["Admin Service"]
        GM["Game Master"]
    end

    subgraph Lobby["Game Lobby process"]
        PublicHTTP["Public HTTP listener\n:8094 /healthz /readyz"]
        InternalHTTP["Internal HTTP listener\n:8095 /healthz /readyz"]
        EnrollAuto["Enrollment automation worker"]
        RTJobsConsumer["runtime:job_results consumer"]
        GMEventsConsumer["gm:lobby_events consumer"]
        PendingExpirer["Pending registration expirer"]
        ULConsumer["user:lifecycle_events consumer"]
        IntentPublisher["notification:intents publisher"]
        Telemetry["Logs, traces, metrics"]
    end

    User["User Service"]
    Redis["Redis\nKV + Streams"]

    Gateway --> PublicHTTP
    Admin --> InternalHTTP
    GM --> InternalHTTP

    PublicHTTP --> User
    InternalHTTP --> User
    PublicHTTP -. register-runtime .-> GM
    InternalHTTP -. register-runtime .-> GM

    EnrollAuto --> Redis
    RTJobsConsumer --> Redis
    GMEventsConsumer --> Redis
    PendingExpirer --> Redis
    ULConsumer --> Redis
    IntentPublisher --> Redis

    PublicHTTP --> Redis
    InternalHTTP --> Redis

    PublicHTTP --> Telemetry
    InternalHTTP --> Telemetry
    EnrollAuto --> Telemetry
    RTJobsConsumer --> Telemetry
    GMEventsConsumer --> Telemetry
    PendingExpirer --> Telemetry
    ULConsumer --> Telemetry
```

Notes:

- `cmd/lobby` refuses startup when Redis connectivity is misconfigured. User
  Service and Game Master reachability are not verified at boot; transport
  failures surface as request errors.
- Both HTTP listeners expose `/healthz` and `/readyz` independently so health
  checks can target either port.
- `register-runtime` is an outgoing call from Lobby to Game Master after the
  container start completes. Lobby does not expose an inbound endpoint of the
  same name.

## Listeners

| Listener | Default addr | Purpose |
| --- | --- | --- |
| Public HTTP | `:8094` | Authenticated user routes; gateway-facing |
| Internal HTTP | `:8095` | Admin-mirrored routes + Game Master read paths |

Shared listener defaults:

- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`

Public-port routes carry an `X-User-ID` header injected by Edge Gateway;
internal-port routes admit the admin actor without the header.

Probe routes:

- `GET /healthz` returns `{"status":"ok"}`.
- `GET /readyz` returns `{"status":"ready"}` once startup wiring completes.
- Neither probe performs a live Redis ping per request.
- There is no `/metrics` route. Metrics flow through OpenTelemetry exporters.

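The probe contract above can be sketched as a pure function. This is a minimal Python illustration, not the service's handler: the `probe_response` name and the `ready` flag are hypothetical, and the `503` and `404` responses are assumptions, since the source specifies only the two success bodies.

```python
import json


def _body(status: str) -> str:
    # Compact separators reproduce the exact bodies quoted in the probe list.
    return json.dumps({"status": status}, separators=(",", ":"))


def probe_response(path: str, ready: bool) -> tuple[int, str]:
    """Return (HTTP status, JSON body) for a probe path.

    `ready` stands for "startup wiring has completed". Nothing here talks
    to Redis, mirroring the note that probes do not ping Redis per request.
    """
    if path == "/healthz":
        return 200, _body("ok")
    if path == "/readyz":
        if ready:
            return 200, _body("ready")
        # Assumption: the not-ready status code and body are not documented.
        return 503, _body("not_ready")
    # Assumption: unknown paths, including /metrics, answer 404.
    return 404, _body("not_found")
```

Because both listeners expose the same probes independently, a sketch like this would be mounted once per port rather than shared through a single health endpoint.
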
## Background Workers

| Worker | Trigger | Function |
| --- | --- | --- |
| Enrollment automation | Periodic tick (`LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`) | Closes enrollment when the deadline or the gap window is exhausted. |
| `runtime:job_results` consumer | Redis `XREAD` | Drives `starting` to `running`/`paused`/`start_failed` based on Runtime Manager outcomes. |
| `gm:lobby_events` consumer | Redis `XREAD` | Applies runtime snapshot updates and game-finish events from Game Master; hands `game_finished` events off to capability evaluation. |
| Pending registration expirer | Periodic tick (`LOBBY_RACE_NAME_EXPIRATION_INTERVAL`) | Releases `pending_registration` entries past their 30-day window. |
| `user:lifecycle_events` consumer | Redis `XREAD` | Fans out the cascade for `permanent_blocked` and `deleted` user events (Race Name Directory release, membership block, application/invite cancel, owned-game cancel). |
| `notification:intents` publisher | Synchronous from services | Wraps every notification publish with metric instrumentation; producer-side failures degrade notifications without rolling back business state. |

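As an illustration of the pending registration expirer, the selection step on each tick could look like the Python sketch below. The `expired_entries` helper, the in-memory `entries` mapping, and measuring the 30-day window from a `pending_since` timestamp are all assumptions; the source states only that entries past their 30-day window are released.

```python
from datetime import datetime, timedelta

# The 30-day window comes from the worker table; the constant name is hypothetical.
PENDING_REGISTRATION_WINDOW = timedelta(days=30)


def expired_entries(entries: dict[str, datetime], now: datetime) -> list[str]:
    """Return the names whose pending window has fully elapsed at `now`.

    `entries` maps a race name to the moment it entered
    `pending_registration` (an assumed storage shape).
    """
    return sorted(
        name
        for name, pending_since in entries.items()
        if now - pending_since > PENDING_REGISTRATION_WINDOW
    )
```

An entry that is exactly 30 days old is kept in this sketch; whether the real boundary is inclusive is not stated in the source.
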
## Synchronous Upstream Clients

| Client | Endpoint | Failure mapping |
| --- | --- | --- |
| `User Service` eligibility | `POST {LOBBY_USER_SERVICE_BASE_URL}/api/v1/internal/users/{user_id}/lobby-eligibility` | Network or non-2xx → `503 service_unavailable`; `permanent_block` → `404 subject_not_found`. |
| `Game Master` register-runtime | `POST {LOBBY_GM_BASE_URL}/api/v1/internal/games/{game_id}/register-runtime` | Network or non-2xx → forced-pause path (`paused` + `lobby.runtime_paused_after_start`). |
| `Game Master` liveness probe | `GET {LOBBY_GM_BASE_URL}/api/v1/internal/healthz` | Used during `lobby.game.resume`; failure surfaces as `503 service_unavailable`. |

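The failure mapping for the User Service eligibility call can be written down as a small decision function. The Python sketch below is illustrative only: `map_eligibility_failure` and its parameters are hypothetical names, and it assumes the `permanent_block` signal arrives inside an otherwise successful (2xx) response, which the table does not spell out.

```python
from typing import Optional


def map_eligibility_failure(
    network_error: bool,
    http_status: Optional[int],
    permanent_block: bool,
) -> Optional[tuple[int, str]]:
    """Map an eligibility call outcome to Lobby's (status, error code).

    Returns None when the call succeeded and the user is not blocked.
    """
    # Transport failure or any non-2xx answer degrades to 503.
    if network_error or http_status is None or not 200 <= http_status < 300:
        return 503, "service_unavailable"
    # A permanently blocked user is reported as an unknown subject.
    if permanent_block:
        return 404, "subject_not_found"
    return None
```

Checking the transport outcome first keeps an upstream outage from being misreported as a missing subject.
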
## Stream Offsets

Each consumer persists its position under a dedicated key so process restart
preserves stream progress.

| Stream | Offset key | Read block timeout env |
| --- | --- | --- |
| `gm:lobby_events` | `lobby:stream_offsets:gm_events` | `LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT` |
| `runtime:job_results` | `lobby:stream_offsets:runtime_results` | `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` |
| `user:lifecycle_events` | `lobby:stream_offsets:user_lifecycle` | `LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT` |

Stream lag is exposed through observable gauges
`lobby.gm_events.oldest_unprocessed_age_ms`,
`lobby.runtime_results.oldest_unprocessed_age_ms`, and
`lobby.user_lifecycle.oldest_unprocessed_age_ms`. The probe samples the
oldest entry whose ID is greater than the persisted offset; when a consumer
lags or stalls, the gauge climbs and stays high.

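A rough Python sketch of the lag computation follows. It relies on the standard Redis stream ID layout `<milliseconds>-<sequence>`; the function names, deriving the age from the millisecond part of the ID, and reporting `0` when the consumer is caught up are assumptions rather than documented behaviour.

```python
from typing import Iterable


def parse_stream_id(entry_id: str) -> tuple[int, int]:
    """Split a Redis stream ID '<ms>-<seq>' into integers so IDs order correctly."""
    ms, _, seq = entry_id.partition("-")
    return int(ms), int(seq or 0)


def oldest_unprocessed_age_ms(
    entry_ids: Iterable[str], persisted_offset: str, now_ms: int
) -> int:
    """Age in ms of the oldest entry whose ID is greater than the offset."""
    offset = parse_stream_id(persisted_offset)
    parsed = (parse_stream_id(entry_id) for entry_id in entry_ids)
    pending = [entry for entry in parsed if entry > offset]
    if not pending:
        return 0  # Assumption: a caught-up consumer reports zero lag.
    oldest_ms, _ = min(pending)
    return max(0, now_ms - oldest_ms)
```

With entries `1000-0`, `1000-1`, `2000-0`, `3500-0`, a persisted offset of `1000-1`, and `now_ms=5000`, the oldest unprocessed entry is `2000-0` and the sketch reports `3000`. If the offset never advances, the same entry keeps ageing, which is why the gauge climbs and stays high for a stalled consumer.
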
## Configuration Groups

The full env-var list with defaults lives in `../README.md` §Configuration.
The groups below summarize the structure:

- **Required**: `LOBBY_REDIS_ADDR`, `LOBBY_USER_SERVICE_BASE_URL`,
  `LOBBY_GM_BASE_URL`.
- **Process and logging**: `LOBBY_SHUTDOWN_TIMEOUT`, `LOBBY_LOG_LEVEL`.
- **HTTP listeners**: `LOBBY_PUBLIC_HTTP_*`, `LOBBY_INTERNAL_HTTP_*`.
- **Redis connectivity**: `LOBBY_REDIS_USERNAME`, `LOBBY_REDIS_PASSWORD`,
  `LOBBY_REDIS_DB`, `LOBBY_REDIS_TLS_ENABLED`,
  `LOBBY_REDIS_OPERATION_TIMEOUT`.
- **Streams**: `LOBBY_GM_EVENTS_STREAM`, `LOBBY_RUNTIME_START_JOBS_STREAM`,
  `LOBBY_RUNTIME_STOP_JOBS_STREAM`, `LOBBY_RUNTIME_JOB_RESULTS_STREAM`,
  `LOBBY_NOTIFICATION_INTENTS_STREAM`, `LOBBY_USER_LIFECYCLE_STREAM`.
- **Upstream clients**: `LOBBY_USER_SERVICE_TIMEOUT`, `LOBBY_GM_TIMEOUT`.
- **Workers**: `LOBBY_ENROLLMENT_AUTOMATION_INTERVAL`,
  `LOBBY_RACE_NAME_EXPIRATION_INTERVAL`,
  `LOBBY_RACE_NAME_DIRECTORY_BACKEND`.
- **Telemetry**: standard `OTEL_*` plus
  `LOBBY_OTEL_STDOUT_TRACES_ENABLED`,
  `LOBBY_OTEL_STDOUT_METRICS_ENABLED`.

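A startup check for the required group might be sketched as follows. The variable names come from the list above; the `missing_required` helper and the choice to treat blank values as missing are assumptions for illustration.

```python
# Names taken from the "Required" group; the tuple name is hypothetical.
REQUIRED_VARS = (
    "LOBBY_REDIS_ADDR",
    "LOBBY_USER_SERVICE_BASE_URL",
    "LOBBY_GM_BASE_URL",
)


def missing_required(env: dict[str, str]) -> list[str]:
    """Return required variables that are unset or blank, in declaration order."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

A process entry point could call `missing_required(dict(os.environ))` and exit with a clear message when the list is non-empty, so that a misconfigured deployment fails at boot instead of on the first request.
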
## Runtime Notes

- `Game Lobby` owns platform game state. Game Master may cache snapshots but
  is not the source of truth.
- The Race Name Directory ships a Redis adapter and an in-process stub; the
  stub is intended for unit tests and is selected via
  `LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub`.
- A `permanent_block` or `deleted` event from User Service fans out
  asynchronously through the `user:lifecycle_events` consumer; in-flight
  games owned by the affected user receive a stop-job and transition to
  `cancelled` via the `external_block` trigger.
- `notification:intents` publishes are best-effort: a failed publish is
  logged and counted but does not roll back the committed business state.

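The best-effort publish rule in the last note can be sketched as a thin wrapper. In this Python illustration the `IntentPublisher` class, its plain integer counters, and the injected `publish` callable are hypothetical stand-ins for the real publisher and its metric instruments.

```python
import logging
from typing import Callable

logger = logging.getLogger("lobby.notifications")


class IntentPublisher:
    """Wraps a publish callable so failures are logged and counted, never raised."""

    def __init__(self, publish: Callable[[dict], None]) -> None:
        self._publish = publish
        self.published = 0  # stand-in for a success counter metric
        self.failed = 0     # stand-in for a failure counter metric

    def publish(self, intent: dict) -> bool:
        """Attempt one publish; return False instead of propagating errors."""
        try:
            self._publish(intent)
        except Exception:
            # The caller has already committed business state, so the error
            # is recorded here rather than re-raised.
            self.failed += 1
            logger.exception("notification intent publish failed")
            return False
        self.published += 1
        return True
```

Returning a boolean instead of raising keeps the calling service's transaction boundaries unaffected by notification outages, which matches the "does not roll back" behaviour described above.
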