# Game Lobby Service

`galaxy/lobby` owns platform-level metadata and lifecycle of game sessions.

## References

- [Public REST contract](api/public-openapi.yaml)
- [Internal REST contract](api/internal-openapi.yaml)
- [System architecture](../ARCHITECTURE.md)
- [Notification catalog](../notification/README.md)
- [User Service lobby eligibility](../user/README.md)
- [Service-local docs](docs/)

## Purpose

`Game Lobby Service` is the platform source of truth for game sessions as platform entities — from creation through enrollment, start, runtime tracking, and finish. It mediates all player participation actions and maintains the roster state that `Game Master` may cache for runtime authorization.

## Scope

`Game Lobby` is the source of truth for:

- opaque stable game identifiers in `game-*` form
- game metadata: name, description, type, owner, schedule, engine version
- platform-level game status from `draft` through `finished` or `cancelled`
- enrollment configuration: `min_players`, `max_players`, `start_gap_hours`, `start_gap_players`, `enrollment_ends_at`
- applications and their approval or rejection status (public games)
- user-bound invitations and their lifecycle (private games)
- platform membership roster and participant status
- Race Name Directory state across all regular platform users: registered race names (permanent ownership), per-game reservations, and 30-day pending-registration windows
- per-game per-user `player_turn_stats` aggregate used at game finish for capability evaluation
- denormalized runtime snapshot imported from `Game Master`
- user-facing lists: active games, pending applications, open invitations

`Game Lobby` is not the source of truth for:

- platform user identity or profile — owned by `User Service`
- device sessions or authentication state — owned by `Auth / Session Service`
- runtime container lifecycle or technical health — owned by `Runtime Manager`
- current turn, generation state, engine reachability — owned by `Game Master`
- full per-player game state — owned by the game engine container
- player-to-engine UUID mapping — owned by `Game Master`

## Non-Goals

- `Game Lobby` does not call game engine containers directly; all engine interaction goes through `Game Master`.
- `Game Lobby` owns the Race Name Directory data in v1 (Redis adapter); the contract is kept behind a port interface so a future dedicated `Race Name Service` can replace the adapter without domain changes.
- `Game Lobby` does not compute notification audiences from roster data at delivery time; notification intents carry explicit `recipient_user_id` values.
- `Game Lobby` does not apply sanctions or session-level access control; `User Service` and `Auth / Session Service` remain authoritative for those.
- `Game Lobby` does not own billing or entitlement decisions; it reads the current entitlement snapshot from `User Service`.

## Position in the System

```mermaid
flowchart LR
    Gateway["Edge Gateway"]
    Lobby["Game Lobby Service"]
    User["User Service"]
    GM["Game Master"]
    Runtime["Runtime Manager"]
    Notify["Notification Service"]
    Redis["Redis\nKV + Streams"]

    Gateway --> Lobby
    Lobby --> User
    Lobby --> GM
    Lobby --> Redis
    Lobby --> Notify
    GM --> Redis
    Redis --> Lobby
    Runtime --> Redis
```

`Gateway` routes authenticated platform-level commands to `Lobby` over trusted REST. `Lobby` reads user eligibility from `User Service` synchronously. `Lobby` registers running games with `Game Master` synchronously at start. `Lobby` submits start jobs to `Runtime Manager` and reads job results from a dedicated Redis Stream. `Game Master` publishes runtime events to a dedicated Redis Stream that `Lobby` consumes asynchronously. `Lobby` publishes notification intents to `notification:intents`.

## Responsibility Boundaries

`Game Lobby` is responsible for:

- accepting and validating game creation and configuration commands
- opening and managing enrollment for public and private games
- validating user eligibility before accepting applications and invite redeems
- checking race name availability through the Race Name Directory port
- enforcing enrollment deadline and roster-size auto-transitions
- orchestrating the game start sequence with `Runtime Manager` and `Game Master`
- persisting game metadata atomically and removing orphaned containers when metadata persistence fails
- maintaining the denormalized runtime snapshot for user-facing reads
- emitting notification intents for all participant lifecycle events
- enforcing visibility rules: private games are visible only to owner and members

`Game Lobby` is not responsible for:

- verifying authenticated transport signatures — handled by `Edge Gateway`
- checking session revocation state — handled by `Edge Gateway` and `Auth`
- email delivery — handled by `Mail Service`
- push delivery — handled by `Notification Service` and `Edge Gateway`
- container start and stop mechanics — handled by `Runtime Manager`
- per-turn player command routing — handled by `Game Master`

## Runtime Surface

The service starts two HTTP listeners and one Redis Stream consumer pipeline.

### Listeners

- public authenticated REST on `LOBBY_PUBLIC_HTTP_ADDR` with default `:8094`
- internal trusted REST on `LOBBY_INTERNAL_HTTP_ADDR` with default `:8095`

### Background workers

- enrollment automation ticker — checks enrollment deadlines and roster thresholds at a configurable interval
- Runtime Manager result consumer — reads start-job results from a Redis Stream
- Game Master event consumer — reads runtime snapshot updates and game-finish events from a dedicated Redis Stream

### Startup dependencies

- one reachable Redis deployment at `LOBBY_REDIS_MASTER_ADDR` (mandatory password via `LOBBY_REDIS_PASSWORD`; replicas optional via `LOBBY_REDIS_REPLICA_ADDRS`). Used for streams, race-name directory, per-game runtime aggregates, and stream offsets.
- one reachable PostgreSQL primary at `LOBBY_POSTGRES_PRIMARY_DSN` (DSN must include `search_path=lobby&sslmode=disable`). Embedded goose migrations apply at startup before any listener opens; on migration or ping failure the service exits non-zero. The four core enrollment entities (game / application / invite / membership) live here after PG_PLAN.md §6A; `docs/postgres-migration.md` is the decision record.
- `User Service` reachable at `LOBBY_USER_SERVICE_BASE_URL` (startup check omitted; runtime failures are surfaced as request errors, not boot failures)
- `Game Master` at `LOBBY_GM_BASE_URL` (same policy — startup check omitted; unreachability at image-ref resolve fails `lobby.game.start` with `service_unavailable`, unreachability at register-runtime triggers the forced-pause path)

### Probes

- `GET /healthz` on both ports returns `{"status":"ok"}`
- `GET /readyz` on both ports returns `{"status":"ready"}` after successful startup; no live Redis or PostgreSQL ping per request

## Game Record Model

### Fields

| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | opaque, stable, `game-*` form |
| `game_name` | string | human-readable; mutable in `draft` |
| `description` | string | optional; mutable in `draft` and `enrollment_open` |
| `game_type` | enum | `public` or `private` |
| `owner_user_id` | string | private games only; empty for public |
| `status` | enum | see status table below |
| `min_players` | int | minimum approved participants to proceed to start |
| `max_players` | int | target roster size that activates the gap window |
| `start_gap_hours` | int | hours of gap window after `max_players` is reached |
| `start_gap_players` | int | additional participants admitted during the gap |
| `enrollment_ends_at` | int64 | UTC Unix seconds; deadline for automatic enrollment close |
| `turn_schedule` | string | cron expression, e.g. `0 18 * * *`; passed to GM at registration |
| `target_engine_version` | string | semver of the engine to launch; passed to GM at registration |
| `created_at` | int64 | UTC Unix milliseconds |
| `updated_at` | int64 | UTC Unix milliseconds |
| `started_at` | int64 | UTC Unix milliseconds; set when status becomes `running` |
| `finished_at` | int64 | UTC Unix milliseconds; set when status becomes `finished` |
| `current_turn` | int | denormalized from GM; zero until running |
| `runtime_status` | string | denormalized from GM; empty until running |
| `engine_health_summary` | string | denormalized from GM; empty until running |
| `runtime_binding` | object? | non-null after successful container start; contains `container_id`, `engine_endpoint`, `runtime_job_id`, `bound_at` (Unix ms) |

All fields set at creation are validated before the game record is persisted. `game_name` is required and must be non-empty after trim. `min_players`, `max_players`, `start_gap_hours`, `start_gap_players`, and `enrollment_ends_at` are required positive integers with `min_players <= max_players`. `turn_schedule` must be a valid five-field cron expression. `target_engine_version` must be a non-empty semver string.
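
The creation-time rules above can be sketched as a single validation function. This is a minimal illustration, not the service's actual code: the `GameConfig` type, the `ValidateGameConfig` function, and the error messages are hypothetical, the cron check only counts fields, and the semver pattern is a simplified approximation.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// GameConfig holds the creation-time fields from the table above.
// The type and function names are hypothetical, not the service's real API.
type GameConfig struct {
	GameName            string
	MinPlayers          int
	MaxPlayers          int
	StartGapHours       int
	StartGapPlayers     int
	EnrollmentEndsAt    int64 // UTC Unix seconds
	TurnSchedule        string
	TargetEngineVersion string
}

// semverRe is a simplified approximation of the semver grammar
// (MAJOR.MINOR.PATCH with optional pre-release and build metadata).
var semverRe = regexp.MustCompile(`^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

// ValidateGameConfig applies the creation rules and returns the first violation.
func ValidateGameConfig(c GameConfig) error {
	if strings.TrimSpace(c.GameName) == "" {
		return errors.New("game_name must be non-empty after trim")
	}
	positives := []struct {
		name  string
		value int64
	}{
		{"min_players", int64(c.MinPlayers)},
		{"max_players", int64(c.MaxPlayers)},
		{"start_gap_hours", int64(c.StartGapHours)},
		{"start_gap_players", int64(c.StartGapPlayers)},
		{"enrollment_ends_at", c.EnrollmentEndsAt},
	}
	for _, p := range positives {
		if p.value <= 0 {
			return fmt.Errorf("%s must be a positive integer", p.name)
		}
	}
	if c.MinPlayers > c.MaxPlayers {
		return errors.New("min_players must be <= max_players")
	}
	// Assumption: only the field count is checked here; a real cron parser
	// would also validate each field's range and syntax.
	if len(strings.Fields(c.TurnSchedule)) != 5 {
		return errors.New("turn_schedule must be a five-field cron expression")
	}
	if !semverRe.MatchString(c.TargetEngineVersion) {
		return errors.New("target_engine_version must be a semver string")
	}
	return nil
}

func main() {
	cfg := GameConfig{
		GameName: "Alpha", MinPlayers: 2, MaxPlayers: 4,
		StartGapHours: 12, StartGapPlayers: 1, EnrollmentEndsAt: 1700000000,
		TurnSchedule: "0 18 * * *", TargetEngineVersion: "1.2.3",
	}
	fmt.Println("valid config:", ValidateGameConfig(cfg))
	cfg.MinPlayers = 5
	fmt.Println("min > max:", ValidateGameConfig(cfg))
}
```

Returning the first violation keeps the sketch short; an implementation that reports every violation at once would collect the errors instead.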

### Status vocabulary

| Status | Meaning |
| --- | --- |
| `draft` | Created; enrollment not yet open; editable |
| `enrollment_open` | Accepting applications (public) or invite redeems (private) |
| `ready_to_start` | Enrollment closed; start command accepted |
| `starting` | Start job submitted to Runtime Manager; awaiting result |
| `start_failed` | Container start or metadata persistence failed |
| `running` | Game engine container live; normal gameplay |
| `paused` | Platform-level pause; engine container may still be alive |
| `finished` | Game ended; record is terminal |
| `cancelled` | Cancelled before start; record is terminal |

### Status transition table

| From | To | Trigger |
| --- | --- | --- |
| `draft` | `enrollment_open` | explicit command from admin (public) or owner (private) |
| `enrollment_open` | `ready_to_start` | manual command when `approved_count >= min_players` |
| `enrollment_open` | `ready_to_start` | `enrollment_ends_at` reached and `approved_count >= min_players` |
| `enrollment_open` | `ready_to_start` | gap window exhausted (time or player count) |
| `ready_to_start` | `starting` | start command from admin (public) or owner (private) |
| `starting` | `running` | Runtime Manager confirms container; GM registration succeeds |
| `starting` | `paused` | Runtime Manager confirms container; GM registration fails (unavailable) |
| `starting` | `start_failed` | Runtime Manager reports container start failure |
| `start_failed` | `ready_to_start` | explicit retry command from admin or owner |
| `running` | `paused` | explicit pause command from admin or owner |
| `running` | `finished` | `game_finished` event from `Game Master` via Redis Stream |
| `paused` | `running` | explicit resume command from admin or owner |
| `paused` | `finished` | `game_finished` event from `Game Master` via Redis Stream |
| `draft` | `cancelled` | explicit cancel command from admin or owner |
| `enrollment_open` | `cancelled` | explicit cancel command from admin or owner |
| `ready_to_start` | `cancelled` | explicit cancel command from admin or owner |
| `start_failed` | `cancelled` | explicit cancel command from admin or owner |
| `draft` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `enrollment_open` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `ready_to_start` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `start_failed` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `starting` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `running` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `paused` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |

Outside the `external_block` cascade, `running` and `paused` games cannot be cancelled directly; use stop operations through `Game Master` and await the `game_finished` event instead. The cascade publishes a stop-job to Runtime Manager before applying the `external_block` transition for in-flight games.

## Enrollment Rules

`enrollment_open → ready_to_start` fires on the first of these conditions:

### Manual close

Admin (public game) or owner (private game) issues `lobby.game.ready_to_start` when `approved_count >= min_players`.

### Deadline

Enrollment automation worker detects that `enrollment_ends_at` is in the past and `approved_count >= min_players`. If the deadline is reached but `approved_count < min_players`, the game remains in `enrollment_open` — the transition does not fire until the player count condition is also satisfied.

### Gap exhaustion

When `approved_count` reaches `max_players`, the gap window opens.
During the gap window:

- new applications and invite redeems continue to be accepted up to `max_players + start_gap_players` total approved participants
- the game does not automatically transition while the gap is open

The transition fires when either:

- `start_gap_hours` have elapsed since the gap window opened, or
- `approved_count` reaches `max_players + start_gap_players`

### On enrollment close

When any path transitions the game to `ready_to_start`:

- all invites in `created` status transition to `expired`
- `lobby.invite.expired` notification intents are published for each expired invite (recipient: private-game owner)
- no new applications are accepted in `ready_to_start` status

## Application Lifecycle

Applications are used for public games only. Private games use the invite flow exclusively.

### Submit

An authenticated user submits `lobby.application.submit` with `race_name`.

Pre-conditions checked synchronously:

- game status is `enrollment_open`
- game type is `public`
- user has no existing non-rejected application to the same game
- `User Service` eligibility check confirms `can_join_game=true`
- `approved_count < max_players + start_gap_players` (or gap window not yet open)
- Race Name Directory confirms `race_name` is available for the applicant

On success:

- an `Application` record is created with `status=submitted`
- `lobby.application.submitted` intent published (`audience_kind=admin_email`) with payload: `game_id`, `game_name`, `applicant_user_id`, `applicant_name`

`applicant_name` in the notification payload equals the submitted `race_name`.

### Approve

Admin issues `lobby.application.approve`.

Pre-conditions:

- game is `enrollment_open`
- application is in `submitted` status
- `approved_count < max_players + start_gap_players`

On success:

- Race Name Directory reserves `race_name` for the applicant
- application `status` → `approved`
- `Membership` record created with `status=active`
- `lobby.membership.approved` intent published (recipient: applicant) with payload: `game_id`, `game_name`
- gap window opens automatically if `approved_count` now equals `max_players`
- auto-transition to `ready_to_start` if gap exhaustion condition is immediately met

### Reject

Admin issues `lobby.application.reject`.

Pre-conditions:

- application is in `submitted` status

On success:

- application `status` → `rejected`
- any pending Race Name Directory reservation for the applicant is released
- `lobby.membership.rejected` intent published (recipient: applicant) with payload: `game_id`, `game_name`

### Application state machine

```text
submitted → approved
submitted → rejected
```

Rejected applicants may re-apply while enrollment is open, subject to a single active application constraint (at most one non-rejected application per user per game).

The single-active constraint is enforced at the persistence layer by the `user_game_application` key (see Redis Logical Model). The key is created atomically with the submitted application record, removed on rejection, and preserved on approval. Service-layer code can rely on this invariant without performing its own scan of `user_applications`.

## Invite Lifecycle

Invites are used for private games only. Public games use the application flow exclusively.

### Create

Private-game owner issues `lobby.invite.create` with `invitee_user_id`.

Pre-conditions:

- game status is `enrollment_open`
- game type is `private`
- the invitee has no active invite or active membership in the game
- `approved_count < max_players + start_gap_players`

On success:

- `Invite` record created with `status=created`
- `expires_at` is set to `enrollment_ends_at` of the game
- `lobby.invite.created` intent published (recipient: invitee) with payload: `game_id`, `game_name`, `inviter_user_id`, `inviter_name`

`inviter_name` is the owner's race name if already a member of the game; otherwise it is the owner's `user_id`.

### Redeem

The invited user issues `lobby.invite.redeem` with `race_name`.

Pre-conditions:

- invite status is `created`
- game is `enrollment_open`
- `approved_count < max_players + start_gap_players`
- inviter and invitee both exist and are not permanently blocked in `User Service`
- Race Name Directory confirms `race_name` is available for the invitee

On success:

- Race Name Directory reserves `race_name` for the invitee
- invite `status` → `redeemed`
- `Membership` record created with `status=active`
- `lobby.invite.redeemed` intent published (recipient: private-game owner) with payload: `game_id`, `game_name`, `invitee_user_id`, `invitee_name`
- gap window opens automatically if `approved_count` now equals `max_players`
- auto-transition to `ready_to_start` if gap exhaustion condition is immediately met

The synchronous `User Service` check on both inviter and invitee enforces the rule that an invite from or to a permanently blocked or deleted user behaves as if it never existed, even before the asynchronous user-lifecycle cascade has flipped the invite to `revoked`. Cascade-deleted accounts and `permanent_block` sanctions surface as `subject_not_found`.

### Decline

The invited user issues `lobby.invite.decline`.

Pre-conditions:

- invite status is `created`

On success:

- invite `status` → `declined`
- no notification in v1

Declined users may receive a new invite from the owner while enrollment is open.

### Revoke

Owner issues `lobby.invite.revoke`.

Pre-conditions:

- invite status is `created`

On success:

- invite `status` → `revoked`
- no notification in v1

### Expire

Pending invites (`status=created`) are transitioned to `expired` automatically when the game moves to `ready_to_start`. `lobby.invite.expired` intent is published for each expired invite (recipient: private-game owner) with payload: `game_id`, `game_name`, `invitee_user_id`, `invitee_name`.

### Invite state machine

```text
created → redeemed
created → declined
created → revoked
created → expired
```

## Membership Model

### Fields

| Field | Type | Notes |
| --- | --- | --- |
| `membership_id` | string | opaque, stable |
| `game_id` | string | reference to game |
| `user_id` | string | reference to platform user |
| `race_name` | string | confirmed in-game name as submitted (original casing) |
| `canonical_key` | string | canonicalized key under which the RND reservation is held |
| `status` | enum | `active`, `removed`, `blocked` |
| `joined_at` | int64 | UTC Unix milliseconds |
| `removed_at` | int64 | UTC Unix milliseconds; set on remove or block |

### Status vocabulary

| Status | Meaning |
| --- | --- |
| `active` | Full participant; may send commands through `Game Master` |
| `removed` | Permanently removed; engine slot deactivated after game start |
| `blocked` | Platform-level block; engine slot retained but commands blocked |

### Status transition table

| From | To | Trigger |
| --- | --- | --- |
| `active` | `removed` | explicit remove command from admin or owner (post-start) |
| `active` | `blocked` | explicit block command from admin or owner |

`removed` and `blocked` are terminal statuses. Pre-start remove drops the membership record entirely rather than transitioning to `removed` (see Removal rules below).
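
The membership transition table is small enough to encode as a lookup. The following is a minimal sketch under stated assumptions: `MembershipStatus`, `CanTransition`, `RemoveAction`, and the `drop_record` / `mark_removed` labels are hypothetical names chosen for illustration and do not come from the service code.

```go
package main

import "fmt"

// MembershipStatus is a hypothetical type for the three documented statuses.
type MembershipStatus string

const (
	MembershipActive  MembershipStatus = "active"
	MembershipRemoved MembershipStatus = "removed"
	MembershipBlocked MembershipStatus = "blocked"
)

// allowedTransitions encodes the transition table. removed and blocked have
// no outgoing entries because they are terminal.
var allowedTransitions = map[MembershipStatus][]MembershipStatus{
	MembershipActive: {MembershipRemoved, MembershipBlocked},
}

// CanTransition reports whether the table permits moving from one status to another.
func CanTransition(from, to MembershipStatus) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

// RemoveAction reports what a remove command does to the membership record.
// The returned labels are illustrative only.
func RemoveAction(gameStarted bool) string {
	if !gameStarted {
		// Pre-start remove drops the record instead of marking it removed.
		return "drop_record"
	}
	return "mark_removed"
}

func main() {
	fmt.Println(CanTransition(MembershipActive, MembershipBlocked))  // allowed
	fmt.Println(CanTransition(MembershipRemoved, MembershipActive)) // terminal
	fmt.Println(RemoveAction(false), RemoveAction(true))
}
```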

### Removal rules

Before game start:

- remove drops membership and releases the race name reservation

After game start:

- `blocked`: the player cannot send commands; engine keeps the player slot
- `removed`: `Game Lobby` marks membership `removed`; `Game Master` must also deactivate the player inside the engine; race name reservation remains until game is finished

This distinction is architectural and must remain explicit in all implementations.

## Race Name Directory

### Purpose

`Race Name Directory` (RND) is the platform source of truth for all in-game `race_name` values. It owns three levels of state per name:

- **registered** — permanent user-owned names. Once registered, the name is unavailable to any other user and cannot be released by the owner; only `permanent_block` or `DeleteUser` on the owning account frees it.
- **reservation** — a per-game holding created when a participant joins through application approval or invite redeem. Reservations are keyed by `(game_id, canonical_key)`. One user may hold the same name in multiple active games concurrently.
- **pending_registration** — a reservation that survived a capable finish and is now waiting up to 30 days for the owner to upgrade it into a registered name via `lobby.race_name.register`. Expiration releases the binding.

`User Service` does not store `race_name` values. It only exposes `max_registered_race_names` in the eligibility snapshot and publishes `user.lifecycle.permanent_blocked` / `user.lifecycle.deleted` events.

### Canonical key + confusable-pair policy

Every RND key is derived by `racename.Canonicalize(raceName) (canonical string, err error)` living in `lobby/internal/domain/racename/policy.go`:

1. trim and validate the character set via `pkg/util/string.go:ValidateTypeName`;
2. lowercase Unicode fold;
3. apply the frozen confusable-pair replacement map (ported from the former `user/internal/ports/race_name_policy.go`).
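
The three steps can be sketched as follows. This is an illustration only, not the code in `policy.go`: the accepted character set, the two-entry `confusables` map, and the use of `strings.ToLower` as a stand-in for a full Unicode case fold are all assumptions, since the real validator and the frozen replacement map are not reproduced in this document.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

// ErrInvalidName mirrors the sentinel name listed for the port; it is
// redeclared here only to keep the sketch self-contained.
var ErrInvalidName = errors.New("invalid race name")

// confusables is a tiny hypothetical replacement map. The real frozen map
// is not shown in this document.
var confusables = map[rune]rune{'0': 'o', '1': 'l'}

// Canonicalize derives a canonical key in three steps: trim and validate,
// lowercase, then replace confusable characters.
func Canonicalize(raceName string) (string, error) {
	// Step 1: trim and validate. The accepted character set (letters, digits,
	// space, '-', '_') is an assumption standing in for ValidateTypeName.
	trimmed := strings.TrimSpace(raceName)
	if trimmed == "" {
		return "", ErrInvalidName
	}
	for _, r := range trimmed {
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != ' ' && r != '-' && r != '_' {
			return "", ErrInvalidName
		}
	}
	// Step 2: lowercase. strings.ToLower approximates a Unicode case fold.
	lowered := strings.ToLower(trimmed)
	// Step 3: map each confusable character to its canonical replacement.
	var b strings.Builder
	for _, r := range lowered {
		if repl, ok := confusables[r]; ok {
			r = repl
		}
		b.WriteRune(r)
	}
	return b.String(), nil
}

func main() {
	for _, name := range []string{"  Zer0 ", "ZERO", "bad!"} {
		key, err := Canonicalize(name)
		fmt.Printf("%q -> %q, err=%v\n", name, key, err)
	}
}
```

Under these assumptions `Zer0` and `ZERO` collapse to the same key, which is the property the directory relies on when it compares holders on a canonical key.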

A name is considered taken for the actor when the RND holds at least one `registered`, active `reservation`, or `pending_registration` whose owner differs from the actor on the same canonical key.

### Port interface

```go
type RaceNameDirectory interface {
    Canonicalize(raceName string) (canonical string, err error)
    Check(ctx context.Context, raceName, actorUserID string) (Availability, error)
    Reserve(ctx context.Context, gameID, userID, raceName string) error
    ReleaseReservation(ctx context.Context, gameID, userID, raceName string) error
    MarkPendingRegistration(
        ctx context.Context,
        gameID, userID, raceName string,
        eligibleUntil time.Time,
    ) error
    ExpirePendingRegistrations(ctx context.Context, now time.Time) ([]ExpiredPending, error)
    Register(ctx context.Context, gameID, userID, raceName string) error
    ListRegistered(ctx context.Context, userID string) ([]RegisteredName, error)
    ListPendingRegistrations(ctx context.Context, userID string) ([]PendingRegistration, error)
    ListReservations(ctx context.Context, userID string) ([]Reservation, error)
    ReleaseAllByUser(ctx context.Context, userID string) error
}

type Availability struct {
    Taken        bool
    HolderUserID string // "" when available
    Kind         string // "registered" | "reservation" | "pending_registration"
}
```

Sentinel errors: `ErrNameTaken`, `ErrInvalidName`, `ErrPendingMissing`, `ErrPendingExpired`, `ErrQuotaExceeded`.

### v1 backends

- **PostgreSQL** (`lobby/internal/adapters/postgres/racenamedir/directory.go`) — the production adapter; one row per binding under `lobby.race_names`, transactional writes guarded by `pg_advisory_xact_lock(hashtextextended(canonical_key, 0))`. See `docs/postgres-migration.md` §6B for the full schema and decision record.
- **In-memory** (`lobby/internal/adapters/racenameinmem/directory.go`) — in-process implementation used by unit tests that do not need PostgreSQL and by deployments that select the in-memory backend with `LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub` (the config token name is preserved for backward compatibility).

A future dedicated `Race Name Service` replaces the adapter without changing the domain or service layer.

### Reservation lifecycle and capability

1. `approveapplication` / `redeeminvite` → `Reserve(game_id, user_id, race_name)`.
2. `removemember` before start → `ReleaseReservation`.
3. `removemember` / `blockmember` after start → reservation kept; resolved at `game_finished`.
4. On `game_finished` the capability evaluator runs per active membership:
   - `capable = max_planets > initial_planets AND max_population > initial_population`, using the per-game stats aggregate (see §Runtime Snapshot);
   - capable ⇒ `MarkPendingRegistration(..., finished_at + 30 days)` + `lobby.race_name.registration_eligible`;
   - not capable ⇒ `ReleaseReservation` + optional `lobby.race_name.registration_denied`.
5. The pending-registration worker (`LOBBY_RACE_NAME_EXPIRATION_INTERVAL`) releases expired entries.

### Registration flow

`lobby.race_name.register` → `POST /api/v1/lobby/race-names/register`:

- actor is the authenticated user;
- body: `{race_name, source_game_id}`;
- preconditions:
  - `pending_registration` exists for `(source_game_id, user_id, canonical_key)` with `eligible_until > now`;
  - `UserService.GetEligibility` snapshot: no `permanent_block`, `current_registered_count < max_registered_race_names` (a snapshot value of `0` denotes unlimited);
- commit: `RND.Register` atomically deletes the pending entry, creates a registered entry, and publishes `lobby.race_name.registered`.

Errors: `race_name_registration_quota_exceeded`, `race_name_pending_window_expired`, `subject_not_found`, `forbidden`.
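
The registration preconditions can be read as an ordered guard. The sketch below is one possible arrangement, not the service's implementation: `Pending`, `Eligibility`, `CheckRegister`, and `ErrBlocked` are hypothetical, timestamps are plain Unix milliseconds, and the mapping from these sentinels to the wire error codes listed above is deliberately left out because this document does not state it.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinels redeclared locally for the sketch. ErrBlocked is hypothetical.
var (
	ErrPendingMissing = errors.New("pending registration missing")
	ErrPendingExpired = errors.New("pending registration window expired")
	ErrQuotaExceeded  = errors.New("registered race name quota exceeded")
	ErrBlocked        = errors.New("user is permanently blocked")
)

// Pending is a hypothetical view of a pending_registration entry.
type Pending struct {
	EligibleUntilMs int64 // UTC Unix milliseconds
}

// Eligibility is a hypothetical subset of the User Service snapshot.
type Eligibility struct {
	PermanentBlock         bool
	MaxRegisteredRaceNames int // 0 denotes unlimited
}

// CheckRegister evaluates the documented preconditions in order and returns
// the first one that fails, or nil when registration may commit.
func CheckRegister(p *Pending, e Eligibility, currentRegistered int, nowMs int64) error {
	if p == nil {
		return ErrPendingMissing
	}
	// The window is open only while eligible_until is strictly after now.
	if p.EligibleUntilMs <= nowMs {
		return ErrPendingExpired
	}
	if e.PermanentBlock {
		return ErrBlocked
	}
	// A quota of 0 means unlimited, so the comparison is skipped.
	if e.MaxRegisteredRaceNames != 0 && currentRegistered >= e.MaxRegisteredRaceNames {
		return ErrQuotaExceeded
	}
	return nil
}

func main() {
	now := int64(1_700_000_000_000)
	p := &Pending{EligibleUntilMs: now + 1000}
	fmt.Println(CheckRegister(p, Eligibility{MaxRegisteredRaceNames: 0}, 10, now))
	fmt.Println(CheckRegister(p, Eligibility{MaxRegisteredRaceNames: 2}, 2, now))
	fmt.Println(CheckRegister(nil, Eligibility{}, 0, now))
}
```

The order of the checks is an assumption; the document lists the preconditions but does not say which failure wins when several apply.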

### Self-service reads

`lobby.race_names.list` → `GET /api/v1/lobby/my/race-names` returns the acting user's `{registered[], pending[], reservations[]}` using the `user_registered` / `user_reservations` indexes (no full scan). The response shape is fixed by `api/public-openapi.yaml` and carries:

- `registered[]`: `canonical_key`, `race_name`, `source_game_id`, `registered_at_ms`;
- `pending[]`: `canonical_key`, `race_name`, `source_game_id` (the game whose capable finish promoted the reservation), `reserved_at_ms`, `eligible_until_ms`;
- `reservations[]`: `canonical_key`, `race_name`, `game_id`, `reserved_at_ms`, `game_status` (current `game.Status` of the hosting game, joined on read).

Each slice is sorted ascending by its time field with `canonical_key` as the tie-breaker so the wire output is stable. The endpoint is exclusively self-service: there is no `?user_id=` parameter and no admin counterpart on the internal port. Visibility is enforced by the `X-User-ID` header alone.

### Cascade release

`Game Lobby` consumes `user:lifecycle_events` through a dedicated worker. On `user.lifecycle.permanent_blocked` or `user.lifecycle.deleted`:

- `RND.ReleaseAllByUser(user_id)` clears every registered, reservation, and pending entry owned by the user;
- every active membership held by the user transitions to `blocked`. For each such membership in a third-party private game, a `lobby.membership.blocked` intent is published to the game owner;
- every outstanding `submitted` application authored by the user is rejected;
- every `created` invite where the user is invitee or inviter transitions to `revoked`;
- every non-terminal game owned by the user transitions to `cancelled` via the `external_block` trigger. For in-flight games (`starting`, `running`, `paused`) a stop-job is published to Runtime Manager before the status transition.

Synchronous guard: `lobby.invite.redeem` calls `UserService.GetEligibility` for both the inviter and the invitee. If either party has been permanently blocked or soft-deleted, the redeem fails with `subject_not_found`, matching the "as if the invite never existed" semantic even before the cascade flips the invite to `revoked`.

### Retry and release semantics

- `Reserve` is idempotent for the same holder under the same game. A second call returns no error so that `approveapplication` and `redeeminvite` retries after transient upstream failures stay safe.
- `ReleaseReservation` is a no-op when no reservation exists for the tuple and also when the reservation belongs to a different user. Defensive release paths (`rejectapplication`, `revokeinvite`, `declineinvite`) never surface an error.
- `Register` is idempotent only for the same `(game_id, user_id, race_name)` tuple — repeated calls after success return the same registered record without consuming additional quota.
- `MarkPendingRegistration` is idempotent when called with the same `eligible_until`; re-emitting it with a different timestamp returns `ErrInvalidName`.

## Game Start Flow

The start sequence spans three services and must be treated as a distributed transaction with explicit failure handling.

```mermaid
sequenceDiagram
    participant Admin as Admin or Private Owner
    participant Lobby
    participant Runtime
    participant GM as Game Master
    participant Redis

    Admin->>Lobby: lobby.game.start
    Lobby->>Lobby: validate ready_to_start + roster
    Lobby->>GM: GET /internal/engine-versions/{version}/image-ref (sync)
    alt GM image-ref resolve failed
        GM-->>Lobby: error / timeout / not found
        Lobby-->>Admin: service_unavailable (GM unreachable) or engine_version_not_found
    else image_ref resolved
        GM-->>Lobby: 200 OK { image_ref }
        Lobby->>Lobby: status → starting
        Lobby->>Redis: publish start job to runtime:start_jobs (with image_ref)
        Runtime->>Runtime: start container
        Runtime->>Redis: publish result to runtime:job_results
        alt container start failed
            Lobby->>Lobby: status → start_failed
        else container started
            Lobby->>Lobby: persist runtime binding
            Lobby->>GM: POST /internal/games/{game_id}/register-runtime (sync)
            alt GM registration success
                GM-->>Lobby: 200 OK
                Lobby->>Lobby: status → running; set started_at
            else GM unavailable
                GM-->>Lobby: error / timeout
                Lobby->>Lobby: status → paused
                Lobby->>Redis: publish lobby.runtime_paused_after_start intent
            end
        end
    end
```

### Image-ref resolution (synchronous via Game Master)

Before publishing the start job, `Lobby` resolves the Docker `image_ref` for `target_engine_version` by calling `GET /api/v1/internal/engine-versions/{version}/image-ref` on `Game Master`'s internal port. The call is synchronous and runs while the game is still in `ready_to_start`:

- success ⇒ `Lobby` proceeds to `starting`, embeds the resolved `image_ref` into the `runtime:start_jobs` envelope, and publishes;
- the version is missing or deprecated on GM (`engine_version_not_found`) ⇒ `lobby.game.start` returns `engine_version_not_found`; the game stays in `ready_to_start`;
- GM is unreachable (network error, timeout, `5xx`) ⇒ `lobby.game.start` returns `service_unavailable`; the game stays in `ready_to_start` and the operator can retry.
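
The three outcomes above amount to a small classification of the GM response. The sketch below assumes details this document does not confirm: that a missing or deprecated version is signalled with HTTP `404`, and that any other unexpected status is treated like an unreachable GM. `StartOutcome` and `ClassifyImageRefResponse` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// StartOutcome is a hypothetical label for what lobby.game.start does next.
type StartOutcome string

const (
	OutcomeProceed         StartOutcome = "proceed_to_starting"
	OutcomeVersionNotFound StartOutcome = "engine_version_not_found"
	OutcomeUnavailable     StartOutcome = "service_unavailable"
)

// ClassifyImageRefResponse maps the result of the image-ref call to an outcome.
// callErr covers network errors and timeouts; statusCode is read only when
// callErr is nil. In every non-proceed case the game stays in ready_to_start.
func ClassifyImageRefResponse(statusCode int, callErr error) StartOutcome {
	switch {
	case callErr != nil:
		return OutcomeUnavailable
	case statusCode == http.StatusOK:
		return OutcomeProceed
	case statusCode == http.StatusNotFound:
		// Assumption: GM reports engine_version_not_found with a 404.
		return OutcomeVersionNotFound
	default:
		// 5xx is documented as service_unavailable; treating every other
		// unexpected status the same way is an assumption of this sketch.
		return OutcomeUnavailable
	}
}

func main() {
	fmt.Println(ClassifyImageRefResponse(http.StatusOK, nil))
	fmt.Println(ClassifyImageRefResponse(http.StatusNotFound, nil))
	fmt.Println(ClassifyImageRefResponse(http.StatusBadGateway, nil))
	fmt.Println(ClassifyImageRefResponse(0, errors.New("dial timeout")))
}
```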

Resolving against GM is the v1 contract; the legacy `LOBBY_ENGINE_IMAGE_TEMPLATE` Go-template variable is retired together with the inline `engineimage.Resolver`.

### Critical invariants

- If the container starts but `Lobby` cannot persist the runtime binding metadata, the start is a full failure: `Lobby` must issue a stop job to `Runtime Manager` with `reason=orphan_cleanup` before setting `start_failed`.
- If metadata is persisted but `Game Master` is unavailable, the game must be placed in `paused`, not in `start_failed`. The container is alive; only the platform tracking is incomplete.
- If `Game Master` is unavailable at image-ref resolve time, the start command itself fails with `service_unavailable`. The game stays in `ready_to_start`; no container is created and no `runtime:start_jobs` envelope is published.
- No start job is accepted while the game is not in `ready_to_start`.
- Concurrent start attempts for the same game must be serialized; the second attempt must fail if the first already moved the game to `starting`.

### Runtime Manager envelopes

`Lobby` is the producer for both `runtime:start_jobs` and `runtime:stop_jobs`. The `Lobby ↔ Runtime Manager` transport stays asynchronous indefinitely; there is no synchronous Lobby→RTM REST call in v1 or planned for v2.

`runtime:start_jobs` envelope:

| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | Lobby `game_id`. |
| `image_ref` | string | Docker reference resolved synchronously from `target_engine_version` against `Game Master`'s engine version registry; see §Game Start Flow. |
| `requested_at_ms` | int64 | UTC milliseconds; diagnostics only. |

`runtime:stop_jobs` envelope:

| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | |
| `reason` | enum | `orphan_cleanup`, `cancelled`, `finished`, `admin_request`, `timeout`. |
| `requested_at_ms` | int64 | UTC milliseconds. |

`reason` semantics (Lobby producer side):

- `orphan_cleanup` — used by Lobby's runtime-job-result consumer to release a container whose metadata persistence failed after a successful container start.
- `cancelled` — used by the user-lifecycle cascade and by explicit cancel paths for in-flight games.
- `finished` — reserved; not produced by Lobby in v1 because `game_finished` is engine-driven and stop jobs after finish are an Admin/GM concern.
- `admin_request` — reserved for future admin-initiated stop paths through Lobby; not produced in v1.
- `timeout` — reserved for future enrollment-timeout-driven stop paths; not produced in v1.

### Design rationale: StopReason placement

The `StopReason` enum is declared in `lobby/internal/ports/runtimemanager.go` alongside the `RuntimeManager` interface that consumes it. The enum is publisher-side protocol: it mirrors the AsyncAPI discriminator on `runtime:stop_jobs`, has no behaviour beyond `Validate`, and co-locating it with the interface keeps the AsyncAPI ↔ Go mapping visible in one file.

Alternatives considered and rejected:

- a dedicated `lobby/internal/domain/runtimejob` package — manufactures a domain layer for a single string enum that exists only to be serialised onto a Redis Stream;
- placing the enum in the publisher adapter package (`lobby/internal/adapters/runtimemanager`) — the callers (start-game service, runtime-job-result worker, user-lifecycle worker) live outside that package and would have to depend on a concrete adapter for an enum value.

## Paused State

`Lobby.paused` is a platform-level pause, distinct from `Game Master` runtime failure states. Two paths lead to `paused`:

### Voluntary pause

Admin or owner issues `lobby.game.pause` while the game is `running`. Resume is issued with `lobby.game.resume`; `Lobby` performs a synchronous liveness check against `Game Master` before transitioning back to `running`.
### Forced pause (GM unavailable after start)

If the game start sequence succeeds at the runtime layer but `Game Master` registration fails, `Lobby` transitions to `paused` and publishes `lobby.runtime_paused_after_start` to administrators. Administrators investigate, restore `Game Master`, and issue `lobby.game.resume` through the internal admin surface.

## Game Finish Flow

`Game Master` publishes a `game_finished` event to the GM events Redis Stream when the engine reports that the game has ended. `Lobby` consumes this event and, before advancing the stream offset:

- transitions game status to `finished`
- sets `finished_at` to the event timestamp
- updates the denormalized runtime snapshot with the final values
- runs the capability evaluator against every `active` membership:
  - `capable = max_planets > initial_planets AND max_population > initial_population` from the per-member stats aggregate
  - capable ⇒ `RND.MarkPendingRegistration(game_id, user_id, race_name, finished_at + 30 days)` and publish `lobby.race_name.registration_eligible`
  - not capable ⇒ `RND.ReleaseReservation(game_id, user_id, race_name)` and (optional) publish `lobby.race_name.registration_denied`
- resolves outstanding reservations on `removed` and `blocked` memberships by calling `RND.ReleaseReservation` (post-start remove/block keeps the reservation alive specifically so capability evaluation resolves it here)
- deletes the per-game stats aggregate

The `game_finished` event from `Game Master` is the sole trigger for the `finished` status. `Lobby` does not independently decide that a game is finished.

Capability evaluation must be idempotent: a replayed `game_finished` event must not produce additional RND side effects or notifications.

## Runtime Snapshot

`Game Lobby` stores a denormalized runtime snapshot on the game record to prevent fan-out reads to `Game Master` on every user-facing list or detail request, and aggregates per-member stats to support capability evaluation at game finish.

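The capability rule and the replay requirement from §Game Finish Flow can be sketched together. This is an illustrative sketch only: `memberStats`, `capable`, `finishGuard`, and `evaluateOnce` are hypothetical names, and the in-memory map stands in for the Redis capability-evaluation guard listed under §Persistence Layout. The real evaluator also performs RND calls and publishes notifications, which are omitted here.

```go
package main

// memberStats mirrors the per-member aggregate shape described in this README.
type memberStats struct {
	InitialPlanets, InitialPopulation int64
	MaxPlanets, MaxPopulation         int64
}

// capable applies the documented rule: both planets and population must have
// grown beyond their initial values at some point during the game.
func capable(s memberStats) bool {
	return s.MaxPlanets > s.InitialPlanets && s.MaxPopulation > s.InitialPopulation
}

// finishGuard is a hypothetical in-memory stand-in for the per-game
// capability-evaluation sentinel.
type finishGuard struct{ done map[string]bool }

// evaluateOnce returns a user_id → capable map for the first game_finished
// event seen for gameID. A replayed event returns (nil, false) so the caller
// can skip RND side effects and notifications.
func (g *finishGuard) evaluateOnce(gameID string, members map[string]memberStats) (map[string]bool, bool) {
	if g.done == nil {
		g.done = make(map[string]bool)
	}
	if g.done[gameID] {
		return nil, false
	}
	result := make(map[string]bool, len(members))
	for userID, stats := range members {
		result[userID] = capable(stats)
	}
	g.done[gameID] = true
	return result, true
}
```

In a real consumer the ordering between side effects and setting the sentinel matters for crash recovery; the sketch sidesteps that by having no side effects.
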
### Denormalized snapshot fields

| Field | Source |
| --- | --- |
| `current_turn` | GM event `runtime_snapshot_update` |
| `runtime_status` | GM event `runtime_snapshot_update` |
| `engine_health_summary` | GM event `runtime_snapshot_update` |

### Per-member stats aggregate

Each `runtime_snapshot_update` carries a `player_turn_stats` array with one entry per active member: `{user_id, planets, population}`. `Lobby` aggregates these in `lobby:game_turn_stats::` with the shape `{initial_planets, initial_population, max_planets, max_population}`. `ships_built` is not part of the contract; the capability rule reduces to `planets` and `population` only.

Rules:

- `initial_*` values are frozen from the first event after `starting → running`; later events must not change them.
- `max_*` values are maintained by max-semantic update; they never decrease.
- the aggregate is read once by the capability evaluator at `game_finished` and then deleted.

### Update mechanism

`Game Master` publishes events to a dedicated Redis Stream consumed by `Lobby`:

- `runtime_snapshot_update`: carries updated `current_turn`, `runtime_status`, `engine_health_summary`, and `player_turn_stats`; `Lobby` applies a compare-and-swap update on the game record plus a stats aggregate upsert.
- `game_finished`: carries final snapshot values and signals the finish transition; capability evaluator (see §Game Finish Flow) runs before the stream offset is advanced.

`Lobby` does not expose the runtime snapshot update as an internal HTTP endpoint. All snapshot updates are asynchronous and delivered through the stream.

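The aggregate rules under "Per-member stats aggregate" amount to a small fold over incoming `player_turn_stats` entries. The sketch below is a hedged illustration: `turnStats`, `statsAggregate`, and `applyTurnStats` are hypothetical names, and persistence (the Redis upsert) is left out so that only the freeze and max semantics are shown.

```go
package main

// turnStats is one player_turn_stats entry for a single member (user_id omitted).
type turnStats struct {
	Planets, Population int64
}

// statsAggregate mirrors the documented aggregate shape.
type statsAggregate struct {
	InitialPlanets, InitialPopulation int64
	MaxPlanets, MaxPopulation         int64
}

// applyTurnStats folds one entry into the aggregate. A nil prev means this is
// the first event for the member: initial_* and max_* both start at the
// reported values. On later events initial_* stay frozen and max_* only grow.
func applyTurnStats(prev *statsAggregate, s turnStats) statsAggregate {
	if prev == nil {
		return statsAggregate{
			InitialPlanets:    s.Planets,
			InitialPopulation: s.Population,
			MaxPlanets:        s.Planets,
			MaxPopulation:     s.Population,
		}
	}
	next := *prev
	if s.Planets > next.MaxPlanets {
		next.MaxPlanets = s.Planets
	}
	if s.Population > next.MaxPopulation {
		next.MaxPopulation = s.Population
	}
	return next
}
```

Because the update is monotonic in `max_*` and never touches `initial_*`, re-applying an already seen event leaves the aggregate unchanged.
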
## Public vs Private Game Rules

### Public games

- created and controlled by system administrators through the internal admin surface
- visible in the public game list when in `enrollment_open`, `ready_to_start`, `running`, or `finished` status
- `draft` public games are not visible to non-admin users
- players join through the application flow; admission requires admin approval
- turn schedule and engine version are set by the administrator

### Private games

- created only by eligible paid users whose `User Service` eligibility snapshot carries `can_create_private_game=true` and whose `max_owned_private_games` limit allows it
- visible only to the owner and to users who have an active membership or a non-expired invite
- `draft` private games are visible only to the owner
- players join through the invite flow; invite redemption creates active membership immediately without further owner approval
- owner manages invites, turn schedule, and engine version

## Owner-Admin Capabilities

Private-game owners have a limited owner-admin capability set over their own games only:

- open enrollment (`draft` → `enrollment_open`)
- create and revoke invites
- manually close enrollment (`enrollment_open` → `ready_to_start`)
- start the game (`ready_to_start` → `starting`)
- pause and resume the game (`running` ↔ `paused`)
- retry start or cancel after `start_failed`
- remove or block members
- cancel the game (from `draft`, `enrollment_open`, `ready_to_start`, `start_failed`)

Owners do not have system-admin power. They cannot see or operate on other users' private games. They cannot approve or reject applications (applications are public-game only).

## Trusted Surfaces

### Public authenticated REST (gateway-facing)

All user-facing commands arrive through `Edge Gateway`. Gateway verifies the authenticated session, transcodes the FlatBuffers command to a trusted REST call, and forwards it to `Lobby` on the public port.
Gateway enriches each request with the authenticated `user_id` via the `X-User-ID` header. `Lobby` must never derive the acting user from the request payload.

#### Message type catalog

| `message_type` | Method | Path | Actor |
| --- | --- | --- | --- |
| `lobby.game.create` | `POST` | `/api/v1/lobby/games` | admin (public), eligible user (private) |
| `lobby.game.update` | `PATCH` | `/api/v1/lobby/games/{game_id}` | admin or owner; draft only |
| `lobby.game.get` | `GET` | `/api/v1/lobby/games/{game_id}` | any authenticated user (visibility rules apply) |
| `lobby.games.list` | `GET` | `/api/v1/lobby/games` | any authenticated user |
| `lobby.game.open_enrollment` | `POST` | `/api/v1/lobby/games/{game_id}/open-enrollment` | admin or owner |
| `lobby.game.ready_to_start` | `POST` | `/api/v1/lobby/games/{game_id}/ready-to-start` | admin or owner |
| `lobby.game.start` | `POST` | `/api/v1/lobby/games/{game_id}/start` | admin or owner |
| `lobby.game.pause` | `POST` | `/api/v1/lobby/games/{game_id}/pause` | admin or owner |
| `lobby.game.resume` | `POST` | `/api/v1/lobby/games/{game_id}/resume` | admin or owner |
| `lobby.game.cancel` | `POST` | `/api/v1/lobby/games/{game_id}/cancel` | admin or owner |
| `lobby.game.retry_start` | `POST` | `/api/v1/lobby/games/{game_id}/retry-start` | admin or owner |
| `lobby.application.submit` | `POST` | `/api/v1/lobby/games/{game_id}/applications` | authenticated user |
| `lobby.application.approve` | `POST` | `/api/v1/lobby/games/{game_id}/applications/{application_id}/approve` | admin |
| `lobby.application.reject` | `POST` | `/api/v1/lobby/games/{game_id}/applications/{application_id}/reject` | admin |
| `lobby.invite.create` | `POST` | `/api/v1/lobby/games/{game_id}/invites` | private-game owner |
| `lobby.invite.redeem` | `POST` | `/api/v1/lobby/games/{game_id}/invites/{invite_id}/redeem` | invited user |
| `lobby.invite.decline` | `POST` | `/api/v1/lobby/games/{game_id}/invites/{invite_id}/decline` | invited user |
| `lobby.invite.revoke` | `POST` | `/api/v1/lobby/games/{game_id}/invites/{invite_id}/revoke` | private-game owner |
| `lobby.membership.remove` | `POST` | `/api/v1/lobby/games/{game_id}/memberships/{membership_id}/remove` | admin or owner |
| `lobby.membership.block` | `POST` | `/api/v1/lobby/games/{game_id}/memberships/{membership_id}/block` | admin or owner |
| `lobby.memberships.list` | `GET` | `/api/v1/lobby/games/{game_id}/memberships` | admin, owner, or active member |
| `lobby.my_games.list` | `GET` | `/api/v1/lobby/my/games` | authenticated user |
| `lobby.my_applications.list` | `GET` | `/api/v1/lobby/my/applications` | authenticated user |
| `lobby.my_invites.list` | `GET` | `/api/v1/lobby/my/invites` | authenticated user |
| `lobby.race_name.register` | `POST` | `/api/v1/lobby/race-names/register` | authenticated user |
| `lobby.race_names.list` | `GET` | `/api/v1/lobby/my/race-names` | authenticated user |

### Internal trusted REST (internal-facing)

The internal port is not reachable from the public internet. It is used by `Game Master` for the game and membership reads listed below and by the administrative backend for admin-only operations.

Key internal endpoints:

| Method | Path | Purpose |
| --- | --- | --- |
| `GET` | `/api/v1/internal/games/{game_id}` | game detail read for GM/admin |
| `GET` | `/api/v1/internal/games/{game_id}/memberships` | full membership list for GM |
| `GET` | `/api/v1/internal/healthz` | health probe |
| `GET` | `/api/v1/internal/readyz` | readiness probe |

Note: every Lobby ↔ Game Master synchronous call is **outgoing** from Lobby to Game Master's internal port at `LOBBY_GM_BASE_URL`.
Lobby does not expose an inbound `register-runtime` endpoint or any other GM-facing endpoint:

| Call site | Method | Path on Game Master | Purpose |
| --- | --- | --- | --- |
| `startgame` (pre-publish) | `GET` | `/api/v1/internal/engine-versions/{version}/image-ref` | Resolve the Docker `image_ref` for `target_engine_version` synchronously before publishing `runtime:start_jobs`. Failure ⇒ `service_unavailable` or `engine_version_not_found`; the game stays in `ready_to_start`. |
| `startgame` (post-container-up) | `POST` | `/api/v1/internal/games/{game_id}/register-runtime` | Register the runtime after a successful container start. Failure ⇒ forced `paused` (see §Paused State). |
| `approveapplication`, `rejectapplication`, `redeeminvite`, `removemember`, `blockmember`, user-lifecycle cascade | `POST` | `/api/v1/internal/games/{game_id}/memberships/invalidate` | Tell GM to drop its in-process membership cache for the game after a roster mutation. Called **post-commit** and is fail-open: a non-2xx response is logged and metered but never rolls back the Lobby commit. GM's TTL safety net catches stale data within the next cache TTL window. |
| `removemember` (engine-side cleanup, post-commit) | `POST` | `/api/v1/internal/games/{game_id}/race/{race_name}/banish` | Ask GM to deactivate the engine-side player after a permanent removal. Fail-open in the same sense as the invalidate call. |
| `resumegame` | `GET` | `/api/v1/internal/games/{game_id}/liveness` | Check that GM has the runtime in `running` before transitioning the platform record from `paused` back to `running`. |

Admin-only operations (approve, reject, cancel, create public games, etc.) are also exposed on the internal port and are intended to be called by `Admin Service` after it enforces the system-admin role check at the gateway boundary.

## User-Facing Lists

### My active games

Returns games where the authenticated user has an active membership and the game status is `running` or `paused`.
Response includes the denormalized runtime snapshot.

### My pending applications

Returns applications submitted by the authenticated user with status `submitted`. Includes game name and type for display.

### My open invitations

Returns invites addressed to the authenticated user with status `created`. Includes game name, inviter name, and `expires_at`.

### Public game list

Paginated list of public games with status in `enrollment_open`, `ready_to_start`, `running`, or `finished`. Games in `draft` or `cancelled` are excluded. Default order: `enrollment_open` and `ready_to_start` first, then `running`, then `finished` (most recent first within each group).

### Visibility rules

- private `draft` games: visible only to the owner
- private non-draft games: visible only to the owner and users with active membership or non-expired invite
- public `draft` games: visible only to system administrators
- public non-draft games: visible in the public list

## Notification Contracts

`Game Lobby` publishes normalized notification intents to `notification:intents` using the `galaxy/notificationintent` producer module.

| Trigger | `notification_type` | Audience | Channels |
| --- | --- | --- | --- |
| Application submitted (public game) | `lobby.application.submitted` | configured admin email list | `email` |
| Application approved | `lobby.membership.approved` | applicant user | `push+email` |
| Application rejected | `lobby.membership.rejected` | applicant user | `push+email` |
| Cascade membership block (`permanent_block`/`DeleteUser`) | `lobby.membership.blocked` | private-game owner | `push+email` |
| Invite created (private game) | `lobby.invite.created` | invited user | `push+email` |
| Invite redeemed (private game) | `lobby.invite.redeemed` | private-game owner | `push+email` |
| Invite expired (on enrollment close) | `lobby.invite.expired` | private-game owner | `email` |
| GM unavailable after start (forced pause) | `lobby.runtime_paused_after_start` | configured admin email list | `email` |
| Race name eligible for registration | `lobby.race_name.registration_eligible` | capable member | `push+email` |
| Race name successfully registered | `lobby.race_name.registered` | registering user | `push+email` |
| Race name registration denied (capability) | `lobby.race_name.registration_denied` | incapable member | `email` |

Rules:

- intents carry explicit `recipient_user_id` values; `Lobby` resolves recipients before publishing rather than delegating audience resolution to `Notification Service`
- a failed intent publication is a notification degradation and must not roll back already committed business state
- `lobby.invite.revoked` and `lobby.invite.declined` produce no notification in v1
- `lobby.application.submitted` is published only for public games; the private-game owner-targeting path defined in the notification catalog is reserved for future use

## Domain Events

`Game Lobby` publishes auxiliary post-commit domain events to the Redis stream configured for lobby domain events.

Frozen event types:

- `lobby.game.created`
- `lobby.game.status_changed`
- `lobby.membership.activated`
- `lobby.membership.removed`
- `lobby.membership.blocked`

Event rules:

- events are post-commit only; they are not emitted on failed operations
- event envelopes carry `game_id`, optional `user_id`, occurrence timestamp, new status (for `status_changed`), and optional trace correlation
- domain events are observability and downstream-read-model artifacts; they must not carry full business state payloads

## Error Model

The trusted internal REST contract uses strict JSON error envelopes:

```json
{
  "error": {
    "code": "invalid_request",
    "message": "request is invalid"
  }
}
```

Stable error codes:

- `invalid_request` — malformed input or failed validation
- `conflict` — state transition not allowed from current status
- `subject_not_found` — game, application, invite, membership, or pending race-name registration not found
- `eligibility_denied` — user not eligible per `User Service`
- `name_taken` — `race_name` already registered, reserved, or pending for another user
- `race_name_registration_quota_exceeded` — user's `max_registered_race_names` slot is full
- `race_name_pending_window_expired` — the 30-day registration window has passed for the pending entry
- `race_name_capability_not_met` — capability condition not satisfied at game finish (reservation released)
- `race_name_permanent_blocked` — the user carries an active `permanent_block` sanction
- `forbidden` — caller is not authorized for this operation on this game or this race name
- `engine_version_not_found` — `target_engine_version` is missing or deprecated on `Game Master`'s engine version registry (returned by `lobby.game.start` at image-ref resolve time)
- `internal_error` — unexpected service error
- `service_unavailable` — upstream dependency unavailable

## Configuration

### Required

- `LOBBY_REDIS_MASTER_ADDR`
- `LOBBY_REDIS_PASSWORD`
- `LOBBY_POSTGRES_PRIMARY_DSN`
- `LOBBY_USER_SERVICE_BASE_URL`
- `LOBBY_GM_BASE_URL`

### Configuration groups

Process and logging:

- `LOBBY_SHUTDOWN_TIMEOUT` with default `30s`
- `LOBBY_LOG_LEVEL` with default `info`

Public HTTP:

- `LOBBY_PUBLIC_HTTP_ADDR` with default `:8094`
- `LOBBY_PUBLIC_HTTP_READ_HEADER_TIMEOUT` with default `2s`
- `LOBBY_PUBLIC_HTTP_READ_TIMEOUT` with default `10s`
- `LOBBY_PUBLIC_HTTP_IDLE_TIMEOUT` with default `1m`

Internal HTTP:

- `LOBBY_INTERNAL_HTTP_ADDR` with default `:8095`
- `LOBBY_INTERNAL_HTTP_READ_HEADER_TIMEOUT` with default `2s`
- `LOBBY_INTERNAL_HTTP_READ_TIMEOUT` with default `10s`
- `LOBBY_INTERNAL_HTTP_IDLE_TIMEOUT` with default `1m`

Redis connectivity:

- `LOBBY_REDIS_MASTER_ADDR` (required)
- `LOBBY_REDIS_REPLICA_ADDRS` (optional, comma-separated; not consumed yet)
- `LOBBY_REDIS_PASSWORD` (required)
- `LOBBY_REDIS_DB` (default 0)
- `LOBBY_REDIS_OPERATION_TIMEOUT` (default 250ms)

The legacy `LOBBY_REDIS_ADDR`, `LOBBY_REDIS_USERNAME`, and `LOBBY_REDIS_TLS_ENABLED` env vars were retired in PG_PLAN.md §6A; setting either of the latter two now fails fast at startup. See `ARCHITECTURE.md §Persistence Backends` for the architectural rules.

PostgreSQL connectivity (PG_PLAN.md §6A and §6B; durable game / application / invite / membership records and the Race Name Directory live here):

- `LOBBY_POSTGRES_PRIMARY_DSN` (required; e.g. `postgres://lobbyservice:secret@postgres:5432/galaxy?search_path=lobby&sslmode=disable`)
- `LOBBY_POSTGRES_REPLICA_DSNS` (optional, comma-separated; not consumed yet)
- `LOBBY_POSTGRES_OPERATION_TIMEOUT` (default 1s)
- `LOBBY_POSTGRES_MAX_OPEN_CONNS` (default 25)
- `LOBBY_POSTGRES_MAX_IDLE_CONNS` (default 5)
- `LOBBY_POSTGRES_CONN_MAX_LIFETIME` (default 30m)

Stream names:

- `LOBBY_GM_EVENTS_STREAM` with default `gm:lobby_events`
- `LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT` with default `2s`
- `LOBBY_RUNTIME_START_JOBS_STREAM` with default `runtime:start_jobs`
- `LOBBY_RUNTIME_STOP_JOBS_STREAM` with default `runtime:stop_jobs`
- `LOBBY_RUNTIME_JOB_RESULTS_STREAM` with default `runtime:job_results`
- `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` with default `2s`
- `LOBBY_NOTIFICATION_INTENTS_STREAM` with default `notification:intents`

Game Master image-ref resolver:

- `image_ref` is resolved synchronously by `Game Master` from `target_engine_version` over its engine version registry; see §Game Start Flow. The legacy `LOBBY_ENGINE_IMAGE_TEMPLATE` Go-template variable is retired and rejected at startup if set.

Upstream clients:

- `LOBBY_USER_SERVICE_TIMEOUT` with default `1s`
- `LOBBY_GM_TIMEOUT` with default `5s`

Enrollment automation:

- `LOBBY_ENROLLMENT_AUTOMATION_INTERVAL` with default `30s`

Race Name Directory:

- `LOBBY_RACE_NAME_DIRECTORY_BACKEND` with default `postgres` (alternate: `stub` for in-process tests; PG_PLAN.md §6B retired the `redis` backend)
- `LOBBY_RACE_NAME_EXPIRATION_INTERVAL` with default `1h` — pending registration expiration worker tick

The 30-day eligibility window for `pending_registration` entries is the constant `service/capabilityevaluation.PendingRegistrationWindow`. It is intentionally not operator-tunable today; the env var name `LOBBY_PENDING_REGISTRATION_TTL_HOURS` is reserved for a future change.

User lifecycle:

- `LOBBY_USER_LIFECYCLE_STREAM` with default `user:lifecycle_events`
- `LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT` with default `2s`

OpenTelemetry:

- standard `OTEL_*` variables
- `LOBBY_OTEL_STDOUT_TRACES_ENABLED`
- `LOBBY_OTEL_STDOUT_METRICS_ENABLED`

## Persistence Layout

Game / application / invite / membership records live in PostgreSQL after PG_PLAN.md §6A; the Race Name Directory followed in §6B. See `docs/postgres-migration.md` for the schema and decision records.

The `lobby` schema owns five tables — `games`, `applications`, `invites`, `memberships`, `race_names` — plus the partial UNIQUE index on `applications(applicant_user_id, game_id) WHERE status <> 'rejected'` that enforces the single-active-application invariant and the partial UNIQUE index on `race_names(canonical_key) WHERE binding_kind = 'registered'` that enforces single-registered-per-canonical.

The Redis-backed keys below survive both stages. Redis owns the runtime-coordination state — per-game runtime aggregates, gap activation, capability-evaluation guards, and stream consumer offsets — plus the event-bus streams themselves.

### Redis key table

Storage rules for Redis:

- timestamps are stored in Unix milliseconds unless noted otherwise
- dynamic key segments are base64url-encoded

| Logical artifact | Redis key |
| --- | --- |
| per-game per-user stats aggregate | `lobby:game_turn_stats::` → JSON aggregate |
| per-game stats user index | `lobby:game_turn_stats_by_game:` (set of `user_id`) |
| capability-evaluation guard | `lobby:capability_evaluation:done:` (sentinel string) |
| GM event stream offset | `lobby:stream_offsets:gm_events` |
| runtime job result offset | `lobby:stream_offsets:runtime_results` |
| user lifecycle stream offset | `lobby:stream_offsets:user_lifecycle` |
| gap window activation time | `lobby:gap_activated_at:` |

### Frozen record fields

The five durable records are stored in PostgreSQL columns; the field set per record is unchanged from the previous Redis JSON shape and is documented inline with the migration scripts under `internal/adapters/postgres/migrations/`.

| Record | Frozen fields |
| --- | --- |
| game record | all game fields listed in Game Record Model section |
| application record | `application_id`, `game_id`, `applicant_user_id`, `race_name`, `status`, `created_at`, `decided_at` |
| invite record | `invite_id`, `game_id`, `inviter_user_id`, `invitee_user_id`, `race_name` (set at redeem), `status`, `created_at`, `expires_at`, `decided_at` |
| membership record | all membership fields listed in Membership Model section |
| race_names row | `canonical_key`, `game_id`, `holder_user_id`, `race_name`, `binding_kind`, `source_game_id`, `reserved_at_ms`, `eligible_until_ms` (pending only), `registered_at_ms` (registered only) |

## Observability

### Metrics

- `lobby.game.transitions` — counter; attributes: `from_status`, `to_status`, `trigger` (`command`, `manual`, `deadline`, `gap`, `runtime_event`, `external_block`)
- `lobby.application.outcomes` — counter; attributes: `outcome` (`submitted`, `approved`, `rejected`)
- `lobby.invite.outcomes` — counter; attributes: `outcome` (`created`, `redeemed`, `declined`, `revoked`, `expired`)
- `lobby.membership.changes` — counter; attributes: `change` (`activated`, `removed`, `blocked`, `external_block`)
- `lobby.start_flow.outcomes` — counter; attributes: `outcome` (`running`, `paused`, `start_failed`)
- `lobby.notification.publish_attempts` — counter; attributes: `notification_type`, `result` (`ok`, `error`)
- `lobby.active_games` — observable gauge; attributes: `status`
- `lobby.enrollment_automation.checks` — counter; attributes: `result` (`no_op`, `transitioned`)
- `lobby.gm_events.oldest_unprocessed_age_ms` — observable gauge
- `lobby.runtime_results.oldest_unprocessed_age_ms` — observable gauge
- `lobby.user_lifecycle.oldest_unprocessed_age_ms` — observable gauge
- `lobby.race_name.outcomes` — counter; attributes: `outcome` (`reserved`, `reservation_released`, `pending_created`, `pending_released`, `registered`, `registered_released`)
- `lobby.pending_registration.expirations` — counter; attributes: `trigger` (`tick`, `manual`)
- `lobby.user_lifecycle.cascade_releases` — counter; attributes: `event` (`permanent_blocked`, `deleted`)
- `lobby.capability_evaluations` — counter; attributes: `result` (`capable`, `incapable`, `noop`)

Metrics avoid high-cardinality attributes such as `game_id`, `user_id`, `application_id`, `invite_id`, and `canonical_key`.

### Structured log fields

Key operations emit structured logs with these stable field names where applicable:

- `game_id`
- `game_type`
- `game_status`
- `from_status`
- `to_status`
- `user_id`
- `application_id`
- `invite_id`
- `membership_id`
- `race_name`
- `canonical_key`
- `reservation_kind` (`reserved` / `pending_registration` / `registered`)
- `eligible_until_ms`
- `trigger`
- `lifecycle_event`
- `request_id`
- `trace_id`

## Verification

Test doubles split between two styles.
Wide-surface ports with no production state (`RuntimeManager`, `IntentPublisher`, `GMClient`, `UserService`) use `gomock`-generated mocks under `internal/adapters/mocks/`; regenerate with `make -C lobby mocks`. Stateful behavioural fakes that mirror the production adapter contract (`gameinmem`, `applicationinmem`, `inviteinmem`, `membershipinmem`, `gameturnstatsinmem`, `racenameinmem`, `evaluationguardinmem`, `gapactivationinmem`, `streamoffsetinmem`) live as in-memory adapters under `internal/adapters/inmem/` and stay hand-rolled because tests rely on their CAS, status-transition, and invariant-tracking behaviour.

Focused service-local coverage verifies:

- configuration loading and validation for all env var groups
- both HTTP listeners start and serve `/healthz` and `/readyz`
- game CRUD: create, update, get, list with correct field validation
- each status transition fires only from allowed source statuses
- enrollment automation: deadline trigger, gap trigger, manual trigger
- application flow: submit (eligibility check, race name check), approve, reject
- invite flow: create, redeem (auto-membership), decline, revoke, expire on enrollment close
- membership model: activate, remove, block with correct before/after-start semantics
- Race Name Directory (PostgreSQL + in-memory adapters against the same suite): canonicalization + confusable-pair policy, `Reserve`/`ReleaseReservation` per-game semantics, `MarkPendingRegistration`/`ExpirePendingRegistrations` window, `Register` idempotency + quota, `ReleaseAllByUser` cascade
- game start flow: success path (→ running), GM unavailable path (→ paused), container failure path (→ start_failed), metadata persistence failure path (container removed, → start_failed)
- GM event stream consumer: snapshot update (stats aggregate), `game_finished` with capability evaluation
- user lifecycle stream consumer: `permanent_blocked` and `deleted` cascade release + membership/application/invite settlement
- pending-registration expiration worker idempotency
- race name registration service: capability, tariff quota, pending window, idempotent retry
- notification intent publication for every trigger listed in §Notification Contracts
- visibility rules: private game hidden from non-member non-owner users
- error model: all stable codes returned for correct conditions

Cross-service coverage verifies:

- `Lobby → User Service` eligibility check compatibility (including the new `max_registered_race_names` field) and failure handling
- `Lobby → Notification Service` intent publication for all lobby notification types
- `Lobby → Runtime Manager` start job publication and result consumption
- `Lobby → Game Master` synchronous registration call (success and failure)
- `User Service → Lobby` cascade flow: permanent_block or DeleteUser on a user leads to full RND release + memberships blocked + applications/invites cancelled