# Game Lobby Service

`galaxy/lobby` owns platform-level metadata and lifecycle of game sessions.

## References

- [Public REST contract](api/public-openapi.yaml)
- [Internal REST contract](api/internal-openapi.yaml)
- [System architecture](../ARCHITECTURE.md)
- [Notification catalog](../notification/README.md)
- [User Service lobby eligibility](../user/README.md)
- [Service-local docs](docs/)

## Purpose

`Game Lobby Service` is the platform source of truth for game sessions as platform entities — from creation through enrollment, start, runtime tracking, and finish. It mediates all player participation actions and maintains the roster state that `Game Master` may cache for runtime authorization.

## Scope

`Game Lobby` is the source of truth for:

- opaque stable game identifiers in `game-*` form
- game metadata: name, description, type, owner, schedule, engine version
- platform-level game status from `draft` through `finished` or `cancelled`
- enrollment configuration: `min_players`, `max_players`, `start_gap_hours`, `start_gap_players`, `enrollment_ends_at`
- applications and their approval or rejection status (public games)
- user-bound invitations and their lifecycle (private games)
- platform membership roster and participant status
- Race Name Directory state across all regular platform users: registered race names (permanent ownership), per-game reservations, and 30-day pending-registration windows
- per-game per-user `player_turn_stats` aggregate used at game finish for capability evaluation
- denormalized runtime snapshot imported from `Game Master`
- user-facing lists: active games, pending applications, open invitations

`Game Lobby` is not the source of truth for:

- platform user identity or profile — owned by `User Service`
- device sessions or authentication state — owned by `Auth / Session Service`
- runtime container lifecycle or technical health — owned by `Runtime Manager`
- current turn, generation state, engine reachability — owned by `Game Master`
- full per-player game state — owned by the game engine container
- player-to-engine UUID mapping — owned by `Game Master`

## Non-Goals

- `Game Lobby` does not call game engine containers directly; all engine interaction goes through `Game Master`.
- `Game Lobby` owns the Race Name Directory data in v1 (Redis adapter); the contract is kept behind a port interface so a future dedicated `Race Name Service` can replace the adapter without domain changes.
- `Game Lobby` does not compute notification audiences from roster data at delivery time; notification intents carry explicit `recipient_user_id` values.
- `Game Lobby` does not apply sanctions or session-level access control; `User Service` and `Auth / Session Service` remain authoritative for those.
- `Game Lobby` does not own billing or entitlement decisions; it reads the current entitlement snapshot from `User Service`.

## Position in the System

```mermaid
flowchart LR
    Gateway["Edge Gateway"]
    Lobby["Game Lobby Service"]
    User["User Service"]
    GM["Game Master"]
    Runtime["Runtime Manager"]
    Notify["Notification Service"]
    Redis["Redis\nKV + Streams"]

    Gateway --> Lobby
    Lobby --> User
    Lobby --> GM
    Lobby --> Redis
    Lobby --> Notify
    GM --> Redis
    Redis --> Lobby
    Runtime --> Redis
```

`Gateway` routes authenticated platform-level commands to `Lobby` over trusted REST. `Lobby` reads user eligibility from `User Service` synchronously. `Lobby` registers running games with `Game Master` synchronously at start. `Lobby` submits start jobs to `Runtime Manager` and reads job results from a dedicated Redis Stream. `Game Master` publishes runtime events to a dedicated Redis Stream that `Lobby` consumes asynchronously. `Lobby` publishes notification intents to `notification:intents`.

## Responsibility Boundaries

`Game Lobby` is responsible for:

- accepting and validating game creation and configuration commands
- opening and managing enrollment for public and private games
- validating user eligibility before accepting applications and invite redeems
- checking race name availability through the Race Name Directory port
- enforcing enrollment deadline and roster-size auto-transitions
- orchestrating the game start sequence with `Runtime Manager` and `Game Master`
- persisting game metadata atomically and removing orphaned containers when metadata persistence fails
- maintaining the denormalized runtime snapshot for user-facing reads
- emitting notification intents for all participant lifecycle events
- enforcing visibility rules: private games are visible only to owner and members

`Game Lobby` is not responsible for:

- verifying authenticated transport signatures — handled by `Edge Gateway`
- checking session revocation state — handled by `Edge Gateway` and `Auth`
- email delivery — handled by `Mail Service`
- push delivery — handled by `Notification Service` and `Edge Gateway`
- container start and stop mechanics — handled by `Runtime Manager`
- per-turn player command routing — handled by `Game Master`

## Runtime Surface

The service starts two HTTP listeners and one Redis Stream consumer pipeline.

### Listeners

- public authenticated REST on `LOBBY_PUBLIC_HTTP_ADDR` with default `:8094`
- internal trusted REST on `LOBBY_INTERNAL_HTTP_ADDR` with default `:8095`

### Background workers

- enrollment automation ticker — checks enrollment deadlines and roster thresholds at a configurable interval
- Runtime Manager result consumer — reads start-job results from a Redis Stream
- Game Master event consumer — reads runtime snapshot updates and game-finish events from a dedicated Redis Stream

### Startup dependencies

- one reachable Redis deployment at `LOBBY_REDIS_MASTER_ADDR` (mandatory password via `LOBBY_REDIS_PASSWORD`; replicas optional via `LOBBY_REDIS_REPLICA_ADDRS`). Used for streams, race-name directory, per-game runtime aggregates, and stream offsets.
- one reachable PostgreSQL primary at `LOBBY_POSTGRES_PRIMARY_DSN` (DSN must include `search_path=lobby&sslmode=disable`). Embedded goose migrations apply at startup before any listener opens; on migration or ping failure the service exits non-zero. The four core enrollment entities (game / application / invite / membership) live here after PG_PLAN.md §6A; `docs/postgres-migration.md` is the decision record.
- `User Service` reachable at `LOBBY_USER_SERVICE_BASE_URL` (startup check only; runtime failures are surfaced as request errors, not boot failures)
- `Game Master` at `LOBBY_GM_BASE_URL` (same policy — startup check omitted; unreachability at registration triggers the forced-pause path)

### Probes

- `GET /healthz` on both ports returns `{"status":"ok"}`
- `GET /readyz` on both ports returns `{"status":"ready"}` after successful startup; no live Redis or PostgreSQL ping per request

## Game Record Model

### Fields

| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | opaque, stable, `game-*` form |
| `game_name` | string | human-readable; mutable in `draft` |
| `description` | string | optional; mutable in `draft` and `enrollment_open` |
| `game_type` | enum | `public` or `private` |
| `owner_user_id` | string | private games only; empty for public |
| `status` | enum | see status table below |
| `min_players` | int | minimum approved participants to proceed to start |
| `max_players` | int | target roster size that activates the gap window |
| `start_gap_hours` | int | hours of gap window after `max_players` is reached |
| `start_gap_players` | int | additional participants admitted during the gap |
| `enrollment_ends_at` | int64 | UTC Unix seconds; deadline for automatic enrollment close |
| `turn_schedule` | string | cron expression, e.g. `0 18 * * *`; passed to GM at registration |
| `target_engine_version` | string | semver of the engine to launch; passed to GM at registration |
| `created_at` | int64 | UTC Unix milliseconds |
| `updated_at` | int64 | UTC Unix milliseconds |
| `started_at` | int64 | UTC Unix milliseconds; set when status becomes `running` |
| `finished_at` | int64 | UTC Unix milliseconds; set when status becomes `finished` |
| `current_turn` | int | denormalized from GM; zero until running |
| `runtime_status` | string | denormalized from GM; empty until running |
| `engine_health_summary` | string | denormalized from GM; empty until running |
| `runtime_binding` | object? | non-null after successful container start; contains `container_id`, `engine_endpoint`, `runtime_job_id`, `bound_at` (Unix ms) |

All fields set at creation are validated before the game record is persisted. `game_name` is required and must be non-empty after trim. `min_players`, `max_players`, `start_gap_hours`, `start_gap_players`, and `enrollment_ends_at` are required positive integers with `min_players <= max_players`. `turn_schedule` must be a valid five-field cron expression. `target_engine_version` must be a non-empty semver string.

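The creation-time rules above can be sketched as a single validation function. This is a minimal illustration, not the service's implementation: `NewGameInput` and `Validate` are hypothetical names, the cron check only counts fields rather than parsing ranges, and the semver pattern accepts only the `MAJOR.MINOR.PATCH` core.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// NewGameInput is a hypothetical creation payload; fields mirror the table above.
type NewGameInput struct {
	GameName            string
	MinPlayers          int
	MaxPlayers          int
	StartGapHours       int
	StartGapPlayers     int
	EnrollmentEndsAt    int64 // UTC Unix seconds
	TurnSchedule        string
	TargetEngineVersion string
}

// semverCore is an assumption: it accepts MAJOR.MINOR.PATCH only and
// ignores pre-release and build metadata suffixes.
var semverCore = regexp.MustCompile(`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$`)

// Validate applies the documented rules in order and returns the first violation.
func Validate(in NewGameInput) error {
	if strings.TrimSpace(in.GameName) == "" {
		return errors.New("game_name must be non-empty after trim")
	}
	if in.MinPlayers <= 0 || in.MaxPlayers <= 0 || in.StartGapHours <= 0 ||
		in.StartGapPlayers <= 0 || in.EnrollmentEndsAt <= 0 {
		return errors.New("enrollment fields must be positive integers")
	}
	if in.MinPlayers > in.MaxPlayers {
		return errors.New("min_players must not exceed max_players")
	}
	// Shape check only: five whitespace-separated fields, no range validation.
	if len(strings.Fields(in.TurnSchedule)) != 5 {
		return errors.New("turn_schedule must have five fields")
	}
	if !semverCore.MatchString(in.TargetEngineVersion) {
		return errors.New("target_engine_version must be a semver string")
	}
	return nil
}

func main() {
	in := NewGameInput{
		GameName: "Alpha", MinPlayers: 2, MaxPlayers: 4,
		StartGapHours: 12, StartGapPlayers: 1, EnrollmentEndsAt: 1700000000,
		TurnSchedule: "0 18 * * *", TargetEngineVersion: "1.2.3",
	}
	fmt.Println("validation error:", Validate(in))
}
```

A production validator would likely delegate cron and semver parsing to dedicated parsers; the sketch only shows where each documented rule sits.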
### Status vocabulary

| Status | Meaning |
| --- | --- |
| `draft` | Created; enrollment not yet open; editable |
| `enrollment_open` | Accepting applications (public) or invite redeems (private) |
| `ready_to_start` | Enrollment closed; start command accepted |
| `starting` | Start job submitted to Runtime Manager; awaiting result |
| `start_failed` | Container start or metadata persistence failed |
| `running` | Game engine container live; normal gameplay |
| `paused` | Platform-level pause; engine container may still be alive |
| `finished` | Game ended; record is terminal |
| `cancelled` | Cancelled before start; record is terminal |

### Status transition table

| From | To | Trigger |
| --- | --- | --- |
| `draft` | `enrollment_open` | explicit command from admin (public) or owner (private) |
| `enrollment_open` | `ready_to_start` | manual command when `approved_count >= min_players` |
| `enrollment_open` | `ready_to_start` | `enrollment_ends_at` reached and `approved_count >= min_players` |
| `enrollment_open` | `ready_to_start` | gap window exhausted (time or player count) |
| `ready_to_start` | `starting` | start command from admin (public) or owner (private) |
| `starting` | `running` | Runtime Manager confirms container; GM registration succeeds |
| `starting` | `paused` | Runtime Manager confirms container; GM registration fails (unavailable) |
| `starting` | `start_failed` | Runtime Manager reports container start failure |
| `start_failed` | `ready_to_start` | explicit retry command from admin or owner |
| `running` | `paused` | explicit pause command from admin or owner |
| `running` | `finished` | `game_finished` event from `Game Master` via Redis Stream |
| `paused` | `running` | explicit resume command from admin or owner |
| `paused` | `finished` | `game_finished` event from `Game Master` via Redis Stream |
| `draft` | `cancelled` | explicit cancel command from admin or owner |
| `enrollment_open` | `cancelled` | explicit cancel command from admin or owner |
| `ready_to_start` | `cancelled` | explicit cancel command from admin or owner |
| `start_failed` | `cancelled` | explicit cancel command from admin or owner |
| `draft` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `enrollment_open` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `ready_to_start` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `start_failed` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `starting` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `running` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `paused` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |

Outside the `external_block` cascade, `running` and `paused` games cannot be cancelled directly; use stop operations through `Game Master` and await the `game_finished` event instead. The cascade publishes a stop-job to Runtime Manager before applying the `external_block` transition for in-flight games.

## Enrollment Rules

`enrollment_open → ready_to_start` fires on the first of these conditions:

### Manual close

Admin (public game) or owner (private game) issues `lobby.game.ready_to_start` when `approved_count >= min_players`.

### Deadline

Enrollment automation worker detects that `enrollment_ends_at` is in the past and `approved_count >= min_players`. If the deadline is reached but `approved_count < min_players`, the game remains in `enrollment_open` — the transition does not fire until the player count condition is also satisfied.

### Gap exhaustion

When `approved_count` reaches `max_players`, the gap window opens.
During the gap window:

- new applications and invite redeems continue to be accepted up to `max_players + start_gap_players` total approved participants
- the game does not automatically transition while the gap is open

The transition fires when either:

- `start_gap_hours` have elapsed since the gap window opened, or
- `approved_count` reaches `max_players + start_gap_players`

### On enrollment close

When any path transitions the game to `ready_to_start`:

- all invites in `created` status transition to `expired`
- `lobby.invite.expired` notification intents are published for each expired invite (recipient: private-game owner)
- no new applications are accepted in `ready_to_start` status

## Application Lifecycle

Applications are used for public games only. Private games use the invite flow exclusively.

### Submit

An authenticated user submits `lobby.application.submit` with `race_name`.

Pre-conditions checked synchronously:

- game status is `enrollment_open`
- game type is `public`
- user has no existing non-rejected application to the same game
- `User Service` eligibility check confirms `can_join_game=true`
- `approved_count < max_players + start_gap_players` (or gap window not yet open)
- Race Name Directory confirms `race_name` is available for the applicant

On success:

- an `Application` record is created with `status=submitted`
- `lobby.application.submitted` intent published (`audience_kind=admin_email`) with payload: `game_id`, `game_name`, `applicant_user_id`, `applicant_name`

`applicant_name` in the notification payload equals the submitted `race_name`.

### Approve

Admin issues `lobby.application.approve`.

Pre-conditions:

- game is `enrollment_open`
- application is in `submitted` status
- `approved_count < max_players + start_gap_players`

On success:

- Race Name Directory reserves `race_name` for the applicant
- application `status` → `approved`
- `Membership` record created with `status=active`
- `lobby.membership.approved` intent published (recipient: applicant) with payload: `game_id`, `game_name`
- gap window opens automatically if `approved_count` now equals `max_players`
- auto-transition to `ready_to_start` if gap exhaustion condition is immediately met

### Reject

Admin issues `lobby.application.reject`.

Pre-conditions:

- application is in `submitted` status

On success:

- application `status` → `rejected`
- any pending Race Name Directory reservation for the applicant is released
- `lobby.membership.rejected` intent published (recipient: applicant) with payload: `game_id`, `game_name`

### Application state machine

```text
submitted → approved
submitted → rejected
```

Rejected applicants may re-apply while enrollment is open, subject to a single active application constraint (at most one non-rejected application per user per game).

The single-active constraint is enforced at the persistence layer by the `user_game_application` key (see Redis Logical Model). The key is created atomically with the submitted application record, removed on rejection, and preserved on approval. Service-layer code can rely on this invariant without performing its own scan of `user_applications`.

## Invite Lifecycle

Invites are used for private games only. Public games use the application flow exclusively.

### Create

Private-game owner issues `lobby.invite.create` with `invitee_user_id`.

Pre-conditions:

- game status is `enrollment_open`
- game type is `private`
- the invitee has no active invite or active membership in the game
- `approved_count < max_players + start_gap_players`

On success:

- `Invite` record created with `status=created`
- `expires_at` is set to `enrollment_ends_at` of the game
- `lobby.invite.created` intent published (recipient: invitee) with payload: `game_id`, `game_name`, `inviter_user_id`, `inviter_name`

`inviter_name` is the owner's race name if already a member of the game; otherwise it is the owner's `user_id`.

### Redeem

The invited user issues `lobby.invite.redeem` with `race_name`.

Pre-conditions:

- invite status is `created`
- game is `enrollment_open`
- `approved_count < max_players + start_gap_players`
- inviter and invitee both exist and are not permanently blocked in `User Service`
- Race Name Directory confirms `race_name` is available for the invitee

On success:

- Race Name Directory reserves `race_name` for the invitee
- invite `status` → `redeemed`
- `Membership` record created with `status=active`
- `lobby.invite.redeemed` intent published (recipient: private-game owner) with payload: `game_id`, `game_name`, `invitee_user_id`, `invitee_name`
- gap window opens automatically if `approved_count` now equals `max_players`
- auto-transition to `ready_to_start` if gap exhaustion condition is immediately met

The synchronous `User Service` check on both inviter and invitee enforces the rule that an invite from or to a permanently blocked or deleted user behaves as if it never existed, even before the asynchronous user-lifecycle cascade has flipped the invite to `revoked`. Cascade-deleted accounts and `permanent_block` sanctions surface as `subject_not_found`.

### Decline

The invited user issues `lobby.invite.decline`.

Pre-conditions:

- invite status is `created`

On success:

- invite `status` → `declined`
- no notification in v1

Declined users may receive a new invite from the owner while enrollment is open.

### Revoke

Owner issues `lobby.invite.revoke`.

Pre-conditions:

- invite status is `created`

On success:

- invite `status` → `revoked`
- no notification in v1

### Expire

Pending invites (`status=created`) are transitioned to `expired` automatically when the game moves to `ready_to_start`. `lobby.invite.expired` intent is published for each expired invite (recipient: private-game owner) with payload: `game_id`, `game_name`, `invitee_user_id`, `invitee_name`.

### Invite state machine

```text
created → redeemed
created → declined
created → revoked
created → expired
```

## Membership Model

### Fields

| Field | Type | Notes |
| --- | --- | --- |
| `membership_id` | string | opaque, stable |
| `game_id` | string | reference to game |
| `user_id` | string | reference to platform user |
| `race_name` | string | confirmed in-game name as submitted (original casing) |
| `canonical_key` | string | canonicalized key under which the RND reservation is held |
| `status` | enum | `active`, `removed`, `blocked` |
| `joined_at` | int64 | UTC Unix milliseconds |
| `removed_at` | int64 | UTC Unix milliseconds; set on remove or block |

### Status vocabulary

| Status | Meaning |
| --- | --- |
| `active` | Full participant; may send commands through `Game Master` |
| `removed` | Permanently removed; engine slot deactivated after game start |
| `blocked` | Platform-level block; engine slot retained but commands blocked |

### Status transition table

| From | To | Trigger |
| --- | --- | --- |
| `active` | `removed` | explicit remove command from admin or owner (post-start) |
| `active` | `blocked` | explicit block command from admin or owner |

`removed` and `blocked` are terminal statuses. Pre-start remove drops the membership record entirely rather than transitioning to `removed` (see Removal rules below).

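The membership transition table above is small enough to encode as a lookup. The following is a minimal sketch under the assumption that statuses are plain strings; `MembershipStatus` and `CanTransition` are illustrative names, not identifiers from the service.

```go
package main

import "fmt"

// MembershipStatus is a hypothetical type for the three documented statuses.
type MembershipStatus string

const (
	Active  MembershipStatus = "active"
	Removed MembershipStatus = "removed"
	Blocked MembershipStatus = "blocked"
)

// allowed lists the documented transitions. Terminal statuses have no entry,
// so every transition out of them is rejected.
var allowed = map[MembershipStatus][]MembershipStatus{
	Active: {Removed, Blocked},
}

// CanTransition reports whether the transition table permits from -> to.
func CanTransition(from, to MembershipStatus) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("active -> blocked allowed:", CanTransition(Active, Blocked))
	fmt.Println("blocked -> active allowed:", CanTransition(Blocked, Active))
}
```

Pre-start removal does not appear in the map because it deletes the record instead of changing its status.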
### Removal rules

Before game start:

- remove drops membership and releases the race name reservation

After game start:

- `blocked`: the player cannot send commands; engine keeps the player slot
- `removed`: `Game Lobby` marks membership `removed`; `Game Master` must also deactivate the player inside the engine; race name reservation remains until game is finished

This distinction is architectural and must remain explicit in all implementations.

## Race Name Directory

### Purpose

`Race Name Directory` (RND) is the platform source of truth for all in-game `race_name` values. It owns three levels of state per name:

- **registered** — permanent user-owned names. Once registered, the name is unavailable to any other user and cannot be released by the owner; only `permanent_block` or `DeleteUser` on the owning account frees it.
- **reservation** — a per-game holding created when a participant joins through application approval or invite redeem. Reservations are keyed by `(game_id, canonical_key)`. One user may hold the same name in multiple active games concurrently.
- **pending_registration** — a reservation that survived a capable finish and is now waiting up to 30 days for the owner to upgrade it into a registered name via `lobby.race_name.register`. Expiration releases the binding.

`User Service` does not store `race_name` values. It only exposes `max_registered_race_names` in the eligibility snapshot and publishes `user.lifecycle.permanent_blocked` / `user.lifecycle.deleted` events.

### Canonical key + confusable-pair policy

Every RND key is derived by `racename.Canonicalize(raceName) (canonical string, err error)` living in `lobby/internal/domain/racename/policy.go`:

1. trim and validate the character set via `pkg/util/string.go:ValidateTypeName`;
2. lowercase Unicode fold;
3. apply the frozen confusable-pair replacement map (ported from the former `user/internal/ports/race_name_policy.go`).

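The three canonicalization steps can be illustrated with a short sketch. Everything specific here is an assumption: the character-set rule stands in for `ValidateTypeName`, `strings.ToLower` approximates the Unicode fold, and the two-entry replacement table is a placeholder, since the real frozen confusable-pair map is not reproduced in this document.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

// ErrInvalidName mirrors the sentinel name used by the directory port.
var ErrInvalidName = errors.New("invalid race name")

// confusables is a PLACEHOLDER map for illustration only; the real frozen
// confusable-pair table lives in the policy package and is not shown here.
var confusables = strings.NewReplacer("0", "o", "1", "l")

// Canonicalize follows the three documented steps: trim and validate,
// lowercase fold, then confusable replacement.
func Canonicalize(raceName string) (string, error) {
	trimmed := strings.TrimSpace(raceName)
	if trimmed == "" {
		return "", ErrInvalidName
	}
	// Stand-in character-set rule: letters, digits and spaces only.
	for _, r := range trimmed {
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != ' ' {
			return "", ErrInvalidName
		}
	}
	folded := strings.ToLower(trimmed)
	return confusables.Replace(folded), nil
}

func main() {
	a, _ := Canonicalize("  Zer0 L1ne ")
	b, _ := Canonicalize("ZERO LLNE")
	fmt.Println(a == b, a)
}
```

Under the placeholder map, two visually similar spellings collapse to one canonical key, which is the property the directory relies on when it compares holders on the same key.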
A name is considered taken for the actor when the RND holds at least one `registered`, active `reservation`, or `pending_registration` whose owner differs from the actor on the same canonical key.

### Port interface

```go
type RaceNameDirectory interface {
	Canonicalize(raceName string) (canonical string, err error)
	Check(ctx context.Context, raceName, actorUserID string) (Availability, error)
	Reserve(ctx context.Context, gameID, userID, raceName string) error
	ReleaseReservation(ctx context.Context, gameID, userID, raceName string) error
	MarkPendingRegistration(
		ctx context.Context,
		gameID, userID, raceName string,
		eligibleUntil time.Time,
	) error
	ExpirePendingRegistrations(ctx context.Context, now time.Time) ([]ExpiredPending, error)
	Register(ctx context.Context, gameID, userID, raceName string) error
	ListRegistered(ctx context.Context, userID string) ([]RegisteredName, error)
	ListPendingRegistrations(ctx context.Context, userID string) ([]PendingRegistration, error)
	ListReservations(ctx context.Context, userID string) ([]Reservation, error)
	ReleaseAllByUser(ctx context.Context, userID string) error
}

type Availability struct {
	Taken        bool
	HolderUserID string // "" when available
	Kind         string // "registered" | "reservation" | "pending_registration"
}
```

Sentinel errors: `ErrNameTaken`, `ErrInvalidName`, `ErrPendingMissing`, `ErrPendingExpired`, `ErrQuotaExceeded`.

### v1 backends

- **PostgreSQL** (`lobby/internal/adapters/postgres/racenamedir/directory.go`) — the production adapter; one row per binding under `lobby.race_names`, transactional writes guarded by `pg_advisory_xact_lock(hashtextextended(canonical_key, 0))`. See `docs/postgres-migration.md` §6B for the full schema and decision record.
- **In-memory** (`lobby/internal/adapters/racenameinmem/directory.go`) — in-process implementation used by unit tests that do not need PostgreSQL and by deployments that select the in-memory backend with `LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub` (the config token name is preserved for backward compatibility).

A future dedicated `Race Name Service` replaces the adapter without changing the domain or service layer.

### Reservation lifecycle and capability

1. `approveapplication` / `redeeminvite` → `Reserve(game_id, user_id, race_name)`.
2. `removemember` before start → `ReleaseReservation`.
3. `removemember` / `blockmember` after start → reservation kept; resolved at `game_finished`.
4. On `game_finished` the capability evaluator runs per active membership:
   - `capable = max_planets > initial_planets AND max_population > initial_population`, using the per-game stats aggregate (see §Runtime Snapshot);
   - capable ⇒ `MarkPendingRegistration(..., finished_at + 30 days)` + `lobby.race_name.registration_eligible`;
   - not capable ⇒ `ReleaseReservation` + optional `lobby.race_name.registration_denied`.
5. The pending-registration worker (`LOBBY_RACE_NAME_EXPIRATION_INTERVAL`) releases expired entries.

### Registration flow

`lobby.race_name.register` → `POST /api/v1/lobby/race-names/register`:

- actor is the authenticated user;
- body: `{race_name, source_game_id}`;
- preconditions:
  - `pending_registration` exists for `(source_game_id, user_id, canonical_key)` with `eligible_until > now`;
  - `UserService.GetEligibility` snapshot: no `permanent_block`, `current_registered_count < max_registered_race_names` (a snapshot value of `0` denotes unlimited);
- commit: `RND.Register` atomically deletes the pending entry, creates a registered entry, and publishes `lobby.race_name.registered`.

Errors: `race_name_registration_quota_exceeded`, `race_name_pending_window_expired`, `subject_not_found`, `forbidden`.

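The registration preconditions can be sketched as one pure check. This is a hedged illustration: the `Eligibility` and `Pending` structs, the `CheckRegister` function, and the use of Unix milliseconds are assumptions made for the example, and mapping a permanent block to a `subject_not_found`-style error follows the convention described for invite redeem rather than a confirmed rule for this endpoint.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names follow the port's list; ErrSubjectNotFound is an assumption.
var (
	ErrPendingMissing  = errors.New("pending registration missing")
	ErrPendingExpired  = errors.New("pending registration window expired")
	ErrQuotaExceeded   = errors.New("registered race name quota exceeded")
	ErrSubjectNotFound = errors.New("subject not found")
)

// Eligibility is a hypothetical projection of the User Service snapshot.
type Eligibility struct {
	PermanentBlock         bool
	RegisteredCount        int
	MaxRegisteredRaceNames int // 0 denotes unlimited
}

// Pending is a hypothetical view of one pending_registration entry.
type Pending struct {
	EligibleUntilMs int64
}

// CheckRegister evaluates the documented preconditions before the commit step.
func CheckRegister(p *Pending, e Eligibility, nowMs int64) error {
	if p == nil {
		return ErrPendingMissing
	}
	if p.EligibleUntilMs <= nowMs { // requires eligible_until > now
		return ErrPendingExpired
	}
	if e.PermanentBlock {
		return ErrSubjectNotFound
	}
	if e.MaxRegisteredRaceNames != 0 && e.RegisteredCount >= e.MaxRegisteredRaceNames {
		return ErrQuotaExceeded
	}
	return nil
}

func main() {
	p := &Pending{EligibleUntilMs: 2000}
	fmt.Println(CheckRegister(p, Eligibility{RegisteredCount: 3}, 1000))
	fmt.Println(CheckRegister(p, Eligibility{RegisteredCount: 1, MaxRegisteredRaceNames: 1}, 1000))
}
```

The explicit `!= 0` guard is the part most easily missed: without it, an unlimited quota would reject every registration.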
### Self-service reads

`lobby.race_names.list` → `GET /api/v1/lobby/my/race-names` returns the acting user's `{registered[], pending[], reservations[]}` using the `user_registered` / `user_reservations` indexes (no full scan). The response shape is fixed by `api/public-openapi.yaml` and carries:

- `registered[]`: `canonical_key`, `race_name`, `source_game_id`, `registered_at_ms`;
- `pending[]`: `canonical_key`, `race_name`, `source_game_id` (the game whose capable finish promoted the reservation), `reserved_at_ms`, `eligible_until_ms`;
- `reservations[]`: `canonical_key`, `race_name`, `game_id`, `reserved_at_ms`, `game_status` (current `game.Status` of the hosting game, joined on read).

Each slice is sorted ascending by its time field with `canonical_key` as the tie-breaker so the wire output is stable. The endpoint is exclusively self-service: there is no `?user_id=` parameter and no admin counterpart on the internal port. Visibility is enforced by the `X-User-ID` header alone.

### Cascade release

`Game Lobby` consumes `user:lifecycle_events` through a dedicated worker. On `user.lifecycle.permanent_blocked` or `user.lifecycle.deleted`:

- `RND.ReleaseAllByUser(user_id)` clears every registered, reservation, and pending entry owned by the user;
- every active membership held by the user transitions to `blocked`. For each such membership in a third-party private game, a `lobby.membership.blocked` intent is published to the game owner;
- every outstanding `submitted` application authored by the user is rejected;
- every `created` invite where the user is invitee or inviter transitions to `revoked`;
- every non-terminal game owned by the user transitions to `cancelled` via the `external_block` trigger. For in-flight games (`starting`, `running`, `paused`) a stop-job is published to Runtime Manager before the status transition.

Synchronous guard: `lobby.invite.redeem` calls `UserService.GetEligibility` for both the inviter and the invitee.
If either party has been permanently blocked or soft-deleted, the redeem fails with `subject_not_found`, matching the «as if the invite never existed» semantic even before the cascade flips the invite to `revoked`.

### Retry and release semantics

- `Reserve` is idempotent for the same holder under the same game. A second call returns no error so that `approveapplication` and `redeeminvite` retries after transient upstream failures stay safe.
- `ReleaseReservation` is a no-op when no reservation exists for the tuple and also when the reservation belongs to a different user. Defensive release paths (`rejectapplication`, `revokeinvite`, `declineinvite`) never surface an error.
- `Register` is idempotent only for the same `(game_id, user_id, race_name)` tuple — repeated calls after success return the same registered record without consuming additional quota.
- `MarkPendingRegistration` is idempotent when called with the same `eligible_until`; re-emitting it with a different timestamp returns `ErrInvalidName`.

## Game Start Flow

The start sequence spans three services and must be treated as a distributed transaction with explicit failure handling.

```mermaid
sequenceDiagram
    participant Admin as Admin or Private Owner
    participant Lobby
    participant Runtime
    participant GM as Game Master
    participant Redis

    Admin->>Lobby: lobby.game.start
    Lobby->>Lobby: validate ready_to_start + roster
    Lobby->>Lobby: status → starting
    Lobby->>Redis: publish start job to runtime:start_jobs
    Runtime->>Runtime: start container
    Runtime->>Redis: publish result to runtime:job_results

    alt container start failed
        Lobby->>Lobby: status → start_failed
    else container started
        Lobby->>Lobby: persist runtime binding
        Lobby->>GM: POST /internal/games/{game_id}/register (sync)
        alt GM registration success
            GM-->>Lobby: 200 OK
            Lobby->>Lobby: status → running; set started_at
        else GM unavailable
            GM-->>Lobby: error / timeout
            Lobby->>Lobby: status → paused
            Lobby->>Redis: publish lobby.runtime_paused_after_start intent
        end
    end
```

### Critical invariants

- If the container starts but `Lobby` cannot persist the runtime binding metadata, the start is a full failure: `Lobby` must issue a stop job to `Runtime Manager` with `reason=orphan_cleanup` before setting `start_failed`.
- If metadata is persisted but `Game Master` is unavailable, the game must be placed in `paused`, not in `start_failed`. The container is alive; only the platform tracking is incomplete.
- No start job is accepted while the game is not in `ready_to_start`.
- Concurrent start attempts for the same game must be serialized; the second attempt must fail if the first already moved the game to `starting`.

### Runtime Manager envelopes

`Lobby` is the producer for both `runtime:start_jobs` and `runtime:stop_jobs`. The `Lobby ↔ Runtime Manager` transport stays asynchronous indefinitely; there is no synchronous Lobby→RTM REST call in v1 or planned for v2.

`runtime:start_jobs` envelope:

| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | Lobby `game_id`. |
| `image_ref` | string | Docker reference resolved from `target_engine_version` via `LOBBY_ENGINE_IMAGE_TEMPLATE`. |
| `requested_at_ms` | int64 | UTC milliseconds; diagnostics only. |

`runtime:stop_jobs` envelope:

| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | |
| `reason` | enum | `orphan_cleanup`, `cancelled`, `finished`, `admin_request`, `timeout`. |
| `requested_at_ms` | int64 | UTC milliseconds. |

`reason` semantics (Lobby producer side):

- `orphan_cleanup` — used by Lobby's runtime-job-result consumer to release a container whose metadata persistence failed after a successful container start.
- `cancelled` — used by the user-lifecycle cascade and by explicit cancel paths for in-flight games.
- `finished` — reserved; not produced by Lobby in v1 because `game_finished` is engine-driven and stop jobs after finish are an Admin/GM concern.
- `admin_request` — reserved for future admin-initiated stop paths through Lobby; not produced in v1.
- `timeout` — reserved for future enrollment-timeout-driven stop paths; not produced in v1.

### Design rationale: StopReason placement

The `StopReason` enum is declared in `lobby/internal/ports/runtimemanager.go` alongside the `RuntimeManager` interface that consumes it. The enum is publisher-side protocol: it mirrors the AsyncAPI discriminator on `runtime:stop_jobs`, has no behaviour beyond `Validate`, and co-locating it with the interface keeps the AsyncAPI ↔ Go mapping visible in one file.

Alternatives considered and rejected:

- a dedicated `lobby/internal/domain/runtimejob` package — manufactures a domain layer for a single string enum that exists only to be serialised onto a Redis Stream;
- placing the enum in the publisher adapter package (`lobby/internal/adapters/runtimemanager`) — the callers (start-game service, runtime-job-result worker, user-lifecycle worker) live outside that package and would have to depend on a concrete adapter for an enum value.

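For orientation, a string enum with a single `Validate` method, as the rationale describes, might look like the sketch below. The constant names and the error text are assumptions; only the five wire values and the existence of `Validate` come from this document.

```go
package main

import "fmt"

// StopReason mirrors the `reason` discriminator on runtime:stop_jobs.
type StopReason string

// Constant names are hypothetical; the string values are the documented ones.
const (
	StopReasonOrphanCleanup StopReason = "orphan_cleanup"
	StopReasonCancelled     StopReason = "cancelled"
	StopReasonFinished      StopReason = "finished"
	StopReasonAdminRequest  StopReason = "admin_request"
	StopReasonTimeout       StopReason = "timeout"
)

// Validate rejects any value outside the documented vocabulary.
func (r StopReason) Validate() error {
	switch r {
	case StopReasonOrphanCleanup, StopReasonCancelled, StopReasonFinished,
		StopReasonAdminRequest, StopReasonTimeout:
		return nil
	}
	return fmt.Errorf("unknown stop reason %q", string(r))
}

func main() {
	fmt.Println(StopReasonOrphanCleanup.Validate())
	fmt.Println(StopReason("reboot").Validate())
}
```

Because the type carries no behaviour beyond validation, callers in any package can reference the constants without importing a concrete publisher adapter.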
### Design rationale: `engineimage.Resolver` validates the template at construction

`engineimage.Resolver` stores the validated template; the per-game `Resolve(version)` call is therefore a pure string substitution that cannot fail except on an empty `version`.

`LOBBY_ENGINE_IMAGE_TEMPLATE` is loaded at startup. A malformed value (missing `{engine_version}` placeholder, empty string) is an operational misconfiguration that fails fast before any traffic arrives — not on the first start-game request hours later. The synchronous start handler then incurs no per-call template-shape recheck.

A stateless free function `engineimage.Resolve(template, version)` was rejected: the only useful checkpoint for the template literal is at startup; a free function would either re-validate on every call (waste) or skip validation (regression).

The resolver only guards against an empty/whitespace `version`. Semver validation lives in `lobby/internal/domain/game/model.go:validateSemver` and runs at game-record construction time. Re-running it inside the resolver would either duplicate the rule (drift risk) or import the validator across package boundaries for no behavioural gain. Keeping the resolver narrow leaves it reusable from a future producer (for example `Game Master`, when it takes over `image_ref` resolution) without dragging Lobby's domain rules along.

The defensive `return start game: resolve image ref: %w` in `startgame.Service.Handle` is a guard against a future invariant violation; it is not exercised by the service-level test suite because the only resolver-failure mode (empty `version`) requires bypassing `game.Validate`, which `gameinmem.Save` always runs. Adding test scaffolding to skip validation would teach the test suite a back door that the production code path does not have.

## Paused State

`Lobby.paused` is a platform-level pause, distinct from `Game Master` runtime failure states.
Two paths lead to `paused`:

### Voluntary pause

Admin or owner issues `lobby.game.pause` while the game is `running`. Resume is issued with `lobby.game.resume`; `Lobby` performs a synchronous liveness check against `Game Master` before transitioning back to `running`.

### Forced pause (GM unavailable after start)

If the game start sequence succeeds at the runtime layer but `Game Master` registration fails, `Lobby` transitions to `paused` and publishes `lobby.runtime_paused_after_start` to administrators. Administrators investigate, restore `Game Master`, and issue `lobby.game.resume` through the internal admin surface.

## Game Finish Flow

`Game Master` publishes a `game_finished` event to the GM events Redis Stream when the engine reports that the game has ended.

`Lobby` consumes this event and, before advancing the stream offset:

- transitions game status to `finished`
- sets `finished_at` to the event timestamp
- updates the denormalized runtime snapshot with the final values
- runs the capability evaluator against every `active` membership:
  - `capable = max_planets > initial_planets AND max_population > initial_population` from the per-member stats aggregate
  - capable ⇒ `RND.MarkPendingRegistration(game_id, user_id, race_name, finished_at + 30 days)` and publish `lobby.race_name.registration_eligible`
  - not capable ⇒ `RND.ReleaseReservation(game_id, user_id, race_name)` and (optional) publish `lobby.race_name.registration_denied`
- resolves outstanding reservations on `removed` and `blocked` memberships by calling `RND.ReleaseReservation` (post-start remove/block keeps the reservation alive specifically so capability evaluation resolves it here)
- deletes the per-game stats aggregate

The `game_finished` event from `Game Master` is the sole trigger for the `finished` status. `Lobby` does not independently decide that a game is finished.
Capability evaluation must be idempotent: a replayed `game_finished` event must not produce additional RND side effects or notifications.

## Runtime Snapshot

`Game Lobby` stores a denormalized runtime snapshot on the game record to prevent fan-out reads to `Game Master` on every user-facing list or detail request, and aggregates per-member stats to support capability evaluation at game finish.

### Denormalized snapshot fields

| Field | Source |
| --- | --- |
| `current_turn` | GM event `runtime_snapshot_update` |
| `runtime_status` | GM event `runtime_snapshot_update` |
| `engine_health_summary` | GM event `runtime_snapshot_update` |

### Per-member stats aggregate

Each `runtime_snapshot_update` carries a `player_turn_stats` array with one entry per active member: `{user_id, planets, population, ships_built}`. `Lobby` aggregates these in `lobby:game_turn_stats::` with the shape `{initial_planets, initial_population, initial_ships_built, max_planets, max_population, max_ships_built}`.

Rules:

- `initial_*` values are frozen from the first event after `starting → running`; later events must not change them.
- `max_*` values are maintained by max-semantic update; they never decrease.
- the aggregate is read once by the capability evaluator at `game_finished` and then deleted.

### Update mechanism

`Game Master` publishes events to a dedicated Redis Stream consumed by `Lobby`:

- `runtime_snapshot_update`: carries updated `current_turn`, `runtime_status`, `engine_health_summary`, and `player_turn_stats`; `Lobby` applies a compare-and-swap update on the game record plus a stats aggregate upsert.
- `game_finished`: carries final snapshot values and signals the finish transition; capability evaluator (see §Game Finish Flow) runs before the stream offset is advanced.

`Lobby` does not expose the runtime snapshot update as an internal HTTP endpoint. All snapshot updates are asynchronous and delivered through the stream.
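The per-member aggregate rules above (frozen `initial_*`, non-decreasing `max_*`) can be sketched as a pure fold over incoming stats entries. The type names, integer field types, and the use of a nil pointer to signal "no aggregate yet" are assumptions for illustration; the real upsert runs against Redis and is not shown here.

```go
package main

import "fmt"

// TurnStats is one player_turn_stats entry (user_id omitted for brevity).
type TurnStats struct {
	Planets, Population, ShipsBuilt int64
}

// Aggregate follows the shape described in this README; Go names are assumed.
type Aggregate struct {
	InitialPlanets, InitialPopulation, InitialShipsBuilt int64
	MaxPlanets, MaxPopulation, MaxShipsBuilt             int64
}

func maxInt64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// Apply folds one stats entry into the aggregate. A nil prev means this is
// the first event for the member, so initial_* and max_* are both seeded
// from it. On later events only max_* may move, and only upwards.
func Apply(prev *Aggregate, s TurnStats) Aggregate {
	if prev == nil {
		return Aggregate{
			InitialPlanets: s.Planets, InitialPopulation: s.Population, InitialShipsBuilt: s.ShipsBuilt,
			MaxPlanets: s.Planets, MaxPopulation: s.Population, MaxShipsBuilt: s.ShipsBuilt,
		}
	}
	next := *prev
	next.MaxPlanets = maxInt64(prev.MaxPlanets, s.Planets)
	next.MaxPopulation = maxInt64(prev.MaxPopulation, s.Population)
	next.MaxShipsBuilt = maxInt64(prev.MaxShipsBuilt, s.ShipsBuilt)
	return next
}

func main() {
	first := Apply(nil, TurnStats{Planets: 3, Population: 100, ShipsBuilt: 1})
	// Population drops on the second event; max_population must not follow it.
	second := Apply(&first, TurnStats{Planets: 5, Population: 80, ShipsBuilt: 4})
	fmt.Printf("%+v\n", second)
}
```

Under this shape, replaying an already-applied event leaves the aggregate unchanged, which fits the replay tolerance expected of stream consumers.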
## Public vs Private Game Rules

### Public games

- created and controlled by system administrators through the internal admin surface
- visible in the public game list when in `enrollment_open`, `ready_to_start`, `running`, or `finished` status
- `draft` public games are not visible to non-admin users
- players join through the application flow; admission requires admin approval
- turn schedule and engine version are set by the administrator

### Private games

- created only by eligible paid users whose `User Service` eligibility snapshot carries `can_create_private_game=true` and whose `max_owned_private_games` limit allows it
- visible only to the owner and to users who have an active membership or a non-expired invite
- `draft` private games are visible only to the owner
- players join through the invite flow; invite redemption creates active membership immediately without further owner approval
- owner manages invites, turn schedule, and engine version

## Owner-Admin Capabilities

Private-game owners have a limited owner-admin capability set over their own games only:

- open enrollment (`draft` → `enrollment_open`)
- create and revoke invites
- manually close enrollment (`enrollment_open` → `ready_to_start`)
- start the game (`ready_to_start` → `starting`)
- pause and resume the game (`running` ↔ `paused`)
- retry start or cancel after `start_failed`
- remove or block members
- cancel the game (from `draft`, `enrollment_open`, `ready_to_start`, `start_failed`)

Owners do not have system-admin power. They cannot see or operate on other users' private games. They cannot approve or reject applications (applications are public-game only).

## Trusted Surfaces

### Public authenticated REST (gateway-facing)

All user-facing commands arrive through `Edge Gateway`. Gateway verifies the authenticated session, transcodes the FlatBuffers command to a trusted REST call, and forwards it to `Lobby` on the public port.
Gateway enriches each request with the authenticated `user_id` via the `X-User-ID` header. `Lobby` must never derive the acting user from the request payload.

#### Message type catalog

| `message_type` | Method | Path | Actor |
| --- | --- | --- | --- |
| `lobby.game.create` | `POST` | `/api/v1/lobby/games` | admin (public), eligible user (private) |
| `lobby.game.update` | `PATCH` | `/api/v1/lobby/games/{game_id}` | admin or owner; draft only |
| `lobby.game.get` | `GET` | `/api/v1/lobby/games/{game_id}` | any authenticated user (visibility rules apply) |
| `lobby.games.list` | `GET` | `/api/v1/lobby/games` | any authenticated user |
| `lobby.game.open_enrollment` | `POST` | `/api/v1/lobby/games/{game_id}/open-enrollment` | admin or owner |
| `lobby.game.ready_to_start` | `POST` | `/api/v1/lobby/games/{game_id}/ready-to-start` | admin or owner |
| `lobby.game.start` | `POST` | `/api/v1/lobby/games/{game_id}/start` | admin or owner |
| `lobby.game.pause` | `POST` | `/api/v1/lobby/games/{game_id}/pause` | admin or owner |
| `lobby.game.resume` | `POST` | `/api/v1/lobby/games/{game_id}/resume` | admin or owner |
| `lobby.game.cancel` | `POST` | `/api/v1/lobby/games/{game_id}/cancel` | admin or owner |
| `lobby.game.retry_start` | `POST` | `/api/v1/lobby/games/{game_id}/retry-start` | admin or owner |
| `lobby.application.submit` | `POST` | `/api/v1/lobby/games/{game_id}/applications` | authenticated user |
| `lobby.application.approve` | `POST` | `/api/v1/lobby/games/{game_id}/applications/{application_id}/approve` | admin |
| `lobby.application.reject` | `POST` | `/api/v1/lobby/games/{game_id}/applications/{application_id}/reject` | admin |
| `lobby.invite.create` | `POST` | `/api/v1/lobby/games/{game_id}/invites` | private-game owner |
| `lobby.invite.redeem` | `POST` | `/api/v1/lobby/games/{game_id}/invites/{invite_id}/redeem` | invited user |
| `lobby.invite.decline` | `POST` | `/api/v1/lobby/games/{game_id}/invites/{invite_id}/decline` | invited user |
| `lobby.invite.revoke` | `POST` | `/api/v1/lobby/games/{game_id}/invites/{invite_id}/revoke` | private-game owner |
| `lobby.membership.remove` | `POST` | `/api/v1/lobby/games/{game_id}/memberships/{membership_id}/remove` | admin or owner |
| `lobby.membership.block` | `POST` | `/api/v1/lobby/games/{game_id}/memberships/{membership_id}/block` | admin or owner |
| `lobby.memberships.list` | `GET` | `/api/v1/lobby/games/{game_id}/memberships` | admin, owner, or active member |
| `lobby.my_games.list` | `GET` | `/api/v1/lobby/my/games` | authenticated user |
| `lobby.my_applications.list` | `GET` | `/api/v1/lobby/my/applications` | authenticated user |
| `lobby.my_invites.list` | `GET` | `/api/v1/lobby/my/invites` | authenticated user |
| `lobby.race_name.register` | `POST` | `/api/v1/lobby/race-names/register` | authenticated user |
| `lobby.race_names.list` | `GET` | `/api/v1/lobby/my/race-names` | authenticated user |

### Internal trusted REST (internal-facing)

The internal port is not reachable from the public internet. It is used by `Game Master` for game detail and membership reads and by the administrative backend for admin-only operations.

Key internal endpoints:

| Method | Path | Purpose |
| --- | --- | --- |
| `GET` | `/api/v1/internal/games/{game_id}` | game detail read for GM/admin |
| `GET` | `/api/v1/internal/games/{game_id}/memberships` | full membership list for GM |
| `GET` | `/api/v1/internal/healthz` | health probe |
| `GET` | `/api/v1/internal/readyz` | readiness probe |

Note: the registration call from Lobby to Game Master after a successful container start is **outgoing** — Lobby calls `POST /api/v1/internal/games/{game_id}/register-runtime` on Game Master's internal port. Lobby does not expose an inbound `register-runtime` endpoint.

Admin-only operations (approve, reject, cancel, create public games, etc.) are also exposed on the internal port and are intended to be called by `Admin Service` after it enforces the system-admin role check at the gateway boundary.

## User-Facing Lists

### My active games

Returns games where the authenticated user has an active membership and the game status is `running` or `paused`. Response includes the denormalized runtime snapshot.

### My pending applications

Returns applications submitted by the authenticated user with status `submitted`. Includes game name and type for display.

### My open invitations

Returns invites addressed to the authenticated user with status `created`. Includes game name, inviter name, and `expires_at`.

### Public game list

Paginated list of public games with status in `enrollment_open`, `ready_to_start`, `running`, or `finished`. Games in `draft` or `cancelled` are excluded. Default order: `enrollment_open` and `ready_to_start` first, then `running`, then `finished` (most recent first within each group).

### Visibility rules

- private `draft` games: visible only to the owner
- private non-draft games: visible only to the owner and users with active membership or non-expired invite
- public `draft` games: visible only to system administrators
- public non-draft games: visible in the public list

## Notification Contracts

`Game Lobby` publishes normalized notification intents to `notification:intents` using the `galaxy/notificationintent` producer module.

| Trigger | `notification_type` | Audience | Channels |
| --- | --- | --- | --- |
| Application submitted (public game) | `lobby.application.submitted` | configured admin email list | `email` |
| Application approved | `lobby.membership.approved` | applicant user | `push+email` |
| Application rejected | `lobby.membership.rejected` | applicant user | `push+email` |
| Cascade membership block (`permanent_block`/`DeleteUser`) | `lobby.membership.blocked` | private-game owner | `push+email` |
| Invite created (private game) | `lobby.invite.created` | invited user | `push+email` |
| Invite redeemed (private game) | `lobby.invite.redeemed` | private-game owner | `push+email` |
| Invite expired (on enrollment close) | `lobby.invite.expired` | private-game owner | `email` |
| GM unavailable after start (forced pause) | `lobby.runtime_paused_after_start` | configured admin email list | `email` |
| Race name eligible for registration | `lobby.race_name.registration_eligible` | capable member | `push+email` |
| Race name successfully registered | `lobby.race_name.registered` | registering user | `push+email` |
| Race name registration denied (capability) | `lobby.race_name.registration_denied` | incapable member | `email` |

Rules:

- intents carry explicit `recipient_user_id` values; `Lobby` resolves recipients before publishing rather than delegating audience resolution to `Notification Service`
- a failed intent publication is a notification degradation and must not roll back already committed business state
- `lobby.invite.revoked` and `lobby.invite.declined` produce no notification in v1
- `lobby.application.submitted` is published only for public games; the private-game owner-targeting path defined in the notification catalog is reserved for future use

## Domain Events

`Game Lobby` publishes auxiliary post-commit domain events to the Redis stream configured for lobby domain events.
Frozen event types:

- `lobby.game.created`
- `lobby.game.status_changed`
- `lobby.membership.activated`
- `lobby.membership.removed`
- `lobby.membership.blocked`

Event rules:

- events are post-commit only; they are not emitted on failed operations
- event envelopes carry `game_id`, optional `user_id`, occurrence timestamp, new status (for `status_changed`), and optional trace correlation
- domain events are observability and downstream-read-model artifacts; they must not carry full business state payloads

## Error Model

The trusted internal REST contract uses strict JSON error envelopes:

```json
{
  "error": {
    "code": "invalid_request",
    "message": "request is invalid"
  }
}
```

Stable error codes:

- `invalid_request` — malformed input or failed validation
- `conflict` — state transition not allowed from current status
- `subject_not_found` — game, application, invite, membership, or pending race-name registration not found
- `eligibility_denied` — user not eligible per `User Service`
- `name_taken` — `race_name` already registered, reserved, or pending for another user
- `race_name_registration_quota_exceeded` — user's `max_registered_race_names` slot is full
- `race_name_pending_window_expired` — the 30-day registration window has passed for the pending entry
- `race_name_capability_not_met` — capability condition not satisfied at game finish (reservation released)
- `race_name_permanent_blocked` — the user carries an active `permanent_block` sanction
- `forbidden` — caller is not authorized for this operation on this game or this race name
- `internal_error` — unexpected service error
- `service_unavailable` — upstream dependency unavailable

## Configuration

### Required

- `LOBBY_REDIS_MASTER_ADDR`
- `LOBBY_REDIS_PASSWORD`
- `LOBBY_POSTGRES_PRIMARY_DSN`
- `LOBBY_USER_SERVICE_BASE_URL`
- `LOBBY_GM_BASE_URL`

### Configuration groups

Process and logging:

- `LOBBY_SHUTDOWN_TIMEOUT` with default `30s`
- `LOBBY_LOG_LEVEL` with default `info`

Public HTTP:

- `LOBBY_PUBLIC_HTTP_ADDR` with default `:8094`
- `LOBBY_PUBLIC_HTTP_READ_HEADER_TIMEOUT` with default `2s`
- `LOBBY_PUBLIC_HTTP_READ_TIMEOUT` with default `10s`
- `LOBBY_PUBLIC_HTTP_IDLE_TIMEOUT` with default `1m`

Internal HTTP:

- `LOBBY_INTERNAL_HTTP_ADDR` with default `:8095`
- `LOBBY_INTERNAL_HTTP_READ_HEADER_TIMEOUT` with default `2s`
- `LOBBY_INTERNAL_HTTP_READ_TIMEOUT` with default `10s`
- `LOBBY_INTERNAL_HTTP_IDLE_TIMEOUT` with default `1m`

Redis connectivity:

- `LOBBY_REDIS_MASTER_ADDR` (required)
- `LOBBY_REDIS_REPLICA_ADDRS` (optional, comma-separated; not consumed yet)
- `LOBBY_REDIS_PASSWORD` (required)
- `LOBBY_REDIS_DB` (default 0)
- `LOBBY_REDIS_OPERATION_TIMEOUT` (default 250ms)

The legacy `LOBBY_REDIS_ADDR`, `LOBBY_REDIS_USERNAME`, and `LOBBY_REDIS_TLS_ENABLED` env vars were retired in PG_PLAN.md §6A; setting either of the latter two now fails fast at startup. See `ARCHITECTURE.md §Persistence Backends` for the architectural rules.

PostgreSQL connectivity (PG_PLAN.md §6A and §6B; durable game / application / invite / membership records and the Race Name Directory live here):

- `LOBBY_POSTGRES_PRIMARY_DSN` (required; e.g. `postgres://lobbyservice:secret@postgres:5432/galaxy?search_path=lobby&sslmode=disable`)
- `LOBBY_POSTGRES_REPLICA_DSNS` (optional, comma-separated; not consumed yet)
- `LOBBY_POSTGRES_OPERATION_TIMEOUT` (default 1s)
- `LOBBY_POSTGRES_MAX_OPEN_CONNS` (default 25)
- `LOBBY_POSTGRES_MAX_IDLE_CONNS` (default 5)
- `LOBBY_POSTGRES_CONN_MAX_LIFETIME` (default 30m)

Stream names:

- `LOBBY_GM_EVENTS_STREAM` with default `gm:lobby_events`
- `LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT` with default `2s`
- `LOBBY_RUNTIME_START_JOBS_STREAM` with default `runtime:start_jobs`
- `LOBBY_RUNTIME_STOP_JOBS_STREAM` with default `runtime:stop_jobs`
- `LOBBY_RUNTIME_JOB_RESULTS_STREAM` with default `runtime:job_results`
- `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` with default `2s`
- `LOBBY_NOTIFICATION_INTENTS_STREAM` with default `notification:intents`

Runtime Manager integration:

- `LOBBY_ENGINE_IMAGE_TEMPLATE` with default `galaxy/game:{engine_version}` — Go-style template applied to a game's `target_engine_version` to resolve the Docker `image_ref` published on `runtime:start_jobs`. The template must contain the literal placeholder `{engine_version}`; Lobby fails fast at startup otherwise.

Upstream clients:

- `LOBBY_USER_SERVICE_TIMEOUT` with default `1s`
- `LOBBY_GM_TIMEOUT` with default `5s`

Enrollment automation:

- `LOBBY_ENROLLMENT_AUTOMATION_INTERVAL` with default `30s`

Race Name Directory:

- `LOBBY_RACE_NAME_DIRECTORY_BACKEND` with default `postgres` (alternate: `stub` for in-process tests; PG_PLAN.md §6B retired the `redis` backend)
- `LOBBY_RACE_NAME_EXPIRATION_INTERVAL` with default `1h` — pending registration expiration worker tick

The 30-day eligibility window for `pending_registration` entries is the constant `service/capabilityevaluation.PendingRegistrationWindow`. It is intentionally not operator-tunable today; the env var name `LOBBY_PENDING_REGISTRATION_TTL_HOURS` is reserved for a future change.
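The `LOBBY_ENGINE_IMAGE_TEMPLATE` contract listed under Runtime Manager integration (validate the template once at startup, then substitute per game) can be sketched as follows. This is a hedged illustration, not the actual `engineimage` package: the constructor name `NewResolver`, the error messages, and the use of `strings.ReplaceAll` are assumptions. Only the placeholder literal, the fail-fast rule, and the empty-version guard come from this document.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// placeholder is the literal the template must contain.
const placeholder = "{engine_version}"

// Resolver holds a template that was validated once, at construction.
type Resolver struct {
	template string
}

// NewResolver (hypothetical name) rejects an empty template or one without
// the placeholder, so a misconfiguration surfaces at process startup.
func NewResolver(template string) (*Resolver, error) {
	if strings.TrimSpace(template) == "" {
		return nil, errors.New("engine image template is empty")
	}
	if !strings.Contains(template, placeholder) {
		return nil, fmt.Errorf("engine image template %q lacks %s", template, placeholder)
	}
	return &Resolver{template: template}, nil
}

// Resolve substitutes the version into the validated template. It only
// guards against an empty or whitespace version; semver checks live elsewhere.
func (r *Resolver) Resolve(version string) (string, error) {
	if strings.TrimSpace(version) == "" {
		return "", errors.New("engine version is empty")
	}
	return strings.ReplaceAll(r.template, placeholder, version), nil
}

func main() {
	r, err := NewResolver("galaxy/game:{engine_version}")
	if err != nil {
		panic(err) // in a real service this would abort startup
	}
	ref, _ := r.Resolve("1.4.2")
	fmt.Println(ref)
}
```

With the default template, version `1.4.2` resolves to `galaxy/game:1.4.2`, while a template such as `galaxy/game:latest` is rejected before any start-game request is served.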
User lifecycle:

- `LOBBY_USER_LIFECYCLE_STREAM` with default `user:lifecycle_events`
- `LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT` with default `2s`

OpenTelemetry:

- standard `OTEL_*` variables
- `LOBBY_OTEL_STDOUT_TRACES_ENABLED`
- `LOBBY_OTEL_STDOUT_METRICS_ENABLED`

## Persistence Layout

Game / application / invite / membership records live in PostgreSQL after PG_PLAN.md §6A; the Race Name Directory followed in §6B. See `docs/postgres-migration.md` for the schema and decision records.

The `lobby` schema owns five tables — `games`, `applications`, `invites`, `memberships`, `race_names` — plus the partial UNIQUE index on `applications(applicant_user_id, game_id) WHERE status <> 'rejected'` that enforces the single-active-application invariant and the partial UNIQUE index on `race_names(canonical_key) WHERE binding_kind = 'registered'` that enforces single-registered-per-canonical.

The Redis-backed keys below survive both stages. Redis owns the runtime-coordination state — per-game runtime aggregates, gap activation, capability-evaluation guards, and stream consumer offsets — plus the event-bus streams themselves.
### Redis key table

Storage rules for Redis:

- timestamps are stored in Unix milliseconds unless noted otherwise
- dynamic key segments are base64url-encoded

| Logical artifact | Redis key |
| --- | --- |
| per-game per-user stats aggregate | `lobby:game_turn_stats::` → JSON aggregate |
| per-game stats user index | `lobby:game_turn_stats_by_game:` (set of `user_id`) |
| capability-evaluation guard | `lobby:capability_evaluation:done:` (sentinel string) |
| GM event stream offset | `lobby:stream_offsets:gm_events` |
| runtime job result offset | `lobby:stream_offsets:runtime_results` |
| user lifecycle stream offset | `lobby:stream_offsets:user_lifecycle` |
| gap window activation time | `lobby:gap_activated_at:` |

### Frozen record fields

The five durable records are stored in PostgreSQL columns; the field set per record is unchanged from the previous Redis JSON shape and is documented inline with the migration scripts under `internal/adapters/postgres/migrations/`.

| Record | Frozen fields |
| --- | --- |
| game record | all game fields listed in Game Record Model section |
| application record | `application_id`, `game_id`, `applicant_user_id`, `race_name`, `status`, `created_at`, `decided_at` |
| invite record | `invite_id`, `game_id`, `inviter_user_id`, `invitee_user_id`, `race_name` (set at redeem), `status`, `created_at`, `expires_at`, `decided_at` |
| membership record | all membership fields listed in Membership Model section |
| race_names row | `canonical_key`, `game_id`, `holder_user_id`, `race_name`, `binding_kind`, `source_game_id`, `reserved_at_ms`, `eligible_until_ms` (pending only), `registered_at_ms` (registered only) |

## Observability

### Metrics

- `lobby.game.transitions` — counter; attributes: `from_status`, `to_status`, `trigger` (`command`, `manual`, `deadline`, `gap`, `runtime_event`, `external_block`)
- `lobby.application.outcomes` — counter; attributes: `outcome` (`submitted`, `approved`, `rejected`)
- `lobby.invite.outcomes` — counter; attributes: `outcome` (`created`, `redeemed`, `declined`, `revoked`, `expired`)
- `lobby.membership.changes` — counter; attributes: `change` (`activated`, `removed`, `blocked`, `external_block`)
- `lobby.start_flow.outcomes` — counter; attributes: `outcome` (`running`, `paused`, `start_failed`)
- `lobby.notification.publish_attempts` — counter; attributes: `notification_type`, `result` (`ok`, `error`)
- `lobby.active_games` — observable gauge; attributes: `status`
- `lobby.enrollment_automation.checks` — counter; attributes: `result` (`no_op`, `transitioned`)
- `lobby.gm_events.oldest_unprocessed_age_ms` — observable gauge
- `lobby.runtime_results.oldest_unprocessed_age_ms` — observable gauge
- `lobby.user_lifecycle.oldest_unprocessed_age_ms` — observable gauge
- `lobby.race_name.outcomes` — counter; attributes: `outcome` (`reserved`, `reservation_released`, `pending_created`, `pending_released`, `registered`, `registered_released`)
- `lobby.pending_registration.expirations` — counter; attributes: `trigger` (`tick`, `manual`)
- `lobby.user_lifecycle.cascade_releases` — counter; attributes: `event` (`permanent_blocked`, `deleted`)
- `lobby.capability_evaluations` — counter; attributes: `result` (`capable`, `incapable`, `noop`)

Metrics avoid high-cardinality attributes such as `game_id`, `user_id`, `application_id`, `invite_id`, and `canonical_key`.

### Structured log fields

Key operations emit structured logs with these stable field names where applicable:

- `game_id`
- `game_type`
- `game_status`
- `from_status`
- `to_status`
- `user_id`
- `application_id`
- `invite_id`
- `membership_id`
- `race_name`
- `canonical_key`
- `reservation_kind` (`reserved` / `pending_registration` / `registered`)
- `eligible_until_ms`
- `trigger`
- `lifecycle_event`
- `request_id`
- `trace_id`

## Verification

Test doubles split between two styles.
Wide-surface ports with no production state (`RuntimeManager`, `IntentPublisher`, `GMClient`, `UserService`) use `gomock`-generated mocks under `internal/adapters/mocks/`; regenerate with `make -C lobby mocks`. Stateful behavioural fakes that mirror the production adapter contract (`gameinmem`, `applicationinmem`, `inviteinmem`, `membershipinmem`, `gameturnstatsinmem`, `racenameinmem`, `evaluationguardinmem`, `gapactivationinmem`, `streamoffsetinmem`) live as in-memory adapters under `internal/adapters/inmem/` and stay hand-rolled because tests rely on their CAS, status-transition, and invariant-tracking behaviour.

Focused service-local coverage verifies:

- configuration loading and validation for all env var groups
- both HTTP listeners start and serve `/healthz` and `/readyz`
- game CRUD: create, update, get, list with correct field validation
- each status transition fires only from allowed source statuses
- enrollment automation: deadline trigger, gap trigger, manual trigger
- application flow: submit (eligibility check, race name check), approve, reject
- invite flow: create, redeem (auto-membership), decline, revoke, expire on enrollment close
- membership model: activate, remove, block with correct before/after-start semantics
- Race Name Directory (PostgreSQL + in-memory adapters against the same suite): canonicalization + confusable-pair policy, `Reserve`/`ReleaseReservation` per-game semantics, `MarkPendingRegistration`/`ExpirePendingRegistrations` window, `Register` idempotency + quota, `ReleaseAllByUser` cascade
- game start flow: success path (→ running), GM unavailable path (→ paused), container failure path (→ start_failed), metadata persistence failure path (container removed, → start_failed)
- GM event stream consumer: snapshot update (stats aggregate), `game_finished` with capability evaluation
- user lifecycle stream consumer: `permanent_blocked` and `deleted` cascade release + membership/application/invite settlement
- pending-registration expiration worker idempotency
- race name registration service: capability, tariff quota, pending window, idempotent retry
- notification intent publication for all ten supported triggers
- visibility rules: private game hidden from non-member non-owner users
- error model: all stable codes returned for correct conditions

Cross-service coverage verifies:

- `Lobby → User Service` eligibility check compatibility (including the new `max_registered_race_names` field) and failure handling
- `Lobby → Notification Service` intent publication for all lobby notification types
- `Lobby → Runtime Manager` start job publication and result consumption
- `Lobby → Game Master` synchronous registration call (success and failure)
- `User Service → Lobby` cascade flow: permanent_block or DeleteUser on a user leads to full RND release + memberships blocked + applications/invites cancelled