# Game Lobby Service
`galaxy/lobby` owns platform-level metadata and lifecycle of game sessions.
## References
- [Public REST contract](api/public-openapi.yaml)
- [Internal REST contract](api/internal-openapi.yaml)
- [System architecture](../ARCHITECTURE.md)
- [Notification catalog](../notification/README.md)
- [User Service lobby eligibility](../user/README.md)
- [Service-local docs](docs/)
## Purpose
`Game Lobby Service` is the platform source of truth for game sessions as
platform entities — from creation through enrollment, start, runtime tracking,
and finish. It mediates all player participation actions and maintains the
roster state that `Game Master` may cache for runtime authorization.
## Scope
`Game Lobby` is the source of truth for:
- opaque stable game identifiers in `game-*` form
- game metadata: name, description, type, owner, schedule, engine version
- platform-level game status from `draft` through `finished` or `cancelled`
- enrollment configuration: `min_players`, `max_players`, `start_gap_hours`,
`start_gap_players`, `enrollment_ends_at`
- applications and their approval or rejection status (public games)
- user-bound invitations and their lifecycle (private games)
- platform membership roster and participant status
- Race Name Directory state across all regular platform users: registered
race names (permanent ownership), per-game reservations, and 30-day
pending-registration windows
- per-game per-user `player_turn_stats` aggregate used at game finish for
capability evaluation
- denormalized runtime snapshot imported from `Game Master`
- user-facing lists: active games, pending applications, open invitations
`Game Lobby` is not the source of truth for:
- platform user identity or profile — owned by `User Service`
- device sessions or authentication state — owned by `Auth / Session Service`
- runtime container lifecycle or technical health — owned by `Runtime Manager`
- current turn, generation state, engine reachability — owned by `Game Master`
- full per-player game state — owned by the game engine container
- player-to-engine UUID mapping — owned by `Game Master`
## Non-Goals
- `Game Lobby` does not call game engine containers directly; all engine
interaction goes through `Game Master`.
- `Game Lobby` owns the Race Name Directory data in v1 (Redis adapter); the
contract is kept behind a port interface so a future dedicated
`Race Name Service` can replace the adapter without domain changes.
- `Game Lobby` does not compute notification audiences from roster data at
delivery time; notification intents carry explicit `recipient_user_id` values.
- `Game Lobby` does not apply sanctions or session-level access control;
`User Service` and `Auth / Session Service` remain authoritative for those.
- `Game Lobby` does not own billing or entitlement decisions; it reads the
current entitlement snapshot from `User Service`.
## Position in the System
```mermaid
flowchart LR
Gateway["Edge Gateway"]
Lobby["Game Lobby Service"]
User["User Service"]
GM["Game Master"]
Runtime["Runtime Manager"]
Notify["Notification Service"]
Redis["Redis\nKV + Streams"]
Gateway --> Lobby
Lobby --> User
Lobby --> GM
Lobby --> Redis
Lobby --> Notify
GM --> Redis
Redis --> Lobby
Runtime --> Redis
```
`Gateway` routes authenticated platform-level commands to `Lobby` over trusted
REST.
`Lobby` reads user eligibility from `User Service` synchronously.
`Lobby` registers running games with `Game Master` synchronously at start.
`Lobby` submits start jobs to `Runtime Manager` and reads job results from a
dedicated Redis Stream.
`Game Master` publishes runtime events to a dedicated Redis Stream that `Lobby`
consumes asynchronously.
`Lobby` publishes notification intents to `notification:intents`.
## Responsibility Boundaries
`Game Lobby` is responsible for:
- accepting and validating game creation and configuration commands
- opening and managing enrollment for public and private games
- validating user eligibility before accepting applications and invite redeems
- checking race name availability through the Race Name Directory port
- enforcing enrollment deadline and roster-size auto-transitions
- orchestrating the game start sequence with `Runtime Manager` and `Game Master`
- persisting game metadata atomically and removing orphaned containers when
metadata persistence fails
- maintaining the denormalized runtime snapshot for user-facing reads
- emitting notification intents for all participant lifecycle events
- enforcing visibility rules: private games are visible only to owner and members
`Game Lobby` is not responsible for:
- verifying authenticated transport signatures — handled by `Edge Gateway`
- checking session revocation state — handled by `Edge Gateway` and `Auth`
- email delivery — handled by `Mail Service`
- push delivery — handled by `Notification Service` and `Edge Gateway`
- container start and stop mechanics — handled by `Runtime Manager`
- per-turn player command routing — handled by `Game Master`
## Runtime Surface
The service starts two HTTP listeners and one Redis Stream consumer pipeline.
### Listeners
- public authenticated REST on `LOBBY_PUBLIC_HTTP_ADDR` with default `:8094`
- internal trusted REST on `LOBBY_INTERNAL_HTTP_ADDR` with default `:8095`
### Background workers
- enrollment automation ticker — checks enrollment deadlines and roster
thresholds at a configurable interval
- Runtime Manager result consumer — reads start-job results from a Redis Stream
- Game Master event consumer — reads runtime snapshot updates and game-finish
events from a dedicated Redis Stream
### Startup dependencies
- one reachable Redis deployment at `LOBBY_REDIS_MASTER_ADDR` (mandatory
password via `LOBBY_REDIS_PASSWORD`; replicas optional via
`LOBBY_REDIS_REPLICA_ADDRS`). Used for streams, race-name directory,
per-game runtime aggregates, and stream offsets.
- one reachable PostgreSQL primary at `LOBBY_POSTGRES_PRIMARY_DSN` (DSN
must include `search_path=lobby&sslmode=disable`). Embedded goose
migrations apply at startup before any listener opens; on migration or
ping failure the service exits non-zero. The four core enrollment
entities (game / application / invite / membership) live here after
PG_PLAN.md §6A; `docs/postgres-migration.md` is the decision record.
- `User Service` at `LOBBY_USER_SERVICE_BASE_URL` (no startup reachability check;
runtime failures are surfaced as request errors, not boot failures)
- `Game Master` at `LOBBY_GM_BASE_URL` (same policy — startup check omitted;
unreachability at registration triggers the forced-pause path)
### Probes
- `GET /healthz` on both ports returns `{"status":"ok"}`
- `GET /readyz` on both ports returns `{"status":"ready"}` after successful
startup; no live Redis or PostgreSQL ping per request
## Game Record Model
### Fields
| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | opaque, stable, `game-*` form |
| `game_name` | string | human-readable; mutable in `draft` |
| `description` | string | optional; mutable in `draft` and `enrollment_open` |
| `game_type` | enum | `public` or `private` |
| `owner_user_id` | string | private games only; empty for public |
| `status` | enum | see status table below |
| `min_players` | int | minimum approved participants to proceed to start |
| `max_players` | int | target roster size that activates the gap window |
| `start_gap_hours` | int | hours of gap window after `max_players` is reached |
| `start_gap_players` | int | additional participants admitted during the gap |
| `enrollment_ends_at` | int64 | UTC Unix seconds; deadline for automatic enrollment close |
| `turn_schedule` | string | cron expression, e.g. `0 18 * * *`; passed to GM at registration |
| `target_engine_version` | string | semver of the engine to launch; passed to GM at registration |
| `created_at` | int64 | UTC Unix milliseconds |
| `updated_at` | int64 | UTC Unix milliseconds |
| `started_at` | int64 | UTC Unix milliseconds; set when status becomes `running` |
| `finished_at` | int64 | UTC Unix milliseconds; set when status becomes `finished` |
| `current_turn` | int | denormalized from GM; zero until running |
| `runtime_status` | string | denormalized from GM; empty until running |
| `engine_health_summary` | string | denormalized from GM; empty until running |
| `runtime_binding` | object? | non-null after successful container start; contains `container_id`, `engine_endpoint`, `runtime_job_id`, `bound_at` (Unix ms) |
All fields set at creation are validated before the game record is persisted.
`game_name` is required and must be non-empty after trim.
`min_players`, `max_players`, `start_gap_hours`, `start_gap_players`, and
`enrollment_ends_at` are required positive integers with `min_players <= max_players`.
`turn_schedule` must be a valid five-field cron expression.
`target_engine_version` must be a non-empty semver string.
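
The creation-time rules above can be sketched as a single validation function. This is a minimal illustration, not the service's actual code: `GameConfig`, `ErrInvalidGame`, and the error messages are hypothetical names, the cron check only counts fields, and the semver check only rejects empty strings.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrInvalidGame is a hypothetical sentinel; the service's real error
// vocabulary is not shown in this document.
var ErrInvalidGame = errors.New("invalid game config")

// GameConfig carries only the fields named by the validation rules.
type GameConfig struct {
	GameName            string
	MinPlayers          int
	MaxPlayers          int
	StartGapHours       int
	StartGapPlayers     int
	EnrollmentEndsAt    int64  // UTC Unix seconds
	TurnSchedule        string // cron expression
	TargetEngineVersion string // semver
}

// Validate applies the creation-time rules in the order the text lists them.
func (c GameConfig) Validate() error {
	if strings.TrimSpace(c.GameName) == "" {
		return fmt.Errorf("%w: game_name is empty after trim", ErrInvalidGame)
	}
	positives := []struct {
		name  string
		value int64
	}{
		{"min_players", int64(c.MinPlayers)},
		{"max_players", int64(c.MaxPlayers)},
		{"start_gap_hours", int64(c.StartGapHours)},
		{"start_gap_players", int64(c.StartGapPlayers)},
		{"enrollment_ends_at", c.EnrollmentEndsAt},
	}
	for _, p := range positives {
		if p.value <= 0 {
			return fmt.Errorf("%w: %s must be positive", ErrInvalidGame, p.name)
		}
	}
	if c.MinPlayers > c.MaxPlayers {
		return fmt.Errorf("%w: min_players exceeds max_players", ErrInvalidGame)
	}
	// Field-count check only; a real cron parser would also validate ranges.
	if len(strings.Fields(c.TurnSchedule)) != 5 {
		return fmt.Errorf("%w: turn_schedule needs five fields", ErrInvalidGame)
	}
	// Non-empty check only; full semver parsing is left out of this sketch.
	if strings.TrimSpace(c.TargetEngineVersion) == "" {
		return fmt.Errorf("%w: target_engine_version is empty", ErrInvalidGame)
	}
	return nil
}

func main() {
	cfg := GameConfig{
		GameName: "Alpha", MinPlayers: 2, MaxPlayers: 4,
		StartGapHours: 6, StartGapPlayers: 1, EnrollmentEndsAt: 1800000000,
		TurnSchedule: "0 18 * * *", TargetEngineVersion: "1.2.3",
	}
	fmt.Println("validation error:", cfg.Validate())
}
```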
### Status vocabulary
| Status | Meaning |
| --- | --- |
| `draft` | Created; enrollment not yet open; editable |
| `enrollment_open` | Accepting applications (public) or invite redeems (private) |
| `ready_to_start` | Enrollment closed; start command accepted |
| `starting` | Start job submitted to Runtime Manager; awaiting result |
| `start_failed` | Container start or metadata persistence failed |
| `running` | Game engine container live; normal gameplay |
| `paused` | Platform-level pause; engine container may still be alive |
| `finished` | Game ended; record is terminal |
| `cancelled` | Cancelled before start; record is terminal |
### Status transition table
| From | To | Trigger |
| --- | --- | --- |
| `draft` | `enrollment_open` | explicit command from admin (public) or owner (private) |
| `enrollment_open` | `ready_to_start` | manual command when `approved_count >= min_players` |
| `enrollment_open` | `ready_to_start` | `enrollment_ends_at` reached and `approved_count >= min_players` |
| `enrollment_open` | `ready_to_start` | gap window exhausted (time or player count) |
| `ready_to_start` | `starting` | start command from admin (public) or owner (private) |
| `starting` | `running` | Runtime Manager confirms container; GM registration succeeds |
| `starting` | `paused` | Runtime Manager confirms container; GM registration fails (unavailable) |
| `starting` | `start_failed` | Runtime Manager reports container start failure |
| `start_failed` | `ready_to_start` | explicit retry command from admin or owner |
| `running` | `paused` | explicit pause command from admin or owner |
| `running` | `finished` | `game_finished` event from `Game Master` via Redis Stream |
| `paused` | `running` | explicit resume command from admin or owner |
| `paused` | `finished` | `game_finished` event from `Game Master` via Redis Stream |
| `draft` | `cancelled` | explicit cancel command from admin or owner |
| `enrollment_open` | `cancelled` | explicit cancel command from admin or owner |
| `ready_to_start` | `cancelled` | explicit cancel command from admin or owner |
| `start_failed` | `cancelled` | explicit cancel command from admin or owner |
| `draft` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `enrollment_open` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `ready_to_start` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `start_failed` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `starting` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `running` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
| `paused` | `cancelled` | `external_block` cascade on owner permanent_block / DeleteUser |
Outside the `external_block` cascade, `running` and `paused` games cannot be
cancelled directly; use stop operations through `Game Master` and await the
`game_finished` event instead. The cascade publishes a stop-job to Runtime
Manager before applying the `external_block` transition for in-flight games.
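
One way to read the transition table is as a lookup map plus a separate rule for the `external_block` cascade. The sketch below is illustrative only; the `Status` type, constant names, and function names are assumptions rather than the service's real identifiers.

```go
package main

import "fmt"

// Status mirrors the platform-level status vocabulary.
type Status string

const (
	Draft          Status = "draft"
	EnrollmentOpen Status = "enrollment_open"
	ReadyToStart   Status = "ready_to_start"
	Starting       Status = "starting"
	StartFailed    Status = "start_failed"
	Running        Status = "running"
	Paused         Status = "paused"
	Finished       Status = "finished"
	Cancelled      Status = "cancelled"
)

// allowed lists command- and event-driven transitions from the table.
// The external_block cascade is modelled separately below.
var allowed = map[Status][]Status{
	Draft:          {EnrollmentOpen, Cancelled},
	EnrollmentOpen: {ReadyToStart, Cancelled},
	ReadyToStart:   {Starting, Cancelled},
	Starting:       {Running, Paused, StartFailed},
	StartFailed:    {ReadyToStart, Cancelled},
	Running:        {Paused, Finished},
	Paused:         {Running, Finished},
}

// CanTransition reports whether from -> to is a regular transition.
func CanTransition(from, to Status) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

// CanCascadeCancel reports whether the external_block cascade may cancel
// a game in the given status: every non-terminal status qualifies.
func CanCascadeCancel(from Status) bool {
	return from != Finished && from != Cancelled
}

func main() {
	fmt.Println("running -> cancelled (regular):", CanTransition(Running, Cancelled))
	fmt.Println("running -> cancelled (cascade):", CanCascadeCancel(Running))
}
```

Keeping the cascade outside the map makes the "running and paused games cannot be cancelled directly" rule visible in one place.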
## Enrollment Rules
`enrollment_open → ready_to_start` fires on the first of these conditions:
### Manual close
Admin (public game) or owner (private game) issues `lobby.game.ready_to_start`
when `approved_count >= min_players`.
### Deadline
Enrollment automation worker detects that `enrollment_ends_at` is in the past
and `approved_count >= min_players`.
If the deadline is reached but `approved_count < min_players`, the game remains
in `enrollment_open` — the transition does not fire until the player count
condition is also satisfied.
### Gap exhaustion
When `approved_count` reaches `max_players`, the gap window opens.
During the gap window:
- new applications and invite redeems continue to be accepted up to
`max_players + start_gap_players` total approved participants
- the game does not automatically transition while the gap is open
The transition fires when either:
- `start_gap_hours` have elapsed since the gap window opened, or
- `approved_count` reaches `max_players + start_gap_players`
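
The two automatic paths (deadline and gap exhaustion) can be sketched as one predicate evaluated by the enrollment automation worker. This is a sketch under assumptions: the `Enrollment` struct, the `GapOpenedAt` field, and the `at` helper are hypothetical, and the deadline is treated as reached when `now` equals `enrollment_ends_at`.

```go
package main

import (
	"fmt"
	"time"
)

// Enrollment holds the inputs the automatic close rules depend on.
type Enrollment struct {
	MinPlayers       int
	MaxPlayers       int
	StartGapPlayers  int
	StartGapHours    int
	EnrollmentEndsAt time.Time
	GapOpenedAt      time.Time // zero until approved_count first reaches max_players
	ApprovedCount    int
}

// ShouldClose reports whether enrollment_open -> ready_to_start fires at now
// through the deadline path or the gap-exhaustion path.
func (e Enrollment) ShouldClose(now time.Time) bool {
	// Deadline path: requires both the deadline and the minimum roster.
	if !now.Before(e.EnrollmentEndsAt) && e.ApprovedCount >= e.MinPlayers {
		return true
	}
	// Gap path: only relevant once the gap window has opened.
	if e.GapOpenedAt.IsZero() {
		return false
	}
	if e.ApprovedCount >= e.MaxPlayers+e.StartGapPlayers {
		return true
	}
	gapEnd := e.GapOpenedAt.Add(time.Duration(e.StartGapHours) * time.Hour)
	return !now.Before(gapEnd)
}

// at returns a fixed reference instant plus the given number of hours.
func at(hours int) time.Time {
	base := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(hours) * time.Hour)
}

func main() {
	e := Enrollment{
		MinPlayers: 2, MaxPlayers: 4, StartGapPlayers: 2, StartGapHours: 6,
		EnrollmentEndsAt: at(48), GapOpenedAt: at(10), ApprovedCount: 4,
	}
	fmt.Println("close two hours into the gap:", e.ShouldClose(at(12)))
	fmt.Println("close six hours into the gap:", e.ShouldClose(at(16)))
}
```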
### On enrollment close
When any path transitions the game to `ready_to_start`:
- all invites in `created` status transition to `expired`
- a `lobby.invite.expired` notification intent is published for each expired invite
(recipient: private-game owner)
- no new applications are accepted in `ready_to_start` status
## Application Lifecycle
Applications are used for public games only.
Private games use the invite flow exclusively.
### Submit
An authenticated user submits `lobby.application.submit` with `race_name`.
Pre-conditions checked synchronously:
- game status is `enrollment_open`
- game type is `public`
- user has no existing non-rejected application to the same game
- `User Service` eligibility check confirms `can_join_game=true`
- `approved_count < max_players + start_gap_players` (or gap window not yet open)
- Race Name Directory confirms `race_name` is available for the applicant
On success:
- an `Application` record is created with `status=submitted`
- `lobby.application.submitted` intent published (`audience_kind=admin_email`)
with payload: `game_id`, `game_name`, `applicant_user_id`, `applicant_name`
`applicant_name` in the notification payload equals the submitted `race_name`.
### Approve
Admin issues `lobby.application.approve`.
Pre-conditions:
- game is `enrollment_open`
- application is in `submitted` status
- `approved_count < max_players + start_gap_players`
On success:
- Race Name Directory reserves `race_name` for the applicant
- application `status` → `approved`
- `Membership` record created with `status=active`
- `lobby.membership.approved` intent published (recipient: applicant)
with payload: `game_id`, `game_name`
- gap window opens automatically if `approved_count` now equals `max_players`
- auto-transition to `ready_to_start` if gap exhaustion condition is immediately met
### Reject
Admin issues `lobby.application.reject`.
Pre-conditions:
- application is in `submitted` status
On success:
- application `status` → `rejected`
- any pending Race Name Directory reservation for the applicant is released
- `lobby.membership.rejected` intent published (recipient: applicant)
with payload: `game_id`, `game_name`
### Application state machine
```text
submitted → approved
submitted → rejected
```
Rejected applicants may re-apply while enrollment is open, subject to a single
active application constraint (at most one non-rejected application per user per
game).
The single-active constraint is enforced at the persistence layer by the
`user_game_application` key (see Redis Logical Model). The key is created
atomically with the submitted application record, removed on rejection, and
preserved on approval. Service-layer code can rely on this invariant without
performing its own scan of `user_applications`.
## Invite Lifecycle
Invites are used for private games only.
Public games use the application flow exclusively.
### Create
Private-game owner issues `lobby.invite.create` with `invitee_user_id`.
Pre-conditions:
- game status is `enrollment_open`
- game type is `private`
- the invitee has no active invite or active membership in the game
- `approved_count < max_players + start_gap_players`
On success:
- `Invite` record created with `status=created`
- `expires_at` is set to `enrollment_ends_at` of the game
- `lobby.invite.created` intent published (recipient: invitee)
with payload: `game_id`, `game_name`, `inviter_user_id`, `inviter_name`
`inviter_name` is the owner's race name if already a member of the game;
otherwise it is the owner's `user_id`.
### Redeem
The invited user issues `lobby.invite.redeem` with `race_name`.
Pre-conditions:
- invite status is `created`
- game is `enrollment_open`
- `approved_count < max_players + start_gap_players`
- inviter and invitee both exist and are not permanently blocked in
`User Service`
- Race Name Directory confirms `race_name` is available for the invitee
On success:
- Race Name Directory reserves `race_name` for the invitee
- invite `status` → `redeemed`
- `Membership` record created with `status=active`
- `lobby.invite.redeemed` intent published (recipient: private-game owner)
with payload: `game_id`, `game_name`, `invitee_user_id`, `invitee_name`
- gap window opens automatically if `approved_count` now equals `max_players`
- auto-transition to `ready_to_start` if gap exhaustion condition is immediately met
The synchronous `User Service` check on both inviter and invitee enforces the
rule that an invite from or to a permanently blocked or deleted user behaves
as if it never existed, even before the asynchronous user-lifecycle cascade
has flipped the invite to `revoked`. Cascade-deleted accounts and
`permanent_block` sanctions surface as `subject_not_found`.
### Decline
The invited user issues `lobby.invite.decline`.
Pre-conditions:
- invite status is `created`
On success:
- invite `status` → `declined`
- no notification in v1
Declined users may receive a new invite from the owner while enrollment is open.
### Revoke
Owner issues `lobby.invite.revoke`.
Pre-conditions:
- invite status is `created`
On success:
- invite `status` → `revoked`
- no notification in v1
### Expire
Pending invites (`status=created`) are transitioned to `expired` automatically
when the game moves to `ready_to_start`.
A `lobby.invite.expired` intent is published for each expired invite
(recipient: private-game owner)
with payload: `game_id`, `game_name`, `invitee_user_id`, `invitee_name`.
### Invite state machine
```text
created → redeemed
created → declined
created → revoked
created → expired
```
## Membership Model
### Fields
| Field | Type | Notes |
| --- | --- | --- |
| `membership_id` | string | opaque, stable |
| `game_id` | string | reference to game |
| `user_id` | string | reference to platform user |
| `race_name` | string | confirmed in-game name as submitted (original casing) |
| `canonical_key` | string | canonicalized key under which the RND reservation is held |
| `status` | enum | `active`, `removed`, `blocked` |
| `joined_at` | int64 | UTC Unix milliseconds |
| `removed_at` | int64 | UTC Unix milliseconds; set on remove or block |
### Status vocabulary
| Status | Meaning |
| --- | --- |
| `active` | Full participant; may send commands through `Game Master` |
| `removed` | Permanently removed; engine slot deactivated after game start |
| `blocked` | Platform-level block; engine slot retained but commands blocked |
### Status transition table
| From | To | Trigger |
| --- | --- | --- |
| `active` | `removed` | explicit remove command from admin or owner (post-start) |
| `active` | `blocked` | explicit block command from admin or owner |
`removed` and `blocked` are terminal statuses. Pre-start remove drops the
membership record entirely rather than transitioning to `removed`
(see Removal rules below).
### Removal rules
Before game start:
- remove drops membership and releases the race name reservation
After game start:
- `blocked`: the player cannot send commands; engine keeps the player slot
- `removed`: `Game Lobby` marks membership `removed`; `Game Master` must also
deactivate the player inside the engine; race name reservation remains until
game is finished
This distinction is architectural and must remain explicit in all implementations.
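
The pre-start and post-start branches can be made explicit as a small planning function. The sketch below is hypothetical (`RemovalPlan`, `PlanRemove`, and `PlanBlock` are not names from the codebase); it only encodes the outcomes the rules above describe.

```go
package main

import "fmt"

// RemovalPlan describes what a remove or block command must do.
type RemovalPlan struct {
	DropRecord         bool   // delete the membership record entirely
	NewStatus          string // "" when the record is dropped
	ReleaseReservation bool   // release the race name reservation now
	DeactivateInEngine bool   // Game Master must deactivate the player slot
}

// PlanRemove encodes the remove rules for before and after game start.
func PlanRemove(gameStarted bool) RemovalPlan {
	if !gameStarted {
		// Pre-start: the membership disappears and the name is freed.
		return RemovalPlan{DropRecord: true, ReleaseReservation: true}
	}
	// Post-start: the record stays as removed and the reservation is kept
	// until the game finishes.
	return RemovalPlan{NewStatus: "removed", DeactivateInEngine: true}
}

// PlanBlock encodes the post-start block rule: commands are refused but the
// engine keeps the player slot. The text does not describe a pre-start block,
// so this sketch does not model one.
func PlanBlock() RemovalPlan {
	return RemovalPlan{NewStatus: "blocked"}
}

func main() {
	fmt.Printf("remove before start: %+v\n", PlanRemove(false))
	fmt.Printf("remove after start:  %+v\n", PlanRemove(true))
	fmt.Printf("block after start:   %+v\n", PlanBlock())
}
```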
## Race Name Directory
### Purpose
`Race Name Directory` (RND) is the platform source of truth for all in-game
`race_name` values. It owns three levels of state per name:
- **registered** — permanent user-owned names. Once registered, the name is
unavailable to any other user and cannot be released by the owner; only
`permanent_block` or `DeleteUser` on the owning account frees it.
- **reservation** — a per-game holding created when a participant joins
through application approval or invite redeem. Reservations are keyed by
`(game_id, canonical_key)`. One user may hold the same name in multiple
active games concurrently.
- **pending_registration** — a reservation that survived a capable finish and
is now waiting up to 30 days for the owner to upgrade it into a registered
name via `lobby.race_name.register`. Expiration releases the binding.
`User Service` does not store `race_name` values. It only exposes
`max_registered_race_names` in the eligibility snapshot and publishes
`user.lifecycle.permanent_blocked` / `user.lifecycle.deleted` events.
### Canonical key + confusable-pair policy
Every RND key is derived by
`racename.Canonicalize(raceName) (canonical string, err error)` living in
`lobby/internal/domain/racename/policy.go`:
1. trim and validate the character set via `pkg/util/string.go:ValidateTypeName`;
2. lowercase Unicode fold;
3. apply the frozen confusable-pair replacement map (ported from the former
`user/internal/ports/race_name_policy.go`).
A name is considered taken for the actor when the RND holds at least one
`registered`, active `reservation`, or `pending_registration` whose owner
differs from the actor on the same canonical key.
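
The three canonicalization steps and the "taken for the actor" rule can be illustrated as follows. This is a sketch, not the real policy: the two-entry `confusables` replacer is a made-up stand-in for the frozen confusable-pair map, character-set validation is reduced to a non-empty check, and `strings.ToLower` approximates the Unicode fold.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrInvalidName mirrors the sentinel named in this document.
var ErrInvalidName = errors.New("invalid race name")

// confusables is a hypothetical stand-in for the frozen confusable-pair map.
var confusables = strings.NewReplacer("0", "o", "1", "l")

// Canonicalize trims, lowercases, and applies the confusable replacements.
func Canonicalize(raceName string) (string, error) {
	trimmed := strings.TrimSpace(raceName)
	if trimmed == "" {
		// The real policy validates the full character set here.
		return "", ErrInvalidName
	}
	folded := strings.ToLower(trimmed)
	return confusables.Replace(folded), nil
}

// TakenFor reports whether any binding on the canonical key belongs to a
// user other than the actor. holders flattens registered, reservation, and
// pending_registration owners into one list per key for brevity.
func TakenFor(holders map[string][]string, canonical, actorUserID string) bool {
	for _, owner := range holders[canonical] {
		if owner != actorUserID {
			return true
		}
	}
	return false
}

func main() {
	key, err := Canonicalize("  Zer0 ")
	fmt.Println(key, err)
	holders := map[string][]string{"zero": {"user-a"}}
	fmt.Println("taken for user-b:", TakenFor(holders, key, "user-b"))
}
```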
### Port interface
```go
type RaceNameDirectory interface {
	Canonicalize(raceName string) (canonical string, err error)
	Check(ctx context.Context, raceName, actorUserID string) (Availability, error)
	Reserve(ctx context.Context, gameID, userID, raceName string) error
	ReleaseReservation(ctx context.Context, gameID, userID, raceName string) error
	MarkPendingRegistration(
		ctx context.Context,
		gameID, userID, raceName string,
		eligibleUntil time.Time,
	) error
	ExpirePendingRegistrations(ctx context.Context, now time.Time) ([]ExpiredPending, error)
	Register(ctx context.Context, gameID, userID, raceName string) error
	ListRegistered(ctx context.Context, userID string) ([]RegisteredName, error)
	ListPendingRegistrations(ctx context.Context, userID string) ([]PendingRegistration, error)
	ListReservations(ctx context.Context, userID string) ([]Reservation, error)
	ReleaseAllByUser(ctx context.Context, userID string) error
}

type Availability struct {
	Taken        bool
	HolderUserID string // "" when available
	Kind         string // "registered" | "reservation" | "pending_registration"
}
```
Sentinel errors: `ErrNameTaken`, `ErrInvalidName`, `ErrPendingMissing`,
`ErrPendingExpired`, `ErrQuotaExceeded`.
### v1 backends
- **PostgreSQL** (`lobby/internal/adapters/postgres/racenamedir/directory.go`)
— the production adapter; one row per binding under
`lobby.race_names`, transactional writes guarded by
`pg_advisory_xact_lock(hashtextextended(canonical_key, 0))`. See
`docs/postgres-migration.md` §6B for the full schema and decision
record.
- **In-memory** (`lobby/internal/adapters/racenameinmem/directory.go`) —
in-process implementation used by unit tests that do not need
PostgreSQL and by deployments that select the in-memory backend with
`LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub` (the config token name is
preserved for backward compatibility).
A future dedicated `Race Name Service` replaces the adapter without changing
the domain or service layer.
### Reservation lifecycle and capability
1. `approveapplication` / `redeeminvite` → `Reserve(game_id, user_id,
race_name)`.
2. `removemember` before start → `ReleaseReservation`.
3. `removemember` / `blockmember` after start → reservation kept; resolved at
`game_finished`.
4. On `game_finished` the capability evaluator runs per active membership:
- `capable = max_planets > initial_planets AND max_population >
initial_population`, using the per-game stats aggregate (see §Runtime
Snapshot);
- capable ⇒ `MarkPendingRegistration(..., finished_at + 30 days)` +
`lobby.race_name.registration_eligible`;
- not capable ⇒ `ReleaseReservation` + optional
`lobby.race_name.registration_denied`.
5. The pending-registration worker
(`LOBBY_RACE_NAME_EXPIRATION_INTERVAL`) releases expired entries.
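
The capability rule and the 30-day window from step 4 reduce to two small functions. The sketch assumes a hypothetical `TurnStats` struct for the per-game aggregate and takes `finished_at` as Unix milliseconds, as in the game record.

```go
package main

import (
	"fmt"
	"time"
)

// TurnStats is a hypothetical shape for the per-game per-user aggregate.
type TurnStats struct {
	InitialPlanets    int
	MaxPlanets        int
	InitialPopulation int
	MaxPopulation     int
}

// pendingWindow is the 30-day pending-registration window.
const pendingWindow = 30 * 24 * time.Hour

// Capable applies the rule: both planets and population must have grown
// beyond their initial values at some point during the game.
func Capable(s TurnStats) bool {
	return s.MaxPlanets > s.InitialPlanets && s.MaxPopulation > s.InitialPopulation
}

// EligibleUntil returns finished_at + 30 days for MarkPendingRegistration.
func EligibleUntil(finishedAtMs int64) time.Time {
	return time.UnixMilli(finishedAtMs).Add(pendingWindow)
}

func main() {
	grown := TurnStats{InitialPlanets: 1, MaxPlanets: 3, InitialPopulation: 100, MaxPopulation: 250}
	flat := TurnStats{InitialPlanets: 1, MaxPlanets: 3, InitialPopulation: 100, MaxPopulation: 100}
	fmt.Println("grown capable:", Capable(grown))
	fmt.Println("flat capable: ", Capable(flat))
	fmt.Println("eligible until:", EligibleUntil(0).UTC())
}
```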
### Registration flow
`lobby.race_name.register` → `POST /api/v1/lobby/race-names/register`:
- actor is the authenticated user;
- body: `{race_name, source_game_id}`;
- preconditions:
- `pending_registration` exists for `(source_game_id, user_id, canonical_key)`
with `eligible_until > now`;
- `UserService.GetEligibility` snapshot: no `permanent_block`,
`current_registered_count < max_registered_race_names` (a snapshot value
of `0` denotes unlimited);
- commit: `RND.Register` atomically deletes the pending entry, creates a
registered entry, and publishes `lobby.race_name.registered`.
Errors: `race_name_registration_quota_exceeded`,
`race_name_pending_window_expired`, `subject_not_found`, `forbidden`.
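
The window and quota preconditions, including the "`0` denotes unlimited" convention, can be sketched as below. `RegisterCheck` and `ErrorCode` are hypothetical names, the check order is an assumption, and the `permanent_block` precondition is left out because its error mapping for this endpoint is not stated here.

```go
package main

import "fmt"

// RegisterCheck bundles the inputs for the window and quota preconditions.
type RegisterCheck struct {
	NowMs             int64
	EligibleUntilMs   int64
	CurrentRegistered int
	MaxRegistered     int // 0 denotes unlimited
}

// ErrorCode returns the error code for a failed precondition, or "" when
// both preconditions hold.
func (c RegisterCheck) ErrorCode() string {
	// The pending entry must satisfy eligible_until > now.
	if c.EligibleUntilMs <= c.NowMs {
		return "race_name_pending_window_expired"
	}
	// A zero quota is unlimited; otherwise the count must stay below it.
	if c.MaxRegistered != 0 && c.CurrentRegistered >= c.MaxRegistered {
		return "race_name_registration_quota_exceeded"
	}
	return ""
}

func main() {
	check := RegisterCheck{NowMs: 1000, EligibleUntilMs: 2000, CurrentRegistered: 3, MaxRegistered: 3}
	fmt.Printf("error code: %q\n", check.ErrorCode())
}
```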
### Self-service reads
`lobby.race_names.list` → `GET /api/v1/lobby/my/race-names` returns the
acting user's `{registered[], pending[], reservations[]}` using the
`user_registered` / `user_reservations` indexes (no full scan).
The response shape is fixed by `api/public-openapi.yaml` and carries:
- `registered[]`: `canonical_key`, `race_name`, `source_game_id`,
`registered_at_ms`;
- `pending[]`: `canonical_key`, `race_name`, `source_game_id` (the
game whose capable finish promoted the reservation),
`reserved_at_ms`, `eligible_until_ms`;
- `reservations[]`: `canonical_key`, `race_name`, `game_id`,
`reserved_at_ms`, `game_status` (current `game.Status` of the
hosting game, joined on read).
Each slice is sorted ascending by its time field with `canonical_key`
as the tie-breaker so the wire output is stable. The endpoint is
exclusively self-service: there is no `?user_id=` parameter and no
admin counterpart on the internal port. Visibility is enforced by the
`X-User-ID` header alone.
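
The stable ordering described above (time field ascending, `canonical_key` as tie-breaker) can be sketched with `sort.Slice`. The `Reservation` struct here is a trimmed, hypothetical version of the response item.

```go
package main

import (
	"fmt"
	"sort"
)

// Reservation is a trimmed, illustrative view of one reservations[] item.
type Reservation struct {
	CanonicalKey string
	ReservedAtMs int64
}

// SortReservations orders items by reserved_at_ms ascending and breaks ties
// by canonical_key so repeated reads produce identical output.
func SortReservations(items []Reservation) {
	sort.Slice(items, func(i, j int) bool {
		if items[i].ReservedAtMs != items[j].ReservedAtMs {
			return items[i].ReservedAtMs < items[j].ReservedAtMs
		}
		return items[i].CanonicalKey < items[j].CanonicalKey
	})
}

func main() {
	items := []Reservation{
		{CanonicalKey: "bravo", ReservedAtMs: 10},
		{CanonicalKey: "alpha", ReservedAtMs: 10},
		{CanonicalKey: "charlie", ReservedAtMs: 5},
	}
	SortReservations(items)
	for _, it := range items {
		fmt.Println(it.ReservedAtMs, it.CanonicalKey)
	}
}
```

Because the comparison is a total order whenever keys are distinct, `sort.Slice` is sufficient and a stable sort is not required.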
### Cascade release
`Game Lobby` consumes `user:lifecycle_events` through a dedicated worker. On
`user.lifecycle.permanent_blocked` or `user.lifecycle.deleted`:
- `RND.ReleaseAllByUser(user_id)` clears every registered, reservation, and
pending entry owned by the user;
- every active membership held by the user transitions to `blocked`. For each
such membership in a third-party private game, a `lobby.membership.blocked`
intent is published to the game owner;
- every outstanding `submitted` application authored by the user is rejected;
- every `created` invite where the user is invitee or inviter transitions to
`revoked`;
- every non-terminal game owned by the user transitions to `cancelled` via
the `external_block` trigger. For in-flight games (`starting`, `running`,
`paused`) a stop-job is published to Runtime Manager before the status
transition.
Synchronous guard: `lobby.invite.redeem` calls `UserService.GetEligibility`
for both the inviter and the invitee. If either party has been permanently
blocked or soft-deleted, the redeem fails with `subject_not_found`, matching
the "as if the invite never existed" semantic even before the cascade
flips the invite to `revoked`.
### Retry and release semantics
- `Reserve` is idempotent for the same holder under the same game. A second
call returns no error so that `approveapplication` and `redeeminvite`
retries after transient upstream failures stay safe.
- `ReleaseReservation` is a no-op when no reservation exists for the tuple
and also when the reservation belongs to a different user. Defensive
release paths (`rejectapplication`, `revokeinvite`, `declineinvite`) never
surface an error.
- `Register` is idempotent only for the same `(game_id, user_id, race_name)`
tuple — repeated calls after success return the same registered record
without consuming additional quota.
- `MarkPendingRegistration` is idempotent when called with the same
`eligible_until`; re-emitting it with a different timestamp returns
`ErrInvalidName`.
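
The `Reserve` and `ReleaseReservation` semantics can be illustrated with a tiny in-memory map. This is a simplified sketch, not the adapter in `racenameinmem`: it tracks reservations only (no registered or pending entries), takes already-canonical keys, and has no locking.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNameTaken mirrors the sentinel named in this document.
var ErrNameTaken = errors.New("race name taken")

// slot identifies one per-game reservation.
type slot struct{ gameID, canonicalKey string }

// Directory is a reservation-only, single-goroutine sketch.
type Directory struct{ holders map[slot]string }

// NewDirectory returns an empty directory.
func NewDirectory() *Directory { return &Directory{holders: map[slot]string{}} }

// Reserve is idempotent for the same holder and game, lets one user hold the
// same name in several games, and rejects a name held by another user.
func (d *Directory) Reserve(gameID, userID, canonicalKey string) error {
	for s, holder := range d.holders {
		if s.canonicalKey == canonicalKey && holder != userID {
			return ErrNameTaken
		}
	}
	d.holders[slot{gameID, canonicalKey}] = userID
	return nil
}

// ReleaseReservation is a no-op when nothing is reserved for the tuple or
// when the reservation belongs to a different user.
func (d *Directory) ReleaseReservation(gameID, userID, canonicalKey string) {
	s := slot{gameID, canonicalKey}
	if holder, ok := d.holders[s]; ok && holder == userID {
		delete(d.holders, s)
	}
}

func main() {
	d := NewDirectory()
	fmt.Println("first reserve: ", d.Reserve("game-1", "user-a", "zero"))
	fmt.Println("retry reserve: ", d.Reserve("game-1", "user-a", "zero"))
	fmt.Println("other user:    ", d.Reserve("game-2", "user-b", "zero"))
	d.ReleaseReservation("game-1", "user-b", "zero") // wrong owner: no-op
	fmt.Println("entries left:  ", len(d.holders))
}
```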
## Game Start Flow
The start sequence spans three services and must be treated as a distributed
transaction with explicit failure handling.
```mermaid
sequenceDiagram
participant Admin as Admin or Private Owner
participant Lobby
participant Runtime
participant GM as Game Master
participant Redis
Admin->>Lobby: lobby.game.start
Lobby->>Lobby: validate ready_to_start + roster
Lobby->>Lobby: status → starting
Lobby->>Redis: publish start job to runtime:start_jobs
Runtime->>Runtime: start container
Runtime->>Redis: publish result to runtime:job_results
alt container start failed
Lobby->>Lobby: status → start_failed
else container started
Lobby->>Lobby: persist runtime binding
Lobby->>GM: POST /internal/games/{game_id}/register (sync)
alt GM registration success
GM-->>Lobby: 200 OK
Lobby->>Lobby: status → running; set started_at
else GM unavailable
GM-->>Lobby: error / timeout
Lobby->>Lobby: status → paused
Lobby->>Redis: publish lobby.runtime_paused_after_start intent
end
end
```
### Critical invariants
- If the container starts but `Lobby` cannot persist the runtime binding metadata,
the start is a full failure: `Lobby` must issue a stop job to `Runtime Manager`
with `reason=orphan_cleanup` before setting `start_failed`.
- If metadata is persisted but `Game Master` is unavailable, the game must be
placed in `paused`, not in `start_failed`. The container is alive; only the
platform tracking is incomplete.
- No start job is accepted while the game is not in `ready_to_start`.
- Concurrent start attempts for the same game must be serialized; the second
attempt must fail if the first already moved the game to `starting`.
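
The failure-handling invariants can be summarized as a pure decision function over the three steps of the start sequence. The types and names below are hypothetical; the sketch only maps step results to the resulting status and side effects described above.

```go
package main

import "fmt"

// StartResult records how far the start sequence got.
type StartResult struct {
	ContainerStarted bool
	BindingPersisted bool
	GMRegistered     bool
}

// Outcome is what Lobby must do after observing a StartResult.
type Outcome struct {
	Status              string
	StopReason          string // non-empty when a stop job must be published
	PublishPausedIntent bool   // lobby.runtime_paused_after_start
}

// Decide maps a start result to the next status and required side effects.
func Decide(r StartResult) Outcome {
	switch {
	case !r.ContainerStarted:
		return Outcome{Status: "start_failed"}
	case !r.BindingPersisted:
		// The container is alive but untracked: clean it up first.
		return Outcome{Status: "start_failed", StopReason: "orphan_cleanup"}
	case !r.GMRegistered:
		// The container is alive and tracked; only GM registration is missing.
		return Outcome{Status: "paused", PublishPausedIntent: true}
	default:
		return Outcome{Status: "running"}
	}
}

func main() {
	fmt.Printf("%+v\n", Decide(StartResult{ContainerStarted: true}))
	fmt.Printf("%+v\n", Decide(StartResult{ContainerStarted: true, BindingPersisted: true}))
	fmt.Printf("%+v\n", Decide(StartResult{ContainerStarted: true, BindingPersisted: true, GMRegistered: true}))
}
```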
### Runtime Manager envelopes
`Lobby` is the producer for both `runtime:start_jobs` and `runtime:stop_jobs`.
The `Lobby ↔ Runtime Manager` transport stays asynchronous indefinitely; there
is no synchronous Lobby→RTM REST call in v1 or planned for v2.
`runtime:start_jobs` envelope:
| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | Lobby `game_id`. |
| `image_ref` | string | Docker reference resolved from `target_engine_version` via `LOBBY_ENGINE_IMAGE_TEMPLATE`. |
| `requested_at_ms` | int64 | UTC milliseconds; diagnostics only. |
`runtime:stop_jobs` envelope:
| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | |
| `reason` | enum | `orphan_cleanup`, `cancelled`, `finished`, `admin_request`, `timeout`. |
| `requested_at_ms` | int64 | UTC milliseconds. |
`reason` semantics (Lobby producer side):
- `orphan_cleanup` — used by Lobby's runtime-job-result consumer to release a
container whose metadata persistence failed after a successful container
start.
- `cancelled` — used by the user-lifecycle cascade and by explicit cancel paths
for in-flight games.
- `finished` — reserved; not produced by Lobby in v1 because `game_finished`
is engine-driven and stop jobs after finish are an Admin/GM concern.
- `admin_request` — reserved for future admin-initiated stop paths through
Lobby; not produced in v1.
- `timeout` — reserved for future enrollment-timeout-driven stop paths; not
produced in v1.
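
The two envelopes and the `reason` enum might be represented in Go roughly as below. This is a sketch under assumptions: the struct names, the `Validate` signature, and JSON as the on-stream encoding are all hypothetical; only the field names and enum values come from the tables above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StopReason mirrors the reason enum on runtime:stop_jobs.
type StopReason string

const (
	StopReasonOrphanCleanup StopReason = "orphan_cleanup"
	StopReasonCancelled     StopReason = "cancelled"
	StopReasonFinished      StopReason = "finished"
	StopReasonAdminRequest  StopReason = "admin_request"
	StopReasonTimeout       StopReason = "timeout"
)

// Validate rejects values outside the documented enum.
func (r StopReason) Validate() error {
	switch r {
	case StopReasonOrphanCleanup, StopReasonCancelled, StopReasonFinished,
		StopReasonAdminRequest, StopReasonTimeout:
		return nil
	}
	return fmt.Errorf("unknown stop reason %q", string(r))
}

// StartJob is the runtime:start_jobs envelope.
type StartJob struct {
	GameID        string `json:"game_id"`
	ImageRef      string `json:"image_ref"`
	RequestedAtMs int64  `json:"requested_at_ms"`
}

// StopJob is the runtime:stop_jobs envelope.
type StopJob struct {
	GameID        string     `json:"game_id"`
	Reason        StopReason `json:"reason"`
	RequestedAtMs int64      `json:"requested_at_ms"`
}

// EncodeStopJob validates the reason and serializes the envelope as JSON.
func EncodeStopJob(job StopJob) (string, error) {
	if err := job.Reason.Validate(); err != nil {
		return "", err
	}
	raw, err := json.Marshal(job)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	out, err := EncodeStopJob(StopJob{GameID: "game-example", Reason: StopReasonOrphanCleanup, RequestedAtMs: 1})
	fmt.Println(out, err)
}
```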
### Design rationale: StopReason placement
The `StopReason` enum is declared in
`lobby/internal/ports/runtimemanager.go` alongside the `RuntimeManager`
interface that consumes it. The enum is publisher-side protocol: it
mirrors the AsyncAPI discriminator on `runtime:stop_jobs`, has no
behaviour beyond `Validate`, and co-locating it with the interface keeps
the AsyncAPI ↔ Go mapping visible in one file.
Alternatives considered and rejected:
- a dedicated `lobby/internal/domain/runtimejob` package — manufactures
a domain layer for a single string enum that exists only to be
serialised onto a Redis Stream;
- placing the enum in the publisher adapter package
(`lobby/internal/adapters/runtimemanager`) — the callers (start-game
service, runtime-job-result worker, user-lifecycle worker) live
outside that package and would have to depend on a concrete adapter
for an enum value.
### Design rationale: `engineimage.Resolver` validates the template at construction
`engineimage.Resolver` stores the validated template; the per-game
`Resolve(version)` call is therefore a pure string substitution that
cannot fail except on an empty `version`.
`LOBBY_ENGINE_IMAGE_TEMPLATE` is loaded at startup. A malformed value
(missing `{engine_version}` placeholder, empty string) is an
operational misconfiguration that fails fast before any traffic arrives
— not on the first start-game request hours later. The synchronous
start handler then incurs no per-call template-shape recheck.
A stateless free function `engineimage.Resolve(template, version)` was
rejected: the only useful checkpoint for the template literal is at
startup; a free function would either re-validate on every call (waste)
or skip validation (regression).
The resolver only guards against an empty/whitespace `version`. Semver
validation lives in `lobby/internal/domain/game/model.go:validateSemver`
and runs at game-record construction time. Re-running it inside the
resolver would either duplicate the rule (drift risk) or import the
validator across package boundaries for no behavioural gain. Keeping the
resolver narrow leaves it reusable from a future producer (for example
`Game Master`, when it takes over `image_ref` resolution) without
dragging Lobby's domain rules along.
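The validate-once, substitute-many split described above can be sketched as follows. This is an illustration under assumptions, not the actual `engineimage` package: the constructor name `NewResolver` and the error messages are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// placeholder is the literal token the template must contain.
const placeholder = "{engine_version}"

// Resolver holds a template that was validated at construction.
type Resolver struct {
	template string
}

// NewResolver fails fast on an empty template or a missing placeholder.
func NewResolver(template string) (*Resolver, error) {
	if strings.TrimSpace(template) == "" {
		return nil, errors.New("engine image template is empty")
	}
	if !strings.Contains(template, placeholder) {
		return nil, fmt.Errorf("engine image template %q lacks %s", template, placeholder)
	}
	return &Resolver{template: template}, nil
}

// Resolve is a plain substitution; the only failure mode is a blank version.
// Semver validation is deliberately not repeated here.
func (r *Resolver) Resolve(version string) (string, error) {
	if strings.TrimSpace(version) == "" {
		return "", errors.New("engine version is empty")
	}
	return strings.ReplaceAll(r.template, placeholder, version), nil
}

func main() {
	r, err := NewResolver("galaxy/game:{engine_version}")
	if err != nil {
		panic(err) // startup misconfiguration
	}
	ref, err := r.Resolve("1.4.2")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref)
}
```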
The defensive error return wrapped as `start game: resolve image ref: %w` in
`startgame.Service.Handle` guards against a future invariant
violation. The service-level test suite does not exercise it because
the only resolver-failure mode (empty `version`) requires bypassing
`game.Validate`, which `gameinmem.Save` always runs. Adding test
scaffolding to skip validation would teach the test suite a back door
that the production code path does not have.
## Paused State
`Lobby.paused` is a platform-level pause, distinct from `Game Master` runtime
failure states. Two paths lead to `paused`:
### Voluntary pause
Admin or owner issues `lobby.game.pause` while the game is `running`.
Resume is issued with `lobby.game.resume`; `Lobby` performs a synchronous
liveness check against `Game Master` before transitioning back to `running`.
### Forced pause (GM unavailable after start)
If the game start sequence succeeds at the runtime layer but `Game Master`
registration fails, `Lobby` transitions to `paused` and publishes
`lobby.runtime_paused_after_start` to administrators.
Administrators investigate, restore `Game Master`, and issue `lobby.game.resume`
through the internal admin surface.
## Game Finish Flow
`Game Master` publishes a `game_finished` event to the GM events Redis Stream
when the engine reports that the game has ended.
`Lobby` consumes this event and, before advancing the stream offset:
- transitions game status to `finished`
- sets `finished_at` to the event timestamp
- updates the denormalized runtime snapshot with the final values
- runs the capability evaluator against every `active` membership:
  - `capable = max_planets > initial_planets AND max_population >
    initial_population` from the per-member stats aggregate
  - capable ⇒ `RND.MarkPendingRegistration(game_id, user_id, race_name,
    finished_at + 30 days)` and publish
    `lobby.race_name.registration_eligible`
  - not capable ⇒ `RND.ReleaseReservation(game_id, user_id, race_name)` and
    (optional) publish `lobby.race_name.registration_denied`
- resolves outstanding reservations on `removed` and `blocked` memberships by
calling `RND.ReleaseReservation` (post-start remove/block keeps the
reservation alive specifically so capability evaluation resolves it here)
- deletes the per-game stats aggregate
The `game_finished` event from `Game Master` is the sole trigger for the
`finished` status. `Lobby` does not independently decide that a game is
finished. Capability evaluation must be idempotent: a replayed
`game_finished` event must not produce additional RND side effects or
notifications.
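The capability predicate and the replay guard can be sketched as below. This is a simplified in-memory illustration: `Stats`, `Guard`, and `Evaluate` are hypothetical names, the integer field types are an assumption, and the map-backed guard merely stands in for whatever durable marker the service uses.

```go
package main

import "fmt"

// Stats holds the per-member values the capability rule reads.
type Stats struct {
	InitialPlanets, MaxPlanets       int64
	InitialPopulation, MaxPopulation int64
}

// Capable implements: max_planets > initial_planets AND
// max_population > initial_population.
func Capable(s Stats) bool {
	return s.MaxPlanets > s.InitialPlanets && s.MaxPopulation > s.InitialPopulation
}

// Guard remembers which games were already evaluated (in-memory stand-in).
type Guard struct {
	done map[string]bool
}

func NewGuard() *Guard { return &Guard{done: make(map[string]bool)} }

// Evaluate returns the per-user verdicts on the first call for a game and
// nil on any replay, so a caller can skip RND calls and notifications.
func Evaluate(g *Guard, gameID string, members map[string]Stats) map[string]bool {
	if g.done[gameID] {
		return nil // replayed game_finished: no further side effects
	}
	verdicts := make(map[string]bool, len(members))
	for userID, s := range members {
		verdicts[userID] = Capable(s)
	}
	g.done[gameID] = true
	return verdicts
}

func main() {
	g := NewGuard()
	members := map[string]Stats{
		"user-1": {InitialPlanets: 1, MaxPlanets: 4, InitialPopulation: 100, MaxPopulation: 300},
		"user-2": {InitialPlanets: 1, MaxPlanets: 4, InitialPopulation: 100, MaxPopulation: 100},
	}
	fmt.Println(Evaluate(g, "game-1", members))
	fmt.Println(Evaluate(g, "game-1", members) == nil) // replay is a no-op
}
```

In a real consumer the guard write and the side effects would have to be ordered so that a crash between them stays safe to retry; the sketch leaves that out.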
## Runtime Snapshot
`Game Lobby` stores a denormalized runtime snapshot on the game record to
prevent fan-out reads to `Game Master` on every user-facing list or detail
request, and aggregates per-member stats to support capability evaluation at
game finish.
### Denormalized snapshot fields
| Field | Source |
| --- | --- |
| `current_turn` | GM event `runtime_snapshot_update` |
| `runtime_status` | GM event `runtime_snapshot_update` |
| `engine_health_summary` | GM event `runtime_snapshot_update` |
### Per-member stats aggregate
Each `runtime_snapshot_update` carries a `player_turn_stats` array with one
entry per active member: `{user_id, planets, population, ships_built}`.
`Lobby` aggregates these in `lobby:game_turn_stats:<game_id>:<user_id>` with
the shape
`{initial_planets, initial_population, initial_ships_built, max_planets,
max_population, max_ships_built}`.
Rules:
- `initial_*` values are frozen from the first event after
`starting → running`; later events must not change them.
- `max_*` values are maintained by max-semantic update; they never decrease.
- the aggregate is read once by the capability evaluator at `game_finished`
and then deleted.
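The two update rules above (frozen `initial_*`, non-decreasing `max_*`) amount to a small fold over incoming samples. The sketch below assumes integer counters and uses hypothetical names (`Sample`, `Aggregate`, `Apply`); a `nil` previous aggregate stands for "first event after `starting → running`".

```go
package main

import "fmt"

// Sample is one player_turn_stats entry without the user_id.
type Sample struct {
	Planets, Population, ShipsBuilt int64
}

// Aggregate mirrors the documented aggregate shape.
type Aggregate struct {
	InitialPlanets, InitialPopulation, InitialShipsBuilt int64
	MaxPlanets, MaxPopulation, MaxShipsBuilt             int64
}

func maxInt64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// Apply folds a sample into the aggregate. With prev == nil the sample
// seeds both initial_* and max_*; afterwards only max_* can move, and
// only upwards.
func Apply(prev *Aggregate, s Sample) Aggregate {
	if prev == nil {
		return Aggregate{
			InitialPlanets: s.Planets, InitialPopulation: s.Population, InitialShipsBuilt: s.ShipsBuilt,
			MaxPlanets: s.Planets, MaxPopulation: s.Population, MaxShipsBuilt: s.ShipsBuilt,
		}
	}
	next := *prev // initial_* fields are copied unchanged
	next.MaxPlanets = maxInt64(prev.MaxPlanets, s.Planets)
	next.MaxPopulation = maxInt64(prev.MaxPopulation, s.Population)
	next.MaxShipsBuilt = maxInt64(prev.MaxShipsBuilt, s.ShipsBuilt)
	return next
}

func main() {
	a := Apply(nil, Sample{Planets: 2, Population: 100})
	b := Apply(&a, Sample{Planets: 5, Population: 80, ShipsBuilt: 3})
	fmt.Printf("%+v\n", b)
}
```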
### Update mechanism
`Game Master` publishes events to a dedicated Redis Stream consumed by `Lobby`:
- `runtime_snapshot_update`: carries updated `current_turn`, `runtime_status`,
`engine_health_summary`, and `player_turn_stats`; `Lobby` applies a
compare-and-swap update on the game record plus a stats aggregate upsert.
- `game_finished`: carries final snapshot values and signals the finish
transition; capability evaluator (see §Game Finish Flow) runs before the
stream offset is advanced.
`Lobby` does not expose the runtime snapshot update as an internal HTTP
endpoint. All snapshot updates are asynchronous and delivered through the
stream.
## Public vs Private Game Rules
### Public games
- created and controlled by system administrators through the internal admin surface
- visible in the public game list when in `enrollment_open`, `ready_to_start`,
`running`, or `finished` status
- `draft` public games are not visible to non-admin users
- players join through the application flow; admission requires admin approval
- turn schedule and engine version are set by the administrator
### Private games
- created only by eligible paid users whose `User Service` eligibility snapshot
carries `can_create_private_game=true` and whose `max_owned_private_games`
limit allows it
- visible only to the owner and to users who have an active membership or a
non-expired invite
- `draft` private games are visible only to the owner
- players join through the invite flow; invite redemption creates active
membership immediately without further owner approval
- owner manages invites, turn schedule, and engine version
## Owner-Admin Capabilities
Private-game owners have a limited owner-admin capability set over their own
games only:
- open enrollment (`draft` → `enrollment_open`)
- create and revoke invites
- manually close enrollment (`enrollment_open` → `ready_to_start`)
- start the game (`ready_to_start` → `starting`)
- pause and resume the game (`running` ↔ `paused`)
- retry start or cancel after `start_failed`
- remove or block members
- cancel the game (from `draft`, `enrollment_open`, `ready_to_start`, `start_failed`)
Owners do not have system-admin power.
They cannot see or operate on other users' private games.
They cannot approve or reject applications (applications are public-game only).
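The owner capability list can be read as a table from command to allowed source statuses. The sketch below encodes only what this section states; the table and function names are hypothetical, and the target status of `retry_start` is intentionally not modelled.

```go
package main

import "fmt"

// ownerActionSources maps an owner command to the game statuses it may be
// issued from, as listed in this section.
var ownerActionSources = map[string][]string{
	"lobby.game.open_enrollment": {"draft"},
	"lobby.game.ready_to_start":  {"enrollment_open"},
	"lobby.game.start":           {"ready_to_start"},
	"lobby.game.pause":           {"running"},
	"lobby.game.resume":          {"paused"},
	"lobby.game.retry_start":     {"start_failed"},
	"lobby.game.cancel":          {"draft", "enrollment_open", "ready_to_start", "start_failed"},
}

// OwnerActionAllowed reports whether the command may run from the given
// status. A false result would surface as the `conflict` error code.
func OwnerActionAllowed(action, status string) bool {
	for _, s := range ownerActionSources[action] {
		if s == status {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(OwnerActionAllowed("lobby.game.start", "ready_to_start"))
	fmt.Println(OwnerActionAllowed("lobby.game.cancel", "running"))
}
```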
## Trusted Surfaces
### Public authenticated REST (gateway-facing)
All user-facing commands arrive through `Edge Gateway`.
Gateway verifies the authenticated session, transcodes the FlatBuffers command
to a trusted REST call, and forwards it to `Lobby` on the public port.
Gateway enriches each request with the authenticated `user_id` via the
`X-User-ID` header.
`Lobby` must never derive the acting user from the request payload.
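A handler wrapper that takes the acting user only from `X-User-ID` might look like the sketch below. It uses the standard `net/http` package; `WithActingUser`, `ActingUser`, and `probe` are hypothetical names, and answering a missing header with a plain-text `400` is an assumption rather than the documented behaviour.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type actingUserKey struct{}

// WithActingUser reads the gateway-supplied X-User-ID header and stores it
// in the request context. The request body is never consulted.
func WithActingUser(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		userID := strings.TrimSpace(r.Header.Get("X-User-ID"))
		if userID == "" {
			http.Error(w, "missing X-User-ID", http.StatusBadRequest) // assumed status
			return
		}
		ctx := context.WithValue(r.Context(), actingUserKey{}, userID)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// ActingUser returns the user stored by WithActingUser.
func ActingUser(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(actingUserKey{}).(string)
	return id, ok
}

// probe runs one request through the middleware and reports the status
// code and the user the inner handler observed.
func probe(header string) (int, string) {
	var seen string
	h := WithActingUser(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen, _ = ActingUser(r.Context())
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodPost, "/api/v1/lobby/games", nil)
	if header != "" {
		req.Header.Set("X-User-ID", header)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, seen
}

func main() {
	status, user := probe("user-42")
	fmt.Println(status, user)
}
```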
#### Message type catalog
| `message_type` | Method | Path | Actor |
| --- | --- | --- | --- |
| `lobby.game.create` | `POST` | `/api/v1/lobby/games` | admin (public), eligible user (private) |
| `lobby.game.update` | `PATCH` | `/api/v1/lobby/games/{game_id}` | admin or owner; draft only |
| `lobby.game.get` | `GET` | `/api/v1/lobby/games/{game_id}` | any authenticated user (visibility rules apply) |
| `lobby.games.list` | `GET` | `/api/v1/lobby/games` | any authenticated user |
| `lobby.game.open_enrollment` | `POST` | `/api/v1/lobby/games/{game_id}/open-enrollment` | admin or owner |
| `lobby.game.ready_to_start` | `POST` | `/api/v1/lobby/games/{game_id}/ready-to-start` | admin or owner |
| `lobby.game.start` | `POST` | `/api/v1/lobby/games/{game_id}/start` | admin or owner |
| `lobby.game.pause` | `POST` | `/api/v1/lobby/games/{game_id}/pause` | admin or owner |
| `lobby.game.resume` | `POST` | `/api/v1/lobby/games/{game_id}/resume` | admin or owner |
| `lobby.game.cancel` | `POST` | `/api/v1/lobby/games/{game_id}/cancel` | admin or owner |
| `lobby.game.retry_start` | `POST` | `/api/v1/lobby/games/{game_id}/retry-start` | admin or owner |
| `lobby.application.submit` | `POST` | `/api/v1/lobby/games/{game_id}/applications` | authenticated user |
| `lobby.application.approve` | `POST` | `/api/v1/lobby/games/{game_id}/applications/{application_id}/approve` | admin |
| `lobby.application.reject` | `POST` | `/api/v1/lobby/games/{game_id}/applications/{application_id}/reject` | admin |
| `lobby.invite.create` | `POST` | `/api/v1/lobby/games/{game_id}/invites` | private-game owner |
| `lobby.invite.redeem` | `POST` | `/api/v1/lobby/games/{game_id}/invites/{invite_id}/redeem` | invited user |
| `lobby.invite.decline` | `POST` | `/api/v1/lobby/games/{game_id}/invites/{invite_id}/decline` | invited user |
| `lobby.invite.revoke` | `POST` | `/api/v1/lobby/games/{game_id}/invites/{invite_id}/revoke` | private-game owner |
| `lobby.membership.remove` | `POST` | `/api/v1/lobby/games/{game_id}/memberships/{membership_id}/remove` | admin or owner |
| `lobby.membership.block` | `POST` | `/api/v1/lobby/games/{game_id}/memberships/{membership_id}/block` | admin or owner |
| `lobby.memberships.list` | `GET` | `/api/v1/lobby/games/{game_id}/memberships` | admin, owner, or active member |
| `lobby.my_games.list` | `GET` | `/api/v1/lobby/my/games` | authenticated user |
| `lobby.my_applications.list` | `GET` | `/api/v1/lobby/my/applications` | authenticated user |
| `lobby.my_invites.list` | `GET` | `/api/v1/lobby/my/invites` | authenticated user |
| `lobby.race_name.register` | `POST` | `/api/v1/lobby/race-names/register` | authenticated user |
| `lobby.race_names.list` | `GET` | `/api/v1/lobby/my/race-names` | authenticated user |
### Internal trusted REST (internal-facing)
The internal port is not reachable from the public internet.
It is used by `Game Master` for synchronous game and membership reads and by the
administrative backend for admin-only operations.
Key internal endpoints:
| Method | Path | Purpose |
| --- | --- | --- |
| `GET` | `/api/v1/internal/games/{game_id}` | game detail read for GM/admin |
| `GET` | `/api/v1/internal/games/{game_id}/memberships` | full membership list for GM |
| `GET` | `/api/v1/internal/healthz` | health probe |
| `GET` | `/api/v1/internal/readyz` | readiness probe |
Note: the registration call from Lobby to Game Master after a successful
container start is **outgoing** — Lobby calls
`POST /api/v1/internal/games/{game_id}/register-runtime` on Game Master's
internal port. Lobby does not expose an inbound `register-runtime`
endpoint.
Admin-only operations (approve, reject, cancel, create public games, etc.) are
also exposed on the internal port and are intended to be called by `Admin Service`
after it enforces the system-admin role check at the gateway boundary.
## User-Facing Lists
### My active games
Returns games where the authenticated user has an active membership and the game
status is `running` or `paused`.
Response includes the denormalized runtime snapshot.
### My pending applications
Returns applications submitted by the authenticated user with status `submitted`.
Includes game name and type for display.
### My open invitations
Returns invites addressed to the authenticated user with status `created`.
Includes game name, inviter name, and `expires_at`.
### Public game list
Paginated list of public games with status in
`enrollment_open`, `ready_to_start`, `running`, or `finished`.
Games in `draft` or `cancelled` are excluded.
Default order: `enrollment_open` and `ready_to_start` first, then `running`, then
`finished` (most recent first within each group).
### Visibility rules
- private `draft` games: visible only to the owner
- private non-draft games: visible only to the owner and users with active
membership or non-expired invite
- public `draft` games: visible only to system administrators
- public non-draft games: visible in the public list
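The four rules above reduce to a short predicate. The sketch below is illustrative only: `Game`, `Viewer`, and `CanView` are hypothetical types, and cases this section does not cover (administrator access to private games, the `cancelled` exclusion in the public list) are left out.

```go
package main

import "fmt"

// Game carries only the attributes the visibility rules depend on.
type Game struct {
	Public  bool
	Status  string
	OwnerID string
}

// Viewer describes the caller relative to one game.
type Viewer struct {
	UserID       string
	IsAdmin      bool
	ActiveMember bool
	LiveInvite   bool // has a non-expired invite
}

// CanView applies the four documented visibility rules.
func CanView(g Game, v Viewer) bool {
	if g.Public {
		if g.Status == "draft" {
			return v.IsAdmin // public drafts: system administrators only
		}
		return true // public non-draft games are listed publicly
	}
	if v.UserID == g.OwnerID {
		return true // the owner sees their private game in every status
	}
	if g.Status == "draft" {
		return false // private drafts: owner only
	}
	return v.ActiveMember || v.LiveInvite
}

func main() {
	private := Game{Public: false, Status: "enrollment_open", OwnerID: "owner-1"}
	fmt.Println(CanView(private, Viewer{UserID: "user-2", LiveInvite: true}))
	fmt.Println(CanView(private, Viewer{UserID: "user-3"}))
}
```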
## Notification Contracts
`Game Lobby` publishes normalized notification intents to `notification:intents`
using the `galaxy/notificationintent` producer module.
| Trigger | `notification_type` | Audience | Channels |
| --- | --- | --- | --- |
| Application submitted (public game) | `lobby.application.submitted` | configured admin email list | `email` |
| Application approved | `lobby.membership.approved` | applicant user | `push+email` |
| Application rejected | `lobby.membership.rejected` | applicant user | `push+email` |
| Cascade membership block (`permanent_block`/`DeleteUser`) | `lobby.membership.blocked` | private-game owner | `push+email` |
| Invite created (private game) | `lobby.invite.created` | invited user | `push+email` |
| Invite redeemed (private game) | `lobby.invite.redeemed` | private-game owner | `push+email` |
| Invite expired (on enrollment close) | `lobby.invite.expired` | private-game owner | `email` |
| GM unavailable after start (forced pause) | `lobby.runtime_paused_after_start` | configured admin email list | `email` |
| Race name eligible for registration | `lobby.race_name.registration_eligible` | capable member | `push+email` |
| Race name successfully registered | `lobby.race_name.registered` | registering user | `push+email` |
| Race name registration denied (capability) | `lobby.race_name.registration_denied` | incapable member | `email` |
Rules:
- intents carry explicit `recipient_user_id` values; `Lobby` resolves recipients
before publishing rather than delegating audience resolution to `Notification Service`
- a failed intent publication is a notification degradation and must not roll back
already committed business state
- `lobby.invite.revoked` and `lobby.invite.declined` produce no notification in v1
- `lobby.application.submitted` is published only for public games; the private-game
owner-targeting path defined in the notification catalog is reserved for future use
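The "publish failure must not roll back committed state" rule can be sketched as a commit-then-notify helper. All names below (`Intent`, `Publisher`, `CommitThenNotify`, `failingPublisher`) are hypothetical and do not reflect the `galaxy/notificationintent` API.

```go
package main

import (
	"errors"
	"log"
)

// Intent is a simplified notification intent with pre-resolved recipients.
type Intent struct {
	Type             string
	RecipientUserIDs []string
}

// Publisher abstracts the intent stream producer.
type Publisher interface {
	Publish(Intent) error
}

// CommitThenNotify commits business state first. A publish failure is
// logged as a degradation and does not change the returned result.
func CommitThenNotify(commit func() error, pub Publisher, intent Intent, logf func(string, ...any)) error {
	if err := commit(); err != nil {
		return err // nothing committed, so nothing to notify about
	}
	if err := pub.Publish(intent); err != nil {
		logf("notification degraded: type=%s err=%v", intent.Type, err)
	}
	return nil
}

// failingPublisher simulates an unavailable intent stream.
type failingPublisher struct{}

func (failingPublisher) Publish(Intent) error { return errors.New("stream unavailable") }

func main() {
	intent := Intent{Type: "lobby.membership.approved", RecipientUserIDs: []string{"user-42"}}
	err := CommitThenNotify(func() error { return nil }, failingPublisher{}, intent, log.Printf)
	log.Printf("operation result: %v", err)
}
```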
## Domain Events
`Game Lobby` publishes auxiliary post-commit domain events to the Redis stream
configured for lobby domain events.
Frozen event types:
- `lobby.game.created`
- `lobby.game.status_changed`
- `lobby.membership.activated`
- `lobby.membership.removed`
- `lobby.membership.blocked`
Event rules:
- events are post-commit only; they are not emitted on failed operations
- event envelopes carry `game_id`, optional `user_id`, occurrence timestamp,
new status (for `status_changed`), and optional trace correlation
- domain events are observability and downstream-read-model artifacts;
they must not carry full business state payloads
## Error Model
The trusted internal REST contract uses strict JSON error envelopes:
```json
{
"error": {
"code": "invalid_request",
"message": "request is invalid"
}
}
```
Stable error codes:
- `invalid_request` — malformed input or failed validation
- `conflict` — state transition not allowed from current status
- `subject_not_found` — game, application, invite, membership, or pending
race-name registration not found
- `eligibility_denied` — user not eligible per `User Service`
- `name_taken` — `race_name` already registered, reserved, or pending for
another user
- `race_name_registration_quota_exceeded` — user's `max_registered_race_names`
slot is full
- `race_name_pending_window_expired` — the 30-day registration window has
passed for the pending entry
- `race_name_capability_not_met` — capability condition not satisfied at
game finish (reservation released)
- `race_name_permanent_blocked` — the user carries an active
`permanent_block` sanction
- `forbidden` — caller is not authorized for this operation on this game or
this race name
- `internal_error` — unexpected service error
- `service_unavailable` — upstream dependency unavailable
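The envelope shape above maps onto two small Go structs. This is a sketch: `ErrorEnvelope`, `WriteError`, and `render` are hypothetical names, and the HTTP status passed by the caller is an assumption because this section does not define a code-to-status mapping.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ErrorBody and ErrorEnvelope mirror the documented JSON shape.
type ErrorBody struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

type ErrorEnvelope struct {
	Error ErrorBody `json:"error"`
}

// WriteError serialises one stable error code with a caller-chosen status.
func WriteError(w http.ResponseWriter, status int, code, message string) {
	payload, err := json.Marshal(ErrorEnvelope{Error: ErrorBody{Code: code, Message: message}})
	if err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_, _ = w.Write(payload)
}

// render writes an error to a recorder and returns status and body.
func render(status int, code, message string) (int, string) {
	rec := httptest.NewRecorder()
	WriteError(rec, status, code, message)
	return rec.Code, rec.Body.String()
}

func main() {
	// 409 for `conflict` is an assumed mapping.
	status, body := render(http.StatusConflict, "conflict", "game is not running")
	fmt.Println(status, body)
}
```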
## Configuration
### Required
- `LOBBY_REDIS_MASTER_ADDR`
- `LOBBY_REDIS_PASSWORD`
- `LOBBY_POSTGRES_PRIMARY_DSN`
- `LOBBY_USER_SERVICE_BASE_URL`
- `LOBBY_GM_BASE_URL`
### Configuration groups
Process and logging:
- `LOBBY_SHUTDOWN_TIMEOUT` with default `30s`
- `LOBBY_LOG_LEVEL` with default `info`
Public HTTP:
- `LOBBY_PUBLIC_HTTP_ADDR` with default `:8094`
- `LOBBY_PUBLIC_HTTP_READ_HEADER_TIMEOUT` with default `2s`
- `LOBBY_PUBLIC_HTTP_READ_TIMEOUT` with default `10s`
- `LOBBY_PUBLIC_HTTP_IDLE_TIMEOUT` with default `1m`
Internal HTTP:
- `LOBBY_INTERNAL_HTTP_ADDR` with default `:8095`
- `LOBBY_INTERNAL_HTTP_READ_HEADER_TIMEOUT` with default `2s`
- `LOBBY_INTERNAL_HTTP_READ_TIMEOUT` with default `10s`
- `LOBBY_INTERNAL_HTTP_IDLE_TIMEOUT` with default `1m`
Redis connectivity:
- `LOBBY_REDIS_MASTER_ADDR` (required)
- `LOBBY_REDIS_REPLICA_ADDRS` (optional, comma-separated; not consumed yet)
- `LOBBY_REDIS_PASSWORD` (required)
- `LOBBY_REDIS_DB` (default 0)
- `LOBBY_REDIS_OPERATION_TIMEOUT` (default 250ms)
The legacy `LOBBY_REDIS_ADDR`, `LOBBY_REDIS_USERNAME`, and
`LOBBY_REDIS_TLS_ENABLED` env vars were retired in PG_PLAN.md §6A; setting
either of the latter two now fails fast at startup. See
`ARCHITECTURE.md §Persistence Backends` for the architectural rules.
PostgreSQL connectivity (PG_PLAN.md §6A and §6B; durable game /
application / invite / membership records and the Race Name Directory
live here):
- `LOBBY_POSTGRES_PRIMARY_DSN` (required;
e.g. `postgres://lobbyservice:secret@postgres:5432/galaxy?search_path=lobby&sslmode=disable`)
- `LOBBY_POSTGRES_REPLICA_DSNS` (optional, comma-separated; not consumed yet)
- `LOBBY_POSTGRES_OPERATION_TIMEOUT` (default 1s)
- `LOBBY_POSTGRES_MAX_OPEN_CONNS` (default 25)
- `LOBBY_POSTGRES_MAX_IDLE_CONNS` (default 5)
- `LOBBY_POSTGRES_CONN_MAX_LIFETIME` (default 30m)
Stream names:
- `LOBBY_GM_EVENTS_STREAM` with default `gm:lobby_events`
- `LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT` with default `2s`
- `LOBBY_RUNTIME_START_JOBS_STREAM` with default `runtime:start_jobs`
- `LOBBY_RUNTIME_STOP_JOBS_STREAM` with default `runtime:stop_jobs`
- `LOBBY_RUNTIME_JOB_RESULTS_STREAM` with default `runtime:job_results`
- `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` with default `2s`
- `LOBBY_NOTIFICATION_INTENTS_STREAM` with default `notification:intents`
Runtime Manager integration:
- `LOBBY_ENGINE_IMAGE_TEMPLATE` with default `galaxy/game:{engine_version}` —
placeholder template applied to a game's `target_engine_version` to resolve
the Docker `image_ref` published on `runtime:start_jobs`. The template
must contain the literal placeholder `{engine_version}`; Lobby fails
fast at startup otherwise.
Upstream clients:
- `LOBBY_USER_SERVICE_TIMEOUT` with default `1s`
- `LOBBY_GM_TIMEOUT` with default `5s`
Enrollment automation:
- `LOBBY_ENROLLMENT_AUTOMATION_INTERVAL` with default `30s`
Race Name Directory:
- `LOBBY_RACE_NAME_DIRECTORY_BACKEND` with default `postgres`
(alternate: `stub` for in-process tests; PG_PLAN.md §6B retired the
`redis` backend)
- `LOBBY_RACE_NAME_EXPIRATION_INTERVAL` with default `1h` — pending
registration expiration worker tick
The 30-day eligibility window for `pending_registration` entries is the
constant `service/capabilityevaluation.PendingRegistrationWindow`. It is
intentionally not operator-tunable today; the env var name
`LOBBY_PENDING_REGISTRATION_TTL_HOURS` is reserved for a future change.
User lifecycle:
- `LOBBY_USER_LIFECYCLE_STREAM` with default `user:lifecycle_events`
- `LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT` with default `2s`
OpenTelemetry:
- standard `OTEL_*` variables
- `LOBBY_OTEL_STDOUT_TRACES_ENABLED`
- `LOBBY_OTEL_STDOUT_METRICS_ENABLED`
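The required variables and duration defaults listed in this section could be loaded along the following lines. The helpers `requiredEnv` and `durationEnv` are hypothetical, as is the rule that a duration must be positive; the lookup function is injected so the logic can run without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// lookupFunc matches the signature of os.LookupEnv.
type lookupFunc func(string) (string, bool)

// requiredEnv fails when a required variable is unset or empty.
func requiredEnv(lookup lookupFunc, name string) (string, error) {
	v, ok := lookup(name)
	if !ok || v == "" {
		return "", fmt.Errorf("%s is required", name)
	}
	return v, nil
}

// durationEnv returns def when the variable is unset and rejects values
// that do not parse or are not positive (assumed rule).
func durationEnv(lookup lookupFunc, name string, def time.Duration) (time.Duration, error) {
	raw, ok := lookup(name)
	if !ok || raw == "" {
		return def, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", name, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("%s: must be positive", name)
	}
	return d, nil
}

func main() {
	shutdown, err := durationEnv(os.LookupEnv, "LOBBY_SHUTDOWN_TIMEOUT", 30*time.Second)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("shutdown timeout:", shutdown)
	if _, err := requiredEnv(os.LookupEnv, "LOBBY_GM_BASE_URL"); err != nil {
		fmt.Println("config error:", err)
	}
}
```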
## Persistence Layout
Game / application / invite / membership records live in PostgreSQL after
PG_PLAN.md §6A; the Race Name Directory followed in §6B. See
`docs/postgres-migration.md` for the schema and decision records. The
`lobby` schema owns five tables — `games`, `applications`, `invites`,
`memberships`, `race_names` — plus the partial UNIQUE index on
`applications(applicant_user_id, game_id) WHERE status <> 'rejected'` that
enforces the single-active-application invariant and the partial UNIQUE
index on `race_names(canonical_key) WHERE binding_kind = 'registered'`
that enforces single-registered-per-canonical.
The Redis-backed keys below survive both stages. Redis owns the
runtime-coordination state — per-game runtime aggregates, gap activation,
capability-evaluation guards, and stream consumer offsets — plus the
event-bus streams themselves.
### Redis key table
Storage rules for Redis:
- timestamps are stored in Unix milliseconds unless noted otherwise
- dynamic key segments are base64url-encoded
| Logical artifact | Redis key |
| --- | --- |
| per-game per-user stats aggregate | `lobby:game_turn_stats:<game_id>:<user_id>` → JSON aggregate |
| per-game stats user index | `lobby:game_turn_stats_by_game:<game_id>` (set of `user_id`) |
| capability-evaluation guard | `lobby:capability_evaluation:done:<game_id>` (sentinel string) |
| GM event stream offset | `lobby:stream_offsets:gm_events` |
| runtime job result offset | `lobby:stream_offsets:runtime_results` |
| user lifecycle stream offset | `lobby:stream_offsets:user_lifecycle` |
| gap window activation time | `lobby:gap_activated_at:<game_id>` |
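Building a key with base64url-encoded dynamic segments might look like this. The sketch assumes the unpadded variant (`base64.RawURLEncoding`), which this README does not specify; the helper names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// segment encodes one dynamic key part so it cannot contain ':'.
// Unpadded base64url is an assumption.
func segment(raw string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(raw))
}

// GameTurnStatsKey builds the per-game per-user aggregate key.
func GameTurnStatsKey(gameID, userID string) string {
	return "lobby:game_turn_stats:" + segment(gameID) + ":" + segment(userID)
}

// GapActivatedAtKey builds the gap window activation key.
func GapActivatedAtKey(gameID string) string {
	return "lobby:gap_activated_at:" + segment(gameID)
}

func main() {
	fmt.Println(GameTurnStatsKey("game-1", "user-42"))
	fmt.Println(GapActivatedAtKey("game-1"))
}
```

Encoding the segments keeps the `:` separator unambiguous even if an identifier ever contains that character.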
### Frozen record fields
The five durable records are stored in PostgreSQL columns; the field set
per record is unchanged from the previous Redis JSON shape and is
documented inline with the migration scripts under
`internal/adapters/postgres/migrations/`.
| Record | Frozen fields |
| --- | --- |
| game record | all game fields listed in Game Record Model section |
| application record | `application_id`, `game_id`, `applicant_user_id`, `race_name`, `status`, `created_at`, `decided_at` |
| invite record | `invite_id`, `game_id`, `inviter_user_id`, `invitee_user_id`, `race_name` (set at redeem), `status`, `created_at`, `expires_at`, `decided_at` |
| membership record | all membership fields listed in Membership Model section |
| race_names row | `canonical_key`, `game_id`, `holder_user_id`, `race_name`, `binding_kind`, `source_game_id`, `reserved_at_ms`, `eligible_until_ms` (pending only), `registered_at_ms` (registered only) |
## Observability
### Metrics
- `lobby.game.transitions` — counter; attributes: `from_status`, `to_status`, `trigger` (`command`, `manual`, `deadline`, `gap`, `runtime_event`, `external_block`)
- `lobby.application.outcomes` — counter; attributes: `outcome` (`submitted`, `approved`, `rejected`)
- `lobby.invite.outcomes` — counter; attributes: `outcome` (`created`, `redeemed`, `declined`, `revoked`, `expired`)
- `lobby.membership.changes` — counter; attributes: `change` (`activated`, `removed`, `blocked`, `external_block`)
- `lobby.start_flow.outcomes` — counter; attributes: `outcome` (`running`, `paused`, `start_failed`)
- `lobby.notification.publish_attempts` — counter; attributes: `notification_type`, `result` (`ok`, `error`)
- `lobby.active_games` — observable gauge; attributes: `status`
- `lobby.enrollment_automation.checks` — counter; attributes: `result` (`no_op`, `transitioned`)
- `lobby.gm_events.oldest_unprocessed_age_ms` — observable gauge
- `lobby.runtime_results.oldest_unprocessed_age_ms` — observable gauge
- `lobby.user_lifecycle.oldest_unprocessed_age_ms` — observable gauge
- `lobby.race_name.outcomes` — counter; attributes: `outcome` (`reserved`, `reservation_released`, `pending_created`, `pending_released`, `registered`, `registered_released`)
- `lobby.pending_registration.expirations` — counter; attributes: `trigger` (`tick`, `manual`)
- `lobby.user_lifecycle.cascade_releases` — counter; attributes: `event` (`permanent_blocked`, `deleted`)
- `lobby.capability_evaluations` — counter; attributes: `result` (`capable`, `incapable`, `noop`)
Metrics avoid high-cardinality attributes such as `game_id`, `user_id`,
`application_id`, `invite_id`, and `canonical_key`.
### Structured log fields
Key operations emit structured logs with these stable field names where applicable:
- `game_id`
- `game_type`
- `game_status`
- `from_status`
- `to_status`
- `user_id`
- `application_id`
- `invite_id`
- `membership_id`
- `race_name`
- `canonical_key`
- `reservation_kind` (`reserved` / `pending_registration` / `registered`)
- `eligible_until_ms`
- `trigger`
- `lifecycle_event`
- `request_id`
- `trace_id`
## Verification
Test doubles are split between two styles. Wide-surface ports with no
production state (`RuntimeManager`, `IntentPublisher`, `GMClient`,
`UserService`) use `gomock`-generated mocks under
`internal/adapters/mocks/`; regenerate with `make -C lobby mocks`.
Stateful behavioural fakes that mirror the production adapter
contract (`gameinmem`, `applicationinmem`, `inviteinmem`,
`membershipinmem`, `gameturnstatsinmem`, `racenameinmem`,
`evaluationguardinmem`, `gapactivationinmem`, `streamoffsetinmem`)
live as in-memory adapters under `internal/adapters/<name>inmem/`
and stay hand-rolled because tests rely on their CAS, status-transition,
and invariant-tracking behaviour.
Focused service-local coverage verifies:
- configuration loading and validation for all env var groups
- both HTTP listeners start and serve `/healthz` and `/readyz`
- game CRUD: create, update, get, list with correct field validation
- each status transition fires only from allowed source statuses
- enrollment automation: deadline trigger, gap trigger, manual trigger
- application flow: submit (eligibility check, race name check), approve, reject
- invite flow: create, redeem (auto-membership), decline, revoke, expire on enrollment close
- membership model: activate, remove, block with correct before/after-start semantics
- Race Name Directory (PostgreSQL + in-memory adapters against the same suite):
canonicalization + confusable-pair policy, `Reserve`/`ReleaseReservation`
per-game semantics, `MarkPendingRegistration`/`ExpirePendingRegistrations`
window, `Register` idempotency + quota, `ReleaseAllByUser` cascade
- game start flow: success path (→ running), GM unavailable path (→ paused),
container failure path (→ start_failed), metadata persistence failure path
(container removed, → start_failed)
- GM event stream consumer: snapshot update (stats aggregate),
`game_finished` with capability evaluation
- user lifecycle stream consumer: `permanent_blocked` and `deleted`
cascade release + membership/application/invite settlement
- pending-registration expiration worker idempotency
- race name registration service: capability, tariff quota, pending window,
idempotent retry
- notification intent publication for every trigger listed in Notification Contracts
- visibility rules: private game hidden from non-member non-owner users
- error model: all stable codes returned for correct conditions
Cross-service coverage verifies:
- `Lobby → User Service` eligibility check compatibility (including the new
`max_registered_race_names` field) and failure handling
- `Lobby → Notification Service` intent publication for all lobby notification types
- `Lobby → Runtime Manager` start job publication and result consumption
- `Lobby → Game Master` synchronous registration call (success and failure)
- `User Service → Lobby` cascade flow: permanent_block or DeleteUser on a
user leads to full RND release + memberships blocked + applications/invites
cancelled