
Game Lobby Service

galaxy/lobby owns platform-level metadata and lifecycle of game sessions.

References

Purpose

Game Lobby Service is the platform source of truth for game sessions as platform entities — from creation through enrollment, start, runtime tracking, and finish. It mediates all player participation actions and maintains the roster state that Game Master may cache for runtime authorization.

Scope

Game Lobby is the source of truth for:

  • opaque stable game identifiers in game-* form
  • game metadata: name, description, type, owner, schedule, engine version
  • platform-level game status from draft through finished or cancelled
  • enrollment configuration: min_players, max_players, start_gap_hours, start_gap_players, enrollment_ends_at
  • applications and their approval or rejection status (public games)
  • user-bound invitations and their lifecycle (private games)
  • platform membership roster and participant status
  • Race Name Directory state across all regular platform users: registered race names (permanent ownership), per-game reservations, and 30-day pending-registration windows
  • per-game per-user player_turn_stats aggregate used at game finish for capability evaluation
  • denormalized runtime snapshot imported from Game Master
  • user-facing lists: active games, pending applications, open invitations

Game Lobby is not the source of truth for:

  • platform user identity or profile — owned by User Service
  • device sessions or authentication state — owned by Auth / Session Service
  • runtime container lifecycle or technical health — owned by Runtime Manager
  • current turn, generation state, engine reachability — owned by Game Master
  • full per-player game state — owned by the game engine container
  • player-to-engine UUID mapping — owned by Game Master

Non-Goals

  • Game Lobby does not call game engine containers directly; all engine interaction goes through Game Master.
  • Game Lobby does not delegate race names to a separate service in v1. It owns the Race Name Directory data itself (PostgreSQL adapter) behind a port interface, so a future dedicated Race Name Service can replace the adapter without domain changes.
  • Game Lobby does not compute notification audiences from roster data at delivery time; notification intents carry explicit recipient_user_id values.
  • Game Lobby does not apply sanctions or session-level access control; User Service and Auth / Session Service remain authoritative for those.
  • Game Lobby does not own billing or entitlement decisions; it reads the current entitlement snapshot from User Service.

Position in the System

flowchart LR
    Gateway["Edge Gateway"]
    Lobby["Game Lobby Service"]
    User["User Service"]
    GM["Game Master"]
    Runtime["Runtime Manager"]
    Notify["Notification Service"]
    Redis["Redis\nKV + Streams"]

    Gateway --> Lobby
    Lobby --> User
    Lobby --> GM
    Lobby --> Redis
    Lobby --> Notify
    GM --> Redis
    Redis --> Lobby
    Runtime --> Redis

Gateway routes authenticated platform-level commands to Lobby over trusted REST. Lobby reads user eligibility from User Service synchronously and registers running games with Game Master synchronously at start. Lobby submits start jobs to Runtime Manager through the runtime:start_jobs Redis Stream and reads job results from runtime:job_results. Game Master publishes runtime events to a dedicated Redis Stream that Lobby consumes asynchronously. Lobby publishes notification intents to notification:intents.

Responsibility Boundaries

Game Lobby is responsible for:

  • accepting and validating game creation and configuration commands
  • opening and managing enrollment for public and private games
  • validating user eligibility before accepting applications and invite redeems
  • checking race name availability through the Race Name Directory port
  • enforcing enrollment deadline and roster-size auto-transitions
  • orchestrating the game start sequence with Runtime Manager and Game Master
  • persisting game metadata atomically and removing orphaned containers when metadata persistence fails
  • maintaining the denormalized runtime snapshot for user-facing reads
  • emitting notification intents for all participant lifecycle events
  • enforcing visibility rules: private games are visible only to owner and members

Game Lobby is not responsible for:

  • verifying authenticated transport signatures — handled by Edge Gateway
  • checking session revocation state — handled by Edge Gateway and Auth
  • email delivery — handled by Mail Service
  • push delivery — handled by Notification Service and Edge Gateway
  • container start and stop mechanics — handled by Runtime Manager
  • per-turn player command routing — handled by Game Master

Runtime Surface

The service starts two HTTP listeners and a set of background workers, including the Redis Stream consumers.

Listeners

  • public authenticated REST on LOBBY_PUBLIC_HTTP_ADDR with default :8094
  • internal trusted REST on LOBBY_INTERNAL_HTTP_ADDR with default :8095

Background workers

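  • enrollment automation ticker: checks enrollment deadlines and roster thresholds at a configurable interval
  • Runtime Manager result consumer: reads start-job results from the runtime:job_results Redis Stream
  • Game Master event consumer: reads runtime snapshot updates and game-finish events from a dedicated Redis Stream
  • user lifecycle consumer: reads user:lifecycle_events and applies the cascade release
  • pending-registration expiration worker: releases expired pending_registration entries at the LOBBY_RACE_NAME_EXPIRATION_INTERVAL interval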

Startup dependencies

  • one reachable Redis deployment at LOBBY_REDIS_MASTER_ADDR (mandatory password via LOBBY_REDIS_PASSWORD; replicas optional via LOBBY_REDIS_REPLICA_ADDRS). Used for streams, per-game runtime aggregates, and stream offsets.
  • one reachable PostgreSQL primary at LOBBY_POSTGRES_PRIMARY_DSN (DSN must include search_path=lobby&sslmode=disable). Embedded goose migrations apply at startup before any listener opens; on migration or ping failure the service exits non-zero. The four core enrollment entities (game / application / invite / membership) live here after PG_PLAN.md §6A; docs/postgres-migration.md is the decision record.
  • User Service reachable at LOBBY_USER_SERVICE_BASE_URL (checked at startup only; runtime failures are surfaced as request errors, not boot failures)
  • Game Master at LOBBY_GM_BASE_URL (no startup check; unreachability at registration triggers the forced-pause path)

Probes

  • GET /healthz on both ports returns {"status":"ok"}
  • GET /readyz on both ports returns {"status":"ready"} after successful startup; no live Redis or PostgreSQL ping per request

Game Record Model

Fields

| Field | Type | Notes |
|---|---|---|
| game_id | string | opaque, stable, game-* form |
| game_name | string | human-readable; mutable in draft |
| description | string | optional; mutable in draft and enrollment_open |
| game_type | enum | public or private |
| owner_user_id | string | private games only; empty for public |
| status | enum | see status table below |
| min_players | int | minimum approved participants to proceed to start |
| max_players | int | target roster size that activates the gap window |
| start_gap_hours | int | hours of gap window after max_players is reached |
| start_gap_players | int | additional participants admitted during the gap |
| enrollment_ends_at | int64 | UTC Unix seconds; deadline for automatic enrollment close |
| turn_schedule | string | cron expression, e.g. 0 18 * * *; passed to GM at registration |
| target_engine_version | string | semver of the engine to launch; passed to GM at registration |
| created_at | int64 | UTC Unix milliseconds |
| updated_at | int64 | UTC Unix milliseconds |
| started_at | int64 | UTC Unix milliseconds; set when status becomes running |
| finished_at | int64 | UTC Unix milliseconds; set when status becomes finished |
| current_turn | int | denormalized from GM; zero until running |
| runtime_status | string | denormalized from GM; empty until running |
| engine_health_summary | string | denormalized from GM; empty until running |
| runtime_binding | object? | non-null after successful container start; contains container_id, engine_endpoint, runtime_job_id, bound_at (Unix ms) |

All fields set at creation are validated before the game record is persisted. game_name is required and must be non-empty after trim. min_players, max_players, start_gap_hours, start_gap_players, and enrollment_ends_at are required positive integers with min_players <= max_players. turn_schedule must be a valid five-field cron expression. target_engine_version must be a non-empty semver string.
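The creation rules above can be expressed as a single validation pass. The sketch below assumes a hypothetical GameConfig struct and ValidateGameConfig function; the cron check only counts the five fields rather than parsing each one, and the semver pattern is a simplified form of the SemVer 2.0.0 grammar.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// GameConfig is a hypothetical struct holding the creation-time fields.
type GameConfig struct {
	GameName            string
	MinPlayers          int
	MaxPlayers          int
	StartGapHours       int
	StartGapPlayers     int
	EnrollmentEndsAt    int64 // UTC Unix seconds
	TurnSchedule        string
	TargetEngineVersion string
}

// semverPattern is a simplified SemVer matcher: MAJOR.MINOR.PATCH with
// optional pre-release and build suffixes (looser than the full grammar).
var semverPattern = regexp.MustCompile(`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

// ValidateGameConfig applies the creation rules before the record is persisted.
func ValidateGameConfig(c GameConfig) error {
	if strings.TrimSpace(c.GameName) == "" {
		return errors.New("game_name is required")
	}
	// Ordered slice (not a map) so the first reported error is deterministic.
	required := []struct {
		name  string
		value int64
	}{
		{"min_players", int64(c.MinPlayers)},
		{"max_players", int64(c.MaxPlayers)},
		{"start_gap_hours", int64(c.StartGapHours)},
		{"start_gap_players", int64(c.StartGapPlayers)},
		{"enrollment_ends_at", c.EnrollmentEndsAt},
	}
	for _, f := range required {
		if f.value <= 0 {
			return fmt.Errorf("%s must be a positive integer", f.name)
		}
	}
	if c.MinPlayers > c.MaxPlayers {
		return errors.New("min_players must be <= max_players")
	}
	// Simplification: count fields only; a real check would parse each field.
	if len(strings.Fields(c.TurnSchedule)) != 5 {
		return errors.New("turn_schedule must be a five-field cron expression")
	}
	if !semverPattern.MatchString(c.TargetEngineVersion) {
		return errors.New("target_engine_version must be a semver string")
	}
	return nil
}
```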

Status vocabulary

| Status | Meaning |
|---|---|
| draft | Created; enrollment not yet open; editable |
| enrollment_open | Accepting applications (public) or invite redeems (private) |
| ready_to_start | Enrollment closed; start command accepted |
| starting | Start job submitted to Runtime Manager; awaiting result |
| start_failed | Container start or metadata persistence failed |
| running | Game engine container live; normal gameplay |
| paused | Platform-level pause; engine container may still be alive |
| finished | Game ended; record is terminal |
| cancelled | Cancelled before start; record is terminal |

Status transition table

| From | To | Trigger |
|---|---|---|
| draft | enrollment_open | explicit command from admin (public) or owner (private) |
| enrollment_open | ready_to_start | manual command when approved_count >= min_players |
| enrollment_open | ready_to_start | enrollment_ends_at reached and approved_count >= min_players |
| enrollment_open | ready_to_start | gap window exhausted (time or player count) |
| ready_to_start | starting | start command from admin (public) or owner (private) |
| starting | running | Runtime Manager confirms container; GM registration succeeds |
| starting | paused | Runtime Manager confirms container; GM registration fails (unavailable) |
| starting | start_failed | Runtime Manager reports container start failure |
| start_failed | ready_to_start | explicit retry command from admin or owner |
| running | paused | explicit pause command from admin or owner |
| running | finished | game_finished event from Game Master via Redis Stream |
| paused | running | explicit resume command from admin or owner |
| paused | finished | game_finished event from Game Master via Redis Stream |
| draft | cancelled | explicit cancel command from admin or owner |
| enrollment_open | cancelled | explicit cancel command from admin or owner |
| ready_to_start | cancelled | explicit cancel command from admin or owner |
| start_failed | cancelled | explicit cancel command from admin or owner |
| draft | cancelled | external_block cascade on owner permanent_block / DeleteUser |
| enrollment_open | cancelled | external_block cascade on owner permanent_block / DeleteUser |
| ready_to_start | cancelled | external_block cascade on owner permanent_block / DeleteUser |
| start_failed | cancelled | external_block cascade on owner permanent_block / DeleteUser |
| starting | cancelled | external_block cascade on owner permanent_block / DeleteUser |
| running | cancelled | external_block cascade on owner permanent_block / DeleteUser |
| paused | cancelled | external_block cascade on owner permanent_block / DeleteUser |

Outside the external_block cascade, starting, running, and paused games cannot be cancelled directly; for running and paused games, use stop operations through Game Master and await the game_finished event instead. For in-flight games, the cascade publishes a stop-job to Runtime Manager before applying the external_block transition.
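One way to encode the table is a set of allowed (from, to) edges plus a separate rule for the external_block cascade, which is the only path that may cancel starting, running, or paused games. The names Status, CanTransition, and the externalBlock flag are illustrative rather than taken from the codebase, and the sketch checks only the status pair, not who issued the command.

```go
package main

// Status mirrors the platform-level status vocabulary.
type Status string

const (
	Draft          Status = "draft"
	EnrollmentOpen Status = "enrollment_open"
	ReadyToStart   Status = "ready_to_start"
	Starting       Status = "starting"
	StartFailed    Status = "start_failed"
	Running        Status = "running"
	Paused         Status = "paused"
	Finished       Status = "finished"
	Cancelled      Status = "cancelled"
)

type edge struct{ from, to Status }

// regular lists every transition in the table except the external_block rows.
var regular = map[edge]bool{
	{Draft, EnrollmentOpen}:        true,
	{EnrollmentOpen, ReadyToStart}: true,
	{ReadyToStart, Starting}:       true,
	{Starting, Running}:            true,
	{Starting, Paused}:             true,
	{Starting, StartFailed}:        true,
	{StartFailed, ReadyToStart}:    true,
	{Running, Paused}:              true,
	{Running, Finished}:            true,
	{Paused, Running}:              true,
	{Paused, Finished}:             true,
	{Draft, Cancelled}:             true,
	{EnrollmentOpen, Cancelled}:    true,
	{ReadyToStart, Cancelled}:      true,
	{StartFailed, Cancelled}:       true,
}

// CanTransition reports whether from -> to is allowed. When externalBlock is
// true the caller is the owner permanent_block / DeleteUser cascade, which may
// cancel any non-terminal game.
func CanTransition(from, to Status, externalBlock bool) bool {
	if externalBlock {
		return to == Cancelled && from != Finished && from != Cancelled
	}
	return regular[edge{from, to}]
}
```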

Enrollment Rules

enrollment_open → ready_to_start fires on the first of these conditions:

Manual close

Admin (public game) or owner (private game) issues lobby.game.ready_to_start when approved_count >= min_players.

Deadline

Enrollment automation worker detects that enrollment_ends_at is in the past and approved_count >= min_players. If the deadline is reached but approved_count < min_players, the game remains in enrollment_open — the transition does not fire until the player count condition is also satisfied.

Gap exhaustion

When approved_count reaches max_players, the gap window opens. During the gap window:

  • new applications and invite redeems continue to be accepted up to max_players + start_gap_players total approved participants
  • the game does not automatically transition while the gap is open

The transition fires when either:

  • start_gap_hours have elapsed since the gap window opened, or
  • approved_count reaches max_players + start_gap_players

On enrollment close

When any path transitions the game to ready_to_start:

  • all invites in created status transition to expired
  • lobby.invite.expired notification intents are published for each expired invite (recipient: private-game owner)
  • no new applications are accepted in ready_to_start status

Application Lifecycle

Applications are used for public games only. Private games use the invite flow exclusively.

Submit

An authenticated user submits lobby.application.submit with race_name.

Pre-conditions checked synchronously:

  • game status is enrollment_open
  • game type is public
  • user has no existing non-rejected application to the same game
  • User Service eligibility check confirms can_join_game=true
  • approved_count < max_players + start_gap_players (always true before the gap window opens, because approved_count < max_players then)
  • Race Name Directory confirms race_name is available for the applicant

On success:

  • an Application record is created with status=submitted
  • lobby.application.submitted intent published (audience_kind=admin_email) with payload: game_id, game_name, applicant_user_id, applicant_name

applicant_name in the notification payload equals the submitted race_name.

Approve

Admin issues lobby.application.approve.

Pre-conditions:

  • game is enrollment_open
  • application is in submitted status
  • approved_count < max_players + start_gap_players

On success:

  • Race Name Directory reserves race_name for the applicant
  • application status → approved
  • Membership record created with status=active
  • lobby.membership.approved intent published (recipient: applicant) with payload: game_id, game_name
  • gap window opens automatically if approved_count now equals max_players
  • auto-transition to ready_to_start if gap exhaustion condition is immediately met

Reject

Admin issues lobby.application.reject.

Pre-conditions:

  • application is in submitted status

On success:

  • application status → rejected
  • any pending Race Name Directory reservation for the applicant is released
  • lobby.membership.rejected intent published (recipient: applicant) with payload: game_id, game_name

Application state machine

submitted → approved
submitted → rejected

Rejected applicants may re-apply while enrollment is open, subject to a single active application constraint (at most one non-rejected application per user per game).

The single-active constraint is enforced at the persistence layer by the user_game_application key (see Redis Logical Model). The key is created atomically with the submitted application record, removed on rejection, and preserved on approval. Service-layer code can rely on this invariant without performing its own scan of user_applications.

Invite Lifecycle

Invites are used for private games only. Public games use the application flow exclusively.

Create

Private-game owner issues lobby.invite.create with invitee_user_id.

Pre-conditions:

  • game status is enrollment_open
  • game type is private
  • the invitee has no active invite or active membership in the game
  • approved_count < max_players + start_gap_players

On success:

  • Invite record created with status=created
  • expires_at is set to enrollment_ends_at of the game
  • lobby.invite.created intent published (recipient: invitee) with payload: game_id, game_name, inviter_user_id, inviter_name

inviter_name is the owner's race name if the owner is already a member of the game; otherwise it is the owner's user_id.

Redeem

The invited user issues lobby.invite.redeem with race_name.

Pre-conditions:

  • invite status is created
  • game is enrollment_open
  • approved_count < max_players + start_gap_players
  • inviter and invitee both exist and are not permanently blocked in User Service
  • Race Name Directory confirms race_name is available for the invitee

On success:

  • Race Name Directory reserves race_name for the invitee
  • invite status → redeemed
  • Membership record created with status=active
  • lobby.invite.redeemed intent published (recipient: private-game owner) with payload: game_id, game_name, invitee_user_id, invitee_name
  • gap window opens automatically if approved_count now equals max_players
  • auto-transition to ready_to_start if gap exhaustion condition is immediately met

The synchronous User Service check on both inviter and invitee enforces the rule that an invite from or to a permanently blocked or deleted user behaves as if it never existed, even before the asynchronous user-lifecycle cascade has flipped the invite to revoked. Cascade-deleted accounts and permanent_block sanctions surface as subject_not_found.

Decline

The invited user issues lobby.invite.decline.

Pre-conditions:

  • invite status is created

On success:

  • invite status → declined
  • no notification in v1

Declined users may receive a new invite from the owner while enrollment is open.

Revoke

Owner issues lobby.invite.revoke.

Pre-conditions:

  • invite status is created

On success:

  • invite status → revoked
  • no notification in v1

Expire

Pending invites (status=created) are transitioned to expired automatically when the game moves to ready_to_start.

lobby.invite.expired intent is published for each expired invite (recipient: private-game owner) with payload: game_id, game_name, invitee_user_id, invitee_name.

Invite state machine

created → redeemed
created → declined
created → revoked
created → expired

Membership Model

Fields

| Field | Type | Notes |
|---|---|---|
| membership_id | string | opaque, stable |
| game_id | string | reference to game |
| user_id | string | reference to platform user |
| race_name | string | confirmed in-game name as submitted (original casing) |
| canonical_key | string | canonicalized key under which the RND reservation is held |
| status | enum | active, removed, blocked |
| joined_at | int64 | UTC Unix milliseconds |
| removed_at | int64 | UTC Unix milliseconds; set on remove or block |

Status vocabulary

| Status | Meaning |
|---|---|
| active | Full participant; may send commands through Game Master |
| removed | Permanently removed; engine slot deactivated after game start |
| blocked | Platform-level block; engine slot retained but commands blocked |

Status transition table

| From | To | Trigger |
|---|---|---|
| active | removed | explicit remove command from admin or owner (post-start) |
| active | blocked | explicit block command from admin or owner |

removed and blocked are terminal statuses. Pre-start remove drops the membership record entirely rather than transitioning to removed (see Removal rules below).

Removal rules

Before game start:

  • remove drops membership and releases the race name reservation

After game start:

  • blocked: the player cannot send commands; engine keeps the player slot
  • removed: Game Lobby marks membership removed; Game Master must also deactivate the player inside the engine; race name reservation remains until game is finished

This distinction is architectural and must remain explicit in all implementations.
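The removal rules reduce to a small decision table. RemovalOutcome, PlanRemove, and PlanBlockAfterStart are hypothetical names; the sketch covers remove before and after start and block after start, which are the cases the rules spell out.

```go
package main

// RemovalOutcome describes the effects of a remove or block command.
type RemovalOutcome struct {
	DropMembership     bool   // delete the record entirely (pre-start remove)
	NewStatus          string // "removed" or "blocked" post-start; "" when dropped
	ReleaseReservation bool   // release the RND reservation immediately
	DeactivateInEngine bool   // Game Master must deactivate the engine slot
}

// PlanRemove maps a remove command to its effects; started is true once the
// game has left the pre-start statuses.
func PlanRemove(started bool) RemovalOutcome {
	if !started {
		return RemovalOutcome{DropMembership: true, ReleaseReservation: true}
	}
	// Post-start: keep the record and the reservation until game_finished.
	return RemovalOutcome{NewStatus: "removed", DeactivateInEngine: true}
}

// PlanBlockAfterStart: the engine keeps the slot; commands are refused and the
// reservation is resolved at game_finished.
func PlanBlockAfterStart() RemovalOutcome {
	return RemovalOutcome{NewStatus: "blocked"}
}
```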

Race Name Directory

Purpose

Race Name Directory (RND) is the platform source of truth for all in-game race_name values. It owns three levels of state per name:

  • registered — permanent user-owned names. Once registered, the name is unavailable to any other user and cannot be released by the owner; only permanent_block or DeleteUser on the owning account frees it.
  • reservation — a per-game holding created when a participant joins through application approval or invite redeem. Reservations are keyed by (game_id, canonical_key). One user may hold the same name in multiple active games concurrently.
  • pending_registration — a reservation that survived a capable finish and is now waiting up to 30 days for the owner to upgrade it into a registered name via lobby.race_name.register. Expiration releases the binding.

User Service does not store race_name values. It only exposes max_registered_race_names in the eligibility snapshot and publishes user.lifecycle.permanent_blocked / user.lifecycle.deleted events.

Canonical key + confusable-pair policy

Every RND key is derived by racename.Canonicalize(raceName) (canonical string, err error), defined in lobby/internal/domain/racename/policy.go, in three steps:

  1. trim and validate the character set via pkg/util/string.go:ValidateTypeName;
  2. lowercase Unicode fold;
  3. apply the frozen confusable-pair replacement map (ported from the former user/internal/ports/race_name_policy.go).

A name is considered taken for the actor when the RND holds at least one registered name, active reservation, or pending_registration on the same canonical key whose owner differs from the actor.

Port interface

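import (
    "context"
    "time"
)

// RaceNameDirectory is the port implemented by the PostgreSQL and stub adapters.
type RaceNameDirectory interface {
    // Canonicalize derives the canonical RND key for raceName.
    Canonicalize(raceName string) (canonical string, err error)

    // Check reports whether raceName is taken for actorUserID, and by whom.
    Check(ctx context.Context, raceName, actorUserID string) (Availability, error)

    // Reserve creates the per-game reservation on approve or redeem;
    // ReleaseReservation drops it (a no-op if absent or held by another user).
    Reserve(ctx context.Context, gameID, userID, raceName string) error
    ReleaseReservation(ctx context.Context, gameID, userID, raceName string) error

    // MarkPendingRegistration turns a reservation into a pending_registration
    // valid until eligibleUntil; ExpirePendingRegistrations releases lapsed ones.
    MarkPendingRegistration(
        ctx context.Context,
        gameID, userID, raceName string,
        eligibleUntil time.Time,
    ) error
    ExpirePendingRegistrations(ctx context.Context, now time.Time) ([]ExpiredPending, error)

    // Register upgrades a pending_registration into a permanent registered name.
    Register(ctx context.Context, gameID, userID, raceName string) error

    // The List* methods back the self-service race-name read for one user.
    ListRegistered(ctx context.Context, userID string) ([]RegisteredName, error)
    ListPendingRegistrations(ctx context.Context, userID string) ([]PendingRegistration, error)
    ListReservations(ctx context.Context, userID string) ([]Reservation, error)

    // ReleaseAllByUser clears every binding owned by userID (lifecycle cascade).
    ReleaseAllByUser(ctx context.Context, userID string) error
}

// Availability is the result of Check.
type Availability struct {
    Taken        bool
    HolderUserID string // "" when available
    Kind         string // "registered" | "reservation" | "pending_registration"
}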

Sentinel errors: ErrNameTaken, ErrInvalidName, ErrPendingMissing, ErrPendingExpired, ErrQuotaExceeded.

v1 backends

  • PostgreSQL (lobby/internal/adapters/postgres/racenamedir/directory.go) — the production adapter; one row per binding under lobby.race_names, transactional writes guarded by pg_advisory_xact_lock(hashtextextended(canonical_key, 0)). See docs/postgres-migration.md §6B for the full schema and decision record.
  • Stub (lobby/internal/adapters/racenamestub/directory.go) — in-process implementation for unit tests that do not need PostgreSQL. Chosen by LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub.

A future dedicated Race Name Service replaces the adapter without changing the domain or service layer.

Reservation lifecycle and capability

  1. approveapplication / redeeminvite → Reserve(game_id, user_id, race_name).
  2. removemember before start → ReleaseReservation.
  3. removemember / blockmember after start → reservation kept; resolved at game_finished.
  4. On game_finished the capability evaluator runs per active membership:
    • capable = max_planets > initial_planets AND max_population > initial_population, using the per-game stats aggregate (see §Runtime Snapshot);
    • capable ⇒ MarkPendingRegistration(..., finished_at + 30 days) + lobby.race_name.registration_eligible;
    • not capable ⇒ ReleaseReservation + optional lobby.race_name.registration_denied.
  5. The pending-registration worker (LOBBY_RACE_NAME_EXPIRATION_INTERVAL) releases expired entries.

Registration flow

lobby.race_name.register → POST /api/v1/lobby/race-names/register:

  • actor is the authenticated user;
  • body: {race_name, source_game_id};
  • preconditions:
    • pending_registration exists for (source_game_id, user_id, canonical_key) with eligible_until > now;
    • UserService.GetEligibility snapshot: no permanent_block, current_registered_count < max_registered_race_names (a snapshot value of 0 denotes unlimited);
  • commit: RND.Register atomically deletes the pending entry, creates a registered entry, and publishes lobby.race_name.registered.

Errors: race_name_registration_quota_exceeded, race_name_pending_window_expired, subject_not_found, forbidden.

Self-service reads

lobby.race_names.list → GET /api/v1/lobby/my/race-names returns the acting user's {registered[], pending[], reservations[]} using the user_registered / user_reservations indexes (no full scan).

The response shape is fixed by api/public-openapi.yaml and carries:

  • registered[]: canonical_key, race_name, source_game_id, registered_at_ms;
  • pending[]: canonical_key, race_name, source_game_id (the game whose capable finish promoted the reservation), reserved_at_ms, eligible_until_ms;
  • reservations[]: canonical_key, race_name, game_id, reserved_at_ms, game_status (current game.Status of the hosting game, joined on read).

Each slice is sorted ascending by its time field with canonical_key as the tie-breaker so the wire output is stable. The endpoint is exclusively self-service: there is no ?user_id= parameter and no admin counterpart on the internal port. Visibility is enforced by the X-User-ID header alone.
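The stable ordering rule amounts to a two-key comparison. This sketch sorts a hypothetical ReservationEntry slice by reserved_at_ms with canonical_key as the tie-breaker; the other slices would follow the same pattern with their own time field.

```go
package main

import "sort"

// ReservationEntry is a hypothetical subset of a reservations[] item.
type ReservationEntry struct {
	CanonicalKey string
	ReservedAtMs int64
}

// sortReservations orders by reserved_at_ms, then canonical_key, so the wire
// output does not depend on storage iteration order.
func sortReservations(items []ReservationEntry) {
	sort.Slice(items, func(i, j int) bool {
		if items[i].ReservedAtMs != items[j].ReservedAtMs {
			return items[i].ReservedAtMs < items[j].ReservedAtMs
		}
		return items[i].CanonicalKey < items[j].CanonicalKey
	})
}
```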

Cascade release

Game Lobby consumes user:lifecycle_events through a dedicated worker. On user.lifecycle.permanent_blocked or user.lifecycle.deleted:

  • RND.ReleaseAllByUser(user_id) clears every registered, reservation, and pending entry owned by the user;
  • every active membership held by the user transitions to blocked. For each such membership in a third-party private game, a lobby.membership.blocked intent is published to the game owner;
  • every outstanding submitted application authored by the user is rejected;
  • every created invite where the user is invitee or inviter transitions to revoked;
  • every non-terminal game owned by the user transitions to cancelled via the external_block trigger. For in-flight games (starting, running, paused) a stop-job is published to Runtime Manager before the status transition.

Synchronous guard: lobby.invite.redeem calls UserService.GetEligibility for both the inviter and the invitee. If either party has been permanently blocked or soft-deleted, the redeem fails with subject_not_found, matching the «as if the invite never existed» semantic even before the cascade flips the invite to revoked.

Retry and release semantics

  • Reserve is idempotent for the same holder under the same game. A second call returns no error so that approveapplication and redeeminvite retries after transient upstream failures stay safe.
  • ReleaseReservation is a no-op when no reservation exists for the tuple and also when the reservation belongs to a different user. Defensive release paths (rejectapplication, revokeinvite, declineinvite) never surface an error.
  • Register is idempotent only for the same (game_id, user_id, race_name) tuple — repeated calls after success return the same registered record without consuming additional quota.
  • MarkPendingRegistration is idempotent when called with the same eligible_until; re-emitting it with a different timestamp returns ErrInvalidName.

Game Start Flow

The start sequence spans three services and must be treated as a distributed transaction with explicit failure handling.

sequenceDiagram
    participant Admin as Admin or Private Owner
    participant Lobby
    participant Runtime
    participant GM as Game Master
    participant Redis

    Admin->>Lobby: lobby.game.start
    Lobby->>Lobby: validate ready_to_start + roster
    Lobby->>Lobby: status → starting
    Lobby->>Redis: publish start job to runtime:start_jobs
    Runtime->>Runtime: start container
    Runtime->>Redis: publish result to runtime:job_results

    alt container start failed
        Lobby->>Lobby: status → start_failed
    else container started
        Lobby->>Lobby: persist runtime binding
        Lobby->>GM: POST /internal/games/{game_id}/register (sync)
        alt GM registration success
            GM-->>Lobby: 200 OK
            Lobby->>Lobby: status → running; set started_at
        else GM unavailable
            GM-->>Lobby: error / timeout
            Lobby->>Lobby: status → paused
            Lobby->>Redis: publish lobby.runtime_paused_after_start intent
        end
    end

Critical invariants

  • If the container starts but Lobby cannot persist the runtime binding metadata, the start is a full failure: Lobby must issue a stop job to Runtime Manager before setting start_failed.
  • If metadata is persisted but Game Master is unavailable, the game must be placed in paused, not in start_failed. The container is alive; only the platform tracking is incomplete.
  • No start job is accepted while the game is not in ready_to_start.
  • Concurrent start attempts for the same game must be serialized; the second attempt must fail if the first already moved the game to starting.

Paused State

Lobby's paused status is a platform-level pause, distinct from Game Master runtime failure states. Two paths lead to paused:

Voluntary pause

Admin or owner issues lobby.game.pause while the game is running. Resume is issued with lobby.game.resume; Lobby performs a synchronous liveness check against Game Master before transitioning back to running.

Forced pause (GM unavailable after start)

If the game start sequence succeeds at the runtime layer but Game Master registration fails, Lobby transitions to paused and publishes lobby.runtime_paused_after_start to administrators.

Administrators investigate, restore Game Master, and issue lobby.game.resume through the internal admin surface.

Game Finish Flow

Game Master publishes a game_finished event to the GM events Redis Stream when the engine reports that the game has ended.

Lobby consumes this event and, before advancing the stream offset:

  • transitions game status to finished
  • sets finished_at to the event timestamp
  • updates the denormalized runtime snapshot with the final values
  • runs the capability evaluator against every active membership:
    • capable = max_planets > initial_planets AND max_population > initial_population from the per-member stats aggregate
    • capable ⇒ RND.MarkPendingRegistration(game_id, user_id, race_name, finished_at + 30 days) and publish lobby.race_name.registration_eligible
    • not capable ⇒ RND.ReleaseReservation(game_id, user_id, race_name) and (optional) publish lobby.race_name.registration_denied
  • resolves outstanding reservations on removed and blocked memberships by calling RND.ReleaseReservation (post-start remove/block keeps the reservation alive specifically so capability evaluation resolves it here)
  • deletes the per-game stats aggregate

The game_finished event from Game Master is the sole trigger for the finished status. Lobby does not independently decide that a game is finished. Capability evaluation must be idempotent: a replayed game_finished event must not produce additional RND side effects or notifications.

Runtime Snapshot

Game Lobby stores a denormalized runtime snapshot on the game record to prevent fan-out reads to Game Master on every user-facing list or detail request, and aggregates per-member stats to support capability evaluation at game finish.

Denormalized snapshot fields

| Field | Source |
|---|---|
| current_turn | GM event runtime_snapshot_update |
| runtime_status | GM event runtime_snapshot_update |
| engine_health_summary | GM event runtime_snapshot_update |

Per-member stats aggregate

Each runtime_snapshot_update carries a player_turn_stats array with one entry per active member: {user_id, planets, population, ships_built}. Lobby aggregates these in lobby:game_turn_stats:<game_id>:<user_id> with the shape {initial_planets, initial_population, initial_ships_built, max_planets, max_population, max_ships_built}.

Rules:

  • initial_* values are frozen from the first event after starting → running; later events must not change them.
  • max_* values are maintained by max-semantic update; they never decrease.
  • the aggregate is read once by the capability evaluator at game_finished and then deleted.
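The aggregate rules map onto a small fold over player_turn_stats entries. The Initialized flag is an assumption (the documented shape has no such field), and the caller is assumed to start applying entries only after the starting → running transition, so the first applied entry supplies the initial_* values.

```go
package main

// PlayerTurnStats is one entry of the player_turn_stats array.
type PlayerTurnStats struct {
	UserID                          string
	Planets, Population, ShipsBuilt int64
}

// TurnStatsAggregate mirrors lobby:game_turn_stats:<game_id>:<user_id>.
type TurnStatsAggregate struct {
	Initialized                                          bool // assumed helper flag
	InitialPlanets, InitialPopulation, InitialShipsBuilt int64
	MaxPlanets, MaxPopulation, MaxShipsBuilt             int64
}

func max64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// Apply folds one entry: the first call freezes initial_*, every call raises
// max_* monotonically so they never decrease.
func (a *TurnStatsAggregate) Apply(s PlayerTurnStats) {
	if !a.Initialized {
		a.Initialized = true
		a.InitialPlanets, a.InitialPopulation, a.InitialShipsBuilt = s.Planets, s.Population, s.ShipsBuilt
		a.MaxPlanets, a.MaxPopulation, a.MaxShipsBuilt = s.Planets, s.Population, s.ShipsBuilt
		return
	}
	a.MaxPlanets = max64(a.MaxPlanets, s.Planets)
	a.MaxPopulation = max64(a.MaxPopulation, s.Population)
	a.MaxShipsBuilt = max64(a.MaxShipsBuilt, s.ShipsBuilt)
}
```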

Update mechanism

Game Master publishes events to a dedicated Redis Stream consumed by Lobby:

  • runtime_snapshot_update: carries updated current_turn, runtime_status, engine_health_summary, and player_turn_stats; Lobby applies a compare-and-swap update on the game record plus a stats aggregate upsert.
  • game_finished: carries final snapshot values and signals the finish transition; the capability evaluator (see §Game Finish Flow) runs before the stream offset is advanced.

Lobby does not expose the runtime snapshot update as an internal HTTP endpoint. All snapshot updates are asynchronous and delivered through the stream.

Public vs Private Game Rules

Public games

  • created and controlled by system administrators through the internal admin surface
  • visible in the public game list when in enrollment_open, ready_to_start, running, or finished status
  • draft public games are not visible to non-admin users
  • players join through the application flow; admission requires admin approval
  • turn schedule and engine version are set by the administrator

Private games

  • created only by eligible paid users whose User Service eligibility snapshot carries can_create_private_game=true and whose max_owned_private_games limit allows it
  • visible only to the owner and to users who have an active membership or a non-expired invite
  • draft private games are visible only to the owner
  • players join through the invite flow; invite redemption creates active membership immediately without further owner approval
  • owner manages invites, turn schedule, and engine version

Owner-Admin Capabilities

Private-game owners have a limited owner-admin capability set over their own games only:

  • open enrollment (draft → enrollment_open)
  • create and revoke invites
  • manually close enrollment (enrollment_open → ready_to_start)
  • start the game (ready_to_start → starting)
  • pause and resume the game (running → paused, paused → running)
  • retry start or cancel after start_failed
  • remove or block members
  • cancel the game (from draft, enrollment_open, ready_to_start, start_failed)

Owners do not have system-admin power. They cannot see or operate on other users' private games. They cannot approve or reject applications (applications are public-game only).

Trusted Surfaces

Public authenticated REST (gateway-facing)

All user-facing commands arrive through Edge Gateway. Gateway verifies the authenticated session, transcodes the FlatBuffers command to a trusted REST call, and forwards it to Lobby on the public port.

Gateway enriches each request with the authenticated user_id via the X-User-ID header. Lobby must never derive the acting user from the request payload.

Message type catalog

message_type Method Path Actor
lobby.game.create POST /api/v1/lobby/games admin (public), eligible user (private)
lobby.game.update PATCH /api/v1/lobby/games/{game_id} admin or owner; draft only
lobby.game.get GET /api/v1/lobby/games/{game_id} any authenticated user (visibility rules apply)
lobby.games.list GET /api/v1/lobby/games any authenticated user
lobby.game.open_enrollment POST /api/v1/lobby/games/{game_id}/open-enrollment admin or owner
lobby.game.ready_to_start POST /api/v1/lobby/games/{game_id}/ready-to-start admin or owner
lobby.game.start POST /api/v1/lobby/games/{game_id}/start admin or owner
lobby.game.pause POST /api/v1/lobby/games/{game_id}/pause admin or owner
lobby.game.resume POST /api/v1/lobby/games/{game_id}/resume admin or owner
lobby.game.cancel POST /api/v1/lobby/games/{game_id}/cancel admin or owner
lobby.game.retry_start POST /api/v1/lobby/games/{game_id}/retry-start admin or owner
lobby.application.submit POST /api/v1/lobby/games/{game_id}/applications authenticated user
lobby.application.approve POST /api/v1/lobby/games/{game_id}/applications/{application_id}/approve admin
lobby.application.reject POST /api/v1/lobby/games/{game_id}/applications/{application_id}/reject admin
lobby.invite.create POST /api/v1/lobby/games/{game_id}/invites private-game owner
lobby.invite.redeem POST /api/v1/lobby/games/{game_id}/invites/{invite_id}/redeem invited user
lobby.invite.decline POST /api/v1/lobby/games/{game_id}/invites/{invite_id}/decline invited user
lobby.invite.revoke POST /api/v1/lobby/games/{game_id}/invites/{invite_id}/revoke private-game owner
lobby.membership.remove POST /api/v1/lobby/games/{game_id}/memberships/{membership_id}/remove admin or owner
lobby.membership.block POST /api/v1/lobby/games/{game_id}/memberships/{membership_id}/block admin or owner
lobby.memberships.list GET /api/v1/lobby/games/{game_id}/memberships admin, owner, or active member
lobby.my_games.list GET /api/v1/lobby/my/games authenticated user
lobby.my_applications.list GET /api/v1/lobby/my/applications authenticated user
lobby.my_invites.list GET /api/v1/lobby/my/invites authenticated user
lobby.race_name.register POST /api/v1/lobby/race-names/register authenticated user
lobby.race_names.list GET /api/v1/lobby/my/race-names authenticated user

Internal trusted REST (internal-facing)

The internal port is not reachable from the public internet. It is used by Game Master for the synchronous registration call and by the administrative backend for admin-only operations.

Key internal endpoints:

Method Path Purpose
GET /api/v1/internal/games/{game_id} game detail read for GM/admin
GET /api/v1/internal/games/{game_id}/memberships full membership list for GM
GET /api/v1/internal/healthz health probe
GET /api/v1/internal/readyz readiness probe

Note: the registration call from Lobby to Game Master after a successful container start is outgoing — Lobby calls POST /api/v1/internal/games/{game_id}/register-runtime on Game Master's internal port. Lobby does not expose an inbound register-runtime endpoint.

Admin-only operations (approve, reject, cancel, create public games, etc.) are also exposed on the internal port and are intended to be called by Admin Service after it enforces the system-admin role check at the gateway boundary.

User-Facing Lists

My active games

Returns games where the authenticated user has an active membership and the game status is running or paused. Response includes the denormalized runtime snapshot.

My pending applications

Returns applications submitted by the authenticated user with status submitted. Includes game name and type for display.

My open invitations

Returns invites addressed to the authenticated user with status created. Includes game name, inviter name, and expires_at.

Public game list

Paginated list of public games with status in enrollment_open, ready_to_start, running, or finished. Games in draft or cancelled are excluded. Default order: enrollment_open and ready_to_start first, then running, then finished (most recent first within each group).

Visibility rules

  • private draft games: visible only to the owner
  • private non-draft games: visible only to the owner and users with active membership or non-expired invite
  • public draft games: visible only to system administrators
  • public non-draft games: visible in the public list
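The four rules reduce to a small predicate. The sketch below is hypothetical. GameInfo and Viewer would in practice be derived from the game, membership, and invite records. It treats a public non-draft game as readable by any authenticated user, which is an assumption; the public list additionally drops cancelled games, and that filter is separate from this check.

```go
package main

import "fmt"

// GameInfo and Viewer are hypothetical inputs for the visibility check.
type GameInfo struct {
	Type    string // "public" or "private"
	Status  string
	OwnerID string
}

type Viewer struct {
	UserID       string
	IsAdmin      bool // system administrator
	ActiveMember bool // has an active membership in this game
	LiveInvite   bool // holds a non-expired invite to this game
}

// canView applies the visibility rules to a single game.
func canView(g GameInfo, v Viewer) bool {
	switch g.Type {
	case "private":
		if v.UserID == g.OwnerID {
			return true
		}
		if g.Status == "draft" {
			return false // private drafts are owner-only
		}
		return v.ActiveMember || v.LiveInvite
	case "public":
		if g.Status == "draft" {
			return v.IsAdmin // public drafts are admin-only
		}
		return true
	default:
		return false
	}
}

func main() {
	g := GameInfo{Type: "private", Status: "running", OwnerID: "owner-1"}
	fmt.Println(canView(g, Viewer{UserID: "user-2", ActiveMember: true})) // true
	fmt.Println(canView(g, Viewer{UserID: "user-3"}))                     // false
}
```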

Notification Contracts

Game Lobby publishes normalized notification intents to notification:intents using the galaxy/notificationintent producer module.

Trigger notification_type Audience Channels
Application submitted (public game) lobby.application.submitted configured admin email list email
Application approved lobby.membership.approved applicant user push+email
Application rejected lobby.membership.rejected applicant user push+email
Cascade membership block (permanent_block/DeleteUser) lobby.membership.blocked private-game owner push+email
Invite created (private game) lobby.invite.created invited user push+email
Invite redeemed (private game) lobby.invite.redeemed private-game owner push+email
Invite expired (on enrollment close) lobby.invite.expired private-game owner email
GM unavailable after start (forced pause) lobby.runtime_paused_after_start configured admin email list email
Race name eligible for registration lobby.race_name.registration_eligible capable member push+email
Race name successfully registered lobby.race_name.registered registering user push+email
Race name registration denied (capability) lobby.race_name.registration_denied incapable member email

Rules:

  • intents carry explicit recipient_user_id values; Lobby resolves recipients before publishing rather than delegating audience resolution to Notification Service
  • a failed intent publication is a notification degradation and must not roll back already committed business state
  • lobby.invite.revoked and lobby.invite.declined produce no notification in v1
  • lobby.application.submitted is published only for public games; the private-game owner-targeting path defined in the notification catalog is reserved for future use

Domain Events

Game Lobby publishes auxiliary post-commit domain events to the Redis stream configured for lobby domain events.

Frozen event types:

  • lobby.game.created
  • lobby.game.status_changed
  • lobby.membership.activated
  • lobby.membership.removed
  • lobby.membership.blocked

Event rules:

  • events are post-commit only; they are not emitted on failed operations
  • event envelopes carry game_id, optional user_id, occurrence timestamp, new status (for status_changed), and optional trace correlation
  • domain events are observability and downstream-read-model artifacts; they must not carry full business state payloads

Error Model

The trusted internal REST contract uses strict JSON error envelopes:

{
  "error": {
    "code": "invalid_request",
    "message": "request is invalid"
  }
}

Stable error codes:

  • invalid_request — malformed input or failed validation
  • conflict — state transition not allowed from current status
  • subject_not_found — game, application, invite, membership, or pending race-name registration not found
  • eligibility_denied — user not eligible per User Service
  • name_taken — race_name already registered, reserved, or pending for another user
  • race_name_registration_quota_exceeded — user's max_registered_race_names slot is full
  • race_name_pending_window_expired — the 30-day registration window has passed for the pending entry
  • race_name_capability_not_met — capability condition not satisfied at game finish (reservation released)
  • race_name_permanent_blocked — the user carries an active permanent_block sanction
  • forbidden — caller is not authorized for this operation on this game or this race name
  • internal_error — unexpected service error
  • service_unavailable — upstream dependency unavailable

Configuration

Required

  • LOBBY_REDIS_MASTER_ADDR
  • LOBBY_REDIS_PASSWORD
  • LOBBY_POSTGRES_PRIMARY_DSN
  • LOBBY_USER_SERVICE_BASE_URL
  • LOBBY_GM_BASE_URL

Configuration groups

Process and logging:

  • LOBBY_SHUTDOWN_TIMEOUT with default 30s
  • LOBBY_LOG_LEVEL with default info

Public HTTP:

  • LOBBY_PUBLIC_HTTP_ADDR with default :8094
  • LOBBY_PUBLIC_HTTP_READ_HEADER_TIMEOUT with default 2s
  • LOBBY_PUBLIC_HTTP_READ_TIMEOUT with default 10s
  • LOBBY_PUBLIC_HTTP_IDLE_TIMEOUT with default 1m

Internal HTTP:

  • LOBBY_INTERNAL_HTTP_ADDR with default :8095
  • LOBBY_INTERNAL_HTTP_READ_HEADER_TIMEOUT with default 2s
  • LOBBY_INTERNAL_HTTP_READ_TIMEOUT with default 10s
  • LOBBY_INTERNAL_HTTP_IDLE_TIMEOUT with default 1m

Redis connectivity:

  • LOBBY_REDIS_MASTER_ADDR (required)
  • LOBBY_REDIS_REPLICA_ADDRS (optional, comma-separated; not consumed yet)
  • LOBBY_REDIS_PASSWORD (required)
  • LOBBY_REDIS_DB (default 0)
  • LOBBY_REDIS_OPERATION_TIMEOUT (default 250ms)

The legacy LOBBY_REDIS_ADDR, LOBBY_REDIS_USERNAME, and LOBBY_REDIS_TLS_ENABLED env vars were retired in PG_PLAN.md §6A; setting either of the latter two now fails fast at startup. See ARCHITECTURE.md §Persistence Backends for the architectural rules.

PostgreSQL connectivity (PG_PLAN.md §6A and §6B; durable game / application / invite / membership records and the Race Name Directory live here):

  • LOBBY_POSTGRES_PRIMARY_DSN (required; e.g. postgres://lobbyservice:secret@postgres:5432/galaxy?search_path=lobby&sslmode=disable)
  • LOBBY_POSTGRES_REPLICA_DSNS (optional, comma-separated; not consumed yet)
  • LOBBY_POSTGRES_OPERATION_TIMEOUT (default 1s)
  • LOBBY_POSTGRES_MAX_OPEN_CONNS (default 25)
  • LOBBY_POSTGRES_MAX_IDLE_CONNS (default 5)
  • LOBBY_POSTGRES_CONN_MAX_LIFETIME (default 30m)

Stream names:

  • LOBBY_GM_EVENTS_STREAM with default gm:lobby_events
  • LOBBY_GM_EVENTS_READ_BLOCK_TIMEOUT with default 2s
  • LOBBY_RUNTIME_START_JOBS_STREAM with default runtime:start_jobs
  • LOBBY_RUNTIME_STOP_JOBS_STREAM with default runtime:stop_jobs
  • LOBBY_RUNTIME_JOB_RESULTS_STREAM with default runtime:job_results
  • LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT with default 2s
  • LOBBY_NOTIFICATION_INTENTS_STREAM with default notification:intents

Upstream clients:

  • LOBBY_USER_SERVICE_TIMEOUT with default 1s
  • LOBBY_GM_TIMEOUT with default 5s

Enrollment automation:

  • LOBBY_ENROLLMENT_AUTOMATION_INTERVAL with default 30s

Race Name Directory:

  • LOBBY_RACE_NAME_DIRECTORY_BACKEND with default postgres (alternate: stub for in-process tests; PG_PLAN.md §6B retired the redis backend)
  • LOBBY_RACE_NAME_EXPIRATION_INTERVAL with default 1h — pending registration expiration worker tick

The 30-day eligibility window for pending_registration entries is the constant service/capabilityevaluation.PendingRegistrationWindow. It is intentionally not operator-tunable today; the env var name LOBBY_PENDING_REGISTRATION_TTL_HOURS is reserved for a future change.

User lifecycle:

  • LOBBY_USER_LIFECYCLE_STREAM with default user:lifecycle_events
  • LOBBY_USER_LIFECYCLE_READ_BLOCK_TIMEOUT with default 2s

OpenTelemetry:

  • standard OTEL_* variables
  • LOBBY_OTEL_STDOUT_TRACES_ENABLED
  • LOBBY_OTEL_STDOUT_METRICS_ENABLED

Persistence Layout

Game / application / invite / membership records live in PostgreSQL after PG_PLAN.md §6A; the Race Name Directory followed in §6B. See docs/postgres-migration.md for the schema and decision records. The lobby schema owns five tables (games, applications, invites, memberships, race_names) plus two partial UNIQUE indexes: one on applications(applicant_user_id, game_id) WHERE status <> 'rejected', which enforces at most one non-rejected application per user per game, and one on race_names(canonical_key) WHERE binding_kind = 'registered', which enforces at most one registered binding per canonical key.

The Redis-backed keys below survive both stages. Redis owns the runtime-coordination state — per-game runtime aggregates, gap activation, capability-evaluation guards, and stream consumer offsets — plus the event-bus streams themselves.

Redis key table

Storage rules for Redis:

  • timestamps are stored in Unix milliseconds unless noted otherwise
  • dynamic key segments are base64url-encoded

Logical artifact Redis key
per-game per-user stats aggregate lobby:game_turn_stats:<game_id>:<user_id> → JSON aggregate
per-game stats user index lobby:game_turn_stats_by_game:<game_id> (set of user_id)
capability-evaluation guard lobby:capability_evaluation:done:<game_id> (sentinel string)
GM event stream offset lobby:stream_offsets:gm_events
runtime job result offset lobby:stream_offsets:runtime_results
user lifecycle stream offset lobby:stream_offsets:user_lifecycle
gap window activation time lobby:gap_activated_at:<game_id>

Frozen record fields

The five durable records are stored in PostgreSQL columns; the field set per record is unchanged from the previous Redis JSON shape and is documented inline with the migration scripts under internal/adapters/postgres/migrations/.

Record Frozen fields
game record all game fields listed in Game Record Model section
application record application_id, game_id, applicant_user_id, race_name, status, created_at, decided_at
invite record invite_id, game_id, inviter_user_id, invitee_user_id, race_name (set at redeem), status, created_at, expires_at, decided_at
membership record all membership fields listed in Membership Model section
race_names row canonical_key, game_id, holder_user_id, race_name, binding_kind, source_game_id, reserved_at_ms, eligible_until_ms (pending only), registered_at_ms (registered only)

Observability

Metrics

  • lobby.game.transitions — counter; attributes: from_status, to_status, trigger (command, manual, deadline, gap, runtime_event, external_block)
  • lobby.application.outcomes — counter; attributes: outcome (submitted, approved, rejected)
  • lobby.invite.outcomes — counter; attributes: outcome (created, redeemed, declined, revoked, expired)
  • lobby.membership.changes — counter; attributes: change (activated, removed, blocked, external_block)
  • lobby.start_flow.outcomes — counter; attributes: outcome (running, paused, start_failed)
  • lobby.notification.publish_attempts — counter; attributes: notification_type, result (ok, error)
  • lobby.active_games — observable gauge; attributes: status
  • lobby.enrollment_automation.checks — counter; attributes: result (no_op, transitioned)
  • lobby.gm_events.oldest_unprocessed_age_ms — observable gauge
  • lobby.runtime_results.oldest_unprocessed_age_ms — observable gauge
  • lobby.user_lifecycle.oldest_unprocessed_age_ms — observable gauge
  • lobby.race_name.outcomes — counter; attributes: outcome (reserved, reservation_released, pending_created, pending_released, registered, registered_released)
  • lobby.pending_registration.expirations — counter; attributes: trigger (tick, manual)
  • lobby.user_lifecycle.cascade_releases — counter; attributes: event (permanent_blocked, deleted)
  • lobby.capability_evaluations — counter; attributes: result (capable, incapable, noop)

Metrics avoid high-cardinality attributes such as game_id, user_id, application_id, invite_id, and canonical_key.

Structured log fields

Key operations emit structured logs with these stable field names where applicable:

  • game_id
  • game_type
  • game_status
  • from_status
  • to_status
  • user_id
  • application_id
  • invite_id
  • membership_id
  • race_name
  • canonical_key
  • reservation_kind (reserved / pending_registration / registered)
  • eligible_until_ms
  • trigger
  • lifecycle_event
  • request_id
  • trace_id

Verification

Focused service-local coverage verifies:

  • configuration loading and validation for all env var groups
  • both HTTP listeners start and serve /healthz and /readyz
  • game CRUD: create, update, get, list with correct field validation
  • each status transition fires only from allowed source statuses
  • enrollment automation: deadline trigger, gap trigger, manual trigger
  • application flow: submit (eligibility check, race name check), approve, reject
  • invite flow: create, redeem (auto-membership), decline, revoke, expire on enrollment close
  • membership model: activate, remove, block with correct before/after-start semantics
  • Race Name Directory (postgres + stub adapters against the same suite): canonicalization + confusable-pair policy, Reserve/ReleaseReservation per-game semantics, MarkPendingRegistration/ExpirePendingRegistrations window, Register idempotency + quota, ReleaseAllByUser cascade
  • game start flow: success path (→ running), GM unavailable path (→ paused), container failure path (→ start_failed), metadata persistence failure path (container removed, → start_failed)
  • GM event stream consumer: snapshot update (stats aggregate), game_finished with capability evaluation
  • user lifecycle stream consumer: permanent_blocked and deleted cascade release + membership/application/invite settlement
  • pending-registration expiration worker idempotency
  • race name registration service: capability, tariff quota, pending window, idempotent retry
  • notification intent publication for every trigger listed in Notification Contracts
  • visibility rules: private game hidden from non-member non-owner users
  • error model: all stable codes returned for correct conditions

Cross-service coverage verifies:

  • Lobby → User Service eligibility check compatibility (including the new max_registered_race_names field) and failure handling
  • Lobby → Notification Service intent publication for all lobby notification types
  • Lobby → Runtime Manager start job publication and result consumption
  • Lobby → Game Master synchronous registration call (success and failure)
  • User Service → Lobby cascade flow: permanent_block or DeleteUser on a user leads to full RND release + memberships blocked + applications/invites cancelled