feat: gamemaster
+123
-28
@@ -417,9 +417,9 @@ It also stores a denormalized runtime snapshot for convenience, at least:
 * `engine_health_summary`.

 Additionally, `Game Lobby` aggregates per-member game statistics from
-`player_turn_stats` carried on each `runtime_snapshot_update` event: current
-and running-max of `planets`, `population`, and `ships_built`. The aggregate
-is retained from game start until capability evaluation at `game_finished`.
+`player_turn_stats` carried on each `runtime_snapshot_update` event:
+current and running-max of `planets` and `population`. The aggregate is
+retained from game start until capability evaluation at `game_finished`.

 This prevents user-facing list/read flows from fan-out requests into `Game Master`.

@@ -544,7 +544,7 @@ background worker.
 `RND.ReleaseAllByUser(user_id)` atomically with membership/application/invite
 cancellations for the affected user.

-## 8. Game Master
+## 8. [Game Master](gamemaster/README.md)

 `Game Master` owns runtime and operational metadata of already running games.

@@ -561,6 +561,40 @@ It owns:
 * engine version registry and version-specific engine options;
 * runtime mapping `platform user_id -> engine player UUID` for each running game.

+### Topology
+
+`Game Master` runs as a single process in v1. The in-process scheduler is
+authoritative; multi-instance with leader election is an explicit future
+iteration. Every other service that interacts with `Game Master`
+(`Edge Gateway`, `Game Lobby`, `Admin Service`, `Runtime Manager`) treats
+GM as a singleton on the trusted network segment.
+
+### Engine container contract
+
+`Game Master` is the only platform component that talks to the engine. The
+engine container exposes two route classes and a liveness probe:
+
+* admin paths under `/api/v1/admin/*` — `init`, `status`, `turn`, and
+`race/banish`. They are unauthenticated and reachable only inside the
+trusted network segment that connects GM to the engine container;
+* player paths under `/api/v1/{command, order, report}` — invoked by GM on
+behalf of an authenticated platform user; the actor field on each call
+is set by GM from the verified user identity, never from the inbound
+payload;
+* `GET /healthz` — liveness probe used by `Runtime Manager` and operator
+tooling.
+
+Two engine-side fields are part of the contract:
+
+* `StateResponse.finished:bool` — when `true` on a turn-generation
+response, GM transitions the runtime to `finished`, publishes
+`game_finished`, and dispatches the finish notification. The conditional
+logic that flips the flag lives in the engine's domain code and is not
+GM's concern;
+* `POST /api/v1/admin/race/banish` with body `{race_name}` — invoked by GM
+in response to the Lobby-driven banish flow after a permanent
+platform-level membership removal. The engine returns `204` on success.
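
The `StateResponse.finished` handling described above can be sketched roughly as follows. This is an illustration only, written in Python for brevity: `handle_turn_response`, the `runtime` dict, and the `publish` and `notify` callbacks are hypothetical names, and the event payload shape is an assumption, not the actual `game_finished` schema.

```python
from typing import Callable


def handle_turn_response(
    runtime: dict,
    state_response: dict,
    publish: Callable[[str, dict], None],
    notify: Callable[[str], None],
) -> bool:
    """Apply a turn-generation response; return True if the game finished."""
    # GM only reads the flag. The rules that set it live in the engine.
    if state_response.get("finished") is not True:
        return False
    # Terminal transition of the runtime record.
    runtime["status"] = "finished"
    # Assumed payload: only the game id is shown here.
    publish("game_finished", {"game_id": runtime["game_id"]})
    # Dispatch the finish notification (hypothetical callback).
    notify(runtime["game_id"])
    return True
```

The point of the sketch is that GM reacts to the flag without knowing why the engine set it; all three side effects hang off one boolean.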
+
 ### Game Master status model

 Minimum runtime-level status set:
@@ -571,8 +605,12 @@ Minimum runtime-level status set:
 * `generation_failed`
 * `stopped`
 * `engine_unreachable`
+* `finished`

-`running` here means `running_accepting_commands`.
+`running` here means `running_accepting_commands`. `finished` is terminal:
+the runtime record stays in this state indefinitely; no further turn
+generation, command, or order is accepted, and operator cleanup is the
+only path out.

 ### Game command routing

@@ -599,14 +637,25 @@ Private-game owner can use the subset allowed for the owner of that game.

 ### Turn cutoff and scheduling

-`Game Master` is the owner of authoritative platform time for turn cutoff decisions.
+`Game Master` is the owner of authoritative platform time for turn cutoff
+decisions.

-Commands arriving exactly on the boundary of a new turn are considered stale and must not reach the engine.
+The cutoff is enforced by a single status compare-and-swap: every player
+command, order, and report read requires `runtime_status=running` at the
+moment of the call, and turn generation begins by CAS-ing
+`running → generation_in_progress`. There is no separately tracked shadow
+window or grace period — the status transition itself is the boundary.
+Commands arriving after the CAS are rejected with `runtime_not_running`.
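
The cutoff rule above can be illustrated with a small in-memory sketch. It is not the service's implementation: the real status presumably lives in the `gamemaster` PostgreSQL schema, and the class and method names here (`RuntimeStatus`, `compare_and_swap`, `require_running`) are hypothetical. A lock stands in for whatever atomic conditional update the store provides.

```python
import threading


class RuntimeNotRunning(Exception):
    """Raised when a player call arrives while the runtime is not `running`."""


class RuntimeStatus:
    def __init__(self, status: str = "running") -> None:
        self._status = status
        self._lock = threading.Lock()

    def compare_and_swap(self, expected: str, new: str) -> bool:
        """Set the status to `new` only if it currently equals `expected`."""
        with self._lock:
            if self._status != expected:
                return False
            self._status = new
            return True

    def require_running(self) -> None:
        """Gate for every player command, order, and report read."""
        with self._lock:
            if self._status != "running":
                raise RuntimeNotRunning("runtime_not_running")


def begin_turn_generation(runtime: RuntimeStatus) -> bool:
    # The transition itself is the turn boundary; no grace window is tracked.
    return runtime.compare_and_swap("running", "generation_in_progress")
```

A call that passes `require_running` before the swap belongs to the old turn; any call after the swap is rejected, so no separate boundary bookkeeping is needed.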

-The scheduler is a subsystem inside `Game Master`.
-It triggers turn generation according to the game schedule.
+The scheduler is a subsystem inside `Game Master`. It triggers turn
+generation according to the game schedule.

-If a manual “force next turn” is executed, the next scheduled turn slot must be skipped so that players still get at least one full normal schedule interval before the following generated turn.
+If a manual `force next turn` is executed, the next scheduled turn slot
+must be skipped so that players still get at least one full normal
+schedule interval before the following generated turn. The skip is
+recorded as `runtime_records.skip_next_tick=true`; the scheduler advances
+`next_generation_at` by one extra cron step the next time it computes the
+tick and clears the flag.
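
The skip rule can be sketched as below. To keep the example self-contained, a fixed `interval` stands in for a cron step, which is a simplifying assumption; `RuntimeRecord` and `advance_schedule` are hypothetical names that only mirror the `skip_next_tick` and `next_generation_at` fields mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RuntimeRecord:
    next_generation_at: datetime
    skip_next_tick: bool = False


def advance_schedule(record: RuntimeRecord, now: datetime, interval: timedelta) -> datetime:
    """Compute the next tick strictly after `now`, honouring the skip flag."""
    next_tick = record.next_generation_at
    # Move forward to the first scheduled slot that is still in the future.
    while next_tick <= now:
        next_tick += interval
    if record.skip_next_tick:
        # One extra step: the slot right after a forced turn is skipped.
        next_tick += interval
        record.skip_next_tick = False
    record.next_generation_at = next_tick
    return next_tick


# Daily slot at 12:00 UTC, turn forced at 09:00 on the same day.
record = RuntimeRecord(
    next_generation_at=datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
    skip_next_tick=True,
)
forced_at = datetime(2024, 1, 1, 9, tzinfo=timezone.utc)
next_tick = advance_schedule(record, forced_at, timedelta(days=1))
```

In this example the 12:00 slot on the day of the forced turn is skipped, so the next generation lands a full interval later than it otherwise would.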

 ### Runtime snapshot publishing

@@ -615,16 +664,27 @@ consumed by `Game Lobby`. Events include:

 * `runtime_snapshot_update` — carries the current `current_turn`,
 `runtime_status`, `engine_health_summary`, and a `player_turn_stats` array
-with one entry per active member (`user_id`, `planets`, `population`,
-`ships_built`). `Game Lobby` maintains a per-game per-user stats aggregate
-from these events for capability evaluation at game finish.
+with one entry per active member (`user_id`, `planets`, `population`).
+`Game Lobby` maintains a per-game per-user stats aggregate from these
+events for capability evaluation at game finish.
 * `game_finished` — carries the final snapshot values and triggers the
 platform status transition plus Race Name Directory capability evaluation
 inside `Game Lobby`.

-`Game Master` does not retain the aggregate; it only publishes the per-turn
-observation. `Game Lobby` is responsible for holding initial values and
-running maxima across the lifetime of the game.
+Publication cadence is event-driven. GM publishes a snapshot when:
+
+* a turn was generated (success or failure);
+* `runtime_status` transitioned (e.g.,
+`running ↔ generation_in_progress`, `running → engine_unreachable`,
+`* → finished`);
+* `engine_health_summary` changed in response to a `runtime:health_events`
+observation; consecutive observations with identical summaries are
+debounced.
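
The debounce on `engine_health_summary` can be pictured as a per-game "last published value" check. The sketch below is an assumption about how such a filter might look; `HealthSummaryDebouncer` and the summary strings `healthy` and `degraded` are hypothetical and not taken from the service.

```python
class HealthSummaryDebouncer:
    """Remembers the last summary per game and suppresses repeats."""

    def __init__(self) -> None:
        self._last: dict[str, str] = {}

    def should_publish(self, game_id: str, summary: str) -> bool:
        """Return True only when the summary differs from the previous one."""
        if self._last.get(game_id) == summary:
            return False  # identical consecutive observation: debounced
        self._last[game_id] = summary
        return True


debouncer = HealthSummaryDebouncer()
decisions = [
    debouncer.should_publish("g1", s)
    for s in ["healthy", "healthy", "degraded", "degraded", "healthy"]
]
```

Only changes produce a snapshot, which matches the event-only cadence: a stream of identical health observations results in a single publication.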
+
+There is no periodic heartbeat. `Game Master` does not retain the
+aggregate; it only publishes the per-turn observation. `Game Lobby` is
+responsible for holding initial values and running maxima across the
+lifetime of the game.
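
A minimal sketch of the Lobby-side aggregate, assuming each `player_turn_stats` entry is a mapping with `user_id`, `planets`, and `population` as listed above. The class names and the in-memory storage are hypothetical; the actual persistence in `Game Lobby` is not described here.

```python
from dataclasses import dataclass


@dataclass
class MemberAggregate:
    initial_planets: int
    initial_population: int
    current_planets: int
    current_population: int
    max_planets: int
    max_population: int


class GameStatsAggregate:
    """Per-game aggregate keyed by user_id, fed by snapshot events."""

    def __init__(self) -> None:
        self.members: dict[str, MemberAggregate] = {}

    def apply(self, player_turn_stats: list[dict]) -> None:
        for entry in player_turn_stats:
            uid = entry["user_id"]
            planets, population = entry["planets"], entry["population"]
            agg = self.members.get(uid)
            if agg is None:
                # First observation fixes the initial values.
                self.members[uid] = MemberAggregate(
                    planets, population, planets, population, planets, population
                )
                continue
            agg.current_planets = planets
            agg.current_population = population
            agg.max_planets = max(agg.max_planets, planets)
            agg.max_population = max(agg.max_population, population)


aggregate = GameStatsAggregate()
aggregate.apply([{"user_id": "u1", "planets": 3, "population": 100}])
aggregate.apply([{"user_id": "u1", "planets": 5, "population": 90}])
aggregate.apply([{"user_id": "u1", "planets": 4, "population": 120}])
```

Because GM publishes only the per-turn observation, everything historical (initial values and maxima) has to be derived on the consumer side, as the sketch does.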

 ### Runtime/engine finish flow

@@ -847,13 +907,17 @@ requests for no operational benefit.
 * `Gateway -> Admin Service`
 * `Gateway -> User Service`
 * `Gateway -> Game Lobby`
-* `Gateway -> Game Master`
+* `Gateway -> Game Master` for verified player command, order, and report
+calls;
 * `Auth / Session Service -> User Service`
 * `Auth / Session Service -> Mail Service`
 * `Geo Profile Service -> Auth / Session Service`
 * `Geo Profile Service -> User Service`
 * `Game Lobby -> User Service`
-* `Game Lobby -> Game Master` for critical registration/update calls
+* `Game Lobby -> Game Master` for `register-runtime` after a successful
+container start, engine-version `image-ref` resolve, membership
+invalidation hook, banish, and the liveness reply consumed by Lobby's
+resume flow;
 * `Game Master -> Runtime Manager` for inspect, restart, patch, stop, and cleanup REST calls
 * `Admin Service -> Runtime Manager` for operational inspect, restart, patch, stop, and cleanup REST calls

@@ -864,11 +928,15 @@ requests for no operational benefit.
 * `Lobby -> Runtime Manager` runtime jobs through `runtime:start_jobs` (`{game_id, image_ref, requested_at_ms}`) and `runtime:stop_jobs` (`{game_id, reason, requested_at_ms}`);
 * `Runtime Manager -> Lobby` job outcomes through `runtime:job_results`;
 * `Runtime Manager -> Notification Service` admin-only failure intents (image pull, container start, start config) through `notification:intents`;
-* `Runtime Manager` outbound technical health stream `runtime:health_events` consumed by `Game Master`; `Game Lobby` and `Admin Service` are reserved as future consumers;
+* `Runtime Manager` outbound technical health stream `runtime:health_events`
+consumed by `Game Master`; `Game Lobby` and `Admin Service` are reserved
+as future consumers;
 * all event-bus propagation;
 * `Game Master -> Game Lobby` runtime snapshot updates (including
 `player_turn_stats` for capability aggregation) and game-finish events
-through a dedicated Redis Stream consumed by `Game Lobby`;
+through the `gm:lobby_events` Redis Stream consumed by `Game Lobby`,
+published event-only with no periodic heartbeat (turn generation,
+status transition, or debounced engine-health summary change);
 * `User Service -> Game Lobby` user lifecycle events
 (`user.lifecycle.permanent_blocked`, `user.lifecycle.deleted`) through the
 `user:lifecycle_events` Redis Stream, consumed by `Game Lobby` to cascade
@@ -908,6 +976,10 @@ PostgreSQL is the source of truth for table-shaped business state:
 registry (registered/reservation/pending tiers);
 * runtime manager runtime records (`game_id -> current_container_id`),
 per-operation audit log, and latest health snapshot per game;
+* game master runtime records (`game_id -> engine_endpoint`,
+status/turn/scheduling), the engine version registry (`engine_versions`),
+per-game player mappings (`game_id, user_id -> race_name,
+engine_player_uuid`), and the GM operation log;
 * idempotency records, expressed as `UNIQUE` constraints on the durable
 table — not as a separate kv;
 * retry scheduling state, expressed as a `next_attempt_at` column on the
@@ -931,9 +1003,9 @@ Redis is the source of truth for ephemeral and runtime-coordination state:
 ### Database topology

 * Single PostgreSQL database `galaxy`.
-* Schema per service: `user`, `mail`, `notification`, `lobby`, `rtmanager`.
-Reserved for future use: `geoprofile`. Not allocated unless needed:
-`gateway`, `authsession`.
+* Schema per service: `user`, `mail`, `notification`, `lobby`, `rtmanager`,
+`gamemaster`. Reserved for future use: `geoprofile`. Not allocated unless
+needed: `gateway`, `authsession`.
 * Each service connects with its own PostgreSQL role whose grants are
 restricted to its own schema (defense-in-depth).
 * Authentication is username + password only. `sslmode=disable`. No client
@@ -1012,7 +1084,8 @@ crossing the SQL boundary carry `time.UTC` as their location.
 ### Configuration

 For each service `<S>` ∈ { `USERSERVICE`, `MAIL`, `NOTIFICATION`,
-`LOBBY`, `RTMANAGER`, `GATEWAY`, `AUTHSESSION` }, the Redis connection accepts:
+`LOBBY`, `RTMANAGER`, `GAMEMASTER`, `GATEWAY`, `AUTHSESSION` }, the Redis
+connection accepts:

 * `<S>_REDIS_MASTER_ADDR` (required)
 * `<S>_REDIS_REPLICA_ADDRS` (optional, comma-separated)
@@ -1020,7 +1093,7 @@ For each service `<S>` ∈ { `USERSERVICE`, `MAIL`, `NOTIFICATION`,
 * `<S>_REDIS_DB`, `<S>_REDIS_OPERATION_TIMEOUT`

 For PG-backed services (`USERSERVICE`, `MAIL`, `NOTIFICATION`, `LOBBY`,
-`RTMANAGER`) the Postgres connection accepts:
+`RTMANAGER`, `GAMEMASTER`) the Postgres connection accepts:

 * `<S>_POSTGRES_PRIMARY_DSN` (required;
 `postgres://<role>:<pwd>@<host>:5432/galaxy?search_path=<schema>&sslmode=disable`)
@@ -1384,7 +1457,17 @@ Rules:
 * upgrade during a running game is allowed only as a patch update within the same major/minor line;
 * game-engine version management is manual in v1;
 * each engine version may carry version-specific engine options;
-* `Game Master` owns the engine version registry and its internal API.
+* `Game Master` owns the engine version registry from v1 — `(version,
+image_ref, options, status)` rows live in the `gamemaster` schema and
+are managed exclusively through GM's internal REST surface;
+* `Game Lobby` resolves `image_ref` synchronously through GM at game start
+by calling `GET /api/v1/internal/engine-versions/{version}/image-ref`;
+`LOBBY_ENGINE_IMAGE_TEMPLATE` and any Lobby-side template-based
+resolution are removed without a backward-compat shim. If GM is
+unavailable when Lobby attempts the resolve, the start fails with
+`service_unavailable` and `runtime:start_jobs` is never published;
+* `Runtime Manager` continues to receive a verbatim `image_ref` from the
+start envelope and never resolves engine versions itself.
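
The resolve-then-publish ordering in the rules above can be sketched like this. The function and callback names (`start_game`, `resolve_image_ref`, `publish_start_job`) are hypothetical, and the use of `ConnectionError` to signal an unavailable GM is an assumption; only the `{game_id, image_ref, requested_at_ms}` envelope and the `service_unavailable` outcome come from the text.

```python
from typing import Callable


def start_game(
    game_id: str,
    version: str,
    resolve_image_ref: Callable[[str], str],
    publish_start_job: Callable[[dict], None],
    now_ms: int,
) -> dict:
    """Resolve the image through GM first; publish the job only on success."""
    try:
        # Stands in for GET /api/v1/internal/engine-versions/{version}/image-ref.
        image_ref = resolve_image_ref(version)
    except ConnectionError:
        # GM unavailable: fail the start and never touch runtime:start_jobs.
        return {"ok": False, "error": "service_unavailable"}
    # Runtime Manager receives the image_ref verbatim in the start envelope.
    publish_start_job(
        {"game_id": game_id, "image_ref": image_ref, "requested_at_ms": now_ms}
    )
    return {"ok": True, "image_ref": image_ref}
```

Keeping the resolve strictly before the publish is what guarantees that a GM outage cannot leave a start job in the stream without a resolved image.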

 ## Administrative Access Model

@@ -1457,7 +1540,7 @@ Recommended order for implementation is:
 6. **Game Lobby Service** (implemented)
 Platform game records, membership, invites, applications, approvals, schedules, user-facing lists, pre-start lifecycle.

-7. **Runtime Manager**
+7. **Runtime Manager** (implemented)
 Dedicated Docker-control service for container lifecycle (start, stop,
 restart, semver-patch, cleanup) and inspect/health monitoring through
 Docker events, periodic inspect, and active HTTP probes. Driven
@@ -1466,7 +1549,19 @@ Recommended order for implementation is:
 `Admin Service` via the trusted internal REST surface.

 8. **Game Master**
-Running-game orchestration, engine version registry, runtime state, turn scheduler, engine API mediation, operational controls.
+Single-instance running-game orchestrator. Owns the runtime state
+(`game_id → engine_endpoint`, status, current turn, scheduling, engine
+health), the engine version registry consumed synchronously by
+`Game Lobby` for `image_ref` resolution, and the platform mapping
+`(user_id, race_name, engine_player_uuid)` per running game. Drives
+the turn scheduler with the force-next-turn skip rule, mediates every
+engine HTTP call (admin paths under `/api/v1/admin/*`, player paths
+under `/api/v1/{command, order, report}`), and reacts to
+`StateResponse.finished` by transitioning the runtime to `finished` and
+publishing `game_finished`. Drives `Runtime Manager` synchronously over
+REST for stop, restart, and patch; consumes `runtime:health_events`
+from RTM; publishes `gm:lobby_events` (event-only, no heartbeat) and
+`notification:intents`. Never opens the Docker SDK.

 9. **Admin Service**
 Admin UI backend that orchestrates trusted APIs of other services.