# Game Master

`Game Master` (GM) is the only Galaxy platform service permitted to talk to
running game engine containers. It owns runtime and operational state of
already-running games, the engine version registry, the platform mapping of
`(user_id ↔ race_name ↔ engine_player_uuid)`, the per-game turn scheduler,
and the synchronous and asynchronous boundaries that other services use to
interact with running games.

## References

- [`../ARCHITECTURE.md`](../ARCHITECTURE.md) — system architecture, §8 Game
  Master.
- [`../TESTING.md`](../TESTING.md) §8 — testing matrix for GM.
- [`./PLAN.md`](./PLAN.md) — staged implementation plan.
- [`./docs/README.md`](./docs/README.md) — service-local documentation entry
  point (created at PLAN stage 24).
- [`./docs/stage06-contract-files.md`](./docs/stage06-contract-files.md) —
  decisions behind the OpenAPI and AsyncAPI specs frozen at PLAN stage 06.
- [`./docs/stage07-notification-catalog-audit.md`](./docs/stage07-notification-catalog-audit.md) —
  notification catalog audit and producer-side freeze test added at PLAN stage 07.
- [`./docs/stage08-module-skeleton.md`](./docs/stage08-module-skeleton.md) —
  module skeleton wiring decisions (config groups, telemetry instruments,
  Makefile targets, deferred dependencies) recorded at PLAN stage 08.
- [`./docs/stage09-postgres-migration.md`](./docs/stage09-postgres-migration.md) —
  PostgreSQL schema, embedded migration, jet generation pipeline, and
  runtime wiring landed at PLAN stage 09.
- [`./docs/stage10-domain-and-ports.md`](./docs/stage10-domain-and-ports.md) —
  domain types, port interfaces, and the six stage-10 decisions
  (operation domain package, membership DTO placement, engine-version
  options shape, schedule wrapper signature, recovery transition,
  deferred mock destination) landed at PLAN stage 10.
- [`./docs/stage11-persistence-adapters.md`](./docs/stage11-persistence-adapters.md) —
  PostgreSQL stores (`runtimerecordstore`, `engineversionstore`,
  `playermappingstore`, `operationlog`), the Redis offset store, and
  the eight stage-11 decisions (sqlx/pgtest local clones, CAS
  pattern, port-level Now extension, domain conflict sentinels, jsonb
  cast, idempotent Deprecate, multi-row BulkInsert, miniredis
  dependency) landed at PLAN stage 11.
- [`./docs/stage12-external-clients.md`](./docs/stage12-external-clients.md) —
  outbound adapters (engine, Lobby, Runtime Manager, notification
  intent publisher, lobby-events publisher) and the seven stage-12
  decisions (per-call engine base URL, dual engine timeout dispatch,
  engine population rounding, Lobby pagination cap, no extra RTM
  sentinels, AsyncAPI-aligned XADD encoding for `gm:lobby_events`,
  Makefile mocks-target guard) landed at PLAN stage 12.
- [`./docs/stage13-register-runtime.md`](./docs/stage13-register-runtime.md) —
  register-runtime service-layer orchestrator and the five
  stage-13 decisions (`RuntimeRecordStore.Delete` extension, engine
  4xx/5xx classification split, engine response validated as
  `engine_protocol_violation`, initial snapshot carries `player_turn_stats`
  from `/admin/init`, two-flag rollback gating) landed at PLAN
  stage 13.
- [`./docs/stage14-engine-version-registry.md`](./docs/stage14-engine-version-registry.md) —
  engine version registry service-layer orchestrator (List, Get,
  Create, Update, Deprecate, Delete, ResolveImageRef) and the five
  stage-14 decisions (`EngineVersionStore.Delete` port extension,
  reference probe before hard delete, new `engine_version_delete`
  op_kind in schema and domain, `operation_log.game_id` overloaded
  as audit subject for registry entries, JSON-object validation for
  `options`) landed at PLAN stage 14.
- [`./docs/stage15-scheduler-and-turn-generation.md`](./docs/stage15-scheduler-and-turn-generation.md) —
  scheduler ticker, turn-generation orchestrator, and snapshot
  publisher and the seven stage-15 decisions
  (`LobbyClient.GetGameSummary` extension with fail-soft `game_name`
  fallback, telemetry-only `Trigger` parameter, two-CAS pattern with
  external-mutation conflict, single-snapshot-per-outcome cadence,
  player_mappings as recipient source, stateless scheduler utility,
  in-flight set on the ticker) landed at PLAN stage 15.
- [`./docs/stage16-membership-cache-and-invalidation.md`](./docs/stage16-membership-cache-and-invalidation.md) —
  hot-path services (`commandexecute`, `orderput`, `reportget`),
  membership cache, and the six stage-16 decisions (no
  `runtime_not_running` for reports, GM-side envelope rewrite
  `commands`→`cmd` with injected `actor`, hot-path skips
  `operation_log`, hand-rolled per-game inflight tracker, raw status
  string return, missing-mapping surfaces as `forbidden`) landed at
  PLAN stage 16.
- [`./docs/stage17-admin-operations.md`](./docs/stage17-admin-operations.md) —
  admin service-layer operations (`adminstop`, `adminforce`,
  `adminpatch`, `adminbanish`, `livenessreply`) and the six
  stage-17 decisions (`RuntimeRecordStore.UpdateImage` extension,
  `adminstop` idempotent on terminal statuses and `conflict` on
  `starting`, `adminforce` always sets `skip_next_tick`,
  `adminbanish` without status check and missing race surfaces as
  `forbidden`, `livenessreply` 200 + empty status on
  `runtime_not_found`, RTM failures map to `service_unavailable`)
  landed at PLAN stage 17.
- [`./docs/stage18-health-events-consumer.md`](./docs/stage18-health-events-consumer.md) —
  `runtime:health_events` consumer worker and the seven stage-18
  decisions (event-type taxonomy expanded to seven values with
  `container_started` and `probe_recovered`, CAS-conflict fallback to
  health-only update, new `RuntimeRecordStore.UpdateEngineHealth`
  port method, in-memory dedupe of last-emitted summaries,
  read-after-write snapshot construction, `health_events` stream
  offset label, worker wiring deferred to Stage 19) landed at PLAN
  stage 18.
- [`./api/internal-openapi.yaml`](./api/internal-openapi.yaml) — internal
  trusted REST contract.
- [`./api/runtime-events-asyncapi.yaml`](./api/runtime-events-asyncapi.yaml) —
  `gm:lobby_events` Redis Stream contract.
- [`../game/README.md`](../game/README.md) — game engine container contract
  (env, ports, admin and player REST surfaces, `/healthz`).
- [`../lobby/README.md`](../lobby/README.md) — Game Lobby integration with GM.
- [`../rtmanager/README.md`](../rtmanager/README.md) — Runtime Manager
  contract used synchronously by GM admin operations.

## Purpose

A running Galaxy game lives in exactly one Docker container managed by
`Runtime Manager`. The platform must:

- register a freshly started container with platform-level membership;
- initialise the engine with the agreed race roster;
- accept and forward player commands and orders to the engine;
- route per-player report reads;
- generate turns according to a schedule;
- detect game finish and propagate it back to platform-level state;
- expose runtime/operational controls (force-next-turn, stop, patch, banish);
- own the catalogue of supported engine versions and resolve `image_ref`
  values for `Game Lobby`.

`Game Master` is the single component that performs these actions. It does
**not** own platform metadata of games (that is `Game Lobby`), Docker control
(that is `Runtime Manager`), or the full game state (that is the engine
container). Engine state on disk is the engine's domain; GM never reads or
writes the bind-mounted state directory.

## Scope

`Game Master` is the source of truth for:

- the runtime mapping `game_id → engine_endpoint` for every running game;
- the runtime status (`starting | running | generation_in_progress |
  generation_failed | stopped | engine_unreachable | finished`);
- the current turn number and the next-tick timestamp;
- the per-game `(user_id, race_name, engine_player_uuid)` triple;
- the engine version registry: `(version, image_ref, options, status)`;
- the durable history of every operation GM performed (`operation_log`);
- the latest engine health summary per game.

`Game Master` is **not** the source of truth for:

- platform game records (created, draft, enrollment, finished metadata) —
  owned by `Game Lobby`;
- container lifecycle and Docker reality — owned by `Runtime Manager`;
- in-game world state (planets, ships, science, reports) — owned by the
  engine container;
- platform user identity and entitlements — owned by `User Service`;
- in-game `race_name` reservations and the Race Name Directory — owned by
  `Game Lobby`.

## Non-Goals

- Multi-instance operation in v1. GM runs as a single process; the in-process
  scheduler is authoritative. Multi-instance with leader election is an
  explicit future iteration.
- Direct Docker access. GM never imports the Docker SDK; every container
  operation goes through `Runtime Manager` over trusted internal REST.
- Player removal/block at platform level. `Game Lobby` owns that decision;
  GM only performs the engine-side `banish` call when explicitly invoked.
- Pause/resume of a running game on the platform side. `Game Lobby.paused`
  is a platform-only state; GM only answers a liveness probe used by
  Lobby's resume flow.
- Automatic semver-patch upgrades. Patch is always an explicit admin
  operation against a target engine version present in the registry.
- TLS or mTLS on the internal listener. GM trusts its network segment.
- Direct delivery of player-visible push events. `Notification Service`
  owns user-targeted push delivery; GM publishes notification intents only.
- A separate Admin Service. GM exposes its trusted internal REST surface;
  Admin Service will adopt it in a later iteration.
- Engine state file management. Backup, archival, and cleanup of the
  bind-mounted state directories are operator concerns.

## Position in the System

```mermaid
flowchart LR
    Gateway["Edge Gateway"]
    Lobby["Game Lobby"]
    Admin["Admin Service\n(future)"]
    GM["Game Master"]
    RTM["Runtime Manager"]
    Notify["Notification Service"]
    Engine["Game Engine container\n(galaxy/game)"]
    Postgres["PostgreSQL\nschema gamemaster"]
    Redis["Redis\nstreams + caches"]

    Gateway -- "verified player commands\n(REST/JSON)" --> GM
    Lobby -- "register-runtime,\nimage-ref resolve,\nmemberships invalidate" --> GM
    Admin -- "internal REST" --> GM
    GM -- "engine HTTP API" --> Engine
    GM -- "stop / restart / patch" --> RTM
    GM -- "notification:intents" --> Notify
    GM -- "gm:lobby_events" --> Redis
    Redis -- "runtime:health_events" --> GM
    GM --> Postgres
```

`Edge Gateway` routes verified player message types (`game.command.execute`,
`game.order.put`, `game.report.get`) to GM as trusted REST/JSON after
transcoding from FlatBuffers. `Game Lobby` calls GM synchronously to
register runtimes after a successful container start, to resolve `image_ref`
from the engine version registry, to invalidate membership cache on roster
changes, and to verify GM liveness during platform resume. `Game Master`
calls `Runtime Manager` synchronously over REST for stop, restart, and
patch. `Runtime Manager` publishes `runtime:health_events`, which GM
consumes asynchronously. GM publishes `gm:lobby_events` consumed by
`Game Lobby`, and `notification:intents` consumed by `Notification Service`.

## Responsibility Boundaries

`Game Master` is responsible for:

- registering a freshly started container into platform-level runtime state;
- initialising the engine with the race roster received from Lobby;
- maintaining the platform mapping of `user_id`, `race_name`, and
  `engine_player_uuid`;
- forwarding player commands, orders, and report reads to the engine after
  authorising the actor;
- generating turns on schedule, including the force-next-turn skip rule;
- evaluating engine finish on every turn boundary;
- publishing runtime snapshot updates and the final game-finish event;
- consuming runtime health events from `Runtime Manager` and updating its
  per-game health summary;
- exposing the engine version registry CRUD;
- driving admin-level runtime operations (stop, force-next-turn, patch,
  banish) by calling `Runtime Manager` and the engine on demand.

`Game Master` is not responsible for:

- creating or stopping containers on Docker (that is `Runtime Manager`);
- evaluating whether a game is allowed to start (that is `Game Lobby`);
- deriving recipient user lists for non-game notifications (that is
  `Notification Service`);
- verifying authenticated transport, signatures, freshness, and replay
  (that is `Edge Gateway`);
- mapping `user_id` to platform-level membership (that is `Game Lobby`).

## Engine Container Contract

The engine container is `galaxy/game`. GM itself calls two route classes,
Admin and Player; the Probe route is listed for completeness:

| Class | Path | Purpose |
| --- | --- | --- |
| Admin (GM-only) | `POST /api/v1/admin/init` | Initialise the engine with a race roster. |
| Admin (GM-only) | `GET /api/v1/admin/status` | Read the full game state. |
| Admin (GM-only) | `PUT /api/v1/admin/turn` | Generate the next turn. |
| Admin (GM-only) | `POST /api/v1/admin/race/banish` | Deactivate a race after permanent platform removal. Body `{race_name}`. |
| Player | `PUT /api/v1/command` | Execute a batch of player commands. |
| Player | `PUT /api/v1/order` | Validate and store a batch of player orders. |
| Player | `GET /api/v1/report` | Fetch per-player turn report. |
| Probe | `GET /healthz` | Liveness probe used by `Runtime Manager` and operator tooling. |

Admin paths are unauthenticated but routed only from inside the trusted
network segment that connects GM to the engine container. The engine does
not enforce caller identity — network-level segmentation is the boundary.

`StateResponse` carries an extra boolean `finished` field. When `true` on a
turn-generation response, GM treats the game as finished and runs the
finish flow described below. The conditional logic that flips `finished`
to `true` lives in the engine's domain code and is not GM's concern.

The engine endpoint URL is the `engine_endpoint` value handed to GM by
`Game Lobby` during `register-runtime`: `http://galaxy-game-{game_id}:8080`.
The DNS name is stable across restart and patch.

## Runtime Surface

### Listeners

| Listener | Default address | Purpose |
| --- | --- | --- |
| Internal HTTP | `:8097` (`GAMEMASTER_INTERNAL_HTTP_ADDR`) | Probes (`/healthz`, `/readyz`) and the trusted REST surface for `Edge Gateway`, `Game Lobby`, and `Admin Service`. |

There is no public listener. The internal listener is unauthenticated and
assumes a trusted network segment. Authentication of player commands has
already happened at `Edge Gateway`; GM enforces authorisation only.

### Background workers

| Worker | Driver | Description |
| --- | --- | --- |
| Scheduler ticker | 1 s loop | Scans `runtime_records` for due `next_generation_at`, runs the turn-generation service for each, recomputes `next_generation_at` from `turn_schedule` (skipping one tick when `skip_next_tick=true` is set). |
| `runtime:health_events` consumer | Redis Stream | XREADs from `runtime:health_events` (produced by RTM), updates `runtime_records.engine_health` summary, debounces `runtime_snapshot_update` publication. |

### Startup dependencies

In start order:

1. PostgreSQL primary (`GAMEMASTER_POSTGRES_PRIMARY_DSN`). Embedded goose
   migrations apply synchronously before any listener opens.
2. Redis master (`GAMEMASTER_REDIS_MASTER_ADDR`).
3. Telemetry exporter (OTLP grpc/http or stdout).
4. Internal HTTP listener.
5. Health-events consumer worker.
6. Scheduler ticker worker.

A failure in any step exits the process non-zero.

### Probes

`/healthz` reports liveness — the process responds when the HTTP server is
alive.

`/readyz` reports readiness — `200` only when the PostgreSQL pool can ping
the primary and the Redis master client can ping. No deeper dependency is
checked synchronously; the engine is reached only on demand.

Both probes are documented in
[`./api/internal-openapi.yaml`](./api/internal-openapi.yaml).

## Lifecycles

### Register-runtime

**Triggered by:** `Game Lobby` after a successful container start, calling
`POST /api/v1/internal/games/{game_id}/register-runtime` with body
`{engine_endpoint, members:[{user_id, race_name}], target_engine_version,
turn_schedule}`.

**Flow on success:**

1. Validate request shape; reject with `invalid_request` if any required
   field is missing.
2. Reject with `conflict` if `runtime_records.{game_id}` already exists.
3. Resolve `image_ref` for `target_engine_version` from `engine_versions`;
   reject with `engine_version_not_found` when missing.
4. Persist `runtime_records` with `status=starting`, `engine_endpoint`,
   `current_image_ref`, `current_engine_version`, `turn_schedule`, and
   `created_at`.
5. Call engine `POST /api/v1/admin/init` with the race-name list derived
   from `members`.
6. Read `StateResponse` and persist one `player_mappings` row per player:
   `(game_id, user_id, race_name, engine_player_uuid)`.
7. CAS `runtime_records.status: starting → running`. Persist
   `current_turn=0` and `next_generation_at` computed from `turn_schedule`.
8. Append `operation_log` entry (`op_kind=register_runtime`,
   `outcome=success`).
9. Publish `runtime_snapshot_update` to `gm:lobby_events`.
10. Return `200` with the persisted `runtime_records` row.

**Failure paths:**

| Failure | Side effect | Outcome to caller |
| --- | --- | --- |
| Invalid envelope | None | `400 invalid_request` |
| `runtime_records` already exists | None | `409 conflict` |
| Engine `/admin/init` returns 4xx | Roll back `runtime_records`; append failure to `operation_log` | `502 engine_validation_error` |
| Engine `/admin/init` returns 5xx or fails at the transport layer | Roll back; append failure | `502 engine_unreachable` |
| Engine response missing players or contains races not in roster | Roll back; append failure | `502 engine_protocol_violation` |
| PostgreSQL transaction failure | Roll back; append failure if possible | `503 service_unavailable` |

A failed `register-runtime` leaves no `runtime_records` row and no
`player_mappings` rows. `Game Lobby` then transitions the platform game
record to `paused` (per the architecture's flow §4 forced-pause path).

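The three engine-related rows of the failure table can be sketched as two small pure functions. This is an illustration only: `classifyInitFailure`, `rosterViolation`, and `errTransport` are hypothetical names, the race names are made up, and treating a duplicated race as a protocol violation is an assumption that goes beyond what the table states.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransport stands in for any dial, timeout, or read error (hypothetical).
var errTransport = errors.New("engine transport failure")

// classifyInitFailure maps the raw outcome of the engine /admin/init call to
// the error code from the failure table. It returns "" when the call succeeded.
func classifyInitFailure(httpStatus int, transportErr error) string {
	switch {
	case transportErr != nil || httpStatus >= 500:
		return "engine_unreachable"
	case httpStatus >= 400:
		return "engine_validation_error"
	}
	return ""
}

// rosterViolation compares the races returned by the engine with the roster
// sent by Lobby. It returns "" when both sets match exactly.
func rosterViolation(roster, returned []string) string {
	expected := make(map[string]bool, len(roster))
	for _, race := range roster {
		expected[race] = true
	}
	for _, race := range returned {
		if !expected[race] {
			return "engine_protocol_violation" // unknown or duplicated race
		}
		delete(expected, race)
	}
	if len(expected) > 0 {
		return "engine_protocol_violation" // a roster race is missing
	}
	return ""
}

func main() {
	fmt.Println(classifyInitFailure(422, nil))
	fmt.Println(classifyInitFailure(0, errTransport))
	fmt.Println(rosterViolation([]string{"Terrans", "Zorg"}, []string{"Terrans"}))
}
```

All three codes are returned to the caller with HTTP `502`, so the sketch only decides the code, not the status.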
### Turn generation

**Triggered by:** the scheduler ticker when `now >= next_generation_at`
for a game in `status=running`, or by an admin invocation of
`force-next-turn`.

**Flow on success:**

1. CAS `runtime_records.status: running → generation_in_progress`. If the
   CAS fails (status changed concurrently), the tick is skipped silently.
2. Call engine `PUT /api/v1/admin/turn`. Engine returns `StateResponse`
   with the new `turn` and the updated `player[]` array.
3. Persist `runtime_records.current_turn` and refresh
   `runtime_records.engine_health` summary.
4. If `StateResponse.finished == true`:
   - CAS `runtime_records.status: generation_in_progress → finished`;
   - publish `game_finished` to `gm:lobby_events` with
     `{game_id, final_turn_number, finished_at_ms, player_turn_stats[]}`;
   - publish `game.finished` notification intent to all `active` members.
5. If `StateResponse.finished == false`:
   - CAS `runtime_records.status: generation_in_progress → running`;
   - recompute `next_generation_at` from `turn_schedule`. If
     `skip_next_tick=true`, advance by one extra cron step and clear the
     flag;
   - publish `runtime_snapshot_update` to `gm:lobby_events` with
     `{game_id, current_turn, runtime_status, engine_health_summary,
     player_turn_stats[]}`;
   - publish `game.turn.ready` notification intent to all `active`
     members.
6. Append `operation_log` entry (`op_kind=turn_generation`,
   `outcome=success`).

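The compare-and-swap in steps 1, 4, and 5 can be illustrated with an in-memory stand-in for `runtime_records`. This is a sketch under assumptions: `runtimeStore`, `casStatus`, and `errCASConflict` are hypothetical names. Against PostgreSQL the same idea is commonly expressed as an `UPDATE … WHERE status = <expected>` followed by a rows-affected check, but the actual store implementation is not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errCASConflict signals that the stored status no longer matches the
// expected one (hypothetical sentinel).
var errCASConflict = errors.New("status changed concurrently")

// runtimeStore is an in-memory stand-in for the runtime_records table.
type runtimeStore struct {
	mu     sync.Mutex
	status map[string]string // game_id -> runtime status
}

// casStatus moves gameID from `from` to `to` only when the stored status
// still equals `from`; otherwise it leaves the record untouched.
func (s *runtimeStore) casStatus(gameID, from, to string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.status[gameID] != from {
		return errCASConflict
	}
	s.status[gameID] = to
	return nil
}

func main() {
	store := &runtimeStore{status: map[string]string{"g1": "running"}}

	// First tick wins the transition.
	fmt.Println(store.casStatus("g1", "running", "generation_in_progress"))

	// A second, concurrent tick sees the conflict and is skipped silently.
	if err := store.casStatus("g1", "running", "generation_in_progress"); errors.Is(err, errCASConflict) {
		fmt.Println("tick skipped")
	}
}
```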
**Failure paths:**

| Failure | Side effect | Outcome |
| --- | --- | --- |
| Engine timeout / 5xx | CAS `status: generation_in_progress → generation_failed`; publish `runtime_snapshot_update`; publish `game.generation_failed` admin notification | Logged; ticker leaves the game in `generation_failed` until manual recovery (admin issues `force-next-turn` or `stop`). |
| Persistence failure after engine success | Append failure to `operation_log`; status stays `generation_in_progress` | Health-summary update on next probe will resync. |

`player_turn_stats[]` is built from `StateResponse.player[]` by mapping
`raceName → user_id` through `player_mappings` and projecting
`{user_id, planets, population}`. `ships_built` is intentionally absent
(see [`./docs/stage01-architecture-sync.md`](./docs/stage01-architecture-sync.md)).

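The projection described above might look like the following sketch. The struct shapes are assumptions: `enginePlayer` mirrors only the three fields named in the text, the integer types are a guess, and returning unmapped races separately is a choice made for this example, not documented behaviour.

```go
package main

import "fmt"

// enginePlayer mirrors the fields of StateResponse.player[] used here
// (assumed shape).
type enginePlayer struct {
	RaceName   string
	Planets    int
	Population int
}

// playerTurnStat is one element of player_turn_stats[].
type playerTurnStat struct {
	UserID     string
	Planets    int
	Population int
}

// projectTurnStats maps each engine player to its platform user through
// raceToUser (the player_mappings lookup). Races without a mapping are
// reported separately instead of being emitted with an empty user_id.
func projectTurnStats(players []enginePlayer, raceToUser map[string]string) ([]playerTurnStat, []string) {
	var stats []playerTurnStat
	var unmapped []string
	for _, p := range players {
		userID, ok := raceToUser[p.RaceName]
		if !ok {
			unmapped = append(unmapped, p.RaceName)
			continue
		}
		stats = append(stats, playerTurnStat{UserID: userID, Planets: p.Planets, Population: p.Population})
	}
	return stats, unmapped
}

func main() {
	players := []enginePlayer{
		{RaceName: "Terrans", Planets: 3, Population: 1200},
		{RaceName: "Zorg", Planets: 1, Population: 400},
	}
	mappings := map[string]string{"Terrans": "user-1"}

	stats, unmapped := projectTurnStats(players, mappings)
	fmt.Println(stats, unmapped)
}
```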
### Force-next-turn

**Triggered by:** `Admin Service` or system-admin via
`POST /api/v1/internal/runtimes/{game_id}/force-next-turn`.

**Pre-conditions:** runtime exists, `status=running`.

**Flow:**

1. Run the turn-generation flow synchronously (the same code path the
   scheduler uses).
2. After success, set `runtime_records.skip_next_tick = true`. The next
   regular tick computed from `turn_schedule` is then advanced by one
   extra step before being persisted as `next_generation_at`.
3. Append `operation_log` entry (`op_kind=force_next_turn`).

The skip rule guarantees that the inter-turn spacing is never shorter than
one schedule interval, regardless of when the force is issued.

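A minimal sketch of the skip rule follows. It assumes a `schedule` abstraction with a `Next` method; `everyInterval` is a fixed-interval stand-in for the real cron-style `turn_schedule`, and all names (`schedule`, `everyInterval`, `nextGenerationAt`, `hourly`, `demoNow`) are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// schedule abstracts turn_schedule: Next returns the first tick strictly
// after t.
type schedule interface {
	Next(t time.Time) time.Time
}

// everyInterval is a stand-in for a cron expression: ticks fall on aligned
// multiples of step.
type everyInterval struct{ step time.Duration }

func (e everyInterval) Next(t time.Time) time.Time {
	return t.Truncate(e.step).Add(e.step)
}

// nextGenerationAt returns the tick to persist as next_generation_at and the
// new value of skip_next_tick (always cleared once it has been consumed).
func nextGenerationAt(s schedule, now time.Time, skipNextTick bool) (time.Time, bool) {
	next := s.Next(now)
	if skipNextTick {
		next = s.Next(next) // advance by one extra schedule step
	}
	return next, false
}

var (
	hourly  = everyInterval{step: time.Hour}
	demoNow = time.Date(2024, 1, 1, 10, 20, 0, 0, time.UTC)
)

func main() {
	regular, _ := nextGenerationAt(hourly, demoNow, false)
	skipped, _ := nextGenerationAt(hourly, demoNow, true)
	fmt.Println("regular:", regular.Format(time.RFC3339))
	fmt.Println("after force:", skipped.Format(time.RFC3339))
}
```

With an hourly schedule and a force issued at 10:20, the regular tick at 11:00 would leave only 40 minutes between turns; skipping it moves the next turn to 12:00, which keeps the gap at or above one interval.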
### Game finish

The finish flow is driven entirely by the engine signal `finished:bool`.
GM never decides finish independently. After `game_finished` is published,
`Game Lobby` transitions its platform record to `finished`, runs the
capability evaluation, and finalises Race Name Directory state. The GM
record stays in `status=finished` indefinitely; cleanup is operator-driven.

### Banish (engine-side player removal)

**Triggered by:** `Game Lobby` synchronously calling
`POST /api/v1/internal/games/{game_id}/race/{race_name}/banish` after a
permanent membership removal at platform level.

**Pre-conditions:** runtime exists; `race_name` resolves to an existing
`player_mappings` row.

**Flow:**

1. Call engine `POST /api/v1/admin/race/banish` with `{race_name}`.
2. On engine success, append `operation_log` entry (`op_kind=banish`,
   `outcome=success`).
3. Return `204` to Lobby.

**Failure path:** engine error returns `502 engine_unreachable`. Lobby
treats this as a degraded state and may retry; the platform-level
membership stays `removed` regardless.

### Stop

**Triggered by:** system-admin via
`POST /api/v1/internal/runtimes/{game_id}/stop` with body `{reason}`,
where `reason ∈ {admin_request, finished, timeout}`.

**Flow:**

1. Call `Runtime Manager` `POST /api/v1/internal/runtimes/{game_id}/stop`
   with the same `reason`.
2. CAS `runtime_records.status: * → stopped`.
3. Append `operation_log` entry.
4. Publish `runtime_snapshot_update` reflecting the stopped status.

### Patch

**Triggered by:** system-admin via
`POST /api/v1/internal/runtimes/{game_id}/patch` with body `{version}`.

**Pre-conditions:**

- `engine_versions.{version}` exists with `status=active`;
- the new version is a semver-patch of the current version (same major and
  minor); otherwise reject with `semver_patch_only`.

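The second pre-condition can be checked with a few lines of standard-library Go. This is a sketch, not the service's validator: `parseCore` and `isSemverPatch` are hypothetical helpers, and pre-release or build-metadata suffixes are deliberately not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCore parses "MAJOR.MINOR.PATCH" with an optional leading "v".
// Pre-release and build suffixes are out of scope for this sketch.
func parseCore(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("version %q: want MAJOR.MINOR.PATCH", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("version %q: bad component %q", v, p)
		}
		out[i] = n
	}
	return out, nil
}

// isSemverPatch reports whether target shares major and minor with current.
// A false result corresponds to the semver_patch_only rejection.
func isSemverPatch(current, target string) (bool, error) {
	c, err := parseCore(current)
	if err != nil {
		return false, err
	}
	t, err := parseCore(target)
	if err != nil {
		return false, err
	}
	return c[0] == t[0] && c[1] == t[1], nil
}

func main() {
	fmt.Println(isSemverPatch("1.4.2", "1.4.7")) // same major and minor
	fmt.Println(isSemverPatch("1.4.2", "1.5.0")) // minor differs
}
```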
**Flow:**

1. Resolve `image_ref` from `engine_versions.{version}`.
2. Call `Runtime Manager`
   `POST /api/v1/internal/runtimes/{game_id}/patch` with `{image_ref}`.
3. On success, persist new `current_image_ref` and `current_engine_version`
   on `runtime_records`.
4. Append `operation_log` entry.

The engine container is recreated by RTM with the same DNS name; the
`engine_endpoint` is unchanged. GM does not call `/admin/init` again —
the bind-mounted state directory is preserved and the engine resumes from
the previous turn.

### Liveness reply (Lobby resume)

**Triggered by:** `Game Lobby` resuming a paused game, calling
`GET /api/v1/internal/games/{game_id}/liveness`.

**Flow:** if `runtime_records.{game_id}` exists and `status=running`,
return `200 {ready: true}`. Otherwise return `200 {ready: false, status:
"<observed status>"}`.

This endpoint never calls the engine; it reflects GM's own view only.

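As a rough sketch of that decision, the reply can be derived from the stored record alone. The `livenessReply` struct and `buildLivenessReply` function are hypothetical, and omitting `status` from the JSON when no record exists is an assumption about the wire shape; the OpenAPI file is authoritative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// livenessReply is the assumed JSON body of the liveness endpoint.
type livenessReply struct {
	Ready  bool   `json:"ready"`
	Status string `json:"status,omitempty"`
}

// buildLivenessReply derives the reply from GM's own record only.
// found is false when no runtime_records row exists; status is then empty.
func buildLivenessReply(found bool, status string) livenessReply {
	if found && status == "running" {
		return livenessReply{Ready: true}
	}
	return livenessReply{Ready: false, Status: status}
}

func main() {
	replies := []livenessReply{
		buildLivenessReply(true, "running"),
		buildLivenessReply(true, "stopped"),
		buildLivenessReply(false, ""),
	}
	for _, r := range replies {
		body, _ := json.Marshal(r)
		fmt.Println(string(body))
	}
}
```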
## Hot Path

### Player commands and orders

Both `game.command.execute` and `game.order.put` use the same FlatBuffers
schema (`pkg/schema/fbs/order.fbs` `Order{updated_at, commands:[…]}`). The
gateway transcodes the verified payload to JSON via
`pkg/transcoder/order.go` before calling GM.

**GM endpoints:**

- `POST /api/v1/internal/games/{game_id}/commands` — execute now; engine
  `PUT /api/v1/command`.
- `POST /api/v1/internal/games/{game_id}/orders` — validate-and-store;
  engine `PUT /api/v1/order`.

Both endpoints accept body `{commands:[{cmd_id, @type, …}, …]}` and the
`X-User-ID` header. The actor field on the engine call is **always** set
by GM from the authenticated user identity; GM never trusts a payload
field for actor identification.

**Pre-conditions:**

- `runtime_records.{game_id}` exists with `status=running`;
- the user is an `active` member of the game (cache lookup);
- `player_mappings.(game_id, user_id)` exists.

**Errors:**

- `runtime_not_found` — runtime missing.
- `runtime_not_running` — `runtime_status` is anything other than
  `running`.
- `forbidden` — caller is not an active member.
- `engine_unreachable` — engine returned 5xx.
- `engine_validation_error` — engine returned 4xx; the body carries the
  engine's per-command result (`cmd_applied`, `cmd_error_code`).

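The pre-conditions and the first three error codes can be read as one ordered check that runs before the engine is contacted. The sketch below is illustrative: `hotPathInput` and `checkHotPath` are hypothetical names, and the order of the checks is an assumption. Mapping a missing `player_mappings` row to `forbidden` follows the stage-16 decision listed under References.

```go
package main

import "fmt"

// hotPathInput gathers what GM knows before forwarding a command or order.
type hotPathInput struct {
	RuntimeFound  bool
	RuntimeStatus string
	Membership    string // membership status from the cache, "" when absent
	HasMapping    bool   // player_mappings row exists for (game_id, user_id)
}

// checkHotPath returns the error code to surface, or "" when the request may
// be forwarded to the engine.
func checkHotPath(in hotPathInput) string {
	switch {
	case !in.RuntimeFound:
		return "runtime_not_found"
	case in.RuntimeStatus != "running":
		return "runtime_not_running"
	case in.Membership != "active":
		return "forbidden"
	case !in.HasMapping:
		return "forbidden" // missing mapping surfaces as forbidden
	}
	return ""
}

func main() {
	ok := hotPathInput{RuntimeFound: true, RuntimeStatus: "running", Membership: "active", HasMapping: true}
	fmt.Printf("%q\n", checkHotPath(ok))

	stopped := ok
	stopped.RuntimeStatus = "stopped"
	fmt.Printf("%q\n", checkHotPath(stopped))
}
```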
### Reports

**GM endpoint:** `GET /api/v1/internal/games/{game_id}/reports/{turn}`
with the `X-User-ID` header.

**Flow:**

1. Authorise: caller must be an active member of the game.
2. Resolve `race_name` from `player_mappings`.
3. Call engine `GET /api/v1/report?player={race_name}&turn={turn}`.
4. Return the engine response verbatim. Reports are full per-player
   payloads and are never cached at the platform layer; the engine remains
   the source of truth.

### Membership cache and invalidation

GM holds an in-process per-game TTL cache (default 30 s) of memberships
loaded from `Lobby /api/v1/internal/games/{id}/memberships`. The cache
shape is `map[user_id]MembershipStatus` plus a load timestamp. TTL is
the safety-net fallback.

The primary invalidation mechanism is an explicit hook from Lobby:

- Endpoint: `POST /api/v1/internal/games/{game_id}/memberships/invalidate`.
- Lobby invokes it post-commit on every operation that mutates roster:
  application approval, application rejection, invite redeem, member
  remove, member block, user-lifecycle cascade.
- Failed invalidation does not roll back Lobby state; the TTL safety net
  catches stale data within the next 30 s.

This is a deliberate tight coupling. The trade-off is recorded in
[`./PLAN.md` Stage 16](./PLAN.md).

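A minimal sketch of such a cache is shown below, assuming an injected loader in place of the Lobby client. The type and function names (`membershipCache`, `cacheEntry`, `newDemoCache`) are hypothetical, and holding the mutex while loading is a simplification that a real implementation would probably avoid.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type membershipStatus string

// cacheEntry is one per-game snapshot: user_id -> status plus load time.
type cacheEntry struct {
	members  map[string]membershipStatus
	loadedAt time.Time
}

type membershipCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time
	load    func(gameID string) (map[string]membershipStatus, error)
	entries map[string]cacheEntry
}

// Status returns the cached membership status, reloading the game's roster
// when the entry is missing or older than the TTL.
func (c *membershipCache) Status(gameID, userID string) (membershipStatus, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[gameID]
	if !ok || c.now().Sub(e.loadedAt) >= c.ttl {
		members, err := c.load(gameID)
		if err != nil {
			return "", err
		}
		e = cacheEntry{members: members, loadedAt: c.now()}
		c.entries[gameID] = e
	}
	return e.members[userID], nil
}

// Invalidate drops the game's entry; the next Status call reloads it.
func (c *membershipCache) Invalidate(gameID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, gameID)
}

// newDemoCache wires the cache to a fake loader that counts its calls.
func newDemoCache(loads *int) *membershipCache {
	return &membershipCache{
		ttl: 30 * time.Second,
		now: time.Now,
		load: func(string) (map[string]membershipStatus, error) {
			*loads++
			return map[string]membershipStatus{"u1": "active"}, nil
		},
		entries: map[string]cacheEntry{},
	}
}

func main() {
	loads := 0
	cache := newDemoCache(&loads)
	cache.Status("g1", "u1")
	cache.Status("g1", "u1") // served from cache
	cache.Invalidate("g1")   // explicit hook from Lobby
	cache.Status("g1", "u1") // reloads
	fmt.Println("loads:", loads)
}
```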
## Engine Version Registry

The registry is the source of truth for which engine versions are
deployable. CRUD is exposed on the GM internal port; `Game Lobby`
consumes it synchronously to resolve `image_ref` for `target_engine_version`
just before publishing a `runtime:start_jobs` envelope.

| Method | Path | Purpose |
| --- | --- | --- |
| `GET` | `/api/v1/internal/engine-versions` | List versions; supports `status` filter. |
| `POST` | `/api/v1/internal/engine-versions` | Create a new version with `version`, `image_ref`, optional `options`. Validates semver shape and Docker reference. |
| `GET` | `/api/v1/internal/engine-versions/{version}` | Read one version. |
| `PATCH` | `/api/v1/internal/engine-versions/{version}` | Update `image_ref`, `options`, or `status`. |
| `DELETE` | `/api/v1/internal/engine-versions/{version}` | Soft-deprecate (`status=deprecated`). Hard delete is rejected if the version is referenced by any non-finished `runtime_records` row. |
| `GET` | `/api/v1/internal/engine-versions/{version}/image-ref` | Resolve `image_ref` only. Used by Lobby's start flow. |

`options` is a free-form `jsonb` document stored verbatim. v1 does not
enforce a schema; future engine-side options follow the engine's own
contract.

`status` values: `active` (deployable), `deprecated` (rejected on new
starts; existing runtimes unaffected). Hard removal of a deprecated
version requires that no runtime references it.

Lobby resolves `image_ref` synchronously per game start. If the resolve
call fails or the version is missing, Lobby fails the start with
`engine_version_not_found` and never publishes `runtime:start_jobs`.

## Trusted Surfaces

### Internal REST

The internal REST surface is consumed by:

- `Edge Gateway` — verified player commands and report reads;
- `Game Lobby` — register-runtime, image-ref resolve, membership invalidate,
  banish, liveness reply;
- `Admin Service` (future) — full administrative operations;
- platform probes — `/healthz`, `/readyz`.

The listener is unauthenticated; downstream services rely on network
segmentation. Caller identity for audit is recorded from the optional
`X-Galaxy-Caller` header (`gateway`, `lobby`, `admin`) and reflected as
`op_source` in `operation_log` (`gateway_player`, `lobby_internal`,
`admin_rest`); when missing or unrecognised, GM defaults to
`op_source=admin_rest`.

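The header-to-`op_source` mapping is small enough to sketch directly. The function name `opSource` is hypothetical, and exact, case-sensitive matching of the header value is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// opSource maps the X-Galaxy-Caller header value to the op_source recorded
// in operation_log. Missing or unrecognised values fall back to admin_rest.
func opSource(caller string) string {
	switch caller {
	case "gateway":
		return "gateway_player"
	case "lobby":
		return "lobby_internal"
	default:
		return "admin_rest" // "admin", empty, or unknown
	}
}

func main() {
	h := http.Header{}
	h.Set("X-Galaxy-Caller", "lobby")
	fmt.Println(opSource(h.Get("X-Galaxy-Caller")))

	// No header at all: Get returns "" and the default applies.
	fmt.Println(opSource(http.Header{}.Get("X-Galaxy-Caller")))
}
```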
For player-command endpoints, the additional `X-User-ID` header is
required and authoritative for the acting user identity.

Request and response shapes are defined in
[`./api/internal-openapi.yaml`](./api/internal-openapi.yaml). Unknown JSON
fields are rejected with `invalid_request`.

## Async Stream Contracts

### `gm:lobby_events` (out)

Producer: `Game Master`. Consumer: `Game Lobby`.

Two message types share the stream, discriminated by `event_type`:

| `event_type` | Body |
| --- | --- |
| `runtime_snapshot_update` | `{game_id, current_turn, runtime_status, engine_health_summary, player_turn_stats:[{user_id, planets, population}], occurred_at_ms}` |
| `game_finished` | `{game_id, final_turn_number, runtime_status:"finished", player_turn_stats:[…], finished_at_ms}` |

Publication cadence: events only. GM publishes a snapshot when:

- a turn was generated (success or failure);
- `runtime_status` transitioned (e.g., `running ↔ generation_in_progress`,
  `running → engine_unreachable`, `* → finished`);
- `engine_health_summary` changed in response to a `runtime:health_events`
  observation (debounced — duplicates are suppressed when the summary did
  not change).

There is no periodic heartbeat. `Game Lobby` consumes these events to
|
|
update its denormalised runtime snapshot and to feed the per-game
|
|
`player_turn_stats` aggregate used at game finish.
|
|
|
|
The first `runtime_snapshot_update` published right after a successful
|
|
`register-runtime` carries `player_turn_stats` projected from the
|
|
engine `/admin/init` response — the per-player baseline (`planets`,
|
|
`population`) at turn 0. Lobby treats this baseline as the reference
|
|
point against which subsequent turn deltas are measured. For other
|
|
status transitions that fire without a fresh engine state payload
|
|
(e.g., a pure health-summary change), `player_turn_stats` is empty.
|
|
|
|
The full schema is enforced by
|
|
[`./api/runtime-events-asyncapi.yaml`](./api/runtime-events-asyncapi.yaml).
|
|
|
|
### `runtime:health_events` (in)

Producer: `Runtime Manager`. Consumer: `Game Master`.

GM consumes the stream to update `runtime_records.engine_health` summary
per game. The schema is owned by `Runtime Manager` and documented in
[`../rtmanager/api/runtime-health-asyncapi.yaml`](../rtmanager/api/runtime-health-asyncapi.yaml).
GM never modifies `runtime:health_events`; it is read-only.

GM does not publish notifications in response to runtime health changes
in v1; the operator surface is `gm:lobby_events` plus the GM REST
inspect endpoints.

## Notification Contracts

`Game Master` publishes notification intents to `notification:intents`
using the shared `pkg/notificationintent` producer module:

| Trigger | `notification_type` | Audience | Channels |
| --- | --- | --- | --- |
| Successful turn generation | `game.turn.ready` | active members of the game | `push+email` |
| Game finish | `game.finished` | active members of the game | `push+email` |
| Turn generation failed | `game.generation_failed` | configured admin email list | `email` |

Recipient resolution: GM materialises `recipient_user_ids` from its own
membership cache (loaded from Lobby) at publish time; admin recipients
are resolved by `Notification Service` from configuration.

A failed publication is a notification degradation and must not roll back
already committed runtime state. Failed publications are logged and
counted via `gamemaster.notification.publish_attempts`.

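
The "degrade, do not roll back" rule can be illustrated with a small Go sketch. Everything here is hypothetical: the `publisher` interface, the `intent` struct, and the map-based `counter` merely stand in for the real `pkg/notificationintent` API and the `gamemaster.notification.publish_attempts` instrument, whose signatures are not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// intent is a hypothetical, simplified notification intent.
type intent struct {
	NotificationType string
	RecipientUserIDs []string
}

// publisher is a hypothetical stand-in for the shared producer module.
type publisher interface {
	Publish(intent) error
}

// counter stands in for the publish_attempts metric, keyed by
// "notification_type/result".
type counter map[string]int

type okPublisher struct{}

func (okPublisher) Publish(intent) error { return nil }

type failingPublisher struct{}

func (failingPublisher) Publish(intent) error { return errors.New("stream unavailable") }

// publishBestEffort publishes after runtime state has been committed.
// It deliberately returns nothing, so the caller has no error on which
// to roll anything back; failures are only logged and counted.
func publishBestEffort(p publisher, in intent, attempts counter) {
	result := "ok"
	if err := p.Publish(in); err != nil {
		result = "error"
		log.Printf("notification publish failed: type=%s err=%v", in.NotificationType, err)
	}
	attempts[in.NotificationType+"/"+result]++
}

func main() {
	attempts := counter{}
	in := intent{NotificationType: "game.turn.ready", RecipientUserIDs: []string{"u-1", "u-2"}}
	publishBestEffort(failingPublisher{}, in, attempts)
	publishBestEffort(okPublisher{}, in, attempts)
	fmt.Println(attempts["game.turn.ready/error"], attempts["game.turn.ready/ok"])
}
```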
## Persistence Layout

### PostgreSQL durable state (schema `gamemaster`)

| Table | Purpose | Key |
| --- | --- | --- |
| `runtime_records` | One row per game; latest known runtime status and scheduling state. | `game_id` |
| `engine_versions` | Engine version registry. | `version` |
| `player_mappings` | `(game_id, user_id) → race_name + engine_player_uuid`. | composite `(game_id, user_id)` |
| `operation_log` | Append-only audit of every GM operation. | `id` (auto) |

`runtime_records` columns:

- `game_id` — primary key, references Lobby's identifier.
- `status` — `starting | running | generation_in_progress |
  generation_failed | stopped | engine_unreachable | finished`.
- `engine_endpoint` — `http://galaxy-game-{game_id}:8080`.
- `current_image_ref` — Docker reference of the running image.
- `current_engine_version` — semver string registered in `engine_versions`.
- `turn_schedule` — five-field cron expression copied from Lobby.
- `current_turn` — last completed turn number; `0` until the first turn
  generates.
- `next_generation_at` — UTC timestamp of the next due tick.
- `skip_next_tick` — boolean; set by `force-next-turn`, cleared after the
  first cron step is skipped.
- `engine_health` — short text summary derived from
  `runtime:health_events`.
- `created_at`, `updated_at`, `started_at`, `stopped_at`, `finished_at` —
  lifecycle timestamps.

`engine_versions` columns:

- `version` — primary key; semver string.
- `image_ref` — non-empty Docker reference.
- `options` — `jsonb`, free-form, default `'{}'`.
- `status` — `active | deprecated`.
- `created_at`, `updated_at`.

`player_mappings` columns:

- composite primary key `(game_id, user_id)`.
- `race_name` — non-empty string; unique per `game_id`.
- `engine_player_uuid` — UUID returned by the engine `/admin/init`.
- `created_at`.

`operation_log` columns:

- `id`, `game_id`, `op_kind` (`register_runtime | turn_generation |
  force_next_turn | banish | stop | patch | engine_version_create |
  engine_version_update | engine_version_deprecate |
  engine_version_delete`), `op_source`, `source_ref` (request id
  when known), `outcome` (`success | failure`), `error_code`,
  `error_message`, `started_at`, `finished_at`.

For engine-version registry entries (`op_kind` starting with
`engine_version_`), the `game_id` column doubles as the audit subject
and stores the canonical `version` string instead of a platform game
identifier; the registry is global, not per-game. The convention is
documented in
[`./docs/stage14-engine-version-registry.md`](./docs/stage14-engine-version-registry.md).

Indexes:

- `runtime_records (status, next_generation_at)` — drives the scheduler
  ticker scan.
- `operation_log (game_id, started_at DESC)` — drives audit reads.
- UNIQUE on `player_mappings (game_id, race_name)` —
  one-race-per-game invariant.

Per-game roster reads (`WHERE game_id = $1`) are served by the
leftmost prefix of the composite primary key on
`player_mappings (game_id, user_id)`; no extra single-column index is
added.

The schema ships as a single embedded migration, `00001_init.sql`
(single-init pre-launch policy from `ARCHITECTURE.md §Persistence Backends`).

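
The `operation_log` audit-subject convention described above is easy to get wrong when writing or reading audit rows, so a tiny Go sketch may help. The function name `auditSubject` and its signature are hypothetical; only the rule itself (an `engine_version_` prefix selects the version string) comes from this document.

```go
package main

import (
	"fmt"
	"strings"
)

// auditSubject returns the value to store in operation_log.game_id.
// Registry operations (op_kind starting with "engine_version_") are
// global, so they record the canonical version string instead of a
// platform game identifier.
func auditSubject(opKind, gameID, version string) string {
	if strings.HasPrefix(opKind, "engine_version_") {
		return version
	}
	return gameID
}

func main() {
	fmt.Println(auditSubject("turn_generation", "game-42", ""))
	fmt.Println(auditSubject("engine_version_deprecate", "", "1.4.2"))
}
```

Readers of the audit log need the same check in reverse: a `game_id` value on an `engine_version_*` row is a version, not a game.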
### Redis runtime-coordination state

| Key shape | Purpose |
| --- | --- |
| `gamemaster:stream_offsets:{label}` | Last processed entry id per consumer (`health_events`). Same shape as Lobby and RTM. |

GM does not persist the membership cache to Redis in v1; the cache is
in-process. This trade-off is documented in [`./PLAN.md` Stage 16](./PLAN.md).

## Error Model

Error envelope: `{ "error": { "code": "...", "message": "..." } }`,
identical to Lobby and RTM.

Stable error codes:

| Code | Meaning |
| --- | --- |
| `invalid_request` | Malformed JSON, unknown fields, missing required parameter. |
| `runtime_not_found` | `runtime_records.{game_id}` does not exist. |
| `runtime_not_running` | Operation requires `status=running`. |
| `conflict` | State transition not allowed. |
| `forbidden` | Caller is not an active member or not authorised. |
| `engine_version_not_found` | `engine_versions.{version}` does not exist. |
| `engine_version_in_use` | Hard-delete attempt against a version referenced by a non-finished runtime. |
| `semver_patch_only` | Patch attempt across major/minor boundary. |
| `engine_unreachable` | Engine returned 5xx or connection error. |
| `engine_protocol_violation` | Engine response missing required fields or carries unexpected payload. |
| `engine_validation_error` | Engine returned 4xx with per-command results. |
| `service_unavailable` | Dependency (PostgreSQL, Redis, Lobby, RTM) unavailable. |
| `internal_error` | Unspecified failure. |

## Configuration

All variables use the `GAMEMASTER_` prefix. The service fails fast on
startup when a required variable is missing.

### Required

- `GAMEMASTER_INTERNAL_HTTP_ADDR`
- `GAMEMASTER_POSTGRES_PRIMARY_DSN`
- `GAMEMASTER_REDIS_MASTER_ADDR`
- `GAMEMASTER_REDIS_PASSWORD`
- `GAMEMASTER_LOBBY_INTERNAL_BASE_URL`
- `GAMEMASTER_RTM_INTERNAL_BASE_URL`

### Configuration groups

**Listener:**

- `GAMEMASTER_INTERNAL_HTTP_ADDR` (e.g., `:8097`).
- `GAMEMASTER_INTERNAL_HTTP_READ_TIMEOUT` (default `5s`).
- `GAMEMASTER_INTERNAL_HTTP_WRITE_TIMEOUT` (default `30s`).
- `GAMEMASTER_INTERNAL_HTTP_IDLE_TIMEOUT` (default `60s`).

**PostgreSQL:**

- `GAMEMASTER_POSTGRES_PRIMARY_DSN`
  (`postgres://gamemaster:<pwd>@<host>:5432/galaxy?search_path=gamemaster&sslmode=disable`).
- `GAMEMASTER_POSTGRES_REPLICA_DSNS` (optional, comma-separated; not used
  in v1).
- `GAMEMASTER_POSTGRES_OPERATION_TIMEOUT` (default `2s`).
- `GAMEMASTER_POSTGRES_MAX_OPEN_CONNS` (default `10`).
- `GAMEMASTER_POSTGRES_MAX_IDLE_CONNS` (default `2`).
- `GAMEMASTER_POSTGRES_CONN_MAX_LIFETIME` (default `30m`).

**Redis:**

- `GAMEMASTER_REDIS_MASTER_ADDR`.
- `GAMEMASTER_REDIS_REPLICA_ADDRS` (optional, comma-separated).
- `GAMEMASTER_REDIS_PASSWORD`.
- `GAMEMASTER_REDIS_DB` (default `0`).
- `GAMEMASTER_REDIS_OPERATION_TIMEOUT` (default `2s`).

**Streams:**

- `GAMEMASTER_REDIS_LOBBY_EVENTS_STREAM` (default `gm:lobby_events`).
- `GAMEMASTER_REDIS_HEALTH_EVENTS_STREAM` (default
  `runtime:health_events`).
- `GAMEMASTER_REDIS_NOTIFICATION_INTENTS_STREAM` (default
  `notification:intents`).
- `GAMEMASTER_STREAM_BLOCK_TIMEOUT` (default `5s`).

**Engine client:**

- `GAMEMASTER_ENGINE_CALL_TIMEOUT` (default `30s` — covers turn generation
  on large games).
- `GAMEMASTER_ENGINE_PROBE_TIMEOUT` (default `5s` — for inspect-style
  reads).

**Lobby internal client:**

- `GAMEMASTER_LOBBY_INTERNAL_BASE_URL`.
- `GAMEMASTER_LOBBY_INTERNAL_TIMEOUT` (default `2s`).

**Runtime Manager internal client:**

- `GAMEMASTER_RTM_INTERNAL_BASE_URL`.
- `GAMEMASTER_RTM_INTERNAL_TIMEOUT` (default `5s`).

**Scheduler:**

- `GAMEMASTER_SCHEDULER_TICK_INTERVAL` (default `1s`).
- `GAMEMASTER_TURN_GENERATION_TIMEOUT` (default `60s`).

**Membership cache:**

- `GAMEMASTER_MEMBERSHIP_CACHE_TTL` (default `30s`).
- `GAMEMASTER_MEMBERSHIP_CACHE_MAX_GAMES` (default `4096`; LRU eviction).

**Logging:**

- `GAMEMASTER_LOG_LEVEL` (default `info`).

**Lifecycle:**

- `GAMEMASTER_SHUTDOWN_TIMEOUT` (default `30s`).

**Telemetry:** uses the standard OTLP env vars
(`OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_PROTOCOL`, etc.)
shared with other Galaxy services.

## Observability

### Metrics (OpenTelemetry, low cardinality)

- `gamemaster.register_runtime.outcomes` — counter; labels `outcome`,
  `error_code`.
- `gamemaster.turn_generation.outcomes` — counter; labels `outcome`,
  `error_code`, `trigger` (`scheduler | force`).
- `gamemaster.command_execute.outcomes` — counter; labels `outcome`,
  `error_code`.
- `gamemaster.order_put.outcomes` — counter; labels `outcome`,
  `error_code`.
- `gamemaster.report_get.outcomes` — counter; labels `outcome`,
  `error_code`.
- `gamemaster.banish.outcomes` — counter; labels `outcome`, `error_code`.
- `gamemaster.engine_call.latency` — histogram; label `op` (`init |
  status | turn | banish | command | order | report`).
- `gamemaster.runtime_records_by_status` — gauge; label `status`.
- `gamemaster.scheduler.due_games` — gauge.
- `gamemaster.health_events.consumed` — counter.
- `gamemaster.lobby_events.published` — counter; label `event_type`.
- `gamemaster.notification.publish_attempts` — counter; labels
  `notification_type`, `result` (`ok | error`).
- `gamemaster.membership_cache.hits` — counter; label `result` (`hit |
  miss | invalidate`).
- `gamemaster.engine_versions_total` — gauge.

Metrics avoid high-cardinality attributes such as `game_id` and `user_id`.

### Structured logs (slog JSON to stdout)

Common fields on every entry: `service=gamemaster`, `request_id`,
`trace_id`, `span_id`, `game_id` (when known), `user_id` (when known),
`op_kind`, `op_source`, `outcome`, `error_code`.

Worker-specific fields: `event_type` (lobby-events publisher),
`stream_entry_id` (health-events consumer), `turn` (turn-generation),
`engine_endpoint` (engine calls).

## Verification

Service-level (per [`./PLAN.md`](./PLAN.md)):

- Unit tests for every service-layer operation against mocked engine,
  Lobby, RTM, notification publisher, lobby-events publisher.
- Adapter tests using `testcontainers-go` for PostgreSQL and Redis.
- Contract tests for `internal-openapi.yaml` and
  `runtime-events-asyncapi.yaml`.

Service-local integration suite under `gamemaster/integration/`:

- Register-runtime + first turn happy path against the real
  `galaxy/game` test image.
- Force-next-turn skip behaviour.
- Engine version registry CRUD + resolve.
- Admin stop synchronous REST.
- Banish round-trip.
- Membership invalidation hook.
- `runtime:health_events` consumption.

Inter-service suites under `integration/lobbygm/` and
`integration/lobbygmrtm/`:

- `lobbygm`: real Lobby + real GM + real engine + stub RTM. Covers
  enrollment → register-runtime → first turn → finish + capability
  evaluation.
- `lobbygmrtm`: full Lobby + GM + RTM + engine. Covers happy path and the
  documented failure paths from `ARCHITECTURE.md` flow §4.

Manual smoke (development):

```sh
docker network create galaxy-net   # once

# The DSN is quoted so the shell does not interpret `?` and `&`.
# `...` stands for any optional variables; remove it if none are needed.
GAMEMASTER_INTERNAL_HTTP_ADDR=:8097 \
GAMEMASTER_POSTGRES_PRIMARY_DSN='postgres://gamemaster:secret@localhost:5432/galaxy?search_path=gamemaster&sslmode=disable' \
GAMEMASTER_REDIS_MASTER_ADDR=localhost:6379 \
GAMEMASTER_REDIS_PASSWORD=secret \
GAMEMASTER_LOBBY_INTERNAL_BASE_URL=http://localhost:8095 \
GAMEMASTER_RTM_INTERNAL_BASE_URL=http://localhost:8096 \
... go run ./gamemaster/cmd/gamemaster
```

After start, `curl http://localhost:8097/readyz` returns `200`. Driving
Lobby through its public start flow brings up `galaxy-game-{game_id}`
containers, GM registers each runtime, generates turns on the configured
schedule, and propagates events to Lobby.