# Game Master Implementation Plan

This plan delivers `Game Master` (GM), the platform service that owns
runtime and operational state of running Galaxy games, mediates every call
to the engine container, runs the turn scheduler, and owns the engine
version registry.

The plan also delivers the upstream changes that GM depends on: the
extracted `pkg/cronutil` module, the engine admin-path rename plus the
`finished:bool` field and the new `/admin/race/banish` endpoint on
`galaxy/game`, the Lobby refactor that drops `LOBBY_ENGINE_IMAGE_TEMPLATE`
in favour of synchronous image-ref resolution against GM, and the
membership invalidation hook from Lobby into GM.

The architectural rules behind every decision are recorded in
[`./README.md`](./README.md). This file describes the order in which the
implementation lands.

## Global Rules

- Documentation always lands before contracts; contracts before code.
- Each stage leaves the repository in a buildable, test-green state. No
  stage relies on a later stage to fix a regression it introduced.
- Existing-service refactors (Lobby image-ref resolver, Lobby membership
  invalidation hook, game engine path rename plus `finished` field plus
  banish endpoint, `pkg/cronutil` extraction) are full-fledged stages of
  this plan; they precede every GM stage that depends on them.
- GM never opens the Docker SDK. Every container operation goes through
  `Runtime Manager` over trusted internal REST.
- GM never trusts an `actor` field provided in a payload from `Edge
  Gateway`; it always derives `actor=race_name` from its own
  `(user_id → race_name)` mapping.
- Every functional change ships its tests in the same stage. Contract
  tests freeze operation IDs and stream message names from Stage 06
  onward.
- All code, docs, and identifiers are written in English.
- Engine domain logic (when `finished=true` is set, what `banish` mutates
  inside the game) is user-owned and explicitly out of scope; this plan
  ships only the contract, router plumbing, and stub handlers for those
  pieces.

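The actor rule above can be illustrated with a minimal sketch. It assumes an in-memory map in place of the real `player_mappings` store; `Command`, `raceByUser`, `withDerivedActor`, and `ErrNoMapping` are hypothetical names chosen for this example, not identifiers from the codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoMapping is a hypothetical sentinel for a user who has no race in the game.
var ErrNoMapping = errors.New("no player mapping for user")

// Command is an illustrative payload shape. Actor holds whatever the caller
// supplied and is never trusted.
type Command struct {
	Actor string
	Body  string
}

// raceByUser is an in-memory stand-in for the (user_id -> race_name) mapping.
type raceByUser map[string]string

// withDerivedActor overwrites the caller-supplied actor with the race name
// looked up from the service's own mapping.
func withDerivedActor(m raceByUser, userID string, cmd Command) (Command, error) {
	race, ok := m[userID]
	if !ok {
		return Command{}, ErrNoMapping
	}
	cmd.Actor = race
	return cmd, nil
}

func main() {
	m := raceByUser{"user-1": "Aelinari"}
	cmd, err := withDerivedActor(m, "user-1", Command{Actor: "spoofed", Body: "move"})
	fmt.Println(cmd.Actor, err)
}
```

The point of the shape is that the payload value is overwritten unconditionally, so there is no code path on which a gateway-supplied `actor` reaches the engine.
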
## Suggested Module Structure

```text
gamemaster/
├── cmd/
│   ├── gamemaster/
│   │   └── main.go
│   └── jetgen/
│       └── main.go
│
├── internal/
│   ├── app/
│   │   ├── app.go
│   │   ├── runtime.go
│   │   ├── wiring.go
│   │   └── bootstrap.go
│   │
│   ├── config/
│   │   ├── config.go
│   │   ├── env.go
│   │   └── validation.go
│   │
│   ├── logging/
│   │   ├── logger.go
│   │   └── context.go
│   │
│   ├── telemetry/
│   │   └── runtime.go
│   │
│   ├── domain/
│   │   ├── runtime/
│   │   │   ├── model.go
│   │   │   └── transitions.go
│   │   ├── engineversion/
│   │   │   ├── model.go
│   │   │   └── semver.go
│   │   ├── playermapping/
│   │   │   └── model.go
│   │   └── schedule/
│   │       └── nexttick.go
│   │
│   ├── ports/
│   │   ├── runtimerecordstore.go
│   │   ├── engineversionstore.go
│   │   ├── playermappingstore.go
│   │   ├── operationlog.go
│   │   ├── streamoffsetstore.go
│   │   ├── engineclient.go
│   │   ├── lobbyclient.go
│   │   ├── rtmclient.go
│   │   ├── notificationpublisher.go
│   │   └── lobbyeventspublisher.go
│   │
│   ├── adapters/
│   │   ├── postgres/
│   │   │   ├── migrations/
│   │   │   ├── jet/
│   │   │   ├── runtimerecordstore/
│   │   │   ├── engineversionstore/
│   │   │   ├── playermappingstore/
│   │   │   └── operationlog/
│   │   ├── redisstate/
│   │   │   └── streamoffsets/
│   │   ├── engineclient/
│   │   ├── lobbyclient/
│   │   ├── rtmclient/
│   │   ├── notificationpublisher/
│   │   ├── lobbyeventspublisher/
│   │   └── mocks/
│   │
│   ├── service/
│   │   ├── registerruntime/
│   │   ├── engineversion/
│   │   ├── scheduler/
│   │   ├── turngeneration/
│   │   ├── commandexecute/
│   │   ├── orderput/
│   │   ├── reportget/
│   │   ├── membership/
│   │   ├── adminstop/
│   │   ├── adminforce/
│   │   ├── adminpatch/
│   │   ├── adminbanish/
│   │   └── livenessreply/
│   │
│   ├── worker/
│   │   ├── schedulerticker/
│   │   └── healtheventsconsumer/
│   │
│   └── api/
│       └── internalhttp/
│           ├── server.go
│           └── handlers/
│
├── api/
│   ├── internal-openapi.yaml
│   └── runtime-events-asyncapi.yaml
│
├── integration/
│   ├── harness/
│   ├── registerruntime_test.go
│   ├── scheduler_test.go
│   ├── hotpath_test.go
│   ├── adminops_test.go
│   ├── healthevents_test.go
│   └── notification_test.go
│
├── docs/
│   ├── README.md
│   ├── runtime.md
│   ├── flows.md
│   ├── runbook.md
│   ├── examples.md
│   └── postgres-migration.md
│
├── README.md
├── PLAN.md
├── Makefile
└── go.mod
```

## ~~Stage 01.~~ Update `ARCHITECTURE.md`

Goal:

- align the project-wide source of truth with every decision recorded in
  [`./README.md`](./README.md) before any code change touches it.

Tasks:

- Expand `ARCHITECTURE.md §8` (Game Master) with subsections: engine
  container contract (admin vs player paths, `finished:bool` semantics,
  `banish` endpoint), runtime status enum (`starting | running |
  generation_in_progress | generation_failed | stopped |
  engine_unreachable | finished`), turn cutoff rule (no shadow window;
  CAS-only), force-next-turn skip rule, snapshot publishing cadence
  (events only, no heartbeat), single-instance topology.
- Update §«Versioning of Game Engines»: GM owns the engine version
  registry from v1; Lobby resolves `image_ref` synchronously through GM.
  `LOBBY_ENGINE_IMAGE_TEMPLATE` is removed. `engine_versions` table lives
  in the `gamemaster` schema.
- Update §«Fixed synchronous interactions»: add `Game Lobby → Game Master`
  for `register-runtime`, image-ref resolve, membership invalidation
  hook, banish, and liveness reply. Add `Edge Gateway → Game Master` for
  player commands, orders, and reports.
- Update §«Fixed asynchronous interactions»: add `Game Master → Game
  Lobby` runtime snapshot updates and game-finish events through the
  `gm:lobby_events` Redis Stream (already mentioned, expanded with
  cadence rules); add `Runtime Manager → Game Master` health events
  consumption (`runtime:health_events`) — already mentioned, confirmed.
- Update §«Persistence Backends»: add `gamemaster` schema to the
  schema-per-service list and to PG-backed services.
- Update §«Configuration»: add `GAMEMASTER` to the env-var prefix list
  with the same shape rules as other PG/Redis-backed services.
- Update §«Recommended Order of Service Implementation» entry 8 with the
  scope finalised in [`./README.md`](./README.md).
- Drop `ships_built` from every architectural mention of
  `player_turn_stats`. Update the capability rule wording to use
  `planets` and `population` only (no behavioural change; `ships_built`
  was unused).

Files touched:

- `ARCHITECTURE.md`.

Exit criteria:

- every later GM, Lobby, Notification, or Game stage can quote its rules
  from `ARCHITECTURE.md` without re-deciding them.
- `go test ./...` is unaffected (this stage changes only Markdown).

## ~~Stage 02.~~ Freeze GM `README.md`

Status: implemented as part of this planning task — see
[`./README.md`](./README.md).

Goal:

- publish the complete service description so contracts and code can
  reference one source.

Exit criteria:

- a reviewer can answer any «what does GM do when X» question by reading
  the README alone.

## ~~Stage 03.~~ Sync existing-service docs (Lobby, Notification, Game, RTM)

Goal:

- bring the READMEs of every touched service into agreement with the GM
  contract before any code in those services changes.

Tasks:

- `lobby/README.md`:
  - replace the `LOBBY_ENGINE_IMAGE_TEMPLATE` configuration entry with a
    new `LOBBY_GM_BASE_URL`-backed image-ref resolve via
    `GET /api/v1/internal/engine-versions/{version}/image-ref`;
  - document the new outgoing `POST /api/v1/internal/games/{id}/memberships/invalidate`
    call from `removemember`, `blockmember`, `approveapplication`,
    `rejectapplication`, `redeeminvite`, and the user-lifecycle cascade
    worker (post-commit, fail-open);
  - drop `ships_built` from the `player_turn_stats` description and from
    the capability evaluation wording (rule already reduces to planets +
    population);
  - add a paragraph in §Game Start Flow noting that `image_ref` is
    resolved from GM synchronously and that GM unavailability turns
    `lobby.game.start` into `service_unavailable`.
- `lobby/PLAN.md`: append a closing note stating that the image-ref
  template removal and the membership invalidation hook are landed by
  the Game Master plan; no new stages added in Lobby's own PLAN.
- `notification/README.md`: confirm the catalog already lists
  `game.turn.ready`, `game.finished`, `game.generation_failed` and add
  a one-line note that GM is the producer.
- `game/README.md`:
  - document the new path layout: admin endpoints under
    `/api/v1/admin/*` (`init`, `status`, `turn`, `race/banish`); player
    endpoints unchanged at `/api/v1/{command, order, report}`;
  - document the `finished:bool` extension on `StateResponse`;
  - document the `POST /api/v1/admin/race/banish` request/response shape
    (body `{race_name}`; response `204`).
- `rtmanager/README.md`: add a closing note that `runtime:health_events`
  is now consumed by Game Master in production (was reserved as a future
  consumer).

Files touched:

- `lobby/README.md`, `lobby/PLAN.md`, `notification/README.md`,
  `game/README.md`, `rtmanager/README.md`.

Exit criteria:

- every doc in the repo agrees on the post-GM contract; no contradiction
  remains between any two READMEs.
- `go test ./...` is unaffected.

## ~~Stage 04.~~ Extract `pkg/cronutil` + wire Lobby

Goal:

- own a single cron parser/calculator across the workspace, used today
  by Lobby and tomorrow by GM.

Tasks:

- Create new workspace module `pkg/cronutil/` with:
  - `cronutil.go`: thin wrapper over
    `github.com/robfig/cron/v3.NewParser(cron.Minute | cron.Hour | cron.Dom | cron.Month | cron.Dow)`;
    exports `Parse(expr string) (Schedule, error)` and
    `Schedule.Next(after time.Time) time.Time`;
  - `cronutil_test.go`: parser validation tests covering five-field cron
    expressions (e.g., `0 18 * * *`, `*/15 * * * *`), invalid expressions,
    DST/timezone behaviour (Schedule operates in UTC; UTC inputs yield
    UTC outputs);
  - `go.mod` declaring the module `galaxy/cronutil` with replace target.
- Wire from Lobby: replace any inline `robfig/cron/v3` usage in
  `lobby/internal/domain/game/model.go:validateCronExpr` and the
  enrollment automation worker with calls into `pkg/cronutil`. The
  enrollment automation worker does not parse cron today (it uses
  `enrollment_ends_at` UTC seconds), so the only Lobby caller is the
  cron-validation path on game records.
- Update `go.work` to include `./pkg/cronutil` and add the replace block.
- Add Lobby unit tests confirming `validateCronExpr` accepts and rejects
  the same expressions as before.

Files new:

- `pkg/cronutil/{cronutil.go, cronutil_test.go, go.mod, go.sum}`.

Files touched:

- `go.work`, `go.work.sum`, `lobby/internal/domain/game/model.go`,
  `lobby/go.mod`, `lobby/go.sum`.

Exit criteria:

- `go build ./...` succeeds.
- `go test ./pkg/cronutil/... ./lobby/...` passes.
- `lobby/internal/domain/game/model_test.go` still asserts the same
  acceptance set on cron expressions.

## ~~Stage 05.~~ Game engine contract: admin paths + finished + banish

Goal:

- ship the contract changes to `galaxy/game` that GM depends on: admin
  routes under `/api/v1/admin/*`, the `StateResponse.finished` field,
  and the new `/admin/race/banish` endpoint.

Tasks:

- `game/openapi.yaml`:
  - rename `/api/v1/init` → `/api/v1/admin/init` (operation
    `initGame` → `adminInitGame`);
  - rename `/api/v1/status` → `/api/v1/admin/status` (operation
    `getGameStatus` → `adminGetGameStatus`);
  - rename `/api/v1/turn` → `/api/v1/admin/turn` (operation
    `generateTurn` → `adminGenerateTurn`);
  - add `POST /api/v1/admin/race/banish` (operation `adminBanishRace`)
    with body `{race_name}` and `204 No Content` on success; document
    the same `400` and `500` error envelopes as the existing endpoints;
  - extend `StateResponse` schema with `finished:bool` (required;
    default `false` from server perspective documented in description).
- `game/internal/router/router.go` (or its router-helper file): rename
  the route constants and registrations to the new admin paths; add a
  new route for `/admin/race/banish` wired to a stub handler returning
  `204` with empty body.
- `game/internal/router/handler/banish.go`: new file with a stub handler
  that decodes the body, validates `race_name` is non-empty, and returns
  `204`. Logging only; no game-state mutation. The user fills in domain
  logic in a separate change.
- `game/internal/model/state.go`: add `Finished bool` field to the Go
  struct backing `StateResponse`. Default-zero (`false`) on serialisation;
  the user fills in conditional logic.
- `game/internal/router/{init,status,turn}_test.go`: update path
  literals to the new admin form; tests stay green.
- `game/openapi_contract_test.go`: assert presence of the new operation
  IDs (`adminInitGame`, `adminGetGameStatus`, `adminGenerateTurn`,
  `adminBanishRace`), the new path components, and the `finished` field
  on `StateResponse`.

Files new:

- `game/internal/router/handler/banish.go`,
  `game/internal/router/banish_test.go` (path-level test only).

Files touched:

- `game/openapi.yaml`, `game/openapi_contract_test.go`,
  `game/internal/router/router.go`, `game/internal/router/handler/*.go`,
  `game/internal/router/{init,status,turn}_test.go`,
  `game/internal/model/state.go`.

Exit criteria:

- `go test ./game/...` passes.
- `docker build -t galaxy/game:test -f game/Dockerfile .` from the
  workspace root still succeeds.
- `curl -X POST http://localhost:8080/api/v1/admin/race/banish -d
  '{"race_name":"Aelinari"}'` against a running container returns `204`.

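The banish stub described in the Stage 05 tasks could look roughly like the sketch below. It uses only `net/http` from the standard library, so it makes no claim about the router the engine actually uses. The plain-text `400` bodies are a placeholder, because the engine's real error envelope is defined in `game/openapi.yaml` and is not reproduced here. `banishRequest`, `banishStub`, and `postBanish` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// banishRequest mirrors the documented body {race_name}.
type banishRequest struct {
	RaceName string `json:"race_name"`
}

// banishStub decodes the body, rejects an empty race_name with 400, logs the
// request, and returns 204 without touching any game state.
func banishStub(w http.ResponseWriter, r *http.Request) {
	var req banishRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	if strings.TrimSpace(req.RaceName) == "" {
		http.Error(w, "race_name is required", http.StatusBadRequest)
		return
	}
	log.Printf("banish requested for race %q (stub, no mutation)", req.RaceName)
	w.WriteHeader(http.StatusNoContent)
}

// postBanish runs the stub in-process and returns the HTTP status code.
func postBanish(body string) int {
	req := httptest.NewRequest(http.MethodPost, "/api/v1/admin/race/banish", strings.NewReader(body))
	rec := httptest.NewRecorder()
	banishStub(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(postBanish(`{"race_name":"Aelinari"}`))
	fmt.Println(postBanish(`{"race_name":""}`))
}
```

A path-level test such as `banish_test.go` can reuse the `httptest.NewRecorder` pattern shown in `postBanish` without starting a container.
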
## ~~Stage 06.~~ GM contract files and contract tests

Goal:

- ship machine-readable contracts before any GM handler is written, so
  the implementation has a target spec.

Tasks:

- `gamemaster/api/internal-openapi.yaml`: every internal REST endpoint
  with request and response schemas; error envelope `{ "error": {
  "code", "message" } }` identical to Lobby. Operation IDs:
  `internalRegisterRuntime`, `internalGetRuntime`, `internalListRuntimes`,
  `internalForceNextTurn`, `internalStopRuntime`, `internalPatchRuntime`,
  `internalBanishRace`, `internalInvalidateMemberships`,
  `internalGameLiveness`, `internalListEngineVersions`,
  `internalCreateEngineVersion`, `internalGetEngineVersion`,
  `internalUpdateEngineVersion`, `internalDeprecateEngineVersion`,
  `internalResolveEngineVersionImageRef`, `internalExecuteCommands`,
  `internalPutOrders`, `internalGetReport`, `internalHealthz`,
  `internalReadyz`.
- `gamemaster/api/runtime-events-asyncapi.yaml`: AsyncAPI 3.1.0 spec for
  `gm:lobby_events`. Two `event_type` values: `runtime_snapshot_update`
  and `game_finished`. Frozen field set per message:
  `runtime_snapshot_update {game_id, current_turn, runtime_status,
  engine_health_summary, player_turn_stats[], occurred_at_ms}`;
  `game_finished {game_id, final_turn_number, runtime_status,
  player_turn_stats[], finished_at_ms}`.
- `gamemaster/contract_openapi_test.go`: load the OpenAPI spec via
  `kin-openapi`, assert every operation ID is present, every required
  field on every request/response schema is present, and that
  `additionalProperties: false` is set on every body schema.
- `gamemaster/contract_asyncapi_test.go`: load the AsyncAPI spec via the
  shared YAML walker pattern from `notification/contract_asyncapi_test.go`;
  assert message names, channel addresses, action vocabulary
  (`send`/`receive`), and `event_type` discriminator values.

Files new:

- `gamemaster/api/internal-openapi.yaml`,
  `gamemaster/api/runtime-events-asyncapi.yaml`,
  `gamemaster/contract_openapi_test.go`,
  `gamemaster/contract_asyncapi_test.go`.

Exit criteria:

- both specs validate.
- contract tests pass; tests fail loudly if any operation ID, message
  name, or required field disappears.

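As a small illustration of the error envelope named in the Stage 06 tasks, the sketch below models `{ "error": { "code", "message" } }` as Go structs and decodes it with unknown fields rejected, which is the closest decoding-side analogue of `additionalProperties: false`. The type and function names are hypothetical, and the sample `code` and `message` values are illustrative only.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// ErrorBody and ErrorEnvelope model { "error": { "code", "message" } }.
type ErrorBody struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

type ErrorEnvelope struct {
	Error ErrorBody `json:"error"`
}

// marshalEnvelope renders an envelope as a JSON string.
func marshalEnvelope(code, message string) string {
	out, err := json.Marshal(ErrorEnvelope{Error: ErrorBody{Code: code, Message: message}})
	if err != nil {
		return ""
	}
	return string(out)
}

// decodeStrict rejects any field that the structs do not declare, at every
// nesting level.
func decodeStrict(data []byte) (ErrorEnvelope, error) {
	var env ErrorEnvelope
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&env)
	return env, err
}

func main() {
	fmt.Println(marshalEnvelope("conflict", "runtime already registered"))
	_, err := decodeStrict([]byte(`{"error":{"code":"x","message":"y","extra":1}}`))
	fmt.Println("extra field rejected:", err != nil)
}
```
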
## ~~Stage 07.~~ Notification catalog audit (no-op or minor)

Goal:

- confirm the GM-owned notification types (`game.turn.ready`,
  `game.finished`, `game.generation_failed`) are already wired through
  `pkg/notificationintent`, the `notification` service's catalog data
  tables, and `notification/api/intents-asyncapi.yaml`. Add freeze
  assertions so a future drift breaks loudly.

Tasks:

- Run a freeze test inside `gamemaster/` that imports
  `galaxy/notificationintent` and asserts the existence of the three
  constructors plus payload struct shapes.
- Inspect `notification/api/intents-asyncapi.yaml` for the three message
  schemas; if any are missing the per-payload required fields, add them
  here.
- Inspect the notification service's routing data tables (the location
  is internal to `notification/internal/...`); confirm the three types
  are present with audience and channel decisions matching
  [`./README.md` §Notification Contracts](./README.md). Add entries if
  missing.
- Extend `notification/contract_asyncapi_test.go` if any new payload
  schema entries were added.

Files touched (only if drift is found):

- `notification/api/intents-asyncapi.yaml`,
  `notification/internal/...` (catalog data),
  `notification/contract_asyncapi_test.go`.

Files new:

- `gamemaster/notificationintent_audit_test.go`.

Exit criteria:

- the freeze test passes.
- `notification/contract_asyncapi_test.go` and
  `intent_acceptance_contract_test.go` continue to pass.

## ~~Stage 08.~~ GM module skeleton

Goal:

- create a buildable `gamemaster` binary that loads config, opens
  dependencies, and exits cleanly on SIGTERM. It does no business work
  yet.

Tasks:

- `gamemaster/cmd/gamemaster/main.go` mirroring `rtmanager/cmd/rtmanager/main.go`.
- `gamemaster/internal/config/{config.go, env.go, validation.go}` with
  env prefix `GAMEMASTER` and groups Listener, Postgres, Redis, Streams,
  Engine client, Lobby internal client, RTM internal client, Scheduler,
  Membership cache, Logging, Lifecycle, Telemetry. Required variables
  fail-fast.
- `gamemaster/internal/logging/{logger.go, context.go}` copied from
  lobby/notification.
- `gamemaster/internal/telemetry/runtime.go` registering the metrics
  named in [`./README.md §Observability`](./README.md).
- `gamemaster/internal/app/{runtime.go, app.go, wiring.go, bootstrap.go}`
  — empty wiring with PostgreSQL open, Redis open, telemetry open, probe
  listener open.
- `gamemaster/internal/api/internalhttp/server.go` — listener with
  `/healthz` and `/readyz` only.
- `gamemaster/Makefile` with the `jet` target (real generation lands in
  Stage 09) and a `mocks` target.
- `gamemaster/go.mod` and `go.sum` with dependencies:
  `github.com/redis/go-redis/v9`, `github.com/jackc/pgx/v5`,
  `github.com/go-jet/jet/v2`, `github.com/pressly/goose/v3`,
  `github.com/stretchr/testify`, `go.uber.org/mock`, the testcontainers
  modules for postgres/redis, the OpenTelemetry stack identical to lobby,
  `galaxy/cronutil`, `galaxy/notificationintent`, `galaxy/postgres`,
  `galaxy/redisconn`, `galaxy/error`, `galaxy/util`.
- Update repo-level `go.work` — `./gamemaster` is already a workspace
  member; verify the module path and `go.work.sum`.

Files new:

- the entire skeleton tree under `gamemaster/`.

Exit criteria:

- `go build ./gamemaster/cmd/gamemaster` succeeds.
- Running with valid env brings `/healthz` and `/readyz` up.
- `SIGTERM` returns within `GAMEMASTER_SHUTDOWN_TIMEOUT`.

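One possible reading of "required variables fail-fast" in the Stage 08 config task is sketched below: collect every missing variable and return a single error before any dependency is opened. The helper names are hypothetical, and `GAMEMASTER_POSTGRES_DSN` and `GAMEMASTER_REDIS_ADDR` are placeholder variable names that follow the `GAMEMASTER` prefix; the real variable list lives in the README.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// lookupFunc has the same shape as os.LookupEnv so tests can inject a map.
type lookupFunc func(key string) (string, bool)

// requireAll returns the value of every key, or one error that names all
// missing or blank keys at once.
func requireAll(lookup lookupFunc, keys ...string) (map[string]string, error) {
	values := make(map[string]string, len(keys))
	var missing []string
	for _, k := range keys {
		v, ok := lookup(k)
		if !ok || strings.TrimSpace(v) == "" {
			missing = append(missing, k)
			continue
		}
		values[k] = v
	}
	if len(missing) > 0 {
		return nil, fmt.Errorf("missing required environment variables: %s",
			strings.Join(missing, ", "))
	}
	return values, nil
}

func main() {
	// Placeholder variable names; see the lead-in.
	_, err := requireAll(os.LookupEnv, "GAMEMASTER_POSTGRES_DSN", "GAMEMASTER_REDIS_ADDR")
	if err != nil {
		// A real main would log this and exit non-zero before opening anything.
		fmt.Println("config error:", err)
	}
}
```

Reporting all missing keys in one error saves an operator from fixing them one restart at a time.
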
## ~~Stage 09.~~ PostgreSQL schema, migrations, jet

Goal:

- finalise the persistence schema and the code-generation pipeline.

Tasks:

- `gamemaster/internal/adapters/postgres/migrations/00001_init.sql` —
  `CREATE SCHEMA IF NOT EXISTS gamemaster;` plus the four tables and
  indexes from [`./README.md §Persistence Layout`](./README.md):
  `runtime_records`, `engine_versions`, `player_mappings`,
  `operation_log`. All time columns are `timestamptz`.
- `gamemaster/internal/adapters/postgres/migrations/migrations.go` —
  `//go:embed *.sql` and `FS()` exporter, identical pattern to lobby and
  rtmanager.
- `gamemaster/cmd/jetgen/main.go` — testcontainers PostgreSQL + goose up +
  jet generation against the resulting database. Mirrors
  `rtmanager/cmd/jetgen/main.go`.
- Generated `gamemaster/internal/adapters/postgres/jet/...` committed to
  the repo.
- Wire goose migrations into `gamemaster/internal/app/runtime.go`
  startup so they apply before any listener opens; non-zero exit on
  failure (matches `pkg/postgres` policy).

Files new:

- as above.

Exit criteria:

- `make -C gamemaster jet` regenerates the jet code with no diff after a
  clean run.
- Service start applies migrations to a fresh database and exits zero if
  migrations are already applied.

## ~~Stage 10.~~ Domain layer and ports

Goal:

- lock the in-memory domain model and the port interfaces for adapters.

Tasks:

- `gamemaster/internal/domain/runtime/model.go` — `RuntimeRecord` struct;
  status enum (`StatusStarting`, `StatusRunning`,
  `StatusGenerationInProgress`, `StatusGenerationFailed`, `StatusStopped`,
  `StatusEngineUnreachable`, `StatusFinished`); error sentinels.
- `gamemaster/internal/domain/runtime/transitions.go` — allowed
  transitions table and a CAS-friendly validator.
- `gamemaster/internal/domain/engineversion/{model.go, semver.go}` —
  `EngineVersion` struct (`Version`, `ImageRef`, `Options`, `Status`);
  semver parse + patch-only comparison helpers.
- `gamemaster/internal/domain/playermapping/model.go` — `PlayerMapping`
  struct (`GameID`, `UserID`, `RaceName`, `EnginePlayerUUID`).
- `gamemaster/internal/domain/schedule/nexttick.go` — wraps
  `cronutil.Schedule`; carries `skip_next_tick` semantics on
  `Next(after, skip bool) (time.Time, skipConsumed bool)`.
- `gamemaster/internal/ports/`:
  - `runtimerecordstore.go` — `Get`, `Insert`, `UpdateStatus` (CAS by
    expected status), `UpdateScheduling`, `ListDueRunning`, `ListByStatus`.
  - `engineversionstore.go` — `Get`, `List` (with `status` filter),
    `Insert`, `Update`, `Deprecate`, `IsReferencedByActiveRuntime`.
  - `playermappingstore.go` — `BulkInsert`, `Get(gameID, userID)`,
    `ListByGame(gameID)`, `DeleteByGame(gameID)`.
  - `operationlog.go` — `Append`, `ListByGame`.
  - `streamoffsetstore.go` — `Load`, `Save` (Redis offset persistence
    per consumer label).
  - `engineclient.go` — narrow surface GM uses: `Init`, `Status`, `Turn`,
    `BanishRace`, `ExecuteCommands`, `PutOrders`, `GetReport`.
  - `lobbyclient.go` — `GetMemberships(ctx, gameID) ([]Membership, error)`.
  - `rtmclient.go` — `Stop(ctx, gameID, reason) error`,
    `Patch(ctx, gameID, imageRef) error`, `Restart` (reserved; not in v1
    feature scope).
  - `notificationpublisher.go` — `Publish(ctx, intent) error`.
  - `lobbyeventspublisher.go` — `PublishSnapshotUpdate`,
    `PublishGameFinished`.
- `//go:generate mockgen` directive next to each interface declaration.

Files new:

- as above.

Exit criteria:

- the package compiles.
- every interface has a `_ ports.X = (*Y)(nil)` assertion slot ready for
  the adapters that follow.
- `go test ./gamemaster/internal/domain/...` passes.

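A transitions table with a CAS-friendly validator, as named in the Stage 10 tasks, might take the shape sketched below. The status constants come from this plan. The `allowed` table is an assumption: it lists only the edges that the register-runtime and turn-generation flows in this plan mention, and the authoritative table belongs to the README. `ValidateTransition` and `ErrInvalidTransition` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is the runtime status enum from this plan.
type Status string

const (
	StatusStarting             Status = "starting"
	StatusRunning              Status = "running"
	StatusGenerationInProgress Status = "generation_in_progress"
	StatusGenerationFailed     Status = "generation_failed"
	StatusStopped              Status = "stopped"
	StatusEngineUnreachable    Status = "engine_unreachable"
	StatusFinished             Status = "finished"
)

// ErrInvalidTransition is a hypothetical sentinel for a rejected edge.
var ErrInvalidTransition = errors.New("invalid runtime status transition")

// allowed is an ASSUMED, deliberately incomplete table: only the edges named
// in this plan's flows appear here.
var allowed = map[Status]map[Status]bool{
	StatusStarting: {StatusRunning: true},
	StatusRunning:  {StatusGenerationInProgress: true},
	StatusGenerationInProgress: {
		StatusRunning:          true,
		StatusFinished:         true,
		StatusGenerationFailed: true,
	},
}

// ValidateTransition checks an explicit (from, to) pair. Taking "from" as an
// argument keeps it CAS-friendly: the same pair can feed the store's
// expected-status comparison.
func ValidateTransition(from, to Status) error {
	if allowed[from][to] {
		return nil
	}
	return fmt.Errorf("%w: %s -> %s", ErrInvalidTransition, from, to)
}

func main() {
	fmt.Println(ValidateTransition(StatusStarting, StatusRunning))
	fmt.Println(ValidateTransition(StatusFinished, StatusRunning))
}
```
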
## ~~Stage 11.~~ Persistence adapters

Goal:

- implement the four PostgreSQL stores and the Redis offset store.

Tasks:

- `gamemaster/internal/adapters/postgres/runtimerecordstore/store.go`
  using jet. CAS semantics on `UpdateStatus` (expected status comparison
  inside the SQL `UPDATE ... WHERE game_id = $1 AND status = $2`
  pattern). `UpdateScheduling` mutates `next_generation_at` and
  `skip_next_tick` together.
- `gamemaster/internal/adapters/postgres/engineversionstore/store.go`.
  `IsReferencedByActiveRuntime` joins against
  `runtime_records WHERE status NOT IN ('finished','stopped')`.
- `gamemaster/internal/adapters/postgres/playermappingstore/store.go`.
  `BulkInsert` is a single `INSERT ... ON CONFLICT DO NOTHING`.
- `gamemaster/internal/adapters/postgres/operationlog/store.go`.
- `gamemaster/internal/adapters/redisstate/streamoffsets/store.go`
  (mirror Lobby's and RTM's `redisstate/streamoffsets`).
- For each adapter: store-level integration tests against testcontainers
  PostgreSQL or Redis. CAS semantics on `runtime_records.UpdateStatus`
  are verified by an explicit concurrent-update test (only one of two
  callers wins). The semver-patch comparison in `engineversion` is
  verified against a curated table of cases.

Files new:

- as above and per-package `_test.go`.

Exit criteria:

- store tests pass on a CI runner with Docker available.

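The "only one of two callers wins" property from the Stage 11 tasks can be shown without a database. The sketch below replaces the PostgreSQL table with a mutex-guarded map, so it demonstrates the compare-and-set contract rather than the jet-based adapter; `memStore` and `raceTwoCallers` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// memStore is an in-memory stand-in for runtime_records (game_id -> status).
type memStore struct {
	mu     sync.Mutex
	status map[string]string
}

// UpdateStatus mirrors UPDATE ... WHERE game_id = $1 AND status = $2.
// It returns true when the row matched and false when the expectation was stale.
func (s *memStore) UpdateStatus(gameID, expected, next string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, ok := s.status[gameID]; !ok || cur != expected {
		return false
	}
	s.status[gameID] = next
	return true
}

// raceTwoCallers starts two concurrent CAS attempts that both expect
// "running" and returns how many of them succeeded.
func raceTwoCallers(s *memStore, gameID string) int {
	var wg sync.WaitGroup
	results := make(chan bool, 2)
	for _, next := range []string{"generation_in_progress", "stopped"} {
		wg.Add(1)
		go func(next string) {
			defer wg.Done()
			results <- s.UpdateStatus(gameID, "running", next)
		}(next)
	}
	wg.Wait()
	close(results)
	wins := 0
	for ok := range results {
		if ok {
			wins++
		}
	}
	return wins
}

func main() {
	s := &memStore{status: map[string]string{"g1": "running"}}
	fmt.Println("winners:", raceTwoCallers(s, "g1"))
}
```

In the SQL adapter the equivalent signal would be the affected-row count of the `UPDATE`: one row means the caller won, zero means another writer changed the status first.
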
## ~~Stage 12.~~ External clients (engine, lobby, RTM, notification, lobby-events)

Goal:

- ship the HTTP and Redis adapters that GM uses to talk to the engine,
  Lobby internal API, RTM internal API, the notification stream, and the
  lobby-events stream.

Tasks:

- `gamemaster/internal/adapters/engineclient/client.go` — REST client
  over an `otelhttp`-wrapped `http.Client`. Implements `ports.EngineClient`
  by calling the renamed admin endpoints (`/api/v1/admin/init`,
  `/admin/status`, `/admin/turn`, `/admin/race/banish`) and the player
  endpoints (`/api/v1/command`, `/api/v1/order`, `/api/v1/report`).
  Builds and consumes the existing JSON shapes from `game/openapi.yaml`.
- `gamemaster/internal/adapters/lobbyclient/client.go` — REST client for
  `GET /api/v1/internal/games/{game_id}/memberships`. Returns a typed
  `Membership` slice.
- `gamemaster/internal/adapters/rtmclient/client.go` — REST client for
  `POST /api/v1/internal/runtimes/{game_id}/stop` and `/patch`.
- `gamemaster/internal/adapters/notificationpublisher/publisher.go` —
  thin XADD wrapper over `notification:intents` using
  `galaxy/notificationintent` constructors.
- `gamemaster/internal/adapters/lobbyeventspublisher/publisher.go` —
  XADD wrapper for `gm:lobby_events`. Two methods:
  `PublishSnapshotUpdate(ctx, msg)` and
  `PublishGameFinished(ctx, msg)`. Schema enforced inline against
  `runtime-events-asyncapi.yaml`.
- `gamemaster/internal/adapters/mocks/` — `mockgen`-generated mocks for
  every `ports.*` interface. Regenerated by `make -C gamemaster mocks`.
- Per-adapter unit tests with mocks for the clients (httptest server for
  REST adapters; miniredis for the publishers).

Files new:

- as above.

Exit criteria:

- mocks regenerate cleanly via `go generate`.
- unit tests pass.
- `go test ./gamemaster/internal/adapters/...` passes.

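The "httptest server for REST adapters" approach from the Stage 12 tasks can be sketched for a single call. This example covers only `/api/v1/admin/status` and keeps only the `finished` field, since that is the one `StateResponse` field this plan names. It assumes the status endpoint is a `GET`, which this plan does not state, and it uses a plain `http.Client` instead of the `otelhttp`-wrapped one. All type and function names are hypothetical.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// stateResponse keeps only the field this plan names; the real schema in
// game/openapi.yaml has more.
type stateResponse struct {
	Finished bool `json:"finished"`
}

type engineClient struct {
	baseURL string
	http    *http.Client
}

// Status calls the admin status endpoint (ASSUMED to be GET) and decodes it.
func (c *engineClient) Status(ctx context.Context) (stateResponse, error) {
	var out stateResponse
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, c.baseURL+"/api/v1/admin/status", nil)
	if err != nil {
		return out, err
	}
	resp, err := c.http.Do(req)
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return out, fmt.Errorf("engine status: unexpected HTTP %d", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// fakeEngine serves a canned status response on the admin path only.
func fakeEngine(finished bool) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/admin/status", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(stateResponse{Finished: finished})
	})
	return httptest.NewServer(mux)
}

// statusVia wires the client to a fake engine and returns the decoded flag.
func statusVia(finished bool) (bool, error) {
	srv := fakeEngine(finished)
	defer srv.Close()
	c := &engineClient{baseURL: srv.URL, http: srv.Client()}
	st, err := c.Status(context.Background())
	return st.Finished, err
}

func main() {
	fmt.Println(statusVia(true))
}
```

Because the fake serves only the admin path, a client that still called the old `/api/v1/status` route would receive a `404` and the test would fail.
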
## ~~Stage 13.~~ Service: register-runtime

Goal:

- end-to-end `register-runtime` operation: validate, persist initial
  record, call engine `/admin/init`, persist player mappings, mark
  running, schedule first turn.

Tasks:

- `gamemaster/internal/service/registerruntime/service.go` orchestrator,
  following the flow from [`./README.md §Lifecycles → Register-runtime`](./README.md):
  - validate envelope;
  - reject if `runtime_records.{game_id}` exists;
  - resolve `image_ref` for `target_engine_version` from
    `engine_versions`;
  - persist `runtime_records.status=starting`;
  - call engine `/admin/init`;
  - persist `player_mappings` rows from the engine response;
  - CAS `status: starting → running`, persist `current_turn=0` and
    initial `next_generation_at`;
  - append `operation_log`;
  - publish `runtime_snapshot_update`;
  - return persisted runtime record.
- Failure paths: roll back `runtime_records` on engine failure; ensure no
  orphan `player_mappings` rows; record failure in `operation_log`.
- Unit tests cover happy path, idempotent re-registration (returns
  `conflict`), engine 4xx (`engine_validation_error`), engine 5xx
  (`engine_unreachable`), missing engine version
  (`engine_version_not_found`), partial-rollback paths.

Files new:

- `gamemaster/internal/service/registerruntime/{service.go, service_test.go,
  errors.go}`.

Exit criteria:

- service-level tests pass.

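The rollback rule in the Stage 13 failure paths can be reduced to a few lines. The sketch below is heavily simplified: a map stands in for `runtime_records`, a function value stands in for the engine `/admin/init` call, and player mappings, the operation log, and event publication are left out. The error names follow the codes in this plan, while `register`, `records`, `engineOK`, and `engineDown` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrConflict          = errors.New("conflict")
	ErrEngineUnreachable = errors.New("engine_unreachable")
)

// records is an in-memory stand-in for runtime_records (game_id -> status).
type records map[string]string

// initEngine is a hypothetical stand-in for the engine /admin/init call.
type initEngine func(gameID string) error

// register persists "starting", calls the engine, and either promotes the
// record to "running" or removes it again so that no half-registered runtime
// is left behind.
func register(recs records, engine initEngine, gameID string) error {
	if _, exists := recs[gameID]; exists {
		return ErrConflict
	}
	recs[gameID] = "starting"
	if err := engine(gameID); err != nil {
		delete(recs, gameID) // roll back: a retry must not be rejected as a conflict
		return fmt.Errorf("%w: %v", ErrEngineUnreachable, err)
	}
	recs[gameID] = "running"
	return nil
}

func engineOK(string) error   { return nil }
func engineDown(string) error { return errors.New("connection refused") }

func main() {
	recs := records{}
	fmt.Println(register(recs, engineDown, "g1"), len(recs))
	fmt.Println(register(recs, engineOK, "g1"), recs["g1"])
	fmt.Println(register(recs, engineOK, "g1"))
}
```
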
## ~~Stage 14.~~ Service: engine version registry CRUD + image-ref resolve

Goal:

- the registry surface used by Lobby's start flow and by Admin Service.

Tasks:

- `gamemaster/internal/service/engineversion/service.go`:
  - `List(ctx, statusFilter)` — list versions optionally filtered by
    `status`;
  - `Get(ctx, version)` — read one;
  - `Create(ctx, version, imageRef, options)` — validate semver,
    validate Docker reference shape, persist;
  - `Update(ctx, version, patch)` — partial update (`image_ref`,
    `options`, `status`);
  - `Deprecate(ctx, version)` — set `status=deprecated`;
  - `Delete(ctx, version)` — hard delete; rejected with
    `engine_version_in_use` if `IsReferencedByActiveRuntime` returns
    true;
  - `ResolveImageRef(ctx, version)` — read `image_ref` only; this is the
    hot path used by Lobby.
- Unit tests cover create-validate, delete-when-active rejection, and
  semver shape validation. Resolve is tested against a seeded table of
  versions.

Files new:

- `gamemaster/internal/service/engineversion/{service.go, service_test.go,
  errors.go}`.

Exit criteria:

- service-level tests pass.

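The semver shape validation mentioned in the Stage 14 tests could start from something like the sketch below. It accepts only plain `MAJOR.MINOR.PATCH` with no pre-release or build metadata, which is an assumption about what the registry allows. `IsPatchUpgrade` is one possible reading of the "patch-only comparison" named in Stage 10 (same major and minor, higher patch); the README may define it differently. All names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Version is a plain MAJOR.MINOR.PATCH triple.
type Version struct{ Major, Minor, Patch int }

// isNumeric reports whether p is a non-empty run of ASCII digits without a
// leading zero (a bare "0" is allowed).
func isNumeric(p string) bool {
	if p == "" || (len(p) > 1 && p[0] == '0') {
		return false
	}
	for _, c := range p {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

// ParseVersion validates the shape and converts the three components.
func ParseVersion(s string) (Version, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return Version{}, fmt.Errorf("version %q: want MAJOR.MINOR.PATCH", s)
	}
	var nums [3]int
	for i, p := range parts {
		if !isNumeric(p) {
			return Version{}, fmt.Errorf("version %q: invalid component %q", s, p)
		}
		n, err := strconv.Atoi(p)
		if err != nil {
			return Version{}, fmt.Errorf("version %q: %v", s, err)
		}
		nums[i] = n
	}
	return Version{Major: nums[0], Minor: nums[1], Patch: nums[2]}, nil
}

// IsPatchUpgrade is an ASSUMED interpretation of "patch-only": the target
// stays on the same major.minor line and has a strictly higher patch.
func IsPatchUpgrade(from, to Version) bool {
	return from.Major == to.Major && from.Minor == to.Minor && to.Patch > from.Patch
}

func main() {
	a, _ := ParseVersion("1.4.2")
	b, _ := ParseVersion("1.4.3")
	c, _ := ParseVersion("1.5.0")
	fmt.Println(IsPatchUpgrade(a, b), IsPatchUpgrade(a, c))
	_, err := ParseVersion("1.4")
	fmt.Println(err != nil)
}
```
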
## ~~Stage 15.~~ Service: scheduler + turn generation + snapshot publisher

Goal:

- the heart of GM: the periodic scheduler and the turn-generation flow,
  with snapshot publication and finish detection.

Tasks:

- `gamemaster/internal/service/turngeneration/service.go`:
  - input: `gameID`, `trigger ∈ {scheduler, force}`;
  - CAS `status: running → generation_in_progress`;
  - call engine `/admin/turn`;
  - on success: persist `current_turn`, evaluate `finished`, branch:
    - finished: CAS `status → finished`, persist `finished_at`,
      `PublishGameFinished`, publish `game.finished` notification, return;
    - not finished: CAS `status → running`, recompute
      `next_generation_at` (skip a tick if `skip_next_tick=true`,
      then clear), `PublishSnapshotUpdate`, publish `game.turn.ready`
      notification, return;
  - on failure: CAS `status → generation_failed`, publish
    `runtime_snapshot_update` reflecting the new status, publish
    `game.generation_failed` admin notification, return.
- `gamemaster/internal/service/scheduler/service.go`:
  - thin wrapper that builds the next-tick value from
    `domain/schedule.NextTick` given `turn_schedule` and
    `skip_next_tick`;
  - reused by both the ticker worker (Stage 19 wires it) and by the
    `force-next-turn` admin op (Stage 17).
- `gamemaster/internal/worker/schedulerticker/worker.go`:
  - 1-second loop;
  - calls `runtime_records.ListDueRunning(now)` and runs
    `turngeneration.Run(ctx, gameID, scheduler)` per game;
  - serialises per-`game_id` calls (one in-flight per game; concurrent
    games proceed in parallel).
- Unit tests cover happy path, finish detection, force trigger with skip
  consumption, generation failure, CAS contention with a concurrent
  external status change (e.g., admin stop).
- Player turn stats are derived from `StateResponse.player[]` and
  projected to `{user_id, planets, population}` via
  `playermappingstore.ListByGame`.

Files new:

- `gamemaster/internal/service/turngeneration/{service.go, service_test.go,
  errors.go}`,
  `gamemaster/internal/service/scheduler/{service.go, service_test.go}`,
  `gamemaster/internal/worker/schedulerticker/{worker.go, worker_test.go}`.

Exit criteria:

- service-level tests pass.

## ~~Stage 16.~~ Service: hot-path command + order + report + membership cache

Goal:

- the gateway-facing trio: command execution, order submission, report
  reading. Membership cache and the invalidation hook.

Tasks:

- `gamemaster/internal/service/membership/cache.go`:
  - in-process `map[gameID]entry{members map[userID]MembershipStatus,
    loadedAt}`;
  - `Resolve(ctx, gameID, userID) (status, error)` — checks cache, falls
    back to `lobbyclient.GetMemberships` on miss or TTL expiry;
  - `Invalidate(gameID)` — purges the cache entry;
  - LRU eviction governed by
    `GAMEMASTER_MEMBERSHIP_CACHE_MAX_GAMES`.
- `gamemaster/internal/service/commandexecute/service.go`:
  - input: `gameID`, `userID`, payload `{commands:[…]}`;
  - validate `runtime_records.{game_id}` exists with
    `status=running`;
  - resolve membership; reject if not active;
  - resolve `race_name` from `playermappingstore`;
  - call engine `/api/v1/command` with `CommandRequest{actor=race_name,
    cmd=…}`;
  - return engine response verbatim.
- `gamemaster/internal/service/orderput/service.go`: identical structure,
  calls `/api/v1/order`.
- `gamemaster/internal/service/reportget/service.go`: input
  `{gameID, userID, turn}`; resolves `race_name`; calls
  `/api/v1/report?player=…&turn=…`; returns body verbatim.
- Unit tests: each service covers happy path, runtime-not-running,
  forbidden, engine 4xx, engine 5xx; membership cache tests cover hit,
  miss, TTL expiry, invalidate.

Files new:

- `gamemaster/internal/service/membership/{cache.go, cache_test.go}`,
  `gamemaster/internal/service/commandexecute/{service.go, service_test.go}`,
  `gamemaster/internal/service/orderput/{service.go, service_test.go}`,
  `gamemaster/internal/service/reportget/{service.go, service_test.go}`.

Exit criteria:

- service-level tests pass.

## ~~Stage 17.~~ Service: admin operations (stop, force-next-turn, patch, banish, liveness)

Goal:

- the remaining service-layer operations: admin/runtime control plus the
  Lobby-facing liveness reply.

Tasks:

- `gamemaster/internal/service/adminstop/service.go`:
  - input `{gameID, reason}`;
  - call `rtmclient.Stop(ctx, gameID, reason)`;
  - on success: CAS `runtime_records.status: * → stopped`; append
    `operation_log`; publish `runtime_snapshot_update`.
- `gamemaster/internal/service/adminforce/service.go`:
  - run `turngeneration.Run(ctx, gameID, force)` synchronously;
  - on success, set `runtime_records.skip_next_tick = true` (the next
    scheduler-driven `Next` consumes it).
- `gamemaster/internal/service/adminpatch/service.go`:
  - input `{gameID, version}`;
  - resolve new `image_ref` via `engineversion.ResolveImageRef`;
  - validate semver-patch against current
    `runtime_records.current_engine_version`; reject with
    `semver_patch_only` otherwise;
  - call `rtmclient.Patch(ctx, gameID, imageRef)`;
  - on success: persist new `current_image_ref` and
    `current_engine_version`; append `operation_log`.
- `gamemaster/internal/service/adminbanish/service.go`:
  - input `{gameID, raceName}`;
  - validate `playermappingstore.GetByRace(gameID, raceName)` exists;
  - call engine `/admin/race/banish`;
  - append `operation_log`.
- `gamemaster/internal/service/livenessreply/service.go`:
  - lookup `runtime_records.{game_id}`;
  - return `{ready: status==running, status: <observed>}`.
- Unit tests for each service cover happy path and each documented error
  code.

Files new:

- `gamemaster/internal/service/adminstop/...`,
  `gamemaster/internal/service/adminforce/...`,
  `gamemaster/internal/service/adminpatch/...`,
  `gamemaster/internal/service/adminbanish/...`,
  `gamemaster/internal/service/livenessreply/...`.

Exit criteria:

- service-level tests pass.

## ~~Stage 18.~~ Async consumer: `runtime:health_events`

Goal:

- bring runtime health into GM's view per game and propagate to Lobby
  via the snapshot stream.

Tasks:

- `gamemaster/internal/worker/healtheventsconsumer/worker.go`:
  - XREADs `runtime:health_events` with a persisted offset (via
    `streamoffsetstore`);
  - decodes the AsyncAPI envelope from RTM;
  - updates `runtime_records.engine_health` per `game_id`;
  - emits a debounced `runtime_snapshot_update` only when the summary
    string changes.
- The summary derivation rule:
  - `healthy` ⇒ summary `healthy`;
  - `probe_failed` after threshold ⇒ summary `probe_failed`;
  - `inspect_unhealthy` ⇒ summary `inspect_unhealthy`;
  - `container_exited` ⇒ summary `exited` and CAS
    `status → engine_unreachable`;
  - `container_oom` ⇒ summary `oom` and CAS
    `status → engine_unreachable`;
  - `container_disappeared` ⇒ summary `disappeared` and CAS
    `status → engine_unreachable`.
- Unit tests use `miniredis` and the AsyncAPI fixture from
  `rtmanager/api/runtime-health-asyncapi.yaml`.

Files new:

- `gamemaster/internal/worker/healtheventsconsumer/{worker.go, worker_test.go}`.

Exit criteria:

- worker tests pass.

## ~~Stage 19.~~ Internal REST handlers

Goal:

- ship the gateway-, Lobby-, and Admin-facing REST surface backed by
  the service layer.

Tasks:

- `gamemaster/internal/api/internalhttp/handlers/{registerruntime,
  getruntime, listruntimes, forcenextturn, stopruntime, patchruntime,
  banishrace, invalidatememberships, gameliveness, listengineversions,
  createengineversion, getengineversion, updateengineversion,
  deprecateengineversion, resolveengineversionimageref, executecommands,
  putorders, getreport}.go` — one file per operation, each delegating to
  the corresponding service. JSON in / JSON out. Unknown JSON fields
  rejected with `invalid_request`.
- Error envelope identical to lobby and rtmanager.
- Wiring under the existing internal HTTP listener; route registration
  in `gamemaster/internal/app/wiring.go`.
- Handler-level table-driven tests.
- OpenAPI conformance test that loads `api/internal-openapi.yaml` and
  asserts every defined operation is reachable and matches its declared
  response shape.

Files new:

- handlers + tests + the conformance test
  `gamemaster/api/openapi_conformance_test.go`.

Exit criteria:

- OpenAPI conformance test passes for every endpoint.
- Handlers reject unknown JSON fields.

## Stage 20. Lobby refactor

Goal:

- complete the Lobby side of the new image-resolve and membership
  invalidation contract.

Tasks:

- Replace `lobby/internal/domain/engineimage/resolver.go` with a thin
  GM-client wrapper. The package goes away; the call site in
  `lobby/internal/service/startgame/service.go` switches from
  `engineimage.Resolver{}.Resolve(version)` to
  `gmClient.ResolveImageRef(ctx, version)`.
- Drop `LOBBY_ENGINE_IMAGE_TEMPLATE` from
  `lobby/internal/config/{config.go, env.go, validation.go}`. Remove the
  validation function and the related env-var test cases.
- Add `InvalidateMemberships(ctx, gameID) error` to
  `lobby/internal/ports/gmclient.go`. Regenerate the `mockgen` mock and
  update the inmem fake to record invocations.
- Wire the new call from:
  - `lobby/internal/service/approveapplication/service.go` — post-commit;
  - `lobby/internal/service/rejectapplication/service.go` — post-commit
    (only if a reservation existed prior);
  - `lobby/internal/service/redeeminvite/service.go` — post-commit;
  - `lobby/internal/service/removemember/service.go` — post-commit
    (already in scope of removal);
  - `lobby/internal/service/blockmember/service.go` — post-commit;
  - `lobby/internal/worker/userlifecycle/consumer.go` — post-commit per
    game in the cascade.
- A failed invalidation is logged at `warn` and counted in a metric
  modelled on the existing `lobby.notification.publish_attempts` (or in a
  new `lobby.gm_invalidation.publish_attempts`), but it does not roll back
  the business commit. TTL on GM is the safety net.
- Update Lobby unit tests, in particular the start-flow tests (replace
  `engineimage` mock with `gmclient.ResolveImageRef` mock) and the
  membership-mutation tests (assert `InvalidateMemberships` was called
  post-commit).
- Update `lobby/api/internal-openapi.yaml` only if any new field
  surfaces (none expected; the call shape is on Lobby's outbound side,
  not on its REST surface).

Files touched:

- `lobby/internal/service/{startgame, approveapplication,
  rejectapplication, redeeminvite, removemember, blockmember}/`,
  `lobby/internal/worker/userlifecycle/`,
  `lobby/internal/config/{config.go, env.go, validation.go}`,
  `lobby/internal/ports/gmclient.go`,
  `lobby/internal/adapters/gmclient/client.go`,
  `lobby/internal/adapters/mocks/gmclient/...`,
  `lobby/internal/adapters/gmclientinmem/...` (if the inmem fake
  exists; otherwise the mockgen mock plus the migration described in
  RTM stage 22 is enough).

Files removed:

- `lobby/internal/domain/engineimage/` (entire package).

Exit criteria:

- `go test ./lobby/...` passes.
- `LOBBY_ENGINE_IMAGE_TEMPLATE` no longer appears in any Lobby source or
  documentation.
- Lobby's start-flow integration test still passes against a stub
  `gmclient` that returns `image_ref` synchronously.

## Stage 21. Service-local integration suite

Goal:

- end-to-end suite running against testcontainers PostgreSQL + Redis +
  the real `galaxy/game` engine container.

Tasks:

- `gamemaster/integration/harness/` — set up PostgreSQL with
  goose-applied migrations; Redis (testcontainers Redis for
  coordination suites that exercise streams); ensure the Docker bridge
  network exists; build `galaxy/game` test image once per package run
  with `sync.Once`; tear everything down via `t.Cleanup`. Reuse the
  RTM-built image where possible (skip rebuilding when present).
- `gamemaster/integration/registerruntime_test.go` — register-runtime
  happy path: GM persists the runtime record, calls engine
  `/admin/init`, persists `player_mappings`, transitions to `running`,
  publishes a `runtime_snapshot_update`. Engine answers with a real
  `StateResponse`.
- `gamemaster/integration/scheduler_test.go` — schedules a five-second
  turn cron, observes one tick, asserts engine `/admin/turn` was hit and
  `current_turn` advanced. Force-next-turn test asserts `skip_next_tick`
  consumes the next regular tick.
- `gamemaster/integration/hotpath_test.go` — full command, order, and
  report round-trips against the real engine. Membership invalidation
  hook test asserts the cache flushes on demand.
- `gamemaster/integration/adminops_test.go` — admin stop calls a stub
  RTM and asserts the runtime record transitions to `stopped`. Admin
  patch with a non-patch semver target fails with `semver_patch_only`.
  Admin banish hits the engine endpoint.
- `gamemaster/integration/healthevents_test.go` — publishes a fake
  `runtime:health_events` entry, asserts the consumer updates
  `engine_health` and emits a debounced snapshot.
- `gamemaster/integration/notification_test.go` — observe
  `notification:intents` after a successful turn (`game.turn.ready`),
  after a finish (`game.finished`), and after a forced engine failure
  (`game.generation_failed` admin email).

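The "build once per package run" rule in the harness can be expressed with `sync.Once` and a cached error. The sketch below is illustrative only: `ensureEngineImage` and the `build` callback are hypothetical, and the real harness would invoke a Docker build (or skip it when the image is already present) inside the callback.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	buildOnce sync.Once
	buildErr  error
	builds    int // counts real build invocations, for demonstration only
)

// ensureEngineImage runs build at most once per process and returns the
// cached result to every caller, including later ones.
func ensureEngineImage(build func() error) error {
	buildOnce.Do(func() {
		builds++
		buildErr = build()
	})
	return buildErr
}

func main() {
	for i := 0; i < 3; i++ {
		err := ensureEngineImage(func() error { return nil })
		fmt.Println(err, builds)
	}
}
```

Caching the error matters: without it, tests that run after a failed build would proceed as if the image existed.
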
Files new:

- as above.

Exit criteria:

- `go test ./gamemaster/integration/...` passes locally with Docker
  available.
- CI runs the suite under a profile that exposes the Docker socket.

## Stage 22. Inter-service test: Lobby ↔ GM

Goal:

- exercise the new image-ref resolve, register-runtime, and membership
  invalidation paths end-to-end without RTM in the loop.

Tasks:

- `integration/lobbygm/` (top-level integration directory, mirroring
  existing `integration/lobbyrtm`): runs real Lobby, real GM, real
  PostgreSQL, real Redis, a stub RTM that simply returns success on
  `runtime:start_jobs`, and the real `galaxy/game` test engine container.
- Scenarios:
  - Lobby creates a game, resolves `image_ref` from GM, publishes a
    start_job, the stub RTM acks success, Lobby calls
    `register-runtime` on GM, GM `/admin/init`s the engine, GM transitions
    to `running`, GM publishes `runtime_snapshot_update`, Lobby updates
    its denormalised view.
  - One full turn generation cycle: scheduler ticks, GM calls engine
    `/admin/turn`, GM publishes `runtime_snapshot_update`, Lobby's
    per-game stats aggregate updates.
  - Membership change: an admin removes a member; Lobby's
    `removemember` post-commit calls GM `invalidate-memberships`; the
    next player command from that user fails with `forbidden`.
  - Game finish: engine returns `finished:true`; GM publishes
    `game_finished`; Lobby transitions the platform game record to
    `finished` and runs the capability evaluator.

Files new:

- as above.

Exit criteria:

- all scenarios pass in CI when the Docker socket is available.

## Stage 23. Inter-service test: Lobby ↔ GM ↔ RTM (full happy path)

Goal:

- the canonical end-to-end test covering the whole running-game pipeline.

Tasks:

- `integration/lobbygmrtm/`: runs real Lobby, real GM, real RTM, real
  PostgreSQL, real Redis, and the real `galaxy/game` test engine
  container.
- Scenarios:
  - Happy path: enrollment → start → RTM container → GM register-runtime
    → engine `/admin/init` → first player command → first scheduled turn
    → engine `finished:true` → GM `game_finished` → Lobby transitions to
    `finished` → RTM cleanup TTL.
  - Failure path A: RTM reports `start_config_invalid` on
    `runtime:job_results`; Lobby transitions the game to `start_failed`;
    no GM register-runtime is attempted.
  - Failure path B: container starts but GM is unavailable when Lobby
    calls `register-runtime`; Lobby transitions the game to `paused` and
    publishes `lobby.runtime_paused_after_start`; once GM comes back,
    Lobby's resume flow calls GM `/liveness`, receives `ready=true`,
    re-issues `register-runtime`, and the game reaches `running`.

Files new:

- as above.

Exit criteria:

- all scenarios pass in CI when the Docker socket is available.

## Stage 24. Service-local docs

Goal:

- drop per-stage decisions captured during this plan into discoverable
  service-local documentation, mirroring `lobby/docs/` and
  `rtmanager/docs/`.

Tasks:

- `gamemaster/docs/README.md` — index pointing at the five content docs
  and the postgres-migration record.
- `gamemaster/docs/runtime.md` — components, processes, in-memory state
  of each worker.
- `gamemaster/docs/flows.md` — Mermaid diagrams for: register-runtime,
  turn generation, force-next-turn skip, hot-path command, admin patch,
  finish, health consumption, banish.
- `gamemaster/docs/runbook.md` — operator scenarios: «engine became
  unreachable», «turn generation failed and stuck», «patch upgrade»,
  «manual force-next-turn», «engine version registry rotation»,
  «membership cache appears stale».
- `gamemaster/docs/examples.md` — env-var examples per environment
  (dev / test / prod skeletons), example payloads for each stream and
  each REST endpoint.
- `gamemaster/docs/postgres-migration.md` — decision record for the
  schema (mirrors `notification/docs/postgres-migration.md` style).
- Add per-stage decision records under `gamemaster/docs/stage<NN>-*.md`
  for any stage that produced a noteworthy decision (mirroring the RTM
  pattern). At minimum:
  - `stage11-persistence-adapters.md`,
  - `stage12-external-clients.md`,
  - `stage15-scheduler-and-turn-generation.md`,
  - `stage16-membership-cache-and-invalidation.md`,
  - `stage17-admin-operations.md`,
  - `stage18-health-events-consumer.md`,
  - `stage20-lobby-refactor.md`.

Files new:

- all of the above.

Exit criteria:

- the README of GM links to `docs/README.md`.
- a reviewer can find any operational how-to within two clicks.

## Final Acceptance Criteria

- `go build ./...` from the repository root succeeds.
- `go test ./...` from the repository root passes.
- `go test -tags=integration ./gamemaster/integration/...` passes when
  Docker is available.
- `go test ./integration/lobbygm/...` and
  `go test ./integration/lobbygmrtm/...` pass when Docker is available.
- `make -C gamemaster jet` regenerates jet code with no diff after a
  clean run.
- `make -C gamemaster mocks` regenerates mock code with no diff after a
  clean run.
- Manual smoke: bring Lobby + GM + RTM + the rest of the stack up via
  the existing dev compose; create a game; observe a real
  `galaxy-game-{game_id}` container; play one turn round-trip; observe
  a `runtime_snapshot_update` on `gm:lobby_events`; force-next-turn;
  observe the next scheduled tick is skipped; stop the game; the
  container moves to `exited`.
- Documentation across `ARCHITECTURE.md`, `gamemaster/`, `lobby/`,
  `notification/`, `game/`, and `rtmanager/` is internally consistent.

## Out of Scope

- Multi-instance GM with leader election (`Game Master` runs as a single
  process in v1).
- Engine state file management (backup, archival, host-side cleanup).
- Direct gateway routing of admin `message_type` values (admin operations
  land via Admin Service in a later iteration; v1 exposes only the GM
  internal REST surface).
- TLS / mTLS on the internal listener.
- Engine-version automatic patch upgrades (manual admin operation only).
- A pause/resume flow on GM's side beyond the liveness-check reply.

## Risks and Notes

- The membership invalidation hook from Lobby into GM is a deliberate
  tight coupling. TTL stays as the safety net for any failed invalidation;
  the explicit hook only shortens the staleness window. Failure to
  invalidate is logged but never rolls back Lobby state. This trade-off
  is recorded in [`./README.md` §Hot Path](./README.md).
- The Lobby refactor (Stage 20) is gated on GM stages 14 (engine version
  registry resolve endpoint) and 19 (handlers wired). Once Lobby switches
  to GM for image-ref resolution, Lobby cannot start a game when GM is
  unavailable; this is documented as the new failure mode in
  `lobby/README.md` (Stage 03).
- The engine path rename (Stage 05) is internal to `galaxy/game`. No other
  service today calls `/api/v1/init`, `/api/v1/status`, or
  `/api/v1/turn` (RTM probes only `/healthz`); the rename is therefore a
  contained change inside the engine module. The user owns the
  conditional logic that fills `StateResponse.finished` and the
  body-level mechanics of `banish`.
- Running GM as a single instance makes it a single point of failure for
  turn generation in v1. The trade-off is acceptable for the prototype and
  is documented in `gamemaster/README.md §Non-Goals`.
- The pre-launch single-init policy applies to GM exactly as documented in
  `ARCHITECTURE.md §Persistence Backends`: the schema evolves by editing
  `00001_init.sql` until the first production deploy.