feat: gamemaster
@@ -1450,3 +1450,16 @@ and the addition of the `reason` enum to the stop envelope — are owned by
 the Runtime Manager implementation plan, not by this document. See
 [`../rtmanager/PLAN.md`](../rtmanager/PLAN.md) §«Stage 06. Lobby publisher
 refactor». No new stages are added here for that work.
+
+## Note: Game Master Refactor (image-ref + membership invalidate)
+
+The retirement of `LOBBY_ENGINE_IMAGE_TEMPLATE` together with the
+inline `engineimage.Resolver` package, the synchronous switch to
+`Game Master`'s `GET /api/v1/internal/engine-versions/{version}/image-ref`
+for image-ref resolution, and the new outgoing
+`POST /api/v1/internal/games/{game_id}/memberships/invalidate` hook from
+`approveapplication`, `rejectapplication`, `redeeminvite`,
+`removemember`, `blockmember`, and the user-lifecycle cascade worker
+are owned by the Game Master implementation plan, not by this document.
+See [`../gamemaster/PLAN.md`](../gamemaster/PLAN.md) §«Stage 20. Lobby
+refactor». No new stages are added here for that work.
+76
-66
@@ -150,7 +150,9 @@ The service starts two HTTP listeners and one Redis Stream consumer pipeline.
 - `User Service` reachable at `LOBBY_USER_SERVICE_BASE_URL` (startup check only;
   runtime failures are surfaced as request errors, not boot failures)
 - `Game Master` at `LOBBY_GM_BASE_URL` (same policy — startup check omitted;
-  unreachability at registration triggers the forced-pause path)
+  unreachability at image-ref resolve fails `lobby.game.start` with
+  `service_unavailable`, unreachability at register-runtime triggers the
+  forced-pause path)
 
 ### Probes
 
@@ -714,27 +716,55 @@ sequenceDiagram
 
     Admin->>Lobby: lobby.game.start
     Lobby->>Lobby: validate ready_to_start + roster
-    Lobby->>Lobby: status → starting
-    Lobby->>Redis: publish start job to runtime:start_jobs
-    Runtime->>Runtime: start container
-    Runtime->>Redis: publish result to runtime:job_results
+    Lobby->>GM: GET /internal/engine-versions/{version}/image-ref (sync)
+    alt GM image-ref resolve failed
+        GM-->>Lobby: error / timeout / not found
+        Lobby-->>Admin: service_unavailable (GM unreachable) or engine_version_not_found
+    else image_ref resolved
+        GM-->>Lobby: 200 OK { image_ref }
+        Lobby->>Lobby: status → starting
+        Lobby->>Redis: publish start job to runtime:start_jobs (with image_ref)
+        Runtime->>Runtime: start container
+        Runtime->>Redis: publish result to runtime:job_results
 
-    alt container start failed
-        Lobby->>Lobby: status → start_failed
-    else container started
-        Lobby->>Lobby: persist runtime binding
-        Lobby->>GM: POST /internal/games/{game_id}/register (sync)
-        alt GM registration success
-            GM-->>Lobby: 200 OK
-            Lobby->>Lobby: status → running; set started_at
-        else GM unavailable
-            GM-->>Lobby: error / timeout
-            Lobby->>Lobby: status → paused
-            Lobby->>Redis: publish lobby.runtime_paused_after_start intent
+        alt container start failed
+            Lobby->>Lobby: status → start_failed
+        else container started
+            Lobby->>Lobby: persist runtime binding
+            Lobby->>GM: POST /internal/games/{game_id}/register-runtime (sync)
+            alt GM registration success
+                GM-->>Lobby: 200 OK
+                Lobby->>Lobby: status → running; set started_at
+            else GM unavailable
+                GM-->>Lobby: error / timeout
+                Lobby->>Lobby: status → paused
+                Lobby->>Redis: publish lobby.runtime_paused_after_start intent
+            end
         end
     end
 ```
 
+### Image-ref resolution (synchronous via Game Master)
+
+Before publishing the start job, `Lobby` resolves the Docker `image_ref`
+for `target_engine_version` by calling
+`GET /api/v1/internal/engine-versions/{version}/image-ref` on `Game Master`'s
+internal port. The call is synchronous and runs while the game is still
+in `ready_to_start`:
+
+- success ⇒ `Lobby` proceeds to `starting`, embeds the resolved
+  `image_ref` into the `runtime:start_jobs` envelope, and publishes;
+- the version is missing or deprecated on GM (`engine_version_not_found`)
+  ⇒ `lobby.game.start` returns `engine_version_not_found`; the game stays
+  in `ready_to_start`;
+- GM is unreachable (network error, timeout, `5xx`) ⇒ `lobby.game.start`
+  returns `service_unavailable`; the game stays in `ready_to_start` and
+  the operator can retry.
+
+Resolving against GM is the v1 contract; the legacy
+`LOBBY_ENGINE_IMAGE_TEMPLATE` Go-template variable is retired together
+with the inline `engineimage.Resolver`.
+
 ### Critical invariants
 
 - If the container starts but `Lobby` cannot persist the runtime binding metadata,
@@ -743,6 +773,10 @@ sequenceDiagram
 - If metadata is persisted but `Game Master` is unavailable, the game must be
   placed in `paused`, not in `start_failed`. The container is alive; only the
   platform tracking is incomplete.
+- If `Game Master` is unavailable at image-ref resolve time, the start
+  command itself fails with `service_unavailable`. The game stays in
+  `ready_to_start`; no container is created and no `runtime:start_jobs`
+  envelope is published.
 - No start job is accepted while the game is not in `ready_to_start`.
 - Concurrent start attempts for the same game must be serialized; the second
   attempt must fail if the first already moved the game to `starting`.
@@ -758,7 +792,7 @@ is no synchronous Lobby→RTM REST call in v1 or planned for v2.
 | Field | Type | Notes |
 | --- | --- | --- |
 | `game_id` | string | Lobby `game_id`. |
-| `image_ref` | string | Docker reference resolved from `target_engine_version` via `LOBBY_ENGINE_IMAGE_TEMPLATE`. |
+| `image_ref` | string | Docker reference resolved synchronously from `target_engine_version` against `Game Master`'s engine version registry; see §Game Start Flow. |
 | `requested_at_ms` | int64 | UTC milliseconds; diagnostics only. |
 
 `runtime:stop_jobs` envelope:
@@ -803,40 +837,6 @@ Alternatives considered and rejected:
 outside that package and would have to depend on a concrete adapter
 for an enum value.
 
-### Design rationale: `engineimage.Resolver` validates the template at construction
-
-`engineimage.Resolver` stores the validated template; the per-game
-`Resolve(version)` call is therefore a pure string substitution that
-cannot fail except on an empty `version`.
-
-`LOBBY_ENGINE_IMAGE_TEMPLATE` is loaded at startup. A malformed value
-(missing `{engine_version}` placeholder, empty string) is an
-operational misconfiguration that fails fast before any traffic arrives
-— not on the first start-game request hours later. The synchronous
-start handler then incurs no per-call template-shape recheck.
-
-A stateless free function `engineimage.Resolve(template, version)` was
-rejected: the only useful checkpoint for the template literal is at
-startup; a free function would either re-validate on every call (waste)
-or skip validation (regression).
-
-The resolver only guards against an empty/whitespace `version`. Semver
-validation lives in `lobby/internal/domain/game/model.go:validateSemver`
-and runs at game-record construction time. Re-running it inside the
-resolver would either duplicate the rule (drift risk) or import the
-validator across package boundaries for no behavioural gain. Keeping the
-resolver narrow leaves it reusable from a future producer (for example
-`Game Master`, when it takes over `image_ref` resolution) without
-dragging Lobby's domain rules along.
-
-The defensive `return start game: resolve image ref: %w` in
-`startgame.Service.Handle` is a guard against a future invariant
-violation; it is not exercised by the service-level test suite because
-the only resolver-failure mode (empty `version`) requires bypassing
-`game.Validate`, which `gameinmem.Save` always runs. Adding test
-scaffolding to skip validation would teach the test suite a back door
-that the production code path does not have.
-
 ## Paused State
 
 `Lobby.paused` is a platform-level pause, distinct from `Game Master` runtime
@@ -904,11 +904,12 @@ game finish.
 ### Per-member stats aggregate
 
 Each `runtime_snapshot_update` carries a `player_turn_stats` array with one
-entry per active member: `{user_id, planets, population, ships_built}`.
+entry per active member: `{user_id, planets, population}`.
 `Lobby` aggregates these in `lobby:game_turn_stats:<game_id>:<user_id>` with
 the shape
-`{initial_planets, initial_population, initial_ships_built, max_planets,
-max_population, max_ships_built}`.
+`{initial_planets, initial_population, max_planets, max_population}`.
+`ships_built` is not part of the contract; the capability rule reduces to
+`planets` and `population` only.
 
 Rules:
 
@@ -1032,11 +1033,18 @@ Key internal endpoints:
 | `GET` | `/api/v1/internal/healthz` | health probe |
 | `GET` | `/api/v1/internal/readyz` | readiness probe |
 
-Note: the registration call from Lobby to Game Master after a successful
-container start is **outgoing** — Lobby calls
-`POST /api/v1/internal/games/{game_id}/register-runtime` on Game Master's
-internal port. Lobby does not expose an inbound `register-runtime`
-endpoint.
+Note: every Lobby ↔ Game Master synchronous call is **outgoing** from
+Lobby to Game Master's internal port at `LOBBY_GM_BASE_URL`. Lobby does
+not expose an inbound `register-runtime` endpoint or any other
+GM-facing endpoint:
+
+| Call site | Method | Path on Game Master | Purpose |
+| --- | --- | --- | --- |
+| `startgame` (pre-publish) | `GET` | `/api/v1/internal/engine-versions/{version}/image-ref` | Resolve the Docker `image_ref` for `target_engine_version` synchronously before publishing `runtime:start_jobs`. Failure ⇒ `service_unavailable` or `engine_version_not_found`; the game stays in `ready_to_start`. |
+| `startgame` (post-container-up) | `POST` | `/api/v1/internal/games/{game_id}/register-runtime` | Register the runtime after a successful container start. Failure ⇒ forced `paused` (see §Paused State). |
+| `approveapplication`, `rejectapplication`, `redeeminvite`, `removemember`, `blockmember`, user-lifecycle cascade | `POST` | `/api/v1/internal/games/{game_id}/memberships/invalidate` | Tell GM to drop its in-process membership cache for the game after a roster mutation. Called **post-commit** and is fail-open: a non-2xx response is logged and metered but never rolls back the Lobby commit. GM's TTL safety net catches stale data within the next cache TTL window. |
+| `removemember` (engine-side cleanup, post-commit) | `POST` | `/api/v1/internal/games/{game_id}/race/{race_name}/banish` | Ask GM to deactivate the engine-side player after a permanent removal. Fail-open in the same sense as the invalidate call. |
+| `resumegame` | `GET` | `/api/v1/internal/games/{game_id}/liveness` | Check that GM has the runtime in `running` before transitioning the platform record from `paused` back to `running`. |
 
 Admin-only operations (approve, reject, cancel, create public games, etc.) are
 also exposed on the internal port and are intended to be called by `Admin Service`
@@ -1158,6 +1166,9 @@ Stable error codes:
   `permanent_block` sanction
 - `forbidden` — caller is not authorized for this operation on this game or
   this race name
+- `engine_version_not_found` — `target_engine_version` is missing or
+  deprecated on `Game Master`'s engine version registry (returned by
+  `lobby.game.start` at image-ref resolve time)
 - `internal_error` — unexpected service error
 - `service_unavailable` — upstream dependency unavailable
 
@@ -1227,13 +1238,12 @@ Stream names:
 - `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` with default `2s`
 - `LOBBY_NOTIFICATION_INTENTS_STREAM` with default `notification:intents`
 
-Runtime Manager integration:
+Game Master image-ref resolver:
 
-- `LOBBY_ENGINE_IMAGE_TEMPLATE` with default `galaxy/game:{engine_version}` —
-  Go-style template applied to a game's `target_engine_version` to resolve
-  the Docker `image_ref` published on `runtime:start_jobs`. The template
-  must contain the literal placeholder `{engine_version}`; Lobby fails
-  fast at startup otherwise.
+- `image_ref` is resolved synchronously by `Game Master` from
+  `target_engine_version` over its engine version registry; see
+  §Game Start Flow. The legacy `LOBBY_ENGINE_IMAGE_TEMPLATE` Go-template
+  variable is retired and rejected at startup if set.
 
 Upstream clients:
 
+5
-2
@@ -3,6 +3,7 @@ module galaxy/lobby
 go 1.26.1
 
 require (
+	galaxy/cronutil v0.0.0-00010101000000-000000000000
 	galaxy/postgres v0.0.0-00010101000000-000000000000
 	galaxy/redisconn v0.0.0-00010101000000-000000000000
 	github.com/alicebob/miniredis/v2 v2.37.0
@@ -11,7 +12,6 @@ require (
 	github.com/go-jet/jet/v2 v2.14.1
 	github.com/jackc/pgx/v5 v5.9.2
 	github.com/redis/go-redis/v9 v9.18.0
-	github.com/robfig/cron/v3 v3.0.1
 	github.com/stretchr/testify v1.11.1
 	github.com/testcontainers/testcontainers-go v0.42.0
 	github.com/testcontainers/testcontainers-go/modules/postgres v0.42.0
@@ -47,6 +47,7 @@ require (
 	github.com/pressly/goose/v3 v3.27.1 // indirect
 	github.com/redis/go-redis/extra/rediscmd/v9 v9.18.0 // indirect
 	github.com/redis/go-redis/extra/redisotel/v9 v9.18.0 // indirect
+	github.com/robfig/cron/v3 v3.0.1 // indirect
 	github.com/sethvargo/go-retry v0.3.0 // indirect
 	go.uber.org/multierr v1.11.0 // indirect
 	golang.org/x/sync v0.20.0 // indirect
@@ -95,7 +96,7 @@ require (
 	github.com/moby/term v0.5.2 // indirect
 	github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect
 	github.com/oasdiff/yaml v0.0.9 // indirect
-	github.com/oasdiff/yaml3 v0.0.9 // indirect
+	github.com/oasdiff/yaml3 v0.0.12 // indirect
 	github.com/opencontainers/go-digest v1.0.0 // indirect
 	github.com/opencontainers/image-spec v1.1.1 // indirect
 	github.com/perimeterx/marshmallow v1.1.5 // indirect
@@ -123,6 +124,8 @@ require (
 	gopkg.in/yaml.v3 v3.0.1 // indirect
 )
 
+replace galaxy/cronutil => ../pkg/cronutil
+
 replace galaxy/notificationintent => ../pkg/notificationintent
 
 replace galaxy/postgres => ../pkg/postgres
+1
-2
@@ -208,8 +208,7 @@ github.com/ncruces/go-strftime v1.0.0 h1:HMFp8mLCTPp341M/ZnA4qaf7ZlsbTc+miZjCLOF
 github.com/ncruces/go-strftime v1.0.0/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
 github.com/oasdiff/yaml v0.0.9 h1:zQOvd2UKoozsSsAknnWoDJlSK4lC0mpmjfDsfqNwX48=
 github.com/oasdiff/yaml v0.0.9/go.mod h1:8lvhgJG4xiKPj3HN5lDow4jZHPlx1i7dIwzkdAo6oAM=
-github.com/oasdiff/yaml3 v0.0.9 h1:rWPrKccrdUm8J0F3sGuU+fuh9+1K/RdJlWF7O/9yw2g=
-github.com/oasdiff/yaml3 v0.0.9/go.mod h1:y5+oSEHCPT/DGrS++Wc/479ERge0zTFxaF8PbGKcg2o=
+github.com/oasdiff/yaml3 v0.0.12 h1:75urAtPeDg2/iDEWwzNrLOWxI9N/dCh81nTTJtokt2M=
 github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
 github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
 github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040=
@@ -7,9 +7,9 @@ import (
 	"strings"
 	"time"
 
+	"galaxy/cronutil"
 	"galaxy/lobby/internal/domain/common"
 
-	cron "github.com/robfig/cron/v3"
 	"golang.org/x/mod/semver"
 )
 
@@ -213,12 +213,6 @@ type NewGameInput struct {
 	Now time.Time
 }
 
-// standardCronParser parses the frozen five-field cron expression grammar
-// used by turn_schedule.
-var standardCronParser = cron.NewParser(
-	cron.Minute | cron.Hour | cron.Dom | cron.Month | cron.Dow,
-)
-
 // New validates input and returns a draft Game record. Validation errors
 // are returned verbatim so callers can surface them as invalid_request.
 func New(input NewGameInput) (Game, error) {
@@ -401,7 +395,7 @@ func validateCronExpression(value string) error {
 	if strings.TrimSpace(value) == "" {
 		return fmt.Errorf("turn schedule must not be empty")
 	}
-	if _, err := standardCronParser.Parse(value); err != nil {
+	if _, err := cronutil.Parse(value); err != nil {
 		return fmt.Errorf("turn schedule must be a valid five-field cron expression: %w", err)
 	}
 	return nil