feat: runtime manager
+119
-7
@@ -344,7 +344,7 @@ On success:

 ### Application state machine

-```
+```text
 submitted → approved
 submitted → rejected
 ```
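The two transitions in the diagram above can be expressed as a small lookup table. This is a minimal sketch, not the Lobby implementation: the type name `ApplicationStatus`, the sentinel `ErrInvalidTransition`, and the function `Transition` are hypothetical names chosen for illustration; only the three status values come from the diagram.

```go
package main

import (
	"errors"
	"fmt"
)

// ApplicationStatus holds the status values shown in the diagram.
type ApplicationStatus string

const (
	StatusSubmitted ApplicationStatus = "submitted"
	StatusApproved  ApplicationStatus = "approved"
	StatusRejected  ApplicationStatus = "rejected"
)

// ErrInvalidTransition is a hypothetical sentinel used only in this sketch.
var ErrInvalidTransition = errors.New("invalid application status transition")

// allowed lists the two edges of the diagram. Statuses without an entry
// (approved, rejected) are treated as terminal.
var allowed = map[ApplicationStatus][]ApplicationStatus{
	StatusSubmitted: {StatusApproved, StatusRejected},
}

// Transition returns the target status when the edge exists, otherwise the
// unchanged source status and a wrapped ErrInvalidTransition.
func Transition(from, to ApplicationStatus) (ApplicationStatus, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("%s -> %s: %w", from, to, ErrInvalidTransition)
}

func main() {
	next, err := Transition(StatusSubmitted, StatusApproved)
	fmt.Println(next, err) // approved <nil>

	_, err = Transition(StatusApproved, StatusRejected)
	fmt.Println(errors.Is(err, ErrInvalidTransition)) // true
}
```

Keeping the edges in one table means a new transition is a one-line change, and anything not listed is rejected by default.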
@@ -453,7 +453,7 @@ with payload: `game_id`, `game_name`, `invitee_user_id`, `invitee_name`.

 ### Invite state machine

-```
+```text
 created → redeemed
 created → declined
 created → revoked
@@ -591,9 +591,11 @@ Sentinel errors: `ErrNameTaken`, `ErrInvalidName`, `ErrPendingMissing`,
 `pg_advisory_xact_lock(hashtextextended(canonical_key, 0))`. See
 `docs/postgres-migration.md` §6B for the full schema and decision
 record.
-- **Stub** (`lobby/internal/adapters/racenamestub/directory.go`) — in-process
-implementation for unit tests that do not need PostgreSQL. Chosen by
-`LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub`.
+- **In-memory** (`lobby/internal/adapters/racenameinmem/directory.go`) —
+in-process implementation used by unit tests that do not need
+PostgreSQL and by deployments that select the in-memory backend with
+`LOBBY_RACE_NAME_DIRECTORY_BACKEND=stub` (the config token name is
+preserved for backward compatibility).

 A future dedicated `Race Name Service` replaces the adapter without changing
 the domain or service layer.
@@ -737,7 +739,7 @@ sequenceDiagram

 - If the container starts but `Lobby` cannot persist the runtime binding metadata,
 the start is a full failure: `Lobby` must issue a stop job to `Runtime Manager`
-before setting `start_failed`.
+with `reason=orphan_cleanup` before setting `start_failed`.
 - If metadata is persisted but `Game Master` is unavailable, the game must be
 placed in `paused`, not in `start_failed`. The container is alive; only the
 platform tracking is incomplete.
@@ -745,6 +747,96 @@ sequenceDiagram
 - Concurrent start attempts for the same game must be serialized; the second
 attempt must fail if the first already moved the game to `starting`.

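The serialization rule for concurrent start attempts can be sketched as a compare-and-swap on the game status. This is an illustration under stated assumptions, not the Lobby code: `gameStore`, `CompareAndSwapStatus`, `ErrConflict`, and the pre-start status `ready_to_start` are hypothetical; only the `starting` status is named in the text above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Status is the game status. Only "starting" is taken from the document;
// "ready_to_start" is a hypothetical pre-start value for this sketch.
type Status string

const (
	StatusReadyToStart Status = "ready_to_start"
	StatusStarting     Status = "starting"
)

// ErrConflict is a hypothetical sentinel for a lost status race.
var ErrConflict = errors.New("game status changed concurrently")

// gameStore is a hand-rolled in-memory store with a single CAS operation.
type gameStore struct {
	mu     sync.Mutex
	status map[string]Status
}

// CompareAndSwapStatus moves the game to `to` only if it is still in `from`.
// A caller that observes ErrConflict must treat its start attempt as failed.
func (s *gameStore) CompareAndSwapStatus(gameID string, from, to Status) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.status[gameID] != from {
		return ErrConflict
	}
	s.status[gameID] = to
	return nil
}

func main() {
	store := &gameStore{status: map[string]Status{"g1": StatusReadyToStart}}

	// Two concurrent start attempts for the same game.
	results := make([]error, 2)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = store.CompareAndSwapStatus("g1", StatusReadyToStart, StatusStarting)
		}(i)
	}
	wg.Wait()

	wins := 0
	for _, err := range results {
		if err == nil {
			wins++
		}
	}
	fmt.Println("successful starts:", wins) // successful starts: 1
}
```

Whichever goroutine takes the lock first wins. The other sees `starting` instead of the expected status and gets `ErrConflict`, which matches the "second attempt must fail" rule.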
+### Runtime Manager envelopes
+
+`Lobby` is the producer for both `runtime:start_jobs` and `runtime:stop_jobs`.
+The `Lobby ↔ Runtime Manager` transport stays asynchronous indefinitely; there
+is no synchronous Lobby→RTM REST call in v1 or planned for v2.
+
+`runtime:start_jobs` envelope:
+
+| Field | Type | Notes |
+| --- | --- | --- |
+| `game_id` | string | Lobby `game_id`. |
+| `image_ref` | string | Docker reference resolved from `target_engine_version` via `LOBBY_ENGINE_IMAGE_TEMPLATE`. |
+| `requested_at_ms` | int64 | UTC milliseconds; diagnostics only. |
+
+`runtime:stop_jobs` envelope:
+
+| Field | Type | Notes |
+| --- | --- | --- |
+| `game_id` | string | |
+| `reason` | enum | `orphan_cleanup`, `cancelled`, `finished`, `admin_request`, `timeout`. |
+| `requested_at_ms` | int64 | UTC milliseconds. |
+
+`reason` semantics (Lobby producer side):
+
+- `orphan_cleanup` — used by Lobby's runtime-job-result consumer to release a
+container whose metadata persistence failed after a successful container
+start.
+- `cancelled` — used by the user-lifecycle cascade and by explicit cancel paths
+for in-flight games.
+- `finished` — reserved; not produced by Lobby in v1 because `game_finished`
+is engine-driven and stop jobs after finish are an Admin/GM concern.
+- `admin_request` — reserved for future admin-initiated stop paths through
+Lobby; not produced in v1.
+- `timeout` — reserved for future enrollment-timeout-driven stop paths; not
+produced in v1.
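The `runtime:stop_jobs` envelope and its `reason` enum could be modelled as below. This is a sketch under assumptions: the Go identifiers (`StopJob`, `NewStopJob`, the `StopReason*` constants) and the JSON encoding are illustrative guesses, since the text above fixes only the field names, types, and enum values, not the wire encoding on the Redis Stream.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// StopReason mirrors the `reason` enum from the envelope table.
type StopReason string

const (
	StopReasonOrphanCleanup StopReason = "orphan_cleanup" // produced by Lobby in v1
	StopReasonCancelled     StopReason = "cancelled"      // produced by Lobby in v1
	StopReasonFinished      StopReason = "finished"       // reserved
	StopReasonAdminRequest  StopReason = "admin_request"  // reserved
	StopReasonTimeout       StopReason = "timeout"        // reserved
)

// Validate accepts exactly the five documented values.
func (r StopReason) Validate() error {
	switch r {
	case StopReasonOrphanCleanup, StopReasonCancelled, StopReasonFinished,
		StopReasonAdminRequest, StopReasonTimeout:
		return nil
	}
	return fmt.Errorf("unknown stop reason %q", string(r))
}

// StopJob carries the three envelope fields. The JSON tags are an assumption
// made for this sketch.
type StopJob struct {
	GameID        string     `json:"game_id"`
	Reason        StopReason `json:"reason"`
	RequestedAtMs int64      `json:"requested_at_ms"`
}

// NewStopJob validates the reason and stamps the request time in UTC
// milliseconds.
func NewStopJob(gameID string, reason StopReason, now time.Time) (StopJob, error) {
	if err := reason.Validate(); err != nil {
		return StopJob{}, err
	}
	return StopJob{GameID: gameID, Reason: reason, RequestedAtMs: now.UTC().UnixMilli()}, nil
}

func main() {
	job, err := NewStopJob("game-1", StopReasonOrphanCleanup, time.UnixMilli(1700000000000))
	if err != nil {
		panic(err)
	}
	payload, err := json.Marshal(job)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))
	// {"game_id":"game-1","reason":"orphan_cleanup","requested_at_ms":1700000000000}
}
```

Validating in the constructor keeps an unknown reason from ever reaching the publisher. The reserved values still pass `Validate`, because they are part of the protocol even though Lobby does not emit them in v1.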
+
+### Design rationale: StopReason placement
+
+The `StopReason` enum is declared in
+`lobby/internal/ports/runtimemanager.go` alongside the `RuntimeManager`
+interface that consumes it. The enum is publisher-side protocol: it
+mirrors the AsyncAPI discriminator on `runtime:stop_jobs`, has no
+behaviour beyond `Validate`, and co-locating it with the interface keeps
+the AsyncAPI ↔ Go mapping visible in one file.
+
+Alternatives considered and rejected:
+
+- a dedicated `lobby/internal/domain/runtimejob` package — manufactures
+a domain layer for a single string enum that exists only to be
+serialised onto a Redis Stream;
+- placing the enum in the publisher adapter package
+(`lobby/internal/adapters/runtimemanager`) — the callers (start-game
+service, runtime-job-result worker, user-lifecycle worker) live
+outside that package and would have to depend on a concrete adapter
+for an enum value.
+
+### Design rationale: `engineimage.Resolver` validates the template at construction
+
+`engineimage.Resolver` stores the validated template; the per-game
+`Resolve(version)` call is therefore a pure string substitution that
+cannot fail except on an empty `version`.
+
+`LOBBY_ENGINE_IMAGE_TEMPLATE` is loaded at startup. A malformed value
+(missing `{engine_version}` placeholder, empty string) is an
+operational misconfiguration that fails fast before any traffic arrives
+— not on the first start-game request hours later. The synchronous
+start handler then incurs no per-call template-shape recheck.
+
+A stateless free function `engineimage.Resolve(template, version)` was
+rejected: the only useful checkpoint for the template literal is at
+startup; a free function would either re-validate on every call (waste)
+or skip validation (regression).
+
+The resolver only guards against an empty/whitespace `version`. Semver
+validation lives in `lobby/internal/domain/game/model.go:validateSemver`
+and runs at game-record construction time. Re-running it inside the
+resolver would either duplicate the rule (drift risk) or import the
+validator across package boundaries for no behavioural gain. Keeping the
+resolver narrow leaves it reusable from a future producer (for example
+`Game Master`, when it takes over `image_ref` resolution) without
+dragging Lobby's domain rules along.
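The "validate once at construction, substitute per call" shape described above might look like the following. This is a sketch, not the contents of the `engineimage` package: the constructor name `NewResolver` and the error messages are assumptions; only `Resolver`, `Resolve(version)`, the `{engine_version}` placeholder, and the empty/whitespace guard come from the text.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// placeholder is the literal token the template must contain.
const placeholder = "{engine_version}"

// Resolver holds a template that has already passed validation.
type Resolver struct {
	template string
}

// NewResolver (hypothetical name) rejects an empty template and a template
// without the placeholder, so misconfiguration surfaces at startup.
func NewResolver(template string) (*Resolver, error) {
	if strings.TrimSpace(template) == "" {
		return nil, errors.New("engine image template is empty")
	}
	if !strings.Contains(template, placeholder) {
		return nil, fmt.Errorf("engine image template %q lacks %s", template, placeholder)
	}
	return &Resolver{template: template}, nil
}

// Resolve substitutes the version into the stored template. It guards only
// against an empty or whitespace version; semver checks stay with the caller.
func (r *Resolver) Resolve(version string) (string, error) {
	if strings.TrimSpace(version) == "" {
		return "", errors.New("engine version is empty")
	}
	return strings.ReplaceAll(r.template, placeholder, version), nil
}

func main() {
	resolver, err := NewResolver("galaxy/game:{engine_version}")
	if err != nil {
		panic(err)
	}
	ref, err := resolver.Resolve("1.4.2")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref) // galaxy/game:1.4.2
}
```

Because the template check runs once in the constructor, `Resolve` needs no template-shape recheck, and the only error it can return is the one for a blank version.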
+
+The defensive `return start game: resolve image ref: %w` in
+`startgame.Service.Handle` is a guard against a future invariant
+violation; it is not exercised by the service-level test suite because
+the only resolver-failure mode (empty `version`) requires bypassing
+`game.Validate`, which `gameinmem.Save` always runs. Adding test
+scaffolding to skip validation would teach the test suite a back door
+that the production code path does not have.
+
 ## Paused State

 `Lobby.paused` is a platform-level pause, distinct from `Game Master` runtime
@@ -1135,6 +1227,14 @@ Stream names:
 - `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` with default `2s`
 - `LOBBY_NOTIFICATION_INTENTS_STREAM` with default `notification:intents`

+Runtime Manager integration:
+
+- `LOBBY_ENGINE_IMAGE_TEMPLATE` with default `galaxy/game:{engine_version}` —
+Go-style template applied to a game's `target_engine_version` to resolve
+the Docker `image_ref` published on `runtime:start_jobs`. The template
+must contain the literal placeholder `{engine_version}`; Lobby fails
+fast at startup otherwise.
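Loading this variable with its default and failing fast could be sketched as follows. The function name `loadEngineImageTemplate` and the injected `lookup` parameter are hypothetical, introduced so the logic can be exercised without touching the real environment; only the variable name, the default value, and the placeholder rule come from the text above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const (
	envEngineImageTemplate     = "LOBBY_ENGINE_IMAGE_TEMPLATE"
	defaultEngineImageTemplate = "galaxy/game:{engine_version}"
	placeholder                = "{engine_version}"
)

// loadEngineImageTemplate reads the variable through lookup (os.LookupEnv in
// production), falls back to the default when it is unset, and rejects any
// value without the placeholder. A value that is set but empty is rejected
// too, because it cannot contain the placeholder.
func loadEngineImageTemplate(lookup func(string) (string, bool)) (string, error) {
	value, ok := lookup(envEngineImageTemplate)
	if !ok {
		value = defaultEngineImageTemplate
	}
	if !strings.Contains(value, placeholder) {
		return "", fmt.Errorf("%s=%q must contain %s", envEngineImageTemplate, value, placeholder)
	}
	return value, nil
}

func main() {
	tmpl, err := loadEngineImageTemplate(os.LookupEnv)
	if err != nil {
		// Fail fast: refuse to start with a malformed template.
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Println("engine image template:", tmpl)
}
```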
+
 Upstream clients:

 - `LOBBY_USER_SERVICE_TIMEOUT` with default `1s`
@@ -1264,6 +1364,18 @@ Key operations emit structured logs with these stable field names where applicab

 ## Verification

+Test doubles split between two styles. Wide-surface ports with no
+production state (`RuntimeManager`, `IntentPublisher`, `GMClient`,
+`UserService`) use `gomock`-generated mocks under
+`internal/adapters/mocks/`; regenerate with `make -C lobby mocks`.
+Stateful behavioural fakes that mirror the production adapter
+contract (`gameinmem`, `applicationinmem`, `inviteinmem`,
+`membershipinmem`, `gameturnstatsinmem`, `racenameinmem`,
+`evaluationguardinmem`, `gapactivationinmem`, `streamoffsetinmem`)
+live as in-memory adapters under `internal/adapters/<name>inmem/`
+and stay hand-rolled because tests rely on their CAS, status-transition,
+and invariant-tracking behaviour.
+
 Focused service-local coverage verifies:

 - configuration loading and validation for all env var groups
@@ -1274,7 +1386,7 @@ Focused service-local coverage verifies:
 - application flow: submit (eligibility check, race name check), approve, reject
 - invite flow: create, redeem (auto-membership), decline, revoke, expire on enrollment close
 - membership model: activate, remove, block with correct before/after-start semantics
-- Race Name Directory (redis + stub adapters against the same suite):
+- Race Name Directory (PostgreSQL + in-memory adapters against the same suite):
 canonicalization + confusable-pair policy, `Reserve`/`ReleaseReservation`
 per-game semantics, `MarkPendingRegistration`/`ExpirePendingRegistrations`
 window, `Register` idempotency + quota, `ReleaseAllByUser` cascade