galaxy-game/game/README.md
Ilia Denisov 601970b028
refactor(game): lock-free storage, remove /command, flatten engine wrapper
Three-stage refactor of the game-engine plumbing (game logic untouched):

Stage 1 — lock-free persistence + admin serialisation. Remove the file
lock from repo/fs (the .lock file, the Read/Write-vs-*Safe duality and the
dead ReadSafe polling) and replace the two-step rename with a single atomic
rename so concurrent reads are torn-free without a lock. Serialise the
state-mutating admin writers (init/turn/banish) with one shared router
LimitMiddleware, rewritten to block on the request context instead of a
racy shared 100ms timer.

Stage 2 — remove the obsolete immediate-command path end to end. Players
submit through PUT /api/v1/order; the legacy PUT /api/v1/command path is
deleted across game (route, handler, 24 command factories, Ctrl), backend
(Commands handler/route, engineclient.ExecuteCommands), gateway (dispatch +
executeUserGamesCommand + routing entry), the FlatBuffers/model contract
(UserGamesCommand[Response]) and transcoder, plus every affected
OpenAPI/README/FUNCTIONAL/ARCHITECTURE doc. The integration proxy test is
converted to the order path.

Stage 3 — flatten the REST->engine wrapper. Replace the executor adapter,
the controller package functions and RepoController with one concrete
controller.Service; drop the single-implementation Repo and Storage
interfaces (repo.Repo / fs.FS are now concrete). Handlers depend on a thin
handler.Engine seam and own the domain->REST projection; storage is
resolved once at startup instead of per request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 13:37:07 +02:00


# Game Service Engine
`galaxy/game` is the game engine binary that runs inside one
`galaxy-game-{game_id}` container. It hosts a single game instance and exposes
a REST API for game initialization, turn advancement, player reports, and
batched player order submission.
## References
- [`openapi.yaml`](openapi.yaml) — REST contract.
- [`../docs/ARCHITECTURE.md`](../docs/ARCHITECTURE.md) — system architecture.
- [`../rtmanager/README.md`](../rtmanager/README.md) — Runtime Manager owns
container lifecycle for this binary.
## Container model
The engine is meant to be run inside a Docker container managed by
`Runtime Manager`. One container hosts exactly one game instance and listens
on TCP `:8080` inside the container. Outside the container the endpoint is
addressed as `http://galaxy-game-{game_id}:8080` through Docker's embedded DNS
on the configured `RTMANAGER_DOCKER_NETWORK`.
The container image is built from [`Dockerfile`](Dockerfile) at the root of
this module. The Dockerfile is a multi-stage build (Go builder + small runtime
base) that exposes `:8080`, runs as a non-root user, and ships container
labels that `Runtime Manager` reads at create time:
| Label | Meaning |
| --- | --- |
| `com.galaxy.cpu_quota` | CPU quota for the container (`--cpus`). |
| `com.galaxy.memory` | Memory limit for the container (`--memory`). |
| `com.galaxy.pids_limit` | PID limit for the container (`--pids-limit`). |
| `org.opencontainers.image.title` | `galaxy-game-engine`. |
Image defaults are `cpu_quota=1.0`, `memory=512m`, `pids_limit=512`. Operators
override them at image-build time by editing the Dockerfile labels; producers
do not pass per-game limits.
## Endpoints
The contract is the union of `openapi.yaml` and the technical liveness probe
described below. Endpoints fall into two route classes, admin and player, plus the probe:
| Class | Path | Caller | Purpose |
| --- | --- | --- | --- |
| Admin (GM-only) | `POST /api/v1/admin/init` | `Game Master` | Initialise the engine with a canonical `gameId` and the race roster. |
| Admin (GM-only) | `GET /api/v1/admin/status` | `Game Master` | Read the full game state. |
| Admin (GM-only) | `PUT /api/v1/admin/turn` | `Game Master` | Generate the next turn. |
| Admin (GM-only) | `POST /api/v1/admin/race/banish` | `Game Master` | Deactivate a race after a permanent platform removal. |
| Player | `PUT /api/v1/order` | `Game Master` | Validate and store a batch of player orders. |
| Player | `GET /api/v1/order` | `Game Master` | Fetch the previously stored player order for a turn. |
| Player | `GET /api/v1/report` | `Game Master` | Fetch the per-player turn report. |
| Probe | `GET /healthz` | `Runtime Manager` | Technical liveness probe. |
Admin paths are unauthenticated but are routed only from inside the
trusted network segment that connects `Game Master` to the engine
container. The engine does not enforce caller identity — network-level
segmentation is the boundary. Player paths apply the same rule and rely
on `Game Master` to forward only verified player payloads.
### Game endpoints
Documented in [`openapi.yaml`](openapi.yaml). When the engine has not been
initialised through `POST /api/v1/admin/init`, game endpoints respond
`501 Not Implemented` to make the uninitialised state unambiguous.
### `POST /api/v1/admin/init`
The canonical game identity is owned by the orchestrator (`Game Master`),
not by the engine. The request body is `{ "gameId": "<uuid>", "races": [...] }`
where:
- `gameId` is a non-zero UUID generated by the orchestrator before the
engine container is launched. The same value names the engine's host
storage directory and is persisted into `state.json`. The engine
rejects the zero UUID with `400 Bad Request` and any value that
conflicts with an existing `state.json` on disk with
`409 Conflict`. A second `init` on the same `gameId` is also
rejected with `409`; idempotency is not part of the contract.
- `races` is the race roster; minimum 10 entries.
On success the engine responds `201 Created` with a `StateResponse`
whose `id` echoes the supplied `gameId`.
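
The status codes above can be summarised in a small decision function. This is a minimal sketch, not the engine's handler: `initStatus`, `zeroUUID`, and `minRaces` are hypothetical names, the order in which the checks run is an assumption, and returning `400` for a roster below the minimum is also an assumption (the text above states the minimum but not the status code).

```go
package main

import (
	"fmt"
	"net/http"
)

// zeroUUID is the all-zero UUID that the engine rejects.
const zeroUUID = "00000000-0000-0000-0000-000000000000"

// minRaces is the roster minimum stated in this README.
const minRaces = 10

// initStatus is a hypothetical helper that maps an init request onto the
// documented status codes. stateExists reports whether a state.json is
// already present in the storage directory.
func initStatus(gameID string, raceCount int, stateExists bool) int {
	switch {
	case gameID == "" || gameID == zeroUUID:
		return http.StatusBadRequest // zero UUID is rejected with 400
	case raceCount < minRaces:
		return http.StatusBadRequest // assumed status for a short roster
	case stateExists:
		return http.StatusConflict // conflicting or repeated init
	default:
		return http.StatusCreated
	}
}

func main() {
	const gameID = "6f1c2a9e-5b7d-4c3a-9e21-0a1b2c3d4e5f" // example value
	fmt.Println(initStatus(gameID, 12, false))
	fmt.Println(initStatus(zeroUUID, 12, false))
	fmt.Println(initStatus(gameID, 12, true))
}
```

Because any existing `state.json` yields `409`, a caller that needs to retry `init` safely has to check `GET /api/v1/admin/status` first rather than rely on the engine to deduplicate.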
### `StateResponse.finished`
`StateResponse` (returned by `GET /api/v1/admin/status` and
`PUT /api/v1/admin/turn`) carries a required boolean `finished` field.
The engine sets it to `true` exactly once on the turn-generation response
that ends the game; otherwise it stays `false`. `Game Master` uses this
field as the sole signal to run the platform finish flow. The conditional
logic that flips `finished` to `true` lives in the engine's domain code
and is owned by the engine maintainers.
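
On the caller side, the check reduces to decoding one boolean. The sketch below assumes only the two `StateResponse` fields named in this README (`id` and `finished`); the full schema lives in `openapi.yaml`. `stateResponse` and `gameFinished` are hypothetical names, and treating a missing `finished` as an error is a design choice of the sketch, made because the field is documented as required.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stateResponse models only the StateResponse fields named in this README.
type stateResponse struct {
	ID       string `json:"id"`
	Finished *bool  `json:"finished"` // pointer so a missing field is detectable
}

// gameFinished is a hypothetical helper: it decodes a StateResponse body and
// returns the value of the required "finished" field.
func gameFinished(body []byte) (bool, error) {
	var s stateResponse
	if err := json.Unmarshal(body, &s); err != nil {
		return false, fmt.Errorf("decode StateResponse: %w", err)
	}
	if s.Finished == nil {
		return false, fmt.Errorf("StateResponse is missing required field %q", "finished")
	}
	return *s.Finished, nil
}

func main() {
	done, err := gameFinished([]byte(`{"id":"example","finished":true}`))
	fmt.Println(done, err)
}
```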
### `POST /api/v1/admin/race/banish`
Deactivates a race after a permanent platform-level membership removal.
`Game Master` calls this endpoint synchronously after a Lobby-driven
remove-and-banish flow.
- Request body: `{ "race_name": "<name>" }`. `race_name` must be
non-empty and must match an existing race in the engine's roster.
- Successful response: `204 No Content` with an empty body.
- Error responses follow the same `400` / `500` envelope shape as the
other admin endpoints. The engine-side mechanics of `banish` (what
exactly happens to the race's planets, fleets, and pending orders) are
owned by the engine maintainers.
### `GET /healthz`
Technical liveness probe used by `Runtime Manager` and operator tooling.
- Returns `{"status":"ok"}` with HTTP `200` whenever the HTTP server is
serving requests, regardless of whether the engine has been initialised
through `POST /api/v1/admin/init`.
- Carries no game-state semantics. Use `GET /api/v1/admin/status` for
game-state inspection.
This endpoint exists so that `Runtime Manager` can probe a freshly started
container before `init` runs.
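
A prober can evaluate the response with a check like the one below. This is a minimal sketch of the client side only; `probeOK` is a hypothetical name and is not taken from `Runtime Manager`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// probeOK reports whether a /healthz response matches the documented
// contract: HTTP 200 with a JSON body whose "status" field is "ok".
func probeOK(statusCode int, body []byte) bool {
	if statusCode != http.StatusOK {
		return false
	}
	var payload struct {
		Status string `json:"status"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return false
	}
	return payload.Status == "ok"
}

func main() {
	fmt.Println(probeOK(http.StatusOK, []byte(`{"status":"ok"}`)))
	fmt.Println(probeOK(http.StatusServiceUnavailable, nil))
}
```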
## Storage
The engine reads its persistent storage path from environment variables in
the following order of precedence:
1. `STORAGE_PATH` — historical name; honoured for backward compatibility.
2. `GAME_STATE_PATH` — canonical name written by `Runtime Manager`.
If both are set, `STORAGE_PATH` wins. If neither is set, the binary fails
fast on startup. The Dockerfile defaults `STORAGE_PATH=/var/lib/galaxy-game`
so the image runs out of the box if the operator does not supply either
variable.
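
The precedence rule can be sketched in a few lines of Go. This is an illustration, not the engine's actual `ResolveStoragePath`: the function name, its `getenv` parameter, and the decision to treat an empty value as unset are all assumptions of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveStoragePath is a hypothetical stand-in for the engine's helper.
// It takes the environment lookup as a parameter so that it can be tested
// without touching the process environment.
func resolveStoragePath(getenv func(string) string) (string, error) {
	// STORAGE_PATH is checked first, so it wins when both are set.
	if p := getenv("STORAGE_PATH"); p != "" {
		return p, nil
	}
	if p := getenv("GAME_STATE_PATH"); p != "" {
		return p, nil
	}
	return "", errors.New("neither STORAGE_PATH nor GAME_STATE_PATH is set")
}

func main() {
	path, err := resolveStoragePath(os.Getenv)
	if err != nil {
		// The real binary fails fast here; the sketch only reports the error.
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(path)
}
```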
`Runtime Manager` creates a per-game host directory under
`<RTMANAGER_GAME_STATE_ROOT>/{game_id}` and bind-mounts it into the container
at `RTMANAGER_ENGINE_STATE_MOUNT_PATH` (default `/var/lib/galaxy-game`). The
mount path is then exposed to the engine through `GAME_STATE_PATH` (and, for
compatibility, also as `STORAGE_PATH`).
The engine is responsible for the contents of the storage directory.
`Runtime Manager` never reads or writes the directory contents, never
deletes the directory, and never inspects per-game state files.
### Design rationale: storage-path env precedence
`STORAGE_PATH` wins over `GAME_STATE_PATH` because the engine already
shipped with `STORAGE_PATH` (see `game/Makefile` and
`game/internal/router/handler/handler.go`). Keeping `STORAGE_PATH` as
the authoritative variable means existing engine deployments and
integration fixtures continue to work without code change, while
`GAME_STATE_PATH` is the platform contract written by `Runtime Manager`
and documented in `ARCHITECTURE.md §9`.
Alternatives considered and rejected:
- accept only `GAME_STATE_PATH` — would force a breaking change on the
engine binary and on every existing `STORAGE_PATH=...` invocation in
`game/Makefile` and dev scripts;
- `GAME_STATE_PATH` wins over `STORAGE_PATH` — would silently invert
the meaning of an explicit `STORAGE_PATH=` invocation if the operator
also sets `GAME_STATE_PATH` for any reason.
### Design rationale: storage-path validation site
`game/internal/router/handler/handler.go` exports `ResolveStoragePath`,
which returns the engine storage path from the env-var pair above and
an error when neither is set. `cmd/http/main.go` calls it once at
startup, prints the error to stderr and exits non-zero on failure, then
builds the engine service (`controller.NewService(path)`) and hands it
to `router.NewRouter`.
Storage is resolved exactly once, at construction, rather than per
request: the `Service` holds the file-backed repo for the process
lifetime and `router.NewRouter` takes the `handler.Engine` it routes
to (in production, the `Service`). This keeps the env binding in one
place — a startup helper plus the `main` check — and leaves the
handlers free of configuration concerns.
## Build
The container image is built from [`Dockerfile`](Dockerfile). The Docker
build context is the workspace root (`galaxy/`) rather than the `game/`
subdirectory, because `game/` resolves `galaxy/{model,error,util,...}`
through `go.work` `replace` directives. From the workspace root:
```sh
docker build -t galaxy/game:test -f game/Dockerfile .
```
The build has two stages: a `golang:1.26.2-alpine` builder produces a
statically linked binary (`CGO_ENABLED=0`), then `gcr.io/distroless/static-debian12:nonroot`
runs it as the `nonroot` user and exposes `:8080`.
### Design rationale: workspace-root build context
`game/` is a member of the multi-module `go.work` workspace at the
repository root. Its imports of `galaxy/model`, `galaxy/error`,
`galaxy/util`, etc. are satisfied by `replace` directives in `go.work`
that point at sibling modules under `pkg/`. There is no published
`galaxy/model` module to download.
A standalone `docker build ./game` therefore cannot resolve those
imports: the `pkg/` tree is outside the build context, and `game/go.mod`
alone has no `replace` directives pointing at it.
Alternatives rejected:
- adding `replace` directives to `game/go.mod` and copying `pkg/` into a
vendored layout — duplicates the workspace inside `game/`, drifts from
the rest of the repository, and forces every other workspace member
that ships a Dockerfile to repeat the trick;
- running `go mod vendor` inside `game/` before each build — workspaces
do not vendor cleanly, the resulting `vendor/` would be noisy, and CI
/ Makefile would need a custom pre-build step.
No `.dockerignore` is needed: every `COPY` in `game/Dockerfile` names an
explicit subdirectory (`pkg/calc`, `pkg/error`, `pkg/model`, `pkg/util`,
`game`), and BuildKit (forced by `# syntax=docker/dockerfile:1.7`) only
transfers the paths a `COPY` actually references.
### Design rationale: `gcr.io/distroless/static-debian12:nonroot` runtime base
Distroless static is roughly 2 MB and contains no shell or package
manager, which keeps the attack surface and CVE exposure minimal —
appropriate for a service that `Runtime Manager` will start by the
dozen. The image already runs as UID:GID `65532:65532` (user `nonroot`),
satisfying the non-root-user requirement without an explicit
`RUN adduser`.
Alternatives rejected:
- `alpine:3.20` — provides a shell for ad-hoc debugging but is roughly
10 MB and inherits regular CVE churn on `musl` / `apk`. The convenience
is not worth the larger attack surface for a fleet of identical engine
containers; operators can always `docker exec` from a debug image when
needed;
- `scratch` — smallest possible image, but ships no `/tmp`, no CA bundle,
and no `/etc/passwd`. Distroless wins on the same security axis while
leaving room for future needs (TLS, logging) without rebuilding the
base layout.