feat: runtime manager
This commit is contained in:
+180
-4
@@ -1,8 +1,184 @@
|
||||
# Game Service Engine
|
||||
|
||||
Galaxy game engine — hosts a single game instance and exposes a REST API for
|
||||
game initialization, turn advancement, player reports, and command execution.
|
||||
`galaxy/game` is the game engine binary that runs inside one
|
||||
`galaxy-game-{game_id}` container. It hosts a single game instance and exposes
|
||||
a REST API for game initialization, turn advancement, player reports, and
|
||||
batched player command execution.
|
||||
|
||||
## API
|
||||
## References
|
||||
|
||||
The REST contract is documented in [`openapi.yaml`](openapi.yaml).
|
||||
- [`openapi.yaml`](openapi.yaml) — REST contract.
|
||||
- [`../ARCHITECTURE.md`](../ARCHITECTURE.md) — system architecture.
|
||||
- [`../rtmanager/README.md`](../rtmanager/README.md) — Runtime Manager owns
|
||||
container lifecycle for this binary.
|
||||
|
||||
## Container model
|
||||
|
||||
The engine is meant to be run inside a Docker container managed by
|
||||
`Runtime Manager`. One container hosts exactly one game instance and listens
|
||||
on TCP `:8080` inside the container. Outside the container the endpoint is
|
||||
addressed as `http://galaxy-game-{game_id}:8080` through Docker's embedded DNS
|
||||
on the configured `RTMANAGER_DOCKER_NETWORK`.
|
||||
|
||||
The container image is built from [`Dockerfile`](Dockerfile) at the root of
|
||||
this module. The Dockerfile is a multi-stage build (Go builder + small runtime
|
||||
base) that exposes `:8080`, runs as a non-root user, and ships container
|
||||
labels that `Runtime Manager` reads at create time:
|
||||
|
||||
| Label | Meaning |
|
||||
| --- | --- |
|
||||
| `com.galaxy.cpu_quota` | CPU quota for the container (`--cpus`). |
|
||||
| `com.galaxy.memory` | Memory limit for the container (`--memory`). |
|
||||
| `com.galaxy.pids_limit` | PID limit for the container (`--pids-limit`). |
|
||||
| `org.opencontainers.image.title` | `galaxy-game-engine`. |
|
||||
|
||||
Image defaults are `cpu_quota=1.0`, `memory=512m`, `pids_limit=512`. Operators
|
||||
override them at image-build time by editing the Dockerfile labels; producers
|
||||
do not pass per-game limits.
|
||||
|
||||
## Endpoints
|
||||
|
||||
The contract is the union of `openapi.yaml` and the technical liveness probe
|
||||
described below.
|
||||
|
||||
### Game endpoints
|
||||
|
||||
Documented in [`openapi.yaml`](openapi.yaml). When the engine has not been
|
||||
initialised through `POST /api/v1/init`, game endpoints respond `501 Not
|
||||
Implemented` to make the uninitialised state unambiguous.
|
||||
|
||||
### `GET /healthz`
|
||||
|
||||
Technical liveness probe used by `Runtime Manager` and operator tooling.
|
||||
|
||||
- Returns `{"status":"ok"}` with HTTP `200` whenever the HTTP server is
|
||||
serving requests, regardless of whether the engine has been initialised
|
||||
through `POST /api/v1/init`.
|
||||
- Carries no game-state semantics. Use `GET /api/v1/status` for game-state
|
||||
inspection.
|
||||
|
||||
This endpoint exists so that `Runtime Manager` can probe a freshly started
|
||||
container before `init` runs.
|
||||
|
||||
## Storage
|
||||
|
||||
The engine reads its persistent storage path from environment variables in
|
||||
the following order of precedence:
|
||||
|
||||
1. `STORAGE_PATH` — historical name; honoured for backward compatibility.
|
||||
2. `GAME_STATE_PATH` — canonical name written by `Runtime Manager`.
|
||||
|
||||
If both are set, `STORAGE_PATH` wins. If neither is set, the binary fails
|
||||
fast on startup. The Dockerfile defaults `STORAGE_PATH=/var/lib/galaxy-game`
|
||||
so the image runs out of the box if the operator does not supply either
|
||||
variable.
|
||||
|
||||
`Runtime Manager` creates a per-game host directory under
|
||||
`<RTMANAGER_GAME_STATE_ROOT>/{game_id}` and bind-mounts it into the container
|
||||
at `RTMANAGER_ENGINE_STATE_MOUNT_PATH` (default `/var/lib/galaxy-game`). The
|
||||
mount path is then exposed to the engine through `GAME_STATE_PATH` (and, for
|
||||
compatibility, also as `STORAGE_PATH`).
|
||||
|
||||
The engine is responsible for the contents of the storage directory.
|
||||
`Runtime Manager` never reads or writes the directory contents, never
|
||||
deletes the directory, and never inspects per-game state files.
|
||||
|
||||
### Design rationale: storage-path env precedence
|
||||
|
||||
`STORAGE_PATH` wins over `GAME_STATE_PATH` because the engine already
|
||||
shipped with `STORAGE_PATH` (see `game/Makefile` and
|
||||
`game/internal/router/handler/handler.go`). Keeping `STORAGE_PATH` as
|
||||
the authoritative variable means existing engine deployments and
|
||||
integration fixtures continue to work without code change, while
|
||||
`GAME_STATE_PATH` is the platform contract written by `Runtime Manager`
|
||||
and documented in `ARCHITECTURE.md §9`.
|
||||
|
||||
Alternatives considered and rejected:
|
||||
|
||||
- accept only `GAME_STATE_PATH` — would force a breaking change on the
|
||||
engine binary and on every existing `STORAGE_PATH=...` invocation in
|
||||
`game/Makefile` and dev scripts;
|
||||
- `GAME_STATE_PATH` wins over `STORAGE_PATH` — would silently invert
|
||||
the meaning of an explicit `STORAGE_PATH=` invocation if the operator
|
||||
also sets `GAME_STATE_PATH` for any reason.
|
||||
|
||||
### Design rationale: storage-path validation site
|
||||
|
||||
`game/internal/router/handler/handler.go` exports `ResolveStoragePath`,
|
||||
which returns the engine storage path from the env-var pair above and
|
||||
an error when neither is set. `cmd/http/main.go` calls it before
|
||||
constructing the router, prints the error to stderr, and exits non-zero.
|
||||
The existing `initConfig` closure also calls `ResolveStoragePath` to
|
||||
populate `controller.Param.StoragePath` at request time; the error there
|
||||
is dropped because `main` already validated the environment at startup.
|
||||
|
||||
This keeps the public router surface (`router.NewRouter`) unchanged —
|
||||
the env binding is satisfied by one helper plus a startup check, with
|
||||
no API ripple. Moving env reading entirely into `main` and changing
|
||||
`NewRouter` / `NewDefaultExecutor` to accept an explicit path was
|
||||
rejected: it churns multiple call sites for no functional gain. The
|
||||
current shape leaves the configurer closure ready for future
|
||||
config-injection refactors without forcing one now.
|
||||
|
||||
## Build
|
||||
|
||||
The container image is built from [`Dockerfile`](Dockerfile). The Docker
|
||||
build context is the workspace root (`galaxy/`) rather than the `game/`
|
||||
subdirectory, because `game/` resolves `galaxy/{model,error,util,...}`
|
||||
through `go.work` `replace` directives. From the workspace root:
|
||||
|
||||
```sh
|
||||
docker build -t galaxy/game:test -f game/Dockerfile .
|
||||
```
|
||||
|
||||
The build is two-staged: a `golang:1.26.2-alpine` builder produces a
|
||||
statically linked binary (`CGO_ENABLED=0`), then `gcr.io/distroless/static-debian12:nonroot`
|
||||
runs it as the `nonroot` user and exposes `:8080`.
|
||||
|
||||
### Design rationale: workspace-root build context
|
||||
|
||||
`game/` is a member of the multi-module `go.work` workspace at the
|
||||
repository root. Its imports of `galaxy/model`, `galaxy/error`,
|
||||
`galaxy/util`, etc. are satisfied by `replace` directives in `go.work`
|
||||
that point at sibling modules under `pkg/`. There is no published
|
||||
`galaxy/model` module to download.
|
||||
|
||||
A standalone `docker build ./game` therefore cannot resolve those
|
||||
imports: the `pkg/` tree is outside the build context, and `game/go.mod`
|
||||
alone has no `replace` directives pointing at it.
|
||||
|
||||
Alternatives rejected:
|
||||
|
||||
- adding `replace` directives to `game/go.mod` and copying `pkg/` into a
|
||||
vendored layout — duplicates the workspace inside `game/`, drifts from
|
||||
the rest of the repository, and forces every other workspace member
|
||||
that ships a Dockerfile to repeat the trick;
|
||||
- running `go mod vendor` inside `game/` before each build — workspaces
|
||||
do not vendor cleanly, the resulting `vendor/` would be noisy, and CI
|
||||
/ Makefile would need a custom pre-build step.
|
||||
|
||||
No `.dockerignore` is needed: every `COPY` in `game/Dockerfile` names an
|
||||
explicit subdirectory (`pkg/calc`, `pkg/error`, `pkg/model`, `pkg/util`,
|
||||
`game`), and BuildKit (forced by `# syntax=docker/dockerfile:1.7`) only
|
||||
transfers the paths a `COPY` actually references.
|
||||
|
||||
### Design rationale: `gcr.io/distroless/static-debian12:nonroot` runtime base
|
||||
|
||||
Distroless static is roughly 2 MB and contains no shell or package
|
||||
manager, which keeps the attack surface and CVE exposure minimal —
|
||||
appropriate for a service that `Runtime Manager` will start by the
|
||||
dozen. The image already runs as UID `65532:65532` named `nonroot`,
|
||||
satisfying the non-root-user requirement without an explicit
|
||||
`RUN adduser`.
|
||||
|
||||
Alternatives rejected:
|
||||
|
||||
- `alpine:3.20` — provides a shell for ad-hoc debugging but is roughly
|
||||
10 MB and inherits regular CVE churn on `musl` / `apk`. The convenience
|
||||
is not worth the larger attack surface for a fleet of identical engine
|
||||
containers; operators can always `docker exec` from a debug image when
|
||||
needed;
|
||||
- `scratch` — smallest possible image, but ships no `/tmp`, no CA bundle,
|
||||
and no `/etc/passwd`. Distroless wins on the same security axis while
|
||||
leaving room for future needs (TLS, logging) without rebuilding the
|
||||
base layout.
|
||||
|
||||
Reference in New Issue
Block a user