feat: runtime manager
# Adapters

This document explains why the production adapters under
[`../internal/adapters/`](../internal/adapters) (Docker SDK,
Lobby internal HTTP client, notification-intent publisher, health-event
publisher, job-result publisher) are shaped the way they are. The
PostgreSQL stores and the Redis-coordination adapters live in
[`postgres-migration.md`](postgres-migration.md).

## 1. `mockgen` is the repo-wide convention for wide ports

The Docker port has nine methods plus eight value types in the
signatures, and most lifecycle services exercise nearly every method
pair (start, stop, restart, patch, cleanup, reconcile, events, probe).
A hand-rolled fake would either miss methods or balloon into a per-test
fixture.

`internal/adapters/docker/` therefore uses `go.uber.org/mock` mocks:

- `//go:generate` directives live next to the interface declaration in
  `internal/ports/dockerclient.go`;
- generated code is committed under `internal/adapters/docker/mocks/`
  (matching the `internal/adapters/postgres/jet/` discipline);
- `make -C rtmanager mocks` is the single command operators run after
  a port-signature change.

The maintained `go.uber.org/mock` fork is preferred over the archived
`github.com/golang/mock`. This convention applies to wide / recorder
ports across the repository: Lobby uses the same pipeline for its
narrow recorder ports (`RuntimeManager`, `IntentPublisher`,
`GMClient`, `UserService`); see
[`../../ARCHITECTURE.md`](../../ARCHITECTURE.md) for the cross-service
rule.

The other two RTM ports (`LobbyInternalClient`,
`NotificationIntentPublisher`) keep inline `_test.go` fakes: their
surfaces are small and easy to fake by hand inside a single test file
when needed.

## 2. `EngineEndpoint` is built inside the Docker adapter

The engine port is fixed at `8080`. Pushing it into `RunSpec` would
force the start service to know an engine implementation detail;
pushing it into config would give operators a knob that the engine
image does not honour. The Docker adapter exposes
`EnginePort = 8080` as a package constant and constructs
`RunResult.EngineEndpoint = "http://" + spec.Hostname + ":8080"`
itself.
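
The endpoint construction described above can be sketched as follows. This is a minimal illustration, not the adapter's actual code: the helper name `engineEndpoint` and the sample hostname are assumptions, and only the `EnginePort` constant and the URL shape come from the text.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// EnginePort is the fixed port the engine listens on (value taken from the text).
const EnginePort = 8080

// engineEndpoint is a hypothetical helper: it builds the URL that the adapter
// would store in RunResult.EngineEndpoint from the container hostname alone.
func engineEndpoint(hostname string) string {
	return "http://" + net.JoinHostPort(hostname, strconv.Itoa(EnginePort))
}

func main() {
	// "galaxy-game-42" is an illustrative hostname following the
	// galaxy-game-{game_id} pattern.
	fmt.Println(engineEndpoint("galaxy-game-42"))
}
```

Deriving the port from the constant rather than repeating the literal keeps the endpoint and `EnginePort` from drifting apart.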

The adapter also leaves `container.Config.ExposedPorts` empty: RTM
never publishes ports to the host. The user-defined Docker bridge
network lets every container on that network resolve and reach the
engine at `galaxy-game-{game_id}:8080`.

## 3. `Run` removes the container on `ContainerStart` failure

`README.md §Lifecycles → Start` requires that no orphan remain after a
failed start path. If `ContainerCreate` succeeds but `ContainerStart`
fails, the adapter calls `ContainerRemove(force=true)` inside a fresh
`context.Background()` (with a 10s timeout) so the cleanup runs even
when the original ctx is already cancelled. The cleanup is best-effort:
a remove failure is silently discarded because the original start
failure is the actionable error returned to the caller.
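
The rollback rule above can be sketched with a trimmed-down interface. Everything here is an assumption made for illustration: `containerAPI`, `run`, and `fakeAPI` are hypothetical stand-ins for the Docker SDK calls and the adapter's `Run`, and only the fresh background context, the 10s timeout, the forced remove, and the discarded remove error come from the text.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// containerAPI is a hypothetical stand-in for the three Docker SDK calls involved.
type containerAPI interface {
	Create(ctx context.Context) (string, error)
	Start(ctx context.Context, id string) error
	Remove(ctx context.Context, id string, force bool) error
}

const cleanupTimeout = 10 * time.Second

// run creates and starts a container; on a start failure it removes the
// container so that no orphan remains.
func run(ctx context.Context, api containerAPI) (string, error) {
	id, err := api.Create(ctx)
	if err != nil {
		return "", fmt.Errorf("create container: %w", err)
	}
	if err := api.Start(ctx, id); err != nil {
		// Use a fresh context: ctx may already be cancelled or expired.
		cleanupCtx, cancel := context.WithTimeout(context.Background(), cleanupTimeout)
		defer cancel()
		// Best-effort: the start failure is the error the caller acts on.
		_ = api.Remove(cleanupCtx, id, true)
		return "", fmt.Errorf("start container %s: %w", id, err)
	}
	return id, nil
}

// fakeAPI always fails Start and records what the cleanup did.
type fakeAPI struct {
	removed       []string
	cleanupCtxErr error
}

func (f *fakeAPI) Create(context.Context) (string, error) { return "c1", nil }
func (f *fakeAPI) Start(context.Context, string) error    { return errors.New("start failed") }
func (f *fakeAPI) Remove(ctx context.Context, id string, _ bool) error {
	f.cleanupCtxErr = ctx.Err() // nil when the cleanup context is still live
	f.removed = append(f.removed, id)
	return errors.New("remove failed") // discarded by run
}

// demo calls run with an already-cancelled context.
func demo() (*fakeAPI, error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	api := &fakeAPI{}
	_, err := run(ctx, api)
	return api, err
}

func main() {
	api, err := demo()
	fmt.Println(err)
	fmt.Println(api.removed, api.cleanupCtxErr)
}
```

Even though the caller's context is cancelled before `run` is invoked, the fake observes a live context during removal, and the caller sees only the start failure.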

The alternative, leaving rollback to the start service, would either
duplicate the same code in every caller or invite a caller to forget
it. Centralising the rule in the adapter keeps the port contract
simple. The start service adds an additional rollback layer for the
post-`Run` `Upsert` failure path; see [`services.md`](services.md) §5.

## 4. `RunSpec.Cmd` is optional

`ports.RunSpec` exposes an optional `Cmd []string`. Production callers
leave it `nil` so the engine image's own `CMD` runs;
`internal/adapters/docker/smoke_test.go` uses it to drive
`["/bin/sh","-c","sleep 60"]` against `alpine:3.21`.
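
A minimal sketch of the "ignored when empty" behaviour, assuming simplified local types: `RunSpec` and `containerConfig` here are hypothetical stand-ins for `ports.RunSpec` and the Docker SDK's container configuration, and `engine:latest` is an illustrative image name.

```go
package main

import "fmt"

// RunSpec is a trimmed, hypothetical stand-in for ports.RunSpec.
type RunSpec struct {
	Image string
	Cmd   []string // optional; nil or empty keeps the image's own CMD
}

// containerConfig is a local stand-in for the SDK's container configuration.
type containerConfig struct {
	Image string
	Cmd   []string
}

// buildConfig copies Cmd only when the caller supplied one.
func buildConfig(spec RunSpec) containerConfig {
	cfg := containerConfig{Image: spec.Image}
	if len(spec.Cmd) > 0 {
		cfg.Cmd = append([]string(nil), spec.Cmd...)
	}
	return cfg
}

func main() {
	prod := buildConfig(RunSpec{Image: "engine:latest"})
	smoke := buildConfig(RunSpec{
		Image: "alpine:3.21",
		Cmd:   []string{"/bin/sh", "-c", "sleep 60"},
	})
	fmt.Println(prod.Cmd == nil, smoke.Cmd)
}
```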

The alternative, building a dedicated test image with a pre-baked
`sleep` command, would require an extra `Dockerfile` under testdata
and a build step inside the smoke test. The single new field is
documented as optional and ignored when empty; production behaviour is
unchanged.

## 5. `EventsListen` filters at the adapter boundary

The Docker `/events` API accepts a `filters` query parameter, but the
daemon treats it as a hint, not a guarantee. The adapter therefore
double-checks at the boundary: only `Type == events.ContainerEventType`
messages are passed through to the typed `<-chan ports.DockerEvent`.
Filtering only at the SDK level would still require a defensive
recheck on the consumer side; consolidating the check in the adapter
keeps the contract crisp and the consumer free of Docker-internal type
discriminants.

The decoded event copies the actor's full `Attributes` map into
`DockerEvent.Labels`. Docker mixes container labels and runtime
attributes (`exitCode`, `image`, `name`, etc.) flat in the same map;
RTM consumers filter by the `com.galaxy.` prefix when they care about
labels, and the adapter extracts `exitCode` separately for `die`
events.
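
The boundary filter and the attribute handling can be sketched as below. The types are local stand-ins rather than the real `events.Message` and `ports.DockerEvent`; the field names, the `decode` and `galaxyLabels` helpers, and the `com.galaxy.game_id` label key are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// message is a local stand-in for the Docker SDK event message.
type message struct {
	Type       string
	Action     string
	ActorID    string
	Attributes map[string]string
}

// dockerEvent is a hypothetical shape for ports.DockerEvent.
type dockerEvent struct {
	ContainerID string
	Action      string
	Labels      map[string]string // full copy of the actor attributes
	ExitCode    int
	HasExitCode bool
}

const (
	containerEventType = "container"
	labelPrefix        = "com.galaxy."
)

// decode forwards only container events, whatever the daemon delivered.
func decode(in <-chan message) <-chan dockerEvent {
	out := make(chan dockerEvent)
	go func() {
		defer close(out)
		for m := range in {
			if m.Type != containerEventType {
				continue // defensive recheck at the adapter boundary
			}
			ev := dockerEvent{
				ContainerID: m.ActorID,
				Action:      m.Action,
				Labels:      make(map[string]string, len(m.Attributes)),
			}
			for k, v := range m.Attributes {
				ev.Labels[k] = v
			}
			if m.Action == "die" {
				if code, err := strconv.Atoi(m.Attributes["exitCode"]); err == nil {
					ev.ExitCode, ev.HasExitCode = code, true
				}
			}
			out <- ev
		}
	}()
	return out
}

// galaxyLabels is the consumer-side prefix filter.
func galaxyLabels(ev dockerEvent) map[string]string {
	labels := map[string]string{}
	for k, v := range ev.Labels {
		if strings.HasPrefix(k, labelPrefix) {
			labels[k] = v
		}
	}
	return labels
}

// demo feeds one network event and one container "die" event through decode.
func demo() []dockerEvent {
	in := make(chan message, 2)
	in <- message{Type: "network", Action: "connect"}
	in <- message{Type: "container", Action: "die", ActorID: "c1", Attributes: map[string]string{
		"exitCode": "137", "image": "alpine:3.21", "com.galaxy.game_id": "42",
	}}
	close(in)
	var got []dockerEvent
	for ev := range decode(in) {
		got = append(got, ev)
	}
	return got
}

func main() {
	for _, ev := range demo() {
		fmt.Println(ev.Action, ev.ExitCode, galaxyLabels(ev))
	}
}
```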

## 6. Lobby HTTP client error mapping

`ports.LobbyInternalClient.GetGame` fixes the following response
mapping:

- `200` → `LobbyGameRecord` decoded tolerantly (unknown fields
  ignored);
- `404` → `ports.ErrLobbyGameNotFound`;
- transport, timeout, or any other non-2xx → `ports.ErrLobbyUnavailable`
  wrapped with the original error so callers can `errors.Is` and still
  log the cause.

The start service treats `ErrLobbyUnavailable` as recoverable: it
continues without the diagnostic data because the start envelope
already carries the only required field (`image_ref`). The client
mirrors `notification/internal/adapters/userservice/client.go`: cloned
`*http.Transport`, `otelhttp.NewTransport` wrap, per-request
`context.WithTimeout`, idempotent `Close()` releasing idle connections.

JSON decoding is tolerant: unknown fields in the success body do not
break the call, so additive changes to Lobby's `GameRecord` schema do
not require an RTM release.
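
The mapping above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the real client: the `getGame` helper, the `game_id` field, the 2s timeout, and the URL paths are hypothetical, and treating a decode failure or an unexpected 2xx as unavailable is a choice made for this sketch.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

var (
	ErrLobbyGameNotFound = errors.New("lobby game not found")
	ErrLobbyUnavailable  = errors.New("lobby unavailable")
)

// LobbyGameRecord carries a single hypothetical field for illustration.
type LobbyGameRecord struct {
	GameID string `json:"game_id"`
}

func getGame(ctx context.Context, hc *http.Client, url string) (LobbyGameRecord, error) {
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second) // per-request timeout
	defer cancel()

	var rec LobbyGameRecord
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return rec, fmt.Errorf("%w: %v", ErrLobbyUnavailable, err)
	}
	resp, err := hc.Do(req)
	if err != nil {
		// Transport errors and timeouts map to the sentinel; the cause stays in the message.
		return rec, fmt.Errorf("%w: %v", ErrLobbyUnavailable, err)
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK:
		// encoding/json ignores unknown fields by default, which gives tolerant decoding.
		if err := json.NewDecoder(resp.Body).Decode(&rec); err != nil {
			return rec, fmt.Errorf("%w: decode body: %v", ErrLobbyUnavailable, err)
		}
		return rec, nil
	case http.StatusNotFound:
		return rec, ErrLobbyGameNotFound
	default:
		return rec, fmt.Errorf("%w: unexpected status %d", ErrLobbyUnavailable, resp.StatusCode)
	}
}

// demo exercises the three branches against an in-process test server.
func demo() (gameID string, notFound, unavailable bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/ok":
			fmt.Fprint(w, `{"game_id":"42","added_later":true}`)
		case "/missing":
			http.NotFound(w, r)
		default:
			http.Error(w, "boom", http.StatusInternalServerError)
		}
	}))
	defer srv.Close()

	ctx := context.Background()
	rec, _ := getGame(ctx, srv.Client(), srv.URL+"/ok")
	_, errMissing := getGame(ctx, srv.Client(), srv.URL+"/missing")
	_, errBroken := getGame(ctx, srv.Client(), srv.URL+"/broken")
	return rec.GameID, errors.Is(errMissing, ErrLobbyGameNotFound), errors.Is(errBroken, ErrLobbyUnavailable)
}

func main() {
	fmt.Println(demo())
}
```

The `/ok` response carries an extra `added_later` field to show that an additive schema change does not break decoding.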

## 7. Notification publisher wrapper signature

The wrapper drops the entry id returned by
`notificationintent.Publisher.Publish` (rationale in
[`domain-and-ports.md`](domain-and-ports.md) §7). The adapter is a
thin shim:

- `NewPublisher(cfg)` constructs the inner publisher and forwards
  validation;
- `Publish(ctx, intent)` calls the inner publisher and discards the
  entry id.

The compile-time assertion
`var _ ports.NotificationIntentPublisher = (*Publisher)(nil)` lives in
`publisher.go`.
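
A sketch of the shim and its compile-time assertion, using local stand-ins: `Intent`, `innerPublisher`, and the fabricated entry ids are hypothetical, and the interface here only mimics the shape of `ports.NotificationIntentPublisher`.

```go
package main

import (
	"context"
	"fmt"
)

// Intent is a hypothetical payload type.
type Intent struct{ Kind string }

// NotificationIntentPublisher mimics the port: no entry id in the signature.
type NotificationIntentPublisher interface {
	Publish(ctx context.Context, intent Intent) error
}

// innerPublisher stands in for the shared publisher, whose Publish also
// returns the stream entry id.
type innerPublisher struct{ published int }

func (p *innerPublisher) Publish(_ context.Context, _ Intent) (string, error) {
	p.published++
	return fmt.Sprintf("%d-0", p.published), nil
}

// Publisher is the thin wrapper exposed to RTM services.
type Publisher struct{ inner *innerPublisher }

// Compile-time check that the wrapper satisfies the port.
var _ NotificationIntentPublisher = (*Publisher)(nil)

// Publish forwards to the inner publisher and discards the entry id.
func (p *Publisher) Publish(ctx context.Context, intent Intent) error {
	_, err := p.inner.Publish(ctx, intent)
	return err
}

// demo publishes one intent through the port interface.
func demo() (int, error) {
	inner := &innerPublisher{}
	var port NotificationIntentPublisher = &Publisher{inner: inner}
	err := port.Publish(context.Background(), Intent{Kind: "runtime_failed"})
	return inner.published, err
}

func main() {
	fmt.Println(demo())
}
```

If `Publisher` ever stops matching the port signature, the assertion line fails the build instead of a test.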

## 8. Health-events publisher: snapshot upsert before stream XADD

Every emission goes through
`ports.HealthEventPublisher.Publish`, which both XADDs to
`runtime:health_events` and upserts `health_snapshots`. The snapshot
upsert runs **before** the XADD: a successful Publish always leaves
the snapshot store at least as fresh as the stream, and a partial
failure leaves the snapshot a best-effort lower bound. Reversing the
order would let consumers observe a stream entry whose
`health_snapshots` row still reflects the prior observation, a
misleading inversion.
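
The ordering rule can be sketched with two narrow interfaces. The names `snapshotStore`, `eventStream`, `HealthEvent`, and `recorder` are hypothetical; only the upsert-then-XADD order comes from the text.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// HealthEvent is a hypothetical, trimmed event payload.
type HealthEvent struct {
	GameID    string
	EventType string
}

type snapshotStore interface {
	Upsert(ctx context.Context, ev HealthEvent) error
}

type eventStream interface {
	Add(ctx context.Context, ev HealthEvent) error // stands in for XADD
}

type Publisher struct {
	snapshots snapshotStore
	stream    eventStream
}

// Publish upserts the snapshot first, then appends to the stream, so the
// snapshot is never older than what stream consumers have seen.
func (p *Publisher) Publish(ctx context.Context, ev HealthEvent) error {
	if err := p.snapshots.Upsert(ctx, ev); err != nil {
		return fmt.Errorf("upsert health snapshot: %w", err)
	}
	if err := p.stream.Add(ctx, ev); err != nil {
		return fmt.Errorf("xadd health event: %w", err)
	}
	return nil
}

// recorder implements both interfaces and logs the call order.
type recorder struct {
	calls      []string
	failStream bool
}

func (r *recorder) Upsert(context.Context, HealthEvent) error {
	r.calls = append(r.calls, "upsert")
	return nil
}

func (r *recorder) Add(context.Context, HealthEvent) error {
	r.calls = append(r.calls, "xadd")
	if r.failStream {
		return errors.New("stream unavailable")
	}
	return nil
}

// demo publishes one event and returns the observed call order.
func demo(failStream bool) ([]string, error) {
	rec := &recorder{failStream: failStream}
	p := &Publisher{snapshots: rec, stream: rec}
	err := p.Publish(context.Background(), HealthEvent{GameID: "42", EventType: "container_started"})
	return rec.calls, err
}

func main() {
	fmt.Println(demo(false))
	fmt.Println(demo(true))
}
```

When the stream append fails, the snapshot has already been written, which is the partial-failure case the section describes.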

The `event_type → SnapshotStatus / SnapshotSource` mapping mirrors the
table in [`../README.md` §Health Monitoring](../README.md). In
particular, `container_started` collapses to `SnapshotStatusHealthy`
and `probe_recovered` does the same (rationale in
[`domain-and-ports.md`](domain-and-ports.md) §4).

## 9. Unit-test strategy

Both HTTP-backed adapters (Docker SDK, Lobby client) use
`httptest.Server` fixtures. The Docker SDK speaks HTTP under the hood
for both Unix sockets and TCP, so adapter unit tests construct a
Docker client with `client.WithHost(server.URL)` and
`client.WithHTTPClient(server.Client())`, which lets table-driven
handlers fake every Docker API endpoint without touching the real
daemon. The Docker API version is pinned to `1.45`
(`client.WithVersion("1.45")`) so the URL prefix is stable across CI
machines whose daemon advertises a different default. Production
wiring (in `internal/app/bootstrap.go`) keeps API negotiation enabled.

The notification publisher uses `miniredis` directly because the
adapter's only side effect is an `XADD`, which `miniredis` reproduces
faithfully; this also matches every other Galaxy intent test.

## 10. Docker smoke test

`internal/adapters/docker/smoke_test.go` runs on the default
`go test ./...` invocation and calls `t.Skip` unless the local daemon
is reachable (`/var/run/docker.sock` exists or `DOCKER_HOST` is set).
The covered sequence:

1. provision a temporary user-defined bridge network;
2. assert `EnsureNetwork` for present and missing names;
3. pull `alpine:3.21` (`PullPolicyIfMissing`);
4. subscribe to events;
5. run a sleep container with the full `RunSpec` field set;
6. observe a `start` event for the new container id;
7. inspect, stop, remove, and verify `ErrContainerNotFound` is
   reported afterwards.

This is the production adapter's only end-to-end check that runs from
the default `go test` pass; the broader service-local integration
suite ([`integration-tests.md`](integration-tests.md)) is gated
behind `-tags=integration`.
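
The skip condition for the smoke test can be sketched as a small predicate. The helper names are hypothetical, and the environment lookup and file check are injected so the rule can be exercised without a daemon; in a real test the result would guard a `t.Skip` call.

```go
package main

import (
	"fmt"
	"os"
)

const defaultSocket = "/var/run/docker.sock"

// daemonReachable reports whether the smoke test should run: DOCKER_HOST is
// set or the default socket exists. It only checks presence, not liveness.
func daemonReachable(getenv func(string) string, exists func(string) bool) bool {
	return getenv("DOCKER_HOST") != "" || exists(defaultSocket)
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	// In a test function this would read:
	//   if !daemonReachable(os.Getenv, fileExists) { t.Skip("docker daemon not reachable") }
	fmt.Println(daemonReachable(os.Getenv, fileExists))
}
```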