galaxy-game/gamemaster/docs/stage19-internal-rest-handlers.md
2026-05-03 07:59:03 +02:00

stage: 19
title: Internal REST handlers

Stage 19 — Internal REST handlers

This decision record captures the non-obvious choices made while bringing Game Master's trusted internal REST listener to full contract coverage. The handlers wire the existing service layer (stages 13–17) and the membership cache (stage 16) to the eighteen operations frozen by ../api/internal-openapi.yaml. The listener lifecycle, the OpenTelemetry middleware, and the /healthz and /readyz probes were established in stage 08. This stage adds the per-operation handler subpackage, widens the listener's Dependencies struct to thread through every service port, and grows ../internal/app/wiring.go to construct the entire dependency graph (stores, adapters, services, workers).

The reference precedent for the handler shape is the rtmanager internal/api/internalhttp/handlers tree, and the conformance test mirrors rtmanager/internal/api/internalhttp/conformance_test.go. Eight decisions either deviate from a literal reading of ../PLAN.md or are subtle enough to record here.

Decisions

D1. Conformance test lives inside the listener package

Decision. The OpenAPI conformance test ships at ../internal/api/internalhttp/conformance_test.go, in the internalhttp package, not at gamemaster/api/openapi_conformance_test.go as the literal text of PLAN.md Stage 19 suggests.

Why. The test instantiates the live Server.handler through NewServer(...) with stub services and replays each documented operation against it. That requires reading the unexported handler field and wiring stub implementations of the handler-package interfaces; both are package-internal concerns that a sibling test under gamemaster/api/ would not have access to without exporting hooks that exist solely for the test. The rtmanager service ships the analogous test inside its own internalhttp package; we follow the same idiom.

How to apply. Future surface-shape audits go in this file. The PLAN.md wording is treated as drift. Its underlying constraint, that the spec is covered by a kin-openapi-driven validation, is honoured exactly.

D2. DELETE /engine-versions/{version} calls Service.Deprecate

Decision. The handler bound to the OpenAPI operation internalDeprecateEngineVersion calls engineversion.Service.Deprecate and never Service.Delete. The 409 response declared by the spec for engine_version_in_use is therefore unreachable on this endpoint.

Why. The operation id and the first sentence of the description explicitly say «Sets the engine version status to deprecated». The sentence about hard removal and engine_version_in_use is a leftover of an earlier intent — Service.Deprecate does not consult IsReferencedByActiveRuntime, so the in-use rejection cannot fire through this code path. Hard delete is a future Admin Service operation; v1 does not expose it through REST.

How to apply. Callers that need to remove the registry row permanently must use Service.Delete directly (not yet exposed through REST). The spec's leftover 409 example is recorded here so that a future contract reviewer does not chase a phantom failure mode.

D3. Workers wired and started alongside the listener

Decision. This stage constructs the scheduler ticker (stage 15) and the runtime:health_events consumer (stage 18) inside wiring.buildWorkers and registers them as App.Component implementations next to the internal HTTP server.

Why. Stage 19's narrow text says «ship the gateway-, Lobby- and Admin-facing REST surface backed by the service layer». But the service layer collaborators referenced from the listener (turn generation, membership cache, runtime record store, etc.) only make sense inside a process that is also producing turns and consuming health events. Keeping the workers idle would leave the wiring graph half-built and the dev experience surprising. Constructing and starting them here makes a freshly-deployed process production-ready the moment the listener accepts traffic.

How to apply. The two workers are owned by App.Run exactly like the listener: both Run (long-lived) and Shutdown are part of App.Component. See D4 for the trivial Shutdown added on the scheduler ticker.

D4. schedulerticker.Worker.Shutdown is a no-op

Decision. The scheduler ticker adds a one-line Shutdown(_ context.Context) error { return nil } so the type satisfies app.Component.

Why. The worker's Run already returns when the supplied context is cancelled, and wg.Wait drains the in-flight per-game goroutines before Run returns. There is nothing additional to release. The healtheventsconsumer.Worker already had a Shutdown from stage 18; this just brings the two workers to the same shape.

How to apply. When future workers grow real shutdown logic (buffered output to flush, persistent connections to drain), they should embed it inside Shutdown rather than relying on context cancellation alone.

D5. New RuntimeRecordStore.List(ctx) method

Decision. The port grows a fifth read method: List(ctx) ([]runtime.RuntimeRecord, error). The PostgreSQL adapter implements it as one SELECT ordered by (created_at DESC, game_id ASC).

Why. The OpenAPI operation internalListRuntimes accepts an optional status query parameter. When the parameter is set, the existing ListByStatus answers the query; when it is absent, no method on the port returned every record. Composing the unfiltered list by looping over statuses would lose the single global ordering and cost one round trip per status instead of one. The new method is additive: every other caller keeps using its narrow read.

How to apply. Test fakes (fakeRuntimeRecords in service tests, fakeRuntimeRecordsBackend in scheduler-ticker tests) gained the method as well. The handler-side RuntimeRecordsReader interface exposes only the three read methods (Get, List, ListByStatus) so the listener cannot accidentally mutate runtime state.

D6. next_generation_at encodes as 0 when unscheduled

Decision. The wire RuntimeRecord.next_generation_at field is declared required: true and format: int64. The domain holds *time.Time and may carry nil — typically while a runtime is in status starting and the first scheduling write has not yet landed. The encoder writes 0 in that case and writes the UTC millisecond value otherwise.

Why. Encoding nil as 0 keeps the wire shape JSON-Schema-valid without forcing every record reader to handle a missing field. Optional pointer-typed timestamps (started_at, stopped_at, finished_at) are still omitted from the JSON form via omitempty, matching the required list in the spec.

How to apply. Readers must treat next_generation_at == 0 as «not yet scheduled» when the status warrants it; the field will turn into a real Unix-millisecond value once the scheduler's first write lands. The conformance test seeds a non-nil NextGenerationAt, so the strict response validator never sees this edge case at the wire boundary.

D7. Hot-path bodies are pass-through, not strict-decoded

Decision. The handlers for internalExecuteCommands and internalPutOrders read the request body as raw bytes. A body is rejected only when it is empty or not valid JSON; unknown fields pass through.

Why. The OpenAPI request schemas for these operations carry additionalProperties: true because the envelopes are owned by the engine (galaxy/game/openapi.yaml). Strict decoding here would reject legitimate engine extensions and force every contract bump to land in two services in lockstep.

How to apply. An engine_validation_error response from the engine still surfaces as the canonical Game Master error envelope at HTTP 502. The engine's response body is recorded in result.RawResponse for audit, but the OpenAPI spec mandates the error envelope on this code path. If a future contract version requires forwarding the engine's 4xx body to the gateway, a separate response shape needs to land in the spec first.

D8. X-Galaxy-Caller mapping with admin default

Decision. The resolveOpSource helper maps the X-Galaxy-Caller header values to operation.OpSource as follows: gateway → OpSourceGatewayPlayer, lobby → OpSourceLobbyInternal, admin → OpSourceAdminRest. Missing or unrecognised values fall back to OpSourceAdminRest, matching the contract documented in ../README.md §«Internal REST API».

Why. The default is conservative: an Admin Service request without the header is still recorded as admin instead of being dropped. The gateway and lobby values are reserved for the documented callers, and matching trims surrounding whitespace and ignores case, so a casing slip during development does not produce a confusing audit row.

How to apply. New REST callers should set the header explicitly. Adding a fourth caller type requires an OpSource constant alongside the mapping change.

What ships

What remains for later stages

  • Lobby refactor (stage 20) flips Lobby's start flow to call GET /api/v1/internal/engine-versions/{version}/image-ref synchronously and adds the InvalidateMemberships outbound call on every roster mutation.
  • Service-local integration suite (stage 21) drives the listener end-to-end against a real engine container.
  • Cross-service integration tests (stages 22–23) cover the Lobby + GM and Lobby + GM + RTM happy and failure paths.