galaxy-game/gamemaster/docs/stage12-external-clients.md
2026-05-03 07:59:03 +02:00


---
stage: 12
title: External clients
---

Stage 12 — External clients

This decision record captures the non-obvious choices made while implementing the five outbound adapters Game Master uses to talk to the engine, Game Lobby, Runtime Manager, the notification stream, and the lobby-events stream at PLAN Stage 12.

Context

PLAN Stage 12 (../PLAN.md) ships the adapter layer that the service-layer Stages 13-18 depend on. Ports were frozen by Stage 10 (stage10-domain-and-ports.md), and the AsyncAPI/OpenAPI contracts were frozen by Stage 06 (stage06-contract-files.md). The reference precedent is rtmanager's adapter tree (rtmanager/internal/adapters/lobbyclient, rtmanager/internal/adapters/notificationpublisher, rtmanager/internal/adapters/healtheventspublisher), which Stage 11 already adopted as the canonical shape for Game Master persistence adapters. Stage 12 extends that precedent to the HTTP clients and stream publishers.

Seven decisions either deviate from a literal copy of the rtmanager precedent or extend the literal task list of PLAN Stage 12. Each is recorded below.

Decisions

1. Engine client carries no BaseURL in Config

Decision. engineclient.Config exposes only CallTimeout and ProbeTimeout. The engine endpoint URL is supplied per call from runtime_records.engine_endpoint.

Why. Game Master operates on N concurrent games at runtime; each game lives behind its own DNS hostname (http://galaxy-game-{game_id}:8080). Binding a base URL at construction would force a per-game client instance and complicate the caller. The port already reflects the right shape (baseURL is a method parameter on every method), so the adapter follows it. The *http.Client is shared, so the HTTP connection pool stays single-instance.

2. Two timeouts on the engine client, dispatched per method

Decision. The engine client routes turn-generation-class methods (Init, Turn, BanishRace, ExecuteCommands, PutOrders) through CallTimeout and inspect-style methods (Status, GetReport) through ProbeTimeout. Both are required and must be positive at construction.

Why. README §Configuration already declares the two (GAMEMASTER_ENGINE_CALL_TIMEOUT=30s, GAMEMASTER_ENGINE_PROBE_TIMEOUT=5s) for exactly this dispatch: turn generation on a large game can run for tens of seconds, while status/report reads are bounded and benefit from a tight ceiling. A single shared timeout would either starve the long calls or relax the short ones; the dispatch keeps the contract consistent with the documented intent.

3. Engine population (number) decoded into int via math.Round

Decision. engineclient decodes each PlayerState.population (typed as number in game/openapi.yaml) into a private float64 field, then converts to the port-level int through int(math.Round(value)). NaN, infinite, and negative values are rejected as ports.ErrEngineProtocolViolation.

Why. The port (Stage 10) and the AsyncAPI for gm:lobby_events both treat population as a non-negative integer; the engine spec is the only place it is typed as number. The engine in practice returns whole values, but a defensive math.Round removes any floating-point noise that would otherwise propagate to Lobby. Rejecting NaN/Inf/negative payloads keeps the protocol invariant explicit at the trust boundary.

4. Lobby client walks pagination with a hard page cap

Decision. lobbyclient.GetMemberships walks the next_page_token chain transparently with page_size=200, stopping when the upstream response carries an empty next_page_token. A hard cap of 64 pages (maxPages) surfaces as fmt.Errorf("%w: pagination overflow ...", ports.ErrLobbyUnavailable) when crossed.

Why. The port contract is "every membership of gameID, in any status"; the only way to satisfy it across Lobby's paged contract is to follow the chain. The 64-page cap is a defensive guard against a broken upstream that keeps issuing tokens; 64 × 200 = 12 800 memberships per game, two orders of magnitude beyond any realistic Galaxy roster, so legitimate traffic never trips it. Surfacing the overflow as ErrLobbyUnavailable lets the membership cache treat it the same as any other transport fault.

5. RTM client does not introduce ErrSemverPatchOnly

Decision. RTM's 409 conflict with error_code=semver_patch_only is wrapped as fmt.Errorf("%w: rtm patch: ... (error_code=semver_patch_only)", ports.ErrRTMUnavailable) without a dedicated typed sentinel.

Why. The Stage 10 port RTMClient.Patch declares only ErrRTMUnavailable. Adding ErrSemverPatchOnly here would extend the port contract beyond Stage 10's frozen surface, and the v1 service-layer caller (Stage 17, adminpatch) already validates semver-patch eligibility against engineversionstore before issuing the call. The 409 path is therefore a defence-in-depth signal, not a primary branch; a single wrapped error keeps the port narrow and lets the caller match on the message substring if it ever needs to (today it does not).

6. Lobby-events publisher reuses the rtmanager/healtheventspublisher shape, with two methods sharing one stream

Decision. lobbyeventspublisher.Publisher exposes PublishSnapshotUpdate and PublishGameFinished, both hitting the same Redis Stream key (cfg.Streams.LobbyEvents, default gm:lobby_events). Each XADD encodes the same field vocabulary as rtmanager/healtheventspublisher: integer fields are serialised through strconv.FormatInt / strconv.Itoa, the per-player projection is JSON-encoded into one stream field (player_turn_stats), and the discriminator field (event_type) is a string literal pinned to one of the two AsyncAPI const values. No MAXLEN cap is set on XADD; an empty PlayerTurnStats slice is serialised as "[]" (literal). All time.Time fields are coerced to UTC before UnixMilli() so the published timestamps match the contract regardless of caller-supplied timezone.

Why. The two messages share one channel per the AsyncAPI spec (runtime-events-asyncapi.yaml); the discriminator is the documented dispatch key for Lobby's consumer. Using the existing field-encoding pattern from rtmanager/healtheventspublisher keeps the wire format consistent across services and lets Lobby reuse the same XADD-decoding helpers it already runs against runtime:health_events. Setting MAXLEN was considered and rejected: Game Master never processes the stream itself, and the Lobby consumer owns its consumer-group offset, so trimming would risk dropping unconsumed entries. The empty "[]" default keeps the stream entry valid JSON for the field even before the first turn generates (when no per-player stats exist yet).

7. Defensive Makefile guard for make mocks between Stage 12 and Stage 19

Decision. The mocks Makefile target now skips the internal/api/internalhttp/handlers/... line when that directory does not yet exist:

```make
# Handler mocks are skipped until Stage 19 creates the handlers directory.
mocks:
	go generate ./internal/ports/...
	@if [ -d ./internal/api/internalhttp/handlers ]; then \
		go generate ./internal/api/internalhttp/handlers/...; \
	fi
```

Why. Stage 8 wired the Makefile to regenerate both port-level and handler-level mocks, but the handlers directory only appears at Stage 19. Without the guard, make mocks fails with lstat: no such file or directory between Stage 12 and Stage 19 — exactly when GM is being grown stage by stage. The guard makes the target idempotent across stages and adds zero cost when the directory is finally created.

Files landed

Verification

```sh
cd gamemaster

# Regenerating mocks leaves the committed files unchanged.
make mocks
git diff --exit-code internal/adapters/mocks

# Adapter-level unit tests against httptest / miniredis.
go test ./internal/adapters/engineclient/...
go test ./internal/adapters/lobbyclient/...
go test ./internal/adapters/rtmclient/...
go test ./internal/adapters/notificationpublisher/...
go test ./internal/adapters/lobbyeventspublisher/...

# The full test suite stays green; Stage 06/07/09-11 contract and
# adapter tests are unaffected.
go test ./...
```