galaxy-game/gamemaster/docs/stage08-module-skeleton.md
2026-05-03 07:59:03 +02:00


stage: 08
title: Module skeleton

Stage 08 — GM module skeleton

This decision record captures the wiring choices made when bootstrapping the runnable gamemaster binary on top of the contracts and freeze tests landed by Stages 01 through 07.

Context

../PLAN.md Stage 08 calls for a buildable gamemaster process that loads its environment-driven configuration, opens PostgreSQL and Redis pools, installs the OpenTelemetry runtime, exposes /healthz and /readyz on the trusted internal HTTP listener, and exits cleanly on SIGTERM within GAMEMASTER_SHUTDOWN_TIMEOUT. No business endpoints, no workers, and no persistence stores yet.

The reference implementation is rtmanager, the most recently landed Galaxy service that follows the platform-wide skeleton conventions (layered cmd / internal/{app, api, config, logging, telemetry}, app.Component lifecycle, OpenTelemetry runtime with deferred observable gauges, fail-fast environment loader). Stage 08 mirrors that skeleton with two deliberate divergences described below.

Decisions

1. go.mod scope is minimal at Stage 08

Only modules actually imported by Stage 08 code land in ../go.mod:

  • galaxy/postgres, galaxy/redisconn, galaxy/notificationintent (the last was already present from the Stage 07 freeze test);
  • the OpenTelemetry stack (otel, metric, trace, sdk, sdk/metric, OTLP exporters for traces and metrics over gRPC and HTTP, stdout exporters);
  • go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp;
  • github.com/redis/go-redis/v9 (promoted from indirect to direct);
  • github.com/jackc/pgx/v5 (transitive via pkg/postgres).

PLAN-listed modules that arrive with later consumers (go-jet/jet/v2, pressly/goose/v3, the testcontainers modules, go.uber.org/mock, galaxy/cronutil, galaxy/error, galaxy/util) are deliberately left out of Stage 08's go.mod. They join the module together with their first consumers in Stages 09 / 10 / 11 / 12.

Reasoning: keeping go mod tidy honest at every stage is cheaper than pre-declaring blank-import stubs. The PLAN's full list is the eventual shape of the module across the series, not a Stage 08 contract.

2. ShutdownTimeout lives at the top level of Config

The README §Configuration groups one variable — GAMEMASTER_SHUTDOWN_TIMEOUT — under a documentation group called "Lifecycle". The Go struct does not split that single field into a substruct: Config.ShutdownTimeout mirrors the rtmanager.Config.ShutdownTimeout shape so the two services stay isomorphic. The "Lifecycle" group remains a documentation grouping in ../README.md only.
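A minimal sketch of what a flat Config with a fail-fast loader could look like. Only Config.ShutdownTimeout and GAMEMASTER_SHUTDOWN_TIMEOUT come from this record; the PostgresDSN field, the GAMEMASTER_POSTGRES_DSN variable, the 10-second default, and the loadConfig signature are assumptions made for illustration and may not match internal/config.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Config is a trimmed, hypothetical stand-in for the gamemaster Config struct.
// ShutdownTimeout sits at the top level instead of inside a Lifecycle substruct.
type Config struct {
	ShutdownTimeout time.Duration
	PostgresDSN     string // hypothetical required field, for illustration only
}

// defaultShutdownTimeout is an assumed default, not taken from the README.
const defaultShutdownTimeout = 10 * time.Second

// loadConfig reads variables through getenv and fails fast, reporting every
// problem it finds in a single joined error.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{ShutdownTimeout: defaultShutdownTimeout}
	var errs []error

	if raw := getenv("GAMEMASTER_SHUTDOWN_TIMEOUT"); raw != "" {
		d, err := time.ParseDuration(raw)
		switch {
		case err != nil:
			errs = append(errs, fmt.Errorf("GAMEMASTER_SHUTDOWN_TIMEOUT: %w", err))
		case d <= 0:
			errs = append(errs, errors.New("GAMEMASTER_SHUTDOWN_TIMEOUT: must be positive"))
		default:
			cfg.ShutdownTimeout = d
		}
	}

	// GAMEMASTER_POSTGRES_DSN is a hypothetical variable name.
	cfg.PostgresDSN = getenv("GAMEMASTER_POSTGRES_DSN")
	if cfg.PostgresDSN == "" {
		errs = append(errs, errors.New("GAMEMASTER_POSTGRES_DSN: required"))
	}

	if err := errors.Join(errs...); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	// A real entrypoint would pass os.Getenv; a map keeps the demo deterministic.
	env := map[string]string{
		"GAMEMASTER_SHUTDOWN_TIMEOUT": "5s",
		"GAMEMASTER_POSTGRES_DSN":     "postgres://localhost/gamemaster",
	}
	cfg, err := loadConfig(func(key string) string { return env[key] })
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("shutdown timeout:", cfg.ShutdownTimeout)
}
```

Passing getenv as a function keeps the loader testable without touching the process environment.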

3. Telemetry — counters and histograms now, observable gauges later

internal/telemetry/runtime.go registers every counter and histogram listed under ../README.md §Observability at process start (buildRuntime). The three observable gauges (gamemaster.runtime_records_by_status, gamemaster.scheduler.due_games, gamemaster.engine_versions_total) are declared up front but their callbacks are installed via a deferred Runtime.RegisterGauges(deps) call. The wiring layer at Stages 11 / 14 / 15 supplies the probes (per-status row count, due-now scheduler count, registered engine versions) once the persistence stores and the scheduler exist.

This matches the rtmanager pattern where runtime_records_by_status is registered through an analogous RegisterGauges plumbing.
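The deferred registration can be modelled with the standard library alone. The sketch below is not the OpenTelemetry API and not the actual internal/telemetry code: Probe, GaugeDeps, and Collect are hypothetical names, and each gauge is reduced to a single integer (the real runtime_records_by_status gauge is per-status). It only illustrates the ordering: gauge names exist from buildRuntime onward, while callbacks arrive later through RegisterGauges.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Probe returns the current value of one gauge. Hypothetical type.
type Probe func() int64

// GaugeDeps carries the probes that later wiring stages supply. Hypothetical shape.
type GaugeDeps struct {
	RuntimeRecordsByStatus Probe
	SchedulerDueGames      Probe
	EngineVersionsTotal    Probe
}

// Runtime declares gauge names up front and accepts their callbacks later.
type Runtime struct {
	mu         sync.Mutex
	registered bool
	probes     map[string]Probe // a nil value means "declared, no callback yet"
}

// buildRuntime declares the three gauges without any callbacks.
func buildRuntime() *Runtime {
	return &Runtime{probes: map[string]Probe{
		"gamemaster.runtime_records_by_status": nil,
		"gamemaster.scheduler.due_games":       nil,
		"gamemaster.engine_versions_total":     nil,
	}}
}

// RegisterGauges installs the callbacks exactly once; a second call is an error.
func (r *Runtime) RegisterGauges(deps GaugeDeps) error {
	if deps.RuntimeRecordsByStatus == nil || deps.SchedulerDueGames == nil || deps.EngineVersionsTotal == nil {
		return errors.New("telemetry: every gauge probe must be set")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.registered {
		return errors.New("telemetry: gauges already registered")
	}
	r.probes["gamemaster.runtime_records_by_status"] = deps.RuntimeRecordsByStatus
	r.probes["gamemaster.scheduler.due_games"] = deps.SchedulerDueGames
	r.probes["gamemaster.engine_versions_total"] = deps.EngineVersionsTotal
	r.registered = true
	return nil
}

// Collect reports a value for every gauge that already has a callback.
func (r *Runtime) Collect() map[string]int64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make(map[string]int64)
	for name, probe := range r.probes {
		if probe != nil {
			out[name] = probe()
		}
	}
	return out
}

func main() {
	rt := buildRuntime()
	fmt.Println("before wiring:", len(rt.Collect()), "gauges report")

	// Constant probes stand in for the store and scheduler queries.
	err := rt.RegisterGauges(GaugeDeps{
		RuntimeRecordsByStatus: func() int64 { return 7 },
		SchedulerDueGames:      func() int64 { return 3 },
		EngineVersionsTotal:    func() int64 { return 2 },
	})
	fmt.Println("register error:", err)
	fmt.Println("after wiring:", len(rt.Collect()), "gauges report")
}
```

In the real runtime the same split would presumably map onto creating observable instruments at start and registering their callbacks once the stores exist.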

4. PostgreSQL migrations are deferred to Stage 09

The README §Startup dependencies states "Embedded goose migrations apply synchronously before any listener opens." Stage 08 opens, instruments, and pings the PostgreSQL pool but does not call postgres.RunMigrations. The migrations package (internal/adapters/postgres/migrations/) is shipped by Stage 09; the runtime adds the one-line RunMigrations call at that stage.

Until then, the runtime is buildable, listener-ready, and serves /healthz + /readyz against a fresh PostgreSQL pool with no schema applied. This is acceptable because Stage 08 ships no business handlers and no workers; nothing reads or writes gamemaster.* tables yet.

5. Makefile mirrors rtmanager

../Makefile declares jet, mocks, and integration targets identical in shape to those in rtmanager/Makefile. The jet target runs go run ./cmd/jetgen; the binary lands in Stage 09. The mocks target runs go generate ./internal/ports/... ./internal/api/internalhttp/handlers/...; the //go:generate directives land in Stages 10 / 12 / 19. The jet and mocks targets fail until their prerequisites land. That is accepted because Stage 08 does not require either to succeed; only go build and go test ./gamemaster/... matter.

6. No Docker dependency

Game Master is forbidden from importing the Docker SDK (../README.md §Non-Goals). The skeleton therefore drops the newDockerClient / pingDocker helpers from internal/app/bootstrap.go and the Docker-related fields from internal/app/wiring.go. The readiness probe pings PostgreSQL and Redis only.
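A readiness probe limited to PostgreSQL and Redis can be sketched as below. The Pinger interface, stubDependency, newMux, probe, and the one-second ping timeout are hypothetical, and plain-text response bodies are an assumption; only the /healthz and /readyz paths and the ok / ready / service_unavailable outcomes come from this record.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pingTimeout bounds one readiness check. Assumed value.
const pingTimeout = time.Second

// Pinger is the only capability the probe needs from a dependency.
type Pinger interface {
	Ping(ctx context.Context) error
}

// stubDependency stands in for the PostgreSQL pool or the Redis client.
type stubDependency struct{ down bool }

func (s stubDependency) Ping(context.Context) error {
	if s.down {
		return errors.New("dependency unreachable")
	}
	return nil
}

// readiness pings every dependency and maps the outcome to a status and body.
func readiness(ctx context.Context, deps ...Pinger) (int, string) {
	for _, dep := range deps {
		if err := dep.Ping(ctx); err != nil {
			return http.StatusServiceUnavailable, "service_unavailable"
		}
	}
	return http.StatusOK, "ready"
}

// newMux wires /healthz (liveness only) and /readyz (dependency pings).
func newMux(deps ...Pinger) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok") // liveness never touches dependencies
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), pingTimeout)
		defer cancel()
		status, body := readiness(ctx, deps...)
		w.WriteHeader(status)
		fmt.Fprint(w, body)
	})
	return mux
}

// probe issues an in-memory GET and returns the status code and body.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	healthy := newMux(stubDependency{}, stubDependency{})
	degraded := newMux(stubDependency{}, stubDependency{down: true})

	fmt.Println(probe(healthy, "/healthz"))
	fmt.Println(probe(healthy, "/readyz"))
	fmt.Println(probe(degraded, "/readyz"))
}
```

Because the handler depends only on the Pinger interface, nothing resembling a Docker client needs to appear in the wiring.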

Files landed

  • cmd/gamemaster/main.go — process entrypoint.
  • internal/config/{config.go, env.go, validation.go, config_test.go} — GAMEMASTER-prefixed env loader plus required-vars fail-fast.
  • internal/logging/{logger.go, context.go} — slog JSON-stdout logger with request id and span id helpers.
  • internal/telemetry/{runtime.go, runtime_test.go} — OpenTelemetry runtime, instruments listed in §Observability, deferred gauge plumbing.
  • internal/api/internalhttp/{server.go, server_test.go} — /healthz and /readyz listener with observability middleware.
  • internal/app/{app.go, app_test.go, bootstrap.go, runtime.go, wiring.go} — process lifecycle (component supervisor + reverse-order cleanup), Redis bootstrap helpers, minimal placeholder wiring.
  • Makefile — jet, mocks, integration target stubs.
  • Updated go.mod / go.sum with the dependencies and replace directives for galaxy/postgres and galaxy/redisconn.

Verification

  • go build ./gamemaster/... succeeds.
  • go test ./gamemaster/... passes (existing contract / freeze tests plus the four new test files).
  • Manual smoke against a local Postgres + Redis confirms: /healthz returns 200 ok, /readyz returns 200 ready while both dependencies respond, and 503 service_unavailable once one of them is brought down.
  • SIGTERM ends the process within GAMEMASTER_SHUTDOWN_TIMEOUT, releasing PostgreSQL pool, Redis client, and telemetry providers in reverse construction order.
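The reverse-order release in the last check can be illustrated with a small cleanup stack. This is a sketch under assumptions, not the code in internal/app: cleanupStack and newDemoStack are hypothetical names, the construction order shown (telemetry, then PostgreSQL, then Redis) is assumed, and the SIGTERM wiring is left out (in a real entrypoint it could come from signal.NotifyContext).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// cleanupStack records release functions in construction order.
type cleanupStack struct {
	names []string
	fns   []func(context.Context) error
}

// push registers a release function right after its resource is constructed.
func (s *cleanupStack) push(name string, fn func(context.Context) error) {
	s.names = append(s.names, name)
	s.fns = append(s.fns, fn)
}

// shutdown runs the release functions last-in first-out. All of them share one
// context that expires after timeout, so the whole sequence is bounded as long
// as each function honours its context. It returns the order attempted and any
// joined errors.
func (s *cleanupStack) shutdown(timeout time.Duration) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var order []string
	var errs []error
	for i := len(s.fns) - 1; i >= 0; i-- {
		order = append(order, s.names[i])
		if err := s.fns[i](ctx); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", s.names[i], err))
		}
	}
	return order, errors.Join(errs...)
}

// newDemoStack builds a stack with no-op releases in an assumed construction order.
func newDemoStack() *cleanupStack {
	s := &cleanupStack{}
	noop := func(context.Context) error { return nil }
	s.push("telemetry providers", noop)
	s.push("postgresql pool", noop)
	s.push("redis client", noop)
	return s
}

func main() {
	// The timeout would come from GAMEMASTER_SHUTDOWN_TIMEOUT in the real process.
	order, err := newDemoStack().shutdown(5 * time.Second)
	for _, name := range order {
		fmt.Println("released:", name)
	}
	fmt.Println("shutdown error:", err)
}
```

Continuing past a failed release, instead of returning on the first error, gives every resource a chance to close before the deadline.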