Testing
Test strategy and runbook for the Galaxy Game
platform. The platform ships three executables — gateway,
backend, game (the engine container) — plus the shared pkg/*
libraries. This document defines the layering of tests, the
mandatory minimum coverage per executable, the integration runbook,
and the principles every test must follow.
Layers
- Service tests verify a single executable in isolation. They live next to the implementation as *_test.go files and use only in-process or testcontainers-managed dependencies. The package either runs entirely in process or boots a single Postgres testcontainer per test.
- Inter-service integration tests verify one cross-process seam between two real executables (most often gateway ↔ backend, sometimes backend ↔ game). They live in galaxy/integration/ and drive the platform from outside the trust boundary.
- Full system tests are a small, focused subset of the integration suite that walks an entire user-facing flow from the client edge through every component the flow touches. They live in the same integration/ module and reuse the same fixtures.
Service tests are the cheapest and the broadest; integration tests are slower and narrower; full-system tests are the slowest and the narrowest. The pyramid stays in this order: never replace a service test with a system test.
Global rules
- Every executable owns the service tests for its packages. Adding a new package without _test.go files is a review block.
- Every cross-process seam must have at least one passing inter-service test before the seam is wired in production.
- Async flows (mail outbox, notification routes, runtime workers, push gRPC) get tests for both the success path and the retry / dead-letter path, and a duplicate-event safety check.
- Sync flows get happy path, validation failure, timeout propagation, and dependency unavailable.
- Every external or trusted-internal API must have contract tests alongside behaviour tests. backend/internal/server/contract_test.go is the reference; gateway runs the same shape against gateway/openapi.yaml.
- The integration suite must keep running on a developer machine with Docker available. The only acceptable t.Skip is testenv.RequireDocker (no daemon at all). Any failure deeper than that (tcpostgres.Run, network create, image build, schema migration) fails the test loudly with t.Fatal. The historical bug we fixed (silent skips on reaper failures masking 27 integration tests as "ok") came from treating an environment break as a skip.
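The duplicate-event safety check required for async flows can be pictured with a small stand-alone sketch. It is not taken from the codebase: Event, Inbox, and Handle are hypothetical names, and the in-memory map stands in for whatever deduplication key the real flow persists. The test shape is the point: deliver the same event twice and assert that the side effect happened once.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical async message; ID is the deduplication key.
type Event struct {
	ID      string
	Payload string
}

// Inbox is a hypothetical in-memory consumer that applies each event ID at most once.
type Inbox struct {
	mu      sync.Mutex
	seen    map[string]struct{}
	Applied []string // payloads whose side effect was actually performed
}

// NewInbox returns an empty Inbox ready for use.
func NewInbox() *Inbox {
	return &Inbox{seen: make(map[string]struct{})}
}

// Handle applies the event and reports true, or reports false
// when an event with the same ID was already processed.
func (in *Inbox) Handle(ev Event) bool {
	in.mu.Lock()
	defer in.mu.Unlock()
	if _, dup := in.seen[ev.ID]; dup {
		return false
	}
	in.seen[ev.ID] = struct{}{}
	in.Applied = append(in.Applied, ev.Payload)
	return true
}

func main() {
	in := NewInbox()
	ev := Event{ID: "evt-1", Payload: "welcome mail"}

	first := in.Handle(ev)
	second := in.Handle(ev) // redelivery of the same event
	applied := len(in.Applied)

	fmt.Println("first applied:", first)
	fmt.Println("duplicate applied:", second)
	fmt.Println("side effects:", applied)
}
```

A real test would assert on the persisted rows (outbox entries, notification routes) rather than on an in-memory slice.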
Service-specific coverage
galaxy/gateway
Service tests live under gateway/internal/:
- Public REST routing, error projection, and OpenAPI contract validation.
- Authenticated gRPC envelope verification (grpcapi.Server): signature, payload hash, freshness window, anti-replay reservation, unknown / revoked sessions.
- Session cache (session.BackendCache): the only implementation in the codebase, a thin wrapper around the backendclient.RESTClient per-request lookup.
- Response signing for unary responses and stream events (authn.ResponseSigner).
- Push hub (push.Hub) and push fan-out (push_fanout.go).
- Replay store (replay.RedisStore) reservation semantics.
- Anti-abuse rate limits per IP / session / user / message class.
galaxy/backend
Service tests live under backend/internal/:
- Startup wiring: app.App lifecycle, telemetry runtime, Postgres pool, embedded migrations.
- OpenAPI contract test (internal/server/contract_test.go): validates every documented operation against the live gin engine.
- Domain unit + e2e tests per package (auth, user, admin, lobby, runtime, mail, notification, geo, push). E2E tests (*_e2e_test.go) spin up a Postgres testcontainer.
- Mail outbox: pickup with SELECT FOR UPDATE SKIP LOCKED, retry with backoff plus jitter, dead-letter past MAX_ATTEMPTS, resend semantics (pending | retrying | dead_lettered → re-armed, sent → 409).
- Notification: idempotent Submit, route materialisation, push + email fan-out, OnUserDeleted cascade. Coverage of every catalog kind in buildClientPushEvent lives in internal/notification/events_test.go.
- Lobby: state-machine transitions, RND canonicalisation, sweeper.
- Runtime: per-game mutex serialisation, worker pool, scheduler, reconciler, force-next-turn skip flag.
- Admin: bcrypt cost 12, idempotent bootstrap, write-through cache, 409 Conflict on duplicate username, last-used timestamp.
- Geo: counter increment on every authenticated request, declared-country write at registration, fail-open semantics.
galaxy/game
The engine has its own service tests under game/:
- OpenAPI contract test (game/openapi_contract_test.go).
- Engine lifecycle (init, status, turn, banish, command, order, report) implemented by the engine package suites.
Integration runbook
Entry points
make -C integration preclean # idempotent leftover cleanup
make -C integration integration # preclean + serial test run
make -C integration integration-step # preclean + one-test-at-a-time
integration runs every test in the module sequentially
(-p=1 -parallel=1) — recommended default on a slow / shared
Docker. integration-step runs them one at a time with a fresh
preclean before each test and stops on the first failure; useful to
isolate a flake or build up to a full pass without losing context to
subsequent tests.
Why preclean matters
preclean keys off labels and removes:
- Containers labelled org.testcontainers=true (every container the testcontainers-go library brings up: backend, gateway, game, postgres, redis, mailpit, ryuk).
- Containers labelled galaxy.backend=1: engine instances spawned by backend's runtime adapter directly on the host Docker daemon (see backend/internal/dockerclient/types.go).
- Networks labelled org.testcontainers=true.
- Locally-built images labelled galaxy.test.kind=integration-image: the galaxy/{backend,gateway,game}:integration builds produced by integration/testenv/images.go. Pulled service images (postgres:16-alpine, redis:7-alpine, axllent/mailpit, testcontainers/ryuk) are not touched, so the cache stays warm.
Ryuk reaper
The integration runners disable the testcontainers Ryuk reaper:
export TESTCONTAINERS_RYUK_DISABLED = true
This is environment-driven, not principled — Ryuk does not start
cleanly on the local colima setup we use, and preclean covers the
same job by labels. Re-enable Ryuk by exporting
TESTCONTAINERS_RYUK_DISABLED=false (or unset) before invoking the
make target if you have an environment where Ryuk works.
Cold runs
The first run after a clean checkout (or after preclean) rebuilds
three images: galaxy/backend:integration,
galaxy/gateway:integration, galaxy/game:integration. Cold cost
is ~30 s per image. Subsequent runs reuse the build cache; preclean
removes the tagged images themselves but BuildKit cache mounts
survive, so re-builds are fast.
Integration test coverage
Mandatory inter-service coverage in integration/:
- Gateway ↔ Backend (public auth): auth_flow_test.go covers register + confirm with a mailpit-captured code, declared_country populated, and idempotent re-confirm.
- Gateway ↔ Backend (authenticated user surface): user_account_test.go, user_profile_update_test.go, user_settings_update_test.go cover the signed envelope, FlatBuffers payload, response signature verification, and BCP 47 / IANA validation.
- Gateway ↔ Backend (anti-replay, signature, freshness): gateway_edge_test.go covers body-too-large, bad signature, payload_hash mismatch, stale timestamp, unknown session, and unsupported protocol_version.
- Gateway ↔ Backend (push): notification_flow_test.go, session_revoke_test.go cover push delivery to a SubscribeEvents stream and immediate stream close on revoke.
- Gateway ↔ Backend (anti-replay): anti_replay_test.go covers rejection of a duplicate request_id.
- Backend ↔ Postgres is exercised by every backend e2e test through testcontainers; integration tests do not duplicate it.
- Backend ↔ SMTP: mail_flow_test.go covers the login-code email captured by mailpit, the admin list reaching sent, and resend on sent returning 409.
- Backend ↔ Game engine: runtime_lifecycle_test.go, engine_command_proxy_test.go cover start container, healthz green, command, force-next-turn, finish, and race name promotion.
- Admin surface (REST): admin_flow_test.go, admin_global_games_view_test.go, admin_engine_versions_test.go, admin_user_sanction_test.go cover bootstrap + CRUD, the visibility split between user and admin queries, engine-version registry CRUD, and the permanent block cascade.
- Lobby flow without engine: lobby_flow_test.go covers owner-creates-private-game → open-enrollment → invite → redeem → memberships listing.
- Soft delete cascade: soft_delete_test.go checks that POST /api/v1/user/account/delete cascades through auth/lobby/notification/geo and that gateway rejects subsequent calls.
- Geo counters: geo_counter_increments_test.go checks that multiple authenticated requests with different X-Forwarded-For values increment the user's per-country counter rows.
Full-system flows beyond the inter-service set are intentionally limited; pick scenarios that exercise the longest vertical slice the platform supports today.
Principles
Service tests
- Postgres testcontainers must pin no-op observability providers. Tests that call pgshared.OpenPrimary(ctx, cfg) from galaxy/postgres pass backendpg.NoObservabilityOptions()... so otelsql cannot fall through to the global tracer/meter providers. Without this, an unset OTEL endpoint in the developer environment can stall the test on a background exporter handshake. See backend/internal/postgres/testopts.go for the helper and backend/internal/{auth,user,admin,lobby,mail,notification,runtime,geo,postgres}/ test files for the established call sites.
- A bootstrap failure is fatal, not a skip. A test that needs a testcontainer must fail loudly when the container fails to come up. t.Skipf is reserved for testenv.RequireDocker (no daemon at all); anything past that (tcpostgres.Run, db.Ping, schema migration) uses t.Fatalf.
Integration tests
- Bootstrap is per-test. Each test calls testenv.Bootstrap(t) to spin up a dedicated Postgres, Redis, mailpit, backend, and gateway. Cross-test contamination is impossible.
- Tests do not call t.Parallel. Docker resource pressure makes parallel bootstraps flaky on commodity hardware.
- Anti-abuse limits are loosened by testenv/gateway.go. The bulk-scenario default lifts every gateway rate-limit class (public_auth, identity-bucket per-email, IP/session/user/message-class) to 10 000 req/window with a 1 000 burst. Negative-path edge tests in gateway_edge_test.go tighten specific limits per test to observe the protection firing.
- Image labels are intentional. integration/testenv/images.go stamps every locally-built image with galaxy.test.kind=integration-image; preclean keys off this label. Do not strip it from new image builds added to the test harness.
Test file ownership matrix
| Suite | Where | Boots | Runs how |
|---|---|---|---|
| backend/internal/<pkg>/... unit | per package | one Postgres testcontainer per test | go test ./internal/<pkg>/ |
| backend/push | backend/push/ | nothing | go test ./push/ |
| gateway/internal/<pkg>/... unit | per package | mostly nothing; few use redis tc | go test ./internal/<pkg>/ |
| pkg/transcoder, pkg/postgres unit | per package | nothing / one tc per test | go test ./... from the package |
| integration/ | integration/ | postgres + redis + mailpit + backend + gateway (+ optional game) | make -C integration integration |
Adding a new test
- Decide the layer: service, inter-service, or system. A backend change usually lands as service tests plus an integration test for any new cross-process behaviour.
- Reuse testenv fixtures rather than rolling your own container orchestration.
- Follow the bootstrap-per-test pattern; do not share a global stack across tests.
- Make the test deterministic: explicit timeouts (no time.Sleep), t.Logf instead of fmt.Println, no t.Parallel() in integration/.
- Service test that hits Postgres: copy the startPostgres(t) helper from one of the existing packages (e.g. backend/internal/auth/auth_e2e_test.go) and pass backendpg.NoObservabilityOptions()... to pgshared.OpenPrimary.
- Integration test: add the file under integration/, call testenv.Bootstrap(t), and use the typed clients exposed by testenv rather than reaching for raw HTTP. New scenarios that need bespoke gateway env should pass Extra through BootstrapOptions so the loosened defaults stay shared.
- Any test that brings up its own Docker container (rare, since most go through testenv) must label the container so preclean can find it on the next run.
Day-to-day execution
- Run go test ./<service>/... for the service you are touching; this is fast (Postgres testcontainers add ~3–5 s per package that uses them).
- Run make -C integration integration before opening a PR that touches a cross-process seam. Cold runs build three Docker images (galaxy/backend:integration, galaxy/gateway:integration, galaxy/game:integration); budget ~3 min for the cold path, ~75 s for the warm path.
- Use make -C integration integration-step when a flake or a real regression needs a per-test isolation pass.
- CI runs every layer on every push. Integration tests rely on a reachable Docker daemon; a missing daemon yields a clear skip from testenv.RequireDocker, anything past that is a hard failure.
Out-of-scope (legacy architecture)
The previous nine-service architecture defined components that no
longer exist as distinct services. Their behaviour either lives
inside backend (and is therefore covered by backend service or
integration tests) or has been removed:
- Auth/Session Service, User Service, Notification Service, Mail Service, Game Lobby Service, Runtime Manager, Game Master, Admin Service: consolidated into backend/internal/*. Inter-service seams between these former services are now in-process function calls; they are exercised by backend service tests, not by integration tests.
- Geo Profile Service (suspicious-multi-country detection, review-recommended state, session blocking through geo): not implemented. The geo concern is intentionally minimal (see ARCHITECTURE.md §10) and the test plan does not assert on features we do not ship.
- Billing Service: not implemented; no tests required until it appears.