galaxy-game/gamemaster/docs/stage09-postgres-migration.md
2026-05-03 07:59:03 +02:00


Stage 09 — PostgreSQL schema, migrations, jet

This decision record captures the schema and code-generation pipeline landed for Game Master at PLAN Stage 09. It is a service-local mirror of ../../rtmanager/docs/postgres-migration.md but only documents the decisions specific to Stage 09; the stage-24 postgres-migration.md reorganisation will later subsume and supersede this record.

Context

../PLAN.md Stage 09 finalises the persistence schema and the code-generation pipeline. Stage 08 already opens, instruments, and pings the PostgreSQL pool but does not apply any migrations. The durable surface for runtime state, engine version registry, player mappings, and the audit log is described in ../README.md §Persistence Layout. Stage 09 ships:

  • internal/adapters/postgres/migrations/00001_init.sql plus the matching embed package;
  • cmd/jetgen — a testcontainers-driven regeneration pipeline for the go-jet/v2 query builder code;
  • the generated jet code under internal/adapters/postgres/jet/gamemaster/{model,table}/, committed verbatim;
  • the postgres.RunMigrations call in internal/app/runtime.go, applied after the PostgreSQL pool ping and before any listener is built.

The reference precedent is rtmanager, the most recently landed PG-backed service in the workspace.

Decisions

1. Schema and role provisioning are excluded from 00001_init.sql

Decision. The gamemaster schema and the matching gamemasterservice role are created outside the migration sequence (in tests by ../cmd/jetgen/main.go provisionRoleAndSchema; in production by an ops init script not in scope for this stage). The embedded migration 00001_init.sql only contains DDL for the four service-owned tables and indexes and assumes it runs as the schema owner with search_path=gamemaster.

Why. ../../ARCHITECTURE.md §Database topology mandates that each service connects with its own role whose grants are restricted to its own schema. Mixing role creation, schema creation, and table DDL into one script forces the migration to run as a superuser on every replica boot and effectively relaxes the per-service role boundary. The rtmanager precedent settled on the split first; GM follows it for the same architectural reason. This is a deliberate deviation from PLAN Stage 09's literal CREATE SCHEMA IF NOT EXISTS gamemaster; instruction, called out in the comment header at the top of 00001_init.sql.
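The split can be illustrated with a minimal sketch of the one-off statements a superuser would run outside the migration sequence. The helper names `quoteIdent` and `provisionStatements` are hypothetical and do not reproduce the real provisionRoleAndSchema; password handling and grants are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent double-quotes a PostgreSQL identifier and doubles embedded quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// provisionStatements returns the statements a superuser would run once,
// outside the goose sequence. Hypothetical helper, not the code in cmd/jetgen.
func provisionStatements(role, schema string) []string {
	r, s := quoteIdent(role), quoteIdent(schema)
	return []string{
		// The service role; credentials are assumed to be set elsewhere.
		fmt.Sprintf("CREATE ROLE %s LOGIN", r),
		// The role owns its schema, so table DDL never needs a superuser.
		fmt.Sprintf("CREATE SCHEMA IF NOT EXISTS %s AUTHORIZATION %s", s, r),
		// Sessions of this role resolve unqualified names in its own schema.
		fmt.Sprintf("ALTER ROLE %s SET search_path = %s", r, s),
	}
}

func main() {
	for _, stmt := range provisionStatements("gamemasterservice", "gamemaster") {
		fmt.Println(stmt + ";")
	}
}
```

With provisioning isolated like this, 00001_init.sql can stay limited to table and index DDL and run under the service role.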

2. Natural primary keys mirror the platform identifiers

Decision. Every PK except operation_log.id is a natural identifier already owned by another component:

  • runtime_records.game_id: Lobby's platform identifier;
  • engine_versions.version: semver string from the registry;
  • player_mappings (game_id, user_id): composite, both columns owned by Lobby/User Service;
  • operation_log.id: bigserial, the only synthetic PK because the audit table has no natural identity per row.

Why. The same reasoning as in ../../rtmanager/docs/postgres-migration.md §2 applies: surrogate keys would force every cross-service join through a lookup table, while the natural keys keep the persistence layer pin-compatible with the contracts (every register-runtime envelope already names game_id, every Lobby resolve names version, every player command names user_id).

3. Defense-in-depth CHECK constraints on every status enum

Decision. Five CHECK constraints reproduce the Go-level enums in the schema:

  • runtime_records_status_chk: seven runtime statuses (starting, running, generation_in_progress, generation_failed, stopped, engine_unreachable, finished);
  • engine_versions_status_chk: two version statuses (active, deprecated);
  • operation_log_op_kind_chk: nine operation kinds (register_runtime, turn_generation, force_next_turn, banish, stop, patch, engine_version_create, engine_version_update, engine_version_deprecate);
  • operation_log_op_source_chk: three op sources (gateway_player, lobby_internal, admin_rest);
  • operation_log_outcome_chk: two outcomes (success, failure).

The Go-level enums in the domain layer (added in Stage 10) remain the source of truth for application code.

Why. The same defense-in-depth argument as for rtmanager applies: the storage boundary catches an adapter regression that would otherwise persist an unexpected string. Operator-side queries (SELECT … WHERE op_kind = 'patch') benefit from the enum being verifiable directly in psql without consulting the Go source. PostgreSQL's CREATE TYPE … AS ENUM was rejected because extending a PG enum type requires ALTER TYPE … ADD VALUE, which is restricted inside transactions (on PostgreSQL 12 and later the new value cannot be used until the transaction commits; earlier versions reject the statement inside a transaction block altogether), and that complicates the single-init pre-launch policy (decision §6).
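A minimal sketch of what the Go side of this pairing might look like for the runtime status enum. The type, constant, and method names are assumptions for illustration (the real domain enums arrive in Stage 10); only the seven string values come from runtime_records_status_chk.

```go
package main

import "fmt"

// RuntimeStatus mirrors the values accepted by runtime_records_status_chk.
// Hypothetical type: the actual domain-layer enum is defined in Stage 10.
type RuntimeStatus string

const (
	StatusStarting             RuntimeStatus = "starting"
	StatusRunning              RuntimeStatus = "running"
	StatusGenerationInProgress RuntimeStatus = "generation_in_progress"
	StatusGenerationFailed     RuntimeStatus = "generation_failed"
	StatusStopped              RuntimeStatus = "stopped"
	StatusEngineUnreachable    RuntimeStatus = "engine_unreachable"
	StatusFinished             RuntimeStatus = "finished"
)

// validRuntimeStatuses is the application-side copy of the CHECK list.
var validRuntimeStatuses = map[RuntimeStatus]struct{}{
	StatusStarting:             {},
	StatusRunning:              {},
	StatusGenerationInProgress: {},
	StatusGenerationFailed:     {},
	StatusStopped:              {},
	StatusEngineUnreachable:    {},
	StatusFinished:             {},
}

// Valid reports whether s is one of the seven known statuses.
func (s RuntimeStatus) Valid() bool {
	_, ok := validRuntimeStatuses[s]
	return ok
}

func main() {
	for _, s := range []RuntimeStatus{StatusRunning, "paused"} {
		fmt.Printf("%q valid=%v\n", s, s.Valid())
	}
}
```

If an adapter bug bypassed a check like `Valid`, the CHECK constraint would still reject the row at insert time, which is the second layer the decision relies on.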

4. Indexes derive from concrete query shapes

Decision. Three secondary indexes ship with 00001_init.sql:

  • runtime_records (status, next_generation_at) — drives the scheduler ticker scan (WHERE status='running' AND next_generation_at <= now() once per second);
  • player_mappings (game_id, race_name) UNIQUE — enforces the one-race-per-game invariant at the storage boundary;
  • operation_log (game_id, started_at DESC) — drives audit reads ordered by recency.

The README §Persistence Layout list also mentions player_mappings (game_id), which is intentionally not added: the composite primary key on (game_id, user_id) already serves as a leftmost-prefix index for WHERE game_id = $1, and a one-column duplicate would only double the write cost for no plan-stability gain. The README's indexes list is corrected in the same patch to drop the redundant entry.

Why. Each remaining index has a single concrete read shape behind it. The composite ordering on (status, next_generation_at) lets the planner satisfy the scheduler scan with one index sweep. The descending ordering on (game_id, started_at DESC) matches the ListByGame ORDER BY started_at DESC shape already established by rtmanager.operationlogstore.ListByGame.

5. next_generation_at is nullable

Decision. runtime_records.next_generation_at timestamptz admits NULL; runtime_records.skip_next_tick boolean NOT NULL DEFAULT false does not.

Why. A row enters the table at register-runtime with status='starting' and no scheduled tick yet: the tick is only computed once the engine /admin/init succeeds and the CAS flips the status to running. NULL captures «no tick scheduled» without forcing a sentinel value into the column. The scheduler index (status, next_generation_at) still works correctly: the predicate next_generation_at <= now() evaluates to NULL (unknown) for NULL inputs, and because WHERE keeps only rows for which the predicate is true, PG excludes those rows from the result set, which is the desired behaviour. skip_next_tick is a boolean knob set or cleared by the force-next-turn flow; NULL would be a third state with no semantic meaning, so the column is NOT NULL with a false default.

6. Single-init pre-launch policy applies as documented

Decision. 00001_init.sql evolves in place until first production deploy. Adding a column, an index, or a new table during the pre-launch development window edits this file directly rather than producing 00002_*.sql. The runtime applies the migration on every boot; if the schema is already at head, pkg/postgres's goose adapter exits zero.

Why. The schema-per-service architectural rule (../../ARCHITECTURE.md §Persistence Backends) endorses a single-init policy for pre-launch services. The pre-launch window allows non-additive changes (column rename, type narrowing, CHECK tightening) that a multi-step migration sequence would force into awkward two-step rewrites. Once the service ships to production, the next schema change becomes 00002_*.sql and the policy lifts.
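The «already at head» behaviour can be pictured with a simplified model of version-based migration tracking. This is a sketch under stated assumptions, not goose's implementation: `pending` is a hypothetical function, and versions are plain integers taken from the file-name prefix.

```go
package main

import (
	"fmt"
	"sort"
)

// pending returns the migration versions above current, in ascending order.
// Simplified model: a runner that tracks only the highest applied version.
func pending(available []int64, current int64) []int64 {
	out := make([]int64, 0, len(available))
	for _, v := range available {
		if v > current {
			out = append(out, v)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	files := []int64{1} // 00001_init.sql is the only migration pre-launch

	fmt.Println(len(pending(files, 0))) // fresh database: one migration to apply
	fmt.Println(len(pending(files, 1))) // schema at head: nothing to do
}
```

In this model an in-place edit of an already-applied version is invisible to the runner, so a development database would have to be recreated to pick the edit up.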

7. cmd/jetgen is a one-to-one mirror of rtmanager/cmd/jetgen

Decision. ../cmd/jetgen/main.go follows the same shape as ../../rtmanager/cmd/jetgen/main.go: spin a postgres:16-alpine testcontainer, open it as superuser, provision the role and schema, open a second pool with search_path=gamemaster, apply the embedded goose migrations, then invoke github.com/go-jet/jet/v2/generator/postgres.GenerateDB with schema=gamemaster. Constants differ (gamemasterservice, gamemaster, galaxy_gamemaster) but the algorithm and helper shape are intentionally identical.

Why. Two PG-backed services should not diverge on a dev-only code generator that nothing else in the workspace relies on. Mirroring rtmanager keeps make -C <service> jet interchangeable for operators and minimises the cognitive overhead of moving between services.
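The «second pool with search_path=gamemaster» step can be sketched as a connection URL builder. The function name and parameters are hypothetical; the sketch assumes that galaxy_gamemaster is the database name and that the driver forwards the search_path query parameter to the server as a session setting, which should be confirmed for the driver in use.

```go
package main

import (
	"fmt"
	"net/url"
)

// scopedDSN builds a PostgreSQL connection URL whose search_path parameter
// pins the session to a single schema. Hypothetical helper for illustration.
func scopedDSN(host string, port int, user, password, database, schema string) string {
	q := url.Values{}
	q.Set("search_path", schema)
	q.Set("sslmode", "disable") // assumption: local testcontainer without TLS

	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password),
		Host:     fmt.Sprintf("%s:%d", host, port),
		Path:     "/" + database,
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	fmt.Println(scopedDSN("localhost", 5432,
		"gamemasterservice", "secret", "galaxy_gamemaster", "gamemaster"))
}
```

Opening the second pool under the service role with a pinned search_path means the migrations and the jet generator see the same unqualified table names that the service sees at runtime.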

8. Generated jet code is committed

Decision. The output of make -C gamemaster jet lands under ../internal/adapters/postgres/jet/gamemaster/{model,table}/ and is committed verbatim.

Why. go build ./... from the repository root must work without Docker; CI runners and contributor machines without a local Docker daemon must still pass go test ./gamemaster/... for the non-PG-store parts of the module. The generation pipeline itself remains available behind make jet for everyone who wants to regenerate.

9. Migrations apply synchronously before any listener opens

Decision. ../internal/app/runtime.go calls postgres.RunMigrations(ctx, pgPool, migrations.FS(), ".") immediately after postgres.Ping succeeds and before newWiring/internalhttp.NewServer are constructed. On migration failure the process exits non-zero, following the pkg/postgres policy.

Why. ../README.md §Startup dependencies specifies that «embedded goose migrations apply synchronously before any listener opens». The policy stays operationally cheap because a boot against a schema already at head returns goose's «no work to do» success: a freshly spawned replica finds nothing to apply in 00001_init.sql and proceeds straight to opening its listeners.

Files landed

Verification

  • cd gamemaster && go mod tidy — no missing dependency, no superfluous indirect.
  • make -C gamemaster jet — bring up postgres:16-alpine, apply 00001_init.sql, regenerate internal/adapters/postgres/jet/...; git status is clean after a second run.
  • go build ./gamemaster/... succeeds (including the generated jet code).
  • go test ./gamemaster/... passes — existing contract, freeze, and config/telemetry/HTTP tests are unaffected.
  • Manual smoke against a local PostgreSQL with an empty gamemaster schema and a gamemasterservice role: the process applies the migration, /readyz returns 200, and a second boot exits zero on the «no work to do» path.