galaxy-game/gamemaster/docs/stage14-engine-version-registry.md
2026-05-03 07:59:03 +02:00



Stage 14 — Engine version registry service

This decision record captures the non-obvious choices made while implementing the engine_version registry service layer at PLAN Stage 14. The service backs the /api/v1/internal/engine-versions/* REST surface (Stage 19) and the hot-path image_ref resolution that Game Lobby's start flow calls synchronously.

Context

../PLAN.md Stage 14 lists seven service methods: List, Get, Create, Update, Deprecate, Delete, ResolveImageRef. The lifecycle the service drives is frozen by ../README.md §Engine Version Registry. The reference precedent for shape and audit semantics is ../internal/service/registerruntime, which landed at Stage 13.

Five decisions deviate from a literal reading of either Stage 14 or the existing port and migration shapes. Each is recorded below.

Decisions

1. EngineVersionStore.Delete extension

Decision. ports.EngineVersionStore gains a Delete(ctx, version) error method that returns engineversion.ErrNotFound when no row matches. The PostgreSQL-backed adapter engineversionstore.Store.Delete issues a single DELETE FROM engine_versions WHERE version = $1 and distinguishes "missing" from "removed" via RowsAffected. The mock at internal/adapters/mocks/mock_engineversionstore.go is regenerated by make -C gamemaster mocks. Three adapter tests (TestDeleteHappy, TestDeleteNotFound, TestDeleteRejectsEmptyVersion) mirror the pattern from the existing Deprecate tests.

Why. Stage 14 explicitly requires the service to expose a hard Delete distinct from Deprecate. The Stage 11 port surface only carried Deprecate (idempotent soft-mark) and IsReferencedByActiveRuntime (read probe). Three alternatives were considered and rejected:

  • Skip hard delete: omits a Stage 14 deliverable and forces a port delta later. The OpenAPI 409 engine_version_in_use example would also become a dangling spec entry.
  • Reuse Deprecate for both soft and hard semantics: contradicts README §Engine Version Registry ("status values: ... deprecated (rejected on new starts; existing runtimes unaffected)"). A referenced version must remain deprecable so the operator can phase in a successor while existing runtimes finish out — folding the reference check into Deprecate would break that flow.
  • Inline the SQL inside the service: contradicts the per-port abstraction Stage 10 set up; the service must not import the jet table package.

This is the same pattern Stage 13 D1 used for RuntimeRecordStore.Delete: a small, targeted contract delta admitted by the pre-launch single-init policy.
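The RowsAffected distinction described in this decision can be sketched as follows. This is a minimal illustration, not the landed adapter: it uses plain database/sql rather than the generated jet tables, and the execer interface, the fakeDB type, and the error messages are hypothetical names introduced only so the sketch runs without PostgreSQL.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// ErrNotFound stands in for engineversion.ErrNotFound.
var ErrNotFound = errors.New("engine version not found")

// execer is a hypothetical seam satisfied by *sql.DB; it lets the sketch run
// against an in-memory fake instead of PostgreSQL.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// Store is a simplified stand-in for engineversionstore.Store.
type Store struct{ db execer }

// Delete removes one row and maps "no row matched" to ErrNotFound.
func (s *Store) Delete(ctx context.Context, version string) error {
	if version == "" {
		return errors.New("delete engine version: version must not be empty")
	}
	res, err := s.db.ExecContext(ctx, `DELETE FROM engine_versions WHERE version = $1`, version)
	if err != nil {
		return fmt.Errorf("delete engine version: %w", err)
	}
	affected, err := res.RowsAffected()
	if err != nil {
		return fmt.Errorf("delete engine version: rows affected: %w", err)
	}
	if affected == 0 {
		return ErrNotFound // nothing matched: "missing", not "removed"
	}
	return nil
}

// fakeResult reports a fixed affected-row count.
type fakeResult int64

func (r fakeResult) LastInsertId() (int64, error) { return 0, errors.New("not supported") }
func (r fakeResult) RowsAffected() (int64, error) { return int64(r), nil }

// fakeDB keeps version rows in a map and ignores the SQL text.
type fakeDB struct{ rows map[string]bool }

func (f *fakeDB) ExecContext(_ context.Context, _ string, args ...any) (sql.Result, error) {
	version, _ := args[0].(string)
	if !f.rows[version] {
		return fakeResult(0), nil
	}
	delete(f.rows, version)
	return fakeResult(1), nil
}

// demoDelete deletes the same version twice, then tries an empty version.
func demoDelete() (first, second, empty error) {
	s := &Store{db: &fakeDB{rows: map[string]bool{"v1.2.3": true}}}
	ctx := context.Background()
	first = s.Delete(ctx, "v1.2.3")
	second = s.Delete(ctx, "v1.2.3")
	empty = s.Delete(ctx, "")
	return first, second, empty
}

func main() {
	first, second, empty := demoDelete()
	fmt.Println(first, second, empty)
}
```

The three calls in demoDelete correspond loosely to the TestDeleteHappy, TestDeleteNotFound, and TestDeleteRejectsEmptyVersion cases named above.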

2. Hard-delete reference probe runs before adapter Delete

Decision. Service.Delete calls versions.IsReferencedByActiveRuntime first; on a positive result it surfaces ErrInUse without ever calling the adapter Delete. Only when the probe reports zero references does the service issue the SQL DELETE.

Why. Two alternatives were rejected:

  • Single transaction with SELECT ... FOR UPDATE plus DELETE: requires the adapter to expose a transactional sub-interface and forces the service into store-internal locking semantics. The plan is single-instance (README §Non-Goals), so the small race window between probe and delete is acceptable and self-correcting (a late-arriving register-runtime against a deprecated version would fail at runtime_records insert anyway because the version row is gone — the eventual outcome is the same).
  • Probe-after-delete: leaks the DELETE on transient probe failures and surfaces a misleading "deleted" outcome to the caller.

Surfacing engine_version_in_use before any mutation matches the README §Error Model wording and the OpenAPI EngineVersionInUseError example.
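The probe-then-delete ordering can be sketched as below. This is a simplified illustration under stated assumptions: the versionStore interface shows only the two port methods involved, with signatures guessed from the prose; fakeStore is hypothetical; and the operation_log write the real service performs is left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrInUse stands in for the service error mapped to engine_version_in_use.
var ErrInUse = errors.New("engine_version_in_use")

// versionStore is the slice of ports.EngineVersionStore this sketch needs.
// The exact signatures are assumptions.
type versionStore interface {
	IsReferencedByActiveRuntime(ctx context.Context, version string) (bool, error)
	Delete(ctx context.Context, version string) error
}

// Service is a simplified stand-in for the Stage 14 service.
type Service struct{ versions versionStore }

// Delete probes for active references first and only then mutates.
func (s *Service) Delete(ctx context.Context, version string) error {
	referenced, err := s.versions.IsReferencedByActiveRuntime(ctx, version)
	if err != nil {
		// A failed probe leaves the row untouched.
		return fmt.Errorf("probe engine version references: %w", err)
	}
	if referenced {
		return ErrInUse // the adapter Delete is never called
	}
	return s.versions.Delete(ctx, version)
}

// fakeStore records which versions reached the adapter Delete.
type fakeStore struct {
	referenced  map[string]bool
	deleteCalls []string
}

func (f *fakeStore) IsReferencedByActiveRuntime(_ context.Context, v string) (bool, error) {
	return f.referenced[v], nil
}

func (f *fakeStore) Delete(_ context.Context, v string) error {
	f.deleteCalls = append(f.deleteCalls, v)
	return nil
}

// demoServiceDelete deletes one referenced and one unreferenced version.
func demoServiceDelete() (inUse, free error, deleteCalls []string) {
	store := &fakeStore{referenced: map[string]bool{"v1.0.0": true}}
	svc := &Service{versions: store}
	ctx := context.Background()
	inUse = svc.Delete(ctx, "v1.0.0")
	free = svc.Delete(ctx, "v0.9.0")
	return inUse, free, store.deleteCalls
}

func main() {
	inUse, free, calls := demoServiceDelete()
	fmt.Println(inUse, free, calls)
}
```

In this sketch a probe error returns before any mutation, which is the property the rejected probe-after-delete ordering would lose.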

3. engine_version_delete op kind added to schema and domain

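Decision. A new audit value engine_version_delete is added to both the schema (the op_kind CHECK constraint in the 00001_init.sql migration) and the domain (the operation-log op-kind enum).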

The pre-launch single-init policy from ../../ARCHITECTURE.md §Persistence Backends allows editing 00001_init.sql until first production deploy.

Why. Two alternatives were rejected:

  • Reuse engine_version_deprecate for hard delete: semantically weak; audit consumers would have to inspect outcome plus an out-of-band column to tell soft from hard, defeating the audit's signal value.
  • Skip audit for hard delete: inconsistent with every other service-layer mutation (every Stage 13/14 mutation writes operation_log). Forensics on a destructive admin action are exactly where audit matters most.

4. operation_log.game_id column doubles as audit subject

Decision. Engine-version CRUD audit entries store the canonical version string in the OperationEntry.GameID field (and therefore in the operation_log.game_id column). For OpKindEngineVersionCreate the canonical post-ParseSemver form is used (v1.2.3); for OpKindEngineVersionUpdate / Deprecate / Delete the user-supplied version is used so failed lookups still record the attempt verbatim.

Why. Three alternatives were considered and rejected:

  • Make game_id nullable and add a subject_id column: requires a migration delta, a jet regeneration, and a domain field rename. Out of scope for Stage 14 and inconsistent with the minimal-diff principle.
  • Use a sentinel engine_version:<v> prefix: harder to query alongside per-game audit reads; the index operation_log (game_id, started_at DESC) already covers subject-scoped reads, and a sentinel prefix would force callers to strip it.
  • Skip audit for engine-version CRUD: README §Persistence Layout explicitly lists engine_version_create | engine_version_update | engine_version_deprecate as op_kind values; the audit table is the canonical surface.

The decision is recorded both here and in the README §Persistence Layout note so future readers can find the overload rationale.
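The subject-selection rule can be sketched as below. Only OperationEntry.GameID, the OpKind constant names, and the op_kind string values come from the text above; the Kind field, the auditEntry helper, and the sample inputs are hypothetical, and the canonical argument stands in for whatever ParseSemver returns.

```go
package main

import "fmt"

// OpKind mirrors the op_kind values named in this record.
type OpKind string

const (
	OpKindEngineVersionCreate    OpKind = "engine_version_create"
	OpKindEngineVersionUpdate    OpKind = "engine_version_update"
	OpKindEngineVersionDeprecate OpKind = "engine_version_deprecate"
	OpKindEngineVersionDelete    OpKind = "engine_version_delete"
)

// OperationEntry is reduced to the two fields this sketch needs.
// Kind is an assumed field name; GameID is the overloaded subject column.
type OperationEntry struct {
	Kind   OpKind
	GameID string
}

// auditEntry picks the audit subject: the canonical form for create,
// the caller's verbatim input for every other engine-version op kind.
func auditEntry(kind OpKind, supplied, canonical string) OperationEntry {
	subject := supplied
	if kind == OpKindEngineVersionCreate {
		subject = canonical
	}
	return OperationEntry{Kind: kind, GameID: subject}
}

func main() {
	// Hypothetical inputs: the caller typed "1.2.3", canonical form "v1.2.3".
	fmt.Println(auditEntry(OpKindEngineVersionCreate, "1.2.3", "v1.2.3").GameID)
	// A failed lookup still records the attempt exactly as supplied.
	fmt.Println(auditEntry(OpKindEngineVersionDelete, "not-a-version", "").GameID)
}
```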

5. JSON-object validation for Options

Decision. Service.Create and Service.Update validate the Options byte slice as a JSON object before persisting (raw bytes are decoded into map[string]any; non-objects, including arrays and scalars, are rejected with invalid_request). Empty/whitespace-only input passes through as nil; the adapter (Stage 11 D5) already substitutes the schema default '{}'::jsonb.

Why. The engine_versions.options column is jsonb. Persisting an array, scalar, or malformed JSON would either be rejected by the PostgreSQL parser at INSERT time (surfacing as a generic 500) or accepted and break engine-side consumers that expect an object. The service-layer validation surfaces a clear invalid_request early and keeps the contract honest. README §Engine Version Registry already describes options as a "free-form jsonb document" (object implied); the validation makes that wording load-bearing.
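The validation rule can be sketched as follows. The function name normalizeOptions, the ErrInvalidRequest sentinel, and the accepts helper are hypothetical. Rejecting a bare JSON null is an assumption of this sketch: json.Unmarshal accepts null into a map without error, so the explicit nil check is what excludes it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// ErrInvalidRequest stands in for the service's invalid_request error.
var ErrInvalidRequest = errors.New("invalid_request")

// normalizeOptions returns nil for empty or whitespace-only input, the
// original bytes for a JSON object, and ErrInvalidRequest otherwise.
func normalizeOptions(raw []byte) ([]byte, error) {
	if len(bytes.TrimSpace(raw)) == 0 {
		return nil, nil // the adapter substitutes the '{}'::jsonb default
	}
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		// Arrays, scalars, and malformed JSON all fail to decode into a map.
		return nil, fmt.Errorf("%w: options must be a JSON object: %v", ErrInvalidRequest, err)
	}
	if obj == nil {
		// JSON null decodes without error but leaves the map nil.
		return nil, fmt.Errorf("%w: options must be a JSON object, got null", ErrInvalidRequest)
	}
	return raw, nil
}

// accepts reports whether the input passes validation.
func accepts(raw string) bool {
	_, err := normalizeOptions([]byte(raw))
	return err == nil
}

func main() {
	for _, in := range []string{`{"seed": 7}`, `{}`, "  ", `[1, 2]`, `42`, `null`, `{`} {
		fmt.Printf("%q accepted=%v\n", in, accepts(in))
	}
}
```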

Files landed

Verification

cd gamemaster

# Mocks regenerate cleanly with no diff after the port extension is
# committed alongside this stage.
make mocks
git diff --exit-code internal/adapters/mocks

# Domain + port tests still pass (operation log enum membership).
go test ./internal/domain/... ./internal/ports/...

# Adapter test for the new Delete method and the migration's CHECK
# constraint.
go test ./internal/adapters/postgres/engineversionstore/...
go test ./internal/adapters/postgres/operationlog/...

# Service-level tests for the new orchestrator.
go test ./internal/service/engineversion/...

# Stage 13 service tests still pass (the fake gains a stub Delete).
go test ./internal/service/registerruntime/...

# Repo build succeeds at the workspace root.
go build ./...