---
stage: 14
title: Engine version registry service
---

# Stage 14: Engine version registry service

This decision record captures the non-obvious choices made while
implementing the `engine_version` registry service-layer at PLAN Stage 14.
The service backs the `/api/v1/internal/engine-versions/*` REST surface
(Stage 19) and the hot-path `image_ref` resolve called synchronously by
Game Lobby's start flow.

## Context

[`../PLAN.md` Stage 14](../PLAN.md) lists seven service methods: `List`,
`Get`, `Create`, `Update`, `Deprecate`, `Delete`, `ResolveImageRef`. The
lifecycle the service drives is frozen by
[`../README.md` §Engine Version Registry](../README.md). The reference
precedent for shape and audit semantics is
[`../internal/service/registerruntime`](../internal/service/registerruntime/service.go)
landed at Stage 13.

Five decisions deviate from a literal reading of either Stage 14 or the
existing port and migration shapes. Each is recorded below.

## Decisions

### 1. `EngineVersionStore.Delete` extension

**Decision.** [`ports.EngineVersionStore`](../internal/ports/engineversionstore.go)
gains a `Delete(ctx, version) error` method that returns
`engineversion.ErrNotFound` when no row matches. The PostgreSQL-backed
adapter
[`engineversionstore.Store.Delete`](../internal/adapters/postgres/engineversionstore/store.go)
issues a single `DELETE FROM engine_versions WHERE version = $1` and
distinguishes "missing" from "removed" via `RowsAffected`. The mock at
[`internal/adapters/mocks/mock_engineversionstore.go`](../internal/adapters/mocks/mock_engineversionstore.go)
is regenerated by `make -C gamemaster mocks`. Three adapter tests
(`TestDeleteHappy`, `TestDeleteNotFound`, `TestDeleteRejectsEmptyVersion`)
mirror the pattern from the existing Deprecate tests.

**Why.** Stage 14 explicitly requires the service to expose a hard
`Delete` distinct from `Deprecate`.
The Stage 11 port surface only carried `Deprecate` (idempotent soft-mark)
and `IsReferencedByActiveRuntime` (read probe). Three alternatives were
considered and rejected:

- **Skip hard delete**: omits a Stage 14 deliverable and forces a port
  delta later. The OpenAPI 409 `engine_version_in_use` example would also
  become a dangling spec entry.
- **Reuse `Deprecate` for both soft and hard semantics**: contradicts
  README §Engine Version Registry ("`status` values: ... `deprecated`
  (rejected on new starts; existing runtimes unaffected)"). A referenced
  version must remain deprecable so the operator can phase in a successor
  while existing runtimes finish out; folding the reference check into
  Deprecate would break that flow.
- **Inline the SQL inside the service**: contradicts the per-port
  abstraction Stage 10 set up; the service must not import the jet table
  package.

This is the same pattern Stage 13 D1 used for `RuntimeRecordStore.Delete`:
a small, targeted contract delta admitted by the pre-launch single-init
policy.

### 2. Hard-delete reference probe runs before adapter `Delete`

**Decision.** [`Service.Delete`](../internal/service/engineversion/service.go)
calls `versions.IsReferencedByActiveRuntime` first; on a positive result
it surfaces `ErrInUse` without ever calling the adapter `Delete`. Only
when the probe reports zero references does the service issue the SQL
DELETE.

**Why.** Two alternatives were rejected:

- **Single transaction with `SELECT ... FOR UPDATE` plus DELETE**:
  requires the adapter to expose a transactional sub-interface and forces
  the service into store-internal locking semantics. The plan is
  single-instance (README §Non-Goals), so the small race window between
  probe and delete is acceptable and self-correcting (a late-arriving
  register-runtime against a deprecated version would fail at
  `runtime_records` insert anyway because the version row is gone, so the
  eventual outcome is the same).
- **Probe-after-delete**: leaks the DELETE on transient probe failures and
  surfaces a misleading "deleted" outcome to the caller.

Surfacing `engine_version_in_use` before any mutation matches the README
§Error Model wording and the OpenAPI `EngineVersionInUseError` example.

### 3. `engine_version_delete` op kind added to schema and domain

**Decision.** A new audit value `engine_version_delete` is added to:

- [`domain/operation.OpKind`](../internal/domain/operation/log.go)
  (constant, `IsKnown`, `AllOpKinds`);
- [`migrations/00001_init.sql`](../internal/adapters/postgres/migrations/00001_init.sql)
  (the `operation_log_op_kind_chk` CHECK constraint);
- README §Persistence Layout (the `op_kind` enum listing in the
  `operation_log` description).

The pre-launch single-init policy from
[`../../ARCHITECTURE.md` §Persistence Backends](../../ARCHITECTURE.md)
allows editing `00001_init.sql` until first production deploy.

**Why.** Two alternatives were rejected:

- **Reuse `engine_version_deprecate`** for hard delete: semantically weak;
  audit consumers would have to inspect outcome plus an out-of-band column
  to tell soft from hard, defeating the audit's signal value.
- **Skip audit for hard delete**: inconsistent with every other
  service-layer mutation (every Stage 13/14 mutation writes
  operation_log). Forensics on a destructive admin action are exactly
  where audit matters most.

### 4. `operation_log.game_id` column doubles as audit subject

**Decision.** Engine-version CRUD audit entries store the canonical
`version` string in the `OperationEntry.GameID` field (and therefore in
the `operation_log.game_id` column). For `OpKindEngineVersionCreate` the
canonical post-`ParseSemver` form is used (`v1.2.3`); for
`OpKindEngineVersionUpdate` / `Deprecate` / `Delete` the user-supplied
version is used so failed lookups still record the attempt verbatim.
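The subject rule above can be illustrated with a small sketch. The `OpKind` constant names follow this record, but their string values are assumed to match the `op_kind` enum entries, and `auditSubject`, its parameters, and the fallback for an empty canonical form are hypothetical choices made for the example rather than the service's actual code.

```go
package main

import "fmt"

// OpKind mirrors the audit kinds named in this record. The string values
// are an assumption: they are taken to equal the op_kind enum entries.
type OpKind string

const (
	OpKindEngineVersionCreate    OpKind = "engine_version_create"
	OpKindEngineVersionUpdate    OpKind = "engine_version_update"
	OpKindEngineVersionDeprecate OpKind = "engine_version_deprecate"
	OpKindEngineVersionDelete    OpKind = "engine_version_delete"
)

// auditSubject is a hypothetical helper that picks the string written to
// the GameID field of an audit entry. Create uses the canonical version
// when one is available; every other kind records the caller's input
// verbatim so failed lookups remain visible in the log.
func auditSubject(kind OpKind, supplied, canonical string) string {
	if kind == OpKindEngineVersionCreate && canonical != "" {
		return canonical
	}
	// Assumption: when no canonical form exists (for example, parsing
	// failed), fall back to the supplied string for Create as well.
	return supplied
}

func main() {
	// The supplied strings here are made up for the demonstration.
	fmt.Println(auditSubject(OpKindEngineVersionCreate, " v1.2.3 ", "v1.2.3"))
	fmt.Println(auditSubject(OpKindEngineVersionDelete, "v9.9.9", ""))
}
```

Keeping the choice in one place would make it easier to keep the four op kinds consistent if the overload of `game_id` is ever replaced by a dedicated subject column.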
**Why.** Three alternatives were considered and rejected:

- **Make `game_id` nullable and add a `subject_id` column**: requires a
  migration delta, a jet regeneration, and a domain field rename. Out of
  scope for Stage 14 and inconsistent with the minimal-diff principle.
- **Use a sentinel `engine_version:` prefix**: harder to query alongside
  per-game audit reads; the index `operation_log (game_id, started_at DESC)`
  already covers subject-scoped reads, and a sentinel prefix would force
  callers to strip it.
- **Skip audit for engine-version CRUD**: README §Persistence Layout
  explicitly lists
  `engine_version_create | engine_version_update | engine_version_deprecate`
  as op_kind values; the audit table is the canonical surface.

The decision is recorded both here and in the README §Persistence Layout
note so future readers can find the overload rationale.

### 5. JSON-object validation for `Options`

**Decision.** [`Service.Create`](../internal/service/engineversion/service.go)
and `Service.Update` validate the `Options` byte slice as a JSON object
before persisting (raw bytes are decoded into `map[string]any`;
non-objects, including arrays and scalars, are rejected with
`invalid_request`). Empty/whitespace-only input passes through as nil; the
adapter (Stage 11 D5) already substitutes the schema default `'{}'::jsonb`.

**Why.** The `engine_versions.options` column is `jsonb`. Persisting an
array, scalar, or malformed JSON would either be rejected by the
PostgreSQL parser at INSERT time (surfacing as a generic 500) or accepted
and break engine-side consumers that expect an object. The service-layer
validation surfaces a clear `invalid_request` early and keeps the contract
honest. README §Engine Version Registry already describes `options` as a
"free-form `jsonb` document" (object implied); the validation makes that
wording load-bearing.
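A minimal sketch of the validation described above, using only the standard library. `validateOptions` and `errInvalidRequest` are hypothetical names standing in for the service's own helper and its `invalid_request` error. The explicit rejection of JSON `null` is an assumption: the record only names arrays and scalars, but `null` decodes into a map without error and so needs its own check if it is to be refused.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// errInvalidRequest is a hypothetical stand-in for the service's
// invalid_request error.
var errInvalidRequest = errors.New("invalid_request")

// validateOptions returns nil for empty or whitespace-only input (the
// adapter is expected to substitute the schema default), the input bytes
// unchanged for a JSON object, and errInvalidRequest for anything else.
func validateOptions(raw []byte) ([]byte, error) {
	if len(bytes.TrimSpace(raw)) == 0 {
		return nil, nil
	}
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		// Malformed JSON, arrays, and scalars all fail to decode into a map.
		return nil, fmt.Errorf("%w: options must be a JSON object", errInvalidRequest)
	}
	if obj == nil {
		// Assumption: JSON null decodes to a nil map without error, so it
		// is rejected explicitly here.
		return nil, fmt.Errorf("%w: options must not be null", errInvalidRequest)
	}
	return raw, nil
}

func main() {
	for _, in := range []string{`{"a":1}`, `{}`, `[1,2]`, `42`, `null`, `  `} {
		out, err := validateOptions([]byte(in))
		fmt.Printf("%q -> %q, err=%v\n", in, out, err)
	}
}
```

Decoding into `map[string]any` keeps the check structural: the service never inspects individual keys, which fits the "free-form" wording in the README.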
## Files landed

- [`../internal/ports/engineversionstore.go`](../internal/ports/engineversionstore.go):
  added `Delete` to the interface and the comment block.
- [`../internal/adapters/postgres/engineversionstore/store.go`](../internal/adapters/postgres/engineversionstore/store.go):
  implemented `Delete`.
- [`../internal/adapters/postgres/engineversionstore/store_test.go`](../internal/adapters/postgres/engineversionstore/store_test.go):
  added `TestDeleteHappy`, `TestDeleteNotFound`,
  `TestDeleteRejectsEmptyVersion`.
- [`../internal/adapters/mocks/mock_engineversionstore.go`](../internal/adapters/mocks/mock_engineversionstore.go):
  regenerated.
- [`../internal/adapters/postgres/migrations/00001_init.sql`](../internal/adapters/postgres/migrations/00001_init.sql):
  added `engine_version_delete` to `operation_log_op_kind_chk`.
- [`../internal/domain/operation/log.go`](../internal/domain/operation/log.go)
  with [`log_test.go`](../internal/domain/operation/log_test.go):
  added `OpKindEngineVersionDelete` plus `IsKnown`/`AllOpKinds` membership.
- [`../internal/service/engineversion/service.go`](../internal/service/engineversion/service.go)
  with [`errors.go`](../internal/service/engineversion/errors.go) and
  [`service_test.go`](../internal/service/engineversion/service_test.go):
  new orchestrator package and tests.
- [`../internal/service/registerruntime/service_test.go`](../internal/service/registerruntime/service_test.go):
  `fakeEngineVersions` gains a stub `Delete` to satisfy the extended port.
- [`../README.md`](../README.md): §References pointer to this record;
  §Persistence Layout note that engine-version CRUD audit entries store
  `version` in the `game_id` column and that `engine_version_delete` joins
  the op_kind enum.
- [`../PLAN.md`](../PLAN.md): Stage 14 marked done.

## Verification

```sh
cd gamemaster

# Mocks regenerate cleanly with no diff after the port extension is
# committed alongside this stage.
make mocks
git diff --exit-code internal/adapters/mocks

# Domain + port tests still pass (operation log enum membership).
go test ./internal/domain/... ./internal/ports/...

# Adapter test for the new Delete method and the migration's CHECK
# constraint.
go test ./internal/adapters/postgres/engineversionstore/...
go test ./internal/adapters/postgres/operationlog/...

# Service-level tests for the new orchestrator.
go test ./internal/service/engineversion/...

# Stage 13 service tests still pass (the fake gains a stub Delete).
go test ./internal/service/registerruntime/...

# Repo build succeeds at the workspace root.
go build ./...
```