diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 5711900..b635636 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -417,9 +417,9 @@ It also stores a denormalized runtime snapshot for convenience, at least: * `engine_health_summary`. Additionally, `Game Lobby` aggregates per-member game statistics from -`player_turn_stats` carried on each `runtime_snapshot_update` event: current -and running-max of `planets`, `population`, and `ships_built`. The aggregate -is retained from game start until capability evaluation at `game_finished`. +`player_turn_stats` carried on each `runtime_snapshot_update` event: +current and running-max of `planets` and `population`. The aggregate is +retained from game start until capability evaluation at `game_finished`. This prevents user-facing list/read flows from fan-out requests into `Game Master`. @@ -544,7 +544,7 @@ background worker. `RND.ReleaseAllByUser(user_id)` atomically with membership/application/invite cancellations for the affected user. -## 8. Game Master +## 8. [Game Master](gamemaster/README.md) `Game Master` owns runtime and operational metadata of already running games. @@ -561,6 +561,40 @@ It owns: * engine version registry and version-specific engine options; * runtime mapping `platform user_id -> engine player UUID` for each running game. +### Topology + +`Game Master` runs as a single process in v1. The in-process scheduler is +authoritative; multi-instance with leader election is an explicit future +iteration. Every other service that interacts with `Game Master` +(`Edge Gateway`, `Game Lobby`, `Admin Service`, `Runtime Manager`) treats +GM as a singleton on the trusted network segment. + +### Engine container contract + +`Game Master` is the only platform component that talks to the engine. The +engine container exposes three route classes: + +* admin paths under `/api/v1/admin/*` — `init`, `status`, `turn`, and + `race/banish`. 
They are unauthenticated and reachable only inside the + trusted network segment that connects GM to the engine container; +* player paths under `/api/v1/{command, order, report}` — invoked by GM on + behalf of an authenticated platform user; the actor field on each call + is set by GM from the verified user identity, never from the inbound + payload; +* `GET /healthz` — liveness probe used by `Runtime Manager` and operator + tooling. + +Two engine-side contract elements deserve special mention: + +* `StateResponse.finished:bool` — when `true` on a turn-generation + response, GM transitions the runtime to `finished`, publishes + `game_finished`, and dispatches the finish notification. The conditional + logic that flips the flag lives in the engine's domain code and is not + GM's concern; +* `POST /api/v1/admin/race/banish` with body `{race_name}` — invoked by GM + in response to the Lobby-driven banish flow after a permanent + platform-level membership removal. The engine returns `204` on success. + ### Game Master status model Minimum runtime-level status set: @@ -571,8 +605,12 @@ Minimum runtime-level status set: * `generation_failed` * `stopped` * `engine_unreachable` +* `finished` -`running` here means `running_accepting_commands`. +`running` here means `running_accepting_commands`. `finished` is terminal: +the runtime record stays in this state indefinitely; no further turn +generation, command, or order is accepted, and operator cleanup is the +only path out. ### Game command routing @@ -599,14 +637,25 @@ Private-game owner can use the subset allowed for the owner of that game. ### Turn cutoff and scheduling -`Game Master` is the owner of authoritative platform time for turn cutoff decisions. -Commands arriving exactly on the boundary of a new turn are considered stale and must not reach the engine. 
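The `StateResponse.finished` handling in the engine contract above can be sketched as follows. This is a minimal illustration, not the actual GM code: the Go types, constant names, and function are assumptions; only the flag semantics (engine decides, GM reacts by entering the terminal `finished` state and publishing `game_finished`) come from the contract.

```go
package main

import "fmt"

// StateResponse is a hypothetical mirror of the engine's turn-generation
// response; only the `finished` flag is part of the documented contract.
type StateResponse struct {
	CurrentTurn int  `json:"current_turn"`
	Finished    bool `json:"finished"`
}

// Illustrative status names following the runtime status model.
const (
	statusRunning  = "running"
	statusFinished = "finished"
)

type runtime struct {
	gameID string
	status string
}

// afterTurnGeneration applies the contract rule: when the engine reports
// finished=true, GM moves the runtime to the terminal `finished` state and
// signals that `game_finished` must be published. The condition that flips
// the flag lives in the engine's domain code, not here.
func afterTurnGeneration(rt *runtime, resp StateResponse) (publishGameFinished bool) {
	if !resp.Finished {
		rt.status = statusRunning // back to accepting commands
		return false
	}
	rt.status = statusFinished // terminal: only operator cleanup leaves it
	return true
}

func main() {
	rt := &runtime{gameID: "g1", status: statusRunning}
	fmt.Println(afterTurnGeneration(rt, StateResponse{CurrentTurn: 42, Finished: true}), rt.status)
	// → true finished
}
```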
+The cutoff is enforced by a single status compare-and-swap: every player +command, order, and report read requires `runtime_status=running` at the +moment of the call, and turn generation begins by CAS-ing +`running → generation_in_progress`. There is no separately tracked shadow +window or grace period — the status transition itself is the boundary. +Commands arriving after the CAS are rejected with `runtime_not_running`. -The scheduler is a subsystem inside `Game Master`. -It triggers turn generation according to the game schedule. +The scheduler is a subsystem inside `Game Master`. It triggers turn +generation according to the game schedule. -If a manual “force next turn” is executed, the next scheduled turn slot must be skipped so that players still get at least one full normal schedule interval before the following generated turn. +If a manual `force next turn` is executed, the next scheduled turn slot +must be skipped so that players still get at least one full normal +schedule interval before the following generated turn. The skip is +recorded as `runtime_records.skip_next_tick=true`; the scheduler advances +`next_generation_at` by one extra cron step the next time it computes the +tick and clears the flag. ### Runtime snapshot publishing @@ -615,16 +664,27 @@ consumed by `Game Lobby`. Events include: * `runtime_snapshot_update` — carries the current `current_turn`, `runtime_status`, `engine_health_summary`, and a `player_turn_stats` array - with one entry per active member (`user_id`, `planets`, `population`, - `ships_built`). `Game Lobby` maintains a per-game per-user stats aggregate - from these events for capability evaluation at game finish. + with one entry per active member (`user_id`, `planets`, `population`). + `Game Lobby` maintains a per-game per-user stats aggregate from these + events for capability evaluation at game finish. 
* `game_finished` — carries the final snapshot values and triggers the platform status transition plus Race Name Directory capability evaluation inside `Game Lobby`. -`Game Master` does not retain the aggregate; it only publishes the per-turn -observation. `Game Lobby` is responsible for holding initial values and -running maxima across the lifetime of the game. +Publication cadence is event-driven. GM publishes a snapshot when: + +* a turn was generated (success or failure); +* `runtime_status` transitioned (e.g., + `running ↔ generation_in_progress`, `running → engine_unreachable`, + `* → finished`); +* `engine_health_summary` changed in response to a `runtime:health_events` + observation; consecutive observations with identical summaries are + debounced. + +There is no periodic heartbeat. `Game Master` does not retain the +aggregate; it only publishes the per-turn observation. `Game Lobby` is +responsible for holding initial values and running maxima across the +lifetime of the game. ### Runtime/engine finish flow @@ -847,13 +907,17 @@ requests for no operational benefit. 
* `Gateway -> Admin Service` * `Gateway -> User Service` * `Gateway -> Game Lobby` -* `Gateway -> Game Master` +* `Gateway -> Game Master` for verified player command, order, and report + calls; * `Auth / Session Service -> User Service` * `Auth / Session Service -> Mail Service` * `Geo Profile Service -> Auth / Session Service` * `Geo Profile Service -> User Service` * `Game Lobby -> User Service` -* `Game Lobby -> Game Master` for critical registration/update calls +* `Game Lobby -> Game Master` for `register-runtime` after a successful + container start, engine-version `image-ref` resolve, membership + invalidation hook, banish, and the liveness reply consumed by Lobby's + resume flow; * `Game Master -> Runtime Manager` for inspect, restart, patch, stop, and cleanup REST calls * `Admin Service -> Runtime Manager` for operational inspect, restart, patch, stop, and cleanup REST calls @@ -864,11 +928,15 @@ requests for no operational benefit. * `Lobby -> Runtime Manager` runtime jobs through `runtime:start_jobs` (`{game_id, image_ref, requested_at_ms}`) and `runtime:stop_jobs` (`{game_id, reason, requested_at_ms}`); * `Runtime Manager -> Lobby` job outcomes through `runtime:job_results`; * `Runtime Manager -> Notification Service` admin-only failure intents (image pull, container start, start config) through `notification:intents`; -* `Runtime Manager` outbound technical health stream `runtime:health_events` consumed by `Game Master`; `Game Lobby` and `Admin Service` are reserved as future consumers; +* `Runtime Manager` outbound technical health stream `runtime:health_events` + consumed by `Game Master`; `Game Lobby` and `Admin Service` are reserved + as future consumers; * all event-bus propagation; * `Game Master -> Game Lobby` runtime snapshot updates (including `player_turn_stats` for capability aggregation) and game-finish events - through a dedicated Redis Stream consumed by `Game Lobby`; + through the `gm:lobby_events` Redis Stream consumed by `Game Lobby`, 
+ published event-only with no periodic heartbeat (turn generation, + status transition, or debounced engine-health summary change); * `User Service -> Game Lobby` user lifecycle events (`user.lifecycle.permanent_blocked`, `user.lifecycle.deleted`) through the `user:lifecycle_events` Redis Stream, consumed by `Game Lobby` to cascade @@ -908,6 +976,10 @@ PostgreSQL is the source of truth for table-shaped business state: registry (registered/reservation/pending tiers); * runtime manager runtime records (`game_id -> current_container_id`), per-operation audit log, and latest health snapshot per game; +* game master runtime records (`game_id -> engine_endpoint`, + status/turn/scheduling), the engine version registry (`engine_versions`), + per-game player mappings (`game_id, user_id -> race_name, + engine_player_uuid`), and the GM operation log; * idempotency records, expressed as `UNIQUE` constraints on the durable table — not as a separate kv; * retry scheduling state, expressed as a `next_attempt_at` column on the @@ -931,9 +1003,9 @@ Redis is the source of truth for ephemeral and runtime-coordination state: ### Database topology * Single PostgreSQL database `galaxy`. -* Schema per service: `user`, `mail`, `notification`, `lobby`, `rtmanager`. - Reserved for future use: `geoprofile`. Not allocated unless needed: - `gateway`, `authsession`. +* Schema per service: `user`, `mail`, `notification`, `lobby`, `rtmanager`, + `gamemaster`. Reserved for future use: `geoprofile`. Not allocated unless + needed: `gateway`, `authsession`. * Each service connects with its own PostgreSQL role whose grants are restricted to its own schema (defense-in-depth). * Authentication is username + password only. `sslmode=disable`. No client @@ -1012,7 +1084,8 @@ crossing the SQL boundary carry `time.UTC` as their location. 
### Configuration For each service `` ∈ { `USERSERVICE`, `MAIL`, `NOTIFICATION`, -`LOBBY`, `RTMANAGER`, `GATEWAY`, `AUTHSESSION` }, the Redis connection accepts: +`LOBBY`, `RTMANAGER`, `GAMEMASTER`, `GATEWAY`, `AUTHSESSION` }, the Redis +connection accepts: * `_REDIS_MASTER_ADDR` (required) * `_REDIS_REPLICA_ADDRS` (optional, comma-separated) @@ -1020,7 +1093,7 @@ For each service `` ∈ { `USERSERVICE`, `MAIL`, `NOTIFICATION`, * `_REDIS_DB`, `_REDIS_OPERATION_TIMEOUT` For PG-backed services (`USERSERVICE`, `MAIL`, `NOTIFICATION`, `LOBBY`, -`RTMANAGER`) the Postgres connection accepts: +`RTMANAGER`, `GAMEMASTER`) the Postgres connection accepts: * `_POSTGRES_PRIMARY_DSN` (required; `postgres://:@:5432/galaxy?search_path=&sslmode=disable`) @@ -1384,7 +1457,17 @@ Rules: * upgrade during a running game is allowed only as a patch update within the same major/minor line; * game-engine version management is manual in v1; * each engine version may carry version-specific engine options; -* `Game Master` owns the engine version registry and its internal API. +* `Game Master` owns the engine version registry from v1 — `(version, + image_ref, options, status)` rows live in the `gamemaster` schema and + are managed exclusively through GM's internal REST surface; +* `Game Lobby` resolves `image_ref` synchronously through GM at game start + by calling `GET /api/v1/internal/engine-versions/{version}/image-ref`; + `LOBBY_ENGINE_IMAGE_TEMPLATE` and any Lobby-side template-based + resolution are removed without a backward-compat shim. If GM is + unavailable when Lobby attempts the resolve, the start fails with + `service_unavailable` and `runtime:start_jobs` is never published; +* `Runtime Manager` continues to receive a verbatim `image_ref` from the + start envelope and never resolves engine versions itself. ## Administrative Access Model @@ -1457,7 +1540,7 @@ Recommended order for implementation is: 6. 
**Game Lobby Service** (implemented) Platform game records, membership, invites, applications, approvals, schedules, user-facing lists, pre-start lifecycle. -7. **Runtime Manager** +7. **Runtime Manager** (implemented) Dedicated Docker-control service for container lifecycle (start, stop, restart, semver-patch, cleanup) and inspect/health monitoring through Docker events, periodic inspect, and active HTTP probes. Driven @@ -1466,7 +1549,19 @@ Recommended order for implementation is: `Admin Service` via the trusted internal REST surface. 8. **Game Master** - Running-game orchestration, engine version registry, runtime state, turn scheduler, engine API mediation, operational controls. + Single-instance running-game orchestrator. Owns the runtime state + (`game_id → engine_endpoint`, status, current turn, scheduling, engine + health), the engine version registry consumed synchronously by + `Game Lobby` for `image_ref` resolution, and the platform mapping + `(user_id, race_name, engine_player_uuid)` per running game. Drives + the turn scheduler with the force-next-turn skip rule, mediates every + engine HTTP call (admin paths under `/api/v1/admin/*`, player paths + under `/api/v1/{command, order, report}`), and reacts to + `StateResponse.finished` by transitioning the runtime to `finished` and + publishing `game_finished`. Drives `Runtime Manager` synchronously over + REST for stop, restart, and patch; consumes `runtime:health_events` + from RTM; publishes `gm:lobby_events` (event-only, no heartbeat) and + `notification:intents`. Never opens the Docker SDK. 9. **Admin Service** Admin UI backend that orchestrates trusted APIs of other services. diff --git a/PG_PLAN.md b/PG_PLAN.md deleted file mode 100644 index d5fac13..0000000 --- a/PG_PLAN.md +++ /dev/null @@ -1,920 +0,0 @@ -# PostgreSQL Migration Plan - -This plan has been already implemented and stays here for historical reasons. - -It should NOT be threated as source of truth for service functionality. 
- -## Context - -The Galaxy Game project currently uses Redis as the only persistence backend -across all implemented services (`user`, `mail`, `notification`, `lobby`, -`gateway`, `authsession`). Redis serves both kinds of state: ephemeral and -runtime-coordination state (where it shines — Streams, caches, replay keys, -runtime queues, session caches, leases) and table-shaped business state where -it is a poor fit (durable user accounts, entitlements/sanctions, mail audit -records, notification routes/idempotency, lobby memberships and invites). -Replication and standby for Redis are not configured anywhere. There is no -SQL/migration tooling in the repo at all. - -We migrate to a Redis + PostgreSQL split where each backend owns the data it -serves best. PostgreSQL becomes the source of truth for table-shaped business -state, gives us ACID transactions, mature physical/logical replication, and -backup/restore via `pg_dump` and WAL archiving. Redis remains the source of -truth for streams, pub/sub, caches, leases, replay keys, rate limits, session -caches, runtime queues, and stream consumer offsets. - -The plan migrates only services already implemented and explicitly excludes -`galaxy/game`. It targets steady-state architecture rules first (one -authoritative document, `ARCHITECTURE.md`), then walks each service end to end -— code, tests, service-local README/docs, and integration suites — so that no -intermediate commit leaves docs and code in conflict. - -## Confirmed decisions (with project owner) - -1. **Documentation strategy**: `ARCHITECTURE.md` is updated as the very first - stage with the architecture-wide rules. Each per-service README and per- - service `docs/` change inside that service's own stage, paired with code - and tests. This keeps `ARCHITECTURE.md` ≡ policy, README ≡ current state, - and ensures any commit can be checked out without code/doc divergence. -2. 
**Service scope**: full migration of durable storage to PostgreSQL for - `user`, `mail`, `notification`, `lobby`. Only Redis configuration refactor - (master/replica + mandatory password, drop `TLS_ENABLED` / `USERNAME`) for - `gateway` and `authsession` — these services intentionally stay Redis- - only. `geoprofile` has no implementation; its `PLAN.md` and `README.md` - absorb the new persistence rules so future implementation follows them. -3. **Idempotency and retry-schedule placement**: idempotency records and - retry schedule queues live in PostgreSQL on the same table as the durable - record they protect (`(producer, idempotency_key)` UNIQUE on `records`, - `next_attempt_at` column on `deliveries` / `routes`). One source of truth, - no dual-write hazard between PG and Redis ZSETs. -4. **Stack**: `github.com/jackc/pgx/v5` driver, exposed as `*sql.DB` via - `github.com/jackc/pgx/v5/stdlib`. `github.com/go-jet/jet/v2` for - type-safe query building + code generation, generated against a - testcontainers PostgreSQL instance with migrations applied (Makefile - target per service). `github.com/pressly/goose/v3` library API for - embedded migrations applied at service startup; the `goose` CLI may be - used for local development and rollback investigations but is not in the - service binary path. -5. **Code**: all postgres queries must use pre-generated code with `jet` and - appropriate builders rather than raw SQL queries, unless this usage cannot - achive the goal of businness-scenario due to lack of `go-jet` functionality. - -## Architectural rules (target steady-state) - -These rules land in `ARCHITECTURE.md` in Stage 0 and govern every subsequent -service stage. 
- -### Backend assignment - -PostgreSQL is the source of truth for: - -- Domain entities with table-shaped business state (`accounts`, - `entitlement_records`, `sanction_records`, `limit_records`, - `blocked_emails`, `deliveries`, `attempts`, `dead_letters`, - `malformed_commands`, `notification_records`, `notification_routes`, - `games`, `applications`, `invites`, `memberships`, `race_names`). -- Idempotency records (UNIQUE constraint on the durable table, not a - separate kv). -- Retry scheduling state (`next_attempt_at` column + supporting index on the - durable table). -- Audit history records that must outlive any Redis snapshot. - -Redis is the source of truth for: - -- Redis Streams used as the event bus (`user:domain_events`, - `user:lifecycle_events`, `gm:lobby_events`, `runtime:job_results`, - `notification:intents`, `gateway:client-events`, `mail:delivery_commands`). -- Stream consumer offsets (small runtime coordination state, rebuildable). -- Caches and projections (gateway session cache). -- Replay reservation keys. -- Rate limit counters. -- Runtime coordination locks/leases (e.g. notification `route_leases`). -- Authentication challenge state and active session tokens (TTL-bounded; loss - is recoverable by re-authentication). -- Ephemeral per-game runtime aggregates that are deleted at game finish - (lobby `game_turn_stats`, `gap_activated_at`, capability evaluation - marker). - -### Database topology - -- Single PostgreSQL database `galaxy`. -- Schema-per-service: `user`, `mail`, `notification`, `lobby`. Reserved for - later: `geoprofile`. Not allocated unless needed: `gateway`, `authsession`. -- Per-service PostgreSQL role with grants restricted to its own schema - (defense-in-depth, simple to express in the initial migration). -- Authentication: username + password only. `sslmode=disable`. No client - certificates, no SCRAM channel binding, no custom auth plugins. -- Each service connects to one primary plus zero-or-more read-only replicas. 
- In this iteration only the primary is used; the replica pool is wired but - receives no traffic. Future read-routing is non-breaking. - -### Redis topology - -- Each service connects to one master Redis plus zero-or-more replica Redis - hosts. -- All connections use a mandatory password. `USERNAME`/ACL not used. TLS off. -- In this iteration only the master is used; the replica list is wired but - unused — non-breaking switch later when the app starts routing reads. -- Existing env vars `*_REDIS_TLS_ENABLED`, `*_REDIS_USERNAME` are removed - (hard rename; no backward-compat shim — fresh project, no production - deploys to migrate). - -### Library stack - -- Driver: `github.com/jackc/pgx/v5` (modern, actively maintained), exposed - to `database/sql` via `github.com/jackc/pgx/v5/stdlib` so go-jet's - `qrm.Queryable` interface is satisfied without changes. -- Query layer: `github.com/go-jet/jet/v2` (PostgreSQL dialect). Generated - code lives under each service `internal/adapters/postgres/jet/`, - regenerated via a `make jet` target and committed to the repo. -- Migrations: `github.com/pressly/goose/v3` library API; migration files - embedded via `//go:embed *.sql`; applied at startup, before opening any - HTTP/gRPC listener; non-zero exit on failure. -- Test infrastructure: `github.com/testcontainers/testcontainers-go` plus - the `modules/postgres` submodule; the same setup is reused by `make jet` - to host a transient instance for jet codegen. - -### Migration discipline - -- Forward-only sequence-numbered files: `00001_init.sql`, `00002_*.sql`, … -- Lowercase snake_case names; goose `-- +goose Up` / `-- +goose Down` - markers; statements that need transaction-wrapping use - `-- +goose StatementBegin` / `-- +goose StatementEnd`. -- Migrations apply at service startup; service exits non-zero on failure. -- Per-service decision record at `galaxy//docs/postgres-migration.md` - captures schema decisions and any non-trivial deviation from the rules. 
- -### Per-service code organisation - -```text -galaxy// - internal/ - adapters/ - postgres/ - migrations/ # *.sql files + migrations.go (//go:embed) - jet/ # generated; commit-checked - / # adapter implementations matching internal/ports - config/ - config.go # adds Postgres + new Redis schema - Makefile # `jet` target: testcontainers + goose + jet -``` - -### Test patterns - -- Per-service unit tests against a real PostgreSQL via - `testcontainers-go`; replace the corresponding miniredis test path where - storage moved to PG. -- Shared port-test suites (e.g. `lobby/internal/ports/racenamedirtest/`) - gain a Postgres harness; they remain backend-agnostic in shape. -- `integration/internal/harness/postgres_container.go` is added; integration - suites that need PG declare it next to their existing Redis container. -- Stub adapters (`*stub/`) are kept where the in-memory port is useful for - tests that don't need a real backend. Redis adapters that previously - implemented these ports are removed (no dead code). - -### Configuration env vars (target) - -For each service `` ∈ { `USERSERVICE`, `MAIL`, `NOTIFICATION`, `LOBBY`, -`GATEWAY`, `AUTHSESSION` }: - -- `_REDIS_MASTER_ADDR` (required) -- `_REDIS_REPLICA_ADDRS` (optional, comma-separated; default empty) -- `_REDIS_PASSWORD` (required) -- `_REDIS_DB` (default 0) -- `_REDIS_OPERATION_TIMEOUT` (default 250ms) - -For PG-backed services (`USERSERVICE`, `MAIL`, `NOTIFICATION`, `LOBBY`): - -- `_POSTGRES_PRIMARY_DSN` (required; - e.g. `postgres://userservice:secret@postgres:5432/galaxy?search_path=user&sslmode=disable`) -- `_POSTGRES_REPLICA_DSNS` (optional, comma-separated) -- `_POSTGRES_OPERATION_TIMEOUT` (default 1s) -- `_POSTGRES_MAX_OPEN_CONNS` (default 25) -- `_POSTGRES_MAX_IDLE_CONNS` (default 5) -- `_POSTGRES_CONN_MAX_LIFETIME` (default 30m) - -DSN sets `search_path=` so unqualified table references resolve into -the service-owned schema; `sslmode=disable` is set explicitly per the -"no TLS" requirement. 
- -Service-prefix-specific stream/keyspace env vars (`*_REDIS_DOMAIN_EVENTS_STREAM`, -`*_REDIS_LIFECYCLE_EVENTS_STREAM`, `*_REDIS_KEYSPACE_PREFIX`, -`MAIL_REDIS_COMMAND_STREAM`, etc.) keep their current names and semantics — -they describe stream/key shapes, not connection topology. - ---- - -## Stages - -Each stage is independently executable and shippable. - -### ~~Stage 0~~ — Architecture-wide rules and PG_PLAN.md materialisation - -This stage is implemented. - -**Goal**: land the steady-state rules in `ARCHITECTURE.md` and place -`PG_PLAN.md` at the project root so subsequent `/stage-implementation` -invocations have an authoritative reference. - -**Actions**: - -1. Write the contents of this plan file to `/Users/id/src/go/galaxy/PG_PLAN.md`. -2. Add a new section to `ARCHITECTURE.md` (e.g. `§9 Persistence Backends`) - capturing every rule under the *Architectural rules* heading above: - backend assignment, database/Redis topology, library stack, migration - discipline, code organisation, test patterns, env-var conventions. -3. Add a short *Migration Window* sub-section to `ARCHITECTURE.md` noting - that until all `PG_PLAN.md` stages complete, each service's `README.md` - continues to describe its actual current state — this caveat is removed - in Stage 9. -4. Adjust `ARCHITECTURE.md §8` (publisher rules) so cross-references - distinguish "Redis Stream" (event bus, stays Redis) from "PG-backed - table" (durable record). - -**Files (modified / new)**: - -- `/Users/id/src/go/galaxy/PG_PLAN.md` — new -- `/Users/id/src/go/galaxy/ARCHITECTURE.md` — modified - -**Out of scope**: zero service code, zero per-service README/docs, zero -`go.mod` changes, zero new dependencies in service modules. - -**Verification**: - -- `git diff --stat` reports two paths only: `PG_PLAN.md`, `ARCHITECTURE.md`. -- `ARCHITECTURE.md` reads coherently end to end, with the new section - cross-referenced from §8 and from any other place that today says - "Redis is the v1 backend". 
-- Manual: read `PG_PLAN.md` top to bottom, confirm every architectural - decision matches the section in `ARCHITECTURE.md`. - ---- - -### ~~Stage 1~~ — Shared infrastructure packages (`pkg/postgres`, `pkg/redisconn`) - -This stage is implemented. - -**Goal**: provide one canonical helper each for Postgres and Redis so per- -service stages don't reinvent connection/migration wiring. No service -consumes them yet. - -**Files (new)**: - -- `pkg/postgres/config.go` — `Config` struct (PrimaryDSN, ReplicaDSNs, - OperationTimeout, MaxOpenConns, MaxIdleConns, ConnMaxLifetime); helper - `LoadFromEnv(prefix string) (Config, error)` that reads - `_POSTGRES_*`. -- `pkg/postgres/open.go` — `OpenPrimary(ctx, cfg) (*sql.DB, error)` and - `OpenReplicas(ctx, cfg) ([]*sql.DB, error)` using - `pgx.ConnConfig` → `stdlib.OpenDB(...)`; configures pool sizes and - per-statement context timeout. -- `pkg/postgres/migrate.go` — `RunMigrations(ctx context.Context, db *sql.DB, - fs embed.FS) error` wrapping `goose.SetBaseFS(fs)` + `goose.UpContext`. -- `pkg/postgres/otel.go` — `Instrument(db *sql.DB, telemetry telemetry.Runtime)` - applying `otelsql.RegisterDBStatsMetrics` and statement spans. -- `pkg/postgres/postgres_test.go` — testcontainers-backed smoke test: - open primary, run a one-line migration, insert/select. -- `pkg/redisconn/config.go` — `Config` struct (MasterAddr, ReplicaAddrs, - Password, DB, OperationTimeout); helper `LoadFromEnv(prefix string) - (Config, error)` that reads `_REDIS_*` (the new shape only; - rejects deprecated TLS/USERNAME vars with a clear error). -- `pkg/redisconn/client.go` — `NewMasterClient(cfg) *redis.Client` and - `NewReplicaClients(cfg) []*redis.Client` (latter returns nil/empty when - replicas not configured). -- `pkg/redisconn/otel.go` — `Instrument(client *redis.Client, - telemetry telemetry.Runtime)` applying `redisotel.InstrumentTracing` / - `InstrumentMetrics`. 
-- `pkg/redisconn/redisconn_test.go` — miniredis-backed config and master - client tests. - -**Files (touched)**: - -- `pkg/go.mod` — add `github.com/jackc/pgx/v5`, - `github.com/jackc/pgx/v5/stdlib`, `github.com/pressly/goose/v3`, - `github.com/testcontainers/testcontainers-go/modules/postgres`, - `github.com/XSAM/otelsql` (for db instrumentation; alternative: - `go.nhat.io/otelsql` — pick one in implementation). -- `go.work` — confirm `pkg/` is registered (already is). - -**Verification**: - -- `cd /Users/id/src/go/galaxy/pkg && go test ./postgres/... ./redisconn/...` - passes locally with Docker available. -- `go vet ./...` clean. - ---- - -### ~~Stage 2~~ — Integration test harness extension - -This stage is implemented. - -**Goal**: extend `integration/internal/harness/` with a Postgres container -helper and a service-bootstrap helper that builds the per-service DSN with -the right `search_path`. All existing integration suites stay green. - -**Files (new)**: - -- `integration/internal/harness/postgres_container.go` — - `StartPostgresContainer(t testing.TB) *PostgresRuntime`. The runtime - exposes `BaseDSN()`, `DSNForSchema(schema, role string) string`, and - `EnsureRoleAndSchema(ctx, schema, role, password string) error` so each - test can prepare an isolated schema for the service it is booting. -- `integration/internal/harness/postgres_container_test.go` — smoke test. - -**Files (touched)**: - -- `integration/internal/harness/binary.go` — extend `Process`/launch - helpers with `WithPostgres(rt *PostgresRuntime, schema, role string)` - that injects the right `_POSTGRES_PRIMARY_DSN`. (Existing API already - takes `env map[string]string`; this is a thin wrapper.) -- `integration/go.mod` — add the testcontainers Postgres module. - -**Out of scope**: no integration suite is yet wired to Postgres; each -service stage wires in its suites. - -**Verification**: - -- `cd integration && go test ./internal/harness/...` passes. 
-- `cd integration && go test ./...` still green for all existing suites - (Redis-only services remain Redis-only). - ---- - -### ~~Stage 3~~ — User Service migration (pilot) - -**Goal**: replace User Service's Redis durable storage with PostgreSQL. The -two Redis Streams (`user:domain_events`, `user:lifecycle_events`) remain on -Redis. This stage is the pilot; subsequent service stages copy its shape. - -**Schema (`user` schema)**: - -- `accounts` (user_id PK, email UNIQUE, user_name UNIQUE, display_name, - preferred_language, time_zone, declared_country, created_at, updated_at, - deleted_at). -- `blocked_emails` (email PK, reason_code, blocked_at, actor_type, actor_id, - resolved_user_id). -- `entitlement_records` (record_id PK, user_id FK, plan_code, is_paid, - starts_at, ends_at, source, actor_type, actor_id, reason_code, - updated_at). -- `entitlement_snapshots` (user_id PK FK → accounts, …current effective - values mirroring Redis snapshot shape). -- `sanction_records` (record_id PK, user_id FK, sanction_code, scope, - reason_code, actor_type, actor_id, applied_at, expires_at, removed_at, - removed_by_type, removed_by_id, removed_reason_code). -- `sanction_active` (user_id, sanction_code, record_id) PRIMARY KEY - (user_id, sanction_code). -- `limit_records`, `limit_active` — analogous to sanctions. -- Indexes: `accounts(created_at DESC, user_id DESC)` for newest-first - pagination; `accounts(declared_country)`; - `entitlement_snapshots(plan_code, is_paid)`; - `entitlement_snapshots(ends_at) WHERE is_paid AND ends_at IS NOT NULL`; - `sanction_active(sanction_code)`; `limit_active(limit_code)`. Eligibility - flags become computed predicates on these columns. - -**Files (new)**: - -- `galaxy/user/internal/adapters/postgres/migrations/00001_init.sql` — - full schema with grants (`GRANT USAGE ON SCHEMA user TO userservice; - GRANT … ON ALL TABLES …;`). 
-- `galaxy/user/internal/adapters/postgres/migrations/migrations.go` — - `//go:embed *.sql` and a `Migrations() embed.FS` accessor. -- `galaxy/user/internal/adapters/postgres/jet/...` — generated code - (commit-checked). -- `galaxy/user/internal/adapters/postgres/userstore/store.go` — Postgres - implementation of `ports.UserAccountStore` and `ports.AuthDirectoryStore`. -- `galaxy/user/internal/adapters/postgres/userstore/entitlement_store.go` — - Postgres implementation of `EntitlementSnapshotStore` and - `EntitlementHistoryStore`. -- `galaxy/user/internal/adapters/postgres/userstore/policy_store.go` — - Postgres implementation of `SanctionStore` and `LimitStore`. -- `galaxy/user/internal/adapters/postgres/userstore/list_store.go` — - Postgres implementation of `UserListStore` (pagination + filters - expressed as SQL). -- `galaxy/user/internal/adapters/postgres/userstore/store_test.go` and - siblings — testcontainers-backed unit tests covering the same matrix the - current Redis tests cover. -- `galaxy/user/Makefile` — `jet` target. -- `galaxy/user/docs/postgres-migration.md` — decision record (schema - shape, why we keep `entitlement_snapshots` denormalised, eligibility - expressed as SQL predicates, schema role grants). - -**Files (touched)**: - -- `galaxy/user/internal/config/config.go` — add Postgres config; refactor - Redis config to master/replica/password (drop `TLS_ENABLED`, `USERNAME`). -- `galaxy/user/internal/config/config_test.go` — update to new env shape. -- `galaxy/user/internal/app/runtime.go` — open Postgres pool, run - migrations on startup before listeners open, wire postgres adapters - into services. Redis client now serves only the two stream publishers. -- `galaxy/user/README.md` — replace "Redis-backed user state" with the - new persistence model, update env-var section. -- `galaxy/user/docs/runbook.md`, `galaxy/user/docs/runtime.md`, - `galaxy/user/docs/examples.md` — update storage references and - config sections. 
-- `galaxy/user/go.mod` — add `github.com/jackc/pgx/v5{,/stdlib}`, - `github.com/pressly/goose/v3`, `github.com/go-jet/jet/v2`, - `github.com/testcontainers/testcontainers-go/modules/postgres`. Use - `pkg/postgres`, `pkg/redisconn`. - -**Files (deleted)**: - -- `galaxy/user/internal/adapters/redis/userstore/` — entire directory. -- The portions of `galaxy/user/internal/adapters/redisstate/keyspace.go` - that defined account/entitlement/sanction/limit/index keys (keep only - what `domainevents` and `lifecycleevents` publishers still require — if - none, delete the file outright). - -**Files retained on Redis**: - -- `galaxy/user/internal/adapters/redis/domainevents/publisher.go`. -- `galaxy/user/internal/adapters/redis/lifecycleevents/publisher.go`. - -**Touched integration suites** (each gets a Postgres container in addition -to the existing Redis one): - -- `integration/authsessionuser/` -- `integration/gatewayauthsessionuser/` -- `integration/gatewayauthsessionusermail/` -- `integration/notificationuser/` -- `integration/lobbyuser/` - -**Verification**: - -- `cd galaxy/user && make jet && go test ./...` (Docker needed). -- `cd integration && go test ./authsessionuser/... ./gatewayauthsessionuser/... ./gatewayauthsessionusermail/... ./notificationuser/... ./lobbyuser/...` -- Manual smoke against a `docker-compose` stack (PG + Redis with - passwords) using flows from `galaxy/user/docs/examples.md`. - ---- - -### ~~Stage 4~~ — Mail Service migration - -This stage is implemented. - -**Goal**: move durable mail storage (deliveries, attempts, dead letters, -malformed commands, payloads, idempotency, attempt schedule) into -PostgreSQL. Keep Redis only for the inbound `mail:delivery_commands` -stream and its consumer offset. 
- -**Schema (`mail` schema)**: - -- `deliveries` (delivery_id PK, source, status, recipient_envelope JSONB, - subject, text_body, html_body, payload_mode, template_id, - idempotency_source, idempotency_key, locale_fallback_used, - next_attempt_at, attempt_count, max_attempts, created_at, updated_at). - - INDEX (status, next_attempt_at) for the scheduler. - - UNIQUE (idempotency_source, idempotency_key) — the idempotency record - IS this row (no separate kv). - - INDEX (created_at DESC) for operator listings; INDEX on status, source, - template_id, recipient as needed. -- `attempts` (delivery_id FK, attempt_no, status, provider_summary, - scheduled_for_ms, started_at_ms, completed_at_ms, PRIMARY KEY - (delivery_id, attempt_no)). -- `dead_letters` (delivery_id PK FK, final_attempt_count, max_attempts, - failure_classification, failure_message, created_at_ms). -- `delivery_payloads` (delivery_id PK FK, template_variables JSONB). -- `malformed_commands` (stream_entry_id PK, failure_code, failure_message, - raw_fields JSONB, recorded_at_ms; INDEX created_at). - -**Files**: mirror Stage 3 (postgres adapter package, migrations, jet -codegen, Makefile, decision record, removal of corresponding -`internal/adapters/redisstate/*` files for migrated entities, retention -of stream offset and consumer wiring on Redis). - -**Worker change**: the mail attempt scheduler loop replaces -`ZRANGEBYSCORE` over `mail:attempt_schedule` with -`SELECT … FROM deliveries WHERE status IN ('queued','retry_pending') AND next_attempt_at <= now() ORDER BY next_attempt_at LIMIT N FOR UPDATE SKIP LOCKED`. 
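The claim step this replaces can be sketched as a single statement. This is a sketch, not the final adapter contract: the batch size, the transient `in_flight` status value, and the `RETURNING` shape are illustrative assumptions; the table, column names, and `FOR UPDATE SKIP LOCKED` pattern come from the plan above.

```sql
-- Claim a batch of due deliveries. Concurrent scheduler workers skip
-- rows already locked by a sibling instead of blocking on them.
WITH due AS (
    SELECT delivery_id
    FROM mail.deliveries
    WHERE status IN ('queued', 'retry_pending')
      AND next_attempt_at <= now()
    ORDER BY next_attempt_at
    LIMIT 50                       -- batch size N; illustrative
    FOR UPDATE SKIP LOCKED
)
UPDATE mail.deliveries AS d
SET status     = 'in_flight',     -- transient status name is assumed
    updated_at = now()
FROM due
WHERE d.delivery_id = due.delivery_id
RETURNING d.delivery_id;
```

Because the row locks taken inside the CTE are held until commit, a crashed worker simply releases its claim on rollback — no separate lease key is needed on the Mail side.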
- -**Files (deleted)**: - -- `galaxy/mail/internal/adapters/redisstate/auth_acceptance_store.go` -- `galaxy/mail/internal/adapters/redisstate/generic_acceptance_store.go` -- `galaxy/mail/internal/adapters/redisstate/attempt_execution_store.go` -- `galaxy/mail/internal/adapters/redisstate/operator_store.go` -- `galaxy/mail/internal/adapters/redisstate/malformed_command_store.go` -- `galaxy/mail/internal/adapters/redisstate/render_store.go` -- The portions of `galaxy/mail/internal/adapters/redisstate/keyspace.go` - no longer used (`mail:attempt_schedule`, `mail:idempotency:*`, all - delivery/attempt/dead-letter/index keys). - -**Files retained on Redis**: - -- `galaxy/mail/internal/adapters/redisstate/stream_offset_store.go` (offset - for `mail:delivery_commands` consumer). -- The command stream consumer wiring itself. - -**Touched integration suites**: - -- `integration/authsessionmail/` -- `integration/gatewayauthsessionmail/` -- `integration/gatewayauthsessionusermail/` -- `integration/notificationmail/` - -**Verification**: per Stage 3 pattern; plus end-to-end smoke that pushes -a delivery through retry_pending → provider_accepted using the SMTP stub. - ---- - -### ~~Stage 5~~ — Notification Service migration - -This stage is implemented. - -**Goal**: move durable notification storage (records, routes, idempotency, -dead letters, malformed intents) into PostgreSQL. Keep Redis for the -inbound `notification:intents` stream, the outbound `gateway:client-events` -stream, the outbound `mail:delivery_commands` stream, the corresponding -stream offsets, and the short-lived per-route lease (`route_leases:*`). - -**Schema (`notification` schema)**: - -- `records` (notification_id PK, notification_type, producer, audience_kind, - recipient_user_ids JSONB, payload JSONB, idempotency_key, - request_fingerprint, request_id, trace_id, occurred_at_ms, - accepted_at_ms, updated_at_ms). - - UNIQUE (producer, idempotency_key) — idempotency record IS this row. 
-- `routes` (notification_id, route_id, channel, recipient_ref, status, - attempt_count, max_attempts, next_attempt_at_ms, resolved_email, - resolved_locale, last_error_classification, last_error_message, - last_error_at_ms, created_at_ms, updated_at_ms, published_at_ms, - dead_lettered_at_ms, skipped_at_ms, PRIMARY KEY - (notification_id, route_id)). - - INDEX (status, next_attempt_at_ms) for the scheduler. -- `dead_letters` (notification_id, route_id PK FK, channel, recipient_ref, - final_attempt_count, max_attempts, failure_classification, - failure_message, recovery_hint, created_at_ms). -- `malformed_intents` (stream_entry_id PK, notification_type, producer, - idempotency_key, failure_code, failure_message, raw_fields JSONB, - recorded_at_ms). - -**Worker change**: route publisher selects work via the same -`FOR UPDATE SKIP LOCKED` pattern as Mail. The Redis lease is still used -as a short-lived, per-process exclusivity hint atop the SQL claim. - -**Files (deleted)**: - -- `galaxy/notification/internal/adapters/redisstate/acceptance_store.go` -- `galaxy/notification/internal/adapters/redisstate/route_state_store.go` -- `galaxy/notification/internal/adapters/redisstate/malformed_intent_store.go` -- The portions of - `galaxy/notification/internal/adapters/redisstate/keyspace.go` no longer - used (records, routes, idempotency, dead_letters, malformed_intents). - -**Files retained on Redis**: - -- `galaxy/notification/internal/adapters/redisstate/stream_offset_store.go`. -- Route lease key generator (still under `redisstate/`, narrowed to leases - only). -- All stream consumer/publisher wiring. - -**Touched integration suites**: - -- `integration/notificationgateway/` -- `integration/notificationmail/` -- `integration/notificationuser/` - ---- - -### ~~Stage 6A~~ — Lobby Service: core enrollment entities - -**Goal**: move `Game`, `Application`, `Invite`, `Membership` records and -their indexes into PostgreSQL. 
RaceNameDirectory, GameTurnStats, -GapActivation, EvaluationGuard, StreamOffset remain on Redis until later -sub-stages. - -**Schema (`lobby` schema, partial)**: - -- `games` (game_id PK, owner_id, kind ('public'|'private'), status, - created_at, updated_at, runtime_snapshot JSONB, runtime_binding JSONB, - …other denormalised game settings). - - INDEX (status, created_at). - - INDEX (owner_id) WHERE kind = 'private'. -- `applications` (application_id PK, game_id FK, user_id, status, - canonical_key, submitted_at, decided_at). - - PARTIAL UNIQUE INDEX (user_id, game_id) WHERE status = 'active' — - enforces the single-active constraint at the DB level (replaces - `lobby:user_game_application:*:*`). - - INDEX (game_id), INDEX (user_id). -- `invites` (invite_id PK, game_id FK, inviter_id, invitee_id, race_name, - status, created_at, expires_at, decided_at). - - INDEX (game_id), INDEX (invitee_id), INDEX (inviter_id). - - INDEX (status, expires_at) for any expiration scanner if needed. -- `memberships` (membership_id PK, game_id FK, user_id, status, joined_at, - canonical_key, …). - - INDEX (game_id), INDEX (user_id). - -**Files (new)**: - -- `galaxy/lobby/internal/adapters/postgres/migrations/00001_core_entities.sql`. -- `galaxy/lobby/internal/adapters/postgres/migrations/migrations.go`. -- `galaxy/lobby/internal/adapters/postgres/jet/...`. -- `galaxy/lobby/internal/adapters/postgres/gamestore/store.go`. -- `galaxy/lobby/internal/adapters/postgres/applicationstore/store.go`. -- `galaxy/lobby/internal/adapters/postgres/invitestore/store.go`. -- `galaxy/lobby/internal/adapters/postgres/membershipstore/store.go`. -- Test files for each store using the existing test patterns. -- `galaxy/lobby/Makefile` (`jet` target). -- `galaxy/lobby/docs/postgres-migration.md` (decision record covering - this sub-stage and what is intentionally left for 6B/6C). 
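The single-active constraint from the 6A schema above can be sketched as a partial unique index. Column types here are assumptions (the plan does not pin them down); the index predicate is the contract:

```sql
CREATE TABLE lobby.applications (
    application_id uuid PRIMARY KEY,
    game_id        uuid NOT NULL REFERENCES lobby.games (game_id),
    user_id        uuid NOT NULL,
    status         text NOT NULL,
    canonical_key  text NOT NULL,
    submitted_at   timestamptz NOT NULL DEFAULT now(),
    decided_at     timestamptz
);

-- Replaces lobby:user_game_application:*:* — a second active
-- application for the same (user, game) now fails with a
-- unique-violation error instead of relying on a Lua check-and-set.
CREATE UNIQUE INDEX applications_single_active
    ON lobby.applications (user_id, game_id)
    WHERE status = 'active';
```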
- -**Files (touched)**: - -- `galaxy/lobby/internal/config/config.go` — add Postgres config; refactor - Redis config to the new shape. -- `galaxy/lobby/internal/app/runtime.go` — open Postgres pool, run - migrations on startup, wire core PG-backed stores into services. - RaceNameDirectory and stats/guard stores still wired to Redis until 6B/6C. -- `galaxy/lobby/README.md` and `galaxy/lobby/docs/runbook.md` — updated - to describe core entities on PG, RND/stats still on Redis until 6B/6C. - -**Files (deleted)**: - -- `galaxy/lobby/internal/adapters/redisstate/gamestore.go`, - `applicationstore.go`, `invitestore.go`, `membershipstore.go`. -- The corresponding sections of `redisstate/keyspace.go`. - -**Stub adapters retained**: `gamestub/`, `applicationstub/`, `invitestub/`, -`membershipstub/` stay — they are pure in-memory ports useful for tests -that don't need real PG. - -**Touched integration suites**: - -- `integration/lobbyuser/` -- `integration/lobbynotification/` - -**Verification**: per Stage 3 pattern; plus the existing lobby HTTP -contract tests against the public/internal ports. - ---- - -### ~~Stage 6B~~ — Lobby Service: RaceNameDirectory - -This stage is implemented. - -**Goal**: replace the Lua-backed Redis `RaceNameDirectory` with a PG -implementation that preserves the two-tier model (registered / reservation / -pending_registration) and atomic registration semantics via SQL -transactions and (where required) advisory locks. - -**Schema (additions to `lobby` schema)**: - -- `race_names` (canonical_key PK, holder_user_id, binding_kind ('registered' - | 'reserved' | 'pending_registration'), source_game_id, eligible_until_ms, - registered_at_ms, reserved_at_ms). - - INDEX (holder_user_id) for `ListRegistered`/`ListReservations`/ - `ListPendingRegistrations` queries. - - PARTIAL INDEX (eligible_until_ms) WHERE binding_kind = - 'pending_registration' for the expiration scanner. 
-  - The confusable-pair policy is enforced at write time inside
-    `BEGIN … COMMIT` transactions; `Reserve`/`Register`/
-    `MarkPendingRegistration` use `SELECT … FOR UPDATE` on the canonical
-    keys involved (or PG advisory locks keyed by `hashtext(canonical_key)`)
-    to serialise concurrent attempts.
-
-**Files (new)**:
-
-- `galaxy/lobby/internal/adapters/postgres/migrations/00002_race_names.sql`.
-- `galaxy/lobby/internal/adapters/postgres/racenamedir/directory.go` —
-  Postgres implementation of `ports.RaceNameDirectory`.
-- `galaxy/lobby/internal/adapters/postgres/racenamedir/directory_test.go`
-  — runs the existing shared suite at
-  `galaxy/lobby/internal/ports/racenamedirtest/suite.go`.
-
-**Files (touched)**:
-
-- `galaxy/lobby/internal/app/runtime.go` — wire PG RND.
-- `galaxy/lobby/internal/ports/racenamedirtest/suite.go` — only
-  shape-preserving updates if the suite assumed Redis-only behaviour
-  (e.g. SCAN-based list ordering).
-- `galaxy/lobby/README.md`, `galaxy/lobby/docs/runbook.md` — RND now PG-
-  backed; canonical_lookup cache no longer needed (PG indexed lookup is
-  fast enough; remove the Redis cache key from `redisstate/keyspace.go`).
-
-**Files (deleted)**:
-
-- `galaxy/lobby/internal/adapters/redisstate/racenamedir.go` and the
-  embedded Lua scripts.
-
-**Files (retained)**:
-
-- `galaxy/lobby/internal/adapters/racenamestub/` stays — useful for unit
-  tests that don't need PG.
-
-**Worker change**: the pending-registration expiration worker switches
-from `ZRANGEBYSCORE` on `lobby:race_names:pending_index` to
-`SELECT … FROM race_names WHERE binding_kind='pending_registration' AND eligible_until_ms <= now()`.
-
-**Verification**: shared port suite (`racenamedirtest`) green against PG
-adapter; lobby unit tests green; `integration/lobbyuser/`,
-`integration/lobbynotification/` green.
-
----
-
-### ~~Stage 6C~~ — Lobby Service: workers, ephemeral stores, cleanup
-
-This stage is implemented.
-
-**Goal**: finish the lobby migration.
Confirm what stays Redis-only, -update workers that touch both backends, drop dead Redis adapters. - -**Stays on Redis (per architectural rules)**: - -- `GameTurnStatsStore` — ephemeral per-game aggregate, deleted at game - finish, rebuildable from GM events. -- `EvaluationGuardStore` — ephemeral marker. -- `GapActivationStore` — short-lived gap-window timestamp cache. -- `StreamOffsetStore` — runtime coordination per the architectural rule. -- All stream consumers and publishers (`gm:lobby_events`, - `runtime:job_results`, `user:lifecycle_events`, `notification:intents`). - -This is documented in `galaxy/lobby/docs/postgres-migration.md`. - -**Files (touched)**: - -- `galaxy/lobby/internal/worker/gmevents/consumer.go` — write durable - updates via PG-backed `GameStore`. -- `galaxy/lobby/internal/worker/runtimejobresult/consumer.go` — same. -- `galaxy/lobby/internal/adapters/userlifecycle/consumer.go` (and the - worker that drives it) — RND release, membership/application/invite - cascade all flow through PG. -- `galaxy/lobby/internal/worker/pendingregistration/worker.go` — PG-based - scan, no Redis ZSET. -- `galaxy/lobby/internal/worker/enrollmentautomation/worker.go` — uses PG - `GameStore.GetByStatus("enrollment_open")`. -- `galaxy/lobby/internal/adapters/redisstate/keyspace.go` — pruned to the - remaining Redis keys (turn stats, gap activation, evaluation guard, - stream offsets, lifecycle stream consumer state). -- `galaxy/lobby/README.md`, `galaxy/lobby/docs/runtime.md`, - `galaxy/lobby/docs/runbook.md`, `galaxy/lobby/docs/examples.md` — - finalised storage descriptions. - -**Files (deleted)**: - -- Anything left in `galaxy/lobby/internal/adapters/redisstate/` whose - only consumer was a port now PG-backed (see 6A/6B deletions). - -**Verification**: - -- All previously-green lobby unit tests pass with PG-backed adapters. -- `integration/lobbyuser/`, `integration/lobbynotification/` pass. 
-- `grep -rn "redisstate" galaxy/lobby/internal/` returns only the keys - intentionally retained on Redis. - ---- - -### ~~Stage 7~~ — Gateway and Auth/Session: Redis configuration refactor - -This stage is implemented. - -**Goal**: apply the new Redis configuration shape (master/replica/password, -drop TLS/USERNAME) to Gateway and Auth/Session. No PG migration; these -services intentionally stay Redis-only. - -**Files (touched)**: - -- `galaxy/gateway/internal/config/config.go` — switch `RedisConfig` - fields to the `pkg/redisconn.Config` shape; update the three - prefixes: `GATEWAY_SESSION_CACHE_REDIS_*`, `GATEWAY_REPLAY_REDIS_*`, - `GATEWAY_SESSION_EVENTS_REDIS_*`. Drop `TLS_ENABLED`, `USERNAME`. -- `galaxy/gateway/internal/session/redis.go`, - `galaxy/gateway/internal/replay/redis.go`, - `galaxy/gateway/internal/events/subscriber.go` — adopt new client - constructor via `pkg/redisconn`. -- `galaxy/gateway/internal/config/config_test.go`, - `galaxy/gateway/internal/session/redis_test.go`, - `galaxy/gateway/internal/replay/redis_test.go` — updated to new env shape. -- `galaxy/authsession/internal/config/config.go` — same pattern; drop - TLS, USERNAME. -- `galaxy/authsession/internal/adapters/redis/sessionstore/store.go`, - `challengestore/store.go`, `projectionpublisher/publisher.go`, - `sendemailcodeabuse/protector.go`, `configprovider/store.go` — adopt - new client. -- `galaxy/authsession/internal/config/config_test.go` — updated. -- `galaxy/gateway/README.md`, `galaxy/authsession/README.md`, - `galaxy/gateway/docs/runbook.md`, `galaxy/authsession/docs/runbook.md` - — note that Redis-only is intentional and reference the `ARCHITECTURE.md` - rule on TTL-bounded auth state. - -**No deletions of business logic**; only env-var refactor and adapter -plumbing through `pkg/redisconn`. 
-
-**Touched integration suites**:
-
-- `integration/gatewayauthsession/`
-- `integration/authsession/`
-- (every suite that boots gateway or authsession picks up the new env vars
-  via the harness; confirm none still pass `*_REDIS_TLS_ENABLED`).
-
-**Verification**:
-
-- `cd galaxy/gateway && go test ./...`
-- `cd galaxy/authsession && go test ./...`
-- `cd integration && go test ./gatewayauthsession/... ./authsession/...`
-
----
-
-### ~~Stage 8~~ — GeoProfile: documentation only
-
-**Goal**: ensure the GeoProfile plan and README reflect the new
-persistence rules so its future implementation follows them. No code
-exists yet.
-
-**Files (touched)**:
-
-- `galaxy/geoprofile/PLAN.md` — add a stage referencing `pkg/postgres`
-  and `pkg/redisconn`; specify that observed-country aggregates,
-  declared_country history and review records will live in a `geoprofile`
-  schema, while ephemeral per-session signals (if any) stay on Redis.
-- `galaxy/geoprofile/README.md` — note ownership of the `geoprofile`
-  schema and the stack choices.
-
-**No code change**.
-
----
-
-### ~~Stage 9~~ — Final sweep
-
-**Goal**: confirm no dead Redis adapter code, no orphaned stub, no
-broken doc reference. Remove the *Migration Window* caveat from
-`ARCHITECTURE.md` once all stages are done.
-
-**Activities**:
-
-- Walk every PG-backed service: `grep -rn "redis" galaxy/*/internal/adapters/`
-  and verify every match belongs to a still-active stream/cache/runtime
-  use case.
-- Walk integration suites: confirm each one provisions only the
-  containers it actually needs; no stale env vars.
-- Update `ARCHITECTURE.md` to drop the *Migration Window* sub-section.
-- Combine sequences of migration `.sql` files into a single first file,
-  rewriting the SQL rather than just concatenating. The project is still
-  in development, so all schema updates can go directly into the single
-  first migration file. This should be represented in `ARCHITECTURE.md`
-  as well.
-- One round of `go test ./...` in every module plus
-  `cd integration && go test ./...`.
-
-**Verification**:
-
-- All tests pass in every module.
-- No file matches `// TODO.*postgres` or `// TODO.*migrate`.
-- `git grep -n -e REDIS_TLS_ENABLED -e REDIS_USERNAME` returns nothing
-  under `galaxy/` (these env vars are fully retired).
-
----
-
-## Verification strategy (whole project)
-
-After each stage:
-
-- `cd /Users/id/src/go/galaxy/pkg && go test ./...`
-- `cd /Users/id/src/go/galaxy/ && go test ./...`
-  (with Docker available for testcontainers).
-- `cd /Users/id/src/go/galaxy/integration && go test ./...`
-- Manual smoke against a `docker-compose` stack (PG + Redis, both with
-  passwords) using the example flows in each service's `docs/examples.md`.
-
-After Stage 9:
-
-- `cd /Users/id/src/go/galaxy/integration && go test ./...` end to end
-  against real PG + real Redis.
-- Confirm `git grep -nE 'REDIS_(TLS_ENABLED|USERNAME)'` returns nothing
-  under `galaxy/`.
-- Confirm `git grep -n 'TODO.*(postgres|migrate)'` returns nothing.
-
-## Out of scope
-
-- `galaxy/game` — explicitly excluded by the project owner.
-- Production deployment manifests (Helm/k8s) — local `docker-compose` is
-  enough for development.
-- Backup/restore tooling configuration — `pg_dump` and WAL archiving are
-  available out of the box; operational setup is not part of this plan.
-- Sentinel/Cluster Redis topology code paths — config exposes replica
-  addresses for future use; no failover routing implemented yet.
-- Read-traffic routing to PG replicas — config exposes
-  `*_POSTGRES_REPLICA_DSNS` for future use; no routing implemented yet.
-- `golangci-lint` config addition — not part of this migration.
-- CI pipeline — no `.github/workflows/` exists; not added by this plan.
-
-## Risks and notes
-
-- **`go-jet` codegen requires a live database**.
The `make jet` target - per service uses `testcontainers-go` to bring up a transient PG, applies - the same goose migrations the service applies at startup, then runs - `jet -dsn=… -path=internal/adapters/postgres/jet`. Generated code is - committed; consumers don't need Docker just to build. -- **Schema-per-service vs single-DB cross-service joins**: there are no - cross-schema joins in this plan. Each service reads only its own schema; - cross-service data flows go via Redis Streams (event bus) or HTTP - contracts (User Service is queried by Lobby for eligibility) — same as - today. The DB-level role grants enforce this. -- **Pending registration expiration worker**: under Redis it scanned a - global ZSET; under PG it does an indexed scan. The partial index on - `eligible_until_ms WHERE binding_kind='pending_registration'` keeps the - scan cheap. -- **Idempotency under crash**: with idempotency expressed as a UNIQUE - constraint on the durable record, recovery is "the row either exists or - it doesn't" — no Redis-loss window where duplicates can sneak through. -- **lib/pq vs pgx (revisit)**: confirmed pgx/v5 + jet via stdlib adapter. - The `make jet` target will pass `-source=postgres` to jet (the dialect - is independent of which Go driver runs the queries at runtime). -- **No backward-compat shim for env vars**: `*_REDIS_TLS_ENABLED` and - `*_REDIS_USERNAME` are retired in one cut. Any external dev environment - that sets these will start failing fast at startup with a clear error - emitted by `pkg/redisconn.LoadFromEnv`. 
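The "row either exists or it doesn't" recovery property from the idempotency note above reduces to a single statement. A sketch using the Mail schema from Stage 4; the column list is trimmed for illustration:

```sql
-- Accept a delivery command. A replayed command hits the
-- UNIQUE (idempotency_source, idempotency_key) constraint, inserts
-- nothing, and RETURNING yields no row — signalling a duplicate.
INSERT INTO mail.deliveries
    (delivery_id, source, status, idempotency_source, idempotency_key)
VALUES
    ($1, $2, 'queued', $3, $4)
ON CONFLICT (idempotency_source, idempotency_key) DO NOTHING
RETURNING delivery_id;
```

When no row comes back, the consumer re-reads the existing delivery by `(idempotency_source, idempotency_key)` and acknowledges the stream entry as already processed.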
diff --git a/authsession/go.mod b/authsession/go.mod index f09789c..9de2d63 100644 --- a/authsession/go.mod +++ b/authsession/go.mod @@ -58,7 +58,7 @@ require ( github.com/modern-go/reflect2 v1.0.2 // indirect github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect github.com/oasdiff/yaml v0.0.9 // indirect - github.com/oasdiff/yaml3 v0.0.9 // indirect + github.com/oasdiff/yaml3 v0.0.12 // indirect github.com/pelletier/go-toml/v2 v2.3.0 // indirect github.com/perimeterx/marshmallow v1.1.5 // indirect github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect diff --git a/authsession/go.sum b/authsession/go.sum index b69b933..88b875d 100644 --- a/authsession/go.sum +++ b/authsession/go.sum @@ -87,8 +87,7 @@ github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 h1:RWengNIwukTxcDr9 github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826/go.mod h1:TaXosZuwdSHYgviHp1DAtfrULt5eUgsSMsZf+YrPgl8= github.com/oasdiff/yaml v0.0.9 h1:zQOvd2UKoozsSsAknnWoDJlSK4lC0mpmjfDsfqNwX48= github.com/oasdiff/yaml v0.0.9/go.mod h1:8lvhgJG4xiKPj3HN5lDow4jZHPlx1i7dIwzkdAo6oAM= -github.com/oasdiff/yaml3 v0.0.9 h1:rWPrKccrdUm8J0F3sGuU+fuh9+1K/RdJlWF7O/9yw2g= -github.com/oasdiff/yaml3 v0.0.9/go.mod h1:y5+oSEHCPT/DGrS++Wc/479ERge0zTFxaF8PbGKcg2o= +github.com/oasdiff/yaml3 v0.0.12 h1:75urAtPeDg2/iDEWwzNrLOWxI9N/dCh81nTTJtokt2M= github.com/pelletier/go-toml/v2 v2.3.0 h1:k59bC/lIZREW0/iVaQR8nDHxVq8OVlIzYCOJf421CaM= github.com/pelletier/go-toml/v2 v2.3.0/go.mod h1:2gIqNv+qfxSVS7cM2xJQKtLSTLUE9V8t9Stt+h56mCY= github.com/perimeterx/marshmallow v1.1.5 h1:a2LALqQ1BlHM8PZblsDdidgv1mWi1DgC2UmX50IvK2s= diff --git a/game/README.md b/game/README.md index e8b31a4..4d1f209 100644 --- a/game/README.md +++ b/game/README.md @@ -39,13 +39,54 @@ do not pass per-game limits. ## Endpoints The contract is the union of `openapi.yaml` and the technical liveness probe -described below. +described below. 
Endpoints split into two route classes: + +| Class | Path | Caller | Purpose | +| --- | --- | --- | --- | +| Admin (GM-only) | `POST /api/v1/admin/init` | `Game Master` | Initialise the engine with the race roster. | +| Admin (GM-only) | `GET /api/v1/admin/status` | `Game Master` | Read the full game state. | +| Admin (GM-only) | `PUT /api/v1/admin/turn` | `Game Master` | Generate the next turn. | +| Admin (GM-only) | `POST /api/v1/admin/race/banish` | `Game Master` | Deactivate a race after a permanent platform removal. | +| Player | `PUT /api/v1/command` | `Game Master` (forwarded from `Edge Gateway`) | Execute a batch of player commands. | +| Player | `PUT /api/v1/order` | `Game Master` | Validate and store a batch of player orders. | +| Player | `GET /api/v1/report` | `Game Master` | Fetch the per-player turn report. | +| Probe | `GET /healthz` | `Runtime Manager` | Technical liveness probe. | + +Admin paths are unauthenticated but are routed only from inside the +trusted network segment that connects `Game Master` to the engine +container. The engine does not enforce caller identity — network-level +segmentation is the boundary. Player paths apply the same rule and rely +on `Game Master` to forward only verified player payloads. ### Game endpoints Documented in [`openapi.yaml`](openapi.yaml). When the engine has not been -initialised through `POST /api/v1/init`, game endpoints respond `501 Not -Implemented` to make the uninitialised state unambiguous. +initialised through `POST /api/v1/admin/init`, game endpoints respond +`501 Not Implemented` to make the uninitialised state unambiguous. + +### `StateResponse.finished` + +`StateResponse` (returned by `GET /api/v1/admin/status` and +`PUT /api/v1/admin/turn`) carries a required boolean `finished` field. +The engine sets it to `true` exactly once on the turn-generation response +that ends the game; otherwise it stays `false`. `Game Master` uses this +field as the sole signal to run the platform finish flow. 
The conditional
+logic that flips `finished` to `true` lives in the engine's domain code
+and is owned by the engine maintainers.
+
+### `POST /api/v1/admin/race/banish`
+
+Deactivates a race after a permanent platform-level membership removal.
+`Game Master` calls this endpoint synchronously after a Lobby-driven
+remove-and-banish flow.
+
+- Request body: `{ "race_name": "<race-name>" }`. `race_name` must be
+  non-empty and must match an existing race in the engine's roster.
+- Successful response: `204 No Content` with an empty body.
+- Error responses follow the same `400` / `500` envelope shape as the
+  other admin endpoints. The engine-side mechanics of `banish` (what
+  exactly happens to the race's planets, fleets, and pending orders) are
+  owned by the engine maintainers.
 
 ### `GET /healthz`
 
@@ -53,9 +94,9 @@ Technical liveness probe used by `Runtime Manager` and operator tooling.
 
 - Returns `{"status":"ok"}` with HTTP `200` whenever the HTTP server is
   serving requests, regardless of whether the engine has been initialised
-  through `POST /api/v1/init`.
-- Carries no game-state semantics. Use `GET /api/v1/status` for game-state
-  inspection.
+  through `POST /api/v1/admin/init`.
+- Carries no game-state semantics. Use `GET /api/v1/admin/status` for
+  game-state inspection.
 
 This endpoint exists so that `Runtime Manager` can probe a freshly
 started container before `init` runs.
diff --git a/game/go.mod b/game/go.mod index 600c144..e0fd519 100644 --- a/game/go.mod +++ b/game/go.mod @@ -34,7 +34,7 @@ require ( github.com/modern-go/reflect2 v1.0.2 // indirect github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect github.com/oasdiff/yaml v0.0.9 // indirect - github.com/oasdiff/yaml3 v0.0.9 // indirect + github.com/oasdiff/yaml3 v0.0.12 // indirect github.com/pelletier/go-toml/v2 v2.3.0 // indirect github.com/perimeterx/marshmallow v1.1.5 // indirect github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect diff --git a/game/go.sum b/game/go.sum index c010075..f03ff29 100644 --- a/game/go.sum +++ b/game/go.sum @@ -66,8 +66,7 @@ github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 h1:RWengNIwukTxcDr9 github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826/go.mod h1:TaXosZuwdSHYgviHp1DAtfrULt5eUgsSMsZf+YrPgl8= github.com/oasdiff/yaml v0.0.9 h1:zQOvd2UKoozsSsAknnWoDJlSK4lC0mpmjfDsfqNwX48= github.com/oasdiff/yaml v0.0.9/go.mod h1:8lvhgJG4xiKPj3HN5lDow4jZHPlx1i7dIwzkdAo6oAM= -github.com/oasdiff/yaml3 v0.0.9 h1:rWPrKccrdUm8J0F3sGuU+fuh9+1K/RdJlWF7O/9yw2g= -github.com/oasdiff/yaml3 v0.0.9/go.mod h1:y5+oSEHCPT/DGrS++Wc/479ERge0zTFxaF8PbGKcg2o= +github.com/oasdiff/yaml3 v0.0.12 h1:75urAtPeDg2/iDEWwzNrLOWxI9N/dCh81nTTJtokt2M= github.com/pelletier/go-toml/v2 v2.3.0 h1:k59bC/lIZREW0/iVaQR8nDHxVq8OVlIzYCOJf421CaM= github.com/pelletier/go-toml/v2 v2.3.0/go.mod h1:2gIqNv+qfxSVS7cM2xJQKtLSTLUE9V8t9Stt+h56mCY= github.com/perimeterx/marshmallow v1.1.5 h1:a2LALqQ1BlHM8PZblsDdidgv1mWi1DgC2UmX50IvK2s= diff --git a/game/internal/controller/command.go b/game/internal/controller/command.go index 6ba8808..c08622b 100644 --- a/game/internal/controller/command.go +++ b/game/internal/controller/command.go @@ -19,6 +19,15 @@ func (c Controller) RaceID(actor string) (uuid.UUID, error) { return c.Cache.g.Race[ri].ID, nil } +func (c Controller) RaceBanish(actor string) error { + ri, err := c.Cache.validRace(actor) + if err 
!= nil { + return err + } + c.Cache.g.Race[ri].Extinct = true + return nil +} + func (c Controller) RaceQuit(actor string) error { ri, err := c.Cache.validRace(actor) if err != nil { diff --git a/game/internal/controller/controller.go b/game/internal/controller/controller.go index bbb84cd..1b14546 100644 --- a/game/internal/controller/controller.go +++ b/game/internal/controller/controller.go @@ -134,6 +134,14 @@ func ValidateOrder(configure func(*Param), actor string, cmd ...order.DecodableC return ec.validateOrder(actor, cmd...) } +func BanishRace(configure func(*Param), actor string) error { + ec, err := NewRepoController(configure) + if err != nil { + return err + } + return ec.banishRace(actor) +} + func GameState(configure func(*Param)) (s game.State, err error) { ec, err := NewRepoController(configure) if err != nil { @@ -146,10 +154,11 @@ func GameState(configure func(*Param)) (s game.State, err error) { } result := &game.State{ - ID: g.ID, - Turn: g.Turn, - Stage: g.Stage, - Players: make([]game.PlayerState, len(g.Race)), + ID: g.ID, + Turn: g.Turn, + Stage: g.Stage, + Finished: g.Finished(), + Players: make([]game.PlayerState, len(g.Race)), } planetCount := make(map[uuid.UUID]uint) @@ -243,6 +252,16 @@ func (ec *RepoController) executeCommand(consumer func(*Controller) error) (err }) } +func (ec *RepoController) banishRace(actor string) (err error) { + return ec.executeLocked(func(c *Controller) error { + err = c.RaceBanish(actor) + if err != nil { + return err + } + return c.saveState() + }) +} + func (ec *RepoController) executeSafe(consumer func(uint, *Controller) error) (err error) { g, err := ec.Repo.LoadStateSafe() if err != nil { diff --git a/game/internal/controller/race.go b/game/internal/controller/race.go index 2c0a551..20eb0b4 100644 --- a/game/internal/controller/race.go +++ b/game/internal/controller/race.go @@ -118,7 +118,7 @@ func (c *Cache) raceTechLevel(ri int, t game.Tech, v float64) { func (c *Cache) TurnWipeExtinctRaces() { for i := 
range c.listRaceActingIdx() { - if c.g.Race[i].TTL == 0 { + if (c.g.Race[i].Extinct && c.g.Race[i].TTL > 0) || (!c.g.Race[i].Extinct && c.g.Race[i].TTL == 0) { c.wipeRace(i) } } diff --git a/game/internal/model/game/state.go b/game/internal/model/game/state.go index aaeba7e..f404b66 100644 --- a/game/internal/model/game/state.go +++ b/game/internal/model/game/state.go @@ -3,10 +3,11 @@ package game import "github.com/google/uuid" type State struct { - ID uuid.UUID - Turn uint - Stage uint - Players []PlayerState + ID uuid.UUID + Turn uint + Stage uint + Players []PlayerState + Finished bool } type PlayerState struct { diff --git a/game/internal/router/banish_test.go b/game/internal/router/banish_test.go new file mode 100644 index 0000000..8311329 --- /dev/null +++ b/game/internal/router/banish_test.go @@ -0,0 +1,45 @@ +package router_test + +import ( + "net/http" + "net/http/httptest" + "testing" + + "galaxy/model/rest" + + "github.com/stretchr/testify/assert" +) + +const apiBanishPath = "/api/v1/admin/race/banish" + +func TestBanishHappyPath(t *testing.T) { + r := setupRouter() + + w := httptest.NewRecorder() + req, _ := http.NewRequest(http.MethodPost, apiBanishPath, asBody(rest.BanishRequest{RaceName: "Aelinari"})) + r.ServeHTTP(w, req) + + assert.Equal(t, http.StatusNoContent, w.Code, w.Body) + assert.Empty(t, w.Body.String()) +} + +func TestBanishValidation(t *testing.T) { + r := setupRouter() + + for _, tc := range []struct { + description string + body any + }{ + {"missing race_name", struct{}{}}, + {"empty race_name", rest.BanishRequest{RaceName: ""}}, + {"blank race_name", rest.BanishRequest{RaceName: " "}}, + } { + t.Run(tc.description, func(t *testing.T) { + w := httptest.NewRecorder() + req, _ := http.NewRequest(http.MethodPost, apiBanishPath, asBody(tc.body)) + r.ServeHTTP(w, req) + + assert.Equal(t, http.StatusBadRequest, w.Code, w.Body) + }) + } +} diff --git a/game/internal/router/handler/banish.go b/game/internal/router/handler/banish.go new file 
mode 100644 index 0000000..2448ba5 --- /dev/null +++ b/game/internal/router/handler/banish.go @@ -0,0 +1,22 @@ +package handler + +import ( + "net/http" + + "galaxy/model/rest" + + "github.com/gin-gonic/gin" +) + +func BanishHandler(c *gin.Context, executor CommandExecutor) { + var req rest.BanishRequest + if errorResponse(c, c.ShouldBindJSON(&req)) { + return + } + + if errorResponse(c, executor.BanishRace(req.RaceName)) { + return + } + + c.Status(http.StatusNoContent) +} diff --git a/game/internal/router/handler/handler.go b/game/internal/router/handler/handler.go index bbaed12..8d3b314 100644 --- a/game/internal/router/handler/handler.go +++ b/game/internal/router/handler/handler.go @@ -23,6 +23,7 @@ type CommandExecutor interface { GenerateGame([]string) (rest.StateResponse, error) GenerateTurn() (rest.StateResponse, error) GameState() (rest.StateResponse, error) + BanishRace(string) error LoadReport(actor string, turn uint) (*report.Report, error) Execute(cmd ...Command) error ValidateOrder(actor string, cmd ...order.DecodableCommand) error @@ -103,16 +104,21 @@ func (e *executor) GameState() (rest.StateResponse, error) { return stateResponse(s), nil } +func (e *executor) BanishRace(raceName string) error { + return controller.BanishRace(e.cfg, raceName) +} + func (e *executor) LoadReport(actor string, turn uint) (*report.Report, error) { return controller.LoadReport(e.cfg, actor, turn) } func stateResponse(s game.State) rest.StateResponse { result := &rest.StateResponse{ - ID: s.ID, - Turn: s.Turn, - Stage: s.Stage, - Players: make([]rest.PlayerState, len(s.Players)), + ID: s.ID, + Turn: s.Turn, + Stage: s.Stage, + Finished: s.Finished, + Players: make([]rest.PlayerState, len(s.Players)), } for i := range s.Players { result.Players[i].ID = s.Players[i].ID diff --git a/game/internal/router/handler/healthz.go b/game/internal/router/handler/healthz.go index 9c71905..39b6f7f 100644 --- a/game/internal/router/handler/healthz.go +++ 
b/game/internal/router/handler/healthz.go @@ -8,7 +8,7 @@ import ( // HealthzHandler is the technical liveness probe used by Runtime Manager // and operator tooling. It returns 200 with {"status":"ok"} regardless -// of whether the engine has been initialised through POST /api/v1/init. +// of whether the engine has been initialised through POST /api/v1/admin/init. func HealthzHandler(c *gin.Context) { c.JSON(http.StatusOK, gin.H{"status": "ok"}) } diff --git a/game/internal/router/init_test.go b/game/internal/router/init_test.go index 0431295..8341ccb 100644 --- a/game/internal/router/init_test.go +++ b/game/internal/router/init_test.go @@ -27,7 +27,7 @@ func TestInit(t *testing.T) { payload := generateInitRequest(10) w := httptest.NewRecorder() - req, _ := http.NewRequest("POST", "/api/v1/init", asBody(payload)) + req, _ := http.NewRequest("POST", "/api/v1/admin/init", asBody(payload)) r.ServeHTTP(w, req) assert.Equal(t, http.StatusCreated, w.Code, w.Body) @@ -42,7 +42,7 @@ func TestInitValidators(t *testing.T) { payload := generateInitRequest(9) w := httptest.NewRecorder() - req, _ := http.NewRequest("POST", "/api/v1/init", asBody(payload)) + req, _ := http.NewRequest("POST", "/api/v1/admin/init", asBody(payload)) r.ServeHTTP(w, req) assert.Equal(t, http.StatusBadRequest, w.Code, w.Body) diff --git a/game/internal/router/report_test.go b/game/internal/router/report_test.go index a87756e..73d0d69 100644 --- a/game/internal/router/report_test.go +++ b/game/internal/router/report_test.go @@ -24,7 +24,7 @@ func TestGetReport(t *testing.T) { payload := generateInitRequest(10) w := httptest.NewRecorder() - req, _ := http.NewRequest("POST", "/api/v1/init", asBody(payload)) + req, _ := http.NewRequest("POST", "/api/v1/admin/init", asBody(payload)) r.ServeHTTP(w, req) assert.Equal(t, http.StatusCreated, w.Code, w.Body) diff --git a/game/internal/router/router.go b/game/internal/router/router.go index 1e165de..592cf0d 100644 --- a/game/internal/router/router.go +++ 
b/game/internal/router/router.go @@ -67,12 +67,15 @@ func setupRouter(executor handler.CommandExecutor) *gin.Engine { groupV1 := r.Group("/api/v1") - groupV1.GET("/status", func(ctx *gin.Context) { handler.StatusHandler(ctx, executor) }) - groupV1.POST("/init", func(ctx *gin.Context) { handler.InitHandler(ctx, executor) }) + groupAdmin := groupV1.Group("/admin") + groupAdmin.GET("/status", func(ctx *gin.Context) { handler.StatusHandler(ctx, executor) }) + groupAdmin.POST("/init", func(ctx *gin.Context) { handler.InitHandler(ctx, executor) }) + groupAdmin.PUT("/turn", func(ctx *gin.Context) { handler.TurnHandler(ctx, executor) }) + groupAdmin.POST("/race/banish", func(ctx *gin.Context) { handler.BanishHandler(ctx, executor) }) + groupV1.GET("/report", func(ctx *gin.Context) { handler.ReportHandler(ctx, executor) }) groupV1.PUT("/command", LimitMiddleware(1), func(ctx *gin.Context) { handler.CommandHandler(ctx, executor) }) groupV1.PUT("/order", func(ctx *gin.Context) { handler.OrderHandler(ctx, executor) }) - groupV1.PUT("/turn", func(ctx *gin.Context) { handler.TurnHandler(ctx, executor) }) return r } diff --git a/game/internal/router/router_helper_test.go b/game/internal/router/router_helper_test.go index e8db00f..4be3963 100644 --- a/game/internal/router/router_helper_test.go +++ b/game/internal/router/router_helper_test.go @@ -52,6 +52,10 @@ func (e *dummyExecutor) GenerateTurn() (rest.StateResponse, error) { return rest.StateResponse{}, nil } +func (e *dummyExecutor) BanishRace(raceName string) error { + return nil +} + func (e *dummyExecutor) GameState() (rest.StateResponse, error) { return rest.StateResponse{}, nil } diff --git a/game/internal/router/status_test.go b/game/internal/router/status_test.go index bad6e0a..9608e0a 100644 --- a/game/internal/router/status_test.go +++ b/game/internal/router/status_test.go @@ -27,7 +27,7 @@ func TestGetStatus(t *testing.T) { payload := generateInitRequest(10) w := httptest.NewRecorder() - req, _ := 
http.NewRequest("POST", "/api/v1/init", asBody(payload)) + req, _ := http.NewRequest("POST", "/api/v1/admin/init", asBody(payload)) r.ServeHTTP(w, req) assert.Equal(t, http.StatusCreated, w.Code, w.Body) @@ -37,7 +37,7 @@ func TestGetStatus(t *testing.T) { assert.NotEqual(t, uuid.Nil, uuid.MustParse(initResponse.ID.String())) w = httptest.NewRecorder() - req, _ = http.NewRequest("GET", "/api/v1/status", nil) + req, _ = http.NewRequest("GET", "/api/v1/admin/status", nil) r.ServeHTTP(w, req) assert.Equal(t, http.StatusOK, w.Code, w.Body) @@ -47,6 +47,7 @@ func TestGetStatus(t *testing.T) { assert.Equal(t, initResponse.ID, stateResponse.ID) assert.Equal(t, uint(0), stateResponse.Turn) assert.Equal(t, uint(0), stateResponse.Stage) + assert.False(t, stateResponse.Finished) assert.Len(t, stateResponse.Players, 10) for i := range stateResponse.Players { assert.NoError(t, uuid.Validate(stateResponse.Players[i].ID.String())) diff --git a/game/internal/router/turn_test.go b/game/internal/router/turn_test.go index b961c7f..c07bc8f 100644 --- a/game/internal/router/turn_test.go +++ b/game/internal/router/turn_test.go @@ -29,7 +29,7 @@ func TestGetTurn(t *testing.T) { payload := generateInitRequest(10) w := httptest.NewRecorder() - req, _ := http.NewRequest("POST", "/api/v1/init", asBody(payload)) + req, _ := http.NewRequest("POST", "/api/v1/admin/init", asBody(payload)) r.ServeHTTP(w, req) assert.Equal(t, http.StatusCreated, w.Code, w.Body) @@ -50,7 +50,7 @@ func TestGetTurn(t *testing.T) { // generate next turn w = httptest.NewRecorder() - req, _ = http.NewRequest("PUT", "/api/v1/turn", nil) + req, _ = http.NewRequest("PUT", "/api/v1/admin/turn", nil) r.ServeHTTP(w, req) assert.Equal(t, http.StatusOK, w.Code, w.Body) @@ -72,7 +72,7 @@ func TestGetTurn(t *testing.T) { // validate status w = httptest.NewRecorder() - req, _ = http.NewRequest("GET", "/api/v1/status", nil) + req, _ = http.NewRequest("GET", "/api/v1/admin/status", nil) r.ServeHTTP(w, req) assert.Equal(t, 
http.StatusOK, w.Code, w.Body) diff --git a/game/openapi.yaml b/game/openapi.yaml index aae26a3..51ba5ee 100644 --- a/game/openapi.yaml +++ b/game/openapi.yaml @@ -30,16 +30,17 @@ tags: - name: Health description: Technical liveness probes used by Runtime Manager and operator tooling. paths: - /api/v1/status: + /api/v1/admin/status: get: tags: - GameLifecycle - operationId: getGameStatus + operationId: adminGetGameStatus summary: Get the current game state description: | Returns the current game state including turn number, stage, and a summary of all players. Returns `501` if the game has not yet been - initialized. + initialized. Routed only from the trusted network segment that + connects `Game Master` to the engine container. responses: "200": description: Current game state. @@ -51,15 +52,17 @@ paths: description: Game has not been initialized yet. "500": $ref: "#/components/responses/InternalError" - /api/v1/init: + /api/v1/admin/init: post: tags: - GameLifecycle - operationId: initGame + operationId: adminInitGame summary: Initialize a new game description: | Generates a new game instance with the supplied list of races. - Requires at least 10 race entries. + Requires at least 10 race entries. Routed only from the trusted + network segment that connects `Game Master` to the engine + container. requestBody: required: true content: @@ -77,6 +80,30 @@ paths: $ref: "#/components/responses/ValidationError" "500": $ref: "#/components/responses/InternalError" + /api/v1/admin/race/banish: + post: + tags: + - GameLifecycle + operationId: adminBanishRace + summary: Deactivate a race after a permanent platform-level removal + description: | + Deactivates the named race in the running engine. Called by `Game + Master` after a Lobby-driven permanent membership removal. Routed + only from the trusted network segment that connects `Game Master` + to the engine container. 
+ requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/BanishRequest" + responses: + "204": + description: Race deactivated; no response body. + "400": + $ref: "#/components/responses/ValidationError" + "500": + $ref: "#/components/responses/InternalError" /api/v1/report: get: tags: @@ -148,15 +175,16 @@ paths: $ref: "#/components/responses/ValidationError" "500": $ref: "#/components/responses/InternalError" - /api/v1/turn: + /api/v1/admin/turn: put: tags: - GameLifecycle - operationId: generateTurn + operationId: adminGenerateTurn summary: Advance the game to the next turn description: | Processes the current turn and generates the next one. Returns the - updated game state. + updated game state. Routed only from the trusted network segment + that connects `Game Master` to the engine container. responses: "200": description: Updated game state after turn generation. @@ -175,10 +203,10 @@ paths: description: | Returns `{"status":"ok"}` with HTTP `200` whenever the HTTP server is serving requests, regardless of whether the engine has been - initialised through `POST /api/v1/init`. Used by `Runtime Manager` - to probe a freshly started container before `init` runs. Carries - no game-state semantics; use `GET /api/v1/status` for game-state - inspection. + initialised through `POST /api/v1/admin/init`. Used by `Runtime + Manager` to probe a freshly started container before `init` runs. + Carries no game-state semantics; use `GET /api/v1/admin/status` + for game-state inspection. responses: "200": description: Engine HTTP server is up. @@ -225,6 +253,7 @@ components: - turn - stage - player + - finished properties: id: type: string @@ -243,6 +272,15 @@ components: description: Summary state for each player participating in the game. items: $ref: "#/components/schemas/PlayerState" + finished: + type: boolean + description: | + True exactly once on the turn-generation response that ends the + game; otherwise false. 
Server default: false. `Game Master` + uses this flag as the sole signal to run the platform finish + flow. The conditional logic that flips it to true lives in + the engine's domain code and is owned by the engine + maintainers. PlayerState: type: object description: Brief player state returned as part of the game state response. @@ -292,6 +330,18 @@ components: type: string description: Name of the race. Must be non-blank and satisfy the entity-name format. minLength: 1 + BanishRequest: + type: object + description: | + Request body for the admin banish endpoint. `race_name` must + identify an existing race in the engine roster. + required: + - race_name + properties: + race_name: + type: string + description: Name of the race to banish. Must be non-blank. + minLength: 1 CommandRequest: type: object description: | diff --git a/game/openapi_contract_test.go b/game/openapi_contract_test.go index 43cbd81..0e93535 100644 --- a/game/openapi_contract_test.go +++ b/game/openapi_contract_test.go @@ -31,15 +31,15 @@ func TestGameOpenAPISpecFreezesResponseSchemas(t *testing.T) { wantRef string }{ { - name: "get game status", - path: "/api/v1/status", + name: "admin get game status", + path: "/api/v1/admin/status", method: http.MethodGet, status: http.StatusOK, wantRef: "#/components/schemas/StateResponse", }, { - name: "init game", - path: "/api/v1/init", + name: "admin init game", + path: "/api/v1/admin/init", method: http.MethodPost, status: http.StatusCreated, wantRef: "#/components/schemas/StateResponse", @@ -52,8 +52,8 @@ func TestGameOpenAPISpecFreezesResponseSchemas(t *testing.T) { wantRef: "#/components/schemas/Report", }, { - name: "generate turn", - path: "/api/v1/turn", + name: "admin generate turn", + path: "/api/v1/admin/turn", method: http.MethodPut, status: http.StatusOK, wantRef: "#/components/schemas/StateResponse", @@ -81,7 +81,7 @@ func TestGameOpenAPISpecFreezesInitRequest(t *testing.T) { t.Parallel() doc := loadOpenAPISpec(t) - operation := 
getOpenAPIOperation(t, doc, "/api/v1/init", http.MethodPost) + operation := getOpenAPIOperation(t, doc, "/api/v1/admin/init", http.MethodPost) assertSchemaRef(t, requestSchemaRef(t, operation), "#/components/schemas/InitRequest", "init request schema") @@ -93,6 +93,68 @@ func TestGameOpenAPISpecFreezesInitRequest(t *testing.T) { require.Equal(t, uint64(10), racesSchema.Value.MinItems, "InitRequest.races minItems must be 10") } +func TestGameOpenAPISpecFreezesAdminOperationIDs(t *testing.T) { + t.Parallel() + + doc := loadOpenAPISpec(t) + + tests := []struct { + path string + method string + opID string + }{ + {"/api/v1/admin/init", http.MethodPost, "adminInitGame"}, + {"/api/v1/admin/status", http.MethodGet, "adminGetGameStatus"}, + {"/api/v1/admin/turn", http.MethodPut, "adminGenerateTurn"}, + {"/api/v1/admin/race/banish", http.MethodPost, "adminBanishRace"}, + } + + for _, tt := range tests { + t.Run(tt.opID, func(t *testing.T) { + t.Parallel() + + operation := getOpenAPIOperation(t, doc, tt.path, tt.method) + require.Equal(t, tt.opID, operation.OperationID, "operation id for %s %s", tt.method, tt.path) + }) + } +} + +func TestGameOpenAPISpecFreezesBanishRequest(t *testing.T) { + t.Parallel() + + doc := loadOpenAPISpec(t) + operation := getOpenAPIOperation(t, doc, "/api/v1/admin/race/banish", http.MethodPost) + + assertSchemaRef(t, requestSchemaRef(t, operation), "#/components/schemas/BanishRequest", "banish request schema") + + if operation.Responses == nil { + require.FailNow(t, "banish operation is missing responses") + } + noContent := operation.Responses.Status(http.StatusNoContent) + require.NotNil(t, noContent, "banish operation must declare 204 response") + require.NotNil(t, noContent.Value, "banish 204 response must have a value") + + schema := componentSchemaRef(t, doc, "BanishRequest") + assertRequiredFields(t, schema, "race_name") + + raceNameSchema := schema.Value.Properties["race_name"] + require.NotNil(t, raceNameSchema, "BanishRequest.race_name 
schema must exist") + require.Equal(t, uint64(1), raceNameSchema.Value.MinLength, "BanishRequest.race_name minLength must be 1") +} + +func TestGameOpenAPISpecFreezesStateResponseFinished(t *testing.T) { + t.Parallel() + + doc := loadOpenAPISpec(t) + schema := componentSchemaRef(t, doc, "StateResponse") + + assertRequiredFields(t, schema, "id", "turn", "stage", "player", "finished") + + finishedSchema := schema.Value.Properties["finished"] + require.NotNil(t, finishedSchema, "StateResponse.finished schema must exist") + require.True(t, finishedSchema.Value.Type.Is("boolean"), "StateResponse.finished must be boolean") +} + func TestGameOpenAPISpecFreezesCommandRequest(t *testing.T) { t.Parallel() diff --git a/gamemaster/Makefile b/gamemaster/Makefile new file mode 100644 index 0000000..420950e --- /dev/null +++ b/gamemaster/Makefile @@ -0,0 +1,32 @@ +# Makefile for galaxy/gamemaster. +# +# The `jet` target regenerates the go-jet/v2 query-builder code under +# internal/adapters/postgres/jet/ against a transient PostgreSQL container +# brought up by cmd/jetgen. Generated code is committed; running this +# target requires a reachable Docker daemon (testcontainers spins up a +# postgres:16-alpine container). +# +# The `mocks` target regenerates the gomock-driven mocks via the +# //go:generate directives that live next to the interfaces they cover: +# - internal/ports/ — port interfaces (PLAN stage 10) +# - internal/api/internalhttp/handlers/ — REST handler service ports (PLAN stage 19) +# Generated code is committed. +# +# The `integration` target runs the service-local end-to-end suite under +# integration/ (PLAN stage 21). It requires a reachable Docker daemon +# (`/var/run/docker.sock` or `DOCKER_HOST`); without one the helpers in +# integration/harness call t.Skip and the tests are no-ops. + +.PHONY: jet mocks integration + +jet: + go run ./cmd/jetgen + +mocks: + go generate ./internal/ports/... 
+ @if [ -d ./internal/api/internalhttp/handlers ]; then \ + go generate ./internal/api/internalhttp/handlers/...; \ + fi + +integration: + go test -tags=integration -count=1 ./integration/... diff --git a/gamemaster/PLAN.md b/gamemaster/PLAN.md new file mode 100644 index 0000000..fb85320 --- /dev/null +++ b/gamemaster/PLAN.md @@ -0,0 +1,1276 @@ +# Game Master Implementation Plan + +This plan delivers `Game Master` (GM), the platform service that owns +runtime and operational state of running Galaxy games, mediates every call +to the engine container, runs the turn scheduler, and owns the engine +version registry. + +The plan also delivers the upstream changes that GM depends on: the +extracted `pkg/cronutil` module, the engine admin-path rename plus the +`finished:bool` field and the new `/admin/race/banish` endpoint on +`galaxy/game`, the Lobby refactor that drops `LOBBY_ENGINE_IMAGE_TEMPLATE` +in favour of synchronous image-ref resolution against GM, and the +membership invalidation hook from Lobby into GM. + +The architectural rules behind every decision are recorded in +[`./README.md`](./README.md). This file describes the order in which the +implementation lands. + +## Global Rules + +- Documentation always lands before contracts; contracts before code. +- Each stage leaves the repository in a buildable, test-green state. No + stage relies on a later stage to fix a regression it introduced. +- Existing-service refactors (Lobby image-ref resolver, Lobby membership + invalidation hook, game engine path rename plus `finished` field plus + banish endpoint, `pkg/cronutil` extraction) are full-fledged stages of + this plan; they precede every GM stage that depends on them. +- GM never opens the Docker SDK. Every container operation goes through + `Runtime Manager` over trusted internal REST. +- GM never trusts an `actor` field provided in a payload from `Edge + Gateway`; it always derives `actor=race_name` from its own + `(user_id → race_name)` mapping. 
+- Every functional change ships its tests in the same stage. Contract + tests freeze operation IDs and stream message names from Stage 06 + onward. +- All code, docs, and identifiers are written in English. +- Engine domain logic (when `finished=true` is set, what `banish` mutates + inside the game) is user-owned and explicitly out of scope; this plan + ships only the contract, router plumbing, and stub handlers for those + pieces. + +## Suggested Module Structure + +```text +gamemaster/ +├── cmd/ +│ ├── gamemaster/ +│ │ └── main.go +│ └── jetgen/ +│ └── main.go +│ +├── internal/ +│ ├── app/ +│ │ ├── app.go +│ │ ├── runtime.go +│ │ ├── wiring.go +│ │ └── bootstrap.go +│ │ +│ ├── config/ +│ │ ├── config.go +│ │ ├── env.go +│ │ └── validation.go +│ │ +│ ├── logging/ +│ │ ├── logger.go +│ │ └── context.go +│ │ +│ ├── telemetry/ +│ │ └── runtime.go +│ │ +│ ├── domain/ +│ │ ├── runtime/ +│ │ │ ├── model.go +│ │ │ └── transitions.go +│ │ ├── engineversion/ +│ │ │ ├── model.go +│ │ │ └── semver.go +│ │ ├── playermapping/ +│ │ │ └── model.go +│ │ └── schedule/ +│ │ └── nexttick.go +│ │ +│ ├── ports/ +│ │ ├── runtimerecordstore.go +│ │ ├── engineversionstore.go +│ │ ├── playermappingstore.go +│ │ ├── operationlog.go +│ │ ├── streamoffsetstore.go +│ │ ├── engineclient.go +│ │ ├── lobbyclient.go +│ │ ├── rtmclient.go +│ │ ├── notificationpublisher.go +│ │ └── lobbyeventspublisher.go +│ │ +│ ├── adapters/ +│ │ ├── postgres/ +│ │ │ ├── migrations/ +│ │ │ ├── jet/ +│ │ │ ├── runtimerecordstore/ +│ │ │ ├── engineversionstore/ +│ │ │ ├── playermappingstore/ +│ │ │ └── operationlog/ +│ │ ├── redisstate/ +│ │ │ └── streamoffsets/ +│ │ ├── engineclient/ +│ │ ├── lobbyclient/ +│ │ ├── rtmclient/ +│ │ ├── notificationpublisher/ +│ │ ├── lobbyeventspublisher/ +│ │ └── mocks/ +│ │ +│ ├── service/ +│ │ ├── registerruntime/ +│ │ ├── engineversion/ +│ │ ├── scheduler/ +│ │ ├── turngeneration/ +│ │ ├── commandexecute/ +│ │ ├── orderput/ +│ │ ├── reportget/ +│ │ ├── membership/ +│ │ ├── 
adminstop/ +│ │ ├── adminforce/ +│ │ ├── adminpatch/ +│ │ ├── adminbanish/ +│ │ └── livenessreply/ +│ │ +│ ├── worker/ +│ │ ├── schedulerticker/ +│ │ └── healtheventsconsumer/ +│ │ +│ └── api/ +│ └── internalhttp/ +│ ├── server.go +│ └── handlers/ +│ +├── api/ +│ ├── internal-openapi.yaml +│ └── runtime-events-asyncapi.yaml +│ +├── integration/ +│ ├── harness/ +│ ├── registerruntime_test.go +│ ├── scheduler_test.go +│ ├── hotpath_test.go +│ ├── adminops_test.go +│ ├── healthevents_test.go +│ └── notification_test.go +│ +├── docs/ +│ ├── README.md +│ ├── runtime.md +│ ├── flows.md +│ ├── runbook.md +│ ├── examples.md +│ └── postgres-migration.md +│ +├── README.md +├── PLAN.md +├── Makefile +└── go.mod +``` + +## ~~Stage 01.~~ Update `ARCHITECTURE.md` + +Goal: + +- align the project-wide source of truth with every decision recorded in + [`./README.md`](./README.md) before any code change touches it. + +Tasks: + +- Expand `ARCHITECTURE.md §8` (Game Master) with subsections: engine + container contract (admin vs player paths, `finished:bool` semantics, + `banish` endpoint), runtime status enum (`starting | running | + generation_in_progress | generation_failed | stopped | + engine_unreachable | finished`), turn cutoff rule (no shadow window; + CAS-only), force-next-turn skip rule, snapshot publishing cadence + (events only, no heartbeat), single-instance topology. +- Update §«Versioning of Game Engines»: GM owns the engine version + registry from v1; Lobby resolves `image_ref` synchronously through GM. + `LOBBY_ENGINE_IMAGE_TEMPLATE` is removed. `engine_versions` table lives + in the `gamemaster` schema. +- Update §«Fixed synchronous interactions»: add `Game Lobby → Game Master` + for `register-runtime`, image-ref resolve, membership invalidation + hook, banish, and liveness reply. Add `Edge Gateway → Game Master` for + player commands, orders, and reports. 
+- Update §«Fixed asynchronous interactions»: add `Game Master → Game + Lobby` runtime snapshot updates and game-finish events through the + `gm:lobby_events` Redis Stream (already mentioned, expanded with + cadence rules); add `Runtime Manager → Game Master` health events + consumption (`runtime:health_events`) — already mentioned, confirmed. +- Update §«Persistence Backends»: add `gamemaster` schema to the + schema-per-service list and to PG-backed services. +- Update §«Configuration»: add `GAMEMASTER` to the env-var prefix list + with the same shape rules as other PG/Redis-backed services. +- Update §«Recommended Order of Service Implementation» entry 8 with the + scope finalised in [`./README.md`](./README.md). +- Drop `ships_built` from every architectural mention of + `player_turn_stats`. Update the capability rule wording to use + `planets` and `population` only (no behavioural change; `ships_built` + was unused). + +Files touched: + +- `ARCHITECTURE.md`. + +Exit criteria: + +- every later GM, Lobby, Notification, or Game stage can quote its rules + from `ARCHITECTURE.md` without re-deciding them. +- `go test ./...` is unaffected (this stage changes only Markdown). + +## ~~Stage 02.~~ Freeze GM `README.md` + +Status: implemented as part of this planning task — see +[`./README.md`](./README.md). + +Goal: + +- publish the complete service description so contracts and code can + reference one source. + +Exit criteria: + +- a reviewer can answer any «what does GM do when X» question by reading + the README alone. + +## ~~Stage 03.~~ Sync existing-service docs (Lobby, Notification, Game, RTM) + +Goal: + +- bring the READMEs of every touched service into agreement with the GM + contract before any code in those services changes. 
+ +Tasks: + +- `lobby/README.md`: + - replace the `LOBBY_ENGINE_IMAGE_TEMPLATE` configuration entry with a + new `LOBBY_GM_BASE_URL`-backed image-ref resolve via + `GET /api/v1/internal/engine-versions/{version}/image-ref`; + - document the new outgoing `POST /api/v1/internal/games/{id}/memberships/invalidate` + call from `removemember`, `blockmember`, `approveapplication`, + `rejectapplication`, `redeeminvite`, and the user-lifecycle cascade + worker (post-commit, fail-open); + - drop `ships_built` from the `player_turn_stats` description and from + the capability evaluation wording (rule already reduces to planets + + population); + - add a paragraph in §Game Start Flow noting that `image_ref` is + resolved from GM synchronously and that GM unavailability turns + `lobby.game.start` into `service_unavailable`. +- `lobby/PLAN.md`: append a closing note stating that the image-ref + template removal and the membership invalidation hook are landed by + the Game Master plan; no new stages added in Lobby's own PLAN. +- `notification/README.md`: confirm the catalog already lists + `game.turn.ready`, `game.finished`, `game.generation_failed` and add + a one-line note that GM is the producer. +- `game/README.md`: + - document the new path layout: admin endpoints under + `/api/v1/admin/*` (`init`, `status`, `turn`, `race/banish`); player + endpoints unchanged at `/api/v1/{command, order, report}`; + - document the `finished:bool` extension on `StateResponse`; + - document the `POST /api/v1/admin/race/banish` request/response shape + (body `{race_name}`; response `204`). +- `rtmanager/README.md`: add a closing note that `runtime:health_events` + is now consumed by Game Master in production (was reserved as a future + consumer). + +Files touched: + +- `lobby/README.md`, `lobby/PLAN.md`, `notification/README.md`, + `game/README.md`, `rtmanager/README.md`. 
+ +Exit criteria: + +- every doc in the repo agrees on the post-GM contract; no contradiction + remains between any two READMEs. +- `go test ./...` is unaffected. + +## ~~Stage 04.~~ Extract `pkg/cronutil` + wire Lobby + +Goal: + +- own a single cron parser/calculator across the workspace, used today + by Lobby and tomorrow by GM. + +Tasks: + +- Create new workspace module `pkg/cronutil/` with: + - `cronutil.go`: thin wrapper over + `github.com/robfig/cron/v3.NewParser(cron.Minute | cron.Hour | cron.Dom | cron.Month | cron.Dow)`; + exports `Parse(expr string) (Schedule, error)` and + `Schedule.Next(after time.Time) time.Time`; + - `cronutil_test.go`: parser validation tests covering five-field cron + expressions (e.g., `0 18 * * *`, `*/15 * * * *`), invalid expressions, + DST/timezone behaviour (Schedule operates in UTC; UTC inputs yield + UTC outputs); + - `go.mod` declaring the module `galaxy/cronutil` with replace target. +- Wire from Lobby: replace any inline `robfig/cron/v3` usage in + `lobby/internal/domain/game/model.go:validateCronExpr` and the + enrollment automation worker with calls into `pkg/cronutil`. The + enrollment automation worker does not parse cron today (it uses + `enrollment_ends_at` UTC seconds), so the only Lobby caller is the + cron-validation path on game records. +- Update `go.work` to include `./pkg/cronutil` and add the replace block. +- Add Lobby unit tests confirming `validateCronExpr` accepts and rejects + the same expressions as before. + +Files new: + +- `pkg/cronutil/{cronutil.go, cronutil_test.go, go.mod, go.sum}`. + +Files touched: + +- `go.work`, `go.work.sum`, `lobby/internal/domain/game/model.go`, + `lobby/go.mod`, `lobby/go.sum`. + +Exit criteria: + +- `go build ./...` succeeds. +- `go test ./pkg/cronutil/... ./lobby/...` passes. +- `lobby/internal/domain/game/model_test.go` still asserts the same + acceptance set on cron expressions. 
+ +## ~~Stage 05.~~ Game engine contract: admin paths + finished + banish + +Goal: + +- ship the contract changes to `galaxy/game` that GM depends on: admin + routes under `/api/v1/admin/*`, the `StateResponse.finished` field, + and the new `/admin/race/banish` endpoint. + +Tasks: + +- `game/openapi.yaml`: + - rename `/api/v1/init` → `/api/v1/admin/init` (operation + `initGame` → `adminInitGame`); + - rename `/api/v1/status` → `/api/v1/admin/status` (operation + `getGameStatus` → `adminGetGameStatus`); + - rename `/api/v1/turn` → `/api/v1/admin/turn` (operation + `generateTurn` → `adminGenerateTurn`); + - add `POST /api/v1/admin/race/banish` (operation `adminBanishRace`) + with body `{race_name}` and `204 No Content` on success; document + the same `400` and `500` error envelopes as the existing endpoints; + - extend `StateResponse` schema with `finished:bool` (required; + default `false` from server perspective documented in description). +- `game/internal/router/router.go` (or its router-helper file): rename + the route constants and registrations to the new admin paths; add a + new route for `/admin/race/banish` wired to a stub handler returning + `204` with empty body. +- `game/internal/router/handler/banish.go`: new file with a stub handler + that decodes the body, validates `race_name` is non-empty, and returns + `204`. Logging only; no game-state mutation. The user fills in domain + logic in a separate change. +- `game/internal/model/state.go`: add `Finished bool` field to the Go + struct backing `StateResponse`. Default-zero (`false`) on serialisation; + the user fills in conditional logic. +- `game/internal/router/{init,status,turn}_test.go`: update path + literals to the new admin form; tests stay green. +- `game/openapi_contract_test.go`: assert presence of the new operation + IDs (`adminInitGame`, `adminGetGameStatus`, `adminGenerateTurn`, + `adminBanishRace`), the new path components, and the `finished` field + on `StateResponse`. 
+ +Files new: + +- `game/internal/router/handler/banish.go`, + `game/internal/router/banish_test.go` (path-level test only). + +Files touched: + +- `game/openapi.yaml`, `game/openapi_contract_test.go`, + `game/internal/router/router.go`, `game/internal/router/handler/*.go`, + `game/internal/router/{init,status,turn}_test.go`, + `game/internal/model/state.go`. + +Exit criteria: + +- `go test ./game/...` passes. +- `docker build -t galaxy/game:test -f game/Dockerfile .` from the + workspace root still succeeds. +- `curl -X POST http://localhost:8080/api/v1/admin/race/banish -d + '{"race_name":"Aelinari"}'` against a running container returns `204`. + +## ~~Stage 06.~~ GM contract files and contract tests + +Goal: + +- ship machine-readable contracts before any GM handler is written, so + the implementation has a target spec. + +Tasks: + +- `gamemaster/api/internal-openapi.yaml`: every internal REST endpoint + with request and response schemas; error envelope `{ "error": { + "code", "message" } }` identical to Lobby. Operation IDs: + `internalRegisterRuntime`, `internalGetRuntime`, `internalListRuntimes`, + `internalForceNextTurn`, `internalStopRuntime`, `internalPatchRuntime`, + `internalBanishRace`, `internalInvalidateMemberships`, + `internalGameLiveness`, `internalListEngineVersions`, + `internalCreateEngineVersion`, `internalGetEngineVersion`, + `internalUpdateEngineVersion`, `internalDeprecateEngineVersion`, + `internalResolveEngineVersionImageRef`, `internalExecuteCommands`, + `internalPutOrders`, `internalGetReport`, `internalHealthz`, + `internalReadyz`. +- `gamemaster/api/runtime-events-asyncapi.yaml`: AsyncAPI 3.0.0 spec for + `gm:lobby_events`. Two `event_type` values: `runtime_snapshot_update` + and `game_finished`.
Frozen field set per message: + `runtime_snapshot_update {game_id, current_turn, runtime_status, + engine_health_summary, player_turn_stats[], occurred_at_ms}`; + `game_finished {game_id, final_turn_number, runtime_status, + player_turn_stats[], finished_at_ms}`. +- `gamemaster/contract_openapi_test.go`: load the OpenAPI spec via + `kin-openapi`, assert every operation ID is present, every required + field on every request/response schema is present, and that + `additionalProperties: false` is set on every body schema. +- `gamemaster/contract_asyncapi_test.go`: load the AsyncAPI spec via the + shared YAML walker pattern from `notification/contract_asyncapi_test.go`; + assert message names, channel addresses, action vocabulary + (`send`/`receive`), and `event_type` discriminator values. + +Files new: + +- `gamemaster/api/internal-openapi.yaml`, + `gamemaster/api/runtime-events-asyncapi.yaml`, + `gamemaster/contract_openapi_test.go`, + `gamemaster/contract_asyncapi_test.go`. + +Exit criteria: + +- both specs validate. +- contract tests pass; tests fail loudly if any operation ID, message + name, or required field disappears. + +## ~~Stage 07.~~ Notification catalog audit (no-op or minor) + +Goal: + +- confirm the GM-owned notification types (`game.turn.ready`, + `game.finished`, `game.generation_failed`) are already wired through + `pkg/notificationintent`, the `notification` service's catalog data + tables, and `notification/api/intents-asyncapi.yaml`. Add freeze + assertions so a future drift breaks loudly. + +Tasks: + +- Run a freeze test inside `gamemaster/` that imports + `galaxy/notificationintent` and asserts the existence of the three + constructors plus payload struct shapes. +- Inspect `notification/api/intents-asyncapi.yaml` for the three message + schemas; if any are missing the per-payload required fields, add them + here. 
+- Inspect the notification service's routing data tables (the location + is internal to `notification/internal/...`); confirm the three types + are present with audience and channel decisions matching + [`./README.md` §Notification Contracts](./README.md). Add entries if + missing. +- Extend `notification/contract_asyncapi_test.go` if any new payload + schema entries were added. + +Files touched (only if drift is found): + +- `notification/api/intents-asyncapi.yaml`, + `notification/internal/...` (catalog data), + `notification/contract_asyncapi_test.go`. + +Files new: + +- `gamemaster/notificationintent_audit_test.go`. + +Exit criteria: + +- the freeze test passes. +- `notification/contract_asyncapi_test.go` and + `intent_acceptance_contract_test.go` continue to pass. + +## ~~Stage 08.~~ GM module skeleton + +Goal: + +- create a buildable `gamemaster` binary that loads config, opens + dependencies, and exits cleanly on SIGTERM. It does no business work + yet. + +Tasks: + +- `gamemaster/cmd/gamemaster/main.go` mirroring `rtmanager/cmd/rtmanager/main.go`. +- `gamemaster/internal/config/{config.go, env.go, validation.go}` with + env prefix `GAMEMASTER` and groups Listener, Postgres, Redis, Streams, + Engine client, Lobby internal client, RTM internal client, Scheduler, + Membership cache, Logging, Lifecycle, Telemetry. Required variables + fail-fast. +- `gamemaster/internal/logging/{logger.go, context.go}` copied from + lobby/notification. +- `gamemaster/internal/telemetry/runtime.go` registering the metrics + named in [`./README.md §Observability`](./README.md). +- `gamemaster/internal/app/{runtime.go, app.go, wiring.go, bootstrap.go}` + — empty wiring with PostgreSQL open, Redis open, telemetry open, probe + listener open. +- `gamemaster/internal/api/internalhttp/server.go` — listener with + `/healthz` and `/readyz` only. +- `gamemaster/Makefile` with the `jet` target (real generation lands in + Stage 09) and a `mocks` target. 
+- `gamemaster/go.mod` and `go.sum` with dependencies: + `github.com/redis/go-redis/v9`, `github.com/jackc/pgx/v5`, + `github.com/go-jet/jet/v2`, `github.com/pressly/goose/v3`, + `github.com/stretchr/testify`, `go.uber.org/mock`, the testcontainers + modules for postgres/redis, the OpenTelemetry stack identical to lobby, + `galaxy/cronutil`, `galaxy/notificationintent`, `galaxy/postgres`, + `galaxy/redisconn`, `galaxy/error`, `galaxy/util`. +- Update repo-level `go.work` — `./gamemaster` is already a workspace + member; verify the module path and `go.work.sum`. + +Files new: + +- the entire skeleton tree under `gamemaster/`. + +Exit criteria: + +- `go build ./gamemaster/cmd/gamemaster` succeeds. +- Running with valid env brings `/healthz` and `/readyz` up. +- `SIGTERM` returns within `GAMEMASTER_SHUTDOWN_TIMEOUT`. + +## ~~Stage 09.~~ PostgreSQL schema, migrations, jet + +Goal: + +- finalise the persistence schema and the code-generation pipeline. + +Tasks: + +- `gamemaster/internal/adapters/postgres/migrations/00001_init.sql` — + `CREATE SCHEMA IF NOT EXISTS gamemaster;` plus the four tables and + indexes from [`./README.md §Persistence Layout`](./README.md): + `runtime_records`, `engine_versions`, `player_mappings`, + `operation_log`. All time columns are `timestamptz`. +- `gamemaster/internal/adapters/postgres/migrations/migrations.go` — + `//go:embed *.sql` and `FS()` exporter, identical pattern to lobby and + rtmanager. +- `gamemaster/cmd/jetgen/main.go` — testcontainers PostgreSQL + goose up + + jet generation against the resulting database. Mirrors + `rtmanager/cmd/jetgen/main.go`. +- Generated `gamemaster/internal/adapters/postgres/jet/...` committed to + the repo. +- Wire goose migrations into `gamemaster/internal/app/runtime.go` + startup so they apply before any listener opens; non-zero exit on + failure (matches `pkg/postgres` policy). + +Files new: + +- as above. 
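For orientation, one possible shape of the `runtime_records` portion of `00001_init.sql`. Only the columns named elsewhere in this plan (`status`, `current_turn`, `current_engine_version`, `current_image_ref`, `next_generation_at`, `skip_next_tick`, `engine_health`, `finished_at`) are grounded; the types, defaults, and index are assumptions, and `README.md §Persistence Layout` remains authoritative:

```sql
CREATE SCHEMA IF NOT EXISTS gamemaster;

-- Illustrative sketch only; the authoritative column set lives in
-- README.md §Persistence Layout. All time columns are timestamptz.
CREATE TABLE IF NOT EXISTS gamemaster.runtime_records (
    game_id                uuid PRIMARY KEY,
    status                 text        NOT NULL,
    current_turn           integer     NOT NULL DEFAULT 0,
    current_engine_version text        NOT NULL,
    current_image_ref      text        NOT NULL,
    turn_schedule          text        NOT NULL,
    next_generation_at     timestamptz,
    skip_next_tick         boolean     NOT NULL DEFAULT false,
    engine_health          text,
    finished_at            timestamptz,
    created_at             timestamptz NOT NULL DEFAULT now(),
    updated_at             timestamptz NOT NULL DEFAULT now()
);

-- Lets ListDueRunning(now) find due running games without a full scan.
CREATE INDEX IF NOT EXISTS runtime_records_due_idx
    ON gamemaster.runtime_records (status, next_generation_at);
```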
+ +Exit criteria: + +- `make -C gamemaster jet` regenerates the jet code with no diff after a + clean run. +- Service start applies migrations to a fresh database and exits zero if + migrations are already applied. + +## ~~Stage 10.~~ Domain layer and ports + +Goal: + +- lock the in-memory domain model and the port interfaces for adapters. + +Tasks: + +- `gamemaster/internal/domain/runtime/model.go` — `RuntimeRecord` struct; + status enum (`StatusStarting`, `StatusRunning`, + `StatusGenerationInProgress`, `StatusGenerationFailed`, `StatusStopped`, + `StatusEngineUnreachable`, `StatusFinished`); error sentinels. +- `gamemaster/internal/domain/runtime/transitions.go` — allowed + transitions table and a CAS-friendly validator. +- `gamemaster/internal/domain/engineversion/{model.go, semver.go}` — + `EngineVersion` struct (`Version`, `ImageRef`, `Options`, `Status`); + semver parse + patch-only comparison helpers. +- `gamemaster/internal/domain/playermapping/model.go` — `PlayerMapping` + struct (`GameID`, `UserID`, `RaceName`, `EnginePlayerUUID`). +- `gamemaster/internal/domain/schedule/nexttick.go` — wraps + `cronutil.Schedule`; carries `skip_next_tick` semantics on + `Next(after, skip bool) (time.Time, skipConsumed bool)`. +- `gamemaster/internal/ports/`: + - `runtimerecordstore.go` — `Get`, `Insert`, `UpdateStatus` (CAS by + expected status), `UpdateScheduling`, `ListDueRunning`, `ListByStatus`. + - `engineversionstore.go` — `Get`, `List` (with `status` filter), + `Insert`, `Update`, `Deprecate`, `IsReferencedByActiveRuntime`. + - `playermappingstore.go` — `BulkInsert`, `Get(gameID, userID)`, + `ListByGame(gameID)`, `DeleteByGame(gameID)`. + - `operationlog.go` — `Append`, `ListByGame`. + - `streamoffsetstore.go` — `Load`, `Save` (Redis offset persistence + per consumer label). + - `engineclient.go` — narrow surface GM uses: `Init`, `Status`, `Turn`, + `BanishRace`, `ExecuteCommands`, `PutOrders`, `GetReport`. 
+ - `lobbyclient.go` — `GetMemberships(ctx, gameID) ([]Membership, error)`. + - `rtmclient.go` — `Stop(ctx, gameID, reason) error`, + `Patch(ctx, gameID, imageRef) error`, `Restart` (reserved; not in v1 + feature scope). + - `notificationpublisher.go` — `Publish(ctx, intent) error`. + - `lobbyeventspublisher.go` — `PublishSnapshotUpdate`, + `PublishGameFinished`. +- `//go:generate mockgen` directive next to each interface declaration. + +Files new: + +- as above. + +Exit criteria: + +- the package compiles. +- every interface has a `_ ports.X = (*Y)(nil)` assertion slot ready for + the adapters that follow. +- `go test ./gamemaster/internal/domain/...` passes. + +## ~~Stage 11.~~ Persistence adapters + +Goal: + +- implement the four PostgreSQL stores and the Redis offset store. + +Tasks: + +- `gamemaster/internal/adapters/postgres/runtimerecordstore/store.go` + using jet. CAS semantics on `UpdateStatus` (expected status comparison + inside the SQL `UPDATE ... WHERE game_id = $1 AND status = $2` + pattern). `UpdateScheduling` mutates `next_generation_at` and + `skip_next_tick` together. +- `gamemaster/internal/adapters/postgres/engineversionstore/store.go`. + `IsReferencedByActiveRuntime` joins against + `runtime_records WHERE status NOT IN ('finished','stopped')`. +- `gamemaster/internal/adapters/postgres/playermappingstore/store.go`. + `BulkInsert` is a single `INSERT ... ON CONFLICT DO NOTHING`. +- `gamemaster/internal/adapters/postgres/operationlog/store.go`. +- `gamemaster/internal/adapters/redisstate/streamoffsets/store.go` + (mirror Lobby's and RTM's `redisstate/streamoffsets`). +- For each adapter: store-level integration tests against testcontainers + PostgreSQL or Redis. CAS semantics on `runtime_records.UpdateStatus` + are verified by an explicit concurrent-update test (only one of two + callers wins). The semver-patch comparison in `engineversion` is + verified against a curated table of cases. + +Files new: + +- as above and per-package `_test.go`. 
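The semver patch-only comparison that the curated table exercises could be sketched as below. The names `parseSemver` and `patchOnlyUpgrade` are illustrative, and pre-release/build metadata are deliberately out of scope for the sketch; the real helpers live in `gamemaster/internal/domain/engineversion`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type semver struct{ major, minor, patch int }

// parseSemver handles the plain MAJOR.MINOR.PATCH shape; pre-release and
// build metadata are out of scope for this sketch.
func parseSemver(s string) (semver, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return semver{}, fmt.Errorf("not MAJOR.MINOR.PATCH: %q", s)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return semver{}, fmt.Errorf("bad component %q in %q", p, s)
		}
		nums[i] = n
	}
	return semver{nums[0], nums[1], nums[2]}, nil
}

// patchOnlyUpgrade reports whether target is a strict patch-level upgrade
// of current — same major and minor, higher patch — the rule adminpatch
// enforces before calling RTM.
func patchOnlyUpgrade(current, target string) (bool, error) {
	c, err := parseSemver(current)
	if err != nil {
		return false, err
	}
	t, err := parseSemver(target)
	if err != nil {
		return false, err
	}
	return c.major == t.major && c.minor == t.minor && t.patch > c.patch, nil
}

func main() {
	for _, tc := range [][2]string{{"1.4.2", "1.4.3"}, {"1.4.2", "1.5.0"}, {"1.4.2", "1.4.2"}} {
		ok, _ := patchOnlyUpgrade(tc[0], tc[1])
		fmt.Printf("%s -> %s: %v\n", tc[0], tc[1], ok)
	}
}
```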
+ +Exit criteria: + +- store tests pass on a CI runner with Docker available. + +## ~~Stage 12.~~ External clients (engine, lobby, RTM, notification, lobby-events) + +Goal: + +- ship the HTTP and Redis adapters that GM uses to talk to the engine, + Lobby internal API, RTM internal API, the notification stream, and the + lobby-events stream. + +Tasks: + +- `gamemaster/internal/adapters/engineclient/client.go` — REST client + over an `otelhttp`-wrapped `http.Client`. Implements `ports.EngineClient` + by calling the renamed admin endpoints (`/api/v1/admin/init`, + `/admin/status`, `/admin/turn`, `/admin/race/banish`) and the player + endpoints (`/api/v1/command`, `/api/v1/order`, `/api/v1/report`). + Builds and consumes the existing JSON shapes from `game/openapi.yaml`. +- `gamemaster/internal/adapters/lobbyclient/client.go` — REST client for + `GET /api/v1/internal/games/{game_id}/memberships`. Returns a typed + `Membership` slice. +- `gamemaster/internal/adapters/rtmclient/client.go` — REST client for + `POST /api/v1/internal/runtimes/{game_id}/stop` and `/patch`. +- `gamemaster/internal/adapters/notificationpublisher/publisher.go` — + thin XADD wrapper over `notification:intents` using + `galaxy/notificationintent` constructors. +- `gamemaster/internal/adapters/lobbyeventspublisher/publisher.go` — + XADD wrapper for `gm:lobby_events`. Two methods: + `PublishSnapshotUpdate(ctx, msg)` and + `PublishGameFinished(ctx, msg)`. Schema enforced inline against + `runtime-events-asyncapi.yaml`. +- `gamemaster/internal/adapters/mocks/` — `mockgen`-generated mocks for + every `ports.*` interface. Regenerated by `make -C gamemaster mocks`. +- Per-adapter unit tests with mocks for the clients (httptest server for + REST adapters; miniredis for the publishers). + +Files new: + +- as above. + +Exit criteria: + +- mocks regenerate cleanly via `go generate`. +- unit tests pass. +- `go test ./gamemaster/internal/adapters/...` passes. 
+
+## ~~Stage 13.~~ Service: register-runtime
+
+Goal:
+
+- end-to-end `register-runtime` operation: validate, persist initial
+  record, call engine `/admin/init`, persist player mappings, mark
+  running, schedule first turn.
+
+Tasks:
+
+- `gamemaster/internal/service/registerruntime/service.go` orchestrator,
+  following the flow from [`./README.md §Lifecycles → Register-runtime`](./README.md):
+  - validate envelope;
+  - reject if `runtime_records.{game_id}` exists;
+  - resolve `image_ref` for `target_engine_version` from
+    `engine_versions`;
+  - persist `runtime_records.status=starting`;
+  - call engine `/admin/init`;
+  - persist `player_mappings` rows from the engine response;
+  - CAS `status: starting → running`, persist `current_turn=0` and
+    initial `next_generation_at`;
+  - append `operation_log`;
+  - publish `runtime_snapshot_update`;
+  - return persisted runtime record.
+- Failure paths: roll back `runtime_records` on engine failure; ensure no
+  orphan `player_mappings` rows; record failure in `operation_log`.
+- Unit tests cover happy path, repeated registration of the same
+  `game_id` (returns `conflict`), engine 4xx (`engine_validation_error`),
+  engine 5xx (`engine_unreachable`), missing engine version
+  (`engine_version_not_found`), partial-rollback paths.
+
+Files new:
+
+- `gamemaster/internal/service/registerruntime/{service.go, service_test.go,
+  errors.go}`.
+
+Exit criteria:
+
+- service-level tests pass.
+
+## ~~Stage 14.~~ Service: engine version registry CRUD + image-ref resolve
+
+Goal:
+
+- the registry surface used by Lobby's start flow and by Admin Service.
+ +Tasks: + +- `gamemaster/internal/service/engineversion/service.go`: + - `List(ctx, statusFilter)` — list versions optionally filtered by + `status`; + - `Get(ctx, version)` — read one; + - `Create(ctx, version, imageRef, options)` — validate semver, + validate Docker reference shape, persist; + - `Update(ctx, version, patch)` — partial update (`image_ref`, + `options`, `status`); + - `Deprecate(ctx, version)` — set `status=deprecated`; + - `Delete(ctx, version)` — hard delete; rejected with + `engine_version_in_use` if `IsReferencedByActiveRuntime` returns + true; + - `ResolveImageRef(ctx, version)` — read `image_ref` only; this is the + hot path used by Lobby. +- Unit tests cover create-validate, delete-when-active rejection, and + semver shape validation. Resolve is tested against a seeded table of + versions. + +Files new: + +- `gamemaster/internal/service/engineversion/{service.go, service_test.go, + errors.go}`. + +Exit criteria: + +- service-level tests pass. + +## ~~Stage 15.~~ Service: scheduler + turn generation + snapshot publisher + +Goal: + +- the heart of GM: the periodic scheduler and the turn-generation flow, + with snapshot publication and finish detection. + +Tasks: + +- `gamemaster/internal/service/turngeneration/service.go`: + - input: `gameID`, `trigger ∈ {scheduler, force}`; + - CAS `status: running → generation_in_progress`; + - call engine `/admin/turn`; + - on success: persist `current_turn`, evaluate `finished`, branch: + - finished: CAS `status → finished`, persist `finished_at`, + `PublishGameFinished`, publish `game.finished` notification, return; + - not finished: CAS `status → running`, recompute + `next_generation_at` (skip a tick if `skip_next_tick=true`, + then clear), `PublishSnapshotUpdate`, publish `game.turn.ready` + notification, return; + - on failure: CAS `status → generation_failed`, publish + `runtime_snapshot_update` reflecting the new status, publish + `game.generation_failed` admin notification, return. 
+- `gamemaster/internal/service/scheduler/service.go`: + - thin wrapper that builds the next-tick value from + `domain/schedule.NextTick` given `turn_schedule` and + `skip_next_tick`; + - reused by both the ticker worker (Stage 19 wires it) and by the + `force-next-turn` admin op (Stage 17). +- `gamemaster/internal/worker/schedulerticker/worker.go`: + - 1-second loop; + - calls `runtime_records.ListDueRunning(now)` and runs + `turngeneration.Run(ctx, gameID, scheduler)` per game; + - serialises per-`game_id` calls (one in-flight per game; concurrent + games proceed in parallel). +- Unit tests cover happy path, finish detection, force trigger with skip + consumption, generation failure, CAS contention with a concurrent + external status change (e.g., admin stop). +- Player turn stats are derived from `StateResponse.player[]` and + projected to `{user_id, planets, population}` via + `playermappingstore.ListByGame`. + +Files new: + +- `gamemaster/internal/service/turngeneration/{service.go, service_test.go, + errors.go}`, + `gamemaster/internal/service/scheduler/{service.go, service_test.go}`, + `gamemaster/internal/worker/schedulerticker/{worker.go, worker_test.go}`. + +Exit criteria: + +- service-level tests pass. + +## ~~Stage 16.~~ Service: hot-path command + order + report + membership cache + +Goal: + +- the gateway-facing trio: command execution, order submission, report + reading. Membership cache and the invalidation hook. + +Tasks: + +- `gamemaster/internal/service/membership/cache.go`: + - in-process `map[gameID]entry{members map[userID]MembershipStatus, + loadedAt}`; + - `Resolve(ctx, gameID, userID) (status, error)` — checks cache, falls + back to `lobbyclient.GetMemberships` on miss or TTL expiry; + - `Invalidate(gameID)` — purges the cache entry; + - LRU eviction governed by + `GAMEMASTER_MEMBERSHIP_CACHE_MAX_GAMES`. 
+- `gamemaster/internal/service/commandexecute/service.go`: + - input: `gameID`, `userID`, payload `{commands:[…]}`; + - validate `runtime_records.{game_id}` exists with + `status=running`; + - resolve membership; reject if not active; + - resolve `race_name` from `playermappingstore`; + - call engine `/api/v1/command` with `CommandRequest{actor=race_name, + cmd=…}`; + - return engine response verbatim. +- `gamemaster/internal/service/orderput/service.go`: identical structure, + calls `/api/v1/order`. +- `gamemaster/internal/service/reportget/service.go`: input + `{gameID, userID, turn}`; resolves `race_name`; calls + `/api/v1/report?player=…&turn=…`; returns body verbatim. +- Unit tests: each service covers happy path, runtime-not-running, + forbidden, engine 4xx, engine 5xx; membership cache tests cover hit, + miss, TTL expiry, invalidate. + +Files new: + +- `gamemaster/internal/service/membership/{cache.go, cache_test.go}`, + `gamemaster/internal/service/commandexecute/{service.go, service_test.go}`, + `gamemaster/internal/service/orderput/{service.go, service_test.go}`, + `gamemaster/internal/service/reportget/{service.go, service_test.go}`. + +Exit criteria: + +- service-level tests pass. + +## ~~Stage 17.~~ Service: admin operations (stop, force-next-turn, patch, banish, liveness) + +Goal: + +- the remaining service-layer operations: admin/runtime control plus the + Lobby-facing liveness reply. + +Tasks: + +- `gamemaster/internal/service/adminstop/service.go`: + - input `{gameID, reason}`; + - call `rtmclient.Stop(ctx, gameID, reason)`; + - on success: CAS `runtime_records.status: * → stopped`; append + `operation_log`; publish `runtime_snapshot_update`. +- `gamemaster/internal/service/adminforce/service.go`: + - run `turngeneration.Run(ctx, gameID, force)` synchronously; + - on success, set `runtime_records.skip_next_tick = true` (the next + scheduler-driven `Next` consumes it). 
+- `gamemaster/internal/service/adminpatch/service.go`:
+  - input `{gameID, version}`;
+  - resolve new `image_ref` via `engineversion.ResolveImageRef`;
+  - validate semver-patch against current
+    `runtime_records.current_engine_version`; reject with
+    `semver_patch_only` otherwise;
+  - call `rtmclient.Patch(ctx, gameID, imageRef)`;
+  - on success: persist new `current_image_ref` and
+    `current_engine_version`; append `operation_log`.
+- `gamemaster/internal/service/adminbanish/service.go`:
+  - input `{gameID, raceName}`;
+  - validate a mapping for `raceName` exists via
+    `playermappingstore.ListByGame(gameID)`;
+  - call engine `/admin/race/banish`;
+  - append `operation_log`.
+- `gamemaster/internal/service/livenessreply/service.go`:
+  - lookup `runtime_records.{game_id}`;
+  - return `{ready: status==running, status: <current status>}`.
+- Unit tests for each service cover happy path and each documented error
+  code.
+
+Files new:
+
+- `gamemaster/internal/service/adminstop/...`,
+  `gamemaster/internal/service/adminforce/...`,
+  `gamemaster/internal/service/adminpatch/...`,
+  `gamemaster/internal/service/adminbanish/...`,
+  `gamemaster/internal/service/livenessreply/...`.
+
+Exit criteria:
+
+- service-level tests pass.
+
+## ~~Stage 18.~~ Async consumer: `runtime:health_events`
+
+Goal:
+
+- bring runtime health into GM's view per game and propagate to Lobby
+  via the snapshot stream.
+
+Tasks:
+
+- `gamemaster/internal/worker/healtheventsconsumer/worker.go`:
+  - XREADs `runtime:health_events` with a persisted offset (via
+    `streamoffsetstore`);
+  - decodes the AsyncAPI envelope from RTM;
+  - updates `runtime_records.engine_health` per `game_id`;
+  - emits a debounced `runtime_snapshot_update` only when the summary
+    string changes.
+- The summary derivation rule: + - `healthy` ⇒ summary `healthy`; + - `probe_failed` after threshold ⇒ summary `probe_failed`; + - `inspect_unhealthy` ⇒ summary `inspect_unhealthy`; + - `container_exited` ⇒ summary `exited` and CAS `status → + engine_unreachable`; + - `container_oom` ⇒ summary `oom` and CAS `status → + engine_unreachable`; + - `container_disappeared` ⇒ summary `disappeared` and CAS + `status → engine_unreachable`. +- Unit tests use `miniredis` and the AsyncAPI fixture from + `rtmanager/api/runtime-health-asyncapi.yaml`. + +Files new: + +- `gamemaster/internal/worker/healtheventsconsumer/{worker.go, worker_test.go}`. + +Exit criteria: + +- worker tests pass. + +## ~~Stage 19.~~ Internal REST handlers + +Goal: + +- ship the gateway-, Lobby-, and Admin-facing REST surface backed by + the service layer. + +Tasks: + +- `gamemaster/internal/api/internalhttp/handlers/{registerruntime, + getruntime, listruntimes, forcenextturn, stopruntime, patchruntime, + banishrace, invalidatememberships, gameliveness, listengineversions, + createengineversion, getengineversion, updateengineversion, + deprecateengineversion, resolveengineversionimageref, executecommands, + putorders, getreport}.go` — one file per operation, each delegating to + the corresponding service. JSON in / JSON out. Unknown JSON fields + rejected with `invalid_request`. +- Error envelope identical to lobby and rtmanager. +- Wiring under the existing internal HTTP listener; route registration + in `gamemaster/internal/app/wiring.go`. +- Handler-level table-driven tests. +- OpenAPI conformance test that loads `api/internal-openapi.yaml` and + asserts every defined operation is reachable and matches its declared + response shape. + +Files new: + +- handlers + tests + the conformance test + `gamemaster/api/openapi_conformance_test.go`. + +Exit criteria: + +- OpenAPI conformance test passes for every endpoint. +- Handlers reject unknown JSON fields. + +## Stage 20. 
Lobby refactor + +Goal: + +- complete the Lobby side of the new image-resolve and membership + invalidation contract. + +Tasks: + +- Replace `lobby/internal/domain/engineimage/resolver.go` with a thin + GM-client wrapper. The package goes away; the call site in + `lobby/internal/service/startgame/service.go` switches from + `engineimage.Resolver{}.Resolve(version)` to + `gmClient.ResolveImageRef(ctx, version)`. +- Drop `LOBBY_ENGINE_IMAGE_TEMPLATE` from + `lobby/internal/config/{config.go, env.go, validation.go}`. Remove the + validation function and the related env-var test cases. +- Add `InvalidateMemberships(ctx, gameID) error` to + `lobby/internal/ports/gmclient.go`. Regenerate the `mockgen`-mock and + update the inmem fake to record invocations. +- Wire the new call from: + - `lobby/internal/service/approveapplication/service.go` — post-commit; + - `lobby/internal/service/rejectapplication/service.go` — post-commit + (only if a reservation existed prior); + - `lobby/internal/service/redeeminvite/service.go` — post-commit; + - `lobby/internal/service/removemember/service.go` — post-commit + (already in scope of removal); + - `lobby/internal/service/blockmember/service.go` — post-commit; + - `lobby/internal/worker/userlifecycle/consumer.go` — post-commit per + game in the cascade. +- Failed invalidation is logged at `warn` and incremented in the + existing `lobby.notification.publish_attempts` style metric (or a new + `lobby.gm_invalidation.publish_attempts`) but does not roll back the + business commit. TTL on GM is the safety net. +- Update Lobby unit tests, in particular the start-flow tests (replace + `engineimage` mock with `gmclient.ResolveImageRef` mock) and the + membership-mutation tests (assert `InvalidateMemberships` was called + post-commit). +- Update `lobby/api/internal-openapi.yaml` only if any new field + surfaces (none expected; the call shape is on Lobby's outbound side, + not on its REST surface). 
+ +Files touched: + +- `lobby/internal/service/{startgame, approveapplication, + rejectapplication, redeeminvite, removemember, blockmember}/`, + `lobby/internal/worker/userlifecycle/`, + `lobby/internal/config/{config.go, env.go, validation.go}`, + `lobby/internal/ports/gmclient.go`, + `lobby/internal/adapters/gmclient/client.go`, + `lobby/internal/adapters/mocks/gmclient/...`, + `lobby/internal/adapters/gmclientinmem/...` (if the inmem fake + exists; otherwise the mockgen mock plus the migration described in + RTM stage 22 is enough). + +Files removed: + +- `lobby/internal/domain/engineimage/` (entire package). + +Exit criteria: + +- `go test ./lobby/...` passes. +- `LOBBY_ENGINE_IMAGE_TEMPLATE` no longer appears in any Lobby source or + documentation. +- Lobby's start-flow integration test still passes against a stub + `gmclient` that returns `image_ref` synchronously. + +## Stage 21. Service-local integration suite + +Goal: + +- end-to-end suite running against testcontainers PostgreSQL + Redis + + the real `galaxy/game` engine container. + +Tasks: + +- `gamemaster/integration/harness/` — set up PostgreSQL with + goose-applied migrations; Redis (testcontainers Redis for + coordination suites that exercise streams); ensure the Docker bridge + network exists; build `galaxy/game` test image once per package run + with `sync.Once`; tear everything down via `t.Cleanup`. Reuse the + RTM-built image where possible (skip rebuilding when present). +- `gamemaster/integration/registerruntime_test.go` — register-runtime + happy path: GM persists the runtime record, calls engine + `/admin/init`, persists `player_mappings`, transitions to `running`, + publishes a `runtime_snapshot_update`. Engine answers with a real + `StateResponse`. +- `gamemaster/integration/scheduler_test.go` — schedules a five-second + turn cron, observes one tick, asserts engine `/admin/turn` was hit and + `current_turn` advanced. 
Force-next-turn test asserts `skip_next_tick` + consumes the next regular tick. +- `gamemaster/integration/hotpath_test.go` — full command, order, and + report round-trips against the real engine. Membership invalidation + hook test asserts the cache flushes on demand. +- `gamemaster/integration/adminops_test.go` — admin stop calls a stub + RTM and asserts the runtime record transitions to `stopped`. Admin + patch with a non-patch semver target fails with `semver_patch_only`. + Admin banish hits the engine endpoint. +- `gamemaster/integration/healthevents_test.go` — publishes a fake + `runtime:health_events` entry, asserts the consumer updates + `engine_health` and emits a debounced snapshot. +- `gamemaster/integration/notification_test.go` — observe + `notification:intents` after a successful turn (`game.turn.ready`), + after a finish (`game.finished`), and after a forced engine failure + (`game.generation_failed` admin email). + +Files new: + +- as above. + +Exit criteria: + +- `go test ./gamemaster/integration/...` passes locally with Docker + available. +- CI runs the suite under a profile that exposes the Docker socket. + +## Stage 22. Inter-service test: Lobby ↔ GM + +Goal: + +- exercise the new image-ref resolve, register-runtime, and membership + invalidation paths end-to-end without RTM in the loop. + +Tasks: + +- `integration/lobbygm/` (top-level integration directory, mirroring + existing `integration/lobbyrtm`): runs real Lobby, real GM, real + PostgreSQL, real Redis, a stub RTM that simply returns success on + `runtime:start_jobs`, and the real `galaxy/game` test engine container. +- Scenarios: + - Lobby creates a game, resolves `image_ref` from GM, publishes a + start_job, the stub RTM acks success, Lobby calls + `register-runtime` on GM, GM `/admin/init`s the engine, GM transitions + to `running`, GM publishes `runtime_snapshot_update`, Lobby updates + its denormalised view. 
+ - One full turn generation cycle: scheduler ticks, GM calls engine + `/admin/turn`, GM publishes `runtime_snapshot_update`, Lobby's + per-game stats aggregate updates. + - Membership change: an admin removes a member; Lobby's + `removemember` post-commit calls GM `invalidate-memberships`; the + next player command from that user fails with `forbidden`. + - Game finish: engine returns `finished:true`; GM publishes + `game_finished`; Lobby transitions the platform game record to + `finished` and runs the capability evaluator. + +Files new: + +- as above. + +Exit criteria: + +- all scenarios pass in CI when the Docker socket is available. + +## Stage 23. Inter-service test: Lobby ↔ GM ↔ RTM (full happy path) + +Goal: + +- the canonical end-to-end test covering the whole running-game pipeline. + +Tasks: + +- `integration/lobbygmrtm/`: runs real Lobby, real GM, real RTM, real + PostgreSQL, real Redis, and the real `galaxy/game` test engine + container. +- Scenarios: + - Happy path: enrollment → start → RTM container → GM register-runtime + → engine `/admin/init` → first player command → first scheduled turn + → engine `finished:true` → GM `game_finished` → Lobby transitions to + `finished` → RTM cleanup TTL. + - Failure path A: RTM reports `start_config_invalid` on + `runtime:job_results`; Lobby transitions the game to `start_failed`; + no GM register-runtime is attempted. + - Failure path B: container starts but GM is unavailable when Lobby + calls `register-runtime`; Lobby transitions the game to `paused` and + publishes `lobby.runtime_paused_after_start`; once GM comes back, + Lobby's resume flow calls GM `/liveness`, receives `ready=true`, + re-issues `register-runtime`, and the game reaches `running`. + +Files new: + +- as above. + +Exit criteria: + +- all scenarios pass in CI when the Docker socket is available. + +## Stage 24. 
Service-local docs
+
+Goal:
+
+- drop per-stage decisions captured during this plan into discoverable
+  service-local documentation, mirroring `lobby/docs/` and
+  `rtmanager/docs/`.
+
+Tasks:
+
+- `gamemaster/docs/README.md` — index pointing at the four content docs
+  and the postgres-migration record.
+- `gamemaster/docs/runtime.md` — components, processes, in-memory state
+  of each worker.
+- `gamemaster/docs/flows.md` — Mermaid diagrams for: register-runtime,
+  turn generation, force-next-turn skip, hot-path command, admin patch,
+  finish, health consumption, banish.
+- `gamemaster/docs/runbook.md` — operator scenarios: "engine became
+  unreachable", "turn generation failed and stuck", "patch upgrade",
+  "manual force-next-turn", "engine version registry rotation",
+  "membership cache appears stale".
+- `gamemaster/docs/examples.md` — env-var examples per environment
+  (dev / test / prod skeletons), example payloads for each stream and
+  each REST endpoint.
+- `gamemaster/docs/postgres-migration.md` — decision record for the
+  schema (mirrors `notification/docs/postgres-migration.md` style).
+- Add per-stage decision records under `gamemaster/docs/stage-*.md`
+  for any stage that produced a noteworthy decision (mirroring the RTM
+  pattern). At minimum:
+  - `stage11-persistence-adapters.md`,
+  - `stage12-external-clients.md`,
+  - `stage15-scheduler-and-turn-generation.md`,
+  - `stage16-membership-cache-and-invalidation.md`,
+  - `stage17-admin-operations.md`,
+  - `stage18-health-events-consumer.md`,
+  - `stage20-lobby-refactor.md`.
+
+Files new:
+
+- all of the above.
+
+Exit criteria:
+
+- the README of GM links to `docs/README.md`.
+- a reviewer can find any operational how-to within two clicks.
+
+## Final Acceptance Criteria
+
+- `go build ./...` from the repository root succeeds.
+- `go test ./...` from the repository root passes.
+- `go test -tags=integration ./gamemaster/integration/...` passes when
+  Docker is available.
+- `go test ./integration/lobbygm/...` and
+  `go test ./integration/lobbygmrtm/...` pass when Docker is available.
+- `make -C gamemaster jet` regenerates jet code with no diff after a
+  clean run.
+- `make -C gamemaster mocks` regenerates mock code with no diff after a
+  clean run.
+- Manual smoke: bring Lobby + GM + RTM + the rest of the stack up via
+  the existing dev compose; create a game; observe a real
+  `galaxy-game-{game_id}` container; play one turn round-trip; observe
+  a `runtime_snapshot_update` on `gm:lobby_events`; force-next-turn;
+  observe the next scheduled tick is skipped; stop the game; the
+  container moves to `exited`.
+- Documentation across `ARCHITECTURE.md`, `gamemaster/`, `lobby/`,
+  `notification/`, `game/`, and `rtmanager/` is internally consistent.
+
+## Out of Scope
+
+- Multi-instance GM with leader election (`Game Master` runs as a single
+  process in v1).
+- Engine state file management (backup, archival, host-side cleanup).
+- Direct gateway routing of admin `message_type` values (admin operations
+  land via Admin Service in a later iteration; v1 exposes only the GM
+  internal REST surface).
+- TLS / mTLS on the internal listener.
+- Engine-version automatic patch upgrades (manual admin operation only).
+- A pause/resume flow on GM's side beyond the liveness-check reply.
+
+## Risks and Notes
+
+- The membership invalidation hook from Lobby into GM is a deliberate
+  tight coupling. TTL stays as the safety net for any failed invalidation;
+  the explicit hook only narrows the staleness window. Failure to
+  invalidate is logged but never rolls back Lobby state. This trade-off
+  is recorded in [`./README.md` §Hot Path](./README.md).
+- Lobby refactor (Stage 20) gates on GM stages 14 (engine version registry
+  resolve endpoint) and 19 (handlers wired). 
Once Lobby switches to GM
+  for image-ref resolution, Lobby cannot start a game when GM is
+  unavailable; this is documented as the new failure mode in
+  `lobby/README.md` (Stage 03).
+- Engine path rename (Stage 05) is internal to `galaxy/game`. No other
+  service today calls `/api/v1/init`, `/api/v1/status`, or
+  `/api/v1/turn` (RTM probes only `/healthz`); the rename is therefore a
+  contained change inside the engine module. The engine's domain code
+  owns the conditional logic that fills `StateResponse.finished` and the
+  body-level mechanics of `banish`.
+- GM single-instance is a single point of failure for turn generation in
+  v1. The trade-off is acceptable for the prototype and is documented in
+  `gamemaster/README.md §Non-Goals`.
+- Pre-launch single-init policy applies to GM exactly as documented in
+  `ARCHITECTURE.md §Persistence Backends`: schema evolves by editing
+  `00001_init.sql` until first production deploy.
diff --git a/gamemaster/README.md b/gamemaster/README.md
new file mode 100644
index 0000000..db3f829
--- /dev/null
+++ b/gamemaster/README.md
@@ -0,0 +1,975 @@
+# Game Master
+
+`Game Master` (GM) is the only Galaxy platform service permitted to talk to
+running game engine containers. It owns runtime and operational state of
+already-running games, the engine version registry, the platform mapping of
+`(user_id ↔ race_name ↔ engine_player_uuid)`, the per-game turn scheduler,
+and the synchronous and asynchronous boundaries that other services use to
+interact with running games.
+
+## References
+
+- [`../ARCHITECTURE.md`](../ARCHITECTURE.md) — system architecture, §8 Game
+  Master.
+- [`../TESTING.md`](../TESTING.md) §8 — testing matrix for GM.
+- [`./PLAN.md`](./PLAN.md) — staged implementation plan.
+- [`./docs/README.md`](./docs/README.md) — service-local documentation entry
+  point (created at PLAN stage 24). 
+- [`./docs/stage06-contract-files.md`](./docs/stage06-contract-files.md) — + decisions behind the OpenAPI and AsyncAPI specs frozen at PLAN stage 06. +- [`./docs/stage07-notification-catalog-audit.md`](./docs/stage07-notification-catalog-audit.md) — + notification catalog audit and producer-side freeze test added at PLAN stage 07. +- [`./docs/stage08-module-skeleton.md`](./docs/stage08-module-skeleton.md) — + module skeleton wiring decisions (config groups, telemetry instruments, + Makefile targets, deferred dependencies) recorded at PLAN stage 08. +- [`./docs/stage09-postgres-migration.md`](./docs/stage09-postgres-migration.md) — + PostgreSQL schema, embedded migration, jet generation pipeline, and + runtime wiring landed at PLAN stage 09. +- [`./docs/stage10-domain-and-ports.md`](./docs/stage10-domain-and-ports.md) — + domain types, port interfaces, and the six stage-10 decisions + (operation domain package, membership DTO placement, engine-version + options shape, schedule wrapper signature, recovery transition, + deferred mock destination) landed at PLAN stage 10. +- [`./docs/stage11-persistence-adapters.md`](./docs/stage11-persistence-adapters.md) — + PostgreSQL stores (`runtimerecordstore`, `engineversionstore`, + `playermappingstore`, `operationlog`), the Redis offset store, and + the eight stage-11 decisions (sqlx/pgtest local clones, CAS + pattern, port-level Now extension, domain conflict sentinels, jsonb + cast, idempotent Deprecate, multi-row BulkInsert, miniredis + dependency) landed at PLAN stage 11. 
+- [`./docs/stage12-external-clients.md`](./docs/stage12-external-clients.md) — + outbound adapters (engine, Lobby, Runtime Manager, notification + intent publisher, lobby-events publisher) and the seven stage-12 + decisions (per-call engine base URL, dual engine timeout dispatch, + engine population rounding, Lobby pagination cap, no extra RTM + sentinels, AsyncAPI-aligned XADD encoding for `gm:lobby_events`, + Makefile mocks-target guard) landed at PLAN stage 12. +- [`./docs/stage13-register-runtime.md`](./docs/stage13-register-runtime.md) — + register-runtime service-layer orchestrator and the five + stage-13 decisions (`RuntimeRecordStore.Delete` extension, engine + 4xx/5xx classification split, engine response validated as + `engine_protocol_violation`, initial snapshot carries `player_turn_stats` + from `/admin/init`, two-flag rollback gating) landed at PLAN + stage 13. +- [`./docs/stage14-engine-version-registry.md`](./docs/stage14-engine-version-registry.md) — + engine version registry service-layer orchestrator (List, Get, + Create, Update, Deprecate, Delete, ResolveImageRef) and the five + stage-14 decisions (`EngineVersionStore.Delete` port extension, + reference probe before hard delete, new `engine_version_delete` + op_kind in schema and domain, `operation_log.game_id` overloaded + as audit subject for registry entries, JSON-object validation for + `options`) landed at PLAN stage 14. +- [`./docs/stage15-scheduler-and-turn-generation.md`](./docs/stage15-scheduler-and-turn-generation.md) — + scheduler ticker, turn-generation orchestrator, and snapshot + publisher and the seven stage-15 decisions + (`LobbyClient.GetGameSummary` extension with fail-soft `game_name` + fallback, telemetry-only `Trigger` parameter, two-CAS pattern with + external-mutation conflict, single-snapshot-per-outcome cadence, + player_mappings as recipient source, stateless scheduler utility, + in-flight set on the ticker) landed at PLAN stage 15. 
+- [`./docs/stage16-membership-cache-and-invalidation.md`](./docs/stage16-membership-cache-and-invalidation.md) — + hot-path services (`commandexecute`, `orderput`, `reportget`), + membership cache, and the six stage-16 decisions (no + `runtime_not_running` for reports, GM-side envelope rewrite + `commands`→`cmd` with injected `actor`, hot-path skips + `operation_log`, hand-rolled per-game inflight tracker, raw status + string return, missing-mapping surfaces as `forbidden`) landed at + PLAN stage 16. +- [`./docs/stage17-admin-operations.md`](./docs/stage17-admin-operations.md) — + admin service-layer operations (`adminstop`, `adminforce`, + `adminpatch`, `adminbanish`, `livenessreply`) and the six + stage-17 decisions (`RuntimeRecordStore.UpdateImage` extension, + `adminstop` idempotent on terminal statuses and `conflict` on + `starting`, `adminforce` always sets `skip_next_tick`, + `adminbanish` without status check and missing race surfaces as + `forbidden`, `livenessreply` 200 + empty status on + `runtime_not_found`, RTM failures map to `service_unavailable`) + landed at PLAN stage 17. +- [`./docs/stage18-health-events-consumer.md`](./docs/stage18-health-events-consumer.md) — + `runtime:health_events` consumer worker and the seven stage-18 + decisions (event-type taxonomy expanded to seven values with + `container_started` and `probe_recovered`, CAS-conflict fallback to + health-only update, new `RuntimeRecordStore.UpdateEngineHealth` + port method, in-memory dedupe of last-emitted summaries, + read-after-write snapshot construction, `health_events` stream + offset label, worker wiring deferred to Stage 19) landed at PLAN + stage 18. +- [`./api/internal-openapi.yaml`](./api/internal-openapi.yaml) — internal + trusted REST contract. +- [`./api/runtime-events-asyncapi.yaml`](./api/runtime-events-asyncapi.yaml) — + `gm:lobby_events` Redis Stream contract. 
+- [`../game/README.md`](../game/README.md) — game engine container contract + (env, ports, admin and player REST surfaces, `/healthz`). +- [`../lobby/README.md`](../lobby/README.md) — Game Lobby integration with GM. +- [`../rtmanager/README.md`](../rtmanager/README.md) — Runtime Manager + contract used synchronously by GM admin operations. + +## Purpose + +A running Galaxy game lives in exactly one Docker container managed by +`Runtime Manager`. The platform must: + +- register a freshly started container with platform-level membership; +- initialise the engine with the agreed race roster; +- accept and forward player commands and orders to the engine; +- route per-player report reads; +- generate turns according to a schedule; +- detect game finish and propagate it back to platform-level state; +- expose runtime/operational controls (force-next-turn, stop, patch, banish); +- own the catalogue of supported engine versions and resolve `image_ref` + values for `Game Lobby`. + +`Game Master` is the single component that performs these actions. It does +**not** own platform metadata of games (that is `Game Lobby`), Docker control +(that is `Runtime Manager`), or the full game state (that is the engine +container). Engine state on disk is the engine's domain; GM never reads or +writes the bind-mounted state directory. + +## Scope + +`Game Master` is the source of truth for: + +- the runtime mapping `game_id → engine_endpoint` for every running game; +- the runtime status (`starting | running | generation_in_progress | + generation_failed | stopped | engine_unreachable | finished`); +- the current turn number and the next-tick timestamp; +- the per-game `(user_id, race_name, engine_player_uuid)` triple; +- the engine version registry: `(version, image_ref, options, status)`; +- the durable history of every operation GM performed (`operation_log`); +- the latest engine health summary per game. 
+ +`Game Master` is **not** the source of truth for: + +- platform game records (created, draft, enrollment, finished metadata) — + owned by `Game Lobby`; +- container lifecycle and Docker reality — owned by `Runtime Manager`; +- in-game world state (planets, ships, science, reports) — owned by the + engine container; +- platform user identity and entitlements — owned by `User Service`; +- in-game `race_name` reservations and the Race Name Directory — owned by + `Game Lobby`. + +## Non-Goals + +- Multi-instance operation in v1. GM runs as a single process; the in-process + scheduler is authoritative. Multi-instance with leader election is an + explicit future iteration. +- Direct Docker access. GM never imports the Docker SDK; every container + operation goes through `Runtime Manager` over trusted internal REST. +- Player removal/block at platform level. `Game Lobby` owns that decision; + GM only performs the engine-side `banish` call when explicitly invoked. +- Pause/resume of a running game on the platform side. `Game Lobby.paused` + is a platform-only state; GM only answers a liveness probe used by + Lobby's resume flow. +- Automatic semver-patch upgrades. Patch is always an explicit admin + operation against a target engine version present in the registry. +- TLS or mTLS on the internal listener. GM trusts its network segment. +- Direct delivery of player-visible push events. `Notification Service` + owns user-targeted push delivery; GM publishes notification intents only. +- A separate Admin Service. GM exposes its trusted internal REST surface; + Admin Service will adopt it in a later iteration. +- Engine state file management. Backup, archival, and cleanup of the + bind-mounted state directories are operator concerns. 
+ +## Position in the System + +```mermaid +flowchart LR + Gateway["Edge Gateway"] + Lobby["Game Lobby"] + Admin["Admin Service\n(future)"] + GM["Game Master"] + RTM["Runtime Manager"] + Notify["Notification Service"] + Engine["Game Engine container\n(galaxy/game)"] + Postgres["PostgreSQL\nschema gamemaster"] + Redis["Redis\nstreams + caches"] + + Gateway -- "verified player commands\n(REST/JSON)" --> GM + Lobby -- "register-runtime,\nimage-ref resolve,\nmemberships invalidate" --> GM + Admin -- "internal REST" --> GM + GM -- "engine HTTP API" --> Engine + GM -- "stop / restart / patch" --> RTM + GM -- "notification:intents" --> Notify + GM -- "gm:lobby_events" --> Redis + Redis -- "runtime:health_events" --> GM + GM --> Postgres +``` + +`Edge Gateway` routes verified player message types (`game.command.execute`, +`game.order.put`, `game.report.get`) to GM as trusted REST/JSON after +transcoding from FlatBuffers. `Game Lobby` calls GM synchronously to +register runtimes after a successful container start, to resolve `image_ref` +from the engine version registry, to invalidate membership cache on roster +changes, and to verify GM liveness during platform resume. `Game Master` +calls `Runtime Manager` synchronously over REST for stop, restart, and +patch. `Runtime Manager` publishes `runtime:health_events`, which GM +consumes asynchronously. GM publishes `gm:lobby_events` consumed by +`Game Lobby`, and `notification:intents` consumed by `Notification Service`. 
+ +## Responsibility Boundaries + +`Game Master` is responsible for: + +- registering a freshly started container into platform-level runtime state; +- initialising the engine with the race roster received from Lobby; +- maintaining the platform mapping of `user_id`, `race_name`, and + `engine_player_uuid`; +- forwarding player commands, orders, and report reads to the engine after + authorising the actor; +- generating turns on schedule, including the force-next-turn skip rule; +- evaluating engine finish on every turn boundary; +- publishing runtime snapshot updates and the final game-finish event; +- consuming runtime health events from `Runtime Manager` and updating its + per-game health summary; +- exposing the engine version registry CRUD; +- driving admin-level runtime operations (stop, force-next-turn, patch, + banish) by calling `Runtime Manager` and the engine on demand. + +`Game Master` is not responsible for: + +- creating or stopping containers on Docker (that is `Runtime Manager`); +- evaluating whether a game is allowed to start (that is `Game Lobby`); +- deriving recipient user lists for non-game notifications (that is + `Notification Service`); +- verifying authenticated transport, signatures, freshness, and replay + (that is `Edge Gateway`); +- mapping `user_id` to platform-level membership (that is `Game Lobby`). + +## Engine Container Contract + +The engine container is `galaxy/game`. GM uses two route classes: + +| Class | Path | Purpose | +| --- | --- | --- | +| Admin (GM-only) | `POST /api/v1/admin/init` | Initialise the engine with a race roster. | +| Admin (GM-only) | `GET /api/v1/admin/status` | Read the full game state. | +| Admin (GM-only) | `PUT /api/v1/admin/turn` | Generate the next turn. | +| Admin (GM-only) | `POST /api/v1/admin/race/banish` | Deactivate a race after permanent platform removal. Body `{race_name}`. | +| Player | `PUT /api/v1/command` | Execute a batch of player commands. 
| +| Player | `PUT /api/v1/order` | Validate and store a batch of player orders. | +| Player | `GET /api/v1/report` | Fetch per-player turn report. | +| Probe | `GET /healthz` | Liveness probe used by `Runtime Manager` and operator tooling. | + +Admin paths are unauthenticated but routed only from inside the trusted +network segment that connects GM to the engine container. The engine does +not enforce caller identity — network-level segmentation is the boundary. + +`StateResponse` carries an extra boolean `finished` field. When `true` on a +turn-generation response, GM treats the game as finished and runs the +finish flow described below. The conditional logic that flips `finished` +to `true` lives in the engine's domain code and is not GM's concern. + +The engine endpoint URL is the `engine_endpoint` value handed to GM by +`Game Lobby` during `register-runtime`: `http://galaxy-game-{game_id}:8080`. +The DNS name is stable across restart and patch. + +## Runtime Surface + +### Listeners + +| Listener | Default address | Purpose | +| --- | --- | --- | +| Internal HTTP | `:8097` (`GAMEMASTER_INTERNAL_HTTP_ADDR`) | Probes (`/healthz`, `/readyz`) and the trusted REST surface for `Edge Gateway`, `Game Lobby`, and `Admin Service`. | + +There is no public listener. The internal listener is unauthenticated and +assumes a trusted network segment. Authentication of player commands has +already happened at `Edge Gateway`; GM enforces authorisation only. + +### Background workers + +| Worker | Driver | Description | +| --- | --- | --- | +| Scheduler ticker | 1 s loop | Scans `runtime_records` for due `next_generation_at`, runs the turn-generation service for each, recomputes `next_generation_at` from `turn_schedule` (skipping one tick when `skip_next_tick=true` is set). | +| `runtime:health_events` consumer | Redis Stream | XREADs from `runtime:health_events` (produced by RTM), updates `runtime_records.engine_health` summary, debounces `runtime_snapshot_update` publication. 
| + +### Startup dependencies + +In start order: + +1. PostgreSQL primary (`GAMEMASTER_POSTGRES_PRIMARY_DSN`). Embedded goose + migrations apply synchronously before any listener opens. +2. Redis master (`GAMEMASTER_REDIS_MASTER_ADDR`). +3. Telemetry exporter (OTLP grpc/http or stdout). +4. Internal HTTP listener. +5. Health-events consumer worker. +6. Scheduler ticker worker. + +A failure in any step exits the process non-zero. + +### Probes + +`/healthz` reports liveness — the process responds when the HTTP server is +alive. + +`/readyz` reports readiness — `200` only when the PostgreSQL pool can ping +the primary and the Redis master client can ping. No deeper dependency is +checked synchronously; the engine is reached only on demand. + +Both probes are documented in +[`./api/internal-openapi.yaml`](./api/internal-openapi.yaml). + +## Lifecycles + +### Register-runtime + +**Triggered by:** `Game Lobby` after a successful container start, calling +`POST /api/v1/internal/games/{game_id}/register-runtime` with body +`{engine_endpoint, members:[{user_id, race_name}], target_engine_version, +turn_schedule}`. + +**Flow on success:** + +1. Validate request shape; reject with `invalid_request` if any required + field is missing. +2. Reject with `conflict` if `runtime_records.{game_id}` already exists. +3. Resolve `image_ref` for `target_engine_version` from `engine_versions`; + reject with `engine_version_not_found` when missing. +4. Persist `runtime_records` with `status=starting`, `engine_endpoint`, + `current_image_ref`, `current_engine_version`, `turn_schedule`, and + `created_at`. +5. Call engine `POST /api/v1/admin/init` with the race-name list derived + from `members`. +6. Read `StateResponse` and persist one `player_mappings` row per player: + `(game_id, user_id, race_name, engine_player_uuid)`. +7. CAS `runtime_records.status: starting → running`. Persist + `current_turn=0` and `next_generation_at` computed from `turn_schedule`. +8. 
Append `operation_log` entry (`op_kind=register_runtime`, + `outcome=success`). +9. Publish `runtime_snapshot_update` to `gm:lobby_events`. +10. Return `200` with the persisted `runtime_records` row. + +**Failure paths:** + +| Failure | Side effect | Outcome to caller | +| --- | --- | --- | +| Invalid envelope | None | `400 invalid_request` | +| `runtime_records` already exists | None | `409 conflict` | +| Engine `/admin/init` returns 4xx | Roll back `runtime_records`; append failure to `operation_log` | `502 engine_validation_error` | +| Engine `/admin/init` returns 5xx or fails at the transport layer | Roll back; append failure | `502 engine_unreachable` | +| Engine response missing players or contains races not in roster | Roll back; append failure | `502 engine_protocol_violation` | +| PostgreSQL transaction failure | Roll back; append failure if possible | `503 service_unavailable` | + +A failed `register-runtime` leaves no `runtime_records` row and no +`player_mappings` rows. `Game Lobby` then transitions the platform game +record to `paused` (per the architecture's flow §4 forced-pause path). + +### Turn generation + +**Triggered by:** the scheduler ticker when `now >= next_generation_at` +for a game in `status=running`, or by an admin invocation of +`force-next-turn`. + +**Flow on success:** + +1. CAS `runtime_records.status: running → generation_in_progress`. If the + CAS fails (status changed concurrently), the tick is skipped silently. +2. Call engine `PUT /api/v1/admin/turn`. Engine returns `StateResponse` + with the new `turn` and the updated `player[]` array. +3. Persist `runtime_records.current_turn` and refresh + `runtime_records.engine_health` summary. +4. 
If `StateResponse.finished == true`: + - CAS `runtime_records.status: generation_in_progress → finished`; + - publish `game_finished` to `gm:lobby_events` with + `{game_id, final_turn_number, finished_at_ms, player_turn_stats[]}`; + - publish `game.finished` notification intent to all `active` members. +5. If `StateResponse.finished == false`: + - CAS `runtime_records.status: generation_in_progress → running`; + - recompute `next_generation_at` from `turn_schedule`. If + `skip_next_tick=true`, advance by one extra cron step and clear the + flag; + - publish `runtime_snapshot_update` to `gm:lobby_events` with + `{game_id, current_turn, runtime_status, engine_health_summary, + player_turn_stats[]}`; + - publish `game.turn.ready` notification intent to all `active` + members. +6. Append `operation_log` entry (`op_kind=turn_generation`, + `outcome=success`). + +**Failure paths:** + +| Failure | Side effect | Outcome | +| --- | --- | --- | +| Engine timeout / 5xx | CAS `status: generation_in_progress → generation_failed`; publish `runtime_snapshot_update`; publish `game.generation_failed` admin notification | Logged; ticker leaves the game in `generation_failed` until manual recovery (admin issues `force-next-turn` or `stop`). | +| Persistence failure after engine success | Append failure to `operation_log`; status stays `generation_in_progress` | Health-summary update on next probe will resync. | + +`player_turn_stats[]` is built from `StateResponse.player[]` by mapping +`raceName → user_id` through `player_mappings` and projecting +`{user_id, planets, population}`. `ships_built` is intentionally absent +(see [`./docs/stage01-architecture-sync.md`](./docs/stage01-architecture-sync.md)). + +### Force-next-turn + +**Triggered by:** `Admin Service` or system-admin via +`POST /api/v1/internal/runtimes/{game_id}/force-next-turn`. + +**Pre-conditions:** runtime exists, `status=running`. + +**Flow:** + +1. 
Run the turn-generation flow synchronously (the same code path the + scheduler uses). +2. After success, set `runtime_records.skip_next_tick = true`. The next + regular tick computed from `turn_schedule` is then advanced by one + extra step before being persisted as `next_generation_at`. +3. Append `operation_log` entry (`op_kind=force_next_turn`). + +The skip rule guarantees that the inter-turn spacing is never shorter than +one schedule interval, regardless of when the force is issued. + +### Game finish + +The finish flow is driven entirely by the engine signal `finished:bool`. +GM never decides finish independently. After `game_finished` is published, +`Game Lobby` transitions its platform record to `finished`, runs the +capability evaluation, and finalises Race Name Directory state. The GM +record stays in `status=finished` indefinitely; cleanup is operator-driven. + +### Banish (engine-side player removal) + +**Triggered by:** `Game Lobby` synchronously calling +`POST /api/v1/internal/games/{game_id}/race/{race_name}/banish` after a +permanent membership removal at platform level. + +**Pre-conditions:** runtime exists; `race_name` resolves to an existing +`player_mappings` row. + +**Flow:** + +1. Call engine `POST /api/v1/admin/race/banish` with `{race_name}`. +2. On engine success, append `operation_log` entry (`op_kind=banish`, + `outcome=success`). +3. Return `204` to Lobby. + +**Failure path:** engine error returns `502 engine_unreachable`. Lobby +treats this as a degraded state and may retry; the platform-level +membership stays `removed` regardless. + +### Stop + +**Triggered by:** system-admin via +`POST /api/v1/internal/runtimes/{game_id}/stop` with body `{reason}`, +where `reason ∈ {admin_request, finished, timeout}`. + +**Flow:** + +1. Call `Runtime Manager` `POST /api/v1/internal/runtimes/{game_id}/stop` + with the same `reason`. +2. CAS `runtime_records.status: * → stopped`. +3. Append `operation_log` entry. +4. 
Publish `runtime_snapshot_update` reflecting the stopped status.
+
+### Patch
+
+**Triggered by:** system-admin via
+`POST /api/v1/internal/runtimes/{game_id}/patch` with body `{version}`.
+
+**Pre-conditions:**
+
+- `engine_versions.{version}` exists with `status=active`;
+- the new version is a semver-patch of the current version (same major and
+  minor); otherwise reject with `semver_patch_only`.
+
+**Flow:**
+
+1. Resolve `image_ref` from `engine_versions.{version}`.
+2. Call `Runtime Manager`
+   `POST /api/v1/internal/runtimes/{game_id}/patch` with `{image_ref}`.
+3. On success, persist new `current_image_ref` and `current_engine_version`
+   on `runtime_records`.
+4. Append `operation_log` entry.
+
+The engine container is recreated by RTM with the same DNS name; the
+`engine_endpoint` is unchanged. GM does not call `/admin/init` again —
+the bind-mounted state directory is preserved and the engine resumes from
+the previous turn.
+
+### Liveness reply (Lobby resume)
+
+**Triggered by:** `Game Lobby` resuming a paused game, calling
+`GET /api/v1/internal/games/{game_id}/liveness`.
+
+**Flow:** if `runtime_records.{game_id}` exists and `status=running`,
+return `200 {ready: true}`. If the record exists in any other status,
+return `200 {ready: false, status: "<current status>"}`. If the record
+is missing, return `200 {ready: false, status: ""}`.
+
+This endpoint never calls the engine; it reflects GM's own view only.
+
+## Hot Path
+
+### Player commands and orders
+
+Both `game.command.execute` and `game.order.put` use the same FlatBuffers
+schema (`pkg/schema/fbs/order.fbs` `Order{updated_at, commands:[…]}`). The
+gateway transcodes the verified payload to JSON via
+`pkg/transcoder/order.go` before calling GM.
+
+**GM endpoints:**
+
+- `POST /api/v1/internal/games/{game_id}/commands` — execute now; engine
+  `PUT /api/v1/command`.
+- `POST /api/v1/internal/games/{game_id}/orders` — validate-and-store;
+  engine `PUT /api/v1/order`.
+
+Both endpoints accept body `{commands:[{cmd_id, @type, …}, …]}` and the
+`X-User-ID` header. 
The actor field on the engine call is **always** set +by GM from the authenticated user identity; GM never trusts a payload +field for actor identification. + +**Pre-conditions:** + +- `runtime_records.{game_id}` exists with `status=running`; +- the user is an `active` member of the game (cache lookup); +- `player_mappings.(game_id, user_id)` exists. + +**Errors:** + +- `runtime_not_found` — runtime missing. +- `runtime_not_running` — `runtime_status` is anything other than + `running`. +- `forbidden` — caller is not an active member. +- `engine_unreachable` — engine returned 5xx. +- `engine_validation_error` — engine returned 4xx; the body carries the + engine's per-command result (`cmd_applied`, `cmd_error_code`). + +### Reports + +**GM endpoint:** `GET /api/v1/internal/games/{game_id}/reports/{turn}` +with the `X-User-ID` header. + +**Flow:** + +1. Authorise: caller must be an active member of the game. +2. Resolve `race_name` from `player_mappings`. +3. Call engine `GET /api/v1/report?player={race_name}&turn={turn}`. +4. Return the engine response verbatim. Reports are full per-player + payloads and are never cached at the platform layer; the engine remains + the source of truth. + +### Membership cache and invalidation + +GM holds an in-process per-game TTL cache (default 30 s) of memberships +loaded from `Lobby /api/v1/internal/games/{id}/memberships`. The cache +shape is `map[user_id]MembershipStatus` plus a load timestamp. TTL is +the safety-net fallback. + +The primary invalidation mechanism is an explicit hook from Lobby: + +- Endpoint: `POST /api/v1/internal/games/{game_id}/memberships/invalidate`. +- Lobby invokes it post-commit on every operation that mutates roster: + application approval, application rejection, invite redeem, member + remove, member block, user-lifecycle cascade. +- Failed invalidation does not roll back Lobby state; the TTL safety net + catches stale data within the next 30 s. + +This is a deliberate tight coupling. 
The trade-off is recorded in +[`./PLAN.md` Stage 16](./PLAN.md). + +## Engine Version Registry + +The registry is the source of truth for which engine versions are +deployable. CRUD is exposed on the GM internal port; `Game Lobby` +consumes it synchronously to resolve `image_ref` for `target_engine_version` +just before publishing a `runtime:start_jobs` envelope. + +| Method | Path | Purpose | +| --- | --- | --- | +| `GET` | `/api/v1/internal/engine-versions` | List versions; supports `status` filter. | +| `POST` | `/api/v1/internal/engine-versions` | Create a new version with `version`, `image_ref`, optional `options`. Validates semver shape and Docker reference. | +| `GET` | `/api/v1/internal/engine-versions/{version}` | Read one version. | +| `PATCH` | `/api/v1/internal/engine-versions/{version}` | Update `image_ref`, `options`, or `status`. | +| `DELETE` | `/api/v1/internal/engine-versions/{version}` | Soft-deprecate (`status=deprecated`). Hard delete is rejected if the version is referenced by any non-finished `runtime_records` row. | +| `GET` | `/api/v1/internal/engine-versions/{version}/image-ref` | Resolve `image_ref` only. Used by Lobby's start flow. | + +`options` is a free-form `jsonb` document stored verbatim. v1 does not +enforce a schema; future engine-side options follow the engine's own +contract. + +`status` values: `active` (deployable), `deprecated` (rejected on new +starts; existing runtimes unaffected). Hard removal of a deprecated +version requires that no runtime references it. + +Lobby resolves `image_ref` synchronously per game start. If the resolve +call fails or the version is missing, Lobby fails the start with +`engine_version_not_found` and never publishes `runtime:start_jobs`. 
+ +## Trusted Surfaces + +### Internal REST + +The internal REST surface is consumed by: + +- `Edge Gateway` — verified player commands and report reads; +- `Game Lobby` — register-runtime, image-ref resolve, membership invalidate, + banish, liveness reply; +- `Admin Service` (future) — full administrative operations; +- platform probes — `/healthz`, `/readyz`. + +The listener is unauthenticated; downstream services rely on network +segmentation. Caller identity for audit is recorded from the optional +`X-Galaxy-Caller` header (`gateway`, `lobby`, `admin`) and reflected as +`op_source` in `operation_log` (`gateway_player`, `lobby_internal`, +`admin_rest`); when missing or unrecognised, GM defaults to +`op_source=admin_rest`. + +For player-command endpoints, the additional `X-User-ID` header is +required and authoritative for the acting user identity. + +Request and response shapes are defined in +[`./api/internal-openapi.yaml`](./api/internal-openapi.yaml). Unknown JSON +fields are rejected with `invalid_request`. + +## Async Stream Contracts + +### `gm:lobby_events` (out) + +Producer: `Game Master`. Consumer: `Game Lobby`. + +Two message types share the stream, discriminated by `event_type`: + +| `event_type` | Body | +| --- | --- | +| `runtime_snapshot_update` | `{game_id, current_turn, runtime_status, engine_health_summary, player_turn_stats:[{user_id, planets, population}], occurred_at_ms}` | +| `game_finished` | `{game_id, final_turn_number, runtime_status:"finished", player_turn_stats:[…], finished_at_ms}` | + +Publication cadence: events only. GM publishes a snapshot when: + +- a turn was generated (success or failure); +- `runtime_status` transitioned (e.g., `running ↔ generation_in_progress`, + `running → engine_unreachable`, `* → finished`); +- `engine_health_summary` changed in response to a `runtime:health_events` + observation (debounced — duplicates are suppressed when the summary did + not change). + +There is no periodic heartbeat. 
`Game Lobby` consumes these events to
+update its denormalised runtime snapshot and to feed the per-game
+`player_turn_stats` aggregate used at game finish.
+
+The first `runtime_snapshot_update` published right after a successful
+`register-runtime` carries `player_turn_stats` projected from the
+engine `/admin/init` response — the per-player baseline (`planets`,
+`population`) at turn 0. Lobby treats this baseline as the reference
+point against which subsequent turn deltas are measured. For
+publications that fire without a fresh engine state payload
+(e.g., a pure health-summary change), `player_turn_stats` is empty.
+
+The full schema is enforced by
+[`./api/runtime-events-asyncapi.yaml`](./api/runtime-events-asyncapi.yaml).
+
+### `runtime:health_events` (in)
+
+Producer: `Runtime Manager`. Consumer: `Game Master`.
+
+GM consumes the stream to update the per-game
+`runtime_records.engine_health` summary. The schema is owned by
+`Runtime Manager` and documented in
+[`../rtmanager/api/runtime-health-asyncapi.yaml`](../rtmanager/api/runtime-health-asyncapi.yaml).
+GM never modifies `runtime:health_events`; it is read-only.
+
+GM does not publish notifications in response to runtime health changes
+in v1; the operator surface is `gm:lobby_events` plus the GM REST
+inspect endpoints.
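Read-only consumption reduces to applying each observation and advancing a stored offset. A sketch (types and names are illustrative; the real consumer persists the offset under `gamemaster:stream_offsets:{label}` in Redis rather than in memory):

```go
package main

import "fmt"

// healthEvent is a simplified view of one runtime:health_events entry.
type healthEvent struct {
	ID      string // stream entry id
	GameID  string
	Summary string
}

// consumer applies health observations to the per-game engine_health
// summary and remembers the last processed entry id for resume.
type consumer struct {
	engineHealth map[string]string
	lastEntryID  string
}

func (c *consumer) apply(events []healthEvent) {
	for _, e := range events {
		c.engineHealth[e.GameID] = e.Summary
		c.lastEntryID = e.ID // resume point, persisted after processing
	}
}

func main() {
	c := &consumer{engineHealth: map[string]string{}}
	c.apply([]healthEvent{
		{ID: "1-0", GameID: "g1", Summary: "healthy"},
		{ID: "2-0", GameID: "g1", Summary: "engine slow"},
	})
	fmt.Println(c.engineHealth["g1"], c.lastEntryID) // engine slow 2-0
}
```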
+ +## Notification Contracts + +`Game Master` publishes notification intents to `notification:intents` +using the shared `pkg/notificationintent` producer module: + +| Trigger | `notification_type` | Audience | Channels | +| --- | --- | --- | --- | +| Successful turn generation | `game.turn.ready` | active members of the game | `push+email` | +| Game finish | `game.finished` | active members of the game | `push+email` | +| Turn generation failed | `game.generation_failed` | configured admin email list | `email` | + +Recipient resolution: GM materialises `recipient_user_ids` from its own +membership cache (loaded from Lobby) at publish time; admin recipients +are resolved by `Notification Service` from configuration. + +A failed publication is a notification degradation and must not roll back +already committed runtime state. Failed publications are logged and +counted via `gamemaster.notification.publish_attempts`. + +## Persistence Layout + +### PostgreSQL durable state (schema `gamemaster`) + +| Table | Purpose | Key | +| --- | --- | --- | +| `runtime_records` | One row per game; latest known runtime status and scheduling state. | `game_id` | +| `engine_versions` | Engine version registry. | `version` | +| `player_mappings` | `(game_id, user_id) → race_name + engine_player_uuid`. | composite `(game_id, user_id)` | +| `operation_log` | Append-only audit of every GM operation. | `id` (auto) | + +`runtime_records` columns: + +- `game_id` — primary key, references Lobby's identifier. +- `status` — `starting | running | generation_in_progress | + generation_failed | stopped | engine_unreachable | finished`. +- `engine_endpoint` — `http://galaxy-game-{game_id}:8080`. +- `current_image_ref` — Docker reference of the running image. +- `current_engine_version` — semver string registered in `engine_versions`. +- `turn_schedule` — five-field cron expression copied from Lobby. +- `current_turn` — last completed turn number; `0` until the first turn + generates. 
+- `next_generation_at` — UTC timestamp of the next due tick. +- `skip_next_tick` — boolean; set by `force-next-turn`, cleared after the + first cron step is skipped. +- `engine_health` — short text summary derived from + `runtime:health_events`. +- `created_at`, `updated_at`, `started_at`, `stopped_at`, `finished_at` — + lifecycle timestamps. + +`engine_versions` columns: + +- `version` — primary key; semver string. +- `image_ref` — non-empty Docker reference. +- `options` — `jsonb`, free-form, default `'{}'`. +- `status` — `active | deprecated`. +- `created_at`, `updated_at`. + +`player_mappings` columns: + +- composite primary key `(game_id, user_id)`. +- `race_name` — non-empty string; unique per `game_id`. +- `engine_player_uuid` — UUID returned by the engine `/admin/init`. +- `created_at`. + +`operation_log` columns: + +- `id`, `game_id`, `op_kind` (`register_runtime | turn_generation | + force_next_turn | banish | stop | patch | engine_version_create | + engine_version_update | engine_version_deprecate | + engine_version_delete`), `op_source`, `source_ref` (request id + when known), `outcome` (`success | failure`), `error_code`, + `error_message`, `started_at`, `finished_at`. + +For engine-version registry entries (`op_kind` starting with +`engine_version_`), the `game_id` column doubles as the audit subject +and stores the canonical `version` string instead of a platform game +identifier; the registry is global, not per-game. The convention is +documented in +[`./docs/stage14-engine-version-registry.md`](./docs/stage14-engine-version-registry.md). + +Indexes: + +- `runtime_records (status, next_generation_at)` — drives the scheduler + ticker scan. +- `operation_log (game_id, started_at DESC)` — drives audit reads. +- UNIQUE on `player_mappings (game_id, race_name)` — + one-race-per-game invariant. 
+ +Per-game roster reads (`WHERE game_id = $1`) are served by the +leftmost prefix of the composite primary key on +`player_mappings (game_id, user_id)`; no extra single-column index is +added. + +Migrations are embedded `00001_init.sql` (single-init pre-launch policy +from `ARCHITECTURE.md §Persistence Backends`). + +### Redis runtime-coordination state + +| Key shape | Purpose | +| --- | --- | +| `gamemaster:stream_offsets:{label}` | Last processed entry id per consumer (`health_events`). Same shape as Lobby and RTM. | + +GM does not persist the membership cache to Redis in v1; the cache is +in-process. This trade-off is documented in [`./PLAN.md` Stage 16](./PLAN.md). + +## Error Model + +Error envelope: `{ "error": { "code": "...", "message": "..." } }`, +identical to Lobby and RTM. + +Stable error codes: + +| Code | Meaning | +| --- | --- | +| `invalid_request` | Malformed JSON, unknown fields, missing required parameter. | +| `runtime_not_found` | `runtime_records.{game_id}` does not exist. | +| `runtime_not_running` | Operation requires `status=running`. | +| `conflict` | State transition not allowed. | +| `forbidden` | Caller is not an active member or not authorised. | +| `engine_version_not_found` | `engine_versions.{version}` does not exist. | +| `engine_version_in_use` | Hard-delete attempt against a version referenced by a non-finished runtime. | +| `semver_patch_only` | Patch attempt across major/minor boundary. | +| `engine_unreachable` | Engine returned 5xx or connection error. | +| `engine_protocol_violation` | Engine response missing required fields or carries unexpected payload. | +| `engine_validation_error` | Engine returned 4xx with per-command results. | +| `service_unavailable` | Dependency (PostgreSQL, Redis, Lobby, RTM) unavailable. | +| `internal_error` | Unspecified failure. | + +## Configuration + +All variables use the `GAMEMASTER_` prefix. Required variables fail-fast +on startup. 
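The fail-fast rule can be sketched as follows (`missingRequired` is a hypothetical helper; the actual bootstrap code may differ — in the real binary the getter would be `os.Getenv`):

```go
package main

import "fmt"

// requiredVars are the variables the process refuses to start without.
var requiredVars = []string{
	"GAMEMASTER_INTERNAL_HTTP_ADDR",
	"GAMEMASTER_POSTGRES_PRIMARY_DSN",
	"GAMEMASTER_REDIS_MASTER_ADDR",
	"GAMEMASTER_REDIS_PASSWORD",
	"GAMEMASTER_LOBBY_INTERNAL_BASE_URL",
	"GAMEMASTER_RTM_INTERNAL_BASE_URL",
}

// missingRequired returns the names that resolve to empty values; a
// non-empty result aborts startup before any listener is opened.
func missingRequired(get func(string) string) []string {
	var missing []string
	for _, name := range requiredVars {
		if get(name) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	stub := func(string) string { return "" } // nothing configured
	fmt.Println(missingRequired(stub))        // all six names reported
}
```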
+ +### Required + +- `GAMEMASTER_INTERNAL_HTTP_ADDR` +- `GAMEMASTER_POSTGRES_PRIMARY_DSN` +- `GAMEMASTER_REDIS_MASTER_ADDR` +- `GAMEMASTER_REDIS_PASSWORD` +- `GAMEMASTER_LOBBY_INTERNAL_BASE_URL` +- `GAMEMASTER_RTM_INTERNAL_BASE_URL` + +### Configuration groups + +**Listener:** + +- `GAMEMASTER_INTERNAL_HTTP_ADDR` (e.g., `:8097`). +- `GAMEMASTER_INTERNAL_HTTP_READ_TIMEOUT` (default `5s`). +- `GAMEMASTER_INTERNAL_HTTP_WRITE_TIMEOUT` (default `30s`). +- `GAMEMASTER_INTERNAL_HTTP_IDLE_TIMEOUT` (default `60s`). + +**PostgreSQL:** + +- `GAMEMASTER_POSTGRES_PRIMARY_DSN` + (`postgres://gamemaster:@:5432/galaxy?search_path=gamemaster&sslmode=disable`). +- `GAMEMASTER_POSTGRES_REPLICA_DSNS` (optional, comma-separated; not used + in v1). +- `GAMEMASTER_POSTGRES_OPERATION_TIMEOUT` (default `2s`). +- `GAMEMASTER_POSTGRES_MAX_OPEN_CONNS` (default `10`). +- `GAMEMASTER_POSTGRES_MAX_IDLE_CONNS` (default `2`). +- `GAMEMASTER_POSTGRES_CONN_MAX_LIFETIME` (default `30m`). + +**Redis:** + +- `GAMEMASTER_REDIS_MASTER_ADDR`. +- `GAMEMASTER_REDIS_REPLICA_ADDRS` (optional, comma-separated). +- `GAMEMASTER_REDIS_PASSWORD`. +- `GAMEMASTER_REDIS_DB` (default `0`). +- `GAMEMASTER_REDIS_OPERATION_TIMEOUT` (default `2s`). + +**Streams:** + +- `GAMEMASTER_REDIS_LOBBY_EVENTS_STREAM` (default `gm:lobby_events`). +- `GAMEMASTER_REDIS_HEALTH_EVENTS_STREAM` (default + `runtime:health_events`). +- `GAMEMASTER_REDIS_NOTIFICATION_INTENTS_STREAM` (default + `notification:intents`). +- `GAMEMASTER_STREAM_BLOCK_TIMEOUT` (default `5s`). + +**Engine client:** + +- `GAMEMASTER_ENGINE_CALL_TIMEOUT` (default `30s` — covers turn generation + on large games). +- `GAMEMASTER_ENGINE_PROBE_TIMEOUT` (default `5s` — for inspect-style + reads). + +**Lobby internal client:** + +- `GAMEMASTER_LOBBY_INTERNAL_BASE_URL`. +- `GAMEMASTER_LOBBY_INTERNAL_TIMEOUT` (default `2s`). + +**Runtime Manager internal client:** + +- `GAMEMASTER_RTM_INTERNAL_BASE_URL`. +- `GAMEMASTER_RTM_INTERNAL_TIMEOUT` (default `5s`). 
+
+**Scheduler:**
+
+- `GAMEMASTER_SCHEDULER_TICK_INTERVAL` (default `1s`).
+- `GAMEMASTER_TURN_GENERATION_TIMEOUT` (default `60s`).
+
+**Membership cache:**
+
+- `GAMEMASTER_MEMBERSHIP_CACHE_TTL` (default `30s`).
+- `GAMEMASTER_MEMBERSHIP_CACHE_MAX_GAMES` (default `4096`; LRU eviction).
+
+**Logging:**
+
+- `GAMEMASTER_LOG_LEVEL` (default `info`).
+
+**Lifecycle:**
+
+- `GAMEMASTER_SHUTDOWN_TIMEOUT` (default `30s`).
+
+**Telemetry:** uses the standard OTLP env vars
+(`OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_PROTOCOL`, etc.)
+shared with other Galaxy services.
+
+## Observability
+
+### Metrics (OpenTelemetry, low cardinality)
+
+- `gamemaster.register_runtime.outcomes` — counter; labels `outcome`,
+  `error_code`.
+- `gamemaster.turn_generation.outcomes` — counter; labels `outcome`,
+  `error_code`, `trigger` (`scheduler | force`).
+- `gamemaster.command_execute.outcomes` — counter; labels `outcome`,
+  `error_code`.
+- `gamemaster.order_put.outcomes` — counter; labels `outcome`,
+  `error_code`.
+- `gamemaster.report_get.outcomes` — counter; labels `outcome`,
+  `error_code`.
+- `gamemaster.banish.outcomes` — counter; labels `outcome`, `error_code`.
+- `gamemaster.engine_call.latency` — histogram; label `op` (`init |
+  status | turn | banish | command | order | report`).
+- `gamemaster.runtime_records_by_status` — gauge; label `status`.
+- `gamemaster.scheduler.due_games` — gauge.
+- `gamemaster.health_events.consumed` — counter.
+- `gamemaster.lobby_events.published` — counter; label `event_type`.
+- `gamemaster.notification.publish_attempts` — counter; labels
+  `notification_type`, `result` (`ok | error`).
+- `gamemaster.membership_cache.hits` — counter; label `result` (`hit |
+  miss | invalidate`).
+- `gamemaster.engine_versions_total` — gauge.
+
+Metrics avoid high-cardinality attributes such as `game_id` and `user_id`.
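The `op_source` attribution recorded on `operation_log` rows and log entries reduces to a lookup with a conservative default, per the Trusted Surfaces rules. A sketch (the helper name is illustrative):

```go
package main

import "fmt"

// opSource maps the optional X-Galaxy-Caller header value to the audit
// op_source; missing or unrecognised callers default to admin_rest.
func opSource(caller string) string {
	switch caller {
	case "gateway":
		return "gateway_player"
	case "lobby":
		return "lobby_internal"
	case "admin":
		return "admin_rest"
	default:
		return "admin_rest"
	}
}

func main() {
	fmt.Println(opSource("gateway")) // gateway_player
	fmt.Println(opSource(""))        // admin_rest (default)
}
```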
+ +### Structured logs (slog JSON to stdout) + +Common fields on every entry: `service=gamemaster`, `request_id`, +`trace_id`, `span_id`, `game_id` (when known), `user_id` (when known), +`op_kind`, `op_source`, `outcome`, `error_code`. + +Worker-specific fields: `event_type` (lobby-events publisher), +`stream_entry_id` (health-events consumer), `turn` (turn-generation), +`engine_endpoint` (engine calls). + +## Verification + +Service-level (per [`./PLAN.md`](./PLAN.md)): + +- Unit tests for every service-layer operation against mocked engine, + Lobby, RTM, notification publisher, lobby-events publisher. +- Adapter tests using `testcontainers-go` for PostgreSQL and Redis. +- Contract tests for `internal-openapi.yaml` and + `runtime-events-asyncapi.yaml`. + +Service-local integration suite under `gamemaster/integration/`: + +- Register-runtime + first turn happy path against the real + `galaxy/game` test image. +- Force-next-turn skip behaviour. +- Engine version registry CRUD + resolve. +- Admin stop synchronous REST. +- Banish round-trip. +- Membership invalidation hook. +- `runtime:health_events` consumption. + +Inter-service suite under `integration/lobbygm/` and +`integration/lobbygmrtm/`: + +- `lobbygm`: real Lobby + real GM + real engine + stub RTM. Covers + enrollment → register-runtime → first turn → finish + capability + evaluation. +- `lobbygmrtm`: full Lobby + GM + RTM + engine. Covers happy path and the + documented failure paths from `ARCHITECTURE.md` flow §4. + +Manual smoke (development): + +```sh +docker network create galaxy-net # once +GAMEMASTER_INTERNAL_HTTP_ADDR=:8097 \ +GAMEMASTER_POSTGRES_PRIMARY_DSN=postgres://gamemaster:secret@localhost:5432/galaxy?search_path=gamemaster&sslmode=disable \ +GAMEMASTER_REDIS_MASTER_ADDR=localhost:6379 \ +GAMEMASTER_REDIS_PASSWORD=secret \ +GAMEMASTER_LOBBY_INTERNAL_BASE_URL=http://localhost:8095 \ +GAMEMASTER_RTM_INTERNAL_BASE_URL=http://localhost:8096 \ +... 
go run ./gamemaster/cmd/gamemaster +``` + +After start, `curl http://localhost:8097/readyz` returns `200`. Driving +Lobby through its public start flow brings up `galaxy-game-{game_id}` +containers, GM registers each runtime, generates turns on the configured +schedule, and propagates events to Lobby. diff --git a/gamemaster/api/internal-openapi.yaml b/gamemaster/api/internal-openapi.yaml new file mode 100644 index 0000000..67bef21 --- /dev/null +++ b/gamemaster/api/internal-openapi.yaml @@ -0,0 +1,1083 @@ +openapi: 3.0.3 +info: + title: Galaxy Game Master Internal REST API + version: v1 + description: | + This specification documents the internal trusted REST contract of + `galaxy/gamemaster` served on `GAMEMASTER_INTERNAL_HTTP_ADDR` + (default `:8097`). + + This port is not reachable from the public internet. Callers are: + + - `Edge Gateway` for verified player commands, orders, and reports. + - `Game Lobby` for runtime registration, image-ref resolution, + membership-cache invalidation, race banishment, and liveness + probes. + - `Admin Service` (future) for runtime control operations and the + engine version registry. + + Transport rules: + + - request bodies are strict JSON only; unknown fields are rejected + - error responses use `{ "error": { "code", "message" } }` matching + the envelope used by `galaxy/lobby` and `galaxy/rtmanager` + - timestamps are UTC Unix milliseconds (`integer, format: int64`) + - the listener is unauthenticated; downstream services rely on + network segmentation. The `X-User-ID` header is required only on + the three Edge Gateway hot-path operations + (`internalExecuteCommands`, `internalPutOrders`, + `internalGetReport`) and carries the verified player identity. 
+ + Schema closure: + + - every body schema owned by `Game Master` sets + `additionalProperties: false` + - three operations forward engine-owned payloads verbatim + (`internalExecuteCommands`, `internalPutOrders`, + `internalGetReport`) and therefore use + `additionalProperties: true` on the corresponding request and + response bodies. The source of truth for those shapes is + `galaxy/game/openapi.yaml`. + - `EngineVersion.options` is a free-form `jsonb` document and uses + `additionalProperties: true` for the same reason. +servers: + - url: http://localhost:8097 + description: Default local internal listener for Game Master. +tags: + - name: Probes + description: Health and readiness probes. + - name: Runtimes + description: Runtime control surface used by Admin Service. + - name: GMIntegration + description: Game Lobby integration paths under /api/v1/internal/games. + - name: EngineVersions + description: Engine version registry CRUD and image-ref resolve. + - name: Gateway + description: Edge Gateway hot-path commands, orders, and reports. +paths: + /healthz: + get: + tags: + - Probes + operationId: internalHealthz + summary: Internal listener health probe + responses: + "200": + description: Service is alive. + content: + application/json: + schema: + $ref: "#/components/schemas/ProbeResponse" + examples: + ok: + value: + status: ok + "503": + $ref: "#/components/responses/ServiceUnavailableError" + /readyz: + get: + tags: + - Probes + operationId: internalReadyz + summary: Internal listener readiness probe + responses: + "200": + description: Service is ready to serve traffic. 
+ content: + application/json: + schema: + $ref: "#/components/schemas/ProbeResponse" + examples: + ready: + value: + status: ready + "503": + $ref: "#/components/responses/ServiceUnavailableError" + /api/v1/internal/games/{game_id}/register-runtime: + post: + tags: + - GMIntegration + operationId: internalRegisterRuntime + summary: Register a runtime after a successful container start + description: | + Called by `Game Lobby` after `Runtime Manager` has reported a + successful container start. Game Master persists the runtime + record, calls the engine `/api/v1/admin/init`, persists player + mappings derived from the engine response, and transitions the + runtime to `running`. + parameters: + - $ref: "#/components/parameters/GameIDPath" + requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/RegisterRuntimeRequest" + responses: + "200": + description: Runtime registered and transitioned to `running`. + content: + application/json: + schema: + $ref: "#/components/schemas/RuntimeRecord" + "400": + $ref: "#/components/responses/InvalidRequestError" + "404": + $ref: "#/components/responses/EngineVersionNotFoundError" + "409": + $ref: "#/components/responses/ConflictError" + "502": + $ref: "#/components/responses/EngineUnreachableError" + "500": + $ref: "#/components/responses/InternalError" + "503": + $ref: "#/components/responses/ServiceUnavailableError" + /api/v1/internal/games/{game_id}/race/{race_name}/banish: + post: + tags: + - GMIntegration + operationId: internalBanishRace + summary: Banish a race from the running engine after a permanent removal + description: | + Called by `Game Lobby` synchronously after a permanent + platform-level membership removal. Game Master forwards the call + to the engine `/api/v1/admin/race/banish` and records the + outcome in the operation log. 
+ parameters: + - $ref: "#/components/parameters/GameIDPath" + - $ref: "#/components/parameters/RaceNamePath" + responses: + "204": + description: Race banished from the engine. + "404": + $ref: "#/components/responses/NotFoundError" + "502": + $ref: "#/components/responses/EngineUnreachableError" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/games/{game_id}/memberships/invalidate: + post: + tags: + - GMIntegration + operationId: internalInvalidateMemberships + summary: Invalidate the membership cache for a game + description: | + Called by `Game Lobby` post-commit on every roster mutation + (application approval, rejection, invite redeem, member remove, + member block, user-lifecycle cascade). Game Master purges the + in-process per-game membership cache; the TTL is the safety net + for missed calls. + parameters: + - $ref: "#/components/parameters/GameIDPath" + responses: + "204": + description: Membership cache entry invalidated. + "404": + $ref: "#/components/responses/NotFoundError" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/games/{game_id}/liveness: + get: + tags: + - GMIntegration + operationId: internalGameLiveness + summary: Report whether a runtime is ready + description: | + Called by `Game Lobby` as part of the resume flow for a paused + game. Reflects Game Master's own runtime view; the engine is not + contacted by this endpoint. + parameters: + - $ref: "#/components/parameters/GameIDPath" + responses: + "200": + description: Liveness reply. + content: + application/json: + schema: + $ref: "#/components/schemas/LivenessResponse" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/runtimes: + get: + tags: + - Runtimes + operationId: internalListRuntimes + summary: List runtime records + description: | + Returns runtime records ordered by `created_at` descending. The + optional `status` query parameter narrows the result to runtimes + in the given runtime status. 
+ parameters: + - $ref: "#/components/parameters/RuntimeStatusQuery" + responses: + "200": + description: Page of runtime records. + content: + application/json: + schema: + $ref: "#/components/schemas/RuntimeListResponse" + "400": + $ref: "#/components/responses/InvalidRequestError" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/runtimes/{game_id}: + get: + tags: + - Runtimes + operationId: internalGetRuntime + summary: Read one runtime record + parameters: + - $ref: "#/components/parameters/GameIDPath" + responses: + "200": + description: Runtime record. + content: + application/json: + schema: + $ref: "#/components/schemas/RuntimeRecord" + "404": + $ref: "#/components/responses/NotFoundError" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/runtimes/{game_id}/force-next-turn: + post: + tags: + - Runtimes + operationId: internalForceNextTurn + summary: Force immediate generation of the next turn + description: | + Runs the turn-generation flow synchronously and sets + `skip_next_tick` so the next regular cron tick is consumed + without producing back-to-back turns. + parameters: + - $ref: "#/components/parameters/GameIDPath" + responses: + "200": + description: Turn generated; runtime record reflects the new turn number and scheduling state. + content: + application/json: + schema: + $ref: "#/components/schemas/RuntimeRecord" + "404": + $ref: "#/components/responses/NotFoundError" + "409": + $ref: "#/components/responses/ConflictError" + "502": + $ref: "#/components/responses/EngineUnreachableError" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/runtimes/{game_id}/stop: + post: + tags: + - Runtimes + operationId: internalStopRuntime + summary: Stop a runtime through Runtime Manager + description: | + Game Master forwards the request to `Runtime Manager` and CASes + the runtime status to `stopped` on success. 
+ parameters: + - $ref: "#/components/parameters/GameIDPath" + requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/StopRuntimeRequest" + responses: + "200": + description: Runtime stopped. + content: + application/json: + schema: + $ref: "#/components/schemas/RuntimeRecord" + "400": + $ref: "#/components/responses/InvalidRequestError" + "404": + $ref: "#/components/responses/NotFoundError" + "500": + $ref: "#/components/responses/InternalError" + "503": + $ref: "#/components/responses/ServiceUnavailableError" + /api/v1/internal/runtimes/{game_id}/patch: + post: + tags: + - Runtimes + operationId: internalPatchRuntime + summary: Patch the engine version of a runtime through Runtime Manager + description: | + Resolves the new image reference from the engine version + registry, validates the target version is a semver-patch of the + currently running version, and forwards the patch call to + `Runtime Manager`. + parameters: + - $ref: "#/components/parameters/GameIDPath" + requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/PatchRuntimeRequest" + responses: + "200": + description: Runtime patched; `current_engine_version` and `current_image_ref` updated. + content: + application/json: + schema: + $ref: "#/components/schemas/RuntimeRecord" + "400": + $ref: "#/components/responses/InvalidRequestError" + "404": + $ref: "#/components/responses/NotFoundError" + "409": + $ref: "#/components/responses/ConflictError" + "500": + $ref: "#/components/responses/InternalError" + "503": + $ref: "#/components/responses/ServiceUnavailableError" + /api/v1/internal/engine-versions: + get: + tags: + - EngineVersions + operationId: internalListEngineVersions + summary: List engine versions + parameters: + - $ref: "#/components/parameters/EngineVersionStatusQuery" + responses: + "200": + description: Engine version registry contents. 
+ content: + application/json: + schema: + $ref: "#/components/schemas/EngineVersionListResponse" + "400": + $ref: "#/components/responses/InvalidRequestError" + "500": + $ref: "#/components/responses/InternalError" + post: + tags: + - EngineVersions + operationId: internalCreateEngineVersion + summary: Create a new engine version record + requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/CreateEngineVersionRequest" + responses: + "201": + description: Engine version created. + content: + application/json: + schema: + $ref: "#/components/schemas/EngineVersion" + "400": + $ref: "#/components/responses/InvalidRequestError" + "409": + $ref: "#/components/responses/ConflictError" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/engine-versions/{version}: + get: + tags: + - EngineVersions + operationId: internalGetEngineVersion + summary: Read one engine version record + parameters: + - $ref: "#/components/parameters/VersionPath" + responses: + "200": + description: Engine version record. + content: + application/json: + schema: + $ref: "#/components/schemas/EngineVersion" + "404": + $ref: "#/components/responses/NotFoundError" + "500": + $ref: "#/components/responses/InternalError" + patch: + tags: + - EngineVersions + operationId: internalUpdateEngineVersion + summary: Patch an engine version record + parameters: + - $ref: "#/components/parameters/VersionPath" + requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/UpdateEngineVersionRequest" + responses: + "200": + description: Engine version updated. 
+ content: + application/json: + schema: + $ref: "#/components/schemas/EngineVersion" + "400": + $ref: "#/components/responses/InvalidRequestError" + "404": + $ref: "#/components/responses/NotFoundError" + "500": + $ref: "#/components/responses/InternalError" + delete: + tags: + - EngineVersions + operationId: internalDeprecateEngineVersion + summary: Deprecate an engine version + description: | + Sets the engine version status to `deprecated`. Hard removal of + a version that is referenced by a non-finished runtime is + rejected with `engine_version_in_use`. + parameters: + - $ref: "#/components/parameters/VersionPath" + responses: + "204": + description: Engine version deprecated. + "404": + $ref: "#/components/responses/NotFoundError" + "409": + $ref: "#/components/responses/EngineVersionInUseError" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/engine-versions/{version}/image-ref: + get: + tags: + - EngineVersions + operationId: internalResolveEngineVersionImageRef + summary: Resolve the image reference of an engine version + description: | + Hot path used by `Game Lobby` synchronously before publishing + a `runtime:start_jobs` envelope. Returns the `image_ref` only. + parameters: + - $ref: "#/components/parameters/VersionPath" + responses: + "200": + description: Image reference of the requested version. + content: + application/json: + schema: + $ref: "#/components/schemas/ImageRefResponse" + "404": + $ref: "#/components/responses/EngineVersionNotFoundError" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/games/{game_id}/commands: + post: + tags: + - Gateway + operationId: internalExecuteCommands + summary: Execute a batch of player commands + description: | + Edge Gateway hot path for `game.command.execute`. Game Master + authorises the user, resolves `actor=race_name` from its own + player mappings, and forwards the request to the engine + `/api/v1/command`. 
The request and response bodies are + engine-owned and pass through unchanged + (`additionalProperties: true`). + parameters: + - $ref: "#/components/parameters/GameIDPath" + - $ref: "#/components/parameters/XUserIDHeader" + requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/ExecuteCommandsRequest" + responses: + "200": + description: Engine response forwarded verbatim. + content: + application/json: + schema: + $ref: "#/components/schemas/ExecuteCommandsResponse" + "400": + $ref: "#/components/responses/InvalidRequestError" + "403": + $ref: "#/components/responses/ForbiddenError" + "404": + $ref: "#/components/responses/NotFoundError" + "409": + $ref: "#/components/responses/ConflictError" + "502": + $ref: "#/components/responses/EngineUnreachableError" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/games/{game_id}/orders: + post: + tags: + - Gateway + operationId: internalPutOrders + summary: Submit a batch of player orders + description: | + Edge Gateway hot path for `game.order.put`. Same authorisation + and forwarding semantics as `internalExecuteCommands`; the + engine endpoint is `/api/v1/order`. + parameters: + - $ref: "#/components/parameters/GameIDPath" + - $ref: "#/components/parameters/XUserIDHeader" + requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/PutOrdersRequest" + responses: + "200": + description: Engine response forwarded verbatim. 
+ content: + application/json: + schema: + $ref: "#/components/schemas/PutOrdersResponse" + "400": + $ref: "#/components/responses/InvalidRequestError" + "403": + $ref: "#/components/responses/ForbiddenError" + "404": + $ref: "#/components/responses/NotFoundError" + "409": + $ref: "#/components/responses/ConflictError" + "502": + $ref: "#/components/responses/EngineUnreachableError" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/internal/games/{game_id}/reports/{turn}: + get: + tags: + - Gateway + operationId: internalGetReport + summary: Read a per-player turn report + description: | + Edge Gateway hot path for `game.report.get`. Game Master + authorises the user and forwards + `GET /api/v1/report?player={race_name}&turn={turn}` to the + engine. The response body is engine-owned and pass-through. + parameters: + - $ref: "#/components/parameters/GameIDPath" + - $ref: "#/components/parameters/TurnPath" + - $ref: "#/components/parameters/XUserIDHeader" + responses: + "200": + description: Engine response forwarded verbatim. + content: + application/json: + schema: + $ref: "#/components/schemas/ReportResponse" + "403": + $ref: "#/components/responses/ForbiddenError" + "404": + $ref: "#/components/responses/NotFoundError" + "502": + $ref: "#/components/responses/EngineUnreachableError" + "500": + $ref: "#/components/responses/InternalError" +components: + parameters: + GameIDPath: + name: game_id + in: path + required: true + description: Opaque stable game identifier owned by Game Lobby. + schema: + type: string + VersionPath: + name: version + in: path + required: true + description: Semver string of an engine version registered with Game Master. + schema: + type: string + RaceNamePath: + name: race_name + in: path + required: true + description: Race name registered for a player in the running game. + schema: + type: string + TurnPath: + name: turn + in: path + required: true + description: Turn number for which the per-player report is fetched. 
+ schema: + type: integer + minimum: 0 + XUserIDHeader: + name: X-User-ID + in: header + required: true + description: Verified player identity propagated by Edge Gateway. Trusted as authoritative. + schema: + type: string + RuntimeStatusQuery: + name: status + in: query + required: false + description: Optional filter; when set, only runtimes in the given runtime status are returned. + schema: + $ref: "#/components/schemas/RuntimeStatus" + EngineVersionStatusQuery: + name: status + in: query + required: false + description: Optional filter; when set, only engine versions in the given status are returned. + schema: + $ref: "#/components/schemas/EngineVersionStatus" + schemas: + RuntimeStatus: + type: string + enum: + - starting + - running + - generation_in_progress + - generation_failed + - stopped + - engine_unreachable + - finished + description: Current runtime status of a registered game. + EngineVersionStatus: + type: string + enum: + - active + - deprecated + description: Engine version registry status. + StopReason: + type: string + enum: + - admin_request + - finished + - timeout + description: Reason argument passed to Runtime Manager when stopping a runtime. + ProbeResponse: + type: object + additionalProperties: false + required: + - status + properties: + status: + type: string + description: Probe outcome string (`ok` or `ready`). + LivenessResponse: + type: object + additionalProperties: false + required: + - ready + - status + properties: + ready: + type: boolean + description: True when the runtime is in `running`; false otherwise. + status: + $ref: "#/components/schemas/RuntimeStatus" + ImageRefResponse: + type: object + additionalProperties: false + required: + - image_ref + properties: + image_ref: + type: string + description: Docker reference of the engine image registered for the requested version. 
+ RegisterRuntimeMember: + type: object + additionalProperties: false + required: + - user_id + - race_name + properties: + user_id: + type: string + description: Platform user identifier of an active member. + race_name: + type: string + description: Race name reserved for the member in this game. + RegisterRuntimeRequest: + type: object + additionalProperties: false + required: + - engine_endpoint + - members + - target_engine_version + - turn_schedule + properties: + engine_endpoint: + type: string + description: Engine container DNS endpoint, e.g. http://galaxy-game-{game_id}:8080. + members: + type: array + minItems: 1 + items: + $ref: "#/components/schemas/RegisterRuntimeMember" + description: Members included in the engine init roster. + target_engine_version: + type: string + description: Semver of the engine version under which the container was started. + turn_schedule: + type: string + description: Five-field cron expression copied from the platform game record. + RuntimeRecord: + type: object + additionalProperties: false + required: + - game_id + - runtime_status + - engine_endpoint + - current_image_ref + - current_engine_version + - turn_schedule + - current_turn + - next_generation_at + - skip_next_tick + - engine_health_summary + - created_at + - updated_at + properties: + game_id: + type: string + description: Opaque stable game identifier; primary key. + runtime_status: + $ref: "#/components/schemas/RuntimeStatus" + engine_endpoint: + type: string + description: Engine container DNS endpoint observed at register-runtime time. + current_image_ref: + type: string + description: Docker reference of the running image. + current_engine_version: + type: string + description: Semver of the running engine version. + turn_schedule: + type: string + description: Five-field cron expression governing the scheduler ticker. + current_turn: + type: integer + minimum: 0 + description: Last completed turn number; zero until the first turn generates. 
+ next_generation_at: + type: integer + format: int64 + description: UTC Unix milliseconds of the next scheduled tick. + skip_next_tick: + type: boolean + description: True when force-next-turn has set the skip flag for the next regular tick. + engine_health_summary: + type: string + description: Short text summary derived from runtime:health_events; empty until the first health observation. + created_at: + type: integer + format: int64 + description: UTC Unix milliseconds; record creation timestamp. + updated_at: + type: integer + format: int64 + description: UTC Unix milliseconds; last mutation timestamp. + started_at: + type: integer + format: int64 + description: UTC Unix milliseconds; set when status first becomes running. Optional. + stopped_at: + type: integer + format: int64 + description: UTC Unix milliseconds; set when status becomes stopped. Optional. + finished_at: + type: integer + format: int64 + description: UTC Unix milliseconds; set when status becomes finished. Optional. + RuntimeListResponse: + type: object + additionalProperties: false + required: + - runtimes + properties: + runtimes: + type: array + items: + $ref: "#/components/schemas/RuntimeRecord" + StopRuntimeRequest: + type: object + additionalProperties: false + required: + - reason + properties: + reason: + $ref: "#/components/schemas/StopReason" + PatchRuntimeRequest: + type: object + additionalProperties: false + required: + - version + properties: + version: + type: string + description: Target engine version; must be a semver-patch of the running version. + EngineVersion: + type: object + additionalProperties: false + required: + - version + - image_ref + - options + - status + - created_at + - updated_at + properties: + version: + type: string + description: Semver string; primary key in the registry. + image_ref: + type: string + description: Non-empty Docker reference of the engine image. 
+ options: + type: object + additionalProperties: true + description: Free-form jsonb document of engine-side options. Pass-through; Game Master does not enforce a schema. + status: + $ref: "#/components/schemas/EngineVersionStatus" + created_at: + type: integer + format: int64 + description: UTC Unix milliseconds; record creation timestamp. + updated_at: + type: integer + format: int64 + description: UTC Unix milliseconds; last mutation timestamp. + EngineVersionListResponse: + type: object + additionalProperties: false + required: + - versions + properties: + versions: + type: array + items: + $ref: "#/components/schemas/EngineVersion" + CreateEngineVersionRequest: + type: object + additionalProperties: false + required: + - version + - image_ref + properties: + version: + type: string + description: Semver string of the new version. + image_ref: + type: string + description: Non-empty Docker reference of the engine image. + options: + type: object + additionalProperties: true + description: Optional engine-side options document. Free-form jsonb. + UpdateEngineVersionRequest: + type: object + additionalProperties: false + description: PATCH body. Every field is optional; at least one must be present. + properties: + image_ref: + type: string + description: New Docker reference for the version. + options: + type: object + additionalProperties: true + description: Replacement options document. + status: + $ref: "#/components/schemas/EngineVersionStatus" + ExecuteCommandsRequest: + type: object + additionalProperties: true + required: + - commands + description: | + Player command batch carried inside `commands`. Game Master rewrites + the envelope before forwarding to the engine `/api/v1/command`: the + `commands` array is renamed to `cmd` and a top-level `actor` field + is set to the caller's race name resolved from `player_mappings`. 
+ Caller-supplied envelope fields other than `commands` are dropped; + Game Master never trusts a caller-supplied `actor` per + `gamemaster/README.md` §Hot Path. + properties: + commands: + type: array + items: + type: object + additionalProperties: true + ExecuteCommandsResponse: + type: object + additionalProperties: true + description: Engine-owned shape; the response from the engine /api/v1/command endpoint, returned to Edge Gateway unchanged. + PutOrdersRequest: + type: object + additionalProperties: true + required: + - commands + description: | + Player order batch carried inside `commands`. Same envelope-rewrite + semantics as `ExecuteCommandsRequest`: Game Master renames + `commands` to `cmd` and sets `actor` from the caller identity + before forwarding to the engine `/api/v1/order`. + properties: + commands: + type: array + items: + type: object + additionalProperties: true + PutOrdersResponse: + type: object + additionalProperties: true + description: Engine-owned shape; the response from the engine /api/v1/order endpoint, returned to Edge Gateway unchanged. + ReportResponse: + type: object + additionalProperties: true + description: Engine-owned shape; the response from the engine /api/v1/report endpoint, returned to Edge Gateway unchanged. + ErrorResponse: + type: object + additionalProperties: false + required: + - error + properties: + error: + $ref: "#/components/schemas/ErrorBody" + ErrorBody: + type: object + additionalProperties: false + required: + - code + - message + properties: + code: + type: string + description: Stable internal API error code. + message: + type: string + description: Human-readable trusted error message. + responses: + InvalidRequestError: + description: Request validation failed. 
+ content: + application/json: + schema: + $ref: "#/components/schemas/ErrorResponse" + examples: + invalidRequest: + value: + error: + code: invalid_request + message: request is invalid + ForbiddenError: + description: Caller is not an active member of the game or is otherwise not authorised. + content: + application/json: + schema: + $ref: "#/components/schemas/ErrorResponse" + examples: + forbidden: + value: + error: + code: forbidden + message: caller is not authorised for this operation + NotFoundError: + description: The requested runtime, race, or engine version does not exist. + content: + application/json: + schema: + $ref: "#/components/schemas/ErrorResponse" + examples: + runtimeNotFound: + value: + error: + code: runtime_not_found + message: runtime not found + EngineVersionNotFoundError: + description: The requested engine version is missing or has been deprecated. + content: + application/json: + schema: + $ref: "#/components/schemas/ErrorResponse" + examples: + engineVersionNotFound: + value: + error: + code: engine_version_not_found + message: engine version not found + EngineVersionInUseError: + description: Hard delete attempt against a version referenced by a non-finished runtime. + content: + application/json: + schema: + $ref: "#/components/schemas/ErrorResponse" + examples: + engineVersionInUse: + value: + error: + code: engine_version_in_use + message: engine version is referenced by a non-finished runtime + ConflictError: + description: The requested state transition is not allowed from the current status. 
+ content: + application/json: + schema: + $ref: "#/components/schemas/ErrorResponse" + examples: + conflict: + value: + error: + code: conflict + message: operation not allowed in current status + runtimeNotRunning: + value: + error: + code: runtime_not_running + message: operation requires runtime status running + semverPatchOnly: + value: + error: + code: semver_patch_only + message: patch attempt across major or minor boundary + EngineUnreachableError: + description: The engine container returned 5xx or could not be reached. + content: + application/json: + schema: + $ref: "#/components/schemas/ErrorResponse" + examples: + engineUnreachable: + value: + error: + code: engine_unreachable + message: engine container is unreachable + engineProtocolViolation: + value: + error: + code: engine_protocol_violation + message: engine response missing required fields or malformed + engineValidationError: + value: + error: + code: engine_validation_error + message: engine rejected one or more commands + ServiceUnavailableError: + description: An upstream dependency (PostgreSQL, Redis, Lobby, RTM) is unavailable. + content: + application/json: + schema: + $ref: "#/components/schemas/ErrorResponse" + examples: + unavailable: + value: + error: + code: service_unavailable + message: service is unavailable + InternalError: + description: Unexpected internal service error. 
+ content: + application/json: + schema: + $ref: "#/components/schemas/ErrorResponse" + examples: + internal: + value: + error: + code: internal_error + message: internal server error diff --git a/gamemaster/api/runtime-events-asyncapi.yaml b/gamemaster/api/runtime-events-asyncapi.yaml new file mode 100644 index 0000000..db48746 --- /dev/null +++ b/gamemaster/api/runtime-events-asyncapi.yaml @@ -0,0 +1,204 @@ +asyncapi: 3.1.0 +info: + title: Galaxy Game Master Runtime Events Contract + version: 1.0.0 + description: | + Stable Redis Streams contract for runtime snapshot updates and game + finish events published by `Game Master` toward `Game Lobby` on the + `gm:lobby_events` stream. + + Two distinct message types share the channel and are discriminated + by the `event_type` field on the payload: + + - `RuntimeSnapshotUpdate` (`event_type=runtime_snapshot_update`) is + published whenever a turn was generated (success or failure), the + runtime status transitioned, or the engine health summary changed + in response to a `runtime:health_events` observation. Duplicates + are suppressed when the summary did not change. + - `GameFinished` (`event_type=game_finished`) is published once + when the engine reports `finished:true` on a turn-generation + response. The runtime stays in `status=finished` indefinitely; + no further events are published for the game. + + Both payload schemas are closed (`additionalProperties: false`). + Adding a field to either payload after this contract was frozen is + a breaking change that requires a contract bump and a coordinated + consumer update. + + Polymorphism: the AsyncAPI surface uses two messages on one channel + and one `send` operation per message. The + `runtime_health-asyncapi.yaml` style of a single message with + `oneOf` details is not used here because the two payload shapes + have no shared field set beyond the discriminator and the + `game_id`. See `gamemaster/docs/stage06-contract-files.md`. 
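The info block above fixes the discrimination rule: both message types share the `gm:lobby_events` channel and consumers dispatch on the `event_type` field. A minimal consumer-side dispatch sketch in Go, assuming each stream entry carries the payload as a JSON document — the `envelope` struct and `dispatch` helper are illustrative, not part of the contract:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope decodes only the fields shared by both payload shapes: the
// event_type discriminator and the game identifier. Field names come
// from the contract; the struct itself is a hypothetical consumer helper.
type envelope struct {
	EventType string `json:"event_type"`
	GameID    string `json:"game_id"`
}

// dispatch routes a raw payload by its discriminator, returning a label
// standing in for the handler a real consumer would invoke.
func dispatch(raw []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	switch env.EventType {
	case "runtime_snapshot_update":
		return "snapshot:" + env.GameID, nil
	case "game_finished":
		return "finished:" + env.GameID, nil
	default:
		// Closed contract: an unknown discriminator is a protocol error,
		// not a value to be silently skipped.
		return "", fmt.Errorf("unknown event_type %q", env.EventType)
	}
}

func main() {
	payloads := [][]byte{
		[]byte(`{"event_type":"runtime_snapshot_update","game_id":"game-123","current_turn":17}`),
		[]byte(`{"event_type":"game_finished","game_id":"game-123","final_turn_number":42}`),
	}
	for _, raw := range payloads {
		route, err := dispatch(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(route)
	}
	// → snapshot:game-123
	// → finished:game-123
}
```

Decoding the shared fields first and only then unmarshalling into the full payload struct keeps the dispatch step independent of either closed schema, which matters here because the two shapes share nothing beyond `event_type` and `game_id`.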
+channels: + lobbyEvents: + address: gm:lobby_events + messages: + runtimeSnapshotUpdate: + $ref: '#/components/messages/RuntimeSnapshotUpdate' + gameFinished: + $ref: '#/components/messages/GameFinished' +operations: + publishRuntimeSnapshotUpdate: + action: send + summary: Publish a runtime snapshot update for Game Lobby. + channel: + $ref: '#/channels/lobbyEvents' + messages: + - $ref: '#/channels/lobbyEvents/messages/runtimeSnapshotUpdate' + publishGameFinished: + action: send + summary: Publish a game finish event for Game Lobby. + channel: + $ref: '#/channels/lobbyEvents' + messages: + - $ref: '#/channels/lobbyEvents/messages/gameFinished' +components: + messages: + RuntimeSnapshotUpdate: + name: RuntimeSnapshotUpdate + title: Runtime snapshot update + summary: Snapshot of one game's runtime state, published on transitions and health changes. + payload: + $ref: '#/components/schemas/RuntimeSnapshotUpdatePayload' + examples: + - name: runningTurnReady + summary: Snapshot published after a successful turn generation. + payload: + event_type: runtime_snapshot_update + game_id: game-123 + current_turn: 17 + runtime_status: running + engine_health_summary: healthy + player_turn_stats: + - user_id: user-1 + planets: 4 + population: 12000 + - user_id: user-2 + planets: 3 + population: 9000 + occurred_at_ms: 1775121700000 + GameFinished: + name: GameFinished + title: Game finished + summary: Terminal event published once when the engine reports finished:true on a turn-generation response. + payload: + $ref: '#/components/schemas/GameFinishedPayload' + examples: + - name: gameFinished + summary: Game finished on turn 42; final per-player stats included. 
+ payload: + event_type: game_finished + game_id: game-123 + final_turn_number: 42 + runtime_status: finished + player_turn_stats: + - user_id: user-1 + planets: 6 + population: 25000 + - user_id: user-2 + planets: 0 + population: 0 + finished_at_ms: 1775130000000 + schemas: + RuntimeStatus: + type: string + enum: + - starting + - running + - generation_in_progress + - generation_failed + - stopped + - engine_unreachable + - finished + description: Runtime status enum; identical to the value used in the internal REST contract. + PlayerTurnStat: + type: object + additionalProperties: false + required: + - user_id + - planets + - population + properties: + user_id: + type: string + description: Platform user identifier of the player. + planets: + type: integer + minimum: 0 + description: Number of planets controlled by the player at the snapshot turn. + population: + type: integer + minimum: 0 + description: Total population controlled by the player at the snapshot turn. + RuntimeSnapshotUpdatePayload: + type: object + additionalProperties: false + required: + - event_type + - game_id + - current_turn + - runtime_status + - engine_health_summary + - player_turn_stats + - occurred_at_ms + properties: + event_type: + type: string + const: runtime_snapshot_update + description: Discriminator pinned to `runtime_snapshot_update`; consumers dispatch on this value. + game_id: + type: string + description: Opaque stable game identifier. + current_turn: + type: integer + minimum: 0 + description: Last completed turn number; zero when the snapshot reflects the pre-first-turn state. + runtime_status: + $ref: '#/components/schemas/RuntimeStatus' + engine_health_summary: + type: string + description: Short text summary of engine health; empty until the first health observation. + player_turn_stats: + type: array + items: + $ref: '#/components/schemas/PlayerTurnStat' + description: Per-player stats projection; empty before any turn has generated. 
+ occurred_at_ms: + type: integer + format: int64 + description: UTC Unix milliseconds when Game Master observed the underlying transition. + GameFinishedPayload: + type: object + additionalProperties: false + required: + - event_type + - game_id + - final_turn_number + - runtime_status + - player_turn_stats + - finished_at_ms + properties: + event_type: + type: string + const: game_finished + description: Discriminator pinned to `game_finished`; consumers dispatch on this value. + game_id: + type: string + description: Opaque stable game identifier. + final_turn_number: + type: integer + minimum: 0 + description: Last turn number generated before the engine reported finished:true. + runtime_status: + $ref: '#/components/schemas/RuntimeStatus' + player_turn_stats: + type: array + items: + $ref: '#/components/schemas/PlayerTurnStat' + description: Final per-player stats projection at the finish turn. + finished_at_ms: + type: integer + format: int64 + description: UTC Unix milliseconds when Game Master persisted the finish transition. diff --git a/gamemaster/cmd/gamemaster/main.go b/gamemaster/cmd/gamemaster/main.go new file mode 100644 index 0000000..723bf23 --- /dev/null +++ b/gamemaster/cmd/gamemaster/main.go @@ -0,0 +1,46 @@ +// Binary gamemaster is the runnable Game Master process entrypoint. 
+package main + +import ( + "context" + "fmt" + "os" + "os/signal" + "syscall" + + "galaxy/gamemaster/internal/app" + "galaxy/gamemaster/internal/config" + "galaxy/gamemaster/internal/logging" +) + +func main() { + if err := run(); err != nil { + _, _ = fmt.Fprintf(os.Stderr, "gamemaster: %v\n", err) + os.Exit(1) + } +} + +func run() error { + cfg, err := config.LoadFromEnv() + if err != nil { + return err + } + + logger, err := logging.New(cfg.Logging.Level) + if err != nil { + return err + } + + rootCtx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM) + defer stop() + + runtime, err := app.NewRuntime(rootCtx, cfg, logger) + if err != nil { + return err + } + defer func() { + _ = runtime.Close() + }() + + return runtime.Run(rootCtx) +} diff --git a/gamemaster/cmd/jetgen/main.go b/gamemaster/cmd/jetgen/main.go new file mode 100644 index 0000000..1199e3d --- /dev/null +++ b/gamemaster/cmd/jetgen/main.go @@ -0,0 +1,237 @@ +// Command jetgen regenerates the go-jet/v2 query-builder code under +// galaxy/gamemaster/internal/adapters/postgres/jet/ against a transient +// PostgreSQL instance. +// +// The program is intended to be invoked as `go run ./cmd/jetgen` (or via +// the `make jet` Makefile target) from within `galaxy/gamemaster`. It is +// not part of the runtime binary. +// +// Steps: +// +// 1. start a postgres:16-alpine container via testcontainers-go +// 2. open it through pkg/postgres as the superuser +// 3. CREATE ROLE gamemasterservice and CREATE SCHEMA "gamemaster" +// AUTHORIZATION gamemasterservice +// 4. open a second pool as gamemasterservice with search_path=gamemaster +// and apply the embedded goose migrations +// 5. 
run jet's PostgreSQL generator against schema=gamemaster, writing +// into ../internal/adapters/postgres/jet +package main + +import ( + "context" + "errors" + "fmt" + "log" + "net/url" + "os" + "path/filepath" + "runtime" + "time" + + "galaxy/postgres" + + "galaxy/gamemaster/internal/adapters/postgres/migrations" + + jetpostgres "github.com/go-jet/jet/v2/generator/postgres" + testcontainers "github.com/testcontainers/testcontainers-go" + tcpostgres "github.com/testcontainers/testcontainers-go/modules/postgres" + "github.com/testcontainers/testcontainers-go/wait" +) + +const ( + postgresImage = "postgres:16-alpine" + superuserName = "galaxy" + superuserPassword = "galaxy" + superuserDatabase = "galaxy_gamemaster" + serviceRole = "gamemasterservice" + servicePassword = "gamemasterservice" + serviceSchema = "gamemaster" + containerStartup = 90 * time.Second + defaultOpTimeout = 10 * time.Second + jetOutputDirSuffix = "internal/adapters/postgres/jet" +) + +func main() { + if err := run(context.Background()); err != nil { + log.Fatalf("jetgen: %v", err) + } +} + +func run(ctx context.Context) error { + outputDir, err := jetOutputDir() + if err != nil { + return err + } + + container, err := tcpostgres.Run(ctx, postgresImage, + tcpostgres.WithDatabase(superuserDatabase), + tcpostgres.WithUsername(superuserName), + tcpostgres.WithPassword(superuserPassword), + testcontainers.WithWaitStrategy( + wait.ForLog("database system is ready to accept connections"). + WithOccurrence(2). 
+ WithStartupTimeout(containerStartup), + ), + ) + if err != nil { + return fmt.Errorf("start postgres container: %w", err) + } + defer func() { + if termErr := testcontainers.TerminateContainer(container); termErr != nil { + log.Printf("jetgen: terminate container: %v", termErr) + } + }() + + baseDSN, err := container.ConnectionString(ctx, "sslmode=disable") + if err != nil { + return fmt.Errorf("resolve container dsn: %w", err) + } + + if err := provisionRoleAndSchema(ctx, baseDSN); err != nil { + return err + } + + scopedDSN, err := dsnForServiceRole(baseDSN) + if err != nil { + return err + } + if err := applyMigrations(ctx, scopedDSN); err != nil { + return err + } + + if err := os.RemoveAll(outputDir); err != nil { + return fmt.Errorf("remove existing jet output %q: %w", outputDir, err) + } + if err := os.MkdirAll(filepath.Dir(outputDir), 0o755); err != nil { + return fmt.Errorf("ensure jet output parent: %w", err) + } + + jetCfg := postgres.DefaultConfig() + jetCfg.PrimaryDSN = scopedDSN + jetCfg.OperationTimeout = defaultOpTimeout + jetDB, err := postgres.OpenPrimary(ctx, jetCfg) + if err != nil { + return fmt.Errorf("open scoped pool for jet generation: %w", err) + } + defer func() { _ = jetDB.Close() }() + + if err := jetpostgres.GenerateDB(jetDB, serviceSchema, outputDir); err != nil { + return fmt.Errorf("jet generate: %w", err) + } + + log.Printf("jetgen: generated jet code into %s (schema=%s)", outputDir, serviceSchema) + return nil +} + +func provisionRoleAndSchema(ctx context.Context, baseDSN string) error { + cfg := postgres.DefaultConfig() + cfg.PrimaryDSN = baseDSN + cfg.OperationTimeout = defaultOpTimeout + db, err := postgres.OpenPrimary(ctx, cfg) + if err != nil { + return fmt.Errorf("open admin pool: %w", err) + } + defer func() { _ = db.Close() }() + + statements := []string{ + fmt.Sprintf(`DO $$ BEGIN + IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = %s) THEN + CREATE ROLE %s LOGIN PASSWORD %s; + END IF; + END $$;`, 
sqlLiteral(serviceRole), sqlIdentifier(serviceRole), sqlLiteral(servicePassword)), + fmt.Sprintf(`CREATE SCHEMA IF NOT EXISTS %s AUTHORIZATION %s;`, + sqlIdentifier(serviceSchema), sqlIdentifier(serviceRole)), + fmt.Sprintf(`GRANT USAGE ON SCHEMA %s TO %s;`, + sqlIdentifier(serviceSchema), sqlIdentifier(serviceRole)), + } + for _, statement := range statements { + if _, err := db.ExecContext(ctx, statement); err != nil { + return fmt.Errorf("provision %q/%q: %w", serviceSchema, serviceRole, err) + } + } + return nil +} + +func dsnForServiceRole(baseDSN string) (string, error) { + parsed, err := url.Parse(baseDSN) + if err != nil { + return "", fmt.Errorf("parse base dsn: %w", err) + } + values := url.Values{} + values.Set("search_path", serviceSchema) + values.Set("sslmode", "disable") + scoped := url.URL{ + Scheme: parsed.Scheme, + User: url.UserPassword(serviceRole, servicePassword), + Host: parsed.Host, + Path: parsed.Path, + RawQuery: values.Encode(), + } + return scoped.String(), nil +} + +func applyMigrations(ctx context.Context, dsn string) error { + cfg := postgres.DefaultConfig() + cfg.PrimaryDSN = dsn + cfg.OperationTimeout = defaultOpTimeout + db, err := postgres.OpenPrimary(ctx, cfg) + if err != nil { + return fmt.Errorf("open scoped pool: %w", err) + } + defer func() { _ = db.Close() }() + + if err := postgres.Ping(ctx, db, defaultOpTimeout); err != nil { + return err + } + if err := postgres.RunMigrations(ctx, db, migrations.FS(), "."); err != nil { + return fmt.Errorf("run migrations: %w", err) + } + return nil +} + +// jetOutputDir returns the absolute path that jet should write into. We +// rely on the runtime caller info to anchor it to galaxy/gamemaster +// regardless of the invoking working directory. 
+func jetOutputDir() (string, error) { + _, file, _, ok := runtime.Caller(0) + if !ok { + return "", errors.New("resolve runtime caller for jet output path") + } + dir := filepath.Dir(file) + // dir = .../galaxy/gamemaster/cmd/jetgen + moduleRoot := filepath.Clean(filepath.Join(dir, "..", "..")) + return filepath.Join(moduleRoot, jetOutputDirSuffix), nil +} + +func sqlIdentifier(name string) string { + return `"` + escapeDoubleQuotes(name) + `"` +} + +func sqlLiteral(value string) string { + return "'" + escapeSingleQuotes(value) + "'" +} + +func escapeDoubleQuotes(value string) string { + out := make([]byte, 0, len(value)) + for index := 0; index < len(value); index++ { + if value[index] == '"' { + out = append(out, '"', '"') + continue + } + out = append(out, value[index]) + } + return string(out) +} + +func escapeSingleQuotes(value string) string { + out := make([]byte, 0, len(value)) + for index := 0; index < len(value); index++ { + if value[index] == '\'' { + out = append(out, '\'', '\'') + continue + } + out = append(out, value[index]) + } + return string(out) +} diff --git a/gamemaster/contract_asyncapi_test.go b/gamemaster/contract_asyncapi_test.go new file mode 100644 index 0000000..cab9418 --- /dev/null +++ b/gamemaster/contract_asyncapi_test.go @@ -0,0 +1,360 @@ +package gamemaster + +import ( + "os" + "path/filepath" + "runtime" + "testing" + + "github.com/stretchr/testify/require" + "gopkg.in/yaml.v3" +) + +type runtimeEventPayloadExpectation struct { + schemaName string + eventTypeConst string + required []string +} + +var expectedRuntimeEventPayloads = []runtimeEventPayloadExpectation{ + { + schemaName: "RuntimeSnapshotUpdatePayload", + eventTypeConst: "runtime_snapshot_update", + required: []string{ + "event_type", + "game_id", + "current_turn", + "runtime_status", + "engine_health_summary", + "player_turn_stats", + "occurred_at_ms", + }, + }, + { + schemaName: "GameFinishedPayload", + eventTypeConst: "game_finished", + required: []string{ + 
"event_type", + "game_id", + "final_turn_number", + "runtime_status", + "player_turn_stats", + "finished_at_ms", + }, + }, +} + +var expectedRuntimeStatusEnum = []string{ + "starting", + "running", + "generation_in_progress", + "generation_failed", + "stopped", + "engine_unreachable", + "finished", +} + +// TestRuntimeEventsAsyncAPISpecLoads verifies the spec parses as YAML and is +// pinned to AsyncAPI 3.1.0. +func TestRuntimeEventsAsyncAPISpecLoads(t *testing.T) { + t.Parallel() + + doc := loadAsyncAPISpec(t) + require.Equal(t, "3.1.0", getStringValue(t, doc, "asyncapi")) +} + +// TestRuntimeEventsAsyncAPIChannel verifies the single channel address and +// the two message references attached to it. +func TestRuntimeEventsAsyncAPIChannel(t *testing.T) { + t.Parallel() + + doc := loadAsyncAPISpec(t) + channel := getMapValue(t, doc, "channels", "lobbyEvents") + + require.Equal(t, "gm:lobby_events", getStringValue(t, channel, "address")) + + channelMessages := getMapValue(t, channel, "messages") + require.ElementsMatch(t, + []string{"runtimeSnapshotUpdate", "gameFinished"}, + mapKeys(channelMessages)) + + require.Equal(t, + "#/components/messages/RuntimeSnapshotUpdate", + getStringValue(t, getMapValue(t, channelMessages, "runtimeSnapshotUpdate"), "$ref")) + require.Equal(t, + "#/components/messages/GameFinished", + getStringValue(t, getMapValue(t, channelMessages, "gameFinished"), "$ref")) +} + +// TestRuntimeEventsAsyncAPIOperations verifies that each message has its own +// `send` operation with the correct channel and message reference. Game +// Master is the publisher; no `receive` operations exist on this stream. 
+func TestRuntimeEventsAsyncAPIOperations(t *testing.T) { + t.Parallel() + + doc := loadAsyncAPISpec(t) + operations := getMapValue(t, doc, "operations") + + require.ElementsMatch(t, + []string{"publishRuntimeSnapshotUpdate", "publishGameFinished"}, + mapKeys(operations)) + + cases := []struct { + operationName string + messageKey string + }{ + {"publishRuntimeSnapshotUpdate", "runtimeSnapshotUpdate"}, + {"publishGameFinished", "gameFinished"}, + } + + for _, tc := range cases { + tc := tc + t.Run(tc.operationName, func(t *testing.T) { + t.Parallel() + + op := getMapValue(t, operations, tc.operationName) + require.Equal(t, "send", getStringValue(t, op, "action")) + require.Equal(t, "#/channels/lobbyEvents", + getStringValue(t, getMapValue(t, op, "channel"), "$ref")) + + messageRefs := getSliceValue(t, op, "messages") + require.Len(t, messageRefs, 1, "%s must reference exactly one message", tc.operationName) + + ref, ok := messageRefs[0].(map[string]any) + require.True(t, ok, "%s message reference must be a map", tc.operationName) + require.Equal(t, + "#/channels/lobbyEvents/messages/"+tc.messageKey, + getStringValue(t, ref, "$ref")) + }) + } +} + +// TestRuntimeEventsAsyncAPIMessageNames verifies that components.messages +// contains exactly the two message names frozen by Stage 06. 
+func TestRuntimeEventsAsyncAPIMessageNames(t *testing.T) { + t.Parallel() + + doc := loadAsyncAPISpec(t) + messages := getMapValue(t, doc, "components", "messages") + + require.ElementsMatch(t, + []string{"RuntimeSnapshotUpdate", "GameFinished"}, + mapKeys(messages)) + + for _, name := range []string{"RuntimeSnapshotUpdate", "GameFinished"} { + message := getMapValue(t, messages, name) + require.Equal(t, name, getStringValue(t, message, "name"), + "message %s must declare its own name", name) + require.Equal(t, + "#/components/schemas/"+name+"Payload", + getStringValue(t, getMapValue(t, message, "payload"), "$ref"), + "message %s must reference its payload schema", name) + } +} + +// TestRuntimeEventsAsyncAPIPayloadFreeze verifies that each payload schema +// has the expected required-field set, the correct `event_type` const, and +// `additionalProperties: false`. +func TestRuntimeEventsAsyncAPIPayloadFreeze(t *testing.T) { + t.Parallel() + + doc := loadAsyncAPISpec(t) + schemas := getMapValue(t, doc, "components", "schemas") + + for _, expectation := range expectedRuntimeEventPayloads { + expectation := expectation + t.Run(expectation.schemaName, func(t *testing.T) { + t.Parallel() + + payload := getMapValue(t, schemas, expectation.schemaName) + + require.Equal(t, false, getScalarValue(t, payload, "additionalProperties"), + "%s must reject unknown fields", expectation.schemaName) + + require.ElementsMatch(t, + toAnySlice(expectation.required), + getSliceValue(t, payload, "required"), + "%s required field set", expectation.schemaName) + + properties := getMapValue(t, payload, "properties") + + eventType := getMapValue(t, properties, "event_type") + require.Equal(t, "string", getStringValue(t, eventType, "type")) + require.Equal(t, expectation.eventTypeConst, + getScalarValue(t, eventType, "const"), + "%s.event_type const must be %q", expectation.schemaName, expectation.eventTypeConst) + + runtimeStatus := getMapValue(t, properties, "runtime_status") + 
require.Equal(t, "#/components/schemas/RuntimeStatus", + getStringValue(t, runtimeStatus, "$ref"), + "%s.runtime_status must reference RuntimeStatus", expectation.schemaName) + + playerTurnStats := getMapValue(t, properties, "player_turn_stats") + require.Equal(t, "array", getStringValue(t, playerTurnStats, "type")) + require.Equal(t, "#/components/schemas/PlayerTurnStat", + getStringValue(t, getMapValue(t, playerTurnStats, "items"), "$ref"), + "%s.player_turn_stats items must reference PlayerTurnStat", expectation.schemaName) + }) + } +} + +// TestRuntimeEventsAsyncAPIPlayerTurnStat verifies the per-player stat +// schema shape from gamemaster/README.md §Async Stream Contracts. +func TestRuntimeEventsAsyncAPIPlayerTurnStat(t *testing.T) { + t.Parallel() + + doc := loadAsyncAPISpec(t) + stat := getMapValue(t, doc, "components", "schemas", "PlayerTurnStat") + + require.Equal(t, false, getScalarValue(t, stat, "additionalProperties")) + require.ElementsMatch(t, + []any{"user_id", "planets", "population"}, + getSliceValue(t, stat, "required")) + + properties := getMapValue(t, stat, "properties") + require.Equal(t, "string", getStringValue(t, getMapValue(t, properties, "user_id"), "type")) + require.Equal(t, "integer", getStringValue(t, getMapValue(t, properties, "planets"), "type")) + require.Equal(t, "integer", getStringValue(t, getMapValue(t, properties, "population"), "type")) +} + +// TestRuntimeEventsAsyncAPIRuntimeStatusEnum verifies the RuntimeStatus +// enum copied locally for the AsyncAPI surface contains the same seven +// values as the OpenAPI surface. 
+func TestRuntimeEventsAsyncAPIRuntimeStatusEnum(t *testing.T) { + t.Parallel() + + doc := loadAsyncAPISpec(t) + schema := getMapValue(t, doc, "components", "schemas", "RuntimeStatus") + + require.ElementsMatch(t, expectedRuntimeStatusEnum, getStringSlice(t, schema, "enum")) +} + +func loadAsyncAPISpec(t *testing.T) map[string]any { + t.Helper() + + payload := loadTextFile(t, filepath.Join("api", "runtime-events-asyncapi.yaml")) + + var doc map[string]any + if err := yaml.Unmarshal([]byte(payload), &doc); err != nil { + require.Failf(t, "test failed", "decode spec: %v", err) + } + + return doc +} + +func loadTextFile(t *testing.T, relativePath string) string { + t.Helper() + + path := filepath.Join(moduleRoot(t), relativePath) + payload, err := os.ReadFile(path) + if err != nil { + require.Failf(t, "test failed", "read file %s: %v", path, err) + } + + return string(payload) +} + +func moduleRoot(t *testing.T) string { + t.Helper() + + _, thisFile, _, ok := runtime.Caller(0) + if !ok { + require.FailNow(t, "runtime.Caller failed") + } + + return filepath.Dir(thisFile) +} + +func getMapValue(t *testing.T, value map[string]any, path ...string) map[string]any { + t.Helper() + + current := value + for _, segment := range path { + raw, ok := current[segment] + if !ok { + require.Failf(t, "test failed", "missing map key %s", segment) + } + next, ok := raw.(map[string]any) + if !ok { + require.Failf(t, "test failed", "value at %s is not a map", segment) + } + current = next + } + + return current +} + +func getStringValue(t *testing.T, value map[string]any, key string) string { + t.Helper() + + raw, ok := value[key] + if !ok { + require.Failf(t, "test failed", "missing key %s", key) + } + result, ok := raw.(string) + if !ok { + require.Failf(t, "test failed", "value at %s is not a string", key) + } + + return result +} + +func getStringSlice(t *testing.T, value map[string]any, key string) []string { + t.Helper() + + raw := getSliceValue(t, value, key) + result := 
make([]string, 0, len(raw)) + for _, item := range raw { + text, ok := item.(string) + if !ok { + require.Failf(t, "test failed", "value at %s is not a string slice", key) + } + result = append(result, text) + } + + return result +} + +func getScalarValue(t *testing.T, value map[string]any, key string) any { + t.Helper() + + raw, ok := value[key] + if !ok { + require.Failf(t, "test failed", "missing key %s", key) + } + + return raw +} + +func getSliceValue(t *testing.T, value map[string]any, key string) []any { + t.Helper() + + raw, ok := value[key] + if !ok { + require.Failf(t, "test failed", "missing key %s", key) + } + result, ok := raw.([]any) + if !ok { + require.Failf(t, "test failed", "value at %s is not a slice", key) + } + + return result +} + +func mapKeys(value map[string]any) []string { + keys := make([]string, 0, len(value)) + for key := range value { + keys = append(keys, key) + } + + return keys +} + +func toAnySlice(values []string) []any { + result := make([]any, 0, len(values)) + for _, value := range values { + result = append(result, value) + } + + return result +} diff --git a/gamemaster/contract_openapi_test.go b/gamemaster/contract_openapi_test.go new file mode 100644 index 0000000..859d051 --- /dev/null +++ b/gamemaster/contract_openapi_test.go @@ -0,0 +1,718 @@ +package gamemaster + +import ( + "context" + "net/http" + "path/filepath" + "runtime" + "testing" + + "github.com/getkin/kin-openapi/openapi3" + "github.com/stretchr/testify/require" +) + +var expectedInternalOperationIDs = []string{ + "internalHealthz", + "internalReadyz", + "internalRegisterRuntime", + "internalGetRuntime", + "internalListRuntimes", + "internalForceNextTurn", + "internalStopRuntime", + "internalPatchRuntime", + "internalBanishRace", + "internalInvalidateMemberships", + "internalGameLiveness", + "internalListEngineVersions", + "internalCreateEngineVersion", + "internalGetEngineVersion", + "internalUpdateEngineVersion", + "internalDeprecateEngineVersion", + 
"internalResolveEngineVersionImageRef", + "internalExecuteCommands", + "internalPutOrders", + "internalGetReport", +} + +// gmOwnedClosedSchemas lists every component schema for which Game Master +// owns the wire shape and therefore must reject unknown fields. The list +// is curated; the matching test fails if any schema in this list opens up. +var gmOwnedClosedSchemas = []string{ + "ProbeResponse", + "LivenessResponse", + "ImageRefResponse", + "RegisterRuntimeMember", + "RegisterRuntimeRequest", + "RuntimeRecord", + "RuntimeListResponse", + "StopRuntimeRequest", + "PatchRuntimeRequest", + "EngineVersion", + "EngineVersionListResponse", + "CreateEngineVersionRequest", + "UpdateEngineVersionRequest", + "ErrorResponse", + "ErrorBody", +} + +// engineOwnedPassthroughSchemas lists every component schema that forwards +// engine-owned payloads verbatim and therefore deliberately uses +// `additionalProperties: true`. The matching test fails if any schema in +// this list closes up. +var engineOwnedPassthroughSchemas = []string{ + "ExecuteCommandsRequest", + "ExecuteCommandsResponse", + "PutOrdersRequest", + "PutOrdersResponse", + "ReportResponse", +} + +// TestInternalOpenAPISpecValidates loads internal-openapi.yaml and verifies +// it is a syntactically valid OpenAPI 3.0 document. +func TestInternalOpenAPISpecValidates(t *testing.T) { + t.Parallel() + loadInternalSpec(t) +} + +// TestInternalSpecHasAllOperationIDs verifies that the spec declares every +// operationId required by gamemaster/PLAN.md Stage 06 and no extras. Adding +// a new operation requires updating expectedInternalOperationIDs in the same +// patch as the spec change. 
+func TestInternalSpecHasAllOperationIDs(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + + got := make([]string, 0, len(expectedInternalOperationIDs)) + for _, pathItem := range doc.Paths.Map() { + for _, op := range pathItem.Operations() { + require.NotEmpty(t, op.OperationID, "every operation must declare a non-empty operationId") + got = append(got, op.OperationID) + } + } + + require.ElementsMatch(t, expectedInternalOperationIDs, got) +} + +// TestInternalSpecRegisterRuntime verifies the register-runtime contract +// used by Game Lobby after a successful container start. +func TestInternalSpecRegisterRuntime(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + op := getOperation(t, doc, "/api/v1/internal/games/{game_id}/register-runtime", http.MethodPost) + + require.Equal(t, "internalRegisterRuntime", op.OperationID) + assertOperationParameterRefs(t, op, "#/components/parameters/GameIDPath") + assertSchemaRef(t, requestSchemaRef(t, op), "#/components/schemas/RegisterRuntimeRequest", "internalRegisterRuntime request") + assertSchemaRef(t, responseSchemaRef(t, op, http.StatusOK), "#/components/schemas/RuntimeRecord", "internalRegisterRuntime 200") + assertResponseRef(t, op, http.StatusBadRequest, "#/components/responses/InvalidRequestError") + assertResponseRef(t, op, http.StatusNotFound, "#/components/responses/EngineVersionNotFoundError") + assertResponseRef(t, op, http.StatusConflict, "#/components/responses/ConflictError") + assertResponseRef(t, op, http.StatusBadGateway, "#/components/responses/EngineUnreachableError") + assertResponseRef(t, op, http.StatusInternalServerError, "#/components/responses/InternalError") + assertResponseRef(t, op, http.StatusServiceUnavailable, "#/components/responses/ServiceUnavailableError") + + req := componentSchemaRef(t, doc, "RegisterRuntimeRequest") + assertRequiredFields(t, req, + "engine_endpoint", "members", "target_engine_version", "turn_schedule") + + member := componentSchemaRef(t, doc, 
"RegisterRuntimeMember") + assertRequiredFields(t, member, "user_id", "race_name") +} + +// TestInternalSpecGetRuntime verifies the runtime read contract. +func TestInternalSpecGetRuntime(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + op := getOperation(t, doc, "/api/v1/internal/runtimes/{game_id}", http.MethodGet) + + require.Equal(t, "internalGetRuntime", op.OperationID) + assertOperationParameterRefs(t, op, "#/components/parameters/GameIDPath") + assertSchemaRef(t, responseSchemaRef(t, op, http.StatusOK), "#/components/schemas/RuntimeRecord", "internalGetRuntime 200") + assertResponseRef(t, op, http.StatusNotFound, "#/components/responses/NotFoundError") + assertResponseRef(t, op, http.StatusInternalServerError, "#/components/responses/InternalError") +} + +// TestInternalSpecListRuntimes verifies the list contract and the optional +// status query parameter. +func TestInternalSpecListRuntimes(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + op := getOperation(t, doc, "/api/v1/internal/runtimes", http.MethodGet) + + require.Equal(t, "internalListRuntimes", op.OperationID) + assertOperationParameterRefs(t, op, "#/components/parameters/RuntimeStatusQuery") + assertSchemaRef(t, responseSchemaRef(t, op, http.StatusOK), "#/components/schemas/RuntimeListResponse", "internalListRuntimes 200") + assertResponseRef(t, op, http.StatusBadRequest, "#/components/responses/InvalidRequestError") + assertResponseRef(t, op, http.StatusInternalServerError, "#/components/responses/InternalError") + + param := componentParameterRef(t, doc, "RuntimeStatusQuery") + require.Equal(t, "status", param.Value.Name) + require.Equal(t, "query", param.Value.In) + require.False(t, param.Value.Required, "status filter must be optional") + require.Equal(t, "#/components/schemas/RuntimeStatus", param.Value.Schema.Ref, + "status filter schema must reference RuntimeStatus") +} + +// TestInternalSpecForceNextTurn verifies the force-next-turn admin contract. 
+func TestInternalSpecForceNextTurn(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + op := getOperation(t, doc, "/api/v1/internal/runtimes/{game_id}/force-next-turn", http.MethodPost) + + require.Equal(t, "internalForceNextTurn", op.OperationID) + require.Nil(t, op.RequestBody, "internalForceNextTurn must have no request body") + assertOperationParameterRefs(t, op, "#/components/parameters/GameIDPath") + assertSchemaRef(t, responseSchemaRef(t, op, http.StatusOK), "#/components/schemas/RuntimeRecord", "internalForceNextTurn 200") + assertResponseRef(t, op, http.StatusNotFound, "#/components/responses/NotFoundError") + assertResponseRef(t, op, http.StatusConflict, "#/components/responses/ConflictError") + assertResponseRef(t, op, http.StatusBadGateway, "#/components/responses/EngineUnreachableError") + assertResponseRef(t, op, http.StatusInternalServerError, "#/components/responses/InternalError") +} + +// TestInternalSpecStopRuntime verifies the stop admin contract. +func TestInternalSpecStopRuntime(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + op := getOperation(t, doc, "/api/v1/internal/runtimes/{game_id}/stop", http.MethodPost) + + require.Equal(t, "internalStopRuntime", op.OperationID) + assertOperationParameterRefs(t, op, "#/components/parameters/GameIDPath") + assertSchemaRef(t, requestSchemaRef(t, op), "#/components/schemas/StopRuntimeRequest", "internalStopRuntime request") + assertSchemaRef(t, responseSchemaRef(t, op, http.StatusOK), "#/components/schemas/RuntimeRecord", "internalStopRuntime 200") + assertResponseRef(t, op, http.StatusBadRequest, "#/components/responses/InvalidRequestError") + assertResponseRef(t, op, http.StatusNotFound, "#/components/responses/NotFoundError") + assertResponseRef(t, op, http.StatusInternalServerError, "#/components/responses/InternalError") + assertResponseRef(t, op, http.StatusServiceUnavailable, "#/components/responses/ServiceUnavailableError") + + req := componentSchemaRef(t, doc, 
"StopRuntimeRequest") + assertRequiredFields(t, req, "reason") + reason := req.Value.Properties["reason"] + require.NotNil(t, reason) + require.Equal(t, "#/components/schemas/StopReason", reason.Ref, + "StopRuntimeRequest.reason must reference StopReason") +} + +// TestInternalSpecPatchRuntime verifies the patch admin contract. +func TestInternalSpecPatchRuntime(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + op := getOperation(t, doc, "/api/v1/internal/runtimes/{game_id}/patch", http.MethodPost) + + require.Equal(t, "internalPatchRuntime", op.OperationID) + assertOperationParameterRefs(t, op, "#/components/parameters/GameIDPath") + assertSchemaRef(t, requestSchemaRef(t, op), "#/components/schemas/PatchRuntimeRequest", "internalPatchRuntime request") + assertSchemaRef(t, responseSchemaRef(t, op, http.StatusOK), "#/components/schemas/RuntimeRecord", "internalPatchRuntime 200") + assertResponseRef(t, op, http.StatusBadRequest, "#/components/responses/InvalidRequestError") + assertResponseRef(t, op, http.StatusNotFound, "#/components/responses/NotFoundError") + assertResponseRef(t, op, http.StatusConflict, "#/components/responses/ConflictError") + assertResponseRef(t, op, http.StatusInternalServerError, "#/components/responses/InternalError") + assertResponseRef(t, op, http.StatusServiceUnavailable, "#/components/responses/ServiceUnavailableError") + + req := componentSchemaRef(t, doc, "PatchRuntimeRequest") + assertRequiredFields(t, req, "version") +} + +// TestInternalSpecBanishRace verifies the engine-side race banish contract +// called by Game Lobby after a permanent membership removal. 
+func TestInternalSpecBanishRace(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + op := getOperation(t, doc, "/api/v1/internal/games/{game_id}/race/{race_name}/banish", http.MethodPost) + + require.Equal(t, "internalBanishRace", op.OperationID) + require.Nil(t, op.RequestBody, "internalBanishRace must have no request body; the race_name is on the path") + assertOperationParameterRefs(t, op, + "#/components/parameters/GameIDPath", + "#/components/parameters/RaceNamePath", + ) + + assertNoContentResponse(t, op, http.StatusNoContent) + assertResponseRef(t, op, http.StatusNotFound, "#/components/responses/NotFoundError") + assertResponseRef(t, op, http.StatusBadGateway, "#/components/responses/EngineUnreachableError") + assertResponseRef(t, op, http.StatusInternalServerError, "#/components/responses/InternalError") +} + +// TestInternalSpecInvalidateMemberships verifies the membership cache hook +// called by Game Lobby on every roster mutation. +func TestInternalSpecInvalidateMemberships(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + op := getOperation(t, doc, "/api/v1/internal/games/{game_id}/memberships/invalidate", http.MethodPost) + + require.Equal(t, "internalInvalidateMemberships", op.OperationID) + require.Nil(t, op.RequestBody) + assertOperationParameterRefs(t, op, "#/components/parameters/GameIDPath") + + assertNoContentResponse(t, op, http.StatusNoContent) + assertResponseRef(t, op, http.StatusNotFound, "#/components/responses/NotFoundError") + assertResponseRef(t, op, http.StatusInternalServerError, "#/components/responses/InternalError") +} + +// TestInternalSpecGameLiveness verifies the liveness reply used by Lobby's +// resume flow. 
+func TestInternalSpecGameLiveness(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + op := getOperation(t, doc, "/api/v1/internal/games/{game_id}/liveness", http.MethodGet) + + require.Equal(t, "internalGameLiveness", op.OperationID) + assertOperationParameterRefs(t, op, "#/components/parameters/GameIDPath") + assertSchemaRef(t, responseSchemaRef(t, op, http.StatusOK), "#/components/schemas/LivenessResponse", "internalGameLiveness 200") + assertResponseRef(t, op, http.StatusInternalServerError, "#/components/responses/InternalError") + + resp := componentSchemaRef(t, doc, "LivenessResponse") + assertRequiredFields(t, resp, "ready", "status") + status := resp.Value.Properties["status"] + require.NotNil(t, status) + require.Equal(t, "#/components/schemas/RuntimeStatus", status.Ref, + "LivenessResponse.status must reference RuntimeStatus") +} + +// TestInternalSpecEngineVersionsCRUD verifies all six engine version +// registry operations: list, create, get, update, deprecate, resolve. 
+func TestInternalSpecEngineVersionsCRUD(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + + listOp := getOperation(t, doc, "/api/v1/internal/engine-versions", http.MethodGet) + require.Equal(t, "internalListEngineVersions", listOp.OperationID) + assertOperationParameterRefs(t, listOp, "#/components/parameters/EngineVersionStatusQuery") + assertSchemaRef(t, responseSchemaRef(t, listOp, http.StatusOK), "#/components/schemas/EngineVersionListResponse", "internalListEngineVersions 200") + + createOp := getOperation(t, doc, "/api/v1/internal/engine-versions", http.MethodPost) + require.Equal(t, "internalCreateEngineVersion", createOp.OperationID) + assertSchemaRef(t, requestSchemaRef(t, createOp), "#/components/schemas/CreateEngineVersionRequest", "create request") + assertSchemaRef(t, responseSchemaRef(t, createOp, http.StatusCreated), "#/components/schemas/EngineVersion", "internalCreateEngineVersion 201") + assertResponseRef(t, createOp, http.StatusConflict, "#/components/responses/ConflictError") + + getOp := getOperation(t, doc, "/api/v1/internal/engine-versions/{version}", http.MethodGet) + require.Equal(t, "internalGetEngineVersion", getOp.OperationID) + assertOperationParameterRefs(t, getOp, "#/components/parameters/VersionPath") + assertSchemaRef(t, responseSchemaRef(t, getOp, http.StatusOK), "#/components/schemas/EngineVersion", "internalGetEngineVersion 200") + assertResponseRef(t, getOp, http.StatusNotFound, "#/components/responses/NotFoundError") + + updateOp := getOperation(t, doc, "/api/v1/internal/engine-versions/{version}", http.MethodPatch) + require.Equal(t, "internalUpdateEngineVersion", updateOp.OperationID) + assertOperationParameterRefs(t, updateOp, "#/components/parameters/VersionPath") + assertSchemaRef(t, requestSchemaRef(t, updateOp), "#/components/schemas/UpdateEngineVersionRequest", "update request") + assertSchemaRef(t, responseSchemaRef(t, updateOp, http.StatusOK), "#/components/schemas/EngineVersion", 
"internalUpdateEngineVersion 200") + + deprecateOp := getOperation(t, doc, "/api/v1/internal/engine-versions/{version}", http.MethodDelete) + require.Equal(t, "internalDeprecateEngineVersion", deprecateOp.OperationID) + assertNoContentResponse(t, deprecateOp, http.StatusNoContent) + assertResponseRef(t, deprecateOp, http.StatusConflict, "#/components/responses/EngineVersionInUseError") + + resolveOp := getOperation(t, doc, "/api/v1/internal/engine-versions/{version}/image-ref", http.MethodGet) + require.Equal(t, "internalResolveEngineVersionImageRef", resolveOp.OperationID) + assertOperationParameterRefs(t, resolveOp, "#/components/parameters/VersionPath") + assertSchemaRef(t, responseSchemaRef(t, resolveOp, http.StatusOK), "#/components/schemas/ImageRefResponse", "internalResolveEngineVersionImageRef 200") + assertResponseRef(t, resolveOp, http.StatusNotFound, "#/components/responses/EngineVersionNotFoundError") + + createReq := componentSchemaRef(t, doc, "CreateEngineVersionRequest") + assertRequiredFields(t, createReq, "version", "image_ref") +} + +// TestInternalSpecHotPathContracts verifies the three Edge Gateway hot-path +// operations and their pass-through schema treatment. 
+func TestInternalSpecHotPathContracts(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + + cmdOp := getOperation(t, doc, "/api/v1/internal/games/{game_id}/commands", http.MethodPost) + require.Equal(t, "internalExecuteCommands", cmdOp.OperationID) + assertOperationParameterRefs(t, cmdOp, + "#/components/parameters/GameIDPath", + "#/components/parameters/XUserIDHeader", + ) + assertSchemaRef(t, requestSchemaRef(t, cmdOp), "#/components/schemas/ExecuteCommandsRequest", "internalExecuteCommands request") + assertSchemaRef(t, responseSchemaRef(t, cmdOp, http.StatusOK), "#/components/schemas/ExecuteCommandsResponse", "internalExecuteCommands 200") + assertResponseRef(t, cmdOp, http.StatusForbidden, "#/components/responses/ForbiddenError") + assertResponseRef(t, cmdOp, http.StatusBadGateway, "#/components/responses/EngineUnreachableError") + + orderOp := getOperation(t, doc, "/api/v1/internal/games/{game_id}/orders", http.MethodPost) + require.Equal(t, "internalPutOrders", orderOp.OperationID) + assertOperationParameterRefs(t, orderOp, + "#/components/parameters/GameIDPath", + "#/components/parameters/XUserIDHeader", + ) + assertSchemaRef(t, requestSchemaRef(t, orderOp), "#/components/schemas/PutOrdersRequest", "internalPutOrders request") + assertSchemaRef(t, responseSchemaRef(t, orderOp, http.StatusOK), "#/components/schemas/PutOrdersResponse", "internalPutOrders 200") + + reportOp := getOperation(t, doc, "/api/v1/internal/games/{game_id}/reports/{turn}", http.MethodGet) + require.Equal(t, "internalGetReport", reportOp.OperationID) + assertOperationParameterRefs(t, reportOp, + "#/components/parameters/GameIDPath", + "#/components/parameters/TurnPath", + "#/components/parameters/XUserIDHeader", + ) + require.Nil(t, reportOp.RequestBody, "internalGetReport must have no request body") + assertSchemaRef(t, responseSchemaRef(t, reportOp, http.StatusOK), "#/components/schemas/ReportResponse", "internalGetReport 200") + assertResponseRef(t, reportOp, 
http.StatusForbidden, "#/components/responses/ForbiddenError") + assertResponseRef(t, reportOp, http.StatusBadGateway, "#/components/responses/EngineUnreachableError") +} + +// TestInternalSpecProbes verifies the two probe operations. +func TestInternalSpecProbes(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + + for _, path := range []string{"/healthz", "/readyz"} { + op := getOperation(t, doc, path, http.MethodGet) + assertSchemaRef(t, responseSchemaRef(t, op, http.StatusOK), "#/components/schemas/ProbeResponse", op.OperationID+" 200") + assertResponseRef(t, op, http.StatusServiceUnavailable, "#/components/responses/ServiceUnavailableError") + } + + healthz := getOperation(t, doc, "/healthz", http.MethodGet) + require.Equal(t, "internalHealthz", healthz.OperationID) + readyz := getOperation(t, doc, "/readyz", http.MethodGet) + require.Equal(t, "internalReadyz", readyz.OperationID) +} + +// TestInternalSpecRuntimeRecordSchema verifies that RuntimeRecord declares +// the required field set documented in gamemaster/README.md §Persistence +// Layout, with the optional lifecycle timestamps present in properties. 
+func TestInternalSpecRuntimeRecordSchema(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + schema := componentSchemaRef(t, doc, "RuntimeRecord") + + assertRequiredFields(t, schema, + "game_id", + "runtime_status", + "engine_endpoint", + "current_image_ref", + "current_engine_version", + "turn_schedule", + "current_turn", + "next_generation_at", + "skip_next_tick", + "engine_health_summary", + "created_at", + "updated_at", + ) + + for _, optional := range []string{"started_at", "stopped_at", "finished_at"} { + require.Contains(t, schema.Value.Properties, optional, + "RuntimeRecord.%s must be present in properties", optional) + } + + runtimeStatus := schema.Value.Properties["runtime_status"] + require.NotNil(t, runtimeStatus) + require.Equal(t, "#/components/schemas/RuntimeStatus", runtimeStatus.Ref, + "RuntimeRecord.runtime_status must reference RuntimeStatus") +} + +// TestInternalSpecEngineVersionSchema verifies the EngineVersion schema's +// required field set and the deliberate `additionalProperties: true` on +// the free-form `options` field. 
+func TestInternalSpecEngineVersionSchema(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + schema := componentSchemaRef(t, doc, "EngineVersion") + + assertRequiredFields(t, schema, + "version", "image_ref", "options", "status", "created_at", "updated_at") + + options := schema.Value.Properties["options"] + require.NotNil(t, options) + require.NotNil(t, options.Value.AdditionalProperties.Has, + "EngineVersion.options must declare additionalProperties explicitly") + require.True(t, *options.Value.AdditionalProperties.Has, + "EngineVersion.options is free-form jsonb and must keep additionalProperties: true") + + status := schema.Value.Properties["status"] + require.NotNil(t, status) + require.Equal(t, "#/components/schemas/EngineVersionStatus", status.Ref, + "EngineVersion.status must reference EngineVersionStatus") +} + +// TestInternalSpecRuntimeStatusEnum verifies the seven-value RuntimeStatus +// enum from gamemaster/README.md §Scope. +func TestInternalSpecRuntimeStatusEnum(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + schema := componentSchemaRef(t, doc, "RuntimeStatus") + + got := stringEnumValues(t, schema) + require.ElementsMatch(t, + []string{ + "starting", + "running", + "generation_in_progress", + "generation_failed", + "stopped", + "engine_unreachable", + "finished", + }, + got) +} + +// TestInternalSpecEngineVersionStatusEnum verifies the EngineVersionStatus +// enum from gamemaster/README.md §Engine Version Registry. +func TestInternalSpecEngineVersionStatusEnum(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + schema := componentSchemaRef(t, doc, "EngineVersionStatus") + + got := stringEnumValues(t, schema) + require.ElementsMatch(t, []string{"active", "deprecated"}, got) +} + +// TestInternalSpecStopReasonEnum verifies the StopReason enum from +// gamemaster/README.md §Lifecycles -> Stop. 
+func TestInternalSpecStopReasonEnum(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + schema := componentSchemaRef(t, doc, "StopReason") + + got := stringEnumValues(t, schema) + require.ElementsMatch(t, []string{"admin_request", "finished", "timeout"}, got) +} + +// TestInternalSpecErrorEnvelope verifies the error envelope shape, which +// must be identical to the Lobby and Runtime Manager envelopes. +func TestInternalSpecErrorEnvelope(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + + envelope := componentSchemaRef(t, doc, "ErrorResponse") + assertRequiredFields(t, envelope, "error") + assertAdditionalPropertiesFalse(t, envelope, "ErrorResponse") + errRef := envelope.Value.Properties["error"] + require.NotNil(t, errRef) + require.Equal(t, "#/components/schemas/ErrorBody", errRef.Ref, + "ErrorResponse.error must reference ErrorBody") + + body := componentSchemaRef(t, doc, "ErrorBody") + assertRequiredFields(t, body, "code", "message") + assertAdditionalPropertiesFalse(t, body, "ErrorBody") +} + +// TestInternalSpecGMOwnedSchemasAreClosed verifies that every schema for +// which Game Master owns the wire shape rejects unknown fields. +func TestInternalSpecGMOwnedSchemasAreClosed(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + + for _, name := range gmOwnedClosedSchemas { + name := name + t.Run(name, func(t *testing.T) { + t.Parallel() + schema := componentSchemaRef(t, doc, name) + assertAdditionalPropertiesFalse(t, schema, name) + }) + } +} + +// TestInternalSpecHotPathSchemasArePassthrough verifies that every engine +// pass-through schema deliberately keeps `additionalProperties: true`. +// The matching test guards against a refactor that closes these by mistake. 
+func TestInternalSpecHotPathSchemasArePassthrough(t *testing.T) { + t.Parallel() + + doc := loadInternalSpec(t) + + for _, name := range engineOwnedPassthroughSchemas { + name := name + t.Run(name, func(t *testing.T) { + t.Parallel() + schema := componentSchemaRef(t, doc, name) + require.NotNil(t, schema.Value.AdditionalProperties.Has, + "%s must declare additionalProperties explicitly", name) + require.True(t, *schema.Value.AdditionalProperties.Has, + "%s must keep additionalProperties: true (engine pass-through)", name) + }) + } +} + +// loadInternalSpec loads and validates gamemaster/api/internal-openapi.yaml +// relative to this test file. +func loadInternalSpec(t *testing.T) *openapi3.T { + t.Helper() + return loadSpec(t, filepath.Join("api", "internal-openapi.yaml")) +} + +func loadSpec(t *testing.T, rel string) *openapi3.T { + t.Helper() + + _, thisFile, _, ok := runtime.Caller(0) + if !ok { + require.FailNow(t, "runtime.Caller failed") + } + + specPath := filepath.Join(filepath.Dir(thisFile), rel) + loader := openapi3.NewLoader() + doc, err := loader.LoadFromFile(specPath) + if err != nil { + require.Failf(t, "test failed", "load spec %s: %v", specPath, err) + } + if doc == nil { + require.Failf(t, "test failed", "load spec %s: returned nil document", specPath) + } + if err := doc.Validate(context.Background()); err != nil { + require.Failf(t, "test failed", "validate spec %s: %v", specPath, err) + } + + return doc +} + +func getOperation(t *testing.T, doc *openapi3.T, path, method string) *openapi3.Operation { + t.Helper() + + if doc.Paths == nil { + require.FailNow(t, "spec is missing paths") + } + pathItem := doc.Paths.Value(path) + if pathItem == nil { + require.Failf(t, "test failed", "spec is missing path %s", path) + } + op := pathItem.GetOperation(method) + if op == nil { + require.Failf(t, "test failed", "spec is missing %s operation for path %s", method, path) + } + + return op +} + +func requestSchemaRef(t *testing.T, op *openapi3.Operation) 
*openapi3.SchemaRef { + t.Helper() + + if op.RequestBody == nil || op.RequestBody.Value == nil { + require.FailNow(t, "operation is missing request body") + } + mt := op.RequestBody.Value.Content.Get("application/json") + if mt == nil || mt.Schema == nil { + require.FailNow(t, "operation is missing application/json request schema") + } + + return mt.Schema +} + +func responseSchemaRef(t *testing.T, op *openapi3.Operation, status int) *openapi3.SchemaRef { + t.Helper() + + ref := op.Responses.Status(status) + if ref == nil || ref.Value == nil { + require.Failf(t, "test failed", "operation is missing %d response", status) + } + mt := ref.Value.Content.Get("application/json") + if mt == nil || mt.Schema == nil { + require.Failf(t, "test failed", "operation is missing application/json schema for %d response", status) + } + + return mt.Schema +} + +func componentSchemaRef(t *testing.T, doc *openapi3.T, name string) *openapi3.SchemaRef { + t.Helper() + + if doc.Components.Schemas == nil { + require.FailNow(t, "spec is missing component schemas") + } + ref := doc.Components.Schemas[name] + if ref == nil { + require.Failf(t, "test failed", "spec is missing component schema %s", name) + } + + return ref +} + +func componentParameterRef(t *testing.T, doc *openapi3.T, name string) *openapi3.ParameterRef { + t.Helper() + + if doc.Components.Parameters == nil { + require.FailNow(t, "spec is missing component parameters") + } + ref := doc.Components.Parameters[name] + if ref == nil { + require.Failf(t, "test failed", "spec is missing component parameter %s", name) + } + + return ref +} + +func assertSchemaRef(t *testing.T, schemaRef *openapi3.SchemaRef, want, name string) { + t.Helper() + require.NotNil(t, schemaRef, "%s schema ref", name) + require.Equal(t, want, schemaRef.Ref, "%s schema ref", name) +} + +func assertRequiredFields(t *testing.T, schemaRef *openapi3.SchemaRef, fields ...string) { + t.Helper() + require.NotNil(t, schemaRef) + require.ElementsMatch(t, fields, 
schemaRef.Value.Required) +} + +func assertOperationParameterRefs(t *testing.T, op *openapi3.Operation, refs ...string) { + t.Helper() + + got := make([]string, 0, len(op.Parameters)) + for _, p := range op.Parameters { + got = append(got, p.Ref) + } + + require.ElementsMatch(t, refs, got) +} + +func assertResponseRef(t *testing.T, op *openapi3.Operation, status int, want string) { + t.Helper() + + ref := op.Responses.Status(status) + if ref == nil { + require.Failf(t, "test failed", "operation %s is missing %d response", op.OperationID, status) + } + require.Equal(t, want, ref.Ref, + "operation %s response %d must reference %s", op.OperationID, status, want) +} + +func assertNoContentResponse(t *testing.T, op *openapi3.Operation, status int) { + t.Helper() + + ref := op.Responses.Status(status) + if ref == nil || ref.Value == nil { + require.Failf(t, "test failed", "operation %s is missing %d response", op.OperationID, status) + } + require.Empty(t, ref.Value.Content, + "operation %s response %d must have no content body", op.OperationID, status) +} + +func assertAdditionalPropertiesFalse(t *testing.T, schemaRef *openapi3.SchemaRef, name string) { + t.Helper() + require.NotNil(t, schemaRef.Value.AdditionalProperties.Has, + "%s must declare additionalProperties explicitly", name) + require.False(t, *schemaRef.Value.AdditionalProperties.Has, + "%s must reject unknown fields (additionalProperties: false)", name) +} + +func stringEnumValues(t *testing.T, schemaRef *openapi3.SchemaRef) []string { + t.Helper() + + require.NotNil(t, schemaRef) + got := make([]string, 0, len(schemaRef.Value.Enum)) + for _, value := range schemaRef.Value.Enum { + s, ok := value.(string) + require.True(t, ok, "enum value %v is not a string", value) + got = append(got, s) + } + return got +} diff --git a/gamemaster/docs/stage01-architecture-sync.md b/gamemaster/docs/stage01-architecture-sync.md new file mode 100644 index 0000000..f459d8e --- /dev/null +++ 
b/gamemaster/docs/stage01-architecture-sync.md @@ -0,0 +1,62 @@ +# Stage 01 — Architecture sync + +This decision record captures the non-obvious choice from +[`../PLAN.md` Stage 01](../PLAN.md#stage-01-update-architecturemd): +the drop of `ships_built` from every architectural mention of +`player_turn_stats`. + +## Context + +Before Stage 01, `ARCHITECTURE.md` and `lobby/README.md` described +`player_turn_stats` as carrying `{user_id, planets, population, +ships_built}`, and the Race Name Directory capability rule was wired in +prose as if `ships_built` could affect the outcome. In practice, the +formal capability rule was already +`max_planets > initial_planets AND max_population > initial_population` +— `ships_built` was named in the stats payload but never referenced by +the rule. + +## Decision + +`player_turn_stats` carries `{user_id, planets, population}` only. +`ships_built` is removed from: + +- `ARCHITECTURE.md §8 Game Master` — `runtime_snapshot_update` payload + description. +- `ARCHITECTURE.md §7 Game Lobby` — per-member aggregate description + (`current and running-max of planets and population`). +- `gamemaster/README.md` — already aligned at the stage-02 README + freeze. + +The capability rule wording is unchanged because it was already +`planets`/`population`-only; only the surrounding prose mentioning the +unused field was inaccurate. + +This is a documentation-only change. No runtime behaviour, wire format, +schema, or test fixture is affected. + +## Why + +`ships_built` was unused. Naming it in the contract obliged every +producer (GM) and consumer (Lobby aggregator) to populate and forward a +field with no consumer. Dropping it now — before any GM code lands — +keeps the contract minimal and avoids future drift between "what the +spec lists" and "what the code uses". `lobby/README.md` and the lobby +aggregate code are aligned in Stage 03 of the same plan. 
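The rule the stage leaves untouched is small enough to state executably. A minimal Go sketch, assuming a hypothetical `memberAggregate` struct (the real aggregate lives in Lobby code; the names here are illustrative, only the comparison itself comes from the rule):

```go
package main

import "fmt"

// memberAggregate is a hypothetical stand-in for the per-member aggregate
// Game Lobby keeps from player_turn_stats: running-max of planets and
// population, plus the values at game start.
type memberAggregate struct {
	initialPlanets, maxPlanets       int
	initialPopulation, maxPopulation int
}

// qualifies applies the capability rule unchanged by Stage 01:
// max_planets > initial_planets AND max_population > initial_population.
// ships_built never participates, which is why dropping it is doc-only.
func qualifies(a memberAggregate) bool {
	return a.maxPlanets > a.initialPlanets &&
		a.maxPopulation > a.initialPopulation
}

func main() {
	grew := memberAggregate{initialPlanets: 1, maxPlanets: 3, initialPopulation: 100, maxPopulation: 250}
	flat := memberAggregate{initialPlanets: 1, maxPlanets: 3, initialPopulation: 100, maxPopulation: 100}
	fmt.Println(qualifies(grew)) // grew on both axes
	fmt.Println(qualifies(flat)) // population never grew
}
```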
+ +## Alternatives considered + +- **Keep `ships_built` in the contract for future use.** Rejected: no + concrete plan exists for a `ships_built`-driven capability or stat + surface; speculative fields rot. +- **Add `ships_built` only as an opaque stat without changing the + capability rule.** Rejected: the runtime cost of carrying it is + negligible, but the documentation burden of explaining why an unused + field is in the payload is not. + +## References + +- [`../PLAN.md` Stage 01](../PLAN.md) +- [`../../ARCHITECTURE.md` §7 Game Lobby](../../ARCHITECTURE.md) +- [`../../ARCHITECTURE.md` §8 Game Master](../../ARCHITECTURE.md) +- [`../README.md`](../README.md) — `player_turn_stats[]` description. diff --git a/gamemaster/docs/stage03-existing-service-docs-sync.md b/gamemaster/docs/stage03-existing-service-docs-sync.md new file mode 100644 index 0000000..cedc1d6 --- /dev/null +++ b/gamemaster/docs/stage03-existing-service-docs-sync.md @@ -0,0 +1,124 @@ +--- +stage: 03 +title: Existing-service docs sync (Lobby, Notification, Game, RTM) +--- + +# Stage 03 — Existing-service docs sync + +This decision record captures the non-obvious choices made while +synchronising every touched-service README with the post-Game-Master +contract before any code change lands. The mechanical edits +(strikethrough renames, drop of `ships_built`, replacement of the +`engineimage.Resolver` block) are not enumerated here — they are direct +consequences of the rules already recorded in +[`../README.md`](../README.md) and +[`../../ARCHITECTURE.md`](../../ARCHITECTURE.md). + +## Context + +Stage 03 had to reach a state where every README in the repository +agreed on three new contractual rules before any service-level code +landed: + +- `image_ref` is resolved synchronously from `Game Master`'s engine + version registry, not from a Go-template held by `Game Lobby`. 
+- A new outgoing `POST /api/v1/internal/games/{game_id}/memberships/invalidate` + hook from `Game Lobby` into `Game Master` fires post-commit on every + roster mutation. +- The engine container splits its REST surface into `/api/v1/admin/*` + (GM-only) and `/api/v1/{command,order,report}` (player), and + `StateResponse` carries a new boolean `finished` field that GM uses + as the sole finish signal. + +Three decisions were not derivable from the GM README and required a +deliberate choice while editing `lobby/README.md`, `game/README.md`, +and `rtmanager/README.md`. + +## Decision 1 — `lobby.game.start` failure modes for GM-driven image resolve + +`Game Lobby` now calls +`GET /api/v1/internal/engine-versions/{version}/image-ref` synchronously +before publishing `runtime:start_jobs`. The contract defines two new +failure modes for the `lobby.game.start` command: + +- GM unreachable (network error, timeout, `5xx`) ⇒ + `lobby.game.start` returns `service_unavailable`; the game stays in + `ready_to_start`. No container is created, no envelope is published. +- GM reports the version is missing or deprecated (`404` or + `engine_version_not_found` payload) ⇒ `lobby.game.start` returns + `engine_version_not_found`; the game stays in `ready_to_start`. + +Both error codes were added to the stable error code list in +`lobby/README.md`. They are deliberately distinct from the existing +GM-unavailable-after-container-start path, which transitions the game to +`paused` (the container is alive; only platform tracking is missing). +Conflating the two would force operators to inspect the `paused` set +for misconfigurations that never produced a container. 
+ +Alternatives considered and rejected: + +- treat GM-unavailable at resolve time as `paused` for symmetry with the + later path — rejected because no container exists, so the + `lobby.runtime_paused_after_start` admin notification (which announces + a stranded container) would be a lie; +- silently fall back to a Go-template default when GM is unreachable — + rejected because it brings back the very coupling the stage is + retiring and lets a misconfigured registry slip through unnoticed. + +## Decision 2 — Membership invalidate hook is fail-open + +The new outgoing +`POST /api/v1/internal/games/{game_id}/memberships/invalidate` call from +`approveapplication`, `rejectapplication`, `redeeminvite`, +`removemember`, `blockmember`, and the user-lifecycle cascade worker is +documented as **fail-open**: a non-2xx response is logged and metered +but never rolls back the Lobby commit. GM's TTL safety net catches +stale data within the next cache TTL window. + +This matches the architectural rule that a failed cross-service hook +must not invalidate an already committed business state. The TTL on +GM's in-process membership cache (default `30s`) bounds the staleness +window; the explicit hook only optimises for the time between commit +and TTL expiry. + +Alternatives considered and rejected: + +- two-phase commit across Lobby and GM — rejected: GM is allowed to be + unavailable without rolling back Lobby's roster mutation; +- queue the invalidation on a Redis Stream and let GM consume it + asynchronously — rejected for v1 because it introduces a new stream + contract for a rare event, and the synchronous post-commit call is + cheap enough that the staleness reduction beats the operational cost. + +## Decision 3 — Keep `runtime:start_jobs` envelope shape unchanged + +The `runtime:start_jobs` envelope continues to carry `image_ref` as a +top-level string field. 
Only the source of that string changes (from a +Lobby-side template substitution to a Lobby-side synchronous call into +GM). `Runtime Manager` does not need a contract change in this stage +and does not learn about engine versions — it still receives a +ready-to-pull Docker reference. + +Alternatives considered and rejected: + +- replace `image_ref` with `engine_version` and have RTM resolve the + image — rejected: it would force RTM to call GM, which violates the + rule that RTM has no upstream service dependencies for runtime + operations; +- attach the resolved version metadata to the envelope alongside + `image_ref` — rejected: RTM has no consumer for the metadata and + carrying it would invite divergence between Lobby and RTM views of + the engine version registry. + +## References + +- [`../PLAN.md` Stage 03](../PLAN.md) +- [`../README.md`](../README.md) — Game Master service description. +- [`../../lobby/README.md`](../../lobby/README.md) — updated Game Start + Flow, internal trusted REST, configuration, and error codes. +- [`../../game/README.md`](../../game/README.md) — admin path layout, + `StateResponse.finished`, `/admin/race/banish` shape. +- [`../../rtmanager/README.md`](../../rtmanager/README.md) — + `runtime:health_events` consumer note. +- [`../../notification/README.md`](../../notification/README.md) — GM as + the producer of the three `game.*` notification types. 
diff --git a/gamemaster/docs/stage06-contract-files.md b/gamemaster/docs/stage06-contract-files.md new file mode 100644 index 0000000..2525ed7 --- /dev/null +++ b/gamemaster/docs/stage06-contract-files.md @@ -0,0 +1,177 @@ +--- +stage: 06 +title: Contract files and contract tests +--- + +# Stage 06 — Contract files and contract tests + +This decision record captures the non-obvious choices made while +producing the machine-readable contracts for `Game Master`: +[`../api/internal-openapi.yaml`](../api/internal-openapi.yaml), +[`../api/runtime-events-asyncapi.yaml`](../api/runtime-events-asyncapi.yaml), +and the matching contract tests in the `gamemaster` package. + +## Context + +[`../PLAN.md` Stage 06](../PLAN.md) freezes the GM REST and event +contracts before any handler is written, so later stages have a target +spec. The plan enumerates the 20 internal REST `operationId` values and +the two `gm:lobby_events` message types and asks contract tests to +fail loudly if anything drifts. + +Three decisions were not derivable from `../README.md` or +[`../../ARCHITECTURE.md`](../../ARCHITECTURE.md) and required a +deliberate choice while writing the YAML. + +## Decision 1 — Two messages and two send operations on one channel + +`gm:lobby_events` carries two distinct message types — a recurring +`runtime_snapshot_update` and a terminal `game_finished`. The AsyncAPI +3.1.0 surface encodes them as **two separate messages on one channel +with one `send` operation per message**: + +```yaml +channels: + lobbyEvents: + address: gm:lobby_events + messages: + runtimeSnapshotUpdate: { $ref: '#/components/messages/RuntimeSnapshotUpdate' } + gameFinished: { $ref: '#/components/messages/GameFinished' } +operations: + publishRuntimeSnapshotUpdate: { action: send, ... } + publishGameFinished: { action: send, ... 
} +``` + +The `notification:intents` contract uses a single message with +`allOf`-conditional discriminator branches; the `runtime:health_events` +contract uses a single message with a `oneOf` `details` field. Both +patterns work when most fields are shared and only one variant slot +differs. + +For `gm:lobby_events` the two payloads share only `event_type`, +`game_id`, `runtime_status`, and `player_turn_stats[]`. The remaining +fields (`current_turn`, `engine_health_summary`, `occurred_at_ms` on +the snapshot vs `final_turn_number`, `finished_at_ms` on the finish +event) have no overlap, and their semantics differ — the snapshot is +recurring, the finish event is terminal. Two messages reflect this +asymmetry directly and keep each payload schema closed without +needing per-variant `if/then` rules. + +Alternatives considered: + +- **One message with `allOf` discriminator** — rejected: would force + every shared field to be optional at the envelope level and + re-required inside each `if/then` branch, doubling the schema size + and complicating the contract test. The notification spec accepts + this cost because it has 18 message types and the payload-shape + asymmetry is the whole point; here it's two types with no field + overlap. +- **Two channels** — rejected: would require Game Lobby to subscribe + to two streams, breaking the cadence guarantees in `../README.md` + §Async Stream Contracts ("snapshot transitions and finish are + ordered relative to each other on the same stream"). + +## Decision 2 — `event_type` is a required schema-level `const` + +[`../PLAN.md` Stage 06](../PLAN.md) lists the "frozen field set per +message" without naming `event_type`. The implementation pins +`event_type` as a required schema property with a `const` value: + +```yaml +RuntimeSnapshotUpdatePayload: + required: [event_type, ...] + properties: + event_type: { type: string, const: runtime_snapshot_update } +``` + +Reasons: + +1. 
The wire payload must carry a discriminator; consumers (Game Lobby) + dispatch on `event_type` after `XREAD`. Omitting it from the schema + would require Game Master to inject the value at publish time + without spec backing. +2. `const` at the schema level lets the contract test assert the + discriminator value, which is the only meaningful check Stage 06 + asks for ("`event_type` discriminator values"). Asserting only the + message component name without the on-wire `event_type` would not + protect consumers from a misconfigured publisher. +3. `rtmanager/api/runtime-health-asyncapi.yaml` already uses + `event_type` as a schema-level enum-typed discriminator; treating + `gm:lobby_events` the same way keeps the patterns consistent for a + reader cross-walking the two specs. + +Alternatives considered: + +- **Leave `event_type` out of the spec and produce it only at the + publish-side adapter** — rejected: hides the discriminator from the + contract test, which then cannot fail when the publisher renames or + drops it. +- **Encode discrimination through AsyncAPI message names alone** + (relying on `header.X-Message-Type` or similar) — rejected: Redis + Streams have no message-headers concept; everything travels in the + payload field set. 
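Consumer-side, the schema-level `const` is what keeps the following dispatch safe. A sketch of the branch Game Lobby takes after `XREAD` — the struct and function are illustrative, while the two `event_type` values and the shared field names are from the contract:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope names only the fields shared by both gm:lobby_events payloads
// that the dispatch needs; the struct itself is illustrative.
type envelope struct {
	EventType string `json:"event_type"`
	GameID    string `json:"game_id"`
}

// dispatch trusts only the on-wire event_type discriminator, exactly the
// value the schema pins with const. An unknown value is a contract breach.
func dispatch(raw []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	switch env.EventType {
	case "runtime_snapshot_update":
		return "apply snapshot for " + env.GameID, nil
	case "game_finished":
		return "finalize " + env.GameID, nil
	default:
		return "", fmt.Errorf("unknown event_type %q", env.EventType)
	}
}

func main() {
	out, _ := dispatch([]byte(`{"event_type":"game_finished","game_id":"g-42"}`))
	fmt.Println(out)
}
```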
+ +## Decision 3 — `additionalProperties: true` on engine pass-through schemas + +Three internal REST operations forward engine-owned payloads without +modification: + +- `internalExecuteCommands` — `POST /api/v1/command` on the engine +- `internalPutOrders` — `PUT /api/v1/order` on the engine +- `internalGetReport` — `GET /api/v1/report` on the engine + +Their request and response bodies use `additionalProperties: true`: + +```yaml +ExecuteCommandsRequest: + type: object + additionalProperties: true + required: [commands] + properties: + commands: + type: array + items: { type: object, additionalProperties: true } +``` + +Game Master does not own the shape of these payloads — `galaxy/game/openapi.yaml` +is the source of truth — and freezing them in the GM contract would +turn every engine-side schema bump into a coordinated GM release. The +same reasoning applies to `EngineVersion.options`, which is a +free-form `jsonb` document Game Master stores verbatim. + +To prevent the open-by-default flag from spreading by accident, the +contract test +[`../contract_openapi_test.go`](../contract_openapi_test.go) maintains +two explicit allowlists: + +- `gmOwnedClosedSchemas` — every schema for which Game Master owns + the wire shape; the test asserts each one closes with + `additionalProperties: false`. +- `engineOwnedPassthroughSchemas` — the five pass-through schemas + (request and response bodies of the three hot-path operations); the + test asserts each one keeps `additionalProperties: true`. + +Adding a new GM schema requires registering it in +`gmOwnedClosedSchemas`; the test fails loudly if it isn't. + +Alternatives considered: + +- **Close the pass-through schemas with `additionalProperties: false` + and hand-mirror every engine field** — rejected: `galaxy/game` and + `galaxy/gamemaster` would have to release in lockstep; even cosmetic + field renames in the engine would break Edge Gateway routing. 
+- **Rely on a `// pass-through` comment in the YAML alone** — rejected: + comments do not survive automated reformatters and provide no + test-time signal. + +## References + +- [`../PLAN.md` Stage 06](../PLAN.md) +- [`../README.md` §Hot Path](../README.md), [`../README.md` §Async Stream Contracts](../README.md) +- [`../api/internal-openapi.yaml`](../api/internal-openapi.yaml) +- [`../api/runtime-events-asyncapi.yaml`](../api/runtime-events-asyncapi.yaml) +- [`../contract_openapi_test.go`](../contract_openapi_test.go) +- [`../contract_asyncapi_test.go`](../contract_asyncapi_test.go) +- [`../../lobby/contract_openapi_test.go`](../../lobby/contract_openapi_test.go) — OpenAPI test pattern reused here. +- [`../../notification/contract_asyncapi_test.go`](../../notification/contract_asyncapi_test.go) — YAML walker pattern reused here. +- [`../../rtmanager/api/runtime-health-asyncapi.yaml`](../../rtmanager/api/runtime-health-asyncapi.yaml) — `event_type` const precedent. diff --git a/gamemaster/docs/stage07-notification-catalog-audit.md b/gamemaster/docs/stage07-notification-catalog-audit.md new file mode 100644 index 0000000..81190d1 --- /dev/null +++ b/gamemaster/docs/stage07-notification-catalog-audit.md @@ -0,0 +1,125 @@ +--- +stage: 07 +title: Notification catalog audit +--- + +# Stage 07 — Notification catalog audit + +This decision record captures the audit outcome and the freeze-test +choice made for the GM-owned notification types +(`game.turn.ready`, `game.finished`, `game.generation_failed`). 
+ +## Context + +[`../PLAN.md` Stage 07](../PLAN.md) asks for confirmation that the three +notification types `Game Master` will produce in Stage 15 are already +wired through the shared producer module +[`../../pkg/notificationintent/`](../../pkg/notificationintent/), the +`notification` service AsyncAPI contract +[`../../notification/api/intents-asyncapi.yaml`](../../notification/api/intents-asyncapi.yaml), +and the catalog freeze in +[`../../notification/contract_asyncapi_test.go`](../../notification/contract_asyncapi_test.go). +The stage is described as «no-op or minor»: edits land elsewhere only if +the audit finds drift. + +The producer-side surface is consumed in Stage 15 by +`gamemaster/internal/adapters/notificationpublisher/`; this stage locks +the contract before the publisher is implemented. + +## Audit outcome — no drift + +Each artefact already matches the `Game Master` notification table at +[`../README.md` §Notification Contracts](../README.md): + +- [`../../pkg/notificationintent/intent.go`](../../pkg/notificationintent/intent.go) + declares `NotificationTypeGameTurnReady`, `NotificationTypeGameFinished`, + `NotificationTypeGameGenerationFailed`; `ExpectedProducer` maps the + three to `ProducerGameMaster`; `SupportsAudience` and `SupportsChannel` + encode `user + (push|email)` for the first two and `admin_email + email` + for the failure type. +- [`../../pkg/notificationintent/payloads.go`](../../pkg/notificationintent/payloads.go) + defines `GameTurnReadyPayload`, `GameFinishedPayload`, + `GameGenerationFailedPayload` with the exact field set required by the + README table, and exposes `NewGameTurnReadyIntent`, + `NewGameFinishedIntent`, `NewGameGenerationFailedIntent`. The + user-targeted constructors take `recipientUserIDs`; the admin-email + constructor does not. 
+- [`../../notification/api/intents-asyncapi.yaml`](../../notification/api/intents-asyncapi.yaml) + carries the three values in the `notification_type` enum, declares + one `if/then` branch each on the envelope, and defines the + `GameTurnReadyPayload`, `GameFinishedPayload`, + `GameGenerationFailedPayload` schemas with the per-type required + fields. +- [`../../notification/contract_asyncapi_test.go`](../../notification/contract_asyncapi_test.go) + freezes the three types inside `expectedNotificationCatalog` and + exercises them through `TestIntentAsyncAPISpecFreezesNotificationCatalogBranches` + and `TestNotificationCatalogDocsStayInSync`. + +There is no separate «catalog data table» inside `notification/internal/`: +the routing decisions live in `pkg/notificationintent/intent.go` and are +shared by every producer and by the notification service itself. +Consequently no edits to +`notification/api/intents-asyncapi.yaml`, +`notification/internal/...`, or +`notification/contract_asyncapi_test.go` are required by this stage. + +## Decision — producer-side compile-time freeze in addition to the YAML freeze + +[`../notificationintent_audit_test.go`](../notificationintent_audit_test.go) +imports `galaxy/notificationintent` from inside the `gamemaster` +package. Because the test names every constant, constructor, and +payload struct field directly, any rename or removal in +`pkg/notificationintent` breaks `go build ./gamemaster/...` before the +test even runs. At runtime the test additionally asserts: + +- the wire value of every `NotificationType` constant + (`game.turn.ready`, `game.finished`, `game.generation_failed`); +- the `Producer`, `AudienceKind`, recipient handling, and `Validate()` + outcome of the constructed intent; +- the on-wire field names through `Contains` checks against + `Intent.PayloadJSON` (catches a JSON tag rename even when the Go + struct field name stays); +- the audience/channel matrix via `SupportsAudience` and + `SupportsChannel`. 
+ +Reasons for adding this in addition to the YAML freeze in +`notification/contract_asyncapi_test.go`: + +1. The YAML freeze runs in the `notification` module. A drift in + `pkg/notificationintent` that is *consistent* with a drift in + `notification/api/intents-asyncapi.yaml` would still be caught, but + the failure surface is on the consumer side, not the producer side. + The GM-side test fails first and points the engineer at the producer + they own. +2. The test binds the contract at compile time. A field rename in + `pkg/notificationintent/payloads.go` cannot land without breaking the + `gamemaster/notificationintent_audit_test.go` build, even before + `go test` runs. +3. Stage 15 will introduce a publisher adapter that calls the same + constructors. Locking the constructor signatures here removes one + class of churn from that stage — the test serves as a contract + reference that the adapter has to satisfy. + +Alternatives considered: + +- **YAML re-parse in `gamemaster/`** — rejected: would duplicate the + walker logic already present in + `notification/contract_asyncapi_test.go` and bind the GM module to + the YAML file path through a relative `../notification/` reference. + The Go-import test catches the relevant drift class with no + cross-module file lookups. +- **No GM-side test, rely on the YAML freeze alone** — rejected: + Stage 07's exit criterion is «the freeze test passes», which the + PLAN explicitly anchors to a new file under `gamemaster/`. The YAML + freeze alone would also miss a Go-side rename that the test author + forgot to mirror in the YAML in the same change.
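The two halves of the freeze — compile-time binding plus runtime wire checks — can be sketched as follows. The constant name and the `Contains`-on-payload technique follow the prose above; the payload field names and the `intent` struct are illustrative assumptions, not the frozen contract:

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-in for the pkg/notificationintent constant; the real audit test
// imports galaxy/notificationintent and references the constant directly,
// so a producer-side rename breaks `go build` before any test runs.
const NotificationTypeGameFinished = "game.finished"

// intent mirrors the two fields the runtime checks need; the payload
// field names used below are illustrative.
type intent struct {
	Type        string
	PayloadJSON string
}

// auditGameFinished is the runtime half of the freeze: pin the wire value
// and catch JSON-tag renames via Contains checks on the marshalled payload.
func auditGameFinished(in intent) error {
	if in.Type != "game.finished" {
		return fmt.Errorf("wire value drifted: %q", in.Type)
	}
	for _, field := range []string{"game_id", "final_turn_number"} {
		if !strings.Contains(in.PayloadJSON, field) {
			return fmt.Errorf("payload lost on-wire field %q", field)
		}
	}
	return nil
}

func main() {
	err := auditGameFinished(intent{
		Type:        NotificationTypeGameFinished,
		PayloadJSON: `{"game_id":"g-1","final_turn_number":42}`,
	})
	fmt.Println(err == nil)
}
```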
+ +## References + +- [`../PLAN.md` Stage 07](../PLAN.md) +- [`../README.md` §Notification Contracts](../README.md) +- [`../notificationintent_audit_test.go`](../notificationintent_audit_test.go) +- [`../../pkg/notificationintent/intent.go`](../../pkg/notificationintent/intent.go) +- [`../../pkg/notificationintent/payloads.go`](../../pkg/notificationintent/payloads.go) +- [`../../notification/api/intents-asyncapi.yaml`](../../notification/api/intents-asyncapi.yaml) +- [`../../notification/contract_asyncapi_test.go`](../../notification/contract_asyncapi_test.go) — YAML-level catalog freeze. diff --git a/gamemaster/docs/stage08-module-skeleton.md b/gamemaster/docs/stage08-module-skeleton.md new file mode 100644 index 0000000..de0d832 --- /dev/null +++ b/gamemaster/docs/stage08-module-skeleton.md @@ -0,0 +1,145 @@ +--- +stage: 08 +title: Module skeleton +--- + +# Stage 08 — GM module skeleton + +This decision record captures the wiring choices made when bootstrapping +the runnable `gamemaster` binary on top of the contracts and freeze +tests landed by Stages 01–07. + +## Context + +[`../PLAN.md` Stage 08](../PLAN.md) calls for a buildable `gamemaster` +process that loads its environment-driven configuration, opens +PostgreSQL and Redis pools, installs the OpenTelemetry runtime, exposes +`/healthz` and `/readyz` on the trusted internal HTTP listener, and +exits cleanly on `SIGTERM` within `GAMEMASTER_SHUTDOWN_TIMEOUT`. No +business endpoints, no workers, and no persistence stores yet. + +The reference implementation is `rtmanager`, the most recently landed +Galaxy service that follows the platform-wide skeleton conventions +(layered `cmd / internal/{app, api, config, logging, telemetry}`, +`app.Component` lifecycle, OpenTelemetry runtime with deferred +observable gauges, fail-fast environment loader). Stage 08 mirrors that +skeleton with two deliberate divergences described below. + +## Decisions + +### 1. 
`go.mod` scope is minimal at Stage 08 + +Only modules actually imported by Stage 08 code land in +[`../go.mod`](../go.mod): + +- `galaxy/postgres`, `galaxy/redisconn`, `galaxy/notificationintent` + (the last one was already present from the Stage 07 freeze test); +- the OpenTelemetry stack (`otel`, `metric`, `trace`, `sdk`, + `sdk/metric`, OTLP exporters for traces and metrics over gRPC and + HTTP, stdout exporters); +- `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp`; +- `github.com/redis/go-redis/v9` (promoted from indirect to direct); +- `github.com/jackc/pgx/v5` (transitive via `pkg/postgres`). + +PLAN-listed modules that arrive with later consumers (`go-jet/jet/v2`, +`pressly/goose/v3`, the testcontainers modules, `go.uber.org/mock`, +`galaxy/cronutil`, `galaxy/error`, `galaxy/util`) are deliberately left +out of Stage 08's `go.mod`. They join the module together with their +first consumers in Stages 09 / 10 / 11 / 12. + +Reasoning: keeping `go mod tidy` honest at every stage is cheaper than +pre-declaring blank-import stubs. The PLAN's full list is the eventual +shape of the module across the series, not a Stage 08 contract. + +### 2. `ShutdownTimeout` lives at the top level of `Config` + +The README §Configuration groups one variable — +`GAMEMASTER_SHUTDOWN_TIMEOUT` — under a documentation group called +"Lifecycle". The Go struct does not split that single field into a +substruct: `Config.ShutdownTimeout` mirrors the +`rtmanager.Config.ShutdownTimeout` shape so the two services stay +isomorphic. The "Lifecycle" group remains a documentation grouping in +[`../README.md`](../README.md) only. + +### 3. Telemetry — counters and histograms now, observable gauges later + +`internal/telemetry/runtime.go` registers every counter and histogram +listed under [`../README.md` §Observability](../README.md) at process +start (`buildRuntime`).
The three observable gauges +(`gamemaster.runtime_records_by_status`, +`gamemaster.scheduler.due_games`, `gamemaster.engine_versions_total`) +are declared up front but their callbacks are installed via a deferred +`Runtime.RegisterGauges(deps)` call. The wiring layer at Stages 11 / 14 +/ 15 supplies the probes (per-status row count, due-now scheduler +count, registered engine versions) once the persistence stores and the +scheduler exist. + +This matches the `rtmanager` pattern where +`runtime_records_by_status` is registered through an analogous +`RegisterGauges` plumbing. + +### 4. PostgreSQL migrations are deferred to Stage 09 + +The README §Startup dependencies states "Embedded goose migrations +apply synchronously before any listener opens." Stage 08 opens, +instruments, and pings the PostgreSQL pool but **does not** call +`postgres.RunMigrations`. The migrations package +(`internal/adapters/postgres/migrations/`) is shipped by Stage 09; the +runtime adds the one-line `RunMigrations` call at that stage. + +Until then, the runtime is buildable, listener-ready, and serves +`/healthz` + `/readyz` against a fresh PostgreSQL pool with no schema +applied. This is acceptable because Stage 08 ships no business handlers +and no workers; nothing reads or writes `gamemaster.*` tables yet. + +### 5. Makefile mirrors `rtmanager` + +[`../Makefile`](../Makefile) declares `jet`, `mocks`, `integration` +targets identical in shape to `rtmanager/Makefile`. The `jet` target +runs `go run ./cmd/jetgen`; the binary lands in Stage 09. The `mocks` +target runs `go generate ./internal/ports/... +./internal/api/internalhttp/handlers/...`; the `//go:generate` +directives land in Stages 10 / 12 / 19. Both targets fail until their +prerequisites land — accepted because Stage 08 does not require either +to succeed; only `go build` and `go test ./gamemaster/...` matter. + +### 6. 
No Docker dependency + +`Game Master` is forbidden from importing the Docker SDK +([`../README.md` §Non-Goals](../README.md)). The skeleton therefore +drops the `newDockerClient` / `pingDocker` helpers from +`internal/app/bootstrap.go` and the Docker-related fields from +`internal/app/wiring.go`. The readiness probe pings PostgreSQL and +Redis only. + +## Files landed + +- `cmd/gamemaster/main.go` — process entrypoint. +- `internal/config/{config.go, env.go, validation.go, config_test.go}` — + GAMEMASTER-prefixed env loader plus required-vars fail-fast. +- `internal/logging/{logger.go, context.go}` — slog JSON-stdout logger + with request id and span id helpers. +- `internal/telemetry/{runtime.go, runtime_test.go}` — OpenTelemetry + runtime, instruments listed in §Observability, deferred gauge + plumbing. +- `internal/api/internalhttp/{server.go, server_test.go}` — `/healthz` + and `/readyz` listener with observability middleware. +- `internal/app/{app.go, app_test.go, bootstrap.go, runtime.go, + wiring.go}` — process lifecycle (component supervisor + reverse-order + cleanup), Redis bootstrap helpers, minimal placeholder wiring. +- `Makefile` — `jet`, `mocks`, `integration` target stubs. +- Updated `go.mod` / `go.sum` with the dependencies and replace + directives for `galaxy/postgres` and `galaxy/redisconn`. + +## Verification + +- `go build ./gamemaster/...` succeeds. +- `go test ./gamemaster/...` passes (existing contract / freeze tests + plus the four new test files). +- Manual smoke against a local Postgres + Redis confirms: + `/healthz` returns `200 ok`, `/readyz` returns `200 ready` while both + dependencies respond, and `503 service_unavailable` once one of them + is brought down. +- `SIGTERM` ends the process within `GAMEMASTER_SHUTDOWN_TIMEOUT`, + releasing PostgreSQL pool, Redis client, and telemetry providers in + reverse construction order. 
diff --git a/gamemaster/docs/stage09-postgres-migration.md b/gamemaster/docs/stage09-postgres-migration.md new file mode 100644 index 0000000..fb90783 --- /dev/null +++ b/gamemaster/docs/stage09-postgres-migration.md @@ -0,0 +1,257 @@ +--- +stage: 09 +title: PostgreSQL schema, migrations, jet +--- + +# Stage 09 — PostgreSQL schema, migrations, jet + +This decision record captures the schema and code-generation pipeline +landed for Game Master at PLAN Stage 09. It is a service-local mirror +of [`../../rtmanager/docs/postgres-migration.md`](../../rtmanager/docs/postgres-migration.md) +but only documents the decisions specific to Stage 09; the stage-24 +[`postgres-migration.md`](postgres-migration.md) reorganisation will +later subsume and supersede this record. + +## Context + +[`../PLAN.md` Stage 09](../PLAN.md) finalises the persistence schema +and the code-generation pipeline. Stage 08 already opens, instruments, +and pings the PostgreSQL pool but does not apply any migrations. The +durable surface for runtime state, engine version registry, player +mappings, and the audit log is described in +[`../README.md` §Persistence Layout](../README.md). Stage 09 ships: + +- `internal/adapters/postgres/migrations/00001_init.sql` plus the + matching embed package; +- `cmd/jetgen` — a testcontainers-driven regeneration pipeline for + the go-jet/v2 query builder code; +- the generated jet code under + `internal/adapters/postgres/jet/gamemaster/{model,table}/`, + committed verbatim; +- the `postgres.RunMigrations` call in `internal/app/runtime.go`, + applied after the PostgreSQL pool ping and before any listener is + built. + +The reference precedent is `rtmanager`, the most recently landed +PG-backed service in the workspace. + +## Decisions + +### 1. 
Schema and role provisioning are excluded from `00001_init.sql` + +**Decision.** The `gamemaster` schema and the matching +`gamemasterservice` role are created outside the migration sequence +(in tests by [`../cmd/jetgen/main.go`](../cmd/jetgen/main.go) +`provisionRoleAndSchema`; in production by an ops init script not in +scope for this stage). The embedded migration `00001_init.sql` only +contains DDL for the four service-owned tables and indexes and assumes +it runs as the schema owner with `search_path=gamemaster`. + +**Why.** [`../../ARCHITECTURE.md` §Database topology](../../ARCHITECTURE.md) +mandates that each service connects with its own role whose grants are +restricted to its own schema. Mixing role creation, schema creation, +and table DDL into one script forces the migration to run as a +superuser on every replica boot and effectively relaxes the per-service +role boundary. The `rtmanager` precedent settled on the split first; +GM follows it for the same architectural reason. This is a deliberate +deviation from PLAN Stage 09's literal `CREATE SCHEMA IF NOT EXISTS +gamemaster;` instruction, called out in the comment header at the top +of `00001_init.sql`. + +### 2. Natural primary keys mirror the platform identifiers + +**Decision.** Every PK is a natural identifier already owned by another +component: + +- `runtime_records.game_id` — Lobby's platform identifier; +- `engine_versions.version` — semver string from the registry; +- `player_mappings (game_id, user_id)` — composite, both columns owned + by Lobby/User Service. +- `operation_log.id` — `bigserial`, the only synthetic PK because the + audit table has no natural identity per row. 
+ +**Why.** The same reasoning as in +[`../../rtmanager/docs/postgres-migration.md` §2](../../rtmanager/docs/postgres-migration.md) +applies: surrogate keys would force every cross-service join through a +lookup table, while the natural keys keep the persistence layer +pin-compatible with the contracts (every `register-runtime` envelope +already names `game_id`, every Lobby resolve names `version`, every +player command names `user_id`). + +### 3. Defense-in-depth CHECK constraints on every status enum + +**Decision.** Five CHECK constraints reproduce the Go-level enums in +the schema: + +- `runtime_records_status_chk` — seven runtime statuses + (`starting`, `running`, `generation_in_progress`, `generation_failed`, + `stopped`, `engine_unreachable`, `finished`); +- `engine_versions_status_chk` — `active | deprecated`; +- `operation_log_op_kind_chk` — nine operation kinds + (`register_runtime`, `turn_generation`, `force_next_turn`, `banish`, + `stop`, `patch`, `engine_version_create`, `engine_version_update`, + `engine_version_deprecate`); +- `operation_log_op_source_chk` — three op sources + (`gateway_player`, `lobby_internal`, `admin_rest`); +- `operation_log_outcome_chk` — `success | failure`. + +The Go-level enums in the domain layer (added in Stage 10) remain the +source of truth for application code. + +**Why.** The same defense-in-depth argument as for `rtmanager`: the +storage boundary catches an adapter regression that would otherwise +persist an unexpected string. Operator-side queries (`SELECT … WHERE +op_kind = 'patch'`) benefit from the enum being verifiable directly in +psql without consulting the Go source. PostgreSQL's `CREATE TYPE … AS +ENUM` was rejected because adding values to a PG enum type requires +`ALTER TYPE` outside a transaction and complicates the single-init +pre-launch policy (decision §6). + +### 4. 
Indexes derive from concrete query shapes + +**Decision.** Three secondary indexes ship with `00001_init.sql`: + +- `runtime_records (status, next_generation_at)` — drives the + scheduler ticker scan + (`WHERE status='running' AND next_generation_at <= now()` once per + second); +- `player_mappings (game_id, race_name)` UNIQUE — enforces the + one-race-per-game invariant at the storage boundary; +- `operation_log (game_id, started_at DESC)` — drives audit reads + ordered by recency. + +The README §Persistence Layout list also mentions `player_mappings +(game_id)`, which is intentionally **not** added: the composite +primary key on `(game_id, user_id)` already serves as a leftmost-prefix +index for `WHERE game_id = $1`, and a one-column duplicate would only +double the write cost for no plan-stability gain. The README's +indexes list is corrected in the same patch to drop the redundant +entry. + +**Why.** Each remaining index has a single concrete read shape behind +it. The composite ordering on `(status, next_generation_at)` lets the +planner satisfy the scheduler scan with one index sweep. The descending +ordering on `(game_id, started_at DESC)` matches the +`ListByGame ORDER BY started_at DESC` shape already established by +`rtmanager.operationlogstore.ListByGame`. + +### 5. `next_generation_at` is nullable + +**Decision.** `runtime_records.next_generation_at timestamptz` admits +NULL; `runtime_records.skip_next_tick boolean NOT NULL DEFAULT false` +does not. + +**Why.** A row enters the table at register-runtime with +`status='starting'` and no scheduled tick yet — the tick is only +computed once the engine `/admin/init` succeeds and the CAS flips the +status to `running`. NULL captures «no tick scheduled» without forcing +a sentinel value into the column. 
The scheduler index +`(status, next_generation_at)` still works correctly: the predicate +`next_generation_at <= now()` is undefined for NULL inputs, and PG +excludes those rows from the result set, which is the desired +behaviour. `skip_next_tick` is a boolean knob set or cleared by the +force-next-turn flow; NULL would be a third state with no semantic, so +the column is NOT NULL with a `false` default. + +### 6. Single-init pre-launch policy applies as documented + +**Decision.** `00001_init.sql` evolves in place until first production +deploy. Adding a column, an index, or a new table during the +pre-launch development window edits this file directly rather than +producing `00002_*.sql`. The runtime applies the migration on every +boot; if the schema is already at head, `pkg/postgres`'s goose +adapter exits zero. + +**Why.** The schema-per-service architectural rule +([`../../ARCHITECTURE.md` §Persistence Backends](../../ARCHITECTURE.md)) +endorses a single-init policy for pre-launch services. The pre-launch +window allows non-additive changes (column rename, type narrowing, +CHECK tightening) that a multi-step migration sequence would force into +awkward two-step rewrites. Once the service ships to production, the +next schema change becomes `00002_*.sql` and the policy lifts. + +### 7. `cmd/jetgen` is a one-to-one mirror of `rtmanager/cmd/jetgen` + +**Decision.** [`../cmd/jetgen/main.go`](../cmd/jetgen/main.go) follows +the same shape as +[`../../rtmanager/cmd/jetgen/main.go`](../../rtmanager/cmd/jetgen/main.go): +spin a `postgres:16-alpine` testcontainer, open it as superuser, +provision the role and schema, open a second pool with +`search_path=gamemaster`, apply the embedded goose migrations, then +invoke `github.com/go-jet/jet/v2/generator/postgres.GenerateDB` with +schema=gamemaster. Constants differ (`gamemasterservice`, +`gamemaster`, `galaxy_gamemaster`) but the algorithm and helper shape +are intentionally identical. 
+ +**Why.** Two PG-backed services should not diverge on a dev-only code +generator that nothing else in the workspace relies on. Mirroring +`rtmanager` keeps `make -C <service> jet` interchangeable for +operators and minimises the cognitive overhead of moving between +services. + +### 8. Generated jet code is committed + +**Decision.** The output of `make -C gamemaster jet` lands under +[`../internal/adapters/postgres/jet/gamemaster/{model,table}/`](../internal/adapters/postgres/jet/gamemaster) +and is committed verbatim. + +**Why.** `go build ./...` from the repository root must work without +Docker; CI runners and contributor machines without a local Docker +daemon must still pass `go test ./gamemaster/...` for the non-PG-store +parts of the module. The generation pipeline itself remains available +behind `make jet` for everyone who wants to regenerate. + +### 9. Migrations apply synchronously before any listener opens + +**Decision.** [`../internal/app/runtime.go`](../internal/app/runtime.go) +calls `postgres.RunMigrations(ctx, pgPool, migrations.FS(), ".")` +immediately after the `postgres.Ping` succeeds and before +`newWiring`/`internalhttp.NewServer` are constructed. A non-zero exit +on migration failure follows the `pkg/postgres` policy. + +**Why.** [`../README.md` §Startup dependencies](../README.md) +specifies that «embedded goose migrations apply synchronously before +any listener opens». Repeated process boots against a head schema +return goose's «no work to do» success — this is how the policy stays +operationally cheap, since a freshly-spawned replica re-applies the +same `00001_init.sql` with no work and proceeds straight to opening +its listeners. + +## Files landed + +- [`../internal/adapters/postgres/migrations/00001_init.sql`](../internal/adapters/postgres/migrations/00001_init.sql) + — full schema for the four service tables plus indexes and CHECK + constraints.
+- [`../internal/adapters/postgres/migrations/migrations.go`](../internal/adapters/postgres/migrations/migrations.go) + — `//go:embed *.sql` and `FS()` exporter. +- [`../cmd/jetgen/main.go`](../cmd/jetgen/main.go) — testcontainers + + goose + jet pipeline. +- [`../internal/adapters/postgres/jet/gamemaster/`](../internal/adapters/postgres/jet/gamemaster) + — generated model and table packages. +- [`../internal/app/runtime.go`](../internal/app/runtime.go) — wired + `postgres.RunMigrations` call after the pool ping. +- [`../Makefile`](../Makefile) — refreshed `jet` target comment now + that the pipeline is real. +- [`../go.mod`](../go.mod), [`../go.sum`](../go.sum) — promoted + `github.com/go-jet/jet/v2`, `github.com/testcontainers/testcontainers-go`, + and `github.com/testcontainers/testcontainers-go/modules/postgres` + to direct dependencies. +- [`../README.md`](../README.md) — corrected §Persistence Layout + indexes list (dropped redundant `player_mappings (game_id)` entry) + and added a §References pointer to this record. + +## Verification + +- `cd gamemaster && go mod tidy` — no missing dependency, no + superfluous indirect. +- `make -C gamemaster jet` — bring up `postgres:16-alpine`, apply + `00001_init.sql`, regenerate `internal/adapters/postgres/jet/...`; + `git status` is clean after a second run. +- `go build ./gamemaster/...` succeeds (including the generated jet + code). +- `go test ./gamemaster/...` passes — existing contract, freeze, and + config/telemetry/HTTP tests are unaffected. +- Manual smoke against a local PostgreSQL with an empty `gamemaster` + schema and a `gamemasterservice` role: the process applies the + migration, `/readyz` returns `200`, and a second boot exits zero on + the «no work to do» path. 
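For reference, the Go-side enum that `runtime_records_status_chk` mirrors (decision §3) can be sketched in the `Status`/`IsKnown`/`AllStatuses` shape the `rtmanager` precedent uses. The identifiers below are illustrative, not the actual Stage 10 source:

```go
package main

import "fmt"

// Status mirrors runtime_records_status_chk. The Go enum is the source
// of truth for application code; the CHECK constraint is the
// defense-in-depth copy at the storage boundary.
type Status string

const (
	StatusStarting             Status = "starting"
	StatusRunning              Status = "running"
	StatusGenerationInProgress Status = "generation_in_progress"
	StatusGenerationFailed     Status = "generation_failed"
	StatusStopped              Status = "stopped"
	StatusEngineUnreachable    Status = "engine_unreachable"
	StatusFinished             Status = "finished"
)

// AllStatuses lists every known status in declaration order.
func AllStatuses() []Status {
	return []Status{
		StatusStarting, StatusRunning, StatusGenerationInProgress,
		StatusGenerationFailed, StatusStopped, StatusEngineUnreachable,
		StatusFinished,
	}
}

// IsKnown reports whether s is one of the seven statuses the CHECK
// constraint admits.
func (s Status) IsKnown() bool {
	for _, known := range AllStatuses() {
		if s == known {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(len(AllStatuses()), Status("running").IsKnown(), Status("paused").IsKnown())
}
```

An adapter regression that emits a string outside this set is caught twice: `IsKnown` in the domain layer, and the CHECK constraint at the storage boundary.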
diff --git a/gamemaster/docs/stage10-domain-and-ports.md b/gamemaster/docs/stage10-domain-and-ports.md new file mode 100644 index 0000000..6ee95c4 --- /dev/null +++ b/gamemaster/docs/stage10-domain-and-ports.md @@ -0,0 +1,184 @@ +--- +stage: 10 +title: Domain layer and ports +--- + +# Stage 10 — Domain layer and ports + +This decision record captures the non-obvious choices made while +introducing the in-memory domain model and port interfaces of Game +Master at PLAN Stage 10. + +## Context + +[`../PLAN.md` Stage 10](../PLAN.md) freezes the domain types and the +port surfaces that adapters (Stage 11/12), services (Stages 13–17), and +workers (Stage 18) will adopt. No adapter or service code lands here; +the stage exists so every consumer of these types in later stages can +import a stable contract. + +The reference precedent is `rtmanager`, the most recently landed +PG-backed service. Its +[`internal/domain/`](../../rtmanager/internal/domain) and +[`internal/ports/`](../../rtmanager/internal/ports) directories define +the shape every Stage 10 file follows: `Status string` enums with +`IsKnown` / `AllStatuses`; `*InvalidTransitionError` wrapping +`ErrInvalidTransition`; transition tables keyed by `(from, to)` pairs; +input structs with `Validate()` methods on every store mutation. + +Six decisions deviate from a direct copy of `rtmanager` or extend the +literal task list of PLAN Stage 10. Each is recorded below. + +## Decisions + +### 1. `internal/domain/operation/` is added beyond the literal task list + +**Decision.** Stage 10 ships +[`internal/domain/operation/log.go`](../internal/domain/operation/log.go) +with `OperationEntry`, `OpKind`, `OpSource`, and `Outcome` types even +though PLAN Stage 10's bullet list does not enumerate them. + +**Why.** The Stage 09 +[`00001_init.sql`](../internal/adapters/postgres/migrations/00001_init.sql) +schema already declares CHECK constraints on `op_kind`, `op_source`, +and `outcome`. 
The +[`ports/operationlog.go`](../internal/ports/operationlog.go) interface +accepts and returns `OperationEntry` values, so the type must either +live in the domain layer or be redefined inside `ports`. The +`rtmanager` precedent +([`rtmanager/internal/domain/operation/log.go`](../../rtmanager/internal/domain/operation/log.go)) +treats it as a domain package; mirroring that keeps Game Master's layout +recognisable and lets later service code import a single canonical +type. The alternative (defining the type in the port file) would +duplicate the SQL CHECK enums in two places once Stage 11's adapter +ships and would force every service-layer caller to import the port +package for what is structurally a value type. + +### 2. `Membership` lives on `ports/lobbyclient.go`, not in the domain + +**Decision.** The DTO consumed by `LobbyClient.GetMemberships` is +declared inside +[`ports/lobbyclient.go`](../internal/ports/lobbyclient.go) rather than a +new `internal/domain/membership/` package. + +**Why.** Game Master does not own membership state — Game Lobby does +([`../../ARCHITECTURE.md` §Membership rules](../../ARCHITECTURE.md)). +Anything GM holds about membership is a remote projection used solely +for hot-path authorisation. Treating it as a port-level DTO matches +`rtmanager`'s precedent for cross-service projections +([`rtmanager/internal/ports/lobbyinternal.go:LobbyGameRecord`](../../rtmanager/internal/ports/lobbyinternal.go)) +and keeps the domain layer free of types that GM does not author. +Promoting it to a domain package later costs nothing if a real +GM-owned invariant ever attaches to it, but the v1 surface has none. + +### 3. `EngineVersion.Options` is `[]byte`, not `map[string]any` + +**Decision.** +[`engineversion.EngineVersion.Options`](../internal/domain/engineversion/model.go) +is declared as `[]byte` carrying the raw `jsonb` document.
+ +**Why.** The OpenAPI contract +([`../api/internal-openapi.yaml`](../api/internal-openapi.yaml)) marks +`EngineVersion.options` as `additionalProperties: true` — the engine +owns the schema, GM is a pass-through registry. A `map[string]any` Go +field would encourage callers to introspect or mutate keys, breaking +that pass-through guarantee. `[]byte` matches how `rtmanager` keeps +`Details json.RawMessage` on health snapshots +([`rtmanager/internal/domain/health/snapshot.go`](../../rtmanager/internal/domain/health/snapshot.go)) +for the same reason. Schema-aware handling can introduce a typed shape +in a future iteration without disturbing existing rows. + +### 4. `Schedule.Next(after, skip)` returns `skipConsumed`, not mutated state + +**Decision.** The wrapper at +[`internal/domain/schedule/nexttick.go`](../internal/domain/schedule/nexttick.go) +exposes `Next(after time.Time, skip bool) (time.Time, bool)`. The +boolean return reports whether the skip flag was consumed; the wrapper +itself stores no state. + +**Why.** Persisting `skip_next_tick=false` is a column update on the +`runtime_records` row and belongs to the service layer (Stage 15), +together with the `next_generation_at` write. Encapsulating that +mutation inside the schedule wrapper would couple a pure value type to +the store; the boolean return keeps the wrapper trivially testable and +lets the caller (service layer) issue the column update via an +existing `UpdateScheduling` port call. + +### 5. The transition table includes `engine_unreachable → running` + +**Decision.** The runtime transitions map +([`internal/domain/runtime/transitions.go`](../internal/domain/runtime/transitions.go)) +permits `engine_unreachable → running` even though Stage 10's task +list does not introduce a producer for that edge. 
+ +**Why.** The Stage 18 +([`../PLAN.md` Stage 18](../PLAN.md)) health-events consumer must be +able to recover an engine that previously appeared unreachable when a +subsequent health observation reports `healthy`. Declaring the edge in +Stage 10 means Stage 18 needs no transitions.go edit — the consumer +calls `UpdateStatus` with the existing CAS guard. The alternative +(wait until Stage 18 to add the edge) would couple two unrelated +stages and force a domain-level edit during a worker stage. + +### 6. mockgen directives target `internal/adapters/mocks/` (deferred) + +**Decision.** Every port file carries a +`//go:generate go run go.uber.org/mock/mockgen +-destination=../adapters/mocks/mock_<port>.go -package=mocks +galaxy/gamemaster/internal/ports <Interface>` directive even though +the destination directory does not exist yet. + +**Why.** Stage 12 ships the +[`internal/adapters/mocks/`](../internal/adapters/mocks) directory and +the first regeneration of `make mocks`. Putting the directives in +place during Stage 10 means Stage 12 only adds the directory and the +generated files; no port file has to be edited then. The directives +are inert until the destination directory exists; running +`go generate ./internal/ports/...` before Stage 12 is expected to +fail. The +[`Makefile`](../Makefile)'s `mocks` target already references the +directives, matching the lobby and rtmanager pattern +([`../../lobby/internal/ports/gmclient.go`](../../lobby/internal/ports/gmclient.go), +[`../../rtmanager/internal/ports/dockerclient.go`](../../rtmanager/internal/ports/dockerclient.go)). + +## Files landed + +- [`../internal/domain/runtime/{model,errors,transitions}.go`](../internal/domain/runtime) + with seven-status enum, `RuntimeRecord` struct, and the transition + table from PLAN Stage 10 plus decision §5.
+- [`../internal/domain/engineversion/{model,semver}.go`](../internal/domain/engineversion) + with the registry status enum, `EngineVersion` struct, and the + `ParseSemver` / `IsPatchUpgrade` helpers. +- [`../internal/domain/playermapping/model.go`](../internal/domain/playermapping/model.go) + carrying the (game_id, user_id) → race_name + engine_player_uuid + projection. +- [`../internal/domain/operation/log.go`](../internal/domain/operation/log.go) + per decision §1. +- [`../internal/domain/schedule/nexttick.go`](../internal/domain/schedule/nexttick.go) + per decision §4. +- Ten port files under + [`../internal/ports/`](../internal/ports) covering the runtime + record, engine version, player mapping, operation log, stream + offset, engine, lobby, runtime manager, notification publisher, and + lobby events surfaces. +- Unit tests next to every source file; the suite covers status + enums, transition matrix, validators, semver normalisation, and + schedule skip semantics. +- [`../go.mod`](../go.mod) gains direct dependencies on + `galaxy/cronutil` and `golang.org/x/mod` for the schedule wrapper + and the semver helpers. + +## Verification + +- `cd gamemaster && go build ./...` — clean. +- `cd gamemaster && go test ./internal/domain/... ./internal/ports/...` + — green; transition matrix exhaustively asserts every allowed and + forbidden pair, semver parser rejects shortened forms, schedule + wrapper honours both `skip` modes. +- `cd gamemaster && go vet ./internal/...` — clean. +- `gofmt -l gamemaster/internal` — empty. +- Stage 09 contract tests + ([`../contract_openapi_test.go`](../contract_openapi_test.go), + [`../contract_asyncapi_test.go`](../contract_asyncapi_test.go), + [`../notificationintent_audit_test.go`](../notificationintent_audit_test.go)) + remain green; Stage 10 introduces no contract changes. 
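The `(from, to)`-keyed transition table convention named above — including the decision §5 `engine_unreachable → running` edge — can be sketched as follows. Only a plausible subset of edges is shown; the real matrix in `internal/domain/runtime/transitions.go` is authoritative:

```go
package main

import "fmt"

type Status string

// transition is one directed edge of the runtime status machine,
// keyed by (from, to) as in rtmanager's transitions.go.
type transition struct{ from, to Status }

// allowed is an illustrative subset of the legal edges.
var allowed = map[transition]bool{
	{"starting", "running"}:                         true,
	{"running", "generation_in_progress"}:           true,
	{"generation_in_progress", "running"}:           true,
	{"generation_in_progress", "generation_failed"}: true,
	{"generation_in_progress", "finished"}:          true,
	{"running", "engine_unreachable"}:               true,
	// Decision §5: declared here so the Stage 18 health-events
	// consumer can recover a reachable engine without a domain edit.
	{"engine_unreachable", "running"}: true,
}

// CanTransition reports whether the edge from -> to is legal; any pair
// absent from the table is forbidden, which makes `finished` terminal.
func CanTransition(from, to Status) bool {
	return allowed[transition{from, to}]
}

func main() {
	fmt.Println(CanTransition("engine_unreachable", "running")) // the §5 edge
	fmt.Println(CanTransition("finished", "running"))           // finished is terminal
}
```

Because the zero value of a missing map entry is `false`, the exhaustive matrix test only has to assert the allowed pairs plus a forbidden sample.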
diff --git a/gamemaster/docs/stage11-persistence-adapters.md b/gamemaster/docs/stage11-persistence-adapters.md new file mode 100644 index 0000000..29607f8 --- /dev/null +++ b/gamemaster/docs/stage11-persistence-adapters.md @@ -0,0 +1,242 @@ +--- +stage: 11 +title: Persistence adapters +--- + +# Stage 11 — Persistence adapters + +This decision record captures the non-obvious choices made while +implementing the four PostgreSQL stores and the Redis offset store of +Game Master at PLAN Stage 11. + +## Context + +[`../PLAN.md` Stage 11](../PLAN.md) ships the persistence layer that +the service-layer stages (13–17) and the worker stage (18) consume. +Stage 09 already shipped the schema, embedded migration, and the +generated jet code; Stage 10 fixed the domain types and the port +interfaces. Stage 11 plugs concrete adapters into those ports. + +The reference precedent is `rtmanager`, the most recently landed +PG-backed service. Its +[`internal/adapters/postgres/`](../../rtmanager/internal/adapters/postgres) +and +[`internal/adapters/redisstate/`](../../rtmanager/internal/adapters/redisstate) +trees define the shape every Stage 11 file follows: per-store package +under `postgres/<store>/store.go`, helper packages under +`internal/sqlx` and `internal/pgtest`, `Config`/`Store`/`New` triple, +ColumnList-driven canonical SELECTs, `sqlx.WithTimeout`/`sqlx.IsNoRows`/ +`sqlx.IsUniqueViolation` shared boundary helpers. + +Eight decisions either deviate from a literal copy of `rtmanager` or +extend the literal task list of PLAN Stage 11. Each is recorded below. + +## Decisions + +### 1.
`internal/sqlx` and `internal/pgtest` are local clones, not a shared module + +**Decision.** +[`internal/adapters/postgres/internal/sqlx/sqlx.go`](../internal/adapters/postgres/internal/sqlx/sqlx.go) +and +[`internal/adapters/postgres/internal/pgtest/pgtest.go`](../internal/adapters/postgres/internal/pgtest/pgtest.go) +are full copies of `rtmanager`'s sibling files, with the few constants +that name the schema and role (`gamemaster`, `gamemasterservice`, +`galaxy_gamemaster`) replaced verbatim. + +**Why.** Each PG-backed service owns its own role, schema, and +migration FS. Promoting these helpers into `pkg/postgres` would force +that package to either know about every schema or take them as +configuration; either path adds surface area for a runtime helper that +already covers exactly one boundary. The `rtmanager` precedent settled +on the per-service clone first and Game Master mirrors it for the +same architectural reason. The duplication cost is small (≈250 lines +total, mechanical) and the alternative would couple services through a +testing concern that has no business in production code. + +### 2. CAS via `(game_id, status)` predicate, not `SELECT … FOR UPDATE` + +**Decision.** +[`runtimerecordstore.UpdateStatus`](../internal/adapters/postgres/runtimerecordstore/store.go) +encodes the compare-and-swap as a `WHERE game_id = $1 AND status = $2` +predicate on a single `UPDATE`, then probes the row's existence on +`RowsAffected == 0` to distinguish `runtime.ErrConflict` (status +changed concurrently) from `runtime.ErrNotFound` (row absent). + +**Why.** Same reasoning as +[`rtmanager/docs/postgres-migration.md` §CAS](../../rtmanager/docs/postgres-migration.md): +holding a `SELECT … FOR UPDATE` lock would block every other tick on +the same game while the Go code computed the next status, lengthening +the locked region for no correctness gain. The CAS-only path is +verified by `TestUpdateStatusConcurrentCAS` (8 goroutines, exactly one +winner). + +### 3. 
Port-level deviation: `UpdateEngineVersionInput.Now` and `Deprecate(ctx, version, now)` + +**Decision.** +[`ports/engineversionstore.go`](../internal/ports/engineversionstore.go) +gains a `Now time.Time` field on `UpdateEngineVersionInput` (validated +by `Validate` to be non-zero) and a `now time.Time` argument on +`Deprecate`. The corresponding port-level test fixtures in +`engineversionstore_test.go` are updated to carry the new value. + +**Why.** Stage 10's literal port did not include a wall-clock for the +engine-version mutators, while +[`UpdateStatusInput`](../internal/ports/runtimerecordstore.go) and +[`UpdateSchedulingInput`](../internal/ports/runtimerecordstore.go) do. +Without `Now` in the input, the adapter would have to either call +`time.Now()` directly (loses test determinism) or accept a `Clock` +dependency in `Config` (adds adapter infrastructure for a single use +case). Aligning the inputs is a small, targeted contract change +allowed by the pre-launch single-init policy and consistent with the +clock-from-input convention adopted everywhere else in the service. + +### 4. Domain-level conflict sentinels `engineversion.ErrConflict` and `playermapping.ErrConflict` + +**Decision.** The domain packages +[`engineversion`](../internal/domain/engineversion/model.go) and +[`playermapping`](../internal/domain/playermapping/model.go) gain +`ErrConflict` sentinels. Adapters surface PostgreSQL unique violations +as `fmt.Errorf("...: %w", <pkg>.ErrConflict)` so service callers can +branch with `errors.Is`. + +**Why.** `runtime.ErrConflict` already exists in the runtime package +and the rest of the codebase (lobby, rtmanager, notification) uses +domain-level conflict sentinels (e.g. +`membership.ErrConflict`, +`runtime.ErrConflict`). Returning a generic wrapped error for +engine-version and player-mapping conflicts would break the +established pattern and force the service layer to carry adapter +implementation knowledge (`sqlx.IsUniqueViolation`).
Adding two +sentinels is a small, idiomatic deviation from PLAN Stage 11's bullet +list, called out here so future contract diffs do not re-litigate it. + +### 5. `Options` jsonb requires explicit `CAST(... AS jsonb)` in dynamic UPDATE + +**Decision.** In +[`engineversionstore.Update`](../internal/adapters/postgres/engineversionstore/store.go) +the dynamic assignment for `options` wraps the value in +`pg.StringExp(pg.CAST(pg.String(...)).AS("jsonb"))`. The plain +`pg.String(...)` literal makes PostgreSQL infer the right-hand side as +`text` and the assignment to a `jsonb` column then fails with +SQLSTATE `42804` (`column is of type jsonb but expression is of type +text`). + +**Why.** `INSERT ... VALUES(...)` paths bind the `[]byte` through pgx, +which knows how to coerce text into jsonb at the protocol level. +Dynamic `UPDATE … SET options = '...'` does not go through that bind +because the SQL contains a string literal directly; PostgreSQL applies +its own type inference and fails. Using +[`jet`'s `CAST`](https://pkg.go.dev/github.com/go-jet/jet/v2/postgres#CAST) +is the cleanest way to force the right-hand-side type without dropping +to raw SQL. Storing `'{}'::jsonb` as the empty default mirrors the SQL +column default. + +### 6. `Deprecate` is idempotent through a pre-check `Get` + +**Decision.** +[`engineversionstore.Deprecate`](../internal/adapters/postgres/engineversionstore/store.go) +runs `Get(version)` first to distinguish three cases: row absent +(return `engineversion.ErrNotFound`), row already deprecated (return +`nil` with no further mutation), row active (run the +`UPDATE ... SET status='deprecated'`). Without the pre-check the +adapter would have to interpret `RowsAffected == 0` against an +ambiguous SQL guard (`WHERE version = ? AND status != 'deprecated'`). + +**Why.** Deprecation is a relatively rare admin operation; the extra +read costs ≈one millisecond and removes the ambiguity. 
The +alternative is the same `classifyMissingUpdate` probe pattern used by +`UpdateStatus`, which would still need a Get to tell "missing" from +"already deprecated". The pre-check is the simplest path. + +### 7. `BulkInsert` ships every row in one multi-row `INSERT`, not a transaction + +**Decision.** +[`playermappingstore.BulkInsert`](../internal/adapters/postgres/playermappingstore/store.go) +emits a single `INSERT ... VALUES (a), (b), …` with as many tuples as +the input slice. Any unique-violation rolls back every row in the same +statement. + +**Why.** The atomicity guarantee Game Master needs (no partial +roster) is already provided by PostgreSQL's per-statement implicit +transaction; wrapping the same rows in `BEGIN; INSERT; INSERT; COMMIT` +buys nothing and adds round-trips. The multi-row form is also the +only path that lets jet's +[`InsertStatement.VALUES(...)`](https://pkg.go.dev/github.com/go-jet/jet/v2/postgres#InsertStatement) +chain without escape hatches. Atomicity is verified end-to-end by +[`TestBulkInsertAtomicConflictRaceName`](../internal/adapters/postgres/playermappingstore/store_test.go) +(3 valid rows + 1 conflicting → 0 rows persisted). + +### 8. `miniredis/v2` is a direct gamemaster dependency + +**Decision.** +[`go.mod`](../go.mod) gains `github.com/alicebob/miniredis/v2` as a +direct dependency. The +[`streamoffsets` test suite](../internal/adapters/redisstate/streamoffsets/store_test.go) +uses `miniredis.RunT(t)` per test for full isolation. + +**Why.** Same reasoning as `rtmanager`: an in-memory Redis is faster +than testcontainers Redis, fully isolated per test, and fits the +shape of the offset-store API. Adding it as a direct dep matches the +pattern in the repo (`rtmanager`, `notification`, `lobby` all do this +for similar adapter test suites). + +## Files landed + +- [`../internal/domain/engineversion/model.go`](../internal/domain/engineversion/model.go) + — `ErrConflict` sentinel. 
+- [`../internal/domain/playermapping/model.go`](../internal/domain/playermapping/model.go) + — `ErrConflict` sentinel. +- [`../internal/ports/engineversionstore.go`](../internal/ports/engineversionstore.go) + — `Now` field, `Deprecate(ctx, version, now)` signature. +- [`../internal/ports/engineversionstore_test.go`](../internal/ports/engineversionstore_test.go) + — port-level fixtures plus the new `now must not be zero` reject + case. +- [`../internal/adapters/postgres/internal/sqlx/sqlx.go`](../internal/adapters/postgres/internal/sqlx/sqlx.go) + — `WithTimeout`, `IsNoRows`, `IsUniqueViolation`, `Nullable*` + helpers (mirror of `rtmanager`). +- [`../internal/adapters/postgres/internal/pgtest/pgtest.go`](../internal/adapters/postgres/internal/pgtest/pgtest.go) + — testcontainers harness scoped to the `gamemaster` schema and + service role. +- [`../internal/adapters/postgres/runtimerecordstore/store.go`](../internal/adapters/postgres/runtimerecordstore/store.go) + with full `_test.go`. +- [`../internal/adapters/postgres/engineversionstore/store.go`](../internal/adapters/postgres/engineversionstore/store.go) + with full `_test.go`. +- [`../internal/adapters/postgres/playermappingstore/store.go`](../internal/adapters/postgres/playermappingstore/store.go) + with full `_test.go`. +- [`../internal/adapters/postgres/operationlog/store.go`](../internal/adapters/postgres/operationlog/store.go) + with full `_test.go`. +- [`../internal/adapters/redisstate/keyspace.go`](../internal/adapters/redisstate/keyspace.go). +- [`../internal/adapters/redisstate/streamoffsets/store.go`](../internal/adapters/redisstate/streamoffsets/store.go) + with full `_test.go`. +- [`../go.mod`](../go.mod), [`../go.sum`](../go.sum) — `miniredis/v2` + promoted to a direct dependency. +- [`../README.md`](../README.md) — §References pointer to this + record. + +## Verification + +```sh +cd gamemaster + +# Domain + port unit tests still pass after the Stage-11 contract +# touch-ups. 
+go test ./internal/domain/... ./internal/ports/... + +# All adapter test suites (require Docker for testcontainers; without +# Docker, the pgtest helpers call t.Skip). +go test ./internal/adapters/postgres/... +go test ./internal/adapters/redisstate/... + +# CAS race coverage with -race; the test must observe exactly one +# winner per run. +go test -count=3 -race -run TestUpdateStatusConcurrentCAS \ + ./internal/adapters/postgres/runtimerecordstore + +# Stage 06/07 contract freeze tests stay green: +go test ./... -run Contract +go test ./... -run NotificationIntent +``` + +The full repo-level `go build ./...` from the workspace root also +succeeds; service-layer stages (13+) and the mocks regeneration +(stage 12) are unaffected by Stage 11's adapter additions. diff --git a/gamemaster/docs/stage12-external-clients.md b/gamemaster/docs/stage12-external-clients.md new file mode 100644 index 0000000..813ec8e --- /dev/null +++ b/gamemaster/docs/stage12-external-clients.md @@ -0,0 +1,211 @@ +--- +stage: 12 +title: External clients +--- + +# Stage 12 — External clients + +This decision record captures the non-obvious choices made while +implementing the five outbound adapters Game Master uses to talk to +the engine, Game Lobby, Runtime Manager, the notification stream, and +the lobby-events stream at PLAN Stage 12. + +## Context + +[`../PLAN.md` Stage 12](../PLAN.md) ships the adapter layer the +service-layer stages 13–18 depend on. Ports were frozen by Stage 10 +([`stage10-domain-and-ports.md`](./stage10-domain-and-ports.md)) and +the AsyncAPI/OpenAPI contracts were frozen by Stage 06 +([`stage06-contract-files.md`](./stage06-contract-files.md)). 
The +reference precedent is `rtmanager`'s adapter tree +([`rtmanager/internal/adapters/lobbyclient`](../../rtmanager/internal/adapters/lobbyclient), +[`rtmanager/internal/adapters/notificationpublisher`](../../rtmanager/internal/adapters/notificationpublisher), +[`rtmanager/internal/adapters/healtheventspublisher`](../../rtmanager/internal/adapters/healtheventspublisher)), +which Stage 11 already locked in as the canonical shape for Game +Master persistence adapters. Stage 12 extends that precedent to the +HTTP clients and stream publishers. + +Six decisions deviate from a literal copy of the `rtmanager` precedent +or extend the literal task list of PLAN Stage 12. Each is recorded +below. + +## Decisions + +### 1. Engine client carries no `BaseURL` in `Config` + +**Decision.** +[`engineclient.Config`](../internal/adapters/engineclient/client.go) +exposes only `CallTimeout` and `ProbeTimeout`. The engine endpoint +URL is supplied per call from `runtime_records.engine_endpoint`. + +**Why.** Game Master operates on N concurrent games at runtime; each +game lives behind its own DNS hostname (`http://galaxy-game-{game_id}:8080`). +Binding a base URL at construction would force a per-game client +instance and complicate the caller. The port already reflects the +right shape (`baseURL` is a method parameter on every method), so the +adapter follows it. The `*http.Client` is shared, so the HTTP +connection pool stays single-instance. + +### 2. Two timeouts on the engine client, dispatched per method + +**Decision.** The engine client routes turn-generation-class methods +(`Init`, `Turn`, `BanishRace`, `ExecuteCommands`, `PutOrders`) +through `CallTimeout` and inspect-style methods (`Status`, +`GetReport`) through `ProbeTimeout`. Both are required and must be +positive at construction. 
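+
+The dispatch can be pictured as a small helper (an illustrative
+sketch, not the landed code; the `callCtx` name and the unexported
+timeout fields are assumptions):
+
+```go
+// callCtx derives the deadline for one engine call from its method
+// class: probe-style reads get ProbeTimeout, everything that can
+// drive turn generation gets CallTimeout.
+func (c *Client) callCtx(ctx context.Context, probe bool) (context.Context, context.CancelFunc) {
+	if probe { // Status, GetReport
+		return context.WithTimeout(ctx, c.probeTimeout)
+	}
+	// Init, Turn, BanishRace, ExecuteCommands, PutOrders
+	return context.WithTimeout(ctx, c.callTimeout)
+}
+```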
+ +**Why.** README §Configuration already declares the two +(`GAMEMASTER_ENGINE_CALL_TIMEOUT=30s`, +`GAMEMASTER_ENGINE_PROBE_TIMEOUT=5s`) for exactly this dispatch: +turn generation on a large game can run for tens of seconds, while +status/report reads are bounded and benefit from a tight ceiling. +A single shared timeout would either starve the long calls or relax +the short ones; the dispatch keeps the contract consistent with the +documented intent. + +### 3. Engine `population` (number) decoded into `int` via `math.Round` + +**Decision.** +[`engineclient`](../internal/adapters/engineclient/client.go) decodes +each `PlayerState.population` (typed as `number` in `game/openapi.yaml`) +into a private `float64` field, then converts to the port-level `int` +through `int(math.Round(value))`. NaN, infinite, and negative values +are rejected as `ports.ErrEngineProtocolViolation`. + +**Why.** The port (Stage 10) and the AsyncAPI for `gm:lobby_events` +both treat population as a non-negative integer; the engine spec is +the only place it is typed as `number`. The engine in practice +returns whole values, but a defensive `math.Round` removes any +floating-point noise that would otherwise propagate to Lobby. +Rejecting NaN/Inf/negative payloads keeps the protocol invariant +explicit at the trust boundary. + +### 4. Lobby client walks pagination with a hard page cap + +**Decision.** +[`lobbyclient.GetMemberships`](../internal/adapters/lobbyclient/client.go) +walks the `next_page_token` chain transparently with `page_size=200`, +stopping when the upstream response carries an empty +`next_page_token`. A hard cap of 64 pages (`maxPages`) surfaces as +`fmt.Errorf("%w: pagination overflow ...", ports.ErrLobbyUnavailable)` +when crossed. + +**Why.** The port contract is "every membership of gameID, in any +status"; the only way to satisfy it across Lobby's paged contract is +to follow the chain. 
The 64-page cap is a defensive guard against a
+broken upstream that keeps issuing tokens; 64 × 200 = 12 800
+memberships per game, two orders of magnitude beyond any realistic
+Galaxy roster, so legitimate traffic never trips it. Surfacing the
+overflow as `ErrLobbyUnavailable` lets the membership cache treat it
+the same as any other transport fault.
+
+### 5. RTM client does not introduce `ErrSemverPatchOnly`
+
+**Decision.** RTM's `409 conflict` with `error_code=semver_patch_only`
+is wrapped as `fmt.Errorf("%w: rtm patch: ... (error_code=semver_patch_only)", ports.ErrRTMUnavailable)`
+without a dedicated typed sentinel.
+
+**Why.** The Stage 10 port [`RTMClient.Patch`](../internal/ports/rtmclient.go)
+declares only `ErrRTMUnavailable`. Adding `ErrSemverPatchOnly` here
+would extend the port contract beyond Stage 10's frozen surface, and
+the v1 service-layer caller (Stage 17, `adminpatch`) already
+validates semver-patch eligibility against `engineversionstore`
+before issuing the call. The 409 path is therefore a defence-in-depth
+signal, not a primary branch; a single wrapped error keeps the port
+narrow and lets the caller match on the message substring if it
+ever needs to (today it does not).
+
+### 6. Lobby-events publisher reuses the `rtmanager/healtheventspublisher` shape, with two methods sharing one stream
+
+**Decision.**
+[`lobbyeventspublisher.Publisher`](../internal/adapters/lobbyeventspublisher/publisher.go)
+exposes `PublishSnapshotUpdate` and `PublishGameFinished`, both
+hitting the same Redis Stream key (`cfg.Streams.LobbyEvents`,
+default `gm:lobby_events`). Each XADD encodes the same field
+vocabulary as `rtmanager/healtheventspublisher`: integer fields are
+serialised through `strconv.FormatInt` / `strconv.Itoa`, the
+per-player projection is JSON-encoded into one stream field
+(`player_turn_stats`), and the discriminator field (`event_type`) is
+a string literal pinned to one of the two AsyncAPI const values.
+No MAXLEN cap is set on XADD; an empty `PlayerTurnStats` slice is +serialised as `"[]"` (literal). All `time.Time` fields are coerced +to UTC before `UnixMilli()` so the published timestamps match the +contract regardless of caller-supplied timezone. + +**Why.** The two messages share one channel per the AsyncAPI spec +([`runtime-events-asyncapi.yaml`](../api/runtime-events-asyncapi.yaml)); +the discriminator is the documented dispatch key for Lobby's +consumer. Using the existing field-encoding pattern from +`rtmanager/healtheventspublisher` keeps the wire format consistent +across services and lets Lobby reuse the same XADD-decoding helpers +it already runs against `runtime:health_events`. Setting MAXLEN was +considered and rejected: Game Master never processes the stream +itself, and the Lobby consumer owns its consumer-group offset, so +trimming would risk dropping unconsumed entries. The empty `"[]"` +default keeps the stream entry valid JSON for the field even before +the first turn generates (when no per-player stats exist yet). + +### 7. Defensive Makefile guard for `make mocks` between Stage 12 and Stage 19 + +**Decision.** The `mocks` Makefile target now skips the +`internal/api/internalhttp/handlers/...` line when that directory +does not yet exist: + +```makefile +mocks: + go generate ./internal/ports/... + @if [ -d ./internal/api/internalhttp/handlers ]; then \ + go generate ./internal/api/internalhttp/handlers/...; \ + fi +``` + +**Why.** Stage 8 wired the Makefile to regenerate both port-level +and handler-level mocks, but the handlers directory only appears at +Stage 19. Without the guard, `make mocks` fails with `lstat: no such +file or directory` between Stage 12 and Stage 19 — exactly when GM +is being grown stage by stage. The guard makes the target idempotent +across stages and adds zero cost when the directory is finally +created. 
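+
+As an illustration of the D6 field vocabulary, one snapshot XADD can
+be sketched as follows (field names other than `event_type` and
+`player_turn_stats` are illustrative, and the go-redis client shape
+is an assumption, not the landed adapter):
+
+```go
+// An empty, non-nil stats slice marshals to "[]", keeping the stream
+// field valid JSON before the first turn is generated.
+statsJSON, _ := json.Marshal(stats)
+err := rdb.XAdd(ctx, &redis.XAddArgs{
+	Stream: cfg.Streams.LobbyEvents, // default "gm:lobby_events"; no MAXLEN cap
+	Values: map[string]any{
+		"event_type":        "runtime_snapshot_update", // AsyncAPI const
+		"game_id":           gameID,
+		"turn":              strconv.Itoa(turn),
+		"generated_at":      strconv.FormatInt(generatedAt.UTC().UnixMilli(), 10),
+		"player_turn_stats": string(statsJSON),
+	},
+}).Err()
+```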
+ +## Files landed + +- [`../internal/adapters/engineclient/client.go`](../internal/adapters/engineclient/client.go), + [`../internal/adapters/engineclient/client_test.go`](../internal/adapters/engineclient/client_test.go) +- [`../internal/adapters/lobbyclient/client.go`](../internal/adapters/lobbyclient/client.go), + [`../internal/adapters/lobbyclient/client_test.go`](../internal/adapters/lobbyclient/client_test.go) +- [`../internal/adapters/rtmclient/client.go`](../internal/adapters/rtmclient/client.go), + [`../internal/adapters/rtmclient/client_test.go`](../internal/adapters/rtmclient/client_test.go) +- [`../internal/adapters/notificationpublisher/publisher.go`](../internal/adapters/notificationpublisher/publisher.go), + [`../internal/adapters/notificationpublisher/publisher_test.go`](../internal/adapters/notificationpublisher/publisher_test.go) +- [`../internal/adapters/lobbyeventspublisher/publisher.go`](../internal/adapters/lobbyeventspublisher/publisher.go), + [`../internal/adapters/lobbyeventspublisher/publisher_test.go`](../internal/adapters/lobbyeventspublisher/publisher_test.go) +- [`../internal/adapters/mocks/`](../internal/adapters/mocks) — ten + generated `mockgen` files covering every Stage 10 port (engine, + lobby, rtm, notification publisher, lobby-events publisher, plus + the five store/log ports landed by Stage 11). +- [`../Makefile`](../Makefile) — defensive guard on the `mocks` + target. +- [`../README.md`](../README.md) — §References pointer to this + record. + +## Verification + +```sh +cd gamemaster + +# Mocks regenerate cleanly with no diff after a second run. +make mocks +git diff --exit-code internal/adapters/mocks + +# Adapter-level unit tests against httptest / miniredis. +go test ./internal/adapters/engineclient/... +go test ./internal/adapters/lobbyclient/... +go test ./internal/adapters/rtmclient/... +go test ./internal/adapters/notificationpublisher/... +go test ./internal/adapters/lobbyeventspublisher/... 
+ +# Full repo build remains green; Stage 06/07/09–11 contract and +# adapter tests are unaffected. +go test ./... +``` diff --git a/gamemaster/docs/stage13-register-runtime.md b/gamemaster/docs/stage13-register-runtime.md new file mode 100644 index 0000000..607089f --- /dev/null +++ b/gamemaster/docs/stage13-register-runtime.md @@ -0,0 +1,230 @@ +--- +stage: 13 +title: Register-runtime service +--- + +# Stage 13 — Register-runtime service + +This decision record captures the non-obvious choices made while +implementing the `register-runtime` service-layer orchestrator at PLAN +Stage 13. The service is the single entry point Game Lobby uses (after +Runtime Manager has reported a successful container start) to install a +freshly-started game in Game Master. + +## Context + +[`../PLAN.md` Stage 13](../PLAN.md) ships the first service-layer stage +of Game Master. It lays the orchestrator pattern that Stages 14–17 will +reuse (engine version registry CRUD, scheduler, hot path, admin +operations). The lifecycle the service drives is frozen by +[`../README.md` §Lifecycles → Register-runtime](../README.md): + +1. validate request shape; +2. reject if `runtime_records.{game_id}` already exists; +3. resolve `image_ref` for `target_engine_version`; +4. persist `runtime_records` with `status=starting`; +5. call engine `POST /api/v1/admin/init`; +6. persist `player_mappings` from the engine response; +7. CAS `status: starting → running` and persist initial scheduling; +8. append `operation_log`; +9. publish `runtime_snapshot_update`; +10. return the persisted record. + +The reference precedent is +[`rtmanager/internal/service/startruntime`](../../rtmanager/internal/service/startruntime), +which established the `Input` / `Result` / `Dependencies` / `NewService` +/ `Handle` shape, the `recordFailure` helper, and the +`bestEffortAppend` audit-log convention. + +Five decisions deviate from a literal reading of either PLAN Stage 13 +or the rtmanager precedent. 
Each is recorded below. + +## Decisions + +### 1. `RuntimeRecordStore.Delete` extension + +**Decision.** [`ports.RuntimeRecordStore`](../internal/ports/runtimerecordstore.go) +gains an idempotent `Delete(ctx, gameID) error` method. The +PostgreSQL-backed adapter +[`runtimerecordstore.Store.Delete`](../internal/adapters/postgres/runtimerecordstore/store.go) +issues a single `DELETE FROM runtime_records WHERE game_id = $1` and +returns `nil` even when no row matches. The mock at +[`internal/adapters/mocks/mock_runtimerecordstore.go`](../internal/adapters/mocks/mock_runtimerecordstore.go) +is regenerated by `make -C gamemaster mocks`. A lone integration +test `TestDeleteIdempotent` mirrors `TestDeleteByGameIdempotent` in +`playermappingstore`. + +**Why.** The README's failure paths for `register-runtime` mandate +"roll back `runtime_records`" on every post-Insert failure. The Stage 10 +port surface had no Delete primitive, so the orchestrator could not +satisfy the README without one. Three alternatives were considered +and rejected: + +- **Reorder the flow** (call engine init first, only then persist + `runtime_records`): contradicts the README, which lists the Insert + step before the engine call so that the in-flight `starting` row is + observable to inspect surfaces and acts as a coordination point for + concurrent register-runtime requests on the same game id. +- **Introduce a `removed` status enum**: changes the runtime status + machine for one transient bookkeeping case; complicates indexes, + filters, and the inspect surface; is not described anywhere in + README §Game Master status model. +- **Single SQL transaction across both stores**: requires the adapter + layer to expose a transactional sub-interface, breaking the per-port + abstraction Stage 10 set up. The cost of one extra method on a + single port is far smaller. 
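+
+A minimal sketch of the idempotent `Delete` shape (illustrative; only
+the SQL statement and the no-rows-is-success rule are taken from the
+decision text above):
+
+```go
+func (s *Store) Delete(ctx context.Context, gameID string) error {
+	// Deliberately no RowsAffected check: deleting an absent row is a
+	// success, which keeps the rollback path safe to retry.
+	_, err := s.db.ExecContext(ctx,
+		`DELETE FROM runtime_records WHERE game_id = $1`, gameID)
+	return err
+}
+```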
+
+This is the same pattern Stage 11 used for `UpdateEngineVersionInput.Now`
+and `Deprecate(ctx, version, now)`: a small, targeted contract delta
+admitted by the pre-launch single-init policy.
+
+### 2. Engine 4xx → `engine_validation_error`, engine 5xx → `engine_unreachable`
+
+**Decision.** When the engine `/admin/init` call returns 4xx, the
+service produces `Result{ErrorCode: engine_validation_error}`. When it
+returns 5xx (or fails at the transport layer), the service produces
+`Result{ErrorCode: engine_unreachable}`. The classification lives in
+[`classifyEngineError`](../internal/service/registerruntime/service.go)
+and dispatches on the engine port sentinels
+(`ports.ErrEngineValidation`, `ports.ErrEngineUnreachable`,
+`ports.ErrEngineProtocolViolation`).
+
+**Why.** [`../PLAN.md` Stage 13](../PLAN.md) lists the two as separate
+test cases ("engine 4xx (engine_validation_error), engine 5xx
+(engine_unreachable)"), but [`../README.md` §Lifecycles →
+Register-runtime](../README.md)'s failure-path table at the time of
+Stage 13 lumped them as `engine_unreachable`. PLAN's classification is
+more useful operationally:
+
+- 4xx from the engine signals a contract violation (the engine
+  rejected the request shape, which is a Game Master bug or a stale
+  contract). Treating this as `engine_unreachable` would push
+  operators down the "is the engine alive?" branch when the right
+  branch is "did the GM build send the right shape?".
+- 5xx (and transport failures) signal that the engine is unreachable
+  or unhealthy. `engine_unreachable` is the right code.
+
+The README §Lifecycles failure-path table is updated in the same
+patch to reflect the split, so the two documents agree.
+
+### 3. 
Engine response validated as `engine_protocol_violation`
+
+**Decision.** After a successful engine `/admin/init` HTTP response,
+the service performs two extra checks before persisting any
+`player_mappings`:
+
+- the number of returned players must equal the input roster size;
+- the set of `RaceName` values returned must be a subset of the
+  roster (no extra races; together with the size check, no missing
+  races either).
+
+A failure on either check rolls back the runtime record and returns
+`Result{ErrorCode: engine_protocol_violation}`.
+
+**Why.** The README's failure-path table includes
+`engine_protocol_violation` for "engine response missing players or
+contains races not in roster". The engine adapter ([Stage 12,
+`engineclient.decodeStateResponse`](../internal/adapters/engineclient/client.go))
+validates the wire shape (presence of required fields, well-formed
+numeric values), but it cannot validate against the roster Game Master
+sent — only the service layer knows the roster. Splitting the two
+checks keeps the adapter narrow and lets the service-layer error code
+carry the semantic meaning.
+
+### 4. Initial `runtime_snapshot_update` carries non-empty `player_turn_stats`
+
+**Decision.** The first `runtime_snapshot_update` published by
+register-runtime carries one
+`PlayerTurnStats{UserID, Planets, Population}` row per active member,
+projected from the `engine.Init` response by joining on `RaceName`
+against the input roster. The projection is sorted by `UserID` for a
+deterministic wire order.
+
+**Why.** The README §Async Stream Contracts cadence note used to read
+"empty when the snapshot is published for a status transition with no
+new turn payload". For register-runtime there *is* a new payload — the
+engine returns the initial player state in its `/admin/init` response,
+including `Planets` and `Population`. 
That state is the turn-0 +baseline against which Lobby's per-game stats aggregator measures +later deltas: without it, the first per-player delta after turn 1 +would silently equal "everything" instead of "the change since +turn 0". The README cadence wording is updated in the same patch to +say the register-runtime snapshot carries the engine's turn-0 stats. + +### 5. Best-effort rollback with two-flag gating + +**Decision.** The service exposes a single `rollback(ctx, gameID, +playerMappingsInstalled)` helper that always tries `runtime_records.Delete` +and conditionally tries `playermappings.DeleteByGame`. The two booleans +on `recordFailure` (`runtimeInserted`, `playerMappingsInstalled`) +gate the rollback so: + +- a pre-Insert failure (`invalid_request`, `conflict` from `Get`, + `engine_version_not_found`, `Insert`'s own `ErrConflict`) skips + rollback entirely; +- a post-Insert / pre-BulkInsert failure deletes only the runtime + row; +- a post-BulkInsert failure deletes both. Note that BulkInsert errors + themselves never install rows (per stage 11 D7's per-statement + atomicity), so on `BulkInsert` returning ErrConflict the rollback + flag for player_mappings is `false`. + +The rollback uses a fresh `context.Background()` with a 5-second +timeout so a cancelled request context does not strand the +`starting` row. + +**Why.** A common pitfall in rollback paths is to call `Delete` on +state owned by another caller. The Insert-conflict branch is the +canonical example: when our `Insert` returns `ErrConflict`, another +request inserted the row first and owns it. Blindly deleting it +would corrupt that other caller's state. The two-flag gating makes +the ownership transfer explicit. The fresh background context +mirrors the same pattern in `rtmanager.startruntime.releaseLease`. + +## Files landed + +- [`../internal/ports/runtimerecordstore.go`](../internal/ports/runtimerecordstore.go) + — added `Delete` to the interface and the comment block. 
+- [`../internal/adapters/postgres/runtimerecordstore/store.go`](../internal/adapters/postgres/runtimerecordstore/store.go) + — implemented `Delete`. +- [`../internal/adapters/postgres/runtimerecordstore/store_test.go`](../internal/adapters/postgres/runtimerecordstore/store_test.go) + — added `TestDeleteIdempotent` and `TestDeleteRejectsEmptyGameID`. +- [`../internal/adapters/mocks/mock_runtimerecordstore.go`](../internal/adapters/mocks/mock_runtimerecordstore.go) + — regenerated. +- [`../internal/service/registerruntime/service.go`](../internal/service/registerruntime/service.go) + with [`errors.go`](../internal/service/registerruntime/errors.go) + and [`service_test.go`](../internal/service/registerruntime/service_test.go) + — new orchestrator package and tests. +- [`../README.md`](../README.md) — §References pointer to this record + plus one-line clarifications in §Lifecycles → Register-runtime + (failure-path table now splits 4xx/5xx per **D2**) and §Async Stream + Contracts (cadence note now says the register-runtime snapshot + carries `player_turn_stats` from the engine-init response per **D4**). +- [`../PLAN.md`](../PLAN.md) — Stage 13 marked done. + +## Verification + +```sh +cd gamemaster + +# Mocks regenerate cleanly with no diff after the port extension. +make mocks +git diff --exit-code internal/adapters/mocks + +# Domain + port tests still pass. +go test ./internal/domain/... ./internal/ports/... + +# Adapter test for the new Delete method. +go test ./internal/adapters/postgres/runtimerecordstore/... + +# Service-level tests for the new orchestrator. +go test ./internal/service/registerruntime/... + +# Stage 06/07/09–12 contract / adapter / freeze tests stay green. +go test ./... +``` + +The full repo-level `go build ./...` from the workspace root succeeds; +later stages (14+) build on the orchestrator shape Stage 13 +establishes. 
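+
+## Appendix: rollback gating sketch
+
+For readers of D5, the two-flag gating can be pictured as follows (an
+illustrative shape only; the receiver and field names mirror the
+decision text, not the landed file):
+
+```go
+func (s *Service) rollback(_ context.Context, gameID string, playerMappingsInstalled bool) {
+	// The request context is deliberately ignored: a cancelled request
+	// must not strand the `starting` row, so a fresh 5-second
+	// background context drives the cleanup.
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
+	if playerMappingsInstalled {
+		_ = s.mappings.DeleteByGame(ctx, gameID) // best effort
+	}
+	// Idempotent per D1: deleting an already-absent row succeeds.
+	_ = s.records.Delete(ctx, gameID)
+}
+```
+
+Callers gate this helper on `runtimeInserted`: a pre-Insert failure
+never reaches it, because any existing row belongs to a concurrent
+request.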
diff --git a/gamemaster/docs/stage14-engine-version-registry.md b/gamemaster/docs/stage14-engine-version-registry.md new file mode 100644 index 0000000..f830a6a --- /dev/null +++ b/gamemaster/docs/stage14-engine-version-registry.md @@ -0,0 +1,220 @@ +--- +stage: 14 +title: Engine version registry service +--- + +# Stage 14 — Engine version registry service + +This decision record captures the non-obvious choices made while +implementing the `engine_version` registry service-layer at PLAN +Stage 14. The service backs the +`/api/v1/internal/engine-versions/*` REST surface (Stage 19) and the +hot-path `image_ref` resolve called synchronously by Game Lobby's +start flow. + +## Context + +[`../PLAN.md` Stage 14](../PLAN.md) lists seven service methods: +`List`, `Get`, `Create`, `Update`, `Deprecate`, `Delete`, +`ResolveImageRef`. The lifecycle the service drives is frozen by +[`../README.md` §Engine Version Registry](../README.md). The reference +precedent for shape and audit semantics is +[`../internal/service/registerruntime`](../internal/service/registerruntime/service.go) +landed at Stage 13. + +Five decisions deviate from a literal reading of either Stage 14 or +the existing port and migration shapes. Each is recorded below. + +## Decisions + +### 1. `EngineVersionStore.Delete` extension + +**Decision.** [`ports.EngineVersionStore`](../internal/ports/engineversionstore.go) +gains a `Delete(ctx, version) error` method that returns +`engineversion.ErrNotFound` when no row matches. The PostgreSQL-backed +adapter [`engineversionstore.Store.Delete`](../internal/adapters/postgres/engineversionstore/store.go) +issues a single `DELETE FROM engine_versions WHERE version = $1` and +distinguishes "missing" from "removed" via `RowsAffected`. The mock at +[`internal/adapters/mocks/mock_engineversionstore.go`](../internal/adapters/mocks/mock_engineversionstore.go) +is regenerated by `make -C gamemaster mocks`. 
Three adapter tests +(`TestDeleteHappy`, `TestDeleteNotFound`, `TestDeleteRejectsEmptyVersion`) +mirror the pattern from the existing Deprecate tests. + +**Why.** Stage 14 explicitly requires the service to expose a hard +`Delete` distinct from `Deprecate`. The Stage 11 port surface only +carried `Deprecate` (idempotent soft-mark) and +`IsReferencedByActiveRuntime` (read probe). Three alternatives were +considered and rejected: + +- **Skip hard delete**: omits a Stage 14 deliverable and forces a port + delta later. The OpenAPI 409 `engine_version_in_use` example would + also become a dangling spec entry. +- **Reuse `Deprecate` for both soft and hard semantics**: contradicts + README §Engine Version Registry ("`status` values: ... `deprecated` + (rejected on new starts; existing runtimes unaffected)"). A + referenced version must remain deprecable so the operator can phase + in a successor while existing runtimes finish out — folding the + reference check into Deprecate would break that flow. +- **Inline the SQL inside the service**: contradicts the per-port + abstraction Stage 10 set up; the service must not import the jet + table package. + +This is the same pattern Stage 13 D1 used for +`RuntimeRecordStore.Delete`: a small, targeted contract delta admitted +by the pre-launch single-init policy. + +### 2. Hard-delete reference probe runs before adapter `Delete` + +**Decision.** [`Service.Delete`](../internal/service/engineversion/service.go) +calls `versions.IsReferencedByActiveRuntime` first; on a positive +result it surfaces `ErrInUse` without ever calling the adapter +`Delete`. Only when the probe reports zero references does the service +issue the SQL DELETE. + +**Why.** Two alternatives were rejected: + +- **Single transaction with `SELECT ... FOR UPDATE` plus DELETE**: + requires the adapter to expose a transactional sub-interface and + forces the service into store-internal locking semantics. 
Deployment
+  is single-instance (README §Non-Goals), so the small race window
+  between probe and delete is acceptable and self-correcting: a
+  late-arriving register-runtime would fail at the `image_ref`
+  resolve step anyway, since the version row is either deprecated
+  (rejected on new starts) or already gone; the eventual outcome is
+  the same.
+- **Probe-after-delete**: leaks the DELETE on transient probe
+  failures and surfaces a misleading "deleted" outcome to the caller.
+
+Surfacing `engine_version_in_use` before any mutation matches the
+README §Error Model wording and the OpenAPI `EngineVersionInUseError`
+example.
+
+### 3. `engine_version_delete` op kind added to schema and domain
+
+**Decision.** A new audit value `engine_version_delete` is added to:
+
+- [`domain/operation.OpKind`](../internal/domain/operation/log.go)
+  (constant, `IsKnown`, `AllOpKinds`);
+- [`migrations/00001_init.sql`](../internal/adapters/postgres/migrations/00001_init.sql)
+  (the `operation_log_op_kind_chk` CHECK constraint);
+- README §Persistence Layout (the `op_kind` enum listing in the
+  `operation_log` description).
+
+The pre-launch single-init policy from
+[`../../ARCHITECTURE.md` §Persistence Backends](../../ARCHITECTURE.md)
+allows editing `00001_init.sql` until first production deploy.
+
+**Why.** Two alternatives were rejected:
+
+- **Reuse `engine_version_deprecate`** for hard delete: semantically
+  weak; audit consumers would have to inspect outcome plus an
+  out-of-band column to tell soft from hard, defeating the audit's
+  signal value.
+- **Skip audit for hard delete**: inconsistent with every other
+  service-layer mutation (every Stage 13/14 mutation writes
+  `operation_log`). Forensics on a destructive admin action are
+  exactly where audit matters most.
+
+### 4. 
`operation_log.game_id` column doubles as audit subject + +**Decision.** Engine-version CRUD audit entries store the canonical +`version` string in the `OperationEntry.GameID` field (and therefore +in the `operation_log.game_id` column). For `OpKindEngineVersionCreate` +the canonical post-`ParseSemver` form is used (`v1.2.3`); for +`OpKindEngineVersionUpdate` / `Deprecate` / `Delete` the user-supplied +version is used so failed lookups still record the attempt verbatim. + +**Why.** Three alternatives were considered and rejected: + +- **Make `game_id` nullable and add a `subject_id` column**: requires + a migration delta + jet regeneration + a domain field rename. Out + of scope for stage 14 and inconsistent with the minimal-diff + principle. +- **Use a sentinel `engine_version:` prefix**: harder to query + alongside per-game audit reads; the index + `operation_log (game_id, started_at DESC)` already covers + subject-scoped reads, and a sentinel prefix would force callers to + strip it. +- **Skip audit for engine-version CRUD**: README §Persistence Layout + explicitly lists `engine_version_create | engine_version_update | + engine_version_deprecate` as op_kind values; the audit table is + the canonical surface. + +The decision is recorded both here and in the README §Persistence +Layout note so future readers can find the overload rationale. + +### 5. JSON-object validation for `Options` + +**Decision.** [`Service.Create`](../internal/service/engineversion/service.go) +and `Service.Update` validate the `Options` byte slice as a JSON +object before persisting (raw bytes are decoded into +`map[string]any`; non-objects, including arrays and scalars, are +rejected with `invalid_request`). Empty/whitespace-only input passes +through as nil; the adapter (Stage 11 D5) already substitutes the +schema default `'{}'::jsonb`. + +**Why.** The `engine_versions.options` column is `jsonb`. 
Persisting +an array, scalar, or malformed JSON would either be rejected by the +PostgreSQL parser at INSERT time (surfacing as a generic 500) or +accepted and break engine-side consumers that expect an object. The +service-layer validation surfaces a clear `invalid_request` early and +keeps the contract honest. README §Engine Version Registry already +describes `options` as a "free-form `jsonb` document" (object +implied); the validation makes that wording load-bearing. + +## Files landed + +- [`../internal/ports/engineversionstore.go`](../internal/ports/engineversionstore.go) + — added `Delete` to the interface and the comment block. +- [`../internal/adapters/postgres/engineversionstore/store.go`](../internal/adapters/postgres/engineversionstore/store.go) + — implemented `Delete`. +- [`../internal/adapters/postgres/engineversionstore/store_test.go`](../internal/adapters/postgres/engineversionstore/store_test.go) + — added `TestDeleteHappy`, `TestDeleteNotFound`, + `TestDeleteRejectsEmptyVersion`. +- [`../internal/adapters/mocks/mock_engineversionstore.go`](../internal/adapters/mocks/mock_engineversionstore.go) + — regenerated. +- [`../internal/adapters/postgres/migrations/00001_init.sql`](../internal/adapters/postgres/migrations/00001_init.sql) + — added `engine_version_delete` to `operation_log_op_kind_chk`. +- [`../internal/domain/operation/log.go`](../internal/domain/operation/log.go) + with [`log_test.go`](../internal/domain/operation/log_test.go) + — added `OpKindEngineVersionDelete` plus `IsKnown`/`AllOpKinds` + membership. +- [`../internal/service/engineversion/service.go`](../internal/service/engineversion/service.go) + with [`errors.go`](../internal/service/engineversion/errors.go) + and [`service_test.go`](../internal/service/engineversion/service_test.go) + — new orchestrator package and tests. 
+- [`../internal/service/registerruntime/service_test.go`](../internal/service/registerruntime/service_test.go) + — `fakeEngineVersions` gains a stub `Delete` to satisfy the + extended port. +- [`../README.md`](../README.md) — §References pointer to this + record; §Persistence Layout note that engine-version CRUD audit + entries store `version` in the `game_id` column and that + `engine_version_delete` joins the op_kind enum. +- [`../PLAN.md`](../PLAN.md) — Stage 14 marked done. + +## Verification + +```sh +cd gamemaster + +# Mocks regenerate cleanly with no diff after the port extension is +# committed alongside this stage. +make mocks +git diff --exit-code internal/adapters/mocks + +# Domain + port tests still pass (operation log enum membership). +go test ./internal/domain/... ./internal/ports/... + +# Adapter test for the new Delete method and the migration's CHECK +# constraint. +go test ./internal/adapters/postgres/engineversionstore/... +go test ./internal/adapters/postgres/operationlog/... + +# Service-level tests for the new orchestrator. +go test ./internal/service/engineversion/... + +# Stage 13 service tests still pass (the fake gains a stub Delete). +go test ./internal/service/registerruntime/... + +# Repo build succeeds at the workspace root. +go build ./... +``` diff --git a/gamemaster/docs/stage15-scheduler-and-turn-generation.md b/gamemaster/docs/stage15-scheduler-and-turn-generation.md new file mode 100644 index 0000000..8937b6e --- /dev/null +++ b/gamemaster/docs/stage15-scheduler-and-turn-generation.md @@ -0,0 +1,297 @@ +--- +stage: 15 +title: Scheduler, turn generation, and snapshot publisher +--- + +# Stage 15 — Scheduler, turn generation, and snapshot publisher + +This decision record captures the non-obvious choices made while +implementing the scheduler ticker, the turn-generation orchestrator, +and the publication of `gm:lobby_events` plus `notification:intents` +at PLAN Stage 15. 
It is the heart of Game Master: every running game
+flows through this code path on every scheduled or admin-forced turn.
+
+## Context
+
+[`../PLAN.md` Stage 15](../PLAN.md) ships three components that
+together drive a turn:
+
+1. `service/turngeneration` — the orchestrator that CASes `running →
+   generation_in_progress`, calls the engine `/admin/turn`, branches
+   on `finished`, and publishes a `runtime_snapshot_update` /
+   `game_finished` event plus the corresponding `game.turn.ready` /
+   `game.finished` / `game.generation_failed` notification.
+2. `service/scheduler` — a thin, stateless wrapper around
+   `domain/schedule.Schedule.Next` reused by the turn-generation
+   recompute step and (in Stage 17) by `service/adminforce`.
+3. `worker/schedulerticker` — the 1-second loop that scans
+   `runtime_records.ListDueRunning(now)` and dispatches one
+   `turngeneration.Handle` per due game.
+
+The lifecycle the orchestrator drives is frozen by
+[`../README.md` §Lifecycles → Turn generation](../README.md), and the
+publication cadence by [§Async Stream Contracts](../README.md) and
+[§Notification Contracts](../README.md). The reference precedent for
+the orchestrator shape (Input / Result / Dependencies / NewService /
+Handle) is Stage 13's `service/registerruntime`.
+
+Seven decisions deviate from a literal reading of either PLAN Stage 15,
+the README, or the Stage 13 precedent. Each is recorded below.
+
+## Decisions
+
+### D1. Resolve `game_name` synchronously from Lobby per notification
+
+**Decision.** [`ports.LobbyClient`](../internal/ports/lobbyclient.go)
+gains a `GetGameSummary(ctx, gameID) (GameSummary, error)` method plus
+a narrow `GameSummary{GameID, GameName, Status}` type. 
The +HTTP-backed adapter at +[`internal/adapters/lobbyclient/client.go`](../internal/adapters/lobbyclient/client.go) +issues a `GET /api/v1/internal/games/{game_id}` against the Lobby +internal listener, decodes the `GameRecord` shape (Lobby's frozen +contract), and wraps every non-success outcome with +`ports.ErrLobbyUnavailable`. The `turngeneration` service calls it +before publishing each `notification:intents` entry; on any error the +orchestrator falls back to using `game_id` as `game_name` and logs a +`warn` event with `error_code=lobby_unavailable`. + +**Why.** `notificationintent.GameTurnReadyPayload`, +`GameFinishedPayload`, and `GameGenerationFailedPayload` all require a +`game_name` string, but Game Master does not own the platform name and +the `register-runtime` envelope does not carry it. Three alternatives +were considered and rejected: + +- **Extend the `register-runtime` contract with `game_name` and + persist it on `runtime_records`.** Cleanest architecturally, but + requires editing the Stage 06 frozen OpenAPI spec, the contract + test, the Stage 09 migration, the Stage 10 domain type, the + Stage 11 store and tests, the Stage 13 register-runtime service and + tests, and the regenerated jet code. Substantial cross-stage churn + for a single denormalised string. +- **Use `game_id` as the `game_name` placeholder unconditionally.** + Zero change cost, but every push notification a user receives + carries the opaque platform identifier — a user-visible regression. +- **Defer notification publication to Stage 16.** Contradicts the + PLAN Stage 15 task list, which explicitly enumerates + `game.turn.ready`, `game.finished`, and `game.generation_failed` + publication. + +The chosen design adds one method and one return type to a port +already established in Stage 12, with fail-soft fallback semantics +that keep notification publication best-effort. + +### D2. 
`Trigger` labels telemetry, never branches logic
+
+**Decision.** The plan's input shape `{gameID, trigger ∈ {scheduler,
+force}}` is preserved as `turngeneration.Input.Trigger`. The value
+flows into the `gamemaster.turn_generation.outcomes` counter as a
+`trigger` label and into structured logs; it does **not** branch the
+orchestrator's persistence path. The skip-tick mechanic is driven
+exclusively by the runtime record's `skip_next_tick` column.
+
+**Why.** [`../README.md §Force-next-turn`](../README.md) describes
+adminforce as: "Run the turn-generation flow synchronously (the same
+code path the scheduler uses). After success, set
+`runtime_records.skip_next_tick = true`." Adminforce flips the flag
+*after* the forced turn completes; the *next* scheduler-driven
+generation consumes it. Forking the orchestrator on `Trigger` would
+duplicate the recompute logic in two places and reopen the question
+"what if a force fires while skip_next_tick is already true?".
+A single code path makes the answer fall out of the existing rule
+(read the flag at start, clear at recompute) without special cases.
+
+### D3. Two-CAS pattern with cleanup on engine failure
+
+**Decision.** Persistence steps mirror Stage 13's CAS-then-rollback
+pattern with two CAS transitions per generation:
+
+1. `running → generation_in_progress` at the start. On
+   `runtime.ErrConflict` (concurrent stop / external mutation) the
+   orchestrator returns `Result{ErrorCode: conflict}` without
+   publishing events; the external mutation is responsible for its
+   own snapshot.
+2. After the engine call:
+   - success + `finished=true` → `generation_in_progress → finished`;
+   - success + `finished=false` → `generation_in_progress → running`;
+   - engine error → `generation_in_progress → generation_failed`.
+
+The post-engine CAS surfaces `runtime.ErrConflict` only when an
+external mutation (typical cause: admin issued a stop while the engine
+was generating) overtook the orchestrator. The engine call has
The engine call has +already mutated state, but the runtime row is owned by the new actor; +the orchestrator records the audit failure with `conflict` and exits. + +**Why.** This keeps Stage 13's pattern intact: every CAS knows what +state the row should be in before the call, and a mismatch always +yields `conflict`. Mixing the two CAS guards with a single combined +status update (e.g., a transactional "running and not stopped") would +require the adapter to expose multi-status CAS predicates, breaking +the per-row CAS abstraction Stage 11 settled on. + +### D4. Snapshot cadence: one publication per outcome + +**Decision.** The orchestrator publishes exactly one +`runtime_snapshot_update` *or* `game_finished` per turn-generation +call: + +- success + not finished → `PublishSnapshotUpdate` with full + `player_turn_stats`; +- success + finished → `PublishGameFinished` with full + `player_turn_stats`; +- engine failure → `PublishSnapshotUpdate` with + `RuntimeStatus=generation_failed` and empty `player_turn_stats` + (no fresh engine payload). + +The intermediate `running → generation_in_progress` transition is +**not** broadcast. + +**Why.** The README cadence enumerates "transitioned" cases as +examples (`running ↔ generation_in_progress`), but PLAN Stage 15 +explicitly anchors publication on the outcome side. Publishing twice +would double Lobby's processing cost without delivering new +information, because `generation_in_progress` carries no fresh engine +state and Lobby cannot act on the in-progress moment. + +### D5. Notification recipients = `playermappingstore.ListByGame` + +**Decision.** `game.turn.ready` and `game.finished` use +`AudienceKindUser` and need a sorted unique non-empty +`recipient_user_ids` list. The orchestrator derives it from +`playermappingstore.ListByGame(gameID)` projected to `UserID` values, +deduplicated and sorted ascending. Empty rosters cause the +notification to be skipped silently with a `warn` log; the runtime +mutation persists. 
+ +**Why.** This is the only roster data Game Master owns until Stage 16 +delivers the membership cache. After Stage 17 wires `banish`, the +player_mappings rows still represent the engine-known roster and +remain a correct conservative recipient set (banished members will be +filtered separately by Notification Service's user resolution if +absent in `User Service`). Adding a synchronous Lobby +`GetMemberships` call here would duplicate the work Stage 16 is +already on the hook to provide. + +### D6. Scheduler service is a stateless utility + +**Decision.** +[`service/scheduler.Service`](../internal/service/scheduler/service.go) +exposes a single `ComputeNext(turnSchedule, after, skipNextTick) +(time.Time, bool, error)` method that wraps `schedule.Parse(...).Next(after, +skipNextTick)`. The service holds no dependencies and no clock; the +caller passes `after`. `turngeneration` injects a +`*scheduler.Service` and uses it during the post-success recompute; +Stage 17 will reuse the same instance from `adminforce`. + +**Why.** Centralising the parse-then-next sequence in one place keeps +the skip rule in one place and makes the future Stage 17 caller +trivial. Holding no state means tests are pure value tests against the +`domain/schedule` wrapper; no clock injection or dependency wiring is +required. + +### D7. Per-game in-flight set on the scheduler ticker + +**Decision.** +[`worker/schedulerticker.Worker`](../internal/worker/schedulerticker/worker.go) +holds a `sync.Map[gameID]struct{}` of currently-dispatched games. At +each tick the worker scans `RuntimeRecords.ListDueRunning(now)` and +launches one goroutine per due game; if `LoadOrStore` reports the game +is already in-flight, the worker logs at `debug` and skips. The +goroutine releases the slot via `defer w.inflight.Delete(gameID)`. + +**Why.** A 1-second tick is shorter than typical engine call latency +plus PostgreSQL round-trips, so two ticks can observe the same due row +before the first completes. 
The CAS in `turngeneration` is the +authoritative protection (only one goroutine can flip `running → +generation_in_progress`), but two goroutines doing the engine call and +discarding the loser as `conflict` would waste an engine call and +inflate `engine_validation_error` / `engine_unreachable` counters with +spurious entries. The in-flight set is a 4-line optimisation that +removes the spurious work. + +`Worker.Wait` exposes the in-flight `sync.WaitGroup` so tests (and +Stage 19's wiring) can drive `Tick` deterministically and observe +completion. `Run` itself waits on the same group before returning so +context cancellation gracefully drains in-flight work. + +## Files landed + +**Modified:** + +- [`../internal/ports/lobbyclient.go`](../internal/ports/lobbyclient.go) + — added `GetGameSummary` to the interface plus the `GameSummary` + type. +- [`../internal/adapters/lobbyclient/client.go`](../internal/adapters/lobbyclient/client.go) + — implemented `GetGameSummary` with the same `ErrLobbyUnavailable` + wrapping precedent as `GetMemberships`. +- [`../internal/adapters/lobbyclient/client_test.go`](../internal/adapters/lobbyclient/client_test.go) + — table-driven tests for happy path, 404, 5xx, malformed JSON, + missing required fields, timeout, and bad input. +- [`../internal/adapters/mocks/mock_lobbyclient.go`](../internal/adapters/mocks/mock_lobbyclient.go) + — regenerated. + +**Created:** + +- [`../internal/service/scheduler/service.go`](../internal/service/scheduler/service.go), + [`../internal/service/scheduler/service_test.go`](../internal/service/scheduler/service_test.go) + — stateless scheduler utility. +- [`../internal/service/turngeneration/service.go`](../internal/service/turngeneration/service.go), + [`../internal/service/turngeneration/errors.go`](../internal/service/turngeneration/errors.go), + [`../internal/service/turngeneration/service_test.go`](../internal/service/turngeneration/service_test.go) + — turn-generation orchestrator and tests. 
+- [`../internal/worker/schedulerticker/worker.go`](../internal/worker/schedulerticker/worker.go), + [`../internal/worker/schedulerticker/worker_test.go`](../internal/worker/schedulerticker/worker_test.go) + — scheduler ticker worker and tests. +- This decision record. + +**Reused (not modified):** + +- `internal/domain/runtime/{model.go, transitions.go}` — + `running → generation_in_progress`, `generation_in_progress → + running`, `generation_in_progress → generation_failed`, + `generation_in_progress → finished` were all permitted by the + Stage 10 transitions table. +- `internal/domain/schedule/nexttick.go` — the cron + skip wrapper. +- `internal/domain/operation/log.go` — the `OpKindTurnGeneration` + enum value already in place. +- `internal/ports/{runtimerecordstore.go, engineclient.go, + playermappingstore.go, operationlog.go, + notificationpublisher.go, lobbyeventspublisher.go}` — every store + and publisher used by the orchestrator was already present. +- `internal/telemetry/runtime.go` — `RecordTurnGenerationOutcome`, + `RecordLobbyEventPublished`, `RecordNotificationPublishAttempt`. +- `pkg/notificationintent.NewGameTurnReadyIntent`, + `NewGameFinishedIntent`, `NewGameGenerationFailedIntent`. + +## Verification + +```sh +cd gamemaster + +# Mock regeneration must produce the GetGameSummary additions and +# nothing else. +make mocks +git diff --stat internal/adapters/mocks + +# Domain + ports tests still pass. +go test ./internal/domain/... ./internal/ports/... + +# Scheduler utility. +go test ./internal/service/scheduler/... + +# Turn-generation orchestrator. +go test ./internal/service/turngeneration/... + +# Scheduler ticker worker. +go test ./internal/worker/schedulerticker/... + +# Updated lobby client adapter. +go test ./internal/adapters/lobbyclient/... + +# Module-wide build remains green. +go test ./... 
+``` + +Out-of-scope for this stage: app wiring (Stage 19), service-local +integration suite (Stage 21), cross-service Lobby ↔ GM tests +(Stage 22). diff --git a/gamemaster/docs/stage16-membership-cache-and-invalidation.md b/gamemaster/docs/stage16-membership-cache-and-invalidation.md new file mode 100644 index 0000000..943bd37 --- /dev/null +++ b/gamemaster/docs/stage16-membership-cache-and-invalidation.md @@ -0,0 +1,256 @@ +--- +stage: 16 +title: Hot-path services and membership cache +--- + +# Stage 16 — Hot-path services and membership cache + +This decision record captures the non-obvious choices made while +implementing the gateway-facing trio of player services +(`commandexecute`, `orderput`, `reportget`) and the in-process membership +cache that authorises every hot-path call. It is the last service-layer +stage before Stage 17 (admin operations) and Stage 19 (REST handlers and +wiring). + +## Context + +[`../PLAN.md` Stage 16](../PLAN.md) ships four components that together +make the player surface usable: + +1. `service/membership` — concurrent in-process LRU cache holding the + per-game `user_id → status` projection from + `Lobby /api/v1/internal/games/{game_id}/memberships`. TTL is the + safety net; the explicit invalidation hook from Lobby is the + primary staleness control. +2. `service/commandexecute` — orchestrator behind + `POST /api/v1/internal/games/{game_id}/commands`. Authorises the + caller, resolves `actor=race_name`, reshapes the JSON envelope, and + forwards `PUT /api/v1/command` to the engine. +3. `service/orderput` — same shape as `commandexecute`, targeting the + engine `PUT /api/v1/order`. +4. `service/reportget` — orchestrator behind + `GET /api/v1/internal/games/{game_id}/reports/{turn}`. Authorises + the caller, resolves `race_name`, and forwards + `GET /api/v1/report?player=&turn=` to the engine. 
+ +The reference precedent for the orchestrator shape (Input / Result / +Dependencies / NewService / Handle, plus a private `classifyEngineError` +helper) is Stage 15's `service/turngeneration`. Six decisions deviate +from a literal reading of the README, the OpenAPI surface, or the +turngeneration precedent. Each is recorded below. + +## Decisions + +### D1. `reportget` does not require `runtime_records.status = running` + +**Decision.** +[`service/reportget`](../internal/service/reportget/service.go) accepts +any non-deleted runtime row and forwards the read to the engine. +`runtime_not_running` is **not** part of `reportget`'s error vocabulary +([`errors.go`](../internal/service/reportget/errors.go)). +`commandexecute` and `orderput`, by contrast, reject anything other than +`StatusRunning` with `runtime_not_running`. + +**Why.** Three signals point at the same conclusion: + +- The OpenAPI surface for `internalGetReport` + (`api/internal-openapi.yaml` lines 546–575) lists only + `403 / 404 / 502 / 500` responses; there is no 409 / `runtime_not_running` + on the report path. The matching error response on commands and + orders (lines 502, 540) does include 409. +- The README §Reports flow (`../README.md` lines 508–520) lists only + authorisation, race-name resolution, and engine forwarding. The + preceding §Player commands and orders block (lines 492–506) lists the + `status=running` precondition explicitly. The two sections are + separately worded by design. +- A finished or stopped runtime is a normal target for a post-mortem + read of older turns. Refusing the read forces operators to use ad-hoc + database access for the same data the engine already exposes. + +The `engine_unreachable` outcome remains the natural failure mode when +the engine container is genuinely gone (e.g., on `engine_unreachable` +status); no extra branch is required. + +This decision was confirmed with the user during plan-mode review. + +### D2. 
GM rewrites the engine envelope (`commands` → `cmd`, inject `actor`) + +**Decision.** +[`commandexecute.rewriteCommandPayload`](../internal/service/commandexecute/service.go) +and the parallel +[`orderput.rewriteOrderPayload`](../internal/service/orderput/service.go) +unmarshal the GM `ExecuteCommandsRequest` / `PutOrdersRequest` body as +`map[string]json.RawMessage`, take the `commands` field, and emit a +fresh JSON object containing only `actor` (set to the resolved race +name) and `cmd` (carrying the original array). Every other top-level +key is dropped. The OpenAPI descriptions for `ExecuteCommandsRequest` +and `PutOrdersRequest` were updated in the same patch to document the +rewrite. + +**Why.** The literal "forwarded verbatim" wording in the original +Stage 06 OpenAPI description conflicted with two upstream constraints: + +- The engine `CommandRequest` schema in `game/openapi.yaml` lines + 345–364 declares `actor` and `cmd` as required, with no top-level + `commands`. +- The README §Hot Path rule "GM never trusts a payload field for actor + identification" (`../README.md` lines 487–490) requires GM to set + `actor` from the authenticated user identity. + +Two alternatives were rejected: + +- **Move the rewrite into `engineclient`.** The adapter's role is thin + transport; injecting actor (an authorisation concern) into transport + would muddle the boundary and make the adapter test harness + authorisation-aware. The service is the right home. +- **Inject `actor` only and keep the `commands` key.** The engine schema + requires `cmd`; this would require an engine contract change outside + the Stage 16 scope and break Stage 05's frozen path. + +The transform is duplicated across the two services rather than +extracted to a shared package. 
Each implementation is twelve lines and
+each service is otherwise independent; a shared package would add
+import-edge surface for marginal savings, and the project convention is
+to prefer the minimal diff (`CLAUDE.md §Priorities`). The duplication is
+explicitly documented in both file-level comments.
+
+This decision was confirmed with the user during plan-mode review.
+
+### D3. Hot-path services do not append to `operation_log`
+
+**Decision.** None of the three services emit an `operation_log` entry.
+The `Input` shape carries no `OpSource`/`SourceRef` fields. Telemetry
+counters (`gamemaster.command_execute.outcomes`,
+`gamemaster.order_put.outcomes`, `gamemaster.report_get.outcomes`) are
+the only audit surface.
+
+**Why.** The `operation.OpKind` enum
+(`internal/domain/operation/log.go`) intentionally has no value for
+command, order, or report — it stops at admin and lifecycle operations.
+Every hot-path call would multiply audit volume by the order rate
+without adding investigative value: the telemetry counter already
+exposes outcome distribution, and the engine itself is the source of
+truth for per-command results. Adding three new `OpKind` values would
+also bloat the SQL CHECK on `operation_log` with no operational
+consumer.
+
+### D4. Membership cache uses a hand-rolled per-game inflight tracker
+
+**Decision.**
+[`Cache.fetch`](../internal/service/membership/cache.go) coordinates
+concurrent misses on the same `game_id` through a tiny
+`map[gameID]*flight` plus a per-flight `done` channel. Joiners block on
+`select { case <-existing.done: case <-ctx.Done(): }`. The leader
+populates `members` (or `err`) on the flight before closing the channel.
+
+**Why.** `golang.org/x/sync/singleflight` would be a sharper tool, but
+adding it as a *direct* dependency (it is currently only an indirect
+transitive of other modules in the workspace) would have to clear the
+"justification for direct deps" bar set by `CLAUDE.md §Dependencies`.
+The cache is the only consumer in `gamemaster`, the implementation is
+~30 lines, and a context-cancellable wait is one extra `select` line we
+would otherwise have to wrap around `singleflight.Do` anyway. The
+cache-internal helper is the cheaper choice.
+
+### D5. Cache returns the raw status string
+
+**Decision.**
+[`Cache.Resolve`](../internal/service/membership/cache.go) returns
+`(status string, err error)` where the status is the verbatim Lobby
+vocabulary (`"active"`, `"removed"`, `"blocked"`) plus the empty string
+when the user is not in the roster. Callers compare against
+`membershipStatusActive = "active"` directly. There is no typed
+wrapper.
+
+**Why.** `ports.Membership.Status` is already `string`
+(`internal/ports/lobbyclient.go` line 56); introducing a `MembershipStatus`
+domain type purely to be passed through would add boilerplate without
+enforcing any invariant Go's type system can check. The hot-path
+services need only a single equality check, so a typed enum buys
+nothing; it would also need a fallback for "unknown vocabulary" to
+defend against future Lobby additions, which is more decision surface
+than the cache should own.
+
+### D6. Empty roster slot surfaces as `forbidden`
+
+**Decision.** Two distinct underlying conditions both surface as
+`ErrorCodeForbidden` from the three services:
+
+- The membership cache returns the empty string for the requested
+  `(gameID, userID)`: the user is not present in the Lobby roster.
+- The membership cache returns `"active"` but
+  `playermappingstore.Get(gameID, userID)` returns
+  `playermapping.ErrNotFound`: the user is an active platform member
+  but has no engine roster slot.
+
+The second condition is an internal inconsistency (register-runtime
+should have installed the row), but the user-visible semantics — "you
+are not authorised to act on this game" — are identical to the first.
+The structured log captures the underlying cause.
+ +**Why.** Surfacing the second condition as `internal_error` would +expose 500 to a perfectly-routine "user not part of the engine roster" +case and obscure the actual outcome from the gateway and the user. The +inconsistency, if it ever materialises, is an operator concern visible +in the warn-level log and the `forbidden` metric attribution; treating +it as a 5xx would not help operators (who would then ignore the false +alarm) nor users (who only care that they cannot act). + +## Files landed + +**Created:** + +- [`../internal/service/membership/{errors.go, cache.go, cache_test.go}`](../internal/service/membership/) + — concurrent LRU cache plus `ErrLobbyUnavailable` sentinel. +- [`../internal/service/commandexecute/{errors.go, service.go, service_test.go}`](../internal/service/commandexecute/) + — command-execute orchestrator and tests. +- [`../internal/service/orderput/{errors.go, service.go, service_test.go}`](../internal/service/orderput/) + — order-put orchestrator and tests. +- [`../internal/service/reportget/{errors.go, service.go, service_test.go}`](../internal/service/reportget/) + — report-get orchestrator and tests. +- This decision record. + +**Modified:** + +- [`../api/internal-openapi.yaml`](../api/internal-openapi.yaml) — + rewrote the description fields of `ExecuteCommandsRequest` and + `PutOrdersRequest` to document the GM-side envelope rewrite. + +**Reused (not modified):** + +- `internal/ports/{engineclient.go, lobbyclient.go, + playermappingstore.go, runtimerecordstore.go}` — every interface and + sentinel was already present. +- `internal/domain/runtime/model.go` — `StatusRunning` constant + the + whole status vocabulary. +- `internal/domain/playermapping/model.go` — `PlayerMapping` and + `ErrNotFound`. +- `internal/domain/operation/log.go` — `Outcome` enum. +- `internal/config/config.go` — `MembershipCacheConfig.{TTL, MaxGames}` + with defaults `30s` / `4096`. 
+- `internal/telemetry/runtime.go` — + `RecordCommandExecuteOutcome`, `RecordOrderPutOutcome`, + `RecordReportGetOutcome`, `RecordMembershipCacheResult`, + `RecordEngineCall` (already wired in Stage 08). + +## Verification + +```sh +cd gamemaster + +# Membership cache (race-clean concurrency). +go test -race ./internal/service/membership/... + +# Each new player service. +go test ./internal/service/commandexecute/... +go test ./internal/service/orderput/... +go test ./internal/service/reportget/... + +# Module-wide build + suite. +go build ./... +go test ./... +``` + +Out-of-scope for this stage: app wiring (Stage 19), service-local +integration suite (Stage 21), cross-service Lobby ↔ GM tests (Stage 22). diff --git a/gamemaster/docs/stage17-admin-operations.md b/gamemaster/docs/stage17-admin-operations.md new file mode 100644 index 0000000..06cc7da --- /dev/null +++ b/gamemaster/docs/stage17-admin-operations.md @@ -0,0 +1,264 @@ +--- +stage: 17 +title: Admin operations and Lobby-facing liveness +--- + +# Stage 17 — Admin operations and Lobby-facing liveness + +This decision record captures the non-obvious choices made while +implementing the five Game Master admin/inspect service-layer +operations and the Lobby-facing liveness reply +(`adminstop`, `adminforce`, `adminpatch`, `adminbanish`, +`livenessreply`). Stage 17 is the last service-layer stage before +Stage 18 (health-events consumer) and Stage 19 (REST handlers and +wiring). + +## Context + +[`../PLAN.md` Stage 17](../PLAN.md) ships five services that close +the GM service surface: + +1. `service/adminstop` — orchestrator behind + `POST /api/v1/internal/runtimes/{game_id}/stop`. Calls Runtime + Manager and CASes `runtime_records.status → stopped`. +2. `service/adminforce` — orchestrator behind + `POST /api/v1/internal/runtimes/{game_id}/force-next-turn`. Runs + the inner `service/turngeneration` flow synchronously, then sets + `runtime_records.skip_next_tick = true`. +3. 
`service/adminpatch` — orchestrator behind + `POST /api/v1/internal/runtimes/{game_id}/patch`. Calls Runtime + Manager and rotates `runtime_records.current_image_ref` plus + `current_engine_version`. +4. `service/adminbanish` — orchestrator behind + `POST /api/v1/internal/games/{game_id}/race/{race_name}/banish`. + Resolves the race and calls the engine `/admin/race/banish`. +5. `service/livenessreply` — orchestrator behind + `GET /api/v1/internal/games/{game_id}/liveness`. Reflects GM's own + view of the runtime without ever calling the engine. + +The reference precedent for the orchestrator shape (`Input` / +`Result` / `Dependencies` / `NewService` / `Handle`) is Stage 13's +`service/registerruntime` and Stage 15's `service/turngeneration`. +Six decisions deviate from a literal reading of the README, the +OpenAPI surface, or the turngeneration precedent. Each is recorded +below. + +## Decisions + +### D1. `RuntimeRecordStore` grows a dedicated `UpdateImage` method + +**Decision.** +[`ports/runtimerecordstore.go`](../internal/ports/runtimerecordstore.go) +adds a new `UpdateImage(ctx, UpdateImageInput) error` method with its +own `UpdateImageInput` struct and `Validate`. The Postgres adapter +gains a matching SQL UPDATE under a CAS guard on `(game_id, status)`. +The existing `UpdateStatus` is **not** repurposed for patch updates. + +**Why.** `UpdateStatusInput.Validate()` (Stage 11) calls +`runtime.Transition(ExpectedFrom, To)` and rejects every pair where +`ExpectedFrom == To`. Patch deliberately keeps the runtime in +`running`, so any attempt to feed `UpdateStatus` with +`ExpectedFrom == To == running` is rejected before the SQL even +runs. Three alternatives were on the table: + +- Drop the `runtime.Transition` invariant from `UpdateStatusInput` + to allow self-transitions. 
That would weaken the CAS validator
+  for every existing caller — register-runtime, turngeneration,
+  health-events consumer — and reintroduce the «accidental no-op
+  status update» class of bugs the validator was added to catch.
+- Introduce a synthetic `runtime.StatusRunning → runtime.StatusRunning`
+  edge in `domain/runtime/transitions.go`. Same blast radius as
+  above, only with stronger semantic baggage in the transition table.
+- Add a dedicated `UpdateImage` method that only writes the two
+  image columns plus `updated_at`. Bounded blast radius (one new
+  method, one new input struct, one new SQL UPDATE), preserves the
+  CAS invariant, and matches how Stage 11 already separated
+  `UpdateScheduling` from `UpdateStatus` for the same reason.
+
+The third option is what shipped. Existing fakes (`registerruntime`,
+`turngeneration`, hot-path tests, schedulerticker) carry an
+`UpdateImage` stub that returns `errors.New(...)` so a test that
+accidentally exercises the new path fails loudly.
+
+### D2. `adminstop` is idempotent on `stopped` and `finished`, rejects `starting`
+
+**Decision.**
+[`service/adminstop`](../internal/service/adminstop/service.go) reads
+the runtime row first; if `Status ∈ {stopped, finished}`, the service
+returns `OutcomeSuccess` without calling Runtime Manager and without
+publishing a `runtime_snapshot_update`. If `Status == starting`, the
+service returns `conflict` with `OutcomeFailure`. Every other
+non-terminal status (`running`, `generation_in_progress`,
+`generation_failed`, `engine_unreachable`) takes the regular path:
+RTM call → CAS → snapshot publication.
+
+**Why.** The README §Stop says «CAS `runtime_records.status: * →
+stopped`» but in practice three edge cases pull the service away
+from a literal CAS-only implementation:
+
+- `stopped` and `finished` are common operator races: an admin clicks
+  «stop» on a UI list while another admin already pressed it (or the
+  game finished naturally).
Returning `conflict` would force the UI + to retry the read and confuse the operator. Idempotent success is + the smallest-surprise behaviour and matches how Lobby's other + admin-cancel flows handle terminal states. +- `starting` is the active engine-init window. RTM has just been + asked to start the container; an admin stop here would race the + init flow and almost certainly leave the system in a partially + cleaned state. The transition table in Stage 10 deliberately + excludes `starting → stopped` for the same reason. Returning + `conflict` lets the admin tooling surface «runtime is mid-init, + retry in a moment» instead of pretending the stop succeeded. +- The «obvious» fourth path — letting the CAS validator reject + `starting → stopped` and surface that as the natural conflict — + was rejected because it depends on validator implementation + detail leaking through; the explicit pre-CAS check makes the + intent obvious in the audit log and the structured logs. + +The audit log records every pre-CAS rejection with +`outcome=failure / error_code=conflict`, and every idempotent no-op +with `outcome=success`, so operators can distinguish the cases in +post-hoc analysis. + +### D3. `adminforce` always sets `skip_next_tick=true`, even on a finishing turn + +**Decision.** +[`service/adminforce`](../internal/service/adminforce/service.go) +issues `UpdateScheduling{SkipNextTick=true, +NextGenerationAt=turnResult.Record.NextGenerationAt, +CurrentTurn=turnResult.Record.CurrentTurn}` after every successful +inner turn-generation, regardless of whether `Result.Finished` is +`true`. + +**Why.** The cleaner branch — «skip the scheduling write when the +turn just finished the game» — was considered and rejected: + +- `turngeneration` already cleared `next_generation_at` and updated + `current_turn` on the finishing branch (Stage 15 + `completeFinished`). 
A redundant write that re-affirms those + values plus sets `skip_next_tick=true` does no harm: the row is + already in `status=finished` and no scheduler tick will ever + consume the flag. +- The branchless code is shorter and the test contract is simpler + («adminforce always writes the skip flag on success»). One extra + conditional saves zero SQL on the production path but doubles the + set of cases the test matrix has to assert. +- The README §Force-next-turn wording «After success, set + `runtime_records.skip_next_tick = true`» is unconditional. Adding + a runtime-side branch would silently weaken that contract. + +The driver `op_kind=force_next_turn` audit row records the eventual +outcome (success / failure with the same error code that +turngeneration surfaced) so audit consumers can tell apart a forced +turn that finished the game from a forced turn that prepared the +next regular tick. + +### D4. `adminbanish` does not check runtime status; missing race surfaces as `forbidden` + +**Decision.** +[`service/adminbanish`](../internal/service/adminbanish/service.go) +reads the runtime row only to retrieve the `engine_endpoint`, then +calls `playermappingstore.GetByRace`. A missing row maps to +`error_code=forbidden`. The runtime status itself is **not** +inspected; banish is dispatched even when the runtime is in +`stopped`, `finished`, or `engine_unreachable`. + +**Why.** Two threads informed the choice: + +- README §Banish lists only two preconditions: «runtime exists» + and «`race_name` resolves to an existing player_mappings row». + Adding a status guard would silently extend the contract beyond + what Lobby is allowed to depend on, and would make the banish + flow fail differently from the documented set. +- A banish on a stopped/finished runtime is a no-op at the engine + side (the container is exited or absent). 
The engine call will + fail with `engine_unreachable`, which is the right error for the + caller to see — it means «the runtime was stopped before banish + could land». Pre-rejecting with a different code would hide the + real state from the operator. + +The `forbidden` mapping for missing race mirrors Stage 16 D6 («empty +roster surfaces as `forbidden`»). The frozen error vocabulary does +not contain a `race_not_found` code, and `forbidden` is the +semantically closest match: «the platform user this race belonged +to is no longer authorised to act on the runtime». + +### D5. `livenessreply` returns 200 / `status=""` on `runtime_not_found` + +**Decision.** +[`service/livenessreply`](../internal/service/livenessreply/service.go) +absorbs `runtime.ErrNotFound` into a successful Result with +`Ready=false` and `Status=runtime.Status("")`. The Go-level error +return is reserved for non-business failures only (nil context, nil +receiver, store-read errors, invalid input). A handler that wraps +this service answers 200 with body `{"ready": false, "status": ""}` +when GM has no record for the requested game. + +**Why.** README §Liveness reply specifies the endpoint «never calls +the engine; it reflects GM's own view only» and explicitly says it +returns 200 even when the runtime is not running. Three response +shapes were considered: + +- 200 with `status="runtime_not_found"`. Mixes runtime-status + values with error codes in the same field, breaking the + caller's enum-match dispatch. +- 404 `runtime_not_found`. Contradicts the README §Liveness reply + «return `200`» wording and forces Lobby's resume flow to add a + 404 handler that means «no observation» — semantically the same + as `Ready=false`. +- 200 with `status=""`. The empty status reads naturally as «GM + has no observation»; Lobby's resume flow already needs to handle + the `Ready=false` branch and the empty status is exactly what + «no observation» looks like in practice. 
Chosen for the smallest + caller-side complexity. + +### D6. RTM client errors surface as `service_unavailable`, not a dedicated code + +**Decision.** Both `service/adminstop` and `service/adminpatch` map +every error from `RTMClient.Stop` / `RTMClient.Patch` to +`error_code=service_unavailable`, regardless of whether the +underlying failure is `ErrRTMUnavailable`, a wrapped HTTP 5xx, or a +dialler-level transport error. + +**Why.** The frozen error vocabulary in +[`gamemaster/api/internal-openapi.yaml`](../api/internal-openapi.yaml) +does not contain a `runtime_manager_unavailable` code. Three options +were on the table: + +- Add a new code. Rejected: the OpenAPI surface is contract-frozen + from Stage 06 and adding a new error code is a wire-format change + that pulls every consumer into a re-validation. Stage 17 deals + with service-layer code only; no contract change is in scope. +- Map RTM failures to `engine_unreachable`. Rejected: the RTM call + is a sibling-service hop, not an engine call; mixing the two in + a single label confuses operators reading metric / log labels. +- Map RTM failures to `service_unavailable`. Accepted: the + vocabulary already documents `service_unavailable` as «a + steady-state dependency was unreachable for this call», which is + exactly what an RTM outage looks like from GM's perspective. + +The Stage 12 D5 decision record in +[`stage12-external-clients.md`](./stage12-external-clients.md) +already records that the RTM adapter wraps every non-success +outcome in `ports.ErrRTMUnavailable` without distinguishing +sub-cases; Stage 17 simply consumes the unified sentinel. + +## Cross-stage consequences + +- The new port surface `RuntimeRecordStore.UpdateImage` is + available to every later consumer; Stage 18 and Stage 19 do not + use it. Existing hand-rolled fakes carry a no-op stub. +- `OpKindStop`, `OpKindForceNextTurn`, `OpKindPatch`, `OpKindBanish` + were introduced in Stage 09 / Stage 10 already; Stage 17 is their + first writer. 
+- The telemetry counter `gamemaster.banish.outcomes` (declared in + Stage 08) gets its first call site in `service/adminbanish`. No + new counters are introduced for `adminstop` / `adminforce` / + `adminpatch` / `livenessreply`; the README §Observability list + does not mention them and Stage 17 deliberately stays inside the + declared instrument set. +- The Stage 19 REST handlers consume the five services without + service-layer changes: each handler decodes the JSON envelope, + fills `Input.OpSource` / `Input.SourceRef` from the + `X-Galaxy-Caller` header convention, and translates `Result.ErrorCode` + into the standard error envelope. diff --git a/gamemaster/docs/stage18-health-events-consumer.md b/gamemaster/docs/stage18-health-events-consumer.md new file mode 100644 index 0000000..63676cd --- /dev/null +++ b/gamemaster/docs/stage18-health-events-consumer.md @@ -0,0 +1,171 @@ +--- +stage: 18 +title: runtime:health_events consumer +--- + +# Stage 18 — `runtime:health_events` consumer + +This decision record captures the non-obvious choices made while +implementing the asynchronous consumer of the `runtime:health_events` +Redis Stream produced by Runtime Manager. The consumer translates RTM +observations into three effects on Game Master state: + +1. Updates `runtime_records.engine_health` per game with a short + summary string. +2. For terminal container events applies a CAS + `running → engine_unreachable`; for `probe_recovered` applies the + symmetric recovery CAS `engine_unreachable → running`. +3. Publishes a debounced `runtime_snapshot_update` on `gm:lobby_events` + only when the engine-health summary or the runtime status actually + changed. + +The reference precedent for the worker shape (`Dependencies` / +`NewWorker` / `Run` / `Shutdown` / exported `HandleMessage`) is the +Lobby `gmevents` consumer at `lobby/internal/worker/gmevents`. Seven +decisions deviate from a literal reading of [`../PLAN.md`](../PLAN.md) +or are sharp enough to surface here. 
+ +## Decisions + +### D1. Event-type taxonomy expanded to seven values + +**Decision.** The consumer maps all seven values published by RTM +([`rtmanager/internal/domain/health/snapshot.go`](../../rtmanager/internal/domain/health/snapshot.go)), +not the six listed in PLAN Stage 18. The added values are +`container_started` and `probe_recovered`. Both are mapped to the +summary string `healthy`. `probe_recovered` additionally attempts the +recovery CAS `engine_unreachable → running`. `container_started` does +not transition status — Game Master owns runtime startup through the +register-runtime flow, so RTM's container_started observation is +informational at the consumer level. + +**Why.** The transition table in +[`internal/domain/runtime/transitions.go`](../internal/domain/runtime/transitions.go) +already declares `engine_unreachable → running` with the comment +`reserved for the Stage 18 consumer; declared here so Stage 18 needs +no transitions edit`. The reserved transition is only useful when an +event in the input stream actually triggers it; the only such event in +RTM's vocabulary is `probe_recovered`. Leaving the two extra event +types unmapped would either drop information (if ignored entirely) or +keep the recovery transition forever unreachable. Mapping them now is +the minimum diff that closes the loop. + +### D2. CAS conflict on a status mutation falls back to a health-only update + +**Decision.** When the worker plans a status transition (e.g., +`running → engine_unreachable` for `container_oom`) and +`RuntimeRecordStore.UpdateStatus` returns `runtime.ErrConflict` or +`runtime.ErrInvalidTransition`, the worker logs the conflict at debug +and falls back to `RuntimeRecordStore.UpdateEngineHealth`. The summary +column is refreshed; the status column stays under whatever the +concurrent flow holds. 
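A minimal sketch of that fallback, with stub sentinel errors and a cut-down store interface standing in for the real `runtime` package and `RuntimeRecordStore` port (the helper name `applyHealthEvent` and the both-paths health write are illustrative assumptions, not the real code):

```go
package main

import (
	"errors"
	"fmt"
)

// Stub sentinels standing in for runtime.ErrConflict / runtime.ErrInvalidTransition.
var (
	ErrConflict          = errors.New("conflict")
	ErrInvalidTransition = errors.New("invalid transition")
)

// recordStore is a cut-down stand-in for the RuntimeRecordStore port.
type recordStore interface {
	UpdateStatus(gameID, expectedFrom, to string) error
	UpdateEngineHealth(gameID, summary string) error
}

// applyHealthEvent mirrors the D2 rule: a CAS conflict (or an
// invalid-transition rejection) on the status write degrades to a
// health-only update instead of failing the message.
func applyHealthEvent(s recordStore, gameID, from, to, summary string) error {
	if err := s.UpdateStatus(gameID, from, to); err != nil {
		if errors.Is(err, ErrConflict) || errors.Is(err, ErrInvalidTransition) {
			// A concurrent flow holds the status; keep the summary current anyway.
			return s.UpdateEngineHealth(gameID, summary)
		}
		return err
	}
	return s.UpdateEngineHealth(gameID, summary)
}

// conflictingStore always loses the CAS, as when turn generation holds the row.
type conflictingStore struct{ healthWrites int }

func (c *conflictingStore) UpdateStatus(_, _, _ string) error { return ErrConflict }
func (c *conflictingStore) UpdateEngineHealth(_, _ string) error {
	c.healthWrites++
	return nil
}

func main() {
	store := &conflictingStore{}
	err := applyHealthEvent(store, "g1", "running", "engine_unreachable", "oom")
	fmt.Println(err == nil, store.healthWrites) // true 1
}
```

The conflict path is deliberately not an error at the message level, so the stream consumer acknowledges the entry and moves on.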
+ +**Why.** Two flows can hold the runtime row when an RTM event arrives: +turn generation (`generation_in_progress`) and admin operations +(`stopped`, `finished`). Forcing the consumer to win over those flows +would either reintroduce stale-status writes or require expanding the +allowed-transitions table to include every non-terminal source — the +latter weakens the guard that turn generation relies on. The failure +semantics turn-generation already implements (engine call timeout → +`generation_failed`) cover the case where an `oom` arrives while a +turn is in flight: the engine call from turngeneration will fail +naturally a moment later. The consumer's job in that window is to keep +the summary current so operators see «last known: oom» on +`gm:lobby_events`. + +### D3. New port method `UpdateEngineHealth` + +**Decision.** [`internal/ports/runtimerecordstore.go`](../internal/ports/runtimerecordstore.go) +gains a new method `UpdateEngineHealth(ctx, UpdateEngineHealthInput) error` +with its own input struct and `Validate`. The Postgres adapter gains a +matching `UPDATE runtime_records SET engine_health = $1, updated_at = +$2 WHERE game_id = $3`. The existing `UpdateStatus` is **not** +repurposed for health-only updates. + +**Why.** `UpdateStatusInput.Validate` calls +`runtime.Transition(ExpectedFrom, To)` and rejects every pair where +`ExpectedFrom == To` (Stage 17 D1). A health-only update keeps the +runtime in its current status, so any attempt to feed `UpdateStatus` +with `ExpectedFrom == To` is rejected before the SQL even runs. The +same precedent led Stage 17 to add `UpdateImage` rather than relax the +self-transition guard. Stage 18 follows that precedent. + +In addition, the health update is not gated on a CAS at all: late- +arriving events should still bookkeep the summary regardless of the +current status (including `stopped` and `finished`). 
A guarded +`UpdateStatus`-shaped variant would have to enumerate every source +status the consumer might observe; an unguarded `UpdateEngineHealth` +sidesteps the question. + +### D4. In-memory dedupe of last-emitted summaries per game + +**Decision.** The worker keeps a `map[string]string` (`gameID → +lastEmittedSummary`) under a `sync.RWMutex`. A snapshot is published +when either the status transitioned in this iteration or when the new +summary differs from the cached one for the same game. The cache is +process-local; on restart it is empty. + +**Why.** [`./README.md` §`gm:lobby_events`](../README.md) freezes the +publication rule: snapshots are emitted on transitions and on health- +summary changes («debounced — duplicates are suppressed when the +summary did not change»). Stage 18 chooses an in-process map over a +Redis-backed dedupe for two reasons: + +1. Game Master is single-instance in v1 + ([`./README.md §Non-Goals`](../README.md)); a per-process map is + sufficient for v1 correctness. +2. Losing the cache on restart causes at most one extra snapshot per + game right after restart — Lobby's `gmevents` consumer is + idempotent (CAS-protected status transitions, deterministic + snapshot blob), so the extra emission is benign. + +A Redis-backed dedupe is cheap to introduce later if multi-instance +Game Master ever lands; until then the simpler choice ships less code. + +### D5. Snapshot construction reads the runtime row again after the mutation + +**Decision.** Whenever the worker decides to publish, it re-reads the +runtime record (`RuntimeRecordStore.Get`) and builds the +`RuntimeSnapshotUpdate` from that fresh row. The `EngineHealthSummary`, +`RuntimeStatus`, and `CurrentTurn` fields therefore reflect whatever +the database holds after the mutation, rather than what the worker +just intended to write. 
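The publish step that this decision shares with the D4 debounce can be sketched as follows; the row and envelope shapes are trimmed stand-ins for the real `RuntimeRecord` and `RuntimeSnapshotUpdate`, and `publishIfChanged` is an illustrative name:

```go
package main

import (
	"fmt"
	"sync"
)

// record is a cut-down runtime row; the real struct carries many more fields.
type record struct {
	Status       string
	EngineHealth string
	CurrentTurn  int
}

// snapshot mirrors the RuntimeSnapshotUpdate fields named in D5.
type snapshot struct {
	RuntimeStatus       string
	EngineHealthSummary string
	CurrentTurn         int
	PlayerTurnStats     []string // left nil: the consumer has no fresh engine payload
}

type worker struct {
	mu          sync.Mutex
	lastSummary map[string]string // D4: gameID -> last emitted summary
	store       map[string]record // stand-in for RuntimeRecordStore.Get
}

// publishIfChanged re-reads the row after the mutation (D5) and applies
// the D4 debounce: emit on a status transition or when the summary
// changed since the last emission; otherwise suppress the duplicate.
func (w *worker) publishIfChanged(gameID string, statusChanged bool) (snapshot, bool) {
	rec := w.store[gameID] // fresh read; reflects what the database actually holds
	w.mu.Lock()
	defer w.mu.Unlock()
	if prev, seen := w.lastSummary[gameID]; !statusChanged && seen && prev == rec.EngineHealth {
		return snapshot{}, false
	}
	w.lastSummary[gameID] = rec.EngineHealth
	return snapshot{
		RuntimeStatus:       rec.Status,
		EngineHealthSummary: rec.EngineHealth,
		CurrentTurn:         rec.CurrentTurn,
	}, true
}

func main() {
	w := &worker{lastSummary: map[string]string{}, store: map[string]record{
		"g1": {Status: "running", EngineHealth: "healthy", CurrentTurn: 7},
	}}
	_, emitted := w.publishIfChanged("g1", false)
	_, again := w.publishIfChanged("g1", false)
	fmt.Println(emitted, again) // true false
}
```

Because both the CAS-success and the fallback path funnel into the same fresh read, the envelope-building code has exactly one shape.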
+ +**Why.** Two paths can produce the same publish decision: the CAS +succeeded (status changed, summary changed), or the CAS conflicted and +the fallback `UpdateEngineHealth` took over (status unchanged from the +worker's point of view, but possibly mutated by a concurrent flow +between the conflict and the read). A single read-after-write reduces +both paths to the same envelope-building code and keeps the snapshot +honest about what is actually in the database. `PlayerTurnStats` is +intentionally left as `nil`: the consumer does not have a fresh engine +state payload, so per-player stats stay empty until the next turn +(this matches [`./README.md §`gm:lobby_events`] for status-only +transitions). + +### D6. Stream-offset label is `health_events` + +**Decision.** The consumer uses the short label `health_events` for +`StreamOffsetStore.Load` / `Save`. The corresponding Redis key is +`gamemaster:stream_offsets:health_events`. + +**Why.** The label convention is documented in +[`./README.md §Persistence Layout / Redis runtime-coordination state`](../README.md): +short logical identifier of the consumer, stable across renames of the +underlying stream key. The Lobby `gmevents` consumer follows the same +shape (`gm_lobby_events`). + +### D7. Worker wiring deferred to Stage 19 + +**Decision.** Stage 18 ships the worker package and unit/loop tests but +does not register the worker as an `app.Component` in +`internal/app/runtime.go`. Wiring is deferred to Stage 19. + +**Why.** The same pattern is already in place for the scheduler ticker +introduced at Stage 15: the worker exists in the source tree but is +not wired into `runtime.app = New(cfg, internalServer)`. Stage 19 +explicitly bundles handler wiring with worker wiring (see PLAN +Stage 19), so deferring is consistent with the precedent. The +configuration values the wiring will need (stream name, block timeout, +offset-store DSN) are already loaded by `internal/config` and were +introduced in Stage 08. 
diff --git a/gamemaster/docs/stage19-internal-rest-handlers.md b/gamemaster/docs/stage19-internal-rest-handlers.md new file mode 100644 index 0000000..dd6041d --- /dev/null +++ b/gamemaster/docs/stage19-internal-rest-handlers.md @@ -0,0 +1,230 @@ +--- +stage: 19 +title: Internal REST handlers +--- + +# Stage 19 — Internal REST handlers + +This decision record captures the non-obvious choices made while +bringing the trusted internal REST listener of Game Master to full +contract coverage. The handlers wire the existing service layer +(stages 13–17) and the membership cache (stage 16) to the eighteen +operations frozen by +[`../api/internal-openapi.yaml`](../api/internal-openapi.yaml). The +listener lifecycle, OpenTelemetry middleware, and the `/healthz` / +`/readyz` probes were established in stage 08; this stage adds the +per-operation handler subpackage, widens the listener `Dependencies` +struct to thread every service port, and grows +[`../internal/app/wiring.go`](../internal/app/wiring.go) to construct +the entire dependency graph (stores, adapters, services, workers). + +The reference precedent for the handler shape is the rtmanager +`internal/api/internalhttp/handlers` tree; the conformance test +mirrors `rtmanager/internal/api/internalhttp/conformance_test.go`. +Eight decisions deviate from a literal reading of +[`../PLAN.md`](../PLAN.md) or are sharp enough to surface here. + +## Decisions + +### D1. Conformance test lives inside the listener package + +**Decision.** The OpenAPI conformance test ships at +[`../internal/api/internalhttp/conformance_test.go`](../internal/api/internalhttp/conformance_test.go), +in the `internalhttp` package, not at +`gamemaster/api/openapi_conformance_test.go` as the literal text of +PLAN.md Stage 19 suggests. + +**Why.** The test instantiates the live `Server.handler` through +`NewServer(...)` with stub services and replays each documented +operation against it. 
That requires reading the unexported +`handler` field and wiring stub implementations of the +handler-package interfaces; both are package-internal concerns that a +sibling test under `gamemaster/api/` would not have access to without +exporting hooks that exist solely for the test. The rtmanager +service ships the analogous test inside its own `internalhttp` +package; we follow the same idiom. + +**How to apply.** Future surface-shape audits go in this file. +PLAN.md text is treated as a drift; the constraint that the spec is +covered by a kin-openapi-driven validation is honoured exactly. + +### D2. `DELETE /engine-versions/{version}` calls `Service.Deprecate` + +**Decision.** The handler bound to the OpenAPI operation +`internalDeprecateEngineVersion` calls +[`engineversion.Service.Deprecate`](../internal/service/engineversion/service.go) +and never `Service.Delete`. The 409 response declared by the +spec for `engine_version_in_use` is therefore unreachable on this +endpoint. + +**Why.** The operation id and the first sentence of the description +explicitly say «Sets the engine version status to `deprecated`». The +sentence about hard removal and `engine_version_in_use` is a +leftover of an earlier intent — `Service.Deprecate` does not consult +`IsReferencedByActiveRuntime`, so the in-use rejection cannot fire +through this code path. Hard delete is a future Admin Service +operation; v1 does not expose it through REST. + +**How to apply.** Calls that need to release the registry row +permanently must use `Service.Delete` directly (not yet wired through +REST). The spec's leftover 409 example is recorded here so a future +contract reviewer does not chase a phantom failure mode. + +### D3. Workers wired and started alongside the listener + +**Decision.** This stage constructs the scheduler ticker (stage 15) +and the runtime:health_events consumer (stage 18) inside +`wiring.buildWorkers` and registers them as `App.Component`-s next +to the internal HTTP server. 
+ +**Why.** Stage 19's narrow text says «ship the gateway-, Lobby- and +Admin-facing REST surface backed by the service layer». But the +service layer collaborators referenced from the listener (turn +generation, membership cache, runtime record store, etc.) only make +sense inside a process that is also producing turns and consuming +health events. Keeping the workers idle would leave the wiring graph +half-built and the dev experience surprising. Constructing and +starting them here makes a freshly-deployed process production-ready +the moment the listener accepts traffic. + +**How to apply.** The two workers are owned by `App.Run` exactly +like the listener: both `Run` (long-lived) and `Shutdown` are part +of `App.Component`. See D4 for the trivial `Shutdown` added on the +scheduler ticker. + +### D4. `schedulerticker.Worker.Shutdown` is a no-op + +**Decision.** The scheduler ticker adds a one-line +`Shutdown(_ context.Context) error { return nil }` so the type +satisfies `app.Component`. + +**Why.** The worker's `Run` already returns when the supplied +context is cancelled, and `wg.Wait` drains the in-flight per-game +goroutines before `Run` returns. There is nothing additional to +release. The `healtheventsconsumer.Worker` already had a `Shutdown` +from stage 18; this just brings the two workers to the same shape. + +**How to apply.** When future workers grow real shutdown logic +(buffered output to flush, persistent connections to drain), they +should embed it inside `Shutdown` rather than relying on context +cancellation alone. + +### D5. New `RuntimeRecordStore.List(ctx)` method + +**Decision.** The port grows a fifth read method: +`List(ctx) ([]runtime.RuntimeRecord, error)`. The PostgreSQL +adapter implements it as one SELECT ordered by +`(created_at DESC, game_id ASC)`. + +**Why.** The OpenAPI operation `internalListRuntimes` accepts an +optional `status` query parameter. 
With the parameter set, the +existing `ListByStatus` answers; without it, no method on the port +returned every record. Composing the unfiltered list as a +loop-over-statuses would dilute the ordering guarantee and double +the round-trip cost. The new method is additive — every other +caller keeps using its narrow read. + +**How to apply.** Test fakes (`fakeRuntimeRecords` in service tests, +`fakeRuntimeRecordsBackend` in scheduler-ticker tests) gained the +method as well. The handler-side `RuntimeRecordsReader` interface +exposes only the three read methods (`Get`, `List`, `ListByStatus`) +so the listener cannot accidentally mutate runtime state. + +### D6. `next_generation_at` encodes as `0` when unscheduled + +**Decision.** The wire `RuntimeRecord.next_generation_at` field is +declared `required: true` and `format: int64`. The domain holds +`*time.Time` and may carry `nil` — typically while a runtime is in +status `starting` and the first scheduling write has not yet +landed. The encoder writes `0` in that case and writes the UTC +millisecond value otherwise. + +**Why.** Encoding `nil` as `0` keeps the wire shape JSON-Schema-valid +without forcing every record reader to handle a missing field. +Optional pointer-typed timestamps (`started_at`, `stopped_at`, +`finished_at`) are still omitted from the JSON form via `omitempty`, +matching the `required` list in the spec. + +**How to apply.** Readers must treat `next_generation_at == 0` as +«not yet scheduled» when the status warrants it; the field will +turn into a real Unix-millisecond value once the scheduler's first +write lands. The conformance test seeds a non-nil +`NextGenerationAt`, so the strict response validator never sees +this edge case at the wire boundary. + +### D7. Hot-path bodies are pass-through, not strict-decoded + +**Decision.** Handlers `internalExecuteCommands`, `internalPutOrders` +read the request body as raw bytes. 
The body is rejected only when +empty or not valid JSON; unknown fields pass through. + +**Why.** The OpenAPI request schemas for these three operations carry +`additionalProperties: true` because the envelopes are engine-owned +(`galaxy/game/openapi.yaml`). Strict decoding here would reject +legitimate engine extensions and force every contract bump to land +in two services in lockstep. + +**How to apply.** Engine `engine_validation_error` responses still +surface as the canonical Game Master error envelope at HTTP 502 — +the engine response body is recorded in `result.RawResponse` for +audit but the OpenAPI spec mandates the error envelope on this code +path. If a future contract version requires forwarding the engine's +4xx body to the gateway, a separate response shape needs to land in +the spec first. + +### D8. `X-Galaxy-Caller` mapping with admin default + +**Decision.** The `resolveOpSource` helper maps the +`X-Galaxy-Caller` header values to +[`operation.OpSource`](../internal/domain/operation/log.go) as +follows: `gateway → OpSourceGatewayPlayer`, +`lobby → OpSourceLobbyInternal`, `admin → OpSourceAdminRest`. +Missing or unrecognised values fall back to `OpSourceAdminRest`, +matching the contract documented in +[`../README.md` §«Internal REST API»](../README.md). + +**Why.** The default is conservative: an Admin Service request +without the header still records as admin instead of being dropped. +The other two values are reserved for the documented callers and +trim/lowercase tolerantly so a casing slip in development does not +produce a confusing audit row. + +**How to apply.** New REST callers should set the header +explicitly. Adding a fourth caller type requires an `OpSource` +constant alongside the mapping change. + +## What ships + +- Eighteen operation handlers under + [`../internal/api/internalhttp/handlers`](../internal/api/internalhttp/handlers). 
+- The probe-only `internal/api/internalhttp/server.go` now widens + `Dependencies` and forwards the per-operation services to + `handlers.Register`. +- Full dependency graph in + [`../internal/app/wiring.go`](../internal/app/wiring.go): five + stores, five external adapters, eleven services, two workers. +- `RuntimeRecordStore.List(ctx)` plus its PostgreSQL adapter + implementation and regression tests + ([`../internal/adapters/postgres/runtimerecordstore`](../internal/adapters/postgres/runtimerecordstore)). +- `schedulerticker.Worker.Shutdown` so the worker is an + `App.Component`. +- Mockgen-generated handler-port mocks under + [`../internal/api/internalhttp/handlers/mocks`](../internal/api/internalhttp/handlers/mocks). +- A kin-openapi-driven conformance test + ([`../internal/api/internalhttp/conformance_test.go`](../internal/api/internalhttp/conformance_test.go)) + that validates request and response shapes for every documented + operation against + [`../api/internal-openapi.yaml`](../api/internal-openapi.yaml). +- Per-handler unit tests covering happy paths, error-code mapping, + unknown-field rejection, and header validation. + +## What remains for later stages + +- Lobby refactor (stage 20) flips Lobby's start flow to call + `GET /api/v1/internal/engine-versions/{version}/image-ref` + synchronously and adds the `InvalidateMemberships` outbound call + on every roster mutation. +- Service-local integration suite (stage 21) drives the listener + end-to-end against a real engine container. +- Cross-service integration tests (stages 22–23) cover Lobby + GM, + Lobby + GM + RTM happy and failure paths. 
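As a closing reference for D8, the caller-header mapping reduces to one tolerant switch. The `OpSource` constant names come from the record; their string values and the helper signature are assumptions of this sketch:

```go
package main

import (
	"fmt"
	"strings"
)

type OpSource string

// Constant names follow the D8 record; the string values are assumed.
const (
	OpSourceGatewayPlayer OpSource = "gateway_player"
	OpSourceLobbyInternal OpSource = "lobby_internal"
	OpSourceAdminRest     OpSource = "admin_rest"
)

// resolveOpSource maps the X-Galaxy-Caller header to an OpSource,
// trimming and lowercasing tolerantly; anything missing or
// unrecognised falls back to the conservative admin default.
func resolveOpSource(header string) OpSource {
	switch strings.ToLower(strings.TrimSpace(header)) {
	case "gateway":
		return OpSourceGatewayPlayer
	case "lobby":
		return OpSourceLobbyInternal
	default:
		return OpSourceAdminRest
	}
}

func main() {
	fmt.Println(resolveOpSource(" Gateway "), resolveOpSource(""), resolveOpSource("Lobby"))
	// gateway_player admin_rest lobby_internal
}
```

A fourth caller type means both a new constant and a new `case` arm, which is exactly the two-part change D8 calls out.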
diff --git a/gamemaster/go.mod b/gamemaster/go.mod new file mode 100644 index 0000000..70102bc --- /dev/null +++ b/gamemaster/go.mod @@ -0,0 +1,128 @@ +module galaxy/gamemaster + +go 1.26.2 + +require ( + galaxy/cronutil v0.0.0-00010101000000-000000000000 + galaxy/notificationintent v0.0.0-00010101000000-000000000000 + galaxy/postgres v0.0.0-00010101000000-000000000000 + galaxy/redisconn v0.0.0-00010101000000-000000000000 + github.com/alicebob/miniredis/v2 v2.37.0 + github.com/getkin/kin-openapi v0.135.0 + github.com/go-jet/jet/v2 v2.14.1 + github.com/jackc/pgx/v5 v5.9.2 + github.com/redis/go-redis/v9 v9.18.0 + github.com/stretchr/testify v1.11.1 + github.com/testcontainers/testcontainers-go v0.42.0 + github.com/testcontainers/testcontainers-go/modules/postgres v0.42.0 + go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.68.0 + go.opentelemetry.io/otel v1.43.0 + go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc v1.43.0 + go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.43.0 + go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.43.0 + go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.43.0 + go.opentelemetry.io/otel/exporters/stdout/stdoutmetric v1.43.0 + go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.43.0 + go.opentelemetry.io/otel/metric v1.43.0 + go.opentelemetry.io/otel/sdk v1.43.0 + go.opentelemetry.io/otel/sdk/metric v1.43.0 + go.opentelemetry.io/otel/trace v1.43.0 + golang.org/x/mod v0.35.0 + gopkg.in/yaml.v3 v3.0.1 +) + +require ( + dario.cat/mergo v1.0.2 // indirect + github.com/Azure/go-ansiterm v0.0.0-20250102033503-faa5f7b0171c // indirect + github.com/Microsoft/go-winio v0.6.2 // indirect + github.com/XSAM/otelsql v0.42.0 // indirect + github.com/cenkalti/backoff/v4 v4.3.0 // indirect + github.com/cenkalti/backoff/v5 v5.0.3 // indirect + github.com/cespare/xxhash/v2 v2.3.0 // indirect + github.com/containerd/errdefs v1.0.0 // indirect + 
github.com/containerd/errdefs/pkg v0.3.0 // indirect
+	github.com/containerd/log v0.1.0 // indirect
+	github.com/containerd/platforms v0.2.1 // indirect
+	github.com/cpuguy83/dockercfg v0.3.2 // indirect
+	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
+	github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
+	github.com/distribution/reference v0.6.0 // indirect
+	github.com/docker/go-connections v0.7.0 // indirect
+	github.com/docker/go-units v0.5.0 // indirect
+	github.com/ebitengine/purego v0.10.0 // indirect
+	github.com/felixge/httpsnoop v1.0.4 // indirect
+	github.com/go-logr/logr v1.4.3 // indirect
+	github.com/go-logr/stdr v1.2.2 // indirect
+	github.com/go-ole/go-ole v1.2.6 // indirect
+	github.com/go-openapi/jsonpointer v0.21.0 // indirect
+	github.com/go-openapi/swag v0.23.0 // indirect
+	github.com/google/uuid v1.6.0 // indirect
+	github.com/grpc-ecosystem/grpc-gateway/v2 v2.28.0 // indirect
+	github.com/jackc/chunkreader/v2 v2.0.1 // indirect
+	github.com/jackc/pgconn v1.14.3 // indirect
+	github.com/jackc/pgio v1.0.0 // indirect
+	github.com/jackc/pgpassfile v1.0.0 // indirect
+	github.com/jackc/pgproto3/v2 v2.3.3 // indirect
+	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
+	github.com/jackc/pgtype v1.14.4 // indirect
+	github.com/jackc/puddle/v2 v2.2.2 // indirect
+	github.com/josharian/intern v1.0.0 // indirect
+	github.com/klauspost/compress v1.18.5 // indirect
+	github.com/lib/pq v1.10.9 // indirect
+	github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
+	github.com/magiconair/properties v1.8.10 // indirect
+	github.com/mailru/easyjson v0.7.7 // indirect
+	github.com/mfridman/interpolate v0.0.2 // indirect
+	github.com/moby/docker-image-spec v1.3.1 // indirect
+	github.com/moby/go-archive v0.2.0 // indirect
+	github.com/moby/moby/api v1.54.2 // indirect
+	github.com/moby/moby/client v0.4.1 // indirect
+	github.com/moby/patternmatcher v0.6.1 // indirect
+	github.com/moby/sys/sequential v0.6.0 // indirect
+	github.com/moby/sys/user v0.4.0 // indirect
+	github.com/moby/sys/userns v0.1.0 // indirect
+	github.com/moby/term v0.5.2 // indirect
+	github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect
+	github.com/oasdiff/yaml v0.0.9 // indirect
+	github.com/oasdiff/yaml3 v0.0.12 // indirect
+	github.com/opencontainers/go-digest v1.0.0 // indirect
+	github.com/opencontainers/image-spec v1.1.1 // indirect
+	github.com/perimeterx/marshmallow v1.1.5 // indirect
+	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
+	github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect
+	github.com/pressly/goose/v3 v3.27.1 // indirect
+	github.com/redis/go-redis/extra/rediscmd/v9 v9.18.0 // indirect
+	github.com/redis/go-redis/extra/redisotel/v9 v9.18.0 // indirect
+	github.com/robfig/cron/v3 v3.0.1 // indirect
+	github.com/sethvargo/go-retry v0.3.0 // indirect
+	github.com/shirou/gopsutil/v4 v4.26.3 // indirect
+	github.com/sirupsen/logrus v1.9.4 // indirect
+	github.com/tklauser/go-sysconf v0.3.16 // indirect
+	github.com/tklauser/numcpus v0.11.0 // indirect
+	github.com/ugorji/go/codec v1.3.1 // indirect
+	github.com/woodsbury/decimal128 v1.3.0 // indirect
+	github.com/yuin/gopher-lua v1.1.1 // indirect
+	github.com/yusufpapurcu/wmi v1.2.4 // indirect
+	go.opentelemetry.io/auto/sdk v1.2.1 // indirect
+	go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.43.0 // indirect
+	go.opentelemetry.io/proto/otlp v1.10.0 // indirect
+	go.uber.org/atomic v1.11.0 // indirect
+	go.uber.org/multierr v1.11.0 // indirect
+	golang.org/x/crypto v0.50.0 // indirect
+	golang.org/x/net v0.53.0 // indirect
+	golang.org/x/sync v0.20.0 // indirect
+	golang.org/x/sys v0.43.0 // indirect
+	golang.org/x/text v0.36.0 // indirect
+	google.golang.org/genproto/googleapis/api v0.0.0-20260401024825-9d38bb4040a9 // indirect
+	google.golang.org/genproto/googleapis/rpc v0.0.0-20260420184626-e10c466a9529 // indirect
+	google.golang.org/grpc v1.80.0 // indirect
+	google.golang.org/protobuf v1.36.11 // indirect
+)
+
+replace galaxy/cronutil => ../pkg/cronutil
+
+replace galaxy/notificationintent => ../pkg/notificationintent
+
+replace galaxy/postgres => ../pkg/postgres
+
+replace galaxy/redisconn => ../pkg/redisconn
diff --git a/gamemaster/go.sum b/gamemaster/go.sum
new file mode 100644
index 0000000..7dd0cc6
--- /dev/null
+++ b/gamemaster/go.sum
@@ -0,0 +1,463 @@
+dario.cat/mergo v1.0.2 h1:85+piFYR1tMbRrLcDwR18y4UKJ3aH1Tbzi24VRW1TK8=
+dario.cat/mergo v1.0.2/go.mod h1:E/hbnu0NxMFBjpMIE34DRGLWqDy0g5FuKDhCb31ngxA=
+github.com/AdaLogics/go-fuzz-headers v0.0.0-20240806141605-e8a1dd7889d6 h1:He8afgbRMd7mFxO99hRNu+6tazq8nFF9lIwo9JFroBk=
+github.com/AdaLogics/go-fuzz-headers v0.0.0-20240806141605-e8a1dd7889d6/go.mod h1:8o94RPi1/7XTJvwPpRSzSUedZrtlirdB3r9Z20bi2f8=
+github.com/Azure/go-ansiterm v0.0.0-20250102033503-faa5f7b0171c h1:udKWzYgxTojEKWjV8V+WSxDXJ4NFATAsZjh8iIbsQIg=
+github.com/Azure/go-ansiterm v0.0.0-20250102033503-faa5f7b0171c/go.mod h1:xomTg63KZ2rFqZQzSB4Vz2SUXa1BpHTVz9L5PTmPC4E=
+github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
+github.com/Masterminds/semver/v3 v3.1.1/go.mod h1:VPu/7SZ7ePZ3QOrcuXROw5FAcLl4a0cBrbBpGY/8hQs=
+github.com/Microsoft/go-winio v0.6.2 h1:F2VQgta7ecxGYO8k3ZZz3RS8fVIXVxONVUPlNERoyfY=
+github.com/Microsoft/go-winio v0.6.2/go.mod h1:yd8OoFMLzJbo9gZq8j5qaps8bJ9aShtEA8Ipt1oGCvU=
+github.com/XSAM/otelsql v0.42.0 h1:Li0xF4eJUxG2e0x3D4rvRlys1f27yJKvjTh7ljkUP5o=
+github.com/XSAM/otelsql v0.42.0/go.mod h1:4mOrEv+cS1KmKzrvTktvJnstr5GtKSAK+QHvFR9OcpI=
+github.com/alicebob/miniredis/v2 v2.37.0 h1:RheObYW32G1aiJIj81XVt78ZHJpHonHLHW7OLIshq68=
+github.com/alicebob/miniredis/v2 v2.37.0/go.mod h1:TcL7YfarKPGDAthEtl5NBeHZfeUQj6OXMm/+iu5cLMM=
+github.com/bsm/ginkgo/v2 v2.12.0 h1:Ny8MWAHyOepLGlLKYmXG4IEkioBysk6GpaRTLC8zwWs=
+github.com/bsm/ginkgo/v2 v2.12.0/go.mod h1:SwYbGRRDovPVboqFv0tPTcG1sN61LM1Z4ARdbAV9g4c=
+github.com/bsm/gomega v1.27.10 h1:yeMWxP2pV2fG3FgAODIY8EiRE3dy0aeFYt4l7wh6yKA=
+github.com/bsm/gomega v1.27.10/go.mod h1:JyEr/xRbxbtgWNi8tIEVPUYZ5Dzef52k01W3YH0H+O0=
+github.com/cenkalti/backoff/v4 v4.3.0 h1:MyRJ/UdXutAwSAT+s3wNd7MfTIcy71VQueUuFK343L8=
+github.com/cenkalti/backoff/v4 v4.3.0/go.mod h1:Y3VNntkOUPxTVeUxJ/G5vcM//AlwfmyYozVcomhLiZE=
+github.com/cenkalti/backoff/v5 v5.0.3 h1:ZN+IMa753KfX5hd8vVaMixjnqRZ3y8CuJKRKj1xcsSM=
+github.com/cenkalti/backoff/v5 v5.0.3/go.mod h1:rkhZdG3JZukswDf7f0cwqPNk4K0sa+F97BxZthm/crw=
+github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
+github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
+github.com/cockroachdb/apd v1.1.0/go.mod h1:8Sl8LxpKi29FqWXR16WEFZRNSz3SoPzUzeMeY4+DwBQ=
+github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG8PI=
+github.com/containerd/errdefs v1.0.0/go.mod h1:+YBYIdtsnF4Iw6nWZhJcqGSg/dwvV7tyJ/kCkyJ2k+M=
+github.com/containerd/errdefs/pkg v0.3.0 h1:9IKJ06FvyNlexW690DXuQNx2KA2cUJXx151Xdx3ZPPE=
+github.com/containerd/errdefs/pkg v0.3.0/go.mod h1:NJw6s9HwNuRhnjJhM7pylWwMyAkmCQvQ4GpJHEqRLVk=
+github.com/containerd/log v0.1.0 h1:TCJt7ioM2cr/tfR8GPbGf9/VRAX8D2B4PjzCpfX540I=
+github.com/containerd/log v0.1.0/go.mod h1:VRRf09a7mHDIRezVKTRCrOq78v577GXq3bSa3EhrzVo=
+github.com/containerd/platforms v0.2.1 h1:zvwtM3rz2YHPQsF2CHYM8+KtB5dvhISiXh5ZpSBQv6A=
+github.com/containerd/platforms v0.2.1/go.mod h1:XHCb+2/hzowdiut9rkudds9bE5yJ7npe7dG/wG+uFPw=
+github.com/coreos/go-systemd v0.0.0-20190321100706-95778dfbb74e/go.mod h1:F5haX7vjVVG0kc13fIWeqUViNPyEJxv/OmvnBo0Yme4=
+github.com/coreos/go-systemd v0.0.0-20190719114852-fd7a80b32e1f/go.mod h1:F5haX7vjVVG0kc13fIWeqUViNPyEJxv/OmvnBo0Yme4=
+github.com/cpuguy83/dockercfg v0.3.2 h1:DlJTyZGBDlXqUZ2Dk2Q3xHs/FtnooJJVaad2S9GKorA=
+github.com/cpuguy83/dockercfg v0.3.2/go.mod h1:sugsbF4//dDlL/i+S+rtpIWp+5h0BHJHfjj5/jFyUJc=
+github.com/creack/pty v1.1.7/go.mod h1:lj5s0c3V2DBrqTV7llrYr5NG6My20zk30Fl46Y7DoTY=
+github.com/creack/pty v1.1.24 h1:bJrF4RRfyJnbTJqzRLHzcGaZK1NeM5kTC9jGgovnR1s=
+github.com/creack/pty v1.1.24/go.mod h1:08sCNb52WyoAwi2QDyzUCTgcvVFhUzewun7wtTfvcwE=
+github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
+github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
+github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM=
+github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
+github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/rVNCu3HqELle0jiPLLBs70cWOduZpkS1E78=
+github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc=
+github.com/distribution/reference v0.6.0 h1:0IXCQ5g4/QMHHkarYzh5l+u8T3t73zM5QvfrDyIgxBk=
+github.com/distribution/reference v0.6.0/go.mod h1:BbU0aIcezP1/5jX/8MP0YiH4SdvB5Y4f/wlDRiLyi3E=
+github.com/docker/go-connections v0.7.0 h1:6SsRfJddP22WMrCkj19x9WKjEDTB+ahsdiGYf0mN39c=
+github.com/docker/go-connections v0.7.0/go.mod h1:no1qkHdjq7kLMGUXYAduOhYPSJxxvgWBh7ogVvptn3Q=
+github.com/docker/go-units v0.5.0 h1:69rxXcBk27SvSaaxTtLh/8llcHD8vYHT7WSdRZ/jvr4=
+github.com/docker/go-units v0.5.0/go.mod h1:fgPhTUdO+D/Jk86RDLlptpiXQzgHJF7gydDDbaIK4Dk=
+github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
+github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
+github.com/ebitengine/purego v0.10.0 h1:QIw4xfpWT6GWTzaW5XEKy3HXoqrJGx1ijYHzTF0/ISU=
+github.com/ebitengine/purego v0.10.0/go.mod h1:iIjxzd6CiRiOG0UyXP+V1+jWqUXVjPKLAI0mRfJZTmQ=
+github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg=
+github.com/felixge/httpsnoop v1.0.4/go.mod h1:m8KPJKqk1gH5J9DgRY2ASl2lWCfGKXixSwevea8zH2U=
+github.com/getkin/kin-openapi v0.135.0 h1:751SjYfbiwqukYuVjwYEIKNfrSwS5YpA7DZnKSwQgtg=
+github.com/getkin/kin-openapi v0.135.0/go.mod h1:6dd5FJl6RdX4usBtFBaQhk9q62Yb2J0Mk5IhUO/QqFI=
+github.com/go-jet/jet/v2 v2.14.1 h1:wsfD9e7CGP9h46+IFNlftfncBcmVnKddikbTtapQM3M=
+github.com/go-jet/jet/v2 v2.14.1/go.mod h1:dqTAECV2Mo3S2NFjbm4vJ1aDruZjhaJ1RAAR8rGUkkc=
+github.com/go-kit/log v0.1.0/go.mod h1:zbhenjAZHb184qTLMA9ZjW7ThYL0H2mk7Q6pNt4vbaY=
+github.com/go-logfmt/logfmt v0.5.0/go.mod h1:wCYkCAKZfumFQihp8CzCvQ3paCTfi41vtzG1KdI/P7A=
+github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
+github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
+github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
+github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
+github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE=
+github.com/go-ole/go-ole v1.2.6 h1:/Fpf6oFPoeFik9ty7siob0G6Ke8QvQEuVcuChpwXzpY=
+github.com/go-ole/go-ole v1.2.6/go.mod h1:pprOEPIfldk/42T2oK7lQ4v4JSDwmV0As9GaiUsvbm0=
+github.com/go-openapi/jsonpointer v0.21.0 h1:YgdVicSA9vH5RiHs9TZW5oyafXZFc6+2Vc1rr/O9oNQ=
+github.com/go-openapi/jsonpointer v0.21.0/go.mod h1:IUyH9l/+uyhIYQ/PXVA41Rexl+kOkAPDdXEYns6fzUY=
+github.com/go-openapi/swag v0.23.0 h1:vsEVJDUo2hPJ2tu0/Xc+4noaxyEffXNIs3cOULZ+GrE=
+github.com/go-openapi/swag v0.23.0/go.mod h1:esZ8ITTYEsH1V2trKHjAN8Ai7xHb8RV+YSZ577vPjgQ=
+github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY=
+github.com/go-test/deep v1.0.8 h1:TDsG77qcSprGbC6vTN8OuXp5g+J+b5Pcguhf7Zt61VM=
+github.com/go-test/deep v1.0.8/go.mod h1:5C2ZWiW0ErCdrYzpqxLbTX7MG14M9iiw8DgHncVwcsE=
+github.com/gofrs/uuid v4.0.0+incompatible/go.mod h1:b2aQJv3Z4Fp6yNu3cdSllBxTCLRxnplIgP/c0N/04lM=
+github.com/golang/protobuf v1.5.4 h1:i7eJL8qZTpSEXOPTxNKhASYpMn+8e5Q6AdndVa1dWek=
+github.com/golang/protobuf v1.5.4/go.mod h1:lnTiLA8Wa4RWRcIUkrtSVa5nRhsEGBg48fD6rSs7xps=
+github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
+github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
+github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
+github.com/google/renameio v0.1.0/go.mod h1:KWCgfxg9yswjAJkECMjeO8J8rahYeXnNhOm40UhjYkI=
+github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
+github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
+github.com/grpc-ecosystem/grpc-gateway/v2 v2.28.0 h1:HWRh5R2+9EifMyIHV7ZV+MIZqgz+PMpZ14Jynv3O2Zs=
+github.com/grpc-ecosystem/grpc-gateway/v2 v2.28.0/go.mod h1:JfhWUomR1baixubs02l85lZYYOm7LV6om4ceouMv45c=
+github.com/jackc/chunkreader v1.0.0/go.mod h1:RT6O25fNZIuasFJRyZ4R/Y2BbhasbmZXF9QQ7T3kePo=
+github.com/jackc/chunkreader/v2 v2.0.0/go.mod h1:odVSm741yZoC3dpHEUXIqA9tQRhFrgOHwnPIn9lDKlk=
+github.com/jackc/chunkreader/v2 v2.0.1 h1:i+RDz65UE+mmpjTfyz0MoVTnzeYxroil2G82ki7MGG8=
+github.com/jackc/chunkreader/v2 v2.0.1/go.mod h1:odVSm741yZoC3dpHEUXIqA9tQRhFrgOHwnPIn9lDKlk=
+github.com/jackc/pgconn v0.0.0-20190420214824-7e0022ef6ba3/go.mod h1:jkELnwuX+w9qN5YIfX0fl88Ehu4XC3keFuOJJk9pcnA=
+github.com/jackc/pgconn v0.0.0-20190824142844-760dd75542eb/go.mod h1:lLjNuW/+OfW9/pnVKPazfWOgNfH2aPem8YQ7ilXGvJE=
+github.com/jackc/pgconn v0.0.0-20190831204454-2fabfa3c18b7/go.mod h1:ZJKsE/KZfsUgOEh9hBm+xYTstcNHg7UPMVJqRfQxq4s=
+github.com/jackc/pgconn v1.8.0/go.mod h1:1C2Pb36bGIP9QHGBYCjnyhqu7Rv3sGshaQUvmfGIB/o=
+github.com/jackc/pgconn v1.9.0/go.mod h1:YctiPyvzfU11JFxoXokUOOKQXQmDMoJL9vJzHH8/2JY=
+github.com/jackc/pgconn v1.9.1-0.20210724152538-d89c8390a530/go.mod h1:4z2w8XhRbP1hYxkpTuBjTS3ne3J48K83+u0zoyvg2pI=
+github.com/jackc/pgconn v1.14.3 h1:bVoTr12EGANZz66nZPkMInAV/KHD2TxH9npjXXgiB3w=
+github.com/jackc/pgconn v1.14.3/go.mod h1:RZbme4uasqzybK2RK5c65VsHxoyaml09lx3tXOcO/VM=
+github.com/jackc/pgio v1.0.0 h1:g12B9UwVnzGhueNavwioyEEpAmqMe1E/BN9ES+8ovkE=
+github.com/jackc/pgio v1.0.0/go.mod h1:oP+2QK2wFfUWgr+gxjoBH9KGBb31Eio69xUb0w5bYf8=
+github.com/jackc/pgmock v0.0.0-20190831213851-13a1b77aafa2/go.mod h1:fGZlG77KXmcq05nJLRkk0+p82V8B8Dw8KN2/V9c/OAE=
+github.com/jackc/pgmock v0.0.0-20201204152224-4fe30f7445fd/go.mod h1:hrBW0Enj2AZTNpt/7Y5rr2xe/9Mn757Wtb2xeBzPv2c=
+github.com/jackc/pgmock v0.0.0-20210724152146-4ad1a8207f65 h1:DadwsjnMwFjfWc9y5Wi/+Zz7xoE5ALHsRQlOctkOiHc=
+github.com/jackc/pgmock v0.0.0-20210724152146-4ad1a8207f65/go.mod h1:5R2h2EEX+qri8jOWMbJCtaPWkrrNc7OHwsp2TCqp7ak=
+github.com/jackc/pgpassfile v1.0.0 h1:/6Hmqy13Ss2zCq62VdNG8tM1wchn8zjSGOBJ6icpsIM=
+github.com/jackc/pgpassfile v1.0.0/go.mod h1:CEx0iS5ambNFdcRtxPj5JhEz+xB6uRky5eyVu/W2HEg=
+github.com/jackc/pgproto3 v1.1.0/go.mod h1:eR5FA3leWg7p9aeAqi37XOTgTIbkABlvcPB3E5rlc78=
+github.com/jackc/pgproto3/v2 v2.0.0-alpha1.0.20190420180111-c116219b62db/go.mod h1:bhq50y+xrl9n5mRYyCBFKkpRVTLYJVWeCc+mEAI3yXA=
+github.com/jackc/pgproto3/v2 v2.0.0-alpha1.0.20190609003834-432c2951c711/go.mod h1:uH0AWtUmuShn0bcesswc4aBTWGvw0cAxIJp+6OB//Wg=
+github.com/jackc/pgproto3/v2 v2.0.0-rc3/go.mod h1:ryONWYqW6dqSg1Lw6vXNMXoBJhpzvWKnT95C46ckYeM=
+github.com/jackc/pgproto3/v2 v2.0.0-rc3.0.20190831210041-4c03ce451f29/go.mod h1:ryONWYqW6dqSg1Lw6vXNMXoBJhpzvWKnT95C46ckYeM=
+github.com/jackc/pgproto3/v2 v2.0.6/go.mod h1:WfJCnwN3HIg9Ish/j3sgWXnAfK8A9Y0bwXYU5xKaEdA=
+github.com/jackc/pgproto3/v2 v2.1.1/go.mod h1:WfJCnwN3HIg9Ish/j3sgWXnAfK8A9Y0bwXYU5xKaEdA=
+github.com/jackc/pgproto3/v2 v2.3.3 h1:1HLSx5H+tXR9pW3in3zaztoEwQYRC9SQaYUHjTSUOag=
+github.com/jackc/pgproto3/v2 v2.3.3/go.mod h1:WfJCnwN3HIg9Ish/j3sgWXnAfK8A9Y0bwXYU5xKaEdA=
+github.com/jackc/pgservicefile v0.0.0-20200714003250-2b9c44734f2b/go.mod h1:vsD4gTJCa9TptPL8sPkXrLZ+hDuNrZCnj29CQpr4X1E=
+github.com/jackc/pgservicefile v0.0.0-20221227161230-091c0ba34f0a/go.mod h1:5TJZWKEWniPve33vlWYSoGYefn3gLQRzjfDlhSJ9ZKM=
+github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 h1:iCEnooe7UlwOQYpKFhBabPMi4aNAfoODPEFNiAnClxo=
+github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761/go.mod h1:5TJZWKEWniPve33vlWYSoGYefn3gLQRzjfDlhSJ9ZKM=
+github.com/jackc/pgtype v0.0.0-20190421001408-4ed0de4755e0/go.mod h1:hdSHsc1V01CGwFsrv11mJRHWJ6aifDLfdV3aVjFF0zg=
+github.com/jackc/pgtype v0.0.0-20190824184912-ab885b375b90/go.mod h1:KcahbBH1nCMSo2DXpzsoWOAfFkdEtEJpPbVLq8eE+mc=
+github.com/jackc/pgtype v0.0.0-20190828014616-a8802b16cc59/go.mod h1:MWlu30kVJrUS8lot6TQqcg7mtthZ9T0EoIBFiJcmcyw=
+github.com/jackc/pgtype v1.8.1-0.20210724151600-32e20a603178/go.mod h1:C516IlIV9NKqfsMCXTdChteoXmwgUceqaLfjg2e3NlM=
+github.com/jackc/pgtype v1.14.0/go.mod h1:LUMuVrfsFfdKGLw+AFFVv6KtHOFMwRgDDzBt76IqCA4=
+github.com/jackc/pgtype v1.14.4 h1:fKuNiCumbKTAIxQwXfB/nsrnkEI6bPJrrSiMKgbJ2j8=
+github.com/jackc/pgtype v1.14.4/go.mod h1:aKeozOde08iifGosdJpz9MBZonJOUJxqNpPBcMJTlVA=
+github.com/jackc/pgx/v4 v4.0.0-20190420224344-cc3461e65d96/go.mod h1:mdxmSJJuR08CZQyj1PVQBHy9XOp5p8/SHH6a0psbY9Y=
+github.com/jackc/pgx/v4 v4.0.0-20190421002000-1b8f0016e912/go.mod h1:no/Y67Jkk/9WuGR0JG/JseM9irFbnEPbuWV2EELPNuM=
+github.com/jackc/pgx/v4 v4.0.0-pre1.0.20190824185557-6972a5742186/go.mod h1:X+GQnOEnf1dqHGpw7JmHqHc1NxDoalibchSk9/RWuDc=
+github.com/jackc/pgx/v4 v4.12.1-0.20210724153913-640aa07df17c/go.mod h1:1QD0+tgSXP7iUjYm9C1NxKhny7lq6ee99u/z+IHFcgs=
+github.com/jackc/pgx/v4 v4.18.2/go.mod h1:Ey4Oru5tH5sB6tV7hDmfWFahwF15Eb7DNXlRKx2CkVw=
+github.com/jackc/pgx/v4 v4.18.3 h1:dE2/TrEsGX3RBprb3qryqSV9Y60iZN1C6i8IrmW9/BA=
+github.com/jackc/pgx/v4 v4.18.3/go.mod h1:Ey4Oru5tH5sB6tV7hDmfWFahwF15Eb7DNXlRKx2CkVw=
+github.com/jackc/pgx/v5 v5.9.2 h1:3ZhOzMWnR4yJ+RW1XImIPsD1aNSz4T4fyP7zlQb56hw=
+github.com/jackc/pgx/v5 v5.9.2/go.mod h1:mal1tBGAFfLHvZzaYh77YS/eC6IX9OWbRV1QIIM0Jn4=
+github.com/jackc/puddle v0.0.0-20190413234325-e4ced69a3a2b/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
+github.com/jackc/puddle v0.0.0-20190608224051-11cab39313c9/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
+github.com/jackc/puddle v1.1.3/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
+github.com/jackc/puddle v1.3.0/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
+github.com/jackc/puddle/v2 v2.2.2 h1:PR8nw+E/1w0GLuRFSmiioY6UooMp6KJv0/61nB7icHo=
+github.com/jackc/puddle/v2 v2.2.2/go.mod h1:vriiEXHvEE654aYKXXjOvZM39qJ0q+azkZFrfEOc3H4=
+github.com/josharian/intern v1.0.0 h1:vlS4z54oSdjm0bgjRigI+G1HpF+tI+9rE5LLzOg8HmY=
+github.com/josharian/intern v1.0.0/go.mod h1:5DoeVV0s6jJacbCEi61lwdGj/aVlrQvzHFFd8Hwg//Y=
+github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
+github.com/klauspost/compress v1.18.5 h1:/h1gH5Ce+VWNLSWqPzOVn6XBO+vJbCNGvjoaGBFW2IE=
+github.com/klauspost/compress v1.18.5/go.mod h1:cwPg85FWrGar70rWktvGQj8/hthj3wpl0PGDogxkrSQ=
+github.com/klauspost/cpuid/v2 v2.3.0 h1:S4CRMLnYUhGeDFDqkGriYKdfoFlDnMtqTiI/sFzhA9Y=
+github.com/klauspost/cpuid/v2 v2.3.0/go.mod h1:hqwkgyIinND0mEev00jJYCxPNVRVXFQeu1XKlok6oO0=
+github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ=
+github.com/konsorten/go-windows-terminal-sequences v1.0.2/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ=
+github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
+github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
+github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
+github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
+github.com/kr/pty v1.1.8/go.mod h1:O1sed60cT9XZ5uDucP5qwvh+TE3NnUj51EiZO/lmSfw=
+github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
+github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
+github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
+github.com/lib/pq v1.0.0/go.mod h1:5WUZQaWbwv1U+lTReE5YruASi9Al49XbQIvNi/34Woo=
+github.com/lib/pq v1.1.0/go.mod h1:5WUZQaWbwv1U+lTReE5YruASi9Al49XbQIvNi/34Woo=
+github.com/lib/pq v1.2.0/go.mod h1:5WUZQaWbwv1U+lTReE5YruASi9Al49XbQIvNi/34Woo=
+github.com/lib/pq v1.10.2/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
+github.com/lib/pq v1.10.9 h1:YXG7RB+JIjhP29X+OtkiDnYaXQwpS4JEWq7dtCCRUEw=
+github.com/lib/pq v1.10.9/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
+github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 h1:6E+4a0GO5zZEnZ81pIr0yLvtUWk2if982qA3F3QD6H4=
+github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0/go.mod h1:zJYVVT2jmtg6P3p1VtQj7WsuWi/y4VnjVBn7F8KPB3I=
+github.com/magiconair/properties v1.8.10 h1:s31yESBquKXCV9a/ScB3ESkOjUYYv+X0rg8SYxI99mE=
+github.com/magiconair/properties v1.8.10/go.mod h1:Dhd985XPs7jluiymwWYZ0G4Z61jb3vdS329zhj2hYo0=
+github.com/mailru/easyjson v0.7.7 h1:UGYAvKxe3sBsEDzO8ZeWOSlIQfWFlxbzLZe7hwFURr0=
+github.com/mailru/easyjson v0.7.7/go.mod h1:xzfreul335JAWq5oZzymOObrkdz5UnU4kGfJJLY9Nlc=
+github.com/mattn/go-colorable v0.1.1/go.mod h1:FuOcm+DKB9mbwrcAfNl7/TZVBZ6rcnceauSikq3lYCQ=
+github.com/mattn/go-colorable v0.1.6/go.mod h1:u6P/XSegPjTcexA+o6vUJrdnUu04hMope9wVRipJSqc=
+github.com/mattn/go-isatty v0.0.5/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hdxcsrc5s=
+github.com/mattn/go-isatty v0.0.7/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hdxcsrc5s=
+github.com/mattn/go-isatty v0.0.12/go.mod h1:cbi8OIDigv2wuxKPP5vlRcQ1OAZbq2CE4Kysco4FUpU=
+github.com/mattn/go-isatty v0.0.21 h1:xYae+lCNBP7QuW4PUnNG61ffM4hVIfm+zUzDuSzYLGs=
+github.com/mattn/go-isatty v0.0.21/go.mod h1:ZXfXG4SQHsB/w3ZeOYbR0PrPwLy+n6xiMrJlRFqopa4=
+github.com/mdelapenya/tlscert v0.2.0 h1:7H81W6Z/4weDvZBNOfQte5GpIMo0lGYEeWbkGp5LJHI=
+github.com/mdelapenya/tlscert v0.2.0/go.mod h1:O4njj3ELLnJjGdkN7M/vIVCpZ+Cf0L6muqOG4tLSl8o=
+github.com/mfridman/interpolate v0.0.2 h1:pnuTK7MQIxxFz1Gr+rjSIx9u7qVjf5VOoM/u6BbAxPY=
+github.com/mfridman/interpolate v0.0.2/go.mod h1:p+7uk6oE07mpE/Ik1b8EckO0O4ZXiGAfshKBWLUM9Xg=
+github.com/moby/docker-image-spec v1.3.1 h1:jMKff3w6PgbfSa69GfNg+zN/XLhfXJGnEx3Nl2EsFP0=
+github.com/moby/docker-image-spec v1.3.1/go.mod h1:eKmb5VW8vQEh/BAr2yvVNvuiJuY6UIocYsFu/DxxRpo=
+github.com/moby/go-archive v0.2.0 h1:zg5QDUM2mi0JIM9fdQZWC7U8+2ZfixfTYoHL7rWUcP8=
+github.com/moby/go-archive v0.2.0/go.mod h1:mNeivT14o8xU+5q1YnNrkQVpK+dnNe/K6fHqnTg4qPU=
+github.com/moby/moby/api v1.54.2 h1:wiat9QAhnDQjA7wk1kh/TqHz2I1uUA7M7t9SAl/JNXg=
+github.com/moby/moby/api v1.54.2/go.mod h1:+RQ6wluLwtYaTd1WnPLykIDPekkuyD/ROWQClE83pzs=
+github.com/moby/moby/client v0.4.1 h1:DMQgisVoMkmMs7fp3ROSdiBnoAu8+vo3GggFl06M/wY=
+github.com/moby/moby/client v0.4.1/go.mod h1:z52C9O2POPOsnxZAy//WtKcQ32P+jT/NGeXu/7nfjGQ=
+github.com/moby/patternmatcher v0.6.1 h1:qlhtafmr6kgMIJjKJMDmMWq7WLkKIo23hsrpR3x084U=
+github.com/moby/patternmatcher v0.6.1/go.mod h1:hDPoyOpDY7OrrMDLaYoY3hf52gNCR/YOUYxkhApJIxc=
+github.com/moby/sys/sequential v0.6.0 h1:qrx7XFUd/5DxtqcoH1h438hF5TmOvzC/lspjy7zgvCU=
+github.com/moby/sys/sequential v0.6.0/go.mod h1:uyv8EUTrca5PnDsdMGXhZe6CCe8U/UiTWd+lL+7b/Ko=
+github.com/moby/sys/user v0.4.0 h1:jhcMKit7SA80hivmFJcbB1vqmw//wU61Zdui2eQXuMs=
+github.com/moby/sys/user v0.4.0/go.mod h1:bG+tYYYJgaMtRKgEmuueC0hJEAZWwtIbZTB+85uoHjs=
+github.com/moby/sys/userns v0.1.0 h1:tVLXkFOxVu9A64/yh59slHVv9ahO9UIev4JZusOLG/g=
+github.com/moby/sys/userns v0.1.0/go.mod h1:IHUYgu/kao6N8YZlp9Cf444ySSvCmDlmzUcYfDHOl28=
+github.com/moby/term v0.5.2 h1:6qk3FJAFDs6i/q3W/pQ97SX192qKfZgGjCQqfCJkgzQ=
+github.com/moby/term v0.5.2/go.mod h1:d3djjFCrjnB+fl8NJux+EJzu0msscUP+f8it8hPkFLc=
+github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 h1:RWengNIwukTxcDr9M+97sNutRR1RKhG96O6jWumTTnw=
+github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826/go.mod h1:TaXosZuwdSHYgviHp1DAtfrULt5eUgsSMsZf+YrPgl8=
+github.com/ncruces/go-strftime v1.0.0 h1:HMFp8mLCTPp341M/ZnA4qaf7ZlsbTc+miZjCLOFAw7w=
+github.com/ncruces/go-strftime v1.0.0/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
+github.com/oasdiff/yaml v0.0.9 h1:zQOvd2UKoozsSsAknnWoDJlSK4lC0mpmjfDsfqNwX48=
+github.com/oasdiff/yaml v0.0.9/go.mod h1:8lvhgJG4xiKPj3HN5lDow4jZHPlx1i7dIwzkdAo6oAM=
+github.com/oasdiff/yaml3 v0.0.12 h1:75urAtPeDg2/iDEWwzNrLOWxI9N/dCh81nTTJtokt2M=
+github.com/oasdiff/yaml3 v0.0.12/go.mod h1:y5+oSEHCPT/DGrS++Wc/479ERge0zTFxaF8PbGKcg2o=
+github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
+github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
+github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040=
+github.com/opencontainers/image-spec v1.1.1/go.mod h1:qpqAh3Dmcf36wStyyWU+kCeDgrGnAve2nCC8+7h8Q0M=
+github.com/perimeterx/marshmallow v1.1.5 h1:a2LALqQ1BlHM8PZblsDdidgv1mWi1DgC2UmX50IvK2s=
+github.com/perimeterx/marshmallow v1.1.5/go.mod h1:dsXbUu8CRzfYP5a87xpp0xq9S3u0Vchtcl8we9tYaXw=
+github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
+github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
+github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U=
+github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
+github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 h1:o4JXh1EVt9k/+g42oCprj/FisM4qX9L3sZB3upGN2ZU=
+github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55/go.mod h1:OmDBASR4679mdNQnz2pUhc2G8CO2JrUAVFDRBDP/hJE=
+github.com/pressly/goose/v3 v3.27.1 h1:6uEvcprBybDmW4hcz3gYujhARhye+GoWKhEWyzD5sh4=
+github.com/pressly/goose/v3 v3.27.1/go.mod h1:maruOxsPnIG2yHHyo8UqKWXYKFcH7Q76csUV7+7KYoM=
+github.com/redis/go-redis/extra/rediscmd/v9 v9.18.0 h1:QY4nmPHLFAJjtT5O4OMUEOxP8WVaRNOFpcbmxT2NLZU=
+github.com/redis/go-redis/extra/rediscmd/v9 v9.18.0/go.mod h1:WH8cY/0fT41Bsf341qzo8v4nx0GCE8FykAA23IVbVmo=
+github.com/redis/go-redis/extra/redisotel/v9 v9.18.0 h1:2dKdoEYBJ0CZCLPiCdvvc7luz3DPwY6hKdzjL6m1eHE=
+github.com/redis/go-redis/extra/redisotel/v9 v9.18.0/go.mod h1:WzkrVG9ro9BwCQD0eJOWn6AGL4Z1CleGflM45w1hu10=
+github.com/redis/go-redis/v9 v9.18.0 h1:pMkxYPkEbMPwRdenAzUNyFNrDgHx9U+DrBabWNfSRQs=
+github.com/redis/go-redis/v9 v9.18.0/go.mod h1:k3ufPphLU5YXwNTUcCRXGxUoF1fqxnhFQmscfkCoDA0=
+github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE=
+github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
+github.com/robfig/cron/v3 v3.0.1 h1:WdRxkvbJztn8LMz/QEvLN5sBU+xKpSqwwUO1Pjr4qDs=
+github.com/robfig/cron/v3 v3.0.1/go.mod h1:eQICP3HwyT7UooqI/z+Ov+PtYAWygg1TEWWzGIFLtro=
+github.com/rogpeppe/go-internal v1.3.0/go.mod h1:M8bDsm7K2OlrFYOpmOWEs/qY81heoFRclV5y23lUDJ4=
+github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ=
+github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc=
+github.com/rs/xid v1.2.1/go.mod h1:+uKXf+4Djp6Md1KODXJxgGQPKngRmWyn10oCKFzNHOQ=
+github.com/rs/zerolog v1.13.0/go.mod h1:YbFCdg8HfsridGWAh22vktObvhZbQsZXe4/zB0OKkWU=
+github.com/rs/zerolog v1.15.0/go.mod h1:xYTKnLHcpfU2225ny5qZjxnj9NvkumZYjJHlAThCjNc=
+github.com/satori/go.uuid v1.2.0/go.mod h1:dA0hQrYB0VpLJoorglMZABFdXlWrHn1NEOzdhQKdks0=
+github.com/sethvargo/go-retry v0.3.0 h1:EEt31A35QhrcRZtrYFDTBg91cqZVnFL2navjDrah2SE=
+github.com/sethvargo/go-retry v0.3.0/go.mod h1:mNX17F0C/HguQMyMyJxcnU471gOZGxCLyYaFyAZraas=
+github.com/shirou/gopsutil/v4 v4.26.3 h1:2ESdQt90yU3oXF/CdOlRCJxrP+Am1aBYubTMTfxJ1qc=
+github.com/shirou/gopsutil/v4 v4.26.3/go.mod h1:LZ6ewCSkBqUpvSOf+LsTGnRinC6iaNUNMGBtDkJBaLQ=
+github.com/shopspring/decimal v0.0.0-20180709203117-cd690d0c9e24/go.mod h1:M+9NzErvs504Cn4c5DxATwIqPbtswREoFCre64PpcG4=
+github.com/shopspring/decimal v1.2.0/go.mod h1:DKyhrW/HYNuLGql+MJL6WCR6knT2jwCFRcu2hWCYk4o=
+github.com/sirupsen/logrus v1.4.1/go.mod h1:ni0Sbl8bgC9z8RoU9G6nDWqqs/fq4eDPysMBDgk/93Q=
+github.com/sirupsen/logrus v1.4.2/go.mod h1:tLMulIdttU9McNUspp0xgXVQah82FyeX6MwdIuYE2rE=
+github.com/sirupsen/logrus v1.9.4 h1:TsZE7l11zFCLZnZ+teH4Umoq5BhEIfIzfRDZ1Uzql2w=
+github.com/sirupsen/logrus v1.9.4/go.mod h1:ftWc9WdOfJ0a92nsE2jF5u5ZwH8Bv2zdeOC42RjbV2g=
+github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
+github.com/stretchr/objx v0.1.1/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
+github.com/stretchr/objx v0.2.0/go.mod h1:qt09Ya8vawLte6SNmTgCsAVtYtaKzEcn8ATUoHMkEqE=
+github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
+github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
+github.com/stretchr/objx v0.5.3 h1:jmXUvGomnU1o3W/V5h2VEradbpJDwGrzugQQvL0POH4=
+github.com/stretchr/objx v0.5.3/go.mod h1:rDQraq+vQZU7Fde9LOZLr8Tax6zZvy4kuNKF+QYS+U0=
+github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
+github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
+github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4=
+github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5cxcmMvtA=
+github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
+github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
+github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
+github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
+github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
+github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U=
+github.com/testcontainers/testcontainers-go v0.42.0 h1:He3IhTzTZOygSXLJPMX7n44XtK+qhjat1nI9cneBbUY=
+github.com/testcontainers/testcontainers-go v0.42.0/go.mod h1:vZjdY1YmUA1qEForxOIOazfsrdyORJAbhi0bp8plN30=
+github.com/testcontainers/testcontainers-go/modules/postgres v0.42.0 h1:GCbb1ndrF7OTDiIvxXyItaDab4qkzTFJ48LKFdM7EIo=
+github.com/testcontainers/testcontainers-go/modules/postgres v0.42.0/go.mod h1:IRPBaI8jXdrNfD0e4Zm7Fbcgaz5shKxOQv4axiL09xs=
+github.com/tklauser/go-sysconf v0.3.16 h1:frioLaCQSsF5Cy1jgRBrzr6t502KIIwQ0MArYICU0nA=
+github.com/tklauser/go-sysconf v0.3.16/go.mod h1:/qNL9xxDhc7tx3HSRsLWNnuzbVfh3e7gh/BmM179nYI=
+github.com/tklauser/numcpus v0.11.0 h1:nSTwhKH5e1dMNsCdVBukSZrURJRoHbSEQjdEbY+9RXw=
+github.com/tklauser/numcpus v0.11.0/go.mod h1:z+LwcLq54uWZTX0u/bGobaV34u6V7KNlTZejzM6/3MQ=
+github.com/ugorji/go/codec v1.3.1 h1:waO7eEiFDwidsBN6agj1vJQ4AG7lh2yqXyOXqhgQuyY=
+github.com/ugorji/go/codec v1.3.1/go.mod h1:pRBVtBSKl77K30Bv8R2P+cLSGaTtex6fsA2Wjqmfxj4=
+github.com/woodsbury/decimal128 v1.3.0 h1:8pffMNWIlC0O5vbyHWFZAt5yWvWcrHA+3ovIIjVWss0=
+github.com/woodsbury/decimal128 v1.3.0/go.mod h1:C5UTmyTjW3JftjUFzOVhC20BEQa2a4ZKOB5I6Zjb+ds=
+github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
+github.com/yuin/gopher-lua v1.1.1 h1:kYKnWBjvbNP4XLT3+bPEwAXJx262OhaHDWDVOPjL46M=
+github.com/yuin/gopher-lua v1.1.1/go.mod h1:GBR0iDaNXjAgGg9zfCvksxSRnQx76gclCIb7kdAd1Pw=
+github.com/yusufpapurcu/wmi v1.2.4 h1:zFUKzehAFReQwLys1b/iSMl+JQGSCSjtVqQn9bBrPo0=
+github.com/yusufpapurcu/wmi v1.2.4/go.mod h1:SBZ9tNy3G9/m5Oi98Zks0QjeHVDvuK0qfxQmPyzfmi0=
+github.com/zeebo/xxh3 v1.0.2 h1:xZmwmqxHZA8AI603jOQ0tMqmBr9lPeFwGg6d+xy9DC0=
+github.com/zeebo/xxh3 v1.0.2/go.mod h1:5NWz9Sef7zIDm2JHfFlcQvNekmcEl9ekUZQQKCYaDcA=
+github.com/zenazn/goji v0.9.0/go.mod h1:7S9M489iMyHBNxwZnk9/EHS098H4/F6TATF2mIxtB1Q=
+go.opentelemetry.io/auto/sdk v1.2.1 h1:jXsnJ4Lmnqd11kwkBV2LgLoFMZKizbCi5fNZ/ipaZ64=
+go.opentelemetry.io/auto/sdk v1.2.1/go.mod h1:KRTj+aOaElaLi+wW1kO/DZRXwkF4C5xPbEe3ZiIhN7Y=
+go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.68.0 h1:CqXxU8VOmDefoh0+ztfGaymYbhdB/tT3zs79QaZTNGY=
+go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.68.0/go.mod h1:BuhAPThV8PBHBvg8ZzZ/Ok3idOdhWIodywz2xEcRbJo=
+go.opentelemetry.io/otel v1.43.0 h1:mYIM03dnh5zfN7HautFE4ieIig9amkNANT+xcVxAj9I=
+go.opentelemetry.io/otel v1.43.0/go.mod h1:JuG+u74mvjvcm8vj8pI5XiHy1zDeoCS2LB1spIq7Ay0=
+go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc v1.43.0 h1:8UQVDcZxOJLtX6gxtDt3vY2WTgvZqMQRzjsqiIHQdkc=
+go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc v1.43.0/go.mod h1:2lmweYCiHYpEjQ/lSJBYhj9jP1zvCvQW4BqL9dnT7FQ=
+go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.43.0 h1:w1K+pCJoPpQifuVpsKamUdn9U0zM3xUziVOqsGksUrY=
+go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.43.0/go.mod h1:HBy4BjzgVE8139ieRI75oXm3EcDN+6GhD88JT1Kjvxg=
+go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.43.0 h1:88Y4s2C8oTui1LGM6bTWkw0ICGcOLCAI5l6zsD1j20k=
+go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.43.0/go.mod h1:Vl1/iaggsuRlrHf/hfPJPvVag77kKyvrLeD10kpMl+A=
+go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.43.0 h1:RAE+JPfvEmvy+0LzyUA25/SGawPwIUbZ6u0Wug54sLc=
+go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.43.0/go.mod h1:AGmbycVGEsRx9mXMZ75CsOyhSP6MFIcj/6dnG+vhVjk=
+go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.43.0 h1:3iZJKlCZufyRzPzlQhUIWVmfltrXuGyfjREgGP3UUjc=
+go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.43.0/go.mod h1:/G+nUPfhq2e+qiXMGxMwumDrP5jtzU+mWN7/sjT2rak=
+go.opentelemetry.io/otel/exporters/stdout/stdoutmetric v1.43.0 h1:TC+BewnDpeiAmcscXbGMfxkO+mwYUwE/VySwvw88PfA=
+go.opentelemetry.io/otel/exporters/stdout/stdoutmetric v1.43.0/go.mod h1:J/ZyF4vfPwsSr9xJSPyQ4LqtcTPULFR64KwTikGLe+A=
+go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.43.0 h1:mS47AX77OtFfKG4vtp+84kuGSFZHTyxtXIN269vChY0=
+go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.43.0/go.mod h1:PJnsC41lAGncJlPUniSwM81gc80GkgWJWr3cu2nKEtU=
+go.opentelemetry.io/otel/metric v1.43.0 h1:d7638QeInOnuwOONPp4JAOGfbCEpYb+K6DVWvdxGzgM=
+go.opentelemetry.io/otel/metric v1.43.0/go.mod h1:RDnPtIxvqlgO8GRW18W6Z/4P462ldprJtfxHxyKd2PY=
+go.opentelemetry.io/otel/sdk v1.43.0 h1:pi5mE86i5rTeLXqoF/hhiBtUNcrAGHLKQdhg4h4V9Dg=
+go.opentelemetry.io/otel/sdk v1.43.0/go.mod h1:P+IkVU3iWukmiit/Yf9AWvpyRDlUeBaRg6Y+C58QHzg=
+go.opentelemetry.io/otel/sdk/metric v1.43.0 h1:S88dyqXjJkuBNLeMcVPRFXpRw2fuwdvfCGLEo89fDkw=
+go.opentelemetry.io/otel/sdk/metric v1.43.0/go.mod h1:C/RJtwSEJ5hzTiUz5pXF1kILHStzb9zFlIEe85bhj6A=
+go.opentelemetry.io/otel/trace v1.43.0 h1:BkNrHpup+4k4w+ZZ86CZoHHEkohws8AY+WTX09nk+3A=
+go.opentelemetry.io/otel/trace v1.43.0/go.mod h1:/QJhyVBUUswCphDVxq+8mld+AvhXZLhe+8WVFxiFff0=
+go.opentelemetry.io/proto/otlp v1.10.0 h1:IQRWgT5srOCYfiWnpqUYz9CVmbO8bFmKcwYxpuCSL2g=
+go.opentelemetry.io/proto/otlp v1.10.0/go.mod h1:/CV4QoCR/S9yaPj8utp3lvQPoqMtxXdzn7ozvvozVqk=
+go.uber.org/atomic v1.3.2/go.mod h1:gD2HeocX3+yG+ygLZcrzQJaqmWj9AIm7n08wl/qW/PE=
+go.uber.org/atomic v1.4.0/go.mod h1:gD2HeocX3+yG+ygLZcrzQJaqmWj9AIm7n08wl/qW/PE=
+go.uber.org/atomic v1.5.0/go.mod h1:sABNBOSYdrvTF6hTgEIbc7YasKWGhgEQZyfxyTvoXHQ=
+go.uber.org/atomic v1.6.0/go.mod h1:sABNBOSYdrvTF6hTgEIbc7YasKWGhgEQZyfxyTvoXHQ=
+go.uber.org/atomic v1.11.0 h1:ZvwS0R+56ePWxUNi+Atn9dWONBPp/AUETXlHW0DxSjE=
+go.uber.org/atomic v1.11.0/go.mod h1:LUxbIzbOniOlMKjJjyPfpl4v+PKK2cNJn91OQbhoJI0=
+go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=
+go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE=
+go.uber.org/multierr v1.1.0/go.mod h1:wR5kodmAFQ0UK8QlbwjlSNy0Z68gJhDJUG5sjR94q/0=
+go.uber.org/multierr v1.3.0/go.mod h1:VgVr7evmIr6uPjLBxg28wmKNXyqE9akIJ5XnfpiKl+4=
+go.uber.org/multierr v1.5.0/go.mod h1:FeouvMocqHpRaaGuG9EjoKcStLC43Zu/fmqdUMPcKYU=
+go.uber.org/multierr v1.11.0 h1:blXXJkSxSSfBVBlC76pxqeO+LN3aDfLQo+309xJstO0=
+go.uber.org/multierr v1.11.0/go.mod h1:20+QtiLqy0Nd6FdQB9TLXag12DsQkrbs3htMFfDN80Y=
+go.uber.org/tools v0.0.0-20190618225709-2cfd321de3ee/go.mod h1:vJERXedbb3MVM5f9Ejo0C68/HhF8uaILCdgjnY+goOA=
+go.uber.org/zap v1.9.1/go.mod h1:vwi/ZaCAaUcBkycHslxD9B2zi4UTXhF60s6SWpuDF0Q=
+go.uber.org/zap v1.10.0/go.mod h1:vwi/ZaCAaUcBkycHslxD9B2zi4UTXhF60s6SWpuDF0Q=
+go.uber.org/zap v1.13.0/go.mod h1:zwrFLgMcdUuIBviXEYEH1YKNaOBnKXsx2IPda5bBwHM=
+golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
+golang.org/x/crypto v0.0.0-20190411191339-88737f569e3a/go.mod h1:WFFai1msRO1wXaEeE5yQxYXgSfI8pQAWXbQop6sCtWE=
+golang.org/x/crypto v0.0.0-20190510104115-cbcb75029529/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
+golang.org/x/crypto v0.0.0-20190820162420-60c769a6c586/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
+golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
+golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
+golang.org/x/crypto v0.0.0-20201203163018-be400aefbc4c/go.mod h1:jdWPYTVW3xRLrWPugEBEK3UY2ZEsg3UU495nc5E+M+I=
+golang.org/x/crypto v0.0.0-20210616213533-5ff15b29337e/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
+golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
+golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
+golang.org/x/crypto v0.19.0/go.mod h1:Iy9bg/ha4yyC70EfRS8jz+B6ybOBKMaSxLj6P6oBDfU=
+golang.org/x/crypto v0.20.0/go.mod h1:Xwo95rrVNIoSMx9wa1JroENMToLWn3RNVrTBpLHgZPQ=
+golang.org/x/crypto v0.50.0 h1:zO47/JPrL6vsNkINmLoo/PH1gcxpls50DNogFvB5ZGI=
+golang.org/x/crypto v0.50.0/go.mod h1:3muZ7vA7PBCE6xgPX7nkzzjiUq87kRItoJQM1Yo8S+Q=
+golang.org/x/lint v0.0.0-20190930215403-16217165b5de/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc=
+golang.org/x/mod v0.0.0-20190513183733-4bf6d317e70e/go.mod h1:mXi4GBBbnImb6dmsKGUJ2LatrhH/nqhxcFungHvyanc=
+golang.org/x/mod v0.1.1-0.20191105210325-c90efee705ee/go.mod h1:QqPTAvyqsEbceGzBzNggFXnrqF1CaUcvgkdR5Ot7KZg=
+golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
+golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
+golang.org/x/mod v0.35.0 h1:Ww1D637e6Pg+Zb2KrWfHQUnH2dQRLBQyAtpr/haaJeM=
+golang.org/x/mod v0.35.0/go.mod h1:+GwiRhIInF8wPm+4AoT6L0FA1QWAad3OMdTRx4tFYlU=
+golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
+golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
+golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
+golang.org/x/net v0.0.0-20190813141303-74dc4d7220e7/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
+golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
+golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
+golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
+golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
+golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44=
+golang.org/x/net v0.53.0 h1:d+qAbo5L0orcWAr0a9JweQpjXF19LMXJE8Ey7hwOdUA=
+golang.org/x/net v0.53.0/go.mod h1:JvMuJH7rrdiCfbeHoo3fCQU24Lf5JJwT9W3sJFulfgs=
+golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
+golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
+golang.org/x/sys v0.0.0-20180905080454-ebe1bf3edb33/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
+golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
+golang.org/x/sys v0.0.0-20190222072716-a9d3bda3a223/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
+golang.org/x/sys v0.0.0-20190403152447-81d4e9dc473e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190422165155-953cdadca894/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190813064441-fde4db37ae7a/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190916202348-b4ddaad3f8a3/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20191026070338-33540a1f6037/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20200116001909-b77594299b42/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20200223170610-d5e6a3e2c0ae/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20201204225414-ed752295db88/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.0.0-20210616094352-59db8d763f22/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.8.0/go.mod
h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= +golang.org/x/sys v0.43.0 h1:Rlag2XtaFTxp19wS8MXlJwTvoh8ArU6ezoyFsMyCTNI= +golang.org/x/sys v0.43.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw= +golang.org/x/term v0.0.0-20201117132131-f5c789dd3221/go.mod h1:Nr5EML6q2oocZ2LXRh80K7BxOlk5/8JxuGnuhpl+muw= +golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= +golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8= +golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k= +golang.org/x/term v0.8.0/go.mod h1:xPskH00ivmX89bAKVGSKKtLOWNx2+17Eiy94tnKShWo= +golang.org/x/term v0.17.0/go.mod h1:lLRBjIVuehSbZlaOtGMbcMncT+aqLLLmKrsjNrUguwk= +golang.org/x/term v0.42.0 h1:UiKe+zDFmJobeJ5ggPwOshJIVt6/Ft0rcfrXZDLWAWY= +golang.org/x/term v0.42.0/go.mod h1:Dq/D+snpsbazcBG5+F9Q1n2rXV8Ma+71xEjTRufARgY= +golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= +golang.org/x/text v0.3.2/go.mod h1:bEr9sfX3Q8Zfm5fL9x+3itogRgK3+ptLWKqgva+5dAk= +golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= +golang.org/x/text v0.3.4/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= +golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= +golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ= +golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8= +golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8= +golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU= +golang.org/x/text v0.36.0 h1:JfKh3XmcRPqZPKevfXVpI1wXPTqbkE5f7JA92a55Yxg= +golang.org/x/text v0.36.0/go.mod h1:NIdBknypM8iqVmPiuco0Dh6P5Jcdk8lJL0CUebqK164= +golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod 
h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= +golang.org/x/tools v0.0.0-20190311212946-11955173bddd/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs= +golang.org/x/tools v0.0.0-20190425163242-31fd60d6bfdc/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q= +golang.org/x/tools v0.0.0-20190621195816-6e04913cbbac/go.mod h1:/rFqwRUd4F7ZHNgwSSTFct+R/Kf4OFW1sUzUTQQTgfc= +golang.org/x/tools v0.0.0-20190823170909-c4a336ef6a2f/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191029041327-9cc4af7d6b2c/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191029190741-b9c20aec41a5/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= +golang.org/x/tools v0.0.0-20200103221440-774c71fcf114/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= +golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc= +golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU= +golang.org/x/xerrors v0.0.0-20190410155217-1f06c39b4373/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +golang.org/x/xerrors v0.0.0-20190513163551-3ee3066db522/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +gonum.org/v1/gonum v0.17.0 h1:VbpOemQlsSMrYmn7T2OUvQ4dqxQXU+ouZFQsZOx50z4= +gonum.org/v1/gonum v0.17.0/go.mod h1:El3tOrEuMpv2UdMrbNlKEh9vd86bmQ6vqIcDwxEOc1E= +google.golang.org/genproto/googleapis/api 
v0.0.0-20260401024825-9d38bb4040a9 h1:VPWxll4HlMw1Vs/qXtN7BvhZqsS9cdAittCNvVENElA= +google.golang.org/genproto/googleapis/api v0.0.0-20260401024825-9d38bb4040a9/go.mod h1:7QBABkRtR8z+TEnmXTqIqwJLlzrZKVfAUm7tY3yGv0M= +google.golang.org/genproto/googleapis/rpc v0.0.0-20260420184626-e10c466a9529 h1:XF8+t6QQiS0o9ArVan/HW8Q7cycNPGsJf6GA2nXxYAg= +google.golang.org/genproto/googleapis/rpc v0.0.0-20260420184626-e10c466a9529/go.mod h1:4Hqkh8ycfw05ld/3BWL7rJOSfebL2Q+DVDeRgYgxUU8= +google.golang.org/grpc v1.80.0 h1:Xr6m2WmWZLETvUNvIUmeD5OAagMw3FiKmMlTdViWsHM= +google.golang.org/grpc v1.80.0/go.mod h1:ho/dLnxwi3EDJA4Zghp7k2Ec1+c2jqup0bFkw07bwF4= +google.golang.org/protobuf v1.36.11 h1:fV6ZwhNocDyBLK0dj+fg8ektcVegBBuEolpbTQyBNVE= +google.golang.org/protobuf v1.36.11/go.mod h1:HTf+CrKn2C3g5S8VImy6tdcUvCska2kB7j23XfzDpco= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk= +gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q= +gopkg.in/errgo.v2 v2.1.0/go.mod h1:hNsd1EY+bozCKY1Ytp96fpM3vjJbqLJn88ws8XvfDNI= +gopkg.in/inconshreveable/log15.v2 v2.0.0-20180818164646-67afb5ed74ec/go.mod h1:aPpfJ7XW+gOuirDoZ8gHhLh3kZ1B08FtV2bbmy7Jv3s= +gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= +gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= +gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= +gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= +gotest.tools/v3 v3.5.2 h1:7koQfIKdy+I8UTetycgUqXWSDwpgv193Ka+qRsmBY8Q= +gotest.tools/v3 v3.5.2/go.mod h1:LtdLGcnqToBH83WByAAi/wiwSFCArdFIUV/xxN4pcjA= +honnef.co/go/tools v0.0.1-2019.2.3/go.mod 
h1:a3bituU0lyd329TUQxRnasdCoJDkEUEAqEt0JzvZhAg= +modernc.org/libc v1.72.1 h1:db1xwJ6u1kE3KHTFTTbe2GCrczHPKzlURP0aDC4NGD0= +modernc.org/libc v1.72.1/go.mod h1:HRMiC/PhPGLIPM7GzAFCbI+oSgE3dhZ8FWftmRrHVlY= +modernc.org/mathutil v1.7.1 h1:GCZVGXdaN8gTqB1Mf/usp1Y/hSqgI2vAGGP4jZMCxOU= +modernc.org/mathutil v1.7.1/go.mod h1:4p5IwJITfppl0G4sUEDtCr4DthTaT47/N3aT6MhfgJg= +modernc.org/memory v1.11.0 h1:o4QC8aMQzmcwCK3t3Ux/ZHmwFPzE6hf2Y5LbkRs+hbI= +modernc.org/memory v1.11.0/go.mod h1:/JP4VbVC+K5sU2wZi9bHoq2MAkCnrt2r98UGeSK7Mjw= +modernc.org/sqlite v1.49.1 h1:dYGHTKcX1sJ+EQDnUzvz4TJ5GbuvhNJa8Fg6ElGx73U= +modernc.org/sqlite v1.49.1/go.mod h1:m0w8xhwYUVY3H6pSDwc3gkJ/irZT/0YEXwBlhaxQEew= +pgregory.net/rapid v1.2.0 h1:keKAYRcjm+e1F0oAuU5F5+YPAWcyxNNRK2wud503Gnk= +pgregory.net/rapid v1.2.0/go.mod h1:PY5XlDGj0+V1FCq0o192FdRhpKHGTRIWBgqjDBTrq04= diff --git a/gamemaster/internal/adapters/engineclient/client.go b/gamemaster/internal/adapters/engineclient/client.go new file mode 100644 index 0000000..a5d6236 --- /dev/null +++ b/gamemaster/internal/adapters/engineclient/client.go @@ -0,0 +1,441 @@ +// Package engineclient provides the trusted-internal HTTP client Game +// Master uses to talk to the engine container. The adapter implements +// `ports.EngineClient` over the routes documented in +// `galaxy/game/openapi.yaml`: +// +// - admin paths under `/api/v1/admin/*` (init, status, turn, +// race/banish); +// - player paths under `/api/v1/{command, order, report}`. +// +// The engine endpoint URL is per-call (Game Master keeps it on +// `runtime_records.engine_endpoint`), so the client does not bind a +// base URL at construction time. Only the per-call timeouts are wired +// through `Config`: `CallTimeout` covers turn-generation-class +// operations, `ProbeTimeout` covers inspect-style reads. 
+package engineclient
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"io"
+	"math"
+	"net/http"
+	"net/url"
+	"strconv"
+	"strings"
+	"time"
+
+	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
+
+	"galaxy/gamemaster/internal/ports"
+)
+
+const (
+	pathAdminInit       = "/api/v1/admin/init"
+	pathAdminStatus     = "/api/v1/admin/status"
+	pathAdminTurn       = "/api/v1/admin/turn"
+	pathAdminRaceBanish = "/api/v1/admin/race/banish"
+	pathPlayerCommand   = "/api/v1/command"
+	pathPlayerOrder     = "/api/v1/order"
+	pathPlayerReport    = "/api/v1/report"
+)
+
+// Config configures one HTTP-backed engine client.
+type Config struct {
+	// CallTimeout bounds turn-generation-class operations: init, turn,
+	// banish, command, order. Mirrors `GAMEMASTER_ENGINE_CALL_TIMEOUT`.
+	CallTimeout time.Duration
+
+	// ProbeTimeout bounds inspect-style reads: status, report. Mirrors
+	// `GAMEMASTER_ENGINE_PROBE_TIMEOUT`.
+	ProbeTimeout time.Duration
+}
+
+// Client speaks REST/JSON to the engine container.
+type Client struct {
+	callTimeout          time.Duration
+	probeTimeout         time.Duration
+	httpClient           *http.Client
+	closeIdleConnections func()
+}
+
+// NewClient constructs an engine client with `otelhttp`-instrumented
+// transport cloned from `http.DefaultTransport`. The returned `Close`
+// hook releases idle connections owned by that transport.
+func NewClient(cfg Config) (*Client, error) {
+	transport, ok := http.DefaultTransport.(*http.Transport)
+	if !ok {
+		return nil, errors.New("new engine client: default transport is not *http.Transport")
+	}
+	cloned := transport.Clone()
+	return newClient(cfg, &http.Client{Transport: otelhttp.NewTransport(cloned)}, cloned.CloseIdleConnections)
+}
+
+func newClient(cfg Config, httpClient *http.Client, closeIdleConnections func()) (*Client, error) {
+	switch {
+	case cfg.CallTimeout <= 0:
+		return nil, errors.New("new engine client: call timeout must be positive")
+	case cfg.ProbeTimeout <= 0:
+		return nil, errors.New("new engine client: probe timeout must be positive")
+	case httpClient == nil:
+		return nil, errors.New("new engine client: http client must not be nil")
+	}
+	return &Client{
+		callTimeout:          cfg.CallTimeout,
+		probeTimeout:         cfg.ProbeTimeout,
+		httpClient:           httpClient,
+		closeIdleConnections: closeIdleConnections,
+	}, nil
+}
+
+// Close releases idle HTTP connections owned by the underlying
+// transport. Safe to call multiple times.
+func (client *Client) Close() error {
+	if client == nil || client.closeIdleConnections == nil {
+		return nil
+	}
+	client.closeIdleConnections()
+	return nil
+}
+
+// Init calls POST /api/v1/admin/init.
+func (client *Client) Init(ctx context.Context, baseURL string, request ports.InitRequest) (ports.StateResponse, error) {
+	if err := client.validateBase(baseURL); err != nil {
+		return ports.StateResponse{}, err
+	}
+	if len(request.Races) == 0 {
+		return ports.StateResponse{}, errors.New("engine init: races must not be empty")
+	}
+	body, err := encodeInitRequest(request)
+	if err != nil {
+		return ports.StateResponse{}, fmt.Errorf("engine init: encode request: %w", err)
+	}
+	payload, status, doErr := client.doRequest(ctx, http.MethodPost, baseURL+pathAdminInit, body, client.callTimeout)
+	if doErr != nil {
+		return ports.StateResponse{}, fmt.Errorf("%w: engine init: %w", ports.ErrEngineUnreachable, doErr)
+	}
+	switch status {
+	case http.StatusOK, http.StatusCreated:
+		return decodeStateResponse(payload, "engine init")
+	case http.StatusBadRequest:
+		return ports.StateResponse{}, fmt.Errorf("%w: engine init: %s", ports.ErrEngineValidation, summariseEngineError(payload, status))
+	default:
+		return ports.StateResponse{}, fmt.Errorf("%w: engine init: %s", ports.ErrEngineUnreachable, summariseEngineError(payload, status))
+	}
+}
+
+// Status calls GET /api/v1/admin/status.
+func (client *Client) Status(ctx context.Context, baseURL string) (ports.StateResponse, error) {
+	if err := client.validateBase(baseURL); err != nil {
+		return ports.StateResponse{}, err
+	}
+	payload, status, doErr := client.doRequest(ctx, http.MethodGet, baseURL+pathAdminStatus, nil, client.probeTimeout)
+	if doErr != nil {
+		return ports.StateResponse{}, fmt.Errorf("%w: engine status: %w", ports.ErrEngineUnreachable, doErr)
+	}
+	switch status {
+	case http.StatusOK:
+		return decodeStateResponse(payload, "engine status")
+	case http.StatusBadRequest:
+		return ports.StateResponse{}, fmt.Errorf("%w: engine status: %s", ports.ErrEngineValidation, summariseEngineError(payload, status))
+	default:
+		return ports.StateResponse{}, fmt.Errorf("%w: engine status: %s", ports.ErrEngineUnreachable, summariseEngineError(payload, status))
+	}
+}
+
+// Turn calls PUT /api/v1/admin/turn.
+func (client *Client) Turn(ctx context.Context, baseURL string) (ports.StateResponse, error) {
+	if err := client.validateBase(baseURL); err != nil {
+		return ports.StateResponse{}, err
+	}
+	payload, status, doErr := client.doRequest(ctx, http.MethodPut, baseURL+pathAdminTurn, nil, client.callTimeout)
+	if doErr != nil {
+		return ports.StateResponse{}, fmt.Errorf("%w: engine turn: %w", ports.ErrEngineUnreachable, doErr)
+	}
+	switch status {
+	case http.StatusOK:
+		return decodeStateResponse(payload, "engine turn")
+	case http.StatusBadRequest:
+		return ports.StateResponse{}, fmt.Errorf("%w: engine turn: %s", ports.ErrEngineValidation, summariseEngineError(payload, status))
+	default:
+		return ports.StateResponse{}, fmt.Errorf("%w: engine turn: %s", ports.ErrEngineUnreachable, summariseEngineError(payload, status))
+	}
+}
+
+// BanishRace calls POST /api/v1/admin/race/banish with body
+// `{race_name}`. Engine returns 204 on success.
+func (client *Client) BanishRace(ctx context.Context, baseURL, raceName string) error {
+	if err := client.validateBase(baseURL); err != nil {
+		return err
+	}
+	if strings.TrimSpace(raceName) == "" {
+		return errors.New("engine banish: race name must not be empty")
+	}
+	body, err := json.Marshal(banishRequestEnvelope{RaceName: raceName})
+	if err != nil {
+		return fmt.Errorf("engine banish: encode request: %w", err)
+	}
+	payload, status, doErr := client.doRequest(ctx, http.MethodPost, baseURL+pathAdminRaceBanish, body, client.callTimeout)
+	if doErr != nil {
+		return fmt.Errorf("%w: engine banish: %w", ports.ErrEngineUnreachable, doErr)
+	}
+	switch status {
+	case http.StatusNoContent, http.StatusOK:
+		return nil
+	case http.StatusBadRequest:
+		return fmt.Errorf("%w: engine banish: %s", ports.ErrEngineValidation, summariseEngineError(payload, status))
+	default:
+		return fmt.Errorf("%w: engine banish: %s", ports.ErrEngineUnreachable, summariseEngineError(payload, status))
+	}
+}
+
+// ExecuteCommands calls PUT /api/v1/command with payload forwarded
+// verbatim. The engine response body is returned verbatim; on 4xx the
+// body is returned alongside `ports.ErrEngineValidation` so callers can
+// forward the per-command errors.
+func (client *Client) ExecuteCommands(ctx context.Context, baseURL string, payload json.RawMessage) (json.RawMessage, error) {
+	return client.forwardPlayerWrite(ctx, baseURL, pathPlayerCommand, payload, "engine command")
+}
+
+// PutOrders calls PUT /api/v1/order with the same forwarding semantics
+// as ExecuteCommands.
+func (client *Client) PutOrders(ctx context.Context, baseURL string, payload json.RawMessage) (json.RawMessage, error) {
+	return client.forwardPlayerWrite(ctx, baseURL, pathPlayerOrder, payload, "engine order")
+}
+
+// GetReport calls GET /api/v1/report?player=&turn= and
+// returns the engine response body verbatim.
+func (client *Client) GetReport(ctx context.Context, baseURL, raceName string, turn int) (json.RawMessage, error) {
+	if err := client.validateBase(baseURL); err != nil {
+		return nil, err
+	}
+	if strings.TrimSpace(raceName) == "" {
+		return nil, errors.New("engine report: race name must not be empty")
+	}
+	if turn < 0 {
+		return nil, fmt.Errorf("engine report: turn must not be negative, got %d", turn)
+	}
+	values := url.Values{}
+	values.Set("player", raceName)
+	values.Set("turn", strconv.Itoa(turn))
+	target := baseURL + pathPlayerReport + "?" + values.Encode()
+	body, status, doErr := client.doRequest(ctx, http.MethodGet, target, nil, client.probeTimeout)
+	if doErr != nil {
+		return nil, fmt.Errorf("%w: engine report: %w", ports.ErrEngineUnreachable, doErr)
+	}
+	switch status {
+	case http.StatusOK:
+		if len(body) == 0 {
+			return nil, fmt.Errorf("%w: engine report: empty response body", ports.ErrEngineProtocolViolation)
+		}
+		return json.RawMessage(body), nil
+	case http.StatusBadRequest:
+		return json.RawMessage(body), fmt.Errorf("%w: engine report: %s", ports.ErrEngineValidation, summariseEngineError(body, status))
+	default:
+		return nil, fmt.Errorf("%w: engine report: %s", ports.ErrEngineUnreachable, summariseEngineError(body, status))
+	}
+}
+
+func (client *Client) forwardPlayerWrite(ctx context.Context, baseURL, requestPath string, payload json.RawMessage, opLabel string) (json.RawMessage, error) {
+	if err := client.validateBase(baseURL); err != nil {
+		return nil, err
+	}
+	if len(bytes.TrimSpace(payload)) == 0 {
+		return nil, fmt.Errorf("%s: payload must not be empty", opLabel)
+	}
+	body, status, doErr := client.doRequest(ctx, http.MethodPut, baseURL+requestPath, []byte(payload), client.callTimeout)
+	if doErr != nil {
+		return nil, fmt.Errorf("%w: %s: %w", ports.ErrEngineUnreachable, opLabel, doErr)
+	}
+	switch status {
+	case http.StatusNoContent, http.StatusOK:
+		if len(body) == 0 {
+			return nil, nil
+		}
+		return json.RawMessage(body), nil
+	case http.StatusBadRequest:
+		return json.RawMessage(body), fmt.Errorf("%w: %s: %s", ports.ErrEngineValidation, opLabel, summariseEngineError(body, status))
+	default:
+		return nil, fmt.Errorf("%w: %s: %s", ports.ErrEngineUnreachable, opLabel, summariseEngineError(body, status))
+	}
+}
+
+// validateBase rejects nil clients and malformed engine endpoints
+// up-front so transport-layer plumbing does not need to handle them;
+// nil/cancelled contexts are checked separately in doRequest.
+func (client *Client) validateBase(baseURL string) error {
+	if client == nil || client.httpClient == nil {
+		return errors.New("engine client: nil client")
+	}
+	if strings.TrimSpace(baseURL) == "" {
+		return errors.New("engine client: base url must not be empty")
+	}
+	parsed, err := url.Parse(baseURL)
+	if err != nil {
+		return fmt.Errorf("engine client: parse base url: %w", err)
+	}
+	if parsed.Scheme == "" || parsed.Host == "" {
+		return fmt.Errorf("engine client: base url %q must be absolute", baseURL)
+	}
+	return nil
+}
+
+func (client *Client) doRequest(ctx context.Context, method, target string, body []byte, timeout time.Duration) ([]byte, int, error) {
+	if ctx == nil {
+		return nil, 0, errors.New("nil context")
+	}
+	if err := ctx.Err(); err != nil {
+		return nil, 0, err
+	}
+	attemptCtx, cancel := context.WithTimeout(ctx, timeout)
+	defer cancel()
+
+	var reader io.Reader
+	if len(body) > 0 {
+		reader = bytes.NewReader(body)
+	}
+	req, err := http.NewRequestWithContext(attemptCtx, method, target, reader)
+	if err != nil {
+		return nil, 0, fmt.Errorf("build request: %w", err)
+	}
+	req.Header.Set("Accept", "application/json")
+	if len(body) > 0 {
+		req.Header.Set("Content-Type", "application/json")
+	}
+	resp, err := client.httpClient.Do(req)
+	if err != nil {
+		return nil, 0, err
+	}
+	defer resp.Body.Close()
+	respBody, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, resp.StatusCode, fmt.Errorf("read response body: %w", err)
+	}
+	return respBody, resp.StatusCode, nil
+}
+
+// encodeInitRequest serialises ports.InitRequest into the engine spec
+// shape (`InitRequest`/`InitRace`).
+func encodeInitRequest(request ports.InitRequest) ([]byte, error) {
+	envelope := initRequestEnvelope{Races: make([]initRaceEnvelope, 0, len(request.Races))}
+	for _, race := range request.Races {
+		if strings.TrimSpace(race.RaceName) == "" {
+			return nil, errors.New("init race: race name must not be empty")
+		}
+		envelope.Races = append(envelope.Races, initRaceEnvelope{RaceName: race.RaceName})
+	}
+	return json.Marshal(envelope)
+}
+
+// decodeStateResponse decodes the engine StateResponse payload into the
+// port-level StateResponse projection. Unknown fields are tolerated;
+// missing required ones surface as ErrEngineProtocolViolation.
+func decodeStateResponse(payload []byte, opLabel string) (ports.StateResponse, error) {
+	if len(payload) == 0 {
+		return ports.StateResponse{}, fmt.Errorf("%w: %s: empty response body", ports.ErrEngineProtocolViolation, opLabel)
+	}
+	var envelope stateResponseEnvelope
+	decoder := json.NewDecoder(bytes.NewReader(payload))
+	if err := decoder.Decode(&envelope); err != nil {
+		return ports.StateResponse{}, fmt.Errorf("%w: %s: decode body: %w", ports.ErrEngineProtocolViolation, opLabel, err)
+	}
+	if strings.TrimSpace(envelope.ID) == "" {
+		return ports.StateResponse{}, fmt.Errorf("%w: %s: missing id", ports.ErrEngineProtocolViolation, opLabel)
+	}
+	if envelope.Player == nil {
+		return ports.StateResponse{}, fmt.Errorf("%w: %s: missing player array", ports.ErrEngineProtocolViolation, opLabel)
+	}
+	state := ports.StateResponse{
+		Turn:     envelope.Turn,
+		Finished: envelope.Finished,
+		Players:  make([]ports.PlayerState, 0, len(envelope.Player)),
+	}
+	for index, player := range envelope.Player {
+		if strings.TrimSpace(player.RaceName) == "" {
+			return ports.StateResponse{}, fmt.Errorf("%w: %s: player[%d] missing raceName", ports.ErrEngineProtocolViolation, opLabel, index)
+		}
+		if strings.TrimSpace(player.ID) == "" {
+			return ports.StateResponse{}, fmt.Errorf("%w: %s: player[%d] missing id", ports.ErrEngineProtocolViolation, opLabel, index)
+		}
+		if player.Planets < 0 {
+			return ports.StateResponse{}, fmt.Errorf("%w: %s: player[%d] negative planets", ports.ErrEngineProtocolViolation, opLabel, index)
+		}
+		if math.IsNaN(player.Population) || math.IsInf(player.Population, 0) || player.Population < 0 {
+			return ports.StateResponse{}, fmt.Errorf("%w: %s: player[%d] invalid population", ports.ErrEngineProtocolViolation, opLabel, index)
+		}
+		state.Players = append(state.Players, ports.PlayerState{
+			RaceName:         player.RaceName,
+			EnginePlayerUUID: player.ID,
+			Planets:          player.Planets,
+			Population:       int(math.Round(player.Population)),
+		})
+	}
+	return state, nil
+}
+
+// summariseEngineError extracts a short, human-readable summary from
+// the engine's validation/internal-error envelopes for the wrapped
+// error message.
+func summariseEngineError(payload []byte, status int) string {
+	trimmed := bytes.TrimSpace(payload)
+	if len(trimmed) == 0 {
+		return fmt.Sprintf("status=%d", status)
+	}
+	var envelope engineErrorEnvelope
+	if err := json.Unmarshal(trimmed, &envelope); err == nil {
+		switch {
+		case envelope.GenericError != "":
+			return fmt.Sprintf("status=%d generic_error=%q code=%d", status, envelope.GenericError, envelope.Code)
+		case envelope.Error != "":
+			return fmt.Sprintf("status=%d error=%q", status, envelope.Error)
+		}
+	}
+	return fmt.Sprintf("status=%d", status)
+}
+
+// stateResponseEnvelope mirrors `StateResponse` from
+// `game/openapi.yaml`. Unknown fields are tolerated by encoding/json.
+type stateResponseEnvelope struct {
+	ID       string                `json:"id"`
+	Turn     int                   `json:"turn"`
+	Stage    int                   `json:"stage"`
+	Player   []playerStateEnvelope `json:"player"`
+	Finished bool                  `json:"finished"`
+}
+
+// playerStateEnvelope mirrors `PlayerState`. Population is `number`
+// per the engine spec, so the adapter decodes into float64 and rounds
+// to the port-level int (engine in practice always returns whole
+// numbers; rounding is a defensive guard against floating-point
+// noise).
+type playerStateEnvelope struct {
+	ID         string  `json:"id"`
+	RaceName   string  `json:"raceName"`
+	Planets    int     `json:"planets"`
+	Population float64 `json:"population"`
+	Extinct    bool    `json:"extinct"`
+}
+
+type initRequestEnvelope struct {
+	Races []initRaceEnvelope `json:"races"`
+}
+
+type initRaceEnvelope struct {
+	RaceName string `json:"raceName"`
+}
+
+type banishRequestEnvelope struct {
+	RaceName string `json:"race_name"`
+}
+
+type engineErrorEnvelope struct {
+	Error        string `json:"error"`
+	GenericError string `json:"generic_error"`
+	Code         int    `json:"code"`
+}
+
+// Compile-time assertion: Client implements ports.EngineClient.
+var _ ports.EngineClient = (*Client)(nil)
diff --git a/gamemaster/internal/adapters/engineclient/client_test.go b/gamemaster/internal/adapters/engineclient/client_test.go
new file mode 100644
index 0000000..4f4b10e
--- /dev/null
+++ b/gamemaster/internal/adapters/engineclient/client_test.go
@@ -0,0 +1,363 @@
+package engineclient
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"io"
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+
+	"galaxy/gamemaster/internal/ports"
+)
+
+func newTestClient(t *testing.T, callTimeout, probeTimeout time.Duration) *Client {
+	t.Helper()
+	client, err := NewClient(Config{CallTimeout: callTimeout, ProbeTimeout: probeTimeout})
+	require.NoError(t, err)
+	t.Cleanup(func() { _ = client.Close() })
+	return client
+}
+
+func TestNewClientValidatesConfig(t *testing.T) {
+	cases := map[string]Config{
+		"non-positive call timeout":  {CallTimeout: 0, ProbeTimeout: time.Second},
+		"non-positive probe timeout": {CallTimeout: time.Second, ProbeTimeout: 0},
+	}
+	for name, cfg := range cases {
+		t.Run(name, func(t *testing.T) {
+			_, err := NewClient(cfg)
+			require.Error(t, err)
+		})
+	}
+}
+
+func TestInitHappyPath(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		require.Equal(t, http.MethodPost, r.Method)
+		require.Equal(t, "/api/v1/admin/init", r.URL.Path)
+		require.Equal(t, "application/json", r.Header.Get("Content-Type"))
+
+		body, err := io.ReadAll(r.Body)
+		require.NoError(t, err)
+		var got initRequestEnvelope
+		require.NoError(t, json.Unmarshal(body, &got))
+		require.Equal(t, []initRaceEnvelope{{RaceName: "Human"}, {RaceName: "Klingon"}}, got.Races)
+
+		w.Header().Set("Content-Type", "application/json")
+		w.WriteHeader(http.StatusCreated)
+		_, _ = w.Write([]byte(`{
+			"id": "00000000-0000-0000-0000-000000000001",
+			"turn": 0,
+			"stage": 0,
+			"finished": false,
+			"player": [
+				{"id":"00000000-0000-0000-0000-000000000010","raceName":"Human","planets":3,"population":1500,"extinct":false},
+				{"id":"00000000-0000-0000-0000-000000000011","raceName":"Klingon","planets":3,"population":1500,"extinct":false}
+			]
+		}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, time.Second, time.Second)
+	state, err := client.Init(context.Background(), server.URL, ports.InitRequest{
+		Races: []ports.InitRace{{RaceName: "Human"}, {RaceName: "Klingon"}},
+	})
+	require.NoError(t, err)
+	assert.Equal(t, 0, state.Turn)
+	assert.False(t, state.Finished)
+	require.Len(t, state.Players, 2)
+	assert.Equal(t, "Human", state.Players[0].RaceName)
+	assert.Equal(t, "00000000-0000-0000-0000-000000000010", state.Players[0].EnginePlayerUUID)
+	assert.Equal(t, 3, state.Players[0].Planets)
+	assert.Equal(t, 1500, state.Players[0].Population)
+}
+
+func TestInitRejectsEmptyRaces(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		// t.Fatal must not be called from the handler goroutine
+		// (testing.T.FailNow is test-goroutine only), so record the
+		// failure with t.Error instead.
+		t.Error("must not contact engine on empty races")
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, time.Second, time.Second)
+	_, err := client.Init(context.Background(), server.URL, ports.InitRequest{})
+	require.Error(t, err)
+}
+
+func TestInitValidationErrorMapsToEngineValidation(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		w.Header().Set("Content-Type", "application/json")
+		w.WriteHeader(http.StatusBadRequest)
+		_, _ = w.Write([]byte(`{"error":"races must contain at least 10 entries"}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, time.Second, time.Second)
+	_, err := client.Init(context.Background(), server.URL, ports.InitRequest{
+		Races: []ports.InitRace{{RaceName: "X"}},
+	})
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, ports.ErrEngineValidation))
+	assert.Contains(t, err.Error(), "must contain at least 10")
+}
+
+func TestInitInternalErrorMapsToUnreachable(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		w.Header().Set("Content-Type", "application/json")
+		w.WriteHeader(http.StatusInternalServerError)
+		_, _ = w.Write([]byte(`{"generic_error":"boom","code":42}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, time.Second, time.Second)
+	_, err := client.Init(context.Background(), server.URL, ports.InitRequest{Races: []ports.InitRace{{RaceName: "X"}}})
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, ports.ErrEngineUnreachable))
+	assert.Contains(t, err.Error(), "code=42")
+}
+
+func TestStatusHappyPath(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		require.Equal(t, http.MethodGet, r.Method)
+		require.Equal(t, "/api/v1/admin/status", r.URL.Path)
+		_, _ = w.Write([]byte(`{
+			"id": "g-1",
+			"turn": 5,
+			"stage": 0,
+			"finished": false,
+			"player": [
+				{"id":"p-1","raceName":"Human","planets":4,"population":1700.0,"extinct":false}
+			]
+		}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, time.Second, time.Second)
+	state, err := client.Status(context.Background(), server.URL)
+	require.NoError(t, err)
+	assert.Equal(t, 5, state.Turn)
+	require.Len(t, state.Players, 1)
+	assert.Equal(t, "Human", state.Players[0].RaceName)
+	assert.Equal(t, 1700, state.Players[0].Population)
+}
+
+func TestStatusUsesProbeTimeout(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		time.Sleep(120 * time.Millisecond)
+		_, _ = w.Write([]byte(`{}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, time.Second, 30*time.Millisecond)
+	_, err := client.Status(context.Background(), server.URL)
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, ports.ErrEngineUnreachable))
+}
+
+func TestTurnFinishedFlagPropagates(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		require.Equal(t, http.MethodPut, r.Method)
+		require.Equal(t, "/api/v1/admin/turn", r.URL.Path)
+		_, _ = w.Write([]byte(`{
+			"id":"g","turn":42,"stage":0,"finished":true,
+			"player":[{"id":"p1","raceName":"Human","planets":0,"population":0,"extinct":true}]
+		}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, time.Second, time.Second)
+	state, err := client.Turn(context.Background(), server.URL)
+	require.NoError(t, err)
+	assert.Equal(t, 42, state.Turn)
+	assert.True(t, state.Finished)
+}
+
+func TestDecodeProtocolViolations(t *testing.T) {
+	cases := map[string]string{
+		"missing id":          `{"turn":0,"stage":0,"finished":false,"player":[]}`,
+		"missing player":      `{"id":"g","turn":0,"stage":0,"finished":false}`,
+		"missing race name":   `{"id":"g","turn":0,"stage":0,"finished":false,"player":[{"id":"p","planets":0,"population":0,"extinct":false}]}`,
+		"missing player id":   `{"id":"g","turn":0,"stage":0,"finished":false,"player":[{"raceName":"X","planets":0,"population":0,"extinct":false}]}`,
+		"negative planets":    `{"id":"g","turn":0,"stage":0,"finished":false,"player":[{"id":"p","raceName":"X","planets":-1,"population":0,"extinct":false}]}`,
+		"infinite population": `{"id":"g","turn":0,"stage":0,"finished":false,"player":[{"id":"p","raceName":"X","planets":1,"population":1e400,"extinct":false}]}`,
+	}
+	for name, body := range cases {
+		t.Run(name, func(t *testing.T) {
+			server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+				_, _ = w.Write([]byte(body))
+			}))
+			defer server.Close()
+			client := newTestClient(t, time.Second, time.Second)
+			_, err := client.Status(context.Background(), server.URL)
+			require.Error(t, err)
+			assert.True(t, errors.Is(err, ports.ErrEngineProtocolViolation), "case %q: %v", name, err)
+		})
+	}
+}
+
+func TestBanishRaceHappyPath(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		require.Equal(t, http.MethodPost, r.Method)
+		require.Equal(t, "/api/v1/admin/race/banish", r.URL.Path)
+		var got banishRequestEnvelope
+		require.NoError(t, json.NewDecoder(r.Body).Decode(&got))
+		assert.Equal(t, "Klingon", got.RaceName)
+		w.WriteHeader(http.StatusNoContent)
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, time.Second, time.Second)
+	require.NoError(t, client.BanishRace(context.Background(), server.URL, "Klingon"))
+}
+
+func TestBanishRaceRejectsBlankName(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+		// t.Error rather than t.Fatal: FailNow must not be called
+		// from the handler goroutine.
+		t.Error("must not contact engine on blank race name")
+	}))
+	defer server.Close()
+	client := newTestClient(t, time.Second, time.Second)
+	require.Error(t, client.BanishRace(context.Background(), server.URL, " "))
+}
+
+func TestBanishRaceValidationError(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+		w.WriteHeader(http.StatusBadRequest)
+		_, _ = w.Write([]byte(`{"error":"unknown race"}`))
+	}))
+	defer server.Close()
+
client := newTestClient(t, time.Second, time.Second) + err := client.BanishRace(context.Background(), server.URL, "Vulcan") + require.Error(t, err) + assert.True(t, errors.Is(err, ports.ErrEngineValidation)) +} + +func TestExecuteCommandsHappyPath(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + require.Equal(t, http.MethodPut, r.Method) + require.Equal(t, "/api/v1/command", r.URL.Path) + body, _ := io.ReadAll(r.Body) + assert.JSONEq(t, `{"actor":"Human","cmd":[]}`, string(body)) + w.WriteHeader(http.StatusNoContent) + })) + defer server.Close() + + client := newTestClient(t, time.Second, time.Second) + body, err := client.ExecuteCommands(context.Background(), server.URL, json.RawMessage(`{"actor":"Human","cmd":[]}`)) + require.NoError(t, err) + assert.Nil(t, body) +} + +func TestExecuteCommandsValidationReturnsBody(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusBadRequest) + _, _ = w.Write([]byte(`{"error":"bad command"}`)) + })) + defer server.Close() + + client := newTestClient(t, time.Second, time.Second) + body, err := client.ExecuteCommands(context.Background(), server.URL, json.RawMessage(`{"actor":"Human","cmd":[{}]}`)) + require.Error(t, err) + assert.True(t, errors.Is(err, ports.ErrEngineValidation)) + assert.JSONEq(t, `{"error":"bad command"}`, string(body)) +} + +func TestExecuteCommandsRejectsEmptyPayload(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + t.Fatal("must not contact engine with empty payload") + })) + defer server.Close() + client := newTestClient(t, time.Second, time.Second) + _, err := client.ExecuteCommands(context.Background(), server.URL, json.RawMessage(` `)) + require.Error(t, err) +} + +func TestPutOrdersHappyPath(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r 
*http.Request) { + require.Equal(t, http.MethodPut, r.Method) + require.Equal(t, "/api/v1/order", r.URL.Path) + w.WriteHeader(http.StatusNoContent) + })) + defer server.Close() + client := newTestClient(t, time.Second, time.Second) + body, err := client.PutOrders(context.Background(), server.URL, json.RawMessage(`{"actor":"Human","cmd":[]}`)) + require.NoError(t, err) + assert.Nil(t, body) +} + +func TestGetReportHappyPath(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + require.Equal(t, http.MethodGet, r.Method) + require.Equal(t, "/api/v1/report", r.URL.Path) + assert.Equal(t, "Human", r.URL.Query().Get("player")) + assert.Equal(t, "7", r.URL.Query().Get("turn")) + _, _ = w.Write([]byte(`{"version":"1","turn":7,"race":"Human"}`)) + })) + defer server.Close() + + client := newTestClient(t, time.Second, time.Second) + body, err := client.GetReport(context.Background(), server.URL, "Human", 7) + require.NoError(t, err) + assert.JSONEq(t, `{"version":"1","turn":7,"race":"Human"}`, string(body)) +} + +func TestGetReportEmptyBodyIsProtocolViolation(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusOK) + })) + defer server.Close() + client := newTestClient(t, time.Second, time.Second) + _, err := client.GetReport(context.Background(), server.URL, "Human", 0) + require.Error(t, err) + assert.True(t, errors.Is(err, ports.ErrEngineProtocolViolation)) +} + +func TestGetReportRejectsBadInput(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + t.Fatal("must not contact engine on bad input") + })) + defer server.Close() + client := newTestClient(t, time.Second, time.Second) + _, err := client.GetReport(context.Background(), server.URL, " ", 0) + require.Error(t, err) + _, err = client.GetReport(context.Background(), server.URL, "Human", -1) + require.Error(t, err) +} 
+ +func TestValidateBaseRejectsBadURLs(t *testing.T) { + client := newTestClient(t, time.Second, time.Second) + _, err := client.Status(context.Background(), "") + require.Error(t, err) + _, err = client.Status(context.Background(), "engine:8080") + require.Error(t, err) + require.Contains(t, err.Error(), "absolute") +} + +func TestCancelledContextSurfaces(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + t.Fatal("must not contact engine with cancelled context") + })) + defer server.Close() + client := newTestClient(t, time.Second, time.Second) + ctx, cancel := context.WithCancel(context.Background()) + cancel() + _, err := client.Status(ctx, server.URL) + require.Error(t, err) + assert.True(t, errors.Is(err, context.Canceled)) +} + +func TestSummariseEngineErrorFallback(t *testing.T) { + got := summariseEngineError([]byte("not json"), 502) + assert.True(t, strings.Contains(got, "status=502")) +} + +func TestCloseIsIdempotent(t *testing.T) { + client := newTestClient(t, time.Second, time.Second) + require.NoError(t, client.Close()) + require.NoError(t, client.Close()) +} diff --git a/gamemaster/internal/adapters/lobbyclient/client.go b/gamemaster/internal/adapters/lobbyclient/client.go new file mode 100644 index 0000000..e4d0776 --- /dev/null +++ b/gamemaster/internal/adapters/lobbyclient/client.go @@ -0,0 +1,343 @@ +// Package lobbyclient provides the trusted-internal Lobby REST client +// Game Master uses to fetch membership lists for the in-process +// authorization cache and to resolve the human-readable `game_name` +// consumed by notification intents. +// +// Two endpoints are mounted today: +// +// - `GET /api/v1/internal/games/{game_id}/memberships` — pagination is +// handled internally so callers always receive every membership of +// the game; +// - `GET /api/v1/internal/games/{game_id}` — single read used by the +// turn-generation orchestrator to resolve `game_name` per +// notification. 
+package lobbyclient
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"io"
+	"net/http"
+	"net/url"
+	"strconv"
+	"strings"
+	"time"
+
+	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
+
+	"galaxy/gamemaster/internal/ports"
+)
+
+const (
+	membershipsPathTemplate = "/api/v1/internal/games/%s/memberships"
+
+	gameRecordPathTemplate = "/api/v1/internal/games/%s"
+
+	// pageSize is the per-call page size; matches the Lobby spec
+	// maximum (200) so we walk fewer pages on large rosters.
+	pageSize = 200
+
+	// maxPages caps the page walk to defend against an upstream that
+	// keeps returning a `next_page_token` indefinitely. 64 pages of
+	// 200 items each cover 12_800 memberships per game — orders of
+	// magnitude beyond any realistic Galaxy roster.
+	maxPages = 64
+)
+
+// Config configures one HTTP-backed Lobby internal client.
+type Config struct {
+	// BaseURL stores the absolute base URL of the Lobby internal HTTP
+	// listener (e.g. `http://lobby:8095`).
+	BaseURL string
+
+	// RequestTimeout bounds one outbound page request. Each page fetch
+	// gets its own timeout, so the total wall-clock for
+	// `GetMemberships` is at most `RequestTimeout * maxPages`.
+	RequestTimeout time.Duration
+}
+
+// Client resolves Lobby memberships through the trusted internal HTTP
+// API.
+type Client struct { + baseURL string + requestTimeout time.Duration + httpClient *http.Client + closeIdleConnections func() +} + +type membershipListEnvelope struct { + Items []membershipRecordEnvelope `json:"items"` + NextPageToken string `json:"next_page_token"` +} + +type membershipRecordEnvelope struct { + MembershipID string `json:"membership_id"` + GameID string `json:"game_id"` + UserID string `json:"user_id"` + RaceName string `json:"race_name"` + Status string `json:"status"` + JoinedAt int64 `json:"joined_at"` + RemovedAt *int64 `json:"removed_at,omitempty"` +} + +// gameRecordEnvelope captures the fields GM consumes from Lobby's +// `GameRecord` schema. Lobby may carry additional fields; the JSON +// decoder ignores them. +type gameRecordEnvelope struct { + GameID string `json:"game_id"` + GameName string `json:"game_name"` + Status string `json:"status"` +} + +type errorEnvelope struct { + Error *errorBody `json:"error"` +} + +type errorBody struct { + Code string `json:"code"` + Message string `json:"message"` +} + +// NewClient constructs a Lobby internal client with otelhttp-wrapped +// transport cloned from `http.DefaultTransport`. Call `Close` to +// release idle connections at shutdown. 
+func NewClient(cfg Config) (*Client, error) { + transport, ok := http.DefaultTransport.(*http.Transport) + if !ok { + return nil, errors.New("new lobby client: default transport is not *http.Transport") + } + cloned := transport.Clone() + return newClient(cfg, &http.Client{Transport: otelhttp.NewTransport(cloned)}, cloned.CloseIdleConnections) +} + +func newClient(cfg Config, httpClient *http.Client, closeIdleConnections func()) (*Client, error) { + switch { + case strings.TrimSpace(cfg.BaseURL) == "": + return nil, errors.New("new lobby client: base url must not be empty") + case cfg.RequestTimeout <= 0: + return nil, errors.New("new lobby client: request timeout must be positive") + case httpClient == nil: + return nil, errors.New("new lobby client: http client must not be nil") + } + parsed, err := url.Parse(strings.TrimRight(strings.TrimSpace(cfg.BaseURL), "/")) + if err != nil { + return nil, fmt.Errorf("new lobby client: parse base url: %w", err) + } + if parsed.Scheme == "" || parsed.Host == "" { + return nil, errors.New("new lobby client: base url must be absolute") + } + return &Client{ + baseURL: parsed.String(), + requestTimeout: cfg.RequestTimeout, + httpClient: httpClient, + closeIdleConnections: closeIdleConnections, + }, nil +} + +// Close releases idle HTTP connections owned by the underlying +// transport. Safe to call multiple times. +func (client *Client) Close() error { + if client == nil || client.closeIdleConnections == nil { + return nil + } + client.closeIdleConnections() + return nil +} + +// GetMemberships returns every membership of gameID, walking the +// pagination chain transparently. Transport faults, non-2xx responses, +// malformed payloads, and pagination overflow all surface as +// `ports.ErrLobbyUnavailable` so callers can branch with `errors.Is`. 
+func (client *Client) GetMemberships(ctx context.Context, gameID string) ([]ports.Membership, error) { + if client == nil || client.httpClient == nil { + return nil, errors.New("lobby get memberships: nil client") + } + if ctx == nil { + return nil, errors.New("lobby get memberships: nil context") + } + if err := ctx.Err(); err != nil { + return nil, err + } + if strings.TrimSpace(gameID) == "" { + return nil, errors.New("lobby get memberships: game id must not be empty") + } + + var memberships []ports.Membership + pathPrefix := fmt.Sprintf(membershipsPathTemplate, url.PathEscape(gameID)) + pageToken := "" + for range maxPages { + payload, statusCode, err := client.doRequest(ctx, http.MethodGet, buildPagedQuery(pathPrefix, pageToken)) + if err != nil { + return nil, fmt.Errorf("%w: %w", ports.ErrLobbyUnavailable, err) + } + if statusCode != http.StatusOK { + errorCode := decodeErrorCode(payload) + if errorCode != "" { + return nil, fmt.Errorf("%w: unexpected status %d (error_code=%s)", ports.ErrLobbyUnavailable, statusCode, errorCode) + } + return nil, fmt.Errorf("%w: unexpected status %d", ports.ErrLobbyUnavailable, statusCode) + } + var envelope membershipListEnvelope + if err := decodeJSONPayload(payload, &envelope); err != nil { + return nil, fmt.Errorf("%w: decode response: %w", ports.ErrLobbyUnavailable, err) + } + for index, item := range envelope.Items { + converted, err := toMembership(item) + if err != nil { + return nil, fmt.Errorf("%w: items[%d]: %w", ports.ErrLobbyUnavailable, index, err) + } + memberships = append(memberships, converted) + } + if strings.TrimSpace(envelope.NextPageToken) == "" { + return memberships, nil + } + pageToken = envelope.NextPageToken + } + return nil, fmt.Errorf("%w: pagination overflow after %d pages", ports.ErrLobbyUnavailable, maxPages) +} + +// GetGameSummary returns the narrow projection of Lobby's GameRecord +// (game id, game name, lifecycle status) for gameID. 
Transport faults, +// non-2xx responses, malformed payloads, and missing required fields +// surface as `ports.ErrLobbyUnavailable` so callers can branch with +// `errors.Is`. +func (client *Client) GetGameSummary(ctx context.Context, gameID string) (ports.GameSummary, error) { + if client == nil || client.httpClient == nil { + return ports.GameSummary{}, errors.New("lobby get game summary: nil client") + } + if ctx == nil { + return ports.GameSummary{}, errors.New("lobby get game summary: nil context") + } + if err := ctx.Err(); err != nil { + return ports.GameSummary{}, err + } + if strings.TrimSpace(gameID) == "" { + return ports.GameSummary{}, errors.New("lobby get game summary: game id must not be empty") + } + + requestPath := fmt.Sprintf(gameRecordPathTemplate, url.PathEscape(gameID)) + payload, statusCode, err := client.doRequest(ctx, http.MethodGet, requestPath) + if err != nil { + return ports.GameSummary{}, fmt.Errorf("%w: %w", ports.ErrLobbyUnavailable, err) + } + if statusCode != http.StatusOK { + errorCode := decodeErrorCode(payload) + if errorCode != "" { + return ports.GameSummary{}, fmt.Errorf( + "%w: unexpected status %d (error_code=%s)", + ports.ErrLobbyUnavailable, statusCode, errorCode, + ) + } + return ports.GameSummary{}, fmt.Errorf( + "%w: unexpected status %d", ports.ErrLobbyUnavailable, statusCode, + ) + } + var envelope gameRecordEnvelope + if err := decodeJSONPayload(payload, &envelope); err != nil { + return ports.GameSummary{}, fmt.Errorf("%w: decode response: %w", ports.ErrLobbyUnavailable, err) + } + if strings.TrimSpace(envelope.GameID) == "" { + return ports.GameSummary{}, fmt.Errorf("%w: missing game_id", ports.ErrLobbyUnavailable) + } + if strings.TrimSpace(envelope.GameName) == "" { + return ports.GameSummary{}, fmt.Errorf("%w: missing game_name", ports.ErrLobbyUnavailable) + } + if strings.TrimSpace(envelope.Status) == "" { + return ports.GameSummary{}, fmt.Errorf("%w: missing status", ports.ErrLobbyUnavailable) + } + return 
ports.GameSummary{ + GameID: envelope.GameID, + GameName: envelope.GameName, + Status: envelope.Status, + }, nil +} + +func buildPagedQuery(path, pageToken string) string { + params := url.Values{} + params.Set("page_size", strconv.Itoa(pageSize)) + if pageToken != "" { + params.Set("page_token", pageToken) + } + return path + "?" + params.Encode() +} + +func toMembership(record membershipRecordEnvelope) (ports.Membership, error) { + if strings.TrimSpace(record.UserID) == "" { + return ports.Membership{}, errors.New("missing user_id") + } + if strings.TrimSpace(record.RaceName) == "" { + return ports.Membership{}, errors.New("missing race_name") + } + if strings.TrimSpace(record.Status) == "" { + return ports.Membership{}, errors.New("missing status") + } + membership := ports.Membership{ + UserID: record.UserID, + RaceName: record.RaceName, + Status: record.Status, + JoinedAt: time.UnixMilli(record.JoinedAt).UTC(), + } + if record.RemovedAt != nil { + removedAt := time.UnixMilli(*record.RemovedAt).UTC() + membership.RemovedAt = &removedAt + } + return membership, nil +} + +func (client *Client) doRequest(ctx context.Context, method, requestPath string) ([]byte, int, error) { + attemptCtx, cancel := context.WithTimeout(ctx, client.requestTimeout) + defer cancel() + + req, err := http.NewRequestWithContext(attemptCtx, method, client.baseURL+requestPath, nil) + if err != nil { + return nil, 0, fmt.Errorf("build request: %w", err) + } + req.Header.Set("Accept", "application/json") + + resp, err := client.httpClient.Do(req) + if err != nil { + return nil, 0, err + } + defer resp.Body.Close() + + body, err := io.ReadAll(resp.Body) + if err != nil { + return nil, 0, fmt.Errorf("read response body: %w", err) + } + return body, resp.StatusCode, nil +} + +func decodeJSONPayload(payload []byte, target any) error { + decoder := json.NewDecoder(bytes.NewReader(payload)) + if err := decoder.Decode(target); err != nil { + return err + } + if err := decoder.Decode(&struct{}{}); 
err != io.EOF { + if err == nil { + return errors.New("unexpected trailing JSON input") + } + return err + } + return nil +} + +func decodeErrorCode(payload []byte) string { + if len(payload) == 0 { + return "" + } + var envelope errorEnvelope + if err := json.Unmarshal(payload, &envelope); err != nil { + return "" + } + if envelope.Error == nil { + return "" + } + return envelope.Error.Code +} + +// Compile-time assertion: Client implements ports.LobbyClient. +var _ ports.LobbyClient = (*Client)(nil) diff --git a/gamemaster/internal/adapters/lobbyclient/client_test.go b/gamemaster/internal/adapters/lobbyclient/client_test.go new file mode 100644 index 0000000..104545c --- /dev/null +++ b/gamemaster/internal/adapters/lobbyclient/client_test.go @@ -0,0 +1,344 @@ +package lobbyclient + +import ( + "context" + "errors" + "net/http" + "net/http/httptest" + "strconv" + "sync/atomic" + "testing" + "time" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + + "galaxy/gamemaster/internal/ports" +) + +func newTestClient(t *testing.T, baseURL string, timeout time.Duration) *Client { + t.Helper() + client, err := NewClient(Config{BaseURL: baseURL, RequestTimeout: timeout}) + require.NoError(t, err) + t.Cleanup(func() { _ = client.Close() }) + return client +} + +func TestNewClientValidatesConfig(t *testing.T) { + cases := map[string]Config{ + "empty base url": {BaseURL: "", RequestTimeout: time.Second}, + "non-absolute base url": {BaseURL: "lobby:8095", RequestTimeout: time.Second}, + "non-positive timeout": {BaseURL: "http://lobby:8095", RequestTimeout: 0}, + } + for name, cfg := range cases { + t.Run(name, func(t *testing.T) { + _, err := NewClient(cfg) + require.Error(t, err) + }) + } +} + +func TestGetMembershipsHappyPathSinglePage(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + require.Equal(t, http.MethodGet, r.Method) + require.Equal(t, 
"/api/v1/internal/games/game-1/memberships", r.URL.Path) + assert.Equal(t, strconv.Itoa(pageSize), r.URL.Query().Get("page_size")) + assert.Empty(t, r.URL.Query().Get("page_token")) + + w.Header().Set("Content-Type", "application/json") + _, _ = w.Write([]byte(`{ + "items": [ + {"membership_id":"m1","game_id":"game-1","user_id":"u1","race_name":"Human","status":"active","joined_at":1700000000000}, + {"membership_id":"m2","game_id":"game-1","user_id":"u2","race_name":"Klingon","status":"removed","joined_at":1700000010000,"removed_at":1700000020000} + ] + }`)) + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + memberships, err := client.GetMemberships(context.Background(), "game-1") + require.NoError(t, err) + require.Len(t, memberships, 2) + + assert.Equal(t, "u1", memberships[0].UserID) + assert.Equal(t, "Human", memberships[0].RaceName) + assert.Equal(t, "active", memberships[0].Status) + assert.Equal(t, time.UnixMilli(1700000000000).UTC(), memberships[0].JoinedAt) + assert.Nil(t, memberships[0].RemovedAt) + + assert.Equal(t, "removed", memberships[1].Status) + require.NotNil(t, memberships[1].RemovedAt) + assert.Equal(t, time.UnixMilli(1700000020000).UTC(), *memberships[1].RemovedAt) +} + +func TestGetMembershipsFollowsPagination(t *testing.T) { + var calls atomic.Int32 + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + call := calls.Add(1) + w.Header().Set("Content-Type", "application/json") + switch call { + case 1: + assert.Empty(t, r.URL.Query().Get("page_token")) + _, _ = w.Write([]byte(`{ + "items":[{"membership_id":"m1","game_id":"g","user_id":"u1","race_name":"Human","status":"active","joined_at":1}], + "next_page_token":"tok-2" + }`)) + case 2: + assert.Equal(t, "tok-2", r.URL.Query().Get("page_token")) + _, _ = w.Write([]byte(`{ + "items":[{"membership_id":"m2","game_id":"g","user_id":"u2","race_name":"Klingon","status":"active","joined_at":2}], + "next_page_token":"tok-3" 
+ }`)) + case 3: + assert.Equal(t, "tok-3", r.URL.Query().Get("page_token")) + _, _ = w.Write([]byte(`{ + "items":[{"membership_id":"m3","game_id":"g","user_id":"u3","race_name":"Vulcan","status":"blocked","joined_at":3}] + }`)) + default: + t.Fatalf("unexpected extra call %d", call) + } + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + memberships, err := client.GetMemberships(context.Background(), "g") + require.NoError(t, err) + require.Len(t, memberships, 3) + assert.Equal(t, "u1", memberships[0].UserID) + assert.Equal(t, "u2", memberships[1].UserID) + assert.Equal(t, "u3", memberships[2].UserID) + assert.Equal(t, int32(3), calls.Load()) +} + +func TestGetMembershipsPaginationOverflow(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + _, _ = w.Write([]byte(`{"items":[],"next_page_token":"never-ends"}`)) + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + _, err := client.GetMemberships(context.Background(), "g") + require.Error(t, err) + assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable)) + assert.Contains(t, err.Error(), "pagination overflow") +} + +func TestGetMembershipsInternalErrorMapsToUnavailable(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusInternalServerError) + _, _ = w.Write([]byte(`{"error":{"code":"internal_error","message":"boom"}}`)) + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + _, err := client.GetMemberships(context.Background(), "g") + require.Error(t, err) + assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable)) + assert.Contains(t, err.Error(), "internal_error") +} + +func TestGetMembershipsTimeoutMapsToUnavailable(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + time.Sleep(120 * time.Millisecond) + _, _ = 
w.Write([]byte(`{}`)) + })) + defer server.Close() + + client := newTestClient(t, server.URL, 30*time.Millisecond) + _, err := client.GetMemberships(context.Background(), "g") + require.Error(t, err) + assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable)) +} + +func TestGetMembershipsRejectsBadInput(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + t.Fatal("must not contact lobby on bad input") + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + _, err := client.GetMemberships(context.Background(), " ") + require.Error(t, err) + + ctx, cancel := context.WithCancel(context.Background()) + cancel() + _, err = client.GetMemberships(ctx, "g") + require.Error(t, err) + assert.True(t, errors.Is(err, context.Canceled)) +} + +func TestGetMembershipsMalformedPayload(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + _, _ = w.Write([]byte(`{"items":[{"membership_id":"m","game_id":"g","user_id":"","race_name":"","status":"active","joined_at":1}]}`)) + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + _, err := client.GetMemberships(context.Background(), "g") + require.Error(t, err) + assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable)) +} + +func TestGetMembershipsEmptyList(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + _, _ = w.Write([]byte(`{"items":[]}`)) + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + memberships, err := client.GetMemberships(context.Background(), "g") + require.NoError(t, err) + assert.Empty(t, memberships) +} + +func TestGetMembershipsTrailingJSONIsRejected(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + _, _ = w.Write([]byte(`{"items":[]}{"items":[]}`)) + })) + defer server.Close() + + 
client := newTestClient(t, server.URL, time.Second)
+	_, err := client.GetMemberships(context.Background(), "g")
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable))
+}
+
+func TestGetGameSummaryHappyPath(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		require.Equal(t, http.MethodGet, r.Method)
+		require.Equal(t, "/api/v1/internal/games/game-1", r.URL.Path)
+		w.Header().Set("Content-Type", "application/json")
+		_, _ = w.Write([]byte(`{
+			"game_id":"game-1",
+			"game_name":"Andromeda Conquest",
+			"game_type":"public",
+			"owner_user_id":"",
+			"status":"running",
+			"min_players":2,
+			"max_players":8,
+			"start_gap_hours":2,
+			"start_gap_players":4,
+			"enrollment_ends_at":1700000000,
+			"turn_schedule":"0 18 * * *",
+			"target_engine_version":"v1.2.3",
+			"created_at":1700000000000,
+			"updated_at":1700000000000,
+			"current_turn":0,
+			"runtime_status":"",
+			"engine_health_summary":""
+		}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, server.URL, time.Second)
+	summary, err := client.GetGameSummary(context.Background(), "game-1")
+	require.NoError(t, err)
+	assert.Equal(t, ports.GameSummary{
+		GameID:   "game-1",
+		GameName: "Andromeda Conquest",
+		Status:   "running",
+	}, summary)
+}
+
+func TestGetGameSummaryNotFoundMapsToUnavailable(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+		w.WriteHeader(http.StatusNotFound)
+		_, _ = w.Write([]byte(`{"error":{"code":"not_found","message":"game not found"}}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, server.URL, time.Second)
+	_, err := client.GetGameSummary(context.Background(), "missing")
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable))
+	assert.Contains(t, err.Error(), "not_found")
+}
+
+func TestGetGameSummaryInternalErrorMapsToUnavailable(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+		w.WriteHeader(http.StatusInternalServerError)
+		_, _ = w.Write([]byte(`{"error":{"code":"internal_error","message":"boom"}}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, server.URL, time.Second)
+	_, err := client.GetGameSummary(context.Background(), "g")
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable))
+	assert.Contains(t, err.Error(), "internal_error")
+}
+
+func TestGetGameSummaryTimeoutMapsToUnavailable(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+		time.Sleep(120 * time.Millisecond)
+		_, _ = w.Write([]byte(`{}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, server.URL, 30*time.Millisecond)
+	_, err := client.GetGameSummary(context.Background(), "g")
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable))
+}
+
+func TestGetGameSummaryMalformedJSON(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+		_, _ = w.Write([]byte(`{not-json}`))
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, server.URL, time.Second)
+	_, err := client.GetGameSummary(context.Background(), "g")
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable))
+}
+
+func TestGetGameSummaryMissingRequiredFields(t *testing.T) {
+	cases := map[string]string{
+		"missing game_id":   `{"game_name":"Andromeda","status":"running"}`,
+		"missing game_name": `{"game_id":"g","status":"running"}`,
+		"missing status":    `{"game_id":"g","game_name":"Andromeda"}`,
+	}
+	for name, body := range cases {
+		t.Run(name, func(t *testing.T) {
+			server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+				_, _ = w.Write([]byte(body))
+			}))
+			defer server.Close()
+
+			client := newTestClient(t, server.URL, time.Second)
+			_, err := client.GetGameSummary(context.Background(), "g")
+			require.Error(t, err)
+			assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable))
+		})
+	}
+}
+
+func TestGetGameSummaryRejectsBadInput(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(_ http.ResponseWriter, _ *http.Request) {
+		t.Fatal("must not contact lobby on bad input")
+	}))
+	defer server.Close()
+
+	client := newTestClient(t, server.URL, time.Second)
+	_, err := client.GetGameSummary(context.Background(), " ")
+	require.Error(t, err)
+
+	ctx, cancel := context.WithCancel(context.Background())
+	cancel()
+	_, err = client.GetGameSummary(ctx, "g")
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, context.Canceled))
+}
+
+func TestCloseIsIdempotent(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+		_, _ = w.Write([]byte(`{"items":[]}`))
+	}))
+	defer server.Close()
+	client := newTestClient(t, server.URL, time.Second)
+	_, _ = client.GetMemberships(context.Background(), "g")
+	require.NoError(t, client.Close())
+	require.NoError(t, client.Close())
+}
+
diff --git a/gamemaster/internal/adapters/lobbyeventspublisher/publisher.go b/gamemaster/internal/adapters/lobbyeventspublisher/publisher.go
new file mode 100644
index 0000000..05b4955
--- /dev/null
+++ b/gamemaster/internal/adapters/lobbyeventspublisher/publisher.go
@@ -0,0 +1,180 @@
+// Package lobbyeventspublisher provides the Redis-Streams-backed
+// publisher for `gm:lobby_events`. The stream carries two distinct
+// message types — `runtime_snapshot_update` and `game_finished` —
+// discriminated by the `event_type` field as fixed by
+// `gamemaster/api/runtime-events-asyncapi.yaml`.
+//
+// The adapter mirrors `rtmanager/internal/adapters/healtheventspublisher`
+// behaviourally: the publisher validates the message before XADDing,
+// emits one entry per call, and never trims the stream (consumers own
+// their consumer-group offsets).
+package lobbyeventspublisher
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"strconv"
+
+	"github.com/redis/go-redis/v9"
+
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/ports"
+)
+
+// Wire field names used by the Redis Streams payload. Frozen by
+// `gamemaster/api/runtime-events-asyncapi.yaml`; renaming any of them
+// breaks Game Lobby's consumer.
+const (
+	fieldEventType           = "event_type"
+	fieldGameID              = "game_id"
+	fieldCurrentTurn         = "current_turn"
+	fieldFinalTurnNumber     = "final_turn_number"
+	fieldRuntimeStatus       = "runtime_status"
+	fieldEngineHealthSummary = "engine_health_summary"
+	fieldPlayerTurnStats     = "player_turn_stats"
+	fieldOccurredAtMS        = "occurred_at_ms"
+	fieldFinishedAtMS        = "finished_at_ms"
+
+	eventTypeRuntimeSnapshotUpdate = "runtime_snapshot_update"
+	eventTypeGameFinished          = "game_finished"
+
+	emptyPlayerTurnStatsJSON = "[]"
+)
+
+// Config groups the dependencies and stream name required to
+// construct a Publisher.
+type Config struct {
+	// Client appends entries to Redis Streams. Must be non-nil.
+	Client *redis.Client
+
+	// Stream stores the Redis Stream key events are published to.
+	// Must not be empty (typically `gm:lobby_events`).
+	Stream string
+}
+
+// Publisher implements `ports.LobbyEventsPublisher` on top of a shared
+// Redis client.
+type Publisher struct {
+	client *redis.Client
+	stream string
+}
+
+// NewPublisher constructs a Publisher from cfg. Validation errors
+// surface the missing collaborator verbatim.
+func NewPublisher(cfg Config) (*Publisher, error) {
+	if cfg.Client == nil {
+		return nil, errors.New("new gamemaster lobby events publisher: nil redis client")
+	}
+	if cfg.Stream == "" {
+		return nil, errors.New("new gamemaster lobby events publisher: stream must not be empty")
+	}
+	return &Publisher{client: cfg.Client, stream: cfg.Stream}, nil
+}
+
+// PublishSnapshotUpdate appends a `runtime_snapshot_update` message to
+// the stream after validating msg through msg.Validate.
+func (publisher *Publisher) PublishSnapshotUpdate(ctx context.Context, msg ports.RuntimeSnapshotUpdate) error {
+	if err := publisher.guardCall(ctx); err != nil {
+		return err
+	}
+	if err := msg.Validate(); err != nil {
+		return fmt.Errorf("publish runtime snapshot update: %w", err)
+	}
+	statsJSON, err := encodePlayerTurnStats(msg.PlayerTurnStats)
+	if err != nil {
+		return fmt.Errorf("publish runtime snapshot update: %w", err)
+	}
+	values := map[string]any{
+		fieldEventType:           eventTypeRuntimeSnapshotUpdate,
+		fieldGameID:              msg.GameID,
+		fieldCurrentTurn:         strconv.Itoa(msg.CurrentTurn),
+		fieldRuntimeStatus:       string(msg.RuntimeStatus),
+		fieldEngineHealthSummary: msg.EngineHealthSummary,
+		fieldPlayerTurnStats:     statsJSON,
+		fieldOccurredAtMS:        strconv.FormatInt(msg.OccurredAt.UTC().UnixMilli(), 10),
+	}
+	if err := publisher.client.XAdd(ctx, &redis.XAddArgs{
+		Stream: publisher.stream,
+		Values: values,
+	}).Err(); err != nil {
+		return fmt.Errorf("publish runtime snapshot update: xadd: %w", err)
+	}
+	return nil
+}
+
+// PublishGameFinished appends a `game_finished` message to the stream
+// after validating msg through msg.Validate.
+func (publisher *Publisher) PublishGameFinished(ctx context.Context, msg ports.GameFinished) error {
+	if err := publisher.guardCall(ctx); err != nil {
+		return err
+	}
+	if err := msg.Validate(); err != nil {
+		return fmt.Errorf("publish game finished: %w", err)
+	}
+	if msg.RuntimeStatus != runtime.StatusFinished {
+		return fmt.Errorf("publish game finished: runtime status must be %q, got %q", runtime.StatusFinished, msg.RuntimeStatus)
+	}
+	statsJSON, err := encodePlayerTurnStats(msg.PlayerTurnStats)
+	if err != nil {
+		return fmt.Errorf("publish game finished: %w", err)
+	}
+	values := map[string]any{
+		fieldEventType:       eventTypeGameFinished,
+		fieldGameID:          msg.GameID,
+		fieldFinalTurnNumber: strconv.Itoa(msg.FinalTurnNumber),
+		fieldRuntimeStatus:   string(msg.RuntimeStatus),
+		fieldPlayerTurnStats: statsJSON,
+		fieldFinishedAtMS:    strconv.FormatInt(msg.FinishedAt.UTC().UnixMilli(), 10),
+	}
+	if err := publisher.client.XAdd(ctx, &redis.XAddArgs{
+		Stream: publisher.stream,
+		Values: values,
+	}).Err(); err != nil {
+		return fmt.Errorf("publish game finished: xadd: %w", err)
+	}
+	return nil
+}
+
+func (publisher *Publisher) guardCall(ctx context.Context) error {
+	if publisher == nil || publisher.client == nil {
+		return errors.New("nil publisher")
+	}
+	if ctx == nil {
+		return errors.New("nil context")
+	}
+	return nil
+}
+
+// encodePlayerTurnStats returns the JSON serialisation of the per-player
+// stats array. Empty input becomes the literal `[]` so the stream entry
+// always carries a valid JSON document for the field.
+func encodePlayerTurnStats(stats []ports.PlayerTurnStats) (string, error) {
+	if len(stats) == 0 {
+		return emptyPlayerTurnStatsJSON, nil
+	}
+	envelope := make([]playerTurnStatEnvelope, 0, len(stats))
+	for _, item := range stats {
+		envelope = append(envelope, playerTurnStatEnvelope{
+			UserID:     item.UserID,
+			Planets:    item.Planets,
+			Population: item.Population,
+		})
+	}
+	encoded, err := json.Marshal(envelope)
+	if err != nil {
+		return "", fmt.Errorf("encode player turn stats: %w", err)
+	}
+	return string(encoded), nil
+}
+
+type playerTurnStatEnvelope struct {
+	UserID     string `json:"user_id"`
+	Planets    int    `json:"planets"`
+	Population int    `json:"population"`
+}
+
+// Compile-time assertion: Publisher implements
+// ports.LobbyEventsPublisher.
+var _ ports.LobbyEventsPublisher = (*Publisher)(nil)
diff --git a/gamemaster/internal/adapters/lobbyeventspublisher/publisher_test.go b/gamemaster/internal/adapters/lobbyeventspublisher/publisher_test.go
new file mode 100644
index 0000000..dccc010
--- /dev/null
+++ b/gamemaster/internal/adapters/lobbyeventspublisher/publisher_test.go
@@ -0,0 +1,186 @@
+package lobbyeventspublisher
+
+import (
+	"context"
+	"encoding/json"
+	"strconv"
+	"testing"
+	"time"
+
+	"github.com/alicebob/miniredis/v2"
+	"github.com/redis/go-redis/v9"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/ports"
+)
+
+const testStream = "gm:lobby_events"
+
+func newTestPublisher(t *testing.T) (*Publisher, *redis.Client) {
+	t.Helper()
+	server := miniredis.RunT(t)
+	client := redis.NewClient(&redis.Options{Addr: server.Addr()})
+	t.Cleanup(func() { _ = client.Close() })
+	publisher, err := NewPublisher(Config{Client: client, Stream: testStream})
+	require.NoError(t, err)
+	return publisher, client
+}
+
+func TestNewPublisherValidation(t *testing.T) {
+	t.Run("nil client", func(t *testing.T) {
+		_, err := NewPublisher(Config{Stream: testStream})
+		require.Error(t, err)
+	})
+	t.Run("empty stream", func(t *testing.T) {
+		client := redis.NewClient(&redis.Options{Addr: "127.0.0.1:0"})
+		t.Cleanup(func() { _ = client.Close() })
+		_, err := NewPublisher(Config{Client: client})
+		require.Error(t, err)
+	})
+}
+
+func TestPublishSnapshotUpdateHappyPath(t *testing.T) {
+	publisher, client := newTestPublisher(t)
+
+	occurredAt := time.Date(2026, 4, 27, 12, 0, 0, 0, time.UTC)
+	msg := ports.RuntimeSnapshotUpdate{
+		GameID:              "game-1",
+		CurrentTurn:         17,
+		RuntimeStatus:       runtime.StatusRunning,
+		EngineHealthSummary: "healthy",
+		PlayerTurnStats: []ports.PlayerTurnStats{
+			{UserID: "user-1", Planets: 4, Population: 12000},
+			{UserID: "user-2", Planets: 3, Population: 9000},
+		},
+		OccurredAt: occurredAt,
+	}
+	require.NoError(t, publisher.PublishSnapshotUpdate(context.Background(), msg))
+
+	entries, err := client.XRange(context.Background(), testStream, "-", "+").Result()
+	require.NoError(t, err)
+	require.Len(t, entries, 1)
+	values := entries[0].Values
+	assert.Equal(t, "runtime_snapshot_update", values[fieldEventType])
+	assert.Equal(t, "game-1", values[fieldGameID])
+	assert.Equal(t, "17", values[fieldCurrentTurn])
+	assert.Equal(t, "running", values[fieldRuntimeStatus])
+	assert.Equal(t, "healthy", values[fieldEngineHealthSummary])
+	assert.Equal(t, strconv.FormatInt(occurredAt.UnixMilli(), 10), values[fieldOccurredAtMS])
+
+	statsRaw, ok := values[fieldPlayerTurnStats].(string)
+	require.True(t, ok)
+	var stats []playerTurnStatEnvelope
+	require.NoError(t, json.Unmarshal([]byte(statsRaw), &stats))
+	assert.Equal(t, []playerTurnStatEnvelope{
+		{UserID: "user-1", Planets: 4, Population: 12000},
+		{UserID: "user-2", Planets: 3, Population: 9000},
+	}, stats)
+}
+
+func TestPublishSnapshotUpdateEmptyStatsBecomesArray(t *testing.T) {
+	publisher, client := newTestPublisher(t)
+	msg := ports.RuntimeSnapshotUpdate{
+		GameID:              "g",
+		CurrentTurn:         0,
+		RuntimeStatus:       runtime.StatusStarting,
+		EngineHealthSummary: "",
+		OccurredAt:          time.Now().UTC(),
+	}
+	require.NoError(t, publisher.PublishSnapshotUpdate(context.Background(), msg))
+
+	entries, err := client.XRange(context.Background(), testStream, "-", "+").Result()
+	require.NoError(t, err)
+	require.Len(t, entries, 1)
+	assert.Equal(t, "[]", entries[0].Values[fieldPlayerTurnStats])
+}
+
+func TestPublishSnapshotUpdateRejectsInvalid(t *testing.T) {
+	publisher, client := newTestPublisher(t)
+	require.Error(t, publisher.PublishSnapshotUpdate(context.Background(), ports.RuntimeSnapshotUpdate{}))
+
+	entries, err := client.XRange(context.Background(), testStream, "-", "+").Result()
+	require.NoError(t, err)
+	assert.Empty(t, entries, "invalid messages must not reach the stream")
+}
+
+func TestPublishGameFinishedHappyPath(t *testing.T) {
+	publisher, client := newTestPublisher(t)
+
+	finishedAt := time.Date(2026, 4, 28, 8, 30, 0, 0, time.UTC)
+	msg := ports.GameFinished{
+		GameID:          "game-1",
+		FinalTurnNumber: 42,
+		RuntimeStatus:   runtime.StatusFinished,
+		PlayerTurnStats: []ports.PlayerTurnStats{
+			{UserID: "user-1", Planets: 6, Population: 25000},
+			{UserID: "user-2", Planets: 0, Population: 0},
+		},
+		FinishedAt: finishedAt,
+	}
+	require.NoError(t, publisher.PublishGameFinished(context.Background(), msg))
+
+	entries, err := client.XRange(context.Background(), testStream, "-", "+").Result()
+	require.NoError(t, err)
+	require.Len(t, entries, 1)
+	values := entries[0].Values
+	assert.Equal(t, "game_finished", values[fieldEventType])
+	assert.Equal(t, "game-1", values[fieldGameID])
+	assert.Equal(t, "42", values[fieldFinalTurnNumber])
+	assert.Equal(t, "finished", values[fieldRuntimeStatus])
+	assert.Equal(t, strconv.FormatInt(finishedAt.UnixMilli(), 10), values[fieldFinishedAtMS])
+
+	_, hasOccurred := values[fieldOccurredAtMS]
+	assert.False(t, hasOccurred, "game_finished must not carry occurred_at_ms")
+	_, hasCurrentTurn := values[fieldCurrentTurn]
+	assert.False(t, hasCurrentTurn, "game_finished must not carry current_turn")
+	_, hasHealth := values[fieldEngineHealthSummary]
+	assert.False(t, hasHealth, "game_finished must not carry engine_health_summary")
+}
+
+func TestPublishGameFinishedRejectsBadStatus(t *testing.T) {
+	publisher, client := newTestPublisher(t)
+	require.Error(t, publisher.PublishGameFinished(context.Background(), ports.GameFinished{
+		GameID:          "g",
+		FinalTurnNumber: 1,
+		RuntimeStatus:   runtime.StatusRunning, // wrong status
+		FinishedAt:      time.Now().UTC(),
+	}))
+
+	entries, err := client.XRange(context.Background(), testStream, "-", "+").Result()
+	require.NoError(t, err)
+	assert.Empty(t, entries)
+}
+
+func TestTimestampsNormalisedToUTC(t *testing.T) {
+	publisher, client := newTestPublisher(t)
+	loc, err := time.LoadLocation("Asia/Tokyo")
+	require.NoError(t, err)
+
+	msg := ports.RuntimeSnapshotUpdate{
+		GameID:        "g",
+		CurrentTurn:   1,
+		RuntimeStatus: runtime.StatusRunning,
+		OccurredAt:    time.Date(2026, 4, 27, 21, 0, 0, 0, loc),
+	}
+	require.NoError(t, publisher.PublishSnapshotUpdate(context.Background(), msg))
+
+	entries, err := client.XRange(context.Background(), testStream, "-", "+").Result()
+	require.NoError(t, err)
+	require.Len(t, entries, 1)
+	wantMs := msg.OccurredAt.UTC().UnixMilli()
+	assert.Equal(t, strconv.FormatInt(wantMs, 10), entries[0].Values[fieldOccurredAtMS])
+}
+
+func TestRejectsNilContext(t *testing.T) {
+	publisher, _ := newTestPublisher(t)
+	//nolint:staticcheck // explicitly testing nil-context rejection.
+	err := publisher.PublishSnapshotUpdate(nil, ports.RuntimeSnapshotUpdate{
+		GameID:        "g",
+		CurrentTurn:   0,
+		RuntimeStatus: runtime.StatusStarting,
+		OccurredAt:    time.Now().UTC(),
+	})
+	require.Error(t, err)
+}
diff --git a/gamemaster/internal/adapters/mocks/mock_engineclient.go b/gamemaster/internal/adapters/mocks/mock_engineclient.go
new file mode 100644
index 0000000..7d150ae
--- /dev/null
+++ b/gamemaster/internal/adapters/mocks/mock_engineclient.go
@@ -0,0 +1,147 @@
+// Code generated by MockGen. DO NOT EDIT.
+// Source: galaxy/gamemaster/internal/ports (interfaces: EngineClient)
+//
+// Generated by this command:
+//
+//	mockgen -destination=../adapters/mocks/mock_engineclient.go -package=mocks galaxy/gamemaster/internal/ports EngineClient
+//
+
+// Package mocks is a generated GoMock package.
+package mocks
+
+import (
+	context "context"
+	json "encoding/json"
+	ports "galaxy/gamemaster/internal/ports"
+	reflect "reflect"
+
+	gomock "go.uber.org/mock/gomock"
+)
+
+// MockEngineClient is a mock of EngineClient interface.
+type MockEngineClient struct {
+	ctrl     *gomock.Controller
+	recorder *MockEngineClientMockRecorder
+	isgomock struct{}
+}
+
+// MockEngineClientMockRecorder is the mock recorder for MockEngineClient.
+type MockEngineClientMockRecorder struct {
+	mock *MockEngineClient
+}
+
+// NewMockEngineClient creates a new mock instance.
+func NewMockEngineClient(ctrl *gomock.Controller) *MockEngineClient {
+	mock := &MockEngineClient{ctrl: ctrl}
+	mock.recorder = &MockEngineClientMockRecorder{mock}
+	return mock
+}
+
+// EXPECT returns an object that allows the caller to indicate expected use.
+func (m *MockEngineClient) EXPECT() *MockEngineClientMockRecorder {
+	return m.recorder
+}
+
+// BanishRace mocks base method.
+func (m *MockEngineClient) BanishRace(ctx context.Context, baseURL, raceName string) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "BanishRace", ctx, baseURL, raceName)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// BanishRace indicates an expected call of BanishRace.
+func (mr *MockEngineClientMockRecorder) BanishRace(ctx, baseURL, raceName any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "BanishRace", reflect.TypeOf((*MockEngineClient)(nil).BanishRace), ctx, baseURL, raceName)
+}
+
+// ExecuteCommands mocks base method.
+func (m *MockEngineClient) ExecuteCommands(ctx context.Context, baseURL string, payload json.RawMessage) (json.RawMessage, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "ExecuteCommands", ctx, baseURL, payload)
+	ret0, _ := ret[0].(json.RawMessage)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// ExecuteCommands indicates an expected call of ExecuteCommands.
+func (mr *MockEngineClientMockRecorder) ExecuteCommands(ctx, baseURL, payload any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "ExecuteCommands", reflect.TypeOf((*MockEngineClient)(nil).ExecuteCommands), ctx, baseURL, payload)
+}
+
+// GetReport mocks base method.
+func (m *MockEngineClient) GetReport(ctx context.Context, baseURL, raceName string, turn int) (json.RawMessage, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "GetReport", ctx, baseURL, raceName, turn)
+	ret0, _ := ret[0].(json.RawMessage)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// GetReport indicates an expected call of GetReport.
+func (mr *MockEngineClientMockRecorder) GetReport(ctx, baseURL, raceName, turn any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "GetReport", reflect.TypeOf((*MockEngineClient)(nil).GetReport), ctx, baseURL, raceName, turn)
+}
+
+// Init mocks base method.
+func (m *MockEngineClient) Init(ctx context.Context, baseURL string, request ports.InitRequest) (ports.StateResponse, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Init", ctx, baseURL, request)
+	ret0, _ := ret[0].(ports.StateResponse)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// Init indicates an expected call of Init.
+func (mr *MockEngineClientMockRecorder) Init(ctx, baseURL, request any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Init", reflect.TypeOf((*MockEngineClient)(nil).Init), ctx, baseURL, request)
+}
+
+// PutOrders mocks base method.
+func (m *MockEngineClient) PutOrders(ctx context.Context, baseURL string, payload json.RawMessage) (json.RawMessage, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "PutOrders", ctx, baseURL, payload)
+	ret0, _ := ret[0].(json.RawMessage)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// PutOrders indicates an expected call of PutOrders.
+func (mr *MockEngineClientMockRecorder) PutOrders(ctx, baseURL, payload any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "PutOrders", reflect.TypeOf((*MockEngineClient)(nil).PutOrders), ctx, baseURL, payload)
+}
+
+// Status mocks base method.
+func (m *MockEngineClient) Status(ctx context.Context, baseURL string) (ports.StateResponse, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Status", ctx, baseURL)
+	ret0, _ := ret[0].(ports.StateResponse)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// Status indicates an expected call of Status.
+func (mr *MockEngineClientMockRecorder) Status(ctx, baseURL any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Status", reflect.TypeOf((*MockEngineClient)(nil).Status), ctx, baseURL)
+}
+
+// Turn mocks base method.
+func (m *MockEngineClient) Turn(ctx context.Context, baseURL string) (ports.StateResponse, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Turn", ctx, baseURL)
+	ret0, _ := ret[0].(ports.StateResponse)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// Turn indicates an expected call of Turn.
+func (mr *MockEngineClientMockRecorder) Turn(ctx, baseURL any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Turn", reflect.TypeOf((*MockEngineClient)(nil).Turn), ctx, baseURL)
+}
diff --git a/gamemaster/internal/adapters/mocks/mock_engineversionstore.go b/gamemaster/internal/adapters/mocks/mock_engineversionstore.go
new file mode 100644
index 0000000..1a16d41
--- /dev/null
+++ b/gamemaster/internal/adapters/mocks/mock_engineversionstore.go
@@ -0,0 +1,145 @@
+// Code generated by MockGen. DO NOT EDIT.
+// Source: galaxy/gamemaster/internal/ports (interfaces: EngineVersionStore)
+//
+// Generated by this command:
+//
+//	mockgen -destination=../adapters/mocks/mock_engineversionstore.go -package=mocks galaxy/gamemaster/internal/ports EngineVersionStore
+//
+
+// Package mocks is a generated GoMock package.
+package mocks
+
+import (
+	context "context"
+	engineversion "galaxy/gamemaster/internal/domain/engineversion"
+	ports "galaxy/gamemaster/internal/ports"
+	reflect "reflect"
+	time "time"
+
+	gomock "go.uber.org/mock/gomock"
+)
+
+// MockEngineVersionStore is a mock of EngineVersionStore interface.
+type MockEngineVersionStore struct {
+	ctrl     *gomock.Controller
+	recorder *MockEngineVersionStoreMockRecorder
+	isgomock struct{}
+}
+
+// MockEngineVersionStoreMockRecorder is the mock recorder for MockEngineVersionStore.
+type MockEngineVersionStoreMockRecorder struct {
+	mock *MockEngineVersionStore
+}
+
+// NewMockEngineVersionStore creates a new mock instance.
+func NewMockEngineVersionStore(ctrl *gomock.Controller) *MockEngineVersionStore {
+	mock := &MockEngineVersionStore{ctrl: ctrl}
+	mock.recorder = &MockEngineVersionStoreMockRecorder{mock}
+	return mock
+}
+
+// EXPECT returns an object that allows the caller to indicate expected use.
+func (m *MockEngineVersionStore) EXPECT() *MockEngineVersionStoreMockRecorder {
+	return m.recorder
+}
+
+// Delete mocks base method.
+func (m *MockEngineVersionStore) Delete(ctx context.Context, version string) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Delete", ctx, version)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// Delete indicates an expected call of Delete.
+func (mr *MockEngineVersionStoreMockRecorder) Delete(ctx, version any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Delete", reflect.TypeOf((*MockEngineVersionStore)(nil).Delete), ctx, version)
+}
+
+// Deprecate mocks base method.
+func (m *MockEngineVersionStore) Deprecate(ctx context.Context, version string, now time.Time) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Deprecate", ctx, version, now)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// Deprecate indicates an expected call of Deprecate.
+func (mr *MockEngineVersionStoreMockRecorder) Deprecate(ctx, version, now any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Deprecate", reflect.TypeOf((*MockEngineVersionStore)(nil).Deprecate), ctx, version, now)
+}
+
+// Get mocks base method.
+func (m *MockEngineVersionStore) Get(ctx context.Context, version string) (engineversion.EngineVersion, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Get", ctx, version)
+	ret0, _ := ret[0].(engineversion.EngineVersion)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// Get indicates an expected call of Get.
+func (mr *MockEngineVersionStoreMockRecorder) Get(ctx, version any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Get", reflect.TypeOf((*MockEngineVersionStore)(nil).Get), ctx, version)
+}
+
+// Insert mocks base method.
+func (m *MockEngineVersionStore) Insert(ctx context.Context, record engineversion.EngineVersion) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Insert", ctx, record)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// Insert indicates an expected call of Insert.
+func (mr *MockEngineVersionStoreMockRecorder) Insert(ctx, record any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Insert", reflect.TypeOf((*MockEngineVersionStore)(nil).Insert), ctx, record)
+}
+
+// IsReferencedByActiveRuntime mocks base method.
+func (m *MockEngineVersionStore) IsReferencedByActiveRuntime(ctx context.Context, version string) (bool, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "IsReferencedByActiveRuntime", ctx, version)
+	ret0, _ := ret[0].(bool)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// IsReferencedByActiveRuntime indicates an expected call of IsReferencedByActiveRuntime.
+func (mr *MockEngineVersionStoreMockRecorder) IsReferencedByActiveRuntime(ctx, version any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "IsReferencedByActiveRuntime", reflect.TypeOf((*MockEngineVersionStore)(nil).IsReferencedByActiveRuntime), ctx, version)
+}
+
+// List mocks base method.
+func (m *MockEngineVersionStore) List(ctx context.Context, statusFilter *engineversion.Status) ([]engineversion.EngineVersion, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "List", ctx, statusFilter)
+	ret0, _ := ret[0].([]engineversion.EngineVersion)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// List indicates an expected call of List.
+func (mr *MockEngineVersionStoreMockRecorder) List(ctx, statusFilter any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "List", reflect.TypeOf((*MockEngineVersionStore)(nil).List), ctx, statusFilter)
+}
+
+// Update mocks base method.
+func (m *MockEngineVersionStore) Update(ctx context.Context, input ports.UpdateEngineVersionInput) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Update", ctx, input)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// Update indicates an expected call of Update.
+func (mr *MockEngineVersionStoreMockRecorder) Update(ctx, input any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Update", reflect.TypeOf((*MockEngineVersionStore)(nil).Update), ctx, input)
+}
diff --git a/gamemaster/internal/adapters/mocks/mock_lobbyclient.go b/gamemaster/internal/adapters/mocks/mock_lobbyclient.go
new file mode 100644
index 0000000..1f82a67
--- /dev/null
+++ b/gamemaster/internal/adapters/mocks/mock_lobbyclient.go
@@ -0,0 +1,72 @@
+// Code generated by MockGen. DO NOT EDIT.
+// Source: galaxy/gamemaster/internal/ports (interfaces: LobbyClient)
+//
+// Generated by this command:
+//
+//	mockgen -destination=../adapters/mocks/mock_lobbyclient.go -package=mocks galaxy/gamemaster/internal/ports LobbyClient
+//
+
+// Package mocks is a generated GoMock package.
+package mocks
+
+import (
+	context "context"
+	ports "galaxy/gamemaster/internal/ports"
+	reflect "reflect"
+
+	gomock "go.uber.org/mock/gomock"
+)
+
+// MockLobbyClient is a mock of LobbyClient interface.
+type MockLobbyClient struct {
+	ctrl     *gomock.Controller
+	recorder *MockLobbyClientMockRecorder
+	isgomock struct{}
+}
+
+// MockLobbyClientMockRecorder is the mock recorder for MockLobbyClient.
+type MockLobbyClientMockRecorder struct {
+	mock *MockLobbyClient
+}
+
+// NewMockLobbyClient creates a new mock instance.
+func NewMockLobbyClient(ctrl *gomock.Controller) *MockLobbyClient {
+	mock := &MockLobbyClient{ctrl: ctrl}
+	mock.recorder = &MockLobbyClientMockRecorder{mock}
+	return mock
+}
+
+// EXPECT returns an object that allows the caller to indicate expected use.
+func (m *MockLobbyClient) EXPECT() *MockLobbyClientMockRecorder {
+	return m.recorder
+}
+
+// GetGameSummary mocks base method.
+func (m *MockLobbyClient) GetGameSummary(ctx context.Context, gameID string) (ports.GameSummary, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "GetGameSummary", ctx, gameID)
+	ret0, _ := ret[0].(ports.GameSummary)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// GetGameSummary indicates an expected call of GetGameSummary.
+func (mr *MockLobbyClientMockRecorder) GetGameSummary(ctx, gameID any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "GetGameSummary", reflect.TypeOf((*MockLobbyClient)(nil).GetGameSummary), ctx, gameID)
+}
+
+// GetMemberships mocks base method.
+func (m *MockLobbyClient) GetMemberships(ctx context.Context, gameID string) ([]ports.Membership, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "GetMemberships", ctx, gameID)
+	ret0, _ := ret[0].([]ports.Membership)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// GetMemberships indicates an expected call of GetMemberships.
+func (mr *MockLobbyClientMockRecorder) GetMemberships(ctx, gameID any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "GetMemberships", reflect.TypeOf((*MockLobbyClient)(nil).GetMemberships), ctx, gameID)
+}
diff --git a/gamemaster/internal/adapters/mocks/mock_lobbyeventspublisher.go b/gamemaster/internal/adapters/mocks/mock_lobbyeventspublisher.go
new file mode 100644
index 0000000..7f7abf8
--- /dev/null
+++ b/gamemaster/internal/adapters/mocks/mock_lobbyeventspublisher.go
@@ -0,0 +1,70 @@
+// Code generated by MockGen. DO NOT EDIT.
+// Source: galaxy/gamemaster/internal/ports (interfaces: LobbyEventsPublisher)
+//
+// Generated by this command:
+//
+//	mockgen -destination=../adapters/mocks/mock_lobbyeventspublisher.go -package=mocks galaxy/gamemaster/internal/ports LobbyEventsPublisher
+//
+
+// Package mocks is a generated GoMock package.
+package mocks
+
+import (
+	context "context"
+	ports "galaxy/gamemaster/internal/ports"
+	reflect "reflect"
+
+	gomock "go.uber.org/mock/gomock"
+)
+
+// MockLobbyEventsPublisher is a mock of LobbyEventsPublisher interface.
+type MockLobbyEventsPublisher struct {
+	ctrl     *gomock.Controller
+	recorder *MockLobbyEventsPublisherMockRecorder
+	isgomock struct{}
+}
+
+// MockLobbyEventsPublisherMockRecorder is the mock recorder for MockLobbyEventsPublisher.
+type MockLobbyEventsPublisherMockRecorder struct {
+	mock *MockLobbyEventsPublisher
+}
+
+// NewMockLobbyEventsPublisher creates a new mock instance.
+func NewMockLobbyEventsPublisher(ctrl *gomock.Controller) *MockLobbyEventsPublisher {
+	mock := &MockLobbyEventsPublisher{ctrl: ctrl}
+	mock.recorder = &MockLobbyEventsPublisherMockRecorder{mock}
+	return mock
+}
+
+// EXPECT returns an object that allows the caller to indicate expected use.
+func (m *MockLobbyEventsPublisher) EXPECT() *MockLobbyEventsPublisherMockRecorder {
+	return m.recorder
+}
+
+// PublishGameFinished mocks base method.
+func (m *MockLobbyEventsPublisher) PublishGameFinished(ctx context.Context, msg ports.GameFinished) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "PublishGameFinished", ctx, msg)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// PublishGameFinished indicates an expected call of PublishGameFinished.
+func (mr *MockLobbyEventsPublisherMockRecorder) PublishGameFinished(ctx, msg any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "PublishGameFinished", reflect.TypeOf((*MockLobbyEventsPublisher)(nil).PublishGameFinished), ctx, msg)
+}
+
+// PublishSnapshotUpdate mocks base method.
+func (m *MockLobbyEventsPublisher) PublishSnapshotUpdate(ctx context.Context, msg ports.RuntimeSnapshotUpdate) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "PublishSnapshotUpdate", ctx, msg)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// PublishSnapshotUpdate indicates an expected call of PublishSnapshotUpdate.
+func (mr *MockLobbyEventsPublisherMockRecorder) PublishSnapshotUpdate(ctx, msg any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "PublishSnapshotUpdate", reflect.TypeOf((*MockLobbyEventsPublisher)(nil).PublishSnapshotUpdate), ctx, msg)
+}
diff --git a/gamemaster/internal/adapters/mocks/mock_notificationpublisher.go b/gamemaster/internal/adapters/mocks/mock_notificationpublisher.go
new file mode 100644
index 0000000..1c37c61
--- /dev/null
+++ b/gamemaster/internal/adapters/mocks/mock_notificationpublisher.go
@@ -0,0 +1,56 @@
+// Code generated by MockGen. DO NOT EDIT.
+// Source: galaxy/gamemaster/internal/ports (interfaces: NotificationIntentPublisher)
+//
+// Generated by this command:
+//
+//	mockgen -destination=../adapters/mocks/mock_notificationpublisher.go -package=mocks galaxy/gamemaster/internal/ports NotificationIntentPublisher
+//
+
+// Package mocks is a generated GoMock package.
+package mocks
+
+import (
+	context "context"
+	notificationintent "galaxy/notificationintent"
+	reflect "reflect"
+
+	gomock "go.uber.org/mock/gomock"
+)
+
+// MockNotificationIntentPublisher is a mock of NotificationIntentPublisher interface.
+type MockNotificationIntentPublisher struct {
+	ctrl     *gomock.Controller
+	recorder *MockNotificationIntentPublisherMockRecorder
+	isgomock struct{}
+}
+
+// MockNotificationIntentPublisherMockRecorder is the mock recorder for MockNotificationIntentPublisher.
+type MockNotificationIntentPublisherMockRecorder struct {
+	mock *MockNotificationIntentPublisher
+}
+
+// NewMockNotificationIntentPublisher creates a new mock instance.
+func NewMockNotificationIntentPublisher(ctrl *gomock.Controller) *MockNotificationIntentPublisher {
+	mock := &MockNotificationIntentPublisher{ctrl: ctrl}
+	mock.recorder = &MockNotificationIntentPublisherMockRecorder{mock}
+	return mock
+}
+
+// EXPECT returns an object that allows the caller to indicate expected use.
+func (m *MockNotificationIntentPublisher) EXPECT() *MockNotificationIntentPublisherMockRecorder {
+	return m.recorder
+}
+
+// Publish mocks base method.
+func (m *MockNotificationIntentPublisher) Publish(ctx context.Context, intent notificationintent.Intent) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Publish", ctx, intent)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// Publish indicates an expected call of Publish.
+func (mr *MockNotificationIntentPublisherMockRecorder) Publish(ctx, intent any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Publish", reflect.TypeOf((*MockNotificationIntentPublisher)(nil).Publish), ctx, intent)
+}
diff --git a/gamemaster/internal/adapters/mocks/mock_operationlog.go b/gamemaster/internal/adapters/mocks/mock_operationlog.go
new file mode 100644
index 0000000..42c2357
--- /dev/null
+++ b/gamemaster/internal/adapters/mocks/mock_operationlog.go
@@ -0,0 +1,72 @@
+// Code generated by MockGen. DO NOT EDIT.
+// Source: galaxy/gamemaster/internal/ports (interfaces: OperationLogStore)
+//
+// Generated by this command:
+//
+//	mockgen -destination=../adapters/mocks/mock_operationlog.go -package=mocks galaxy/gamemaster/internal/ports OperationLogStore
+//
+
+// Package mocks is a generated GoMock package.
+package mocks
+
+import (
+	context "context"
+	operation "galaxy/gamemaster/internal/domain/operation"
+	reflect "reflect"
+
+	gomock "go.uber.org/mock/gomock"
+)
+
+// MockOperationLogStore is a mock of OperationLogStore interface.
+type MockOperationLogStore struct {
+	ctrl     *gomock.Controller
+	recorder *MockOperationLogStoreMockRecorder
+	isgomock struct{}
+}
+
+// MockOperationLogStoreMockRecorder is the mock recorder for MockOperationLogStore.
+type MockOperationLogStoreMockRecorder struct {
+	mock *MockOperationLogStore
+}
+
+// NewMockOperationLogStore creates a new mock instance.
+func NewMockOperationLogStore(ctrl *gomock.Controller) *MockOperationLogStore {
+	mock := &MockOperationLogStore{ctrl: ctrl}
+	mock.recorder = &MockOperationLogStoreMockRecorder{mock}
+	return mock
+}
+
+// EXPECT returns an object that allows the caller to indicate expected use.
+func (m *MockOperationLogStore) EXPECT() *MockOperationLogStoreMockRecorder {
+	return m.recorder
+}
+
+// Append mocks base method.
+func (m *MockOperationLogStore) Append(ctx context.Context, entry operation.OperationEntry) (int64, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Append", ctx, entry)
+	ret0, _ := ret[0].(int64)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// Append indicates an expected call of Append.
+func (mr *MockOperationLogStoreMockRecorder) Append(ctx, entry any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Append", reflect.TypeOf((*MockOperationLogStore)(nil).Append), ctx, entry)
+}
+
+// ListByGame mocks base method.
+func (m *MockOperationLogStore) ListByGame(ctx context.Context, gameID string, limit int) ([]operation.OperationEntry, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "ListByGame", ctx, gameID, limit)
+	ret0, _ := ret[0].([]operation.OperationEntry)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// ListByGame indicates an expected call of ListByGame.
+func (mr *MockOperationLogStoreMockRecorder) ListByGame(ctx, gameID, limit any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "ListByGame", reflect.TypeOf((*MockOperationLogStore)(nil).ListByGame), ctx, gameID, limit)
+}
diff --git a/gamemaster/internal/adapters/mocks/mock_playermappingstore.go b/gamemaster/internal/adapters/mocks/mock_playermappingstore.go
new file mode 100644
index 0000000..a9c1aa6
--- /dev/null
+++ b/gamemaster/internal/adapters/mocks/mock_playermappingstore.go
@@ -0,0 +1,115 @@
+// Code generated by MockGen. DO NOT EDIT.
+// Source: galaxy/gamemaster/internal/ports (interfaces: PlayerMappingStore)
+//
+// Generated by this command:
+//
+//	mockgen -destination=../adapters/mocks/mock_playermappingstore.go -package=mocks galaxy/gamemaster/internal/ports PlayerMappingStore
+//
+
+// Package mocks is a generated GoMock package.
+package mocks
+
+import (
+	context "context"
+	playermapping "galaxy/gamemaster/internal/domain/playermapping"
+	reflect "reflect"
+
+	gomock "go.uber.org/mock/gomock"
+)
+
+// MockPlayerMappingStore is a mock of PlayerMappingStore interface.
+type MockPlayerMappingStore struct {
+	ctrl     *gomock.Controller
+	recorder *MockPlayerMappingStoreMockRecorder
+	isgomock struct{}
+}
+
+// MockPlayerMappingStoreMockRecorder is the mock recorder for MockPlayerMappingStore.
+type MockPlayerMappingStoreMockRecorder struct {
+	mock *MockPlayerMappingStore
+}
+
+// NewMockPlayerMappingStore creates a new mock instance.
+func NewMockPlayerMappingStore(ctrl *gomock.Controller) *MockPlayerMappingStore {
+	mock := &MockPlayerMappingStore{ctrl: ctrl}
+	mock.recorder = &MockPlayerMappingStoreMockRecorder{mock}
+	return mock
+}
+
+// EXPECT returns an object that allows the caller to indicate expected use.
+func (m *MockPlayerMappingStore) EXPECT() *MockPlayerMappingStoreMockRecorder {
+	return m.recorder
+}
+
+// BulkInsert mocks base method.
+func (m *MockPlayerMappingStore) BulkInsert(ctx context.Context, records []playermapping.PlayerMapping) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "BulkInsert", ctx, records)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// BulkInsert indicates an expected call of BulkInsert.
+func (mr *MockPlayerMappingStoreMockRecorder) BulkInsert(ctx, records any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "BulkInsert", reflect.TypeOf((*MockPlayerMappingStore)(nil).BulkInsert), ctx, records)
+}
+
+// DeleteByGame mocks base method.
+func (m *MockPlayerMappingStore) DeleteByGame(ctx context.Context, gameID string) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "DeleteByGame", ctx, gameID)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// DeleteByGame indicates an expected call of DeleteByGame.
+func (mr *MockPlayerMappingStoreMockRecorder) DeleteByGame(ctx, gameID any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "DeleteByGame", reflect.TypeOf((*MockPlayerMappingStore)(nil).DeleteByGame), ctx, gameID)
+}
+
+// Get mocks base method.
+func (m *MockPlayerMappingStore) Get(ctx context.Context, gameID, userID string) (playermapping.PlayerMapping, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Get", ctx, gameID, userID)
+	ret0, _ := ret[0].(playermapping.PlayerMapping)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// Get indicates an expected call of Get.
+func (mr *MockPlayerMappingStoreMockRecorder) Get(ctx, gameID, userID any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Get", reflect.TypeOf((*MockPlayerMappingStore)(nil).Get), ctx, gameID, userID)
+}
+
+// GetByRace mocks base method.
+func (m *MockPlayerMappingStore) GetByRace(ctx context.Context, gameID, raceName string) (playermapping.PlayerMapping, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "GetByRace", ctx, gameID, raceName)
+	ret0, _ := ret[0].(playermapping.PlayerMapping)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// GetByRace indicates an expected call of GetByRace.
+func (mr *MockPlayerMappingStoreMockRecorder) GetByRace(ctx, gameID, raceName any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "GetByRace", reflect.TypeOf((*MockPlayerMappingStore)(nil).GetByRace), ctx, gameID, raceName)
+}
+
+// ListByGame mocks base method.
+func (m *MockPlayerMappingStore) ListByGame(ctx context.Context, gameID string) ([]playermapping.PlayerMapping, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "ListByGame", ctx, gameID)
+	ret0, _ := ret[0].([]playermapping.PlayerMapping)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// ListByGame indicates an expected call of ListByGame.
+func (mr *MockPlayerMappingStoreMockRecorder) ListByGame(ctx, gameID any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "ListByGame", reflect.TypeOf((*MockPlayerMappingStore)(nil).ListByGame), ctx, gameID)
+}
diff --git a/gamemaster/internal/adapters/mocks/mock_rtmclient.go b/gamemaster/internal/adapters/mocks/mock_rtmclient.go
new file mode 100644
index 0000000..c5e2631
--- /dev/null
+++ b/gamemaster/internal/adapters/mocks/mock_rtmclient.go
@@ -0,0 +1,69 @@
+// Code generated by MockGen. DO NOT EDIT.
+// Source: galaxy/gamemaster/internal/ports (interfaces: RTMClient)
+//
+// Generated by this command:
+//
+//	mockgen -destination=../adapters/mocks/mock_rtmclient.go -package=mocks galaxy/gamemaster/internal/ports RTMClient
+//
+
+// Package mocks is a generated GoMock package.
+package mocks
+
+import (
+	context "context"
+	reflect "reflect"
+
+	gomock "go.uber.org/mock/gomock"
+)
+
+// MockRTMClient is a mock of RTMClient interface.
+type MockRTMClient struct {
+	ctrl     *gomock.Controller
+	recorder *MockRTMClientMockRecorder
+	isgomock struct{}
+}
+
+// MockRTMClientMockRecorder is the mock recorder for MockRTMClient.
+type MockRTMClientMockRecorder struct {
+	mock *MockRTMClient
+}
+
+// NewMockRTMClient creates a new mock instance.
+func NewMockRTMClient(ctrl *gomock.Controller) *MockRTMClient {
+	mock := &MockRTMClient{ctrl: ctrl}
+	mock.recorder = &MockRTMClientMockRecorder{mock}
+	return mock
+}
+
+// EXPECT returns an object that allows the caller to indicate expected use.
+func (m *MockRTMClient) EXPECT() *MockRTMClientMockRecorder {
+	return m.recorder
+}
+
+// Patch mocks base method.
+func (m *MockRTMClient) Patch(ctx context.Context, gameID, imageRef string) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Patch", ctx, gameID, imageRef)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// Patch indicates an expected call of Patch.
+func (mr *MockRTMClientMockRecorder) Patch(ctx, gameID, imageRef any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Patch", reflect.TypeOf((*MockRTMClient)(nil).Patch), ctx, gameID, imageRef)
+}
+
+// Stop mocks base method.
+func (m *MockRTMClient) Stop(ctx context.Context, gameID, reason string) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Stop", ctx, gameID, reason)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// Stop indicates an expected call of Stop.
+func (mr *MockRTMClientMockRecorder) Stop(ctx, gameID, reason any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Stop", reflect.TypeOf((*MockRTMClient)(nil).Stop), ctx, gameID, reason)
+}
diff --git a/gamemaster/internal/adapters/mocks/mock_runtimerecordstore.go b/gamemaster/internal/adapters/mocks/mock_runtimerecordstore.go
new file mode 100644
index 0000000..1a554f6
--- /dev/null
+++ b/gamemaster/internal/adapters/mocks/mock_runtimerecordstore.go
@@ -0,0 +1,188 @@
+// Code generated by MockGen. DO NOT EDIT.
+// Source: galaxy/gamemaster/internal/ports (interfaces: RuntimeRecordStore)
+//
+// Generated by this command:
+//
+//	mockgen -destination=../adapters/mocks/mock_runtimerecordstore.go -package=mocks galaxy/gamemaster/internal/ports RuntimeRecordStore
+//
+
+// Package mocks is a generated GoMock package.
+package mocks
+
+import (
+	context "context"
+	runtime "galaxy/gamemaster/internal/domain/runtime"
+	ports "galaxy/gamemaster/internal/ports"
+	reflect "reflect"
+	time "time"
+
+	gomock "go.uber.org/mock/gomock"
+)
+
+// MockRuntimeRecordStore is a mock of RuntimeRecordStore interface.
+type MockRuntimeRecordStore struct {
+	ctrl     *gomock.Controller
+	recorder *MockRuntimeRecordStoreMockRecorder
+	isgomock struct{}
+}
+
+// MockRuntimeRecordStoreMockRecorder is the mock recorder for MockRuntimeRecordStore.
+type MockRuntimeRecordStoreMockRecorder struct {
+	mock *MockRuntimeRecordStore
+}
+
+// NewMockRuntimeRecordStore creates a new mock instance.
+func NewMockRuntimeRecordStore(ctrl *gomock.Controller) *MockRuntimeRecordStore {
+	mock := &MockRuntimeRecordStore{ctrl: ctrl}
+	mock.recorder = &MockRuntimeRecordStoreMockRecorder{mock}
+	return mock
+}
+
+// EXPECT returns an object that allows the caller to indicate expected use.
+func (m *MockRuntimeRecordStore) EXPECT() *MockRuntimeRecordStoreMockRecorder {
+	return m.recorder
+}
+
+// Delete mocks base method.
+func (m *MockRuntimeRecordStore) Delete(ctx context.Context, gameID string) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Delete", ctx, gameID)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// Delete indicates an expected call of Delete.
+func (mr *MockRuntimeRecordStoreMockRecorder) Delete(ctx, gameID any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Delete", reflect.TypeOf((*MockRuntimeRecordStore)(nil).Delete), ctx, gameID)
+}
+
+// Get mocks base method.
+func (m *MockRuntimeRecordStore) Get(ctx context.Context, gameID string) (runtime.RuntimeRecord, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Get", ctx, gameID)
+	ret0, _ := ret[0].(runtime.RuntimeRecord)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// Get indicates an expected call of Get.
+func (mr *MockRuntimeRecordStoreMockRecorder) Get(ctx, gameID any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Get", reflect.TypeOf((*MockRuntimeRecordStore)(nil).Get), ctx, gameID)
+}
+
+// Insert mocks base method.
+func (m *MockRuntimeRecordStore) Insert(ctx context.Context, record runtime.RuntimeRecord) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Insert", ctx, record)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// Insert indicates an expected call of Insert.
+func (mr *MockRuntimeRecordStoreMockRecorder) Insert(ctx, record any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Insert", reflect.TypeOf((*MockRuntimeRecordStore)(nil).Insert), ctx, record)
+}
+
+// List mocks base method.
+func (m *MockRuntimeRecordStore) List(ctx context.Context) ([]runtime.RuntimeRecord, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "List", ctx)
+	ret0, _ := ret[0].([]runtime.RuntimeRecord)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// List indicates an expected call of List.
+func (mr *MockRuntimeRecordStoreMockRecorder) List(ctx any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "List", reflect.TypeOf((*MockRuntimeRecordStore)(nil).List), ctx)
+}
+
+// ListByStatus mocks base method.
+func (m *MockRuntimeRecordStore) ListByStatus(ctx context.Context, status runtime.Status) ([]runtime.RuntimeRecord, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "ListByStatus", ctx, status)
+	ret0, _ := ret[0].([]runtime.RuntimeRecord)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// ListByStatus indicates an expected call of ListByStatus.
+func (mr *MockRuntimeRecordStoreMockRecorder) ListByStatus(ctx, status any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "ListByStatus", reflect.TypeOf((*MockRuntimeRecordStore)(nil).ListByStatus), ctx, status)
+}
+
+// ListDueRunning mocks base method.
+func (m *MockRuntimeRecordStore) ListDueRunning(ctx context.Context, now time.Time) ([]runtime.RuntimeRecord, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "ListDueRunning", ctx, now)
+	ret0, _ := ret[0].([]runtime.RuntimeRecord)
+	ret1, _ := ret[1].(error)
+	return ret0, ret1
+}
+
+// ListDueRunning indicates an expected call of ListDueRunning.
+func (mr *MockRuntimeRecordStoreMockRecorder) ListDueRunning(ctx, now any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "ListDueRunning", reflect.TypeOf((*MockRuntimeRecordStore)(nil).ListDueRunning), ctx, now)
+}
+
+// UpdateEngineHealth mocks base method.
+func (m *MockRuntimeRecordStore) UpdateEngineHealth(ctx context.Context, input ports.UpdateEngineHealthInput) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "UpdateEngineHealth", ctx, input)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// UpdateEngineHealth indicates an expected call of UpdateEngineHealth.
+func (mr *MockRuntimeRecordStoreMockRecorder) UpdateEngineHealth(ctx, input any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "UpdateEngineHealth", reflect.TypeOf((*MockRuntimeRecordStore)(nil).UpdateEngineHealth), ctx, input)
+}
+
+// UpdateImage mocks base method.
+func (m *MockRuntimeRecordStore) UpdateImage(ctx context.Context, input ports.UpdateImageInput) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "UpdateImage", ctx, input)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// UpdateImage indicates an expected call of UpdateImage.
+func (mr *MockRuntimeRecordStoreMockRecorder) UpdateImage(ctx, input any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "UpdateImage", reflect.TypeOf((*MockRuntimeRecordStore)(nil).UpdateImage), ctx, input)
+}
+
+// UpdateScheduling mocks base method.
+func (m *MockRuntimeRecordStore) UpdateScheduling(ctx context.Context, input ports.UpdateSchedulingInput) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "UpdateScheduling", ctx, input)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// UpdateScheduling indicates an expected call of UpdateScheduling.
+func (mr *MockRuntimeRecordStoreMockRecorder) UpdateScheduling(ctx, input any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "UpdateScheduling", reflect.TypeOf((*MockRuntimeRecordStore)(nil).UpdateScheduling), ctx, input)
+}
+
+// UpdateStatus mocks base method.
+func (m *MockRuntimeRecordStore) UpdateStatus(ctx context.Context, input ports.UpdateStatusInput) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "UpdateStatus", ctx, input)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// UpdateStatus indicates an expected call of UpdateStatus.
+func (mr *MockRuntimeRecordStoreMockRecorder) UpdateStatus(ctx, input any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "UpdateStatus", reflect.TypeOf((*MockRuntimeRecordStore)(nil).UpdateStatus), ctx, input)
+}
diff --git a/gamemaster/internal/adapters/mocks/mock_streamoffsetstore.go b/gamemaster/internal/adapters/mocks/mock_streamoffsetstore.go
new file mode 100644
index 0000000..6fdfc3d
--- /dev/null
+++ b/gamemaster/internal/adapters/mocks/mock_streamoffsetstore.go
@@ -0,0 +1,71 @@
+// Code generated by MockGen. DO NOT EDIT.
+// Source: galaxy/gamemaster/internal/ports (interfaces: StreamOffsetStore)
+//
+// Generated by this command:
+//
+//	mockgen -destination=../adapters/mocks/mock_streamoffsetstore.go -package=mocks galaxy/gamemaster/internal/ports StreamOffsetStore
+//
+
+// Package mocks is a generated GoMock package.
+package mocks
+
+import (
+	context "context"
+	reflect "reflect"
+
+	gomock "go.uber.org/mock/gomock"
+)
+
+// MockStreamOffsetStore is a mock of StreamOffsetStore interface.
+type MockStreamOffsetStore struct {
+	ctrl     *gomock.Controller
+	recorder *MockStreamOffsetStoreMockRecorder
+	isgomock struct{}
+}
+
+// MockStreamOffsetStoreMockRecorder is the mock recorder for MockStreamOffsetStore.
+type MockStreamOffsetStoreMockRecorder struct {
+	mock *MockStreamOffsetStore
+}
+
+// NewMockStreamOffsetStore creates a new mock instance.
+func NewMockStreamOffsetStore(ctrl *gomock.Controller) *MockStreamOffsetStore {
+	mock := &MockStreamOffsetStore{ctrl: ctrl}
+	mock.recorder = &MockStreamOffsetStoreMockRecorder{mock}
+	return mock
+}
+
+// EXPECT returns an object that allows the caller to indicate expected use.
+func (m *MockStreamOffsetStore) EXPECT() *MockStreamOffsetStoreMockRecorder {
+	return m.recorder
+}
+
+// Load mocks base method.
+func (m *MockStreamOffsetStore) Load(ctx context.Context, stream string) (string, bool, error) {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Load", ctx, stream)
+	ret0, _ := ret[0].(string)
+	ret1, _ := ret[1].(bool)
+	ret2, _ := ret[2].(error)
+	return ret0, ret1, ret2
+}
+
+// Load indicates an expected call of Load.
+func (mr *MockStreamOffsetStoreMockRecorder) Load(ctx, stream any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Load", reflect.TypeOf((*MockStreamOffsetStore)(nil).Load), ctx, stream)
+}
+
+// Save mocks base method.
+func (m *MockStreamOffsetStore) Save(ctx context.Context, stream, entryID string) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "Save", ctx, stream, entryID)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// Save indicates an expected call of Save.
+func (mr *MockStreamOffsetStoreMockRecorder) Save(ctx, stream, entryID any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Save", reflect.TypeOf((*MockStreamOffsetStore)(nil).Save), ctx, stream, entryID)
+}
diff --git a/gamemaster/internal/adapters/notificationpublisher/publisher.go b/gamemaster/internal/adapters/notificationpublisher/publisher.go
new file mode 100644
index 0000000..7b8eb06
--- /dev/null
+++ b/gamemaster/internal/adapters/notificationpublisher/publisher.go
@@ -0,0 +1,73 @@
+// Package notificationpublisher provides the Redis-Streams-backed
+// notification-intent publisher Game Master uses for the three GM-owned
+// types listed in `gamemaster/README.md §Notification Contracts`:
+// `game.turn.ready`, `game.finished`, `game.generation_failed`.
+//
+// The adapter is a thin shim over `galaxy/notificationintent.Publisher`
+// that drops the entry id at the wrapper boundary; it mirrors
+// `rtmanager/internal/adapters/notificationpublisher` byte-for-byte
+// (`rtmanager/docs/domain-and-ports.md §7` justifies that decision and
+// applies here for the same reason).
+package notificationpublisher
+
+import (
+	"context"
+	"errors"
+	"fmt"
+
+	"github.com/redis/go-redis/v9"
+
+	"galaxy/notificationintent"
+
+	"galaxy/gamemaster/internal/ports"
+)
+
+// Config groups the dependencies and stream name required to construct
+// a Publisher.
+type Config struct {
+	// Client appends entries to Redis Streams. Must be non-nil.
+	Client *redis.Client
+
+	// Stream stores the Redis Stream key intents are published to.
+	// When empty, `notificationintent.DefaultIntentsStream` is used.
+	Stream string
+}
+
+// Publisher implements `ports.NotificationIntentPublisher` on top of
+// the shared `notificationintent.Publisher`.
+type Publisher struct {
+	inner *notificationintent.Publisher
+}
+
+// NewPublisher constructs a Publisher from cfg. Errors from the shared
+// notificationintent constructor are wrapped with adapter context.
+func NewPublisher(cfg Config) (*Publisher, error) {
+	if cfg.Client == nil {
+		return nil, errors.New("new gamemaster notification publisher: nil redis client")
+	}
+	inner, err := notificationintent.NewPublisher(notificationintent.PublisherConfig{
+		Client: cfg.Client,
+		Stream: cfg.Stream,
+	})
+	if err != nil {
+		return nil, fmt.Errorf("new gamemaster notification publisher: %w", err)
+	}
+	return &Publisher{inner: inner}, nil
+}
+
+// Publish forwards intent to the underlying notificationintent
+// publisher and discards the resulting Redis Stream entry id. A failed
+// publish surfaces as the underlying error.
+func (publisher *Publisher) Publish(ctx context.Context, intent notificationintent.Intent) error {
+	if publisher == nil || publisher.inner == nil {
+		return errors.New("publish notification intent: nil publisher")
+	}
+	if _, err := publisher.inner.Publish(ctx, intent); err != nil {
+		return err
+	}
+	return nil
+}
+
+// Compile-time assertion: Publisher implements
+// ports.NotificationIntentPublisher.
+var _ ports.NotificationIntentPublisher = (*Publisher)(nil)
diff --git a/gamemaster/internal/adapters/notificationpublisher/publisher_test.go b/gamemaster/internal/adapters/notificationpublisher/publisher_test.go
new file mode 100644
index 0000000..226e4a4
--- /dev/null
+++ b/gamemaster/internal/adapters/notificationpublisher/publisher_test.go
@@ -0,0 +1,167 @@
+package notificationpublisher
+
+import (
+	"context"
+	"encoding/json"
+	"testing"
+	"time"
+
+	"github.com/alicebob/miniredis/v2"
+	"github.com/redis/go-redis/v9"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+
+	"galaxy/notificationintent"
+)
+
+func newRedis(t *testing.T) (*redis.Client, *miniredis.Miniredis) {
+	t.Helper()
+	server := miniredis.RunT(t)
+	client := redis.NewClient(&redis.Options{Addr: server.Addr()})
+	t.Cleanup(func() { _ = client.Close() })
+	return client, server
+}
+
+func readStream(t *testing.T, client *redis.Client, stream string) []redis.XMessage {
+	t.Helper()
+	messages, err := client.XRange(context.Background(), stream, "-", "+").Result()
+	require.NoError(t, err)
+	return messages
+}
+
+func TestNewPublisherValidation(t *testing.T) {
+	t.Run("nil client", func(t *testing.T) {
+		_, err := NewPublisher(Config{})
+		require.Error(t, err)
+		assert.Contains(t, err.Error(), "nil redis client")
+	})
+}
+
+func TestPublishGameTurnReady(t *testing.T) {
+	client, _ := newRedis(t)
+
+	publisher, err := NewPublisher(Config{Client: client, Stream: "notification:intents"})
+	require.NoError(t, err)
+
+	intent, err := notificationintent.NewGameTurnReadyIntent(
+		notificationintent.Metadata{
+			IdempotencyKey: "gamemaster:turn:game-1:42",
+			OccurredAt:     time.UnixMilli(1714200000000).UTC(),
+		},
+		[]string{"u-2", "u-1"},
+		notificationintent.GameTurnReadyPayload{
+			GameID:     "game-1",
+			GameName:   "Galaxy",
+			TurnNumber: 42,
+		},
+	)
+	require.NoError(t, err)
+	require.NoError(t, publisher.Publish(context.Background(), intent))
+
+	messages := readStream(t, client, "notification:intents")
+	require.Len(t, messages, 1)
+	values := messages[0].Values
+	assert.Equal(t, "game.turn.ready", values["notification_type"])
+	assert.Equal(t, "game_master", values["producer"])
+	assert.Equal(t, "user", values["audience_kind"])
+	assert.Equal(t, "gamemaster:turn:game-1:42", values["idempotency_key"])
+
+	recipients, ok := values["recipient_user_ids_json"].(string)
+	require.True(t, ok)
+	var ids []string
+	require.NoError(t, json.Unmarshal([]byte(recipients), &ids))
+	assert.ElementsMatch(t, []string{"u-1", "u-2"}, ids)
+
+	payloadRaw, ok := values["payload_json"].(string)
+	require.True(t, ok)
+	var payload map[string]any
+	require.NoError(t, json.Unmarshal([]byte(payloadRaw), &payload))
+	assert.Equal(t, "game-1", payload["game_id"])
+	assert.Equal(t, float64(42), payload["turn_number"])
+}
+
+func TestPublishGameFinished(t *testing.T) {
+	client, _ := newRedis(t)
+	publisher, err := NewPublisher(Config{Client: client, Stream: "notification:intents"})
+	require.NoError(t, err)
+
+	intent, err := notificationintent.NewGameFinishedIntent(
+		notificationintent.Metadata{
+			IdempotencyKey: "gamemaster:finished:g-1",
+			OccurredAt:     time.UnixMilli(1714200000000).UTC(),
+		},
+		[]string{"u-1"},
+		notificationintent.GameFinishedPayload{
+			GameID:          "g-1",
+			GameName:        "Galaxy",
+			FinalTurnNumber: 100,
+		},
+	)
+	require.NoError(t, err)
+	require.NoError(t, publisher.Publish(context.Background(), intent))
+
+	messages := readStream(t, client, "notification:intents")
+	require.Len(t, messages, 1)
+	assert.Equal(t, "game.finished", messages[0].Values["notification_type"])
+	assert.Equal(t, "user", messages[0].Values["audience_kind"])
+}
+
+func TestPublishGameGenerationFailed(t *testing.T) {
+	client, _ := newRedis(t)
+	publisher, err := NewPublisher(Config{Client: client, Stream: "notification:intents"})
+	require.NoError(t, err)
+
+	intent, err := notificationintent.NewGameGenerationFailedIntent(
+		notificationintent.Metadata{
+			IdempotencyKey: "gamemaster:gen-failed:g-1:42",
+			OccurredAt:     time.UnixMilli(1714200000000).UTC(),
+		},
+		notificationintent.GameGenerationFailedPayload{
+			GameID:        "g-1",
+			GameName:      "Galaxy",
+			FailureReason: "engine timeout",
+		},
+	)
+	require.NoError(t, err)
+	require.NoError(t, publisher.Publish(context.Background(), intent))
+
+	messages := readStream(t, client, "notification:intents")
+	require.Len(t, messages, 1)
+	values := messages[0].Values
+	assert.Equal(t, "game.generation_failed", values["notification_type"])
+	assert.Equal(t, "admin_email", values["audience_kind"])
+	_, hasRecipients := values["recipient_user_ids_json"]
+	assert.False(t, hasRecipients, "admin_email audience must not carry recipient ids")
+}
+
+func TestPublishForwardsValidationError(t *testing.T) {
+	client, _ := newRedis(t)
+	publisher, err := NewPublisher(Config{Client: client})
+	require.NoError(t, err)
+
+	bad := notificationintent.Intent{
+		NotificationType: notificationintent.NotificationTypeGameTurnReady,
+		Producer:         notificationintent.ProducerGameMaster,
+		AudienceKind:     notificationintent.AudienceKindUser,
+		IdempotencyKey:   "k",
+		PayloadJSON:      `{"game_id":"g","game_name":"x","turn_number":1}`,
+	}
+	require.Error(t, publisher.Publish(context.Background(), bad))
+}
+
+func TestPublishDefaultStream(t *testing.T) {
+	client, _ := newRedis(t)
+	publisher, err := NewPublisher(Config{Client: client, Stream: ""})
+	require.NoError(t, err)
+
+	intent, err := notificationintent.NewGameTurnReadyIntent(
+		notificationintent.Metadata{IdempotencyKey: "k", OccurredAt: time.UnixMilli(1).UTC()},
+		[]string{"u-1"},
+		notificationintent.GameTurnReadyPayload{GameID: "g", GameName: "n", TurnNumber: 1},
+	)
+	require.NoError(t, err)
+	require.NoError(t, publisher.Publish(context.Background(), intent))
+
+	messages := readStream(t, client, notificationintent.DefaultIntentsStream)
+	require.Len(t, messages, 1)
+}
diff --git a/gamemaster/internal/adapters/postgres/engineversionstore/store.go b/gamemaster/internal/adapters/postgres/engineversionstore/store.go
new file mode 100644
index 0000000..eb158d8
--- /dev/null
+++ b/gamemaster/internal/adapters/postgres/engineversionstore/store.go
@@ -0,0 +1,416 @@
+// Package engineversionstore implements the PostgreSQL-backed adapter
+// for `ports.EngineVersionStore`.
+//
+// The package owns the on-disk shape of the `engine_versions` table
+// defined in
+// `galaxy/gamemaster/internal/adapters/postgres/migrations/00001_init.sql`
+// and translates the schema-agnostic `ports.EngineVersionStore`
+// interface declared in `internal/ports/engineversionstore.go` into
+// concrete go-jet/v2 statements driven by the pgx driver.
+//
+// Insert maps PostgreSQL unique violations to engineversion.ErrConflict;
+// Update applies a partial UPDATE driven by the non-nil pointer fields
+// of UpdateEngineVersionInput; Deprecate is idempotent on the
+// already-deprecated row; IsReferencedByActiveRuntime probes the
+// runtime_records table for non-finished references.
+package engineversionstore
+
+import (
+	"context"
+	"database/sql"
+	"errors"
+	"fmt"
+	"strings"
+	"time"
+
+	"galaxy/gamemaster/internal/adapters/postgres/internal/sqlx"
+	pgtable "galaxy/gamemaster/internal/adapters/postgres/jet/gamemaster/table"
+	"galaxy/gamemaster/internal/domain/engineversion"
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/ports"
+
+	pg "github.com/go-jet/jet/v2/postgres"
+)
+
+// emptyOptionsJSON is the default value persisted when a caller hands
+// us an empty Options slice. It matches the SQL column default.
+var emptyOptionsJSON = []byte("{}")
+
+// Config configures one PostgreSQL-backed engine-version store. The
+// store does not own the underlying *sql.DB lifecycle.
+type Config struct {
+	DB               *sql.DB
+	OperationTimeout time.Duration
+}
+
+// Store persists Game Master engine-version registry rows in
+// PostgreSQL.
+type Store struct { + db *sql.DB + operationTimeout time.Duration +} + +// New constructs one PostgreSQL-backed engine-version store from cfg. +func New(cfg Config) (*Store, error) { + if cfg.DB == nil { + return nil, errors.New("new postgres engine version store: db must not be nil") + } + if cfg.OperationTimeout <= 0 { + return nil, errors.New("new postgres engine version store: operation timeout must be positive") + } + return &Store{ + db: cfg.DB, + operationTimeout: cfg.OperationTimeout, + }, nil +} + +// engineVersionSelectColumns matches scanRow's column order. +var engineVersionSelectColumns = pg.ColumnList{ + pgtable.EngineVersions.Version, + pgtable.EngineVersions.ImageRef, + pgtable.EngineVersions.Options, + pgtable.EngineVersions.Status, + pgtable.EngineVersions.CreatedAt, + pgtable.EngineVersions.UpdatedAt, +} + +// Get returns the row identified by version. Returns +// engineversion.ErrNotFound when no row exists. +func (store *Store) Get(ctx context.Context, version string) (engineversion.EngineVersion, error) { + if store == nil || store.db == nil { + return engineversion.EngineVersion{}, errors.New("get engine version: nil store") + } + if strings.TrimSpace(version) == "" { + return engineversion.EngineVersion{}, fmt.Errorf("get engine version: version must not be empty") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "get engine version", store.operationTimeout) + if err != nil { + return engineversion.EngineVersion{}, err + } + defer cancel() + + stmt := pg.SELECT(engineVersionSelectColumns). + FROM(pgtable.EngineVersions). + WHERE(pgtable.EngineVersions.Version.EQ(pg.String(version))) + + query, args := stmt.Sql() + row := store.db.QueryRowContext(operationCtx, query, args...) 
+ got, err := scanRow(row) + if sqlx.IsNoRows(err) { + return engineversion.EngineVersion{}, engineversion.ErrNotFound + } + if err != nil { + return engineversion.EngineVersion{}, fmt.Errorf("get engine version: %w", err) + } + return got, nil +} + +// List returns every row whose status matches statusFilter (when +// non-nil), ordered by version ASC. +func (store *Store) List(ctx context.Context, statusFilter *engineversion.Status) ([]engineversion.EngineVersion, error) { + if store == nil || store.db == nil { + return nil, errors.New("list engine versions: nil store") + } + if statusFilter != nil && !statusFilter.IsKnown() { + return nil, fmt.Errorf("list engine versions: status %q is unsupported", *statusFilter) + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "list engine versions", store.operationTimeout) + if err != nil { + return nil, err + } + defer cancel() + + stmt := pg.SELECT(engineVersionSelectColumns). + FROM(pgtable.EngineVersions) + if statusFilter != nil { + stmt = stmt.WHERE(pgtable.EngineVersions.Status.EQ(pg.String(string(*statusFilter)))) + } + stmt = stmt.ORDER_BY(pgtable.EngineVersions.Version.ASC()) + + query, args := stmt.Sql() + rows, err := store.db.QueryContext(operationCtx, query, args...) + if err != nil { + return nil, fmt.Errorf("list engine versions: %w", err) + } + defer rows.Close() + + versions := make([]engineversion.EngineVersion, 0) + for rows.Next() { + got, err := scanRow(rows) + if err != nil { + return nil, fmt.Errorf("list engine versions: scan: %w", err) + } + versions = append(versions, got) + } + if err := rows.Err(); err != nil { + return nil, fmt.Errorf("list engine versions: %w", err) + } + if len(versions) == 0 { + return nil, nil + } + return versions, nil +} + +// Insert installs record into the registry. Returns +// engineversion.ErrConflict when a row with the same version already +// exists. 
+func (store *Store) Insert(ctx context.Context, record engineversion.EngineVersion) error { + if store == nil || store.db == nil { + return errors.New("insert engine version: nil store") + } + if err := record.Validate(); err != nil { + return fmt.Errorf("insert engine version: %w", err) + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "insert engine version", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + options := record.Options + if len(options) == 0 { + options = emptyOptionsJSON + } + + stmt := pgtable.EngineVersions.INSERT( + pgtable.EngineVersions.Version, + pgtable.EngineVersions.ImageRef, + pgtable.EngineVersions.Options, + pgtable.EngineVersions.Status, + pgtable.EngineVersions.CreatedAt, + pgtable.EngineVersions.UpdatedAt, + ).VALUES( + record.Version, + record.ImageRef, + string(options), + string(record.Status), + record.CreatedAt.UTC(), + record.UpdatedAt.UTC(), + ) + + query, args := stmt.Sql() + if _, err := store.db.ExecContext(operationCtx, query, args...); err != nil { + if sqlx.IsUniqueViolation(err) { + return fmt.Errorf("insert engine version: %w", engineversion.ErrConflict) + } + return fmt.Errorf("insert engine version: %w", err) + } + return nil +} + +// Update applies a partial update to one engine-version row. +// updated_at is always refreshed from input.Now. Returns +// engineversion.ErrNotFound when the row is absent. 
+func (store *Store) Update(ctx context.Context, input ports.UpdateEngineVersionInput) error { + if store == nil || store.db == nil { + return errors.New("update engine version: nil store") + } + if err := input.Validate(); err != nil { + return err + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "update engine version", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + now := input.Now.UTC() + assignments := []any{ + pgtable.EngineVersions.UpdatedAt.SET(pg.TimestampzT(now)), + } + if input.ImageRef != nil { + assignments = append(assignments, + pgtable.EngineVersions.ImageRef.SET(pg.String(*input.ImageRef))) + } + if input.Options != nil { + options := *input.Options + if len(options) == 0 { + options = emptyOptionsJSON + } + assignments = append(assignments, + pgtable.EngineVersions.Options.SET( + pg.StringExp(pg.CAST(pg.String(string(options))).AS("jsonb")), + )) + } + if input.Status != nil { + assignments = append(assignments, + pgtable.EngineVersions.Status.SET(pg.String(string(*input.Status)))) + } + + stmt := pgtable.EngineVersions.UPDATE(pgtable.EngineVersions.UpdatedAt). + SET(assignments[0], assignments[1:]...). + WHERE(pgtable.EngineVersions.Version.EQ(pg.String(input.Version))) + + query, args := stmt.Sql() + result, err := store.db.ExecContext(operationCtx, query, args...) + if err != nil { + return fmt.Errorf("update engine version: %w", err) + } + affected, err := result.RowsAffected() + if err != nil { + return fmt.Errorf("update engine version: rows affected: %w", err) + } + if affected == 0 { + return engineversion.ErrNotFound + } + return nil +} + +// Deprecate sets `status=deprecated` and refreshes `updated_at` for +// version. Returns engineversion.ErrNotFound when no row exists. +// Calling Deprecate on an already deprecated row succeeds with no +// further mutation (idempotent). 
+func (store *Store) Deprecate(ctx context.Context, version string, now time.Time) error { + if store == nil || store.db == nil { + return errors.New("deprecate engine version: nil store") + } + if strings.TrimSpace(version) == "" { + return fmt.Errorf("deprecate engine version: version must not be empty") + } + if now.IsZero() { + return fmt.Errorf("deprecate engine version: now must not be zero") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "deprecate engine version", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + // Read the row first: a missing version surfaces a precise + // ErrNotFound, and an already-deprecated row returns early without + // issuing the UPDATE, so updated_at stays untouched (idempotent). + current, err := store.Get(operationCtx, version) + if err != nil { + return err + } + if current.Status == engineversion.StatusDeprecated { + return nil + } + + stmt := pgtable.EngineVersions.UPDATE(pgtable.EngineVersions.Status). + SET( + pgtable.EngineVersions.Status.SET(pg.String(string(engineversion.StatusDeprecated))), + pgtable.EngineVersions.UpdatedAt.SET(pg.TimestampzT(now.UTC())), + ). + WHERE(pgtable.EngineVersions.Version.EQ(pg.String(version))) + + query, args := stmt.Sql() + if _, err := store.db.ExecContext(operationCtx, query, args...); err != nil { + return fmt.Errorf("deprecate engine version: %w", err) + } + return nil +} + +// Delete removes the row identified by version. Returns +// engineversion.ErrNotFound when no row matches. The adapter does not +// inspect runtime_records; the service layer guards against active +// references through IsReferencedByActiveRuntime before issuing Delete.
+func (store *Store) Delete(ctx context.Context, version string) error { + if store == nil || store.db == nil { + return errors.New("delete engine version: nil store") + } + if strings.TrimSpace(version) == "" { + return fmt.Errorf("delete engine version: version must not be empty") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "delete engine version", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + stmt := pgtable.EngineVersions.DELETE(). + WHERE(pgtable.EngineVersions.Version.EQ(pg.String(version))) + + query, args := stmt.Sql() + result, err := store.db.ExecContext(operationCtx, query, args...) + if err != nil { + return fmt.Errorf("delete engine version: %w", err) + } + affected, err := result.RowsAffected() + if err != nil { + return fmt.Errorf("delete engine version: rows affected: %w", err) + } + if affected == 0 { + return engineversion.ErrNotFound + } + return nil +} + +// IsReferencedByActiveRuntime reports whether any non-finished and +// non-stopped runtime row currently references version through +// `current_engine_version`. +func (store *Store) IsReferencedByActiveRuntime(ctx context.Context, version string) (bool, error) { + if store == nil || store.db == nil { + return false, errors.New("is referenced by active runtime: nil store") + } + if strings.TrimSpace(version) == "" { + return false, fmt.Errorf("is referenced by active runtime: version must not be empty") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "is referenced by active runtime", store.operationTimeout) + if err != nil { + return false, err + } + defer cancel() + + stmt := pg.SELECT(pg.Int32(1).AS("present")). + FROM(pgtable.RuntimeRecords). + WHERE(pg.AND( + pgtable.RuntimeRecords.CurrentEngineVersion.EQ(pg.String(version)), + pgtable.RuntimeRecords.Status.NOT_IN( + pg.String(string(runtime.StatusFinished)), + pg.String(string(runtime.StatusStopped)), + ), + )). 
+ LIMIT(1) + + query, args := stmt.Sql() + row := store.db.QueryRowContext(operationCtx, query, args...) + var present int32 + if err := row.Scan(&present); err != nil { + if sqlx.IsNoRows(err) { + return false, nil + } + return false, fmt.Errorf("is referenced by active runtime: %w", err) + } + return true, nil +} + +// rowScanner abstracts *sql.Row and *sql.Rows so scanRow can be shared +// across single-row and iterated reads. +type rowScanner interface { + Scan(dest ...any) error +} + +// scanRow scans one engine_versions row from rs. +func scanRow(rs rowScanner) (engineversion.EngineVersion, error) { + var ( + version string + imageRef string + options string + status string + createdAt time.Time + updatedAt time.Time + ) + if err := rs.Scan(&version, &imageRef, &options, &status, &createdAt, &updatedAt); err != nil { + return engineversion.EngineVersion{}, err + } + return engineversion.EngineVersion{ + Version: version, + ImageRef: imageRef, + Options: []byte(options), + Status: engineversion.Status(status), + CreatedAt: createdAt.UTC(), + UpdatedAt: updatedAt.UTC(), + }, nil +} + +// Ensure Store satisfies the ports.EngineVersionStore interface at +// compile time. 
+var _ ports.EngineVersionStore = (*Store)(nil) diff --git a/gamemaster/internal/adapters/postgres/engineversionstore/store_test.go b/gamemaster/internal/adapters/postgres/engineversionstore/store_test.go new file mode 100644 index 0000000..e66e462 --- /dev/null +++ b/gamemaster/internal/adapters/postgres/engineversionstore/store_test.go @@ -0,0 +1,403 @@ +package engineversionstore_test + +import ( + "context" + "database/sql" + "errors" + "testing" + "time" + + "galaxy/gamemaster/internal/adapters/postgres/engineversionstore" + "galaxy/gamemaster/internal/adapters/postgres/internal/pgtest" + "galaxy/gamemaster/internal/domain/engineversion" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/ports" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestMain(m *testing.M) { pgtest.RunMain(m) } + +func newStore(t *testing.T) *engineversionstore.Store { + t.Helper() + pgtest.TruncateAll(t) + store, err := engineversionstore.New(engineversionstore.Config{ + DB: pgtest.Ensure(t).Pool(), + OperationTimeout: pgtest.OperationTimeout, + }) + require.NoError(t, err) + return store +} + +// poolOnly returns the shared pool for tests that have to seed +// runtime_records directly (e.g. TestIsReferencedByActiveRuntime). 
+func poolOnly(t *testing.T) *sql.DB { + t.Helper() + pgtest.TruncateAll(t) + return pgtest.Ensure(t).Pool() +} + +func validVersion(version string, createdAt time.Time, status engineversion.Status) engineversion.EngineVersion { + return engineversion.EngineVersion{ + Version: version, + ImageRef: "ghcr.io/galaxy/game:" + version, + Options: []byte(`{"max_planets":120}`), + Status: status, + CreatedAt: createdAt, + UpdatedAt: createdAt, + } +} + +func TestNewRejectsInvalidConfig(t *testing.T) { + _, err := engineversionstore.New(engineversionstore.Config{}) + require.Error(t, err) + + store, err := engineversionstore.New(engineversionstore.Config{ + DB: pgtest.Ensure(t).Pool(), + OperationTimeout: 0, + }) + require.Error(t, err) + require.Nil(t, store) +} + +func TestInsertGetRoundTrip(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + record := validVersion("v1.2.3", now, engineversion.StatusActive) + + require.NoError(t, store.Insert(ctx, record)) + + got, err := store.Get(ctx, "v1.2.3") + require.NoError(t, err) + assert.Equal(t, record.Version, got.Version) + assert.Equal(t, record.ImageRef, got.ImageRef) + assert.JSONEq(t, `{"max_planets":120}`, string(got.Options)) + assert.Equal(t, engineversion.StatusActive, got.Status) + assert.True(t, got.CreatedAt.Equal(now)) + assert.True(t, got.UpdatedAt.Equal(now)) + assert.Equal(t, time.UTC, got.CreatedAt.Location()) + assert.Equal(t, time.UTC, got.UpdatedAt.Location()) +} + +func TestInsertEmptyOptionsDefaultsToObject(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + record := validVersion("v1.2.3", now, engineversion.StatusActive) + record.Options = nil + + require.NoError(t, store.Insert(ctx, record)) + + got, err := store.Get(ctx, "v1.2.3") + require.NoError(t, err) + assert.JSONEq(t, `{}`, string(got.Options)) +} + +func TestInsertConflict(t 
*testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + record := validVersion("v1.2.3", now, engineversion.StatusActive) + require.NoError(t, store.Insert(ctx, record)) + + err := store.Insert(ctx, record) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrConflict), "want ErrConflict, got %v", err) +} + +func TestGetNotFound(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.Get(ctx, "v9.9.9") + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrNotFound)) +} + +func TestListNoFilter(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.0", now, engineversion.StatusDeprecated))) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.3", now, engineversion.StatusActive))) + require.NoError(t, store.Insert(ctx, validVersion("v1.3.0", now, engineversion.StatusActive))) + + all, err := store.List(ctx, nil) + require.NoError(t, err) + require.Len(t, all, 3) + assert.Equal(t, "v1.2.0", all[0].Version) + assert.Equal(t, "v1.2.3", all[1].Version) + assert.Equal(t, "v1.3.0", all[2].Version) +} + +func TestListByStatusFilter(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.0", now, engineversion.StatusDeprecated))) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.3", now, engineversion.StatusActive))) + require.NoError(t, store.Insert(ctx, validVersion("v1.3.0", now, engineversion.StatusActive))) + + active := engineversion.StatusActive + got, err := store.List(ctx, &active) + require.NoError(t, err) + require.Len(t, got, 2) + assert.Equal(t, "v1.2.3", got[0].Version) + assert.Equal(t, "v1.3.0", got[1].Version) + + deprecated 
:= engineversion.StatusDeprecated + got, err = store.List(ctx, &deprecated) + require.NoError(t, err) + require.Len(t, got, 1) + assert.Equal(t, "v1.2.0", got[0].Version) +} + +func TestListUnknownStatusRejected(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + exotic := engineversion.Status("exotic") + _, err := store.List(ctx, &exotic) + require.Error(t, err) +} + +func TestUpdateImageRefOnly(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.3", now, engineversion.StatusActive))) + + newRef := "ghcr.io/galaxy/game:v1.2.4" + updateAt := now.Add(time.Minute) + require.NoError(t, store.Update(ctx, ports.UpdateEngineVersionInput{ + Version: "v1.2.3", + ImageRef: &newRef, + Now: updateAt, + })) + + got, err := store.Get(ctx, "v1.2.3") + require.NoError(t, err) + assert.Equal(t, newRef, got.ImageRef) + assert.Equal(t, engineversion.StatusActive, got.Status) + assert.True(t, got.UpdatedAt.Equal(updateAt)) +} + +func TestUpdateAllFields(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.3", now, engineversion.StatusActive))) + + newRef := "ghcr.io/galaxy/game:v1.2.4" + newOptions := []byte(`{"max_planets":240,"hot_seat":true}`) + deprecated := engineversion.StatusDeprecated + updateAt := now.Add(time.Minute) + require.NoError(t, store.Update(ctx, ports.UpdateEngineVersionInput{ + Version: "v1.2.3", + ImageRef: &newRef, + Options: &newOptions, + Status: &deprecated, + Now: updateAt, + })) + + got, err := store.Get(ctx, "v1.2.3") + require.NoError(t, err) + assert.Equal(t, newRef, got.ImageRef) + assert.JSONEq(t, string(newOptions), string(got.Options)) + assert.Equal(t, engineversion.StatusDeprecated, got.Status) + assert.True(t, got.UpdatedAt.Equal(updateAt)) +} + 
+func TestUpdateNotFound(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + newRef := "ghcr.io/galaxy/game:v1.2.4" + err := store.Update(ctx, ports.UpdateEngineVersionInput{ + Version: "v9.9.9", + ImageRef: &newRef, + Now: time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC), + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrNotFound)) +} + +func TestDeprecateHappy(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.3", now, engineversion.StatusActive))) + + deprecateAt := now.Add(time.Hour) + require.NoError(t, store.Deprecate(ctx, "v1.2.3", deprecateAt)) + + got, err := store.Get(ctx, "v1.2.3") + require.NoError(t, err) + assert.Equal(t, engineversion.StatusDeprecated, got.Status) + assert.True(t, got.UpdatedAt.Equal(deprecateAt)) +} + +func TestDeprecateIdempotent(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.3", now, engineversion.StatusDeprecated))) + + require.NoError(t, store.Deprecate(ctx, "v1.2.3", now.Add(time.Hour))) + + got, err := store.Get(ctx, "v1.2.3") + require.NoError(t, err) + assert.Equal(t, engineversion.StatusDeprecated, got.Status) + // updated_at must remain at the original insert value because the + // idempotent path performs no UPDATE. 
+ assert.True(t, got.UpdatedAt.Equal(now)) +} + +func TestDeprecateNotFound(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.Deprecate(ctx, "v9.9.9", time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC)) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrNotFound)) +} + +func TestDeprecateRejectsZeroNow(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.Deprecate(ctx, "v1.2.3", time.Time{}) + require.Error(t, err) +} + +func TestDeleteHappy(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.3", now, engineversion.StatusActive))) + + require.NoError(t, store.Delete(ctx, "v1.2.3")) + + _, err := store.Get(ctx, "v1.2.3") + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrNotFound)) +} + +func TestDeleteNotFound(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.Delete(ctx, "v9.9.9") + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrNotFound)) +} + +func TestDeleteRejectsEmptyVersion(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.Delete(ctx, "") + require.Error(t, err) +} + +// TestIsReferencedByActiveRuntime exercises the join between +// engine_versions and runtime_records. The runtime rows are seeded by +// inserting directly through the shared pool, since the +// runtimerecordstore adapter lives in a sibling package. 
+func TestIsReferencedByActiveRuntime(t *testing.T) { + ctx := context.Background() + pool := poolOnly(t) + store, err := engineversionstore.New(engineversionstore.Config{ + DB: pool, + OperationTimeout: pgtest.OperationTimeout, + }) + require.NoError(t, err) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.3", now, engineversion.StatusActive))) + require.NoError(t, store.Insert(ctx, validVersion("v1.2.4", now, engineversion.StatusActive))) + + insertRuntime(t, pool, "game-running", runtime.StatusRunning, "v1.2.3", now) + insertRuntime(t, pool, "game-finished", runtime.StatusFinished, "v1.2.3", now) + insertRuntime(t, pool, "game-stopped", runtime.StatusStopped, "v1.2.3", now) + + used, err := store.IsReferencedByActiveRuntime(ctx, "v1.2.3") + require.NoError(t, err) + assert.True(t, used, "v1.2.3 must be reported referenced (game-running uses it)") + + unused, err := store.IsReferencedByActiveRuntime(ctx, "v1.2.4") + require.NoError(t, err) + assert.False(t, unused, "v1.2.4 has no active runtime reference") + + missing, err := store.IsReferencedByActiveRuntime(ctx, "v9.9.9") + require.NoError(t, err) + assert.False(t, missing) +} + +// insertRuntime seeds one runtime_records row directly via raw SQL. The +// adapter under test is engineversionstore; using the runtimerecordstore +// here would couple two adapter test suites unnecessarily. 
+func insertRuntime(t *testing.T, pool *sql.DB, gameID string, status runtime.Status, engineVersion string, createdAt time.Time) { + t.Helper() + at := createdAt.UTC() + var stoppedAt, finishedAt any + switch status { + case runtime.StatusStopped: + stoppedAt = at + case runtime.StatusFinished: + finishedAt = at + } + const stmt = ` +INSERT INTO runtime_records ( + game_id, status, engine_endpoint, current_image_ref, + current_engine_version, turn_schedule, current_turn, + next_generation_at, skip_next_tick, engine_health, + created_at, updated_at, started_at, stopped_at, finished_at +) VALUES ( + $1, $2, 'http://galaxy-game-' || $1 || ':8080', 'ghcr.io/galaxy/game:' || $3, + $3, '0 18 * * *', 0, + NULL, false, '', + $4, $5, $6, $7, $8 +)` + _, err := pool.ExecContext(context.Background(), stmt, + gameID, string(status), engineVersion, + at, at, at, stoppedAt, finishedAt, + ) + require.NoError(t, err) +} + +func TestIsReferencedRejectsEmptyVersion(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.IsReferencedByActiveRuntime(ctx, "") + require.Error(t, err) +} + +func TestGetRejectsEmpty(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.Get(ctx, "") + require.Error(t, err) +} + +func TestUpdateRejectsInvalidInput(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.Update(ctx, ports.UpdateEngineVersionInput{Version: "v1.2.3"}) + require.Error(t, err) +} diff --git a/gamemaster/internal/adapters/postgres/internal/pgtest/pgtest.go b/gamemaster/internal/adapters/postgres/internal/pgtest/pgtest.go new file mode 100644 index 0000000..aa04a8d --- /dev/null +++ b/gamemaster/internal/adapters/postgres/internal/pgtest/pgtest.go @@ -0,0 +1,211 @@ +// Package pgtest exposes the testcontainers-backed PostgreSQL bootstrap +// shared by every Game Master PG adapter test. 
The package is regular +// Go code — not a `_test.go` file — so it can be imported by the +// `_test.go` files in the four sibling store packages +// (`runtimerecordstore`, `engineversionstore`, `playermappingstore`, +// `operationlog`). +// +// No production code in `cmd/gamemaster` or in the runtime imports this +// package. The testcontainers-go dependency therefore stays out of the +// production binary's import graph. +package pgtest + +import ( + "context" + "database/sql" + "net/url" + "os" + "sync" + "testing" + "time" + + "galaxy/postgres" + + "galaxy/gamemaster/internal/adapters/postgres/migrations" + + testcontainers "github.com/testcontainers/testcontainers-go" + tcpostgres "github.com/testcontainers/testcontainers-go/modules/postgres" + "github.com/testcontainers/testcontainers-go/wait" +) + +const ( + postgresImage = "postgres:16-alpine" + superUser = "galaxy" + superPassword = "galaxy" + superDatabase = "galaxy_gamemaster" + serviceRole = "gamemasterservice" + servicePassword = "gamemasterservice" + serviceSchema = "gamemaster" + containerStartup = 90 * time.Second + + // OperationTimeout is the per-statement timeout used by every store + // constructed via the per-package newStore helpers. Tests may pass a + // smaller value if they need to assert deadline behaviour explicitly. + OperationTimeout = 10 * time.Second +) + +// Env holds the per-process container plus the *sql.DB pool already +// provisioned with the gamemaster schema, role, and migrations applied. +type Env struct { + container *tcpostgres.PostgresContainer + pool *sql.DB +} + +// Pool returns the shared pool. Tests truncate per-table state before +// each run via TruncateAll. +func (env *Env) Pool() *sql.DB { return env.pool } + +var ( + once sync.Once + cur *Env + curEr error +) + +// Ensure starts the PostgreSQL container on first invocation and applies +// the embedded goose migrations. Subsequent invocations reuse the same +// container/pool. 
When Docker is unavailable Ensure calls t.Skip with the +// underlying error so the test suite still passes on machines without +// Docker. +func Ensure(t testing.TB) *Env { + t.Helper() + once.Do(func() { + cur, curEr = start() + }) + if curEr != nil { + t.Skipf("postgres container start failed (Docker unavailable?): %v", curEr) + } + return cur +} + +// TruncateAll wipes every Game Master table inside the shared pool, +// leaving the schema and indexes intact. Use it from each test that +// needs a clean slate. +func TruncateAll(t testing.TB) { + t.Helper() + env := Ensure(t) + const stmt = `TRUNCATE TABLE runtime_records, engine_versions, player_mappings, operation_log RESTART IDENTITY CASCADE` + if _, err := env.pool.ExecContext(context.Background(), stmt); err != nil { + t.Fatalf("truncate gamemaster tables: %v", err) + } +} + +// Shutdown terminates the shared container and closes the pool. It is +// invoked from each test package's TestMain after `m.Run` returns so the +// container is released even if individual tests panic. +func Shutdown() { + if cur == nil { + return + } + if cur.pool != nil { + _ = cur.pool.Close() + } + if cur.container != nil { + _ = testcontainers.TerminateContainer(cur.container) + } + cur = nil +} + +// RunMain is a convenience helper for each store package's TestMain: it +// runs the test main, captures the exit code, shuts the container down, +// and exits. Wiring it through one helper keeps every TestMain to two +// lines. +func RunMain(m *testing.M) { + code := m.Run() + Shutdown() + os.Exit(code) +} + +func start() (*Env, error) { + ctx := context.Background() + container, err := tcpostgres.Run(ctx, postgresImage, + tcpostgres.WithDatabase(superDatabase), + tcpostgres.WithUsername(superUser), + tcpostgres.WithPassword(superPassword), + testcontainers.WithWaitStrategy( + wait.ForLog("database system is ready to accept connections"). + WithOccurrence(2). 
+            WithStartupTimeout(containerStartup),
+        ),
+    )
+    if err != nil {
+        return nil, err
+    }
+    baseDSN, err := container.ConnectionString(ctx, "sslmode=disable")
+    if err != nil {
+        _ = testcontainers.TerminateContainer(container)
+        return nil, err
+    }
+    if err := provisionRoleAndSchema(ctx, baseDSN); err != nil {
+        _ = testcontainers.TerminateContainer(container)
+        return nil, err
+    }
+    scopedDSN, err := dsnForServiceRole(baseDSN)
+    if err != nil {
+        _ = testcontainers.TerminateContainer(container)
+        return nil, err
+    }
+    cfg := postgres.DefaultConfig()
+    cfg.PrimaryDSN = scopedDSN
+    cfg.OperationTimeout = OperationTimeout
+    pool, err := postgres.OpenPrimary(ctx, cfg)
+    if err != nil {
+        _ = testcontainers.TerminateContainer(container)
+        return nil, err
+    }
+    if err := postgres.Ping(ctx, pool, OperationTimeout); err != nil {
+        _ = pool.Close()
+        _ = testcontainers.TerminateContainer(container)
+        return nil, err
+    }
+    if err := postgres.RunMigrations(ctx, pool, migrations.FS(), "."); err != nil {
+        _ = pool.Close()
+        _ = testcontainers.TerminateContainer(container)
+        return nil, err
+    }
+    return &Env{container: container, pool: pool}, nil
+}
+
+func provisionRoleAndSchema(ctx context.Context, baseDSN string) error {
+    cfg := postgres.DefaultConfig()
+    cfg.PrimaryDSN = baseDSN
+    cfg.OperationTimeout = OperationTimeout
+    db, err := postgres.OpenPrimary(ctx, cfg)
+    if err != nil {
+        return err
+    }
+    defer func() { _ = db.Close() }()
+
+    statements := []string{
+        `DO $$ BEGIN
+            IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'gamemasterservice') THEN
+                CREATE ROLE gamemasterservice LOGIN PASSWORD 'gamemasterservice';
+            END IF;
+        END $$;`,
+        `CREATE SCHEMA IF NOT EXISTS gamemaster AUTHORIZATION gamemasterservice;`,
+        `GRANT USAGE ON SCHEMA gamemaster TO gamemasterservice;`,
+    }
+    for _, statement := range statements {
+        if _, err := db.ExecContext(ctx, statement); err != nil {
+            return err
+        }
+    }
+    return nil
+}
+
+func dsnForServiceRole(baseDSN string) (string, error) {
+    parsed, err := url.Parse(baseDSN)
+    if err != nil {
+        return "", err
+    }
+    values := url.Values{}
+    values.Set("search_path", serviceSchema)
+    values.Set("sslmode", "disable")
+    scoped := url.URL{
+        Scheme:   parsed.Scheme,
+        User:     url.UserPassword(serviceRole, servicePassword),
+        Host:     parsed.Host,
+        Path:     parsed.Path,
+        RawQuery: values.Encode(),
+    }
+    return scoped.String(), nil
+}
diff --git a/gamemaster/internal/adapters/postgres/internal/sqlx/sqlx.go b/gamemaster/internal/adapters/postgres/internal/sqlx/sqlx.go
new file mode 100644
index 0000000..966250c
--- /dev/null
+++ b/gamemaster/internal/adapters/postgres/internal/sqlx/sqlx.go
@@ -0,0 +1,111 @@
+// Package sqlx contains the small set of helpers shared by every Game
+// Master PostgreSQL adapter (runtimerecordstore, engineversionstore,
+// playermappingstore, operationlog). The helpers centralise the
+// boundary translations for nullable timestamps and the pgx SQLSTATE
+// codes the adapters interpret as domain conflicts.
+package sqlx
+
+import (
+    "context"
+    "database/sql"
+    "errors"
+    "fmt"
+    "time"
+
+    "github.com/jackc/pgx/v5/pgconn"
+)
+
+// PgUniqueViolationCode identifies the SQLSTATE returned by PostgreSQL
+// when a UNIQUE constraint is violated by INSERT or UPDATE.
+const PgUniqueViolationCode = "23505"
+
+// IsUniqueViolation reports whether err is a PostgreSQL unique-violation,
+// regardless of constraint name.
+func IsUniqueViolation(err error) bool {
+    var pgErr *pgconn.PgError
+    if !errors.As(err, &pgErr) {
+        return false
+    }
+    return pgErr.Code == PgUniqueViolationCode
+}
+
+// IsNoRows reports whether err is sql.ErrNoRows.
+func IsNoRows(err error) bool {
+    return errors.Is(err, sql.ErrNoRows)
+}
+
+// NullableTime returns t.UTC() when non-zero, otherwise nil so the column
+// is bound as SQL NULL.
+func NullableTime(t time.Time) any {
+    if t.IsZero() {
+        return nil
+    }
+    return t.UTC()
+}
+
+// NullableTimePtr returns t.UTC() when t is non-nil and non-zero,
+// otherwise nil. Companion of NullableTime for domain types that use
+// *time.Time to express absent timestamps.
+func NullableTimePtr(t *time.Time) any {
+    if t == nil {
+        return nil
+    }
+    return NullableTime(*t)
+}
+
+// NullableString returns value when non-empty, otherwise nil so the
+// column is bound as SQL NULL.
+func NullableString(value string) any {
+    if value == "" {
+        return nil
+    }
+    return value
+}
+
+// StringFromNullable copies an optional sql.NullString into a domain
+// string. NULL becomes the empty string, matching the Game Master
+// domain convention that empty == NULL for nullable text columns.
+func StringFromNullable(value sql.NullString) string {
+    if !value.Valid {
+        return ""
+    }
+    return value.String
+}
+
+// TimeFromNullable copies an optional sql.NullTime into a domain
+// time.Time, applying the global UTC normalisation rule. NULL values
+// become the zero time.Time.
+func TimeFromNullable(value sql.NullTime) time.Time {
+    if !value.Valid {
+        return time.Time{}
+    }
+    return value.Time.UTC()
+}
+
+// TimePtrFromNullable copies an optional sql.NullTime into a domain
+// *time.Time. NULL becomes nil; non-NULL values are wrapped after UTC
+// normalisation.
+func TimePtrFromNullable(value sql.NullTime) *time.Time {
+    if !value.Valid {
+        return nil
+    }
+    t := value.Time.UTC()
+    return &t
+}
+
+// WithTimeout derives a child context bounded by timeout and prefixes
+// context errors with operation. Callers must always invoke the returned
+// cancel.
+func WithTimeout(ctx context.Context, operation string, timeout time.Duration) (context.Context, context.CancelFunc, error) {
+    if ctx == nil {
+        return nil, nil, fmt.Errorf("%s: nil context", operation)
+    }
+    if err := ctx.Err(); err != nil {
+        return nil, nil, fmt.Errorf("%s: %w", operation, err)
+    }
+    if timeout <= 0 {
+        return nil, nil, fmt.Errorf("%s: operation timeout must be positive", operation)
+    }
+    bounded, cancel := context.WithTimeout(ctx, timeout)
+    return bounded, cancel, nil
+}
diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/model/engine_versions.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/model/engine_versions.go
new file mode 100644
index 0000000..40a081e
--- /dev/null
+++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/model/engine_versions.go
@@ -0,0 +1,21 @@
+//
+// Code generated by go-jet DO NOT EDIT.
+//
+// WARNING: Changes to this file may cause incorrect behavior
+// and will be lost if the code is regenerated
+//
+
+package model
+
+import (
+    "time"
+)
+
+type EngineVersions struct {
+    Version   string `sql:"primary_key"`
+    ImageRef  string
+    Options   string
+    Status    string
+    CreatedAt time.Time
+    UpdatedAt time.Time
+}
diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/model/goose_db_version.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/model/goose_db_version.go
new file mode 100644
index 0000000..c7f68e8
--- /dev/null
+++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/model/goose_db_version.go
@@ -0,0 +1,19 @@
+//
+// Code generated by go-jet DO NOT EDIT.
+// +// WARNING: Changes to this file may cause incorrect behavior +// and will be lost if the code is regenerated +// + +package model + +import ( + "time" +) + +type GooseDbVersion struct { + ID int32 `sql:"primary_key"` + VersionID int64 + IsApplied bool + Tstamp time.Time +} diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/model/operation_log.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/model/operation_log.go new file mode 100644 index 0000000..459bfab --- /dev/null +++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/model/operation_log.go @@ -0,0 +1,25 @@ +// +// Code generated by go-jet DO NOT EDIT. +// +// WARNING: Changes to this file may cause incorrect behavior +// and will be lost if the code is regenerated +// + +package model + +import ( + "time" +) + +type OperationLog struct { + ID int64 `sql:"primary_key"` + GameID string + OpKind string + OpSource string + SourceRef string + Outcome string + ErrorCode string + ErrorMessage string + StartedAt time.Time + FinishedAt *time.Time +} diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/model/player_mappings.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/model/player_mappings.go new file mode 100644 index 0000000..780e412 --- /dev/null +++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/model/player_mappings.go @@ -0,0 +1,20 @@ +// +// Code generated by go-jet DO NOT EDIT. 
+// +// WARNING: Changes to this file may cause incorrect behavior +// and will be lost if the code is regenerated +// + +package model + +import ( + "time" +) + +type PlayerMappings struct { + GameID string `sql:"primary_key"` + UserID string `sql:"primary_key"` + RaceName string + EnginePlayerUUID string + CreatedAt time.Time +} diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/model/runtime_records.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/model/runtime_records.go new file mode 100644 index 0000000..209ad9b --- /dev/null +++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/model/runtime_records.go @@ -0,0 +1,30 @@ +// +// Code generated by go-jet DO NOT EDIT. +// +// WARNING: Changes to this file may cause incorrect behavior +// and will be lost if the code is regenerated +// + +package model + +import ( + "time" +) + +type RuntimeRecords struct { + GameID string `sql:"primary_key"` + Status string + EngineEndpoint string + CurrentImageRef string + CurrentEngineVersion string + TurnSchedule string + CurrentTurn int32 + NextGenerationAt *time.Time + SkipNextTick bool + EngineHealth string + CreatedAt time.Time + UpdatedAt time.Time + StartedAt *time.Time + StoppedAt *time.Time + FinishedAt *time.Time +} diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/table/engine_versions.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/engine_versions.go new file mode 100644 index 0000000..bdd582e --- /dev/null +++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/engine_versions.go @@ -0,0 +1,93 @@ +// +// Code generated by go-jet DO NOT EDIT. 
+// +// WARNING: Changes to this file may cause incorrect behavior +// and will be lost if the code is regenerated +// + +package table + +import ( + "github.com/go-jet/jet/v2/postgres" +) + +var EngineVersions = newEngineVersionsTable("gamemaster", "engine_versions", "") + +type engineVersionsTable struct { + postgres.Table + + // Columns + Version postgres.ColumnString + ImageRef postgres.ColumnString + Options postgres.ColumnString + Status postgres.ColumnString + CreatedAt postgres.ColumnTimestampz + UpdatedAt postgres.ColumnTimestampz + + AllColumns postgres.ColumnList + MutableColumns postgres.ColumnList + DefaultColumns postgres.ColumnList +} + +type EngineVersionsTable struct { + engineVersionsTable + + EXCLUDED engineVersionsTable +} + +// AS creates new EngineVersionsTable with assigned alias +func (a EngineVersionsTable) AS(alias string) *EngineVersionsTable { + return newEngineVersionsTable(a.SchemaName(), a.TableName(), alias) +} + +// Schema creates new EngineVersionsTable with assigned schema name +func (a EngineVersionsTable) FromSchema(schemaName string) *EngineVersionsTable { + return newEngineVersionsTable(schemaName, a.TableName(), a.Alias()) +} + +// WithPrefix creates new EngineVersionsTable with assigned table prefix +func (a EngineVersionsTable) WithPrefix(prefix string) *EngineVersionsTable { + return newEngineVersionsTable(a.SchemaName(), prefix+a.TableName(), a.TableName()) +} + +// WithSuffix creates new EngineVersionsTable with assigned table suffix +func (a EngineVersionsTable) WithSuffix(suffix string) *EngineVersionsTable { + return newEngineVersionsTable(a.SchemaName(), a.TableName()+suffix, a.TableName()) +} + +func newEngineVersionsTable(schemaName, tableName, alias string) *EngineVersionsTable { + return &EngineVersionsTable{ + engineVersionsTable: newEngineVersionsTableImpl(schemaName, tableName, alias), + EXCLUDED: newEngineVersionsTableImpl("", "excluded", ""), + } +} + +func newEngineVersionsTableImpl(schemaName, tableName, 
alias string) engineVersionsTable { + var ( + VersionColumn = postgres.StringColumn("version") + ImageRefColumn = postgres.StringColumn("image_ref") + OptionsColumn = postgres.StringColumn("options") + StatusColumn = postgres.StringColumn("status") + CreatedAtColumn = postgres.TimestampzColumn("created_at") + UpdatedAtColumn = postgres.TimestampzColumn("updated_at") + allColumns = postgres.ColumnList{VersionColumn, ImageRefColumn, OptionsColumn, StatusColumn, CreatedAtColumn, UpdatedAtColumn} + mutableColumns = postgres.ColumnList{ImageRefColumn, OptionsColumn, StatusColumn, CreatedAtColumn, UpdatedAtColumn} + defaultColumns = postgres.ColumnList{OptionsColumn} + ) + + return engineVersionsTable{ + Table: postgres.NewTable(schemaName, tableName, alias, allColumns...), + + //Columns + Version: VersionColumn, + ImageRef: ImageRefColumn, + Options: OptionsColumn, + Status: StatusColumn, + CreatedAt: CreatedAtColumn, + UpdatedAt: UpdatedAtColumn, + + AllColumns: allColumns, + MutableColumns: mutableColumns, + DefaultColumns: defaultColumns, + } +} diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/table/goose_db_version.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/goose_db_version.go new file mode 100644 index 0000000..c4520e5 --- /dev/null +++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/goose_db_version.go @@ -0,0 +1,87 @@ +// +// Code generated by go-jet DO NOT EDIT. 
+// +// WARNING: Changes to this file may cause incorrect behavior +// and will be lost if the code is regenerated +// + +package table + +import ( + "github.com/go-jet/jet/v2/postgres" +) + +var GooseDbVersion = newGooseDbVersionTable("gamemaster", "goose_db_version", "") + +type gooseDbVersionTable struct { + postgres.Table + + // Columns + ID postgres.ColumnInteger + VersionID postgres.ColumnInteger + IsApplied postgres.ColumnBool + Tstamp postgres.ColumnTimestamp + + AllColumns postgres.ColumnList + MutableColumns postgres.ColumnList + DefaultColumns postgres.ColumnList +} + +type GooseDbVersionTable struct { + gooseDbVersionTable + + EXCLUDED gooseDbVersionTable +} + +// AS creates new GooseDbVersionTable with assigned alias +func (a GooseDbVersionTable) AS(alias string) *GooseDbVersionTable { + return newGooseDbVersionTable(a.SchemaName(), a.TableName(), alias) +} + +// Schema creates new GooseDbVersionTable with assigned schema name +func (a GooseDbVersionTable) FromSchema(schemaName string) *GooseDbVersionTable { + return newGooseDbVersionTable(schemaName, a.TableName(), a.Alias()) +} + +// WithPrefix creates new GooseDbVersionTable with assigned table prefix +func (a GooseDbVersionTable) WithPrefix(prefix string) *GooseDbVersionTable { + return newGooseDbVersionTable(a.SchemaName(), prefix+a.TableName(), a.TableName()) +} + +// WithSuffix creates new GooseDbVersionTable with assigned table suffix +func (a GooseDbVersionTable) WithSuffix(suffix string) *GooseDbVersionTable { + return newGooseDbVersionTable(a.SchemaName(), a.TableName()+suffix, a.TableName()) +} + +func newGooseDbVersionTable(schemaName, tableName, alias string) *GooseDbVersionTable { + return &GooseDbVersionTable{ + gooseDbVersionTable: newGooseDbVersionTableImpl(schemaName, tableName, alias), + EXCLUDED: newGooseDbVersionTableImpl("", "excluded", ""), + } +} + +func newGooseDbVersionTableImpl(schemaName, tableName, alias string) gooseDbVersionTable { + var ( + IDColumn = 
postgres.IntegerColumn("id") + VersionIDColumn = postgres.IntegerColumn("version_id") + IsAppliedColumn = postgres.BoolColumn("is_applied") + TstampColumn = postgres.TimestampColumn("tstamp") + allColumns = postgres.ColumnList{IDColumn, VersionIDColumn, IsAppliedColumn, TstampColumn} + mutableColumns = postgres.ColumnList{VersionIDColumn, IsAppliedColumn, TstampColumn} + defaultColumns = postgres.ColumnList{TstampColumn} + ) + + return gooseDbVersionTable{ + Table: postgres.NewTable(schemaName, tableName, alias, allColumns...), + + //Columns + ID: IDColumn, + VersionID: VersionIDColumn, + IsApplied: IsAppliedColumn, + Tstamp: TstampColumn, + + AllColumns: allColumns, + MutableColumns: mutableColumns, + DefaultColumns: defaultColumns, + } +} diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/table/operation_log.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/operation_log.go new file mode 100644 index 0000000..9a3967e --- /dev/null +++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/operation_log.go @@ -0,0 +1,105 @@ +// +// Code generated by go-jet DO NOT EDIT. 
+// +// WARNING: Changes to this file may cause incorrect behavior +// and will be lost if the code is regenerated +// + +package table + +import ( + "github.com/go-jet/jet/v2/postgres" +) + +var OperationLog = newOperationLogTable("gamemaster", "operation_log", "") + +type operationLogTable struct { + postgres.Table + + // Columns + ID postgres.ColumnInteger + GameID postgres.ColumnString + OpKind postgres.ColumnString + OpSource postgres.ColumnString + SourceRef postgres.ColumnString + Outcome postgres.ColumnString + ErrorCode postgres.ColumnString + ErrorMessage postgres.ColumnString + StartedAt postgres.ColumnTimestampz + FinishedAt postgres.ColumnTimestampz + + AllColumns postgres.ColumnList + MutableColumns postgres.ColumnList + DefaultColumns postgres.ColumnList +} + +type OperationLogTable struct { + operationLogTable + + EXCLUDED operationLogTable +} + +// AS creates new OperationLogTable with assigned alias +func (a OperationLogTable) AS(alias string) *OperationLogTable { + return newOperationLogTable(a.SchemaName(), a.TableName(), alias) +} + +// Schema creates new OperationLogTable with assigned schema name +func (a OperationLogTable) FromSchema(schemaName string) *OperationLogTable { + return newOperationLogTable(schemaName, a.TableName(), a.Alias()) +} + +// WithPrefix creates new OperationLogTable with assigned table prefix +func (a OperationLogTable) WithPrefix(prefix string) *OperationLogTable { + return newOperationLogTable(a.SchemaName(), prefix+a.TableName(), a.TableName()) +} + +// WithSuffix creates new OperationLogTable with assigned table suffix +func (a OperationLogTable) WithSuffix(suffix string) *OperationLogTable { + return newOperationLogTable(a.SchemaName(), a.TableName()+suffix, a.TableName()) +} + +func newOperationLogTable(schemaName, tableName, alias string) *OperationLogTable { + return &OperationLogTable{ + operationLogTable: newOperationLogTableImpl(schemaName, tableName, alias), + EXCLUDED: newOperationLogTableImpl("", 
"excluded", ""), + } +} + +func newOperationLogTableImpl(schemaName, tableName, alias string) operationLogTable { + var ( + IDColumn = postgres.IntegerColumn("id") + GameIDColumn = postgres.StringColumn("game_id") + OpKindColumn = postgres.StringColumn("op_kind") + OpSourceColumn = postgres.StringColumn("op_source") + SourceRefColumn = postgres.StringColumn("source_ref") + OutcomeColumn = postgres.StringColumn("outcome") + ErrorCodeColumn = postgres.StringColumn("error_code") + ErrorMessageColumn = postgres.StringColumn("error_message") + StartedAtColumn = postgres.TimestampzColumn("started_at") + FinishedAtColumn = postgres.TimestampzColumn("finished_at") + allColumns = postgres.ColumnList{IDColumn, GameIDColumn, OpKindColumn, OpSourceColumn, SourceRefColumn, OutcomeColumn, ErrorCodeColumn, ErrorMessageColumn, StartedAtColumn, FinishedAtColumn} + mutableColumns = postgres.ColumnList{GameIDColumn, OpKindColumn, OpSourceColumn, SourceRefColumn, OutcomeColumn, ErrorCodeColumn, ErrorMessageColumn, StartedAtColumn, FinishedAtColumn} + defaultColumns = postgres.ColumnList{IDColumn, SourceRefColumn, ErrorCodeColumn, ErrorMessageColumn} + ) + + return operationLogTable{ + Table: postgres.NewTable(schemaName, tableName, alias, allColumns...), + + //Columns + ID: IDColumn, + GameID: GameIDColumn, + OpKind: OpKindColumn, + OpSource: OpSourceColumn, + SourceRef: SourceRefColumn, + Outcome: OutcomeColumn, + ErrorCode: ErrorCodeColumn, + ErrorMessage: ErrorMessageColumn, + StartedAt: StartedAtColumn, + FinishedAt: FinishedAtColumn, + + AllColumns: allColumns, + MutableColumns: mutableColumns, + DefaultColumns: defaultColumns, + } +} diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/table/player_mappings.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/player_mappings.go new file mode 100644 index 0000000..8c98b61 --- /dev/null +++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/player_mappings.go @@ -0,0 +1,90 @@ +// +// Code 
generated by go-jet DO NOT EDIT. +// +// WARNING: Changes to this file may cause incorrect behavior +// and will be lost if the code is regenerated +// + +package table + +import ( + "github.com/go-jet/jet/v2/postgres" +) + +var PlayerMappings = newPlayerMappingsTable("gamemaster", "player_mappings", "") + +type playerMappingsTable struct { + postgres.Table + + // Columns + GameID postgres.ColumnString + UserID postgres.ColumnString + RaceName postgres.ColumnString + EnginePlayerUUID postgres.ColumnString + CreatedAt postgres.ColumnTimestampz + + AllColumns postgres.ColumnList + MutableColumns postgres.ColumnList + DefaultColumns postgres.ColumnList +} + +type PlayerMappingsTable struct { + playerMappingsTable + + EXCLUDED playerMappingsTable +} + +// AS creates new PlayerMappingsTable with assigned alias +func (a PlayerMappingsTable) AS(alias string) *PlayerMappingsTable { + return newPlayerMappingsTable(a.SchemaName(), a.TableName(), alias) +} + +// Schema creates new PlayerMappingsTable with assigned schema name +func (a PlayerMappingsTable) FromSchema(schemaName string) *PlayerMappingsTable { + return newPlayerMappingsTable(schemaName, a.TableName(), a.Alias()) +} + +// WithPrefix creates new PlayerMappingsTable with assigned table prefix +func (a PlayerMappingsTable) WithPrefix(prefix string) *PlayerMappingsTable { + return newPlayerMappingsTable(a.SchemaName(), prefix+a.TableName(), a.TableName()) +} + +// WithSuffix creates new PlayerMappingsTable with assigned table suffix +func (a PlayerMappingsTable) WithSuffix(suffix string) *PlayerMappingsTable { + return newPlayerMappingsTable(a.SchemaName(), a.TableName()+suffix, a.TableName()) +} + +func newPlayerMappingsTable(schemaName, tableName, alias string) *PlayerMappingsTable { + return &PlayerMappingsTable{ + playerMappingsTable: newPlayerMappingsTableImpl(schemaName, tableName, alias), + EXCLUDED: newPlayerMappingsTableImpl("", "excluded", ""), + } +} + +func newPlayerMappingsTableImpl(schemaName, 
tableName, alias string) playerMappingsTable { + var ( + GameIDColumn = postgres.StringColumn("game_id") + UserIDColumn = postgres.StringColumn("user_id") + RaceNameColumn = postgres.StringColumn("race_name") + EnginePlayerUUIDColumn = postgres.StringColumn("engine_player_uuid") + CreatedAtColumn = postgres.TimestampzColumn("created_at") + allColumns = postgres.ColumnList{GameIDColumn, UserIDColumn, RaceNameColumn, EnginePlayerUUIDColumn, CreatedAtColumn} + mutableColumns = postgres.ColumnList{RaceNameColumn, EnginePlayerUUIDColumn, CreatedAtColumn} + defaultColumns = postgres.ColumnList{} + ) + + return playerMappingsTable{ + Table: postgres.NewTable(schemaName, tableName, alias, allColumns...), + + //Columns + GameID: GameIDColumn, + UserID: UserIDColumn, + RaceName: RaceNameColumn, + EnginePlayerUUID: EnginePlayerUUIDColumn, + CreatedAt: CreatedAtColumn, + + AllColumns: allColumns, + MutableColumns: mutableColumns, + DefaultColumns: defaultColumns, + } +} diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/table/runtime_records.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/runtime_records.go new file mode 100644 index 0000000..463fe26 --- /dev/null +++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/runtime_records.go @@ -0,0 +1,120 @@ +// +// Code generated by go-jet DO NOT EDIT. 
+// +// WARNING: Changes to this file may cause incorrect behavior +// and will be lost if the code is regenerated +// + +package table + +import ( + "github.com/go-jet/jet/v2/postgres" +) + +var RuntimeRecords = newRuntimeRecordsTable("gamemaster", "runtime_records", "") + +type runtimeRecordsTable struct { + postgres.Table + + // Columns + GameID postgres.ColumnString + Status postgres.ColumnString + EngineEndpoint postgres.ColumnString + CurrentImageRef postgres.ColumnString + CurrentEngineVersion postgres.ColumnString + TurnSchedule postgres.ColumnString + CurrentTurn postgres.ColumnInteger + NextGenerationAt postgres.ColumnTimestampz + SkipNextTick postgres.ColumnBool + EngineHealth postgres.ColumnString + CreatedAt postgres.ColumnTimestampz + UpdatedAt postgres.ColumnTimestampz + StartedAt postgres.ColumnTimestampz + StoppedAt postgres.ColumnTimestampz + FinishedAt postgres.ColumnTimestampz + + AllColumns postgres.ColumnList + MutableColumns postgres.ColumnList + DefaultColumns postgres.ColumnList +} + +type RuntimeRecordsTable struct { + runtimeRecordsTable + + EXCLUDED runtimeRecordsTable +} + +// AS creates new RuntimeRecordsTable with assigned alias +func (a RuntimeRecordsTable) AS(alias string) *RuntimeRecordsTable { + return newRuntimeRecordsTable(a.SchemaName(), a.TableName(), alias) +} + +// Schema creates new RuntimeRecordsTable with assigned schema name +func (a RuntimeRecordsTable) FromSchema(schemaName string) *RuntimeRecordsTable { + return newRuntimeRecordsTable(schemaName, a.TableName(), a.Alias()) +} + +// WithPrefix creates new RuntimeRecordsTable with assigned table prefix +func (a RuntimeRecordsTable) WithPrefix(prefix string) *RuntimeRecordsTable { + return newRuntimeRecordsTable(a.SchemaName(), prefix+a.TableName(), a.TableName()) +} + +// WithSuffix creates new RuntimeRecordsTable with assigned table suffix +func (a RuntimeRecordsTable) WithSuffix(suffix string) *RuntimeRecordsTable { + return newRuntimeRecordsTable(a.SchemaName(), 
a.TableName()+suffix, a.TableName()) +} + +func newRuntimeRecordsTable(schemaName, tableName, alias string) *RuntimeRecordsTable { + return &RuntimeRecordsTable{ + runtimeRecordsTable: newRuntimeRecordsTableImpl(schemaName, tableName, alias), + EXCLUDED: newRuntimeRecordsTableImpl("", "excluded", ""), + } +} + +func newRuntimeRecordsTableImpl(schemaName, tableName, alias string) runtimeRecordsTable { + var ( + GameIDColumn = postgres.StringColumn("game_id") + StatusColumn = postgres.StringColumn("status") + EngineEndpointColumn = postgres.StringColumn("engine_endpoint") + CurrentImageRefColumn = postgres.StringColumn("current_image_ref") + CurrentEngineVersionColumn = postgres.StringColumn("current_engine_version") + TurnScheduleColumn = postgres.StringColumn("turn_schedule") + CurrentTurnColumn = postgres.IntegerColumn("current_turn") + NextGenerationAtColumn = postgres.TimestampzColumn("next_generation_at") + SkipNextTickColumn = postgres.BoolColumn("skip_next_tick") + EngineHealthColumn = postgres.StringColumn("engine_health") + CreatedAtColumn = postgres.TimestampzColumn("created_at") + UpdatedAtColumn = postgres.TimestampzColumn("updated_at") + StartedAtColumn = postgres.TimestampzColumn("started_at") + StoppedAtColumn = postgres.TimestampzColumn("stopped_at") + FinishedAtColumn = postgres.TimestampzColumn("finished_at") + allColumns = postgres.ColumnList{GameIDColumn, StatusColumn, EngineEndpointColumn, CurrentImageRefColumn, CurrentEngineVersionColumn, TurnScheduleColumn, CurrentTurnColumn, NextGenerationAtColumn, SkipNextTickColumn, EngineHealthColumn, CreatedAtColumn, UpdatedAtColumn, StartedAtColumn, StoppedAtColumn, FinishedAtColumn} + mutableColumns = postgres.ColumnList{StatusColumn, EngineEndpointColumn, CurrentImageRefColumn, CurrentEngineVersionColumn, TurnScheduleColumn, CurrentTurnColumn, NextGenerationAtColumn, SkipNextTickColumn, EngineHealthColumn, CreatedAtColumn, UpdatedAtColumn, StartedAtColumn, StoppedAtColumn, FinishedAtColumn} + 
defaultColumns = postgres.ColumnList{CurrentTurnColumn, SkipNextTickColumn, EngineHealthColumn} + ) + + return runtimeRecordsTable{ + Table: postgres.NewTable(schemaName, tableName, alias, allColumns...), + + //Columns + GameID: GameIDColumn, + Status: StatusColumn, + EngineEndpoint: EngineEndpointColumn, + CurrentImageRef: CurrentImageRefColumn, + CurrentEngineVersion: CurrentEngineVersionColumn, + TurnSchedule: TurnScheduleColumn, + CurrentTurn: CurrentTurnColumn, + NextGenerationAt: NextGenerationAtColumn, + SkipNextTick: SkipNextTickColumn, + EngineHealth: EngineHealthColumn, + CreatedAt: CreatedAtColumn, + UpdatedAt: UpdatedAtColumn, + StartedAt: StartedAtColumn, + StoppedAt: StoppedAtColumn, + FinishedAt: FinishedAtColumn, + + AllColumns: allColumns, + MutableColumns: mutableColumns, + DefaultColumns: defaultColumns, + } +} diff --git a/gamemaster/internal/adapters/postgres/jet/gamemaster/table/table_use_schema.go b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/table_use_schema.go new file mode 100644 index 0000000..48e2814 --- /dev/null +++ b/gamemaster/internal/adapters/postgres/jet/gamemaster/table/table_use_schema.go @@ -0,0 +1,18 @@ +// +// Code generated by go-jet DO NOT EDIT. +// +// WARNING: Changes to this file may cause incorrect behavior +// and will be lost if the code is regenerated +// + +package table + +// UseSchema sets a new schema name for all generated table SQL builder types. It is recommended to invoke +// this method only once at the beginning of the program. 
+func UseSchema(schema string) {
+    EngineVersions = EngineVersions.FromSchema(schema)
+    GooseDbVersion = GooseDbVersion.FromSchema(schema)
+    OperationLog = OperationLog.FromSchema(schema)
+    PlayerMappings = PlayerMappings.FromSchema(schema)
+    RuntimeRecords = RuntimeRecords.FromSchema(schema)
+}
diff --git a/gamemaster/internal/adapters/postgres/migrations/00001_init.sql b/gamemaster/internal/adapters/postgres/migrations/00001_init.sql
new file mode 100644
index 0000000..4a097ae
--- /dev/null
+++ b/gamemaster/internal/adapters/postgres/migrations/00001_init.sql
@@ -0,0 +1,136 @@
+-- +goose Up
+-- Initial Game Master PostgreSQL schema.
+--
+-- Four tables cover the durable surface of the service:
+--   * runtime_records — one row per game with the latest known runtime
+--     status, scheduling state, and engine health summary;
+--   * engine_versions — the deployable engine version registry consumed
+--     by Lobby's start flow and the GM admin/patch flow;
+--   * player_mappings — the (game_id, user_id) → (race_name,
+--     engine_player_uuid) projection installed at register-runtime;
+--   * operation_log — append-only audit of every register-runtime,
+--     turn-generation, force-next-turn, banish, stop, patch, and
+--     engine-version mutation GM performed.
+--
+-- Schema and the matching `gamemasterservice` role are provisioned
+-- outside this script (in tests via cmd/jetgen/main.go::provisionRoleAndSchema;
+-- in production via an ops init script). This migration runs as the
+-- schema owner with `search_path=gamemaster` and only contains DDL for
+-- the service-owned tables and indexes. ARCHITECTURE.md §Database topology
+-- mandates that the per-service role's grants stay restricted to its own
+-- schema; consequently this file deliberately deviates from PLAN.md
+-- Stage 09's literal `CREATE SCHEMA IF NOT EXISTS gamemaster;` instruction.
+
+-- runtime_records holds one durable record per game with the latest
+-- known runtime status, scheduling state, and engine health summary.
+-- The status enum is enforced by a CHECK so domain code can rely on it
+-- without reading every callsite. The composite (status,
+-- next_generation_at) index drives the scheduler ticker scan that
+-- selects `status='running' AND next_generation_at <= now()` once per
+-- second. next_generation_at is nullable: a row enters with
+-- status='starting' and a null tick, and only acquires a tick when the
+-- register-runtime CAS flips it to 'running'.
+CREATE TABLE runtime_records (
+    game_id text PRIMARY KEY,
+    status text NOT NULL,
+    engine_endpoint text NOT NULL,
+    current_image_ref text NOT NULL,
+    current_engine_version text NOT NULL,
+    turn_schedule text NOT NULL,
+    current_turn integer NOT NULL DEFAULT 0,
+    next_generation_at timestamptz,
+    skip_next_tick boolean NOT NULL DEFAULT false,
+    engine_health text NOT NULL DEFAULT '',
+    created_at timestamptz NOT NULL,
+    updated_at timestamptz NOT NULL,
+    started_at timestamptz,
+    stopped_at timestamptz,
+    finished_at timestamptz,
+    CONSTRAINT runtime_records_status_chk
+        CHECK (status IN (
+            'starting', 'running', 'generation_in_progress',
+            'generation_failed', 'stopped', 'engine_unreachable',
+            'finished'
+        ))
+);
+
+CREATE INDEX runtime_records_status_next_gen_idx
+    ON runtime_records (status, next_generation_at);
+
+-- engine_versions is the deployable engine version registry. Each row
+-- ties a semver string to a Docker reference and a free-form options
+-- document; the status enum gates the start flow (active versions are
+-- accepted by Lobby's resolve, deprecated versions are rejected on new
+-- starts but remain valid for already-running games). `options` is
+-- jsonb: v1 stores it verbatim and never element-filters.
+CREATE TABLE engine_versions (
+    version text PRIMARY KEY,
+    image_ref text NOT NULL,
+    options jsonb NOT NULL DEFAULT '{}'::jsonb,
+    status text NOT NULL,
+    created_at timestamptz NOT NULL,
+    updated_at timestamptz NOT NULL,
+    CONSTRAINT engine_versions_status_chk
+        CHECK (status IN ('active', 'deprecated'))
+);
+
+-- player_mappings carries the (game_id, user_id) → (race_name,
+-- engine_player_uuid) projection installed at register-runtime. The
+-- composite primary key serves both the lookups by (game_id, user_id)
+-- on every command/order/report request and, as a leftmost prefix, the
+-- per-game roster reads (`WHERE game_id = $1`). The UNIQUE index on
+-- (game_id, race_name) enforces the one-race-per-game invariant at the
+-- storage boundary.
+CREATE TABLE player_mappings (
+    game_id text NOT NULL,
+    user_id text NOT NULL,
+    race_name text NOT NULL,
+    engine_player_uuid text NOT NULL,
+    created_at timestamptz NOT NULL,
+    PRIMARY KEY (game_id, user_id)
+);
+
+CREATE UNIQUE INDEX player_mappings_game_race_uniq
+    ON player_mappings (game_id, race_name);
+
+-- operation_log is an append-only audit of every operation Game Master
+-- performed against a game's runtime or against the engine version
+-- registry. The (game_id, started_at DESC) index drives audit reads
+-- from the GM/Admin REST surface. finished_at is nullable for in-flight
+-- rows even though the service layer always finalises the row. The
+-- op_kind / op_source / outcome enums are enforced by CHECK constraints
+-- to keep the audit schema honest without a separate Go validator.
+CREATE TABLE operation_log (
+    id bigserial PRIMARY KEY,
+    game_id text NOT NULL,
+    op_kind text NOT NULL,
+    op_source text NOT NULL,
+    source_ref text NOT NULL DEFAULT '',
+    outcome text NOT NULL,
+    error_code text NOT NULL DEFAULT '',
+    error_message text NOT NULL DEFAULT '',
+    started_at timestamptz NOT NULL,
+    finished_at timestamptz,
+    CONSTRAINT operation_log_op_kind_chk
+        CHECK (op_kind IN (
+            'register_runtime', 'turn_generation', 'force_next_turn',
+            'banish', 'stop', 'patch',
+            'engine_version_create', 'engine_version_update',
+            'engine_version_deprecate', 'engine_version_delete'
+        )),
+    CONSTRAINT operation_log_op_source_chk
+        CHECK (op_source IN (
+            'gateway_player', 'lobby_internal', 'admin_rest'
+        )),
+    CONSTRAINT operation_log_outcome_chk
+        CHECK (outcome IN ('success', 'failure'))
+);
+
+CREATE INDEX operation_log_game_started_idx
+    ON operation_log (game_id, started_at DESC);
+
+-- +goose Down
+DROP TABLE IF EXISTS operation_log;
+DROP TABLE IF EXISTS player_mappings;
+DROP TABLE IF EXISTS engine_versions;
+DROP TABLE IF EXISTS runtime_records;
diff --git a/gamemaster/internal/adapters/postgres/migrations/migrations.go b/gamemaster/internal/adapters/postgres/migrations/migrations.go
new file mode 100644
index 0000000..31dcaa6
--- /dev/null
+++ b/gamemaster/internal/adapters/postgres/migrations/migrations.go
@@ -0,0 +1,19 @@
+// Package migrations exposes the embedded goose migration files used by
+// Game Master to provision its `gamemaster` schema in PostgreSQL.
+//
+// The embedded filesystem is consumed by `pkg/postgres.RunMigrations`
+// during gamemaster-service startup and by `cmd/jetgen` when regenerating
+// the `internal/adapters/postgres/jet/` code against a transient
+// PostgreSQL instance.
+package migrations
+
+import "embed"
+
+//go:embed *.sql
+var fs embed.FS
+
+// FS returns the embedded filesystem containing every numbered goose
+// migration shipped with Game Master.
+func FS() embed.FS { + return fs +} diff --git a/gamemaster/internal/adapters/postgres/operationlog/store.go b/gamemaster/internal/adapters/postgres/operationlog/store.go new file mode 100644 index 0000000..d969206 --- /dev/null +++ b/gamemaster/internal/adapters/postgres/operationlog/store.go @@ -0,0 +1,221 @@ +// Package operationlog implements the PostgreSQL-backed adapter for +// `ports.OperationLogStore`. +// +// The package owns the on-disk shape of the `operation_log` table +// defined in +// `galaxy/gamemaster/internal/adapters/postgres/migrations/00001_init.sql` +// and translates the schema-agnostic `ports.OperationLogStore` +// interface declared in `internal/ports/operationlog.go` into +// concrete go-jet/v2 statements driven by the pgx driver. +// +// Append uses `INSERT ... RETURNING id` to surface the bigserial id +// back to callers; ListByGame is index-driven by +// `operation_log_game_started_idx`. +package operationlog + +import ( + "context" + "database/sql" + "errors" + "fmt" + "strings" + "time" + + "galaxy/gamemaster/internal/adapters/postgres/internal/sqlx" + pgtable "galaxy/gamemaster/internal/adapters/postgres/jet/gamemaster/table" + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/ports" + + pg "github.com/go-jet/jet/v2/postgres" +) + +// Config configures one PostgreSQL-backed operation-log store. +type Config struct { + DB *sql.DB + OperationTimeout time.Duration +} + +// Store persists Game Master operation-log entries in PostgreSQL. +type Store struct { + db *sql.DB + operationTimeout time.Duration +} + +// New constructs one PostgreSQL-backed operation-log store from cfg. 
+func New(cfg Config) (*Store, error) { + if cfg.DB == nil { + return nil, errors.New("new postgres operation log store: db must not be nil") + } + if cfg.OperationTimeout <= 0 { + return nil, errors.New("new postgres operation log store: operation timeout must be positive") + } + return &Store{ + db: cfg.DB, + operationTimeout: cfg.OperationTimeout, + }, nil +} + +// operationLogSelectColumns matches scanRow's column order. +var operationLogSelectColumns = pg.ColumnList{ + pgtable.OperationLog.ID, + pgtable.OperationLog.GameID, + pgtable.OperationLog.OpKind, + pgtable.OperationLog.OpSource, + pgtable.OperationLog.SourceRef, + pgtable.OperationLog.Outcome, + pgtable.OperationLog.ErrorCode, + pgtable.OperationLog.ErrorMessage, + pgtable.OperationLog.StartedAt, + pgtable.OperationLog.FinishedAt, +} + +// Append inserts entry into the operation log and returns the +// generated bigserial id. entry is validated through +// operation.OperationEntry.Validate before the SQL is issued. +func (store *Store) Append(ctx context.Context, entry operation.OperationEntry) (int64, error) { + if store == nil || store.db == nil { + return 0, errors.New("append operation log entry: nil store") + } + if err := entry.Validate(); err != nil { + return 0, fmt.Errorf("append operation log entry: %w", err) + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "append operation log entry", store.operationTimeout) + if err != nil { + return 0, err + } + defer cancel() + + stmt := pgtable.OperationLog.INSERT( + pgtable.OperationLog.GameID, + pgtable.OperationLog.OpKind, + pgtable.OperationLog.OpSource, + pgtable.OperationLog.SourceRef, + pgtable.OperationLog.Outcome, + pgtable.OperationLog.ErrorCode, + pgtable.OperationLog.ErrorMessage, + pgtable.OperationLog.StartedAt, + pgtable.OperationLog.FinishedAt, + ).VALUES( + entry.GameID, + string(entry.OpKind), + string(entry.OpSource), + entry.SourceRef, + string(entry.Outcome), + entry.ErrorCode, + entry.ErrorMessage, + 
entry.StartedAt.UTC(), + sqlx.NullableTimePtr(entry.FinishedAt), + ).RETURNING(pgtable.OperationLog.ID) + + query, args := stmt.Sql() + row := store.db.QueryRowContext(operationCtx, query, args...) + var id int64 + if err := row.Scan(&id); err != nil { + return 0, fmt.Errorf("append operation log entry: %w", err) + } + return id, nil +} + +// ListByGame returns the most recent entries for gameID, ordered by +// started_at descending and id descending (a tie-breaker that keeps +// the order stable when two rows share a started_at). The result is +// capped by limit; non-positive limit is rejected. +func (store *Store) ListByGame(ctx context.Context, gameID string, limit int) ([]operation.OperationEntry, error) { + if store == nil || store.db == nil { + return nil, errors.New("list operation log entries by game: nil store") + } + if strings.TrimSpace(gameID) == "" { + return nil, fmt.Errorf("list operation log entries by game: game id must not be empty") + } + if limit <= 0 { + return nil, fmt.Errorf("list operation log entries by game: limit must be positive, got %d", limit) + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "list operation log entries by game", store.operationTimeout) + if err != nil { + return nil, err + } + defer cancel() + + stmt := pg.SELECT(operationLogSelectColumns). + FROM(pgtable.OperationLog). + WHERE(pgtable.OperationLog.GameID.EQ(pg.String(gameID))). + ORDER_BY(pgtable.OperationLog.StartedAt.DESC(), pgtable.OperationLog.ID.DESC()). + LIMIT(int64(limit)) + + query, args := stmt.Sql() + rows, err := store.db.QueryContext(operationCtx, query, args...) 
+ if err != nil { + return nil, fmt.Errorf("list operation log entries by game: %w", err) + } + defer rows.Close() + + entries := make([]operation.OperationEntry, 0) + for rows.Next() { + got, err := scanRow(rows) + if err != nil { + return nil, fmt.Errorf("list operation log entries by game: scan: %w", err) + } + entries = append(entries, got) + } + if err := rows.Err(); err != nil { + return nil, fmt.Errorf("list operation log entries by game: %w", err) + } + if len(entries) == 0 { + return nil, nil + } + return entries, nil +} + +// rowScanner abstracts *sql.Row and *sql.Rows so scanRow can be shared +// across single-row and iterated reads. +type rowScanner interface { + Scan(dest ...any) error +} + +// scanRow scans one operation_log row from rs. +func scanRow(rs rowScanner) (operation.OperationEntry, error) { + var ( + id int64 + gameID string + opKind string + opSource string + sourceRef string + outcome string + errorCode string + errorMessage string + startedAt time.Time + finishedAt sql.NullTime + ) + if err := rs.Scan( + &id, + &gameID, + &opKind, + &opSource, + &sourceRef, + &outcome, + &errorCode, + &errorMessage, + &startedAt, + &finishedAt, + ); err != nil { + return operation.OperationEntry{}, err + } + return operation.OperationEntry{ + ID: id, + GameID: gameID, + OpKind: operation.OpKind(opKind), + OpSource: operation.OpSource(opSource), + SourceRef: sourceRef, + Outcome: operation.Outcome(outcome), + ErrorCode: errorCode, + ErrorMessage: errorMessage, + StartedAt: startedAt.UTC(), + FinishedAt: sqlx.TimePtrFromNullable(finishedAt), + }, nil +} + +// Ensure Store satisfies the ports.OperationLogStore interface at +// compile time. 
+var _ ports.OperationLogStore = (*Store)(nil) diff --git a/gamemaster/internal/adapters/postgres/operationlog/store_test.go b/gamemaster/internal/adapters/postgres/operationlog/store_test.go new file mode 100644 index 0000000..1d2b4e8 --- /dev/null +++ b/gamemaster/internal/adapters/postgres/operationlog/store_test.go @@ -0,0 +1,190 @@ +package operationlog_test + +import ( + "context" + "testing" + "time" + + "galaxy/gamemaster/internal/adapters/postgres/internal/pgtest" + "galaxy/gamemaster/internal/adapters/postgres/operationlog" + "galaxy/gamemaster/internal/domain/operation" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestMain(m *testing.M) { pgtest.RunMain(m) } + +func newStore(t *testing.T) *operationlog.Store { + t.Helper() + pgtest.TruncateAll(t) + store, err := operationlog.New(operationlog.Config{ + DB: pgtest.Ensure(t).Pool(), + OperationTimeout: pgtest.OperationTimeout, + }) + require.NoError(t, err) + return store +} + +func successEntry(gameID string, kind operation.OpKind, source operation.OpSource, startedAt time.Time) operation.OperationEntry { + finishedAt := startedAt.Add(50 * time.Millisecond) + return operation.OperationEntry{ + GameID: gameID, + OpKind: kind, + OpSource: source, + SourceRef: "req-001", + Outcome: operation.OutcomeSuccess, + StartedAt: startedAt, + FinishedAt: &finishedAt, + } +} + +func TestNewRejectsInvalidConfig(t *testing.T) { + _, err := operationlog.New(operationlog.Config{}) + require.Error(t, err) + + store, err := operationlog.New(operationlog.Config{ + DB: pgtest.Ensure(t).Pool(), + OperationTimeout: 0, + }) + require.Error(t, err) + require.Nil(t, store) +} + +func TestAppendSuccessEntry(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + at := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + entry := successEntry("game-001", operation.OpKindRegisterRuntime, operation.OpSourceLobbyInternal, at) + + id, err := store.Append(ctx, entry) + 
require.NoError(t, err) + assert.Greater(t, id, int64(0)) + + entries, err := store.ListByGame(ctx, "game-001", 10) + require.NoError(t, err) + require.Len(t, entries, 1) + got := entries[0] + assert.Equal(t, id, got.ID) + assert.Equal(t, entry.GameID, got.GameID) + assert.Equal(t, entry.OpKind, got.OpKind) + assert.Equal(t, entry.OpSource, got.OpSource) + assert.Equal(t, entry.SourceRef, got.SourceRef) + assert.Equal(t, operation.OutcomeSuccess, got.Outcome) + assert.Empty(t, got.ErrorCode) + assert.Empty(t, got.ErrorMessage) + assert.True(t, got.StartedAt.Equal(at)) + require.NotNil(t, got.FinishedAt) + assert.Equal(t, time.UTC, got.StartedAt.Location()) + assert.Equal(t, time.UTC, got.FinishedAt.Location()) +} + +func TestAppendFailureEntry(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + at := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + finishedAt := at.Add(time.Second) + entry := operation.OperationEntry{ + GameID: "game-001", + OpKind: operation.OpKindTurnGeneration, + OpSource: operation.OpSourceAdminRest, + Outcome: operation.OutcomeFailure, + ErrorCode: "engine_unreachable", + ErrorMessage: "connection refused", + StartedAt: at, + FinishedAt: &finishedAt, + } + + _, err := store.Append(ctx, entry) + require.NoError(t, err) + + got, err := store.ListByGame(ctx, "game-001", 1) + require.NoError(t, err) + require.Len(t, got, 1) + assert.Equal(t, operation.OutcomeFailure, got[0].Outcome) + assert.Equal(t, "engine_unreachable", got[0].ErrorCode) + assert.Equal(t, "connection refused", got[0].ErrorMessage) +} + +func TestAppendIDsAreMonotonic(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + at := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + id1, err := store.Append(ctx, successEntry("game-001", operation.OpKindRegisterRuntime, operation.OpSourceLobbyInternal, at)) + require.NoError(t, err) + + id2, err := store.Append(ctx, successEntry("game-001", operation.OpKindTurnGeneration, 
operation.OpSourceLobbyInternal, at.Add(time.Second))) + require.NoError(t, err) + + assert.Greater(t, id2, id1, "bigserial ids must be monotonic across appends") +} + +func TestAppendValidationRejection(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + bad := operation.OperationEntry{} + _, err := store.Append(ctx, bad) + require.Error(t, err) +} + +func TestListByGameOrderingDesc(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + at := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + _, err := store.Append(ctx, successEntry("game-001", operation.OpKindRegisterRuntime, operation.OpSourceLobbyInternal, at)) + require.NoError(t, err) + _, err = store.Append(ctx, successEntry("game-001", operation.OpKindTurnGeneration, operation.OpSourceLobbyInternal, at.Add(time.Second))) + require.NoError(t, err) + _, err = store.Append(ctx, successEntry("game-001", operation.OpKindStop, operation.OpSourceAdminRest, at.Add(2*time.Second))) + require.NoError(t, err) + + got, err := store.ListByGame(ctx, "game-001", 10) + require.NoError(t, err) + require.Len(t, got, 3) + assert.Equal(t, operation.OpKindStop, got[0].OpKind) + assert.Equal(t, operation.OpKindTurnGeneration, got[1].OpKind) + assert.Equal(t, operation.OpKindRegisterRuntime, got[2].OpKind) +} + +func TestListByGameRespectsLimit(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + at := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + for index := range 5 { + _, err := store.Append(ctx, successEntry("game-001", operation.OpKindTurnGeneration, operation.OpSourceLobbyInternal, at.Add(time.Duration(index)*time.Second))) + require.NoError(t, err) + } + + got, err := store.ListByGame(ctx, "game-001", 2) + require.NoError(t, err) + require.Len(t, got, 2) +} + +func TestListByGameUnknownGame(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + got, err := store.ListByGame(ctx, "unknown-game", 10) + require.NoError(t, err) + 
assert.Empty(t, got) +} + +func TestListByGameRejectsBadArgs(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.ListByGame(ctx, "", 10) + require.Error(t, err) + + _, err = store.ListByGame(ctx, "game-001", 0) + require.Error(t, err) + + _, err = store.ListByGame(ctx, "game-001", -1) + require.Error(t, err) +} diff --git a/gamemaster/internal/adapters/postgres/playermappingstore/store.go b/gamemaster/internal/adapters/postgres/playermappingstore/store.go new file mode 100644 index 0000000..e476dd0 --- /dev/null +++ b/gamemaster/internal/adapters/postgres/playermappingstore/store.go @@ -0,0 +1,292 @@ +// Package playermappingstore implements the PostgreSQL-backed adapter +// for `ports.PlayerMappingStore`. +// +// The package owns the on-disk shape of the `player_mappings` table +// defined in +// `galaxy/gamemaster/internal/adapters/postgres/migrations/00001_init.sql` +// and translates the schema-agnostic `ports.PlayerMappingStore` +// interface declared in `internal/ports/playermappingstore.go` into +// concrete go-jet/v2 statements driven by the pgx driver. +// +// BulkInsert ships every row in a single multi-row INSERT so the +// operation is atomic — any unique-constraint violation rolls back the +// whole batch and is mapped to playermapping.ErrConflict. +package playermappingstore + +import ( + "context" + "database/sql" + "errors" + "fmt" + "strings" + "time" + + "galaxy/gamemaster/internal/adapters/postgres/internal/sqlx" + pgtable "galaxy/gamemaster/internal/adapters/postgres/jet/gamemaster/table" + "galaxy/gamemaster/internal/domain/playermapping" + "galaxy/gamemaster/internal/ports" + + pg "github.com/go-jet/jet/v2/postgres" +) + +// Config configures one PostgreSQL-backed player-mapping store. +type Config struct { + DB *sql.DB + OperationTimeout time.Duration +} + +// Store persists Game Master player mappings in PostgreSQL. 
+type Store struct { + db *sql.DB + operationTimeout time.Duration +} + +// New constructs one PostgreSQL-backed player-mapping store from cfg. +func New(cfg Config) (*Store, error) { + if cfg.DB == nil { + return nil, errors.New("new postgres player mapping store: db must not be nil") + } + if cfg.OperationTimeout <= 0 { + return nil, errors.New("new postgres player mapping store: operation timeout must be positive") + } + return &Store{ + db: cfg.DB, + operationTimeout: cfg.OperationTimeout, + }, nil +} + +// playerMappingSelectColumns matches scanRow's column order. +var playerMappingSelectColumns = pg.ColumnList{ + pgtable.PlayerMappings.GameID, + pgtable.PlayerMappings.UserID, + pgtable.PlayerMappings.RaceName, + pgtable.PlayerMappings.EnginePlayerUUID, + pgtable.PlayerMappings.CreatedAt, +} + +// BulkInsert installs every mapping in records using a single +// multi-row INSERT. Either every row is persisted or none of them is. +// Any PostgreSQL unique-violation +// (`(game_id, user_id)` PK or `(game_id, race_name)` UNIQUE) is mapped +// to playermapping.ErrConflict. 
+func (store *Store) BulkInsert(ctx context.Context, records []playermapping.PlayerMapping) error { + if store == nil || store.db == nil { + return errors.New("bulk insert player mappings: nil store") + } + if len(records) == 0 { + return nil + } + for index, record := range records { + if err := record.Validate(); err != nil { + return fmt.Errorf("bulk insert player mappings: record %d: %w", index, err) + } + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "bulk insert player mappings", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + stmt := pgtable.PlayerMappings.INSERT( + pgtable.PlayerMappings.GameID, + pgtable.PlayerMappings.UserID, + pgtable.PlayerMappings.RaceName, + pgtable.PlayerMappings.EnginePlayerUUID, + pgtable.PlayerMappings.CreatedAt, + ) + for _, record := range records { + stmt = stmt.VALUES( + record.GameID, + record.UserID, + record.RaceName, + record.EnginePlayerUUID, + record.CreatedAt.UTC(), + ) + } + + query, args := stmt.Sql() + if _, err := store.db.ExecContext(operationCtx, query, args...); err != nil { + if sqlx.IsUniqueViolation(err) { + return fmt.Errorf("bulk insert player mappings: %w", playermapping.ErrConflict) + } + return fmt.Errorf("bulk insert player mappings: %w", err) + } + return nil +} + +// Get returns the mapping identified by (gameID, userID). 
+func (store *Store) Get(ctx context.Context, gameID, userID string) (playermapping.PlayerMapping, error) { + if store == nil || store.db == nil { + return playermapping.PlayerMapping{}, errors.New("get player mapping: nil store") + } + if strings.TrimSpace(gameID) == "" { + return playermapping.PlayerMapping{}, fmt.Errorf("get player mapping: game id must not be empty") + } + if strings.TrimSpace(userID) == "" { + return playermapping.PlayerMapping{}, fmt.Errorf("get player mapping: user id must not be empty") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "get player mapping", store.operationTimeout) + if err != nil { + return playermapping.PlayerMapping{}, err + } + defer cancel() + + stmt := pg.SELECT(playerMappingSelectColumns). + FROM(pgtable.PlayerMappings). + WHERE(pg.AND( + pgtable.PlayerMappings.GameID.EQ(pg.String(gameID)), + pgtable.PlayerMappings.UserID.EQ(pg.String(userID)), + )) + + query, args := stmt.Sql() + row := store.db.QueryRowContext(operationCtx, query, args...) + got, err := scanRow(row) + if sqlx.IsNoRows(err) { + return playermapping.PlayerMapping{}, playermapping.ErrNotFound + } + if err != nil { + return playermapping.PlayerMapping{}, fmt.Errorf("get player mapping: %w", err) + } + return got, nil +} + +// GetByRace returns the mapping identified by (gameID, raceName). 
+func (store *Store) GetByRace(ctx context.Context, gameID, raceName string) (playermapping.PlayerMapping, error) { + if store == nil || store.db == nil { + return playermapping.PlayerMapping{}, errors.New("get player mapping by race: nil store") + } + if strings.TrimSpace(gameID) == "" { + return playermapping.PlayerMapping{}, fmt.Errorf("get player mapping by race: game id must not be empty") + } + if strings.TrimSpace(raceName) == "" { + return playermapping.PlayerMapping{}, fmt.Errorf("get player mapping by race: race name must not be empty") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "get player mapping by race", store.operationTimeout) + if err != nil { + return playermapping.PlayerMapping{}, err + } + defer cancel() + + stmt := pg.SELECT(playerMappingSelectColumns). + FROM(pgtable.PlayerMappings). + WHERE(pg.AND( + pgtable.PlayerMappings.GameID.EQ(pg.String(gameID)), + pgtable.PlayerMappings.RaceName.EQ(pg.String(raceName)), + )) + + query, args := stmt.Sql() + row := store.db.QueryRowContext(operationCtx, query, args...) + got, err := scanRow(row) + if sqlx.IsNoRows(err) { + return playermapping.PlayerMapping{}, playermapping.ErrNotFound + } + if err != nil { + return playermapping.PlayerMapping{}, fmt.Errorf("get player mapping by race: %w", err) + } + return got, nil +} + +// ListByGame returns every mapping owned by gameID, ordered by user_id +// ascending. +func (store *Store) ListByGame(ctx context.Context, gameID string) ([]playermapping.PlayerMapping, error) { + if store == nil || store.db == nil { + return nil, errors.New("list player mappings by game: nil store") + } + if strings.TrimSpace(gameID) == "" { + return nil, fmt.Errorf("list player mappings by game: game id must not be empty") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "list player mappings by game", store.operationTimeout) + if err != nil { + return nil, err + } + defer cancel() + + stmt := pg.SELECT(playerMappingSelectColumns). 
+ FROM(pgtable.PlayerMappings). + WHERE(pgtable.PlayerMappings.GameID.EQ(pg.String(gameID))). + ORDER_BY(pgtable.PlayerMappings.UserID.ASC()) + + query, args := stmt.Sql() + rows, err := store.db.QueryContext(operationCtx, query, args...) + if err != nil { + return nil, fmt.Errorf("list player mappings by game: %w", err) + } + defer rows.Close() + + mappings := make([]playermapping.PlayerMapping, 0) + for rows.Next() { + got, err := scanRow(rows) + if err != nil { + return nil, fmt.Errorf("list player mappings by game: scan: %w", err) + } + mappings = append(mappings, got) + } + if err := rows.Err(); err != nil { + return nil, fmt.Errorf("list player mappings by game: %w", err) + } + if len(mappings) == 0 { + return nil, nil + } + return mappings, nil +} + +// DeleteByGame removes every mapping owned by gameID. The call is +// idempotent: it returns nil even when no rows were deleted. +func (store *Store) DeleteByGame(ctx context.Context, gameID string) error { + if store == nil || store.db == nil { + return errors.New("delete player mappings by game: nil store") + } + if strings.TrimSpace(gameID) == "" { + return fmt.Errorf("delete player mappings by game: game id must not be empty") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "delete player mappings by game", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + stmt := pgtable.PlayerMappings.DELETE(). + WHERE(pgtable.PlayerMappings.GameID.EQ(pg.String(gameID))) + + query, args := stmt.Sql() + if _, err := store.db.ExecContext(operationCtx, query, args...); err != nil { + return fmt.Errorf("delete player mappings by game: %w", err) + } + return nil +} + +// rowScanner abstracts *sql.Row and *sql.Rows so scanRow can be shared +// across single-row and iterated reads. +type rowScanner interface { + Scan(dest ...any) error +} + +// scanRow scans one player_mappings row from rs. 
+func scanRow(rs rowScanner) (playermapping.PlayerMapping, error) { + var ( + gameID string + userID string + raceName string + enginePlayerUUID string + createdAt time.Time + ) + if err := rs.Scan(&gameID, &userID, &raceName, &enginePlayerUUID, &createdAt); err != nil { + return playermapping.PlayerMapping{}, err + } + return playermapping.PlayerMapping{ + GameID: gameID, + UserID: userID, + RaceName: raceName, + EnginePlayerUUID: enginePlayerUUID, + CreatedAt: createdAt.UTC(), + }, nil +} + +// Ensure Store satisfies the ports.PlayerMappingStore interface at +// compile time. +var _ ports.PlayerMappingStore = (*Store)(nil) diff --git a/gamemaster/internal/adapters/postgres/playermappingstore/store_test.go b/gamemaster/internal/adapters/postgres/playermappingstore/store_test.go new file mode 100644 index 0000000..50e865f --- /dev/null +++ b/gamemaster/internal/adapters/postgres/playermappingstore/store_test.go @@ -0,0 +1,264 @@ +package playermappingstore_test + +import ( + "context" + "errors" + "testing" + "time" + + "galaxy/gamemaster/internal/adapters/postgres/internal/pgtest" + "galaxy/gamemaster/internal/adapters/postgres/playermappingstore" + "galaxy/gamemaster/internal/domain/playermapping" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestMain(m *testing.M) { pgtest.RunMain(m) } + +func newStore(t *testing.T) *playermappingstore.Store { + t.Helper() + pgtest.TruncateAll(t) + store, err := playermappingstore.New(playermappingstore.Config{ + DB: pgtest.Ensure(t).Pool(), + OperationTimeout: pgtest.OperationTimeout, + }) + require.NoError(t, err) + return store +} + +func mapping(gameID, userID, raceName, uuid string, createdAt time.Time) playermapping.PlayerMapping { + return playermapping.PlayerMapping{ + GameID: gameID, + UserID: userID, + RaceName: raceName, + EnginePlayerUUID: uuid, + CreatedAt: createdAt, + } +} + +func TestNewRejectsInvalidConfig(t *testing.T) { + _, err := 
playermappingstore.New(playermappingstore.Config{})
+	require.Error(t, err)
+
+	store, err := playermappingstore.New(playermappingstore.Config{
+		DB:               pgtest.Ensure(t).Pool(),
+		OperationTimeout: 0,
+	})
+	require.Error(t, err)
+	require.Nil(t, store)
+}
+
+func TestBulkInsertHappy(t *testing.T) {
+	ctx := context.Background()
+	store := newStore(t)
+
+	now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC)
+	records := []playermapping.PlayerMapping{
+		mapping("game-001", "user-1", "Aelinari", "uuid-1", now),
+		mapping("game-001", "user-2", "Drazi", "uuid-2", now),
+		mapping("game-001", "user-3", "Voltori", "uuid-3", now),
+	}
+	require.NoError(t, store.BulkInsert(ctx, records))
+
+	for _, want := range records {
+		got, err := store.Get(ctx, want.GameID, want.UserID)
+		require.NoError(t, err)
+		assert.Equal(t, want.RaceName, got.RaceName)
+		assert.Equal(t, want.EnginePlayerUUID, got.EnginePlayerUUID)
+		assert.True(t, got.CreatedAt.Equal(now))
+		assert.Equal(t, time.UTC, got.CreatedAt.Location())
+	}
+}
+
+func TestBulkInsertEmpty(t *testing.T) {
+	ctx := context.Background()
+	store := newStore(t)
+	require.NoError(t, store.BulkInsert(ctx, nil))
+	require.NoError(t, store.BulkInsert(ctx, []playermapping.PlayerMapping{}))
+
+	got, err := store.ListByGame(ctx, "game-001")
+	require.NoError(t, err)
+	assert.Empty(t, got)
+}
+
+func TestBulkInsertAtomicConflictRaceName(t *testing.T) {
+	ctx := context.Background()
+	store := newStore(t)
+
+	now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC)
+	// user-3 reuses Aelinari (already taken by user-1) inside the same
+	// game — the unique (game_id, race_name) index must reject the
+	// whole batch.
+ records := []playermapping.PlayerMapping{ + mapping("game-001", "user-1", "Aelinari", "uuid-1", now), + mapping("game-001", "user-2", "Drazi", "uuid-2", now), + mapping("game-001", "user-3", "Aelinari", "uuid-3", now), + } + err := store.BulkInsert(ctx, records) + require.Error(t, err) + require.True(t, errors.Is(err, playermapping.ErrConflict), "want ErrConflict, got %v", err) + + got, err := store.ListByGame(ctx, "game-001") + require.NoError(t, err) + assert.Empty(t, got, "atomic batch must roll back every row when any row fails") +} + +func TestBulkInsertAtomicConflictUserID(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + records := []playermapping.PlayerMapping{ + mapping("game-001", "user-1", "Aelinari", "uuid-1", now), + mapping("game-001", "user-1", "Drazi", "uuid-2", now), // user-1 twice + } + err := store.BulkInsert(ctx, records) + require.Error(t, err) + require.True(t, errors.Is(err, playermapping.ErrConflict)) +} + +func TestBulkInsertConflictAcrossCalls(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.BulkInsert(ctx, []playermapping.PlayerMapping{ + mapping("game-001", "user-1", "Aelinari", "uuid-1", now), + })) + + err := store.BulkInsert(ctx, []playermapping.PlayerMapping{ + mapping("game-001", "user-1", "DifferentRace", "uuid-2", now), + }) + require.Error(t, err) + require.True(t, errors.Is(err, playermapping.ErrConflict)) +} + +func TestBulkInsertRejectsInvalid(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + bad := []playermapping.PlayerMapping{ + mapping("game-001", "user-1", "Aelinari", "uuid-1", now), + {GameID: "game-001", UserID: "", RaceName: "Drazi", EnginePlayerUUID: "uuid-2", CreatedAt: now}, + } + err := store.BulkInsert(ctx, bad) + 
require.Error(t, err) + require.False(t, errors.Is(err, playermapping.ErrConflict)) + + got, err := store.ListByGame(ctx, "game-001") + require.NoError(t, err) + assert.Empty(t, got, "validation rejection must not insert any row") +} + +func TestGetMissingReturnsNotFound(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.Get(ctx, "game-001", "user-1") + require.Error(t, err) + require.True(t, errors.Is(err, playermapping.ErrNotFound)) +} + +func TestGetByRace(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.BulkInsert(ctx, []playermapping.PlayerMapping{ + mapping("game-001", "user-1", "Aelinari", "uuid-1", now), + mapping("game-001", "user-2", "Drazi", "uuid-2", now), + })) + + got, err := store.GetByRace(ctx, "game-001", "Aelinari") + require.NoError(t, err) + assert.Equal(t, "user-1", got.UserID) + + _, err = store.GetByRace(ctx, "game-001", "Voltori") + require.Error(t, err) + require.True(t, errors.Is(err, playermapping.ErrNotFound)) +} + +func TestListByGameSortedByUserID(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.BulkInsert(ctx, []playermapping.PlayerMapping{ + mapping("game-001", "user-c", "Aelinari", "uuid-1", now), + mapping("game-001", "user-a", "Drazi", "uuid-2", now), + mapping("game-001", "user-b", "Voltori", "uuid-3", now), + // other game's mappings must not leak + mapping("game-002", "user-z", "Outsider", "uuid-4", now), + })) + + got, err := store.ListByGame(ctx, "game-001") + require.NoError(t, err) + require.Len(t, got, 3) + assert.Equal(t, "user-a", got[0].UserID) + assert.Equal(t, "user-b", got[1].UserID) + assert.Equal(t, "user-c", got[2].UserID) +} + +func TestListByGameUnknown(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + got, err := 
store.ListByGame(ctx, "unknown-game") + require.NoError(t, err) + assert.Empty(t, got) +} + +func TestDeleteByGameIdempotent(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.BulkInsert(ctx, []playermapping.PlayerMapping{ + mapping("game-001", "user-1", "Aelinari", "uuid-1", now), + mapping("game-001", "user-2", "Drazi", "uuid-2", now), + })) + + require.NoError(t, store.DeleteByGame(ctx, "game-001")) + got, err := store.ListByGame(ctx, "game-001") + require.NoError(t, err) + assert.Empty(t, got) + + // Second call must be a no-op. + require.NoError(t, store.DeleteByGame(ctx, "game-001")) +} + +func TestGetRejectsEmptyArgs(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.Get(ctx, "", "user-1") + require.Error(t, err) + _, err = store.Get(ctx, "game-001", "") + require.Error(t, err) +} + +func TestGetByRaceRejectsEmptyArgs(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.GetByRace(ctx, "", "Aelinari") + require.Error(t, err) + _, err = store.GetByRace(ctx, "game-001", "") + require.Error(t, err) +} + +func TestListByGameRejectsEmpty(t *testing.T) { + ctx := context.Background() + store := newStore(t) + _, err := store.ListByGame(ctx, "") + require.Error(t, err) +} + +func TestDeleteByGameRejectsEmpty(t *testing.T) { + ctx := context.Background() + store := newStore(t) + err := store.DeleteByGame(ctx, "") + require.Error(t, err) +} diff --git a/gamemaster/internal/adapters/postgres/runtimerecordstore/store.go b/gamemaster/internal/adapters/postgres/runtimerecordstore/store.go new file mode 100644 index 0000000..90bdb90 --- /dev/null +++ b/gamemaster/internal/adapters/postgres/runtimerecordstore/store.go @@ -0,0 +1,636 @@ +// Package runtimerecordstore implements the PostgreSQL-backed adapter +// for `ports.RuntimeRecordStore`. 
+// +// The package owns the on-disk shape of the `runtime_records` table +// defined in +// `galaxy/gamemaster/internal/adapters/postgres/migrations/00001_init.sql` +// and translates the schema-agnostic `ports.RuntimeRecordStore` +// interface declared in `internal/ports/runtimerecordstore.go` into +// concrete go-jet/v2 statements driven by the pgx driver. +// +// Lifecycle transitions (UpdateStatus) use compare-and-swap on +// `(game_id, status)` rather than holding a SELECT ... FOR UPDATE lock +// across the caller's logic, mirroring the pattern used by +// `rtmanager/internal/adapters/postgres/runtimerecordstore`. +package runtimerecordstore + +import ( + "context" + "database/sql" + "errors" + "fmt" + "strings" + "time" + + "galaxy/gamemaster/internal/adapters/postgres/internal/sqlx" + pgtable "galaxy/gamemaster/internal/adapters/postgres/jet/gamemaster/table" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/ports" + + pg "github.com/go-jet/jet/v2/postgres" +) + +// Config configures one PostgreSQL-backed runtime-record store. The +// store does not own the underlying *sql.DB lifecycle; the caller +// (typically the service runtime) opens, instruments, migrates, and +// closes the pool. +type Config struct { + // DB stores the connection pool the store uses for every query. + DB *sql.DB + + // OperationTimeout bounds one round trip. The store creates a + // derived context for each operation so callers cannot starve the + // pool with an unbounded ctx. + OperationTimeout time.Duration +} + +// Store persists Game Master runtime records in PostgreSQL. +type Store struct { + db *sql.DB + operationTimeout time.Duration +} + +// New constructs one PostgreSQL-backed runtime-record store from cfg. 
+func New(cfg Config) (*Store, error) { + if cfg.DB == nil { + return nil, errors.New("new postgres runtime record store: db must not be nil") + } + if cfg.OperationTimeout <= 0 { + return nil, errors.New("new postgres runtime record store: operation timeout must be positive") + } + return &Store{ + db: cfg.DB, + operationTimeout: cfg.OperationTimeout, + }, nil +} + +// runtimeSelectColumns is the canonical SELECT list for the +// runtime_records table, matching scanRecord's column order. +var runtimeSelectColumns = pg.ColumnList{ + pgtable.RuntimeRecords.GameID, + pgtable.RuntimeRecords.Status, + pgtable.RuntimeRecords.EngineEndpoint, + pgtable.RuntimeRecords.CurrentImageRef, + pgtable.RuntimeRecords.CurrentEngineVersion, + pgtable.RuntimeRecords.TurnSchedule, + pgtable.RuntimeRecords.CurrentTurn, + pgtable.RuntimeRecords.NextGenerationAt, + pgtable.RuntimeRecords.SkipNextTick, + pgtable.RuntimeRecords.EngineHealth, + pgtable.RuntimeRecords.CreatedAt, + pgtable.RuntimeRecords.UpdatedAt, + pgtable.RuntimeRecords.StartedAt, + pgtable.RuntimeRecords.StoppedAt, + pgtable.RuntimeRecords.FinishedAt, +} + +// Get returns the record identified by gameID. It returns +// runtime.ErrNotFound when no record exists. +func (store *Store) Get(ctx context.Context, gameID string) (runtime.RuntimeRecord, error) { + if store == nil || store.db == nil { + return runtime.RuntimeRecord{}, errors.New("get runtime record: nil store") + } + if strings.TrimSpace(gameID) == "" { + return runtime.RuntimeRecord{}, fmt.Errorf("get runtime record: game id must not be empty") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "get runtime record", store.operationTimeout) + if err != nil { + return runtime.RuntimeRecord{}, err + } + defer cancel() + + stmt := pg.SELECT(runtimeSelectColumns). + FROM(pgtable.RuntimeRecords). + WHERE(pgtable.RuntimeRecords.GameID.EQ(pg.String(gameID))) + + query, args := stmt.Sql() + row := store.db.QueryRowContext(operationCtx, query, args...) 
+ record, err := scanRecord(row) + if sqlx.IsNoRows(err) { + return runtime.RuntimeRecord{}, runtime.ErrNotFound + } + if err != nil { + return runtime.RuntimeRecord{}, fmt.Errorf("get runtime record: %w", err) + } + return record, nil +} + +// Insert installs record into the store. Returns runtime.ErrConflict +// when a row already exists for record.GameID. +func (store *Store) Insert(ctx context.Context, record runtime.RuntimeRecord) error { + if store == nil || store.db == nil { + return errors.New("insert runtime record: nil store") + } + if err := record.Validate(); err != nil { + return fmt.Errorf("insert runtime record: %w", err) + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "insert runtime record", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + stmt := pgtable.RuntimeRecords.INSERT( + pgtable.RuntimeRecords.GameID, + pgtable.RuntimeRecords.Status, + pgtable.RuntimeRecords.EngineEndpoint, + pgtable.RuntimeRecords.CurrentImageRef, + pgtable.RuntimeRecords.CurrentEngineVersion, + pgtable.RuntimeRecords.TurnSchedule, + pgtable.RuntimeRecords.CurrentTurn, + pgtable.RuntimeRecords.NextGenerationAt, + pgtable.RuntimeRecords.SkipNextTick, + pgtable.RuntimeRecords.EngineHealth, + pgtable.RuntimeRecords.CreatedAt, + pgtable.RuntimeRecords.UpdatedAt, + pgtable.RuntimeRecords.StartedAt, + pgtable.RuntimeRecords.StoppedAt, + pgtable.RuntimeRecords.FinishedAt, + ).VALUES( + record.GameID, + string(record.Status), + record.EngineEndpoint, + record.CurrentImageRef, + record.CurrentEngineVersion, + record.TurnSchedule, + int32(record.CurrentTurn), + sqlx.NullableTimePtr(record.NextGenerationAt), + record.SkipNextTick, + record.EngineHealth, + record.CreatedAt.UTC(), + record.UpdatedAt.UTC(), + sqlx.NullableTimePtr(record.StartedAt), + sqlx.NullableTimePtr(record.StoppedAt), + sqlx.NullableTimePtr(record.FinishedAt), + ) + + query, args := stmt.Sql() + if _, err := store.db.ExecContext(operationCtx, query, args...); err != nil { 
+ if sqlx.IsUniqueViolation(err) { + return fmt.Errorf("insert runtime record: %w", runtime.ErrConflict) + } + return fmt.Errorf("insert runtime record: %w", err) + } + return nil +} + +// UpdateStatus applies one status transition with a compare-and-swap +// guard on (game_id, status). The destination's lifecycle timestamps +// (started_at, stopped_at, finished_at) and the optional fields +// (engine_health, current_image_ref, current_engine_version) are +// written only when applicable. +func (store *Store) UpdateStatus(ctx context.Context, input ports.UpdateStatusInput) error { + if store == nil || store.db == nil { + return errors.New("update runtime status: nil store") + } + if err := input.Validate(); err != nil { + return err + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "update runtime status", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + assignments := buildUpdateStatusAssignments(input, input.Now.UTC()) + + // The first positional argument to UPDATE is required by jet's + // API but ignored when SET receives ColumnAssigment values + // (jet then serialises SetClauseNew instead of clauseSet). + stmt := pgtable.RuntimeRecords.UPDATE(pgtable.RuntimeRecords.Status). + SET(assignments[0], assignments[1:]...). + WHERE(pg.AND( + pgtable.RuntimeRecords.GameID.EQ(pg.String(input.GameID)), + pgtable.RuntimeRecords.Status.EQ(pg.String(string(input.ExpectedFrom))), + )) + + query, args := stmt.Sql() + result, err := store.db.ExecContext(operationCtx, query, args...) + if err != nil { + return fmt.Errorf("update runtime status: %w", err) + } + affected, err := result.RowsAffected() + if err != nil { + return fmt.Errorf("update runtime status: rows affected: %w", err) + } + if affected == 0 { + return store.classifyMissingUpdate(operationCtx, input.GameID) + } + return nil +} + +// buildUpdateStatusAssignments returns the slice of column assignments +// produced by one UpdateStatus call. 
Mandatory assignments (status, +// updated_at) are always present; lifecycle timestamps and optional +// fields appear only when relevant to the destination status or when +// the corresponding pointer is non-nil. +// +// The slice element type is `any` so the result can be spread into +// `UpdateStatement.SET(value any, values ...any)` without manual +// boxing at the call site. +func buildUpdateStatusAssignments(input ports.UpdateStatusInput, now time.Time) []any { + nowExpr := pg.TimestampzT(now) + assignments := []any{ + pgtable.RuntimeRecords.Status.SET(pg.String(string(input.To))), + pgtable.RuntimeRecords.UpdatedAt.SET(nowExpr), + } + + if input.To == runtime.StatusRunning && input.ExpectedFrom == runtime.StatusStarting { + assignments = append(assignments, pgtable.RuntimeRecords.StartedAt.SET(nowExpr)) + } + if input.To == runtime.StatusStopped { + assignments = append(assignments, pgtable.RuntimeRecords.StoppedAt.SET(nowExpr)) + } + if input.To == runtime.StatusFinished { + assignments = append(assignments, pgtable.RuntimeRecords.FinishedAt.SET(nowExpr)) + } + if input.EngineHealthSummary != nil { + assignments = append(assignments, pgtable.RuntimeRecords.EngineHealth.SET(pg.String(*input.EngineHealthSummary))) + } + if input.CurrentImageRef != nil { + assignments = append(assignments, pgtable.RuntimeRecords.CurrentImageRef.SET(pg.String(*input.CurrentImageRef))) + } + if input.CurrentEngineVersion != nil { + assignments = append(assignments, pgtable.RuntimeRecords.CurrentEngineVersion.SET(pg.String(*input.CurrentEngineVersion))) + } + + return assignments +} + +// classifyMissingUpdate distinguishes ErrNotFound from ErrConflict +// after an UPDATE that affected zero rows. A row that is absent yields +// ErrNotFound; a row whose status does not match the CAS predicate +// yields ErrConflict. +func (store *Store) classifyMissingUpdate(ctx context.Context, gameID string) error { + probe := pg.SELECT(pgtable.RuntimeRecords.Status). 
+ FROM(pgtable.RuntimeRecords). + WHERE(pgtable.RuntimeRecords.GameID.EQ(pg.String(gameID))) + probeQuery, probeArgs := probe.Sql() + + var current string + row := store.db.QueryRowContext(ctx, probeQuery, probeArgs...) + if err := row.Scan(&current); err != nil { + if sqlx.IsNoRows(err) { + return runtime.ErrNotFound + } + return fmt.Errorf("update runtime status: probe: %w", err) + } + return runtime.ErrConflict +} + +// UpdateImage rotates the `current_image_ref` and +// `current_engine_version` columns of one runtime row under a +// compare-and-swap guard on `(game_id, status)`. The destination +// status is preserved; only `updated_at` and the two image columns +// change. Returns runtime.ErrNotFound when no row matches and +// runtime.ErrConflict when the stored status differs from +// input.ExpectedStatus. Used by the admin patch flow (Stage 17) where +// Runtime Manager recreates the engine container with a new image +// while the runtime stays `running`. +func (store *Store) UpdateImage(ctx context.Context, input ports.UpdateImageInput) error { + if store == nil || store.db == nil { + return errors.New("update runtime image: nil store") + } + if err := input.Validate(); err != nil { + return err + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "update runtime image", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + now := input.Now.UTC() + stmt := pgtable.RuntimeRecords.UPDATE( + pgtable.RuntimeRecords.CurrentImageRef, + pgtable.RuntimeRecords.CurrentEngineVersion, + pgtable.RuntimeRecords.UpdatedAt, + ).SET( + pg.String(input.CurrentImageRef), + pg.String(input.CurrentEngineVersion), + pg.TimestampzT(now), + ).WHERE(pg.AND( + pgtable.RuntimeRecords.GameID.EQ(pg.String(input.GameID)), + pgtable.RuntimeRecords.Status.EQ(pg.String(string(input.ExpectedStatus))), + )) + + query, args := stmt.Sql() + result, err := store.db.ExecContext(operationCtx, query, args...)
+ if err != nil { + return fmt.Errorf("update runtime image: %w", err) + } + affected, err := result.RowsAffected() + if err != nil { + return fmt.Errorf("update runtime image: rows affected: %w", err) + } + if affected == 0 { + return store.classifyMissingUpdate(operationCtx, input.GameID) + } + return nil +} + +// UpdateEngineHealth rotates the `engine_health` column of one runtime +// row plus `updated_at`. The destination status is preserved and no +// CAS guard is applied so late-arriving runtime:health_events still +// refresh the summary regardless of the current runtime status. Used +// by the Stage 18 health-events consumer. Returns runtime.ErrNotFound +// when no row exists for input.GameID. +func (store *Store) UpdateEngineHealth(ctx context.Context, input ports.UpdateEngineHealthInput) error { + if store == nil || store.db == nil { + return errors.New("update runtime engine health: nil store") + } + if err := input.Validate(); err != nil { + return err + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "update runtime engine health", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + stmt := pgtable.RuntimeRecords.UPDATE( + pgtable.RuntimeRecords.EngineHealth, + pgtable.RuntimeRecords.UpdatedAt, + ).SET( + pg.String(input.EngineHealthSummary), + pg.TimestampzT(input.Now.UTC()), + ).WHERE(pgtable.RuntimeRecords.GameID.EQ(pg.String(input.GameID))) + + query, args := stmt.Sql() + result, err := store.db.ExecContext(operationCtx, query, args...) + if err != nil { + return fmt.Errorf("update runtime engine health: %w", err) + } + affected, err := result.RowsAffected() + if err != nil { + return fmt.Errorf("update runtime engine health: rows affected: %w", err) + } + if affected == 0 { + return runtime.ErrNotFound + } + return nil +} + +// UpdateScheduling mutates the scheduling columns of one runtime row +// (`next_generation_at`, `skip_next_tick`, `current_turn`) plus +// `updated_at`. 
Returns runtime.ErrNotFound when no row exists. +func (store *Store) UpdateScheduling(ctx context.Context, input ports.UpdateSchedulingInput) error { + if store == nil || store.db == nil { + return errors.New("update runtime scheduling: nil store") + } + if err := input.Validate(); err != nil { + return err + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "update runtime scheduling", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + var nextGenExpr pg.Expression + if input.NextGenerationAt != nil { + nextGenExpr = pg.TimestampzT(input.NextGenerationAt.UTC()) + } else { + nextGenExpr = pg.NULL + } + + stmt := pgtable.RuntimeRecords.UPDATE( + pgtable.RuntimeRecords.NextGenerationAt, + pgtable.RuntimeRecords.SkipNextTick, + pgtable.RuntimeRecords.CurrentTurn, + pgtable.RuntimeRecords.UpdatedAt, + ).SET( + nextGenExpr, + pg.Bool(input.SkipNextTick), + pg.Int32(int32(input.CurrentTurn)), + pg.TimestampzT(input.Now.UTC()), + ).WHERE(pgtable.RuntimeRecords.GameID.EQ(pg.String(input.GameID))) + + query, args := stmt.Sql() + result, err := store.db.ExecContext(operationCtx, query, args...) + if err != nil { + return fmt.Errorf("update runtime scheduling: %w", err) + } + affected, err := result.RowsAffected() + if err != nil { + return fmt.Errorf("update runtime scheduling: rows affected: %w", err) + } + if affected == 0 { + return runtime.ErrNotFound + } + return nil +} + +// Delete removes the record identified by gameID. The call is +// idempotent: it returns nil even when no row matches (mirrors +// PlayerMappingStore.DeleteByGame). Used by the register-runtime +// rollback path (Stage 13) when engine /admin/init or any later setup +// step fails after the row has been installed with status=starting. 
+func (store *Store) Delete(ctx context.Context, gameID string) error { + if store == nil || store.db == nil { + return errors.New("delete runtime record: nil store") + } + if strings.TrimSpace(gameID) == "" { + return fmt.Errorf("delete runtime record: game id must not be empty") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "delete runtime record", store.operationTimeout) + if err != nil { + return err + } + defer cancel() + + stmt := pgtable.RuntimeRecords.DELETE(). + WHERE(pgtable.RuntimeRecords.GameID.EQ(pg.String(gameID))) + + query, args := stmt.Sql() + if _, err := store.db.ExecContext(operationCtx, query, args...); err != nil { + return fmt.Errorf("delete runtime record: %w", err) + } + return nil +} + +// ListDueRunning returns every record whose status is `running` and +// whose `next_generation_at <= now`. The order is +// (next_generation_at ASC, game_id ASC), matching the +// `runtime_records_status_next_gen_idx` direction. +func (store *Store) ListDueRunning(ctx context.Context, now time.Time) ([]runtime.RuntimeRecord, error) { + if store == nil || store.db == nil { + return nil, errors.New("list due runtime records: nil store") + } + if now.IsZero() { + return nil, fmt.Errorf("list due runtime records: now must not be zero") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "list due runtime records", store.operationTimeout) + if err != nil { + return nil, err + } + defer cancel() + + cutoff := pg.TimestampzT(now.UTC()) + stmt := pg.SELECT(runtimeSelectColumns). + FROM(pgtable.RuntimeRecords). + WHERE(pg.AND( + pgtable.RuntimeRecords.Status.EQ(pg.String(string(runtime.StatusRunning))), + pgtable.RuntimeRecords.NextGenerationAt.LT_EQ(cutoff), + )). 
+ ORDER_BY( + pgtable.RuntimeRecords.NextGenerationAt.ASC(), + pgtable.RuntimeRecords.GameID.ASC(), + ) + + return store.queryRecords(operationCtx, stmt, "list due runtime records") +} + +// List returns every record in the store, ordered by `created_at` +// descending and by `game_id` ascending as a tie-breaker. Used by the +// `internalListRuntimes` REST handler when no status filter is +// supplied. +func (store *Store) List(ctx context.Context) ([]runtime.RuntimeRecord, error) { + if store == nil || store.db == nil { + return nil, errors.New("list runtime records: nil store") + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "list runtime records", store.operationTimeout) + if err != nil { + return nil, err + } + defer cancel() + + stmt := pg.SELECT(runtimeSelectColumns). + FROM(pgtable.RuntimeRecords). + ORDER_BY( + pgtable.RuntimeRecords.CreatedAt.DESC(), + pgtable.RuntimeRecords.GameID.ASC(), + ) + + return store.queryRecords(operationCtx, stmt, "list runtime records") +} + +// ListByStatus returns every record currently indexed under status, +// ordered by game_id ASC. +func (store *Store) ListByStatus(ctx context.Context, status runtime.Status) ([]runtime.RuntimeRecord, error) { + if store == nil || store.db == nil { + return nil, errors.New("list runtime records by status: nil store") + } + if !status.IsKnown() { + return nil, fmt.Errorf("list runtime records by status: status %q is unsupported", status) + } + + operationCtx, cancel, err := sqlx.WithTimeout(ctx, "list runtime records by status", store.operationTimeout) + if err != nil { + return nil, err + } + defer cancel() + + stmt := pg.SELECT(runtimeSelectColumns). + FROM(pgtable.RuntimeRecords). + WHERE(pgtable.RuntimeRecords.Status.EQ(pg.String(string(status)))). 
+ ORDER_BY(pgtable.RuntimeRecords.GameID.ASC()) + + return store.queryRecords(operationCtx, stmt, "list runtime records by status") +} + +// queryRecords runs a SELECT statement and scans every returned row +// into a runtime.RuntimeRecord slice. opName is used only to prefix +// error messages. +func (store *Store) queryRecords(ctx context.Context, stmt pg.SelectStatement, opName string) ([]runtime.RuntimeRecord, error) { + query, args := stmt.Sql() + rows, err := store.db.QueryContext(ctx, query, args...) + if err != nil { + return nil, fmt.Errorf("%s: %w", opName, err) + } + defer rows.Close() + + records := make([]runtime.RuntimeRecord, 0) + for rows.Next() { + record, err := scanRecord(rows) + if err != nil { + return nil, fmt.Errorf("%s: scan: %w", opName, err) + } + records = append(records, record) + } + if err := rows.Err(); err != nil { + return nil, fmt.Errorf("%s: %w", opName, err) + } + if len(records) == 0 { + return nil, nil + } + return records, nil +} + +// rowScanner abstracts *sql.Row and *sql.Rows so scanRecord can be +// shared across both single-row and iterated reads. +type rowScanner interface { + Scan(dest ...any) error +} + +// scanRecord scans one runtime_records row from rs. Returns +// sql.ErrNoRows verbatim so callers can distinguish "no row" from a +// hard error. 
+func scanRecord(rs rowScanner) (runtime.RuntimeRecord, error) { + var ( + gameID string + status string + engineEndpoint string + currentImageRef string + currentEngineVersion string + turnSchedule string + currentTurn int32 + nextGenerationAt sql.NullTime + skipNextTick bool + engineHealth string + createdAt time.Time + updatedAt time.Time + startedAt sql.NullTime + stoppedAt sql.NullTime + finishedAt sql.NullTime + ) + if err := rs.Scan( + &gameID, + &status, + &engineEndpoint, + &currentImageRef, + &currentEngineVersion, + &turnSchedule, + &currentTurn, + &nextGenerationAt, + &skipNextTick, + &engineHealth, + &createdAt, + &updatedAt, + &startedAt, + &stoppedAt, + &finishedAt, + ); err != nil { + return runtime.RuntimeRecord{}, err + } + return runtime.RuntimeRecord{ + GameID: gameID, + Status: runtime.Status(status), + EngineEndpoint: engineEndpoint, + CurrentImageRef: currentImageRef, + CurrentEngineVersion: currentEngineVersion, + TurnSchedule: turnSchedule, + CurrentTurn: int(currentTurn), + NextGenerationAt: sqlx.TimePtrFromNullable(nextGenerationAt), + SkipNextTick: skipNextTick, + EngineHealth: engineHealth, + CreatedAt: createdAt.UTC(), + UpdatedAt: updatedAt.UTC(), + StartedAt: sqlx.TimePtrFromNullable(startedAt), + StoppedAt: sqlx.TimePtrFromNullable(stoppedAt), + FinishedAt: sqlx.TimePtrFromNullable(finishedAt), + }, nil +} + +// Ensure Store satisfies the ports.RuntimeRecordStore interface at +// compile time.
+var _ ports.RuntimeRecordStore = (*Store)(nil) diff --git a/gamemaster/internal/adapters/postgres/runtimerecordstore/store_test.go b/gamemaster/internal/adapters/postgres/runtimerecordstore/store_test.go new file mode 100644 index 0000000..76fac9a --- /dev/null +++ b/gamemaster/internal/adapters/postgres/runtimerecordstore/store_test.go @@ -0,0 +1,718 @@ +package runtimerecordstore_test + +import ( + "context" + "errors" + "sync" + "testing" + "time" + + "galaxy/gamemaster/internal/adapters/postgres/internal/pgtest" + "galaxy/gamemaster/internal/adapters/postgres/runtimerecordstore" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/ports" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestMain(m *testing.M) { pgtest.RunMain(m) } + +func newStore(t *testing.T) *runtimerecordstore.Store { + t.Helper() + pgtest.TruncateAll(t) + store, err := runtimerecordstore.New(runtimerecordstore.Config{ + DB: pgtest.Ensure(t).Pool(), + OperationTimeout: pgtest.OperationTimeout, + }) + require.NoError(t, err) + return store +} + +func startingRecord(gameID string, createdAt time.Time) runtime.RuntimeRecord { + return runtime.RuntimeRecord{ + GameID: gameID, + Status: runtime.StatusStarting, + EngineEndpoint: "http://galaxy-game-" + gameID + ":8080", + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.3", + CurrentEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + CurrentTurn: 0, + EngineHealth: "", + CreatedAt: createdAt, + UpdatedAt: createdAt, + } +} + +func runningRecord(gameID string, createdAt time.Time, nextGen time.Time) runtime.RuntimeRecord { + startedAt := createdAt.Add(time.Second) + return runtime.RuntimeRecord{ + GameID: gameID, + Status: runtime.StatusRunning, + EngineEndpoint: "http://galaxy-game-" + gameID + ":8080", + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.3", + CurrentEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + CurrentTurn: 1, + NextGenerationAt: &nextGen, + EngineHealth: 
"healthy", + CreatedAt: createdAt, + UpdatedAt: startedAt, + StartedAt: &startedAt, + } +} + +func TestNewRejectsInvalidConfig(t *testing.T) { + _, err := runtimerecordstore.New(runtimerecordstore.Config{}) + require.Error(t, err) + + store, err := runtimerecordstore.New(runtimerecordstore.Config{ + DB: pgtest.Ensure(t).Pool(), + OperationTimeout: 0, + }) + require.Error(t, err) + require.Nil(t, store) +} + +func TestInsertGetRoundTrip(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + record := startingRecord("game-001", now) + + require.NoError(t, store.Insert(ctx, record)) + + got, err := store.Get(ctx, record.GameID) + require.NoError(t, err) + assert.Equal(t, record.GameID, got.GameID) + assert.Equal(t, runtime.StatusStarting, got.Status) + assert.Equal(t, record.EngineEndpoint, got.EngineEndpoint) + assert.Equal(t, record.CurrentImageRef, got.CurrentImageRef) + assert.Equal(t, record.CurrentEngineVersion, got.CurrentEngineVersion) + assert.Equal(t, record.TurnSchedule, got.TurnSchedule) + assert.Equal(t, 0, got.CurrentTurn) + assert.Nil(t, got.NextGenerationAt) + assert.False(t, got.SkipNextTick) + assert.Equal(t, "", got.EngineHealth) + assert.True(t, got.CreatedAt.Equal(now), "created_at: want %v, got %v", now, got.CreatedAt) + assert.Equal(t, time.UTC, got.CreatedAt.Location()) + assert.True(t, got.UpdatedAt.Equal(now)) + assert.Equal(t, time.UTC, got.UpdatedAt.Location()) + assert.Nil(t, got.StartedAt) + assert.Nil(t, got.StoppedAt) + assert.Nil(t, got.FinishedAt) +} + +func TestInsertRejectsDuplicate(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + record := startingRecord("game-001", now) + require.NoError(t, store.Insert(ctx, record)) + + err := store.Insert(ctx, record) + require.Error(t, err) + require.True(t, errors.Is(err, runtime.ErrConflict), "want ErrConflict, got %v", err) 
+} + +func TestInsertRejectsInvalidRecord(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + bad := runtime.RuntimeRecord{} // empty + err := store.Insert(ctx, bad) + require.Error(t, err) + require.False(t, errors.Is(err, runtime.ErrConflict)) +} + +func TestGetReturnsErrNotFound(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.Get(ctx, "missing") + require.Error(t, err) + require.True(t, errors.Is(err, runtime.ErrNotFound)) +} + +func TestUpdateStatusStartingToRunningSetsStartedAt(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, startingRecord("game-001", created))) + + now := created.Add(2 * time.Second) + require.NoError(t, store.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: "game-001", + ExpectedFrom: runtime.StatusStarting, + To: runtime.StatusRunning, + Now: now, + })) + + got, err := store.Get(ctx, "game-001") + require.NoError(t, err) + assert.Equal(t, runtime.StatusRunning, got.Status) + require.NotNil(t, got.StartedAt) + assert.True(t, got.StartedAt.Equal(now)) + assert.True(t, got.UpdatedAt.Equal(now)) + assert.Nil(t, got.StoppedAt) + assert.Nil(t, got.FinishedAt) +} + +func TestUpdateStatusToFinishedSetsFinishedAt(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + nextGen := created.Add(time.Hour) + require.NoError(t, store.Insert(ctx, runningRecord("game-001", created, nextGen))) + + require.NoError(t, store.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: "game-001", + ExpectedFrom: runtime.StatusRunning, + To: runtime.StatusGenerationInProgress, + Now: created.Add(2 * time.Second), + })) + + finishAt := created.Add(time.Hour) + require.NoError(t, store.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: "game-001", + ExpectedFrom: 
runtime.StatusGenerationInProgress, + To: runtime.StatusFinished, + Now: finishAt, + })) + + got, err := store.Get(ctx, "game-001") + require.NoError(t, err) + assert.Equal(t, runtime.StatusFinished, got.Status) + require.NotNil(t, got.FinishedAt) + assert.True(t, got.FinishedAt.Equal(finishAt)) + assert.True(t, got.UpdatedAt.Equal(finishAt)) +} + +func TestUpdateStatusToStoppedSetsStoppedAt(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + nextGen := created.Add(time.Hour) + require.NoError(t, store.Insert(ctx, runningRecord("game-001", created, nextGen))) + + stopAt := created.Add(2 * time.Hour) + require.NoError(t, store.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: "game-001", + ExpectedFrom: runtime.StatusRunning, + To: runtime.StatusStopped, + Now: stopAt, + })) + + got, err := store.Get(ctx, "game-001") + require.NoError(t, err) + assert.Equal(t, runtime.StatusStopped, got.Status) + require.NotNil(t, got.StoppedAt) + assert.True(t, got.StoppedAt.Equal(stopAt)) + require.NotNil(t, got.StartedAt, "started_at must remain set after stop") + assert.Nil(t, got.FinishedAt) +} + +func TestUpdateStatusEngineUnreachableRecoveryKeepsStartedAt(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + nextGen := created.Add(time.Hour) + original := runningRecord("game-001", created, nextGen) + require.NoError(t, store.Insert(ctx, original)) + + require.NoError(t, store.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: "game-001", + ExpectedFrom: runtime.StatusRunning, + To: runtime.StatusEngineUnreachable, + Now: created.Add(time.Minute), + })) + + require.NoError(t, store.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: "game-001", + ExpectedFrom: runtime.StatusEngineUnreachable, + To: runtime.StatusRunning, + Now: created.Add(2 * time.Minute), + })) + + got, err := store.Get(ctx, 
"game-001") + require.NoError(t, err) + assert.Equal(t, runtime.StatusRunning, got.Status) + require.NotNil(t, got.StartedAt) + assert.True(t, got.StartedAt.Equal(*original.StartedAt), + "recovery transition must not overwrite started_at") +} + +func TestUpdateStatusOptionalFields(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + nextGen := created.Add(time.Hour) + require.NoError(t, store.Insert(ctx, runningRecord("game-001", created, nextGen))) + + healthy := "engine_unreachable_summary" + imageRef := "ghcr.io/galaxy/game:v1.2.4" + engineVersion := "v1.2.4" + now := created.Add(time.Minute) + + require.NoError(t, store.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: "game-001", + ExpectedFrom: runtime.StatusRunning, + To: runtime.StatusGenerationInProgress, + Now: now, + EngineHealthSummary: &healthy, + CurrentImageRef: &imageRef, + CurrentEngineVersion: &engineVersion, + })) + + got, err := store.Get(ctx, "game-001") + require.NoError(t, err) + assert.Equal(t, runtime.StatusGenerationInProgress, got.Status) + assert.Equal(t, healthy, got.EngineHealth) + assert.Equal(t, imageRef, got.CurrentImageRef) + assert.Equal(t, engineVersion, got.CurrentEngineVersion) +} + +func TestUpdateStatusOnMissingReturnsNotFound(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: "ghost", + ExpectedFrom: runtime.StatusRunning, + To: runtime.StatusStopped, + Now: time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC), + }) + require.Error(t, err) + require.True(t, errors.Is(err, runtime.ErrNotFound), "want ErrNotFound, got %v", err) +} + +func TestUpdateStatusStaleCASReturnsConflict(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, startingRecord("game-001", created))) + + err 
:= store.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: "game-001", + ExpectedFrom: runtime.StatusRunning, + To: runtime.StatusStopped, + Now: created.Add(time.Second), + }) + require.Error(t, err) + require.True(t, errors.Is(err, runtime.ErrConflict), "want ErrConflict, got %v", err) +} + +func TestUpdateStatusConcurrentCAS(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + nextGen := created.Add(time.Hour) + require.NoError(t, store.Insert(ctx, runningRecord("game-001", created, nextGen))) + + const concurrency = 8 + results := make([]error, concurrency) + var wg sync.WaitGroup + wg.Add(concurrency) + for index := range concurrency { + go func() { + defer wg.Done() + results[index] = store.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: "game-001", + ExpectedFrom: runtime.StatusRunning, + To: runtime.StatusStopped, + Now: created.Add(time.Duration(index+1) * time.Second), + }) + }() + } + wg.Wait() + + wins, conflicts := 0, 0 + for _, err := range results { + switch { + case err == nil: + wins++ + case errors.Is(err, runtime.ErrConflict): + conflicts++ + default: + t.Errorf("unexpected error: %v", err) + } + } + assert.Equal(t, 1, wins, "exactly one caller must win the CAS race") + assert.Equal(t, concurrency-1, conflicts, "the rest must observe runtime.ErrConflict") +} + +func TestUpdateImageHappy(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + nextGen := created.Add(time.Hour) + require.NoError(t, store.Insert(ctx, runningRecord("game-001", created, nextGen))) + + now := nextGen.Add(time.Second) + require.NoError(t, store.UpdateImage(ctx, ports.UpdateImageInput{ + GameID: "game-001", + ExpectedStatus: runtime.StatusRunning, + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.4", + CurrentEngineVersion: "v1.2.4", + Now: now, + })) + + got, err := store.Get(ctx, "game-001") + 
require.NoError(t, err) + assert.Equal(t, runtime.StatusRunning, got.Status, "patch must not change status") + assert.Equal(t, "ghcr.io/galaxy/game:v1.2.4", got.CurrentImageRef) + assert.Equal(t, "v1.2.4", got.CurrentEngineVersion) + assert.True(t, got.UpdatedAt.Equal(now)) + require.NotNil(t, got.NextGenerationAt, "next_generation_at must remain untouched") + assert.True(t, got.NextGenerationAt.Equal(nextGen)) + assert.Equal(t, 1, got.CurrentTurn, "current_turn must remain untouched") +} + +func TestUpdateImageStaleStatusReturnsConflict(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, startingRecord("game-001", created))) + + err := store.UpdateImage(ctx, ports.UpdateImageInput{ + GameID: "game-001", + ExpectedStatus: runtime.StatusRunning, + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.4", + CurrentEngineVersion: "v1.2.4", + Now: created.Add(time.Second), + }) + require.Error(t, err) + require.True(t, errors.Is(err, runtime.ErrConflict), "want ErrConflict, got %v", err) +} + +func TestUpdateImageOnMissingReturnsNotFound(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.UpdateImage(ctx, ports.UpdateImageInput{ + GameID: "ghost", + ExpectedStatus: runtime.StatusRunning, + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.4", + CurrentEngineVersion: "v1.2.4", + Now: time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC), + }) + require.Error(t, err) + require.True(t, errors.Is(err, runtime.ErrNotFound), "want ErrNotFound, got %v", err) +} + +func TestUpdateImageRejectsInvalidInput(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.UpdateImage(ctx, ports.UpdateImageInput{ + GameID: "", + ExpectedStatus: runtime.StatusRunning, + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.4", + CurrentEngineVersion: "v1.2.4", + Now: time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC), + }) + 
require.Error(t, err) + require.False(t, errors.Is(err, runtime.ErrConflict)) + require.False(t, errors.Is(err, runtime.ErrNotFound)) +} + +func TestUpdateEngineHealthHappy(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + nextGen := created.Add(time.Hour) + require.NoError(t, store.Insert(ctx, runningRecord("game-001", created, nextGen))) + + now := nextGen.Add(2 * time.Second) + require.NoError(t, store.UpdateEngineHealth(ctx, ports.UpdateEngineHealthInput{ + GameID: "game-001", + EngineHealthSummary: "probe_failed", + Now: now, + })) + + got, err := store.Get(ctx, "game-001") + require.NoError(t, err) + assert.Equal(t, runtime.StatusRunning, got.Status, "engine health update must not change status") + assert.Equal(t, "probe_failed", got.EngineHealth) + assert.True(t, got.UpdatedAt.Equal(now)) + require.NotNil(t, got.NextGenerationAt, "next_generation_at must remain untouched") + assert.True(t, got.NextGenerationAt.Equal(nextGen)) + assert.Equal(t, 1, got.CurrentTurn, "current_turn must remain untouched") +} + +func TestUpdateEngineHealthAcceptsEmptySummary(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + nextGen := created.Add(time.Hour) + require.NoError(t, store.Insert(ctx, runningRecord("game-001", created, nextGen))) + + now := nextGen.Add(time.Second) + require.NoError(t, store.UpdateEngineHealth(ctx, ports.UpdateEngineHealthInput{ + GameID: "game-001", + EngineHealthSummary: "", + Now: now, + })) + + got, err := store.Get(ctx, "game-001") + require.NoError(t, err) + assert.Equal(t, "", got.EngineHealth) +} + +func TestUpdateEngineHealthOnMissingReturnsNotFound(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.UpdateEngineHealth(ctx, ports.UpdateEngineHealthInput{ + GameID: "ghost", + EngineHealthSummary: "exited", + Now: time.Date(2026, 
time.April, 27, 12, 0, 0, 0, time.UTC), + }) + require.Error(t, err) + require.True(t, errors.Is(err, runtime.ErrNotFound), "want ErrNotFound, got %v", err) +} + +func TestUpdateEngineHealthRejectsInvalidInput(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.UpdateEngineHealth(ctx, ports.UpdateEngineHealthInput{ + GameID: "", + EngineHealthSummary: "healthy", + Now: time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC), + }) + require.Error(t, err) + require.False(t, errors.Is(err, runtime.ErrConflict)) + require.False(t, errors.Is(err, runtime.ErrNotFound)) +} + +func TestUpdateEngineHealthAppliesFromAnyStatus(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, startingRecord("game-001", created))) + + now := created.Add(time.Second) + require.NoError(t, store.UpdateEngineHealth(ctx, ports.UpdateEngineHealthInput{ + GameID: "game-001", + EngineHealthSummary: "exited", + Now: now, + })) + + got, err := store.Get(ctx, "game-001") + require.NoError(t, err) + assert.Equal(t, runtime.StatusStarting, got.Status, "no status mutation expected") + assert.Equal(t, "exited", got.EngineHealth) +} + +func TestUpdateSchedulingHappy(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + nextGen := created.Add(time.Hour) + require.NoError(t, store.Insert(ctx, runningRecord("game-001", created, nextGen))) + + updated := nextGen.Add(time.Hour) + now := nextGen.Add(time.Second) + require.NoError(t, store.UpdateScheduling(ctx, ports.UpdateSchedulingInput{ + GameID: "game-001", + NextGenerationAt: &updated, + SkipNextTick: true, + CurrentTurn: 5, + Now: now, + })) + + got, err := store.Get(ctx, "game-001") + require.NoError(t, err) + require.NotNil(t, got.NextGenerationAt) + assert.True(t, got.NextGenerationAt.Equal(updated)) + 
assert.True(t, got.SkipNextTick) + assert.Equal(t, 5, got.CurrentTurn) + assert.True(t, got.UpdatedAt.Equal(now)) +} + +func TestUpdateSchedulingClearsNextGen(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + nextGen := created.Add(time.Hour) + require.NoError(t, store.Insert(ctx, runningRecord("game-001", created, nextGen))) + + now := nextGen.Add(time.Second) + require.NoError(t, store.UpdateScheduling(ctx, ports.UpdateSchedulingInput{ + GameID: "game-001", + NextGenerationAt: nil, + SkipNextTick: false, + CurrentTurn: 0, + Now: now, + })) + + got, err := store.Get(ctx, "game-001") + require.NoError(t, err) + assert.Nil(t, got.NextGenerationAt) + assert.False(t, got.SkipNextTick) +} + +func TestUpdateSchedulingOnMissingReturnsNotFound(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + err := store.UpdateScheduling(ctx, ports.UpdateSchedulingInput{ + GameID: "ghost", + CurrentTurn: 0, + Now: time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC), + }) + require.Error(t, err) + require.True(t, errors.Is(err, runtime.ErrNotFound)) +} + +func TestListDueRunning(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + createdEarlier := time.Date(2026, time.April, 27, 10, 0, 0, 0, time.UTC) + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + due := created.Add(-time.Minute) // due before now + future := created.Add(time.Hour) // not due yet + + dueRecord := runningRecord("game-due", created, due) + require.NoError(t, store.Insert(ctx, dueRecord)) + + futureRecord := runningRecord("game-future", created, future) + require.NoError(t, store.Insert(ctx, futureRecord)) + + // A stopped record whose next_generation_at is in the past must + // still be excluded by the running-status filter. 
+ stoppedRecord := startingRecord("game-stopped", createdEarlier) + stoppedRecord.Status = runtime.StatusStopped + startedAt := createdEarlier.Add(time.Second) + stoppedAt := createdEarlier.Add(time.Minute) + stoppedRecord.StartedAt = &startedAt + stoppedRecord.StoppedAt = &stoppedAt + stoppedRecord.UpdatedAt = stoppedAt + stalePast := created.Add(-30 * time.Minute) + stoppedRecord.NextGenerationAt = &stalePast + require.NoError(t, store.Insert(ctx, stoppedRecord)) + + results, err := store.ListDueRunning(ctx, created) + require.NoError(t, err) + require.Len(t, results, 1) + assert.Equal(t, "game-due", results[0].GameID) +} + +func TestListByStatus(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + created := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, runningRecord("game-r1", created, created.Add(time.Hour)))) + require.NoError(t, store.Insert(ctx, runningRecord("game-r2", created, created.Add(time.Hour)))) + require.NoError(t, store.Insert(ctx, startingRecord("game-s1", created))) + + running, err := store.ListByStatus(ctx, runtime.StatusRunning) + require.NoError(t, err) + require.Len(t, running, 2) + assert.Equal(t, "game-r1", running[0].GameID) + assert.Equal(t, "game-r2", running[1].GameID) + + starting, err := store.ListByStatus(ctx, runtime.StatusStarting) + require.NoError(t, err) + require.Len(t, starting, 1) + assert.Equal(t, "game-s1", starting[0].GameID) + + finished, err := store.ListByStatus(ctx, runtime.StatusFinished) + require.NoError(t, err) + assert.Empty(t, finished) +} + +func TestListReturnsEveryRecordOrderedByCreatedAtDesc(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + earliest := time.Date(2026, time.April, 27, 10, 0, 0, 0, time.UTC) + middle := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + latest := time.Date(2026, time.April, 27, 14, 0, 0, 0, time.UTC) + + require.NoError(t, store.Insert(ctx, startingRecord("game-earliest", 
earliest))) + require.NoError(t, store.Insert(ctx, runningRecord("game-middle", middle, middle.Add(time.Hour)))) + require.NoError(t, store.Insert(ctx, runningRecord("game-latest", latest, latest.Add(time.Hour)))) + + records, err := store.List(ctx) + require.NoError(t, err) + require.Len(t, records, 3) + assert.Equal(t, "game-latest", records[0].GameID) + assert.Equal(t, "game-middle", records[1].GameID) + assert.Equal(t, "game-earliest", records[2].GameID) +} + +func TestListReturnsEmptySliceWhenStoreIsEmpty(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + records, err := store.List(ctx) + require.NoError(t, err) + assert.Empty(t, records) +} + +func TestListByStatusUnknownRejected(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.ListByStatus(ctx, runtime.Status("exotic")) + require.Error(t, err) +} + +func TestListDueRunningRejectsZeroNow(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.ListDueRunning(ctx, time.Time{}) + require.Error(t, err) +} + +func TestGetRejectsEmptyGameID(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + _, err := store.Get(ctx, "") + require.Error(t, err) +} + +func TestDeleteIdempotent(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + now := time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) + require.NoError(t, store.Insert(ctx, startingRecord("game-001", now))) + + require.NoError(t, store.Delete(ctx, "game-001")) + + _, err := store.Get(ctx, "game-001") + require.ErrorIs(t, err, runtime.ErrNotFound) + + // Second call must be a no-op. 
+ require.NoError(t, store.Delete(ctx, "game-001")) +} + +func TestDeleteRejectsEmptyGameID(t *testing.T) { + ctx := context.Background() + store := newStore(t) + + require.Error(t, store.Delete(ctx, "")) +} diff --git a/gamemaster/internal/adapters/redisstate/keyspace.go b/gamemaster/internal/adapters/redisstate/keyspace.go new file mode 100644 index 0000000..1dd9d3d --- /dev/null +++ b/gamemaster/internal/adapters/redisstate/keyspace.go @@ -0,0 +1,38 @@ +// Package redisstate hosts the Game Master Redis adapters that share a +// single keyspace. The sole nested subpackage in v1 is +// `streamoffsets` (the per-consumer offset for the +// runtime:health_events stream); membership cache lives in process and +// does not touch Redis. +// +// The package itself only declares the keyspace; concrete stores live +// in nested packages so dependencies (miniredis, testcontainers) stay +// out of consumer build graphs that do not need them. +package redisstate + +import "encoding/base64" + +// defaultPrefix is the mandatory `gamemaster:` namespace prefix shared +// by every Game Master Redis key. +const defaultPrefix = "gamemaster:" + +// Keyspace builds the Game Master Redis keys. The namespace covers +// stream consumer offsets in v1. +// +// Dynamic key segments are encoded with base64url so raw key structure +// does not depend on caller-provided characters; this matches the +// encoding chosen by `lobby/internal/adapters/redisstate.Keyspace` and +// `rtmanager/internal/adapters/redisstate.Keyspace`. +type Keyspace struct{} + +// StreamOffset returns the Redis key that stores the last successfully +// processed entry id for one Redis Stream consumer. The streamLabel is +// the short logical identifier of the consumer (e.g. `health_events`), +// not the full stream name; it stays stable when the underlying stream +// key is renamed.
+func (Keyspace) StreamOffset(streamLabel string) string { + return defaultPrefix + "stream_offsets:" + encodeKeyComponent(streamLabel) +} + +func encodeKeyComponent(value string) string { + return base64.RawURLEncoding.EncodeToString([]byte(value)) +} diff --git a/gamemaster/internal/adapters/redisstate/streamoffsets/store.go b/gamemaster/internal/adapters/redisstate/streamoffsets/store.go new file mode 100644 index 0000000..837aee5 --- /dev/null +++ b/gamemaster/internal/adapters/redisstate/streamoffsets/store.go @@ -0,0 +1,94 @@ +// Package streamoffsets implements the Redis-backed adapter for +// `ports.StreamOffsetStore`. +// +// In v1 the only consumer that calls Load/Save is the +// runtime:health_events worker (PLAN stage 18). Keys are produced by +// `redisstate.Keyspace.StreamOffset`, mirroring the lobby and rtmanager +// patterns. +package streamoffsets + +import ( + "context" + "errors" + "fmt" + "strings" + + "galaxy/gamemaster/internal/adapters/redisstate" + "galaxy/gamemaster/internal/ports" + + "github.com/redis/go-redis/v9" +) + +// Config configures one Redis-backed stream-offset store. The store +// does not own the redis client lifecycle; the caller (typically the +// service runtime) opens and closes it. +type Config struct { + Client *redis.Client +} + +// Store persists Game Master stream consumer offsets in Redis. +type Store struct { + client *redis.Client + keys redisstate.Keyspace +} + +// New constructs one Redis-backed stream-offset store from cfg. +func New(cfg Config) (*Store, error) { + if cfg.Client == nil { + return nil, errors.New("new gamemaster stream offset store: nil redis client") + } + return &Store{ + client: cfg.Client, + keys: redisstate.Keyspace{}, + }, nil +} + +// Load returns the last processed entry id for streamLabel when one +// is stored. A missing key returns ("", false, nil). 
+func (store *Store) Load(ctx context.Context, streamLabel string) (string, bool, error) { + if store == nil || store.client == nil { + return "", false, errors.New("load gamemaster stream offset: nil store") + } + if ctx == nil { + return "", false, errors.New("load gamemaster stream offset: nil context") + } + if strings.TrimSpace(streamLabel) == "" { + return "", false, errors.New("load gamemaster stream offset: stream label must not be empty") + } + + value, err := store.client.Get(ctx, store.keys.StreamOffset(streamLabel)).Result() + switch { + case errors.Is(err, redis.Nil): + return "", false, nil + case err != nil: + return "", false, fmt.Errorf("load gamemaster stream offset: %w", err) + } + return value, true, nil +} + +// Save stores entryID as the new offset for streamLabel. The key has +// no TTL — offsets are durable and only overwritten by subsequent +// Saves. +func (store *Store) Save(ctx context.Context, streamLabel, entryID string) error { + if store == nil || store.client == nil { + return errors.New("save gamemaster stream offset: nil store") + } + if ctx == nil { + return errors.New("save gamemaster stream offset: nil context") + } + if strings.TrimSpace(streamLabel) == "" { + return errors.New("save gamemaster stream offset: stream label must not be empty") + } + if strings.TrimSpace(entryID) == "" { + return errors.New("save gamemaster stream offset: entry id must not be empty") + } + + if err := store.client.Set(ctx, store.keys.StreamOffset(streamLabel), entryID, 0).Err(); err != nil { + return fmt.Errorf("save gamemaster stream offset: %w", err) + } + return nil +} + +// Ensure Store satisfies the ports.StreamOffsetStore interface at +// compile time. 
+var _ ports.StreamOffsetStore = (*Store)(nil) diff --git a/gamemaster/internal/adapters/redisstate/streamoffsets/store_test.go b/gamemaster/internal/adapters/redisstate/streamoffsets/store_test.go new file mode 100644 index 0000000..377f8a3 --- /dev/null +++ b/gamemaster/internal/adapters/redisstate/streamoffsets/store_test.go @@ -0,0 +1,93 @@ +package streamoffsets_test + +import ( + "context" + "testing" + + "galaxy/gamemaster/internal/adapters/redisstate" + "galaxy/gamemaster/internal/adapters/redisstate/streamoffsets" + + "github.com/alicebob/miniredis/v2" + "github.com/redis/go-redis/v9" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func newOffsetStore(t *testing.T) (*streamoffsets.Store, *miniredis.Miniredis) { + t.Helper() + server := miniredis.RunT(t) + client := redis.NewClient(&redis.Options{Addr: server.Addr()}) + t.Cleanup(func() { _ = client.Close() }) + + store, err := streamoffsets.New(streamoffsets.Config{Client: client}) + require.NoError(t, err) + return store, server +} + +func TestNewRejectsNilClient(t *testing.T) { + _, err := streamoffsets.New(streamoffsets.Config{}) + require.Error(t, err) +} + +func TestLoadMissingReturnsNotFound(t *testing.T) { + store, _ := newOffsetStore(t) + id, found, err := store.Load(context.Background(), "health_events") + require.NoError(t, err) + assert.False(t, found) + assert.Empty(t, id) +} + +func TestSaveLoadRoundTrip(t *testing.T) { + store, server := newOffsetStore(t) + + const entryID = "1700000000000-0" + require.NoError(t, store.Save(context.Background(), "health_events", entryID)) + + id, found, err := store.Load(context.Background(), "health_events") + require.NoError(t, err) + assert.True(t, found) + assert.Equal(t, entryID, id) + + // Verify the namespace prefix lands as expected. 
+ expectedKey := redisstate.Keyspace{}.StreamOffset("health_events") + assert.True(t, server.Exists(expectedKey), + "key %q must exist after Save", expectedKey) +} + +func TestSaveOverwritesPreviousValue(t *testing.T) { + store, _ := newOffsetStore(t) + + require.NoError(t, store.Save(context.Background(), "health_events", "1-0")) + require.NoError(t, store.Save(context.Background(), "health_events", "2-0")) + + id, found, err := store.Load(context.Background(), "health_events") + require.NoError(t, err) + assert.True(t, found) + assert.Equal(t, "2-0", id) +} + +func TestSaveRejectsBadInputs(t *testing.T) { + store, _ := newOffsetStore(t) + + require.Error(t, store.Save(context.Background(), "", "1-0")) + require.Error(t, store.Save(context.Background(), "health_events", "")) + //nolint:staticcheck // intentional nil ctx test + require.Error(t, store.Save(nil, "health_events", "1-0")) +} + +func TestLoadRejectsBadInputs(t *testing.T) { + store, _ := newOffsetStore(t) + + _, _, err := store.Load(context.Background(), "") + require.Error(t, err) + //nolint:staticcheck // intentional nil ctx test + _, _, err = store.Load(nil, "health_events") + require.Error(t, err) +} + +func TestNilStoreOperationsRejected(t *testing.T) { + var store *streamoffsets.Store + _, _, err := store.Load(context.Background(), "health_events") + require.Error(t, err) + require.Error(t, store.Save(context.Background(), "health_events", "1-0")) +} diff --git a/gamemaster/internal/adapters/rtmclient/client.go b/gamemaster/internal/adapters/rtmclient/client.go new file mode 100644 index 0000000..efcdde2 --- /dev/null +++ b/gamemaster/internal/adapters/rtmclient/client.go @@ -0,0 +1,225 @@ +// Package rtmclient provides the trusted-internal Runtime Manager +// REST client Game Master uses for synchronous lifecycle operations +// against an already-running container. 
Two routes are mounted: +// +// - POST /api/v1/internal/runtimes/{game_id}/stop +// - POST /api/v1/internal/runtimes/{game_id}/patch +// +// `Restart` is reserved per `gamemaster/PLAN.md` Stage 10 and is not +// part of the v1 surface. +package rtmclient + +import ( + "bytes" + "context" + "encoding/json" + "errors" + "fmt" + "io" + "net/http" + "net/url" + "strings" + "time" + + "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp" + + "galaxy/gamemaster/internal/ports" +) + +const ( + stopPathTemplate = "/api/v1/internal/runtimes/%s/stop" + patchPathTemplate = "/api/v1/internal/runtimes/%s/patch" +) + +// Config configures one HTTP-backed Runtime Manager internal client. +type Config struct { + // BaseURL stores the absolute base URL of the Runtime Manager + // internal HTTP listener (e.g. `http://rtmanager:8096`). + BaseURL string + + // RequestTimeout bounds one outbound stop/patch request. + RequestTimeout time.Duration +} + +// Client speaks REST/JSON to the Runtime Manager internal API. +type Client struct { + baseURL string + requestTimeout time.Duration + httpClient *http.Client + closeIdleConnections func() +} + +type stopRequestEnvelope struct { + Reason string `json:"reason"` +} + +type patchRequestEnvelope struct { + ImageRef string `json:"image_ref"` +} + +type errorEnvelope struct { + Error *errorBody `json:"error"` +} + +type errorBody struct { + Code string `json:"code"` + Message string `json:"message"` +} + +// NewClient constructs an RTM internal client with otelhttp-wrapped +// transport cloned from `http.DefaultTransport`. Call `Close` to +// release idle connections at shutdown. 
+func NewClient(cfg Config) (*Client, error) { + transport, ok := http.DefaultTransport.(*http.Transport) + if !ok { + return nil, errors.New("new rtm client: default transport is not *http.Transport") + } + cloned := transport.Clone() + return newClient(cfg, &http.Client{Transport: otelhttp.NewTransport(cloned)}, cloned.CloseIdleConnections) +} + +func newClient(cfg Config, httpClient *http.Client, closeIdleConnections func()) (*Client, error) { + switch { + case strings.TrimSpace(cfg.BaseURL) == "": + return nil, errors.New("new rtm client: base url must not be empty") + case cfg.RequestTimeout <= 0: + return nil, errors.New("new rtm client: request timeout must be positive") + case httpClient == nil: + return nil, errors.New("new rtm client: http client must not be nil") + } + parsed, err := url.Parse(strings.TrimRight(strings.TrimSpace(cfg.BaseURL), "/")) + if err != nil { + return nil, fmt.Errorf("new rtm client: parse base url: %w", err) + } + if parsed.Scheme == "" || parsed.Host == "" { + return nil, errors.New("new rtm client: base url must be absolute") + } + return &Client{ + baseURL: parsed.String(), + requestTimeout: cfg.RequestTimeout, + httpClient: httpClient, + closeIdleConnections: closeIdleConnections, + }, nil +} + +// Close releases idle HTTP connections owned by the underlying +// transport. Safe to call multiple times. +func (client *Client) Close() error { + if client == nil || client.closeIdleConnections == nil { + return nil + } + client.closeIdleConnections() + return nil +} + +// Stop calls POST /api/v1/internal/runtimes/{game_id}/stop with body +// `{reason}`. Any non-success outcome is wrapped with +// `ports.ErrRTMUnavailable`. 
+func (client *Client) Stop(ctx context.Context, gameID, reason string) error { + if err := client.validate(ctx, gameID); err != nil { + return err + } + if strings.TrimSpace(reason) == "" { + return errors.New("rtm stop: reason must not be empty") + } + body, err := json.Marshal(stopRequestEnvelope{Reason: reason}) + if err != nil { + return fmt.Errorf("rtm stop: encode request: %w", err) + } + return client.callMutation(ctx, fmt.Sprintf(stopPathTemplate, url.PathEscape(gameID)), body, "rtm stop") +} + +// Patch calls POST /api/v1/internal/runtimes/{game_id}/patch with body +// `{image_ref}`. A `409 conflict` from RTM (semver violation) is also +// wrapped with `ports.ErrRTMUnavailable`; the underlying `error_code` +// is preserved in the wrapped error message so callers can branch on +// the substring if needed. +func (client *Client) Patch(ctx context.Context, gameID, imageRef string) error { + if err := client.validate(ctx, gameID); err != nil { + return err + } + if strings.TrimSpace(imageRef) == "" { + return errors.New("rtm patch: image ref must not be empty") + } + body, err := json.Marshal(patchRequestEnvelope{ImageRef: imageRef}) + if err != nil { + return fmt.Errorf("rtm patch: encode request: %w", err) + } + return client.callMutation(ctx, fmt.Sprintf(patchPathTemplate, url.PathEscape(gameID)), body, "rtm patch") +} + +func (client *Client) validate(ctx context.Context, gameID string) error { + if client == nil || client.httpClient == nil { + return errors.New("rtm client: nil client") + } + if ctx == nil { + return errors.New("rtm client: nil context") + } + if err := ctx.Err(); err != nil { + return err + } + if strings.TrimSpace(gameID) == "" { + return errors.New("rtm client: game id must not be empty") + } + return nil +} + +func (client *Client) callMutation(ctx context.Context, requestPath string, body []byte, opLabel string) error { + payload, statusCode, err := client.doRequest(ctx, http.MethodPost, requestPath, body) + if err != nil { + return 
fmt.Errorf("%w: %s: %w", ports.ErrRTMUnavailable, opLabel, err) + } + if statusCode >= 200 && statusCode < 300 { + return nil + } + errorCode := decodeErrorCode(payload) + if errorCode != "" { + return fmt.Errorf("%w: %s: unexpected status %d (error_code=%s)", ports.ErrRTMUnavailable, opLabel, statusCode, errorCode) + } + return fmt.Errorf("%w: %s: unexpected status %d", ports.ErrRTMUnavailable, opLabel, statusCode) +} + +func (client *Client) doRequest(ctx context.Context, method, requestPath string, body []byte) ([]byte, int, error) { + attemptCtx, cancel := context.WithTimeout(ctx, client.requestTimeout) + defer cancel() + + var reader io.Reader + if len(body) > 0 { + reader = bytes.NewReader(body) + } + req, err := http.NewRequestWithContext(attemptCtx, method, client.baseURL+requestPath, reader) + if err != nil { + return nil, 0, fmt.Errorf("build request: %w", err) + } + req.Header.Set("Accept", "application/json") + if len(body) > 0 { + req.Header.Set("Content-Type", "application/json") + } + resp, err := client.httpClient.Do(req) + if err != nil { + return nil, 0, err + } + defer resp.Body.Close() + respBody, err := io.ReadAll(resp.Body) + if err != nil { + return nil, resp.StatusCode, fmt.Errorf("read response body: %w", err) + } + return respBody, resp.StatusCode, nil +} + +func decodeErrorCode(payload []byte) string { + if len(payload) == 0 { + return "" + } + var envelope errorEnvelope + if err := json.Unmarshal(payload, &envelope); err != nil { + return "" + } + if envelope.Error == nil { + return "" + } + return envelope.Error.Code +} + +// Compile-time assertion: Client implements ports.RTMClient. 
+var _ ports.RTMClient = (*Client)(nil) diff --git a/gamemaster/internal/adapters/rtmclient/client_test.go b/gamemaster/internal/adapters/rtmclient/client_test.go new file mode 100644 index 0000000..38ad1e6 --- /dev/null +++ b/gamemaster/internal/adapters/rtmclient/client_test.go @@ -0,0 +1,156 @@ +package rtmclient + +import ( + "context" + "encoding/json" + "errors" + "io" + "net/http" + "net/http/httptest" + "testing" + "time" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + + "galaxy/gamemaster/internal/ports" +) + +func newTestClient(t *testing.T, baseURL string, timeout time.Duration) *Client { + t.Helper() + client, err := NewClient(Config{BaseURL: baseURL, RequestTimeout: timeout}) + require.NoError(t, err) + t.Cleanup(func() { _ = client.Close() }) + return client +} + +func TestNewClientValidatesConfig(t *testing.T) { + cases := map[string]Config{ + "empty base url": {BaseURL: "", RequestTimeout: time.Second}, + "non-absolute": {BaseURL: "rtm:8096", RequestTimeout: time.Second}, + "zero timeout": {BaseURL: "http://rtm:8096", RequestTimeout: 0}, + "negative timeout": {BaseURL: "http://rtm:8096", RequestTimeout: -time.Second}, + } + for name, cfg := range cases { + t.Run(name, func(t *testing.T) { + _, err := NewClient(cfg) + require.Error(t, err) + }) + } +} + +func TestStopHappyPath(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + require.Equal(t, http.MethodPost, r.Method) + require.Equal(t, "/api/v1/internal/runtimes/game-1/stop", r.URL.Path) + require.Equal(t, "application/json", r.Header.Get("Content-Type")) + body, err := io.ReadAll(r.Body) + require.NoError(t, err) + var got stopRequestEnvelope + require.NoError(t, json.Unmarshal(body, &got)) + assert.Equal(t, "admin_request", got.Reason) + w.Header().Set("Content-Type", "application/json") + _, _ = w.Write([]byte(`{"game_id":"game-1","status":"stopped"}`)) + })) + defer server.Close() + + client := 
newTestClient(t, server.URL, time.Second) + require.NoError(t, client.Stop(context.Background(), "game-1", "admin_request")) +} + +func TestStopRejectsBadInput(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + // t.Error, not t.Fatal: FailNow must not run on the handler goroutine. + t.Error("must not contact rtm on bad input") + })) + defer server.Close() + client := newTestClient(t, server.URL, time.Second) + + require.Error(t, client.Stop(context.Background(), " ", "admin_request")) + require.Error(t, client.Stop(context.Background(), "g", " ")) + + ctx, cancel := context.WithCancel(context.Background()) + cancel() + err := client.Stop(ctx, "g", "admin_request") + require.Error(t, err) + assert.True(t, errors.Is(err, context.Canceled)) +} + +func TestStopInternalError(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusInternalServerError) + _, _ = w.Write([]byte(`{"error":{"code":"internal_error","message":"boom"}}`)) + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + err := client.Stop(context.Background(), "g", "admin_request") + require.Error(t, err) + assert.True(t, errors.Is(err, ports.ErrRTMUnavailable)) + assert.Contains(t, err.Error(), "internal_error") +} + +func TestStopTimeoutMapsToUnavailable(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + time.Sleep(120 * time.Millisecond) + _, _ = w.Write([]byte(`{}`)) + })) + defer server.Close() + + client := newTestClient(t, server.URL, 30*time.Millisecond) + err := client.Stop(context.Background(), "g", "admin_request") + require.Error(t, err) + assert.True(t, errors.Is(err, ports.ErrRTMUnavailable)) +} + +func TestPatchHappyPath(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + require.Equal(t, http.MethodPost, r.Method) + require.Equal(t, "/api/v1/internal/runtimes/g/patch",
r.URL.Path) + body, err := io.ReadAll(r.Body) + require.NoError(t, err) + var got patchRequestEnvelope + require.NoError(t, json.Unmarshal(body, &got)) + assert.Equal(t, "galaxy/game:1.2.4", got.ImageRef) + _, _ = w.Write([]byte(`{"game_id":"g","status":"running"}`)) + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + require.NoError(t, client.Patch(context.Background(), "g", "galaxy/game:1.2.4")) +} + +func TestPatchSemverConflictMapsToUnavailable(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusConflict) + _, _ = w.Write([]byte(`{"error":{"code":"semver_patch_only","message":"cross-major patch not allowed"}}`)) + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + err := client.Patch(context.Background(), "g", "galaxy/game:2.0.0") + require.Error(t, err) + assert.True(t, errors.Is(err, ports.ErrRTMUnavailable)) + assert.Contains(t, err.Error(), "semver_patch_only") +} + +func TestPatchRejectsBadInput(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + // t.Error, not t.Fatal: FailNow must not run on the handler goroutine. + t.Error("must not contact rtm on bad input") + })) + defer server.Close() + + client := newTestClient(t, server.URL, time.Second) + require.Error(t, client.Patch(context.Background(), " ", "galaxy/game:1.0.0")) + require.Error(t, client.Patch(context.Background(), "g", " ")) +} + +func TestCloseIsIdempotent(t *testing.T) { + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + _, _ = w.Write([]byte(`{}`)) + })) + defer server.Close() + client := newTestClient(t, server.URL, time.Second) + require.NoError(t, client.Stop(context.Background(), "g", "admin_request")) + require.NoError(t, client.Close()) + require.NoError(t, client.Close()) +} diff --git a/gamemaster/internal/api/internalhttp/conformance_test.go
b/gamemaster/internal/api/internalhttp/conformance_test.go new file mode 100644 index 0000000..2cf6d61 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/conformance_test.go @@ -0,0 +1,611 @@ +package internalhttp + +import ( + "bytes" + "context" + "encoding/json" + "io" + "net/http" + "net/http/httptest" + "path/filepath" + "runtime" + "strings" + "sync" + "testing" + "time" + + "galaxy/gamemaster/internal/api/internalhttp/handlers" + "galaxy/gamemaster/internal/domain/engineversion" + "galaxy/gamemaster/internal/domain/operation" + domainruntime "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/service/adminbanish" + "galaxy/gamemaster/internal/service/adminforce" + "galaxy/gamemaster/internal/service/adminpatch" + "galaxy/gamemaster/internal/service/adminstop" + "galaxy/gamemaster/internal/service/commandexecute" + engineversionsvc "galaxy/gamemaster/internal/service/engineversion" + "galaxy/gamemaster/internal/service/livenessreply" + "galaxy/gamemaster/internal/service/orderput" + "galaxy/gamemaster/internal/service/registerruntime" + "galaxy/gamemaster/internal/service/reportget" + "galaxy/gamemaster/internal/service/turngeneration" + + "github.com/getkin/kin-openapi/openapi3" + "github.com/getkin/kin-openapi/openapi3filter" + "github.com/getkin/kin-openapi/routers" + "github.com/getkin/kin-openapi/routers/legacy" + "github.com/stretchr/testify/require" +) + +// TestInternalRESTConformance loads the OpenAPI specification, drives +// every internal REST operation against the live listener backed by +// stub services, and validates each request and response body +// against the spec via `openapi3filter.ValidateRequest` and +// `openapi3filter.ValidateResponse`. Failure-path response shapes +// are intentionally out of scope here; per-handler tests under +// `handlers/_test.go` cover the failure branches. 
+func TestInternalRESTConformance(t *testing.T) {
+	t.Parallel()
+
+	doc := loadConformanceSpec(t)
+
+	router, err := legacy.NewRouter(doc)
+	require.NoError(t, err)
+
+	deps := newConformanceDeps()
+	server, err := NewServer(newConformanceConfig(), Dependencies{
+		Logger:                nil,
+		Telemetry:             nil,
+		Readiness:             nil,
+		RuntimeRecords:        deps.runtimeRecords,
+		RegisterRuntime:       deps.registerRuntime,
+		ForceNextTurn:         deps.forceNextTurn,
+		StopRuntime:           deps.stopRuntime,
+		PatchRuntime:          deps.patchRuntime,
+		BanishRace:            deps.banishRace,
+		InvalidateMemberships: deps.membership,
+		GameLiveness:          deps.liveness,
+		EngineVersions:        deps.engineVersions,
+		CommandExecute:        deps.commandExecute,
+		PutOrders:             deps.putOrders,
+		GetReport:             deps.getReport,
+	})
+	require.NoError(t, err)
+
+	cases := []conformanceCase{
+		{name: "internalHealthz", method: http.MethodGet, path: "/healthz"},
+		{name: "internalReadyz", method: http.MethodGet, path: "/readyz"},
+		{
+			name:        "internalRegisterRuntime",
+			method:      http.MethodPost,
+			path:        "/api/v1/internal/games/" + conformanceGameID + "/register-runtime",
+			contentType: "application/json",
+			body: `{
+				"engine_endpoint": "http://galaxy-game-` + conformanceGameID + `:8080",
+				"members": [{"user_id": "user-1", "race_name": "Aelinari"}],
+				"target_engine_version": "1.2.3",
+				"turn_schedule": "0 18 * * *"
+			}`,
+		},
+		{
+			name:           "internalBanishRace",
+			method:         http.MethodPost,
+			path:           "/api/v1/internal/games/" + conformanceGameID + "/race/Aelinari/banish",
+			expectedStatus: http.StatusNoContent,
+		},
+		{
+			name:           "internalInvalidateMemberships",
+			method:         http.MethodPost,
+			path:           "/api/v1/internal/games/" + conformanceGameID + "/memberships/invalidate",
+			expectedStatus: http.StatusNoContent,
+		},
+		{
+			name:   "internalGameLiveness",
+			method: http.MethodGet,
+			path:   "/api/v1/internal/games/" + conformanceGameID + "/liveness",
+		},
+		{name: "internalListRuntimes", method: http.MethodGet, path: "/api/v1/internal/runtimes"},
+		{
+			name:   "internalGetRuntime",
+			method: http.MethodGet,
+			path:   "/api/v1/internal/runtimes/" + conformanceGameID,
+		},
+		{
+			name:   "internalForceNextTurn",
+			method: http.MethodPost,
+			path:   "/api/v1/internal/runtimes/" + conformanceGameID + "/force-next-turn",
+		},
+		{
+			name:        "internalStopRuntime",
+			method:      http.MethodPost,
+			path:        "/api/v1/internal/runtimes/" + conformanceGameID + "/stop",
+			contentType: "application/json",
+			body:        `{"reason":"admin_request"}`,
+		},
+		{
+			name:        "internalPatchRuntime",
+			method:      http.MethodPost,
+			path:        "/api/v1/internal/runtimes/" + conformanceGameID + "/patch",
+			contentType: "application/json",
+			body:        `{"version":"1.2.4"}`,
+		},
+		{name: "internalListEngineVersions", method: http.MethodGet, path: "/api/v1/internal/engine-versions"},
+		{
+			name:           "internalCreateEngineVersion",
+			method:         http.MethodPost,
+			path:           "/api/v1/internal/engine-versions",
+			contentType:    "application/json",
+			body:           `{"version":"1.2.5","image_ref":"galaxy/game:1.2.5"}`,
+			expectedStatus: http.StatusCreated,
+		},
+		{
+			name:   "internalGetEngineVersion",
+			method: http.MethodGet,
+			path:   "/api/v1/internal/engine-versions/1.2.3",
+		},
+		{
+			name:        "internalUpdateEngineVersion",
+			method:      http.MethodPatch,
+			path:        "/api/v1/internal/engine-versions/1.2.3",
+			contentType: "application/json",
+			body:        `{"image_ref":"galaxy/game:1.2.3-patch"}`,
+		},
+		{
+			name:           "internalDeprecateEngineVersion",
+			method:         http.MethodDelete,
+			path:           "/api/v1/internal/engine-versions/1.2.3",
+			expectedStatus: http.StatusNoContent,
+		},
+		{
+			name:   "internalResolveEngineVersionImageRef",
+			method: http.MethodGet,
+			path:   "/api/v1/internal/engine-versions/1.2.3/image-ref",
+		},
+		{
+			name:         "internalExecuteCommands",
+			method:       http.MethodPost,
+			path:         "/api/v1/internal/games/" + conformanceGameID + "/commands",
+			contentType:  "application/json",
+			body:         `{"commands":[{"name":"build","args":{}}]}`,
+			extraHeaders: map[string]string{userIDHeader: conformanceUserID},
+		},
+		{
+			name:         "internalPutOrders",
+			method:       http.MethodPost,
+			path:         "/api/v1/internal/games/" + conformanceGameID + "/orders",
+			contentType:  "application/json",
+			body:         `{"commands":[{"name":"move","args":{}}]}`,
+			extraHeaders: map[string]string{userIDHeader: conformanceUserID},
+		},
+		{
+			name:         "internalGetReport",
+			method:       http.MethodGet,
+			path:         "/api/v1/internal/games/" + conformanceGameID + "/reports/0",
+			extraHeaders: map[string]string{userIDHeader: conformanceUserID},
+		},
+	}
+
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			runConformanceCase(t, server.handler, router, tc)
+		})
+	}
+}
+
+const (
+	conformanceGameID    = "game-conformance"
+	conformanceUserID    = "user-conformance"
+	conformanceServerURL = "http://localhost:8097"
+	userIDHeader         = "X-User-ID"
+)
+
+type conformanceCase struct {
+	name           string
+	method         string
+	path           string
+	contentType    string
+	body           string
+	expectedStatus int
+	extraHeaders   map[string]string
+}
+
+func runConformanceCase(t *testing.T, handler http.Handler, router routers.Router, tc conformanceCase) {
+	t.Helper()
+
+	expectedStatus := tc.expectedStatus
+	if expectedStatus == 0 {
+		expectedStatus = http.StatusOK
+	}
+
+	var bodyReader io.Reader
+	if tc.body != "" {
+		bodyReader = strings.NewReader(tc.body)
+	}
+	request := httptest.NewRequest(tc.method, tc.path, bodyReader)
+	if tc.contentType != "" {
+		request.Header.Set("Content-Type", tc.contentType)
+	}
+	request.Header.Set("X-Galaxy-Caller", "admin")
+	for key, value := range tc.extraHeaders {
+		request.Header.Set(key, value)
+	}
+
+	recorder := httptest.NewRecorder()
+	handler.ServeHTTP(recorder, request)
+	require.Equalf(t, expectedStatus, recorder.Code,
+		"operation %s returned %d: %s", tc.name, recorder.Code, recorder.Body.String())
+
+	validationURL := conformanceServerURL + tc.path
+	validationRequest := httptest.NewRequest(tc.method, validationURL, bodyReaderFor(tc.body))
+	if tc.contentType != "" {
+		validationRequest.Header.Set("Content-Type", tc.contentType)
+	}
+	validationRequest.Header.Set("X-Galaxy-Caller", "admin")
+	for key, value := range tc.extraHeaders {
+		validationRequest.Header.Set(key, value)
+	}
+
+	route, pathParams, err := router.FindRoute(validationRequest)
+	require.NoError(t, err)
+
+	requestInput := &openapi3filter.RequestValidationInput{
+		Request:    validationRequest,
+		PathParams: pathParams,
+		Route:      route,
+		Options: &openapi3filter.Options{
+			IncludeResponseStatus: true,
+		},
+	}
+	require.NoError(t, openapi3filter.ValidateRequest(context.Background(), requestInput))
+
+	responseInput := &openapi3filter.ResponseValidationInput{
+		RequestValidationInput: requestInput,
+		Status:                 recorder.Code,
+		Header:                 recorder.Header(),
+		Options: &openapi3filter.Options{
+			IncludeResponseStatus: true,
+		},
+	}
+	responseInput.SetBodyBytes(recorder.Body.Bytes())
+	require.NoError(t, openapi3filter.ValidateResponse(context.Background(), responseInput))
+}
+
+func loadConformanceSpec(t *testing.T) *openapi3.T {
+	t.Helper()
+
+	_, thisFile, _, ok := runtime.Caller(0)
+	require.True(t, ok)
+
+	specPath := filepath.Join(filepath.Dir(thisFile), "..", "..", "..", "api", "internal-openapi.yaml")
+	loader := openapi3.NewLoader()
+	doc, err := loader.LoadFromFile(specPath)
+	require.NoError(t, err)
+	require.NoError(t, doc.Validate(context.Background()))
+	return doc
+}
+
+func bodyReaderFor(raw string) io.Reader {
+	if raw == "" {
+		return http.NoBody
+	}
+	return bytes.NewBufferString(raw)
+}
+
+func newConformanceConfig() Config {
+	return Config{
+		Addr:              ":0",
+		ReadHeaderTimeout: time.Second,
+		ReadTimeout:       time.Second,
+		WriteTimeout:      time.Second,
+		IdleTimeout:       time.Second,
+	}
+}
+
+// conformanceDeps groups the stub collaborators handed to the listener.
+type conformanceDeps struct {
+	runtimeRecords  *conformanceRuntimeRecords
+	registerRuntime *conformanceRegister
+	forceNextTurn   *conformanceForce
+	stopRuntime     *conformanceStop
+	patchRuntime    *conformancePatch
+	banishRace      *conformanceBanish
+	membership      *conformanceMembership
+	liveness        *conformanceLiveness
+	engineVersions  *conformanceEngineVersions
+	commandExecute  *conformanceCommands
+	putOrders       *conformanceOrders
+	getReport       *conformanceReport
+}
+
+func newConformanceDeps() *conformanceDeps {
+	return &conformanceDeps{
+		runtimeRecords:  newConformanceRuntimeRecords(),
+		registerRuntime: &conformanceRegister{},
+		forceNextTurn:   &conformanceForce{},
+		stopRuntime:     &conformanceStop{},
+		patchRuntime:    &conformancePatch{},
+		banishRace:      &conformanceBanish{},
+		membership:      &conformanceMembership{},
+		liveness:        &conformanceLiveness{},
+		engineVersions:  newConformanceEngineVersions(),
+		commandExecute:  &conformanceCommands{},
+		putOrders:       &conformanceOrders{},
+		getReport:       &conformanceReport{},
+	}
+}
+
+// conformanceRuntimeRecord builds a canonical running runtime record
+// used by every stub service.
+func conformanceRuntimeRecord() domainruntime.RuntimeRecord {
+	moment := time.Date(2026, 4, 30, 12, 0, 0, 0, time.UTC)
+	next := moment.Add(time.Minute)
+	started := moment
+	return domainruntime.RuntimeRecord{
+		GameID:               conformanceGameID,
+		Status:               domainruntime.StatusRunning,
+		EngineEndpoint:       "http://galaxy-game-" + conformanceGameID + ":8080",
+		CurrentImageRef:      "galaxy/game:1.2.3",
+		CurrentEngineVersion: "1.2.3",
+		TurnSchedule:         "0 18 * * *",
+		CurrentTurn:          0,
+		NextGenerationAt:     &next,
+		SkipNextTick:         false,
+		EngineHealth:         "healthy",
+		CreatedAt:            moment,
+		UpdatedAt:            moment,
+		StartedAt:            &started,
+	}
+}
+
+func conformanceEngineVersionRecord(version string) engineversion.EngineVersion {
+	moment := time.Date(2026, 4, 30, 12, 0, 0, 0, time.UTC)
+	return engineversion.EngineVersion{
+		Version:   version,
+		ImageRef:  "galaxy/game:" + version,
+		Options:   nil,
+		Status:    engineversion.StatusActive,
+		CreatedAt: moment,
+		UpdatedAt: moment,
+	}
+}
+
+// conformanceRuntimeRecords is an in-memory store seeded with the
+// canonical record so the get/list endpoints have something to return.
+type conformanceRuntimeRecords struct {
+	mu     sync.Mutex
+	stored map[string]domainruntime.RuntimeRecord
+}
+
+func newConformanceRuntimeRecords() *conformanceRuntimeRecords {
+	return &conformanceRuntimeRecords{
+		stored: map[string]domainruntime.RuntimeRecord{
+			conformanceGameID: conformanceRuntimeRecord(),
+		},
+	}
+}
+
+func (s *conformanceRuntimeRecords) Get(_ context.Context, gameID string) (domainruntime.RuntimeRecord, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	record, ok := s.stored[gameID]
+	if !ok {
+		return domainruntime.RuntimeRecord{}, domainruntime.ErrNotFound
+	}
+	return record, nil
+}
+
+func (s *conformanceRuntimeRecords) List(_ context.Context) ([]domainruntime.RuntimeRecord, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	out := make([]domainruntime.RuntimeRecord, 0, len(s.stored))
+	for _, record := range s.stored {
+		out = append(out, record)
+	}
+	return out, nil
+}
+
+func (s *conformanceRuntimeRecords) ListByStatus(_ context.Context, status domainruntime.Status) ([]domainruntime.RuntimeRecord, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	out := make([]domainruntime.RuntimeRecord, 0, len(s.stored))
+	for _, record := range s.stored {
+		if record.Status == status {
+			out = append(out, record)
+		}
+	}
+	return out, nil
+}
+
+type conformanceRegister struct{}
+
+func (s *conformanceRegister) Handle(_ context.Context, _ registerruntime.Input) (registerruntime.Result, error) {
+	return registerruntime.Result{
+		Record:  conformanceRuntimeRecord(),
+		Outcome: operation.OutcomeSuccess,
+	}, nil
+}
+
+type conformanceForce struct{}
+
+func (s *conformanceForce) Handle(_ context.Context, _ adminforce.Input) (adminforce.Result, error) {
+	return adminforce.Result{
+		TurnGeneration: turngeneration.Result{Record: conformanceRuntimeRecord()},
+		SkipScheduled:  true,
+		Outcome:        operation.OutcomeSuccess,
+	}, nil
+}
+
+type conformanceStop struct{}
+
+func (s *conformanceStop) Handle(_ context.Context, _ adminstop.Input) (adminstop.Result, error) {
+	rec := conformanceRuntimeRecord()
+	rec.Status = domainruntime.StatusStopped
+	stopped := rec.UpdatedAt.Add(time.Second)
+	rec.StoppedAt = &stopped
+	rec.UpdatedAt = stopped
+	return adminstop.Result{Record: rec, Outcome: operation.OutcomeSuccess}, nil
+}
+
+type conformancePatch struct{}
+
+func (s *conformancePatch) Handle(_ context.Context, in adminpatch.Input) (adminpatch.Result, error) {
+	rec := conformanceRuntimeRecord()
+	if in.Version != "" {
+		rec.CurrentImageRef = "galaxy/game:" + in.Version
+		rec.CurrentEngineVersion = in.Version
+	}
+	return adminpatch.Result{Record: rec, Outcome: operation.OutcomeSuccess}, nil
+}
+
+type conformanceBanish struct{}
+
+func (s *conformanceBanish) Handle(_ context.Context, _ adminbanish.Input) (adminbanish.Result, error) {
+	return adminbanish.Result{Outcome: operation.OutcomeSuccess}, nil
+}
+
+type conformanceMembership struct{}
+
+func (m *conformanceMembership) Invalidate(string) {}
+
+type conformanceLiveness struct{}
+
+func (s *conformanceLiveness) Handle(_ context.Context, _ livenessreply.Input) (livenessreply.Result, error) {
+	return livenessreply.Result{
+		Ready:  true,
+		Status: domainruntime.StatusRunning,
+	}, nil
+}
+
+type conformanceEngineVersions struct {
+	mu       sync.Mutex
+	versions map[string]engineversion.EngineVersion
+}
+
+func newConformanceEngineVersions() *conformanceEngineVersions {
+	return &conformanceEngineVersions{
+		versions: map[string]engineversion.EngineVersion{
+			"1.2.3": conformanceEngineVersionRecord("1.2.3"),
+		},
+	}
+}
+
+func (s *conformanceEngineVersions) List(_ context.Context, _ *engineversion.Status) ([]engineversion.EngineVersion, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	out := make([]engineversion.EngineVersion, 0, len(s.versions))
+	for _, version := range s.versions {
+		out = append(out, version)
+	}
+	return out, nil
+}
+
+func (s *conformanceEngineVersions) Get(_ context.Context, version string) (engineversion.EngineVersion, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	v, ok := s.versions[version]
+	if !ok {
+		return engineversion.EngineVersion{}, engineversionsvc.ErrNotFound
+	}
+	return v, nil
+}
+
+func (s *conformanceEngineVersions) ResolveImageRef(_ context.Context, version string) (string, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	v, ok := s.versions[version]
+	if !ok {
+		return "", engineversionsvc.ErrNotFound
+	}
+	return v.ImageRef, nil
+}
+
+func (s *conformanceEngineVersions) Create(_ context.Context, in engineversionsvc.CreateInput) (engineversion.EngineVersion, error) {
+	rec := engineversion.EngineVersion{
+		Version:   in.Version,
+		ImageRef:  in.ImageRef,
+		Options:   in.Options,
+		Status:    engineversion.StatusActive,
+		CreatedAt: time.Date(2026, 4, 30, 12, 0, 0, 0, time.UTC),
+		UpdatedAt: time.Date(2026, 4, 30, 12, 0, 0, 0, time.UTC),
+	}
+	s.mu.Lock()
+	s.versions[in.Version] = rec
+	s.mu.Unlock()
+	return rec, nil
+}
+
+func (s *conformanceEngineVersions) Update(_ context.Context, in engineversionsvc.UpdateInput) (engineversion.EngineVersion, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	rec, ok := s.versions[in.Version]
+	if !ok {
+		return engineversion.EngineVersion{}, engineversionsvc.ErrNotFound
+	}
+	if in.ImageRef != nil {
+		rec.ImageRef = *in.ImageRef
+	}
+	if in.Status != nil {
+		rec.Status = *in.Status
+	}
+	rec.UpdatedAt = time.Date(2026, 4, 30, 13, 0, 0, 0, time.UTC)
+	s.versions[in.Version] = rec
+	return rec, nil
+}
+
+func (s *conformanceEngineVersions) Deprecate(_ context.Context, in engineversionsvc.DeprecateInput) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	rec, ok := s.versions[in.Version]
+	if !ok {
+		return engineversionsvc.ErrNotFound
+	}
+	rec.Status = engineversion.StatusDeprecated
+	rec.UpdatedAt = time.Date(2026, 4, 30, 14, 0, 0, 0, time.UTC)
+	s.versions[in.Version] = rec
+	return nil
+}
+
+type conformanceCommands struct{}
+
+func (s *conformanceCommands) Handle(_ context.Context, _ commandexecute.Input) (commandexecute.Result, error) {
+	return commandexecute.Result{
+		Outcome:     operation.OutcomeSuccess,
+		RawResponse: json.RawMessage(`{"results":[]}`),
+	}, nil
+}
+
+type conformanceOrders struct{}
+
+func (s *conformanceOrders) Handle(_ context.Context, _ orderput.Input) (orderput.Result, error) {
+	return orderput.Result{
+		Outcome:     operation.OutcomeSuccess,
+		RawResponse: json.RawMessage(`{"results":[]}`),
+	}, nil
+}
+
+type conformanceReport struct{}
+
+func (s *conformanceReport) Handle(_ context.Context, _ reportget.Input) (reportget.Result, error) {
+	return reportget.Result{
+		Outcome:     operation.OutcomeSuccess,
+		RawResponse: json.RawMessage(`{"player":"Aelinari","turn":0}`),
+	}, nil
+}
+
+// Compile-time guards that the stubs satisfy the handler-level
+// service interfaces accepted by the listener.
+var (
+	_ handlers.RegisterRuntimeService = (*conformanceRegister)(nil)
+	_ handlers.ForceNextTurnService   = (*conformanceForce)(nil)
+	_ handlers.StopRuntimeService     = (*conformanceStop)(nil)
+	_ handlers.PatchRuntimeService    = (*conformancePatch)(nil)
+	_ handlers.BanishRaceService      = (*conformanceBanish)(nil)
+	_ handlers.MembershipInvalidator  = (*conformanceMembership)(nil)
+	_ handlers.LivenessService        = (*conformanceLiveness)(nil)
+	_ handlers.EngineVersionService   = (*conformanceEngineVersions)(nil)
+	_ handlers.CommandExecuteService  = (*conformanceCommands)(nil)
+	_ handlers.OrderPutService        = (*conformanceOrders)(nil)
+	_ handlers.ReportGetService       = (*conformanceReport)(nil)
+	_ handlers.RuntimeRecordsReader   = (*conformanceRuntimeRecords)(nil)
+)
diff --git a/gamemaster/internal/api/internalhttp/handlers/banishrace.go b/gamemaster/internal/api/internalhttp/handlers/banishrace.go
new file mode 100644
index 0000000..eeaa7a1
--- /dev/null
+++ b/gamemaster/internal/api/internalhttp/handlers/banishrace.go
@@ -0,0 +1,54 @@
+package handlers
+
+import (
+	"net/http"
+
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/service/adminbanish"
+)
+
+// newBanishRaceHandler returns the handler for
+// `POST /api/v1/internal/games/{game_id}/race/{race_name}/banish`. The
+// request has no body; both identifiers come from the URL path.
+// Success returns `204 No Content`.
+func newBanishRaceHandler(deps Dependencies) http.HandlerFunc {
+	logger := loggerFor(deps.Logger, "internal_rest.banish_race")
+	return func(writer http.ResponseWriter, request *http.Request) {
+		if deps.BanishRace == nil {
+			writeError(writer, http.StatusInternalServerError, errorCodeInternal, "banish race service is not wired")
+			return
+		}
+
+		gameID, ok := extractGameID(writer, request)
+		if !ok {
+			return
+		}
+		raceName, ok := extractRaceName(writer, request)
+		if !ok {
+			return
+		}
+
+		result, err := deps.BanishRace.Handle(request.Context(), adminbanish.Input{
+			GameID:    gameID,
+			RaceName:  raceName,
+			OpSource:  resolveOpSource(request),
+			SourceRef: requestSourceRef(request),
+		})
+		if err != nil {
+			logger.ErrorContext(request.Context(), "banish race service errored",
+				"game_id", gameID,
+				"race_name", raceName,
+				"err", err.Error(),
+			)
+			writeError(writer, http.StatusInternalServerError, errorCodeInternal, "banish race service failed")
+			return
+		}
+
+		if result.Outcome == operation.OutcomeFailure {
+			writeFailure(writer, result.ErrorCode, result.ErrorMessage)
+			return
+		}
+
+		writeNoContent(writer)
+	}
+}
diff --git a/gamemaster/internal/api/internalhttp/handlers/common.go b/gamemaster/internal/api/internalhttp/handlers/common.go
new file mode 100644
index 0000000..04262c8
--- /dev/null
+++ b/gamemaster/internal/api/internalhttp/handlers/common.go
@@ -0,0 +1,422 @@
+package handlers
+
+import (
+	"encoding/json"
+	"errors"
+	"io"
+	"log/slog"
+	"net/http"
+	"strings"
+
+	"galaxy/gamemaster/internal/domain/engineversion"
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/domain/runtime"
+	engineversionsvc "galaxy/gamemaster/internal/service/engineversion"
+)
+
+// jsonContentType is the Content-Type used by every internal REST
+// response body except the engine pass-through bodies, which retain
+// the engine's chosen Content-Type.
+const jsonContentType = "application/json; charset=utf-8"
+
+// callerHeader is the optional caller-classification header used to
+// attribute each request to a specific entry point. Documented in
+// `gamemaster/README.md` §«Internal REST API». Missing or unknown
+// values map to OpSourceAdminRest.
+const callerHeader = "X-Galaxy-Caller"
+
+// userIDHeader carries the verified player identity propagated by
+// Edge Gateway on hot-path operations. Required for
+// `internalExecuteCommands`, `internalPutOrders`, and
+// `internalGetReport`.
+const userIDHeader = "X-User-ID"
+
+// requestIDHeader is read into `operation_log.source_ref` when present
+// so REST callers can correlate audit rows with their requests.
+const requestIDHeader = "X-Request-ID"
+
+// gameIDPathParam, raceNamePathParam, versionPathParam, and
+// turnPathParam mirror the parameter names declared in
+// `gamemaster/api/internal-openapi.yaml`.
+const (
+	gameIDPathParam   = "game_id"
+	raceNamePathParam = "race_name"
+	versionPathParam  = "version"
+	turnPathParam     = "turn"
+)
+
+// Stable error codes used by the handler layer when no service result
+// is available (e.g., the service is not wired or the request shape
+// failed pre-decode validation). The values match the vocabulary
+// frozen by `gamemaster/README.md §Error Model` and
+// `gamemaster/api/internal-openapi.yaml`.
+const (
+	errorCodeInvalidRequest        = "invalid_request"
+	errorCodeForbidden             = "forbidden"
+	errorCodeRuntimeNotFound       = "runtime_not_found"
+	errorCodeEngineVersionNotFound = "engine_version_not_found"
+	errorCodeEngineVersionInUse    = "engine_version_in_use"
+	errorCodeConflict              = "conflict"
+	errorCodeRuntimeNotRunning     = "runtime_not_running"
+	errorCodeSemverPatchOnly       = "semver_patch_only"
+	errorCodeEngineUnreachable     = "engine_unreachable"
+	errorCodeEngineValidationError = "engine_validation_error"
+	errorCodeEngineProtocolError   = "engine_protocol_violation"
+	errorCodeServiceUnavailable    = "service_unavailable"
+	errorCodeInternal              = "internal_error"
+)
+
+// errorBody mirrors the `error` element of the OpenAPI ErrorResponse
+// schema.
+type errorBody struct {
+	Code    string `json:"code"`
+	Message string `json:"message"`
+}
+
+// errorResponse mirrors the OpenAPI ErrorResponse envelope.
+type errorResponse struct {
+	Error errorBody `json:"error"`
+}
+
+// runtimeRecordResponse mirrors the OpenAPI RuntimeRecord schema.
+// Required timestamps are always present and encode as int64 UTC
+// milliseconds; optional ones use `*int64` so an absent value is
+// omitted from the JSON form (rather than encoded as `null`).
+type runtimeRecordResponse struct {
+	GameID               string `json:"game_id"`
+	RuntimeStatus        string `json:"runtime_status"`
+	EngineEndpoint       string `json:"engine_endpoint"`
+	CurrentImageRef      string `json:"current_image_ref"`
+	CurrentEngineVersion string `json:"current_engine_version"`
+	TurnSchedule         string `json:"turn_schedule"`
+	CurrentTurn          int    `json:"current_turn"`
+	NextGenerationAt     int64  `json:"next_generation_at"`
+	SkipNextTick         bool   `json:"skip_next_tick"`
+	EngineHealthSummary  string `json:"engine_health_summary"`
+	CreatedAt            int64  `json:"created_at"`
+	UpdatedAt            int64  `json:"updated_at"`
+	StartedAt            *int64 `json:"started_at,omitempty"`
+	StoppedAt            *int64 `json:"stopped_at,omitempty"`
+	FinishedAt           *int64 `json:"finished_at,omitempty"`
+}
+
+// runtimeListResponse mirrors the OpenAPI RuntimeListResponse schema.
+// Runtimes is always non-nil so an empty result encodes as
+// `{"runtimes":[]}` rather than `{"runtimes":null}`.
+type runtimeListResponse struct {
+	Runtimes []runtimeRecordResponse `json:"runtimes"`
+}
+
+// engineVersionResponse mirrors the OpenAPI EngineVersion schema.
+// Options is a `json.RawMessage` so the engine-side document passes
+// through verbatim.
+type engineVersionResponse struct {
+	Version   string          `json:"version"`
+	ImageRef  string          `json:"image_ref"`
+	Options   json.RawMessage `json:"options"`
+	Status    string          `json:"status"`
+	CreatedAt int64           `json:"created_at"`
+	UpdatedAt int64           `json:"updated_at"`
+}
+
+// engineVersionListResponse mirrors the OpenAPI
+// EngineVersionListResponse schema.
+type engineVersionListResponse struct {
+	Versions []engineVersionResponse `json:"versions"`
+}
+
+// imageRefResponse mirrors the OpenAPI ImageRefResponse schema.
+type imageRefResponse struct {
+	ImageRef string `json:"image_ref"`
+}
+
+// livenessResponse mirrors the OpenAPI LivenessResponse schema.
+type livenessResponse struct {
+	Ready  bool   `json:"ready"`
+	Status string `json:"status"`
+}
+
+// encodeRuntimeRecord turns a domain RuntimeRecord into its wire shape.
+// Required `next_generation_at` encodes as `0` when the record carries
+// no scheduled tick (e.g., status=starting before the first
+// scheduling write); optional lifecycle timestamps are omitted when
+// nil.
+func encodeRuntimeRecord(record runtime.RuntimeRecord) runtimeRecordResponse {
+	resp := runtimeRecordResponse{
+		GameID:               record.GameID,
+		RuntimeStatus:        string(record.Status),
+		EngineEndpoint:       record.EngineEndpoint,
+		CurrentImageRef:      record.CurrentImageRef,
+		CurrentEngineVersion: record.CurrentEngineVersion,
+		TurnSchedule:         record.TurnSchedule,
+		CurrentTurn:          record.CurrentTurn,
+		SkipNextTick:         record.SkipNextTick,
+		EngineHealthSummary:  record.EngineHealth,
+		CreatedAt:            record.CreatedAt.UTC().UnixMilli(),
+		UpdatedAt:            record.UpdatedAt.UTC().UnixMilli(),
+	}
+	if record.NextGenerationAt != nil {
+		resp.NextGenerationAt = record.NextGenerationAt.UTC().UnixMilli()
+	}
+	if record.StartedAt != nil {
+		v := record.StartedAt.UTC().UnixMilli()
+		resp.StartedAt = &v
+	}
+	if record.StoppedAt != nil {
+		v := record.StoppedAt.UTC().UnixMilli()
+		resp.StoppedAt = &v
+	}
+	if record.FinishedAt != nil {
+		v := record.FinishedAt.UTC().UnixMilli()
+		resp.FinishedAt = &v
+	}
+	return resp
+}
+
+// encodeRuntimeList turns a domain RuntimeRecord slice into a wire
+// list response. records may be nil (empty store); the result still
+// carries an empty Runtimes slice so the JSON form is `{"runtimes":[]}`.
+func encodeRuntimeList(records []runtime.RuntimeRecord) runtimeListResponse {
+	resp := runtimeListResponse{
+		Runtimes: make([]runtimeRecordResponse, 0, len(records)),
+	}
+	for _, record := range records {
+		resp.Runtimes = append(resp.Runtimes, encodeRuntimeRecord(record))
+	}
+	return resp
+}
+
+// encodeEngineVersion turns a domain EngineVersion into its wire shape.
+// Empty Options bytes encode as the JSON object literal `{}` to
+// satisfy the schema (`type: object`).
+func encodeEngineVersion(version engineversion.EngineVersion) engineVersionResponse {
+	options := json.RawMessage(version.Options)
+	if len(options) == 0 {
+		options = json.RawMessage("{}")
+	}
+	return engineVersionResponse{
+		Version:   version.Version,
+		ImageRef:  version.ImageRef,
+		Options:   options,
+		Status:    string(version.Status),
+		CreatedAt: version.CreatedAt.UTC().UnixMilli(),
+		UpdatedAt: version.UpdatedAt.UTC().UnixMilli(),
+	}
+}
+
+// encodeEngineVersionList turns a slice of domain EngineVersions into
+// a wire list response. The Versions slice is always non-nil.
+func encodeEngineVersionList(versions []engineversion.EngineVersion) engineVersionListResponse {
+	resp := engineVersionListResponse{
+		Versions: make([]engineVersionResponse, 0, len(versions)),
+	}
+	for _, version := range versions {
+		resp.Versions = append(resp.Versions, encodeEngineVersion(version))
+	}
+	return resp
+}
+
+// writeJSON writes payload as a JSON response with the given status
+// code.
+func writeJSON(writer http.ResponseWriter, statusCode int, payload any) {
+	writer.Header().Set("Content-Type", jsonContentType)
+	writer.WriteHeader(statusCode)
+	_ = json.NewEncoder(writer).Encode(payload)
+}
+
+// writeNoContent writes `204 No Content` with no body. The
+// Content-Type header is intentionally omitted so kin-openapi's
+// response validator does not look for a body.
+func writeNoContent(writer http.ResponseWriter) {
+	writer.WriteHeader(http.StatusNoContent)
+}
+
+// writeRawJSON writes raw, already-encoded JSON bytes as the response
+// body with the given status code. Used by the hot-path handlers
+// where the engine's response body is forwarded verbatim.
+func writeRawJSON(writer http.ResponseWriter, statusCode int, body []byte) {
+	writer.Header().Set("Content-Type", jsonContentType)
+	writer.WriteHeader(statusCode)
+	_, _ = writer.Write(body)
+}
+
+// writeError writes the canonical error envelope at statusCode.
+func writeError(writer http.ResponseWriter, statusCode int, code, message string) {
+	writeJSON(writer, statusCode, errorResponse{
+		Error: errorBody{Code: code, Message: message},
+	})
+}
+
+// writeFailure writes the canonical error envelope using the HTTP
+// status mapped from code via mapErrorCodeToStatus. Used by every
+// service-backed handler when its service returns
+// `Outcome=failure`.
+func writeFailure(writer http.ResponseWriter, code, message string) {
+	writeError(writer, mapErrorCodeToStatus(code), code, message)
+}
+
+// mapErrorCodeToStatus maps a stable error code to the HTTP status
+// declared by `gamemaster/api/internal-openapi.yaml`. Unknown codes
+// degrade to 500 so a future error code that ships ahead of its
+// handler-layer mapping still produces a structurally valid response.
+func mapErrorCodeToStatus(code string) int {
+	switch code {
+	case errorCodeInvalidRequest:
+		return http.StatusBadRequest
+	case errorCodeForbidden:
+		return http.StatusForbidden
+	case errorCodeRuntimeNotFound, errorCodeEngineVersionNotFound:
+		return http.StatusNotFound
+	case errorCodeConflict,
+		errorCodeRuntimeNotRunning,
+		errorCodeSemverPatchOnly,
+		errorCodeEngineVersionInUse:
+		return http.StatusConflict
+	case errorCodeEngineUnreachable,
+		errorCodeEngineValidationError,
+		errorCodeEngineProtocolError:
+		return http.StatusBadGateway
+	case errorCodeServiceUnavailable:
+		return http.StatusServiceUnavailable
+	default:
+		return http.StatusInternalServerError
+	}
+}
+
+// mapServiceError translates one of the `engineversionsvc` sentinel
+// errors into the corresponding HTTP status, error code, and message.
+// Unknown errors degrade to `500 internal_error`.
+func mapServiceError(err error) (int, string, string) {
+	switch {
+	case errors.Is(err, engineversionsvc.ErrInvalidRequest):
+		return http.StatusBadRequest, errorCodeInvalidRequest, err.Error()
+	case errors.Is(err, engineversionsvc.ErrNotFound):
+		return http.StatusNotFound, errorCodeEngineVersionNotFound, err.Error()
+	case errors.Is(err, engineversionsvc.ErrConflict):
+		return http.StatusConflict, errorCodeConflict, err.Error()
+	case errors.Is(err, engineversionsvc.ErrInUse):
+		return http.StatusConflict, errorCodeEngineVersionInUse, err.Error()
+	case errors.Is(err, engineversionsvc.ErrServiceUnavailable):
+		return http.StatusServiceUnavailable, errorCodeServiceUnavailable, err.Error()
+	default:
+		return http.StatusInternalServerError, errorCodeInternal, "internal server error"
+	}
+}
+
+// decodeStrictJSON decodes one request body into target with strict
+// JSON semantics: unknown fields are rejected and trailing content is
+// rejected. Mirrors the helper used by lobby and rtmanager.
+func decodeStrictJSON(body io.Reader, target any) error {
+	decoder := json.NewDecoder(body)
+	decoder.DisallowUnknownFields()
+	if err := decoder.Decode(target); err != nil {
+		return err
+	}
+	if decoder.More() {
+		return errors.New("unexpected trailing content after JSON body")
+	}
+	return nil
+}
+
+// readRawJSONBody returns the raw request body provided it parses as
+// a JSON value. The hot-path handlers use this helper because the
+// envelope is engine-owned (`additionalProperties: true` on
+// ExecuteCommandsRequest / PutOrdersRequest); strict decoding would
+// reject legitimate extra fields.
+func readRawJSONBody(reader io.Reader) ([]byte, error) {
+	if reader == nil {
+		return nil, errors.New("request body is required")
+	}
+	body, err := io.ReadAll(reader)
+	if err != nil {
+		return nil, err
+	}
+	if len(body) == 0 {
+		return nil, errors.New("request body is required")
+	}
+	if !json.Valid(body) {
+		return nil, errors.New("request body is not valid JSON")
+	}
+	return body, nil
+}
+
+// extractGameID pulls the {game_id} path variable from request. An
+// empty or whitespace-only value writes a `400 invalid_request` and
+// returns ok=false so callers can short-circuit.
+func extractGameID(writer http.ResponseWriter, request *http.Request) (string, bool) {
+	raw := request.PathValue(gameIDPathParam)
+	if strings.TrimSpace(raw) == "" {
+		writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, "game id is required")
+		return "", false
+	}
+	return raw, true
+}
+
+// extractRaceName pulls the {race_name} path variable.
+func extractRaceName(writer http.ResponseWriter, request *http.Request) (string, bool) {
+	raw := request.PathValue(raceNamePathParam)
+	if strings.TrimSpace(raw) == "" {
+		writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, "race name is required")
+		return "", false
+	}
+	return raw, true
+}
+
+// extractVersion pulls the {version} path variable.
+func extractVersion(writer http.ResponseWriter, request *http.Request) (string, bool) {
+	raw := request.PathValue(versionPathParam)
+	if strings.TrimSpace(raw) == "" {
+		writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, "version is required")
+		return "", false
+	}
+	return raw, true
+}
+
+// extractUserID pulls the verified player identity from the
+// X-User-ID header. The hot-path operations require this header per
+// the OpenAPI spec; absent or whitespace-only values short-circuit
+// with `400 invalid_request`.
+func extractUserID(writer http.ResponseWriter, request *http.Request) (string, bool) { + raw := strings.TrimSpace(request.Header.Get(userIDHeader)) + if raw == "" { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, "X-User-ID header is required") + return "", false + } + return raw, true +} + +// resolveOpSource maps the X-Galaxy-Caller header value to an +// `operation.OpSource`. Missing or unknown values default to +// OpSourceAdminRest, matching the documented contract in +// `gamemaster/README.md` §«Internal REST API». +func resolveOpSource(request *http.Request) operation.OpSource { + switch strings.ToLower(strings.TrimSpace(request.Header.Get(callerHeader))) { + case "gateway": + return operation.OpSourceGatewayPlayer + case "lobby": + return operation.OpSourceLobbyInternal + case "admin": + return operation.OpSourceAdminRest + default: + return operation.OpSourceAdminRest + } +} + +// requestSourceRef returns an opaque per-request reference recorded +// in `operation_log.source_ref`. v1 reads the X-Request-ID header +// when present so callers may correlate REST requests with audit +// rows. +func requestSourceRef(request *http.Request) string { + return strings.TrimSpace(request.Header.Get(requestIDHeader)) +} + +// loggerFor returns a logger annotated with the operation tag. Each +// handler scopes its logs by op so operators filtering on +// `op=internal_rest.` see exactly the lifecycle they care +// about. 
+func loggerFor(parent *slog.Logger, op string) *slog.Logger { + if parent == nil { + parent = slog.Default() + } + return parent.With("component", "internal_http.handlers", "op", op) +} diff --git a/gamemaster/internal/api/internalhttp/handlers/common_test.go b/gamemaster/internal/api/internalhttp/handlers/common_test.go new file mode 100644 index 0000000..43ebd45 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/common_test.go @@ -0,0 +1,205 @@ +package handlers + +import ( + "errors" + "net/http" + "net/http/httptest" + "strings" + "testing" + "time" + + "galaxy/gamemaster/internal/domain/engineversion" + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/domain/runtime" + engineversionsvc "galaxy/gamemaster/internal/service/engineversion" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestMapErrorCodeToStatusCoversEveryDocumentedCode(t *testing.T) { + t.Parallel() + + cases := map[string]int{ + errorCodeInvalidRequest: http.StatusBadRequest, + errorCodeForbidden: http.StatusForbidden, + errorCodeRuntimeNotFound: http.StatusNotFound, + errorCodeEngineVersionNotFound: http.StatusNotFound, + errorCodeConflict: http.StatusConflict, + errorCodeRuntimeNotRunning: http.StatusConflict, + errorCodeSemverPatchOnly: http.StatusConflict, + errorCodeEngineVersionInUse: http.StatusConflict, + errorCodeEngineUnreachable: http.StatusBadGateway, + errorCodeEngineValidationError: http.StatusBadGateway, + errorCodeEngineProtocolError: http.StatusBadGateway, + errorCodeServiceUnavailable: http.StatusServiceUnavailable, + errorCodeInternal: http.StatusInternalServerError, + "unknown_code": http.StatusInternalServerError, + } + + for code, expected := range cases { + assert.Equalf(t, expected, mapErrorCodeToStatus(code), "code %q", code) + } +} + +func TestMapServiceErrorMapsEverySentinel(t *testing.T) { + t.Parallel() + + cases := []struct { + err error + status int + code string + }{ + 
{engineversionsvc.ErrInvalidRequest, http.StatusBadRequest, errorCodeInvalidRequest}, + {engineversionsvc.ErrNotFound, http.StatusNotFound, errorCodeEngineVersionNotFound}, + {engineversionsvc.ErrConflict, http.StatusConflict, errorCodeConflict}, + {engineversionsvc.ErrInUse, http.StatusConflict, errorCodeEngineVersionInUse}, + {engineversionsvc.ErrServiceUnavailable, http.StatusServiceUnavailable, errorCodeServiceUnavailable}, + {errors.New("plain go error"), http.StatusInternalServerError, errorCodeInternal}, + } + + for _, tc := range cases { + status, code, _ := mapServiceError(tc.err) + assert.Equalf(t, tc.status, status, "status for %v", tc.err) + assert.Equalf(t, tc.code, code, "code for %v", tc.err) + } +} + +func TestResolveOpSourceMapsCallerHeader(t *testing.T) { + t.Parallel() + + cases := map[string]operation.OpSource{ + "": operation.OpSourceAdminRest, + "unknown": operation.OpSourceAdminRest, + "GATEWAY": operation.OpSourceGatewayPlayer, + " lobby ": operation.OpSourceLobbyInternal, + "admin": operation.OpSourceAdminRest, + } + + for value, expected := range cases { + request := httptest.NewRequest(http.MethodGet, "/", nil) + if value != "" { + request.Header.Set(callerHeader, value) + } + assert.Equalf(t, expected, resolveOpSource(request), "header %q", value) + } +} + +func TestRequestSourceRefReadsXRequestID(t *testing.T) { + t.Parallel() + + request := httptest.NewRequest(http.MethodGet, "/", nil) + assert.Empty(t, requestSourceRef(request)) + + request.Header.Set(requestIDHeader, " trace-123 ") + assert.Equal(t, "trace-123", requestSourceRef(request)) +} + +func TestDecodeStrictJSONRejectsUnknownFieldsAndTrailingContent(t *testing.T) { + t.Parallel() + + type input struct { + Field string `json:"field"` + } + + var ok input + require.NoError(t, decodeStrictJSON(strings.NewReader(`{"field":"value"}`), &ok)) + assert.Equal(t, "value", ok.Field) + + var rejected input + err := decodeStrictJSON(strings.NewReader(`{"field":"v","extra":1}`), &rejected) 
+ require.Error(t, err) + + var trailing input + err = decodeStrictJSON(strings.NewReader(`{"field":"v"}{"another":true}`), &trailing) + require.Error(t, err) +} + +func TestReadRawJSONBodyValidatesPayload(t *testing.T) { + t.Parallel() + + body, err := readRawJSONBody(strings.NewReader(`{"commands":[]}`)) + require.NoError(t, err) + assert.JSONEq(t, `{"commands":[]}`, string(body)) + + _, err = readRawJSONBody(strings.NewReader("")) + require.Error(t, err) + + _, err = readRawJSONBody(strings.NewReader("not json")) + require.Error(t, err) +} + +func TestEncodeRuntimeRecordIncludesEveryRequiredField(t *testing.T) { + t.Parallel() + + moment := time.Date(2026, 5, 1, 9, 30, 0, 0, time.UTC) + next := moment.Add(time.Minute) + record := runtime.RuntimeRecord{ + GameID: "game-1", + Status: runtime.StatusRunning, + EngineEndpoint: "http://example:8080", + CurrentImageRef: "galaxy/game:1.2.3", + CurrentEngineVersion: "1.2.3", + TurnSchedule: "0 18 * * *", + CurrentTurn: 7, + NextGenerationAt: &next, + SkipNextTick: true, + EngineHealth: "healthy", + CreatedAt: moment, + UpdatedAt: moment, + StartedAt: &moment, + } + + encoded := encodeRuntimeRecord(record) + assert.Equal(t, "game-1", encoded.GameID) + assert.Equal(t, "running", encoded.RuntimeStatus) + assert.Equal(t, moment.UnixMilli(), encoded.CreatedAt) + assert.Equal(t, next.UnixMilli(), encoded.NextGenerationAt) + require.NotNil(t, encoded.StartedAt) + assert.Equal(t, moment.UnixMilli(), *encoded.StartedAt) + assert.Nil(t, encoded.StoppedAt) + assert.Nil(t, encoded.FinishedAt) +} + +func TestEncodeRuntimeRecordZerosNextGenerationWhenNil(t *testing.T) { + t.Parallel() + + moment := time.Date(2026, 5, 1, 9, 30, 0, 0, time.UTC) + record := runtime.RuntimeRecord{ + GameID: "game-1", + Status: runtime.StatusStarting, + EngineEndpoint: "http://example:8080", + CurrentImageRef: "galaxy/game:1.2.3", + CurrentEngineVersion: "1.2.3", + TurnSchedule: "0 18 * * *", + CreatedAt: moment, + UpdatedAt: moment, + } + + encoded := 
encodeRuntimeRecord(record) + assert.Equal(t, int64(0), encoded.NextGenerationAt) + assert.Nil(t, encoded.StartedAt) +} + +func TestEncodeEngineVersionDefaultsEmptyOptionsToObject(t *testing.T) { + t.Parallel() + + moment := time.Date(2026, 5, 1, 9, 30, 0, 0, time.UTC) + encoded := encodeEngineVersion(engineversion.EngineVersion{ + Version: "1.2.3", + ImageRef: "galaxy/game:1.2.3", + Status: engineversion.StatusActive, + CreatedAt: moment, + UpdatedAt: moment, + }) + assert.Equal(t, "{}", string(encoded.Options)) + assert.Equal(t, "active", encoded.Status) +} + +func TestEncodeRuntimeListAlwaysReturnsNonNilSlice(t *testing.T) { + t.Parallel() + + resp := encodeRuntimeList(nil) + require.NotNil(t, resp.Runtimes) + assert.Empty(t, resp.Runtimes) +} diff --git a/gamemaster/internal/api/internalhttp/handlers/createengineversion.go b/gamemaster/internal/api/internalhttp/handlers/createengineversion.go new file mode 100644 index 0000000..3edd2ac --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/createengineversion.go @@ -0,0 +1,50 @@ +package handlers + +import ( + "encoding/json" + "net/http" + + engineversionsvc "galaxy/gamemaster/internal/service/engineversion" +) + +// createEngineVersionRequestBody mirrors the OpenAPI +// CreateEngineVersionRequest schema. +type createEngineVersionRequestBody struct { + Version string `json:"version"` + ImageRef string `json:"image_ref"` + Options json.RawMessage `json:"options,omitempty"` +} + +// newCreateEngineVersionHandler returns the handler for +// `POST /api/v1/internal/engine-versions`. 
+func newCreateEngineVersionHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.create_engine_version") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.EngineVersions == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "engine version service is not wired") + return + } + + var body createEngineVersionRequestBody + if err := decodeStrictJSON(request.Body, &body); err != nil { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, err.Error()) + return + } + + record, err := deps.EngineVersions.Create(request.Context(), engineversionsvc.CreateInput{ + Version: body.Version, + ImageRef: body.ImageRef, + Options: []byte(body.Options), + OpSource: resolveOpSource(request), + SourceRef: requestSourceRef(request), + }) + if err != nil { + logger.ErrorContext(request.Context(), "create engine version failed", "err", err.Error()) + status, code, message := mapServiceError(err) + writeError(writer, status, code, message) + return + } + + writeJSON(writer, http.StatusCreated, encodeEngineVersion(record)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/deprecateengineversion.go b/gamemaster/internal/api/internalhttp/handlers/deprecateengineversion.go new file mode 100644 index 0000000..9636812 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/deprecateengineversion.go @@ -0,0 +1,44 @@ +package handlers + +import ( + "net/http" + + engineversionsvc "galaxy/gamemaster/internal/service/engineversion" +) + +// newDeprecateEngineVersionHandler returns the handler for +// `DELETE /api/v1/internal/engine-versions/{version}`. The endpoint +// flips the row's status to `deprecated` (decision D2 in +// `gamemaster/docs/stage19-internal-rest-handlers.md`); hard removal +// is reserved for future Admin Service operations and not exposed +// here. 
+func newDeprecateEngineVersionHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.deprecate_engine_version") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.EngineVersions == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "engine version service is not wired") + return + } + + version, ok := extractVersion(writer, request) + if !ok { + return + } + + if err := deps.EngineVersions.Deprecate(request.Context(), engineversionsvc.DeprecateInput{ + Version: version, + OpSource: resolveOpSource(request), + SourceRef: requestSourceRef(request), + }); err != nil { + logger.ErrorContext(request.Context(), "deprecate engine version failed", + "version", version, + "err", err.Error(), + ) + status, code, message := mapServiceError(err) + writeError(writer, status, code, message) + return + } + + writeNoContent(writer) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/executecommands.go b/gamemaster/internal/api/internalhttp/handlers/executecommands.go new file mode 100644 index 0000000..0bbc617 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/executecommands.go @@ -0,0 +1,60 @@ +package handlers + +import ( + "net/http" + + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/service/commandexecute" +) + +// newExecuteCommandsHandler returns the handler for +// `POST /api/v1/internal/games/{game_id}/commands`. The request body +// is engine-owned (`additionalProperties: true`) and is forwarded to +// the service as a `json.RawMessage`. The response on success is the +// engine's payload byte-for-byte; failure outcomes use the canonical +// error envelope per the OpenAPI contract. 
+func newExecuteCommandsHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.execute_commands") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.CommandExecute == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "command execute service is not wired") + return + } + + gameID, ok := extractGameID(writer, request) + if !ok { + return + } + userID, ok := extractUserID(writer, request) + if !ok { + return + } + body, err := readRawJSONBody(request.Body) + if err != nil { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, err.Error()) + return + } + + result, err := deps.CommandExecute.Handle(request.Context(), commandexecute.Input{ + GameID: gameID, + UserID: userID, + Payload: body, + }) + if err != nil { + logger.ErrorContext(request.Context(), "command execute service errored", + "game_id", gameID, + "user_id", userID, + "err", err.Error(), + ) + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "command execute service failed") + return + } + + if result.Outcome == operation.OutcomeFailure { + writeFailure(writer, result.ErrorCode, result.ErrorMessage) + return + } + + writeRawJSON(writer, http.StatusOK, []byte(result.RawResponse)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/forcenextturn.go b/gamemaster/internal/api/internalhttp/handlers/forcenextturn.go new file mode 100644 index 0000000..bb6089c --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/forcenextturn.go @@ -0,0 +1,49 @@ +package handlers + +import ( + "net/http" + + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/service/adminforce" +) + +// newForceNextTurnHandler returns the handler for +// `POST /api/v1/internal/runtimes/{game_id}/force-next-turn`. The +// request has no body; the handler delegates to +// `adminforce.Service.Handle` and encodes the resulting runtime +// record on success. 
+func newForceNextTurnHandler(deps Dependencies) http.HandlerFunc {
+	logger := loggerFor(deps.Logger, "internal_rest.force_next_turn")
+	return func(writer http.ResponseWriter, request *http.Request) {
+		if deps.ForceNextTurn == nil {
+			writeError(writer, http.StatusInternalServerError, errorCodeInternal, "force next turn service is not wired")
+			return
+		}
+
+		gameID, ok := extractGameID(writer, request)
+		if !ok {
+			return
+		}
+
+		result, err := deps.ForceNextTurn.Handle(request.Context(), adminforce.Input{
+			GameID:    gameID,
+			OpSource:  resolveOpSource(request),
+			SourceRef: requestSourceRef(request),
+		})
+		if err != nil {
+			logger.ErrorContext(request.Context(), "force next turn service errored",
+				"game_id", gameID,
+				"err", err.Error(),
+			)
+			writeError(writer, http.StatusInternalServerError, errorCodeInternal, "force next turn service failed")
+			return
+		}
+
+		if result.Outcome == operation.OutcomeFailure {
+			writeFailure(writer, result.ErrorCode, result.ErrorMessage)
+			return
+		}
+
+		writeJSON(writer, http.StatusOK, encodeRuntimeRecord(result.TurnGeneration.Record))
+	}
+}
diff --git a/gamemaster/internal/api/internalhttp/handlers/gameliveness.go b/gamemaster/internal/api/internalhttp/handlers/gameliveness.go
new file mode 100644
index 0000000..5320730
--- /dev/null
+++ b/gamemaster/internal/api/internalhttp/handlers/gameliveness.go
@@ -0,0 +1,50 @@
+package handlers
+
+import (
+	"net/http"
+	"strings"
+
+	"galaxy/gamemaster/internal/service/livenessreply"
+)
+
+// newGameLivenessHandler returns the handler for
+// `GET /api/v1/internal/games/{game_id}/liveness`. The endpoint
+// responds with 200 + LivenessResponse on success; Go-level errors
+// returned by the service map to 400 / 500 / 503 according to their
+// embedded error code prefix.
+func newGameLivenessHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.game_liveness") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.GameLiveness == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "game liveness service is not wired") + return + } + + gameID, ok := extractGameID(writer, request) + if !ok { + return + } + + result, err := deps.GameLiveness.Handle(request.Context(), livenessreply.Input{GameID: gameID}) + if err != nil { + logger.ErrorContext(request.Context(), "game liveness service errored", + "game_id", gameID, + "err", err.Error(), + ) + switch { + case strings.HasPrefix(err.Error(), livenessreply.ErrorCodeInvalidRequest+":"): + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, err.Error()) + case strings.HasPrefix(err.Error(), livenessreply.ErrorCodeServiceUnavailable+":"): + writeError(writer, http.StatusServiceUnavailable, errorCodeServiceUnavailable, "service unavailable") + default: + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "game liveness service failed") + } + return + } + + writeJSON(writer, http.StatusOK, livenessResponse{ + Ready: result.Ready, + Status: string(result.Status), + }) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/getengineversion.go b/gamemaster/internal/api/internalhttp/handlers/getengineversion.go new file mode 100644 index 0000000..4bfe3e3 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/getengineversion.go @@ -0,0 +1,33 @@ +package handlers + +import "net/http" + +// newGetEngineVersionHandler returns the handler for +// `GET /api/v1/internal/engine-versions/{version}`. 
+func newGetEngineVersionHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.get_engine_version") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.EngineVersions == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "engine version service is not wired") + return + } + + version, ok := extractVersion(writer, request) + if !ok { + return + } + + record, err := deps.EngineVersions.Get(request.Context(), version) + if err != nil { + logger.ErrorContext(request.Context(), "get engine version failed", + "version", version, + "err", err.Error(), + ) + status, code, message := mapServiceError(err) + writeError(writer, status, code, message) + return + } + + writeJSON(writer, http.StatusOK, encodeEngineVersion(record)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/getreport.go b/gamemaster/internal/api/internalhttp/handlers/getreport.go new file mode 100644 index 0000000..fad6d23 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/getreport.go @@ -0,0 +1,67 @@ +package handlers + +import ( + "net/http" + "strconv" + "strings" + + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/service/reportget" +) + +// newGetReportHandler returns the handler for +// `GET /api/v1/internal/games/{game_id}/reports/{turn}`. Path +// validation rejects non-numeric or negative turn values with +// `400 invalid_request` before the service is touched. 
+func newGetReportHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.get_report") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.GetReport == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "get report service is not wired") + return + } + + gameID, ok := extractGameID(writer, request) + if !ok { + return + } + userID, ok := extractUserID(writer, request) + if !ok { + return + } + + raw := strings.TrimSpace(request.PathValue(turnPathParam)) + if raw == "" { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, "turn is required") + return + } + turn, err := strconv.Atoi(raw) + if err != nil || turn < 0 { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, "turn must be a non-negative integer") + return + } + + result, err := deps.GetReport.Handle(request.Context(), reportget.Input{ + GameID: gameID, + UserID: userID, + Turn: turn, + }) + if err != nil { + logger.ErrorContext(request.Context(), "get report service errored", + "game_id", gameID, + "user_id", userID, + "turn", turn, + "err", err.Error(), + ) + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "get report service failed") + return + } + + if result.Outcome == operation.OutcomeFailure { + writeFailure(writer, result.ErrorCode, result.ErrorMessage) + return + } + + writeRawJSON(writer, http.StatusOK, []byte(result.RawResponse)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/getruntime.go b/gamemaster/internal/api/internalhttp/handlers/getruntime.go new file mode 100644 index 0000000..e99cd86 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/getruntime.go @@ -0,0 +1,43 @@ +package handlers + +import ( + "errors" + "net/http" + + "galaxy/gamemaster/internal/domain/runtime" +) + +// newGetRuntimeHandler returns the handler for +// `GET /api/v1/internal/runtimes/{game_id}`. 
Reads from +// `RuntimeRecordsReader.Get` and translates `runtime.ErrNotFound` to +// `404 runtime_not_found`. +func newGetRuntimeHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.get_runtime") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.RuntimeRecords == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "runtime records store is not wired") + return + } + + gameID, ok := extractGameID(writer, request) + if !ok { + return + } + + record, err := deps.RuntimeRecords.Get(request.Context(), gameID) + if err != nil { + if errors.Is(err, runtime.ErrNotFound) { + writeError(writer, http.StatusNotFound, errorCodeRuntimeNotFound, "runtime not found") + return + } + logger.ErrorContext(request.Context(), "get runtime record failed", + "game_id", gameID, + "err", err.Error(), + ) + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "failed to read runtime record") + return + } + + writeJSON(writer, http.StatusOK, encodeRuntimeRecord(record)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/handlers.go b/gamemaster/internal/api/internalhttp/handlers/handlers.go new file mode 100644 index 0000000..2fadcf4 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/handlers.go @@ -0,0 +1,119 @@ +// Package handlers serves the trusted internal REST surface of Game +// Master frozen by `gamemaster/api/internal-openapi.yaml`. The package +// owns one HandlerFunc per OpenAPI operation; route registration goes +// through Register so the listener (`internal/api/internalhttp`) keeps +// its lifecycle code separate from the per-operation logic. Handlers +// delegate every business decision to the `internal/service/*` +// packages and never decode engine-owned hot-path payloads. +// +// The pattern mirrors `rtmanager/internal/api/internalhttp/handlers` +// so a reader familiar with one service can find their way around the +// other. 
+package handlers + +import ( + "log/slog" + "net/http" +) + +// Route paths frozen by `gamemaster/api/internal-openapi.yaml`. The +// values match the operation IDs asserted in +// `gamemaster/contract_openapi_test.go`; renaming any of them is a +// contract change. +const ( + registerRuntimePath = "/api/v1/internal/games/{game_id}/register-runtime" + banishRacePath = "/api/v1/internal/games/{game_id}/race/{race_name}/banish" + invalidateMembershipsPath = "/api/v1/internal/games/{game_id}/memberships/invalidate" + gameLivenessPath = "/api/v1/internal/games/{game_id}/liveness" + listRuntimesPath = "/api/v1/internal/runtimes" + getRuntimePath = "/api/v1/internal/runtimes/{game_id}" + forceNextTurnPath = "/api/v1/internal/runtimes/{game_id}/force-next-turn" + stopRuntimePath = "/api/v1/internal/runtimes/{game_id}/stop" + patchRuntimePath = "/api/v1/internal/runtimes/{game_id}/patch" + listEngineVersionsPath = "/api/v1/internal/engine-versions" + createEngineVersionPath = "/api/v1/internal/engine-versions" + engineVersionItemPath = "/api/v1/internal/engine-versions/{version}" + resolveEngineVersionImageRefPath = "/api/v1/internal/engine-versions/{version}/image-ref" + executeCommandsPath = "/api/v1/internal/games/{game_id}/commands" + putOrdersPath = "/api/v1/internal/games/{game_id}/orders" + getReportPath = "/api/v1/internal/games/{game_id}/reports/{turn}" +) + +// Dependencies bundles the collaborators required to serve the +// gateway-, Lobby-, and Admin-facing internal REST surface. Any port +// may be nil; in that case the routes that depend on it return +// `500 internal_error` with the message «service is not wired». This +// mirrors the rtmanager handlers' guard so partially-wired listener +// tests do not crash on routes they do not exercise. +type Dependencies struct { + // Logger receives structured per-handler logs. nil falls back to + // slog.Default. + Logger *slog.Logger + + // RuntimeRecords backs the read-only list/get runtime endpoints. 
+	// Reads do not produce operation_log rows, mirroring
+	// `rtmanager/docs/services.md` §18.
+	RuntimeRecords RuntimeRecordsReader
+
+	// RegisterRuntime is the orchestrator for the
+	// `internalRegisterRuntime` operation.
+	RegisterRuntime RegisterRuntimeService
+
+	// ForceNextTurn drives the synchronous force-next-turn flow.
+	ForceNextTurn ForceNextTurnService
+
+	// StopRuntime drives the admin stop flow.
+	StopRuntime StopRuntimeService
+
+	// PatchRuntime drives the admin patch flow.
+	PatchRuntime PatchRuntimeService
+
+	// BanishRace drives the engine race-banish flow.
+	BanishRace BanishRaceService
+
+	// InvalidateMemberships purges the in-process membership cache for a
+	// game id; backed by `service/membership.Cache.Invalidate`.
+	InvalidateMemberships MembershipInvalidator
+
+	// GameLiveness returns the current runtime status without
+	// contacting the engine.
+	GameLiveness LivenessService
+
+	// EngineVersions exposes the multi-method engine-version registry
+	// service (List/Get/ResolveImageRef/Create/Update/Deprecate).
+	EngineVersions EngineVersionService
+
+	// CommandExecute forwards a player command batch to the engine.
+	CommandExecute CommandExecuteService
+
+	// PutOrders forwards a player order batch to the engine.
+	PutOrders OrderPutService
+
+	// GetReport reads a per-player turn report from the engine.
+	GetReport ReportGetService
+}
+
+// Register attaches every internal REST route to mux. The
+// listener-level probes (`/healthz`, `/readyz`) are deliberately not
+// registered here: the probe routes are owned by the listener and
+// remain disjoint from the paths registered below.
+func Register(mux *http.ServeMux, deps Dependencies) { + mux.HandleFunc(http.MethodPost+" "+registerRuntimePath, newRegisterRuntimeHandler(deps)) + mux.HandleFunc(http.MethodGet+" "+getRuntimePath, newGetRuntimeHandler(deps)) + mux.HandleFunc(http.MethodGet+" "+listRuntimesPath, newListRuntimesHandler(deps)) + mux.HandleFunc(http.MethodPost+" "+forceNextTurnPath, newForceNextTurnHandler(deps)) + mux.HandleFunc(http.MethodPost+" "+stopRuntimePath, newStopRuntimeHandler(deps)) + mux.HandleFunc(http.MethodPost+" "+patchRuntimePath, newPatchRuntimeHandler(deps)) + mux.HandleFunc(http.MethodPost+" "+banishRacePath, newBanishRaceHandler(deps)) + mux.HandleFunc(http.MethodPost+" "+invalidateMembershipsPath, newInvalidateMembershipsHandler(deps)) + mux.HandleFunc(http.MethodGet+" "+gameLivenessPath, newGameLivenessHandler(deps)) + mux.HandleFunc(http.MethodGet+" "+listEngineVersionsPath, newListEngineVersionsHandler(deps)) + mux.HandleFunc(http.MethodPost+" "+createEngineVersionPath, newCreateEngineVersionHandler(deps)) + mux.HandleFunc(http.MethodGet+" "+engineVersionItemPath, newGetEngineVersionHandler(deps)) + mux.HandleFunc(http.MethodPatch+" "+engineVersionItemPath, newUpdateEngineVersionHandler(deps)) + mux.HandleFunc(http.MethodDelete+" "+engineVersionItemPath, newDeprecateEngineVersionHandler(deps)) + mux.HandleFunc(http.MethodGet+" "+resolveEngineVersionImageRefPath, newResolveEngineVersionImageRefHandler(deps)) + mux.HandleFunc(http.MethodPost+" "+executeCommandsPath, newExecuteCommandsHandler(deps)) + mux.HandleFunc(http.MethodPost+" "+putOrdersPath, newPutOrdersHandler(deps)) + mux.HandleFunc(http.MethodGet+" "+getReportPath, newGetReportHandler(deps)) +} diff --git a/gamemaster/internal/api/internalhttp/handlers/handlers_test.go b/gamemaster/internal/api/internalhttp/handlers/handlers_test.go new file mode 100644 index 0000000..7e3ae77 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/handlers_test.go @@ -0,0 +1,422 @@ +package handlers_test + 
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"io"
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"testing"
+	"time"
+
+	"galaxy/gamemaster/internal/api/internalhttp/handlers"
+	"galaxy/gamemaster/internal/api/internalhttp/handlers/mocks"
+	"galaxy/gamemaster/internal/domain/engineversion"
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/service/adminstop"
+	"galaxy/gamemaster/internal/service/commandexecute"
+	engineversionsvc "galaxy/gamemaster/internal/service/engineversion"
+	"galaxy/gamemaster/internal/service/livenessreply"
+	"galaxy/gamemaster/internal/service/registerruntime"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+	"go.uber.org/mock/gomock"
+)
+
+// driveHandler builds a fresh ServeMux + handler set bound to deps,
+// fires one request, and returns the recorder.
+func driveHandler(t *testing.T, deps handlers.Dependencies, method, path string, body io.Reader, headers map[string]string) *httptest.ResponseRecorder {
+	t.Helper()
+	mux := http.NewServeMux()
+	handlers.Register(mux, deps)
+	request := httptest.NewRequest(method, path, body)
+	if body != nil {
+		// Default to JSON; explicit headers set below may override this.
+		request.Header.Set("Content-Type", "application/json")
+	}
+	for key, value := range headers {
+		request.Header.Set(key, value)
+	}
+	recorder := httptest.NewRecorder()
+	mux.ServeHTTP(recorder, request)
+	return recorder
+}
+
+func decodeErrorBody(t *testing.T, recorder *httptest.ResponseRecorder) (string, string) {
+	t.Helper()
+	var body struct {
+		Error struct {
+			Code    string `json:"code"`
+			Message string `json:"message"`
+		} `json:"error"`
+	}
+	require.NoError(t, json.Unmarshal(recorder.Body.Bytes(), &body))
+	return body.Error.Code, body.Error.Message
+}
+
+func TestRegisterRuntimeHandlerHappyPath(t *testing.T) {
+	t.Parallel()
+	ctrl := gomock.NewController(t)
+
+	moment := time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC)
+	record := runtime.RuntimeRecord{
+		GameID:               "game-1",
+ Status: runtime.StatusRunning, + EngineEndpoint: "http://engine:8080", + CurrentImageRef: "galaxy/game:1.2.3", + CurrentEngineVersion: "1.2.3", + TurnSchedule: "0 18 * * *", + CreatedAt: moment, + UpdatedAt: moment, + } + + registerSvc := mocks.NewMockRegisterRuntimeService(ctrl) + registerSvc.EXPECT(). + Handle(gomock.Any(), gomock.AssignableToTypeOf(registerruntime.Input{})). + DoAndReturn(func(_ context.Context, in registerruntime.Input) (registerruntime.Result, error) { + assert.Equal(t, "game-1", in.GameID) + assert.Equal(t, "http://engine:8080", in.EngineEndpoint) + assert.Equal(t, operation.OpSourceLobbyInternal, in.OpSource) + require.Len(t, in.Members, 1) + return registerruntime.Result{Record: record, Outcome: operation.OutcomeSuccess}, nil + }) + + body := strings.NewReader(`{ + "engine_endpoint": "http://engine:8080", + "members": [{"user_id":"u1","race_name":"Aelinari"}], + "target_engine_version": "1.2.3", + "turn_schedule": "0 18 * * *" + }`) + recorder := driveHandler(t, + handlers.Dependencies{RegisterRuntime: registerSvc}, + http.MethodPost, + "/api/v1/internal/games/game-1/register-runtime", + body, + map[string]string{"X-Galaxy-Caller": "lobby"}, + ) + + require.Equal(t, http.StatusOK, recorder.Code, recorder.Body.String()) + assert.Contains(t, recorder.Body.String(), `"game_id":"game-1"`) +} + +func TestRegisterRuntimeHandlerRejectsUnknownFields(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + registerSvc := mocks.NewMockRegisterRuntimeService(ctrl) + // no expectations — handler must short-circuit before calling. 
+ + body := strings.NewReader(`{"engine_endpoint":"http://e","extra":1}`) + recorder := driveHandler(t, + handlers.Dependencies{RegisterRuntime: registerSvc}, + http.MethodPost, + "/api/v1/internal/games/game-1/register-runtime", + body, + nil, + ) + + require.Equal(t, http.StatusBadRequest, recorder.Code) + code, _ := decodeErrorBody(t, recorder) + assert.Equal(t, "invalid_request", code) +} + +func TestRegisterRuntimeHandlerWiresFailureCodes(t *testing.T) { + t.Parallel() + + cases := []struct { + name string + errCode string + wantStatus int + }{ + {"invalid_request", registerruntime.ErrorCodeInvalidRequest, http.StatusBadRequest}, + {"conflict", registerruntime.ErrorCodeConflict, http.StatusConflict}, + {"engine_version_not_found", registerruntime.ErrorCodeEngineVersionNotFound, http.StatusNotFound}, + {"engine_unreachable", registerruntime.ErrorCodeEngineUnreachable, http.StatusBadGateway}, + {"service_unavailable", registerruntime.ErrorCodeServiceUnavailable, http.StatusServiceUnavailable}, + {"internal_error", registerruntime.ErrorCodeInternal, http.StatusInternalServerError}, + } + + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + svc := mocks.NewMockRegisterRuntimeService(ctrl) + svc.EXPECT(). + Handle(gomock.Any(), gomock.Any()). 
+ Return(registerruntime.Result{ + Outcome: operation.OutcomeFailure, + ErrorCode: tc.errCode, + ErrorMessage: tc.errCode + " details", + }, nil) + + body := strings.NewReader(`{ + "engine_endpoint": "http://e", + "members":[{"user_id":"u1","race_name":"r"}], + "target_engine_version":"1.0.0", + "turn_schedule":"* * * * *" + }`) + recorder := driveHandler(t, + handlers.Dependencies{RegisterRuntime: svc}, + http.MethodPost, + "/api/v1/internal/games/game-1/register-runtime", + body, + nil, + ) + + assert.Equal(t, tc.wantStatus, recorder.Code) + code, _ := decodeErrorBody(t, recorder) + assert.Equal(t, tc.errCode, code) + }) + } +} + +func TestRegisterRuntimeHandlerNilServiceReturns500(t *testing.T) { + t.Parallel() + + body := strings.NewReader(`{"engine_endpoint":"http://e"}`) + recorder := driveHandler(t, + handlers.Dependencies{}, + http.MethodPost, + "/api/v1/internal/games/game-1/register-runtime", + body, + nil, + ) + require.Equal(t, http.StatusInternalServerError, recorder.Code) + code, _ := decodeErrorBody(t, recorder) + assert.Equal(t, "internal_error", code) +} + +func TestStopRuntimeHandlerForwardsReason(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + + moment := time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) + record := runtime.RuntimeRecord{ + GameID: "game-1", + Status: runtime.StatusStopped, + EngineEndpoint: "http://engine:8080", + CurrentImageRef: "galaxy/game:1.2.3", + CurrentEngineVersion: "1.2.3", + TurnSchedule: "0 18 * * *", + CreatedAt: moment, + UpdatedAt: moment, + } + + stopSvc := mocks.NewMockStopRuntimeService(ctrl) + stopSvc.EXPECT(). + Handle(gomock.Any(), gomock.AssignableToTypeOf(adminstop.Input{})). 
+ DoAndReturn(func(_ context.Context, in adminstop.Input) (adminstop.Result, error) { + assert.Equal(t, "admin_request", in.Reason) + return adminstop.Result{Record: record, Outcome: operation.OutcomeSuccess}, nil + }) + + body := strings.NewReader(`{"reason":"admin_request"}`) + recorder := driveHandler(t, + handlers.Dependencies{StopRuntime: stopSvc}, + http.MethodPost, + "/api/v1/internal/runtimes/game-1/stop", + body, + nil, + ) + require.Equal(t, http.StatusOK, recorder.Code, recorder.Body.String()) +} + +func TestGetEngineVersionHandlerMapsNotFound(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + svc := mocks.NewMockEngineVersionService(ctrl) + svc.EXPECT(). + Get(gomock.Any(), "9.9.9"). + Return(engineversion.EngineVersion{}, engineversionsvc.ErrNotFound) + + recorder := driveHandler(t, + handlers.Dependencies{EngineVersions: svc}, + http.MethodGet, + "/api/v1/internal/engine-versions/9.9.9", + nil, + nil, + ) + + assert.Equal(t, http.StatusNotFound, recorder.Code) + code, _ := decodeErrorBody(t, recorder) + assert.Equal(t, "engine_version_not_found", code) +} + +func TestListEngineVersionsHandlerRejectsUnknownStatus(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + svc := mocks.NewMockEngineVersionService(ctrl) + // no expectations — short-circuits. + + recorder := driveHandler(t, + handlers.Dependencies{EngineVersions: svc}, + http.MethodGet, + "/api/v1/internal/engine-versions?status=mystery", + nil, + nil, + ) + + assert.Equal(t, http.StatusBadRequest, recorder.Code) + code, _ := decodeErrorBody(t, recorder) + assert.Equal(t, "invalid_request", code) +} + +func TestDeprecateEngineVersionReturns204(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + svc := mocks.NewMockEngineVersionService(ctrl) + svc.EXPECT(). + Deprecate(gomock.Any(), gomock.AssignableToTypeOf(engineversionsvc.DeprecateInput{})). 
+ Return(nil) + + recorder := driveHandler(t, + handlers.Dependencies{EngineVersions: svc}, + http.MethodDelete, + "/api/v1/internal/engine-versions/1.0.0", + nil, + nil, + ) + assert.Equal(t, http.StatusNoContent, recorder.Code) + assert.Empty(t, recorder.Body.String()) +} + +func TestDeprecateEngineVersionDoesNotReportInUse(t *testing.T) { + t.Parallel() + // D2: the DELETE endpoint flips status; the handler does not call + // Service.Delete and therefore can never produce + // `engine_version_in_use`. Deprecate's own error vocabulary is + // limited to invalid_request / not_found / service_unavailable. + ctrl := gomock.NewController(t) + svc := mocks.NewMockEngineVersionService(ctrl) + svc.EXPECT(). + Deprecate(gomock.Any(), gomock.Any()). + Return(engineversionsvc.ErrNotFound) + + recorder := driveHandler(t, + handlers.Dependencies{EngineVersions: svc}, + http.MethodDelete, + "/api/v1/internal/engine-versions/9.9.9", + nil, + nil, + ) + assert.Equal(t, http.StatusNotFound, recorder.Code) +} + +func TestExecuteCommandsRequiresUserIDHeader(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + svc := mocks.NewMockCommandExecuteService(ctrl) + // short-circuit before service is touched. 
+ + recorder := driveHandler(t, + handlers.Dependencies{CommandExecute: svc}, + http.MethodPost, + "/api/v1/internal/games/game-1/commands", + strings.NewReader(`{"commands":[]}`), + nil, + ) + assert.Equal(t, http.StatusBadRequest, recorder.Code) + code, msg := decodeErrorBody(t, recorder) + assert.Equal(t, "invalid_request", code) + assert.Contains(t, msg, "X-User-ID") +} + +func TestExecuteCommandsRejectsInvalidJSONBody(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + svc := mocks.NewMockCommandExecuteService(ctrl) + + recorder := driveHandler(t, + handlers.Dependencies{CommandExecute: svc}, + http.MethodPost, + "/api/v1/internal/games/game-1/commands", + strings.NewReader("not json"), + map[string]string{"X-User-ID": "u1"}, + ) + assert.Equal(t, http.StatusBadRequest, recorder.Code) + code, _ := decodeErrorBody(t, recorder) + assert.Equal(t, "invalid_request", code) +} + +func TestExecuteCommandsForwardsRawResponseOnSuccess(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + svc := mocks.NewMockCommandExecuteService(ctrl) + svc.EXPECT(). + Handle(gomock.Any(), gomock.AssignableToTypeOf(commandexecute.Input{})). 
+ DoAndReturn(func(_ context.Context, in commandexecute.Input) (commandexecute.Result, error) { + assert.Equal(t, "game-1", in.GameID) + assert.Equal(t, "u1", in.UserID) + assert.JSONEq(t, `{"commands":[{"name":"build"}]}`, string(in.Payload)) + return commandexecute.Result{ + Outcome: operation.OutcomeSuccess, + RawResponse: []byte(`{"results":[{"ok":true}]}`), + }, nil + }) + + recorder := driveHandler(t, + handlers.Dependencies{CommandExecute: svc}, + http.MethodPost, + "/api/v1/internal/games/game-1/commands", + strings.NewReader(`{"commands":[{"name":"build"}]}`), + map[string]string{"X-User-ID": "u1"}, + ) + require.Equal(t, http.StatusOK, recorder.Code, recorder.Body.String()) + assert.JSONEq(t, `{"results":[{"ok":true}]}`, recorder.Body.String()) +} + +func TestInvalidateMembershipsAlwaysReturns204(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + cache := mocks.NewMockMembershipInvalidator(ctrl) + cache.EXPECT().Invalidate("game-7").Times(1) + + recorder := driveHandler(t, + handlers.Dependencies{InvalidateMemberships: cache}, + http.MethodPost, + "/api/v1/internal/games/game-7/memberships/invalidate", + nil, + nil, + ) + assert.Equal(t, http.StatusNoContent, recorder.Code) +} + +func TestGameLivenessHandlerMapsServiceUnavailable(t *testing.T) { + t.Parallel() + ctrl := gomock.NewController(t) + svc := mocks.NewMockLivenessService(ctrl) + svc.EXPECT(). + Handle(gomock.Any(), livenessreply.Input{GameID: "game-1"}). 
+		Return(livenessreply.Result{}, errors.New(livenessreply.ErrorCodeServiceUnavailable+": store ping"))
+
+	recorder := driveHandler(t,
+		handlers.Dependencies{GameLiveness: svc},
+		http.MethodGet,
+		"/api/v1/internal/games/game-1/liveness",
+		nil,
+		nil,
+	)
+	assert.Equal(t, http.StatusServiceUnavailable, recorder.Code)
+	code, _ := decodeErrorBody(t, recorder)
+	assert.Equal(t, "service_unavailable", code)
+}
+
+func TestGetReportRejectsNegativeTurn(t *testing.T) {
+	t.Parallel()
+	ctrl := gomock.NewController(t)
+	svc := mocks.NewMockReportGetService(ctrl)
+	// short-circuits.
+
+	recorder := driveHandler(t,
+		handlers.Dependencies{GetReport: svc},
+		http.MethodGet,
+		"/api/v1/internal/games/game-1/reports/-3",
+		nil,
+		map[string]string{"X-User-ID": "u1"},
+	)
+	assert.Equal(t, http.StatusBadRequest, recorder.Code)
+	code, _ := decodeErrorBody(t, recorder)
+	assert.Equal(t, "invalid_request", code)
+}
diff --git a/gamemaster/internal/api/internalhttp/handlers/invalidatememberships.go b/gamemaster/internal/api/internalhttp/handlers/invalidatememberships.go
new file mode 100644
index 0000000..9c53086
--- /dev/null
+++ b/gamemaster/internal/api/internalhttp/handlers/invalidatememberships.go
@@ -0,0 +1,25 @@
+package handlers
+
+import "net/http"
+
+// newInvalidateMembershipsHandler returns the handler for
+// `POST /api/v1/internal/games/{game_id}/memberships/invalidate`. The
+// underlying cache invalidation is a fire-and-forget local operation,
+// so the handler always responds with `204 No Content` once the path
+// parameter is validated.
+func newInvalidateMembershipsHandler(deps Dependencies) http.HandlerFunc { + return func(writer http.ResponseWriter, request *http.Request) { + if deps.InvalidateMemberships == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "membership cache invalidator is not wired") + return + } + + gameID, ok := extractGameID(writer, request) + if !ok { + return + } + + deps.InvalidateMemberships.Invalidate(gameID) + writeNoContent(writer) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/listengineversions.go b/gamemaster/internal/api/internalhttp/handlers/listengineversions.go new file mode 100644 index 0000000..7556f1b --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/listengineversions.go @@ -0,0 +1,42 @@ +package handlers + +import ( + "net/http" + "strings" + + "galaxy/gamemaster/internal/domain/engineversion" +) + +// newListEngineVersionsHandler returns the handler for +// `GET /api/v1/internal/engine-versions`. The optional `status` +// query parameter narrows the result. 
+func newListEngineVersionsHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.list_engine_versions") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.EngineVersions == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "engine version service is not wired") + return + } + + var statusFilter *engineversion.Status + raw := strings.TrimSpace(request.URL.Query().Get("status")) + if raw != "" { + candidate := engineversion.Status(raw) + if !candidate.IsKnown() { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, "status query parameter is unsupported") + return + } + statusFilter = &candidate + } + + versions, err := deps.EngineVersions.List(request.Context(), statusFilter) + if err != nil { + logger.ErrorContext(request.Context(), "list engine versions failed", "err", err.Error()) + status, code, message := mapServiceError(err) + writeError(writer, status, code, message) + return + } + + writeJSON(writer, http.StatusOK, encodeEngineVersionList(versions)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/listruntimes.go b/gamemaster/internal/api/internalhttp/handlers/listruntimes.go new file mode 100644 index 0000000..b65f543 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/listruntimes.go @@ -0,0 +1,54 @@ +package handlers + +import ( + "net/http" + "strings" + + "galaxy/gamemaster/internal/domain/runtime" +) + +// newListRuntimesHandler returns the handler for +// `GET /api/v1/internal/runtimes`. The optional `status` query +// parameter narrows the result; an unknown value short-circuits with +// `400 invalid_request`. Records are returned ordered by +// `created_at DESC` (the underlying store guarantees the ordering). 
+func newListRuntimesHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.list_runtimes") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.RuntimeRecords == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "runtime records store is not wired") + return + } + + ctx := request.Context() + + raw := strings.TrimSpace(request.URL.Query().Get("status")) + if raw == "" { + records, err := deps.RuntimeRecords.List(ctx) + if err != nil { + logger.ErrorContext(ctx, "list runtime records failed", "err", err.Error()) + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "failed to list runtime records") + return + } + writeJSON(writer, http.StatusOK, encodeRuntimeList(records)) + return + } + + status := runtime.Status(raw) + if !status.IsKnown() { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, "status query parameter is unsupported") + return + } + + records, err := deps.RuntimeRecords.ListByStatus(ctx, status) + if err != nil { + logger.ErrorContext(ctx, "list runtime records by status failed", + "status", string(status), + "err", err.Error(), + ) + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "failed to list runtime records") + return + } + writeJSON(writer, http.StatusOK, encodeRuntimeList(records)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/mocks/mock_services.go b/gamemaster/internal/api/internalhttp/handlers/mocks/mock_services.go new file mode 100644 index 0000000..9345131 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/mocks/mock_services.go @@ -0,0 +1,598 @@ +// Code generated by MockGen. DO NOT EDIT. 
+// Source: galaxy/gamemaster/internal/api/internalhttp/handlers (interfaces: RegisterRuntimeService,ForceNextTurnService,StopRuntimeService,PatchRuntimeService,BanishRaceService,LivenessService,CommandExecuteService,OrderPutService,ReportGetService,MembershipInvalidator,EngineVersionService,RuntimeRecordsReader) +// +// Generated by this command: +// +// mockgen -destination=./mocks/mock_services.go -package=mocks galaxy/gamemaster/internal/api/internalhttp/handlers RegisterRuntimeService,ForceNextTurnService,StopRuntimeService,PatchRuntimeService,BanishRaceService,LivenessService,CommandExecuteService,OrderPutService,ReportGetService,MembershipInvalidator,EngineVersionService,RuntimeRecordsReader +// + +// Package mocks is a generated GoMock package. +package mocks + +import ( + context "context" + engineversion "galaxy/gamemaster/internal/domain/engineversion" + runtime "galaxy/gamemaster/internal/domain/runtime" + adminbanish "galaxy/gamemaster/internal/service/adminbanish" + adminforce "galaxy/gamemaster/internal/service/adminforce" + adminpatch "galaxy/gamemaster/internal/service/adminpatch" + adminstop "galaxy/gamemaster/internal/service/adminstop" + commandexecute "galaxy/gamemaster/internal/service/commandexecute" + engineversion0 "galaxy/gamemaster/internal/service/engineversion" + livenessreply "galaxy/gamemaster/internal/service/livenessreply" + orderput "galaxy/gamemaster/internal/service/orderput" + registerruntime "galaxy/gamemaster/internal/service/registerruntime" + reportget "galaxy/gamemaster/internal/service/reportget" + reflect "reflect" + + gomock "go.uber.org/mock/gomock" +) + +// MockRegisterRuntimeService is a mock of RegisterRuntimeService interface. +type MockRegisterRuntimeService struct { + ctrl *gomock.Controller + recorder *MockRegisterRuntimeServiceMockRecorder + isgomock struct{} +} + +// MockRegisterRuntimeServiceMockRecorder is the mock recorder for MockRegisterRuntimeService. 
+type MockRegisterRuntimeServiceMockRecorder struct { + mock *MockRegisterRuntimeService +} + +// NewMockRegisterRuntimeService creates a new mock instance. +func NewMockRegisterRuntimeService(ctrl *gomock.Controller) *MockRegisterRuntimeService { + mock := &MockRegisterRuntimeService{ctrl: ctrl} + mock.recorder = &MockRegisterRuntimeServiceMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockRegisterRuntimeService) EXPECT() *MockRegisterRuntimeServiceMockRecorder { + return m.recorder +} + +// Handle mocks base method. +func (m *MockRegisterRuntimeService) Handle(ctx context.Context, in registerruntime.Input) (registerruntime.Result, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Handle", ctx, in) + ret0, _ := ret[0].(registerruntime.Result) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Handle indicates an expected call of Handle. +func (mr *MockRegisterRuntimeServiceMockRecorder) Handle(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Handle", reflect.TypeOf((*MockRegisterRuntimeService)(nil).Handle), ctx, in) +} + +// MockForceNextTurnService is a mock of ForceNextTurnService interface. +type MockForceNextTurnService struct { + ctrl *gomock.Controller + recorder *MockForceNextTurnServiceMockRecorder + isgomock struct{} +} + +// MockForceNextTurnServiceMockRecorder is the mock recorder for MockForceNextTurnService. +type MockForceNextTurnServiceMockRecorder struct { + mock *MockForceNextTurnService +} + +// NewMockForceNextTurnService creates a new mock instance. +func NewMockForceNextTurnService(ctrl *gomock.Controller) *MockForceNextTurnService { + mock := &MockForceNextTurnService{ctrl: ctrl} + mock.recorder = &MockForceNextTurnServiceMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. 
+func (m *MockForceNextTurnService) EXPECT() *MockForceNextTurnServiceMockRecorder { + return m.recorder +} + +// Handle mocks base method. +func (m *MockForceNextTurnService) Handle(ctx context.Context, in adminforce.Input) (adminforce.Result, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Handle", ctx, in) + ret0, _ := ret[0].(adminforce.Result) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Handle indicates an expected call of Handle. +func (mr *MockForceNextTurnServiceMockRecorder) Handle(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Handle", reflect.TypeOf((*MockForceNextTurnService)(nil).Handle), ctx, in) +} + +// MockStopRuntimeService is a mock of StopRuntimeService interface. +type MockStopRuntimeService struct { + ctrl *gomock.Controller + recorder *MockStopRuntimeServiceMockRecorder + isgomock struct{} +} + +// MockStopRuntimeServiceMockRecorder is the mock recorder for MockStopRuntimeService. +type MockStopRuntimeServiceMockRecorder struct { + mock *MockStopRuntimeService +} + +// NewMockStopRuntimeService creates a new mock instance. +func NewMockStopRuntimeService(ctrl *gomock.Controller) *MockStopRuntimeService { + mock := &MockStopRuntimeService{ctrl: ctrl} + mock.recorder = &MockStopRuntimeServiceMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockStopRuntimeService) EXPECT() *MockStopRuntimeServiceMockRecorder { + return m.recorder +} + +// Handle mocks base method. +func (m *MockStopRuntimeService) Handle(ctx context.Context, in adminstop.Input) (adminstop.Result, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Handle", ctx, in) + ret0, _ := ret[0].(adminstop.Result) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Handle indicates an expected call of Handle. 
+func (mr *MockStopRuntimeServiceMockRecorder) Handle(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Handle", reflect.TypeOf((*MockStopRuntimeService)(nil).Handle), ctx, in) +} + +// MockPatchRuntimeService is a mock of PatchRuntimeService interface. +type MockPatchRuntimeService struct { + ctrl *gomock.Controller + recorder *MockPatchRuntimeServiceMockRecorder + isgomock struct{} +} + +// MockPatchRuntimeServiceMockRecorder is the mock recorder for MockPatchRuntimeService. +type MockPatchRuntimeServiceMockRecorder struct { + mock *MockPatchRuntimeService +} + +// NewMockPatchRuntimeService creates a new mock instance. +func NewMockPatchRuntimeService(ctrl *gomock.Controller) *MockPatchRuntimeService { + mock := &MockPatchRuntimeService{ctrl: ctrl} + mock.recorder = &MockPatchRuntimeServiceMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockPatchRuntimeService) EXPECT() *MockPatchRuntimeServiceMockRecorder { + return m.recorder +} + +// Handle mocks base method. +func (m *MockPatchRuntimeService) Handle(ctx context.Context, in adminpatch.Input) (adminpatch.Result, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Handle", ctx, in) + ret0, _ := ret[0].(adminpatch.Result) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Handle indicates an expected call of Handle. +func (mr *MockPatchRuntimeServiceMockRecorder) Handle(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Handle", reflect.TypeOf((*MockPatchRuntimeService)(nil).Handle), ctx, in) +} + +// MockBanishRaceService is a mock of BanishRaceService interface. +type MockBanishRaceService struct { + ctrl *gomock.Controller + recorder *MockBanishRaceServiceMockRecorder + isgomock struct{} +} + +// MockBanishRaceServiceMockRecorder is the mock recorder for MockBanishRaceService. 
+type MockBanishRaceServiceMockRecorder struct { + mock *MockBanishRaceService +} + +// NewMockBanishRaceService creates a new mock instance. +func NewMockBanishRaceService(ctrl *gomock.Controller) *MockBanishRaceService { + mock := &MockBanishRaceService{ctrl: ctrl} + mock.recorder = &MockBanishRaceServiceMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockBanishRaceService) EXPECT() *MockBanishRaceServiceMockRecorder { + return m.recorder +} + +// Handle mocks base method. +func (m *MockBanishRaceService) Handle(ctx context.Context, in adminbanish.Input) (adminbanish.Result, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Handle", ctx, in) + ret0, _ := ret[0].(adminbanish.Result) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Handle indicates an expected call of Handle. +func (mr *MockBanishRaceServiceMockRecorder) Handle(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Handle", reflect.TypeOf((*MockBanishRaceService)(nil).Handle), ctx, in) +} + +// MockLivenessService is a mock of LivenessService interface. +type MockLivenessService struct { + ctrl *gomock.Controller + recorder *MockLivenessServiceMockRecorder + isgomock struct{} +} + +// MockLivenessServiceMockRecorder is the mock recorder for MockLivenessService. +type MockLivenessServiceMockRecorder struct { + mock *MockLivenessService +} + +// NewMockLivenessService creates a new mock instance. +func NewMockLivenessService(ctrl *gomock.Controller) *MockLivenessService { + mock := &MockLivenessService{ctrl: ctrl} + mock.recorder = &MockLivenessServiceMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockLivenessService) EXPECT() *MockLivenessServiceMockRecorder { + return m.recorder +} + +// Handle mocks base method. 
+func (m *MockLivenessService) Handle(ctx context.Context, in livenessreply.Input) (livenessreply.Result, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Handle", ctx, in) + ret0, _ := ret[0].(livenessreply.Result) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Handle indicates an expected call of Handle. +func (mr *MockLivenessServiceMockRecorder) Handle(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Handle", reflect.TypeOf((*MockLivenessService)(nil).Handle), ctx, in) +} + +// MockCommandExecuteService is a mock of CommandExecuteService interface. +type MockCommandExecuteService struct { + ctrl *gomock.Controller + recorder *MockCommandExecuteServiceMockRecorder + isgomock struct{} +} + +// MockCommandExecuteServiceMockRecorder is the mock recorder for MockCommandExecuteService. +type MockCommandExecuteServiceMockRecorder struct { + mock *MockCommandExecuteService +} + +// NewMockCommandExecuteService creates a new mock instance. +func NewMockCommandExecuteService(ctrl *gomock.Controller) *MockCommandExecuteService { + mock := &MockCommandExecuteService{ctrl: ctrl} + mock.recorder = &MockCommandExecuteServiceMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockCommandExecuteService) EXPECT() *MockCommandExecuteServiceMockRecorder { + return m.recorder +} + +// Handle mocks base method. +func (m *MockCommandExecuteService) Handle(ctx context.Context, in commandexecute.Input) (commandexecute.Result, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Handle", ctx, in) + ret0, _ := ret[0].(commandexecute.Result) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Handle indicates an expected call of Handle. 
+func (mr *MockCommandExecuteServiceMockRecorder) Handle(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Handle", reflect.TypeOf((*MockCommandExecuteService)(nil).Handle), ctx, in) +} + +// MockOrderPutService is a mock of OrderPutService interface. +type MockOrderPutService struct { + ctrl *gomock.Controller + recorder *MockOrderPutServiceMockRecorder + isgomock struct{} +} + +// MockOrderPutServiceMockRecorder is the mock recorder for MockOrderPutService. +type MockOrderPutServiceMockRecorder struct { + mock *MockOrderPutService +} + +// NewMockOrderPutService creates a new mock instance. +func NewMockOrderPutService(ctrl *gomock.Controller) *MockOrderPutService { + mock := &MockOrderPutService{ctrl: ctrl} + mock.recorder = &MockOrderPutServiceMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockOrderPutService) EXPECT() *MockOrderPutServiceMockRecorder { + return m.recorder +} + +// Handle mocks base method. +func (m *MockOrderPutService) Handle(ctx context.Context, in orderput.Input) (orderput.Result, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Handle", ctx, in) + ret0, _ := ret[0].(orderput.Result) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Handle indicates an expected call of Handle. +func (mr *MockOrderPutServiceMockRecorder) Handle(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Handle", reflect.TypeOf((*MockOrderPutService)(nil).Handle), ctx, in) +} + +// MockReportGetService is a mock of ReportGetService interface. +type MockReportGetService struct { + ctrl *gomock.Controller + recorder *MockReportGetServiceMockRecorder + isgomock struct{} +} + +// MockReportGetServiceMockRecorder is the mock recorder for MockReportGetService. 
+type MockReportGetServiceMockRecorder struct { + mock *MockReportGetService +} + +// NewMockReportGetService creates a new mock instance. +func NewMockReportGetService(ctrl *gomock.Controller) *MockReportGetService { + mock := &MockReportGetService{ctrl: ctrl} + mock.recorder = &MockReportGetServiceMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockReportGetService) EXPECT() *MockReportGetServiceMockRecorder { + return m.recorder +} + +// Handle mocks base method. +func (m *MockReportGetService) Handle(ctx context.Context, in reportget.Input) (reportget.Result, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Handle", ctx, in) + ret0, _ := ret[0].(reportget.Result) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Handle indicates an expected call of Handle. +func (mr *MockReportGetServiceMockRecorder) Handle(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Handle", reflect.TypeOf((*MockReportGetService)(nil).Handle), ctx, in) +} + +// MockMembershipInvalidator is a mock of MembershipInvalidator interface. +type MockMembershipInvalidator struct { + ctrl *gomock.Controller + recorder *MockMembershipInvalidatorMockRecorder + isgomock struct{} +} + +// MockMembershipInvalidatorMockRecorder is the mock recorder for MockMembershipInvalidator. +type MockMembershipInvalidatorMockRecorder struct { + mock *MockMembershipInvalidator +} + +// NewMockMembershipInvalidator creates a new mock instance. +func NewMockMembershipInvalidator(ctrl *gomock.Controller) *MockMembershipInvalidator { + mock := &MockMembershipInvalidator{ctrl: ctrl} + mock.recorder = &MockMembershipInvalidatorMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. 
+func (m *MockMembershipInvalidator) EXPECT() *MockMembershipInvalidatorMockRecorder { + return m.recorder +} + +// Invalidate mocks base method. +func (m *MockMembershipInvalidator) Invalidate(gameID string) { + m.ctrl.T.Helper() + m.ctrl.Call(m, "Invalidate", gameID) +} + +// Invalidate indicates an expected call of Invalidate. +func (mr *MockMembershipInvalidatorMockRecorder) Invalidate(gameID any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Invalidate", reflect.TypeOf((*MockMembershipInvalidator)(nil).Invalidate), gameID) +} + +// MockEngineVersionService is a mock of EngineVersionService interface. +type MockEngineVersionService struct { + ctrl *gomock.Controller + recorder *MockEngineVersionServiceMockRecorder + isgomock struct{} +} + +// MockEngineVersionServiceMockRecorder is the mock recorder for MockEngineVersionService. +type MockEngineVersionServiceMockRecorder struct { + mock *MockEngineVersionService +} + +// NewMockEngineVersionService creates a new mock instance. +func NewMockEngineVersionService(ctrl *gomock.Controller) *MockEngineVersionService { + mock := &MockEngineVersionService{ctrl: ctrl} + mock.recorder = &MockEngineVersionServiceMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockEngineVersionService) EXPECT() *MockEngineVersionServiceMockRecorder { + return m.recorder +} + +// Create mocks base method. +func (m *MockEngineVersionService) Create(ctx context.Context, in engineversion0.CreateInput) (engineversion.EngineVersion, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Create", ctx, in) + ret0, _ := ret[0].(engineversion.EngineVersion) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Create indicates an expected call of Create. 
+func (mr *MockEngineVersionServiceMockRecorder) Create(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Create", reflect.TypeOf((*MockEngineVersionService)(nil).Create), ctx, in) +} + +// Deprecate mocks base method. +func (m *MockEngineVersionService) Deprecate(ctx context.Context, in engineversion0.DeprecateInput) error { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Deprecate", ctx, in) + ret0, _ := ret[0].(error) + return ret0 +} + +// Deprecate indicates an expected call of Deprecate. +func (mr *MockEngineVersionServiceMockRecorder) Deprecate(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Deprecate", reflect.TypeOf((*MockEngineVersionService)(nil).Deprecate), ctx, in) +} + +// Get mocks base method. +func (m *MockEngineVersionService) Get(ctx context.Context, version string) (engineversion.EngineVersion, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Get", ctx, version) + ret0, _ := ret[0].(engineversion.EngineVersion) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Get indicates an expected call of Get. +func (mr *MockEngineVersionServiceMockRecorder) Get(ctx, version any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Get", reflect.TypeOf((*MockEngineVersionService)(nil).Get), ctx, version) +} + +// List mocks base method. +func (m *MockEngineVersionService) List(ctx context.Context, statusFilter *engineversion.Status) ([]engineversion.EngineVersion, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "List", ctx, statusFilter) + ret0, _ := ret[0].([]engineversion.EngineVersion) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// List indicates an expected call of List. 
+func (mr *MockEngineVersionServiceMockRecorder) List(ctx, statusFilter any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "List", reflect.TypeOf((*MockEngineVersionService)(nil).List), ctx, statusFilter) +} + +// ResolveImageRef mocks base method. +func (m *MockEngineVersionService) ResolveImageRef(ctx context.Context, version string) (string, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "ResolveImageRef", ctx, version) + ret0, _ := ret[0].(string) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// ResolveImageRef indicates an expected call of ResolveImageRef. +func (mr *MockEngineVersionServiceMockRecorder) ResolveImageRef(ctx, version any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "ResolveImageRef", reflect.TypeOf((*MockEngineVersionService)(nil).ResolveImageRef), ctx, version) +} + +// Update mocks base method. +func (m *MockEngineVersionService) Update(ctx context.Context, in engineversion0.UpdateInput) (engineversion.EngineVersion, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Update", ctx, in) + ret0, _ := ret[0].(engineversion.EngineVersion) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Update indicates an expected call of Update. +func (mr *MockEngineVersionServiceMockRecorder) Update(ctx, in any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Update", reflect.TypeOf((*MockEngineVersionService)(nil).Update), ctx, in) +} + +// MockRuntimeRecordsReader is a mock of RuntimeRecordsReader interface. +type MockRuntimeRecordsReader struct { + ctrl *gomock.Controller + recorder *MockRuntimeRecordsReaderMockRecorder + isgomock struct{} +} + +// MockRuntimeRecordsReaderMockRecorder is the mock recorder for MockRuntimeRecordsReader. +type MockRuntimeRecordsReaderMockRecorder struct { + mock *MockRuntimeRecordsReader +} + +// NewMockRuntimeRecordsReader creates a new mock instance. 
+func NewMockRuntimeRecordsReader(ctrl *gomock.Controller) *MockRuntimeRecordsReader { + mock := &MockRuntimeRecordsReader{ctrl: ctrl} + mock.recorder = &MockRuntimeRecordsReaderMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockRuntimeRecordsReader) EXPECT() *MockRuntimeRecordsReaderMockRecorder { + return m.recorder +} + +// Get mocks base method. +func (m *MockRuntimeRecordsReader) Get(ctx context.Context, gameID string) (runtime.RuntimeRecord, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "Get", ctx, gameID) + ret0, _ := ret[0].(runtime.RuntimeRecord) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// Get indicates an expected call of Get. +func (mr *MockRuntimeRecordsReaderMockRecorder) Get(ctx, gameID any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Get", reflect.TypeOf((*MockRuntimeRecordsReader)(nil).Get), ctx, gameID) +} + +// List mocks base method. +func (m *MockRuntimeRecordsReader) List(ctx context.Context) ([]runtime.RuntimeRecord, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "List", ctx) + ret0, _ := ret[0].([]runtime.RuntimeRecord) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// List indicates an expected call of List. +func (mr *MockRuntimeRecordsReaderMockRecorder) List(ctx any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "List", reflect.TypeOf((*MockRuntimeRecordsReader)(nil).List), ctx) +} + +// ListByStatus mocks base method. +func (m *MockRuntimeRecordsReader) ListByStatus(ctx context.Context, status runtime.Status) ([]runtime.RuntimeRecord, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "ListByStatus", ctx, status) + ret0, _ := ret[0].([]runtime.RuntimeRecord) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// ListByStatus indicates an expected call of ListByStatus. 
+func (mr *MockRuntimeRecordsReaderMockRecorder) ListByStatus(ctx, status any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "ListByStatus", reflect.TypeOf((*MockRuntimeRecordsReader)(nil).ListByStatus), ctx, status) +} diff --git a/gamemaster/internal/api/internalhttp/handlers/patchruntime.go b/gamemaster/internal/api/internalhttp/handlers/patchruntime.go new file mode 100644 index 0000000..9c068e8 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/patchruntime.go @@ -0,0 +1,59 @@ +package handlers + +import ( + "net/http" + + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/service/adminpatch" +) + +// patchRuntimeRequestBody mirrors the OpenAPI PatchRuntimeRequest +// schema. +type patchRuntimeRequestBody struct { + Version string `json:"version"` +} + +// newPatchRuntimeHandler returns the handler for +// `POST /api/v1/internal/runtimes/{game_id}/patch`. +func newPatchRuntimeHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.patch_runtime") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.PatchRuntime == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "patch runtime service is not wired") + return + } + + gameID, ok := extractGameID(writer, request) + if !ok { + return + } + + var body patchRuntimeRequestBody + if err := decodeStrictJSON(request.Body, &body); err != nil { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, err.Error()) + return + } + + result, err := deps.PatchRuntime.Handle(request.Context(), adminpatch.Input{ + GameID: gameID, + Version: body.Version, + OpSource: resolveOpSource(request), + SourceRef: requestSourceRef(request), + }) + if err != nil { + logger.ErrorContext(request.Context(), "patch runtime service errored", + "game_id", gameID, + "err", err.Error(), + ) + writeError(writer, http.StatusInternalServerError, 
errorCodeInternal, "patch runtime service failed") + return + } + + if result.Outcome == operation.OutcomeFailure { + writeFailure(writer, result.ErrorCode, result.ErrorMessage) + return + } + + writeJSON(writer, http.StatusOK, encodeRuntimeRecord(result.Record)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/putorders.go b/gamemaster/internal/api/internalhttp/handlers/putorders.go new file mode 100644 index 0000000..9a7193c --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/putorders.go @@ -0,0 +1,58 @@ +package handlers + +import ( + "net/http" + + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/service/orderput" +) + +// newPutOrdersHandler returns the handler for +// `POST /api/v1/internal/games/{game_id}/orders`. The shape and +// semantics mirror executeCommands: engine-owned body, raw JSON +// pass-through on success, error envelope on failure. +func newPutOrdersHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.put_orders") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.PutOrders == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "put orders service is not wired") + return + } + + gameID, ok := extractGameID(writer, request) + if !ok { + return + } + userID, ok := extractUserID(writer, request) + if !ok { + return + } + body, err := readRawJSONBody(request.Body) + if err != nil { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, err.Error()) + return + } + + result, err := deps.PutOrders.Handle(request.Context(), orderput.Input{ + GameID: gameID, + UserID: userID, + Payload: body, + }) + if err != nil { + logger.ErrorContext(request.Context(), "put orders service errored", + "game_id", gameID, + "user_id", userID, + "err", err.Error(), + ) + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "put orders service failed") + return + } + + if result.Outcome 
== operation.OutcomeFailure { + writeFailure(writer, result.ErrorCode, result.ErrorMessage) + return + } + + writeRawJSON(writer, http.StatusOK, []byte(result.RawResponse)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/registerruntime.go b/gamemaster/internal/api/internalhttp/handlers/registerruntime.go new file mode 100644 index 0000000..a67bbda --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/registerruntime.go @@ -0,0 +1,81 @@ +package handlers + +import ( + "net/http" + + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/service/registerruntime" +) + +// registerRuntimeRequestBody mirrors the OpenAPI +// RegisterRuntimeRequest schema. Strict decoding rejects unknown +// fields. +type registerRuntimeRequestBody struct { + EngineEndpoint string `json:"engine_endpoint"` + Members []registerRuntimeMemberBody `json:"members"` + TargetEngineVersion string `json:"target_engine_version"` + TurnSchedule string `json:"turn_schedule"` +} + +// registerRuntimeMemberBody mirrors the OpenAPI +// RegisterRuntimeMember schema. +type registerRuntimeMemberBody struct { + UserID string `json:"user_id"` + RaceName string `json:"race_name"` +} + +// newRegisterRuntimeHandler returns the handler for +// `POST /api/v1/internal/games/{game_id}/register-runtime`. 
+func newRegisterRuntimeHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.register_runtime") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.RegisterRuntime == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "register runtime service is not wired") + return + } + + gameID, ok := extractGameID(writer, request) + if !ok { + return + } + + var body registerRuntimeRequestBody + if err := decodeStrictJSON(request.Body, &body); err != nil { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, err.Error()) + return + } + + members := make([]registerruntime.Member, 0, len(body.Members)) + for _, member := range body.Members { + members = append(members, registerruntime.Member{ + UserID: member.UserID, + RaceName: member.RaceName, + }) + } + + result, err := deps.RegisterRuntime.Handle(request.Context(), registerruntime.Input{ + GameID: gameID, + EngineEndpoint: body.EngineEndpoint, + Members: members, + TargetEngineVersion: body.TargetEngineVersion, + TurnSchedule: body.TurnSchedule, + OpSource: resolveOpSource(request), + SourceRef: requestSourceRef(request), + }) + if err != nil { + logger.ErrorContext(request.Context(), "register runtime service errored", + "game_id", gameID, + "err", err.Error(), + ) + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "register runtime service failed") + return + } + + if result.Outcome == operation.OutcomeFailure { + writeFailure(writer, result.ErrorCode, result.ErrorMessage) + return + } + + writeJSON(writer, http.StatusOK, encodeRuntimeRecord(result.Record)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/resolveengineversionimageref.go b/gamemaster/internal/api/internalhttp/handlers/resolveengineversionimageref.go new file mode 100644 index 0000000..9c12693 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/resolveengineversionimageref.go @@ -0,0 +1,35 @@ 
+package handlers + +import "net/http" + +// newResolveEngineVersionImageRefHandler returns the handler for +// `GET /api/v1/internal/engine-versions/{version}/image-ref`. It sits +// on the hot path Lobby hits before publishing a `runtime:start_jobs` +// envelope; the response carries only the image reference. +func newResolveEngineVersionImageRefHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.resolve_image_ref") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.EngineVersions == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "engine version service is not wired") + return + } + + version, ok := extractVersion(writer, request) + if !ok { + return + } + + imageRef, err := deps.EngineVersions.ResolveImageRef(request.Context(), version) + if err != nil { + logger.ErrorContext(request.Context(), "resolve image ref failed", + "version", version, + "err", err.Error(), + ) + status, code, message := mapServiceError(err) + writeError(writer, status, code, message) + return + } + + writeJSON(writer, http.StatusOK, imageRefResponse{ImageRef: imageRef}) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/services.go b/gamemaster/internal/api/internalhttp/handlers/services.go new file mode 100644 index 0000000..a26306b --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/services.go @@ -0,0 +1,98 @@ +package handlers + +import ( + "context" + + "galaxy/gamemaster/internal/domain/engineversion" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/service/adminbanish" + "galaxy/gamemaster/internal/service/adminforce" + "galaxy/gamemaster/internal/service/adminpatch" + "galaxy/gamemaster/internal/service/adminstop" + "galaxy/gamemaster/internal/service/commandexecute" + engineversionsvc "galaxy/gamemaster/internal/service/engineversion" + "galaxy/gamemaster/internal/service/livenessreply" +
"galaxy/gamemaster/internal/service/orderput" + "galaxy/gamemaster/internal/service/registerruntime" + "galaxy/gamemaster/internal/service/reportget" +) + +//go:generate go run go.uber.org/mock/mockgen -destination=./mocks/mock_services.go -package=mocks galaxy/gamemaster/internal/api/internalhttp/handlers RegisterRuntimeService,ForceNextTurnService,StopRuntimeService,PatchRuntimeService,BanishRaceService,LivenessService,CommandExecuteService,OrderPutService,ReportGetService,MembershipInvalidator,EngineVersionService,RuntimeRecordsReader + +// RegisterRuntimeService wires the `internalRegisterRuntime` handler +// to the underlying register-runtime orchestrator. +type RegisterRuntimeService interface { + Handle(ctx context.Context, in registerruntime.Input) (registerruntime.Result, error) +} + +// ForceNextTurnService wires the `internalForceNextTurn` handler. +type ForceNextTurnService interface { + Handle(ctx context.Context, in adminforce.Input) (adminforce.Result, error) +} + +// StopRuntimeService wires the `internalStopRuntime` handler. +type StopRuntimeService interface { + Handle(ctx context.Context, in adminstop.Input) (adminstop.Result, error) +} + +// PatchRuntimeService wires the `internalPatchRuntime` handler. +type PatchRuntimeService interface { + Handle(ctx context.Context, in adminpatch.Input) (adminpatch.Result, error) +} + +// BanishRaceService wires the `internalBanishRace` handler. +type BanishRaceService interface { + Handle(ctx context.Context, in adminbanish.Input) (adminbanish.Result, error) +} + +// LivenessService wires the `internalGameLiveness` handler. +type LivenessService interface { + Handle(ctx context.Context, in livenessreply.Input) (livenessreply.Result, error) +} + +// CommandExecuteService wires the `internalExecuteCommands` handler. +type CommandExecuteService interface { + Handle(ctx context.Context, in commandexecute.Input) (commandexecute.Result, error) +} + +// OrderPutService wires the `internalPutOrders` handler. 
+type OrderPutService interface { + Handle(ctx context.Context, in orderput.Input) (orderput.Result, error) +} + +// ReportGetService wires the `internalGetReport` handler. +type ReportGetService interface { + Handle(ctx context.Context, in reportget.Input) (reportget.Result, error) +} + +// MembershipInvalidator wires the `internalInvalidateMemberships` +// handler. Backed by `service/membership.Cache.Invalidate`. +type MembershipInvalidator interface { + // Invalidate purges the in-process membership cache entry for + // gameID. The call is fire-and-forget and never returns an error; + // missing entries are a no-op. + Invalidate(gameID string) +} + +// EngineVersionService wires every engine-version registry handler. The +// service exposes one Go-error-returning method per OpenAPI operation; +// the handler layer translates the wrapped sentinel errors into +// `engine_version_*` codes via `mapServiceError`. +type EngineVersionService interface { + List(ctx context.Context, statusFilter *engineversion.Status) ([]engineversion.EngineVersion, error) + Get(ctx context.Context, version string) (engineversion.EngineVersion, error) + ResolveImageRef(ctx context.Context, version string) (string, error) + Create(ctx context.Context, in engineversionsvc.CreateInput) (engineversion.EngineVersion, error) + Update(ctx context.Context, in engineversionsvc.UpdateInput) (engineversion.EngineVersion, error) + Deprecate(ctx context.Context, in engineversionsvc.DeprecateInput) error +} + +// RuntimeRecordsReader exposes the read-only subset of +// `ports.RuntimeRecordStore` required by the get/list runtime +// handlers. The narrower surface keeps the handler layer from +// inadvertently mutating runtime state. 
+type RuntimeRecordsReader interface { + Get(ctx context.Context, gameID string) (runtime.RuntimeRecord, error) + List(ctx context.Context) ([]runtime.RuntimeRecord, error) + ListByStatus(ctx context.Context, status runtime.Status) ([]runtime.RuntimeRecord, error) +} diff --git a/gamemaster/internal/api/internalhttp/handlers/stopruntime.go b/gamemaster/internal/api/internalhttp/handlers/stopruntime.go new file mode 100644 index 0000000..2feb38c --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/stopruntime.go @@ -0,0 +1,59 @@ +package handlers + +import ( + "net/http" + + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/service/adminstop" +) + +// stopRuntimeRequestBody mirrors the OpenAPI StopRuntimeRequest +// schema. +type stopRuntimeRequestBody struct { + Reason string `json:"reason"` +} + +// newStopRuntimeHandler returns the handler for +// `POST /api/v1/internal/runtimes/{game_id}/stop`. +func newStopRuntimeHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.stop_runtime") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.StopRuntime == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "stop runtime service is not wired") + return + } + + gameID, ok := extractGameID(writer, request) + if !ok { + return + } + + var body stopRuntimeRequestBody + if err := decodeStrictJSON(request.Body, &body); err != nil { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, err.Error()) + return + } + + result, err := deps.StopRuntime.Handle(request.Context(), adminstop.Input{ + GameID: gameID, + Reason: body.Reason, + OpSource: resolveOpSource(request), + SourceRef: requestSourceRef(request), + }) + if err != nil { + logger.ErrorContext(request.Context(), "stop runtime service errored", + "game_id", gameID, + "err", err.Error(), + ) + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "stop runtime 
service failed") + return + } + + if result.Outcome == operation.OutcomeFailure { + writeFailure(writer, result.ErrorCode, result.ErrorMessage) + return + } + + writeJSON(writer, http.StatusOK, encodeRuntimeRecord(result.Record)) + } +} diff --git a/gamemaster/internal/api/internalhttp/handlers/updateengineversion.go b/gamemaster/internal/api/internalhttp/handlers/updateengineversion.go new file mode 100644 index 0000000..ebacd1f --- /dev/null +++ b/gamemaster/internal/api/internalhttp/handlers/updateengineversion.go @@ -0,0 +1,69 @@ +package handlers + +import ( + "encoding/json" + "net/http" + + "galaxy/gamemaster/internal/domain/engineversion" + engineversionsvc "galaxy/gamemaster/internal/service/engineversion" +) + +// updateEngineVersionRequestBody mirrors the OpenAPI +// UpdateEngineVersionRequest schema. Every field is optional; the +// service rejects calls with no fields set as `invalid_request`. +type updateEngineVersionRequestBody struct { + ImageRef *string `json:"image_ref,omitempty"` + Options *json.RawMessage `json:"options,omitempty"` + Status *string `json:"status,omitempty"` +} + +// newUpdateEngineVersionHandler returns the handler for +// `PATCH /api/v1/internal/engine-versions/{version}`. 
+func newUpdateEngineVersionHandler(deps Dependencies) http.HandlerFunc { + logger := loggerFor(deps.Logger, "internal_rest.update_engine_version") + return func(writer http.ResponseWriter, request *http.Request) { + if deps.EngineVersions == nil { + writeError(writer, http.StatusInternalServerError, errorCodeInternal, "engine version service is not wired") + return + } + + version, ok := extractVersion(writer, request) + if !ok { + return + } + + var body updateEngineVersionRequestBody + if err := decodeStrictJSON(request.Body, &body); err != nil { + writeError(writer, http.StatusBadRequest, errorCodeInvalidRequest, err.Error()) + return + } + + input := engineversionsvc.UpdateInput{ + Version: version, + ImageRef: body.ImageRef, + OpSource: resolveOpSource(request), + SourceRef: requestSourceRef(request), + } + if body.Options != nil { + optionBytes := []byte(*body.Options) + input.Options = &optionBytes + } + if body.Status != nil { + candidate := engineversion.Status(*body.Status) + input.Status = &candidate + } + + record, err := deps.EngineVersions.Update(request.Context(), input) + if err != nil { + logger.ErrorContext(request.Context(), "update engine version failed", + "version", version, + "err", err.Error(), + ) + status, code, message := mapServiceError(err) + writeError(writer, status, code, message) + return + } + + writeJSON(writer, http.StatusOK, encodeEngineVersion(record)) + } +} diff --git a/gamemaster/internal/api/internalhttp/server.go b/gamemaster/internal/api/internalhttp/server.go new file mode 100644 index 0000000..a06d511 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/server.go @@ -0,0 +1,392 @@ +// Package internalhttp provides the trusted internal HTTP listener +// used by the runnable Game Master process. It exposes the `/healthz` +// and `/readyz` probes plus every internal REST operation declared in +// `gamemaster/api/internal-openapi.yaml`. 
Per-operation handlers live +// in the nested `handlers` package; this file owns the listener +// lifecycle and the probe routes only. +package internalhttp + +import ( + "context" + "encoding/json" + "errors" + "fmt" + "log/slog" + "net" + "net/http" + "strconv" + "sync" + "time" + + "galaxy/gamemaster/internal/api/internalhttp/handlers" + "galaxy/gamemaster/internal/telemetry" + + "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp" + "go.opentelemetry.io/otel/attribute" +) + +const jsonContentType = "application/json; charset=utf-8" + +// errorCodeServiceUnavailable mirrors the stable error code declared in +// `gamemaster/api/internal-openapi.yaml` §Error Model. +const errorCodeServiceUnavailable = "service_unavailable" + +// HealthzPath and ReadyzPath are the internal probe routes documented in +// `gamemaster/api/internal-openapi.yaml`. +const ( + HealthzPath = "/healthz" + ReadyzPath = "/readyz" +) + +// ReadinessProbe reports whether the dependencies the listener guards +// (PostgreSQL, Redis) are reachable. A non-nil error is reported to the +// caller as `503 service_unavailable` with the wrapped message. +type ReadinessProbe interface { + Check(ctx context.Context) error +} + +// Config describes the trusted internal HTTP listener owned by Game +// Master. +type Config struct { + // Addr is the TCP listen address used by the internal HTTP server. + Addr string + + // ReadHeaderTimeout bounds how long the listener may spend reading + // request headers before the server rejects the connection. + ReadHeaderTimeout time.Duration + + // ReadTimeout bounds how long the listener may spend reading one + // request. + ReadTimeout time.Duration + + // WriteTimeout bounds how long the listener may spend writing one + // response. + WriteTimeout time.Duration + + // IdleTimeout bounds how long the listener keeps an idle keep-alive + // connection open. 
+ IdleTimeout time.Duration +} + +// Validate reports whether cfg contains a usable internal HTTP listener +// configuration. +func (cfg Config) Validate() error { + switch { + case cfg.Addr == "": + return errors.New("internal HTTP addr must not be empty") + case cfg.ReadHeaderTimeout <= 0: + return errors.New("internal HTTP read header timeout must be positive") + case cfg.ReadTimeout <= 0: + return errors.New("internal HTTP read timeout must be positive") + case cfg.WriteTimeout <= 0: + return errors.New("internal HTTP write timeout must be positive") + case cfg.IdleTimeout <= 0: + return errors.New("internal HTTP idle timeout must be positive") + default: + return nil + } +} + +// Dependencies describes the collaborators used by the internal HTTP +// transport layer. The probe-only fields (Logger, Telemetry, +// Readiness) drive `/healthz` and `/readyz`; the remaining fields +// pass through to the per-operation handlers registered by +// `handlers.Register`. +type Dependencies struct { + // Logger writes structured listener lifecycle logs. When nil, + // slog.Default is used. + Logger *slog.Logger + + // Telemetry records low-cardinality probe metrics and lifecycle + // events. + Telemetry *telemetry.Runtime + + // Readiness reports whether PG / Redis are reachable. A nil + // readiness probe makes `/readyz` always answer `200`; the runtime + // always supplies a real probe in production wiring. + Readiness ReadinessProbe + + // RuntimeRecords backs the read-only list/get runtime endpoints. + RuntimeRecords handlers.RuntimeRecordsReader + + // RegisterRuntime is the orchestrator for `internalRegisterRuntime`. + RegisterRuntime handlers.RegisterRuntimeService + + // ForceNextTurn drives the synchronous force-next-turn flow. + ForceNextTurn handlers.ForceNextTurnService + + // StopRuntime drives the admin stop flow. + StopRuntime handlers.StopRuntimeService + + // PatchRuntime drives the admin patch flow. 
+ PatchRuntime handlers.PatchRuntimeService + + // BanishRace drives the engine race-banish flow. + BanishRace handlers.BanishRaceService + + // InvalidateMemberships purges the in-process membership cache. + InvalidateMemberships handlers.MembershipInvalidator + + // GameLiveness returns the current runtime status without + // contacting the engine. + GameLiveness handlers.LivenessService + + // EngineVersions exposes the multi-method engine-version registry + // service. + EngineVersions handlers.EngineVersionService + + // CommandExecute forwards a player command batch to the engine. + CommandExecute handlers.CommandExecuteService + + // PutOrders forwards a player order batch to the engine. + PutOrders handlers.OrderPutService + + // GetReport reads a per-player turn report from the engine. + GetReport handlers.ReportGetService +} + +// Server owns the trusted internal HTTP listener exposed by Game Master. +type Server struct { + cfg Config + + handler http.Handler + logger *slog.Logger + metrics *telemetry.Runtime + + stateMu sync.RWMutex + server *http.Server + listener net.Listener +} + +// NewServer constructs one trusted internal HTTP server for cfg and deps. +func NewServer(cfg Config, deps Dependencies) (*Server, error) { + if err := cfg.Validate(); err != nil { + return nil, fmt.Errorf("new internal HTTP server: %w", err) + } + + logger := deps.Logger + if logger == nil { + logger = slog.Default() + } + + return &Server{ + cfg: cfg, + handler: newHandler(deps, logger), + logger: logger.With("component", "internal_http"), + metrics: deps.Telemetry, + }, nil +} + +// Addr returns the currently bound listener address after Run is called. +// It returns an empty string if the server has not yet bound a listener. 
+func (server *Server) Addr() string { + server.stateMu.RLock() + defer server.stateMu.RUnlock() + if server.listener == nil { + return "" + } + + return server.listener.Addr().String() +} + +// Run binds the configured listener and serves the internal HTTP surface +// until Shutdown closes the server. +func (server *Server) Run(ctx context.Context) error { + if ctx == nil { + return errors.New("run internal HTTP server: nil context") + } + if err := ctx.Err(); err != nil { + return err + } + + listener, err := net.Listen("tcp", server.cfg.Addr) + if err != nil { + return fmt.Errorf("run internal HTTP server: listen on %q: %w", server.cfg.Addr, err) + } + + httpServer := &http.Server{ + Handler: server.handler, + ReadHeaderTimeout: server.cfg.ReadHeaderTimeout, + ReadTimeout: server.cfg.ReadTimeout, + WriteTimeout: server.cfg.WriteTimeout, + IdleTimeout: server.cfg.IdleTimeout, + } + + server.stateMu.Lock() + server.server = httpServer + server.listener = listener + server.stateMu.Unlock() + + server.logger.Info("gamemaster internal HTTP server started", "addr", listener.Addr().String()) + + defer func() { + server.stateMu.Lock() + server.server = nil + server.listener = nil + server.stateMu.Unlock() + }() + + err = httpServer.Serve(listener) + switch { + case err == nil: + return nil + case errors.Is(err, http.ErrServerClosed): + server.logger.Info("gamemaster internal HTTP server stopped") + return nil + default: + return fmt.Errorf("run internal HTTP server: serve on %q: %w", server.cfg.Addr, err) + } +} + +// Shutdown gracefully stops the internal HTTP server within ctx. 
+func (server *Server) Shutdown(ctx context.Context) error { + if ctx == nil { + return errors.New("shutdown internal HTTP server: nil context") + } + + server.stateMu.RLock() + httpServer := server.server + server.stateMu.RUnlock() + + if httpServer == nil { + return nil + } + + if err := httpServer.Shutdown(ctx); err != nil && !errors.Is(err, http.ErrServerClosed) { + return fmt.Errorf("shutdown internal HTTP server: %w", err) + } + + return nil +} + +func newHandler(deps Dependencies, logger *slog.Logger) http.Handler { + mux := http.NewServeMux() + mux.HandleFunc("GET "+HealthzPath, handleHealthz) + mux.HandleFunc("GET "+ReadyzPath, handleReadyz(deps.Readiness, logger)) + handlers.Register(mux, handlers.Dependencies{ + Logger: logger, + RuntimeRecords: deps.RuntimeRecords, + RegisterRuntime: deps.RegisterRuntime, + ForceNextTurn: deps.ForceNextTurn, + StopRuntime: deps.StopRuntime, + PatchRuntime: deps.PatchRuntime, + BanishRace: deps.BanishRace, + InvalidateMemberships: deps.InvalidateMemberships, + GameLiveness: deps.GameLiveness, + EngineVersions: deps.EngineVersions, + CommandExecute: deps.CommandExecute, + PutOrders: deps.PutOrders, + GetReport: deps.GetReport, + }) + + metrics := deps.Telemetry + options := []otelhttp.Option{} + if metrics != nil { + options = append(options, + otelhttp.WithTracerProvider(metrics.TracerProvider()), + otelhttp.WithMeterProvider(metrics.MeterProvider()), + ) + } + + return otelhttp.NewHandler(withObservability(mux, metrics), "gamemaster.internal_http", options...) 
+} + +func withObservability(next http.Handler, metrics *telemetry.Runtime) http.Handler { + return http.HandlerFunc(func(writer http.ResponseWriter, request *http.Request) { + startedAt := time.Now() + recorder := &statusRecorder{ + ResponseWriter: writer, + statusCode: http.StatusOK, + } + + next.ServeHTTP(recorder, request) + + route := request.Pattern + switch recorder.statusCode { + case http.StatusMethodNotAllowed: + route = "method_not_allowed" + case http.StatusNotFound: + route = "not_found" + case 0: + route = "unmatched" + } + if route == "" { + route = "unmatched" + } + + if metrics != nil { + metrics.RecordInternalHTTPRequest( + request.Context(), + []attribute.KeyValue{ + attribute.String("route", route), + attribute.String("method", request.Method), + attribute.String("status_code", strconv.Itoa(recorder.statusCode)), + }, + time.Since(startedAt), + ) + } + }) +} + +func handleHealthz(writer http.ResponseWriter, _ *http.Request) { + writeStatusResponse(writer, http.StatusOK, "ok") +} + +func handleReadyz(probe ReadinessProbe, logger *slog.Logger) http.HandlerFunc { + return func(writer http.ResponseWriter, request *http.Request) { + if probe == nil { + writeStatusResponse(writer, http.StatusOK, "ready") + return + } + + if err := probe.Check(request.Context()); err != nil { + logger.WarnContext(request.Context(), "gamemaster readiness probe failed", + "err", err.Error(), + ) + writeServiceUnavailable(writer, err.Error()) + return + } + + writeStatusResponse(writer, http.StatusOK, "ready") + } +} + +func writeStatusResponse(writer http.ResponseWriter, statusCode int, status string) { + writer.Header().Set("Content-Type", jsonContentType) + writer.WriteHeader(statusCode) + _ = json.NewEncoder(writer).Encode(statusResponse{Status: status}) +} + +func writeServiceUnavailable(writer http.ResponseWriter, message string) { + writer.Header().Set("Content-Type", jsonContentType) + writer.WriteHeader(http.StatusServiceUnavailable) + _ = 
json.NewEncoder(writer).Encode(errorResponse{ + Error: errorBody{ + Code: errorCodeServiceUnavailable, + Message: message, + }, + }) +} + +type statusResponse struct { + Status string `json:"status"` +} + +type errorBody struct { + Code string `json:"code"` + Message string `json:"message"` +} + +type errorResponse struct { + Error errorBody `json:"error"` +} + +type statusRecorder struct { + http.ResponseWriter + statusCode int +} + +func (recorder *statusRecorder) WriteHeader(statusCode int) { + recorder.statusCode = statusCode + recorder.ResponseWriter.WriteHeader(statusCode) +} diff --git a/gamemaster/internal/api/internalhttp/server_test.go b/gamemaster/internal/api/internalhttp/server_test.go new file mode 100644 index 0000000..6b468d9 --- /dev/null +++ b/gamemaster/internal/api/internalhttp/server_test.go @@ -0,0 +1,142 @@ +package internalhttp + +import ( + "context" + "encoding/json" + "errors" + "net/http" + "net/http/httptest" + "strings" + "testing" + "time" + + "github.com/stretchr/testify/require" +) + +func newTestConfig() Config { + return Config{ + Addr: ":0", + ReadHeaderTimeout: time.Second, + ReadTimeout: time.Second, + WriteTimeout: time.Second, + IdleTimeout: time.Second, + } +} + +type stubReadiness struct { + err error +} + +func (probe stubReadiness) Check(_ context.Context) error { + return probe.err +} + +func newTestServer(t *testing.T, deps Dependencies) http.Handler { + t.Helper() + server, err := NewServer(newTestConfig(), deps) + require.NoError(t, err) + return server.handler +} + +func TestHealthzReturnsOK(t *testing.T) { + t.Parallel() + + handler := newTestServer(t, Dependencies{}) + + rec := httptest.NewRecorder() + req := httptest.NewRequest(http.MethodGet, HealthzPath, nil) + handler.ServeHTTP(rec, req) + + require.Equal(t, http.StatusOK, rec.Code) + require.Equal(t, jsonContentType, rec.Header().Get("Content-Type")) + + var body statusResponse + require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &body)) + require.Equal(t, 
"ok", body.Status) +} + +func TestReadyzReturnsReadyWhenProbeIsNil(t *testing.T) { + t.Parallel() + + handler := newTestServer(t, Dependencies{}) + + rec := httptest.NewRecorder() + req := httptest.NewRequest(http.MethodGet, ReadyzPath, nil) + handler.ServeHTTP(rec, req) + + require.Equal(t, http.StatusOK, rec.Code) + + var body statusResponse + require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &body)) + require.Equal(t, "ready", body.Status) +} + +func TestReadyzReturnsReadyWhenProbeSucceeds(t *testing.T) { + t.Parallel() + + handler := newTestServer(t, Dependencies{Readiness: stubReadiness{}}) + + rec := httptest.NewRecorder() + req := httptest.NewRequest(http.MethodGet, ReadyzPath, nil) + handler.ServeHTTP(rec, req) + + require.Equal(t, http.StatusOK, rec.Code) + + var body statusResponse + require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &body)) + require.Equal(t, "ready", body.Status) +} + +func TestReadyzReturnsServiceUnavailableWhenProbeFails(t *testing.T) { + t.Parallel() + + handler := newTestServer(t, Dependencies{ + Readiness: stubReadiness{err: errors.New("postgres ping: connection refused")}, + }) + + rec := httptest.NewRecorder() + req := httptest.NewRequest(http.MethodGet, ReadyzPath, nil) + handler.ServeHTTP(rec, req) + + require.Equal(t, http.StatusServiceUnavailable, rec.Code) + require.Equal(t, jsonContentType, rec.Header().Get("Content-Type")) + + var body errorResponse + require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &body)) + require.Equal(t, errorCodeServiceUnavailable, body.Error.Code) + require.True(t, strings.Contains(body.Error.Message, "postgres")) +} + +func TestNewServerRejectsInvalidConfig(t *testing.T) { + t.Parallel() + + _, err := NewServer(Config{}, Dependencies{}) + require.Error(t, err) +} + +func TestRunBindsListenerAndShutsDown(t *testing.T) { + t.Parallel() + + server, err := NewServer(newTestConfig(), Dependencies{}) + require.NoError(t, err) + + runErr := make(chan error, 1) + go func() { + runErr <- 
server.Run(t.Context()) + }() + + require.Eventually(t, func() bool { + return server.Addr() != "" + }, time.Second, 10*time.Millisecond, "listener should bind quickly") + + shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), time.Second) + defer shutdownCancel() + require.NoError(t, server.Shutdown(shutdownCtx)) + + select { + case err := <-runErr: + require.NoError(t, err) + case <-time.After(time.Second): + t.Fatal("server did not return after shutdown") + } +} diff --git a/gamemaster/internal/app/app.go b/gamemaster/internal/app/app.go new file mode 100644 index 0000000..22dd9b3 --- /dev/null +++ b/gamemaster/internal/app/app.go @@ -0,0 +1,170 @@ +// Package app wires the Game Master process lifecycle and coordinates +// component startup and graceful shutdown. +package app + +import ( + "context" + "errors" + "fmt" + "sync" + + "galaxy/gamemaster/internal/config" +) + +// Component is a long-lived Game Master subsystem that participates in +// coordinated startup and graceful shutdown. +type Component interface { + // Run starts the component and blocks until it stops. + Run(context.Context) error + + // Shutdown stops the component within the provided timeout-bounded + // context. + Shutdown(context.Context) error +} + +// App owns the process-level lifecycle of Game Master and its registered +// components. +type App struct { + cfg config.Config + components []Component +} + +// New constructs App with a defensive copy of the supplied components. +func New(cfg config.Config, components ...Component) *App { + clonedComponents := append([]Component(nil), components...) + + return &App{ + cfg: cfg, + components: clonedComponents, + } +} + +// Run starts all configured components, waits for cancellation or the +// first component failure, and then executes best-effort graceful +// shutdown. 
+func (app *App) Run(ctx context.Context) error {
+	if ctx == nil {
+		return errors.New("run gamemaster app: nil context")
+	}
+	if err := app.validate(); err != nil {
+		return err
+	}
+	if len(app.components) == 0 {
+		<-ctx.Done()
+		return nil
+	}
+
+	runCtx, cancel := context.WithCancel(ctx)
+	defer cancel()
+
+	results := make(chan componentResult, len(app.components))
+	var runWaitGroup sync.WaitGroup
+
+	for index, component := range app.components {
+		runWaitGroup.Add(1)
+
+		go func(componentIndex int, component Component) {
+			defer runWaitGroup.Done()
+			results <- componentResult{
+				index: componentIndex,
+				err:   component.Run(runCtx),
+			}
+		}(index, component)
+	}
+
+	var runErr error
+
+	select {
+	case <-ctx.Done():
+	case result := <-results:
+		runErr = classifyComponentResult(ctx, result)
+	}
+
+	cancel()
+
+	shutdownErr := app.shutdownComponents()
+	waitErr := app.waitForComponents(&runWaitGroup)
+
+	return errors.Join(runErr, shutdownErr, waitErr)
+}
+
+type componentResult struct {
+	index int
+	err   error
+}
+
+func (app *App) validate() error {
+	if app.cfg.ShutdownTimeout <= 0 {
+		return fmt.Errorf("run gamemaster app: shutdown timeout must be positive, got %s", app.cfg.ShutdownTimeout)
+	}
+
+	for index, component := range app.components {
+		if component == nil {
+			return fmt.Errorf("run gamemaster app: component %d is nil", index)
+		}
+	}
+
+	return nil
+}
+
+func classifyComponentResult(parentCtx context.Context, result componentResult) error {
+	switch {
+	case result.err == nil:
+		if parentCtx.Err() != nil {
+			return nil
+		}
+		return fmt.Errorf("run gamemaster app: component %d exited without error before shutdown", result.index)
+	case errors.Is(result.err, context.Canceled) && parentCtx.Err() != nil:
+		return nil
+	default:
+		return fmt.Errorf("run gamemaster app: component %d: %w", result.index, result.err)
+	}
+}
+
+func (app *App) shutdownComponents() error {
+	var shutdownWaitGroup sync.WaitGroup
+	errs := make(chan error, 
len(app.components)) + + for index, component := range app.components { + shutdownWaitGroup.Add(1) + + go func(componentIndex int, component Component) { + defer shutdownWaitGroup.Done() + + shutdownCtx, cancel := context.WithTimeout(context.Background(), app.cfg.ShutdownTimeout) + defer cancel() + + if err := component.Shutdown(shutdownCtx); err != nil { + errs <- fmt.Errorf("shutdown gamemaster component %d: %w", componentIndex, err) + } + }(index, component) + } + + shutdownWaitGroup.Wait() + close(errs) + + var joined error + for err := range errs { + joined = errors.Join(joined, err) + } + + return joined +} + +func (app *App) waitForComponents(runWaitGroup *sync.WaitGroup) error { + done := make(chan struct{}) + go func() { + runWaitGroup.Wait() + close(done) + }() + + waitCtx, cancel := context.WithTimeout(context.Background(), app.cfg.ShutdownTimeout) + defer cancel() + + select { + case <-done: + return nil + case <-waitCtx.Done(): + return fmt.Errorf("wait for gamemaster components: %w", waitCtx.Err()) + } +} diff --git a/gamemaster/internal/app/app_test.go b/gamemaster/internal/app/app_test.go new file mode 100644 index 0000000..9b05fcc --- /dev/null +++ b/gamemaster/internal/app/app_test.go @@ -0,0 +1,125 @@ +package app + +import ( + "context" + "errors" + "strings" + "sync/atomic" + "testing" + "time" + + "galaxy/gamemaster/internal/config" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +type fakeComponent struct { + runErr error + shutdownErr error + runHook func(context.Context) error + shutdownHook func(context.Context) error + runCount atomic.Int32 + downCount atomic.Int32 + blockForCtx bool +} + +func (component *fakeComponent) Run(ctx context.Context) error { + component.runCount.Add(1) + if component.runHook != nil { + return component.runHook(ctx) + } + if component.blockForCtx { + <-ctx.Done() + return ctx.Err() + } + + return component.runErr +} + +func (component *fakeComponent) Shutdown(ctx 
context.Context) error { + component.downCount.Add(1) + if component.shutdownHook != nil { + return component.shutdownHook(ctx) + } + + return component.shutdownErr +} + +func newCfg() config.Config { + return config.Config{ShutdownTimeout: time.Second} +} + +func TestAppRunWithoutComponentsBlocksUntilContextDone(t *testing.T) { + t.Parallel() + + app := New(newCfg()) + + ctx, cancel := context.WithCancel(context.Background()) + cancel() + + require.NoError(t, app.Run(ctx)) +} + +func TestAppRunReturnsOnContextCancel(t *testing.T) { + t.Parallel() + + component := &fakeComponent{blockForCtx: true} + app := New(newCfg(), component) + + ctx, cancel := context.WithCancel(context.Background()) + go func() { + time.Sleep(10 * time.Millisecond) + cancel() + }() + + require.NoError(t, app.Run(ctx)) + assert.EqualValues(t, 1, component.runCount.Load()) + assert.EqualValues(t, 1, component.downCount.Load()) +} + +func TestAppRunPropagatesComponentFailure(t *testing.T) { + t.Parallel() + + failure := errors.New("boom") + component := &fakeComponent{runErr: failure} + app := New(newCfg(), component) + + err := app.Run(context.Background()) + require.Error(t, err) + require.ErrorIs(t, err, failure) + assert.EqualValues(t, 1, component.downCount.Load()) +} + +func TestAppRunFailsOnNilContext(t *testing.T) { + t.Parallel() + + app := New(newCfg()) + var ctx context.Context + require.Error(t, app.Run(ctx)) +} + +func TestAppRunFailsOnNonPositiveShutdownTimeout(t *testing.T) { + t.Parallel() + + app := New(config.Config{}, &fakeComponent{}) + require.Error(t, app.Run(context.Background())) +} + +func TestAppRunFailsOnNilComponent(t *testing.T) { + t.Parallel() + + app := New(newCfg(), nil) + require.Error(t, app.Run(context.Background())) +} + +func TestAppRunFlagsCleanExitBeforeShutdown(t *testing.T) { + t.Parallel() + + component := &fakeComponent{} + app := New(newCfg(), component) + + err := app.Run(context.Background()) + require.Error(t, err) + require.True(t, 
strings.Contains(err.Error(), "exited without error")) +} diff --git a/gamemaster/internal/app/bootstrap.go b/gamemaster/internal/app/bootstrap.go new file mode 100644 index 0000000..94f0aff --- /dev/null +++ b/gamemaster/internal/app/bootstrap.go @@ -0,0 +1,45 @@ +package app + +import ( + "context" + "errors" + + "galaxy/redisconn" + + "galaxy/gamemaster/internal/config" + "galaxy/gamemaster/internal/telemetry" + + "github.com/redis/go-redis/v9" +) + +// newRedisClient builds the master Redis client from cfg via the shared +// `pkg/redisconn` helper. Replica clients are not opened in this iteration +// per ARCHITECTURE.md §Persistence Backends; they will be wired when read +// routing is introduced. +func newRedisClient(cfg config.RedisConfig) *redis.Client { + return redisconn.NewMasterClient(cfg.Conn) +} + +// instrumentRedisClient attaches the OpenTelemetry tracing and metrics +// instrumentation to client when telemetryRuntime is available. The +// actual instrumentation lives in `pkg/redisconn` so every Galaxy service +// shares one surface. +func instrumentRedisClient(redisClient *redis.Client, telemetryRuntime *telemetry.Runtime) error { + if redisClient == nil { + return errors.New("instrument redis client: nil client") + } + if telemetryRuntime == nil { + return nil + } + return redisconn.Instrument(redisClient, + redisconn.WithTracerProvider(telemetryRuntime.TracerProvider()), + redisconn.WithMeterProvider(telemetryRuntime.MeterProvider()), + ) +} + +// pingRedis performs a single Redis PING bounded by +// cfg.Conn.OperationTimeout to confirm that the configured Redis endpoint +// is reachable at startup. 
+func pingRedis(ctx context.Context, cfg config.RedisConfig, redisClient *redis.Client) error { + return redisconn.Ping(ctx, redisClient, cfg.Conn.OperationTimeout) +} diff --git a/gamemaster/internal/app/runtime.go b/gamemaster/internal/app/runtime.go new file mode 100644 index 0000000..50b4d03 --- /dev/null +++ b/gamemaster/internal/app/runtime.go @@ -0,0 +1,238 @@ +package app + +import ( + "context" + "database/sql" + "errors" + "fmt" + "log/slog" + "time" + + "galaxy/postgres" + "galaxy/redisconn" + + "galaxy/gamemaster/internal/adapters/postgres/migrations" + "galaxy/gamemaster/internal/api/internalhttp" + "galaxy/gamemaster/internal/config" + "galaxy/gamemaster/internal/telemetry" + + "github.com/redis/go-redis/v9" +) + +// Runtime owns the runnable Game Master process plus the cleanup +// functions that release runtime resources after shutdown. +type Runtime struct { + cfg config.Config + + app *App + + wiring *wiring + + internalServer *internalhttp.Server + + cleanupFns []func() error +} + +// NewRuntime constructs the runnable Game Master process from cfg. +// +// The runtime opens one shared `*redis.Client`, one `*sql.DB`, and one +// OpenTelemetry runtime; all are released in reverse construction order +// on shutdown. Embedded goose migrations apply synchronously after the +// PostgreSQL pool is opened and pinged, before any listener is constructed. 
+func NewRuntime(ctx context.Context, cfg config.Config, logger *slog.Logger) (*Runtime, error) { + if ctx == nil { + return nil, errors.New("new gamemaster runtime: nil context") + } + if err := cfg.Validate(); err != nil { + return nil, fmt.Errorf("new gamemaster runtime: %w", err) + } + if logger == nil { + logger = slog.Default() + } + + runtime := &Runtime{ + cfg: cfg, + } + + cleanupOnError := func(err error) (*Runtime, error) { + if cleanupErr := runtime.Close(); cleanupErr != nil { + return nil, fmt.Errorf("%w; cleanup: %w", err, cleanupErr) + } + + return nil, err + } + + telemetryRuntime, err := telemetry.NewProcess(ctx, telemetry.ProcessConfig{ + ServiceName: cfg.Telemetry.ServiceName, + TracesExporter: cfg.Telemetry.TracesExporter, + MetricsExporter: cfg.Telemetry.MetricsExporter, + TracesProtocol: cfg.Telemetry.TracesProtocol, + MetricsProtocol: cfg.Telemetry.MetricsProtocol, + StdoutTracesEnabled: cfg.Telemetry.StdoutTracesEnabled, + StdoutMetricsEnabled: cfg.Telemetry.StdoutMetricsEnabled, + }, logger) + if err != nil { + return cleanupOnError(fmt.Errorf("new gamemaster runtime: telemetry: %w", err)) + } + runtime.cleanupFns = append(runtime.cleanupFns, func() error { + shutdownCtx, cancel := context.WithTimeout(context.Background(), cfg.ShutdownTimeout) + defer cancel() + return telemetryRuntime.Shutdown(shutdownCtx) + }) + + redisClient := newRedisClient(cfg.Redis) + if err := instrumentRedisClient(redisClient, telemetryRuntime); err != nil { + return cleanupOnError(fmt.Errorf("new gamemaster runtime: %w", err)) + } + runtime.cleanupFns = append(runtime.cleanupFns, func() error { + err := redisClient.Close() + if errors.Is(err, redis.ErrClosed) { + return nil + } + return err + }) + if err := pingRedis(ctx, cfg.Redis, redisClient); err != nil { + return cleanupOnError(fmt.Errorf("new gamemaster runtime: %w", err)) + } + + pgPool, err := postgres.OpenPrimary(ctx, cfg.Postgres.Conn, + postgres.WithTracerProvider(telemetryRuntime.TracerProvider()), + 
postgres.WithMeterProvider(telemetryRuntime.MeterProvider()), + ) + if err != nil { + return cleanupOnError(fmt.Errorf("new gamemaster runtime: open postgres: %w", err)) + } + runtime.cleanupFns = append(runtime.cleanupFns, pgPool.Close) + unregisterPGStats, err := postgres.InstrumentDBStats(pgPool, + postgres.WithMeterProvider(telemetryRuntime.MeterProvider()), + ) + if err != nil { + return cleanupOnError(fmt.Errorf("new gamemaster runtime: instrument postgres: %w", err)) + } + runtime.cleanupFns = append(runtime.cleanupFns, func() error { + return unregisterPGStats() + }) + if err := postgres.Ping(ctx, pgPool, cfg.Postgres.Conn.OperationTimeout); err != nil { + return cleanupOnError(fmt.Errorf("new gamemaster runtime: ping postgres: %w", err)) + } + if err := postgres.RunMigrations(ctx, pgPool, migrations.FS(), "."); err != nil { + return cleanupOnError(fmt.Errorf("new gamemaster runtime: run postgres migrations: %w", err)) + } + + wiring, err := newWiring(cfg, redisClient, pgPool, time.Now, logger, telemetryRuntime) + if err != nil { + return cleanupOnError(fmt.Errorf("new gamemaster runtime: wiring: %w", err)) + } + runtime.wiring = wiring + runtime.cleanupFns = append(runtime.cleanupFns, wiring.close) + + probe := newReadinessProbe(pgPool, redisClient, cfg) + + internalServer, err := internalhttp.NewServer(internalhttp.Config{ + Addr: cfg.InternalHTTP.Addr, + ReadHeaderTimeout: cfg.InternalHTTP.ReadHeaderTimeout, + ReadTimeout: cfg.InternalHTTP.ReadTimeout, + WriteTimeout: cfg.InternalHTTP.WriteTimeout, + IdleTimeout: cfg.InternalHTTP.IdleTimeout, + }, internalhttp.Dependencies{ + Logger: logger, + Telemetry: telemetryRuntime, + Readiness: probe, + RuntimeRecords: wiring.runtimeRecords, + RegisterRuntime: wiring.registerRuntimeSvc, + ForceNextTurn: wiring.forceNextTurnSvc, + StopRuntime: wiring.stopRuntimeSvc, + PatchRuntime: wiring.patchRuntimeSvc, + BanishRace: wiring.banishRaceSvc, + InvalidateMemberships: wiring.membershipCache, + GameLiveness: 
wiring.livenessSvc, + EngineVersions: wiring.engineVersionSvc, + CommandExecute: wiring.commandExecuteSvc, + PutOrders: wiring.orderPutSvc, + GetReport: wiring.reportGetSvc, + }) + if err != nil { + return cleanupOnError(fmt.Errorf("new gamemaster runtime: internal HTTP server: %w", err)) + } + runtime.internalServer = internalServer + + runtime.app = New(cfg, + internalServer, + wiring.schedulerTicker, + wiring.healthEventsConsumer, + ) + + return runtime, nil +} + +// InternalServer returns the internal HTTP server owned by runtime. It is +// primarily exposed for tests; production code should not depend on it. +func (runtime *Runtime) InternalServer() *internalhttp.Server { + if runtime == nil { + return nil + } + + return runtime.internalServer +} + +// Run serves the internal HTTP listener until ctx is canceled or one +// component fails. +func (runtime *Runtime) Run(ctx context.Context) error { + if ctx == nil { + return errors.New("run gamemaster runtime: nil context") + } + if runtime == nil { + return errors.New("run gamemaster runtime: nil runtime") + } + if runtime.app == nil { + return errors.New("run gamemaster runtime: nil app") + } + + return runtime.app.Run(ctx) +} + +// Close releases every runtime dependency in reverse construction order. +// Close is safe to call multiple times. +func (runtime *Runtime) Close() error { + if runtime == nil { + return nil + } + + var joined error + for index := len(runtime.cleanupFns) - 1; index >= 0; index-- { + if err := runtime.cleanupFns[index](); err != nil { + joined = errors.Join(joined, err) + } + } + runtime.cleanupFns = nil + + return joined +} + +// readinessProbe pings every steady-state dependency the listener +// guards: PostgreSQL primary and Redis master. 
+type readinessProbe struct { + pgPool *sql.DB + redisClient *redis.Client + + postgresTimeout time.Duration + redisTimeout time.Duration +} + +func newReadinessProbe(pgPool *sql.DB, redisClient *redis.Client, cfg config.Config) *readinessProbe { + return &readinessProbe{ + pgPool: pgPool, + redisClient: redisClient, + postgresTimeout: cfg.Postgres.Conn.OperationTimeout, + redisTimeout: cfg.Redis.Conn.OperationTimeout, + } +} + +// Check pings PostgreSQL and Redis. The first failing dependency aborts +// the check so callers see a single, actionable error. +func (probe *readinessProbe) Check(ctx context.Context) error { + if err := postgres.Ping(ctx, probe.pgPool, probe.postgresTimeout); err != nil { + return err + } + return redisconn.Ping(ctx, probe.redisClient, probe.redisTimeout) +} diff --git a/gamemaster/internal/app/wiring.go b/gamemaster/internal/app/wiring.go new file mode 100644 index 0000000..8f07754 --- /dev/null +++ b/gamemaster/internal/app/wiring.go @@ -0,0 +1,479 @@ +package app + +import ( + "database/sql" + "errors" + "fmt" + "log/slog" + "time" + + "galaxy/gamemaster/internal/adapters/engineclient" + "galaxy/gamemaster/internal/adapters/lobbyclient" + "galaxy/gamemaster/internal/adapters/lobbyeventspublisher" + "galaxy/gamemaster/internal/adapters/notificationpublisher" + "galaxy/gamemaster/internal/adapters/postgres/engineversionstore" + "galaxy/gamemaster/internal/adapters/postgres/operationlog" + "galaxy/gamemaster/internal/adapters/postgres/playermappingstore" + "galaxy/gamemaster/internal/adapters/postgres/runtimerecordstore" + "galaxy/gamemaster/internal/adapters/redisstate/streamoffsets" + "galaxy/gamemaster/internal/adapters/rtmclient" + "galaxy/gamemaster/internal/config" + "galaxy/gamemaster/internal/service/adminbanish" + "galaxy/gamemaster/internal/service/adminforce" + "galaxy/gamemaster/internal/service/adminpatch" + "galaxy/gamemaster/internal/service/adminstop" + "galaxy/gamemaster/internal/service/commandexecute" + 
engineversionsvc "galaxy/gamemaster/internal/service/engineversion" + "galaxy/gamemaster/internal/service/livenessreply" + "galaxy/gamemaster/internal/service/membership" + "galaxy/gamemaster/internal/service/orderput" + "galaxy/gamemaster/internal/service/registerruntime" + "galaxy/gamemaster/internal/service/reportget" + "galaxy/gamemaster/internal/service/scheduler" + "galaxy/gamemaster/internal/service/turngeneration" + "galaxy/gamemaster/internal/telemetry" + "galaxy/gamemaster/internal/worker/healtheventsconsumer" + "galaxy/gamemaster/internal/worker/schedulerticker" + + "github.com/redis/go-redis/v9" +) + +// wiring owns the process-level singletons constructed once during +// `NewRuntime` and consumed by every worker and HTTP handler. Stage +// 19 grew the struct to hold every store, adapter, service and +// worker required by the listener and the long-lived components. +type wiring struct { + cfg config.Config + + redisClient *redis.Client + pgPool *sql.DB + + clock func() time.Time + + logger *slog.Logger + telemetry *telemetry.Runtime + + // Stores. + runtimeRecords *runtimerecordstore.Store + engineVersions *engineversionstore.Store + playerMappings *playermappingstore.Store + operationLogs *operationlog.Store + streamOffsets *streamoffsets.Store + + // External adapters. + engineClient *engineclient.Client + lobbyClient *lobbyclient.Client + rtmClient *rtmclient.Client + notificationPublisher *notificationpublisher.Publisher + lobbyEventsPublisher *lobbyeventspublisher.Publisher + + // Services. 
+ membershipCache *membership.Cache + registerRuntimeSvc *registerruntime.Service + engineVersionSvc *engineversionsvc.Service + stopRuntimeSvc *adminstop.Service + forceNextTurnSvc *adminforce.Service + patchRuntimeSvc *adminpatch.Service + banishRaceSvc *adminbanish.Service + livenessSvc *livenessreply.Service + commandExecuteSvc *commandexecute.Service + orderPutSvc *orderput.Service + reportGetSvc *reportget.Service + schedulerSvc *scheduler.Service + turnGenerationSvc *turngeneration.Service + + // Workers. + schedulerTicker *schedulerticker.Worker + healthEventsConsumer *healtheventsconsumer.Worker + + // closers releases adapter-level resources at runtime shutdown. + closers []func() error +} + +// newWiring constructs the process-level dependency set. It validates +// every required collaborator so callers can rely on them being +// non-nil. Construction proceeds in four phases: persistence stores, +// external adapters, services, workers. Each phase is in its own +// helper to keep the function readable. 
+func newWiring(
+	cfg config.Config,
+	redisClient *redis.Client,
+	pgPool *sql.DB,
+	clock func() time.Time,
+	logger *slog.Logger,
+	telemetryRuntime *telemetry.Runtime,
+) (*wiring, error) {
+	if redisClient == nil {
+		return nil, errors.New("new gamemaster wiring: nil redis client")
+	}
+	if pgPool == nil {
+		return nil, errors.New("new gamemaster wiring: nil postgres pool")
+	}
+	if clock == nil {
+		clock = time.Now
+	}
+	if logger == nil {
+		logger = slog.Default()
+	}
+	if telemetryRuntime == nil {
+		return nil, errors.New("new gamemaster wiring: nil telemetry runtime")
+	}
+
+	w := &wiring{
+		cfg:         cfg,
+		redisClient: redisClient,
+		pgPool:      pgPool,
+		clock:       clock,
+		logger:      logger,
+		telemetry:   telemetryRuntime,
+	}
+
+	if err := w.buildPersistence(); err != nil {
+		return nil, fmt.Errorf("new gamemaster wiring: persistence: %w", err)
+	}
+	if err := w.buildAdapters(); err != nil {
+		return nil, fmt.Errorf("new gamemaster wiring: adapters: %w", err)
+	}
+	if err := w.buildServices(); err != nil {
+		return nil, fmt.Errorf("new gamemaster wiring: services: %w", err)
+	}
+	if err := w.buildWorkers(); err != nil {
+		return nil, fmt.Errorf("new gamemaster wiring: workers: %w", err)
+	}
+
+	return w, nil
+}
+
+// buildPersistence constructs the four PostgreSQL stores plus the
+// Redis-backed stream-offset store. The stores share the connection
+// pools opened by the runtime; their lifecycles are owned by the
+// runtime, not the wiring.
+func (w *wiring) buildPersistence() error {
+	timeout := w.cfg.Postgres.Conn.OperationTimeout
+
+	runtimeRecords, err := runtimerecordstore.New(runtimerecordstore.Config{
+		DB:               w.pgPool,
+		OperationTimeout: timeout,
+	})
+	if err != nil {
+		return fmt.Errorf("runtime record store: %w", err)
+	}
+	w.runtimeRecords = runtimeRecords
+
+	engineVersions, err := engineversionstore.New(engineversionstore.Config{
+		DB:               w.pgPool,
+		OperationTimeout: timeout,
+	})
+	if err != nil {
+		return fmt.Errorf("engine version store: %w", err)
+	}
+	w.engineVersions = engineVersions
+
+	playerMappings, err := playermappingstore.New(playermappingstore.Config{
+		DB:               w.pgPool,
+		OperationTimeout: timeout,
+	})
+	if err != nil {
+		return fmt.Errorf("player mapping store: %w", err)
+	}
+	w.playerMappings = playerMappings
+
+	operationLogs, err := operationlog.New(operationlog.Config{
+		DB:               w.pgPool,
+		OperationTimeout: timeout,
+	})
+	if err != nil {
+		return fmt.Errorf("operation log store: %w", err)
+	}
+	w.operationLogs = operationLogs
+
+	streamOffsets, err := streamoffsets.New(streamoffsets.Config{Client: w.redisClient})
+	if err != nil {
+		return fmt.Errorf("stream offset store: %w", err)
+	}
+	w.streamOffsets = streamOffsets
+
+	return nil
+}
+
+// buildAdapters constructs the HTTP clients (engine, Lobby, Runtime
+// Manager) and the two Redis Stream publishers. Their `Close` hooks
+// are appended to w.closers so idle TCP connections are released on
+// shutdown.
+func (w *wiring) buildAdapters() error {
+	engine, err := engineclient.NewClient(engineclient.Config{
+		CallTimeout:  w.cfg.EngineClient.CallTimeout,
+		ProbeTimeout: w.cfg.EngineClient.ProbeTimeout,
+	})
+	if err != nil {
+		return fmt.Errorf("engine client: %w", err)
+	}
+	w.engineClient = engine
+	w.closers = append(w.closers, engine.Close)
+
+	lobby, err := lobbyclient.NewClient(lobbyclient.Config{
+		BaseURL:        w.cfg.Lobby.BaseURL,
+		RequestTimeout: w.cfg.Lobby.Timeout,
+	})
+	if err != nil {
+		return fmt.Errorf("lobby client: %w", err)
+	}
+	w.lobbyClient = lobby
+	w.closers = append(w.closers, lobby.Close)
+
+	rtm, err := rtmclient.NewClient(rtmclient.Config{
+		BaseURL:        w.cfg.RTM.BaseURL,
+		RequestTimeout: w.cfg.RTM.Timeout,
+	})
+	if err != nil {
+		return fmt.Errorf("rtm client: %w", err)
+	}
+	w.rtmClient = rtm
+	w.closers = append(w.closers, rtm.Close)
+
+	notification, err := notificationpublisher.NewPublisher(notificationpublisher.Config{
+		Client: w.redisClient,
+		Stream: w.cfg.Streams.NotificationIntents,
+	})
+	if err != nil {
+		return fmt.Errorf("notification publisher: %w", err)
+	}
+	w.notificationPublisher = notification
+
+	lobbyEvents, err := lobbyeventspublisher.NewPublisher(lobbyeventspublisher.Config{
+		Client: w.redisClient,
+		Stream: w.cfg.Streams.LobbyEvents,
+	})
+	if err != nil {
+		return fmt.Errorf("lobby events publisher: %w", err)
+	}
+	w.lobbyEventsPublisher = lobbyEvents
+
+	return nil
+}
+
+// buildServices constructs every service-layer collaborator consumed
+// by the REST listener and the workers. Construction order matters
+// only between turngeneration → adminforce (the latter wraps the
+// former) and between membership cache → command/order/report
+// services.
+func (w *wiring) buildServices() error {
+	cache, err := membership.NewCache(membership.Dependencies{
+		Lobby:     w.lobbyClient,
+		Telemetry: w.telemetry,
+		Logger:    w.logger,
+		Clock:     w.clock,
+		TTL:       w.cfg.MembershipCache.TTL,
+		MaxGames:  w.cfg.MembershipCache.MaxGames,
+	})
+	if err != nil {
+		return fmt.Errorf("membership cache: %w", err)
+	}
+	w.membershipCache = cache
+
+	w.schedulerSvc = scheduler.New()
+
+	registerSvc, err := registerruntime.NewService(registerruntime.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		EngineVersions: w.engineVersions,
+		PlayerMappings: w.playerMappings,
+		OperationLogs:  w.operationLogs,
+		Engine:         w.engineClient,
+		LobbyEvents:    w.lobbyEventsPublisher,
+		Telemetry:      w.telemetry,
+		Logger:         w.logger,
+		Clock:          w.clock,
+	})
+	if err != nil {
+		return fmt.Errorf("register runtime service: %w", err)
+	}
+	w.registerRuntimeSvc = registerSvc
+
+	engineVersionSvc, err := engineversionsvc.NewService(engineversionsvc.Dependencies{
+		EngineVersions: w.engineVersions,
+		OperationLogs:  w.operationLogs,
+		Logger:         w.logger,
+		Clock:          w.clock,
+	})
+	if err != nil {
+		return fmt.Errorf("engine version service: %w", err)
+	}
+	w.engineVersionSvc = engineVersionSvc
+
+	turnGen, err := turngeneration.NewService(turngeneration.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		PlayerMappings: w.playerMappings,
+		OperationLogs:  w.operationLogs,
+		Engine:         w.engineClient,
+		LobbyEvents:    w.lobbyEventsPublisher,
+		Notifications:  w.notificationPublisher,
+		Lobby:          w.lobbyClient,
+		Scheduler:      w.schedulerSvc,
+		Telemetry:      w.telemetry,
+		Logger:         w.logger,
+		Clock:          w.clock,
+	})
+	if err != nil {
+		return fmt.Errorf("turn generation service: %w", err)
+	}
+	w.turnGenerationSvc = turnGen
+
+	stopSvc, err := adminstop.NewService(adminstop.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		OperationLogs:  w.operationLogs,
+		RTM:            w.rtmClient,
+		LobbyEvents:    w.lobbyEventsPublisher,
+		Telemetry:      w.telemetry,
+		Logger:         w.logger,
+		Clock:          w.clock,
+	})
+	if err != nil {
+		return fmt.Errorf("admin stop service: %w", err)
+	}
+	w.stopRuntimeSvc = stopSvc
+
+	forceSvc, err := adminforce.NewService(adminforce.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		OperationLogs:  w.operationLogs,
+		TurnGeneration: turnGen,
+		Telemetry:      w.telemetry,
+		Logger:         w.logger,
+		Clock:          w.clock,
+	})
+	if err != nil {
+		return fmt.Errorf("admin force service: %w", err)
+	}
+	w.forceNextTurnSvc = forceSvc
+
+	patchSvc, err := adminpatch.NewService(adminpatch.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		EngineVersions: w.engineVersions,
+		OperationLogs:  w.operationLogs,
+		RTM:            w.rtmClient,
+		Telemetry:      w.telemetry,
+		Logger:         w.logger,
+		Clock:          w.clock,
+	})
+	if err != nil {
+		return fmt.Errorf("admin patch service: %w", err)
+	}
+	w.patchRuntimeSvc = patchSvc
+
+	banishSvc, err := adminbanish.NewService(adminbanish.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		PlayerMappings: w.playerMappings,
+		OperationLogs:  w.operationLogs,
+		Engine:         w.engineClient,
+		Telemetry:      w.telemetry,
+		Logger:         w.logger,
+		Clock:          w.clock,
+	})
+	if err != nil {
+		return fmt.Errorf("admin banish service: %w", err)
+	}
+	w.banishRaceSvc = banishSvc
+
+	livenessSvc, err := livenessreply.NewService(livenessreply.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		Logger:         w.logger,
+	})
+	if err != nil {
+		return fmt.Errorf("liveness reply service: %w", err)
+	}
+	w.livenessSvc = livenessSvc
+
+	commandSvc, err := commandexecute.NewService(commandexecute.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		PlayerMappings: w.playerMappings,
+		Membership:     cache,
+		Engine:         w.engineClient,
+		Telemetry:      w.telemetry,
+		Logger:         w.logger,
+		Clock:          w.clock,
+	})
+	if err != nil {
+		return fmt.Errorf("command execute service: %w", err)
+	}
+	w.commandExecuteSvc = commandSvc
+
+	orderSvc, err := orderput.NewService(orderput.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		PlayerMappings: w.playerMappings,
+		Membership:     cache,
+		Engine:         w.engineClient,
+		Telemetry:      w.telemetry,
+		Logger:         w.logger,
+		Clock:          w.clock,
+	})
+	if err != nil {
+		return fmt.Errorf("put orders service: %w", err)
+	}
+	w.orderPutSvc = orderSvc
+
+	reportSvc, err := reportget.NewService(reportget.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		PlayerMappings: w.playerMappings,
+		Membership:     cache,
+		Engine:         w.engineClient,
+		Telemetry:      w.telemetry,
+		Logger:         w.logger,
+		Clock:          w.clock,
+	})
+	if err != nil {
+		return fmt.Errorf("get report service: %w", err)
+	}
+	w.reportGetSvc = reportSvc
+
+	return nil
+}
+
+// buildWorkers constructs the long-lived components started by
+// `App.Run` alongside the listener: the per-second scheduler ticker
+// and the runtime:health_events consumer.
+func (w *wiring) buildWorkers() error {
+	ticker, err := schedulerticker.NewWorker(schedulerticker.Dependencies{
+		RuntimeRecords: w.runtimeRecords,
+		TurnGeneration: w.turnGenerationSvc,
+		Telemetry:      w.telemetry,
+		Interval:       w.cfg.Scheduler.TickInterval,
+		Clock:          w.clock,
+		Logger:         w.logger,
+	})
+	if err != nil {
+		return fmt.Errorf("scheduler ticker: %w", err)
+	}
+	w.schedulerTicker = ticker
+
+	healthConsumer, err := healtheventsconsumer.NewWorker(healtheventsconsumer.Dependencies{
+		Client:         w.redisClient,
+		Stream:         w.cfg.Streams.HealthEvents,
+		BlockTimeout:   w.cfg.Streams.BlockTimeout,
+		OffsetStore:    w.streamOffsets,
+		RuntimeRecords: w.runtimeRecords,
+		LobbyEvents:    w.lobbyEventsPublisher,
+		Telemetry:      w.telemetry,
+		Clock:          w.clock,
+		Logger:         w.logger,
+	})
+	if err != nil {
+		return fmt.Errorf("health events consumer: %w", err)
+	}
+	w.healthEventsConsumer = healthConsumer
+
+	return nil
+}
+
+// close releases adapter-level resources owned by the wiring layer.
+// Returns the joined error of every closer; the caller is expected
+// to invoke this once during process shutdown. Closers run in LIFO
+// order so the resource opened last is released first.
+func (w *wiring) close() error {
+	var joined error
+	for index := len(w.closers) - 1; index >= 0; index-- {
+		if err := w.closers[index](); err != nil {
+			joined = errors.Join(joined, err)
+		}
+	}
+	w.closers = nil
+	return joined
+}
diff --git a/gamemaster/internal/config/config.go b/gamemaster/internal/config/config.go
new file mode 100644
index 0000000..047b0a1
--- /dev/null
+++ b/gamemaster/internal/config/config.go
@@ -0,0 +1,448 @@
+// Package config loads the Game Master process configuration from
+// environment variables.
+package config
+
+import (
+	"fmt"
+	"strings"
+	"time"
+
+	"galaxy/postgres"
+	"galaxy/redisconn"
+
+	"galaxy/gamemaster/internal/telemetry"
+)
+
+const (
+	envPrefix = "GAMEMASTER"
+
+	shutdownTimeoutEnvVar = "GAMEMASTER_SHUTDOWN_TIMEOUT"
+	logLevelEnvVar        = "GAMEMASTER_LOG_LEVEL"
+
+	internalHTTPAddrEnvVar              = "GAMEMASTER_INTERNAL_HTTP_ADDR"
+	internalHTTPReadHeaderTimeoutEnvVar = "GAMEMASTER_INTERNAL_HTTP_READ_HEADER_TIMEOUT"
+	internalHTTPReadTimeoutEnvVar       = "GAMEMASTER_INTERNAL_HTTP_READ_TIMEOUT"
+	internalHTTPWriteTimeoutEnvVar      = "GAMEMASTER_INTERNAL_HTTP_WRITE_TIMEOUT"
+	internalHTTPIdleTimeoutEnvVar       = "GAMEMASTER_INTERNAL_HTTP_IDLE_TIMEOUT"
+
+	lobbyEventsStreamEnvVar         = "GAMEMASTER_REDIS_LOBBY_EVENTS_STREAM"
+	healthEventsStreamEnvVar        = "GAMEMASTER_REDIS_HEALTH_EVENTS_STREAM"
+	notificationIntentsStreamEnvVar = "GAMEMASTER_REDIS_NOTIFICATION_INTENTS_STREAM"
+	streamBlockTimeoutEnvVar        = "GAMEMASTER_STREAM_BLOCK_TIMEOUT"
+
+	engineCallTimeoutEnvVar  = "GAMEMASTER_ENGINE_CALL_TIMEOUT"
+	engineProbeTimeoutEnvVar = "GAMEMASTER_ENGINE_PROBE_TIMEOUT"
+
+	lobbyInternalBaseURLEnvVar = "GAMEMASTER_LOBBY_INTERNAL_BASE_URL"
+	lobbyInternalTimeoutEnvVar = "GAMEMASTER_LOBBY_INTERNAL_TIMEOUT"
+
+	rtmInternalBaseURLEnvVar = "GAMEMASTER_RTM_INTERNAL_BASE_URL"
+	rtmInternalTimeoutEnvVar = "GAMEMASTER_RTM_INTERNAL_TIMEOUT"
+
+	schedulerTickIntervalEnvVar   = "GAMEMASTER_SCHEDULER_TICK_INTERVAL"
+	turnGenerationTimeoutEnvVar   = "GAMEMASTER_TURN_GENERATION_TIMEOUT"
+	membershipCacheTTLEnvVar      = "GAMEMASTER_MEMBERSHIP_CACHE_TTL"
+	membershipCacheMaxGamesEnvVar = "GAMEMASTER_MEMBERSHIP_CACHE_MAX_GAMES"
+
+	otelServiceNameEnvVar                 = "OTEL_SERVICE_NAME"
+	otelTracesExporterEnvVar              = "OTEL_TRACES_EXPORTER"
+	otelMetricsExporterEnvVar             = "OTEL_METRICS_EXPORTER"
+	otelExporterOTLPProtocolEnvVar        = "OTEL_EXPORTER_OTLP_PROTOCOL"
+	otelExporterOTLPTracesProtocolEnvVar  = "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL"
+	otelExporterOTLPMetricsProtocolEnvVar = "OTEL_EXPORTER_OTLP_METRICS_PROTOCOL"
+	otelStdoutTracesEnabledEnvVar         = "GAMEMASTER_OTEL_STDOUT_TRACES_ENABLED"
+	otelStdoutMetricsEnabledEnvVar        = "GAMEMASTER_OTEL_STDOUT_METRICS_ENABLED"
+
+	defaultShutdownTimeout   = 30 * time.Second
+	defaultLogLevel          = "info"
+	defaultInternalHTTPAddr  = ":8097"
+	defaultReadHeaderTimeout = 2 * time.Second
+	defaultReadTimeout       = 5 * time.Second
+	defaultWriteTimeout      = 30 * time.Second
+	defaultIdleTimeout       = 60 * time.Second
+
+	defaultLobbyEventsStream         = "gm:lobby_events"
+	defaultHealthEventsStream        = "runtime:health_events"
+	defaultNotificationIntentsStream = "notification:intents"
+	defaultStreamBlockTimeout        = 5 * time.Second
+
+	defaultEngineCallTimeout  = 30 * time.Second
+	defaultEngineProbeTimeout = 5 * time.Second
+
+	defaultLobbyInternalTimeout = 2 * time.Second
+	defaultRTMInternalTimeout   = 5 * time.Second
+
+	defaultSchedulerTickInterval   = time.Second
+	defaultTurnGenerationTimeout   = 60 * time.Second
+	defaultMembershipCacheTTL      = 30 * time.Second
+	defaultMembershipCacheMaxGames = 4096
+
+	defaultOTelServiceName = "galaxy-gamemaster"
+)
+
+// Config stores the full Game Master process configuration.
+type Config struct {
+	// ShutdownTimeout bounds graceful shutdown of every long-lived
+	// component.
+	ShutdownTimeout time.Duration
+
+	// Logging configures the process-wide structured logger.
+	Logging LoggingConfig
+
+	// InternalHTTP configures the trusted internal HTTP listener.
+	InternalHTTP InternalHTTPConfig
+
+	// Postgres configures the PostgreSQL-backed durable store consumed
+	// via `pkg/postgres`.
+	Postgres PostgresConfig
+
+	// Redis configures the shared Redis connection topology consumed via
+	// `pkg/redisconn`.
+	Redis RedisConfig
+
+	// Streams stores the stable Redis Stream names GM reads from and
+	// writes to.
+	Streams StreamsConfig
+
+	// EngineClient configures per-call timeouts of the engine HTTP
+	// client.
+	EngineClient EngineClientConfig
+
+	// Lobby configures the synchronous Lobby internal REST client.
+	Lobby LobbyClientConfig
+
+	// RTM configures the synchronous Runtime Manager internal REST
+	// client.
+	RTM RTMClientConfig
+
+	// Scheduler configures the scheduler ticker worker and the per-turn
+	// generation deadline.
+	Scheduler SchedulerConfig
+
+	// MembershipCache configures the in-process membership cache.
+	MembershipCache MembershipCacheConfig
+
+	// Telemetry configures the process-wide OpenTelemetry runtime.
+	Telemetry TelemetryConfig
+}
+
+// LoggingConfig configures the process-wide structured logger.
+type LoggingConfig struct {
+	// Level stores the process log level accepted by log/slog.
+	Level string
+}
+
+// InternalHTTPConfig configures the trusted internal HTTP listener.
+type InternalHTTPConfig struct {
+	// Addr stores the TCP listen address.
+	Addr string
+
+	// ReadHeaderTimeout bounds request-header reading.
+	ReadHeaderTimeout time.Duration
+
+	// ReadTimeout bounds reading one request.
+	ReadTimeout time.Duration
+
+	// WriteTimeout bounds writing one response.
+	WriteTimeout time.Duration
+
+	// IdleTimeout bounds how long keep-alive connections stay open.
+	IdleTimeout time.Duration
+}
+
+// Validate reports whether cfg stores a usable internal HTTP listener
+// configuration.
+func (cfg InternalHTTPConfig) Validate() error {
+	switch {
+	case strings.TrimSpace(cfg.Addr) == "":
+		return fmt.Errorf("internal HTTP addr must not be empty")
+	case !isTCPAddr(cfg.Addr):
+		return fmt.Errorf("internal HTTP addr %q must use host:port form", cfg.Addr)
+	case cfg.ReadHeaderTimeout <= 0:
+		return fmt.Errorf("internal HTTP read header timeout must be positive")
+	case cfg.ReadTimeout <= 0:
+		return fmt.Errorf("internal HTTP read timeout must be positive")
+	case cfg.WriteTimeout <= 0:
+		return fmt.Errorf("internal HTTP write timeout must be positive")
+	case cfg.IdleTimeout <= 0:
+		return fmt.Errorf("internal HTTP idle timeout must be positive")
+	default:
+		return nil
+	}
+}
+
+// PostgresConfig configures the PostgreSQL-backed durable store consumed
+// via `pkg/postgres`.
+type PostgresConfig struct {
+	// Conn carries the primary plus replica DSN topology and pool tuning.
+	Conn postgres.Config
+}
+
+// Validate reports whether cfg stores a usable PostgreSQL configuration.
+func (cfg PostgresConfig) Validate() error {
+	return cfg.Conn.Validate()
+}
+
+// RedisConfig configures the Game Master Redis connection topology.
+type RedisConfig struct {
+	// Conn carries the connection topology (master, replicas, password,
+	// db, per-call timeout).
+	Conn redisconn.Config
+}
+
+// Validate reports whether cfg stores a usable Redis configuration.
+func (cfg RedisConfig) Validate() error {
+	return cfg.Conn.Validate()
+}
+
+// StreamsConfig stores the stable Redis Stream names used by Game Master.
+type StreamsConfig struct {
+	// LobbyEvents stores the Redis Streams key GM publishes runtime
+	// snapshot updates and game-finished events to.
+	LobbyEvents string
+
+	// HealthEvents stores the Redis Streams key GM consumes runtime
+	// health events from.
+	HealthEvents string
+
+	// NotificationIntents stores the Redis Streams key GM publishes
+	// notification intents to.
+	NotificationIntents string
+
+	// BlockTimeout bounds the maximum blocking read window for stream
+	// consumers.
+	BlockTimeout time.Duration
+}
+
+// Validate reports whether cfg stores usable stream names.
+func (cfg StreamsConfig) Validate() error {
+	switch {
+	case strings.TrimSpace(cfg.LobbyEvents) == "":
+		return fmt.Errorf("redis lobby events stream must not be empty")
+	case strings.TrimSpace(cfg.HealthEvents) == "":
+		return fmt.Errorf("redis health events stream must not be empty")
+	case strings.TrimSpace(cfg.NotificationIntents) == "":
+		return fmt.Errorf("redis notification intents stream must not be empty")
+	case cfg.BlockTimeout <= 0:
+		return fmt.Errorf("redis stream block timeout must be positive")
+	default:
+		return nil
+	}
+}
+
+// EngineClientConfig configures per-call timeouts of the engine HTTP
+// client.
+type EngineClientConfig struct {
+	// CallTimeout bounds one full engine call (including turn generation
+	// for large games).
+	CallTimeout time.Duration
+
+	// ProbeTimeout bounds inspect-style reads against the engine.
+	ProbeTimeout time.Duration
+}
+
+// Validate reports whether cfg stores usable engine client timeouts.
+func (cfg EngineClientConfig) Validate() error {
+	switch {
+	case cfg.CallTimeout <= 0:
+		return fmt.Errorf("engine call timeout must be positive")
+	case cfg.ProbeTimeout <= 0:
+		return fmt.Errorf("engine probe timeout must be positive")
+	default:
+		return nil
+	}
+}
+
+// LobbyClientConfig configures the synchronous Lobby internal REST
+// client.
+type LobbyClientConfig struct {
+	// BaseURL stores the trusted Lobby internal listener base URL.
+	BaseURL string
+
+	// Timeout bounds one Lobby internal request.
+	Timeout time.Duration
+}
+
+// Validate reports whether cfg stores a usable Lobby client
+// configuration.
+func (cfg LobbyClientConfig) Validate() error {
+	switch {
+	case strings.TrimSpace(cfg.BaseURL) == "":
+		return fmt.Errorf("lobby internal base url must not be empty")
+	case !isHTTPURL(cfg.BaseURL):
+		return fmt.Errorf("lobby internal base url %q must be an absolute http(s) URL", cfg.BaseURL)
+	case cfg.Timeout <= 0:
+		return fmt.Errorf("lobby internal timeout must be positive")
+	default:
+		return nil
+	}
+}
+
+// RTMClientConfig configures the synchronous Runtime Manager internal
+// REST client.
+type RTMClientConfig struct {
+	// BaseURL stores the trusted Runtime Manager internal listener base
+	// URL.
+	BaseURL string
+
+	// Timeout bounds one Runtime Manager internal request.
+	Timeout time.Duration
+}
+
+// Validate reports whether cfg stores a usable Runtime Manager client
+// configuration.
+func (cfg RTMClientConfig) Validate() error {
+	switch {
+	case strings.TrimSpace(cfg.BaseURL) == "":
+		return fmt.Errorf("rtm internal base url must not be empty")
+	case !isHTTPURL(cfg.BaseURL):
+		return fmt.Errorf("rtm internal base url %q must be an absolute http(s) URL", cfg.BaseURL)
+	case cfg.Timeout <= 0:
+		return fmt.Errorf("rtm internal timeout must be positive")
+	default:
+		return nil
+	}
+}
+
+// SchedulerConfig configures the scheduler ticker worker and the
+// per-turn generation deadline.
+type SchedulerConfig struct {
+	// TickInterval is the period between two scheduler scans for due
+	// runtime records.
+	TickInterval time.Duration
+
+	// TurnGenerationTimeout bounds one engine `/admin/turn` call from
+	// the scheduler's perspective.
+	TurnGenerationTimeout time.Duration
+}
+
+// Validate reports whether cfg stores usable scheduler timings.
+func (cfg SchedulerConfig) Validate() error {
+	switch {
+	case cfg.TickInterval <= 0:
+		return fmt.Errorf("scheduler tick interval must be positive")
+	case cfg.TurnGenerationTimeout <= 0:
+		return fmt.Errorf("turn generation timeout must be positive")
+	default:
+		return nil
+	}
+}
+
+// MembershipCacheConfig configures the in-process membership cache.
+type MembershipCacheConfig struct {
+	// TTL bounds how long an unobserved membership entry stays cached
+	// before a forced reload from Lobby.
+	TTL time.Duration
+
+	// MaxGames bounds how many games can populate the cache before
+	// LRU eviction kicks in.
+	MaxGames int
+}
+
+// Validate reports whether cfg stores usable membership cache settings.
+func (cfg MembershipCacheConfig) Validate() error {
+	switch {
+	case cfg.TTL <= 0:
+		return fmt.Errorf("membership cache ttl must be positive")
+	case cfg.MaxGames <= 0:
+		return fmt.Errorf("membership cache max games must be positive")
+	default:
+		return nil
+	}
+}
+
+// TelemetryConfig configures the Game Master OpenTelemetry runtime.
+type TelemetryConfig struct {
+	// ServiceName overrides the default OpenTelemetry service name.
+	ServiceName string
+
+	// TracesExporter selects the external traces exporter. Supported
+	// values are `none` and `otlp`.
+	TracesExporter string
+
+	// MetricsExporter selects the external metrics exporter. Supported
+	// values are `none` and `otlp`.
+	MetricsExporter string
+
+	// TracesProtocol selects the OTLP traces protocol when
+	// TracesExporter is `otlp`.
+	TracesProtocol string
+
+	// MetricsProtocol selects the OTLP metrics protocol when
+	// MetricsExporter is `otlp`.
+	MetricsProtocol string
+
+	// StdoutTracesEnabled enables the additional stdout trace exporter
+	// used for local development and debugging.
+	StdoutTracesEnabled bool
+
+	// StdoutMetricsEnabled enables the additional stdout metric
+	// exporter used for local development and debugging.
+	StdoutMetricsEnabled bool
+}
+
+// Validate reports whether cfg contains a supported OpenTelemetry
+// configuration.
+func (cfg TelemetryConfig) Validate() error {
+	return telemetry.ProcessConfig{
+		ServiceName:          cfg.ServiceName,
+		TracesExporter:       cfg.TracesExporter,
+		MetricsExporter:      cfg.MetricsExporter,
+		TracesProtocol:       cfg.TracesProtocol,
+		MetricsProtocol:      cfg.MetricsProtocol,
+		StdoutTracesEnabled:  cfg.StdoutTracesEnabled,
+		StdoutMetricsEnabled: cfg.StdoutMetricsEnabled,
+	}.Validate()
+}
+
+// DefaultConfig returns the default Game Master process configuration.
+func DefaultConfig() Config {
+	return Config{
+		ShutdownTimeout: defaultShutdownTimeout,
+		Logging: LoggingConfig{
+			Level: defaultLogLevel,
+		},
+		InternalHTTP: InternalHTTPConfig{
+			Addr:              defaultInternalHTTPAddr,
+			ReadHeaderTimeout: defaultReadHeaderTimeout,
+			ReadTimeout:       defaultReadTimeout,
+			WriteTimeout:      defaultWriteTimeout,
+			IdleTimeout:       defaultIdleTimeout,
+		},
+		Postgres: PostgresConfig{
+			Conn: postgres.DefaultConfig(),
+		},
+		Redis: RedisConfig{
+			Conn: redisconn.DefaultConfig(),
+		},
+		Streams: StreamsConfig{
+			LobbyEvents:         defaultLobbyEventsStream,
+			HealthEvents:        defaultHealthEventsStream,
+			NotificationIntents: defaultNotificationIntentsStream,
+			BlockTimeout:        defaultStreamBlockTimeout,
+		},
+		EngineClient: EngineClientConfig{
+			CallTimeout:  defaultEngineCallTimeout,
+			ProbeTimeout: defaultEngineProbeTimeout,
+		},
+		Lobby: LobbyClientConfig{
+			Timeout: defaultLobbyInternalTimeout,
+		},
+		RTM: RTMClientConfig{
+			Timeout: defaultRTMInternalTimeout,
+		},
+		Scheduler: SchedulerConfig{
+			TickInterval:          defaultSchedulerTickInterval,
+			TurnGenerationTimeout: defaultTurnGenerationTimeout,
+		},
+		MembershipCache: MembershipCacheConfig{
+			TTL:      defaultMembershipCacheTTL,
+			MaxGames: defaultMembershipCacheMaxGames,
+		},
+		Telemetry: TelemetryConfig{
+			ServiceName:     defaultOTelServiceName,
+			TracesExporter:  "none",
+			MetricsExporter: "none",
+		},
+	}
+}
diff --git a/gamemaster/internal/config/config_test.go b/gamemaster/internal/config/config_test.go
new file mode 100644
index 0000000..acb6a96
--- /dev/null
+++ b/gamemaster/internal/config/config_test.go
@@ -0,0 +1,169 @@
+package config
+
+import (
+	"strings"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/require"
+)
+
+func validEnv(t *testing.T) {
+	t.Helper()
+
+	t.Setenv("GAMEMASTER_INTERNAL_HTTP_ADDR", ":8097")
+	t.Setenv("GAMEMASTER_POSTGRES_PRIMARY_DSN", "postgres://gm:secret@localhost:5432/galaxy?search_path=gamemaster&sslmode=disable")
+	t.Setenv("GAMEMASTER_REDIS_MASTER_ADDR", "localhost:6379")
+	t.Setenv("GAMEMASTER_REDIS_PASSWORD", "secret")
+	t.Setenv("GAMEMASTER_LOBBY_INTERNAL_BASE_URL", "http://lobby:8095")
+	t.Setenv("GAMEMASTER_RTM_INTERNAL_BASE_URL", "http://rtmanager:8096")
+}
+
+func TestLoadFromEnvAcceptsDefaults(t *testing.T) {
+	validEnv(t)
+
+	cfg, err := LoadFromEnv()
+	require.NoError(t, err)
+
+	require.Equal(t, ":8097", cfg.InternalHTTP.Addr)
+	require.Equal(t, 30*time.Second, cfg.ShutdownTimeout)
+	require.Equal(t, "info", cfg.Logging.Level)
+	require.Equal(t, "gm:lobby_events", cfg.Streams.LobbyEvents)
+	require.Equal(t, "runtime:health_events", cfg.Streams.HealthEvents)
+	require.Equal(t, "notification:intents", cfg.Streams.NotificationIntents)
+	require.Equal(t, 5*time.Second, cfg.Streams.BlockTimeout)
+	require.Equal(t, 30*time.Second, cfg.EngineClient.CallTimeout)
+	require.Equal(t, 5*time.Second, cfg.EngineClient.ProbeTimeout)
+	require.Equal(t, "http://lobby:8095", cfg.Lobby.BaseURL)
+	require.Equal(t, 2*time.Second, cfg.Lobby.Timeout)
+	require.Equal(t, "http://rtmanager:8096", cfg.RTM.BaseURL)
+	require.Equal(t, 5*time.Second, cfg.RTM.Timeout)
+	require.Equal(t, time.Second, cfg.Scheduler.TickInterval)
+	require.Equal(t, 60*time.Second, cfg.Scheduler.TurnGenerationTimeout)
+	require.Equal(t, 30*time.Second, cfg.MembershipCache.TTL)
+	require.Equal(t, 4096, cfg.MembershipCache.MaxGames)
+	require.Equal(t, "galaxy-gamemaster", cfg.Telemetry.ServiceName)
+}
+
+func TestLoadFromEnvHonoursOverrides(t *testing.T) {
+	validEnv(t)
+	t.Setenv("GAMEMASTER_INTERNAL_HTTP_ADDR", ":9097")
+	t.Setenv("GAMEMASTER_REDIS_LOBBY_EVENTS_STREAM", "custom:lobby_events")
+	t.Setenv("GAMEMASTER_ENGINE_CALL_TIMEOUT", "45s")
+	t.Setenv("GAMEMASTER_SCHEDULER_TICK_INTERVAL", "500ms")
+	t.Setenv("GAMEMASTER_MEMBERSHIP_CACHE_TTL", "60s")
+	t.Setenv("GAMEMASTER_MEMBERSHIP_CACHE_MAX_GAMES", "1024")
+
+	cfg, err := LoadFromEnv()
+	require.NoError(t, err)
+
+	require.Equal(t, ":9097", cfg.InternalHTTP.Addr)
+	require.Equal(t, "custom:lobby_events", cfg.Streams.LobbyEvents)
+	require.Equal(t, 45*time.Second, cfg.EngineClient.CallTimeout)
+	require.Equal(t, 500*time.Millisecond, cfg.Scheduler.TickInterval)
+	require.Equal(t, 60*time.Second, cfg.MembershipCache.TTL)
+	require.Equal(t, 1024, cfg.MembershipCache.MaxGames)
+}
+
+func TestLoadFromEnvRequiresInternalHTTPAddr(t *testing.T) {
+	t.Setenv("GAMEMASTER_POSTGRES_PRIMARY_DSN", "postgres://gm:secret@localhost:5432/galaxy")
+	t.Setenv("GAMEMASTER_REDIS_MASTER_ADDR", "localhost:6379")
+	t.Setenv("GAMEMASTER_REDIS_PASSWORD", "secret")
+	t.Setenv("GAMEMASTER_LOBBY_INTERNAL_BASE_URL", "http://lobby:8095")
+	t.Setenv("GAMEMASTER_RTM_INTERNAL_BASE_URL", "http://rtmanager:8096")
+
+	_, err := LoadFromEnv()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "GAMEMASTER_INTERNAL_HTTP_ADDR")
+}
+
+func TestLoadFromEnvRequiresLobbyBaseURL(t *testing.T) {
+	t.Setenv("GAMEMASTER_INTERNAL_HTTP_ADDR", ":8097")
+	t.Setenv("GAMEMASTER_POSTGRES_PRIMARY_DSN", "postgres://gm:secret@localhost:5432/galaxy")
+	t.Setenv("GAMEMASTER_REDIS_MASTER_ADDR", "localhost:6379")
+	t.Setenv("GAMEMASTER_REDIS_PASSWORD", "secret")
+	t.Setenv("GAMEMASTER_RTM_INTERNAL_BASE_URL", "http://rtmanager:8096")
+
+	_, err := LoadFromEnv()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "GAMEMASTER_LOBBY_INTERNAL_BASE_URL")
+}
+
+func TestLoadFromEnvRequiresRTMBaseURL(t *testing.T) {
+	t.Setenv("GAMEMASTER_INTERNAL_HTTP_ADDR", ":8097")
+	t.Setenv("GAMEMASTER_POSTGRES_PRIMARY_DSN", "postgres://gm:secret@localhost:5432/galaxy")
+	t.Setenv("GAMEMASTER_REDIS_MASTER_ADDR", "localhost:6379")
+	t.Setenv("GAMEMASTER_REDIS_PASSWORD", "secret")
+	t.Setenv("GAMEMASTER_LOBBY_INTERNAL_BASE_URL", "http://lobby:8095")
+
+	_, err := LoadFromEnv()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "GAMEMASTER_RTM_INTERNAL_BASE_URL")
+}
+
+func TestLoadFromEnvRejectsBadLogLevel(t *testing.T) {
+	validEnv(t)
+	t.Setenv("GAMEMASTER_LOG_LEVEL", "verbose")
+
+	_, err := LoadFromEnv()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "GAMEMASTER_LOG_LEVEL")
+}
+
+func TestLoadFromEnvRejectsBadDuration(t *testing.T) {
+	validEnv(t)
+	t.Setenv("GAMEMASTER_ENGINE_CALL_TIMEOUT", "thirty seconds")
+
+	_, err := LoadFromEnv()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "GAMEMASTER_ENGINE_CALL_TIMEOUT")
+}
+
+func TestInternalHTTPValidateRejectsBadAddr(t *testing.T) {
+	cfg := DefaultConfig().InternalHTTP
+	cfg.Addr = "not-an-addr"
+	err := cfg.Validate()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "host:port")
+}
+
+func TestStreamsValidateRequiresAllNames(t *testing.T) {
+	cfg := DefaultConfig().Streams
+	cfg.LobbyEvents = " "
+	err := cfg.Validate()
+	require.Error(t, err)
+	require.True(t, strings.Contains(err.Error(), "lobby events"))
+}
+
+func TestLobbyClientValidateRejectsBadURL(t *testing.T) {
+	cfg := LobbyClientConfig{BaseURL: "ftp://lobby", Timeout: time.Second}
+	err := cfg.Validate()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "http(s)")
+}
+
+func TestRTMClientValidateRejectsEmptyURL(t *testing.T) {
+	cfg := RTMClientConfig{BaseURL: " ", Timeout: time.Second}
+	err := cfg.Validate()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "rtm internal base url")
+}
+
+func TestSchedulerValidateRejectsZeroInterval(t *testing.T) {
+	cfg := SchedulerConfig{TickInterval: 0, TurnGenerationTimeout: time.Second}
+	err := cfg.Validate()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "scheduler tick interval")
+}
+
+func TestMembershipCacheValidateRejectsZero(t *testing.T) {
+	cfg := MembershipCacheConfig{TTL: 0, MaxGames: 1}
+	err := cfg.Validate()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "ttl")
+
+	cfg = MembershipCacheConfig{TTL: time.Second, MaxGames: 0}
+	err = cfg.Validate()
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "max games")
+}
diff --git a/gamemaster/internal/config/env.go b/gamemaster/internal/config/env.go
new file mode 100644
index 0000000..99c4751
--- /dev/null
+++ b/gamemaster/internal/config/env.go
@@ -0,0 +1,219 @@
+package config
+
+import (
+	"fmt"
+	"os"
+	"strconv"
+	"strings"
+	"time"
+
+	"galaxy/postgres"
+	"galaxy/redisconn"
+)
+
+// LoadFromEnv builds Config from environment variables and validates the
+// resulting configuration.
+func LoadFromEnv() (Config, error) {
+	cfg := DefaultConfig()
+
+	var err error
+
+	cfg.ShutdownTimeout, err = durationEnv(shutdownTimeoutEnvVar, cfg.ShutdownTimeout)
+	if err != nil {
+		return Config{}, err
+	}
+
+	cfg.Logging.Level = stringEnv(logLevelEnvVar, cfg.Logging.Level)
+
+	addr, ok := os.LookupEnv(internalHTTPAddrEnvVar)
+	if !ok || strings.TrimSpace(addr) == "" {
+		return Config{}, fmt.Errorf("%s must be set", internalHTTPAddrEnvVar)
+	}
+	cfg.InternalHTTP.Addr = strings.TrimSpace(addr)
+	cfg.InternalHTTP.ReadHeaderTimeout, err = durationEnv(internalHTTPReadHeaderTimeoutEnvVar, cfg.InternalHTTP.ReadHeaderTimeout)
+	if err != nil {
+		return Config{}, err
+	}
+	cfg.InternalHTTP.ReadTimeout, err = durationEnv(internalHTTPReadTimeoutEnvVar, cfg.InternalHTTP.ReadTimeout)
+	if err != nil {
+		return Config{}, err
+	}
+	cfg.InternalHTTP.WriteTimeout, err = durationEnv(internalHTTPWriteTimeoutEnvVar, cfg.InternalHTTP.WriteTimeout)
+	if err != nil {
+		return Config{}, err
+	}
+	cfg.InternalHTTP.IdleTimeout, err = durationEnv(internalHTTPIdleTimeoutEnvVar, cfg.InternalHTTP.IdleTimeout)
+	if err != nil {
+		return Config{}, err
+	}
+
+	pgConn, err := postgres.LoadFromEnv(envPrefix)
+	if err != nil {
+		return Config{}, err
+	}
+	cfg.Postgres.Conn = pgConn
+
+	redisConn, err := redisconn.LoadFromEnv(envPrefix)
+	if err != nil {
+		return Config{}, err
+	}
+	cfg.Redis.Conn = redisConn
+
+	cfg.Streams.LobbyEvents = stringEnv(lobbyEventsStreamEnvVar, cfg.Streams.LobbyEvents)
+	cfg.Streams.HealthEvents = stringEnv(healthEventsStreamEnvVar, cfg.Streams.HealthEvents)
+	cfg.Streams.NotificationIntents = stringEnv(notificationIntentsStreamEnvVar, cfg.Streams.NotificationIntents)
+	cfg.Streams.BlockTimeout, err = durationEnv(streamBlockTimeoutEnvVar, cfg.Streams.BlockTimeout)
+	if err != nil {
+		return Config{}, err
+	}
+
+	cfg.EngineClient.CallTimeout, err = durationEnv(engineCallTimeoutEnvVar, cfg.EngineClient.CallTimeout)
+	if err != nil {
+		return Config{}, err
+	}
+	cfg.EngineClient.ProbeTimeout, err = durationEnv(engineProbeTimeoutEnvVar, cfg.EngineClient.ProbeTimeout)
+	if err != nil {
+		return Config{}, err
+	}
+
+	lobbyURL, ok := os.LookupEnv(lobbyInternalBaseURLEnvVar)
+	if !ok || strings.TrimSpace(lobbyURL) == "" {
+		return Config{}, fmt.Errorf("%s must be set", lobbyInternalBaseURLEnvVar)
+	}
+	cfg.Lobby.BaseURL = strings.TrimSpace(lobbyURL)
+	cfg.Lobby.Timeout, err = durationEnv(lobbyInternalTimeoutEnvVar, cfg.Lobby.Timeout)
+	if err != nil {
+		return Config{}, err
+	}
+
+	rtmURL, ok := os.LookupEnv(rtmInternalBaseURLEnvVar)
+	if !ok || strings.TrimSpace(rtmURL) == "" {
+		return Config{}, fmt.Errorf("%s must be set", rtmInternalBaseURLEnvVar)
+	}
+	cfg.RTM.BaseURL = strings.TrimSpace(rtmURL)
+	cfg.RTM.Timeout, err = durationEnv(rtmInternalTimeoutEnvVar, cfg.RTM.Timeout)
+	if err != nil {
+		return Config{}, err
+	}
+
+	cfg.Scheduler.TickInterval, err = durationEnv(schedulerTickIntervalEnvVar, cfg.Scheduler.TickInterval)
+	if err != nil {
+		return Config{}, err
+	}
+	cfg.Scheduler.TurnGenerationTimeout, err = durationEnv(turnGenerationTimeoutEnvVar, cfg.Scheduler.TurnGenerationTimeout)
+	if err != nil {
+		return Config{}, err
+	}
+
+	cfg.MembershipCache.TTL, err = durationEnv(membershipCacheTTLEnvVar, cfg.MembershipCache.TTL)
+	if err != nil {
+		return Config{}, err
+	}
+	cfg.MembershipCache.MaxGames, err = intEnv(membershipCacheMaxGamesEnvVar, cfg.MembershipCache.MaxGames)
+	if err != nil {
+		return Config{}, err
+	}
+
+	cfg.Telemetry.ServiceName = stringEnv(otelServiceNameEnvVar, cfg.Telemetry.ServiceName)
+	cfg.Telemetry.TracesExporter = normalizeExporterValue(stringEnv(otelTracesExporterEnvVar, cfg.Telemetry.TracesExporter))
+	cfg.Telemetry.MetricsExporter = normalizeExporterValue(stringEnv(otelMetricsExporterEnvVar, cfg.Telemetry.MetricsExporter))
+	cfg.Telemetry.TracesProtocol = normalizeProtocolValue(
+		os.Getenv(otelExporterOTLPTracesProtocolEnvVar),
+		os.Getenv(otelExporterOTLPProtocolEnvVar),
+		cfg.Telemetry.TracesProtocol,
+	)
+	cfg.Telemetry.MetricsProtocol = normalizeProtocolValue(
+		os.Getenv(otelExporterOTLPMetricsProtocolEnvVar),
+		os.Getenv(otelExporterOTLPProtocolEnvVar),
+		cfg.Telemetry.MetricsProtocol,
+	)
+	cfg.Telemetry.StdoutTracesEnabled, err = boolEnv(otelStdoutTracesEnabledEnvVar, cfg.Telemetry.StdoutTracesEnabled)
+	if err != nil {
+		return Config{}, err
+	}
+	cfg.Telemetry.StdoutMetricsEnabled, err = boolEnv(otelStdoutMetricsEnabledEnvVar, cfg.Telemetry.StdoutMetricsEnabled)
+	if err != nil {
+		return Config{}, err
+	}
+
+	if err := cfg.Validate(); err != nil {
+		return Config{}, err
+	}
+
+	return cfg, nil
+}
+
+func stringEnv(name string, fallback string) string {
+	value, ok := os.LookupEnv(name)
+	if !ok {
+		return fallback
+	}
+
+	return strings.TrimSpace(value)
+}
+
+func durationEnv(name string, fallback time.Duration) (time.Duration, error) {
+	value, ok := os.LookupEnv(name)
+	if !ok {
+		return fallback, nil
+	}
+
+	parsed, err := time.ParseDuration(strings.TrimSpace(value))
+	if err != nil {
+		return 0, fmt.Errorf("%s: parse duration: %w", name, err)
+	}
+
+	return parsed, nil
+}
+
+func intEnv(name string, fallback int) (int, error) {
+	value, ok := os.LookupEnv(name)
+	if !ok {
+		return fallback, nil
+	}
+
+	parsed, err := strconv.Atoi(strings.TrimSpace(value))
+	if err != nil {
+		return 0, fmt.Errorf("%s: parse int: %w", name, err)
+	}
+
+	return parsed, nil
+}
+
+func boolEnv(name string, fallback bool) (bool, error) {
+	value, ok := os.LookupEnv(name)
+	if !ok {
+		return fallback, nil
+	}
+
+	parsed, err := strconv.ParseBool(strings.TrimSpace(value))
+	if err != nil {
+		return false, fmt.Errorf("%s: parse bool: %w", name, err)
+	}
+
+	return parsed, nil
+}
+
+func normalizeExporterValue(value string) string {
+	trimmed := strings.TrimSpace(value)
+	switch trimmed {
+	case "", "none":
+		return "none"
+	default:
+		return trimmed
+	}
+}
+
+func normalizeProtocolValue(primary string, fallback string, defaultValue string) string {
+	primary = strings.TrimSpace(primary)
+	if primary != "" {
+		return primary
+	}
+
+	fallback = strings.TrimSpace(fallback)
+	if fallback != "" {
+		return fallback
+	}
+
+	return strings.TrimSpace(defaultValue)
+}
diff --git a/gamemaster/internal/config/validation.go b/gamemaster/internal/config/validation.go
new file mode 100644
index 0000000..cc87671
--- /dev/null
+++ b/gamemaster/internal/config/validation.go
@@ -0,0 +1,90 @@
+package config
+
+import (
+	"fmt"
+	"log/slog"
+	"net"
+	"net/url"
+	"strings"
+)
+
+// Validate reports whether cfg stores a usable Game Master process
+// configuration.
+func (cfg Config) Validate() error {
+	if cfg.ShutdownTimeout <= 0 {
+		return fmt.Errorf("%s must be positive", shutdownTimeoutEnvVar)
+	}
+	if err := validateSlogLevel(cfg.Logging.Level); err != nil {
+		return fmt.Errorf("%s: %w", logLevelEnvVar, err)
+	}
+	if err := cfg.InternalHTTP.Validate(); err != nil {
+		return err
+	}
+	if err := cfg.Postgres.Validate(); err != nil {
+		return err
+	}
+	if err := cfg.Redis.Validate(); err != nil {
+		return err
+	}
+	if err := cfg.Streams.Validate(); err != nil {
+		return err
+	}
+	if err := cfg.EngineClient.Validate(); err != nil {
+		return err
+	}
+	if err := cfg.Lobby.Validate(); err != nil {
+		return err
+	}
+	if err := cfg.RTM.Validate(); err != nil {
+		return err
+	}
+	if err := cfg.Scheduler.Validate(); err != nil {
+		return err
+	}
+	if err := cfg.MembershipCache.Validate(); err != nil {
+		return err
+	}
+	if err := cfg.Telemetry.Validate(); err != nil {
+		return err
+	}
+
+	return nil
+}
+
+func validateSlogLevel(level string) error {
+	var slogLevel slog.Level
+	if err := slogLevel.UnmarshalText([]byte(strings.TrimSpace(level))); err != nil {
+		return fmt.Errorf("invalid slog level %q: %w", level, err)
+	}
+
+	return nil
+}
+
+func isTCPAddr(value string) bool {
+	host, port, err := net.SplitHostPort(strings.TrimSpace(value))
+	if err != nil {
+		return false
+	}
+
+	if port == "" {
+		return false
+	}
+	if host == "" {
+		return true
+	}
+
+	return !strings.Contains(host, " ")
+}
+
+func isHTTPURL(value string) bool {
+	parsed, err := url.Parse(strings.TrimSpace(value))
+	if err != nil {
+		return false
+	}
+
+	if parsed.Scheme != "http" && parsed.Scheme != "https" {
+		return false
+	}
+
+	return parsed.Host != ""
+}
diff --git a/gamemaster/internal/domain/engineversion/model.go b/gamemaster/internal/domain/engineversion/model.go
new file mode 100644
index 0000000..70e7e01
--- /dev/null
+++ b/gamemaster/internal/domain/engineversion/model.go
@@ -0,0 +1,121 @@
+// Package engineversion defines the engine version registry domain
+// model owned by Game Master.
+//
+// The registry mirrors the durable shape of the `engine_versions`
+// PostgreSQL table (see
+// `galaxy/gamemaster/internal/adapters/postgres/migrations/00001_init.sql`)
+// and the user-visible status enum frozen in
+// `galaxy/gamemaster/api/internal-openapi.yaml`.
+//
+// `Options` is intentionally kept opaque ([]byte holding raw JSON) so
+// the v1 service does not impose a Go-side schema on the engine-owned
+// document. Schema-aware handling lands when an engine version actually
+// requires it; until then the registry is a pass-through store.
+package engineversion
+
+import (
+	"errors"
+	"fmt"
+	"strings"
+	"time"
+)
+
+// Status identifies one engine-version registry state.
+type Status string
+
+const (
+	// StatusActive marks a version as deployable. Lobby's start flow
+	// resolves image refs only against active versions.
+	StatusActive Status = "active"
+
+	// StatusDeprecated marks a version as no longer offered for new
+	// starts. Already-running games on a deprecated version are
+	// unaffected; the runtime stays bound to the version it started on.
+	StatusDeprecated Status = "deprecated"
+)
+
+// IsKnown reports whether status belongs to the frozen engine-version
+// status vocabulary.
+func (status Status) IsKnown() bool {
+	switch status {
+	case StatusActive, StatusDeprecated:
+		return true
+	default:
+		return false
+	}
+}
+
+// AllStatuses returns the frozen list of every engine-version status
+// value. The slice order is stable across calls.
+func AllStatuses() []Status {
+	return []Status{StatusActive, StatusDeprecated}
+}
+
+// EngineVersion stores one row of the `engine_versions` registry table.
+// Options carries the raw `jsonb` document verbatim so the registry
+// stays decoupled from any engine-side schema.
+type EngineVersion struct {
+	// Version stores the canonical semver string (primary key).
+	Version string
+
+	// ImageRef stores the Docker reference of the engine image.
+	ImageRef string
+
+	// Options stores the engine-side options document as raw JSON. Empty
+	// is treated as `{}` by adapters that hydrate the column.
+	Options []byte
+
+	// Status reports whether the version is deployable (`active`) or
+	// no longer offered for new starts (`deprecated`).
+	Status Status
+
+	// CreatedAt stores the wall-clock at which the row was created.
+	CreatedAt time.Time
+
+	// UpdatedAt stores the wall-clock of the most recent mutation.
+	UpdatedAt time.Time
+}
+
+// Validate reports whether record satisfies the engine-version
+// invariants implied by `engine_versions_status_chk` and the README
+// §Engine Version Registry surface.
+func (record EngineVersion) Validate() error {
+	if strings.TrimSpace(record.Version) == "" {
+		return fmt.Errorf("version must not be empty")
+	}
+	if strings.TrimSpace(record.ImageRef) == "" {
+		return fmt.Errorf("image ref must not be empty")
+	}
+	if !record.Status.IsKnown() {
+		return fmt.Errorf("status %q is unsupported", record.Status)
+	}
+	if record.CreatedAt.IsZero() {
+		return fmt.Errorf("created at must not be zero")
+	}
+	if record.UpdatedAt.IsZero() {
+		return fmt.Errorf("updated at must not be zero")
+	}
+	if record.UpdatedAt.Before(record.CreatedAt) {
+		return fmt.Errorf("updated at must not be before created at")
+	}
+	return nil
+}
+
+// ErrNotFound reports that an engine-version lookup failed because no
+// matching row exists.
+var ErrNotFound = errors.New("engine version not found")
+
+// ErrInUse reports that a hard-delete or deprecate operation was
+// rejected because the version is still referenced by a non-finished
+// runtime record.
+var ErrInUse = errors.New("engine version in use")
+
+// ErrConflict reports that an engine-version mutation could not be
+// applied because a row with the same primary key already exists.
+// Adapters surface a PostgreSQL unique-violation through this sentinel
+// so the service layer maps it to a `conflict` REST envelope.
+var ErrConflict = errors.New("engine version already exists")
+
+// ErrInvalidSemver reports that a semver string did not parse against
+// `golang.org/x/mod/semver`'s grammar.
+var ErrInvalidSemver = errors.New("invalid semver")
diff --git a/gamemaster/internal/domain/engineversion/model_test.go b/gamemaster/internal/domain/engineversion/model_test.go
new file mode 100644
index 0000000..60ff71e
--- /dev/null
+++ b/gamemaster/internal/domain/engineversion/model_test.go
@@ -0,0 +1,63 @@
+package engineversion
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func validVersion() EngineVersion {
+	created := time.Date(2026, 4, 27, 12, 0, 0, 0, time.UTC)
+	return EngineVersion{
+		Version:   "v1.2.3",
+		ImageRef:  "ghcr.io/galaxy/game:v1.2.3",
+		Options:   []byte(`{"max_planets":120}`),
+		Status:    StatusActive,
+		CreatedAt: created,
+		UpdatedAt: created,
+	}
+}
+
+func TestStatusIsKnown(t *testing.T) {
+	for _, status := range AllStatuses() {
+		assert.True(t, status.IsKnown(), "want known: %q", status)
+	}
+	assert.False(t, Status("retired").IsKnown())
+	assert.False(t, Status("").IsKnown())
+}
+
+func TestEngineVersionValidateHappy(t *testing.T) {
+	require.NoError(t, validVersion().Validate())
+}
+
+func TestEngineVersionValidateAcceptsEmptyOptions(t *testing.T) {
+	record := validVersion()
+	record.Options = nil
+	assert.NoError(t, record.Validate())
+}
+
+func TestEngineVersionValidateRejects(t *testing.T) {
+	tests := []struct {
+		name   string
+		mutate func(*EngineVersion)
+	}{
+		{"empty version", func(v *EngineVersion) { v.Version = "" }},
+		{"empty image ref", func(v *EngineVersion) { v.ImageRef = "" }},
+		{"unknown status", func(v *EngineVersion) { v.Status = "exotic" }},
+		{"zero created at", func(v *EngineVersion) { v.CreatedAt = time.Time{} }},
+		{"zero updated at", func(v *EngineVersion) { v.UpdatedAt = time.Time{} }},
+		{"updated before created", func(v *EngineVersion) {
+			v.UpdatedAt = v.CreatedAt.Add(-time.Minute)
+		}},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			record := validVersion()
+			tt.mutate(&record)
+			assert.Error(t, record.Validate())
+		})
+	}
+}
diff --git a/gamemaster/internal/domain/engineversion/semver.go b/gamemaster/internal/domain/engineversion/semver.go
new file mode 100644
index 0000000..1f6da27
--- /dev/null
+++ b/gamemaster/internal/domain/engineversion/semver.go
@@ -0,0 +1,60 @@
+package engineversion
+
+import (
+	"fmt"
+	"strings"
+
+	"golang.org/x/mod/semver"
+)
+
+// ParseSemver normalises version into the canonical "vMAJOR.MINOR.PATCH"
+// form expected by `golang.org/x/mod/semver` and reports a wrapped
+// ErrInvalidSemver when the resulting string is not a valid full semver.
+//
+// Whitespace is trimmed; a missing leading "v" is added before the
+// validity check so callers may pass either "1.2.3" or "v1.2.3". The
+// stripped base must carry exactly three dot-separated numeric
+// components — `golang.org/x/mod/semver` accepts shortened forms such
+// as "v1" or "v1.2", but the engine-version registry requires the full
+// triple, so this function rejects anything narrower.
+func ParseSemver(version string) (string, error) {
+	candidate := strings.TrimSpace(version)
+	if candidate == "" {
+		return "", fmt.Errorf("%w: empty", ErrInvalidSemver)
+	}
+	if !strings.HasPrefix(candidate, "v") {
+		candidate = "v" + candidate
+	}
+	if !semver.IsValid(candidate) {
+		return "", fmt.Errorf("%w: %q", ErrInvalidSemver, version)
+	}
+
+	base := candidate
+	if i := strings.IndexAny(base, "-+"); i >= 0 {
+		base = base[:i]
+	}
+	if strings.Count(base, ".") != 2 {
+		return "", fmt.Errorf(
+			"%w: %q (need vMAJOR.MINOR.PATCH)",
+			ErrInvalidSemver, version,
+		)
+	}
+	return candidate, nil
+}
+
+// IsPatchUpgrade reports whether next is a same-major.minor upgrade of
+// current. Both inputs are parsed through ParseSemver so callers may
+// pass either bare or `v`-prefixed forms. A wrapped ErrInvalidSemver is
+// returned when either argument fails to parse; the boolean result is
+// undefined in that case.
+func IsPatchUpgrade(current, next string) (bool, error) {
+	curr, err := ParseSemver(current)
+	if err != nil {
+		return false, fmt.Errorf("current: %w", err)
+	}
+	nxt, err := ParseSemver(next)
+	if err != nil {
+		return false, fmt.Errorf("next: %w", err)
+	}
+	return semver.MajorMinor(curr) == semver.MajorMinor(nxt), nil
+}
diff --git a/gamemaster/internal/domain/engineversion/semver_test.go b/gamemaster/internal/domain/engineversion/semver_test.go
new file mode 100644
index 0000000..7c56fb5
--- /dev/null
+++ b/gamemaster/internal/domain/engineversion/semver_test.go
@@ -0,0 +1,85 @@
+package engineversion
+
+import (
+	"errors"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestParseSemverNormalises(t *testing.T) {
+	tests := []struct {
+		input string
+		want  string
+	}{
+		{"1.2.3", "v1.2.3"},
+		{"v1.2.3", "v1.2.3"},
+		{" v0.4.0 ", "v0.4.0"},
+		{"v2.0.0-rc.1", "v2.0.0-rc.1"},
+		{"v2.0.0+build.7", "v2.0.0+build.7"},
+	}
+	for _, tt := range tests {
+		t.Run(tt.input, func(t *testing.T) {
+			got, err := ParseSemver(tt.input)
+			require.NoError(t, err)
+			assert.Equal(t, tt.want, got)
+		})
+	}
+}
+
+func TestParseSemverRejects(t *testing.T) {
+	tests := []string{
+		"",
+		" ",
+		"latest",
+		"1",
+		"1.2",
+		"v1.2",
+		"1.2.3.4",
+		"v1.2.x",
+	}
+	for _, input := range tests {
+		t.Run(input, func(t *testing.T) {
+			_, err := ParseSemver(input)
+			require.Error(t, err)
+			assert.True(t, errors.Is(err, ErrInvalidSemver))
+		})
+	}
+}
+
+func TestIsPatchUpgrade(t *testing.T) {
+	tests := []struct {
+		name    string
+		current string
+		next    string
+		want    bool
+	}{
+		{"same patch", "v1.2.3", "v1.2.3", true},
+		{"patch bump", "v1.2.3", "v1.2.4", true},
+		{"patch downgrade", "1.2.4", "1.2.0", true},
+		{"prerelease patch", "v1.2.3", "v1.2.3-rc.1", true},
+		{"minor bump", "v1.2.3", "v1.3.0", false},
+		{"minor downgrade", "v1.2.3", "v1.1.9", false},
+		{"major bump", "v1.2.3", "v2.0.0", false},
+		{"major downgrade", "v2.0.0", "v1.9.9", false},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got, err := IsPatchUpgrade(tt.current, tt.next)
+			require.NoError(t, err)
+			assert.Equal(t, tt.want, got)
+		})
+	}
+}
+
+func TestIsPatchUpgradeRejectsBadInputs(t *testing.T) {
+	_, err := IsPatchUpgrade("garbage", "v1.2.3")
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, ErrInvalidSemver))
+
+	_, err = IsPatchUpgrade("v1.2.3", "")
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, ErrInvalidSemver))
+}
diff --git a/gamemaster/internal/domain/operation/log.go b/gamemaster/internal/domain/operation/log.go
new file mode 100644
index 0000000..bcad41f
--- /dev/null
+++ b/gamemaster/internal/domain/operation/log.go
@@ -0,0 +1,244 @@
+// Package operation defines the runtime-operation audit-log domain
+// types owned by Game Master.
+//
+// One OperationEntry maps to one row of the `operation_log` PostgreSQL
+// table (see
+// `galaxy/gamemaster/internal/adapters/postgres/migrations/00001_init.sql`).
+// The OpKind / OpSource / Outcome enums match the SQL CHECK constraints
+// verbatim and feed the telemetry counters declared in
+// `galaxy/gamemaster/README.md §Observability`.
+package operation
+
+import (
+	"fmt"
+	"strings"
+	"time"
+)
+
+// OpKind identifies the kind of operation Game Master performed.
+type OpKind string
+
+const (
+	// OpKindRegisterRuntime records a register-runtime operation
+	// (engine init plus first transition to running).
+	OpKindRegisterRuntime OpKind = "register_runtime"
+
+	// OpKindTurnGeneration records a turn-generation operation
+	// (scheduler ticker or admin force).
+	OpKindTurnGeneration OpKind = "turn_generation"
+
+	// OpKindForceNextTurn records the admin force-next-turn driver
+	// (separate from the turn-generation entry it produces, so audit
+	// callers can tell scheduler ticks from manual ones).
+	OpKindForceNextTurn OpKind = "force_next_turn"
+
+	// OpKindBanish records a /admin/race/banish call against the
+	// engine container.
+	OpKindBanish OpKind = "banish"
+
+	// OpKindStop records the admin stop driver (the underlying RTM
+	// stop call is recorded in Runtime Manager's own operation log).
+	OpKindStop OpKind = "stop"
+
+	// OpKindPatch records the admin patch driver.
+	OpKindPatch OpKind = "patch"
+
+	// OpKindEngineVersionCreate records a registry CREATE.
+	OpKindEngineVersionCreate OpKind = "engine_version_create"
+
+	// OpKindEngineVersionUpdate records a registry PATCH.
+	OpKindEngineVersionUpdate OpKind = "engine_version_update"
+
+	// OpKindEngineVersionDeprecate records a registry DELETE / soft
+	// deprecate.
+	OpKindEngineVersionDeprecate OpKind = "engine_version_deprecate"
+
+	// OpKindEngineVersionDelete records a registry hard delete: the
+	// row is removed from `engine_versions` after the service layer
+	// confirms no non-finished runtime still references it.
+	OpKindEngineVersionDelete OpKind = "engine_version_delete"
+)
+
+// IsKnown reports whether kind belongs to the frozen op-kind vocabulary.
+func (kind OpKind) IsKnown() bool {
+	switch kind {
+	case OpKindRegisterRuntime,
+		OpKindTurnGeneration,
+		OpKindForceNextTurn,
+		OpKindBanish,
+		OpKindStop,
+		OpKindPatch,
+		OpKindEngineVersionCreate,
+		OpKindEngineVersionUpdate,
+		OpKindEngineVersionDeprecate,
+		OpKindEngineVersionDelete:
+		return true
+	default:
+		return false
+	}
+}
+
+// AllOpKinds returns the frozen list of every op-kind value. The slice
+// order is stable across calls.
+func AllOpKinds() []OpKind {
+	return []OpKind{
+		OpKindRegisterRuntime,
+		OpKindTurnGeneration,
+		OpKindForceNextTurn,
+		OpKindBanish,
+		OpKindStop,
+		OpKindPatch,
+		OpKindEngineVersionCreate,
+		OpKindEngineVersionUpdate,
+		OpKindEngineVersionDeprecate,
+		OpKindEngineVersionDelete,
+	}
+}
+
+// OpSource identifies where one operation entered Game Master.
+type OpSource string
+
+const (
+	// OpSourceGatewayPlayer identifies entries triggered by a verified
+	// player command, order, or report read forwarded through Edge
+	// Gateway.
+	OpSourceGatewayPlayer OpSource = "gateway_player"
+
+	// OpSourceLobbyInternal identifies entries triggered by Game Lobby
+	// over the trusted internal REST surface (register-runtime,
+	// memberships invalidate, banish, liveness).
+	OpSourceLobbyInternal OpSource = "lobby_internal"
+
+	// OpSourceAdminRest identifies entries triggered by Admin Service
+	// (or system administrators today). The default when the
+	// `X-Galaxy-Caller` header is missing or unrecognised.
+	OpSourceAdminRest OpSource = "admin_rest"
+)
+
+// IsKnown reports whether source belongs to the frozen op-source
+// vocabulary.
+func (source OpSource) IsKnown() bool {
+	switch source {
+	case OpSourceGatewayPlayer,
+		OpSourceLobbyInternal,
+		OpSourceAdminRest:
+		return true
+	default:
+		return false
+	}
+}
+
+// AllOpSources returns the frozen list of every op-source value. The
+// slice order is stable across calls.
+func AllOpSources() []OpSource {
+	return []OpSource{
+		OpSourceGatewayPlayer,
+		OpSourceLobbyInternal,
+		OpSourceAdminRest,
+	}
+}
+
+// Outcome reports the high-level outcome of one operation.
+type Outcome string
+
+const (
+	// OutcomeSuccess reports that the operation completed without
+	// surfacing an error.
+	OutcomeSuccess Outcome = "success"
+
+	// OutcomeFailure reports that the operation surfaced a stable
+	// error code recorded in OperationEntry.ErrorCode.
+	OutcomeFailure Outcome = "failure"
+)
+
+// IsKnown reports whether outcome belongs to the frozen outcome
+// vocabulary.
+func (outcome Outcome) IsKnown() bool {
+	switch outcome {
+	case OutcomeSuccess, OutcomeFailure:
+		return true
+	default:
+		return false
+	}
+}
+
+// AllOutcomes returns the frozen list of every outcome value. The slice
+// order is stable across calls.
+func AllOutcomes() []Outcome {
+	return []Outcome{OutcomeSuccess, OutcomeFailure}
+}
+
+// OperationEntry stores one append-only audit row of the `operation_log`
+// table. ID is zero on records that have not been persisted yet; the
+// store assigns it from the table's bigserial column. FinishedAt is a
+// pointer because the column is nullable for in-flight rows even though
+// the service layer finalises the row in the same transaction.
+type OperationEntry struct {
+	// ID identifies the persisted row. Zero before persistence.
+	ID int64
+
+	// GameID identifies the platform game this operation acted on.
+	GameID string
+
+	// OpKind classifies what the operation did.
+	OpKind OpKind
+
+	// OpSource classifies how the operation entered Game Master.
+	OpSource OpSource
+
+	// SourceRef stores an opaque per-source reference such as a request
+	// id, a Redis Stream entry id, or an admin user id. Empty when the
+	// source does not provide one.
+	SourceRef string
+
+	// Outcome reports whether the operation succeeded or failed.
+	Outcome Outcome
+
+	// ErrorCode stores the stable error code on failure. Empty on
+	// success.
+	ErrorCode string
+
+	// ErrorMessage stores the operator-readable detail on failure.
+	// Empty on success.
+	ErrorMessage string
+
+	// StartedAt stores the wall-clock at which the operation began.
+	StartedAt time.Time
+
+	// FinishedAt stores the wall-clock at which the operation
+	// finalised. Nil for in-flight rows.
+	FinishedAt *time.Time
+}
+
+// Validate reports whether entry satisfies the operation-log invariants
+// implied by the SQL CHECK constraints and the README §Persistence
+// Layout listing.
+func (entry OperationEntry) Validate() error {
+	if strings.TrimSpace(entry.GameID) == "" {
+		return fmt.Errorf("game id must not be empty")
+	}
+	if !entry.OpKind.IsKnown() {
+		return fmt.Errorf("op kind %q is unsupported", entry.OpKind)
+	}
+	if !entry.OpSource.IsKnown() {
+		return fmt.Errorf("op source %q is unsupported", entry.OpSource)
+	}
+	if !entry.Outcome.IsKnown() {
+		return fmt.Errorf("outcome %q is unsupported", entry.Outcome)
+	}
+	if entry.StartedAt.IsZero() {
+		return fmt.Errorf("started at must not be zero")
+	}
+	if entry.FinishedAt != nil {
+		if entry.FinishedAt.IsZero() {
+			return fmt.Errorf("finished at must not be zero when present")
+		}
+		if entry.FinishedAt.Before(entry.StartedAt) {
+			return fmt.Errorf("finished at must not be before started at")
+		}
+	}
+	if entry.Outcome == OutcomeFailure && strings.TrimSpace(entry.ErrorCode) == "" {
+		return fmt.Errorf("error code must not be empty for failure entries")
+	}
+	return nil
+}
diff --git a/gamemaster/internal/domain/operation/log_test.go b/gamemaster/internal/domain/operation/log_test.go
new file mode 100644
index 0000000..15d83f4
--- /dev/null
+++ b/gamemaster/internal/domain/operation/log_test.go
@@ -0,0 +1,100 @@
+package operation
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func validSuccessEntry() OperationEntry {
+	started := time.Date(2026, 4, 27, 12, 0, 0, 0, time.UTC)
+	finished := started.Add(time.Second)
+	return OperationEntry{
+		GameID:     "game-1",
+		OpKind:     OpKindRegisterRuntime,
+		OpSource:   OpSourceLobbyInternal,
+		Outcome:    OutcomeSuccess,
+		StartedAt:  started,
+		FinishedAt: &finished,
+	}
+}
+
+func validFailureEntry() OperationEntry {
+	entry := validSuccessEntry()
+	entry.Outcome = OutcomeFailure
+	entry.ErrorCode = "engine_unreachable"
+	entry.ErrorMessage = "engine returned 502"
+	return entry
+}
+
+func TestOpKindIsKnown(t *testing.T) {
+	for _, kind := range AllOpKinds() {
+		assert.True(t, kind.IsKnown(), "want known: %q", kind)
+	}
+	assert.False(t, OpKind("exotic").IsKnown())
+	assert.Len(t, AllOpKinds(), 10)
+}
+
+func TestOpSourceIsKnown(t *testing.T) {
+	for _, src := range AllOpSources() {
+		assert.True(t, src.IsKnown(), "want known: %q", src)
+	}
+	assert.False(t, OpSource("exotic").IsKnown())
+	assert.Len(t, AllOpSources(), 3)
+}
+
+func TestOutcomeIsKnown(t *testing.T) {
+	for _, outcome := range AllOutcomes() {
+		assert.True(t, outcome.IsKnown(), "want known: %q", outcome)
+	}
+	assert.False(t, Outcome("exotic").IsKnown())
+	assert.Len(t, AllOutcomes(), 2)
+}
+
+func TestOperationEntryValidateHappy(t *testing.T) {
+	require.NoError(t, validSuccessEntry().Validate())
+	require.NoError(t, validFailureEntry().Validate())
+}
+
+func TestOperationEntryValidateAcceptsInFlight(t *testing.T) {
+	entry := validSuccessEntry()
+	entry.FinishedAt = nil
+	assert.NoError(t, entry.Validate())
+}
+
+func TestOperationEntryValidateRejects(t *testing.T) {
+	tests := []struct {
+		name   string
+		mutate func(*OperationEntry)
+	}{
+		{"empty game id", func(e *OperationEntry) { e.GameID = "" }},
+		{"unknown op kind", func(e *OperationEntry) { e.OpKind = "exotic" }},
+		{"unknown op source", func(e *OperationEntry) { e.OpSource = "exotic" }},
+		{"unknown outcome", func(e *OperationEntry) { e.Outcome = "exotic" }},
+		{"zero started at", func(e *OperationEntry) { e.StartedAt = time.Time{} }},
+		{"zero finished at when present", func(e *OperationEntry) {
+			zero := time.Time{}
+			e.FinishedAt = &zero
+		}},
+		{"finished before started", func(e *OperationEntry) {
+			before := e.StartedAt.Add(-time.Second)
+			e.FinishedAt = &before
+		}},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			entry := validSuccessEntry()
+			tt.mutate(&entry)
+			assert.Error(t, entry.Validate())
+		})
+	}
+}
+
+func TestOperationEntryValidateRejectsFailureWithoutCode(t *testing.T) {
+	entry := validFailureEntry()
+	entry.ErrorCode = ""
+	assert.Error(t, entry.Validate())
+}
diff --git a/gamemaster/internal/domain/playermapping/model.go b/gamemaster/internal/domain/playermapping/model.go
new file mode 100644
index 0000000..c6d4ebc
--- /dev/null
+++ b/gamemaster/internal/domain/playermapping/model.go
@@ -0,0 +1,71 @@
+// Package playermapping defines the durable mapping between platform
+// users and engine player handles owned by Game Master.
+//
+// One PlayerMapping mirrors one row of the `player_mappings` PostgreSQL
+// table (see
+// `galaxy/gamemaster/internal/adapters/postgres/migrations/00001_init.sql`).
+// The composite primary key `(game_id, user_id)` and the unique
+// `(game_id, race_name)` index live in the SQL schema; the domain model
+// captures the per-row invariants enforced from the application side.
+package playermapping
+
+import (
+	"errors"
+	"fmt"
+	"strings"
+	"time"
+)
+
+// PlayerMapping stores one (game_id, user_id) → (race_name,
+// engine_player_uuid) projection installed at register-runtime.
+type PlayerMapping struct {
+	// GameID identifies the game owning this mapping.
+	GameID string
+
+	// UserID identifies the platform user this mapping refers to.
+	UserID string
+
+	// RaceName stores the in-game race name reserved for the user in
+	// the original casing presented by the engine.
+	RaceName string
+
+	// EnginePlayerUUID stores the engine-side player handle returned by
+	// the engine /admin/init response.
+	EnginePlayerUUID string
+
+	// CreatedAt stores the wall-clock at which the row was inserted.
+	CreatedAt time.Time
+}
+
+// Validate reports whether mapping satisfies the player-mapping
+// invariants implied by the README §Persistence Layout / player_mappings
+// columns and the SQL primary-key + unique-index constraints.
+func (mapping PlayerMapping) Validate() error {
+	if strings.TrimSpace(mapping.GameID) == "" {
+		return fmt.Errorf("game id must not be empty")
+	}
+	if strings.TrimSpace(mapping.UserID) == "" {
+		return fmt.Errorf("user id must not be empty")
+	}
+	if strings.TrimSpace(mapping.RaceName) == "" {
+		return fmt.Errorf("race name must not be empty")
+	}
+	if strings.TrimSpace(mapping.EnginePlayerUUID) == "" {
+		return fmt.Errorf("engine player uuid must not be empty")
+	}
+	if mapping.CreatedAt.IsZero() {
+		return fmt.Errorf("created at must not be zero")
+	}
+	return nil
+}
+
+// ErrNotFound reports that a player-mapping lookup failed because no
+// matching row exists.
+var ErrNotFound = errors.New("player mapping not found")
+
+// ErrConflict reports that a player-mapping insert could not be applied
+// because a row with the same `(game_id, user_id)` primary key or with
+// the same `(game_id, race_name)` unique pair already exists. Adapters
+// surface PostgreSQL unique-violations through this sentinel so the
+// service layer maps it to a `conflict` REST envelope.
+var ErrConflict = errors.New("player mapping already exists")
diff --git a/gamemaster/internal/domain/playermapping/model_test.go b/gamemaster/internal/domain/playermapping/model_test.go
new file mode 100644
index 0000000..a6cc88d
--- /dev/null
+++ b/gamemaster/internal/domain/playermapping/model_test.go
@@ -0,0 +1,44 @@
+package playermapping
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func validMapping() PlayerMapping {
+	return PlayerMapping{
+		GameID:           "game-1",
+		UserID:           "user-1",
+		RaceName:         "Aelinari",
+		EnginePlayerUUID: "00000000-0000-0000-0000-000000000001",
+		CreatedAt:        time.Date(2026, 4, 27, 12, 0, 0, 0, time.UTC),
+	}
+}
+
+func TestPlayerMappingValidateHappy(t *testing.T) {
+	require.NoError(t, validMapping().Validate())
+}
+
+func TestPlayerMappingValidateRejects(t *testing.T) {
+	tests := []struct {
+		name   string
+		mutate func(*PlayerMapping)
+	}{
+		{"empty game id", func(m *PlayerMapping) { m.GameID = "" }},
+		{"empty user id", func(m *PlayerMapping) { m.UserID = "" }},
+		{"empty race name", func(m *PlayerMapping) { m.RaceName = "" }},
+		{"empty engine uuid", func(m *PlayerMapping) { m.EnginePlayerUUID = "" }},
+		{"zero created at", func(m *PlayerMapping) { m.CreatedAt = time.Time{} }},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			mapping := validMapping()
+			tt.mutate(&mapping)
+			assert.Error(t, mapping.Validate())
+		})
+	}
+}
diff --git a/gamemaster/internal/domain/runtime/errors.go b/gamemaster/internal/domain/runtime/errors.go
new file mode 100644
index 0000000..522e1f1
--- /dev/null
+++ b/gamemaster/internal/domain/runtime/errors.go
@@ -0,0 +1,43 @@
+package runtime
+
+import (
+	"errors"
+	"fmt"
+)
+
+// ErrNotFound reports that a runtime record was requested but does not
+// exist in the store.
+var ErrNotFound = errors.New("runtime record not found")
+
+// ErrConflict reports that a runtime mutation could not be applied
+// because the record changed concurrently or failed a compare-and-swap
+// guard.
+var ErrConflict = errors.New("runtime record conflict")
+
+// ErrInvalidTransition is the sentinel returned when Transition rejects
+// a `(from, to)` pair.
+var ErrInvalidTransition = errors.New("invalid runtime status transition")
+
+// InvalidTransitionError stores the rejected `(from, to)` pair and wraps
+// ErrInvalidTransition so callers can match it with errors.Is.
+type InvalidTransitionError struct {
+	// From stores the source status that was attempted to leave.
+	From Status
+
+	// To stores the destination status that was attempted to enter.
+	To Status
+}
+
+// Error reports a human-readable summary of the rejected pair.
+func (err *InvalidTransitionError) Error() string {
+	return fmt.Sprintf(
+		"invalid runtime status transition from %q to %q",
+		err.From, err.To,
+	)
+}
+
+// Unwrap returns ErrInvalidTransition so errors.Is recognizes the
+// sentinel.
+func (err *InvalidTransitionError) Unwrap() error {
+	return ErrInvalidTransition
+}
diff --git a/gamemaster/internal/domain/runtime/model.go b/gamemaster/internal/domain/runtime/model.go
new file mode 100644
index 0000000..aa5c046
--- /dev/null
+++ b/gamemaster/internal/domain/runtime/model.go
@@ -0,0 +1,254 @@
+// Package runtime defines the runtime-record domain model, status
+// machine, and sentinel errors owned by Game Master.
+//
+// The package mirrors the durable shape of the `runtime_records`
+// PostgreSQL table (see
+// `galaxy/gamemaster/internal/adapters/postgres/migrations/00001_init.sql`).
+// Every status / transition / required-field rule already documented in
+// `galaxy/gamemaster/README.md` lives here as code so adapter and service
+// layers do not re-derive it.
+package runtime + +import ( + "fmt" + "strings" + "time" +) + +// Status identifies one runtime-record lifecycle state. +type Status string + +const ( + // StatusStarting reports that register-runtime has persisted the row + // but the engine /admin/init call has not yet succeeded. + StatusStarting Status = "starting" + + // StatusRunning reports that the runtime is healthy and accepting + // player commands and turn generation. + StatusRunning Status = "running" + + // StatusGenerationInProgress reports that the scheduler or admin + // force-next-turn flow has CAS'd the row to drive turn generation. + StatusGenerationInProgress Status = "generation_in_progress" + + // StatusGenerationFailed reports that turn generation surfaced an + // engine error and the runtime is awaiting manual recovery. + StatusGenerationFailed Status = "generation_failed" + + // StatusStopped reports that an admin stop has completed; the row + // stays in PostgreSQL for audit. + StatusStopped Status = "stopped" + + // StatusEngineUnreachable reports that runtime:health_events observed + // an engine container failure (exited, OOM, disappeared, or repeated + // probe failures). + StatusEngineUnreachable Status = "engine_unreachable" + + // StatusFinished reports that the engine returned `finished:true` on + // a turn-generation response. The state is terminal: the row stays + // here indefinitely; operator cleanup is the only path out. + StatusFinished Status = "finished" +) + +// IsKnown reports whether status belongs to the frozen runtime status +// vocabulary. +func (status Status) IsKnown() bool { + switch status { + case StatusStarting, + StatusRunning, + StatusGenerationInProgress, + StatusGenerationFailed, + StatusStopped, + StatusEngineUnreachable, + StatusFinished: + return true + default: + return false + } +} + +// IsTerminal reports whether status can no longer accept lifecycle +// transitions. 
Per `gamemaster/README.md §Game Master status model`, only
+// `finished` is terminal; `stopped` may still be observed as an
+// end-state and is treated as non-terminal for admin replay purposes:
+// no transitions out of it are wired in v1, so Transition rejects them
+// today, while the architecture leaves room to add them in a later
+// iteration.
+func (status Status) IsTerminal() bool {
+	return status == StatusFinished
+}
+
+// AllStatuses returns the frozen list of every runtime status value. The
+// slice order is stable across calls and matches the README §Persistence
+// Layout listing.
+func AllStatuses() []Status {
+	return []Status{
+		StatusStarting,
+		StatusRunning,
+		StatusGenerationInProgress,
+		StatusGenerationFailed,
+		StatusStopped,
+		StatusEngineUnreachable,
+		StatusFinished,
+	}
+}
+
+// RuntimeRecord stores one durable runtime record owned by Game Master.
+// It mirrors one row of the `runtime_records` table.
+//
+// NextGenerationAt is *time.Time so a missing tick (e.g., a row that has
+// just entered with status=starting) is unambiguous. StartedAt, StoppedAt,
+// and FinishedAt are *time.Time for the same reason and align with the
+// jet-generated model.
+type RuntimeRecord struct {
+	// GameID identifies the platform game owning this runtime record.
+	GameID string
+
+	// Status stores the current lifecycle state.
+	Status Status
+
+	// EngineEndpoint stores the stable URL Game Master uses to reach the
+	// engine container, in `http://galaxy-game-{game_id}:8080` form.
+	EngineEndpoint string
+
+	// CurrentImageRef stores the Docker reference of the running engine
+	// image (or the most recent one for stopped/finished records).
+	CurrentImageRef string
+
+	// CurrentEngineVersion stores the semver of the currently-bound
+	// engine version (registered in `engine_versions`).
+	CurrentEngineVersion string
+
+	// TurnSchedule stores the five-field cron expression governing turn
+	// generation, copied from the platform game record at
+	// register-runtime time.
+ TurnSchedule string + + // CurrentTurn stores the last completed turn number; zero until the + // first turn generates. + CurrentTurn int + + // NextGenerationAt stores the next due tick. Nil when no tick is + // scheduled (e.g., status=starting, finished, stopped). + NextGenerationAt *time.Time + + // SkipNextTick is true when force-next-turn has set the skip flag + // for the next regular tick. Cleared by the scheduler after the + // first scheduled step is skipped. + SkipNextTick bool + + // EngineHealth stores the short text summary derived from + // runtime:health_events; empty until the first health observation. + EngineHealth string + + // CreatedAt stores the wall-clock at which the record was created. + CreatedAt time.Time + + // UpdatedAt stores the wall-clock of the most recent mutation. + UpdatedAt time.Time + + // StartedAt stores the wall-clock at which the runtime first + // transitioned to running. Non-nil once the status leaves starting. + StartedAt *time.Time + + // StoppedAt stores the wall-clock at which the runtime was stopped. + // Non-nil when status is stopped. + StoppedAt *time.Time + + // FinishedAt stores the wall-clock at which the engine reported + // finish. Non-nil when status is finished. + FinishedAt *time.Time +} + +// Validate reports whether record satisfies the runtime-record invariants +// implied by README §Lifecycles and the SQL CHECK on `runtime_records`. 
+func (record RuntimeRecord) Validate() error { + if strings.TrimSpace(record.GameID) == "" { + return fmt.Errorf("game id must not be empty") + } + if !record.Status.IsKnown() { + return fmt.Errorf("status %q is unsupported", record.Status) + } + if strings.TrimSpace(record.EngineEndpoint) == "" { + return fmt.Errorf("engine endpoint must not be empty") + } + if strings.TrimSpace(record.CurrentImageRef) == "" { + return fmt.Errorf("current image ref must not be empty") + } + if strings.TrimSpace(record.CurrentEngineVersion) == "" { + return fmt.Errorf("current engine version must not be empty") + } + if strings.TrimSpace(record.TurnSchedule) == "" { + return fmt.Errorf("turn schedule must not be empty") + } + if record.CurrentTurn < 0 { + return fmt.Errorf("current turn must not be negative") + } + if record.CreatedAt.IsZero() { + return fmt.Errorf("created at must not be zero") + } + if record.UpdatedAt.IsZero() { + return fmt.Errorf("updated at must not be zero") + } + if record.UpdatedAt.Before(record.CreatedAt) { + return fmt.Errorf("updated at must not be before created at") + } + + if record.NextGenerationAt != nil && record.NextGenerationAt.IsZero() { + return fmt.Errorf("next generation at must not be zero when present") + } + + switch record.Status { + case StatusStarting: + if record.StartedAt != nil { + return fmt.Errorf("started at must be nil for starting records") + } + + case StatusRunning, + StatusGenerationInProgress, + StatusGenerationFailed, + StatusEngineUnreachable: + if record.StartedAt == nil { + return fmt.Errorf( + "started at must not be nil for %s records", + record.Status, + ) + } + if record.StartedAt.IsZero() { + return fmt.Errorf("started at must not be zero when present") + } + + case StatusStopped: + if record.StartedAt == nil { + return fmt.Errorf("started at must not be nil for stopped records") + } + if record.StoppedAt == nil { + return fmt.Errorf("stopped at must not be nil for stopped records") + } + if 
record.StoppedAt.IsZero() { + return fmt.Errorf("stopped at must not be zero when present") + } + if record.StoppedAt.Before(*record.StartedAt) { + return fmt.Errorf("stopped at must not be before started at") + } + + case StatusFinished: + if record.StartedAt == nil { + return fmt.Errorf("started at must not be nil for finished records") + } + if record.FinishedAt == nil { + return fmt.Errorf("finished at must not be nil for finished records") + } + if record.FinishedAt.IsZero() { + return fmt.Errorf("finished at must not be zero when present") + } + if record.FinishedAt.Before(*record.StartedAt) { + return fmt.Errorf("finished at must not be before started at") + } + } + + if record.StartedAt != nil && record.StartedAt.Before(record.CreatedAt) { + return fmt.Errorf("started at must not be before created at") + } + + return nil +} diff --git a/gamemaster/internal/domain/runtime/model_test.go b/gamemaster/internal/domain/runtime/model_test.go new file mode 100644 index 0000000..45316c9 --- /dev/null +++ b/gamemaster/internal/domain/runtime/model_test.go @@ -0,0 +1,130 @@ +package runtime + +import ( + "testing" + "time" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func validRunningRecord() RuntimeRecord { + created := time.Date(2026, 4, 27, 12, 0, 0, 0, time.UTC) + started := created.Add(time.Minute) + updated := started.Add(time.Minute) + next := updated.Add(time.Hour) + return RuntimeRecord{ + GameID: "game-1", + Status: StatusRunning, + EngineEndpoint: "http://galaxy-game-1:8080", + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.3", + CurrentEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + CurrentTurn: 0, + NextGenerationAt: &next, + CreatedAt: created, + UpdatedAt: updated, + StartedAt: &started, + } +} + +func TestStatusIsKnown(t *testing.T) { + for _, status := range AllStatuses() { + assert.True(t, status.IsKnown(), "want known: %q", status) + } + assert.False(t, Status("exotic").IsKnown()) + assert.False(t, 
Status("").IsKnown()) +} + +func TestStatusIsTerminal(t *testing.T) { + assert.True(t, StatusFinished.IsTerminal()) + for _, status := range AllStatuses() { + if status == StatusFinished { + continue + } + assert.False(t, status.IsTerminal(), "%q must not be terminal", status) + } +} + +func TestAllStatusesStable(t *testing.T) { + first := AllStatuses() + second := AllStatuses() + assert.Equal(t, first, second) + assert.Len(t, first, 7) +} + +func TestRuntimeRecordValidateHappy(t *testing.T) { + require.NoError(t, validRunningRecord().Validate()) +} + +func TestRuntimeRecordValidateAcceptsStarting(t *testing.T) { + record := validRunningRecord() + record.Status = StatusStarting + record.StartedAt = nil + record.NextGenerationAt = nil + + assert.NoError(t, record.Validate()) +} + +func TestRuntimeRecordValidateRequiresFinishedAt(t *testing.T) { + record := validRunningRecord() + record.Status = StatusFinished + record.FinishedAt = nil + + assert.Error(t, record.Validate()) + + finished := record.UpdatedAt.Add(time.Minute) + record.FinishedAt = &finished + assert.NoError(t, record.Validate()) +} + +func TestRuntimeRecordValidateRequiresStoppedAtForStopped(t *testing.T) { + record := validRunningRecord() + record.Status = StatusStopped + assert.Error(t, record.Validate()) + + stopped := record.UpdatedAt.Add(time.Minute) + record.StoppedAt = &stopped + assert.NoError(t, record.Validate()) +} + +func TestRuntimeRecordValidateRejects(t *testing.T) { + tests := []struct { + name string + mutate func(*RuntimeRecord) + }{ + {"empty game id", func(r *RuntimeRecord) { r.GameID = "" }}, + {"unknown status", func(r *RuntimeRecord) { r.Status = "exotic" }}, + {"empty engine endpoint", func(r *RuntimeRecord) { r.EngineEndpoint = "" }}, + {"empty image ref", func(r *RuntimeRecord) { r.CurrentImageRef = "" }}, + {"empty engine version", func(r *RuntimeRecord) { r.CurrentEngineVersion = "" }}, + {"empty turn schedule", func(r *RuntimeRecord) { r.TurnSchedule = "" }}, + {"negative 
turn", func(r *RuntimeRecord) { r.CurrentTurn = -1 }}, + {"zero created at", func(r *RuntimeRecord) { r.CreatedAt = time.Time{} }}, + {"zero updated at", func(r *RuntimeRecord) { r.UpdatedAt = time.Time{} }}, + {"updated before created", func(r *RuntimeRecord) { + r.UpdatedAt = r.CreatedAt.Add(-time.Minute) + }}, + {"started before created", func(r *RuntimeRecord) { + before := r.CreatedAt.Add(-time.Minute) + r.StartedAt = &before + }}, + {"running missing started at", func(r *RuntimeRecord) { r.StartedAt = nil }}, + {"starting with started at", func(r *RuntimeRecord) { + r.Status = StatusStarting + // keep StartedAt set + }}, + {"zero next generation at", func(r *RuntimeRecord) { + zero := time.Time{} + r.NextGenerationAt = &zero + }}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + record := validRunningRecord() + tt.mutate(&record) + assert.Error(t, record.Validate()) + }) + } +} diff --git a/gamemaster/internal/domain/runtime/transitions.go b/gamemaster/internal/domain/runtime/transitions.go new file mode 100644 index 0000000..e118f71 --- /dev/null +++ b/gamemaster/internal/domain/runtime/transitions.go @@ -0,0 +1,77 @@ +package runtime + +// transitionKey stores one `(from, to)` pair in the allowed-transitions +// table. +type transitionKey struct { + from Status + to Status +} + +// allowedTransitions enumerates the runtime-status transitions Game +// Master is allowed to apply. The set mirrors the lifecycle flows frozen +// in `galaxy/gamemaster/README.md §Lifecycles`: +// +// - starting → running: register-runtime CAS after a successful +// engine /admin/init. +// - running → generation_in_progress: scheduler ticker or admin +// force-next-turn enters turn generation. +// - generation_in_progress → running: turn generation succeeded with +// `finished=false`. +// - generation_in_progress → generation_failed: engine timeout or +// 5xx during turn generation. 
+// - generation_in_progress → finished: engine returned +// `finished=true`; the state is terminal. +// - generation_failed → generation_in_progress: admin force-next-turn +// after manual recovery. +// - running → engine_unreachable: runtime:health_events observed an +// engine container failure (Stage 18 consumer). +// - engine_unreachable → running: runtime:health_events observed a +// recovery; reserved for the Stage 18 consumer; declared here so +// Stage 18 needs no transitions edit. +// - running → stopped, generation_in_progress → stopped, +// generation_failed → stopped, engine_unreachable → stopped: admin +// stop is allowed from every non-terminal status (README §Stop: +// «CAS `runtime_records.status: * → stopped`»). +var allowedTransitions = map[transitionKey]struct{}{ + {StatusStarting, StatusRunning}: {}, + + {StatusRunning, StatusGenerationInProgress}: {}, + + {StatusGenerationInProgress, StatusRunning}: {}, + {StatusGenerationInProgress, StatusGenerationFailed}: {}, + {StatusGenerationInProgress, StatusFinished}: {}, + {StatusGenerationFailed, StatusGenerationInProgress}: {}, + + {StatusRunning, StatusEngineUnreachable}: {}, + {StatusEngineUnreachable, StatusRunning}: {}, + + {StatusRunning, StatusStopped}: {}, + {StatusGenerationInProgress, StatusStopped}: {}, + {StatusGenerationFailed, StatusStopped}: {}, + {StatusEngineUnreachable, StatusStopped}: {}, +} + +// AllowedTransitions returns a copy of the `(from, to)` allowed +// transitions table used by Transition. The returned map is safe to +// mutate; callers should not rely on iteration order. +func AllowedTransitions() map[Status][]Status { + result := make(map[Status][]Status) + for key := range allowedTransitions { + result[key.from] = append(result[key.from], key.to) + } + return result +} + +// Transition reports whether from may transition to next. The function +// returns nil when the pair is permitted, and an *InvalidTransitionError +// wrapping ErrInvalidTransition otherwise. 
It does not touch any store +// and is safe to call from any layer. +func Transition(from Status, next Status) error { + if !from.IsKnown() || !next.IsKnown() { + return &InvalidTransitionError{From: from, To: next} + } + if _, ok := allowedTransitions[transitionKey{from: from, to: next}]; !ok { + return &InvalidTransitionError{From: from, To: next} + } + return nil +} diff --git a/gamemaster/internal/domain/runtime/transitions_test.go b/gamemaster/internal/domain/runtime/transitions_test.go new file mode 100644 index 0000000..abae069 --- /dev/null +++ b/gamemaster/internal/domain/runtime/transitions_test.go @@ -0,0 +1,90 @@ +package runtime + +import ( + "errors" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestTransitionAcceptsAllAllowedPairs(t *testing.T) { + for from, tos := range AllowedTransitions() { + for _, to := range tos { + t.Run(string(from)+"->"+string(to), func(t *testing.T) { + assert.NoError(t, Transition(from, to)) + }) + } + } +} + +func TestTransitionRejectsForbiddenPairs(t *testing.T) { + allowed := AllowedTransitions() + allowedSet := make(map[transitionKey]struct{}) + for from, tos := range allowed { + for _, to := range tos { + allowedSet[transitionKey{from: from, to: to}] = struct{}{} + } + } + + for _, from := range AllStatuses() { + for _, to := range AllStatuses() { + if _, ok := allowedSet[transitionKey{from: from, to: to}]; ok { + continue + } + t.Run(string(from)+"->"+string(to), func(t *testing.T) { + err := Transition(from, to) + require.Error(t, err) + var typed *InvalidTransitionError + assert.True(t, errors.As(err, &typed)) + assert.Equal(t, from, typed.From) + assert.Equal(t, to, typed.To) + assert.True(t, errors.Is(err, ErrInvalidTransition)) + }) + } + } +} + +func TestTransitionRejectsUnknownStatus(t *testing.T) { + tests := []struct { + name string + from Status + to Status + }{ + {"unknown from", "exotic", StatusRunning}, + {"unknown to", StatusRunning, "exotic"}, + 
{"both unknown", "from-x", "to-y"}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + err := Transition(tt.from, tt.to) + require.Error(t, err) + assert.True(t, errors.Is(err, ErrInvalidTransition)) + }) + } +} + +func TestAllowedTransitionsIncludesExpectedFlows(t *testing.T) { + allowed := AllowedTransitions() + must := func(from Status, expected Status) { + t.Helper() + got := allowed[from] + assert.Containsf(t, got, expected, + "expected %q in transitions from %q, got %v", + expected, from, got) + } + + must(StatusStarting, StatusRunning) + must(StatusRunning, StatusGenerationInProgress) + must(StatusGenerationInProgress, StatusRunning) + must(StatusGenerationInProgress, StatusGenerationFailed) + must(StatusGenerationInProgress, StatusFinished) + must(StatusGenerationFailed, StatusGenerationInProgress) + must(StatusRunning, StatusEngineUnreachable) + must(StatusEngineUnreachable, StatusRunning) + must(StatusRunning, StatusStopped) + must(StatusGenerationInProgress, StatusStopped) + must(StatusGenerationFailed, StatusStopped) + must(StatusEngineUnreachable, StatusStopped) +} diff --git a/gamemaster/internal/domain/schedule/nexttick.go b/gamemaster/internal/domain/schedule/nexttick.go new file mode 100644 index 0000000..31739d5 --- /dev/null +++ b/gamemaster/internal/domain/schedule/nexttick.go @@ -0,0 +1,59 @@ +// Package schedule wraps `pkg/cronutil` with the force-next-turn skip +// rule used by Game Master's scheduler. +// +// The wrapper is pure: callers pass the current `skip_next_tick` flag +// and the wrapper returns both the next firing time and a boolean that +// reports whether the flag was consumed. The runtime-record store is +// responsible for persisting the cleared flag; this package never +// touches it. +// +// `gamemaster/README.md §Force-next-turn` describes the rule: +// +// If `skip_next_tick=true`, advance by one extra cron step and clear +// the flag. 
+package schedule + +import ( + "time" + + "galaxy/cronutil" +) + +// Schedule wraps `cronutil.Schedule` with the GM-specific +// skip-next-tick semantics. The zero value is not usable; callers +// obtain a Schedule from Parse. +type Schedule struct { + inner cronutil.Schedule +} + +// Parse parses expr as a five-field cron expression and returns the +// resulting Schedule. Parse returns an error if expr is rejected by the +// underlying cronutil parser. +func Parse(expr string) (Schedule, error) { + inner, err := cronutil.Parse(expr) + if err != nil { + return Schedule{}, err + } + return Schedule{inner: inner}, nil +} + +// Next returns the next firing time strictly after `after`, honouring +// the skip flag. +// +// When `skip` is false, Next returns `cronutil.Schedule.Next(after)` +// and reports `skipConsumed=false`. +// +// When `skip` is true, Next computes the cron step immediately after +// `after`, then advances by one further cron step and returns that +// time with `skipConsumed=true`. The caller is responsible for +// persisting the cleared flag after observing `skipConsumed`. +// +// All returned times are in UTC; cronutil.Schedule already enforces +// UTC normalisation on its inputs and outputs. 
+func (s Schedule) Next(after time.Time, skip bool) (time.Time, bool) {
+	first := s.inner.Next(after)
+	if !skip {
+		return first, false
+	}
+	return s.inner.Next(first), true
+}
diff --git a/gamemaster/internal/domain/schedule/nexttick_test.go b/gamemaster/internal/domain/schedule/nexttick_test.go
new file mode 100644
index 0000000..7a4ea2b
--- /dev/null
+++ b/gamemaster/internal/domain/schedule/nexttick_test.go
@@ -0,0 +1,67 @@
+package schedule
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestParseRejectsBadExpr(t *testing.T) {
+	_, err := Parse("")
+	assert.Error(t, err)
+
+	_, err = Parse("0 0 31 2 *") // impossible date, but cronutil only checks syntax
+	assert.NoError(t, err)
+
+	_, err = Parse("not-a-cron")
+	assert.Error(t, err)
+
+	_, err = Parse("0 18 * *") // four fields
+	assert.Error(t, err)
+
+	_, err = Parse("0 0 * * * *") // six fields
+	assert.Error(t, err)
+}
+
+func TestNextNoSkip(t *testing.T) {
+	// Fires every day at 18:00 UTC.
+	sched, err := Parse("0 18 * * *")
+	require.NoError(t, err)
+
+	after := time.Date(2026, 4, 27, 12, 0, 0, 0, time.UTC)
+	got, skipped := sched.Next(after, false)
+
+	assert.False(t, skipped)
+	assert.Equal(t, time.Date(2026, 4, 27, 18, 0, 0, 0, time.UTC), got)
+	assert.Equal(t, time.UTC, got.Location())
+}
+
+func TestNextWithSkipAdvancesOneStep(t *testing.T) {
+	sched, err := Parse("0 18 * * *")
+	require.NoError(t, err)
+
+	after := time.Date(2026, 4, 27, 12, 0, 0, 0, time.UTC)
+	got, skipped := sched.Next(after, true)
+
+	assert.True(t, skipped)
+	// First slot would be 2026-04-27 18:00 UTC; the skip rule advances
+	// to 2026-04-28 18:00 UTC.
+ assert.Equal(t, time.Date(2026, 4, 28, 18, 0, 0, 0, time.UTC), got) +} + +func TestNextNormalisesNonUTCInput(t *testing.T) { + sched, err := Parse("*/15 * * * *") + require.NoError(t, err) + + moscow := time.FixedZone("MSK", 3*60*60) + // 2026-04-27 15:30 MSK = 2026-04-27 12:30 UTC; next 15-minute slot + // in UTC is 12:45. + after := time.Date(2026, 4, 27, 15, 30, 0, 0, moscow) + + got, skipped := sched.Next(after, false) + assert.False(t, skipped) + assert.Equal(t, time.Date(2026, 4, 27, 12, 45, 0, 0, time.UTC), got) + assert.Equal(t, time.UTC, got.Location()) +} diff --git a/gamemaster/internal/logging/context.go b/gamemaster/internal/logging/context.go new file mode 100644 index 0000000..bc05afb --- /dev/null +++ b/gamemaster/internal/logging/context.go @@ -0,0 +1,43 @@ +package logging + +import "context" + +// requestIDKey is the unexported context key under which the HTTP layer +// stores the request id propagated from the X-Request-Id header. +type requestIDKey struct{} + +// WithRequestID returns a child context that carries requestID. An empty +// requestID returns ctx unchanged so callers do not have to branch. +func WithRequestID(ctx context.Context, requestID string) context.Context { + if ctx == nil || requestID == "" { + return ctx + } + return context.WithValue(ctx, requestIDKey{}, requestID) +} + +// RequestIDFromContext returns the request id stored on ctx by +// WithRequestID, or an empty string when no value is present. +func RequestIDFromContext(ctx context.Context) string { + if ctx == nil { + return "" + } + value, _ := ctx.Value(requestIDKey{}).(string) + return value +} + +// ContextAttrs returns slog key-value pairs that materialise the frozen +// `gamemaster/README.md` §Observability log fields `request_id`, +// `trace_id`, and `span_id` from ctx. Pairs whose value is empty are +// omitted so logs stay tight. 
+func ContextAttrs(ctx context.Context) []any { + if ctx == nil { + return nil + } + + var attrs []any + if requestID := RequestIDFromContext(ctx); requestID != "" { + attrs = append(attrs, "request_id", requestID) + } + attrs = append(attrs, TraceAttrsFromContext(ctx)...) + return attrs +} diff --git a/gamemaster/internal/logging/logger.go b/gamemaster/internal/logging/logger.go new file mode 100644 index 0000000..09cb68b --- /dev/null +++ b/gamemaster/internal/logging/logger.go @@ -0,0 +1,45 @@ +// Package logging configures the Game Master process logger and provides +// context-aware helpers for trace fields. +package logging + +import ( + "context" + "fmt" + "log/slog" + "os" + "strings" + + "go.opentelemetry.io/otel/trace" +) + +// New constructs the process-wide JSON logger from level. +func New(level string) (*slog.Logger, error) { + var slogLevel slog.Level + if err := slogLevel.UnmarshalText([]byte(strings.TrimSpace(level))); err != nil { + return nil, fmt.Errorf("build logger: %w", err) + } + + return slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{ + Level: slogLevel, + })), nil +} + +// TraceAttrsFromContext returns slog key-value pairs for the active +// OpenTelemetry span when ctx carries a valid span context. The keys match +// the frozen `gamemaster/README.md` §Observability log fields `trace_id` +// and `span_id`. 
+func TraceAttrsFromContext(ctx context.Context) []any { + if ctx == nil { + return nil + } + + spanContext := trace.SpanContextFromContext(ctx) + if !spanContext.IsValid() { + return nil + } + + return []any{ + "trace_id", spanContext.TraceID().String(), + "span_id", spanContext.SpanID().String(), + } +} diff --git a/gamemaster/internal/ports/engineclient.go b/gamemaster/internal/ports/engineclient.go new file mode 100644 index 0000000..c3e06bd --- /dev/null +++ b/gamemaster/internal/ports/engineclient.go @@ -0,0 +1,125 @@ +package ports + +import ( + "context" + "encoding/json" + "errors" +) + +//go:generate go run go.uber.org/mock/mockgen -destination=../adapters/mocks/mock_engineclient.go -package=mocks galaxy/gamemaster/internal/ports EngineClient + +// EngineClient is the narrow surface Game Master uses against a running +// engine container. The production adapter (Stage 12) speaks REST/JSON +// against the engine routes documented in `galaxy/game/openapi.yaml`: +// +// - admin paths under `/api/v1/admin/*` (init, status, turn, +// race/banish); +// - player paths under `/api/v1/{command, order, report}`. +// +// The admin-path responses are typed (Init, Status, Turn) because GM +// reads structured fields out of them (`current_turn`, `finished`, +// per-player stats). The player-path payloads are forwarded verbatim: +// the gateway transcodes FlatBuffers to JSON, GM passes the JSON +// through, and the engine response is returned to the gateway +// unchanged. +type EngineClient interface { + // Init calls POST /api/v1/admin/init. The returned StateResponse + // carries the initial player roster used to install + // `player_mappings`. + Init(ctx context.Context, baseURL string, request InitRequest) (StateResponse, error) + + // Status calls GET /api/v1/admin/status. Used by inspect surfaces + // and by recovery flows. + Status(ctx context.Context, baseURL string) (StateResponse, error) + + // Turn calls PUT /api/v1/admin/turn. 
The returned StateResponse + // carries the new turn number, the per-player stats projected into + // `player_turn_stats`, and the `finished` flag. + Turn(ctx context.Context, baseURL string) (StateResponse, error) + + // BanishRace calls POST /api/v1/admin/race/banish with body + // `{race_name}`. The engine returns 204 on success. + BanishRace(ctx context.Context, baseURL, raceName string) error + + // ExecuteCommands calls PUT /api/v1/command. The request payload + // is forwarded verbatim; the engine response body is returned + // verbatim. + ExecuteCommands(ctx context.Context, baseURL string, payload json.RawMessage) (json.RawMessage, error) + + // PutOrders calls PUT /api/v1/order with the same forwarding + // semantics as ExecuteCommands. + PutOrders(ctx context.Context, baseURL string, payload json.RawMessage) (json.RawMessage, error) + + // GetReport calls GET /api/v1/report?player=&turn=. + // The engine response body is returned verbatim. + GetReport(ctx context.Context, baseURL, raceName string, turn int) (json.RawMessage, error) +} + +// InitRequest carries the race roster sent to the engine `/admin/init` +// route. The shape mirrors `galaxy/game/openapi.yaml`'s `InitRequest`. +type InitRequest struct { + // Races stores the per-player race entries in the order returned + // by Lobby's roster. + Races []InitRace +} + +// InitRace stores one entry of an InitRequest. +type InitRace struct { + // RaceName stores the in-game race name reserved for the player. + RaceName string +} + +// StateResponse is the typed projection of the engine's `StateResponse` +// payload (`galaxy/game/openapi.yaml`). GM reads only the fields it +// needs; the adapter is allowed to discard the rest. +type StateResponse struct { + // Turn stores the engine's current turn number. + Turn int + + // Players stores the per-player state entries returned by the + // engine. 
Each entry is mapped into `player_turn_stats[]` by + // resolving `RaceName` through `playermappingstore.ListByGame` to + // the platform `user_id`. + Players []PlayerState + + // Finished reports whether the engine considers the game finished. + // Becomes true on a turn-generation response when the engine's + // finish condition is satisfied. + Finished bool +} + +// PlayerState stores one entry of StateResponse.Players. The set of +// fields is the minimum GM needs from the engine surface; the adapter +// may decode additional fields and discard them. +type PlayerState struct { + // RaceName stores the in-game race name. + RaceName string + + // EnginePlayerUUID stores the engine-side player handle. Populated + // from `/admin/init` and `/admin/status`. + EnginePlayerUUID string + + // Planets stores the planet count reported for this player on the + // most recent turn. + Planets int + + // Population stores the population count reported for this player + // on the most recent turn. + Population int +} + +// ErrEngineUnreachable reports that the engine returned a transport +// error or 5xx status code. Surfaced to callers as `engine_unreachable`. +var ErrEngineUnreachable = errors.New("engine unreachable") + +// ErrEngineProtocolViolation reports that the engine responded with a +// payload that did not match the expected schema (missing required +// fields, malformed JSON, unexpected types). Surfaced as +// `engine_protocol_violation`. +var ErrEngineProtocolViolation = errors.New("engine protocol violation") + +// ErrEngineValidation reports that the engine returned 4xx with a +// per-command result. Surfaced as `engine_validation_error`; the +// engine's body is returned verbatim to the caller through the player +// command/order forwarding paths. 
+var ErrEngineValidation = errors.New("engine validation error") diff --git a/gamemaster/internal/ports/engineversionstore.go b/gamemaster/internal/ports/engineversionstore.go new file mode 100644 index 0000000..b317d69 --- /dev/null +++ b/gamemaster/internal/ports/engineversionstore.go @@ -0,0 +1,127 @@ +package ports + +import ( + "context" + "fmt" + "strings" + "time" + + "galaxy/gamemaster/internal/domain/engineversion" +) + +//go:generate go run go.uber.org/mock/mockgen -destination=../adapters/mocks/mock_engineversionstore.go -package=mocks galaxy/gamemaster/internal/ports EngineVersionStore + +// EngineVersionStore stores the engine version registry rows used by +// Game Lobby's start flow and by GM's admin patch and registry CRUD +// surface. Adapters must preserve domain semantics: +// +// - Get returns engineversion.ErrNotFound when no row exists for +// version. +// - List with a nil status filter returns every row; with a non-nil +// filter, only rows whose status matches are returned. +// - Insert installs a fresh row and returns engineversion.ErrConflict +// when a row with the same `version` already exists. Adapters +// surface PostgreSQL unique violations through that sentinel so +// the service layer maps them to a `conflict` REST envelope. +// - Update applies a partial update; only fields whose pointer is +// non-nil are mutated. The `updated_at` column is always refreshed +// from input.Now. +// - Deprecate sets `status=deprecated` for an existing version with +// `updated_at = now`. It returns engineversion.ErrNotFound when no +// row exists. The call is idempotent: deprecating an already +// deprecated row succeeds with no further mutation. +// - Delete removes the row identified by version. Returns +// engineversion.ErrNotFound when no row matches. 
The service layer +// gates Delete behind an explicit IsReferencedByActiveRuntime probe +// so referenced rows surface engineversion.ErrInUse before the +// adapter is touched; adapters do not enforce that guard themselves. +// - IsReferencedByActiveRuntime reports whether any non-finished +// `runtime_records` row currently references the version through +// `current_engine_version`. +type EngineVersionStore interface { + // Get returns the row identified by version. Returns + // engineversion.ErrNotFound when no row exists. + Get(ctx context.Context, version string) (engineversion.EngineVersion, error) + + // List returns every row whose status matches statusFilter when + // non-nil, or every row when nil. The order is adapter-defined. + List(ctx context.Context, statusFilter *engineversion.Status) ([]engineversion.EngineVersion, error) + + // Insert installs record into the registry. + Insert(ctx context.Context, record engineversion.EngineVersion) error + + // Update applies a partial update to the row identified by + // input.Version. Only fields whose pointer is non-nil are mutated. + // Returns engineversion.ErrNotFound when no row exists. + Update(ctx context.Context, input UpdateEngineVersionInput) error + + // Deprecate sets `status=deprecated` for version and refreshes + // `updated_at` from now. Returns engineversion.ErrNotFound when no + // row exists. Calling Deprecate on an already-deprecated row + // succeeds with no mutation (idempotent). + Deprecate(ctx context.Context, version string, now time.Time) error + + // Delete removes the row identified by version. Returns + // engineversion.ErrNotFound when no row matches. Adapters do not + // inspect runtime references; the service layer probes + // IsReferencedByActiveRuntime first and surfaces + // engineversion.ErrInUse independently. 
+ Delete(ctx context.Context, version string) error + + // IsReferencedByActiveRuntime reports whether any non-finished + // runtime row currently references version through + // `current_engine_version`. Used by the registry hard-delete path + // to surface engineversion.ErrInUse. + IsReferencedByActiveRuntime(ctx context.Context, version string) (bool, error) +} + +// UpdateEngineVersionInput stores the arguments required to PATCH one +// engine version row. Pointer fields communicate «leave alone» (nil) +// vs. «write the value» (non-nil). At least one optional field must be +// set; otherwise the call is a no-op and Validate rejects it. +type UpdateEngineVersionInput struct { + // Version identifies the row to mutate. + Version string + + // ImageRef is the new image reference. Nil leaves the column + // unchanged; non-nil must be non-empty. + ImageRef *string + + // Options is the new options document (raw JSON). Nil leaves the + // column unchanged; non-nil writes the value verbatim. + Options *[]byte + + // Status is the new status. Nil leaves the column unchanged; + // non-nil must be a known status. + Status *engineversion.Status + + // Now stores the wall-clock used to refresh the `updated_at` + // column on every successful update. + Now time.Time +} + +// Validate reports whether input contains a structurally valid PATCH +// request. Adapters call Validate before touching the store. 
+func (input UpdateEngineVersionInput) Validate() error { + if strings.TrimSpace(input.Version) == "" { + return fmt.Errorf("update engine version: version must not be empty") + } + if input.ImageRef == nil && input.Options == nil && input.Status == nil { + return fmt.Errorf("update engine version: at least one field must be set") + } + if input.ImageRef != nil && strings.TrimSpace(*input.ImageRef) == "" { + return fmt.Errorf( + "update engine version: image ref must not be empty when set", + ) + } + if input.Status != nil && !input.Status.IsKnown() { + return fmt.Errorf( + "update engine version: status %q is unsupported", + *input.Status, + ) + } + if input.Now.IsZero() { + return fmt.Errorf("update engine version: now must not be zero") + } + return nil +} diff --git a/gamemaster/internal/ports/engineversionstore_test.go b/gamemaster/internal/ports/engineversionstore_test.go new file mode 100644 index 0000000..bcffc05 --- /dev/null +++ b/gamemaster/internal/ports/engineversionstore_test.go @@ -0,0 +1,101 @@ +package ports + +import ( + "testing" + "time" + + "galaxy/gamemaster/internal/domain/engineversion" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +// fixedNow returns a stable wall-clock used by the input-validation +// fixtures. Adapters use the value verbatim to refresh the `updated_at` +// column. 
+func fixedNow() time.Time { + return time.Date(2026, time.April, 27, 12, 0, 0, 0, time.UTC) +} + +func TestUpdateEngineVersionInputValidateHappy(t *testing.T) { + imageRef := "ghcr.io/galaxy/game:v1.2.4" + input := UpdateEngineVersionInput{ + Version: "v1.2.3", + ImageRef: &imageRef, + Now: fixedNow(), + } + require.NoError(t, input.Validate()) +} + +func TestUpdateEngineVersionInputValidateAcceptsStatusOnly(t *testing.T) { + status := engineversion.StatusDeprecated + input := UpdateEngineVersionInput{ + Version: "v1.2.3", + Status: &status, + Now: fixedNow(), + } + assert.NoError(t, input.Validate()) +} + +func TestUpdateEngineVersionInputValidateAcceptsOptionsOnly(t *testing.T) { + options := []byte(`{"max_planets":120}`) + input := UpdateEngineVersionInput{ + Version: "v1.2.3", + Options: &options, + Now: fixedNow(), + } + assert.NoError(t, input.Validate()) +} + +func TestUpdateEngineVersionInputValidateRejects(t *testing.T) { + emptyImage := "" + imageRef := "ghcr.io/galaxy/game:v1.2.4" + unknownStatus := engineversion.Status("exotic") + + tests := []struct { + name string + input UpdateEngineVersionInput + }{ + { + name: "empty version", + input: UpdateEngineVersionInput{ + Version: "", + ImageRef: &imageRef, + Now: fixedNow(), + }, + }, + { + name: "no fields set", + input: UpdateEngineVersionInput{Version: "v1.2.3", Now: fixedNow()}, + }, + { + name: "empty image ref pointer", + input: UpdateEngineVersionInput{ + Version: "v1.2.3", + ImageRef: &emptyImage, + Now: fixedNow(), + }, + }, + { + name: "unknown status pointer", + input: UpdateEngineVersionInput{ + Version: "v1.2.3", + Status: &unknownStatus, + Now: fixedNow(), + }, + }, + { + name: "zero now", + input: UpdateEngineVersionInput{ + Version: "v1.2.3", + ImageRef: &imageRef, + }, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + assert.Error(t, tt.input.Validate()) + }) + } +} diff --git a/gamemaster/internal/ports/lobbyclient.go 
b/gamemaster/internal/ports/lobbyclient.go new file mode 100644 index 0000000..10faadf --- /dev/null +++ b/gamemaster/internal/ports/lobbyclient.go @@ -0,0 +1,93 @@ +package ports + +import ( + "context" + "errors" + "time" +) + +//go:generate go run go.uber.org/mock/mockgen -destination=../adapters/mocks/mock_lobbyclient.go -package=mocks galaxy/gamemaster/internal/ports LobbyClient + +// LobbyClient executes synchronous calls to Game Lobby. The port +// surfaces two operations: +// +// - GetMemberships — used by the membership cache to authorise player +// commands on the hot path. +// - GetGameSummary — used by the turn-generation orchestrator to +// resolve the human-readable `game_name` consumed by +// `notification:intents` payloads (`game.turn.ready`, +// `game.finished`, `game.generation_failed`). Failure is fail-soft: +// callers fall back to `game_id` rather than block the runtime +// mutation. +// +// Membership data and the game record are owned by Game Lobby; GM +// treats them as remote projections. Consequently the Membership and +// GameSummary types live on the port file rather than as domain types, +// mirroring rtmanager's `LobbyGameRecord` precedent. +type LobbyClient interface { + // GetMemberships returns every membership of gameID, in any + // status. The cache layer filters to `active` for authorisation. + // Implementations wrap any non-success outcome (transport error, + // timeout, non-2xx response) with ErrLobbyUnavailable so callers + // can branch with errors.Is. + GetMemberships(ctx context.Context, gameID string) ([]Membership, error) + + // GetGameSummary returns the narrow projection of Lobby's + // `GameRecord` GM needs to populate notification payloads with a + // human-readable `game_name`. Implementations wrap any non-success + // outcome (transport error, timeout, non-2xx response, malformed + // payload) with ErrLobbyUnavailable. 
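+	//
+	// A fail-soft caller sketch (assumed wiring; only the fallback
+	// rule is contractual):
+	//
+	//	gameName := gameID // fall back to the id on any Lobby outage
+	//	if summary, err := lobby.GetGameSummary(ctx, gameID); err == nil {
+	//		gameName = summary.GameName
+	//	} else if !errors.Is(err, ErrLobbyUnavailable) {
+	//		return err // programming error, not an outage
+	//	}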
+ GetGameSummary(ctx context.Context, gameID string) (GameSummary, error) +} + +// Membership stores one row of the membership projection returned by +// `Lobby /api/v1/internal/games/{game_id}/memberships`. The shape +// mirrors `MembershipRecord` in +// `galaxy/lobby/api/internal-openapi.yaml`. +type Membership struct { + // UserID identifies the platform user. + UserID string + + // RaceName stores the in-game race reserved for the user. + RaceName string + + // Status reports `active`, `removed`, or `blocked`. GM authorises + // only `active` callers on the hot path. + Status string + + // JoinedAt stores the wall-clock at which the membership entered + // active. + JoinedAt time.Time + + // RemovedAt stores the wall-clock at which the membership left + // active. Nil while the membership is still active. + RemovedAt *time.Time +} + +// GameSummary stores the narrow projection of Lobby's `GameRecord` GM +// consumes today: the platform game id, the human-readable +// `game_name`, and the platform-level lifecycle status. Additional +// fields can be added without breaking consumers because every caller +// reads through the typed fields directly. +type GameSummary struct { + // GameID identifies the platform game. Echoed back from Lobby as a + // sanity check. + GameID string + + // GameName stores the human-readable game name maintained by + // Lobby. Used by the turn-generation orchestrator to populate + // `game_name` on `notification:intents` payloads. + GameName string + + // Status stores Lobby's platform-level lifecycle status (`draft`, + // `enrollment_open`, `running`, `finished`, etc.). GM does not act + // on the value today; it is captured for future audit/log use. + Status string +} + +// ErrLobbyUnavailable signals that a Lobby call could not be completed +// because the upstream service was unreachable, returned an error +// response, or timed out. 
GM's hot-path callers treat any non-success +// outcome uniformly: the player command is rejected with +// `service_unavailable` and the cache TTL eventually retries. +var ErrLobbyUnavailable = errors.New("lobby unavailable") diff --git a/gamemaster/internal/ports/lobbyeventspublisher.go b/gamemaster/internal/ports/lobbyeventspublisher.go new file mode 100644 index 0000000..fa9f4d4 --- /dev/null +++ b/gamemaster/internal/ports/lobbyeventspublisher.go @@ -0,0 +1,166 @@ +package ports + +import ( + "context" + "fmt" + "strings" + "time" + + "galaxy/gamemaster/internal/domain/runtime" +) + +//go:generate go run go.uber.org/mock/mockgen -destination=../adapters/mocks/mock_lobbyeventspublisher.go -package=mocks galaxy/gamemaster/internal/ports LobbyEventsPublisher + +// LobbyEventsPublisher is the producer port for the `gm:lobby_events` +// Redis Stream consumed by Game Lobby. Two message shapes share the +// stream, discriminated by `event_type` per +// `galaxy/gamemaster/api/runtime-events-asyncapi.yaml`: +// +// - runtime_snapshot_update — every turn generation outcome and every +// status / health-summary transition. +// - game_finished — the terminal event published once per game when +// the engine reports `finished:true`. +type LobbyEventsPublisher interface { + // PublishSnapshotUpdate appends a `runtime_snapshot_update` message + // to the stream. Adapters validate msg through msg.Validate before + // touching Redis. + PublishSnapshotUpdate(ctx context.Context, msg RuntimeSnapshotUpdate) error + + // PublishGameFinished appends a `game_finished` message to the + // stream. Adapters validate msg through msg.Validate before + // touching Redis. + PublishGameFinished(ctx context.Context, msg GameFinished) error +} + +// PlayerTurnStats stores the per-player projection carried on every +// `runtime_snapshot_update` and `game_finished` message. The shape is +// frozen in the AsyncAPI spec. +type PlayerTurnStats struct { + // UserID identifies the platform user. 
+ UserID string + + // Planets stores the planet count reported for this user on the + // most recent turn. + Planets int + + // Population stores the population count reported for this user + // on the most recent turn. + Population int +} + +// Validate reports whether stats carries valid per-player projection +// values. +func (stats PlayerTurnStats) Validate() error { + if strings.TrimSpace(stats.UserID) == "" { + return fmt.Errorf("player turn stats: user id must not be empty") + } + if stats.Planets < 0 { + return fmt.Errorf("player turn stats: planets must not be negative") + } + if stats.Population < 0 { + return fmt.Errorf("player turn stats: population must not be negative") + } + return nil +} + +// RuntimeSnapshotUpdate stores the body of a `runtime_snapshot_update` +// message. +type RuntimeSnapshotUpdate struct { + // GameID identifies the game the snapshot belongs to. + GameID string + + // CurrentTurn stores the latest completed turn number. + CurrentTurn int + + // RuntimeStatus stores the latest GM-side status of the runtime. + RuntimeStatus runtime.Status + + // EngineHealthSummary stores the current health summary string. + // Empty when no observation has been processed yet. + EngineHealthSummary string + + // PlayerTurnStats stores the per-active-member projection. Empty + // when the snapshot is published for a status transition with no + // new turn payload. + PlayerTurnStats []PlayerTurnStats + + // OccurredAt stores the wall-clock at which the snapshot was + // produced. Always UTC. + OccurredAt time.Time +} + +// Validate reports whether msg satisfies the AsyncAPI-frozen invariants. 
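+//
+// A producer-side sketch (publisher wiring is illustrative):
+//
+//	msg := RuntimeSnapshotUpdate{
+//		GameID:        "game-1",
+//		CurrentTurn:   7,
+//		RuntimeStatus: runtime.StatusRunning,
+//		OccurredAt:    time.Now().UTC(),
+//	}
+//	if err := msg.Validate(); err != nil {
+//		return err
+//	}
+//	return publisher.PublishSnapshotUpdate(ctx, msg)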
+func (msg RuntimeSnapshotUpdate) Validate() error { + if strings.TrimSpace(msg.GameID) == "" { + return fmt.Errorf("runtime snapshot update: game id must not be empty") + } + if msg.CurrentTurn < 0 { + return fmt.Errorf("runtime snapshot update: current turn must not be negative") + } + if !msg.RuntimeStatus.IsKnown() { + return fmt.Errorf( + "runtime snapshot update: runtime status %q is unsupported", + msg.RuntimeStatus, + ) + } + if msg.OccurredAt.IsZero() { + return fmt.Errorf("runtime snapshot update: occurred at must not be zero") + } + for i, stats := range msg.PlayerTurnStats { + if err := stats.Validate(); err != nil { + return fmt.Errorf( + "runtime snapshot update: player turn stats[%d]: %w", + i, err, + ) + } + } + return nil +} + +// GameFinished stores the body of a `game_finished` message. +type GameFinished struct { + // GameID identifies the game that finished. + GameID string + + // FinalTurnNumber stores the turn number on which the engine + // reported `finished:true`. + FinalTurnNumber int + + // RuntimeStatus is always runtime.StatusFinished. Carried in the + // message body so consumers can apply the same decoder to both + // stream shapes. + RuntimeStatus runtime.Status + + // PlayerTurnStats stores the final per-player projection used by + // Lobby's capability evaluation. + PlayerTurnStats []PlayerTurnStats + + // FinishedAt stores the wall-clock at which the engine returned + // the finished response. Always UTC. + FinishedAt time.Time +} + +// Validate reports whether msg satisfies the AsyncAPI-frozen invariants. 
+func (msg GameFinished) Validate() error { + if strings.TrimSpace(msg.GameID) == "" { + return fmt.Errorf("game finished: game id must not be empty") + } + if msg.FinalTurnNumber < 0 { + return fmt.Errorf("game finished: final turn number must not be negative") + } + if msg.RuntimeStatus != runtime.StatusFinished { + return fmt.Errorf( + "game finished: runtime status must be %q, got %q", + runtime.StatusFinished, msg.RuntimeStatus, + ) + } + if msg.FinishedAt.IsZero() { + return fmt.Errorf("game finished: finished at must not be zero") + } + for i, stats := range msg.PlayerTurnStats { + if err := stats.Validate(); err != nil { + return fmt.Errorf("game finished: player turn stats[%d]: %w", i, err) + } + } + return nil +} diff --git a/gamemaster/internal/ports/lobbyeventspublisher_test.go b/gamemaster/internal/ports/lobbyeventspublisher_test.go new file mode 100644 index 0000000..eedc792 --- /dev/null +++ b/gamemaster/internal/ports/lobbyeventspublisher_test.go @@ -0,0 +1,112 @@ +package ports + +import ( + "testing" + "time" + + "galaxy/gamemaster/internal/domain/runtime" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func validSnapshotUpdate() RuntimeSnapshotUpdate { + return RuntimeSnapshotUpdate{ + GameID: "game-1", + CurrentTurn: 3, + RuntimeStatus: runtime.StatusRunning, + EngineHealthSummary: "healthy", + PlayerTurnStats: []PlayerTurnStats{ + {UserID: "user-1", Planets: 1, Population: 100}, + {UserID: "user-2", Planets: 2, Population: 200}, + }, + OccurredAt: time.Date(2026, 4, 27, 12, 0, 0, 0, time.UTC), + } +} + +func validGameFinished() GameFinished { + return GameFinished{ + GameID: "game-1", + FinalTurnNumber: 42, + RuntimeStatus: runtime.StatusFinished, + PlayerTurnStats: []PlayerTurnStats{ + {UserID: "user-1", Planets: 5, Population: 500}, + }, + FinishedAt: time.Date(2026, 4, 27, 18, 30, 0, 0, time.UTC), + } +} + +func TestRuntimeSnapshotUpdateValidateHappy(t *testing.T) { + require.NoError(t, 
validSnapshotUpdate().Validate()) +} + +func TestRuntimeSnapshotUpdateValidateAcceptsEmptyStats(t *testing.T) { + msg := validSnapshotUpdate() + msg.PlayerTurnStats = nil + assert.NoError(t, msg.Validate()) +} + +func TestRuntimeSnapshotUpdateValidateRejects(t *testing.T) { + tests := []struct { + name string + mutate func(*RuntimeSnapshotUpdate) + }{ + {"empty game id", func(m *RuntimeSnapshotUpdate) { m.GameID = "" }}, + {"negative turn", func(m *RuntimeSnapshotUpdate) { m.CurrentTurn = -1 }}, + {"unknown status", func(m *RuntimeSnapshotUpdate) { m.RuntimeStatus = "exotic" }}, + {"zero occurred at", func(m *RuntimeSnapshotUpdate) { m.OccurredAt = time.Time{} }}, + {"bad stats entry", func(m *RuntimeSnapshotUpdate) { + m.PlayerTurnStats = append(m.PlayerTurnStats, PlayerTurnStats{ + UserID: "", Planets: 0, Population: 0, + }) + }}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + msg := validSnapshotUpdate() + tt.mutate(&msg) + assert.Error(t, msg.Validate()) + }) + } +} + +func TestGameFinishedValidateHappy(t *testing.T) { + require.NoError(t, validGameFinished().Validate()) +} + +func TestGameFinishedValidateRejects(t *testing.T) { + tests := []struct { + name string + mutate func(*GameFinished) + }{ + {"empty game id", func(m *GameFinished) { m.GameID = "" }}, + {"negative final turn", func(m *GameFinished) { m.FinalTurnNumber = -1 }}, + {"non-finished status", func(m *GameFinished) { m.RuntimeStatus = runtime.StatusRunning }}, + {"zero finished at", func(m *GameFinished) { m.FinishedAt = time.Time{} }}, + {"bad stats entry", func(m *GameFinished) { + m.PlayerTurnStats = append(m.PlayerTurnStats, PlayerTurnStats{ + UserID: "user-bad", Planets: -1, Population: 0, + }) + }}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + msg := validGameFinished() + tt.mutate(&msg) + assert.Error(t, msg.Validate()) + }) + } +} + +func TestPlayerTurnStatsValidateRejects(t *testing.T) { + bad := PlayerTurnStats{UserID: "", 
Planets: 0, Population: 0} + assert.Error(t, bad.Validate()) + + bad = PlayerTurnStats{UserID: "u", Planets: -1, Population: 0} + assert.Error(t, bad.Validate()) + + bad = PlayerTurnStats{UserID: "u", Planets: 0, Population: -1} + assert.Error(t, bad.Validate()) +} diff --git a/gamemaster/internal/ports/notificationpublisher.go b/gamemaster/internal/ports/notificationpublisher.go new file mode 100644 index 0000000..89a5c4e --- /dev/null +++ b/gamemaster/internal/ports/notificationpublisher.go @@ -0,0 +1,24 @@ +package ports + +import ( + "context" + + "galaxy/notificationintent" +) + +//go:generate go run go.uber.org/mock/mockgen -destination=../adapters/mocks/mock_notificationpublisher.go -package=mocks galaxy/gamemaster/internal/ports NotificationIntentPublisher + +// NotificationIntentPublisher is the producer port Game Master uses to +// publish notification intents to Notification Service. The production +// adapter is a thin wrapper around `notificationintent.Publisher`. +// +// A failed Publish call is a notification degradation per +// `galaxy/gamemaster/README.md §Notification Contracts` and must not +// roll back already committed runtime state. Callers log the error +// and proceed. +type NotificationIntentPublisher interface { + // Publish normalises intent and appends it to the configured + // Redis Stream. Validation failures and transport errors are + // returned verbatim. 
+ Publish(ctx context.Context, intent notificationintent.Intent) error +} diff --git a/gamemaster/internal/ports/operationlog.go b/gamemaster/internal/ports/operationlog.go new file mode 100644 index 0000000..dba5cd6 --- /dev/null +++ b/gamemaster/internal/ports/operationlog.go @@ -0,0 +1,24 @@ +package ports + +import ( + "context" + + "galaxy/gamemaster/internal/domain/operation" +) + +//go:generate go run go.uber.org/mock/mockgen -destination=../adapters/mocks/mock_operationlog.go -package=mocks galaxy/gamemaster/internal/ports OperationLogStore + +// OperationLogStore stores append-only audit entries for every +// operation Game Master performs. Adapters must persist entry verbatim +// and return the generated bigserial id from Append. +type OperationLogStore interface { + // Append inserts entry into the operation log and returns the + // generated bigserial id. Adapters validate entry through + // operation.OperationEntry.Validate before touching the store. + Append(ctx context.Context, entry operation.OperationEntry) (id int64, err error) + + // ListByGame returns the most recent entries for gameID, ordered + // by started_at descending and capped by limit. A non-positive + // limit is rejected as invalid input by adapters. + ListByGame(ctx context.Context, gameID string, limit int) ([]operation.OperationEntry, error) +} diff --git a/gamemaster/internal/ports/playermappingstore.go b/gamemaster/internal/ports/playermappingstore.go new file mode 100644 index 0000000..9719104 --- /dev/null +++ b/gamemaster/internal/ports/playermappingstore.go @@ -0,0 +1,47 @@ +package ports + +import ( + "context" + + "galaxy/gamemaster/internal/domain/playermapping" +) + +//go:generate go run go.uber.org/mock/mockgen -destination=../adapters/mocks/mock_playermappingstore.go -package=mocks galaxy/gamemaster/internal/ports PlayerMappingStore + +// PlayerMappingStore stores the (game_id, user_id) → race_name + +// engine_player_uuid projection installed at register-runtime. 
Adapters +// must preserve the storage-level invariants enforced by +// `00001_init.sql`: +// +// - composite primary key on `(game_id, user_id)`; +// - UNIQUE on `(game_id, race_name)` (one race per game). +// +// BulkInsert is the only ingestion path: register-runtime inserts every +// row for a game in one batch. Per-row mutation is intentionally not +// exposed; rosters are immutable for the lifetime of the runtime. +type PlayerMappingStore interface { + // BulkInsert installs every mapping in records. Adapters validate + // each record through playermapping.PlayerMapping.Validate before + // touching the store. Adapters may use a single multi-row INSERT + // or one transaction with N rows; either way the operation is + // atomic. + BulkInsert(ctx context.Context, records []playermapping.PlayerMapping) error + + // Get returns the mapping identified by (gameID, userID). Returns + // playermapping.ErrNotFound when no row exists. + Get(ctx context.Context, gameID, userID string) (playermapping.PlayerMapping, error) + + // GetByRace returns the mapping identified by (gameID, raceName). + // Used by the admin banish flow (Stage 17) to resolve the engine + // player UUID for the engine /admin/race/banish call. Returns + // playermapping.ErrNotFound when no row exists. + GetByRace(ctx context.Context, gameID, raceName string) (playermapping.PlayerMapping, error) + + // ListByGame returns every mapping owned by gameID. The order is + // adapter-defined; callers may reorder as needed. + ListByGame(ctx context.Context, gameID string) ([]playermapping.PlayerMapping, error) + + // DeleteByGame removes every mapping owned by gameID. Returns nil + // even when no rows were deleted (idempotent). 
+ DeleteByGame(ctx context.Context, gameID string) error +} diff --git a/gamemaster/internal/ports/rtmclient.go b/gamemaster/internal/ports/rtmclient.go new file mode 100644 index 0000000..e7a8495 --- /dev/null +++ b/gamemaster/internal/ports/rtmclient.go @@ -0,0 +1,34 @@ +package ports + +import ( + "context" + "errors" +) + +//go:generate go run go.uber.org/mock/mockgen -destination=../adapters/mocks/mock_rtmclient.go -package=mocks galaxy/gamemaster/internal/ports RTMClient + +// RTMClient executes synchronous calls to Runtime Manager over the +// trusted internal REST surface documented in +// `galaxy/rtmanager/api/internal-openapi.yaml`. GM uses RTM only for +// stop and patch lifecycle actions in v1. +// +// `Restart` is reserved per `gamemaster/PLAN.md` Stage 10 («reserved; +// not in v1 feature scope») and is intentionally absent from the v1 +// surface. It will be added in a later iteration if a use case +// emerges. +type RTMClient interface { + // Stop calls POST /api/v1/internal/runtimes/{game_id}/stop with + // body `{reason}`. Implementations wrap any non-success outcome + // with ErrRTMUnavailable so callers can branch with errors.Is. + Stop(ctx context.Context, gameID, reason string) error + + // Patch calls POST /api/v1/internal/runtimes/{game_id}/patch with + // body `{image_ref}`. Implementations wrap any non-success outcome + // with ErrRTMUnavailable so callers can branch with errors.Is. + Patch(ctx context.Context, gameID, imageRef string) error +} + +// ErrRTMUnavailable signals that a Runtime Manager call could not be +// completed because the upstream service was unreachable, returned an +// error response, or timed out. 
+var ErrRTMUnavailable = errors.New("runtime manager unavailable") diff --git a/gamemaster/internal/ports/runtimerecordstore.go b/gamemaster/internal/ports/runtimerecordstore.go new file mode 100644 index 0000000..c2be8c8 --- /dev/null +++ b/gamemaster/internal/ports/runtimerecordstore.go @@ -0,0 +1,307 @@ +// Package ports defines the stable interfaces that connect Game Master +// use cases to external state and external services. +package ports + +import ( + "context" + "fmt" + "strings" + "time" + + "galaxy/gamemaster/internal/domain/runtime" +) + +//go:generate go run go.uber.org/mock/mockgen -destination=../adapters/mocks/mock_runtimerecordstore.go -package=mocks galaxy/gamemaster/internal/ports RuntimeRecordStore + +// RuntimeRecordStore stores runtime records and exposes the operations +// used by the service layer (Stages 13+) and the workers (Stages 15-18). +// Adapters must preserve domain semantics: +// +// - Get returns runtime.ErrNotFound when no record exists for gameID. +// - Insert installs a fresh record and returns runtime.ErrConflict +// when a row already exists. +// - UpdateStatus applies one transition through a compare-and-swap +// guard on the stored status and returns runtime.ErrConflict on a +// stale CAS. +// - UpdateScheduling mutates `next_generation_at`, `skip_next_tick`, +// and `current_turn` together; the destination status is unaffected. +// - UpdateImage rotates `current_image_ref` and +// `current_engine_version` under a compare-and-swap guard on the +// stored status and returns runtime.ErrConflict on a stale CAS. +// - UpdateEngineHealth rotates the `engine_health` column without +// touching status. The call applies from any status (including +// stopped and finished) so late-arriving health observations still +// bookkeep correctly. Returns runtime.ErrNotFound when no row +// matches. +// - Delete removes the record identified by gameID. The call is +// idempotent: it returns nil even when no row matches. 
+// - ListDueRunning returns every running record with +// `next_generation_at <= now`. +// - ListByStatus returns every record currently indexed under status. +// - List returns every record ordered by `created_at` descending. Used +// by the `internalListRuntimes` REST handler when no status filter +// is supplied. +type RuntimeRecordStore interface { + // Get returns the record identified by gameID. It returns + // runtime.ErrNotFound when no record exists. + Get(ctx context.Context, gameID string) (runtime.RuntimeRecord, error) + + // Insert installs record into the store. It returns + // runtime.ErrConflict when a row already exists for record.GameID. + Insert(ctx context.Context, record runtime.RuntimeRecord) error + + // UpdateStatus applies one status transition in a compare-and-swap + // fashion. The adapter must first call runtime.Transition to reject + // invalid pairs without touching the store, then verify that the + // stored status equals input.ExpectedFrom. Optional fields on the + // input (CurrentImageRef, CurrentEngineVersion, EngineHealthSummary) + // are persisted only when non-nil. + UpdateStatus(ctx context.Context, input UpdateStatusInput) error + + // UpdateScheduling mutates the scheduling columns + // (`next_generation_at`, `skip_next_tick`, `current_turn`) of the + // record identified by input.GameID. The store does not validate + // the runtime status; callers issue UpdateScheduling alongside an + // UpdateStatus when the destination status changes. + UpdateScheduling(ctx context.Context, input UpdateSchedulingInput) error + + // UpdateImage rotates `current_image_ref` and + // `current_engine_version` of the record identified by + // input.GameID under a compare-and-swap guard on the stored status. + // The destination status is unchanged. Used by the admin patch + // flow (Stage 17) where the runtime stays `running` while the + // engine container is recreated by Runtime Manager with a new + // image. 
Returns runtime.ErrNotFound when no row matches and + // runtime.ErrConflict when the stored status differs from + // input.ExpectedStatus. + UpdateImage(ctx context.Context, input UpdateImageInput) error + + // UpdateEngineHealth rotates the `engine_health` column of the + // record identified by input.GameID without touching status. Used + // by the runtime:health_events consumer (Stage 18) when an + // observation should refresh the summary regardless of the current + // runtime status (including stopped and finished, so late-arriving + // events still bookkeep correctly). Returns runtime.ErrNotFound + // when no row matches. + UpdateEngineHealth(ctx context.Context, input UpdateEngineHealthInput) error + + // Delete removes the record identified by gameID. The call is + // idempotent: it returns nil even when no row matches. Used by the + // register-runtime rollback path (Stage 13) when the engine + // /admin/init call or any later setup step fails after the row has + // been installed with status=starting. + Delete(ctx context.Context, gameID string) error + + // ListDueRunning returns every record whose status is `running` + // and whose `next_generation_at <= now`. The order is + // adapter-defined; callers may reorder as needed. + ListDueRunning(ctx context.Context, now time.Time) ([]runtime.RuntimeRecord, error) + + // ListByStatus returns every record currently indexed under status. + // The order is adapter-defined; callers may reorder as needed. + ListByStatus(ctx context.Context, status runtime.Status) ([]runtime.RuntimeRecord, error) + + // List returns every record in the store, ordered by `created_at` + // descending. Used by the `internalListRuntimes` REST handler when no + // status filter is supplied. + List(ctx context.Context) ([]runtime.RuntimeRecord, error) +} + +// UpdateStatusInput stores the arguments required to apply one status +// transition through a RuntimeRecordStore. 
The optional fields are +// pointers so the adapter can distinguish «leave alone» from «write +// the zero value». +type UpdateStatusInput struct { + // GameID identifies the record to mutate. + GameID string + + // ExpectedFrom stores the status the caller believes the record + // currently has. A mismatch results in runtime.ErrConflict. + ExpectedFrom runtime.Status + + // To stores the destination status. + To runtime.Status + + // Now stores the wall-clock used to derive the lifecycle timestamps + // (started_at, stopped_at, finished_at, updated_at) according to + // To. + Now time.Time + + // EngineHealthSummary is the new value of the `engine_health` + // column. Nil leaves the column unchanged. + EngineHealthSummary *string + + // CurrentImageRef is the new value of the `current_image_ref` + // column. Nil leaves the column unchanged. Used by the patch flow + // (Stage 17) when the image reference rotates together with the + // status update. + CurrentImageRef *string + + // CurrentEngineVersion is the new value of the + // `current_engine_version` column. Nil leaves the column unchanged. + // Used by the patch flow when the engine version rotates together + // with the status update. + CurrentEngineVersion *string +} + +// Validate reports whether input contains a structurally valid status +// transition request. Adapters call Validate before touching the store. 
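+//
+// A caller-side sketch of the compare-and-swap transition (store
+// wiring is illustrative):
+//
+//	input := UpdateStatusInput{
+//		GameID:       "game-1",
+//		ExpectedFrom: runtime.StatusRunning,
+//		To:           runtime.StatusFinished,
+//		Now:          time.Now().UTC(),
+//	}
+//	err := store.UpdateStatus(ctx, input)
+//	if errors.Is(err, runtime.ErrConflict) {
+//		// another writer moved the record first: re-read and decide
+//	}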
+func (input UpdateStatusInput) Validate() error { + if strings.TrimSpace(input.GameID) == "" { + return fmt.Errorf("update runtime status: game id must not be empty") + } + if !input.ExpectedFrom.IsKnown() { + return fmt.Errorf( + "update runtime status: expected from status %q is unsupported", + input.ExpectedFrom, + ) + } + if !input.To.IsKnown() { + return fmt.Errorf( + "update runtime status: to status %q is unsupported", + input.To, + ) + } + if err := runtime.Transition(input.ExpectedFrom, input.To); err != nil { + return fmt.Errorf("update runtime status: %w", err) + } + if input.Now.IsZero() { + return fmt.Errorf("update runtime status: now must not be zero") + } + if input.CurrentImageRef != nil && strings.TrimSpace(*input.CurrentImageRef) == "" { + return fmt.Errorf( + "update runtime status: current image ref must not be empty when set", + ) + } + if input.CurrentEngineVersion != nil && strings.TrimSpace(*input.CurrentEngineVersion) == "" { + return fmt.Errorf( + "update runtime status: current engine version must not be empty when set", + ) + } + return nil +} + +// UpdateSchedulingInput stores the arguments required to mutate the +// scheduling columns of one runtime record. The status enum is +// deliberately absent: scheduling and status updates are independent +// operations and the service layer composes them when both must change. +type UpdateSchedulingInput struct { + // GameID identifies the record to mutate. + GameID string + + // NextGenerationAt is the new value of the column. Nil writes SQL + // NULL (used to clear the tick when the runtime leaves running). + NextGenerationAt *time.Time + + // SkipNextTick is the new value of the column. The store overwrites + // the column unconditionally. + SkipNextTick bool + + // CurrentTurn is the new value of the column. Must be non-negative. + CurrentTurn int + + // Now stores the wall-clock used to refresh `updated_at`. 
+ Now time.Time +} + +// Validate reports whether input contains structurally valid scheduling +// arguments. Adapters call Validate before touching the store. +func (input UpdateSchedulingInput) Validate() error { + if strings.TrimSpace(input.GameID) == "" { + return fmt.Errorf("update runtime scheduling: game id must not be empty") + } + if input.CurrentTurn < 0 { + return fmt.Errorf("update runtime scheduling: current turn must not be negative") + } + if input.NextGenerationAt != nil && input.NextGenerationAt.IsZero() { + return fmt.Errorf( + "update runtime scheduling: next generation at must not be zero when set", + ) + } + if input.Now.IsZero() { + return fmt.Errorf("update runtime scheduling: now must not be zero") + } + return nil +} + +// UpdateImageInput stores the arguments required to rotate the engine +// image reference and version of one runtime record without changing +// its status. The store applies a compare-and-swap guard on +// `(game_id, status)` so callers can reject the update if the runtime +// has drifted out of the expected status. +type UpdateImageInput struct { + // GameID identifies the record to mutate. + GameID string + + // ExpectedStatus stores the status the caller believes the record + // currently has. A mismatch results in runtime.ErrConflict. + ExpectedStatus runtime.Status + + // CurrentImageRef stores the new value of the + // `current_image_ref` column. Must not be empty. + CurrentImageRef string + + // CurrentEngineVersion stores the new value of the + // `current_engine_version` column. Must not be empty. + CurrentEngineVersion string + + // Now stores the wall-clock used to refresh `updated_at`. + Now time.Time +} + +// Validate reports whether input contains structurally valid image +// rotation arguments. Adapters call Validate before touching the store. 
+func (input UpdateImageInput) Validate() error { + if strings.TrimSpace(input.GameID) == "" { + return fmt.Errorf("update runtime image: game id must not be empty") + } + if !input.ExpectedStatus.IsKnown() { + return fmt.Errorf( + "update runtime image: expected status %q is unsupported", + input.ExpectedStatus, + ) + } + if strings.TrimSpace(input.CurrentImageRef) == "" { + return fmt.Errorf("update runtime image: current image ref must not be empty") + } + if strings.TrimSpace(input.CurrentEngineVersion) == "" { + return fmt.Errorf("update runtime image: current engine version must not be empty") + } + if input.Now.IsZero() { + return fmt.Errorf("update runtime image: now must not be zero") + } + return nil +} + +// UpdateEngineHealthInput stores the arguments required to rotate the +// `engine_health` column of one runtime record without touching its +// status. The store performs no compare-and-swap so callers can apply +// the update from any runtime status (including stopped and finished) +// to keep the summary current for late-arriving runtime:health_events. +type UpdateEngineHealthInput struct { + // GameID identifies the record to mutate. + GameID string + + // EngineHealthSummary stores the new value of the `engine_health` + // column. The summary is a free-form short string drawn from the + // vocabulary documented in + // `gamemaster/README.md §Persistence Layout` and produced by the + // Stage 18 consumer. + EngineHealthSummary string + + // Now stores the wall-clock used to refresh `updated_at`. + Now time.Time +} + +// Validate reports whether input carries structurally valid arguments +// for an engine-health update. Adapters call Validate before touching +// the store. 
+func (input UpdateEngineHealthInput) Validate() error {
+	if strings.TrimSpace(input.GameID) == "" {
+		return fmt.Errorf("update runtime engine health: game id must not be empty")
+	}
+	if input.Now.IsZero() {
+		return fmt.Errorf("update runtime engine health: now must not be zero")
+	}
+	return nil
+}
diff --git a/gamemaster/internal/ports/runtimerecordstore_test.go b/gamemaster/internal/ports/runtimerecordstore_test.go
new file mode 100644
index 0000000..f2d3761
--- /dev/null
+++ b/gamemaster/internal/ports/runtimerecordstore_test.go
@@ -0,0 +1,122 @@
+package ports
+
+import (
+	"errors"
+	"testing"
+	"time"
+
+	"galaxy/gamemaster/internal/domain/runtime"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func validUpdateStatusInput() UpdateStatusInput {
+	return UpdateStatusInput{
+		GameID:       "game-1",
+		ExpectedFrom: runtime.StatusRunning,
+		To:           runtime.StatusGenerationInProgress,
+		Now:          time.Date(2026, 4, 27, 12, 0, 0, 0, time.UTC),
+	}
+}
+
+func validUpdateSchedulingInput() UpdateSchedulingInput {
+	next := time.Date(2026, 4, 27, 18, 0, 0, 0, time.UTC)
+	return UpdateSchedulingInput{
+		GameID:           "game-1",
+		NextGenerationAt: &next,
+		SkipNextTick:     false,
+		CurrentTurn:      1,
+		Now:              time.Date(2026, 4, 27, 12, 0, 0, 0, time.UTC),
+	}
+}
+
+func TestUpdateStatusInputValidateHappy(t *testing.T) {
+	require.NoError(t, validUpdateStatusInput().Validate())
+}
+
+func TestUpdateStatusInputValidateAcceptsOptionalFields(t *testing.T) {
+	imageRef := "ghcr.io/galaxy/game:v1.2.4"
+	version := "v1.2.4"
+	summary := "healthy"
+
+	input := validUpdateStatusInput()
+	input.CurrentImageRef = &imageRef
+	input.CurrentEngineVersion = &version
+	input.EngineHealthSummary = &summary
+
+	assert.NoError(t, input.Validate())
+}
+
+func TestUpdateStatusInputValidateRejects(t *testing.T) {
+	emptyImageRef := ""
+	emptyVersion := ""
+
+	tests := []struct {
+		name   string
+		mutate func(*UpdateStatusInput)
+	}{
+		{"empty game id", func(i *UpdateStatusInput) { i.GameID = "" }},
+		{"unknown expected from", func(i *UpdateStatusInput) { i.ExpectedFrom = "exotic" }},
+		{"unknown to", func(i *UpdateStatusInput) { i.To = "exotic" }},
+		{"zero now", func(i *UpdateStatusInput) { i.Now = time.Time{} }},
+		{"empty image ref pointer", func(i *UpdateStatusInput) {
+			i.CurrentImageRef = &emptyImageRef
+		}},
+		{"empty engine version pointer", func(i *UpdateStatusInput) {
+			i.CurrentEngineVersion = &emptyVersion
+		}},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			input := validUpdateStatusInput()
+			tt.mutate(&input)
+			assert.Error(t, input.Validate())
+		})
+	}
+}
+
+func TestUpdateStatusInputValidateRejectsForbiddenTransition(t *testing.T) {
+	input := validUpdateStatusInput()
+	input.ExpectedFrom = runtime.StatusFinished
+	input.To = runtime.StatusRunning
+
+	err := input.Validate()
+	require.Error(t, err)
+	assert.True(t, errors.Is(err, runtime.ErrInvalidTransition))
+}
+
+func TestUpdateSchedulingInputValidateHappy(t *testing.T) {
+	require.NoError(t, validUpdateSchedulingInput().Validate())
+}
+
+func TestUpdateSchedulingInputValidateAcceptsNullNextGen(t *testing.T) {
+	input := validUpdateSchedulingInput()
+	input.NextGenerationAt = nil
+	assert.NoError(t, input.Validate())
+}
+
+func TestUpdateSchedulingInputValidateRejects(t *testing.T) {
+	zero := time.Time{}
+
+	tests := []struct {
+		name   string
+		mutate func(*UpdateSchedulingInput)
+	}{
+		{"empty game id", func(i *UpdateSchedulingInput) { i.GameID = "" }},
+		{"negative current turn", func(i *UpdateSchedulingInput) { i.CurrentTurn = -1 }},
+		{"zero next gen pointer", func(i *UpdateSchedulingInput) {
+			i.NextGenerationAt = &zero
+		}},
+		{"zero now", func(i *UpdateSchedulingInput) { i.Now = time.Time{} }},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			input := validUpdateSchedulingInput()
+			tt.mutate(&input)
+			assert.Error(t, input.Validate())
+		})
+	}
+}
diff --git a/gamemaster/internal/ports/streamoffsetstore.go b/gamemaster/internal/ports/streamoffsetstore.go
new file mode 100644
index 0000000..5b4c24f
--- /dev/null
+++ b/gamemaster/internal/ports/streamoffsetstore.go
@@ -0,0 +1,25 @@
+package ports
+
+import "context"
+
+//go:generate go run go.uber.org/mock/mockgen -destination=../adapters/mocks/mock_streamoffsetstore.go -package=mocks galaxy/gamemaster/internal/ports StreamOffsetStore
+
+// StreamOffsetStore persists the last successfully processed Redis
+// Stream entry id per consumer label. Workers call Load on startup to
+// resume from the persisted offset and Save after every successful
+// message handling so the next iteration advances past the
+// just-processed entry. The label is the short logical identifier of
+// the consumer (e.g., `health_events`), not the full stream name; it
+// stays stable when the underlying stream key is renamed.
+type StreamOffsetStore interface {
+	// Load returns the last processed entry id for the consumer
+	// labelled stream when one is stored. The boolean return reports
+	// whether a value was present; implementations must not return an
+	// error for a missing key.
+	Load(ctx context.Context, stream string) (entryID string, found bool, err error)
+
+	// Save stores entryID as the new last processed offset for the
+	// consumer labelled stream. Implementations overwrite any previous
+	// value unconditionally.
+	Save(ctx context.Context, stream, entryID string) error
+}
diff --git a/gamemaster/internal/service/adminbanish/errors.go b/gamemaster/internal/service/adminbanish/errors.go
new file mode 100644
index 0000000..de4409e
--- /dev/null
+++ b/gamemaster/internal/service/adminbanish/errors.go
@@ -0,0 +1,42 @@
+package adminbanish
+
+// Stable error codes returned in `Result.ErrorCode`. The values match
+// the vocabulary frozen by `gamemaster/README.md §Error Model` and
+// `gamemaster/api/internal-openapi.yaml`. Service-layer callers (Stage
+// 19 handlers) import these names rather than redeclare them; renaming
+// any of them is a contract change.
+const (
+	// ErrorCodeInvalidRequest reports that the request envelope failed
+	// structural validation (empty GameID or RaceName).
+	ErrorCodeInvalidRequest = "invalid_request"
+
+	// ErrorCodeRuntimeNotFound reports that no runtime_records row
+	// exists for the requested game id.
+	ErrorCodeRuntimeNotFound = "runtime_not_found"
+
+	// ErrorCodeForbidden reports that the requested race is not in the
+	// game's roster (`player_mappings.GetByRace` returned not-found).
+	ErrorCodeForbidden = "forbidden"
+
+	// ErrorCodeEngineUnreachable reports that the engine
+	// `/admin/race/banish` call returned a 5xx, timed out, or could
+	// not be dispatched.
+	ErrorCodeEngineUnreachable = "engine_unreachable"
+
+	// ErrorCodeEngineValidationError reports that the engine
+	// `/admin/race/banish` call returned a 4xx response (e.g. invalid
+	// race name).
+	ErrorCodeEngineValidationError = "engine_validation_error"
+
+	// ErrorCodeEngineProtocolViolation reports that the engine
+	// response did not match the expected protocol shape.
+	ErrorCodeEngineProtocolViolation = "engine_protocol_violation"
+
+	// ErrorCodeServiceUnavailable reports that a steady-state
+	// dependency (PostgreSQL) was unreachable for this call.
+	ErrorCodeServiceUnavailable = "service_unavailable"
+
+	// ErrorCodeInternal reports an unexpected error not classified by
+	// the other codes.
+	ErrorCodeInternal = "internal_error"
+)
diff --git a/gamemaster/internal/service/adminbanish/service.go b/gamemaster/internal/service/adminbanish/service.go
new file mode 100644
index 0000000..fa32296
--- /dev/null
+++ b/gamemaster/internal/service/adminbanish/service.go
@@ -0,0 +1,317 @@
+// Package adminbanish implements the admin banish service-layer
+// orchestrator owned by Game Master. It is driven by Game Lobby (and,
+// in a later iteration, Admin Service) through
+// `POST /api/v1/internal/games/{game_id}/race/{race_name}/banish` after
+// a permanent membership removal at the platform level. The flow
+// resolves the race against the installed roster, calls the engine
+// `/admin/race/banish` endpoint, and writes one operation_log row.
+//
+// Lifecycle and failure-mode semantics follow `gamemaster/README.md
+// §Lifecycles → Banish`. Design rationale (no runtime status check,
+// missing race surfaces as `forbidden`) is captured in
+// `gamemaster/docs/stage17-admin-operations.md`.
+package adminbanish
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"log/slog"
+	"strings"
+	"time"
+
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/domain/playermapping"
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/logging"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/telemetry"
+)
+
+// Input stores the per-call arguments for one admin banish operation.
+type Input struct {
+	// GameID identifies the runtime the race belongs to.
+	GameID string
+
+	// RaceName stores the platform race name to banish.
+	RaceName string
+
+	// OpSource classifies how the request entered Game Master. Used to
+	// stamp `operation_log.op_source`. Defaults to `lobby_internal`
+	// when missing or unrecognised — Lobby is the only v1 caller.
+	OpSource operation.OpSource
+
+	// SourceRef stores the optional opaque per-source reference (REST
+	// request id). Empty when the caller does not provide one.
+	SourceRef string
+}
+
+// Validate reports whether input carries the structural invariants the
+// service requires before any store is touched.
+func (input Input) Validate() error {
+	if strings.TrimSpace(input.GameID) == "" {
+		return fmt.Errorf("game id must not be empty")
+	}
+	if strings.TrimSpace(input.RaceName) == "" {
+		return fmt.Errorf("race name must not be empty")
+	}
+	return nil
+}
+
+// Result stores the deterministic outcome of one Handle call. Business
+// outcomes flow through Result; the Go-level error return is reserved
+// for non-business failures (nil context, nil receiver).
+type Result struct {
+	// Outcome reports whether the operation completed (success) or
+	// produced a stable failure code.
+	Outcome operation.Outcome
+
+	// ErrorCode stores the stable error code on failure. Empty on
+	// success.
+	ErrorCode string
+
+	// ErrorMessage stores the operator-readable detail on failure.
+	// Empty on success.
+	ErrorMessage string
+}
+
+// IsSuccess reports whether the result represents a successful
+// operation.
+func (result Result) IsSuccess() bool {
+	return result.Outcome == operation.OutcomeSuccess
+}
+
+// Dependencies groups the collaborators required by Service.
+type Dependencies struct {
+	// RuntimeRecords supplies the engine endpoint used for the engine
+	// call.
+	RuntimeRecords ports.RuntimeRecordStore
+
+	// PlayerMappings resolves the race against the installed roster.
+	PlayerMappings ports.PlayerMappingStore
+
+	// OperationLogs records the audit entry.
+	OperationLogs ports.OperationLogStore
+
+	// Engine drives the `/admin/race/banish` call.
+	Engine ports.EngineClient
+
+	// Telemetry is required: every banish call ends with a
+	// `gamemaster.banish.outcomes` counter sample.
+	Telemetry *telemetry.Runtime
+
+	// Logger records structured service-level events. Defaults to
+	// `slog.Default()` when nil.
+	Logger *slog.Logger
+
+	// Clock supplies the wall-clock used for operation timestamps.
+	// Defaults to `time.Now` when nil.
+	Clock func() time.Time
+}
+
+// Service executes the admin banish lifecycle operation.
+type Service struct {
+	runtimeRecords ports.RuntimeRecordStore
+	playerMappings ports.PlayerMappingStore
+	operationLogs  ports.OperationLogStore
+	engine         ports.EngineClient
+
+	telemetry *telemetry.Runtime
+	logger    *slog.Logger
+	clock     func() time.Time
+}
+
+// NewService constructs one Service from deps.
+func NewService(deps Dependencies) (*Service, error) {
+	switch {
+	case deps.RuntimeRecords == nil:
+		return nil, errors.New("new admin banish service: nil runtime records")
+	case deps.PlayerMappings == nil:
+		return nil, errors.New("new admin banish service: nil player mappings")
+	case deps.OperationLogs == nil:
+		return nil, errors.New("new admin banish service: nil operation logs")
+	case deps.Engine == nil:
+		return nil, errors.New("new admin banish service: nil engine client")
+	case deps.Telemetry == nil:
+		return nil, errors.New("new admin banish service: nil telemetry runtime")
+	}
+
+	clock := deps.Clock
+	if clock == nil {
+		clock = time.Now
+	}
+	logger := deps.Logger
+	if logger == nil {
+		logger = slog.Default()
+	}
+	logger = logger.With("service", "gamemaster.adminbanish")
+
+	return &Service{
+		runtimeRecords: deps.RuntimeRecords,
+		playerMappings: deps.PlayerMappings,
+		operationLogs:  deps.OperationLogs,
+		engine:         deps.Engine,
+		telemetry:      deps.Telemetry,
+		logger:         logger,
+		clock:          clock,
+	}, nil
+}
+
+// Handle executes one admin banish operation end-to-end. The Go-level
+// error return is reserved for non-business failures (nil context, nil
+// receiver). Every business outcome flows through Result.
+func (service *Service) Handle(ctx context.Context, input Input) (Result, error) {
+	if service == nil {
+		return Result{}, errors.New("admin banish: nil service")
+	}
+	if ctx == nil {
+		return Result{}, errors.New("admin banish: nil context")
+	}
+
+	opStartedAt := service.clock().UTC()
+
+	if err := input.Validate(); err != nil {
+		return service.recordFailure(ctx, opStartedAt, input,
+			ErrorCodeInvalidRequest, err.Error()), nil
+	}
+
+	record, err := service.runtimeRecords.Get(ctx, input.GameID)
+	switch {
+	case errors.Is(err, runtime.ErrNotFound):
+		return service.recordFailure(ctx, opStartedAt, input,
+			ErrorCodeRuntimeNotFound, "runtime record does not exist"), nil
+	case err != nil:
+		return service.recordFailure(ctx, opStartedAt, input,
+			ErrorCodeServiceUnavailable, fmt.Sprintf("get runtime record: %s", err.Error())), nil
+	}
+
+	if _, err := service.playerMappings.GetByRace(ctx, input.GameID, input.RaceName); err != nil {
+		switch {
+		case errors.Is(err, playermapping.ErrNotFound):
+			return service.recordFailure(ctx, opStartedAt, input,
+				ErrorCodeForbidden, fmt.Sprintf("race %q not in roster", input.RaceName)), nil
+		default:
+			return service.recordFailure(ctx, opStartedAt, input,
+				ErrorCodeServiceUnavailable, fmt.Sprintf("get player mapping by race: %s", err.Error())), nil
+		}
+	}
+
+	if err := service.engine.BanishRace(ctx, record.EngineEndpoint, input.RaceName); err != nil {
+		errorCode := classifyEngineError(err)
+		return service.recordFailure(ctx, opStartedAt, input,
+			errorCode, fmt.Sprintf("engine banish: %s", err.Error())), nil
+	}
+
+	service.appendSuccessLog(ctx, opStartedAt, input)
+	service.telemetry.RecordBanishOutcome(ctx, string(operation.OutcomeSuccess), "")
+
+	logArgs := []any{
+		"game_id", input.GameID,
+		"race_name", input.RaceName,
+		"op_source", string(fallbackOpSource(input.OpSource)),
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.InfoContext(ctx, "race banished", logArgs...)
+
+	return Result{Outcome: operation.OutcomeSuccess}, nil
+}
+
+// recordFailure assembles the failure Result, appends the
+// operation_log failure entry, emits telemetry, and returns the
+// structured outcome.
+func (service *Service) recordFailure(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) Result {
+	service.appendFailureLog(ctx, opStartedAt, input, errorCode, errorMessage)
+	service.telemetry.RecordBanishOutcome(ctx, string(operation.OutcomeFailure), errorCode)
+
+	logArgs := []any{
+		"game_id", input.GameID,
+		"race_name", input.RaceName,
+		"op_source", string(input.OpSource),
+		"error_code", errorCode,
+		"error_message", errorMessage,
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.WarnContext(ctx, "admin banish rejected", logArgs...)
+
+	return Result{
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    errorCode,
+		ErrorMessage: errorMessage,
+	}
+}
+
+// classifyEngineError maps the engine port sentinels to the
+// admin-banish stable error codes.
+func classifyEngineError(err error) string {
+	switch {
+	case errors.Is(err, ports.ErrEngineValidation):
+		return ErrorCodeEngineValidationError
+	case errors.Is(err, ports.ErrEngineProtocolViolation):
+		return ErrorCodeEngineProtocolViolation
+	case errors.Is(err, ports.ErrEngineUnreachable):
+		return ErrorCodeEngineUnreachable
+	default:
+		return ErrorCodeEngineUnreachable
+	}
+}
+
+// appendSuccessLog records the success operation_log entry.
+func (service *Service) appendSuccessLog(ctx context.Context, opStartedAt time.Time, input Input) {
+	finishedAt := service.clock().UTC()
+	service.bestEffortAppend(ctx, operation.OperationEntry{
+		GameID:     input.GameID,
+		OpKind:     operation.OpKindBanish,
+		OpSource:   fallbackOpSource(input.OpSource),
+		SourceRef:  input.SourceRef,
+		Outcome:    operation.OutcomeSuccess,
+		StartedAt:  opStartedAt,
+		FinishedAt: &finishedAt,
+	})
+}
+
+// appendFailureLog records the failure operation_log entry. Skipped
+// when the input game id is empty so the entry validator does not
+// reject an audit row that adds no value.
+func (service *Service) appendFailureLog(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) {
+	if strings.TrimSpace(input.GameID) == "" {
+		return
+	}
+	finishedAt := service.clock().UTC()
+	service.bestEffortAppend(ctx, operation.OperationEntry{
+		GameID:       input.GameID,
+		OpKind:       operation.OpKindBanish,
+		OpSource:     fallbackOpSource(input.OpSource),
+		SourceRef:    input.SourceRef,
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    errorCode,
+		ErrorMessage: errorMessage,
+		StartedAt:    opStartedAt,
+		FinishedAt:   &finishedAt,
+	})
+}
+
+// bestEffortAppend writes one operation_log entry. A failure is logged
+// and discarded; the engine state and runtime row are the source of
+// truth.
+func (service *Service) bestEffortAppend(ctx context.Context, entry operation.OperationEntry) {
+	if _, err := service.operationLogs.Append(ctx, entry); err != nil {
+		service.logger.ErrorContext(ctx, "append operation log",
+			"game_id", entry.GameID,
+			"op_kind", string(entry.OpKind),
+			"outcome", string(entry.Outcome),
+			"error_code", entry.ErrorCode,
+			"err", err.Error(),
+		)
+	}
+}
+
+// fallbackOpSource defaults to `lobby_internal` when the caller did
+// not supply a known op source. Lobby is the only v1 banish caller; an
+// `admin_rest` source is preserved when explicitly set so future Admin
+// Service traffic is identifiable.
+func fallbackOpSource(source operation.OpSource) operation.OpSource {
+	if source.IsKnown() {
+		return source
+	}
+	return operation.OpSourceLobbyInternal
+}
diff --git a/gamemaster/internal/service/adminbanish/service_test.go b/gamemaster/internal/service/adminbanish/service_test.go
new file mode 100644
index 0000000..64e8575
--- /dev/null
+++ b/gamemaster/internal/service/adminbanish/service_test.go
@@ -0,0 +1,415 @@
+package adminbanish_test
+
+import (
+	"context"
+	"errors"
+	"sync"
+	"testing"
+	"time"
+
+	"galaxy/gamemaster/internal/adapters/mocks"
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/domain/playermapping"
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/service/adminbanish"
+	"galaxy/gamemaster/internal/telemetry"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+	"go.uber.org/mock/gomock"
+)
+
+// --- test doubles -----------------------------------------------------
+
+type fakeRuntimeRecords struct {
+	mu     sync.Mutex
+	stored map[string]runtime.RuntimeRecord
+	getErr error
+}
+
+func newFakeRuntimeRecords() *fakeRuntimeRecords {
+	return &fakeRuntimeRecords{stored: map[string]runtime.RuntimeRecord{}}
+}
+
+func (s *fakeRuntimeRecords) seed(record runtime.RuntimeRecord) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.stored[record.GameID] = record
+}
+
+func (s *fakeRuntimeRecords) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.getErr != nil {
+		return runtime.RuntimeRecord{}, s.getErr
+	}
+	record, ok := s.stored[gameID]
+	if !ok {
+		return runtime.RuntimeRecord{}, runtime.ErrNotFound
+	}
+	return record, nil
+}
+
+func (s *fakeRuntimeRecords) Insert(context.Context, runtime.RuntimeRecord) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateStatus(context.Context, ports.UpdateStatusInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateScheduling(context.Context, ports.UpdateSchedulingInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateImage(context.Context, ports.UpdateImageInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateEngineHealth(context.Context, ports.UpdateEngineHealthInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) Delete(context.Context, string) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) ListDueRunning(context.Context, time.Time) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakeRuntimeRecords) ListByStatus(context.Context, runtime.Status) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakeRuntimeRecords) List(context.Context) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used")
+}
+
+type fakePlayerMappings struct {
+	mu     sync.Mutex
+	races  map[string]map[string]playermapping.PlayerMapping
+	getErr error
+}
+
+func newFakePlayerMappings() *fakePlayerMappings {
+	return &fakePlayerMappings{races: map[string]map[string]playermapping.PlayerMapping{}}
+}
+
+func (s *fakePlayerMappings) seedRace(gameID, raceName, userID, uuid string) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if _, ok := s.races[gameID]; !ok {
+		s.races[gameID] = map[string]playermapping.PlayerMapping{}
+	}
+	s.races[gameID][raceName] = playermapping.PlayerMapping{
+		GameID: gameID, UserID: userID, RaceName: raceName, EnginePlayerUUID: uuid,
+		CreatedAt: time.Now(),
+	}
+}
+
+func (s *fakePlayerMappings) BulkInsert(context.Context, []playermapping.PlayerMapping) error {
+	return errors.New("not used")
+}
+func (s *fakePlayerMappings) Get(context.Context, string, string) (playermapping.PlayerMapping, error) {
+	return playermapping.PlayerMapping{}, errors.New("not used")
+}
+func (s *fakePlayerMappings) GetByRace(_ context.Context, gameID, raceName string) (playermapping.PlayerMapping, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.getErr != nil {
+		return playermapping.PlayerMapping{}, s.getErr
+	}
+	gameRaces, ok := s.races[gameID]
+	if !ok {
+		return playermapping.PlayerMapping{}, playermapping.ErrNotFound
+	}
+	rec, ok := gameRaces[raceName]
+	if !ok {
+		return playermapping.PlayerMapping{}, playermapping.ErrNotFound
+	}
+	return rec, nil
+}
+func (s *fakePlayerMappings) ListByGame(context.Context, string) ([]playermapping.PlayerMapping, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakePlayerMappings) DeleteByGame(context.Context, string) error {
+	return errors.New("not used")
+}
+
+type fakeOperationLogs struct {
+	mu      sync.Mutex
+	entries []operation.OperationEntry
+}
+
+func (s *fakeOperationLogs) Append(_ context.Context, entry operation.OperationEntry) (int64, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if err := entry.Validate(); err != nil {
+		return 0, err
+	}
+	s.entries = append(s.entries, entry)
+	return int64(len(s.entries)), nil
+}
+func (s *fakeOperationLogs) ListByGame(context.Context, string, int) ([]operation.OperationEntry, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakeOperationLogs) lastEntry() (operation.OperationEntry, bool) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if len(s.entries) == 0 {
+		return operation.OperationEntry{}, false
+	}
+	return s.entries[len(s.entries)-1], true
+}
+
+// --- harness ----------------------------------------------------------
+
+type harness struct {
+	t         *testing.T
+	ctrl      *gomock.Controller
+	runtime   *fakeRuntimeRecords
+	mappings  *fakePlayerMappings
+	logs      *fakeOperationLogs
+	engine    *mocks.MockEngineClient
+	telemetry *telemetry.Runtime
+	now       time.Time
+	service   *adminbanish.Service
+}
+
+func newHarness(t *testing.T) *harness {
+	t.Helper()
+	ctrl := gomock.NewController(t)
+	telemetryRuntime, err := telemetry.NewWithProviders(nil, nil)
+	require.NoError(t, err)
+	h := &harness{
+		t:         t,
+		ctrl:      ctrl,
+		runtime:   newFakeRuntimeRecords(),
+		mappings:  newFakePlayerMappings(),
+		logs:      &fakeOperationLogs{},
+		engine:    mocks.NewMockEngineClient(ctrl),
+		telemetry: telemetryRuntime,
+		now:       time.Date(2026, time.May, 1, 12, 0, 0, 0, time.UTC),
+	}
+	service, err := adminbanish.NewService(adminbanish.Dependencies{
+		RuntimeRecords: h.runtime,
+		PlayerMappings: h.mappings,
+		OperationLogs:  h.logs,
+		Engine:         h.engine,
+		Telemetry:      h.telemetry,
+		Clock:          func() time.Time { return h.now },
+	})
+	require.NoError(t, err)
+	h.service = service
+	return h
+}
+
+const (
+	testGameID   = "game-001"
+	testRaceName = "Aelinari"
+	testEndpoint = "http://galaxy-game-game-001:8080"
+)
+
+func (h *harness) seedRuntime(status runtime.Status) {
+	created := h.now.Add(-time.Hour)
+	started := h.now.Add(-30 * time.Minute)
+	record := runtime.RuntimeRecord{
+		GameID:               testGameID,
+		Status:               status,
+		EngineEndpoint:       testEndpoint,
+		CurrentImageRef:      "ghcr.io/galaxy/game:v1.2.3",
+		CurrentEngineVersion: "v1.2.3",
+		TurnSchedule:         "0 18 * * *",
+		CurrentTurn:          7,
+		CreatedAt:            created,
+		UpdatedAt:            started,
+		StartedAt:            &started,
+	}
+	h.runtime.seed(record)
+}
+
+func baseInput() adminbanish.Input {
+	return adminbanish.Input{
+		GameID:    testGameID,
+		RaceName:  testRaceName,
+		OpSource:  operation.OpSourceLobbyInternal,
+		SourceRef: "req-banish-001",
+	}
+}
+
+// --- tests ------------------------------------------------------------
+
+func TestNewServiceRejectsMissingDeps(t *testing.T) {
+	telemetryRuntime, err := telemetry.NewWithProviders(nil, nil)
+	require.NoError(t, err)
+	cases := []struct {
+		name string
+		mut  func(*adminbanish.Dependencies)
+	}{
+		{"runtime records", func(d *adminbanish.Dependencies) { d.RuntimeRecords = nil }},
+		{"player mappings", func(d *adminbanish.Dependencies) { d.PlayerMappings = nil }},
+		{"operation logs", func(d *adminbanish.Dependencies) { d.OperationLogs = nil }},
+		{"engine", func(d *adminbanish.Dependencies) { d.Engine = nil }},
+		{"telemetry", func(d *adminbanish.Dependencies) { d.Telemetry = nil }},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			ctrl := gomock.NewController(t)
+			deps := adminbanish.Dependencies{
+				RuntimeRecords: newFakeRuntimeRecords(),
+				PlayerMappings: newFakePlayerMappings(),
+				OperationLogs:  &fakeOperationLogs{},
+				Engine:         mocks.NewMockEngineClient(ctrl),
+				Telemetry:      telemetryRuntime,
+			}
+			tc.mut(&deps)
+			service, err := adminbanish.NewService(deps)
+			require.Error(t, err)
+			require.Nil(t, service)
+		})
+	}
+}
+
+func TestHandleHappyPath(t *testing.T) {
+	h := newHarness(t)
+	h.seedRuntime(runtime.StatusRunning)
+	h.mappings.seedRace(testGameID, testRaceName, "user-1", "uuid-1")
+
+	h.engine.EXPECT().BanishRace(gomock.Any(), testEndpoint, testRaceName).Return(nil)
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	require.True(t, result.IsSuccess(), "want success, got %+v", result)
+
+	entry, ok := h.logs.lastEntry()
+	require.True(t, ok)
+	assert.Equal(t, operation.OpKindBanish, entry.OpKind)
+	assert.Equal(t, operation.OpSourceLobbyInternal, entry.OpSource)
+	assert.Equal(t, operation.OutcomeSuccess, entry.Outcome)
+}
+
+func TestHandleHappyPathOnStoppedRuntime(t *testing.T) {
+	// README §Banish does not check status; the engine call may fail
+	// later with engine_unreachable, but the service runs the call.
+	h := newHarness(t)
+	h.seedRuntime(runtime.StatusStopped)
+	h.mappings.seedRace(testGameID, testRaceName, "user-1", "uuid-1")
+	h.engine.EXPECT().BanishRace(gomock.Any(), testEndpoint, testRaceName).Return(nil)
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	require.True(t, result.IsSuccess())
+}
+
+func TestHandleRuntimeNotFound(t *testing.T) {
+	h := newHarness(t)
+	h.mappings.seedRace(testGameID, testRaceName, "user-1", "uuid-1")
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, adminbanish.ErrorCodeRuntimeNotFound, result.ErrorCode)
+}
+
+func TestHandleForbiddenWhenRaceMissing(t *testing.T) {
+	h := newHarness(t)
+	h.seedRuntime(runtime.StatusRunning)
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, adminbanish.ErrorCodeForbidden, result.ErrorCode)
+
+	entry, ok := h.logs.lastEntry()
+	require.True(t, ok)
+	assert.Equal(t, operation.OutcomeFailure, entry.Outcome)
+	assert.Equal(t, adminbanish.ErrorCodeForbidden, entry.ErrorCode)
+}
+
+func TestHandleEngineUnreachable(t *testing.T) {
+	h := newHarness(t)
+	h.seedRuntime(runtime.StatusRunning)
+	h.mappings.seedRace(testGameID, testRaceName, "user-1", "uuid-1")
+	h.engine.EXPECT().BanishRace(gomock.Any(), testEndpoint, testRaceName).
+		Return(ports.ErrEngineUnreachable)
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, adminbanish.ErrorCodeEngineUnreachable, result.ErrorCode)
+}
+
+func TestHandleEngineValidation(t *testing.T) {
+	h := newHarness(t)
+	h.seedRuntime(runtime.StatusRunning)
+	h.mappings.seedRace(testGameID, testRaceName, "user-1", "uuid-1")
+	h.engine.EXPECT().BanishRace(gomock.Any(), testEndpoint, testRaceName).
+		Return(ports.ErrEngineValidation)
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, adminbanish.ErrorCodeEngineValidationError, result.ErrorCode)
+}
+
+func TestHandleEngineProtocolViolation(t *testing.T) {
+	h := newHarness(t)
+	h.seedRuntime(runtime.StatusRunning)
+	h.mappings.seedRace(testGameID, testRaceName, "user-1", "uuid-1")
+	h.engine.EXPECT().BanishRace(gomock.Any(), testEndpoint, testRaceName).
+		Return(ports.ErrEngineProtocolViolation)
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, adminbanish.ErrorCodeEngineProtocolViolation, result.ErrorCode)
+}
+
+func TestHandleStoreReadFailure(t *testing.T) {
+	h := newHarness(t)
+	h.runtime.getErr = errors.New("connection refused")
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, adminbanish.ErrorCodeServiceUnavailable, result.ErrorCode)
+}
+
+func TestHandleMappingStoreFailure(t *testing.T) {
+	h := newHarness(t)
+	h.seedRuntime(runtime.StatusRunning)
+	h.mappings.getErr = errors.New("connection refused")
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, adminbanish.ErrorCodeServiceUnavailable, result.ErrorCode)
+}
+
+func TestHandleInvalidRequest(t *testing.T) {
+	cases := []struct {
+		name  string
+		input adminbanish.Input
+	}{
+		{"empty game id", adminbanish.Input{GameID: "", RaceName: "X", OpSource: operation.OpSourceLobbyInternal}},
+		{"empty race", adminbanish.Input{GameID: testGameID, RaceName: "", OpSource: operation.OpSourceLobbyInternal}},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			h := newHarness(t)
+			result, err := h.service.Handle(context.Background(), tc.input)
+			require.NoError(t, err)
+			assert.Equal(t, adminbanish.ErrorCodeInvalidRequest, result.ErrorCode)
+		})
+	}
}
+
+func TestHandleNilContextReturnsError(t *testing.T) {
+	h := newHarness(t)
+	_, err := h.service.Handle(nil, baseInput()) //nolint:staticcheck // guard test
+	require.Error(t, err)
+}
+
+func TestHandleDefaultsOpSourceToLobbyInternal(t *testing.T) {
+	h := newHarness(t)
+	h.seedRuntime(runtime.StatusRunning)
+	h.mappings.seedRace(testGameID, testRaceName, "user-1", "uuid-1")
+	h.engine.EXPECT().BanishRace(gomock.Any(), testEndpoint, testRaceName).Return(nil)
+
+	input := baseInput()
+	input.OpSource = ""
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	require.True(t, result.IsSuccess())
+
+	entry, ok := h.logs.lastEntry()
+	require.True(t, ok)
+	assert.Equal(t, operation.OpSourceLobbyInternal, entry.OpSource)
+}
diff --git a/gamemaster/internal/service/adminforce/errors.go b/gamemaster/internal/service/adminforce/errors.go
new file mode 100644
index 0000000..146c843
--- /dev/null
+++ b/gamemaster/internal/service/adminforce/errors.go
@@ -0,0 +1,50 @@
+package adminforce
+
+// Stable error codes returned in `Result.ErrorCode`. The values match
+// the vocabulary frozen by `gamemaster/README.md §Error Model` and
+// `gamemaster/api/internal-openapi.yaml`. Service-layer callers (Stage
+// 19 handlers) import these names rather than redeclare them; renaming
+// any of them is a contract change.
+const (
+	// ErrorCodeInvalidRequest reports that the request envelope failed
+	// structural validation (empty GameID).
+	ErrorCodeInvalidRequest = "invalid_request"
+
+	// ErrorCodeRuntimeNotFound reports that the underlying turn
+	// generation could not find a runtime_records row for the
+	// requested game id.
+	ErrorCodeRuntimeNotFound = "runtime_not_found"
+
+	// ErrorCodeRuntimeNotRunning reports that the runtime is not in
+	// `running`. Force-next-turn requires the same precondition the
+	// scheduler ticker enforces.
+	ErrorCodeRuntimeNotRunning = "runtime_not_running"
+
+	// ErrorCodeConflict reports that the underlying CAS to
+	// `generation_in_progress` lost the race to a concurrent mutation
+	// (admin stop / health observation / scheduler tick).
+	ErrorCodeConflict = "conflict"
+
+	// ErrorCodeEngineUnreachable reports that the engine /admin/turn
+	// call returned a 5xx, timed out, or could not be dispatched.
+	ErrorCodeEngineUnreachable = "engine_unreachable"
+
+	// ErrorCodeEngineValidationError reports that the engine
+	// /admin/turn call returned a 4xx.
+	ErrorCodeEngineValidationError = "engine_validation_error"
+
+	// ErrorCodeEngineProtocolViolation reports that the engine
+	// response did not match the expected schema or the installed
+	// roster.
+	ErrorCodeEngineProtocolViolation = "engine_protocol_violation"
+
+	// ErrorCodeServiceUnavailable reports that a steady-state
+	// dependency (PostgreSQL, Redis, Lobby) was unreachable for this
+	// call. Also covers the post-success scheduling write that
+	// installs `skip_next_tick=true`.
+	ErrorCodeServiceUnavailable = "service_unavailable"
+
+	// ErrorCodeInternal reports an unexpected error not classified by
+	// the other codes.
+	ErrorCodeInternal = "internal_error"
+)
diff --git a/gamemaster/internal/service/adminforce/service.go b/gamemaster/internal/service/adminforce/service.go
new file mode 100644
index 0000000..678ed5a
--- /dev/null
+++ b/gamemaster/internal/service/adminforce/service.go
@@ -0,0 +1,343 @@
+// Package adminforce implements the admin force-next-turn service-layer
+// orchestrator owned by Game Master. It is driven by Admin Service or
+// system administrators through
+// `POST /api/v1/internal/runtimes/{game_id}/force-next-turn` and runs
+// the turn-generation flow synchronously, then sets
+// `runtime_records.skip_next_tick=true` so the next scheduler-driven
+// generation skips one regular cron step.
+//
+// The skip rule guarantees that the inter-turn spacing is never shorter
+// than one schedule interval, regardless of when the force is issued.
+// Lifecycle and failure-mode semantics follow `gamemaster/README.md
+// §Lifecycles → Force-next-turn`. Design rationale is captured in
+// `gamemaster/docs/stage17-admin-operations.md`.
+package adminforce
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"log/slog"
+	"strings"
+	"time"
+
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/logging"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/service/turngeneration"
+	"galaxy/gamemaster/internal/telemetry"
+)
+
+// TurnGenerator narrows `*turngeneration.Service` to the single method
+// adminforce calls. The interface lets tests substitute a stub without
+// constructing the entire turn-generation collaborator graph.
+type TurnGenerator interface {
+	Handle(ctx context.Context, input turngeneration.Input) (turngeneration.Result, error)
+}
+
+// Input stores the per-call arguments for one admin force-next-turn
+// operation.
+type Input struct {
+	// GameID identifies the runtime to advance.
+	GameID string
+
+	// OpSource classifies how the request entered Game Master. Used to
+	// stamp `operation_log.op_source` on both the driver entry and the
+	// inner turn-generation entry. Defaults to `admin_rest` when
+	// missing or unrecognised.
+	OpSource operation.OpSource
+
+	// SourceRef stores the optional opaque per-source reference (REST
+	// request id, admin user id). Empty when the caller does not
+	// provide one.
+	SourceRef string
+}
+
+// Validate reports whether input carries the structural invariants the
+// service requires before the inner turn-generation call.
+func (input Input) Validate() error {
+	if strings.TrimSpace(input.GameID) == "" {
+		return fmt.Errorf("game id must not be empty")
+	}
+	return nil
+}
+
+// Result stores the deterministic outcome of one Handle call. Business
+// outcomes flow through Result; the Go-level error return is reserved
+// for non-business failures (nil context, nil receiver).
+type Result struct {
+	// TurnGeneration carries the inner turn-generation result. Always
+	// populated when Handle returns nil error and the input passed
+	// validation; zero on early-rejection failures
+	// (invalid_request).
+	TurnGeneration turngeneration.Result
+
+	// SkipScheduled reports whether the post-success
+	// `skip_next_tick=true` write landed. False on failure paths and
+	// when the inner turn-generation surfaced a failure.
+	SkipScheduled bool
+
+	// Outcome reports whether the operation completed (success) or
+	// produced a stable failure code.
+	Outcome operation.Outcome
+
+	// ErrorCode stores the stable error code on failure. Empty on
+	// success.
+	ErrorCode string
+
+	// ErrorMessage stores the operator-readable detail on failure.
+	// Empty on success.
+	ErrorMessage string
+}
+
+// IsSuccess reports whether the result represents a successful
+// operation.
+func (result Result) IsSuccess() bool {
+	return result.Outcome == operation.OutcomeSuccess
+}
+
+// Dependencies groups the collaborators required by Service.
+type Dependencies struct {
+	// RuntimeRecords drives the post-success scheduling write that
+	// installs `skip_next_tick=true`.
+	RuntimeRecords ports.RuntimeRecordStore
+
+	// OperationLogs records the audit driver entry
+	// (`op_kind=force_next_turn`).
+	OperationLogs ports.OperationLogStore
+
+	// TurnGeneration runs the inner turn-generation flow. Required.
+	TurnGeneration TurnGenerator
+
+	// Telemetry is required: every adminforce call ends with a
+	// telemetry record on the inner turn-generation counter.
+	Telemetry *telemetry.Runtime
+
+	// Logger records structured service-level events. Defaults to
+	// `slog.Default()` when nil.
+	Logger *slog.Logger
+
+	// Clock supplies the wall-clock used for operation timestamps.
+	// Defaults to `time.Now` when nil.
+	Clock func() time.Time
+}
+
+// Service executes the admin force-next-turn lifecycle operation.
+type Service struct {
+	runtimeRecords ports.RuntimeRecordStore
+	operationLogs  ports.OperationLogStore
+	turnGen        TurnGenerator
+
+	telemetry *telemetry.Runtime
+	logger    *slog.Logger
+	clock     func() time.Time
+}
+
+// NewService constructs one Service from deps.
+func NewService(deps Dependencies) (*Service, error) {
+	switch {
+	case deps.RuntimeRecords == nil:
+		return nil, errors.New("new admin force service: nil runtime records")
+	case deps.OperationLogs == nil:
+		return nil, errors.New("new admin force service: nil operation logs")
+	case deps.TurnGeneration == nil:
+		return nil, errors.New("new admin force service: nil turn generation")
+	case deps.Telemetry == nil:
+		return nil, errors.New("new admin force service: nil telemetry runtime")
+	}
+
+	clock := deps.Clock
+	if clock == nil {
+		clock = time.Now
+	}
+	logger := deps.Logger
+	if logger == nil {
+		logger = slog.Default()
+	}
+	logger = logger.With("service", "gamemaster.adminforce")
+
+	return &Service{
+		runtimeRecords: deps.RuntimeRecords,
+		operationLogs:  deps.OperationLogs,
+		turnGen:        deps.TurnGeneration,
+		telemetry:      deps.Telemetry,
+		logger:         logger,
+		clock:          clock,
+	}, nil
+}
+
+// Handle executes one admin force-next-turn operation end-to-end.
+// The Go-level error return is reserved for non-business failures (nil
+// context, nil receiver). Every business outcome flows through Result.
+func (service *Service) Handle(ctx context.Context, input Input) (Result, error) {
+	if service == nil {
+		return Result{}, errors.New("admin force: nil service")
+	}
+	if ctx == nil {
+		return Result{}, errors.New("admin force: nil context")
+	}
+
+	opStartedAt := service.clock().UTC()
+
+	if err := input.Validate(); err != nil {
+		return service.recordFailure(ctx, opStartedAt, input,
+			ErrorCodeInvalidRequest, err.Error()), nil
+	}
+
+	turnResult, err := service.turnGen.Handle(ctx, turngeneration.Input{
+		GameID:    input.GameID,
+		Trigger:   turngeneration.TriggerForce,
+		OpSource:  fallbackOpSource(input.OpSource),
+		SourceRef: input.SourceRef,
+	})
+	if err != nil {
+		return service.recordFailure(ctx, opStartedAt, input,
+			ErrorCodeInternal, fmt.Sprintf("turn generation: %s", err.Error())), nil
+	}
+	if !turnResult.IsSuccess() {
+		errorCode := turnResult.ErrorCode
+		if errorCode == "" {
+			errorCode = ErrorCodeInternal
+		}
+		return service.recordFailureWithTurn(ctx, opStartedAt, input, turnResult,
+			errorCode, turnResult.ErrorMessage), nil
+	}
+
+	scheduledAt := service.clock().UTC()
+	scheduling := ports.UpdateSchedulingInput{
+		GameID:           input.GameID,
+		NextGenerationAt: turnResult.Record.NextGenerationAt,
+		SkipNextTick:     true,
+		CurrentTurn:      turnResult.Record.CurrentTurn,
+		Now:              scheduledAt,
+	}
+	if err := service.runtimeRecords.UpdateScheduling(ctx, scheduling); err != nil {
+		// The forced turn already landed; the skip flag did not. Report
+		// as a service_unavailable so the admin UI can retry the skip
+		// without re-driving the engine.
+		return service.recordFailureWithTurn(ctx, opStartedAt, input, turnResult,
+			ErrorCodeServiceUnavailable,
+			fmt.Sprintf("update scheduling skip flag: %s", err.Error())), nil
+	}
+
+	service.appendSuccessLog(ctx, opStartedAt, input)
+
+	logArgs := []any{
+		"game_id", input.GameID,
+		"current_turn", turnResult.Record.CurrentTurn,
+		"finished", turnResult.Finished,
+		"op_source", string(fallbackOpSource(input.OpSource)),
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.InfoContext(ctx, "force next turn applied", logArgs...)
+
+	return Result{
+		TurnGeneration: turnResult,
+		SkipScheduled:  true,
+		Outcome:        operation.OutcomeSuccess,
+	}, nil
+}
+
+// recordFailure records a failure that occurred before the inner
+// turn-generation result was available.
+func (service *Service) recordFailure(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) Result {
+	service.appendFailureLog(ctx, opStartedAt, input, errorCode, errorMessage)
+
+	logArgs := []any{
+		"game_id", input.GameID,
+		"op_source", string(input.OpSource),
+		"error_code", errorCode,
+		"error_message", errorMessage,
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.WarnContext(ctx, "force next turn rejected", logArgs...)
+
+	return Result{
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    errorCode,
+		ErrorMessage: errorMessage,
+	}
+}
+
+// recordFailureWithTurn records a failure after the inner turn-
+// generation step ran, propagating its result for caller-side
+// telemetry.
+func (service *Service) recordFailureWithTurn(ctx context.Context, opStartedAt time.Time, input Input, turnResult turngeneration.Result, errorCode string, errorMessage string) Result {
+	service.appendFailureLog(ctx, opStartedAt, input, errorCode, errorMessage)
+
+	logArgs := []any{
+		"game_id", input.GameID,
+		"op_source", string(input.OpSource),
+		"error_code", errorCode,
+		"error_message", errorMessage,
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.WarnContext(ctx, "force next turn failed", logArgs...)
+
+	return Result{
+		TurnGeneration: turnResult,
+		Outcome:        operation.OutcomeFailure,
+		ErrorCode:      errorCode,
+		ErrorMessage:   errorMessage,
+	}
+}
+
+// appendSuccessLog records the success driver operation_log entry.
+func (service *Service) appendSuccessLog(ctx context.Context, opStartedAt time.Time, input Input) {
+	finishedAt := service.clock().UTC()
+	service.bestEffortAppend(ctx, operation.OperationEntry{
+		GameID:     input.GameID,
+		OpKind:     operation.OpKindForceNextTurn,
+		OpSource:   fallbackOpSource(input.OpSource),
+		SourceRef:  input.SourceRef,
+		Outcome:    operation.OutcomeSuccess,
+		StartedAt:  opStartedAt,
+		FinishedAt: &finishedAt,
+	})
+}
+
+// appendFailureLog records the failure driver operation_log entry.
+func (service *Service) appendFailureLog(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) {
+	finishedAt := service.clock().UTC()
+	gameID := input.GameID
+	if strings.TrimSpace(gameID) == "" {
+		// Validation guard: the entry validator rejects empty GameID.
+		// Skip the audit entry instead of crashing the service.
+		return
+	}
+	service.bestEffortAppend(ctx, operation.OperationEntry{
+		GameID:       gameID,
+		OpKind:       operation.OpKindForceNextTurn,
+		OpSource:     fallbackOpSource(input.OpSource),
+		SourceRef:    input.SourceRef,
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    errorCode,
+		ErrorMessage: errorMessage,
+		StartedAt:    opStartedAt,
+		FinishedAt:   &finishedAt,
+	})
+}
+
+// bestEffortAppend writes one operation_log entry. A failure is logged
+// and discarded; the runtime row is the source of truth.
+func (service *Service) bestEffortAppend(ctx context.Context, entry operation.OperationEntry) {
+	if _, err := service.operationLogs.Append(ctx, entry); err != nil {
+		service.logger.ErrorContext(ctx, "append operation log",
+			"game_id", entry.GameID,
+			"op_kind", string(entry.OpKind),
+			"outcome", string(entry.Outcome),
+			"error_code", entry.ErrorCode,
+			"err", err.Error(),
+		)
+	}
+}
+
+// fallbackOpSource defaults to `admin_rest` when the caller did not
+// supply a known op source. Mirrors `gamemaster/README.md §Trusted
+// Surfaces`.
+func fallbackOpSource(source operation.OpSource) operation.OpSource {
+	if source.IsKnown() {
+		return source
+	}
+	return operation.OpSourceAdminRest
+}
diff --git a/gamemaster/internal/service/adminforce/service_test.go b/gamemaster/internal/service/adminforce/service_test.go
new file mode 100644
index 0000000..f16134d
--- /dev/null
+++ b/gamemaster/internal/service/adminforce/service_test.go
@@ -0,0 +1,437 @@
+package adminforce_test
+
+import (
+	"context"
+	"errors"
+	"sync"
+	"testing"
+	"time"
+
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/service/adminforce"
+	"galaxy/gamemaster/internal/service/turngeneration"
+	"galaxy/gamemaster/internal/telemetry"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+// --- test doubles -----------------------------------------------------
+
+type fakeRuntimeRecords struct {
+	mu     sync.Mutex
+	stored map[string]runtime.RuntimeRecord
+	schErr error
+	scheds []ports.UpdateSchedulingInput
+}
+
+func newFakeRuntimeRecords() *fakeRuntimeRecords {
+	return &fakeRuntimeRecords{stored: map[string]runtime.RuntimeRecord{}}
+}
+
+func (s *fakeRuntimeRecords) seed(record runtime.RuntimeRecord) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.stored[record.GameID] = record
+}
+
+func (s *fakeRuntimeRecords) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	record, ok := s.stored[gameID]
+	if !ok {
+		return runtime.RuntimeRecord{}, runtime.ErrNotFound
+	}
+	return record, nil
+}
+
+func (s *fakeRuntimeRecords) Insert(context.Context, runtime.RuntimeRecord) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateStatus(context.Context, ports.UpdateStatusInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateScheduling(_ context.Context, input ports.UpdateSchedulingInput) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.schErr != nil {
+		return s.schErr
+	}
+	record, ok := s.stored[input.GameID]
+	if !ok {
+		return runtime.ErrNotFound
+	}
+	if input.NextGenerationAt != nil {
+		next := *input.NextGenerationAt
+		record.NextGenerationAt = &next
+	} else {
+		record.NextGenerationAt = nil
+	}
+	record.SkipNextTick = input.SkipNextTick
+	record.CurrentTurn = input.CurrentTurn
+	record.UpdatedAt = input.Now
+	s.stored[input.GameID] = record
+	s.scheds = append(s.scheds, input)
+	return nil
+}
+func (s *fakeRuntimeRecords) UpdateImage(context.Context, ports.UpdateImageInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateEngineHealth(context.Context, ports.UpdateEngineHealthInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) Delete(context.Context, string) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) ListDueRunning(context.Context, time.Time) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakeRuntimeRecords) ListByStatus(context.Context, runtime.Status) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakeRuntimeRecords) List(context.Context) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used")
+}
+
+type fakeOperationLogs struct {
+	mu      sync.Mutex
+	entries []operation.OperationEntry
+}
+
+func (s *fakeOperationLogs) Append(_ context.Context, entry operation.OperationEntry) (int64, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if err := entry.Validate(); err != nil {
+		return 0, err
+	}
+	s.entries = append(s.entries, entry)
+	return int64(len(s.entries)), nil
+}
+func (s *fakeOperationLogs) ListByGame(context.Context, string, int) ([]operation.OperationEntry, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakeOperationLogs) snapshot() []operation.OperationEntry {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	out := make([]operation.OperationEntry, len(s.entries))
+	copy(out, s.entries)
+	return out
+}
+func (s *fakeOperationLogs) lastEntry() (operation.OperationEntry, bool) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if len(s.entries) == 0 {
+		return operation.OperationEntry{}, false
+	}
+	return s.entries[len(s.entries)-1], true
+}
+
+type fakeTurnGenerator struct {
+	mu     sync.Mutex
+	calls  []turngeneration.Input
+	result turngeneration.Result
+	err    error
+}
+
+func (s *fakeTurnGenerator) Handle(_ context.Context, input turngeneration.Input) (turngeneration.Result, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.calls = append(s.calls, input)
+	return s.result, s.err
+}
+
+// --- harness ----------------------------------------------------------
+
+type harness struct {
+	t         *testing.T
+	runtime   *fakeRuntimeRecords
+	logs      *fakeOperationLogs
+	turn      *fakeTurnGenerator
+	telemetry *telemetry.Runtime
+	now       time.Time
+	service   *adminforce.Service
+}
+
+func newHarness(t *testing.T) *harness {
+	t.Helper()
+	telemetryRuntime, err := telemetry.NewWithProviders(nil, nil)
+	require.NoError(t, err)
+	h := &harness{
+		t:         t,
+		runtime:   newFakeRuntimeRecords(),
+		logs:      &fakeOperationLogs{},
+		turn:      &fakeTurnGenerator{},
+		telemetry: telemetryRuntime,
+		now:       time.Date(2026, time.May, 1, 12, 0, 0, 0, time.UTC),
+	}
+	service, err := adminforce.NewService(adminforce.Dependencies{
+		RuntimeRecords: h.runtime,
+		OperationLogs:  h.logs,
+		TurnGeneration: h.turn,
+		Telemetry:      h.telemetry,
+		Clock:          func() time.Time { return h.now },
+	})
+	require.NoError(t, err)
+	h.service = service
+	return h
+}
+
+func (h *harness) seedRunningRecord() runtime.RuntimeRecord {
+	created := h.now.Add(-time.Hour)
+	started := h.now.Add(-30 * time.Minute)
+	next := h.now.Add(30 * time.Minute)
+	record := runtime.RuntimeRecord{
+		GameID:               "game-001",
+		Status:               runtime.StatusRunning,
+		EngineEndpoint:       "http://galaxy-game-game-001:8080",
+		CurrentImageRef:      "ghcr.io/galaxy/game:v1.2.3",
+		CurrentEngineVersion: "v1.2.3",
+		TurnSchedule:         "0 18 * * *",
+		CurrentTurn:          5,
+		NextGenerationAt:     &next,
+		EngineHealth:         "healthy",
+		CreatedAt:            created,
+		UpdatedAt:            started,
+		StartedAt:            &started,
+	}
+	h.runtime.seed(record)
+	return record
+}
+
+func baseInput() adminforce.Input {
+	return adminforce.Input{
+		GameID:    "game-001",
+		OpSource:  operation.OpSourceAdminRest,
+		SourceRef: "req-force-001",
+	}
+}
+
+// --- tests ------------------------------------------------------------
+
+func TestNewServiceRejectsMissingDeps(t *testing.T) {
+	telemetryRuntime, err := telemetry.NewWithProviders(nil, nil)
+	require.NoError(t, err)
+	cases := []struct {
+		name string
+		mut  func(*adminforce.Dependencies)
+	}{
+		{"runtime records", func(d *adminforce.Dependencies) { d.RuntimeRecords = nil }},
+		{"operation logs", func(d *adminforce.Dependencies) { d.OperationLogs = nil }},
+		{"turn generation", func(d *adminforce.Dependencies) { d.TurnGeneration = nil }},
+		{"telemetry", func(d *adminforce.Dependencies) { d.Telemetry = nil }},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			deps := adminforce.Dependencies{
+				RuntimeRecords: newFakeRuntimeRecords(),
+				OperationLogs:  &fakeOperationLogs{},
+				TurnGeneration: &fakeTurnGenerator{},
+				Telemetry:      telemetryRuntime,
+			}
+			tc.mut(&deps)
+			service, err := adminforce.NewService(deps)
+			require.Error(t, err)
+			require.Nil(t, service)
+		})
+	}
+}
+
+func TestHandleHappyPathSetsSkipNextTick(t *testing.T) {
+	h := newHarness(t)
+	original := h.seedRunningRecord()
+
+	postTurn := original
+	postTurn.CurrentTurn = original.CurrentTurn + 1
+	nextGen := h.now.Add(time.Hour)
+	postTurn.NextGenerationAt = &nextGen
+	postTurn.SkipNextTick = false
+	h.turn.result = turngeneration.Result{
+		Record:  postTurn,
+		Trigger: turngeneration.TriggerForce,
+		Outcome: operation.OutcomeSuccess,
+	}
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	require.True(t, result.IsSuccess(), "want success, got %+v", result)
+	assert.True(t, result.SkipScheduled)
+
+	// turngeneration.Handle invoked once with TriggerForce.
+	require.Len(t, h.turn.calls, 1)
+	assert.Equal(t, turngeneration.TriggerForce, h.turn.calls[0].Trigger)
+	assert.Equal(t, operation.OpSourceAdminRest, h.turn.calls[0].OpSource)
+	assert.Equal(t, "req-force-001", h.turn.calls[0].SourceRef)
+
+	// Exactly one UpdateScheduling call with skip=true and identical
+	// next_generation_at / current_turn from the inner result.
+	require.Len(t, h.runtime.scheds, 1)
+	scheds := h.runtime.scheds[0]
+	assert.True(t, scheds.SkipNextTick)
+	require.NotNil(t, scheds.NextGenerationAt)
+	assert.True(t, scheds.NextGenerationAt.Equal(nextGen))
+	assert.Equal(t, postTurn.CurrentTurn, scheds.CurrentTurn)
+
+	// Driver entry op_kind=force_next_turn, outcome=success.
+	entry, ok := h.logs.lastEntry()
+	require.True(t, ok)
+	assert.Equal(t, operation.OpKindForceNextTurn, entry.OpKind)
+	assert.Equal(t, operation.OutcomeSuccess, entry.Outcome)
+	assert.Equal(t, "req-force-001", entry.SourceRef)
+}
+
+func TestHandleSetsSkipEvenWhenFinished(t *testing.T) {
+	h := newHarness(t)
+	original := h.seedRunningRecord()
+
+	// Inner turn-generation finished the game: NextGenerationAt is
+	// cleared, status flipped to finished. adminforce still issues the
+	// scheduling write per stage 17 D3.
+	finished := original
+	finished.Status = runtime.StatusFinished
+	finished.NextGenerationAt = nil
+	finished.CurrentTurn = original.CurrentTurn + 1
+	h.turn.result = turngeneration.Result{
+		Record:   finished,
+		Trigger:  turngeneration.TriggerForce,
+		Finished: true,
+		Outcome:  operation.OutcomeSuccess,
+	}
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	require.True(t, result.IsSuccess())
+	require.Len(t, h.runtime.scheds, 1, "skip must still be written even when finished")
+	assert.True(t, h.runtime.scheds[0].SkipNextTick)
+	assert.Nil(t, h.runtime.scheds[0].NextGenerationAt, "must propagate inner result's nil next-gen")
+	assert.Equal(t, finished.CurrentTurn, h.runtime.scheds[0].CurrentTurn)
+}
+
+func TestHandlePropagatesInnerFailure(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord()
+
+	h.turn.result = turngeneration.Result{
+		Trigger:      turngeneration.TriggerForce,
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    turngeneration.ErrorCodeEngineUnreachable,
+		ErrorMessage: "engine 503",
+	}
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, adminforce.ErrorCodeEngineUnreachable, result.ErrorCode)
+	assert.False(t, result.SkipScheduled)
+	assert.Empty(t, h.runtime.scheds, "scheduling must not run after failure")
+
+	// Driver entry recorded with the propagated error code.
+	entry, ok := h.logs.lastEntry()
+	require.True(t, ok)
+	assert.Equal(t, operation.OpKindForceNextTurn, entry.OpKind)
+	assert.Equal(t, operation.OutcomeFailure, entry.Outcome)
+	assert.Equal(t, adminforce.ErrorCodeEngineUnreachable, entry.ErrorCode)
+}
+
+func TestHandlePropagatesRuntimeNotRunning(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord()
+
+	h.turn.result = turngeneration.Result{
+		Trigger:      turngeneration.TriggerForce,
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    turngeneration.ErrorCodeRuntimeNotRunning,
+		ErrorMessage: "runtime status is \"stopped\"",
+	}
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, adminforce.ErrorCodeRuntimeNotRunning, result.ErrorCode)
+}
+
+func TestHandleSchedulingFailureAfterTurn(t *testing.T) {
+	h := newHarness(t)
+	original := h.seedRunningRecord()
+
+	postTurn := original
+	postTurn.CurrentTurn = original.CurrentTurn + 1
+	h.turn.result = turngeneration.Result{
+		Record:  postTurn,
+		Trigger: turngeneration.TriggerForce,
+		Outcome: operation.OutcomeSuccess,
+	}
+	h.runtime.schErr = errors.New("connection lost")
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, adminforce.ErrorCodeServiceUnavailable, result.ErrorCode)
+	assert.False(t, result.SkipScheduled)
+
+	// The driver entry records failure even though turn-generation
+	// committed successfully.
+	entry, ok := h.logs.lastEntry()
+	require.True(t, ok)
+	assert.Equal(t, operation.OutcomeFailure, entry.Outcome)
+	assert.Equal(t, adminforce.ErrorCodeServiceUnavailable, entry.ErrorCode)
+}
+
+func TestHandleTurnGeneratorReturnsError(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord()
+	h.turn.err = errors.New("nil context")
+
+	result, err := h.service.Handle(context.Background(), baseInput())
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, adminforce.ErrorCodeInternal, result.ErrorCode)
+	assert.Empty(t, h.runtime.scheds)
+}
+
+func TestHandleInvalidRequest(t *testing.T) {
+	h := newHarness(t)
+
+	input := baseInput()
+	input.GameID = ""
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, adminforce.ErrorCodeInvalidRequest, result.ErrorCode)
+	assert.Empty(t, h.turn.calls, "turn generator must not be called on invalid input")
+	assert.Empty(t, h.logs.snapshot(), "audit entry skipped when game id missing")
+}
+
+func TestHandleNilContextReturnsError(t *testing.T) {
+	h := newHarness(t)
+	_, err := h.service.Handle(nil, baseInput()) //nolint:staticcheck // guard test
+	require.Error(t, err)
+}
+
+func TestHandleDefaultsOpSource(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord()
+
+	postTurn := runtime.RuntimeRecord{
+		GameID:      "game-001",
+		Status:      runtime.StatusRunning,
+		CurrentTurn: 7,
+	}
+	h.turn.result = turngeneration.Result{
+		Record:  postTurn,
+		Trigger: turngeneration.TriggerForce,
+		Outcome: operation.OutcomeSuccess,
+	}
+
+	input := baseInput()
+	input.OpSource = ""
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	require.True(t, result.IsSuccess())
+	require.Len(t, h.turn.calls, 1)
+	assert.Equal(t, operation.OpSourceAdminRest, h.turn.calls[0].OpSource)
+
+	entry, ok := h.logs.lastEntry()
+	require.True(t, ok)
+	assert.Equal(t, operation.OpSourceAdminRest, entry.OpSource)
+}
diff --git a/gamemaster/internal/service/adminpatch/errors.go b/gamemaster/internal/service/adminpatch/errors.go
new file mode 100644
index 0000000..2562820
--- /dev/null
+++ b/gamemaster/internal/service/adminpatch/errors.go
@@ -0,0 +1,45 @@
+package adminpatch
+
+// Stable error codes returned in `Result.ErrorCode`. The values match
+// the vocabulary frozen by `gamemaster/README.md §Error Model` and
+// `gamemaster/api/internal-openapi.yaml`. Service-layer callers (Stage
+// 19 handlers) import these names rather than redeclare them; renaming
+// any of them is a contract change.
+const (
+	// ErrorCodeInvalidRequest reports that the request envelope failed
+	// structural validation (empty GameID/Version, malformed semver).
+	ErrorCodeInvalidRequest = "invalid_request"
+
+	// ErrorCodeRuntimeNotFound reports that no runtime_records row
+	// exists for the requested game id.
+	ErrorCodeRuntimeNotFound = "runtime_not_found"
+
+	// ErrorCodeRuntimeNotRunning reports that the runtime is not in
+	// `running`. Patch is supported only for runtimes RTM can recreate
+	// in place.
+	ErrorCodeRuntimeNotRunning = "runtime_not_running"
+
+	// ErrorCodeEngineVersionNotFound reports that the requested target
+	// version is missing from the engine_versions registry, or that it
+	// is present but `status=deprecated`.
+	ErrorCodeEngineVersionNotFound = "engine_version_not_found"
+
+	// ErrorCodeSemverPatchOnly reports that the requested target
+	// version differs in major or minor from the current one. Patch
+	// upgrades are constrained to same-major.minor.
+	ErrorCodeSemverPatchOnly = "semver_patch_only"
+
+	// ErrorCodeConflict reports that the runtime's status changed
+	// concurrently between the lookup and the post-RTM image rotation
+	// CAS.
+ ErrorCodeConflict = "conflict" + + // ErrorCodeServiceUnavailable reports that a steady-state + // dependency (PostgreSQL, Runtime Manager) was unreachable for + // this call. + ErrorCodeServiceUnavailable = "service_unavailable" + + // ErrorCodeInternal reports an unexpected error not classified by + // the other codes. + ErrorCodeInternal = "internal_error" +) diff --git a/gamemaster/internal/service/adminpatch/service.go b/gamemaster/internal/service/adminpatch/service.go new file mode 100644 index 0000000..483f629 --- /dev/null +++ b/gamemaster/internal/service/adminpatch/service.go @@ -0,0 +1,375 @@ +// Package adminpatch implements the admin patch service-layer +// orchestrator owned by Game Master. It is driven by Admin Service or +// system administrators through +// `POST /api/v1/internal/runtimes/{game_id}/patch` and tells Runtime +// Manager to recreate the engine container with a new image, then +// rotates `runtime_records.current_image_ref` and +// `runtime_records.current_engine_version` while keeping the runtime in +// `running`. +// +// Lifecycle and failure-mode semantics follow `gamemaster/README.md +// §Lifecycles → Patch`. Design rationale (the dedicated UpdateImage +// port, rejection of deprecated targets, `service_unavailable` mapping +// for RTM failures) is captured in +// `gamemaster/docs/stage17-admin-operations.md`. +package adminpatch + +import ( + "context" + "errors" + "fmt" + "log/slog" + "strings" + "time" + + "galaxy/gamemaster/internal/domain/engineversion" + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/logging" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/telemetry" +) + +// Input stores the per-call arguments for one admin patch operation. +type Input struct { + // GameID identifies the runtime to patch. + GameID string + + // Version stores the target engine version (semver). 
Must be + // present in `engine_versions` with `status=active` and a same + // major.minor as the runtime's current version. + Version string + + // OpSource classifies how the request entered Game Master. Used to + // stamp `operation_log.op_source`. Defaults to `admin_rest` when + // missing or unrecognised. + OpSource operation.OpSource + + // SourceRef stores the optional opaque per-source reference (REST + // request id, admin user id). Empty when the caller does not + // provide one. + SourceRef string +} + +// Validate reports whether input carries the structural invariants the +// service requires before any store is touched. +func (input Input) Validate() error { + if strings.TrimSpace(input.GameID) == "" { + return fmt.Errorf("game id must not be empty") + } + if _, err := engineversion.ParseSemver(input.Version); err != nil { + return fmt.Errorf("version: %w", err) + } + return nil +} + +// Result stores the deterministic outcome of one Handle call. Business +// outcomes flow through Result; the Go-level error return is reserved +// for non-business failures (nil context, nil receiver). +type Result struct { + // Record carries the post-rotation runtime record. Populated on + // success; zero on early-rejection failures. + Record runtime.RuntimeRecord + + // Outcome reports whether the operation completed (success) or + // produced a stable failure code. + Outcome operation.Outcome + + // ErrorCode stores the stable error code on failure. Empty on + // success. + ErrorCode string + + // ErrorMessage stores the operator-readable detail on failure. + // Empty on success. + ErrorMessage string +} + +// IsSuccess reports whether the result represents a successful +// operation. +func (result Result) IsSuccess() bool { + return result.Outcome == operation.OutcomeSuccess +} + +// Dependencies groups the collaborators required by Service. +type Dependencies struct { + // RuntimeRecords drives the row read plus the post-RTM image + // rotation under a CAS guard. 
+ RuntimeRecords ports.RuntimeRecordStore + + // EngineVersions resolves the target version's image ref and + // status. + EngineVersions ports.EngineVersionStore + + // OperationLogs records the audit entry. + OperationLogs ports.OperationLogStore + + // RTM drives the Runtime Manager patch call. + RTM ports.RTMClient + + // Telemetry is required by the audit/log path. The Stage 17 + // service does not introduce a dedicated counter; outcome metrics + // land under the future Admin Service surface. + Telemetry *telemetry.Runtime + + // Logger records structured service-level events. Defaults to + // `slog.Default()` when nil. + Logger *slog.Logger + + // Clock supplies the wall-clock used for operation timestamps. + // Defaults to `time.Now` when nil. + Clock func() time.Time +} + +// Service executes the admin patch lifecycle operation. +type Service struct { + runtimeRecords ports.RuntimeRecordStore + engineVersions ports.EngineVersionStore + operationLogs ports.OperationLogStore + rtm ports.RTMClient + + telemetry *telemetry.Runtime + logger *slog.Logger + clock func() time.Time +} + +// NewService constructs one Service from deps. 
+func NewService(deps Dependencies) (*Service, error) { + switch { + case deps.RuntimeRecords == nil: + return nil, errors.New("new admin patch service: nil runtime records") + case deps.EngineVersions == nil: + return nil, errors.New("new admin patch service: nil engine versions") + case deps.OperationLogs == nil: + return nil, errors.New("new admin patch service: nil operation logs") + case deps.RTM == nil: + return nil, errors.New("new admin patch service: nil rtm client") + case deps.Telemetry == nil: + return nil, errors.New("new admin patch service: nil telemetry runtime") + } + + clock := deps.Clock + if clock == nil { + clock = time.Now + } + logger := deps.Logger + if logger == nil { + logger = slog.Default() + } + logger = logger.With("service", "gamemaster.adminpatch") + + return &Service{ + runtimeRecords: deps.RuntimeRecords, + engineVersions: deps.EngineVersions, + operationLogs: deps.OperationLogs, + rtm: deps.RTM, + telemetry: deps.Telemetry, + logger: logger, + clock: clock, + }, nil +} + +// Handle executes one admin patch operation end-to-end. The Go-level +// error return is reserved for non-business failures (nil context, nil +// receiver). Every business outcome flows through Result. 
+func (service *Service) Handle(ctx context.Context, input Input) (Result, error) { + if service == nil { + return Result{}, errors.New("admin patch: nil service") + } + if ctx == nil { + return Result{}, errors.New("admin patch: nil context") + } + + opStartedAt := service.clock().UTC() + + if err := input.Validate(); err != nil { + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeInvalidRequest, err.Error()), nil + } + + record, err := service.runtimeRecords.Get(ctx, input.GameID) + switch { + case errors.Is(err, runtime.ErrNotFound): + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeRuntimeNotFound, "runtime record does not exist"), nil + case err != nil: + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeServiceUnavailable, fmt.Sprintf("get runtime record: %s", err.Error())), nil + } + if record.Status != runtime.StatusRunning { + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeRuntimeNotRunning, + fmt.Sprintf("runtime status is %q, expected %q", + record.Status, runtime.StatusRunning)), nil + } + + target, err := service.engineVersions.Get(ctx, input.Version) + switch { + case errors.Is(err, engineversion.ErrNotFound): + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeEngineVersionNotFound, + fmt.Sprintf("engine version %q not found", input.Version)), nil + case err != nil: + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeServiceUnavailable, fmt.Sprintf("get engine version: %s", err.Error())), nil + } + if target.Status != engineversion.StatusActive { + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeEngineVersionNotFound, + fmt.Sprintf("engine version %q is %q, expected %q", + input.Version, target.Status, engineversion.StatusActive)), nil + } + + patchOK, semErr := engineversion.IsPatchUpgrade(record.CurrentEngineVersion, input.Version) + if semErr != nil { + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeInvalidRequest, 
fmt.Sprintf("compare semver: %s", semErr.Error())), nil + } + if !patchOK { + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeSemverPatchOnly, + fmt.Sprintf("target %q is not a same-major.minor patch of %q", + input.Version, record.CurrentEngineVersion)), nil + } + + if err := service.rtm.Patch(ctx, input.GameID, target.ImageRef); err != nil { + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeServiceUnavailable, fmt.Sprintf("rtm patch: %s", err.Error())), nil + } + + rotatedAt := service.clock().UTC() + updateErr := service.runtimeRecords.UpdateImage(ctx, ports.UpdateImageInput{ + GameID: input.GameID, + ExpectedStatus: runtime.StatusRunning, + CurrentImageRef: target.ImageRef, + CurrentEngineVersion: input.Version, + Now: rotatedAt, + }) + switch { + case updateErr == nil: + case errors.Is(updateErr, runtime.ErrConflict): + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeConflict, + fmt.Sprintf("runtime status changed during patch: %s", updateErr.Error())), nil + case errors.Is(updateErr, runtime.ErrNotFound): + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeRuntimeNotFound, + fmt.Sprintf("runtime record disappeared during patch: %s", updateErr.Error())), nil + default: + return service.recordFailure(ctx, opStartedAt, input, + ErrorCodeServiceUnavailable, + fmt.Sprintf("update runtime image: %s", updateErr.Error())), nil + } + + persisted, reloadErr := service.runtimeRecords.Get(ctx, input.GameID) + if reloadErr != nil { + // The image rotation already committed; surface the success + // outcome with the in-memory projection so the caller still + // sees the new image_ref / engine_version. 
+ service.logger.WarnContext(ctx, "reload runtime record after patch", + "game_id", input.GameID, + "err", reloadErr.Error(), + ) + persisted = record + persisted.CurrentImageRef = target.ImageRef + persisted.CurrentEngineVersion = input.Version + persisted.UpdatedAt = rotatedAt + } + + service.appendSuccessLog(ctx, opStartedAt, input) + + logArgs := []any{ + "game_id", input.GameID, + "new_image_ref", target.ImageRef, + "new_engine_version", input.Version, + "previous_engine_version", record.CurrentEngineVersion, + "op_source", string(fallbackOpSource(input.OpSource)), + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.InfoContext(ctx, "runtime patched", logArgs...) + + return Result{ + Record: persisted, + Outcome: operation.OutcomeSuccess, + }, nil +} + +// recordFailure assembles the failure Result, appends the +// operation_log failure entry, and returns the structured outcome. +func (service *Service) recordFailure(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) Result { + service.appendFailureLog(ctx, opStartedAt, input, errorCode, errorMessage) + + logArgs := []any{ + "game_id", input.GameID, + "target_version", input.Version, + "op_source", string(input.OpSource), + "error_code", errorCode, + "error_message", errorMessage, + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.WarnContext(ctx, "admin patch rejected", logArgs...) + + return Result{ + Outcome: operation.OutcomeFailure, + ErrorCode: errorCode, + ErrorMessage: errorMessage, + } +} + +// appendSuccessLog records the success operation_log entry. 
+func (service *Service) appendSuccessLog(ctx context.Context, opStartedAt time.Time, input Input) { + finishedAt := service.clock().UTC() + service.bestEffortAppend(ctx, operation.OperationEntry{ + GameID: input.GameID, + OpKind: operation.OpKindPatch, + OpSource: fallbackOpSource(input.OpSource), + SourceRef: input.SourceRef, + Outcome: operation.OutcomeSuccess, + StartedAt: opStartedAt, + FinishedAt: &finishedAt, + }) +} + +// appendFailureLog records the failure operation_log entry. Skipped +// when the input game id is empty so the entry validator does not +// reject an audit row that adds no value. +func (service *Service) appendFailureLog(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) { + if strings.TrimSpace(input.GameID) == "" { + return + } + finishedAt := service.clock().UTC() + service.bestEffortAppend(ctx, operation.OperationEntry{ + GameID: input.GameID, + OpKind: operation.OpKindPatch, + OpSource: fallbackOpSource(input.OpSource), + SourceRef: input.SourceRef, + Outcome: operation.OutcomeFailure, + ErrorCode: errorCode, + ErrorMessage: errorMessage, + StartedAt: opStartedAt, + FinishedAt: &finishedAt, + }) +} + +// bestEffortAppend writes one operation_log entry. A failure is logged +// and discarded; the runtime row is the source of truth. +func (service *Service) bestEffortAppend(ctx context.Context, entry operation.OperationEntry) { + if _, err := service.operationLogs.Append(ctx, entry); err != nil { + service.logger.ErrorContext(ctx, "append operation log", + "game_id", entry.GameID, + "op_kind", string(entry.OpKind), + "outcome", string(entry.Outcome), + "error_code", entry.ErrorCode, + "err", err.Error(), + ) + } +} + +// fallbackOpSource defaults to `admin_rest` when the caller did not +// supply a known op source. Mirrors `gamemaster/README.md §Trusted +// Surfaces`. 
+func fallbackOpSource(source operation.OpSource) operation.OpSource { + if source.IsKnown() { + return source + } + return operation.OpSourceAdminRest +} diff --git a/gamemaster/internal/service/adminpatch/service_test.go b/gamemaster/internal/service/adminpatch/service_test.go new file mode 100644 index 0000000..277df5a --- /dev/null +++ b/gamemaster/internal/service/adminpatch/service_test.go @@ -0,0 +1,448 @@ +package adminpatch_test + +import ( + "context" + "errors" + "sync" + "testing" + "time" + + "galaxy/gamemaster/internal/adapters/mocks" + "galaxy/gamemaster/internal/domain/engineversion" + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/service/adminpatch" + "galaxy/gamemaster/internal/telemetry" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + "go.uber.org/mock/gomock" +) + +// --- test doubles ----------------------------------------------------- + +type fakeRuntimeRecords struct { + mu sync.Mutex + stored map[string]runtime.RuntimeRecord + getErr error + imgErr error + images []ports.UpdateImageInput +} + +func newFakeRuntimeRecords() *fakeRuntimeRecords { + return &fakeRuntimeRecords{stored: map[string]runtime.RuntimeRecord{}} +} + +func (s *fakeRuntimeRecords) seed(record runtime.RuntimeRecord) { + s.mu.Lock() + defer s.mu.Unlock() + s.stored[record.GameID] = record +} + +func (s *fakeRuntimeRecords) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.getErr != nil { + return runtime.RuntimeRecord{}, s.getErr + } + record, ok := s.stored[gameID] + if !ok { + return runtime.RuntimeRecord{}, runtime.ErrNotFound + } + return record, nil +} + +func (s *fakeRuntimeRecords) Insert(context.Context, runtime.RuntimeRecord) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateStatus(context.Context, ports.UpdateStatusInput) error 
{ + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateScheduling(context.Context, ports.UpdateSchedulingInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateImage(_ context.Context, input ports.UpdateImageInput) error { + s.mu.Lock() + defer s.mu.Unlock() + if s.imgErr != nil { + s.images = append(s.images, input) + return s.imgErr + } + record, ok := s.stored[input.GameID] + if !ok { + s.images = append(s.images, input) + return runtime.ErrNotFound + } + if record.Status != input.ExpectedStatus { + s.images = append(s.images, input) + return runtime.ErrConflict + } + record.CurrentImageRef = input.CurrentImageRef + record.CurrentEngineVersion = input.CurrentEngineVersion + record.UpdatedAt = input.Now + s.stored[input.GameID] = record + s.images = append(s.images, input) + return nil +} +func (s *fakeRuntimeRecords) UpdateEngineHealth(context.Context, ports.UpdateEngineHealthInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) Delete(context.Context, string) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) ListDueRunning(context.Context, time.Time) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} +func (s *fakeRuntimeRecords) ListByStatus(context.Context, runtime.Status) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} +func (s *fakeRuntimeRecords) List(context.Context) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} + +type fakeEngineVersions struct { + mu sync.Mutex + versions map[string]engineversion.EngineVersion + getErr error +} + +func newFakeEngineVersions() *fakeEngineVersions { + return &fakeEngineVersions{versions: map[string]engineversion.EngineVersion{}} +} + +func (s *fakeEngineVersions) seed(record engineversion.EngineVersion) { + s.mu.Lock() + defer s.mu.Unlock() + s.versions[record.Version] = record +} + +func (s *fakeEngineVersions) Get(_ context.Context, version 
string) (engineversion.EngineVersion, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.getErr != nil { + return engineversion.EngineVersion{}, s.getErr + } + rec, ok := s.versions[version] + if !ok { + return engineversion.EngineVersion{}, engineversion.ErrNotFound + } + return rec, nil +} + +func (s *fakeEngineVersions) List(context.Context, *engineversion.Status) ([]engineversion.EngineVersion, error) { + return nil, errors.New("not used") +} +func (s *fakeEngineVersions) Insert(context.Context, engineversion.EngineVersion) error { + return errors.New("not used") +} +func (s *fakeEngineVersions) Update(context.Context, ports.UpdateEngineVersionInput) error { + return errors.New("not used") +} +func (s *fakeEngineVersions) Deprecate(context.Context, string, time.Time) error { + return errors.New("not used") +} +func (s *fakeEngineVersions) Delete(context.Context, string) error { + return errors.New("not used") +} +func (s *fakeEngineVersions) IsReferencedByActiveRuntime(context.Context, string) (bool, error) { + return false, errors.New("not used") +} + +type fakeOperationLogs struct { + mu sync.Mutex + entries []operation.OperationEntry +} + +func (s *fakeOperationLogs) Append(_ context.Context, entry operation.OperationEntry) (int64, error) { + s.mu.Lock() + defer s.mu.Unlock() + if err := entry.Validate(); err != nil { + return 0, err + } + s.entries = append(s.entries, entry) + return int64(len(s.entries)), nil +} +func (s *fakeOperationLogs) ListByGame(context.Context, string, int) ([]operation.OperationEntry, error) { + return nil, errors.New("not used") +} +func (s *fakeOperationLogs) lastEntry() (operation.OperationEntry, bool) { + s.mu.Lock() + defer s.mu.Unlock() + if len(s.entries) == 0 { + return operation.OperationEntry{}, false + } + return s.entries[len(s.entries)-1], true +} +func (s *fakeOperationLogs) snapshot() []operation.OperationEntry { + s.mu.Lock() + defer s.mu.Unlock() + out := make([]operation.OperationEntry, len(s.entries)) + 
copy(out, s.entries) + return out +} + +// --- harness ---------------------------------------------------------- + +type harness struct { + t *testing.T + ctrl *gomock.Controller + runtime *fakeRuntimeRecords + versions *fakeEngineVersions + logs *fakeOperationLogs + rtm *mocks.MockRTMClient + telemetry *telemetry.Runtime + now time.Time + service *adminpatch.Service +} + +func newHarness(t *testing.T) *harness { + t.Helper() + ctrl := gomock.NewController(t) + telemetryRuntime, err := telemetry.NewWithProviders(nil, nil) + require.NoError(t, err) + h := &harness{ + t: t, + ctrl: ctrl, + runtime: newFakeRuntimeRecords(), + versions: newFakeEngineVersions(), + logs: &fakeOperationLogs{}, + rtm: mocks.NewMockRTMClient(ctrl), + telemetry: telemetryRuntime, + now: time.Date(2026, time.May, 1, 12, 0, 0, 0, time.UTC), + } + service, err := adminpatch.NewService(adminpatch.Dependencies{ + RuntimeRecords: h.runtime, + EngineVersions: h.versions, + OperationLogs: h.logs, + RTM: h.rtm, + Telemetry: h.telemetry, + Clock: func() time.Time { return h.now }, + }) + require.NoError(t, err) + h.service = service + return h +} + +func (h *harness) seedRunningOnVersion(version, image string) runtime.RuntimeRecord { + created := h.now.Add(-time.Hour) + started := h.now.Add(-30 * time.Minute) + next := h.now.Add(30 * time.Minute) + record := runtime.RuntimeRecord{ + GameID: "game-001", + Status: runtime.StatusRunning, + EngineEndpoint: "http://galaxy-game-game-001:8080", + CurrentImageRef: image, + CurrentEngineVersion: version, + TurnSchedule: "0 18 * * *", + CurrentTurn: 7, + NextGenerationAt: &next, + EngineHealth: "healthy", + CreatedAt: created, + UpdatedAt: started, + StartedAt: &started, + } + h.runtime.seed(record) + return record +} + +func (h *harness) seedTarget(version, image string, status engineversion.Status) { + h.versions.seed(engineversion.EngineVersion{ + Version: version, + ImageRef: image, + Status: status, + CreatedAt: h.now.Add(-24 * time.Hour), + UpdatedAt: 
h.now.Add(-24 * time.Hour), + }) +} + +func baseInput(version string) adminpatch.Input { + return adminpatch.Input{ + GameID: "game-001", + Version: version, + OpSource: operation.OpSourceAdminRest, + SourceRef: "req-patch-001", + } +} + +// --- tests ------------------------------------------------------------ + +func TestNewServiceRejectsMissingDeps(t *testing.T) { + telemetryRuntime, err := telemetry.NewWithProviders(nil, nil) + require.NoError(t, err) + cases := []struct { + name string + mut func(*adminpatch.Dependencies) + }{ + {"runtime records", func(d *adminpatch.Dependencies) { d.RuntimeRecords = nil }}, + {"engine versions", func(d *adminpatch.Dependencies) { d.EngineVersions = nil }}, + {"operation logs", func(d *adminpatch.Dependencies) { d.OperationLogs = nil }}, + {"rtm", func(d *adminpatch.Dependencies) { d.RTM = nil }}, + {"telemetry", func(d *adminpatch.Dependencies) { d.Telemetry = nil }}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + ctrl := gomock.NewController(t) + deps := adminpatch.Dependencies{ + RuntimeRecords: newFakeRuntimeRecords(), + EngineVersions: newFakeEngineVersions(), + OperationLogs: &fakeOperationLogs{}, + RTM: mocks.NewMockRTMClient(ctrl), + Telemetry: telemetryRuntime, + } + tc.mut(&deps) + service, err := adminpatch.NewService(deps) + require.Error(t, err) + require.Nil(t, service) + }) + } +} + +func TestHandleHappyPathRotatesImage(t *testing.T) { + h := newHarness(t) + h.seedRunningOnVersion("v1.2.3", "ghcr.io/galaxy/game:v1.2.3") + h.seedTarget("v1.2.4", "ghcr.io/galaxy/game:v1.2.4", engineversion.StatusActive) + + h.rtm.EXPECT().Patch(gomock.Any(), "game-001", "ghcr.io/galaxy/game:v1.2.4").Return(nil) + + result, err := h.service.Handle(context.Background(), baseInput("v1.2.4")) + require.NoError(t, err) + require.True(t, result.IsSuccess(), "want success, got %+v", result) + assert.Equal(t, "ghcr.io/galaxy/game:v1.2.4", result.Record.CurrentImageRef) + assert.Equal(t, "v1.2.4", 
result.Record.CurrentEngineVersion) + assert.Equal(t, runtime.StatusRunning, result.Record.Status) + + require.Len(t, h.runtime.images, 1) + assert.Equal(t, runtime.StatusRunning, h.runtime.images[0].ExpectedStatus) + assert.Equal(t, "ghcr.io/galaxy/game:v1.2.4", h.runtime.images[0].CurrentImageRef) + assert.Equal(t, "v1.2.4", h.runtime.images[0].CurrentEngineVersion) + + entry, ok := h.logs.lastEntry() + require.True(t, ok) + assert.Equal(t, operation.OpKindPatch, entry.OpKind) + assert.Equal(t, operation.OutcomeSuccess, entry.Outcome) +} + +func TestHandleRuntimeNotFound(t *testing.T) { + h := newHarness(t) + h.seedTarget("v1.2.4", "ghcr.io/galaxy/game:v1.2.4", engineversion.StatusActive) + + result, err := h.service.Handle(context.Background(), baseInput("v1.2.4")) + require.NoError(t, err) + assert.Equal(t, adminpatch.ErrorCodeRuntimeNotFound, result.ErrorCode) +} + +func TestHandleRuntimeNotRunning(t *testing.T) { + h := newHarness(t) + rec := h.seedRunningOnVersion("v1.2.3", "ghcr.io/galaxy/game:v1.2.3") + rec.Status = runtime.StatusStopped + h.runtime.seed(rec) + h.seedTarget("v1.2.4", "ghcr.io/galaxy/game:v1.2.4", engineversion.StatusActive) + + result, err := h.service.Handle(context.Background(), baseInput("v1.2.4")) + require.NoError(t, err) + assert.Equal(t, adminpatch.ErrorCodeRuntimeNotRunning, result.ErrorCode) + assert.Empty(t, h.runtime.images, "no UpdateImage when status precondition fails") +} + +func TestHandleEngineVersionMissing(t *testing.T) { + h := newHarness(t) + h.seedRunningOnVersion("v1.2.3", "ghcr.io/galaxy/game:v1.2.3") + + result, err := h.service.Handle(context.Background(), baseInput("v1.2.4")) + require.NoError(t, err) + assert.Equal(t, adminpatch.ErrorCodeEngineVersionNotFound, result.ErrorCode) +} + +func TestHandleEngineVersionDeprecated(t *testing.T) { + h := newHarness(t) + h.seedRunningOnVersion("v1.2.3", "ghcr.io/galaxy/game:v1.2.3") + h.seedTarget("v1.2.4", "ghcr.io/galaxy/game:v1.2.4", engineversion.StatusDeprecated) + + 
result, err := h.service.Handle(context.Background(), baseInput("v1.2.4")) + require.NoError(t, err) + assert.Equal(t, adminpatch.ErrorCodeEngineVersionNotFound, result.ErrorCode) + assert.Contains(t, result.ErrorMessage, "deprecated") +} + +func TestHandleSemverPatchOnlyMajor(t *testing.T) { + h := newHarness(t) + h.seedRunningOnVersion("v1.2.3", "ghcr.io/galaxy/game:v1.2.3") + h.seedTarget("v2.0.0", "ghcr.io/galaxy/game:v2.0.0", engineversion.StatusActive) + + result, err := h.service.Handle(context.Background(), baseInput("v2.0.0")) + require.NoError(t, err) + assert.Equal(t, adminpatch.ErrorCodeSemverPatchOnly, result.ErrorCode) + assert.Empty(t, h.runtime.images) +} + +func TestHandleSemverPatchOnlyMinor(t *testing.T) { + h := newHarness(t) + h.seedRunningOnVersion("v1.2.3", "ghcr.io/galaxy/game:v1.2.3") + h.seedTarget("v1.3.0", "ghcr.io/galaxy/game:v1.3.0", engineversion.StatusActive) + + result, err := h.service.Handle(context.Background(), baseInput("v1.3.0")) + require.NoError(t, err) + assert.Equal(t, adminpatch.ErrorCodeSemverPatchOnly, result.ErrorCode) +} + +func TestHandleRTMUnavailable(t *testing.T) { + h := newHarness(t) + h.seedRunningOnVersion("v1.2.3", "ghcr.io/galaxy/game:v1.2.3") + h.seedTarget("v1.2.4", "ghcr.io/galaxy/game:v1.2.4", engineversion.StatusActive) + + h.rtm.EXPECT().Patch(gomock.Any(), "game-001", "ghcr.io/galaxy/game:v1.2.4"). 
+ Return(ports.ErrRTMUnavailable) + + result, err := h.service.Handle(context.Background(), baseInput("v1.2.4")) + require.NoError(t, err) + assert.Equal(t, adminpatch.ErrorCodeServiceUnavailable, result.ErrorCode) + assert.Empty(t, h.runtime.images, "no UpdateImage when RTM fails") +} + +func TestHandleCASLostAfterRTM(t *testing.T) { + h := newHarness(t) + h.seedRunningOnVersion("v1.2.3", "ghcr.io/galaxy/game:v1.2.3") + h.seedTarget("v1.2.4", "ghcr.io/galaxy/game:v1.2.4", engineversion.StatusActive) + + h.rtm.EXPECT().Patch(gomock.Any(), "game-001", "ghcr.io/galaxy/game:v1.2.4").Return(nil) + h.runtime.imgErr = runtime.ErrConflict + + result, err := h.service.Handle(context.Background(), baseInput("v1.2.4")) + require.NoError(t, err) + assert.Equal(t, adminpatch.ErrorCodeConflict, result.ErrorCode) + require.Len(t, h.runtime.images, 1) +} + +func TestHandleInvalidRequest(t *testing.T) { + cases := []struct { + name string + input adminpatch.Input + }{ + {"empty game id", adminpatch.Input{GameID: "", Version: "v1.2.4", OpSource: operation.OpSourceAdminRest}}, + {"malformed version", adminpatch.Input{GameID: "game-001", Version: "not-a-semver", OpSource: operation.OpSourceAdminRest}}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + h := newHarness(t) + result, err := h.service.Handle(context.Background(), tc.input) + require.NoError(t, err) + assert.Equal(t, adminpatch.ErrorCodeInvalidRequest, result.ErrorCode) + }) + } +} + +func TestHandleNilContextReturnsError(t *testing.T) { + h := newHarness(t) + _, err := h.service.Handle(nil, baseInput("v1.2.4")) //nolint:staticcheck // guard test + require.Error(t, err) +} + +func TestHandleStoreReadFailure(t *testing.T) { + h := newHarness(t) + h.runtime.getErr = errors.New("connection refused") + + result, err := h.service.Handle(context.Background(), baseInput("v1.2.4")) + require.NoError(t, err) + assert.Equal(t, adminpatch.ErrorCodeServiceUnavailable, result.ErrorCode) +} diff --git 
a/gamemaster/internal/service/adminstop/errors.go b/gamemaster/internal/service/adminstop/errors.go
new file mode 100644
index 0000000..3c31746
--- /dev/null
+++ b/gamemaster/internal/service/adminstop/errors.go
@@ -0,0 +1,48 @@
+package adminstop
+
+// Stable error codes returned in `Result.ErrorCode`. The values match
+// the vocabulary frozen by `gamemaster/README.md §Error Model` and
+// `gamemaster/api/internal-openapi.yaml`. Service-layer callers (Stage
+// 19 handlers) import these names rather than redeclare them; renaming
+// any of them is a contract change.
+const (
+    // ErrorCodeInvalidRequest reports that the request envelope failed
+    // structural validation (empty GameID, unknown stop reason).
+    ErrorCodeInvalidRequest = "invalid_request"
+
+    // ErrorCodeRuntimeNotFound reports that no runtime_records row
+    // exists for the requested game id.
+    ErrorCodeRuntimeNotFound = "runtime_not_found"
+
+    // ErrorCodeConflict reports that the runtime is in a status that
+    // cannot transition to `stopped` (currently only `starting`), or
+    // that a CAS guard mid-flow lost the race to a concurrent mutation.
+    ErrorCodeConflict = "conflict"
+
+    // ErrorCodeServiceUnavailable reports that a steady-state dependency
+    // (PostgreSQL, Runtime Manager) was unreachable for this call.
+    ErrorCodeServiceUnavailable = "service_unavailable"
+
+    // ErrorCodeInternal reports an unexpected error not classified by
+    // the other codes.
+    ErrorCodeInternal = "internal_error"
+)
+
+// Allowed values of Input.Reason mirror the README §Stop wording
+// «reason ∈ {admin_request, finished, timeout}». Callers that pass an
+// empty string get the documented default `admin_request`.
+const (
+    // ReasonAdminRequest is the operator-driven stop reason and the
+    // default when Input.Reason is empty.
+    ReasonAdminRequest = "admin_request"
+
+    // ReasonFinished is reserved for callers that wrap a
+    // finish-detected stop (currently unused; documented for
+    // completeness).
+    ReasonFinished = "finished"
+
+    // ReasonTimeout is reserved for callers that wrap an automated
+    // timeout-driven stop (currently unused; documented for
+    // completeness).
+    ReasonTimeout = "timeout"
+)
diff --git a/gamemaster/internal/service/adminstop/service.go b/gamemaster/internal/service/adminstop/service.go
new file mode 100644
index 0000000..137b67f
--- /dev/null
+++ b/gamemaster/internal/service/adminstop/service.go
@@ -0,0 +1,396 @@
+// Package adminstop implements the admin stop service-layer
+// orchestrator owned by Game Master. It is driven by Admin Service or
+// system administrators through
+// `POST /api/v1/internal/runtimes/{game_id}/stop` and tells Runtime
+// Manager to stop the game's container while transitioning the runtime
+// record to `stopped`.
+//
+// Lifecycle and failure-mode semantics follow `gamemaster/README.md
+// §Lifecycles → Stop`. The idempotent-on-terminal-status and
+// conflict-on-starting rules are recorded in
+// `gamemaster/docs/stage17-admin-operations.md`.
+package adminstop
+
+import (
+    "context"
+    "errors"
+    "fmt"
+    "log/slog"
+    "strings"
+    "time"
+
+    "galaxy/gamemaster/internal/domain/operation"
+    "galaxy/gamemaster/internal/domain/runtime"
+    "galaxy/gamemaster/internal/logging"
+    "galaxy/gamemaster/internal/ports"
+    "galaxy/gamemaster/internal/telemetry"
+)
+
+// Input stores the per-call arguments for one admin stop operation.
+type Input struct {
+    // GameID identifies the runtime to stop.
+    GameID string
+
+    // Reason classifies the stop. Empty defaults to
+    // `admin_request`. Allowed values: `admin_request`, `finished`,
+    // `timeout`.
+    Reason string
+
+    // OpSource classifies how the request entered Game Master. Used to
+    // stamp `operation_log.op_source`. Defaults to `admin_rest` when
+    // missing or unrecognised.
+    OpSource operation.OpSource
+
+    // SourceRef stores the optional opaque per-source reference (REST
+    // request id, admin user id). Empty when the caller does not
+    // provide one.
+    SourceRef string
+}
+
+// Validate reports whether input carries the structural invariants the
+// service requires before any store is touched.
+func (input Input) Validate() error {
+    if strings.TrimSpace(input.GameID) == "" {
+        return fmt.Errorf("game id must not be empty")
+    }
+    switch strings.TrimSpace(input.Reason) {
+    case "", ReasonAdminRequest, ReasonFinished, ReasonTimeout:
+        return nil
+    default:
+        return fmt.Errorf("reason %q is unsupported", input.Reason)
+    }
+}
+
+// Result stores the deterministic outcome of one Handle call. Business
+// outcomes flow through Result; the Go-level error return is reserved
+// for non-business failures (nil context, nil receiver).
+type Result struct {
+    // Record carries the runtime record observed (and on success
+    // transitioned) by the operation. Populated on success and on the
+    // idempotent no-op branch; zero on early-rejection failures
+    // (invalid_request, runtime_not_found).
+    Record runtime.RuntimeRecord
+
+    // Outcome reports whether the operation completed (success) or
+    // produced a stable failure code.
+    Outcome operation.Outcome
+
+    // ErrorCode stores the stable error code on failure. Empty on
+    // success.
+    ErrorCode string
+
+    // ErrorMessage stores the operator-readable detail on failure.
+    // Empty on success.
+    ErrorMessage string
+}
+
+// IsSuccess reports whether the result represents a successful
+// operation.
+func (result Result) IsSuccess() bool {
+    return result.Outcome == operation.OutcomeSuccess
+}
+
+// Dependencies groups the collaborators required by Service.
+type Dependencies struct {
+    // RuntimeRecords drives the read of the current row plus the CAS
+    // transition to `stopped`.
+    RuntimeRecords ports.RuntimeRecordStore
+
+    // OperationLogs records the audit entry for the operation.
+    OperationLogs ports.OperationLogStore
+
+    // RTM drives the Runtime Manager stop call.
+    RTM ports.RTMClient
+
+    // LobbyEvents publishes the post-success
+    // `runtime_snapshot_update` to `gm:lobby_events`.
+    LobbyEvents ports.LobbyEventsPublisher
+
+    // Telemetry is required by the lobby-events publication helper.
+    Telemetry *telemetry.Runtime
+
+    // Logger records structured service-level events. Defaults to
+    // `slog.Default()` when nil.
+    Logger *slog.Logger
+
+    // Clock supplies the wall-clock used for operation timestamps.
+    // Defaults to `time.Now` when nil.
+    Clock func() time.Time
+}
+
+// Service executes the admin stop lifecycle operation.
+type Service struct {
+    runtimeRecords ports.RuntimeRecordStore
+    operationLogs  ports.OperationLogStore
+    rtm            ports.RTMClient
+    lobbyEvents    ports.LobbyEventsPublisher
+
+    telemetry *telemetry.Runtime
+    logger    *slog.Logger
+    clock     func() time.Time
+}
+
+// NewService constructs one Service from deps.
+func NewService(deps Dependencies) (*Service, error) {
+    switch {
+    case deps.RuntimeRecords == nil:
+        return nil, errors.New("new admin stop service: nil runtime records")
+    case deps.OperationLogs == nil:
+        return nil, errors.New("new admin stop service: nil operation logs")
+    case deps.RTM == nil:
+        return nil, errors.New("new admin stop service: nil rtm client")
+    case deps.LobbyEvents == nil:
+        return nil, errors.New("new admin stop service: nil lobby events publisher")
+    case deps.Telemetry == nil:
+        return nil, errors.New("new admin stop service: nil telemetry runtime")
+    }
+
+    clock := deps.Clock
+    if clock == nil {
+        clock = time.Now
+    }
+    logger := deps.Logger
+    if logger == nil {
+        logger = slog.Default()
+    }
+    logger = logger.With("service", "gamemaster.adminstop")
+
+    return &Service{
+        runtimeRecords: deps.RuntimeRecords,
+        operationLogs:  deps.OperationLogs,
+        rtm:            deps.RTM,
+        lobbyEvents:    deps.LobbyEvents,
+        telemetry:      deps.Telemetry,
+        logger:         logger,
+        clock:          clock,
+    }, nil
+}
+
+// Handle executes one admin stop operation end-to-end. The Go-level
+// error return is reserved for non-business failures (nil context, nil
+// receiver). Every business outcome flows through Result.
+func (service *Service) Handle(ctx context.Context, input Input) (Result, error) {
+    if service == nil {
+        return Result{}, errors.New("admin stop: nil service")
+    }
+    if ctx == nil {
+        return Result{}, errors.New("admin stop: nil context")
+    }
+
+    opStartedAt := service.clock().UTC()
+
+    if err := input.Validate(); err != nil {
+        return service.recordEarlyFailure(ctx, opStartedAt, input,
+            ErrorCodeInvalidRequest, err.Error()), nil
+    }
+
+    reason := strings.TrimSpace(input.Reason)
+    if reason == "" {
+        reason = ReasonAdminRequest
+    }
+
+    record, err := service.runtimeRecords.Get(ctx, input.GameID)
+    switch {
+    case errors.Is(err, runtime.ErrNotFound):
+        return service.recordEarlyFailure(ctx, opStartedAt, input,
+            ErrorCodeRuntimeNotFound, "runtime record does not exist"), nil
+    case err != nil:
+        return service.recordEarlyFailure(ctx, opStartedAt, input,
+            ErrorCodeServiceUnavailable, fmt.Sprintf("get runtime record: %s", err.Error())), nil
+    }
+
+    switch record.Status {
+    case runtime.StatusStopped, runtime.StatusFinished:
+        return service.completeIdempotent(ctx, opStartedAt, input, record), nil
+    case runtime.StatusStarting:
+        return service.recordEarlyFailureWithRecord(ctx, opStartedAt, input, record,
+            ErrorCodeConflict,
+            fmt.Sprintf("runtime status is %q; stop requires a started runtime", record.Status)), nil
+    }
+
+    if err := service.rtm.Stop(ctx, input.GameID, reason); err != nil {
+        return service.recordEarlyFailureWithRecord(ctx, opStartedAt, input, record,
+            ErrorCodeServiceUnavailable, fmt.Sprintf("rtm stop: %s", err.Error())), nil
+    }
+
+    stoppedAt := service.clock().UTC()
+    casErr := service.runtimeRecords.UpdateStatus(ctx, ports.UpdateStatusInput{
+        GameID:       input.GameID,
+        ExpectedFrom: record.Status,
+        To:           runtime.StatusStopped,
+        Now:          stoppedAt,
+    })
+    switch {
+    case casErr == nil:
+    case errors.Is(casErr, runtime.ErrConflict):
+        return service.recordEarlyFailureWithRecord(ctx, opStartedAt, input, record,
+            ErrorCodeConflict,
+            fmt.Sprintf("cas runtime status to stopped: %s", casErr.Error())), nil
+    case errors.Is(casErr, runtime.ErrNotFound):
+        return service.recordEarlyFailureWithRecord(ctx, opStartedAt, input, record,
+            ErrorCodeRuntimeNotFound,
+            fmt.Sprintf("cas runtime status to stopped: %s", casErr.Error())), nil
+    default:
+        return service.recordEarlyFailureWithRecord(ctx, opStartedAt, input, record,
+            ErrorCodeServiceUnavailable,
+            fmt.Sprintf("cas runtime status to stopped: %s", casErr.Error())), nil
+    }
+
+    persisted, reloadErr := service.runtimeRecords.Get(ctx, input.GameID)
+    if reloadErr != nil {
+        // CAS already committed; surface the success outcome but log the
+        // degraded reload so operators know the response carries the
+        // pre-CAS record.
+        service.logger.WarnContext(ctx, "reload runtime record after stop",
+            "game_id", input.GameID,
+            "err", reloadErr.Error(),
+        )
+        persisted = record
+        persisted.Status = runtime.StatusStopped
+        persisted.UpdatedAt = stoppedAt
+        persisted.StoppedAt = &stoppedAt
+    }
+
+    service.publishSnapshot(ctx, persisted, stoppedAt)
+    service.appendSuccessLog(ctx, opStartedAt, input)
+
+    logArgs := []any{
+        "game_id", input.GameID,
+        "reason", reason,
+        "from_status", string(record.Status),
+        "op_source", string(fallbackOpSource(input.OpSource)),
+    }
+    logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+    service.logger.InfoContext(ctx, "runtime stopped", logArgs...)
+
+    return Result{
+        Record:  persisted,
+        Outcome: operation.OutcomeSuccess,
+    }, nil
+}
+
+// completeIdempotent records the no-op success path used when the
+// runtime is already terminal (stopped or finished). RTM is not
+// invoked, no snapshot is published, but the audit row is written so
+// operators can confirm the call landed.
+func (service *Service) completeIdempotent(ctx context.Context, opStartedAt time.Time, input Input, record runtime.RuntimeRecord) Result {
+    service.appendSuccessLog(ctx, opStartedAt, input)
+
+    logArgs := []any{
+        "game_id", input.GameID,
+        "observed_status", string(record.Status),
+        "op_source", string(fallbackOpSource(input.OpSource)),
+    }
+    logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+    service.logger.InfoContext(ctx, "runtime stop already terminal", logArgs...)
+
+    return Result{
+        Record:  record,
+        Outcome: operation.OutcomeSuccess,
+    }
+}
+
+// recordEarlyFailure records a failure that occurred before the runtime
+// row was read or in the validation phase.
+func (service *Service) recordEarlyFailure(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) Result {
+    return service.recordEarlyFailureWithRecord(ctx, opStartedAt, input, runtime.RuntimeRecord{}, errorCode, errorMessage)
+}
+
+// recordEarlyFailureWithRecord records a failure and propagates the
+// observed runtime record (when available) to the caller.
+func (service *Service) recordEarlyFailureWithRecord(ctx context.Context, opStartedAt time.Time, input Input, record runtime.RuntimeRecord, errorCode string, errorMessage string) Result {
+    service.appendFailureLog(ctx, opStartedAt, input, errorCode, errorMessage)
+
+    logArgs := []any{
+        "game_id", input.GameID,
+        "op_source", string(input.OpSource),
+        "error_code", errorCode,
+        "error_message", errorMessage,
+    }
+    logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+    service.logger.WarnContext(ctx, "admin stop rejected", logArgs...)
+
+    return Result{
+        Record:       record,
+        Outcome:      operation.OutcomeFailure,
+        ErrorCode:    errorCode,
+        ErrorMessage: errorMessage,
+    }
+}
+
+// publishSnapshot publishes the post-success
+// `runtime_snapshot_update` per `gamemaster/README.md §Lifecycles →
+// Stop` step 4. Failure is logged but never rolls back the just-applied
+// CAS; the snapshot stream is best-effort by contract.
+func (service *Service) publishSnapshot(ctx context.Context, record runtime.RuntimeRecord, occurredAt time.Time) {
+    msg := ports.RuntimeSnapshotUpdate{
+        GameID:              record.GameID,
+        CurrentTurn:         record.CurrentTurn,
+        RuntimeStatus:       record.Status,
+        EngineHealthSummary: record.EngineHealth,
+        PlayerTurnStats:     nil,
+        OccurredAt:          occurredAt,
+    }
+    if err := service.lobbyEvents.PublishSnapshotUpdate(ctx, msg); err != nil {
+        service.logger.ErrorContext(ctx, "publish runtime snapshot update",
+            "game_id", record.GameID,
+            "err", err.Error(),
+        )
+        return
+    }
+    service.telemetry.RecordLobbyEventPublished(ctx, "runtime_snapshot_update")
+}
+
+// appendSuccessLog records the success operation_log entry.
+func (service *Service) appendSuccessLog(ctx context.Context, opStartedAt time.Time, input Input) {
+    finishedAt := service.clock().UTC()
+    service.bestEffortAppend(ctx, operation.OperationEntry{
+        GameID:     input.GameID,
+        OpKind:     operation.OpKindStop,
+        OpSource:   fallbackOpSource(input.OpSource),
+        SourceRef:  input.SourceRef,
+        Outcome:    operation.OutcomeSuccess,
+        StartedAt:  opStartedAt,
+        FinishedAt: &finishedAt,
+    })
+}
+
+// appendFailureLog records the failure operation_log entry.
+func (service *Service) appendFailureLog(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) {
+    finishedAt := service.clock().UTC()
+    service.bestEffortAppend(ctx, operation.OperationEntry{
+        GameID:       input.GameID,
+        OpKind:       operation.OpKindStop,
+        OpSource:     fallbackOpSource(input.OpSource),
+        SourceRef:    input.SourceRef,
+        Outcome:      operation.OutcomeFailure,
+        ErrorCode:    errorCode,
+        ErrorMessage: errorMessage,
+        StartedAt:    opStartedAt,
+        FinishedAt:   &finishedAt,
+    })
+}
+
+// bestEffortAppend writes one operation_log entry. A failure is logged
+// and discarded; the runtime row is the source of truth.
+func (service *Service) bestEffortAppend(ctx context.Context, entry operation.OperationEntry) {
+    if _, err := service.operationLogs.Append(ctx, entry); err != nil {
+        service.logger.ErrorContext(ctx, "append operation log",
+            "game_id", entry.GameID,
+            "op_kind", string(entry.OpKind),
+            "outcome", string(entry.Outcome),
+            "error_code", entry.ErrorCode,
+            "err", err.Error(),
+        )
+    }
+}
+
+// fallbackOpSource defaults to `admin_rest` when the caller did not
+// supply a known op source. Mirrors `gamemaster/README.md §Trusted
+// Surfaces`.
+func fallbackOpSource(source operation.OpSource) operation.OpSource {
+    if source.IsKnown() {
+        return source
+    }
+    return operation.OpSourceAdminRest
+}
diff --git a/gamemaster/internal/service/adminstop/service_test.go b/gamemaster/internal/service/adminstop/service_test.go
new file mode 100644
index 0000000..1cb775d
--- /dev/null
+++ b/gamemaster/internal/service/adminstop/service_test.go
@@ -0,0 +1,459 @@
+package adminstop_test
+
+import (
+    "context"
+    "errors"
+    "sync"
+    "testing"
+    "time"
+
+    "galaxy/gamemaster/internal/adapters/mocks"
+    "galaxy/gamemaster/internal/domain/operation"
+    "galaxy/gamemaster/internal/domain/runtime"
+    "galaxy/gamemaster/internal/ports"
+    "galaxy/gamemaster/internal/service/adminstop"
+    "galaxy/gamemaster/internal/telemetry"
+
+    "github.com/stretchr/testify/assert"
+    "github.com/stretchr/testify/require"
+    "go.uber.org/mock/gomock"
+)
+
+// --- test doubles -----------------------------------------------------
+
+type fakeRuntimeRecords struct {
+    mu      sync.Mutex
+    stored  map[string]runtime.RuntimeRecord
+    getErr  error
+    updErr  error
+    updates []ports.UpdateStatusInput
+}
+
+func newFakeRuntimeRecords() *fakeRuntimeRecords {
+    return &fakeRuntimeRecords{stored: map[string]runtime.RuntimeRecord{}}
+}
+
+func (s *fakeRuntimeRecords) seed(record runtime.RuntimeRecord) {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    s.stored[record.GameID] = record
+}
+
+func (s *fakeRuntimeRecords) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    if s.getErr != nil {
+        return runtime.RuntimeRecord{}, s.getErr
+    }
+    record, ok := s.stored[gameID]
+    if !ok {
+        return runtime.RuntimeRecord{}, runtime.ErrNotFound
+    }
+    return record, nil
+}
+
+func (s *fakeRuntimeRecords) Insert(context.Context, runtime.RuntimeRecord) error {
+    return errors.New("not used")
+}
+
+func (s *fakeRuntimeRecords) UpdateStatus(_ context.Context, input ports.UpdateStatusInput) error {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    if s.updErr != nil {
+        return s.updErr
+    }
+    record, ok := s.stored[input.GameID]
+    if !ok {
+        return runtime.ErrNotFound
+    }
+    if record.Status != input.ExpectedFrom {
+        return runtime.ErrConflict
+    }
+    record.Status = input.To
+    record.UpdatedAt = input.Now
+    if input.To == runtime.StatusStopped {
+        stopped := input.Now
+        record.StoppedAt = &stopped
+    }
+    s.stored[input.GameID] = record
+    s.updates = append(s.updates, input)
+    return nil
+}
+
+func (s *fakeRuntimeRecords) UpdateScheduling(context.Context, ports.UpdateSchedulingInput) error {
+    return errors.New("not used")
+}
+
+func (s *fakeRuntimeRecords) UpdateImage(context.Context, ports.UpdateImageInput) error {
+    return errors.New("not used")
+}
+
+func (s *fakeRuntimeRecords) UpdateEngineHealth(context.Context, ports.UpdateEngineHealthInput) error {
+    return errors.New("not used")
+}
+
+func (s *fakeRuntimeRecords) Delete(context.Context, string) error {
+    return errors.New("not used")
+}
+
+func (s *fakeRuntimeRecords) ListDueRunning(context.Context, time.Time) ([]runtime.RuntimeRecord, error) {
+    return nil, errors.New("not used")
+}
+
+func (s *fakeRuntimeRecords) ListByStatus(context.Context, runtime.Status) ([]runtime.RuntimeRecord, error) {
+    return nil, errors.New("not used")
+}
+
+func (s *fakeRuntimeRecords) List(context.Context) ([]runtime.RuntimeRecord, error) {
+    return nil, errors.New("not used")
+}
+
+func (s *fakeRuntimeRecords) updateCount() int {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    return len(s.updates)
+}
+
+type fakeOperationLogs struct {
+    mu      sync.Mutex
+    entries []operation.OperationEntry
+    appErr  error
+}
+
+func (s *fakeOperationLogs) Append(_ context.Context, entry operation.OperationEntry) (int64, error) {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    if s.appErr != nil {
+        return 0, s.appErr
+    }
+    if err := entry.Validate(); err != nil {
+        return 0, err
+    }
+    s.entries = append(s.entries, entry)
+    return int64(len(s.entries)), nil
+}
+
+func (s *fakeOperationLogs) ListByGame(context.Context, string, int) ([]operation.OperationEntry, error) {
+    return nil, errors.New("not used")
+}
+
+func (s *fakeOperationLogs) lastEntry() (operation.OperationEntry, bool) {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    if len(s.entries) == 0 {
+        return operation.OperationEntry{}, false
+    }
+    return s.entries[len(s.entries)-1], true
+}
+
+func (s *fakeOperationLogs) snapshot() []operation.OperationEntry {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    out := make([]operation.OperationEntry, len(s.entries))
+    copy(out, s.entries)
+    return out
+}
+
+// --- harness ----------------------------------------------------------
+
+type harness struct {
+    t         *testing.T
+    ctrl      *gomock.Controller
+    runtime   *fakeRuntimeRecords
+    logs      *fakeOperationLogs
+    rtm       *mocks.MockRTMClient
+    lobby     *mocks.MockLobbyEventsPublisher
+    telemetry *telemetry.Runtime
+    now       time.Time
+    service   *adminstop.Service
+}
+
+func newHarness(t *testing.T) *harness {
+    t.Helper()
+    ctrl := gomock.NewController(t)
+    telemetryRuntime, err := telemetry.NewWithProviders(nil, nil)
+    require.NoError(t, err)
+    h := &harness{
+        t:         t,
+        ctrl:      ctrl,
+        runtime:   newFakeRuntimeRecords(),
+        logs:      &fakeOperationLogs{},
+        rtm:       mocks.NewMockRTMClient(ctrl),
+        lobby:     mocks.NewMockLobbyEventsPublisher(ctrl),
+        telemetry: telemetryRuntime,
+        now:       time.Date(2026, time.May, 1, 12, 0, 0, 0, time.UTC),
+    }
+    service, err := adminstop.NewService(adminstop.Dependencies{
+        RuntimeRecords: h.runtime,
+        OperationLogs:  h.logs,
+        RTM:            h.rtm,
+        LobbyEvents:    h.lobby,
+        Telemetry:      h.telemetry,
+        Clock:          func() time.Time { return h.now },
+    })
+    require.NoError(t, err)
+    h.service = service
+    return h
+}
+
+func (h *harness) seedRecord(status runtime.Status) runtime.RuntimeRecord {
+    created := h.now.Add(-time.Hour)
+    started := h.now.Add(-30 * time.Minute)
+    next := h.now.Add(30 * time.Minute)
+    record := runtime.RuntimeRecord{
+        GameID:               "game-001",
+        Status:               status,
+        EngineEndpoint:       "http://galaxy-game-game-001:8080",
+        CurrentImageRef:      "ghcr.io/galaxy/game:v1.2.3",
+        CurrentEngineVersion: "v1.2.3",
+        TurnSchedule:         "0 18 * * *",
+        CurrentTurn:          7,
+        NextGenerationAt:     &next,
+        EngineHealth:         "healthy",
+        CreatedAt:            created,
+        UpdatedAt:            started,
+        StartedAt:            &started,
+    }
+    h.runtime.seed(record)
+    return record
+}
+
+func baseInput() adminstop.Input {
+    return adminstop.Input{
+        GameID:    "game-001",
+        Reason:    adminstop.ReasonAdminRequest,
+        OpSource:  operation.OpSourceAdminRest,
+        SourceRef: "req-stop-001",
+    }
+}
+
+// --- tests ------------------------------------------------------------
+
+func TestNewServiceRejectsMissingDeps(t *testing.T) {
+    telemetryRuntime, err := telemetry.NewWithProviders(nil, nil)
+    require.NoError(t, err)
+    cases := []struct {
+        name string
+        mut  func(*adminstop.Dependencies)
+    }{
+        {"runtime records", func(d *adminstop.Dependencies) { d.RuntimeRecords = nil }},
+        {"operation logs", func(d *adminstop.Dependencies) { d.OperationLogs = nil }},
+        {"rtm", func(d *adminstop.Dependencies) { d.RTM = nil }},
+        {"lobby events", func(d *adminstop.Dependencies) { d.LobbyEvents = nil }},
+        {"telemetry", func(d *adminstop.Dependencies) { d.Telemetry = nil }},
+    }
+    for _, tc := range cases {
+        t.Run(tc.name, func(t *testing.T) {
+            ctrl := gomock.NewController(t)
+            deps := adminstop.Dependencies{
+                RuntimeRecords: newFakeRuntimeRecords(),
+                OperationLogs:  &fakeOperationLogs{},
+                RTM:            mocks.NewMockRTMClient(ctrl),
+                LobbyEvents:    mocks.NewMockLobbyEventsPublisher(ctrl),
+                Telemetry:      telemetryRuntime,
+            }
+            tc.mut(&deps)
+            service, err := adminstop.NewService(deps)
+            require.Error(t, err)
+            require.Nil(t, service)
+        })
+    }
+}
+
+func TestHandleHappyPath(t *testing.T) {
+    h := newHarness(t)
+    original := h.seedRecord(runtime.StatusRunning)
+
+    h.rtm.EXPECT().Stop(gomock.Any(), "game-001", adminstop.ReasonAdminRequest).Return(nil)
+    h.lobby.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.AssignableToTypeOf(ports.RuntimeSnapshotUpdate{})).
+        DoAndReturn(func(_ context.Context, msg ports.RuntimeSnapshotUpdate) error {
+            assert.Equal(t, "game-001", msg.GameID)
+            assert.Equal(t, runtime.StatusStopped, msg.RuntimeStatus)
+            assert.Equal(t, original.CurrentTurn, msg.CurrentTurn)
+            assert.Equal(t, original.EngineHealth, msg.EngineHealthSummary)
+            assert.Empty(t, msg.PlayerTurnStats)
+            assert.True(t, msg.OccurredAt.Equal(h.now))
+            return nil
+        })
+
+    result, err := h.service.Handle(context.Background(), baseInput())
+    require.NoError(t, err)
+    require.True(t, result.IsSuccess(), "want success, got %+v", result)
+    assert.Equal(t, runtime.StatusStopped, result.Record.Status)
+    assert.Equal(t, 1, h.runtime.updateCount(), "exactly one CAS call expected")
+
+    entry, ok := h.logs.lastEntry()
+    require.True(t, ok, "operation log entry must be appended")
+    assert.Equal(t, operation.OpKindStop, entry.OpKind)
+    assert.Equal(t, operation.OpSourceAdminRest, entry.OpSource)
+    assert.Equal(t, operation.OutcomeSuccess, entry.Outcome)
+    assert.Empty(t, entry.ErrorCode)
+}
+
+func TestHandleHappyPathFromGenerationFailed(t *testing.T) {
+    h := newHarness(t)
+    h.seedRecord(runtime.StatusGenerationFailed)
+
+    h.rtm.EXPECT().Stop(gomock.Any(), "game-001", adminstop.ReasonAdminRequest).Return(nil)
+    h.lobby.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil)
+
+    result, err := h.service.Handle(context.Background(), baseInput())
+    require.NoError(t, err)
+    require.True(t, result.IsSuccess())
+    assert.Equal(t, runtime.StatusStopped, result.Record.Status)
+    require.Len(t, h.runtime.updates, 1)
+    assert.Equal(t, runtime.StatusGenerationFailed, h.runtime.updates[0].ExpectedFrom)
+}
+
+func TestHandleEmptyReasonDefaultsToAdminRequest(t *testing.T) {
+    h := newHarness(t)
+    h.seedRecord(runtime.StatusRunning)
+
+    h.rtm.EXPECT().Stop(gomock.Any(), "game-001", adminstop.ReasonAdminRequest).Return(nil)
+    h.lobby.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil)
+
+    input := baseInput()
+    input.Reason = ""
+    result, err := h.service.Handle(context.Background(), input)
+    require.NoError(t, err)
+    require.True(t, result.IsSuccess())
+}
+
+func TestHandleIdempotentOnAlreadyStopped(t *testing.T) {
+    h := newHarness(t)
+    original := h.seedRecord(runtime.StatusStopped)
+
+    // No RTM call, no snapshot publication expected.
+    result, err := h.service.Handle(context.Background(), baseInput())
+    require.NoError(t, err)
+    require.True(t, result.IsSuccess())
+    assert.Equal(t, runtime.StatusStopped, result.Record.Status)
+    assert.Equal(t, original.UpdatedAt, result.Record.UpdatedAt, "no mutation expected")
+    assert.Zero(t, h.runtime.updateCount(), "no CAS expected on idempotent path")
+
+    entry, ok := h.logs.lastEntry()
+    require.True(t, ok)
+    assert.Equal(t, operation.OutcomeSuccess, entry.Outcome)
+}
+
+func TestHandleIdempotentOnFinished(t *testing.T) {
+    h := newHarness(t)
+    h.seedRecord(runtime.StatusFinished)
+
+    result, err := h.service.Handle(context.Background(), baseInput())
+    require.NoError(t, err)
+    require.True(t, result.IsSuccess())
+    assert.Equal(t, runtime.StatusFinished, result.Record.Status)
+}
+
+func TestHandleConflictOnStarting(t *testing.T) {
+    h := newHarness(t)
+    h.seedRecord(runtime.StatusStarting)
+
+    result, err := h.service.Handle(context.Background(), baseInput())
+    require.NoError(t, err)
+    assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+    assert.Equal(t, adminstop.ErrorCodeConflict, result.ErrorCode)
+    assert.Zero(t, h.runtime.updateCount())
+
+    entry, ok := h.logs.lastEntry()
+    require.True(t, ok)
+    assert.Equal(t, operation.OutcomeFailure, entry.Outcome)
+    assert.Equal(t, adminstop.ErrorCodeConflict, entry.ErrorCode)
+}
+
+func TestHandleRuntimeNotFound(t *testing.T) {
+    h := newHarness(t)
+
+    result, err := h.service.Handle(context.Background(), baseInput())
+    require.NoError(t, err)
+    assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+    assert.Equal(t, adminstop.ErrorCodeRuntimeNotFound, result.ErrorCode)
+}
+
+func TestHandleRTMUnavailable(t *testing.T) {
+    h := newHarness(t)
+    h.seedRecord(runtime.StatusRunning)
+
+    h.rtm.EXPECT().Stop(gomock.Any(), "game-001", adminstop.ReasonAdminRequest).
+        Return(ports.ErrRTMUnavailable)
+
+    result, err := h.service.Handle(context.Background(), baseInput())
+    require.NoError(t, err)
+    assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+    assert.Equal(t, adminstop.ErrorCodeServiceUnavailable, result.ErrorCode)
+    assert.Zero(t, h.runtime.updateCount(), "CAS must not run after RTM failure")
+}
+
+func TestHandleCASLostRace(t *testing.T) {
+    h := newHarness(t)
+    h.seedRecord(runtime.StatusRunning)
+
+    // RTM stop succeeds, but a concurrent mutation flipped the row out
+    // of `running` before our CAS lands.
+    h.rtm.EXPECT().Stop(gomock.Any(), "game-001", adminstop.ReasonAdminRequest).Return(nil)
+    h.runtime.updErr = runtime.ErrConflict
+
+    result, err := h.service.Handle(context.Background(), baseInput())
+    require.NoError(t, err)
+    assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+    assert.Equal(t, adminstop.ErrorCodeConflict, result.ErrorCode)
+}
+
+func TestHandleStoreReadFailure(t *testing.T) {
+    h := newHarness(t)
+    h.runtime.getErr = errors.New("connection refused")
+
+    result, err := h.service.Handle(context.Background(), baseInput())
+    require.NoError(t, err)
+    assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+    assert.Equal(t, adminstop.ErrorCodeServiceUnavailable, result.ErrorCode)
+}
+
+func TestHandleInvalidRequest(t *testing.T) {
+    cases := []struct {
+        name string
+        mut  func(*adminstop.Input)
+    }{
+        {"empty game id", func(in *adminstop.Input) { in.GameID = "" }},
+        {"unknown reason", func(in *adminstop.Input) { in.Reason = "panic" }},
+    }
+    for _, tc := range cases {
+        t.Run(tc.name, func(t *testing.T) {
+            h := newHarness(t)
+            input := baseInput()
+            tc.mut(&input)
+            result, err := h.service.Handle(context.Background(), input)
+            require.NoError(t, err)
+            assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+            assert.Equal(t, adminstop.ErrorCodeInvalidRequest, result.ErrorCode)
+            // Audit log uses the validated game id; for the empty-id
+            // case it would fail entry validation, so we only assert
+            // when game id is present.
+            if input.GameID != "" {
+                _, ok := h.logs.lastEntry()
+                assert.True(t, ok)
+            }
+        })
+    }
+}
+
+func TestHandleNilContextReturnsError(t *testing.T) {
+    h := newHarness(t)
+    _, err := h.service.Handle(nil, baseInput()) //nolint:staticcheck // intentional nil for guard test
+    require.Error(t, err)
+}
+
+func TestHandleSnapshotPublishFailureSurfacesSuccess(t *testing.T) {
+    h := newHarness(t)
+    h.seedRecord(runtime.StatusRunning)
+
+    h.rtm.EXPECT().Stop(gomock.Any(), "game-001", adminstop.ReasonAdminRequest).Return(nil)
+    h.lobby.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).
+        Return(errors.New("redis down"))
+
+    result, err := h.service.Handle(context.Background(), baseInput())
+    require.NoError(t, err)
+    require.True(t, result.IsSuccess(), "snapshot publication is best-effort")
+    assert.Equal(t, runtime.StatusStopped, result.Record.Status)
+}
diff --git a/gamemaster/internal/service/commandexecute/errors.go b/gamemaster/internal/service/commandexecute/errors.go
new file mode 100644
index 0000000..fd2196c
--- /dev/null
+++ b/gamemaster/internal/service/commandexecute/errors.go
@@ -0,0 +1,51 @@
+package commandexecute
+
+// Stable error codes returned in `Result.ErrorCode`. The values match the
+// vocabulary frozen by `gamemaster/README.md §Error Model` and
+// `gamemaster/api/internal-openapi.yaml`. Stage 19's REST handler imports
+// these names rather than redeclare them; renaming any of them is a
+// contract change.
+const (
+    // ErrorCodeInvalidRequest reports that the request envelope failed
+    // structural validation (empty required field, malformed payload,
+    // non-object payload, payload missing the `commands` array).
+    ErrorCodeInvalidRequest = "invalid_request"
+
+    // ErrorCodeRuntimeNotFound reports that no `runtime_records` row
+    // exists for the requested game id.
+    ErrorCodeRuntimeNotFound = "runtime_not_found"
+
+    // ErrorCodeRuntimeNotRunning reports that the runtime exists but its
+    // current status is not `running`. Hot-path commands are rejected
+    // outside the running state to avoid racing with admin transitions
+    // and turn generation.
+    ErrorCodeRuntimeNotRunning = "runtime_not_running"
+
+    // ErrorCodeForbidden reports that the caller is not an active member
+    // of the game, or that the (game_id, user_id) pair lacks a player
+    // mapping. Either way the caller is not authorised to act.
+    ErrorCodeForbidden = "forbidden"
+
+    // ErrorCodeEngineUnreachable reports that the engine /api/v1/command
+    // call returned a 5xx status, timed out, or could not be dispatched.
+    ErrorCodeEngineUnreachable = "engine_unreachable"
+
+    // ErrorCodeEngineValidationError reports that the engine returned
+    // 4xx with a per-command result. The body is forwarded verbatim
+    // through `Result.RawResponse` so the gateway can surface the
+    // per-command error vocabulary.
+    ErrorCodeEngineValidationError = "engine_validation_error"
+
+    // ErrorCodeEngineProtocolViolation reports that the engine response
+    // did not match the expected schema (malformed JSON, unexpected
+    // types). Stage 19 maps this to 502.
+    ErrorCodeEngineProtocolViolation = "engine_protocol_violation"
+
+    // ErrorCodeServiceUnavailable reports that a steady-state dependency
+    // (PostgreSQL, Lobby) was unreachable for this call.
+    ErrorCodeServiceUnavailable = "service_unavailable"
+
+    // ErrorCodeInternal reports an unexpected error not classified by
+    // the other codes.
+    ErrorCodeInternal = "internal_error"
+)
diff --git a/gamemaster/internal/service/commandexecute/service.go b/gamemaster/internal/service/commandexecute/service.go
new file mode 100644
index 0000000..44ac3da
--- /dev/null
+++ b/gamemaster/internal/service/commandexecute/service.go
@@ -0,0 +1,367 @@
+// Package commandexecute implements the player-command hot-path service
+// owned by Game Master. It accepts a verified `(game_id, user_id, payload)`
+// envelope from Edge Gateway, authorises the caller against the membership
+// cache, resolves `actor=race_name` from `player_mappings`, reshapes the
+// payload to the engine `CommandRequest{actor, cmd}` schema, and forwards
+// the call to the engine `/api/v1/command` endpoint.
+//
+// Lifecycle and error semantics follow `gamemaster/README.md §Hot Path →
+// Player commands and orders`. Design rationale is captured in
+// `gamemaster/docs/stage16-membership-cache-and-invalidation.md`.
+package commandexecute
+
+import (
+    "context"
+    "encoding/json"
+    "errors"
+    "fmt"
+    "log/slog"
+    "strings"
+    "time"
+
+    "galaxy/gamemaster/internal/domain/operation"
+    "galaxy/gamemaster/internal/domain/playermapping"
+    "galaxy/gamemaster/internal/domain/runtime"
+    "galaxy/gamemaster/internal/logging"
+    "galaxy/gamemaster/internal/ports"
+    "galaxy/gamemaster/internal/service/membership"
+    "galaxy/gamemaster/internal/telemetry"
+)
+
+const (
+    engineCallOp = "command"
+
+    membershipStatusActive = "active"
+
+    payloadCommandsKey = "commands"
+    payloadCmdKey      = "cmd"
+    payloadActorKey    = "actor"
+)
+
+// Input stores the per-call arguments for one command-execute operation.
+// The shape mirrors `ExecuteCommandsRequest` from
+// `gamemaster/api/internal-openapi.yaml` plus the verified user identity
+// captured from the `X-User-ID` header by the Stage 19 handler.
+type Input struct {
+    // GameID identifies the platform game the command targets.
+    GameID string
+
+    // UserID identifies the platform user submitting the command. The
+    // service derives `actor=race_name` from this value via
+    // `player_mappings`.
+    UserID string
+
+    // Payload stores the raw `ExecuteCommandsRequest` body. The service
+    // rewrites it to the engine `CommandRequest{actor, cmd}` shape
+    // before forwarding.
+    Payload json.RawMessage
+}
+
+// Validate reports whether input carries the structural invariants the
+// service requires before any store is touched.
+func (input Input) Validate() error {
+    if strings.TrimSpace(input.GameID) == "" {
+        return fmt.Errorf("game id must not be empty")
+    }
+    if strings.TrimSpace(input.UserID) == "" {
+        return fmt.Errorf("user id must not be empty")
+    }
+    if len(input.Payload) == 0 {
+        return fmt.Errorf("payload must not be empty")
+    }
+    return nil
+}
+
+// Result stores the deterministic outcome of one Handle call.
+type Result struct {
+    // Outcome reports whether the operation completed (success) or
+    // produced a stable failure code.
+    Outcome operation.Outcome
+
+    // ErrorCode stores the stable error code on failure. Empty on
+    // success.
+    ErrorCode string
+
+    // ErrorMessage stores the operator-readable detail on failure.
+    // Empty on success.
+    ErrorMessage string
+
+    // RawResponse stores the engine response body. Populated on success
+    // and on `engine_validation_error` (where the engine 4xx body
+    // carries the per-command result vocabulary the gateway forwards).
+    // Empty on every other terminal branch.
+    RawResponse json.RawMessage
+}
+
+// IsSuccess reports whether the result represents a successful operation.
+func (result Result) IsSuccess() bool {
+    return result.Outcome == operation.OutcomeSuccess
+}
+
+// Dependencies groups the collaborators required by Service.
+type Dependencies struct {
+    // RuntimeRecords loads the engine endpoint and the runtime status.
+    RuntimeRecords ports.RuntimeRecordStore
+
+    // PlayerMappings resolves `(game_id, user_id) → race_name`.
+    PlayerMappings ports.PlayerMappingStore
+
+    // Membership authorises the caller. Hot-path services share one
+    // cache instance with `orderput` and `reportget`.
+    Membership *membership.Cache
+
+    // Engine forwards the reshaped payload to `/api/v1/command`.
+    Engine ports.EngineClient
+
+    // Telemetry records the per-outcome counter and the engine-call
+    // latency histogram.
+    Telemetry *telemetry.Runtime
+
+    // Logger records structured service-level events. Defaults to
+    // `slog.Default()` when nil.
+    Logger *slog.Logger
+
+    // Clock supplies the wall-clock used for engine-call latency.
+    // Defaults to `time.Now` when nil.
+    Clock func() time.Time
+}
+
+// Service executes the command-execute hot-path operation.
+type Service struct {
+    runtimeRecords ports.RuntimeRecordStore
+    playerMappings ports.PlayerMappingStore
+    membership     *membership.Cache
+    engine         ports.EngineClient
+    telemetry      *telemetry.Runtime
+    logger         *slog.Logger
+    clock          func() time.Time
+}
+
+// NewService constructs one Service from deps.
+func NewService(deps Dependencies) (*Service, error) {
+    switch {
+    case deps.RuntimeRecords == nil:
+        return nil, errors.New("new command execute service: nil runtime records")
+    case deps.PlayerMappings == nil:
+        return nil, errors.New("new command execute service: nil player mappings")
+    case deps.Membership == nil:
+        return nil, errors.New("new command execute service: nil membership cache")
+    case deps.Engine == nil:
+        return nil, errors.New("new command execute service: nil engine client")
+    case deps.Telemetry == nil:
+        return nil, errors.New("new command execute service: nil telemetry runtime")
+    }
+
+    clock := deps.Clock
+    if clock == nil {
+        clock = time.Now
+    }
+    logger := deps.Logger
+    if logger == nil {
+        logger = slog.Default()
+    }
+    logger = logger.With("service", "gamemaster.commandexecute")
+
+    return &Service{
+        runtimeRecords: deps.RuntimeRecords,
+        playerMappings: deps.PlayerMappings,
+        membership:     deps.Membership,
+        engine:         deps.Engine,
+        telemetry:      deps.Telemetry,
+        logger:         logger,
+        clock:          clock,
+    }, nil
+}
+
+// Handle executes one command-execute operation end-to-end. The Go-level
+// error return is reserved for non-business failures (nil context, nil
+// receiver). Every business outcome flows through Result.
+func (service *Service) Handle(ctx context.Context, input Input) (Result, error) {
+    if service == nil {
+        return Result{}, errors.New("command execute: nil service")
+    }
+    if ctx == nil {
+        return Result{}, errors.New("command execute: nil context")
+    }
+
+    if err := input.Validate(); err != nil {
+        return service.recordFailure(ctx, input, ErrorCodeInvalidRequest, err.Error(), nil), nil
+    }
+
+    record, result, ok := service.loadRecord(ctx, input)
+    if !ok {
+        return result, nil
+    }
+    if record.Status != runtime.StatusRunning {
+        message := fmt.Sprintf("runtime status is %q, expected %q", record.Status, runtime.StatusRunning)
+        return service.recordFailure(ctx, input, ErrorCodeRuntimeNotRunning, message, nil), nil
+    }
+
+    mapping, result, ok := service.authorise(ctx, input)
+    if !ok {
+        return result, nil
+    }
+
+    payload, err := rewriteCommandPayload(input.Payload, mapping.RaceName)
+    if err != nil {
+        return service.recordFailure(ctx, input, ErrorCodeInvalidRequest, err.Error(), nil), nil
+    }
+
+    body, engineErr := service.callEngine(ctx, record.EngineEndpoint, payload)
+    if engineErr != nil {
+        errorCode := classifyEngineError(engineErr)
+        message := fmt.Sprintf("engine command: %s", engineErr.Error())
+        var bodyForCaller json.RawMessage
+        if errorCode == ErrorCodeEngineValidationError {
+            bodyForCaller = body
+        }
+        return service.recordFailure(ctx, input, errorCode, message, bodyForCaller), nil
+    }
+
+    service.telemetry.RecordCommandExecuteOutcome(ctx,
+        string(operation.OutcomeSuccess), "")
+    logArgs := []any{
+        "game_id", input.GameID,
+        "user_id", input.UserID,
+        "actor", mapping.RaceName,
+    }
+    logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+    service.logger.InfoContext(ctx, "command execute succeeded", logArgs...)
+
+    return Result{
+        Outcome:     operation.OutcomeSuccess,
+        RawResponse: body,
+    }, nil
+}
+
+// loadRecord reads the runtime record and maps store errors to
+// orchestrator outcomes.
ok=false means the flow stops with the returned
+// Result.
+func (service *Service) loadRecord(ctx context.Context, input Input) (runtime.RuntimeRecord, Result, bool) {
+	record, err := service.runtimeRecords.Get(ctx, input.GameID)
+	switch {
+	case err == nil:
+		return record, Result{}, true
+	case errors.Is(err, runtime.ErrNotFound):
+		return runtime.RuntimeRecord{}, service.recordFailure(ctx, input,
+			ErrorCodeRuntimeNotFound, "runtime record does not exist", nil), false
+	default:
+		return runtime.RuntimeRecord{}, service.recordFailure(ctx, input,
+			ErrorCodeServiceUnavailable, fmt.Sprintf("get runtime record: %s", err.Error()), nil), false
+	}
+}
+
+// authorise resolves the membership status and the player mapping for the
+// caller. ok=false means the flow stops with the returned Result.
+func (service *Service) authorise(ctx context.Context, input Input) (playermapping.PlayerMapping, Result, bool) {
+	status, err := service.membership.Resolve(ctx, input.GameID, input.UserID)
+	if err != nil {
+		// Every resolve failure, including membership.ErrLobbyUnavailable,
+		// maps to the same stable code, so no sentinel branch is needed.
+		return playermapping.PlayerMapping{}, service.recordFailure(ctx, input,
+			ErrorCodeServiceUnavailable, fmt.Sprintf("resolve membership: %s", err.Error()), nil), false
+	}
+	if status != membershipStatusActive {
+		message := fmt.Sprintf("membership status %q does not authorise commands", status)
+		if status == "" {
+			message = "user is not a member of the game"
+		}
+		return playermapping.PlayerMapping{}, service.recordFailure(ctx, input,
+			ErrorCodeForbidden, message, nil), false
+	}
+
+	mapping, err := service.playerMappings.Get(ctx, input.GameID, input.UserID)
+	switch {
+	case err == nil:
+		return mapping, Result{}, true
+	case errors.Is(err, playermapping.ErrNotFound):
+		return playermapping.PlayerMapping{}, service.recordFailure(ctx,
input, + ErrorCodeForbidden, "player mapping not installed for active member", nil), false + default: + return playermapping.PlayerMapping{}, service.recordFailure(ctx, input, + ErrorCodeServiceUnavailable, fmt.Sprintf("get player mapping: %s", err.Error()), nil), false + } +} + +// callEngine forwards the reshaped payload to the engine and records the +// wall-clock latency under the `command` op label. +func (service *Service) callEngine(ctx context.Context, baseURL string, payload json.RawMessage) (json.RawMessage, error) { + start := service.clock() + body, err := service.engine.ExecuteCommands(ctx, baseURL, payload) + service.telemetry.RecordEngineCall(ctx, engineCallOp, service.clock().Sub(start)) + return body, err +} + +// classifyEngineError maps the engine port sentinels to the +// command-execute stable error codes. +func classifyEngineError(err error) string { + switch { + case errors.Is(err, ports.ErrEngineValidation): + return ErrorCodeEngineValidationError + case errors.Is(err, ports.ErrEngineProtocolViolation): + return ErrorCodeEngineProtocolViolation + case errors.Is(err, ports.ErrEngineUnreachable): + return ErrorCodeEngineUnreachable + default: + return ErrorCodeEngineUnreachable + } +} + +// recordFailure emits the service-level outcome counter and a structured +// log entry, then returns the Result the caller surfaces. The caller is +// responsible for the runtime-side mutation (none for hot-path). +func (service *Service) recordFailure(ctx context.Context, input Input, errorCode, errorMessage string, rawResponse json.RawMessage) Result { + service.telemetry.RecordCommandExecuteOutcome(ctx, + string(operation.OutcomeFailure), errorCode) + logArgs := []any{ + "game_id", input.GameID, + "user_id", input.UserID, + "error_code", errorCode, + "error_message", errorMessage, + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.WarnContext(ctx, "command execute rejected", logArgs...) 
+	return Result{
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    errorCode,
+		ErrorMessage: errorMessage,
+		RawResponse:  rawResponse,
+	}
+}
+
+// rewriteCommandPayload reshapes the GM `ExecuteCommandsRequest` body
+// (`{commands:[…]}`) to the engine `CommandRequest` body
+// (`{actor:<race_name>, cmd:[…]}`). Every other top-level key is
+// discarded; GM never trusts caller-supplied envelope fields per the
+// README §Hot Path rule. Returns an error when the payload is not a JSON
+// object or the `commands` field is missing or not an array.
+func rewriteCommandPayload(payload json.RawMessage, raceName string) (json.RawMessage, error) {
+	var fields map[string]json.RawMessage
+	if err := json.Unmarshal(payload, &fields); err != nil {
+		return nil, fmt.Errorf("payload must decode as a JSON object: %w", err)
+	}
+	commands, ok := fields[payloadCommandsKey]
+	if !ok {
+		return nil, fmt.Errorf("payload missing required %q field", payloadCommandsKey)
+	}
+	// Decoding into a slice validates the array shape before forwarding.
+	var commandList []json.RawMessage
+	if err := json.Unmarshal(commands, &commandList); err != nil {
+		return nil, fmt.Errorf("payload %q field must decode as an array: %w", payloadCommandsKey, err)
+	}
+	actor, err := json.Marshal(raceName)
+	if err != nil {
+		return nil, fmt.Errorf("marshal actor: %w", err)
+	}
+	out := map[string]json.RawMessage{
+		payloadActorKey: actor,
+		payloadCmdKey:   commands,
+	}
+	encoded, err := json.Marshal(out)
+	if err != nil {
+		return nil, fmt.Errorf("marshal engine payload: %w", err)
+	}
+	return encoded, nil
+}
diff --git a/gamemaster/internal/service/commandexecute/service_test.go b/gamemaster/internal/service/commandexecute/service_test.go
new file mode 100644
index 0000000..c8ff163
--- /dev/null
+++ b/gamemaster/internal/service/commandexecute/service_test.go
@@ -0,0 +1,614 @@
+package commandexecute_test
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"sync"
+	"testing"
+	"time"
+
+	
"galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/domain/playermapping" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/service/commandexecute" + "galaxy/gamemaster/internal/service/membership" + "galaxy/gamemaster/internal/telemetry" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +// --- fakes ------------------------------------------------------------ + +type fakeRuntimeRecords struct { + mu sync.Mutex + stored map[string]runtime.RuntimeRecord + getErr error +} + +func newFakeRuntimeRecords() *fakeRuntimeRecords { + return &fakeRuntimeRecords{stored: map[string]runtime.RuntimeRecord{}} +} + +func (s *fakeRuntimeRecords) seed(record runtime.RuntimeRecord) { + s.mu.Lock() + defer s.mu.Unlock() + s.stored[record.GameID] = record +} + +func (s *fakeRuntimeRecords) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.getErr != nil { + return runtime.RuntimeRecord{}, s.getErr + } + record, ok := s.stored[gameID] + if !ok { + return runtime.RuntimeRecord{}, runtime.ErrNotFound + } + return record, nil +} + +func (s *fakeRuntimeRecords) Insert(context.Context, runtime.RuntimeRecord) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateStatus(context.Context, ports.UpdateStatusInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateScheduling(context.Context, ports.UpdateSchedulingInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) ListDueRunning(context.Context, time.Time) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} +func (s *fakeRuntimeRecords) ListByStatus(context.Context, runtime.Status) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} +func (s *fakeRuntimeRecords) List(context.Context) ([]runtime.RuntimeRecord, error) { + return nil, 
errors.New("not used") +} +func (s *fakeRuntimeRecords) CountByStatus(context.Context) (map[string]int, error) { + return nil, errors.New("not used") +} +func (s *fakeRuntimeRecords) CountDue(context.Context) (int, error) { + return 0, errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateImage(context.Context, ports.UpdateImageInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateEngineHealth(context.Context, ports.UpdateEngineHealthInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) Delete(context.Context, string) error { + return errors.New("not used") +} + +type fakePlayerMappings struct { + mu sync.Mutex + stored map[string]map[string]playermapping.PlayerMapping + getErr error +} + +func newFakePlayerMappings() *fakePlayerMappings { + return &fakePlayerMappings{stored: map[string]map[string]playermapping.PlayerMapping{}} +} + +func (s *fakePlayerMappings) seed(record playermapping.PlayerMapping) { + s.mu.Lock() + defer s.mu.Unlock() + if _, ok := s.stored[record.GameID]; !ok { + s.stored[record.GameID] = map[string]playermapping.PlayerMapping{} + } + s.stored[record.GameID][record.UserID] = record +} + +func (s *fakePlayerMappings) Get(_ context.Context, gameID, userID string) (playermapping.PlayerMapping, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.getErr != nil { + return playermapping.PlayerMapping{}, s.getErr + } + record, ok := s.stored[gameID][userID] + if !ok { + return playermapping.PlayerMapping{}, playermapping.ErrNotFound + } + return record, nil +} + +func (s *fakePlayerMappings) BulkInsert(context.Context, []playermapping.PlayerMapping) error { + return errors.New("not used") +} +func (s *fakePlayerMappings) GetByRace(context.Context, string, string) (playermapping.PlayerMapping, error) { + return playermapping.PlayerMapping{}, errors.New("not used") +} +func (s *fakePlayerMappings) ListByGame(context.Context, string) ([]playermapping.PlayerMapping, error) { + return nil, 
errors.New("not used") +} +func (s *fakePlayerMappings) DeleteByGame(context.Context, string) error { + return errors.New("not used") +} + +type recordedCall struct { + baseURL string + payload json.RawMessage +} + +type fakeEngine struct { + mu sync.Mutex + body json.RawMessage + err error + calls []recordedCall +} + +func (f *fakeEngine) ExecuteCommands(_ context.Context, baseURL string, payload json.RawMessage) (json.RawMessage, error) { + f.mu.Lock() + defer f.mu.Unlock() + stored := append(json.RawMessage(nil), payload...) + f.calls = append(f.calls, recordedCall{baseURL: baseURL, payload: stored}) + return f.body, f.err +} + +func (f *fakeEngine) Init(context.Context, string, ports.InitRequest) (ports.StateResponse, error) { + return ports.StateResponse{}, errors.New("not used") +} +func (f *fakeEngine) Status(context.Context, string) (ports.StateResponse, error) { + return ports.StateResponse{}, errors.New("not used") +} +func (f *fakeEngine) Turn(context.Context, string) (ports.StateResponse, error) { + return ports.StateResponse{}, errors.New("not used") +} +func (f *fakeEngine) BanishRace(context.Context, string, string) error { + return errors.New("not used") +} +func (f *fakeEngine) PutOrders(context.Context, string, json.RawMessage) (json.RawMessage, error) { + return nil, errors.New("not used") +} +func (f *fakeEngine) GetReport(context.Context, string, string, int) (json.RawMessage, error) { + return nil, errors.New("not used") +} + +type fakeLobby struct { + mu sync.Mutex + answers map[string][]ports.Membership + errs map[string]error +} + +func newFakeLobby() *fakeLobby { + return &fakeLobby{ + answers: map[string][]ports.Membership{}, + errs: map[string]error{}, + } +} + +func (f *fakeLobby) seed(gameID string, members []ports.Membership) { + f.mu.Lock() + defer f.mu.Unlock() + f.answers[gameID] = members +} + +func (f *fakeLobby) seedErr(gameID string, err error) { + f.mu.Lock() + defer f.mu.Unlock() + f.errs[gameID] = err +} + +func (f 
*fakeLobby) GetMemberships(_ context.Context, gameID string) ([]ports.Membership, error) { + f.mu.Lock() + defer f.mu.Unlock() + if err, ok := f.errs[gameID]; ok { + return nil, err + } + return append([]ports.Membership(nil), f.answers[gameID]...), nil +} + +func (f *fakeLobby) GetGameSummary(context.Context, string) (ports.GameSummary, error) { + return ports.GameSummary{}, errors.New("not used") +} + +// --- harness ---------------------------------------------------------- + +type harness struct { + t *testing.T + now time.Time + runtimes *fakeRuntimeRecords + mappings *fakePlayerMappings + engine *fakeEngine + lobby *fakeLobby + cache *membership.Cache + service *commandexecute.Service +} + +const ( + testGameID = "game-001" + testUserID = "user-1" + testRaceName = "Aelinari" + testEngineEndpoint = "http://galaxy-game-game-001:8080" +) + +func newHarness(t *testing.T) *harness { + t.Helper() + tel, err := telemetry.NewWithProviders(nil, nil) + require.NoError(t, err) + now := time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) + + h := &harness{ + t: t, + now: now, + runtimes: newFakeRuntimeRecords(), + mappings: newFakePlayerMappings(), + engine: &fakeEngine{}, + lobby: newFakeLobby(), + } + + cache, err := membership.NewCache(membership.Dependencies{ + Lobby: h.lobby, + Telemetry: tel, + TTL: time.Minute, + MaxGames: 16, + Clock: func() time.Time { return h.now }, + }) + require.NoError(t, err) + h.cache = cache + + svc, err := commandexecute.NewService(commandexecute.Dependencies{ + RuntimeRecords: h.runtimes, + PlayerMappings: h.mappings, + Membership: h.cache, + Engine: h.engine, + Telemetry: tel, + Clock: func() time.Time { return h.now }, + }) + require.NoError(t, err) + h.service = svc + return h +} + +func (h *harness) seedRunningRecord() { + startedAt := h.now.Add(-1 * time.Hour) + h.runtimes.seed(runtime.RuntimeRecord{ + GameID: testGameID, + Status: runtime.StatusRunning, + EngineEndpoint: testEngineEndpoint, + CurrentImageRef: 
"ghcr.io/galaxy/game:v1.2.3", + CurrentEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + EngineHealth: "healthy", + CreatedAt: h.now.Add(-2 * time.Hour), + UpdatedAt: h.now.Add(-2 * time.Hour), + StartedAt: &startedAt, + }) +} + +func (h *harness) seedActiveMembership() { + h.lobby.seed(testGameID, []ports.Membership{{ + UserID: testUserID, + RaceName: testRaceName, + Status: "active", + JoinedAt: h.now.Add(-2 * time.Hour), + }}) +} + +func (h *harness) seedPlayerMapping() { + h.mappings.seed(playermapping.PlayerMapping{ + GameID: testGameID, + UserID: testUserID, + RaceName: testRaceName, + EnginePlayerUUID: "uuid-1", + CreatedAt: h.now.Add(-2 * time.Hour), + }) +} + +func (h *harness) inputWithCommands(payload string) commandexecute.Input { + return commandexecute.Input{ + GameID: testGameID, + UserID: testUserID, + Payload: json.RawMessage(payload), + } +} + +func basicCommandsPayload() string { + return `{"commands":[{"@type":"BUILD_SHIP","cmdId":"00000000-0000-0000-0000-000000000001"}]}` +} + +// --- tests ------------------------------------------------------------ + +func TestNewServiceRejectsBadDependencies(t *testing.T) { + tel, err := telemetry.NewWithProviders(nil, nil) + require.NoError(t, err) + cache, err := membership.NewCache(membership.Dependencies{ + Lobby: newFakeLobby(), Telemetry: tel, TTL: time.Minute, MaxGames: 1, + }) + require.NoError(t, err) + + cases := []struct { + name string + deps commandexecute.Dependencies + }{ + {"nil runtime records", commandexecute.Dependencies{PlayerMappings: newFakePlayerMappings(), Membership: cache, Engine: &fakeEngine{}, Telemetry: tel}}, + {"nil player mappings", commandexecute.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), Membership: cache, Engine: &fakeEngine{}, Telemetry: tel}}, + {"nil membership", commandexecute.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), PlayerMappings: newFakePlayerMappings(), Engine: &fakeEngine{}, Telemetry: tel}}, + {"nil engine", 
commandexecute.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), PlayerMappings: newFakePlayerMappings(), Membership: cache, Telemetry: tel}}, + {"nil telemetry", commandexecute.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), PlayerMappings: newFakePlayerMappings(), Membership: cache, Engine: &fakeEngine{}}}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + svc, err := commandexecute.NewService(tc.deps) + require.Error(t, err) + assert.Nil(t, svc) + }) + } +} + +func TestHandleHappyPath(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.body = json.RawMessage(`{"results":[{"cmd_id":"00000000-0000-0000-0000-000000000001","cmd_applied":true}]}`) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeSuccess, result.Outcome) + assert.Empty(t, result.ErrorCode) + assert.JSONEq(t, string(h.engine.body), string(result.RawResponse)) + + require.Len(t, h.engine.calls, 1) + assert.Equal(t, testEngineEndpoint, h.engine.calls[0].baseURL) + + var sentToEngine map[string]json.RawMessage + require.NoError(t, json.Unmarshal(h.engine.calls[0].payload, &sentToEngine)) + assert.Contains(t, sentToEngine, "actor") + assert.Contains(t, sentToEngine, "cmd") + assert.NotContains(t, sentToEngine, "commands", "GM must rewrite the field name") + var actor string + require.NoError(t, json.Unmarshal(sentToEngine["actor"], &actor)) + assert.Equal(t, testRaceName, actor) + var cmd []json.RawMessage + require.NoError(t, json.Unmarshal(sentToEngine["cmd"], &cmd)) + assert.Len(t, cmd, 1) +} + +func TestHandleHappyPathDoesNotTrustCallerActor(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.body = json.RawMessage(`{}`) + + payload := 
`{"actor":"Hacker","commands":[{"@type":"BUILD_SHIP","cmdId":"00000000-0000-0000-0000-000000000001"}]}` + result, err := h.service.Handle(context.Background(), h.inputWithCommands(payload)) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeSuccess, result.Outcome) + + require.Len(t, h.engine.calls, 1) + var sentToEngine map[string]json.RawMessage + require.NoError(t, json.Unmarshal(h.engine.calls[0].payload, &sentToEngine)) + var actor string + require.NoError(t, json.Unmarshal(sentToEngine["actor"], &actor)) + assert.Equal(t, testRaceName, actor, "GM must override caller-supplied actor") +} + +func TestHandleInvalidRequest(t *testing.T) { + cases := []struct { + name string + input commandexecute.Input + message string + }{ + {"empty game id", commandexecute.Input{UserID: testUserID, Payload: json.RawMessage(basicCommandsPayload())}, "game id"}, + {"empty user id", commandexecute.Input{GameID: testGameID, Payload: json.RawMessage(basicCommandsPayload())}, "user id"}, + {"empty payload", commandexecute.Input{GameID: testGameID, UserID: testUserID}, "payload"}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + h := newHarness(t) + result, err := h.service.Handle(context.Background(), tc.input) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeInvalidRequest, result.ErrorCode) + assert.Contains(t, result.ErrorMessage, tc.message) + }) + } +} + +func TestHandleMalformedPayload(t *testing.T) { + cases := []struct { + name string + payload string + }{ + {"non-object", `[1,2,3]`}, + {"missing commands", `{"orders":[]}`}, + {"commands not array", `{"commands":"oops"}`}, + {"non-json", `not json`}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(tc.payload)) + 
require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeInvalidRequest, result.ErrorCode) + assert.Empty(t, h.engine.calls) + }) + } +} + +func TestHandleRuntimeNotFound(t *testing.T) { + h := newHarness(t) + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeRuntimeNotFound, result.ErrorCode) +} + +func TestHandleRuntimeStoreError(t *testing.T) { + h := newHarness(t) + h.runtimes.getErr = errors.New("postgres down") + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeServiceUnavailable, result.ErrorCode) +} + +func TestHandleRuntimeNotRunning(t *testing.T) { + for _, status := range []runtime.Status{ + runtime.StatusStarting, + runtime.StatusGenerationInProgress, + runtime.StatusGenerationFailed, + runtime.StatusStopped, + runtime.StatusEngineUnreachable, + runtime.StatusFinished, + } { + t.Run(string(status), func(t *testing.T) { + h := newHarness(t) + startedAt := h.now.Add(-1 * time.Hour) + finishedAt := h.now + record := runtime.RuntimeRecord{ + GameID: testGameID, + Status: status, + EngineEndpoint: testEngineEndpoint, + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.3", + CurrentEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + CreatedAt: h.now.Add(-2 * time.Hour), + UpdatedAt: h.now.Add(-2 * time.Hour), + } + if status != runtime.StatusStarting { + record.StartedAt = &startedAt + } + if status == runtime.StatusStopped { + record.StoppedAt = &finishedAt + } + if status == runtime.StatusFinished { + record.FinishedAt = &finishedAt + } + h.runtimes.seed(record) + + result, err := h.service.Handle(context.Background(), 
h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeRuntimeNotRunning, result.ErrorCode) + assert.Empty(t, h.engine.calls) + }) + } +} + +func TestHandleForbiddenInactiveMembership(t *testing.T) { + cases := []struct { + name string + members []ports.Membership + }{ + {"removed", []ports.Membership{{UserID: testUserID, RaceName: testRaceName, Status: "removed"}}}, + {"blocked", []ports.Membership{{UserID: testUserID, RaceName: testRaceName, Status: "blocked"}}}, + {"unknown user", []ports.Membership{{UserID: "ghost", RaceName: "Ghost", Status: "active"}}}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedPlayerMapping() + h.lobby.seed(testGameID, tc.members) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeForbidden, result.ErrorCode) + assert.Empty(t, h.engine.calls) + }) + } +} + +func TestHandleForbiddenMissingPlayerMapping(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + // no player mapping + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeForbidden, result.ErrorCode) + assert.Empty(t, h.engine.calls) +} + +func TestHandleServiceUnavailableLobbyDown(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedPlayerMapping() + h.lobby.seedErr(testGameID, fmt.Errorf("dial: %w", ports.ErrLobbyUnavailable)) + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, 
operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeServiceUnavailable, result.ErrorCode) +} + +func TestHandleServiceUnavailablePlayerMappingsError(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.mappings.getErr = errors.New("postgres down") + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeServiceUnavailable, result.ErrorCode) +} + +func TestHandleEngineUnreachable(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.err = fmt.Errorf("dial: %w", ports.ErrEngineUnreachable) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeEngineUnreachable, result.ErrorCode) + assert.Empty(t, result.RawResponse, "engine_unreachable does not forward a body") +} + +func TestHandleEngineValidationErrorForwardsBody(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.body = json.RawMessage(`{"results":[{"cmd_id":"x","cmd_error_code":"INVALID_TARGET"}]}`) + h.engine.err = fmt.Errorf("400: %w", ports.ErrEngineValidation) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeEngineValidationError, result.ErrorCode) + assert.JSONEq(t, string(h.engine.body), string(result.RawResponse)) +} + +func TestHandleEngineProtocolViolation(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + 
h.engine.err = fmt.Errorf("garbled: %w", ports.ErrEngineProtocolViolation) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicCommandsPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, commandexecute.ErrorCodeEngineProtocolViolation, result.ErrorCode) +} + +func TestHandleNilContext(t *testing.T) { + h := newHarness(t) + var nilCtx context.Context + _, err := h.service.Handle(nilCtx, h.inputWithCommands(basicCommandsPayload())) + require.Error(t, err) +} + +func TestHandleNilReceiver(t *testing.T) { + var svc *commandexecute.Service + _, err := svc.Handle(context.Background(), commandexecute.Input{}) + require.Error(t, err) +} diff --git a/gamemaster/internal/service/engineversion/errors.go b/gamemaster/internal/service/engineversion/errors.go new file mode 100644 index 0000000..28a65c1 --- /dev/null +++ b/gamemaster/internal/service/engineversion/errors.go @@ -0,0 +1,36 @@ +package engineversion + +// Stable error codes returned alongside service-level errors. The values +// match the vocabulary frozen by `gamemaster/README.md §Error Model` and +// `gamemaster/api/internal-openapi.yaml`. The handler layer (Stage 19) +// maps the wrapped sentinel error to one of these codes; tests compare +// against the constant. +const ( + // ErrorCodeInvalidRequest reports that the request envelope failed + // structural validation (empty required fields, malformed JSON + // options, malformed semver, malformed Docker reference, partial + // Update with no fields set, unsupported status enum). + ErrorCodeInvalidRequest = "invalid_request" + + // ErrorCodeConflict reports that an Insert was rejected because a + // row with the same `version` already exists. + ErrorCodeConflict = "conflict" + + // ErrorCodeEngineVersionNotFound reports that the requested + // version is not present in the registry. Returned by Get, + // Update, Deprecate, Delete, and ResolveImageRef. 
+ ErrorCodeEngineVersionNotFound = "engine_version_not_found" + + // ErrorCodeEngineVersionInUse reports that a hard-delete attempt + // was rejected because the version is still referenced by a + // non-finished `runtime_records` row. + ErrorCodeEngineVersionInUse = "engine_version_in_use" + + // ErrorCodeServiceUnavailable reports that a steady-state + // dependency (PostgreSQL) was unreachable for this call. + ErrorCodeServiceUnavailable = "service_unavailable" + + // ErrorCodeInternal reports an unexpected error not classified by + // the other codes. + ErrorCodeInternal = "internal_error" +) diff --git a/gamemaster/internal/service/engineversion/service.go b/gamemaster/internal/service/engineversion/service.go new file mode 100644 index 0000000..4eabd4d --- /dev/null +++ b/gamemaster/internal/service/engineversion/service.go @@ -0,0 +1,752 @@ +// Package engineversion implements the engine version registry service +// owned by Game Master. The service backs the +// `/api/v1/internal/engine-versions/*` REST surface (Stage 19) and the +// hot-path `image_ref` resolve called synchronously by Game Lobby's +// start flow. +// +// Responsibilities and stable error codes are frozen by +// `gamemaster/README.md §Engine Version Registry` and +// `gamemaster/api/internal-openapi.yaml`. Design rationale for stage 14 +// is captured in `gamemaster/docs/stage14-engine-version-registry.md`. +package engineversion + +import ( + "context" + "encoding/json" + "errors" + "fmt" + "log/slog" + "strings" + "time" + + "galaxy/gamemaster/internal/domain/engineversion" + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/logging" + "galaxy/gamemaster/internal/ports" + + "github.com/distribution/reference" +) + +// Sentinel errors returned by the service. Handlers translate these +// into the stable `ErrorCode...` constants from `errors.go`. 
The +// adapter-level sentinels (`engineversion.ErrNotFound`, +// `engineversion.ErrConflict`, `engineversion.ErrInUse`, +// `engineversion.ErrInvalidSemver`) are wrapped with one of the +// service-level sentinels below before crossing the package boundary. +var ( + // ErrInvalidRequest reports that the input envelope failed + // structural validation. + ErrInvalidRequest = errors.New("invalid request") + + // ErrNotFound reports that the requested version does not exist + // in the registry. + ErrNotFound = errors.New("engine version not found") + + // ErrConflict reports that an Insert was rejected because a row + // with the same version already exists. + ErrConflict = errors.New("engine version already exists") + + // ErrInUse reports that a hard-delete attempt was rejected + // because a non-finished runtime references the version. + ErrInUse = errors.New("engine version in use") + + // ErrServiceUnavailable reports that a steady-state dependency + // was unreachable for this call. + ErrServiceUnavailable = errors.New("service unavailable") +) + +// CreateInput stores the per-call arguments for one Create operation. +// Mirrors `CreateEngineVersionRequest` plus the audit-only OpSource / +// SourceRef pair. +type CreateInput struct { + // Version stores the canonical semver (with or without the leading + // "v"; ParseSemver normalises it). + Version string + + // ImageRef stores the Docker reference of the engine image. + // Validated against `github.com/distribution/reference` before + // the row is persisted. + ImageRef string + + // Options stores the engine-side options document as raw JSON. + // Empty means "use the schema default `{}`". When non-empty the + // service validates the bytes parse as a JSON object. + Options []byte + + // OpSource classifies how the request entered Game Master. + // Defaults to `admin_rest` when missing or unknown. + OpSource operation.OpSource + + // SourceRef stores the optional opaque per-source reference. 
+ SourceRef string +} + +// UpdateInput stores the per-call arguments for one Update operation. +// Pointer fields communicate "leave alone" (nil) vs. "write the value" +// (non-nil); at least one must be set. +type UpdateInput struct { + // Version identifies the row to mutate. + Version string + + // ImageRef is the new image reference. Nil leaves the column + // unchanged; non-nil must be a valid Docker reference. + ImageRef *string + + // Options is the new options document. Nil leaves the column + // unchanged; non-nil must be a JSON object (possibly the empty + // object). + Options *[]byte + + // Status is the new registry status. Nil leaves the column + // unchanged; non-nil must be a known status value. + Status *engineversion.Status + + // OpSource classifies how the request entered Game Master. + OpSource operation.OpSource + + // SourceRef stores the optional opaque per-source reference. + SourceRef string +} + +// DeprecateInput stores the per-call arguments for one Deprecate +// operation. +type DeprecateInput struct { + // Version identifies the row to deprecate. + Version string + + // OpSource classifies how the request entered Game Master. + OpSource operation.OpSource + + // SourceRef stores the optional opaque per-source reference. + SourceRef string +} + +// DeleteInput stores the per-call arguments for one hard Delete +// operation. +type DeleteInput struct { + // Version identifies the row to delete. + Version string + + // OpSource classifies how the request entered Game Master. + OpSource operation.OpSource + + // SourceRef stores the optional opaque per-source reference. + SourceRef string +} + +// Dependencies groups the collaborators required by Service. +type Dependencies struct { + // EngineVersions persists the registry rows. Required. + EngineVersions ports.EngineVersionStore + + // OperationLogs records the audit entry for every mutation + // (Create, Update, Deprecate, Delete). Required. 
+ OperationLogs ports.OperationLogStore + + // Logger records structured service-level events. Defaults to + // slog.Default when nil. + Logger *slog.Logger + + // Clock supplies the wall-clock used for created_at / updated_at + // and audit timestamps. Defaults to time.Now when nil. + Clock func() time.Time +} + +// Service implements the engine version registry operations. +type Service struct { + versions ports.EngineVersionStore + operationLogs ports.OperationLogStore + + logger *slog.Logger + clock func() time.Time +} + +// NewService constructs one Service from deps. +func NewService(deps Dependencies) (*Service, error) { + switch { + case deps.EngineVersions == nil: + return nil, errors.New("new engine version service: nil engine version store") + case deps.OperationLogs == nil: + return nil, errors.New("new engine version service: nil operation log store") + } + + clock := deps.Clock + if clock == nil { + clock = time.Now + } + logger := deps.Logger + if logger == nil { + logger = slog.Default() + } + logger = logger.With("service", "gamemaster.engineversion") + + return &Service{ + versions: deps.EngineVersions, + operationLogs: deps.OperationLogs, + logger: logger, + clock: clock, + }, nil +} + +// List returns every registry row, optionally filtered by status. A +// non-nil statusFilter must reference a known engineversion.Status. 
+func (service *Service) List(ctx context.Context, statusFilter *engineversion.Status) ([]engineversion.EngineVersion, error) {
+	if service == nil {
+		return nil, errors.New("engine version list: nil service")
+	}
+	if ctx == nil {
+		return nil, errors.New("engine version list: nil context")
+	}
+	if statusFilter != nil && !statusFilter.IsKnown() {
+		return nil, fmt.Errorf("%w: status %q is unsupported", ErrInvalidRequest, *statusFilter)
+	}
+	versions, err := service.versions.List(ctx, statusFilter)
+	if err != nil {
+		return nil, fmt.Errorf("%w: list engine versions: %s", ErrServiceUnavailable, err.Error())
+	}
+	return versions, nil
+}
+
+// Get returns the registry row identified by version. Returns
+// ErrNotFound when no row matches.
+func (service *Service) Get(ctx context.Context, version string) (engineversion.EngineVersion, error) {
+	if service == nil {
+		return engineversion.EngineVersion{}, errors.New("engine version get: nil service")
+	}
+	if ctx == nil {
+		return engineversion.EngineVersion{}, errors.New("engine version get: nil context")
+	}
+	if strings.TrimSpace(version) == "" {
+		return engineversion.EngineVersion{}, fmt.Errorf("%w: version must not be empty", ErrInvalidRequest)
+	}
+	got, err := service.versions.Get(ctx, version)
+	switch {
+	case errors.Is(err, engineversion.ErrNotFound):
+		return engineversion.EngineVersion{}, fmt.Errorf("%w: %q", ErrNotFound, version)
+	case err != nil:
+		return engineversion.EngineVersion{}, fmt.Errorf("%w: get engine version: %s", ErrServiceUnavailable, err.Error())
+	}
+	return got, nil
+}
+
+// ResolveImageRef returns the image_ref of the requested version. This
+// is the hot path: Game Lobby's start flow calls it synchronously, once
+// per register-runtime envelope.
+func (service *Service) ResolveImageRef(ctx context.Context, version string) (string, error) { + got, err := service.Get(ctx, version) + if err != nil { + return "", err + } + return got.ImageRef, nil +} + +// Create installs a fresh registry row. Validates the semver shape and +// Docker reference before touching the store. On success appends a +// success entry to operation_log; on classified failure (validation, +// conflict, store error) appends a failure entry. +func (service *Service) Create(ctx context.Context, input CreateInput) (engineversion.EngineVersion, error) { + if service == nil { + return engineversion.EngineVersion{}, errors.New("engine version create: nil service") + } + if ctx == nil { + return engineversion.EngineVersion{}, errors.New("engine version create: nil context") + } + + startedAt := service.clock().UTC() + + canonicalVersion, err := engineversion.ParseSemver(input.Version) + if err != nil { + return engineversion.EngineVersion{}, service.recordCreateFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeInvalidRequest, fmt.Sprintf("parse semver: %s", err.Error()), + fmt.Errorf("%w: %s", ErrInvalidRequest, err.Error()), + ) + } + + if err := validateImageRef(input.ImageRef); err != nil { + return engineversion.EngineVersion{}, service.recordCreateFailure( + ctx, startedAt, canonicalVersion, input.OpSource, input.SourceRef, + ErrorCodeInvalidRequest, fmt.Sprintf("validate image_ref: %s", err.Error()), + fmt.Errorf("%w: %s", ErrInvalidRequest, err.Error()), + ) + } + + options, err := normalizeOptions(input.Options) + if err != nil { + return engineversion.EngineVersion{}, service.recordCreateFailure( + ctx, startedAt, canonicalVersion, input.OpSource, input.SourceRef, + ErrorCodeInvalidRequest, fmt.Sprintf("validate options: %s", err.Error()), + fmt.Errorf("%w: %s", ErrInvalidRequest, err.Error()), + ) + } + + record := engineversion.EngineVersion{ + Version: canonicalVersion, + ImageRef: 
strings.TrimSpace(input.ImageRef), + Options: options, + Status: engineversion.StatusActive, + CreatedAt: startedAt, + UpdatedAt: startedAt, + } + + if err := service.versions.Insert(ctx, record); err != nil { + switch { + case errors.Is(err, engineversion.ErrConflict): + return engineversion.EngineVersion{}, service.recordCreateFailure( + ctx, startedAt, canonicalVersion, input.OpSource, input.SourceRef, + ErrorCodeConflict, "engine version already exists", + fmt.Errorf("%w: %s", ErrConflict, canonicalVersion), + ) + default: + return engineversion.EngineVersion{}, service.recordCreateFailure( + ctx, startedAt, canonicalVersion, input.OpSource, input.SourceRef, + ErrorCodeServiceUnavailable, fmt.Sprintf("insert engine version: %s", err.Error()), + fmt.Errorf("%w: insert engine version: %s", ErrServiceUnavailable, err.Error()), + ) + } + } + + service.appendSuccess(ctx, operation.OpKindEngineVersionCreate, canonicalVersion, input.OpSource, input.SourceRef, startedAt) + + logArgs := []any{ + "version", canonicalVersion, + "image_ref", record.ImageRef, + "op_source", string(fallbackOpSource(input.OpSource)), + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.InfoContext(ctx, "engine version created", logArgs...) + + return record, nil +} + +// Update applies a partial update to one registry row. At least one of +// ImageRef, Options, Status must be non-nil. 
+func (service *Service) Update(ctx context.Context, input UpdateInput) (engineversion.EngineVersion, error) { + if service == nil { + return engineversion.EngineVersion{}, errors.New("engine version update: nil service") + } + if ctx == nil { + return engineversion.EngineVersion{}, errors.New("engine version update: nil context") + } + + startedAt := service.clock().UTC() + + if strings.TrimSpace(input.Version) == "" { + return engineversion.EngineVersion{}, service.recordUpdateFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeInvalidRequest, "version must not be empty", + fmt.Errorf("%w: version must not be empty", ErrInvalidRequest), + ) + } + if input.ImageRef == nil && input.Options == nil && input.Status == nil { + return engineversion.EngineVersion{}, service.recordUpdateFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeInvalidRequest, "at least one field must be set", + fmt.Errorf("%w: at least one field must be set", ErrInvalidRequest), + ) + } + if input.ImageRef != nil { + if err := validateImageRef(*input.ImageRef); err != nil { + return engineversion.EngineVersion{}, service.recordUpdateFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeInvalidRequest, fmt.Sprintf("validate image_ref: %s", err.Error()), + fmt.Errorf("%w: %s", ErrInvalidRequest, err.Error()), + ) + } + } + if input.Status != nil && !input.Status.IsKnown() { + return engineversion.EngineVersion{}, service.recordUpdateFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeInvalidRequest, fmt.Sprintf("status %q is unsupported", *input.Status), + fmt.Errorf("%w: status %q is unsupported", ErrInvalidRequest, *input.Status), + ) + } + var normalizedOptions *[]byte + if input.Options != nil { + opts, err := normalizeOptions(*input.Options) + if err != nil { + return engineversion.EngineVersion{}, service.recordUpdateFailure( + ctx, startedAt, input.Version, 
input.OpSource, input.SourceRef, + ErrorCodeInvalidRequest, fmt.Sprintf("validate options: %s", err.Error()), + fmt.Errorf("%w: %s", ErrInvalidRequest, err.Error()), + ) + } + normalizedOptions = &opts + } + + storeInput := ports.UpdateEngineVersionInput{ + Version: input.Version, + Options: normalizedOptions, + Status: input.Status, + Now: startedAt, + } + if input.ImageRef != nil { + trimmed := strings.TrimSpace(*input.ImageRef) + storeInput.ImageRef = &trimmed + } + + if err := service.versions.Update(ctx, storeInput); err != nil { + switch { + case errors.Is(err, engineversion.ErrNotFound): + return engineversion.EngineVersion{}, service.recordUpdateFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeEngineVersionNotFound, fmt.Sprintf("engine version %q not found", input.Version), + fmt.Errorf("%w: %q", ErrNotFound, input.Version), + ) + default: + return engineversion.EngineVersion{}, service.recordUpdateFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeServiceUnavailable, fmt.Sprintf("update engine version: %s", err.Error()), + fmt.Errorf("%w: update engine version: %s", ErrServiceUnavailable, err.Error()), + ) + } + } + + persisted, err := service.versions.Get(ctx, input.Version) + if err != nil { + // The Update succeeded but the post-read failed. Surface the + // store error; the audit entry still records the successful + // mutation against operation_log. 
+ service.appendSuccess(ctx, operation.OpKindEngineVersionUpdate, input.Version, input.OpSource, input.SourceRef, startedAt) + return engineversion.EngineVersion{}, fmt.Errorf("%w: reload engine version: %s", ErrServiceUnavailable, err.Error()) + } + + service.appendSuccess(ctx, operation.OpKindEngineVersionUpdate, input.Version, input.OpSource, input.SourceRef, startedAt) + + logArgs := []any{ + "version", input.Version, + "op_source", string(fallbackOpSource(input.OpSource)), + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.InfoContext(ctx, "engine version updated", logArgs...) + + return persisted, nil +} + +// Deprecate marks one registry row as deprecated. Idempotent: the call +// succeeds even when the row is already deprecated. Returns ErrNotFound +// when no row matches. +func (service *Service) Deprecate(ctx context.Context, input DeprecateInput) error { + if service == nil { + return errors.New("engine version deprecate: nil service") + } + if ctx == nil { + return errors.New("engine version deprecate: nil context") + } + + startedAt := service.clock().UTC() + + if strings.TrimSpace(input.Version) == "" { + return service.recordDeprecateFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeInvalidRequest, "version must not be empty", + fmt.Errorf("%w: version must not be empty", ErrInvalidRequest), + ) + } + + if err := service.versions.Deprecate(ctx, input.Version, startedAt); err != nil { + switch { + case errors.Is(err, engineversion.ErrNotFound): + return service.recordDeprecateFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeEngineVersionNotFound, fmt.Sprintf("engine version %q not found", input.Version), + fmt.Errorf("%w: %q", ErrNotFound, input.Version), + ) + default: + return service.recordDeprecateFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeServiceUnavailable, fmt.Sprintf("deprecate engine version: %s", 
err.Error()), + fmt.Errorf("%w: deprecate engine version: %s", ErrServiceUnavailable, err.Error()), + ) + } + } + + service.appendSuccess(ctx, operation.OpKindEngineVersionDeprecate, input.Version, input.OpSource, input.SourceRef, startedAt) + + logArgs := []any{ + "version", input.Version, + "op_source", string(fallbackOpSource(input.OpSource)), + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.InfoContext(ctx, "engine version deprecated", logArgs...) + + return nil +} + +// Delete hard-deletes one registry row. Rejected with ErrInUse when any +// non-finished runtime still references the version. The reference +// probe runs first so the conflict is surfaced before the row is +// removed. +func (service *Service) Delete(ctx context.Context, input DeleteInput) error { + if service == nil { + return errors.New("engine version delete: nil service") + } + if ctx == nil { + return errors.New("engine version delete: nil context") + } + + startedAt := service.clock().UTC() + + if strings.TrimSpace(input.Version) == "" { + return service.recordDeleteFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeInvalidRequest, "version must not be empty", + fmt.Errorf("%w: version must not be empty", ErrInvalidRequest), + ) + } + + referenced, err := service.versions.IsReferencedByActiveRuntime(ctx, input.Version) + if err != nil { + return service.recordDeleteFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeServiceUnavailable, fmt.Sprintf("is referenced by active runtime: %s", err.Error()), + fmt.Errorf("%w: is referenced by active runtime: %s", ErrServiceUnavailable, err.Error()), + ) + } + if referenced { + return service.recordDeleteFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeEngineVersionInUse, fmt.Sprintf("engine version %q is referenced by an active runtime", input.Version), + fmt.Errorf("%w: %q", ErrInUse, input.Version), + ) + } + + 
if err := service.versions.Delete(ctx, input.Version); err != nil { + switch { + case errors.Is(err, engineversion.ErrNotFound): + return service.recordDeleteFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeEngineVersionNotFound, fmt.Sprintf("engine version %q not found", input.Version), + fmt.Errorf("%w: %q", ErrNotFound, input.Version), + ) + default: + return service.recordDeleteFailure( + ctx, startedAt, input.Version, input.OpSource, input.SourceRef, + ErrorCodeServiceUnavailable, fmt.Sprintf("delete engine version: %s", err.Error()), + fmt.Errorf("%w: delete engine version: %s", ErrServiceUnavailable, err.Error()), + ) + } + } + + service.appendSuccess(ctx, operation.OpKindEngineVersionDelete, input.Version, input.OpSource, input.SourceRef, startedAt) + + logArgs := []any{ + "version", input.Version, + "op_source", string(fallbackOpSource(input.OpSource)), + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.InfoContext(ctx, "engine version deleted", logArgs...) + + return nil +} + +// validateImageRef enforces the Docker reference shape required by +// `engine_versions.image_ref`: non-empty trimmed, parseable through +// `distribution/reference.ParseNormalizedNamed`. The check is the same +// one Runtime Manager applies in startruntime so the registry never +// stores a value the runtime cannot pull. +func validateImageRef(imageRef string) error { + trimmed := strings.TrimSpace(imageRef) + if trimmed == "" { + return fmt.Errorf("image_ref must not be empty") + } + if _, err := reference.ParseNormalizedNamed(trimmed); err != nil { + return fmt.Errorf("parse image reference %q: %w", trimmed, err) + } + return nil +} + +// normalizeOptions validates that raw is a JSON document encoding a +// single object. Empty input is treated as `{}` and stored verbatim by +// the adapter (see stage 11 D5). 
+func normalizeOptions(raw []byte) ([]byte, error) {
+	trimmed := bytesTrim(raw)
+	if len(trimmed) == 0 {
+		return nil, nil
+	}
+	var probe map[string]any
+	if err := json.Unmarshal(trimmed, &probe); err != nil {
+		return nil, fmt.Errorf("options must be a JSON object: %w", err)
+	}
+	if probe == nil {
+		// json.Unmarshal accepts the literal `null` into a map without
+		// reporting an error; reject it explicitly so only JSON objects
+		// reach the store.
+		return nil, fmt.Errorf("options must be a JSON object, not null")
+	}
+	return trimmed, nil
+}
+
+// bytesTrim returns raw with surrounding ASCII whitespace removed. The
+// helper avoids the round-trip through `string` for raw JSON inputs.
+func bytesTrim(raw []byte) []byte {
+	start, end := 0, len(raw)
+	for start < end && isASCIISpace(raw[start]) {
+		start++
+	}
+	for end > start && isASCIISpace(raw[end-1]) {
+		end--
+	}
+	return raw[start:end]
+}
+
+func isASCIISpace(b byte) bool {
+	switch b {
+	case ' ', '\t', '\n', '\r':
+		return true
+	default:
+		return false
+	}
+}
+
+// recordCreateFailure appends an audit failure entry for a Create call
+// and returns the original sentinel error wrapped with the failure
+// reason. The audit entry is written best-effort; storage failures are
+// logged and discarded.
+func (service *Service) recordCreateFailure( + ctx context.Context, + startedAt time.Time, + subject string, + source operation.OpSource, + sourceRef string, + errorCode string, + errorMessage string, + wrappedErr error, +) error { + service.appendFailure(ctx, operation.OpKindEngineVersionCreate, subject, source, sourceRef, startedAt, errorCode, errorMessage) + service.logFailure(ctx, "engine version create failed", subject, source, errorCode, errorMessage) + return wrappedErr +} + +func (service *Service) recordUpdateFailure( + ctx context.Context, + startedAt time.Time, + subject string, + source operation.OpSource, + sourceRef string, + errorCode string, + errorMessage string, + wrappedErr error, +) error { + service.appendFailure(ctx, operation.OpKindEngineVersionUpdate, subject, source, sourceRef, startedAt, errorCode, errorMessage) + service.logFailure(ctx, "engine version update failed", subject, source, errorCode, errorMessage) + return wrappedErr +} + +func (service *Service) recordDeprecateFailure( + ctx context.Context, + startedAt time.Time, + subject string, + source operation.OpSource, + sourceRef string, + errorCode string, + errorMessage string, + wrappedErr error, +) error { + service.appendFailure(ctx, operation.OpKindEngineVersionDeprecate, subject, source, sourceRef, startedAt, errorCode, errorMessage) + service.logFailure(ctx, "engine version deprecate failed", subject, source, errorCode, errorMessage) + return wrappedErr +} + +func (service *Service) recordDeleteFailure( + ctx context.Context, + startedAt time.Time, + subject string, + source operation.OpSource, + sourceRef string, + errorCode string, + errorMessage string, + wrappedErr error, +) error { + service.appendFailure(ctx, operation.OpKindEngineVersionDelete, subject, source, sourceRef, startedAt, errorCode, errorMessage) + service.logFailure(ctx, "engine version delete failed", subject, source, errorCode, errorMessage) + return wrappedErr +} + +// appendSuccess writes a success 
entry to operation_log. Subject is the +// canonical version string; the entry's GameID column doubles as the +// audit subject for engine-version operations (stage 14 decision — +// the registry is global, not per-game). +func (service *Service) appendSuccess( + ctx context.Context, + kind operation.OpKind, + subject string, + source operation.OpSource, + sourceRef string, + startedAt time.Time, +) { + finishedAt := service.clock().UTC() + service.bestEffortAppend(ctx, operation.OperationEntry{ + GameID: subject, + OpKind: kind, + OpSource: fallbackOpSource(source), + SourceRef: sourceRef, + Outcome: operation.OutcomeSuccess, + StartedAt: startedAt, + FinishedAt: &finishedAt, + }) +} + +// appendFailure writes a failure entry to operation_log. Subject and +// the GameID column overload follow the same rule as appendSuccess. +func (service *Service) appendFailure( + ctx context.Context, + kind operation.OpKind, + subject string, + source operation.OpSource, + sourceRef string, + startedAt time.Time, + errorCode string, + errorMessage string, +) { + finishedAt := service.clock().UTC() + service.bestEffortAppend(ctx, operation.OperationEntry{ + GameID: subject, + OpKind: kind, + OpSource: fallbackOpSource(source), + SourceRef: sourceRef, + Outcome: operation.OutcomeFailure, + ErrorCode: errorCode, + ErrorMessage: errorMessage, + StartedAt: startedAt, + FinishedAt: &finishedAt, + }) +} + +// bestEffortAppend writes one operation_log entry. A failure is logged +// and discarded; the registry mutation (or its absence) remains the +// source of truth. 
+func (service *Service) bestEffortAppend(ctx context.Context, entry operation.OperationEntry) { + if _, err := service.operationLogs.Append(ctx, entry); err != nil { + service.logger.ErrorContext(ctx, "append operation log", + "subject", entry.GameID, + "op_kind", string(entry.OpKind), + "outcome", string(entry.Outcome), + "error_code", entry.ErrorCode, + "err", err.Error(), + ) + } +} + +// logFailure emits one structured warn-level entry per service-level +// failure, mirroring registerruntime's log shape. +func (service *Service) logFailure( + ctx context.Context, + message string, + subject string, + source operation.OpSource, + errorCode string, + errorMessage string, +) { + logArgs := []any{ + "version", subject, + "op_source", string(fallbackOpSource(source)), + "error_code", errorCode, + "error_message", errorMessage, + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.WarnContext(ctx, message, logArgs...) +} + +// fallbackOpSource defaults to admin_rest when source is missing or +// unrecognised. Mirrors `gamemaster/README.md §Trusted Surfaces`. 
+func fallbackOpSource(source operation.OpSource) operation.OpSource {
+	if source.IsKnown() {
+		return source
+	}
+	return operation.OpSourceAdminRest
+}
diff --git a/gamemaster/internal/service/engineversion/service_test.go b/gamemaster/internal/service/engineversion/service_test.go
new file mode 100644
index 0000000..df59ade
--- /dev/null
+++ b/gamemaster/internal/service/engineversion/service_test.go
@@ -0,0 +1,631 @@
+package engineversion_test
+
+import (
+	"context"
+	"errors"
+	"sync"
+	"testing"
+	"time"
+
+	"galaxy/gamemaster/internal/adapters/mocks"
+	domainengineversion "galaxy/gamemaster/internal/domain/engineversion"
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/service/engineversion"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+	"go.uber.org/mock/gomock"
+)
+
+// fakeOperationLogs is a thread-safe stub recorder for the few
+// operation_log entries the engine-version service writes per call.
+// Using a stub keeps the operation_log assertions short and direct,
+// without the verbosity of a gomock recorder expectation for every
+// entry.
+type fakeOperationLogs struct { + mu sync.Mutex + entries []operation.OperationEntry + err error +} + +func newFakeOperationLogs() *fakeOperationLogs { + return &fakeOperationLogs{} +} + +func (s *fakeOperationLogs) Append(_ context.Context, entry operation.OperationEntry) (int64, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.err != nil { + return 0, s.err + } + s.entries = append(s.entries, entry) + return int64(len(s.entries)), nil +} + +func (s *fakeOperationLogs) ListByGame(_ context.Context, _ string, _ int) ([]operation.OperationEntry, error) { + return nil, errors.New("not used in engineversion tests") +} + +func (s *fakeOperationLogs) snapshot() []operation.OperationEntry { + s.mu.Lock() + defer s.mu.Unlock() + out := make([]operation.OperationEntry, len(s.entries)) + copy(out, s.entries) + return out +} + +type harness struct { + ctrl *gomock.Controller + store *mocks.MockEngineVersionStore + oplog *fakeOperationLogs + clock time.Time + service *engineversion.Service +} + +func newHarness(t *testing.T) *harness { + t.Helper() + ctrl := gomock.NewController(t) + store := mocks.NewMockEngineVersionStore(ctrl) + oplog := newFakeOperationLogs() + clock := time.Date(2026, time.April, 30, 12, 0, 0, 0, time.UTC) + + service, err := engineversion.NewService(engineversion.Dependencies{ + EngineVersions: store, + OperationLogs: oplog, + Clock: func() time.Time { return clock }, + }) + require.NoError(t, err) + + return &harness{ + ctrl: ctrl, + store: store, + oplog: oplog, + clock: clock, + service: service, + } +} + +func TestNewServiceRejectsMissingDeps(t *testing.T) { + ctrl := gomock.NewController(t) + store := mocks.NewMockEngineVersionStore(ctrl) + oplog := newFakeOperationLogs() + + tests := []struct { + name string + deps engineversion.Dependencies + }{ + {"nil store", engineversion.Dependencies{OperationLogs: oplog}}, + {"nil oplog", engineversion.Dependencies{EngineVersions: store}}, + } + for _, tc := range tests { + t.Run(tc.name, func(t 
*testing.T) { + s, err := engineversion.NewService(tc.deps) + require.Error(t, err) + require.Nil(t, s) + }) + } +} + +func TestNewServiceDefaultsClockAndLogger(t *testing.T) { + ctrl := gomock.NewController(t) + service, err := engineversion.NewService(engineversion.Dependencies{ + EngineVersions: mocks.NewMockEngineVersionStore(ctrl), + OperationLogs: newFakeOperationLogs(), + }) + require.NoError(t, err) + require.NotNil(t, service) +} + +// --- List ------------------------------------------------------------ + +func TestListNoFilter(t *testing.T) { + h := newHarness(t) + rows := []domainengineversion.EngineVersion{ + {Version: "v1.2.3", ImageRef: "ghcr.io/galaxy/game:v1.2.3", Status: domainengineversion.StatusActive}, + {Version: "v1.3.0", ImageRef: "ghcr.io/galaxy/game:v1.3.0", Status: domainengineversion.StatusDeprecated}, + } + h.store.EXPECT().List(gomock.Any(), nil).Return(rows, nil) + + got, err := h.service.List(context.Background(), nil) + require.NoError(t, err) + assert.Equal(t, rows, got) +} + +func TestListWithStatusFilter(t *testing.T) { + h := newHarness(t) + active := domainengineversion.StatusActive + expected := []domainengineversion.EngineVersion{ + {Version: "v1.2.3", ImageRef: "ghcr.io/galaxy/game:v1.2.3", Status: active}, + } + h.store.EXPECT().List(gomock.Any(), &active).Return(expected, nil) + + got, err := h.service.List(context.Background(), &active) + require.NoError(t, err) + assert.Equal(t, expected, got) +} + +func TestListRejectsUnknownStatusFilter(t *testing.T) { + h := newHarness(t) + exotic := domainengineversion.Status("exotic") + got, err := h.service.List(context.Background(), &exotic) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) + assert.Nil(t, got) +} + +func TestListWrapsStoreErrorAsServiceUnavailable(t *testing.T) { + h := newHarness(t) + storeErr := errors.New("pg down") + h.store.EXPECT().List(gomock.Any(), nil).Return(nil, storeErr) + + _, err := 
h.service.List(context.Background(), nil) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrServiceUnavailable)) +} + +// --- Get ------------------------------------------------------------- + +func TestGetHappyPath(t *testing.T) { + h := newHarness(t) + row := domainengineversion.EngineVersion{ + Version: "v1.2.3", ImageRef: "ghcr.io/galaxy/game:v1.2.3", Status: domainengineversion.StatusActive, + } + h.store.EXPECT().Get(gomock.Any(), "v1.2.3").Return(row, nil) + + got, err := h.service.Get(context.Background(), "v1.2.3") + require.NoError(t, err) + assert.Equal(t, row, got) +} + +func TestGetNotFound(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Get(gomock.Any(), "v9.9.9").Return(domainengineversion.EngineVersion{}, domainengineversion.ErrNotFound) + + _, err := h.service.Get(context.Background(), "v9.9.9") + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrNotFound)) +} + +func TestGetRejectsEmptyVersion(t *testing.T) { + h := newHarness(t) + _, err := h.service.Get(context.Background(), " ") + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) +} + +func TestGetWrapsStoreError(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Get(gomock.Any(), "v1.2.3").Return(domainengineversion.EngineVersion{}, errors.New("pg down")) + + _, err := h.service.Get(context.Background(), "v1.2.3") + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrServiceUnavailable)) +} + +// --- ResolveImageRef ------------------------------------------------- + +func TestResolveImageRefHappyPath(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Get(gomock.Any(), "v1.2.3").Return(domainengineversion.EngineVersion{ + Version: "v1.2.3", ImageRef: "ghcr.io/galaxy/game:v1.2.3", Status: domainengineversion.StatusActive, + }, nil) + + got, err := h.service.ResolveImageRef(context.Background(), "v1.2.3") + require.NoError(t, err) + assert.Equal(t, 
"ghcr.io/galaxy/game:v1.2.3", got) +} + +func TestResolveImageRefSeededTable(t *testing.T) { + tests := []struct { + name string + seedVersion string + seedRef string + }{ + {"v1.0.0", "v1.0.0", "ghcr.io/galaxy/game:v1.0.0"}, + {"v1.2.3 with prerelease metadata", "v1.2.3-rc1", "ghcr.io/galaxy/game:v1.2.3-rc1"}, + {"v2.0.0 fully-qualified", "v2.0.0", "registry.galaxy.local/game:v2.0.0"}, + } + for _, tc := range tests { + t.Run(tc.name, func(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Get(gomock.Any(), tc.seedVersion).Return(domainengineversion.EngineVersion{ + Version: tc.seedVersion, ImageRef: tc.seedRef, Status: domainengineversion.StatusActive, + }, nil) + got, err := h.service.ResolveImageRef(context.Background(), tc.seedVersion) + require.NoError(t, err) + assert.Equal(t, tc.seedRef, got) + }) + } +} + +func TestResolveImageRefNotFound(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Get(gomock.Any(), "v9.9.9").Return(domainengineversion.EngineVersion{}, domainengineversion.ErrNotFound) + + _, err := h.service.ResolveImageRef(context.Background(), "v9.9.9") + require.True(t, errors.Is(err, engineversion.ErrNotFound)) +} + +// --- Create ---------------------------------------------------------- + +func TestCreateHappyPath(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Insert(gomock.Any(), gomock.Any()).DoAndReturn( + func(_ context.Context, record domainengineversion.EngineVersion) error { + assert.Equal(t, "v1.2.3", record.Version) + assert.Equal(t, "ghcr.io/galaxy/game:v1.2.3", record.ImageRef) + assert.Equal(t, domainengineversion.StatusActive, record.Status) + assert.Equal(t, h.clock, record.CreatedAt) + assert.Equal(t, h.clock, record.UpdatedAt) + return nil + }, + ) + + got, err := h.service.Create(context.Background(), engineversion.CreateInput{ + Version: "1.2.3", + ImageRef: "ghcr.io/galaxy/game:v1.2.3", + Options: []byte(`{"max_planets":120}`), + OpSource: operation.OpSourceAdminRest, + SourceRef: "request-1", + }) + 
require.NoError(t, err) + assert.Equal(t, "v1.2.3", got.Version) + + entries := h.oplog.snapshot() + require.Len(t, entries, 1) + assert.Equal(t, operation.OpKindEngineVersionCreate, entries[0].OpKind) + assert.Equal(t, "v1.2.3", entries[0].GameID) + assert.Equal(t, operation.OutcomeSuccess, entries[0].Outcome) + assert.Equal(t, operation.OpSourceAdminRest, entries[0].OpSource) + assert.Equal(t, "request-1", entries[0].SourceRef) +} + +func TestCreateRejectsInvalidSemver(t *testing.T) { + tests := []string{"", " ", "not-a-version", "v1.2", "1.2"} + for _, version := range tests { + t.Run(version, func(t *testing.T) { + h := newHarness(t) + _, err := h.service.Create(context.Background(), engineversion.CreateInput{ + Version: version, + ImageRef: "ghcr.io/galaxy/game:v1.2.3", + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) + }) + } +} + +func TestCreateAuditFailureForBadImageRef(t *testing.T) { + h := newHarness(t) + _, err := h.service.Create(context.Background(), engineversion.CreateInput{ + Version: "v1.2.3", + ImageRef: " ", + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) + entries := h.oplog.snapshot() + require.Len(t, entries, 1) + assert.Equal(t, operation.OpKindEngineVersionCreate, entries[0].OpKind) + assert.Equal(t, "v1.2.3", entries[0].GameID) + assert.Equal(t, operation.OutcomeFailure, entries[0].Outcome) + assert.Equal(t, engineversion.ErrorCodeInvalidRequest, entries[0].ErrorCode) +} + +func TestCreateRejectsBadDockerReference(t *testing.T) { + h := newHarness(t) + _, err := h.service.Create(context.Background(), engineversion.CreateInput{ + Version: "v1.2.3", + ImageRef: "BAD//Ref::", + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) +} + +func TestCreateRejectsNonObjectOptions(t *testing.T) { + h := newHarness(t) + _, err := h.service.Create(context.Background(), engineversion.CreateInput{ + Version: "v1.2.3", 
+ ImageRef: "ghcr.io/galaxy/game:v1.2.3", + Options: []byte(`[1,2,3]`), + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) +} + +func TestCreateAcceptsEmptyOptionsAsNil(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Insert(gomock.Any(), gomock.Any()).DoAndReturn( + func(_ context.Context, record domainengineversion.EngineVersion) error { + assert.Empty(t, record.Options, "expected empty options pass-through (adapter writes default {})") + return nil + }, + ) + _, err := h.service.Create(context.Background(), engineversion.CreateInput{ + Version: "v1.2.3", + ImageRef: "ghcr.io/galaxy/game:v1.2.3", + Options: nil, + }) + require.NoError(t, err) +} + +func TestCreateConflict(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Insert(gomock.Any(), gomock.Any()).Return(domainengineversion.ErrConflict) + _, err := h.service.Create(context.Background(), engineversion.CreateInput{ + Version: "v1.2.3", + ImageRef: "ghcr.io/galaxy/game:v1.2.3", + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrConflict)) + entries := h.oplog.snapshot() + require.Len(t, entries, 1) + assert.Equal(t, operation.OutcomeFailure, entries[0].Outcome) + assert.Equal(t, engineversion.ErrorCodeConflict, entries[0].ErrorCode) +} + +func TestCreateUnknownStoreError(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Insert(gomock.Any(), gomock.Any()).Return(errors.New("pg down")) + _, err := h.service.Create(context.Background(), engineversion.CreateInput{ + Version: "v1.2.3", + ImageRef: "ghcr.io/galaxy/game:v1.2.3", + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrServiceUnavailable)) +} + +// --- Update ---------------------------------------------------------- + +func TestUpdateHappyPath(t *testing.T) { + h := newHarness(t) + newRef := "ghcr.io/galaxy/game:v1.2.4" + deprecated := domainengineversion.StatusDeprecated + + gomock.InOrder( + h.store.EXPECT().Update(gomock.Any(), 
gomock.Any()).DoAndReturn( + func(_ context.Context, input ports.UpdateEngineVersionInput) error { + require.NotNil(t, input.ImageRef) + assert.Equal(t, newRef, *input.ImageRef) + require.NotNil(t, input.Status) + assert.Equal(t, deprecated, *input.Status) + assert.Equal(t, h.clock, input.Now) + return nil + }, + ), + h.store.EXPECT().Get(gomock.Any(), "v1.2.3").Return(domainengineversion.EngineVersion{ + Version: "v1.2.3", ImageRef: newRef, Status: deprecated, UpdatedAt: h.clock, + }, nil), + ) + + got, err := h.service.Update(context.Background(), engineversion.UpdateInput{ + Version: "v1.2.3", + ImageRef: &newRef, + Status: &deprecated, + }) + require.NoError(t, err) + assert.Equal(t, deprecated, got.Status) + + entries := h.oplog.snapshot() + require.Len(t, entries, 1) + assert.Equal(t, operation.OpKindEngineVersionUpdate, entries[0].OpKind) + assert.Equal(t, operation.OutcomeSuccess, entries[0].Outcome) +} + +func TestUpdateRejectsEmptyVersion(t *testing.T) { + h := newHarness(t) + newRef := "ghcr.io/galaxy/game:v1.2.4" + _, err := h.service.Update(context.Background(), engineversion.UpdateInput{ + Version: " ", + ImageRef: &newRef, + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) +} + +func TestUpdateRejectsEmptyPatch(t *testing.T) { + h := newHarness(t) + _, err := h.service.Update(context.Background(), engineversion.UpdateInput{Version: "v1.2.3"}) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) +} + +func TestUpdateRejectsBadImageRef(t *testing.T) { + h := newHarness(t) + bad := "BAD//Ref::" + _, err := h.service.Update(context.Background(), engineversion.UpdateInput{ + Version: "v1.2.3", + ImageRef: &bad, + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) +} + +func TestUpdateRejectsUnknownStatus(t *testing.T) { + h := newHarness(t) + bad := domainengineversion.Status("exotic") + _, err := 
h.service.Update(context.Background(), engineversion.UpdateInput{ + Version: "v1.2.3", + Status: &bad, + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) +} + +func TestUpdateRejectsBadOptions(t *testing.T) { + h := newHarness(t) + bad := []byte(`"not-an-object"`) + _, err := h.service.Update(context.Background(), engineversion.UpdateInput{ + Version: "v1.2.3", + Options: &bad, + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) +} + +func TestUpdateNotFound(t *testing.T) { + h := newHarness(t) + newRef := "ghcr.io/galaxy/game:v1.2.4" + h.store.EXPECT().Update(gomock.Any(), gomock.Any()).Return(domainengineversion.ErrNotFound) + _, err := h.service.Update(context.Background(), engineversion.UpdateInput{ + Version: "v1.2.3", + ImageRef: &newRef, + }) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrNotFound)) + entries := h.oplog.snapshot() + require.Len(t, entries, 1) + assert.Equal(t, engineversion.ErrorCodeEngineVersionNotFound, entries[0].ErrorCode) +} + +// --- Deprecate ------------------------------------------------------- + +func TestDeprecateHappyPath(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Deprecate(gomock.Any(), "v1.2.3", h.clock).Return(nil) + + err := h.service.Deprecate(context.Background(), engineversion.DeprecateInput{Version: "v1.2.3"}) + require.NoError(t, err) + entries := h.oplog.snapshot() + require.Len(t, entries, 1) + assert.Equal(t, operation.OpKindEngineVersionDeprecate, entries[0].OpKind) + assert.Equal(t, operation.OutcomeSuccess, entries[0].Outcome) +} + +func TestDeprecateRejectsEmptyVersion(t *testing.T) { + h := newHarness(t) + err := h.service.Deprecate(context.Background(), engineversion.DeprecateInput{}) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) +} + +func TestDeprecateNotFound(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Deprecate(gomock.Any(), 
"v9.9.9", h.clock).Return(domainengineversion.ErrNotFound) + err := h.service.Deprecate(context.Background(), engineversion.DeprecateInput{Version: "v9.9.9"}) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrNotFound)) + entries := h.oplog.snapshot() + require.Len(t, entries, 1) + assert.Equal(t, operation.OutcomeFailure, entries[0].Outcome) + assert.Equal(t, engineversion.ErrorCodeEngineVersionNotFound, entries[0].ErrorCode) +} + +func TestDeprecateUnknownStoreError(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().Deprecate(gomock.Any(), "v1.2.3", h.clock).Return(errors.New("pg down")) + err := h.service.Deprecate(context.Background(), engineversion.DeprecateInput{Version: "v1.2.3"}) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrServiceUnavailable)) +} + +// --- Delete ---------------------------------------------------------- + +func TestDeleteHappyPath(t *testing.T) { + h := newHarness(t) + gomock.InOrder( + h.store.EXPECT().IsReferencedByActiveRuntime(gomock.Any(), "v1.2.3").Return(false, nil), + h.store.EXPECT().Delete(gomock.Any(), "v1.2.3").Return(nil), + ) + err := h.service.Delete(context.Background(), engineversion.DeleteInput{ + Version: "v1.2.3", + OpSource: operation.OpSourceAdminRest, + SourceRef: "ticket-42", + }) + require.NoError(t, err) + entries := h.oplog.snapshot() + require.Len(t, entries, 1) + assert.Equal(t, operation.OpKindEngineVersionDelete, entries[0].OpKind) + assert.Equal(t, operation.OutcomeSuccess, entries[0].Outcome) + assert.Equal(t, "ticket-42", entries[0].SourceRef) +} + +func TestDeleteRejectsEmptyVersion(t *testing.T) { + h := newHarness(t) + err := h.service.Delete(context.Background(), engineversion.DeleteInput{}) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInvalidRequest)) +} + +func TestDeleteRejectedWhenReferenced(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().IsReferencedByActiveRuntime(gomock.Any(), "v1.2.3").Return(true, 
nil) + // Delete must not be called when the row is referenced. + + err := h.service.Delete(context.Background(), engineversion.DeleteInput{Version: "v1.2.3"}) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrInUse)) + entries := h.oplog.snapshot() + require.Len(t, entries, 1) + assert.Equal(t, operation.OutcomeFailure, entries[0].Outcome) + assert.Equal(t, engineversion.ErrorCodeEngineVersionInUse, entries[0].ErrorCode) +} + +func TestDeleteIsReferencedProbeError(t *testing.T) { + h := newHarness(t) + h.store.EXPECT().IsReferencedByActiveRuntime(gomock.Any(), "v1.2.3").Return(false, errors.New("pg down")) + + err := h.service.Delete(context.Background(), engineversion.DeleteInput{Version: "v1.2.3"}) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrServiceUnavailable)) +} + +func TestDeleteNotFound(t *testing.T) { + h := newHarness(t) + gomock.InOrder( + h.store.EXPECT().IsReferencedByActiveRuntime(gomock.Any(), "v9.9.9").Return(false, nil), + h.store.EXPECT().Delete(gomock.Any(), "v9.9.9").Return(domainengineversion.ErrNotFound), + ) + err := h.service.Delete(context.Background(), engineversion.DeleteInput{Version: "v9.9.9"}) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrNotFound)) +} + +func TestDeleteUnknownStoreError(t *testing.T) { + h := newHarness(t) + gomock.InOrder( + h.store.EXPECT().IsReferencedByActiveRuntime(gomock.Any(), "v1.2.3").Return(false, nil), + h.store.EXPECT().Delete(gomock.Any(), "v1.2.3").Return(errors.New("pg down")), + ) + err := h.service.Delete(context.Background(), engineversion.DeleteInput{Version: "v1.2.3"}) + require.Error(t, err) + require.True(t, errors.Is(err, engineversion.ErrServiceUnavailable)) +} + +// --- guard rails ----------------------------------------------------- + +func TestNilContextReturnsError(t *testing.T) { + h := newHarness(t) + + t.Run("List", func(t *testing.T) { + _, err := h.service.List(nil, nil) //nolint:staticcheck // 
intentional nil context + require.Error(t, err) + }) + t.Run("Get", func(t *testing.T) { + _, err := h.service.Get(nil, "v1.2.3") //nolint:staticcheck // intentional nil context + require.Error(t, err) + }) + t.Run("Create", func(t *testing.T) { + _, err := h.service.Create(nil, engineversion.CreateInput{}) //nolint:staticcheck // intentional nil context + require.Error(t, err) + }) + t.Run("Update", func(t *testing.T) { + _, err := h.service.Update(nil, engineversion.UpdateInput{}) //nolint:staticcheck // intentional nil context + require.Error(t, err) + }) + t.Run("Deprecate", func(t *testing.T) { + err := h.service.Deprecate(nil, engineversion.DeprecateInput{}) //nolint:staticcheck // intentional nil context + require.Error(t, err) + }) + t.Run("Delete", func(t *testing.T) { + err := h.service.Delete(nil, engineversion.DeleteInput{}) //nolint:staticcheck // intentional nil context + require.Error(t, err) + }) +} + +func TestNilServiceReturnsError(t *testing.T) { + var s *engineversion.Service + _, err := s.Get(context.Background(), "v1.2.3") + require.Error(t, err) + _, err = s.Create(context.Background(), engineversion.CreateInput{}) + require.Error(t, err) +} diff --git a/gamemaster/internal/service/livenessreply/errors.go b/gamemaster/internal/service/livenessreply/errors.go new file mode 100644 index 0000000..5308949 --- /dev/null +++ b/gamemaster/internal/service/livenessreply/errors.go @@ -0,0 +1,19 @@ +package livenessreply + +// Stable error codes returned by Handle as Go-level errors. Liveness +// reply itself never produces a 4xx/5xx response — the endpoint always +// answers 200 — but the service surfaces structural validation +// failures to the handler so it can return the standard error envelope. +const ( + // ErrorCodeInvalidRequest reports that the request envelope failed + // structural validation (empty GameID). 
+ ErrorCodeInvalidRequest = "invalid_request" + + // ErrorCodeServiceUnavailable reports that a steady-state + // dependency (PostgreSQL) was unreachable for this call. + ErrorCodeServiceUnavailable = "service_unavailable" + + // ErrorCodeInternal reports an unexpected error not classified by + // the other codes. + ErrorCodeInternal = "internal_error" +) diff --git a/gamemaster/internal/service/livenessreply/service.go b/gamemaster/internal/service/livenessreply/service.go new file mode 100644 index 0000000..71e8f6b --- /dev/null +++ b/gamemaster/internal/service/livenessreply/service.go @@ -0,0 +1,114 @@ +// Package livenessreply implements the Lobby-facing liveness service- +// layer answer owned by Game Master. It is driven by Game Lobby +// resuming a paused game through +// `GET /api/v1/internal/games/{game_id}/liveness` and reflects GM's +// own view of the runtime without ever calling the engine. +// +// Lifecycle and failure-mode semantics follow `gamemaster/README.md +// §Liveness reply`. The 200 / status="" response on +// `runtime_not_found` is the Stage 17 D5 decision recorded in +// `gamemaster/docs/stage17-admin-operations.md`. +package livenessreply + +import ( + "context" + "errors" + "fmt" + "log/slog" + "strings" + + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/ports" +) + +// Input stores the per-call arguments for one liveness reply. +type Input struct { + // GameID identifies the runtime to inspect. + GameID string +} + +// Validate reports whether input carries the structural invariants the +// service requires before any store is touched. +func (input Input) Validate() error { + if strings.TrimSpace(input.GameID) == "" { + return fmt.Errorf("game id must not be empty") + } + return nil +} + +// Result stores the deterministic outcome of one Handle call. The +// endpoint always answers 200; the result fields populate the JSON +// body. 
ErrorCode / ErrorMessage are reserved for handler-side error +// envelopes and are never set by Handle on a successful read. +type Result struct { + // Ready is true when the runtime exists and is in `running`. + Ready bool + + // Status carries the observed runtime status. Empty when the + // runtime record does not exist (Stage 17 D5). + Status runtime.Status +} + +// Dependencies groups the collaborators required by Service. +type Dependencies struct { + // RuntimeRecords supplies the runtime status read. + RuntimeRecords ports.RuntimeRecordStore + + // Logger records structured service-level events. Defaults to + // `slog.Default()` when nil. + Logger *slog.Logger +} + +// Service executes the liveness reply lookup. +type Service struct { + runtimeRecords ports.RuntimeRecordStore + logger *slog.Logger +} + +// NewService constructs one Service from deps. +func NewService(deps Dependencies) (*Service, error) { + if deps.RuntimeRecords == nil { + return nil, errors.New("new liveness reply service: nil runtime records") + } + logger := deps.Logger + if logger == nil { + logger = slog.Default() + } + logger = logger.With("service", "gamemaster.livenessreply") + + return &Service{ + runtimeRecords: deps.RuntimeRecords, + logger: logger, + }, nil +} + +// Handle executes one liveness reply lookup. The Go-level error return +// is reserved for non-business failures: nil context, nil receiver, +// invalid input (so the handler can answer `invalid_request`), or a +// store read failure (so the handler can answer `service_unavailable`). +// `runtime.ErrNotFound` is intentionally absorbed into Result with +// `Ready=false` and an empty status. 
+func (service *Service) Handle(ctx context.Context, input Input) (Result, error) { + if service == nil { + return Result{}, errors.New("liveness reply: nil service") + } + if ctx == nil { + return Result{}, errors.New("liveness reply: nil context") + } + if err := input.Validate(); err != nil { + return Result{}, fmt.Errorf("%s: %w", ErrorCodeInvalidRequest, err) + } + + record, err := service.runtimeRecords.Get(ctx, input.GameID) + switch { + case err == nil: + return Result{ + Ready: record.Status == runtime.StatusRunning, + Status: record.Status, + }, nil + case errors.Is(err, runtime.ErrNotFound): + return Result{Ready: false, Status: ""}, nil + default: + return Result{}, fmt.Errorf("%s: get runtime record: %w", ErrorCodeServiceUnavailable, err) + } +} diff --git a/gamemaster/internal/service/livenessreply/service_test.go b/gamemaster/internal/service/livenessreply/service_test.go new file mode 100644 index 0000000..eb4a6dc --- /dev/null +++ b/gamemaster/internal/service/livenessreply/service_test.go @@ -0,0 +1,175 @@ +package livenessreply_test + +import ( + "context" + "errors" + "sync" + "testing" + "time" + + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/service/livenessreply" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +type fakeRuntimeRecords struct { + mu sync.Mutex + stored map[string]runtime.RuntimeRecord + getErr error +} + +func newFakeRuntimeRecords() *fakeRuntimeRecords { + return &fakeRuntimeRecords{stored: map[string]runtime.RuntimeRecord{}} +} + +func (s *fakeRuntimeRecords) seed(record runtime.RuntimeRecord) { + s.mu.Lock() + defer s.mu.Unlock() + s.stored[record.GameID] = record +} + +func (s *fakeRuntimeRecords) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.getErr != nil { + return runtime.RuntimeRecord{}, s.getErr + } + record, ok := s.stored[gameID] + if !ok { + 
return runtime.RuntimeRecord{}, runtime.ErrNotFound + } + return record, nil +} + +func (s *fakeRuntimeRecords) Insert(context.Context, runtime.RuntimeRecord) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateStatus(context.Context, ports.UpdateStatusInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateScheduling(context.Context, ports.UpdateSchedulingInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateImage(context.Context, ports.UpdateImageInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateEngineHealth(context.Context, ports.UpdateEngineHealthInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) Delete(context.Context, string) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) ListDueRunning(context.Context, time.Time) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} +func (s *fakeRuntimeRecords) ListByStatus(context.Context, runtime.Status) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} +func (s *fakeRuntimeRecords) List(context.Context) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} + +func newService(t *testing.T, store *fakeRuntimeRecords) *livenessreply.Service { + t.Helper() + service, err := livenessreply.NewService(livenessreply.Dependencies{ + RuntimeRecords: store, + }) + require.NoError(t, err) + return service +} + +func runningRecord(gameID string) runtime.RuntimeRecord { + now := time.Date(2026, time.May, 1, 12, 0, 0, 0, time.UTC) + return runtime.RuntimeRecord{ + GameID: gameID, + Status: runtime.StatusRunning, + EngineEndpoint: "http://galaxy-game-" + gameID + ":8080", + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.3", + CurrentEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + CurrentTurn: 5, + CreatedAt: now, + UpdatedAt: now, + } +} + +func TestNewServiceRejectsNilRuntimeRecords(t 
*testing.T) { + _, err := livenessreply.NewService(livenessreply.Dependencies{}) + require.Error(t, err) +} + +func TestHandleRunningReturnsReadyTrue(t *testing.T) { + store := newFakeRuntimeRecords() + store.seed(runningRecord("game-001")) + service := newService(t, store) + + result, err := service.Handle(context.Background(), livenessreply.Input{GameID: "game-001"}) + require.NoError(t, err) + assert.True(t, result.Ready) + assert.Equal(t, runtime.StatusRunning, result.Status) +} + +func TestHandleNonRunningReturnsReadyFalseWithStatus(t *testing.T) { + cases := []runtime.Status{ + runtime.StatusStarting, + runtime.StatusGenerationInProgress, + runtime.StatusGenerationFailed, + runtime.StatusEngineUnreachable, + runtime.StatusStopped, + runtime.StatusFinished, + } + for _, status := range cases { + t.Run(string(status), func(t *testing.T) { + store := newFakeRuntimeRecords() + rec := runningRecord("game-001") + rec.Status = status + store.seed(rec) + service := newService(t, store) + + result, err := service.Handle(context.Background(), livenessreply.Input{GameID: "game-001"}) + require.NoError(t, err) + assert.False(t, result.Ready) + assert.Equal(t, status, result.Status) + }) + } +} + +func TestHandleRuntimeNotFoundReturnsEmptyStatus(t *testing.T) { + store := newFakeRuntimeRecords() + service := newService(t, store) + + result, err := service.Handle(context.Background(), livenessreply.Input{GameID: "missing"}) + require.NoError(t, err, "runtime_not_found is absorbed into 200 response per Stage 17 D5") + assert.False(t, result.Ready) + assert.Equal(t, runtime.Status(""), result.Status) +} + +func TestHandleStoreReadFailureReturnsServiceUnavailable(t *testing.T) { + store := newFakeRuntimeRecords() + store.getErr = errors.New("connection refused") + service := newService(t, store) + + _, err := service.Handle(context.Background(), livenessreply.Input{GameID: "game-001"}) + require.Error(t, err) + assert.Contains(t, err.Error(), 
livenessreply.ErrorCodeServiceUnavailable) +} + +func TestHandleEmptyGameIDReturnsInvalidRequest(t *testing.T) { + store := newFakeRuntimeRecords() + service := newService(t, store) + + _, err := service.Handle(context.Background(), livenessreply.Input{GameID: ""}) + require.Error(t, err) + assert.Contains(t, err.Error(), livenessreply.ErrorCodeInvalidRequest) +} + +func TestHandleNilContextReturnsError(t *testing.T) { + store := newFakeRuntimeRecords() + service := newService(t, store) + + _, err := service.Handle(nil, livenessreply.Input{GameID: "game-001"}) //nolint:staticcheck // guard test + require.Error(t, err) +} diff --git a/gamemaster/internal/service/membership/cache.go b/gamemaster/internal/service/membership/cache.go new file mode 100644 index 0000000..cf0b2a9 --- /dev/null +++ b/gamemaster/internal/service/membership/cache.go @@ -0,0 +1,280 @@ +// Package membership implements the in-process membership cache that +// authorises every hot-path call (commandexecute, orderput, reportget) +// owned by Game Master. +// +// The cache is a per-game TTL projection of Lobby's +// `/api/v1/internal/games/{game_id}/memberships` view. Lobby invokes the +// invalidation hook (`POST /api/v1/internal/games/{game_id}/memberships/invalidate`) +// post-commit on every roster mutation; the TTL is the safety net for any +// missed invalidation. Cache rules and trade-offs are documented in +// `gamemaster/README.md §Hot Path → Membership cache and invalidation` and +// `gamemaster/docs/stage16-membership-cache-and-invalidation.md`. +package membership + +import ( + "container/list" + "context" + "errors" + "fmt" + "log/slog" + "sync" + "time" + + "galaxy/gamemaster/internal/logging" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/telemetry" +) + +// Result labels used with `telemetry.Runtime.RecordMembershipCacheResult`. 
+const ( + resultHit = "hit" + resultMiss = "miss" + resultInvalidate = "invalidate" +) + +// Dependencies groups the collaborators required by Cache. +type Dependencies struct { + // Lobby loads the per-game membership projection on cache miss. + Lobby ports.LobbyClient + + // Telemetry records `gamemaster.membership_cache.hits` outcomes. + Telemetry *telemetry.Runtime + + // Logger records structured cache events. Defaults to + // `slog.Default()` when nil. + Logger *slog.Logger + + // Clock supplies the wall-clock used for entry freshness. Defaults + // to `time.Now` when nil. + Clock func() time.Time + + // TTL bounds the freshness of one cached entry; expired entries are + // re-fetched from Lobby. Must be positive. + TTL time.Duration + + // MaxGames bounds the cache size in number of games. The + // least-recently-used entry is evicted when an insert overflows the + // bound. Must be positive. + MaxGames int +} + +// Cache stores the per-game membership projection used by hot-path +// services. The zero value is not usable; construct with NewCache. +type Cache struct { + lobby ports.LobbyClient + telemetry *telemetry.Runtime + logger *slog.Logger + clock func() time.Time + ttl time.Duration + maxGames int + + mu sync.Mutex + entries map[string]*list.Element // gameID → element holding *cacheEntry + lru *list.List // *cacheEntry, MRU at front + inflight map[string]*flight // gameID → in-flight Lobby fetch +} + +// cacheEntry stores one per-game membership projection. +type cacheEntry struct { + gameID string + members map[string]string // user_id → status ("active"|"removed"|"blocked") + loadedAt time.Time +} + +// flight coordinates concurrent misses on the same gameID so only one +// Lobby fetch is issued. Joiners wait on `done`; the leader populates +// `members` (or `err`) before closing the channel. +type flight struct { + done chan struct{} + members map[string]string + err error +} + +// NewCache constructs a Cache from deps. 
Returns a Go-level error when a +// required dependency is missing or a numeric bound is non-positive. +func NewCache(deps Dependencies) (*Cache, error) { + switch { + case deps.Lobby == nil: + return nil, errors.New("new membership cache: nil lobby client") + case deps.Telemetry == nil: + return nil, errors.New("new membership cache: nil telemetry runtime") + case deps.TTL <= 0: + return nil, fmt.Errorf("new membership cache: ttl must be positive, got %s", deps.TTL) + case deps.MaxGames <= 0: + return nil, fmt.Errorf("new membership cache: max games must be positive, got %d", deps.MaxGames) + } + + logger := deps.Logger + if logger == nil { + logger = slog.Default() + } + logger = logger.With("component", "gamemaster.membership_cache") + + clock := deps.Clock + if clock == nil { + clock = time.Now + } + + return &Cache{ + lobby: deps.Lobby, + telemetry: deps.Telemetry, + logger: logger, + clock: clock, + ttl: deps.TTL, + maxGames: deps.MaxGames, + entries: make(map[string]*list.Element), + lru: list.New(), + inflight: make(map[string]*flight), + }, nil +} + +// Resolve returns the membership status of userID inside gameID. The +// returned status is the raw Lobby vocabulary (`"active"`, `"removed"`, +// `"blocked"`) and is empty when the user is not present in the roster at +// all; callers must compare against `"active"` to authorise a hot-path +// call. +// +// Resolve fetches from Lobby on cache miss, on TTL expiry, or after an +// Invalidate. Concurrent misses on the same gameID share a single Lobby +// call. A failed Lobby fetch surfaces as ErrLobbyUnavailable and is not +// cached. 
+func (cache *Cache) Resolve(ctx context.Context, gameID, userID string) (string, error) { + if cache == nil { + return "", errors.New("membership cache: nil receiver") + } + if ctx == nil { + return "", errors.New("membership cache: nil context") + } + + if members, ok := cache.lookupFresh(gameID); ok { + cache.telemetry.RecordMembershipCacheResult(ctx, resultHit) + return members[userID], nil + } + + members, err := cache.fetch(ctx, gameID) + cache.telemetry.RecordMembershipCacheResult(ctx, resultMiss) + if err != nil { + logArgs := []any{ + "game_id", gameID, + "err", err.Error(), + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + cache.logger.WarnContext(ctx, "lobby fetch failed", logArgs...) + return "", err + } + return members[userID], nil +} + +// Invalidate purges the cache entry for gameID, if any. Subsequent +// Resolve calls fetch from Lobby. Safe to call from the invalidation +// hook handler (Stage 19) at any time. +func (cache *Cache) Invalidate(gameID string) { + if cache == nil { + return + } + cache.mu.Lock() + if element, ok := cache.entries[gameID]; ok { + cache.lru.Remove(element) + delete(cache.entries, gameID) + } + cache.mu.Unlock() + cache.telemetry.RecordMembershipCacheResult(context.Background(), resultInvalidate) +} + +// lookupFresh returns the cached membership map for gameID when it exists +// and is still fresh. The map reference is captured under the lock and the +// map is never mutated after install (installLocked swaps in a fresh map), +// so callers may read it lock-free without racing concurrent installs. +func (cache *Cache) lookupFresh(gameID string) (map[string]string, bool) { + cache.mu.Lock() + defer cache.mu.Unlock() + element, ok := cache.entries[gameID] + if !ok { + return nil, false + } + entry := element.Value.(*cacheEntry) + if cache.clock().Sub(entry.loadedAt) >= cache.ttl { + return nil, false + } + cache.lru.MoveToFront(element) + return entry.members, true +} + +// fetch loads the membership projection from Lobby, deduplicating +// concurrent misses on the same gameID through the inflight map. The +// successful result is cached; failures are not. 
+func (cache *Cache) fetch(ctx context.Context, gameID string) (map[string]string, error) { + cache.mu.Lock() + if existing, ok := cache.inflight[gameID]; ok { + cache.mu.Unlock() + select { + case <-existing.done: + if existing.err != nil { + return nil, existing.err + } + return existing.members, nil + case <-ctx.Done(): + return nil, ctx.Err() + } + } + current := &flight{done: make(chan struct{})} + cache.inflight[gameID] = current + cache.mu.Unlock() + + members, err := cache.loadFromLobby(ctx, gameID) + + cache.mu.Lock() + delete(cache.inflight, gameID) + if err == nil { + cache.installLocked(gameID, members) + } + cache.mu.Unlock() + + if err != nil { + current.err = err + } else { + current.members = members + } + close(current.done) + + if err != nil { + return nil, err + } + return members, nil +} + +// loadFromLobby calls the LobbyClient and projects the raw response to +// the user_id → status map the cache stores. +func (cache *Cache) loadFromLobby(ctx context.Context, gameID string) (map[string]string, error) { + records, err := cache.lobby.GetMemberships(ctx, gameID) + if err != nil { + return nil, fmt.Errorf("%w: %w", ErrLobbyUnavailable, err) + } + members := make(map[string]string, len(records)) + for _, record := range records { + members[record.UserID] = record.Status + } + return members, nil +} + +// installLocked stores members under gameID, evicting the least-recently +// -used entry if the cache is at capacity. Caller must hold cache.mu. 
+func (cache *Cache) installLocked(gameID string, members map[string]string) { + now := cache.clock() + if element, ok := cache.entries[gameID]; ok { + entry := element.Value.(*cacheEntry) + entry.members = members + entry.loadedAt = now + cache.lru.MoveToFront(element) + return + } + entry := &cacheEntry{gameID: gameID, members: members, loadedAt: now} + cache.entries[gameID] = cache.lru.PushFront(entry) + for cache.lru.Len() > cache.maxGames { + oldest := cache.lru.Back() + if oldest == nil { + break + } + evicted := oldest.Value.(*cacheEntry) + cache.lru.Remove(oldest) + delete(cache.entries, evicted.gameID) + } +} diff --git a/gamemaster/internal/service/membership/cache_test.go b/gamemaster/internal/service/membership/cache_test.go new file mode 100644 index 0000000..f8deae8 --- /dev/null +++ b/gamemaster/internal/service/membership/cache_test.go @@ -0,0 +1,376 @@ +package membership_test + +import ( + "context" + "errors" + "fmt" + "sync" + "sync/atomic" + "testing" + "time" + + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/service/membership" + "galaxy/gamemaster/internal/telemetry" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +// fakeLobby is a hand-rolled LobbyClient stub used by membership tests. +// It mirrors the test-double style used elsewhere in the gamemaster +// service tree. 
+type fakeLobby struct { + mu sync.Mutex + calls atomic.Int32 + answers map[string][]ports.Membership + errs map[string]error + delay time.Duration + released chan struct{} +} + +func newFakeLobby() *fakeLobby { + return &fakeLobby{ + answers: map[string][]ports.Membership{}, + errs: map[string]error{}, + } +} + +func (f *fakeLobby) seed(gameID string, members []ports.Membership) { + f.mu.Lock() + defer f.mu.Unlock() + f.answers[gameID] = members +} + +func (f *fakeLobby) seedErr(gameID string, err error) { + f.mu.Lock() + defer f.mu.Unlock() + f.errs[gameID] = err +} + +func (f *fakeLobby) GetMemberships(ctx context.Context, gameID string) ([]ports.Membership, error) { + f.calls.Add(1) + if f.delay > 0 { + select { + case <-time.After(f.delay): + case <-ctx.Done(): + return nil, ctx.Err() + } + } + if f.released != nil { + select { + case <-f.released: + case <-ctx.Done(): + return nil, ctx.Err() + } + } + f.mu.Lock() + defer f.mu.Unlock() + if err, ok := f.errs[gameID]; ok { + return nil, err + } + if members, ok := f.answers[gameID]; ok { + out := make([]ports.Membership, len(members)) + copy(out, members) + return out, nil + } + return []ports.Membership{}, nil +} + +func (f *fakeLobby) GetGameSummary(_ context.Context, _ string) (ports.GameSummary, error) { + return ports.GameSummary{}, errors.New("not used in cache tests") +} + +func newTelemetry(t *testing.T) *telemetry.Runtime { + t.Helper() + tel, err := telemetry.NewWithProviders(nil, nil) + require.NoError(t, err) + return tel +} + +func active(userID, raceName string) ports.Membership { + return ports.Membership{UserID: userID, RaceName: raceName, Status: "active", JoinedAt: time.Unix(0, 0).UTC()} +} + +func newCacheForTest(t *testing.T, lobby ports.LobbyClient, ttl time.Duration, maxGames int, clock func() time.Time) *membership.Cache { + t.Helper() + cache, err := membership.NewCache(membership.Dependencies{ + Lobby: lobby, + Telemetry: newTelemetry(t), + TTL: ttl, + MaxGames: maxGames, + Clock: 
clock, + }) + require.NoError(t, err) + return cache +} + +func TestNewCacheRejectsBadDependencies(t *testing.T) { + tel := newTelemetry(t) + cases := []struct { + name string + deps membership.Dependencies + }{ + {"nil lobby", membership.Dependencies{Telemetry: tel, TTL: time.Second, MaxGames: 1}}, + {"nil telemetry", membership.Dependencies{Lobby: newFakeLobby(), TTL: time.Second, MaxGames: 1}}, + {"zero ttl", membership.Dependencies{Lobby: newFakeLobby(), Telemetry: tel, TTL: 0, MaxGames: 1}}, + {"negative ttl", membership.Dependencies{Lobby: newFakeLobby(), Telemetry: tel, TTL: -time.Second, MaxGames: 1}}, + {"zero max games", membership.Dependencies{Lobby: newFakeLobby(), Telemetry: tel, TTL: time.Second, MaxGames: 0}}, + {"negative max games", membership.Dependencies{Lobby: newFakeLobby(), Telemetry: tel, TTL: time.Second, MaxGames: -1}}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + cache, err := membership.NewCache(tc.deps) + require.Error(t, err) + assert.Nil(t, cache) + }) + } +} + +func TestResolveHitServesCachedEntry(t *testing.T) { + lobby := newFakeLobby() + lobby.seed("game-1", []ports.Membership{active("user-1", "Aelinari"), active("user-2", "Drazi")}) + now := time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) + clock := func() time.Time { return now } + cache := newCacheForTest(t, lobby, time.Minute, 8, clock) + + first, err := cache.Resolve(context.Background(), "game-1", "user-1") + require.NoError(t, err) + assert.Equal(t, "active", first) + + second, err := cache.Resolve(context.Background(), "game-1", "user-2") + require.NoError(t, err) + assert.Equal(t, "active", second) + + assert.Equal(t, int32(1), lobby.calls.Load()) +} + +func TestResolveUnknownUserReturnsEmptyString(t *testing.T) { + lobby := newFakeLobby() + lobby.seed("game-1", []ports.Membership{active("user-1", "Aelinari")}) + clock := func() time.Time { return time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) } + cache := newCacheForTest(t, lobby, time.Minute, 8, 
clock) + + status, err := cache.Resolve(context.Background(), "game-1", "ghost") + require.NoError(t, err) + assert.Empty(t, status) +} + +func TestResolveTTLExpiryRefetches(t *testing.T) { + lobby := newFakeLobby() + lobby.seed("game-1", []ports.Membership{active("user-1", "Aelinari")}) + now := time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) + clockTime := now + clock := func() time.Time { return clockTime } + cache := newCacheForTest(t, lobby, 30*time.Second, 8, clock) + + _, err := cache.Resolve(context.Background(), "game-1", "user-1") + require.NoError(t, err) + assert.Equal(t, int32(1), lobby.calls.Load()) + + clockTime = now.Add(20 * time.Second) + _, err = cache.Resolve(context.Background(), "game-1", "user-1") + require.NoError(t, err) + assert.Equal(t, int32(1), lobby.calls.Load(), "fresh entry must not refetch") + + clockTime = now.Add(31 * time.Second) + _, err = cache.Resolve(context.Background(), "game-1", "user-1") + require.NoError(t, err) + assert.Equal(t, int32(2), lobby.calls.Load(), "expired entry must refetch") +} + +func TestInvalidatePurgesEntry(t *testing.T) { + lobby := newFakeLobby() + lobby.seed("game-1", []ports.Membership{active("user-1", "Aelinari")}) + clock := func() time.Time { return time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) } + cache := newCacheForTest(t, lobby, time.Minute, 8, clock) + + _, err := cache.Resolve(context.Background(), "game-1", "user-1") + require.NoError(t, err) + assert.Equal(t, int32(1), lobby.calls.Load()) + + cache.Invalidate("game-1") + + _, err = cache.Resolve(context.Background(), "game-1", "user-1") + require.NoError(t, err) + assert.Equal(t, int32(2), lobby.calls.Load()) +} + +func TestInvalidateOnAbsentGameIsNoop(t *testing.T) { + lobby := newFakeLobby() + clock := func() time.Time { return time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) } + cache := newCacheForTest(t, lobby, time.Minute, 8, clock) + + cache.Invalidate("missing") +} + +func TestLRUEvictsOldestEntry(t *testing.T) { + lobby := newFakeLobby() 
+ for index := range 4 { + gameID := fmt.Sprintf("game-%d", index) + lobby.seed(gameID, []ports.Membership{active("user-1", "Aelinari")}) + } + now := time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) + clockTime := now + clock := func() time.Time { return clockTime } + cache := newCacheForTest(t, lobby, time.Minute, 2, clock) + + // Load games 0, 1, 2 sequentially. The cache holds at most 2; game-0 + // must have been evicted by the time game-2 lands. + for index := range 3 { + clockTime = now.Add(time.Duration(index) * time.Second) + _, err := cache.Resolve(context.Background(), fmt.Sprintf("game-%d", index), "user-1") + require.NoError(t, err) + } + require.Equal(t, int32(3), lobby.calls.Load()) + + // Re-resolving game-1 hits the cache. + clockTime = now.Add(3 * time.Second) + _, err := cache.Resolve(context.Background(), "game-1", "user-1") + require.NoError(t, err) + assert.Equal(t, int32(3), lobby.calls.Load(), "game-1 must still be cached") + + // Re-resolving game-0 misses (it was the LRU victim). 
+ clockTime = now.Add(4 * time.Second) + _, err = cache.Resolve(context.Background(), "game-0", "user-1") + require.NoError(t, err) + assert.Equal(t, int32(4), lobby.calls.Load(), "game-0 must have been evicted") +} + +func TestResolveLobbyUnavailableSurfacesAndDoesNotCache(t *testing.T) { + lobby := newFakeLobby() + lobby.seedErr("game-1", fmt.Errorf("dial: %w", ports.ErrLobbyUnavailable)) + clock := func() time.Time { return time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) } + cache := newCacheForTest(t, lobby, time.Minute, 8, clock) + + _, err := cache.Resolve(context.Background(), "game-1", "user-1") + require.Error(t, err) + assert.True(t, errors.Is(err, membership.ErrLobbyUnavailable)) + assert.True(t, errors.Is(err, ports.ErrLobbyUnavailable)) + + _, err = cache.Resolve(context.Background(), "game-1", "user-1") + require.Error(t, err) + assert.Equal(t, int32(2), lobby.calls.Load(), "failed fetch must not be cached") +} + +func TestResolveUnwrappedLobbyErrorIsStillSurfacedAsLobbyUnavailable(t *testing.T) { + lobby := newFakeLobby() + lobby.seedErr("game-1", errors.New("transport")) + clock := func() time.Time { return time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) } + cache := newCacheForTest(t, lobby, time.Minute, 8, clock) + + _, err := cache.Resolve(context.Background(), "game-1", "user-1") + require.Error(t, err) + assert.True(t, errors.Is(err, membership.ErrLobbyUnavailable)) +} + +func TestResolveDeduplicatesConcurrentMisses(t *testing.T) { + lobby := newFakeLobby() + lobby.seed("game-1", []ports.Membership{active("user-1", "Aelinari")}) + gate := make(chan struct{}) + lobby.released = gate + clock := func() time.Time { return time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) } + cache := newCacheForTest(t, lobby, time.Minute, 8, clock) + + const callers = 16 + var wg sync.WaitGroup + results := make([]string, callers) + errs := make([]error, callers) + wg.Add(callers) + for index := range callers { + go func(slot int) { + defer wg.Done() + results[slot], 
errs[slot] = cache.Resolve(context.Background(), "game-1", "user-1") + }(index) + } + + // Give all goroutines a moment to register on the inflight map + // before releasing the Lobby fetch. + time.Sleep(10 * time.Millisecond) + close(gate) + wg.Wait() + + for index := range callers { + require.NoError(t, errs[index]) + assert.Equal(t, "active", results[index]) + } + assert.Equal(t, int32(1), lobby.calls.Load(), "concurrent misses must collapse to one Lobby call") +} + +func TestResolveRespectsContextCancellation(t *testing.T) { + lobby := newFakeLobby() + lobby.seed("game-1", []ports.Membership{active("user-1", "Aelinari")}) + gate := make(chan struct{}) + lobby.released = gate + clock := func() time.Time { return time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) } + cache := newCacheForTest(t, lobby, time.Minute, 8, clock) + + leaderDone := make(chan struct{}) + go func() { + defer close(leaderDone) + _, _ = cache.Resolve(context.Background(), "game-1", "user-1") + }() + + // Wait for leader to register the inflight slot. 
+ time.Sleep(10 * time.Millisecond) + + ctx, cancel := context.WithCancel(context.Background()) + cancel() + + _, err := cache.Resolve(ctx, "game-1", "user-1") + require.Error(t, err) + assert.True(t, errors.Is(err, context.Canceled)) + + close(gate) + <-leaderDone +} + +func TestResolveRefreshAfterErrorReturnsSuccess(t *testing.T) { + lobby := newFakeLobby() + lobby.seedErr("game-1", errors.New("transport")) + clock := func() time.Time { return time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) } + cache := newCacheForTest(t, lobby, time.Minute, 8, clock) + + _, err := cache.Resolve(context.Background(), "game-1", "user-1") + require.Error(t, err) + + lobby.mu.Lock() + delete(lobby.errs, "game-1") + lobby.answers["game-1"] = []ports.Membership{active("user-1", "Aelinari")} + lobby.mu.Unlock() + + status, err := cache.Resolve(context.Background(), "game-1", "user-1") + require.NoError(t, err) + assert.Equal(t, "active", status) +} + +func TestResolveRejectsNilContextAndReceiver(t *testing.T) { + lobby := newFakeLobby() + clock := func() time.Time { return time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) } + cache := newCacheForTest(t, lobby, time.Minute, 8, clock) + + var nilCtx context.Context + _, err := cache.Resolve(nilCtx, "game-1", "user-1") + require.Error(t, err) + + var nilCache *membership.Cache + _, err = nilCache.Resolve(context.Background(), "game-1", "user-1") + require.Error(t, err) +} + +func TestStatusFromLobbyIsPreserved(t *testing.T) { + lobby := newFakeLobby() + lobby.seed("game-1", []ports.Membership{ + {UserID: "user-1", RaceName: "Aelinari", Status: "active", JoinedAt: time.Unix(0, 0).UTC()}, + {UserID: "user-2", RaceName: "Drazi", Status: "removed", JoinedAt: time.Unix(0, 0).UTC()}, + {UserID: "user-3", RaceName: "Vorlons", Status: "blocked", JoinedAt: time.Unix(0, 0).UTC()}, + }) + clock := func() time.Time { return time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) } + cache := newCacheForTest(t, lobby, time.Minute, 8, clock) + + for userID, expected := 
range map[string]string{"user-1": "active", "user-2": "removed", "user-3": "blocked"} { + status, err := cache.Resolve(context.Background(), "game-1", userID) + require.NoError(t, err) + assert.Equal(t, expected, status, "user %s", userID) + } +} diff --git a/gamemaster/internal/service/membership/errors.go b/gamemaster/internal/service/membership/errors.go new file mode 100644 index 0000000..3ebd792 --- /dev/null +++ b/gamemaster/internal/service/membership/errors.go @@ -0,0 +1,13 @@ +package membership + +import "errors" + +// ErrLobbyUnavailable signals that a Resolve call could not be completed +// because the upstream Lobby service was unreachable. The cache wraps +// `ports.ErrLobbyUnavailable` returned by the LobbyClient adapter; hot-path +// services map this sentinel to `service_unavailable`. +// +// Callers branch with errors.Is. Returned only on cache miss / TTL expiry +// when the Lobby fetch fails; cached entries are served regardless of +// upstream availability until the TTL elapses. +var ErrLobbyUnavailable = errors.New("membership cache: lobby unavailable") diff --git a/gamemaster/internal/service/orderput/errors.go b/gamemaster/internal/service/orderput/errors.go new file mode 100644 index 0000000..3e60c74 --- /dev/null +++ b/gamemaster/internal/service/orderput/errors.go @@ -0,0 +1,49 @@ +package orderput + +// Stable error codes returned in `Result.ErrorCode`. The values match the +// vocabulary frozen by `gamemaster/README.md §Error Model` and +// `gamemaster/api/internal-openapi.yaml`. Stage 19's REST handler imports +// these names rather than redeclare them; renaming any of them is a +// contract change. +const ( + // ErrorCodeInvalidRequest reports that the request envelope failed + // structural validation (empty required field, malformed payload, + // non-object payload, payload missing the `commands` array). 
+ ErrorCodeInvalidRequest = "invalid_request" + + // ErrorCodeRuntimeNotFound reports that no `runtime_records` row + // exists for the requested game id. + ErrorCodeRuntimeNotFound = "runtime_not_found" + + // ErrorCodeRuntimeNotRunning reports that the runtime exists but its + // current status is not `running`. Hot-path orders are rejected + // outside the running state to avoid racing with admin transitions + // and turn generation. + ErrorCodeRuntimeNotRunning = "runtime_not_running" + + // ErrorCodeForbidden reports that the caller is not an active member + // of the game, or that the (game_id, user_id) pair lacks a player + // mapping. + ErrorCodeForbidden = "forbidden" + + // ErrorCodeEngineUnreachable reports that the engine /api/v1/order + // call returned a 5xx status, timed out, or could not be dispatched. + ErrorCodeEngineUnreachable = "engine_unreachable" + + // ErrorCodeEngineValidationError reports that the engine returned + // 4xx with a per-command result. The body is forwarded verbatim + // through `Result.RawResponse`. + ErrorCodeEngineValidationError = "engine_validation_error" + + // ErrorCodeEngineProtocolViolation reports that the engine response + // did not match the expected schema. Stage 19 maps this to 502. + ErrorCodeEngineProtocolViolation = "engine_protocol_violation" + + // ErrorCodeServiceUnavailable reports that a steady-state dependency + // (PostgreSQL, Lobby) was unreachable for this call. + ErrorCodeServiceUnavailable = "service_unavailable" + + // ErrorCodeInternal reports an unexpected error not classified by + // the other codes. + ErrorCodeInternal = "internal_error" +) diff --git a/gamemaster/internal/service/orderput/service.go b/gamemaster/internal/service/orderput/service.go new file mode 100644 index 0000000..c078dbe --- /dev/null +++ b/gamemaster/internal/service/orderput/service.go @@ -0,0 +1,361 @@ +// Package orderput implements the player-order hot-path service owned by +// Game Master. 
It accepts a verified `(game_id, user_id, payload)`
+// envelope from Edge Gateway, authorises the caller against the membership
+// cache, resolves `actor=race_name` from `player_mappings`, reshapes the
+// payload to the engine `CommandRequest{actor, cmd}` schema, and forwards
+// the call to the engine `/api/v1/order` endpoint.
+//
+// Lifecycle and error semantics follow `gamemaster/README.md §Hot Path →
+// Player commands and orders`. Design rationale is captured in
+// `gamemaster/docs/stage16-membership-cache-and-invalidation.md`.
+package orderput
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"log/slog"
+	"strings"
+	"time"
+
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/domain/playermapping"
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/logging"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/service/membership"
+	"galaxy/gamemaster/internal/telemetry"
+)
+
+const (
+	engineCallOp = "order"
+
+	membershipStatusActive = "active"
+
+	payloadCommandsKey = "commands"
+	payloadCmdKey      = "cmd"
+	payloadActorKey    = "actor"
+)
+
+// Input stores the per-call arguments for one order-put operation. The
+// shape mirrors `PutOrdersRequest` from
+// `gamemaster/api/internal-openapi.yaml` plus the verified user identity
+// captured from the `X-User-ID` header by the Stage 19 handler.
+type Input struct {
+	// GameID identifies the platform game the order targets.
+	GameID string
+
+	// UserID identifies the platform user submitting the order. The
+	// service derives `actor=race_name` from this value via
+	// `player_mappings`.
+	UserID string
+
+	// Payload stores the raw `PutOrdersRequest` body. The service
+	// rewrites it to the engine `CommandRequest{actor, cmd}` shape
+	// before forwarding.
+	Payload json.RawMessage
+}
+
+// Validate checks that input carries the structural invariants the
+// service requires before any store is touched, returning a descriptive
+// error when one is violated.
+func (input Input) Validate() error { + if strings.TrimSpace(input.GameID) == "" { + return fmt.Errorf("game id must not be empty") + } + if strings.TrimSpace(input.UserID) == "" { + return fmt.Errorf("user id must not be empty") + } + if len(input.Payload) == 0 { + return fmt.Errorf("payload must not be empty") + } + return nil +} + +// Result stores the deterministic outcome of one Handle call. +type Result struct { + // Outcome reports whether the operation completed (success) or + // produced a stable failure code. + Outcome operation.Outcome + + // ErrorCode stores the stable error code on failure. Empty on + // success. + ErrorCode string + + // ErrorMessage stores the operator-readable detail on failure. + // Empty on success. + ErrorMessage string + + // RawResponse stores the engine response body. Populated on success + // and on `engine_validation_error`. Empty on every other terminal + // branch. + RawResponse json.RawMessage +} + +// IsSuccess reports whether the result represents a successful operation. +func (result Result) IsSuccess() bool { + return result.Outcome == operation.OutcomeSuccess +} + +// Dependencies groups the collaborators required by Service. +type Dependencies struct { + // RuntimeRecords loads the engine endpoint and the runtime status. + RuntimeRecords ports.RuntimeRecordStore + + // PlayerMappings resolves `(game_id, user_id) → race_name`. + PlayerMappings ports.PlayerMappingStore + + // Membership authorises the caller. Hot-path services share one + // cache instance with `commandexecute` and `reportget`. + Membership *membership.Cache + + // Engine forwards the reshaped payload to `/api/v1/order`. + Engine ports.EngineClient + + // Telemetry records the per-outcome counter and the engine-call + // latency histogram. + Telemetry *telemetry.Runtime + + // Logger records structured service-level events. Defaults to + // `slog.Default()` when nil. 
+ Logger *slog.Logger + + // Clock supplies the wall-clock used for engine-call latency. + // Defaults to `time.Now` when nil. + Clock func() time.Time +} + +// Service executes the order-put hot-path operation. +type Service struct { + runtimeRecords ports.RuntimeRecordStore + playerMappings ports.PlayerMappingStore + membership *membership.Cache + engine ports.EngineClient + telemetry *telemetry.Runtime + logger *slog.Logger + clock func() time.Time +} + +// NewService constructs one Service from deps. +func NewService(deps Dependencies) (*Service, error) { + switch { + case deps.RuntimeRecords == nil: + return nil, errors.New("new order put service: nil runtime records") + case deps.PlayerMappings == nil: + return nil, errors.New("new order put service: nil player mappings") + case deps.Membership == nil: + return nil, errors.New("new order put service: nil membership cache") + case deps.Engine == nil: + return nil, errors.New("new order put service: nil engine client") + case deps.Telemetry == nil: + return nil, errors.New("new order put service: nil telemetry runtime") + } + + clock := deps.Clock + if clock == nil { + clock = time.Now + } + logger := deps.Logger + if logger == nil { + logger = slog.Default() + } + logger = logger.With("service", "gamemaster.orderput") + + return &Service{ + runtimeRecords: deps.RuntimeRecords, + playerMappings: deps.PlayerMappings, + membership: deps.Membership, + engine: deps.Engine, + telemetry: deps.Telemetry, + logger: logger, + clock: clock, + }, nil +} + +// Handle executes one order-put operation end-to-end. The Go-level error +// return is reserved for non-business failures (nil context, nil +// receiver). Every business outcome flows through Result. 
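+//
+// Sketch of the caller-side mapping (a non-authoritative example; the
+// handler wiring shown here is an assumption):
+//
+//	result, err := service.Handle(ctx, input)
+//	if err != nil {
+//		// nil ctx/receiver: programming error, respond 500
+//	}
+//	if !result.IsSuccess() {
+//		// map result.ErrorCode (e.g. "forbidden",
+//		// "runtime_not_running") to the HTTP status frozen in the
+//		// internal OpenAPI spec; result.RawResponse is populated only
+//		// on success and on engine_validation_error
+//	}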
+func (service *Service) Handle(ctx context.Context, input Input) (Result, error) { + if service == nil { + return Result{}, errors.New("order put: nil service") + } + if ctx == nil { + return Result{}, errors.New("order put: nil context") + } + + if err := input.Validate(); err != nil { + return service.recordFailure(ctx, input, ErrorCodeInvalidRequest, err.Error(), nil), nil + } + + record, result, ok := service.loadRecord(ctx, input) + if !ok { + return result, nil + } + if record.Status != runtime.StatusRunning { + message := fmt.Sprintf("runtime status is %q, expected %q", record.Status, runtime.StatusRunning) + return service.recordFailure(ctx, input, ErrorCodeRuntimeNotRunning, message, nil), nil + } + + mapping, result, ok := service.authorise(ctx, input) + if !ok { + return result, nil + } + + payload, err := rewriteOrderPayload(input.Payload, mapping.RaceName) + if err != nil { + return service.recordFailure(ctx, input, ErrorCodeInvalidRequest, err.Error(), nil), nil + } + + body, engineErr := service.callEngine(ctx, record.EngineEndpoint, payload) + if engineErr != nil { + errorCode := classifyEngineError(engineErr) + message := fmt.Sprintf("engine order: %s", engineErr.Error()) + var bodyForCaller json.RawMessage + if errorCode == ErrorCodeEngineValidationError { + bodyForCaller = body + } + return service.recordFailure(ctx, input, errorCode, message, bodyForCaller), nil + } + + service.telemetry.RecordOrderPutOutcome(ctx, + string(operation.OutcomeSuccess), "") + logArgs := []any{ + "game_id", input.GameID, + "user_id", input.UserID, + "actor", mapping.RaceName, + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.InfoContext(ctx, "order put succeeded", logArgs...) + + return Result{ + Outcome: operation.OutcomeSuccess, + RawResponse: body, + }, nil +} + +// loadRecord reads the runtime record and maps store errors to +// orchestrator outcomes. ok=false means the flow stops with the returned +// Result. 
+func (service *Service) loadRecord(ctx context.Context, input Input) (runtime.RuntimeRecord, Result, bool) { + record, err := service.runtimeRecords.Get(ctx, input.GameID) + switch { + case err == nil: + return record, Result{}, true + case errors.Is(err, runtime.ErrNotFound): + return runtime.RuntimeRecord{}, service.recordFailure(ctx, input, + ErrorCodeRuntimeNotFound, "runtime record does not exist", nil), false + default: + return runtime.RuntimeRecord{}, service.recordFailure(ctx, input, + ErrorCodeServiceUnavailable, fmt.Sprintf("get runtime record: %s", err.Error()), nil), false + } +} + +// authorise resolves the membership status and the player mapping for +// the caller. ok=false means the flow stops with the returned Result. +func (service *Service) authorise(ctx context.Context, input Input) (playermapping.PlayerMapping, Result, bool) { + status, err := service.membership.Resolve(ctx, input.GameID, input.UserID) + if err != nil { + return playermapping.PlayerMapping{}, service.recordFailure(ctx, input, + ErrorCodeServiceUnavailable, fmt.Sprintf("resolve membership: %s", err.Error()), nil), false + } + if status != membershipStatusActive { + message := fmt.Sprintf("membership status %q does not authorise orders", status) + if status == "" { + message = "user is not a member of the game" + } + return playermapping.PlayerMapping{}, service.recordFailure(ctx, input, + ErrorCodeForbidden, message, nil), false + } + + mapping, err := service.playerMappings.Get(ctx, input.GameID, input.UserID) + switch { + case err == nil: + return mapping, Result{}, true + case errors.Is(err, playermapping.ErrNotFound): + return playermapping.PlayerMapping{}, service.recordFailure(ctx, input, + ErrorCodeForbidden, "player mapping not installed for active member", nil), false + default: + return playermapping.PlayerMapping{}, service.recordFailure(ctx, input, + ErrorCodeServiceUnavailable, fmt.Sprintf("get player mapping: %s", err.Error()), nil), false + } +} + +// callEngine 
forwards the reshaped payload to the engine and records the +// wall-clock latency under the `order` op label. +func (service *Service) callEngine(ctx context.Context, baseURL string, payload json.RawMessage) (json.RawMessage, error) { + start := service.clock() + body, err := service.engine.PutOrders(ctx, baseURL, payload) + service.telemetry.RecordEngineCall(ctx, engineCallOp, service.clock().Sub(start)) + return body, err +} + +// classifyEngineError maps the engine port sentinels to the order-put +// stable error codes. +func classifyEngineError(err error) string { + switch { + case errors.Is(err, ports.ErrEngineValidation): + return ErrorCodeEngineValidationError + case errors.Is(err, ports.ErrEngineProtocolViolation): + return ErrorCodeEngineProtocolViolation + case errors.Is(err, ports.ErrEngineUnreachable): + return ErrorCodeEngineUnreachable + default: + return ErrorCodeEngineUnreachable + } +} + +// recordFailure emits the service-level outcome counter and a structured +// log entry, then returns the Result the caller surfaces. +func (service *Service) recordFailure(ctx context.Context, input Input, errorCode, errorMessage string, rawResponse json.RawMessage) Result { + service.telemetry.RecordOrderPutOutcome(ctx, + string(operation.OutcomeFailure), errorCode) + logArgs := []any{ + "game_id", input.GameID, + "user_id", input.UserID, + "error_code", errorCode, + "error_message", errorMessage, + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.WarnContext(ctx, "order put rejected", logArgs...) + return Result{ + Outcome: operation.OutcomeFailure, + ErrorCode: errorCode, + ErrorMessage: errorMessage, + RawResponse: rawResponse, + } +} + +// rewriteOrderPayload reshapes the GM `PutOrdersRequest` body +// (`{commands:[…]}`) to the engine `CommandRequest` body +// (`{actor:, cmd:[…]}`). Every other top-level key is +// discarded; GM never trusts caller-supplied envelope fields per the +// README §Hot Path rule. 
Returns an error when the payload is not a JSON
+// object or the `commands` field is missing or not an array.
+func rewriteOrderPayload(payload json.RawMessage, raceName string) (json.RawMessage, error) {
+	var fields map[string]json.RawMessage
+	if err := json.Unmarshal(payload, &fields); err != nil {
+		return nil, fmt.Errorf("payload must decode as a JSON object: %w", err)
+	}
+	commands, ok := fields[payloadCommandsKey]
+	if !ok {
+		return nil, fmt.Errorf("payload missing required %q field", payloadCommandsKey)
+	}
+	// Validate the array shape before forwarding; the decoded value
+	// itself is not needed.
+	if err := json.Unmarshal(commands, new([]json.RawMessage)); err != nil {
+		return nil, fmt.Errorf("payload %q field must decode as an array: %w", payloadCommandsKey, err)
+	}
+	actor, err := json.Marshal(raceName)
+	if err != nil {
+		return nil, fmt.Errorf("marshal actor: %w", err)
+	}
+	out := map[string]json.RawMessage{
+		payloadActorKey: actor,
+		payloadCmdKey:   commands,
+	}
+	encoded, err := json.Marshal(out)
+	if err != nil {
+		return nil, fmt.Errorf("marshal engine payload: %w", err)
+	}
+	return encoded, nil
+}
diff --git a/gamemaster/internal/service/orderput/service_test.go b/gamemaster/internal/service/orderput/service_test.go
new file mode 100644
index 0000000..c2fadc3
--- /dev/null
+++ b/gamemaster/internal/service/orderput/service_test.go
@@ -0,0 +1,600 @@
+package orderput_test
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"sync"
+	"testing"
+	"time"
+
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/domain/playermapping"
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/service/membership"
+	"galaxy/gamemaster/internal/service/orderput"
+	"galaxy/gamemaster/internal/telemetry"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+// --- fakes
------------------------------------------------------------ + +type fakeRuntimeRecords struct { + mu sync.Mutex + stored map[string]runtime.RuntimeRecord + getErr error +} + +func newFakeRuntimeRecords() *fakeRuntimeRecords { + return &fakeRuntimeRecords{stored: map[string]runtime.RuntimeRecord{}} +} + +func (s *fakeRuntimeRecords) seed(record runtime.RuntimeRecord) { + s.mu.Lock() + defer s.mu.Unlock() + s.stored[record.GameID] = record +} + +func (s *fakeRuntimeRecords) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.getErr != nil { + return runtime.RuntimeRecord{}, s.getErr + } + record, ok := s.stored[gameID] + if !ok { + return runtime.RuntimeRecord{}, runtime.ErrNotFound + } + return record, nil +} + +func (s *fakeRuntimeRecords) Insert(context.Context, runtime.RuntimeRecord) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateStatus(context.Context, ports.UpdateStatusInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateScheduling(context.Context, ports.UpdateSchedulingInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) ListDueRunning(context.Context, time.Time) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} +func (s *fakeRuntimeRecords) ListByStatus(context.Context, runtime.Status) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} +func (s *fakeRuntimeRecords) List(context.Context) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateImage(context.Context, ports.UpdateImageInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) UpdateEngineHealth(context.Context, ports.UpdateEngineHealthInput) error { + return errors.New("not used") +} +func (s *fakeRuntimeRecords) Delete(context.Context, string) error { + return errors.New("not used") +} + +type fakePlayerMappings struct { 
+ mu sync.Mutex + stored map[string]map[string]playermapping.PlayerMapping + getErr error +} + +func newFakePlayerMappings() *fakePlayerMappings { + return &fakePlayerMappings{stored: map[string]map[string]playermapping.PlayerMapping{}} +} + +func (s *fakePlayerMappings) seed(record playermapping.PlayerMapping) { + s.mu.Lock() + defer s.mu.Unlock() + if _, ok := s.stored[record.GameID]; !ok { + s.stored[record.GameID] = map[string]playermapping.PlayerMapping{} + } + s.stored[record.GameID][record.UserID] = record +} + +func (s *fakePlayerMappings) Get(_ context.Context, gameID, userID string) (playermapping.PlayerMapping, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.getErr != nil { + return playermapping.PlayerMapping{}, s.getErr + } + record, ok := s.stored[gameID][userID] + if !ok { + return playermapping.PlayerMapping{}, playermapping.ErrNotFound + } + return record, nil +} + +func (s *fakePlayerMappings) BulkInsert(context.Context, []playermapping.PlayerMapping) error { + return errors.New("not used") +} +func (s *fakePlayerMappings) GetByRace(context.Context, string, string) (playermapping.PlayerMapping, error) { + return playermapping.PlayerMapping{}, errors.New("not used") +} +func (s *fakePlayerMappings) ListByGame(context.Context, string) ([]playermapping.PlayerMapping, error) { + return nil, errors.New("not used") +} +func (s *fakePlayerMappings) DeleteByGame(context.Context, string) error { + return errors.New("not used") +} + +type recordedCall struct { + baseURL string + payload json.RawMessage +} + +type fakeEngine struct { + mu sync.Mutex + body json.RawMessage + err error + calls []recordedCall +} + +func (f *fakeEngine) PutOrders(_ context.Context, baseURL string, payload json.RawMessage) (json.RawMessage, error) { + f.mu.Lock() + defer f.mu.Unlock() + stored := append(json.RawMessage(nil), payload...) 
+ f.calls = append(f.calls, recordedCall{baseURL: baseURL, payload: stored}) + return f.body, f.err +} + +func (f *fakeEngine) Init(context.Context, string, ports.InitRequest) (ports.StateResponse, error) { + return ports.StateResponse{}, errors.New("not used") +} +func (f *fakeEngine) Status(context.Context, string) (ports.StateResponse, error) { + return ports.StateResponse{}, errors.New("not used") +} +func (f *fakeEngine) Turn(context.Context, string) (ports.StateResponse, error) { + return ports.StateResponse{}, errors.New("not used") +} +func (f *fakeEngine) BanishRace(context.Context, string, string) error { + return errors.New("not used") +} +func (f *fakeEngine) ExecuteCommands(context.Context, string, json.RawMessage) (json.RawMessage, error) { + return nil, errors.New("not used") +} +func (f *fakeEngine) GetReport(context.Context, string, string, int) (json.RawMessage, error) { + return nil, errors.New("not used") +} + +type fakeLobby struct { + mu sync.Mutex + answers map[string][]ports.Membership + errs map[string]error +} + +func newFakeLobby() *fakeLobby { + return &fakeLobby{ + answers: map[string][]ports.Membership{}, + errs: map[string]error{}, + } +} + +func (f *fakeLobby) seed(gameID string, members []ports.Membership) { + f.mu.Lock() + defer f.mu.Unlock() + f.answers[gameID] = members +} + +func (f *fakeLobby) seedErr(gameID string, err error) { + f.mu.Lock() + defer f.mu.Unlock() + f.errs[gameID] = err +} + +func (f *fakeLobby) GetMemberships(_ context.Context, gameID string) ([]ports.Membership, error) { + f.mu.Lock() + defer f.mu.Unlock() + if err, ok := f.errs[gameID]; ok { + return nil, err + } + return append([]ports.Membership(nil), f.answers[gameID]...), nil +} + +func (f *fakeLobby) GetGameSummary(context.Context, string) (ports.GameSummary, error) { + return ports.GameSummary{}, errors.New("not used") +} + +// --- harness ---------------------------------------------------------- + +type harness struct { + t *testing.T + now time.Time 
+ runtimes *fakeRuntimeRecords + mappings *fakePlayerMappings + engine *fakeEngine + lobby *fakeLobby + cache *membership.Cache + service *orderput.Service +} + +const ( + testGameID = "game-001" + testUserID = "user-1" + testRaceName = "Aelinari" + testEngineEndpoint = "http://galaxy-game-game-001:8080" +) + +func newHarness(t *testing.T) *harness { + t.Helper() + tel, err := telemetry.NewWithProviders(nil, nil) + require.NoError(t, err) + now := time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC) + + h := &harness{ + t: t, + now: now, + runtimes: newFakeRuntimeRecords(), + mappings: newFakePlayerMappings(), + engine: &fakeEngine{}, + lobby: newFakeLobby(), + } + + cache, err := membership.NewCache(membership.Dependencies{ + Lobby: h.lobby, + Telemetry: tel, + TTL: time.Minute, + MaxGames: 16, + Clock: func() time.Time { return h.now }, + }) + require.NoError(t, err) + h.cache = cache + + svc, err := orderput.NewService(orderput.Dependencies{ + RuntimeRecords: h.runtimes, + PlayerMappings: h.mappings, + Membership: h.cache, + Engine: h.engine, + Telemetry: tel, + Clock: func() time.Time { return h.now }, + }) + require.NoError(t, err) + h.service = svc + return h +} + +func (h *harness) seedRunningRecord() { + startedAt := h.now.Add(-1 * time.Hour) + h.runtimes.seed(runtime.RuntimeRecord{ + GameID: testGameID, + Status: runtime.StatusRunning, + EngineEndpoint: testEngineEndpoint, + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.3", + CurrentEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + EngineHealth: "healthy", + CreatedAt: h.now.Add(-2 * time.Hour), + UpdatedAt: h.now.Add(-2 * time.Hour), + StartedAt: &startedAt, + }) +} + +func (h *harness) seedActiveMembership() { + h.lobby.seed(testGameID, []ports.Membership{{ + UserID: testUserID, + RaceName: testRaceName, + Status: "active", + JoinedAt: h.now.Add(-2 * time.Hour), + }}) +} + +func (h *harness) seedPlayerMapping() { + h.mappings.seed(playermapping.PlayerMapping{ + GameID: testGameID, + UserID: testUserID, + 
RaceName: testRaceName, + EnginePlayerUUID: "uuid-1", + CreatedAt: h.now.Add(-2 * time.Hour), + }) +} + +func (h *harness) inputWithCommands(payload string) orderput.Input { + return orderput.Input{ + GameID: testGameID, + UserID: testUserID, + Payload: json.RawMessage(payload), + } +} + +func basicOrdersPayload() string { + return `{"commands":[{"@type":"BUILD_SHIP","cmdId":"00000000-0000-0000-0000-000000000001"}]}` +} + +// --- tests ------------------------------------------------------------ + +func TestNewServiceRejectsBadDependencies(t *testing.T) { + tel, err := telemetry.NewWithProviders(nil, nil) + require.NoError(t, err) + cache, err := membership.NewCache(membership.Dependencies{ + Lobby: newFakeLobby(), Telemetry: tel, TTL: time.Minute, MaxGames: 1, + }) + require.NoError(t, err) + + cases := []struct { + name string + deps orderput.Dependencies + }{ + {"nil runtime records", orderput.Dependencies{PlayerMappings: newFakePlayerMappings(), Membership: cache, Engine: &fakeEngine{}, Telemetry: tel}}, + {"nil player mappings", orderput.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), Membership: cache, Engine: &fakeEngine{}, Telemetry: tel}}, + {"nil membership", orderput.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), PlayerMappings: newFakePlayerMappings(), Engine: &fakeEngine{}, Telemetry: tel}}, + {"nil engine", orderput.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), PlayerMappings: newFakePlayerMappings(), Membership: cache, Telemetry: tel}}, + {"nil telemetry", orderput.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), PlayerMappings: newFakePlayerMappings(), Membership: cache, Engine: &fakeEngine{}}}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + svc, err := orderput.NewService(tc.deps) + require.Error(t, err) + assert.Nil(t, svc) + }) + } +} + +func TestHandleHappyPath(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.body = 
json.RawMessage(`{"results":[{"cmd_id":"00000000-0000-0000-0000-000000000001","cmd_applied":true}]}`) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeSuccess, result.Outcome) + assert.JSONEq(t, string(h.engine.body), string(result.RawResponse)) + + require.Len(t, h.engine.calls, 1) + assert.Equal(t, testEngineEndpoint, h.engine.calls[0].baseURL) + var sentToEngine map[string]json.RawMessage + require.NoError(t, json.Unmarshal(h.engine.calls[0].payload, &sentToEngine)) + assert.Contains(t, sentToEngine, "actor") + assert.Contains(t, sentToEngine, "cmd") + assert.NotContains(t, sentToEngine, "commands", "GM must rewrite the field name") + var actor string + require.NoError(t, json.Unmarshal(sentToEngine["actor"], &actor)) + assert.Equal(t, testRaceName, actor) +} + +func TestHandleHappyPathDoesNotTrustCallerActor(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.body = json.RawMessage(`{}`) + + payload := `{"actor":"Hacker","commands":[{"@type":"BUILD_SHIP","cmdId":"00000000-0000-0000-0000-000000000001"}]}` + result, err := h.service.Handle(context.Background(), h.inputWithCommands(payload)) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeSuccess, result.Outcome) + + var sentToEngine map[string]json.RawMessage + require.NoError(t, json.Unmarshal(h.engine.calls[0].payload, &sentToEngine)) + var actor string + require.NoError(t, json.Unmarshal(sentToEngine["actor"], &actor)) + assert.Equal(t, testRaceName, actor, "GM must override caller-supplied actor") +} + +func TestHandleInvalidRequest(t *testing.T) { + cases := []struct { + name string + input orderput.Input + message string + }{ + {"empty game id", orderput.Input{UserID: testUserID, Payload: json.RawMessage(basicOrdersPayload())}, "game id"}, + {"empty user id", orderput.Input{GameID: testGameID, Payload: 
json.RawMessage(basicOrdersPayload())}, "user id"}, + {"empty payload", orderput.Input{GameID: testGameID, UserID: testUserID}, "payload"}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + h := newHarness(t) + result, err := h.service.Handle(context.Background(), tc.input) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeInvalidRequest, result.ErrorCode) + assert.Contains(t, result.ErrorMessage, tc.message) + }) + } +} + +func TestHandleMalformedPayload(t *testing.T) { + cases := []struct { + name string + payload string + }{ + {"non-object", `[1,2,3]`}, + {"missing commands", `{"orders":[]}`}, + {"commands not array", `{"commands":"oops"}`}, + {"non-json", `not json`}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(tc.payload)) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeInvalidRequest, result.ErrorCode) + assert.Empty(t, h.engine.calls) + }) + } +} + +func TestHandleRuntimeNotFound(t *testing.T) { + h := newHarness(t) + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeRuntimeNotFound, result.ErrorCode) +} + +func TestHandleRuntimeStoreError(t *testing.T) { + h := newHarness(t) + h.runtimes.getErr = errors.New("postgres down") + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeServiceUnavailable, result.ErrorCode) +} + +func TestHandleRuntimeNotRunning(t 
*testing.T) { + for _, status := range []runtime.Status{ + runtime.StatusStarting, + runtime.StatusGenerationInProgress, + runtime.StatusGenerationFailed, + runtime.StatusStopped, + runtime.StatusEngineUnreachable, + runtime.StatusFinished, + } { + t.Run(string(status), func(t *testing.T) { + h := newHarness(t) + startedAt := h.now.Add(-1 * time.Hour) + finishedAt := h.now + record := runtime.RuntimeRecord{ + GameID: testGameID, + Status: status, + EngineEndpoint: testEngineEndpoint, + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.3", + CurrentEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + CreatedAt: h.now.Add(-2 * time.Hour), + UpdatedAt: h.now.Add(-2 * time.Hour), + } + if status != runtime.StatusStarting { + record.StartedAt = &startedAt + } + if status == runtime.StatusStopped { + record.StoppedAt = &finishedAt + } + if status == runtime.StatusFinished { + record.FinishedAt = &finishedAt + } + h.runtimes.seed(record) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeRuntimeNotRunning, result.ErrorCode) + assert.Empty(t, h.engine.calls) + }) + } +} + +func TestHandleForbiddenInactiveMembership(t *testing.T) { + cases := []struct { + name string + members []ports.Membership + }{ + {"removed", []ports.Membership{{UserID: testUserID, RaceName: testRaceName, Status: "removed"}}}, + {"blocked", []ports.Membership{{UserID: testUserID, RaceName: testRaceName, Status: "blocked"}}}, + {"unknown user", []ports.Membership{{UserID: "ghost", RaceName: "Ghost", Status: "active"}}}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedPlayerMapping() + h.lobby.seed(testGameID, tc.members) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, 
operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeForbidden, result.ErrorCode) + assert.Empty(t, h.engine.calls) + }) + } +} + +func TestHandleForbiddenMissingPlayerMapping(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeForbidden, result.ErrorCode) + assert.Empty(t, h.engine.calls) +} + +func TestHandleServiceUnavailableLobbyDown(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedPlayerMapping() + h.lobby.seedErr(testGameID, fmt.Errorf("dial: %w", ports.ErrLobbyUnavailable)) + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeServiceUnavailable, result.ErrorCode) +} + +func TestHandleServiceUnavailablePlayerMappingsError(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.mappings.getErr = errors.New("postgres down") + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeServiceUnavailable, result.ErrorCode) +} + +func TestHandleEngineUnreachable(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.err = fmt.Errorf("dial: %w", ports.ErrEngineUnreachable) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeEngineUnreachable, result.ErrorCode) +} + +func 
TestHandleEngineValidationErrorForwardsBody(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.body = json.RawMessage(`{"results":[{"cmd_id":"x","cmd_error_code":"INVALID_TARGET"}]}`) + h.engine.err = fmt.Errorf("400: %w", ports.ErrEngineValidation) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeEngineValidationError, result.ErrorCode) + assert.JSONEq(t, string(h.engine.body), string(result.RawResponse)) +} + +func TestHandleEngineProtocolViolation(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord() + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.err = fmt.Errorf("garbled: %w", ports.ErrEngineProtocolViolation) + + result, err := h.service.Handle(context.Background(), h.inputWithCommands(basicOrdersPayload())) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, orderput.ErrorCodeEngineProtocolViolation, result.ErrorCode) +} + +func TestHandleNilContext(t *testing.T) { + h := newHarness(t) + var nilCtx context.Context + _, err := h.service.Handle(nilCtx, h.inputWithCommands(basicOrdersPayload())) + require.Error(t, err) +} + +func TestHandleNilReceiver(t *testing.T) { + var svc *orderput.Service + _, err := svc.Handle(context.Background(), orderput.Input{}) + require.Error(t, err) +} diff --git a/gamemaster/internal/service/registerruntime/errors.go b/gamemaster/internal/service/registerruntime/errors.go new file mode 100644 index 0000000..c602a67 --- /dev/null +++ b/gamemaster/internal/service/registerruntime/errors.go @@ -0,0 +1,50 @@ +package registerruntime + +// Stable error codes returned in `Result.ErrorCode`. The values match the +// vocabulary frozen by `gamemaster/README.md §Error Model` and +// `gamemaster/api/internal-openapi.yaml`. 
Service-layer stages 14-17 +// import these names rather than redeclare them; renaming any of them is +// a contract change. +const ( + // ErrorCodeInvalidRequest reports that the request envelope failed + // structural validation (empty required fields, unknown enum values, + // malformed turn schedule). + ErrorCodeInvalidRequest = "invalid_request" + + // ErrorCodeConflict reports that a runtime record already exists for + // the requested game id (idempotent re-registration not supported in + // v1) or that a CAS guard failed mid-flow because the row changed + // concurrently. + ErrorCodeConflict = "conflict" + + // ErrorCodeEngineVersionNotFound reports that the requested + // `target_engine_version` is not present in the engine_versions + // registry. Returned before any engine call is attempted. + ErrorCodeEngineVersionNotFound = "engine_version_not_found" + + // ErrorCodeEngineUnreachable reports that the engine /admin/init call + // returned a 5xx status, timed out, or could not be dispatched. The + // runtime_records and player_mappings rows are rolled back before + // the error reaches the caller. + ErrorCodeEngineUnreachable = "engine_unreachable" + + // ErrorCodeEngineValidationError reports that the engine /admin/init + // call returned a 4xx status. Distinguished from + // `engine_unreachable` so the operator knows the engine is + // reachable but rejected the request shape (per Stage 13 D1). + ErrorCodeEngineValidationError = "engine_validation_error" + + // ErrorCodeEngineProtocolViolation reports that the engine response + // did not match the expected schema or did not match the input + // roster (player count mismatch, race-name set mismatch, missing + // required fields). + ErrorCodeEngineProtocolViolation = "engine_protocol_violation" + + // ErrorCodeServiceUnavailable reports that a steady-state dependency + // (PostgreSQL, Redis) was unreachable for this call. 
+ ErrorCodeServiceUnavailable = "service_unavailable" + + // ErrorCodeInternal reports an unexpected error not classified by the + // other codes. + ErrorCodeInternal = "internal_error" +) diff --git a/gamemaster/internal/service/registerruntime/service.go b/gamemaster/internal/service/registerruntime/service.go new file mode 100644 index 0000000..7b2cbb1 --- /dev/null +++ b/gamemaster/internal/service/registerruntime/service.go @@ -0,0 +1,726 @@ +// Package registerruntime implements the register-runtime service-layer +// orchestrator owned by Game Master. The service is the single entry +// point Game Lobby uses (after Runtime Manager has reported a successful +// container start) to install a freshly-started game in Game Master. +// +// Lifecycle and failure-mode semantics follow `gamemaster/README.md +// §Lifecycles → Register-runtime`. Design rationale is captured in +// `gamemaster/docs/stage13-register-runtime.md`. +package registerruntime + +import ( + "context" + "errors" + "fmt" + "log/slog" + "sort" + "strings" + "time" + + "galaxy/gamemaster/internal/domain/engineversion" + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/domain/playermapping" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/domain/schedule" + "galaxy/gamemaster/internal/logging" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/telemetry" +) + +// Member stores one entry of Input.Members. The shape mirrors +// `RegisterRuntimeMember` in `gamemaster/api/internal-openapi.yaml`. +type Member struct { + // UserID identifies an active platform member of the game. + UserID string + + // RaceName stores the race name reserved for the member by Game + // Lobby. Used both to build the engine /admin/init roster and to + // resolve the engine response back to user_id. + RaceName string +} + +// Input stores the per-call arguments for one register-runtime +// operation. 
The shape mirrors `RegisterRuntimeRequest` plus the +// audit-only OpSource / SourceRef pair. +type Input struct { + // GameID identifies the platform game whose runtime is being + // registered. + GameID string + + // EngineEndpoint stores the engine container URL Game Master uses + // for every subsequent call against the runtime + // (`http://galaxy-game-{game_id}:8080`). + EngineEndpoint string + + // Members stores the per-active-member roster Game Lobby committed + // when the platform game opened. Must be non-empty. + Members []Member + + // TargetEngineVersion stores the semver under which Runtime Manager + // started the container. Resolved against the engine_versions + // registry to recover the matching image_ref. + TargetEngineVersion string + + // TurnSchedule stores the five-field cron expression governing turn + // generation, copied from the platform game record. + TurnSchedule string + + // OpSource classifies how the request entered Game Master. Required: + // every operation_log entry carries an op_source. + OpSource operation.OpSource + + // SourceRef stores the optional opaque per-source reference (request + // id, admin user id). Empty when the caller does not provide one. + SourceRef string +} + +// Validate reports whether input carries the structural invariants the +// service requires before any store is touched. 
+func (input Input) Validate() error { + if strings.TrimSpace(input.GameID) == "" { + return fmt.Errorf("game id must not be empty") + } + if strings.TrimSpace(input.EngineEndpoint) == "" { + return fmt.Errorf("engine endpoint must not be empty") + } + if len(input.Members) == 0 { + return fmt.Errorf("members must not be empty") + } + for index, member := range input.Members { + if strings.TrimSpace(member.UserID) == "" { + return fmt.Errorf("members[%d]: user id must not be empty", index) + } + if strings.TrimSpace(member.RaceName) == "" { + return fmt.Errorf("members[%d]: race name must not be empty", index) + } + } + if strings.TrimSpace(input.TargetEngineVersion) == "" { + return fmt.Errorf("target engine version must not be empty") + } + if strings.TrimSpace(input.TurnSchedule) == "" { + return fmt.Errorf("turn schedule must not be empty") + } + if !input.OpSource.IsKnown() { + return fmt.Errorf("op source %q is unsupported", input.OpSource) + } + if duplicate := firstDuplicateMember(input.Members); duplicate != "" { + return fmt.Errorf("members carry duplicate entries for %q", duplicate) + } + return nil +} + +// firstDuplicateMember returns the first user_id or race_name that +// appears more than once in members. Empty when every entry is unique. +func firstDuplicateMember(members []Member) string { + seenUsers := make(map[string]struct{}, len(members)) + seenRaces := make(map[string]struct{}, len(members)) + for _, member := range members { + if _, ok := seenUsers[member.UserID]; ok { + return member.UserID + } + seenUsers[member.UserID] = struct{}{} + if _, ok := seenRaces[member.RaceName]; ok { + return member.RaceName + } + seenRaces[member.RaceName] = struct{}{} + } + return "" +} + +// Result stores the deterministic outcome of one Handle call. Business +// outcomes flow through Result; the Go-level error return is reserved +// for non-business failures (nil context, nil receiver). 
+type Result struct { + // Record carries the runtime record installed by the operation. + // Populated on success; zero on failure. + Record runtime.RuntimeRecord + + // Outcome reports whether the operation completed (success) or + // produced a stable failure code. + Outcome operation.Outcome + + // ErrorCode stores the stable error code on failure. Empty on + // success. + ErrorCode string + + // ErrorMessage stores the operator-readable detail on failure. + // Empty on success. + ErrorMessage string +} + +// IsSuccess reports whether the result represents a successful +// operation. +func (result Result) IsSuccess() bool { + return result.Outcome == operation.OutcomeSuccess +} + +// Dependencies groups the collaborators required by Service. +type Dependencies struct { + // RuntimeRecords stores the runtime_records row installed by the + // flow. + RuntimeRecords ports.RuntimeRecordStore + + // EngineVersions resolves `target_engine_version` to the matching + // image_ref and validates the version exists. + EngineVersions ports.EngineVersionStore + + // PlayerMappings persists the (game_id, user_id) → race_name + // projection derived from the engine /admin/init response. + PlayerMappings ports.PlayerMappingStore + + // OperationLogs records the audit entry for the operation. + OperationLogs ports.OperationLogStore + + // Engine drives the engine /admin/init call and decodes the + // response. + Engine ports.EngineClient + + // LobbyEvents publishes the post-success runtime_snapshot_update + // to `gm:lobby_events`. + LobbyEvents ports.LobbyEventsPublisher + + // Telemetry records register-runtime outcomes plus the snapshot + // publication counter. Required. + Telemetry *telemetry.Runtime + + // Logger records structured service-level events. Defaults to + // `slog.Default()` when nil. + Logger *slog.Logger + + // Clock supplies the wall-clock used for operation timestamps. + // Defaults to `time.Now` when nil. 
+ Clock func() time.Time +} + +// Service executes the register-runtime lifecycle operation. +type Service struct { + runtimeRecords ports.RuntimeRecordStore + engineVersions ports.EngineVersionStore + playerMappings ports.PlayerMappingStore + operationLogs ports.OperationLogStore + engine ports.EngineClient + lobbyEvents ports.LobbyEventsPublisher + + telemetry *telemetry.Runtime + logger *slog.Logger + clock func() time.Time +} + +// NewService constructs one Service from deps. +func NewService(deps Dependencies) (*Service, error) { + switch { + case deps.RuntimeRecords == nil: + return nil, errors.New("new register runtime service: nil runtime records") + case deps.EngineVersions == nil: + return nil, errors.New("new register runtime service: nil engine versions") + case deps.PlayerMappings == nil: + return nil, errors.New("new register runtime service: nil player mappings") + case deps.OperationLogs == nil: + return nil, errors.New("new register runtime service: nil operation logs") + case deps.Engine == nil: + return nil, errors.New("new register runtime service: nil engine client") + case deps.LobbyEvents == nil: + return nil, errors.New("new register runtime service: nil lobby events publisher") + case deps.Telemetry == nil: + return nil, errors.New("new register runtime service: nil telemetry runtime") + } + + clock := deps.Clock + if clock == nil { + clock = time.Now + } + logger := deps.Logger + if logger == nil { + logger = slog.Default() + } + logger = logger.With("service", "gamemaster.registerruntime") + + return &Service{ + runtimeRecords: deps.RuntimeRecords, + engineVersions: deps.EngineVersions, + playerMappings: deps.PlayerMappings, + operationLogs: deps.OperationLogs, + engine: deps.Engine, + lobbyEvents: deps.LobbyEvents, + telemetry: deps.Telemetry, + logger: logger, + clock: clock, + }, nil +} + +// Handle executes one register-runtime operation end-to-end. 
The +// Go-level error return is reserved for non-business failures (nil +// context, nil receiver). Every business outcome flows through Result. +func (service *Service) Handle(ctx context.Context, input Input) (Result, error) { + if service == nil { + return Result{}, errors.New("register runtime: nil service") + } + if ctx == nil { + return Result{}, errors.New("register runtime: nil context") + } + + opStartedAt := service.clock().UTC() + + if err := input.Validate(); err != nil { + return service.recordFailure(ctx, opStartedAt, input, false, false, + ErrorCodeInvalidRequest, err.Error()), nil + } + + if outcome, ok := service.rejectExisting(ctx, opStartedAt, input); ok { + return outcome, nil + } + + imageRef, outcome, ok := service.resolveImageRef(ctx, opStartedAt, input) + if !ok { + return outcome, nil + } + + record := service.buildStartingRecord(input, imageRef, opStartedAt) + if err := service.runtimeRecords.Insert(ctx, record); err != nil { + switch { + case errors.Is(err, runtime.ErrConflict): + return service.recordFailure(ctx, opStartedAt, input, false, false, + ErrorCodeConflict, "runtime record already exists"), nil + default: + return service.recordFailure(ctx, opStartedAt, input, false, false, + ErrorCodeServiceUnavailable, fmt.Sprintf("insert runtime record: %s", err.Error())), nil + } + } + + engineState, outcome, ok := service.callEngineInit(ctx, opStartedAt, input) + if !ok { + return outcome, nil + } + + if outcome, ok := service.validateRoster(ctx, opStartedAt, input, engineState); !ok { + return outcome, nil + } + + if outcome, ok := service.installPlayerMappings(ctx, opStartedAt, input, engineState); !ok { + return outcome, nil + } + + nextGenerationAt, outcome, ok := service.computeNextGeneration(ctx, opStartedAt, input) + if !ok { + return outcome, nil + } + + if outcome, ok := service.casToRunning(ctx, opStartedAt, input); !ok { + return outcome, nil + } + + if outcome, ok := service.persistInitialScheduling(ctx, opStartedAt, input, 
nextGenerationAt); !ok { + return outcome, nil + } + + persisted, outcome, ok := service.reloadRecord(ctx, opStartedAt, input) + if !ok { + return outcome, nil + } + + stats := projectInitToStats(engineState, input.Members) + + service.appendSuccessLog(ctx, opStartedAt, input) + service.publishSnapshot(ctx, persisted, stats, opStartedAt) + service.telemetry.RecordRegisterRuntimeOutcome(ctx, string(operation.OutcomeSuccess), "") + + logArgs := []any{ + "game_id", input.GameID, + "engine_version", input.TargetEngineVersion, + "members", len(input.Members), + "op_source", string(input.OpSource), + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.InfoContext(ctx, "runtime registered", logArgs...) + + return Result{ + Record: persisted, + Outcome: operation.OutcomeSuccess, + }, nil +} + +// rejectExisting returns a Result and ok=true when the runtime record +// already exists or the lookup itself failed; ok=false continues the +// flow. +func (service *Service) rejectExisting(ctx context.Context, opStartedAt time.Time, input Input) (Result, bool) { + _, err := service.runtimeRecords.Get(ctx, input.GameID) + switch { + case errors.Is(err, runtime.ErrNotFound): + return Result{}, false + case err != nil: + return service.recordFailure(ctx, opStartedAt, input, false, false, + ErrorCodeServiceUnavailable, fmt.Sprintf("get runtime record: %s", err.Error())), true + default: + return service.recordFailure(ctx, opStartedAt, input, false, false, + ErrorCodeConflict, "runtime record already exists"), true + } +} + +// resolveImageRef resolves the target engine version against the +// engine_versions registry. Returns ok=false on failure with the +// matching Result. 
+func (service *Service) resolveImageRef(ctx context.Context, opStartedAt time.Time, input Input) (string, Result, bool) { + version, err := service.engineVersions.Get(ctx, input.TargetEngineVersion) + switch { + case errors.Is(err, engineversion.ErrNotFound): + return "", service.recordFailure(ctx, opStartedAt, input, false, false, + ErrorCodeEngineVersionNotFound, + fmt.Sprintf("engine version %q not found", input.TargetEngineVersion)), false + case err != nil: + return "", service.recordFailure(ctx, opStartedAt, input, false, false, + ErrorCodeServiceUnavailable, fmt.Sprintf("get engine version: %s", err.Error())), false + } + return version.ImageRef, Result{}, true +} + +// buildStartingRecord assembles the initial runtime_records row, +// matching `gamemaster/README.md §Lifecycles → Register-runtime` step 4. +func (service *Service) buildStartingRecord(input Input, imageRef string, now time.Time) runtime.RuntimeRecord { + return runtime.RuntimeRecord{ + GameID: input.GameID, + Status: runtime.StatusStarting, + EngineEndpoint: input.EngineEndpoint, + CurrentImageRef: imageRef, + CurrentEngineVersion: input.TargetEngineVersion, + TurnSchedule: input.TurnSchedule, + CurrentTurn: 0, + NextGenerationAt: nil, + SkipNextTick: false, + EngineHealth: "", + CreatedAt: now, + UpdatedAt: now, + } +} + +// callEngineInit dispatches the engine /admin/init call and maps the +// transport-layer error to a stable Result code. ok=false means the +// flow stops. 
+func (service *Service) callEngineInit(ctx context.Context, opStartedAt time.Time, input Input) (ports.StateResponse, Result, bool) { + races := make([]ports.InitRace, 0, len(input.Members)) + for _, member := range input.Members { + races = append(races, ports.InitRace{RaceName: member.RaceName}) + } + state, err := service.engine.Init(ctx, input.EngineEndpoint, ports.InitRequest{Races: races}) + if err == nil { + return state, Result{}, true + } + + code := classifyEngineError(err) + message := fmt.Sprintf("engine init: %s", err.Error()) + return ports.StateResponse{}, service.recordFailure(ctx, opStartedAt, input, true, false, code, message), false +} + +// classifyEngineError maps the engine port sentinels to the +// register-runtime stable error codes per Stage 13 D1. +func classifyEngineError(err error) string { + switch { + case errors.Is(err, ports.ErrEngineValidation): + return ErrorCodeEngineValidationError + case errors.Is(err, ports.ErrEngineProtocolViolation): + return ErrorCodeEngineProtocolViolation + case errors.Is(err, ports.ErrEngineUnreachable): + return ErrorCodeEngineUnreachable + default: + return ErrorCodeEngineUnreachable + } +} + +// validateRoster checks that the engine response carries exactly the +// race set Game Master sent on /admin/init. ok=false means the flow +// stops. 
+func (service *Service) validateRoster(ctx context.Context, opStartedAt time.Time, input Input, state ports.StateResponse) (Result, bool) { + if len(state.Players) != len(input.Members) { + message := fmt.Sprintf("engine player count %d does not match roster size %d", len(state.Players), len(input.Members)) + return service.recordFailure(ctx, opStartedAt, input, true, false, + ErrorCodeEngineProtocolViolation, message), false + } + expected := make(map[string]struct{}, len(input.Members)) + for _, member := range input.Members { + expected[member.RaceName] = struct{}{} + } + for _, player := range state.Players { + if _, ok := expected[player.RaceName]; !ok { + message := fmt.Sprintf("engine returned race %q not present in roster", player.RaceName) + return service.recordFailure(ctx, opStartedAt, input, true, false, + ErrorCodeEngineProtocolViolation, message), false + } + } + return Result{}, true +} + +// installPlayerMappings projects the engine response onto +// player_mappings rows and persists them in one batch. ok=false means +// the flow stops (and rolls back both stores). 
+func (service *Service) installPlayerMappings(ctx context.Context, opStartedAt time.Time, input Input, state ports.StateResponse) (Result, bool) {
+	userByRace := make(map[string]string, len(input.Members))
+	for _, member := range input.Members {
+		userByRace[member.RaceName] = member.UserID
+	}
+
+	mappings := make([]playermapping.PlayerMapping, 0, len(state.Players))
+	for _, player := range state.Players {
+		userID, ok := userByRace[player.RaceName]
+		if !ok {
+			message := fmt.Sprintf("engine returned race %q not present in roster", player.RaceName)
+			return service.recordFailure(ctx, opStartedAt, input, true, false,
+				ErrorCodeEngineProtocolViolation, message), false
+		}
+		mappings = append(mappings, playermapping.PlayerMapping{
+			GameID:           input.GameID,
+			UserID:           userID,
+			RaceName:         player.RaceName,
+			EnginePlayerUUID: player.EnginePlayerUUID,
+			CreatedAt:        opStartedAt,
+		})
+	}
+
+	if err := service.playerMappings.BulkInsert(ctx, mappings); err != nil {
+		// BulkInsert is per-statement atomic (Stage 11 D7), so a failure
+		// leaves no mappings to clean up — only the runtime row.
+		switch {
+		case errors.Is(err, playermapping.ErrConflict):
+			return service.recordFailure(ctx, opStartedAt, input, true, false,
+				ErrorCodeConflict, fmt.Sprintf("bulk insert player mappings: %s", err.Error())), false
+		default:
+			return service.recordFailure(ctx, opStartedAt, input, true, false,
+				ErrorCodeServiceUnavailable, fmt.Sprintf("bulk insert player mappings: %s", err.Error())), false
+		}
+	}
+	return Result{}, true
+}
+
+// computeNextGeneration parses the cron schedule and computes the first
+// next-generation timestamp (no skip pending). ok=false means the flow
+// stops with rollback.
+func (service *Service) computeNextGeneration(ctx context.Context, opStartedAt time.Time, input Input) (time.Time, Result, bool) { + sched, err := schedule.Parse(input.TurnSchedule) + if err != nil { + return time.Time{}, service.recordFailure(ctx, opStartedAt, input, true, true, + ErrorCodeInvalidRequest, fmt.Sprintf("parse turn schedule: %s", err.Error())), false + } + next, _ := sched.Next(opStartedAt, false) + return next.UTC(), Result{}, true +} + +// casToRunning flips the runtime record from `starting` to `running`. +// On CAS failure or any storage error the flow rolls back both stores. +func (service *Service) casToRunning(ctx context.Context, opStartedAt time.Time, input Input) (Result, bool) { + err := service.runtimeRecords.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: input.GameID, + ExpectedFrom: runtime.StatusStarting, + To: runtime.StatusRunning, + Now: opStartedAt, + }) + switch { + case err == nil: + return Result{}, true + case errors.Is(err, runtime.ErrConflict): + return service.recordFailure(ctx, opStartedAt, input, true, true, + ErrorCodeConflict, fmt.Sprintf("cas runtime status to running: %s", err.Error())), false + default: + return service.recordFailure(ctx, opStartedAt, input, true, true, + ErrorCodeServiceUnavailable, fmt.Sprintf("cas runtime status to running: %s", err.Error())), false + } +} + +// persistInitialScheduling writes the first `next_generation_at` and +// the (already false) skip flag plus turn=0 on the runtime row. +// Failure rolls back both stores. 
+func (service *Service) persistInitialScheduling(ctx context.Context, opStartedAt time.Time, input Input, next time.Time) (Result, bool) { + err := service.runtimeRecords.UpdateScheduling(ctx, ports.UpdateSchedulingInput{ + GameID: input.GameID, + NextGenerationAt: &next, + SkipNextTick: false, + CurrentTurn: 0, + Now: opStartedAt, + }) + if err != nil { + return service.recordFailure(ctx, opStartedAt, input, true, true, + ErrorCodeServiceUnavailable, fmt.Sprintf("update initial scheduling: %s", err.Error())), false + } + return Result{}, true +} + +// reloadRecord re-reads the runtime row so the returned Result.Record +// carries the post-CAS, post-scheduling timestamps the adapters set. +// On read failure the flow rolls back both stores. +func (service *Service) reloadRecord(ctx context.Context, opStartedAt time.Time, input Input) (runtime.RuntimeRecord, Result, bool) { + persisted, err := service.runtimeRecords.Get(ctx, input.GameID) + if err != nil { + return runtime.RuntimeRecord{}, service.recordFailure(ctx, opStartedAt, input, true, true, + ErrorCodeServiceUnavailable, fmt.Sprintf("reload runtime record: %s", err.Error())), false + } + return persisted, Result{}, true +} + +// projectInitToStats joins the engine /admin/init response on RaceName +// against the input roster to produce one PlayerTurnStats per active +// member. The caller has already validated that every player race name +// is present in the roster, so the lookup is total. 
+func projectInitToStats(state ports.StateResponse, members []Member) []ports.PlayerTurnStats {
+	if len(state.Players) == 0 {
+		return nil
+	}
+	userByRace := make(map[string]string, len(members))
+	for _, member := range members {
+		userByRace[member.RaceName] = member.UserID
+	}
+	stats := make([]ports.PlayerTurnStats, 0, len(state.Players))
+	for _, player := range state.Players {
+		userID, ok := userByRace[player.RaceName]
+		if !ok {
+			// Defensive only: validateRoster has already matched every
+			// engine race against the roster, so this branch should be
+			// unreachable; skip the row rather than fail the snapshot.
+			continue
+		}
+		stats = append(stats, ports.PlayerTurnStats{
+			UserID:     userID,
+			Planets:    player.Planets,
+			Population: player.Population,
+		})
+	}
+	sort.Slice(stats, func(i, j int) bool { return stats[i].UserID < stats[j].UserID })
+	return stats
+}
+
+// recordFailure assembles the failure Result, rolls back any installed
+// state, appends the operation_log failure entry, and emits telemetry.
+// runtimeInserted reports whether the runtime row was already
+// installed; playerMappingsInstalled reports whether the player_mappings
+// rows were installed too. The two booleans gate the rollback so a
+// race-induced ErrConflict from Insert does not delete a row owned by
+// another caller.
+func (service *Service) recordFailure( + ctx context.Context, + opStartedAt time.Time, + input Input, + runtimeInserted bool, + playerMappingsInstalled bool, + errorCode string, + errorMessage string, +) Result { + if runtimeInserted { + service.rollback(ctx, input.GameID, playerMappingsInstalled) + } + + finishedAt := service.clock().UTC() + service.bestEffortAppend(ctx, operation.OperationEntry{ + GameID: input.GameID, + OpKind: operation.OpKindRegisterRuntime, + OpSource: fallbackOpSource(input.OpSource), + SourceRef: input.SourceRef, + Outcome: operation.OutcomeFailure, + ErrorCode: errorCode, + ErrorMessage: errorMessage, + StartedAt: opStartedAt, + FinishedAt: &finishedAt, + }) + + service.telemetry.RecordRegisterRuntimeOutcome(ctx, string(operation.OutcomeFailure), errorCode) + + logArgs := []any{ + "game_id", input.GameID, + "engine_version", input.TargetEngineVersion, + "op_source", string(input.OpSource), + "error_code", errorCode, + "error_message", errorMessage, + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + service.logger.WarnContext(ctx, "register runtime failed", logArgs...) + + return Result{ + Outcome: operation.OutcomeFailure, + ErrorCode: errorCode, + ErrorMessage: errorMessage, + } +} + +// rollback removes any installed state. Both store calls are +// idempotent; failures are logged but never overwrite the original +// failure reason. A fresh background context is used so a cancelled +// request context does not strand the row. 
+func (service *Service) rollback(ctx context.Context, gameID string, playerMappingsInstalled bool) { + cleanupCtx, cancel := context.WithTimeout(context.Background(), rollbackTimeout) + defer cancel() + if playerMappingsInstalled { + if err := service.playerMappings.DeleteByGame(cleanupCtx, gameID); err != nil { + service.logger.ErrorContext(ctx, "rollback player mappings", + "game_id", gameID, + "err", err.Error(), + ) + } + } + if err := service.runtimeRecords.Delete(cleanupCtx, gameID); err != nil { + service.logger.ErrorContext(ctx, "rollback runtime record", + "game_id", gameID, + "err", err.Error(), + ) + } +} + +// rollbackTimeout bounds each rollback storage call. A fresh background +// context is used so a canceled request context does not block the +// cleanup; the timeout matches the shape used by +// `rtmanager/internal/service/startruntime.Service.releaseLease`. +const rollbackTimeout = 5 * time.Second + +// appendSuccessLog records the success operation_log entry for the +// completed register-runtime operation. +func (service *Service) appendSuccessLog(ctx context.Context, opStartedAt time.Time, input Input) { + finishedAt := service.clock().UTC() + service.bestEffortAppend(ctx, operation.OperationEntry{ + GameID: input.GameID, + OpKind: operation.OpKindRegisterRuntime, + OpSource: fallbackOpSource(input.OpSource), + SourceRef: input.SourceRef, + Outcome: operation.OutcomeSuccess, + StartedAt: opStartedAt, + FinishedAt: &finishedAt, + }) +} + +// publishSnapshot publishes the post-success runtime_snapshot_update +// per `gamemaster/README.md §Lifecycles → Register-runtime` step 9. +// Failures are logged but do not roll back the just-installed runtime +// record; the snapshot stream is best-effort by contract. 
+func (service *Service) publishSnapshot(ctx context.Context, record runtime.RuntimeRecord, stats []ports.PlayerTurnStats, occurredAt time.Time) { + msg := ports.RuntimeSnapshotUpdate{ + GameID: record.GameID, + CurrentTurn: record.CurrentTurn, + RuntimeStatus: record.Status, + EngineHealthSummary: record.EngineHealth, + PlayerTurnStats: stats, + OccurredAt: occurredAt, + } + if err := service.lobbyEvents.PublishSnapshotUpdate(ctx, msg); err != nil { + service.logger.ErrorContext(ctx, "publish runtime snapshot update", + "game_id", record.GameID, + "err", err.Error(), + ) + return + } + service.telemetry.RecordLobbyEventPublished(ctx, "runtime_snapshot_update") +} + +// bestEffortAppend writes one operation_log entry. A failure is logged +// and discarded; the runtime record (or its absence after rollback) is +// the source of truth. +func (service *Service) bestEffortAppend(ctx context.Context, entry operation.OperationEntry) { + if _, err := service.operationLogs.Append(ctx, entry); err != nil { + service.logger.ErrorContext(ctx, "append operation log", + "game_id", entry.GameID, + "op_kind", string(entry.OpKind), + "outcome", string(entry.Outcome), + "error_code", entry.ErrorCode, + "err", err.Error(), + ) + } +} + +// fallbackOpSource defaults to `admin_rest` when the caller did not +// supply a known op source. Mirrors the README §Trusted Surfaces rule +// "when missing or unrecognised, GM defaults to `op_source=admin_rest`". 
+func fallbackOpSource(source operation.OpSource) operation.OpSource { + if source.IsKnown() { + return source + } + return operation.OpSourceAdminRest +} diff --git a/gamemaster/internal/service/registerruntime/service_test.go b/gamemaster/internal/service/registerruntime/service_test.go new file mode 100644 index 0000000..12be869 --- /dev/null +++ b/gamemaster/internal/service/registerruntime/service_test.go @@ -0,0 +1,796 @@ +package registerruntime_test + +import ( + "context" + "errors" + "fmt" + "sort" + "sync" + "testing" + "time" + + "galaxy/gamemaster/internal/adapters/mocks" + "galaxy/gamemaster/internal/domain/engineversion" + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/domain/playermapping" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/service/registerruntime" + "galaxy/gamemaster/internal/telemetry" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + "go.uber.org/mock/gomock" +) + +// --- test doubles ----------------------------------------------------- + +type fakeRuntimeRecords struct { + mu sync.Mutex + stored map[string]runtime.RuntimeRecord + getErr error + insErr error + updErr error + schErr error + delErr error + deletes []string + updates []ports.UpdateStatusInput + scheds []ports.UpdateSchedulingInput +} + +func newFakeRuntimeRecords() *fakeRuntimeRecords { + return &fakeRuntimeRecords{stored: map[string]runtime.RuntimeRecord{}} +} + +func (s *fakeRuntimeRecords) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.getErr != nil { + return runtime.RuntimeRecord{}, s.getErr + } + record, ok := s.stored[gameID] + if !ok { + return runtime.RuntimeRecord{}, runtime.ErrNotFound + } + return record, nil +} + +func (s *fakeRuntimeRecords) Insert(_ context.Context, record runtime.RuntimeRecord) error { + s.mu.Lock() + defer s.mu.Unlock() + if s.insErr != nil { + 
return s.insErr + } + if _, ok := s.stored[record.GameID]; ok { + return runtime.ErrConflict + } + s.stored[record.GameID] = record + return nil +} + +func (s *fakeRuntimeRecords) UpdateStatus(_ context.Context, input ports.UpdateStatusInput) error { + s.mu.Lock() + defer s.mu.Unlock() + if s.updErr != nil { + return s.updErr + } + record, ok := s.stored[input.GameID] + if !ok { + return runtime.ErrNotFound + } + if record.Status != input.ExpectedFrom { + return runtime.ErrConflict + } + record.Status = input.To + record.UpdatedAt = input.Now + if input.To == runtime.StatusRunning && record.StartedAt == nil { + started := input.Now + record.StartedAt = &started + } + s.stored[input.GameID] = record + s.updates = append(s.updates, input) + return nil +} + +func (s *fakeRuntimeRecords) UpdateScheduling(_ context.Context, input ports.UpdateSchedulingInput) error { + s.mu.Lock() + defer s.mu.Unlock() + if s.schErr != nil { + return s.schErr + } + record, ok := s.stored[input.GameID] + if !ok { + return runtime.ErrNotFound + } + if input.NextGenerationAt != nil { + next := *input.NextGenerationAt + record.NextGenerationAt = &next + } else { + record.NextGenerationAt = nil + } + record.SkipNextTick = input.SkipNextTick + record.CurrentTurn = input.CurrentTurn + record.UpdatedAt = input.Now + s.stored[input.GameID] = record + s.scheds = append(s.scheds, input) + return nil +} + +func (s *fakeRuntimeRecords) UpdateImage(_ context.Context, input ports.UpdateImageInput) error { + s.mu.Lock() + defer s.mu.Unlock() + record, ok := s.stored[input.GameID] + if !ok { + return runtime.ErrNotFound + } + if record.Status != input.ExpectedStatus { + return runtime.ErrConflict + } + record.CurrentImageRef = input.CurrentImageRef + record.CurrentEngineVersion = input.CurrentEngineVersion + record.UpdatedAt = input.Now + s.stored[input.GameID] = record + return nil +} + +func (s *fakeRuntimeRecords) UpdateEngineHealth(context.Context, ports.UpdateEngineHealthInput) error { + return 
errors.New("not used") +} + +func (s *fakeRuntimeRecords) Delete(_ context.Context, gameID string) error { + s.mu.Lock() + defer s.mu.Unlock() + s.deletes = append(s.deletes, gameID) + if s.delErr != nil { + return s.delErr + } + delete(s.stored, gameID) + return nil +} + +func (s *fakeRuntimeRecords) ListDueRunning(_ context.Context, _ time.Time) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used in registerruntime tests") +} + +func (s *fakeRuntimeRecords) ListByStatus(_ context.Context, _ runtime.Status) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used in registerruntime tests") +} + +func (s *fakeRuntimeRecords) List(_ context.Context) ([]runtime.RuntimeRecord, error) { + return nil, errors.New("not used in registerruntime tests") +} + +func (s *fakeRuntimeRecords) deleteCount() int { + s.mu.Lock() + defer s.mu.Unlock() + return len(s.deletes) +} + +func (s *fakeRuntimeRecords) hasRecord(gameID string) bool { + s.mu.Lock() + defer s.mu.Unlock() + _, ok := s.stored[gameID] + return ok +} + +func (s *fakeRuntimeRecords) record(gameID string) (runtime.RuntimeRecord, bool) { + s.mu.Lock() + defer s.mu.Unlock() + record, ok := s.stored[gameID] + return record, ok +} + +type fakeEngineVersions struct { + mu sync.Mutex + versions map[string]engineversion.EngineVersion + getErr error +} + +func newFakeEngineVersions() *fakeEngineVersions { + return &fakeEngineVersions{versions: map[string]engineversion.EngineVersion{}} +} + +func (s *fakeEngineVersions) seed(version, imageRef string) { + s.mu.Lock() + defer s.mu.Unlock() + s.versions[version] = engineversion.EngineVersion{ + Version: version, + ImageRef: imageRef, + Status: engineversion.StatusActive, + CreatedAt: time.Now().UTC(), + UpdatedAt: time.Now().UTC(), + } +} + +func (s *fakeEngineVersions) Get(_ context.Context, version string) (engineversion.EngineVersion, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.getErr != nil { + return engineversion.EngineVersion{}, 
s.getErr + } + record, ok := s.versions[version] + if !ok { + return engineversion.EngineVersion{}, engineversion.ErrNotFound + } + return record, nil +} + +func (s *fakeEngineVersions) List(_ context.Context, _ *engineversion.Status) ([]engineversion.EngineVersion, error) { + return nil, errors.New("not used in registerruntime tests") +} + +func (s *fakeEngineVersions) Insert(_ context.Context, _ engineversion.EngineVersion) error { + return errors.New("not used in registerruntime tests") +} + +func (s *fakeEngineVersions) Update(_ context.Context, _ ports.UpdateEngineVersionInput) error { + return errors.New("not used in registerruntime tests") +} + +func (s *fakeEngineVersions) Deprecate(_ context.Context, _ string, _ time.Time) error { + return errors.New("not used in registerruntime tests") +} + +func (s *fakeEngineVersions) Delete(_ context.Context, _ string) error { + return errors.New("not used in registerruntime tests") +} + +func (s *fakeEngineVersions) IsReferencedByActiveRuntime(_ context.Context, _ string) (bool, error) { + return false, errors.New("not used in registerruntime tests") +} + +type fakePlayerMappings struct { + mu sync.Mutex + stored map[string][]playermapping.PlayerMapping + bulkErr error + delErr error + deletes []string + inserted [][]playermapping.PlayerMapping +} + +func newFakePlayerMappings() *fakePlayerMappings { + return &fakePlayerMappings{stored: map[string][]playermapping.PlayerMapping{}} +} + +func (s *fakePlayerMappings) BulkInsert(_ context.Context, records []playermapping.PlayerMapping) error { + s.mu.Lock() + defer s.mu.Unlock() + if s.bulkErr != nil { + return s.bulkErr + } + if len(records) == 0 { + return nil + } + for _, record := range records { + s.stored[record.GameID] = append(s.stored[record.GameID], record) + } + copyOf := make([]playermapping.PlayerMapping, len(records)) + copy(copyOf, records) + s.inserted = append(s.inserted, copyOf) + return nil +} + +func (s *fakePlayerMappings) Get(_ context.Context, _, _ 
string) (playermapping.PlayerMapping, error) { + return playermapping.PlayerMapping{}, errors.New("not used in registerruntime tests") +} + +func (s *fakePlayerMappings) GetByRace(_ context.Context, _, _ string) (playermapping.PlayerMapping, error) { + return playermapping.PlayerMapping{}, errors.New("not used in registerruntime tests") +} + +func (s *fakePlayerMappings) ListByGame(_ context.Context, gameID string) ([]playermapping.PlayerMapping, error) { + s.mu.Lock() + defer s.mu.Unlock() + return append([]playermapping.PlayerMapping(nil), s.stored[gameID]...), nil +} + +func (s *fakePlayerMappings) DeleteByGame(_ context.Context, gameID string) error { + s.mu.Lock() + defer s.mu.Unlock() + s.deletes = append(s.deletes, gameID) + if s.delErr != nil { + return s.delErr + } + delete(s.stored, gameID) + return nil +} + +func (s *fakePlayerMappings) deleteCount() int { + s.mu.Lock() + defer s.mu.Unlock() + return len(s.deletes) +} + +func (s *fakePlayerMappings) hasRecords(gameID string) bool { + s.mu.Lock() + defer s.mu.Unlock() + return len(s.stored[gameID]) > 0 +} + +type fakeOperationLogs struct { + mu sync.Mutex + appErr error + entries []operation.OperationEntry +} + +func (s *fakeOperationLogs) Append(_ context.Context, entry operation.OperationEntry) (int64, error) { + s.mu.Lock() + defer s.mu.Unlock() + if s.appErr != nil { + return 0, s.appErr + } + if err := entry.Validate(); err != nil { + return 0, err + } + s.entries = append(s.entries, entry) + return int64(len(s.entries)), nil +} + +func (s *fakeOperationLogs) ListByGame(_ context.Context, _ string, _ int) ([]operation.OperationEntry, error) { + return nil, errors.New("not used in registerruntime tests") +} + +func (s *fakeOperationLogs) lastEntry() (operation.OperationEntry, bool) { + s.mu.Lock() + defer s.mu.Unlock() + if len(s.entries) == 0 { + return operation.OperationEntry{}, false + } + return s.entries[len(s.entries)-1], true +} + +// --- harness 
---------------------------------------------------------- + +type harness struct { + t *testing.T + ctrl *gomock.Controller + runtime *fakeRuntimeRecords + versions *fakeEngineVersions + mappings *fakePlayerMappings + logs *fakeOperationLogs + engine *mocks.MockEngineClient + lobby *mocks.MockLobbyEventsPublisher + telemetry *telemetry.Runtime + now time.Time + service *registerruntime.Service +} + +func newHarness(t *testing.T) *harness { + t.Helper() + ctrl := gomock.NewController(t) + telemetryRuntime, err := telemetry.NewWithProviders(nil, nil) + require.NoError(t, err) + + h := &harness{ + t: t, + ctrl: ctrl, + runtime: newFakeRuntimeRecords(), + versions: newFakeEngineVersions(), + mappings: newFakePlayerMappings(), + logs: &fakeOperationLogs{}, + engine: mocks.NewMockEngineClient(ctrl), + lobby: mocks.NewMockLobbyEventsPublisher(ctrl), + telemetry: telemetryRuntime, + now: time.Date(2026, time.April, 30, 12, 0, 0, 0, time.UTC), + } + h.versions.seed("v1.2.3", "ghcr.io/galaxy/game:v1.2.3") + + service, err := registerruntime.NewService(registerruntime.Dependencies{ + RuntimeRecords: h.runtime, + EngineVersions: h.versions, + PlayerMappings: h.mappings, + OperationLogs: h.logs, + Engine: h.engine, + LobbyEvents: h.lobby, + Telemetry: h.telemetry, + Clock: func() time.Time { return h.now }, + }) + require.NoError(t, err) + h.service = service + return h +} + +func baseInput() registerruntime.Input { + return registerruntime.Input{ + GameID: "game-001", + EngineEndpoint: "http://galaxy-game-game-001:8080", + Members: []registerruntime.Member{ + {UserID: "user-1", RaceName: "Aelinari"}, + {UserID: "user-2", RaceName: "Drazi"}, + }, + TargetEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + OpSource: operation.OpSourceLobbyInternal, + SourceRef: "req-abc", + } +} + +func enginePlayers() []ports.PlayerState { + return []ports.PlayerState{ + {RaceName: "Aelinari", EnginePlayerUUID: "uuid-1", Planets: 3, Population: 100}, + {RaceName: "Drazi", 
EnginePlayerUUID: "uuid-2", Planets: 2, Population: 80},
+	}
+}
+
+// --- tests ------------------------------------------------------------
+
+func TestNewServiceRejectsMissingDeps(t *testing.T) {
+	telemetryRuntime, err := telemetry.NewWithProviders(nil, nil)
+	require.NoError(t, err)
+	cases := []struct {
+		name string
+		mut  func(*registerruntime.Dependencies)
+	}{
+		{"runtime records", func(d *registerruntime.Dependencies) { d.RuntimeRecords = nil }},
+		{"engine versions", func(d *registerruntime.Dependencies) { d.EngineVersions = nil }},
+		{"player mappings", func(d *registerruntime.Dependencies) { d.PlayerMappings = nil }},
+		{"operation logs", func(d *registerruntime.Dependencies) { d.OperationLogs = nil }},
+		{"engine", func(d *registerruntime.Dependencies) { d.Engine = nil }},
+		{"lobby events", func(d *registerruntime.Dependencies) { d.LobbyEvents = nil }},
+		{"telemetry", func(d *registerruntime.Dependencies) { d.Telemetry = nil }},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			ctrl := gomock.NewController(t)
+			deps := registerruntime.Dependencies{
+				RuntimeRecords: newFakeRuntimeRecords(),
+				EngineVersions: newFakeEngineVersions(),
+				PlayerMappings: newFakePlayerMappings(),
+				OperationLogs:  &fakeOperationLogs{},
+				Engine:         mocks.NewMockEngineClient(ctrl),
+				LobbyEvents:    mocks.NewMockLobbyEventsPublisher(ctrl),
+				Telemetry:      telemetryRuntime,
+			}
+			tc.mut(&deps)
+			service, err := registerruntime.NewService(deps)
+			require.Error(t, err)
+			require.Nil(t, service)
+		})
+	}
+}
+
+func TestHandleHappyPath(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+
+	h.engine.EXPECT().
+		Init(gomock.Any(), input.EngineEndpoint, ports.InitRequest{
+			Races: []ports.InitRace{{RaceName: "Aelinari"}, {RaceName: "Drazi"}},
+		}).
+		Return(ports.StateResponse{
+			Turn:    0,
+			Players: enginePlayers(),
+		}, nil)
+
+	var captured ports.RuntimeSnapshotUpdate
+	h.lobby.EXPECT().
+		PublishSnapshotUpdate(gomock.Any(), gomock.Any()).
+		DoAndReturn(func(_ context.Context, msg ports.RuntimeSnapshotUpdate) error {
+			captured = msg
+			return nil
+		})
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	require.True(t, result.IsSuccess(), "outcome %q error_code=%q", result.Outcome, result.ErrorCode)
+	require.Equal(t, runtime.StatusRunning, result.Record.Status)
+	require.Equal(t, "ghcr.io/galaxy/game:v1.2.3", result.Record.CurrentImageRef)
+	require.NotNil(t, result.Record.NextGenerationAt)
+	require.NotNil(t, result.Record.StartedAt)
+
+	stored, ok := h.runtime.record(input.GameID)
+	require.True(t, ok)
+	assert.Equal(t, runtime.StatusRunning, stored.Status)
+	assert.Equal(t, 0, stored.CurrentTurn)
+	assert.False(t, stored.SkipNextTick)
+	require.NotNil(t, stored.NextGenerationAt)
+	assert.True(t, stored.NextGenerationAt.After(h.now))
+
+	mappings, err := h.mappings.ListByGame(context.Background(), input.GameID)
+	require.NoError(t, err)
+	require.Len(t, mappings, 2)
+	sort.Slice(mappings, func(i, j int) bool { return mappings[i].UserID < mappings[j].UserID })
+	assert.Equal(t, "user-1", mappings[0].UserID)
+	assert.Equal(t, "Aelinari", mappings[0].RaceName)
+	assert.Equal(t, "uuid-1", mappings[0].EnginePlayerUUID)
+	assert.Equal(t, "user-2", mappings[1].UserID)
+	assert.Equal(t, "Drazi", mappings[1].RaceName)
+	assert.Equal(t, "uuid-2", mappings[1].EnginePlayerUUID)
+
+	entry, ok := h.logs.lastEntry()
+	require.True(t, ok)
+	assert.Equal(t, operation.OutcomeSuccess, entry.Outcome)
+	assert.Equal(t, operation.OpKindRegisterRuntime, entry.OpKind)
+	assert.Equal(t, operation.OpSourceLobbyInternal, entry.OpSource)
+	assert.Equal(t, "req-abc", entry.SourceRef)
+
+	assert.Equal(t, input.GameID, captured.GameID)
+	assert.Equal(t, runtime.StatusRunning, captured.RuntimeStatus)
+	assert.Equal(t, 0, captured.CurrentTurn)
+	assert.Equal(t, "", captured.EngineHealthSummary)
+	require.Len(t, captured.PlayerTurnStats, 2)
+	assert.Equal(t, "user-1", captured.PlayerTurnStats[0].UserID)
+	assert.Equal(t, 3, captured.PlayerTurnStats[0].Planets)
+	assert.Equal(t, 100, captured.PlayerTurnStats[0].Population)
+	assert.Equal(t, "user-2", captured.PlayerTurnStats[1].UserID)
+	assert.Equal(t, 2, captured.PlayerTurnStats[1].Planets)
+	assert.Equal(t, 80, captured.PlayerTurnStats[1].Population)
+	assert.Equal(t, h.now.UTC(), captured.OccurredAt)
+}
+
+func TestHandleRejectsInvalidInput(t *testing.T) {
+	cases := []struct {
+		name string
+		mut  func(*registerruntime.Input)
+	}{
+		{"empty game id", func(i *registerruntime.Input) { i.GameID = "" }},
+		{"empty engine endpoint", func(i *registerruntime.Input) { i.EngineEndpoint = "" }},
+		{"empty members", func(i *registerruntime.Input) { i.Members = nil }},
+		{"empty target version", func(i *registerruntime.Input) { i.TargetEngineVersion = "" }},
+		{"empty turn schedule", func(i *registerruntime.Input) { i.TurnSchedule = "" }},
+		{"missing user id", func(i *registerruntime.Input) {
+			i.Members = []registerruntime.Member{{UserID: "", RaceName: "Aelinari"}}
+		}},
+		{"missing race name", func(i *registerruntime.Input) {
+			i.Members = []registerruntime.Member{{UserID: "user-1", RaceName: ""}}
+		}},
+		{"unknown op source", func(i *registerruntime.Input) { i.OpSource = "exotic" }},
+		{"duplicate user id", func(i *registerruntime.Input) {
+			i.Members = []registerruntime.Member{
+				{UserID: "user-1", RaceName: "Aelinari"},
+				{UserID: "user-1", RaceName: "Drazi"},
+			}
+		}},
+		{"duplicate race name", func(i *registerruntime.Input) {
+			i.Members = []registerruntime.Member{
+				{UserID: "user-1", RaceName: "Aelinari"},
+				{UserID: "user-2", RaceName: "Aelinari"},
+			}
+		}},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			h := newHarness(t)
+			input := baseInput()
+			tc.mut(&input)
+
+			result, err := h.service.Handle(context.Background(), input)
+			require.NoError(t, err)
+			assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+			assert.Equal(t, registerruntime.ErrorCodeInvalidRequest, result.ErrorCode)
+
+			// No persistence should have happened.
+			assert.False(t, h.runtime.hasRecord(input.GameID))
+			assert.False(t, h.mappings.hasRecords(input.GameID))
+		})
+	}
+}
+
+func TestHandleRejectsExistingRuntime(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+
+	require.NoError(t, h.runtime.Insert(context.Background(), runtime.RuntimeRecord{
+		GameID:               input.GameID,
+		Status:               runtime.StatusRunning,
+		EngineEndpoint:       input.EngineEndpoint,
+		CurrentImageRef:      "ghcr.io/galaxy/game:v1.2.3",
+		CurrentEngineVersion: "v1.2.3",
+		TurnSchedule:         input.TurnSchedule,
+		CreatedAt:            h.now,
+		UpdatedAt:            h.now,
+		StartedAt:            &h.now,
+	}))
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeConflict, result.ErrorCode)
+
+	assert.True(t, h.runtime.hasRecord(input.GameID), "existing record must not be removed")
+	assert.Equal(t, 0, h.runtime.deleteCount())
+	assert.Equal(t, 0, h.mappings.deleteCount())
+
+	entry, ok := h.logs.lastEntry()
+	require.True(t, ok)
+	assert.Equal(t, operation.OutcomeFailure, entry.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeConflict, entry.ErrorCode)
+}
+
+func TestHandleRejectsMissingEngineVersion(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+	input.TargetEngineVersion = "v9.9.9"
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeEngineVersionNotFound, result.ErrorCode)
+
+	assert.False(t, h.runtime.hasRecord(input.GameID))
+	assert.Equal(t, 0, h.runtime.deleteCount())
+}
+
+func TestHandleRollsBackOnEngineUnreachable(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+
+	h.engine.EXPECT().
+		Init(gomock.Any(), input.EngineEndpoint, gomock.Any()).
+		Return(ports.StateResponse{}, fmt.Errorf("dial: %w", ports.ErrEngineUnreachable))
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeEngineUnreachable, result.ErrorCode)
+
+	assert.False(t, h.runtime.hasRecord(input.GameID))
+	assert.Equal(t, 1, h.runtime.deleteCount())
+	// player_mappings were never installed; rollback skips them.
+	assert.Equal(t, 0, h.mappings.deleteCount())
+}
+
+func TestHandleRollsBackOnEngineValidationError(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+
+	h.engine.EXPECT().
+		Init(gomock.Any(), input.EngineEndpoint, gomock.Any()).
+		Return(ports.StateResponse{}, fmt.Errorf("init body: %w", ports.ErrEngineValidation))
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeEngineValidationError, result.ErrorCode)
+
+	assert.False(t, h.runtime.hasRecord(input.GameID))
+	assert.Equal(t, 1, h.runtime.deleteCount())
+}
+
+func TestHandleRollsBackOnEngineProtocolViolation(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+
+	h.engine.EXPECT().
+		Init(gomock.Any(), input.EngineEndpoint, gomock.Any()).
+		Return(ports.StateResponse{
+			Players: []ports.PlayerState{
+				{RaceName: "Unknown", EnginePlayerUUID: "uuid-x", Planets: 1, Population: 10},
+				{RaceName: "Aelinari", EnginePlayerUUID: "uuid-1", Planets: 2, Population: 50},
+			},
+		}, nil)
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeEngineProtocolViolation, result.ErrorCode)
+
+	assert.False(t, h.runtime.hasRecord(input.GameID))
+	assert.Equal(t, 1, h.runtime.deleteCount())
+}
+
+func TestHandleRollsBackOnPlayerCountMismatch(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+
+	h.engine.EXPECT().
+		Init(gomock.Any(), input.EngineEndpoint, gomock.Any()).
+		Return(ports.StateResponse{
+			Players: []ports.PlayerState{
+				{RaceName: "Aelinari", EnginePlayerUUID: "uuid-1", Planets: 1, Population: 10},
+			},
+		}, nil)
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeEngineProtocolViolation, result.ErrorCode)
+
+	assert.False(t, h.runtime.hasRecord(input.GameID))
+}
+
+func TestHandleRollsBackOnPlayerMappingConflict(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+	h.mappings.bulkErr = fmt.Errorf("duplicate row: %w", playermapping.ErrConflict)
+
+	h.engine.EXPECT().
+		Init(gomock.Any(), input.EngineEndpoint, gomock.Any()).
+		Return(ports.StateResponse{Players: enginePlayers()}, nil)
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeConflict, result.ErrorCode)
+
+	assert.False(t, h.runtime.hasRecord(input.GameID))
+	assert.Equal(t, 1, h.runtime.deleteCount())
+	// BulkInsert is per-statement atomic, so a failure leaves no rows
+	// to clean up.
+	assert.Equal(t, 0, h.mappings.deleteCount())
+}
+
+func TestHandleRollsBackOnSchedulingUpdateFailure(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+	h.runtime.schErr = errors.New("postgres timeout")
+
+	h.engine.EXPECT().
+		Init(gomock.Any(), input.EngineEndpoint, gomock.Any()).
+		Return(ports.StateResponse{Players: enginePlayers()}, nil)
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeServiceUnavailable, result.ErrorCode)
+
+	assert.False(t, h.runtime.hasRecord(input.GameID))
+	assert.Equal(t, 1, h.runtime.deleteCount())
+	assert.Equal(t, 1, h.mappings.deleteCount())
+}
+
+func TestHandleRollsBackOnInvalidTurnSchedule(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+	input.TurnSchedule = "not-a-cron"
+
+	// Engine init still happens because TurnSchedule is parsed only
+	// after the engine roster validation step.
+	h.engine.EXPECT().
+		Init(gomock.Any(), input.EngineEndpoint, gomock.Any()).
+		Return(ports.StateResponse{Players: enginePlayers()}, nil)
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeInvalidRequest, result.ErrorCode)
+
+	assert.False(t, h.runtime.hasRecord(input.GameID))
+	assert.Equal(t, 1, h.runtime.deleteCount())
+	assert.Equal(t, 1, h.mappings.deleteCount())
+}
+
+func TestHandleAppendsOperationLogOnFailure(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+
+	h.engine.EXPECT().
+		Init(gomock.Any(), input.EngineEndpoint, gomock.Any()).
+		Return(ports.StateResponse{}, fmt.Errorf("dial: %w", ports.ErrEngineUnreachable))
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	require.Equal(t, operation.OutcomeFailure, result.Outcome)
+
+	entry, ok := h.logs.lastEntry()
+	require.True(t, ok)
+	assert.Equal(t, operation.OpKindRegisterRuntime, entry.OpKind)
+	assert.Equal(t, operation.OpSourceLobbyInternal, entry.OpSource)
+	assert.Equal(t, operation.OutcomeFailure, entry.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeEngineUnreachable, entry.ErrorCode)
+	require.NotNil(t, entry.FinishedAt)
+	assert.False(t, entry.FinishedAt.Before(entry.StartedAt))
+}
+
+func TestHandleSurfacesServiceUnavailableOnGetRuntimeError(t *testing.T) {
+	h := newHarness(t)
+	input := baseInput()
+	h.runtime.getErr = errors.New("postgres dial timeout")
+
+	result, err := h.service.Handle(context.Background(), input)
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, registerruntime.ErrorCodeServiceUnavailable, result.ErrorCode)
+}
+
+func TestHandleRejectsNilContext(t *testing.T) {
+	h := newHarness(t)
+	_, err := h.service.Handle(nil, baseInput()) //nolint:staticcheck // intentional nil context
+	require.Error(t, err)
+}
+
+func TestHandleNilServiceReturnsError(t *testing.T) {
+	var service *registerruntime.Service
+	_, err := service.Handle(context.Background(), baseInput())
+	require.Error(t, err)
+}
diff --git a/gamemaster/internal/service/reportget/errors.go b/gamemaster/internal/service/reportget/errors.go
new file mode 100644
index 0000000..50bcdef
--- /dev/null
+++ b/gamemaster/internal/service/reportget/errors.go
@@ -0,0 +1,48 @@
+package reportget
+
+// Stable error codes returned in `Result.ErrorCode`. The values match the
+// vocabulary frozen by `gamemaster/README.md §Error Model` and
+// `gamemaster/api/internal-openapi.yaml`. Stage 19's REST handler imports
+// these names rather than redeclaring them; renaming any of them is a
+// contract change.
+//
+// Note: the report-get operation does **not** require the runtime to be
+// in `running` state. Reports may be served against any runtime that
+// exists in `runtime_records`; an unreachable engine surfaces naturally
+// through `engine_unreachable`. Therefore `runtime_not_running` is not
+// part of this vocabulary.
+const (
+	// ErrorCodeInvalidRequest reports that the request envelope failed
+	// structural validation (empty required field, negative turn).
+	ErrorCodeInvalidRequest = "invalid_request"
+
+	// ErrorCodeRuntimeNotFound reports that no `runtime_records` row
+	// exists for the requested game id.
+	ErrorCodeRuntimeNotFound = "runtime_not_found"
+
+	// ErrorCodeForbidden reports that the caller is not an active member
+	// of the game, or that the (game_id, user_id) pair lacks a player
+	// mapping.
+	ErrorCodeForbidden = "forbidden"
+
+	// ErrorCodeEngineUnreachable reports that the engine /api/v1/report
+	// call returned a 5xx status, timed out, or could not be dispatched.
+	ErrorCodeEngineUnreachable = "engine_unreachable"
+
+	// ErrorCodeEngineValidationError reports that the engine returned
+	// 4xx. The body is forwarded verbatim through `Result.RawResponse`.
+	ErrorCodeEngineValidationError = "engine_validation_error"
+
+	// ErrorCodeEngineProtocolViolation reports that the engine response
+	// did not match the expected schema (empty body, malformed JSON).
+	// Stage 19 maps this to 502.
+	ErrorCodeEngineProtocolViolation = "engine_protocol_violation"
+
+	// ErrorCodeServiceUnavailable reports that a steady-state dependency
+	// (PostgreSQL, Lobby) was unreachable for this call.
+	ErrorCodeServiceUnavailable = "service_unavailable"
+
+	// ErrorCodeInternal reports an unexpected error not classified by
+	// the other codes.
+	ErrorCodeInternal = "internal_error"
+)
diff --git a/gamemaster/internal/service/reportget/service.go b/gamemaster/internal/service/reportget/service.go
new file mode 100644
index 0000000..11504cb
--- /dev/null
+++ b/gamemaster/internal/service/reportget/service.go
@@ -0,0 +1,314 @@
+// Package reportget implements the per-player turn-report hot-path
+// service owned by Game Master. It accepts a verified `(game_id, user_id,
+// turn)` envelope from Edge Gateway, authorises the caller against the
+// membership cache, resolves `race_name` from `player_mappings`, and
+// forwards `GET /api/v1/report?player={race_name}&turn={turn}` to the
+// engine.
+//
+// Lifecycle and error semantics follow `gamemaster/README.md §Hot Path →
+// Reports`. Unlike commandexecute and orderput, the report service does
+// not require `runtime_records.status = running`: reports may be served
+// against any runtime that exists in the table, allowing post-finish
+// inspection. Design rationale (decision D1) is captured in
+// `gamemaster/docs/stage16-membership-cache-and-invalidation.md`.
+package reportget
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"log/slog"
+	"strings"
+	"time"
+
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/domain/playermapping"
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/logging"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/service/membership"
+	"galaxy/gamemaster/internal/telemetry"
+)
+
+const (
+	engineCallOp = "report"
+
+	membershipStatusActive = "active"
+)
+
+// Input stores the per-call arguments for one report-get operation.
+type Input struct {
+	// GameID identifies the platform game whose report is being read.
+	GameID string
+
+	// UserID identifies the platform user submitting the request. The
+	// service derives `race_name` from this value via `player_mappings`
+	// before calling the engine.
+	UserID string
+
+	// Turn identifies the turn number to read. Must be non-negative;
+	// zero requests the initial state report.
+	Turn int
+}
+
+// Validate returns a non-nil error when input violates the structural
+// invariants the service requires before any store is touched.
+func (input Input) Validate() error {
+	if strings.TrimSpace(input.GameID) == "" {
+		return fmt.Errorf("game id must not be empty")
+	}
+	if strings.TrimSpace(input.UserID) == "" {
+		return fmt.Errorf("user id must not be empty")
+	}
+	if input.Turn < 0 {
+		return fmt.Errorf("turn must not be negative, got %d", input.Turn)
+	}
+	return nil
+}
+
+// Result stores the deterministic outcome of one Handle call.
+type Result struct {
+	// Outcome reports whether the operation completed (success) or
+	// produced a stable failure code.
+	Outcome operation.Outcome
+
+	// ErrorCode stores the stable error code on failure. Empty on
+	// success.
+	ErrorCode string
+
+	// ErrorMessage stores the operator-readable detail on failure.
+	// Empty on success.
+	ErrorMessage string
+
+	// RawResponse stores the engine response body. Populated on success
+	// and on `engine_validation_error`. Empty on every other terminal
+	// branch.
+	RawResponse json.RawMessage
+}
+
+// IsSuccess reports whether the result represents a successful operation.
+func (result Result) IsSuccess() bool {
+	return result.Outcome == operation.OutcomeSuccess
+}
+
+// Dependencies groups the collaborators required by Service.
+type Dependencies struct {
+	// RuntimeRecords loads the engine endpoint.
+	RuntimeRecords ports.RuntimeRecordStore
+
+	// PlayerMappings resolves `(game_id, user_id) → race_name`.
+	PlayerMappings ports.PlayerMappingStore
+
+	// Membership authorises the caller.
+	Membership *membership.Cache
+
+	// Engine forwards `GET /api/v1/report` calls.
+	Engine ports.EngineClient
+
+	// Telemetry records the per-outcome counter and the engine-call
+	// latency histogram.
+	Telemetry *telemetry.Runtime
+
+	// Logger records structured service-level events. Defaults to
+	// `slog.Default()` when nil.
+	Logger *slog.Logger
+
+	// Clock supplies the wall-clock used for engine-call latency.
+	// Defaults to `time.Now` when nil.
+	Clock func() time.Time
+}
+
+// Service executes the report-get hot-path operation.
+type Service struct {
+	runtimeRecords ports.RuntimeRecordStore
+	playerMappings ports.PlayerMappingStore
+	membership     *membership.Cache
+	engine         ports.EngineClient
+	telemetry      *telemetry.Runtime
+	logger         *slog.Logger
+	clock          func() time.Time
+}
+
+// NewService constructs one Service from deps.
+func NewService(deps Dependencies) (*Service, error) {
+	switch {
+	case deps.RuntimeRecords == nil:
+		return nil, errors.New("new report get service: nil runtime records")
+	case deps.PlayerMappings == nil:
+		return nil, errors.New("new report get service: nil player mappings")
+	case deps.Membership == nil:
+		return nil, errors.New("new report get service: nil membership cache")
+	case deps.Engine == nil:
+		return nil, errors.New("new report get service: nil engine client")
+	case deps.Telemetry == nil:
+		return nil, errors.New("new report get service: nil telemetry runtime")
+	}
+
+	clock := deps.Clock
+	if clock == nil {
+		clock = time.Now
+	}
+	logger := deps.Logger
+	if logger == nil {
+		logger = slog.Default()
+	}
+	logger = logger.With("service", "gamemaster.reportget")
+
+	return &Service{
+		runtimeRecords: deps.RuntimeRecords,
+		playerMappings: deps.PlayerMappings,
+		membership:     deps.Membership,
+		engine:         deps.Engine,
+		telemetry:      deps.Telemetry,
+		logger:         logger,
+		clock:          clock,
+	}, nil
+}
+
+// Handle executes one report-get operation end-to-end. The Go-level error
+// return is reserved for non-business failures (nil context, nil
+// receiver). Every business outcome flows through Result.
+func (service *Service) Handle(ctx context.Context, input Input) (Result, error) {
+	if service == nil {
+		return Result{}, errors.New("report get: nil service")
+	}
+	if ctx == nil {
+		return Result{}, errors.New("report get: nil context")
+	}
+
+	if err := input.Validate(); err != nil {
+		return service.recordFailure(ctx, input, ErrorCodeInvalidRequest, err.Error(), nil), nil
+	}
+
+	record, result, ok := service.loadRecord(ctx, input)
+	if !ok {
+		return result, nil
+	}
+
+	mapping, result, ok := service.authorise(ctx, input)
+	if !ok {
+		return result, nil
+	}
+
+	body, engineErr := service.callEngine(ctx, record.EngineEndpoint, mapping.RaceName, input.Turn)
+	if engineErr != nil {
+		errorCode := classifyEngineError(engineErr)
+		message := fmt.Sprintf("engine report: %s", engineErr.Error())
+		var bodyForCaller json.RawMessage
+		if errorCode == ErrorCodeEngineValidationError {
+			bodyForCaller = body
+		}
+		return service.recordFailure(ctx, input, errorCode, message, bodyForCaller), nil
+	}
+
+	service.telemetry.RecordReportGetOutcome(ctx,
+		string(operation.OutcomeSuccess), "")
+	logArgs := []any{
+		"game_id", input.GameID,
+		"user_id", input.UserID,
+		"actor", mapping.RaceName,
+		"turn", input.Turn,
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.InfoContext(ctx, "report get succeeded", logArgs...)
+
+	return Result{
+		Outcome:     operation.OutcomeSuccess,
+		RawResponse: body,
+	}, nil
+}
+
+// loadRecord reads the runtime record and maps store errors to
+// orchestrator outcomes. ok=false means the flow stops with the returned
+// Result. Reports tolerate any non-deleted runtime status; the running
+// guard from commandexecute / orderput is intentionally absent.
+func (service *Service) loadRecord(ctx context.Context, input Input) (runtime.RuntimeRecord, Result, bool) {
+	record, err := service.runtimeRecords.Get(ctx, input.GameID)
+	switch {
+	case err == nil:
+		return record, Result{}, true
+	case errors.Is(err, runtime.ErrNotFound):
+		return runtime.RuntimeRecord{}, service.recordFailure(ctx, input,
+			ErrorCodeRuntimeNotFound, "runtime record does not exist", nil), false
+	default:
+		return runtime.RuntimeRecord{}, service.recordFailure(ctx, input,
+			ErrorCodeServiceUnavailable, fmt.Sprintf("get runtime record: %s", err.Error()), nil), false
+	}
+}
+
+// authorise resolves the membership status and the player mapping for
+// the caller. ok=false means the flow stops with the returned Result.
+func (service *Service) authorise(ctx context.Context, input Input) (playermapping.PlayerMapping, Result, bool) {
+	status, err := service.membership.Resolve(ctx, input.GameID, input.UserID)
+	if err != nil {
+		return playermapping.PlayerMapping{}, service.recordFailure(ctx, input,
+			ErrorCodeServiceUnavailable, fmt.Sprintf("resolve membership: %s", err.Error()), nil), false
+	}
+	if status != membershipStatusActive {
+		message := fmt.Sprintf("membership status %q does not authorise reports", status)
+		if status == "" {
+			message = "user is not a member of the game"
+		}
+		return playermapping.PlayerMapping{}, service.recordFailure(ctx, input,
+			ErrorCodeForbidden, message, nil), false
+	}
+
+	mapping, err := service.playerMappings.Get(ctx, input.GameID, input.UserID)
+	switch {
+	case err == nil:
+		return mapping, Result{}, true
+	case errors.Is(err, playermapping.ErrNotFound):
+		return playermapping.PlayerMapping{}, service.recordFailure(ctx, input,
+			ErrorCodeForbidden, "player mapping not installed for active member", nil), false
+	default:
+		return playermapping.PlayerMapping{}, service.recordFailure(ctx, input,
+			ErrorCodeServiceUnavailable, fmt.Sprintf("get player mapping: %s", err.Error()), nil), false
+	}
+}
+
+// callEngine forwards the read to the engine and records the wall-clock
+// latency under the `report` op label.
+func (service *Service) callEngine(ctx context.Context, baseURL, raceName string, turn int) (json.RawMessage, error) {
+	start := service.clock()
+	body, err := service.engine.GetReport(ctx, baseURL, raceName, turn)
+	service.telemetry.RecordEngineCall(ctx, engineCallOp, service.clock().Sub(start))
+	return body, err
+}
+
+// classifyEngineError maps the engine port sentinels to the report-get
+// stable error codes.
+func classifyEngineError(err error) string {
+	switch {
+	case errors.Is(err, ports.ErrEngineValidation):
+		return ErrorCodeEngineValidationError
+	case errors.Is(err, ports.ErrEngineProtocolViolation):
+		return ErrorCodeEngineProtocolViolation
+	case errors.Is(err, ports.ErrEngineUnreachable):
+		return ErrorCodeEngineUnreachable
+	default:
+		return ErrorCodeEngineUnreachable
+	}
+}
+
+// recordFailure emits the service-level outcome counter and a structured
+// log entry, then returns the Result the caller surfaces.
+func (service *Service) recordFailure(ctx context.Context, input Input, errorCode, errorMessage string, rawResponse json.RawMessage) Result {
+	service.telemetry.RecordReportGetOutcome(ctx,
+		string(operation.OutcomeFailure), errorCode)
+	logArgs := []any{
+		"game_id", input.GameID,
+		"user_id", input.UserID,
+		"turn", input.Turn,
+		"error_code", errorCode,
+		"error_message", errorMessage,
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.WarnContext(ctx, "report get rejected", logArgs...)
+	return Result{
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    errorCode,
+		ErrorMessage: errorMessage,
+		RawResponse:  rawResponse,
+	}
+}
diff --git a/gamemaster/internal/service/reportget/service_test.go b/gamemaster/internal/service/reportget/service_test.go
new file mode 100644
index 0000000..01bf9e2
--- /dev/null
+++ b/gamemaster/internal/service/reportget/service_test.go
@@ -0,0 +1,533 @@
+package reportget_test
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"sync"
+	"testing"
+	"time"
+
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/domain/playermapping"
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/service/membership"
+	"galaxy/gamemaster/internal/service/reportget"
+	"galaxy/gamemaster/internal/telemetry"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+// --- fakes ------------------------------------------------------------
+
+type fakeRuntimeRecords struct {
+	mu     sync.Mutex
+	stored map[string]runtime.RuntimeRecord
+	getErr error
+}
+
+func newFakeRuntimeRecords() *fakeRuntimeRecords {
+	return &fakeRuntimeRecords{stored: map[string]runtime.RuntimeRecord{}}
+}
+
+func (s *fakeRuntimeRecords) seed(record runtime.RuntimeRecord) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.stored[record.GameID] = record
+}
+
+func (s *fakeRuntimeRecords) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.getErr != nil {
+		return runtime.RuntimeRecord{}, s.getErr
+	}
+	record, ok := s.stored[gameID]
+	if !ok {
+		return runtime.RuntimeRecord{}, runtime.ErrNotFound
+	}
+	return record, nil
+}
+
+func (s *fakeRuntimeRecords) Insert(context.Context, runtime.RuntimeRecord) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateStatus(context.Context, ports.UpdateStatusInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateScheduling(context.Context, ports.UpdateSchedulingInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) ListDueRunning(context.Context, time.Time) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakeRuntimeRecords) ListByStatus(context.Context, runtime.Status) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakeRuntimeRecords) List(context.Context) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateImage(context.Context, ports.UpdateImageInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) UpdateEngineHealth(context.Context, ports.UpdateEngineHealthInput) error {
+	return errors.New("not used")
+}
+func (s *fakeRuntimeRecords) Delete(context.Context, string) error {
+	return errors.New("not used")
+}
+
+type fakePlayerMappings struct {
+	mu     sync.Mutex
+	stored map[string]map[string]playermapping.PlayerMapping
+	getErr error
+}
+
+func newFakePlayerMappings() *fakePlayerMappings {
+	return &fakePlayerMappings{stored: map[string]map[string]playermapping.PlayerMapping{}}
+}
+
+func (s *fakePlayerMappings) seed(record playermapping.PlayerMapping) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if _, ok := s.stored[record.GameID]; !ok {
+		s.stored[record.GameID] = map[string]playermapping.PlayerMapping{}
+	}
+	s.stored[record.GameID][record.UserID] = record
+}
+
+func (s *fakePlayerMappings) Get(_ context.Context, gameID, userID string) (playermapping.PlayerMapping, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.getErr != nil {
+		return playermapping.PlayerMapping{}, s.getErr
+	}
+	record, ok := s.stored[gameID][userID]
+	if !ok {
+		return playermapping.PlayerMapping{}, playermapping.ErrNotFound
+	}
+	return record, nil
+}
+
+func (s *fakePlayerMappings) BulkInsert(context.Context, []playermapping.PlayerMapping) error {
+	return errors.New("not used")
+}
+func (s *fakePlayerMappings) GetByRace(context.Context, string, string) (playermapping.PlayerMapping, error) {
+	return playermapping.PlayerMapping{}, errors.New("not used")
+}
+func (s *fakePlayerMappings) ListByGame(context.Context, string) ([]playermapping.PlayerMapping, error) {
+	return nil, errors.New("not used")
+}
+func (s *fakePlayerMappings) DeleteByGame(context.Context, string) error {
+	return errors.New("not used")
+}
+
+type recordedReport struct {
+	baseURL  string
+	raceName string
+	turn     int
+}
+
+type fakeEngine struct {
+	mu    sync.Mutex
+	body  json.RawMessage
+	err   error
+	calls []recordedReport
+}
+
+func (f *fakeEngine) GetReport(_ context.Context, baseURL, raceName string, turn int) (json.RawMessage, error) {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	f.calls = append(f.calls, recordedReport{baseURL: baseURL, raceName: raceName, turn: turn})
+	return f.body, f.err
+}
+
+func (f *fakeEngine) Init(context.Context, string, ports.InitRequest) (ports.StateResponse, error) {
+	return ports.StateResponse{}, errors.New("not used")
+}
+func (f *fakeEngine) Status(context.Context, string) (ports.StateResponse, error) {
+	return ports.StateResponse{}, errors.New("not used")
+}
+func (f *fakeEngine) Turn(context.Context, string) (ports.StateResponse, error) {
+	return ports.StateResponse{}, errors.New("not used")
+}
+func (f *fakeEngine) BanishRace(context.Context, string, string) error {
+	return errors.New("not used")
+}
+func (f *fakeEngine) ExecuteCommands(context.Context, string, json.RawMessage) (json.RawMessage, error) {
+	return nil, errors.New("not used")
+}
+func (f *fakeEngine) PutOrders(context.Context, string, json.RawMessage) (json.RawMessage, error) {
+	return nil, errors.New("not used")
+}
+
+type fakeLobby struct {
+	mu      sync.Mutex
+	answers map[string][]ports.Membership
+	errs    map[string]error
+}
+
+func newFakeLobby() *fakeLobby {
+	return &fakeLobby{
+		answers: map[string][]ports.Membership{},
+		errs:    map[string]error{},
+	}
+}
+
+func (f *fakeLobby) seed(gameID string, members []ports.Membership) {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	f.answers[gameID] = members
+}
+
+func (f *fakeLobby) seedErr(gameID string, err error) {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	f.errs[gameID] = err
+}
+
+func (f *fakeLobby) GetMemberships(_ context.Context, gameID string) ([]ports.Membership, error) {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	if err, ok := f.errs[gameID]; ok {
+		return nil, err
+	}
+	return append([]ports.Membership(nil), f.answers[gameID]...), nil
+}
+
+func (f *fakeLobby) GetGameSummary(context.Context, string) (ports.GameSummary, error) {
+	return ports.GameSummary{}, errors.New("not used")
+}
+
+// --- harness ----------------------------------------------------------
+
+type harness struct {
+	t        *testing.T
+	now      time.Time
+	runtimes *fakeRuntimeRecords
+	mappings *fakePlayerMappings
+	engine   *fakeEngine
+	lobby    *fakeLobby
+	cache    *membership.Cache
+	service  *reportget.Service
+}
+
+const (
+	testGameID         = "game-001"
+	testUserID         = "user-1"
+	testRaceName       = "Aelinari"
+	testEngineEndpoint = "http://galaxy-game-game-001:8080"
+)
+
+func newHarness(t *testing.T) *harness {
+	t.Helper()
+	tel, err := telemetry.NewWithProviders(nil, nil)
+	require.NoError(t, err)
+	now := time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC)
+
+	h := &harness{
+		t:        t,
+		now:      now,
+		runtimes: newFakeRuntimeRecords(),
+		mappings: newFakePlayerMappings(),
+		engine:   &fakeEngine{},
+		lobby:    newFakeLobby(),
+	}
+
+	cache, err := membership.NewCache(membership.Dependencies{
+		Lobby:     h.lobby,
+		Telemetry: tel,
+		TTL:       time.Minute,
+		MaxGames:  16,
+		Clock:     func() time.Time { return h.now },
+	})
+	require.NoError(t, err)
+	h.cache = cache
+
+	svc, err := reportget.NewService(reportget.Dependencies{
+		RuntimeRecords: h.runtimes,
+		PlayerMappings: h.mappings,
+		Membership:     h.cache,
+		Engine:         h.engine,
+		Telemetry:      tel,
+		Clock:          func() time.Time { return h.now },
+	})
+	require.NoError(t, err)
+	h.service = svc
+	return h
+}
+
+func (h *harness) seedRecordWithStatus(status runtime.Status) {
+	startedAt := h.now.Add(-1 * time.Hour)
+	finishedAt := h.now
+	record := runtime.RuntimeRecord{
+		GameID:               testGameID,
+		Status:               status,
+		EngineEndpoint:       testEngineEndpoint,
+		CurrentImageRef:      "ghcr.io/galaxy/game:v1.2.3",
+		CurrentEngineVersion: "v1.2.3",
+		TurnSchedule:         "0 18 * * *",
+		EngineHealth:         "healthy",
+		CreatedAt:            h.now.Add(-2 * time.Hour),
+		UpdatedAt:            h.now.Add(-2 * time.Hour),
+	}
+	if status != runtime.StatusStarting {
+		record.StartedAt = &startedAt
+	}
+	if status == runtime.StatusStopped {
+		record.StoppedAt = &finishedAt
+	}
+	if status == runtime.StatusFinished {
+		record.FinishedAt = &finishedAt
+	}
+	h.runtimes.seed(record)
+}
+
+func (h *harness) seedActiveMembership() {
+	h.lobby.seed(testGameID, []ports.Membership{{
+		UserID:   testUserID,
+		RaceName: testRaceName,
+		Status:   "active",
+		JoinedAt: h.now.Add(-2 * time.Hour),
+	}})
+}
+
+func (h *harness) seedPlayerMapping() {
+	h.mappings.seed(playermapping.PlayerMapping{
+		GameID:           testGameID,
+		UserID:           testUserID,
+		RaceName:         testRaceName,
+		EnginePlayerUUID: "uuid-1",
+		CreatedAt:        h.now.Add(-2 * time.Hour),
+	})
+}
+
+func (h *harness) input(turn int) reportget.Input {
+	return reportget.Input{GameID: testGameID, UserID: testUserID, Turn: turn}
+}
+
+// --- tests ------------------------------------------------------------
+
+func TestNewServiceRejectsBadDependencies(t *testing.T) {
+	tel, err := telemetry.NewWithProviders(nil, nil)
+	require.NoError(t, err)
+	cache, err := membership.NewCache(membership.Dependencies{
+		Lobby: newFakeLobby(), Telemetry: tel, TTL: time.Minute, MaxGames: 1,
+	})
+	require.NoError(t, err)
+
+	cases := []struct {
+		name string
+		deps reportget.Dependencies
+	}{
+		{"nil runtime records", reportget.Dependencies{PlayerMappings: newFakePlayerMappings(), Membership: cache, Engine: &fakeEngine{}, Telemetry: tel}},
+		{"nil player mappings", reportget.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), Membership: cache, Engine: &fakeEngine{}, Telemetry: tel}},
+		{"nil membership", reportget.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), PlayerMappings: newFakePlayerMappings(), Engine: &fakeEngine{}, Telemetry: tel}},
+		{"nil engine", reportget.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), PlayerMappings: newFakePlayerMappings(), Membership: cache, Telemetry: tel}},
+		{"nil telemetry", reportget.Dependencies{RuntimeRecords: newFakeRuntimeRecords(), PlayerMappings: newFakePlayerMappings(), Membership: cache, Engine: &fakeEngine{}}},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			svc, err := reportget.NewService(tc.deps)
+			require.Error(t, err)
+			assert.Nil(t, svc)
+		})
+	}
+}
+
+func TestHandleHappyPath(t *testing.T) {
+	h := newHarness(t)
+	h.seedRecordWithStatus(runtime.StatusRunning)
+	h.seedActiveMembership()
+	h.seedPlayerMapping()
+	h.engine.body = json.RawMessage(`{"version":1,"turn":3,"player":[]}`)
+
+	result, err := h.service.Handle(context.Background(), h.input(3))
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeSuccess, result.Outcome)
+	assert.JSONEq(t, string(h.engine.body), string(result.RawResponse))
+
+	require.Len(t, h.engine.calls, 1)
+	assert.Equal(t, testEngineEndpoint, h.engine.calls[0].baseURL)
+	assert.Equal(t, testRaceName, h.engine.calls[0].raceName)
+	assert.Equal(t, 3, h.engine.calls[0].turn)
+}
+
+func TestHandleAcceptsAnyNonNotFoundStatus(t *testing.T) {
+	for _, status := range []runtime.Status{
+		runtime.StatusStarting,
+		runtime.StatusRunning,
+		runtime.StatusGenerationInProgress,
+		runtime.StatusGenerationFailed,
+		runtime.StatusStopped,
+		runtime.StatusEngineUnreachable,
+		runtime.StatusFinished,
+	} {
+		t.Run(string(status), func(t *testing.T) {
+			h := newHarness(t)
+			h.seedRecordWithStatus(status)
+			h.seedActiveMembership()
+			h.seedPlayerMapping()
+			h.engine.body = json.RawMessage(`{"version":1,"turn":0,"player":[]}`)
+
+			result, err := h.service.Handle(context.Background(), h.input(0))
+			require.NoError(t, err)
+			assert.Equal(t, operation.OutcomeSuccess, result.Outcome,
+				"reports must be served regardless of status; got %s", result.ErrorCode)
+		})
+	}
+}
+
+func TestHandleInvalidRequest(t *testing.T) {
+	cases := []struct {
+		name    string
+		input   reportget.Input
+		message string
+	}{
+		{"empty game id", reportget.Input{UserID: testUserID, Turn: 0}, "game id"},
+		{"empty user id", reportget.Input{GameID: testGameID, Turn: 0}, "user id"},
+		{"negative turn", reportget.Input{GameID: testGameID, UserID: testUserID, Turn: -1}, "turn"},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			h := newHarness(t)
+			result, err := h.service.Handle(context.Background(), tc.input)
+			require.NoError(t, err)
+			assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+			assert.Equal(t, reportget.ErrorCodeInvalidRequest, result.ErrorCode)
+			assert.Contains(t, result.ErrorMessage, tc.message)
+		})
+	}
+}
+
+func TestHandleRuntimeNotFound(t *testing.T) {
+	h := newHarness(t)
+	result, err := h.service.Handle(context.Background(), h.input(0))
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, reportget.ErrorCodeRuntimeNotFound, result.ErrorCode)
+}
+
+func TestHandleRuntimeStoreError(t *testing.T) {
+	h := newHarness(t)
+	h.runtimes.getErr = errors.New("postgres down")
+	result, err := h.service.Handle(context.Background(), h.input(0))
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, reportget.ErrorCodeServiceUnavailable, result.ErrorCode)
+}
+
+func TestHandleForbiddenInactiveMembership(t *testing.T) {
+	cases := []struct {
+		name    string
+		members []ports.Membership
+	}{
+		{"removed", []ports.Membership{{UserID: testUserID, RaceName: testRaceName, Status: "removed"}}},
+		{"blocked", []ports.Membership{{UserID: testUserID, RaceName: testRaceName, Status: "blocked"}}},
+		{"unknown user", []ports.Membership{{UserID: "ghost", RaceName: "Ghost",
Status: "active"}}}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + h := newHarness(t) + h.seedRecordWithStatus(runtime.StatusRunning) + h.seedPlayerMapping() + h.lobby.seed(testGameID, tc.members) + + result, err := h.service.Handle(context.Background(), h.input(0)) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, reportget.ErrorCodeForbidden, result.ErrorCode) + assert.Empty(t, h.engine.calls) + }) + } +} + +func TestHandleForbiddenMissingPlayerMapping(t *testing.T) { + h := newHarness(t) + h.seedRecordWithStatus(runtime.StatusRunning) + h.seedActiveMembership() + result, err := h.service.Handle(context.Background(), h.input(0)) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, reportget.ErrorCodeForbidden, result.ErrorCode) + assert.Empty(t, h.engine.calls) +} + +func TestHandleServiceUnavailableLobbyDown(t *testing.T) { + h := newHarness(t) + h.seedRecordWithStatus(runtime.StatusRunning) + h.seedPlayerMapping() + h.lobby.seedErr(testGameID, fmt.Errorf("dial: %w", ports.ErrLobbyUnavailable)) + result, err := h.service.Handle(context.Background(), h.input(0)) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, reportget.ErrorCodeServiceUnavailable, result.ErrorCode) +} + +func TestHandleServiceUnavailablePlayerMappingsError(t *testing.T) { + h := newHarness(t) + h.seedRecordWithStatus(runtime.StatusRunning) + h.seedActiveMembership() + h.mappings.getErr = errors.New("postgres down") + result, err := h.service.Handle(context.Background(), h.input(0)) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, reportget.ErrorCodeServiceUnavailable, result.ErrorCode) +} + +func TestHandleEngineUnreachable(t *testing.T) { + h := newHarness(t) + h.seedRecordWithStatus(runtime.StatusRunning) + h.seedActiveMembership() + h.seedPlayerMapping() 
+ h.engine.err = fmt.Errorf("dial: %w", ports.ErrEngineUnreachable) + + result, err := h.service.Handle(context.Background(), h.input(0)) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, reportget.ErrorCodeEngineUnreachable, result.ErrorCode) +} + +func TestHandleEngineValidationErrorForwardsBody(t *testing.T) { + h := newHarness(t) + h.seedRecordWithStatus(runtime.StatusRunning) + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.body = json.RawMessage(`{"error":"unknown turn"}`) + h.engine.err = fmt.Errorf("400: %w", ports.ErrEngineValidation) + + result, err := h.service.Handle(context.Background(), h.input(99)) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, reportget.ErrorCodeEngineValidationError, result.ErrorCode) + assert.JSONEq(t, string(h.engine.body), string(result.RawResponse)) +} + +func TestHandleEngineProtocolViolation(t *testing.T) { + h := newHarness(t) + h.seedRecordWithStatus(runtime.StatusRunning) + h.seedActiveMembership() + h.seedPlayerMapping() + h.engine.err = fmt.Errorf("garbled: %w", ports.ErrEngineProtocolViolation) + + result, err := h.service.Handle(context.Background(), h.input(0)) + require.NoError(t, err) + assert.Equal(t, operation.OutcomeFailure, result.Outcome) + assert.Equal(t, reportget.ErrorCodeEngineProtocolViolation, result.ErrorCode) +} + +func TestHandleNilContext(t *testing.T) { + h := newHarness(t) + var nilCtx context.Context + _, err := h.service.Handle(nilCtx, h.input(0)) + require.Error(t, err) +} + +func TestHandleNilReceiver(t *testing.T) { + var svc *reportget.Service + _, err := svc.Handle(context.Background(), reportget.Input{}) + require.Error(t, err) +} diff --git a/gamemaster/internal/service/scheduler/service.go b/gamemaster/internal/service/scheduler/service.go new file mode 100644 index 0000000..9bf8eb5 --- /dev/null +++ b/gamemaster/internal/service/scheduler/service.go @@ -0,0 +1,59 @@ 
+// Package scheduler exposes the next-tick computation Game Master uses +// to advance `runtime_records.next_generation_at` after a successful +// turn generation. It is a thin, stateless wrapper over +// `domain/schedule.Schedule.Next` with the force-next-turn skip rule +// baked in via the `skipNextTick` parameter. +// +// Two callers consume the wrapper today: +// +// - `service/turngeneration` recomputes the next tick after a +// successful (non-finished) generation; +// - `service/adminforce` (Stage 17) reuses the same instance so the +// skip rule lives in exactly one place. +// +// The package depends only on `domain/schedule` and stdlib `time`. It +// holds no clock and no logger; callers pass `after` explicitly. +package scheduler + +import ( + "errors" + "strings" + "time" + + "galaxy/gamemaster/internal/domain/schedule" +) + +// Service computes the next scheduler-driven turn-generation tick. +type Service struct{} + +// New constructs a stateless Service value. Returning a pointer keeps +// the construction shape consistent with the other GM services even +// though Service has no dependencies. +func New() *Service { + return &Service{} +} + +// ComputeNext parses turnSchedule and returns the next firing time +// strictly after `after`, applying the force-next-turn skip rule when +// skipNextTick is true. +// +// When skipNextTick is true the wrapper computes the immediate next +// cron step and then advances by one further step, so the inter-turn +// spacing is never shorter than one schedule interval. The returned +// `skipConsumed` flag reports whether the wrapper consumed the skip +// (true when skipNextTick was true). +// +// On parse error ComputeNext returns the zero time, false, and the +// error wrapped from `schedule.Parse`. The caller is responsible for +// mapping it to the orchestrator-level `invalid_request` code. 
+func (service *Service) ComputeNext(turnSchedule string, after time.Time, skipNextTick bool) (time.Time, bool, error) { + if service == nil { + return time.Time{}, false, errors.New("scheduler compute next: nil service") + } + parsed, err := schedule.Parse(strings.TrimSpace(turnSchedule)) + if err != nil { + return time.Time{}, false, err + } + next, skipConsumed := parsed.Next(after, skipNextTick) + return next, skipConsumed, nil +} diff --git a/gamemaster/internal/service/scheduler/service_test.go b/gamemaster/internal/service/scheduler/service_test.go new file mode 100644 index 0000000..0054c0a --- /dev/null +++ b/gamemaster/internal/service/scheduler/service_test.go @@ -0,0 +1,63 @@ +package scheduler_test + +import ( + "testing" + "time" + + "galaxy/gamemaster/internal/service/scheduler" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestComputeNextHappyPathWithoutSkip(t *testing.T) { + service := scheduler.New() + after := time.Date(2026, time.April, 30, 12, 0, 0, 0, time.UTC) + next, skipConsumed, err := service.ComputeNext("0 18 * * *", after, false) + require.NoError(t, err) + assert.False(t, skipConsumed) + expected := time.Date(2026, time.April, 30, 18, 0, 0, 0, time.UTC) + assert.Equal(t, expected, next) + assert.Equal(t, time.UTC, next.Location()) +} + +func TestComputeNextConsumesSkip(t *testing.T) { + service := scheduler.New() + after := time.Date(2026, time.April, 30, 12, 0, 0, 0, time.UTC) + next, skipConsumed, err := service.ComputeNext("0 18 * * *", after, true) + require.NoError(t, err) + assert.True(t, skipConsumed) + expected := time.Date(2026, time.May, 1, 18, 0, 0, 0, time.UTC) + assert.Equal(t, expected, next) +} + +func TestComputeNextEveryQuarterHourSkip(t *testing.T) { + service := scheduler.New() + after := time.Date(2026, time.April, 30, 12, 0, 0, 0, time.UTC) + first, _, err := service.ComputeNext("*/15 * * * *", after, false) + require.NoError(t, err) + skipped, _, err := 
service.ComputeNext("*/15 * * * *", after, true) + require.NoError(t, err) + assert.Equal(t, first.Add(15*time.Minute), skipped, "skip advances by exactly one cron step") +} + +func TestComputeNextRejectsInvalidCron(t *testing.T) { + service := scheduler.New() + _, _, err := service.ComputeNext("not-a-cron", time.Now().UTC(), false) + require.Error(t, err) +} + +func TestComputeNextTrimsWhitespace(t *testing.T) { + service := scheduler.New() + after := time.Date(2026, time.April, 30, 12, 0, 0, 0, time.UTC) + next, _, err := service.ComputeNext(" 0 18 * * * ", after, false) + require.NoError(t, err) + expected := time.Date(2026, time.April, 30, 18, 0, 0, 0, time.UTC) + assert.Equal(t, expected, next) +} + +func TestNilServiceRejected(t *testing.T) { + var service *scheduler.Service + _, _, err := service.ComputeNext("0 18 * * *", time.Now().UTC(), false) + require.Error(t, err) +} diff --git a/gamemaster/internal/service/turngeneration/errors.go b/gamemaster/internal/service/turngeneration/errors.go new file mode 100644 index 0000000..a2df180 --- /dev/null +++ b/gamemaster/internal/service/turngeneration/errors.go @@ -0,0 +1,56 @@ +package turngeneration + +// Stable error codes returned in `Result.ErrorCode`. The values match +// the vocabulary frozen by `gamemaster/README.md §Error Model` and +// `gamemaster/api/internal-openapi.yaml`. Stages 17 and 19 import these +// names rather than redeclare them; renaming any of them is a contract +// change. +const ( + // ErrorCodeInvalidRequest reports that the input envelope failed + // structural validation (empty game id, unsupported trigger, + // unsupported op_source) or that the runtime record's stored + // `turn_schedule` could not be parsed at recompute time. + ErrorCodeInvalidRequest = "invalid_request" + + // ErrorCodeRuntimeNotFound reports that no `runtime_records` row + // exists for the requested game id. The orchestrator does no other + // work and never publishes events. 
+ ErrorCodeRuntimeNotFound = "runtime_not_found" + + // ErrorCodeRuntimeNotRunning reports that the runtime exists but + // its current status is not `running`. The orchestrator returns + // without calling the engine. + ErrorCodeRuntimeNotRunning = "runtime_not_running" + + // ErrorCodeConflict reports that a CAS guard failed mid-flow + // because the runtime row changed concurrently (typical cause: + // admin issued a stop while a generation was in progress). + ErrorCodeConflict = "conflict" + + // ErrorCodeEngineUnreachable reports that the engine /admin/turn + // call returned a 5xx status, timed out, or could not be + // dispatched. The runtime row is moved to `generation_failed` and a + // snapshot plus admin notification are published before the code + // reaches the caller. + ErrorCodeEngineUnreachable = "engine_unreachable" + + // ErrorCodeEngineValidationError reports that the engine + // /admin/turn call returned a 4xx status. Distinguished from + // `engine_unreachable` so operators can tell "engine is alive but + // rejected the request shape" from "engine is unreachable". + ErrorCodeEngineValidationError = "engine_validation_error" + + // ErrorCodeEngineProtocolViolation reports that the engine response + // did not match the expected schema or did not match the runtime's + // installed roster (player count mismatch, race-name set mismatch, + // missing required fields). + ErrorCodeEngineProtocolViolation = "engine_protocol_violation" + + // ErrorCodeServiceUnavailable reports that a steady-state + // dependency (PostgreSQL, Redis) was unreachable for this call. + ErrorCodeServiceUnavailable = "service_unavailable" + + // ErrorCodeInternal reports an unexpected error not classified by + // the other codes. 
+ ErrorCodeInternal = "internal_error" +) diff --git a/gamemaster/internal/service/turngeneration/service.go b/gamemaster/internal/service/turngeneration/service.go new file mode 100644 index 0000000..4e271e6 --- /dev/null +++ b/gamemaster/internal/service/turngeneration/service.go @@ -0,0 +1,971 @@ +// Package turngeneration implements the turn-generation orchestrator +// owned by Game Master. It is the single entry point through which the +// scheduler ticker (Stage 15 worker) and the admin force-next-turn flow +// (Stage 17) drive a turn through the engine container. +// +// Lifecycle and failure-mode semantics follow `gamemaster/README.md +// §Lifecycles → Turn generation` and §Force-next-turn. Design rationale +// is captured in +// `gamemaster/docs/stage15-scheduler-and-turn-generation.md`. +package turngeneration + +import ( + "context" + "errors" + "fmt" + "log/slog" + "sort" + "strings" + "time" + + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/domain/playermapping" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/logging" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/service/scheduler" + "galaxy/gamemaster/internal/telemetry" + "galaxy/notificationintent" +) + +// Trigger classifies the caller of one turn-generation operation. The +// value flows into telemetry and structured logs only — it does not +// branch the orchestrator's persistence path. The skip-tick mechanic is +// driven exclusively by the runtime record's `skip_next_tick` column. +type Trigger string + +const ( + // TriggerScheduler labels turn generations dispatched by the + // `schedulerticker` worker. + TriggerScheduler Trigger = "scheduler" + + // TriggerForce labels turn generations dispatched by the admin + // force-next-turn flow (Stage 17 `service/adminforce`). + TriggerForce Trigger = "force" +) + +// IsKnown reports whether trigger belongs to the frozen trigger +// vocabulary. 
+func (trigger Trigger) IsKnown() bool { + switch trigger { + case TriggerScheduler, TriggerForce: + return true + default: + return false + } +} + +// Input stores the per-call arguments for one turn-generation +// operation. +type Input struct { + // GameID identifies the runtime to drive. + GameID string + + // Trigger classifies the caller. Used for telemetry and logs only. + Trigger Trigger + + // OpSource classifies how the request entered Game Master. Used to + // stamp `operation_log.op_source`. Defaults to `admin_rest` when + // missing or unrecognised. + OpSource operation.OpSource + + // SourceRef stores the optional opaque per-source reference (REST + // request id, scheduler tick id). Empty when the caller does not + // provide one. + SourceRef string +} + +// Validate reports whether input carries the structural invariants the +// service requires before any store is touched. +func (input Input) Validate() error { + if strings.TrimSpace(input.GameID) == "" { + return fmt.Errorf("game id must not be empty") + } + if !input.Trigger.IsKnown() { + return fmt.Errorf("trigger %q is unsupported", input.Trigger) + } + if !input.OpSource.IsKnown() { + return fmt.Errorf("op source %q is unsupported", input.OpSource) + } + return nil +} + +// Result stores the deterministic outcome of one Handle call. +type Result struct { + // Record carries the post-mutation runtime record. Populated on + // every success outcome and on `engine_*` failures (where the row + // was moved to `generation_failed`); zero on early-rejection + // outcomes (`invalid_request`, `runtime_not_found`, + // `runtime_not_running`, `conflict` on initial CAS, + // `service_unavailable` on initial Get). + Record runtime.RuntimeRecord + + // Trigger echoes back Input.Trigger for log/telemetry consumers. + Trigger Trigger + + // Finished is true when the engine reported `finished=true` on this + // turn and the runtime transitioned to `finished`. 
+ Finished bool + + // Outcome reports whether the operation completed (success) or + // produced a stable failure code. + Outcome operation.Outcome + + // ErrorCode stores the stable error code on failure. Empty on + // success. + ErrorCode string + + // ErrorMessage stores the operator-readable detail on failure. + // Empty on success. + ErrorMessage string +} + +// IsSuccess reports whether the result represents a successful +// operation. +func (result Result) IsSuccess() bool { + return result.Outcome == operation.OutcomeSuccess +} + +// Dependencies groups the collaborators required by Service. +type Dependencies struct { + // RuntimeRecords drives every CAS and scheduling persistence step. + RuntimeRecords ports.RuntimeRecordStore + + // PlayerMappings supplies the per-game roster used to project + // engine player state to user-facing notification recipients and + // `player_turn_stats`. + PlayerMappings ports.PlayerMappingStore + + // OperationLogs records the audit entry for the operation. + OperationLogs ports.OperationLogStore + + // Engine drives the engine /admin/turn call. + Engine ports.EngineClient + + // LobbyEvents publishes `runtime_snapshot_update` and + // `game_finished` to `gm:lobby_events`. + LobbyEvents ports.LobbyEventsPublisher + + // Notifications publishes `game.turn.ready`, `game.finished`, and + // `game.generation_failed` intents to `notification:intents`. + Notifications ports.NotificationIntentPublisher + + // Lobby resolves the human-readable `game_name` consumed by + // notification payloads. Failure is fail-soft: the orchestrator + // falls back to `game_id`. + Lobby ports.LobbyClient + + // Scheduler computes the post-success `next_generation_at` value. + Scheduler *scheduler.Service + + // Telemetry records the turn-generation outcome counter, lobby + // publication counter, and notification publish-attempt counter. + Telemetry *telemetry.Runtime + + // Logger records structured service-level events. 
Defaults to + // `slog.Default()` when nil. + Logger *slog.Logger + + // Clock supplies the wall-clock used for operation timestamps. + // Defaults to `time.Now` when nil. + Clock func() time.Time +} + +// Service executes the turn-generation lifecycle operation. +type Service struct { + runtimeRecords ports.RuntimeRecordStore + playerMappings ports.PlayerMappingStore + operationLogs ports.OperationLogStore + engine ports.EngineClient + lobbyEvents ports.LobbyEventsPublisher + notifications ports.NotificationIntentPublisher + lobby ports.LobbyClient + scheduler *scheduler.Service + + telemetry *telemetry.Runtime + logger *slog.Logger + clock func() time.Time +} + +// NewService constructs one Service from deps. +func NewService(deps Dependencies) (*Service, error) { + switch { + case deps.RuntimeRecords == nil: + return nil, errors.New("new turn generation service: nil runtime records") + case deps.PlayerMappings == nil: + return nil, errors.New("new turn generation service: nil player mappings") + case deps.OperationLogs == nil: + return nil, errors.New("new turn generation service: nil operation logs") + case deps.Engine == nil: + return nil, errors.New("new turn generation service: nil engine client") + case deps.LobbyEvents == nil: + return nil, errors.New("new turn generation service: nil lobby events publisher") + case deps.Notifications == nil: + return nil, errors.New("new turn generation service: nil notification publisher") + case deps.Lobby == nil: + return nil, errors.New("new turn generation service: nil lobby client") + case deps.Scheduler == nil: + return nil, errors.New("new turn generation service: nil scheduler") + case deps.Telemetry == nil: + return nil, errors.New("new turn generation service: nil telemetry runtime") + } + + clock := deps.Clock + if clock == nil { + clock = time.Now + } + logger := deps.Logger + if logger == nil { + logger = slog.Default() + } + logger = logger.With("service", "gamemaster.turngeneration") + + return &Service{ + 
runtimeRecords: deps.RuntimeRecords, + playerMappings: deps.PlayerMappings, + operationLogs: deps.OperationLogs, + engine: deps.Engine, + lobbyEvents: deps.LobbyEvents, + notifications: deps.Notifications, + lobby: deps.Lobby, + scheduler: deps.Scheduler, + telemetry: deps.Telemetry, + logger: logger, + clock: clock, + }, nil +} + +// Handle executes one turn-generation operation end-to-end. The +// Go-level error return is reserved for non-business failures (nil +// context, nil receiver). Every business outcome flows through Result. +func (service *Service) Handle(ctx context.Context, input Input) (Result, error) { + if service == nil { + return Result{}, errors.New("turn generation: nil service") + } + if ctx == nil { + return Result{}, errors.New("turn generation: nil context") + } + + opStartedAt := service.clock().UTC() + + if err := input.Validate(); err != nil { + return service.recordEarlyFailure(ctx, opStartedAt, input, + ErrorCodeInvalidRequest, err.Error()), nil + } + + record, outcome, ok := service.loadRecord(ctx, opStartedAt, input) + if !ok { + return outcome, nil + } + + if record.Status != runtime.StatusRunning { + return service.recordEarlyFailure(ctx, opStartedAt, input, + ErrorCodeRuntimeNotRunning, + fmt.Sprintf("runtime status is %q, expected %q", + record.Status, runtime.StatusRunning)), nil + } + + if outcome, ok := service.casToInProgress(ctx, opStartedAt, input); !ok { + return outcome, nil + } + + state, engineOK, engineCode, engineMsg := service.callEngineTurn(ctx, record) + mappings, listErr := service.playerMappings.ListByGame(ctx, input.GameID) + if listErr != nil { + // Without mappings we cannot project player_turn_stats; treat + // as a service_unavailable failure but still try to roll the + // runtime to generation_failed because the engine call may + // have already mutated state. 
+ return service.failGeneration(ctx, opStartedAt, input, record, + ErrorCodeServiceUnavailable, + fmt.Sprintf("list player mappings: %s", listErr.Error())), nil + } + + if !engineOK { + return service.failGeneration(ctx, opStartedAt, input, record, + engineCode, engineMsg), nil + } + + if outcome, ok := service.validateRoster(ctx, opStartedAt, input, record, state, mappings); !ok { + return outcome, nil + } + + if state.Finished { + return service.completeFinished(ctx, opStartedAt, input, record, state, mappings), nil + } + return service.completeRunning(ctx, opStartedAt, input, record, state, mappings), nil +} + +// loadRecord reads the runtime record and maps store errors to +// orchestrator outcomes. ok=false means the flow stops with the +// returned Result. +func (service *Service) loadRecord(ctx context.Context, opStartedAt time.Time, input Input) (runtime.RuntimeRecord, Result, bool) { + record, err := service.runtimeRecords.Get(ctx, input.GameID) + switch { + case err == nil: + return record, Result{}, true + case errors.Is(err, runtime.ErrNotFound): + return runtime.RuntimeRecord{}, service.recordEarlyFailure(ctx, opStartedAt, input, + ErrorCodeRuntimeNotFound, "runtime record does not exist"), false + default: + return runtime.RuntimeRecord{}, service.recordEarlyFailure(ctx, opStartedAt, input, + ErrorCodeServiceUnavailable, fmt.Sprintf("get runtime record: %s", err.Error())), false + } +} + +// casToInProgress flips the runtime row from `running` to +// `generation_in_progress`. ok=false means the flow stops with the +// returned Result; the caller has not touched the engine yet. 
+func (service *Service) casToInProgress(ctx context.Context, opStartedAt time.Time, input Input) (Result, bool) { + err := service.runtimeRecords.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: input.GameID, + ExpectedFrom: runtime.StatusRunning, + To: runtime.StatusGenerationInProgress, + Now: opStartedAt, + }) + switch { + case err == nil: + return Result{}, true + case errors.Is(err, runtime.ErrConflict): + return service.recordEarlyFailure(ctx, opStartedAt, input, + ErrorCodeConflict, + fmt.Sprintf("cas runtime status to generation_in_progress: %s", err.Error())), false + case errors.Is(err, runtime.ErrNotFound): + return service.recordEarlyFailure(ctx, opStartedAt, input, + ErrorCodeRuntimeNotFound, + fmt.Sprintf("cas runtime status to generation_in_progress: %s", err.Error())), false + default: + return service.recordEarlyFailure(ctx, opStartedAt, input, + ErrorCodeServiceUnavailable, + fmt.Sprintf("cas runtime status to generation_in_progress: %s", err.Error())), false + } +} + +// callEngineTurn dispatches the engine /admin/turn call and classifies +// the outcome. engineOK=true means the response is well-formed at the +// transport level; engineOK=false populates errorCode / errorMessage +// with a stable failure shape. +func (service *Service) callEngineTurn(ctx context.Context, record runtime.RuntimeRecord) (state ports.StateResponse, engineOK bool, errorCode string, errorMessage string) { + state, err := service.engine.Turn(ctx, record.EngineEndpoint) + if err == nil { + return state, true, "", "" + } + return ports.StateResponse{}, false, classifyEngineError(err), fmt.Sprintf("engine turn: %s", err.Error()) +} + +// classifyEngineError maps the engine port sentinels to the +// turn-generation stable error codes. 
+func classifyEngineError(err error) string { + switch { + case errors.Is(err, ports.ErrEngineValidation): + return ErrorCodeEngineValidationError + case errors.Is(err, ports.ErrEngineProtocolViolation): + return ErrorCodeEngineProtocolViolation + case errors.Is(err, ports.ErrEngineUnreachable): + return ErrorCodeEngineUnreachable + default: + return ErrorCodeEngineUnreachable + } +} + +// validateRoster checks that the engine response carries exactly the +// race set installed at register-runtime. ok=false means the flow stops +// (and the runtime row is moved to `generation_failed`). +func (service *Service) validateRoster(ctx context.Context, opStartedAt time.Time, input Input, record runtime.RuntimeRecord, state ports.StateResponse, mappings []playermapping.PlayerMapping) (Result, bool) { + if len(state.Players) != len(mappings) { + message := fmt.Sprintf("engine player count %d does not match roster size %d", + len(state.Players), len(mappings)) + return service.failGeneration(ctx, opStartedAt, input, record, + ErrorCodeEngineProtocolViolation, message), false + } + expected := make(map[string]struct{}, len(mappings)) + for _, mapping := range mappings { + expected[mapping.RaceName] = struct{}{} + } + for _, player := range state.Players { + if _, ok := expected[player.RaceName]; !ok { + message := fmt.Sprintf("engine returned race %q not present in roster", player.RaceName) + return service.failGeneration(ctx, opStartedAt, input, record, + ErrorCodeEngineProtocolViolation, message), false + } + } + return Result{}, true +} + +// completeFinished handles the `finished=true` branch: CAS to finished, +// clear scheduling, publish game_finished, publish game.finished +// notification, audit success. 
+func (service *Service) completeFinished(ctx context.Context, opStartedAt time.Time, input Input, record runtime.RuntimeRecord, state ports.StateResponse, mappings []playermapping.PlayerMapping) Result {
+	finishedAt := service.clock().UTC()
+
+	err := service.runtimeRecords.UpdateStatus(ctx, ports.UpdateStatusInput{
+		GameID:       input.GameID,
+		ExpectedFrom: runtime.StatusGenerationInProgress,
+		To:           runtime.StatusFinished,
+		Now:          finishedAt,
+	})
+	if err != nil {
+		return service.handlePostEngineCASFailure(ctx, opStartedAt, input, record, err)
+	}
+
+	if err := service.runtimeRecords.UpdateScheduling(ctx, ports.UpdateSchedulingInput{
+		GameID:           input.GameID,
+		NextGenerationAt: nil,
+		SkipNextTick:     false,
+		CurrentTurn:      state.Turn,
+		Now:              finishedAt,
+	}); err != nil {
+		// The CAS to finished succeeded; the row is in the terminal
+		// state. Surface a service_unavailable to the caller but keep
+		// the audit trail consistent.
+		return service.recordTerminalFailure(ctx, opStartedAt, input,
+			ErrorCodeServiceUnavailable,
+			fmt.Sprintf("update scheduling on finish: %s", err.Error()))
+	}
+
+	persisted, reloadErr := service.runtimeRecords.Get(ctx, input.GameID)
+	if reloadErr != nil {
+		return service.recordTerminalFailure(ctx, opStartedAt, input,
+			ErrorCodeServiceUnavailable,
+			fmt.Sprintf("reload runtime record: %s", reloadErr.Error()))
+	}
+
+	stats := projectPlayerStats(state, mappings)
+
+	finishedMsg := ports.GameFinished{
+		GameID:          input.GameID,
+		FinalTurnNumber: state.Turn,
+		RuntimeStatus:   runtime.StatusFinished,
+		PlayerTurnStats: stats,
+		FinishedAt:      finishedAt,
+	}
+	if err := service.lobbyEvents.PublishGameFinished(ctx, finishedMsg); err != nil {
+		service.logger.ErrorContext(ctx, "publish game finished",
+			"game_id", input.GameID,
+			"err", err.Error(),
+		)
+	} else {
+		service.telemetry.RecordLobbyEventPublished(ctx, "game_finished")
+	}
+
+	gameName := service.resolveGameName(ctx, input.GameID)
+	recipients := recipientUserIDs(mappings)
+	service.publishGameFinishedIntent(ctx, input, gameName, state.Turn, recipients, finishedAt)
+
+	service.appendSuccessLog(ctx, opStartedAt, input)
+	service.telemetry.RecordTurnGenerationOutcome(ctx,
+		string(operation.OutcomeSuccess), "", string(input.Trigger))
+
+	logArgs := []any{
+		"game_id", input.GameID,
+		"trigger", string(input.Trigger),
+		"final_turn", state.Turn,
+		"finished", true,
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.InfoContext(ctx, "turn generation finished game", logArgs...)
+
+	return Result{
+		Record:   persisted,
+		Trigger:  input.Trigger,
+		Finished: true,
+		Outcome:  operation.OutcomeSuccess,
+	}
+}
+
+// completeRunning handles the `finished=false` branch: recompute next
+// tick, CAS back to running, publish snapshot, publish
+// game.turn.ready notification, audit success.
+func (service *Service) completeRunning(ctx context.Context, opStartedAt time.Time, input Input, record runtime.RuntimeRecord, state ports.StateResponse, mappings []playermapping.PlayerMapping) Result {
+	completedAt := service.clock().UTC()
+
+	next, _, err := service.scheduler.ComputeNext(record.TurnSchedule, completedAt, record.SkipNextTick)
+	if err != nil {
+		return service.failGeneration(ctx, opStartedAt, input, record,
+			ErrorCodeInvalidRequest,
+			fmt.Sprintf("recompute next tick: %s", err.Error()))
+	}
+
+	if err := service.runtimeRecords.UpdateStatus(ctx, ports.UpdateStatusInput{
+		GameID:       input.GameID,
+		ExpectedFrom: runtime.StatusGenerationInProgress,
+		To:           runtime.StatusRunning,
+		Now:          completedAt,
+	}); err != nil {
+		return service.handlePostEngineCASFailure(ctx, opStartedAt, input, record, err)
+	}
+
+	if err := service.runtimeRecords.UpdateScheduling(ctx, ports.UpdateSchedulingInput{
+		GameID:           input.GameID,
+		NextGenerationAt: &next,
+		SkipNextTick:     false,
+		CurrentTurn:      state.Turn,
+		Now:              completedAt,
+	}); err != nil {
+		return service.recordTerminalFailure(ctx, opStartedAt, input,
+			ErrorCodeServiceUnavailable,
+			fmt.Sprintf("update scheduling on running: %s", err.Error()))
+	}
+
+	persisted, reloadErr := service.runtimeRecords.Get(ctx, input.GameID)
+	if reloadErr != nil {
+		return service.recordTerminalFailure(ctx, opStartedAt, input,
+			ErrorCodeServiceUnavailable,
+			fmt.Sprintf("reload runtime record: %s", reloadErr.Error()))
+	}
+
+	stats := projectPlayerStats(state, mappings)
+
+	snapshot := ports.RuntimeSnapshotUpdate{
+		GameID:              input.GameID,
+		CurrentTurn:         state.Turn,
+		RuntimeStatus:       runtime.StatusRunning,
+		EngineHealthSummary: persisted.EngineHealth,
+		PlayerTurnStats:     stats,
+		OccurredAt:          completedAt,
+	}
+	if err := service.lobbyEvents.PublishSnapshotUpdate(ctx, snapshot); err != nil {
+		service.logger.ErrorContext(ctx, "publish runtime snapshot update",
+			"game_id", input.GameID,
+			"err", err.Error(),
+		)
+	} else {
+		service.telemetry.RecordLobbyEventPublished(ctx, "runtime_snapshot_update")
+	}
+
+	gameName := service.resolveGameName(ctx, input.GameID)
+	recipients := recipientUserIDs(mappings)
+	service.publishGameTurnReadyIntent(ctx, input, gameName, state.Turn, recipients, completedAt)
+
+	service.appendSuccessLog(ctx, opStartedAt, input)
+	service.telemetry.RecordTurnGenerationOutcome(ctx,
+		string(operation.OutcomeSuccess), "", string(input.Trigger))
+
+	logArgs := []any{
+		"game_id", input.GameID,
+		"trigger", string(input.Trigger),
+		"current_turn", state.Turn,
+		"next_generation_at", next.Format(time.RFC3339Nano),
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.InfoContext(ctx, "turn generation succeeded", logArgs...)
+
+	return Result{
+		Record:  persisted,
+		Trigger: input.Trigger,
+		Outcome: operation.OutcomeSuccess,
+	}
+}
+
+// failGeneration handles every post-CAS failure path: CAS to
+// generation_failed, publish snapshot, publish game.generation_failed
+// admin notification, audit failure.
+func (service *Service) failGeneration(ctx context.Context, opStartedAt time.Time, input Input, _ runtime.RuntimeRecord, errorCode string, errorMessage string) Result {
+	failedAt := service.clock().UTC()
+
+	casErr := service.runtimeRecords.UpdateStatus(ctx, ports.UpdateStatusInput{
+		GameID:       input.GameID,
+		ExpectedFrom: runtime.StatusGenerationInProgress,
+		To:           runtime.StatusGenerationFailed,
+		Now:          failedAt,
+	})
+	if casErr != nil && !errors.Is(casErr, runtime.ErrConflict) {
+		// Best-effort transition. The original error code remains the
+		// caller-visible one; log the secondary failure.
+		service.logger.ErrorContext(ctx, "cas runtime status to generation_failed",
+			"game_id", input.GameID,
+			"err", casErr.Error(),
+		)
+	}
+
+	persisted, reloadErr := service.runtimeRecords.Get(ctx, input.GameID)
+	publishedStatus := runtime.StatusGenerationFailed
+	if reloadErr == nil {
+		publishedStatus = persisted.Status
+	}
+
+	snapshot := ports.RuntimeSnapshotUpdate{
+		GameID:              input.GameID,
+		CurrentTurn:         persistedTurn(persisted, reloadErr),
+		RuntimeStatus:       publishedStatus,
+		EngineHealthSummary: persistedHealth(persisted, reloadErr),
+		PlayerTurnStats:     nil,
+		OccurredAt:          failedAt,
+	}
+	if err := service.lobbyEvents.PublishSnapshotUpdate(ctx, snapshot); err != nil {
+		service.logger.ErrorContext(ctx, "publish runtime snapshot update on failure",
+			"game_id", input.GameID,
+			"err", err.Error(),
+		)
+	} else {
+		service.telemetry.RecordLobbyEventPublished(ctx, "runtime_snapshot_update")
+	}
+
+	gameName := service.resolveGameName(ctx, input.GameID)
+	service.publishGameGenerationFailedIntent(ctx, input, gameName, errorCode, errorMessage, failedAt)
+
+	service.appendFailureLog(ctx, opStartedAt, input, errorCode, errorMessage)
+	service.telemetry.RecordTurnGenerationOutcome(ctx,
+		string(operation.OutcomeFailure), errorCode, string(input.Trigger))
+
+	logArgs := []any{
+		"game_id", input.GameID,
+		"trigger", string(input.Trigger),
+		"error_code", errorCode,
+		"error_message", errorMessage,
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.WarnContext(ctx, "turn generation failed", logArgs...)
+
+	return Result{
+		Record:       persisted,
+		Trigger:      input.Trigger,
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    errorCode,
+		ErrorMessage: errorMessage,
+	}
+}
+
+// handlePostEngineCASFailure maps a CAS error that surfaced after the
+// engine call already succeeded. Conflict means an external actor (e.g.
+// admin stop) won the race; other errors are treated as
+// service_unavailable. No publication is issued — the external mutation
+// owns its own snapshot.
+func (service *Service) handlePostEngineCASFailure(ctx context.Context, opStartedAt time.Time, input Input, _ runtime.RuntimeRecord, casErr error) Result {
+	switch {
+	case errors.Is(casErr, runtime.ErrConflict):
+		return service.recordTerminalFailure(ctx, opStartedAt, input,
+			ErrorCodeConflict,
+			fmt.Sprintf("cas runtime status post-engine: %s", casErr.Error()))
+	case errors.Is(casErr, runtime.ErrNotFound):
+		return service.recordTerminalFailure(ctx, opStartedAt, input,
+			ErrorCodeRuntimeNotFound,
+			fmt.Sprintf("cas runtime status post-engine: %s", casErr.Error()))
+	default:
+		return service.recordTerminalFailure(ctx, opStartedAt, input,
+			ErrorCodeServiceUnavailable,
+			fmt.Sprintf("cas runtime status post-engine: %s", casErr.Error()))
+	}
+}
+
+// recordEarlyFailure handles failures that occur before the runtime row
+// is in `generation_in_progress`. No status mutation, no publication;
+// only audit and telemetry.
+func (service *Service) recordEarlyFailure(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) Result {
+	service.appendFailureLog(ctx, opStartedAt, input, errorCode, errorMessage)
+	service.telemetry.RecordTurnGenerationOutcome(ctx,
+		string(operation.OutcomeFailure), errorCode, string(input.Trigger))
+	logArgs := []any{
+		"game_id", input.GameID,
+		"trigger", string(input.Trigger),
+		"error_code", errorCode,
+		"error_message", errorMessage,
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.WarnContext(ctx, "turn generation rejected", logArgs...)
+	return Result{
+		Trigger:      input.Trigger,
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    errorCode,
+		ErrorMessage: errorMessage,
+	}
+}
+
+// recordTerminalFailure handles failures that occur after a post-engine
+// CAS or reload has failed. The runtime row is in an undetermined state
+// owned by whatever mutation won; we record the audit and surface the
+// failure without further publication.
+func (service *Service) recordTerminalFailure(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) Result {
+	service.appendFailureLog(ctx, opStartedAt, input, errorCode, errorMessage)
+	service.telemetry.RecordTurnGenerationOutcome(ctx,
+		string(operation.OutcomeFailure), errorCode, string(input.Trigger))
+	logArgs := []any{
+		"game_id", input.GameID,
+		"trigger", string(input.Trigger),
+		"error_code", errorCode,
+		"error_message", errorMessage,
+	}
+	logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+	service.logger.WarnContext(ctx, "turn generation post-engine failure", logArgs...)
+	return Result{
+		Trigger:      input.Trigger,
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    errorCode,
+		ErrorMessage: errorMessage,
+	}
+}
+
+// resolveGameName fetches the human-readable game name from Lobby and
+// falls back to the platform game id on any error per Stage 15 D1.
+func (service *Service) resolveGameName(ctx context.Context, gameID string) string {
+	summary, err := service.lobby.GetGameSummary(ctx, gameID)
+	if err != nil {
+		logArgs := []any{
+			"game_id", gameID,
+			"error_code", "lobby_unavailable",
+			"err", err.Error(),
+		}
+		logArgs = append(logArgs, logging.ContextAttrs(ctx)...)
+		service.logger.WarnContext(ctx, "resolve game name fell back to game id", logArgs...)
+		return gameID
+	}
+	if strings.TrimSpace(summary.GameName) == "" {
+		return gameID
+	}
+	return summary.GameName
+}
+
+// publishGameTurnReadyIntent publishes the user-targeted notification
+// that announces a freshly generated turn. Empty recipient sets are
+// dropped silently — the validator inside notificationintent rejects
+// them outright, but the orchestrator should not break commit.
+func (service *Service) publishGameTurnReadyIntent(ctx context.Context, input Input, gameName string, turnNumber int, recipients []string, occurredAt time.Time) {
+	if len(recipients) == 0 {
+		service.logger.WarnContext(ctx, "skip game.turn.ready notification: empty recipient set",
+			"game_id", input.GameID,
+		)
+		return
+	}
+	intent, err := notificationintent.NewGameTurnReadyIntent(
+		notificationintent.Metadata{
+			IdempotencyKey: fmt.Sprintf("game.turn.ready:%s:%d", input.GameID, turnNumber),
+			OccurredAt:     occurredAt,
+			RequestID:      logging.RequestIDFromContext(ctx),
+		},
+		recipients,
+		notificationintent.GameTurnReadyPayload{
+			GameID:     input.GameID,
+			GameName:   gameName,
+			TurnNumber: int64(turnNumber),
+		},
+	)
+	if err != nil {
+		service.logger.ErrorContext(ctx, "build game.turn.ready intent",
+			"game_id", input.GameID,
+			"err", err.Error(),
+		)
+		service.telemetry.RecordNotificationPublishAttempt(ctx,
+			string(notificationintent.NotificationTypeGameTurnReady), "error")
+		return
+	}
+	if err := service.notifications.Publish(ctx, intent); err != nil {
+		service.logger.ErrorContext(ctx, "publish game.turn.ready intent",
+			"game_id", input.GameID,
+			"err", err.Error(),
+		)
+		service.telemetry.RecordNotificationPublishAttempt(ctx,
+			string(notificationintent.NotificationTypeGameTurnReady), "error")
+		return
+	}
+	service.telemetry.RecordNotificationPublishAttempt(ctx,
+		string(notificationintent.NotificationTypeGameTurnReady), "ok")
+}
+
+// publishGameFinishedIntent publishes the user-targeted notification
+// that announces a finished game.
+func (service *Service) publishGameFinishedIntent(ctx context.Context, input Input, gameName string, finalTurnNumber int, recipients []string, occurredAt time.Time) {
+	if len(recipients) == 0 {
+		service.logger.WarnContext(ctx, "skip game.finished notification: empty recipient set",
+			"game_id", input.GameID,
+		)
+		return
+	}
+	intent, err := notificationintent.NewGameFinishedIntent(
+		notificationintent.Metadata{
+			IdempotencyKey: fmt.Sprintf("game.finished:%s:%d", input.GameID, finalTurnNumber),
+			OccurredAt:     occurredAt,
+			RequestID:      logging.RequestIDFromContext(ctx),
+		},
+		recipients,
+		notificationintent.GameFinishedPayload{
+			GameID:          input.GameID,
+			GameName:        gameName,
+			FinalTurnNumber: int64(finalTurnNumber),
+		},
+	)
+	if err != nil {
+		service.logger.ErrorContext(ctx, "build game.finished intent",
+			"game_id", input.GameID,
+			"err", err.Error(),
+		)
+		service.telemetry.RecordNotificationPublishAttempt(ctx,
+			string(notificationintent.NotificationTypeGameFinished), "error")
+		return
+	}
+	if err := service.notifications.Publish(ctx, intent); err != nil {
+		service.logger.ErrorContext(ctx, "publish game.finished intent",
+			"game_id", input.GameID,
+			"err", err.Error(),
+		)
+		service.telemetry.RecordNotificationPublishAttempt(ctx,
+			string(notificationintent.NotificationTypeGameFinished), "error")
+		return
+	}
+	service.telemetry.RecordNotificationPublishAttempt(ctx,
+		string(notificationintent.NotificationTypeGameFinished), "ok")
+}
+
+// publishGameGenerationFailedIntent publishes the admin-email
+// notification that announces a failed turn generation.
+func (service *Service) publishGameGenerationFailedIntent(ctx context.Context, input Input, gameName string, errorCode string, errorMessage string, occurredAt time.Time) {
+	failureReason := errorCode
+	if strings.TrimSpace(errorMessage) != "" {
+		failureReason = fmt.Sprintf("%s: %s", errorCode, errorMessage)
+	}
+	intent, err := notificationintent.NewGameGenerationFailedIntent(
+		notificationintent.Metadata{
+			IdempotencyKey: fmt.Sprintf("game.generation_failed:%s:%d",
+				input.GameID, occurredAt.UnixMilli()),
+			OccurredAt: occurredAt,
+			RequestID:  logging.RequestIDFromContext(ctx),
+		},
+		notificationintent.GameGenerationFailedPayload{
+			GameID:        input.GameID,
+			GameName:      gameName,
+			FailureReason: failureReason,
+		},
+	)
+	if err != nil {
+		service.logger.ErrorContext(ctx, "build game.generation_failed intent",
+			"game_id", input.GameID,
+			"err", err.Error(),
+		)
+		service.telemetry.RecordNotificationPublishAttempt(ctx,
+			string(notificationintent.NotificationTypeGameGenerationFailed), "error")
+		return
+	}
+	if err := service.notifications.Publish(ctx, intent); err != nil {
+		service.logger.ErrorContext(ctx, "publish game.generation_failed intent",
+			"game_id", input.GameID,
+			"err", err.Error(),
+		)
+		service.telemetry.RecordNotificationPublishAttempt(ctx,
+			string(notificationintent.NotificationTypeGameGenerationFailed), "error")
+		return
+	}
+	service.telemetry.RecordNotificationPublishAttempt(ctx,
+		string(notificationintent.NotificationTypeGameGenerationFailed), "ok")
+}
+
+// projectPlayerStats joins the engine response on RaceName against the
+// installed roster to build one PlayerTurnStats per active member.
+// Result is sorted by UserID for a deterministic wire order.
+func projectPlayerStats(state ports.StateResponse, mappings []playermapping.PlayerMapping) []ports.PlayerTurnStats {
+	if len(state.Players) == 0 || len(mappings) == 0 {
+		return nil
+	}
+	userByRace := make(map[string]string, len(mappings))
+	for _, mapping := range mappings {
+		userByRace[mapping.RaceName] = mapping.UserID
+	}
+	stats := make([]ports.PlayerTurnStats, 0, len(state.Players))
+	for _, player := range state.Players {
+		userID, ok := userByRace[player.RaceName]
+		if !ok {
+			continue
+		}
+		stats = append(stats, ports.PlayerTurnStats{
+			UserID:     userID,
+			Planets:    player.Planets,
+			Population: player.Population,
+		})
+	}
+	sort.Slice(stats, func(i, j int) bool { return stats[i].UserID < stats[j].UserID })
+	return stats
+}
+
+// recipientUserIDs returns the deduplicated, sorted-ascending list of
+// platform user ids derived from the roster. Mirrors the
+// notificationintent validator's expectations.
+func recipientUserIDs(mappings []playermapping.PlayerMapping) []string {
+	if len(mappings) == 0 {
+		return nil
+	}
+	seen := make(map[string]struct{}, len(mappings))
+	result := make([]string, 0, len(mappings))
+	for _, mapping := range mappings {
+		userID := strings.TrimSpace(mapping.UserID)
+		if userID == "" {
+			continue
+		}
+		if _, ok := seen[userID]; ok {
+			continue
+		}
+		seen[userID] = struct{}{}
+		result = append(result, userID)
+	}
+	sort.Strings(result)
+	return result
+}
+
+// persistedTurn returns the stored CurrentTurn when reloadErr is nil,
+// or zero otherwise. Used to populate the failure-side snapshot
+// without making a second DB read.
+func persistedTurn(record runtime.RuntimeRecord, reloadErr error) int {
+	if reloadErr != nil {
+		return 0
+	}
+	return record.CurrentTurn
+}
+
+// persistedHealth returns the stored EngineHealth when reloadErr is
+// nil, or empty string otherwise.
+func persistedHealth(record runtime.RuntimeRecord, reloadErr error) string {
+	if reloadErr != nil {
+		return ""
+	}
+	return record.EngineHealth
+}
+
+// appendSuccessLog records the success operation_log entry.
+func (service *Service) appendSuccessLog(ctx context.Context, opStartedAt time.Time, input Input) {
+	finishedAt := service.clock().UTC()
+	service.bestEffortAppend(ctx, operation.OperationEntry{
+		GameID:     input.GameID,
+		OpKind:     operation.OpKindTurnGeneration,
+		OpSource:   fallbackOpSource(input.OpSource),
+		SourceRef:  input.SourceRef,
+		Outcome:    operation.OutcomeSuccess,
+		StartedAt:  opStartedAt,
+		FinishedAt: &finishedAt,
+	})
+}
+
+// appendFailureLog records the failure operation_log entry.
+func (service *Service) appendFailureLog(ctx context.Context, opStartedAt time.Time, input Input, errorCode string, errorMessage string) {
+	finishedAt := service.clock().UTC()
+	service.bestEffortAppend(ctx, operation.OperationEntry{
+		GameID:       input.GameID,
+		OpKind:       operation.OpKindTurnGeneration,
+		OpSource:     fallbackOpSource(input.OpSource),
+		SourceRef:    input.SourceRef,
+		Outcome:      operation.OutcomeFailure,
+		ErrorCode:    errorCode,
+		ErrorMessage: errorMessage,
+		StartedAt:    opStartedAt,
+		FinishedAt:   &finishedAt,
+	})
+}
+
+// bestEffortAppend writes one operation_log entry. A failure is logged
+// and discarded; the runtime row is the source of truth.
+func (service *Service) bestEffortAppend(ctx context.Context, entry operation.OperationEntry) {
+	if _, err := service.operationLogs.Append(ctx, entry); err != nil {
+		service.logger.ErrorContext(ctx, "append operation log",
+			"game_id", entry.GameID,
+			"op_kind", string(entry.OpKind),
+			"outcome", string(entry.Outcome),
+			"error_code", entry.ErrorCode,
+			"err", err.Error(),
+		)
+	}
+}
+
+// fallbackOpSource defaults to admin_rest when source is missing or
+// unrecognised. Mirrors `gamemaster/README.md §Trusted Surfaces`.
+func fallbackOpSource(source operation.OpSource) operation.OpSource {
+	if source.IsKnown() {
+		return source
+	}
+	return operation.OpSourceAdminRest
+}
diff --git a/gamemaster/internal/service/turngeneration/service_test.go b/gamemaster/internal/service/turngeneration/service_test.go
new file mode 100644
index 0000000..e0f4a2f
--- /dev/null
+++ b/gamemaster/internal/service/turngeneration/service_test.go
@@ -0,0 +1,841 @@
+package turngeneration_test
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"sync"
+	"testing"
+	"time"
+
+	"galaxy/gamemaster/internal/adapters/mocks"
+	"galaxy/gamemaster/internal/domain/operation"
+	"galaxy/gamemaster/internal/domain/playermapping"
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/service/scheduler"
+	"galaxy/gamemaster/internal/service/turngeneration"
+	"galaxy/gamemaster/internal/telemetry"
+	"galaxy/notificationintent"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+	"go.uber.org/mock/gomock"
+)
+
+// --- test doubles -----------------------------------------------------
+
+type fakeRuntimeRecords struct {
+	mu       sync.Mutex
+	stored   map[string]runtime.RuntimeRecord
+	getErr   error
+	updErr   error
+	schErr   error
+	insErr   error
+	updates  []ports.UpdateStatusInput
+	scheds   []ports.UpdateSchedulingInput
+	getCalls int
+}
+
+func newFakeRuntimeRecords() *fakeRuntimeRecords {
+	return &fakeRuntimeRecords{stored: map[string]runtime.RuntimeRecord{}}
+}
+
+func (s *fakeRuntimeRecords) seed(record runtime.RuntimeRecord) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.stored[record.GameID] = record
+}
+
+func (s *fakeRuntimeRecords) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.getCalls++
+	if s.getErr != nil {
+		return runtime.RuntimeRecord{}, s.getErr
+	}
+	record, ok := s.stored[gameID]
+	if !ok {
+		return runtime.RuntimeRecord{}, runtime.ErrNotFound
+	}
+	return record, nil
+}
+
+func (s *fakeRuntimeRecords) Insert(_ context.Context, record runtime.RuntimeRecord) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.insErr != nil {
+		return s.insErr
+	}
+	if _, ok := s.stored[record.GameID]; ok {
+		return runtime.ErrConflict
+	}
+	s.stored[record.GameID] = record
+	return nil
+}
+
+func (s *fakeRuntimeRecords) UpdateStatus(_ context.Context, input ports.UpdateStatusInput) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.updates = append(s.updates, input)
+	if s.updErr != nil {
+		return s.updErr
+	}
+	record, ok := s.stored[input.GameID]
+	if !ok {
+		return runtime.ErrNotFound
+	}
+	if record.Status != input.ExpectedFrom {
+		return runtime.ErrConflict
+	}
+	record.Status = input.To
+	record.UpdatedAt = input.Now
+	if input.To == runtime.StatusFinished {
+		finishedAt := input.Now
+		record.FinishedAt = &finishedAt
+	}
+	if input.To == runtime.StatusRunning && record.StartedAt == nil {
+		startedAt := input.Now
+		record.StartedAt = &startedAt
+	}
+	s.stored[input.GameID] = record
+	return nil
+}
+
+func (s *fakeRuntimeRecords) UpdateScheduling(_ context.Context, input ports.UpdateSchedulingInput) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.scheds = append(s.scheds, input)
+	if s.schErr != nil {
+		return s.schErr
+	}
+	record, ok := s.stored[input.GameID]
+	if !ok {
+		return runtime.ErrNotFound
+	}
+	if input.NextGenerationAt != nil {
+		next := *input.NextGenerationAt
+		record.NextGenerationAt = &next
+	} else {
+		record.NextGenerationAt = nil
+	}
+	record.SkipNextTick = input.SkipNextTick
+	record.CurrentTurn = input.CurrentTurn
+	record.UpdatedAt = input.Now
+	s.stored[input.GameID] = record
+	return nil
+}
+
+func (s *fakeRuntimeRecords) UpdateImage(_ context.Context, _ ports.UpdateImageInput) error {
+	return errors.New("not used in turngeneration tests")
+}
+
+func (s *fakeRuntimeRecords) UpdateEngineHealth(_ context.Context, _ ports.UpdateEngineHealthInput) error {
+	return errors.New("not used in turngeneration tests")
+}
+
+func (s *fakeRuntimeRecords) Delete(_ context.Context, _ string) error {
+	return errors.New("not used in turngeneration tests")
+}
+
+func (s *fakeRuntimeRecords) ListDueRunning(_ context.Context, _ time.Time) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used in turngeneration tests")
+}
+
+func (s *fakeRuntimeRecords) ListByStatus(_ context.Context, _ runtime.Status) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used in turngeneration tests")
+}
+
+func (s *fakeRuntimeRecords) List(_ context.Context) ([]runtime.RuntimeRecord, error) {
+	return nil, errors.New("not used in turngeneration tests")
+}
+
+func (s *fakeRuntimeRecords) record(gameID string) (runtime.RuntimeRecord, bool) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	record, ok := s.stored[gameID]
+	return record, ok
+}
+
+func (s *fakeRuntimeRecords) statusUpdates() []ports.UpdateStatusInput {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	out := make([]ports.UpdateStatusInput, len(s.updates))
+	copy(out, s.updates)
+	return out
+}
+
+func (s *fakeRuntimeRecords) scheduling() []ports.UpdateSchedulingInput {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	out := make([]ports.UpdateSchedulingInput, len(s.scheds))
+	copy(out, s.scheds)
+	return out
+}
+
+type fakePlayerMappings struct {
+	mu      sync.Mutex
+	stored  map[string][]playermapping.PlayerMapping
+	listErr error
+}
+
+func newFakePlayerMappings() *fakePlayerMappings {
+	return &fakePlayerMappings{stored: map[string][]playermapping.PlayerMapping{}}
+}
+
+func (s *fakePlayerMappings) seed(gameID string, members ...playermapping.PlayerMapping) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.stored[gameID] = append([]playermapping.PlayerMapping(nil), members...)
+}
+
+func (s *fakePlayerMappings) BulkInsert(_ context.Context, _ []playermapping.PlayerMapping) error {
+	return errors.New("not used in turngeneration tests")
+}
+
+func (s *fakePlayerMappings) Get(_ context.Context, _, _ string) (playermapping.PlayerMapping, error) {
+	return playermapping.PlayerMapping{}, errors.New("not used in turngeneration tests")
+}
+
+func (s *fakePlayerMappings) GetByRace(_ context.Context, _, _ string) (playermapping.PlayerMapping, error) {
+	return playermapping.PlayerMapping{}, errors.New("not used in turngeneration tests")
+}
+
+func (s *fakePlayerMappings) ListByGame(_ context.Context, gameID string) ([]playermapping.PlayerMapping, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.listErr != nil {
+		return nil, s.listErr
+	}
+	return append([]playermapping.PlayerMapping(nil), s.stored[gameID]...), nil
+}
+
+func (s *fakePlayerMappings) DeleteByGame(_ context.Context, _ string) error {
+	return errors.New("not used in turngeneration tests")
+}
+
+type fakeOperationLogs struct {
+	mu      sync.Mutex
+	appErr  error
+	entries []operation.OperationEntry
+}
+
+func (s *fakeOperationLogs) Append(_ context.Context, entry operation.OperationEntry) (int64, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.appErr != nil {
+		return 0, s.appErr
+	}
+	if err := entry.Validate(); err != nil {
+		return 0, err
+	}
+	s.entries = append(s.entries, entry)
+	return int64(len(s.entries)), nil
+}
+
+func (s *fakeOperationLogs) ListByGame(_ context.Context, _ string, _ int) ([]operation.OperationEntry, error) {
+	return nil, errors.New("not used in turngeneration tests")
+}
+
+func (s *fakeOperationLogs) lastEntry() (operation.OperationEntry, bool) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if len(s.entries) == 0 {
+		return operation.OperationEntry{}, false
+	}
+	return s.entries[len(s.entries)-1], true
+}
+
+// --- harness ----------------------------------------------------------
+
+type harness struct {
+	t    *testing.T
+	ctrl *gomock.Controller
+	runtimeStore  *fakeRuntimeRecords
+	mappings      *fakePlayerMappings
+	logs          *fakeOperationLogs
+	engine        *mocks.MockEngineClient
+	lobbyEvents   *mocks.MockLobbyEventsPublisher
+	notifications *mocks.MockNotificationIntentPublisher
+	lobby         *mocks.MockLobbyClient
+	telemetry     *telemetry.Runtime
+	now           time.Time
+	service       *turngeneration.Service
+}
+
+const (
+	testGameID         = "game-001"
+	testEngineEndpoint = "http://galaxy-game-game-001:8080"
+	testTurnSchedule   = "0 18 * * *"
+	testGameName       = "Andromeda Conquest"
+)
+
+func newHarness(t *testing.T) *harness {
+	t.Helper()
+	ctrl := gomock.NewController(t)
+	telemetryRuntime, err := telemetry.NewWithProviders(nil, nil)
+	require.NoError(t, err)
+
+	h := &harness{
+		t:             t,
+		ctrl:          ctrl,
+		runtimeStore:  newFakeRuntimeRecords(),
+		mappings:      newFakePlayerMappings(),
+		logs:          &fakeOperationLogs{},
+		engine:        mocks.NewMockEngineClient(ctrl),
+		lobbyEvents:   mocks.NewMockLobbyEventsPublisher(ctrl),
+		notifications: mocks.NewMockNotificationIntentPublisher(ctrl),
+		lobby:         mocks.NewMockLobbyClient(ctrl),
+		telemetry:     telemetryRuntime,
+		now:           time.Date(2026, time.April, 30, 12, 0, 0, 0, time.UTC),
+	}
+
+	service, err := turngeneration.NewService(turngeneration.Dependencies{
+		RuntimeRecords: h.runtimeStore,
+		PlayerMappings: h.mappings,
+		OperationLogs:  h.logs,
+		Engine:         h.engine,
+		LobbyEvents:    h.lobbyEvents,
+		Notifications:  h.notifications,
+		Lobby:          h.lobby,
+		Scheduler:      scheduler.New(),
+		Telemetry:      h.telemetry,
+		Clock:          func() time.Time { return h.now },
+	})
+	require.NoError(t, err)
+	h.service = service
+	return h
+}
+
+func (h *harness) seedRunningRecord(skip bool) {
+	startedAt := h.now.Add(-1 * time.Hour)
+	h.runtimeStore.seed(runtime.RuntimeRecord{
+		GameID:               testGameID,
+		Status:               runtime.StatusRunning,
+		EngineEndpoint:       testEngineEndpoint,
+		CurrentImageRef:      "ghcr.io/galaxy/game:v1.2.3",
+		CurrentEngineVersion: "v1.2.3",
+		TurnSchedule:         testTurnSchedule,
+		CurrentTurn:          0,
+		SkipNextTick:         skip,
+		EngineHealth:         "healthy",
+		CreatedAt:            h.now.Add(-2 * time.Hour),
+		UpdatedAt:            h.now.Add(-2 * time.Hour),
+		StartedAt:            &startedAt,
+	})
+	h.mappings.seed(testGameID,
+		playermapping.PlayerMapping{
+			GameID:           testGameID,
+			UserID:           "user-1",
+			RaceName:         "Aelinari",
+			EnginePlayerUUID: "uuid-1",
+			CreatedAt:        h.now.Add(-2 * time.Hour),
+		},
+		playermapping.PlayerMapping{
+			GameID:           testGameID,
+			UserID:           "user-2",
+			RaceName:         "Drazi",
+			EnginePlayerUUID: "uuid-2",
+			CreatedAt:        h.now.Add(-2 * time.Hour),
+		},
+	)
+}
+
+func successInput() turngeneration.Input {
+	return turngeneration.Input{
+		GameID:    testGameID,
+		Trigger:   turngeneration.TriggerScheduler,
+		OpSource:  operation.OpSourceAdminRest,
+		SourceRef: "tick-1",
+	}
+}
+
+func enginePlayers() []ports.PlayerState {
+	return []ports.PlayerState{
+		{RaceName: "Aelinari", EnginePlayerUUID: "uuid-1", Planets: 3, Population: 100},
+		{RaceName: "Drazi", EnginePlayerUUID: "uuid-2", Planets: 2, Population: 80},
+	}
+}
+
+func (h *harness) expectGameSummary() {
+	h.lobby.EXPECT().
+		GetGameSummary(gomock.Any(), testGameID).
+		Return(ports.GameSummary{GameID: testGameID, GameName: testGameName, Status: "running"}, nil)
+}
+
+// --- tests ------------------------------------------------------------
+
+func TestNewServiceRejectsMissingDeps(t *testing.T) {
+	telemetryRuntime, err := telemetry.NewWithProviders(nil, nil)
+	require.NoError(t, err)
+	cases := []struct {
+		name string
+		mut  func(*turngeneration.Dependencies)
+	}{
+		{"runtime records", func(d *turngeneration.Dependencies) { d.RuntimeRecords = nil }},
+		{"player mappings", func(d *turngeneration.Dependencies) { d.PlayerMappings = nil }},
+		{"operation logs", func(d *turngeneration.Dependencies) { d.OperationLogs = nil }},
+		{"engine", func(d *turngeneration.Dependencies) { d.Engine = nil }},
+		{"lobby events", func(d *turngeneration.Dependencies) { d.LobbyEvents = nil }},
+		{"notifications", func(d *turngeneration.Dependencies) { d.Notifications = nil }},
+		{"lobby", func(d *turngeneration.Dependencies) { d.Lobby = nil }},
+		{"scheduler", func(d *turngeneration.Dependencies) { d.Scheduler = nil }},
+		{"telemetry", func(d *turngeneration.Dependencies) { d.Telemetry = nil }},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			ctrl := gomock.NewController(t)
+			deps := turngeneration.Dependencies{
+				RuntimeRecords: newFakeRuntimeRecords(),
+				PlayerMappings: newFakePlayerMappings(),
+				OperationLogs:  &fakeOperationLogs{},
+				Engine:         mocks.NewMockEngineClient(ctrl),
+				LobbyEvents:    mocks.NewMockLobbyEventsPublisher(ctrl),
+				Notifications:  mocks.NewMockNotificationIntentPublisher(ctrl),
+				Lobby:          mocks.NewMockLobbyClient(ctrl),
+				Scheduler:      scheduler.New(),
+				Telemetry:      telemetryRuntime,
+			}
+			tc.mut(&deps)
+			service, err := turngeneration.NewService(deps)
+			require.Error(t, err)
+			require.Nil(t, service)
+		})
+	}
+}
+
+func TestHandleRejectsInvalidInput(t *testing.T) {
+	cases := []struct {
+		name string
+		mut  func(*turngeneration.Input)
+	}{
+		{"empty game id", func(i *turngeneration.Input) { i.GameID = "" }},
+		{"unknown trigger", func(i *turngeneration.Input) { i.Trigger = "exotic" }},
+		{"unknown op source", func(i *turngeneration.Input) { i.OpSource = "exotic" }},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			h := newHarness(t)
+			input := successInput()
+			tc.mut(&input)
+
+			result, err := h.service.Handle(context.Background(), input)
+			require.NoError(t, err)
+			assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+			assert.Equal(t, turngeneration.ErrorCodeInvalidRequest, result.ErrorCode)
+		})
+	}
+}
+
+func TestHandleHappyPathScheduler(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord(false)
+
+	h.engine.EXPECT().
+		Turn(gomock.Any(), testEngineEndpoint).
+		Return(ports.StateResponse{Turn: 1, Players: enginePlayers(), Finished: false}, nil)
+
+	var snapshot ports.RuntimeSnapshotUpdate
+	h.lobbyEvents.EXPECT().
+		PublishSnapshotUpdate(gomock.Any(), gomock.Any()).
+		DoAndReturn(func(_ context.Context, msg ports.RuntimeSnapshotUpdate) error {
+			snapshot = msg
+			return nil
+		})
+
+	h.expectGameSummary()
+
+	var publishedIntent notificationintent.Intent
+	h.notifications.EXPECT().
+		Publish(gomock.Any(), gomock.Any()).
+ DoAndReturn(func(_ context.Context, intent notificationintent.Intent) error { + publishedIntent = intent + return nil + }) + + result, err := h.service.Handle(context.Background(), successInput()) + require.NoError(t, err) + require.True(t, result.IsSuccess(), "outcome %q error_code=%q", result.Outcome, result.ErrorCode) + assert.False(t, result.Finished) + assert.Equal(t, turngeneration.TriggerScheduler, result.Trigger) + assert.Equal(t, runtime.StatusRunning, result.Record.Status) + assert.Equal(t, 1, result.Record.CurrentTurn) + require.NotNil(t, result.Record.NextGenerationAt) + assert.Equal(t, time.Date(2026, time.April, 30, 18, 0, 0, 0, time.UTC), *result.Record.NextGenerationAt) + assert.False(t, result.Record.SkipNextTick) + + updates := h.runtimeStore.statusUpdates() + require.Len(t, updates, 2) + assert.Equal(t, runtime.StatusRunning, updates[0].ExpectedFrom) + assert.Equal(t, runtime.StatusGenerationInProgress, updates[0].To) + assert.Equal(t, runtime.StatusGenerationInProgress, updates[1].ExpectedFrom) + assert.Equal(t, runtime.StatusRunning, updates[1].To) + + scheds := h.runtimeStore.scheduling() + require.Len(t, scheds, 1) + require.NotNil(t, scheds[0].NextGenerationAt) + assert.False(t, scheds[0].SkipNextTick) + assert.Equal(t, 1, scheds[0].CurrentTurn) + + assert.Equal(t, runtime.StatusRunning, snapshot.RuntimeStatus) + assert.Equal(t, 1, snapshot.CurrentTurn) + assert.Equal(t, "healthy", snapshot.EngineHealthSummary) + require.Len(t, snapshot.PlayerTurnStats, 2) + assert.Equal(t, "user-1", snapshot.PlayerTurnStats[0].UserID) + assert.Equal(t, 3, snapshot.PlayerTurnStats[0].Planets) + assert.Equal(t, 100, snapshot.PlayerTurnStats[0].Population) + assert.Equal(t, "user-2", snapshot.PlayerTurnStats[1].UserID) + + assert.Equal(t, notificationintent.NotificationTypeGameTurnReady, publishedIntent.NotificationType) + assert.Equal(t, []string{"user-1", "user-2"}, publishedIntent.RecipientUserIDs) + assert.Equal(t, notificationintent.AudienceKindUser, 
publishedIntent.AudienceKind) + assert.Contains(t, publishedIntent.PayloadJSON, fmt.Sprintf(`"game_name":%q`, testGameName)) + assert.Contains(t, publishedIntent.PayloadJSON, `"turn_number":1`) + + entry, ok := h.logs.lastEntry() + require.True(t, ok) + assert.Equal(t, operation.OpKindTurnGeneration, entry.OpKind) + assert.Equal(t, operation.OutcomeSuccess, entry.Outcome) + assert.Equal(t, "tick-1", entry.SourceRef) +} + +func TestHandleConsumesSkipNextTick(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord(true) + + h.engine.EXPECT(). + Turn(gomock.Any(), testEngineEndpoint). + Return(ports.StateResponse{Turn: 5, Players: enginePlayers(), Finished: false}, nil) + h.lobbyEvents.EXPECT(). + PublishSnapshotUpdate(gomock.Any(), gomock.Any()). + Return(nil) + h.expectGameSummary() + h.notifications.EXPECT(). + Publish(gomock.Any(), gomock.Any()). + Return(nil) + + result, err := h.service.Handle(context.Background(), successInput()) + require.NoError(t, err) + require.True(t, result.IsSuccess(), "outcome %q error_code=%q", result.Outcome, result.ErrorCode) + + require.NotNil(t, result.Record.NextGenerationAt) + expected := time.Date(2026, time.May, 1, 18, 0, 0, 0, time.UTC) + assert.Equal(t, expected, *result.Record.NextGenerationAt, "skip advances by one extra cron step") + assert.False(t, result.Record.SkipNextTick, "skip flag cleared after consumption") +} + +func TestHandleForceTriggerLabelsTelemetry(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord(false) + + h.engine.EXPECT(). + Turn(gomock.Any(), testEngineEndpoint). 
+ Return(ports.StateResponse{Turn: 1, Players: enginePlayers()}, nil) + h.lobbyEvents.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil) + h.expectGameSummary() + h.notifications.EXPECT().Publish(gomock.Any(), gomock.Any()).Return(nil) + + input := successInput() + input.Trigger = turngeneration.TriggerForce + + result, err := h.service.Handle(context.Background(), input) + require.NoError(t, err) + require.True(t, result.IsSuccess()) + assert.Equal(t, turngeneration.TriggerForce, result.Trigger) +} + +func TestHandleFinishedTransitionsAndClearsTick(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord(false) + + h.engine.EXPECT(). + Turn(gomock.Any(), testEngineEndpoint). + Return(ports.StateResponse{Turn: 42, Players: enginePlayers(), Finished: true}, nil) + + var finishedMsg ports.GameFinished + h.lobbyEvents.EXPECT(). + PublishGameFinished(gomock.Any(), gomock.Any()). + DoAndReturn(func(_ context.Context, msg ports.GameFinished) error { + finishedMsg = msg + return nil + }) + h.expectGameSummary() + + var publishedIntent notificationintent.Intent + h.notifications.EXPECT(). + Publish(gomock.Any(), gomock.Any()). 
+ DoAndReturn(func(_ context.Context, intent notificationintent.Intent) error { + publishedIntent = intent + return nil + }) + + result, err := h.service.Handle(context.Background(), successInput()) + require.NoError(t, err) + require.True(t, result.IsSuccess(), "outcome %q error_code=%q", result.Outcome, result.ErrorCode) + assert.True(t, result.Finished) + assert.Equal(t, runtime.StatusFinished, result.Record.Status) + assert.Nil(t, result.Record.NextGenerationAt) + require.NotNil(t, result.Record.FinishedAt) + assert.Equal(t, h.now, *result.Record.FinishedAt) + + assert.Equal(t, runtime.StatusFinished, finishedMsg.RuntimeStatus) + assert.Equal(t, 42, finishedMsg.FinalTurnNumber) + require.Len(t, finishedMsg.PlayerTurnStats, 2) + assert.Equal(t, h.now, finishedMsg.FinishedAt) + + assert.Equal(t, notificationintent.NotificationTypeGameFinished, publishedIntent.NotificationType) + assert.Contains(t, publishedIntent.PayloadJSON, `"final_turn_number":42`) +} + +func TestHandleEngineUnreachable(t *testing.T) { + h := newHarness(t) + h.seedRunningRecord(false) + + h.engine.EXPECT(). + Turn(gomock.Any(), testEngineEndpoint). + Return(ports.StateResponse{}, fmt.Errorf("dial: %w", ports.ErrEngineUnreachable)) + + var snapshot ports.RuntimeSnapshotUpdate + h.lobbyEvents.EXPECT(). + PublishSnapshotUpdate(gomock.Any(), gomock.Any()). + DoAndReturn(func(_ context.Context, msg ports.RuntimeSnapshotUpdate) error { + snapshot = msg + return nil + }) + h.expectGameSummary() + + var publishedIntent notificationintent.Intent + h.notifications.EXPECT(). + Publish(gomock.Any(), gomock.Any()). 
+		DoAndReturn(func(_ context.Context, intent notificationintent.Intent) error {
+			publishedIntent = intent
+			return nil
+		})
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, turngeneration.ErrorCodeEngineUnreachable, result.ErrorCode)
+
+	stored, ok := h.runtimeStore.record(testGameID)
+	require.True(t, ok)
+	assert.Equal(t, runtime.StatusGenerationFailed, stored.Status)
+
+	assert.Equal(t, runtime.StatusGenerationFailed, snapshot.RuntimeStatus)
+	assert.Empty(t, snapshot.PlayerTurnStats)
+
+	assert.Equal(t, notificationintent.NotificationTypeGameGenerationFailed, publishedIntent.NotificationType)
+	assert.Equal(t, notificationintent.AudienceKindAdminEmail, publishedIntent.AudienceKind)
+	assert.Empty(t, publishedIntent.RecipientUserIDs)
+}
+
+func TestHandleEngineValidationError(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord(false)
+
+	h.engine.EXPECT().
+		Turn(gomock.Any(), testEngineEndpoint).
+		Return(ports.StateResponse{}, fmt.Errorf("400: %w", ports.ErrEngineValidation))
+
+	h.lobbyEvents.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil)
+	h.expectGameSummary()
+	h.notifications.EXPECT().Publish(gomock.Any(), gomock.Any()).Return(nil)
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	assert.Equal(t, turngeneration.ErrorCodeEngineValidationError, result.ErrorCode)
+
+	stored, ok := h.runtimeStore.record(testGameID)
+	require.True(t, ok)
+	assert.Equal(t, runtime.StatusGenerationFailed, stored.Status)
+}
+
+func TestHandleEngineProtocolViolationOnRosterMismatch(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord(false)
+
+	h.engine.EXPECT().
+		Turn(gomock.Any(), testEngineEndpoint).
+		Return(ports.StateResponse{
+			Turn: 1,
+			Players: []ports.PlayerState{
+				{RaceName: "Aelinari", EnginePlayerUUID: "uuid-1", Planets: 1, Population: 10},
+				{RaceName: "Unknown", EnginePlayerUUID: "uuid-x", Planets: 1, Population: 5},
+			},
+		}, nil)
+
+	h.lobbyEvents.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil)
+	h.expectGameSummary()
+	h.notifications.EXPECT().Publish(gomock.Any(), gomock.Any()).Return(nil)
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	assert.Equal(t, turngeneration.ErrorCodeEngineProtocolViolation, result.ErrorCode)
+
+	stored, ok := h.runtimeStore.record(testGameID)
+	require.True(t, ok)
+	assert.Equal(t, runtime.StatusGenerationFailed, stored.Status)
+}
+
+func TestHandleEngineProtocolViolationOnCountMismatch(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord(false)
+
+	h.engine.EXPECT().
+		Turn(gomock.Any(), testEngineEndpoint).
+		Return(ports.StateResponse{
+			Turn: 1,
+			Players: []ports.PlayerState{
+				{RaceName: "Aelinari", EnginePlayerUUID: "uuid-1", Planets: 1, Population: 10},
+			},
+		}, nil)
+
+	h.lobbyEvents.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil)
+	h.expectGameSummary()
+	h.notifications.EXPECT().Publish(gomock.Any(), gomock.Any()).Return(nil)
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	assert.Equal(t, turngeneration.ErrorCodeEngineProtocolViolation, result.ErrorCode)
+}
+
+func TestHandleConflictOnInitialCAS(t *testing.T) {
+	h := newHarness(t)
+	startedAt := h.now.Add(-1 * time.Hour)
+	h.runtimeStore.seed(runtime.RuntimeRecord{
+		GameID:               testGameID,
+		Status:               runtime.StatusStopped,
+		EngineEndpoint:       testEngineEndpoint,
+		CurrentImageRef:      "ghcr.io/galaxy/game:v1.2.3",
+		CurrentEngineVersion: "v1.2.3",
+		TurnSchedule:         testTurnSchedule,
+		CreatedAt:            h.now.Add(-2 * time.Hour),
+		UpdatedAt:            h.now.Add(-1 * time.Hour),
+		StartedAt:            &startedAt,
+		StoppedAt:            &startedAt,
+	})
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, turngeneration.ErrorCodeRuntimeNotRunning, result.ErrorCode)
+
+	assert.Empty(t, h.runtimeStore.statusUpdates(), "no CAS attempted on non-running record")
+}
+
+func TestHandleConflictOnPostEngineCAS(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord(false)
+
+	// Simulate a concurrent admin stop that wins the race during the
+	// engine call by mutating the stored row mid-flight.
+	h.engine.EXPECT().
+		Turn(gomock.Any(), testEngineEndpoint).
+		DoAndReturn(func(_ context.Context, _ string) (ports.StateResponse, error) {
+			h.runtimeStore.mu.Lock()
+			rec := h.runtimeStore.stored[testGameID]
+			rec.Status = runtime.StatusStopped
+			h.runtimeStore.stored[testGameID] = rec
+			h.runtimeStore.mu.Unlock()
+			return ports.StateResponse{Turn: 1, Players: enginePlayers()}, nil
+		})
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, turngeneration.ErrorCodeConflict, result.ErrorCode)
+}
+
+func TestHandleRuntimeNotFound(t *testing.T) {
+	h := newHarness(t)
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	assert.Equal(t, operation.OutcomeFailure, result.Outcome)
+	assert.Equal(t, turngeneration.ErrorCodeRuntimeNotFound, result.ErrorCode)
+}
+
+func TestHandleServiceUnavailableOnGet(t *testing.T) {
+	h := newHarness(t)
+	h.runtimeStore.getErr = errors.New("postgres dial timeout")
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	assert.Equal(t, turngeneration.ErrorCodeServiceUnavailable, result.ErrorCode)
+}
+
+func TestHandleLobbyFallbackToGameID(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord(false)
+
+	h.engine.EXPECT().
+		Turn(gomock.Any(), testEngineEndpoint).
+		Return(ports.StateResponse{Turn: 1, Players: enginePlayers()}, nil)
+	h.lobbyEvents.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil)
+	h.lobby.EXPECT().
+		GetGameSummary(gomock.Any(), testGameID).
+		Return(ports.GameSummary{}, fmt.Errorf("dial: %w", ports.ErrLobbyUnavailable))
+
+	var publishedIntent notificationintent.Intent
+	h.notifications.EXPECT().
+		Publish(gomock.Any(), gomock.Any()).
+		DoAndReturn(func(_ context.Context, intent notificationintent.Intent) error {
+			publishedIntent = intent
+			return nil
+		})
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	require.True(t, result.IsSuccess())
+	assert.Contains(t, publishedIntent.PayloadJSON, fmt.Sprintf(`"game_name":%q`, testGameID))
+}
+
+func TestHandleLobbyEventPublishFailureDoesNotRollBack(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord(false)
+
+	h.engine.EXPECT().
+		Turn(gomock.Any(), testEngineEndpoint).
+		Return(ports.StateResponse{Turn: 1, Players: enginePlayers()}, nil)
+	h.lobbyEvents.EXPECT().
+		PublishSnapshotUpdate(gomock.Any(), gomock.Any()).
+		Return(errors.New("redis broken"))
+	h.expectGameSummary()
+	h.notifications.EXPECT().Publish(gomock.Any(), gomock.Any()).Return(nil)
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	require.True(t, result.IsSuccess(), "outcome %q error_code=%q", result.Outcome, result.ErrorCode)
+	assert.Equal(t, runtime.StatusRunning, result.Record.Status)
+	assert.Equal(t, 1, result.Record.CurrentTurn)
+}
+
+func TestHandleNotificationFailureDoesNotRollBack(t *testing.T) {
+	h := newHarness(t)
+	h.seedRunningRecord(false)
+
+	h.engine.EXPECT().
+		Turn(gomock.Any(), testEngineEndpoint).
+		Return(ports.StateResponse{Turn: 1, Players: enginePlayers()}, nil)
+	h.lobbyEvents.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil)
+	h.expectGameSummary()
+	h.notifications.EXPECT().
+		Publish(gomock.Any(), gomock.Any()).
+		Return(errors.New("notification stream broken"))
+
+	result, err := h.service.Handle(context.Background(), successInput())
+	require.NoError(t, err)
+	require.True(t, result.IsSuccess(), "outcome %q error_code=%q", result.Outcome, result.ErrorCode)
+}
+
+func TestHandleNilContext(t *testing.T) {
+	h := newHarness(t)
+	_, err := h.service.Handle(nil, successInput()) //nolint:staticcheck // intentional nil context
+	require.Error(t, err)
+}
+
+func TestHandleNilService(t *testing.T) {
+	var service *turngeneration.Service
+	_, err := service.Handle(context.Background(), successInput())
+	require.Error(t, err)
+}
diff --git a/gamemaster/internal/telemetry/runtime.go b/gamemaster/internal/telemetry/runtime.go
new file mode 100644
index 0000000..0818df0
--- /dev/null
+++ b/gamemaster/internal/telemetry/runtime.go
@@ -0,0 +1,721 @@
+// Package telemetry provides lightweight OpenTelemetry helpers and
+// low-cardinality Game Master instruments used by the runnable skeleton.
+// Later stages emit into the instruments declared here without touching
+// this package.
+package telemetry
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"log/slog"
+	"os"
+	"strings"
+	"sync"
+	"time"
+
+	"go.opentelemetry.io/otel"
+	"go.opentelemetry.io/otel/attribute"
+	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
+	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
+	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
+	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
+	"go.opentelemetry.io/otel/exporters/stdout/stdoutmetric"
+	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
+	"go.opentelemetry.io/otel/metric"
+	"go.opentelemetry.io/otel/propagation"
+	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
+	"go.opentelemetry.io/otel/sdk/resource"
+	sdktrace "go.opentelemetry.io/otel/sdk/trace"
+	oteltrace "go.opentelemetry.io/otel/trace"
+)
+
+const meterName = "galaxy/gamemaster"
+
+const (
+	defaultServiceName = "galaxy-gamemaster"
+
+	processExporterNone         = "none"
+	processExporterOTLP         = "otlp"
+	processProtocolHTTPProtobuf = "http/protobuf"
+	processProtocolGRPC         = "grpc"
+)
+
+// ProcessConfig configures the process-wide OpenTelemetry runtime.
+type ProcessConfig struct {
+	// ServiceName overrides the default OpenTelemetry service name.
+	ServiceName string
+
+	// TracesExporter selects the external traces exporter. Supported
+	// values are `none` and `otlp`.
+	TracesExporter string
+
+	// MetricsExporter selects the external metrics exporter. Supported
+	// values are `none` and `otlp`.
+	MetricsExporter string
+
+	// TracesProtocol selects the OTLP traces protocol when TracesExporter
+	// is `otlp`.
+	TracesProtocol string
+
+	// MetricsProtocol selects the OTLP metrics protocol when
+	// MetricsExporter is `otlp`.
+	MetricsProtocol string
+
+	// StdoutTracesEnabled enables the additional stdout trace exporter
+	// used for local development and debugging.
+	StdoutTracesEnabled bool
+
+	// StdoutMetricsEnabled enables the additional stdout metric exporter
+	// used for local development and debugging.
+	StdoutMetricsEnabled bool
+}
+
+// Validate reports whether cfg contains a supported OpenTelemetry exporter
+// configuration.
+func (cfg ProcessConfig) Validate() error {
+	switch cfg.TracesExporter {
+	case processExporterNone, processExporterOTLP:
+	default:
+		return fmt.Errorf("unsupported traces exporter %q", cfg.TracesExporter)
+	}
+
+	switch cfg.MetricsExporter {
+	case processExporterNone, processExporterOTLP:
+	default:
+		return fmt.Errorf("unsupported metrics exporter %q", cfg.MetricsExporter)
+	}
+
+	if cfg.TracesProtocol != "" && cfg.TracesProtocol != processProtocolHTTPProtobuf && cfg.TracesProtocol != processProtocolGRPC {
+		return fmt.Errorf("unsupported OTLP traces protocol %q", cfg.TracesProtocol)
+	}
+	if cfg.MetricsProtocol != "" && cfg.MetricsProtocol != processProtocolHTTPProtobuf && cfg.MetricsProtocol != processProtocolGRPC {
+		return fmt.Errorf("unsupported OTLP metrics protocol %q", cfg.MetricsProtocol)
+	}
+
+	return nil
+}
+
+// Runtime owns the Game Master OpenTelemetry providers and the
+// low-cardinality custom instruments listed in `gamemaster/README.md`
+// §Observability.
+type Runtime struct {
+	tracerProvider oteltrace.TracerProvider
+	meterProvider  metric.MeterProvider
+	meter          metric.Meter
+
+	shutdownMu   sync.Mutex
+	shutdownDone bool
+	shutdownErr  error
+	shutdownFns  []func(context.Context) error
+
+	internalHTTPRequests metric.Int64Counter
+	internalHTTPDuration metric.Float64Histogram
+
+	registerRuntimeOutcomes     metric.Int64Counter
+	turnGenerationOutcomes      metric.Int64Counter
+	commandExecuteOutcomes      metric.Int64Counter
+	orderPutOutcomes            metric.Int64Counter
+	reportGetOutcomes           metric.Int64Counter
+	banishOutcomes              metric.Int64Counter
+	healthEventsConsumed        metric.Int64Counter
+	lobbyEventsPublished        metric.Int64Counter
+	notificationPublishAttempts metric.Int64Counter
+	membershipCacheHits         metric.Int64Counter
+	engineCallLatency           metric.Float64Histogram
+
+	runtimeRecordsByStatus metric.Int64ObservableGauge
+	schedulerDueGames      metric.Int64ObservableGauge
+	engineVersionsTotal    metric.Int64ObservableGauge
+
+	gaugeMu           sync.Mutex
+	gaugeRegistration metric.Registration
+}
+
+// NewWithProviders constructs a telemetry runtime around explicitly supplied
+// meterProvider and tracerProvider values.
+func NewWithProviders(meterProvider metric.MeterProvider, tracerProvider oteltrace.TracerProvider) (*Runtime, error) {
+	if meterProvider == nil {
+		meterProvider = otel.GetMeterProvider()
+	}
+	if tracerProvider == nil {
+		tracerProvider = otel.GetTracerProvider()
+	}
+	if meterProvider == nil {
+		return nil, errors.New("new gamemaster telemetry runtime: nil meter provider")
+	}
+	if tracerProvider == nil {
+		return nil, errors.New("new gamemaster telemetry runtime: nil tracer provider")
+	}
+
+	return buildRuntime(meterProvider, tracerProvider, nil)
+}
+
+// NewProcess constructs the process-wide Game Master OpenTelemetry runtime
+// from cfg, installs the resulting providers globally, and returns the
+// runtime.
+func NewProcess(ctx context.Context, cfg ProcessConfig, logger *slog.Logger) (*Runtime, error) {
+	if ctx == nil {
+		return nil, errors.New("new gamemaster telemetry process: nil context")
+	}
+	if err := cfg.Validate(); err != nil {
+		return nil, fmt.Errorf("new gamemaster telemetry process: %w", err)
+	}
+	if logger == nil {
+		logger = slog.Default()
+	}
+
+	serviceName := strings.TrimSpace(cfg.ServiceName)
+	if serviceName == "" {
+		serviceName = defaultServiceName
+	}
+
+	res := resource.NewSchemaless(attribute.String("service.name", serviceName))
+
+	tracerProvider, err := newTracerProvider(ctx, res, cfg)
+	if err != nil {
+		return nil, fmt.Errorf("new gamemaster telemetry process: tracer provider: %w", err)
+	}
+	meterProvider, err := newMeterProvider(ctx, res, cfg)
+	if err != nil {
+		return nil, fmt.Errorf("new gamemaster telemetry process: meter provider: %w", err)
+	}
+
+	otel.SetTracerProvider(tracerProvider)
+	otel.SetMeterProvider(meterProvider)
+	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
+		propagation.TraceContext{},
+		propagation.Baggage{},
+	))
+
+	runtime, err := buildRuntime(meterProvider, tracerProvider, []func(context.Context) error{
+		meterProvider.Shutdown,
+		tracerProvider.Shutdown,
+	})
+	if err != nil {
+		return nil, fmt.Errorf("new gamemaster telemetry process: runtime: %w", err)
+	}
+
+	logger.Info("gamemaster telemetry configured",
+		"service_name", serviceName,
+		"traces_exporter", cfg.TracesExporter,
+		"metrics_exporter", cfg.MetricsExporter,
+	)
+
+	return runtime, nil
+}
+
+// TracerProvider returns the runtime tracer provider.
+func (runtime *Runtime) TracerProvider() oteltrace.TracerProvider {
+	if runtime == nil || runtime.tracerProvider == nil {
+		return otel.GetTracerProvider()
+	}
+
+	return runtime.tracerProvider
+}
+
+// MeterProvider returns the runtime meter provider.
+func (runtime *Runtime) MeterProvider() metric.MeterProvider {
+	if runtime == nil || runtime.meterProvider == nil {
+		return otel.GetMeterProvider()
+	}
+
+	return runtime.meterProvider
+}
+
+// Shutdown flushes and stops the configured telemetry providers. Shutdown
+// is idempotent.
+func (runtime *Runtime) Shutdown(ctx context.Context) error {
+	if runtime == nil {
+		return nil
+	}
+
+	runtime.shutdownMu.Lock()
+	if runtime.shutdownDone {
+		err := runtime.shutdownErr
+		runtime.shutdownMu.Unlock()
+		return err
+	}
+	runtime.shutdownDone = true
+	runtime.shutdownMu.Unlock()
+
+	runtime.gaugeMu.Lock()
+	if runtime.gaugeRegistration != nil {
+		_ = runtime.gaugeRegistration.Unregister()
+		runtime.gaugeRegistration = nil
+	}
+	runtime.gaugeMu.Unlock()
+
+	var shutdownErr error
+	for index := len(runtime.shutdownFns) - 1; index >= 0; index-- {
+		shutdownErr = errors.Join(shutdownErr, runtime.shutdownFns[index](ctx))
+	}
+
+	runtime.shutdownMu.Lock()
+	runtime.shutdownErr = shutdownErr
+	runtime.shutdownMu.Unlock()
+
+	return shutdownErr
+}
+
+// RecordInternalHTTPRequest records one internal HTTP request outcome.
+func (runtime *Runtime) RecordInternalHTTPRequest(ctx context.Context, attrs []attribute.KeyValue, duration time.Duration) {
+	if runtime == nil {
+		return
+	}
+
+	options := metric.WithAttributes(attrs...)
+	runtime.internalHTTPRequests.Add(normalizeContext(ctx), 1, options)
+	runtime.internalHTTPDuration.Record(normalizeContext(ctx), duration.Seconds()*1000, options)
+}
+
+// RecordRegisterRuntimeOutcome records one terminal outcome of the
+// register-runtime operation.
+func (runtime *Runtime) RecordRegisterRuntimeOutcome(ctx context.Context, outcome, errorCode string) {
+	if runtime == nil || runtime.registerRuntimeOutcomes == nil {
+		return
+	}
+	runtime.registerRuntimeOutcomes.Add(normalizeContext(ctx), 1, metric.WithAttributes(
+		attribute.String("outcome", outcome),
+		attribute.String("error_code", errorCode),
+	))
+}
+
+// RecordTurnGenerationOutcome records one terminal outcome of a turn
+// generation. trigger is `scheduler` or `force`.
+func (runtime *Runtime) RecordTurnGenerationOutcome(ctx context.Context, outcome, errorCode, trigger string) {
+	if runtime == nil || runtime.turnGenerationOutcomes == nil {
+		return
+	}
+	runtime.turnGenerationOutcomes.Add(normalizeContext(ctx), 1, metric.WithAttributes(
+		attribute.String("outcome", outcome),
+		attribute.String("error_code", errorCode),
+		attribute.String("trigger", trigger),
+	))
+}
+
+// RecordCommandExecuteOutcome records one terminal outcome of a command
+// execute call.
+func (runtime *Runtime) RecordCommandExecuteOutcome(ctx context.Context, outcome, errorCode string) {
+	if runtime == nil || runtime.commandExecuteOutcomes == nil {
+		return
+	}
+	runtime.commandExecuteOutcomes.Add(normalizeContext(ctx), 1, metric.WithAttributes(
+		attribute.String("outcome", outcome),
+		attribute.String("error_code", errorCode),
+	))
+}
+
+// RecordOrderPutOutcome records one terminal outcome of an order put call.
+func (runtime *Runtime) RecordOrderPutOutcome(ctx context.Context, outcome, errorCode string) {
+	if runtime == nil || runtime.orderPutOutcomes == nil {
+		return
+	}
+	runtime.orderPutOutcomes.Add(normalizeContext(ctx), 1, metric.WithAttributes(
+		attribute.String("outcome", outcome),
+		attribute.String("error_code", errorCode),
+	))
+}
+
+// RecordReportGetOutcome records one terminal outcome of a report get
+// call.
+func (runtime *Runtime) RecordReportGetOutcome(ctx context.Context, outcome, errorCode string) {
+	if runtime == nil || runtime.reportGetOutcomes == nil {
+		return
+	}
+	runtime.reportGetOutcomes.Add(normalizeContext(ctx), 1, metric.WithAttributes(
+		attribute.String("outcome", outcome),
+		attribute.String("error_code", errorCode),
+	))
+}
+
+// RecordBanishOutcome records one terminal outcome of a banish call.
+func (runtime *Runtime) RecordBanishOutcome(ctx context.Context, outcome, errorCode string) {
+	if runtime == nil || runtime.banishOutcomes == nil {
+		return
+	}
+	runtime.banishOutcomes.Add(normalizeContext(ctx), 1, metric.WithAttributes(
+		attribute.String("outcome", outcome),
+		attribute.String("error_code", errorCode),
+	))
+}
+
+// RecordHealthEventConsumed records one consumed `runtime:health_events`
+// entry.
+func (runtime *Runtime) RecordHealthEventConsumed(ctx context.Context) {
+	if runtime == nil || runtime.healthEventsConsumed == nil {
+		return
+	}
+	runtime.healthEventsConsumed.Add(normalizeContext(ctx), 1)
+}
+
+// RecordLobbyEventPublished records one publication on `gm:lobby_events`.
+// eventType is `runtime_snapshot_update` or `game_finished`.
+func (runtime *Runtime) RecordLobbyEventPublished(ctx context.Context, eventType string) {
+	if runtime == nil || runtime.lobbyEventsPublished == nil {
+		return
+	}
+	runtime.lobbyEventsPublished.Add(normalizeContext(ctx), 1, metric.WithAttributes(
+		attribute.String("event_type", eventType),
+	))
+}
+
+// RecordNotificationPublishAttempt records one publication attempt to
+// `notification:intents`. result is `ok` or `error`.
+func (runtime *Runtime) RecordNotificationPublishAttempt(ctx context.Context, notificationType, result string) {
+	if runtime == nil || runtime.notificationPublishAttempts == nil {
+		return
+	}
+	runtime.notificationPublishAttempts.Add(normalizeContext(ctx), 1, metric.WithAttributes(
+		attribute.String("notification_type", notificationType),
+		attribute.String("result", result),
+	))
+}
+
+// RecordMembershipCacheResult records one membership cache lookup outcome.
+// result is `hit`, `miss`, or `invalidate`.
+func (runtime *Runtime) RecordMembershipCacheResult(ctx context.Context, result string) {
+	if runtime == nil || runtime.membershipCacheHits == nil {
+		return
+	}
+	runtime.membershipCacheHits.Add(normalizeContext(ctx), 1, metric.WithAttributes(
+		attribute.String("result", result),
+	))
+}
+
+// RecordEngineCall records the wall-clock duration of one engine HTTP
+// call. op is one of `init`, `status`, `turn`, `banish`, `command`,
+// `order`, `report`.
+func (runtime *Runtime) RecordEngineCall(ctx context.Context, op string, duration time.Duration) {
+	if runtime == nil || runtime.engineCallLatency == nil {
+		return
+	}
+	runtime.engineCallLatency.Record(normalizeContext(ctx), duration.Seconds()*1000, metric.WithAttributes(
+		attribute.String("op", op),
+	))
+}
+
+// RuntimeRecordsByStatusProbe reports the number of `runtime_records`
+// rows per status. The production probe wraps the runtime record store;
+// tests may pass a stub.
+type RuntimeRecordsByStatusProbe interface {
+	CountByStatus(ctx context.Context) (map[string]int, error)
+}
+
+// SchedulerDueGamesProbe reports how many runtime records are currently
+// due for a scheduler-driven turn generation.
+type SchedulerDueGamesProbe interface {
+	CountDue(ctx context.Context) (int, error)
+}
+
+// EngineVersionsTotalProbe reports how many engine_versions rows are
+// registered.
+type EngineVersionsTotalProbe interface {
+	CountVersions(ctx context.Context) (int, error)
+}
+
+// GaugeDependencies groups the collaborators required by RegisterGauges.
+type GaugeDependencies struct {
+	// RuntimeRecordsByStatus probes the per-status row count for
+	// `gamemaster.runtime_records_by_status`.
+	RuntimeRecordsByStatus RuntimeRecordsByStatusProbe
+
+	// SchedulerDueGames probes the due-now count for
+	// `gamemaster.scheduler.due_games`.
+	SchedulerDueGames SchedulerDueGamesProbe
+
+	// EngineVersionsTotal probes the engine_versions row count for
+	// `gamemaster.engine_versions_total`.
+	EngineVersionsTotal EngineVersionsTotalProbe
+
+	// Logger records non-fatal probe errors. Defaults to slog.Default
+	// when nil.
+	Logger *slog.Logger
+}
+
+// RegisterGauges installs the observable-gauge callback that reports
+// `gamemaster.runtime_records_by_status`,
+// `gamemaster.scheduler.due_games`, and
+// `gamemaster.engine_versions_total`. It is safe to call once per
+// Runtime; a second call replaces the previous registration. The runtime
+// keeps no strong reference to deps beyond the callback closure.
+//
+// The wiring layer registers the gauges once the persistence adapters
+// and scheduler probe are constructed.
+func (runtime *Runtime) RegisterGauges(deps GaugeDependencies) error {
+	if runtime == nil {
+		return errors.New("register gamemaster gauges: nil runtime")
+	}
+	if deps.RuntimeRecordsByStatus == nil {
+		return errors.New("register gamemaster gauges: nil runtime records probe")
+	}
+	if deps.SchedulerDueGames == nil {
+		return errors.New("register gamemaster gauges: nil scheduler probe")
+	}
+	if deps.EngineVersionsTotal == nil {
+		return errors.New("register gamemaster gauges: nil engine versions probe")
+	}
+
+	logger := deps.Logger
+	if logger == nil {
+		logger = slog.Default()
+	}
+
+	runtime.gaugeMu.Lock()
+	defer runtime.gaugeMu.Unlock()
+
+	if runtime.gaugeRegistration != nil {
+		_ = runtime.gaugeRegistration.Unregister()
+		runtime.gaugeRegistration = nil
+	}
+
+	callback := func(ctx context.Context, observer metric.Observer) error {
+		if counts, err := deps.RuntimeRecordsByStatus.CountByStatus(ctx); err != nil {
+			logger.WarnContext(ctx, "runtime records probe failed",
+				"err", err.Error(),
+			)
+		} else {
+			for status, count := range counts {
+				observer.ObserveInt64(runtime.runtimeRecordsByStatus, int64(count), metric.WithAttributes(
+					attribute.String("status", status),
+				))
+			}
+		}
+
+		if due, err := deps.SchedulerDueGames.CountDue(ctx); err != nil {
+			logger.WarnContext(ctx, "scheduler due games probe failed",
+				"err", err.Error(),
+			)
+		} else {
+			observer.ObserveInt64(runtime.schedulerDueGames, int64(due))
+		}
+
+		if versions, err := deps.EngineVersionsTotal.CountVersions(ctx); err != nil {
+			logger.WarnContext(ctx, "engine versions probe failed",
+				"err", err.Error(),
+			)
+		} else {
+			observer.ObserveInt64(runtime.engineVersionsTotal, int64(versions))
+		}
+
+		return nil
+	}
+
+	registration, err := runtime.meter.RegisterCallback(callback,
+		runtime.runtimeRecordsByStatus,
+		runtime.schedulerDueGames,
+		runtime.engineVersionsTotal,
+	)
+	if err != nil {
+		return fmt.Errorf("register gamemaster gauges: %w", err)
+	}
+	runtime.gaugeRegistration = registration
+
+	return nil
+}
+
+func buildRuntime(meterProvider metric.MeterProvider, tracerProvider oteltrace.TracerProvider, shutdownFns []func(context.Context) error) (*Runtime, error) {
+	meter := meterProvider.Meter(meterName)
+	runtime := &Runtime{
+		tracerProvider: tracerProvider,
+		meterProvider:  meterProvider,
+		meter:          meter,
+		shutdownFns:    append([]func(context.Context) error(nil), shutdownFns...),
+	}
+
+	internalHTTPRequests, err := meter.Int64Counter("gamemaster.internal_http.requests")
+	if err != nil {
+		return nil, fmt.Errorf("build gamemaster telemetry runtime: internal_http.requests: %w", err)
+	}
+	internalHTTPDuration, err := meter.Float64Histogram("gamemaster.internal_http.duration", metric.WithUnit("ms"))
+	if err != nil {
+		return nil, fmt.Errorf("build gamemaster telemetry runtime: internal_http.duration: %w", err)
+	}
+	runtime.internalHTTPRequests = internalHTTPRequests
+	runtime.internalHTTPDuration = internalHTTPDuration
+
+	if err := registerCounters(meter, runtime); err != nil {
+		return nil, err
+	}
+	if err := registerHistograms(meter, runtime); err != nil {
+		return nil, err
+	}
+	if err := registerObservableGauges(meter, runtime); err != nil {
+		return nil, err
+	}
+
+	return runtime, nil
+}
+
+func registerCounters(meter metric.Meter, runtime *Runtime) error {
+	specs := []struct {
+		name   string
+		target *metric.Int64Counter
+	}{
+		{"gamemaster.register_runtime.outcomes", &runtime.registerRuntimeOutcomes},
+		{"gamemaster.turn_generation.outcomes", &runtime.turnGenerationOutcomes},
+		{"gamemaster.command_execute.outcomes", &runtime.commandExecuteOutcomes},
+		{"gamemaster.order_put.outcomes", &runtime.orderPutOutcomes},
+		{"gamemaster.report_get.outcomes", &runtime.reportGetOutcomes},
+		{"gamemaster.banish.outcomes", &runtime.banishOutcomes},
+		{"gamemaster.health_events.consumed", &runtime.healthEventsConsumed},
+		{"gamemaster.lobby_events.published", &runtime.lobbyEventsPublished},
+		{"gamemaster.notification.publish_attempts",
&runtime.notificationPublishAttempts}, + {"gamemaster.membership_cache.hits", &runtime.membershipCacheHits}, + } + for _, spec := range specs { + counter, err := meter.Int64Counter(spec.name) + if err != nil { + return fmt.Errorf("build gamemaster telemetry runtime: %s: %w", spec.name, err) + } + *spec.target = counter + } + return nil +} + +func registerHistograms(meter metric.Meter, runtime *Runtime) error { + specs := []struct { + name string + unit string + target *metric.Float64Histogram + }{ + {"gamemaster.engine_call.latency", "ms", &runtime.engineCallLatency}, + } + for _, spec := range specs { + options := []metric.Float64HistogramOption{} + if spec.unit != "" { + options = append(options, metric.WithUnit(spec.unit)) + } + histogram, err := meter.Float64Histogram(spec.name, options...) + if err != nil { + return fmt.Errorf("build gamemaster telemetry runtime: %s: %w", spec.name, err) + } + *spec.target = histogram + } + return nil +} + +func registerObservableGauges(meter metric.Meter, runtime *Runtime) error { + gauge, err := meter.Int64ObservableGauge("gamemaster.runtime_records_by_status") + if err != nil { + return fmt.Errorf("build gamemaster telemetry runtime: runtime_records_by_status: %w", err) + } + runtime.runtimeRecordsByStatus = gauge + + due, err := meter.Int64ObservableGauge("gamemaster.scheduler.due_games") + if err != nil { + return fmt.Errorf("build gamemaster telemetry runtime: scheduler.due_games: %w", err) + } + runtime.schedulerDueGames = due + + versions, err := meter.Int64ObservableGauge("gamemaster.engine_versions_total") + if err != nil { + return fmt.Errorf("build gamemaster telemetry runtime: engine_versions_total: %w", err) + } + runtime.engineVersionsTotal = versions + + return nil +} + +func newTracerProvider(ctx context.Context, res *resource.Resource, cfg ProcessConfig) (*sdktrace.TracerProvider, error) { + options := []sdktrace.TracerProviderOption{ + sdktrace.WithResource(res), + } + + if exporter, err := 
traceExporter(ctx, cfg); err != nil { + return nil, err + } else if exporter != nil { + options = append(options, sdktrace.WithBatcher(exporter)) + } + + if cfg.StdoutTracesEnabled { + exporter, err := stdouttrace.New(stdouttrace.WithWriter(os.Stdout)) + if err != nil { + return nil, fmt.Errorf("stdout traces exporter: %w", err) + } + options = append(options, sdktrace.WithBatcher(exporter)) + } + + return sdktrace.NewTracerProvider(options...), nil +} + +func newMeterProvider(ctx context.Context, res *resource.Resource, cfg ProcessConfig) (*sdkmetric.MeterProvider, error) { + options := []sdkmetric.Option{ + sdkmetric.WithResource(res), + } + + if exporter, err := metricExporter(ctx, cfg); err != nil { + return nil, err + } else if exporter != nil { + options = append(options, sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter))) + } + + if cfg.StdoutMetricsEnabled { + exporter, err := stdoutmetric.New(stdoutmetric.WithWriter(os.Stdout)) + if err != nil { + return nil, fmt.Errorf("stdout metrics exporter: %w", err) + } + options = append(options, sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter))) + } + + return sdkmetric.NewMeterProvider(options...), nil +} + +func traceExporter(ctx context.Context, cfg ProcessConfig) (sdktrace.SpanExporter, error) { + if cfg.TracesExporter != processExporterOTLP { + return nil, nil + } + + switch normalizeProtocol(cfg.TracesProtocol) { + case processProtocolGRPC: + exporter, err := otlptracegrpc.New(ctx) + if err != nil { + return nil, fmt.Errorf("otlp grpc traces exporter: %w", err) + } + return exporter, nil + default: + exporter, err := otlptracehttp.New(ctx) + if err != nil { + return nil, fmt.Errorf("otlp http traces exporter: %w", err) + } + return exporter, nil + } +} + +func metricExporter(ctx context.Context, cfg ProcessConfig) (sdkmetric.Exporter, error) { + if cfg.MetricsExporter != processExporterOTLP { + return nil, nil + } + + switch normalizeProtocol(cfg.MetricsProtocol) { + case 
processProtocolGRPC: + exporter, err := otlpmetricgrpc.New(ctx) + if err != nil { + return nil, fmt.Errorf("otlp grpc metrics exporter: %w", err) + } + return exporter, nil + default: + exporter, err := otlpmetrichttp.New(ctx) + if err != nil { + return nil, fmt.Errorf("otlp http metrics exporter: %w", err) + } + return exporter, nil + } +} + +func normalizeProtocol(value string) string { + switch strings.TrimSpace(value) { + case processProtocolGRPC: + return processProtocolGRPC + default: + return processProtocolHTTPProtobuf + } +} + +func normalizeContext(ctx context.Context) context.Context { + if ctx == nil { + return context.Background() + } + + return ctx +} diff --git a/gamemaster/internal/telemetry/runtime_test.go b/gamemaster/internal/telemetry/runtime_test.go new file mode 100644 index 0000000..2307228 --- /dev/null +++ b/gamemaster/internal/telemetry/runtime_test.go @@ -0,0 +1,190 @@ +package telemetry + +import ( + "context" + "testing" + "time" + + "github.com/stretchr/testify/require" + "go.opentelemetry.io/otel/attribute" + "go.opentelemetry.io/otel/sdk/metric" + "go.opentelemetry.io/otel/sdk/metric/metricdata" +) + +func TestProcessConfigValidate(t *testing.T) { + t.Parallel() + + require.NoError(t, ProcessConfig{ + TracesExporter: "none", + MetricsExporter: "none", + }.Validate()) + + require.NoError(t, ProcessConfig{ + TracesExporter: "otlp", + MetricsExporter: "otlp", + TracesProtocol: "grpc", + MetricsProtocol: "http/protobuf", + }.Validate()) + + require.Error(t, ProcessConfig{ + TracesExporter: "stdout", + MetricsExporter: "none", + }.Validate()) + + require.Error(t, ProcessConfig{ + TracesExporter: "none", + MetricsExporter: "kafka", + }.Validate()) + + require.Error(t, ProcessConfig{ + TracesExporter: "otlp", + MetricsExporter: "none", + TracesProtocol: "thrift", + }.Validate()) +} + +func TestNewWithProvidersBuildsRuntime(t *testing.T) { + t.Parallel() + + reader := metric.NewManualReader() + meterProvider := 
metric.NewMeterProvider(metric.WithReader(reader)) + + runtime, err := NewWithProviders(meterProvider, nil) + require.NoError(t, err) + require.NotNil(t, runtime) + require.NotNil(t, runtime.MeterProvider()) + require.NotNil(t, runtime.TracerProvider()) +} + +func TestRecordHelpersEmitInstruments(t *testing.T) { + t.Parallel() + + reader := metric.NewManualReader() + meterProvider := metric.NewMeterProvider(metric.WithReader(reader)) + runtime, err := NewWithProviders(meterProvider, nil) + require.NoError(t, err) + + ctx := context.Background() + + runtime.RecordInternalHTTPRequest(ctx, []attribute.KeyValue{ + attribute.String("route", "/healthz"), + attribute.String("method", "GET"), + attribute.String("status_code", "200"), + }, 10*time.Millisecond) + runtime.RecordRegisterRuntimeOutcome(ctx, "success", "") + runtime.RecordTurnGenerationOutcome(ctx, "success", "", "scheduler") + runtime.RecordCommandExecuteOutcome(ctx, "success", "") + runtime.RecordOrderPutOutcome(ctx, "success", "") + runtime.RecordReportGetOutcome(ctx, "success", "") + runtime.RecordBanishOutcome(ctx, "success", "") + runtime.RecordHealthEventConsumed(ctx) + runtime.RecordLobbyEventPublished(ctx, "runtime_snapshot_update") + runtime.RecordNotificationPublishAttempt(ctx, "game.turn.ready", "ok") + runtime.RecordMembershipCacheResult(ctx, "hit") + runtime.RecordEngineCall(ctx, "init", 25*time.Millisecond) + + var rm metricdata.ResourceMetrics + require.NoError(t, reader.Collect(ctx, &rm)) + + names := collectInstrumentNames(rm) + expected := []string{ + "gamemaster.internal_http.requests", + "gamemaster.internal_http.duration", + "gamemaster.register_runtime.outcomes", + "gamemaster.turn_generation.outcomes", + "gamemaster.command_execute.outcomes", + "gamemaster.order_put.outcomes", + "gamemaster.report_get.outcomes", + "gamemaster.banish.outcomes", + "gamemaster.health_events.consumed", + "gamemaster.lobby_events.published", + "gamemaster.notification.publish_attempts", + 
"gamemaster.membership_cache.hits", + "gamemaster.engine_call.latency", + } + for _, name := range expected { + require.Contains(t, names, name, "expected instrument %s to be recorded", name) + } +} + +func collectInstrumentNames(rm metricdata.ResourceMetrics) map[string]struct{} { + names := make(map[string]struct{}) + for _, sm := range rm.ScopeMetrics { + for _, m := range sm.Metrics { + names[m.Name] = struct{}{} + } + } + return names +} + +type stubRuntimeProbe struct { + counts map[string]int + err error +} + +func (probe stubRuntimeProbe) CountByStatus(_ context.Context) (map[string]int, error) { + return probe.counts, probe.err +} + +type stubSchedulerProbe struct { + due int + err error +} + +func (probe stubSchedulerProbe) CountDue(_ context.Context) (int, error) { + return probe.due, probe.err +} + +type stubVersionsProbe struct { + count int + err error +} + +func (probe stubVersionsProbe) CountVersions(_ context.Context) (int, error) { + return probe.count, probe.err +} + +func TestRegisterGaugesEmitsObservations(t *testing.T) { + t.Parallel() + + reader := metric.NewManualReader() + meterProvider := metric.NewMeterProvider(metric.WithReader(reader)) + runtime, err := NewWithProviders(meterProvider, nil) + require.NoError(t, err) + + require.NoError(t, runtime.RegisterGauges(GaugeDependencies{ + RuntimeRecordsByStatus: stubRuntimeProbe{counts: map[string]int{"running": 3}}, + SchedulerDueGames: stubSchedulerProbe{due: 2}, + EngineVersionsTotal: stubVersionsProbe{count: 5}, + })) + + var rm metricdata.ResourceMetrics + require.NoError(t, reader.Collect(context.Background(), &rm)) + + names := collectInstrumentNames(rm) + require.Contains(t, names, "gamemaster.runtime_records_by_status") + require.Contains(t, names, "gamemaster.scheduler.due_games") + require.Contains(t, names, "gamemaster.engine_versions_total") +} + +func TestRegisterGaugesRejectsNilDependencies(t *testing.T) { + t.Parallel() + + reader := metric.NewManualReader() + meterProvider := 
metric.NewMeterProvider(metric.WithReader(reader)) + runtime, err := NewWithProviders(meterProvider, nil) + require.NoError(t, err) + + require.Error(t, runtime.RegisterGauges(GaugeDependencies{ + SchedulerDueGames: stubSchedulerProbe{}, + EngineVersionsTotal: stubVersionsProbe{}, + })) + require.Error(t, runtime.RegisterGauges(GaugeDependencies{ + RuntimeRecordsByStatus: stubRuntimeProbe{}, + EngineVersionsTotal: stubVersionsProbe{}, + })) + require.Error(t, runtime.RegisterGauges(GaugeDependencies{ + RuntimeRecordsByStatus: stubRuntimeProbe{}, + SchedulerDueGames: stubSchedulerProbe{}, + })) +} diff --git a/gamemaster/internal/worker/healtheventsconsumer/worker.go b/gamemaster/internal/worker/healtheventsconsumer/worker.go new file mode 100644 index 0000000..9977c24 --- /dev/null +++ b/gamemaster/internal/worker/healtheventsconsumer/worker.go @@ -0,0 +1,556 @@ +// Package healtheventsconsumer implements the worker that consumes +// `runtime:health_events` from Runtime Manager and propagates engine +// health observations into Game Master state. +// +// On every consumed entry the worker: +// +// 1. Updates `runtime_records.engine_health` per game with a short +// summary string (`healthy`, `probe_failed`, `inspect_unhealthy`, +// `exited`, `oom`, `disappeared`). +// 2. For terminal container events (`container_exited`, +// `container_oom`, `container_disappeared`) attempts a +// compare-and-swap `running → engine_unreachable`. For +// `probe_recovered` attempts the symmetric recovery CAS +// `engine_unreachable → running`. Both transitions are pre-declared +// in `domain/runtime/transitions.go`. CAS conflicts (record not in +// the expected source status) fall back to a health-only update so +// the summary stays current even when another flow (turn +// generation, admin op) holds the status. +// 3. 
Publishes a `runtime_snapshot_update` on `gm:lobby_events` only
+// when the status transitioned or when the engine-health summary
+// differs from the previously emitted one for the same game. The
+// last-emitted summary is tracked in process memory; on restart
+// the cache is empty and the first event per game produces one
+// snapshot.
+//
+// The XREAD loop, offset handling, and shutdown semantics mirror the
+// Lobby `gmevents` consumer at `lobby/internal/worker/gmevents`.
+package healtheventsconsumer
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"log/slog"
+	"strconv"
+	"strings"
+	"sync"
+	"time"
+
+	"galaxy/gamemaster/internal/domain/runtime"
+	"galaxy/gamemaster/internal/logging"
+	"galaxy/gamemaster/internal/ports"
+	"galaxy/gamemaster/internal/telemetry"
+
+	"github.com/redis/go-redis/v9"
+)
+
+// Wire field names on the `runtime:health_events` Redis Stream entry,
+// fixed by `rtmanager/api/runtime-health-asyncapi.yaml`. Renaming any
+// of them breaks the contract.
+const (
+	fieldGameID       = "game_id"
+	fieldEventType    = "event_type"
+	fieldOccurredAtMS = "occurred_at_ms"
+)
+
+// RTM event-type values per
+// `rtmanager/internal/domain/health/snapshot.go`. Stage 18 maps all
+// seven; the PLAN enumerates a shorter list, with container_started
+// and probe_recovered handled here in addition.
+const (
+	eventTypeContainerStarted     = "container_started"
+	eventTypeProbeRecovered       = "probe_recovered"
+	eventTypeProbeFailed          = "probe_failed"
+	eventTypeInspectUnhealthy     = "inspect_unhealthy"
+	eventTypeContainerExited      = "container_exited"
+	eventTypeContainerOOM         = "container_oom"
+	eventTypeContainerDisappeared = "container_disappeared"
+)
+
+// engine_health summary strings written to `runtime_records.engine_health`.
+const ( + summaryHealthy = "healthy" + summaryProbeFailed = "probe_failed" + summaryInspectUnhealthy = "inspect_unhealthy" + summaryExited = "exited" + summaryOOM = "oom" + summaryDisappeared = "disappeared" +) + +// snapshotEventType is the discriminator written by +// `LobbyEventsPublisher.PublishSnapshotUpdate` and recorded on the +// `gamemaster.lobby_events.published` counter. +const snapshotEventType = "runtime_snapshot_update" + +// Dependencies groups the collaborators required by Worker. +type Dependencies struct { + // Client provides XREAD access to the runtime:health_events stream. + Client *redis.Client + + // Stream stores the Redis Streams key consumed by the worker + // (typically `runtime:health_events`). + Stream string + + // StreamLabel identifies the consumer in the stream-offset store. + // Defaults to `health_events` when empty. + StreamLabel string + + // BlockTimeout bounds the blocking XREAD window. Required positive. + BlockTimeout time.Duration + + // OffsetStore persists the last successfully processed entry id. + OffsetStore ports.StreamOffsetStore + + // RuntimeRecords is mutated on every observation. + RuntimeRecords ports.RuntimeRecordStore + + // LobbyEvents publishes the debounced `runtime_snapshot_update` + // messages that propagate health summary changes to Game Lobby. + LobbyEvents ports.LobbyEventsPublisher + + // Telemetry receives one consumed-event count per processed entry + // and one published-event count per emitted snapshot. Required. + Telemetry *telemetry.Runtime + + // Clock supplies the wall-clock used for store updates and for + // `RuntimeSnapshotUpdate.OccurredAt`. Defaults to `time.Now` when + // nil. + Clock func() time.Time + + // Logger receives structured worker-level events. Defaults to + // `slog.Default()` when nil. + Logger *slog.Logger +} + +// defaultStreamLabel is used when Dependencies.StreamLabel is empty. 
+const defaultStreamLabel = "health_events" + +// Worker drives the runtime:health_events processing loop. +type Worker struct { + client *redis.Client + stream string + streamLabel string + blockTimeout time.Duration + offsetStore ports.StreamOffsetStore + runtimeRecords ports.RuntimeRecordStore + lobbyEvents ports.LobbyEventsPublisher + telemetry *telemetry.Runtime + clock func() time.Time + logger *slog.Logger + + mu sync.RWMutex + lastEmittedSummary map[string]string +} + +// NewWorker constructs one Worker from deps. +func NewWorker(deps Dependencies) (*Worker, error) { + switch { + case deps.Client == nil: + return nil, errors.New("new health events consumer: nil redis client") + case strings.TrimSpace(deps.Stream) == "": + return nil, errors.New("new health events consumer: stream must not be empty") + case deps.BlockTimeout <= 0: + return nil, errors.New("new health events consumer: block timeout must be positive") + case deps.OffsetStore == nil: + return nil, errors.New("new health events consumer: nil offset store") + case deps.RuntimeRecords == nil: + return nil, errors.New("new health events consumer: nil runtime records store") + case deps.LobbyEvents == nil: + return nil, errors.New("new health events consumer: nil lobby events publisher") + case deps.Telemetry == nil: + return nil, errors.New("new health events consumer: nil telemetry runtime") + } + + streamLabel := strings.TrimSpace(deps.StreamLabel) + if streamLabel == "" { + streamLabel = defaultStreamLabel + } + clock := deps.Clock + if clock == nil { + clock = time.Now + } + logger := deps.Logger + if logger == nil { + logger = slog.Default() + } + + return &Worker{ + client: deps.Client, + stream: deps.Stream, + streamLabel: streamLabel, + blockTimeout: deps.BlockTimeout, + offsetStore: deps.OffsetStore, + runtimeRecords: deps.RuntimeRecords, + lobbyEvents: deps.LobbyEvents, + telemetry: deps.Telemetry, + clock: clock, + logger: logger.With("worker", "gamemaster.healtheventsconsumer", 
"stream", deps.Stream),
+		lastEmittedSummary: make(map[string]string),
+	}, nil
+}
+
+// Run drives the XREAD loop until ctx is cancelled. The offset advances
+// only after a successful HandleMessage call. The loop exits on context
+// cancellation or a fatal Redis error.
+func (worker *Worker) Run(ctx context.Context) error {
+	if worker == nil {
+		return errors.New("run health events consumer: nil worker")
+	}
+	if ctx == nil {
+		return errors.New("run health events consumer: nil context")
+	}
+	if err := ctx.Err(); err != nil {
+		return err
+	}
+
+	lastID, found, err := worker.offsetStore.Load(ctx, worker.streamLabel)
+	if err != nil {
+		return fmt.Errorf("run health events consumer: load offset: %w", err)
+	}
+	if !found {
+		lastID = "0-0"
+	}
+
+	worker.logger.Info("health events consumer started",
+		"block_timeout", worker.blockTimeout.String(),
+		"start_entry_id", lastID,
+	)
+	defer worker.logger.Info("health events consumer stopped")
+
+	for {
+		streams, err := worker.client.XRead(ctx, &redis.XReadArgs{
+			Streams: []string{worker.stream, lastID},
+			Count:   1,
+			Block:   worker.blockTimeout,
+		}).Result()
+		switch {
+		case err == nil:
+			for _, stream := range streams {
+				for _, message := range stream.Messages {
+					if !worker.HandleMessage(ctx, message) {
+						continue
+					}
+					if err := worker.offsetStore.Save(ctx, worker.streamLabel, message.ID); err != nil {
+						return fmt.Errorf("run health events consumer: save offset: %w", err)
+					}
+					lastID = message.ID
+				}
+			}
+		case errors.Is(err, redis.Nil):
+			continue
+		case ctx.Err() != nil && (errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) || errors.Is(err, redis.ErrClosed)):
+			return ctx.Err()
+		default:
+			return fmt.Errorf("run health events consumer: %w", err)
+		}
+	}
+}
+
+// Shutdown is a no-op; the worker relies on
context cancellation.
+func (worker *Worker) Shutdown(ctx context.Context) error {
+	if ctx == nil {
+		return errors.New("shutdown health events consumer: nil context")
+	}
+	return nil
+}
+
+// HandleMessage processes one Redis Stream entry and reports whether
+// the offset is allowed to advance. Decode errors, unknown event types,
+// and orphan game ids return true so the offset advances past the
+// entry. In the current implementation every store and publisher error
+// is logged and absorbed, so a non-nil worker always returns true and
+// the offset advances once the entry has been observed.
+//
+// Exported so tests can drive the worker deterministically without
+// spinning up a real XREAD loop.
+func (worker *Worker) HandleMessage(ctx context.Context, message redis.XMessage) bool {
+	if worker == nil {
+		return false
+	}
+
+	event, err := decodeEvent(message)
+	if err != nil {
+		worker.logger.WarnContext(ctx, "decode runtime health event",
+			"stream_entry_id", message.ID,
+			"err", err.Error(),
+		)
+		worker.telemetry.RecordHealthEventConsumed(ctx)
+		return true
+	}
+
+	plan, ok := planFor(event.EventType)
+	if !ok {
+		worker.logger.WarnContext(ctx, "unknown runtime health event type",
+			"stream_entry_id", message.ID,
+			"game_id", event.GameID,
+			"event_type", event.EventType,
+		)
+		worker.telemetry.RecordHealthEventConsumed(ctx)
+		return true
+	}
+
+	now := worker.clock().UTC()
+
+	current, err := worker.runtimeRecords.Get(ctx, event.GameID)
+	if err != nil {
+		if errors.Is(err, runtime.ErrNotFound) {
+			worker.logger.WarnContext(ctx, "runtime health event for unknown game",
+				"stream_entry_id", message.ID,
+				"game_id", event.GameID,
+				"event_type", event.EventType,
+			)
+			worker.telemetry.RecordHealthEventConsumed(ctx)
+			return true
+		}
+		worker.logger.WarnContext(ctx, "load runtime record for health event",
+			"stream_entry_id", message.ID,
+			"game_id", event.GameID,
+			"err", err.Error(),
+		)
+		worker.telemetry.RecordHealthEventConsumed(ctx)
+		return true
+	}
+
+	statusChanged := 
worker.applyMutation(ctx, message.ID, current, plan, now) + + if !worker.shouldPublish(event.GameID, plan.summary, statusChanged) { + worker.telemetry.RecordHealthEventConsumed(ctx) + return true + } + + refreshed, err := worker.runtimeRecords.Get(ctx, event.GameID) + if err != nil { + worker.logger.WarnContext(ctx, "reload runtime record for snapshot", + "stream_entry_id", message.ID, + "game_id", event.GameID, + "err", err.Error(), + ) + worker.telemetry.RecordHealthEventConsumed(ctx) + return true + } + + snapshot := ports.RuntimeSnapshotUpdate{ + GameID: refreshed.GameID, + CurrentTurn: refreshed.CurrentTurn, + RuntimeStatus: refreshed.Status, + EngineHealthSummary: refreshed.EngineHealth, + PlayerTurnStats: nil, + OccurredAt: now, + } + if err := worker.lobbyEvents.PublishSnapshotUpdate(ctx, snapshot); err != nil { + logArgs := []any{ + "stream_entry_id", message.ID, + "game_id", event.GameID, + "err", err.Error(), + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + worker.logger.WarnContext(ctx, "publish runtime snapshot update", logArgs...) + worker.telemetry.RecordHealthEventConsumed(ctx) + return true + } + worker.telemetry.RecordLobbyEventPublished(ctx, snapshotEventType) + worker.rememberSummary(event.GameID, plan.summary) + worker.telemetry.RecordHealthEventConsumed(ctx) + return true +} + +// applyMutation applies the plan to the runtime record. When plan.transition +// is set, the worker first attempts a CAS UpdateStatus from the expected +// source status; on conflict or invalid-transition it falls back to a +// health-only UpdateEngineHealth. When plan.transition is nil only +// UpdateEngineHealth runs. Returns true when the status was actually +// transitioned. 
+func (worker *Worker) applyMutation( + ctx context.Context, + entryID string, + current runtime.RuntimeRecord, + plan eventPlan, + now time.Time, +) bool { + if plan.transition != nil { + summary := plan.summary + err := worker.runtimeRecords.UpdateStatus(ctx, ports.UpdateStatusInput{ + GameID: current.GameID, + ExpectedFrom: plan.transition.from, + To: plan.transition.to, + Now: now, + EngineHealthSummary: &summary, + }) + switch { + case err == nil: + worker.logger.InfoContext(ctx, "runtime status transitioned by health event", + "stream_entry_id", entryID, + "game_id", current.GameID, + "from_status", string(plan.transition.from), + "to_status", string(plan.transition.to), + "engine_health", plan.summary, + ) + return true + case errors.Is(err, runtime.ErrConflict), errors.Is(err, runtime.ErrInvalidTransition): + worker.logger.DebugContext(ctx, "runtime status CAS conflict, falling back to health-only update", + "stream_entry_id", entryID, + "game_id", current.GameID, + "current_status", string(current.Status), + "expected_from", string(plan.transition.from), + "engine_health", plan.summary, + ) + default: + worker.logger.WarnContext(ctx, "update runtime status from health event", + "stream_entry_id", entryID, + "game_id", current.GameID, + "err", err.Error(), + ) + return false + } + } + + if err := worker.runtimeRecords.UpdateEngineHealth(ctx, ports.UpdateEngineHealthInput{ + GameID: current.GameID, + EngineHealthSummary: plan.summary, + Now: now, + }); err != nil && !errors.Is(err, runtime.ErrNotFound) { + worker.logger.WarnContext(ctx, "update runtime engine health", + "stream_entry_id", entryID, + "game_id", current.GameID, + "err", err.Error(), + ) + } + return false +} + +// shouldPublish returns whether a snapshot must be emitted: either the +// status changed in this iteration, or the engine_health summary +// differs from the last summary published for this game. 
+func (worker *Worker) shouldPublish(gameID, summary string, statusChanged bool) bool { + if statusChanged { + return true + } + worker.mu.RLock() + last, ok := worker.lastEmittedSummary[gameID] + worker.mu.RUnlock() + if !ok { + return true + } + return last != summary +} + +// rememberSummary stores the latest published summary for gameID. +func (worker *Worker) rememberSummary(gameID, summary string) { + worker.mu.Lock() + worker.lastEmittedSummary[gameID] = summary + worker.mu.Unlock() +} + +// healthEvent stores the decoded XADD entry shared across handlers. +type healthEvent struct { + GameID string + EventType string + OccurredAt time.Time +} + +// decodeEvent parses a Redis Stream message into a healthEvent. Missing +// or malformed required fields produce an error. +func decodeEvent(message redis.XMessage) (healthEvent, error) { + gameID := optionalString(message.Values, fieldGameID) + if strings.TrimSpace(gameID) == "" { + return healthEvent{}, errors.New("missing game_id") + } + eventType := optionalString(message.Values, fieldEventType) + if strings.TrimSpace(eventType) == "" { + return healthEvent{}, errors.New("missing event_type") + } + occurredAtMSRaw := optionalString(message.Values, fieldOccurredAtMS) + if strings.TrimSpace(occurredAtMSRaw) == "" { + return healthEvent{}, errors.New("missing occurred_at_ms") + } + occurredAtMS, err := strconv.ParseInt(occurredAtMSRaw, 10, 64) + if err != nil { + return healthEvent{}, fmt.Errorf("invalid occurred_at_ms: %w", err) + } + if occurredAtMS <= 0 { + return healthEvent{}, errors.New("invalid occurred_at_ms: must be positive") + } + return healthEvent{ + GameID: gameID, + EventType: eventType, + OccurredAt: time.UnixMilli(occurredAtMS).UTC(), + }, nil +} + +// transitionPlan encodes one allowed CAS pair. nil-transition events +// only update the summary. +type transitionPlan struct { + from runtime.Status + to runtime.Status +} + +// eventPlan is the decoded reaction to one event_type. 
+type eventPlan struct { + summary string + transition *transitionPlan +} + +// planFor returns the eventPlan registered for eventType. The boolean +// reports whether the type is recognised. +func planFor(eventType string) (eventPlan, bool) { + switch eventType { + case eventTypeContainerStarted: + return eventPlan{summary: summaryHealthy}, true + case eventTypeProbeRecovered: + return eventPlan{ + summary: summaryHealthy, + transition: &transitionPlan{ + from: runtime.StatusEngineUnreachable, + to: runtime.StatusRunning, + }, + }, true + case eventTypeProbeFailed: + return eventPlan{summary: summaryProbeFailed}, true + case eventTypeInspectUnhealthy: + return eventPlan{summary: summaryInspectUnhealthy}, true + case eventTypeContainerExited: + return eventPlan{ + summary: summaryExited, + transition: &transitionPlan{ + from: runtime.StatusRunning, + to: runtime.StatusEngineUnreachable, + }, + }, true + case eventTypeContainerOOM: + return eventPlan{ + summary: summaryOOM, + transition: &transitionPlan{ + from: runtime.StatusRunning, + to: runtime.StatusEngineUnreachable, + }, + }, true + case eventTypeContainerDisappeared: + return eventPlan{ + summary: summaryDisappeared, + transition: &transitionPlan{ + from: runtime.StatusRunning, + to: runtime.StatusEngineUnreachable, + }, + }, true + default: + return eventPlan{}, false + } +} + +func optionalString(values map[string]any, key string) string { + raw, ok := values[key] + if !ok { + return "" + } + switch typed := raw.(type) { + case string: + return typed + case []byte: + return string(typed) + default: + return "" + } +} diff --git a/gamemaster/internal/worker/healtheventsconsumer/worker_test.go b/gamemaster/internal/worker/healtheventsconsumer/worker_test.go new file mode 100644 index 0000000..bd1f2eb --- /dev/null +++ b/gamemaster/internal/worker/healtheventsconsumer/worker_test.go @@ -0,0 +1,636 @@ +package healtheventsconsumer_test + +import ( + "context" + "errors" + "strconv" + "sync" + "testing" + 
"time" + + "galaxy/gamemaster/internal/adapters/mocks" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/telemetry" + "galaxy/gamemaster/internal/worker/healtheventsconsumer" + + "github.com/alicebob/miniredis/v2" + "github.com/redis/go-redis/v9" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + "go.uber.org/mock/gomock" +) + +const ( + testStream = "runtime:health_events" + testLabel = "health_events" +) + +func newTestTelemetry(t *testing.T) *telemetry.Runtime { + t.Helper() + tm, err := telemetry.NewWithProviders(nil, nil) + require.NoError(t, err) + return tm +} + +// runningRecord builds a runtime_records row in `running` with a known +// engine_health value. The seed simplifies expectations on Get reads. +func runningRecord(gameID, health string) runtime.RuntimeRecord { + created := time.Date(2026, time.May, 1, 12, 0, 0, 0, time.UTC) + startedAt := created.Add(time.Second) + nextGen := created.Add(time.Hour) + return runtime.RuntimeRecord{ + GameID: gameID, + Status: runtime.StatusRunning, + EngineEndpoint: "http://galaxy-game-" + gameID + ":8080", + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.3", + CurrentEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + CurrentTurn: 5, + NextGenerationAt: &nextGen, + EngineHealth: health, + CreatedAt: created, + UpdatedAt: startedAt, + StartedAt: &startedAt, + } +} + +func unreachableRecord(gameID, health string) runtime.RuntimeRecord { + rec := runningRecord(gameID, health) + rec.Status = runtime.StatusEngineUnreachable + return rec +} + +// withSummary returns a copy of rec with EngineHealth replaced. +func withSummary(rec runtime.RuntimeRecord, summary string) runtime.RuntimeRecord { + rec.EngineHealth = summary + return rec +} + +// withStatus returns a copy of rec with Status replaced. 
+func withStatus(rec runtime.RuntimeRecord, status runtime.Status) runtime.RuntimeRecord {
+	rec.Status = status
+	return rec
+}
+
+// xMessage builds a redis.XMessage with the wire field layout used by
+// RTM's healtheventspublisher.
+func xMessage(id, gameID, eventType string, occurredAt time.Time) redis.XMessage {
+	return redis.XMessage{
+		ID: id,
+		Values: map[string]any{
+			"game_id":        gameID,
+			"event_type":     eventType,
+			"occurred_at_ms": strconv.FormatInt(occurredAt.UnixMilli(), 10),
+			"details":        "{}",
+		},
+	}
+}
+
+// harness bundles the worker under test with its mocked collaborators.
+// The collaborator pointers are gomock mocks; the gomock.Controller is
+// owned by the test.
+type harness struct {
+	worker      *healtheventsconsumer.Worker
+	store       *mocks.MockRuntimeRecordStore
+	publisher   *mocks.MockLobbyEventsPublisher
+	offsetStore *mocks.MockStreamOffsetStore
+	now         time.Time
+}
+
+// newHarness constructs a worker wired to fresh mocks and a fixed clock.
+func newHarness(t *testing.T, ctrl *gomock.Controller) *harness {
+	t.Helper()
+	now := time.Date(2026, time.May, 1, 13, 0, 0, 0, time.UTC)
+	store := mocks.NewMockRuntimeRecordStore(ctrl)
+	publisher := mocks.NewMockLobbyEventsPublisher(ctrl)
+	offsetStore := mocks.NewMockStreamOffsetStore(ctrl)
+	telem := newTestTelemetry(t)
+	worker, err := healtheventsconsumer.NewWorker(healtheventsconsumer.Dependencies{
+		Client:         redis.NewClient(&redis.Options{Addr: "127.0.0.1:0"}),
+		Stream:         testStream,
+		StreamLabel:    testLabel,
+		BlockTimeout:   100 * time.Millisecond,
+		OffsetStore:    offsetStore,
+		RuntimeRecords: store,
+		LobbyEvents:    publisher,
+		Telemetry:      telem,
+		Clock:          func() time.Time { return now },
+	})
+	require.NoError(t, err)
+	return &harness{
+		worker:      worker,
+		store:       store,
+		publisher:   publisher,
+		offsetStore: offsetStore,
+		now:         now,
+	}
+}
+
+// TestNewWorkerValidates exercises every required-dep branch.
+func TestNewWorkerValidates(t *testing.T) { + telem := newTestTelemetry(t) + client := redis.NewClient(&redis.Options{Addr: "127.0.0.1:0"}) + cases := []struct { + name string + mut func(*healtheventsconsumer.Dependencies) + }{ + {"client", func(d *healtheventsconsumer.Dependencies) { d.Client = nil }}, + {"stream", func(d *healtheventsconsumer.Dependencies) { d.Stream = " " }}, + {"block timeout", func(d *healtheventsconsumer.Dependencies) { d.BlockTimeout = 0 }}, + {"offset store", func(d *healtheventsconsumer.Dependencies) { d.OffsetStore = nil }}, + {"runtime records", func(d *healtheventsconsumer.Dependencies) { d.RuntimeRecords = nil }}, + {"lobby events", func(d *healtheventsconsumer.Dependencies) { d.LobbyEvents = nil }}, + {"telemetry", func(d *healtheventsconsumer.Dependencies) { d.Telemetry = nil }}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + ctrl := gomock.NewController(t) + deps := healtheventsconsumer.Dependencies{ + Client: client, + Stream: testStream, + StreamLabel: testLabel, + BlockTimeout: time.Second, + OffsetStore: mocks.NewMockStreamOffsetStore(ctrl), + RuntimeRecords: mocks.NewMockRuntimeRecordStore(ctrl), + LobbyEvents: mocks.NewMockLobbyEventsPublisher(ctrl), + Telemetry: telem, + } + tc.mut(&deps) + worker, err := healtheventsconsumer.NewWorker(deps) + require.Error(t, err) + require.Nil(t, worker) + }) + } +} + +func TestNewWorkerDefaultsLabel(t *testing.T) { + ctrl := gomock.NewController(t) + telem := newTestTelemetry(t) + worker, err := healtheventsconsumer.NewWorker(healtheventsconsumer.Dependencies{ + Client: redis.NewClient(&redis.Options{Addr: "127.0.0.1:0"}), + Stream: testStream, + StreamLabel: "", + BlockTimeout: time.Second, + OffsetStore: mocks.NewMockStreamOffsetStore(ctrl), + RuntimeRecords: mocks.NewMockRuntimeRecordStore(ctrl), + LobbyEvents: mocks.NewMockLobbyEventsPublisher(ctrl), + Telemetry: telem, + }) + require.NoError(t, err) + require.NotNil(t, worker) +} + +// 
TestHandleMessage_ContainerExited covers a terminal event from a +// healthy `running` record: status transitions to engine_unreachable +// and a snapshot is published. +func TestHandleMessage_ContainerExited(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + gameID := "game-001" + + h.store.EXPECT().Get(gomock.Any(), gameID).Return(runningRecord(gameID, "healthy"), nil) + h.store.EXPECT().UpdateStatus(gomock.Any(), gomock.Any()).DoAndReturn( + func(_ context.Context, input ports.UpdateStatusInput) error { + require.Equal(t, runtime.StatusRunning, input.ExpectedFrom) + require.Equal(t, runtime.StatusEngineUnreachable, input.To) + require.NotNil(t, input.EngineHealthSummary) + require.Equal(t, "exited", *input.EngineHealthSummary) + return nil + }, + ) + h.store.EXPECT().Get(gomock.Any(), gameID).Return( + withStatus(withSummary(runningRecord(gameID, "healthy"), "exited"), runtime.StatusEngineUnreachable), + nil, + ) + h.publisher.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).DoAndReturn( + func(_ context.Context, snap ports.RuntimeSnapshotUpdate) error { + assert.Equal(t, gameID, snap.GameID) + assert.Equal(t, runtime.StatusEngineUnreachable, snap.RuntimeStatus) + assert.Equal(t, "exited", snap.EngineHealthSummary) + assert.Nil(t, snap.PlayerTurnStats) + assert.Equal(t, h.now, snap.OccurredAt) + return nil + }, + ) + + advance := h.worker.HandleMessage(context.Background(), xMessage("0-1", gameID, "container_exited", h.now)) + assert.True(t, advance) +} + +// TestHandleMessage_ProbeRecovered_Recovers demonstrates the symmetric +// recovery: engine_unreachable → running, summary set to healthy. 
+func TestHandleMessage_ProbeRecovered_Recovers(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + gameID := "game-001" + + h.store.EXPECT().Get(gomock.Any(), gameID).Return(unreachableRecord(gameID, "exited"), nil) + h.store.EXPECT().UpdateStatus(gomock.Any(), gomock.Any()).DoAndReturn( + func(_ context.Context, input ports.UpdateStatusInput) error { + require.Equal(t, runtime.StatusEngineUnreachable, input.ExpectedFrom) + require.Equal(t, runtime.StatusRunning, input.To) + require.NotNil(t, input.EngineHealthSummary) + require.Equal(t, "healthy", *input.EngineHealthSummary) + return nil + }, + ) + h.store.EXPECT().Get(gomock.Any(), gameID).Return( + withStatus(withSummary(unreachableRecord(gameID, "exited"), "healthy"), runtime.StatusRunning), + nil, + ) + h.publisher.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).DoAndReturn( + func(_ context.Context, snap ports.RuntimeSnapshotUpdate) error { + assert.Equal(t, runtime.StatusRunning, snap.RuntimeStatus) + assert.Equal(t, "healthy", snap.EngineHealthSummary) + return nil + }, + ) + + advance := h.worker.HandleMessage(context.Background(), xMessage("0-1", gameID, "probe_recovered", h.now)) + assert.True(t, advance) +} + +// TestHandleMessage_ContainerStarted_NoTransition asserts that +// container_started writes summary `healthy` without status mutation. 
+func TestHandleMessage_ContainerStarted_NoTransition(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + gameID := "game-001" + + h.store.EXPECT().Get(gomock.Any(), gameID).Return(runningRecord(gameID, ""), nil) + h.store.EXPECT().UpdateEngineHealth(gomock.Any(), gomock.Any()).DoAndReturn( + func(_ context.Context, input ports.UpdateEngineHealthInput) error { + assert.Equal(t, gameID, input.GameID) + assert.Equal(t, "healthy", input.EngineHealthSummary) + return nil + }, + ) + h.store.EXPECT().Get(gomock.Any(), gameID).Return(withSummary(runningRecord(gameID, ""), "healthy"), nil) + h.publisher.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil) + + advance := h.worker.HandleMessage(context.Background(), xMessage("0-1", gameID, "container_started", h.now)) + assert.True(t, advance) +} + +// TestHandleMessage_ProbeFailed covers the non-transitional path: +// summary is updated; status stays running. +func TestHandleMessage_ProbeFailed(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + gameID := "game-001" + + h.store.EXPECT().Get(gomock.Any(), gameID).Return(runningRecord(gameID, "healthy"), nil) + h.store.EXPECT().UpdateEngineHealth(gomock.Any(), gomock.Any()).Return(nil) + h.store.EXPECT().Get(gomock.Any(), gameID).Return(withSummary(runningRecord(gameID, "healthy"), "probe_failed"), nil) + h.publisher.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).DoAndReturn( + func(_ context.Context, snap ports.RuntimeSnapshotUpdate) error { + assert.Equal(t, runtime.StatusRunning, snap.RuntimeStatus) + assert.Equal(t, "probe_failed", snap.EngineHealthSummary) + return nil + }, + ) + + advance := h.worker.HandleMessage(context.Background(), xMessage("0-1", gameID, "probe_failed", h.now)) + assert.True(t, advance) +} + +// TestHandleMessage_FallsBackOnCASConflict — record is in +// generation_in_progress (not running); CAS rejects with ErrConflict and +// the worker falls back to 
UpdateEngineHealth + publishes a snapshot +// because the summary changed. +func TestHandleMessage_FallsBackOnCASConflict(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + gameID := "game-001" + + current := withStatus(runningRecord(gameID, "healthy"), runtime.StatusGenerationInProgress) + h.store.EXPECT().Get(gomock.Any(), gameID).Return(current, nil) + h.store.EXPECT().UpdateStatus(gomock.Any(), gomock.Any()).Return(runtime.ErrConflict) + h.store.EXPECT().UpdateEngineHealth(gomock.Any(), gomock.Any()).DoAndReturn( + func(_ context.Context, input ports.UpdateEngineHealthInput) error { + assert.Equal(t, "oom", input.EngineHealthSummary) + return nil + }, + ) + h.store.EXPECT().Get(gomock.Any(), gameID).Return(withSummary(current, "oom"), nil) + h.publisher.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).DoAndReturn( + func(_ context.Context, snap ports.RuntimeSnapshotUpdate) error { + assert.Equal(t, runtime.StatusGenerationInProgress, snap.RuntimeStatus, + "status must reflect the unchanged record after fallback") + assert.Equal(t, "oom", snap.EngineHealthSummary) + return nil + }, + ) + + advance := h.worker.HandleMessage(context.Background(), xMessage("0-1", gameID, "container_oom", h.now)) + assert.True(t, advance) +} + +// TestHandleMessage_DebouncesUnchangedSummary — two consecutive +// probe_failed events for the same game yield exactly one snapshot +// publication. +func TestHandleMessage_DebouncesUnchangedSummary(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + gameID := "game-001" + + // First event: store update + reload + publish. 
+ h.store.EXPECT().Get(gomock.Any(), gameID).Return(runningRecord(gameID, "healthy"), nil) + h.store.EXPECT().UpdateEngineHealth(gomock.Any(), gomock.Any()).Return(nil) + h.store.EXPECT().Get(gomock.Any(), gameID).Return(withSummary(runningRecord(gameID, "healthy"), "probe_failed"), nil) + h.publisher.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil) + + // Second event: store update happens, but no second Get and no + // publication since the summary is unchanged. + h.store.EXPECT().Get(gomock.Any(), gameID).Return(withSummary(runningRecord(gameID, "probe_failed"), "probe_failed"), nil) + h.store.EXPECT().UpdateEngineHealth(gomock.Any(), gomock.Any()).Return(nil) + + ctx := context.Background() + require.True(t, h.worker.HandleMessage(ctx, xMessage("0-1", gameID, "probe_failed", h.now))) + require.True(t, h.worker.HandleMessage(ctx, xMessage("0-2", gameID, "probe_failed", h.now))) +} + +// TestHandleMessage_OrphanGameID — Get returns ErrNotFound, no further +// store calls, no publish, offset advances. +func TestHandleMessage_OrphanGameID(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + gameID := "missing-001" + + h.store.EXPECT().Get(gomock.Any(), gameID).Return(runtime.RuntimeRecord{}, runtime.ErrNotFound) + + advance := h.worker.HandleMessage(context.Background(), xMessage("0-1", gameID, "probe_failed", h.now)) + assert.True(t, advance) +} + +// TestHandleMessage_UnknownEventType — unrecognised event type yields +// no store calls and no publication, but offset advances. +func TestHandleMessage_UnknownEventType(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + + advance := h.worker.HandleMessage(context.Background(), xMessage("0-1", "game-001", "future_event", h.now)) + assert.True(t, advance) +} + +// TestHandleMessage_MalformedOccurredAtMS — malformed wire payload is +// logged + skipped without store calls. 
+func TestHandleMessage_MalformedOccurredAtMS(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + + msg := redis.XMessage{ + ID: "0-1", + Values: map[string]any{ + "game_id": "game-001", + "event_type": "probe_failed", + "occurred_at_ms": "not-a-number", + }, + } + advance := h.worker.HandleMessage(context.Background(), msg) + assert.True(t, advance) +} + +// TestHandleMessage_MissingFields — missing required wire field is +// logged + skipped. +func TestHandleMessage_MissingFields(t *testing.T) { + cases := []struct { + name string + msg redis.XMessage + }{ + {"missing game_id", redis.XMessage{ID: "0-1", Values: map[string]any{"event_type": "probe_failed", "occurred_at_ms": "1"}}}, + {"missing event_type", redis.XMessage{ID: "0-1", Values: map[string]any{"game_id": "g", "occurred_at_ms": "1"}}}, + {"missing occurred_at_ms", redis.XMessage{ID: "0-1", Values: map[string]any{"game_id": "g", "event_type": "probe_failed"}}}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + advance := h.worker.HandleMessage(context.Background(), tc.msg) + assert.True(t, advance) + }) + } +} + +// TestHandleMessage_PublishErrorAdvancesOffset — a publisher error is +// logged and absorbed; the offset still advances so a transient hiccup +// does not stall the consumer. 
+func TestHandleMessage_PublishErrorAdvancesOffset(t *testing.T) {
+	ctrl := gomock.NewController(t)
+	h := newHarness(t, ctrl)
+	gameID := "game-001"
+
+	h.store.EXPECT().Get(gomock.Any(), gameID).Return(runningRecord(gameID, "healthy"), nil)
+	h.store.EXPECT().UpdateEngineHealth(gomock.Any(), gomock.Any()).Return(nil)
+	h.store.EXPECT().Get(gomock.Any(), gameID).Return(withSummary(runningRecord(gameID, "healthy"), "probe_failed"), nil)
+	h.publisher.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(errors.New("redis down"))
+
+	advance := h.worker.HandleMessage(context.Background(), xMessage("0-1", gameID, "probe_failed", h.now))
+	assert.True(t, advance)
+}
+
+// TestHandleMessage_AllEventTypes_RouteSummaries asserts the
+// event-type → summary mapping for the three non-transition event
+// types. The CAS (status-transition) variants are covered by the
+// dedicated tests above.
+func TestHandleMessage_AllEventTypes_RouteSummaries(t *testing.T) {
+	type expectation struct {
+		eventType   string
+		wantSummary string
+	}
+	cases := []expectation{
+		{"container_started", "healthy"},
+		{"probe_failed", "probe_failed"},
+		{"inspect_unhealthy", "inspect_unhealthy"},
+	}
+	for _, tc := range cases {
+		t.Run(tc.eventType, func(t *testing.T) {
+			ctrl := gomock.NewController(t)
+			h := newHarness(t, ctrl)
+			gameID := "game-001"
+
+			h.store.EXPECT().Get(gomock.Any(), gameID).Return(runningRecord(gameID, ""), nil)
+			h.store.EXPECT().UpdateEngineHealth(gomock.Any(), gomock.Any()).DoAndReturn(
+				func(_ context.Context, input ports.UpdateEngineHealthInput) error {
+					assert.Equal(t, tc.wantSummary, input.EngineHealthSummary)
+					return nil
+				},
+			)
+			h.store.EXPECT().Get(gomock.Any(), gameID).Return(withSummary(runningRecord(gameID, ""), tc.wantSummary), nil)
+			h.publisher.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil)
+
+			advance := h.worker.HandleMessage(context.Background(),
xMessage("0-1", gameID, tc.eventType, h.now)) + assert.True(t, advance) + }) + } +} + +// TestRun_LoadsOffsetAndAdvances drives a real XREAD loop against a +// miniredis instance. After XADD-ing one entry and observing the loop +// exit on context cancellation, the persisted offset must equal the +// consumed entry's ID. +func TestRun_LoadsOffsetAndAdvances(t *testing.T) { + server := miniredis.RunT(t) + client := redis.NewClient(&redis.Options{Addr: server.Addr()}) + t.Cleanup(func() { _ = client.Close() }) + + ctrl := gomock.NewController(t) + store := mocks.NewMockRuntimeRecordStore(ctrl) + publisher := mocks.NewMockLobbyEventsPublisher(ctrl) + telem := newTestTelemetry(t) + + gameID := "game-001" + rec := runningRecord(gameID, "healthy") + + var ( + mu sync.Mutex + offset string + offsetSet bool + ) + offsetStore := mocks.NewMockStreamOffsetStore(ctrl) + offsetStore.EXPECT().Load(gomock.Any(), testLabel).Return("", false, nil) + offsetStore.EXPECT().Save(gomock.Any(), testLabel, gomock.Any()).DoAndReturn( + func(_ context.Context, _ string, entryID string) error { + mu.Lock() + defer mu.Unlock() + offset = entryID + offsetSet = true + return nil + }, + ).MinTimes(1) + + store.EXPECT().Get(gomock.Any(), gameID).Return(rec, nil) + store.EXPECT().UpdateEngineHealth(gomock.Any(), gomock.Any()).Return(nil) + store.EXPECT().Get(gomock.Any(), gameID).Return(withSummary(rec, "probe_failed"), nil) + publisher.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Return(nil) + + worker, err := healtheventsconsumer.NewWorker(healtheventsconsumer.Dependencies{ + Client: client, + Stream: testStream, + StreamLabel: testLabel, + BlockTimeout: 100 * time.Millisecond, + OffsetStore: offsetStore, + RuntimeRecords: store, + LobbyEvents: publisher, + Telemetry: telem, + }) + require.NoError(t, err) + + occurredMS := strconv.FormatInt(time.Date(2026, time.May, 1, 12, 0, 0, 0, time.UTC).UnixMilli(), 10) + entryID, err := client.XAdd(context.Background(), &redis.XAddArgs{ + 
Stream: testStream, + Values: map[string]any{ + "game_id": gameID, + "event_type": "probe_failed", + "occurred_at_ms": occurredMS, + "details": "{}", + }, + }).Result() + require.NoError(t, err) + + ctx, cancel := context.WithCancel(context.Background()) + done := make(chan error, 1) + go func() { done <- worker.Run(ctx) }() + + deadline := time.Now().Add(2 * time.Second) + for time.Now().Before(deadline) { + mu.Lock() + set := offsetSet + mu.Unlock() + if set { + break + } + time.Sleep(20 * time.Millisecond) + } + + cancel() + select { + case err := <-done: + assert.True(t, errors.Is(err, context.Canceled), "run must exit with context.Canceled, got %v", err) + case <-time.After(2 * time.Second): + t.Fatal("worker did not exit within deadline") + } + + mu.Lock() + defer mu.Unlock() + require.True(t, offsetSet, "offset must be persisted at least once") + assert.Equal(t, entryID, offset) +} + +// TestRun_ContextCancel — Run returns context.Canceled on cancel even +// when no stream entry is available. 
+func TestRun_ContextCancel(t *testing.T) { + server := miniredis.RunT(t) + client := redis.NewClient(&redis.Options{Addr: server.Addr()}) + t.Cleanup(func() { _ = client.Close() }) + + ctrl := gomock.NewController(t) + store := mocks.NewMockRuntimeRecordStore(ctrl) + publisher := mocks.NewMockLobbyEventsPublisher(ctrl) + offsetStore := mocks.NewMockStreamOffsetStore(ctrl) + offsetStore.EXPECT().Load(gomock.Any(), testLabel).Return("0-0", true, nil) + + worker, err := healtheventsconsumer.NewWorker(healtheventsconsumer.Dependencies{ + Client: client, + Stream: testStream, + StreamLabel: testLabel, + BlockTimeout: 50 * time.Millisecond, + OffsetStore: offsetStore, + RuntimeRecords: store, + LobbyEvents: publisher, + Telemetry: newTestTelemetry(t), + }) + require.NoError(t, err) + + ctx, cancel := context.WithCancel(context.Background()) + done := make(chan error, 1) + go func() { done <- worker.Run(ctx) }() + + time.Sleep(150 * time.Millisecond) + cancel() + select { + case err := <-done: + assert.True(t, errors.Is(err, context.Canceled), "want context.Canceled, got %v", err) + case <-time.After(2 * time.Second): + t.Fatal("worker did not exit within deadline") + } +} + +// TestRun_FailsOnOffsetLoadError covers the bootstrap failure: a load +// error is fatal and surfaces from Run. 
+func TestRun_FailsOnOffsetLoadError(t *testing.T) { + server := miniredis.RunT(t) + client := redis.NewClient(&redis.Options{Addr: server.Addr()}) + t.Cleanup(func() { _ = client.Close() }) + + ctrl := gomock.NewController(t) + offsetStore := mocks.NewMockStreamOffsetStore(ctrl) + offsetStore.EXPECT().Load(gomock.Any(), testLabel).Return("", false, errors.New("redis down")) + + worker, err := healtheventsconsumer.NewWorker(healtheventsconsumer.Dependencies{ + Client: client, + Stream: testStream, + StreamLabel: testLabel, + BlockTimeout: 50 * time.Millisecond, + OffsetStore: offsetStore, + RuntimeRecords: mocks.NewMockRuntimeRecordStore(ctrl), + LobbyEvents: mocks.NewMockLobbyEventsPublisher(ctrl), + Telemetry: newTestTelemetry(t), + }) + require.NoError(t, err) + + err = worker.Run(context.Background()) + require.Error(t, err) + assert.Contains(t, err.Error(), "load offset") +} + +// TestShutdown_Noop confirms Shutdown returns nil for a non-nil ctx +// and rejects a nil one. +func TestShutdown_Noop(t *testing.T) { + ctrl := gomock.NewController(t) + h := newHarness(t, ctrl) + require.NoError(t, h.worker.Shutdown(context.Background())) + + //nolint:staticcheck // Deliberate nil context to verify guard. + require.Error(t, h.worker.Shutdown(nil)) +} diff --git a/gamemaster/internal/worker/schedulerticker/worker.go b/gamemaster/internal/worker/schedulerticker/worker.go new file mode 100644 index 0000000..0dc709e --- /dev/null +++ b/gamemaster/internal/worker/schedulerticker/worker.go @@ -0,0 +1,218 @@ +// Package schedulerticker drives the periodic turn-generation +// scheduler described in `gamemaster/README.md §Background workers`. +// +// On every tick (default 1 s) the worker scans +// `runtime_records.ListDueRunning(now)` and dispatches one +// `turngeneration.Service.Handle` call per due game. Each in-flight +// game id is tracked in an in-process set so a long-running engine call +// never causes the same game to be dispatched twice. 
The CAS in +// `turngeneration` is the authoritative protection; the in-flight set +// is a cheap optimisation that avoids issuing a doomed engine call only +// to discard a `conflict` outcome. +// +// Per-tick errors are absorbed; the loop terminates only on context +// cancellation. +package schedulerticker + +import ( + "context" + "errors" + "log/slog" + "sync" + "time" + + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/logging" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/service/turngeneration" + "galaxy/gamemaster/internal/telemetry" +) + +// Dependencies groups the collaborators required by Worker. +type Dependencies struct { + // RuntimeRecords lists due-now running records once per tick. + RuntimeRecords ports.RuntimeRecordStore + + // TurnGeneration drives the per-game turn-generation flow. + TurnGeneration *turngeneration.Service + + // Telemetry records `gamemaster.scheduler.due_games` indirectly via + // the gauge probe (Stage 19 wires it). The worker itself only + // records turn-generation outcomes inside `turngeneration.Service`. + Telemetry *telemetry.Runtime + + // Interval bounds the tick period. Required positive. + Interval time.Duration + + // Clock supplies the wall-clock used for ListDueRunning. Defaults + // to `time.Now` when nil. + Clock func() time.Time + + // Logger receives structured worker-level events. Defaults to + // `slog.Default()` when nil. + Logger *slog.Logger +} + +// Worker drives the scheduler tick loop. +type Worker struct { + runtimeRecords ports.RuntimeRecordStore + turnGeneration *turngeneration.Service + telemetry *telemetry.Runtime + + interval time.Duration + clock func() time.Time + logger *slog.Logger + + inflight sync.Map // map[gameID]struct{} + + wg sync.WaitGroup +} + +// NewWorker constructs one Worker from deps. 
+func NewWorker(deps Dependencies) (*Worker, error) { + switch { + case deps.RuntimeRecords == nil: + return nil, errors.New("new scheduler ticker: nil runtime records store") + case deps.TurnGeneration == nil: + return nil, errors.New("new scheduler ticker: nil turn generation service") + case deps.Telemetry == nil: + return nil, errors.New("new scheduler ticker: nil telemetry runtime") + case deps.Interval <= 0: + return nil, errors.New("new scheduler ticker: interval must be positive") + } + + clock := deps.Clock + if clock == nil { + clock = time.Now + } + logger := deps.Logger + if logger == nil { + logger = slog.Default() + } + + return &Worker{ + runtimeRecords: deps.RuntimeRecords, + turnGeneration: deps.TurnGeneration, + telemetry: deps.Telemetry, + interval: deps.Interval, + clock: clock, + logger: logger.With("worker", "gamemaster.schedulerticker"), + }, nil +} + +// Shutdown is a no-op kept so the worker satisfies the +// `app.Component` interface alongside `Run`. The loop already +// terminates when the context handed to Run is cancelled and the +// in-flight goroutines drain before Run returns; an explicit Shutdown +// has nothing extra to release. +func (worker *Worker) Shutdown(_ context.Context) error { + return nil +} + +// Run drives the scheduler loop until ctx is cancelled. Run waits for +// the in-flight goroutines launched on the most recent tick to return +// before exiting so cancellation is observable through ctx for both the +// loop and the per-game work. 
+func (worker *Worker) Run(ctx context.Context) error { + if worker == nil { + return errors.New("run scheduler ticker: nil worker") + } + if ctx == nil { + return errors.New("run scheduler ticker: nil context") + } + if err := ctx.Err(); err != nil { + return err + } + + worker.logger.Info("scheduler ticker started", + "interval", worker.interval.String(), + ) + defer worker.logger.Info("scheduler ticker stopped") + + ticker := time.NewTicker(worker.interval) + defer ticker.Stop() + + for { + select { + case <-ctx.Done(): + worker.wg.Wait() + return ctx.Err() + case <-ticker.C: + worker.Tick(ctx) + } + } +} + +// Tick performs one full pass. Exported so tests can drive the worker +// deterministically without waiting on a real ticker. +func (worker *Worker) Tick(ctx context.Context) { + if err := ctx.Err(); err != nil { + return + } + now := worker.clock().UTC() + + due, err := worker.runtimeRecords.ListDueRunning(ctx, now) + if err != nil { + logArgs := []any{ + "err", err.Error(), + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + worker.logger.WarnContext(ctx, "list due running records", logArgs...) + return + } + if len(due) == 0 { + return + } + + for _, record := range due { + gameID := record.GameID + if _, loaded := worker.inflight.LoadOrStore(gameID, struct{}{}); loaded { + worker.logger.DebugContext(ctx, "skip due game: in-flight", + "game_id", gameID, + ) + continue + } + worker.wg.Add(1) + go worker.dispatch(ctx, gameID) + } +} + +// dispatch runs one turn-generation operation against gameID and +// releases the in-flight slot when the call returns. 
+func (worker *Worker) dispatch(ctx context.Context, gameID string) { + defer worker.wg.Done() + defer worker.inflight.Delete(gameID) + + result, err := worker.turnGeneration.Handle(ctx, turngeneration.Input{ + GameID: gameID, + Trigger: turngeneration.TriggerScheduler, + OpSource: operation.OpSourceAdminRest, + }) + if err != nil { + logArgs := []any{ + "game_id", gameID, + "err", err.Error(), + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + worker.logger.ErrorContext(ctx, "turn generation handle returned error", logArgs...) + return + } + if !result.IsSuccess() { + logArgs := []any{ + "game_id", gameID, + "error_code", result.ErrorCode, + "error_message", result.ErrorMessage, + } + logArgs = append(logArgs, logging.ContextAttrs(ctx)...) + worker.logger.DebugContext(ctx, "turn generation completed with non-success outcome", logArgs...) + } +} + +// Wait blocks until every in-flight goroutine launched by Run / Tick +// has returned. Useful for tests that drive Tick directly. 
+func (worker *Worker) Wait() { + if worker == nil { + return + } + worker.wg.Wait() +} diff --git a/gamemaster/internal/worker/schedulerticker/worker_test.go b/gamemaster/internal/worker/schedulerticker/worker_test.go new file mode 100644 index 0000000..e248eb2 --- /dev/null +++ b/gamemaster/internal/worker/schedulerticker/worker_test.go @@ -0,0 +1,542 @@ +package schedulerticker_test + +import ( + "context" + "errors" + "sync" + "sync/atomic" + "testing" + "time" + + "galaxy/gamemaster/internal/adapters/mocks" + "galaxy/gamemaster/internal/domain/operation" + "galaxy/gamemaster/internal/domain/playermapping" + "galaxy/gamemaster/internal/domain/runtime" + "galaxy/gamemaster/internal/ports" + "galaxy/gamemaster/internal/service/scheduler" + "galaxy/gamemaster/internal/service/turngeneration" + "galaxy/gamemaster/internal/telemetry" + "galaxy/gamemaster/internal/worker/schedulerticker" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + "go.uber.org/mock/gomock" +) + +// fakeRuntimeRecordsBackend is a minimal in-memory implementation of +// the RuntimeRecordStore subset the ticker exercises plus the +// turn-generation orchestrator hooks. The fake mirrors the runtime CAS +// semantics so the in-flight set test can run a full +// running→generation_in_progress→running cycle. 
+type fakeRuntimeRecordsBackend struct { + mu sync.Mutex + stored map[string]runtime.RuntimeRecord + listErr error + listCalls atomic.Int32 + listCustom func(ctx context.Context, now time.Time) ([]runtime.RuntimeRecord, error) +} + +func newFakeRuntimeRecordsBackend() *fakeRuntimeRecordsBackend { + return &fakeRuntimeRecordsBackend{stored: map[string]runtime.RuntimeRecord{}} +} + +func (s *fakeRuntimeRecordsBackend) seed(record runtime.RuntimeRecord) { + s.mu.Lock() + defer s.mu.Unlock() + s.stored[record.GameID] = record +} + +func (s *fakeRuntimeRecordsBackend) Get(_ context.Context, gameID string) (runtime.RuntimeRecord, error) { + s.mu.Lock() + defer s.mu.Unlock() + record, ok := s.stored[gameID] + if !ok { + return runtime.RuntimeRecord{}, runtime.ErrNotFound + } + return record, nil +} + +func (s *fakeRuntimeRecordsBackend) Insert(_ context.Context, record runtime.RuntimeRecord) error { + s.mu.Lock() + defer s.mu.Unlock() + if _, ok := s.stored[record.GameID]; ok { + return runtime.ErrConflict + } + s.stored[record.GameID] = record + return nil +} + +func (s *fakeRuntimeRecordsBackend) UpdateStatus(_ context.Context, input ports.UpdateStatusInput) error { + s.mu.Lock() + defer s.mu.Unlock() + record, ok := s.stored[input.GameID] + if !ok { + return runtime.ErrNotFound + } + if record.Status != input.ExpectedFrom { + return runtime.ErrConflict + } + record.Status = input.To + record.UpdatedAt = input.Now + if input.To == runtime.StatusRunning && record.StartedAt == nil { + startedAt := input.Now + record.StartedAt = &startedAt + } + if input.To == runtime.StatusFinished { + finishedAt := input.Now + record.FinishedAt = &finishedAt + } + s.stored[input.GameID] = record + return nil +} + +func (s *fakeRuntimeRecordsBackend) UpdateScheduling(_ context.Context, input ports.UpdateSchedulingInput) error { + s.mu.Lock() + defer s.mu.Unlock() + record, ok := s.stored[input.GameID] + if !ok { + return runtime.ErrNotFound + } + if input.NextGenerationAt != nil { + next 
:= *input.NextGenerationAt + record.NextGenerationAt = &next + } else { + record.NextGenerationAt = nil + } + record.SkipNextTick = input.SkipNextTick + record.CurrentTurn = input.CurrentTurn + record.UpdatedAt = input.Now + s.stored[input.GameID] = record + return nil +} + +func (s *fakeRuntimeRecordsBackend) UpdateImage(_ context.Context, _ ports.UpdateImageInput) error { + return errors.New("not used in schedulerticker tests") +} + +func (s *fakeRuntimeRecordsBackend) UpdateEngineHealth(_ context.Context, _ ports.UpdateEngineHealthInput) error { + return errors.New("not used in schedulerticker tests") +} + +func (s *fakeRuntimeRecordsBackend) Delete(_ context.Context, gameID string) error { + s.mu.Lock() + defer s.mu.Unlock() + delete(s.stored, gameID) + return nil +} + +func (s *fakeRuntimeRecordsBackend) ListDueRunning(ctx context.Context, now time.Time) ([]runtime.RuntimeRecord, error) { + s.listCalls.Add(1) + if s.listCustom != nil { + return s.listCustom(ctx, now) + } + if s.listErr != nil { + return nil, s.listErr + } + s.mu.Lock() + defer s.mu.Unlock() + var due []runtime.RuntimeRecord + for _, record := range s.stored { + if record.Status != runtime.StatusRunning { + continue + } + if record.NextGenerationAt == nil || record.NextGenerationAt.After(now) { + continue + } + due = append(due, record) + } + return due, nil +} + +func (s *fakeRuntimeRecordsBackend) ListByStatus(_ context.Context, status runtime.Status) ([]runtime.RuntimeRecord, error) { + s.mu.Lock() + defer s.mu.Unlock() + var matching []runtime.RuntimeRecord + for _, record := range s.stored { + if record.Status == status { + matching = append(matching, record) + } + } + return matching, nil +} + +func (s *fakeRuntimeRecordsBackend) List(_ context.Context) ([]runtime.RuntimeRecord, error) { + s.mu.Lock() + defer s.mu.Unlock() + all := make([]runtime.RuntimeRecord, 0, len(s.stored)) + for _, record := range s.stored { + all = append(all, record) + } + return all, nil +} + +type stubMappings 
struct { + rows map[string][]playermapping.PlayerMapping +} + +func (s *stubMappings) BulkInsert(_ context.Context, _ []playermapping.PlayerMapping) error { + return errors.New("not used") +} + +func (s *stubMappings) Get(_ context.Context, _, _ string) (playermapping.PlayerMapping, error) { + return playermapping.PlayerMapping{}, errors.New("not used") +} + +func (s *stubMappings) GetByRace(_ context.Context, _, _ string) (playermapping.PlayerMapping, error) { + return playermapping.PlayerMapping{}, errors.New("not used") +} + +func (s *stubMappings) ListByGame(_ context.Context, gameID string) ([]playermapping.PlayerMapping, error) { + return append([]playermapping.PlayerMapping(nil), s.rows[gameID]...), nil +} + +func (s *stubMappings) DeleteByGame(_ context.Context, _ string) error { + return errors.New("not used") +} + +type stubLogs struct{} + +func (stubLogs) Append(_ context.Context, _ operation.OperationEntry) (int64, error) { return 1, nil } +func (stubLogs) ListByGame(_ context.Context, _ string, _ int) ([]operation.OperationEntry, error) { + return nil, errors.New("not used") +} + +// --- helpers ---------------------------------------------------------- + +func newTelemetry(t *testing.T) *telemetry.Runtime { + t.Helper() + tm, err := telemetry.NewWithProviders(nil, nil) + require.NoError(t, err) + return tm +} + +func seedRunningRecord(t *testing.T, store *fakeRuntimeRecordsBackend, mappings *stubMappings, gameID string, due time.Time) { + t.Helper() + startedAt := due.Add(-1 * time.Hour) + store.seed(runtime.RuntimeRecord{ + GameID: gameID, + Status: runtime.StatusRunning, + EngineEndpoint: "http://galaxy-game-" + gameID + ":8080", + CurrentImageRef: "ghcr.io/galaxy/game:v1.2.3", + CurrentEngineVersion: "v1.2.3", + TurnSchedule: "0 18 * * *", + CurrentTurn: 0, + NextGenerationAt: &due, + EngineHealth: "healthy", + CreatedAt: due.Add(-2 * time.Hour), + UpdatedAt: due.Add(-2 * time.Hour), + StartedAt: &startedAt, + }) + if mappings.rows == nil { + 
mappings.rows = map[string][]playermapping.PlayerMapping{} + } + mappings.rows[gameID] = []playermapping.PlayerMapping{ + {GameID: gameID, UserID: "user-1", RaceName: "Aelinari", EnginePlayerUUID: "uuid-1", CreatedAt: startedAt}, + {GameID: gameID, UserID: "user-2", RaceName: "Drazi", EnginePlayerUUID: "uuid-2", CreatedAt: startedAt}, + } +} + +// --- tests ------------------------------------------------------------ + +func TestNewWorkerRejectsMissingDeps(t *testing.T) { + telem := newTelemetry(t) + cases := []struct { + name string + mut func(*schedulerticker.Dependencies) + }{ + {"runtime records", func(d *schedulerticker.Dependencies) { d.RuntimeRecords = nil }}, + {"turn generation", func(d *schedulerticker.Dependencies) { d.TurnGeneration = nil }}, + {"telemetry", func(d *schedulerticker.Dependencies) { d.Telemetry = nil }}, + {"non-positive interval", func(d *schedulerticker.Dependencies) { d.Interval = 0 }}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + ctrl := gomock.NewController(t) + turn := buildTurnService(t, ctrl, newFakeRuntimeRecordsBackend(), &stubMappings{}, telem) + deps := schedulerticker.Dependencies{ + RuntimeRecords: newFakeRuntimeRecordsBackend(), + TurnGeneration: turn, + Telemetry: telem, + Interval: time.Second, + } + tc.mut(&deps) + worker, err := schedulerticker.NewWorker(deps) + require.Error(t, err) + require.Nil(t, worker) + }) + } +} + +func TestTickDispatchesDueGames(t *testing.T) { + ctrl := gomock.NewController(t) + telem := newTelemetry(t) + store := newFakeRuntimeRecordsBackend() + mappings := &stubMappings{} + + now := time.Date(2026, time.April, 30, 12, 0, 0, 0, time.UTC) + due := now.Add(-5 * time.Minute) + seedRunningRecord(t, store, mappings, "game-a", due) + seedRunningRecord(t, store, mappings, "game-b", due) + + engine := mocks.NewMockEngineClient(ctrl) + lobbyEvents := mocks.NewMockLobbyEventsPublisher(ctrl) + notifications := mocks.NewMockNotificationIntentPublisher(ctrl) + lobby := 
mocks.NewMockLobbyClient(ctrl) + + engine.EXPECT(). + Turn(gomock.Any(), gomock.Any()). + Times(2). + Return(ports.StateResponse{Turn: 1, Players: []ports.PlayerState{ + {RaceName: "Aelinari", EnginePlayerUUID: "uuid-1", Planets: 1, Population: 10}, + {RaceName: "Drazi", EnginePlayerUUID: "uuid-2", Planets: 1, Population: 10}, + }}, nil) + lobbyEvents.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Times(2).Return(nil) + lobby.EXPECT().GetGameSummary(gomock.Any(), gomock.Any()).Times(2). + Return(ports.GameSummary{GameID: "g", GameName: "Game", Status: "running"}, nil) + notifications.EXPECT().Publish(gomock.Any(), gomock.Any()).Times(2).Return(nil) + + turn, err := turngeneration.NewService(turngeneration.Dependencies{ + RuntimeRecords: store, + PlayerMappings: mappings, + OperationLogs: stubLogs{}, + Engine: engine, + LobbyEvents: lobbyEvents, + Notifications: notifications, + Lobby: lobby, + Scheduler: scheduler.New(), + Telemetry: telem, + Clock: func() time.Time { return now }, + }) + require.NoError(t, err) + + worker, err := schedulerticker.NewWorker(schedulerticker.Dependencies{ + RuntimeRecords: store, + TurnGeneration: turn, + Telemetry: telem, + Interval: time.Second, + Clock: func() time.Time { return now }, + }) + require.NoError(t, err) + + worker.Tick(context.Background()) + worker.Wait() + + // Both games should have advanced from running → running with + // current_turn=1. 
+ for _, gameID := range []string{"game-a", "game-b"} { + record, err := store.Get(context.Background(), gameID) + require.NoError(t, err) + assert.Equal(t, runtime.StatusRunning, record.Status, "game %s", gameID) + assert.Equal(t, 1, record.CurrentTurn, "game %s", gameID) + } +} + +func TestTickDeduplicatesInflightGame(t *testing.T) { + ctrl := gomock.NewController(t) + telem := newTelemetry(t) + store := newFakeRuntimeRecordsBackend() + mappings := &stubMappings{} + + now := time.Date(2026, time.April, 30, 12, 0, 0, 0, time.UTC) + due := now.Add(-5 * time.Minute) + seedRunningRecord(t, store, mappings, "game-a", due) + + engine := mocks.NewMockEngineClient(ctrl) + lobbyEvents := mocks.NewMockLobbyEventsPublisher(ctrl) + notifications := mocks.NewMockNotificationIntentPublisher(ctrl) + lobby := mocks.NewMockLobbyClient(ctrl) + + releaseEngine := make(chan struct{}) + engine.EXPECT(). + Turn(gomock.Any(), gomock.Any()). + Times(1). + DoAndReturn(func(ctx context.Context, _ string) (ports.StateResponse, error) { + select { + case <-releaseEngine: + case <-ctx.Done(): + } + return ports.StateResponse{Turn: 1, Players: []ports.PlayerState{ + {RaceName: "Aelinari", EnginePlayerUUID: "uuid-1", Planets: 1, Population: 10}, + {RaceName: "Drazi", EnginePlayerUUID: "uuid-2", Planets: 1, Population: 10}, + }}, nil + }) + lobbyEvents.EXPECT().PublishSnapshotUpdate(gomock.Any(), gomock.Any()).Times(1).Return(nil) + lobby.EXPECT().GetGameSummary(gomock.Any(), gomock.Any()).Times(1). 
+ Return(ports.GameSummary{GameID: "game-a", GameName: "Game A", Status: "running"}, nil) + notifications.EXPECT().Publish(gomock.Any(), gomock.Any()).Times(1).Return(nil) + + turn, err := turngeneration.NewService(turngeneration.Dependencies{ + RuntimeRecords: store, + PlayerMappings: mappings, + OperationLogs: stubLogs{}, + Engine: engine, + LobbyEvents: lobbyEvents, + Notifications: notifications, + Lobby: lobby, + Scheduler: scheduler.New(), + Telemetry: telem, + Clock: func() time.Time { return now }, + }) + require.NoError(t, err) + + worker, err := schedulerticker.NewWorker(schedulerticker.Dependencies{ + RuntimeRecords: store, + TurnGeneration: turn, + Telemetry: telem, + Interval: time.Second, + Clock: func() time.Time { return now }, + }) + require.NoError(t, err) + + worker.Tick(context.Background()) + // Reset the runtime row to running so the second Tick would normally + // re-dispatch; the in-flight set must still skip it. + store.mu.Lock() + rec := store.stored["game-a"] + rec.Status = runtime.StatusRunning + rec.NextGenerationAt = &due + store.stored["game-a"] = rec + store.mu.Unlock() + + worker.Tick(context.Background()) + + close(releaseEngine) + worker.Wait() + + // Only one engine call must have happened despite two ticks. 
+ assert.GreaterOrEqual(t, store.listCalls.Load(), int32(2), "ListDueRunning observed both ticks") +} + +func TestTickAbsorbsListError(t *testing.T) { + ctrl := gomock.NewController(t) + telem := newTelemetry(t) + store := newFakeRuntimeRecordsBackend() + store.listErr = errors.New("postgres timeout") + + engine := mocks.NewMockEngineClient(ctrl) + lobbyEvents := mocks.NewMockLobbyEventsPublisher(ctrl) + notifications := mocks.NewMockNotificationIntentPublisher(ctrl) + lobby := mocks.NewMockLobbyClient(ctrl) + + turn, err := turngeneration.NewService(turngeneration.Dependencies{ + RuntimeRecords: store, + PlayerMappings: &stubMappings{}, + OperationLogs: stubLogs{}, + Engine: engine, + LobbyEvents: lobbyEvents, + Notifications: notifications, + Lobby: lobby, + Scheduler: scheduler.New(), + Telemetry: telem, + }) + require.NoError(t, err) + + worker, err := schedulerticker.NewWorker(schedulerticker.Dependencies{ + RuntimeRecords: store, + TurnGeneration: turn, + Telemetry: telem, + Interval: time.Second, + }) + require.NoError(t, err) + + worker.Tick(context.Background()) + worker.Wait() +} + +func TestTickEmptyDueListIsNoOp(t *testing.T) { + ctrl := gomock.NewController(t) + telem := newTelemetry(t) + store := newFakeRuntimeRecordsBackend() + + engine := mocks.NewMockEngineClient(ctrl) + lobbyEvents := mocks.NewMockLobbyEventsPublisher(ctrl) + notifications := mocks.NewMockNotificationIntentPublisher(ctrl) + lobby := mocks.NewMockLobbyClient(ctrl) + + turn, err := turngeneration.NewService(turngeneration.Dependencies{ + RuntimeRecords: store, + PlayerMappings: &stubMappings{}, + OperationLogs: stubLogs{}, + Engine: engine, + LobbyEvents: lobbyEvents, + Notifications: notifications, + Lobby: lobby, + Scheduler: scheduler.New(), + Telemetry: telem, + }) + require.NoError(t, err) + + worker, err := schedulerticker.NewWorker(schedulerticker.Dependencies{ + RuntimeRecords: store, + TurnGeneration: turn, + Telemetry: telem, + Interval: time.Second, + }) + 
require.NoError(t, err) + + worker.Tick(context.Background()) + worker.Wait() +} + +func TestRunStopsOnContextCancellation(t *testing.T) { + ctrl := gomock.NewController(t) + telem := newTelemetry(t) + store := newFakeRuntimeRecordsBackend() + + engine := mocks.NewMockEngineClient(ctrl) + lobbyEvents := mocks.NewMockLobbyEventsPublisher(ctrl) + notifications := mocks.NewMockNotificationIntentPublisher(ctrl) + lobby := mocks.NewMockLobbyClient(ctrl) + + turn, err := turngeneration.NewService(turngeneration.Dependencies{ + RuntimeRecords: store, + PlayerMappings: &stubMappings{}, + OperationLogs: stubLogs{}, + Engine: engine, + LobbyEvents: lobbyEvents, + Notifications: notifications, + Lobby: lobby, + Scheduler: scheduler.New(), + Telemetry: telem, + }) + require.NoError(t, err) + + worker, err := schedulerticker.NewWorker(schedulerticker.Dependencies{ + RuntimeRecords: store, + TurnGeneration: turn, + Telemetry: telem, + Interval: 10 * time.Millisecond, + }) + require.NoError(t, err) + + ctx, cancel := context.WithCancel(context.Background()) + done := make(chan error, 1) + go func() { done <- worker.Run(ctx) }() + cancel() + select { + case err := <-done: + assert.ErrorIs(t, err, context.Canceled) + case <-time.After(2 * time.Second): + t.Fatal("worker did not exit on context cancellation") + } +} + +// buildTurnService is a thin helper for the missing-deps test cases — +// it does not exercise the engine because the deps test never reaches +// the work path. 
+func buildTurnService(t *testing.T, ctrl *gomock.Controller, store *fakeRuntimeRecordsBackend, mappings *stubMappings, telem *telemetry.Runtime) *turngeneration.Service { + t.Helper() + turn, err := turngeneration.NewService(turngeneration.Dependencies{ + RuntimeRecords: store, + PlayerMappings: mappings, + OperationLogs: stubLogs{}, + Engine: mocks.NewMockEngineClient(ctrl), + LobbyEvents: mocks.NewMockLobbyEventsPublisher(ctrl), + Notifications: mocks.NewMockNotificationIntentPublisher(ctrl), + Lobby: mocks.NewMockLobbyClient(ctrl), + Scheduler: scheduler.New(), + Telemetry: telem, + }) + require.NoError(t, err) + return turn +} diff --git a/gamemaster/notificationintent_audit_test.go b/gamemaster/notificationintent_audit_test.go new file mode 100644 index 0000000..5273fbf --- /dev/null +++ b/gamemaster/notificationintent_audit_test.go @@ -0,0 +1,147 @@ +package gamemaster + +import ( + "testing" + "time" + + "github.com/stretchr/testify/require" + + "galaxy/notificationintent" +) + +// TestNotificationIntentConstructorsForGameMaster freezes the producer-side +// surface of the three GM-owned notification types against +// `pkg/notificationintent`. It complements the YAML-level catalog freeze in +// `notification/contract_asyncapi_test.go` by binding the contract at compile +// time: any rename of a constant, constructor, payload struct, or struct field +// breaks this file's build before it can reach a YAML edit. +// +// The three types frozen here are documented in `gamemaster/README.md` +// §Notification Contracts as the GM-owned producer catalog. 
+func TestNotificationIntentConstructorsForGameMaster(t *testing.T) { + t.Parallel() + + metadata := notificationintent.Metadata{ + IdempotencyKey: "gm-stage07-freeze", + OccurredAt: time.UnixMilli(1775121700000).UTC(), + } + recipientUserIDs := []string{"user-1", "user-2"} + + t.Run("game.turn.ready", func(t *testing.T) { + t.Parallel() + + intent, err := notificationintent.NewGameTurnReadyIntent( + metadata, + recipientUserIDs, + notificationintent.GameTurnReadyPayload{ + GameID: "game-1", + GameName: "Nebula Clash", + TurnNumber: 7, + }, + ) + require.NoError(t, err) + + require.Equal(t, notificationintent.NotificationTypeGameTurnReady, intent.NotificationType) + require.Equal(t, "game.turn.ready", intent.NotificationType.String()) + require.Equal(t, notificationintent.ProducerGameMaster, intent.Producer) + require.Equal(t, notificationintent.AudienceKindUser, intent.AudienceKind) + require.Equal(t, []string{"user-1", "user-2"}, intent.RecipientUserIDs) + require.Equal(t, metadata.IdempotencyKey, intent.IdempotencyKey) + require.True(t, intent.OccurredAt.Equal(metadata.OccurredAt)) + require.NoError(t, intent.Validate()) + + require.Contains(t, intent.PayloadJSON, `"game_id":"game-1"`) + require.Contains(t, intent.PayloadJSON, `"game_name":"Nebula Clash"`) + require.Contains(t, intent.PayloadJSON, `"turn_number":7`) + }) + + t.Run("game.finished", func(t *testing.T) { + t.Parallel() + + intent, err := notificationintent.NewGameFinishedIntent( + metadata, + recipientUserIDs, + notificationintent.GameFinishedPayload{ + GameID: "game-1", + GameName: "Nebula Clash", + FinalTurnNumber: 7, + }, + ) + require.NoError(t, err) + + require.Equal(t, notificationintent.NotificationTypeGameFinished, intent.NotificationType) + require.Equal(t, "game.finished", intent.NotificationType.String()) + require.Equal(t, notificationintent.ProducerGameMaster, intent.Producer) + require.Equal(t, notificationintent.AudienceKindUser, intent.AudienceKind) + require.Equal(t, 
[]string{"user-1", "user-2"}, intent.RecipientUserIDs) + require.NoError(t, intent.Validate()) + + require.Contains(t, intent.PayloadJSON, `"game_id":"game-1"`) + require.Contains(t, intent.PayloadJSON, `"game_name":"Nebula Clash"`) + require.Contains(t, intent.PayloadJSON, `"final_turn_number":7`) + }) + + t.Run("game.generation_failed", func(t *testing.T) { + t.Parallel() + + intent, err := notificationintent.NewGameGenerationFailedIntent( + metadata, + notificationintent.GameGenerationFailedPayload{ + GameID: "game-1", + GameName: "Nebula Clash", + FailureReason: "engine_timeout", + }, + ) + require.NoError(t, err) + + require.Equal(t, notificationintent.NotificationTypeGameGenerationFailed, intent.NotificationType) + require.Equal(t, "game.generation_failed", intent.NotificationType.String()) + require.Equal(t, notificationintent.ProducerGameMaster, intent.Producer) + require.Equal(t, notificationintent.AudienceKindAdminEmail, intent.AudienceKind) + require.Empty(t, intent.RecipientUserIDs) + require.NoError(t, intent.Validate()) + + require.Contains(t, intent.PayloadJSON, `"game_id":"game-1"`) + require.Contains(t, intent.PayloadJSON, `"game_name":"Nebula Clash"`) + require.Contains(t, intent.PayloadJSON, `"failure_reason":"engine_timeout"`) + }) + + t.Run("audience and channel matrix", func(t *testing.T) { + t.Parallel() + + userTypes := []notificationintent.NotificationType{ + notificationintent.NotificationTypeGameTurnReady, + notificationintent.NotificationTypeGameFinished, + } + for _, notificationType := range userTypes { + notificationType := notificationType + t.Run(notificationType.String(), func(t *testing.T) { + t.Parallel() + + require.Equal(t, notificationintent.ProducerGameMaster, notificationType.ExpectedProducer()) + + require.True(t, notificationType.SupportsAudience(notificationintent.AudienceKindUser)) + require.False(t, notificationType.SupportsAudience(notificationintent.AudienceKindAdminEmail)) + + require.True(t, 
notificationType.SupportsChannel(notificationintent.AudienceKindUser, notificationintent.ChannelPush)) + require.True(t, notificationType.SupportsChannel(notificationintent.AudienceKindUser, notificationintent.ChannelEmail)) + require.False(t, notificationType.SupportsChannel(notificationintent.AudienceKindAdminEmail, notificationintent.ChannelEmail)) + }) + } + + t.Run("game.generation_failed", func(t *testing.T) { + t.Parallel() + + notificationType := notificationintent.NotificationTypeGameGenerationFailed + + require.Equal(t, notificationintent.ProducerGameMaster, notificationType.ExpectedProducer()) + + require.True(t, notificationType.SupportsAudience(notificationintent.AudienceKindAdminEmail)) + require.False(t, notificationType.SupportsAudience(notificationintent.AudienceKindUser)) + + require.True(t, notificationType.SupportsChannel(notificationintent.AudienceKindAdminEmail, notificationintent.ChannelEmail)) + require.False(t, notificationType.SupportsChannel(notificationintent.AudienceKindAdminEmail, notificationintent.ChannelPush)) + require.False(t, notificationType.SupportsChannel(notificationintent.AudienceKindUser, notificationintent.ChannelEmail)) + }) + }) +} diff --git a/gateway/go.mod b/gateway/go.mod index e5aebf9..840d151 100644 --- a/gateway/go.mod +++ b/gateway/go.mod @@ -68,7 +68,7 @@ require ( github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect github.com/oasdiff/yaml v0.0.9 // indirect - github.com/oasdiff/yaml3 v0.0.9 // indirect + github.com/oasdiff/yaml3 v0.0.12 // indirect github.com/pelletier/go-toml/v2 v2.3.0 // indirect github.com/perimeterx/marshmallow v1.1.5 // indirect github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect diff --git a/gateway/go.sum b/gateway/go.sum index aca342f..c640d50 100644 --- a/gateway/go.sum +++ b/gateway/go.sum @@ -109,8 +109,7 @@ github.com/munnerz/goautoneg 
v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ= github.com/oasdiff/yaml v0.0.9 h1:zQOvd2UKoozsSsAknnWoDJlSK4lC0mpmjfDsfqNwX48= github.com/oasdiff/yaml v0.0.9/go.mod h1:8lvhgJG4xiKPj3HN5lDow4jZHPlx1i7dIwzkdAo6oAM= -github.com/oasdiff/yaml3 v0.0.9 h1:rWPrKccrdUm8J0F3sGuU+fuh9+1K/RdJlWF7O/9yw2g= -github.com/oasdiff/yaml3 v0.0.9/go.mod h1:y5+oSEHCPT/DGrS++Wc/479ERge0zTFxaF8PbGKcg2o= +github.com/oasdiff/yaml3 v0.0.12 h1:75urAtPeDg2/iDEWwzNrLOWxI9N/dCh81nTTJtokt2M= github.com/pelletier/go-toml/v2 v2.3.0 h1:k59bC/lIZREW0/iVaQR8nDHxVq8OVlIzYCOJf421CaM= github.com/pelletier/go-toml/v2 v2.3.0/go.mod h1:2gIqNv+qfxSVS7cM2xJQKtLSTLUE9V8t9Stt+h56mCY= github.com/perimeterx/marshmallow v1.1.5 h1:a2LALqQ1BlHM8PZblsDdidgv1mWi1DgC2UmX50IvK2s= diff --git a/go.work b/go.work index bd13729..aa789e1 100644 --- a/go.work +++ b/go.work @@ -4,6 +4,7 @@ use ( ./authsession ./client ./game + ./gamemaster ./gateway ./integration ./lobby @@ -11,6 +12,7 @@ use ( ./notification ./pkg/calc ./pkg/connector + ./pkg/cronutil ./pkg/error ./pkg/geoip ./pkg/model @@ -28,6 +30,7 @@ use ( replace ( galaxy/calc v0.0.0 => ./pkg/calc galaxy/connector v0.0.0 => ./pkg/connector + galaxy/cronutil v0.0.0 => ./pkg/cronutil galaxy/error v0.0.0 => ./pkg/error galaxy/geoip v0.0.0 => ./pkg/geoip galaxy/model v0.0.0 => ./pkg/model diff --git a/lobby/PLAN.md b/lobby/PLAN.md index 61c06ce..a57ba91 100644 --- a/lobby/PLAN.md +++ b/lobby/PLAN.md @@ -1450,3 +1450,16 @@ and the addition of the `reason` enum to the stop envelope — are owned by the Runtime Manager implementation plan, not by this document. See [`../rtmanager/PLAN.md`](../rtmanager/PLAN.md) §«Stage 06. Lobby publisher refactor». No new stages are added here for that work. 
+ +## Note: Game Master Refactor (image-ref + membership invalidate) + +The retirement of `LOBBY_ENGINE_IMAGE_TEMPLATE` together with the +inline `engineimage.Resolver` package, the synchronous switch to +`Game Master`'s `GET /api/v1/internal/engine-versions/{version}/image-ref` +for image-ref resolution, and the new outgoing +`POST /api/v1/internal/games/{game_id}/memberships/invalidate` hook from +`approveapplication`, `rejectapplication`, `redeeminvite`, +`removemember`, `blockmember`, and the user-lifecycle cascade worker +are owned by the Game Master implementation plan, not by this document. +See [`../gamemaster/PLAN.md`](../gamemaster/PLAN.md) §«Stage 20. Lobby +refactor». No new stages are added here for that work. diff --git a/lobby/README.md b/lobby/README.md index a268b42..aa9d773 100644 --- a/lobby/README.md +++ b/lobby/README.md @@ -150,7 +150,9 @@ The service starts two HTTP listeners and one Redis Stream consumer pipeline. - `User Service` reachable at `LOBBY_USER_SERVICE_BASE_URL` (startup check only; runtime failures are surfaced as request errors, not boot failures) - `Game Master` at `LOBBY_GM_BASE_URL` (same policy — startup check omitted; - unreachability at registration triggers the forced-pause path) + unreachability at image-ref resolve fails `lobby.game.start` with + `service_unavailable`, unreachability at register-runtime triggers the + forced-pause path) ### Probes @@ -714,27 +716,55 @@ sequenceDiagram Admin->>Lobby: lobby.game.start Lobby->>Lobby: validate ready_to_start + roster - Lobby->>Lobby: status → starting - Lobby->>Redis: publish start job to runtime:start_jobs - Runtime->>Runtime: start container - Runtime->>Redis: publish result to runtime:job_results + Lobby->>GM: GET /internal/engine-versions/{version}/image-ref (sync) + alt GM image-ref resolve failed + GM-->>Lobby: error / timeout / not found + Lobby-->>Admin: service_unavailable (GM unreachable) or engine_version_not_found + else image_ref resolved + GM-->>Lobby: 200 OK 
{ image_ref } + Lobby->>Lobby: status → starting + Lobby->>Redis: publish start job to runtime:start_jobs (with image_ref) + Runtime->>Runtime: start container + Runtime->>Redis: publish result to runtime:job_results - alt container start failed - Lobby->>Lobby: status → start_failed - else container started - Lobby->>Lobby: persist runtime binding - Lobby->>GM: POST /internal/games/{game_id}/register (sync) - alt GM registration success - GM-->>Lobby: 200 OK - Lobby->>Lobby: status → running; set started_at - else GM unavailable - GM-->>Lobby: error / timeout - Lobby->>Lobby: status → paused - Lobby->>Redis: publish lobby.runtime_paused_after_start intent + alt container start failed + Lobby->>Lobby: status → start_failed + else container started + Lobby->>Lobby: persist runtime binding + Lobby->>GM: POST /internal/games/{game_id}/register-runtime (sync) + alt GM registration success + GM-->>Lobby: 200 OK + Lobby->>Lobby: status → running; set started_at + else GM unavailable + GM-->>Lobby: error / timeout + Lobby->>Lobby: status → paused + Lobby->>Redis: publish lobby.runtime_paused_after_start intent + end end end ``` +### Image-ref resolution (synchronous via Game Master) + +Before publishing the start job, `Lobby` resolves the Docker `image_ref` +for `target_engine_version` by calling +`GET /api/v1/internal/engine-versions/{version}/image-ref` on `Game Master`'s +internal port. The call is synchronous and runs while the game is still +in `ready_to_start`: + +- success ⇒ `Lobby` proceeds to `starting`, embeds the resolved + `image_ref` into the `runtime:start_jobs` envelope, and publishes; +- the version is missing or deprecated on GM (`engine_version_not_found`) + ⇒ `lobby.game.start` returns `engine_version_not_found`; the game stays + in `ready_to_start`; +- GM is unreachable (network error, timeout, `5xx`) ⇒ `lobby.game.start` + returns `service_unavailable`; the game stays in `ready_to_start` and + the operator can retry. 
+ +Resolving against GM is the v1 contract; the legacy +`LOBBY_ENGINE_IMAGE_TEMPLATE` Go-template variable is retired together +with the inline `engineimage.Resolver`. + ### Critical invariants - If the container starts but `Lobby` cannot persist the runtime binding metadata, @@ -743,6 +773,10 @@ sequenceDiagram - If metadata is persisted but `Game Master` is unavailable, the game must be placed in `paused`, not in `start_failed`. The container is alive; only the platform tracking is incomplete. +- If `Game Master` is unavailable at image-ref resolve time, the start + command itself fails with `service_unavailable`. The game stays in + `ready_to_start`; no container is created and no `runtime:start_jobs` + envelope is published. - No start job is accepted while the game is not in `ready_to_start`. - Concurrent start attempts for the same game must be serialized; the second attempt must fail if the first already moved the game to `starting`. @@ -758,7 +792,7 @@ is no synchronous Lobby→RTM REST call in v1 or planned for v2. | Field | Type | Notes | | --- | --- | --- | | `game_id` | string | Lobby `game_id`. | -| `image_ref` | string | Docker reference resolved from `target_engine_version` via `LOBBY_ENGINE_IMAGE_TEMPLATE`. | +| `image_ref` | string | Docker reference resolved synchronously from `target_engine_version` against `Game Master`'s engine version registry; see §Game Start Flow. | | `requested_at_ms` | int64 | UTC milliseconds; diagnostics only. | `runtime:stop_jobs` envelope: @@ -803,40 +837,6 @@ Alternatives considered and rejected: outside that package and would have to depend on a concrete adapter for an enum value. -### Design rationale: `engineimage.Resolver` validates the template at construction - -`engineimage.Resolver` stores the validated template; the per-game -`Resolve(version)` call is therefore a pure string substitution that -cannot fail except on an empty `version`. - -`LOBBY_ENGINE_IMAGE_TEMPLATE` is loaded at startup. 
A malformed value -(missing `{engine_version}` placeholder, empty string) is an -operational misconfiguration that fails fast before any traffic arrives -— not on the first start-game request hours later. The synchronous -start handler then incurs no per-call template-shape recheck. - -A stateless free function `engineimage.Resolve(template, version)` was -rejected: the only useful checkpoint for the template literal is at -startup; a free function would either re-validate on every call (waste) -or skip validation (regression). - -The resolver only guards against an empty/whitespace `version`. Semver -validation lives in `lobby/internal/domain/game/model.go:validateSemver` -and runs at game-record construction time. Re-running it inside the -resolver would either duplicate the rule (drift risk) or import the -validator across package boundaries for no behavioural gain. Keeping the -resolver narrow leaves it reusable from a future producer (for example -`Game Master`, when it takes over `image_ref` resolution) without -dragging Lobby's domain rules along. - -The defensive `return start game: resolve image ref: %w` in -`startgame.Service.Handle` is a guard against a future invariant -violation; it is not exercised by the service-level test suite because -the only resolver-failure mode (empty `version`) requires bypassing -`game.Validate`, which `gameinmem.Save` always runs. Adding test -scaffolding to skip validation would teach the test suite a back door -that the production code path does not have. - ## Paused State `Lobby.paused` is a platform-level pause, distinct from `Game Master` runtime @@ -904,11 +904,12 @@ game finish. ### Per-member stats aggregate Each `runtime_snapshot_update` carries a `player_turn_stats` array with one -entry per active member: `{user_id, planets, population, ships_built}`. +entry per active member: `{user_id, planets, population}`. 
`Lobby` aggregates these in `lobby:game_turn_stats:<game_id>:<user_id>` with the shape
-`{initial_planets, initial_population, initial_ships_built, max_planets,
-max_population, max_ships_built}`.
+`{initial_planets, initial_population, max_planets, max_population}`.
+`ships_built` is not part of the contract; the capability rule reduces to
+`planets` and `population` only.
 
 Rules:
 
@@ -1032,11 +1033,18 @@ Key internal endpoints:
 
 | `GET` | `/api/v1/internal/healthz` | health probe |
 | `GET` | `/api/v1/internal/readyz` | readiness probe |
 
-Note: the registration call from Lobby to Game Master after a successful
-container start is **outgoing** — Lobby calls
-`POST /api/v1/internal/games/{game_id}/register-runtime` on Game Master's
-internal port. Lobby does not expose an inbound `register-runtime`
-endpoint.
+Note: every Lobby ↔ Game Master synchronous call is **outgoing** from
+Lobby to Game Master's internal port at `LOBBY_GM_BASE_URL`. Lobby does
+not expose an inbound `register-runtime` endpoint or any other
+GM-facing endpoint:
+
+| Call site | Method | Path on Game Master | Purpose |
+| --- | --- | --- | --- |
+| `startgame` (pre-publish) | `GET` | `/api/v1/internal/engine-versions/{version}/image-ref` | Resolve the Docker `image_ref` for `target_engine_version` synchronously before publishing `runtime:start_jobs`. Failure ⇒ `service_unavailable` or `engine_version_not_found`; the game stays in `ready_to_start`. |
+| `startgame` (post-container-up) | `POST` | `/api/v1/internal/games/{game_id}/register-runtime` | Register the runtime after a successful container start. Failure ⇒ forced `paused` (see §Paused State). |
+| `approveapplication`, `rejectapplication`, `redeeminvite`, `removemember`, `blockmember`, user-lifecycle cascade | `POST` | `/api/v1/internal/games/{game_id}/memberships/invalidate` | Tell GM to drop its in-process membership cache for the game after a roster mutation.
Called **post-commit** and is fail-open: a non-2xx response is logged and metered but never rolls back the Lobby commit. GM's TTL safety net catches stale data within the next cache TTL window. | +| `removemember` (engine-side cleanup, post-commit) | `POST` | `/api/v1/internal/games/{game_id}/race/{race_name}/banish` | Ask GM to deactivate the engine-side player after a permanent removal. Fail-open in the same sense as the invalidate call. | +| `resumegame` | `GET` | `/api/v1/internal/games/{game_id}/liveness` | Check that GM has the runtime in `running` before transitioning the platform record from `paused` back to `running`. | Admin-only operations (approve, reject, cancel, create public games, etc.) are also exposed on the internal port and are intended to be called by `Admin Service` @@ -1158,6 +1166,9 @@ Stable error codes: `permanent_block` sanction - `forbidden` — caller is not authorized for this operation on this game or this race name +- `engine_version_not_found` — `target_engine_version` is missing or + deprecated on `Game Master`'s engine version registry (returned by + `lobby.game.start` at image-ref resolve time) - `internal_error` — unexpected service error - `service_unavailable` — upstream dependency unavailable @@ -1227,13 +1238,12 @@ Stream names: - `LOBBY_RUNTIME_JOB_RESULTS_READ_BLOCK_TIMEOUT` with default `2s` - `LOBBY_NOTIFICATION_INTENTS_STREAM` with default `notification:intents` -Runtime Manager integration: +Game Master image-ref resolver: -- `LOBBY_ENGINE_IMAGE_TEMPLATE` with default `galaxy/game:{engine_version}` — - Go-style template applied to a game's `target_engine_version` to resolve - the Docker `image_ref` published on `runtime:start_jobs`. The template - must contain the literal placeholder `{engine_version}`; Lobby fails - fast at startup otherwise. +- `image_ref` is resolved synchronously by `Game Master` from + `target_engine_version` over its engine version registry; see + §Game Start Flow. 
The legacy `LOBBY_ENGINE_IMAGE_TEMPLATE` Go-template + variable is retired and rejected at startup if set. Upstream clients: diff --git a/lobby/go.mod b/lobby/go.mod index 2373499..142754c 100644 --- a/lobby/go.mod +++ b/lobby/go.mod @@ -3,6 +3,7 @@ module galaxy/lobby go 1.26.1 require ( + galaxy/cronutil v0.0.0-00010101000000-000000000000 galaxy/postgres v0.0.0-00010101000000-000000000000 galaxy/redisconn v0.0.0-00010101000000-000000000000 github.com/alicebob/miniredis/v2 v2.37.0 @@ -11,7 +12,6 @@ require ( github.com/go-jet/jet/v2 v2.14.1 github.com/jackc/pgx/v5 v5.9.2 github.com/redis/go-redis/v9 v9.18.0 - github.com/robfig/cron/v3 v3.0.1 github.com/stretchr/testify v1.11.1 github.com/testcontainers/testcontainers-go v0.42.0 github.com/testcontainers/testcontainers-go/modules/postgres v0.42.0 @@ -47,6 +47,7 @@ require ( github.com/pressly/goose/v3 v3.27.1 // indirect github.com/redis/go-redis/extra/rediscmd/v9 v9.18.0 // indirect github.com/redis/go-redis/extra/redisotel/v9 v9.18.0 // indirect + github.com/robfig/cron/v3 v3.0.1 // indirect github.com/sethvargo/go-retry v0.3.0 // indirect go.uber.org/multierr v1.11.0 // indirect golang.org/x/sync v0.20.0 // indirect @@ -95,7 +96,7 @@ require ( github.com/moby/term v0.5.2 // indirect github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect github.com/oasdiff/yaml v0.0.9 // indirect - github.com/oasdiff/yaml3 v0.0.9 // indirect + github.com/oasdiff/yaml3 v0.0.12 // indirect github.com/opencontainers/go-digest v1.0.0 // indirect github.com/opencontainers/image-spec v1.1.1 // indirect github.com/perimeterx/marshmallow v1.1.5 // indirect @@ -123,6 +124,8 @@ require ( gopkg.in/yaml.v3 v3.0.1 // indirect ) +replace galaxy/cronutil => ../pkg/cronutil + replace galaxy/notificationintent => ../pkg/notificationintent replace galaxy/postgres => ../pkg/postgres diff --git a/lobby/go.sum b/lobby/go.sum index a063535..373f9f7 100644 --- a/lobby/go.sum +++ b/lobby/go.sum @@ -208,8 +208,7 @@ 
github.com/ncruces/go-strftime v1.0.0 h1:HMFp8mLCTPp341M/ZnA4qaf7ZlsbTc+miZjCLOF github.com/ncruces/go-strftime v1.0.0/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls= github.com/oasdiff/yaml v0.0.9 h1:zQOvd2UKoozsSsAknnWoDJlSK4lC0mpmjfDsfqNwX48= github.com/oasdiff/yaml v0.0.9/go.mod h1:8lvhgJG4xiKPj3HN5lDow4jZHPlx1i7dIwzkdAo6oAM= -github.com/oasdiff/yaml3 v0.0.9 h1:rWPrKccrdUm8J0F3sGuU+fuh9+1K/RdJlWF7O/9yw2g= -github.com/oasdiff/yaml3 v0.0.9/go.mod h1:y5+oSEHCPT/DGrS++Wc/479ERge0zTFxaF8PbGKcg2o= +github.com/oasdiff/yaml3 v0.0.12 h1:75urAtPeDg2/iDEWwzNrLOWxI9N/dCh81nTTJtokt2M= github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U= github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM= github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040= diff --git a/lobby/internal/domain/game/model.go b/lobby/internal/domain/game/model.go index 16c2205..15f34c5 100644 --- a/lobby/internal/domain/game/model.go +++ b/lobby/internal/domain/game/model.go @@ -7,9 +7,9 @@ import ( "strings" "time" + "galaxy/cronutil" "galaxy/lobby/internal/domain/common" - cron "github.com/robfig/cron/v3" "golang.org/x/mod/semver" ) @@ -213,12 +213,6 @@ type NewGameInput struct { Now time.Time } -// standardCronParser parses the frozen five-field cron expression grammar -// used by turn_schedule. -var standardCronParser = cron.NewParser( - cron.Minute | cron.Hour | cron.Dom | cron.Month | cron.Dow, -) - // New validates input and returns a draft Game record. Validation errors // are returned verbatim so callers can surface them as invalid_request. 
 func New(input NewGameInput) (Game, error) {
@@ -401,7 +395,7 @@ func validateCronExpression(value string) error {
 	if strings.TrimSpace(value) == "" {
 		return fmt.Errorf("turn schedule must not be empty")
 	}
-	if _, err := standardCronParser.Parse(value); err != nil {
+	if _, err := cronutil.Parse(value); err != nil {
 		return fmt.Errorf("turn schedule must be a valid five-field cron expression: %w", err)
 	}
 	return nil
diff --git a/mail/go.mod b/mail/go.mod
index 14f3133..e82cf4d 100644
--- a/mail/go.mod
+++ b/mail/go.mod
@@ -85,7 +85,7 @@
 	github.com/moby/term v0.5.2 // indirect
 	github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect
 	github.com/oasdiff/yaml v0.0.9 // indirect
-	github.com/oasdiff/yaml3 v0.0.9 // indirect
+	github.com/oasdiff/yaml3 v0.0.12 // indirect
 	github.com/opencontainers/go-digest v1.0.0 // indirect
 	github.com/opencontainers/image-spec v1.1.1 // indirect
 	github.com/perimeterx/marshmallow v1.1.5 // indirect
diff --git a/mail/go.sum b/mail/go.sum
index 0ffa8a0..83a8c9d 100644
--- a/mail/go.sum
+++ b/mail/go.sum
@@ -206,8 +206,7 @@
 github.com/ncruces/go-strftime v1.0.0 h1:HMFp8mLCTPp341M/ZnA4qaf7ZlsbTc+miZjCLOF
 github.com/ncruces/go-strftime v1.0.0/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
 github.com/oasdiff/yaml v0.0.9 h1:zQOvd2UKoozsSsAknnWoDJlSK4lC0mpmjfDsfqNwX48=
 github.com/oasdiff/yaml v0.0.9/go.mod h1:8lvhgJG4xiKPj3HN5lDow4jZHPlx1i7dIwzkdAo6oAM=
-github.com/oasdiff/yaml3 v0.0.9 h1:rWPrKccrdUm8J0F3sGuU+fuh9+1K/RdJlWF7O/9yw2g=
-github.com/oasdiff/yaml3 v0.0.9/go.mod h1:y5+oSEHCPT/DGrS++Wc/479ERge0zTFxaF8PbGKcg2o=
+github.com/oasdiff/yaml3 v0.0.12 h1:75urAtPeDg2/iDEWwzNrLOWxI9N/dCh81nTTJtokt2M=
 github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
 github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
 github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040=
diff --git a/notification/README.md b/notification/README.md
index 12eb7bc..1f8dbc8 100644
--- a/notification/README.md
+++ b/notification/README.md
@@ -302,6 +302,8 @@ Accepted intents use the original Redis Stream `stream_entry_id` as
 Rules:
 
 - v1 supports exactly the eighteen `notification_type` values listed above
+- the three `game.*` types — `game.turn.ready`, `game.finished`, and
+  `game.generation_failed` — are produced exclusively by `Game Master`
 - `lobby.application.submitted` keeps one stable `notification_type` and
   one stable `payload_json` shape; private games publish `audience_kind=user`
   while public games publish `audience_kind=admin_email`
diff --git a/pkg/cronutil/cronutil.go b/pkg/cronutil/cronutil.go
new file mode 100644
index 0000000..4633c23
--- /dev/null
+++ b/pkg/cronutil/cronutil.go
@@ -0,0 +1,47 @@
+// Package cronutil provides a thin wrapper over github.com/robfig/cron/v3
+// for parsing the five-field cron expressions used by Galaxy services to
+// describe periodic schedules such as turn_schedule. It exposes only the
+// parser and a Next computation; no scheduler runtime, no logging, and no
+// concurrency primitives.
+package cronutil
+
+import (
+	"fmt"
+	"time"
+
+	"github.com/robfig/cron/v3"
+)
+
+// fiveFieldParser parses standard five-field cron expressions
+// (minute, hour, day-of-month, month, day-of-week). The grammar matches
+// what Galaxy services accept for turn_schedule and is the only grammar
+// supported by this package; six-field expressions with a seconds field
+// are rejected.
+var fiveFieldParser = cron.NewParser(
+	cron.Minute | cron.Hour | cron.Dom | cron.Month | cron.Dow,
+)
+
+// Schedule holds a parsed five-field cron expression and computes the
+// next firing time after a given moment. The zero value is not usable;
+// callers obtain a Schedule from Parse.
+type Schedule struct {
+	inner cron.Schedule
+}
+
+// Parse parses expr as a five-field cron expression and returns the
+// resulting Schedule. Parse returns an error if expr is empty, contains
+// a seconds field, or is otherwise rejected by the underlying parser.
+func Parse(expr string) (Schedule, error) {
+	inner, err := fiveFieldParser.Parse(expr)
+	if err != nil {
+		return Schedule{}, fmt.Errorf("cronutil: parse %q: %w", expr, err)
+	}
+	return Schedule{inner: inner}, nil
+}
+
+// Next returns the next firing time strictly after after. The returned
+// time is always in UTC; callers passing UTC values therefore get UTC
+// values back. Calling Next on a zero-value Schedule panics.
+func (s Schedule) Next(after time.Time) time.Time {
+	return s.inner.Next(after.UTC()).UTC()
+}
diff --git a/pkg/cronutil/cronutil_test.go b/pkg/cronutil/cronutil_test.go
new file mode 100644
index 0000000..6fa6655
--- /dev/null
+++ b/pkg/cronutil/cronutil_test.go
@@ -0,0 +1,101 @@
+package cronutil_test
+
+import (
+	"testing"
+	"time"
+
+	"galaxy/cronutil"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestParseAcceptsFiveFieldExpressions(t *testing.T) {
+	t.Parallel()
+
+	accepted := []string{
+		"0 18 * * *",
+		"*/15 * * * *",
+		"0 */6 * * *",
+		"0 */4 * * *",
+		"0 0 * * *",
+		"0 0 1 1 *",
+		"30 9 * * MON-FRI",
+	}
+	for _, expr := range accepted {
+		t.Run(expr, func(t *testing.T) {
+			t.Parallel()
+
+			_, err := cronutil.Parse(expr)
+			require.NoError(t, err)
+		})
+	}
+}
+
+func TestParseRejectsInvalidExpressions(t *testing.T) {
+	t.Parallel()
+
+	cases := []struct {
+		name string
+		expr string
+	}{
+		{name: "empty", expr: ""},
+		{name: "whitespace only", expr: "   "},
+		{name: "garbage", expr: "not a cron"},
+		{name: "six field with seconds", expr: "*/30 * * * * *"},
+		{name: "too few fields", expr: "0 18 *"},
+		{name: "minute out of range", expr: "60 * * * *"},
+		{name: "hour out of range", expr: "* 24 * * *"},
+		{name: "month out of range", expr: "* * * 13 *"},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+
+			_, err := cronutil.Parse(tc.expr)
+			require.Error(t, err)
+		})
+	}
+}
+
+func TestScheduleNextComputesExpectedFire(t *testing.T) {
+	t.Parallel()
+
+	schedule, err := cronutil.Parse("0 18 * * *")
+	require.NoError(t, err)
+
+	after := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
+	next := schedule.Next(after)
+
+	assert.Equal(t, time.Date(2026, 1, 1, 18, 0, 0, 0, time.UTC), next)
+	assert.Equal(t, time.UTC, next.Location())
+}
+
+func TestScheduleNextStepsToNextFifteenMinuteSlot(t *testing.T) {
+	t.Parallel()
+
+	schedule, err := cronutil.Parse("*/15 * * * *")
+	require.NoError(t, err)
+
+	after := time.Date(2026, 6, 30, 12, 7, 33, 0, time.UTC)
+	next := schedule.Next(after)
+
+	assert.Equal(t, time.Date(2026, 6, 30, 12, 15, 0, 0, time.UTC), next)
+	assert.Equal(t, time.UTC, next.Location())
+}
+
+func TestScheduleNextReturnsUTCForNonUTCInput(t *testing.T) {
+	t.Parallel()
+
+	schedule, err := cronutil.Parse("0 18 * * *")
+	require.NoError(t, err)
+
+	moscow, err := time.LoadLocation("Europe/Moscow")
+	require.NoError(t, err)
+
+	after := time.Date(2026, 1, 1, 12, 0, 0, 0, moscow) // 09:00 UTC
+	next := schedule.Next(after)
+
+	assert.Equal(t, time.Date(2026, 1, 1, 18, 0, 0, 0, time.UTC), next)
+	assert.Equal(t, time.UTC, next.Location())
+}
diff --git a/pkg/cronutil/go.mod b/pkg/cronutil/go.mod
new file mode 100644
index 0000000..b4bc462
--- /dev/null
+++ b/pkg/cronutil/go.mod
@@ -0,0 +1,17 @@
+module galaxy/cronutil
+
+go 1.26.1
+
+require (
+	github.com/robfig/cron/v3 v3.0.1
+	github.com/stretchr/testify v1.11.1
+)
+
+require (
+	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
+	github.com/kr/pretty v0.3.1 // indirect
+	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
+	github.com/rogpeppe/go-internal v1.14.1 // indirect
+	gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
+	gopkg.in/yaml.v3 v3.0.1 // indirect
+)
diff --git a/pkg/cronutil/go.sum b/pkg/cronutil/go.sum
new file mode 100644
index 0000000..ccf5037
--- /dev/null
+++ b/pkg/cronutil/go.sum
@@ -0,0 +1,13 @@
+github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM=
+github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
+github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
+github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U=
+github.com/robfig/cron/v3 v3.0.1 h1:WdRxkvbJztn8LMz/QEvLN5sBU+xKpSqwwUO1Pjr4qDs=
+github.com/robfig/cron/v3 v3.0.1/go.mod h1:eQICP3HwyT7UooqI/z+Ov+PtYAWygg1TEWWzGIFLtro=
+github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ=
+github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
+github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U=
+gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
+gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
+gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
+gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
diff --git a/pkg/model/rest/banish.go b/pkg/model/rest/banish.go
new file mode 100644
index 0000000..dea77de
--- /dev/null
+++ b/pkg/model/rest/banish.go
@@ -0,0 +1,7 @@
+package rest
+
+// BanishRequest is the request body of POST /api/v1/admin/race/banish.
+// RaceName must identify an existing race in the engine roster.
+type BanishRequest struct {
+	RaceName string `json:"race_name" binding:"required,notblank"`
+}
diff --git a/pkg/model/rest/status.go b/pkg/model/rest/status.go
index 0d758bf..7566359 100644
--- a/pkg/model/rest/status.go
+++ b/pkg/model/rest/status.go
@@ -11,6 +11,10 @@ type StateResponse struct {
 	Stage uint `json:"stage"`
 	// List of Game's players
 	Players []PlayerState `json:"player"`
+	// Finished is true on the turn-generation response that ends the
+	// game; otherwise false. Game Master uses this as the sole signal to
+	// run the platform finish flow.
+	Finished bool `json:"finished"`
 }
 
 type PlayerState struct {
diff --git a/rtmanager/README.md b/rtmanager/README.md
index 3be0899..27eb256 100644
--- a/rtmanager/README.md
+++ b/rtmanager/README.md
@@ -561,10 +561,11 @@ Producer: `Runtime Manager`. Consumer: `Game Lobby`.
 | `error_code` | string | Stable code. `replay_no_op` for idempotent re-runs. |
 | `error_message` | string | Operator-readable detail. |
 
-### `runtime:health_events` (out, new)
+### `runtime:health_events` (out)
 
-Producer: `Runtime Manager`. Consumers: `Game Master`; `Game Lobby` and `Admin Service`
-are reserved as future consumers.
+Producer: `Runtime Manager`. Consumer: `Game Master` — confirmed in
+production. `Game Lobby` and `Admin Service` are reserved as future
+consumers; they do not read the stream in v1.
 
 | Field | Type | Notes |
 | --- | --- | --- |
diff --git a/rtmanager/go.mod b/rtmanager/go.mod
index ba68ee9..f403d52 100644
--- a/rtmanager/go.mod
+++ b/rtmanager/go.mod
@@ -89,7 +89,7 @@
 	github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect
 	github.com/morikuni/aec v1.1.0 // indirect
 	github.com/oasdiff/yaml v0.0.9 // indirect
-	github.com/oasdiff/yaml3 v0.0.9 // indirect
+	github.com/oasdiff/yaml3 v0.0.12 // indirect
 	github.com/opencontainers/go-digest v1.0.0 // indirect
 	github.com/opencontainers/image-spec v1.1.1 // indirect
 	github.com/perimeterx/marshmallow v1.1.5 // indirect
diff --git a/rtmanager/go.sum b/rtmanager/go.sum
index 4d55a44..c8a5415 100644
--- a/rtmanager/go.sum
+++ b/rtmanager/go.sum
@@ -213,8 +213,7 @@
 github.com/ncruces/go-strftime v1.0.0 h1:HMFp8mLCTPp341M/ZnA4qaf7ZlsbTc+miZjCLOF
 github.com/ncruces/go-strftime v1.0.0/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
 github.com/oasdiff/yaml v0.0.9 h1:zQOvd2UKoozsSsAknnWoDJlSK4lC0mpmjfDsfqNwX48=
 github.com/oasdiff/yaml v0.0.9/go.mod h1:8lvhgJG4xiKPj3HN5lDow4jZHPlx1i7dIwzkdAo6oAM=
-github.com/oasdiff/yaml3 v0.0.9 h1:rWPrKccrdUm8J0F3sGuU+fuh9+1K/RdJlWF7O/9yw2g=
-github.com/oasdiff/yaml3 v0.0.9/go.mod h1:y5+oSEHCPT/DGrS++Wc/479ERge0zTFxaF8PbGKcg2o=
+github.com/oasdiff/yaml3 v0.0.12 h1:75urAtPeDg2/iDEWwzNrLOWxI9N/dCh81nTTJtokt2M=
 github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
 github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
 github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040=
diff --git a/user/go.mod b/user/go.mod
index 166e1e6..36d18f2 100644
--- a/user/go.mod
+++ b/user/go.mod
@@ -103,7 +103,7 @@
 	github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect
 	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
 	github.com/oasdiff/yaml v0.0.9 // indirect
-	github.com/oasdiff/yaml3 v0.0.9 // indirect
+	github.com/oasdiff/yaml3 v0.0.12 // indirect
 	github.com/opencontainers/go-digest v1.0.0 // indirect
 	github.com/opencontainers/image-spec v1.1.1 // indirect
 	github.com/pelletier/go-toml/v2 v2.3.0 // indirect
diff --git a/user/go.sum b/user/go.sum
index 933ab7f..3ac6a4c 100644
--- a/user/go.sum
+++ b/user/go.sum
@@ -240,8 +240,7 @@
 github.com/ncruces/go-strftime v1.0.0 h1:HMFp8mLCTPp341M/ZnA4qaf7ZlsbTc+miZjCLOF
 github.com/ncruces/go-strftime v1.0.0/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
 github.com/oasdiff/yaml v0.0.9 h1:zQOvd2UKoozsSsAknnWoDJlSK4lC0mpmjfDsfqNwX48=
 github.com/oasdiff/yaml v0.0.9/go.mod h1:8lvhgJG4xiKPj3HN5lDow4jZHPlx1i7dIwzkdAo6oAM=
-github.com/oasdiff/yaml3 v0.0.9 h1:rWPrKccrdUm8J0F3sGuU+fuh9+1K/RdJlWF7O/9yw2g=
-github.com/oasdiff/yaml3 v0.0.9/go.mod h1:y5+oSEHCPT/DGrS++Wc/479ERge0zTFxaF8PbGKcg2o=
+github.com/oasdiff/yaml3 v0.0.12 h1:75urAtPeDg2/iDEWwzNrLOWxI9N/dCh81nTTJtokt2M=
 github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
 github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
 github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040=