# Runtime Manager
`Runtime Manager` (RTM) is the only Galaxy platform service permitted to interact with the
Docker daemon. It owns the lifecycle of `galaxy/game` engine containers and the technical
runtime view of running games. Other services consume RTM via two transports: an asynchronous
Redis Streams contract (used by `Game Lobby`) and a synchronous internal REST surface (used by
`Game Master` and `Admin Service`).
## References

- [`../ARCHITECTURE.md`](../ARCHITECTURE.md) — system architecture, §9 Runtime Manager.
- [`../TESTING.md`](../TESTING.md) §7 — testing matrix for RTM.
- [`./docs/README.md`](./docs/README.md) — service-local documentation entry point.
- [`./api/internal-openapi.yaml`](./api/internal-openapi.yaml) — REST contract.
- [`./api/runtime-jobs-asyncapi.yaml`](./api/runtime-jobs-asyncapi.yaml) — start/stop job
  streams contract.
- [`./api/runtime-health-asyncapi.yaml`](./api/runtime-health-asyncapi.yaml) —
  `runtime:health_events` stream contract.
- [`../game/README.md`](../game/README.md) — game engine container contract (env, ports,
  `/healthz`).
- [`../lobby/README.md`](../lobby/README.md) — Game Lobby integration with RTM.
## Purpose

A running Galaxy game lives in exactly one Docker container. The platform must be able to:

- create the container with the right engine version and configuration;
- supply the engine with a stable storage location for game state;
- keep the runtime status visible to platform-level services;
- replace the container in place for patch upgrades and restarts;
- remove containers that are no longer needed;
- detect and surface engine failures to whoever should react.

`Runtime Manager` is the single component that performs these actions. It deliberately does
**not** reason about platform metadata, membership, schedules, turn cutoffs, or any other
business state. Game Lobby owns platform metadata; Game Master will own runtime business state
when implemented.
## Scope

`Runtime Manager` is the source of truth for:

- the mapping `game_id -> current_container_id` for every running container;
- the durable history of every start, stop, restart, patch, and cleanup operation it performed;
- the most recent technical health observation per game (last Docker event, last successful or
  failed probe, last inspect result).

`Runtime Manager` is not the source of truth for:

- any business or platform-level metadata of a game (owned by `Game Lobby`);
- runtime state visible to players or operators as game state, including current turn and
  generation status (owned by `Game Master`);
- the engine version catalogue or which engine version a game is allowed to use (`Game Master`
  is the future owner; `Game Lobby` supplies `image_ref` in v1);
- contents of the engine state directory; that is engine domain;
- backup, archival, or operator cleanup of state directories.
## Non-Goals

- Multi-instance operation in v1. Coordination is single-process; multiple replicas are an
  explicit future iteration.
- Engine version arbitration. The producer (`Game Lobby` in v1, `Game Master` later) supplies
  `image_ref`.
- Image registry control. Pull policy is configurable, but RTM does not push, retag, or
  promote images.
- TLS or mTLS on the internal listener. RTM trusts its network segment.
- Direct delivery of player-visible push notifications. RTM publishes admin-only notification
  intents only for failures invisible elsewhere; everything else is delegated.
- Kubernetes, Docker Swarm, or other orchestrators. v1 targets a single Docker daemon reached
  through `unix:///var/run/docker.sock`.
## Position in the System

```mermaid
flowchart LR
    Lobby["Game Lobby"]
    GM["Game Master"]
    Admin["Admin Service"]
    Notify["Notification Service"]
    RTM["Runtime Manager"]
    Engine["Game Engine container"]
    Docker["Docker Daemon"]
    Postgres["PostgreSQL\nschema rtmanager"]
    Redis["Redis\nstreams + leases"]

    Lobby -->|runtime:start_jobs / stop_jobs| RTM
    RTM -->|runtime:job_results| Lobby
    GM -->|internal REST| RTM
    Admin -->|internal REST| RTM
    RTM -->|notification:intents (admin)| Notify
    RTM -->|runtime:health_events| Redis
    RTM <--> Docker
    Docker -->|create / start / stop / rm| Engine
    RTM --> Postgres
    RTM --> Redis
    Engine -.bind mount.- StateDir["host:\n<RTMANAGER_GAME_STATE_ROOT>/{game_id}"]
```
## Responsibility Boundaries

`Runtime Manager` is responsible for:

- accepting start, stop, restart, patch, inspect, and cleanup requests through the supported
  transports and producing one durable outcome per request;
- creating Docker containers from a producer-supplied `image_ref` and binding them to the
  configured Docker network and host state directory;
- enforcing the one-game-one-container invariant in its own state and on Docker;
- monitoring container health through Docker events, periodic inspect, and active HTTP probes;
- publishing technical runtime events (`runtime:job_results`, `runtime:health_events`) and
  admin-only notification intents for failures that no other service can observe;
- reconciling its persistent state with Docker reality on startup and periodically;
- removing exited containers automatically by retention TTL or explicitly by admin command.

`Runtime Manager` is not responsible for:

- evaluating whether a game is allowed to start (Lobby validates roster, schedule, etc.);
- registering a started runtime with `Game Master` (Lobby calls GM after a successful job
  result);
- mapping platform users to engine players (GM owns this mapping);
- player command routing (GM proxies player commands directly to the engine);
- cleaning up host state directories;
- patching the engine version registry; the registry lives in `Game Master`.
## Container Model

### Network

Containers attach to a single user-defined Docker bridge network. The network is provisioned
**outside** RTM: docker-compose, Terraform, or an operator runbook creates `galaxy-net` (or
whatever name is configured via `RTMANAGER_DOCKER_NETWORK`).

RTM validates the network's presence at startup. A missing network is a fail-fast condition;
the process exits non-zero before opening any listener.
### DNS name and engine endpoint

Each container is created with hostname `galaxy-game-{game_id}` and is attached to the
configured network. Docker's embedded DNS resolves the hostname for any other container in the
same network.

The `engine_endpoint` published in `runtime:job_results` and visible through the inspect REST
endpoint is the full URL `http://galaxy-game-{game_id}:8080`. The port is fixed at `8080`
inside the container; RTM does not publish ports to the host.

Restart and patch keep the same DNS name. The `container_id` changes; the `engine_endpoint`
does not.
### State storage (bind mount)

Engine state lives on the host filesystem. RTM never uses Docker named volumes — the rationale
is operator-friendly backup and inspection.

- Host root: `RTMANAGER_GAME_STATE_ROOT` (operator-supplied, e.g. `/var/lib/galaxy/games`).
- Per-game directory: `<RTMANAGER_GAME_STATE_ROOT>/{game_id}`. RTM creates it with permissions
  `RTMANAGER_GAME_STATE_DIR_MODE` (default `0750`) and ownership `RTMANAGER_GAME_STATE_OWNER_UID`
  / `_GID` (default `0:0` — operators override for non-root engines).
- Bind mount: the per-game directory is mounted into the container at the path declared by
  `RTMANAGER_ENGINE_STATE_MOUNT_PATH` (default `/var/lib/galaxy-game`).
- Environment: the container receives `GAME_STATE_PATH=<mount path>`, and the engine resolves
  the path from this variable. The same value is also forwarded as `STORAGE_PATH` for backward
  compatibility — both names are accepted in v1.

RTM never deletes the host state directory. Removing it is the responsibility of operator
tooling (backup, manual cleanup, or future Admin Service workflows). Removing the container
through the cleanup endpoint or the retention TTL leaves the directory intact.

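A minimal sketch of the directory preparation described above; `ensureStateDir` and its
parameters are illustrative names, not the production API:

```go
// Sketch only: ensure the per-game state directory before `docker create`.
package rtmsketch

import (
	"fmt"
	"os"
	"path/filepath"
)

func ensureStateDir(root, gameID string, mode os.FileMode, uid, gid int) (string, error) {
	dir := filepath.Join(root, gameID)
	if err := os.MkdirAll(dir, mode); err != nil {
		return "", fmt.Errorf("create state dir: %w", err)
	}
	// MkdirAll is subject to the process umask; re-assert the configured mode.
	if err := os.Chmod(dir, mode); err != nil {
		return "", fmt.Errorf("chmod state dir: %w", err)
	}
	// Default ownership is 0:0; operators override for non-root engines.
	if err := os.Chown(dir, uid, gid); err != nil {
		return "", fmt.Errorf("chown state dir: %w", err)
	}
	return dir, nil
}
```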
### Container labels

RTM applies the following labels to every container it creates:

| Label | Value | Purpose |
| --- | --- | --- |
| `com.galaxy.owner` | `rtmanager` | Filter for `docker ps` and reconcile. |
| `com.galaxy.kind` | `game-engine` | Differentiates from infra containers. |
| `com.galaxy.game_id` | `{game_id}` | Reverse lookup from container to platform game. |
| `com.galaxy.engine_image_ref` | `{image_ref}` | Cross-check against `runtime_records`. |
| `com.galaxy.started_at_ms` | `{ms}` | Unambiguous start timestamp. |

Separately, labels on the resolved engine image are read to choose resource limits (see
below).
### Resource limits

Resource limits originate in the **engine image**, not in the producer envelope or RTM config:

| Image label | Container limit | RTM fallback config |
| --- | --- | --- |
| `com.galaxy.cpu_quota` | `--cpus` value | `RTMANAGER_DEFAULT_CPU_QUOTA` (default `1.0`) |
| `com.galaxy.memory` | `--memory` value | `RTMANAGER_DEFAULT_MEMORY` (default `512m`) |
| `com.galaxy.pids_limit` | `--pids-limit` value | `RTMANAGER_DEFAULT_PIDS_LIMIT` (default `512`) |

If a label is missing or unparseable, RTM uses the matching fallback. Producers never pass
limits.

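A minimal sketch of the label-with-fallback rule, using an illustrative `Limits` shape; the
memory value is kept as a raw string here, and validating it is left to the Docker adapter:

```go
// Sketch only: derive container limits from engine-image labels with fallbacks.
package rtmsketch

import "strconv"

type Limits struct {
	CPUQuota  float64 // --cpus
	Memory    string  // --memory, e.g. "512m"
	PidsLimit int64   // --pids-limit
}

// limitsFromLabels applies a label when present and parseable, else the fallback.
func limitsFromLabels(labels map[string]string, fallback Limits) Limits {
	out := fallback
	if v, ok := labels["com.galaxy.cpu_quota"]; ok {
		if f, err := strconv.ParseFloat(v, 64); err == nil {
			out.CPUQuota = f
		}
	}
	if v, ok := labels["com.galaxy.memory"]; ok && v != "" {
		out.Memory = v
	}
	if v, ok := labels["com.galaxy.pids_limit"]; ok {
		if n, err := strconv.ParseInt(v, 10, 64); err == nil {
			out.PidsLimit = n
		}
	}
	return out
}
```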
### Logging driver

Engine container stdout / stderr are routed by Docker's logging driver. RTM passes the driver
and its options when creating the container:

- `RTMANAGER_DOCKER_LOG_DRIVER` (default `json-file`).
- `RTMANAGER_DOCKER_LOG_OPTS` (default empty; comma-separated `key=value` pairs).

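A minimal sketch of parsing the comma-separated pairs into the map shape Docker's log config
expects; the strict error policy here is an assumption:

```go
// Sketch only: parse RTMANAGER_DOCKER_LOG_OPTS ("k1=v1,k2=v2") into a map.
package rtmsketch

import (
	"fmt"
	"strings"
)

func parseLogOpts(raw string) (map[string]string, error) {
	opts := map[string]string{}
	if raw == "" {
		return opts, nil
	}
	for _, pair := range strings.Split(raw, ",") {
		k, v, ok := strings.Cut(pair, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid log opt %q", pair)
		}
		opts[strings.TrimSpace(k)] = v
	}
	return opts, nil
}
```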
RTM never reads the container's stdout itself. Operators consume engine logs via `docker logs`
or via whatever sink the configured driver feeds (fluentd, journald, etc.).

The production Docker SDK adapter that creates and starts these containers lives at
`internal/adapters/docker/`. Its design rationale — the fixed engine port, partial rollback on
`ContainerStart` failure, the events-stream filter choice, and the `mockgen`-driven
service-test fixture — is captured in [`docs/adapters.md`](docs/adapters.md).
## Runtime Surface

### Listeners

| Listener | Default address | Purpose |
| --- | --- | --- |
| `internal` HTTP | `:8096` (`RTMANAGER_INTERNAL_HTTP_ADDR`) | Probes (`/healthz`, `/readyz`) and the trusted REST surface for `Game Master` and `Admin Service`. |

There is no public listener. The internal listener is unauthenticated and assumes a trusted
network segment.
### Background workers

| Worker | Driver | Description |
| --- | --- | --- |
| `startjobs` consumer | Redis Stream `runtime:start_jobs` | Decodes the start envelope and invokes the start service. |
| `stopjobs` consumer | Redis Stream `runtime:stop_jobs` | Decodes the stop envelope and invokes the stop service. |
| Docker events listener | Docker `/events` API | Subscribes with the label filter; emits `runtime:health_events` for exited / oom / disappeared containers (`container_started` is emitted by the start service, see Health Monitoring). |
| Active HTTP probe | Periodic | `GET {engine_endpoint}/healthz` for every running runtime; emits `probe_failed` / `probe_recovered` with hysteresis. |
| Periodic Docker inspect | Periodic | Refreshes inspect data; emits `inspect_unhealthy` when `restart_count` grows or the status is unexpected. |
| Reconciler | Startup + periodic | Reconciles `runtime_records` with `docker ps` (see the Reconciliation section). |
| Container cleanup | Periodic | Removes exited containers older than `RTMANAGER_CONTAINER_RETENTION_DAYS`. |
### Startup dependencies

In start order:

1. PostgreSQL primary (DSN `RTMANAGER_POSTGRES_PRIMARY_DSN`). Goose migrations apply
   synchronously before any listener opens.
2. Redis master (`RTMANAGER_REDIS_MASTER_ADDR`).
3. Docker daemon at `RTMANAGER_DOCKER_HOST` (default `unix:///var/run/docker.sock`). RTM
   verifies the API ping and the presence of `RTMANAGER_DOCKER_NETWORK`.
4. Telemetry exporter (OTLP grpc/http or stdout).
5. Internal HTTP listener.
6. Reconciler runs once and blocks until done.
7. Background workers start.

A failure in any step is fatal and exits the process non-zero.
### Probes

`/healthz` reports liveness — the process responds when the HTTP server is alive.

`/readyz` reports readiness — `200` only when:

- the PostgreSQL pool can ping the primary;
- the Redis master client can ping;
- the Docker client can ping;
- the configured Docker network exists.

Both probes are documented in [`./api/internal-openapi.yaml`](./api/internal-openapi.yaml).
## Lifecycles

All operations share a per-game-id Redis lease (`rtmanager:game_lease:{game_id}`,
TTL `RTMANAGER_GAME_LEASE_TTL_SECONDS`, default `60`). The lease serialises operations on a
single game across all entry points (stream consumers and REST handlers). v1 does not renew
the lease mid-operation; long pulls of multi-GB images can therefore expire the lease before
the operation finishes — the trade-off is documented in
[`docs/services.md` §1](docs/services.md).

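A minimal sketch of lease acquisition with go-redis v9; the token scheme and error wrapping
are assumptions:

```go
// Sketch only: per-game lease via SET NX PX, matching the key shape above.
package rtmsketch

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// acquireGameLease returns true when the lease was taken, false when another
// operation on the same game currently holds it.
func acquireGameLease(ctx context.Context, rdb *redis.Client, gameID, token string, ttl time.Duration) (bool, error) {
	key := fmt.Sprintf("rtmanager:game_lease:%s", gameID)
	ok, err := rdb.SetNX(ctx, key, token, ttl).Result()
	if err != nil {
		return false, fmt.Errorf("lease acquire: %w", err)
	}
	return ok, nil
}
```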
### Start

**Triggers:**

- Lobby: a Redis Streams entry on `runtime:start_jobs` with envelope
  `{game_id, image_ref, requested_at_ms}`.
- Game Master / Admin Service: `POST /api/v1/internal/runtimes/{game_id}/start` with body
  `{image_ref}`.

**Pre-conditions:**

- `image_ref` is a non-empty string and parseable as a Docker reference.
- The configured Docker network exists.
- The lease for `{game_id}` is acquired.

**Flow on success:**

1. Read `runtime_records.{game_id}`. If `status=running` with the same `image_ref`, return
   the existing record (idempotent success, `error_code=replay_no_op`).
2. Pull the image per `RTMANAGER_IMAGE_PULL_POLICY` (default `if_missing`).
3. Inspect the resolved image and derive resource limits from its labels.
4. Ensure the per-game state directory exists with the configured mode and ownership.
5. `docker create` with the configured network, hostname, labels, env (`GAME_STATE_PATH`,
   `STORAGE_PATH`), bind mount, log driver, and resource limits (see the sketch after this
   list).
6. `docker start`.
7. Upsert `runtime_records` (`status=running`, `current_container_id`, `engine_endpoint`,
   `current_image_ref`, `started_at`, `last_op_at`).
8. Append an `operation_log` entry (`op_kind=start`, `outcome=success`, source-specific
   `op_source`).
9. Publish `runtime:health_events` `container_started`.
10. For Lobby callers: publish `runtime:job_results`
    `{game_id, outcome=success, container_id, engine_endpoint}`.
    For REST callers: respond `200` with the runtime record.

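A minimal sketch of steps 5–6 against the Docker Go SDK (the option struct names follow v26+;
older SDKs keep them in the `types` package). The env values, label set, fixed limits, and the
helper name are abbreviations and assumptions; the real adapter lives in
`internal/adapters/docker/`:

```go
// Sketch only: create, start, and roll back on start failure.
package rtmsketch

import (
	"context"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/api/types/network"
	"github.com/docker/docker/client"
)

func createAndStart(ctx context.Context, cli *client.Client, gameID, imageRef, stateDir string) (string, error) {
	host := "galaxy-game-" + gameID
	pids := int64(512)
	cfg := &container.Config{
		Image:    imageRef,
		Hostname: host,
		Env:      []string{"GAME_STATE_PATH=/var/lib/galaxy-game", "STORAGE_PATH=/var/lib/galaxy-game"},
		Labels:   map[string]string{"com.galaxy.owner": "rtmanager", "com.galaxy.kind": "game-engine", "com.galaxy.game_id": gameID},
	}
	hostCfg := &container.HostConfig{
		Binds:     []string{stateDir + ":/var/lib/galaxy-game"},
		LogConfig: container.LogConfig{Type: "json-file"},
		Resources: container.Resources{NanoCPUs: 1_000_000_000, Memory: 512 << 20, PidsLimit: &pids},
	}
	netCfg := &network.NetworkingConfig{
		EndpointsConfig: map[string]*network.EndpointSettings{"galaxy-net": {}},
	}
	created, err := cli.ContainerCreate(ctx, cfg, hostCfg, netCfg, nil, host)
	if err != nil {
		return "", err
	}
	if err := cli.ContainerStart(ctx, created.ID, container.StartOptions{}); err != nil {
		// Partial rollback: a failed start must never leave a created container behind.
		_ = cli.ContainerRemove(ctx, created.ID, container.RemoveOptions{Force: true})
		return "", err
	}
	return created.ID, nil
}
```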
**Failure paths:**

| Failure | PG side effect | Notification intent | Outcome to caller |
| --- | --- | --- | --- |
| Invalid `image_ref` shape, network missing | `operation_log` failure | `runtime.start_config_invalid` | `failure / start_config_invalid` |
| Image pull error | `operation_log` failure | `runtime.image_pull_failed` | `failure / image_pull_failed` |
| `docker create` / `start` error | `operation_log` failure | `runtime.container_start_failed` | `failure / container_start_failed` |
| State directory creation error | `operation_log` failure | `runtime.start_config_invalid` | `failure / start_config_invalid` |

A failed start never leaves a partially running container: if `docker create` succeeded but
a subsequent step failed, RTM removes the container before recording the failure.

The production start orchestrator that implements the flow and the failure paths above lives
at `internal/service/startruntime/`. Its design rationale — why the per-game lease and the
health-events publisher live with the start service, the `Result`-shaped contract consumed by
the stream consumer and the REST handler, the rollback rule on Upsert failure, and the
`created_at`-preservation rule for re-starts — is captured in
[`docs/services.md`](docs/services.md).
### Stop

**Triggers:**

- Lobby: Redis Streams entry on `runtime:stop_jobs` with envelope
  `{game_id, reason, requested_at_ms}`. `reason ∈ {orphan_cleanup, cancelled, finished,
  admin_request, timeout}`.
- Game Master / Admin Service: `POST /api/v1/internal/runtimes/{game_id}/stop` with body
  `{reason}`.

**Pre-conditions:**

- Lease acquired.

**Flow on success:**

1. Read `runtime_records.{game_id}`. If `status` is `stopped` or `removed`, return
   idempotent success (`error_code=replay_no_op`).
2. `docker stop` with `RTMANAGER_CONTAINER_STOP_TIMEOUT_SECONDS` (default `30`). Docker fires
   SIGKILL if the engine ignores SIGTERM beyond the timeout. RTM does not call any HTTP
   shutdown endpoint on the engine (a sketch follows the failure table below).
3. Update `runtime_records` (`status=stopped`, `stopped_at`, `last_op_at`).
4. Append an `operation_log` entry.
5. Publish `runtime:job_results` (for Lobby) or respond `200` (for REST callers).

The container stays in the `exited` state until the cleanup worker removes it (TTL) or an
admin command forces removal.

**Failure paths:**

| Failure | Outcome |
| --- | --- |
| Container not found in Docker but record `running` | Update record `status=removed`, publish `container_disappeared`, return `success` (RTM treats this as already stopped). |
| `docker stop` returns non-zero, container still alive | Failure recorded, no state change. Caller may retry. |

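A minimal sketch of step 2 with the Docker Go SDK; `container.StopOptions` takes the timeout
in seconds, and the daemon, not the SDK, escalates to SIGKILL after it expires:

```go
// Sketch only: one ContainerStop call carries the whole stop semantics above.
package rtmsketch

import (
	"context"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

func stopContainer(ctx context.Context, cli *client.Client, containerID string, timeoutSeconds int) error {
	// SIGTERM now; the daemon sends SIGKILL once the timeout elapses.
	return cli.ContainerStop(ctx, containerID, container.StopOptions{Timeout: &timeoutSeconds})
}
```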
### Restart

**Triggers:**

- Game Master / Admin Service: `POST /api/v1/internal/runtimes/{game_id}/restart`.

Restart is **recreate**: stop + remove + run with the same `image_ref` and the same bind
mount. The `container_id` changes; the `engine_endpoint` is stable.

**Flow:**

1. Read `runtime_records.{game_id}` and capture the current `image_ref`.
2. Acquire the lease.
3. Run the stop flow (without releasing the lease).
4. `docker rm` the container.
5. Run the start flow with the captured `image_ref`.
6. Append a single `operation_log` entry with `op_kind=restart` and a correlation id linking
   the implicit stop and start log entries.

If any inner step fails, the operation log records the partial outcome and the outer caller
receives the same failure; the runtime record converges to whatever state Docker reports.
### Patch

**Triggers:**

- Game Master / Admin Service: `POST /api/v1/internal/runtimes/{game_id}/patch` with body
  `{image_ref}`.

Patch is a restart with a **new** `image_ref`. The engine reads its state from the bind mount
on startup, so any data written before the patch survives.

**Pre-conditions:**

- New and current image refs both parse as semver tags (`image_ref_not_semver` failure
  otherwise).
- Major and minor versions are equal between current and new (`semver_patch_only` failure
  otherwise).

**Flow:** identical to restart, with the new `image_ref` injected before the start step. The
`operation_log` entry has `op_kind=patch`. A sketch of the version guard follows.
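A minimal sketch of the two pre-conditions, assuming the version is the image tag after the
last `:` with an optional `v` prefix; real reference parsing (digests, numeric validation) is
stricter:

```go
// Sketch only: enforce image_ref_not_semver and semver_patch_only.
package rtmsketch

import (
	"fmt"
	"strings"
)

// majorMinor extracts "MAJOR.MINOR" from an image ref's tag.
func majorMinor(imageRef string) (string, error) {
	i := strings.LastIndex(imageRef, ":")
	if i < 0 {
		return "", fmt.Errorf("image_ref_not_semver: no tag in %q", imageRef)
	}
	parts := strings.SplitN(strings.TrimPrefix(imageRef[i+1:], "v"), ".", 3)
	if len(parts) != 3 {
		return "", fmt.Errorf("image_ref_not_semver: %q", imageRef[i+1:])
	}
	return parts[0] + "." + parts[1], nil
}

// checkPatchAllowed fails unless current and next share major and minor.
func checkPatchAllowed(current, next string) error {
	cur, err := majorMinor(current)
	if err != nil {
		return err
	}
	nxt, err := majorMinor(next)
	if err != nil {
		return err
	}
	if cur != nxt {
		return fmt.Errorf("semver_patch_only: %s -> %s", cur, nxt)
	}
	return nil
}
```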
### Cleanup

**Triggers:**

- Periodic worker: every container with `runtime_records.status=stopped` and
  `last_op_at < now - RTMANAGER_CONTAINER_RETENTION_DAYS` (default `30`).
- Admin Service: `DELETE /api/v1/internal/runtimes/{game_id}/container`.

**Pre-conditions:**

- The container is not in the `running` state. RTM refuses to remove a running container
  through this path; stop first.

**Flow:**

1. Acquire the lease.
2. `docker rm` the container.
3. Update `runtime_records` (`status=removed`, `removed_at`, `current_container_id=NULL`,
   `last_op_at`).
4. Append an `operation_log` entry (`op_kind=cleanup_container`,
   `op_source ∈ {auto_ttl, admin_rest}`).

The host state directory is left untouched.
## Health Monitoring

Three independent sources feed `runtime:health_events` and `health_snapshots` (a sketch of the
probe hysteresis follows the list):

1. **Docker events listener.** Subscribes to the Docker events stream and filters
   container-scoped events by the `com.galaxy.owner=rtmanager` label written into every
   container by the start service. Emits:
   - `container_exited` (action=`die` with a non-zero exit code; exit `0` is the normal
     graceful stop and is suppressed).
   - `container_oom` (action=`oom`).
   - `container_disappeared` (action=`destroy` observed for a `runtime_records.status=running`
     row whose `current_container_id` still matches the destroyed container, i.e. a destroy
     RTM did not initiate).

   `container_started` is emitted by the start service when it runs the container (see
   `internal/service/startruntime`), not by this listener.
2. **Periodic Docker inspect** every `RTMANAGER_INSPECT_INTERVAL` (default `30s`). Emits
   `inspect_unhealthy` when:
   - `RestartCount` increases between observations;
   - `State.Status != "running"` for a record marked running;
   - `State.Health.Status == "unhealthy"` if the image declares a Docker `HEALTHCHECK`.
3. **Active HTTP probe** every `RTMANAGER_PROBE_INTERVAL` (default `15s`). Calls
   `GET {engine_endpoint}/healthz` with `RTMANAGER_PROBE_TIMEOUT` (default `2s`). Emits:
   - `probe_failed` after `RTMANAGER_PROBE_FAILURES_THRESHOLD` consecutive failures
     (default `3`);
   - `probe_recovered` on the first success after a `probe_failed` was published.

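A minimal sketch of the hysteresis from item 3, with an illustrative state struct; the
production state model lives in `internal/worker/healthprobe`:

```go
// Sketch only: one probe-state machine per running runtime.
package rtmsketch

type probeState struct {
	consecutiveFailures int
	failedPublished     bool
}

// observe records one probe attempt and returns the event to emit, if any.
func (s *probeState) observe(success bool, threshold int) string {
	if success {
		recovered := s.failedPublished
		s.consecutiveFailures = 0
		s.failedPublished = false
		if recovered {
			return "probe_recovered" // first success after a published probe_failed
		}
		return ""
	}
	s.consecutiveFailures++
	if !s.failedPublished && s.consecutiveFailures >= threshold {
		s.failedPublished = true
		return "probe_failed" // published once per failure episode
	}
	return ""
}
```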
Every emission updates `health_snapshots.{game_id}` (the latest event becomes the snapshot)
and appends to `runtime:health_events`.

In v1, RTM publishes admin-only notification intents only for first-touch failures of the
start flow. All ongoing health changes (probe failures, OOMs, exits) flow through
`runtime:health_events` only. `Game Master` is the consumer that decides whether to escalate
runtime-level events into notifications.

The three workers that implement the sources above live in
`internal/worker/{dockerevents,dockerinspect,healthprobe}`. Their design rationale —
`container_started` ownership, `container_disappeared` emission rules, `die` exit-code
suppression, the probe hysteresis state model, the parallel-probe cap, and the events-listener
reconnect policy — is captured in [`docs/workers.md`](docs/workers.md).
## Reconciliation

RTM never assumes Docker and PostgreSQL are in sync.

At startup (blocking, before workers start) and every `RTMANAGER_RECONCILE_INTERVAL`
(default `5m`), the reconciler does the following (the drift decision is sketched after this
list):

1. List Docker containers with the label `com.galaxy.owner=rtmanager`.
2. For each running container without a matching record:
   - Insert a `runtime_records` row with `status=running`, the discovered
     `current_image_ref`, `engine_endpoint`, and `started_at` taken from
     `com.galaxy.started_at_ms` if present (otherwise from `State.StartedAt`).
   - Append an `operation_log` entry with `op_kind=reconcile_adopt`,
     `op_source=auto_reconcile`.
   - **Never stop or remove an unrecorded container.** Operators may have started one
     manually for diagnostics; RTM stays out of their way.
3. For each `runtime_records` row with `status=running` whose container is missing:
   - Update `status=removed`, `removed_at=now`, `current_container_id=NULL`.
   - Publish `runtime:health_events` `container_disappeared`.
   - Append an `operation_log` entry with `op_kind=reconcile_dispose`.
4. For each `runtime_records` row with `status=running` whose container exists but is
   `exited`:
   - Update `status=stopped`, `stopped_at=now` (reconciler observation time).
   - Publish `runtime:health_events` `container_exited` with the observed exit code.

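A minimal sketch of the drift decision implied by this list; the function and parameter names
are illustrative, and the returned strings match the `rtmanager.reconcile_drift` metric
labels:

```go
// Sketch only: classify one (record, container) pair during a reconcile pass.
package rtmsketch

// reconcileAction returns "adopt", "dispose", "observed_exited", or "none".
// containerState is the Docker-reported state, "" when the container is gone.
func reconcileAction(hasRecord bool, recordStatus, containerState string) string {
	switch {
	case !hasRecord && containerState == "running":
		return "adopt" // insert a record; never stop an unrecorded container
	case hasRecord && recordStatus == "running" && containerState == "":
		return "dispose" // mark removed, publish container_disappeared
	case hasRecord && recordStatus == "running" && containerState == "exited":
		return "observed_exited" // mark stopped, publish container_exited
	default:
		return "none" // record and Docker agree
	}
}
```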
The reconciler implementation lives at `internal/worker/reconcile/` and the periodic
TTL-cleanup worker at `internal/worker/containercleanup/`; the cleanup worker delegates
removal to `internal/service/cleanupcontainer/`. The design rationale — the per-game
lease around every drift mutation, the third `observed_exited` path beyond the two
named cases, the synchronous `ReconcileNow` plus periodic `Component` split, and why
the cleanup worker is a thin TTL filter on top of the existing service — is captured in
[`docs/workers.md`](docs/workers.md).
## Trusted Surfaces

### Internal REST

The internal REST surface is consumed by `Game Master` (sync interactions for inspect,
restart, patch, stop, cleanup) and `Admin Service` (operational tooling, force-cleanup).
The listener is unauthenticated; downstream services rely on network segmentation.

| Method | Path | Operation ID | Caller |
| --- | --- | --- | --- |
| `GET` | `/healthz` | `internalHealthz` | platform probes |
| `GET` | `/readyz` | `internalReadyz` | platform probes |
| `GET` | `/api/v1/internal/runtimes` | `internalListRuntimes` | GM, Admin |
| `GET` | `/api/v1/internal/runtimes/{game_id}` | `internalGetRuntime` | GM, Admin |
| `POST` | `/api/v1/internal/runtimes/{game_id}/start` | `internalStartRuntime` | GM, Admin |
| `POST` | `/api/v1/internal/runtimes/{game_id}/stop` | `internalStopRuntime` | GM, Admin |
| `POST` | `/api/v1/internal/runtimes/{game_id}/restart` | `internalRestartRuntime` | GM, Admin |
| `POST` | `/api/v1/internal/runtimes/{game_id}/patch` | `internalPatchRuntime` | GM, Admin |
| `DELETE` | `/api/v1/internal/runtimes/{game_id}/container` | `internalCleanupRuntimeContainer` | Admin |

Request and response shapes are defined in
[`./api/internal-openapi.yaml`](./api/internal-openapi.yaml). Unknown JSON fields are rejected
with `invalid_request`.

Callers identify themselves through the optional `X-Galaxy-Caller` request header (`gm` for
`Game Master`, `admin` for `Admin Service`). The header value is recorded as `op_source` in
`operation_log` (`gm_rest` or `admin_rest`); when the header is missing or carries any other
value, Runtime Manager defaults to `op_source = admin_rest`. The header is documented on every
runtime endpoint of [`./api/internal-openapi.yaml`](./api/internal-openapi.yaml).

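A minimal sketch of that mapping; the helper name is illustrative:

```go
// Sketch only: X-Galaxy-Caller header -> op_source value.
package rtmsketch

import "net/http"

func opSourceFromHeader(r *http.Request) string {
	switch r.Header.Get("X-Galaxy-Caller") {
	case "gm":
		return "gm_rest"
	case "admin":
		return "admin_rest"
	default:
		// Missing or unknown values default to admin_rest.
		return "admin_rest"
	}
}
```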
## Async Stream Contracts

### `runtime:start_jobs` (in)

Producer: `Game Lobby`.

| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | Lobby `game_id`. |
| `image_ref` | string | Docker reference. Lobby resolves it from `target_engine_version` using `LOBBY_ENGINE_IMAGE_TEMPLATE`. |
| `requested_at_ms` | int64 | UTC milliseconds. Used for diagnostics, not authoritative. |

### `runtime:stop_jobs` (in)

Producer: `Game Lobby`.

| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | |
| `reason` | enum | `orphan_cleanup`, `cancelled`, `finished`, `admin_request`, `timeout`. Recorded in `operation_log.error_code` when the reason matters; otherwise opaque. |
| `requested_at_ms` | int64 | |

### `runtime:job_results` (out)

Producer: `Runtime Manager`. Consumer: `Game Lobby`.

| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | |
| `outcome` | enum | `success`, `failure`. |
| `container_id` | string | Required for `success`. Empty on `failure`. |
| `engine_endpoint` | string | Required for `success`. Empty on `failure`. |
| `error_code` | string | Stable code. `replay_no_op` for idempotent re-runs. |
| `error_message` | string | Operator-readable detail. |

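Illustrative Go shapes for the three job payloads above; the JSON tags follow the field
tables, while the struct names are assumptions:

```go
// Sketch only: envelope shapes for the job streams.
package rtmsketch

type StartJob struct {
	GameID        string `json:"game_id"`
	ImageRef      string `json:"image_ref"`
	RequestedAtMs int64  `json:"requested_at_ms"`
}

type StopJob struct {
	GameID        string `json:"game_id"`
	Reason        string `json:"reason"` // orphan_cleanup | cancelled | finished | admin_request | timeout
	RequestedAtMs int64  `json:"requested_at_ms"`
}

type JobResult struct {
	GameID         string `json:"game_id"`
	Outcome        string `json:"outcome"` // success | failure
	ContainerID    string `json:"container_id,omitempty"`    // required on success
	EngineEndpoint string `json:"engine_endpoint,omitempty"` // required on success
	ErrorCode      string `json:"error_code,omitempty"`      // replay_no_op for idempotent re-runs
	ErrorMessage   string `json:"error_message,omitempty"`
}
```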
### `runtime:health_events` (out)

Producer: `Runtime Manager`. Consumer: `Game Master` — confirmed in production. `Game Lobby`
and `Admin Service` are reserved as future consumers; they do not read the stream in v1.

| Field | Type | Notes |
| --- | --- | --- |
| `game_id` | string | |
| `container_id` | string | The container observed (may differ from current after a restart race). |
| `event_type` | enum | See below. |
| `occurred_at_ms` | int64 | UTC milliseconds. |
| `details` | json | Type-specific payload. |

`event_type` values and their `details` schemas:

| `event_type` | `details` payload |
| --- | --- |
| `container_started` | `{image_ref}` |
| `container_exited` | `{exit_code, oom: bool}` |
| `container_oom` | `{exit_code}` |
| `container_disappeared` | `{}` |
| `inspect_unhealthy` | `{restart_count, state, health}` |
| `probe_failed` | `{consecutive_failures, last_status, last_error}` |
| `probe_recovered` | `{prior_failure_count}` |

The full schema is enforced by
[`./api/runtime-health-asyncapi.yaml`](./api/runtime-health-asyncapi.yaml).
## Notification Contracts

`Runtime Manager` publishes admin-only notification intents only for failures invisible to
any other service:

| Trigger | `notification_type` | Audience | Channels |
| --- | --- | --- | --- |
| Image pull error during start | `runtime.image_pull_failed` | admin | email |
| `docker create` / `docker start` error | `runtime.container_start_failed` | admin | email |
| Configuration validation error at start (bad `image_ref`, missing network) | `runtime.start_config_invalid` | admin | email |

Constructors live in `galaxy/pkg/notificationintent`. Catalog entries live in
[`../notification/README.md`](../notification/README.md) and
[`../notification/api/intents-asyncapi.yaml`](../notification/api/intents-asyncapi.yaml).
All three intents share the frozen field set
`{game_id, image_ref, error_code, error_message, attempted_at_ms}`; the `_ms` suffix on
`attempted_at_ms` follows the repo-wide convention for millisecond integer fields.

The Redis Streams publisher wrapper used to emit these intents from RTM ships in
`internal/adapters/notificationpublisher/`; the rationale for the signature shim that drops
the upstream entry id lives in [`docs/domain-and-ports.md` §7](docs/domain-and-ports.md) and
the production wiring is documented in [`docs/adapters.md`](docs/adapters.md).

Runtime-level changes after a successful start (probe failures, OOM, container exited)
**do not** produce notifications from RTM. Game Master decides whether to escalate.
## Persistence Layout

### PostgreSQL durable state (schema `rtmanager`)

| Table | Purpose | Key |
| --- | --- | --- |
| `runtime_records` | One row per game, latest known runtime status. | `game_id` |
| `operation_log` | Append-only audit of every operation RTM performed. | `id` (auto) |
| `health_snapshots` | Latest health observation per game. | `game_id` |

`runtime_records` columns:

- `game_id` — primary key, references Lobby's identifier.
- `status` — `running | stopped | removed`.
- `current_container_id` — nullable when `status=removed`.
- `current_image_ref` — non-null when status is `running` or `stopped`.
- `engine_endpoint` — `http://galaxy-game-{game_id}:8080`.
- `state_path` — absolute host path of the bind-mounted directory.
- `docker_network` — network name observed at create time.
- `started_at`, `stopped_at`, `removed_at` — last transition timestamps.
- `last_op_at` — drives the retention TTL.
- `created_at` — first time RTM saw the game.

`operation_log` columns:

- `id`, `game_id`, `op_kind` (`start | stop | restart | patch | cleanup_container |
  reconcile_adopt | reconcile_dispose`), `op_source` (`lobby_stream | gm_rest | admin_rest |
  auto_ttl | auto_reconcile`), `source_ref` (stream entry id, REST request id, or admin
  user), `image_ref`, `container_id`, `outcome` (`success | failure`), `error_code`,
  `error_message`, `started_at`, `finished_at`.

`health_snapshots` columns:

- `game_id`, `container_id`, `status`
  (`healthy | probe_failed | exited | oom | inspect_unhealthy | container_disappeared`),
  `source` (`docker_event | inspect | probe`), `details` (jsonb), `observed_at`.

Indexes:

- `runtime_records (status, last_op_at)` — drives the cleanup worker.
- `operation_log (game_id, started_at DESC)` — drives audit reads.

Migrations are a single embedded `00001_init.sql` (the single-init pre-launch policy from
`ARCHITECTURE.md` §Persistence Backends).
### Redis runtime-coordination state

| Key shape | Purpose |
| --- | --- |
| `rtmanager:stream_offsets:{label}` | Last processed entry id per consumer (`startjobs`, `stopjobs`). Same shape as Lobby. |
| `rtmanager:game_lease:{game_id}` | Per-game lease string (`SET ... NX PX <ttl>`). TTL is `RTMANAGER_GAME_LEASE_TTL_SECONDS` (default 60s); not renewed mid-operation in v1. The trade-off is documented in [`docs/services.md` §1](docs/services.md). |

The stream key names themselves are configurable:

- `RTMANAGER_REDIS_START_JOBS_STREAM` (default `runtime:start_jobs`).
- `RTMANAGER_REDIS_STOP_JOBS_STREAM` (default `runtime:stop_jobs`).
- `RTMANAGER_REDIS_JOB_RESULTS_STREAM` (default `runtime:job_results`).
- `RTMANAGER_REDIS_HEALTH_EVENTS_STREAM` (default `runtime:health_events`).
- `RTMANAGER_NOTIFICATION_INTENTS_STREAM` (default `notification:intents`).
## Error Model

Error envelope: `{ "error": { "code": "...", "message": "..." } }`, identical to Lobby's.

Stable error codes:

| Code | Meaning |
| --- | --- |
| `invalid_request` | Malformed JSON, unknown fields, missing required parameter. |
| `not_found` | Runtime record does not exist. |
| `conflict` | Operation incompatible with current `status`. |
| `service_unavailable` | Dependency unavailable (Docker daemon, PG, Redis). |
| `internal_error` | Unspecified failure. |
| `image_pull_failed` | Image pull attempt failed. |
| `image_ref_not_semver` | Patch attempted with a tag that is not parseable semver. |
| `semver_patch_only` | Patch attempted across a major/minor boundary. |
| `container_start_failed` | `docker create` / `docker start` failed. |
| `start_config_invalid` | Network missing, bind path inaccessible, or other config error. |
| `docker_unavailable` | Docker daemon ping failed. |
| `replay_no_op` | Idempotent replay; the outcome is success but no work was done. |
## Configuration

All variables use the `RTMANAGER_` prefix. Required variables fail fast on startup.

### Required

- `RTMANAGER_INTERNAL_HTTP_ADDR`
- `RTMANAGER_POSTGRES_PRIMARY_DSN`
- `RTMANAGER_REDIS_MASTER_ADDR`
- `RTMANAGER_REDIS_PASSWORD`
- `RTMANAGER_DOCKER_HOST`
- `RTMANAGER_DOCKER_NETWORK`
- `RTMANAGER_GAME_STATE_ROOT`

### Configuration groups

**Listener:**

- `RTMANAGER_INTERNAL_HTTP_ADDR` (e.g. `:8096`).
- `RTMANAGER_INTERNAL_HTTP_READ_TIMEOUT` (default `5s`).
- `RTMANAGER_INTERNAL_HTTP_WRITE_TIMEOUT` (default `15s`).
- `RTMANAGER_INTERNAL_HTTP_IDLE_TIMEOUT` (default `60s`).

**Docker:**

- `RTMANAGER_DOCKER_HOST` (default `unix:///var/run/docker.sock`).
- `RTMANAGER_DOCKER_API_VERSION` (default empty — let the SDK negotiate).
- `RTMANAGER_DOCKER_NETWORK` (default `galaxy-net`).
- `RTMANAGER_DOCKER_LOG_DRIVER` (default `json-file`).
- `RTMANAGER_DOCKER_LOG_OPTS` (default empty).
- `RTMANAGER_IMAGE_PULL_POLICY` (default `if_missing`; values `if_missing | always | never`).

**Container defaults:**

- `RTMANAGER_DEFAULT_CPU_QUOTA` (default `1.0`).
- `RTMANAGER_DEFAULT_MEMORY` (default `512m`).
- `RTMANAGER_DEFAULT_PIDS_LIMIT` (default `512`).
- `RTMANAGER_CONTAINER_STOP_TIMEOUT_SECONDS` (default `30`).
- `RTMANAGER_CONTAINER_RETENTION_DAYS` (default `30`).
- `RTMANAGER_ENGINE_STATE_MOUNT_PATH` (default `/var/lib/galaxy-game`).
- `RTMANAGER_ENGINE_STATE_ENV_NAME` (default `GAME_STATE_PATH`).
- `RTMANAGER_GAME_STATE_DIR_MODE` (default `0750`).
- `RTMANAGER_GAME_STATE_OWNER_UID` (default `0`).
- `RTMANAGER_GAME_STATE_OWNER_GID` (default `0`).
- `RTMANAGER_GAME_STATE_ROOT` (host path).

**Postgres:**

- `RTMANAGER_POSTGRES_PRIMARY_DSN` (`postgres://rtmanager:<pwd>@<host>:5432/galaxy?search_path=rtmanager&sslmode=disable`).
- `RTMANAGER_POSTGRES_REPLICA_DSNS` (optional, comma-separated; not used in v1).
- `RTMANAGER_POSTGRES_OPERATION_TIMEOUT` (default `2s`).
- `RTMANAGER_POSTGRES_MAX_OPEN_CONNS` (default `10`).
- `RTMANAGER_POSTGRES_MAX_IDLE_CONNS` (default `2`).
- `RTMANAGER_POSTGRES_CONN_MAX_LIFETIME` (default `30m`).

**Redis:**

- `RTMANAGER_REDIS_MASTER_ADDR`.
- `RTMANAGER_REDIS_REPLICA_ADDRS` (optional, comma-separated).
- `RTMANAGER_REDIS_PASSWORD`.
- `RTMANAGER_REDIS_DB` (default `0`).
- `RTMANAGER_REDIS_OPERATION_TIMEOUT` (default `2s`).

**Streams:**

- `RTMANAGER_REDIS_START_JOBS_STREAM` (default `runtime:start_jobs`).
- `RTMANAGER_REDIS_STOP_JOBS_STREAM` (default `runtime:stop_jobs`).
- `RTMANAGER_REDIS_JOB_RESULTS_STREAM` (default `runtime:job_results`).
- `RTMANAGER_REDIS_HEALTH_EVENTS_STREAM` (default `runtime:health_events`).
- `RTMANAGER_NOTIFICATION_INTENTS_STREAM` (default `notification:intents`).
- `RTMANAGER_STREAM_BLOCK_TIMEOUT` (default `5s`).

**Health monitoring:**

- `RTMANAGER_INSPECT_INTERVAL` (default `30s`).
- `RTMANAGER_PROBE_INTERVAL` (default `15s`).
- `RTMANAGER_PROBE_TIMEOUT` (default `2s`).
- `RTMANAGER_PROBE_FAILURES_THRESHOLD` (default `3`).

**Reconciler / cleanup:**

- `RTMANAGER_RECONCILE_INTERVAL` (default `5m`).
- `RTMANAGER_CLEANUP_INTERVAL` (default `1h`).

**Coordination:**

- `RTMANAGER_GAME_LEASE_TTL_SECONDS` (default `60`).

**Lobby internal client:**

- `RTMANAGER_LOBBY_INTERNAL_BASE_URL` (e.g. `http://lobby:8095`).
- `RTMANAGER_LOBBY_INTERNAL_TIMEOUT` (default `2s`).

**Logging:**

- `RTMANAGER_LOG_LEVEL` (default `info`).

**Lifecycle:**

- `RTMANAGER_SHUTDOWN_TIMEOUT` (default `30s`).

**Telemetry:** uses the standard OTLP env vars (`OTEL_EXPORTER_OTLP_ENDPOINT`,
`OTEL_EXPORTER_OTLP_PROTOCOL`, etc.) shared with other Galaxy services.
## Observability

### Metrics (OpenTelemetry, low cardinality)

- `rtmanager.start_outcomes` — counter, labels `outcome`, `error_code`, `op_source`.
- `rtmanager.stop_outcomes` — counter, labels `outcome`, `reason`, `op_source`.
- `rtmanager.restart_outcomes` — counter, labels `outcome`, `error_code`.
- `rtmanager.patch_outcomes` — counter, labels `outcome`, `error_code`.
- `rtmanager.cleanup_outcomes` — counter, labels `outcome`, `op_source`.
- `rtmanager.docker_op_latency` — histogram, label `op` (`pull | create | start | stop | rm
  | inspect | events`).
- `rtmanager.health_events` — counter, label `event_type`.
- `rtmanager.reconcile_drift` — counter, label `kind` (`adopt | dispose | observed_exited`).
- `rtmanager.runtime_records_by_status` — gauge, label `status`.
- `rtmanager.lease_acquire_latency` — histogram.
- `rtmanager.notification_intents` — counter, label `notification_type`.

### Structured logs (slog JSON to stdout)

Common fields on every entry: `service=rtmanager`, `request_id`, `trace_id`, `span_id`,
`game_id` (when known), `container_id` (when known), `op_kind`, `op_source`, `outcome`,
`error_code`.

Worker-specific fields: `stream_entry_id` (consumers), `event_type` (health), `image_ref`
(start/patch).
## Verification

Service-level (TESTING.md §7):

- Unit tests for every service-layer operation against mocked Docker.
- Adapter tests (PG, Redis, Docker) using `testcontainers-go` for PG/Redis and the Docker
  daemon socket for the real Docker adapter.
- Contract tests for `internal-openapi.yaml`, `runtime-jobs-asyncapi.yaml`,
  `runtime-health-asyncapi.yaml`.

Service-local integration suite under `rtmanager/integration/`:

- Lifecycle end-to-end (start, inspect, stop, restart, patch, cleanup) against the real
  `galaxy/game` test image.
- Replay safety (duplicate stream entries are no-ops).
- Health observability (kill the engine externally, observe `container_disappeared`; relaunch
  manually, observe reconcile adopt).
- Notification on first-touch failures (publish a start with an unresolvable image, observe
  the `runtime.image_pull_failed` intent and a `failure` job result).

Inter-service suite under `integration/lobbyrtm/`:

- Real Lobby + real RTM + real `galaxy/game` test image. Covers happy path, cancel, and
  start-failed flows.

Manual smoke (development):

```sh
docker network create galaxy-net   # once
RTMANAGER_GAME_STATE_ROOT=/var/lib/galaxy/games \
RTMANAGER_DOCKER_NETWORK=galaxy-net \
RTMANAGER_INTERNAL_HTTP_ADDR=:8096 \
... go run ./rtmanager/cmd/rtmanager
```

After start, `curl http://localhost:8096/readyz` returns `200`. Driving Lobby through its
public flow brings up `galaxy-game-{game_id}` containers; RTM logs each lifecycle transition
and publishes the corresponding stream entries.