# Configuration And Contract Examples

The examples below are illustrative. Replace `localhost`, port numbers,
IDs, and timestamps with values that match the deployment under
inspection.

## Example `.env`

A minimum-viable `RTMANAGER_*` set for a local run against a single
Redis container plus a PostgreSQL container with the `rtmanager` schema
and the `rtmanagerservice` role provisioned. The full list with defaults
lives in [`../README.md` §Configuration](../README.md).

```bash
# Required
RTMANAGER_INTERNAL_HTTP_ADDR=:8096
RTMANAGER_POSTGRES_PRIMARY_DSN=postgres://rtmanagerservice:rtmanagerservice@127.0.0.1:5432/galaxy?search_path=rtmanager&sslmode=disable
RTMANAGER_REDIS_MASTER_ADDR=127.0.0.1:6379
RTMANAGER_REDIS_PASSWORD=local
RTMANAGER_DOCKER_HOST=unix:///var/run/docker.sock
RTMANAGER_DOCKER_NETWORK=galaxy-net
RTMANAGER_GAME_STATE_ROOT=/var/lib/galaxy/games

# Lobby internal client (diagnostic GET only in v1)
RTMANAGER_LOBBY_INTERNAL_BASE_URL=http://127.0.0.1:8095
RTMANAGER_LOBBY_INTERNAL_TIMEOUT=2s

# Container defaults (image labels override these per container)
RTMANAGER_DEFAULT_CPU_QUOTA=1.0
RTMANAGER_DEFAULT_MEMORY=512m
RTMANAGER_DEFAULT_PIDS_LIMIT=512
RTMANAGER_CONTAINER_STOP_TIMEOUT_SECONDS=30
RTMANAGER_CONTAINER_RETENTION_DAYS=30
RTMANAGER_ENGINE_STATE_MOUNT_PATH=/var/lib/galaxy-game
RTMANAGER_ENGINE_STATE_ENV_NAME=GAME_STATE_PATH
RTMANAGER_GAME_STATE_DIR_MODE=0750
RTMANAGER_GAME_STATE_OWNER_UID=0
RTMANAGER_GAME_STATE_OWNER_GID=0

# Workers
RTMANAGER_INSPECT_INTERVAL=30s
RTMANAGER_PROBE_INTERVAL=15s
RTMANAGER_PROBE_TIMEOUT=2s
RTMANAGER_PROBE_FAILURES_THRESHOLD=3
RTMANAGER_RECONCILE_INTERVAL=5m
RTMANAGER_CLEANUP_INTERVAL=1h

# Coordination
RTMANAGER_GAME_LEASE_TTL_SECONDS=60

# Process and logging
RTMANAGER_LOG_LEVEL=info
RTMANAGER_SHUTDOWN_TIMEOUT=30s

# Telemetry (disabled for local dev; enable to ship traces / metrics)
OTEL_SERVICE_NAME=galaxy-rtmanager
OTEL_TRACES_EXPORTER=none
OTEL_METRICS_EXPORTER=none
```

For a production-shaped deployment, set
`RTMANAGER_IMAGE_PULL_POLICY=always` (forces a pull on every start so a
tag mutation is immediately visible to the next runtime),
`RTMANAGER_GAME_STATE_OWNER_UID` / `_GID` to match the engine
container's user, and configure `OTEL_*` against the cluster's OTLP
collector. The `RTMANAGER_DOCKER_LOG_DRIVER` /
`RTMANAGER_DOCKER_LOG_OPTS` pair routes engine stdout/stderr to the sink
the operator runs (fluentd, journald, etc.).

For tests, point `RTMANAGER_POSTGRES_PRIMARY_DSN` and
`RTMANAGER_REDIS_MASTER_ADDR` at the testcontainers fixtures the
service-local harness brings up
([`integration-tests.md` §7](integration-tests.md)).

## Internal HTTP Examples

Every endpoint admits the optional `X-Galaxy-Caller` header which the
handler records as `op_source` in `operation_log` (`gm` → `gm_rest`,
`admin` → `admin_rest`; missing or unknown values default to
`admin_rest` in v1). Decision: [`services.md` §18](services.md).

### Probe a runtime record

```bash
curl -s -H 'X-Galaxy-Caller: gm' \
  http://localhost:8096/api/v1/internal/runtimes/game-01HZ...
```

Response (`200 OK`):

```json
{
  "game_id": "game-01HZ...",
  "status": "running",
  "current_container_id": "1f2a...",
  "current_image_ref": "galaxy/game:1.4.0",
  "engine_endpoint": "http://galaxy-game-game-01HZ...:8080",
  "state_path": "/var/lib/galaxy/games/game-01HZ...",
  "docker_network": "galaxy-net",
  "started_at": "2026-04-28T07:18:54Z",
  "stopped_at": null,
  "removed_at": null,
  "last_op_at": "2026-04-28T07:18:54Z",
  "created_at": "2026-04-28T07:18:54Z"
}
```

### List all runtimes

```bash
curl -s -H 'X-Galaxy-Caller: admin' \
  http://localhost:8096/api/v1/internal/runtimes
```

The response shape is `{"items":[...]}`.

### Start a runtime

```bash
curl -s -X POST \
  -H 'Content-Type: application/json' \
  -H 'X-Galaxy-Caller: gm' \
  http://localhost:8096/api/v1/internal/runtimes/game-01HZ.../start \
  -d '{"image_ref": "galaxy/game:1.4.0"}'
```

A `200` returns the `RuntimeRecord` for the running runtime.
Failure shapes use the canonical envelope; e.g. an invalid `image_ref`:

```json
{
  "error": {
    "code": "start_config_invalid",
    "message": "image_ref shape rejected by docker reference parser"
  }
}
```

### Stop a runtime

```bash
curl -s -X POST \
  -H 'Content-Type: application/json' \
  -H 'X-Galaxy-Caller: admin' \
  http://localhost:8096/api/v1/internal/runtimes/game-01HZ.../stop \
  -d '{"reason": "admin_request"}'
```

Valid `reason` values:
`orphan_cleanup | cancelled | finished | admin_request | timeout`.

### Restart a runtime

```bash
curl -s -X POST \
  -H 'X-Galaxy-Caller: admin' \
  http://localhost:8096/api/v1/internal/runtimes/game-01HZ.../restart
```

The body is empty; restart re-uses the current `image_ref`.

### Patch a runtime

```bash
curl -s -X POST \
  -H 'Content-Type: application/json' \
  -H 'X-Galaxy-Caller: admin' \
  http://localhost:8096/api/v1/internal/runtimes/game-01HZ.../patch \
  -d '{"image_ref": "galaxy/game:1.4.2"}'
```

Patch enforces the semver-only rule: a non-semver tag returns
`image_ref_not_semver`; a cross-major or cross-minor change returns
`semver_patch_only`.

### Cleanup a stopped runtime container

```bash
curl -s -X DELETE \
  -H 'X-Galaxy-Caller: admin' \
  http://localhost:8096/api/v1/internal/runtimes/game-01HZ.../container
```

Cleanup refuses a `running` runtime with `409 conflict`; stop first.

## Stream Payload Examples

Every stream key shape is configurable via `RTMANAGER_REDIS_*_STREAM`;
the defaults are used below. Field types and required/optional semantics
are frozen by
[`../api/runtime-jobs-asyncapi.yaml`](../api/runtime-jobs-asyncapi.yaml)
and
[`../api/runtime-health-asyncapi.yaml`](../api/runtime-health-asyncapi.yaml).

### `runtime:start_jobs` (Lobby → RTM)

```bash
redis-cli XADD runtime:start_jobs '*' \
  game_id 'game-01HZ...' \
  image_ref 'galaxy/game:1.4.0' \
  requested_at_ms 1714081234567
```

### `runtime:stop_jobs` (Lobby → RTM)

```bash
redis-cli XADD runtime:stop_jobs '*' \
  game_id 'game-01HZ...' \
  reason 'cancelled' \
  requested_at_ms 1714081234567
```

### `runtime:job_results` (RTM → Lobby)

Success envelope:

```bash
redis-cli XADD runtime:job_results '*' \
  game_id 'game-01HZ...' \
  outcome 'success' \
  container_id '1f2a...' \
  engine_endpoint 'http://galaxy-game-game-01HZ...:8080' \
  error_code '' \
  error_message ''
```

Failure envelope:

```bash
redis-cli XADD runtime:job_results '*' \
  game_id 'game-01HZ...' \
  outcome 'failure' \
  container_id '' \
  engine_endpoint '' \
  error_code 'image_pull_failed' \
  error_message 'pull failed: manifest unknown'
```

Idempotent replay envelope (success outcome with explicit
`replay_no_op`):

```bash
redis-cli XADD runtime:job_results '*' \
  game_id 'game-01HZ...' \
  outcome 'success' \
  container_id '1f2a...' \
  engine_endpoint 'http://galaxy-game-game-01HZ...:8080' \
  error_code 'replay_no_op' \
  error_message ''
```

The contract permits empty `container_id` and `engine_endpoint` strings
on every value of `outcome` so the consumer can decode the envelope
uniformly ([`workers.md` §11](workers.md)).

### `runtime:health_events` (RTM out)

The wire shape is the same for every event type; only the `details`
payload differs.

`container_started`:

```bash
redis-cli XADD runtime:health_events '*' \
  game_id 'game-01HZ...' \
  container_id '1f2a...' \
  event_type 'container_started' \
  occurred_at_ms 1714081234567 \
  details '{"image_ref":"galaxy/game:1.4.0"}'
```

`container_exited`:

```bash
redis-cli XADD runtime:health_events '*' \
  game_id 'game-01HZ...' \
  container_id '1f2a...' \
  event_type 'container_exited' \
  occurred_at_ms 1714081234567 \
  details '{"exit_code":137,"oom":false}'
```

`container_oom`:

```bash
redis-cli XADD runtime:health_events '*' \
  game_id 'game-01HZ...' \
  container_id '1f2a...' \
  event_type 'container_oom' \
  occurred_at_ms 1714081234567 \
  details '{"exit_code":137}'
```

`container_disappeared`:

```bash
redis-cli XADD runtime:health_events '*' \
  game_id 'game-01HZ...' \
  container_id '1f2a...' \
  event_type 'container_disappeared' \
  occurred_at_ms 1714081234567 \
  details '{}'
```

`inspect_unhealthy`:

```bash
redis-cli XADD runtime:health_events '*' \
  game_id 'game-01HZ...' \
  container_id '1f2a...' \
  event_type 'inspect_unhealthy' \
  occurred_at_ms 1714081234567 \
  details '{"restart_count":3,"state":"running","health":"unhealthy"}'
```

`probe_failed` (after the threshold is crossed):

```bash
redis-cli XADD runtime:health_events '*' \
  game_id 'game-01HZ...' \
  container_id '1f2a...' \
  event_type 'probe_failed' \
  occurred_at_ms 1714081234567 \
  details '{"consecutive_failures":3,"last_status":0,"last_error":"context deadline exceeded"}'
```

`probe_recovered`:

```bash
redis-cli XADD runtime:health_events '*' \
  game_id 'game-01HZ...' \
  container_id '1f2a...' \
  event_type 'probe_recovered' \
  occurred_at_ms 1714081234567 \
  details '{"prior_failure_count":3}'
```

### `notification:intents` (RTM admin notifications)

RTM publishes admin-only notification intents only for the three
first-touch start failures. Every payload shares the frozen field set
`{game_id, image_ref, error_code, error_message, attempted_at_ms}`
([`../README.md` §Notification Contracts](../README.md#notification-contracts)).

`runtime.image_pull_failed`:

```bash
redis-cli XADD notification:intents '*' \
  envelope '{
    "type": "runtime.image_pull_failed",
    "producer": "rtmanager",
    "idempotency_key": "runtime.image_pull_failed:game-01HZ...:1714081234567",
    "audience": {"kind": "admin_email", "email_address_kind": "runtime_image_pull_failed"},
    "payload": {
      "game_id": "game-01HZ...",
      "image_ref": "galaxy/game:1.4.0",
      "error_code": "image_pull_failed",
      "error_message": "pull failed: manifest unknown",
      "attempted_at_ms": 1714081234567
    }
  }'
```

`runtime.container_start_failed` and `runtime.start_config_invalid`
share the same envelope with their respective `type` and `error_code`
values.
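To hand-craft the other two intents without retyping the JSON, the shared envelope can be generated from its five payload fields. The sketch below is a hypothetical helper, not part of RTM. It assumes that `type` is always `runtime.<error_code>`, that `idempotency_key` is `<type>:<game_id>:<attempted_at_ms>`, and that `email_address_kind` is `runtime_<error_code>`; all three are inferred from the single `runtime.image_pull_failed` example and should be checked against the README's notification contracts. It does not JSON-escape its arguments, so quotes or backslashes in `error_message` would need escaping first.

```bash
# Hypothetical helper (not part of RTM): prints a notification envelope
# for one of the three first-touch start failures on a single line.
# Arguments: $1 error_code, $2 game_id, $3 image_ref,
#            $4 error_message, $5 attempted_at_ms
# Assumption: type, idempotency_key and email_address_kind follow the
# naming pattern of the runtime.image_pull_failed example.
build_start_failure_envelope() {
  case "$1" in
    image_pull_failed|container_start_failed|start_config_invalid) ;;
    *) echo "unsupported error_code: $1" >&2; return 1 ;;
  esac
  printf '{"type":"runtime.%s","producer":"rtmanager",' "$1"
  printf '"idempotency_key":"runtime.%s:%s:%s",' "$1" "$2" "$5"
  printf '"audience":{"kind":"admin_email","email_address_kind":"runtime_%s"},' "$1"
  printf '"payload":{"game_id":"%s","image_ref":"%s","error_code":"%s",' "$2" "$3" "$1"
  printf '"error_message":"%s","attempted_at_ms":%s}}\n' "$4" "$5"
}
```

The output can be passed straight to the stream, for example `redis-cli XADD notification:intents '*' envelope "$(build_start_failure_envelope start_config_invalid 'game-01HZ...' 'galaxy/game:1.4.0' 'image_ref shape rejected by docker reference parser' 1714081234567)"`.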
## Storage Inspection

### Inspect a runtime record (PostgreSQL)

```bash
psql "$RTMANAGER_POSTGRES_PRIMARY_DSN" -c \
  "SELECT * FROM rtmanager.runtime_records WHERE game_id = 'game-01HZ...'"
```

Columns mirror the fields documented in
[`../README.md` §Persistence Layout](../README.md#persistence-layout).

### Inspect runtime status counts

```bash
psql "$RTMANAGER_POSTGRES_PRIMARY_DSN" -c \
  "SELECT status, COUNT(*) FROM rtmanager.runtime_records GROUP BY status"
```

### Inspect the operation log for a game

```bash
psql "$RTMANAGER_POSTGRES_PRIMARY_DSN" -c \
  "SELECT id, op_kind, op_source, outcome, error_code, started_at, finished_at
   FROM rtmanager.operation_log
   WHERE game_id = 'game-01HZ...'
   ORDER BY started_at DESC, id DESC
   LIMIT 50"
```

### Inspect the latest health snapshot

```bash
psql "$RTMANAGER_POSTGRES_PRIMARY_DSN" -c \
  "SELECT game_id, container_id, status, source, observed_at, details
   FROM rtmanager.health_snapshots
   WHERE game_id = 'game-01HZ...'"
```

### Inspect Redis runtime-coordination keys

```bash
# Stream offsets
redis-cli GET rtmanager:stream_offsets:startjobs
redis-cli GET rtmanager:stream_offsets:stopjobs

# Per-game lease (only present while an operation is in flight)
redis-cli GET rtmanager:game_lease:game-01HZ...
redis-cli TTL rtmanager:game_lease:game-01HZ...

# Recent stream entries
redis-cli XRANGE runtime:start_jobs - + COUNT 20
redis-cli XRANGE runtime:job_results - + COUNT 20
redis-cli XRANGE runtime:health_events - + COUNT 50

# Stream metadata
redis-cli XINFO STREAM runtime:start_jobs
redis-cli XINFO STREAM runtime:stop_jobs
redis-cli XINFO STREAM runtime:health_events
```
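The `TTL` reply for the lease key is easy to misread, because Redis uses negative integers as status codes: `-2` means the key does not exist and `-1` means it exists without an expiry. A minimal sketch of a helper that turns the reply into a sentence follows. The function name and wording are hypothetical; reading `-2` as "no operation in flight" follows from the lease comment above.

```bash
# Hypothetical helper: describes the integer printed by `redis-cli TTL`
# for a per-game lease key.
#   -2  key does not exist (Redis convention)
#   -1  key exists but has no expiry (Redis convention)
#   n   remaining lifetime in seconds
describe_lease_ttl() {
  case "$1" in
    -2) echo "no lease: no operation in flight" ;;
    -1) echo "lease present without expiry" ;;
    ''|*[!0-9]*) echo "unrecognised TTL value: $1" >&2; return 1 ;;
    *) echo "lease held: expires in ${1}s" ;;
  esac
}
```

Typical use is `describe_lease_ttl "$(redis-cli TTL rtmanager:game_lease:game-01HZ...)"`. Since leases are written with `RTMANAGER_GAME_LEASE_TTL_SECONDS`, a `-1` reply is presumably worth investigating.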