galaxy-game/rtmanager/docs/runtime.md
# Runtime and Components

The diagram below focuses on the deployed galaxy/rtmanager process and its runtime dependencies. The current-state contract for every listener, worker, and adapter lives in ../README.md; this document is the navigation aid that points at the right code path and the right design-rationale record.

```mermaid
flowchart LR
    subgraph Clients
        GM["Game Master"]
        Admin["Admin Service"]
        Lobby["Game Lobby"]
    end

    subgraph RTM["Runtime Manager process"]
        InternalHTTP["Internal HTTP listener\n:8096 /healthz /readyz + REST"]
        StartJobs["startjobsconsumer"]
        StopJobs["stopjobsconsumer"]
        DockerEvents["dockerevents listener"]
        HealthProbe["healthprobe worker"]
        DockerInspect["dockerinspect worker"]
        Reconcile["reconcile worker"]
        Cleanup["containercleanup worker"]
        Services["lifecycle services\n(start, stop, restart, patch, cleanupcontainer)"]
        IntentPublisher["notification:intents publisher"]
        ResultsPublisher["runtime:job_results publisher"]
        HealthPublisher["runtime:health_events publisher"]
        Telemetry["Logs, traces, metrics"]
    end

    Docker["Docker Daemon"]
    Engine["galaxy-game-{game_id} container"]
    Postgres["PostgreSQL\nschema rtmanager"]
    Redis["Redis\nstreams + leases + offsets"]
    LobbyHTTP["Lobby internal HTTP"]

    Lobby -. runtime:start_jobs .-> StartJobs
    Lobby -. runtime:stop_jobs .-> StopJobs
    GM --> InternalHTTP
    Admin --> InternalHTTP

    StartJobs --> Services
    StopJobs --> Services
    InternalHTTP --> Services

    Services --> Docker
    Services --> Postgres
    Services --> Redis
    Services --> ResultsPublisher
    Services --> HealthPublisher
    Services --> IntentPublisher
    Services -. GET diagnostic .-> LobbyHTTP

    DockerEvents --> Docker
    DockerInspect --> Docker
    HealthProbe --> Engine
    Reconcile --> Docker
    Reconcile --> Postgres
    Cleanup --> Postgres
    Cleanup --> Services

    DockerEvents --> HealthPublisher
    DockerInspect --> HealthPublisher
    HealthProbe --> HealthPublisher

    HealthPublisher --> Redis
    ResultsPublisher --> Redis
    IntentPublisher --> Redis

    StartJobs --> Redis
    StopJobs --> Redis
    InternalHTTP --> Postgres

    Docker -->|create / start / stop / rm| Engine
    Engine -. bind mount .- StateDir["host:\n<RTMANAGER_GAME_STATE_ROOT>/{game_id}"]

    InternalHTTP --> Telemetry
    Services --> Telemetry
    StartJobs --> Telemetry
    StopJobs --> Telemetry
    DockerEvents --> Telemetry
    HealthProbe --> Telemetry
    DockerInspect --> Telemetry
    Reconcile --> Telemetry
    Cleanup --> Telemetry
```

Notes:

  • cmd/rtmanager refuses startup when PostgreSQL is unreachable, when goose migrations fail, when Redis ping fails, when the Docker daemon ping fails, or when the configured Docker network is missing. Lobby reachability is not verified at boot — the start service's diagnostic GET /api/v1/internal/games/{game_id} call is a no-op outside of debug logging (services.md §7).
  • The reconciler runs synchronously once on startup, before app.App.Run registers any other component, then re-runs periodically as a regular Component. The synchronous pass is what guarantees that orphaned containers left behind by a prior process are adopted before the events listener starts, so the listener never observes a container with no PG record (workers.md §17).
  • A single internal HTTP listener exposes both probes (/healthz, /readyz) and the trusted REST surface for Game Master and Admin Service. There is no public listener — RTM does not face end users.

## Listeners

| Listener | Default addr | Purpose |
| --- | --- | --- |
| Internal HTTP | :8096 | Probes (/healthz, /readyz) plus the trusted REST surface for Game Master and Admin Service |

Shared listener defaults from RTMANAGER_INTERNAL_HTTP_*:

  • read timeout: 5s
  • write timeout: 15s
  • idle timeout: 60s

The listener is unauthenticated and assumes a trusted network segment. The X-Galaxy-Caller request header carries an optional caller identity (gm or admin) that the handler records as operation_log.op_source (services.md §18).
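As an illustration, the caller-identity mapping could look like the sketch below. The function name and the `unknown` fallback are assumptions for this example; the document does not specify what op_source records when the header is absent or unrecognised.

```go
package main

import "fmt"

// resolveOpSource maps the optional X-Galaxy-Caller header value to the
// operation_log.op_source column. Only the two documented identities are
// accepted; the "unknown" fallback is an illustrative assumption.
func resolveOpSource(caller string) string {
	switch caller {
	case "gm", "admin":
		return caller
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(resolveOpSource("gm"))      // gm
	fmt.Println(resolveOpSource("curl/8")) // unknown
}
```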

Probe routes:

  • GET /healthz — process liveness; returns {"status":"ok"} while the listener is up.
  • GET /readyz — live-pings PostgreSQL primary, Redis master, and the Docker daemon, then asserts the configured Docker network exists. Returns {"status":"ready"} only when every check passes; otherwise returns 503 with the canonical error envelope.

## Background Workers

Every worker runs as an app.Component and is registered in the order below by internal/app/runtime.go.

| Worker | Source | Trigger | Function |
| --- | --- | --- | --- |
| Start jobs consumer | internal/worker/startjobsconsumer | Redis XREAD runtime:start_jobs | Decodes {game_id, image_ref, requested_at_ms} and invokes startruntime.Service; publishes the outcome to runtime:job_results |
| Stop jobs consumer | internal/worker/stopjobsconsumer | Redis XREAD runtime:stop_jobs | Decodes {game_id, reason, requested_at_ms} and invokes stopruntime.Service; publishes the outcome to runtime:job_results |
| Docker events listener | internal/worker/dockerevents | Docker /events API filtered by com.galaxy.owner=rtmanager | Emits runtime:health_events for container_exited, container_oom, container_disappeared. Reconnects on transport errors with a fixed 5s backoff (workers.md §7) |
| Health probe worker | internal/worker/healthprobe | Periodic RTMANAGER_PROBE_INTERVAL | GET {engine_endpoint}/healthz for every running runtime; in-memory hysteresis emits probe_failed after RTMANAGER_PROBE_FAILURES_THRESHOLD consecutive failures and probe_recovered on the first success thereafter (workers.md §5–§6) |
| Docker inspect worker | internal/worker/dockerinspect | Periodic RTMANAGER_INSPECT_INTERVAL | Calls InspectContainer for every running runtime; emits inspect_unhealthy on RestartCount growth, unexpected status, or Docker HEALTHCHECK=unhealthy |
| Reconciler | internal/worker/reconcile | Synchronous startup pass + periodic RTMANAGER_RECONCILE_INTERVAL | Adopts unrecorded containers (reconcile_adopt), disposes records whose container vanished (reconcile_dispose), records observed exits (observed_exited); every mutation runs under the per-game lease (workers.md §14–§15) |
| Container cleanup | internal/worker/containercleanup | Periodic RTMANAGER_CLEANUP_INTERVAL | Lists runtime_records rows with status=stopped AND last_op_at < now - retention, delegates to cleanupcontainer.Service per game (workers.md §19) |
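The health probe's hysteresis rule (publish probe_failed only after N consecutive failures, probe_recovered on the first success afterwards) can be sketched as below. `probeState` and `observe` are illustrative names, not the worker's actual types; the real worker keeps one such record per game in a mutex-guarded map.

```go
package main

import "fmt"

// probeState is one game's hysteresis record: a consecutive-failure
// counter plus a latch remembering whether probe_failed was published.
type probeState struct {
	consecutiveFailures int
	failurePublished    bool
}

// observe applies one probe result and returns the health event to emit,
// or "" for none. threshold mirrors RTMANAGER_PROBE_FAILURES_THRESHOLD.
func (s *probeState) observe(healthy bool, threshold int) string {
	if healthy {
		s.consecutiveFailures = 0
		if s.failurePublished {
			s.failurePublished = false
			return "probe_recovered" // first success after a published failure
		}
		return ""
	}
	s.consecutiveFailures++
	if s.consecutiveFailures >= threshold && !s.failurePublished {
		s.failurePublished = true
		return "probe_failed" // threshold reached, publish exactly once
	}
	return ""
}

func main() {
	var s probeState
	for _, healthy := range []bool{false, false, false, true} {
		if ev := s.observe(healthy, 3); ev != "" {
			fmt.Println(ev)
		}
	}
	// prints probe_failed (third consecutive failure), then probe_recovered
}
```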

Neither the events listener nor the inspect worker emits container_started — that event is owned by the start service (workers.md §1). Likewise, neither emits container_disappeared unconditionally when a record is missing or stale; the conditional emission rules live in workers.md §2 and §4.

## Lifecycle Services

The five lifecycle services are pure orchestrators called from both the stream consumers and the REST handlers. Each service owns the per-game lease for the duration of its operation.

| Service | Source | Triggers | Failure envelope |
| --- | --- | --- | --- |
| startruntime | internal/service/startruntime | runtime:start_jobs, POST /api/v1/internal/runtimes/{id}/start | start_config_invalid, image_pull_failed, container_start_failed, conflict, service_unavailable, internal_error (services.md §4) |
| stopruntime | internal/service/stopruntime | runtime:stop_jobs, POST /api/v1/internal/runtimes/{id}/stop | conflict, service_unavailable, internal_error, not_found (services.md §17) |
| restartruntime | internal/service/restartruntime | POST /api/v1/internal/runtimes/{id}/restart | inherited from inner stop / start; lease covers both inner ops (services.md §12, §17) |
| patchruntime | internal/service/patchruntime | POST /api/v1/internal/runtimes/{id}/patch | image_ref_not_semver, semver_patch_only, plus inherited start/stop codes (services.md §14, §17) |
| cleanupcontainer | internal/service/cleanupcontainer | DELETE /api/v1/internal/runtimes/{id}/container, periodic cleanup worker | not_found, conflict, service_unavailable, internal_error (services.md §17) |

All services share three behaviours captured in services.md:

  • the per-game Redis lease (rtmanager:game_lease:{game_id}, TTL RTMANAGER_GAME_LEASE_TTL_SECONDS) is acquired by the service, not by the caller — which keeps consumer and REST callers symmetric (services.md §1);
  • the canonical Result shape (Outcome, ErrorCode, Record, ContainerID, EngineEndpoint) is what consumers and REST handlers translate into job_results / HTTP responses (services.md §3);
  • failures pass through one operation_log write before returning, and three of the failure codes (start_config_invalid, image_pull_failed, container_start_failed) also publish a runtime.* admin notification intent (services.md §4).
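One plausible translation of the canonical Result shape into HTTP statuses is sketched below. The Result field names follow services.md §3, but the status table itself is an assumption for illustration; the handlers' actual mapping may differ.

```go
package main

import (
	"fmt"
	"net/http"
)

// Result mirrors the canonical service result shape (services.md §3),
// trimmed to the two fields the status mapping needs.
type Result struct {
	Outcome   string // "ok" or "error"
	ErrorCode string // e.g. "conflict", "not_found"
}

// httpStatus is an illustrative ErrorCode-to-status table for the REST
// handlers; stream consumers publish to runtime:job_results instead.
func httpStatus(r Result) int {
	if r.Outcome == "ok" {
		return http.StatusOK
	}
	switch r.ErrorCode {
	case "not_found":
		return http.StatusNotFound
	case "conflict":
		return http.StatusConflict
	case "service_unavailable":
		return http.StatusServiceUnavailable
	case "start_config_invalid", "image_ref_not_semver", "semver_patch_only":
		return http.StatusUnprocessableEntity
	default: // internal_error and anything unrecognised
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(httpStatus(Result{Outcome: "ok"}))                           // 200
	fmt.Println(httpStatus(Result{Outcome: "error", ErrorCode: "conflict"})) // 409
}
```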

## Synchronous Upstream Client

| Client | Endpoint | Failure mapping |
| --- | --- | --- |
| Game Lobby internal | GET {RTMANAGER_LOBBY_INTERNAL_BASE_URL}/api/v1/internal/games/{game_id} | Diagnostic-only in v1; the start service ignores the body and absorbs network failures with a debug log. Decision: services.md §7 |

This Lobby client is the only synchronous outbound call RTM makes. Every other interaction (Notification Service, Game Master, Admin Service) crosses an asynchronous boundary or is initiated by the peer.

## Stream Offsets

Each consumer persists its position under a fixed label so process restart preserves stream progress.

| Stream | Offset key | Block timeout env |
| --- | --- | --- |
| runtime:start_jobs | rtmanager:stream_offsets:startjobs | RTMANAGER_STREAM_BLOCK_TIMEOUT |
| runtime:stop_jobs | rtmanager:stream_offsets:stopjobs | RTMANAGER_STREAM_BLOCK_TIMEOUT |

The labels startjobs and stopjobs are stable identifiers — they are decoupled from the underlying stream key. An operator who renames a stream via RTMANAGER_REDIS_START_JOBS_STREAM / RTMANAGER_REDIS_STOP_JOBS_STREAM does not lose the persisted offset. Decision: workers.md §9.

The runtime:job_results, runtime:health_events, and notification:intents streams are outbound; RTM does not consume them itself.
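The label decoupling amounts to deriving the offset key from the consumer's stable label rather than the stream name, as in this minimal sketch (`offsetKey` is an illustrative helper, not the repository's actual function):

```go
package main

import "fmt"

// offsetKey derives the persisted offset key from the consumer's stable
// label, not from the stream name, so renaming a stream via
// RTMANAGER_REDIS_START_JOBS_STREAM never moves the stored offset.
func offsetKey(label string) string {
	return "rtmanager:stream_offsets:" + label
}

func main() {
	// The stream name may change across deployments; the key does not.
	for _, stream := range []string{"runtime:start_jobs", "runtime:start_jobs_v2"} {
		fmt.Printf("%s -> %s\n", stream, offsetKey("startjobs"))
	}
}
```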

## Configuration Groups

The full env-var list with defaults lives in ../README.md §Configuration. The groups below summarise the structure:

  • Required — RTMANAGER_INTERNAL_HTTP_ADDR, RTMANAGER_POSTGRES_PRIMARY_DSN, RTMANAGER_REDIS_MASTER_ADDR, RTMANAGER_REDIS_PASSWORD, RTMANAGER_DOCKER_HOST, RTMANAGER_DOCKER_NETWORK, RTMANAGER_GAME_STATE_ROOT.
  • Listener — RTMANAGER_INTERNAL_HTTP_* timeouts.
  • Docker — RTMANAGER_DOCKER_HOST, RTMANAGER_DOCKER_API_VERSION, RTMANAGER_DOCKER_NETWORK, RTMANAGER_DOCKER_LOG_DRIVER, RTMANAGER_DOCKER_LOG_OPTS, RTMANAGER_IMAGE_PULL_POLICY.
  • Container defaults — RTMANAGER_DEFAULT_CPU_QUOTA, RTMANAGER_DEFAULT_MEMORY, RTMANAGER_DEFAULT_PIDS_LIMIT, RTMANAGER_CONTAINER_STOP_TIMEOUT_SECONDS, RTMANAGER_CONTAINER_RETENTION_DAYS, RTMANAGER_ENGINE_STATE_MOUNT_PATH, RTMANAGER_ENGINE_STATE_ENV_NAME, RTMANAGER_GAME_STATE_DIR_MODE, RTMANAGER_GAME_STATE_OWNER_UID, RTMANAGER_GAME_STATE_OWNER_GID.
  • PostgreSQL connectivity — RTMANAGER_POSTGRES_PRIMARY_DSN, RTMANAGER_POSTGRES_REPLICA_DSNS, RTMANAGER_POSTGRES_OPERATION_TIMEOUT, RTMANAGER_POSTGRES_MAX_OPEN_CONNS, RTMANAGER_POSTGRES_MAX_IDLE_CONNS, RTMANAGER_POSTGRES_CONN_MAX_LIFETIME.
  • Redis connectivity — RTMANAGER_REDIS_MASTER_ADDR, RTMANAGER_REDIS_REPLICA_ADDRS, RTMANAGER_REDIS_PASSWORD, RTMANAGER_REDIS_DB, RTMANAGER_REDIS_OPERATION_TIMEOUT.
  • Streams — RTMANAGER_REDIS_START_JOBS_STREAM, RTMANAGER_REDIS_STOP_JOBS_STREAM, RTMANAGER_REDIS_JOB_RESULTS_STREAM, RTMANAGER_REDIS_HEALTH_EVENTS_STREAM, RTMANAGER_NOTIFICATION_INTENTS_STREAM, RTMANAGER_STREAM_BLOCK_TIMEOUT.
  • Health monitoring — RTMANAGER_INSPECT_INTERVAL, RTMANAGER_PROBE_INTERVAL, RTMANAGER_PROBE_TIMEOUT, RTMANAGER_PROBE_FAILURES_THRESHOLD.
  • Reconciler / cleanup — RTMANAGER_RECONCILE_INTERVAL, RTMANAGER_CLEANUP_INTERVAL.
  • Coordination — RTMANAGER_GAME_LEASE_TTL_SECONDS.
  • Lobby internal client — RTMANAGER_LOBBY_INTERNAL_BASE_URL, RTMANAGER_LOBBY_INTERNAL_TIMEOUT.
  • Process and logging — RTMANAGER_LOG_LEVEL, RTMANAGER_SHUTDOWN_TIMEOUT.
  • Telemetry — standard OTEL_*.

## Runtime Notes

  • Single-instance v1. Multi-instance Runtime Manager with Redis Streams consumer groups is explicitly out of scope for the current iteration. The per-game lease serialises operations on one game across the consumer + REST entry points; cross-instance coordination is deferred until a real workload demands it.
  • Lease semantics. rtmanager:game_lease:{game_id} is SET ... NX PX <ttl> with TTL RTMANAGER_GAME_LEASE_TTL_SECONDS (default 60s). The lease is not renewed mid-operation in v1; long pulls of multi-GB images can therefore expire the lease before the operation finishes — the trade-off is documented in services.md §1. The reconciler honours the same lease around every drift mutation (workers.md §14).
  • Operation log is the source of truth. Every lifecycle and reconcile mutation appends one row to rtmanager.operation_log. The runtime:health_events stream and the notification:intents emissions are best-effort — a publish failure logs at Error and proceeds, never rolling back the recorded operation (workers.md §8).
  • In-memory probe hysteresis. The active HTTP probe keeps per-game consecutiveFailures and failurePublished counters in a mutex-guarded map. State is non-persistent: a process restart that loses the counters re-establishes hysteresis from scratch, and state for a game that transitions through stopped → running is pruned at the start of every probe tick (workers.md §5).
  • Pull policy fallbacks. RTMANAGER_IMAGE_PULL_POLICY accepts if_missing (default), always, and never. Image labels (com.galaxy.cpu_quota, com.galaxy.memory, com.galaxy.pids_limit) drive resource limits when present; the matching RTMANAGER_DEFAULT_* env vars supply the fallback when a label is absent or unparseable. Producers never pass limits.
  • State directory ownership. RTM creates per-game state directories under RTMANAGER_GAME_STATE_ROOT with the configured mode and uid/gid, but never deletes them. Removing the directory is operator domain (backup tooling, a future Admin Service workflow). A cleanup that removes the container leaves the directory intact.
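The label-then-default resolution for resource limits can be sketched as below. `limitFromLabels` is an illustrative helper; it treats every limit as a plain integer, whereas the real memory label likely carries a size string, so the parse here is a simplification under that assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// limitFromLabels resolves one resource limit the way the document
// describes: an image label (e.g. com.galaxy.pids_limit) wins when present
// and parseable; otherwise the matching RTMANAGER_DEFAULT_* value applies.
func limitFromLabels(labels map[string]string, key string, fallback int64) int64 {
	if raw, ok := labels[key]; ok {
		if v, err := strconv.ParseInt(raw, 10, 64); err == nil {
			return v
		}
		// unparseable label: fall through to the configured default
	}
	return fallback
}

func main() {
	labels := map[string]string{
		"com.galaxy.pids_limit": "256",
		"com.galaxy.memory":     "lots", // unparseable, so the default wins
	}
	fmt.Println(limitFromLabels(labels, "com.galaxy.pids_limit", 128)) // 256
	fmt.Println(limitFromLabels(labels, "com.galaxy.memory", 512))     // 512
	fmt.Println(limitFromLabels(labels, "com.galaxy.cpu_quota", 100000)) // absent label
}
```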