# Flows

This document collects the lifecycle and observability flows that span Runtime Manager and its synchronous and asynchronous neighbours. Narrative descriptions of the rules these flows enforce live in [`../README.md`](../README.md); the diagrams here focus on the message order across the boundary. Design-rationale records linked from each section explain the *why*.

## Start (happy path)

```mermaid
sequenceDiagram
    participant Lobby as Lobby publisher
    participant Stream as runtime:start_jobs
    participant Consumer as startjobsconsumer
    participant Service as startruntime
    participant Lease as Redis lease
    participant Docker
    participant PG as Postgres
    participant Health as runtime:health_events
    participant Results as runtime:job_results

    Lobby->>Stream: XADD {game_id, image_ref, requested_at_ms}
    Consumer->>Stream: XREAD
    Consumer->>Service: Handle(game_id, image_ref, OpSourceLobbyStream, entry_id)
    Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
    Service->>PG: SELECT runtime_records WHERE game_id
    Service->>Docker: PullImage(image_ref) per pull policy
    Service->>Docker: InspectImage → resource limits
    Service->>Service: prepareStateDir(/{game_id})
    Service->>Docker: ContainerCreate + ContainerStart
    Service->>PG: Upsert runtime_records (status=running)
    Service->>PG: INSERT operation_log (op_kind=start, outcome=success)
    Service->>Health: XADD container_started
    Service-->>Consumer: Result{Outcome=success, ContainerID, EngineEndpoint}
    Consumer->>Results: XADD {outcome=success, container_id, engine_endpoint}
    Service->>Lease: DEL rtmanager:game_lease:{game_id}
```

REST callers (Game Master, Admin Service) drive the same service through `POST /api/v1/internal/runtimes/{game_id}/start`; the diagram's last two arrows collapse to an HTTP `200` response carrying the runtime record. Sources: [`../README.md` §Lifecycles → Start](../README.md#start), [`services.md` §3](services.md).
## Start failure (image pull)

```mermaid
sequenceDiagram
    participant Service as startruntime
    participant Docker
    participant PG as Postgres
    participant Intents as notification:intents
    participant Results as runtime:job_results

    Service->>Docker: PullImage(image_ref)
    Docker-->>Service: error
    Service->>PG: INSERT operation_log (op_kind=start, outcome=failure, error_code=image_pull_failed)
    Service->>Intents: XADD runtime.image_pull_failed {game_id, image_ref, error_code, error_message, attempted_at_ms}
    Service-->>Service: Result{Outcome=failure, ErrorCode=image_pull_failed}
    Service->>Results: XADD {outcome=failure, error_code=image_pull_failed}
```

The same shape applies to the configuration-validation failures (`start_config_invalid` from `EnsureNetwork(ErrNetworkMissing)`, `prepareStateDir`, or invalid `image_ref` shape) and the Docker create/start failure (`container_start_failed`); only the error code and the matching `runtime.*` notification type differ. Three failure codes do **not** raise an admin notification: `conflict`, `service_unavailable`, `internal_error` ([`services.md` §4](services.md)).

## Start failure (orphan / Upsert-after-Run rollback)

```mermaid
sequenceDiagram
    participant Service as startruntime
    participant Docker
    participant PG as Postgres
    participant Intents as notification:intents

    Service->>Docker: ContainerCreate + ContainerStart
    Docker-->>Service: container running
    Service->>PG: Upsert runtime_records
    PG-->>Service: error (transport / constraint)
    Note over Service: container is now an orphan (running, no PG record)
    Service->>Docker: Remove(container_id) [fresh background context]
    Docker-->>Service: ok or logged failure
    Service->>PG: INSERT operation_log (outcome=failure, error_code=container_start_failed)
    Service->>Intents: XADD runtime.container_start_failed
    Service-->>Service: Result{Outcome=failure, ErrorCode=container_start_failed}
```

The Docker adapter already removes the container when `Run` itself fails after a successful `ContainerCreate` ([`adapters.md` §3](adapters.md)); the start service adds the post-`Run` rollback for the `Upsert` path. A `Remove` failure is logged but not propagated; the reconciler adopts surviving orphans on its periodic pass ([`services.md` §5](services.md)).

## Stop

```mermaid
sequenceDiagram
    participant Caller as Lobby / GM / Admin
    participant Service as stopruntime
    participant Lease as Redis lease
    participant PG as Postgres
    participant Docker
    participant Results as runtime:job_results

    Caller->>Service: stop(game_id, reason)
    Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
    Service->>PG: SELECT runtime_records WHERE game_id
    alt status in {stopped, removed}
        Service->>PG: INSERT operation_log (outcome=success, error_code=replay_no_op)
        Service-->>Caller: success / replay_no_op
    else status = running
        Service->>Docker: ContainerStop(container_id, RTMANAGER_CONTAINER_STOP_TIMEOUT_SECONDS)
        Docker-->>Service: ok
        Service->>PG: UpdateStatus running→stopped (CAS by container_id)
        Service->>PG: INSERT operation_log (op_kind=stop, outcome=success)
        Service-->>Caller: success
    end
    Service->>Lease: DEL rtmanager:game_lease:{game_id}
```

Lobby callers receive the outcome through `runtime:job_results`; REST callers receive an HTTP `200`. The `reason` enum (`orphan_cleanup | cancelled | finished | admin_request | timeout`) is recorded in `operation_log` and is otherwise opaque to the stop service — RTM does not branch on the reason in v1 ([`services.md` §15, §17](services.md)).
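The status branches of the stop flow condense into a small dispatch helper. A hedged Go sketch — the status strings come from the diagram, but treating any unlisted status as a conflict is this sketch's assumption, not documented behaviour:

```go
package main

import "fmt"

// stopAction mirrors the alt branches of the stop diagram: terminal
// statuses replay as a success no-op, "running" proceeds to
// ContainerStop followed by the CAS status update. The "conflict"
// fallback for unknown statuses is an assumption of this sketch.
func stopAction(status string) string {
	switch status {
	case "stopped", "removed":
		return "replay_no_op" // log success, skip Docker entirely
	case "running":
		return "container_stop" // ContainerStop, then CAS running→stopped
	default:
		return "conflict"
	}
}

func main() {
	for _, s := range []string{"running", "stopped", "removed"} {
		fmt.Printf("%s -> %s\n", s, stopAction(s))
	}
}
```

Deciding the branch from a fresh read taken *after* the lease is held is what makes a replayed stop idempotent rather than a double `ContainerStop`.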
## Restart

```mermaid
sequenceDiagram
    participant Admin as GM / Admin
    participant Service as restartruntime
    participant Stop as stopruntime.Run
    participant Start as startruntime.Run
    participant Docker
    participant PG as Postgres

    Admin->>Service: POST /restart
    Service->>PG: SELECT runtime_records WHERE game_id
    Note over Service: capture current image_ref
    Service->>Service: acquire per-game lease (held across both inner ops)
    Service->>Stop: Run(game_id) [lease bypass]
    Stop->>Docker: ContainerStop
    Stop->>PG: UpdateStatus running→stopped
    Service->>Docker: ContainerRemove
    Service->>Start: Run(game_id, image_ref) [lease bypass]
    Start->>Docker: PullImage / Run
    Start->>PG: Upsert runtime_records (status=running)
    Service->>PG: INSERT operation_log (op_kind=restart, outcome=success, source_ref=correlation_id)
    Service-->>Admin: 200 {runtime_record}
    Service->>Service: release lease
```

The lease is acquired by `restartruntime` and held across both inner operations; `stopruntime.Run` and `startruntime.Run` are lease-bypass entry points that skip the inner lease acquisition ([`services.md` §12](services.md)). The single `operation_log` row uses `Input.SourceRef` as a correlation id linking the implicit stop and start entries ([`services.md` §13](services.md)).

## Patch

```mermaid
sequenceDiagram
    participant Admin as GM / Admin
    participant Service as patchruntime
    participant Restart as restartruntime.Run

    Admin->>Service: POST /patch {image_ref: "galaxy/game:1.4.2"}
    Service->>Service: parse new image_ref + current image_ref
    alt either ref not semver
        Service-->>Admin: 422 image_ref_not_semver
    else major or minor differ
        Service-->>Admin: 422 semver_patch_only
    else major.minor match, patch differs (or equal)
        Service->>Restart: Run(game_id, new_image_ref)
        Restart-->>Service: Result
        Service-->>Admin: 200 {runtime_record}
    end
```

The semver gate uses the tag fragment of the Docker reference; the extraction strategy is recorded in [`services.md` §14](services.md).
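As an illustration of that gate, here is a hedged Go sketch: the tag extraction below is a naive split on the last `:` (it mishandles references with a registry port and no tag), and the parser accepts any dot-separated triple rather than validating digits — the authoritative extraction strategy is the one in `services.md` §14.

```go
package main

import (
	"fmt"
	"strings"
)

// tagOf naively extracts the tag fragment of a Docker reference.
func tagOf(ref string) string {
	i := strings.LastIndex(ref, ":")
	if i < 0 {
		return ""
	}
	return ref[i+1:]
}

// splitSemver accepts only plain MAJOR.MINOR.PATCH tags (simplified:
// it does not verify that each fragment is numeric).
func splitSemver(tag string) ([3]string, bool) {
	parts := strings.Split(tag, ".")
	var out [3]string
	if len(parts) != 3 {
		return out, false
	}
	copy(out[:], parts)
	return out, true
}

// patchGate returns the 422 error code from the diagram, or "" when
// the patch is allowed and the call delegates to restartruntime.
func patchGate(currentRef, newRef string) string {
	cur, okCur := splitSemver(tagOf(currentRef))
	nxt, okNew := splitSemver(tagOf(newRef))
	if !okCur || !okNew {
		return "image_ref_not_semver"
	}
	if cur[0] != nxt[0] || cur[1] != nxt[1] {
		return "semver_patch_only"
	}
	return "" // patch differs or is equal: allowed
}

func main() {
	fmt.Printf("%q\n", patchGate("galaxy/game:1.4.1", "galaxy/game:1.4.2"))
	fmt.Printf("%q\n", patchGate("galaxy/game:1.4.1", "galaxy/game:1.5.0"))
	fmt.Printf("%q\n", patchGate("galaxy/game:1.4.1", "galaxy/game:latest"))
}
```

Note that an equal patch version passes the gate: the diagram's third branch explicitly covers "patch differs (or equal)", so re-deploying the same tag is a valid restart.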
The restart delegate already owns the lease, the inner stop/start, the operation log, and the `runtime:health_events container_started` emission ([`workers.md` §1](workers.md)).

## Cleanup TTL

```mermaid
sequenceDiagram
    participant Worker as containercleanup worker
    participant PG as Postgres
    participant Service as cleanupcontainer
    participant Lease as Redis lease
    participant Docker

    loop every RTMANAGER_CLEANUP_INTERVAL
        Worker->>PG: SELECT runtime_records WHERE status='stopped' AND last_op_at < now - retention
        loop per game
            Worker->>Service: cleanup(game_id, op_source=auto_ttl)
            Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
            Service->>PG: re-read runtime_records WHERE game_id
            alt status = running
                Service-->>Worker: refused / conflict
            else status in {stopped, removed}
                Service->>Docker: ContainerRemove(container_id)
                Service->>PG: UpdateStatus stopped→removed (CAS)
                Service->>PG: INSERT operation_log (op_kind=cleanup_container)
                Service-->>Worker: success
            end
            Service->>Lease: DEL rtmanager:game_lease:{game_id}
        end
    end
```

Admin-driven cleanup follows the same path through `DELETE /api/v1/internal/runtimes/{game_id}/container` with `op_source=admin_rest` instead of `auto_ttl`. The host state directory is **never** removed by this flow ([`../README.md` §Cleanup](../README.md#cleanup), [`services.md` §17](services.md), [`workers.md` §19](workers.md)).
## Reconcile drift adopt

```mermaid
sequenceDiagram
    participant Reconciler as reconcile worker
    participant Docker
    participant PG as Postgres
    participant Lease as Redis lease

    Note over Reconciler: read pass (lockless)
    Reconciler->>Docker: List({label=com.galaxy.owner=rtmanager})
    Reconciler->>PG: ListByStatus(running)
    Note over Reconciler: write pass (per-game lease)
    loop per Docker container without matching record
        Reconciler->>Lease: SET NX PX rtmanager:game_lease:{game_id}
        Reconciler->>PG: re-read runtime_records WHERE game_id
        alt record now exists
            Reconciler-->>Reconciler: skip (state changed since read pass)
        else record still missing
            Reconciler->>PG: Upsert runtime_records (status=running, image_ref, started_at)
            Reconciler->>PG: INSERT operation_log (op_kind=reconcile_adopt, op_source=auto_reconcile)
        end
        Reconciler->>Lease: DEL rtmanager:game_lease:{game_id}
    end
```

The reconciler **never** stops or removes an unrecorded container — operators may have started one manually for diagnostics. The `reconcile_dispose` and `observed_exited` paths follow the same read-pass / write-pass split, with `dispose` updating the orphaned record to `removed` and emitting `container_disappeared`, and `observed_exited` updating to `stopped` and emitting `container_exited` ([`../README.md` §Reconciliation](../README.md#reconciliation), [`workers.md` §14–§16](workers.md)).
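The lockless read pass reduces to a set difference: labelled containers Docker reports, minus game ids that already have a running record. A minimal Go sketch under stated assumptions (game ids as plain strings, the running set as a membership map); each surviving candidate is still re-checked under the per-game lease before the `Upsert`, because the world may change between the two passes:

```go
package main

import "fmt"

// driftAdoptCandidates mirrors the read pass: containers carrying the
// com.galaxy.owner=rtmanager label whose game_id has no running
// runtime_records row. Order follows the Docker listing.
func driftAdoptCandidates(dockerGameIDs []string, runningRecords map[string]bool) []string {
	var out []string
	for _, id := range dockerGameIDs {
		if !runningRecords[id] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	seen := []string{"g1", "g2", "g3"}          // labelled containers from Docker
	recorded := map[string]bool{"g1": true, "g3": true} // ListByStatus(running)
	fmt.Println(driftAdoptCandidates(seen, recorded))   // only g2 is unrecorded
}
```

The asymmetry of the flow lives outside this function: an unrecorded container is only ever adopted, never stopped or removed.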
## Health probe hysteresis

```mermaid
sequenceDiagram
    participant Worker as healthprobe worker
    participant State as in-memory probe state
    participant Engine as galaxy-game-{id}:8080
    participant Health as runtime:health_events

    loop every RTMANAGER_PROBE_INTERVAL
        Worker->>Worker: ListByStatus(running)
        Worker->>State: prune entries for games no longer running
        loop per game (semaphore cap = 16)
            Worker->>Engine: GET /healthz (RTMANAGER_PROBE_TIMEOUT)
            alt success
                State->>State: consecutiveFailures = 0
                opt failurePublished was true
                    Worker->>Health: XADD probe_recovered {prior_failure_count}
                    State->>State: failurePublished = false
                end
            else failure
                State->>State: consecutiveFailures++
                opt consecutiveFailures == RTMANAGER_PROBE_FAILURES_THRESHOLD AND not failurePublished
                    Worker->>Health: XADD probe_failed {consecutive_failures, last_status, last_error}
                    State->>State: failurePublished = true
                end
            end
        end
    end
```

Hysteresis prevents a single transient failure from emitting a `probe_failed` event, and prevents repeated emission while the failure persists. State is non-persistent: a process restart re-establishes the counters from scratch; a game's state is pruned when it transitions out of the running list ([`workers.md` §5–§6](workers.md)).
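The counter pair and its two emission rules fit in a few lines of Go. This sketch returns the event to `XADD` (or an empty string when nothing is emitted) and captures the prior failure count before the reset, mirroring the `probe_recovered {prior_failure_count}` payload; the method and field names are illustrative, with the threshold standing in for `RTMANAGER_PROBE_FAILURES_THRESHOLD`.

```go
package main

import "fmt"

// probeState is the per-game in-memory counter pair from the diagram.
type probeState struct {
	consecutiveFailures int
	failurePublished    bool
}

// observe applies one probe result and returns the health event to
// emit, or "" when the hysteresis rules suppress emission.
func (s *probeState) observe(ok bool, threshold int) string {
	if ok {
		prior := s.consecutiveFailures // captured before the reset
		s.consecutiveFailures = 0
		if s.failurePublished {
			s.failurePublished = false
			return fmt.Sprintf("probe_recovered prior_failure_count=%d", prior)
		}
		return ""
	}
	s.consecutiveFailures++
	// Emit exactly once, on the probe that reaches the threshold.
	if s.consecutiveFailures == threshold && !s.failurePublished {
		s.failurePublished = true
		return "probe_failed"
	}
	return ""
}

func main() {
	s := &probeState{}
	// Four failures then a recovery, with a threshold of 3:
	for _, ok := range []bool{false, false, false, false, true} {
		if ev := s.observe(ok, 3); ev != "" {
			fmt.Println(ev)
		}
	}
}
```

With a threshold of 3, the sequence above emits `probe_failed` only on the third failure (the fourth is suppressed by `failurePublished`) and one `probe_recovered` on the first success.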