# Flows
This document collects the lifecycle and observability flows that
span Runtime Manager and its synchronous and asynchronous neighbours.
Narrative descriptions of the rules these flows enforce live in
../README.md; the diagrams here focus on the message
order across the boundary. Design-rationale records linked from each
section explain the why.
## Start (happy path)

```mermaid
sequenceDiagram
    participant Lobby as Lobby publisher
    participant Stream as runtime:start_jobs
    participant Consumer as startjobsconsumer
    participant Service as startruntime
    participant Lease as Redis lease
    participant Docker
    participant PG as Postgres
    participant Health as runtime:health_events
    participant Results as runtime:job_results
    Lobby->>Stream: XADD {game_id, image_ref, requested_at_ms}
    Consumer->>Stream: XREAD
    Consumer->>Service: Handle(game_id, image_ref, OpSourceLobbyStream, entry_id)
    Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
    Service->>PG: SELECT runtime_records WHERE game_id
    Service->>Docker: PullImage(image_ref) per pull policy
    Service->>Docker: InspectImage → resource limits
    Service->>Service: prepareStateDir(<root>/{game_id})
    Service->>Docker: ContainerCreate + ContainerStart
    Service->>PG: Upsert runtime_records (status=running)
    Service->>PG: INSERT operation_log (op_kind=start, outcome=success)
    Service->>Health: XADD container_started
    Service-->>Consumer: Result{Outcome=success, ContainerID, EngineEndpoint}
    Consumer->>Results: XADD {outcome=success, container_id, engine_endpoint}
    Service->>Lease: DEL rtmanager:game_lease:{game_id}
```
REST callers (Game Master, Admin Service) drive the same service
through POST /api/v1/internal/runtimes/{game_id}/start; the
diagram's last two arrows collapse to an HTTP 200 response carrying
the runtime record. Sources:
../README.md §Lifecycles → Start,
services.md §3.
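The per-game lease is the mutual-exclusion step in the flow above. The acquire/run/release discipline can be sketched as follows; `FakeRedis` and `with_game_lease` are illustrative names, not the real implementation, and the in-memory store only mimics the SET NX PX / DEL semantics shown in the diagram:

```python
import time

class FakeRedis:
    """In-memory stand-in for the two Redis lease commands used above."""
    def __init__(self):
        self.store = {}  # key -> (value, expiry_ms)

    def set_nx_px(self, key, value, ttl_ms, now_ms=None):
        """SET key value NX PX ttl_ms: succeed only if absent or expired."""
        now_ms = now_ms if now_ms is not None else int(time.time() * 1000)
        entry = self.store.get(key)
        if entry is not None and entry[1] > now_ms:
            return False  # lease already held by someone else
        self.store[key] = (value, now_ms + ttl_ms)
        return True

    def delete(self, key):
        self.store.pop(key, None)

def with_game_lease(redis, game_id, ttl_ms, op):
    """Run op under the per-game lease; refuse with 'conflict' if held."""
    key = f"rtmanager:game_lease:{game_id}"
    if not redis.set_nx_px(key, "holder", ttl_ms):
        return {"outcome": "failure", "error_code": "conflict"}
    try:
        return op()
    finally:
        redis.delete(key)  # released even when op raises
```

The TTL on the SET means a crashed holder cannot block the game forever; the lease simply expires.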
## Start failure (image pull)

```mermaid
sequenceDiagram
    participant Service as startruntime
    participant Docker
    participant PG as Postgres
    participant Intents as notification:intents
    participant Results as runtime:job_results
    Service->>Docker: PullImage(image_ref)
    Docker-->>Service: error
    Service->>PG: INSERT operation_log (op_kind=start, outcome=failure, error_code=image_pull_failed)
    Service->>Intents: XADD runtime.image_pull_failed {game_id, image_ref, error_code, error_message, attempted_at_ms}
    Service-->>Service: Result{Outcome=failure, ErrorCode=image_pull_failed}
    Service->>Results: XADD {outcome=failure, error_code=image_pull_failed}
```
The same shape applies to configuration-validation failures
(start_config_invalid from EnsureNetwork(ErrNetworkMissing),
prepareStateDir, or a malformed image_ref) and to the Docker
create/start failure (container_start_failed); only the error code
and the matching runtime.* notification type differ. Three failure
codes never raise an admin notification: conflict,
service_unavailable, and internal_error
(services.md §4).
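Because the notification type is derived from the error code, the mapping collapses to one small function. This sketch assumes the "runtime." prefix pattern shown in the diagrams and treats every non-silent code as notifying; `notification_type` is a hypothetical name:

```python
def notification_type(error_code):
    """Return the notification:intents type for a start failure, or None.

    Per the text above, conflict / service_unavailable / internal_error
    stay silent; every other failure code maps to runtime.<error_code>.
    """
    silent = {"conflict", "service_unavailable", "internal_error"}
    if error_code in silent:
        return None
    return f"runtime.{error_code}"
```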
## Start failure (orphan / Upsert-after-Run rollback)

```mermaid
sequenceDiagram
    participant Service as startruntime
    participant Docker
    participant PG as Postgres
    participant Intents as notification:intents
    Service->>Docker: ContainerCreate + ContainerStart
    Docker-->>Service: container running
    Service->>PG: Upsert runtime_records
    PG-->>Service: error (transport / constraint)
    Note over Service: container is now an orphan<br/>(running, no PG record)
    Service->>Docker: Remove(container_id) [fresh background context]
    Docker-->>Service: ok or logged failure
    Service->>PG: INSERT operation_log (outcome=failure, error_code=container_start_failed)
    Service->>Intents: XADD runtime.container_start_failed
    Service-->>Service: Result{Outcome=failure, ErrorCode=container_start_failed}
```
The Docker adapter already removes the container when Run itself
fails after a successful ContainerCreate
(adapters.md §3); the start service adds the
post-Run rollback for the Upsert path. A Remove failure is
logged but not propagated; the reconciler adopts surviving orphans on
its periodic pass (services.md §5).
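The rollback discipline can be sketched like this; `docker` and `records` are hypothetical ports with `run`/`remove`/`upsert` methods, not the real adapter surface. The key behaviours from the diagram are that the remove is best-effort and its failure is logged, never propagated:

```python
import logging

log = logging.getLogger("startruntime")

def start_with_rollback(docker, records, game_id, image_ref):
    """Run the container, then upsert; remove the container if upsert fails."""
    container_id = docker.run(game_id, image_ref)
    try:
        records.upsert(game_id, container_id, status="running")
    except Exception:
        # The container is now an orphan (running, no PG record).
        # Best-effort removal on a fresh background context; a Remove
        # failure is logged but not propagated -- the reconciler adopts
        # survivors on its periodic pass.
        try:
            docker.remove(container_id)
        except Exception:
            log.warning("orphan remove failed; reconciler will adopt %s", container_id)
        return {"outcome": "failure", "error_code": "container_start_failed"}
    return {"outcome": "success", "container_id": container_id}
```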
## Stop

```mermaid
sequenceDiagram
    participant Caller as Lobby / GM / Admin
    participant Service as stopruntime
    participant Lease as Redis lease
    participant PG as Postgres
    participant Docker
    participant Results as runtime:job_results
    Caller->>Service: stop(game_id, reason)
    Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
    Service->>PG: SELECT runtime_records WHERE game_id
    alt status in {stopped, removed}
        Service->>PG: INSERT operation_log (outcome=success, error_code=replay_no_op)
        Service-->>Caller: success / replay_no_op
    else status = running
        Service->>Docker: ContainerStop(container_id, RTMANAGER_CONTAINER_STOP_TIMEOUT_SECONDS)
        Docker-->>Service: ok
        Service->>PG: UpdateStatus running→stopped (CAS by container_id)
        Service->>PG: INSERT operation_log (op_kind=stop, outcome=success)
        Service-->>Caller: success
    end
    Service->>Lease: DEL rtmanager:game_lease:{game_id}
```
Lobby callers receive the outcome through runtime:job_results; REST
callers receive an HTTP 200. The reason enum
(orphan_cleanup | cancelled | finished | admin_request | timeout)
is recorded in operation_log and is otherwise opaque to the stop
service — RTM does not branch on the reason in v1
(services.md §15, §17).
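The status branch above is what makes stop idempotent. A minimal sketch, with `stop_container` and `update_status` as hypothetical injected callables standing in for the Docker and Postgres ports:

```python
def stop_runtime(record, stop_container, update_status, reason):
    """Stop flow branch on current status; a replay is a success no-op.

    record: dict with 'status' and 'container_id' (illustrative shape).
    The reason is carried through untouched -- the service never
    branches on it.
    """
    if record["status"] in ("stopped", "removed"):
        return {"outcome": "success", "error_code": "replay_no_op", "reason": reason}
    stop_container(record["container_id"])
    # CAS by container_id: the update only lands if the record still
    # points at the container we just stopped.
    update_status(record["container_id"], "running", "stopped")
    return {"outcome": "success", "reason": reason}
```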
## Restart

```mermaid
sequenceDiagram
    participant Admin as GM / Admin
    participant Service as restartruntime
    participant Stop as stopruntime.Run
    participant Start as startruntime.Run
    participant Docker
    participant PG as Postgres
    Admin->>Service: POST /restart
    Service->>PG: SELECT runtime_records WHERE game_id
    Note over Service: capture current image_ref
    Service->>Service: acquire per-game lease (held across both inner ops)
    Service->>Stop: Run(game_id) [lease bypass]
    Stop->>Docker: ContainerStop
    Stop->>PG: UpdateStatus running→stopped
    Service->>Docker: ContainerRemove
    Service->>Start: Run(game_id, image_ref) [lease bypass]
    Start->>Docker: PullImage / Run
    Start->>PG: Upsert runtime_records (status=running)
    Service->>PG: INSERT operation_log (op_kind=restart, outcome=success, source_ref=correlation_id)
    Service-->>Admin: 200 {runtime_record}
    Service->>Service: release lease
```
The lease is acquired by restartruntime and held across both inner
operations; stopruntime.Run and startruntime.Run are
lease-bypass entry points that skip the inner lease acquisition
(services.md §12). The single operation_log row
uses Input.SourceRef as a correlation id linking the implicit stop
and start entries (services.md §13).
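The orchestration shape is "one outer lease, two lease-bypassing inner runs". A sketch under hypothetical names (`lease`, `stop_run`, `start_run`, `remove_container` are illustrative injected collaborators, and the `bypass_lease` flag stands in for the dedicated bypass entry points):

```python
def restart_runtime(lease, stop_run, start_run, remove_container, record):
    """Restart: acquire once, hold across stop + remove + start, release."""
    game_id, image_ref = record["game_id"], record["image_ref"]
    if not lease.acquire(game_id):
        return {"outcome": "failure", "error_code": "conflict"}
    try:
        stop_run(game_id, bypass_lease=True)     # skips its own acquisition
        remove_container(record["container_id"])
        return start_run(game_id, image_ref, bypass_lease=True)
    finally:
        lease.release(game_id)
```

Holding the lease across both inner operations is what keeps a concurrent stop or start from interleaving between them.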
## Patch

```mermaid
sequenceDiagram
    participant Admin as GM / Admin
    participant Service as patchruntime
    participant Restart as restartruntime.Run
    Admin->>Service: POST /patch {image_ref: "galaxy/game:1.4.2"}
    Service->>Service: parse new image_ref + current image_ref
    alt either ref not semver
        Service-->>Admin: 422 image_ref_not_semver
    else major or minor differ
        Service-->>Admin: 422 semver_patch_only
    else major.minor match, patch differs (or equal)
        Service->>Restart: Run(game_id, new_image_ref)
        Restart-->>Service: Result
        Service-->>Admin: 200 {runtime_record}
    end
```
The semver gate uses the tag fragment of the Docker reference; the
extraction strategy is recorded in services.md §14.
The restart delegate already owns the lease, the inner stop/start,
the operation log, and the runtime:health_events container_started
emission (workers.md §1).
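The gate itself is small enough to sketch end to end. This version assumes the tag is everything after the last ':' in the reference and that a bare major.minor.patch tag is required; `patch_gate` is an illustrative name, not the real function:

```python
import re

SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

def patch_gate(current_ref, new_ref):
    """Return None if the patch is allowed, else the 422 error code."""
    def tag_version(ref):
        _, _, tag = ref.rpartition(":")  # tag fragment of the reference
        m = SEMVER_RE.match(tag)
        return tuple(map(int, m.groups())) if m else None

    cur, new = tag_version(current_ref), tag_version(new_ref)
    if cur is None or new is None:
        return "image_ref_not_semver"
    if cur[:2] != new[:2]:               # major.minor must match
        return "semver_patch_only"
    return None  # patch differs or equal: delegate to restart
```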
## Cleanup TTL

```mermaid
sequenceDiagram
    participant Worker as containercleanup worker
    participant PG as Postgres
    participant Service as cleanupcontainer
    participant Lease as Redis lease
    participant Docker
    loop every RTMANAGER_CLEANUP_INTERVAL
        Worker->>PG: SELECT runtime_records WHERE status='stopped' AND last_op_at < now - retention
        loop per game
            Worker->>Service: cleanup(game_id, op_source=auto_ttl)
            Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
            Service->>PG: re-read runtime_records WHERE game_id
            alt status = running
                Service-->>Worker: refused / conflict
            else status in {stopped, removed}
                Service->>Docker: ContainerRemove(container_id)
                Service->>PG: UpdateStatus stopped→removed (CAS)
                Service->>PG: INSERT operation_log (op_kind=cleanup_container)
                Service-->>Worker: success
            end
            Service->>Lease: DEL rtmanager:game_lease:{game_id}
        end
    end
```
Admin-driven cleanup follows the same path through
DELETE /api/v1/internal/runtimes/{game_id}/container with
op_source=admin_rest instead of auto_ttl. The host state directory
is never removed by this flow
(../README.md §Cleanup,
services.md §17,
workers.md §19).
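The worker's selection predicate is the SELECT in the outer loop; expressed over in-memory records (illustrative dict shape with `last_op_at_ms`, not the real repository API) it is just:

```python
def cleanup_candidates(records, now_ms, retention_ms):
    """TTL pass: stopped records whose last operation is past retention.

    Candidates are only hints -- each one is re-read under the per-game
    lease before anything is removed, so a game restarted after this
    pass is refused, not cleaned up.
    """
    cutoff = now_ms - retention_ms
    return [r["game_id"] for r in records
            if r["status"] == "stopped" and r["last_op_at_ms"] < cutoff]
```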
## Reconcile drift adopt

```mermaid
sequenceDiagram
    participant Reconciler as reconcile worker
    participant Docker
    participant PG as Postgres
    participant Lease as Redis lease
    Note over Reconciler: read pass (lockless)
    Reconciler->>Docker: List({label=com.galaxy.owner=rtmanager})
    Reconciler->>PG: ListByStatus(running)
    Note over Reconciler: write pass (per-game lease)
    loop per Docker container without matching record
        Reconciler->>Lease: SET NX PX rtmanager:game_lease:{game_id}
        Reconciler->>PG: re-read runtime_records WHERE game_id
        alt record now exists
            Reconciler-->>Reconciler: skip (state changed since read pass)
        else record still missing
            Reconciler->>PG: Upsert runtime_records (status=running, image_ref, started_at)
            Reconciler->>PG: INSERT operation_log (op_kind=reconcile_adopt, op_source=auto_reconcile)
        end
        Reconciler->>Lease: DEL rtmanager:game_lease:{game_id}
    end
```
The reconciler never stops or removes an unrecorded container —
operators may have started one manually for diagnostics. The
reconcile_dispose and observed_exited paths follow the same
read-pass / write-pass split, with dispose updating the orphaned
record to removed and emitting container_disappeared, and
observed_exited updating to stopped and emitting container_exited
(../README.md §Reconciliation,
workers.md §14–§16).
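The read pass reduces to a set difference between the label-filtered container list and the recorded games. A sketch with illustrative shapes (`{game_id: container_id}` for the Docker side, a set of game ids for the Postgres side):

```python
def adoption_candidates(owned_containers, recorded_games):
    """Read pass: rtmanager-labelled containers with no runtime_records row.

    The result is a candidate list only; the write pass re-reads each
    game under its lease before upserting, because a start may have
    landed between the two passes.
    """
    return {gid: cid for gid, cid in owned_containers.items()
            if gid not in recorded_games}
```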
## Health probe hysteresis

```mermaid
sequenceDiagram
    participant Worker as healthprobe worker
    participant State as in-memory probe state
    participant Engine as galaxy-game-{id}:8080
    participant Health as runtime:health_events
    loop every RTMANAGER_PROBE_INTERVAL
        Worker->>Worker: ListByStatus(running)
        Worker->>State: prune entries for games no longer running
        loop per game (semaphore cap = 16)
            Worker->>Engine: GET /healthz (RTMANAGER_PROBE_TIMEOUT)
            alt success
                State->>State: consecutiveFailures = 0
                opt failurePublished was true
                    Worker->>Health: XADD probe_recovered {prior_failure_count}
                    State->>State: failurePublished = false
                end
            else failure
                State->>State: consecutiveFailures++
                opt consecutiveFailures == RTMANAGER_PROBE_FAILURES_THRESHOLD AND not failurePublished
                    Worker->>Health: XADD probe_failed {consecutive_failures, last_status, last_error}
                    State->>State: failurePublished = true
                end
            end
        end
    end
```
Hysteresis prevents a single transient failure from emitting a
probe_failed event, and prevents repeated emission while the failure
persists. State is non-persistent: a process restart re-establishes
the counters from scratch; a game's state is pruned when it transitions
out of the running list (workers.md §5–§6).
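The per-game state machine above fits in a few lines. This sketch uses illustrative names (`ProbeState`, `observe`) and returns the event type that would be XADDed, or None when nothing should be published:

```python
from dataclasses import dataclass

@dataclass
class ProbeState:
    """In-memory, per-game; lost on process restart, pruned on stop."""
    consecutive_failures: int = 0
    failure_published: bool = False

def observe(state, ok, threshold):
    """Apply one probe result; return the event to publish, if any."""
    if ok:
        state.consecutive_failures = 0
        if state.failure_published:
            state.failure_published = False
            return "probe_recovered"       # only after a published failure
        return None
    state.consecutive_failures += 1
    if state.consecutive_failures == threshold and not state.failure_published:
        state.failure_published = True
        return "probe_failed"              # emitted once per failure episode
    return None
```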