# Flows

This document collects the lifecycle and observability flows that span Runtime Manager and its synchronous and asynchronous neighbours. Narrative descriptions of the rules these flows enforce live in ../README.md; the diagrams here focus on message order across those boundaries. Design-rationale records linked from each section explain the why.

## Start (happy path)

```mermaid
sequenceDiagram
    participant Lobby as Lobby publisher
    participant Stream as runtime:start_jobs
    participant Consumer as startjobsconsumer
    participant Service as startruntime
    participant Lease as Redis lease
    participant Docker
    participant PG as Postgres
    participant Health as runtime:health_events
    participant Results as runtime:job_results

    Lobby->>Stream: XADD {game_id, image_ref, requested_at_ms}
    Consumer->>Stream: XREAD
    Consumer->>Service: Handle(game_id, image_ref, OpSourceLobbyStream, entry_id)
    Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
    Service->>PG: SELECT runtime_records WHERE game_id
    Service->>Docker: PullImage(image_ref) per pull policy
    Service->>Docker: InspectImage → resource limits
    Service->>Service: prepareStateDir(<root>/{game_id})
    Service->>Docker: ContainerCreate + ContainerStart
    Service->>PG: Upsert runtime_records (status=running)
    Service->>PG: INSERT operation_log (op_kind=start, outcome=success)
    Service->>Health: XADD container_started
    Service-->>Consumer: Result{Outcome=success, ContainerID, EngineEndpoint}
    Consumer->>Results: XADD {outcome=success, container_id, engine_endpoint}
    Service->>Lease: DEL rtmanager:game_lease:{game_id}
```

REST callers (Game Master, Admin Service) drive the same service through POST /api/v1/internal/runtimes/{game_id}/start; the diagram's last two arrows collapse to an HTTP 200 response carrying the runtime record. Sources: ../README.md §Lifecycles → Start, services.md §3.

## Start failure (image pull)

```mermaid
sequenceDiagram
    participant Service as startruntime
    participant Docker
    participant PG as Postgres
    participant Intents as notification:intents
    participant Results as runtime:job_results

    Service->>Docker: PullImage(image_ref)
    Docker-->>Service: error
    Service->>PG: INSERT operation_log (op_kind=start, outcome=failure, error_code=image_pull_failed)
    Service->>Intents: XADD runtime.image_pull_failed {game_id, image_ref, error_code, error_message, attempted_at_ms}
    Service-->>Service: Result{Outcome=failure, ErrorCode=image_pull_failed}
    Service->>Results: XADD {outcome=failure, error_code=image_pull_failed}
```

The same shape applies to configuration-validation failures (start_config_invalid from EnsureNetwork(ErrNetworkMissing), prepareStateDir, or an invalid image_ref shape) and to the Docker create/start failure (container_start_failed); only the error code and the matching runtime.* notification type differ. Three failure codes never raise an admin notification: conflict, service_unavailable, and internal_error (services.md §4).

## Start failure (orphan / Upsert-after-Run rollback)

```mermaid
sequenceDiagram
    participant Service as startruntime
    participant Docker
    participant PG as Postgres
    participant Intents as notification:intents

    Service->>Docker: ContainerCreate + ContainerStart
    Docker-->>Service: container running
    Service->>PG: Upsert runtime_records
    PG-->>Service: error (transport / constraint)
    Note over Service: container is now an orphan<br/>(running, no PG record)
    Service->>Docker: Remove(container_id) [fresh background context]
    Docker-->>Service: ok or logged failure
    Service->>PG: INSERT operation_log (outcome=failure, error_code=container_start_failed)
    Service->>Intents: XADD runtime.container_start_failed
    Service-->>Service: Result{Outcome=failure, ErrorCode=container_start_failed}
```

The Docker adapter already removes the container when Run itself fails after a successful ContainerCreate (adapters.md §3); the start service adds the post-Run rollback for the Upsert path. A Remove failure is logged but not propagated; the reconciler adopts surviving orphans on its periodic pass (services.md §5).

## Stop

```mermaid
sequenceDiagram
    participant Caller as Lobby / GM / Admin
    participant Service as stopruntime
    participant Lease as Redis lease
    participant PG as Postgres
    participant Docker
    participant Results as runtime:job_results

    Caller->>Service: stop(game_id, reason)
    Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
    Service->>PG: SELECT runtime_records WHERE game_id
    alt status in {stopped, removed}
        Service->>PG: INSERT operation_log (outcome=success, error_code=replay_no_op)
        Service-->>Caller: success / replay_no_op
    else status = running
        Service->>Docker: ContainerStop(container_id, RTMANAGER_CONTAINER_STOP_TIMEOUT_SECONDS)
        Docker-->>Service: ok
        Service->>PG: UpdateStatus running→stopped (CAS by container_id)
        Service->>PG: INSERT operation_log (op_kind=stop, outcome=success)
        Service-->>Caller: success
    end
    Service->>Lease: DEL rtmanager:game_lease:{game_id}
```

Lobby callers receive the outcome through runtime:job_results; REST callers receive an HTTP 200. The reason enum (orphan_cleanup | cancelled | finished | admin_request | timeout) is recorded in operation_log and is otherwise opaque to the stop service — RTM does not branch on the reason in v1 (services.md §15, §17).

## Restart

```mermaid
sequenceDiagram
    participant Admin as GM / Admin
    participant Service as restartruntime
    participant Stop as stopruntime.Run
    participant Start as startruntime.Run
    participant Docker
    participant PG as Postgres

    Admin->>Service: POST /restart
    Service->>PG: SELECT runtime_records WHERE game_id
    Note over Service: capture current image_ref
    Service->>Service: acquire per-game lease (held across both inner ops)
    Service->>Stop: Run(game_id) [lease bypass]
    Stop->>Docker: ContainerStop
    Stop->>PG: UpdateStatus running→stopped
    Service->>Docker: ContainerRemove
    Service->>Start: Run(game_id, image_ref) [lease bypass]
    Start->>Docker: PullImage / Run
    Start->>PG: Upsert runtime_records (status=running)
    Service->>PG: INSERT operation_log (op_kind=restart, outcome=success, source_ref=correlation_id)
    Service-->>Admin: 200 {runtime_record}
    Service->>Service: release lease
```

The lease is acquired by restartruntime and held across both inner operations; stopruntime.Run and startruntime.Run are lease-bypass entry points that skip the inner lease acquisition (services.md §12). The single operation_log row uses Input.SourceRef as a correlation id linking the implicit stop and start entries (services.md §13).

## Patch

```mermaid
sequenceDiagram
    participant Admin as GM / Admin
    participant Service as patchruntime
    participant Restart as restartruntime.Run

    Admin->>Service: POST /patch {image_ref: "galaxy/game:1.4.2"}
    Service->>Service: parse new image_ref + current image_ref
    alt either ref not semver
        Service-->>Admin: 422 image_ref_not_semver
    else major or minor differ
        Service-->>Admin: 422 semver_patch_only
    else major.minor match, patch differs (or equal)
        Service->>Restart: Run(game_id, new_image_ref)
        Restart-->>Service: Result
        Service-->>Admin: 200 {runtime_record}
    end
```

The semver gate uses the tag fragment of the Docker reference; the extraction strategy is recorded in services.md §14. The restart delegate already owns the lease, the inner stop/start, the operation log, and the runtime:health_events container_started emission (workers.md §1).

## Cleanup TTL

```mermaid
sequenceDiagram
    participant Worker as containercleanup worker
    participant PG as Postgres
    participant Service as cleanupcontainer
    participant Lease as Redis lease
    participant Docker

    loop every RTMANAGER_CLEANUP_INTERVAL
        Worker->>PG: SELECT runtime_records WHERE status='stopped' AND last_op_at < now - retention
        loop per game
            Worker->>Service: cleanup(game_id, op_source=auto_ttl)
            Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
            Service->>PG: re-read runtime_records WHERE game_id
            alt status = running
                Service-->>Worker: refused / conflict
            else status in {stopped, removed}
                Service->>Docker: ContainerRemove(container_id)
                Service->>PG: UpdateStatus stopped→removed (CAS)
                Service->>PG: INSERT operation_log (op_kind=cleanup_container)
                Service-->>Worker: success
            end
            Service->>Lease: DEL rtmanager:game_lease:{game_id}
        end
    end

Admin-driven cleanup follows the same path through DELETE /api/v1/internal/runtimes/{game_id}/container with op_source=admin_rest instead of auto_ttl. The host state directory is never removed by this flow (../README.md §Cleanup, services.md §17, workers.md §19).

## Reconcile drift adopt

```mermaid
sequenceDiagram
    participant Reconciler as reconcile worker
    participant Docker
    participant PG as Postgres
    participant Lease as Redis lease

    Note over Reconciler: read pass (lockless)
    Reconciler->>Docker: List({label=com.galaxy.owner=rtmanager})
    Reconciler->>PG: ListByStatus(running)
    Note over Reconciler: write pass (per-game lease)
    loop per Docker container without matching record
        Reconciler->>Lease: SET NX PX rtmanager:game_lease:{game_id}
        Reconciler->>PG: re-read runtime_records WHERE game_id
        alt record now exists
            Reconciler-->>Reconciler: skip (state changed since read pass)
        else record still missing
            Reconciler->>PG: Upsert runtime_records (status=running, image_ref, started_at)
            Reconciler->>PG: INSERT operation_log (op_kind=reconcile_adopt, op_source=auto_reconcile)
        end
        Reconciler->>Lease: DEL rtmanager:game_lease:{game_id}
    end
```

The reconciler never stops or removes an unrecorded container — operators may have started one manually for diagnostics. The reconcile_dispose and observed_exited paths follow the same read-pass / write-pass split, with dispose updating the orphaned record to removed and emitting container_disappeared, and observed_exited updating to stopped and emitting container_exited (../README.md §Reconciliation, workers.md §14–§16).

## Health probe hysteresis

```mermaid
sequenceDiagram
    participant Worker as healthprobe worker
    participant State as in-memory probe state
    participant Engine as galaxy-game-{id}:8080
    participant Health as runtime:health_events

    loop every RTMANAGER_PROBE_INTERVAL
        Worker->>Worker: ListByStatus(running)
        Worker->>State: prune entries for games no longer running
        loop per game (semaphore cap = 16)
            Worker->>Engine: GET /healthz (RTMANAGER_PROBE_TIMEOUT)
            alt success
                State->>State: consecutiveFailures = 0
                opt failurePublished was true
                    Worker->>Health: XADD probe_recovered {prior_failure_count}
                    State->>State: failurePublished = false
                end
            else failure
                State->>State: consecutiveFailures++
                opt consecutiveFailures == RTMANAGER_PROBE_FAILURES_THRESHOLD AND not failurePublished
                    Worker->>Health: XADD probe_failed {consecutive_failures, last_status, last_error}
                    State->>State: failurePublished = true
                end
            end
        end
    end
```

Hysteresis prevents a single transient failure from emitting a probe_failed event, and prevents repeated emission while the failure persists. State is non-persistent: a process restart re-establishes the counters from scratch; a game's state is pruned when it transitions out of the running list (workers.md §5–§6).