# Flows
This document collects the lifecycle and observability flows that
span Runtime Manager and its synchronous and asynchronous neighbours.
Narrative descriptions of the rules these flows enforce live in
[`../README.md`](../README.md); the diagrams here focus on the message
order across the boundary. Design-rationale records linked from each
section explain the *why*.
## Start (happy path)
```mermaid
sequenceDiagram
participant Lobby as Lobby publisher
participant Stream as runtime:start_jobs
participant Consumer as startjobsconsumer
participant Service as startruntime
participant Lease as Redis lease
participant Docker
participant PG as Postgres
participant Health as runtime:health_events
participant Results as runtime:job_results
Lobby->>Stream: XADD {game_id, image_ref, requested_at_ms}
Consumer->>Stream: XREAD
Consumer->>Service: Handle(game_id, image_ref, OpSourceLobbyStream, entry_id)
Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
Service->>PG: SELECT runtime_records WHERE game_id
Service->>Docker: PullImage(image_ref) per pull policy
Service->>Docker: InspectImage → resource limits
Service->>Service: prepareStateDir(<root>/{game_id})
Service->>Docker: ContainerCreate + ContainerStart
Service->>PG: Upsert runtime_records (status=running)
Service->>PG: INSERT operation_log (op_kind=start, outcome=success)
Service->>Health: XADD container_started
Service-->>Consumer: Result{Outcome=success, ContainerID, EngineEndpoint}
Consumer->>Results: XADD {outcome=success, container_id, engine_endpoint}
Service->>Lease: DEL rtmanager:game_lease:{game_id}
```
REST callers (Game Master, Admin Service) drive the same service
through `POST /api/v1/internal/runtimes/{game_id}/start`; the
diagram's last two arrows collapse to an HTTP `200` response carrying
the runtime record. Sources:
[`../README.md` §Lifecycles → Start](../README.md#start),
[`services.md` §3](services.md).
## Start failure (image pull)
```mermaid
sequenceDiagram
participant Service as startruntime
participant Docker
participant PG as Postgres
participant Intents as notification:intents
participant Results as runtime:job_results
Service->>Docker: PullImage(image_ref)
Docker-->>Service: error
Service->>PG: INSERT operation_log (op_kind=start, outcome=failure, error_code=image_pull_failed)
Service->>Intents: XADD runtime.image_pull_failed {game_id, image_ref, error_code, error_message, attempted_at_ms}
Service-->>Service: Result{Outcome=failure, ErrorCode=image_pull_failed}
Service->>Results: XADD {outcome=failure, error_code=image_pull_failed}
```
The same shape applies to configuration-validation failures
(`start_config_invalid` from `EnsureNetwork(ErrNetworkMissing)`,
`prepareStateDir`, or an invalid `image_ref` shape) and to the Docker
create/start failure (`container_start_failed`); only the error code
and the matching `runtime.*` notification type differ. Three failure
codes do **not** raise an admin notification: `conflict`,
`service_unavailable`, and `internal_error`
([`services.md` §4](services.md)).
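The code-to-notification mapping described above can be sketched as a small lookup. The `runtime.` + error-code naming follows the examples in the diagrams (e.g. `runtime.image_pull_failed`); treat the function as an illustration of the rule, not the actual implementation.

```go
package main

import "fmt"

// nonNotifying lists the failure codes that do not raise an admin
// notification, per §4 of services.md.
var nonNotifying = map[string]bool{
	"conflict":            true,
	"service_unavailable": true,
	"internal_error":      true,
}

// notificationType returns the runtime.* intent type for a failure
// code, or ok=false when no notification should be emitted.
func notificationType(errorCode string) (string, bool) {
	if nonNotifying[errorCode] {
		return "", false
	}
	return "runtime." + errorCode, true
}

func main() {
	t, ok := notificationType("image_pull_failed")
	fmt.Println(t, ok)
}
```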
## Start failure (orphan / Upsert-after-Run rollback)
```mermaid
sequenceDiagram
participant Service as startruntime
participant Docker
participant PG as Postgres
participant Intents as notification:intents
Service->>Docker: ContainerCreate + ContainerStart
Docker-->>Service: container running
Service->>PG: Upsert runtime_records
PG-->>Service: error (transport / constraint)
Note over Service: container is now an orphan<br/>(running, no PG record)
Service->>Docker: Remove(container_id) [fresh background context]
Docker-->>Service: ok or logged failure
Service->>PG: INSERT operation_log (outcome=failure, error_code=container_start_failed)
Service->>Intents: XADD runtime.container_start_failed
Service-->>Service: Result{Outcome=failure, ErrorCode=container_start_failed}
```
The Docker adapter already removes the container when `Run` itself
fails after a successful `ContainerCreate`
([`adapters.md` §3](adapters.md)); the start service adds the
post-`Run` rollback for the `Upsert` path. A `Remove` failure is
logged but not propagated; the reconciler adopts surviving orphans on
its periodic pass ([`services.md` §5](services.md)).
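The rollback step can be sketched as below. The fresh `context.Background()` and log-only error handling come from the diagram and notes; the `remover` interface, 30-second timeout, and function names are illustrative assumptions.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"
)

// remover abstracts the Docker adapter call used by the rollback.
type remover interface {
	Remove(ctx context.Context, containerID string) error
}

// rollbackOrphan removes a container that is running but has no
// Postgres record. It deliberately uses a fresh background context
// (the request context may already be cancelled by the time the
// Upsert error surfaces) so the cleanup can still complete. A Remove
// failure is logged, never propagated: the reconciler adopts
// surviving orphans on its periodic pass.
func rollbackOrphan(d remover, containerID string) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := d.Remove(ctx, containerID); err != nil {
		log.Printf("orphan rollback failed for %s: %v", containerID, err)
	}
}

// fakeDocker records Remove calls; stands in for the real adapter.
type fakeDocker struct{ removed []string }

func (f *fakeDocker) Remove(_ context.Context, id string) error {
	f.removed = append(f.removed, id)
	return nil
}

func main() {
	f := &fakeDocker{}
	rollbackOrphan(f, "c-1")
	fmt.Println(f.removed)
}
```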
## Stop
```mermaid
sequenceDiagram
participant Caller as Lobby / GM / Admin
participant Service as stopruntime
participant Lease as Redis lease
participant PG as Postgres
participant Docker
participant Results as runtime:job_results
Caller->>Service: stop(game_id, reason)
Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
Service->>PG: SELECT runtime_records WHERE game_id
alt status in {stopped, removed}
Service->>PG: INSERT operation_log (outcome=success, error_code=replay_no_op)
Service-->>Caller: success / replay_no_op
else status = running
Service->>Docker: ContainerStop(container_id, RTMANAGER_CONTAINER_STOP_TIMEOUT_SECONDS)
Docker-->>Service: ok
Service->>PG: UpdateStatus running→stopped (CAS by container_id)
Service->>PG: INSERT operation_log (op_kind=stop, outcome=success)
Service-->>Caller: success
end
Service->>Lease: DEL rtmanager:game_lease:{game_id}
```
Lobby callers receive the outcome through `runtime:job_results`; REST
callers receive an HTTP `200`. The `reason` enum
(`orphan_cleanup | cancelled | finished | admin_request | timeout`)
is recorded in `operation_log` and is otherwise opaque to the stop
service — RTM does not branch on the reason in v1
([`services.md` §15, §17](services.md)).
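The CAS transition in the `running` branch can be sketched in memory as below; the equivalent SQL shape (illustrative, not quoted from the source) would be an `UPDATE … WHERE game_id = $1 AND container_id = $2 AND status = 'running'` followed by a rows-affected check. The container-id guard protects against a concurrent restart having already replaced the container.

```go
package main

import "fmt"

type runtimeRecord struct {
	GameID      string
	ContainerID string
	Status      string
}

// casStatus applies the compare-and-swap the stop service performs in
// Postgres: the status only moves from->to when both the current
// status and the container_id still match. It reports whether the
// swap happened (i.e. whether exactly one row would be affected).
func casStatus(rec *runtimeRecord, containerID, from, to string) bool {
	if rec.Status != from || rec.ContainerID != containerID {
		return false
	}
	rec.Status = to
	return true
}

func main() {
	rec := &runtimeRecord{GameID: "g-1", ContainerID: "c-1", Status: "running"}
	fmt.Println(casStatus(rec, "c-1", "running", "stopped"), rec.Status)
}
```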
## Restart
```mermaid
sequenceDiagram
participant Admin as GM / Admin
participant Service as restartruntime
participant Stop as stopruntime.Run
participant Start as startruntime.Run
participant Docker
participant PG as Postgres
Admin->>Service: POST /restart
Service->>PG: SELECT runtime_records WHERE game_id
Note over Service: capture current image_ref
Service->>Service: acquire per-game lease (held across both inner ops)
Service->>Stop: Run(game_id) [lease bypass]
Stop->>Docker: ContainerStop
Stop->>PG: UpdateStatus running→stopped
Service->>Docker: ContainerRemove
Service->>Start: Run(game_id, image_ref) [lease bypass]
Start->>Docker: PullImage / Run
Start->>PG: Upsert runtime_records (status=running)
Service->>PG: INSERT operation_log (op_kind=restart, outcome=success, source_ref=correlation_id)
Service-->>Admin: 200 {runtime_record}
Service->>Service: release lease
```
The lease is acquired by `restartruntime` and held across both inner
operations; `stopruntime.Run` and `startruntime.Run` are
lease-bypass entry points that skip the inner lease acquisition
([`services.md` §12](services.md)). The single `operation_log` row
uses `Input.SourceRef` as a correlation id linking the implicit stop
and start entries ([`services.md` §13](services.md)).
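The lease-across-both-operations shape can be sketched with an in-memory stand-in for the Redis lease. The `stopFn`/`startFn` parameters model the lease-bypass entry points (they are invoked without re-acquiring); all names here are illustrative, not the actual API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// leases is an in-memory stand-in for the Redis lease
// (real code uses SET NX PX / DEL).
type leases struct {
	mu   sync.Mutex
	held map[string]bool
}

func (l *leases) acquire(gameID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.held[gameID] {
		return false
	}
	l.held[gameID] = true
	return true
}

func (l *leases) release(gameID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.held, gameID)
}

// restart sketches the orchestration: one lease acquired up front and
// held across both inner operations, which run through lease-bypass
// entry points so they do not try to re-acquire it themselves.
func restart(l *leases, gameID string, stopFn, startFn func() error) error {
	if !l.acquire(gameID) {
		return errors.New("conflict: lease held")
	}
	defer l.release(gameID)
	if err := stopFn(); err != nil {
		return err
	}
	return startFn()
}

func main() {
	l := &leases{held: map[string]bool{}}
	err := restart(l, "g-1",
		func() error { fmt.Println("stop"); return nil },
		func() error { fmt.Println("start"); return nil })
	fmt.Println(err)
}
```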
## Patch
```mermaid
sequenceDiagram
participant Admin as GM / Admin
participant Service as patchruntime
participant Restart as restartruntime.Run
Admin->>Service: POST /patch {image_ref: "galaxy/game:1.4.2"}
Service->>Service: parse new image_ref + current image_ref
alt either ref not semver
Service-->>Admin: 422 image_ref_not_semver
else major or minor differ
Service-->>Admin: 422 semver_patch_only
else major.minor match, patch differs (or equal)
Service->>Restart: Run(game_id, new_image_ref)
Restart-->>Service: Result
Service-->>Admin: 200 {runtime_record}
end
```
The semver gate uses the tag fragment of the Docker reference; the
extraction strategy is recorded in [`services.md` §14](services.md).
The restart delegate already owns the lease, the inner stop/start,
the operation log, and the `runtime:health_events container_started`
emission ([`workers.md` §1](workers.md)).
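The semver gate above can be sketched as below. This is a simplified tag parser, assuming a plain `name:MAJOR.MINOR.PATCH` reference; a production parser must also handle digests and registry ports, per the extraction strategy recorded in `services.md` §14. The error strings mirror the `422` codes in the diagram.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseTag extracts the tag fragment of a Docker reference and splits
// it into major.minor.patch.
func parseTag(imageRef string) (maj, min, pat int, err error) {
	i := strings.LastIndex(imageRef, ":")
	if i < 0 {
		return 0, 0, 0, errors.New("image_ref_not_semver")
	}
	parts := strings.SplitN(imageRef[i+1:], ".", 3)
	if len(parts) != 3 {
		return 0, 0, 0, errors.New("image_ref_not_semver")
	}
	nums := make([]int, 3)
	for j, p := range parts {
		if nums[j], err = strconv.Atoi(p); err != nil {
			return 0, 0, 0, errors.New("image_ref_not_semver")
		}
	}
	return nums[0], nums[1], nums[2], nil
}

// semverGate enforces the patch-only rule: major.minor must match;
// the patch component may differ or be equal.
func semverGate(current, next string) error {
	cMaj, cMin, _, err := parseTag(current)
	if err != nil {
		return err
	}
	nMaj, nMin, _, err := parseTag(next)
	if err != nil {
		return err
	}
	if cMaj != nMaj || cMin != nMin {
		return errors.New("semver_patch_only")
	}
	return nil
}

func main() {
	fmt.Println(semverGate("galaxy/game:1.4.1", "galaxy/game:1.4.2"))
	fmt.Println(semverGate("galaxy/game:1.4.1", "galaxy/game:1.5.0"))
}
```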
## Cleanup TTL
```mermaid
sequenceDiagram
participant Worker as containercleanup worker
participant PG as Postgres
participant Service as cleanupcontainer
participant Lease as Redis lease
participant Docker
loop every RTMANAGER_CLEANUP_INTERVAL
Worker->>PG: SELECT runtime_records WHERE status='stopped' AND last_op_at < now - retention
loop per game
Worker->>Service: cleanup(game_id, op_source=auto_ttl)
Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
Service->>PG: re-read runtime_records WHERE game_id
alt status = running
Service-->>Worker: refused / conflict
else status in {stopped, removed}
Service->>Docker: ContainerRemove(container_id)
Service->>PG: UpdateStatus stopped→removed (CAS)
Service->>PG: INSERT operation_log (op_kind=cleanup_container)
Service-->>Worker: success
end
Service->>Lease: DEL rtmanager:game_lease:{game_id}
end
end
```
Admin-driven cleanup follows the same path through
`DELETE /api/v1/internal/runtimes/{game_id}/container` with
`op_source=admin_rest` instead of `auto_ttl`. The host state directory
is **never** removed by this flow
([`../README.md` §Cleanup](../README.md#cleanup),
[`services.md` §17](services.md),
[`workers.md` §19](workers.md)).
## Reconcile drift adopt
```mermaid
sequenceDiagram
participant Reconciler as reconcile worker
participant Docker
participant PG as Postgres
participant Lease as Redis lease
Note over Reconciler: read pass (lockless)
Reconciler->>Docker: List({label=com.galaxy.owner=rtmanager})
Reconciler->>PG: ListByStatus(running)
Note over Reconciler: write pass (per-game lease)
loop per Docker container without matching record
Reconciler->>Lease: SET NX PX rtmanager:game_lease:{game_id}
Reconciler->>PG: re-read runtime_records WHERE game_id
alt record now exists
Reconciler-->>Reconciler: skip (state changed since read pass)
else record still missing
Reconciler->>PG: Upsert runtime_records (status=running, image_ref, started_at)
Reconciler->>PG: INSERT operation_log (op_kind=reconcile_adopt, op_source=auto_reconcile)
end
Reconciler->>Lease: DEL rtmanager:game_lease:{game_id}
end
```
The reconciler **never** stops or removes an unrecorded container —
operators may have started one manually for diagnostics. The
`reconcile_dispose` and `observed_exited` paths follow the same
read-pass / write-pass split, with `dispose` updating the orphaned
record to `removed` and emitting `container_disappeared`, and
`observed_exited` updating to `stopped` and emitting `container_exited`
([`../README.md` §Reconciliation](../README.md#reconciliation),
[`workers.md` §14–§16](workers.md)).
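The read-pass drift computation can be sketched as a set difference. The function names and in-memory shapes are illustrative; the write pass then re-checks each game under its lease before adopting, because the record may have appeared since the lockless read.

```go
package main

import (
	"fmt"
	"sort"
)

// unrecorded returns the game IDs of labelled containers that have no
// matching running record — the candidates for reconcile_adopt.
func unrecorded(containerGames []string, runningRecords map[string]bool) []string {
	var out []string
	for _, g := range containerGames {
		if !runningRecords[g] {
			out = append(out, g)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(unrecorded(
		[]string{"g-1", "g-2", "g-3"},
		map[string]bool{"g-2": true},
	))
}
```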
## Health probe hysteresis
```mermaid
sequenceDiagram
participant Worker as healthprobe worker
participant State as in-memory probe state
participant Engine as galaxy-game-{id}:8080
participant Health as runtime:health_events
loop every RTMANAGER_PROBE_INTERVAL
Worker->>Worker: ListByStatus(running)
Worker->>State: prune entries for games no longer running
loop per game (semaphore cap = 16)
Worker->>Engine: GET /healthz (RTMANAGER_PROBE_TIMEOUT)
alt success
State->>State: consecutiveFailures = 0
opt failurePublished was true
Worker->>Health: XADD probe_recovered {prior_failure_count}
State->>State: failurePublished = false
end
else failure
State->>State: consecutiveFailures++
opt consecutiveFailures == RTMANAGER_PROBE_FAILURES_THRESHOLD AND not failurePublished
Worker->>Health: XADD probe_failed {consecutive_failures, last_status, last_error}
State->>State: failurePublished = true
end
end
end
end
```
Hysteresis prevents a single transient failure from emitting a
`probe_failed` event, and prevents repeated emission while the failure
persists. State is non-persistent: a process restart re-establishes
the counters from scratch; a game's state is pruned when it transitions
out of the running list ([`workers.md` §5–§6](workers.md)).