# Flows
This document collects the lifecycle and observability flows that
span Runtime Manager and its synchronous and asynchronous neighbours.
Narrative descriptions of the rules these flows enforce live in
[`../README.md`](../README.md); the diagrams here focus on the message
order across the boundary. Design-rationale records linked from each
section explain the *why*.
## Start (happy path)
```mermaid
sequenceDiagram
participant Lobby as Lobby publisher
participant Stream as runtime:start_jobs
participant Consumer as startjobsconsumer
participant Service as startruntime
participant Lease as Redis lease
participant Docker
participant PG as Postgres
participant Health as runtime:health_events
participant Results as runtime:job_results
Lobby->>Stream: XADD {game_id, image_ref, requested_at_ms}
Consumer->>Stream: XREAD
Consumer->>Service: Handle(game_id, image_ref, OpSourceLobbyStream, entry_id)
Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
Service->>PG: SELECT runtime_records WHERE game_id
Service->>Docker: PullImage(image_ref) per pull policy
Service->>Docker: InspectImage → resource limits
Service->>Service: prepareStateDir(<root>/{game_id})
Service->>Docker: ContainerCreate + ContainerStart
Service->>PG: Upsert runtime_records (status=running)
Service->>PG: INSERT operation_log (op_kind=start, outcome=success)
Service->>Health: XADD container_started
Service-->>Consumer: Result{Outcome=success, ContainerID, EngineEndpoint}
Consumer->>Results: XADD {outcome=success, container_id, engine_endpoint}
Service->>Lease: DEL rtmanager:game_lease:{game_id}
```
REST callers (Game Master, Admin Service) drive the same service
through `POST /api/v1/internal/runtimes/{game_id}/start`; the
diagram's last two arrows collapse to an HTTP `200` response carrying
the runtime record. Sources:
[`../README.md` §Lifecycles → Start](../README.md#start),
[`services.md` §3](services.md).
## Start failure (image pull)
```mermaid
sequenceDiagram
participant Service as startruntime
participant Docker
participant PG as Postgres
participant Intents as notification:intents
participant Results as runtime:job_results
Service->>Docker: PullImage(image_ref)
Docker-->>Service: error
Service->>PG: INSERT operation_log (op_kind=start, outcome=failure, error_code=image_pull_failed)
Service->>Intents: XADD runtime.image_pull_failed {game_id, image_ref, error_code, error_message, attempted_at_ms}
Service-->>Service: Result{Outcome=failure, ErrorCode=image_pull_failed}
Service->>Results: XADD {outcome=failure, error_code=image_pull_failed}
```
The same shape applies to configuration-validation failures
(`start_config_invalid` from `EnsureNetwork(ErrNetworkMissing)`,
`prepareStateDir`, or an invalid `image_ref` shape) and to the Docker
create/start failure (`container_start_failed`); only the error code
and the matching `runtime.*` notification type differ. Three failure
codes do **not** raise an admin notification: `conflict`,
`service_unavailable`, and `internal_error`
([`services.md` §4](services.md)).
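The code-to-notification mapping described above can be sketched as a small lookup. The `runtime.` + error-code naming follows the examples in the diagrams (e.g. `runtime.image_pull_failed`); treat the function as an illustration of the rule, not the actual implementation.

```go
package main

import "fmt"

// nonNotifying lists the failure codes that do not raise an admin
// notification, per §4 of services.md.
var nonNotifying = map[string]bool{
	"conflict":            true,
	"service_unavailable": true,
	"internal_error":      true,
}

// notificationType returns the runtime.* intent type for a failure
// code, or ok=false when no notification should be emitted.
func notificationType(errorCode string) (string, bool) {
	if nonNotifying[errorCode] {
		return "", false
	}
	return "runtime." + errorCode, true
}

func main() {
	t, ok := notificationType("image_pull_failed")
	fmt.Println(t, ok)
}
```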
## Start failure (orphan / Upsert-after-Run rollback)
```mermaid
sequenceDiagram
participant Service as startruntime
participant Docker
participant PG as Postgres
participant Intents as notification:intents
Service->>Docker: ContainerCreate + ContainerStart
Docker-->>Service: container running
Service->>PG: Upsert runtime_records
PG-->>Service: error (transport / constraint)
Note over Service: container is now an orphan<br/>(running, no PG record)
Service->>Docker: Remove(container_id) [fresh background context]
Docker-->>Service: ok or logged failure
Service->>PG: INSERT operation_log (outcome=failure, error_code=container_start_failed)
Service->>Intents: XADD runtime.container_start_failed
Service-->>Service: Result{Outcome=failure, ErrorCode=container_start_failed}
```
The Docker adapter already removes the container when `Run` itself
fails after a successful `ContainerCreate`
([`adapters.md` §3](adapters.md)); the start service adds the
post-`Run` rollback for the `Upsert` path. A `Remove` failure is
logged but not propagated; the reconciler adopts surviving orphans on
its periodic pass ([`services.md` §5](services.md)).
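The rollback step can be sketched as below. The fresh `context.Background()` and log-only error handling come from the diagram and notes; the `remover` interface, 30-second timeout, and function names are illustrative assumptions.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"
)

// remover abstracts the Docker adapter call used by the rollback.
type remover interface {
	Remove(ctx context.Context, containerID string) error
}

// rollbackOrphan removes a container that is running but has no
// Postgres record. It deliberately uses a fresh background context
// (the request context may already be cancelled by the time the
// Upsert error surfaces) so the cleanup can still complete. A Remove
// failure is logged, never propagated: the reconciler adopts
// surviving orphans on its periodic pass.
func rollbackOrphan(d remover, containerID string) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := d.Remove(ctx, containerID); err != nil {
		log.Printf("orphan rollback failed for %s: %v", containerID, err)
	}
}

// fakeDocker records Remove calls; stands in for the real adapter.
type fakeDocker struct{ removed []string }

func (f *fakeDocker) Remove(_ context.Context, id string) error {
	f.removed = append(f.removed, id)
	return nil
}

func main() {
	f := &fakeDocker{}
	rollbackOrphan(f, "c-1")
	fmt.Println(f.removed)
}
```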
## Stop
```mermaid
sequenceDiagram
participant Caller as Lobby / GM / Admin
participant Service as stopruntime
participant Lease as Redis lease
participant PG as Postgres
participant Docker
participant Results as runtime:job_results
Caller->>Service: stop(game_id, reason)
Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
Service->>PG: SELECT runtime_records WHERE game_id
alt status in {stopped, removed}
Service->>PG: INSERT operation_log (outcome=success, error_code=replay_no_op)
Service-->>Caller: success / replay_no_op
else status = running
Service->>Docker: ContainerStop(container_id, RTMANAGER_CONTAINER_STOP_TIMEOUT_SECONDS)
Docker-->>Service: ok
Service->>PG: UpdateStatus running→stopped (CAS by container_id)
Service->>PG: INSERT operation_log (op_kind=stop, outcome=success)
Service-->>Caller: success
end
Service->>Lease: DEL rtmanager:game_lease:{game_id}
```
Lobby callers receive the outcome through `runtime:job_results`; REST
callers receive an HTTP `200`. The `reason` enum
(`orphan_cleanup | cancelled | finished | admin_request | timeout`)
is recorded in `operation_log` and is otherwise opaque to the stop
service — RTM does not branch on the reason in v1
([`services.md` §15, §17](services.md)).
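The CAS transition in the `running` branch can be sketched in memory as below; the equivalent SQL shape (illustrative, not quoted from the source) would be an `UPDATE … WHERE game_id = $1 AND container_id = $2 AND status = 'running'` followed by a rows-affected check. The container-id guard protects against a concurrent restart having already replaced the container.

```go
package main

import "fmt"

type runtimeRecord struct {
	GameID      string
	ContainerID string
	Status      string
}

// casStatus applies the compare-and-swap the stop service performs in
// Postgres: the status only moves from->to when both the current
// status and the container_id still match. It reports whether the
// swap happened (i.e. whether exactly one row would be affected).
func casStatus(rec *runtimeRecord, containerID, from, to string) bool {
	if rec.Status != from || rec.ContainerID != containerID {
		return false
	}
	rec.Status = to
	return true
}

func main() {
	rec := &runtimeRecord{GameID: "g-1", ContainerID: "c-1", Status: "running"}
	fmt.Println(casStatus(rec, "c-1", "running", "stopped"), rec.Status)
}
```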
## Restart
```mermaid
sequenceDiagram
participant Admin as GM / Admin
participant Service as restartruntime
participant Stop as stopruntime.Run
participant Start as startruntime.Run
participant Docker
participant PG as Postgres
Admin->>Service: POST /restart
Service->>PG: SELECT runtime_records WHERE game_id
Note over Service: capture current image_ref
Service->>Service: acquire per-game lease (held across both inner ops)
Service->>Stop: Run(game_id) [lease bypass]
Stop->>Docker: ContainerStop
Stop->>PG: UpdateStatus running→stopped
Service->>Docker: ContainerRemove
Service->>Start: Run(game_id, image_ref) [lease bypass]
Start->>Docker: PullImage / Run
Start->>PG: Upsert runtime_records (status=running)
Service->>PG: INSERT operation_log (op_kind=restart, outcome=success, source_ref=correlation_id)
Service-->>Admin: 200 {runtime_record}
Service->>Service: release lease
```
The lease is acquired by `restartruntime` and held across both inner
operations; `stopruntime.Run` and `startruntime.Run` are
lease-bypass entry points that skip the inner lease acquisition
([`services.md` §12](services.md)). The single `operation_log` row
uses `Input.SourceRef` as a correlation id linking the implicit stop
and start entries ([`services.md` §13](services.md)).
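The lease-across-both-operations shape can be sketched with an in-memory stand-in for the Redis lease. The `stopFn`/`startFn` parameters model the lease-bypass entry points (they are invoked without re-acquiring); all names here are illustrative, not the actual API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// leases is an in-memory stand-in for the Redis lease
// (real code uses SET NX PX / DEL).
type leases struct {
	mu   sync.Mutex
	held map[string]bool
}

func (l *leases) acquire(gameID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.held[gameID] {
		return false
	}
	l.held[gameID] = true
	return true
}

func (l *leases) release(gameID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.held, gameID)
}

// restart sketches the orchestration: one lease acquired up front and
// held across both inner operations, which run through lease-bypass
// entry points so they do not try to re-acquire it themselves.
func restart(l *leases, gameID string, stopFn, startFn func() error) error {
	if !l.acquire(gameID) {
		return errors.New("conflict: lease held")
	}
	defer l.release(gameID)
	if err := stopFn(); err != nil {
		return err
	}
	return startFn()
}

func main() {
	l := &leases{held: map[string]bool{}}
	err := restart(l, "g-1",
		func() error { fmt.Println("stop"); return nil },
		func() error { fmt.Println("start"); return nil })
	fmt.Println(err)
}
```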
## Patch
```mermaid
sequenceDiagram
participant Admin as GM / Admin
participant Service as patchruntime
participant Restart as restartruntime.Run
Admin->>Service: POST /patch {image_ref: "galaxy/game:1.4.2"}
Service->>Service: parse new image_ref + current image_ref
alt either ref not semver
Service-->>Admin: 422 image_ref_not_semver
else major or minor differ
Service-->>Admin: 422 semver_patch_only
else major.minor match, patch differs (or equal)
Service->>Restart: Run(game_id, new_image_ref)
Restart-->>Service: Result
Service-->>Admin: 200 {runtime_record}
end
```
The semver gate uses the tag fragment of the Docker reference; the
extraction strategy is recorded in [`services.md` §14](services.md).
The restart delegate already owns the lease, the inner stop/start,
the operation log, and the `runtime:health_events container_started`
emission ([`workers.md` §1](workers.md)).
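The semver gate above can be sketched as below. This is a simplified tag parser, assuming a plain `name:MAJOR.MINOR.PATCH` reference; a production parser must also handle digests and registry ports, per the extraction strategy recorded in `services.md` §14. The error strings mirror the `422` codes in the diagram.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseTag extracts the tag fragment of a Docker reference and splits
// it into major.minor.patch.
func parseTag(imageRef string) (maj, min, pat int, err error) {
	i := strings.LastIndex(imageRef, ":")
	if i < 0 {
		return 0, 0, 0, errors.New("image_ref_not_semver")
	}
	parts := strings.SplitN(imageRef[i+1:], ".", 3)
	if len(parts) != 3 {
		return 0, 0, 0, errors.New("image_ref_not_semver")
	}
	nums := make([]int, 3)
	for j, p := range parts {
		if nums[j], err = strconv.Atoi(p); err != nil {
			return 0, 0, 0, errors.New("image_ref_not_semver")
		}
	}
	return nums[0], nums[1], nums[2], nil
}

// semverGate enforces the patch-only rule: major.minor must match;
// the patch component may differ or be equal.
func semverGate(current, next string) error {
	cMaj, cMin, _, err := parseTag(current)
	if err != nil {
		return err
	}
	nMaj, nMin, _, err := parseTag(next)
	if err != nil {
		return err
	}
	if cMaj != nMaj || cMin != nMin {
		return errors.New("semver_patch_only")
	}
	return nil
}

func main() {
	fmt.Println(semverGate("galaxy/game:1.4.1", "galaxy/game:1.4.2"))
	fmt.Println(semverGate("galaxy/game:1.4.1", "galaxy/game:1.5.0"))
}
```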
## Cleanup TTL
```mermaid
sequenceDiagram
participant Worker as containercleanup worker
participant PG as Postgres
participant Service as cleanupcontainer
participant Lease as Redis lease
participant Docker
loop every RTMANAGER_CLEANUP_INTERVAL
Worker->>PG: SELECT runtime_records WHERE status='stopped' AND last_op_at < now - retention
loop per game
Worker->>Service: cleanup(game_id, op_source=auto_ttl)
Service->>Lease: SET NX PX rtmanager:game_lease:{game_id}
Service->>PG: re-read runtime_records WHERE game_id
alt status = running
Service-->>Worker: refused / conflict
else status in {stopped, removed}
Service->>Docker: ContainerRemove(container_id)
Service->>PG: UpdateStatus stopped→removed (CAS)
Service->>PG: INSERT operation_log (op_kind=cleanup_container)
Service-->>Worker: success
end
Service->>Lease: DEL rtmanager:game_lease:{game_id}
end
end
```
Admin-driven cleanup follows the same path through
`DELETE /api/v1/internal/runtimes/{game_id}/container` with
`op_source=admin_rest` instead of `auto_ttl`. The host state directory
is **never** removed by this flow
([`../README.md` §Cleanup](../README.md#cleanup),
[`services.md` §17](services.md),
[`workers.md` §19](workers.md)).
## Reconcile drift adopt
```mermaid
sequenceDiagram
participant Reconciler as reconcile worker
participant Docker
participant PG as Postgres
participant Lease as Redis lease
Note over Reconciler: read pass (lockless)
Reconciler->>Docker: List({label=com.galaxy.owner=rtmanager})
Reconciler->>PG: ListByStatus(running)
Note over Reconciler: write pass (per-game lease)
loop per Docker container without matching record
Reconciler->>Lease: SET NX PX rtmanager:game_lease:{game_id}
Reconciler->>PG: re-read runtime_records WHERE game_id
alt record now exists
Reconciler-->>Reconciler: skip (state changed since read pass)
else record still missing
Reconciler->>PG: Upsert runtime_records (status=running, image_ref, started_at)
Reconciler->>PG: INSERT operation_log (op_kind=reconcile_adopt, op_source=auto_reconcile)
end
Reconciler->>Lease: DEL rtmanager:game_lease:{game_id}
end
```
The reconciler **never** stops or removes an unrecorded container —
operators may have started one manually for diagnostics. The
`reconcile_dispose` and `observed_exited` paths follow the same
read-pass / write-pass split, with `dispose` updating the orphaned
record to `removed` and emitting `container_disappeared`, and
`observed_exited` updating to `stopped` and emitting `container_exited`
([`../README.md` §Reconciliation](../README.md#reconciliation),
[`workers.md` §14–§16](workers.md)).
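The read-pass drift computation can be sketched as a set difference. The function names and in-memory shapes are illustrative; the write pass then re-checks each game under its lease before adopting, because the record may have appeared since the lockless read.

```go
package main

import (
	"fmt"
	"sort"
)

// unrecorded returns the game IDs of labelled containers that have no
// matching running record — the candidates for reconcile_adopt.
func unrecorded(containerGames []string, runningRecords map[string]bool) []string {
	var out []string
	for _, g := range containerGames {
		if !runningRecords[g] {
			out = append(out, g)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(unrecorded(
		[]string{"g-1", "g-2", "g-3"},
		map[string]bool{"g-2": true},
	))
}
```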
## Health probe hysteresis
```mermaid
sequenceDiagram
participant Worker as healthprobe worker
participant State as in-memory probe state
participant Engine as galaxy-game-{id}:8080
participant Health as runtime:health_events
loop every RTMANAGER_PROBE_INTERVAL
Worker->>Worker: ListByStatus(running)
Worker->>State: prune entries for games no longer running
loop per game (semaphore cap = 16)
Worker->>Engine: GET /healthz (RTMANAGER_PROBE_TIMEOUT)
alt success
State->>State: consecutiveFailures = 0
opt failurePublished was true
Worker->>Health: XADD probe_recovered {prior_failure_count}
State->>State: failurePublished = false
end
else failure
State->>State: consecutiveFailures++
opt consecutiveFailures == RTMANAGER_PROBE_FAILURES_THRESHOLD AND not failurePublished
Worker->>Health: XADD probe_failed {consecutive_failures, last_status, last_error}
State->>State: failurePublished = true
end
end
end
end
```
Hysteresis prevents a single transient failure from emitting a
`probe_failed` event, and prevents repeated emission while the failure
persists. State is non-persistent: a process restart re-establishes
the counters from scratch; a game's state is pruned when it transitions
out of the running list ([`workers.md` §5–§6](workers.md)).