feat: runtime manager
@@ -0,0 +1,167 @@
# Domain and Ports

This document explains why the `rtmanager` domain layer
([`../internal/domain/`](../internal/domain)) and the port interfaces
([`../internal/ports/`](../internal/ports)) are shaped the way they are.
The current-state types and method signatures are the source of truth in
the code; this file records the rationale so future readers do not
re-litigate the same trade-offs.

For the surrounding behaviour see
[`../README.md`](../README.md), the SQL CHECK constraints in
[`../internal/adapters/postgres/migrations/00001_init.sql`](../internal/adapters/postgres/migrations/00001_init.sql),
the wire contracts under [`../api/`](../api), and
[`postgres-migration.md`](postgres-migration.md) for the persistence
layer.

## 1. String-typed status enums

`runtime.Status`, `operation.OpKind`, `operation.OpSource`,
`operation.Outcome`, `health.EventType`, `health.SnapshotStatus`, and
`health.SnapshotSource` are all `type X string`.

The string approach wins on three counts:

- the SQL columns guarded by the CHECK constraints already store the
  values as `text`, so a string domain type maps one-to-one with no
  codec layer;
- it matches Lobby (`game.Status`, `membership.Status`,
  `application.Status`), so reviewers do not switch encoding mental
  models when crossing service boundaries;
- `IsKnown` keeps the invariant cheap (a single switch); a `type X uint8`
  with stringer-generated names would pay for a name lookup on every
  conversion and make raw SQL columns harder to read in diagnostics.

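A minimal sketch of the pattern, assuming a three-value status set (`running`, `stopped`, `removed`, the values named in the transitions section); the real `runtime.Status` constants and their names may differ.

```go
package main

import "fmt"

// Status is a string-typed enum. The constant names are illustrative
// assumptions; only the string values appear in this document.
type Status string

const (
	StatusRunning Status = "running"
	StatusStopped Status = "stopped"
	StatusRemoved Status = "removed"
)

// IsKnown reports whether s is one of the declared values.
func (s Status) IsKnown() bool {
	switch s {
	case StatusRunning, StatusStopped, StatusRemoved:
		return true
	default:
		return false
	}
}

func main() {
	// A text column value converts with a plain type conversion, no codec.
	fromDB := Status("running")
	fmt.Println(fromDB, fromDB.IsKnown())
	fmt.Println(Status("paused").IsKnown())
}
```

The conversion from the column value is a plain `Status(...)` cast, and the only validation step is the switch in `IsKnown`.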
## 2. Plain `string` for `CurrentContainerID` and `CurrentImageRef`

The PostgreSQL columns are nullable. The domain model uses plain
`string` with empty == NULL and bridges the SQL nullability inside the
adapter. Pointer fields would force every consumer to dereference
defensively even though business logic rarely cares about the
NULL/empty distinction (removed records may legitimately carry either
form depending on whether the record passed through `stopped` first).

The adapter's job is to translate `sql.NullString` ⇄ `string`; the rest
of the codebase reads the field as a regular value.

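The translation can be sketched as two small helpers. The function names `toDomain` and `toColumn` are hypothetical; the actual adapter may structure this differently.

```go
package main

import (
	"database/sql"
	"fmt"
)

// toDomain maps a nullable column to the plain-string domain field.
// NULL becomes the empty string.
func toDomain(ns sql.NullString) string {
	if !ns.Valid {
		return ""
	}
	return ns.String
}

// toColumn maps the domain field back to the column.
// The empty string is written as NULL.
func toColumn(s string) sql.NullString {
	return sql.NullString{String: s, Valid: s != ""}
}

func main() {
	fmt.Println(toDomain(sql.NullString{}) == "") // NULL reads as ""
	fmt.Println(toColumn("").Valid)               // "" writes as NULL
	fmt.Println(toDomain(toColumn("abc123")))     // non-empty round-trips
}
```

Note that this sketch normalises an empty-but-valid column value to NULL on the next write, which is consistent with treating the two forms as equivalent.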
## 3. `*time.Time` for nullable timestamps

`StartedAt`, `StoppedAt`, and `RemovedAt` retain pointer types.
`time.Time{}` is a real, comparable value in Go (`IsZero` only reports
the canonical zero time); mixing "missing" and "set to UTC zero" through
plain `time.Time` would invite bugs. The jet-generated
`model.RuntimeRecords` already declares the same fields as `*time.Time`,
so the domain type aligns with the persistence type and the adapter does
not re-shape pointers.

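To illustrate the distinction, here is a small sketch with a hypothetical `record` struct standing in for the runtime record; the `describe` helper is likewise an assumption for demonstration.

```go
package main

import (
	"fmt"
	"time"
)

// record is a hypothetical stand-in carrying two of the nullable timestamps.
type record struct {
	StartedAt *time.Time
	StoppedAt *time.Time
}

// describe distinguishes "never set" (nil) from any stored instant,
// including the zero time.
func describe(t *time.Time) string {
	if t == nil {
		return "missing"
	}
	return t.UTC().Format(time.RFC3339)
}

func main() {
	zero := time.Time{}
	r := record{StartedAt: &zero}
	fmt.Println(describe(r.StartedAt)) // a stored zero time is still a value
	fmt.Println(describe(r.StoppedAt)) // nil means the column was NULL
}
```

With plain `time.Time` fields both cases would collapse into `IsZero() == true`.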
## 4. `EventType` and `SnapshotStatus` are deliberately distinct

`runtime-health-asyncapi.yaml.EventType` enumerates seven values; the
SQL CHECK on `health_snapshots.status` enumerates six. The two sets
overlap but are not identical:

- `container_started` is an *event*; the snapshot collapses it to
  `healthy` (a successful start is observed as the container being
  live, not as an ongoing event);
- `probe_recovered` is an *event*; it does not become a snapshot row of
  its own: the next inspect/probe overwrites the prior `probe_failed`
  with `healthy`.

Modelling them as one shared enum would require a separate "event vs
snapshot" boolean and invite accidental mismatches. Two distinct types
with explicit `IsKnown` matrices keep each surface honest at compile
time.

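A partial sketch of the two types and the collapse between them. Only the values named above are included, and treating `probe_failed` as an event that maps to the same-named snapshot status is an assumption; the `snapshotFor` helper is hypothetical.

```go
package main

import "fmt"

type EventType string
type SnapshotStatus string

// Partial value sets; the real enums carry seven and six values.
const (
	EventContainerStarted EventType = "container_started"
	EventProbeFailed      EventType = "probe_failed" // assumed to exist as an event
	EventProbeRecovered   EventType = "probe_recovered"
)

const (
	SnapshotHealthy     SnapshotStatus = "healthy"
	SnapshotProbeFailed SnapshotStatus = "probe_failed"
)

// snapshotFor returns the snapshot status an event writes, and false
// when the event produces no snapshot row of its own.
func snapshotFor(e EventType) (SnapshotStatus, bool) {
	switch e {
	case EventContainerStarted:
		return SnapshotHealthy, true
	case EventProbeFailed:
		return SnapshotProbeFailed, true
	default:
		return "", false
	}
}

func main() {
	events := []EventType{EventContainerStarted, EventProbeFailed, EventProbeRecovered}
	for _, e := range events {
		s, ok := snapshotFor(e)
		fmt.Printf("%s -> %q (row written: %t)\n", e, s, ok)
	}
	// var s SnapshotStatus = EventContainerStarted // rejected by the compiler
}
```

Because the two types are distinct, passing an event where a snapshot status is expected fails to compile unless the caller converts explicitly.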
## 5. `Inspect` split into `InspectImage` + `InspectContainer`

Two narrow methods replace a single polymorphic `Inspect`. The surface
RTM exercises has two shapes:

- the start service inspects the *image* by reference to read resource
  limits from labels;
- the periodic inspect worker, the reconciler, and the events listener
  inspect *containers* by id to read state, health, restart count, and
  exit code.

The inputs differ (ref vs id), and the result types differ
(`ImageInspect.Labels` is the only field used at start time, while
`ContainerInspect` carries a dozen state fields). One polymorphic
method would either split internally on input type or return a tagged
union; either is messier than two narrow methods.

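The shape of the split can be sketched as follows. The method signatures, the `ContainerInspect` field names, the label key, and the `fakeDocker` type are all assumptions made for illustration; only `InspectImage`, `InspectContainer`, and `ImageInspect.Labels` are named in this document.

```go
package main

import (
	"context"
	"fmt"
)

// ImageInspect carries what the start service reads.
type ImageInspect struct {
	Labels map[string]string
}

// ContainerInspect carries a few of the state fields (names assumed).
type ContainerInspect struct {
	State        string
	Health       string
	RestartCount int
	ExitCode     int
}

// DockerClient exposes one narrow method per input/result shape.
type DockerClient interface {
	InspectImage(ctx context.Context, ref string) (ImageInspect, error)
	InspectContainer(ctx context.Context, id string) (ContainerInspect, error)
}

// fakeDocker is an in-memory stand-in used only for this example.
type fakeDocker struct{}

func (fakeDocker) InspectImage(_ context.Context, ref string) (ImageInspect, error) {
	return ImageInspect{Labels: map[string]string{"example.cpu_limit": "2"}}, nil
}

func (fakeDocker) InspectContainer(_ context.Context, id string) (ContainerInspect, error) {
	return ContainerInspect{State: "exited", ExitCode: 137}, nil
}

func main() {
	ctx := context.Background()
	var d DockerClient = fakeDocker{}
	img, _ := d.InspectImage(ctx, "example/engine:1.0")
	ctr, _ := d.InspectContainer(ctx, "abc123")
	fmt.Println(img.Labels["example.cpu_limit"], ctr.State, ctr.ExitCode)
}
```

Each caller depends only on the result type it actually uses, with no type switch or tagged union in between.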
## 6. `LobbyGameRecord` is intentionally minimal

`LobbyInternalClient.GetGame` returns `GameID`, `Status`, and
`TargetEngineVersion`. The fetch is classified as ancillary diagnostics
because the start envelope already carries the only required field
(`image_ref`).

Anything more would invite RTM consumers to depend on Lobby's schema in
ways that violate the "RTM never resolves engine versions" rule.
Future fields are additive: each new field is opt-in to the consumer
and does not break existing call sites. The minimalism is also a hedge
against schema drift: Lobby's `GameRecord` is large and changes more
often than RTM needs to track.

## 7. `NotificationIntentPublisher.Publish` returns `error`, not `(string, error)`

Lobby's `IntentPublisher.Publish` returns the Redis Stream entry id so
business workflows that key on it (idempotency keys, audit
correlation) can capture it. RTM publishes admin-only failure intents
where the entry id has no consumer: failing starts do not loop back
to RTM, and notification routing keys on the producer-supplied
`idempotency_key` rather than the stream id. The adapter wraps
`pkg/notificationintent.Publisher` and discards the entry id at the
wrapper boundary.

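A sketch of the wrapper boundary, assuming the wrapped publisher has a `Publish(ctx, intent) (string, error)` shape. The `Intent` struct, `streamPublisher` interface, `intentAdapter`, and `fakeStream` are hypothetical names for this example and do not describe the real `pkg/notificationintent` API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Intent is a hypothetical payload carrying the producer-supplied key.
type Intent struct{ IdempotencyKey string }

// streamPublisher models the wrapped publisher, which returns an entry id.
type streamPublisher interface {
	Publish(ctx context.Context, in Intent) (entryID string, err error)
}

// NotificationIntentPublisher is the RTM-side port: error only.
type NotificationIntentPublisher interface {
	Publish(ctx context.Context, in Intent) error
}

// intentAdapter discards the entry id at the wrapper boundary.
type intentAdapter struct{ inner streamPublisher }

func (a intentAdapter) Publish(ctx context.Context, in Intent) error {
	_, err := a.inner.Publish(ctx, in)
	return err
}

var errStreamDown = errors.New("stream unavailable")

// fakeStream is an in-memory stand-in used only for this example.
type fakeStream struct{ fail bool }

func (f fakeStream) Publish(_ context.Context, _ Intent) (string, error) {
	if f.fail {
		return "", errStreamDown
	}
	return "1700000000000-0", nil
}

func main() {
	var p NotificationIntentPublisher = intentAdapter{inner: fakeStream{}}
	err := p.Publish(context.Background(), Intent{IdempotencyKey: "start-failed:game-1"})
	fmt.Println(err)
}
```

Errors still propagate unchanged; only the entry id is dropped.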
## 8. Exactly four allowed runtime transitions

`runtime.AllowedTransitions` covers:

- `running → stopped`: graceful stop, observed exit, reconcile
  observed exited;
- `running → removed`: `reconcile_dispose` when the container
  vanished;
- `stopped → running`: restart and patch inner start;
- `stopped → removed`: cleanup TTL or admin DELETE.

Other pairs are intentionally rejected:

- `running → running` and `stopped → stopped` would mean Upsert
  overwrote state without a CAS guard. Idempotent re-start / re-stop
  never transitions; the service layer returns `replay_no_op` and the
  record is left untouched.
- `removed → *` is forbidden because `removed` is terminal. The
  reconciler creates fresh records with `reconcile_adopt` rather than
  resurrecting old ones.

Encoding the table this way means a future bug where a service tries
to revive a removed record is rejected at the domain layer rather than
the adapter, which keeps the failure mode close to the offending code.

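The table above can be sketched as a lookup keyed on the `(from, to)` pair. The map representation and the `CanTransition` helper are assumptions; the real `runtime.AllowedTransitions` may be shaped differently.

```go
package main

import "fmt"

type Status string

const (
	StatusRunning Status = "running"
	StatusStopped Status = "stopped"
	StatusRemoved Status = "removed"
)

// transition is a (from, to) pair used as a map key.
type transition struct{ from, to Status }

// allowedTransitions lists exactly the four permitted pairs.
var allowedTransitions = map[transition]bool{
	{StatusRunning, StatusStopped}: true,
	{StatusRunning, StatusRemoved}: true,
	{StatusStopped, StatusRunning}: true,
	{StatusStopped, StatusRemoved}: true,
}

// CanTransition reports whether from -> to is in the table.
// Self-transitions and anything out of removed are absent, so they fail.
func CanTransition(from, to Status) bool {
	return allowedTransitions[transition{from, to}]
}

func main() {
	fmt.Println(CanTransition(StatusStopped, StatusRunning))
	fmt.Println(CanTransition(StatusRunning, StatusRunning))
	fmt.Println(CanTransition(StatusRemoved, StatusRunning))
}
```

A service that tries to revive a removed record gets `false` from the domain check before any adapter call is made.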
## 9. `PullPolicy` re-declared inside `ports/dockerclient.go`

The same enum exists as `config.ImagePullPolicy`. Importing
`internal/config` from the ports package would couple two unrelated
layers and risk an import cycle once the wiring layer pulls both in.
The runtime/wiring layer (in `internal/app`) is the single point that
translates between the two types. Both are `string`-typed, the value
sets are identical, and the validation lives on each side
independently.

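A sketch of the translation step as it might look in the wiring layer. Both types are declared in one file here to keep the example self-contained, and the three policy values, the `IsKnown` method, and the `toPortsPullPolicy` helper are assumptions; this document does not list the actual value set.

```go
package main

import "fmt"

// ConfigImagePullPolicy stands in for config.ImagePullPolicy.
type ConfigImagePullPolicy string

// PullPolicy stands in for the ports-side declaration.
type PullPolicy string

// Hypothetical values; the real set lives in the code.
const (
	PullAlways    PullPolicy = "always"
	PullIfMissing PullPolicy = "if_missing"
	PullNever     PullPolicy = "never"
)

// IsKnown is the ports-side validation, independent of the config package.
func (p PullPolicy) IsKnown() bool {
	switch p {
	case PullAlways, PullIfMissing, PullNever:
		return true
	default:
		return false
	}
}

// toPortsPullPolicy converts the config value and re-validates it on the
// ports side. Both types share the string underlying type, so the
// conversion itself is a plain cast.
func toPortsPullPolicy(c ConfigImagePullPolicy) (PullPolicy, error) {
	p := PullPolicy(c)
	if !p.IsKnown() {
		return "", fmt.Errorf("unknown pull policy %q", string(c))
	}
	return p, nil
}

func main() {
	fmt.Println(toPortsPullPolicy("if_missing"))
	fmt.Println(toPortsPullPolicy("sometimes"))
}
```

Neither package needs to import the other; only the wiring code sees both types.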
## 10. Compile-time interface assertions live with adapters

Every interface has a `var _ ports.X = (*Y)(nil)` assertion, but the
assertion lives in the adapter package (e.g.
`var _ ports.RuntimeRecordStore = (*Store)(nil)` inside
`internal/adapters/postgres/runtimerecordstore`). Putting the
assertions in the port package would force the port package to import
its own implementations and create an obvious import cycle.

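For readers unfamiliar with the idiom, here is a single-file sketch. The interface's one method and the map-backed `Store` are hypothetical; in the real layout the interface lives in the ports package and the assertion sits next to `Store` in the adapter package.

```go
package main

import (
	"context"
	"fmt"
)

// RuntimeRecordStore stands in for ports.RuntimeRecordStore.
// The single method is a hypothetical placeholder.
type RuntimeRecordStore interface {
	Status(ctx context.Context, gameID string) (string, error)
}

// Store stands in for the adapter implementation.
type Store struct{ rows map[string]string }

func (s *Store) Status(_ context.Context, gameID string) (string, error) {
	st, ok := s.rows[gameID]
	if !ok {
		return "", fmt.Errorf("runtime record %q not found", gameID)
	}
	return st, nil
}

// Compile-time assertion: the build fails here if *Store stops
// satisfying the interface. No value is allocated at run time.
var _ RuntimeRecordStore = (*Store)(nil)

func main() {
	s := &Store{rows: map[string]string{"game-1": "running"}}
	fmt.Println(s.Status(context.Background(), "game-1"))
}
```

Because the assertion references both the interface and the concrete type, it can only live in a package that already imports the ports package, which is the adapter.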
## 11. `RunSpec.Validate` lives on the request type

The Docker port carries a non-trivial request type (`RunSpec`) with
eight required fields and per-mount invariants. Putting `Validate` on
the request struct keeps the rule next to the type definition, mirrors
the pattern used by `lobby/internal/ports/gmclient.go`
(`RegisterGameRequest.Validate`), and lets the adapter call it as the
first defensive check before invoking the Docker SDK.

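A reduced sketch of the pattern. The struct below shows only a few hypothetical fields (`Name`, `Image`, `Mounts`) and two made-up mount rules, not the eight real required fields or the actual invariants.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Mount is a hypothetical bind-mount description.
type Mount struct{ Source, Target string }

// RunSpec is a reduced stand-in for the real request type.
type RunSpec struct {
	Name   string
	Image  string
	Mounts []Mount
}

// Validate checks required fields first, then per-mount invariants.
func (s RunSpec) Validate() error {
	if s.Name == "" {
		return errors.New("run spec: name is required")
	}
	if s.Image == "" {
		return errors.New("run spec: image is required")
	}
	for i, m := range s.Mounts {
		if m.Source == "" {
			return fmt.Errorf("run spec: mount %d: source is required", i)
		}
		if !strings.HasPrefix(m.Target, "/") {
			return fmt.Errorf("run spec: mount %d: target %q must be absolute", i, m.Target)
		}
	}
	return nil
}

func main() {
	spec := RunSpec{
		Name:   "game-1",
		Image:  "example/engine:1.0",
		Mounts: []Mount{{Source: "/srv/game-1", Target: "data"}},
	}
	// The adapter would call Validate before touching the Docker SDK.
	if err := spec.Validate(); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("ok")
}
```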