galaxy-game/rtmanager/docs/domain-and-ports.md
2026-04-28 20:39:18 +02:00

Domain and Ports

This document explains why the rtmanager domain layer (../internal/domain/) and the port interfaces (../internal/ports/) are shaped the way they are. The current-state types and method signatures are the source of truth in the code; this file records the rationale so future readers do not re-litigate the same trade-offs.

For the surrounding behaviour see ../README.md, the SQL CHECK constraints in ../internal/adapters/postgres/migrations/00001_init.sql, the wire contracts under ../api/, and postgres-migration.md for the persistence layer.

1. String-typed status enums

runtime.Status, operation.OpKind, operation.OpSource, operation.Outcome, health.EventType, health.SnapshotStatus, and health.SnapshotSource are all type X string.

The string approach wins on three counts:

  • the SQL CHECK constraints already store the values as text, so a string domain type maps one-to-one with no codec layer;
  • it matches Lobby (game.Status, membership.Status, application.Status), so reviewers do not switch encoding mental models when crossing service boundaries;
  • IsKnown keeps the invariant cheap (a single switch); a type X uint8 with stringer-generated names would pay a constant lookup and make raw SQL columns harder to read in diagnostics.
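
As a rough illustration of the pattern, a string-typed enum with an IsKnown guard might look like the sketch below. The constant names are assumptions made for this example; the three values are the runtime statuses named elsewhere in this document, and the real declarations live in the code.

```go
package main

import "fmt"

// Status is a string-typed enum. The stored text and the domain value
// are the same bytes, so no codec layer is needed.
type Status string

// Constant names are illustrative; the values mirror the statuses
// mentioned in this document.
const (
	StatusRunning Status = "running"
	StatusStopped Status = "stopped"
	StatusRemoved Status = "removed"
)

// IsKnown reports whether s is one of the declared values.
// It is a single switch with no lookup table.
func (s Status) IsKnown() bool {
	switch s {
	case StatusRunning, StatusStopped, StatusRemoved:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(Status("running").IsKnown()) // true
	fmt.Println(Status("paused").IsKnown())  // false
}
```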

2. Plain string for CurrentContainerID and CurrentImageRef

The PostgreSQL columns are nullable. The domain model uses plain string with empty == NULL and bridges the SQL nullability inside the adapter. Pointer fields would force every consumer to dereference defensively even though business logic rarely cares about the NULL/empty distinction (removed records may legitimately carry either form depending on whether the record passed through stopped first).

The adapter's job is to translate between sql.NullString and string; the rest of the codebase reads the field as a regular value.

3. *time.Time for nullable timestamps

StartedAt, StoppedAt, RemovedAt retain pointer types. time.Time{} is a real, comparable value in Go (IsZero only reports the canonical zero time); mixing "missing" and "set to UTC zero" through plain time.Time would invite bugs. The jet-generated model.RuntimeRecords already declares the same fields as *time.Time, so the domain type aligns with the persistence type and the adapter does not re-shape pointers.

4. EventType and SnapshotStatus are deliberately distinct

runtime-health-asyncapi.yaml.EventType enumerates seven values; the SQL CHECK on health_snapshots.status enumerates six. The two sets overlap but are not identical:

  • container_started is an event; the snapshot collapses it to healthy (a successful start is observed as the container being live, not as an ongoing event);
  • probe_recovered is an event; it does not become a snapshot row of its own — the next inspect/probe overwrites the prior probe_failed with healthy.

Modelling them as one shared enum would require a separate "event vs snapshot" boolean and invite accidental mismatches. Two distinct types with explicit IsKnown matrices keep each surface honest at compile time.

5. Inspect split into InspectImage + InspectContainer

Two narrow methods replace a single polymorphic Inspect. The surface RTM exercises has two shapes:

  • the start service inspects the image by reference to read resource limits from labels;
  • the periodic inspect worker, the reconciler, and the events listener inspect containers by id to read state, health, restart count, and exit code.

The inputs differ (ref vs id), and the result types differ (ImageInspect.Labels is the only field used at start time, while ContainerInspect carries a dozen state fields). One polymorphic method would either split internally on input type or return a tagged union; either is messier than two narrow methods.

6. LobbyGameRecord is intentionally minimal

LobbyInternalClient.GetGame returns GameID, Status, and TargetEngineVersion. The fetch is classified as ancillary diagnostics because the start envelope already carries the only required field (image_ref).

Anything more would invite RTM consumers to depend on Lobby's schema in ways that violate the "RTM never resolves engine versions" rule. Future fields are additive: each new field is opt-in to the consumer and does not break existing call sites. The minimalism is also a hedge against schema drift — Lobby's GameRecord is large and changes more often than RTM needs to track.

7. NotificationIntentPublisher.Publish returns error, not (string, error)

Lobby's IntentPublisher.Publish returns the Redis Stream entry id so business workflows that key on it (idempotency keys, audit correlation) can capture it. RTM publishes admin-only failure intents where the entry id has no consumer — failing starts do not loop back to RTM, and notification routing keys on the producer-supplied idempotency_key rather than the stream id. The adapter wraps pkg/notificationintent.Publisher and discards the entry id at the wrapper boundary.

8. Exactly four allowed runtime transitions

runtime.AllowedTransitions covers:

  • running → stopped: graceful stop, observed exit, reconcile observed exited;
  • running → removed: reconcile_dispose when the container vanished;
  • stopped → running: restart and patch inner start;
  • stopped → removed: cleanup TTL or admin DELETE.

Other pairs are intentionally rejected:

  • running → running and stopped → stopped would mean Upsert overwrote state without a CAS guard. Idempotent re-start / re-stop never transitions; the service layer returns replay_no_op and the record is left untouched.
  • removed → * is forbidden because removed is terminal. The reconciler creates fresh records with reconcile_adopt rather than resurrecting old ones.

Encoding the table this way means a future bug where a service tries to revive a removed record is rejected at the domain layer rather than the adapter, which keeps the failure mode close to the offending code.
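
One way such a table could be encoded is sketched below. The transition struct, the constant names, and the CanTransition helper are assumptions for this example; only the four allowed pairs and the rejected ones come from the text above.

```go
package main

import "fmt"

type Status string

const (
	StatusRunning Status = "running"
	StatusStopped Status = "stopped"
	StatusRemoved Status = "removed"
)

// transition is a hypothetical key type for the lookup table.
type transition struct {
	From, To Status
}

// allowedTransitions lists exactly the four pairs described above.
var allowedTransitions = map[transition]struct{}{
	{From: StatusRunning, To: StatusStopped}: {},
	{From: StatusRunning, To: StatusRemoved}: {},
	{From: StatusStopped, To: StatusRunning}: {},
	{From: StatusStopped, To: StatusRemoved}: {},
}

// CanTransition reports whether from → to is in the table. Self-loops
// and anything leaving removed are absent, so they are rejected.
func CanTransition(from, to Status) bool {
	_, ok := allowedTransitions[transition{From: from, To: to}]
	return ok
}

func main() {
	fmt.Println(CanTransition(StatusRunning, StatusStopped)) // true
	fmt.Println(CanTransition(StatusRunning, StatusRunning)) // false
	fmt.Println(CanTransition(StatusRemoved, StatusRunning)) // false
}
```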

9. PullPolicy re-declared inside ports/dockerclient.go

The same enum exists as config.ImagePullPolicy. Importing internal/config from the ports package would couple two unrelated layers and risk an import cycle once the wiring layer pulls both in. The runtime/wiring layer (in internal/app) is the single point that translates between the two types: both are string-typed, the value sets are identical, and validation lives on each side independently.

10. Compile-time interface assertions live with adapters

Every interface has a var _ ports.X = (*Y)(nil) assertion, but the assertion lives in the adapter package (e.g. var _ ports.RuntimeRecordStore = (*Store)(nil) inside internal/adapters/postgres/runtimerecordstore). Putting the assertions in the port package would force the port package to import its own implementations and create an obvious import cycle.
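
The assertion idiom is shown below in a single file so that it compiles on its own. The Get method and the map-backed Store are placeholders; in the real layout the interface sits in the ports package and the assertion sits next to the implementation.

```go
package main

import "fmt"

// RuntimeRecordStore stands in for the port interface; the method set
// here is a placeholder, not the real one.
type RuntimeRecordStore interface {
	Get(gameID string) (string, bool)
}

// Store stands in for the adapter implementation.
type Store struct {
	rows map[string]string
}

func (s *Store) Get(gameID string) (string, bool) {
	v, ok := s.rows[gameID]
	return v, ok
}

// Compile-time check: the build fails here if *Store stops satisfying
// the interface. The typed nil pointer allocates nothing at run time.
var _ RuntimeRecordStore = (*Store)(nil)

func main() {
	var rs RuntimeRecordStore = &Store{rows: map[string]string{"g1": "running"}}
	status, ok := rs.Get("g1")
	fmt.Println(status, ok)
}
```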

11. RunSpec.Validate lives on the request type

The Docker port carries a non-trivial request type (RunSpec) with eight required fields and per-mount invariants. Putting Validate on the request struct keeps the rule next to the type definition, mirrors the pattern used by lobby/internal/ports/gmclient.go (RegisterGameRequest.Validate), and lets the adapter call it as the first defensive check before invoking the Docker SDK.
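
A cut-down sketch of the pattern follows. The field names, the Mount type, and the specific per-mount rules are assumptions chosen for illustration; the real RunSpec has eight required fields that are not listed in this document.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Mount is a hypothetical bind-mount description.
type Mount struct {
	Source string
	Target string
}

// RunSpec is a reduced, hypothetical version of the request type.
type RunSpec struct {
	ContainerName string
	ImageRef      string
	Mounts        []Mount
}

// Validate keeps the rules next to the type so every adapter can call
// it as a first defensive check before touching the SDK.
func (s RunSpec) Validate() error {
	if strings.TrimSpace(s.ContainerName) == "" {
		return errors.New("run spec: container name must not be empty")
	}
	if strings.TrimSpace(s.ImageRef) == "" {
		return errors.New("run spec: image ref must not be empty")
	}
	for i, m := range s.Mounts {
		if m.Source == "" {
			return fmt.Errorf("run spec: mount %d: source must not be empty", i)
		}
		if !strings.HasPrefix(m.Target, "/") {
			return fmt.Errorf("run spec: mount %d: target %q must be absolute", i, m.Target)
		}
	}
	return nil
}

func main() {
	ok := RunSpec{
		ContainerName: "game-g1",
		ImageRef:      "example/engine:1.0",
		Mounts:        []Mount{{Source: "/srv/g1", Target: "/data"}},
	}
	fmt.Println(ok.Validate())

	bad := ok
	bad.Mounts = []Mount{{Source: "/srv/g1", Target: "data"}}
	fmt.Println(bad.Validate())
}
```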