# Domain and Protocol Flows

This document collects the multi-step interactions inside `backend`
that span domain modules. Each section assumes the reader is familiar
with `../README.md` and `../../docs/ARCHITECTURE.md`.

## Registration (send + confirm)

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant Auth
    participant User
    participant Geo
    participant Mail
    participant Mailpit as SMTP relay

    Client->>Gateway: POST /api/v1/public/auth/send-email-code<br/>body: {email}, header Accept-Language
    Gateway->>Auth: forward + Accept-Language
    Auth->>Auth: hash code (bcrypt cost 10)
    Auth->>Auth: persist auth_challenges row<br/>(stores preferred_language)
    Auth->>Mail: EnqueueLoginCode(email, code, ttl)
    Mail-->>Auth: delivery_id
    Auth-->>Gateway: 200 {challenge_id}
    Gateway-->>Client: 200 {challenge_id}
    Mail->>Mailpit: SMTP delivery (worker)

    Client->>Gateway: POST /api/v1/public/auth/confirm-email-code<br/>body: {challenge_id, code, client_public_key, time_zone}
    Gateway->>Auth: forward
    Auth->>Auth: SELECT FOR UPDATE auth_challenges<br/>(increment attempts, enforce ceiling)
    Auth->>Auth: bcrypt verify
    Auth->>User: EnsureByEmail(email, preferred_language, time_zone, source_ip)
    User->>User: insert account if missing<br/>(synth Player-XXXXXXXX)
    User->>Geo: SetDeclaredCountryAtRegistration(user_id, source_ip)
    User-->>Auth: user_id
    Auth->>Auth: SELECT FOR UPDATE again,<br/>mark consumed,<br/>insert device_session,<br/>cache write-through
    Auth-->>Gateway: 200 {device_session_id}
    Gateway-->>Client: 200 {device_session_id}
```

A `challenge_id` is single-use: confirm consumes the row in the same
transaction that inserts the device session, so a second confirm-email-code
on the same id returns `400 invalid_request` (`auth.ErrChallengeNotFound`),
the same response given for unknown and expired ids. The opaque error
code is deliberate: the API never differentiates "consumed", "expired",
and "never existed", so an attacker cannot probe `challenge_id` state.

Throttling reuses the latest unconsumed challenge rather than dropping
the request: send-email-code returns the existing `challenge_id` to a
caller that hits the throttle, so the wire shape is identical to a
fresh issue.

`accounts.permanent_block` is checked twice on the registration path:
once in send-email-code (no fresh challenge is issued for an
already-blocked address) and once in confirm-email-code after the
verification code has matched (this catches a block that an admin
applied in the window between the two calls). Both paths surface
`auth.ErrEmailPermanentlyBlocked`, which the handler maps to
`400 invalid_request` with the message `email is not allowed`.

`accounts.user_name` is synthesised once at first sign-in and never
overwritten on subsequent sign-ins, so the same account always keeps
the same handle.

## Authenticated request lifecycle

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant Backend HTTP
    participant Cache
    participant Domain
    participant Postgres

    Client->>Gateway: signed gRPC ExecuteCommand
    Gateway->>Gateway: verify signature, payload_hash,<br/>freshness, anti-replay
    Gateway->>Backend HTTP: GET /api/v1/internal/sessions/{id}
    Backend HTTP-->>Gateway: 200 {user_id, status:active}
    Gateway->>Backend HTTP: forward command<br/>as REST + X-User-ID
    Backend HTTP->>Cache: lookup
    Cache-->>Backend HTTP: hit / miss
    alt cache miss
        Backend HTTP->>Postgres: read
        Postgres-->>Backend HTTP: row
        Backend HTTP->>Cache: warm
    end
    Backend HTTP->>Domain: business logic
    Domain->>Postgres: write
    Domain->>Cache: write-through after commit
    Domain-->>Backend HTTP: result
    Backend HTTP-->>Gateway: JSON
    Gateway->>Gateway: encode FlatBuffers,<br/>sign response envelope
    Gateway-->>Client: signed gRPC response
```

`X-User-ID` is the sole identity input on the user surface. The geo
counter middleware fires off `geo.IncrementCounterAsync` after the
handler returns successfully; the request itself does not block on
that.

## Lobby state machine and Race Name Directory

The lobby state machine is the closed transition graph below. Owner
endpoints (or admin overrides for public games whose owner is NULL)
drive forward transitions; the runtime callback is the only path that
flips `starting → running`. Every transition checks ownership, target
state, and idempotency.

```mermaid
stateDiagram-v2
    [*] --> draft
    draft --> enrollment_open: open-enrollment
    enrollment_open --> ready_to_start: ready-to-start (auto on min_players)
    ready_to_start --> starting: start
    starting --> running: runtime ack
    starting --> start_failed: runtime error
    start_failed --> ready_to_start: retry-start
    running --> paused: pause
    paused --> running: resume
    running --> finished: engine finish callback
    running --> cancelled: cancel
    paused --> cancelled: cancel
    starting --> cancelled: cancel
    enrollment_open --> cancelled: cancel
    ready_to_start --> cancelled: cancel
    draft --> cancelled: cancel
    cancelled --> [*]
    finished --> [*]
```

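The transition graph above maps naturally onto a lookup table. The following is a minimal sketch that covers only the target-state check; the `State` type, the constant names, and `CanTransition` are hypothetical, and the ownership and idempotency checks mentioned in the text are left out.

```go
package main

import "fmt"

// State is a hypothetical type for lobby game states.
type State string

const (
	Draft          State = "draft"
	EnrollmentOpen State = "enrollment_open"
	ReadyToStart   State = "ready_to_start"
	Starting       State = "starting"
	StartFailed    State = "start_failed"
	Running        State = "running"
	Paused         State = "paused"
	Finished       State = "finished"
	Cancelled      State = "cancelled"
)

// transitions lists every edge of the diagram as from -> allowed targets.
// Finished and Cancelled are terminal, so they have no entry.
var transitions = map[State][]State{
	Draft:          {EnrollmentOpen, Cancelled},
	EnrollmentOpen: {ReadyToStart, Cancelled},
	ReadyToStart:   {Starting, Cancelled},
	Starting:       {Running, StartFailed, Cancelled},
	StartFailed:    {ReadyToStart},
	Running:        {Paused, Finished, Cancelled},
	Paused:         {Running, Cancelled},
}

// CanTransition reports whether the diagram contains the edge from -> to.
func CanTransition(from, to State) bool {
	for _, target := range transitions[from] {
		if target == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Starting, Running))      // true
	fmt.Println(CanTransition(StartFailed, Cancelled)) // false
	fmt.Println(CanTransition(Finished, Running))      // false
}
```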
The Race Name Directory has three tiers:

- **registered**: platform-unique, with a single live binding per
  canonical key.
- **reservation**: per-game; a user can hold the same canonical key
  in multiple active games concurrently.
- **pending_registration**: issued after a "capable finish"
  (`max_planets > initial AND max_population > initial`). The pending
  entry is promoted to `registered` if the user calls
  `POST /api/v1/user/lobby/race-names/register` within
  `BACKEND_LOBBY_PENDING_REGISTRATION_TTL` (default 30 days);
  otherwise the sweeper releases it.

Canonicalisation goes through
[`disciplinedware/go-confusables`](https://github.com/disciplinedware/go-confusables)
plus a small anti-fraud map (digit-letter substitution for common
look-alikes). Cross-user uniqueness across reservations and pending
registrations is enforced with a per-canonical advisory lock at write
time, since the composite primary key on `race_names` does not express
that invariant on its own.

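The per-canonical advisory lock can be illustrated by showing how a canonical key might become the `bigint` argument of PostgreSQL's `pg_advisory_xact_lock`. Everything here is an assumption made for illustration: the `lookAlikes` table, the `Canonical` and `LockKey` helpers, and the choice of FNV-1a are hypothetical, and the `go-confusables` step is omitted rather than guessed at.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// lookAlikes is an illustrative digit-to-letter table; the real
// anti-fraud map is not shown in this document.
var lookAlikes = strings.NewReplacer("0", "o", "1", "l", "3", "e", "5", "s")

// Canonical lower-cases the name and folds digit look-alikes. The real
// pipeline also runs go-confusables, which this sketch leaves out.
func Canonical(name string) string {
	return lookAlikes.Replace(strings.ToLower(strings.TrimSpace(name)))
}

// LockKey hashes a canonical key into the bigint taken by
// pg_advisory_xact_lock. FNV-1a is an assumption; any stable 64-bit
// hash would serve the same purpose.
func LockKey(canonical string) int64 {
	h := fnv.New64a()
	h.Write([]byte(canonical))
	return int64(h.Sum64())
}

// lockSQL would run first inside the transaction that writes race_names,
// so two writers for the same canonical key are serialised.
const lockSQL = `SELECT pg_advisory_xact_lock($1)`

func main() {
	a := Canonical("Zer0 Fa1con")
	b := Canonical("zero falcon")
	fmt.Println(a == b)                   // true: both fold to "zero falcon"
	fmt.Println(LockKey(a) == LockKey(b)) // true: both writers take the same lock
	fmt.Println(lockSQL)
}
```

A transaction-scoped lock is released automatically at commit or rollback, which fits a check-then-insert sequence that a composite key cannot enforce by itself.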
## Mail outbox

```mermaid
sequenceDiagram
    participant Producer
    participant Mail
    participant Postgres
    participant Worker
    participant SMTP
    participant Admin

    Producer->>Mail: EnqueueLoginCode / EnqueueTemplate
    Mail->>Postgres: insert mail_payloads + mail_deliveries<br/>(unique on template_id, idempotency_key)
    Mail-->>Producer: delivery_id

    loop every BACKEND_MAIL_WORKER_INTERVAL
        Worker->>Postgres: SELECT FOR UPDATE SKIP LOCKED
        Postgres-->>Worker: row
        Worker->>SMTP: send via wneessen/go-mail
        alt success
            Worker->>Postgres: insert mail_attempts(success),<br/>mark delivery sent
        else transient
            Worker->>Postgres: insert mail_attempts(transient),<br/>schedule next_attempt_at + jitter
        else permanent or attempts >= MAX
            Worker->>Postgres: insert mail_attempts(permanent),<br/>move to mail_dead_letters
            Worker->>Admin: notification intent (mail.dead_lettered)
        end
    end
```

`mail_attempts.attempt_no` is monotonic across the entire history of a
single delivery. Resend on a `pending` / `retrying` / `dead_lettered`
row re-arms the row; resend on `sent` returns `409 Conflict`.

## Notification fan-out

```mermaid
sequenceDiagram
    participant Producer
    participant Notif
    participant Postgres
    participant Push
    participant Mail

    Producer->>Notif: Submit(intent)
    Notif->>Notif: validate kind + payload
    Notif->>Postgres: INSERT notifications ON CONFLICT (kind, idempotency_key) DO NOTHING
    Notif->>Postgres: materialise notification_routes<br/>per channel from catalog
    Notif->>Push: PublishClientEvent(user_id, payload)
    Notif->>Mail: EnqueueTemplate(template_id, recipient,<br/>payload, route_id)
    Notif-->>Producer: ok (best-effort dispatch)

    loop every BACKEND_NOTIFICATION_WORKER_INTERVAL
        Postgres-->>Notif: routes still in pending / retrying
        Notif->>Push: retry push (or)
        Notif->>Mail: re-arm mail row
    end
```

`auth.login_code` bypasses notification entirely: auth writes the
delivery row directly so the challenge commit is atomic with the mail
queue insert. Catalog entries that target administrators send email to
`BACKEND_NOTIFICATION_ADMIN_EMAIL`; if the variable is empty the route
is recorded with `status='skipped'` and an operator log line records
the configuration miss.

## Runtime job lifecycle

```mermaid
sequenceDiagram
    participant Lobby
    participant Runtime
    participant Workers
    participant Docker
    participant Engine
    participant Reconciler

    Lobby->>Runtime: StartGame(game_id)
    Runtime->>Workers: enqueue start job
    Runtime-->>Lobby: ack

    Workers->>Docker: pull / create / start engine container
    Docker-->>Workers: container id
    Workers->>Engine: POST /api/v1/admin/init {gameId, races}
    Engine-->>Workers: StateResponse{id == gameId} / error
    Workers->>Runtime: write runtime_records (running or start_failed)
    Workers->>Lobby: OnRuntimeJobResult

    loop scheduler tick
        Workers->>Engine: PUT /api/v1/admin/turn
        Engine-->>Workers: snapshot
        Workers->>Runtime: persist runtime_records
        Workers->>Lobby: OnRuntimeSnapshot
    end

    Reconciler->>Docker: list containers labelled galaxy.backend=1
    alt missing recorded container
        Reconciler->>Runtime: mark removed
        Reconciler->>Lobby: OnRuntimeJobResult(removed)
    else unrecorded labelled container
        Reconciler->>Runtime: adopt
    end
```

Per-game serialisation is enforced by a `sync.Map[game_id]*sync.Mutex`
inside `runtime.Service`, so concurrent start / stop / patch attempts
on the same `game_id` cannot race. `runtime_operation_log` records
every operation for audit.

## Push gRPC

```mermaid
sequenceDiagram
    participant Backend
    participant Ring
    participant Gateway

    loop domain emits client_event / session_invalidation
        Backend->>Ring: append, allocate cursor
    end

    Gateway->>Backend: SubscribePush(GatewaySubscribeRequest{cursor?})
    alt cursor present and within ring TTL
        Backend->>Gateway: replay events newer than cursor
    else cursor missing or aged out
        Backend->>Gateway: stream from current head
    end

    loop event published
        Backend->>Gateway: PushEvent
    end

    Gateway->>Backend: same gateway_client_id reconnects
    Backend->>Backend: cancel previous stream (codes.Aborted)
    Backend->>Gateway: stream again
```

The cursor is a zero-padded decimal `uint64` minted by an in-process
counter; backend resets the sequence after a restart, so cursors are
only meaningful within a single process lifetime. Per-connection
backpressure is drop-oldest, with a log line on each drop so the
gateway side can correlate gaps.
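The cursor format and the drop-oldest policy described above can be sketched as follows. The names `NextCursor`, `ParseCursor`, and `sendDropOldest` are hypothetical, the 20-digit width is an assumption (it is the widest a decimal `uint64` can be), and a buffered channel stands in for the per-connection queue.

```go
package main

import (
	"fmt"
	"strconv"
	"sync/atomic"
)

// cursorSeq is the in-process counter; it starts from zero on every restart.
var cursorSeq atomic.Uint64

// NextCursor allocates the next cursor as a 20-digit zero-padded decimal.
// The width is an assumption chosen to fit any uint64.
func NextCursor() string {
	return fmt.Sprintf("%020d", cursorSeq.Add(1))
}

// ParseCursor converts a cursor back to its numeric value.
func ParseCursor(cursor string) (uint64, error) {
	return strconv.ParseUint(cursor, 10, 64)
}

// sendDropOldest enqueues ev without blocking. When the buffer is full it
// discards the oldest queued event and reports the drop so the caller can
// log it. It assumes one producer per connection and a capacity of at least 1.
func sendDropOldest(ch chan string, ev string) (dropped bool) {
	for {
		select {
		case ch <- ev:
			return dropped
		default:
			select {
			case <-ch: // buffer full: discard the oldest event
				dropped = true
			default: // a consumer drained the buffer meanwhile; retry the send
			}
		}
	}
}

func main() {
	a, b := NextCursor(), NextCursor()
	fmt.Println(a, b, a < b) // zero padding makes string order match numeric order

	ch := make(chan string, 2)
	for _, ev := range []string{"e1", "e2", "e3"} {
		if sendDropOldest(ch, ev) {
			fmt.Println("dropped oldest event before", ev)
		}
	}
	fmt.Println(<-ch, <-ch) // e2 e3
}
```

Fixed-width padding lets cursors be compared as plain strings. The drop report is where a real implementation would emit the log line that the gateway side uses to correlate gaps.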