# Domain and Protocol Flows

This document collects the multi-step interactions inside `backend`
that span domain modules. Each section assumes the reader is familiar
with `../README.md` and `../../docs/ARCHITECTURE.md`.

## Registration (send + confirm)

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant Auth
    participant User
    participant Geo
    participant Mail
    participant Mailpit as SMTP relay

    Client->>Gateway: POST /api/v1/public/auth/send-email-code<br/>body: {email}, header Accept-Language
    Gateway->>Auth: forward + Accept-Language
    Auth->>Auth: hash code (bcrypt cost 10)
    Auth->>Auth: persist auth_challenges row<br/>(stores preferred_language)
    Auth->>Mail: EnqueueLoginCode(email, code, ttl)
    Mail-->>Auth: delivery_id
    Auth-->>Gateway: 200 {challenge_id}
    Gateway-->>Client: 200 {challenge_id}
    Mail->>Mailpit: SMTP delivery (worker)

    Client->>Gateway: POST /api/v1/public/auth/confirm-email-code<br/>body: {challenge_id, code, client_public_key, time_zone}
    Gateway->>Auth: forward
    Auth->>Auth: SELECT FOR UPDATE auth_challenges<br/>(increment attempts, enforce ceiling)
    Auth->>Auth: bcrypt verify
    Auth->>User: EnsureByEmail(email, preferred_language, time_zone, source_ip)
    User->>User: insert account if missing<br/>(synth Player-XXXXXXXX)
    User->>Geo: SetDeclaredCountryAtRegistration(user_id, source_ip)
    User-->>Auth: user_id
    Auth->>Auth: SELECT FOR UPDATE again,<br/>mark consumed,<br/>insert device_session,<br/>cache write-through
    Auth-->>Gateway: 200 {device_session_id}
    Gateway-->>Client: 200 {device_session_id}
```

A `challenge_id` is single-use: confirm consumes the row in the same
transaction that inserts the device session, so a second
confirm-email-code on the same id returns `400 invalid_request`
(`auth.ErrChallengeNotFound`), exactly as an unknown or expired id does.
The opaque error code is deliberate: the API never distinguishes
"consumed", "expired", and "never existed", so an attacker cannot probe
the state of a `challenge_id`.
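
To make the single-use rule concrete, here is a minimal in-memory sketch. The `challengeStore` type, its `Consume` method, and the `expiresAt` field are hypothetical stand-ins for the `auth_challenges` row and its `SELECT FOR UPDATE`; only the sentinel name `ErrChallengeNotFound` comes from the text. Unknown, expired, and consumed ids all collapse into that one error.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrChallengeNotFound is a local stand-in for auth.ErrChallengeNotFound:
// one opaque error for unknown, expired, and already-consumed ids.
var ErrChallengeNotFound = errors.New("challenge not found")

type challenge struct {
	expiresAt time.Time
	consumed  bool
}

// challengeStore is a hypothetical in-memory stand-in for auth_challenges.
// The mutex plays the role of SELECT ... FOR UPDATE on the row.
type challengeStore struct {
	mu   sync.Mutex
	rows map[string]*challenge
}

// Consume succeeds at most once per id. In the real flow the same
// transaction also inserts the device session.
func (s *challengeStore) Consume(id string, now time.Time) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.rows[id]
	if !ok || c.consumed || !now.Before(c.expiresAt) {
		return ErrChallengeNotFound // never reveal which case applied
	}
	c.consumed = true
	return nil
}

func main() {
	now := time.Now()
	s := &challengeStore{rows: map[string]*challenge{
		"live":    {expiresAt: now.Add(10 * time.Minute)},
		"expired": {expiresAt: now.Add(-time.Minute)},
	}}
	fmt.Println(s.Consume("live", now))    // <nil>
	fmt.Println(s.Consume("live", now))    // challenge not found
	fmt.Println(s.Consume("expired", now)) // challenge not found
	fmt.Println(s.Consume("unknown", now)) // challenge not found
}
```

A handler would then map every `ErrChallengeNotFound` to the same `400 invalid_request`, which is what keeps the three cases indistinguishable on the wire.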

The send-email-code throttle reuses the latest unconsumed challenge
rather than dropping the request: a caller that hits the throttle gets
the existing `challenge_id` back, so the response has the same wire
shape as a freshly issued challenge.
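
The throttle behaviour can be sketched as follows. The `issuer` type, the `window` length, and the `ch-N` id format are assumptions for illustration; the text does not say how long the throttle window is or how ids are minted.

```go
package main

import (
	"fmt"
	"time"
)

// issued is the latest challenge recorded for one address.
type issued struct {
	id       string
	issuedAt time.Time
	consumed bool
}

// issuer is a hypothetical in-memory model of the send-email-code throttle.
type issuer struct {
	window time.Duration // assumed throttle window, not a documented setting
	latest map[string]*issued
	seq    int
}

// Send returns the latest unconsumed challenge id while the caller is still
// inside the throttle window; otherwise it mints a new one. Either way the
// caller sees the same response shape: just a challenge id.
func (is *issuer) Send(email string, now time.Time) string {
	if c, ok := is.latest[email]; ok && !c.consumed && now.Sub(c.issuedAt) < is.window {
		return c.id // throttled: reuse instead of rejecting
	}
	is.seq++
	c := &issued{id: fmt.Sprintf("ch-%d", is.seq), issuedAt: now}
	is.latest[email] = c
	return c.id
}

func main() {
	is := &issuer{window: time.Minute, latest: map[string]*issued{}}
	t0 := time.Now()
	fmt.Println(is.Send("a@example.com", t0))                     // ch-1
	fmt.Println(is.Send("a@example.com", t0.Add(10*time.Second))) // ch-1 (throttled)
	fmt.Println(is.Send("a@example.com", t0.Add(2*time.Minute)))  // ch-2
}
```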

`accounts.permanent_block` is checked twice on the registration path:
once in send-email-code (no fresh challenge for an already-blocked
address) and once in confirm-email-code after the verification code has
matched (catches the case where an admin applied the block in the
window between the two calls). Both paths surface
`auth.ErrEmailPermanentlyBlocked`, and the handler maps it to
`400 invalid_request` with message `email is not allowed`.
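
A sketch of the two check points, assuming a hypothetical `blockList` map in place of `accounts.permanent_block` and a hypothetical `errCodeMismatch` for a wrong code. Only `ErrEmailPermanentlyBlocked`, the `400 invalid_request` status, and the `email is not allowed` message come from the text.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Local stand-ins; only ErrEmailPermanentlyBlocked is named in the text.
var (
	ErrEmailPermanentlyBlocked = errors.New("email permanently blocked")
	errCodeMismatch            = errors.New("code mismatch") // hypothetical
)

// blockList is a hypothetical view of accounts.permanent_block.
type blockList map[string]bool

// sendCheck runs before a fresh challenge is issued.
func sendCheck(b blockList, email string) error {
	if b[email] {
		return ErrEmailPermanentlyBlocked
	}
	return nil
}

// confirmCheck re-reads the block only after the code has matched, which
// catches a block applied between the two calls.
func confirmCheck(b blockList, email, want, got string) error {
	if want != got {
		return errCodeMismatch
	}
	if b[email] {
		return ErrEmailPermanentlyBlocked
	}
	return nil
}

// toHTTP maps the block error to the documented response; ok is false for
// errors this sketch does not cover.
func toHTTP(err error) (status int, code, message string, ok bool) {
	if errors.Is(err, ErrEmailPermanentlyBlocked) {
		return http.StatusBadRequest, "invalid_request", "email is not allowed", true
	}
	return 0, "", "", false
}

func main() {
	b := blockList{}
	fmt.Println(sendCheck(b, "a@example.com")) // <nil>: challenge issued
	b["a@example.com"] = true                  // admin blocks in between
	err := confirmCheck(b, "a@example.com", "123456", "123456")
	if status, code, msg, ok := toHTTP(err); ok {
		fmt.Println(status, code, msg) // 400 invalid_request email is not allowed
	}
}
```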

`accounts.user_name` is synthesised once, at first sign-in, and never
overwritten by later sign-ins, so an account always keeps the same
handle.
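
The insert-if-missing behaviour that keeps the handle stable could look like this in-memory sketch. The `users` type is hypothetical, and generating the `XXXXXXXX` part as eight upper-case hex digits is an assumption; the registration diagram only shows the `Player-XXXXXXXX` shape.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// account is a hypothetical slice of an accounts row.
type account struct {
	userID   int
	userName string
}

// synthName builds a Player-XXXXXXXX handle. Eight upper-case hex digits
// from crypto/rand is an assumption; the real alphabet is not specified.
func synthName() string {
	var b [4]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return "Player-" + strings.ToUpper(hex.EncodeToString(b[:]))
}

// users is an in-memory stand-in for EnsureByEmail's insert-if-missing.
type users struct {
	byEmail map[string]*account
	nextID  int
}

// EnsureByEmail synthesises a name only when it inserts a new account;
// later sign-ins return the stored row untouched.
func (u *users) EnsureByEmail(email string) *account {
	if a, ok := u.byEmail[email]; ok {
		return a
	}
	u.nextID++
	a := &account{userID: u.nextID, userName: synthName()}
	u.byEmail[email] = a
	return a
}

func main() {
	u := &users{byEmail: map[string]*account{}}
	first := u.EnsureByEmail("a@example.com")
	again := u.EnsureByEmail("a@example.com")
	fmt.Println(first.userName, first.userName == again.userName)
}
```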

## Authenticated request lifecycle

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant Backend HTTP
    participant Cache
    participant Domain
    participant Postgres

    Client->>Gateway: signed gRPC ExecuteCommand
    Gateway->>Gateway: verify signature, payload_hash,<br/>freshness, anti-replay
    Gateway->>Backend HTTP: GET /api/v1/internal/sessions/{id}
    Backend HTTP-->>Gateway: 200 {user_id, status:active}
    Gateway->>Backend HTTP: forward command<br/>as REST + X-User-ID
    Backend HTTP->>Cache: lookup
    Cache-->>Backend HTTP: hit / miss
    alt cache miss
        Backend HTTP->>Postgres: read
        Postgres-->>Backend HTTP: row
        Backend HTTP->>Cache: warm
    end
    Backend HTTP->>Domain: business logic
    Domain->>Postgres: write
    Domain->>Cache: write-through after commit
    Domain-->>Backend HTTP: result
    Backend HTTP-->>Gateway: JSON
    Gateway->>Gateway: encode FlatBuffers,<br/>sign response envelope
    Gateway-->>Client: signed gRPC response
```
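
The cache steps in the diagram follow a read-through / write-through pattern. The following sketch assumes a hypothetical string-keyed `cache` type and passes the Postgres read and the commit in as functions; it is not the backend's actual cache API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errRollback simulates a failed commit in this sketch.
var errRollback = errors.New("rollback")

// cache is a hypothetical in-process cache in front of Postgres.
type cache struct {
	mu   sync.RWMutex
	data map[string]string
}

// Get serves a hit from memory; on a miss it calls load (the Postgres read)
// and warms the cache with the result.
func (c *cache) Get(key string, load func(string) (string, error)) (string, error) {
	c.mu.RLock()
	v, ok := c.data[key]
	c.mu.RUnlock()
	if ok {
		return v, nil
	}
	v, err := load(key)
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.data[key] = v
	c.mu.Unlock()
	return v, nil
}

// Write runs commit first and updates the cache only after it succeeds,
// so a rolled-back write never reaches the cache.
func (c *cache) Write(key, val string, commit func() error) error {
	if err := commit(); err != nil {
		return err
	}
	c.mu.Lock()
	c.data[key] = val
	c.mu.Unlock()
	return nil
}

func main() {
	c := &cache{data: map[string]string{}}
	loads := 0
	load := func(k string) (string, error) { loads++; return "row:" + k, nil }
	c.Get("u1", load)
	c.Get("u1", load)
	fmt.Println(loads)                                                      // 1
	fmt.Println(c.Write("u1", "new", func() error { return errRollback })) // rollback
	v, _ := c.Get("u1", load)
	fmt.Println(v) // row:u1
}
```

One simplification worth noting: the sketch ignores the race in which a slow miss-load overwrites a newer write-through value; a production cache needs versioning or invalidation to rule that out.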

`X-User-ID` is the sole identity input on the user surface. The geo
counter middleware fires `geo.IncrementCounterAsync` after the handler
returns successfully; the request itself does not wait for it.
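
A sketch of a post-handler counter middleware using `net/http`. The `incr` callback stands in for `geo.IncrementCounterAsync`, whose real signature is not shown here, and the `statusRecorder` wrapper and `demo` helper are hypothetical. The point is that the counter only runs after a 2xx result and never delays the response.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// statusRecorder remembers the status code the wrapped handler wrote.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// geoCounter runs the handler first and, only for a 2xx result, hands the
// X-User-ID value to incr on a separate goroutine, so the response never
// waits on the counter.
func geoCounter(next http.Handler, incr func(userID string)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		if rec.status >= 200 && rec.status < 300 {
			go incr(r.Header.Get("X-User-ID"))
		}
	})
}

// demo sends one request with the given handler status and reports which
// user id, if any, reached the counter within 100ms.
func demo(status int, userID string) (string, bool) {
	seen := make(chan string, 1)
	h := geoCounter(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	}), func(id string) { seen <- id })
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("X-User-ID", userID)
	h.ServeHTTP(httptest.NewRecorder(), req)
	select {
	case id := <-seen:
		return id, true
	case <-time.After(100 * time.Millisecond):
		return "", false
	}
}

func main() {
	fmt.Println(demo(http.StatusOK, "42"))
	fmt.Println(demo(http.StatusInternalServerError, "42"))
}
```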

## Lobby state machine and Race Name Directory

The lobby state machine is the closed transition graph below. Owner
endpoints (or admin overrides for public games, whose owner is NULL)
drive forward transitions; the runtime callback is the only path that
flips `starting → running`. Every transition checks ownership, target
state, and idempotency.

```mermaid
stateDiagram-v2
    [*] --> draft
    draft --> enrollment_open: open-enrollment
    enrollment_open --> ready_to_start: ready-to-start (auto on min_players)
    ready_to_start --> starting: start
    starting --> running: runtime ack
    starting --> start_failed: runtime error
    start_failed --> ready_to_start: retry-start
    running --> paused: pause
    paused --> running: resume
    running --> finished: engine finish callback
    running --> cancelled: cancel
    paused --> cancelled: cancel
    starting --> cancelled: cancel
    enrollment_open --> cancelled: cancel
    ready_to_start --> cancelled: cancel
    draft --> cancelled: cancel
    cancelled --> [*]
    finished --> [*]
```
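
The transition graph can be encoded as a lookup table. This sketch transcribes the diagram's edges, using its labels as event names; the `state` type and `next` function are hypothetical, and the ownership and idempotency checks mentioned above are left out.

```go
package main

import "fmt"

type state string

// transitions is transcribed from the diagram above; event names are its
// edge labels. finished and cancelled have no outgoing edges.
var transitions = map[state]map[string]state{
	"draft":           {"open-enrollment": "enrollment_open", "cancel": "cancelled"},
	"enrollment_open": {"ready-to-start": "ready_to_start", "cancel": "cancelled"},
	"ready_to_start":  {"start": "starting", "cancel": "cancelled"},
	"starting":        {"runtime ack": "running", "runtime error": "start_failed", "cancel": "cancelled"},
	"start_failed":    {"retry-start": "ready_to_start"},
	"running":         {"pause": "paused", "engine finish callback": "finished", "cancel": "cancelled"},
	"paused":          {"resume": "running", "cancel": "cancelled"},
}

// next returns the target state, or an error for an edge the graph lacks.
// Ownership and idempotency checks are out of scope here.
func next(from state, event string) (state, error) {
	if to, ok := transitions[from][event]; ok {
		return to, nil
	}
	return from, fmt.Errorf("lobby: %q is not allowed from %q", event, from)
}

func main() {
	s := state("draft")
	for _, ev := range []string{"open-enrollment", "ready-to-start", "start", "runtime ack"} {
		s, _ = next(s, ev)
	}
	fmt.Println(s) // running
	if _, err := next("finished", "cancel"); err != nil {
		fmt.Println(err)
	}
}
```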

The Race Name Directory has three tiers:

- **registered**: platform-unique, with a single live binding per
  canonical key.
- **reservation**: per-game; a user can hold the same canonical key
  in multiple active games concurrently.
- **pending_registration**: issued after a "capable finish"
  (`max_planets > initial AND max_population > initial`). The pending
  entry is promoted to `registered` when the user calls
  `POST /api/v1/user/lobby/race-names/register` within
  `BACKEND_LOBBY_PENDING_REGISTRATION_TTL` (default 30 days);
  otherwise the sweeper releases it.
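
The capable-finish rule and the pending TTL reduce to two small predicates. The `finishStats` and `pendingRegistration` types are hypothetical, and treating the exact TTL boundary as expired is an assumption; the 30-day default is the one documented for `BACKEND_LOBBY_PENDING_REGISTRATION_TTL`.

```go
package main

import (
	"fmt"
	"time"
)

// defaultPendingTTL mirrors the documented default of
// BACKEND_LOBBY_PENDING_REGISTRATION_TTL (30 days).
const defaultPendingTTL = 30 * 24 * time.Hour

// finishStats is a hypothetical per-player summary at game end.
type finishStats struct {
	initialPlanets, maxPlanets       int
	initialPopulation, maxPopulation int
}

// capableFinish applies max_planets > initial AND max_population > initial.
func capableFinish(s finishStats) bool {
	return s.maxPlanets > s.initialPlanets && s.maxPopulation > s.initialPopulation
}

// pendingRegistration is a hypothetical view of a pending_registration entry.
type pendingRegistration struct {
	canonicalKey string
	issuedAt     time.Time
}

// promotable reports whether a register call still falls inside the TTL.
// Treating the exact TTL boundary as expired is an assumption.
func (p pendingRegistration) promotable(now time.Time, ttl time.Duration) bool {
	return now.Before(p.issuedAt.Add(ttl))
}

func main() {
	s := finishStats{initialPlanets: 1, maxPlanets: 4, initialPopulation: 10, maxPopulation: 40}
	fmt.Println(capableFinish(s)) // true
	p := pendingRegistration{canonicalKey: "raceone", issuedAt: time.Now()}
	fmt.Println(p.promotable(time.Now().Add(31*24*time.Hour), defaultPendingTTL)) // false
}
```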

Canonicalisation goes through
[`disciplinedware/go-confusables`](https://github.com/disciplinedware/go-confusables)
plus a small anti-fraud map (digit-letter substitution for common
look-alikes). Cross-user uniqueness across reservations and pending
registrations is enforced with a per-canonical-key advisory lock at
write time, because the composite primary key on `race_names` does not
express that invariant on its own.
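
A sketch of the canonical-key and lock-key steps. It does not call `go-confusables` (its API is not described here) and uses lower-casing in its place; the digit-letter table is an illustrative example, not the real anti-fraud map. Deriving the lock id with FNV-1a and taking it with PostgreSQL's `pg_advisory_xact_lock(bigint)` is an assumption about how a per-canonical-key lock could be keyed.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// digitLookalikes is an illustrative anti-fraud table, not the real map.
var digitLookalikes = strings.NewReplacer("0", "o", "1", "l", "3", "e", "4", "a", "5", "s")

// canonical lower-cases and folds digit look-alikes. The go-confusables
// step is deliberately left out of this sketch.
func canonical(name string) string {
	return digitLookalikes.Replace(strings.ToLower(name))
}

// advisoryKey derives a bigint lock id from a canonical key, so every writer
// touching the same canonical name contends for the same lock.
func advisoryKey(canonicalKey string) int64 {
	h := fnv.New64a()
	h.Write([]byte(canonicalKey)) // writes to an FNV hash never fail
	return int64(h.Sum64())
}

// lockSQL would run inside the write transaction before checking other
// users' reservations and pending registrations for the same key.
const lockSQL = "SELECT pg_advisory_xact_lock($1)"

func main() {
	a, b := canonical("Race0ne"), canonical("raceone")
	fmt.Println(a, b, advisoryKey(a) == advisoryKey(b)) // raceone raceone true
	fmt.Println(lockSQL, advisoryKey(a))
}
```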

## Mail outbox

```mermaid
sequenceDiagram
    participant Producer
    participant Mail
    participant Postgres
    participant Worker
    participant SMTP
    participant Admin

    Producer->>Mail: EnqueueLoginCode / EnqueueTemplate
    Mail->>Postgres: insert mail_payloads + mail_deliveries<br/>(unique on template_id, idempotency_key)
    Mail-->>Producer: delivery_id

    loop every BACKEND_MAIL_WORKER_INTERVAL
        Worker->>Postgres: SELECT FOR UPDATE SKIP LOCKED
        Postgres-->>Worker: row
        Worker->>SMTP: send via wneessen/go-mail
        alt success
            Worker->>Postgres: insert mail_attempts(success),<br/>mark delivery sent
        else transient
            Worker->>Postgres: insert mail_attempts(transient),<br/>schedule next_attempt_at + jitter
        else permanent or attempts >= MAX
            Worker->>Postgres: insert mail_attempts(permanent),<br/>move to mail_dead_letters
            Worker->>Admin: notification intent (mail.dead_lettered)
        end
    end
```
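
The worker's branch logic can be sketched as two functions. `decide` follows the `alt` / `else` branches above; `nextAttemptAt` shows one way to compute `next_attempt_at + jitter`, and its doubling schedule, cap, and 50% jitter are assumptions rather than the documented policy.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

type outcome int

const (
	sent outcome = iota
	transient
	permanent
)

// decide maps one SMTP attempt result to the worker's next step, following
// the alt/else branches above; maxAttempts stands in for MAX.
func decide(o outcome, attemptNo, maxAttempts int) string {
	switch {
	case o == sent:
		return "mark_sent"
	case o == permanent || attemptNo >= maxAttempts:
		return "dead_letter"
	default:
		return "retry"
	}
}

// nextAttemptAt doubles base per attempt up to maxDelay and adds up to 50%
// random jitter. The doubling schedule and jitter size are assumptions.
func nextAttemptAt(now time.Time, attemptNo int, base, maxDelay time.Duration, rnd *rand.Rand) time.Time {
	d := base
	for i := 1; i < attemptNo && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	jitter := time.Duration(rnd.Int63n(int64(d)/2 + 1))
	return now.Add(d + jitter)
}

func main() {
	rnd := rand.New(rand.NewSource(1))
	now := time.Now()
	for n := 1; n <= 3; n++ {
		fmt.Println(n, decide(transient, n, 3))
	}
	fmt.Println(nextAttemptAt(now, 2, time.Second, time.Minute, rnd).Sub(now))
}
```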

`mail_attempts.attempt_no` increases monotonically across the entire
history of a single delivery. A resend on a `pending`, `retrying`, or
`dead_lettered` row re-arms that row; a resend on a `sent` row returns
`409 Conflict`.
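
The resend rules and the monotonic `attempt_no` can be modelled as follows. The `delivery` type is hypothetical, and re-arming to `pending` is an assumption about the target status; the text only says the row is re-armed.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the 409 Conflict response.
var errConflict = errors.New("409 Conflict: delivery already sent")

// delivery is a hypothetical view of one mail_deliveries row together with
// the attempt_no values recorded for it in mail_attempts.
type delivery struct {
	status   string // pending, retrying, sent, or dead_lettered
	attempts []int
}

// resend re-arms any row that has not been sent and refuses a sent row.
// Re-arming to "pending" is an assumption about the target status.
func (d *delivery) resend() error {
	if d.status == "sent" {
		return errConflict
	}
	d.status = "pending"
	return nil
}

// nextAttemptNo continues from the full history, so attempts made after a
// resend never reuse an earlier number.
func (d *delivery) nextAttemptNo() int {
	if len(d.attempts) == 0 {
		return 1
	}
	return d.attempts[len(d.attempts)-1] + 1
}

func main() {
	d := &delivery{status: "dead_lettered", attempts: []int{1, 2, 3}}
	fmt.Println(d.resend(), d.nextAttemptNo()) // <nil> 4
	s := &delivery{status: "sent", attempts: []int{1}}
	fmt.Println(s.resend()) // 409 Conflict: delivery already sent
}
```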

## Notification fan-out

```mermaid
sequenceDiagram
    participant Producer
    participant Notif
    participant Postgres
    participant Push
    participant Mail

    Producer->>Notif: Submit(intent)
    Notif->>Notif: validate kind + payload
    Notif->>Postgres: INSERT notifications ON CONFLICT (kind, idempotency_key) DO NOTHING
    Notif->>Postgres: materialise notification_routes<br/>per channel from catalog
    Notif->>Push: PublishClientEvent(user_id, payload)
    Notif->>Mail: EnqueueTemplate(template_id, recipient,<br/>payload, route_id)
    Notif-->>Producer: ok (best-effort dispatch)

    loop every BACKEND_NOTIFICATION_WORKER_INTERVAL
        Postgres-->>Notif: routes still in pending / retrying
        alt push route
            Notif->>Push: retry push
        else mail route
            Notif->>Mail: re-arm mail row
        end
    end
```
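
The idempotent submit step can be sketched in memory. The `notifier` type, the `example.kind` catalog entry, and the route string format are hypothetical; the map lookup on `intentKey` plays the role of `ON CONFLICT (kind, idempotency_key) DO NOTHING`.

```go
package main

import (
	"fmt"
	"sync"
)

// intentKey mirrors the (kind, idempotency_key) conflict target.
type intentKey struct{ kind, idempotencyKey string }

// notifier is an in-memory stand-in for the notifications and
// notification_routes tables plus a kind-to-channels catalog.
type notifier struct {
	mu      sync.Mutex
	seen    map[intentKey]bool
	catalog map[string][]string // kind -> channels, e.g. "push", "email"
	routes  []string
}

// Submit validates the kind, then behaves like INSERT ... ON CONFLICT DO
// NOTHING: a duplicate intent inserts nothing and materialises no routes.
func (n *notifier) Submit(kind, key string) (inserted bool, err error) {
	n.mu.Lock()
	defer n.mu.Unlock()
	channels, ok := n.catalog[kind]
	if !ok {
		return false, fmt.Errorf("unknown notification kind %q", kind)
	}
	k := intentKey{kind, key}
	if n.seen[k] {
		return false, nil
	}
	n.seen[k] = true
	for _, ch := range channels {
		n.routes = append(n.routes, kind+"/"+key+"/"+ch) // one route per channel
	}
	return true, nil
}

func main() {
	n := &notifier{
		seen:    map[intentKey]bool{},
		catalog: map[string][]string{"example.kind": {"push", "email"}},
	}
	fmt.Println(n.Submit("example.kind", "k1")) // true <nil>
	fmt.Println(n.Submit("example.kind", "k1")) // false <nil>
	fmt.Println(len(n.routes))                  // 2
}
```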

`auth.login_code` bypasses the notification pipeline entirely: auth
writes the delivery row directly, so the challenge commit is atomic
with the mail queue insert. Catalog entries that target administrators
send email to `BACKEND_NOTIFICATION_ADMIN_EMAIL`; if that variable is
empty, the route is stored with `status='skipped'` and an operator log
line records the missing configuration.
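
A sketch of admin recipient resolution. The `route` type, the `pending` initial status, and the log wording are assumptions; the `lookup` parameter (for example `os.Getenv`) makes the empty-variable case easy to exercise.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// route is a hypothetical view of one email notification_routes row.
type route struct {
	recipient string
	status    string
}

// adminEmailRoute resolves the recipient for an admin-targeted catalog
// entry. An empty BACKEND_NOTIFICATION_ADMIN_EMAIL yields a skipped route
// plus a log line for operators instead of an error.
func adminEmailRoute(kind string, lookup func(string) string) route {
	addr := lookup("BACKEND_NOTIFICATION_ADMIN_EMAIL")
	if addr == "" {
		log.Printf("notification %s: BACKEND_NOTIFICATION_ADMIN_EMAIL is empty, admin email route skipped", kind)
		return route{status: "skipped"}
	}
	return route{recipient: addr, status: "pending"} // initial status assumed
}

func main() {
	r := adminEmailRoute("mail.dead_lettered", os.Getenv)
	fmt.Println(r.status, r.recipient)
}
```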

## Runtime job lifecycle

```mermaid
sequenceDiagram
    participant Lobby
    participant Runtime
    participant Workers
    participant Docker
    participant Engine
    participant Reconciler

    Lobby->>Runtime: StartGame(game_id)
    Runtime->>Workers: enqueue start job
    Runtime-->>Lobby: ack

    Workers->>Docker: pull / create / start engine container
    Docker-->>Workers: container id
    Workers->>Engine: POST /api/v1/admin/init
    Engine-->>Workers: ok / error
    Workers->>Runtime: write runtime_records (running or start_failed)
    Workers->>Lobby: OnRuntimeJobResult

    loop scheduler tick
        Workers->>Engine: PUT /api/v1/admin/turn
        Engine-->>Workers: snapshot
        Workers->>Runtime: persist runtime_records
        Workers->>Lobby: OnRuntimeSnapshot
    end

    Reconciler->>Docker: list containers labelled galaxy.backend=1
    alt missing recorded container
        Reconciler->>Runtime: mark removed
        Reconciler->>Lobby: OnRuntimeJobResult(removed)
    else unrecorded labelled container
        Reconciler->>Runtime: adopt
    end
```
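
The reconciler's two branches amount to a set difference in each direction. In this sketch both inputs are hypothetical maps from container id to `game_id`; how the backend actually reads `runtime_records` and Docker labels is not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcile compares the container ids in runtime_records with those Docker
// lists under the galaxy.backend=1 label. Both maps go from container id to
// game_id; that shape is an assumption for this sketch.
func reconcile(recorded, labelled map[string]string) (removed, adopt []string) {
	for id := range recorded {
		if _, ok := labelled[id]; !ok {
			removed = append(removed, id) // recorded, but the container is gone
		}
	}
	for id := range labelled {
		if _, ok := recorded[id]; !ok {
			adopt = append(adopt, id) // labelled, but unknown to runtime_records
		}
	}
	sort.Strings(removed) // map order is random; sort for stable output
	sort.Strings(adopt)
	return removed, adopt
}

func main() {
	recorded := map[string]string{"c1": "g1", "c2": "g2"}
	labelled := map[string]string{"c2": "g2", "c3": "g3"}
	fmt.Println(reconcile(recorded, labelled)) // [c1] [c3]
}
```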

Per-game serialisation is enforced inside `runtime.Service` by a
`sync.Map` that maps each `game_id` to its own `*sync.Mutex`, so
concurrent start / stop / patch attempts on the same `game_id` cannot
race. `runtime_operation_log` records every operation for audit.
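
A minimal sketch of the per-game lock described above, using `sync.Map.LoadOrStore` so that all callers for one `game_id` share the first mutex stored for it. The `gameLocks` type and `countUnderLock` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// gameLocks hands out one mutex per game_id. LoadOrStore keeps the first
// stored mutex, so every later caller for that game shares it. Entries are
// never deleted in this sketch.
type gameLocks struct{ m sync.Map }

// lock blocks until the caller owns the game's mutex and returns the unlock.
func (g *gameLocks) lock(gameID string) (unlock func()) {
	v, _ := g.m.LoadOrStore(gameID, &sync.Mutex{})
	mu := v.(*sync.Mutex)
	mu.Lock()
	return mu.Unlock
}

// countUnderLock runs n goroutines that each bump a shared counter while
// holding the same game's lock; without the lock, updates could be lost.
func countUnderLock(n int) int {
	var g gameLocks
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			unlock := g.lock("game-1")
			defer unlock()
			count++
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println(countUnderLock(100)) // 100
}
```

Allocating a fresh `sync.Mutex` on every call is wasteful on a hit but harmless, since `LoadOrStore` discards it; the sketch also never deletes entries, so a long-lived service would need some cleanup for finished games.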

## Push gRPC

```mermaid
sequenceDiagram
    participant Backend
    participant Ring
    participant Gateway

    loop domain emits client_event / session_invalidation
        Backend->>Ring: append, allocate cursor
    end

    Gateway->>Backend: SubscribePush(GatewaySubscribeRequest{cursor?})
    alt cursor present and within ring TTL
        Backend->>Gateway: replay events newer than cursor
    else cursor missing or aged out
        Backend->>Gateway: stream from current head
    end

    loop event published
        Backend->>Gateway: PushEvent
    end

    Gateway->>Backend: same gateway_client_id reconnects
    Backend->>Backend: cancel previous stream (codes.Aborted)
    Backend->>Gateway: stream again
```
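
The replay decision can be sketched with a TTL-bounded slice. The `ring` and `event` types are hypothetical, and pruning only on publish is a simplification. The sketch also rejects a cursor that is ahead of the current head.

```go
package main

import (
	"fmt"
	"time"
)

type event struct {
	seq uint64
	at  time.Time
	msg string
}

// ring is a hypothetical TTL-bounded event log. Sequence numbers start at 1
// and come from an in-process counter; pruning runs on publish.
type ring struct {
	ttl    time.Duration
	events []event
	head   uint64 // last allocated sequence number
}

func (r *ring) publish(msg string, now time.Time) uint64 {
	r.head++
	r.events = append(r.events, event{seq: r.head, at: now, msg: msg})
	for len(r.events) > 0 && now.Sub(r.events[0].at) > r.ttl {
		r.events = r.events[1:] // aged out
	}
	return r.head
}

// replayFrom returns the events newer than cursor when nothing after the
// cursor has aged out. ok is false when the cursor is too old or ahead of
// head; the subscriber then streams from the current head instead.
func (r *ring) replayFrom(cursor uint64) (out []event, ok bool) {
	if cursor > r.head {
		return nil, false
	}
	oldest := r.head + 1 // empty ring: everything up to head is gone
	if len(r.events) > 0 {
		oldest = r.events[0].seq
	}
	if cursor+1 < oldest {
		return nil, false // some events after the cursor were dropped
	}
	for _, e := range r.events {
		if e.seq > cursor {
			out = append(out, e)
		}
	}
	return out, true
}

func main() {
	r := &ring{ttl: time.Minute}
	t0 := time.Now()
	r.publish("a", t0)
	r.publish("b", t0.Add(time.Second))
	out, ok := r.replayFrom(1)
	fmt.Println(len(out), ok) // 1 true
}
```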

The cursor is a zero-padded decimal `uint64` minted by an in-process
counter. The sequence resets when the backend restarts, so cursors are
only meaningful within a single process lifetime. Per-connection
backpressure is drop-oldest, with a log line on each drop so that gaps
seen on the gateway side can be correlated.
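
A sketch of the cursor encoding and a drop-oldest buffer. The 20-digit width is an assumption chosen because it fits the largest `uint64`; the `dropOldestQueue` type and its log wording are hypothetical.

```go
package main

import (
	"fmt"
	"log"
	"strconv"
)

// formatCursor renders a sequence number as a 20-digit decimal string; the
// largest uint64 has 20 digits, so string order matches numeric order.
func formatCursor(seq uint64) string {
	return fmt.Sprintf("%020d", seq)
}

func parseCursor(s string) (uint64, error) {
	return strconv.ParseUint(s, 10, 64)
}

// dropOldestQueue is a hypothetical per-connection buffer. When it is full,
// push discards the oldest queued cursor, logs it, and retries the send.
type dropOldestQueue struct{ ch chan string }

func (q *dropOldestQueue) push(cursor string) {
	for {
		select {
		case q.ch <- cursor:
			return
		default:
			select {
			case dropped := <-q.ch:
				log.Printf("push backpressure: dropped cursor %s", dropped)
			default: // a reader drained the queue meanwhile; just retry
			}
		}
	}
}

func main() {
	fmt.Println(formatCursor(42))
	q := &dropOldestQueue{ch: make(chan string, 2)}
	for _, c := range []string{formatCursor(1), formatCursor(2), formatCursor(3)} {
		q.push(c)
	}
	first, _ := parseCursor(<-q.ch)
	fmt.Println(first) // 2
}
```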