galaxy-game/backend/docs/flows.md
2026-05-06 10:14:55 +03:00
Domain and Protocol Flows

This document collects the multi-step interactions inside the backend that span domain modules. Each section assumes the reader is familiar with ../README.md and ../../ARCHITECTURE.md.

Registration (send + confirm)

sequenceDiagram
    participant Client
    participant Gateway
    participant Auth
    participant User
    participant Geo
    participant Mail
    participant Mailpit as SMTP relay

    Client->>Gateway: POST /api/v1/public/auth/send-email-code\nbody: {email}; header Accept-Language
    Gateway->>Auth: forward + Accept-Language
    Auth->>Auth: hash code (bcrypt cost 10)
    Auth->>Auth: persist auth_challenges row<br/>(stores preferred_language)
    Auth->>Mail: EnqueueLoginCode(email, code, ttl)
    Mail-->>Auth: delivery_id
    Auth-->>Gateway: 200 {challenge_id}
    Gateway-->>Client: 200 {challenge_id}
    Mail->>Mailpit: SMTP delivery (worker)

    Client->>Gateway: POST /api/v1/public/auth/confirm-email-code\nbody: {challenge_id, code, client_public_key, time_zone}
    Gateway->>Auth: forward
    Auth->>Auth: SELECT FOR UPDATE auth_challenges<br/>(increment attempts, enforce ceiling)
    Auth->>Auth: bcrypt verify
    Auth->>User: EnsureByEmail(email, preferred_language, time_zone, source_ip)
    User->>User: insert account if missing<br/>(synth Player-XXXXXXXX)
    User->>Geo: SetDeclaredCountryAtRegistration(user_id, source_ip)
    User-->>Auth: user_id
    Auth->>Auth: SELECT FOR UPDATE again,<br/>mark consumed,<br/>insert device_session,<br/>cache write-through
    Auth-->>Gateway: 200 {device_session_id}
    Gateway-->>Client: 200 {device_session_id}

Re-confirming the same challenge_id returns the existing session and clears the throttle window (the throttle reuses the latest unconsumed challenge rather than dropping the request). accounts.user_name is synthesised once and never overwritten on subsequent sign-ins, so the same account always keeps the same handle.

Authenticated request lifecycle

sequenceDiagram
    participant Client
    participant Gateway
    participant Backend HTTP
    participant Cache
    participant Domain
    participant Postgres

    Client->>Gateway: signed gRPC ExecuteCommand
    Gateway->>Gateway: verify signature, payload_hash,<br/>freshness, anti-replay
    Gateway->>Backend HTTP: GET /api/v1/internal/sessions/{id}
    Backend HTTP-->>Gateway: 200 {user_id, status:active}
    Gateway->>Backend HTTP: forward command\nas REST + X-User-ID
    Backend HTTP->>Cache: lookup
    Cache-->>Backend HTTP: hit / miss
    alt cache miss
        Backend HTTP->>Postgres: read
        Postgres-->>Backend HTTP: row
        Backend HTTP->>Cache: warm
    end
    Backend HTTP->>Domain: business logic
    Domain->>Postgres: write
    Domain->>Cache: write-through after commit
    Domain-->>Backend HTTP: result
    Backend HTTP-->>Gateway: JSON
    Gateway->>Gateway: encode FlatBuffers,<br/>sign response envelope
    Gateway-->>Client: signed gRPC response

X-User-ID is the sole identity input on the user surface. The geo counter middleware fires off geo.IncrementCounterAsync after the handler returns successfully; the request itself does not block on that.
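
The alt-branch in the diagram is the usual cache-aside read. A minimal sketch, assuming string keys and an in-memory map standing in for Postgres (the backend's real Cache and store types are not reproduced here):

```go
package main

import "fmt"

// Cache is an illustrative key/value cache; the real backend caches
// typed rows and writes through after commit.
type Cache struct{ m map[string]string }

func (c *Cache) Get(k string) (string, bool) { v, ok := c.m[k]; return v, ok }
func (c *Cache) Set(k, v string)             { c.m[k] = v }

// load mirrors the diagram: a hit returns immediately; a miss reads the
// store (standing in for Postgres) and warms the cache before returning.
func load(c *Cache, store map[string]string, key string) (string, bool) {
	if v, ok := c.Get(key); ok {
		return v, true // cache hit
	}
	v, ok := store[key]
	if !ok {
		return "", false
	}
	c.Set(key, v) // warm the cache on the way out
	return v, true
}

func main() {
	c := &Cache{m: map[string]string{}}
	store := map[string]string{"session:42": "active"}
	v, _ := load(c, store, "session:42") // miss → store read → warm
	fmt.Println(v)
	_, hit := c.Get("session:42")
	fmt.Println(hit) // subsequent reads hit the cache
}
```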

Lobby state machine and Race Name Directory

The lobby state machine is the closed transition graph below. Owner endpoints (or admin overrides for public games with a NULL owner) drive forward transitions; the runtime callback is the only path that flips starting → running. Every transition checks ownership, target state, and idempotency.

stateDiagram-v2
    [*] --> draft
    draft --> enrollment_open: open-enrollment
    enrollment_open --> ready_to_start: ready-to-start (auto on min_players)
    ready_to_start --> starting: start
    starting --> running: runtime ack
    starting --> start_failed: runtime error
    start_failed --> ready_to_start: retry-start
    running --> paused: pause
    paused --> running: resume
    running --> finished: engine finish callback
    running --> cancelled: cancel
    paused --> cancelled: cancel
    starting --> cancelled: cancel
    enrollment_open --> cancelled: cancel
    ready_to_start --> cancelled: cancel
    draft --> cancelled: cancel
    cancelled --> [*]
    finished --> [*]
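
The closed graph above can be encoded as a transition table, which makes the "rejects anything not in the graph" property direct. A sketch (state and action names follow the diagram; the table representation itself is illustrative):

```go
package main

import "fmt"

// transitions encodes the closed graph: from-state → action → to-state.
// Any (state, action) pair absent from the table is rejected.
var transitions = map[string]map[string]string{
	"draft":           {"open-enrollment": "enrollment_open", "cancel": "cancelled"},
	"enrollment_open": {"ready-to-start": "ready_to_start", "cancel": "cancelled"},
	"ready_to_start":  {"start": "starting", "cancel": "cancelled"},
	"starting":        {"runtime-ack": "running", "runtime-error": "start_failed", "cancel": "cancelled"},
	"start_failed":    {"retry-start": "ready_to_start"},
	"running":         {"pause": "paused", "finish": "finished", "cancel": "cancelled"},
	"paused":          {"resume": "running", "cancel": "cancelled"},
}

// next returns the successor state, or false if the edge does not exist.
func next(state, action string) (string, bool) {
	to, ok := transitions[state][action]
	return to, ok
}

func main() {
	s, _ := next("starting", "runtime-ack")
	fmt.Println(s) // running
	_, ok := next("finished", "cancel")
	fmt.Println(ok) // false: finished is terminal
}
```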

The Race Name Directory has three tiers:

  • registered — platform-unique. Single live binding per canonical key.
  • reservation — per-game; a user can hold the same canonical key in multiple active games concurrently.
  • pending_registration — issued after a "capable finish" (max_planets > initial AND max_population > initial). The pending entry is promoted to registered if the user calls POST /api/v1/user/lobby/race-names/register within BACKEND_LOBBY_PENDING_REGISTRATION_TTL (default 30 days); otherwise the sweeper releases it.

Canonicalisation goes through disciplinedware/go-confusables plus a small anti-fraud map (digit-letter substitution for common look-alikes). Cross-user uniqueness across reservations and pending registrations is enforced with a per-canonical advisory lock at write time, since the composite primary key on race_names does not express that invariant on its own.
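
The advisory lock needs an int64 key per canonical name. A sketch of one way to derive it, assuming FNV-1a as the hash (the backend's actual key derivation may differ); the SQL in the comment is the standard PostgreSQL transaction-scoped advisory lock:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// advisoryLockKey maps a canonical race name to the int64 key passed to
// pg_advisory_xact_lock. FNV-1a is an illustrative choice, not
// necessarily what the backend uses.
func advisoryLockKey(canonical string) int64 {
	h := fnv.New64a()
	h.Write([]byte(canonical))
	return int64(h.Sum64())
}

func main() {
	k := advisoryLockKey("zorblax")
	// The write path would run, inside the same transaction as the insert:
	//   SELECT pg_advisory_xact_lock($1)   -- $1 = k
	// so two writers of the same canonical key serialise; the lock is
	// released automatically at commit or rollback.
	fmt.Println(k == advisoryLockKey("zorblax")) // key derivation is deterministic
}
```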

Mail outbox

sequenceDiagram
    participant Producer
    participant Mail
    participant Postgres
    participant Worker
    participant SMTP
    participant Admin

    Producer->>Mail: EnqueueLoginCode / EnqueueTemplate
    Mail->>Postgres: insert mail_payloads + mail_deliveries<br/>(unique on template_id, idempotency_key)
    Mail-->>Producer: delivery_id

    loop every BACKEND_MAIL_WORKER_INTERVAL
        Worker->>Postgres: SELECT FOR UPDATE SKIP LOCKED
        Postgres-->>Worker: row
        Worker->>SMTP: send via wneessen/go-mail
        alt success
            Worker->>Postgres: insert mail_attempts(success),<br/>mark delivery sent
        else transient
            Worker->>Postgres: insert mail_attempts(transient),<br/>schedule next_attempt_at + jitter
        else permanent or attempts >= MAX
            Worker->>Postgres: insert mail_attempts(permanent),<br/>move to mail_dead_letters
            Worker->>Admin: notification intent (mail.dead_lettered)
        end
    end

mail_attempts.attempt_no is monotonic across the entire history of a single delivery. Resend on a pending / retrying / dead_lettered row re-arms the row; resend on sent returns 409 Conflict.

Notification fan-out

sequenceDiagram
    participant Producer
    participant Notif
    participant Postgres
    participant Push
    participant Mail

    Producer->>Notif: Submit(intent)
    Notif->>Notif: validate kind + payload
    Notif->>Postgres: INSERT notifications ON CONFLICT (kind, idempotency_key) DO NOTHING
    Notif->>Postgres: materialise notification_routes<br/>per channel from catalog
    Notif->>Push: PublishClientEvent(user_id, payload)
    Notif->>Mail: EnqueueTemplate(template_id, recipient,<br/>payload, route_id)
    Notif-->>Producer: ok (best-effort dispatch)

    loop every BACKEND_NOTIFICATION_WORKER_INTERVAL
        Postgres-->>Notif: routes still in pending / retrying
        Notif->>Push: retry push (or)
        Notif->>Mail: re-arm mail row
    end

auth.login_code bypasses notification entirely: auth writes the delivery row directly so the challenge commit is atomic with the mail-queue insert. Catalog entries that target administrators deliver email to BACKEND_NOTIFICATION_ADMIN_EMAIL; if the variable is empty, the route is recorded with status='skipped' and an operator log line records the configuration miss.
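
The admin-email branch reduces to a small decision. A sketch, assuming the status strings from the text (the function shape and return values are illustrative, not the catalog's actual API):

```go
package main

import "fmt"

// routeAdminEmail mirrors the behaviour described above: with an empty
// BACKEND_NOTIFICATION_ADMIN_EMAIL the route is recorded as skipped
// instead of enqueuing mail, and the caller logs the configuration miss.
func routeAdminEmail(adminEmail string) (status string, enqueue bool) {
	if adminEmail == "" {
		return "skipped", false
	}
	return "pending", true
}

func main() {
	s, _ := routeAdminEmail("")
	fmt.Println(s) // skipped: no admin address configured
	s, ok := routeAdminEmail("ops@example.test")
	fmt.Println(s, ok)
}
```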

Runtime job lifecycle

sequenceDiagram
    participant Lobby
    participant Runtime
    participant Workers
    participant Docker
    participant Engine
    participant Reconciler

    Lobby->>Runtime: StartGame(game_id)
    Runtime->>Workers: enqueue start job
    Runtime-->>Lobby: ack

    Workers->>Docker: pull / create / start engine container
    Docker-->>Workers: container id
    Workers->>Engine: POST /api/v1/admin/init
    Engine-->>Workers: ok / error
    Workers->>Runtime: write runtime_records (running or start_failed)
    Workers->>Lobby: OnRuntimeJobResult

    loop scheduler tick
        Workers->>Engine: PUT /api/v1/admin/turn
        Engine-->>Workers: snapshot
        Workers->>Runtime: persist runtime_records
        Workers->>Lobby: OnRuntimeSnapshot
    end

    Reconciler->>Docker: list containers labelled galaxy.backend=1
    alt missing recorded container
        Reconciler->>Runtime: mark removed
        Reconciler->>Lobby: OnRuntimeJobResult(removed)
    else unrecorded labelled container
        Reconciler->>Runtime: adopt
    end

Per-game serialisation is enforced by a sync.Map[game_id]*sync.Mutex inside runtime.Service, so concurrent start / stop / patch attempts on the same game_id cannot race. runtime_operation_log records every operation for audit.

Push gRPC

sequenceDiagram
    participant Backend
    participant Ring
    participant Gateway

    loop domain emits client_event / session_invalidation
        Backend->>Ring: append, allocate cursor
    end

    Gateway->>Backend: SubscribePush(GatewaySubscribeRequest{cursor?})
    alt cursor present and within ring TTL
        Backend->>Gateway: replay events newer than cursor
    else cursor missing or aged out
        Backend->>Gateway: stream from current head
    end

    loop event published
        Backend->>Gateway: PushEvent
    end

    Gateway->>Backend: same gateway_client_id reconnects
    Backend->>Backend: cancel previous stream (codes.Aborted)
    Backend->>Gateway: stream again

The cursor is a zero-padded decimal uint64 minted by an in-process counter; backend resets the sequence after a restart, so cursors are only meaningful within a single process lifetime. Per-connection backpressure is drop-oldest, with a log line on each drop so the gateway side can correlate gaps.
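
Cursor minting as described reduces to an atomic counter plus fixed-width formatting. A sketch, assuming 20 digits (wide enough for any uint64, so lexical order of cursors matches numeric order; the exact width the backend uses is an assumption):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// seq is the in-process counter; it restarts at zero with the process,
// which is why cursors are only meaningful within one process lifetime.
var seq uint64

// mintCursor returns the next cursor as a zero-padded decimal string.
func mintCursor() string {
	return fmt.Sprintf("%020d", atomic.AddUint64(&seq, 1))
}

func main() {
	fmt.Println(mintCursor()) // 00000000000000000001
	fmt.Println(mintCursor()) // 00000000000000000002
}
```

Because the width is fixed, a gateway can compare cursors with plain string comparison when deciding whether a replayed event is newer than the one it last saw.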