From e6b73a8f554f7fd87980f34eb82f214cfd3c9f1e Mon Sep 17 00:00:00 2001 From: Ilia Denisov Date: Wed, 8 Apr 2026 22:03:34 +0200 Subject: [PATCH] docs: update architecture --- ARCHITECTURE.md | 1019 +++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 854 insertions(+), 165 deletions(-) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index bf99cf6..0ec295b 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -4,251 +4,940 @@ Galaxy Plus: Turn-based Strategy Game ## Purpose -This document fixes the high-level service architecture of the system. -It is the starting point for implementing the external edge layer, authentication/session management, business services, and push delivery. +This document defines the high-level architecture of the Galaxy Plus platform as a single source of truth for implementing all core microservices. + +It describes: + +* public and trusted service boundaries; +* ownership of main business entities and state; +* request routing and transport rules; +* interaction rules between services; +* runtime model for game containers; +* notification and event propagation model; +* recommended implementation order. + +Detailed behavior of each concrete service belongs in its own README. +This document fixes the system-level structure and the architectural rules that must remain stable across service implementations. + +## Scope + +Galaxy Plus is a multiplayer turn-based online strategy game platform. 
+ +Core product properties: + +* many game sessions may exist simultaneously; +* one user may participate in multiple games at once; +* users authenticate by e-mail confirmation code; +* users have platform roles and tariff/entitlement state; +* games may be public or private; +* public games are managed by system administrators; +* private games are created and managed by eligible paid users; +* each running game is executed inside its own dedicated game engine container; +* each running game is bound to one concrete engine version; +* in-place upgrade of a running game is allowed only as a patch update within the same semver major/minor line; +* player commands are turn-bound and are accepted only before the next scheduled turn generation cutoff. + +The current v1 platform uses Redis as the main data store and Redis Streams as the internal event bus. ## Main Principles -- The system exposes a single external entry point: **Edge Gateway**. -- Internal business services are **not reachable directly from outside**. -- Any external command, except public auth commands, must be authenticated before it is routed further. -- Gateway handles only edge concerns. Business validation and domain rules remain inside business services. -- Gateway owns external delivery channels; the v1 implementation uses - authenticated gRPC server-streaming push, while long-polling remains out of - scope. +* The platform exposes a single external entry point: **Edge Gateway**. +* Public unauthenticated flows use REST/JSON. +* Authenticated user traffic uses signed gRPC over HTTP/2 with protobuf control envelopes and FlatBuffers payload bytes. +* The gateway handles only edge concerns: parsing, authentication, integrity checks, anti-replay, rate limiting, routing, and push delivery. Business authorization and domain rules remain in downstream services. +* `Auth / Session Service` is the source of truth for `device_session`, but it is not on the hot path of every authenticated request. 
Gateway authenticates steady-state traffic from session cache and lifecycle updates. +* `Game Lobby` owns platform-level metadata of game sessions. +* `Game Master` owns runtime and operational state of running games. +* `Runtime Manager` is the only service allowed to access Docker API directly. +* `Notification Service` is the platform-level delivery/orchestration layer for push and most non-auth email notifications. +* `Mail Service` sends email; auth-code mail is sent directly by `Auth / Session Service`, while all other platform mail is initiated through `Notification Service`. +* `Geo Profile Service` is auxiliary and fail-open relative to gameplay; it never blocks the currently processed request and may affect only later requests. +* If a user-facing request must complete with a deterministic result in the same flow, the critical internal chain must be synchronous. If the interaction is propagation, notification, cache update, runtime job completion, telemetry, or denormalized read-model update, it should be asynchronous. + +## Security and Transport Model + +The former standalone security model is part of the main architecture and is no longer treated as a separate subsystem. + +### Public and authenticated transport classes + +The gateway already distinguishes: + +* public REST/JSON for unauthenticated traffic such as health checks and public auth; +* authenticated gRPC over HTTP/2 for verified commands and push delivery. + +The public auth contract is: + +* `send-email-code(email) -> challenge_id` +* `confirm-email-code(challenge_id, code, client_public_key) -> device_session_id` + +The authenticated request contract is based on: + +* `device_session_id` +* `message_type` +* `timestamp_ms` +* `request_id` +* `payload_hash` +* Ed25519 client signature over canonical envelope fields. + +Server responses and push events are signed by the gateway so clients can verify server-originated messages. 
Push streams are bound to authenticated `user_id` and `device_session_id`, and session revoke closes only streams bound to the revoked session. + +### Verification boundary + +Before routing an authenticated request, gateway must: + +1. validate envelope presence and protocol version; +2. resolve session from session cache; +3. reject unknown or revoked sessions; +4. verify `payload_hash`; +5. verify client signature; +6. verify freshness window; +7. verify anti-replay by `device_session_id + request_id`; +8. apply edge rate limits and basic policy checks; +9. build an authenticated internal command context and only then route downstream. + +Downstream services must never receive unauthenticated external traffic. + +## High-Level System Diagram ```mermaid flowchart LR - Client["Clients\n(native and browser)"] - Gateway["Edge Gateway\npublic REST + authenticated gRPC"] + Client["Game Client\n(native / browser)"] + AdminUI["Admin UI"] + Gateway["Edge Gateway\nPublic REST\nAuthenticated gRPC\nAdmin REST"] Auth["Auth / Session Service"] - Business["Business Services"] - Redis["Redis\nsession cache + replay keys + event streams"] - Telemetry["Telemetry Backends\nPrometheus / OTLP"] + User["User Service"] + Lobby["Game Lobby Service"] + GM["Game Master"] + Runtime["Runtime Manager"] + Notify["Notification Service"] + Mail["Mail Service"] + Geo["Geo Profile Service"] + Billing["Billing Service\nfuture"] + Redis["Redis\nKV + Streams"] + Telemetry["Telemetry"] Client --> Gateway + AdminUI --> Gateway + Gateway --> Auth - Gateway --> Business - Gateway --> Redis - Gateway --> Telemetry + Gateway --> User + Gateway --> Lobby + Gateway --> GM + Gateway --> Geo + + Auth --> User + Auth --> Mail Auth --> Redis - Business --> Redis + + User --> Redis + + Lobby --> User + Lobby --> GM + Lobby --> Runtime + Lobby --> Redis + + GM --> Lobby + GM --> Runtime + GM --> Redis + + Geo --> Auth + Geo --> User + Geo --> Redis + + Notify --> Gateway + Notify --> Mail + Notify --> Redis + + 
Runtime --> Redis + Billing --> User + Telemetry --- Gateway + Telemetry --- Auth + Telemetry --- User + Telemetry --- Lobby + Telemetry --- GM + Telemetry --- Runtime + Telemetry --- Notify + Telemetry --- Geo ``` -## Main Components +The baseline gateway/auth/session/pub-sub model above is consistent with the existing architecture and service READMEs. -### 1. [Edge Gateway](./gateway/README.md) +## Service List and Responsibility Boundaries -The gateway is the only public entry point for client traffic. +## 1. [Edge Gateway](gateway/README.md) -Responsibilities: +`Edge Gateway` is the only public entry point for all external traffic. It already owns transport parsing, session-cache-based authentication, signature verification, freshness/replay checks, edge rate limiting, routing, and push delivery. It must remain free of domain-specific business logic. -- transport parsing -- authentication of external requests -- transport integrity checks -- session cache lookup -- request signature verification -- timestamp window verification -- anti-replay checks -- rate limiting and abuse protection -- command routing -- basic policy enforcement -- authenticated gRPC server-streaming push connection handling -- delivery of client-facing events from pub/sub +External surfaces: -The gateway must not implement domain-specific business logic. +* public REST: -### 2. [Auth / Session Service](./authsession/README.md) + * health and readiness; + * public auth commands; + * browser/bootstrap and public route classes where needed. +* authenticated gRPC: -This service owns authentication and device session lifecycle. + * generic `ExecuteCommand`; + * authenticated `SubscribeEvents`. +* admin REST: -Responsibilities: + * separate public administrative surface for system administrators; + * routed only for authenticated users with admin role. 
-- `send_email_code` -- `confirm_email_code` -- device session creation -- public key registration for device sessions -- session revoke / logout -- persistence of session state -- publishing session state changes for cache invalidation/update +The gateway does not directly access game engine containers. +For running games it routes to `Game Master`. +For pre-game platform flows it routes to `Game Lobby`. +For user-profile requests it routes to `User Service`. +For public auth it routes to `Auth / Session Service`. -This service is the source of truth for `device_session` state. +## 2. [Auth / Session Service](authsession/README.md) -### 3. Session Store +`Auth / Session Service` owns: -Persistent storage for device sessions. +* challenge lifecycle; +* e-mail-code authentication; +* creation of `device_session`; +* registration of the client Ed25519 public key; +* revoke/logout/block state; +* trusted internal read/revoke/block API; +* projection of session lifecycle state into gateway-consumable Redis data. -Typical fields: +It is the source of truth for: -- `device_session_id` -- `user_id` -- client public key -- session status -- creation / revoke timestamps -- optional client metadata +* authentication challenges; +* `device_session`; +* revoke/block state. -### 4. Session Cache +Important architectural rules: -Fast lookup cache used by the gateway. +* public auth stays synchronous; +* `confirm-email-code` returns a ready `device_session_id`; +* no async “pending session provisioning” stage exists; +* session source of truth and gateway-facing projection remain separate; +* active-session limits are configuration-driven; +* `send-email-code` stays success-shaped for existing, new, blocked, and throttled email flows. 
-Purpose: +Direct integrations: -- resolve `device_session_id -> user_id + public_key + status` -- avoid synchronous calls from gateway to auth/session service on every request +* synchronous to `User Service` for user resolution/create/block decision; +* synchronous to `Mail Service` for auth-code delivery; +* asynchronous session lifecycle projection into Redis for gateway consumption. -Cache updates should be driven by session lifecycle events and may also use TTL. +## 3. User Service -### 5. Anti-Replay Store +`User Service` owns user identity and profile as platform-level business data. -Edge-level storage for recently seen transport `request_id` values. +It is the source of truth for: -Purpose: +* `user_id`; +* profile fields and editable user settings; +* role model, including admin role; +* current tariff/entitlement state; +* user-specific limits and platform sanctions; +* latest effective `declared_country`. -- reject replayed authenticated transport messages within the allowed time window +It is directly reachable through gateway for selected user-facing operations such as: -This is transport-level replay protection, not business idempotency. +* reading and editing allowed profile fields; +* viewing tariff and entitlement state; +* viewing user settings; +* viewing current restrictions and sanctions. -### 6. Rate Limit Store +Not every profile mutation goes directly here. For example: -Shared state for edge-level throttling and abuse control. +* email change must use a code-confirm flow; +* `declared_country` change remains under admin approval flow via `Geo Profile Service`. -Typical dimensions: +Future billing does not become a direct dependency of other services. `Billing Service` will feed entitlement/payment outcomes into `User Service`, and the rest of the platform will continue to use `User Service` as the source of truth for current entitlements. -- IP / network -- `device_session_id` -- `user_id` -- command class +## 4. Mail Service -### 7. 
Business Services +`Mail Service` is the internal email delivery service. -Internal services that process authenticated commands. +Split of responsibility: -Responsibilities: +* auth code emails: `Auth / Session Service -> Mail Service` directly; +* all other user/admin notification emails: `Notification Service -> Mail Service`. -- business validation -- authorization by `user_id` -- ownership checks -- domain invariants -- state transitions -- business idempotency where required -- publishing domain events and/or client-facing events +Mail delivery may be internally queued inside the mail service, but to its callers it is still a synchronous internal command where the caller needs a deterministic send-or-fail result. -Business services do not verify external signatures and do not access external clients directly. +## 5. [Geo Profile Service](geoprofile/README.md) -### 8. Event Bus / Pub-Sub +`Geo Profile Service` is an internal trusted auxiliary service for country-level connection signals of authenticated users. -Used for internal event distribution. +It integrates with: -Purposes: +* gateway as asynchronous ingest producer; +* `User Service` for current effective `declared_country`; +* `Auth / Session Service` for suspicious session blocking; +* `Mail Service` only for optional admin notifications. -- session cache invalidation/update -- client-facing event delivery through the gateway -- optional internal domain event propagation between services +It owns: -## External Flows +* observed country facts; +* per-session country aggregation; +* `usual_connection_country`; +* `country_review_recommended`; +* history of `declared_country` changes. -### Public Auth Flow +It does not block the request that triggered suspicion. +It can only request block of suspicious sessions for subsequent requests. 
-These commands are public and do not require an existing device session: +In this document, references to `Edge Service` in older geo documentation should be understood as `Edge Gateway`. -- `send_email_code` -- `confirm_email_code` +## 6. Admin Service -Flow: +`Admin Service` is the external backend/orchestration layer for the administrative UI. -1. client sends public auth command to gateway -2. gateway applies public-edge checks (format, rate limits, abuse policy) -3. gateway routes command to auth/session service -4. auth/session service performs auth logic -5. if login is confirmed, auth/session service creates `device_session` -6. auth/session service publishes cache update/invalidation event +It is not a heavy domain owner. +Its job is to: -### Authenticated Command Flow +* expose administrator-facing workflows; +* call trusted internal APIs of other services; +* aggregate administrative views where needed; +* enforce system-admin role checks at the gateway/admin boundary. -All other external commands require authentication. +System administrators can view and operate on all games, including private ones. -Flow: +## 7. Game Lobby Service -1. client sends authenticated request to gateway -2. gateway validates transport envelope presence and protocol version -3. gateway resolves `device_session_id` through session cache -4. gateway rejects unknown or revoked sessions -5. gateway verifies request signature -6. gateway verifies timestamp window -7. gateway verifies anti-replay constraints -8. gateway applies rate limits and basic policy checks -9. gateway extracts authenticated context, including `user_id` -10. gateway routes the request to the target business service based on `message_type` +`Game Lobby` owns platform-level metadata and lifecycle of game sessions as platform entities. -No business service should receive an unauthenticated external request. 
+It is the source of truth for: -### Push Flow +* game records before and after runtime existence; +* public/private game type; +* owner of a private game; +* invitations and invite code lifecycle; +* applications and approvals; +* membership and roster; +* blocked/removed participants at platform level; +* turn schedule configuration; +* target engine version for launch; +* user-facing lists of games; +* denormalized runtime snapshot imported from `Game Master`. -The gateway owns external delivery connections. -The v1 gateway uses authenticated gRPC server-streaming push. -Long-polling remains out of scope for the implemented gateway. +In particular, `Game Lobby` is the source of truth for: -Flow: +* party membership; +* invited / pending / active / finished / removed status of players relative to games; +* user-visible lists such as `active / finished / pending / invited games`. -1. client opens authenticated push connection through gateway -2. gateway binds connection to `user_id` and `device_session_id` -3. gateway starts the channel with a signed service event that includes the - current server time for clock offset calculation -4. internal services publish client-facing events to pub/sub targeted by - `user_id` and optionally by `device_session_id` -5. gateway consumes those events and delivers them to the proper client - connections +It also stores a denormalized runtime snapshot for convenience, at least: -Gateway is a delivery layer, not the source of business events. +* `current_turn`; +* `runtime_status`; +* `engine_health_summary`. -## Internal Contract Between Gateway and Business Services +This keeps user-facing list/read flows from fanning out requests to `Game Master`. -Business services should receive an internal authenticated command, not raw external transport data.
+### Lobby status model -Typical internal authenticated context: +Minimum platform-level status set: -- `user_id` -- `device_session_id` -- `message_type` -- verified payload bytes -- transport `request_id` -- optional command id / trace id -- optional client metadata relevant for logging +* `draft` +* `enrollment_open` +* `enrollment_closed` +* `ready_to_start` +* `starting` +* `running` +* `paused` +* `finished` +* `cancelled` -Business services must trust only the gateway as their external ingress. +`Lobby.paused` is a business/platform pause, distinct from engine/runtime failure states. -## Separation of Responsibilities +### Membership rules -### Gateway is responsible for +* `User Service` owns users of the platform as identities. +* `Game Lobby` owns membership in concrete games. +* game engine does not own platform membership; +* `Game Master` may cache membership for runtime authorization, but `Game Lobby` remains the source of truth. -- who sent the request -- whether transport integrity is valid -- whether the request is fresh -- whether replay is detected -- whether request volume is acceptable -- where to route the request +### Public vs private game rules -### Business services are responsible for +Public games: -- whether the user is allowed to perform the business action -- whether the target object belongs to the user -- whether the domain state transition is valid -- whether business idempotency rules are satisfied +* created and controlled by system administrators; +* visible in public list; +* joining is based on application and manual admin approval in v1. -## Revocation Behavior +Private games: -When a device session is revoked: +* can be created only by eligible paid users; +* visible only to their owner and to invited users who used an invite code and were accepted; +* joining uses invite code plus owner approval; +* invite lifecycle belongs entirely to `Game Lobby`. 
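The visibility rules above can be sketched as a single predicate. This is a minimal illustration only: the `Game` record, its field names, and `can_see_game` are hypothetical and not part of any service contract. It assumes that `Game Lobby` already knows the owner and the set of accepted invitees, and it reflects the rule that system administrators can view all games, including private ones.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Game:
    """Hypothetical, simplified view of a Game Lobby record."""
    game_id: str
    is_public: bool
    owner_id: Optional[str] = None  # set only for private games
    # Users who used an invite code and were accepted by the owner.
    accepted_user_ids: Set[str] = field(default_factory=set)


def can_see_game(game: Game, user_id: str, is_system_admin: bool = False) -> bool:
    """Return True if the user may see the game in platform lists."""
    if is_system_admin:
        # System administrators can view all games, private ones included.
        return True
    if game.is_public:
        # Public games appear in the public list.
        return True
    # Private games: only the owner and accepted invitees.
    return user_id == game.owner_id or user_id in game.accepted_user_ids
```

Joining is a separate concern: visibility of a public game does not imply membership, which still requires an application and admin approval in v1.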
-- auth/session service updates the source of truth -- auth/session service publishes revoke/invalidation event -- gateway updates or invalidates session cache -- gateway rejects further requests for that session -- gateway closes active authenticated push streams bound to that session, if applicable +Private-party owners get a limited owner-admin capability set, not full system admin power. + +## 8. Game Master + +`Game Master` owns runtime and operational metadata of already running games. + +It is the only trusted service allowed to communicate with game engine containers. + +It owns: + +* runtime mapping of running game to container endpoint/binding; +* current turn number; +* runtime status; +* generation status; +* engine health; +* patch state; +* engine version registry and version-specific engine options; +* runtime mapping `platform user_id -> engine player UUID` for each running game. + +### Game Master status model + +Minimum runtime-level status set: + +* `starting` +* `running` +* `generation_in_progress` +* `generation_failed` +* `stopped` +* `engine_unreachable` + +`running` here means `running_accepting_commands`. + +### Game command routing + +Every game-related `message_type` value includes a `game_id`. + +Gateway enriches such commands with the authenticated `user_id` and routes them to `Game Master`. +`Game Master` checks whether this user may access this running game, using membership data sourced from `Game Lobby`, then routes the command to the correct engine container. + +The gateway never routes directly to game engine containers. + +### Runtime admin operations + +For already running games, `Game Master` handles: + +* `stop game` +* `force next turn` +* `patch engine` +* admin/runtime status reads +* player deactivation/removal inside engine when required +* regular collection of game runtime metrics + +System administrators can use all of them. +A private-game owner can use only the subset allowed for the owner of that game.
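The routing rule from "Game command routing" above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `Game Master` implementation: `RunningGame`, `CommandRejected`, `route_game_command`, and the endpoint string are hypothetical names, and membership is assumed to be a locally cached copy of data owned by `Game Lobby`.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


class CommandRejected(Exception):
    """Raised when a command must not reach the engine container."""


@dataclass
class RunningGame:
    """Hypothetical runtime record kept by Game Master."""
    game_id: str
    runtime_status: str   # e.g. "running", "generation_in_progress"
    engine_endpoint: str  # assumed container binding, illustrative only
    members: Set[str] = field(default_factory=set)  # cached from Game Lobby
    player_uuid_by_user: Dict[str, str] = field(default_factory=dict)


def route_game_command(
    games: Dict[str, RunningGame], game_id: str, user_id: str
) -> Tuple[str, str]:
    """Resolve (engine endpoint, engine player UUID) or reject the command."""
    game = games.get(game_id)
    if game is None:
        raise CommandRejected("unknown or not running game")
    # Only `running` (accepting commands) lets player commands through.
    if game.runtime_status != "running":
        raise CommandRejected(f"game not accepting commands: {game.runtime_status}")
    # Membership truth lives in Game Lobby; this checks the cached copy.
    if user_id not in game.members:
        raise CommandRejected("user is not a member of this game")
    player_uuid = game.player_uuid_by_user.get(user_id)
    if player_uuid is None:
        raise CommandRejected("no engine player mapping for user")
    return game.engine_endpoint, player_uuid
```

The order of checks (runtime status first, then membership) is a design choice of this sketch; the gateway has already authenticated the `user_id` before the command arrives here.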
+ +### Turn cutoff and scheduling + +`Game Master` is the owner of authoritative platform time for turn cutoff decisions. + +Commands arriving exactly on the boundary of a new turn are considered stale and must not reach the engine. + +The scheduler is a subsystem inside `Game Master`. +It triggers turn generation according to the game schedule. + +If a manual “force next turn” is executed, the next scheduled turn slot must be skipped so that players still get at least one full normal schedule interval before the following generated turn. + +### Runtime/engine finish flow + +When the engine determines that a game is finished: + +1. engine reports finish to `Game Master`; +2. `Game Master` updates runtime state; +3. `Game Master` notifies `Game Lobby`; +4. `Game Lobby` updates the platform-level game record to `finished`. + +### Player removal after start + +After a game has started, two different actions exist: + +* temporary removal/block at platform level: + + * the player cannot send commands through gateway/platform; + * the engine still keeps the player slot; +* final removal or account-level block: + + * `Game Master` must additionally send an admin command to the engine to deactivate/remove the player inside the game. + +This distinction is architectural and must remain explicit. + +## 9. Runtime Manager + +`Runtime Manager` is the only internal service allowed to access Docker API directly. + +It owns: + +* starting game engine containers; +* stopping containers; +* restarting containers where allowed; +* patching/replacing containers where allowed; +* technical runtime inspection/status; +* monitoring containers and publishing technical health events. + +It does **not** own platform metadata of games. +It does **not** own runtime business state of games. +It executes runtime jobs for `Game Lobby` and `Game Master`. + +### Container model + +* one game = one container; +* one container = one game. + +This is a hard invariant. + +## 10. 
Notification Service + +`Notification Service` is the async delivery/orchestration layer for platform notifications. + +It has a deliberately minimal role: + +* consume domain/integration events from services; +* decide whether a given event should result in push, email, or both; +* render and route notification payloads; +* send push-targeted events toward gateway; +* send email-targeted commands toward `Mail Service`. + +It is not a source of truth for user preferences in v1 unless a later feature requires it. + +All platform notifications except auth-code delivery flow through this service, including: + +* game lifecycle notifications; +* invite/application updates; +* new turn notifications; +* operational/admin notifications where appropriate. + +## 11. Billing Service (future) + +`Billing Service` is not part of the first implementation wave. + +When introduced, it will: + +* process payment/billing events; +* calculate or validate payment outcomes; +* feed resulting entitlement changes into `User Service`. + +`User Service` remains the source of truth for current entitlement used by the rest of the platform. 
+ +## Data Ownership Summary + +```mermaid +flowchart TD + U["User Service"] + A["Auth / Session Service"] + L["Game Lobby"] + G["Game Master"] + R["Runtime Manager"] + P["Geo Profile Service"] + N["Notification Service"] + M["Mail Service"] + + U -->|"users, roles, tariffs, limits, sanctions, current declared_country"| X1["Platform user identity"] + A -->|"challenges, device sessions, revoke/block state"| X2["Auth/session state"] + L -->|"game metadata, invites, applications, membership, roster"| X3["Platform game records"] + G -->|"runtime state, current turn, engine health, engine mapping, engine version registry"| X4["Running-game state"] + R -->|"container execution and technical runtime control"| X5["Container runtime"] + P -->|"observed country, usual_connection_country, review state, declared_country history"| X6["Geo state"] + N -->|"notification routing only"| X7["Notification orchestration"] + M -->|"email delivery only"| X8["Email transport"] +``` + +## Internal Transport Semantics + +The platform uses one simple rule: + +* if the user-facing request must complete with a deterministic result in the same flow, the critical internal chain is synchronous; +* if the interaction is propagation, notification, cache invalidation, runtime job completion, telemetry, or denormalized read-model update, it is asynchronous. 
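As a rough illustration, the rule above can be expressed as a small decision function. The `Transport` enum, the interaction kind names, and `choose_transport` are hypothetical labels introduced only for this sketch; the document itself does not define such an API.

```python
from enum import Enum


class Transport(Enum):
    SYNC = "sync"
    ASYNC = "async"


# Interaction kinds the rule names as asynchronous (labels are illustrative).
ASYNC_KINDS = frozenset({
    "propagation",
    "notification",
    "cache_invalidation",
    "runtime_job_completion",
    "telemetry",
    "read_model_update",
})


def choose_transport(kind: str, needs_result_in_same_flow: bool) -> Transport:
    """Apply the sync/async rule to one internal interaction."""
    if needs_result_in_same_flow:
        # The user-facing request depends on a deterministic result.
        return Transport.SYNC
    if kind in ASYNC_KINDS:
        return Transport.ASYNC
    # Anything else is not covered by the rule and needs an explicit decision.
    raise ValueError(f"interaction kind not covered by the rule: {kind}")
```

For example, `choose_transport("notification", False)` yields `Transport.ASYNC`, while any interaction whose result the caller must return in the same flow is treated as synchronous regardless of its kind.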
+ +### Fixed synchronous interactions + +* `Gateway -> Auth / Session Service` +* `Gateway -> Admin Service` +* `Gateway -> User Service` +* `Gateway -> Game Lobby` +* `Gateway -> Game Master` +* `Auth / Session Service -> User Service` +* `Auth / Session Service -> Mail Service` +* `Geo Profile Service -> Auth / Session Service` +* `Geo Profile Service -> User Service` +* `Game Lobby -> User Service` +* `Game Lobby -> Game Master` for critical registration/update calls + +### Fixed asynchronous interactions + +* session lifecycle projection toward gateway cache; +* revoke propagation; +* `Lobby -> Runtime Manager` runtime jobs; +* `Game Master -> Runtime Manager` runtime jobs; +* all event-bus propagation; +* `Notification Service -> Gateway`; +* `Notification Service -> Mail Service`; +* geo auxiliary ingest from gateway to geo service; +* runtime health events from `Runtime Manager`. + +### Mixed interactions + +Some service pairs may use both styles for different flows. +The main example is `Lobby -> Game Master`: + +* synchronous for critical registration/update after successful start; +* asynchronous for secondary propagation and denormalized status fan-out. + +## Redis as Data and Event Infrastructure + +Redis is the first-stage shared infrastructure for: + +* main persistent data of services where no SQL backend is yet introduced; +* gateway session cache backing data; +* replay reservation store for gateway; +* session lifecycle projection; +* internal event bus using Redis Streams; +* notification fan-out; +* runtime job completion events; +* lobby/game-master propagation events; +* geo auxiliary events. + +Redis Streams are therefore the platform event bus in v1. + +This is an accepted trade-off for simpler early-stage infrastructure. +Service boundaries must still stay storage-agnostic where future SQL migration is expected, especially in `Auth / Session Service`. + +## Main End-to-End Flows + +## 1. 
Public authentication flow + +```mermaid +sequenceDiagram + participant Client + participant Gateway + participant Auth + participant User + participant Mail + participant Redis + + Client->>Gateway: POST send-email-code + Gateway->>Auth: send-email-code + Auth->>User: resolve existing/creatable/blocked + User-->>Auth: decision + Auth->>Mail: send or suppress code + Auth-->>Gateway: challenge_id + Gateway-->>Client: challenge_id + + Client->>Gateway: POST confirm-email-code + Gateway->>Auth: confirm-email-code + Auth->>Auth: validate challenge/code/public key + Auth->>User: resolve/create/block + User-->>Auth: user_id or deny + Auth->>Auth: create device_session + Auth->>Redis: write gateway session projection + Auth->>Redis: publish session lifecycle update + Auth-->>Gateway: device_session_id + Gateway-->>Client: device_session_id +``` + +This preserves the existing gateway/auth contract and the rule that auth is not on the steady-state hot path. + +## 2. Authenticated game/platform request flow + +```mermaid +sequenceDiagram + participant Client + participant Gateway + participant Lobby + participant GM as Game Master + + Client->>Gateway: ExecuteCommand(message_type, payload, signature) + Gateway->>Gateway: verify session, signature, freshness, replay + alt platform-level command + Gateway->>Lobby: verified authenticated command + Lobby-->>Gateway: response + else running-game command + Gateway->>GM: verified authenticated command with game_id + GM-->>Gateway: response + end + Gateway-->>Client: signed response +``` + +## 3. 
Game creation and pre-start lifecycle + +```mermaid +sequenceDiagram + participant Client + participant Gateway + participant Lobby + participant User + + Client->>Gateway: create/apply/invite/approve/start-preparation commands + Gateway->>Lobby: verified platform command + Lobby->>User: entitlement/limit checks when needed + User-->>Lobby: allow/deny and user metadata + Lobby->>Lobby: update game metadata, roster, schedule, target engine version + Lobby-->>Gateway: response + Gateway-->>Client: signed response +``` + +## 4. Game start flow + +```mermaid +sequenceDiagram + participant Owner as Admin or Private Owner + participant Gateway + participant Lobby + participant Runtime + participant GM as Game Master + participant Engine as Game Engine Container + participant Redis + + Owner->>Gateway: start game + Gateway->>Lobby: verified start command + Lobby->>Lobby: validate ready_to_start and roster + Lobby->>Runtime: async start job + Runtime-->>Redis: runtime job result event + + alt start failed + Lobby->>Lobby: keep failure / starting error state + Lobby-->>Gateway: failure or accepted-then-observed failure path + else container started + Lobby->>Lobby: persist game metadata and runtime binding + Lobby->>GM: sync running-game registration + GM->>Engine: initial engine setup API + GM->>GM: initialize runtime state + GM-->>Lobby: registration result + Lobby->>Lobby: mark game running or paused + end +``` + +Critical rule: +if the container starts but `Lobby` cannot persist metadata, the launch is considered a full failure and the container must be removed. +If metadata is persisted but `Game Master` is unavailable, the game is placed into `paused` and administrators are notified. + +## 5. Running-game command flow + +```mermaid +sequenceDiagram + participant Client + participant Gateway + participant GM as Game Master + participant Lobby + participant Engine + + Client->>Gateway: game-related ExecuteCommand(game_id,...) 
+    Gateway->>GM: verified authenticated command
+    GM->>GM: check runtime status
+    GM->>Lobby: resolve/cached-check membership if needed
+    Lobby-->>GM: membership / permissions
+    GM->>Engine: game or runtime-admin API call
+    Engine-->>GM: result
+    GM-->>Gateway: response payload
+    Gateway-->>Client: signed response
+```
+
+## 6. Scheduled turn generation flow
+
+```mermaid
+sequenceDiagram
+    participant Scheduler as Game Master Scheduler
+    participant GM as Game Master
+    participant Engine
+    participant Lobby
+    participant Notify as Notification Service
+    participant Gateway
+
+    Scheduler->>GM: due turn slot reached
+    GM->>GM: switch runtime_status to generation_in_progress
+    GM->>Engine: generate next turn
+    alt generation success
+        Engine-->>GM: new turn result / maybe finished
+        GM->>GM: update current_turn and runtime state
+        GM->>Lobby: sync runtime snapshot
+        GM->>Notify: publish new-turn event
+        Notify->>Gateway: client-facing push events
+    else generation failed
+        Engine-->>GM: error / timeout
+        GM->>GM: mark generation_failed
+        GM->>Lobby: sync runtime snapshot
+        GM->>Notify: notify administrators only
+    end
+```
+
+Players receive only a lightweight push notification that a new turn exists.
+They then request their own per-player game state separately.
+
+If `force next turn` is used, the next scheduled slot is skipped so that the effective time between turns never becomes shorter than the schedule spacing.
+
+## 7. Game finish flow
+
+```mermaid
+sequenceDiagram
+    participant Engine
+    participant GM as Game Master
+    participant Lobby
+    participant Notify as Notification Service
+    participant Gateway
+
+    Engine->>GM: game finished
+    GM->>GM: update runtime state
+    GM->>Lobby: mark platform game finished
+    Lobby->>Lobby: finalize game record
+    GM->>Notify: publish finish event
+    Notify->>Gateway: push user-facing/platform events
+```
+
+## 8. Geo profile auxiliary flow
+
+```mermaid
+sequenceDiagram
+    participant Gateway
+    participant Geo
+    participant User
+    participant Auth
+
+    Gateway-->>Geo: async observation(user_id, device_session_id, ip_addr)
+    Geo->>Geo: derive observed_country and aggregates
+    alt suspicious multi-country pattern
+        Geo->>Auth: sync block suspicious session(s)
+    end
+    alt declared_country admin change approved later
+        Geo->>User: sync current declared_country update
+    end
+```
+
+This flow is intentionally fail-open relative to gameplay.
+
+## Separation of Platform Metadata and Engine State
+
+This distinction is fundamental.
+
+### Platform-level state
+
+Owned by `Game Lobby`:
+
+* who owns the game;
+* who is invited;
+* who applied;
+* who was approved;
+* who is currently a platform participant;
+* what the schedule is;
+* whether the game is public/private;
+* whether the game is `draft`, `running`, `paused`, `finished`, etc. as a platform entity.
+
+### Runtime/operational state
+
+Owned by `Game Master`:
+
+* current turn;
+* runtime status;
+* generation state;
+* engine reachability;
+* patch state;
+* mapping to engine player UUIDs;
+* engine version registry;
+* operational metadata of the running game.
+
+### Full game state
+
+Owned only by the game engine container:
+
+* actual per-player game state;
+* internal mechanics and progression;
+* player-visible game state snapshots;
+* win/lose logic;
+* domain truth of the game world.
+
+The platform must not attempt to duplicate the full game state outside the engine.
+
+## Versioning of Game Engines
+
+Every game runs on one specific game engine version.
+
+Rules:
+
+* active games stay on the version with which they were started;
+* upgrade during a running game is allowed only as a patch update within the same major/minor line;
+* game-engine version management is manual in v1;
+* each engine version may carry version-specific engine options;
+* `Game Master` owns the engine version registry and its internal API.
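The patch-only upgrade rule above can be illustrated with a minimal sketch. It is not the actual `Game Master` implementation: the names `EngineVersion` and `is_allowed_in_place_upgrade` are hypothetical, versions are assumed to be plain `MAJOR.MINOR.PATCH` strings, and the requirement that the target patch be strictly newer is an assumption the document does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineVersion:
    """Hypothetical value type for a plain MAJOR.MINOR.PATCH engine version."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "EngineVersion":
        # Pre-release and build metadata are deliberately out of scope here.
        parts = text.strip().split(".")
        if len(parts) != 3:
            raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
        major, minor, patch = (int(part) for part in parts)
        return cls(major, minor, patch)


def is_allowed_in_place_upgrade(current: str, target: str) -> bool:
    """Return True only for a patch bump within the same major/minor line.

    Assumption: the target patch must be strictly greater than the current
    one, so downgrades and no-op "upgrades" are rejected.
    """
    cur = EngineVersion.parse(current)
    new = EngineVersion.parse(target)
    same_line = (cur.major, cur.minor) == (new.major, new.minor)
    return same_line and new.patch > cur.patch
```

Under these assumptions, a running game on `1.4.2` could be patched to `1.4.3`, while `1.5.0` or `2.4.3` would only be available to newly started games.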
+
+## Administrative Access Model
+
+Two distinct external admin modes exist.
+
+### System administrator
+
+Uses a separate admin-facing REST surface via gateway and `Admin Service`.
+
+System administrator can:
+
+* manage public games;
+* see and operate on all private games;
+* inspect platform operational state;
+* launch, stop, patch, pause, and monitor games;
+* approve/reject participation in public games;
+* perform user/game administrative actions.
+
+### Private-game owner
+
+Uses the normal authenticated client protocol, not the separate system admin UI.
+
+Allowed owner-admin actions are limited to the owner’s own private games and include at least:
+
+* initiate enrollment;
+* distribute invite codes outside the system;
+* approve/reject applicants;
+* start game after enrollment;
+* force next turn while running;
+* stop game;
+* temporarily or permanently remove/block players from that game according to allowed policy.
+
+These operations use dedicated admin-related `message_type` values in the normal authenticated game/client protocol.

 ## Non-Goals

-The gateway is not a place for full domain authorization logic.
-It must not become a business “god service”.
+The architecture intentionally does not try to solve all future concerns now.

-The auth/session service is not the hot path for every authenticated request.
-The gateway should authenticate most requests from cache, not by synchronous round-trips.
+Current non-goals:
+
+* a separate global SQL storage layer in v1;
+* a separate policy engine;
+* automatic billing integration in v1;
+* automatic match balancing in v1;
+* direct external access to internal services;
+* pushing full per-player game state over notification channels;
+* allowing game engine containers to be called directly by clients or by services other than `Game Master`;
+* using `Auth / Session Service` as a hot synchronous dependency for all authenticated traffic;
+* making `Notification Service` the source of truth for notification preferences in v1.
+
+## Recommended Order of Service Implementation
+
+Recommended order for implementation is:
+
+1. **Edge Gateway Service** (implemented)
+   First public ingress, transport boundary, authentication boundary, signed request/response model, push delivery, session cache, replay protection.
+
+2. **Auth / Session Service** (implemented)
+   Public auth flow, `device_session`, revoke/block lifecycle, gateway session projection.
+
+3. **User Service**
+   Platform user identity, roles, tariffs/entitlements, user limits, settings, sanctions, and current `declared_country`.
+
+4. **Mail Service**
+   Internal email delivery for auth codes first, later for platform notifications.
+
+5. **Notification Service**
+   Unified async delivery of push and non-auth email notifications.
+
+6. **Game Lobby Service**
+   Platform game records, membership, invites, applications, approvals, schedules, user-facing lists, pre-start lifecycle.
+
+7. **Runtime Manager**
+   Dedicated Docker-control service for container start/stop/patch/status and technical runtime monitoring.
+
+8. **Game Master**
+   Running-game orchestration, engine version registry, runtime state, turn scheduler, engine API mediation, operational controls.
+
+9. **Admin Service**
+   Admin UI backend that orchestrates trusted APIs of other services.
+
+10. **Geo Profile Service** (planned)
+    Auxiliary geo aggregation, review recommendation, suspicious-session blocking, declared-country workflow.
+
+11. **Billing Service**
+    Late-stage payment and subscription source feeding entitlements into `User Service`.
+
+This order gives the platform a usable public perimeter first, then identity/auth, then core gameplay lifecycle, then runtime orchestration, and only afterward secondary auxiliary services.