scrabble-game/docs/ARCHITECTURE.md
Ilia Denisov 408da3f201
Stage 6: gateway edge (Connect/FlatBuffers over h2c, platform/email/guest auth, sessions, rate-limit, admin passthrough, live push bridge)
New public ingress and the first network edge. Framework + a vertical slice of
operations end-to-end; remaining ops reuse the same transcode pattern in Stage 7.

Contracts (new module scrabble/pkg):
- push.proto (backend->gateway gRPC server-stream) + scrabble.fbs (FlatBuffers
  edge payloads), committed generated Go; buf/flatc Makefiles (dev-time codegen).

Backend:
- REST handlers on the /api/v1 groups: internal session endpoints
  (telegram/guest/email login -> mint, resolve, revoke) and the user slice
  (profile, submit_play, state, lobby enqueue/poll, chat).
- internal/notify in-process Publisher hub + internal/pushgrpc gRPC server
  (BACKEND_GRPC_ADDR) streaming your_turn/opponent_moved/chat/nudge/match_found;
  emission in game.commit, social, matchmaker.
- migration 00005 accounts.is_guest; guests are durable rows excluded from stats;
  ProvisionGuest; email-as-login (RequestLoginCode/LoginWithCode).

Gateway (new module scrabble/gateway):
- Connect Gateway service over h2c (Execute + Subscribe), FlatBuffers<->JSON
  transcode registry, Telegram initData HMAC validator (seam), session cache,
  token-bucket rate limiter (3 classes), push fan-out hub, backend REST + push
  gRPC client, admin Basic-Auth reverse proxy.

go.work: use ./pkg, ./gateway + replace scrabble/pkg. CI: gateway/**, pkg/**
path filters; unit build/vet/test span all three modules. Docs (PLAN,
ARCHITECTURE, FUNCTIONAL+ru, TESTING, READMEs) updated; gateway/pkg unit tests +
guest/email-login integration tests.
2026-06-02 22:38:24 +02:00


Scrabble Game — Architecture

Source of truth for the platform architecture, transport, security model and cross-service contracts. User-visible behaviour per domain lives in FUNCTIONAL.md; the staged build order lives in ../PLAN.md. This document always describes the current design, not the history of how it was reached. Sections describing not-yet-implemented components are marked (planned).

1. Overview

Three executables plus per-platform side-services:

  • gateway — the only public ingress (module scrabble/gateway). Performs anti-abuse (rate limiting), authenticates the player against the originating platform (or an email/guest session), resolves the internal user_id, and forwards authenticated traffic to backend with an X-User-ID header. Hosts an admin surface behind HTTP Basic Auth. Bridges live events from backend to the client. The shared wire contracts (the push proto and the FlatBuffers edge payloads) live in scrabble/pkg, imported by both gateway and backend.
  • backend — internal-only service that owns every domain concern: identity/sessions, accounts and linking, lobby and matchmaking, the game runtime, the robot opponent, chat, notifications, statistics, history, and administration. Embeds the scrabble-solver engine as a library, in-process — there is no per-game container. The only network consumer of backend is gateway (plus platform side-services over an internal API).
  • ui (planned) — pure-HTML5 client (plain Svelte + Vite, static build). Talks to backend only through gateway. Embeddable in platform webviews; packageable to native (iOS/Android) via Capacitor.
  • platform/<name> (planned) — per-platform side-services (Telegram bot first): deep-link invites and platform-native push notifications. They talk to backend over an internal API.
flowchart LR
  Client((Client / webview)) -- Connect-RPC + FlatBuffers (h2c) --> Gateway
  Gateway -- REST/JSON, X-User-ID --> Backend
  Backend -- gRPC server-stream (live events) --> Gateway
  Gateway -- in-app stream --> Client
  Backend -- pgx --> Postgres[(Postgres)]
  Backend -. embeds .- Solver[[scrabble-solver library]]
  Telegram[Telegram bot side-service] -- internal API --> Backend

The MVP runs gateway and backend as single-instance processes inside a trusted network. No Redis is planned (anti-replay crypto was deliberately dropped). Horizontal scaling is explicit future work.

2. Transport

  • client ↔ gateway: Connect-RPC + FlatBuffers over HTTP/2 cleartext (h2c). Binary payloads, server-streaming for the in-app live channel, first-class JS clients (@connectrpc/connect-web + the flatbuffers npm package). The contract is kept minimal: a single Gateway service (defined in gateway/proto/edge/v1) with Execute(message_type, payload, request_id) for unary operations and Subscribe for the live stream. The proto envelope is a thin carrier; the real request/response and event bodies are FlatBuffers tables (pkg/fbs, the scrabblefb namespace) inside the payload bytes, which the gateway transcodes to and from the backend's JSON. The session token rides in the Authorization: Bearer header (there is no per-request signing, §3); auth operations are unauthenticated and return the minted token. A unary operation's domain outcome rides back in ExecuteResponse.result_code (HTTP 200); only edge failures (rate limit, missing session, unknown type, internal) surface as Connect error codes.
  • gateway ↔ backend (sync): plain HTTP REST/JSON. The gateway injects X-User-ID for authenticated requests; backend never re-derives identity from the body.
  • backend → gateway (live): a single gRPC server-stream carries live events (your-turn, opponent-moved, chat, nudge, match-found; see §10). The gateway bridges them to the client's in-app stream while the app is open. Out-of-app delivery uses platform-native push via the platform side-service.
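
The split between domain outcomes (carried in result_code) and edge failures (Connect error codes) can be illustrated with a minimal dispatch sketch. Everything here is hypothetical: the Registry and Handler types, the operation name and the result-code strings are illustration only and are not taken from the gateway's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnknownType models an edge failure. In a real gateway this would be
// mapped to a Connect error code instead of a result_code.
var ErrUnknownType = errors.New("unknown message type")

// Handler is a hypothetical per-operation transcoder: it receives the
// payload bytes and returns a domain result code plus a reply payload.
type Handler func(payload []byte) (resultCode string, reply []byte)

// Registry maps a message_type to its handler.
type Registry map[string]Handler

// Execute dispatches one unary operation. A domain rejection is a normal
// return with a non-"ok" result code; only a missing handler is an error.
func (r Registry) Execute(messageType string, payload []byte) (string, []byte, error) {
	h, ok := r[messageType]
	if !ok {
		return "", nil, fmt.Errorf("%w: %q", ErrUnknownType, messageType)
	}
	code, reply := h(payload)
	return code, reply, nil
}

func main() {
	reg := Registry{
		// Hypothetical operation: an empty payload is a domain rejection.
		"submit_play": func(p []byte) (string, []byte) {
			if len(p) == 0 {
				return "illegal_play", nil
			}
			return "ok", p
		},
	}
	code, _, err := reg.Execute("submit_play", nil)
	fmt.Println(code, err)
	_, _, err = reg.Execute("no_such_op", nil)
	fmt.Println(errors.Is(err, ErrUnknownType))
}
```

Keeping domain outcomes out of the error path lets the client treat every HTTP 200 reply uniformly and reserve retry or re-login handling for the small set of edge failures.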

3. Authentication & sessions

Platform-native, deliberately simple: no Ed25519 client keys, no per-request signing, no anti-replay crypto (these were considered and dropped — players arrive from a platform rather than completing a mandatory registration).

  • The gateway validates the originating credential once — the platform's signed launch data (e.g. Telegram initData HMAC), an email-code login, or a guest bootstrap — then mints a thin opaque server session token (session_id).
  • The client holds session_id in memory for the app session (browser/OS storage is optional and may be unavailable; losing it means re-login).
  • The gateway caches session → user_id and injects X-User-ID. Session records live in backend, which stores only a SHA-256 hash of the opaque token (never the plaintext), keeps a warmed in-memory cache for fast resolution, and treats sessions as revoke-only: they have no TTL and live until explicitly revoked (status revoked).
  • Guest = ephemeral web session (no platform, no email). A guest is backed by a durable accounts row flagged is_guest and carrying no identity — the row is a technical necessity (the sessions and game_players foreign keys require one, the same way the robot pool is durable), not a profile: no friends, statistics or history are kept for it, and it is restricted to auto-match. Platform and email users are auto-provisioned durable accounts with an identity. (Reaping abandoned guest rows is deferred — PLAN.md TODO-3.)
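
The hash-only session storage described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: SessionStore and its methods are hypothetical names, and revocation here deletes the entry, whereas the real table keeps the row and marks it revoked.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// SessionStore is a hypothetical revoke-only session table keyed by the
// SHA-256 hash of the opaque token. The plaintext token is never stored.
type SessionStore struct {
	mu     sync.Mutex
	byHash map[[sha256.Size]byte]string // SHA-256(token) -> user_id
}

func NewSessionStore() *SessionStore {
	return &SessionStore{byHash: make(map[[sha256.Size]byte]string)}
}

// Mint creates a random opaque token, stores only its hash, and returns
// the plaintext to the caller exactly once.
func (s *SessionStore) Mint(userID string) (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	token := hex.EncodeToString(raw)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byHash[sha256.Sum256([]byte(token))] = userID
	return token, nil
}

// Resolve hashes the presented token and looks up the owning user.
// There is no expiry check: sessions have no TTL.
func (s *SessionStore) Resolve(token string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	userID, ok := s.byHash[sha256.Sum256([]byte(token))]
	return userID, ok
}

// Revoke is the only way a session ends in this sketch.
func (s *SessionStore) Revoke(token string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.byHash, sha256.Sum256([]byte(token)))
}

func main() {
	store := NewSessionStore()
	token, err := store.Mint("user-123")
	if err != nil {
		panic(err)
	}
	userID, ok := store.Resolve(token)
	fmt.Println(userID, ok)
	store.Revoke(token)
	_, ok = store.Resolve(token)
	fmt.Println(ok)
}
```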

4. Accounts, identities, linking & merge

  • One internal account may carry several platform identities (telegram, vk, …) plus an optional email identity. First contact from a platform auto-provisions a durable account bound to that platform identity. Concretely, platform and email identities share one identities table keyed by a unique (kind, external_id); email is an identity with kind=email and a confirmed flag. A synthetic kind='robot' identity (Stage 5) backs each pooled robot opponent (§7). The email confirm-code flow (Stage 4) binds an email to the authenticated account: a 6-digit code (stored only as a SHA-256 hash, 15-minute TTL, ≤ 5 attempts) is sent through a Mailer seam (an SMTP relay, or a development log mailer when none is configured) and, once verified, attaches a confirmed email identity. An email already confirmed by another account is refused — adopting it would be a merge, which Stage 10 owns. Accounts and identities use application-generated UUIDv7 primary keys.
  • Linking is initiated from an authenticated profile: choose a platform → complete that platform's web-auth confirm → attach the identity to the current account.
  • Merge: if the identity being linked already has its own account with history, the two accounts are merged into the current one (the current account is primary): statistics are summed, games and friends are transferred, duplicates are de-duplicated, and the secondary account is retired. High blast-radius; an isolated, well-tested stage.
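
The email confirm-code constraints above (6 digits, SHA-256 hash only, 15-minute TTL, at most 5 attempts) can be sketched as a small value type. The Confirmation type, its fields and the use of Unix seconds are assumptions for illustration; the real email_confirmations schema may differ.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"math/big"
)

const (
	codeTTLSeconds = 15 * 60 // 15-minute TTL
	maxAttempts    = 5       // at most 5 verification attempts
)

// Confirmation is a hypothetical pending confirm-code record. Only the
// hash of the code is kept.
type Confirmation struct {
	CodeHash  [sha256.Size]byte
	ExpiresAt int64 // unix seconds
	Attempts  int
}

// NewConfirmation draws a uniform 6-digit code and returns the plaintext
// (to be mailed) together with the record to persist.
func NewConfirmation(nowUnix int64) (string, *Confirmation, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(1000000))
	if err != nil {
		return "", nil, err
	}
	code := fmt.Sprintf("%06d", n.Int64())
	c := &Confirmation{
		CodeHash:  sha256.Sum256([]byte(code)),
		ExpiresAt: nowUnix + codeTTLSeconds,
	}
	return code, c, nil
}

// Verify counts the attempt and compares hashes in constant time.
// Expired or exhausted confirmations are refused without comparison.
func (c *Confirmation) Verify(code string, nowUnix int64) bool {
	if nowUnix > c.ExpiresAt || c.Attempts >= maxAttempts {
		return false
	}
	c.Attempts++
	sum := sha256.Sum256([]byte(code))
	return subtle.ConstantTimeCompare(sum[:], c.CodeHash[:]) == 1
}

func main() {
	code, c, err := NewConfirmation(0)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(code), c.Verify(code, 60))
}
```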

5. Game engine integration (scrabble-solver)

backend embeds the solver library in-process behind internal/engine, the only package that imports scrabble-solver (see CLAUDE.md for the solver's public API and constraints). The engine is a self-contained rules library — no persistence, transport or scheduling; the game domain drives it. Key points:

  • Variants at launch: English Scrabble, Russian Scrabble, Эрудит (engine.Variant, mapping to rules.English() / RussianScrabble() / Erudit()). Эрудит's specifics (non-doubling centre, ё with no tiles, 3 blanks, a 15-point bonus) live entirely in the solver ruleset, so the engine treats every variant uniformly.
  • Dictionaries are committed DAWGs loaded with dawg.Load from the directory BACKEND_DICT_DIR; backend loads the engine.Registry at startup as a hard dependency (like migrations), so a missing dictionary fails the boot. The registry holds dictionaries in memory addressed by (variant, dict_version), tracking the latest version per variant, and answers the word-check tool through Registry.Lookup.
  • Dictionary versioning — pin per game. A game records the dict_version it started on and finishes on that version; new games use the latest. Multiple versions may be resident at once. An admin reload (planned, Stage 9) registers a new version through Registry.Load; delivery is the DAWG file in the image / a volume mounted at the dictionary directory. (A future split of the solver into engine + dictionary generator with versioned artifacts is recorded in ../PLAN.md TODO-2.)
  • Move generation/validation/scoring use Solver.GenerateMoves (ranked), Solver.ValidatePlay and Solver.ScorePlay; board mutation uses scrabble.Apply. The engine adds its own deterministic, seeded tile bag that can return tiles (an exchange needs this; the solver's self-play bag cannot).
  • engine.Game is the in-memory match state and the pure rules engine: it deals racks, applies legal plays / passes / exchanges / resignations, refills from the bag, keeps the scores and whose turn it is, and detects the end of the game: an empty bag with an empty rack or six consecutive scoreless turns (applying the end-game rack-value adjustment in either case), or a resignation. On a resignation the resigner keeps their accumulated score (no rack adjustment) and never wins: the win goes to the highest score among the remaining seats, unconditionally the other player in a two-player game. The engine exposes a decoded, solver-free API (SubmitPlay/SubmitExchange/EvaluatePlay/HintView/Hand) so internal/game drives it without importing the solver.
  • The game domain (internal/game) owns everything the engine does not — persistence, turn scheduling, the configurable turn timeout / auto-resign, the hint budget, word-check complaints, history and GCG — and is the engine's only consumer. Timeout auto-resign reuses engine.Resign, recording the move as a timeout, so it inherits the resignation win/loss.
  • History is dictionary-independent (§9.1): the engine emits decoded MoveRecords and reconstructs the board from them with engine.ReplayBoard (alphabet only, no dictionary).
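
The deterministic, seeded tile bag that can take tiles back (needed for an exchange) might look like the sketch below. It is not the engine's implementation: the Bag type, the choice of math/rand and the draw-then-return order in Exchange are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Bag is a hypothetical seeded tile bag. The same seed and the same
// sequence of calls always yield the same draws, which is what makes
// journal replay exact.
type Bag struct {
	tiles []rune
	rng   *rand.Rand
}

// NewBag copies the tile set and shuffles it with a seeded generator.
func NewBag(seed int64, tiles []rune) *Bag {
	b := &Bag{
		tiles: append([]rune(nil), tiles...),
		rng:   rand.New(rand.NewSource(seed)),
	}
	b.shuffle()
	return b
}

func (b *Bag) shuffle() {
	b.rng.Shuffle(len(b.tiles), func(i, j int) {
		b.tiles[i], b.tiles[j] = b.tiles[j], b.tiles[i]
	})
}

// Len reports how many tiles remain.
func (b *Bag) Len() int { return len(b.tiles) }

// Draw removes up to n tiles from the end of the bag.
func (b *Bag) Draw(n int) []rune {
	if n > len(b.tiles) {
		n = len(b.tiles)
	}
	cut := len(b.tiles) - n
	out := append([]rune(nil), b.tiles[cut:]...)
	b.tiles = b.tiles[:cut]
	return out
}

// Exchange draws replacements first, then returns the swapped tiles to
// the bag and reshuffles, so a player cannot redraw their own tiles.
func (b *Bag) Exchange(swapped []rune) []rune {
	fresh := b.Draw(len(swapped))
	b.tiles = append(b.tiles, swapped...)
	b.shuffle()
	return fresh
}

func main() {
	bag := NewBag(42, []rune("AABCDEEFGHIIJ"))
	rack := bag.Draw(7)
	fmt.Println(len(rack), bag.Len())
	fresh := bag.Exchange(rack[:2])
	fmt.Println(len(fresh), bag.Len())
}
```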

6. Game rules

  • Word legality: validate-at-submit. An illegal play is rejected by Solver.ValidatePlay; there is no challenge phase.
  • End of game: the bag is empty and a player empties their rack, or 6 consecutive scoreless turns (passes/exchanges), or a resignation, or a missed turn. The per-game turn timeout is chosen at creation (5/10/15/30 min, 1/2/3/6/12/24 h; default 24 h); a turn not made within it becomes an automatic resignation, applied by a background sweeper. The sweeper honours each player's away window, a daily local-time sleep interval on the account (default 00:00–07:00, midnight-cross aware), so a player is never timed out while asleep.
  • Players: auto-match is always 2 players; friend games are 2–4 players. backend owns turn order and the bag for any player count. A resignation or timeout in a two-player game ends it with the other player winning. In a game with three or more seats a resignation or timeout drops that seat and the rest play on: the engine skips the resigned seat in the turn rotation and excludes it from the win, finishing the game (the sole survivor wins) only once one active seat remains, or by the ordinary end conditions among the active seats. A per-game drop-out tile disposition, chosen at creation (dropout_tiles: remove from play, the default, or return to the bag), governs the leaver's rack, which is never revealed to the remaining players; it is recorded for deterministic journal replay. (Two-player games end on the first drop-out, so the disposition does not affect them.)
  • Hint: governed by two per-game settings — whether hints are allowed and the starting per-player allowance — plus a per-account hint wallet (hint_balance, spent after the allowance; top-ups are a later feature). A hint reveals the top-1 ranked move (GenerateMoves[0]). The lobby/tournament caller picks the per-game defaults (e.g. one in casual random games, none in tournaments).
  • Word-check tool: unlimited dictionary lookups against the game's pinned dictionary; each result offers a complaint (complainant, game, variant, dict_version, word, the disputed result, an optional note) that lands in an admin review queue (admin side planned, Stage 9).
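
The midnight-cross handling of the away window can be shown with a small check on minutes since local midnight. This is a sketch under assumptions: the function names are hypothetical, the window end is treated as exclusive, and shouldTimeOut is only one possible reading of how the sweeper honours the window (it simply does not fire while the player is away).

```go
package main

import "fmt"

// inAwayWindow reports whether minute-of-day m (0..1439, player's local
// time) falls inside [start, end). When start > end the window crosses
// midnight, e.g. 23:00 to 07:00.
func inAwayWindow(start, end, m int) bool {
	if start == end {
		return false // empty window
	}
	if start < end {
		return m >= start && m < end
	}
	return m >= start || m < end
}

// shouldTimeOut is a hypothetical sweeper decision: the turn is overdue
// and the player is not currently inside their away window.
func shouldTimeOut(elapsedMin, timeoutMin, awayStart, awayEnd, localMinute int) bool {
	return elapsedMin >= timeoutMin && !inAwayWindow(awayStart, awayEnd, localMinute)
}

func main() {
	// Default window 00:00 to 07:00, checked at 03:30 and at 12:00.
	fmt.Println(inAwayWindow(0, 7*60, 3*60+30), inAwayWindow(0, 7*60, 12*60))
	// A 24 h timeout that has elapsed, evaluated at 03:30 local time.
	fmt.Println(shouldTimeOut(24*60, 24*60, 0, 7*60, 3*60+30))
}
```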

7. Robot opponent

Substitutes for a human in 2-player auto-match when the pool yields no human within 10 seconds (§8). It lives in internal/robot and plays as an ordinary seated account through the game service, so only internal/engine imports the solver. It is designed to be indistinguishable from a person.

The robot keeps no per-game state: every choice is derived deterministically from the game's bag seed (a restart-stable FNV-1a mix), so a background driver (robot.Service.Run, mirroring the turn-timeout sweeper) recomputes the same behaviour on every scan and after a restart — the same philosophy as journal replay. A pool of durable accounts — each a kind='robot' identity (§4), provisioned at startup with chat and friend requests blocked — backs the human-like name pool; those two profile toggles are all the friend/DM blocking requires (there is no DM surface; chat is per-game).

  • Balance: at game start it decides once whether to play to win, with P(play-to-win) ≈ 0.40 (so the human wins ≈ 60%), derived from the seed. Adaptive difficulty is post-MVP.
  • Margin targeting: each turn it picks from the ranked candidates (engine.Candidates) the move whose resulting lead (playing to win) or deficit (playing to lose) is closest to a small band (1–30 points), rather than always the maximum; with no legal play it exchanges a full rack when the bag can refill it, else passes.
  • Timing: per-move delay sampled from a right-skewed distribution (short delays frequent, median ≈ 10 min), clamped to [2, 90] minutes; it sleeps 00:00–07:00 anchored to the opponent's profile timezone with a per-game drift of ±3 h (fallback UTC), so its night overlaps the human's rather than running anti-phase; on a daytime nudge it replies within 2–10 minutes; it proactively nudges the human after 12 hours idle (subject to the once-per-hour chat limit).
  • Observability: robot accounts accrue ordinary statistics (§9) — the authoritative balance metric (target ≈ 40% robot wins) — and a robot_games_finished_total OTel counter plus a per-finish log give a live view.
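
The stateless, seed-derived behaviour and the margin targeting described above can be sketched together. The helper names, the label string fed into the hash, the byte order and the band of 1 to 30 points are assumptions; only the use of FNV-1a and the 0.40 probability come from the text.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// mix derives a stable 64-bit value from the bag seed and a label using
// FNV-1a, so the same game always yields the same decision.
func mix(seed uint64, label string) uint64 {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], seed)
	h := fnv.New64a()
	h.Write(buf[:])
	h.Write([]byte(label))
	return h.Sum64()
}

// playsToWin decides once per game, with probability of roughly 0.40.
func playsToWin(seed uint64) bool {
	return mix(seed, "play-to-win")%100 < 40
}

// distanceToBand is 0 inside [lo, hi], else the gap to the nearest bound.
func distanceToBand(v, lo, hi int) int {
	switch {
	case v < lo:
		return lo - v
	case v > hi:
		return v - hi
	default:
		return 0
	}
}

// pickByMargin returns the index of the candidate whose resulting lead
// (toWin) or deficit (!toWin) is closest to the band. scores are the
// candidates' move scores in ranked order; lead is the robot's score
// minus the human's before the move. Ties keep the higher-ranked move.
// It returns -1 when there are no candidates.
func pickByMargin(scores []int, lead int, toWin bool, lo, hi int) int {
	best, bestDist := -1, 0
	for i, s := range scores {
		margin := lead + s // robot's lead after this move
		if !toWin {
			margin = -margin // measure the deficit instead
		}
		if d := distanceToBand(margin, lo, hi); best == -1 || d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

func main() {
	seed := uint64(20260602)
	toWin := playsToWin(seed)
	fmt.Println(toWin, pickByMargin([]int{40, 22, 8}, 0, toWin, 1, 30))
}
```

Because nothing is stored, a background driver can call these functions on every scan and after a restart and still arrive at the same choice.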

8. Lobby & social

  • Matchmaking: an in-memory FIFO pool keyed by variant (the variant fixes the board language), pairing the next two humans into a two-player auto-match with the seat order randomised for first-move fairness. The pool is lost on restart (players re-queue) and is anonymous, so it does not consult blocks. After 10 s with no human a background reaper substitutes a pooled robot (§7) and starts the game. On a pairing or substitution the matchmaker emits a match-found notification (§10), delivered over the live stream; Poll remains as a fallback for a client that is not currently streaming.
  • Friends: a request → accept graph (one friendships table) — add by friend list or internal ID now, by platform deep-link with Stage 8. Declining or cancelling removes the pending request; blocking someone severs an existing friendship.
  • Block: two independent global account toggles (block_chat, block_friend_requests) plus a per-user block list. A per-user block is applied mutually: it hides the pair's chat from each other and refuses friend requests and game invitations between them.
  • Friend games: formed by invitation → accept (a game_invitations record with one row per invitee). The 2–4 player game starts once every invitee accepts; any decline cancels the invitation, and a pending invitation expires after 7 days (enforced lazily on access).
  • Chat: per-game, persisted (kept with the game's archive), ≤ 60 runes, and validated on input — links, email addresses and phone numbers (including lightly obfuscated forms) are rejected, since the chat is for quick reactions, not contact exchange. Each message stores the sender's IP (forwarded by the gateway in Stage 6) for moderation. A sender who has disabled chat cannot post, and messages from a blocked sender are hidden from the viewer.
  • Nudge: folded into the chat as a nudge message kind. The player awaiting the opponent may nudge once per hour per game; it is not allowed on one's own turn. The platform-native delivery is wired with the gateway / platform side-service (Stage 6 / 8).
  • Profile: preferred_language (en/ru), display name, email (confirm-code binding, see §4), timezone (drives the away window and the robot's sleep; user-editable), the daily away window and the block toggles — all editable through account.UpdateProfile. Linked platform accounts and merge are Stage 10.
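
The matchmaking pool described above (FIFO per variant, robot substitution after 10 s) can be sketched in a few lines. The Pool type and its methods are hypothetical; the sketch is single-goroutine, omits the seat-order randomisation, and takes time as Unix seconds so it stays deterministic.

```go
package main

import "fmt"

// robotAfterSeconds is the wait after which a robot is substituted.
const robotAfterSeconds = 10

type entry struct {
	accountID string
	since     int64 // unix seconds at enqueue
}

// Pool is a hypothetical in-memory FIFO pool keyed by variant. Nothing
// is persisted, so a restart empties it.
type Pool struct {
	waiting map[string][]entry
}

func NewPool() *Pool { return &Pool{waiting: make(map[string][]entry)} }

// Enqueue pairs the caller with the longest-waiting player for the same
// variant, or queues the caller when nobody is waiting.
func (p *Pool) Enqueue(variant, accountID string, now int64) (opponent string, matched bool) {
	q := p.waiting[variant]
	if len(q) > 0 {
		p.waiting[variant] = q[1:]
		return q[0].accountID, true
	}
	p.waiting[variant] = append(q, entry{accountID, now})
	return "", false
}

// Reap removes and returns every account that has waited at least
// robotAfterSeconds; the caller would seat a pooled robot against each.
func (p *Pool) Reap(now int64) []string {
	var due []string
	for variant, q := range p.waiting {
		kept := q[:0]
		for _, e := range q {
			if now-e.since >= robotAfterSeconds {
				due = append(due, e.accountID)
			} else {
				kept = append(kept, e)
			}
		}
		p.waiting[variant] = kept
	}
	return due
}

func main() {
	pool := NewPool()
	pool.Enqueue("english", "alice", 0)
	opponent, matched := pool.Enqueue("english", "bob", 3)
	fmt.Println(opponent, matched)
	pool.Enqueue("erudit", "carol", 4)
	fmt.Println(pool.Reap(20))
}
```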

9. Persistence

  • Single Postgres database, schema backend; backend is the only writer. The "pgx pool" is a database/sql handle backed by the pgx stdlib driver and instrumented with otelsql; type-safe queries use go-jet (code generated into internal/postgres/jet and committed, regenerated by cmd/jetgen). Migrations are embedded SQL applied with pressly/goose/v3 at startup. Primary keys are application-generated UUIDv7.
  • Tables: accounts (durable internal accounts; Stage 3 added the away-window columns away_start/away_end and the hint wallet hint_balance; Stage 6's migration 00005 added the is_guest flag for ephemeral guest rows), identities (platform/email/robot identities, unique (kind, external_id); Stage 5's migration 00004 admits the robot kind), sessions (revoke-only opaque-token hashes), the Stage 3 game tables games (Stage 4 added the dropout_tiles disposition column), game_players, game_moves (the move journal), complaints and account_stats, and the Stage 4 social/lobby tables friendships (the request/accept graph), blocks (per-user blocks), chat_messages (per-game chat and nudges), email_confirmations (pending confirm-codes) and game_invitations / game_invitation_invitees (friend-game invitations). The matchmaking pool is in-memory and persists nothing.
  • Active games are event-sourced. A game is a games row (pinned variant/dict_version, bag seed, the per-game settings, and a denormalised turn cursor) plus an append-only, decoded move journal (game_moves); the live position is an engine.Game held in an in-memory cache (≈24 h idle TTL) and rebuilt by replaying the journal on a miss, which the seeded bag makes exact. Each game is serialised by a per-game lock; a persistence failure evicts the live game so the next access rebuilds from the journal. game_players records each seat's account, running score, hints used and winner flag.
  • Statistics (account_stats, recomputed on each finish for durable non-guest accounts only — the finish-time recompute skips any is_guest seat): wins, losses, draws, max points in a game, and max points for a single move (which already folds in every word the move formed plus the all-tiles bonus). A tie increments draws only; a resignation or timeout is a loss for the acting player.

9.1 History invariant (must hold forever)

Archived games must replay independently of any dictionary and of the solver's internal encoding — at least visually. Therefore the move journal persists only decoded concrete values: action kind (play / pass / exchange / resign / timeout), acting player, per-move score and running total, timestamp, and — in a per-move JSON payload — the acting player's rack before the move (with ? for a blank), and for a play its direction, main-word anchor, placed tiles (letter as text, coordinate, blank flag) and the words formed; for an exchange, the swapped tiles. This is exactly what is needed both to replay the game through the engine (a cache miss) and to render history or emit GCG without a dictionary: the board for visual replay is reconstructed by applying placements onto an empty grid, since moves were validated at play time and scores are stored. variant and dict_version are kept as metadata only (audit, complaint review), never as a replay dependency. GCG export is derived from the same rows and is likewise self-contained — we ship our own writer (the solver exposes none): the standard Poslfit dialect (UTF-8, #player/#lexicon pragmas, 8G/H8 coordinates, lower-case blanks, . pass-throughs, -TILES exchanges), plus #note lines for resignations and timeouts, which the standard does not cover.
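
The dictionary-free visual replay amounts to applying stored placements onto an empty grid. The sketch below assumes a 15×15 board and hypothetical MoveRecord and Placement shapes; the real journal payload and engine.ReplayBoard may differ. Blanks are rendered in lower case, following the GCG convention mentioned above.

```go
package main

import (
	"fmt"
	"strings"
)

const boardSize = 15 // assumed board dimension

// Placement is a hypothetical decoded tile placement from the journal.
type Placement struct {
	Row, Col int
	Letter   string // the letter as text, never a solver-internal code
	Blank    bool
}

// MoveRecord is a hypothetical journal row reduced to what replay needs.
type MoveRecord struct {
	Kind   string // "play", "pass", "exchange", "resign", "timeout"
	Placed []Placement
}

// replayBoard rebuilds the visual board from the journal alone. No
// dictionary is consulted because moves were validated at play time.
func replayBoard(moves []MoveRecord) [boardSize][boardSize]string {
	var board [boardSize][boardSize]string
	for _, m := range moves {
		if m.Kind != "play" {
			continue // only plays put tiles on the board
		}
		for _, p := range m.Placed {
			if p.Row < 0 || p.Row >= boardSize || p.Col < 0 || p.Col >= boardSize {
				continue // ignore out-of-range coordinates defensively
			}
			letter := p.Letter
			if p.Blank {
				letter = strings.ToLower(letter)
			}
			board[p.Row][p.Col] = letter
		}
	}
	return board
}

func main() {
	journal := []MoveRecord{
		{Kind: "play", Placed: []Placement{
			{Row: 7, Col: 7, Letter: "C"},
			{Row: 7, Col: 8, Letter: "A", Blank: true},
			{Row: 7, Col: 9, Letter: "T"},
		}},
		{Kind: "pass"},
	}
	board := replayBoard(journal)
	fmt.Println(board[7][7] + board[7][8] + board[7][9])
}
```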

10. Notifications

Two channels: the in-app live stream (delivered from Stage 6) and platform-native push (out-of-app, via the platform side-service — Stage 8). The backend emits notification intents through an in-process hub (internal/notify, a Publisher seam installed on the game, social and lobby services); a single backend→gateway gRPC server-stream (Push.Subscribe, pkg/proto/push/v1) carries every event, and the gateway fans them out by user_id to each client's Connect Subscribe stream while the app is open. The catalog is your-turn and opponent-moved (emitted from the game commit, so robot-driver and timeout-sweeper moves emit too), chat-message and nudge (from the social service), and match-found (from the matchmaker, §8). Event payloads are FlatBuffers-encoded by the backend and forwarded verbatim. A client that is not currently streaming falls back to the matchmaker's Poll for match-found. Out-of-app platform push (your-turn, nudge) is wired in Stage 8; session-revocation events and cursor-based stream resume are deferred (single-instance MVP).
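
The gateway-side fan-out by user_id can be sketched as a small hub of per-subscriber channels. The Hub and Event types, the buffer size and the drop-on-full policy are assumptions for illustration, not the gateway's actual behaviour.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical live event; Payload would carry the
// FlatBuffers bytes forwarded verbatim.
type Event struct {
	UserID  string
	Kind    string
	Payload []byte
}

// Hub fans events out to every open stream of the addressed user.
type Hub struct {
	mu   sync.Mutex
	subs map[string]map[chan Event]struct{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[string]map[chan Event]struct{})}
}

// Subscribe registers one client stream and returns its channel plus a
// cancel function that unregisters and closes it.
func (h *Hub) Subscribe(userID string) (<-chan Event, func()) {
	ch := make(chan Event, 16)
	h.mu.Lock()
	if h.subs[userID] == nil {
		h.subs[userID] = make(map[chan Event]struct{})
	}
	h.subs[userID][ch] = struct{}{}
	h.mu.Unlock()

	cancel := func() {
		h.mu.Lock()
		defer h.mu.Unlock()
		if _, ok := h.subs[userID][ch]; ok {
			delete(h.subs[userID], ch)
			close(ch) // safe: Publish sends only while holding the lock
		}
	}
	return ch, cancel
}

// Publish delivers the event to each of the user's streams and returns
// how many accepted it. A full buffer drops the event rather than
// blocking the backend stream.
func (h *Hub) Publish(ev Event) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for ch := range h.subs[ev.UserID] {
		select {
		case ch <- ev:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	hub := NewHub()
	stream, cancel := hub.Subscribe("alice")
	fmt.Println(hub.Publish(Event{UserID: "alice", Kind: "your_turn"}))
	fmt.Println((<-stream).Kind)
	cancel()
	fmt.Println(hub.Publish(Event{UserID: "alice", Kind: "nudge"}))
}
```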

11. Observability

  • Structured logging with go.uber.org/zap (JSON). OpenTelemetry tracer and meter providers are wired (Stage 1), env-gated by BACKEND_OTEL_{TRACES,METRICS}_EXPORTER with a default of none (so no collector is required locally or in CI); stdout is available for debugging and the Postgres pool is instrumented with otelsql. OTLP export, a Prometheus pull endpoint, and dashboards arrive with the first real workload.
  • Per-request server-side timing via gin middleware from day one (the access log carries method, route, status, latency and the active trace id). A client-measured RTT piggybacked on the next request is a later enhancement.
  • Unauthenticated GET /healthz (liveness) and GET /readyz (readiness — the database answers a bounded ping and the session cache is warmed).
  • The backend serves a second listener — a gRPC server (BACKEND_GRPC_ADDR, default :9090) for the live-event push stream to the gateway — alongside the HTTP listener; both start together and stop on signal.

12. Security boundaries

| Concern | Enforced by |
|---|---|
| Public rate limiting / anti-abuse | gateway |
| Platform credential validation, session minting | gateway |
| Session → user_id resolution, X-User-ID injection | gateway |
| Authorisation, ownership, state transitions | backend (X-User-ID is the sole identity input) |
| Admin authentication | gateway validates HTTP Basic Auth (GATEWAY_ADMIN_*), then reverse-proxies to backend admin endpoints |
| backend ↔ gateway | trust the network (only gateway may reach backend) |

This is an explicit, accepted MVP risk: compromise of the gateway↔backend network segment defeats backend authentication. Mitigated by network isolation; mutual auth is a future hardening step.
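
One common way to implement the gateway's public rate limiting is a token bucket. The sketch below is an assumption-laden illustration: the Bucket type, its parameters and the use of seconds as a float are hypothetical, and a real limiter would keep one bucket per client key and per limit class.

```go
package main

import "fmt"

// Bucket is a hypothetical token bucket. It holds up to capacity tokens
// and refills continuously at perSecond tokens per second.
type Bucket struct {
	capacity  float64
	perSecond float64
	tokens    float64
	last      float64 // time of the last refill, in seconds
}

// NewBucket starts full, so a fresh client may burst up to capacity.
func NewBucket(capacity, perSecond, now float64) *Bucket {
	return &Bucket{capacity: capacity, perSecond: perSecond, tokens: capacity, last: now}
}

// Allow refills for the elapsed time, then spends one token if it can.
func (b *Bucket) Allow(now float64) bool {
	if elapsed := now - b.last; elapsed > 0 {
		b.tokens += elapsed * b.perSecond
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// Burst of 2, refilling one request per second.
	b := NewBucket(2, 1, 0)
	fmt.Println(b.Allow(0), b.Allow(0), b.Allow(0), b.Allow(1))
}
```

Passing the clock in as a parameter keeps the limiter deterministic under test; production code would read a monotonic clock instead.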

13. Deployment (informational)

Single public origin, path-routed: the UI, the gateway public surface and the admin surface share one host that terminates TLS. MVP runs one gateway, one backend, one Postgres. Docker/compose environments are introduced when there is something to deploy.

14. CI & branches

  • Trunk is master; feature work happens on feature/* branches merged via PR with a green CI gate (from Stage 1 onward — the genesis commit necessarily lands on master).
  • .gitea/workflows/ holds the CI. go-unit.yaml runs gofmt/vet/build/unit-test on Go changes; integration.yaml runs the Postgres-backed tests behind the integration build tag (testcontainers postgres:17-alpine, Ryuk disabled, serial). Further workflows (ui-test, deploy) are added with the components they cover.
  • Since Stage 2 both Go workflows clone the public scrabble-solver sibling (master HEAD, no credentials) into ../scrabble-solver before building, so the go.work replace resolves; the engine tests read the committed DAWGs from that checkout via BACKEND_DICT_DIR.
  • After any push, the run is watched to green before a stage is declared done (python3 ~/.claude/bin/gitea-ci-watch.py).