Scrabble Game — Architecture
Source of truth for the platform architecture, transport, security model and
cross-service contracts. User-visible behaviour per domain lives in
FUNCTIONAL.md; the staged build order lives in
../PLAN.md. This document always describes the current
design, not the history of how it was reached. Sections describing
not-yet-implemented components are marked (planned).
1. Overview
Three executables plus per-platform side-services:
- gateway — the only public ingress (module scrabble/gateway). Performs anti-abuse (rate limiting), authenticates the player against the originating platform (or an email/guest session), resolves the internal user_id, and forwards authenticated traffic to backend with an X-User-ID header. Serves the backend's admin console at /_gm on its public listener behind HTTP Basic Auth. Bridges live events from backend to the client. The shared wire contracts (the push proto and the FlatBuffers edge payloads) live in scrabble/pkg, imported by both gateway and backend.
- backend — internal-only service that owns every domain concern: identity/sessions, accounts and linking, lobby and matchmaking, the game runtime, the robot opponent, chat, notifications, statistics, history, and administration. Embeds the scrabble-solver engine as a library, in-process — there is no per-game container. The only network consumer of backend is gateway (plus platform side-services over an internal API).
- ui — pure-HTML5 client (plain Svelte 5 + TypeScript + Vite, static build; no SvelteKit). Talks to backend only through gateway over Connect-RPC + FlatBuffers, with the edge TS bindings generated from the same edge.proto and scrabble.fbs and committed under ui/src/gen/. The client covers auth, "my games", auto-match, the board (play/pass/exchange/resign), hint, word-check, chat/nudge, the live stream, i18n (en/ru) and a profile view, plus the social/account/history surfaces. There is no board on the wire — the client reconstructs the 15×15 board by replaying the move journal (§9.1) and renders board, tiles, premium squares and effects as pure CSS + Unicode (no image/font/SVG assets). Tiles are placed by Pointer-Events drag or tap; a CSS-token theme is light/dark and Telegram-themeParams-ready; navigation is a hash router and the session token is held in memory + IndexedDB. A build-flagged in-memory mock transport (pnpm start) runs the whole slice with no backend. Embeddable in platform webviews; packageable to native (iOS/Android) via Capacitor. The client uses a mobile-app shell (a growing nav bar; content pinned to the bottom), a one-line announcement banner under the nav (a client-side mock rotation, gated off in the build until polished after release — a server-driven channel later, §10), and a client board-style setting (bonus-label mode). The visual/interaction design system is documented in UI_DESIGN.md.
- platform/telegram — the Telegram side-service (the "connector", module scrabble/platform/telegram). It is the only component holding the bot tokens — one bot per service language (en/ru), each its own token + game channel, the same Telegram user id spanning both (§3). It runs a Bot API long-poll loop per bot (Mini App launch + /start deep-links) and serves a gRPC API (pkg/proto/telegram/v1) that gateway (Mini App initData validation and out-of-app push) and backend (operator broadcasts) call over the trusted internal network. Its generic delivery methods are platform-agnostic (keyed by the identity external_id), so a future VK/MAX connector reuses them; only initData validation is Telegram-specific. It runs in its own container, egressing to Telegram through a VPN sidecar.
flowchart LR
Client((Client / webview)) -- Connect-RPC + FlatBuffers (h2c) --> Gateway
Gateway -- REST/JSON, X-User-ID --> Backend
Backend -- gRPC server-stream (live events) --> Gateway
Gateway -- in-app stream --> Client
Backend -- pgx --> Postgres[(Postgres)]
Backend -. embeds .- Solver[[scrabble-solver library]]
Gateway -- gRPC (validate initData, out-of-app push) --> Telegram[Telegram connector]
Backend -. operator broadcasts (gRPC) .-> Telegram
Telegram -- Bot API (via VPN sidecar) --> TgCloud((Telegram))
The MVP runs gateway and backend as single-instance processes inside a
trusted network. No Redis is planned (anti-replay crypto was deliberately
dropped). Horizontal scaling is explicit future work.
2. Transport
- client ↔ gateway: Connect-RPC + FlatBuffers over HTTP/2 cleartext (h2c). Binary payloads, server-streaming for the in-app live channel, first-class JS clients (@connectrpc/connect-web + the flatbuffers npm package). The contract is kept minimal: a single Gateway service (defined in gateway/proto/edge/v1) with Execute(message_type, payload, request_id) for unary operations and Subscribe for the live stream. The proto envelope is a thin carrier; the real request/response and event bodies are FlatBuffers tables (pkg/fbs, the scrabblefb namespace) inside the payload bytes, which the gateway transcodes to and from the backend's JSON. The session token rides in the Authorization: Bearer header (there is no per-request signing, §3); auth operations are unauthenticated and return the minted token. A unary operation's domain outcome rides back in ExecuteResponse.result_code (HTTP 200); only edge failures (rate limit, missing session, unknown type, internal) surface as Connect error codes. The client treats a connectivity edge failure as state, not a per-call toast: a transport unavailable or a rate_limited flips a global online signal that drives a header "Connecting…" spinner and softly disables proactive actions, and the transport auto-retries with capped exponential backoff — every op on a rate-limit (the gateway rejected it before processing, so it is safe), but only read-only ops on unavailable (a mutation is never blindly re-sent, to avoid double-applying one whose response was lost — its button is disabled while offline and the player re-issues it on reconnect). A reachability watcher (a lightweight profile.get probe) clears the signal when no other traffic is in flight; the live Subscribe stream's drop/recovery feeds the same signal. Edge hardening: every request body on the public listener is capped at GATEWAY_MAX_BODY_BYTES (default 1 MiB — far above any legitimate payload), both at the HTTP layer (http.MaxBytesReader) and as the Connect per-message read limit, so an oversized Execute is refused (resource_exhausted) without buffering. The h2c server carries explicit sizing: MaxConcurrentStreams 250 (the x/net default made visible — a real client holds one Subscribe stream plus a few unary calls) and a 3-minute connection IdleTimeout (a live Subscribe stream keeps its connection active, so only abandoned connections are reaped); the http.Server sets only ReadHeaderTimeout (10 s) — Read/WriteTimeout would kill the stream.
- Alphabet on the wire: live play exchanges alphabet indices, not concrete letters. The rack (StateView.rack), the SubmitPlay/Evaluate tiles, the Exchange tiles and the CheckWord word are ubyte indices into the variant's alphabet (a blank is the sentinel index 255). The client is alphabet-agnostic: on a per-variant cache miss it sets StateRequest.include_alphabet, and the backend embeds the variant's (index, letter, value) table (engine.AlphabetTable, derived from the solver ruleset — no dictionary) for display; the client caches it by variant and renders the rack and the blank chooser from it. The backend maps index↔letter at its REST edge, so the gateway forwards indices verbatim (it holds no alphabet table) and the engine's letter-based domain API — shared with the robot — is unchanged. The table is pinned by the solver version, so it cannot drift from the running backend. The move journal, history and GCG are unaffected (they stay decoded concrete characters, §9.1).
- gateway ↔ backend (sync): plain HTTP REST/JSON. The gateway injects X-User-ID for authenticated requests; backend never re-derives identity from the body.
- backend → gateway (live): a single gRPC server-stream carries live events (your-turn, opponent-moved, chat, nudge). The gateway bridges them to the client's in-app stream while the app is open. Out-of-app delivery uses platform-native push via the platform side-service.
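The HTTP-layer half of the edge hardening above can be sketched with the standard library alone. This is a minimal illustration, not the gateway's actual code: capBody, echoSize, statusFor, newServer and the maxBodyBytes constant are hypothetical names, and the Connect per-message read limit is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// maxBodyBytes mirrors the documented 1 MiB default; the constant name is hypothetical.
const maxBodyBytes = 1 << 20

// capBody wraps a handler so every request body is limited to limit bytes.
func capBody(limit int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, limit)
		next.ServeHTTP(w, r)
	})
}

// echoSize drains the body and reports its size, or answers 413 when the cap is hit.
func echoSize(w http.ResponseWriter, r *http.Request) {
	n, err := io.Copy(io.Discard, r.Body)
	var tooBig *http.MaxBytesError
	if errors.As(err, &tooBig) {
		http.Error(w, "body too large", http.StatusRequestEntityTooLarge)
		return
	}
	fmt.Fprintf(w, "%d", n)
}

// statusFor posts size bytes through a capped handler and returns the HTTP status.
func statusFor(limit int64, size int) int {
	h := capBody(limit, http.HandlerFunc(echoSize))
	body := strings.NewReader(strings.Repeat("x", size))
	req := httptest.NewRequest(http.MethodPost, "/", body)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

// newServer sets only ReadHeaderTimeout, leaving Read/WriteTimeout unset so a
// long-lived stream is not cut off by the server.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{Addr: addr, Handler: h, ReadHeaderTimeout: 10 * time.Second}
}

func main() {
	fmt.Println(statusFor(maxBodyBytes, 512)) // well under the cap
	fmt.Println(statusFor(16, 64))            // over a deliberately tiny cap
	_ = newServer(":8080", capBody(maxBodyBytes, http.HandlerFunc(echoSize)))
}
```

Wrapping the body before the handler runs means the limit applies while reading, so an oversized request is rejected without being buffered in full.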
3. Authentication & sessions
Platform-native, deliberately simple: no Ed25519 client keys, no per-request signing, no anti-replay crypto (these were considered and dropped — players arrive from a platform rather than completing a mandatory registration).
- The gateway validates the originating credential once — Telegram initData (delegated to the connector's ValidateInitData RPC, which holds the bot token — the HMAC secret — so it never reaches the gateway), an email-code login, or a guest bootstrap — then mints a thin opaque server session token (session_id). First Telegram contact seeds the new account's language (from the launch language_code) and display name (§4).
- Service language & variant gating. The connector hosts one bot per service language (en/ru), each its own token + game channel; the same Telegram user id spans both. ValidateInitData tries each token in turn and returns the validating bot's service language and its supported-languages set. The set rides the Session (FlatBuffers, session-scoped, not persisted): the UI offers only the variants those languages support on New Game (en → English; ru → Russian + Эрудит). Starting a new game is the only gated action — opening and playing existing games of any language is unrestricted, and the backend does not enforce the gate (it is a product affordance, not a trust boundary). The service language is persisted per account (accounts.service_language, updated on every Telegram login — last-login-wins) and routes the user's out-of-app push back through the right bot (§10) — except a game event, which routes by the game's own language (its variant → en/ru), so a game's notification always comes from the game's bot rather than the recipient's latest login bot. The service language is distinct from preferred_language (the interface language) and from a game's variant language. Non-Telegram logins (web / email / guest) carry the gateway's default set (GATEWAY_DEFAULT_SUPPORTED_LANGUAGES, all variants by default).
- The client holds session_id in memory for the app session (browser/OS storage is optional and may be unavailable; losing it means re-login).
- The gateway caches session → user_id and injects X-User-ID. Session records live in backend, which stores only a SHA-256 hash of the opaque token (never the plaintext), keeps a warmed in-memory cache for fast resolution, and treats sessions as revoke-only — they have no TTL and live until explicitly revoked (status → revoked). A revoke can target one token or, on an account merge (§4), every session of the retired account (RevokeAllForAccount, which also evicts them from the warm cache).
- Guest = ephemeral web session (no platform, no email). A guest is backed by a durable accounts row flagged is_guest and carrying no identity — the row is a technical necessity (the sessions and game_players foreign keys require one, the same way the robot pool is durable), not a profile: no friends, statistics or history are kept for it, and it is restricted to auto-match. A background guest reaper deletes an abandoned guest — flagged is_guest, holding no game seat, older than BACKEND_GUEST_RETENTION — on a BACKEND_GUEST_REAP_INTERVAL sweep, so transient guest rows do not accumulate. Platform and email users are auto-provisioned durable accounts with an identity.
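The revoke-only, hash-keyed session model can be illustrated with a small in-memory store. This is a sketch under assumptions: the Store type, its fields and the Mint/Resolve/Revoke method names are hypothetical (only RevokeAllForAccount is named in this document), and the real records live in Postgres behind the warm cache.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

type session struct {
	accountID string
	revoked   bool
}

// Store keeps sessions keyed by the SHA-256 hash of the token, never the plaintext.
type Store struct {
	mu     sync.Mutex
	byHash map[[32]byte]*session
}

func NewStore() *Store { return &Store{byHash: make(map[[32]byte]*session)} }

// Mint creates an opaque random token, stores only its hash and returns the plaintext once.
func (s *Store) Mint(accountID string) (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	token := hex.EncodeToString(raw)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byHash[sha256.Sum256([]byte(token))] = &session{accountID: accountID}
	return token, nil
}

// Resolve maps a presented token to an account; unknown or revoked tokens fail.
// There is no expiry check because sessions carry no TTL.
func (s *Store) Resolve(token string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	sess, ok := s.byHash[sha256.Sum256([]byte(token))]
	if !ok || sess.revoked {
		return "", false
	}
	return sess.accountID, true
}

// Revoke marks a single token as revoked.
func (s *Store) Revoke(token string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if sess, ok := s.byHash[sha256.Sum256([]byte(token))]; ok {
		sess.revoked = true
	}
}

// RevokeAllForAccount revokes every live session of one account and returns the count.
func (s *Store) RevokeAllForAccount(accountID string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := 0
	for _, sess := range s.byHash {
		if sess.accountID == accountID && !sess.revoked {
			sess.revoked = true
			n++
		}
	}
	return n
}

func main() {
	st := NewStore()
	tok, _ := st.Mint("account-1")
	id, ok := st.Resolve(tok)
	fmt.Println(id, ok)
	st.Revoke(tok)
	_, ok = st.Resolve(tok)
	fmt.Println(ok)
}
```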
4. Accounts, identities, linking & merge
- One internal account may carry several platform identities (telegram, vk, …) plus an optional email identity. First contact from a platform auto-provisions a durable account bound to that platform identity. Concretely, platform and email identities share one identities table keyed by a unique (kind, external_id); email is an identity with kind=email and a confirmed flag. A synthetic kind='robot' identity backs each pooled robot opponent (§7). The email confirm-code flow binds an email to the authenticated account: a 6-digit code (stored only as a SHA-256 hash, 15-minute TTL, ≤ 5 attempts) is sent through a Mailer seam (an SMTP relay, or a development log mailer when none is configured) and, once verified, attaches a confirmed email identity. Accounts and identities use application-generated UUIDv7 primary keys. A service flag paid_account (lifetime one-time payment; no purchase flow yet) is carried on the account and ORed on a merge.
- Linking is initiated from an authenticated profile and proves control of the identity before attaching it: email through the confirm-code flow, Telegram through the web Login Widget (validated by the connector, HMAC under SHA-256(bot_token) — distinct from Mini App initData; the gateway passes the trusted external_id to the backend, as for auth.telegram). The request step always sends/accepts the proof (no pre-send "already taken" signal, so a probe cannot enumerate registered addresses); a required merge is revealed only after the proof is verified and is performed behind an explicit, irreversible confirmation. A free identity is simply attached (and a guest is promoted to durable, clearing is_guest).
- Merge retires the account that owns the linked identity into the current account, in a single transaction (internal/accountmerge): statistics summed (max points kept), the hint wallet summed, paid_account ORed, identities repointed, games / chat / complaints transferred, friends and blocks de-duplicated (friendships keep the strongest status accepted>pending>declined), pending invitations/codes dropped, and the secondary kept as an audit tombstone (accounts.merged_into / merged_at) so a shared finished game's no-cascade foreign keys stay valid — its seat there is left untouched. A merge is refused only when the two share an active game. The current account is the primary, except when the initiator is a guest and the linked identity already has a durable owner: then the durable account wins, the guest's active games move into it, the guest is retired, and a fresh session is minted for the durable account (the client switches to it). The secondary's sessions are revoked (§3). High blast-radius; isolated and well-tested.
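The confirm-code rules in this section (a 6-digit code stored only as a SHA-256 hash, a 15-minute TTL and at most 5 attempts) can be sketched as follows. The Confirmation type, Issue, Verify and the at helper are hypothetical names chosen for the illustration; the Mailer seam and persistence are left out.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"math/big"
	"time"
)

const (
	codeTTL     = 15 * time.Minute
	maxAttempts = 5
)

// Confirmation is the stored side of a pending code: the hash, never the code itself.
type Confirmation struct {
	hash      [32]byte
	expiresAt time.Time
	attempts  int
}

// newCode draws a uniformly random 6-digit code, keeping leading zeros.
func newCode() (string, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(1_000_000))
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%06d", n.Int64()), nil
}

// Issue returns the plaintext code (to be mailed) and the record to persist.
func Issue(now time.Time) (string, *Confirmation, error) {
	code, err := newCode()
	if err != nil {
		return "", nil, err
	}
	c := &Confirmation{hash: sha256.Sum256([]byte(code)), expiresAt: now.Add(codeTTL)}
	return code, c, nil
}

// Verify refuses an expired or exhausted record, counts the attempt, then
// compares hashes in constant time.
func (c *Confirmation) Verify(code string, now time.Time) bool {
	if c.attempts >= maxAttempts || !now.Before(c.expiresAt) {
		return false
	}
	c.attempts++
	h := sha256.Sum256([]byte(code))
	return subtle.ConstantTimeCompare(h[:], c.hash[:]) == 1
}

// at is a test helper: a fixed instant plus the given number of minutes.
func at(minutes int) time.Time {
	return time.Unix(0, 0).Add(time.Duration(minutes) * time.Minute)
}

func main() {
	code, c, err := Issue(at(0))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(code), c.Verify(code, at(5)))
}
```

Counting the attempt before comparing means five wrong guesses lock the record even if the sixth guess is correct.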
5. Game engine integration (scrabble-solver)
backend embeds the solver library in-process behind internal/engine, the
only package that imports scrabble-solver (see CLAUDE.md for
the solver's public API and constraints). The engine is a self-contained rules
library — no persistence, transport or scheduling; the game domain drives it.
Key points:
- Variants at launch: English Scrabble, Russian Scrabble, Эрудит (engine.Variant, mapping to rules.English() / RussianScrabble() / Erudit()). Эрудит's specifics (non-doubling centre, ё with no tiles, 3 blanks, a 15-point bonus) live entirely in the solver ruleset, so the engine treats every variant uniformly.
- Dictionaries are committed DAWGs loaded with dawg.Load from the directory BACKEND_DICT_DIR; backend loads the engine.Registry at startup as a hard dependency (like migrations), so a missing dictionary fails the boot. The registry holds dictionaries in memory addressed by (variant, dict_version), tracking the latest version per variant, and answers the word-check tool through Registry.Lookup.
- Dictionary versioning — pin per game. A game records the dict_version it started on and finishes on that version; new games use the latest. Multiple versions may be resident at once. The boot version loads from the flat BACKEND_DICT_DIR; the admin console hot-reloads a new version from a per-version subdirectory BACKEND_DICT_DIR/<version>/ through Registry.LoadAvailable (only the variants whose DAWG is present there), and a restart re-loads every resident version via engine.OpenWithVersions (the flat boot version plus each subdirectory). In-flight games keep their pinned version; new games use the latest. (The solver is published as a versioned module and the dictionaries ship as a separate versioned release artifact from the scrabble-dictionary repo; the runtime contract above is unchanged.)
- Move generation/validation/scoring use Solver.GenerateMoves (ranked), Solver.ValidatePlay and Solver.ScorePlay; board mutation uses scrabble.Apply. The engine adds its own deterministic, seeded tile bag that can return tiles (an exchange needs this; the solver's self-play bag cannot).
- engine.Game is the in-memory match state and the pure rules engine: it deals racks, applies legal plays / passes / exchanges / resignations, refills from the bag, keeps the scores and whose turn it is, and detects the end of the game — empty bag with an empty rack, or six consecutive scoreless turns, applying the end-game rack-value adjustment, or a resignation. On a resignation the resigner keeps their accumulated score (no rack adjustment) and never wins: the win goes to the highest score among the remaining seats, unconditionally the other player in a two-player game. A player may resign on the opponent's turn (a forfeit is not a turn-scoped move): engine.ResignSeat(seat) resigns that player's own seat whoever is to move, and the game domain skips the turn check for resign. The engine exposes a decoded, solver-free API (SubmitPlay / SubmitExchange / EvaluatePlay / HintView / Hand) so internal/game drives it without importing the solver.
- The game domain (internal/game) owns everything the engine does not — persistence, turn scheduling, the configurable turn timeout / auto-resign, the hint budget, word-check complaints, history and GCG — and is the engine's only consumer. Timeout auto-resign reuses engine.Resign, recording the move as a timeout, so it inherits the resignation win/loss.
- History is dictionary-independent (§9.1): the engine emits decoded MoveRecords and reconstructs the board from them with engine.ReplayBoard (alphabet only, no dictionary).
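A deterministic, seeded bag that can take tiles back, as the exchange rule requires, might look like the sketch below. It is an assumption-laden illustration rather than the engine's bag: the Bag type and its methods are hypothetical, math/rand stands in for whatever generator the engine really uses, and the draw-then-return order of Exchange is a choice made here.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Bag is a tile bag whose draws are fully determined by its seed.
type Bag struct {
	tiles []rune
	rng   *rand.Rand
}

// NewBag copies the tile set and seeds a private generator.
func NewBag(tiles []rune, seed int64) *Bag {
	return &Bag{
		tiles: append([]rune(nil), tiles...),
		rng:   rand.New(rand.NewSource(seed)),
	}
}

// Len reports how many tiles remain.
func (b *Bag) Len() int { return len(b.tiles) }

// Draw removes up to n tiles at seed-determined positions.
func (b *Bag) Draw(n int) []rune {
	out := make([]rune, 0, n)
	for ; n > 0 && len(b.tiles) > 0; n-- {
		i := b.rng.Intn(len(b.tiles))
		out = append(out, b.tiles[i])
		// Swap-remove keeps the operation O(1) and still deterministic.
		b.tiles[i] = b.tiles[len(b.tiles)-1]
		b.tiles = b.tiles[:len(b.tiles)-1]
	}
	return out
}

// Exchange draws replacements first, then puts the swapped tiles back,
// so a player cannot redraw a tile they just returned.
func (b *Bag) Exchange(returned []rune) []rune {
	drawn := b.Draw(len(returned))
	b.tiles = append(b.tiles, returned...)
	return drawn
}

func main() {
	tiles := []rune("AAEEIIOONNRRTTLLSSUDG")
	a := NewBag(tiles, 42)
	b := NewBag(tiles, 42)
	fmt.Println(string(a.Draw(7)) == string(b.Draw(7))) // same seed, same rack
	got := a.Exchange([]rune("QZ"))
	fmt.Println(len(got), a.Len())
}
```

Because every draw comes from the seeded generator, replaying the same sequence of draws and exchanges reproduces the same racks, which is what makes journal replay exact.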
6. Game rules
- Word legality: validate-at-submit. An illegal play is rejected by Solver.ValidatePlay; there is no challenge phase.
- End of game: the bag is empty and a player empties their rack, or 6 consecutive scoreless turns (passes/exchanges), or a resignation, or a missed turn. The per-game turn timeout is chosen at creation (5/10/15/30 min, 1/2/3/6/12/24 h; default 24 h); a turn not made within it becomes an automatic resignation, applied by a background sweeper. The sweeper honours each player's away window — a daily local-time sleep interval on the account (default 00:00–07:00, midnight-cross aware) — so a player is never timed out while asleep.
- Players: auto-match is always 2 players; friend games are 2–4 players. backend owns turn order and the bag for any player count. A resignation or timeout in a two-player game ends it with the other player winning. In a game with three or more seats a resignation or timeout drops that seat and the rest play on — the engine skips the resigned seat in the turn rotation and excludes it from the win, finishing the game (the sole survivor wins) only once one active seat remains, or by the ordinary end conditions among the active seats. A per-game drop-out tile disposition, chosen at creation (dropout_tiles: remove from play — the default — or return to the bag), governs the leaver's rack, which is never revealed to the remaining players; it is recorded for deterministic journal replay. (Two-player games end on the first drop-out, so the disposition does not affect them.)
- Hint: governed by two per-game settings — whether hints are allowed and the starting per-player allowance — plus a per-account hint wallet (hint_balance, spent after the allowance; top-ups are a later feature). A hint reveals the top-1 ranked move (GenerateMoves[0]). The lobby/tournament caller picks the per-game defaults (e.g. one in casual random games, none in tournaments). The client lays the hinted tiles onto the board as a pending placement and leaves the commit to the player. When the rack has no legal move the service spends nothing and returns ErrNoHintAvailable — surfaced as the distinct result code no_hint_available (separate from hint_unavailable) so the UI can say "no options" rather than "no hints left".
- Word-check tool: unlimited dictionary lookups against the game's pinned dictionary; each result offers a complaint (complainant, game, variant, dict_version, word, the disputed result, an optional note) that lands in the admin review queue. An operator resolves it (open → resolved) with a disposition — reject, accept-add or accept-remove; the accepted ones form a derived pending-changes list that feeds the offline dictionary rebuild and is marked applied once the rebuilt version is hot-reloaded (§5, §12).
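The midnight-cross-aware away window used by the sweeper reduces to a small interval test on minutes since local midnight. The sketch below assumes a half-open window and a fixed UTC offset in minutes; Window, Contains and LocalMinute are hypothetical names, and how the sweeper defers a timeout once the player is found asleep is not modelled.

```go
package main

import "fmt"

// Window is a daily local-time interval in minutes since midnight.
// Start > End means the window wraps past midnight (e.g. 23:00 to 06:00).
type Window struct{ Start, End int }

// Contains reports whether minute-of-day m lies in the half-open window [Start, End).
func (w Window) Contains(m int) bool {
	switch {
	case w.Start == w.End:
		return false // empty window
	case w.Start < w.End:
		return m >= w.Start && m < w.End
	default:
		// Wrapping window: the late-evening part or the early-morning part.
		return m >= w.Start || m < w.End
	}
}

// LocalMinute converts a UTC minute-of-day to local time for a fixed offset,
// normalising into [0, 1440).
func LocalMinute(utcMinute, offsetMinutes int) int {
	return ((utcMinute+offsetMinutes)%1440 + 1440) % 1440
}

func main() {
	def := Window{Start: 0, End: 7 * 60} // the documented default, 00:00 to 07:00
	fmt.Println(def.Contains(LocalMinute(3*60, 0)))    // 03:00 local
	fmt.Println(def.Contains(LocalMinute(3*60, 5*60))) // 08:00 local at +05:00

	night := Window{Start: 23 * 60, End: 6 * 60}
	fmt.Println(night.Contains(23*60 + 30)) // inside the wrapped part
}
```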
7. Robot opponent
Substitutes for a human in 2-player auto-match when the pool yields no human
within 10 seconds (§8). It lives in internal/robot and plays as an ordinary
seated account through the game service, so only internal/engine imports the
solver. It is designed to be indistinguishable from a person.
The robot keeps no per-game state: every choice is derived deterministically
from the game's bag seed (a restart-stable FNV-1a mix), so a background driver
(robot.Service.Run, mirroring the turn-timeout sweeper) recomputes the same
behaviour on every scan and after a restart — the same philosophy as journal
replay. A pool of durable accounts backs the human-like names: each is a
kind='robot' identity (§4), keyed robot-<lang>-<index> and provisioned at startup
with chat blocked (there is no DM surface; chat is per-game) but friend requests
open. A request to a robot is accepted as pending and expires unanswered (the
robot never responds), mirroring a human who ignores it. Names are
composed per language from a first-name pool (32 full + 32 colloquial forms) and
a surname pool (gender-agreed for Russian) in one of three forms (first only /
first + surname initial / first + full surname), deterministically per pool slot so
they stay stable across restarts. Substitution is variant-aware: a Russian game
(Russian Scrabble or Эрудит) draws a Russian-named robot with at most ~20% Latin, an
English game the Latin pool.
- Balance: at game start it decides once whether to play to win, with P(play-to-win) ≈ 0.40 (so the human wins ≈ 60%), derived from the seed. Adaptive difficulty is post-MVP.
- Margin targeting: each turn it picks from the ranked candidates (engine.Candidates) the move whose resulting lead (playing to win) or deficit (playing to lose) is closest to a small band (1–30 points), rather than always the maximum; with no legal play it exchanges a full rack when the bag can refill it, else passes.
- Timing: the per-move delay is move-number-aware — a right-skewed sample (exponent k=4, short delays frequent) from a band that interpolates from [3, 10] min at the first move to [10, 90] min by ~28 moves, so openings are quick and the endgame can run long, clamped to [1, 90] minutes; it sleeps 00:00–07:00 anchored to the opponent's profile timezone with a per-game drift of ±3 h (fallback UTC), so its night overlaps the human's rather than running anti-phase; on a daytime nudge it replies near the move's lower band; it proactively nudges the idle human on a lengthening, randomized schedule — the first ~60-90 min into the turn, each later reminder spaced further out toward 1-6 h — so a long wait gets a handful of increasingly-spaced nudges rather than an hourly stream.
- Observability: robot accounts accrue ordinary statistics (§9) — the authoritative balance metric (target ≈ 40% robot wins) — and a robot_games_finished_total OTel counter plus a per-finish log give a live view. The admin game card surfaces each robot seat's per-game play-to-win intent (from the seed) and, on the robot's turn, its deterministic next-move ETA.
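The stateless balance and margin-targeting decisions can be sketched as pure functions of the seed and the current scores. Everything named here is hypothetical (unit, playsToWin, Candidate, distToBand, pick); the exact FNV-1a mix, the label strings and the tie-break are assumptions, not the robot's real derivation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// unit maps (seed, label) to a stable value in [0, 1) using FNV-1a, so the
// same game always yields the same decision, including after a restart.
func unit(seed uint64, label string) float64 {
	h := fnv.New64a()
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], seed)
	h.Write(buf[:])
	h.Write([]byte(label))
	return float64(h.Sum64()>>11) / float64(1<<53)
}

// playsToWin decides once per game, with probability about 0.40.
func playsToWin(seed uint64) bool { return unit(seed, "intent") < 0.40 }

// Candidate is one ranked move with the points it would score.
type Candidate struct {
	Name  string
	Score int
}

// distToBand returns how far v lies outside [lo, hi], or 0 inside it.
func distToBand(v, lo, hi int) int {
	switch {
	case v < lo:
		return lo - v
	case v > hi:
		return v - hi
	}
	return 0
}

// pick chooses the candidate whose resulting lead (toWin) or deficit (!toWin)
// is closest to the 1 to 30 point band; ties keep the higher-ranked move.
func pick(cands []Candidate, robot, human int, toWin bool) (Candidate, bool) {
	best, bestDist, found := Candidate{}, 0, false
	for _, c := range cands {
		margin := robot + c.Score - human
		if !toWin {
			margin = -margin
		}
		if d := distToBand(margin, 1, 30); !found || d < bestDist {
			best, bestDist, found = c, d, true
		}
	}
	return best, found
}

func main() {
	cands := []Candidate{{"QUARTZ", 60}, {"TRAIN", 25}, {"RAT", 12}, {"AT", 4}}
	win, _ := pick(cands, 100, 110, true)
	lose, _ := pick(cands, 100, 110, false)
	fmt.Println(win.Name, lose.Name, playsToWin(12345) == playsToWin(12345))
}
```

In the usage above the robot trails 100 to 110. Playing to win it prefers the 25-point move (a 15-point lead) over the 60-point maximum, and playing to lose it picks the 4-point move (a 6-point deficit).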
8. Lobby & social
- Matchmaking: an in-memory FIFO pool keyed by variant (the variant fixes the board language), pairing the next two humans into a two-player auto-match with the seat order randomised for first-move fairness. The pool is lost on restart (players re-queue) and is anonymous, so it does not consult blocks. After 10 s with no human a background reaper substitutes a pooled robot (§7) and starts the game. On a pairing or substitution the matchmaker emits a match-found notification (§10), delivered over the live stream; Poll remains as a fallback for a client that is not currently streaming. Cancel (POST /lobby/cancel) removes the player from the pool and drops any pending matched result, so a cancelled quick-match is dequeued rather than left for the reaper to robot-substitute.
- Friends: two add paths over one friendships table. A one-time code the to-be-added player issues (a friend_codes row: 6-digit numeric, SHA-256-hashed, 12 h TTL, one live code per issuer, single-use, redeem rate-limited) is redeemed by the other player to become friends immediately. Alternatively a request → accept is sent to someone you share a game with (active or finished); the recipient may accept, ignore (the pending row lazily expires after 30 days and may be re-sent), or decline — a decline is remembered (status='declined') and blocks further requests from that sender, unless they hand them a code, which overrides it. The requester's own cancel still deletes the row; blocking someone severs an existing friendship. (Discovery by friend list or platform deep-link is future work.)
- Block: two independent global account toggles (block_chat, block_friend_requests) plus a per-user block list. A per-user block is applied mutually: it hides the pair's chat from each other and refuses friend requests and game invitations between them.
- Friend games: formed by invitation → accept (a game_invitations record with one row per invitee). The 2–4 player game starts once every invitee accepts; any decline cancels the invitation, and a pending invitation expires after 7 days (enforced lazily on access).
- Chat: per-game, persisted (kept with the game's archive), ≤ 60 runes, and validated on input — links, email addresses and phone numbers (including lightly obfuscated forms) are rejected, since the chat is for quick reactions, not contact exchange. Each message stores the sender's IP (forwarded by the gateway) for moderation. A sender who has disabled chat cannot post, and messages from a blocked sender are hidden from the viewer. The operator console has a Messages section that lists posted messages (nudges excluded) newest-first with the sender's resolved name, source (guest / robot / oldest identity kind), IP and game, searchable by sender name / external-id glob masks and pinnable to one game or sender (linked from the game and user cards).
- Nudge: folded into the chat as a nudge message kind. The player awaiting the opponent may nudge once per hour per game; it is not allowed on one's own turn. The platform-native delivery runs through the gateway and the platform side-service.
- Profile: preferred_language (en/ru, edited in Settings), display name, email (confirm-code binding, see §4), timezone, the daily away window and the block toggles — all editable through account.UpdateProfile, which validates them: a display name is Unicode letters joined by single/./_separators (no leading/trailing/adjacent separators, ≤ 32 runes); the timezone is a fixed ±HH:MM UTC offset (or a legacy IANA name) resolved by account.ResolveZone for the sweeper and the robot's sleep (a fixed offset trades DST for a simple picker); the away window is at most 12 h (midnight-wrap aware). Linked platform accounts and merge are covered in §4.
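The matchmaking rules above (a FIFO queue per variant, cancel, and robot substitution after 10 s) can be sketched as a small in-memory pool. The Pool type and its Enqueue/Cancel/Reap methods are hypothetical; seat-order randomisation, locking and the match-found notification are deliberately omitted.

```go
package main

import (
	"fmt"
	"time"
)

// robotAfter is the documented wait before a robot is substituted.
const robotAfter = 10 * time.Second

type waiter struct {
	user  string
	since time.Time
}

// Pool is a FIFO matchmaking queue per variant. It is not goroutine-safe;
// a real service would guard it with a mutex.
type Pool struct{ waiting map[string][]waiter }

func NewPool() *Pool { return &Pool{waiting: make(map[string][]waiter)} }

// Enqueue pairs the caller with the longest-waiting human of the same variant,
// or queues the caller when nobody is waiting.
func (p *Pool) Enqueue(variant, user string, now time.Time) ([2]string, bool) {
	q := p.waiting[variant]
	for _, w := range q {
		if w.user == user {
			return [2]string{}, false // already queued
		}
	}
	if len(q) > 0 {
		p.waiting[variant] = q[1:]
		return [2]string{q[0].user, user}, true
	}
	p.waiting[variant] = append(q, waiter{user: user, since: now})
	return [2]string{}, false
}

// Cancel dequeues the user so the reaper never robot-substitutes them.
func (p *Pool) Cancel(variant, user string) bool {
	q := p.waiting[variant]
	for i, w := range q {
		if w.user == user {
			p.waiting[variant] = append(q[:i:i], q[i+1:]...)
			return true
		}
	}
	return false
}

// Reap removes and returns the users of one variant who have waited at least
// robotAfter; the caller would seat a robot opposite each of them.
func (p *Pool) Reap(variant string, now time.Time) []string {
	var due []string
	var keep []waiter
	for _, w := range p.waiting[variant] {
		if now.Sub(w.since) >= robotAfter {
			due = append(due, w.user)
		} else {
			keep = append(keep, w)
		}
	}
	p.waiting[variant] = keep
	return due
}

// at is a test helper: a fixed instant plus the given number of seconds.
func at(sec int) time.Time { return time.Unix(int64(sec), 0) }

func main() {
	p := NewPool()
	p.Enqueue("english", "ann", at(0))
	pair, ok := p.Enqueue("english", "bob", at(3))
	fmt.Println(pair, ok)
	p.Enqueue("english", "dan", at(100))
	fmt.Println(p.Reap("english", at(110)))
}
```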
9. Persistence
- Single Postgres database, schema backend; backend is the only writer. The "pgx pool" is a database/sql handle backed by the pgx stdlib driver and instrumented with otelsql; type-safe queries use go-jet (code generated into internal/postgres/jet and committed, regenerated by cmd/jetgen). Migrations are embedded SQL applied with pressly/goose/v3 at startup. Primary keys are application-generated UUIDv7.
- Tables: accounts (durable internal accounts, carrying the away-window columns away_start / away_end, the hint wallet hint_balance, the is_guest flag for ephemeral guest rows, the notifications_in_app_only out-of-app push toggle, the paid_account service flag and the merge-tombstone columns merged_into / merged_at), identities (platform/email/robot identities, unique (kind, external_id), the kind admitting robot), sessions (revoke-only opaque-token hashes), the game tables games (carrying the dropout_tiles disposition column), game_players, game_moves (the move journal), complaints and account_stats, and the social/lobby tables friendships (the request/accept graph, its status admitting declined), blocks (per-user blocks), chat_messages (per-game chat and nudges), email_confirmations (pending confirm-codes), game_invitations / game_invitation_invitees (friend-game invitations), friend_codes (one-time add-a-friend codes), game_drafts (a player's in-progress rack order + board composition per game) and game_hidden ((account_id, game_id) rows that drop a finished game from one account's own lobby list, leaving it visible to the other players — finished-only and irreversible by design, so there is no un-hide). The matchmaking pool is in-memory and persists nothing.
- Active games are event-sourced. A game is a games row (pinned variant / dict_version, bag seed, the per-game settings, and a denormalised turn cursor) plus an append-only, decoded move journal (game_moves); the live position is an engine.Game held in an in-memory cache (≈24 h idle TTL) and rebuilt by replaying the journal on a miss, which the seeded bag makes exact. Each game is serialised by a per-game lock; a persistence failure evicts the live game so the next access rebuilds from the journal. game_players records each seat's account, running score, hints used and winner flag.
- Statistics (account_stats, recomputed on each finish for durable non-guest accounts only — the finish-time recompute skips any is_guest seat): wins, losses, draws, max points in a game, and max points for a single move (which already folds in every word the move formed plus the all-tiles bonus). A tie increments draws only; a resignation or timeout is a loss for the acting player.
9.1 History invariant (must hold forever)
Archived games must replay independently of any dictionary and of the
solver's internal encoding — at least visually. Therefore the move journal
persists only decoded concrete values: action kind (play / pass / exchange /
resign / timeout), acting player, per-move score and running total, timestamp,
and — in a per-move JSON payload — the acting player's rack before the move (with
? for a blank), and for a play its direction, main-word anchor, placed tiles
(letter as text, coordinate, blank flag) and the words formed; for an exchange,
the swapped tiles. This is exactly what is needed both to replay the game
through the engine (a cache miss) and to render history or emit GCG without a
dictionary: the board for visual replay is reconstructed by applying placements
onto an empty grid, since moves were validated at play time and scores are
stored. variant and dict_version are kept as metadata only (audit,
complaint review), never as a replay dependency. GCG export is derived from
the same rows and is likewise self-contained — we ship our own writer (the solver
exposes none): the standard Poslfit dialect (UTF-8, #player/#lexicon
pragmas, 8G/H8 coordinates, lower-case blanks, . pass-throughs, -TILES
exchanges), plus #note lines for resignations and timeouts, which the standard
does not cover. GCG export is offered only on a finished game (game.ErrGameActive
otherwise), so an in-progress journal is never leaked mid-play; the client
shares the .gcg file via the Web Share API where available, else downloads it.
The alphabet-on-the-wire transport does not touch this invariant: the live edge
exchanges alphabet indices, but the persisted journal (and everything derived from it —
replay, history, GCG) keeps the decoded concrete letters described above, so an archived
game still replays with the variant's rules.Alphabet alone, independent of any dictionary.
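A minimal sketch of the dictionary-free visual replay this invariant guarantees: placements from the journal are applied onto an empty 15×15 grid with no validation, since moves were validated at play time. The `Move` and `Placement` shapes are assumptions for illustration, not the real `game_moves` schema, and rendering a blank in lower case simply borrows the GCG convention mentioned above.

```go
package main

import (
	"fmt"
	"strings"
)

const size = 15

// Placement is one placed tile as a decoded journal value:
// letter as text, coordinate, blank flag. Names are illustrative.
type Placement struct {
	Row, Col int
	Letter   string
	Blank    bool
}

// Move is a journal row reduced to what visual replay needs.
type Move struct {
	Kind  string // "play", "pass", "exchange", "resign" or "timeout"
	Tiles []Placement
}

// Board is the reconstructed grid; an empty string is an empty square.
type Board [size][size]string

// Replay applies every play's placements onto an empty grid.
// No dictionary or solver is consulted; coordinates are trusted
// because the move was validated when it was played.
func Replay(journal []Move) Board {
	var b Board
	for _, m := range journal {
		if m.Kind != "play" {
			continue // pass, exchange, resign and timeout place nothing
		}
		for _, t := range m.Tiles {
			letter := t.Letter
			if t.Blank {
				letter = strings.ToLower(letter) // show blanks in lower case
			}
			b[t.Row][t.Col] = letter
		}
	}
	return b
}

func main() {
	b := Replay([]Move{
		{Kind: "play", Tiles: []Placement{
			{Row: 7, Col: 6, Letter: "C"},
			{Row: 7, Col: 7, Letter: "A", Blank: true},
			{Row: 7, Col: 8, Letter: "T"},
		}},
		{Kind: "pass"},
	})
	fmt.Println(b[7][6], b[7][7], b[7][8])
}
```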
10. Notifications
Two channels: the in-app live stream and
platform-native push (out-of-app, via the platform side-service).
The backend emits notification intents through an in-process hub
(internal/notify, a Publisher seam installed on the game, social and lobby
services); a single backend→gateway gRPC server-stream (Push.Subscribe,
pkg/proto/push/v1) carries every event, and the gateway fans them out by
user_id to each client's Connect Subscribe stream while the app is open. The
catalog is your-turn and opponent-moved (emitted from the game commit, so
robot-driver and timeout-sweeper moves emit too; opponent-moved goes to every seat,
including the mover, so that the mover's other devices and their lobby refresh; it is
in-app only, so the actor gets no out-of-app push for their own move), chat-message and nudge
(from the social service), match-found (from the matchmaker, §8), and notify
(a lightweight "re-poll" signal carrying a sub-kind: friend-request,
friend-added, friend-declined, invitation or game-started; emitted on a friend-request,
on answering one (accept → friend-added, decline → friend-declined — to the original
requester, so a game screen watching that opponent re-derives its "add to friends" state),
and on an invitation create or its game start). game-over is emitted to every
seat from the same game commit when a game finishes — any path: a closing play, all-pass,
resign or timeout — and your-turn is enriched so the out-of-app push reads in full: it
also carries the mover's display name, their last action and the main word of a scoring play,
and a recipient-first running score line (e.g. 120:95:80, the reader's score first).
The in-app stream is a delta channel so the client renders from the event
without a follow-up game.state: opponent-moved carries the committed move plus the post-move
summary (per-seat scores, whose turn, move count, status) and the bag size, which the client
applies to its per-game cache keyed on the move count — idempotent (a re-delivered or own-move
echo is a no-op) and gap-safe (a missed move falls back to a game.state + game.history
refetch); your-turn carries that move count as a consistency check; match-found and the
game-started notify carry the recipient's full initial StateView, so opening a freshly
started game is instant; game-over carries the final summary; the lobby notify sub-kinds
carry the changed account / invitation. The move-commit response (submit_play / pass /
exchange / resign) likewise returns the actor's own refilled rack and bag size, so the mover
renders the next turn without a self-refetch. The notify package owns the FlatBuffers encoding
(fed wire-agnostic input structs by the domain services) and the gateway forwards every payload
verbatim. A client that is not currently streaming falls back to the matchmaker's Poll for
match-found — the client polls only while the stream is down, since a live stream delivers
match-found itself; for the lobby notification badge (incoming friend requests + open
invitations) the client re-polls on the notify event and on lobby open / focus, covering a push
missed while the app was hidden. Out-of-app platform push is a fallback
the gateway routes from the same firehose: for an event whose recipient has no
live in-app stream it resolves the backend /internal/push-target (their Telegram
external_id, the service language — the bot they last signed in through, falling
back to the interface language — and the notifications_in_app_only flag). A game event,
however, carries the game's own language on the push, and the gateway routes by
that instead of the service language — so a game's notification always comes from the game's bot,
not the recipient's latest-login bot. It then asks the Telegram connector to deliver a
localized message with a Mini App deep-link button — only when the recipient has a Telegram
identity and has not confined notifications to the app, so the two channels never duplicate. The
connector routes by that language to the matching bot and renders the message in it. The out-of-app set is
your-turn, game-over, nudge, match-found and the invitation / friend-request notify sub-kinds;
the connector renders the message and skips the rest. Operator broadcasts
(SendToUser / SendToGameChannel, §10 admin) instead pick the bot by an
operator-chosen language in the console, unrelated to the recipient's login. Session-revocation events and
cursor-based stream resume stay deferred (single-instance MVP).
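The idempotent, gap-safe delta rule for opponent-moved can be sketched as follows. The real client is TypeScript; this Go version only illustrates the decision keyed on the move count, and `GameCache`, `Delta` and the result names are hypothetical.

```go
package main

import "fmt"

// Delta is a hypothetical view of an opponent-moved event:
// the post-move summary plus the bag size.
type Delta struct {
	MoveCount int // move count after the committed move
	Scores    []int
	BagSize   int
}

// GameCache is the per-game state a client keeps between events.
type GameCache struct {
	MoveCount int
	Scores    []int
	BagSize   int
}

// Result tells the caller what happened to the delta.
type Result int

const (
	Applied     Result = iota // the delta was the next move
	Ignored                   // re-delivery or own-move echo
	NeedRefetch               // a move was missed; refetch state and history
)

// Apply keys on the move count: an older or equal count is a no-op,
// the next count is applied, and anything further ahead signals a gap.
func (c *GameCache) Apply(d Delta) Result {
	switch {
	case d.MoveCount <= c.MoveCount:
		return Ignored
	case d.MoveCount == c.MoveCount+1:
		c.MoveCount = d.MoveCount
		c.Scores = append(c.Scores[:0], d.Scores...)
		c.BagSize = d.BagSize
		return Applied
	default:
		return NeedRefetch
	}
}

func main() {
	c := &GameCache{MoveCount: 3, Scores: []int{40, 52}, BagSize: 70}
	fmt.Println(c.Apply(Delta{MoveCount: 4, Scores: []int{40, 75}, BagSize: 66}))
	fmt.Println(c.Apply(Delta{MoveCount: 4, Scores: []int{40, 75}, BagSize: 66}))
	fmt.Println(c.Apply(Delta{MoveCount: 7, Scores: []int{90, 99}, BagSize: 50}))
}
```

On `NeedRefetch` the cache is left untouched, so the caller can replace it wholesale from the refetched state.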
A separate announcements channel feeds the client's one-line banner (UI_DESIGN.md). It is a client-side mock rotation today; a server-driven source (operational notices, promotions) is future work and would deliver short markdown messages (text + links).
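Returning to the out-of-app fallback described earlier in this section, its routing decision can be sketched in Go under stated assumptions. The sketch folds the gateway's checks and the connector's kind filter into one function for brevity, and `Event`, `PushTarget` and `Route` are hypothetical names, not the real contracts.

```go
package main

import "fmt"

// Event is a hypothetical slice of a push event.
type Event struct {
	Kind         string // e.g. "your-turn", "game-over", "nudge", "match-found", "notify"
	SubKind      string // set for "notify" only
	GameLanguage string // set on game events, empty otherwise
}

// PushTarget mirrors what /internal/push-target is described as returning.
type PushTarget struct {
	TelegramID      string // external_id; empty when there is no Telegram identity
	ServiceLanguage string // already resolved, including the interface-language fallback
	InAppOnly       bool   // the notifications_in_app_only flag
}

// outOfApp reports whether the kind belongs to the out-of-app set.
func outOfApp(e Event) bool {
	switch e.Kind {
	case "your-turn", "game-over", "nudge", "match-found":
		return true
	case "notify":
		return e.SubKind == "invitation" || e.SubKind == "friend-request"
	}
	return false
}

// Route returns the bot language to deliver through, or ok=false
// when no platform push should be sent for this event.
func Route(e Event, t PushTarget, hasLiveStream bool) (lang string, ok bool) {
	if hasLiveStream || t.TelegramID == "" || t.InAppOnly || !outOfApp(e) {
		return "", false
	}
	if e.GameLanguage != "" {
		return e.GameLanguage, true // a game event comes from the game's bot
	}
	return t.ServiceLanguage, true
}

func main() {
	target := PushTarget{TelegramID: "123", ServiceLanguage: "en"}
	lang, ok := Route(Event{Kind: "your-turn", GameLanguage: "ru"}, target, false)
	fmt.Println(lang, ok)
}
```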
11. Observability
- Structured logging with go.uber.org/zap (JSON). OpenTelemetry tracer and meter providers are wired in all three services (backend, gateway, the Telegram connector) through a shared pkg/telemetry bootstrap, env-gated per service by {BACKEND,GATEWAY,TELEGRAM}_OTEL_{TRACES,METRICS}_EXPORTER with a default of none (so no collector is required locally or in CI). stdout is available for debugging; otlp (gRPC, endpoint from the standard OTEL_EXPORTER_OTLP_* environment) exports to a collector. The Postgres pool is instrumented with otelsql, and otelgrpc traces the backend↔gateway push stream and the gateway↔connector calls. The OTLP Collector (OTLP/gRPC → Prometheus metrics + Tempo traces), Prometheus (15d), Tempo (72h) and Grafana (provisioned datasources + dashboards, behind the caddy /_gm/grafana Basic-Auth) are stood up with the deploy (deploy/); the default exporter stays none, so CI needs no collector. The contour also runs cAdvisor (per-container CPU/memory/network) and postgres_exporter (connections, cache-hit ratio, transactions, db size), scraped by Prometheus and surfaced on the Scrabble — Resources Grafana dashboard, which captures a resource baseline; these export directly in Prometheus format (not through the collector).
- Per-request server-side timing via gin middleware from day one (the access log carries method, route, status, latency and the active trace id). A client-measured RTT piggybacked on the next request is a later enhancement.
- Domain/operational metrics, recorded through the meter and invisible until an exporter is configured: histograms game_replay_duration (journal rebuild on a cache miss), game_move_validate_duration and game_move_duration (a seat's think time per committed move, attributed by variant and a phase of opening/middle/endgame; it aggregates all seats including robots, whose synthetic timing dominates the tail, so per-human analysis lives in the admin console, below); counters games_started_total, games_abandoned_total (a turn-timeout seat drop), chat_messages_total (kind = message/nudge) and robot_games_finished_total; an observable gauge game_cache_active; the gateway edge_request_duration (the UI-perceived roundtrip, by message_type/result); and Go runtime/heap metrics. Game-scoped metrics carry a variant attribute (scrabble_en/scrabble_ru/erudit_ru).
- Per-user move-time analytics are offline, derived in the admin
console from the move journal (game_moves.created_at deltas, the first move from the game's creation), not Prometheus labels (which an account_id would explode): the user list shows each account's min/avg/max think time, and the user-detail page draws a zero-JS inline-SVG chart of min/mean/max by the player's move number.
- User metrics: a backend counter
accounts_created_total (kind = telegram/email/guest; robots are a provisioned pool, not users, and are excluded) and a gateway in-memory observable gauge active_users (window = 24h/7d): distinct accounts that performed an authenticated edge action in the window. The gauge is single-process by design (single-instance MVP, §10): it is correct for one gateway, resets on restart, and is a live operational figure, not a billing count.
- Rate-limit observability: every limiter rejection increments the gateway counter gateway_rate_limited_total (class = user/public/email/admin; aggregate only, honouring the no-per-user-label discipline above) and logs one Debug line; a gateway reporter drains the per-key rejection tracker every 30 s, emits one Warn summary per throttled key and posts the report to the backend (POST /api/v1/internal/ratelimit/report, network-trusted like sessions/resolve). The backend's ratewatch keeps a bounded in-memory episode window (single-instance, resets on restart, like active_users) surfaced on the admin console's Throttled page next to the flagged-account review queue, and applies the conservative auto-flag: an account sustaining BACKEND_HIGHRATE_FLAG_THRESHOLD rejected calls (default 1000) within BACKEND_HIGHRATE_FLAG_WINDOW (default 10 min) gets the soft, reversible accounts.flagged_high_rate_at marker (set once, shown in the user list/detail, cleared by the operator, never an automatic ban and never a request gate). The Edge/UX dashboard graphs the aggregate request rate against the rejection rate by class.
- Unauthenticated
GET /healthz (liveness) and GET /readyz (readiness: the database answers a bounded ping and the session cache is warmed).
- The backend serves a second listener, a gRPC server (BACKEND_GRPC_ADDR, default :9090) for the live-event push stream to the gateway, alongside the HTTP listener; both start together and stop on signal.
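The conservative auto-flag from the rate-limit bullet can be sketched as a sliding-window sum. This is a minimal illustration, assuming "sustaining" means the rejected calls reported inside the window add up to the threshold; the `Watch` type and its methods are hypothetical and are not the real ratewatch API.

```go
package main

import (
	"fmt"
	"time"
)

// report is one gateway rejection report for an account.
type report struct {
	at       time.Time
	rejected int
}

// Watch keeps a per-account window of reports and the set-once flag.
type Watch struct {
	threshold int
	window    time.Duration
	reports   map[string][]report
	flagged   map[string]time.Time
}

func NewWatch(threshold int, window time.Duration) *Watch {
	return &Watch{
		threshold: threshold,
		window:    window,
		reports:   map[string][]report{},
		flagged:   map[string]time.Time{},
	}
}

// Observe records rejected calls and returns true only when this call
// sets the flag. An already flagged account is never flagged again.
func (w *Watch) Observe(account string, rejected int, now time.Time) bool {
	cutoff := now.Add(-w.window)
	kept := w.reports[account][:0]
	for _, r := range w.reports[account] {
		if r.at.After(cutoff) {
			kept = append(kept, r) // still inside the window
		}
	}
	kept = append(kept, report{at: now, rejected: rejected})
	w.reports[account] = kept

	if _, done := w.flagged[account]; done {
		return false
	}
	total := 0
	for _, r := range kept {
		total += r.rejected
	}
	if total >= w.threshold {
		w.flagged[account] = now // soft marker; an operator would clear it
		return true
	}
	return false
}

// demo runs a small scenario with the documented defaults (1000 / 10 min).
func demo() (first, second, repeat, other bool) {
	w := NewWatch(1000, 10*time.Minute)
	t0 := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	first = w.Observe("acct-1", 600, t0)
	second = w.Observe("acct-1", 500, t0.Add(5*time.Minute))
	repeat = w.Observe("acct-1", 500, t0.Add(6*time.Minute))
	w.Observe("acct-2", 600, t0)
	other = w.Observe("acct-2", 600, t0.Add(11*time.Minute))
	return
}

func main() {
	fmt.Println(demo())
}
```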
12. Security boundaries
| Concern | Enforced by |
|---|---|
| Public rate limiting / anti-abuse | gateway (per-IP public/email/admin classes, per-user authenticated class; a request body cap of GATEWAY_MAX_BODY_BYTES; rejections are metered, summarised to the backend and surfaced in the admin console with a conservative reversible auto-flag — §11) |
| Telegram initData validation (bot-token HMAC) | the Telegram connector; the gateway delegates it over gRPC, so the bot token lives only in the connector |
| Session minting; email-code / guest validation | gateway (with backend) |
| Session → user_id resolution, X-User-ID injection | gateway |
| Authorisation, ownership, state transitions | backend (X-User-ID is the sole identity input) |
| Admin authentication | a single Basic-Auth gate on /_gm/*, forwarded verbatim to the backend's server-rendered admin console (and, in the deployed contour, routing /_gm/grafana/* to Grafana). In the deploy the caddy owns this gate (§13); a local non-caddy run uses the gateway's own GATEWAY_ADMIN_* proxy, which the per-IP admin limiter class guards ahead of its Basic-Auth — the caddy-fronted path has no limiter (stock caddy), an accepted gap. The backend trusts the proxy (no admin principal) and guards its state-changing POSTs with a same-origin check — the console's CSRF defence. No operator identity is tracked |
| backend ↔ gateway ↔ connector trust | the network (only gateway may reach backend; the connector serves unauthenticated gRPC on the internal segment) |
Trusting the network is an explicit, accepted MVP risk: compromise of the gateway↔backend network segment defeats backend authentication. Mitigated by network isolation; mutual auth is a future hardening step.
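Because X-User-ID is the backend's sole identity input, the gateway must never pass through a client-supplied value. A minimal net/http sketch of that rule follows; the `Authorization` header as the token carrier, the `resolveFunc` lookup and the `injectUserID` name are assumptions for illustration, not the gateway's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// resolveFunc is a hypothetical session lookup: token to user_id.
type resolveFunc func(token string) (userID string, ok bool)

// injectUserID drops any client-supplied X-User-ID, resolves the
// session and forwards with the resolved identity only.
func injectUserID(resolve resolveFunc, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Header.Del("X-User-ID") // never trust the caller's value
		userID, ok := resolve(r.Header.Get("Authorization"))
		if !ok {
			http.Error(w, "unauthenticated", http.StatusUnauthorized)
			return
		}
		r.Header.Set("X-User-ID", userID)
		next.ServeHTTP(w, r)
	})
}

// probe sends one request through the middleware and reports the
// status code and the X-User-ID the downstream handler saw.
func probe(token, spoofed string) (status int, seen string) {
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen = r.Header.Get("X-User-ID")
	})
	resolve := func(tok string) (string, bool) {
		if tok == "good-token" {
			return "user-42", true
		}
		return "", false
	}
	req := httptest.NewRequest(http.MethodPost, "/rpc", nil)
	req.Header.Set("Authorization", token)
	if spoofed != "" {
		req.Header.Set("X-User-ID", spoofed)
	}
	rec := httptest.NewRecorder()
	injectUserID(resolve, backend).ServeHTTP(rec, req)
	return rec.Code, seen
}

func main() {
	status, seen := probe("good-token", "admin")
	fmt.Println(status, seen)
}
```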
Short numeric codes (email confirm-codes and friend codes) are stored only as SHA-256 hashes and are short-lived and single-use. The unauthenticated email path carries a tight per-IP sub-limit (5 / 10 min); the friend-code redeem is authenticated, so it rides the per-user limit (300 / min) and is further bounded by the code's 12 h TTL, single use, and one live code per issuer (which caps the valid-code population). Brute-forcing a 6-digit friend code within these limits is an accepted MVP risk with low blast radius (an unwanted friendship is removable/blockable); a dedicated redeem sub-limit or a longer code is the hardening step if abuse appears.
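The storage rule for short codes (hash only, short-lived, single use) can be sketched as below. This is an illustration under assumptions: the `storedCode` type and its methods are hypothetical, the 12 h TTL is the friend-code value from the text, and the constant-time comparison is a common precaution rather than something the text specifies.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"time"
)

// storedCode is what would be persisted: never the code itself.
type storedCode struct {
	hash      [sha256.Size]byte
	expiresAt time.Time
	used      bool
}

// issue hashes the code and sets its expiry.
func issue(code string, ttl time.Duration, now time.Time) *storedCode {
	return &storedCode{
		hash:      sha256.Sum256([]byte(code)),
		expiresAt: now.Add(ttl),
	}
}

// redeem succeeds at most once, only before expiry, and only when the
// candidate hashes to the stored value.
func (s *storedCode) redeem(candidate string, now time.Time) bool {
	if s.used || !now.Before(s.expiresAt) {
		return false
	}
	sum := sha256.Sum256([]byte(candidate))
	if subtle.ConstantTimeCompare(sum[:], s.hash[:]) != 1 {
		return false
	}
	s.used = true
	return true
}

// demo exercises a wrong guess, a valid redeem, a reuse and an expiry.
func demo() (wrong, right, reuse, expired bool) {
	now := time.Date(2025, 1, 1, 9, 0, 0, 0, time.UTC)
	c := issue("482913", 12*time.Hour, now)
	wrong = c.redeem("000000", now)
	right = c.redeem("482913", now.Add(time.Hour))
	reuse = c.redeem("482913", now.Add(2*time.Hour))
	late := issue("135790", 12*time.Hour, now)
	expired = late.redeem("135790", now.Add(13*time.Hour))
	return
}

func main() {
	fmt.Println(demo())
}
```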
13. Deployment (informational)
Single public origin, path-routed. The Vite build has two entries: a lightweight
landing page and the game SPA. The gateway embeds the SPA build
(go:embed, baked in by a node stage in gateway/Dockerfile) and serves it at
/app/ (web) and /telegram/ (the Telegram Mini App; outside Telegram that path
redirects to the root — the client-side guard); a stray hit on the gateway's /
308-redirects to /app/. The landing ships in its own static container: the
landing target of gateway/Dockerfile (caddy:2-alpine + the same Vite build,
deploy/landing/Caddyfile) serves it at /, so stray public traffic is absorbed by
static file serving and never reaches the Go edge. Hash-named /assets/* are served
immutable (a relaunch is a cache hit, not a re-download); the HTML shells are
no-cache so a new deploy is picked up — both containers apply the same caching. An
in-compose caddy is the contour's edge: it owns a single /_gm Basic-Auth and
routes /_gm/grafana/* to Grafana (anonymous-admin, so the one shared login gates
it with no per-user Grafana accounts) and the rest of /_gm/* to the backend-rendered
admin console; /app/, /telegram/ and the Connect path go to the gateway; the
catch-all — notably the landing at / — goes to the landing container. The
Telegram connector runs as a separate container with no public ingress — it
long-polls Telegram and egresses through a VPN sidecar, answering only internal gRPC.
The full contour (deploy/docker-compose.yml) runs one gateway, one backend,
one Postgres, the static landing, the connector (+ its VPN sidecar) and the observability stack —
OTel Collector (OTLP/gRPC ingest → Prometheus metrics + Tempo traces) and Grafana
with provisioned datasources and dashboards. All three services export OTLP to the
collector; the connector shares the VPN sidecar's netns, so its AWG_CONF must not
carry a DNS= directive (that would hijack resolv.conf and stop it resolving
otelcol; without it the netns uses Docker's resolver, which resolves both
otelcol and api.telegram.org). Inter-service traffic uses a private internal
network (project-scoped DNS); only caddy joins the shared external edge network
(alias scrabble).
Two contours, two secret/variable prefixes (TEST_ / PROD_):
- Test: auto-deploys on a PR into, or a push to, development (.gitea/workflows/ci.yaml → docker compose up -d --build on the Gitea runner host, then GET / + GET /app/ probes through caddy, covering the landing container and the gateway). The host caddy terminates TLS and forwards the domain to scrabble:80, so the in-compose caddy serves plain HTTP (CADDY_SITE_ADDRESS=:80). The in-compose caddy trusts X-Forwarded-For from private-range upstreams (trusted_proxies private_ranges), so the real client IP, used for chat-moderation logging and the gateway's per-IP rate limiting, survives the host-caddy hop; in prod (no host caddy) public clients are untrusted and Caddy uses the real peer, so the single config is correct and spoof-safe in both contours.
- Prod: a manual SSH deploy after development → master. There is no host caddy, so the contour ships its own caddy terminating TLS: set CADDY_SITE_ADDRESS to the domain and the caddy does its own ACME.
14. CI & branches
- Two long-lived branches: development is the integration trunk and master the production trunk; feature/* branches are cut from development and PR back into it (the genesis commit necessarily landed on master). A commit to a feature/* branch triggers nothing.
- A single
.gitea/workflows/ci.yaml (Gitea has no cross-workflow needs) runs the suite on a PR into development/master and on a push to development. Its unit (gofmt/vet/build/unit-test), integration (Postgres-backed integration tag, testcontainers postgres:17-alpine, Ryuk off, serial) and ui (check/unit/build/bundle-budget/e2e) jobs are path-conditional (a changes job filters by changed paths), and an always-running gate job aggregates them (passing when each succeeded or was skipped) and is the single branch-protection required check (CI / gate), so a path-skipped job never blocks a merge.
- A gated
deploy job auto-rolls the test contour on a PR into, or a push to, development (docker compose up -d --build on the runner host), then probes the gateway (GET /) and the Telegram connector's liveness (via docker inspect: running, not restarting, stable restart count, with a VPN-handshake grace period, since the connector has no public ingress and a crash-loop is otherwise invisible). A PR into master is test-only; the prod deploy is the manual workflow. Secrets/variables are prefixed TEST_ / PROD_ per contour.
- The engine consumes
scrabble-solver as a published, versioned module (gitea.iliadenisov.ru/developer/scrabble-solver, pinned in backend/go.mod); both Go workflows set GOPRIVATE=gitea.iliadenisov.ru/* so go fetches it directly from this Gitea (no public proxy/checksum DB, no sibling clone). The dictionaries ship as a release artifact from the scrabble-dictionary repo; the workflows download scrabble-dawg-<DICT_VERSION>.tar.gz and point the engine tests at it via BACKEND_DICT_DIR.
- After any push, the run is watched to green before a stage is declared done (python3 ~/.claude/bin/gitea-ci-watch.py).
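The gate job's aggregation rule (pass when each upstream job succeeded or was skipped) is small enough to state as code. The sketch below is in Go for illustration only; the real gate is a workflow step, and the result strings `success` and `skipped` are assumed to follow the usual Actions-style job results.

```go
package main

import "fmt"

// gate passes only when every upstream job either succeeded or was
// skipped; any other result (a failure or a cancellation) blocks it.
func gate(results map[string]string) bool {
	for _, result := range results {
		if result != "success" && result != "skipped" {
			return false
		}
	}
	return true
}

func main() {
	// A UI-only change: the Go jobs are path-skipped, the ui job ran.
	fmt.Println(gate(map[string]string{
		"unit":        "skipped",
		"integration": "skipped",
		"ui":          "success",
	}))
}
```

Treating `skipped` as passing is what keeps a path-skipped job from blocking a merge.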