scrabble-game/docs/ARCHITECTURE.md
Ilia Denisov e01faae28a
Stage 17 round 6 (#18, PR D): admin Messages moderation section
A new /_gm/messages console page lists posted chat messages (nudges
excluded) newest-first — time, source (guest/robot/oldest identity kind),
sender (linked to the user card), IP, body, game (linked to the game card)
— searchable by sender name / external-id glob masks and pinnable to one
game (?game=) or sender (?user=), linked from the game and user cards.

The list query lives in social (raw SQL, kind='message', source via a SQL
CASE), reusing the now-exported account.LikePattern. Server-rendered
adminconsole MessagesView + messages.gohtml, 50/page via the shared pager.

Tests: adminconsole render case; backend integration AdminListMessages
(real Postgres) — nudge exclusion, game/sender pins, glob masks, source.
Docs: ARCHITECTURE section 8 chat moderation, PLAN round-6.
2026-06-08 19:58:55 +02:00

# Scrabble Game — Architecture
Source of truth for the platform architecture, transport, security model and
cross-service contracts. User-visible behaviour per domain lives in
[`FUNCTIONAL.md`](FUNCTIONAL.md); the staged build order lives in
[`../PLAN.md`](../PLAN.md). This document always describes the **current**
design, not the history of how it was reached. Sections describing
not-yet-implemented components are marked *(planned)*.
## 1. Overview
Three executables plus per-platform side-services:
- **`gateway`** — the only public ingress (module `scrabble/gateway`). Performs
anti-abuse (rate limiting), authenticates the player against the originating
platform (or an email/guest session), resolves the internal `user_id`, and
forwards authenticated traffic to `backend` with an `X-User-ID` header. Serves the
backend's admin console at `/_gm` on its public listener behind HTTP Basic Auth.
Bridges live events from `backend` to the
client. The shared wire contracts (the push proto and the FlatBuffers edge
payloads) live in `scrabble/pkg`, imported by both `gateway` and `backend`.
- **`backend`** — internal-only service that owns every domain concern:
identity/sessions, accounts and linking, lobby and matchmaking, the game
runtime, the robot opponent, chat, notifications, statistics, history, and
administration. Embeds the **`scrabble-solver`** engine **as a library,
in-process** — there is no per-game container. The only network consumer of
`backend` is `gateway` (plus platform side-services over an internal API).
- **`ui`** — pure-HTML5 client (plain Svelte 5 + TypeScript + Vite, static build;
no SvelteKit). Talks to `backend` only through `gateway` over Connect-RPC +
FlatBuffers, with the edge TS bindings generated from the **same** `edge.proto`
and `scrabble.fbs` and committed under `ui/src/gen/`. The **playable slice**
(Stage 7) covers auth, "my games", auto-match, the board (play/pass/exchange/
resign), hint, word-check, chat/nudge, the live stream, i18n (en/ru) and a profile
view; the social/account/history surfaces follow in Stage 8. There is no board on
the wire — the client **reconstructs the 15×15 board by replaying the move
journal** (§9.1) and renders board, tiles, premium squares and effects as pure
CSS + Unicode (no image/font/SVG assets). Tiles are placed by Pointer-Events drag
or tap; a CSS-token theme is light/dark and Telegram-themeParams-ready; navigation
is a hash router and the session token is held in memory + IndexedDB. A build-flagged
in-memory mock transport (`pnpm start`) runs the whole slice with no backend.
Embeddable in platform webviews; packageable to native (iOS/Android) via Capacitor.
The client uses a mobile-app shell (a growing nav bar; content pinned to the bottom),
a one-line **announcement banner** under the nav (a client-side mock rotation today —
a server-driven channel later, §10), and a client **board-style** setting (bonus-label
mode). The visual/interaction design system is documented in
[`UI_DESIGN.md`](UI_DESIGN.md).
- **`platform/telegram`** — the Telegram side-service (the "connector", module
`scrabble/platform/telegram`). It is the only component holding the bot tokens — **one
bot per service language** (`en`/`ru`), each its own token + game channel, the same
Telegram user id spanning both (§3). It
runs a Bot API long-poll loop per bot (Mini App launch + `/start` deep-links) and serves
a gRPC API (`pkg/proto/telegram/v1`) that `gateway` (Mini App initData validation
and out-of-app push) and `backend` (operator broadcasts) call over the
trusted internal network. Its generic delivery methods are **platform-agnostic**
(keyed by the identity `external_id`), so a future VK/MAX connector reuses them; only
initData validation is Telegram-specific. It runs in its own container, egressing to
Telegram through a VPN sidecar.
```mermaid
flowchart LR
Client((Client / webview)) -- Connect-RPC + FlatBuffers (h2c) --> Gateway
Gateway -- REST/JSON, X-User-ID --> Backend
Backend -- gRPC server-stream (live events) --> Gateway
Gateway -- in-app stream --> Client
Backend -- pgx --> Postgres[(Postgres)]
Backend -. embeds .- Solver[[scrabble-solver library]]
Gateway -- gRPC (validate initData, out-of-app push) --> Telegram[Telegram connector]
Backend -. operator broadcasts (gRPC) .-> Telegram
Telegram -- Bot API (via VPN sidecar) --> TgCloud((Telegram))
```
The MVP runs `gateway` and `backend` as single-instance processes inside a
trusted network. No Redis is planned (anti-replay crypto was deliberately
dropped). Horizontal scaling is explicit future work.
## 2. Transport
- **client ↔ gateway**: **Connect-RPC + FlatBuffers** over HTTP/2 cleartext
(`h2c`). Binary payloads, server-streaming for the in-app live channel,
first-class JS clients (`@connectrpc/connect-web` + the `flatbuffers` npm
package). The contract is kept minimal: a single `Gateway` service (defined in
`gateway/proto/edge/v1`) with `Execute(message_type, payload, request_id)` for
unary operations and `Subscribe` for the live stream. The proto envelope is a
thin carrier; the real request/response and event bodies are **FlatBuffers**
tables (`pkg/fbs`, the `scrabblefb` namespace) inside the `payload` bytes, which
the gateway transcodes to and from the backend's JSON. The session token rides
in the `Authorization: Bearer` header (there is no per-request signing, §3);
auth operations are unauthenticated and return the minted token. A unary
operation's domain outcome rides back in `ExecuteResponse.result_code` (HTTP
200); only edge failures (rate limit, missing session, unknown type, internal)
surface as Connect error codes.
- **Alphabet on the wire (Stage 13)**: live play exchanges **alphabet indices**, not
concrete letters. The rack (`StateView.rack`), the `SubmitPlay`/`Evaluate` tiles, the
`Exchange` tiles and the `CheckWord` word are `ubyte` indices into the variant's alphabet
(a blank is the sentinel index **255**). The client is **alphabet-agnostic**: on a
per-variant cache miss it sets `StateRequest.include_alphabet`, and the backend embeds the
variant's `(index, letter, value)` table (`engine.AlphabetTable`, derived from the solver
ruleset — no dictionary) for display; the client caches it by variant and renders the rack
and the blank chooser from it. The backend maps index↔letter at its REST edge, so the
gateway forwards indices **verbatim** (it holds no alphabet table) and the engine's
letter-based domain API — shared with the robot — is unchanged. The table is pinned by the
solver version, so it cannot drift from the running backend. The **move journal, history
and GCG are unaffected** (they stay decoded concrete characters, §9.1).
- **gateway ↔ backend (sync)**: plain HTTP REST/JSON. The gateway injects
`X-User-ID` for authenticated requests; `backend` never re-derives identity
from the body.
- **backend → gateway (live)**: a single gRPC server-stream carries live events
(your-turn, opponent-moved, chat, nudge). The gateway bridges them to the
client's in-app stream while the app is open. Out-of-app delivery uses
platform-native push via the platform side-service.
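The alphabet-on-the-wire bullet above can be illustrated with a minimal client-side sketch. The `AlphabetEntry` and `Alphabet` types, the `DecodeRack` helper and the choice of `?` for displaying a blank are assumptions made for illustration; only the `(index, letter, value)` shape and the blank sentinel 255 come from the text.

```go
package main

import "fmt"

// blankIndex is the sentinel index for a blank tile (stated in the text above).
const blankIndex = 255

// AlphabetEntry is a hypothetical row of the (index, letter, value) table.
type AlphabetEntry struct {
	Index  uint8
	Letter string
	Value  int
}

// Alphabet is a hypothetical per-variant cache entry, keyed by alphabet index.
type Alphabet map[uint8]AlphabetEntry

// NewAlphabet indexes the table rows by alphabet index.
func NewAlphabet(rows []AlphabetEntry) Alphabet {
	a := make(Alphabet, len(rows))
	for _, r := range rows {
		a[r.Index] = r
	}
	return a
}

// DecodeRack renders wire indices as display letters; a blank is shown as "?"
// (a display choice made for this sketch).
func (a Alphabet) DecodeRack(rack []uint8) ([]string, error) {
	out := make([]string, 0, len(rack))
	for _, idx := range rack {
		if idx == blankIndex {
			out = append(out, "?")
			continue
		}
		e, ok := a[idx]
		if !ok {
			return nil, fmt.Errorf("index %d not in alphabet", idx)
		}
		out = append(out, e.Letter)
	}
	return out, nil
}

func main() {
	// Illustrative three-letter table, not a real ruleset.
	alpha := NewAlphabet([]AlphabetEntry{{0, "A", 1}, {1, "B", 3}, {2, "C", 3}})
	rack, err := alpha.DecodeRack([]uint8{2, 0, 255, 1})
	fmt.Println(rack, err)
}
```

Because the table is keyed by index, the same decode path serves any variant once its table is cached, which is what keeps the client alphabet-agnostic.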
## 3. Authentication & sessions
Platform-native, deliberately simple: **no Ed25519 client keys, no per-request
signing, no anti-replay crypto** (these were considered and dropped — players
arrive from a platform rather than completing a mandatory registration).
- The gateway validates the originating credential **once** — Telegram `initData`
(delegated to the connector's `ValidateInitData` RPC, which holds the bot token —
the HMAC secret — so it never reaches the gateway), an email-code login, or a guest
bootstrap — then mints a **thin opaque server session token** (`session_id`). First
Telegram contact seeds the new account's language (from the launch `language_code`)
and display name (§4).
- **Service language & variant gating (Stage 15).** The connector hosts **one bot per
service language** (`en`/`ru`), each its own token + game channel; the same Telegram
user id spans both. `ValidateInitData` tries each token in turn and returns the
validating bot's **service language** and its **supported-languages set**. The set
rides the **`Session`** (FlatBuffers, session-scoped, not persisted): the UI offers
only the variants those languages support on New Game (`en` → English; `ru` → Russian
+ Эрудит). **Starting** a new game is the only gated action — opening and playing
existing games of any language is unrestricted, and the backend does not enforce the
gate (it is a product affordance, not a trust boundary). The service language is
**persisted** per account (`accounts.service_language`, updated on every Telegram
login — last-login-wins) and routes the user's out-of-app push back through the right
bot (§10); it is distinct from `preferred_language` (the interface language) and from
a game's variant language. Non-Telegram logins (web / email / guest) carry the
gateway's default set (`GATEWAY_DEFAULT_SUPPORTED_LANGUAGES`, all variants by default).
- The client holds `session_id` in memory for the app session (browser/OS
storage is optional and may be unavailable; losing it means re-login).
- The gateway caches `session → user_id` and injects `X-User-ID`. Session
records live in `backend`, which stores only a **SHA-256 hash** of the opaque
token (never the plaintext), keeps a warmed in-memory cache for fast
resolution, and treats sessions as **revoke-only** — they have no TTL and live
until explicitly revoked (`status` → `revoked`). A revoke can target one token or,
on an account merge (§4), **every** session of the retired account
(`RevokeAllForAccount`, which also evicts them from the warm cache).
- **Guest** = ephemeral web session (no platform, no email). A guest is backed by
a durable `accounts` row flagged `is_guest` and carrying **no identity** — the
row is a technical necessity (the `sessions` and `game_players` foreign keys
require one, the same way the robot pool is durable), not a profile: no
friends, statistics or history are kept for it, and it is restricted to
auto-match. Platform and email users are auto-provisioned **durable** accounts
with an identity. (Reaping abandoned guest rows is deferred — PLAN.md TODO-3.)
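The hashed, revoke-only session model described above might look roughly like the following sketch. `SessionStore` is a hypothetical in-memory stand-in for the `sessions` table and its warm cache. `RevokeAllForAccount` is named in the text, but its signature here is assumed, as are the token length and hex encoding.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// ErrNoSession is returned for unknown or revoked tokens.
var ErrNoSession = errors.New("session not found or revoked")

type session struct {
	userID  string
	revoked bool
}

// SessionStore is a hypothetical in-memory stand-in for the sessions table.
type SessionStore struct {
	mu     sync.Mutex
	byHash map[[32]byte]*session
}

func NewSessionStore() *SessionStore {
	return &SessionStore{byHash: make(map[[32]byte]*session)}
}

// Mint creates an opaque random token and stores only its SHA-256 hash.
func (s *SessionStore) Mint(userID string) (string, error) {
	raw := make([]byte, 32) // token length is an assumption
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	token := hex.EncodeToString(raw)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byHash[sha256.Sum256([]byte(token))] = &session{userID: userID}
	return token, nil
}

// Resolve maps a presented token to its user. There is no TTL check:
// a session is valid until it is revoked.
func (s *SessionStore) Resolve(token string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	sess, ok := s.byHash[sha256.Sum256([]byte(token))]
	if !ok || sess.revoked {
		return "", ErrNoSession
	}
	return sess.userID, nil
}

// RevokeAllForAccount marks every session of the account as revoked.
func (s *SessionStore) RevokeAllForAccount(userID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, sess := range s.byHash {
		if sess.userID == userID {
			sess.revoked = true
		}
	}
}

func main() {
	store := NewSessionStore()
	token, err := store.Mint("user-1")
	if err != nil {
		panic(err)
	}
	uid, err := store.Resolve(token)
	fmt.Println(uid, err)
	store.RevokeAllForAccount("user-1")
	_, err = store.Resolve(token)
	fmt.Println(err)
}
```

Storing only the hash means a leaked table does not yield usable tokens; the plaintext exists only on the client and in transit.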
## 4. Accounts, identities, linking & merge
- One internal account may carry several **platform identities**
(`telegram`, `vk`, …) plus an optional **email** identity. First contact from
a platform auto-provisions a durable account bound to that platform identity.
Concretely, platform and email identities share one `identities` table keyed by
a unique `(kind, external_id)`; email is an identity with `kind=email` and a
`confirmed` flag. A synthetic `kind='robot'` identity (Stage 5) backs each pooled
robot opponent (§7). The **email confirm-code flow** (Stage 4) binds an email to the
authenticated account: a 6-digit code (stored only as a SHA-256 hash, 15-minute
TTL, ≤ 5 attempts) is sent through a `Mailer` seam (an SMTP relay, or a
development log mailer when none is configured) and, once verified, attaches a
confirmed email identity. Accounts and identities use application-generated
**UUIDv7** primary keys. A service flag `paid_account` (lifetime one-time
payment; no purchase flow yet) is carried on the account and ORed on a merge.
- **Linking** (Stage 11) is initiated from an authenticated profile and proves
control of the identity before attaching it: **email** through the confirm-code
flow, **Telegram** through the web **Login Widget** (validated by the connector,
HMAC under `SHA-256(bot_token)` — distinct from Mini App initData; the gateway
passes the trusted `external_id` to the backend, as for `auth.telegram`). The
request step **always** sends/accepts the proof (no pre-send "already taken"
signal, so a probe cannot enumerate registered addresses); a required **merge**
is revealed **only after** the proof is verified and is performed behind an
explicit, irreversible confirmation. A free identity is simply attached (and a
guest is promoted to durable, clearing `is_guest`).
- **Merge** retires the account that owns the linked identity into the **current**
account, in a single transaction (`internal/accountmerge`): statistics summed
(max points kept), the hint wallet summed, `paid_account` ORed, identities
repointed, games / chat / complaints transferred, friends and blocks
de-duplicated (friendships keep the strongest status: accepted > pending > declined),
pending invitations/codes dropped, and the secondary kept as an **audit
tombstone** (`accounts.merged_into`/`merged_at`) so a shared **finished** game's
no-cascade foreign keys stay valid — its seat there is left untouched. A merge is
**refused** only when the two share an **active** game. The current account is the
primary, **except** when the initiator is a **guest** and the linked identity
already has a **durable** owner: then the durable account wins, the guest's active
games move into it, the guest is retired, and a **fresh session** is minted for the
durable account (the client switches to it). The secondary's sessions are revoked
(§3). High blast-radius; an isolated, well-tested stage.
## 5. Game engine integration (`scrabble-solver`)
`backend` embeds the solver library in-process behind `internal/engine`, the
only package that imports `scrabble-solver` (see [`CLAUDE.md`](../CLAUDE.md) for
the solver's public API and constraints). The engine is a self-contained rules
library — no persistence, transport or scheduling; the game domain drives it.
Key points:
- Variants at launch: **English Scrabble**, **Russian Scrabble**, **Эрудит**
(`engine.Variant`, mapping to `rules.English()` / `RussianScrabble()` /
`Erudit()`). Эрудит's specifics (non-doubling centre, `ё` with no tiles, 3
blanks, a 15-point bonus) live entirely in the solver ruleset, so the engine
treats every variant uniformly.
- **Dictionaries** are committed DAWGs loaded with `dawg.Load` from the
directory `BACKEND_DICT_DIR`; `backend` loads the `engine.Registry` at startup
as a hard dependency (like migrations), so a missing dictionary fails the boot.
The registry holds dictionaries in memory addressed by `(variant,
dict_version)`, tracking the latest version per variant, and answers the
word-check tool through `Registry.Lookup`.
- **Dictionary versioning — pin per game.** A game records the `dict_version` it
started on and finishes on that version; new games use the latest. Multiple
versions may be resident at once. The boot version loads from the flat
`BACKEND_DICT_DIR`; the admin console **hot-reloads** a new version from a
per-version subdirectory `BACKEND_DICT_DIR/<version>/` through
`Registry.LoadAvailable` (only the variants whose DAWG is present there), and a
restart re-loads every resident version via `engine.OpenWithVersions` (the flat
boot version plus each subdirectory). In-flight games keep their pinned version;
new games use the latest. (The solver is published as a versioned module and the
dictionaries ship as a separate versioned **release artifact** from the
`scrabble-dictionary` repo — TODO-1/TODO-2, Stage 14; the runtime contract above is
unchanged.)
- Move generation/validation/scoring use `Solver.GenerateMoves` (ranked),
`Solver.ValidatePlay` and `Solver.ScorePlay`; board mutation uses
`scrabble.Apply`. The engine adds its own deterministic, seeded tile **bag**
that can return tiles (an exchange needs this; the solver's self-play bag
cannot).
- **`engine.Game`** is the in-memory match state and the pure rules engine: it
deals racks, applies legal plays / passes / exchanges / resignations, refills
from the bag, keeps the scores and whose turn it is, and **detects the end of
the game** — empty bag with an empty rack, or six consecutive scoreless turns,
applying the end-game rack-value adjustment, or a resignation. On a
**resignation the resigner keeps their accumulated score (no rack adjustment)
and never wins**: the win goes to the highest score among the remaining seats,
unconditionally the other player in a two-player game. A player may resign **on the
opponent's turn** (a forfeit is not a turn-scoped move): `engine.ResignSeat(seat)`
resigns that player's own seat whoever is to move, and the game domain skips the turn
check for resign (Stage 17). The engine exposes a
decoded, solver-free API (`SubmitPlay`/`SubmitExchange`/`EvaluatePlay`/
`HintView`/`Hand`) so `internal/game` drives it without importing the solver.
- The **game domain** (`internal/game`) owns everything the engine does not —
persistence, turn scheduling, the configurable turn timeout / auto-resign, the
hint budget, word-check complaints, history and GCG — and is the engine's only
consumer. Timeout auto-resign reuses `engine.Resign`, recording the move as a
timeout, so it inherits the resignation win/loss.
- History is dictionary-independent (§9.1): the engine emits decoded
`MoveRecord`s and reconstructs the board from them with `engine.ReplayBoard`
(alphabet only, no dictionary).
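The pin-per-game dictionary contract above can be sketched as a registry keyed by `(variant, dict_version)`. This is a simplified model, not the real `engine.Registry`: a word set stands in for a loaded DAWG, the method signatures are assumed, and "latest" is taken to be the most recently added version.

```go
package main

import (
	"fmt"
	"sync"
)

// Dictionary stands in for a loaded DAWG; a word set is enough for the sketch.
type Dictionary map[string]struct{}

type dictKey struct {
	variant string
	version string
}

// Registry is a simplified model of a multi-version dictionary registry.
type Registry struct {
	mu     sync.RWMutex
	dicts  map[dictKey]Dictionary
	latest map[string]string
}

func NewRegistry() *Registry {
	return &Registry{dicts: map[dictKey]Dictionary{}, latest: map[string]string{}}
}

// Add makes a version resident and marks it the latest for its variant
// (assumption: load order defines "latest").
func (r *Registry) Add(variant, version string, d Dictionary) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.dicts[dictKey{variant, version}] = d
	r.latest[variant] = version
}

// Latest returns the version a new game of the variant would pin.
func (r *Registry) Latest(variant string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	v, ok := r.latest[variant]
	return v, ok
}

// Lookup checks a word against the exact version a game is pinned to.
func (r *Registry) Lookup(variant, version, word string) (bool, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	d, ok := r.dicts[dictKey{variant, version}]
	if !ok {
		return false, fmt.Errorf("dictionary %s/%s not resident", variant, version)
	}
	_, found := d[word]
	return found, nil
}

func main() {
	reg := NewRegistry()
	reg.Add("english", "v1", Dictionary{"cat": {}})
	reg.Add("english", "v2", Dictionary{"cat": {}, "qi": {}}) // hot-reloaded later

	// A game started on v1 keeps answering from v1.
	fmt.Println(reg.Lookup("english", "v1", "qi"))
	// A new game pins the latest version.
	v, _ := reg.Latest("english")
	fmt.Println(reg.Lookup("english", v, "qi"))
}
```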
## 6. Game rules
- **Word legality: validate-at-submit.** An illegal play is rejected by
`Solver.ValidatePlay`; there is no challenge phase.
- **End of game**: the bag is empty **and** a player empties their rack, **or**
**6 consecutive scoreless turns** (passes/exchanges), **or** a resignation, or
a missed turn. The **per-game turn timeout** is chosen at creation
(5/10/15/30 min, 1/2/3/6/12/24 h; default 24 h); a turn not made within it
becomes an automatic resignation, applied by a background sweeper. The sweeper
honours each player's **away window** — a daily local-time sleep interval on the
account (default 00:00–07:00, midnight-cross aware) — so a player is never
timed out while asleep.
- **Players**: auto-match is always 2 players; friend games are 2–4 players.
`backend` owns turn order and the bag for any player count. A resignation or
timeout in a two-player game ends it with the other player winning. In a game
with **three or more seats** a resignation or timeout **drops that seat and the
rest play on** — the engine skips the resigned seat in the turn rotation and
excludes it from the win, finishing the game (the sole survivor wins) only once
one active seat remains, or by the ordinary end conditions among the active
seats. A per-game **drop-out tile disposition**, chosen at creation
(`dropout_tiles`: `remove` from play — the default — or `return` to the bag),
governs the leaver's rack, which is **never revealed** to the remaining players;
it is recorded for deterministic journal replay. (Two-player games end on the
first drop-out, so the disposition does not affect them.)
- **Hint**: governed by two per-game settings — whether hints are allowed and the
starting per-player allowance — plus a per-account hint **wallet**
(`hint_balance`, spent after the allowance; top-ups are a later feature). A hint
reveals the top-1 ranked move (`GenerateMoves[0]`). The lobby/tournament caller
picks the per-game defaults (e.g. one in casual random games, none in
tournaments). The client **lays the hinted tiles onto the board** as a pending
placement and leaves the commit to the player. When the rack has no legal move the
service spends **nothing** and returns `ErrNoHintAvailable` — surfaced as the distinct
result code `no_hint_available` (separate from `hint_unavailable`) so the UI can say
"no options" rather than "no hints left".
- **Word-check tool**: unlimited dictionary lookups against the game's pinned
dictionary; each result offers a **complaint** (complainant, game, variant,
dict_version, word, the disputed result, an optional note) that lands in the admin
review queue. An operator resolves it (`open → resolved`) with a **disposition**:
reject, accept-add or accept-remove; the accepted ones form a derived
**pending-changes** list that feeds the offline dictionary rebuild and is marked
applied once the rebuilt version is hot-reloaded (§5, §12).
## 7. Robot opponent
Substitutes for a human in 2-player auto-match when the pool yields no human
within 10 seconds (§8). It lives in `internal/robot` and plays as an ordinary
seated account through the game service, so only `internal/engine` imports the
solver. It is designed to be indistinguishable from a person.
The robot keeps **no per-game state**: every choice is derived deterministically
from the game's bag `seed` (a restart-stable FNV-1a mix), so a background driver
(`robot.Service.Run`, mirroring the turn-timeout sweeper) recomputes the same
behaviour on every scan and after a restart — the same philosophy as journal
replay. The robots are a pool of durable accounts, each a `kind='robot'` identity (§4) keyed
`robot-<lang>-<index>` and provisioned at startup with **chat blocked but friend
requests open**: a request to a robot is accepted as pending and expires unanswered
(the robot never responds), mirroring a human who ignores it (Stage 17); the chat
block backs the human-like names (there is no DM surface; chat is per-game). Names are
**composed per language** from a first-name pool (32 full + 32 colloquial forms) and
a surname pool (gender-agreed for Russian) in one of three forms (first only /
first + surname initial / first + full surname), deterministically per pool slot so
they stay stable across restarts. Substitution is **variant-aware**: a Russian game
(Russian Scrabble or Эрудит) draws a Russian-named robot with at most ~20% Latin, an
English game the Latin pool.
- **Balance**: at game start it decides once whether to play to win, with
`P(play-to-win) ≈ 0.40` (so the human wins ≈ 60%), derived from the seed.
Adaptive difficulty is post-MVP.
- **Margin targeting**: each turn it picks from the ranked candidates
(`engine.Candidates`) the move whose resulting lead (playing to win) or deficit
(playing to lose) is closest to a small band (**1–30 points**), rather than
always the maximum; with no legal play it exchanges a full rack when the bag can
refill it, else passes.
- **Timing**: the per-move delay is **move-number-aware** — a right-skewed sample
(exponent k=4, short delays frequent) from a band that interpolates from
**[3, 10] min** at the first move to **[10, 90] min** by ~28 moves, so openings are
quick and the endgame can run long, clamped to **[1, 90] minutes**; it
**sleeps 00:00–07:00** anchored to the **opponent's** profile timezone with a
per-game drift of **±3 h** (fallback UTC), so its night overlaps the human's
rather than running anti-phase; on a daytime nudge it replies near the move's lower
band; it proactively nudges the human after **12 hours** idle (subject to the
once-per-hour chat limit).
- **Observability**: robot accounts accrue ordinary statistics (§9) — the
authoritative balance metric (target ≈ 40% robot wins) — and a
`robot_games_finished_total` OTel counter plus a per-finish log give a live view.
The **admin game card** surfaces each robot seat's per-game play-to-win intent (from
the seed) and, on the robot's turn, its deterministic **next-move ETA** (Stage 17).
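The stateless, seed-derived behaviour above can be sketched with the standard library's FNV-1a hash. The mixing scheme, the purpose labels and the `Candidate` type are assumptions, not the real `internal/robot` code; the band bounds are passed in as parameters, and the values used in `main` are only illustrative.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// mix derives a stable 64-bit value from the game seed and a purpose label
// using FNV-1a, so the same decision is recomputed after a restart.
func mix(seed uint64, label string) uint64 {
	h := fnv.New64a()
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], seed)
	h.Write(buf[:])
	h.Write([]byte(label))
	return h.Sum64()
}

// unit maps a hash to a float in [0, 1).
func unit(x uint64) float64 {
	return float64(x>>11) / float64(1<<53)
}

// PlayToWin decides once per game, with probability about 0.40.
func PlayToWin(seed uint64) bool {
	return unit(mix(seed, "intent")) < 0.40
}

// Candidate is a hypothetical ranked move reduced to its score.
type Candidate struct{ Score int }

// PickByMargin returns the index of the candidate whose resulting lead
// (toWin) or deficit (!toWin) is closest to the band [lo, hi].
// Ties keep the higher-ranked candidate; an empty list yields -1.
func PickByMargin(cands []Candidate, myScore, oppScore int, toWin bool, lo, hi int) int {
	best, bestDist := -1, 0
	for i, c := range cands {
		margin := myScore + c.Score - oppScore
		if !toWin {
			margin = -margin
		}
		dist := 0
		switch {
		case margin < lo:
			dist = lo - margin
		case margin > hi:
			dist = margin - hi
		}
		if best == -1 || dist < bestDist {
			best, bestDist = i, dist
		}
	}
	return best
}

func main() {
	seed := uint64(20260608)
	toWin := PlayToWin(seed)
	cands := []Candidate{{60}, {25}, {5}}
	fmt.Println(toWin, PickByMargin(cands, 100, 100, toWin, 1, 30))
}
```

Because every value is a pure function of the seed and the current position, the driver can rescan games at any time without storing robot state.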
## 8. Lobby & social
- **Matchmaking**: an **in-memory** FIFO pool keyed by `variant` (the variant
fixes the board language), pairing the next two humans into a two-player
auto-match with the seat order randomised for first-move fairness. The pool is
lost on restart (players re-queue) and is anonymous, so it does not consult
blocks. After **10 s** with no human a background reaper substitutes a pooled
robot (§7) and starts the game. On a pairing or substitution the matchmaker
emits a **match-found** notification (§10), delivered over the live stream;
`Poll` remains as a fallback for a client that is not currently streaming.
**Cancel** (`POST /lobby/cancel`) removes the player from the pool and drops any
pending matched result, so a cancelled quick-match is dequeued rather than left for
the reaper to robot-substitute (Stage 17).
- **Friends** (Stage 8): two add paths over one `friendships` table. A **one-time
code** the to-be-added player issues (a `friend_codes` row: 6-digit numeric,
SHA-256-hashed, **12 h** TTL, one live code per issuer, single-use, redeem
rate-limited) is redeemed by the other player to become friends immediately.
Alternatively a **request → accept** is sent to someone you **share a game with**
(active or finished); the recipient may accept, ignore (the pending row lazily
expires after **30 days** and may be re-sent), or **decline** — a decline is
remembered (`status='declined'`) and blocks further requests from that sender,
unless they hand them a code, which overrides it. The requester's own cancel still
deletes the row; blocking someone severs an existing friendship. (Discovery by
friend list or platform deep-link arrives with Stage 9 / TODO-5.)
- **Block**: two independent **global** account toggles (`block_chat`,
`block_friend_requests`) **plus** a **per-user block list**. A per-user block is
applied mutually: it hides the pair's chat from each other and refuses friend
requests and game invitations between them.
- **Friend games**: formed by **invitation → accept** (a `game_invitations`
record with one row per invitee). The 2–4 player game starts once **every**
invitee accepts; any decline cancels the invitation, and a pending invitation
expires after 7 days (enforced lazily on access).
- **Chat**: per-game, persisted (kept with the game's archive), **≤ 60 runes**,
and **validated on input** — links, email addresses and phone numbers (including
lightly obfuscated forms) are rejected, since the chat is for quick reactions,
not contact exchange. Each message stores the sender's IP (forwarded by the
gateway in Stage 6) for moderation. A sender who has disabled chat cannot post,
and messages from a blocked sender are hidden from the viewer. The operator console
has a **Messages** section (Stage 17) that lists posted messages (nudges excluded)
newest-first with the sender's resolved name, **source** (guest / robot / oldest
identity kind), IP and game, searchable by sender name / external-id glob masks and
pinnable to one game or sender (linked from the game and user cards).
- **Nudge**: folded into the chat as a `nudge` message kind. The player awaiting
the opponent may nudge **once per hour per game**; it is not allowed on one's own
turn. The platform-native delivery is wired with the gateway / platform
side-service (Stage 6 / 8).
- **Profile**: `preferred_language` (en/ru, edited in Settings), display name, email
(confirm-code binding, see §4), **timezone**, the daily **away window** and the
block toggles — all editable through `account.UpdateProfile`, which validates them
(Stage 8): a display name is Unicode letters joined by single ` `/`.`/`_`
separators (no leading/trailing/adjacent separators, ≤ 32 runes); the timezone is a
fixed `±HH:MM` **UTC offset** (or a legacy IANA name) resolved by `account.ResolveZone`
for the sweeper and the robot's sleep (a fixed offset trades DST for a simple
picker); the away window is at most **12 h** (midnight-wrap aware). Linked platform
accounts and merge are Stage 11.
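The display-name rule in the Profile bullet (Unicode letters joined by single separators, no leading, trailing or adjacent separators, at most 32 runes) can be expressed as a single pass over the runes. `ValidateDisplayName` is a hypothetical helper, not the real `account.UpdateProfile` code, and rejecting an empty name is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
	"unicode/utf8"
)

const maxNameRunes = 32

var ErrBadName = errors.New("invalid display name")

// isSeparator reports the three separators allowed between letters.
func isSeparator(r rune) bool {
	return r == ' ' || r == '.' || r == '_'
}

// ValidateDisplayName checks the rule in one pass over the runes.
func ValidateDisplayName(name string) error {
	if name == "" || utf8.RuneCountInString(name) > maxNameRunes {
		return ErrBadName
	}
	prevSep := true // the start counts as a separator, so a leading one fails
	for _, r := range name {
		switch {
		case unicode.IsLetter(r):
			prevSep = false
		case isSeparator(r):
			if prevSep {
				return ErrBadName // leading or adjacent separator
			}
			prevSep = true
		default:
			return ErrBadName // digits, punctuation, invalid UTF-8, ...
		}
	}
	if prevSep {
		return ErrBadName // trailing separator
	}
	return nil
}

func main() {
	for _, n := range []string{"Ann Lee", "Анна_П", " Ann", "A..B", "Ann1"} {
		fmt.Printf("%q -> %v\n", n, ValidateDisplayName(n))
	}
}
```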
## 9. Persistence
- Single Postgres database, schema `backend`; `backend` is the only writer. The
"pgx pool" is a `database/sql` handle backed by the pgx stdlib driver and
instrumented with otelsql; type-safe queries use **go-jet** (code generated
into `internal/postgres/jet` and committed, regenerated by `cmd/jetgen`).
Migrations are embedded SQL applied with `pressly/goose/v3` at startup. Primary
keys are application-generated **UUIDv7**.
- Tables: `accounts` (durable internal accounts; Stage 3 added the away-window
columns `away_start`/`away_end` and the hint wallet `hint_balance`; Stage 6's
migration `00005` added the `is_guest` flag for ephemeral guest rows; Stage 9's
migration `00007` added the `notifications_in_app_only` out-of-app push toggle;
Stage 11's migration `00009` added the `paid_account` service flag and the
merge-tombstone columns `merged_into`/`merged_at`),
`identities` (platform/email/robot identities, unique `(kind, external_id)`;
Stage 5's migration `00004` admits the `robot` kind),
`sessions` (revoke-only opaque-token hashes), the Stage 3 game tables
`games` (Stage 4 added the `dropout_tiles` disposition column), `game_players`,
`game_moves` (the move journal), `complaints` and `account_stats`, and the
Stage 4 social/lobby tables `friendships` (the request/accept graph), `blocks`
(per-user blocks), `chat_messages` (per-game chat and nudges), `email_confirmations`
(pending confirm-codes) and `game_invitations` / `game_invitation_invitees`
(friend-game invitations). Stage 8's migration `00006` widened the `friendships`
status to admit `declined` and added `friend_codes` (one-time add-a-friend codes).
The matchmaking pool is **in-memory** and persists nothing.
- **Active games are event-sourced.** A game is a `games` row (pinned
`variant`/`dict_version`, bag `seed`, the per-game settings, and a denormalised
turn cursor) plus an append-only, decoded move journal (`game_moves`); the live
position is an `engine.Game` held in an in-memory cache (≈24 h idle TTL) and
rebuilt by replaying the journal on a miss, which the seeded bag makes exact.
Each game is serialised by a per-game lock; a persistence failure evicts the
live game so the next access rebuilds from the journal. `game_players` records
each seat's account, running score, hints used and winner flag.
- **Statistics** (`account_stats`, recomputed on each finish for durable
non-guest accounts only — the finish-time recompute skips any `is_guest`
seat): wins, losses, **draws**, max points in a game, and
max points for a single **move** (which already folds in every word the move
formed plus the all-tiles bonus). A tie increments draws only; a resignation or
timeout is a loss for the acting player.
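The event-sourced bullet above (live state in a cache, rebuilt from the journal on a miss, evicted on a persistence failure) is sketched below in a much-reduced form. `LiveGame` is a toy stand-in for `engine.Game` that tracks only totals, `Journal` is a hypothetical interface, and one mutex replaces the per-game lock for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Move is a toy journal row: who moved and for how many points.
type Move struct{ Seat, Score int }

// LiveGame is a toy stand-in for the in-memory match state.
type LiveGame struct {
	Scores []int
	Turn   int
}

func (g *LiveGame) Apply(m Move) { g.Scores[m.Seat] += m.Score; g.Turn++ }

// Journal is a hypothetical append-only store of decoded moves.
type Journal interface {
	Moves(id string) ([]Move, error)
	Append(id string, m Move) error
}

// Cache holds live games and rebuilds them from the journal on a miss.
type Cache struct {
	mu      sync.Mutex // one lock for the sketch; the text uses a per-game lock
	live    map[string]*LiveGame
	journal Journal
	seats   int
}

func NewCache(j Journal, seats int) *Cache {
	return &Cache{live: map[string]*LiveGame{}, journal: j, seats: seats}
}

func (c *Cache) get(id string) (*LiveGame, error) {
	if g, ok := c.live[id]; ok {
		return g, nil
	}
	moves, err := c.journal.Moves(id)
	if err != nil {
		return nil, err
	}
	g := &LiveGame{Scores: make([]int, c.seats)}
	for _, m := range moves { // replay the journal
		g.Apply(m)
	}
	c.live[id] = g
	return g, nil
}

// Get returns the live game, replaying the journal on a cache miss.
func (c *Cache) Get(id string) (*LiveGame, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.get(id)
}

// Evict drops the live game (idle TTL expiry would call this).
func (c *Cache) Evict(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.live, id)
}

// Commit persists the move, then applies it; a failed write evicts the game.
func (c *Cache) Commit(id string, m Move) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	g, err := c.get(id)
	if err != nil {
		return err
	}
	if err := c.journal.Append(id, m); err != nil {
		delete(c.live, id)
		return err
	}
	g.Apply(m)
	return nil
}

// memJournal is an in-memory Journal with a switchable failure, for the demo.
type memJournal struct {
	moves map[string][]Move
	fail  bool
}

func (j *memJournal) Moves(id string) ([]Move, error) { return j.moves[id], nil }
func (j *memJournal) Append(id string, m Move) error {
	if j.fail {
		return errors.New("write failed")
	}
	j.moves[id] = append(j.moves[id], m)
	return nil
}

func main() {
	c := NewCache(&memJournal{moves: map[string][]Move{}}, 2)
	_ = c.Commit("g1", Move{Seat: 0, Score: 12})
	c.Evict("g1")
	g, err := c.Get("g1") // rebuilt by replay
	fmt.Println(g.Scores, g.Turn, err)
}
```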
### 9.1 History invariant (must hold forever)
Archived games must replay **independently of any dictionary and of the
solver's internal encoding** — at least visually. Therefore the move journal
persists only **decoded concrete values**: action kind (play / pass / exchange /
resign / timeout), acting player, per-move score and running total, timestamp,
and — in a per-move JSON payload — the acting player's rack before the move (with
`?` for a blank), and for a play its direction, main-word anchor, placed tiles
(letter as text, coordinate, blank flag) and the words formed; for an exchange,
the swapped tiles. This is exactly what is needed both to **replay the game
through the engine** (a cache miss) and to render history or emit GCG **without a
dictionary**: the board for visual replay is reconstructed by applying placements
onto an empty grid, since moves were validated at play time and scores are
stored. `variant` and `dict_version` are kept as **metadata only** (audit,
complaint review), never as a replay dependency. **GCG export** is derived from
the same rows and is likewise self-contained — we ship our own writer (the solver
exposes none): the standard Poslfit dialect (UTF-8, `#player`/`#lexicon`
pragmas, `8G`/`H8` coordinates, lower-case blanks, `.` pass-throughs, `-TILES`
exchanges), plus `#note` lines for resignations and timeouts, which the standard
does not cover. **GCG export is offered only on a finished game** (`game.ErrGameActive`
otherwise, Stage 8), so an in-progress journal is never leaked mid-play; the client
shares the `.gcg` file via the Web Share API where available, else downloads it.
The Stage 13 alphabet-on-the-wire change does **not** touch this invariant: the live edge
exchanges alphabet indices, but the persisted journal (and everything derived from it —
replay, history, GCG) keeps the decoded concrete letters described above, so an archived
game still replays with the variant's `rules.Alphabet` alone, independent of any dictionary.
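The dictionary-free visual replay can be sketched as applying stored placements onto an empty grid. This is an illustration under assumptions: the `Tile` and `Board` types, the `apply` function and the 15x15 size are hypothetical, and the real payload schema may differ. Blanks are rendered lower-case, matching the GCG convention mentioned above.

```go
package main

import (
	"fmt"
	"strings"
)

// size is an assumed board dimension for the sketch.
const size = 15

// Tile is a hypothetical decoded placement from a move's JSON payload.
type Tile struct {
	Row, Col int
	Letter   string // concrete letter as text, never an alphabet index
	Blank    bool
}

// Board holds one cell of text per square; "" means empty.
type Board [size][size]string

// apply writes the placed tiles onto the board. It needs no dictionary:
// the move was validated at play time and its score is stored.
func apply(b *Board, placed []Tile) error {
	for _, t := range placed {
		if t.Row < 0 || t.Row >= size || t.Col < 0 || t.Col >= size {
			return fmt.Errorf("tile %q out of bounds at %d,%d", t.Letter, t.Row, t.Col)
		}
		if b[t.Row][t.Col] != "" {
			return fmt.Errorf("square %d,%d already occupied", t.Row, t.Col)
		}
		letter := t.Letter
		if t.Blank {
			letter = strings.ToLower(letter) // blanks shown lower-case
		}
		b[t.Row][t.Col] = letter
	}
	return nil
}

func main() {
	var b Board
	play := []Tile{
		{Row: 7, Col: 6, Letter: "C"},
		{Row: 7, Col: 7, Letter: "A", Blank: true},
		{Row: 7, Col: 8, Letter: "T"},
	}
	if err := apply(&b, play); err != nil {
		fmt.Println("replay failed:", err)
		return
	}
	fmt.Println(b[7][6] + b[7][7] + b[7][8])
}
```

An error from `apply` in this sketch would point at a corrupted journal rather than an invalid move, since validation already happened when the move was played.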
## 10. Notifications
Two channels: the **in-app live stream** (delivered from Stage 6) and
**platform-native push** (out-of-app, via the platform side-service — Stage 9).
The backend emits notification intents through an in-process hub
(`internal/notify`, a `Publisher` seam installed on the game, social and lobby
services); a single backend→gateway **gRPC server-stream** (`Push.Subscribe`,
`pkg/proto/push/v1`) carries every event, and the gateway fans them out by
`user_id` to each client's Connect `Subscribe` stream while the app is open. The
catalog is **your-turn** and **opponent-moved** (emitted from the game commit, so
robot-driver and timeout-sweeper moves emit too; opponent-moved goes to **every seat,
including the mover**, so that the mover's other devices and lobby refresh as well; it is
in-app only, so the actor gets no out-of-app push for their own move), **chat-message** and **nudge**
(from the social service), **match-found** (from the matchmaker, §8), and **notify**
(Stage 8 — a lightweight "re-poll" signal carrying a sub-kind: friend-request,
friend-added, invitation or game-started; emitted on a friend-request and invitation
create and on an invitation's game start). Event payloads are FlatBuffers-encoded by
the backend and forwarded verbatim. A client that is not currently streaming falls
back to the matchmaker's `Poll` for match-found. For the lobby **notification
badge** (incoming friend requests + open invitations), the client polls on lobby
open and on focus and re-polls on the `notify` event, which covers a push
missed while the app was hidden. **Out-of-app platform push** (Stage 9) is a fallback
the **gateway** routes from the same firehose: for an event whose recipient has **no
live in-app stream** it resolves the backend `/internal/push-target` (their Telegram
`external_id`, the **service language** — the bot they last signed in through, falling
back to the interface language — and the `notifications_in_app_only` flag) and asks the
**Telegram connector** to deliver a localized message with a Mini App deep-link
button — only when the recipient has a Telegram identity and has not confined
notifications to the app, so the two channels never duplicate. The connector routes by
that language to the matching bot and renders the message in it. The out-of-app set is
your-turn, nudge, match-found and the invitation / friend-request notify sub-kinds;
the connector renders a message for these and skips every other event. Operator broadcasts
(`SendToUser` / `SendToGameChannel`, §10 admin) instead pick the bot by an
**operator-chosen** language in the console, unrelated to the recipient's login. Session-revocation events and
cursor-based stream resume stay deferred (single-instance MVP).
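The out-of-app routing rule above reduces to a small predicate. The sketch below is an assumption-laden illustration, not the gateway's code: `PushTarget`, `shouldPushOutOfApp` and the event-kind strings are hypothetical names chosen to mirror the prose.

```go
package main

import "fmt"

// PushTarget is a hypothetical shape for what /internal/push-target returns.
type PushTarget struct {
	TelegramExternalID string // empty when the user has no Telegram identity
	ServiceLanguage    string
	InAppOnly          bool // the notifications_in_app_only flag
}

// outOfAppKinds lists the events eligible for platform push; the string
// values are illustrative, not the real wire names.
var outOfAppKinds = map[string]bool{
	"your-turn":             true,
	"nudge":                 true,
	"match-found":           true,
	"notify:invitation":     true,
	"notify:friend-request": true,
}

// shouldPushOutOfApp reports whether an event should go to the Telegram
// connector: only eligible kinds, only when no in-app stream is live, and
// only for users with a Telegram identity who did not opt out.
func shouldPushOutOfApp(kind string, hasLiveStream bool, t PushTarget) bool {
	if !outOfAppKinds[kind] || hasLiveStream {
		return false
	}
	return t.TelegramExternalID != "" && !t.InAppOnly
}

func main() {
	target := PushTarget{TelegramExternalID: "42", ServiceLanguage: "en"}
	fmt.Println(shouldPushOutOfApp("your-turn", false, target))      // offline: push
	fmt.Println(shouldPushOutOfApp("your-turn", true, target))       // streaming: no push
	fmt.Println(shouldPushOutOfApp("opponent-moved", false, target)) // in-app only kind
}
```

Checking the live stream first is what keeps the two channels from duplicating a notification in this model.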
A separate **announcements channel** feeds the client's one-line banner (UI_DESIGN.md).
It is a client-side **mock** rotation today; a server-driven source (operational notices,
promotions) is future work and would deliver short markdown messages (text + links).
## 11. Observability
- Structured logging with `go.uber.org/zap` (JSON). OpenTelemetry tracer and
meter providers are wired in **all three services** (backend, gateway, the
Telegram connector) through a shared `pkg/telemetry` bootstrap, env-gated per
service by `{BACKEND,GATEWAY,TELEGRAM}_OTEL_{TRACES,METRICS}_EXPORTER` with a
default of `none` (so no collector is required locally or in CI). `stdout` is
available for debugging; **`otlp`** (gRPC, endpoint from the standard
`OTEL_EXPORTER_OTLP_*` environment) exports to a collector. The Postgres pool is
instrumented with otelsql and `otelgrpc` traces the backend↔gateway push stream
and the gateway↔connector calls. The OTLP **Collector** (OTLP/gRPC → Prometheus
metrics + Tempo traces), **Prometheus** (15d), **Tempo** (72h) and **Grafana**
(provisioned datasources + dashboards, behind the caddy `/_gm/grafana` Basic-Auth)
are stood up with the deploy (`deploy/`, Stage 16); the default exporter stays
`none`, so CI needs no collector.
- Per-request server-side timing via gin middleware from day one (the access log
carries method, route, status, latency and the active trace id). A
client-measured RTT piggybacked on the next request is a later enhancement.
- Domain/operational metrics (Stage 12), recorded through the meter and invisible
until an exporter is configured: histograms `game_replay_duration` (journal
rebuild on a cache miss), `game_move_validate_duration` and `game_move_duration`
(Stage 17 — a seat's think time per committed move, attributed by `variant` and a
`phase` of opening/middle/endgame; it aggregates **all** seats including robots,
whose synthetic timing dominates the tail, so per-human analysis lives in the admin
console, below); counters `games_started_total`, `games_abandoned_total` (a
turn-timeout seat drop), `chat_messages_total` (`kind` = message/nudge) and
`robot_games_finished_total`; an observable gauge `game_cache_active`; the gateway
`edge_request_duration` (the UI-perceived roundtrip, by `message_type`/`result`);
and Go runtime/heap metrics. Game-scoped metrics carry a `variant` attribute
(english/russian_scrabble/erudit).
- Per-user move-time analytics (Stage 17) are **offline**, derived in the admin
console from the move journal (`game_moves.created_at` deltas, the first move from
the game's creation), not Prometheus labels (which an `account_id` would explode):
the user list shows each account's min/avg/max think time, and the user-detail page
draws a zero-JS inline-SVG chart of min/mean/max by the player's move number.
- User metrics (Stage 16): a backend counter `accounts_created_total` (`kind` =
telegram/email/guest; robots are a provisioned pool, not users, and are excluded)
and a gateway **in-memory** observable gauge `active_users` (`window` = 24h/7d) —
distinct accounts that performed an authenticated edge action in the window. The
gauge is single-process by design (single-instance MVP, §10): it is correct for one
gateway, resets on restart, and is a live operational figure, not a billing count.
- Unauthenticated `GET /healthz` (liveness) and `GET /readyz` (readiness — the
database answers a bounded ping and the session cache is warmed).
- The backend serves a **second listener** — a gRPC server
(`BACKEND_GRPC_ADDR`, default `:9090`) for the live-event push stream to the
gateway — alongside the HTTP listener; both start together and stop on signal.
## 12. Security boundaries
| Concern | Enforced by |
| --- | --- |
| Public rate limiting / anti-abuse | gateway |
| Telegram initData validation (bot-token HMAC) | the Telegram connector; the gateway delegates it over gRPC, so the bot token lives only in the connector |
| Session minting; email-code / guest validation | gateway (with backend) |
| Session → `user_id` resolution, `X-User-ID` injection | gateway |
| Authorisation, ownership, state transitions | backend (`X-User-ID` is the sole identity input) |
| Admin authentication | a single Basic-Auth gate on `/_gm/*`, forwarded **verbatim** to the backend's server-rendered admin console (and, in the deployed contour, routing `/_gm/grafana/*` to Grafana). In the deploy the **caddy** owns this gate (§13); a local non-caddy run uses the gateway's own `GATEWAY_ADMIN_*` proxy. The backend trusts the proxy (no admin principal) and guards its state-changing POSTs with a **same-origin** check — the console's CSRF defence. No operator identity is tracked |
| backend ↔ gateway ↔ connector trust | the network (only gateway may reach backend; the connector serves unauthenticated gRPC on the internal segment) |
This is an explicit, accepted MVP risk: compromise of the gateway↔backend
network segment defeats backend authentication. Mitigated by network isolation;
mutual auth is a future hardening step.
**Short numeric codes** (email confirm-codes and Stage 8 friend codes) are stored
only as SHA-256 hashes and are short-lived and single-use. The unauthenticated
email path carries a tight per-IP sub-limit (5 / 10 min); the **friend-code redeem**
is authenticated, so it rides the per-user limit (120 / min) and is further bounded
by the code's 12 h TTL, single use, and **one live code per issuer** (which caps the
valid-code population). Brute-forcing a 6-digit friend code within these limits is an
accepted MVP risk with low blast radius (an unwanted friendship is removable/blockable);
a dedicated redeem sub-limit or a longer code is the hardening step if abuse appears.
## 13. Deployment (informational)
Single public origin, path-routed. The gateway **embeds** the static UI build
(`go:embed`, baked in by a node stage in `gateway/Dockerfile`). The Vite build has two
entries: a lightweight **landing page** served at `/`, and the game **SPA** served at
`/app/` (web) and `/telegram/` (the Telegram Mini App; outside Telegram that path
redirects to the root — the client-side guard). Hash-named `/assets/*` are served
`immutable` (a relaunch is a cache hit, not a re-download); the HTML shells are
`no-cache` so a new deploy is picked up. An in-compose **caddy** is the
contour's edge: it owns a single `/_gm` Basic-Auth and routes `/_gm/grafana/*` to
**Grafana** (anonymous-admin, so the one shared login gates it with no per-user
Grafana accounts) and the rest of `/_gm/*` to the backend-rendered **admin console**;
everything else (`/`, `/app/`, `/telegram/`, the Connect edge) goes to the gateway. The
**Telegram connector** runs as a separate container with **no public ingress** — it
long-polls Telegram and egresses through a VPN sidecar, answering only internal gRPC.
The full contour (`deploy/docker-compose.yml`) runs one `gateway`, one `backend`,
one Postgres, the connector (+ its VPN sidecar) and the **observability stack**:
OTel Collector (OTLP/gRPC ingest → Prometheus metrics + Tempo traces) and Grafana
with provisioned datasources and dashboards. All three services export OTLP to the
collector; the connector shares the VPN sidecar's netns, so its `AWG_CONF` must not
carry a `DNS=` directive (that would hijack resolv.conf and stop it resolving
`otelcol`; without it the netns uses Docker's resolver, which resolves both
`otelcol` and `api.telegram.org`). Inter-service traffic uses a private `internal`
network (project-scoped DNS); only caddy joins the shared external `edge` network
(alias `scrabble`).
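The edge's path routing can be modelled as an ordered prefix match. The real routing lives in the caddy configuration; the Go sketch below only illustrates the decision order, and `upstream` and `hasPathPrefix` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPathPrefix matches the prefix itself or anything beneath it, so
// "/_gmx" does not count as being under "/_gm".
func hasPathPrefix(path, prefix string) bool {
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

// upstream names the service that would receive the request. The more
// specific Grafana prefix is tested before the general admin prefix.
func upstream(path string) string {
	switch {
	case hasPathPrefix(path, "/_gm/grafana"):
		return "grafana"
	case hasPathPrefix(path, "/_gm"):
		return "backend" // server-rendered admin console
	default:
		return "gateway" // landing page, SPA, Connect edge
	}
}

func main() {
	for _, p := range []string{"/", "/app/", "/_gm/messages", "/_gm/grafana/d/1"} {
		fmt.Printf("%s -> %s\n", p, upstream(p))
	}
}
```

Both `/_gm` branches sit behind the same Basic-Auth gate, which is why one shared login covers Grafana and the console.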
Two contours, two secret/variable prefixes (`TEST_` / `PROD_`):
- **Test** (Stage 16): auto-deploys on a PR into — or a push to — `development`
(`.gitea/workflows/ci.yaml` runs `docker compose up -d --build` on the Gitea runner
host, then a `GET /` probe through caddy). The host caddy terminates TLS and
forwards the domain to `scrabble:80`, so the in-compose caddy serves plain HTTP
(`CADDY_SITE_ADDRESS=:80`).
- **Prod** (Stage 18): a manual SSH deploy after `development → master`. There is no
host caddy, so the contour ships its own caddy terminating TLS — set
`CADDY_SITE_ADDRESS` to the domain and the caddy does its own ACME.
## 14. CI & branches
- **Two long-lived branches** (Stage 16): **`development`** is the integration
trunk and **`master`** the production trunk; `feature/*` branches are cut from
`development` and PR back into it (the genesis commit necessarily landed on
`master`). A commit to a `feature/*` branch triggers nothing.
- A single `.gitea/workflows/ci.yaml` (Gitea has no cross-workflow `needs`) runs the
suite on a PR into `development`/`master` and on a push to `development`. Its
`unit` (gofmt/vet/build/unit-test), `integration` (Postgres-backed `integration`
tag, testcontainers `postgres:17-alpine`, Ryuk off, serial) and `ui`
(check/unit/build/bundle-budget/e2e) jobs are **path-conditional** (Stage 17 — a
`changes` job filters by changed paths), and an always-running **`gate`** job
aggregates them (passing when each succeeded or was **skipped**) and is the single
branch-protection required check (`CI / gate`), so a path-skipped job never blocks
a merge.
- A gated **`deploy`** job auto-rolls the **test contour** on a PR into — or a push
to — `development` (`docker compose up -d --build` on the runner host), then probes
the gateway (`GET /`) **and the Telegram connector's liveness** (Stage 17 —
`docker inspect`: running, not restarting, stable restart count, with a
VPN-handshake grace period, since the connector has no public ingress and a
crash-loop is otherwise invisible). A PR into `master` is test-only; the prod
deploy is the manual Stage 18 workflow. Secrets/variables are prefixed
`TEST_`/`PROD_` per contour.
- The engine consumes `scrabble-solver` as a **published, versioned module**
(`gitea.iliadenisov.ru/developer/scrabble-solver`, pinned in `backend/go.mod`); both Go
workflows set `GOPRIVATE=gitea.iliadenisov.ru/*` so go fetches it directly from this Gitea
(no public proxy/checksum DB, no sibling clone). The dictionaries ship as a **release
artifact** from the `scrabble-dictionary` repo; the workflows download
`scrabble-dawg-<DICT_VERSION>.tar.gz` and point the engine tests at it via
`BACKEND_DICT_DIR` (TODO-1/TODO-2 discharged in Stage 14).
- After any push, the run is watched to green before a stage is declared done
(`python3 ~/.claude/bin/gitea-ci-watch.py`).
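The `gate` job's aggregation rule (pass when every upstream job succeeded or was skipped) can be sketched as a small predicate. The actual gate is a workflow job, not Go code; `Result` and `gatePasses` are hypothetical names, and the result strings are assumed to follow the usual success/failure/cancelled/skipped vocabulary.

```go
package main

import "fmt"

// Result is an assumed job outcome vocabulary.
type Result string

const (
	Success   Result = "success"
	Skipped   Result = "skipped"
	Failure   Result = "failure"
	Cancelled Result = "cancelled"
)

// gatePasses reports whether every job either succeeded or was skipped
// by the path filter; any other outcome blocks the merge.
func gatePasses(jobs map[string]Result) bool {
	for _, r := range jobs {
		if r != Success && r != Skipped {
			return false
		}
	}
	return true
}

func main() {
	backendOnly := map[string]Result{"unit": Success, "integration": Success, "ui": Skipped}
	fmt.Println(gatePasses(backendOnly)) // a path-skipped ui job does not block

	broken := map[string]Result{"unit": Success, "integration": Failure, "ui": Skipped}
	fmt.Println(gatePasses(broken))
}
```

Making the gate the only required check is what allows path-conditional jobs without leaving branch protection waiting on a job that never ran.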