7e75c32d07
CI / changes (pull_request) Successful in 1s
CI / unit (pull_request) Successful in 8s
CI / integration (pull_request) Successful in 12s
CI / ui (pull_request) Successful in 36s
CI / gate (pull_request) Successful in 0s
CI / deploy (pull_request) Successful in 1m7s
- Edge/UX dashboard: aggregate request-rate vs rejection-rate panel (gateway_rate_limited_total by class; no per-user labels). - ARCHITECTURE §2/§11/§12/§13: body cap + explicit h2c sizing, the rate-limit observability pipeline and auto-flag policy, the admin-limiter note (and the caddy-path gap), the landing container topology; fixed the stale 120/min per-user figure. - FUNCTIONAL (+_ru): the Throttled view and the reversible high-rate flag. - gateway/backend/deploy READMEs, TESTING.md, root CLAUDE.md updated. - PRERELEASE.md: R3 interview decisions + implementation refinements logged; tracker R3 -> done (this PR implements it; CI gates the merge).
709 lines
51 KiB
Markdown
709 lines
51 KiB
Markdown
# Scrabble Game — Architecture
|
||
|
||
Source of truth for the platform architecture, transport, security model and
|
||
cross-service contracts. User-visible behaviour per domain lives in
|
||
[`FUNCTIONAL.md`](FUNCTIONAL.md); the staged build order lives in
|
||
[`../PLAN.md`](../PLAN.md). This document always describes the **current**
|
||
design, not the history of how it was reached. Sections describing
|
||
not-yet-implemented components are marked *(planned)*.
|
||
|
||
## 1. Overview
|
||
|
||
Three executables plus per-platform side-services:
|
||
|
||
- **`gateway`** — the only public ingress (module `scrabble/gateway`). Performs
|
||
anti-abuse (rate limiting), authenticates the player against the originating
|
||
platform (or an email/guest session), resolves the internal `user_id`, and
|
||
forwards authenticated traffic to `backend` with an `X-User-ID` header. Serves the
|
||
backend's admin console at `/_gm` on its public listener behind HTTP Basic Auth.
|
||
Bridges live events from `backend` to the
|
||
client. The shared wire contracts (the push proto and the FlatBuffers edge
|
||
payloads) live in `scrabble/pkg`, imported by both `gateway` and `backend`.
|
||
- **`backend`** — internal-only service that owns every domain concern:
|
||
identity/sessions, accounts and linking, lobby and matchmaking, the game
|
||
runtime, the robot opponent, chat, notifications, statistics, history, and
|
||
administration. Embeds the **`scrabble-solver`** engine **as a library,
|
||
in-process** — there is no per-game container. The only network consumer of
|
||
`backend` is `gateway` (plus platform side-services over an internal API).
|
||
- **`ui`** — pure-HTML5 client (plain Svelte 5 + TypeScript + Vite, static build;
|
||
no SvelteKit). Talks to `backend` only through `gateway` over Connect-RPC +
|
||
FlatBuffers, with the edge TS bindings generated from the **same** `edge.proto`
|
||
and `scrabble.fbs` and committed under `ui/src/gen/`. The **playable slice**
|
||
(Stage 7) covers auth, "my games", auto-match, the board (play/pass/exchange/
|
||
resign), hint, word-check, chat/nudge, the live stream, i18n (en/ru) and a profile
|
||
view; the social/account/history surfaces follow in Stage 8. There is no board on
|
||
the wire — the client **reconstructs the 15×15 board by replaying the move
|
||
journal** (§9.1) and renders board, tiles, premium squares and effects as pure
|
||
CSS + Unicode (no image/font/SVG assets). Tiles are placed by Pointer-Events drag
|
||
or tap; a CSS-token theme is light/dark and Telegram-themeParams-ready; navigation
|
||
is a hash router and the session token is held in memory + IndexedDB. A build-flagged
|
||
in-memory mock transport (`pnpm start`) runs the whole slice with no backend.
|
||
Embeddable in platform webviews; packageable to native (iOS/Android) via Capacitor.
|
||
The client uses a mobile-app shell (a growing nav bar; content pinned to the bottom),
|
||
a one-line **announcement banner** under the nav (a client-side mock rotation, **gated off
|
||
in the build until polished after release**, Stage 17 — a server-driven channel later, §10),
|
||
and a client **board-style** setting (bonus-label
|
||
mode). The visual/interaction design system is documented in
|
||
[`UI_DESIGN.md`](UI_DESIGN.md).
|
||
- **`platform/telegram`** — the Telegram side-service (the "connector", module
|
||
`scrabble/platform/telegram`). It is the only component holding the bot tokens — **one
|
||
bot per service language** (`en`/`ru`), each its own token + game channel, the same
|
||
Telegram user id spanning both (§3). It
|
||
runs a Bot API long-poll loop per bot (Mini App launch + `/start` deep-links) and serves
|
||
a gRPC API (`pkg/proto/telegram/v1`) that `gateway` (Mini App initData validation
|
||
and out-of-app push) and `backend` (operator broadcasts) call over the
|
||
trusted internal network. Its generic delivery methods are **platform-agnostic**
|
||
(keyed by the identity `external_id`), so a future VK/MAX connector reuses them; only
|
||
initData validation is Telegram-specific. It runs in its own container, egressing to
|
||
Telegram through a VPN sidecar.
|
||
|
||
```mermaid
|
||
flowchart LR
|
||
Client((Client / webview)) -- Connect-RPC + FlatBuffers (h2c) --> Gateway
|
||
Gateway -- REST/JSON, X-User-ID --> Backend
|
||
Backend -- gRPC server-stream (live events) --> Gateway
|
||
Gateway -- in-app stream --> Client
|
||
Backend -- pgx --> Postgres[(Postgres)]
|
||
Backend -. embeds .- Solver[[scrabble-solver library]]
|
||
Gateway -- gRPC (validate initData, out-of-app push) --> Telegram[Telegram connector]
|
||
Backend -. operator broadcasts (gRPC) .-> Telegram
|
||
Telegram -- Bot API (via VPN sidecar) --> TgCloud((Telegram))
|
||
```
|
||
|
||
The MVP runs `gateway` and `backend` as single-instance processes inside a
|
||
trusted network. No Redis is planned (anti-replay crypto was deliberately
|
||
dropped). Horizontal scaling is explicit future work.
|
||
|
||
## 2. Transport
|
||
|
||
- **client ↔ gateway**: **Connect-RPC + FlatBuffers** over HTTP/2 cleartext
|
||
(`h2c`). Binary payloads, server-streaming for the in-app live channel,
|
||
first-class JS clients (`@connectrpc/connect-web` + the `flatbuffers` npm
|
||
package). The contract is kept minimal: a single `Gateway` service (defined in
|
||
`gateway/proto/edge/v1`) with `Execute(message_type, payload, request_id)` for
|
||
unary operations and `Subscribe` for the live stream. The proto envelope is a
|
||
thin carrier; the real request/response and event bodies are **FlatBuffers**
|
||
tables (`pkg/fbs`, the `scrabblefb` namespace) inside the `payload` bytes, which
|
||
the gateway transcodes to and from the backend's JSON. The session token rides
|
||
in the `Authorization: Bearer` header (there is no per-request signing, §3);
|
||
auth operations are unauthenticated and return the minted token. A unary
|
||
operation's domain outcome rides back in `ExecuteResponse.result_code` (HTTP
|
||
200); only edge failures (rate limit, missing session, unknown type, internal)
|
||
surface as Connect error codes. The client (Stage 17) treats a connectivity edge failure as
|
||
**state, not a per-call toast**: a transport `unavailable` or a `rate_limited` flips a global
|
||
`online` signal that drives a header **"Connecting…"** spinner and softly disables proactive
|
||
actions, and the transport **auto-retries with capped exponential backoff** — every op on a
|
||
rate-limit (the gateway rejected it before processing, so it is safe), but only **read-only**
|
||
ops on `unavailable` (a mutation is never blindly re-sent, to avoid double-applying one whose
|
||
response was lost — its button is disabled while offline and the player re-issues it on
|
||
reconnect). A reachability watcher (a lightweight `profile.get` probe) clears the signal when no
|
||
other traffic is in flight; the live `Subscribe` stream's drop/recovery feeds the same signal.
|
||
**Edge hardening (R3):** every request body on the public listener is capped at
|
||
`GATEWAY_MAX_BODY_BYTES` (default 1 MiB — far above any legitimate payload), both at the HTTP
|
||
layer (`http.MaxBytesReader`) and as the Connect per-message read limit, so an oversized
|
||
`Execute` is refused (`resource_exhausted`) without buffering. The h2c server carries explicit
|
||
sizing: `MaxConcurrentStreams` 250 (the x/net default made visible — a real client holds one
|
||
`Subscribe` stream plus a few unary calls) and a 3-minute connection `IdleTimeout` (a live
|
||
`Subscribe` stream keeps its connection active, so only abandoned connections are reaped); the
|
||
`http.Server` sets only `ReadHeaderTimeout` (10 s) — Read/WriteTimeout would kill the stream.
|
||
R7 revisits the exact values under load.
|
||
- **Alphabet on the wire (Stage 13)**: live play exchanges **alphabet indices**, not
|
||
concrete letters. The rack (`StateView.rack`), the `SubmitPlay`/`Evaluate` tiles, the
|
||
`Exchange` tiles and the `CheckWord` word are `ubyte` indices into the variant's alphabet
|
||
(a blank is the sentinel index **255**). The client is **alphabet-agnostic**: on a
|
||
per-variant cache miss it sets `StateRequest.include_alphabet`, and the backend embeds the
|
||
variant's `(index, letter, value)` table (`engine.AlphabetTable`, derived from the solver
|
||
ruleset — no dictionary) for display; the client caches it by variant and renders the rack
|
||
and the blank chooser from it. The backend maps index↔letter at its REST edge, so the
|
||
gateway forwards indices **verbatim** (it holds no alphabet table) and the engine's
|
||
letter-based domain API — shared with the robot — is unchanged. The table is pinned by the
|
||
solver version, so it cannot drift from the running backend. The **move journal, history
|
||
and GCG are unaffected** (they stay decoded concrete characters, §9.1).
|
||
- **gateway ↔ backend (sync)**: plain HTTP REST/JSON. The gateway injects
|
||
`X-User-ID` for authenticated requests; `backend` never re-derives identity
|
||
from the body.
|
||
- **backend → gateway (live)**: a single gRPC server-stream carries live events
|
||
(your-turn, opponent-moved, chat, nudge). The gateway bridges them to the
|
||
client's in-app stream while the app is open. Out-of-app delivery uses
|
||
platform-native push via the platform side-service.
|
||
|
||
## 3. Authentication & sessions
|
||
|
||
Platform-native, deliberately simple: **no Ed25519 client keys, no per-request
|
||
signing, no anti-replay crypto** (these were considered and dropped — players
|
||
arrive from a platform rather than completing a mandatory registration).
|
||
|
||
- The gateway validates the originating credential **once** — Telegram `initData`
|
||
(delegated to the connector's `ValidateInitData` RPC, which holds the bot token —
|
||
the HMAC secret — so it never reaches the gateway), an email-code login, or a guest
|
||
bootstrap — then mints a **thin opaque server session token** (`session_id`). First
|
||
Telegram contact seeds the new account's language (from the launch `language_code`)
|
||
and display name (§4).
|
||
- **Service language & variant gating (Stage 15).** The connector hosts **one bot per
|
||
service language** (`en`/`ru`), each its own token + game channel; the same Telegram
|
||
user id spans both. `ValidateInitData` tries each token in turn and returns the
|
||
validating bot's **service language** and its **supported-languages set**. The set
|
||
rides the **`Session`** (FlatBuffers, session-scoped, not persisted): the UI offers
|
||
only the variants those languages support on New Game (`en` → English; `ru` → Russian
|
||
+ Эрудит). **Starting** a new game is the only gated action — opening and playing
|
||
existing games of any language is unrestricted, and the backend does not enforce the
|
||
gate (it is a product affordance, not a trust boundary). The service language is
|
||
**persisted** per account (`accounts.service_language`, updated on every Telegram
|
||
login — last-login-wins) and routes the user's out-of-app push back through the right
|
||
bot (§10) — **except a game event, which routes by the game's own language** (its variant →
|
||
en/ru, Stage 17), so a game's notification always comes from the game's bot rather than the
|
||
recipient's latest login bot. The service language is distinct from `preferred_language` (the
|
||
interface language) and from a game's variant language. Non-Telegram logins (web / email / guest) carry the
|
||
gateway's default set (`GATEWAY_DEFAULT_SUPPORTED_LANGUAGES`, all variants by default).
|
||
- The client holds `session_id` in memory for the app session (browser/OS
|
||
storage is optional and may be unavailable; losing it means re-login).
|
||
- The gateway caches `session → user_id` and injects `X-User-ID`. Session
|
||
records live in `backend`, which stores only a **SHA-256 hash** of the opaque
|
||
token (never the plaintext), keeps a warmed in-memory cache for fast
|
||
resolution, and treats sessions as **revoke-only** — they have no TTL and live
|
||
until explicitly revoked (`status` → `revoked`). A revoke can target one token or,
|
||
on an account merge (§4), **every** session of the retired account
|
||
(`RevokeAllForAccount`, which also evicts them from the warm cache).
|
||
- **Guest** = ephemeral web session (no platform, no email). A guest is backed by
|
||
a durable `accounts` row flagged `is_guest` and carrying **no identity** — the
|
||
row is a technical necessity (the `sessions` and `game_players` foreign keys
|
||
require one, the same way the robot pool is durable), not a profile: no
|
||
friends, statistics or history are kept for it, and it is restricted to
|
||
auto-match. Platform and email users are auto-provisioned **durable** accounts
|
||
with an identity. (Reaping abandoned guest rows is deferred — PLAN.md TODO-3.)
|
||
|
||
## 4. Accounts, identities, linking & merge
|
||
|
||
- One internal account may carry several **platform identities**
|
||
(`telegram`, `vk`, …) plus an optional **email** identity. First contact from
|
||
a platform auto-provisions a durable account bound to that platform identity.
|
||
Concretely, platform and email identities share one `identities` table keyed by
|
||
a unique `(kind, external_id)`; email is an identity with `kind=email` and a
|
||
`confirmed` flag. A synthetic `kind='robot'` identity (Stage 5) backs each pooled
|
||
robot opponent (§7). The **email confirm-code flow** (Stage 4) binds an email to the
|
||
authenticated account: a 6-digit code (stored only as a SHA-256 hash, 15-minute
|
||
TTL, ≤ 5 attempts) is sent through a `Mailer` seam (an SMTP relay, or a
|
||
development log mailer when none is configured) and, once verified, attaches a
|
||
confirmed email identity. Accounts and identities use application-generated
|
||
**UUIDv7** primary keys. A service flag `paid_account` (lifetime one-time
|
||
payment; no purchase flow yet) is carried on the account and ORed on a merge.
|
||
- **Linking** (Stage 11) is initiated from an authenticated profile and proves
|
||
control of the identity before attaching it: **email** through the confirm-code
|
||
flow, **Telegram** through the web **Login Widget** (validated by the connector,
|
||
HMAC under `SHA-256(bot_token)` — distinct from Mini App initData; the gateway
|
||
passes the trusted `external_id` to the backend, as for `auth.telegram`). The
|
||
request step **always** sends/accepts the proof (no pre-send "already taken"
|
||
signal, so a probe cannot enumerate registered addresses); a required **merge**
|
||
is revealed **only after** the proof is verified and is performed behind an
|
||
explicit, irreversible confirmation. A free identity is simply attached (and a
|
||
guest is promoted to durable, clearing `is_guest`).
|
||
- **Merge** retires the account that owns the linked identity into the **current**
|
||
account, in a single transaction (`internal/accountmerge`): statistics summed
|
||
(max points kept), the hint wallet summed, `paid_account` ORed, identities
|
||
repointed, games / chat / complaints transferred, friends and blocks
|
||
de-duplicated (friendships keep the strongest status accepted>pending>declined),
|
||
pending invitations/codes dropped, and the secondary kept as an **audit
|
||
tombstone** (`accounts.merged_into`/`merged_at`) so a shared **finished** game's
|
||
no-cascade foreign keys stay valid — its seat there is left untouched. A merge is
|
||
**refused** only when the two share an **active** game. The current account is the
|
||
primary, **except** when the initiator is a **guest** and the linked identity
|
||
already has a **durable** owner: then the durable account wins, the guest's active
|
||
games move into it, the guest is retired, and a **fresh session** is minted for the
|
||
durable account (the client switches to it). The secondary's sessions are revoked
|
||
(§3). High blast-radius; an isolated, well-tested stage.
|
||
|
||
## 5. Game engine integration (`scrabble-solver`)
|
||
|
||
`backend` embeds the solver library in-process behind `internal/engine`, the
|
||
only package that imports `scrabble-solver` (see [`CLAUDE.md`](../CLAUDE.md) for
|
||
the solver's public API and constraints). The engine is a self-contained rules
|
||
library — no persistence, transport or scheduling; the game domain drives it.
|
||
Key points:
|
||
|
||
- Variants at launch: **English Scrabble**, **Russian Scrabble**, **Эрудит**
|
||
(`engine.Variant`, mapping to `rules.English()` / `RussianScrabble()` /
|
||
`Erudit()`). Эрудит's specifics (non-doubling centre, `ё` with no tiles, 3
|
||
blanks, a 15-point bonus) live entirely in the solver ruleset, so the engine
|
||
treats every variant uniformly.
|
||
- **Dictionaries** are committed DAWGs loaded with `dawg.Load` from the
|
||
directory `BACKEND_DICT_DIR`; `backend` loads the `engine.Registry` at startup
|
||
as a hard dependency (like migrations), so a missing dictionary fails the boot.
|
||
The registry holds dictionaries in memory addressed by `(variant,
|
||
dict_version)`, tracking the latest version per variant, and answers the
|
||
word-check tool through `Registry.Lookup`.
|
||
- **Dictionary versioning — pin per game.** A game records the `dict_version` it
|
||
started on and finishes on that version; new games use the latest. Multiple
|
||
versions may be resident at once. The boot version loads from the flat
|
||
`BACKEND_DICT_DIR`; the admin console **hot-reloads** a new version from a
|
||
per-version subdirectory `BACKEND_DICT_DIR/<version>/` through
|
||
`Registry.LoadAvailable` (only the variants whose DAWG is present there), and a
|
||
restart re-loads every resident version via `engine.OpenWithVersions` (the flat
|
||
boot version plus each subdirectory). In-flight games keep their pinned version;
|
||
new games use the latest. (The solver is published as a versioned module and the
|
||
dictionaries ship as a separate versioned **release artifact** from the
|
||
`scrabble-dictionary` repo — TODO-1/TODO-2, Stage 14; the runtime contract above is
|
||
unchanged.)
|
||
- Move generation/validation/scoring use `Solver.GenerateMoves` (ranked),
|
||
`Solver.ValidatePlay` and `Solver.ScorePlay`; board mutation uses
|
||
`scrabble.Apply`. The engine adds its own deterministic, seeded tile **bag**
|
||
that can return tiles (an exchange needs this; the solver's self-play bag
|
||
cannot).
|
||
- **`engine.Game`** is the in-memory match state and the pure rules engine: it
|
||
deals racks, applies legal plays / passes / exchanges / resignations, refills
|
||
from the bag, keeps the scores and whose turn it is, and **detects the end of
|
||
the game** — empty bag with an empty rack, or six consecutive scoreless turns,
|
||
applying the end-game rack-value adjustment, or a resignation. On a
|
||
**resignation the resigner keeps their accumulated score (no rack adjustment)
|
||
and never wins**: the win goes to the highest score among the remaining seats,
|
||
unconditionally the other player in a two-player game. A player may resign **on the
|
||
opponent's turn** (a forfeit is not a turn-scoped move): `engine.ResignSeat(seat)`
|
||
resigns that player's own seat whoever is to move, and the game domain skips the turn
|
||
check for resign (Stage 17). The engine exposes a
|
||
decoded, solver-free API (`SubmitPlay`/`SubmitExchange`/`EvaluatePlay`/
|
||
`HintView`/`Hand`) so `internal/game` drives it without importing the solver.
|
||
- The **game domain** (`internal/game`) owns everything the engine does not —
|
||
persistence, turn scheduling, the configurable turn timeout / auto-resign, the
|
||
hint budget, word-check complaints, history and GCG — and is the engine's only
|
||
consumer. Timeout auto-resign reuses `engine.Resign`, recording the move as a
|
||
timeout, so it inherits the resignation win/loss.
|
||
- History is dictionary-independent (§9.1): the engine emits decoded
|
||
`MoveRecord`s and reconstructs the board from them with `engine.ReplayBoard`
|
||
(alphabet only, no dictionary).
|
||
|
||
## 6. Game rules
|
||
|
||
- **Word legality: validate-at-submit.** An illegal play is rejected by
|
||
`Solver.ValidatePlay`; there is no challenge phase.
|
||
- **End of game**: the bag is empty **and** a player empties their rack, **or**
|
||
**6 consecutive scoreless turns** (passes/exchanges), **or** a resignation, or
|
||
a missed turn. The **per-game turn timeout** is chosen at creation
|
||
(5/10/15/30 min, 1/2/3/6/12/24 h; default 24 h); a turn not made within it
|
||
becomes an automatic resignation, applied by a background sweeper. The sweeper
|
||
honours each player's **away window** — a daily local-time sleep interval on the
|
||
account (default 00:00–07:00, midnight-cross aware) — so a player is never
|
||
timed out while asleep.
|
||
- **Players**: auto-match is always 2 players; friend games are 2–4 players.
|
||
`backend` owns turn order and the bag for any player count. A resignation or
|
||
timeout in a two-player game ends it with the other player winning. In a game
|
||
with **three or more seats** a resignation or timeout **drops that seat and the
|
||
rest play on** — the engine skips the resigned seat in the turn rotation and
|
||
excludes it from the win, finishing the game (the sole survivor wins) only once
|
||
one active seat remains, or by the ordinary end conditions among the active
|
||
seats. A per-game **drop-out tile disposition**, chosen at creation
|
||
(`dropout_tiles`: `remove` from play — the default — or `return` to the bag),
|
||
governs the leaver's rack, which is **never revealed** to the remaining players;
|
||
it is recorded for deterministic journal replay. (Two-player games end on the
|
||
first drop-out, so the disposition does not affect them.)
|
||
- **Hint**: governed by two per-game settings — whether hints are allowed and the
|
||
starting per-player allowance — plus a per-account hint **wallet**
|
||
(`hint_balance`, spent after the allowance; top-ups are a later feature). A hint
|
||
reveals the top-1 ranked move (`GenerateMoves[0]`). The lobby/tournament caller
|
||
picks the per-game defaults (e.g. one in casual random games, none in
|
||
tournaments). The client **lays the hinted tiles onto the board** as a pending
|
||
placement and leaves the commit to the player. When the rack has no legal move the
|
||
service spends **nothing** and returns `ErrNoHintAvailable` — surfaced as the distinct
|
||
result code `no_hint_available` (separate from `hint_unavailable`) so the UI can say
|
||
"no options" rather than "no hints left".
|
||
- **Word-check tool**: unlimited dictionary lookups against the game's pinned
|
||
dictionary; each result offers a **complaint** (complainant, game, variant,
|
||
dict_version, word, the disputed result, an optional note) that lands in the admin
|
||
review queue. An operator resolves it (`open → resolved`) with a **disposition** —
|
||
reject, accept-add or accept-remove; the accepted ones form a derived
|
||
**pending-changes** list that feeds the offline dictionary rebuild and is marked
|
||
applied once the rebuilt version is hot-reloaded (§5, §12).
|
||
|
||
## 7. Robot opponent
|
||
|
||
Substitutes for a human in 2-player auto-match when the pool yields no human
|
||
within 10 seconds (§8). It lives in `internal/robot` and plays as an ordinary
|
||
seated account through the game service, so only `internal/engine` imports the
|
||
solver. It is designed to be indistinguishable from a person.
|
||
|
||
The robot keeps **no per-game state**: every choice is derived deterministically
|
||
from the game's bag `seed` (a restart-stable FNV-1a mix), so a background driver
|
||
(`robot.Service.Run`, mirroring the turn-timeout sweeper) recomputes the same
|
||
behaviour on every scan and after a restart — the same philosophy as journal
|
||
replay. A pool of durable accounts — each a `kind='robot'` identity (§4), keyed
|
||
`robot-<lang>-<index>` and provisioned at startup with **chat blocked but friend
|
||
requests open** — a request to a robot is accepted as pending and expires unanswered
|
||
(the robot never responds), mirroring a human who ignores it (Stage 17); the chat
|
||
block backs the human-like names (there is no DM surface; chat is per-game). Names are
|
||
**composed per language** from a first-name pool (32 full + 32 colloquial forms) and
|
||
a surname pool (gender-agreed for Russian) in one of three forms (first only /
|
||
first + surname initial / first + full surname), deterministically per pool slot so
|
||
they stay stable across restarts. Substitution is **variant-aware**: a Russian game
|
||
(Russian Scrabble or Эрудит) draws a Russian-named robot with at most ~20% Latin, an
|
||
English game the Latin pool.
|
||
|
||
- **Balance**: at game start it decides once whether to play to win, with
|
||
`P(play-to-win) ≈ 0.40` (so the human wins ≈ 60%), derived from the seed.
|
||
Adaptive difficulty is post-MVP.
|
||
- **Margin targeting**: each turn it picks from the ranked candidates
|
||
(`engine.Candidates`) the move whose resulting lead (playing to win) or deficit
|
||
(playing to lose) is closest to a small band (**1–30 points**), rather than
|
||
always the maximum; with no legal play it exchanges a full rack when the bag can
|
||
refill it, else passes.
|
||
- **Timing**: the per-move delay is **move-number-aware** — a right-skewed sample
|
||
(exponent k=4, short delays frequent) from a band that interpolates from
|
||
**[3, 10] min** at the first move to **[10, 90] min** by ~28 moves, so openings are
|
||
quick and the endgame can run long, clamped to **[1, 90] minutes**; it
|
||
**sleeps 00:00–07:00** anchored to the **opponent's** profile timezone with a
|
||
per-game drift of **±3 h** (fallback UTC), so its night overlaps the human's
|
||
rather than running anti-phase; on a daytime nudge it replies near the move's lower
|
||
band; it proactively nudges the idle human on a **lengthening, randomized schedule** — the
|
||
first ~60-90 min into the turn, each later reminder spaced further out toward 1-6 h — so a long
|
||
wait gets a handful of increasingly-spaced nudges rather than an hourly stream (Stage 17).
|
||
- **Observability**: robot accounts accrue ordinary statistics (§9) — the
|
||
authoritative balance metric (target ≈ 40% robot wins) — and a
|
||
`robot_games_finished_total` OTel counter plus a per-finish log give a live view.
|
||
The **admin game card** surfaces each robot seat's per-game play-to-win intent (from
|
||
the seed) and, on the robot's turn, its deterministic **next-move ETA** (Stage 17).
|
||
|
||
## 8. Lobby & social
|
||
|
||
- **Matchmaking**: an **in-memory** FIFO pool keyed by `variant` (the variant
|
||
fixes the board language), pairing the next two humans into a two-player
|
||
auto-match with the seat order randomised for first-move fairness. The pool is
|
||
lost on restart (players re-queue) and is anonymous, so it does not consult
|
||
blocks. After **10 s** with no human a background reaper substitutes a pooled
|
||
robot (§7) and starts the game. On a pairing or substitution the matchmaker
|
||
emits a **match-found** notification (§10), delivered over the live stream;
|
||
`Poll` remains as a fallback for a client that is not currently streaming.
|
||
**Cancel** (`POST /lobby/cancel`) removes the player from the pool and drops any
|
||
pending matched result, so a cancelled quick-match is dequeued rather than left for
|
||
the reaper to robot-substitute (Stage 17).
|
||
- **Friends** (Stage 8): two add paths over one `friendships` table. A **one-time
|
||
code** the to-be-added player issues (a `friend_codes` row: 6-digit numeric,
|
||
SHA-256-hashed, **12 h** TTL, one live code per issuer, single-use, redeem
|
||
rate-limited) is redeemed by the other player to become friends immediately.
|
||
Alternatively a **request → accept** is sent to someone you **share a game with**
|
||
(active or finished); the recipient may accept, ignore (the pending row lazily
|
||
expires after **30 days** and may be re-sent), or **decline** — a decline is
|
||
remembered (`status='declined'`) and blocks further requests from that sender,
|
||
unless they hand them a code, which overrides it. The requester's own cancel still
|
||
deletes the row; blocking someone severs an existing friendship. (Discovery by
|
||
friend list or platform deep-link arrives with Stage 9 / TODO-5.)
|
||
- **Block**: two independent **global** account toggles (`block_chat`,
|
||
`block_friend_requests`) **plus** a **per-user block list**. A per-user block is
|
||
applied mutually: it hides the pair's chat from each other and refuses friend
|
||
requests and game invitations between them.
|
||
- **Friend games**: formed by **invitation → accept** (an `game_invitations`
|
||
record with one row per invitee). The 2–4 player game starts once **every**
|
||
invitee accepts; any decline cancels the invitation, and a pending invitation
|
||
expires after 7 days (enforced lazily on access).
|
||
- **Chat**: per-game, persisted (kept with the game's archive), **≤ 60 runes**,
|
||
and **validated on input** — links, email addresses and phone numbers (including
|
||
lightly obfuscated forms) are rejected, since the chat is for quick reactions,
|
||
not contact exchange. Each message stores the sender's IP (forwarded by the
|
||
gateway in Stage 6) for moderation. A sender who has disabled chat cannot post,
|
||
and messages from a blocked sender are hidden from the viewer. The operator console
|
||
has a **Messages** section (Stage 17) that lists posted messages (nudges excluded)
|
||
newest-first with the sender's resolved name, **source** (guest / robot / oldest
|
||
identity kind), IP and game, searchable by sender name / external-id glob masks and
|
||
pinnable to one game or sender (linked from the game and user cards).
|
||
- **Nudge**: folded into the chat as a `nudge` message kind. The player awaiting
|
||
the opponent may nudge **once per hour per game**; it is not allowed on one's own
|
||
turn. The platform-native delivery is wired with the gateway / platform
|
||
side-service (Stage 6 / 8).
|
||
- **Profile**: `preferred_language` (en/ru, edited in Settings), display name, email
|
||
(confirm-code binding, see §4), **timezone**, the daily **away window** and the
|
||
block toggles — all editable through `account.UpdateProfile`, which validates them
|
||
(Stage 8): a display name is Unicode letters joined by single ` `/`.`/`_`
|
||
separators (no leading/trailing/adjacent separators, ≤ 32 runes); the timezone is a
|
||
fixed `±HH:MM` **UTC offset** (or a legacy IANA name) resolved by `account.ResolveZone`
|
||
for the sweeper and the robot's sleep (a fixed offset trades DST for a simple
|
||
picker); the away window is at most **12 h** (midnight-wrap aware). Linked platform
|
||
accounts and merge are Stage 11.
|
||
|
||
## 9. Persistence
|
||
|
||
- Single Postgres database, schema `backend`; `backend` is the only writer. The
|
||
"pgx pool" is a `database/sql` handle backed by the pgx stdlib driver and
|
||
instrumented with otelsql; type-safe queries use **go-jet** (code generated
|
||
into `internal/postgres/jet` and committed, regenerated by `cmd/jetgen`).
|
||
Migrations are embedded SQL applied with `pressly/goose/v3` at startup. Primary
|
||
keys are application-generated **UUIDv7**.
|
||
- Tables: `accounts` (durable internal accounts; Stage 3 added the away-window
|
||
columns `away_start`/`away_end` and the hint wallet `hint_balance`; Stage 6's
|
||
migration `00005` added the `is_guest` flag for ephemeral guest rows; Stage 9's
|
||
migration `00007` added the `notifications_in_app_only` out-of-app push toggle;
|
||
Stage 11's migration `00009` added the `paid_account` service flag and the
|
||
merge-tombstone columns `merged_into`/`merged_at`),
|
||
`identities` (platform/email/robot identities, unique `(kind, external_id)`;
|
||
Stage 5's migration `00004` admits the `robot` kind),
|
||
`sessions` (revoke-only opaque-token hashes), the Stage 3 game tables
|
||
`games` (Stage 4 added the `dropout_tiles` disposition column), `game_players`,
|
||
`game_moves` (the move journal), `complaints` and `account_stats`, and the
|
||
Stage 4 social/lobby tables `friendships` (the request/accept graph), `blocks`
|
||
(per-user blocks), `chat_messages` (per-game chat and nudges), `email_confirmations`
|
||
(pending confirm-codes) and `game_invitations` / `game_invitation_invitees`
|
||
(friend-game invitations). Stage 8's migration `00006` widened the `friendships`
|
||
status to admit `declined` and added `friend_codes` (one-time add-a-friend codes).
|
||
Stage 17 added `game_drafts` (a player's in-progress rack order + board composition per
|
||
game) and `game_hidden` (`(account_id, game_id)` rows that drop a finished game from one
|
||
account's own lobby list, leaving it visible to the other players — finished-only and
|
||
irreversible by design, so there is no un-hide).
|
||
The matchmaking pool is **in-memory** and persists nothing.
|
||
- **Active games are event-sourced.** A game is a `games` row (pinned
|
||
`variant`/`dict_version`, bag `seed`, the per-game settings, and a denormalised
|
||
turn cursor) plus an append-only, decoded move journal (`game_moves`); the live
|
||
position is an `engine.Game` held in an in-memory cache (≈24 h idle TTL) and
|
||
rebuilt by replaying the journal on a miss, which the seeded bag makes exact.
|
||
Each game is serialised by a per-game lock; a persistence failure evicts the
|
||
live game so the next access rebuilds from the journal. `game_players` records
|
||
each seat's account, running score, hints used and winner flag.
|
||
- **Statistics** (`account_stats`, recomputed on each finish for durable
|
||
non-guest accounts only — the finish-time recompute skips any `is_guest`
|
||
seat): wins, losses, **draws**, max points in a game, and
|
||
max points for a single **move** (which already folds in every word the move
|
||
formed plus the all-tiles bonus). A tie increments draws only; a resignation or
|
||
timeout is a loss for the acting player.
|
||
|
||
### 9.1 History invariant (must hold forever)
|
||
|
||
Archived games must replay **independently of any dictionary and of the
|
||
solver's internal encoding** — at least visually. Therefore the move journal
|
||
persists only **decoded concrete values**: action kind (play / pass / exchange /
|
||
resign / timeout), acting player, per-move score and running total, timestamp,
|
||
and — in a per-move JSON payload — the acting player's rack before the move (with
|
||
`?` for a blank), and for a play its direction, main-word anchor, placed tiles
|
||
(letter as text, coordinate, blank flag) and the words formed; for an exchange,
|
||
the swapped tiles. This is exactly what is needed both to **replay the game
|
||
through the engine** (a cache miss) and to render history or emit GCG **without a
|
||
dictionary**: the board for visual replay is reconstructed by applying placements
|
||
onto an empty grid, since moves were validated at play time and scores are
|
||
stored. `variant` and `dict_version` are kept as **metadata only** (audit,
|
||
complaint review), never as a replay dependency. **GCG export** is derived from
|
||
the same rows and is likewise self-contained — we ship our own writer (the solver
|
||
exposes none): the standard Poslfit dialect (UTF-8, `#player`/`#lexicon`
|
||
pragmas, `8G`/`H8` coordinates, lower-case blanks, `.` pass-throughs, `-TILES`
|
||
exchanges), plus `#note` lines for resignations and timeouts, which the standard
|
||
does not cover. **GCG export is offered only on a finished game** (`game.ErrGameActive`
|
||
otherwise, Stage 8), so an in-progress journal is never leaked mid-play; the client
|
||
shares the `.gcg` file via the Web Share API where available, else downloads it.
|
||
|
||
The Stage 13 alphabet-on-the-wire change does **not** touch this invariant: the live edge
|
||
exchanges alphabet indices, but the persisted journal (and everything derived from it —
|
||
replay, history, GCG) keeps the decoded concrete letters described above, so an archived
|
||
game still replays with the variant's `rules.Alphabet` alone, independent of any dictionary.
|
||
|
||
## 10. Notifications
|
||
|
||
Two channels: the **in-app live stream** (delivered from Stage 6) and
|
||
**platform-native push** (out-of-app, via the platform side-service — Stage 9).
|
||
The backend emits notification intents through an in-process hub
|
||
(`internal/notify`, a `Publisher` seam installed on the game, social and lobby
|
||
services); a single backend→gateway **gRPC server-stream** (`Push.Subscribe`,
|
||
`pkg/proto/push/v1`) carries every event, and the gateway fans them out by
|
||
`user_id` to each client's Connect `Subscribe` stream while the app is open. The
|
||
catalog is **your-turn** and **opponent-moved** (emitted from the game commit, so
|
||
robot-driver and timeout-sweeper moves emit too; opponent-moved goes to **every seat,
|
||
including the mover**, so the mover's own other devices and their lobby refresh — it is
|
||
in-app only, so the actor gets no out-of-app push for their own move), **chat-message** and **nudge**
|
||
(from the social service), **match-found** (from the matchmaker, §8), and **notify**
|
||
(Stage 8 — a lightweight "re-poll" signal carrying a sub-kind: friend-request,
|
||
friend-added, friend-declined, invitation or game-started; emitted on a friend-request,
|
||
on answering one (accept → friend-added, decline → friend-declined — to the original
|
||
requester, so a game screen watching that opponent re-derives its "add to friends" state,
|
||
Stage 17), and on an invitation create or its game start). Stage 17 added **game-over** (emitted to every
|
||
seat from the same game commit when a game finishes — any path: a closing play, all-pass,
|
||
resign or timeout) and **enriched your-turn** so the out-of-app push reads in full: it now
|
||
also carries the mover's display name, their last action and the main word of a scoring play,
|
||
and a **recipient-first** running score line (e.g. `120:95:80`, the reader's score first).
|
||
Event payloads are FlatBuffers-encoded by
|
||
the backend and forwarded verbatim. A client that is not currently streaming falls
|
||
back to the matchmaker's `Poll` for match-found and, for the lobby **notification
|
||
badge** (incoming friend requests + open invitations), the client polls on lobby
|
||
open and on focus as well as re-polling on the `notify` event — covering a push
|
||
missed while the app was hidden. **Out-of-app platform push** (Stage 9) is a fallback
|
||
the **gateway** routes from the same firehose: for an event whose recipient has **no
|
||
live in-app stream** it resolves the backend `/internal/push-target` (their Telegram
|
||
`external_id`, the **service language** — the bot they last signed in through, falling
|
||
back to the interface language — and the `notifications_in_app_only` flag). A **game** event,
|
||
however, carries the **game's own language** on the push (Stage 17), and the gateway routes by
|
||
that instead of the service language — so a game's notification always comes from the game's bot,
|
||
not the recipient's latest-login bot. It then asks the **Telegram connector** to deliver a
|
||
localized message with a Mini App deep-link button — only when the recipient has a Telegram
|
||
identity and has not confined notifications to the app, so the two channels never duplicate. The
|
||
connector routes by that language to the matching bot and renders the message in it. The out-of-app set is
|
||
your-turn, game-over, nudge, match-found and the invitation / friend-request notify sub-kinds;
|
||
the connector renders the message and skips the rest. Operator broadcasts
|
||
(`SendToUser` / `SendToGameChannel`, §10 admin) instead pick the bot by an
|
||
**operator-chosen** language in the console, unrelated to the recipient's login. Session-revocation events and
|
||
cursor-based stream resume stay deferred (single-instance MVP).
|
||
|
||
A separate **announcements channel** feeds the client's one-line banner (UI_DESIGN.md).
|
||
It is a client-side **mock** rotation today; a server-driven source (operational notices,
|
||
promotions) is future work and would deliver short markdown messages (text + links).
|
||
|
||
## 11. Observability
|
||
|
||
- Structured logging with `go.uber.org/zap` (JSON). OpenTelemetry tracer and
|
||
meter providers are wired in **all three services** (backend, gateway, the
|
||
Telegram connector) through a shared `pkg/telemetry` bootstrap, env-gated per
|
||
service by `{BACKEND,GATEWAY,TELEGRAM}_OTEL_{TRACES,METRICS}_EXPORTER` with a
|
||
default of `none` (so no collector is required locally or in CI). `stdout` is
|
||
available for debugging; **`otlp`** (gRPC, endpoint from the standard
|
||
`OTEL_EXPORTER_OTLP_*` environment) exports to a collector. The Postgres pool is
|
||
instrumented with otelsql and `otelgrpc` traces the backend↔gateway push stream
|
||
and the gateway↔connector calls. The OTLP **Collector** (OTLP/gRPC → Prometheus
|
||
metrics + Tempo traces), **Prometheus** (15d), **Tempo** (72h) and **Grafana**
|
||
(provisioned datasources + dashboards, behind the caddy `/_gm/grafana` Basic-Auth)
|
||
are stood up with the deploy (`deploy/`, Stage 16); the default exporter stays
|
||
`none`, so CI needs no collector. The contour also runs **cAdvisor** (per-container
|
||
CPU/memory/network) and **postgres_exporter** (connections, cache-hit ratio,
|
||
transactions, db size), scraped by Prometheus and surfaced on the **Scrabble —
|
||
Resources** Grafana dashboard (R2), so the pre-release stress runs capture a resource
|
||
baseline; these export directly in Prometheus format (not through the collector).
|
||
- Per-request server-side timing via gin middleware from day one (the access log
|
||
carries method, route, status, latency and the active trace id). A
|
||
client-measured RTT piggybacked on the next request is a later enhancement.
|
||
- Domain/operational metrics (Stage 12), recorded through the meter and invisible
|
||
until an exporter is configured: histograms `game_replay_duration` (journal
|
||
rebuild on a cache miss), `game_move_validate_duration` and `game_move_duration`
|
||
(Stage 17 — a seat's think time per committed move, attributed by `variant` and a
|
||
`phase` of opening/middle/endgame; it aggregates **all** seats including robots,
|
||
whose synthetic timing dominates the tail, so per-human analysis lives in the admin
|
||
console, below); counters `games_started_total`, `games_abandoned_total` (a
|
||
turn-timeout seat drop), `chat_messages_total` (`kind` = message/nudge) and
|
||
`robot_games_finished_total`; an observable gauge `game_cache_active`; the gateway
|
||
`edge_request_duration` (the UI-perceived roundtrip, by `message_type`/`result`);
|
||
and Go runtime/heap metrics. Game-scoped metrics carry a `variant` attribute
|
||
(scrabble_en/scrabble_ru/erudit_ru).
|
||
- Per-user move-time analytics (Stage 17) are **offline**, derived in the admin
|
||
console from the move journal (`game_moves.created_at` deltas, the first move from
|
||
the game's creation), not Prometheus labels (which an `account_id` would explode):
|
||
the user list shows each account's min/avg/max think time, and the user-detail page
|
||
draws a zero-JS inline-SVG chart of min/mean/max by the player's move number.
|
||
- User metrics (Stage 16): a backend counter `accounts_created_total` (`kind` =
|
||
telegram/email/guest; robots are a provisioned pool, not users, and are excluded)
|
||
and a gateway **in-memory** observable gauge `active_users` (`window` = 24h/7d) —
|
||
distinct accounts that performed an authenticated edge action in the window. The
|
||
gauge is single-process by design (single-instance MVP, §10): it is correct for one
|
||
gateway, resets on restart, and is a live operational figure, not a billing count.
|
||
- **Rate-limit observability (R3):** every limiter rejection increments the gateway
|
||
counter `gateway_rate_limited_total` (`class` = user/public/email/admin — aggregate
|
||
only, honouring the no-per-user-label discipline above) and logs one **Debug** line;
|
||
a gateway reporter drains the per-key rejection tracker every 30 s, emits one **Warn**
|
||
summary per throttled key and posts the report to the backend
|
||
(`POST /api/v1/internal/ratelimit/report`, network-trusted like `sessions/resolve`).
|
||
The backend's `ratewatch` keeps a bounded in-memory episode window (single-instance,
|
||
resets on restart, like `active_users`) surfaced on the admin console's **Throttled**
|
||
page next to the flagged-account review queue, and applies the **conservative
|
||
auto-flag**: an account sustaining `BACKEND_HIGHRATE_FLAG_THRESHOLD` rejected calls
|
||
(default 1000) within `BACKEND_HIGHRATE_FLAG_WINDOW` (default 10 min) gets the soft,
|
||
reversible `accounts.flagged_high_rate_at` marker — set once, shown in the user
|
||
list/detail, cleared by the operator, **never an automatic ban** and never a request
|
||
gate. The Edge/UX dashboard graphs the aggregate request rate against the rejection
|
||
rate by class.
|
||
- Unauthenticated `GET /healthz` (liveness) and `GET /readyz` (readiness — the
|
||
database answers a bounded ping and the session cache is warmed).
|
||
- The backend serves a **second listener** — a gRPC server
|
||
(`BACKEND_GRPC_ADDR`, default `:9090`) for the live-event push stream to the
|
||
gateway — alongside the HTTP listener; both start together and stop on signal.
|
||
|
||
## 12. Security boundaries
|
||
|
||
| Concern | Enforced by |
|
||
| --- | --- |
|
||
| Public rate limiting / anti-abuse | gateway (per-IP public/email/admin classes, per-user authenticated class; a request body cap of `GATEWAY_MAX_BODY_BYTES`; rejections are metered, summarised to the backend and surfaced in the admin console with a conservative reversible auto-flag — R3, §11) |
|
||
| Telegram initData validation (bot-token HMAC) | the Telegram connector; the gateway delegates it over gRPC, so the bot token lives only in the connector |
|
||
| Session minting; email-code / guest validation | gateway (with backend) |
|
||
| Session → `user_id` resolution, `X-User-ID` injection | gateway |
|
||
| Authorisation, ownership, state transitions | backend (`X-User-ID` is the sole identity input) |
|
||
| Admin authentication | a single Basic-Auth gate on `/_gm/*`, forwarded **verbatim** to the backend's server-rendered admin console (and, in the deployed contour, routing `/_gm/grafana/*` to Grafana). In the deploy the **caddy** owns this gate (§13); a local non-caddy run uses the gateway's own `GATEWAY_ADMIN_*` proxy, which the per-IP admin limiter class guards ahead of its Basic-Auth (R3) — the caddy-fronted path has no limiter (stock caddy), an accepted gap. The backend trusts the proxy (no admin principal) and guards its state-changing POSTs with a **same-origin** check — the console's CSRF defence. No operator identity is tracked |
|
||
| backend ↔ gateway ↔ connector trust | the network (only gateway may reach backend; the connector serves unauthenticated gRPC on the internal segment) |
|
||
|
||
This is an explicit, accepted MVP risk: compromise of the gateway↔backend
|
||
network segment defeats backend authentication. Mitigated by network isolation;
|
||
mutual auth is a future hardening step.
|
||
|
||
**Short numeric codes** (email confirm-codes and Stage 8 friend codes) are stored
|
||
only as SHA-256 hashes and are short-lived and single-use. The unauthenticated
|
||
email path carries a tight per-IP sub-limit (5 / 10 min); the **friend-code redeem**
|
||
is authenticated, so it rides the per-user limit (300 / min) and is further bounded
|
||
by the code's 12 h TTL, single use, and **one live code per issuer** (which caps the
|
||
valid-code population). Brute-forcing a 6-digit friend code within these limits is an
|
||
accepted MVP risk with low blast radius (an unwanted friendship is removable/blockable);
|
||
a dedicated redeem sub-limit or a longer code is the hardening step if abuse appears.
|
||
|
||
## 13. Deployment (informational)
|
||
|
||
Single public origin, path-routed. The Vite build has two entries: a lightweight
|
||
**landing page** and the game **SPA**. The gateway **embeds** the SPA build
|
||
(`go:embed`, baked in by a node stage in `gateway/Dockerfile`) and serves it at
|
||
`/app/` (web) and `/telegram/` (the Telegram Mini App; outside Telegram that path
|
||
redirects to the root — the client-side guard); a stray hit on the gateway's `/`
|
||
308-redirects to `/app/`. The **landing** ships in its own static container (R3): the
|
||
`landing` target of `gateway/Dockerfile` (caddy:2-alpine + the same Vite build,
|
||
`deploy/landing/Caddyfile`) serves it at `/`, so stray public traffic is absorbed by
|
||
static file serving and never reaches the Go edge. Hash-named `/assets/*` are served
|
||
`immutable` (a relaunch is a cache hit, not a re-download); the HTML shells are
|
||
`no-cache` so a new deploy is picked up — both containers apply the same caching. An
|
||
in-compose **caddy** is the contour's edge: it owns a single `/_gm` Basic-Auth and
|
||
routes `/_gm/grafana/*` to **Grafana** (anonymous-admin, so the one shared login gates
|
||
it with no per-user Grafana accounts) and the rest of `/_gm/*` to the backend-rendered
|
||
**admin console**; `/app/`, `/telegram/` and the Connect path go to the gateway; the
|
||
catch-all — notably the landing at `/` — goes to the landing container. The
|
||
**Telegram connector** runs as a separate container with **no public ingress** — it
|
||
long-polls Telegram and egresses through a VPN sidecar, answering only internal gRPC.
|
||
|
||
The full contour (`deploy/docker-compose.yml`) runs one `gateway`, one `backend`,
|
||
one Postgres, the static `landing`, the connector (+ its VPN sidecar) and the **observability stack** —
|
||
OTel Collector (OTLP/gRPC ingest → Prometheus metrics + Tempo traces) and Grafana
|
||
with provisioned datasources and dashboards. All three services export OTLP to the
|
||
collector; the connector shares the VPN sidecar's netns, so its `AWG_CONF` must not
|
||
carry a `DNS=` directive (that would hijack resolv.conf and stop it resolving
|
||
`otelcol`; without it the netns uses Docker's resolver, which resolves both
|
||
`otelcol` and `api.telegram.org`). Inter-service traffic uses a private `internal`
|
||
network (project-scoped DNS); only caddy joins the shared external `edge` network
|
||
(alias `scrabble`).
|
||
|
||
Two contours, two secret/variable prefixes (`TEST_` / `PROD_`):
|
||
- **Test** (Stage 16): auto-deploys on a PR into — or a push to — `development`
|
||
(`.gitea/workflows/ci.yaml` → `docker compose up -d --build` on the Gitea runner
|
||
host, then `GET /` + `GET /app/` probes through caddy — the landing container and
|
||
the gateway, R3). The host caddy terminates TLS and
|
||
forwards the domain to `scrabble:80`, so the in-compose caddy serves plain HTTP
|
||
(`CADDY_SITE_ADDRESS=:80`). The in-compose caddy **trusts X-Forwarded-For from
|
||
private-range upstreams** (`trusted_proxies private_ranges`), so the real client IP —
|
||
used for chat-moderation logging and the gateway's per-IP rate limiting — survives the
|
||
host-caddy hop; in prod (no host caddy) public clients are untrusted and Caddy uses the
|
||
real peer, so the single config is correct and spoof-safe in both contours (Stage 17).
|
||
- **Prod** (Stage 18): a manual SSH deploy after `development → master`. There is no
|
||
host caddy, so the contour ships its own caddy terminating TLS — set
|
||
`CADDY_SITE_ADDRESS` to the domain and the caddy does its own ACME.
|
||
|
||
## 14. CI & branches
|
||
|
||
- **Two long-lived branches** (Stage 16): **`development`** is the integration
|
||
trunk and **`master`** the production trunk; `feature/*` branches are cut from
|
||
`development` and PR back into it (the genesis commit necessarily landed on
|
||
`master`). A commit to a `feature/*` branch triggers nothing.
|
||
- A single `.gitea/workflows/ci.yaml` (Gitea has no cross-workflow `needs`) runs the
|
||
suite on a PR into `development`/`master` and on a push to `development`. Its
|
||
`unit` (gofmt/vet/build/unit-test), `integration` (Postgres-backed `integration`
|
||
tag, testcontainers `postgres:17-alpine`, Ryuk off, serial) and `ui`
|
||
(check/unit/build/bundle-budget/e2e) jobs are **path-conditional** (Stage 17 — a
|
||
`changes` job filters by changed paths), and an always-running **`gate`** job
|
||
aggregates them (passing when each succeeded or was **skipped**) and is the single
|
||
branch-protection required check (`CI / gate`), so a path-skipped job never blocks
|
||
a merge.
|
||
- A gated **`deploy`** job auto-rolls the **test contour** on a PR into — or a push
|
||
to — `development` (`docker compose up -d --build` on the runner host), then probes
|
||
the gateway (`GET /`) **and the Telegram connector's liveness** (Stage 17 —
|
||
`docker inspect`: running, not restarting, stable restart count, with a
|
||
VPN-handshake grace period, since the connector has no public ingress and a
|
||
crash-loop is otherwise invisible). A PR into `master` is test-only; the prod
|
||
deploy is the manual Stage 18 workflow. Secrets/variables are prefixed
|
||
`TEST_`/`PROD_` per contour.
|
||
- The engine consumes `scrabble-solver` as a **published, versioned module**
|
||
(`gitea.iliadenisov.ru/developer/scrabble-solver`, pinned in `backend/go.mod`); both Go
|
||
workflows set `GOPRIVATE=gitea.iliadenisov.ru/*` so go fetches it directly from this Gitea
|
||
(no public proxy/checksum DB, no sibling clone). The dictionaries ship as a **release
|
||
artifact** from the `scrabble-dictionary` repo; the workflows download
|
||
`scrabble-dawg-<DICT_VERSION>.tar.gz` and point the engine tests at it via
|
||
`BACKEND_DICT_DIR` (TODO-1/TODO-2 discharged in Stage 14).
|
||
- After any push, the run is watched to green before a stage is declared done
|
||
(`python3 ~/.claude/bin/gitea-ci-watch.py`).
|