scrabble-game/docs/ARCHITECTURE.md
Ilia Denisov eeaad62b10
Stage 1: backend foundation (Postgres, sessions, accounts, OTel)
- internal/postgres: pgx-over-database/sql pool (otelsql), embedded goose
  migrations into schema 'backend', committed go-jet code + cmd/jetgen tool.
- internal/account: durable accounts + unified telegram/email identities
  (UUIDv7 keys), find-or-create provisioning with unique-conflict handling.
- internal/session: opaque 256-bit tokens stored as a SHA-256 hash, revoke-only
  (no TTL); write-through cache gating /readyz; store + service.
- internal/telemetry: OTel tracer/meter providers (none/stdout) + request-timing
  middleware; internal/config gains Postgres + OTel env loading.
- internal/server: /api/v1 {public,user,internal,admin} skeleton + X-User-ID
  middleware; /readyz checks DB ping + cache; main wires
  telemetry -> db+migrate -> warm cache -> server.
- Tests: unit + integration (build tag 'integration', testcontainers
  postgres:17) for migrations, accounts, sessions, readyz; new integration.yaml.
- Docs: ARCHITECTURE, TESTING, PLAN refinements, root + backend READMEs.

Session/account REST handlers deferred to Stage 6 (gateway); OTLP + dashboards
to Stage 11.
2026-06-02 13:52:26 +02:00


Scrabble Game — Architecture

Source of truth for the platform architecture, transport, security model and cross-service contracts. User-visible behaviour per domain lives in FUNCTIONAL.md; the staged build order lives in ../PLAN.md. This document always describes the current design, not the history of how it was reached. Sections describing not-yet-implemented components are marked (planned).

1. Overview

Three executables plus per-platform side-services:

  • gateway (planned) — the only public ingress. Performs anti-abuse (rate limiting), authenticates the player against the originating platform (or an email/guest session), resolves the internal user_id, and forwards authenticated traffic to backend with an X-User-ID header. Hosts an admin surface behind HTTP Basic Auth. Bridges live events from backend to the client.
  • backend — internal-only service that owns every domain concern: identity/sessions, accounts and linking, lobby and matchmaking, the game runtime, the robot opponent, chat, notifications, statistics, history, and administration. Embeds the scrabble-solver engine as a library, in-process — there is no per-game container. The only network consumer of backend is gateway (plus platform side-services over an internal API).
  • ui (planned) — pure-HTML5 client (plain Svelte + Vite, static build). Talks to backend only through gateway. Embeddable in platform webviews; packageable to native (iOS/Android) via Capacitor.
  • platform/<name> (planned) — per-platform side-services (Telegram bot first): deep-link invites and platform-native push notifications. They talk to backend over an internal API.
flowchart LR
  Client((Client / webview)) -- Connect-RPC + FlatBuffers (h2c) --> Gateway
  Gateway -- REST/JSON, X-User-ID --> Backend
  Backend -- gRPC server-stream (live events) --> Gateway
  Gateway -- in-app stream --> Client
  Backend -- pgx --> Postgres[(Postgres)]
  Backend -. embeds .- Solver[[scrabble-solver library]]
  Telegram[Telegram bot side-service] -- internal API --> Backend

The MVP runs gateway and backend as single-instance processes inside a trusted network. No Redis is planned (anti-replay crypto was deliberately dropped). Horizontal scaling is explicit future work.

2. Transport

  • client ↔ gateway: Connect-RPC + FlatBuffers over HTTP/2 cleartext (h2c). Binary payloads, server-streaming for the in-app live channel, first-class JS clients (@connectrpc/connect-web + the flatbuffers npm package). The contract is kept minimal.
  • gateway ↔ backend (sync): plain HTTP REST/JSON. The gateway injects X-User-ID for authenticated requests; backend never re-derives identity from the body.
  • backend → gateway (live): a single gRPC server-stream carries live events (your-turn, opponent-moved, chat, nudge). The gateway bridges them to the client's in-app stream while the app is open. Out-of-app delivery uses platform-native push via the platform side-service.
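
The X-User-ID contract on the gateway ↔ backend hop can be illustrated with a small middleware. This is a minimal sketch using only net/http rather than the gin stack the backend actually uses; `RequireUserID`, `UserID`, `statusFor`, and the `/api/v1/user/profile` path are hypothetical names chosen for the example, and the 401 response for a missing header is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ctxKey is an unexported context key type, so other packages cannot collide with it.
type ctxKey struct{}

// RequireUserID (hypothetical) rejects requests that carry no X-User-ID header
// and otherwise stores the header value in the request context.
func RequireUserID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-User-ID")
		if id == "" {
			// Assumption: a missing identity is answered with 401.
			http.Error(w, "missing X-User-ID", http.StatusUnauthorized)
			return
		}
		ctx := context.WithValue(r.Context(), ctxKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// UserID returns the identity stored by RequireUserID, if any.
func UserID(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(ctxKey{}).(string)
	return id, ok
}

// statusFor sends one request through the middleware and returns the status code.
// An empty userID means the header is not set at all.
func statusFor(userID string) int {
	h := RequireUserID(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id, _ := UserID(r.Context())
		fmt.Fprintf(w, "hello %s", id)
	}))
	req := httptest.NewRequest(http.MethodGet, "/api/v1/user/profile", nil)
	if userID != "" {
		req.Header.Set("X-User-ID", userID)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("without header:", statusFor(""))
	fmt.Println("with header:", statusFor("user-1"))
}
```

Handlers behind such a middleware read identity only from the context, never from the request body, which mirrors the rule stated above.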

3. Authentication & sessions

Platform-native, deliberately simple: no Ed25519 client keys, no per-request signing, no anti-replay crypto (these were considered and dropped — players arrive from a platform rather than completing a mandatory registration).

  • The gateway validates the originating credential once — the platform's signed launch data (e.g. Telegram initData HMAC), an email-code login, or a guest bootstrap — then mints a thin opaque server session token (session_id).
  • The client holds session_id in memory for the app session (browser/OS storage is optional and may be unavailable; losing it means re-login).
  • The gateway caches session → user_id and injects X-User-ID. Session records live in backend, which stores only a SHA-256 hash of the opaque token (never the plaintext), keeps a warmed in-memory cache for fast resolution, and treats sessions as revoke-only: they have no TTL and live until explicitly revoked (status = revoked).
  • Guest = ephemeral web session (no platform, no email): session-only, nothing persisted; restricted to auto-match, with no friends and no stats/history. Platform users are auto-provisioned durable accounts.
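
The token handling described above (an opaque random token, stored only as a SHA-256 hash, revoke-only) might look roughly like the following. It is a sketch under assumptions: the in-memory `Store` stands in for the sessions table plus its cache, the base64url and hex encodings are illustrative choices, and all type and function names are hypothetical.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"sync"
)

// HashToken returns the hex-encoded SHA-256 digest of an opaque token.
// Only this digest is stored; the plaintext token is not.
func HashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// NewToken draws 256 random bits and encodes them for transport.
// The base64url encoding is an assumption of this sketch.
func NewToken() (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}

type session struct {
	userID  string
	revoked bool
}

// Store is an in-memory stand-in for the sessions table and its cache.
type Store struct {
	mu     sync.RWMutex
	byHash map[string]*session
}

func NewStore() *Store { return &Store{byHash: make(map[string]*session)} }

// Create mints a token for userID and records only its hash.
func (s *Store) Create(userID string) (string, error) {
	token, err := NewToken()
	if err != nil {
		return "", err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byHash[HashToken(token)] = &session{userID: userID}
	return token, nil
}

// Resolve maps a presented token to its user; unknown or revoked tokens fail.
func (s *Store) Resolve(token string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	sess, ok := s.byHash[HashToken(token)]
	if !ok || sess.revoked {
		return "", false
	}
	return sess.userID, true
}

// Revoke marks the session revoked. There is no expiry path.
func (s *Store) Revoke(token string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if sess, ok := s.byHash[HashToken(token)]; ok {
		sess.revoked = true
	}
}

func main() {
	store := NewStore()
	token, err := store.Create("user-1")
	if err != nil {
		panic(err)
	}
	_, ok := store.Resolve(token)
	fmt.Println("resolves before revoke:", ok)
	store.Revoke(token)
	_, ok = store.Resolve(token)
	fmt.Println("resolves after revoke:", ok)
}
```

Because the lookup key is the hash, a leaked copy of the store does not reveal usable tokens.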

4. Accounts, identities, linking & merge

  • One internal account may carry several platform identities (telegram, vk, …) plus an optional email identity. First contact from a platform auto-provisions a durable account bound to that platform identity. Concretely, platform and email identities share one identities table keyed by a unique (kind, external_id); email is an identity with kind=email and a confirmed flag (the confirm-code flow lands later). Accounts and identities use application-generated UUIDv7 primary keys.
  • Linking is initiated from an authenticated profile: choose a platform → complete that platform's web-auth confirm → attach the identity to the current account.
  • Merge: if the identity being linked already has its own account with history, that account is merged into the current one (the current account is primary): statistics are summed, games and friends are transferred, duplicates are de-duplicated, and the secondary account is retired. This has a high blast radius, so it is built as an isolated, well-tested stage.
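
Find-or-create provisioning keyed by the unique (kind, external_id) pair can be sketched in memory as follows. This is an illustration, not the Postgres-backed implementation: a mutex plays the role that the unique constraint and conflict handling play in the database, the UUIDv7 generator is hand-rolled from the RFC 9562 layout so the example needs only the standard library, and every name here is hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sync"
	"time"
)

// NewUUIDv7 builds a version-7 UUID: a 48-bit Unix millisecond timestamp,
// followed by version and variant bits, with the remaining bits random.
func NewUUIDv7(now time.Time) (string, error) {
	var u [16]byte
	if _, err := rand.Read(u[6:]); err != nil {
		return "", err
	}
	ms := uint64(now.UnixMilli())
	for i := 0; i < 6; i++ {
		u[i] = byte(ms >> (40 - 8*i)) // big-endian timestamp in bytes 0..5
	}
	u[6] = (u[6] & 0x0f) | 0x70 // version 7
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 9562 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16]), nil
}

// IdentityKey mirrors the unique (kind, external_id) pair.
type IdentityKey struct{ Kind, ExternalID string }

// Accounts is an in-memory stand-in for the accounts and identities tables.
type Accounts struct {
	mu         sync.Mutex
	byIdentity map[IdentityKey]string // identity -> account ID
}

func NewAccounts() *Accounts {
	return &Accounts{byIdentity: make(map[IdentityKey]string)}
}

// FindOrCreate returns the account bound to the identity, creating a new
// account on first contact. created reports whether a new account was made.
func (a *Accounts) FindOrCreate(kind, externalID string) (id string, created bool, err error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	key := IdentityKey{Kind: kind, ExternalID: externalID}
	if existing, ok := a.byIdentity[key]; ok {
		return existing, false, nil
	}
	id, err = NewUUIDv7(time.Now())
	if err != nil {
		return "", false, err
	}
	a.byIdentity[key] = id
	return id, true, nil
}

func main() {
	accts := NewAccounts()
	first, created, _ := accts.FindOrCreate("telegram", "42")
	fmt.Println(first, created)
	again, created, _ := accts.FindOrCreate("telegram", "42")
	fmt.Println(again == first, created)
}
```

Time-ordered keys keep inserts roughly sequential in the primary-key index, which is a common reason to prefer UUIDv7 over fully random identifiers.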

5. Game engine integration (scrabble-solver)

backend embeds the solver library (see CLAUDE.md for the exact public API and constraints). Key points:

  • Variants at launch: English Scrabble, Russian Scrabble, Эрудит (Erudit), provided by rules.English(), rules.RussianScrabble(), rules.Erudit().
  • Dictionaries are committed DAWGs loaded with dawg.Load; held in memory and addressed by (variant, dict_version).
  • Dictionary versioning — pin per game. A game records the dict_version it started on and finishes on that version; new games use the latest. Multiple versions may be resident at once. An admin reload endpoint (planned) adds a new version; delivery is the DAWG file in the image / a mounted volume.
  • Move generation/validation/scoring use Solver.GenerateMoves (ranked), Solver.ValidatePlay, Solver.ScorePlay; board mutation uses scrabble.Apply. Tile bag follows the selfplay.Bag pattern.
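
The pin-per-game rule can be pictured as a registry addressed by (variant, dict_version). The sketch below is an assumption-laden illustration: `Dict` is a placeholder for a loaded DAWG (it does not use the solver's API), dict_version is modelled as an integer, and `Registry` and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Dict is a placeholder for a dictionary loaded into memory.
type Dict struct{ Name string }

type dictKey struct {
	variant string
	version int // assumption: versions are ordered integers
}

// Registry holds every resident dictionary and tracks the newest version per variant.
type Registry struct {
	mu     sync.RWMutex
	dicts  map[dictKey]*Dict
	latest map[string]int
}

func NewRegistry() *Registry {
	return &Registry{dicts: make(map[dictKey]*Dict), latest: make(map[string]int)}
}

// Add makes a dictionary resident. Older versions stay available.
func (r *Registry) Add(variant string, version int, d *Dict) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.dicts[dictKey{variant, version}] = d
	if cur, ok := r.latest[variant]; !ok || version > cur {
		r.latest[variant] = version
	}
}

// Latest returns the version a new game of this variant would pin.
func (r *Registry) Latest(variant string) (int, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	v, ok := r.latest[variant]
	return v, ok
}

// Get returns the dictionary for a pinned (variant, version) pair.
func (r *Registry) Get(variant string, version int) (*Dict, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	d, ok := r.dicts[dictKey{variant, version}]
	return d, ok
}

func main() {
	reg := NewRegistry()
	reg.Add("english", 1, &Dict{Name: "english-v1"})
	pinned, _ := reg.Latest("english") // a game started now records this version

	reg.Add("english", 2, &Dict{Name: "english-v2"}) // a reload adds a newer version
	d, _ := reg.Get("english", pinned)
	newest, _ := reg.Latest("english")
	fmt.Println("running game uses", d.Name, "and new games pin version", newest)
}
```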

6. Game rules

  • Word legality: validate-at-submit. An illegal play is rejected by Solver.ValidatePlay; there is no challenge phase.
  • End of game: the bag is empty and a player empties their rack, or 6 consecutive scoreless turns (passes/exchanges). A move that is not made within the 24-hour turn timeout becomes an automatic resignation.
  • Players: auto-match is always 2 players; friend games are 2–4 players. backend owns turn order and the bag for any player count.
  • Hint: one per game; reveals the top-1 ranked move (GenerateMoves[0]).
  • Word-check tool: unlimited dictionary lookups; each result offers a complaint that lands in an admin review queue (admin side planned).
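
The end-of-game conditions above reduce to a small predicate. The following is a sketch only: the `Game` struct and the action-kind strings are hypothetical, and it covers just the two board-driven endings (empty bag with an empty rack, or six consecutive scoreless turns), not resignation or the turn timeout.

```go
package main

import "fmt"

// maxScorelessTurns is the number of consecutive passes/exchanges that ends a game.
const maxScorelessTurns = 6

// Game carries only the fields the end-of-game check needs.
type Game struct {
	BagSize         int
	RackSizes       []int // tiles left on each player's rack
	ScorelessStreak int
}

// Record updates the scoreless streak after a turn of the given kind.
func (g *Game) Record(kind string) {
	switch kind {
	case "pass", "exchange":
		g.ScorelessStreak++
	case "play":
		g.ScorelessStreak = 0
	}
}

// Over reports whether the game has ended by either board-driven rule.
func (g *Game) Over() bool {
	if g.ScorelessStreak >= maxScorelessTurns {
		return true
	}
	if g.BagSize > 0 {
		return false
	}
	for _, n := range g.RackSizes {
		if n == 0 {
			return true // bag is empty and someone emptied their rack
		}
	}
	return false
}

func main() {
	g := &Game{BagSize: 40, RackSizes: []int{7, 7}}
	for i := 0; i < 6; i++ {
		g.Record("pass")
	}
	fmt.Println("over after six passes:", g.Over())
}
```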

7. Robot opponent

Substitutes for a human in 2-player auto-match when the pool yields no human within 10 seconds. Designed to be indistinguishable from a person.

  • Balance: at game start it decides once whether to play to win, with P(play-to-win) ≈ 0.40 (so the human wins ≈ 60%). Adaptive difficulty is post-MVP.
  • Margin targeting: each turn it picks from GenerateMoves a move that keeps the resulting lead (when playing to win) or deficit (when playing to lose) small (≈ 1–20 points), rather than always the maximum.
  • Timing: per-move delay sampled from a right-skewed distribution (short delays frequent), clamped to [2, 90] minutes; sleeps 00:00–07:00 in the opponent's profile timezone (fallback UTC); on a daytime nudge after 60 minutes idle it replies within 2–10 minutes; it proactively nudges the human after 12 hours idle.
  • Blocks friend requests and direct messages; uses a human-like name pool.
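
Margin targeting can be read as "choose the candidate whose resulting score difference lands closest to a small target band". The sketch below encodes that reading, which is an interpretation rather than the project's actual selection rule: the `Move` type is a stand-in for the solver's ranked moves, the band width is passed in as `maxMargin`, and ties go to the higher-ranked (earlier) move.

```go
package main

import "fmt"

// Move is a stand-in for one ranked candidate; only its score matters here.
type Move struct{ Score int }

// bandDistance returns how far lead lies outside the closed interval [lo, hi].
func bandDistance(lead, lo, hi int) int {
	switch {
	case lead < lo:
		return lo - lead
	case lead > hi:
		return lead - hi
	default:
		return 0
	}
}

// PickMove returns the index of the move whose resulting lead is closest to
// the target band: [1, maxMargin] when playing to win, [-maxMargin, -1] when
// playing to lose. It returns -1 when there are no moves.
func PickMove(moves []Move, myScore, oppScore int, playToWin bool, maxMargin int) int {
	lo, hi := 1, maxMargin
	if !playToWin {
		lo, hi = -maxMargin, -1
	}
	best, bestCost := -1, 0
	for i, m := range moves {
		cost := bandDistance(myScore+m.Score-oppScore, lo, hi)
		if best == -1 || cost < bestCost {
			best, bestCost = i, cost
		}
	}
	return best
}

func main() {
	moves := []Move{{50}, {30}, {12}, {5}}
	fmt.Println("to win:", PickMove(moves, 100, 105, true, 20))
	fmt.Println("to lose:", PickMove(moves, 100, 105, false, 20))
}
```

A real robot would also need to avoid looking mechanical, for example by sampling among near-equal candidates instead of always taking the single closest one.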

8. Lobby & social

  • Matchmaking (detail planned): a FIFO pool keyed by (variant, language); 10 s with no human match → substitute the robot.
  • Friends: add by friend list, internal ID, or platform deep-link.
  • Block settings independently suppress in-game chat and friend requests.
  • Chat: per-game, persisted, length-limited, suppressed by the block setting.
  • Nudge: a player may nudge the opponent whose turn is awaited once per hour; the opponent receives a platform-native notification.
  • Profile: preferred_language (en/ru), display name, linked platform accounts, email (confirm-code binding), timezone (drives robot sleep; default from platform/locale, user-editable), block toggles.
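
The FIFO pool keyed by (variant, language) can be sketched as a map of queues. Since the matchmaking detail is still planned, everything below is hypothetical: `Pool`, `Join`, and `Leave` are illustrative names, players are plain strings, and the wait timer that triggers robot substitution is left to the caller, which would call `Leave` when the wait limit passes.

```go
package main

import (
	"fmt"
	"sync"
)

// PoolKey identifies one matchmaking queue.
type PoolKey struct{ Variant, Language string }

// Pool keeps one FIFO queue of waiting players per key.
type Pool struct {
	mu      sync.Mutex
	waiting map[PoolKey][]string
}

func NewPool() *Pool { return &Pool{waiting: make(map[PoolKey][]string)} }

// Join pairs the player with the longest-waiting player in the same queue,
// or enqueues the player when nobody is waiting.
func (p *Pool) Join(key PoolKey, player string) (opponent string, matched bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	q := p.waiting[key]
	if len(q) > 0 {
		p.waiting[key] = q[1:]
		return q[0], true
	}
	p.waiting[key] = append(q, player)
	return "", false
}

// Leave removes a player who is still waiting. It returns false when the
// player was already matched, so a late timer cannot create a second game.
func (p *Pool) Leave(key PoolKey, player string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	q := p.waiting[key]
	for i, id := range q {
		if id == player {
			p.waiting[key] = append(q[:i], q[i+1:]...)
			return true
		}
	}
	return false
}

func main() {
	pool := NewPool()
	key := PoolKey{Variant: "english", Language: "en"}
	fmt.Println(pool.Join(key, "alice"))
	fmt.Println(pool.Join(key, "bob"))
	fmt.Println(pool.Leave(key, "alice"))
}
```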

9. Persistence

  • Single Postgres database, schema backend; backend is the only writer. The "pgx pool" is a database/sql handle backed by the pgx stdlib driver and instrumented with otelsql; type-safe queries use go-jet (code generated into internal/postgres/jet and committed, regenerated by cmd/jetgen). Migrations are embedded SQL applied with pressly/goose/v3 at startup. Primary keys are application-generated UUIDv7.
  • Stage 1 tables: accounts (durable internal accounts), identities (platform/email identities, unique (kind, external_id)) and sessions (revoke-only opaque-token hashes).
  • Active game state is stored structurally with the dict_version pinned.
  • Statistics (computed on finish): wins, losses, max points in a game, max points for a single word.

9.1 History invariant (must hold forever)

Archived games must replay independently of any dictionary and of the solver's internal encoding — at least visually. Therefore the move log persists only decoded concrete values: letters as text, coordinates, blank flag, action kind (play / pass / exchange / resign / timeout), acting player, per-move score and running total, timestamp. The board for visual replay is reconstructed by applying placements onto an empty grid; no dictionary is needed because moves were validated at play time and scores are stored. variant and dict_version are kept as metadata only (audit, complaint review), never as a replay dependency. GCG export is derived from the same rows and is likewise self-contained (we ship our own writer; the solver exposes no public GCG writer).
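
To make the invariant concrete, here is a minimal sketch of a move-log row and a replay function that needs neither a dictionary nor the solver. The struct and field names are hypothetical, the timestamp is omitted for brevity, and the 15×15 board size used in `main` is an assumption of the example.

```go
package main

import "fmt"

// Placement is one tile as stored in the log: decoded text and coordinates only.
type Placement struct {
	Row, Col int
	Letter   string // text, so Latin and Cyrillic tiles are handled alike
	Blank    bool
}

// LoggedMove is one row of the move log.
type LoggedMove struct {
	Kind       string // play / pass / exchange / resign / timeout
	Player     int
	Placements []Placement
	Score      int // per-move score, stored rather than recomputed
	Total      int // running total for the acting player
}

// Replay rebuilds the board by applying placements onto an empty grid.
// Non-play actions and out-of-range coordinates leave the board unchanged.
func Replay(size int, log []LoggedMove) [][]string {
	board := make([][]string, size)
	for i := range board {
		board[i] = make([]string, size)
	}
	for _, m := range log {
		if m.Kind != "play" {
			continue
		}
		for _, p := range m.Placements {
			if p.Row < 0 || p.Row >= size || p.Col < 0 || p.Col >= size {
				continue
			}
			board[p.Row][p.Col] = p.Letter
		}
	}
	return board
}

func main() {
	log := []LoggedMove{
		{Kind: "play", Player: 0, Score: 10, Total: 10, Placements: []Placement{
			{Row: 7, Col: 7, Letter: "C"}, {Row: 7, Col: 8, Letter: "A"}, {Row: 7, Col: 9, Letter: "T"},
		}},
		{Kind: "pass", Player: 1},
		{Kind: "play", Player: 0, Score: 6, Total: 16, Placements: []Placement{
			{Row: 7, Col: 10, Letter: "S"},
		}},
	}
	board := Replay(15, log)
	fmt.Println(board[7][7:11])
}
```

Scores are read straight from the rows, so the replay stays valid even if the dictionary or scoring code changes later.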

10. Notifications

Two channels: platform-native push (out-of-app, via the platform side-service — your-turn, nudge) and the in-app live stream (chat, opponent-moved, while the app is open). Backend emits notification intents; delivery fans out to the appropriate channel.

11. Observability

  • Structured logging with go.uber.org/zap (JSON). OpenTelemetry tracer and meter providers are wired (Stage 1), env-gated by BACKEND_OTEL_{TRACES,METRICS}_EXPORTER with a default of none (so no collector is required locally or in CI); stdout is available for debugging and the Postgres pool is instrumented with otelsql. OTLP export, a Prometheus pull endpoint, and dashboards arrive with the first real workload.
  • Per-request server-side timing via gin middleware from day one (the access log carries method, route, status, latency and the active trace id). A client-measured RTT piggybacked on the next request is a later enhancement.
  • Unauthenticated GET /healthz (liveness) and GET /readyz (readiness — the database answers a bounded ping and the session cache is warmed).
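
The split between liveness and readiness can be sketched with plain net/http handlers. This is an illustration under assumptions, not the backend's gin wiring: `Readiness`, `Mux`, and `probe` are hypothetical names, the 200/503 status codes and the one-second ping bound are choices made for the example, and `Ping` has the same shape as `(*sql.DB).PingContext` so a real pool could be plugged in.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// Readiness bundles what /readyz checks: a bounded database ping and a
// flag that flips once the session cache has been warmed.
type Readiness struct {
	Ping    func(ctx context.Context) error
	Warmed  *atomic.Bool
	Timeout time.Duration
}

// Mux returns handlers for /healthz (liveness) and /readyz (readiness).
func (rd Readiness) Mux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // the process is up; no dependencies checked
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), rd.Timeout)
		defer cancel()
		if !rd.Warmed.Load() || rd.Ping(ctx) != nil {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probe issues one GET against a Readiness built from the given conditions
// and returns the response status code.
func probe(path string, dbUp, warmed bool) int {
	var flag atomic.Bool
	flag.Store(warmed)
	rd := Readiness{
		Ping: func(ctx context.Context) error {
			if !dbUp {
				return errors.New("database unreachable")
			}
			return nil
		},
		Warmed:  &flag,
		Timeout: time.Second,
	}
	rec := httptest.NewRecorder()
	rd.Mux().ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println("healthz, db down:", probe("/healthz", false, false))
	fmt.Println("readyz, cache cold:", probe("/readyz", true, false))
	fmt.Println("readyz, all good:", probe("/readyz", true, true))
}
```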

12. Security boundaries

| Concern | Enforced by |
| --- | --- |
| Public rate limiting / anti-abuse | gateway |
| Platform credential validation, session minting | gateway |
| Session → user_id resolution, X-User-ID injection | gateway |
| Authorisation, ownership, state transitions | backend (X-User-ID is the sole identity input) |
| Admin authentication | gateway Basic Auth → backend admin endpoints |
| backend ↔ gateway trust | the network (only gateway may reach backend) |

Trusting the network between gateway and backend is an explicit, accepted MVP risk: compromise of the gateway↔backend network segment defeats backend authentication. The risk is mitigated by network isolation; mutual authentication is a future hardening step.

13. Deployment (informational)

Single public origin, path-routed: the UI, the gateway public surface and the admin surface share one host that terminates TLS. MVP runs one gateway, one backend, one Postgres. Docker/compose environments are introduced when there is something to deploy.

14. CI & branches

  • Trunk is master; feature work happens on feature/* branches merged via PR with a green CI gate (from Stage 1 onward — the genesis commit necessarily lands on master).
  • .gitea/workflows/ holds the CI. go-unit.yaml runs gofmt/vet/build/unit-test on Go changes; integration.yaml runs the Postgres-backed tests behind the integration build tag (testcontainers postgres:17-alpine, Ryuk disabled, serial). Further workflows (ui-test, deploy) are added with the components they cover.
  • After any push, the run is watched to green before a stage is declared done (python3 ~/.claude/bin/gitea-ci-watch.py).