scrabble-game/PLAN.md

# Scrabble Game — implementation plan

Living plan and **stage tracker**. Each stage is implemented in its own session;
the rules for starting and finishing a stage are in [`CLAUDE.md`](CLAUDE.md).
The architecture/decision record is [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md);
behaviour is [`docs/FUNCTIONAL.md`](docs/FUNCTIONAL.md). When a stage produces a
decision, bake it back here **and** into the affected docs/code in the same PR.

## Context

Greenfield multiplatform Scrabble. Players arrive from a platform (Telegram
first; later VK/MAX/iOS/Android) or standalone web (email / guest). Three
executables — `gateway`, `backend`, `ui` — plus per-platform side-services.
Deliberately simpler than the sibling `../galaxy-game` (idea donor, not a
template). The `../scrabble-solver` engine is embedded in-process as a library.

## Locked decisions (recap — full record in docs/ARCHITECTURE.md)

Stack: `go.work` monorepo, modules `scrabble/<name>`, Go 1.26.x, backend
gin+pgx+Postgres(schema `backend`)+goose+zap+OTel (deps added when first used).
Wire: Connect-RPC + FlatBuffers (client↔gateway), REST/JSON + `X-User-ID`
(gateway↔backend), gRPC server-stream for live events. Auth: platform-native,
thin opaque session token, no Ed25519/signing, likely no Redis. UI: pure
HTML5/CSS, plain Svelte + Vite, Capacitor for native. MVP surfaces: Telegram +
web (email + ephemeral guest) + link/merge. Variants: ru/en/Эрудит.
Legality: validate-at-submit. End: empty bag+rack / 6 scoreless / 24h timeout.
Hint: top-1. Word-check: unlimited + complaint. Robot: P(win)≈0.40, margin
targeting, [2,90]min skewed timing, sleep 00:00–07:00 opp-tz, nudge logic.
Dictionary: pin per game. History: structured + GCG export,
dictionary-independent (see ARCHITECTURE §9.1).

## Stage tracker

| # | Stage | Status |
|---|-------|--------|
| 0 | Scaffolding (go.work, backend skeleton, docs, CI) | **done** |
| 1 | Backend foundation (config, server, Postgres+goose, sessions, accounts) | **done** |
| 2 | Engine package over scrabble-solver | **done** |
| 3 | Game domain (lifecycle, rules, hint, word-check, history+GCG, stats) | **done** |
| 4 | Lobby & social (matchmaking, friends, block, chat, profile, nudge) | **done** |
| 5 | Robot opponent | **done** |
| 6 | Gateway edge (Connect/FB, platform auth, sessions, push bridge, admin) | todo |
| 7 | UI (plain Svelte + Vite, board, lobby, chat, i18n) | todo |
| 8 | Telegram integration (bot side-service, deep-link, push) | todo |
| 9 | Admin & dictionary ops (complaint review, version reload) | todo |
| 10 | Account linking & merge | todo |
| 11 | Polish (observability, perf with evidence, deploy) | todo |

Scaffolding is incremental: `go.work` lists only existing modules; each stage
adds the modules it needs.

## Stages

Each stage: read this plan + relevant docs, **interview the owner on the open
details below**, implement within scope, then update plan/docs/code and get CI
green before marking done.

### Stage 0 — Scaffolding *(done)*
Scope: `go.work` (Go 1.26.3, `use ./backend`); minimal runnable `backend`
(gin, zap, `/healthz`, `/readyz`, env config); docs skeleton; `PLAN.md`;
`CLAUDE.md`; `.gitea/workflows/go-unit.yaml`; README; `.gitignore`.
Acceptance: `go build ./backend/...` + `go vet` + gofmt clean +
`go test ./backend/...` green; CI green on push.

### Stage 1 — Backend foundation *(done)*
Scope: config/server route groups (`/api/v1/{public,user,internal,admin}`,
probes), Postgres (pgx) + embedded goose migrations + schema `backend`,
telemetry (OTel) wiring, in-memory cache scaffolding, thin sessions + accounts +
platform identities.
Open details: Postgres version + DSN/`search_path` convention; jet vs
sqlc/sqlx (default jet); migration naming; exact session-token shape (opaque
random length, TTL, revocation); account/identity table shape; whether the
admin bootstrap lands here or in Stage 9.
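
The session-token question was settled in the Stage 1 refinements: an opaque token, stored only as a SHA-256 hash kept as hex `text`, revoke-only. A minimal sketch of that shape follows; the 32-byte length, the base64url encoding and the function names are assumptions for illustration, not the backend's actual API.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
)

// NewSessionToken returns a fresh opaque token for the client plus the
// hex-encoded SHA-256 digest that would be persisted instead of the token.
// The 32-byte length is an illustrative assumption.
func NewSessionToken() (token, hashHex string, err error) {
	raw := make([]byte, 32)
	if _, err = rand.Read(raw); err != nil {
		return "", "", err
	}
	token = base64.RawURLEncoding.EncodeToString(raw)
	return token, HashSessionToken(token), nil
}

// HashSessionToken maps a presented token to the lookup key stored as text.
func HashSessionToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}
```

Because only the digest is stored, a leaked table does not reveal usable tokens, and revocation reduces to deleting the row keyed by the hash.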

### Stage 2 — Engine package *(done)*
Scope: `backend/internal/engine` over scrabble-solver — versioned DAWG
load/registry, GenerateMoves/ValidatePlay/ScorePlay wrappers, bag/rack, the
**dictionary-independent** game-state model + decode helpers. Add
`replace scrabble-solver => ../scrabble-solver` to `go.work` here and solve the
CI sibling-checkout (clone `gitea.iliadenisov.ru/.../scrabble-solver`).
Open details: how CI obtains the solver (clone sibling vs publish/tag the
solver module); in-memory game-state representation; how blanks and exchanges
are modelled; Эрудит specifics to verify against the solver.
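
The Stage 2 refinements resolve the bag/exchange question with an own deterministic `Bag` (Draw + Return). The sketch below shows one way such a bag could look. Tiles are plain bytes, and the type layout, the `Exchange` helper and the seeding via `math/rand` are assumptions, not the engine's real code.

```go
package main

import "math/rand"

// Bag is a hypothetical seeded tile bag; the same seed and the same call
// sequence always yield the same draws, which keeps replay exact.
type Bag struct {
	tiles []byte
	rng   *rand.Rand
}

// NewBag copies the tiles and shuffles them with a source seeded by seed.
func NewBag(tiles []byte, seed int64) *Bag {
	b := &Bag{tiles: append([]byte(nil), tiles...), rng: rand.New(rand.NewSource(seed))}
	b.shuffle()
	return b
}

func (b *Bag) shuffle() {
	b.rng.Shuffle(len(b.tiles), func(i, j int) { b.tiles[i], b.tiles[j] = b.tiles[j], b.tiles[i] })
}

// Len reports how many tiles remain.
func (b *Bag) Len() int { return len(b.tiles) }

// Draw removes and returns up to n tiles.
func (b *Bag) Draw(n int) []byte {
	if n > len(b.tiles) {
		n = len(b.tiles)
	}
	cut := len(b.tiles) - n
	out := append([]byte(nil), b.tiles[cut:]...)
	b.tiles = b.tiles[:cut]
	return out
}

// Return puts tiles back and reshuffles using the same seeded source.
func (b *Bag) Return(tiles []byte) {
	b.tiles = append(b.tiles, tiles...)
	b.shuffle()
}

// Exchange is legal only when the bag holds at least rackSize tiles. It draws
// the replacements before returning the swapped tiles, so a player can never
// draw back a tile they just gave up.
func (b *Bag) Exchange(swapped []byte, rackSize int) ([]byte, bool) {
	if b.Len() < rackSize {
		return nil, false
	}
	drawn := b.Draw(len(swapped))
	b.Return(swapped)
	return drawn, true
}
```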

### Stage 3 — Game domain *(done)*
Scope: create/join, turn order, submit play/pass/exchange/resign,
validate-at-submit, scoring, end-conditions, 24h timeout/auto-resign, hint,
word-check + complaint capture, structured history + GCG writer, stats on
finish.
Open details: GCG dialect details (blanks, exchanges, notation); exact stats
edge cases; turn-timeout scheduler mechanism (cron vs per-game timer);
complaint payload shape.
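
The Stage 3 refinements add a per-user away window that the timeout sweeper honours, including windows that cross midnight. A minimal sketch of that check follows. It works on minutes since local midnight; the function names and the rule that a player is simply not timed out while inside the window are assumptions.

```go
package main

// inAwayWindow reports whether minute-of-day m (0..1439, player-local time)
// falls in the half-open window [start, end). A window whose start is later
// than its end crosses midnight, for example 22:00 to 06:00.
func inAwayWindow(m, start, end int) bool {
	if start == end {
		return false // treated as "no window" in this sketch (assumption)
	}
	if start < end {
		return m >= start && m < end
	}
	return m >= start || m < end
}

// shouldTimeout is a hypothetical sweeper predicate: the turn has exceeded its
// limit and the player on turn is not currently inside their away window.
func shouldTimeout(elapsedMin, limitMin, localMinute, awayStart, awayEnd int) bool {
	if inAwayWindow(localMinute, awayStart, awayEnd) {
		return false
	}
	return elapsedMin >= limitMin
}
```

With the default 00:00 to 07:00 window (`0`, `420`), a turn that expires at 03:00 local time would be left alone until the sweeper runs after 07:00.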

### Stage 4 — Lobby & social *(done)*
Scope: matchmaking pool, friends, block, per-game chat, profile + email
confirm-code, nudge.
Open details: pool fairness/keying confirmation; deep-link format per platform;
chat length limit + retention; friend-request lifecycle; email-code provider
(SMTP relay choice).
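
The chat length limit was fixed at 60 runes in the Stage 4 refinements. Since the variants include Russian, the limit has to count runes rather than bytes. The sketch below illustrates that; the trimming step, the empty-message rule and the error names are assumptions, and the content filter is out of scope here.

```go
package main

import (
	"errors"
	"strings"
	"unicode/utf8"
)

const maxChatRunes = 60

var (
	ErrChatEmpty   = errors.New("chat message is empty")
	ErrChatTooLong = errors.New("chat message exceeds 60 runes")
)

// ValidateChat trims the text and enforces the rune limit. A Cyrillic letter
// is two bytes in UTF-8, so len(text) would reject valid 31..60 rune messages.
func ValidateChat(text string) (string, error) {
	text = strings.TrimSpace(text)
	if text == "" {
		return "", ErrChatEmpty
	}
	if utf8.RuneCountInString(text) > maxChatRunes {
		return "", ErrChatTooLong
	}
	return text, nil
}
```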

### Stage 5 — Robot opponent *(done)*
Scope: human-like player — target robot win rate ~0.40, margin targeting, skewed
[2,90]min timing + sleep + nudge logic, friend/DM blocking, name pool.
Open details: exact delay distribution + parameters; margin band; name pool
source; how the scheduler drives robot moves; metrics for tuning balance.
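
The delay distribution chosen in the Stage 5 refinements is `2 + 88·u^k` minutes with k≈3.5, clamped to [2,90], and every choice is derived from the game seed with an FNV-1a mix. A sketch under stated assumptions follows: the `uint64` seed type, the `mix` signature, the `"delay:N"` label and the bit-to-float mapping are hypothetical.

```go
package main

import (
	"encoding/binary"
	"hash/fnv"
	"math"
	"strconv"
)

// mix hashes the game seed together with a label using FNV-1a, so the result
// is stable across restarts (unlike hash/maphash, which is seeded per process).
func mix(seed uint64, label string) uint64 {
	h := fnv.New64a()
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], seed)
	h.Write(b[:])
	h.Write([]byte(label))
	return h.Sum64()
}

// unit maps a 64-bit hash to a float in [0,1) using its top 53 bits.
func unit(x uint64) float64 { return float64(x>>11) / (1 << 53) }

// delayMinutes applies 2 + 88*u^3.5 and clamps the result to [2,90].
func delayMinutes(u float64) float64 {
	d := 2 + 88*math.Pow(u, 3.5)
	return math.Min(90, math.Max(2, d))
}

// playToWin decides once per game whether the robot aims to win (about 40%).
func playToWin(seed uint64) bool { return mix(seed, "win")%100 < 40 }

// turnDelayMinutes derives the delay for a given turn from the seed alone.
func turnDelayMinutes(seed uint64, turn int) float64 {
	return delayMinutes(unit(mix(seed, "delay:"+strconv.Itoa(turn))))
}
```

At u = 0.5 the formula gives 2 + 88·0.5^3.5 ≈ 9.8 minutes, which matches the stated median of about 10 minutes.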

### Stage 6 — Gateway edge
Scope: Connect/gRPC-Web (h2c), Telegram initData validation → session →
`X-User-ID`, in-memory rate-limit, admin Basic-Auth passthrough, FlatBuffers
transcoding, in-app push stream bridging backend `push` gRPC stream, email +
ephemeral-guest paths.
Open details: FlatBuffers schema layout + message_type catalog; rate-limit
classes/limits; admin surface routing; session cache shape at the gateway.
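
For orientation on the Telegram initData validation step: Telegram's Mini App scheme signs the launch parameters with an HMAC-SHA256 whose key is itself `HMAC-SHA256("WebAppData", bot token)`, over the sorted `key=value` lines excluding `hash`. The sketch below follows that scheme. The function names are hypothetical, and the `auth_date` freshness check a real gateway would also need is omitted.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"net/url"
	"sort"
	"strings"
)

func hmacSHA256(key, msg []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(msg)
	return m.Sum(nil)
}

// dataCheckHash computes the hex HMAC expected in the `hash` field: all other
// fields sorted by key, joined as key=value lines, keyed by the derived secret.
func dataCheckHash(fields map[string]string, botToken string) string {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		if k != "hash" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	lines := make([]string, len(keys))
	for i, k := range keys {
		lines[i] = k + "=" + fields[k]
	}
	secret := hmacSHA256([]byte("WebAppData"), []byte(botToken))
	return hex.EncodeToString(hmacSHA256(secret, []byte(strings.Join(lines, "\n"))))
}

// ValidateInitData parses the raw initData query string and returns its
// fields only when the embedded hash matches (compared in constant time).
func ValidateInitData(initData, botToken string) (map[string]string, bool) {
	vals, err := url.ParseQuery(initData)
	if err != nil {
		return nil, false
	}
	fields := make(map[string]string, len(vals))
	for k, v := range vals {
		if len(v) > 0 {
			fields[k] = v[0]
		}
	}
	got, err := hex.DecodeString(fields["hash"])
	if err != nil || len(got) == 0 {
		return nil, false
	}
	want, _ := hex.DecodeString(dataCheckHash(fields, botToken))
	if !hmac.Equal(got, want) {
		return nil, false
	}
	return fields, true
}
```

On success the gateway would resolve the Telegram user to an account, mint a session and forward `X-User-ID` to the backend.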

### Stage 7 — UI
Scope: plain Svelte + Vite static; Connect-web + FlatBuffers client; lobby (my
games, profile tabs); board (HTML5/CSS grid, drag-n-drop, no assets); chat;
hint/word-check; in-app stream; i18n en/ru; in-memory session (+IndexedDB if
available); Capacitor-ready structure.
Open details: detailed game-board UX (deferred by the owner to this stage);
client routing; offline/refresh behaviour; design system / theming.

### Stage 8 — Telegram integration
Scope: bot side-service, deep-link invites, platform push (your-turn / nudge),
Mini App launch/auth; backend↔platform internal API.
Open details: bot framework/library; deep-link scheme; push message templates;
internal API contract; Mini App hosting/origin.

### Stage 9 — Admin & dictionary ops
Scope: admin endpoints (users, games, complaint review queue, dictionary
versions + reload), complaint→dictionary update pipeline.
Open details: whether a server-rendered console is wanted or JSON-only; the
dictionary rebuild/deploy pipeline; complaint resolution workflow.

### Stage 10 — Account linking & merge
Scope: link-via-confirm; merge-into-A (stats sum, transfer games/friends,
dedupe). High blast-radius — focused regression tests.
Open details: conflict resolution (active games on both, duplicate friends,
display-name collisions); irreversibility/audit; confirm-flow per platform.

### Stage 11 — Polish
Scope: observability dashboards, evidence-based performance work, prod
build/deploy.
Open details: deployment target/host; dashboards; load expectations.

## Refinements logged during implementation

- **Stage 0**: solver `replace` deferred to Stage 2 (nothing imports it yet;
  adding the path now would break CI, which checks out only this repo). Docker /
  compose deferred to a stage that has something to deploy. Trunk is `master`
  (owner preference); `feature/*` + PR from Stage 1; the genesis commit lands on
  `master` by necessity.
- **Stage 1** (interview + implementation):
  - Query layer: **go-jet** over `database/sql` (pgx stdlib) + otelsql; a
    `cmd/jetgen` tool regenerates the **committed** code from a throwaway
    container. Postgres **17** pinned for jetgen, tests and prod.
  - Sessions: opaque token stored only as a **SHA-256 hash** (kept as hex
    `text`, not `bytea` — avoids jet bytea-literal friction), **revoke-only**
    (no TTL); revocation-audit table deferred. Backend keeps a warmed
    write-through session cache that gates `/readyz`.
  - Data model: **UUIDv7** PKs; one unified `identities` table
    (`kind ∈ telegram|email`, widen to `vk`/`max` later); no soft-delete /
    actor-audit columns yet.
  - HTTP surface: **service/store/cache layer only**. The
    `/api/v1/{public,user,internal,admin}` groups + `X-User-ID` middleware are
    scaffolding (exposed via `Server` group accessors); the session/account REST
    handlers land with the gateway in **Stage 6**. Admin bootstrap deferred to
    **Stage 9**.
  - Telemetry: providers + request-timing middleware + otelsql; exporters
    `none` (default) / `stdout`; OTLP + dashboards deferred to **Stage 11**.
  - Tests/CI: integration tests behind the `integration` build tag in
    `backend/internal/inttest` + new `integration.yaml` (testcontainers, Ryuk
    off, serial), firing on push and PR. Backend now **hard-depends on Postgres
    at boot** (migrations at startup) — a deliberate contract change from
    Stage 0, documented in both READMEs. All code stays in the existing
    `backend` module under `internal/` (+ `cmd/jetgen`); `go.work` untouched.
- **Stage 2** (interview + implementation):
  - Scope: `internal/engine` is a self-contained **library** (registry, bag,
    `Game` state machine, decode/replay). No `config`/`main`/`server` wiring this
    stage — there is no consumer yet; wiring lands in **Stage 3**, mirroring
    Stage 1's deferred handlers.
  - **Pure rules engine** (interview): the engine owns the in-memory `Game`,
    pure transitions (play/pass/exchange/resign + draw) **and end-condition
    detection**, including the standard **end-game rack-adjustment scoring** — a
    deliberate slice of Stage 3's "scoring/end-conditions" that the pure-engine
    boundary implies. Stage 3 keeps scheduling, the 24h timeout, persistence and
    GCG.
  - **Solver wiring**: `replace scrabble-solver => ../scrabble-solver` in
    `go.work`; `backend/go.mod` requires `scrabble-solver` (placeholder version,
    redirected by the replace) and `github.com/iliadenisov/dafsa` directly (for
    `dawg.Load`). CI clones the **public** solver repo at **master HEAD**
    anonymously into `../scrabble-solver` (no token); both Go workflows gained
    the step (the engine's untagged tests run under the integration workflow too)
    and set `BACKEND_DICT_DIR`.
  - **Dictionaries**: registry loads the committed DAWGs from a directory
    parameter; `dict_version` is an explicit string label; the latest version
    per variant is tracked. Smoke tests validate a known word per variant
    (English/Russian/Эрудит). **Эрудит is handled uniformly** — every real
    difference is already in `rules.Erudit()`; the move.go "single orientation
    per turn" note needs no special code (any single play is one-directional).
  - **Bag/blanks/exchange**: the engine has its own deterministic `Bag` (Draw +
    Return) because `selfplay.Bag` cannot return tiles; an exchange is legal only
    when the bag holds at least a full rack of tiles, and it draws the
    replacements before returning the swapped tiles. A
    blank is `Placement{Blank:true}` carrying its designated letter; the history
    keeps the concrete letter plus a blank flag (decoded via `Alphabet.Character`
    / `Decode`). `ReplayBoard` reuses `scrabble.Apply`, so no `internal/encoding`
    dependency.
  - **Deviation from the approved plan**: `docs/FUNCTIONAL.md` (+`_ru`) was left
    unchanged. Stage 2 adds no user-visible behaviour; the variant, per-game
    dictionary and dictionary-independent-history user stories already live in
    Stages 3–4, so a "light touch" here would have duplicated or pre-empted them.
- **Stage 3** (interview + implementation):
  - Scope, as in Stages 1–2: **domain service/store layer + engine wiring, no
    HTTP** (`internal/game`). The gateway↔backend REST surface lands in Stage 6;
    the only active driver this stage is a background turn-timeout sweeper started
    from `main`. The robot (Stage 5) will consume the same service API.
  - **Persistence = event-sourcing + warm cache** (interview): durable state is
    the `games` row plus an append-only decoded move journal (`game_moves`); the
    live position is an `engine.Game` kept in an in-memory cache with a ~24h idle
    TTL and rebuilt by replaying the journal on a miss (the seeded bag makes
    replay exact). Each game is serialised by a per-game mutex; a persistence
    failure evicts the live game so the next access rebuilds. §9 reworded from
    "stored structurally" to this model.
  - **Resign/timeout split** (interview): 2-player resign/timeout only this stage
    (the other player wins); multiplayer drop-out-and-continue + resigned-tiles
    disposition deferred to Stage 4. Per-game **turn-timeout duration** setting
    (5/10/15/30 min, 1/2/3/6/12/24 h; default 24 h) and a per-user **away window**
    (`accounts.away_start/away_end`, default 00:00–07:00 local, honoured by the
    sweeper with midnight-cross handling) added now; profile editing of the away
    window is Stage 4 and the robot's sleep (Stage 5) reuses it.
  - **Engine `Resign` fix** (interview, in `internal/engine`): the resigner keeps
    their accumulated score (no end-game rack adjustment) and never wins; `winner`
    excludes the resigner, so a two-player resign/timeout gives the win to the
    other player regardless of score. Timeout reuses `Resign`, so the game domain
    needs no winner override.
  - **Additive engine domain API**: `Direction`,
    `Game.SubmitPlay/SubmitExchange/EvaluatePlay/HintView/Hand`,
    `MoveRecord.{Dir,MainRow,MainCol}`, `Registry.Lookup`, `ParseVariant` — so
    `internal/game` never imports `scrabble-solver` (keeps the §5
    single-importer invariant).
  - **Create = atomic with seats** (interview): `Create` seats all accounts and
    starts; lobby seat-filling is Stage 4. **Sweeper = periodic goroutine**
    (interview; default 60 s, `BACKEND_GAME_TIMEOUT_SWEEP_INTERVAL`).
  - **Hint = settings + wallet** (interview): per-game `hints_allowed` +
    `hints_per_player`, plus a profile wallet `accounts.hint_balance` (spent after
    the allowance; purchases later). Category defaults (random 1 / tournament 0 /
    friendly 1-or-0) are the caller's job (lobby/tournaments).
  - **Stats** (interview): `account_stats` with **`draws`** added beyond §9's
    wins/losses; `max_word_points` = best single **move** score; ties draw,
    resign/timeout is a loss, guests get no stats.
  - **Complaint** (interview): full payload with `game_id`; word-check is scoped
    to the game's pinned `(variant, dict_version)`. Stage 9 owns the resolution
    lifecycle, so the `status` column carries no value CHECK yet.
  - **GCG** (interview): standard Poslfit dialect (UTF-8, `#player`/`#lexicon`
    pragmas, `8G`/`H8` coordinates, lower-case blanks, `.` pass-throughs, `-TILES`
    exchange) plus `#note` lines for resign/timeout; derived from the journal, so
    dictionary-independent.
  - **Engine wiring + config**: `main` loads the registry (`engine.Open`, a hard
    boot dependency like migrations) and starts the sweeper. New config:
    `BACKEND_DICT_DIR` (required), `BACKEND_DICT_VERSION` (default `v1`),
    `BACKEND_GAME_TIMEOUT_SWEEP_INTERVAL` (60 s), `BACKEND_GAME_CACHE_TTL` (24 h).
    No CI change — both Go workflows already clone the solver sibling and export
    `BACKEND_DICT_DIR`. `accounts` gained `away_start`/`away_end`/`hint_balance`
    and the `account` package gained `SpendHint` (it owns its table).
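
The event-sourcing bullet above (journal plus warm cache, rebuild on a miss, evict on a persistence failure) can be sketched as follows. This is a simplified model under stated assumptions: `Move`, `Live` and `Cache` are hypothetical stand-ins for the journal row and `engine.Game`, one cache-wide mutex replaces the per-game mutex, and the idle TTL is omitted.

```go
package main

import (
	"errors"
	"sync"
)

var ErrBadSeat = errors.New("move references an unknown seat")

// Move is a stand-in for one decoded journal row.
type Move struct{ Seat, Score int }

// Live is a stand-in for the in-memory game position.
type Live struct{ Scores []int }

func (g *Live) apply(m Move) error {
	if m.Seat < 0 || m.Seat >= len(g.Scores) {
		return ErrBadSeat
	}
	g.Scores[m.Seat] += m.Score
	return nil
}

// Cache keeps live games and rebuilds them from the journal on a miss.
type Cache struct {
	mu      sync.Mutex
	live    map[string]*Live
	journal func(gameID string) ([]Move, error)
}

func NewCache(journal func(string) ([]Move, error)) *Cache {
	return &Cache{live: map[string]*Live{}, journal: journal}
}

// get returns the cached game or replays the journal; the caller holds mu.
func (c *Cache) get(id string, seats int) (*Live, error) {
	if g, ok := c.live[id]; ok {
		return g, nil
	}
	moves, err := c.journal(id)
	if err != nil {
		return nil, err
	}
	g := &Live{Scores: make([]int, seats)}
	for _, m := range moves {
		if err := g.apply(m); err != nil {
			return nil, err
		}
	}
	c.live[id] = g
	return g, nil
}

// Get returns the live game, replaying the journal when it is not cached.
func (c *Cache) Get(id string, seats int) (*Live, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.get(id, seats)
}

// Commit applies a move and persists it; a persistence failure evicts the
// live game so the next access rebuilds it from the durable journal.
func (c *Cache) Commit(id string, seats int, m Move, persist func(Move) error) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	g, err := c.get(id, seats)
	if err != nil {
		return err
	}
	if err := g.apply(m); err != nil {
		return err
	}
	if err := persist(m); err != nil {
		delete(c.live, id)
		return err
	}
	return nil
}
```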

- **Stage 4** (interview + implementation):
  - Scope, as in Stages 1–3: **domain service/store layer, no HTTP** — REST/stream
    is Stage 6. Chat and nudges are **persisted** now; live delivery (push /
    in-app stream) is Stage 6/8. New packages `internal/social` (friends, blocks,
    chat+nudge) and `internal/lobby` (matchmaking + invitations); profile editing
    and the email confirm-code extend `internal/account`. The services have no
    active driver this stage, so `main` builds them and hands them to the server,
    which exposes them via accessors (the Stage 1 scaffolding-accessor pattern) for
    the Stage 6 handlers.
  - **Friends** (interview): request → accept on a single `friendships` table;
    decline/cancel delete the pending row; **blocking severs** any friendship.
  - **Blocks** (interview): the existing global toggles **plus** a per-user
    `blocks` table; block effects are **mutual** (a block either way suppresses
    chat visibility and prevents requests/invitations between the pair).
  - **Friend games** (interview): invitation → accept; the game starts only when
    **all** invitees accept, any decline cancels it, and a pending invitation
    **lazily expires after 7 days** (checked on access — no new sweeper).
  - **Chat** (interview): ≤ **60 runes**, stored with the game forever, the
    sender **IP** kept for moderation (as `text`, following Stage 1's no-`bytea`
    precedent; the gateway forwards it in Stage 6), input **content-filtered**
    (links/emails/phone numbers incl. obfuscated forms) via `mvdan.cc/xurls/v2`
    plus a compact leet/separator normaliser and a ≥7-digit phone heuristic — the
    one new dependency. **Nudge is a chat message** (`kind='nudge'`), rate-limited
    to once per hour per game per sender.
  - **Matchmaking** (interview): an **in-memory** FIFO pool keyed by **variant**
    only (variant fixes the board language), pairing two humans (seat order
    randomised). The 10 s wait and **robot substitution are deferred to Stage 5**.
    The pool does **not** consult blocks (auto-match is anonymous) — a deliberate
    simplification of the plan's optional block-skip that also avoids a DB call
    under the pool lock.
  - **Email confirm-code** (interview): 6-digit code, 15-min TTL, ≤ 5 attempts,
    stored as a **SHA-256 hash**; a `Mailer` seam with an SMTP relay
    (`BACKEND_SMTP_*`) and a default **log mailer**. It binds an email to the
    current account; an email already confirmed by another account → `ErrEmailTaken`
    (**merge is Stage 10**); email-as-login is Stage 6 and reuses this mechanism.
  - **Multi-player drop-out** (interview; discharges the Stage 3 deferral): the
    engine's `Resign` now drops a seat and the rest **play on** while ≥ 2 are
    active, finishing (last-survivor wins) when one remains; `winner` excludes all
    resigned seats. A per-game **`dropout_tiles`** setting (`remove` default |
    `return`) governs the leaver's rack, which is **never revealed** to the others.
    Timeout reuses `Resign`, so a multi-player timeout drops one seat and play
    continues; `game.commit`/`timeoutGame` were already keyed on `g.Over()`, so they
    only needed the setting threaded through create/replay.
  - **Build/deps**: `go mod tidy` is not run — the bare-path `scrabble-solver`
    replace lives only in `go.work`, so `tidy`/`go get` cannot resolve it; the
    `xurls` dependency was added with `go mod edit -require` + `go mod download`,
    its checksums recorded in the committed **`go.work.sum`**. No CI workflow change
    (both Go workflows already clone the solver sibling and export
    `BACKEND_DICT_DIR`).
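
The email confirm-code rules above (6 digits, 15-minute TTL, at most 5 attempts, stored as a SHA-256 hash) can be sketched as below. The `Pending` type, the Unix-second timestamps and the function names are assumptions, not the `internal/account` API.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"math/big"
)

const (
	codeTTLSeconds  = 15 * 60
	maxCodeAttempts = 5
)

// Pending is a hypothetical stored confirmation: only the hash is kept.
type Pending struct {
	HashHex   string
	ExpiresAt int64 // Unix seconds
	Attempts  int
}

func hashCode(code string) string {
	sum := sha256.Sum256([]byte(code))
	return hex.EncodeToString(sum[:])
}

// NewCode draws a uniform 6-digit code and returns it with its stored form.
func NewCode(nowUnix int64) (string, Pending, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(1_000_000))
	if err != nil {
		return "", Pending{}, err
	}
	code := fmt.Sprintf("%06d", n.Int64())
	return code, Pending{HashHex: hashCode(code), ExpiresAt: nowUnix + codeTTLSeconds}, nil
}

// Verify rejects expired or exhausted confirmations, counts the attempt and
// compares hashes in constant time.
func (p *Pending) Verify(code string, nowUnix int64) bool {
	if nowUnix > p.ExpiresAt || p.Attempts >= maxCodeAttempts {
		return false
	}
	p.Attempts++
	return subtle.ConstantTimeCompare([]byte(hashCode(code)), []byte(p.HashHex)) == 1
}
```

A 6-digit code has only a million values, so the attempt cap rather than the hash is what limits guessing.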

- **Stage 5** (interview + implementation):
  - Scope, as in Stages 1–4: **domain layer, no HTTP** — the robot consumes the
    public game API as an ordinary seated player (`internal/robot`), so only
    `internal/engine` still imports the solver. New: `engine.Candidates()` (decoded
    ranked plays) and a thin `game.Service.Candidates` + `RobotTurns` read.
  - **Account model** (interview): a pool of **durable accounts**, each a single
    `identities` row `kind='robot'` (migration `00004` widens the kind CHECK — a
    CHECK-only change, no jetgen). A curated ~16-name pool in code; `EnsurePool`
    provisions them idempotently at boot (a hard dependency, like the registry) with
    `block_chat`/`block_friend_requests` set, which is **all** the friend/DM blocking
    needs (no special-casing).
  - **Driver + state** (interview): a background sweeper goroutine
    (`robot.Service.Run`/`Drive`, mirroring the timeout sweeper); **every per-game
    and per-turn choice is derived deterministically from the game `seed`** (FNV-1a
    mix, restart-stable — not `hash/maphash`), so the robot keeps **no extra state**.
    `playToWin = mix(seed,"win")%100 < 40`; per-turn `delay`; sleep `drift`.
  - **Timing** (interview): per-move delay `2 + 88·u^k` minutes, `u~U(0,1)`,
    **k≈3.5 → median ~10 min**, clamped to [2,90]. A daytime nudge on the robot's
    turn pulls the move into a 2–10 min reply window; the robot proactively nudges
    after **12 h** idle on the human's turn (reusing `social.Nudge`'s once-per-hour
    guard; `social.LastNudgeAt` added to detect the human's nudge).
  - **Sleep** (interview — resolves the §7-vs-`account.go` mismatch): the robot
    sleeps 00:00–07:00 in the **opponent's timezone shifted by a per-game drift ∈
    [−3,+3]h** (so its night overlaps the human's rather than running anti-phase),
    computed on the fly per game — **no profile mutation, no concurrency cap**. The
    `account.go` away-window comment was corrected accordingly.
  - **Margin** (interview): pick the candidate whose resulting margin (own+move−opp)
    is closest to **[1,30]** when playing to win / **[−30,−1]** when playing to lose,
    tie-broken toward the conservative edge; no legal play → exchange the full rack
    when the bag can refill it, else pass.
  - **Substitution** (interview): a matchmaker **reaper** (`Reap`/`RunReaper`)
    substitutes a pooled robot after a **10 s** wait (`BACKEND_LOBBY_ROBOT_WAIT`);
    `NewMatchmaker` now takes a `RobotProvider`. A waiter learns of a match — human
    pairing **or** substitution — through a new `Poll` + results map; production
    delivery is a **match-found notification** (session/in-app push + side-service),
    Stage 6/8 — noted in §10.
  - **Metrics** (interview, 1+2): robots are durable accounts, so `account_stats`
    is the authoritative, complete balance ground-truth (target ~40% robot wins);
    an OTel counter (`robot_games_finished_total`, exporter `none` today) and a
    structured log cover robot-finished games for live observation.
  - **Config**: `BACKEND_ROBOT_DRIVE_INTERVAL` (30 s), `BACKEND_LOBBY_ROBOT_WAIT`
    (10 s), `BACKEND_LOBBY_REAPER_INTERVAL` (1 s). No CI change (both Go workflows
    already clone the solver sibling and export `BACKEND_DICT_DIR`).
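
The margin rule from the Stage 5 notes (pick the candidate whose resulting margin is closest to [1,30] when playing to win, or [−30,−1] when playing to lose) can be sketched as a small selection function. The signature is hypothetical, and the tie-break shown here, preferring the smaller absolute margin, is only one assumed reading of "toward the conservative edge".

```go
package main

// bandDistance is 0 inside [lo, hi] and the gap to the nearest edge outside.
func bandDistance(margin, lo, hi int) int {
	switch {
	case margin < lo:
		return lo - margin
	case margin > hi:
		return margin - hi
	default:
		return 0
	}
}

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}

// PickMove returns the index of the candidate whose resulting margin
// (own + move - opp) lies closest to the target band, or -1 when there are no
// candidates (the caller then exchanges or passes). Ties go to the smaller
// absolute margin, which is an assumption of this sketch.
func PickMove(own, opp int, moveScores []int, playToWin bool) int {
	lo, hi := 1, 30
	if !playToWin {
		lo, hi = -30, -1
	}
	best, bestDist, bestAbs := -1, 0, 0
	for i, s := range moveScores {
		margin := own + s - opp
		d, a := bandDistance(margin, lo, hi), abs(margin)
		if best == -1 || d < bestDist || (d == bestDist && a < bestAbs) {
			best, bestDist, bestAbs = i, d, a
		}
	}
	return best
}
```

For example, at 100 vs 120 with candidate scores 10, 25 and 60, the margins are −10, 5 and 40, so a robot playing to win takes the 25-point play and a robot playing to lose takes the 10-point play.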

## Deferred TODOs (cross-stage)

- **TODO-1 — publish & version the solver.** Once `scrabble-solver` is stable,
  give it a real module URL and switch `backend` to a versioned dependency,
  dropping the `go.work` replace and the CI clone. Removes the floating
  `master` dependency accepted for now (Stage 2 interview).
- **TODO-2 — split the solver into engine vs dictionary generator + versioned
  dictionary artifacts.** Owner's idea, with the caveats agreed at the Stage 2
  interview: the split is sound (build-time wordlist→DAWG vs runtime load have
  different lifecycles and shrink the runtime dependency surface), **but** the
  generator must pin the **same** `dafsa`/`alphabet` versions and alphabet
  definitions as the runtime engine or the on-disk format / letter indexing
  drifts and silently corrupts validation. For delivery prefer **Git LFS or an
  artifact store** (Gitea releases / OCI artifact / object storage) over a raw
  git submodule (the ~0.5–0.7 MB DAWGs are regenerated wholesale and bloat git
  history); pin by tag/hash for a reproducible startup set. A submodule/LFS pull
  is a **deploy-time** way to populate the directory, **not** the runtime
  dynamic-reload mechanism (Stage 9) — keep the `BACKEND_DICT_DIR` directory as
  the runtime contract: a new `.dawg` appears in it and is loaded with
  `dawg.Load`.