scrabble-game/PLAN.md
Ilia Denisov 85baabe4ba
Stage 5: robot opponent (pool, seed-derived strategy, move driver, matchmaker substitution)
- internal/robot: durable kind='robot' account pool (migration 00004); every
  per-game and per-turn choice derived deterministically from the game seed
  (restart-stable FNV mix); a background move driver; margin targeting (band
  1-30, closest-to-band); right-skewed [2,90]min delays (median ~10m);
  opponent-anchored sleep with +/-3h drift; daytime nudge reply + proactive
  12h nudge; friend/chat blocked via profile toggles.
- engine.Candidates (decoded ranked plays); game.Candidates + RobotTurns;
  social.LastNudgeAt.
- matchmaker: 10s wait then robot substitution (reaper) + Poll delivery seam.
- config (BACKEND_ROBOT_DRIVE_INTERVAL, BACKEND_LOBBY_ROBOT_WAIT,
  BACKEND_LOBBY_REAPER_INTERVAL); main wiring + boot-time pool provisioning.
- metrics: robot account_stats (authoritative balance) + robot_games_finished_total
  OTel counter + per-finish log.
- docs: PLAN, ARCHITECTURE, FUNCTIONAL(+ru), TESTING, README; account.go comment.
- tests: robot strategy units, matchmaker reaper/Poll, engine.Candidates; inttest
  robot full-game / substitution / proactive-nudge.
2026-06-02 21:02:20 +02:00


Scrabble Game — implementation plan

Living plan and stage tracker. Each stage is implemented in its own session; the rules for starting and finishing a stage are in CLAUDE.md. The architecture/decision record is docs/ARCHITECTURE.md; behaviour is docs/FUNCTIONAL.md. When a stage produces a decision, bake it back here and into the affected docs/code in the same PR.

Context

Greenfield multiplatform Scrabble. Players arrive from a platform (Telegram first; later VK/MAX/iOS/Android) or standalone web (email / guest). Three executables — gateway, backend, ui — plus per-platform side-services. Deliberately simpler than the sibling ../galaxy-game (idea donor, not a template). The ../scrabble-solver engine is embedded in-process as a library.

Locked decisions (recap — full record in docs/ARCHITECTURE.md)

Stack: go.work monorepo, modules scrabble/<name>, Go 1.26.x, backend gin+pgx+Postgres(schema backend)+goose+zap+OTel (deps added when first used). Wire: Connect-RPC + FlatBuffers (client↔gateway), REST/JSON + X-User-ID (gateway↔backend), gRPC server-stream for live events. Auth: platform-native, thin opaque session token, no Ed25519/signing, likely no Redis. UI: pure HTML5/CSS, plain Svelte + Vite, Capacitor for native. MVP surfaces: Telegram + web (email + ephemeral guest) + link/merge. Variants: ru/en/Эрудит. Legality: validate-at-submit. End: empty bag+rack / 6 scoreless / 24h timeout. Hint: top-1. Word-check: unlimited + complaint. Robot: P(win)≈0.40, margin targeting, [2,90]min skewed timing, sleep 00:00–07:00 opp-tz, nudge logic. Dictionary: pin per game. History: structured + GCG export, dictionary-independent (see ARCHITECTURE §9.1).

Stage tracker

| # | Stage | Status |
|---|-------|--------|
| 0 | Scaffolding (go.work, backend skeleton, docs, CI) | done |
| 1 | Backend foundation (config, server, Postgres+goose, sessions, accounts) | done |
| 2 | Engine package over scrabble-solver | done |
| 3 | Game domain (lifecycle, rules, hint, word-check, history+GCG, stats) | done |
| 4 | Lobby & social (matchmaking, friends, block, chat, profile, nudge) | done |
| 5 | Robot opponent | done |
| 6 | Gateway edge (Connect/FB, platform auth, sessions, push bridge, admin) | todo |
| 7 | UI (plain Svelte + Vite, board, lobby, chat, i18n) | todo |
| 8 | Telegram integration (bot side-service, deep-link, push) | todo |
| 9 | Admin & dictionary ops (complaint review, version reload) | todo |
| 10 | Account linking & merge | todo |
| 11 | Polish (observability, perf with evidence, deploy) | todo |

Scaffolding is incremental: go.work lists only existing modules; each stage adds the modules it needs.

Stages

Each stage: read this plan + relevant docs, interview the owner on the open details below, implement within scope, then update plan/docs/code and get CI green before marking done.

Stage 0 — Scaffolding (done)

Scope: go.work (Go 1.26.3, use ./backend); minimal runnable backend (gin, zap, /healthz, /readyz, env config); docs skeleton; PLAN.md; CLAUDE.md; .gitea/workflows/go-unit.yaml; README; .gitignore. Acceptance: go build ./backend/... + go vet + gofmt clean + go test ./backend/... green; CI green on push.

Stage 1 — Backend foundation

Scope: config/server route groups (/api/v1/{public,user,internal,admin}, probes), Postgres (pgx) + embedded goose migrations + schema backend, telemetry (OTel) wiring, in-memory cache scaffolding, thin sessions + accounts + platform identities. Open details: Postgres version + DSN/search_path convention; jet vs sqlc/sqlx (default jet); migration naming; exact session-token shape (opaque random length, TTL, revocation); account/identity table shape; whether the admin bootstrap lands here or in Stage 9.

Stage 2 — Engine package

Scope: backend/internal/engine over scrabble-solver — versioned DAWG load/registry, GenerateMoves/ValidatePlay/ScorePlay wrappers, bag/rack, the dictionary-independent game-state model + decode helpers. Add replace scrabble-solver => ../scrabble-solver to go.work here and solve the CI sibling-checkout (clone gitea.iliadenisov.ru/.../scrabble-solver). Open details: how CI obtains the solver (clone sibling vs publish/tag the solver module); in-memory game-state representation; how blanks and exchanges are modelled; Эрудит specifics to verify against the solver.

Stage 3 — Game domain

Scope: create/join, turn order, submit play/pass/exchange/resign, validate-at-submit, scoring, end-conditions, 24h timeout/auto-resign, hint, word-check + complaint capture, structured history + GCG writer, stats on finish. Open details: GCG dialect details (blanks, exchanges, notation); exact stats edge cases; turn-timeout scheduler mechanism (cron vs per-game timer); complaint payload shape.

Stage 4 — Lobby & social

Scope: matchmaking pool, friends, block, per-game chat, profile + email confirm-code, nudge. Open details: pool fairness/keying confirmation; deep-link format per platform; chat length limit + retention; friend-request lifecycle; email-code provider (SMTP relay choice).

Stage 5 — Robot opponent

Scope: human-like player — balance ~0.40, margin targeting, skewed [2,90]min timing + sleep + nudge logic, friend/DM blocking, name pool. Open details: exact delay distribution + parameters; margin band; name pool source; how the scheduler drives robot moves; metrics for tuning balance.

Stage 6 — Gateway edge

Scope: Connect/gRPC-Web (h2c), Telegram initData validation → session → X-User-ID, in-memory rate-limit, admin Basic-Auth passthrough, FlatBuffers transcoding, in-app push stream bridging backend push gRPC stream, email + ephemeral-guest paths. Open details: FlatBuffers schema layout + message_type catalog; rate-limit classes/limits; admin surface routing; session cache shape at the gateway.

Stage 7 — UI

Scope: plain Svelte + Vite static; Connect-web + FlatBuffers client; lobby (my games, profile tabs); board (HTML5/CSS grid, drag-n-drop, no assets); chat; hint/word-check; in-app stream; i18n en/ru; in-memory session (+IndexedDB if available); Capacitor-ready structure. Open details: detailed game-board UX (deferred by the owner to this stage); client routing; offline/refresh behaviour; design system / theming.

Stage 8 — Telegram integration

Scope: bot side-service, deep-link invites, platform push (your-turn / nudge), Mini App launch/auth; backend↔platform internal API. Open details: bot framework/library; deep-link scheme; push message templates; internal API contract; Mini App hosting/origin.

Stage 9 — Admin & dictionary ops

Scope: admin endpoints (users, games, complaint review queue, dictionary versions + reload), complaint→dictionary update pipeline. Open details: whether a server-rendered console is wanted or JSON-only; the dictionary rebuild/deploy pipeline; complaint resolution workflow.

Stage 10 — Account linking & merge

Scope: link-via-confirm; merge-into-A (stats sum, transfer games/friends, dedupe). High blast-radius — focused regression tests. Open details: conflict resolution (active games on both, duplicate friends, display-name collisions); irreversibility/audit; confirm-flow per platform.

Stage 11 — Polish

Scope: observability dashboards, evidence-based performance work, prod build/deploy. Open details: deployment target/host; dashboards; load expectations.

Refinements logged during implementation

  • Stage 0: solver replace deferred to Stage 2 (nothing imports it yet; adding the path now would break CI, which checks out only this repo). Docker / compose deferred to a stage that has something to deploy. Trunk is master (owner preference); feature/* + PR from Stage 1; the genesis commit lands on master by necessity.

  • Stage 1 (interview + implementation):

    • Query layer: go-jet over database/sql (pgx stdlib) + otelsql; a cmd/jetgen tool regenerates the committed code from a throwaway container. Postgres 17 pinned for jetgen, tests and prod.
    • Sessions: opaque token stored only as a SHA-256 hash (kept as hex text, not bytea — avoids jet bytea-literal friction), revoke-only (no TTL); revocation-audit table deferred. Backend keeps a warmed write-through session cache that gates /readyz.
    • Data model: UUIDv7 PKs; one unified identities table (kind ∈ telegram|email, widen to vk/max later); no soft-delete / actor-audit columns yet.
    • HTTP surface: service/store/cache layer only. /api/v1/{public,user,internal,admin} groups + X-User-ID middleware are scaffolding (exposed via Server group accessors); the session/account REST handlers land with the gateway in Stage 6. Admin bootstrap deferred to Stage 9.
    • Telemetry: providers + request-timing middleware + otelsql; exporters none (default) / stdout; OTLP + dashboards deferred to Stage 11.
    • Tests/CI: integration tests behind the integration build tag in backend/internal/inttest + new integration.yaml (testcontainers, Ryuk off, serial), firing on push and PR. Backend now hard-depends on Postgres at boot (migrations at startup) — a deliberate contract change from Stage 0, documented in both READMEs. All code stays in the existing backend module under internal/ (+ cmd/jetgen); go.work untouched.
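The session-token decision above (opaque token, persisted only as a SHA-256 hash kept as hex text) can be sketched as follows. This is a minimal illustration, not the backend's code: the function names `newToken` and `hashToken`, the 32-byte token length and the base64url encoding are all assumptions.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// newToken returns a random opaque token to hand to the client.
// The 32-byte length and base64url encoding are assumptions.
func newToken() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// hashToken returns the lowercase hex SHA-256 of the token. Only this
// value would be stored (in a text column rather than bytea).
func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

func main() {
	tok, err := newToken()
	if err != nil {
		panic(err)
	}
	// The client keeps tok; the store keeps only the hash.
	fmt.Println("token length:", len(tok))
	fmt.Println("stored hash: ", hashToken(tok))
}
```

A lookup would hash the presented token the same way and compare against the stored hex string, so a leaked table never exposes usable tokens.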
  • Stage 2 (interview + implementation):

    • Scope: internal/engine is a self-contained library (registry, bag, Game state machine, decode/replay). No config/main/server wiring this stage — there is no consumer yet; wiring lands in Stage 3, mirroring Stage 1's deferred handlers.
    • Pure rules engine (interview): the engine owns the in-memory Game, pure transitions (play/pass/exchange/resign + draw) and end-condition detection, including the standard end-game rack-adjustment scoring — a deliberate slice of Stage 3's "scoring/end-conditions" that the pure-engine boundary implies. Stage 3 keeps scheduling, the 24h timeout, persistence and GCG.
    • Solver wiring: replace scrabble-solver => ../scrabble-solver in go.work; backend/go.mod requires scrabble-solver (placeholder version, redirected by the replace) and github.com/iliadenisov/dafsa directly (for dawg.Load). CI clones the public solver repo at master HEAD anonymously into ../scrabble-solver (no token); both Go workflows gained the step (the engine's untagged tests run under the integration workflow too) and set BACKEND_DICT_DIR.
    • Dictionaries: registry loads the committed DAWGs from a directory parameter; dict_version is an explicit string label; the latest version per variant is tracked. Smoke tests validate a known word per variant (English/Russian/Эрудит). Эрудит is handled uniformly — every real difference is already in rules.Erudit(); the move.go "single orientation per turn" note needs no special code (any single play is one-directional).
    • Bag/blanks/exchange: own deterministic Bag (Draw + Return) because selfplay.Bag cannot return tiles; exchange is legal only when the bag holds at least a rack and draws replacements before returning the swapped tiles. A blank is Placement{Blank:true} carrying its designated letter; the history keeps the concrete letter plus a blank flag (decoded via Alphabet.Character / Decode). ReplayBoard reuses scrabble.Apply, so no internal/encoding dependency.
    • Deviation from the approved plan: docs/FUNCTIONAL.md (+_ru) was left unchanged. Stage 2 adds no user-visible behaviour; the variant, per-game dictionary and dictionary-independent-history user stories already live in Stages 3–4, so a "light touch" here would have duplicated or pre-empted them.
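The bag and exchange rule from the Stage 2 notes (a seeded bag with Draw and Return; exchange legal only when the bag holds at least a rack; replacements drawn before the swapped tiles go back) might look roughly like this. It is a sketch under assumptions: the type and method shapes, the rack size constant and the use of math/rand are illustrative and not taken from internal/engine.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

const rackSize = 7 // assumed rack size

// Bag is a seeded tile bag; the same seed and tiles give the same draws.
type Bag struct {
	tiles []rune
	rng   *rand.Rand
}

func NewBag(tiles []rune, seed int64) *Bag {
	b := &Bag{tiles: append([]rune(nil), tiles...), rng: rand.New(rand.NewSource(seed))}
	b.shuffle()
	return b
}

func (b *Bag) shuffle() {
	b.rng.Shuffle(len(b.tiles), func(i, j int) { b.tiles[i], b.tiles[j] = b.tiles[j], b.tiles[i] })
}

func (b *Bag) Len() int { return len(b.tiles) }

// Draw removes up to n tiles from the end of the bag.
func (b *Bag) Draw(n int) []rune {
	if n > len(b.tiles) {
		n = len(b.tiles)
	}
	out := append([]rune(nil), b.tiles[len(b.tiles)-n:]...)
	b.tiles = b.tiles[:len(b.tiles)-n]
	return out
}

// Return puts tiles back and reshuffles.
func (b *Bag) Return(tiles []rune) {
	b.tiles = append(b.tiles, tiles...)
	b.shuffle()
}

var ErrBagTooSmall = errors.New("bag holds fewer tiles than a rack")

// Exchange draws replacements first, then returns the swapped tiles,
// so a player can never draw back a tile they just gave up.
func (b *Bag) Exchange(swapped []rune) ([]rune, error) {
	if b.Len() < rackSize {
		return nil, ErrBagTooSmall
	}
	drawn := b.Draw(len(swapped))
	b.Return(swapped)
	return drawn, nil
}

func main() {
	bag := NewBag([]rune("ABCDEFGHIJ"), 42)
	drawn, err := bag.Exchange([]rune("ZZ"))
	fmt.Println(string(drawn), err, bag.Len())
}
```

Drawing before returning is the detail that matters here: with the order reversed, a swapped tile could come straight back to the same rack.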
  • Stage 3 (interview + implementation):

    • Scope, as in Stages 1–2: domain service/store layer + engine wiring, no HTTP (internal/game). The gateway↔backend REST surface lands in Stage 6; the only active driver this stage is a background turn-timeout sweeper started from main. The robot (Stage 5) will consume the same service API.
    • Persistence = event-sourcing + warm cache (interview): durable state is the games row plus an append-only decoded move journal (game_moves); the live position is an engine.Game kept in an in-memory cache with a ~24h idle TTL and rebuilt by replaying the journal on a miss (the seeded bag makes replay exact). Each game is serialised by a per-game mutex; a persistence failure evicts the live game so the next access rebuilds. §9 reworded from "stored structurally" to this model.
    • Resign/timeout split (interview): 2-player resign/timeout only this stage (the other player wins); multiplayer drop-out-and-continue + resigned-tiles disposition deferred to Stage 4. Per-game turn-timeout duration setting (5/10/15/30 min, 1/2/3/6/12/24 h; default 24 h) and a per-user away window (accounts.away_start/away_end, default 00:00–07:00 local, honoured by the sweeper with midnight-cross handling) added now; profile editing of the away window is Stage 4 and the robot's sleep (Stage 5) reuses it.
    • Engine Resign fix (interview, in internal/engine): the resigner keeps their accumulated score (no end-game rack adjustment) and never wins; winner excludes the resigner, so a two-player resign/timeout gives the win to the other player regardless of score. Timeout reuses Resign, so the game domain needs no winner override.
    • Additive engine domain API: Direction, Game.SubmitPlay/SubmitExchange/EvaluatePlay/HintView/Hand, MoveRecord.{Dir,MainRow,MainCol}, Registry.Lookup, ParseVariant, so that internal/game never imports scrabble-solver (this keeps the §5 single-importer invariant).
    • Create = atomic with seats (interview): Create seats all accounts and starts; lobby seat-filling is Stage 4. Sweeper = periodic goroutine (interview; default 60 s, BACKEND_GAME_TIMEOUT_SWEEP_INTERVAL).
    • Hint = settings + wallet (interview): per-game hints_allowed + hints_per_player, plus a profile wallet accounts.hint_balance (spent after the allowance; purchases later). Category defaults (random 1 / tournament 0 / friendly 1-or-0) are the caller's job (lobby/tournaments).
    • Stats (interview): account_stats with draws added beyond §9's wins/losses; max_word_points = best single move score; ties draw, resign/timeout is a loss, guests get no stats.
    • Complaint (interview): full payload with game_id; word-check is scoped to the game's pinned (variant, dict_version). Stage 9 owns the resolution lifecycle, so the status column carries no value CHECK yet.
    • GCG (interview): standard Poslfit dialect (UTF-8, #player/#lexicon pragmas, 8G/H8 coordinates, lower-case blanks, . pass-throughs, -TILES exchange) plus #note lines for resign/timeout; derived from the journal, so dictionary-independent.
    • Engine wiring + config: main loads the registry (engine.Open, a hard boot dependency like migrations) and starts the sweeper. New config: BACKEND_DICT_DIR (required), BACKEND_DICT_VERSION (default v1), BACKEND_GAME_TIMEOUT_SWEEP_INTERVAL (60 s), BACKEND_GAME_CACHE_TTL (24 h). No CI change — both Go workflows already clone the solver sibling and export BACKEND_DICT_DIR. accounts gained away_start/away_end/hint_balance and the account package gained SpendHint (it owns its table).
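The away window with midnight-cross handling mentioned in the Stage 3 notes reduces to a small interval check. The sketch below assumes the window is stored as minutes since local midnight and is half-open; the function names and the treatment of an empty window are hypothetical, not the sweeper's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// inWindow reports whether minute-of-day m lies in [start, end).
// When start > end the window crosses midnight (for example 22:00 to 06:00).
// Treating start == end as an empty window is an assumption.
func inWindow(m, start, end int) bool {
	switch {
	case start == end:
		return false
	case start < end:
		return m >= start && m < end
	default:
		return m >= start || m < end
	}
}

// isAway applies inWindow to a wall-clock time already converted
// to the player's local zone.
func isAway(local time.Time, start, end int) bool {
	return inWindow(local.Hour()*60+local.Minute(), start, end)
}

func main() {
	night := time.Date(2026, 6, 2, 23, 30, 0, 0, time.UTC)
	fmt.Println(isAway(night, 0, 7*60))     // default 00:00 to 07:00 window
	fmt.Println(isAway(night, 22*60, 6*60)) // window that crosses midnight
}
```

A sweeper built on this would skip timing out a player whose local time is inside the window; the same check can serve the robot's sleep once the drift is added to the start and end.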
  • Stage 4 (interview + implementation):

    • Scope, as in Stages 1–3: domain service/store layer, no HTTP; REST/stream is Stage 6. Chat and nudges are persisted now; live delivery (push / in-app stream) is Stage 6/8. New packages internal/social (friends, blocks, chat+nudge) and internal/lobby (matchmaking + invitations); profile editing and the email confirm-code extend internal/account. The services have no active driver this stage, so main builds them and hands them to the server, which exposes them via accessors (the Stage 1 scaffolding-accessor pattern) for the Stage 6 handlers.
    • Friends (interview): request → accept on a single friendships table; decline/cancel delete the pending row; blocking severs any friendship.
    • Blocks (interview): the existing global toggles plus a per-user blocks table; block effects are mutual (a block either way suppresses chat visibility and prevents requests/invitations between the pair).
    • Friend games (interview): invitation → accept; the game starts only when all invitees accept, any decline cancels it, and a pending invitation lazily expires after 7 days (checked on access — no new sweeper).
    • Chat (interview): ≤ 60 runes, stored with the game forever, the sender IP kept for moderation (as text, following Stage 1's no-bytea precedent; the gateway forwards it in Stage 6), input content-filtered (links/emails/phone numbers incl. obfuscated forms) via mvdan.cc/xurls/v2 plus a compact leet/separator normaliser and a ≥7-digit phone heuristic — the one new dependency. Nudge is a chat message (kind='nudge'), rate-limited to once per hour per game per sender.
    • Matchmaking (interview): an in-memory FIFO pool keyed by variant only (variant fixes the board language), pairing two humans (seat order randomised). The 10 s wait and robot substitution are deferred to Stage 5. The pool does not consult blocks (auto-match is anonymous) — a deliberate simplification of the plan's optional block-skip that also avoids a DB call under the pool lock.
    • Email confirm-code (interview): 6-digit code, 15-min TTL, ≤ 5 attempts, stored as a SHA-256 hash; a Mailer seam with an SMTP relay (BACKEND_SMTP_*) and a default log mailer. It binds an email to the current account; an email already confirmed by another account → ErrEmailTaken (merge is Stage 10); email-as-login is Stage 6 and reuses this mechanism.
    • Multi-player drop-out (interview; discharges the Stage 3 deferral): the engine's Resign now drops a seat and the rest play on while ≥ 2 are active, finishing (last-survivor wins) when one remains; winner excludes all resigned seats. A per-game dropout_tiles setting (remove default | return) governs the leaver's rack, which is never revealed to the others. Timeout reuses Resign, so a multi-player timeout drops one seat and play continues; game.commit/timeoutGame were already keyed on g.Over(), so they only needed the setting threaded through create/replay.
    • Build/deps: go mod tidy is not run — the bare-path scrabble-solver replace lives only in go.work, so tidy/go get cannot resolve it; the xurls dependency was added with go mod edit -require + go mod download, its checksums recorded in the committed go.work.sum. No CI workflow change (both Go workflows already clone the solver sibling and export BACKEND_DICT_DIR).
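The matchmaking decision above (an in-memory FIFO pool keyed by variant only, pairing two humans) could be sketched like this. The `Pool` type and `Enqueue` signature are hypothetical; seat-order randomisation and the Stage 5 robot substitution are deliberately left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool holds one FIFO queue of waiting account IDs per variant.
type Pool struct {
	mu      sync.Mutex
	waiting map[string][]string
}

func NewPool() *Pool { return &Pool{waiting: make(map[string][]string)} }

// Enqueue pairs accountID with the longest-waiting player of the same
// variant, or queues it when nobody is waiting. An account that is
// already queued is not queued twice (an assumption).
func (p *Pool) Enqueue(variant, accountID string) (first, second string, matched bool) {
	p.mu.Lock()
	defer p.mu.Unlock()

	q := p.waiting[variant]
	for _, id := range q {
		if id == accountID {
			return "", "", false
		}
	}
	if len(q) == 0 {
		p.waiting[variant] = append(q, accountID)
		return "", "", false
	}
	head := q[0]
	p.waiting[variant] = q[1:]
	return head, accountID, true
}

func main() {
	pool := NewPool()
	fmt.Println(pool.Enqueue("en", "alice"))
	fmt.Println(pool.Enqueue("ru", "boris"))
	fmt.Println(pool.Enqueue("en", "carol"))
}
```

Nothing under the lock touches the database, which matches the stated reason for not consulting blocks during auto-match.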
  • Stage 5 (interview + implementation):

    • Scope, as in Stages 1–4: domain layer, no HTTP. The robot consumes the public game API as an ordinary seated player (internal/robot), so only internal/engine still imports the solver. New: engine.Candidates() (decoded ranked plays) and a thin game.Service.Candidates + RobotTurns read.
    • Account model (interview): a pool of durable accounts, each a single identities row kind='robot' (migration 00004 widens the kind CHECK — a CHECK-only change, no jetgen). A curated ~16-name pool in code; EnsurePool provisions them idempotently at boot (a hard dependency, like the registry) with block_chat/block_friend_requests set, which is all the friend/DM blocking needs (no special-casing).
    • Driver + state (interview): a background sweeper goroutine (robot.Service.Run/Drive, mirroring the timeout sweeper); every per-game and per-turn choice is derived deterministically from the game seed (FNV-1a mix, restart-stable — not hash/maphash), so the robot keeps no extra state. playToWin = mix(seed,"win")%100 < 40; per-turn delay; sleep drift.
    • Timing (interview): per-move delay 2 + 88·u^k minutes, u~U(0,1), k≈3.5 → median ~10 min, clamped to [2,90]. A daytime nudge on the robot's turn pulls the move into a 2–10 min reply window; the robot proactively nudges after 12 h idle on the human's turn (reusing social.Nudge's once-per-hour guard; social.LastNudgeAt added to detect the human's nudge).
    • Sleep (interview; resolves the §7-vs-account.go mismatch): the robot sleeps 00:00–07:00 in the opponent's timezone, shifted by a per-game drift ∈ [-3,+3] h (so its night overlaps the human's rather than running anti-phase), computed on the fly per game, with no profile mutation and no concurrency cap. The account.go away-window comment was corrected accordingly.
    • Margin (interview): pick the candidate whose resulting margin (own + move - opp) is closest to [1,30] when playing to win / [-30,-1] when playing to lose, tie-broken toward the conservative edge; no legal play → exchange the full rack when the bag can refill it, else pass.
    • Substitution (interview): a matchmaker reaper (Reap/RunReaper) substitutes a pooled robot after a 10 s wait (BACKEND_LOBBY_ROBOT_WAIT); NewMatchmaker now takes a RobotProvider. A waiter learns of a match (human pairing or substitution) through a new Poll + results map; production delivery is a match-found notification (session/in-app push + side-service) in Stage 6/8, as noted in §10.
    • Metrics (interview, 1+2): robots are durable accounts, so account_stats is the authoritative, complete balance ground-truth (target ~40% robot wins); an OTel counter (robot_games_finished_total, exporter none today) and a structured log cover robot-finished games for live observation.
    • Config: BACKEND_ROBOT_DRIVE_INTERVAL (30 s), BACKEND_LOBBY_ROBOT_WAIT (10 s), BACKEND_LOBBY_REAPER_INTERVAL (1 s). No CI change (both Go workflows already clone the solver sibling and export BACKEND_DICT_DIR).
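The seed-derived robot choices described above (FNV-1a mix, the 40% play-to-win rule, the skewed delay, and margin targeting) can be illustrated in one small sketch. Only the formulas come from the notes; the byte layout inside `mix`, the `"delay:"` label, the function names and the first-wins tie-break are assumptions, and the plan's "conservative edge" tie-break is not modelled.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"math"
	"strconv"
)

// mix hashes the game seed with a label using 64-bit FNV-1a, so the
// result is stable across restarts. The byte layout is an assumption.
func mix(seed uint64, label string) uint64 {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], seed)
	h := fnv.New64a()
	h.Write(b[:])
	h.Write([]byte(label))
	return h.Sum64()
}

// playToWin follows the rule mix(seed,"win")%100 < 40.
func playToWin(seed uint64) bool { return mix(seed, "win")%100 < 40 }

// delayFromU maps u in [0,1] to 2 + 88*u^3.5 minutes, clamped to [2,90].
func delayFromU(u float64) float64 {
	return math.Min(90, math.Max(2, 2+88*math.Pow(u, 3.5)))
}

// turnDelay derives u from the seed and the turn number.
func turnDelay(seed uint64, turn int) float64 {
	x := mix(seed, "delay:"+strconv.Itoa(turn)) // hypothetical label
	u := float64(x>>11) / (1 << 53)             // top 53 bits, in [0,1)
	return delayFromU(u)
}

// bandDistance is 0 inside [lo,hi], else the distance to the nearest edge.
func bandDistance(margin, lo, hi int) int {
	switch {
	case margin < lo:
		return lo - margin
	case margin > hi:
		return margin - hi
	}
	return 0
}

// pickCandidate returns the index of the move score whose resulting
// margin (own + move - opp) is closest to the target band, or -1 when
// there are no candidates. Ties keep the earliest candidate.
func pickCandidate(own, opp int, scores []int, toWin bool) int {
	lo, hi := 1, 30
	if !toWin {
		lo, hi = -30, -1
	}
	best, bestDist := -1, 0
	for i, s := range scores {
		if d := bandDistance(own+s-opp, lo, hi); best == -1 || d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

func main() {
	seed := uint64(12345)
	fmt.Println("play to win:", playToWin(seed))
	fmt.Printf("turn 1 delay: %.1f min\n", turnDelay(seed, 1))
	fmt.Println("pick:", pickCandidate(100, 110, []int{5, 15, 60}, true))
}
```

With u = 0.5 the delay formula gives 2 + 88·0.5^3.5 ≈ 9.8 minutes, which is where the quoted median of about 10 minutes comes from. Because every value is a pure function of the seed, a restarted driver recomputes the same decisions without stored state.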

Deferred TODOs (cross-stage)

  • TODO-1: publish & version the solver. Once scrabble-solver is stable, give it a real module URL and switch backend to a versioned dependency, dropping the go.work replace and the CI clone. This removes the floating master dependency accepted for now (Stage 2 interview).
  • TODO-2: split the solver into engine vs dictionary generator + versioned dictionary artifacts. Owner's idea, with the caveats agreed at the Stage 2 interview: the split is sound (build-time wordlist→DAWG vs runtime load have different lifecycles, and splitting shrinks the runtime dependency surface), but the generator must pin the same dafsa/alphabet versions and alphabet definitions as the runtime engine, or the on-disk format / letter indexing drifts and silently corrupts validation. For delivery prefer Git LFS or an artifact store (Gitea releases / OCI artifact / object storage) over a raw git submodule (the ~0.5–0.7 MB DAWGs are regenerated wholesale and bloat git history); pin by tag/hash for a reproducible startup set. A submodule/LFS pull is a deploy-time way to populate the directory, not the runtime dynamic-reload mechanism (Stage 9). Keep the BACKEND_DICT_DIR directory as the runtime contract: a new .dawg appears in it and is loaded with dawg.Load.