Stage 5: robot opponent (pool, seed-derived strategy, move driver, matchmaker substitution)

- internal/robot: durable kind='robot' account pool (migration 00004); every per-game and per-turn choice derived deterministically from the game seed (restart-stable FNV mix); a background move driver; margin targeting (band 1-30, closest-to-band); right-skewed [2,90]min delays (median ~10m); opponent-anchored sleep with +/-3h drift; daytime nudge reply + proactive 12h nudge; friend/chat blocked via profile toggles.
- engine.Candidates (decoded ranked plays); game.Candidates + RobotTurns; social.LastNudgeAt.
- matchmaker: 10s wait then robot substitution (reaper) + Poll delivery seam.
- config (BACKEND_ROBOT_DRIVE_INTERVAL, BACKEND_LOBBY_ROBOT_WAIT, BACKEND_LOBBY_REAPER_INTERVAL); main wiring + boot-time pool provisioning.
- metrics: robot account_stats (authoritative balance) + robot_games_finished_total OTel counter + per-finish log.
- docs: PLAN, ARCHITECTURE, FUNCTIONAL(+ru), TESTING, README; account.go comment.
- tests: robot strategy units, matchmaker reaper/Poll, engine.Candidates; inttest robot full-game / substitution / proactive-nudge.
2026-06-02 21:02:20 +02:00
parent 12fc6e498e
commit 85baabe4ba
26 changed files with 1700 additions and 85 deletions
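
Reviewer sketch (not part of the commit): the message says every per-game choice is derived from the game seed through a restart-stable FNV mix. The snippet below shows one way such a derivation could look. The names `mix`, `unit`, `playsToWin` and `winShare`, and the idea of salting the seed with a label string, are assumptions made for illustration; only the use of FNV-1a and the ≈ 0.40 play-to-win probability come from the change itself.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// mix derives a stable 64-bit value from the game seed and a label
// (hypothetical helper). FNV-1a is fully specified, so the same inputs
// give the same output in every process and after every restart.
func mix(seed uint64, label string) uint64 {
	h := fnv.New64a()
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], seed)
	h.Write(buf[:])
	h.Write([]byte(label))
	return h.Sum64()
}

// unit maps a mixed value onto [0, 1) using its top 53 bits.
func unit(v uint64) float64 {
	return float64(v>>11) / (1 << 53)
}

// playsToWin makes the once-per-game decision with P ≈ 0.40.
// Nothing is stored: calling it again recomputes the same answer.
func playsToWin(seed uint64) bool {
	return unit(mix(seed, "play-to-win")) < 0.40
}

// winShare reports the play-to-win fraction over seeds 0..n-1.
func winShare(n uint64) float64 {
	wins := 0
	for seed := uint64(0); seed < n; seed++ {
		if playsToWin(seed) {
			wins++
		}
	}
	return float64(wins) / float64(n)
}

func main() {
	fmt.Printf("play-to-win share over 100000 seeds: %.3f\n", winShare(100000))
	fmt.Println("seed 42 is stable:", playsToWin(42) == playsToWin(42))
}
```

Using a distinct label per decision (for example a hypothetical "delay" or "drift" label) would let one seed feed several independent-looking choices without any stored state.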
@@ -87,7 +87,8 @@ arrive from a platform rather than completing a mandatory registration).
  a platform auto-provisions a durable account bound to that platform identity.
  Concretely, platform and email identities share one `identities` table keyed by
  a unique `(kind, external_id)`; email is an identity with `kind=email` and a
-  `confirmed` flag. The **email confirm-code flow** (Stage 4) binds an email to the
+  `confirmed` flag. A synthetic `kind='robot'` identity (Stage 5) backs each pooled
+  robot opponent (§7). The **email confirm-code flow** (Stage 4) binds an email to the
  authenticated account: a 6-digit code (stored only as a SHA-256 hash, 15-minute
  TTL, ≤ 5 attempts) is sent through a `Mailer` seam (an SMTP relay, or a
  development log mailer when none is configured) and, once verified, attaches a
@@ -191,20 +192,37 @@ Key points:
 ## 7. Robot opponent

 Substitutes for a human in 2-player auto-match when the pool yields no human
-within 10 seconds. Designed to be indistinguishable from a person.
+within 10 seconds (§8). It lives in `internal/robot` and plays as an ordinary
+seated account through the game service, so only `internal/engine` imports the
+solver. It is designed to be indistinguishable from a person.
+
+The robot keeps **no per-game state**: every choice is derived deterministically
+from the game's bag `seed` (a restart-stable FNV-1a mix), so a background driver
+(`robot.Service.Run`, mirroring the turn-timeout sweeper) recomputes the same
+behaviour on every scan and after a restart — the same philosophy as journal
+replay. A pool of durable accounts — each a `kind='robot'` identity (§4),
+provisioned at startup with chat and friend requests blocked — backs the
+human-like name pool; those two profile toggles are all the friend/DM blocking
+requires (there is no DM surface; chat is per-game).

 - **Balance**: at game start it decides once whether to play to win, with
-  `P(play-to-win) ≈ 0.40` (so the human wins ≈ 60%). Adaptive difficulty is
-  post-MVP.
-- **Margin targeting**: each turn it picks from `GenerateMoves` a move that
-  keeps the resulting lead (when playing to win) or deficit (when playing to
-  lose) small (≈ 1–20 points), rather than always the maximum.
+  `P(play-to-win) ≈ 0.40` (so the human wins ≈ 60%), derived from the seed.
+  Adaptive difficulty is post-MVP.
+- **Margin targeting**: each turn it picks from the ranked candidates
+  (`engine.Candidates`) the move whose resulting lead (playing to win) or deficit
+  (playing to lose) is closest to a small band (**1–30 points**), rather than
+  always the maximum; with no legal play it exchanges a full rack when the bag can
+  refill it, else passes.
 - **Timing**: per-move delay sampled from a right-skewed distribution (short
-  delays frequent), clamped to **[2, 90] minutes**; **sleeps 00:00–07:00** in
-  the opponent's profile timezone (fallback UTC); on a daytime nudge after 60
-  minutes idle it replies within **2–10 minutes**; it proactively nudges the
-  human after 12 hours idle.
-- Blocks friend requests and direct messages; uses a human-like name pool.
+  delays frequent, median ≈ 10 min), clamped to **[2, 90] minutes**; it
+  **sleeps 00:00–07:00** anchored to the **opponent's** profile timezone with a
+  per-game drift of **±3 h** (fallback UTC), so its night overlaps the human's
+  rather than running anti-phase; on a daytime nudge it replies within
+  **2–10 minutes**; it proactively nudges the human after **12 hours** idle
+  (subject to the once-per-hour chat limit).
+- **Observability**: robot accounts accrue ordinary statistics (§9) — the
+  authoritative balance metric (target ≈ 40% robot wins) — and a
+  `robot_games_finished_total` OTel counter plus a per-finish log give a live view.

 ## 8. Lobby & social

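
Reviewer sketch (not part of the commit): the margin-targeting rule in the §7 hunk above picks the candidate whose resulting lead or deficit is closest to the 1–30 band. A minimal illustration follows. The `Candidate` type, the `pick` and `distanceToBand` names, and the tie-break (the earliest, i.e. highest-ranked, candidate wins a tie) are assumptions, not the actual `engine.Candidates` API.

```go
package main

import "fmt"

// Candidate is a hypothetical stand-in for one ranked play.
type Candidate struct {
	Word  string
	Score int
}

// The target band for the lead (or deficit) after the move.
const bandLo, bandHi = 1, 30

// distanceToBand is 0 inside [bandLo, bandHi], otherwise the gap to
// the nearest edge of the band.
func distanceToBand(margin int) int {
	switch {
	case margin < bandLo:
		return bandLo - margin
	case margin > bandHi:
		return margin - bandHi
	default:
		return 0
	}
}

// pick returns the index of the candidate whose resulting margin is
// closest to the band, or -1 when there are no candidates. myScore and
// oppScore are the totals before the move. When playing to win the
// margin is the robot's lead; when playing to lose it is its deficit.
func pick(cands []Candidate, myScore, oppScore int, toWin bool) int {
	best, bestDist := -1, 0
	for i, c := range cands {
		margin := myScore + c.Score - oppScore
		if !toWin {
			margin = -margin
		}
		if d := distanceToBand(margin); best == -1 || d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

func main() {
	cands := []Candidate{{"QUARTZ", 60}, {"TRAZ", 24}, {"RAT", 8}}
	fmt.Println("to win: ", cands[pick(cands, 100, 95, true)].Word)
	fmt.Println("to lose:", cands[pick(cands, 100, 95, false)].Word)
}
```

With the robot ahead 100 to 95, the play-to-win call skips the 60-point play (lead 65, outside the band) and takes the 24-point one (lead 29). The play-to-lose call cannot reach a deficit this turn, so it falls back to the smallest lead. A return of -1 corresponds to the exchange-or-pass branch described in the hunk.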
@@ -212,8 +230,10 @@ within 10 seconds. Designed to be indistinguishable from a person.
  fixes the board language), pairing the next two humans into a two-player
  auto-match with the seat order randomised for first-move fairness. The pool is
  lost on restart (players re-queue) and is anonymous, so it does not consult
-  blocks. The 10 s wait and the **robot substitution** for a missing human are
-  added in Stage 5.
+  blocks. After **10 s** with no human a background reaper substitutes a pooled
+  robot (§7) and starts the game. A queued player learns of a pairing or a
+  substitution through the matchmaker's `Poll`, the interim delivery seam until the
+  live match-found notification (§10).
 - **Friends**: a **request → accept** graph (one `friendships` table) — add by
  friend list or internal ID now, by platform deep-link with Stage 8. Declining or
  cancelling removes the pending request; blocking someone severs an existing
@@ -252,7 +272,8 @@ within 10 seconds. Designed to be indistinguishable from a person.
  keys are application-generated **UUIDv7**.
 - Tables: `accounts` (durable internal accounts; Stage 3 added the away-window
  columns `away_start`/`away_end` and the hint wallet `hint_balance`),
-  `identities` (platform/email identities, unique `(kind, external_id)`),
+  `identities` (platform/email/robot identities, unique `(kind, external_id)`;
+  Stage 5's migration `00004` admits the `robot` kind),
  `sessions` (revoke-only opaque-token hashes), the Stage 3 game tables
  `games` (Stage 4 added the `dropout_tiles` disposition column), `game_players`,
  `game_moves` (the move journal), `complaints` and `account_stats`, and the
@@ -301,9 +322,12 @@ does not cover.
 Two channels: **platform-native push** (out-of-app, via the platform
 side-service — your-turn, nudge) and the **in-app live stream** (chat,
 opponent-moved, while the app is open). Backend emits notification intents;
-delivery fans out to the appropriate channel. Stage 4 **persists** the
-notification-worthy events (chat messages and nudges) but does not yet deliver
-them: the gRPC stream to the gateway and the platform push arrive in Stage 6 / 8.
+delivery fans out to the appropriate channel. A **match-found** event (a human
+pairing or a robot substitution in auto-match, §8) belongs to the same fabric.
+Stage 4 **persists** the notification-worthy events (chat messages and nudges) but
+does not yet deliver them, and Stage 5's match-found has no live channel yet: the
+gRPC stream to the gateway and the platform push arrive in Stage 6 / 8. Until then
+a waiting client retrieves its started game by polling the matchmaker (`Poll`).

 ## 11. Observability

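
Reviewer sketch (not part of the commit): the §8 and §10 hunks above describe a reaper that substitutes a robot after the wait elapses, and a `Poll` seam through which the waiting client finds its started game. The snippet below models that flow in a single goroutine. The `pool`, `ticket`, `reap` and `poll` names are hypothetical, human-to-human pairing is omitted, and the real matchmaker would need locking and a real game creator.

```go
package main

import (
	"fmt"
	"time"
)

// ticket records when a player joined the queue (hypothetical type).
type ticket struct {
	player     string
	enqueuedAt time.Time
}

// pool is a toy matchmaker: waiting players plus started games by player.
type pool struct {
	waiting []ticket
	started map[string][2]string
}

func (p *pool) enqueue(player string, at time.Time) {
	p.waiting = append(p.waiting, ticket{player, at})
}

// reap starts a [human, robot] game for every player who has waited at
// least `wait`; nextRobot stands in for the robot-provider seam.
func (p *pool) reap(now time.Time, wait time.Duration, nextRobot func() string) {
	kept := p.waiting[:0]
	for _, t := range p.waiting {
		if now.Sub(t.enqueuedAt) >= wait {
			p.started[t.player] = [2]string{t.player, nextRobot()}
			continue
		}
		kept = append(kept, t)
	}
	p.waiting = kept
}

// poll is the interim delivery seam: the client asks whether its game
// has started yet.
func (p *pool) poll(player string) ([2]string, bool) {
	seats, ok := p.started[player]
	return seats, ok
}

// runDemo polls once before and once after the 10 s wait has elapsed.
func runDemo() (early bool, seats [2]string, late bool) {
	start := time.Date(2026, 6, 2, 12, 0, 0, 0, time.UTC)
	robot := func() string { return "robot-1" }
	p := &pool{started: map[string][2]string{}}
	p.enqueue("alice", start)

	p.reap(start.Add(5*time.Second), 10*time.Second, robot)
	_, early = p.poll("alice")

	p.reap(start.Add(11*time.Second), 10*time.Second, robot)
	seats, late = p.poll("alice")
	return early, seats, late
}

func main() {
	early, seats, late := runDemo()
	fmt.Println("matched after 5s: ", early)
	fmt.Println("matched after 11s:", late, seats)
}
```

In the commit the wait and the scan cadence are configurable (`BACKEND_LOBBY_ROBOT_WAIT`, `BACKEND_LOBBY_REAPER_INTERVAL`); the sketch passes them in as plain arguments instead.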
@@ -49,9 +49,14 @@ the bag or removed from play) is chosen when the game is created, and the leaver
 rack is never shown to the others.

 ### Robot opponent *(Stage 5)*
-Indistinguishable-from-human substitute in auto-match. Decides once whether to
-play to win (~40%), targets a small score margin, plays with human-like timing
-and a night sleep window, and nudges/answers nudges like a person.
+When auto-match finds no human within ten seconds, a robot opponent takes the empty
+seat so the game starts without waiting. It is meant to feel like a person: it
+decides once per game whether to play to win (about 40% of the time, so the human
+wins most games), aims for a close score rather than crushing or throwing the game,
+and plays at a human pace — short thinking times for most moves, the occasional long
+one, and a night-time pause that tracks the player's own day. It answers a nudge
+within a few minutes and nudges back when the player has been away a long time. It
+carries a human-like name and neither chats nor accepts friend requests.

 ### Social: friends, block, chat, nudge *(Stage 4)*
 Send a friend request and have it accepted (decline or cancel withdraws it,
@@ -48,9 +48,14 @@ session-токен; backend сопоставляет его с внутренн
 показывается остальным.

 ### Робот-соперник *(Stage 5)*
-Неотличимый от человека дублёр в авто-подборе. Один раз решает, играть ли на
-победу (~40%), целится в небольшой отрыв по очкам, ходит с человеческим
-таймингом и ночным сном, делает и принимает nudge как человек.
+Если авто-подбор не находит человека за десять секунд, свободное место занимает
+робот-соперник, и партия стартует без ожидания. Он задуман неотличимым от человека:
+один раз за партию решает, играть ли на победу (примерно в 40% случаев, так что
+человек выигрывает большинство партий), целится в близкий счёт, а не в разгром или
+поддавки, и ходит с человеческим темпом — чаще короткие раздумья, изредка долгие, и
+ночная пауза, подстроенная под день игрока. На nudge отвечает за несколько минут и
+сам шлёт nudge, когда игрок надолго пропал. Носит человекоподобное имя, не общается
+в чате и не принимает заявки в друзья.

 ### Социальное: друзья, блок, чат, nudge *(Stage 4)*
 Заявка в друзья и её принятие (отклонение или отмена снимают заявку, удаление —
@@ -32,21 +32,30 @@ tests or touching CI.
  Postgres-backed integration tests in `inttest` (full lifecycle to a natural
  end, **journal-replay equivalence**, the turn-timeout sweep with away-window
  grace, resign win/loss and statistics, the hint allowance-then-wallet policy,
-  word-check and complaint capture, and per-game-lock serialisation). The robot
-  balance/margin regression tests arrive with Stage 5. Stage 4 adds the engine's
-  **multi-player drop-out** cases (continue after one resign, last-survivor win,
-  the tile-disposition bag effect) and a domain integration test for a 3-player
-  **timeout that continues**.
+  word-check and complaint capture, and per-game-lock serialisation). Stage 4 adds
+  the engine's **multi-player drop-out** cases (continue after one resign,
+  last-survivor win, the tile-disposition bag effect) and a domain integration test
+  for a 3-player **timeout that continues**. The engine also gains a `Candidates`
+  ranked/decoded test (Stage 5).
 - **Social & lobby** *(Stage 4+)* — `backend/internal/social` unit-tests the chat
  **content filter** (links/emails/phones plus obfuscated forms) and
  `backend/internal/lobby` unit-tests the in-memory **matchmaker** (FIFO pairing,
-  cancel, per-variant pools) with a fake game creator. Postgres-backed `inttest`
-  covers the friend request/accept lifecycle with the block/toggle guards, the
-  per-user block (and its severing of friendships), chat post/list with the IP,
+  cancel, per-variant pools, plus the Stage 5 **robot substitution** reaper and
+  `Poll` delivery) with fake game-creator and robot-provider seams. Postgres-backed
+  `inttest` covers the friend request/accept lifecycle with the block/toggle guards,
+  the per-user block (and its severing of friendships), chat post/list with the IP,
  content and block-visibility rules, the nudge turn/rate-limit rules, the
  invitation flow (all-accept starts the game, decline cancels, lazy expiry,
  inviter-only cancel), and the email confirm-code flow (request/confirm, taken
  email, expiry and attempt-cap) with a fixture mailer.
+- **Robot** *(Stage 5+)* — `backend/internal/robot` unit-tests the pure strategy:
+  the ≈ 40% play-to-win split over many seeds, the right-skewed move-delay
+  (bounds, ~10-min median, determinism), the margin selection (win/lose, in-band
+  and out-of-band fallbacks, no-play exchange/pass), the sleep window with drift
+  and the midnight wrap, and mix restart-stability. Postgres-backed `inttest`
+  drives a robot through a full auto-match to a natural end (asserting a robot
+  statistics row), the matchmaker substitution end-to-end (enqueue → reap →
+  `[human, robot]`, discoverable via `Poll`), and a proactive 12-hour nudge.

 ## Principles