Stage 5: robot opponent (pool, seed-derived strategy, move driver, matchmaker substitution)
Tests · Go / test (push) Successful in 6s
Tests · Integration / integration (push) Successful in 10s
Tests · Go / test (pull_request) Successful in 5s
Tests · Integration / integration (pull_request) Successful in 10s

- internal/robot: durable kind='robot' account pool (migration 00004); every
  per-game and per-turn choice derived deterministically from the game seed
  (restart-stable FNV mix); a background move driver; margin targeting (band
  1-30, closest-to-band); right-skewed [2,90]min delays (median ~10m);
  opponent-anchored sleep with +/-3h drift; daytime nudge reply + proactive
  12h nudge; friend/chat blocked via profile toggles.
- engine.Candidates (decoded ranked plays); game.Candidates + RobotTurns;
  social.LastNudgeAt.
- matchmaker: 10s wait then robot substitution (reaper) + Poll delivery seam.
- config (BACKEND_ROBOT_DRIVE_INTERVAL, BACKEND_LOBBY_ROBOT_WAIT,
  BACKEND_LOBBY_REAPER_INTERVAL); main wiring + boot-time pool provisioning.
- metrics: robot account_stats (authoritative balance) + robot_games_finished_total
  OTel counter + per-finish log.
- docs: PLAN, ARCHITECTURE, FUNCTIONAL(+ru), TESTING, README; account.go comment.
- tests: robot strategy units, matchmaker reaper/Poll, engine.Candidates; inttest
  robot full-game / substitution / proactive-nudge.
This commit is contained in:
Ilia Denisov
2026-06-02 21:02:20 +02:00
parent 12fc6e498e
commit 85baabe4ba
26 changed files with 1700 additions and 85 deletions
+42 -18
View File
@@ -87,7 +87,8 @@ arrive from a platform rather than completing a mandatory registration).
a platform auto-provisions a durable account bound to that platform identity.
Concretely, platform and email identities share one `identities` table keyed by
a unique `(kind, external_id)`; email is an identity with `kind=email` and a
`confirmed` flag. The **email confirm-code flow** (Stage 4) binds an email to the
`confirmed` flag. A synthetic `kind='robot'` identity (Stage 5) backs each pooled
robot opponent (§7). The **email confirm-code flow** (Stage 4) binds an email to the
authenticated account: a 6-digit code (stored only as a SHA-256 hash, 15-minute
TTL, ≤ 5 attempts) is sent through a `Mailer` seam (an SMTP relay, or a
development log mailer when none is configured) and, once verified, attaches a
@@ -191,20 +192,37 @@ Key points:
## 7. Robot opponent
Substitutes for a human in 2-player auto-match when the pool yields no human
within 10 seconds. Designed to be indistinguishable from a person.
within 10 seconds (§8). It lives in `internal/robot` and plays as an ordinary
seated account through the game service, so only `internal/engine` imports the
solver. It is designed to be indistinguishable from a person.
The robot keeps **no per-game state**: every choice is derived deterministically
from the game's bag `seed` (a restart-stable FNV-1a mix), so a background driver
(`robot.Service.Run`, mirroring the turn-timeout sweeper) recomputes the same
behaviour on every scan and after a restart — the same philosophy as journal
replay. A pool of durable accounts — each a `kind='robot'` identity (§4),
provisioned at startup with chat and friend requests blocked — backs the
human-like name pool; those two profile toggles are all the friend/DM blocking
requires (there is no DM surface; chat is per-game).
- **Balance**: at game start it decides once whether to play to win, with
`P(play-to-win) ≈ 0.40` (so the human wins ≈ 60%). Adaptive difficulty is
post-MVP.
- **Margin targeting**: each turn it picks from `GenerateMoves` a move that
keeps the resulting lead (when playing to win) or deficit (when playing to
lose) small (≈ 120 points), rather than always the maximum.
`P(play-to-win) ≈ 0.40` (so the human wins ≈ 60%), derived from the seed.
Adaptive difficulty is post-MVP.
- **Margin targeting**: each turn it picks from the ranked candidates
(`engine.Candidates`) the move whose resulting lead (playing to win) or deficit
(playing to lose) is closest to a small band (**130 points**), rather than
always the maximum; with no legal play it exchanges a full rack when the bag can
refill it, else passes.
- **Timing**: per-move delay sampled from a right-skewed distribution (short
delays frequent), clamped to **[2, 90] minutes**; **sleeps 00:0007:00** in
the opponent's profile timezone (fallback UTC); on a daytime nudge after 60
minutes idle it replies within **210 minutes**; it proactively nudges the
human after 12 hours idle.
- Blocks friend requests and direct messages; uses a human-like name pool.
delays frequent, median ≈ 10 min), clamped to **[2, 90] minutes**; it
**sleeps 00:0007:00** anchored to the **opponent's** profile timezone with a
per-game drift of **±3 h** (fallback UTC), so its night overlaps the human's
rather than running anti-phase; on a daytime nudge it replies within
**210 minutes**; it proactively nudges the human after **12 hours** idle
(subject to the once-per-hour chat limit).
- **Observability**: robot accounts accrue ordinary statistics (§9) — the
authoritative balance metric (target ≈ 40% robot wins) — and a
`robot_games_finished_total` OTel counter plus a per-finish log give a live view.
## 8. Lobby & social
@@ -212,8 +230,10 @@ within 10 seconds. Designed to be indistinguishable from a person.
fixes the board language), pairing the next two humans into a two-player
auto-match with the seat order randomised for first-move fairness. The pool is
lost on restart (players re-queue) and is anonymous, so it does not consult
blocks. The 10 s wait and the **robot substitution** for a missing human are
added in Stage 5.
blocks. After **10 s** with no human a background reaper substitutes a pooled
robot (§7) and starts the game. A queued player learns of a pairing or a
substitution through the matchmaker's `Poll`, the interim delivery seam until the
live match-found notification (§10).
- **Friends**: a **request → accept** graph (one `friendships` table) — add by
friend list or internal ID now, by platform deep-link with Stage 8. Declining or
cancelling removes the pending request; blocking someone severs an existing
@@ -252,7 +272,8 @@ within 10 seconds. Designed to be indistinguishable from a person.
keys are application-generated **UUIDv7**.
- Tables: `accounts` (durable internal accounts; Stage 3 added the away-window
columns `away_start`/`away_end` and the hint wallet `hint_balance`),
`identities` (platform/email identities, unique `(kind, external_id)`),
`identities` (platform/email/robot identities, unique `(kind, external_id)`;
Stage 5's migration `00004` admits the `robot` kind),
`sessions` (revoke-only opaque-token hashes), the Stage 3 game tables
`games` (Stage 4 added the `dropout_tiles` disposition column), `game_players`,
`game_moves` (the move journal), `complaints` and `account_stats`, and the
@@ -301,9 +322,12 @@ does not cover.
Two channels: **platform-native push** (out-of-app, via the platform
side-service — your-turn, nudge) and the **in-app live stream** (chat,
opponent-moved, while the app is open). Backend emits notification intents;
delivery fans out to the appropriate channel. Stage 4 **persists** the
notification-worthy events (chat messages and nudges) but does not yet deliver
them: the gRPC stream to the gateway and the platform push arrive in Stage 6 / 8.
delivery fans out to the appropriate channel. A **match-found** event (a human
pairing or a robot substitution in auto-match, §8) belongs to the same fabric.
Stage 4 **persists** the notification-worthy events (chat messages and nudges) but
does not yet deliver them, and Stage 5's match-found has no live channel yet: the
gRPC stream to the gateway and the platform push arrive in Stage 6 / 8. Until then
a waiting client retrieves its started game by polling the matchmaker (`Poll`).
## 11. Observability