R2: load-test harness + contour resource observability

New scrabble/loadtest module (the pre-release stress harness): seeds 1000 guest +
10000 durable accounts with pre-created sessions directly in Postgres (token hash
matches backend/internal/session), drives virtual players through the edge protocol
(real 2-4p games assembled via invitations, mid-ranked legal moves generated locally
by the embedded scrabble-solver — the edge carries no board, so the client replays
history), plus nudge/chat/check-word/draft/profile/stats and a gateway-hammer that
verifies the rate limiter. Prints a trip-report summary (per-op latency percentiles,
result codes, live-event tally). Go unit tests cover the pure pieces; the DAWG-backed
move test runs under BACKEND_DICT_DIR.

Contour: add cAdvisor + postgres_exporter + a 'Scrabble - Resources' Grafana
dashboard and the two Prometheus scrape jobs, for the R2/R7 stress-run resource
baseline.

CI: gate ./loadtest/... (path filter + vet/build/test). Docs: TESTING, ARCHITECTURE,
project CLAUDE repo layout.
Ilia Denisov
2026-06-09 23:45:24 +02:00
parent bf3ee62711
commit aa137e3558
27 changed files with 2554 additions and 7 deletions
@@ -0,0 +1,94 @@
# loadtest — R2 stress harness
Reusable load harness for the pre-release stress pass (`PRERELEASE.md` R2/R7). It
seeds a large account population with pre-created sessions, drives virtual players
through the **gateway edge protocol** in realistic games, hammers the rate limiter,
and prints a trip-report summary. It stays in the repo for repeats.
## What it does
1. **Seed** (direct Postgres, schema `backend`): inserts `--durable` durable accounts
(each with a confirmed email identity) + `--guest` guest accounts and an active
`sessions` row per account, then hands the plaintext bearer tokens to the driver.
Token hashes match `backend/internal/session` (`hex(sha256(token))`), so the seeded
sessions resolve. Every row is tagged with the `lt:` marker for cleanup.
2. **Drive** (edge protocol over h2c): assembles real 2–4 player games via the
invitation flow (`invitation.create` → `invitation.accept`, no robots), then runs
each player's turn loop — poll `game.state`, replay `game.history`, generate a legal
**mid-ranked** move with the embedded `scrabble-solver`, and `game.submit_play`
(or pass/exchange). A fraction of turns exercise nudge / chat / check-word / draft /
profile-update / stats. Each player also holds a live `Subscribe` stream. The
moderate ramp is **50 → 200 → 500** concurrent players, ~12 min per step.
3. **Hammer**: drives `games.list` from one account far above the per-user rate limit
to verify the limiter holds (`rate_limited` results) and measure its cost.
4. **Report**: per-operation latency percentiles, throughput, result-code breakdown,
live-event tally and the aggregate error rate.
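The seeding step depends on the stored hash matching what the backend computes for a presented bearer token. A minimal sketch of that `hex(sha256(token))` derivation follows. `hashToken` and `newToken` are hypothetical names, and the 32-random-byte token format is an assumption; only the hash formula comes from the text above.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashToken returns hex(sha256(token)), the form this README says
// backend/internal/session uses for stored session tokens.
func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// newToken is a hypothetical generator: 32 random bytes, hex-encoded.
// The harness's real token format is not specified in this document.
func newToken() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	token, err := newToken()
	if err != nil {
		panic(err)
	}
	// Only the hash would be written to the sessions row; the plaintext
	// token stays in memory and is handed to the driver.
	fmt.Println("bearer token:", token)
	fmt.Println("stored hash: ", hashToken(token))
}
```

Because only the hash is persisted, a later run cannot recover earlier tokens, which is consistent with `run` re-seeding every time.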
The driver runs the solver **locally** because the edge protocol carries no board: the
client reconstructs it from decoded history (the same invariant as the UI).
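Rebuilding the board from history can be sketched as a fold over the decoded plays. The types below (`Placement`, `HistoryEntry`, `Board`) are hypothetical stand-ins for whatever the decoded `game.history` actually contains. The sketch assumes a standard 15x15 board and that passes and exchanges carry no placements.

```go
package main

import "fmt"

const boardSize = 15 // assumed standard Scrabble board

// Placement is a hypothetical decoded tile: one letter on one square.
type Placement struct {
	Row, Col int
	Letter   rune
}

// HistoryEntry is a hypothetical decoded history item. Passes and
// exchanges are modelled as entries with no placements.
type HistoryEntry struct {
	Player     int
	Placements []Placement
}

// Board holds the letter on each square; 0 means the square is empty.
type Board [boardSize][boardSize]rune

// Replay applies every entry in order and rejects tiles that fall off
// the board or land on an occupied square.
func Replay(history []HistoryEntry) (Board, error) {
	var b Board
	for i, e := range history {
		for _, p := range e.Placements {
			if p.Row < 0 || p.Row >= boardSize || p.Col < 0 || p.Col >= boardSize {
				return b, fmt.Errorf("entry %d: square (%d,%d) is off the board", i, p.Row, p.Col)
			}
			if b[p.Row][p.Col] != 0 {
				return b, fmt.Errorf("entry %d: square (%d,%d) is already occupied", i, p.Row, p.Col)
			}
			b[p.Row][p.Col] = p.Letter
		}
	}
	return b, nil
}

func main() {
	history := []HistoryEntry{
		{Player: 0, Placements: []Placement{{7, 7, 'C'}, {7, 8, 'A'}, {7, 9, 'T'}}},
		{Player: 1}, // a pass: nothing placed
		{Player: 0, Placements: []Placement{{8, 8, 'T'}}},
	}
	b, err := Replay(history)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b[7][7:10]))
}
```

Treating a conflicting placement as an error rather than overwriting makes a decode bug surface at replay time instead of as an illegal move later.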
## Connection model
The harness reaches Postgres and the gateway directly, so run it as a one-shot
container on the contour's docker network (this bypasses the host→gateway hairpin):
```sh
# from the repo root
docker build -f loadtest/Dockerfile -t scrabble-loadtest .
docker run --rm --name scrabble-loadtest --network scrabble-internal \
-e POSTGRES_PASSWORD="$TEST_POSTGRES_PASSWORD" \
scrabble-loadtest run
```
Defaults assume the contour service names: `postgres:5432` and `gateway:8081`. The
DAWGs are baked into the image (`/opt/dawg`, pinned to the dictionary release). Run with
`--name scrabble-loadtest` so the harness's own CPU/memory show up as a `scrabble-*`
series in cAdvisor (keeping it separable from the system under test). Capture the
resource baseline from the Grafana **Scrabble — Resources** dashboard
(cAdvisor + postgres_exporter) while the run is in progress.
## Commands & flags
```
loadtest run [flags] seed, drive the ramp + hammer, print the report
loadtest cleanup [flags] delete everything the harness seeded (matched by the lt: marker)
```
Key `run` flags (env in parentheses):
| flag | default | meaning |
|------|---------|---------|
| `--gateway` (`LOADTEST_GATEWAY_URL`) | `http://gateway:8081` | gateway base URL |
| `--dsn` (`LOADTEST_DSN`) | from `POSTGRES_*` | backend Postgres DSN (schema `backend`) |
| `--dawg` (`LOADTEST_DAWG_DIR`) | `/dawg` (image: `/opt/dawg`) | committed `*.dawg` directory |
| `--durable` / `--guest` | `10000` / `1000` | accounts to seed |
| `--steps` | `50,200,500` | concurrent-player ramp steps |
| `--step-dur` | `12m` | hold time per step |
| `--games-per-player` | `0` (random 3–5) | target concurrent games per player |
| `--tick` | `800ms` | per-player op cadence (keeps a player under the per-user limit) |
| `--secondary-prob` | `0.08` | chance per tick of a non-move op |
| `--hammer-workers` / `--hammer-dur` | `20` / `15s` | gateway-hammer (0 workers disables) |
| `--reset` / `--cleanup` | `false` | delete harness rows before / after the run |
`run` re-seeds every time (plaintext tokens are never stored), so pass `--reset` to
clear a prior run's rows first. The authoritative hard reset of the contour remains the
DB wipe (`DROP SCHEMA backend CASCADE` + backend restart).
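To make the ramp defaults concrete, the sketch below parses a `--steps` value and estimates the length of the ramp and a rough per-step request ceiling. `parseSteps` is a hypothetical helper, not the harness's actual flag code. The ceiling assumes each player issues at most one operation per tick, and it ignores the hammer and the live streams.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseSteps turns a comma-separated list such as "50,200,500" into
// player counts, rejecting anything that is not a positive integer.
func parseSteps(s string) ([]int, error) {
	parts := strings.Split(s, ",")
	steps := make([]int, 0, len(parts))
	for _, p := range parts {
		n, err := strconv.Atoi(strings.TrimSpace(p))
		if err != nil {
			return nil, fmt.Errorf("bad step %q: %w", p, err)
		}
		if n <= 0 {
			return nil, fmt.Errorf("step %d must be positive", n)
		}
		steps = append(steps, n)
	}
	return steps, nil
}

func main() {
	steps, err := parseSteps("50,200,500") // --steps default
	if err != nil {
		panic(err)
	}
	stepDur := 12 * time.Minute    // --step-dur default
	tick := 800 * time.Millisecond // --tick default

	fmt.Println("total ramp time:", time.Duration(len(steps))*stepDur)
	for _, n := range steps {
		// Assumed upper bound: every player acts on every tick.
		fmt.Printf("%d players: at most %.1f ops/s\n", n, float64(n)/tick.Seconds())
	}
}
```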
## Build & test
```sh
go build ./loadtest/...
go vet ./loadtest/...
BACKEND_DICT_DIR=../scrabble-solver/dawg go test -count=1 ./loadtest/...
```
The DAWG-backed `moves` test runs only when `BACKEND_DICT_DIR` is set (the same
convention the engine tests use); the pure logic (hashing, board replay, rack build,
move selection, report) runs unconditionally.
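Move selection is one of the pieces that can be tested without a dictionary. This document does not define exactly how "mid-ranked" is chosen, so the sketch below assumes the median candidate by score. `Move` and `pickMidRanked` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Move is a hypothetical solver candidate: a word and its score.
type Move struct {
	Word  string
	Score int
}

// pickMidRanked ranks the candidates by descending score and returns the
// one in the middle of the ranking. It reports false when there are no
// candidates, in which case a caller would pass or exchange instead.
func pickMidRanked(moves []Move) (Move, bool) {
	if len(moves) == 0 {
		return Move{}, false
	}
	// Sort a copy so the caller's slice keeps its original order.
	ranked := append([]Move(nil), moves...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Score > ranked[j].Score
	})
	return ranked[len(ranked)/2], true
}

func main() {
	candidates := []Move{
		{Word: "CAT", Score: 10},
		{Word: "QUARTZ", Score: 30},
		{Word: "CRATE", Score: 20},
	}
	if m, ok := pickMidRanked(candidates); ok {
		fmt.Println(m.Word, m.Score)
	}
}
```

Picking from the middle of the ranking rather than the top keeps virtual players from ending games unusually fast, so the load stays closer to ordinary play.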
## Caveat
The harness shares the host CPU with the contour, so read the early-pass resource
baseline with the harness's own container series in mind; a cleaner number on separate
hardware is an R7 goal. The moderate ramp keeps the load generator from becoming the
bottleneck.