Ilia Denisov aa137e3558
R2: load-test harness + contour resource observability
New scrabble/loadtest module (the pre-release stress harness): seeds 1000 guest +
10000 durable accounts with pre-created sessions directly in Postgres (token hash
matches backend/internal/session), drives virtual players through the edge protocol
(real 2-4p games assembled via invitations, mid-ranked legal moves generated locally
by the embedded scrabble-solver — the edge carries no board, so the client replays
history), plus nudge/chat/check-word/draft/profile/stats and a gateway-hammer that
verifies the rate limiter. Prints a trip-report summary (per-op latency percentiles,
result codes, live-event tally). Go unit tests cover the pure pieces; the DAWG-backed
move test runs under BACKEND_DICT_DIR.

Contour: add cAdvisor + postgres_exporter + a 'Scrabble - Resources' Grafana
dashboard and the two Prometheus scrape jobs, for the R2/R7 stress-run resource
baseline.

CI: gate ./loadtest/... (path filter + vet/build/test). Docs: TESTING, ARCHITECTURE,
project CLAUDE repo layout.
2026-06-09 23:45:24 +02:00

# loadtest — R2 stress harness
Reusable load harness for the pre-release stress pass (`PRERELEASE.md` R2/R7). It
seeds a large account population with pre-created sessions, drives virtual players
through the **gateway edge protocol** in realistic games, hammers the rate limiter,
and prints a trip-report summary. It stays in the repo so the pass can be repeated.
## What it does
1. **Seed** (direct Postgres, schema `backend`): inserts `--durable` durable accounts
(each with a confirmed email identity) + `--guest` guest accounts and an active
`sessions` row per account, then hands the plaintext bearer tokens to the driver.
Token hashes match `backend/internal/session` (`hex(sha256(token))`), so the seeded
sessions resolve. Every row is tagged with the `lt:` marker for cleanup.
2. **Drive** (edge protocol over h2c): assembles real 2–4 player games via the
invitation flow (`invitation.create` → `invitation.accept`, no robots), then runs
each player's turn loop — poll `game.state`, replay `game.history`, generate a legal
**mid-ranked** move with the embedded `scrabble-solver`, and `game.submit_play`
(or pass/exchange). A fraction of turns exercise nudge / chat / check-word / draft /
profile-update / stats. Each player also holds a live `Subscribe` stream. The
moderate ramp is **50 → 200 → 500** concurrent players, ~12 min per step.
3. **Hammer**: drives `games.list` from one account far above the per-user rate limit
to verify the limiter holds (`rate_limited` results) and measure its cost.
4. **Report**: per-operation latency percentiles, throughput, result-code breakdown,
live-event tally and the aggregate error rate.
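
The seeding step's token handling can be sketched as follows. Only the `hex(sha256(token))` form comes from the text above; the token format (32 random bytes, hex-encoded) and the names `newToken` and `hashToken` are assumptions made for illustration, not the harness's actual code.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newToken returns a random plaintext bearer token.
// Assumption: 32 random bytes, hex-encoded; the real format may differ.
func newToken() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// hashToken computes hex(sha256(token)), the form the seeded sessions
// row is described as storing.
func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

func main() {
	token, err := newToken()
	if err != nil {
		panic(err)
	}
	// The plaintext goes to the driver; only the hash is written to Postgres.
	fmt.Println("plaintext:", token)
	fmt.Println("hash:     ", hashToken(token))
}
```

Because only the hash is persisted, a later run cannot recover the plaintext tokens, which is consistent with `run` re-seeding every time.
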
The driver runs the solver **locally** because the edge protocol carries no board: the
client reconstructs it from decoded history (the same invariant as the UI).
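
One way to read "mid-ranked" is to take the median candidate by score. The sketch below does that under stated assumptions: the `Move` type and `pickMidRanked` are hypothetical, and the solver's real candidate type and the harness's actual selection rule may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Move is a hypothetical stand-in for a solver candidate.
type Move struct {
	Word  string
	Score int
}

// pickMidRanked ranks candidates by descending score and returns the one
// in the middle of the ranking. It reports false when there are no
// candidates, in which case the caller would pass or exchange.
func pickMidRanked(moves []Move) (Move, bool) {
	if len(moves) == 0 {
		return Move{}, false
	}
	// Copy so the caller's slice order is left untouched.
	ranked := make([]Move, len(moves))
	copy(ranked, moves)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Score > ranked[j].Score
	})
	return ranked[len(ranked)/2], true
}

func main() {
	candidates := []Move{{"QUIZ", 30}, {"AT", 10}, {"ZONE", 20}}
	if m, ok := pickMidRanked(candidates); ok {
		fmt.Printf("play %s for %d\n", m.Word, m.Score)
	}
}
```

Picking from the middle rather than the top keeps games at a realistic length and score spread instead of having every virtual player play like an optimal engine.
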
## Connection model
The harness reaches Postgres and the gateway directly, so run it as a one-shot
container on the contour's docker network (this bypasses the host→gateway hairpin):
```sh
# from the repo root
docker build -f loadtest/Dockerfile -t scrabble-loadtest .
docker run --rm --name scrabble-loadtest --network scrabble-internal \
  -e POSTGRES_PASSWORD="$TEST_POSTGRES_PASSWORD" \
  scrabble-loadtest run
```
Defaults assume the contour service names: `postgres:5432` and `gateway:8081`. The
DAWGs are baked into the image (`/opt/dawg`, pinned to the dictionary release). Run with
`--name scrabble-loadtest` so the harness's own CPU/memory show up as a `scrabble-*`
series in cAdvisor (keeping it separable from the system under test). Capture the
resource baseline from the Grafana **Scrabble — Resources** dashboard
(cAdvisor + postgres_exporter) while the run is in progress.
## Commands & flags
```
loadtest run [flags]       seed, drive the ramp + hammer, print the report
loadtest cleanup [flags]   delete everything the harness seeded (matched by the lt: marker)
```
Key `run` flags (env in parentheses):
| flag | default | meaning |
|------|---------|---------|
| `--gateway` (`LOADTEST_GATEWAY_URL`) | `http://gateway:8081` | gateway base URL |
| `--dsn` (`LOADTEST_DSN`) | from `POSTGRES_*` | backend Postgres DSN (schema `backend`) |
| `--dawg` (`LOADTEST_DAWG_DIR`) | `/dawg` (image: `/opt/dawg`) | committed `*.dawg` directory |
| `--durable` / `--guest` | `10000` / `1000` | accounts to seed |
| `--steps` | `50,200,500` | concurrent-player ramp steps |
| `--step-dur` | `12m` | hold time per step |
| `--games-per-player` | `0` (random 3–5) | target concurrent games per player |
| `--tick` | `800ms` | per-player op cadence (keeps a player under the per-user limit) |
| `--secondary-prob` | `0.08` | chance per tick of a non-move op |
| `--hammer-workers` / `--hammer-dur` | `20` / `15s` | gateway-hammer (0 workers disables) |
| `--reset` / `--cleanup` | `false` | delete harness rows before / after the run |
`run` re-seeds every time (plaintext tokens are never stored), so pass `--reset` to
clear a prior run's rows first. The authoritative hard reset of the contour remains the
DB wipe (`DROP SCHEMA backend CASCADE` + backend restart).
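
To get a feel for what the default flags imply, the sketch below parses a `--steps` value and derives the total ramp time and a rough peak request rate. The helper names are hypothetical, and the rate assumes every player issues exactly one op per tick, which is an upper-bound simplification rather than a measured figure.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseSteps turns a comma-separated list such as "50,200,500" into ints.
func parseSteps(s string) ([]int, error) {
	parts := strings.Split(s, ",")
	steps := make([]int, 0, len(parts))
	for _, p := range parts {
		n, err := strconv.Atoi(strings.TrimSpace(p))
		if err != nil {
			return nil, fmt.Errorf("bad step %q: %w", p, err)
		}
		if n <= 0 {
			return nil, fmt.Errorf("step must be positive, got %d", n)
		}
		steps = append(steps, n)
	}
	return steps, nil
}

// rampDuration is the hold time summed over all steps (hammer excluded).
func rampDuration(steps []int, stepDur time.Duration) time.Duration {
	return time.Duration(len(steps)) * stepDur
}

// peakOpsPerSecond assumes one op per player per tick at the largest step.
func peakOpsPerSecond(steps []int, tick time.Duration) float64 {
	peak := 0
	for _, n := range steps {
		if n > peak {
			peak = n
		}
	}
	return float64(peak) / tick.Seconds()
}

func main() {
	steps, err := parseSteps("50,200,500")
	if err != nil {
		panic(err)
	}
	fmt.Println("ramp duration:", rampDuration(steps, 12*time.Minute))
	fmt.Printf("peak rate: ~%.0f ops/s\n", peakOpsPerSecond(steps, 800*time.Millisecond))
}
```

With the defaults this works out to 3 × 12 min = 36 min of ramp, and at 500 players with an 800 ms tick about 500 / 0.8 = 625 ops/s in aggregate, while each individual player stays at 1.25 ops/s.
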
## Build & test
```sh
go build ./loadtest/...
go vet ./loadtest/...
BACKEND_DICT_DIR=../scrabble-solver/dawg go test -count=1 ./loadtest/...
```
The DAWG-backed `moves` test runs only when `BACKEND_DICT_DIR` is set (the same
variable the engine tests use); the tests for the pure logic (hashing, board replay,
rack build, move selection, report) run unconditionally.
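
The environment gate can be sketched as below. Only the variable name `BACKEND_DICT_DIR` comes from this README; the helper `dawgDir` is hypothetical, and the lookup function is injected purely so the decision can be checked without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// dictDirEnv is the variable this README names for the dictionary directory.
const dictDirEnv = "BACKEND_DICT_DIR"

// dawgDir returns the dictionary directory and whether DAWG-backed tests
// should run. An empty or unset variable means they are skipped.
func dawgDir(getenv func(string) string) (string, bool) {
	dir := getenv(dictDirEnv)
	return dir, dir != ""
}

func main() {
	if dir, ok := dawgDir(os.Getenv); ok {
		fmt.Println("DAWG-backed tests would run against", dir)
	} else {
		fmt.Println("DAWG-backed tests would be skipped:", dictDirEnv, "is unset")
	}
}
```

Inside a test function the same check would typically end in `t.Skip(...)` when the second return value is false, so a plain `go test` without dictionaries still passes.
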
## Caveat
The harness shares the host CPU with the contour, so read the early-pass resource
baseline with the harness's own container series in mind; a cleaner number on separate
hardware is an R7 goal. The moderate ramp keeps the generator from being the bottleneck.