Files
scrabble-game/loadtest/README.md
T
Ilia Denisov 2a48df9b83
CI / changes (pull_request) Successful in 1s
CI / unit (pull_request) Successful in 9s
CI / integration (pull_request) Successful in 12s
CI / ui (pull_request) Successful in 37s
CI / gate (pull_request) Successful in 0s
CI / deploy (pull_request) Successful in 58s
R7: trip report + docs/tracker bake-back; mark R7 done
- loadtest/REPORT-R7.md: the final stress-run report — method, the 500-player resource
  profile, the agreed tuning, the validation (transport_error 2.49% -> 0.72% at 3 gateway
  cores; the burst run showing connection-bound behavior), and the prod-sizing
  recommendation for Stage 18.
- loadtest/README.md: per-player transports, --cpus capping, docker_stats (was cAdvisor),
  the absolute BACKEND_DICT_DIR for ./loadtest/... , and report links.
- docs/TESTING.md + docs/ARCHITECTURE.md: observability now uses the otelcol docker_stats
  receiver (cAdvisor removed); links to both trip reports.
- CLAUDE.md: repo-layout line reflects docker_stats + per-service limits.
- PRERELEASE.md: R7 marked done in the tracker + heading; a Refinements entry recording
  the decisions, findings, applied tuning and validation.

This is the final pre-release hardening phase; Stage 18 (prod cutover) is next.
2026-06-11 11:18:57 +02:00

109 lines
5.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# loadtest — stress harness
Reusable load harness for the pre-release stress pass. It
seeds a large account population with pre-created sessions, drives virtual players
through the **gateway edge protocol** in realistic games, hammers the rate limiter,
and prints a trip-report summary. It stays in the repo for repeats.
## What it does
1. **Seed** (direct Postgres, schema `backend`): inserts `--durable` durable accounts
(each with a confirmed email identity) + `--guest` guest accounts and an active
`sessions` row per account, then hands the plaintext bearer tokens to the driver.
Token hashes match `backend/internal/session` (`hex(sha256(token))`), so the seeded
sessions resolve. Every row carries a distinctive display-name marker for cleanup.
2. **Drive** (edge protocol over h2c): assembles real 24 player games via the
invitation flow (`invitation.create``invitation.accept`, no robots), then runs
each player's turn loop — poll `game.state`, replay `game.history`, generate a legal
**mid-ranked** move with the embedded `scrabble-solver`, and `game.submit_play`
(or pass/exchange). A fraction of turns exercise nudge / chat / check-word / draft /
profile-update / stats. Each player also holds a live `Subscribe` stream. The
moderate ramp is **50 → 200 → 500** concurrent players, ~12 min per step.
3. **Hammer**: drives `games.list` from one account far above the per-user rate limit
to verify the limiter holds (`rate_limited` results) and measure its cost.
4. **Report**: per-operation latency percentiles, throughput, result-code breakdown,
live-event tally and the aggregate error rate.
The driver runs the solver **locally** because the edge protocol carries no board: the
client reconstructs it from decoded history (the same invariant as the UI).
## Connection model
The harness reaches Postgres and the gateway directly, so run it as a one-shot
container on the contour's docker network (this bypasses the host→gateway hairpin):
```sh
# from the repo root
docker build -f loadtest/Dockerfile -t scrabble-loadtest .
docker run --rm --cpus=3 --name scrabble-loadtest --network scrabble-internal \
-e POSTGRES_PASSWORD="$TEST_POSTGRES_PASSWORD" \
scrabble-loadtest run
```
Each virtual player gets its own `edge.Client` (its own h2c connection), mirroring real
clients rather than multiplexing every player over one transport. Defaults assume the
contour service names: `postgres:5432` and `gateway:8081`. The DAWGs are baked into the
image (`/opt/dawg`, pinned to the dictionary release). On a host shared with the contour,
cap the harness (`--cpus=3`) so the contour keeps the spare cores. Run with
`--name scrabble-loadtest` so the harness's own CPU/memory show up as a `scrabble-*`
series in the metrics (keeping it separable from the system under test). Capture the
resource baseline from the Grafana **Scrabble — Resources** dashboard (the otelcol
`docker_stats` receiver + postgres_exporter), or from `docker stats` directly, while the
run is in progress.
## Commands & flags
```
loadtest run [flags] seed, drive the ramp + hammer, print the report
loadtest cleanup [flags] delete everything the harness seeded (matched by the display-name marker)
```
Key `run` flags (env in parentheses):
| flag | default | meaning |
|------|---------|---------|
| `--gateway` (`LOADTEST_GATEWAY_URL`) | `http://gateway:8081` | gateway base URL |
| `--dsn` (`LOADTEST_DSN`) | from `POSTGRES_*` | backend Postgres DSN (schema `backend`) |
| `--dawg` (`LOADTEST_DAWG_DIR`) | `/dawg` (image: `/opt/dawg`) | committed `*.dawg` directory |
| `--durable` / `--guest` | `10000` / `1000` | accounts to seed |
| `--steps` | `50,200,500` | concurrent-player ramp steps |
| `--step-dur` | `12m` | hold time per step |
| `--games-per-player` | `0` (random 35) | target concurrent games per player |
| `--tick` | `800ms` | per-player op cadence (keeps a player under the per-user limit) |
| `--secondary-prob` | `0.08` | chance per tick of a non-move op |
| `--hammer-workers` / `--hammer-dur` | `20` / `15s` | gateway-hammer (0 workers disables) |
| `--reset` / `--cleanup` | `false` | delete harness rows before / after the run |
`run` re-seeds every time (plaintext tokens are never stored), so pass `--reset` to
clear a prior run's rows first. The authoritative hard reset of the contour remains the
DB wipe (`DROP SCHEMA backend CASCADE` + backend restart).
## Build & test
```sh
go build ./loadtest/...
go vet ./loadtest/...
BACKEND_DICT_DIR="$PWD/../scrabble-solver/dawg" go test -count=1 ./loadtest/...
```
The DAWG-backed `moves` test runs only when `BACKEND_DICT_DIR` is set (as the engine
tests use); the pure logic (hashing, board replay, rack build, move selection, report)
runs unconditionally. Use an **absolute** path (here via `$PWD`): `go test ./loadtest/...`
runs each package from its own directory, so a relative `BACKEND_DICT_DIR` would not
resolve.
## Trip reports
The two stress passes are written up in the repo: the early pass in
[`REPORT-R2.md`](REPORT-R2.md) and the final, tuned pass in
[`REPORT-R7.md`](REPORT-R7.md).
## Caveat
The harness shares the host CPU with the contour, so its own `scrabble-loadtest`
container series is read alongside the system under test; capping it with `--cpus`
keeps the contour's quota. Per-player transports (R7) removed the shared-transport
artifact that inflated R2's `transport_error`, so the figures reflect the system. A
fully isolated ceiling on separate hardware remains future work.