R7: trip report + docs/tracker bake-back; mark R7 done
CI / changes (pull_request) Successful in 1s
CI / unit (pull_request) Successful in 9s
CI / integration (pull_request) Successful in 12s
CI / ui (pull_request) Successful in 37s
CI / gate (pull_request) Successful in 0s
CI / deploy (pull_request) Successful in 58s
- loadtest/REPORT-R7.md: the final stress-run report, covering the method, the 500-player resource profile, the agreed tuning, the validation (transport_error 2.49% -> 0.72% at 3 gateway cores; the burst run showing connection-bound behavior), and the prod-sizing recommendation for Stage 18.
- loadtest/README.md: per-player transports, --cpus capping, docker_stats (was cAdvisor), the absolute BACKEND_DICT_DIR for ./loadtest/..., and report links.
- docs/TESTING.md + docs/ARCHITECTURE.md: observability now uses the otelcol docker_stats receiver (cAdvisor removed); links to both trip reports.
- CLAUDE.md: repo-layout line reflects docker_stats + per-service limits.
- PRERELEASE.md: R7 marked done in the tracker + heading; a Refinements entry recording the decisions, findings, applied tuning and validation.

This is the final pre-release hardening phase; Stage 18 (prod cutover) is next.
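The README change gives each virtual player its own `edge.Client` instead of multiplexing every player over one transport. A minimal stdlib-only sketch of why that matters: a fresh `*http.Transport` per player means a private connection pool, so the server sees one connection per player rather than one for everyone. Assumptions: `newPlayerClient` and `distinctConns` are hypothetical names, and the sketch uses plain HTTP/1.1 keep-alive rather than the harness's real h2c client.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// newPlayerClient is a hypothetical stand-in for building one edge.Client per
// virtual player: a fresh *http.Transport owns its own connection pool.
// (The real harness speaks h2c; HTTP/1.1 keeps this sketch stdlib-only.)
func newPlayerClient() *http.Client {
	return &http.Client{Transport: &http.Transport{}}
}

// distinctConns runs `players` virtual players, each issuing `reqs` sequential
// requests, and returns how many distinct client connections the server saw.
func distinctConns(perPlayer bool, players, reqs int) int {
	var mu sync.Mutex
	seen := map[string]bool{}
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		seen[r.RemoteAddr] = true // one entry per client TCP connection
		mu.Unlock()
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()

	shared := newPlayerClient() // used by every player when perPlayer is false
	var clients []*http.Client
	for p := 0; p < players; p++ {
		c := shared
		if perPlayer {
			c = newPlayerClient()
		}
		clients = append(clients, c)
		for i := 0; i < reqs; i++ {
			resp, err := c.Get(srv.URL)
			if err != nil {
				panic(err)
			}
			io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
			resp.Body.Close()
		}
	}
	for _, c := range clients {
		c.CloseIdleConnections()
	}
	mu.Lock()
	defer mu.Unlock()
	return len(seen)
}

func main() {
	fmt.Println("shared transport:", distinctConns(false, 4, 3))
	fmt.Println("per-player:", distinctConns(true, 4, 3))
}
```

With a shared transport all four players reuse a single keep-alive connection; with per-player transports the server sees four. Over h2c the shared case would instead multiplex every player's streams onto one connection, which is the pattern the README moves away from.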
+25 -11
@@ -36,17 +36,21 @@ container on the contour's docker network (this bypasses the host→gateway hair
 # from the repo root
 docker build -f loadtest/Dockerfile -t scrabble-loadtest .

-docker run --rm --name scrabble-loadtest --network scrabble-internal \
+docker run --rm --cpus=3 --name scrabble-loadtest --network scrabble-internal \
 -e POSTGRES_PASSWORD="$TEST_POSTGRES_PASSWORD" \
 scrabble-loadtest run
 ```

-Defaults assume the contour service names: `postgres:5432` and `gateway:8081`. The
-DAWGs are baked into the image (`/opt/dawg`, pinned to the dictionary release). Run with
+Each virtual player gets its own `edge.Client` (its own h2c connection), mirroring real
+clients rather than multiplexing every player over one transport. Defaults assume the
+contour service names: `postgres:5432` and `gateway:8081`. The DAWGs are baked into the
+image (`/opt/dawg`, pinned to the dictionary release). On a host shared with the contour,
+cap the harness (`--cpus=3`) so the contour keeps the spare cores. Run with
 `--name scrabble-loadtest` so the harness's own CPU/memory show up as a `scrabble-*`
-series in cAdvisor (keeping it separable from the system under test). Capture the
-resource baseline from the Grafana **Scrabble — Resources** dashboard
-(cAdvisor + postgres_exporter) while the run is in progress.
+series in the metrics (keeping it separable from the system under test). Capture the
+resource baseline from the Grafana **Scrabble — Resources** dashboard (the otelcol
+`docker_stats` receiver + postgres_exporter), or from `docker stats` directly, while the
+run is in progress.

 ## Commands & flags

@@ -80,15 +84,25 @@ DB wipe (`DROP SCHEMA backend CASCADE` + backend restart).
 ```sh
 go build ./loadtest/...
 go vet ./loadtest/...
-BACKEND_DICT_DIR=../scrabble-solver/dawg go test -count=1 ./loadtest/...
+BACKEND_DICT_DIR="$PWD/../scrabble-solver/dawg" go test -count=1 ./loadtest/...
 ```

 The DAWG-backed `moves` test runs only when `BACKEND_DICT_DIR` is set (as the engine
 tests use); the pure logic (hashing, board replay, rack build, move selection, report)
-runs unconditionally.
+runs unconditionally. Use an **absolute** path (here via `$PWD`): `go test ./loadtest/...`
+runs each package from its own directory, so a relative `BACKEND_DICT_DIR` would not
+resolve.

+## Trip reports
+
+The two stress passes are written up in the repo: the early pass in
+[`REPORT-R2.md`](REPORT-R2.md) and the final, tuned pass in
+[`REPORT-R7.md`](REPORT-R7.md).
+
 ## Caveat

-The harness shares the host CPU with the contour, so the early-pass resource baseline
-is read with the harness's own container series in mind; a cleaner number on separate
-hardware is future work. The moderate ramp keeps the generator from being the bottleneck.
+The harness shares the host CPU with the contour, so its own `scrabble-loadtest`
+container series is read alongside the system under test; capping it with `--cpus`
+keeps the contour's quota. Per-player transports (R7) removed the shared-transport
+artifact that inflated R2's `transport_error`, so the figures reflect the system. A
+fully isolated ceiling on separate hardware remains future work.
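The switch to `"$PWD/../scrabble-solver/dawg"` follows from `go test` running each package's test binary with its working directory set to that package's source directory. A small sketch of the resolution, assuming a POSIX layout with the repo at `/src/scrabble` and the solver checked out beside it (both paths are illustrative). `resolveFrom` and `checkDictDir` are hypothetical helpers, not part of the harness; `checkDictDir` shows one way a test could skip when the variable is unset and reject relative values.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// resolveFrom mimics how a test binary interprets a path: relative paths are
// joined onto the working directory, which `go test` sets to the package dir.
func resolveFrom(pkgDir, p string) string {
	if filepath.IsAbs(p) {
		return filepath.Clean(p)
	}
	return filepath.Join(pkgDir, p)
}

// checkDictDir is a hypothetical guard for BACKEND_DICT_DIR: empty means
// "skip the DAWG-backed test" (ok=false, no error); relative is rejected.
func checkDictDir(v string) (dir string, ok bool, err error) {
	if v == "" {
		return "", false, nil
	}
	if !filepath.IsAbs(v) {
		return "", false, errors.New("BACKEND_DICT_DIR must be absolute, got " + v)
	}
	return filepath.Clean(v), true, nil
}

func main() {
	// Assumed layout: repo at /src/scrabble, tests run from /src/scrabble/loadtest.
	const repo = "/src/scrabble"
	const pkg = repo + "/loadtest"
	rel := "../scrabble-solver/dawg"

	fmt.Println(resolveFrom(repo, rel))                        // what the user meant
	fmt.Println(resolveFrom(pkg, rel))                         // what the test binary sees
	fmt.Println(resolveFrom(pkg, "/src/scrabble-solver/dawg")) // absolute: same everywhere
}
```

The relative value lands on `/src/scrabble/scrabble-solver/dawg` once the test binary runs from the package directory, while the absolute value resolves identically from every package.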