R2: stress harness + contour resource observability + early run #33

Merged
developer merged 4 commits from feature/r2-loadtest-observability into development 2026-06-09 23:01:30 +00:00

4 Commits

Author SHA1 Message Date
Ilia Denisov a2265a122e R2: early-pass trip report + mark R2 done
CI / changes (pull_request) Successful in 1s
CI / unit (pull_request) Successful in 9s
CI / integration (pull_request) Successful in 13s
CI / ui (pull_request) Successful in 37s
CI / gate (pull_request) Successful in 0s
CI / deploy (pull_request) Successful in 57s
Ran the moderate early pass (50/200/500, 10 min/step) against the contour: ramped
clean to 500 players, 1.2 M edge calls, 48 870 plays, 2 798 games finished, no
crash/deadlock; cleanup removed all 11 000 seeded accounts. The per-user limiter held
under the gateway-hammer (99.97 % rejected, p99 2 ms).

Top finding: ~14 % transport_error on game.state at 500 players under CPU saturation
(backend/gateway/Postgres each ~1 core), amplified by the harness's single shared
http2.Transport (the harness itself peaked at 86 % of a core on the same host).
Observability finding: cAdvisor yields only the root cgroup on the contour host
(separate XFS /var/lib/docker); per-container metrics captured via docker stats; R7
should adopt the otelcol docker_stats receiver. Full report in loadtest/REPORT-R2.md;
PRERELEASE refinements logged; R2 marked done.
2026-06-10 00:47:16 +02:00
Ilia Denisov 422bd14b53 R2: harness payload fixes found by the smoke pass
- display-name marker: letters-only 'Zzloadtest' (the editable-name validator
  forbids digits/colons), so profile.update resends the seeded name successfully.
- draft.save: rack_order is a string in the backend draft DTO (was sent as []),
  fixing the bad_request.
Both confirmed ok against the contour. chat_not_your_turn / nudge_own_turn are
by-design turn gates (backend/internal/social/chat.go), correctly exercised.
2026-06-10 00:11:33 +02:00
Ilia Denisov 0c55574ddd R2: drop ./loadtest from the backend/gateway/telegram image builds
CI / changes (pull_request) Successful in 1s
CI / unit (pull_request) Successful in 8s
CI / integration (pull_request) Successful in 13s
CI / ui (pull_request) Successful in 36s
CI / gate (pull_request) Successful in 0s
CI / deploy (pull_request) Successful in 1m8s
Adding the loadtest module to go.work (use ./loadtest + the scrabble/gateway
replace it needs) broke the other services' Docker builds: their reduced
workspace still referenced ./loadtest (not in their build context), failing with
'cannot load module loadtest: open loadtest/go.mod: no such file or directory'.
Each service Dockerfile now also -dropuse=./loadtest; backend and telegram (which
do not COPY ./gateway) additionally -dropreplace the loadtest-only scrabble/gateway
replace. Verified by building all three images plus loadtest locally.
2026-06-09 23:57:26 +02:00
Ilia Denisov aa137e3558 R2: load-test harness + contour resource observability
CI / changes (pull_request) Successful in 2s
CI / unit (pull_request) Successful in 9s
CI / integration (pull_request) Successful in 11s
CI / ui (pull_request) Successful in 38s
CI / gate (pull_request) Successful in 0s
CI / deploy (pull_request) Failing after 3s
New scrabble/loadtest module (the pre-release stress harness): seeds 1000 guest +
10000 durable accounts with pre-created sessions directly in Postgres (token hash
matches backend/internal/session), drives virtual players through the edge protocol
(real 2-4p games assembled via invitations, mid-ranked legal moves generated locally
by the embedded scrabble-solver — the edge carries no board, so the client replays
history), plus nudge/chat/check-word/draft/profile/stats and a gateway-hammer that
verifies the rate limiter. Prints a trip-report summary (per-op latency percentiles,
result codes, live-event tally). Go unit tests cover the pure pieces; the DAWG-backed
move test runs under BACKEND_DICT_DIR.

Contour: add cAdvisor + postgres_exporter + a 'Scrabble - Resources' Grafana
dashboard and the two Prometheus scrape jobs, for the R2/R7 stress-run resource
baseline.

CI: gate ./loadtest/... (path filter + vet/build/test). Docs: TESTING, ARCHITECTURE,
project CLAUDE repo layout.
2026-06-09 23:45:24 +02:00