R7: trip report + docs/tracker bake-back; mark R7 done

- loadtest/REPORT-R7.md: the final stress-run report — method, the 500-player resource
  profile, the agreed tuning, the validation (transport_error 2.49% -> 0.72% at 3 gateway
  cores; the burst run showing connection-bound behavior), and the prod-sizing
  recommendation for Stage 18.
- loadtest/README.md: per-player transports, --cpus capping, docker_stats (was cAdvisor),
  the absolute BACKEND_DICT_DIR for ./loadtest/... , and report links.
- docs/TESTING.md + docs/ARCHITECTURE.md: observability now uses the otelcol docker_stats
  receiver (cAdvisor removed); links to both trip reports.
- CLAUDE.md: repo-layout line reflects docker_stats + per-service limits.
- PRERELEASE.md: R7 marked done in the tracker + heading; a Refinements entry recording
  the decisions, findings, applied tuning and validation.

This is the final pre-release hardening phase; Stage 18 (prod cutover) is next.
Ilia Denisov
2026-06-11 11:18:57 +02:00
parent f23da88028
commit 2a48df9b83
6 changed files with 257 additions and 21 deletions
+7 -5
@@ -561,11 +561,13 @@ promotions) is future work and would deliver short markdown messages (text + lin
 metrics + Tempo traces), **Prometheus** (15d), **Tempo** (72h) and **Grafana**
 (provisioned datasources + dashboards, behind the caddy `/_gm/grafana` Basic-Auth)
 are stood up with the deploy (`deploy/`); the default exporter stays
-`none`, so CI needs no collector. The contour also runs **cAdvisor** (per-container
-CPU/memory/network) and **postgres_exporter** (connections, cache-hit ratio,
-transactions, db size), scraped by Prometheus and surfaced on the **Scrabble —
-Resources** Grafana dashboard, which captures a resource
-baseline; these export directly in Prometheus format (not through the collector).
+`none`, so CI needs no collector. The collector also runs a **`docker_stats`**
+receiver (per-container CPU/memory/network read from the Docker API and exported
+through its Prometheus endpoint), and the contour runs **postgres_exporter**
+(connections, cache-hit ratio, transactions, db size, scraped directly by Prometheus);
+both are surfaced on the **Scrabble — Resources** Grafana dashboard, which captures the
+stress-run resource profile. (`docker_stats` replaced cAdvisor, which on the contour
+host resolved only the root cgroup — a separate-XFS `/var/lib/docker`.)
 - Per-request server-side timing via gin middleware from day one (the access log
 carries method, route, status, latency and the active trace id). A
 client-measured RTT piggybacked on the next request is a later enhancement.