R7: trip report + docs/tracker bake-back; mark R7 done
CI / changes (pull_request) Successful in 1s
CI / unit (pull_request) Successful in 9s
CI / integration (pull_request) Successful in 12s
CI / ui (pull_request) Successful in 37s
CI / gate (pull_request) Successful in 0s
CI / deploy (pull_request) Successful in 58s
- loadtest/REPORT-R7.md: the final stress-run report: method, the 500-player resource profile, the agreed tuning, the validation (transport_error 2.49% -> 0.72% at 3 gateway cores; the burst run showing connection-bound behavior), and the prod-sizing recommendation for Stage 18.
- loadtest/README.md: per-player transports, --cpus capping, docker_stats (was cAdvisor), the absolute BACKEND_DICT_DIR for ./loadtest/..., and report links.
- docs/TESTING.md + docs/ARCHITECTURE.md: observability now uses the otelcol docker_stats receiver (cAdvisor removed); links to both trip reports.
- CLAUDE.md: repo-layout line reflects docker_stats + per-service limits.
- PRERELEASE.md: R7 marked done in the tracker + heading; a Refinements entry recording the decisions, findings, applied tuning and validation.

This is the final pre-release hardening phase; Stage 18 (prod cutover) is next.
@@ -561,11 +561,13 @@ promotions) is future work and would deliver short markdown messages (text + lin
 metrics + Tempo traces), **Prometheus** (15d), **Tempo** (72h) and **Grafana**
 (provisioned datasources + dashboards, behind the caddy `/_gm/grafana` Basic-Auth)
 are stood up with the deploy (`deploy/`); the default exporter stays
-`none`, so CI needs no collector. The contour also runs **cAdvisor** (per-container
-CPU/memory/network) and **postgres_exporter** (connections, cache-hit ratio,
-transactions, db size), scraped by Prometheus and surfaced on the **Scrabble —
-Resources** Grafana dashboard, which captures a resource
-baseline; these export directly in Prometheus format (not through the collector).
+`none`, so CI needs no collector. The collector also runs a **`docker_stats`**
+receiver (per-container CPU/memory/network read from the Docker API and exported
+through its Prometheus endpoint), and the contour runs **postgres_exporter**
+(connections, cache-hit ratio, transactions, db size, scraped directly by Prometheus);
+both are surfaced on the **Scrabble — Resources** Grafana dashboard, which captures the
+stress-run resource profile. (`docker_stats` replaced cAdvisor, which on the contour
+host resolved only the root cgroup — a separate-XFS `/var/lib/docker`.)
 - Per-request server-side timing via gin middleware from day one (the access log
 carries method, route, status, latency and the active trace id). A
 client-measured RTT piggybacked on the next request is a later enhancement.
@@ -127,8 +127,10 @@ tests or touching CI.
 selection, the report); the DAWG-backed move test runs under `BACKEND_DICT_DIR` (as the
 engine tests do). It is **not** part of the per-PR suite's behavioural assertions: it
 runs ad hoc as a one-shot container against the contour, producing a trip report (bugs
-+ a resource baseline) read off the **cAdvisor + postgres_exporter** Grafana dashboard
-on the contour. See [`../loadtest/README.md`](../loadtest/README.md).
++ a per-container resource profile) read off the **otelcol `docker_stats` +
+postgres_exporter** Grafana dashboard on the contour. Two passes are recorded — the
+early [`REPORT-R2.md`](../loadtest/REPORT-R2.md) and the final, tuned
+[`REPORT-R7.md`](../loadtest/REPORT-R7.md). See [`../loadtest/README.md`](../loadtest/README.md).

 ## Principles