R7: contour docker_stats observability + container limits/GOMAXPROCS
CI / changes (pull_request) Successful in 1s
CI / unit (pull_request) Successful in 9s
CI / integration (pull_request) Successful in 13s
CI / ui (pull_request) Successful in 37s
CI / gate (pull_request) Successful in 0s
CI / deploy (pull_request) Successful in 1m21s
Observability: replace cAdvisor (on the contour host, where /var/lib/docker is a separate XFS filesystem, it resolves only the root cgroup) with the otelcol docker_stats receiver, which reads per-container CPU/memory/network straight from the Docker API and works the same way in prod. The collector joins the host docker group (DOCKER_GID, default 989) and mounts the Docker socket read-only. Its metrics leave through the existing prometheus exporter, so the cAdvisor scrape job and the privileged cAdvisor service are removed. The Resources dashboard panels are retargeted to the docker_stats metric names (container_name label; container.cpu.utilization / 100 gives cores in use).

Container limits: apply deploy.resources.limits (honoured by Compose v2) across the contour, and pin GOMAXPROCS to the CPU limit on the Go services so the runtime matches the cgroup quota. Starting values are set generously above the R2 peak (~1 core / <=100 MiB per app service) to avoid skewing or OOM-killing the measurement run; they will be tightened to the agreed prod sizing after the final stress run (R7 Round 2). The privileged VPN sidecar is left unconstrained.
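A minimal Compose sketch of the two container-level changes described above. Only `DOCKER_GID` with its default of 989, the read-only socket mount, `deploy.resources.limits` and the `GOMAXPROCS` pinning come from the commit message; the `api` service, the image tag and every limit value are hypothetical and chosen purely for illustration.

```yaml
# Sketch only: the "api" service, image tag and limit values are hypothetical.
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:latest   # docker_stats ships in the contrib build
    # Join the host's docker group so the collector may open the socket.
    group_add:
      - "${DOCKER_GID:-989}"
    # Docker API socket, mounted read-only for the docker_stats receiver.
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro

  api:                           # hypothetical Go service
    environment:
      GOMAXPROCS: "2"            # kept equal to limits.cpus below
    deploy:
      resources:
        limits:
          cpus: "2"              # illustrative value, above the R2 peak
          memory: 512M           # illustrative value, above the R2 peak
```

Keeping `GOMAXPROCS` equal to `cpus` matters because Go releases before 1.25 size `GOMAXPROCS` from the host's logical CPU count rather than the container's CPU quota, so an unpinned service can schedule more runnable threads than its quota allows and get throttled.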
@@ -6,17 +6,14 @@ global:
   evaluation_interval: 30s
 
 scrape_configs:
+  # otelcol exposes both the services' OTLP metrics and the docker_stats receiver's
+  # per-container resource metrics (CPU/memory/network) on one endpoint.
   - job_name: otelcol
     static_configs:
       - targets: ["otelcol:9464"]
   - job_name: prometheus
     static_configs:
       - targets: ["localhost:9090"]
-  # Container resource metrics (CPU/memory/network/disk) for every contour
-  # container, for the R2/R7 stress runs' resource baseline.
-  - job_name: cadvisor
-    static_configs:
-      - targets: ["cadvisor:8080"]
   # Postgres server metrics (connections, cache hit ratio, transactions, db size).
   - job_name: postgres_exporter
     static_configs:
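For context, a minimal otelcol configuration that would produce the single `otelcol:9464` endpoint described in the new scrape-job comment might look like the following. The OTLP port, the collection interval and the `resource_to_telemetry_conversion` setting are assumptions, not taken from this commit; the last one is shown as one possible way to turn the receiver's `container.name` resource attribute into the `container_name` label that the retargeted dashboards use.

```yaml
# Sketch only: everything except port 9464 is an assumption.
receivers:
  otlp:                          # services' own OTLP metrics
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  docker_stats:                  # per-container CPU/memory/network via the Docker API
    endpoint: unix:///var/run/docker.sock
    collection_interval: 15s

exporters:
  prometheus:
    endpoint: 0.0.0.0:9464       # scraped by the "otelcol" job
    resource_to_telemetry_conversion:
      enabled: true              # container.name -> container_name label

service:
  pipelines:
    metrics:
      receivers: [otlp, docker_stats]
      exporters: [prometheus]
```

Routing both receivers into one pipeline is what lets the cAdvisor job disappear from the scrape config: Prometheus keeps a single target for all application and container metrics.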