scrabble-game/deploy/grafana/dashboards/resources.json
Ilia Denisov c16f27475f
R7: contour docker_stats observability + container limits/GOMAXPROCS
Observability: replace cAdvisor (which resolves only the root cgroup on the
contour host — separate-XFS /var/lib/docker) with the otelcol docker_stats
receiver, which reads per-container CPU/memory/network straight from the Docker
API and works the same in prod. The collector joins the host docker group
(DOCKER_GID, default 989) and mounts the socket read-only; its metrics flow out
through the existing prometheus exporter, so the cAdvisor scrape job and the
privileged cAdvisor service are removed. The Resources dashboard panels are
retargeted to the docker_stats metric names (container_name label;
container.cpu.utilization/100 == cores).
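The scaling used by the retargeted CPU panel can be sketched as follows. This is only an illustration of the "utilization / 100 == cores" conversion stated above; the function name and the sample readings are hypothetical and are not taken from the contour.

```go
package main

import "fmt"

// coresFromUtilization converts a docker_stats container.cpu.utilization
// reading (a percentage where 100 means one fully used core) into cores.
// The name is hypothetical; it mirrors the "/ 100" in the dashboard query.
func coresFromUtilization(percent float64) float64 {
	return percent / 100
}

func main() {
	// Hypothetical readings, not measurements from the contour.
	for _, pct := range []float64{25, 100, 250} {
		fmt.Printf("%.0f%% -> %.2f cores\n", pct, coresFromUtilization(pct))
	}
}
```

A container using two and a half cores would therefore report 250 and be plotted as 2.5.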

Container limits: apply deploy.resources.limits (honoured by Compose v2) across
the contour and pin GOMAXPROCS to the CPU limit on the Go services so the runtime
matches the cgroup quota. Starting values sit well above the R2 peak (~1 core /
<=100 MiB per app service) to avoid skewing or OOM-killing the measurement run;
they will be tightened to the agreed prod sizing after the final stress run (R7
Round 2). The privileged VPN sidecar is left unconstrained.
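As a rough sketch of the GOMAXPROCS pinning described above: the Go runtime reads the GOMAXPROCS environment variable at startup, so a service can log the value in effect and compare it with its CPU limit. The helper `procsForLimit` is hypothetical, and rounding a fractional limit up to a whole number is an assumption made for this example, not something the commit specifies.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
)

// procsForLimit returns a GOMAXPROCS value for a CPU limit given in cores.
// Assumption: fractional limits are rounded up, and the result is never
// below 1, because GOMAXPROCS must be a positive integer.
func procsForLimit(cpuLimit float64) int {
	n := int(math.Ceil(cpuLimit))
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	// runtime.GOMAXPROCS(0) reports the current setting without changing it.
	fmt.Println("GOMAXPROCS in effect:", runtime.GOMAXPROCS(0))

	// Hypothetical limits, not the values applied in the contour.
	for _, limit := range []float64{0.5, 1.5, 2} {
		fmt.Printf("limit %.1f cores -> GOMAXPROCS %d\n", limit, procsForLimit(limit))
	}
}
```

Logging the effective value at startup makes a mismatch between the container limit and the runtime setting visible in the service logs.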
2026-06-10 18:53:19 +02:00

85 lines
4.4 KiB
JSON

{
"uid": "scrabble-resources",
"title": "Scrabble — Resources",
"tags": ["scrabble"],
"timezone": "",
"schemaVersion": 39,
"version": 2,
"refresh": "30s",
"time": { "from": "now-1h", "to": "now" },
"panels": [
{
"type": "stat",
"title": "Postgres connections",
"description": "Backends connected to the scrabble database (postgres_exporter).",
"gridPos": { "h": 5, "w": 6, "x": 0, "y": 0 },
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [{ "refId": "A", "expr": "sum(pg_stat_database_numbackends{datname=\"scrabble\"})" }]
},
{
"type": "stat",
"title": "Postgres cache hit ratio",
"description": "blks_hit / (blks_hit + blks_read) over 5m.",
"gridPos": { "h": 5, "w": 6, "x": 6, "y": 0 },
"fieldConfig": { "defaults": { "unit": "percentunit", "min": 0, "max": 1 }, "overrides": [] },
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [{ "refId": "A", "expr": "sum(rate(pg_stat_database_blks_hit{datname=\"scrabble\"}[5m])) / (sum(rate(pg_stat_database_blks_hit{datname=\"scrabble\"}[5m])) + sum(rate(pg_stat_database_blks_read{datname=\"scrabble\"}[5m])) > 0)" }]
},
{
"type": "stat",
"title": "Postgres commits/s",
"gridPos": { "h": 5, "w": 6, "x": 12, "y": 0 },
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [{ "refId": "A", "expr": "sum(rate(pg_stat_database_xact_commit{datname=\"scrabble\"}[5m]))" }]
},
{
"type": "stat",
"title": "Database size",
"gridPos": { "h": 5, "w": 6, "x": 18, "y": 0 },
"fieldConfig": { "defaults": { "unit": "bytes" }, "overrides": [] },
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [{ "refId": "A", "expr": "max(pg_database_size_bytes{datname=\"scrabble\"})" }]
},
{
"type": "timeseries",
"title": "Container CPU (cores) by container",
"description": "docker_stats container.cpu.utilization (a gauge where 100 == one core) / 100, per scrabble-* container; the load harness appears when run as --name scrabble-loadtest. Verify the scaling against live Prometheus.",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 5 },
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [{ "refId": "A", "expr": "max(container_cpu_utilization{container_name=~\"scrabble-.+\"}) by (container_name) / 100", "legendFormat": "{{container_name}}" }]
},
{
"type": "timeseries",
"title": "Container memory (usage) by container",
"description": "docker_stats container.memory.usage.total bytes, per scrabble-* container.",
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 5 },
"fieldConfig": { "defaults": { "unit": "bytes" }, "overrides": [] },
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [{ "refId": "A", "expr": "max(container_memory_usage_total{container_name=~\"scrabble-.+\"}) by (container_name)", "legendFormat": "{{container_name}}" }]
},
{
"type": "timeseries",
"title": "Container network I/O by container",
"description": "docker_stats receive (+) and transmit (-) byte rates per scrabble-* container (summed across interfaces).",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 13 },
"fieldConfig": { "defaults": { "unit": "Bps" }, "overrides": [] },
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [
{ "refId": "A", "expr": "sum(rate(container_network_io_usage_rx_bytes{container_name=~\"scrabble-.+\"}[5m])) by (container_name)", "legendFormat": "rx {{container_name}}" },
{ "refId": "B", "expr": "-sum(rate(container_network_io_usage_tx_bytes{container_name=~\"scrabble-.+\"}[5m])) by (container_name)", "legendFormat": "tx {{container_name}}" }
]
},
{
"type": "timeseries",
"title": "Postgres transactions/s",
"description": "Commit and rollback rates on the scrabble database (postgres_exporter).",
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 13 },
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [
{ "refId": "A", "expr": "sum(rate(pg_stat_database_xact_commit{datname=\"scrabble\"}[5m]))", "legendFormat": "commit" },
{ "refId": "B", "expr": "sum(rate(pg_stat_database_xact_rollback{datname=\"scrabble\"}[5m]))", "legendFormat": "rollback" }
]
}
]
}