Fix Grafana dashboards mount; connector OTLP via AWG_CONF (no DNS=) #18
Defect fixes found by inspecting the live test contour's container logs.
1. Grafana dashboards never loaded. The provider logged "readdirent /var/lib/grafana/dashboards: no such file or directory" every 10s. The grafana-data named volume mounts over /var/lib/grafana, shadowing the nested dashboards bind. Fix: mount the dashboards at /etc/grafana/dashboards (no volume there) and point the provider path at it. Takes effect on this deploy.
2. Connector telemetry. The connector logged "failed to upload metrics: ... produced zero addresses" every minute. Diagnosed against the running netns: routing to the collector's internal IP is fine (connected route, off-tunnel), but the VPN sidecar sets resolv.conf to the VPN DNS, so the Docker name otelcol doesn't resolve. Telemetry stays on (OTLP by name); the fix is owner-side and documented: AWG_CONF must not carry a DNS= directive.

All other containers are healthy (backend/gateway export telemetry fine, otelcol/prometheus/tempo/caddy/postgres OK).
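The Grafana mount change described in item 1 might look roughly like the fragment below. This is a minimal sketch, not the actual files from this PR: the host paths (./grafana/dashboards, ./grafana/provisioning), the image tag, and the provider name are assumptions for illustration. Only the container paths /var/lib/grafana and /etc/grafana/dashboards and the grafana-data volume name come from the description.

```yaml
# deploy/docker-compose.yml (fragment; host paths are hypothetical)
services:
  grafana:
    image: grafana/grafana            # assumed image, not confirmed by the PR
    volumes:
      # Named volume for Grafana state; it covers all of /var/lib/grafana.
      - grafana-data:/var/lib/grafana
      # Dashboards are bound outside the volume so nothing mounts over them.
      - ./grafana/dashboards:/etc/grafana/dashboards:ro
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
volumes:
  grafana-data:
---
# dashboards.yaml provider (fragment; the provider name is hypothetical)
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      # Must match the container path used in the compose bind above.
      path: /etc/grafana/dashboards
```

The point of the sketch is that the provider path and the bind target have to agree, and that neither sits under a path owned by the named volume.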
- deploy/docker-compose.yml: mount the provisioned dashboards at /etc/grafana/dashboards, not /var/lib/grafana/dashboards. The grafana-data volume mounts over the latter and shadows the nested bind, so the provider logged "readdirent /var/lib/grafana/dashboards: no such file or directory". dashboards.yaml provider path updated to match.
- Connector telemetry stays OTLP. The VPN sidecar's netns reaches the collector's internal IP fine (connected route, off-tunnel), but the sidecar's DNS hijacks name resolution: AWG_CONF must NOT carry a DNS= directive, else otelcol won't resolve ("produced zero addresses"). Without DNS= the netns uses Docker's resolver (resolves both otelcol and api.telegram.org). Documented in deploy/README.md (AWG_CONF row + wiring note), ARCHITECTURE §13, compose comment.

Root cause of the Grafana "readdirent /etc/grafana/dashboards: no such file or directory": the CI runner checks out into an ephemeral act workspace that is removed after the job, so binding the compose config files straight from it dangles the mounts in the long-lived containers (verified the act source dir is emptied after the job). caddy/otelcol/prometheus/tempo read their config once at startup so they survive, but would break on a restart: the same latent bug.

Fix (mirrors ../galaxy-game's $HOME/.galaxy-dev/monitoring): the deploy job seeds the config dirs to a stable $HOME/.scrabble-deploy and the compose binds them via ${SCRABBLE_CONFIG_DIR:-.} (local runs keep "."). Documented in the compose header, deploy/README.md and the ci.yaml step.
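The stable config directory fix could be wired up along these lines. This is a hedged sketch under stated assumptions: the subdirectory names (grafana, caddy), the Caddyfile path, the step name, and the exact copy and compose commands are hypothetical. Only $HOME/.scrabble-deploy, ${SCRABBLE_CONFIG_DIR:-.}, and the file names deploy/docker-compose.yml and ci.yaml come from the commit message.

```yaml
# deploy/docker-compose.yml (fragment; subdirectory names are hypothetical)
services:
  grafana:
    volumes:
      # Falls back to "." (the compose file's directory) when the variable
      # is unset, so local runs keep binding from the checkout.
      - ${SCRABBLE_CONFIG_DIR:-.}/grafana/dashboards:/etc/grafana/dashboards:ro
  caddy:
    volumes:
      - ${SCRABBLE_CONFIG_DIR:-.}/caddy/Caddyfile:/etc/caddy/Caddyfile:ro
---
# ci.yaml deploy job (step fragment; step name and commands are assumptions)
- name: Seed config dirs and deploy
  run: |
    # Copy configs out of the ephemeral workspace into a directory that
    # outlives the job, then bind from there.
    mkdir -p "$HOME/.scrabble-deploy"
    cp -a deploy/grafana deploy/caddy "$HOME/.scrabble-deploy/"
    SCRABBLE_CONFIG_DIR="$HOME/.scrabble-deploy" \
      docker compose -f deploy/docker-compose.yml up -d
```

Because the bind sources now live outside the runner's workspace, a container restart after the job has finished still finds its config files.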