Fix Grafana dashboards mount; keep connector OTLP (AWG_CONF must omit DNS=)

- deploy/docker-compose.yml: mount the provisioned dashboards at
  /etc/grafana/dashboards, not /var/lib/grafana/dashboards — the grafana-data
  volume mounts over the latter and shadows the nested bind, so the provider
  logged "readdirent /var/lib/grafana/dashboards: no such file or directory".
  dashboards.yaml provider path updated to match.
- Connector telemetry stays OTLP. The VPN sidecar's netns reaches the collector's
  internal IP fine (connected route, off-tunnel), but the sidecar's DNS hijacks
  name resolution: AWG_CONF must NOT carry a DNS= directive, else otelcol won't
  resolve ("produced zero addresses"). Without DNS= the netns uses Docker's
  resolver (resolves both otelcol and api.telegram.org). Documented in
  deploy/README.md (AWG_CONF row + wiring note), ARCHITECTURE §13, compose comment.
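
The `AWG_CONF` rule above could be checked before deploying. The following is a minimal sketch, not part of this commit: `find_dns_lines` is a hypothetical helper, and it assumes the config uses the usual INI-style WireGuard layout where the key may be written as `DNS=` or `DNS = ` in any letter case.

```python
import re

# Matches a DNS key at the start of a config line, e.g. "DNS=1.1.1.1" or
# "dns = 10.0.0.1". Lines commented out with "#" do not match, because the
# key must be the first non-whitespace token on the line.
_DNS_LINE = re.compile(r"^\s*dns\s*=", re.IGNORECASE)


def find_dns_lines(conf_text: str) -> list[int]:
    """Return the 1-based line numbers of DNS directives in the config text."""
    return [
        number
        for number, line in enumerate(conf_text.splitlines(), start=1)
        if _DNS_LINE.match(line)
    ]
```

A deploy step could call `find_dns_lines` on the secret's contents and abort when the returned list is non-empty, so a stray `DNS=` line is caught before the connector starts failing to resolve `otelcol`.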
Ilia Denisov
2026-06-05 17:34:33 +02:00
parent dce3edacee
commit 4a07d48a7b
4 changed files with 27 additions and 10 deletions
+9 -5
@@ -49,7 +49,7 @@ feeds the compose's `POSTGRES_PASSWORD`, etc.
| Variable | Gitea kind | Purpose |
| --- | --- | --- |
| `POSTGRES_PASSWORD` | secret | Postgres password (also embedded in `BACKEND_POSTGRES_DSN`). |
- | `AWG_CONF` | secret | AmneziaWG config for the VPN sidecar (the connector's only egress). |
+ | `AWG_CONF` | secret | AmneziaWG config for the VPN sidecar (the connector's only egress). **Must not contain a `DNS=` line** — it hijacks the shared netns's resolv.conf and breaks the connector resolving `otelcol` (telemetry export). Without it, Docker's resolver handles both `otelcol` and `api.telegram.org`. |
| `GM_BASICAUTH_HASH` | secret | bcrypt hash gating `/_gm` (admin console + Grafana). Generate with `docker run --rm caddy:2-alpine caddy hash-password --plaintext '<pw>'`. |
| `TELEGRAM_MINIAPP_URL` | variable | The Mini App URL the connector hands out in deep links / buttons. |
@@ -87,10 +87,14 @@ These are hard-wired in `docker-compose.yml` (no `${...}`), pointing the service
at each other on the `internal` network — listed here so they are not mistaken for
missing config: `BACKEND_POSTGRES_DSN` (→ `postgres`, `search_path=backend`),
`GATEWAY_BACKEND_HTTP_URL`/`_GRPC_ADDR` (→ `backend`),
- `GATEWAY_CONNECTOR_ADDR`/`BACKEND_CONNECTOR_ADDR` (→ `telegram:9091`), the three
- services' `*_OTEL_*_EXPORTER=otlp` + `OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4317`
- (`_INSECURE=true`). `GATEWAY_ADMIN_*` is intentionally **unset** — caddy owns `/_gm`
- in the contour.
+ `GATEWAY_CONNECTOR_ADDR`/`BACKEND_CONNECTOR_ADDR` (→ `telegram:9091`), and all three
+ services' `*_OTEL_*_EXPORTER=otlp` `OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4317`
+ (`_INSECURE=true`). The connector shares the VPN sidecar's netns: routing to the
+ collector's internal IP is fine (connected route), but its `AWG_CONF` must **not**
+ set a `DNS=` directive — that hijacks resolv.conf and breaks resolving `otelcol`
+ ("produced zero addresses"); without it the netns uses Docker's resolver, which
+ resolves both `otelcol` and `api.telegram.org`. `GATEWAY_ADMIN_*` is intentionally
+ **unset** — caddy owns `/_gm` in the contour.
## Host-side setup (outside this repo)
+10 -1
@@ -125,6 +125,12 @@ services:
TELEGRAM_API_BASE_URL: ${TELEGRAM_API_BASE_URL:-}
TELEGRAM_LOG_LEVEL: ${LOG_LEVEL:-info}
TELEGRAM_SERVICE_NAME: scrabble-telegram
+ # The connector shares the VPN sidecar's netns. Routing to the collector's
+ # internal IP stays off the tunnel (connected route), but the sidecar's DNS
+ # hijacks name resolution: AWG_CONF must NOT carry a `DNS=` directive, else
+ # `otelcol` won't resolve ("produced zero addresses"). Without DNS= the netns
+ # uses Docker's resolver, which resolves both otelcol and api.telegram.org
+ # (see deploy/README.md).
TELEGRAM_OTEL_TRACES_EXPORTER: otlp
TELEGRAM_OTEL_METRICS_EXPORTER: otlp
OTEL_EXPORTER_OTLP_ENDPOINT: http://otelcol:4317
@@ -199,7 +205,10 @@ services:
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:-admin}
volumes:
- ./grafana/provisioning:/etc/grafana/provisioning:ro
- - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
+ # Dashboards live under /etc/grafana (NOT /var/lib/grafana, which the
+ # grafana-data volume mounts over — a nested bind there is shadowed and the
+ # provider logs "no such file or directory").
+ - ./grafana/dashboards:/etc/grafana/dashboards:ro
- grafana-data:/var/lib/grafana
networks: [internal]
@@ -11,5 +11,5 @@ providers:
editable: true
allowUiUpdates: true
options:
- path: /var/lib/grafana/dashboards
+ path: /etc/grafana/dashboards
foldersFromFilesStructure: false
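
The mount change above moves a bind mount out from under a path that a named volume also targets. As a rough illustration only, the layout problem can be flagged mechanically. `shadowed_binds` and the `(kind, target)` tuple shape are assumptions made for this sketch, not anything from the repository, and the sketch only reports nesting; whether a nested bind is actually hidden depends on the container runtime.

```python
from pathlib import PurePosixPath


def shadowed_binds(mounts: list[tuple[str, str]]) -> list[str]:
    """Return bind-mount targets that sit below a named-volume target.

    Each mount is a (kind, target) pair, where kind is "bind" or "volume"
    and target is the absolute path inside the container.
    """
    volume_targets = [PurePosixPath(t) for kind, t in mounts if kind == "volume"]
    nested = []
    for kind, target in mounts:
        if kind != "bind":
            continue
        path = PurePosixPath(target)
        # A bind is at risk when any volume target is one of its ancestors.
        if any(volume in path.parents for volume in volume_targets):
            nested.append(target)
    return nested
```

With the pre-commit layout (a bind at `/var/lib/grafana/dashboards` next to the `grafana-data` volume at `/var/lib/grafana`) the function reports the dashboards bind; with the bind moved to `/etc/grafana/dashboards` it reports nothing.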
+7 -3
@@ -559,9 +559,13 @@ long-polls Telegram and egresses through a VPN sidecar, answering only internal
The full contour (`deploy/docker-compose.yml`) runs one `gateway`, one `backend`,
one Postgres, the connector (+ its VPN sidecar) and the **observability stack**
OTel Collector (OTLP/gRPC ingest → Prometheus metrics + Tempo traces) and Grafana
- with provisioned datasources and dashboards. Inter-service traffic uses a private
- `internal` network (project-scoped DNS); only caddy joins the shared external `edge`
- network (alias `scrabble`).
+ with provisioned datasources and dashboards. All three services export OTLP to the
+ collector; the connector shares the VPN sidecar's netns, so its `AWG_CONF` must not
+ carry a `DNS=` directive (that would hijack resolv.conf and stop it resolving
+ `otelcol`; without it the netns uses Docker's resolver, which resolves both
+ `otelcol` and `api.telegram.org`). Inter-service traffic uses a private `internal`
+ network (project-scoped DNS); only caddy joins the shared external `edge` network
+ (alias `scrabble`).
Two contours, two secret/variable prefixes (`TEST_` / `PROD_`):
- **Test** (Stage 16): auto-deploys on a PR into — or a push to — `development`