diff --git a/deploy/README.md b/deploy/README.md
index 8797966..0b28545 100644
--- a/deploy/README.md
+++ b/deploy/README.md
@@ -49,7 +49,7 @@ feeds the compose's `POSTGRES_PASSWORD`, etc.
 | Variable | Gitea kind | Purpose |
 | --- | --- | --- |
 | `POSTGRES_PASSWORD` | secret | Postgres password (also embedded in `BACKEND_POSTGRES_DSN`). |
-| `AWG_CONF` | secret | AmneziaWG config for the VPN sidecar (the connector's only egress). |
+| `AWG_CONF` | secret | AmneziaWG config for the VPN sidecar (the connector's only egress). **Must not contain a `DNS=` line** — it hijacks the shared netns's resolv.conf and breaks the connector resolving `otelcol` (telemetry export). Without it, Docker's resolver handles both `otelcol` and `api.telegram.org`. |
 | `GM_BASICAUTH_HASH` | secret | bcrypt hash gating `/_gm` (admin console + Grafana). Generate with `docker run --rm caddy:2-alpine caddy hash-password --plaintext ''`. |
 | `TELEGRAM_MINIAPP_URL` | variable | The Mini App URL the connector hands out in deep links / buttons. |
 
@@ -87,10 +87,14 @@ These are hard-wired in `docker-compose.yml` (no `${...}`), pointing the service
 at each other on the `internal` network — listed here so they are not mistaken for
 missing config: `BACKEND_POSTGRES_DSN` (→ `postgres`, `search_path=backend`),
 `GATEWAY_BACKEND_HTTP_URL`/`_GRPC_ADDR` (→ `backend`),
-`GATEWAY_CONNECTOR_ADDR`/`BACKEND_CONNECTOR_ADDR` (→ `telegram:9091`), the three
-services' `*_OTEL_*_EXPORTER=otlp` + `OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4317`
-(`_INSECURE=true`). `GATEWAY_ADMIN_*` is intentionally **unset** — caddy owns `/_gm`
-in the contour.
+`GATEWAY_CONNECTOR_ADDR`/`BACKEND_CONNECTOR_ADDR` (→ `telegram:9091`), and all three
+services' `*_OTEL_*_EXPORTER=otlp` → `OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4317`
+(`_INSECURE=true`). The connector shares the VPN sidecar's netns: routing to the
+collector's internal IP is fine (connected route), but its `AWG_CONF` must **not**
+set a `DNS=` directive — that hijacks resolv.conf and breaks resolving `otelcol`
+("produced zero addresses"); without it the netns uses Docker's resolver, which
+resolves both `otelcol` and `api.telegram.org`. `GATEWAY_ADMIN_*` is intentionally
+**unset** — caddy owns `/_gm` in the contour.
 
 ## Host-side setup (outside this repo)
 
diff --git a/deploy/docker-compose.yml b/deploy/docker-compose.yml
index 6868737..f36a531 100644
--- a/deploy/docker-compose.yml
+++ b/deploy/docker-compose.yml
@@ -125,6 +125,12 @@ services:
       TELEGRAM_API_BASE_URL: ${TELEGRAM_API_BASE_URL:-}
       TELEGRAM_LOG_LEVEL: ${LOG_LEVEL:-info}
       TELEGRAM_SERVICE_NAME: scrabble-telegram
+      # The connector shares the VPN sidecar's netns. Routing to the collector's
+      # internal IP stays off the tunnel (connected route), but the sidecar's DNS
+      # hijacks name resolution: AWG_CONF must NOT carry a `DNS=` directive, else
+      # `otelcol` won't resolve ("produced zero addresses"). Without DNS= the netns
+      # uses Docker's resolver, which resolves both otelcol and api.telegram.org
+      # (see deploy/README.md).
       TELEGRAM_OTEL_TRACES_EXPORTER: otlp
       TELEGRAM_OTEL_METRICS_EXPORTER: otlp
       OTEL_EXPORTER_OTLP_ENDPOINT: http://otelcol:4317
@@ -199,7 +205,10 @@ services:
       GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:-admin}
     volumes:
       - ./grafana/provisioning:/etc/grafana/provisioning:ro
-      - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
+      # Dashboards live under /etc/grafana (NOT /var/lib/grafana, which the
+      # grafana-data volume mounts over — a nested bind there is shadowed and the
+      # provider logs "no such file or directory").
+      - ./grafana/dashboards:/etc/grafana/dashboards:ro
       - grafana-data:/var/lib/grafana
     networks: [internal]
 
diff --git a/deploy/grafana/provisioning/dashboards/dashboards.yaml b/deploy/grafana/provisioning/dashboards/dashboards.yaml
index 3772be2..8b92fd6 100644
--- a/deploy/grafana/provisioning/dashboards/dashboards.yaml
+++ b/deploy/grafana/provisioning/dashboards/dashboards.yaml
@@ -11,5 +11,5 @@ providers:
     editable: true
     allowUiUpdates: true
     options:
-      path: /var/lib/grafana/dashboards
+      path: /etc/grafana/dashboards
       foldersFromFilesStructure: false
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index f172a9f..6f369f0 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -559,9 +559,13 @@ long-polls Telegram and egresses through a VPN sidecar, answering only internal
 The full contour (`deploy/docker-compose.yml`) runs one `gateway`, one `backend`,
 one Postgres, the connector (+ its VPN sidecar) and the **observability stack** —
 OTel Collector (OTLP/gRPC ingest → Prometheus metrics + Tempo traces) and Grafana
-with provisioned datasources and dashboards. Inter-service traffic uses a private
-`internal` network (project-scoped DNS); only caddy joins the shared external `edge`
-network (alias `scrabble`).
+with provisioned datasources and dashboards. All three services export OTLP to the
+collector; the connector shares the VPN sidecar's netns, so its `AWG_CONF` must not
+carry a `DNS=` directive (that would hijack resolv.conf and stop it resolving
+`otelcol`; without it the netns uses Docker's resolver, which resolves both
+`otelcol` and `api.telegram.org`). Inter-service traffic uses a private `internal`
+network (project-scoped DNS); only caddy joins the shared external `edge` network
+(alias `scrabble`).
 
 Two contours, two secret/variable prefixes (`TEST_` / `PROD_`):
 - **Test** (Stage 16): auto-deploys on a PR into — or a push to — `development`