Merge pull request 'Fix Grafana dashboards mount; connector OTLP via AWG_CONF (no DNS=)' (#18) from feature/contour-defect-fixes into development
CI / unit (push) Successful in 8s
CI / integration (push) Successful in 11s
CI / ui (push) Successful in 19s
CI / deploy (push) Successful in 38s

This commit was merged in pull request #18.
2026-06-05 15:46:59 +00:00
5 changed files with 56 additions and 15 deletions
+10
@@ -188,6 +188,16 @@ jobs:
DICT_VERSION: ${{ vars.TEST_DICT_VERSION }}
LOG_LEVEL: ${{ vars.TEST_LOG_LEVEL }}
run: |
# Seed the config files to a stable host path. The runner checks out into
# an ephemeral act workspace that is removed after the job, which would
# dangle the compose config bind mounts in the long-lived containers
# (e.g. Grafana then logs "no such file or directory"). Bind from a stable
# dir instead (mirrors ../galaxy-game's $HOME/.galaxy-dev/monitoring).
conf="$HOME/.scrabble-deploy"
rm -rf "$conf"
mkdir -p "$conf"
cp -r caddy otelcol prometheus tempo grafana "$conf"/
export SCRABBLE_CONFIG_DIR="$conf"
docker compose --ansi never build --progress plain
docker compose --ansi never up -d --remove-orphans
+17 -5
@@ -42,6 +42,14 @@ runs `docker compose up -d --build` on the runner host. Stage 18 (prod) maps the
**`PROD_`** set the same way. So a Gitea secret named `TEST_POSTGRES_PASSWORD`
feeds the compose's `POSTGRES_PASSWORD`, etc.
The deploy job also **seeds the config files** (`caddy`, `otelcol`, `prometheus`,
`tempo`, `grafana`) to a stable host path (`$HOME/.scrabble-deploy`) and sets
`SCRABBLE_CONFIG_DIR` to it before `up`. The runner's checkout is an ephemeral act
workspace that is removed after the job — binding config straight from it would
dangle the mounts in the long-lived containers (Grafana would log
`no such file or directory`). Locally `SCRABBLE_CONFIG_DIR` defaults to `.`, so the
compose binds from this directory.
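
As a rough illustration of the seeding step above, the sketch below checks that every bind source the compose file expects exists under the chosen config directory before `up`. The helper name `check_config_dir` is hypothetical (it is not part of this repo); the relative paths are the ones bound in `docker-compose.yml`.

```shell
# Hypothetical helper, not part of this repo: returns non-zero when any
# config bind source is missing under the given directory.
# Falls back to SCRABBLE_CONFIG_DIR, then to "." (the local default).
check_config_dir() {
  dir="${1:-${SCRABBLE_CONFIG_DIR:-.}}"
  missing=0
  for path in caddy/Caddyfile otelcol/config.yaml prometheus/prometheus.yml \
              tempo/tempo.yaml grafana/provisioning grafana/dashboards; do
    if [ ! -e "$dir/$path" ]; then
      echo "missing bind source: $dir/$path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One possible use is `check_config_dir "$HOME/.scrabble-deploy" && docker compose up -d`, so a partial copy is caught before the containers start.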
## Required variables
`docker compose` aborts immediately if any of these is unset (they use `:?`):
@@ -49,7 +57,7 @@ feeds the compose's `POSTGRES_PASSWORD`, etc.
| Variable | Gitea kind | Purpose |
| --- | --- | --- |
| `POSTGRES_PASSWORD` | secret | Postgres password (also embedded in `BACKEND_POSTGRES_DSN`). |
| `AWG_CONF` | secret | AmneziaWG config for the VPN sidecar (the connector's only egress). |
| `AWG_CONF` | secret | AmneziaWG config for the VPN sidecar (the connector's only egress). **Must not contain a `DNS=` line** — it hijacks the shared netns's resolv.conf and breaks the connector resolving `otelcol` (telemetry export). Without it, Docker's resolver handles both `otelcol` and `api.telegram.org`. |
| `GM_BASICAUTH_HASH` | secret | bcrypt hash gating `/_gm` (admin console + Grafana). Generate with `docker run --rm caddy:2-alpine caddy hash-password --plaintext '<pw>'`. |
| `TELEGRAM_MINIAPP_URL` | variable | The Mini App URL the connector hands out in deep links / buttons. |
@@ -87,10 +95,14 @@ These are hard-wired in `docker-compose.yml` (no `${...}`), pointing the service
at each other on the `internal` network — listed here so they are not mistaken for
missing config: `BACKEND_POSTGRES_DSN` (→ `postgres`, `search_path=backend`),
`GATEWAY_BACKEND_HTTP_URL`/`_GRPC_ADDR` (→ `backend`),
`GATEWAY_CONNECTOR_ADDR`/`BACKEND_CONNECTOR_ADDR` (→ `telegram:9091`), the three
services' `*_OTEL_*_EXPORTER=otlp` + `OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4317`
(`_INSECURE=true`). `GATEWAY_ADMIN_*` is intentionally **unset** — caddy owns `/_gm`
in the contour.
`GATEWAY_CONNECTOR_ADDR`/`BACKEND_CONNECTOR_ADDR` (→ `telegram:9091`), and all three
services' `*_OTEL_*_EXPORTER=otlp` + `OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4317`
(`_INSECURE=true`). The connector shares the VPN sidecar's netns: routing to the
collector's internal IP is fine (connected route), but its `AWG_CONF` must **not**
set a `DNS=` directive — that hijacks resolv.conf and breaks resolving `otelcol`
("produced zero addresses"); without it the netns uses Docker's resolver, which
resolves both `otelcol` and `api.telegram.org`. `GATEWAY_ADMIN_*` is intentionally
**unset** — caddy owns `/_gm` in the contour.
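
The `DNS=` constraint above could be enforced before deploying with a small guard. This is a minimal sketch, assuming the config text is available in a shell variable; the function name `awg_conf_has_dns` is hypothetical, and the case-insensitive match is an assumption about how the key may be spelled.

```shell
# Hypothetical pre-deploy guard, not part of this repo: succeeds (exit 0)
# when the AmneziaWG config text passed as $1 contains a DNS= line.
# Commented-out lines (starting with '#') do not match.
awg_conf_has_dns() {
  printf '%s\n' "$1" | grep -Eiq '^[[:space:]]*DNS[[:space:]]*='
}
```

A deploy step might then run `if awg_conf_has_dns "$AWG_CONF"; then echo "AWG_CONF must not set DNS=" >&2; exit 1; fi` so the problem surfaces before the connector fails to resolve `otelcol`.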
## Host-side setup (outside this repo)
+21 -6
@@ -5,6 +5,12 @@
# interpolated from Gitea Actions TEST_ secrets/variables exported by the deploy
# job (see deploy/.env.example for the unprefixed names).
#
# Config bind sources are prefixed with ${SCRABBLE_CONFIG_DIR:-.}: locally they bind
# straight from this directory, but CI seeds them to a stable host path and sets
# SCRABBLE_CONFIG_DIR to it, because the runner's checkout is ephemeral (act removes
# it after the job) and the bind mounts must outlive the job in the long-running
# containers (see .gitea/workflows/ci.yaml + deploy/README.md).
#
# Networking (mirrors ../galaxy-game):
# - `internal` (scrabble-internal): all inter-service traffic, project-private
# DNS so service names never collide on the shared `edge` network.
@@ -125,6 +131,12 @@ services:
TELEGRAM_API_BASE_URL: ${TELEGRAM_API_BASE_URL:-}
TELEGRAM_LOG_LEVEL: ${LOG_LEVEL:-info}
TELEGRAM_SERVICE_NAME: scrabble-telegram
# The connector shares the VPN sidecar's netns. Routing to the collector's
# internal IP stays off the tunnel (connected route), but the sidecar's DNS
# hijacks name resolution: AWG_CONF must NOT carry a `DNS=` directive, else
# `otelcol` won't resolve ("produced zero addresses"). Without DNS= the netns
# uses Docker's resolver, which resolves both otelcol and api.telegram.org
# (see deploy/README.md).
TELEGRAM_OTEL_TRACES_EXPORTER: otlp
TELEGRAM_OTEL_METRICS_EXPORTER: otlp
OTEL_EXPORTER_OTLP_ENDPOINT: http://otelcol:4317
@@ -142,7 +154,7 @@ services:
GM_BASICAUTH_USER: ${GM_BASICAUTH_USER:-gm}
GM_BASICAUTH_HASH: ${GM_BASICAUTH_HASH:?set GM_BASICAUTH_HASH}
volumes:
- ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
- ${SCRABBLE_CONFIG_DIR:-.}/caddy/Caddyfile:/etc/caddy/Caddyfile:ro
- caddy-data:/data
networks:
internal: {}
@@ -156,7 +168,7 @@ services:
restart: unless-stopped
command: ["--config=/etc/otelcol/config.yaml"]
volumes:
- ./otelcol/config.yaml:/etc/otelcol/config.yaml:ro
- ${SCRABBLE_CONFIG_DIR:-.}/otelcol/config.yaml:/etc/otelcol/config.yaml:ro
networks: [internal]
prometheus:
@@ -167,7 +179,7 @@ services:
- --config.file=/etc/prometheus/prometheus.yml
- --storage.tsdb.retention.time=15d
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ${SCRABBLE_CONFIG_DIR:-.}/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- prometheus-data:/prometheus
networks: [internal]
@@ -177,7 +189,7 @@ services:
restart: unless-stopped
command: ["-config.file=/etc/tempo/tempo.yaml"]
volumes:
- ./tempo/tempo.yaml:/etc/tempo/tempo.yaml:ro
- ${SCRABBLE_CONFIG_DIR:-.}/tempo/tempo.yaml:/etc/tempo/tempo.yaml:ro
- tempo-data:/var/tempo
networks: [internal]
@@ -198,8 +210,11 @@ services:
GF_USERS_ALLOW_SIGN_UP: "false"
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:-admin}
volumes:
- ./grafana/provisioning:/etc/grafana/provisioning:ro
- ./grafana/dashboards:/var/lib/grafana/dashboards:ro
- ${SCRABBLE_CONFIG_DIR:-.}/grafana/provisioning:/etc/grafana/provisioning:ro
# Dashboards live under /etc/grafana (NOT /var/lib/grafana, which the
# grafana-data volume mounts over — a nested bind there is shadowed and the
# provider logs "no such file or directory").
- ${SCRABBLE_CONFIG_DIR:-.}/grafana/dashboards:/etc/grafana/dashboards:ro
- grafana-data:/var/lib/grafana
networks: [internal]
@@ -11,5 +11,5 @@ providers:
editable: true
allowUiUpdates: true
options:
path: /var/lib/grafana/dashboards
path: /etc/grafana/dashboards
foldersFromFilesStructure: false
+7 -3
@@ -559,9 +559,13 @@ long-polls Telegram and egresses through a VPN sidecar, answering only internal
The full contour (`deploy/docker-compose.yml`) runs one `gateway`, one `backend`,
one Postgres, the connector (+ its VPN sidecar) and the **observability stack**:
OTel Collector (OTLP/gRPC ingest → Prometheus metrics + Tempo traces) and Grafana
with provisioned datasources and dashboards. Inter-service traffic uses a private
`internal` network (project-scoped DNS); only caddy joins the shared external `edge`
network (alias `scrabble`).
with provisioned datasources and dashboards. All three services export OTLP to the
collector; the connector shares the VPN sidecar's netns, so its `AWG_CONF` must not
carry a `DNS=` directive (that would hijack resolv.conf and stop it resolving
`otelcol`; without it the netns uses Docker's resolver, which resolves both
`otelcol` and `api.telegram.org`). Inter-service traffic uses a private `internal`
network (project-scoped DNS); only caddy joins the shared external `edge` network
(alias `scrabble`).
Two contours, two secret/variable prefixes (`TEST_` / `PROD_`):
- **Test** (Stage 16): auto-deploys on a PR into — or a push to — `development`