Merge pull request 'Fix Grafana dashboards mount; connector OTLP via AWG_CONF (no DNS=)' (#18) from feature/contour-defect-fixes into development
This commit was merged in pull request #18.
@@ -188,6 +188,16 @@ jobs:
 DICT_VERSION: ${{ vars.TEST_DICT_VERSION }}
 LOG_LEVEL: ${{ vars.TEST_LOG_LEVEL }}
 run: |
+# Seed the config files to a stable host path. The runner checks out into
+# an ephemeral act workspace that is removed after the job, which would
+# dangle the compose config bind mounts in the long-lived containers
+# (e.g. Grafana then logs "no such file or directory"). Bind from a stable
+# dir instead (mirrors ../galaxy-game's $HOME/.galaxy-dev/monitoring).
+conf="$HOME/.scrabble-deploy"
+rm -rf "$conf"
+mkdir -p "$conf"
+cp -r caddy otelcol prometheus tempo grafana "$conf"/
+export SCRABBLE_CONFIG_DIR="$conf"
 docker compose --ansi never build --progress plain
 docker compose --ansi never up -d --remove-orphans
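Review note: the seeding step above could be wrapped in a small function that fails loudly when one of the five config directories is missing, rather than letting `cp` report it halfway through. This is only a sketch; the function name `seed_config` and its two arguments are hypothetical and not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): seed_config SRC_DIR DEST_DIR
# Replaces DEST_DIR with fresh copies of the config directories that the
# workflow step copies, and prints DEST_DIR on success.
seed_config() {
    src=$1
    dest=$2
    # Refuse an empty destination so that rm -rf never runs on "".
    [ -n "$dest" ] || return 1
    rm -rf "$dest"
    mkdir -p "$dest" || return 1
    for d in caddy otelcol prometheus tempo grafana; do
        if [ ! -d "$src/$d" ]; then
            echo "missing config dir: $src/$d" >&2
            return 1
        fi
        cp -r "$src/$d" "$dest"/ || return 1
    done
    printf '%s\n' "$dest"
}
```

In the workflow this would be used as `SCRABBLE_CONFIG_DIR=$(seed_config . "$HOME/.scrabble-deploy") || exit 1` followed by `export SCRABBLE_CONFIG_DIR`, assuming the step runs from the directory that holds the config folders.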
+17
-5
@@ -42,6 +42,14 @@ runs `docker compose up -d --build` on the runner host. Stage 18 (prod) maps the
 **`PROD_`** set the same way. So a Gitea secret named `TEST_POSTGRES_PASSWORD`
 feeds the compose's `POSTGRES_PASSWORD`, etc.

+The deploy job also **seeds the config files** (`caddy`, `otelcol`, `prometheus`,
+`tempo`, `grafana`) to a stable host path (`$HOME/.scrabble-deploy`) and sets
+`SCRABBLE_CONFIG_DIR` to it before `up`. The runner's checkout is an ephemeral act
+workspace that is removed after the job — binding config straight from it would
+dangle the mounts in the long-lived containers (Grafana would log
+`no such file or directory`). Locally `SCRABBLE_CONFIG_DIR` defaults to `.`, so the
+compose binds from this directory.
+
 ## Required variables

 `docker compose` aborts immediately if any of these is unset (they use `:?`):
@@ -49,7 +57,7 @@ feeds the compose's `POSTGRES_PASSWORD`, etc.
 | Variable | Gitea kind | Purpose |
 | --- | --- | --- |
 | `POSTGRES_PASSWORD` | secret | Postgres password (also embedded in `BACKEND_POSTGRES_DSN`). |
-| `AWG_CONF` | secret | AmneziaWG config for the VPN sidecar (the connector's only egress). |
+| `AWG_CONF` | secret | AmneziaWG config for the VPN sidecar (the connector's only egress). **Must not contain a `DNS=` line** — it hijacks the shared netns's resolv.conf and breaks the connector resolving `otelcol` (telemetry export). Without it, Docker's resolver handles both `otelcol` and `api.telegram.org`. |
 | `GM_BASICAUTH_HASH` | secret | bcrypt hash gating `/_gm` (admin console + Grafana). Generate with `docker run --rm caddy:2-alpine caddy hash-password --plaintext '<pw>'`. |
 | `TELEGRAM_MINIAPP_URL` | variable | The Mini App URL the connector hands out in deep links / buttons. |
@@ -87,10 +95,14 @@ These are hard-wired in `docker-compose.yml` (no `${...}`), pointing the service
 at each other on the `internal` network — listed here so they are not mistaken for
 missing config: `BACKEND_POSTGRES_DSN` (→ `postgres`, `search_path=backend`),
 `GATEWAY_BACKEND_HTTP_URL`/`_GRPC_ADDR` (→ `backend`),
-`GATEWAY_CONNECTOR_ADDR`/`BACKEND_CONNECTOR_ADDR` (→ `telegram:9091`), the three
-services' `*_OTEL_*_EXPORTER=otlp` + `OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4317`
-(`_INSECURE=true`). `GATEWAY_ADMIN_*` is intentionally **unset** — caddy owns `/_gm`
-in the contour.
+`GATEWAY_CONNECTOR_ADDR`/`BACKEND_CONNECTOR_ADDR` (→ `telegram:9091`), and all three
+services' `*_OTEL_*_EXPORTER=otlp` → `OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4317`
+(`_INSECURE=true`). The connector shares the VPN sidecar's netns: routing to the
+collector's internal IP is fine (connected route), but its `AWG_CONF` must **not**
+set a `DNS=` directive — that hijacks resolv.conf and breaks resolving `otelcol`
+("produced zero addresses"); without it the netns uses Docker's resolver, which
+resolves both `otelcol` and `api.telegram.org`. `GATEWAY_ADMIN_*` is intentionally
+**unset** — caddy owns `/_gm` in the contour.

 ## Host-side setup (outside this repo)
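Review note: since the README now forbids a `DNS=` line in `AWG_CONF`, the deploy job could check for one before `up`. The sketch below is an assumption about how such a guard might look; the function names are hypothetical, and matching the key case-insensitively with optional spaces around `=` is my guess at what the sidecar's config parser accepts.

```shell
#!/bin/sh
# Hypothetical pre-deploy guard (not in the PR).
# Returns 0 if the config text in $1 has a line that starts with a DNS key,
# ignoring case and leading whitespace; commented-out lines do not match.
awg_conf_has_dns() {
    printf '%s\n' "$1" | grep -Eiq '^[[:space:]]*DNS[[:space:]]*='
}

# Returns 1 with a message on stderr when a DNS directive is present.
check_awg_conf() {
    if awg_conf_has_dns "$1"; then
        echo "AWG_CONF contains a DNS= directive; remove it" >&2
        return 1
    fi
}
```

A workflow step could then run `check_awg_conf "$AWG_CONF" || exit 1` ahead of `docker compose up`, assuming the secret is exported to the step under that name.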
@@ -5,6 +5,12 @@
 # interpolated from Gitea Actions TEST_ secrets/variables exported by the deploy
 # job (see deploy/.env.example for the unprefixed names).
 #
+# Config bind sources are prefixed with ${SCRABBLE_CONFIG_DIR:-.}: locally they bind
+# straight from this directory, but CI seeds them to a stable host path and sets
+# SCRABBLE_CONFIG_DIR to it, because the runner's checkout is ephemeral (act removes
+# it after the job) and the bind mounts must outlive the job in the long-running
+# containers (see .gitea/workflows/ci.yaml + deploy/README.md).
+#
 # Networking (mirrors ../galaxy-game):
 # - `internal` (scrabble-internal): all inter-service traffic, project-private
 # DNS so service names never collide on the shared `edge` network.
@@ -125,6 +131,12 @@ services:
 TELEGRAM_API_BASE_URL: ${TELEGRAM_API_BASE_URL:-}
 TELEGRAM_LOG_LEVEL: ${LOG_LEVEL:-info}
 TELEGRAM_SERVICE_NAME: scrabble-telegram
+# The connector shares the VPN sidecar's netns. Routing to the collector's
+# internal IP stays off the tunnel (connected route), but the sidecar's DNS
+# hijacks name resolution: AWG_CONF must NOT carry a `DNS=` directive, else
+# `otelcol` won't resolve ("produced zero addresses"). Without DNS= the netns
+# uses Docker's resolver, which resolves both otelcol and api.telegram.org
+# (see deploy/README.md).
 TELEGRAM_OTEL_TRACES_EXPORTER: otlp
 TELEGRAM_OTEL_METRICS_EXPORTER: otlp
 OTEL_EXPORTER_OTLP_ENDPOINT: http://otelcol:4317
@@ -142,7 +154,7 @@ services:
 GM_BASICAUTH_USER: ${GM_BASICAUTH_USER:-gm}
 GM_BASICAUTH_HASH: ${GM_BASICAUTH_HASH:?set GM_BASICAUTH_HASH}
 volumes:
-- ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
+- ${SCRABBLE_CONFIG_DIR:-.}/caddy/Caddyfile:/etc/caddy/Caddyfile:ro
 - caddy-data:/data
 networks:
 internal: {}
@@ -156,7 +168,7 @@ services:
 restart: unless-stopped
 command: ["--config=/etc/otelcol/config.yaml"]
 volumes:
-- ./otelcol/config.yaml:/etc/otelcol/config.yaml:ro
+- ${SCRABBLE_CONFIG_DIR:-.}/otelcol/config.yaml:/etc/otelcol/config.yaml:ro
 networks: [internal]

 prometheus:
@@ -167,7 +179,7 @@ services:
 - --config.file=/etc/prometheus/prometheus.yml
 - --storage.tsdb.retention.time=15d
 volumes:
-- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
+- ${SCRABBLE_CONFIG_DIR:-.}/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
 - prometheus-data:/prometheus
 networks: [internal]
@@ -177,7 +189,7 @@ services:
 restart: unless-stopped
 command: ["-config.file=/etc/tempo/tempo.yaml"]
 volumes:
-- ./tempo/tempo.yaml:/etc/tempo/tempo.yaml:ro
+- ${SCRABBLE_CONFIG_DIR:-.}/tempo/tempo.yaml:/etc/tempo/tempo.yaml:ro
 - tempo-data:/var/tempo
 networks: [internal]
@@ -198,8 +210,11 @@ services:
 GF_USERS_ALLOW_SIGN_UP: "false"
 GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:-admin}
 volumes:
-- ./grafana/provisioning:/etc/grafana/provisioning:ro
-- ./grafana/dashboards:/var/lib/grafana/dashboards:ro
+- ${SCRABBLE_CONFIG_DIR:-.}/grafana/provisioning:/etc/grafana/provisioning:ro
+# Dashboards live under /etc/grafana (NOT /var/lib/grafana, which the
+# grafana-data volume mounts over — a nested bind there is shadowed and the
+# provider logs "no such file or directory").
+- ${SCRABBLE_CONFIG_DIR:-.}/grafana/dashboards:/etc/grafana/dashboards:ro
 - grafana-data:/var/lib/grafana
 networks: [internal]
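Review note: the `${SCRABBLE_CONFIG_DIR:-.}` prefix follows the shell's `:-` default rule, which Compose interpolation also uses: the default applies when the variable is unset or empty. The sketch below mimics that rule for a single bind source; `resolve_bind_source` is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Hypothetical illustration (not in the PR) of how a bind source resolves.
# $1: the value of SCRABBLE_CONFIG_DIR (may be empty)
# $2: the path relative to the config directory
resolve_bind_source() {
    dir=${1:-.}   # fall back to "." when unset or empty, as ":-" does
    printf '%s/%s\n' "$dir" "$2"
}

resolve_bind_source "" caddy/Caddyfile
resolve_bind_source /srv/scrabble-conf grafana/dashboards
```

The two calls print `./caddy/Caddyfile` and `/srv/scrabble-conf/grafana/dashboards`; the second directory is a made-up example path. Running `docker compose config` with and without the variable set shows the values Compose itself resolves.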
@@ -11,5 +11,5 @@ providers:
 editable: true
 allowUiUpdates: true
 options:
-path: /var/lib/grafana/dashboards
+path: /etc/grafana/dashboards
 foldersFromFilesStructure: false
@@ -559,9 +559,13 @@ long-polls Telegram and egresses through a VPN sidecar, answering only internal
 The full contour (`deploy/docker-compose.yml`) runs one `gateway`, one `backend`,
 one Postgres, the connector (+ its VPN sidecar) and the **observability stack** —
 OTel Collector (OTLP/gRPC ingest → Prometheus metrics + Tempo traces) and Grafana
-with provisioned datasources and dashboards. Inter-service traffic uses a private
-`internal` network (project-scoped DNS); only caddy joins the shared external `edge`
-network (alias `scrabble`).
+with provisioned datasources and dashboards. All three services export OTLP to the
+collector; the connector shares the VPN sidecar's netns, so its `AWG_CONF` must not
+carry a `DNS=` directive (that would hijack resolv.conf and stop it resolving
+`otelcol`; without it the netns uses Docker's resolver, which resolves both
+`otelcol` and `api.telegram.org`). Inter-service traffic uses a private `internal`
+network (project-scoped DNS); only caddy joins the shared external `edge` network
+(alias `scrabble`).

 Two contours, two secret/variable prefixes (`TEST_` / `PROD_`):
 - **Test** (Stage 16): auto-deploys on a PR into — or a push to — `development`