Root cause of the Grafana "readdirent /etc/grafana/dashboards: no such file or
directory": the CI runner checks out into an ephemeral act workspace that is
removed after the job, so binding the compose config files straight from it
dangles the mounts in the long-lived containers (verified the act source dir is
emptied after the job). caddy/otelcol/prometheus/tempo read their config once at
startup so they survive, but would break on a restart — same latent bug.
Fix (mirrors ../galaxy-game's $HOME/.galaxy-dev/monitoring): the deploy job seeds
the config dirs to a stable $HOME/.scrabble-deploy and the compose binds them via
${SCRABBLE_CONFIG_DIR:-.} (local runs keep "."). Documented in the compose header,
deploy/README.md and the ci.yaml step.
- deploy/docker-compose.yml: mount the provisioned dashboards at
/etc/grafana/dashboards, not /var/lib/grafana/dashboards — the grafana-data
volume mounts over the latter and shadows the nested bind, so the provider
logged "readdirent /var/lib/grafana/dashboards: no such file or directory".
dashboards.yaml provider path updated to match.
- Connector telemetry stays OTLP. The VPN sidecar's netns reaches the collector's
internal IP fine (connected route, off-tunnel), but the sidecar's DNS hijacks
name resolution: AWG_CONF must NOT carry a DNS= directive, else otelcol won't
resolve ("produced zero addresses"). Without DNS= the netns uses Docker's
resolver (resolves both otelcol and api.telegram.org). Documented in
deploy/README.md (AWG_CONF row + wiring note), ARCHITECTURE §13, compose comment.
- PLAN.md: new Stage 17 "Test-contour verification & defect fixes" (exercise the
deployed contour end-to-end and fix what it surfaces — connector liveness check,
path-conditional CI); the former prod-deploy stage becomes Stage 18.
- Renumber every "Stage 17" prod-deploy reference to "Stage 18" across docs,
compose, Caddyfile, ci.yaml and CLAUDE.md; the post-Stage-14 split range is now
"Stages 15–18".
- bot.New now selects Telegram's test environment with the library's native
tgbot.UseTestEnvironment() instead of a token += "/test" hack (functionally
identical URL /bot<token>/test/METHOD, but idiomatic) + a bot test asserting
the getMe path for both test and prod.
- ci.yaml pins TELEGRAM_TEST_ENV=true for the test contour (it IS the test
environment) instead of a TEST_TELEGRAM_TEST_ENV variable: removes the
confusing double-TEST, telegram-specific, prefixed operator knob and the
secret-vs-variable footgun. Prod (Stage 17) leaves it false.
- deploy/README.md + PLAN.md updated.
- deploy/README.md documents the services, how to run it locally and in CI, and
every variable: required (the four :? ones + ≥1 bot token) and optional with
defaults, marked secret-vs-variable and with the TEST_/PROD_ Gitea mapping;
plus the fixed internal wiring and the host-side setup.
- ci.yaml maps the remaining POSTGRES_DB/USER, DICT_VERSION and LOG_LEVEL (unset
renders empty -> the compose ":-" defaults apply), so every documented var is
per-contour overridable.
- .env.example points at the README for the full reference.