Root cause of the Grafana "readdirent /etc/grafana/dashboards: no such file or
directory": the CI runner checks out into an ephemeral act workspace that is
removed after the job, so binding the compose config files straight from it
dangles the mounts in the long-lived containers (verified the act source dir is
emptied after the job). caddy/otelcol/prometheus/tempo read their config once at
startup so they survive, but would break on a restart — same latent bug.
Fix (mirrors ../galaxy-game's $HOME/.galaxy-dev/monitoring): the deploy job seeds
the config dirs to a stable $HOME/.scrabble-deploy and the compose binds them via
${SCRABBLE_CONFIG_DIR:-.} (local runs keep "."). Documented in the compose header,
deploy/README.md and the ci.yaml step.
- deploy/docker-compose.yml: mount the provisioned dashboards at
/etc/grafana/dashboards, not /var/lib/grafana/dashboards — the grafana-data
volume mounts over the latter and shadows the nested bind, so the provider
logged "readdirent /var/lib/grafana/dashboards: no such file or directory".
dashboards.yaml provider path updated to match.
- Connector telemetry stays OTLP. The VPN sidecar's netns reaches the collector's
internal IP fine (connected route, off-tunnel), but the sidecar's DNS hijacks
name resolution: AWG_CONF must NOT carry a DNS= directive, else otelcol won't
resolve ("produced zero addresses"). Without DNS= the netns uses Docker's
resolver (resolves both otelcol and api.telegram.org). Documented in
deploy/README.md (AWG_CONF row + wiring note), ARCHITECTURE §13, compose comment.
- PLAN.md: new Stage 17 "Test-contour verification & defect fixes" (exercise the
deployed contour end-to-end and fix what it surfaces — connector liveness check,
path-conditional CI); the former prod-deploy stage becomes Stage 18.
- Renumber every "Stage 17" prod-deploy reference to "Stage 18" across docs,
compose, Caddyfile, ci.yaml and CLAUDE.md; the post-Stage-14 split range is now
"Stages 15–18".
- backend + gateway multi-stage distroless Dockerfiles; the gateway embeds and
serves the SPA at / and /telegram/ via go:embed (committed dist placeholder,
real build baked in by the image's node stage)
- deploy/docker-compose.yml: backend + gateway + Postgres + Telegram connector
(VPN sidecar) + OTel Collector + Prometheus (15d) + Tempo (72h) + Grafana,
fronted by a caddy owning a single /_gm Basic-Auth (admin console + Grafana
subpath); inter-service on a private network, only caddy on the edge network
- new metrics: backend accounts_created_total{kind} (robots excluded) and an
in-memory gateway active_users{window=24h,7d} gauge
- CI: single .gitea/workflows/ci.yaml (unit/integration/ui + a gated test-contour
deploy) on the new feature/* -> development -> master branch model; the old
go-unit/integration/ui-test workflows are folded in; the connector-scoped
compose is retired (superseded by deploy/)
- docs: ARCHITECTURE §11/§12/§13, root + gateway READMEs, CLAUDE.md branching,
PLAN.md (stage 16 done + refinements + Stage 17 forward-notes)