- Grafana: disable Live (GF_LIVE_MAX_CONNECTIONS=0) so its WebSocket no longer trips caddy Basic-Auth and re-prompts; admin console gains a Grafana nav link
- deploy: force-recreate config-only services so reseeded Grafana dashboards / Caddyfile are actually picked up (the move-duration panel was invisible because the bind-mount went stale)
- rate-limit: raise per-user budget 120/40 -> 300/80; UI skips reloading on the echo of the player's own move (fewer requests, no double-load)
- iOS/Telegram reconnect: suppress the connection banner while backgrounded and for a short grace after resume; reconnect silently; wire visibilitychange + pageshow/pagehide + Telegram activated/deactivated (Bot API 8.0)
- hint button disabled when 0 hints remain; nudge button shows a disabled state on your own turn
- players plaque: invert so the active seat pops (accent chip, raised) and others recede
- make-move UX: a direct ✅ commit button (no hold/popover); the Shuffle tab becomes ↩️ Reset while tiles are pending
Root cause of the Grafana "readdirent /etc/grafana/dashboards: no such file or
directory": the CI runner checks out into an ephemeral act workspace that is
removed after the job, so binding the compose config files straight from it
dangles the mounts in the long-lived containers (verified the act source dir is
emptied after the job). caddy/otelcol/prometheus/tempo read their config once at
startup so they survive, but would break on a restart — same latent bug.
Fix (mirrors ../galaxy-game's $HOME/.galaxy-dev/monitoring): the deploy job seeds
the config dirs to a stable $HOME/.scrabble-deploy and the compose binds them via
${SCRABBLE_CONFIG_DIR:-.} (local runs keep "."). Documented in the compose header,
deploy/README.md and the ci.yaml step.
- deploy/docker-compose.yml: mount the provisioned dashboards at
/etc/grafana/dashboards, not /var/lib/grafana/dashboards — the grafana-data
volume mounts over the latter and shadows the nested bind, so the provider
logged "readdirent /var/lib/grafana/dashboards: no such file or directory".
dashboards.yaml provider path updated to match.
- Connector telemetry stays OTLP. The VPN sidecar's netns reaches the collector's
internal IP fine (connected route, off-tunnel), but the sidecar's DNS hijacks
name resolution: AWG_CONF must NOT carry a DNS= directive, else otelcol won't
resolve ("produced zero addresses"). Without DNS= the netns uses Docker's
resolver (resolves both otelcol and api.telegram.org). Documented in
deploy/README.md (AWG_CONF row + wiring note), ARCHITECTURE §13, compose comment.
- PLAN.md: new Stage 17 "Test-contour verification & defect fixes" (exercise the
deployed contour end-to-end and fix what it surfaces — connector liveness check,
path-conditional CI); the former prod-deploy stage becomes Stage 18.
- Renumber every "Stage 17" prod-deploy reference to "Stage 18" across docs,
compose, Caddyfile, ci.yaml and CLAUDE.md; the post-Stage-14 split range is now
"Stages 15–18".
- backend + gateway multi-stage distroless Dockerfiles; the gateway embeds and
serves the SPA at / and /telegram/ via go:embed (committed dist placeholder,
real build baked in by the image's node stage)
- deploy/docker-compose.yml: backend + gateway + Postgres + Telegram connector
(VPN sidecar) + OTel Collector + Prometheus (15d) + Tempo (72h) + Grafana,
fronted by a caddy owning a single /_gm Basic-Auth (admin console + Grafana
subpath); inter-service on a private network, only caddy on the edge network
- new metrics: backend accounts_created_total{kind} (robots excluded) and an
in-memory gateway active_users{window=24h,7d} gauge
- CI: single .gitea/workflows/ci.yaml (unit/integration/ui + a gated test-contour
deploy) on the new feature/* -> development -> master branch model; the old
go-unit/integration/ui-test workflows are folded in; the connector-scoped
compose is retired (superseded by deploy/)
- docs: ARCHITECTURE §11/§12/§13, root + gateway READMEs, CLAUDE.md branching,
PLAN.md (stage 16 done + refinements + Stage 17 forward-notes)