Commit Graph

8 Commits

Author SHA1 Message Date
Ilia Denisov c94cd3c3bf Stage 17 (contour round 2): Grafana Live/reload, rate-limit, iOS reconnect, hint/plaque/make-move UX
- Grafana: disable Live (GF_LIVE_MAX_CONNECTIONS=0) so its WebSocket no longer trips caddy Basic-Auth and re-prompts; admin console gains a Grafana nav link
- deploy: force-recreate config-only services so reseeded Grafana dashboards / Caddyfile are actually picked up (the move-duration panel was invisible because the bind-mount went stale)
- rate-limit: raise per-user budget 120/40 -> 300/80; UI skips reloading on the echo of the player's own move (fewer requests, no double-load)
- iOS/Telegram reconnect: suppress the connection banner while backgrounded and for a short grace after resume; reconnect silently; wire visibilitychange + pageshow/pagehide + Telegram activated/deactivated (Bot API 8.0)
- hint button disabled when 0 hints remain; nudge button shows a disabled state on your own turn
- players plaque: invert so the active seat pops (accent chip, raised) and others recede
- make-move UX: a direct  commit button (no hold/popover); the Shuffle tab becomes ↩️ Reset while tiles are pending
2026-06-06 12:33:52 +02:00
Ilia Denisov c0b46a7ca6 Stage 17: path-conditional CI behind an aggregate gate + connector liveness probe; Grafana move-duration panel
- #10 a `changes` job path-filters unit/integration/ui; an always-running `gate` job aggregates them (success-or-skipped) and becomes the only required check
- #9 deploy adds a Telegram-connector liveness probe (docker inspect: running, not restarting, stable restart count) with a VPN-handshake grace period
- #1a Game-domain dashboard gains a 'Move think-time by phase (p50/p95)' panel
- deploy README: branch protection now requires only CI / gate
2026-06-06 10:05:01 +02:00
Ilia Denisov 831ecd0cab Fix dangling config binds: seed configs to a stable host path
CI / unit (pull_request) Successful in 8s
CI / integration (pull_request) Successful in 11s
CI / ui (pull_request) Successful in 19s
CI / deploy (pull_request) Successful in 20s
Root cause of the Grafana "readdirent /etc/grafana/dashboards: no such file or
directory": the CI runner checks out into an ephemeral act workspace that is
removed after the job, so binding the compose config files straight from it
dangles the mounts in the long-lived containers (verified the act source dir is
emptied after the job). caddy/otelcol/prometheus/tempo read their config once at
startup so they survive, but would break on a restart — same latent bug.

Fix (mirrors ../galaxy-game's $HOME/.galaxy-dev/monitoring): the deploy job seeds
the config dirs to a stable $HOME/.scrabble-deploy and the compose binds them via
${SCRABBLE_CONFIG_DIR:-.} (local runs keep "."). Documented in the compose header,
deploy/README.md and the ci.yaml step.
2026-06-05 17:42:21 +02:00
Ilia Denisov 4a07d48a7b Fix Grafana dashboards mount; keep connector OTLP (AWG_CONF must omit DNS=)
CI / unit (pull_request) Successful in 8s
CI / integration (pull_request) Successful in 11s
CI / ui (pull_request) Successful in 20s
CI / deploy (pull_request) Successful in 19s
- deploy/docker-compose.yml: mount the provisioned dashboards at
  /etc/grafana/dashboards, not /var/lib/grafana/dashboards — the grafana-data
  volume mounts over the latter and shadows the nested bind, so the provider
  logged "readdirent /var/lib/grafana/dashboards: no such file or directory".
  dashboards.yaml provider path updated to match.
- Connector telemetry stays OTLP. The VPN sidecar's netns reaches the collector's
  internal IP fine (connected route, off-tunnel), but the sidecar's DNS hijacks
  name resolution: AWG_CONF must NOT carry a DNS= directive, else otelcol won't
  resolve ("produced zero addresses"). Without DNS= the netns uses Docker's
  resolver (resolves both otelcol and api.telegram.org). Documented in
  deploy/README.md (AWG_CONF row + wiring note), ARCHITECTURE §13, compose comment.
2026-06-05 17:34:33 +02:00
Ilia Denisov efbaf657c6 Stage 16: insert Stage 17 (test-contour verification); renumber prod deploy to 18
CI / unit (pull_request) Successful in 9s
CI / integration (pull_request) Successful in 12s
CI / ui (pull_request) Successful in 20s
CI / deploy (pull_request) Successful in 21s
- PLAN.md: new Stage 17 "Test-contour verification & defect fixes" (exercise the
  deployed contour end-to-end and fix what it surfaces — connector liveness check,
  path-conditional CI); the former prod-deploy stage becomes Stage 18.
- Renumber every "Stage 17" prod-deploy reference to "Stage 18" across docs,
  compose, Caddyfile, ci.yaml and CLAUDE.md; the post-Stage-14 split range is now
  "Stages 15–18".
2026-06-05 16:57:17 +02:00
Ilia Denisov 0ea35fe991 Stage 16: connector test-env via UseTestEnvironment; pin it in the test contour
CI / unit (pull_request) Successful in 8s
CI / integration (pull_request) Successful in 10s
CI / ui (pull_request) Successful in 20s
CI / deploy (pull_request) Successful in 30s
- bot.New now selects Telegram's test environment with the library's native
  tgbot.UseTestEnvironment() instead of a token += "/test" hack (functionally
  identical URL /bot<token>/test/METHOD, but idiomatic) + a bot test asserting
  the getMe path for both test and prod.
- ci.yaml pins TELEGRAM_TEST_ENV=true for the test contour (it IS the test
  environment) instead of a TEST_TELEGRAM_TEST_ENV variable: removes the
  confusing double-TEST, telegram-specific, prefixed operator knob and the
  secret-vs-variable footgun. Prod (Stage 17) leaves it false.
- deploy/README.md + PLAN.md updated.
2026-06-05 16:44:10 +02:00
Ilia Denisov ee8d4fd85e Stage 16: deploy/README.md — full environment-variable reference
CI / unit (pull_request) Successful in 9s
CI / integration (pull_request) Successful in 10s
CI / ui (pull_request) Successful in 20s
CI / deploy (pull_request) Successful in 20s
- deploy/README.md documents the services, how to run it locally and in CI, and
  every variable: required (the four :? ones + ≥1 bot token) and optional with
  defaults, marked secret-vs-variable and with the TEST_/PROD_ Gitea mapping;
  plus the fixed internal wiring and the host-side setup.
- ci.yaml maps the remaining POSTGRES_DB/USER, DICT_VERSION and LOG_LEVEL (unset
  renders empty -> the compose ":-" defaults apply), so every documented var is
  per-contour overridable.
- .env.example points at the README for the full reference.
2026-06-05 12:01:31 +02:00
Ilia Denisov 8700fbfae1 Stage 16: deploy infra & test contour
CI / unit (pull_request) Successful in 9s
CI / integration (pull_request) Successful in 11s
CI / ui (pull_request) Successful in 19s
CI / deploy (pull_request) Failing after 1s
- backend + gateway multi-stage distroless Dockerfiles; the gateway embeds and
  serves the SPA at / and /telegram/ via go:embed (committed dist placeholder,
  real build baked in by the image's node stage)
- deploy/docker-compose.yml: backend + gateway + Postgres + Telegram connector
  (VPN sidecar) + OTel Collector + Prometheus (15d) + Tempo (72h) + Grafana,
  fronted by a caddy owning a single /_gm Basic-Auth (admin console + Grafana
  subpath); inter-service on a private network, only caddy on the edge network
- new metrics: backend accounts_created_total{kind} (robots excluded) and an
  in-memory gateway active_users{window=24h,7d} gauge
- CI: single .gitea/workflows/ci.yaml (unit/integration/ui + a gated test-contour
  deploy) on the new feature/* -> development -> master branch model; the old
  go-unit/integration/ui-test workflows are folded in; the connector-scoped
  compose is retired (superseded by deploy/)
- docs: ARCHITECTURE §11/§12/§13, root + gateway READMEs, CLAUDE.md branching,
  PLAN.md (stage 16 done + refinements + Stage 17 forward-notes)
2026-06-05 11:42:26 +02:00