Root cause of the Grafana error "readdirent /etc/grafana/dashboards: no such file or
directory": the CI runner checks out into an ephemeral act workspace that is
removed after the job, so binding the compose config files straight from it
leaves the mounts in the long-lived containers dangling (verified: the act source
dir is emptied after the job). caddy, otelcol, prometheus and tempo read their
config once at startup, so they survive, but they would break on a restart; it is
the same latent bug.
Fix (mirrors ../galaxy-game's $HOME/.galaxy-dev/monitoring): the deploy job seeds
the config dirs to a stable $HOME/.scrabble-deploy, and the compose file binds them
via ${SCRABBLE_CONFIG_DIR:-.} (local runs keep "."). Documented in the compose
header, deploy/README.md and the ci.yaml step.
deploy
The full Scrabble contour: backend + gateway + Postgres + the Telegram
connector (with a VPN sidecar) + the observability stack (OTel Collector →
Prometheus + Tempo → Grafana), fronted by a caddy that owns a single /_gm
Basic-Auth gate (the admin console + Grafana). The topology and the decision
record are in ../docs/ARCHITECTURE.md §13; this file is the operational
reference for every environment variable.
Services
| Service | Image | Role |
|---|---|---|
| caddy | caddy:2-alpine | Edge proxy (alias scrabble on the edge network): single /_gm Basic-Auth → admin console + Grafana; everything else → gateway. TLS per CADDY_SITE_ADDRESS. |
| gateway | built (gateway/Dockerfile) | Public edge; serves the embedded SPA at / and /telegram/; Connect-RPC edge. |
| backend | built (backend/Dockerfile) | Domain service; bakes in the DAWG dictionaries; runs migrations at boot. |
| postgres | postgres:17-alpine | Database (named volume, pg_isready healthcheck). |
| vpn + telegram | sidecar + built (platform/telegram/Dockerfile) | Telegram connector; egresses through the AmneziaWG sidecar; internal gRPC at telegram:9091. |
| otelcol | otel/opentelemetry-collector-contrib | OTLP/gRPC :4317 → Prometheus scrape (:9464) + Tempo. |
| prometheus | prom/prometheus | Metrics, 15d retention. |
| tempo | grafana/tempo | Traces, 72h retention. |
| grafana | grafana/grafana | Dashboards (provisioned), anonymous-admin behind caddy's /_gm/grafana. |
Networking: inter-service traffic is on the private internal network
(project-scoped DNS); only caddy joins the shared external edge network so the
host caddy can reach it at scrabble:80. edge must already exist on the host
(docker network create edge).
Run it
Locally: copy the template, fill in the required values, and bring it up:

    cp deploy/.env.example deploy/.env   # then edit deploy/.env
    docker network create edge           # once, if it does not exist
    cd deploy && docker compose up -d --build

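The docker network create edge line above fails when the network already exists. For a bootstrap script that runs it unconditionally, here is a minimal idempotent sketch built on the standard docker network inspect / docker network create subcommands; the ensure_network name is hypothetical, not something this repo ships:

```sh
# ensure_network NAME: create the Docker network NAME unless it already exists.
# Hypothetical helper; `docker network inspect` exits non-zero for an unknown network.
ensure_network() {
  if docker network inspect "$1" >/dev/null 2>&1; then
    return 0
  fi
  docker network create "$1"
}
```

Usage: ensure_network edge, safe to repeat on every run.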
In CI (the test contour), the deploy job in .gitea/workflows/ci.yaml maps the
Gitea TEST_-prefixed secrets and variables onto the unprefixed names below and
runs docker compose up -d --build on the runner host. Stage 18 (prod) maps the
PROD_ set the same way. So a Gitea secret named TEST_POSTGRES_PASSWORD
feeds the compose's POSTGRES_PASSWORD, and so on.
The deploy job also seeds the config files (caddy, otelcol, prometheus,
tempo, grafana) to a stable host path ($HOME/.scrabble-deploy) and points
SCRABBLE_CONFIG_DIR at it before running docker compose up. The runner's checkout
is an ephemeral act workspace that is removed after the job, so binding config
straight from it would leave dangling mounts in the long-lived containers (Grafana
would log "no such file or directory"). Locally SCRABBLE_CONFIG_DIR defaults to .,
so the compose file binds from this directory.
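A minimal sketch of the seeding step, assuming it runs from deploy/ and that the five configs live in same-named subdirectories there (caddy/, otelcol/, prometheus/, tempo/, grafana/). That layout and the seed_config name are assumptions; the real ci.yaml step may differ:

```sh
# seed_config [DEST]: copy the bind-mounted config dirs to a stable host path
# and point the compose file at it. Hypothetical sketch: assumes the configs
# live in ./caddy, ./otelcol, ./prometheus, ./tempo and ./grafana.
seed_config() {
  dest=${1:-"$HOME/.scrabble-deploy"}
  for d in caddy otelcol prometheus tempo grafana; do
    mkdir -p "$dest/$d" || return 1
    # Copy the contents into the existing directory instead of deleting and
    # recreating it, so bind mounts held by running containers stay live.
    cp -R "$d/." "$dest/$d/" || return 1
  done
  SCRABBLE_CONFIG_DIR=$dest
  export SCRABBLE_CONFIG_DIR
}
```

After seed_config, docker compose up -d --build resolves a bind source such as ${SCRABBLE_CONFIG_DIR:-.}/grafana to the stable copy; without it, the same expression falls back to deploy/. Copying into place rather than replacing the directory is our reasoning for why redeploys do not recreate the dangling-mount bug.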
Required variables
docker compose aborts immediately if any of these is unset or empty (they use the ${VAR:?} form):
| Variable | Gitea kind | Purpose |
|---|---|---|
| POSTGRES_PASSWORD | secret | Postgres password (also embedded in BACKEND_POSTGRES_DSN). |
| AWG_CONF | secret | AmneziaWG config for the VPN sidecar (the connector's only egress). Must not contain a DNS= line: it hijacks the shared netns's resolv.conf and breaks the connector's resolution of otelcol (telemetry export). Without it, Docker's resolver handles both otelcol and api.telegram.org. |
| GM_BASICAUTH_HASH | secret | bcrypt hash gating /_gm (admin console + Grafana). Generate with `docker run --rm caddy:2-alpine caddy hash-password --plaintext '<pw>'`. |
| TELEGRAM_MINIAPP_URL | variable | The Mini App URL the connector hands out in deep links and buttons. |
Plus at least one bot token, TELEGRAM_BOT_TOKEN_EN or TELEGRAM_BOT_TOKEN_RU
(both secrets). Compose cannot express "one of", so both default to empty, but the
connector fails at boot if both are empty.
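Compose only enforces the ${VAR:?} rule; the "one of" token rule and the no-DNS= rule for AWG_CONF surface later, at connector boot or as missing telemetry. A hypothetical preflight sketch (the preflight name is ours, not part of the repo) that checks all three before docker compose up, run in the shell that holds the deploy variables:

```sh
# preflight: fail fast on rules compose cannot express itself.
# Hypothetical helper (not part of the repo).
preflight() {
  rc=0
  # Same rule as compose's ${VAR:?}: unset and empty both count as missing.
  for v in POSTGRES_PASSWORD AWG_CONF GM_BASICAUTH_HASH TELEGRAM_MINIAPP_URL; do
    eval "val=\${$v-}"
    if [ -z "$val" ]; then
      echo "missing required variable: $v" >&2
      rc=1
    fi
  done
  # "One of" rule: at least one bot token must be non-empty.
  if [ -z "${TELEGRAM_BOT_TOKEN_EN-}" ] && [ -z "${TELEGRAM_BOT_TOKEN_RU-}" ]; then
    echo "set TELEGRAM_BOT_TOKEN_EN and/or TELEGRAM_BOT_TOKEN_RU" >&2
    rc=1
  fi
  # A DNS= line would hijack resolv.conf in the netns shared with the connector.
  if printf '%s\n' "${AWG_CONF-}" | grep -Eqi '^[[:space:]]*DNS[[:space:]]*='; then
    echo "AWG_CONF must not contain a DNS= line" >&2
    rc=1
  fi
  return "$rc"
}
```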
Optional variables (with defaults)
| Variable | Gitea kind | Default | Purpose |
|---|---|---|---|
| POSTGRES_DB | variable | scrabble | Database name. |
| POSTGRES_USER | variable | scrabble | Database user. |
| DICT_VERSION | variable | v1.0.0 | scrabble-dictionary release tag baked into the backend image (build-arg). |
| LOG_LEVEL | variable | info | Shared log level for backend / gateway / connector (`debug`, `info`, `warn` or `error`). |
| CADDY_SITE_ADDRESS | variable | :80 | Caddy site address. Test: :80 (the host caddy terminates TLS). Prod: a domain, so caddy does its own ACME. |
| GM_BASICAUTH_USER | variable | gm | Username for the /_gm Basic-Auth. |
| GRAFANA_ROOT_URL | variable | /_gm/grafana/ | Grafana root URL (sub-path serving). Behind a real domain, set the full `https://<domain>/_gm/grafana/`. |
| GRAFANA_ADMIN_PASSWORD | secret | admin | Grafana admin password. Low impact (the login form is disabled and access is anonymous-admin behind caddy), but set it anyway. |
| TELEGRAM_GAME_CHANNEL_ID_EN | variable | (empty) | English game-channel id; empty or 0 disables channel posts. |
| TELEGRAM_GAME_CHANNEL_ID_RU | variable | (empty) | Russian game-channel id; empty or 0 disables channel posts. |
| TELEGRAM_TEST_ENV | pinned | false | true routes the bot through Telegram's test environment (`.../bot<token>/test/METHOD`). The CI test contour pins it to true in ci.yaml (the contour is the test environment), so it is not a Gitea variable. Set it in .env for a local run; prod (Stage 18) leaves it false. |
| TELEGRAM_API_BASE_URL | variable | (empty) | Overrides the Bot API host (a mock or self-hosted server); empty means https://api.telegram.org. |
| GATEWAY_DEFAULT_SUPPORTED_LANGUAGES | variable | en,ru | Variant-gating set for non-Telegram logins (web/email/guest). |
| VITE_TELEGRAM_BOT_ID | variable | (empty) | UI build-arg: numeric bot id for the web Login Widget. |
| VITE_TELEGRAM_LINK | variable | (empty) | UI build-arg: deep-link base for share-to-Telegram (e.g. `https://t.me/<bot>/<app>`). |
| VITE_GATEWAY_URL | variable | (empty) | UI build-arg: gateway origin; empty means same-origin (the usual single-origin deploy). |
The three VITE_* are build-args baked into the gateway image at build time, so
changing them requires a rebuild (--build), not just a restart.
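To make the TELEGRAM_TEST_ENV and TELEGRAM_API_BASE_URL rows of the optional-variables table concrete, here is a sketch of the Bot API method URL they produce. The `/bot<token>/test/METHOD` form is Telegram's documented test-environment scheme; the tg_method_url helper itself is hypothetical and is not the connector's code:

```sh
# tg_method_url TOKEN METHOD: print the Bot API URL for METHOD.
# Hypothetical helper mirroring the URL shapes:
#   production:       BASE/bot<token>/METHOD
#   test environment: BASE/bot<token>/test/METHOD
tg_method_url() {
  token=$1
  method=$2
  # Empty TELEGRAM_API_BASE_URL means the public Bot API host.
  base=${TELEGRAM_API_BASE_URL:-https://api.telegram.org}
  base=${base%/}
  if [ "${TELEGRAM_TEST_ENV:-false}" = true ]; then
    printf '%s/bot%s/test/%s\n' "$base" "$token" "$method"
  else
    printf '%s/bot%s/%s\n' "$base" "$token" "$method"
  fi
}
```

For example, with TELEGRAM_TEST_ENV=true and no base override, tg_method_url 123:abc getMe prints https://api.telegram.org/bot123:abc/test/getMe.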
Fixed internal wiring (not operator-set)
These are hard-wired in docker-compose.yml (no ${...} interpolation) and point the
services at each other on the internal network. They are listed here so they are
not mistaken for missing config:

- BACKEND_POSTGRES_DSN → postgres, search_path=backend
- GATEWAY_BACKEND_HTTP_URL / _GRPC_ADDR → backend
- GATEWAY_CONNECTOR_ADDR / BACKEND_CONNECTOR_ADDR → telegram:9091
- all three services' `*_OTEL_*_EXPORTER=otlp`, with OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4317 (_INSECURE=true)
- `GATEWAY_ADMIN_*` is intentionally unset: caddy owns /_gm in the contour.

The connector shares the VPN sidecar's netns. Routing to the collector's internal
IP works (it is a connected route), but AWG_CONF must not set a DNS= directive:
that hijacks resolv.conf and breaks resolution of otelcol ("produced zero
addresses"). Without it the netns uses Docker's resolver, which resolves both
otelcol and api.telegram.org.
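If a provider-generated AmneziaWG config ships with a DNS= line, it can be filtered out before the config goes into the AWG_CONF secret. A sketch, assuming the usual wg-quick style "Key = value" syntax; strip_dns is a hypothetical name:

```sh
# strip_dns: read a WireGuard/AmneziaWG config on stdin and print it
# without DNS= lines. Hypothetical helper; matches "DNS = ..." with optional
# surrounding spaces, in any letter case.
strip_dns() {
  sed '/^[[:space:]]*[Dd][Nn][Ss][[:space:]]*=/d'
}
```

Usage: strip_dns < provider.conf > awg.conf (both file names are placeholders), then store the contents of awg.conf as the secret.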
Host-side setup (outside this repo)
- The edge network must exist on the host (docker network create edge).
- Host caddy route `<domain>` → scrabble:80 (the in-compose caddy serves HTTP in the test contour; the host caddy terminates TLS). Not needed on prod, where the contour caddy owns TLS (set CADDY_SITE_ADDRESS to the domain).
- Branch protection required-status-check names are CI / unit, CI / integration and CI / ui (see ../CLAUDE.md, "Branching & CI").