Observability: replace cAdvisor (which resolves only the root cgroup on the contour host, with its separate-XFS /var/lib/docker) with the otelcol docker_stats receiver, which reads per-container CPU/memory/network straight from the Docker API and works the same in prod. The collector joins the host docker group (DOCKER_GID, default 989) and mounts the socket read-only; its metrics flow out through the existing prometheus exporter, so the cAdvisor scrape job and the privileged cAdvisor service are removed. The Resources dashboard panels are retargeted to the docker_stats metric names (container_name label; container.cpu.utilization/100 == cores).

Container limits: apply deploy.resources.limits (honoured by Compose v2) across the contour and pin GOMAXPROCS to the CPU limit on the Go services so the runtime matches the cgroup quota. Starting values are generous over the R2 peak (~1 core / <=100 MiB per app service) to avoid skewing or OOM-killing the measurement run; they are tightened to the agreed prod sizing after the final stress run (R7 Round 2). The privileged VPN sidecar is left unconstrained.
deploy
The full Scrabble contour: backend + gateway + the static landing + Postgres +
the Telegram connector (with a VPN sidecar) + the observability stack (OTel
Collector → Prometheus + Tempo → Grafana), fronted by a caddy that owns a single
/_gm Basic-Auth (the admin console + Grafana). Topology and the decision record are in
../docs/ARCHITECTURE.md §13; this file is the
operational reference for every environment variable.
Services
| Service | Image | Role |
|---|---|---|
| caddy | caddy:2-alpine | Edge proxy (alias scrabble on edge): single /_gm Basic-Auth → admin console + Grafana; /app/, /telegram/ + the Connect path → gateway; the catch-all (incl. /) → landing. TLS per CADDY_SITE_ADDRESS. |
| gateway | built (gateway/Dockerfile, target gateway) | Public edge; serves the embedded game SPA at /app/ + /telegram/; Connect-RPC edge. / redirects to /app/. |
| landing | built (gateway/Dockerfile, target landing) | Static landing page at / (caddy:2-alpine + the shared Vite build, deploy/landing/Caddyfile); absorbs stray public paths. |
| backend | built (backend/Dockerfile) | Domain service; bakes in the DAWG dictionaries; runs migrations at boot. |
| postgres | postgres:17-alpine | Database (named volume, pg_isready healthcheck). |
| vpn + telegram | sidecar + built (platform/telegram/Dockerfile) | Telegram connector; egresses through the AmneziaWG sidecar; internal gRPC at telegram:9091. |
| otelcol | otel/opentelemetry-collector-contrib | OTLP/gRPC :4317 → Prometheus scrape (:9464) + Tempo. |
| prometheus | prom/prometheus | Metrics, 15d retention. |
| tempo | grafana/tempo | Traces, 72h retention. |
| grafana | grafana/grafana | Dashboards (provisioned), anonymous-admin behind caddy's /_gm/grafana. |
Networking: inter-service traffic is on the private internal network
(project-scoped DNS); only caddy joins the shared external edge network so the
host caddy can reach it at scrabble:80. edge must already exist on the host
(docker network create edge).
Run it
Locally — copy the template, fill the required values, bring it up:

    cp deploy/.env.example deploy/.env   # then edit deploy/.env
    docker network create edge           # once, if it does not exist
    cd deploy && docker compose up -d --build

In CI (the test contour) — .gitea/workflows/ci.yaml's deploy job maps the
Gitea TEST_-prefixed secrets/variables onto the unprefixed names below and
runs docker compose up -d --build on the runner host. For example, a Gitea
secret named TEST_POSTGRES_PASSWORD feeds the compose's POSTGRES_PASSWORD.
The prod deploy maps the PROD_ set the same way.
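
The effect of the prefix mapping can be sketched in plain shell. This is only an illustration of the idea, not the contents of ci.yaml: `map_prefixed_env` is a hypothetical helper, and the values below are placeholders.

```shell
# Sketch only: the real mapping lives in .gitea/workflows/ci.yaml.
# map_prefixed_env is a hypothetical helper, not part of the repo.
map_prefixed_env() {
  prefix=$1
  shift
  for name in "$@"; do
    # Indirect lookup of ${<prefix><name>}; names are trusted literals here.
    eval "value=\${${prefix}${name}-}"
    export "$name=$value"
  done
}

# Stand-ins for values the CI job would receive from Gitea.
TEST_POSTGRES_PASSWORD='example-only'
TEST_LOG_LEVEL='debug'

map_prefixed_env TEST_ POSTGRES_PASSWORD LOG_LEVEL
echo "LOG_LEVEL=$LOG_LEVEL"
```

Calling the same helper with `PROD_` as the first argument would model the prod deploy.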
The deploy job also seeds the config files (caddy, otelcol, prometheus,
tempo, grafana) to a stable host path ($HOME/.scrabble-deploy) and sets
SCRABBLE_CONFIG_DIR to it before up. The runner's checkout is an ephemeral act
workspace that is removed after the job — binding config straight from it would
dangle the mounts in the long-lived containers (Grafana would log
no such file or directory). Locally SCRABBLE_CONFIG_DIR defaults to ., so the
compose binds from this directory.
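
The seeding step described above might look roughly like the following. It is a sketch under assumptions: `seed_config` is a hypothetical helper, and the layout of one same-named directory per component is assumed rather than taken from the repo. The demo uses temporary directories instead of $HOME/.scrabble-deploy so it is safe to run anywhere.

```shell
# Hypothetical seeding helper: copy config from an ephemeral checkout to a
# stable directory, then point compose at the stable copy.
# Assumption: each component keeps its config in a same-named directory.
seed_config() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  for d in caddy otelcol prometheus tempo grafana; do
    if [ -d "$src/$d" ]; then
      rm -rf "$dest/$d"            # drop the previous deploy's copy
      cp -R "$src/$d" "$dest/$d"
    fi
  done
  SCRABBLE_CONFIG_DIR=$dest
  export SCRABBLE_CONFIG_DIR
}

# Demo with throwaway directories (CI would use $HOME/.scrabble-deploy).
workspace=$(mktemp -d)
stable=$(mktemp -d)
mkdir -p "$workspace/grafana"
echo 'placeholder' > "$workspace/grafana/example.yaml"

seed_config "$workspace" "$stable"
rm -rf "$workspace"                # the checkout disappears after the job
ls "$SCRABBLE_CONFIG_DIR/grafana"  # the seeded copy is still there
```

Because the bind mounts resolve against SCRABBLE_CONFIG_DIR, removing the workspace afterwards no longer affects the running containers.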
Required variables
docker compose aborts immediately if any of these is unset (they use :?):
| Variable | Gitea kind | Purpose |
|---|---|---|
| POSTGRES_PASSWORD | secret | Postgres password (also embedded in BACKEND_POSTGRES_DSN). |
| AWG_CONF | secret | AmneziaWG config for the VPN sidecar (the connector's only egress). Must not contain a DNS= line: it hijacks the shared netns's resolv.conf and breaks the connector resolving otelcol (telemetry export). Without it, Docker's resolver handles both otelcol and api.telegram.org. |
| GM_BASICAUTH_HASH | secret | bcrypt hash gating /_gm (admin console + Grafana). Generate with docker run --rm caddy:2-alpine caddy hash-password --plaintext '<pw>'. |
| TELEGRAM_MINIAPP_URL | variable | The Mini App URL the connector hands out in deep links / buttons. |
Plus at least one bot token — TELEGRAM_BOT_TOKEN_EN or TELEGRAM_BOT_TOKEN_RU
(secrets). Compose cannot express "one of", so they default to empty, but the
connector fails at boot if both are empty.
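
A pre-flight check run before docker compose up can catch both cases early. The sketch below mirrors the `:?` rule (fail when a variable is unset or empty) and adds the "at least one bot token" rule that compose cannot express. `check_required` is a hypothetical helper, not something the repo ships.

```shell
# Hypothetical pre-flight check for deploy/.env values already exported
# into the environment. Returns 0 when everything required is present.
check_required() {
  missing=0
  for name in POSTGRES_PASSWORD AWG_CONF GM_BASICAUTH_HASH TELEGRAM_MINIAPP_URL; do
    eval "value=\${$name-}"        # indirect lookup; names are fixed literals
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  # "One of" rule: at least one bot token must be non-empty.
  if [ -z "${TELEGRAM_BOT_TOKEN_EN-}" ] && [ -z "${TELEGRAM_BOT_TOKEN_RU-}" ]; then
    echo "set TELEGRAM_BOT_TOKEN_EN or TELEGRAM_BOT_TOKEN_RU" >&2
    missing=1
  fi
  return "$missing"
}

check_required || echo "fix deploy/.env before running docker compose up"
```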
Optional variables (with defaults)
| Variable | Gitea kind | Default | Purpose |
|---|---|---|---|
| POSTGRES_DB | variable | scrabble | Database name. |
| POSTGRES_USER | variable | scrabble | Database user. |
| DICT_VERSION | variable | v1.0.0 | scrabble-dictionary release tag baked into the backend image (build-arg). |
| LOG_LEVEL | variable | info | Shared log level for backend / gateway / connector (debug\|info\|warn\|error). |
| CADDY_SITE_ADDRESS | variable | :80 | Caddy site address. Test: :80 (host caddy terminates TLS). Prod: a domain, so caddy does its own ACME. |
| GM_BASICAUTH_USER | variable | gm | Username for the /_gm Basic-Auth. |
| GRAFANA_ROOT_URL | variable | /_gm/grafana/ | Grafana root URL (sub-path serving). Set the full https://<domain>/_gm/grafana/ behind a real domain. |
| GRAFANA_ADMIN_PASSWORD | secret | admin | Grafana admin password. Low impact (the login form is disabled, access is anonymous-admin behind caddy) but set it anyway. |
| TELEGRAM_GAME_CHANNEL_ID_EN | variable | (empty) | English game-channel id; empty/0 disables channel posts. |
| TELEGRAM_GAME_CHANNEL_ID_RU | variable | (empty) | Russian game-channel id; empty/0 disables channel posts. |
| TELEGRAM_TEST_ENV | pinned | false | true routes the bot through Telegram's test environment (.../bot<token>/test/METHOD). The CI test contour pins this to true in ci.yaml (the contour is the test environment); it is not a Gitea variable. Set it in .env for a local run; prod leaves it false. |
| TELEGRAM_API_BASE_URL | variable | (empty) | Override the Bot API host (a mock/self-hosted server); empty = https://api.telegram.org. |
| GATEWAY_DEFAULT_SUPPORTED_LANGUAGES | variable | en,ru | Variant-gating set for non-Telegram logins (web/email/guest). |
| VITE_TELEGRAM_BOT_ID | variable | (empty) | UI build-arg: numeric bot id for the web Login Widget. |
| VITE_TELEGRAM_LINK | variable | (empty) | UI build-arg: deep-link base for share-to-Telegram (e.g. https://t.me/<bot>/<app>). |
| VITE_TELEGRAM_GAME_CHANNEL_NAME_EN | variable | (empty) | UI build-arg: the landing "Play in Telegram" link for the English bot (e.g. https://t.me/Scrabble_Game). |
| VITE_TELEGRAM_GAME_CHANNEL_NAME_RU | variable | (empty) | UI build-arg: the landing "Play in Telegram" link for the Russian bot (e.g. https://t.me/Erudit_Game). |
| VITE_GATEWAY_URL | variable | (empty) | UI build-arg: gateway origin; empty = same-origin (the usual single-origin deploy). |
The five VITE_* variables are build-args baked into the gateway and landing
images at build time. Both targets share one UI build stage, so keep the args
identical and the stage is built only once. Changing any of them requires a
rebuild (--build), not just a restart.
Fixed internal wiring (not operator-set)
These are hard-wired in docker-compose.yml (no ${...}), pointing the services
at each other on the internal network — listed here so they are not mistaken for
missing config: BACKEND_POSTGRES_DSN (→ postgres, search_path=backend),
GATEWAY_BACKEND_HTTP_URL/_GRPC_ADDR (→ backend),
GATEWAY_CONNECTOR_ADDR/BACKEND_CONNECTOR_ADDR (→ telegram:9091), and all three
services' *_OTEL_*_EXPORTER=otlp → OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4317
(_INSECURE=true). The connector shares the VPN sidecar's netns: routing to the
collector's internal IP is fine (connected route), but its AWG_CONF must not
set a DNS= directive — that hijacks resolv.conf and breaks resolving otelcol
("produced zero addresses"); without it the netns uses Docker's resolver, which
resolves both otelcol and api.telegram.org. GATEWAY_ADMIN_* is intentionally
unset — caddy owns /_gm in the contour.
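
The DNS= constraint on AWG_CONF lends itself to a quick check before the secret is stored. A minimal sketch, assuming the config is wg-quick style INI text; `awg_conf_has_dns` is a hypothetical helper, and the tolerance for leading whitespace and letter case in the pattern is an assumption.

```shell
# Hypothetical check: succeeds (exit 0) when the given config text
# contains a DNS= directive at the start of any line.
awg_conf_has_dns() {
  printf '%s\n' "$1" | grep -Eiq '^[[:space:]]*DNS[[:space:]]*='
}

# Placeholder config; the key and address are not real values.
sample='[Interface]
PrivateKey = <redacted>
DNS = 1.1.1.1'

if awg_conf_has_dns "$sample"; then
  echo "remove the DNS= line from AWG_CONF"
fi
```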
Host-side setup (outside this repo)
- edge network must exist on the host (docker network create edge).
- Host caddy route <domain> → scrabble:80 (the in-compose caddy serves HTTP in the test contour; the host caddy terminates TLS). Not needed on prod, where the contour caddy owns TLS (set CADDY_SITE_ADDRESS to the domain).
- Branch protection requires the single status check CI / gate. The unit/integration/ui jobs are path-conditional (they skip when their code did not change), and the always-running gate job aggregates them (passing when each succeeded or was skipped), so a skipped job never blocks a merge. See ../CLAUDE.md "Branching & CI".
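
The gate job's aggregation rule can be expressed in a few lines of shell. This is a sketch of the rule only, not the job as written in ci.yaml: `gate` is a hypothetical function, and the result strings (success, skipped, failure) are assumed to follow the usual Actions job-result vocabulary.

```shell
# Hypothetical aggregation: pass only if every job result is
# "success" or "skipped"; anything else fails the gate.
gate() {
  for result in "$@"; do
    case "$result" in
      success|skipped) ;;          # acceptable outcomes
      *) return 1 ;;               # failure, cancelled, or unknown
    esac
  done
  return 0
}

# unit succeeded, integration was skipped by its path filter, ui succeeded.
if gate success skipped success; then
  echo "gate: pass"
fi
```

Keeping the required status check on this single job is what lets path-filtered jobs skip without blocking a merge.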