Stage 16: deploy infra & test contour
CI / unit (pull_request) Successful in 9s
CI / integration (pull_request) Successful in 11s
CI / ui (pull_request) Successful in 19s
CI / deploy (pull_request) Failing after 1s

- backend + gateway multi-stage distroless Dockerfiles; the gateway embeds and
  serves the SPA at / and /telegram/ via go:embed (committed dist placeholder,
  real build baked in by the image's node stage)
- deploy/docker-compose.yml: backend + gateway + Postgres + Telegram connector
  (VPN sidecar) + OTel Collector + Prometheus (15d) + Tempo (72h) + Grafana,
  fronted by a caddy owning a single /_gm Basic-Auth (admin console + Grafana
  subpath); inter-service on a private network, only caddy on the edge network
- new metrics: backend accounts_created_total{kind} (robots excluded) and an
  in-memory gateway active_users{window=24h,7d} gauge
- CI: single .gitea/workflows/ci.yaml (unit/integration/ui + a gated test-contour
  deploy) on the new feature/* -> development -> master branch model; the old
  go-unit/integration/ui-test workflows are folded in; the connector-scoped
  compose is retired (superseded by deploy/)
- docs: ARCHITECTURE §11/§12/§13, root + gateway READMEs, CLAUDE.md branching,
  PLAN.md (stage 16 done + refinements + Stage 17 forward-notes)
This commit is contained in:
Ilia Denisov
2026-06-05 11:42:26 +02:00
parent 8c8f8c4d42
commit 8700fbfae1
35 changed files with 1413 additions and 318 deletions
+39 -13
View File
@@ -489,8 +489,11 @@ promotions) is future work and would deliver short markdown messages (text + lin
available for debugging; **`otlp`** (gRPC, endpoint from the standard
`OTEL_EXPORTER_OTLP_*` environment) exports to a collector. The Postgres pool is
instrumented with otelsql and `otelgrpc` traces the backend↔gateway push stream
and the gateway↔connector calls. The OTLP collector and Grafana dashboards are
stood up with the deploy (Stage 15).
and the gateway↔connector calls. The OTLP **Collector** (OTLP/gRPC → Prometheus
metrics + Tempo traces), **Prometheus** (15d), **Tempo** (72h) and **Grafana**
(provisioned datasources + dashboards, behind the caddy `/_gm/grafana` Basic-Auth)
are stood up with the deploy (`deploy/`, Stage 16); the default exporter stays
`none`, so CI needs no collector.
- Per-request server-side timing via gin middleware from day one (the access log
carries method, route, status, latency and the active trace id). A
client-measured RTT piggybacked on the next request is a later enhancement.
@@ -503,6 +506,12 @@ promotions) is future work and would deliver short markdown messages (text + lin
(the UI-perceived roundtrip, by `message_type`/`result`); and Go runtime/heap
metrics. Game-scoped metrics carry a `variant` attribute
(english/russian_scrabble/erudit).
- User metrics (Stage 16): a backend counter `accounts_created_total` (`kind` =
telegram/email/guest; robots are a provisioned pool, not users, and are excluded)
and a gateway **in-memory** observable gauge `active_users` (`window` = 24h/7d) —
distinct accounts that performed an authenticated edge action in the window. The
gauge is single-process by design (single-instance MVP, §10): it is correct for one
gateway, resets on restart, and is a live operational figure, not a billing count.
- Unauthenticated `GET /healthz` (liveness) and `GET /readyz` (readiness — the
database answers a bounded ping and the session cache is warmed).
- The backend serves a **second listener** — a gRPC server
@@ -518,7 +527,7 @@ promotions) is future work and would deliver short markdown messages (text + lin
| Session minting; email-code / guest validation | gateway (with backend) |
| Session → `user_id` resolution, `X-User-ID` injection | gateway |
| Authorisation, ownership, state transitions | backend (`X-User-ID` is the sole identity input) |
| Admin authentication | gateway validates HTTP Basic Auth (`GATEWAY_ADMIN_*`) on the public `/_gm/*` path and reverse-proxies it **verbatim** to the backend's server-rendered admin console; the backend trusts the gateway (no admin principal) and guards its state-changing POSTs with a **same-origin** check — the console's CSRF defence. No operator identity is tracked |
| Admin authentication | a single Basic-Auth gate on `/_gm/*`, forwarded **verbatim** to the backend's server-rendered admin console (and, in the deployed contour, routing `/_gm/grafana/*` to Grafana). In the deploy the **caddy** owns this gate (§13); a local non-caddy run uses the gateway's own `GATEWAY_ADMIN_*` proxy. The backend trusts the proxy (no admin principal) and guards its state-changing POSTs with a **same-origin** check — the console's CSRF defence. No operator identity is tracked |
| backend ↔ gateway ↔ connector trust | the network (only gateway may reach backend; the connector serves unauthenticated gRPC on the internal segment) |
This is an explicit, accepted MVP risk: compromise of the gateway↔backend
@@ -536,16 +545,33 @@ a dedicated redeem sub-limit or a longer code is the hardening step if abuse app
## 13. Deployment (informational)
Single public origin, path-routed: a mini-landing at the root, the **Telegram Mini
App under `/telegram/`** (the gateway serves the static UI build, wired in Stage 15;
outside Telegram that path redirects to the root), the gateway public surface and the **admin console
at `/_gm`** (backend-rendered, Basic-Auth at the gateway) share one host that
terminates TLS. The **Telegram connector** runs as a separate
container with **no public ingress** — it long-polls Telegram and egresses through a
VPN sidecar, answering only internal gRPC. MVP runs one `gateway`, one `backend`, one
Postgres, plus the connector. The connector's Docker/compose ships now
(`platform/telegram/deploy`, mirroring `../15-puzzle`); the gateway's static UI serving
and the full multi-service deploy land in Stage 15.
Single public origin, path-routed. The gateway **embeds** the static UI build
(`go:embed`, baked in by a node stage in `gateway/Dockerfile`) and serves the one
SPA at both `/` (web) and `/telegram/` (the Telegram Mini App; outside Telegram that
path redirects to the root — the client-side guard). An in-compose **caddy** is the
contour's edge: it owns a single `/_gm` Basic-Auth and routes `/_gm/grafana/*` to
**Grafana** (anonymous-admin, so the one shared login gates it with no per-user
Grafana accounts) and the rest of `/_gm/*` to the backend-rendered **admin console**;
everything else (`/`, `/telegram/`, the Connect edge) goes to the gateway. The
**Telegram connector** runs as a separate container with **no public ingress** — it
long-polls Telegram and egresses through a VPN sidecar, answering only internal gRPC.
The full contour (`deploy/docker-compose.yml`) runs one `gateway`, one `backend`,
one Postgres, the connector (+ its VPN sidecar) and the **observability stack**
OTel Collector (OTLP/gRPC ingest → Prometheus metrics + Tempo traces) and Grafana
with provisioned datasources and dashboards. Inter-service traffic uses a private
`internal` network (project-scoped DNS); only caddy joins the shared external `edge`
network (alias `scrabble`).
Two contours, two secret/variable prefixes (`TEST_` / `PROD_`):
- **Test** (Stage 16): auto-deploys on a PR into — or a push to — `development`
(`.gitea/workflows/ci.yaml``docker compose up -d --build` on the Gitea runner
host, then a `GET /` probe through caddy). The host caddy terminates TLS and
forwards the domain to `scrabble:80`, so the in-compose caddy serves plain HTTP
(`CADDY_SITE_ADDRESS=:80`).
- **Prod** (Stage 17): a manual SSH deploy after `development → master`. There is no
host caddy, so the contour ships its own caddy terminating TLS — set
`CADDY_SITE_ADDRESS` to the domain and the caddy does its own ACME.
## 14. CI & branches