Stage 16: deploy infra & test contour
- backend + gateway multi-stage distroless Dockerfiles; the gateway embeds and
serves the SPA at / and /telegram/ via go:embed (committed dist placeholder,
real build baked in by the image's node stage)
- deploy/docker-compose.yml: backend + gateway + Postgres + Telegram connector
(VPN sidecar) + OTel Collector + Prometheus (15d) + Tempo (72h) + Grafana,
fronted by a caddy owning a single /_gm Basic-Auth (admin console + Grafana
subpath); inter-service on a private network, only caddy on the edge network
- new metrics: backend accounts_created_total{kind} (robots excluded) and an
in-memory gateway active_users{window=24h,7d} gauge
- CI: single .gitea/workflows/ci.yaml (unit/integration/ui + a gated test-contour
deploy) on the new feature/* -> development -> master branch model; the old
go-unit/integration/ui-test workflows are folded in; the connector-scoped
compose is retired (superseded by deploy/)
- docs: ARCHITECTURE §11/§12/§13, root + gateway READMEs, CLAUDE.md branching,
PLAN.md (stage 16 done + refinements + Stage 17 forward-notes)
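
The first bullet's SPA serving (strip the mount prefix, serve a real asset if one exists, otherwise fall back to `index.html` for the hash router) can be sketched roughly as below. This is a minimal illustration, not the code in `gateway/internal/webui`: `resolveAsset`, `spaHandler` and `demoFS` are hypothetical names, and an in-memory `fstest.MapFS` stands in for the `go:embed` file system so the sketch compiles without a `dist` directory. It assumes Go 1.22+ for `http.ServeFileFS`.

```go
package main

import (
	"fmt"
	"io/fs"
	"net/http"
	"path"
	"strings"
	"testing/fstest"
)

// demoFS stands in for the embedded dist directory (hypothetical contents).
func demoFS() fs.FS {
	return fstest.MapFS{
		"index.html":    {Data: []byte("<html>shell</html>")},
		"assets/app.js": {Data: []byte("console.log('app')")},
	}
}

// resolveAsset strips the mount prefix from urlPath and returns the file to
// serve from fsys. Unknown paths and directories fall back to index.html.
func resolveAsset(fsys fs.FS, prefix, urlPath string) string {
	name := strings.TrimPrefix(urlPath, prefix)
	name = strings.TrimPrefix(path.Clean("/"+name), "/")
	if name == "" {
		return "index.html"
	}
	if info, err := fs.Stat(fsys, name); err != nil || info.IsDir() {
		return "index.html"
	}
	return name
}

// spaHandler serves fsys under prefix using resolveAsset.
func spaHandler(fsys fs.FS, prefix string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeFileFS(w, r, fsys, resolveAsset(fsys, prefix, r.URL.Path))
	})
}

func main() {
	dist := demoFS()
	mux := http.NewServeMux()
	mux.Handle("/telegram/", spaHandler(dist, "/telegram"))
	mux.Handle("/", spaHandler(dist, ""))
	// http.ListenAndServe(":8080", mux) would serve it; omitted so the sketch exits.
	_ = mux

	fmt.Println(resolveAsset(dist, "/telegram", "/telegram/assets/app.js")) // assets/app.js
	fmt.Println(resolveAsset(dist, "", "/lobby"))                           // index.html
}
```

Keeping the path resolution in a pure function makes the fallback rule testable without an HTTP server.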
@@ -49,7 +49,7 @@ independent (see ARCHITECTURE §9.1).
 | 13 | Alphabet on the wire (UI alphabet-agnostic) | **done** |
 | 14 | Solver & dictionary split (publish solver + scrabble-dictionary repo/artifact) | **done** |
 | 15 | Dual Telegram bots & language-gated variants | **done** |
-| 16 | Deploy infra & test contour (Dockerfiles, gateway static UI, compose, observability) | todo |
+| 16 | Deploy infra & test contour (Dockerfiles, gateway static UI, compose, observability) | **done** |
 | 17 | Prod contour deploy (SSH export/import, manual after merge) | todo |

 Scaffolding is incremental: `go.work` lists only existing modules; each stage
@@ -279,7 +279,7 @@ back to `preferred_language`). Non-Telegram logins (web/email/guest) carry the g
 (`GATEWAY_DEFAULT_SUPPORTED_LANGUAGES`, all variants). Admin broadcasts (`SendToUser`/`SendToGameChannel`)
 pick the bot by an **operator-chosen** language in the console — unrelated to `ValidateInitData`.

-### Stage 16 — Deploy infra & test contour
+### Stage 16 — Deploy infra & test contour *(done)*
 Scope: the deploy machinery + the **test contour** (the bulk of the original Stage 14). Backend +
 gateway **Dockerfiles** (multi-stage distroless, mirroring the Stage 9 connector image); the gateway
 gains **static UI serving** — **embedded** via `go:embed` (a node build stage in the gateway image),
@@ -300,12 +300,16 @@ collector/Tempo/Prometheus retention.
 ### Stage 17 — Prod contour deploy
 Scope: the **production contour** on a remote host over SSH. Deploy by **container export/import**
 (`docker save` → `scp`/ssh → `docker load` → `docker compose up` on the remote), the SSH key + host IP
-in Gitea secrets; **strictly manual** (`workflow_dispatch`) after a feature branch is merged to
-`master`. Two-contour config uses **`TEST_`/`PROD_` secret/variable prefixes** — Gitea 1.26 has no
-deployment environments (verified: the `environments` API 404s), so a flat prefixed namespace is the
-convention.
-Open details (re-interview): export/import vs a registry trade-off; prod domain/TLS at the remote
-caddy; prod VPN; rollback.
+in Gitea secrets; **strictly manual** (`workflow_dispatch`) after `development` is merged to `master`
+(the Stage 16 branch model: `feature/* → development → master`, merge gated green). Two-contour config
+uses **`TEST_`/`PROD_` secret/variable prefixes** — Gitea 1.26 has no deployment environments (verified:
+the `environments` API 404s), so a flat prefixed namespace is the convention.
+Reuses the Stage 16 `deploy/docker-compose.yml` as-is, mapping the **`PROD_`** set onto the same
+unprefixed compose vars. **No host caddy on prod**, so the contour's own caddy terminates TLS — set
+`CADDY_SITE_ADDRESS` to the prod domain so caddy does its own ACME (the Caddyfile is already
+parameterised for this; the test contour leaves it `:80` behind the host caddy).
+Open details (re-interview): export/import vs a registry trade-off; prod domain/cert source (ACME vs a
+provided cert) at the contour caddy; prod VPN; rollback.

 ## Refinements logged during implementation

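
The `TEST_`/`PROD_` convention in the Stage 17 hunk above boils down to selecting one prefixed set and exposing it under the unprefixed names that `deploy/docker-compose.yml` reads. The sketch below only illustrates that mapping: in the plan it is done by the CI workflow, not by Go code. `contourEnv` is a hypothetical helper and the domain value is a placeholder.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// contourEnv returns the variables from environ ("KEY=value" entries) that
// carry the given contour prefix, keyed by their unprefixed names.
// Variables without the prefix are ignored.
func contourEnv(environ []string, prefix string) map[string]string {
	out := make(map[string]string)
	for _, kv := range environ {
		key, value, ok := strings.Cut(kv, "=")
		if !ok || !strings.HasPrefix(key, prefix) {
			continue
		}
		name := strings.TrimPrefix(key, prefix)
		if name == "" {
			continue // a bare "PROD_" key maps to nothing
		}
		out[name] = value
	}
	return out
}

func main() {
	// Hypothetical flat namespace, as it might look with both contours set.
	environ := []string{
		"TEST_CADDY_SITE_ADDRESS=:80",
		"PROD_CADDY_SITE_ADDRESS=scrabble.example.com",
		"PATH=/usr/bin",
	}

	env := contourEnv(environ, "PROD_")
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, env[k]) // CADDY_SITE_ADDRESS=scrabble.example.com
	}
}
```

Because the compose file only ever sees unprefixed names, the same file can serve both contours unchanged.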
@@ -1036,6 +1040,56 @@ caddy; prod VPN; rollback.
 per-language vars (the full deploy stack is Stage 16). No CI workflow change (the Go and UI workflows
 already span the touched modules).

+- **Stage 16** (interview + implementation):
+- **Branch model reshaped** (interview, supersedes the Stage 0 `feature/* → master`): a long-lived
+**`development`** integration branch + **`master`** as the prod trunk. Feature branches are cut from
+`development`; a feature-branch commit triggers nothing. A single consolidated
+`.gitea/workflows/ci.yaml` (Gitea has no cross-workflow `needs`) runs `unit`+`integration`+`ui` on a PR
+into `development`/`master` and a **gated `deploy`** job (`needs` the three) that auto-rolls the test
+contour **on a PR into — or a push to — `development`** (owner's "both PR and push"). A PR into `master` is
+test-only; prod is the manual Stage 17. The former `go-unit`/`integration`/`ui-test` workflows were
+folded in (no path filters — full CI on every PR, per the owner). Console kept plain (`NO_COLOR`,
+`docker compose --ansi never`, `--progress plain`).
+- **Gateway serves the UI** (interview, the §13 single-origin): a new `gateway/internal/webui` embeds
+`dist` via `go:embed` (a committed placeholder index so `go build`/CI compile without a UI build) and
+serves the SPA at `/` and `/telegram/` (a path-stripping SPA handler, index.html fallback for the hash
+router), mounted in the edge mux **below** the h2c wrap; `/_gm` stays an explicit 404 when the local
+admin proxy is off so the catch-all does not leak the shell. The `gateway/Dockerfile` node stage builds
+the UI with the `VITE_*` build-args and copies it into the embed dir before `go build`.
+- **Images** (interview): multi-stage distroless `backend/Dockerfile` (a DAWG stage `curl`s the
+`scrabble-dawg` release pinned to `DICT_VERSION`, `GOPRIVATE` fetches the solver) and `gateway/Dockerfile`
+(node UI stage + Go stage), both trimming `go.work` like `platform/telegram/Dockerfile`. Built and
+verified locally.
+- **Contour = caddy-fronted** (interview, "caddy is needed for https anyway"): a new `caddy` service owns
+a **single `/_gm` Basic-Auth** and routes `/_gm/grafana/*` → Grafana (anonymous-admin + sub-path, no
+own accounts) and the rest of `/_gm/*` → the backend console; everything else → the gateway. This
+**supersedes Stage 10's** gateway-fronts-`/_gm` model **in the deploy topology** (the gateway's own
+`/_gm` proxy stays for a local non-caddy run). TLS: the **host caddy** terminates it for the test
+contour and forwards to `scrabble:80`; the in-compose caddy is parameterised (`CADDY_SITE_ADDRESS`) to
+own ACME on prod (Stage 17) where there is no host caddy.
+- **Networks** (engineering): inter-service traffic on a private `internal` network (project-scoped DNS,
+no name collisions on the shared `edge`); only caddy joins the external `edge` (alias `scrabble`). The
+connector keeps its VPN sidecar (the only egress that needs the tunnel). The connector-scoped
+`platform/telegram/deploy/docker-compose.yml` was **retired** (the root `deploy/docker-compose.yml`
+supersedes it; the connector Dockerfile stays).
+- **Observability stack** (interview): OTel Collector (OTLP/gRPC → a Prometheus scrape endpoint +
+Tempo OTLP) + Prometheus (**15d**) + Tempo (**72h**) + Grafana (provisioned Prometheus+Tempo datasources
++ four dashboards: Service overview, Edge/UX, Game domain, Users; Traces via the Tempo datasource +
+Explore, no fixed panels). The collector's prometheus exporter uses `add_metric_suffixes:false` +
+`resource_to_telemetry_conversion` so the dashboards' PromQL matches the in-code metric names and carries
+`service_name`. The three services export `otlp` in the contour (default stays `none`, so CI needs no
+collector). Loki/logs were left out of scope (container stdout / zap JSON).
+- **User metrics** (interview): a backend `accounts_created_total{kind}` counter (telegram/email/guest;
+robots excluded — they are a provisioned pool, not users) via the Stage-12 `SetMetrics` no-op pattern,
+and a gateway **in-memory** `active_users{window=24h,7d}` observable gauge (distinct authenticated edge
+actors). The owner chose the in-memory gauge over a DB `last_seen_at` (overkill); its single-instance /
+reset-on-restart limits are documented (a live gauge, not billing).
+- **Owner actions before the contour is green** (surfaced, not blockers): set the **`TEST_`** Gitea
+secrets/variables (see `deploy/.env.example`) and add a host-caddy route `<test domain> → scrabble:80`
+on the runner host. CI bootstrap nuance: the first PR introducing `ci.yaml` may first deploy on the
+post-merge push to `development` (depending on whether Gitea runs head/base workflows for a PR), after
+which PR-time deploys work.
+
 ## Deferred TODOs (cross-stage)

 - ~~**TODO-1 — publish & version the solver.**~~ **Done in Stage 14.** `scrabble-solver` is
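
The in-memory `active_users{window=24h,7d}` gauge from the "User metrics" refinement above can be pictured as a last-seen map that is counted per window. This is a rough sketch under assumptions, not the gateway's implementation: the type and method names are hypothetical, and the wiring into an OTel observable-gauge callback is left out. It shows why the value resets on restart and is only meaningful on a single instance: all state lives in one process's map.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// activeUsers remembers when each authenticated actor was last seen.
type activeUsers struct {
	mu       sync.Mutex
	lastSeen map[string]time.Time
}

func newActiveUsers() *activeUsers {
	return &activeUsers{lastSeen: make(map[string]time.Time)}
}

// Touch records activity for actor at the given time (keeps the latest).
func (a *activeUsers) Touch(actor string, at time.Time) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if at.After(a.lastSeen[actor]) {
		a.lastSeen[actor] = at
	}
}

// Count returns the number of distinct actors seen within window before now.
func (a *activeUsers) Count(window time.Duration, now time.Time) int {
	a.mu.Lock()
	defer a.mu.Unlock()
	n := 0
	for _, seen := range a.lastSeen {
		if now.Sub(seen) <= window {
			n++
		}
	}
	return n
}

// Prune drops actors older than the longest window so the map stays bounded.
func (a *activeUsers) Prune(maxWindow time.Duration, now time.Time) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for actor, seen := range a.lastSeen {
		if now.Sub(seen) > maxWindow {
			delete(a.lastSeen, actor)
		}
	}
}

// demoCounts feeds three hypothetical actors and returns the 24h and 7d counts.
func demoCounts() (day, week int) {
	now := time.Date(2024, 1, 10, 12, 0, 0, 0, time.UTC)
	a := newActiveUsers()
	a.Touch("alice", now.Add(-1*time.Hour))
	a.Touch("bob", now.Add(-48*time.Hour))
	a.Touch("carol", now.Add(-8*24*time.Hour))
	a.Prune(7*24*time.Hour, now)
	return a.Count(24*time.Hour, now), a.Count(7*24*time.Hour, now)
}

func main() {
	day, week := demoCounts()
	fmt.Printf("24h=%d 7d=%d\n", day, week) // 24h=1 7d=2
}
```

A gauge callback would call `Count` once per window label at collection time.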