diff --git a/.gitea/workflows/ci.yaml b/.gitea/workflows/ci.yaml index a71229c..75228cb 100644 --- a/.gitea/workflows/ci.yaml +++ b/.gitea/workflows/ci.yaml @@ -9,7 +9,7 @@ name: CI # `development` or `master` (the full test suite — the merge gate) and on a push # to `development` (after a merge). The deploy job runs only for `development` # (PR or merge), so a PR into `master` is test-only; the prod deploy is a manual -# workflow (Stage 17). +# workflow (Stage 18). # # Console output is kept plain (NO_COLOR + `docker compose --ansi never` + # `--progress plain`) so the Gitea logs stay readable. @@ -176,7 +176,7 @@ jobs: TELEGRAM_GAME_CHANNEL_ID_EN: ${{ vars.TEST_TELEGRAM_GAME_CHANNEL_ID_EN }} TELEGRAM_GAME_CHANNEL_ID_RU: ${{ vars.TEST_TELEGRAM_GAME_CHANNEL_ID_RU }} # The test contour always uses Telegram's test environment — pinned here, - # not an operator variable. Stage 17's prod workflow leaves it false. + # not an operator variable. Stage 18's prod workflow leaves it false. TELEGRAM_TEST_ENV: "true" VITE_TELEGRAM_BOT_ID: ${{ vars.TEST_VITE_TELEGRAM_BOT_ID }} VITE_TELEGRAM_LINK: ${{ vars.TEST_VITE_TELEGRAM_LINK }} diff --git a/CLAUDE.md b/CLAUDE.md index 69a6a6c..f798924 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -61,7 +61,7 @@ conversation memory — is the source of continuity. Keep it that way. (`docker compose up -d --build` on the runner host + a `GET /` probe). A PR into `master` is test-only. - Merge `development → master` only when CI is green; the **prod** deploy is then a - **manual** workflow (Stage 17), never automatic. Secrets/variables are prefixed + **manual** workflow (Stage 18), never automatic. Secrets/variables are prefixed `TEST_` / `PROD_` per contour (Gitea 1.26 has no deployment environments). 
- After any push, watch the run to green before declaring a stage done — use the ready-made watcher, never an inline poll loop: diff --git a/PLAN.md b/PLAN.md index d59c49e..edbc419 100644 --- a/PLAN.md +++ b/PLAN.md @@ -50,7 +50,8 @@ independent (see ARCHITECTURE §9.1). | 14 | Solver & dictionary split (publish solver + scrabble-dictionary repo/artifact) | **done** | | 15 | Dual Telegram bots & language-gated variants | **done** | | 16 | Deploy infra & test contour (Dockerfiles, gateway static UI, compose, observability) | **done** | -| 17 | Prod contour deploy (SSH export/import, manual after merge) | todo | +| 17 | Test-contour verification & defect fixes | todo | +| 18 | Prod contour deploy (SSH export/import, manual after merge) | todo | Scaffolding is incremental: `go.work` lists only existing modules; each stage adds the modules it needs. @@ -244,7 +245,7 @@ indices; the premiums.ts parity-test rework. ### Stage 14 — Solver & dictionary split (TODO-1 + TODO-2) Re-scoped from the original "CI & deploy": that was several sessions of work, so the -deploy + observability + the two-bots idea were split into **Stages 15–17** below and this +deploy + observability + the two-bots idea were split into **Stages 15–18** below and this stage took only the dependency/artifact split that everything else builds on. Scope: publish `scrabble-solver` as a versioned Gitea module and split the dictionary build into a new `scrabble-dictionary` repo delivering a **release artifact**, then make `scrabble-game` consume @@ -297,7 +298,25 @@ h2c wrap — `/` + `/telegram/` mounts; a committed `dist` placeholder so `go bu build); Postgres healthcheck/volume; whether the connector-scoped compose is retired for the root one; collector/Tempo/Prometheus retention. 
-### Stage 17 — Prod contour deploy +### Stage 17 — Test-contour verification & defect fixes +Scope: exercise the deployed **test contour** end-to-end and fix the defects it surfaces — the +"does it actually work in the contour" pass before prod. Bring up the `development` deploy, then +verify each piece against a real run: the gateway serves the SPA at `/` and `/telegram/`; the admin +console and Grafana sit behind the single `/_gm` Basic-Auth; the Telegram **bots** start (test +environment) and the Mini App launches/authenticates; a game can be created and played through (web ++ Mini App); the **observability** stack receives data (Prometheus targets up, the dashboards +populate incl. `accounts_created_total`/`active_users`, traces reach Tempo); the out-of-app push +works. Fix the defects found and harden where the run exposes gaps — notably a CI **connector +liveness check** (the deploy probe only hits the gateway today, so a crash-looping connector is +invisible — that is how the Stage 16 test-env miss went unnoticed) and **path-conditional CI** (skip +the jobs whose code did not change, behind a single always-running gate job so branch-protection +required checks stay satisfiable — a skipped required check otherwise blocks the merge). +Open details (interview at start): the verification checklist + pass bar; which discovered defects +are in-scope vs deferred; the changed-paths design + the aggregate gate job; the connector +liveness-check grace period (the VPN sidecar handshake lets the connector restart a few times before +it settles). + +### Stage 18 — Prod contour deploy Scope: the **production contour** on a remote host over SSH. Deploy by **container export/import** (`docker save` → `scp`/ssh → `docker load` → `docker compose up` on the remote), the SSH key + host IP in Gitea secrets; **strictly manual** (`workflow_dispatch`) after `development` is merged to `master` @@ -905,7 +924,7 @@ provided cert) at the contour caddy; prod VPN; rollback. 
CI & deploy (TODO-1, TODO-2, the collector + dashboards). The latter two were written into the plan now as the agreed baseline (each still re-interviews at its own start). (Stage 14 was itself later re-scoped to the solver/dictionary split alone; deploy + - observability + the dual-bot idea split into Stages 15–17.) + observability + the dual-bot idea split into Stages 15–18.) - **Shared telemetry** (interview): a new `pkg/telemetry` owns the OTel provider bootstrap (exporter selection, W3C propagators, shutdown, Go runtime metrics); the backend `internal/telemetry` is now a thin facade over it (keeping its gin middleware), @@ -985,7 +1004,7 @@ provided cert) at the contour caddy; prod VPN; rollback. - **Stage 14** (interview + implementation, re-scoped + discharges TODO-1/TODO-2): - **Re-scoped to the split** (interview): the original "CI & deploy" was several sessions of work, so it was cut to the **solver/dictionary split** (the dependency foundation) and the deploy + - observability + the dual-bot idea were written into the plan as new **Stages 15–17**. The deploy + observability + the dual-bot idea were written into the plan as new **Stages 15–18**. The deploy decisions taken at the interview are recorded there (embed the UI in the gateway via `go:embed`; full Collector+Prometheus+Tempo+Grafana stack; **two contours** — test = auto on feature-branch push on the local host, prod = manual SSH `docker save`/`load` after merge; `TEST_`/`PROD_` secret @@ -1047,7 +1066,7 @@ provided cert) at the contour caddy; prod VPN; rollback. `.gitea/workflows/ci.yaml` (Gitea has no cross-workflow `needs`) runs `unit`+`integration`+`ui` on a PR into `development`/`master` and a **gated `deploy`** job (`needs` the three) that auto-rolls the test contour **on a PR into — or a push to — `development`** (owner's "и PR, и push"). A PR into `master` is - test-only; prod is the manual Stage 17. 
The former `go-unit`/`integration`/`ui-test` workflows were + test-only; prod is the manual Stage 18. The former `go-unit`/`integration`/`ui-test` workflows were folded in (no path filters — full CI on every PR, per the owner). Console kept plain (`NO_COLOR`, `docker compose --ansi never`, `--progress plain`). - **Gateway serves the UI** (interview, the §13 single-origin): a new `gateway/internal/webui` embeds @@ -1066,7 +1085,7 @@ provided cert) at the contour caddy; prod VPN; rollback. **supersedes Stage 10's** gateway-fronts-`/_gm` model **in the deploy topology** (the gateway's own `/_gm` proxy stays for a local non-caddy run). TLS: the **host caddy** terminates it for the test contour and forwards to `scrabble:80`; the in-compose caddy is parameterised (`CADDY_SITE_ADDRESS`) to - own ACME on prod (Stage 17) where there is no host caddy. + own ACME on prod (Stage 18) where there is no host caddy. - **Networks** (engineering): inter-service traffic on a private `internal` network (project-scoped DNS, no name collisions on the shared `edge`); only caddy joins the external `edge` (alias `scrabble`). The connector keeps its VPN sidecar (the only egress that needs the tunnel). The connector-scoped @@ -1094,7 +1113,7 @@ provided cert) at the contour caddy; prod VPN; rollback. verified, but the option is idiomatic and now has a `bot` test asserting the `/bot/test/getMe` path). The test contour **pins `TELEGRAM_TEST_ENV=true` in `ci.yaml`** (the contour is the test environment) rather than via a `TEST_`-prefixed variable — removing a confusing double-`TEST` operator - knob and the secret-vs-variable footgun; prod (Stage 17) leaves it `false`. + knob and the secret-vs-variable footgun; prod (Stage 18) leaves it `false`. 
## Deferred TODOs (cross-stage) diff --git a/README.md b/README.md index b615c32..b307133 100644 --- a/README.md +++ b/README.md @@ -98,6 +98,6 @@ docker compose -f deploy/docker-compose.yml config # validate (needs the CI auto-deploys the **test contour** on a PR into — or push to — `development` (`.gitea/workflows/ci.yaml`); the **prod contour** is a manual deploy after -`development → master` (Stage 17). Env reference: [`deploy/.env.example`](deploy/.env.example); +`development → master` (Stage 18). Env reference: [`deploy/.env.example`](deploy/.env.example); the topology and the two-contour model are in [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) §13. diff --git a/deploy/.env.example b/deploy/.env.example index 55bbd18..7edeb9f 100644 --- a/deploy/.env.example +++ b/deploy/.env.example @@ -1,5 +1,5 @@ # Environment for deploy/docker-compose.yml. The CI deploy job (ci.yaml) maps the -# Gitea TEST_-prefixed secrets/variables onto these unprefixed names; Stage 17 +# Gitea TEST_-prefixed secrets/variables onto these unprefixed names; Stage 18 # maps the PROD_-prefixed set the same way. Copy to deploy/.env for a local run. # # Full reference (required vs optional, defaults, secret-vs-variable): deploy/README.md. @@ -17,7 +17,7 @@ LOG_LEVEL=info # --- Edge / caddy ----------------------------------------------------------- # Test: ":80" (the host caddy terminates TLS and forwards to scrabble:80 on the -# external `edge` network). Prod (Stage 17): a domain so caddy does its own ACME. +# external `edge` network). Prod (Stage 18): a domain so caddy does its own ACME. 
CADDY_SITE_ADDRESS=:80 GM_BASICAUTH_USER=gm GM_BASICAUTH_HASH= # required; `caddy hash-password` bcrypt hash diff --git a/deploy/README.md b/deploy/README.md index ba4267e..8797966 100644 --- a/deploy/README.md +++ b/deploy/README.md @@ -38,7 +38,7 @@ cd deploy && docker compose up -d --build **In CI** (the test contour) — `.gitea/workflows/ci.yaml`'s `deploy` job maps the Gitea **`TEST_`-prefixed** secrets/variables onto the unprefixed names below and -runs `docker compose up -d --build` on the runner host. Stage 17 (prod) maps the +runs `docker compose up -d --build` on the runner host. Stage 18 (prod) maps the **`PROD_`** set the same way. So a Gitea secret named `TEST_POSTGRES_PASSWORD` feeds the compose's `POSTGRES_PASSWORD`, etc. @@ -71,7 +71,7 @@ connector **fails at boot** if both are empty. | `GRAFANA_ADMIN_PASSWORD` | secret | `admin` | Grafana admin password. Low impact (the login form is disabled, access is anonymous-admin behind caddy) but set it anyway. | | `TELEGRAM_GAME_CHANNEL_ID_EN` | variable | _(empty)_ | English game-channel id; empty/`0` disables channel posts. | | `TELEGRAM_GAME_CHANNEL_ID_RU` | variable | _(empty)_ | Russian game-channel id; empty/`0` disables channel posts. | -| `TELEGRAM_TEST_ENV` | _pinned_ | `false` | `true` routes the bot through Telegram's test environment (`.../bot/test/METHOD`). **The CI test contour pins this to `true` in `ci.yaml`** (the contour is the test environment) — it is not a Gitea variable. Set it in `.env` for a local run; prod (Stage 17) leaves it `false`. | +| `TELEGRAM_TEST_ENV` | _pinned_ | `false` | `true` routes the bot through Telegram's test environment (`.../bot/test/METHOD`). **The CI test contour pins this to `true` in `ci.yaml`** (the contour is the test environment) — it is not a Gitea variable. Set it in `.env` for a local run; prod (Stage 18) leaves it `false`. 
| | `TELEGRAM_API_BASE_URL` | variable | _(empty)_ | Override the Bot API host (a mock/self-hosted server); empty = `https://api.telegram.org`. | | `GATEWAY_DEFAULT_SUPPORTED_LANGUAGES` | variable | `en,ru` | Variant-gating set for non-Telegram logins (web/email/guest). | | `VITE_TELEGRAM_BOT_ID` | variable | _(empty)_ | UI build-arg: numeric bot id for the web Login Widget. | diff --git a/deploy/caddy/Caddyfile b/deploy/caddy/Caddyfile index 880ee9a..25a4e20 100644 --- a/deploy/caddy/Caddyfile +++ b/deploy/caddy/Caddyfile @@ -4,7 +4,7 @@ # Connect edge) goes to the gateway. Mirrors ../galaxy-game's /_gm model. # # CADDY_SITE_ADDRESS is ":80" in the test contour (the host caddy terminates TLS -# and forwards); set it to a domain in prod (Stage 17) so this caddy does its own +# and forwards); set it to a domain in prod (Stage 18) so this caddy does its own # ACME and the contour is self-contained. { admin off diff --git a/deploy/docker-compose.yml b/deploy/docker-compose.yml index 9d88a89..6868737 100644 --- a/deploy/docker-compose.yml +++ b/deploy/docker-compose.yml @@ -11,7 +11,7 @@ # - `edge` (external): the host caddy reaches this contour at `scrabble:80` # (the in-compose caddy's alias). The in-compose caddy terminates only HTTP in # the test contour; the host caddy terminates TLS and forwards. For prod -# (Stage 17, no host caddy) set CADDY_SITE_ADDRESS to the domain so the caddy +# (Stage 18, no host caddy) set CADDY_SITE_ADDRESS to the domain so the caddy # does its own ACME — the contour is then self-contained. # - The connector egresses to api.telegram.org through the `vpn` sidecar # (network_mode: service:vpn); it answers internal gRPC at `telegram:9091`. diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 1674fdd..f172a9f 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -569,7 +569,7 @@ Two contours, two secret/variable prefixes (`TEST_` / `PROD_`): host, then a `GET /` probe through caddy). 
The host caddy terminates TLS and forwards the domain to `scrabble:80`, so the in-compose caddy serves plain HTTP (`CADDY_SITE_ADDRESS=:80`). -- **Prod** (Stage 17): a manual SSH deploy after `development → master`. There is no +- **Prod** (Stage 18): a manual SSH deploy after `development → master`. There is no host caddy, so the contour ships its own caddy terminating TLS — set `CADDY_SITE_ADDRESS` to the domain and the caddy does its own ACME.
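
Note on the new Stage 17 scope in `PLAN.md`: the "single always-running gate job" behind path-conditional CI can be pictured as a small decision over the results of the jobs it depends on. The following is only a sketch under stated assumptions, not the workflow's actual implementation. The function name `gatePasses` is hypothetical. The result strings (`success`, `skipped`, `failure`, `cancelled`) are assumed to follow the `needs.<job>.result` convention of Actions-style runners. The job names `unit`, `integration`, and `ui` are the ones the plan mentions.

```go
package main

import "fmt"

// gatePasses reports whether the aggregate gate job should succeed.
// Assumption: each upstream job reports one of "success", "skipped",
// "failure" or "cancelled". A job skipped because its paths did not
// change counts as OK; any other non-success result fails the gate.
func gatePasses(results map[string]string) bool {
	for _, result := range results {
		if result != "success" && result != "skipped" {
			return false
		}
	}
	return true
}

func main() {
	// Only Go code changed: integration and ui were skipped.
	fmt.Println(gatePasses(map[string]string{
		"unit": "success", "integration": "skipped", "ui": "skipped",
	})) // prints "true"

	// A job that ran and failed must block the merge.
	fmt.Println(gatePasses(map[string]string{
		"unit": "success", "integration": "failure", "ui": "skipped",
	})) // prints "false"
}
```

Making the gate the only required check is what keeps branch protection satisfiable: the gate always runs and reports a result, so a skipped `unit`, `integration`, or `ui` job never leaves a required check pending.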