Stage 16: insert Stage 17 (test-contour verification); renumber prod deploy to 18
- PLAN.md: new Stage 17 "Test-contour verification & defect fixes" (exercise the deployed contour end-to-end and fix what it surfaces — connector liveness check, path-conditional CI); the former prod-deploy stage becomes Stage 18.
- Renumber every "Stage 17" prod-deploy reference to "Stage 18" across docs, compose, Caddyfile, ci.yaml and CLAUDE.md; the post-Stage-14 split range is now "Stages 15–18".
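The path-conditional CI idea named in the commit message (skip jobs whose code did not change, behind one always-running gate) can be illustrated with a minimal Go sketch. It is not the project's workflow: the path prefixes and the `needsRun` / `gatePasses` helpers are hypothetical, and only the job names `unit`, `integration` and `ui` come from the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// JobResult is a hypothetical per-job outcome, as a CI system might report it.
type JobResult string

const (
	Success JobResult = "success"
	Skipped JobResult = "skipped"
	Failure JobResult = "failure"
)

// needsRun reports whether any changed path falls under one of a job's prefixes.
func needsRun(prefixes, changed []string) bool {
	for _, c := range changed {
		for _, p := range prefixes {
			if strings.HasPrefix(c, p) {
				return true
			}
		}
	}
	return false
}

// gatePasses is the aggregate gate: it is green when every upstream job
// either succeeded or was skipped because its paths did not change.
func gatePasses(results map[string]JobResult) bool {
	for _, r := range results {
		if r != Success && r != Skipped {
			return false
		}
	}
	return true
}

func main() {
	changed := []string{"ui/src/main.ts"} // assumed example path
	fmt.Println("run ui:", needsRun([]string{"ui/"}, changed))
	fmt.Println("run unit:", needsRun([]string{"gateway/", "pkg/"}, changed))
	fmt.Println("gate:", gatePasses(map[string]JobResult{
		"unit": Skipped, "integration": Skipped, "ui": Success,
	}))
}
```

Making the gate the only required check means a skipped job cannot block the merge, which is the concern the Stage 17 text raises.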
@@ -9,7 +9,7 @@ name: CI
 # `development` or `master` (the full test suite — the merge gate) and on a push
 # to `development` (after a merge). The deploy job runs only for `development`
 # (PR or merge), so a PR into `master` is test-only; the prod deploy is a manual
-# workflow (Stage 17).
+# workflow (Stage 18).
 #
 # Console output is kept plain (NO_COLOR + `docker compose --ansi never` +
 # `--progress plain`) so the Gitea logs stay readable.
@@ -176,7 +176,7 @@ jobs:
 TELEGRAM_GAME_CHANNEL_ID_EN: ${{ vars.TEST_TELEGRAM_GAME_CHANNEL_ID_EN }}
 TELEGRAM_GAME_CHANNEL_ID_RU: ${{ vars.TEST_TELEGRAM_GAME_CHANNEL_ID_RU }}
 # The test contour always uses Telegram's test environment — pinned here,
-# not an operator variable. Stage 17's prod workflow leaves it false.
+# not an operator variable. Stage 18's prod workflow leaves it false.
 TELEGRAM_TEST_ENV: "true"
 VITE_TELEGRAM_BOT_ID: ${{ vars.TEST_VITE_TELEGRAM_BOT_ID }}
 VITE_TELEGRAM_LINK: ${{ vars.TEST_VITE_TELEGRAM_LINK }}
@@ -61,7 +61,7 @@ conversation memory — is the source of continuity. Keep it that way.
 (`docker compose up -d --build` on the runner host + a `GET /` probe). A PR into
 `master` is test-only.
 - Merge `development → master` only when CI is green; the **prod** deploy is then a
-**manual** workflow (Stage 17), never automatic. Secrets/variables are prefixed
+**manual** workflow (Stage 18), never automatic. Secrets/variables are prefixed
 `TEST_` / `PROD_` per contour (Gitea 1.26 has no deployment environments).
 - After any push, watch the run to green before declaring a stage done — use the
 ready-made watcher, never an inline poll loop:
@@ -50,7 +50,8 @@ independent (see ARCHITECTURE §9.1).
 | 14 | Solver & dictionary split (publish solver + scrabble-dictionary repo/artifact) | **done** |
 | 15 | Dual Telegram bots & language-gated variants | **done** |
 | 16 | Deploy infra & test contour (Dockerfiles, gateway static UI, compose, observability) | **done** |
-| 17 | Prod contour deploy (SSH export/import, manual after merge) | todo |
+| 17 | Test-contour verification & defect fixes | todo |
+| 18 | Prod contour deploy (SSH export/import, manual after merge) | todo |
 
 Scaffolding is incremental: `go.work` lists only existing modules; each stage
 adds the modules it needs.
@@ -244,7 +245,7 @@ indices; the premiums.ts parity-test rework.
 
 ### Stage 14 — Solver & dictionary split (TODO-1 + TODO-2)
 Re-scoped from the original "CI & deploy": that was several sessions of work, so the
-deploy + observability + the two-bots idea were split into **Stages 15–17** below and this
+deploy + observability + the two-bots idea were split into **Stages 15–18** below and this
 stage took only the dependency/artifact split that everything else builds on. Scope: publish
 `scrabble-solver` as a versioned Gitea module and split the dictionary build into a new
 `scrabble-dictionary` repo delivering a **release artifact**, then make `scrabble-game` consume
@@ -297,7 +298,25 @@ h2c wrap — `/` + `/telegram/` mounts; a committed `dist` placeholder so `go bu
 build); Postgres healthcheck/volume; whether the connector-scoped compose is retired for the root one;
 collector/Tempo/Prometheus retention.
 
-### Stage 17 — Prod contour deploy
+### Stage 17 — Test-contour verification & defect fixes
+Scope: exercise the deployed **test contour** end-to-end and fix the defects it surfaces — the
+"does it actually work in the contour" pass before prod. Bring up the `development` deploy, then
+verify each piece against a real run: the gateway serves the SPA at `/` and `/telegram/`; the admin
+console and Grafana sit behind the single `/_gm` Basic-Auth; the Telegram **bots** start (test
+environment) and the Mini App launches/authenticates; a game can be created and played through (web
++ Mini App); the **observability** stack receives data (Prometheus targets up, the dashboards
+populate incl. `accounts_created_total`/`active_users`, traces reach Tempo); the out-of-app push
+works. Fix the defects found and harden where the run exposes gaps — notably a CI **connector
+liveness check** (the deploy probe only hits the gateway today, so a crash-looping connector is
+invisible — that is how the Stage 16 test-env miss went unnoticed) and **path-conditional CI** (skip
+the jobs whose code did not change, behind a single always-running gate job so branch-protection
+required checks stay satisfiable — a skipped required check otherwise blocks the merge).
+Open details (interview at start): the verification checklist + pass bar; which discovered defects
+are in-scope vs deferred; the changed-paths design + the aggregate gate job; the connector
+liveness-check grace period (the VPN sidecar handshake lets the connector restart a few times before
+it settles).
+
+### Stage 18 — Prod contour deploy
 Scope: the **production contour** on a remote host over SSH. Deploy by **container export/import**
 (`docker save` → `scp`/ssh → `docker load` → `docker compose up` on the remote), the SSH key + host IP
 in Gitea secrets; **strictly manual** (`workflow_dispatch`) after `development` is merged to `master`
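The connector liveness check with a grace period, as described in the new Stage 17 text, could be sketched as follows. This is an illustration only: `waitStable` and its parameters are hypothetical, and the plan leaves the actual grace period as an open detail. The idea is to tolerate the early restarts during the VPN sidecar handshake and pass only once the connector has stayed healthy for several consecutive polls.

```go
package main

import (
	"fmt"
	"time"
)

// waitStable polls probe up to maxAttempts times, sleeping interval between
// polls. It returns true as soon as probe has reported healthy
// needConsecutive times in a row; any unhealthy poll resets the streak,
// so a crash-looping service never passes.
func waitStable(probe func() bool, maxAttempts, needConsecutive int, interval time.Duration) bool {
	streak := 0
	for i := 0; i < maxAttempts; i++ {
		if probe() {
			streak++
			if streak >= needConsecutive {
				return true
			}
		} else {
			streak = 0
		}
		time.Sleep(interval)
	}
	return false
}

func main() {
	// Simulated connector: unhealthy for the first two polls, then settled.
	polls := 0
	probe := func() bool {
		polls++
		return polls > 2
	}
	ok := waitStable(probe, 10, 3, 10*time.Millisecond)
	fmt.Println("connector stable:", ok, "after", polls, "polls")
}
```

In a real deploy job the probe would query the container state or a health endpoint; which of these the project uses is not stated in the diff.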
@@ -905,7 +924,7 @@ provided cert) at the contour caddy; prod VPN; rollback.
 CI & deploy (TODO-1, TODO-2, the collector + dashboards). The latter two were written
 into the plan now as the agreed baseline (each still re-interviews at its own start).
 (Stage 14 was itself later re-scoped to the solver/dictionary split alone; deploy +
-observability + the dual-bot idea split into Stages 15–17.)
+observability + the dual-bot idea split into Stages 15–18.)
 - **Shared telemetry** (interview): a new `pkg/telemetry` owns the OTel provider
 bootstrap (exporter selection, W3C propagators, shutdown, Go runtime metrics); the
 backend `internal/telemetry` is now a thin facade over it (keeping its gin middleware),
@@ -985,7 +1004,7 @@ provided cert) at the contour caddy; prod VPN; rollback.
 - **Stage 14** (interview + implementation, re-scoped + discharges TODO-1/TODO-2):
 - **Re-scoped to the split** (interview): the original "CI & deploy" was several sessions of work,
 so it was cut to the **solver/dictionary split** (the dependency foundation) and the deploy +
-observability + the dual-bot idea were written into the plan as new **Stages 15–17**. The deploy
+observability + the dual-bot idea were written into the plan as new **Stages 15–18**. The deploy
 decisions taken at the interview are recorded there (embed the UI in the gateway via `go:embed`;
 full Collector+Prometheus+Tempo+Grafana stack; **two contours** — test = auto on feature-branch
 push on the local host, prod = manual SSH `docker save`/`load` after merge; `TEST_`/`PROD_` secret
@@ -1047,7 +1066,7 @@ provided cert) at the contour caddy; prod VPN; rollback.
 `.gitea/workflows/ci.yaml` (Gitea has no cross-workflow `needs`) runs `unit`+`integration`+`ui` on a PR
 into `development`/`master` and a **gated `deploy`** job (`needs` the three) that auto-rolls the test
 contour **on a PR into — or a push to — `development`** (owner's "и PR, и push"). A PR into `master` is
-test-only; prod is the manual Stage 17. The former `go-unit`/`integration`/`ui-test` workflows were
+test-only; prod is the manual Stage 18. The former `go-unit`/`integration`/`ui-test` workflows were
 folded in (no path filters — full CI on every PR, per the owner). Console kept plain (`NO_COLOR`,
 `docker compose --ansi never`, `--progress plain`).
 - **Gateway serves the UI** (interview, the §13 single-origin): a new `gateway/internal/webui` embeds
@@ -1066,7 +1085,7 @@ provided cert) at the contour caddy; prod VPN; rollback.
 **supersedes Stage 10's** gateway-fronts-`/_gm` model **in the deploy topology** (the gateway's own
 `/_gm` proxy stays for a local non-caddy run). TLS: the **host caddy** terminates it for the test
 contour and forwards to `scrabble:80`; the in-compose caddy is parameterised (`CADDY_SITE_ADDRESS`) to
-own ACME on prod (Stage 17) where there is no host caddy.
+own ACME on prod (Stage 18) where there is no host caddy.
 - **Networks** (engineering): inter-service traffic on a private `internal` network (project-scoped DNS,
 no name collisions on the shared `edge`); only caddy joins the external `edge` (alias `scrabble`). The
 connector keeps its VPN sidecar (the only egress that needs the tunnel). The connector-scoped
@@ -1094,7 +1113,7 @@ provided cert) at the contour caddy; prod VPN; rollback.
 verified, but the option is idiomatic and now has a `bot` test asserting the `/bot<token>/test/getMe`
 path). The test contour **pins `TELEGRAM_TEST_ENV=true` in `ci.yaml`** (the contour is the test
 environment) rather than via a `TEST_`-prefixed variable — removing a confusing double-`TEST` operator
-knob and the secret-vs-variable footgun; prod (Stage 17) leaves it `false`.
+knob and the secret-vs-variable footgun; prod (Stage 18) leaves it `false`.
 
 ## Deferred TODOs (cross-stage)
 
@@ -98,6 +98,6 @@ docker compose -f deploy/docker-compose.yml config # validate (needs the
 
 CI auto-deploys the **test contour** on a PR into — or push to — `development`
 (`.gitea/workflows/ci.yaml`); the **prod contour** is a manual deploy after
-`development → master` (Stage 17). Env reference: [`deploy/.env.example`](deploy/.env.example);
+`development → master` (Stage 18). Env reference: [`deploy/.env.example`](deploy/.env.example);
 the topology and the two-contour model are in
 [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) §13.
@@ -1,5 +1,5 @@
 # Environment for deploy/docker-compose.yml. The CI deploy job (ci.yaml) maps the
-# Gitea TEST_-prefixed secrets/variables onto these unprefixed names; Stage 17
+# Gitea TEST_-prefixed secrets/variables onto these unprefixed names; Stage 18
 # maps the PROD_-prefixed set the same way. Copy to deploy/.env for a local run.
 #
 # Full reference (required vs optional, defaults, secret-vs-variable): deploy/README.md.
@@ -17,7 +17,7 @@ LOG_LEVEL=info
 
 # --- Edge / caddy -----------------------------------------------------------
 # Test: ":80" (the host caddy terminates TLS and forwards to scrabble:80 on the
-# external `edge` network). Prod (Stage 17): a domain so caddy does its own ACME.
+# external `edge` network). Prod (Stage 18): a domain so caddy does its own ACME.
 CADDY_SITE_ADDRESS=:80
 GM_BASICAUTH_USER=gm
 GM_BASICAUTH_HASH= # required; `caddy hash-password` bcrypt hash
@@ -38,7 +38,7 @@ cd deploy && docker compose up -d --build
 
 **In CI** (the test contour) — `.gitea/workflows/ci.yaml`'s `deploy` job maps the
 Gitea **`TEST_`-prefixed** secrets/variables onto the unprefixed names below and
-runs `docker compose up -d --build` on the runner host. Stage 17 (prod) maps the
+runs `docker compose up -d --build` on the runner host. Stage 18 (prod) maps the
 **`PROD_`** set the same way. So a Gitea secret named `TEST_POSTGRES_PASSWORD`
 feeds the compose's `POSTGRES_PASSWORD`, etc.
 
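The prefix convention described above (a `TEST_`-prefixed Gitea name feeding the unprefixed compose name) can be illustrated with a small Go sketch. The real mapping lives in `ci.yaml`, where each variable is listed explicitly; the `unprefixed` helper here is hypothetical and only demonstrates the naming rule.

```go
package main

import (
	"fmt"
	"strings"
)

// unprefixed returns the entries of env whose key starts with prefix,
// re-keyed without the prefix. Keys belonging to the other contour are
// dropped, as is a key that consists of the prefix alone.
func unprefixed(env map[string]string, prefix string) map[string]string {
	out := make(map[string]string)
	for k, v := range env {
		if !strings.HasPrefix(k, prefix) {
			continue
		}
		name := strings.TrimPrefix(k, prefix)
		if name != "" {
			out[name] = v
		}
	}
	return out
}

func main() {
	// Example values are placeholders, not real configuration.
	gitea := map[string]string{
		"TEST_POSTGRES_PASSWORD": "test-secret",
		"PROD_POSTGRES_PASSWORD": "prod-secret",
	}
	fmt.Println(unprefixed(gitea, "TEST_"))
	fmt.Println(unprefixed(gitea, "PROD_"))
}
```

Note that `TELEGRAM_TEST_ENV` is deliberately outside this scheme: the diff pins it in `ci.yaml` rather than deriving it from a prefixed variable.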
@@ -71,7 +71,7 @@ connector **fails at boot** if both are empty.
 | `GRAFANA_ADMIN_PASSWORD` | secret | `admin` | Grafana admin password. Low impact (the login form is disabled, access is anonymous-admin behind caddy) but set it anyway. |
 | `TELEGRAM_GAME_CHANNEL_ID_EN` | variable | _(empty)_ | English game-channel id; empty/`0` disables channel posts. |
 | `TELEGRAM_GAME_CHANNEL_ID_RU` | variable | _(empty)_ | Russian game-channel id; empty/`0` disables channel posts. |
-| `TELEGRAM_TEST_ENV` | _pinned_ | `false` | `true` routes the bot through Telegram's test environment (`.../bot<token>/test/METHOD`). **The CI test contour pins this to `true` in `ci.yaml`** (the contour is the test environment) — it is not a Gitea variable. Set it in `.env` for a local run; prod (Stage 17) leaves it `false`. |
+| `TELEGRAM_TEST_ENV` | _pinned_ | `false` | `true` routes the bot through Telegram's test environment (`.../bot<token>/test/METHOD`). **The CI test contour pins this to `true` in `ci.yaml`** (the contour is the test environment) — it is not a Gitea variable. Set it in `.env` for a local run; prod (Stage 18) leaves it `false`. |
 | `TELEGRAM_API_BASE_URL` | variable | _(empty)_ | Override the Bot API host (a mock/self-hosted server); empty = `https://api.telegram.org`. |
 | `GATEWAY_DEFAULT_SUPPORTED_LANGUAGES` | variable | `en,ru` | Variant-gating set for non-Telegram logins (web/email/guest). |
 | `VITE_TELEGRAM_BOT_ID` | variable | _(empty)_ | UI build-arg: numeric bot id for the web Login Widget. |
@@ -4,7 +4,7 @@
 # Connect edge) goes to the gateway. Mirrors ../galaxy-game's /_gm model.
 #
 # CADDY_SITE_ADDRESS is ":80" in the test contour (the host caddy terminates TLS
-# and forwards); set it to a domain in prod (Stage 17) so this caddy does its own
+# and forwards); set it to a domain in prod (Stage 18) so this caddy does its own
 # ACME and the contour is self-contained.
 {
 admin off
@@ -11,7 +11,7 @@
 # - `edge` (external): the host caddy reaches this contour at `scrabble:80`
 # (the in-compose caddy's alias). The in-compose caddy terminates only HTTP in
 # the test contour; the host caddy terminates TLS and forwards. For prod
-# (Stage 17, no host caddy) set CADDY_SITE_ADDRESS to the domain so the caddy
+# (Stage 18, no host caddy) set CADDY_SITE_ADDRESS to the domain so the caddy
 # does its own ACME — the contour is then self-contained.
 # - The connector egresses to api.telegram.org through the `vpn` sidecar
 # (network_mode: service:vpn); it answers internal gRPC at `telegram:9091`.
@@ -569,7 +569,7 @@ Two contours, two secret/variable prefixes (`TEST_` / `PROD_`):
 host, then a `GET /` probe through caddy). The host caddy terminates TLS and
 forwards the domain to `scrabble:80`, so the in-compose caddy serves plain HTTP
 (`CADDY_SITE_ADDRESS=:80`).
-- **Prod** (Stage 17): a manual SSH deploy after `development → master`. There is no
+- **Prod** (Stage 18): a manual SSH deploy after `development → master`. There is no
 host caddy, so the contour ships its own caddy terminating TLS — set
 `CADDY_SITE_ADDRESS` to the domain and the caddy does its own ACME.
 