Stage 14: solver & dictionary split — consume published module + DAWG artifact (TODO-1/TODO-2)

- backend/go.mod pins gitea.iliadenisov.ru/developer/scrabble-solver v1.0.0; the engine's
  imports use the published module path; go.work drops the solver replace (GOPRIVATE fetches
  it directly from Gitea). The solver's wordlist/dictdawg are now public packages.
- CI (go-unit, integration): drop the solver sibling-clone, set GOPRIVATE, and download the
  dictionary DAWG release artifact (scrabble-dawg-<DICT_VERSION>.tar.gz from the new
  scrabble-dictionary repo) for BACKEND_DICT_DIR.
- Docs: ARCHITECTURE §5/§11/§13/§14 + backend/README updated to the published-module +
  release-artifact model. PLAN.md re-scoped Stage 14 to the split and added Stages 15 (deploy
  infra & test contour), 16 (prod contour), 17 (dual Telegram bots); TODO-1/TODO-2 marked done.
Ilia Denisov
2026-06-04 20:00:36 +02:00
parent da6665b967
commit ec435c0e7f
19 changed files with 214 additions and 127 deletions
+118 -59
@@ -47,7 +47,10 @@ independent (see ARCHITECTURE §9.1).
| 11 | Account linking & merge | **done** |
| 12 | Observability & performance (telemetry, metrics, guest GC) | **done** |
| 13 | Alphabet on the wire (UI alphabet-agnostic) | **done** |
| 14 | CI & deploy (multi-service, dictionary artifacts) | todo |
| 14 | Solver & dictionary split (publish solver + scrabble-dictionary repo/artifact) | **done** |
| 15 | Deploy infra & test contour (Dockerfiles, gateway static UI, compose, observability) | todo |
| 16 | Prod contour deploy (SSH export/import, manual after merge) | todo |
| 17 | Dual Telegram bots & language-gated variants | todo |
Scaffolding is incremental: `go.work` lists only existing modules; each stage
adds the modules it needs.
@@ -213,7 +216,7 @@ new `pkg/telemetry`; add telemetry to the **gateway** and the **Telegram connect
domain/operational **metrics** close to the business (game replay/validate timings,
started/abandoned games, live-cache size, chat/nudge counts, the edge roundtrip, Go
runtime metrics); discharge **TODO-3** (abandoned-guest GC). The OTLP collector and
dashboards are stood up with the deploy (Stage 14); the default exporter stays `none`,
dashboards are stood up with the deploy (Stage 15); the default exporter stays `none`,
so CI needs no collector. Performance is operational-metric instrumentation, not
speculative optimisation (the standing "evidence first" rule — no measured hotspot yet).
Open details: exporter default and whether a collector is stood up now; the metric set
@@ -239,35 +242,69 @@ Open details: the fbs shape and `include_alphabet` flag placement; whether to ke
concrete-letter fields during the transition; whether tile exchange moves fully to
indices; the premiums.ts parity-test rework.
### Stage 14 — CI & deploy
Scope: the full **multi-service production deploy** plus the observability backend, also
discharging **TODO-1** and **TODO-2**. Backend + gateway **Dockerfiles** (multi-stage
distroless, mirroring the Stage 9 connector image); the gateway gains **static UI
serving** (the §13 single-origin model — mini-landing at `/`, Mini App under
`/telegram/`), documented since Stage 9 but **not yet implemented**; prod UI build vars
(`VITE_TELEGRAM_BOT_ID` for the Login Widget, the Mini App URL / share link); a root
`deploy/docker-compose.yml` (backend + gateway + Postgres + connector + the OTLP
collector / Grafana stack) on the external `edge` network behind the host caddy, the VPN
sidecar only for the connector; a **deploy workflow** mirroring `../15-puzzle` (host-mode
runner, `docker compose up -d --build`, no external registry, env from Gitea secrets, a
post-deploy probe). Stand up the **OTLP collector + dashboards** (the export wiring landed
in Stage 12).
- **TODO-1 — publish & version the solver:** tag/publish `scrabble-solver`, drop the
`go.work` replace + the CI clone, pin a version in `backend/go.mod` (or keep cloning the
sibling as the minimal-diff fallback). The DAWGs are delivered separately regardless.
- **TODO-2 — versioned dictionary artifacts:** a **new versioned repo** for the wordlist
parsers + built DAWGs, delivered as a **release artifact** (Gitea release / OCI / object
store — not `go get`; DAWGs are data). **One semver label `vX.Y.Z` for the whole set**,
additive: a deploy drops a new `BACKEND_DICT_DIR/<version>/` subdir;
`engine.OpenWithVersions` loads every present subdir at boot; `BACKEND_DICT_VERSION`
selects the default for **new** games. A new version never breaks a running backend
(each game pins its `dict_version`; versions are additive); **only active games need a
dictionary** (validate-at-submit — finished games replay the dictionary-independent
journal), so a version is safe to retire once no active game pins it. The dict repo must
build against the **same `dafsa`/`alphabet`/solver** the backend runs, or letter indexing
drifts (ties into Stage 13).
Open details: embed-vs-mount for the UI build and the DAWG set; the OTLP collector /
dashboard stack; solver-publish vs clone-in-build; load expectations.
### Stage 14 — Solver & dictionary split (TODO-1 + TODO-2)
Re-scoped from the original "CI & deploy": that was several sessions of work, so the
deploy + observability + the two-bots idea were split into **Stages 15–17** below and this
stage took only the dependency/artifact split that everything else builds on. Scope: publish
`scrabble-solver` as a versioned Gitea module and split the dictionary build into a new
`scrabble-dictionary` repo delivering a **release artifact**, then make `scrabble-game` consume
both — discharging **TODO-1** and **TODO-2**.
- **TODO-1 — solver published.** `scrabble-solver` renamed to module
`gitea.iliadenisov.ru/developer/scrabble-solver`, tagged **v1.0.0**; `wordlist`/`dictdawg`
de-internalised to public packages (the dict repo imports them); `cmd/builddict`/`dictprep`/the
`dictionaries` submodule moved out; `internal/dict` repointed at the committed `dawg/*.dawg`
fixtures. `backend/go.mod` pins `v1.0.0`; the `go.work` replace and the CI sibling-clone are
gone; `GOPRIVATE=gitea.iliadenisov.ru/*` makes go fetch it directly (no public proxy/checksum DB).
- **TODO-2 — dictionary artifacts.** New repo `developer/scrabble-dictionary` holds the word-list
sources + `cmd/builddict` and builds the three DAWGs against the **published solver + pinned
`dafsa`/`alphabet` v1.1.0**, so they are byte-identical to the solver's fixtures (no index drift).
Released as `scrabble-dawg-vX.Y.Z.tar.gz` (flat, one semver per set); the Go workflows download it
and point `BACKEND_DICT_DIR` at it. The runtime contract is unchanged (additive
`BACKEND_DICT_DIR/<version>/`, `engine.OpenWithVersions`, per-game `dict_version` pin; a version is
safe to retire once no active game pins it).
### Stage 15 — Deploy infra & test contour
Scope: the deploy machinery + the **test contour** (the bulk of the original Stage 14). Backend +
gateway **Dockerfiles** (multi-stage distroless, mirroring the Stage 9 connector image); the gateway
gains **static UI serving**, **embedded** via `go:embed` (a node build stage in the gateway image),
SPA served at both `/` (web) and `/telegram/` (Mini App), the §13 single-origin model; prod UI build
vars (`VITE_TELEGRAM_BOT_ID`, `VITE_TELEGRAM_LINK`, `VITE_GATEWAY_URL`) as image build-args; a root
`deploy/docker-compose.yml` (backend + gateway + Postgres + connector + VPN sidecar + the **full
observability stack** — OTel Collector + Prometheus + Tempo + Grafana with provisioned dashboards) on
the external `edge` network behind the host caddy (VPN sidecar only for the connector); the backend
image pulls the DAWG release artifact (Stage 14). **The test contour deploys automatically on push to
a feature branch** (`docker compose up -d --build` on the local host where the gitea runner lives),
with a post-deploy probe (`GET /` on the gateway). Test-contour secrets use the **`TEST_`** prefix
(see Stage 16).
Open details (re-interview at start): the dashboard set; the gateway static-serving hook (before the
h2c wrap — `/` + `/telegram/` mounts; a committed `dist` placeholder so `go build` works without a UI
build); Postgres healthcheck/volume; whether the connector-scoped compose is retired for the root one;
collector/Tempo/Prometheus retention.
### Stage 16 — Prod contour deploy
Scope: the **production contour** on a remote host over SSH. Deploy by **container export/import**
(`docker save` → `scp`/ssh → `docker load` → `docker compose up` on the remote), the SSH key + host IP
in Gitea secrets; **strictly manual** (`workflow_dispatch`) after a feature branch is merged to
`master`. Two-contour config uses **`TEST_`/`PROD_` secret/variable prefixes** — Gitea 1.26 has no
deployment environments (verified: the `environments` API 404s), so a flat prefixed namespace is the
convention.
Open details (re-interview): export/import vs a registry trade-off; prod domain/TLS at the remote
caddy; prod VPN; rollback.
### Stage 17 — Dual Telegram bots & language-gated variants *(feature; own interview)*
Scope (owner's idea, to design in detail at its own start): run **two bots in the one connector
container** — one for the English audience, one for Russian — each with its own token + game-channel id
+ service-language tag (the same Telegram user id spans both). `initData` validation tries each bot's
token in turn (none succeeds ⇒ invalid). The connector returns the **service language `en`/`ru`**;
`Notify`/`SendToUser` take a language key so the right bot delivers. The UI **gates the game-type
(variant) choice** by service language (en → English; ru → Russian + Эрудит).
Open details (own interview): which bot sends a notification for an **existing** game (game language vs
the player's service language) given one user id spans both bots; behaviour for **non-Telegram**
players (web/email/guest — ungated, or by interface language); the proto/wire changes
(`ValidateInitData` service-language field, a bot/language selector on the push RPCs); per-bot config +
tests. Engineering feedback already captured at the Stage 14 interview: the two-bots-in-one-container +
sequential validation + language-keyed routing model is sound.
## Refinements logged during implementation
@@ -862,13 +899,15 @@ dashboard stack; solver-publish vs clone-in-build; load expectations.
+ performance + guest GC; **Stage 13** = alphabet-on-the-wire (TODO-4); **Stage 14** =
CI & deploy (TODO-1, TODO-2, the collector + dashboards). The latter two were written
into the plan now as the agreed baseline (each still re-interviews at its own start).
(Stage 14 was itself later re-scoped to the solver/dictionary split alone; deploy +
observability + the dual-bot idea split into Stages 15–17.)
- **Shared telemetry** (interview): a new `pkg/telemetry` owns the OTel provider
bootstrap (exporter selection, W3C propagators, shutdown, Go runtime metrics); the
backend `internal/telemetry` is now a thin facade over it (keeping its gin middleware),
and the gateway and connector gained telemetry runtimes. A configurable **`otlp`**
exporter was added alongside `none`/`stdout`; the **default stays `none`**, the OTLP
endpoint comes from the standard `OTEL_EXPORTER_OTLP_*` env, and the collector +
dashboards are Stage 14 (so CI needs none). `otelgrpc` instruments the backend push
dashboards are Stage 15 (so CI needs none). `otelgrpc` instruments the backend push
server, the gateway's backend + connector clients, and the connector's gRPC server.
New config `GATEWAY_SERVICE_NAME`/`GATEWAY_OTEL_*` and `TELEGRAM_SERVICE_NAME`/
`TELEGRAM_OTEL_*`; the backend's existing `BACKEND_OTEL_*` gained the `otlp` value.
@@ -938,33 +977,52 @@ dashboard stack; solver-publish vs clone-in-build; load expectations.
handled by construction (the running backend produces the table, so client↔server cannot
drift); the DAWG/solver build-time agreement remains **Stage 14 / TODO-2**.
- **Stage 14** (interview + implementation, re-scoped + discharges TODO-1/TODO-2):
- **Re-scoped to the split** (interview): the original "CI & deploy" was several sessions of work,
so it was cut to the **solver/dictionary split** (the dependency foundation) and the deploy +
observability + the dual-bot idea were written into the plan as new **Stages 15–17**. The deploy
decisions taken at the interview are recorded there (embed the UI in the gateway via `go:embed`;
full Collector+Prometheus+Tempo+Grafana stack; **two contours** — test = auto on feature-branch
push on the local host, prod = manual SSH `docker save`/`load` after merge; `TEST_`/`PROD_` secret
prefixes since Gitea 1.26 has no environments — verified).
- **TODO-1 — publish solver** (interview: "publish and pin"): `scrabble-solver` renamed to
module `gitea.iliadenisov.ru/developer/scrabble-solver`, `internal/{wordlist,dictdawg}`
**de-internalised** to public packages (so the dict repo imports one builder — no drift), the build
pipeline (`cmd/builddict`, `dictprep`, the `dictionaries` submodule) moved out, `internal/dict`
repointed at the committed `dawg/*.dawg` fixtures, tagged **v1.0.0**. scrabble-game pins it in
`backend/go.mod`, drops the `go.work` replace + the CI clone, and sets `GOPRIVATE=gitea.iliadenisov.ru/*`
(go fetches the module directly from Gitea — verified end-to-end). The solver hash lives in
`go.work.sum` (workspace mode; the bare-path `scrabble/pkg` replace still blocks `go mod tidy`).
- **TODO-2 — dictionary repo** (interview: "full TODO-2, new repo"): `developer/scrabble-dictionary`
builds the three DAWGs against the published solver + pinned `dafsa`/`alphabet` v1.1.0,
**byte-identical** to the solver fixtures; published as the release artifact
`scrabble-dawg-v1.0.0.tar.gz`; both Go workflows download it for `BACKEND_DICT_DIR` instead of
cloning the solver. English source vendored from `kamilmielnik/scrabble-dictionaries`; the Эрудит
fold is committed as `dictprep/russian/erudit.txt`, so the build needs no `python`.
- **Bootstrap nuances** (encountered): the dict repo was created empty with a protected `master`, so
it was seeded once via an owner-authorised protection lift→push→restore (a subsequent CI-fix push
correctly went through a PR, not another lift); it was made **public** (like the solver) so the Go
workflows fetch the artifact anonymously. Its CI is a **build-only** validation gate — the
auto-release step's `${{ github.* }}` contexts failed the Gitea workflow compile, so releases are
published manually for now (a logged follow-up).
## Deferred TODOs (cross-stage)
- **TODO-1 — publish & version the solver.** Once `scrabble-solver` is stable,
give it a real module URL and switch `backend` to a versioned dependency,
dropping the `go.work` replace and the CI clone. Removes the floating
`master` dependency accepted for now (Stage 2 interview). **Planned for Stage 14**
(it cleans up the backend Docker build; a clone-in-build fallback stays available).
- **TODO-2 — split the solver into engine vs dictionary generator + versioned
dictionary artifacts.** Owner's idea, with the caveats agreed at the Stage 2
interview: the split is sound (build-time wordlist→DAWG vs runtime load have
different lifecycles and shrink the runtime dependency surface), **but** the
generator must pin the **same** `dafsa`/`alphabet` versions and alphabet
definitions as the runtime engine or the on-disk format / letter indexing
drifts and silently corrupts validation. For delivery prefer **Git LFS or an
artifact store** (Gitea releases / OCI artifact / object storage) over a raw
git submodule (the ~0.5–0.7 MB DAWGs are regenerated wholesale and bloat git
history); pin by tag/hash for a reproducible startup set. A submodule/LFS pull
is a **deploy-time** way to populate the directory, **not** the runtime
dynamic-reload mechanism (**implemented in Stage 10**: a per-version subdirectory
`BACKEND_DICT_DIR/<version>/` loaded via `Registry.LoadAvailable`, restart-restored by
`engine.OpenWithVersions`) — keep the `BACKEND_DICT_DIR` directory as
the runtime contract: a new `.dawg` appears in it and is loaded with
`dawg.Load`. **Planned for Stage 14**, agreed resolution: a **new versioned repo**
for the parsers + built DAWGs, delivered as a **release artifact** (not `go get`),
versioned with **one semver label for the whole set** (additive; old versions retired
once no active game pins them — see Stage 14). The generator must build against the same
`dafsa`/`alphabet`/solver as the runtime (the index-drift caveat, shared with TODO-4).
- ~~**TODO-1 — publish & version the solver.**~~ **Done in Stage 14.** `scrabble-solver` is
published as module `gitea.iliadenisov.ru/developer/scrabble-solver` (tagged `v1.0.0`, with
`wordlist`/`dictdawg` de-internalised to public packages); `backend/go.mod` pins it, the `go.work`
replace and the CI sibling-clone are gone, and `GOPRIVATE=gitea.iliadenisov.ru/*` fetches it directly
(no public proxy/checksum DB). Removes the floating `master` dependency accepted since Stage 2.
- ~~**TODO-2 — split the solver into engine vs dictionary generator + versioned dictionary
artifacts.**~~ **Done in Stage 14.** A new repo `developer/scrabble-dictionary` holds the word-list
sources + `cmd/builddict` (moved out of the solver, with `dictprep` and the `dictionaries` submodule)
and builds the three DAWGs against the **published solver + pinned `dafsa`/`alphabet` v1.1.0** — the
output is **byte-identical** to the solver's committed fixtures, so the index-drift caveat is handled
by construction. Delivered as a Gitea **release artifact** `scrabble-dawg-vX.Y.Z.tar.gz` (not
`go get`; DAWGs are data; **one semver label for the whole set**); the Go workflows download it for
`BACKEND_DICT_DIR`. The runtime dynamic-reload contract (per-version `BACKEND_DICT_DIR/<version>/` via
`Registry.LoadAvailable` / `engine.OpenWithVersions`, Stage 10) is unchanged — a deploy drops a new
set into the directory; a version is safe to retire once no active game pins it.
- ~~**TODO-3 — garbage-collect abandoned guest accounts.**~~ **Done in Stage 12.**
A periodic `account.GuestReaper` deletes guests (`is_guest`) **with no game seat at
all** whose account age exceeds `BACKEND_GUEST_RETENTION` (default 30 d, swept every
@@ -984,8 +1042,9 @@ dashboard stack; solver-publish vs clone-in-build; load expectations.
produced from the solver ruleset (`engine.AlphabetTable`), so it is pinned by the solver
version and cannot drift from the running backend, and `ui/src/lib/premiums.ts` is now
geometry only. The durable journal / history / GCG stay decoded concrete characters (§9.1,
unchanged). The DAWG/solver build-time agreement (the original caveat, shared with TODO-2)
remains Stage 14.
unchanged). The DAWG/solver build-time agreement (the original caveat, shared with TODO-2) was
discharged in Stage 14: the dict repo builds against the published solver + pinned
`dafsa`/`alphabet`, byte-identical to the fixtures.
- **TODO-5 — QR friend codes (owner's idea, Stage 8).** *Partially done in Stage 9:*
the deep-link scheme now exists (`f<code>`, shared Go ↔ TS), the bot redeems it on
launch, and the UI shows a **share-to-Telegram** link for an issued code when