diff --git a/.gitignore b/.gitignore index 259564b..912d4c2 100644 --- a/.gitignore +++ b/.gitignore @@ -16,3 +16,6 @@ # Local, unstaged env overrides **/.env.local **/.env.*.local + +# Claude Code harness runtime artifacts +.claude/scheduled_tasks.lock diff --git a/CLAUDE.md b/CLAUDE.md index ba32697..28a1502 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -127,8 +127,8 @@ docs/ .gitea/workflows/ PLAN.md CLAUDE.md README.md gateway/ ui/ pkg/ # added by their stages platform/telegram/ # Telegram connector side-service (Stage 9): bot + gRPC API loadtest/ # module scrabble/loadtest: the pre-release stress harness (R2) -backend/Dockerfile gateway/Dockerfile platform/telegram/Dockerfile loadtest/Dockerfile # multi-stage distroless (Stage 16; loadtest R2) -deploy/ # docker-compose + caddy + otelcol/prometheus/tempo/grafana (+ cAdvisor/postgres_exporter, R2) +backend/Dockerfile gateway/Dockerfile platform/telegram/Dockerfile loadtest/Dockerfile # multi-stage distroless (Stage 16; loadtest R2); gateway/Dockerfile also has the `landing` target (R3) +deploy/ # docker-compose + caddy + landing + otelcol/prometheus/tempo/grafana (+ cAdvisor/postgres_exporter, R2) ``` ## Build & test @@ -144,8 +144,9 @@ go run ./backend/cmd/backend # /healthz, /readyz on :8080 cd ui && pnpm install && pnpm check && pnpm test:unit && pnpm build # the UI (Stage 7+) pnpm start # UI mock mode: lobby -> game, no backend -docker build -f backend/Dockerfile -t scrabble-backend . # images (Stage 16); gateway embeds the UI -docker build -f gateway/Dockerfile -t scrabble-gateway . +docker build -f backend/Dockerfile -t scrabble-backend . # images (Stage 16); gateway embeds the SPA +docker build -f gateway/Dockerfile --target gateway -t scrabble-gateway . +docker build -f gateway/Dockerfile --target landing -t scrabble-landing . # static landing (R3) docker compose -f deploy/docker-compose.yml config # validate the full contour ``` diff --git a/PRERELEASE.md b/PRERELEASE.md index 46cf842..51a2c75 100644 --- a/PRERELEASE.md +++ b/PRERELEASE.md @@ -19,7 +19,7 @@ the edge before prod. Each phase maps back to the owner's raw pre-release TODO l |---|-------|-----------|--------| | R1 | Schema & naming reset | 1 + 10 | **done** | | R2 | Stress harness + contour observability + early run | 9a | **done** | -| R3 | Edge hardening | 2 + 8 + 3 | todo | +| R3 | Edge hardening | 2 + 8 + 3 | **done** | | R4 | Push enrichment + kill the last poll | 4 + 5 | todo | | R5 | Bundle slimming | 6 | todo | | R6 | Refactor + docs reconciliation + de-staging | 7 | todo | @@ -253,3 +253,31 @@ Then Stage 18. it feeds R3 (h2c `MaxConcurrentStreams`/timeouts, body-size cap), R6 and R7 (per-player transports, separate hardware, pool/limit sizing). - **CI:** `./loadtest/...` added to the path filter + vet/build/test; `go.work.sum` carries the new deps. + +- **R3** (interview + implementation): + - **Locked decisions:** the flag column lands by **editing the R1 baseline** (+ a contour schema + wipe after merge — no migration chain accrues before prod); auto-flag defaults **1000 rejected / + 10 min** (`BACKEND_HIGHRATE_FLAG_THRESHOLD`/`_WINDOW`, rolling window, set-once, operator clears, + no auto-ban); landing image = **caddy:2-alpine**; throttle data flows **gateway → backend** (a + 30 s per-key summary POST to the new `/api/v1/internal/ratelimit/report`, the existing trusted + direction) with the episode window + flag rule in the backend (`internal/ratewatch`); rejection + logging = **Warn summary per key per window + Debug per rejection** — a deliberate deviation from + the phase's "structured log per rejection" (the R2 hammer would have logged ~522k lines in + minutes); all three R2-report tails included (explicit h2c sizing, the session-resolve failure + cause at Warn, reviving the admin limiter). + - **Body cap:** `GATEWAY_MAX_BODY_BYTES` (default 1 MiB) as both the Connect per-message read limit + and an `http.MaxBytesReader` wrap of the public mux; an oversized Execute is `resource_exhausted`. + - **Dead config found:** `AdminPerMinute`/`AdminBurst` were never wired — the gateway `/_gm` mount is + now 429-guarded per IP ahead of its Basic-Auth. The caddy-fronted contour path stays unlimited + (stock caddy has no limiter) — an accepted gap, recorded in `docs/ARCHITECTURE.md` §12. + - **Landing split:** a `landing` target in `gateway/Dockerfile` (the UI build stage is shared; + identical compose build args keep it one cached build); the gateway drops `landing.html` from the + embed and 308-redirects `/` → `/app/`; the contour caddy routes `/app/`, `/telegram/` and the + Connect path to the gateway and the catch-all to the landing container; the CI deploy probe now + checks both `/` (landing) and `/app/` (gateway). + - **Observability:** `gateway_rate_limited_total{class}` (user/public/email/admin, aggregate-only) + + a rate-vs-rejections panel on the Edge/UX dashboard; the admin console gains the **Throttled** + page (the in-memory episode window, reset-on-restart like `active_users`, plus the flagged-account + queue) and the flag badge / clear action on the user list / card. + - The jet regen also restored the previously missing `game_drafts`/`game_hidden` generated models + (their tables were added after the last jetgen run; no behaviour change). diff --git a/backend/README.md b/backend/README.md index 03453ce..724642f 100644 --- a/backend/README.md +++ b/backend/README.md @@ -99,6 +99,14 @@ durable owner — then the durable account wins and a fresh session is minted fo The `accounts.paid_account`/`merged_into`/`merged_at` columns back this. This supersedes the Stage 8 `email.bind.*` edge surface (the `RequestCode`/`ConfirmCode` primitives stay). +**R3** adds rate-limit observability: the gateway posts its periodic rejection +summaries to `POST /api/v1/internal/ratelimit/report`; `internal/ratewatch` keeps a +bounded in-memory episode window for the console's **Throttled** page and applies the +conservative auto-flag — an account sustaining `BACKEND_HIGHRATE_FLAG_THRESHOLD` +rejected calls within `BACKEND_HIGHRATE_FLAG_WINDOW` gets the soft, reversible +`accounts.flagged_high_rate_at` marker (set-once; a badge in the user list and a +**Clear** action on the user card; never an automatic ban). + ## Package layout ``` @@ -121,6 +129,7 @@ internal/lobby/ # in-memory matchmaking pool (+ robot substitution) + frien internal/robot/ # human-like robot opponent: account pool, seed-derived strategy, move driver internal/adminconsole/ # server-rendered admin console (Go templates + embedded CSS, view models), served at /_gm internal/connector/ # backend gRPC client to the Telegram connector (operator broadcasts) +internal/ratewatch/ # gateway rate-limit reports: episode window for the console + the high-rate auto-flag (R3) ``` ## Configuration (environment) @@ -153,6 +162,8 @@ internal/connector/ # backend gRPC client to the Telegram connector (operator b | `BACKEND_CONNECTOR_ADDR` | — | Telegram connector gRPC address for admin-console operator broadcasts. Empty disables broadcasts. | | `BACKEND_GUEST_REAP_INTERVAL` | `1h` | How often the abandoned-guest reaper sweeps. | | `BACKEND_GUEST_RETENTION` | `720h` | Account age past which a guest with no game seat is deleted. | +| `BACKEND_HIGHRATE_FLAG_THRESHOLD` | `1000` | Gateway-reported rejected calls within the window past which an account is soft-flagged (R3). | +| `BACKEND_HIGHRATE_FLAG_WINDOW` | `10m` | The rolling window those rejections accumulate over. | ## Run diff --git a/deploy/README.md b/deploy/README.md index 9435c8a..5caa0e5 100644 --- a/deploy/README.md +++ b/deploy/README.md @@ -1,9 +1,9 @@ # deploy -The full Scrabble contour: `backend` + `gateway` + Postgres + the Telegram -connector (with a VPN sidecar) + the observability stack (OTel Collector → -Prometheus + Tempo → Grafana), fronted by a **caddy** that owns a single `/_gm` -Basic-Auth (the admin console + Grafana). Topology and the decision record are in +The full Scrabble contour: `backend` + `gateway` + the static `landing` + Postgres + +the Telegram connector (with a VPN sidecar) + the observability stack (OTel +Collector → Prometheus + Tempo → Grafana), fronted by a **caddy** that owns a single +`/_gm` Basic-Auth (the admin console + Grafana). Topology and the decision record are in [`../docs/ARCHITECTURE.md`](../docs/ARCHITECTURE.md) §13; this file is the operational reference for **every environment variable**. @@ -11,8 +11,9 @@ operational reference for **every environment variable**. | Service | Image | Role | | --- | --- | --- | -| `caddy` | `caddy:2-alpine` | Edge proxy (alias `scrabble` on `edge`): single `/_gm` Basic-Auth → admin console + Grafana; everything else → gateway. TLS per `CADDY_SITE_ADDRESS`. | -| `gateway` | built (`gateway/Dockerfile`) | Public edge; serves the embedded landing at `/` and the game SPA at `/app/` + `/telegram/`; Connect-RPC edge. | +| `caddy` | `caddy:2-alpine` | Edge proxy (alias `scrabble` on `edge`): single `/_gm` Basic-Auth → admin console + Grafana; `/app/`, `/telegram/` + the Connect path → gateway; the catch-all (incl. `/`) → landing. TLS per `CADDY_SITE_ADDRESS`. | +| `gateway` | built (`gateway/Dockerfile`, target `gateway`) | Public edge; serves the embedded game SPA at `/app/` + `/telegram/`; Connect-RPC edge. `/` redirects to `/app/`. | +| `landing` | built (`gateway/Dockerfile`, target `landing`) | Static landing page at `/` (caddy:2-alpine + the shared Vite build, `deploy/landing/Caddyfile`); absorbs stray public paths (R3). | | `backend` | built (`backend/Dockerfile`) | Domain service; bakes in the DAWG dictionaries; runs migrations at boot. | | `postgres` | `postgres:17-alpine` | Database (named volume, `pg_isready` healthcheck). | | `vpn` + `telegram` | sidecar + built (`platform/telegram/Dockerfile`) | Telegram connector; egresses through the AmneziaWG sidecar; internal gRPC at `telegram:9091`. | @@ -88,8 +89,9 @@ connector **fails at boot** if both are empty. | `VITE_TELEGRAM_GAME_CHANNEL_NAME_RU` | variable | _(empty)_ | UI build-arg: the landing "Play in Telegram" link for the **Russian** bot (e.g. `https://t.me/Erudit_Game`). | | `VITE_GATEWAY_URL` | variable | _(empty)_ | UI build-arg: gateway origin; empty = same-origin (the usual single-origin deploy). | -The five `VITE_*` are **build-args** baked into the gateway image at build time, so -changing them requires a rebuild (`--build`), not just a restart. +The five `VITE_*` are **build-args** baked into the gateway and landing images at +build time (both targets share one UI build stage — keep the args identical so it is +built once), so changing them requires a rebuild (`--build`), not just a restart. ## Fixed internal wiring (not operator-set) diff --git a/deploy/grafana/dashboards/edge-ux.json b/deploy/grafana/dashboards/edge-ux.json index 44b49d6..413a268 100644 --- a/deploy/grafana/dashboards/edge-ux.json +++ b/deploy/grafana/dashboards/edge-ux.json @@ -34,6 +34,18 @@ "fieldConfig": { "defaults": { "unit": "reqps" }, "overrides": [] }, "datasource": { "type": "prometheus", "uid": "prometheus" }, "targets": [{ "refId": "A", "expr": "sum(rate(edge_request_duration_count[5m])) by (result)", "legendFormat": "{{result}}" }] + }, + { + "type": "timeseries", + "title": "Rate limiting — request rate vs rejections (R3)", + "description": "Aggregate only (no per-user labels, the Stage 12/17 discipline): total edge request rate against the limiter rejection rate by class. Per-key detail lives in the admin console's Throttled view.", + "gridPos": { "h": 8, "w": 24, "x": 0, "y": 16 }, + "fieldConfig": { "defaults": { "unit": "reqps" }, "overrides": [] }, + "datasource": { "type": "prometheus", "uid": "prometheus" }, + "targets": [ + { "refId": "A", "expr": "sum(rate(edge_request_duration_count[5m]))", "legendFormat": "requests" }, + { "refId": "B", "expr": "sum(rate(gateway_rate_limited_total[5m])) by (class)", "legendFormat": "rejected · {{class}}" } + ] } ] } diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 6b6e0f2..d1546a5 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -98,6 +98,15 @@ dropped). Horizontal scaling is explicit future work. response was lost — its button is disabled while offline and the player re-issues it on reconnect). A reachability watcher (a lightweight `profile.get` probe) clears the signal when no other traffic is in flight; the live `Subscribe` stream's drop/recovery feeds the same signal. + **Edge hardening (R3):** every request body on the public listener is capped at + `GATEWAY_MAX_BODY_BYTES` (default 1 MiB — far above any legitimate payload), both at the HTTP + layer (`http.MaxBytesReader`) and as the Connect per-message read limit, so an oversized + `Execute` is refused (`resource_exhausted`) without buffering. The h2c server carries explicit + sizing: `MaxConcurrentStreams` 250 (the x/net default made visible — a real client holds one + `Subscribe` stream plus a few unary calls) and a 3-minute connection `IdleTimeout` (a live + `Subscribe` stream keeps its connection active, so only abandoned connections are reaped); the + `http.Server` sets only `ReadHeaderTimeout` (10 s) — Read/WriteTimeout would kill the stream. + R7 revisits the exact values under load. - **Alphabet on the wire (Stage 13)**: live play exchanges **alphabet indices**, not concrete letters. The rack (`StateView.rack`), the `SubmitPlay`/`Evaluate` tiles, the `Exchange` tiles and the `CheckWord` word are `ubyte` indices into the variant's alphabet @@ -572,6 +581,21 @@ promotions) is future work and would deliver short markdown messages (text + lin distinct accounts that performed an authenticated edge action in the window. The gauge is single-process by design (single-instance MVP, §10): it is correct for one gateway, resets on restart, and is a live operational figure, not a billing count. +- **Rate-limit observability (R3):** every limiter rejection increments the gateway + counter `gateway_rate_limited_total` (`class` = user/public/email/admin — aggregate + only, honouring the no-per-user-label discipline above) and logs one **Debug** line; + a gateway reporter drains the per-key rejection tracker every 30 s, emits one **Warn** + summary per throttled key and posts the report to the backend + (`POST /api/v1/internal/ratelimit/report`, network-trusted like `sessions/resolve`). + The backend's `ratewatch` keeps a bounded in-memory episode window (single-instance, + resets on restart, like `active_users`) surfaced on the admin console's **Throttled** + page next to the flagged-account review queue, and applies the **conservative + auto-flag**: an account sustaining `BACKEND_HIGHRATE_FLAG_THRESHOLD` rejected calls + (default 1000) within `BACKEND_HIGHRATE_FLAG_WINDOW` (default 10 min) gets the soft, + reversible `accounts.flagged_high_rate_at` marker — set once, shown in the user + list/detail, cleared by the operator, **never an automatic ban** and never a request + gate. The Edge/UX dashboard graphs the aggregate request rate against the rejection + rate by class. - Unauthenticated `GET /healthz` (liveness) and `GET /readyz` (readiness — the database answers a bounded ping and the session cache is warmed). - The backend serves a **second listener** — a gRPC server @@ -582,12 +606,12 @@ promotions) is future work and would deliver short markdown messages (text + lin | Concern | Enforced by | | --- | --- | -| Public rate limiting / anti-abuse | gateway | +| Public rate limiting / anti-abuse | gateway (per-IP public/email/admin classes, per-user authenticated class; a request body cap of `GATEWAY_MAX_BODY_BYTES`; rejections are metered, summarised to the backend and surfaced in the admin console with a conservative reversible auto-flag — R3, §11) | | Telegram initData validation (bot-token HMAC) | the Telegram connector; the gateway delegates it over gRPC, so the bot token lives only in the connector | | Session minting; email-code / guest validation | gateway (with backend) | | Session → `user_id` resolution, `X-User-ID` injection | gateway | | Authorisation, ownership, state transitions | backend (`X-User-ID` is the sole identity input) | -| Admin authentication | a single Basic-Auth gate on `/_gm/*`, forwarded **verbatim** to the backend's server-rendered admin console (and, in the deployed contour, routing `/_gm/grafana/*` to Grafana). In the deploy the **caddy** owns this gate (§13); a local non-caddy run uses the gateway's own `GATEWAY_ADMIN_*` proxy. The backend trusts the proxy (no admin principal) and guards its state-changing POSTs with a **same-origin** check — the console's CSRF defence. No operator identity is tracked | +| Admin authentication | a single Basic-Auth gate on `/_gm/*`, forwarded **verbatim** to the backend's server-rendered admin console (and, in the deployed contour, routing `/_gm/grafana/*` to Grafana). In the deploy the **caddy** owns this gate (§13); a local non-caddy run uses the gateway's own `GATEWAY_ADMIN_*` proxy, which the per-IP admin limiter class guards ahead of its Basic-Auth (R3) — the caddy-fronted path has no limiter (stock caddy), an accepted gap. The backend trusts the proxy (no admin principal) and guards its state-changing POSTs with a **same-origin** check — the console's CSRF defence. No operator identity is tracked | | backend ↔ gateway ↔ connector trust | the network (only gateway may reach backend; the connector serves unauthenticated gRPC on the internal segment) | This is an explicit, accepted MVP risk: compromise of the gateway↔backend @@ -597,7 +621,7 @@ mutual auth is a future hardening step. **Short numeric codes** (email confirm-codes and Stage 8 friend codes) are stored only as SHA-256 hashes and are short-lived and single-use. The unauthenticated email path carries a tight per-IP sub-limit (5 / 10 min); the **friend-code redeem** -is authenticated, so it rides the per-user limit (120 / min) and is further bounded +is authenticated, so it rides the per-user limit (300 / min) and is further bounded by the code's 12 h TTL, single use, and **one live code per issuer** (which caps the valid-code population). Brute-forcing a 6-digit friend code within these limits is an accepted MVP risk with low blast radius (an unwanted friendship is removable/blockable); @@ -605,22 +629,27 @@ a dedicated redeem sub-limit or a longer code is the hardening step if abuse app ## 13. Deployment (informational) -Single public origin, path-routed. The gateway **embeds** the static UI build -(`go:embed`, baked in by a node stage in `gateway/Dockerfile`). The Vite build has two -entries: a lightweight **landing page** served at `/`, and the game **SPA** served at +Single public origin, path-routed. The Vite build has two entries: a lightweight +**landing page** and the game **SPA**. The gateway **embeds** the SPA build +(`go:embed`, baked in by a node stage in `gateway/Dockerfile`) and serves it at `/app/` (web) and `/telegram/` (the Telegram Mini App; outside Telegram that path -redirects to the root — the client-side guard). Hash-named `/assets/*` are served +redirects to the root — the client-side guard); a stray hit on the gateway's `/` +308-redirects to `/app/`. The **landing** ships in its own static container (R3): the +`landing` target of `gateway/Dockerfile` (caddy:2-alpine + the same Vite build, +`deploy/landing/Caddyfile`) serves it at `/`, so stray public traffic is absorbed by +static file serving and never reaches the Go edge. Hash-named `/assets/*` are served `immutable` (a relaunch is a cache hit, not a re-download); the HTML shells are -`no-cache` so a new deploy is picked up. An in-compose **caddy** is the -contour's edge: it owns a single `/_gm` Basic-Auth and routes `/_gm/grafana/*` to -**Grafana** (anonymous-admin, so the one shared login gates it with no per-user -Grafana accounts) and the rest of `/_gm/*` to the backend-rendered **admin console**; -everything else (`/`, `/app/`, `/telegram/`, the Connect edge) goes to the gateway. The +`no-cache` so a new deploy is picked up — both containers apply the same caching. An +in-compose **caddy** is the contour's edge: it owns a single `/_gm` Basic-Auth and +routes `/_gm/grafana/*` to **Grafana** (anonymous-admin, so the one shared login gates +it with no per-user Grafana accounts) and the rest of `/_gm/*` to the backend-rendered +**admin console**; `/app/`, `/telegram/` and the Connect path go to the gateway; the +catch-all — notably the landing at `/` — goes to the landing container. The **Telegram connector** runs as a separate container with **no public ingress** — it long-polls Telegram and egresses through a VPN sidecar, answering only internal gRPC. The full contour (`deploy/docker-compose.yml`) runs one `gateway`, one `backend`, -one Postgres, the connector (+ its VPN sidecar) and the **observability stack** — +one Postgres, the static `landing`, the connector (+ its VPN sidecar) and the **observability stack** — OTel Collector (OTLP/gRPC ingest → Prometheus metrics + Tempo traces) and Grafana with provisioned datasources and dashboards. All three services export OTLP to the collector; the connector shares the VPN sidecar's netns, so its `AWG_CONF` must not @@ -633,7 +662,8 @@ network (project-scoped DNS); only caddy joins the shared external `edge` networ Two contours, two secret/variable prefixes (`TEST_` / `PROD_`): - **Test** (Stage 16): auto-deploys on a PR into — or a push to — `development` (`.gitea/workflows/ci.yaml` → `docker compose up -d --build` on the Gitea runner - host, then a `GET /` probe through caddy). The host caddy terminates TLS and + host, then `GET /` + `GET /app/` probes through caddy — the landing container and + the gateway, R3). The host caddy terminates TLS and forwards the domain to `scrabble:80`, so the in-compose caddy serves plain HTTP (`CADDY_SITE_ADDRESS=:80`). The in-compose caddy **trusts X-Forwarded-For from private-range upstreams** (`trusted_proxies private_ranges`), so the real client IP — diff --git a/docs/FUNCTIONAL.md b/docs/FUNCTIONAL.md index c425027..3399b89 100644 --- a/docs/FUNCTIONAL.md +++ b/docs/FUNCTIONAL.md @@ -171,3 +171,11 @@ applied after a reload). When a Telegram connector is configured an operator can **message a user** (by their Telegram identity) or **post to the game channel**. State-changing actions are protected by a same-origin check; the console tracks no operator identity. + +The console also surfaces **rate-limit abuse** (R3): a **Throttled** page lists the +recently throttled users/IPs the gateway reported (an in-memory window — it resets on +a backend restart) and the accounts currently carrying the soft **high-rate flag**. An +account sustaining rejections past a tunable threshold is flagged automatically — +the marker is reversible, shown as a badge in the user list and on the user card, and +**never blocks play**; the operator reviews and clears it from the user card. There is +no automatic ban. diff --git a/docs/FUNCTIONAL_ru.md b/docs/FUNCTIONAL_ru.md index f82266b..ee0e7b3 100644 --- a/docs/FUNCTIONAL_ru.md +++ b/docs/FUNCTIONAL_ru.md @@ -175,3 +175,11 @@ identity, их игры) и **игры** (сводка + места), разби подключён Telegram-коннектор, оператор также может **написать пользователю** (по его Telegram-identity) или **отправить пост в игровой канал**. Изменяющие действия защищены проверкой same-origin; личность оператора не отслеживается. + +Консоль также показывает **злоупотребление лимитами** (R3): страница **Throttled** +перечисляет недавно затроттленных пользователей/IP по отчётам gateway (окно в памяти — +сбрасывается при рестарте backend) и аккаунты с действующим мягким **high-rate +флагом**. Аккаунт, устойчиво превышающий настраиваемый порог отказов, помечается +автоматически — маркер обратим, виден бейджем в списке пользователей и на карточке +аккаунта и **никогда не блокирует игру**; оператор рассматривает и снимает его с +карточки пользователя. Автоматического бана нет. diff --git a/docs/TESTING.md b/docs/TESTING.md index 2319a8e..02dc510 100644 --- a/docs/TESTING.md +++ b/docs/TESTING.md @@ -76,7 +76,14 @@ tests or touching CI. unsubscribe), the transcode round-trips (FlatBuffers↔JSON, X-User-ID forwarding, nested GameView, domain-code surfacing), the admin Basic-Auth reverse proxy (401 / forward), and a full Connect `Execute` path end to end - (guest auth, unauthenticated rejection, unknown message type). The backend gains + (guest auth, unauthenticated rejection, unknown message type). **R3** adds the + edge-hardening cases: an oversized `Execute` payload is refused + (`resource_exhausted`, the `GATEWAY_MAX_BODY_BYTES` cap), a limiter rejection + lands in `gateway_rate_limited_total{class}` and the rejection tracker + (drain/aggregate unit tests), the report POST reaches + `/api/v1/internal/ratelimit/report` with the agreed JSON shape, the `/_gm` + mount is 429-guarded by the per-IP admin class, and the gateway's `/` + 308-redirects to `/app/` (the landing left the embed). The backend gains the **guest** lifecycle (a guest plays an auto-match to a natural end yet accrues no statistics) and the **email-as-login** flow (request/verify, returning user) in `inttest`. Stage 8 adds gateway transcode round-trips for the new social/account @@ -92,7 +99,12 @@ tests or touching CI. 404 when not). Postgres-backed `inttest` drives the **complaint resolution → dictionary-change pipeline** (file → resolve with a disposition → pending change → mark applied), the admin **list/count** read queries, and the **/_gm console over HTTP** - (pages render; a resolve POST needs a same-origin header). + (pages render; a resolve POST needs a same-origin header). **R3** adds `ratewatch` + unit tests (window accumulation, the auto-flag threshold + expiry, the bounded + episode map), the account-store **high-rate flag round-trip** (set-once / clear / + re-flag) and a console flow in `inttest`: a gateway report auto-flags the account, + the **Throttled** page shows the episode and the flagged queue, the user card + carries the marker and the CSRF-guarded **Clear** reverses it. - **Observability & performance** *(Stage 12)* — `pkg/telemetry` unit-tests the exporter selection (`none`/`stdout`/`otlp` build providers; OTLP constructs with no collector; the nil-runtime fallback). The domain metrics are exercised through a manual