R3: dashboards, docs and tracker bake-back
CI / changes (pull_request) Successful in 1s
CI / unit (pull_request) Successful in 8s
CI / integration (pull_request) Successful in 12s
CI / ui (pull_request) Successful in 36s
CI / gate (pull_request) Successful in 0s
CI / deploy (pull_request) Successful in 1m7s

- Edge/UX dashboard: aggregate request-rate vs rejection-rate panel
  (gateway_rate_limited_total by class; no per-user labels).
- ARCHITECTURE §2/§11/§12/§13: body cap + explicit h2c sizing, the rate-limit
  observability pipeline and auto-flag policy, the admin-limiter note (and the
  caddy-path gap), the landing container topology; fixed the stale 120/min
  per-user figure.
- FUNCTIONAL (+_ru): the Throttled view and the reversible high-rate flag.
- gateway/backend/deploy READMEs, TESTING.md, root CLAUDE.md updated.
- PRERELEASE.md: R3 interview decisions + implementation refinements logged;
  tracker R3 -> done (this PR implements it; CI gates the merge).
This commit is contained in:
Ilia Denisov
2026-06-10 05:12:17 +02:00
parent f20a4b49ff
commit 7e75c32d07
10 changed files with 144 additions and 29 deletions
+29 -1
View File
@@ -19,7 +19,7 @@ the edge before prod. Each phase maps back to the owner's raw pre-release TODO l
|---|-------|-----------|--------|
| R1 | Schema & naming reset | 1 + 10 | **done** |
| R2 | Stress harness + contour observability + early run | 9a | **done** |
| R3 | Edge hardening | 2 + 8 + 3 | todo |
| R3 | Edge hardening | 2 + 8 + 3 | **done** |
| R4 | Push enrichment + kill the last poll | 4 + 5 | todo |
| R5 | Bundle slimming | 6 | todo |
| R6 | Refactor + docs reconciliation + de-staging | 7 | todo |
@@ -253,3 +253,31 @@ Then Stage 18.
it feeds R3 (h2c `MaxConcurrentStreams`/timeouts, body-size cap), R6 and R7 (per-player transports,
separate hardware, pool/limit sizing).
- **CI:** `./loadtest/...` added to the path filter + vet/build/test; `go.work.sum` carries the new deps.
- **R3** (interview + implementation):
- **Locked decisions:** the flag column lands by **editing the R1 baseline** (+ a contour schema
wipe after merge — no migration chain accrues before prod); auto-flag defaults **1000 rejected /
10 min** (`BACKEND_HIGHRATE_FLAG_THRESHOLD`/`_WINDOW`, rolling window, set-once, operator clears,
no auto-ban); landing image = **caddy:2-alpine**; throttle data flows **gateway → backend** (a
30 s per-key summary POST to the new `/api/v1/internal/ratelimit/report`, the existing trusted
direction) with the episode window + flag rule in the backend (`internal/ratewatch`); rejection
logging = **Warn summary per key per window + Debug per rejection** — a deliberate deviation from
the phase's "structured log per rejection" (the R2 hammer would have logged ~522k lines in
minutes); all three R2-report tails included (explicit h2c sizing, the session-resolve failure
cause at Warn, reviving the admin limiter).
- **Body cap:** `GATEWAY_MAX_BODY_BYTES` (default 1 MiB) as both the Connect per-message read limit
and an `http.MaxBytesReader` wrap of the public mux; an oversized Execute is `resource_exhausted`.
- **Dead config found:** `AdminPerMinute`/`AdminBurst` were never wired — the gateway `/_gm` mount is
now 429-guarded per IP ahead of its Basic-Auth. The caddy-fronted contour path stays unlimited
(stock caddy has no limiter) — an accepted gap, recorded in `docs/ARCHITECTURE.md` §12.
- **Landing split:** a `landing` target in `gateway/Dockerfile` (the UI build stage is shared;
identical compose build args keep it one cached build); the gateway drops `landing.html` from the
embed and 308-redirects `/``/app/`; the contour caddy routes `/app/`, `/telegram/` and the
Connect path to the gateway and the catch-all to the landing container; the CI deploy probe now
checks both `/` (landing) and `/app/` (gateway).
- **Observability:** `gateway_rate_limited_total{class}` (user/public/email/admin, aggregate-only)
+ a rate-vs-rejections panel on the Edge/UX dashboard; the admin console gains the **Throttled**
page (the in-memory episode window, reset-on-restart like `active_users`, plus the flagged-account
queue) and the flag badge / clear action on the user list / card.
- The jet regen also restored the previously missing `game_drafts`/`game_hidden` generated models
(their tables were added after the last jetgen run; no behaviour change).