R3: edge hardening — body cap, rate-limit observability, auto-flag, landing split #34

Merged
developer merged 4 commits from feature/r3-edge-hardening into development 2026-06-10 03:38:52 +00:00
Owner

Phase R3 of PRERELEASE.md (TODO 2 + 8 + 3), feeding from the R2 trip report.

Gateway

  • GATEWAY_MAX_BODY_BYTES (1 MiB): Connect per-message read limit + http.MaxBytesReader on the public mux; oversized Execute -> resource_exhausted.
  • Explicit h2c sizing (MaxConcurrentStreams 250, IdleTimeout 3m) + ReadHeaderTimeout 10s; values re-checked in R7.
  • gateway_rate_limited_total{class} + Debug per rejection; a 30 s reporter drains the per-key tracker into a Warn summary and POST /api/v1/internal/ratelimit/report.
  • The dead AdminPerMinute/AdminBurst policy now 429-guards the /_gm mount ahead of its Basic-Auth; session-resolve infra failures log their cause at Warn (the R2 0.04% unauthenticated).

Backend

  • accounts.flagged_high_rate_at baked into the R1 baseline (the contour schema must be wiped after mergeDROP SCHEMA backend CASCADE + restart, the R1 procedure); jet regenerated (also restores the missing game_drafts/game_hidden models).
  • New internal/ratewatch: bounded in-memory episode window + the conservative auto-flag (1000 rejected / 10 min, BACKEND_HIGHRATE_FLAG_*, set-once, operator clears, no auto-ban).
  • Admin console: Throttled page (episodes + flagged queue), the high-rate badge in the user list, the marker + Clear action on the user card (CSRF-guarded).

Landing split

  • landing target in gateway/Dockerfile (caddy:2-alpine + the shared Vite build); the gateway drops landing.html from the embed and 308-redirects / -> /app/; the contour caddy routes /app/, /telegram/ + the Connect path to the gateway and the catch-all to the landing container; the CI probe checks both / and /app/.

Observability/docs: Edge/UX dashboard panel (rate vs rejections by class); ARCHITECTURE par.2/11/12/13, FUNCTIONAL(+_ru), TESTING, READMEs, PRERELEASE refinements + tracker (R3 done).

Verified locally: gofmt/vet/build clean; 34 Go packages unit-green; the full integration suite green; UI check/test/build green; gateway+landing+backend images build; the landing container serves / (no-cache, junk-path fallback) and the gateway image redirects / -> /app/ with no landing content.

After merge: wipe the contour schema, then probe /, /app/, /_gm/throttled; optionally a short loadtest hammer run to watch the metric, the Warn summaries, the auto-flag and the operator clear end to end.

Phase R3 of `PRERELEASE.md` (TODO 2 + 8 + 3), feeding from the R2 trip report. **Gateway** - `GATEWAY_MAX_BODY_BYTES` (1 MiB): Connect per-message read limit + `http.MaxBytesReader` on the public mux; oversized `Execute` -> `resource_exhausted`. - Explicit h2c sizing (`MaxConcurrentStreams` 250, `IdleTimeout` 3m) + `ReadHeaderTimeout` 10s; values re-checked in R7. - `gateway_rate_limited_total{class}` + Debug per rejection; a 30 s reporter drains the per-key tracker into a Warn summary and `POST /api/v1/internal/ratelimit/report`. - The dead `AdminPerMinute/AdminBurst` policy now 429-guards the `/_gm` mount ahead of its Basic-Auth; session-resolve infra failures log their cause at Warn (the R2 0.04% `unauthenticated`). **Backend** - `accounts.flagged_high_rate_at` baked into the R1 baseline (**the contour schema must be wiped after merge** — `DROP SCHEMA backend CASCADE` + restart, the R1 procedure); jet regenerated (also restores the missing `game_drafts`/`game_hidden` models). - New `internal/ratewatch`: bounded in-memory episode window + the conservative auto-flag (1000 rejected / 10 min, `BACKEND_HIGHRATE_FLAG_*`, set-once, operator clears, no auto-ban). - Admin console: **Throttled** page (episodes + flagged queue), the high-rate badge in the user list, the marker + Clear action on the user card (CSRF-guarded). **Landing split** - `landing` target in `gateway/Dockerfile` (caddy:2-alpine + the shared Vite build); the gateway drops `landing.html` from the embed and 308-redirects `/` -> `/app/`; the contour caddy routes `/app/`, `/telegram/` + the Connect path to the gateway and the catch-all to the landing container; the CI probe checks both `/` and `/app/`. **Observability/docs**: Edge/UX dashboard panel (rate vs rejections by class); ARCHITECTURE par.2/11/12/13, FUNCTIONAL(+_ru), TESTING, READMEs, PRERELEASE refinements + tracker (R3 done). **Verified locally**: gofmt/vet/build clean; 34 Go packages unit-green; the full integration suite green; UI check/test/build green; gateway+landing+backend images build; the landing container serves `/` (no-cache, junk-path fallback) and the gateway image redirects `/` -> `/app/` with no landing content. **After merge**: wipe the contour schema, then probe `/`, `/app/`, `/_gm/throttled`; optionally a short `loadtest` hammer run to watch the metric, the Warn summaries, the auto-flag and the operator clear end to end.
developer added 4 commits 2026-06-10 03:14:12 +00:00
- GATEWAY_MAX_BODY_BYTES (1 MiB): connect WithReadMaxBytes + http.MaxBytesReader
  on the public mux; explicit http2.Server MaxConcurrentStreams/IdleTimeout and
  an http.Server ReadHeaderTimeout (R2 report follow-up).
- gateway_rate_limited_total{class} counter, Debug per rejection, a rejection
  tracker drained every 30 s into a Warn summary per key and a report POST to
  /api/v1/internal/ratelimit/report (feeds the admin view + auto-flag).
- The dead AdminPerMinute/AdminBurst policy now guards the /_gm mount (429),
  ahead of its Basic-Auth.
- resolve() logs the cause of infra session-resolve failures at Warn (the
  transient unauthenticated dips from the R2 run); unknown tokens stay silent.
- accounts.flagged_high_rate_at baked into the R1 baseline (no prod data; the
  contour schema is wiped after merge); jet regenerated — the regen also picks
  up the previously missing game_drafts/game_hidden models.
- account.Store: FlagHighRate (set-once), ClearHighRateFlag, the flag in
  GetByID/ListUsers and a ListFlaggedHighRate review queue.
- New internal/ratewatch: ingests the gateway rejection reports, keeps a
  bounded in-memory episode window for the console and applies the
  conservative auto-flag (1000 rejected / 10 min, BACKEND_HIGHRATE_FLAG_*).
- POST /api/v1/internal/ratelimit/report (network-trusted, like
  sessions/resolve).
- Admin console: Throttled page (episodes + flagged accounts), a high-rate
  badge in the user list, the marker + operator clear action on the user card.
- Tests: ratewatch unit suite, report-route handler test, renderer cases,
  integration coverage for the store round-trip and the console flow.
- gateway/Dockerfile gains a `landing` target: caddy:2-alpine + the shared
  Vite build (identical build args keep the ui stage a single cached build);
  the gateway target drops landing.html from the embed.
- The contour caddy routes /app/, /telegram/ and the Connect path to the
  gateway; the catch-all — the landing at / and any stray path — goes to the
  new landing service, so junk traffic is absorbed by static file serving.
- deploy/landing/Caddyfile mirrors the webui caching (immutable assets,
  no-cache shells) and falls back unknown paths to the landing shell.
- The gateway's / now 308-redirects to /app/ (keeps a local no-caddy run
  usable); webui placeholder landing.html removed.
- CI deploy probe checks both / (landing) and /app/ (gateway).

Verified: both images build; the landing container serves landing.html at /
(no-cache) with junk-path fallback; the gateway image redirects / to /app/
and carries no landing content.
R3: dashboards, docs and tracker bake-back
CI / changes (pull_request) Successful in 1s
CI / unit (pull_request) Successful in 8s
CI / integration (pull_request) Successful in 12s
CI / ui (pull_request) Successful in 36s
CI / gate (pull_request) Successful in 0s
CI / deploy (pull_request) Successful in 1m7s
7e75c32d07
- Edge/UX dashboard: aggregate request-rate vs rejection-rate panel
  (gateway_rate_limited_total by class; no per-user labels).
- ARCHITECTURE §2/§11/§12/§13: body cap + explicit h2c sizing, the rate-limit
  observability pipeline and auto-flag policy, the admin-limiter note (and the
  caddy-path gap), the landing container topology; fixed the stale 120/min
  per-user figure.
- FUNCTIONAL (+_ru): the Throttled view and the reversible high-rate flag.
- gateway/backend/deploy READMEs, TESTING.md, root CLAUDE.md updated.
- PRERELEASE.md: R3 interview decisions + implementation refinements logged;
  tracker R3 -> done (this PR implements it; CI gates the merge).
owner approved these changes 2026-06-10 03:37:59 +00:00
developer merged commit e3b08461f0 into development 2026-06-10 03:38:52 +00:00
developer deleted branch feature/r3-edge-hardening 2026-06-10 03:38:52 +00:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: developer/scrabble-game#34