R3: gateway edge hardening — body cap, h2c sizing, rate-limit observability

- GATEWAY_MAX_BODY_BYTES (1 MiB): connect WithReadMaxBytes + http.MaxBytesReader
  on the public mux; explicit http2.Server MaxConcurrentStreams/IdleTimeout and
  an http.Server ReadHeaderTimeout (R2 report follow-up).
- gateway_rate_limited_total{class} counter, Debug per rejection, a rejection
  tracker drained every 30 s into a Warn summary per key and a report POST to
  /api/v1/internal/ratelimit/report (feeds the admin view + auto-flag).
- The dead AdminPerMinute/AdminBurst policy now guards the /_gm mount (429),
  ahead of its Basic-Auth.
- resolve() logs the cause of infra session-resolve failures at Warn (the
  transient unauthenticated dips from the R2 run); unknown tokens stay silent.
This commit is contained in:
Ilia Denisov
2026-06-10 01:58:48 +02:00
parent c23ac94c4e
commit 8878711cf3
12 changed files with 549 additions and 35 deletions
+11 -2
View File
@@ -23,7 +23,7 @@ proto/edge/v1/ # Connect envelope contract (committed generated Go)
internal/config/ # GATEWAY_* env config
internal/backendclient/ # typed REST client (+ X-User-ID) and push gRPC client
internal/session/ # in-memory session cache (LRU/TTL, backend fallback)
internal/ratelimit/ # token-bucket limiter (golang.org/x/time/rate)
internal/ratelimit/ # token-bucket limiter (golang.org/x/time/rate) + the rejection tracker (R3)
internal/connector/ # gRPC client to the Telegram connector (initData validate, out-of-app push) + routing
internal/push/ # live-event fan-out hub (per-user client streams)
internal/transcode/ # FlatBuffers<->REST bridge + message_type registry
@@ -79,12 +79,21 @@ connector (`ValidateLoginWidget`) and forward the trusted `external_id`. These
| `GATEWAY_SESSION_TTL` | `10m` | cached session lifetime |
| `GATEWAY_SESSION_CACHE_MAX` | `50000` | cached session cap |
| `GATEWAY_PUSH_HEARTBEAT_INTERVAL` | `10s` | live-stream keep-alive (an immediate heartbeat also fires on open, under the ~15s edge idle timeout) |
| `GATEWAY_MAX_BODY_BYTES` | `1048576` | caps one request body and one Connect message read; an oversized Execute is refused with `resource_exhausted` (R3) |
| `GATEWAY_SERVICE_NAME` | `scrabble-gateway` | OpenTelemetry `service.name` |
| `GATEWAY_OTEL_TRACES_EXPORTER` | `none` | `none`, `stdout` or `otlp` (gRPC; endpoint from `OTEL_EXPORTER_OTLP_*`) |
| `GATEWAY_OTEL_METRICS_EXPORTER` | `none` | `none`, `stdout` or `otlp` |
Rate-limit defaults (built-in): public 30/min·IP (burst 10), authenticated
120/min·user (burst 40), admin 60/min·IP (burst 20), email-code 5/10 min·IP.
300/min·user (burst 80, raised in Stage 17 for multi-device play), admin
60/min·IP (burst 20, guarding the `/_gm` mount ahead of its Basic-Auth),
email-code 5/10 min·IP (burst 2).
Every rejection increments `gateway_rate_limited_total{class}`
(`user`/`public`/`email`/`admin`) and logs one Debug line; a reporter drains the
per-key rejection tracker every 30 s, emits a Warn summary per throttled key and
posts the report to the backend (`/api/v1/internal/ratelimit/report`), feeding
the admin console's throttled view and the high-rate auto-flag (R3).
## Run