feat(gateway): unsigned gateway.heartbeat keeps Safari push streams alive #17

Merged
developer merged 1 commit from feature/subscribe-events-heartbeat into development 2026-05-19 07:35:35 +00:00
Summary

  • Server-side silence-based heartbeat on the authenticated push stream. Wraps the bound stream in heartbeatingStream after the signed gateway.server_time bootstrap; emits a minimal unsigned gateway.heartbeat event when no real Send happened for GATEWAY_PUSH_HEARTBEAT_INTERVAL (default 15s, 0s disables). Every real event resets the silence timer.
  • Fixes the intermittent Safari "Fetch API cannot load … due to access control checks" console error (a misleading WebKit symptom; the CORS headers are present) and the underlying lost-push problem during the reconnect window. An earlier Caddy flush_interval -1 experiment was inconclusive and was reverted.
  • New gateway.push.heartbeats_sent{outcome} counter so bandwidth and transport-failure spikes are visible in Prometheus.
  • Client (events.svelte.ts): early continue on gateway.heartbeat before signature verification + dispatch, while still flipping connectionStatus → connected on the first heartbeat.
  • docs/ARCHITECTURE.md §15 documents the unsigned envelope + per-scale wire-cost projection (100…100 000 clients × 15…60 s). FUNCTIONAL.md + ru mirror updated. gateway/README.md notes the env var + metric.

Test plan

  • [x] go test ./gateway/...: full suite passes (Go 1.26 timer Reset patterns covered)
  • [x] npx vitest run events.test.ts: heartbeat skipped before verify, leading heartbeat still flips connected
  • [x] npm run check: svelte-check 0 errors / 0 warnings
  • [x] Gitea go-unit + ui-test green on this branch (run #199 + #200; the first go-unit miss was the unrelated TestPlanetRandomName flake at ~1/10000)
  • [ ] Manual on dev-deploy: Safari at https://www.galaxy.lan: SubscribeEvents stream stays open across lobby↔game navigation; no CORS console error; turn-ready / mail-received toasts land without a refresh
developer added 1 commit 2026-05-19 07:35:29 +00:00
feat(gateway): unsigned gateway.heartbeat keeps Safari push streams alive
Tests · UI / test (push) Successful in 2m35s
Tests · Go / test (push) Successful in 1m56s
Tests · UI / test (pull_request) Has been cancelled
Tests · Integration / integration (pull_request) Successful in 1m42s
Tests · Go / test (pull_request) Successful in 2m0s
14b65389ef
Browser fetch-streaming layers close response bodies they consider
idle after roughly 15-30 s without incoming bytes. Safari is the
most aggressive, but the symptom matters everywhere: a quiet
SubscribeEvents stream (lobby, between turns, mailbox empty) gets
torn down by the browser, the EventStream singleton reconnects with
backoff, and any push event that fires inside the reconnect window
is lost because `push.Hub` queues are not persisted across
subscription closes. The user-visible failure mode is the
intermittent "Fetch API cannot load … due to access control checks"
console error (a misleading WebKit symptom — CORS headers are
actually present) plus missed turn-ready / mail-received toasts.

Server-side fix: a silence-based heartbeat at the
`authenticatedPushStreamService` wrapper layer. After the signed
`gateway.server_time` bootstrap event, gateway wraps the bound
stream with `heartbeatingStream`. Every tail Send (fan-out, future
variants) resets the silence timer; when the timer elapses, a
goroutine emits `gateway.heartbeat` with only `EventType` set —
everything else stays at proto3 defaults, so the wire frame is
~45 bytes amortised. A `sendMu` serialises the heartbeat goroutine
with tail Sends because grpc.ServerStream.Send is not goroutine-safe.

The heartbeat is intentionally UNSIGNED: a heartbeat carries no
payload, dispatches to no handler on the client, and, if injected,
causes no user-visible state change. TLS still
protects the wire and real events keep the signed envelope
unchanged. Documented in `docs/ARCHITECTURE.md` § 15 alongside the
per-scale bandwidth projection (100…100 000 clients × 15…60 s).

Config: new `GATEWAY_PUSH_HEARTBEAT_INTERVAL` (default `15s`,
`0s` disables). Telemetry: new
`gateway.push.heartbeats_sent{outcome}` counter so operators can
budget bandwidth and spot a sudden `outcome=error` bump as an
upstream-failing-before-flush signal.

Client (`ui/frontend/src/api/events.svelte.ts`): early `continue`
on `event.eventType === "gateway.heartbeat"` before `verifyEvent`,
`verifyPayloadHash`, or dispatch; an empty signature would otherwise
trip `SignatureError` and force a reconnect. A leading heartbeat still flips
`connectionStatus` to `connected` and resets backoff, because
receiving one is proof the stream is healthy.

Tests:
- `push_heartbeat_test.go`: unit tests for the wrapper — zero
  interval returns nil, heartbeat fires after silence, real Send
  resets the timer, Stop / context-cancel halt the goroutine,
  Send errors propagate.
- `server_test.go`: integration tests through the full gateway
  pipeline — heartbeat fires after the configured silence window,
  zero interval keeps the stream silent.
- `config_test.go`: default applied, env-override parsed,
  negative value rejected.
- `events.test.ts`: heartbeat skipped before verification + not
  dispatched to handlers; leading heartbeat still flips
  `connectionStatus` to `connected`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
developer merged commit 8170abd5fa into development 2026-05-19 07:35:35 +00:00
developer deleted branch feature/subscribe-events-heartbeat 2026-05-19 07:35:35 +00:00