feat(gateway): unsigned gateway.heartbeat keeps Safari push streams alive #17
Summary
- Gateway wraps the bound stream with `heartbeatingStream` after the signed `gateway.server_time` bootstrap and emits a minimal unsigned `gateway.heartbeat` event when no real Send has happened for `GATEWAY_PUSH_HEARTBEAT_INTERVAL` (default `15s`, `0s` disables). Every real event resets the silence timer.
- Fixes the intermittent `Fetch API cannot load … due to access control checks` console error (a misleading WebKit symptom: CORS headers are present) and the underlying lost-push problem during the reconnect window. The earlier Caddy `flush_interval -1` experiment was inconclusive and was reverted.
- New `gateway.push.heartbeats_sent{outcome}` counter so bandwidth and transport-failure spikes are visible in Prometheus.
- Client (`events.svelte.ts`): early `continue` on `gateway.heartbeat` before signature verification + dispatch, while still flipping `connectionStatus` → `connected` on the first heartbeat.
- `docs/ARCHITECTURE.md` §15 documents the unsigned envelope + per-scale wire-cost projection (100…100 000 clients × 15…60 s). FUNCTIONAL.md + ru mirror updated. `gateway/README.md` notes the env var + metric.

Test plan
- `go test ./gateway/...`: full suite passes (Go 1.26 timer Reset patterns covered)
- `npx vitest run events.test.ts`: heartbeat skipped before verify, leading heartbeat still flips `connected`
- `npm run check`: svelte-check 0 errors / 0 warnings
- (`TestPlanetRandomName` flake at ~1/10000)
- `https://www.galaxy.lan`: SubscribeEvents stream stays open across lobby↔game navigation; no CORS console error; turn-ready / mail-received toasts land without a refresh

Browser fetch-streaming layers close response bodies they consider idle after roughly 15-30 s without incoming bytes. Safari is the most aggressive, but the symptom matters everywhere: a quiet SubscribeEvents stream (lobby, between turns, mailbox empty) gets torn down by the browser, the EventStream singleton reconnects with backoff, and any push event that fires inside the reconnect window is lost because `push.Hub` queues are not persisted across subscription closes. The user-visible failure mode is the intermittent "Fetch API cannot load … due to access control checks" console error (a misleading WebKit symptom: CORS headers are actually present) plus missed turn-ready / mail-received toasts.

Server-side fix: a silence-based heartbeat at the `authenticatedPushStreamService` wrapper layer. After the signed `gateway.server_time` bootstrap event, the gateway wraps the bound stream with `heartbeatingStream`. Every tail Send (fan-out, future variants) resets the silence timer; when the timer elapses, a goroutine emits `gateway.heartbeat` with only `EventType` set. Everything else stays at proto3 defaults, so the wire frame is ~45 bytes amortised. A `sendMu` serialises the heartbeat goroutine with tail Sends because grpc.ServerStream.Send is not goroutine-safe.

The heartbeat is intentionally UNSIGNED: heartbeats carry no payload and dispatch to no handler on the client, so an injected heartbeat causes no user-visible state change. TLS still protects the wire, and real events keep the signed envelope unchanged.
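The wrapper described above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: `Event`, `Sender`, `recorder`, `runDemo`, `demoInterval`, and the `example.real_event` type are hypothetical stand-ins, and the real implementation wraps a gRPC server stream rather than a plain interface. Only the names `heartbeatingStream`, `sendMu`, and `gateway.heartbeat` come from the description.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Event and Sender are hypothetical stand-ins for the push message and
// the Send half of the gRPC server stream.
type Event struct {
	EventType string
	Payload   []byte
}

type Sender interface{ Send(*Event) error }

// demoInterval is a short silence window used only by runDemo.
const demoInterval = 20 * time.Millisecond

type heartbeatingStream struct {
	inner    Sender
	interval time.Duration
	sendMu   sync.Mutex    // serialises real Sends with the heartbeat goroutine
	reset    chan struct{} // signalled after every real Send
}

// newHeartbeatingStream returns nil when interval <= 0 (heartbeats disabled).
func newHeartbeatingStream(ctx context.Context, inner Sender, interval time.Duration) *heartbeatingStream {
	if interval <= 0 {
		return nil
	}
	s := &heartbeatingStream{inner: inner, interval: interval, reset: make(chan struct{}, 1)}
	go s.loop(ctx)
	return s
}

// Send forwards a real event and restarts the silence window.
func (s *heartbeatingStream) Send(ev *Event) error {
	s.sendMu.Lock()
	err := s.inner.Send(ev)
	s.sendMu.Unlock()
	select {
	case s.reset <- struct{}{}:
	default: // a reset is already pending
	}
	return err
}

// loop emits a heartbeat whenever the stream has been silent for interval.
func (s *heartbeatingStream) loop(ctx context.Context) {
	timer := time.NewTimer(s.interval)
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-s.reset:
			if !timer.Stop() {
				select { // drain a value that fired concurrently, if any
				case <-timer.C:
				default:
				}
			}
			timer.Reset(s.interval)
		case <-timer.C:
			s.sendMu.Lock()
			err := s.inner.Send(&Event{EventType: "gateway.heartbeat"}) // only EventType set
			s.sendMu.Unlock()
			if err != nil {
				return // transport failed; stop heartbeating
			}
			timer.Reset(s.interval)
		}
	}
}

// recorder is a test double that remembers the event types it was sent.
type recorder struct {
	mu    sync.Mutex
	types []string
	seen  chan string
}

func (r *recorder) Send(ev *Event) error {
	r.mu.Lock()
	r.types = append(r.types, ev.EventType)
	r.mu.Unlock()
	select {
	case r.seen <- ev.EventType:
	default:
	}
	return nil
}

// runDemo sends one real event, waits for the first heartbeat, and
// returns the first two event types the inner stream received.
func runDemo() []string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	rec := &recorder{seen: make(chan string, 16)}
	s := newHeartbeatingStream(ctx, rec, demoInterval)
	_ = s.Send(&Event{EventType: "example.real_event", Payload: []byte("x")})
	for t := range rec.seen {
		if t == "gateway.heartbeat" {
			break
		}
	}
	rec.mu.Lock()
	defer rec.mu.Unlock()
	return append([]string(nil), rec.types[:2]...)
}

func main() { fmt.Println(runDemo()) }
```

The sketch routes timer resets through a channel so that only the heartbeat goroutine ever touches the timer, and the mutex covers only calls into the inner stream. Whether the real wrapper resets its timer this way is not stated in the description.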
Documented in `docs/ARCHITECTURE.md` § 15 alongside the per-scale bandwidth projection (100…100 000 clients × 15…60 s).

Config: new `GATEWAY_PUSH_HEARTBEAT_INTERVAL` (default `15s`, `0s` disables).

Telemetry: new `gateway.push.heartbeats_sent{outcome}` counter so operators can budget bandwidth and treat a sudden bump in `outcome=error` as a sign that the upstream is failing before flush.

Client (`ui/frontend/src/api/events.svelte.ts`): early `continue` on `event.eventType === "gateway.heartbeat"` before `verifyEvent`, `verifyPayloadHash`, or dispatch; the empty signature would otherwise trip `SignatureError` and force a reconnect. A leading heartbeat still flips `connectionStatus` to `connected` and resets backoff, because receiving one is proof the stream is healthy.

Tests:
- `push_heartbeat_test.go`: unit tests for the wrapper: zero interval returns nil, heartbeat fires after silence, real Send resets the timer, Stop / context-cancel halt the goroutine, Send errors propagate.
- `server_test.go`: integration tests through the full gateway pipeline: heartbeat fires after the configured silence window, zero interval keeps the stream silent.
- `config_test.go`: default applied, env-override parsed, negative value rejected.
- `events.test.ts`: heartbeat skipped before verification + not dispatched to handlers; leading heartbeat still flips `connectionStatus` to `connected`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
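For reference, the config behaviour listed under `config_test.go` (default applied, override parsed, negative rejected, `0s` disables) could be expressed along these lines. This is a sketch under assumptions: `parseHeartbeatInterval`, `envHeartbeatInterval`, and `defaultHeartbeatInterval` are hypothetical names, and the gateway's real config loader may be structured differently.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Names below are illustrative; only the env var name and the 15s
// default come from the description.
const (
	envHeartbeatInterval     = "GATEWAY_PUSH_HEARTBEAT_INTERVAL"
	defaultHeartbeatInterval = 15 * time.Second
)

// parseHeartbeatInterval maps the raw env value to a duration:
// empty means the default, zero disables heartbeats, negative is an error.
func parseHeartbeatInterval(raw string) (time.Duration, error) {
	if raw == "" {
		return defaultHeartbeatInterval, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", envHeartbeatInterval, err)
	}
	if d < 0 {
		return 0, fmt.Errorf("%s: must not be negative, got %s", envHeartbeatInterval, d)
	}
	return d, nil
}

func main() {
	d, err := parseHeartbeatInterval(os.Getenv(envHeartbeatInterval))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if d == 0 {
		fmt.Println("heartbeats disabled")
		return
	}
	fmt.Println("heartbeat interval:", d)
}
```

Treating an unset variable as the default while keeping an explicit `0s` as "disabled" lets operators turn heartbeats off without removing the variable from their environment.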