Commit Graph

9 Commits

Author SHA1 Message Date
Ilia Denisov 91e34a0929 fix(gateway): verify client signature before payload_hash
Tests · Go / test (push) Successful in 2m1s
Tests · Go / test (pull_request) Successful in 2m58s
Tests · Integration / integration (pull_request) Successful in 1m39s
ARCHITECTURE.md §15 "Verification order" specifies signature verification
(step 4) before payload_hash (step 5), but the authenticated-edge
decorator chain wrapped the payload-hash gate outside the signature gate,
so the hash was checked first. gateway/README.md and gateway/docs/flows.md
had drifted to match the code (hash-first), leaving ARCHITECTURE.md as the
lone source describing the intended order.

Swap the two decorators in server.go so the signature gate runs first, and
align README + flows.md to ARCHITECTURE.md. Signature-first is the
cryptographically sound order: the signature covers the payload_hash field,
so the request is authenticated before any of its content is processed.

Observable side effect: a request carrying a tampered payload_hash whose
signature was computed over the original hash is now rejected at the
signature gate (UNAUTHENTICATED "invalid request signature") instead of the
hash gate (INVALID_ARGUMENT). Security is unchanged — both refusals happen
before the payload is handled. The four payload-hash unit tests re-sign
over the tampered hash so they keep exercising the hash gate; the
cross-service integration test signs over the overridden hash and already
accepts both codes.

Refs #39

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 02:42:09 +02:00
Ilia Denisov 8565942392 feat(deploy): single-origin path-based deployment + project site
Build · Site / build (push) Successful in 8s
Tests · Go / test (push) Successful in 2m22s
Tests · UI / test (push) Failing after 2m42s
Serve the whole stack behind one host: site at /, game UI at /game/,
gateway REST at /api + /healthz, Connect at /rpc (prefix stripped by the
edge Caddy). The built artifact is domain-agnostic — the UI talks to the
gateway same-origin via relative URLs, so the same bundle runs under any
host with no rebuild and with CORS disabled.

- Rename the Connect proto service galaxy.gateway.v1.EdgeGateway ->
  edge.v1.Gateway; regenerate Go + TS; public path /rpc/edge.v1.Gateway.
- Move the game UI under base path /game (env BASE_PATH); make the
  manifest, service-worker scope, WASM loader, and all navigation
  base-aware via a withBase helper.
- Relative API + /rpc Connect prefix; Vite dev proxy mirrors the strip.
- Rewrite the edge Caddy (dev + prod) for path-based routing; empty CORS
  allow-lists (same-origin); single host.
- New VitePress project site (site/): i18n en/ru with switcher, LaTeX
  math, minimal monospace theme; built and served at /.
- dev-deploy compose/Makefile + CI (dev-deploy, prod-build, new
  site-build) build and seed the site; probes hit /, /game/, /healthz.
- Sync docs (ARCHITECTURE, gateway README/openapi, dev-deploy &
  local-dev READMEs, CLAUDE.md, ui/PLAN).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:19:07 +02:00
Ilia Denisov 14b65389ef feat(gateway): unsigned gateway.heartbeat keeps Safari push streams alive
Tests · UI / test (push) Successful in 2m35s
Tests · Go / test (push) Successful in 1m56s
Tests · UI / test (pull_request) Has been cancelled
Tests · Integration / integration (pull_request) Successful in 1m42s
Tests · Go / test (pull_request) Successful in 2m0s
Browser fetch-streaming layers close response bodies they consider
idle after roughly 15-30 s without incoming bytes. Safari is the
most aggressive, but the symptom matters everywhere: a quiet
SubscribeEvents stream (lobby, between turns, mailbox empty) gets
torn down by the browser, the EventStream singleton reconnects with
backoff, and any push event that fires inside the reconnect window
is lost because `push.Hub` queues are not persisted across
subscription closes. The user-visible failure mode is the
intermittent "Fetch API cannot load … due to access control checks"
console error (a misleading WebKit symptom — CORS headers are
actually present) plus missed turn-ready / mail-received toasts.

Server-side fix: a silence-based heartbeat at the
`authenticatedPushStreamService` wrapper layer. After the signed
`gateway.server_time` bootstrap event, gateway wraps the bound
stream with `heartbeatingStream`. Every tail Send (fan-out, future
variants) resets the silence timer; when the timer elapses, a
goroutine emits `gateway.heartbeat` with only `EventType` set —
everything else stays at proto3 defaults, so the wire frame is
~45 bytes amortised. A `sendMu` serialises the heartbeat goroutine
with tail Sends because grpc.ServerStream.Send is not goroutine-safe.

The heartbeat is intentionally UNSIGNED: heartbeats carry no
payload, dispatch to no handler on the client, and an injected
heartbeat trivially causes no user-visible state change. TLS still
protects the wire and real events keep the signed envelope
unchanged. Documented in `docs/ARCHITECTURE.md` § 15 alongside the
per-scale bandwidth projection (100…100 000 clients × 15…60 s).

Config: new `GATEWAY_PUSH_HEARTBEAT_INTERVAL` (default `15s`,
`0s` disables). Telemetry: new
`gateway.push.heartbeats_sent{outcome}` counter so operators can
budget bandwidth and spot a sudden `outcome=error` bump as an
upstream-failing-before-flush signal.

Client (`ui/frontend/src/api/events.svelte.ts`): early `continue`
on `event.eventType === "gateway.heartbeat"` before `verifyEvent`,
`verifyPayloadHash`, or dispatch — empty signature would otherwise
trip SignatureError and reconnect. A leading heartbeat still flips
`connectionStatus` to `connected` and resets backoff, because
receiving one is proof the stream is healthy.

Tests:
- `push_heartbeat_test.go`: unit tests for the wrapper — zero
  interval returns nil, heartbeat fires after silence, real Send
  resets the timer, Stop / context-cancel halt the goroutine,
  Send errors propagate.
- `server_test.go`: integration tests through the full gateway
  pipeline — heartbeat fires after the configured silence window,
  zero interval keeps the stream silent.
- `config_test.go`: default applied, env-override parsed,
  negative value rejected.
- `events.test.ts`: heartbeat skipped before verification + not
  dispatched to handlers; leading heartbeat still flips
  `connectionStatus` to `connected`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:29:29 +02:00
Ilia Denisov 57e6c1d253 gateway: CORS allow-list for the authenticated Connect-Web surface
Tests · Go / test (push) Successful in 2m9s
Tests · Go / test (pull_request) Successful in 2m9s
Tests · Integration / integration (pull_request) Successful in 1m47s
Tests · UI / test (pull_request) Successful in 2m52s
The public REST listener already exposes
`GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS`; the authenticated
Connect-Web listener on the separate gRPC port had no equivalent.
That worked in `tools/local-dev` (Vite proxy makes everything
same-origin) and would work in production once UI and gateway share
a single hostname, but the long-lived dev environment serves the
UI from `https://www.galaxy.lan` and the gateway from
`https://api.galaxy.lan` — every `/galaxy.gateway.v1.EdgeGateway/*`
fetch failed in the browser with the WebKit "Load failed" generic
message because the response carried no `Access-Control-Allow-Origin`
header. Lobby rendered as "[unknown] Load failed" with no game.

Mirror the public-REST CORS surface for the authenticated handler:

- new env `GATEWAY_AUTHENTICATED_GRPC_CORS_ALLOWED_ORIGINS`;
- new `AuthenticatedGRPCConfig.CORSAllowedOrigins` field;
- new `grpcapi.withCORS` middleware wrapping the Connect mux;
- dev-deploy stack sets the env to `https://www.galaxy.lan`.

The middleware speaks plain net/http (the Connect handler is mounted
on a ServeMux, not gin), handles preflight 204 immediately, and
exposes the Connect-Web header set the browser needs to read the
response (`Grpc-Status`, `Grpc-Message`, `Connect-Protocol-Version`).
Empty allow-list disables the middleware — production stays at
"single hostname" by default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:15:11 +02:00
Ilia Denisov 118f7c17a2 phase 4: connectrpc on the gateway authenticated edge
Replace the native-gRPC server bootstrap with a single
`connectrpc.com/connect` HTTP/h2c listener. Connect-Go natively
serves Connect, gRPC, and gRPC-Web on the same port, so browsers can
now reach the authenticated surface without giving up the gRPC
framing native and desktop clients may use later. The decorator
stack (envelope → session → payload-hash → signature →
freshness/replay → rate-limit → routing/push) is reused unchanged
behind a small Connect → gRPC adapter and a `grpc.ServerStream`
shim around `*connect.ServerStream`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 11:49:28 +02:00
Ilia Denisov 604fe40bcf docs: reorder & testing 2026-05-07 00:58:53 +03:00
Ilia Denisov f446c6a2ac feat: backend service 2026-05-06 10:14:55 +03:00
IliaDenisov 9065b82fe2 chore: sync testing plan with gateway 2026-04-09 12:34:55 +02:00
Ilia Denisov 436c97a38b feat: edge gateway service 2026-04-02 19:18:42 +02:00