Commit Graph

11 Commits

Author SHA1 Message Date
Ilia Denisov 27916bbe61 feat(admin-console): Stage 1 — pipe + skeleton behind the gateway
Tests · Go / test (push) Successful in 2m0s
Add the server-rendered operator console at /_gm, exposed publicly through
the gateway behind the existing admin_accounts Basic Auth.

Backend:
- new internal/adminconsole package (html/template Renderer, stateless HMAC
  CSRF signer, embedded stylesheet)
- /_gm route group reusing basicauth.Middleware(admin.Service) + a CSRF guard
  (per-operator token + same-origin check); dashboard landing page
- BACKEND_ADMIN_CONSOLE_CSRF_KEY config (per-process random fallback)
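
A stateless HMAC CSRF signer of the kind listed above can be sketched as
follows. This is a minimal illustration, not the code in internal/adminconsole:
the `csrfSigner` type, its methods, and the choice of HMAC-SHA256 over the
operator name are assumptions made for the example.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// csrfSigner derives a per-operator token from a secret key without keeping
// any server-side state. Type and method names are hypothetical.
type csrfSigner struct {
	key []byte
}

// Token returns hex(HMAC-SHA256(key, operator)).
func (s csrfSigner) Token(operator string) string {
	mac := hmac.New(sha256.New, s.key)
	mac.Write([]byte(operator))
	return hex.EncodeToString(mac.Sum(nil))
}

// Valid reports whether token is the expected token for operator.
// hmac.Equal compares in constant time.
func (s csrfSigner) Valid(operator, token string) bool {
	got, err := hex.DecodeString(token)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, s.key)
	mac.Write([]byte(operator))
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	s := csrfSigner{key: []byte("example-key-not-for-production")}
	tok := s.Token("alice")
	fmt.Println(s.Valid("alice", tok)) // true
	fmt.Println(s.Valid("bob", tok))   // false
}
```

With a per-process random key, tokens stop validating after a restart, which is
presumably why dev-deploy pins a stable key.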

Gateway:
- new "admin" public route class (per-IP rate limit, request-body size limit,
  GET/HEAD/POST method allow-list) classifying /_gm traffic
- reverse proxy to the backend /_gm surface, preserving Host and relaying the
  backend 401 Basic Auth challenge; 502 when the backend is unreachable
- GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_ADMIN_* config

dev-deploy:
- Caddy routes /_gm/* to the gateway
- bootstrap admin + stable CSRF key; enable Prometheus /metrics exporters on
  backend and gateway (forward-compat for a future Prometheus/Grafana stack)

Docs: ARCHITECTURE 14.1/16, FUNCTIONAL 10.2.1 (+ru mirror), backend and
gateway READMEs, new backend/docs/admin-console.md.

Tests: renderer + CSRF unit tests; backend router auth/render/asset/CSRF;
gateway classifier, proxy forwarding/Host/401/405/413/429/502.
2026-05-31 19:50:15 +02:00
Ilia Denisov 14b65389ef feat(gateway): unsigned gateway.heartbeat keeps Safari push streams alive
Tests · UI / test (push) Successful in 2m35s
Tests · Go / test (push) Successful in 1m56s
Tests · UI / test (pull_request) Has been cancelled
Tests · Integration / integration (pull_request) Successful in 1m42s
Tests · Go / test (pull_request) Successful in 2m0s
Browser fetch-streaming layers close response bodies they consider
idle after roughly 15-30 s without incoming bytes. Safari is the
most aggressive, but the symptom matters everywhere: a quiet
SubscribeEvents stream (lobby, between turns, mailbox empty) gets
torn down by the browser, the EventStream singleton reconnects with
backoff, and any push event that fires inside the reconnect window
is lost because `push.Hub` queues are not persisted across
subscription closes. The user-visible failure mode is the
intermittent "Fetch API cannot load … due to access control checks"
console error (a misleading WebKit symptom — CORS headers are
actually present) plus missed turn-ready / mail-received toasts.

Server-side fix: a silence-based heartbeat at the
`authenticatedPushStreamService` wrapper layer. After the signed
`gateway.server_time` bootstrap event, gateway wraps the bound
stream with `heartbeatingStream`. Every tail Send (fan-out, future
variants) resets the silence timer; when the timer elapses, a
goroutine emits `gateway.heartbeat` with only `EventType` set —
everything else stays at proto3 defaults, so the wire frame is
~45 bytes amortised. A `sendMu` serialises the heartbeat goroutine
with tail Sends because grpc.ServerStream.Send is not goroutine-safe.

The heartbeat is intentionally UNSIGNED: heartbeats carry no
payload and dispatch to no handler on the client, so an injected
heartbeat causes no user-visible state change. TLS still
protects the wire and real events keep the signed envelope
unchanged. Documented in `docs/ARCHITECTURE.md` § 15 alongside the
per-scale bandwidth projection (100…100 000 clients × 15…60 s).

Config: new `GATEWAY_PUSH_HEARTBEAT_INTERVAL` (default `15s`,
`0s` disables). Telemetry: new
`gateway.push.heartbeats_sent{outcome}` counter so operators can
budget bandwidth and read a sudden rise in `outcome=error` as a
sign that the upstream is failing before the flush.

Client (`ui/frontend/src/api/events.svelte.ts`): early `continue`
on `event.eventType === "gateway.heartbeat"` before `verifyEvent`,
`verifyPayloadHash`, or dispatch — empty signature would otherwise
trip SignatureError and reconnect. A leading heartbeat still flips
`connectionStatus` to `connected` and resets backoff, because
receiving one is proof the stream is healthy.

Tests:
- `push_heartbeat_test.go`: unit tests for the wrapper — zero
  interval returns nil, heartbeat fires after silence, real Send
  resets the timer, Stop / context-cancel halt the goroutine,
  Send errors propagate.
- `server_test.go`: integration tests through the full gateway
  pipeline — heartbeat fires after the configured silence window,
  zero interval keeps the stream silent.
- `config_test.go`: default applied, env-override parsed,
  negative value rejected.
- `events.test.ts`: heartbeat skipped before verification + not
  dispatched to handlers; leading heartbeat still flips
  `connectionStatus` to `connected`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:29:29 +02:00
Ilia Denisov 57e6c1d253 gateway: CORS allow-list for the authenticated Connect-Web surface
Tests · Go / test (push) Successful in 2m9s
Tests · Go / test (pull_request) Successful in 2m9s
Tests · Integration / integration (pull_request) Successful in 1m47s
Tests · UI / test (pull_request) Successful in 2m52s
The public REST listener already exposes
`GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS`; the authenticated
Connect-Web listener on the separate gRPC port had no equivalent.
That worked in `tools/local-dev` (Vite proxy makes everything
same-origin) and would work in production once UI and gateway share
a single hostname, but the long-lived dev environment serves the
UI from `https://www.galaxy.lan` and the gateway from
`https://api.galaxy.lan` — every `/galaxy.gateway.v1.EdgeGateway/*`
fetch failed in the browser with the WebKit "Load failed" generic
message because the response carried no `Access-Control-Allow-Origin`
header. Lobby rendered as "[unknown] Load failed" with no game.

Mirror the public-REST CORS surface for the authenticated handler:

- new env `GATEWAY_AUTHENTICATED_GRPC_CORS_ALLOWED_ORIGINS`;
- new `AuthenticatedGRPCConfig.CORSAllowedOrigins` field;
- new `grpcapi.withCORS` middleware wrapping the Connect mux;
- dev-deploy stack sets the env to `https://www.galaxy.lan`.

The middleware speaks plain net/http (the Connect handler is mounted
on a ServeMux, not gin), handles preflight 204 immediately, and
exposes the Connect-Web header set the browser needs to read the
response (`Grpc-Status`, `Grpc-Message`, `Connect-Protocol-Version`).
Empty allow-list disables the middleware — production stays at
"single hostname" by default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:15:11 +02:00
Ilia Denisov 1855e43699 gateway: add CORS allow-list for the public REST surface
Tests · Go / test (push) Successful in 1m42s
Tests · Go / test (pull_request) Successful in 1m45s
Tests · Integration / integration (pull_request) Successful in 1m36s
Adds a `GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS` env-driven allow-list
on the public REST server so the dev UI on https://www.galaxy.lan can
call https://api.galaxy.lan without the browser blocking the
cross-origin response. Defaults to empty (no CORS) so the production
posture stays closed.

The middleware mounts before route classification and anti-abuse, so
OPTIONS preflights never charge against per-class rate-limit buckets.

`tools/dev-deploy/docker-compose.yml` opts the dev gateway into a
single allowed origin (`https://www.galaxy.lan`); local-dev keeps the
defaults because Vite proxies through the same origin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:58:14 +02:00
Ilia Denisov 604fe40bcf docs: reorder & testing 2026-05-07 00:58:53 +03:00
Ilia Denisov f446c6a2ac feat: backend service 2026-05-06 10:14:55 +03:00
Ilia Denisov a7cee15115 feat: runtime manager 2026-04-28 20:39:18 +02:00
Ilia Denisov fe829285a6 feat: use postgres 2026-04-26 20:34:39 +02:00
Ilia Denisov 23ffcb7535 feat: user service 2026-04-10 19:05:02 +02:00
IliaDenisov 1c8e0ca48e tests: integration suite 2026-04-09 15:27:14 +02:00
Ilia Denisov 436c97a38b feat: edge gateway service 2026-04-02 19:18:42 +02:00