# Edge Gateway Implementation Plan
## Summary
This plan breaks implementation into small, reviewable phases.
Each phase has a single primary goal, clear deliverables, explicit dependencies,
acceptance criteria, and focused tests.
The intended v1 architecture is:
- unauthenticated public ingress over REST/JSON;
- authenticated ingress over gRPC on HTTP/2;
- FlatBuffers payloads for authenticated business commands;
- protobuf-based gRPC control envelopes;
- authenticated server-streaming push through gRPC;
- separate public traffic classes and isolated anti-abuse counters.
## Assumptions and Defaults
- `message_type` is the stable downstream routing key.
- `protocol_version` covers transport and envelope compatibility, not business
payload schema compatibility.
- FlatBuffers are used for business payload bytes only.
- Phase 3 public auth uses a challenge-token REST flow:
`send-email-code(email) -> challenge_id` and
`confirm-email-code(challenge_id, code, client_public_key) -> device_session_id`.
- Phase 3 uses a consumer-side `AuthServiceClient` inside `gateway`; the
default process wiring keeps public auth routes mounted and returns
`503 service_unavailable` until a concrete upstream adapter is added.
- Browser bootstrap and asset traffic are within gateway scope, even when backed
by a pluggable proxy or handler.
- Long-polling is out of scope for v1.
## ~~Phase 1.~~ Module Skeleton
Status: implemented.
Goal: create the runnable gateway process skeleton.
Artifacts:
- `cmd/gateway`
- `internal/app`
- base configuration types
- startup and shutdown wiring
Dependencies: none.
Acceptance criteria:
- the process starts with config;
- the process shuts down cleanly on signal;
- lifecycle wiring is testable.
Targeted tests:
- startup with valid config;
- shutdown without leaked goroutines.
## ~~Phase 2.~~ Public REST Server
Status: implemented.
Goal: add the unauthenticated HTTP server shell.
Artifacts:
- public REST listener
- `GET /healthz`
- `GET /readyz`
- base error serialization
- request classification hook
Dependencies: Phase 1.
Acceptance criteria:
- health endpoints respond deterministically;
- public requests are classified at least into `public_auth` and `browser_*`.
Targeted tests:
- health endpoint responses;
- request classification smoke tests.
## ~~Phase 3.~~ Public Auth REST Handlers
Status: implemented.
Goal: expose unauthenticated auth commands through REST/JSON.
Artifacts:
- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`
- request and response DTOs
- adapter calls into `AuthServiceClient`
Dependencies: Phase 2.
Acceptance criteria:
- no session authentication is required for these routes;
- handlers delegate only through the auth service adapter.
Targeted tests:
- success and validation errors for both routes;
- no session lookup on public auth paths.
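
The default wiring described under "Assumptions and Defaults" can be sketched as follows. This is a minimal illustration, not the gateway's actual code: the method signatures are inferred from the `send-email-code` and `confirm-email-code` flow, and `ErrAuthUnavailable`, `unavailableAuthClient`, `projectError`, and the `internal_error` code are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// AuthServiceClient is the consumer-side port named in the plan.
// The method signatures are assumptions derived from the documented flow.
type AuthServiceClient interface {
	SendEmailCode(ctx context.Context, email string) (challengeID string, err error)
	ConfirmEmailCode(ctx context.Context, challengeID, code string, clientPublicKey []byte) (deviceSessionID string, err error)
}

// ErrAuthUnavailable is a hypothetical sentinel meaning "no upstream adapter".
var ErrAuthUnavailable = errors.New("auth service unavailable")

// unavailableAuthClient keeps the routes mounted but fails every call until
// a concrete adapter replaces it.
type unavailableAuthClient struct{}

func (unavailableAuthClient) SendEmailCode(context.Context, string) (string, error) {
	return "", ErrAuthUnavailable
}

func (unavailableAuthClient) ConfirmEmailCode(context.Context, string, string, []byte) (string, error) {
	return "", ErrAuthUnavailable
}

// projectError maps adapter errors to a public HTTP status and error code.
func projectError(err error) (status int, code string) {
	switch {
	case err == nil:
		return http.StatusOK, ""
	case errors.Is(err, ErrAuthUnavailable):
		return http.StatusServiceUnavailable, "service_unavailable"
	default:
		return http.StatusInternalServerError, "internal_error" // assumed code
	}
}

// demoSendEmailCode calls the default adapter and projects the result.
func demoSendEmailCode() (int, string) {
	var client AuthServiceClient = unavailableAuthClient{}
	_, err := client.SendEmailCode(context.Background(), "player@example.com")
	return projectError(err)
}

func main() {
	status, code := demoSendEmailCode()
	fmt.Println(status, code)
}
```

Keeping the handlers dependent only on the interface means a real adapter can be swapped in without touching the REST layer.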
## ~~Phase 4.~~ Public Traffic Classification
Status: implemented.
Goal: isolate public traffic into stable anti-abuse classes.
Artifacts:
- `PublicTrafficClassifier`
- classes `public_auth`, `browser_bootstrap`, `browser_asset`, `public_misc`
- isolated rate-limit bucket keys
Dependencies: Phase 2.
Acceptance criteria:
- browser traffic does not share buckets with public auth;
- auth counters remain unaffected by asset bursts.
Targeted tests:
- per-class routing tests;
- bucket isolation tests.
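
One way to picture the class split and bucket isolation is the sketch below. Only the four class names and the `/api/v1/public/auth/` prefix come from this plan; the bootstrap and asset paths, the `Classify` and `BucketKey` names, and the key layout are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// TrafficClass is one of the stable anti-abuse classes listed in the plan.
type TrafficClass string

const (
	ClassPublicAuth       TrafficClass = "public_auth"
	ClassBrowserBootstrap TrafficClass = "browser_bootstrap"
	ClassBrowserAsset     TrafficClass = "browser_asset"
	ClassPublicMisc       TrafficClass = "public_misc"
)

// Classify maps a request path to a traffic class.
// "/", "/index.html", and "/assets/" are assumed paths, not documented ones.
func Classify(path string) TrafficClass {
	switch {
	case strings.HasPrefix(path, "/api/v1/public/auth/"):
		return ClassPublicAuth
	case path == "/" || path == "/index.html":
		return ClassBrowserBootstrap
	case strings.HasPrefix(path, "/assets/"):
		return ClassBrowserAsset
	default:
		return ClassPublicMisc
	}
}

// BucketKey prefixes the class so that two classes never share a counter,
// even for the same client IP.
func BucketKey(class TrafficClass, clientIP string) string {
	return string(class) + "|" + clientIP
}

func main() {
	paths := []string{"/api/v1/public/auth/send-email-code", "/", "/assets/app.js", "/robots.txt"}
	for _, p := range paths {
		c := Classify(p)
		fmt.Println(p, "->", c, BucketKey(c, "203.0.113.7"))
	}
}
```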
## ~~Phase 5.~~ Public REST Anti-Abuse
Status: implemented.
Goal: add coarse protection to unauthenticated REST traffic.
Artifacts:
- body size limits
- method allow-lists
- malformed request counters
- per-class rate-limit thresholds
Dependencies: Phase 4.
Acceptance criteria:
- first-load browser bursts are not marked hostile because of burst pattern
alone;
- malformed or oversized requests are rejected predictably.
Targeted tests:
- bootstrap burst stays outside auth abuse counters;
- invalid methods and oversized bodies are rejected.
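
The method allow-list and body size limit could be layered as HTTP middleware, roughly as below. This is a sketch using only the standard library (`http.MaxBytesError` needs Go 1.19 or newer); the `guard`, `readAll`, and `probe` helpers, the 16-byte demo limit, and the error code strings are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// guard rejects methods outside the allow-list with 405 and caps the request
// body at maxBody bytes before the wrapped handler runs.
func guard(allowed map[string]bool, maxBody int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !allowed[r.Method] {
			http.Error(w, "method_not_allowed", http.StatusMethodNotAllowed)
			return
		}
		r.Body = http.MaxBytesReader(w, r.Body, maxBody)
		next.ServeHTTP(w, r)
	})
}

// readAll stands in for a JSON handler: it consumes the body and maps the
// size-limit error to 413.
func readAll(w http.ResponseWriter, r *http.Request) {
	if _, err := io.ReadAll(r.Body); err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			http.Error(w, "request_too_large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "bad_request", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}

// probe sends one in-memory request through the guarded handler and returns
// the resulting status code.
func probe(method string, bodyLen int) int {
	h := guard(map[string]bool{http.MethodPost: true}, 16, http.HandlerFunc(readAll))
	body := strings.NewReader(strings.Repeat("x", bodyLen))
	req := httptest.NewRequest(method, "/api/v1/public/auth/send-email-code", body)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(probe(http.MethodPost, 8), probe(http.MethodPost, 64), probe(http.MethodGet, 0))
}
```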
## ~~Phase 6.~~ gRPC Server and Public Contracts
Status: implemented.
Goal: bring up authenticated transport over gRPC and HTTP/2.
Artifacts:
- gRPC listener
- protobuf service definitions
- `ExecuteCommand`
- `SubscribeEvents`
Dependencies: Phase 1.
Acceptance criteria:
- unary and server-streaming RPCs are reachable;
- the server runs only over HTTP/2.
Targeted tests:
- unary transport smoke test;
- stream transport smoke test.
## ~~Phase 7.~~ Envelope Parsing and Protocol Gate
Status: implemented.
Goal: validate the gRPC control envelope before security checks continue.
Artifacts:
- envelope parser
- required-field validation
- protocol version gate
Dependencies: Phase 6.
Acceptance criteria:
- unsupported or malformed envelopes are rejected before routing.
Targeted tests:
- missing field rejection;
- unsupported `protocol_version` rejection.
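
A possible shape for the envelope gate is sketched below. The plan names `protocol_version`, `message_type`, `request_id`, `payload_hash`, and `device_session_id`; the remaining fields, the exact set of required fields, and the single supported version `1` are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Envelope mirrors the control-envelope fields; the real protobuf schema may differ.
type Envelope struct {
	ProtocolVersion uint32
	MessageType     string
	RequestID       string
	DeviceSessionID string
	TimestampMs     int64 // assumed field name
	PayloadBytes    []byte
	PayloadHash     []byte
	Signature       []byte
}

var (
	ErrMissingField       = errors.New("missing required field")
	ErrUnsupportedVersion = errors.New("unsupported protocol_version")
)

// supportedVersions is a hypothetical allow-list with v1 as its only entry.
var supportedVersions = map[uint32]bool{1: true}

// ValidateEnvelope runs the version gate first, then the required-field
// checks, and returns the first failure. It performs no security checks.
func ValidateEnvelope(e Envelope) error {
	if !supportedVersions[e.ProtocolVersion] {
		return ErrUnsupportedVersion
	}
	required := []struct {
		name    string
		present bool
	}{
		{"message_type", e.MessageType != ""},
		{"request_id", e.RequestID != ""},
		{"device_session_id", e.DeviceSessionID != ""},
		{"timestamp_ms", e.TimestampMs != 0},
		{"payload_hash", len(e.PayloadHash) != 0},
		{"signature", len(e.Signature) != 0},
	}
	for _, f := range required {
		if !f.present {
			return fmt.Errorf("%w: %s", ErrMissingField, f.name)
		}
	}
	return nil
}

func main() {
	fmt.Println(ValidateEnvelope(Envelope{ProtocolVersion: 1, MessageType: "fleet.move"}))
}
```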
## ~~Phase 8.~~ Session Cache Lookup
Status: implemented.
Goal: resolve authenticated identity from cache.
Artifacts:
- `SessionCache`
- session lookup pipeline
- revoked versus active session handling
Dependencies: Phase 7.
Acceptance criteria:
- unknown and revoked sessions are blocked before signature verification.
Targeted tests:
- cache hit with active session;
- cache miss reject;
- revoked session reject.
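
The lookup outcomes above (active, unknown, revoked) can be illustrated with a small in-memory cache. This is only a sketch: the `Session` fields, the error names, and the choice to keep a tombstone for revoked sessions are assumptions, and the real `SessionCache` may be backed by a different store.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	ErrSessionUnknown = errors.New("unknown session")
	ErrSessionRevoked = errors.New("revoked session")
)

// Session holds what later pipeline stages need; the field set is assumed.
type Session struct {
	DeviceSessionID string
	UserID          string
	PublicKey       []byte
	Revoked         bool
}

// SessionCache is a minimal in-memory cache keyed by device session ID.
type SessionCache struct {
	mu   sync.RWMutex
	byID map[string]Session
}

func NewSessionCache() *SessionCache {
	return &SessionCache{byID: make(map[string]Session)}
}

// Upsert stores or replaces a session, for example from a session update event.
func (c *SessionCache) Upsert(s Session) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byID[s.DeviceSessionID] = s
}

// Revoke marks a session revoked and keeps a tombstone so that later lookups
// can tell "revoked" from "unknown".
func (c *SessionCache) Revoke(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	s := c.byID[id]
	s.DeviceSessionID = id
	s.Revoked = true
	c.byID[id] = s
}

// Lookup returns the active session or a stable error; it never calls the
// auth service.
func (c *SessionCache) Lookup(id string) (Session, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.byID[id]
	switch {
	case !ok:
		return Session{}, ErrSessionUnknown
	case s.Revoked:
		return Session{}, ErrSessionRevoked
	default:
		return s, nil
	}
}

func main() {
	cache := NewSessionCache()
	cache.Upsert(Session{DeviceSessionID: "s1", UserID: "u1"})
	cache.Revoke("s1")
	_, err := cache.Lookup("s1")
	fmt.Println(err)
}
```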
## ~~Phase 9.~~ Payload Hash and Signing Input
Status: implemented.
Goal: verify payload integrity before signature verification.
Artifacts:
- `payload_hash` verification
- canonical signing input builder
Dependencies: Phase 8.
Acceptance criteria:
- changing the payload bytes or any signed envelope field changes the canonical
signing input, so a signature over the original input no longer verifies.
Targeted tests:
- payload hash mismatch reject;
- canonical bytes differ when signed fields change.
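
To make the hash check and the canonical signing input concrete, here is one possible construction. The plan does not fix the hash function or the byte layout, so SHA-256, the field order, and the length-prefixed encoding below are all assumptions; `SignedFields`, `HashPayload`, and `SigningInput` are hypothetical names.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/binary"
	"fmt"
)

// HashPayload returns the SHA-256 digest of the payload (assumed hash function).
func HashPayload(payload []byte) []byte {
	sum := sha256.Sum256(payload)
	return sum[:]
}

// VerifyPayloadHash compares the claimed hash with the recomputed one in
// constant time.
func VerifyPayloadHash(payload, claimed []byte) bool {
	return subtle.ConstantTimeCompare(HashPayload(payload), claimed) == 1
}

// SignedFields lists the envelope fields covered by the signature (assumed set).
type SignedFields struct {
	ProtocolVersion uint32
	TimestampMs     int64
	MessageType     string
	RequestID       string
	DeviceSessionID string
	PayloadHash     []byte
}

// SigningInput serializes the fields in a fixed order. Variable-length fields
// carry a 4-byte big-endian length prefix so that field boundaries are
// unambiguous.
func SigningInput(f SignedFields) []byte {
	var buf bytes.Buffer
	var fixed [12]byte
	binary.BigEndian.PutUint32(fixed[:4], f.ProtocolVersion)
	binary.BigEndian.PutUint64(fixed[4:], uint64(f.TimestampMs))
	buf.Write(fixed[:])
	for _, field := range [][]byte{
		[]byte(f.MessageType), []byte(f.RequestID), []byte(f.DeviceSessionID), f.PayloadHash,
	} {
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(field)))
		buf.Write(n[:])
		buf.Write(field)
	}
	return buf.Bytes()
}

func main() {
	payload := []byte("example payload")
	in := SigningInput(SignedFields{ProtocolVersion: 1, MessageType: "fleet.move", RequestID: "r1", PayloadHash: HashPayload(payload)})
	fmt.Printf("%x\n", in)
}
```

The length prefixes matter: without them, `("ab", "c")` and `("a", "bc")` would serialize to the same bytes.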
## ~~Phase 10.~~ Client Signature Verification
Status: implemented.
Goal: authenticate the request origin using the session public key.
Artifacts:
- signature verifier
- deterministic auth reject mapping
Dependencies: Phase 9.
Acceptance criteria:
- wrong key and invalid signature produce stable rejects.
Targeted tests:
- success case with valid signature;
- bad signature reject;
- wrong-key reject.
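
The verifier itself can stay very small. The sketch below assumes Ed25519 as the signature scheme, which this plan does not specify; `ErrBadSignature`, `demoKey`, and `demoSign` are hypothetical, and the demo helpers exist only to exercise the verifier.

```go
package main

import (
	"crypto/ed25519"
	"errors"
	"fmt"
)

// ErrBadSignature is the single stable reject for every verification failure,
// so that wrong-key and tampered-signature cases look the same to the caller.
var ErrBadSignature = errors.New("invalid signature")

// VerifyClientSignature checks sig over signingInput with the session's
// public key. The length checks avoid the panic ed25519.Verify raises on a
// malformed public key.
func VerifyClientSignature(sessionKey ed25519.PublicKey, signingInput, sig []byte) error {
	if len(sessionKey) != ed25519.PublicKeySize || len(sig) != ed25519.SignatureSize {
		return ErrBadSignature
	}
	if !ed25519.Verify(sessionKey, signingInput, sig) {
		return ErrBadSignature
	}
	return nil
}

// demoKey derives a deterministic key pair from a one-byte seed. Demo use only.
func demoKey(seed byte) (ed25519.PublicKey, ed25519.PrivateKey) {
	raw := make([]byte, ed25519.SeedSize)
	raw[0] = seed
	priv := ed25519.NewKeyFromSeed(raw)
	return priv.Public().(ed25519.PublicKey), priv
}

// demoSign plays the client's role in the demo.
func demoSign(priv ed25519.PrivateKey, msg []byte) []byte {
	return ed25519.Sign(priv, msg)
}

func main() {
	pub, priv := demoKey(1)
	otherPub, _ := demoKey(2)
	input := []byte("canonical signing input")
	sig := demoSign(priv, input)
	fmt.Println(VerifyClientSignature(pub, input, sig))
	fmt.Println(VerifyClientSignature(otherPub, input, sig))
}
```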
## ~~Phase 11.~~ Freshness and Anti-Replay
Status: implemented.
Goal: enforce transport freshness and replay protection.
Artifacts:
- timestamp freshness window
- `ReplayStore`
- replay reservation and rejection logic
Dependencies: Phase 10.
Acceptance criteria:
- stale requests and duplicate `request_id` values are rejected.
Targeted tests:
- stale timestamp reject;
- replay reject for same session and request ID;
- distinct sessions do not collide.
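
The freshness window and replay reservation might look like the following in-memory sketch. It assumes millisecond Unix timestamps, a symmetric window, and a TTL-based store keyed by session and request ID; the real `ReplayStore` is likely backed by shared storage, and this version never evicts expired entries.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Fresh reports whether tsMs lies within windowMs of nowMs in either
// direction, so timestamps too far in the future are rejected as well.
func Fresh(nowMs, tsMs, windowMs int64) bool {
	d := nowMs - tsMs
	if d < 0 {
		d = -d
	}
	return d <= windowMs
}

// replayKey scopes a request ID to its session so that distinct sessions
// never collide.
type replayKey struct {
	sessionID string
	requestID string
}

// ReplayStore remembers reserved request IDs until their TTL passes.
type ReplayStore struct {
	mu    sync.Mutex
	ttlMs int64
	seen  map[replayKey]int64 // key -> expiry in Unix milliseconds
}

func NewReplayStore(ttlMs int64) *ReplayStore {
	return &ReplayStore{ttlMs: ttlMs, seen: make(map[replayKey]int64)}
}

// Reserve records the (session, request) pair and returns true, or returns
// false if the pair is already reserved and has not yet expired.
func (s *ReplayStore) Reserve(sessionID, requestID string, nowMs int64) bool {
	key := replayKey{sessionID, requestID}
	s.mu.Lock()
	defer s.mu.Unlock()
	if expiry, ok := s.seen[key]; ok && nowMs < expiry {
		return false
	}
	s.seen[key] = nowMs + s.ttlMs
	return true
}

func main() {
	now := time.Now().UnixMilli()
	store := NewReplayStore(30_000)
	fmt.Println(Fresh(now, now-1_000, 5_000))
	fmt.Println(store.Reserve("s1", "r1", now), store.Reserve("s1", "r1", now))
}
```

The TTL should be at least as long as the freshness window; otherwise a request could be replayed after its reservation expires but while its timestamp still counts as fresh.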
## ~~Phase 12.~~ Authenticated Rate Limits and Policy
Status: implemented.
Goal: apply edge policy after transport authenticity is established.
Artifacts:
- rate-limit keys for IP, session, user, and message class
- authenticated policy evaluation hook
Dependencies: Phase 11.
Acceptance criteria:
- authenticated buckets are independent from public REST buckets.
Targeted tests:
- per-dimension throttling;
- bucket isolation from public traffic.
## ~~Phase 13.~~ Internal Authenticated Command and Routing
Status: implemented.
Note: delivered together with Phase 14 signed unary responses.
Goal: forward only verified context to downstream services.
Artifacts:
- `AuthenticatedCommand`
- `DownstreamRouter`
- `DownstreamClient`
Dependencies: Phase 12.
Acceptance criteria:
- downstream services receive verified context only;
- raw transport details do not leak as authoritative input.
Targeted tests:
- route selection by `message_type`;
- downstream receives the expected authenticated context.
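
Routing by `message_type` could be as simple as a lookup table, as in this sketch. The `AuthenticatedCommand` fields, the exact-match routing rule, the `Execute` signature, and the example message types and service names are assumptions for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// AuthenticatedCommand carries only verified context; the field set is assumed.
type AuthenticatedCommand struct {
	UserID          string
	DeviceSessionID string
	MessageType     string
	RequestID       string
	Payload         []byte
}

// DownstreamClient is the port to one downstream service (assumed signature).
type DownstreamClient interface {
	Execute(ctx context.Context, cmd AuthenticatedCommand) ([]byte, error)
}

var ErrNoRoute = errors.New("no downstream route for message_type")

// DownstreamRouter selects a client by exact message_type match.
type DownstreamRouter struct {
	routes map[string]DownstreamClient
}

func NewDownstreamRouter(routes map[string]DownstreamClient) *DownstreamRouter {
	return &DownstreamRouter{routes: routes}
}

// Route forwards the command to the matching client or returns ErrNoRoute.
func (r *DownstreamRouter) Route(ctx context.Context, cmd AuthenticatedCommand) ([]byte, error) {
	client, ok := r.routes[cmd.MessageType]
	if !ok {
		return nil, ErrNoRoute
	}
	return client.Execute(ctx, cmd)
}

// namedClient is a stand-in downstream that reports which service handled
// the command.
type namedClient struct{ name string }

func (c namedClient) Execute(_ context.Context, cmd AuthenticatedCommand) ([]byte, error) {
	return []byte(c.name + " handled " + cmd.RequestID), nil
}

// demoRoute routes one command with hypothetical message types.
func demoRoute(messageType string) (string, error) {
	router := NewDownstreamRouter(map[string]DownstreamClient{
		"fleet.move": namedClient{name: "fleet-service"},
		"chat.send":  namedClient{name: "chat-service"},
	})
	out, err := router.Route(context.Background(), AuthenticatedCommand{
		UserID: "u1", DeviceSessionID: "s1", MessageType: messageType, RequestID: "r1",
	})
	return string(out), err
}

func main() {
	fmt.Println(demoRoute("fleet.move"))
	fmt.Println(demoRoute("unknown.type"))
}
```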
## ~~Phase 14.~~ Signed Unary Responses
Status: implemented as part of Phase 13 delivery.
Goal: return verifiable server responses to authenticated clients.
Artifacts:
- response envelope builder
- payload hash generation
- `ResponseSigner`
Dependencies: Phase 13.
Acceptance criteria:
- unary responses always carry the original `request_id`, `payload_hash`, and
server signature.
Targeted tests:
- response correlation test;
- server signature generation test.
## ~~Phase 15.~~ Session Update and Revocation Events
Status: implemented.
Goal: keep gateway session state current without synchronous hot-path lookups.
Artifacts:
- `EventSubscriber`
- session update handlers
- session revoke handlers
Dependencies: Phase 8.
Acceptance criteria:
- session updates change gateway behavior without per-request sync calls to the
auth service.
Targeted tests:
- cache update from event;
- revocation event invalidates cached session.
## ~~Phase 16.~~ Authenticated Push Stream
Status: implemented.
Goal: open a verified server-streaming channel for client-facing delivery.
Artifacts:
- `SubscribeEvents` handler
- stream binding to `user_id` and `device_session_id`
- initial server time event
Dependencies: Phase 15.
Acceptance criteria:
- the stream opens only after the full auth pipeline succeeds.
Targeted tests:
- authorized stream open;
- rejected stream open for invalid session;
- first event contains server time.
## ~~Phase 17.~~ Event Fan-Out
Status: implemented.
Goal: deliver client-facing events from internal pub/sub to active streams.
Artifacts:
- `PushHub`
- event fan-out logic
- user and session targeting rules
Dependencies: Phase 16.
Acceptance criteria:
- events are delivered to the correct active streams only.
Targeted tests:
- single-session delivery;
- multi-device delivery for one user;
- unrelated sessions do not receive the event.
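
The targeting rules can be illustrated with a small channel-based hub. This sketch assumes that an event with an empty session ID targets every stream of the user, and that a full subscriber buffer drops the event instead of blocking the hub; neither policy is stated in this plan. `CloseSession` shows how the revocation teardown from Phase 18 could hook in.

```go
package main

import (
	"fmt"
	"sync"
)

// Event targets one user, and optionally a single device session.
type Event struct {
	UserID          string
	DeviceSessionID string // empty targets every session of UserID (assumed rule)
	Payload         []byte
}

type subscriber struct {
	userID    string
	sessionID string
	ch        chan Event
}

// PushHub fans events out to the active streams they target.
type PushHub struct {
	mu   sync.Mutex
	subs map[*subscriber]struct{}
}

func NewPushHub() *PushHub {
	return &PushHub{subs: make(map[*subscriber]struct{})}
}

// Subscribe registers a stream and returns its channel plus an idempotent
// cancel function.
func (h *PushHub) Subscribe(userID, sessionID string, buffer int) (<-chan Event, func()) {
	s := &subscriber{userID: userID, sessionID: sessionID, ch: make(chan Event, buffer)}
	h.mu.Lock()
	h.subs[s] = struct{}{}
	h.mu.Unlock()
	return s.ch, func() { h.remove(func(x *subscriber) bool { return x == s }) }
}

// remove unregisters and closes every matching stream under the lock, so
// Publish can never send on a closed channel.
func (h *PushHub) remove(match func(*subscriber) bool) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for s := range h.subs {
		if match(s) {
			delete(h.subs, s)
			close(s.ch)
		}
	}
}

// CloseSession closes all streams bound to a revoked session.
func (h *PushHub) CloseSession(sessionID string) {
	h.remove(func(s *subscriber) bool { return s.sessionID == sessionID })
}

// Publish delivers ev to matching streams and returns the delivery count.
func (h *PushHub) Publish(ev Event) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for s := range h.subs {
		if s.userID != ev.UserID {
			continue
		}
		if ev.DeviceSessionID != "" && s.sessionID != ev.DeviceSessionID {
			continue
		}
		select {
		case s.ch <- ev:
			delivered++
		default: // buffer full: drop rather than block the hub
		}
	}
	return delivered
}

func main() {
	hub := NewPushHub()
	ch, cancel := hub.Subscribe("alice", "s1", 1)
	defer cancel()
	fmt.Println(hub.Publish(Event{UserID: "alice"}), len(ch))
}
```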
## ~~Phase 18.~~ Revocation-Driven Stream Teardown
Status: implemented.
Goal: terminate active delivery channels when a session is revoked.
Artifacts:
- stream teardown on revoke
- connection cleanup logic
Dependencies: Phase 17.
Acceptance criteria:
- revocation blocks new unary requests and closes active streams for the same
session.
Targeted tests:
- revoke closes active stream;
- revoked session cannot reopen the stream.
## ~~Phase 19.~~ Observability and Shutdown Hardening
Status: implemented.
Note: delivered with `zap` structured logging, OpenTelemetry tracing and
metrics, the optional private admin `/metrics` listener, timeout budgets, and
shutdown-driven push-stream teardown.
Goal: make the service operable in production.
Artifacts:
- structured logs
- metrics
- trace propagation
- timeout budgets
- graceful shutdown for unary and streaming traffic
Dependencies: Phase 18.
Acceptance criteria:
- shutdown is deterministic;
- logs and metrics expose stable edge outcomes without leaking secrets.
Targeted tests:
- shutdown closes listeners and active streams;
- secret and signature values are not logged.
## ~~Phase 20.~~ Acceptance Pass
Status: implemented.
Note: acceptance pass reconciled README/OpenAPI/root architecture
documentation, fixed the documented public-auth projected-error contract, and
added focused regression coverage including OpenAPI validation.
Goal: reconcile implementation, documentation, and regression coverage.
Artifacts:
- updated README and PLAN
- final protocol and interface review
- focused regression test run
Dependencies: Phases 1 through 19.
Acceptance criteria:
- implementation matches documented contracts and ordering guarantees;
- docs describe the actual gateway behavior.
Targeted tests:
- run focused package tests for gateway packages;
- rerun cross-cutting regression scenarios.
## Cross-Cutting Regression Scenarios
- `send_email_code` and `confirm_email_code` are available without session auth
and are still limited by public auth policy.
- Public browser bootstrap and asset bursts do not increase auth abuse counters
and are not rejected as hostile because of intensity alone.
- Any gRPC command without a valid session is rejected before routing.
- Unknown and revoked sessions are both rejected deterministically, with
identical client-visible behavior wherever policy requires it.
- Signature verification fails when `payload_bytes`, `payload_hash`,
`message_type`, `request_id`, or the signing key changes.
- `payload_hash` is verified before downstream execution.
- Requests outside the freshness window are rejected.
- Reused `request_id` values are rejected within the session replay window.
- Public REST and authenticated gRPC traffic use independent buckets and
independent abuse telemetry.
- Downstream services receive `AuthenticatedCommand`, not raw REST or gRPC
transport requests.
- Unary responses preserve `request_id` correlation and are server-signed.
- Streaming connections open only after the auth pipeline and close on revoke.
- Session cache updates from events change gateway behavior without synchronous
auth-service lookups per request.
- Graceful shutdown terminates unary and streaming traffic cleanly.