galaxy-game/gateway/PLAN.md
2026-04-08 16:23:07 +02:00

Edge Gateway Implementation Plan

This plan has already been implemented and is kept here for historical reasons.

It should NOT be treated as the source of truth for service functionality.

Summary

This plan breaks implementation into small, reviewable phases. Each phase has a single primary goal, clear deliverables, explicit dependencies, acceptance criteria, and focused tests.

The intended v1 architecture is:

  • unauthenticated public ingress over REST/JSON;
  • authenticated ingress over gRPC on HTTP/2;
  • FlatBuffers payloads for authenticated business commands;
  • protobuf-based gRPC control envelopes;
  • authenticated server-streaming push through gRPC;
  • separate public traffic classes and isolated anti-abuse counters.

Assumptions and Defaults

  • message_type is the stable downstream routing key.
  • protocol_version covers transport and envelope compatibility, not business payload schema compatibility.
  • FlatBuffers are used for business payload bytes only.
  • Phase 3 public auth uses a challenge-token REST flow: send-email-code(email) -> challenge_id and confirm-email-code(challenge_id, code, client_public_key) -> device_session_id.
  • Phase 3 uses a consumer-side AuthServiceClient inside gateway; the default process wiring keeps public auth routes mounted and returns 503 service_unavailable until a concrete upstream adapter is added.
  • Browser bootstrap and asset traffic are within gateway scope, even when backed by a pluggable proxy or handler.
  • Long-polling is out of scope for v1.

Phase 1. Module Skeleton

Status: implemented.

Goal: create the runnable gateway process skeleton.

Artifacts:

  • cmd/gateway
  • internal/app
  • base configuration types
  • startup and shutdown wiring

Dependencies: none.

Acceptance criteria:

  • the process starts with config;
  • the process shuts down cleanly on signal;
  • lifecycle wiring is testable.

Targeted tests:

  • startup with valid config;
  • shutdown without leaked goroutines.

Phase 2. Public REST Server

Status: implemented.

Goal: add the unauthenticated HTTP server shell.

Artifacts:

  • public REST listener
  • GET /healthz
  • GET /readyz
  • base error serialization
  • request classification hook

Dependencies: Phase 1.

Acceptance criteria:

  • health endpoints respond deterministically;
  • public requests are classified at least into public_auth and browser_*.

Targeted tests:

  • health endpoint responses;
  • request classification smoke tests.

Phase 3. Public Auth REST Handlers

Status: implemented.

Goal: expose unauthenticated auth commands through REST/JSON.

Artifacts:

  • POST /api/v1/public/auth/send-email-code
  • POST /api/v1/public/auth/confirm-email-code
  • request and response DTOs
  • adapter calls into AuthServiceClient

Dependencies: Phase 2.

Acceptance criteria:

  • no session authentication is required for these routes;
  • handlers delegate only through the auth service adapter.

Targeted tests:

  • success and validation errors for both routes;
  • no session lookup on public auth paths.

Phase 4. Public Traffic Classification

Status: implemented.

Goal: isolate public traffic into stable anti-abuse classes.

Artifacts:

  • PublicTrafficClassifier
  • classes public_auth, browser_bootstrap, browser_asset, public_misc
  • isolated rate-limit bucket keys

Dependencies: Phase 2.

Acceptance criteria:

  • browser traffic does not share buckets with public auth;
  • auth counters remain unaffected by asset bursts.

Targeted tests:

  • per-class routing tests;
  • bucket isolation tests.
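
As an illustration of class isolation, the sketch below maps request paths to the four classes and derives bucket keys that cannot collide across classes. The class names and the auth path prefix come from this plan; the `/assets/` prefix, the bootstrap paths, and the function names are assumptions.

```go
package main

import "strings"

// TrafficClass is one of the stable anti-abuse classes named in the plan.
type TrafficClass string

const (
	ClassPublicAuth       TrafficClass = "public_auth"
	ClassBrowserBootstrap TrafficClass = "browser_bootstrap"
	ClassBrowserAsset     TrafficClass = "browser_asset"
	ClassPublicMisc       TrafficClass = "public_misc"
)

// Classify maps a request path to a traffic class.
// The asset prefix and bootstrap paths are hypothetical.
func Classify(path string) TrafficClass {
	switch {
	case strings.HasPrefix(path, "/api/v1/public/auth/"):
		return ClassPublicAuth
	case strings.HasPrefix(path, "/assets/"):
		return ClassBrowserAsset
	case path == "/" || path == "/index.html":
		return ClassBrowserBootstrap
	default:
		return ClassPublicMisc
	}
}

// BucketKey prefixes the client key with the class, so the same client
// consumes a separate rate-limit bucket in each class.
func BucketKey(class TrafficClass, clientIP string) string {
	return string(class) + "|" + clientIP
}
```

Because the class is part of the key, an asset burst from one address never touches that address's `public_auth` bucket.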

Phase 5. Public REST Anti-Abuse

Status: implemented.

Goal: add coarse protection to unauthenticated REST traffic.

Artifacts:

  • body size limits
  • method allow-lists
  • malformed request counters
  • per-class rate-limit thresholds

Dependencies: Phase 4.

Acceptance criteria:

  • first-load browser bursts are not marked hostile because of burst pattern alone;
  • malformed or oversized requests are rejected predictably.

Targeted tests:

  • bootstrap burst stays outside auth abuse counters;
  • invalid methods and oversized bodies are rejected.

Phase 6. gRPC Server and Public Contracts

Status: implemented.

Goal: bring up authenticated transport over gRPC and HTTP/2.

Artifacts:

  • gRPC listener
  • protobuf service definitions
  • ExecuteCommand
  • SubscribeEvents

Dependencies: Phase 1.

Acceptance criteria:

  • unary and server-streaming RPCs are reachable;
  • the server runs only over HTTP/2.

Targeted tests:

  • unary transport smoke test;
  • stream transport smoke test.

Phase 7. Envelope Parsing and Protocol Gate

Status: implemented.

Goal: validate the gRPC control envelope before security checks continue.

Artifacts:

  • envelope parser
  • required-field validation
  • protocol version gate

Dependencies: Phase 6.

Acceptance criteria:

  • unsupported or malformed envelopes are rejected before routing.

Targeted tests:

  • missing field rejection;
  • unsupported protocol_version rejection.

Phase 8. Session Cache Lookup

Status: implemented.

Goal: resolve authenticated identity from cache.

Artifacts:

  • SessionCache
  • session lookup pipeline
  • revoked versus active session handling

Dependencies: Phase 7.

Acceptance criteria:

  • unknown and revoked sessions are blocked before signature verification.

Targeted tests:

  • cache hit with active session;
  • cache miss reject;
  • revoked session reject.
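
A minimal in-memory sketch of the lookup step follows. The type and method names other than `SessionCache` are hypothetical, and the distinct error values are only one option; the gateway may map both to the same client-facing reject.

```go
package main

import (
	"errors"
	"sync"
)

var (
	ErrSessionUnknown = errors.New("unknown session")
	ErrSessionRevoked = errors.New("revoked session")
)

// Session is an assumed cache record for one device session.
type Session struct {
	DeviceSessionID string
	UserID          string
	PublicKey       []byte
	Revoked         bool
}

// SessionCache resolves identity without calling the auth service.
type SessionCache struct {
	mu       sync.RWMutex
	sessions map[string]Session
}

func NewSessionCache() *SessionCache {
	return &SessionCache{sessions: make(map[string]Session)}
}

// Put inserts or replaces a session, for example from an update event.
func (c *SessionCache) Put(s Session) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.sessions[s.DeviceSessionID] = s
}

// Revoke marks a cached session as revoked and keeps the record, so a
// later lookup reports "revoked" instead of "unknown".
func (c *SessionCache) Revoke(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.sessions[id]; ok {
		s.Revoked = true
		c.sessions[id] = s
	}
}

// Lookup returns the active session or a reject reason.
func (c *SessionCache) Lookup(id string) (Session, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.sessions[id]
	if !ok {
		return Session{}, ErrSessionUnknown
	}
	if s.Revoked {
		return Session{}, ErrSessionRevoked
	}
	return s, nil
}
```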

Phase 9. Payload Hash and Signing Input

Status: implemented.

Goal: verify payload integrity before signature verification.

Artifacts:

  • payload_hash verification
  • canonical signing input builder

Dependencies: Phase 8.

Acceptance criteria:

  • changing payload bytes or envelope fields breaks the signing input.

Targeted tests:

  • payload hash mismatch reject;
  • canonical bytes differ when signed fields change.

Phase 10. Client Signature Verification

Status: implemented.

Goal: authenticate the request origin using the session public key.

Artifacts:

  • signature verifier
  • deterministic auth reject mapping

Dependencies: Phase 9.

Acceptance criteria:

  • wrong key and invalid signature produce stable rejects.

Targeted tests:

  • success case with valid signature;
  • bad signature reject;
  • wrong-key reject.

Phase 11. Freshness and Anti-Replay

Status: implemented.

Goal: enforce transport freshness and replay protection.

Artifacts:

  • timestamp freshness window
  • ReplayStore
  • replay reservation and rejection logic

Dependencies: Phase 10.

Acceptance criteria:

  • stale requests and duplicate request_id values are rejected.

Targeted tests:

  • stale timestamp reject;
  • replay reject for same session and request ID;
  • distinct sessions do not collide.

Phase 12. Authenticated Rate Limits and Policy

Status: implemented.

Goal: apply edge policy after transport authenticity is established.

Artifacts:

  • rate-limit keys for IP, session, user, and message class
  • authenticated policy evaluation hook

Dependencies: Phase 11.

Acceptance criteria:

  • authenticated buckets are independent from public REST buckets.

Targeted tests:

  • per-dimension throttling;
  • bucket isolation from public traffic.

Phase 13. Internal Authenticated Command and Routing

Status: implemented. Note: delivered together with Phase 14 signed unary responses.

Goal: forward only verified context to downstream services.

Artifacts:

  • AuthenticatedCommand
  • DownstreamRouter
  • DownstreamClient

Dependencies: Phase 12.

Acceptance criteria:

  • downstream services receive verified context only;
  • raw transport details do not leak as authoritative input.

Targeted tests:

  • route selection by message_type;
  • downstream receives the expected authenticated context.

Phase 14. Signed Unary Responses

Status: implemented as part of Phase 13 delivery.

Goal: return verifiable server responses to authenticated clients.

Artifacts:

  • response envelope builder
  • payload hash generation
  • ResponseSigner

Dependencies: Phase 13.

Acceptance criteria:

  • unary responses always carry the original request_id, payload_hash, and server signature.

Targeted tests:

  • response correlation test;
  • server signature generation test.

Phase 15. Session Update and Revocation Events

Status: implemented.

Goal: keep gateway session state current without synchronous hot-path lookups.

Artifacts:

  • EventSubscriber
  • session update handlers
  • session revoke handlers

Dependencies: Phase 8.

Acceptance criteria:

  • session updates change gateway behavior without per-request sync calls to the auth service.

Targeted tests:

  • cache update from event;
  • revocation event invalidates cached session.

Phase 16. Authenticated Push Stream

Status: implemented.

Goal: open a verified server-streaming channel for client-facing delivery.

Artifacts:

  • SubscribeEvents handler
  • stream binding to user_id and device_session_id
  • initial server time event

Dependencies: Phase 15.

Acceptance criteria:

  • the stream opens only after the full auth pipeline succeeds.

Targeted tests:

  • authorized stream open;
  • rejected stream open for invalid session;
  • first event contains server time.

Phase 17. Event Fan-Out

Status: implemented.

Goal: deliver client-facing events from internal pub/sub to active streams.

Artifacts:

  • PushHub
  • event fan-out logic
  • user and session targeting rules

Dependencies: Phase 16.

Acceptance criteria:

  • events are delivered to the correct active streams only.

Targeted tests:

  • single-session delivery;
  • multi-device delivery for one user;
  • unrelated sessions do not receive the event.
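
The targeting rules can be illustrated with a small in-memory hub. Only the `PushHub` name comes from this plan. The sketch assumes one stream per device session, treats an empty `DeviceSessionID` as "all sessions of the user", and drops events when a stream buffer is full; all three are assumed policies.

```go
package main

import "sync"

// Event is a client-facing event. An empty DeviceSessionID targets
// every active session of UserID.
type Event struct {
	UserID          string
	DeviceSessionID string
	Payload         []byte
}

type stream struct {
	userID string
	ch     chan Event
}

// PushHub tracks at most one active stream per device session.
type PushHub struct {
	mu      sync.RWMutex
	streams map[string]stream // keyed by device_session_id
}

func NewPushHub() *PushHub {
	return &PushHub{streams: make(map[string]stream)}
}

// Subscribe registers a stream and returns its delivery channel.
func (h *PushHub) Subscribe(userID, sessionID string, buffer int) <-chan Event {
	h.mu.Lock()
	defer h.mu.Unlock()
	if old, ok := h.streams[sessionID]; ok {
		close(old.ch) // a new stream replaces the previous one
	}
	s := stream{userID: userID, ch: make(chan Event, buffer)}
	h.streams[sessionID] = s
	return s.ch
}

// Unsubscribe closes and removes the stream for a session.
func (h *PushHub) Unsubscribe(sessionID string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if s, ok := h.streams[sessionID]; ok {
		close(s.ch)
		delete(h.streams, sessionID)
	}
}

// Publish delivers ev to matching streams and returns the delivery
// count. A full buffer drops the event instead of blocking the hub.
func (h *PushHub) Publish(ev Event) int {
	h.mu.RLock()
	defer h.mu.RUnlock()
	delivered := 0
	for sessionID, s := range h.streams {
		if s.userID != ev.UserID {
			continue
		}
		if ev.DeviceSessionID != "" && ev.DeviceSessionID != sessionID {
			continue
		}
		select {
		case s.ch <- ev:
			delivered++
		default:
		}
	}
	return delivered
}
```

Sends happen under the read lock and closes under the write lock, so a publish can never write to a channel that was just closed.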

Phase 18. Revocation-Driven Stream Teardown

Status: implemented.

Goal: terminate active delivery channels when a session is revoked.

Artifacts:

  • stream teardown on revoke
  • connection cleanup logic

Dependencies: Phase 17.

Acceptance criteria:

  • revocation blocks new unary requests and closes active streams for the same session.

Targeted tests:

  • revoke closes active stream;
  • revoked session cannot reopen the stream.

Phase 19. Observability and Shutdown Hardening

Status: implemented. Note: delivered with zap structured logging, OpenTelemetry tracing and metrics, the optional private admin /metrics listener, timeout budgets, and shutdown-driven push-stream teardown.

Goal: make the service operable in production.

Artifacts:

  • structured logs
  • metrics
  • trace propagation
  • timeout budgets
  • graceful shutdown for unary and streaming traffic

Dependencies: Phase 18.

Acceptance criteria:

  • shutdown is deterministic;
  • logs and metrics expose stable edge outcomes without leaking secrets.

Targeted tests:

  • shutdown closes listeners and active streams;
  • secret and signature values are not logged.

Phase 20. Acceptance Pass

Status: implemented. Note: acceptance pass reconciled README/OpenAPI/root architecture documentation, fixed the documented public-auth projected-error contract, and added focused regression coverage including OpenAPI validation.

Goal: reconcile implementation, documentation, and regression coverage.

Artifacts:

  • updated README and PLAN
  • final protocol and interface review
  • focused regression test run

Dependencies: Phases 1 through 19.

Acceptance criteria:

  • implementation matches documented contracts and ordering guarantees;
  • docs describe the actual gateway behavior.

Targeted tests:

  • run focused package tests for gateway packages;
  • rerun cross-cutting regression scenarios.

Cross-Cutting Regression Scenarios

  • send_email_code and confirm_email_code are available without session auth and are still limited by public auth policy.
  • Public browser bootstrap and asset bursts do not increase auth abuse counters and are not rejected as hostile because of intensity alone.
  • Any gRPC command without a valid session is rejected before routing.
  • Unknown and revoked sessions are handled predictably and consistently where policy requires identical behavior.
  • Signature verification fails when payload_bytes, payload_hash, message_type, request_id, or the signing key changes.
  • payload_hash is verified before downstream execution.
  • Requests outside the freshness window are rejected.
  • Reused request_id values are rejected within the session replay window.
  • Public REST and authenticated gRPC traffic use independent buckets and independent abuse telemetry.
  • Downstream services receive AuthenticatedCommand, not raw REST or gRPC transport requests.
  • Unary responses preserve request_id correlation and are server-signed.
  • Streaming connections open only after the auth pipeline and close on revoke.
  • Session cache updates from events change gateway behavior without synchronous auth-service lookups per request.
  • Graceful shutdown terminates unary and streaming traffic cleanly.