# Edge Gateway Implementation Plan

This plan has already been implemented and is kept here for historical reasons. It should NOT be treated as the source of truth for service functionality.

## Summary

This plan breaks implementation into small, reviewable phases. Each phase has a single primary goal, clear deliverables, explicit dependencies, acceptance criteria, and focused tests.

The intended v1 architecture is:

- unauthenticated public ingress over REST/JSON;
- authenticated ingress over gRPC on HTTP/2;
- FlatBuffers payloads for authenticated business commands;
- protobuf-based gRPC control envelopes;
- authenticated server-streaming push through gRPC;
- separate public traffic classes and isolated anti-abuse counters.

## Assumptions and Defaults

- `message_type` is the stable downstream routing key.
- `protocol_version` covers transport and envelope compatibility, not business payload schema compatibility.
- FlatBuffers are used for business payload bytes only.
- Phase 3 public auth uses a challenge-token REST flow: `send-email-code(email) -> challenge_id` and `confirm-email-code(challenge_id, code, client_public_key) -> device_session_id`.
- Phase 3 uses a consumer-side `AuthServiceClient` inside `gateway`; the default process wiring keeps public auth routes mounted and returns `503 service_unavailable` until a concrete upstream adapter is added.
- Browser bootstrap and asset traffic are within gateway scope, even when backed by a pluggable proxy or handler.
- Long-polling is out of scope for v1.

## ~~Phase 1.~~ Module Skeleton

Status: implemented.

Goal: create the runnable gateway process skeleton.

Artifacts:

- `cmd/gateway`
- `internal/app`
- base configuration types
- startup and shutdown wiring

Dependencies: none.

Acceptance criteria:

- the process starts with config;
- the process shuts down cleanly on signal;
- lifecycle wiring is testable.

Targeted tests:

- startup with valid config;
- shutdown without leaked goroutines.

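
The Phase 1 lifecycle wiring could look roughly like the sketch below. The `Component` interface, the `App` type, and the `Recorder` test double are hypothetical names chosen for illustration; the plan only states that startup and shutdown wiring exists and is testable.

```go
package main

import (
	"context"
	"errors"
	"os/signal"
	"syscall"
)

// Component is a hypothetical unit of lifecycle wiring (a listener, a subscriber, ...).
type Component interface {
	Start() error
	Stop() error
}

// App starts components in order and stops them in reverse order.
type App struct {
	components []Component
	started    []Component
}

func NewApp(components ...Component) *App { return &App{components: components} }

// Start starts every component. If one fails, the components that were
// already started are stopped again before the error is returned.
func (a *App) Start() error {
	for _, c := range a.components {
		if err := c.Start(); err != nil {
			_ = a.Stop()
			return err
		}
		a.started = append(a.started, c)
	}
	return nil
}

// Stop stops started components in reverse order and returns the first error.
func (a *App) Stop() error {
	var first error
	for i := len(a.started) - 1; i >= 0; i-- {
		if err := a.started[i].Stop(); err != nil && first == nil {
			first = err
		}
	}
	a.started = nil
	return first
}

// Run starts the app, blocks until SIGINT or SIGTERM, then stops it.
func Run(a *App) error {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()
	if err := a.Start(); err != nil {
		return err
	}
	<-ctx.Done()
	return a.Stop()
}

// Recorder is a test double that records lifecycle calls in order.
type Recorder struct {
	Name      string
	Log       *[]string
	FailStart bool
}

func (r *Recorder) Start() error {
	if r.FailStart {
		return errors.New(r.Name + ": start failed")
	}
	*r.Log = append(*r.Log, "start:"+r.Name)
	return nil
}

func (r *Recorder) Stop() error {
	*r.Log = append(*r.Log, "stop:"+r.Name)
	return nil
}
```

Keeping signal handling in `Run` and ordering logic in `App` means the start and stop order can be tested with doubles such as `Recorder`, without sending real signals or opening listeners.
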
## ~~Phase 2.~~ Public REST Server

Status: implemented.

Goal: add the unauthenticated HTTP server shell.

Artifacts:

- public REST listener
- `GET /healthz`
- `GET /readyz`
- base error serialization
- request classification hook

Dependencies: Phase 1.

Acceptance criteria:

- health endpoints respond deterministically;
- public requests are classified at least into `public_auth` and `browser_*`.

Targeted tests:

- health endpoint responses;
- request classification smoke tests.

## ~~Phase 3.~~ Public Auth REST Handlers

Status: implemented.

Goal: expose unauthenticated auth commands through REST/JSON.

Artifacts:

- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`
- request and response DTOs
- adapter calls into `AuthServiceClient`

Dependencies: Phase 2.

Acceptance criteria:

- no session authentication is required for these routes;
- handlers delegate only through the auth service adapter.

Targeted tests:

- success and validation errors for both routes;
- no session lookup on public auth paths.

## ~~Phase 4.~~ Public Traffic Classification

Status: implemented.

Goal: isolate public traffic into stable anti-abuse classes.

Artifacts:

- `PublicTrafficClassifier`
- classes `public_auth`, `browser_bootstrap`, `browser_asset`, `public_misc`
- isolated rate-limit bucket keys

Dependencies: Phase 2.

Acceptance criteria:

- browser traffic does not share buckets with public auth;
- auth counters remain unaffected by asset bursts.

Targeted tests:

- per-class routing tests;
- bucket isolation tests.

## ~~Phase 5.~~ Public REST Anti-Abuse

Status: implemented.

Goal: add coarse protection to unauthenticated REST traffic.

Artifacts:

- body size limits
- method allow-lists
- malformed request counters
- per-class rate-limit thresholds

Dependencies: Phase 4.

Acceptance criteria:

- first-load browser bursts are not marked hostile because of burst pattern alone;
- malformed or oversized requests are rejected predictably.

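
The Phase 4 classes and the Phase 5 coarse checks can be sketched together as follows. Only the four class names and the `/api/v1/public/auth/` prefix come from this plan; the browser path rules, the size limits, and the `Classify`, `BucketKey`, and `Admit` names are assumptions made for illustration.

```go
package main

import (
	"net/http"
	"strings"
)

// TrafficClass is a stable anti-abuse class for public REST traffic.
type TrafficClass string

const (
	ClassPublicAuth       TrafficClass = "public_auth"
	ClassBrowserBootstrap TrafficClass = "browser_bootstrap"
	ClassBrowserAsset     TrafficClass = "browser_asset"
	ClassPublicMisc       TrafficClass = "public_misc"
)

// Classify maps a request path to a class. The browser rules are hypothetical.
func Classify(path string) TrafficClass {
	switch {
	case strings.HasPrefix(path, "/api/v1/public/auth/"):
		return ClassPublicAuth
	case path == "/" || path == "/index.html":
		return ClassBrowserBootstrap
	case strings.HasPrefix(path, "/assets/"):
		return ClassBrowserAsset
	default:
		return ClassPublicMisc
	}
}

// BucketKey namespaces a rate-limit counter by class, so an asset burst
// from one IP never touches the auth counter for the same IP.
func BucketKey(class TrafficClass, clientIP string) string {
	return string(class) + "|" + clientIP
}

type classPolicy struct {
	methods      map[string]bool
	maxBodyBytes int64
}

func allow(methods ...string) map[string]bool {
	m := make(map[string]bool, len(methods))
	for _, name := range methods {
		m[name] = true
	}
	return m
}

// policies holds assumed per-class limits; real thresholds are not in the plan.
var policies = map[TrafficClass]classPolicy{
	ClassPublicAuth:       {methods: allow(http.MethodPost), maxBodyBytes: 4 << 10},
	ClassBrowserBootstrap: {methods: allow(http.MethodGet, http.MethodHead), maxBodyBytes: 0},
	ClassBrowserAsset:     {methods: allow(http.MethodGet, http.MethodHead), maxBodyBytes: 0},
	ClassPublicMisc:       {methods: allow(http.MethodGet), maxBodyBytes: 0},
}

// Admit runs the coarse checks and returns the class plus an HTTP status:
// 405 for a disallowed method, 413 for an oversized body, 200 otherwise.
func Admit(method, path string, bodyLen int64) (TrafficClass, int) {
	class := Classify(path)
	policy := policies[class]
	if !policy.methods[method] {
		return class, http.StatusMethodNotAllowed
	}
	if bodyLen > policy.maxBodyBytes {
		return class, http.StatusRequestEntityTooLarge
	}
	return class, http.StatusOK
}
```

Because the class is part of the bucket key, per-class thresholds can differ, and a first-load burst of asset requests only ever counts against the `browser_asset` bucket.
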

Targeted tests:

- bootstrap burst stays outside auth abuse counters;
- invalid methods and oversized bodies are rejected.

## ~~Phase 6.~~ gRPC Server and Public Contracts

Status: implemented.

Goal: bring up authenticated transport over gRPC and HTTP/2.

Artifacts:

- gRPC listener
- protobuf service definitions
- `ExecuteCommand`
- `SubscribeEvents`

Dependencies: Phase 1.

Acceptance criteria:

- unary and server-streaming RPCs are reachable;
- the server runs only over HTTP/2.

Targeted tests:

- unary transport smoke test;
- stream transport smoke test.

## ~~Phase 7.~~ Envelope Parsing and Protocol Gate

Status: implemented.

Goal: validate the gRPC control envelope before security checks continue.

Artifacts:

- envelope parser
- required-field validation
- protocol version gate

Dependencies: Phase 6.

Acceptance criteria:

- unsupported or malformed envelopes are rejected before routing.

Targeted tests:

- missing field rejection;
- unsupported `protocol_version` rejection.

## ~~Phase 8.~~ Session Cache Lookup

Status: implemented.

Goal: resolve authenticated identity from cache.

Artifacts:

- `SessionCache`
- session lookup pipeline
- revoked versus active session handling

Dependencies: Phase 7.

Acceptance criteria:

- unknown and revoked sessions are blocked before signature verification.

Targeted tests:

- cache hit with active session;
- cache miss reject;
- revoked session reject.

## ~~Phase 9.~~ Payload Hash and Signing Input

Status: implemented.

Goal: verify payload integrity before signature verification.

Artifacts:

- `payload_hash` verification
- canonical signing input builder

Dependencies: Phase 8.

Acceptance criteria:

- changing payload bytes or envelope fields breaks the signing input.

Targeted tests:

- payload hash mismatch reject;
- canonical bytes differ when signed fields change.

## ~~Phase 10.~~ Client Signature Verification

Status: implemented.

Goal: authenticate the request origin using the session public key.


Artifacts:

- signature verifier
- deterministic auth reject mapping

Dependencies: Phase 9.

Acceptance criteria:

- wrong key and invalid signature produce stable rejects.

Targeted tests:

- success case with valid signature;
- bad signature reject;
- wrong-key reject.

## ~~Phase 11.~~ Freshness and Anti-Replay

Status: implemented.

Goal: enforce transport freshness and replay protection.

Artifacts:

- timestamp freshness window
- `ReplayStore`
- replay reservation and rejection logic

Dependencies: Phase 10.

Acceptance criteria:

- stale requests and duplicate `request_id` values are rejected.

Targeted tests:

- stale timestamp reject;
- replay reject for same session and request ID;
- distinct sessions do not collide.

## ~~Phase 12.~~ Authenticated Rate Limits and Policy

Status: implemented.

Goal: apply edge policy after transport authenticity is established.

Artifacts:

- rate-limit keys for IP, session, user, and message class
- authenticated policy evaluation hook

Dependencies: Phase 11.

Acceptance criteria:

- authenticated buckets are independent from public REST buckets.

Targeted tests:

- per-dimension throttling;
- bucket isolation from public traffic.

## ~~Phase 13.~~ Internal Authenticated Command and Routing

Status: implemented.

Note: delivered together with Phase 14 signed unary responses.

Goal: forward only verified context to downstream services.

Artifacts:

- `AuthenticatedCommand`
- `DownstreamRouter`
- `DownstreamClient`

Dependencies: Phase 12.

Acceptance criteria:

- downstream services receive verified context only;
- raw transport details do not leak as authoritative input.

Targeted tests:

- route selection by `message_type`;
- downstream receives the expected authenticated context.

## ~~Phase 14.~~ Signed Unary Responses

Status: implemented as part of Phase 13 delivery.

Goal: return verifiable server responses to authenticated clients.

Artifacts:

- response envelope builder
- payload hash generation
- `ResponseSigner`

Dependencies: Phase 13.

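
A hedged sketch of what a `ResponseSigner` could look like. The plan does not name a hash function or a signature scheme, so SHA-256 and Ed25519 are assumptions here, as are the `Response` fields, the `response.v1` domain tag, the length-prefixed signing input, and the `NewDemoSigner` helper.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/binary"
)

// Response is a hypothetical unary response envelope.
type Response struct {
	RequestID   string // copied from the originating request
	PayloadHash []byte // SHA-256 of Payload (assumed hash function)
	Payload     []byte
	Signature   []byte // Ed25519 signature over signingInput (assumed scheme)
}

// signingInput builds canonical bytes. Every field is length-prefixed so
// that bytes cannot move between fields without changing the input.
func signingInput(requestID string, payloadHash []byte) []byte {
	var buf bytes.Buffer
	for _, field := range [][]byte{[]byte("response.v1"), []byte(requestID), payloadHash} {
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(field)))
		buf.Write(n[:])
		buf.Write(field)
	}
	return buf.Bytes()
}

// ResponseSigner signs response envelopes with the server key.
type ResponseSigner struct {
	key ed25519.PrivateKey
}

// Sign hashes the payload and signs the request ID together with that hash.
func (s *ResponseSigner) Sign(requestID string, payload []byte) Response {
	sum := sha256.Sum256(payload)
	return Response{
		RequestID:   requestID,
		PayloadHash: sum[:],
		Payload:     payload,
		Signature:   ed25519.Sign(s.key, signingInput(requestID, sum[:])),
	}
}

// VerifyResponse is the client-side check: recompute the payload hash first,
// then verify the signature over the canonical input.
func VerifyResponse(pub ed25519.PublicKey, r Response) bool {
	sum := sha256.Sum256(r.Payload)
	if !bytes.Equal(sum[:], r.PayloadHash) {
		return false
	}
	return ed25519.Verify(pub, signingInput(r.RequestID, r.PayloadHash), r.Signature)
}

// NewDemoSigner generates a throwaway key pair for examples and tests.
func NewDemoSigner() (*ResponseSigner, ed25519.PublicKey) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand.Reader
	if err != nil {
		panic(err)
	}
	return &ResponseSigner{key: priv}, pub
}
```

Signing the hash rather than the raw payload keeps the signed input small and lets a client reject a corrupted payload before doing any signature work.
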

Acceptance criteria:

- unary responses always carry the original `request_id`, `payload_hash`, and server signature.

Targeted tests:

- response correlation test;
- server signature generation test.

## ~~Phase 15.~~ Session Update and Revocation Events

Status: implemented.

Goal: keep gateway session state current without synchronous hot-path lookups.

Artifacts:

- `EventSubscriber`
- session update handlers
- session revoke handlers

Dependencies: Phase 8.

Acceptance criteria:

- session updates change gateway behavior without per-request sync calls to the auth service.

Targeted tests:

- cache update from event;
- revocation event invalidates cached session.

## ~~Phase 16.~~ Authenticated Push Stream

Status: implemented.

Goal: open a verified server-streaming channel for client-facing delivery.

Artifacts:

- `SubscribeEvents` handler
- stream binding to `user_id` and `device_session_id`
- initial server time event

Dependencies: Phase 15.

Acceptance criteria:

- the stream opens only after the full auth pipeline succeeds.

Targeted tests:

- authorized stream open;
- rejected stream open for invalid session;
- first event contains server time.

## ~~Phase 17.~~ Event Fan-Out

Status: implemented.

Goal: deliver client-facing events from internal pub/sub to active streams.

Artifacts:

- `PushHub`
- event fan-out logic
- user and session targeting rules

Dependencies: Phase 16.

Acceptance criteria:

- events are delivered to the correct active streams only.

Targeted tests:

- single-session delivery;
- multi-device delivery for one user;
- unrelated sessions do not receive the event.

## ~~Phase 18.~~ Revocation-Driven Stream Teardown

Status: implemented.

Goal: terminate active delivery channels when a session is revoked.

Artifacts:

- stream teardown on revoke
- connection cleanup logic

Dependencies: Phase 17.

Acceptance criteria:

- revocation blocks new unary requests and closes active streams for the same session.

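
One way the `PushHub` fan-out and the revocation teardown could fit together, as a minimal in-memory sketch. It assumes one active stream per `device_session_id`, a buffered channel per stream, and dropping events for slow consumers; the `Event` fields and the `Register`, `Publish`, and `Revoke` method names are hypothetical.

```go
package main

import "sync"

// Event is a hypothetical client-facing event.
type Event struct {
	UserID          string // target user
	DeviceSessionID string // optional; empty targets every session of the user
	Payload         []byte
}

type stream struct {
	userID string
	events chan Event
}

// PushHub tracks active streams keyed by device session ID.
type PushHub struct {
	mu      sync.Mutex
	streams map[string]*stream
}

func NewPushHub() *PushHub {
	return &PushHub{streams: make(map[string]*stream)}
}

// Register binds a stream to a user and session and returns its receive
// channel. An existing stream for the same session is closed and replaced.
func (h *PushHub) Register(userID, sessionID string, buffer int) <-chan Event {
	h.mu.Lock()
	defer h.mu.Unlock()
	if old, ok := h.streams[sessionID]; ok {
		close(old.events)
	}
	s := &stream{userID: userID, events: make(chan Event, buffer)}
	h.streams[sessionID] = s
	return s.events
}

// Publish delivers the event to matching streams and returns how many
// streams accepted it. A full buffer drops the event for that stream.
func (h *PushHub) Publish(ev Event) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for sessionID, s := range h.streams {
		if s.userID != ev.UserID {
			continue
		}
		if ev.DeviceSessionID != "" && ev.DeviceSessionID != sessionID {
			continue
		}
		select {
		case s.events <- ev:
			delivered++
		default: // slow consumer: drop instead of blocking the hub
		}
	}
	return delivered
}

// Revoke closes and removes the stream for a session. It reports whether
// an active stream was closed.
func (h *PushHub) Revoke(sessionID string) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	s, ok := h.streams[sessionID]
	if !ok {
		return false
	}
	close(s.events)
	delete(h.streams, sessionID)
	return true
}
```

Sends and closes both happen under the same mutex, so `Publish` can never send on a channel that `Revoke` has already closed. A stream handler would range over the returned channel and end the RPC when the channel closes.
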

Targeted tests:

- revoke closes active stream;
- revoked session cannot reopen the stream.

## ~~Phase 19.~~ Observability and Shutdown Hardening

Status: implemented.

Note: delivered with `zap` structured logging, OpenTelemetry tracing and metrics, the optional private admin `/metrics` listener, timeout budgets, and shutdown-driven push-stream teardown.

Goal: make the service operable in production.

Artifacts:

- structured logs
- metrics
- trace propagation
- timeout budgets
- graceful shutdown for unary and streaming traffic

Dependencies: Phase 18.

Acceptance criteria:

- shutdown is deterministic;
- logs and metrics expose stable edge outcomes without leaking secrets.

Targeted tests:

- shutdown closes listeners and active streams;
- secret and signature values are not logged.

## ~~Phase 20.~~ Acceptance Pass

Status: implemented.

Note: acceptance pass reconciled README/OpenAPI/root architecture documentation, fixed the documented public-auth projected-error contract, and added focused regression coverage including OpenAPI validation.

Goal: reconcile implementation, documentation, and regression coverage.

Artifacts:

- updated README and PLAN
- final protocol and interface review
- focused regression test run

Dependencies: Phases 1 through 19.

Acceptance criteria:

- implementation matches documented contracts and ordering guarantees;
- docs describe the actual gateway behavior.

Targeted tests:

- run focused package tests for gateway packages;
- rerun cross-cutting regression scenarios.

## Cross-Cutting Regression Scenarios

- `send_email_code` and `confirm_email_code` are available without session auth and are still limited by public auth policy.
- Public browser bootstrap and asset bursts do not increase auth abuse counters and are not rejected as hostile because of intensity alone.
- Any gRPC command without a valid session is rejected before routing.
- Unknown and revoked sessions are handled predictably and consistently where policy requires identical behavior.
- Signature verification fails when `payload_bytes`, `payload_hash`, `message_type`, `request_id`, or the signing key changes.
- `payload_hash` is verified before downstream execution.
- Requests outside the freshness window are rejected.
- Reused `request_id` values are rejected within the session replay window.
- Public REST and authenticated gRPC traffic use independent buckets and independent abuse telemetry.
- Downstream services receive `AuthenticatedCommand`, not raw REST or gRPC transport requests.
- Unary responses preserve `request_id` correlation and are server-signed.
- Streaming connections open only after the auth pipeline and close on revoke.
- Session cache updates from events change gateway behavior without synchronous auth-service lookups per request.
- Graceful shutdown terminates unary and streaming traffic cleanly.

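
The freshness and replay scenarios above can be illustrated with a small in-memory sketch. It assumes envelope timestamps are Unix milliseconds and that the freshness window is symmetric around the server clock; the reject reason strings and the `Fresh`, `Reserve`, and `CheckRequest` names are hypothetical, and a production `ReplayStore` would likely be shared across gateway instances.

```go
package main

import "sync"

type replayKey struct {
	sessionID string
	requestID string
}

// ReplayStore remembers (session, request_id) pairs until they expire.
type ReplayStore struct {
	mu      sync.Mutex
	expires map[replayKey]int64 // expiry as Unix milliseconds
}

func NewReplayStore() *ReplayStore {
	return &ReplayStore{expires: make(map[replayKey]int64)}
}

// Reserve records the pair until expiresAtMs. It returns false when the
// pair is already reserved and that reservation has not expired yet.
func (s *ReplayStore) Reserve(sessionID, requestID string, nowMs, expiresAtMs int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	key := replayKey{sessionID: sessionID, requestID: requestID}
	if exp, ok := s.expires[key]; ok && exp >= nowMs {
		return false
	}
	s.expires[key] = expiresAtMs
	return true
}

// Fresh reports whether the request timestamp lies within windowMs of now,
// in either direction, to tolerate modest client clock skew.
func Fresh(timestampMs, nowMs, windowMs int64) bool {
	delta := nowMs - timestampMs
	if delta < 0 {
		delta = -delta
	}
	return delta <= windowMs
}

// CheckRequest applies freshness first, then replay reservation, and
// returns a reject reason or an empty string when the request may proceed.
func CheckRequest(store *ReplayStore, sessionID, requestID string, timestampMs, nowMs, windowMs int64) string {
	if !Fresh(timestampMs, nowMs, windowMs) {
		return "stale_request"
	}
	// Keep the reservation for as long as this timestamp could still pass
	// the freshness check, so a replay is either stale or already reserved.
	if !store.Reserve(sessionID, requestID, nowMs, timestampMs+windowMs) {
		return "replayed_request"
	}
	return ""
}
```

Keying the store by session and request ID together means two sessions can reuse the same `request_id` without colliding, while a duplicate inside one session is rejected for as long as its timestamp remains fresh.
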