# Edge Gateway Implementation Plan

This plan has already been implemented and is kept here for historical reasons.

It should NOT be treated as a source of truth for service functionality.

---

## Summary

This plan breaks implementation into small, reviewable phases.
Each phase has a single primary goal, clear deliverables, explicit dependencies,
acceptance criteria, and focused tests.

The intended v1 architecture is:

- unauthenticated public ingress over REST/JSON;
- authenticated ingress over gRPC on HTTP/2;
- FlatBuffers payloads for authenticated business commands;
- protobuf-based gRPC control envelopes;
- authenticated server-streaming push through gRPC;
- separate public traffic classes and isolated anti-abuse counters.

## Assumptions and Defaults

- `message_type` is the stable downstream routing key.
- `protocol_version` covers transport and envelope compatibility, not business
  payload schema compatibility.
- FlatBuffers are used for business payload bytes only.
- Phase 3 public auth uses a challenge-token REST flow:
  `send-email-code(email) -> challenge_id` and
  `confirm-email-code(challenge_id, code, client_public_key) -> device_session_id`.
- Phase 3 uses a consumer-side `AuthServiceClient` inside `gateway`; the
  default process wiring keeps public auth routes mounted and returns
  `503 service_unavailable` until a concrete upstream adapter is added.
- Browser bootstrap and asset traffic are within gateway scope, even when backed
  by a pluggable proxy or handler.
- Long-polling is out of scope for v1.

## ~~Phase 1.~~ Module Skeleton

Status: implemented.

Goal: create the runnable gateway process skeleton.

Artifacts:

- `cmd/gateway`
- `internal/app`
- base configuration types
- startup and shutdown wiring

Dependencies: none.

Acceptance criteria:

- the process starts with config;
- the process shuts down cleanly on signal;
- lifecycle wiring is testable.

Targeted tests:

- startup with valid config;
- shutdown without leaked goroutines.

## ~~Phase 2.~~ Public REST Server

Status: implemented.

Goal: add the unauthenticated HTTP server shell.

Artifacts:

- public REST listener
- `GET /healthz`
- `GET /readyz`
- base error serialization
- request classification hook

Dependencies: Phase 1.

Acceptance criteria:

- health endpoints respond deterministically;
- public requests are classified at least into `public_auth` and `browser_*`.

Targeted tests:

- health endpoint responses;
- request classification smoke tests.

## ~~Phase 3.~~ Public Auth REST Handlers

Status: implemented.

Goal: expose unauthenticated auth commands through REST/JSON.

Artifacts:

- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`
- request and response DTOs
- adapter calls into `AuthServiceClient`

Dependencies: Phase 2.

Acceptance criteria:

- no session authentication is required for these routes;
- handlers delegate only through the auth service adapter.

Targeted tests:

- success and validation errors for both routes;
- no session lookup on public auth paths.

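The handler-to-adapter delegation, together with the default wiring that answers `503 service_unavailable`, could look roughly like the sketch below. It is illustrative only: the `AuthServiceClient` method set, the `ErrUnavailable` sentinel, the `unavailableClient` type, the `post` helper, and the JSON error shape are all assumptions, not the gateway's actual contract.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// AuthServiceClient is named in the plan; this method set is an assumption.
type AuthServiceClient interface {
	SendEmailCode(ctx context.Context, email string) (challengeID string, err error)
}

// ErrUnavailable is a hypothetical sentinel for "no upstream adapter wired".
var ErrUnavailable = errors.New("auth service unavailable")

// unavailableClient models the default wiring: routes stay mounted, calls fail.
type unavailableClient struct{}

func (unavailableClient) SendEmailCode(context.Context, string) (string, error) {
	return "", ErrUnavailable
}

type sendEmailCodeRequest struct {
	Email string `json:"email"`
}

func writeJSON(w http.ResponseWriter, status int, v any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(v)
}

// sendEmailCodeHandler validates the DTO and delegates only to the client.
// It performs no session lookup.
func sendEmailCodeHandler(c AuthServiceClient) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method_not_allowed"})
			return
		}
		var req sendEmailCodeRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.Email == "" {
			writeJSON(w, http.StatusBadRequest, map[string]string{"error": "invalid_request"})
			return
		}
		id, err := c.SendEmailCode(r.Context(), req.Email)
		switch {
		case errors.Is(err, ErrUnavailable):
			writeJSON(w, http.StatusServiceUnavailable, map[string]string{"error": "service_unavailable"})
		case err != nil:
			writeJSON(w, http.StatusInternalServerError, map[string]string{"error": "internal"})
		default:
			writeJSON(w, http.StatusOK, map[string]string{"challenge_id": id})
		}
	}
}

// post exercises the handler in-process and returns the HTTP status code.
func post(c AuthServiceClient, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/api/v1/public/auth/send-email-code", strings.NewReader(body))
	rec := httptest.NewRecorder()
	sendEmailCodeHandler(c).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(post(unavailableClient{}, `{"email":"a@example.com"}`)) // 503
	fmt.Println(post(unavailableClient{}, `{}`))                        // 400
}
```

Keeping the handler dependent on the interface alone means a concrete upstream adapter can replace `unavailableClient` without touching the route.
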
## ~~Phase 4.~~ Public Traffic Classification

Status: implemented.

Goal: isolate public traffic into stable anti-abuse classes.

Artifacts:

- `PublicTrafficClassifier`
- classes `public_auth`, `browser_bootstrap`, `browser_asset`, `public_misc`
- isolated rate-limit bucket keys

Dependencies: Phase 2.

Acceptance criteria:

- browser traffic does not share buckets with public auth;
- auth counters remain unaffected by asset bursts.

Targeted tests:

- per-class routing tests;
- bucket isolation tests.

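As a minimal sketch of how the four classes can keep their buckets apart, the example below embeds the class name in every rate-limit key. The browser path prefixes (`/`, `/index.html`, `/assets/`) and the function names are assumptions for illustration; only the class names and the public auth route prefix come from this plan.

```go
package main

import (
	"fmt"
	"strings"
)

// TrafficClass is one of the four public classes listed in the plan.
type TrafficClass string

const (
	ClassPublicAuth       TrafficClass = "public_auth"
	ClassBrowserBootstrap TrafficClass = "browser_bootstrap"
	ClassBrowserAsset     TrafficClass = "browser_asset"
	ClassPublicMisc       TrafficClass = "public_misc"
)

// Classify maps a request path to a class.
// The browser prefixes below are hypothetical examples.
func Classify(path string) TrafficClass {
	switch {
	case strings.HasPrefix(path, "/api/v1/public/auth/"):
		return ClassPublicAuth
	case path == "/" || path == "/index.html":
		return ClassBrowserBootstrap
	case strings.HasPrefix(path, "/assets/"):
		return ClassBrowserAsset
	default:
		return ClassPublicMisc
	}
}

// BucketKey embeds the class in the key, so two classes can never share a
// counter even for the same client IP.
func BucketKey(class TrafficClass, clientIP string) string {
	return string(class) + "|" + clientIP
}

func main() {
	ip := "203.0.113.7"
	for _, p := range []string{"/api/v1/public/auth/send-email-code", "/", "/assets/app.js", "/robots.txt"} {
		fmt.Println(p, "->", BucketKey(Classify(p), ip))
	}
}
```
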
## ~~Phase 5.~~ Public REST Anti-Abuse

Status: implemented.

Goal: add coarse protection to unauthenticated REST traffic.

Artifacts:

- body size limits
- method allow-lists
- malformed request counters
- per-class rate-limit thresholds

Dependencies: Phase 4.

Acceptance criteria:

- first-load browser bursts are not marked hostile because of burst pattern
  alone;
- malformed or oversized requests are rejected predictably.

Targeted tests:

- bootstrap burst stays outside auth abuse counters;
- invalid methods and oversized bodies are rejected.

## ~~Phase 6.~~ gRPC Server and Public Contracts

Status: implemented.

Goal: bring up authenticated transport over gRPC and HTTP/2.

Artifacts:

- gRPC listener
- protobuf service definitions
- `ExecuteCommand`
- `SubscribeEvents`

Dependencies: Phase 1.

Acceptance criteria:

- unary and server-streaming RPCs are reachable;
- the server runs only over HTTP/2.

Targeted tests:

- unary transport smoke test;
- stream transport smoke test.

## ~~Phase 7.~~ Envelope Parsing and Protocol Gate

Status: implemented.

Goal: validate the gRPC control envelope before security checks continue.

Artifacts:

- envelope parser
- required-field validation
- protocol version gate

Dependencies: Phase 6.

Acceptance criteria:

- unsupported or malformed envelopes are rejected before routing.

Targeted tests:

- missing field rejection;
- unsupported `protocol_version` rejection.

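The required-field validation and version gate can be sketched as a single function that runs before any session or signature work. The `Envelope` struct here is a plain-Go stand-in for the protobuf message; its field set is inferred from names used elsewhere in this plan, and the `"v1"` version string and the `validEnvelope` helper are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// Envelope is a stand-in for the protobuf control envelope (assumed fields).
type Envelope struct {
	ProtocolVersion string
	DeviceSessionID string
	MessageType     string
	RequestID       string
	TimestampMS     int64
	PayloadBytes    []byte
	PayloadHash     []byte
	Signature       []byte
}

var (
	ErrMissingField       = errors.New("missing required field")
	ErrUnsupportedVersion = errors.New("unsupported protocol_version")
)

// supportedVersions is a hypothetical allow-list; "v1" is a placeholder.
var supportedVersions = map[string]bool{"v1": true}

// ValidateEnvelope rejects malformed or unsupported envelopes before routing.
func ValidateEnvelope(e Envelope) error {
	switch {
	case e.ProtocolVersion == "":
		return fmt.Errorf("%w: protocol_version", ErrMissingField)
	case e.DeviceSessionID == "":
		return fmt.Errorf("%w: device_session_id", ErrMissingField)
	case e.MessageType == "":
		return fmt.Errorf("%w: message_type", ErrMissingField)
	case e.RequestID == "":
		return fmt.Errorf("%w: request_id", ErrMissingField)
	case e.TimestampMS <= 0:
		return fmt.Errorf("%w: timestamp", ErrMissingField)
	case len(e.PayloadHash) == 0:
		return fmt.Errorf("%w: payload_hash", ErrMissingField)
	case len(e.Signature) == 0:
		return fmt.Errorf("%w: signature", ErrMissingField)
	}
	if !supportedVersions[e.ProtocolVersion] {
		return fmt.Errorf("%w: %q", ErrUnsupportedVersion, e.ProtocolVersion)
	}
	return nil
}

// validEnvelope returns a well-formed example envelope.
func validEnvelope() Envelope {
	return Envelope{
		ProtocolVersion: "v1",
		DeviceSessionID: "sess-1",
		MessageType:     "example.command",
		RequestID:       "req-1",
		TimestampMS:     1700000000000,
		PayloadHash:     []byte{0x01},
		Signature:       []byte{0x02},
	}
}

func main() {
	fmt.Println(ValidateEnvelope(validEnvelope()))

	bad := validEnvelope()
	bad.ProtocolVersion = "v9"
	fmt.Println(ValidateEnvelope(bad))
}
```
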
## ~~Phase 8.~~ Session Cache Lookup

Status: implemented.

Goal: resolve authenticated identity from cache.

Artifacts:

- `SessionCache`
- session lookup pipeline
- revoked versus active session handling

Dependencies: Phase 7.

Acceptance criteria:

- unknown and revoked sessions are blocked before signature verification.

Targeted tests:

- cache hit with active session;
- cache miss reject;
- revoked session reject.

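A rough in-memory illustration of the lookup step follows. `SessionCache` is named in the plan, but the `Session` fields, the status values, and the error names here are assumptions; the real cache may be backed by a different store.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type SessionStatus int

const (
	StatusActive SessionStatus = iota
	StatusRevoked
)

// Session holds what later pipeline stages need (assumed fields).
type Session struct {
	DeviceSessionID string
	UserID          string
	ClientPublicKey []byte
	Status          SessionStatus
}

var (
	ErrUnknownSession = errors.New("unknown session")
	ErrRevokedSession = errors.New("revoked session")
)

// SessionCache is a concurrency-safe in-memory map keyed by device session ID.
type SessionCache struct {
	mu       sync.RWMutex
	sessions map[string]Session
}

func NewSessionCache() *SessionCache {
	return &SessionCache{sessions: make(map[string]Session)}
}

// Put inserts or overwrites a session record.
func (c *SessionCache) Put(s Session) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.sessions[s.DeviceSessionID] = s
}

// Lookup returns the session only if it is present and active, so unknown and
// revoked sessions never reach signature verification.
func (c *SessionCache) Lookup(id string) (Session, error) {
	c.mu.RLock()
	s, ok := c.sessions[id]
	c.mu.RUnlock()
	switch {
	case !ok:
		return Session{}, ErrUnknownSession
	case s.Status == StatusRevoked:
		return Session{}, ErrRevokedSession
	default:
		return s, nil
	}
}

func main() {
	cache := NewSessionCache()
	cache.Put(Session{DeviceSessionID: "s1", UserID: "u1", Status: StatusActive})
	cache.Put(Session{DeviceSessionID: "s2", UserID: "u1", Status: StatusRevoked})

	for _, id := range []string{"s1", "s2", "s3"} {
		_, err := cache.Lookup(id)
		fmt.Println(id, err)
	}
}
```
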
## ~~Phase 9.~~ Payload Hash and Signing Input

Status: implemented.

Goal: verify payload integrity before signature verification.

Artifacts:

- `payload_hash` verification
- canonical signing input builder

Dependencies: Phase 8.

Acceptance criteria:

- changing payload bytes or envelope fields breaks the signing input.

Targeted tests:

- payload hash mismatch reject;
- canonical bytes differ when signed fields change.

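The sketch below shows one way a payload hash check and a canonical signing input could fit together. The plan does not state the hash algorithm or the byte layout, so SHA-256 and the length-prefixed field encoding are assumptions, as are the `SignedFields` struct and its field order.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/binary"
	"fmt"
)

// SignedFields lists the envelope fields covered by the signature (assumed set).
type SignedFields struct {
	ProtocolVersion string
	DeviceSessionID string
	MessageType     string
	RequestID       string
	TimestampMS     int64
	PayloadHash     []byte
}

// HashPayload computes the payload hash; SHA-256 is an assumption.
func HashPayload(payload []byte) []byte {
	sum := sha256.Sum256(payload)
	return sum[:]
}

// VerifyPayloadHash compares the claimed hash with the recomputed one in
// constant time.
func VerifyPayloadHash(payload, claimed []byte) bool {
	return subtle.ConstantTimeCompare(HashPayload(payload), claimed) == 1
}

// SigningInput serializes the signed fields in a fixed order. Each field is
// prefixed with its 4-byte big-endian length so that field boundaries are
// unambiguous.
func SigningInput(f SignedFields) []byte {
	var buf bytes.Buffer
	writeField := func(b []byte) {
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(b)))
		buf.Write(n[:])
		buf.Write(b)
	}
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], uint64(f.TimestampMS))

	writeField([]byte(f.ProtocolVersion))
	writeField([]byte(f.DeviceSessionID))
	writeField([]byte(f.MessageType))
	writeField([]byte(f.RequestID))
	writeField(ts[:])
	writeField(f.PayloadHash)
	return buf.Bytes()
}

func main() {
	payload := []byte("business payload bytes")
	f := SignedFields{
		ProtocolVersion: "v1",
		DeviceSessionID: "sess-1",
		MessageType:     "example.command",
		RequestID:       "req-1",
		TimestampMS:     1700000000000,
		PayloadHash:     HashPayload(payload),
	}
	a := SigningInput(f)

	f.RequestID = "req-2"
	b := SigningInput(f)

	fmt.Println("hash ok:", VerifyPayloadHash(payload, f.PayloadHash))
	fmt.Println("input changes with request_id:", !bytes.Equal(a, b))
}
```

The length prefixes matter: plain concatenation would let `("ab", "c")` and `("a", "bc")` produce identical signing bytes.
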
## ~~Phase 10.~~ Client Signature Verification

Status: implemented.

Goal: authenticate the request origin using the session public key.

Artifacts:

- signature verifier
- deterministic auth reject mapping

Dependencies: Phase 9.

Acceptance criteria:

- wrong key and invalid signature produce stable rejects.

Targeted tests:

- success case with valid signature;
- bad signature reject;
- wrong-key reject.

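To illustrate the "stable reject" idea, the sketch below maps every verification failure to one error value. The plan does not name the signature scheme, so Ed25519 is an assumption here; `VerifyClientSignature`, `ErrInvalidSignature`, and the `demo` helper are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"errors"
	"fmt"
)

// ErrInvalidSignature is the single reject value for every failure mode, so
// callers cannot distinguish a wrong key from a corrupted signature.
var ErrInvalidSignature = errors.New("invalid_signature")

// VerifyClientSignature checks sig over input with the session public key.
// The length check comes first because ed25519.Verify panics on keys of the
// wrong size.
func VerifyClientSignature(pub ed25519.PublicKey, input, sig []byte) error {
	if len(pub) != ed25519.PublicKeySize || !ed25519.Verify(pub, input, sig) {
		return ErrInvalidSignature
	}
	return nil
}

// demo runs the three targeted cases: valid, tampered signature, wrong key.
func demo() (valid, badSig, wrongKey error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	otherPub, _, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	input := []byte("canonical signing input")
	sig := ed25519.Sign(priv, input)

	tampered := append([]byte(nil), sig...)
	tampered[0] ^= 0xFF

	return VerifyClientSignature(pub, input, sig),
		VerifyClientSignature(pub, input, tampered),
		VerifyClientSignature(otherPub, input, sig)
}

func main() {
	valid, badSig, wrongKey := demo()
	fmt.Println("valid:", valid)
	fmt.Println("bad signature:", badSig)
	fmt.Println("wrong key:", wrongKey)
}
```
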
## ~~Phase 11.~~ Freshness and Anti-Replay

Status: implemented.

Goal: enforce transport freshness and replay protection.

Artifacts:

- timestamp freshness window
- `ReplayStore`
- replay reservation and rejection logic

Dependencies: Phase 10.

Acceptance criteria:

- stale requests and duplicate `request_id` values are rejected.

Targeted tests:

- stale timestamp reject;
- replay reject for same session and request ID;
- distinct sessions do not collide.

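A minimal in-memory sketch of the freshness check and replay reservation is shown below. `ReplayStore` is named in the plan, but the millisecond timestamps, the window and TTL values, and the in-process map are assumptions; a deployment with more than one gateway instance would presumably need a shared store.

```go
package main

import (
	"fmt"
	"sync"
)

// Fresh reports whether a request timestamp lies within +/- windowMS of now.
// All values are Unix milliseconds.
func Fresh(timestampMS, nowMS, windowMS int64) bool {
	d := nowMS - timestampMS
	if d < 0 {
		d = -d
	}
	return d <= windowMS
}

// ReplayStore remembers (session, request_id) pairs until they expire.
type ReplayStore struct {
	mu    sync.Mutex
	ttlMS int64
	seen  map[string]int64 // key -> expiry in Unix milliseconds
}

func NewReplayStore(ttlMS int64) *ReplayStore {
	return &ReplayStore{ttlMS: ttlMS, seen: make(map[string]int64)}
}

// Reserve returns true the first time a pair is seen inside the replay window
// and false for duplicates. The session ID is part of the key, so identical
// request IDs from different sessions do not collide.
func (s *ReplayStore) Reserve(sessionID, requestID string, nowMS int64) bool {
	key := sessionID + "\x00" + requestID
	s.mu.Lock()
	defer s.mu.Unlock()
	if expiry, ok := s.seen[key]; ok && nowMS < expiry {
		return false
	}
	s.seen[key] = nowMS + s.ttlMS
	return true
}

func main() {
	const now = int64(1700000000000)
	store := NewReplayStore(60000)

	fmt.Println("fresh:", Fresh(now-1000, now, 30000))
	fmt.Println("stale:", Fresh(now-120000, now, 30000))
	fmt.Println("first:", store.Reserve("s1", "req-1", now))
	fmt.Println("replay:", store.Reserve("s1", "req-1", now+10))
	fmt.Println("other session:", store.Reserve("s2", "req-1", now+10))
}
```

This sketch never evicts expired entries; a real store would need periodic cleanup or native key expiry.
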
## ~~Phase 12.~~ Authenticated Rate Limits and Policy

Status: implemented.

Goal: apply edge policy after transport authenticity is established.

Artifacts:

- rate-limit keys for IP, session, user, and message class
- authenticated policy evaluation hook

Dependencies: Phase 11.

Acceptance criteria:

- authenticated buckets are independent from public REST buckets.

Targeted tests:

- per-dimension throttling;
- bucket isolation from public traffic.

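One possible shape for the four key dimensions is sketched below with a deliberately simple counter. The `authn|` key prefix, the `Limiter` type, and the omission of window resets are all simplifying assumptions; the sketch only demonstrates per-dimension throttling and key-space separation from public buckets.

```go
package main

import (
	"fmt"
	"sync"
)

// AuthenticatedKeys builds one key per dimension. The "authn|" prefix
// (hypothetical) keeps these keys disjoint from public REST bucket keys.
func AuthenticatedKeys(ip, sessionID, userID, messageClass string) []string {
	return []string{
		"authn|ip|" + ip,
		"authn|session|" + sessionID,
		"authn|user|" + userID,
		"authn|class|" + messageClass,
	}
}

// Limiter is a single-window counter; window reset is omitted for brevity.
type Limiter struct {
	mu     sync.Mutex
	limit  int
	counts map[string]int
}

func NewLimiter(limit int) *Limiter {
	return &Limiter{limit: limit, counts: make(map[string]int)}
}

// AllowAll admits a request only if every key still has budget, and charges
// all keys together. Nothing is charged when the request is rejected.
func (l *Limiter) AllowAll(keys []string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	for _, k := range keys {
		if l.counts[k] >= l.limit {
			return false
		}
	}
	for _, k := range keys {
		l.counts[k]++
	}
	return true
}

func main() {
	lim := NewLimiter(2)
	keys := AuthenticatedKeys("203.0.113.7", "s1", "u1", "example")
	for i := 0; i < 3; i++ {
		fmt.Println("attempt", i+1, lim.AllowAll(keys))
	}
	// A public bucket for the same IP is a different key and is unaffected.
	fmt.Println("public bucket:", lim.AllowAll([]string{"public_auth|203.0.113.7"}))
}
```
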
## ~~Phase 13.~~ Internal Authenticated Command and Routing

Status: implemented.
Note: delivered together with Phase 14 signed unary responses.

Goal: forward only verified context to downstream services.

Artifacts:

- `AuthenticatedCommand`
- `DownstreamRouter`
- `DownstreamClient`

Dependencies: Phase 12.

Acceptance criteria:

- downstream services receive verified context only;
- raw transport details do not leak as authoritative input.

Targeted tests:

- route selection by `message_type`;
- downstream receives the expected authenticated context.

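Routing by `message_type` can be sketched as a lookup from message type to client. The three type names come from the plan, while the struct fields, the `Execute` and `Dispatch` method signatures, the `echoClient` stand-in, and the message type strings are illustrative assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// AuthenticatedCommand carries only verified context plus the opaque payload
// (assumed fields). No raw transport request is included.
type AuthenticatedCommand struct {
	UserID          string
	DeviceSessionID string
	MessageType     string
	RequestID       string
	Payload         []byte
}

// DownstreamClient is the per-service adapter (assumed method set).
type DownstreamClient interface {
	Execute(ctx context.Context, cmd AuthenticatedCommand) ([]byte, error)
}

var ErrNoRoute = errors.New("no route for message_type")

// DownstreamRouter selects a client by message_type.
type DownstreamRouter struct {
	routes map[string]DownstreamClient
}

// Dispatch forwards the command to the client registered for its type.
func (r DownstreamRouter) Dispatch(ctx context.Context, cmd AuthenticatedCommand) ([]byte, error) {
	client, ok := r.routes[cmd.MessageType]
	if !ok {
		return nil, ErrNoRoute
	}
	return client.Execute(ctx, cmd)
}

// echoClient is a stand-in downstream that reports what it received.
type echoClient struct{ name string }

func (c echoClient) Execute(_ context.Context, cmd AuthenticatedCommand) ([]byte, error) {
	return []byte(c.name + ":" + cmd.UserID), nil
}

// dispatch is a demo helper with hypothetical message types.
func dispatch(messageType string) (string, error) {
	router := DownstreamRouter{routes: map[string]DownstreamClient{
		"profile.get": echoClient{name: "profile"},
		"chat.send":   echoClient{name: "chat"},
	}}
	out, err := router.Dispatch(context.Background(), AuthenticatedCommand{
		UserID:      "u1",
		MessageType: messageType,
		RequestID:   "req-1",
	})
	return string(out), err
}

func main() {
	fmt.Println(dispatch("chat.send"))
	fmt.Println(dispatch("unknown.type"))
}
```
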
## ~~Phase 14.~~ Signed Unary Responses

Status: implemented as part of Phase 13 delivery.

Goal: return verifiable server responses to authenticated clients.

Artifacts:

- response envelope builder
- payload hash generation
- `ResponseSigner`

Dependencies: Phase 13.

Acceptance criteria:

- unary responses always carry the original `request_id`, `payload_hash`, and
  server signature.

Targeted tests:

- response correlation test;
- server signature generation test.

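The response side can mirror the request side: hash the payload, bind the hash to the original `request_id`, and sign the result. In this sketch the Ed25519 server key, SHA-256, the length-prefixed signing layout, and the `Sign`, `VerifyResponse`, and `demo` names are assumptions; only `ResponseSigner` is named in the plan.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// SignedResponse is a stand-in for the response envelope (assumed fields).
type SignedResponse struct {
	RequestID   string
	Payload     []byte
	PayloadHash []byte
	Signature   []byte
}

// ResponseSigner holds the server signing key.
type ResponseSigner struct{ priv ed25519.PrivateKey }

// appendField appends a 4-byte big-endian length followed by the field bytes.
func appendField(dst, field []byte) []byte {
	var n [4]byte
	binary.BigEndian.PutUint32(n[:], uint32(len(field)))
	dst = append(dst, n[:]...)
	return append(dst, field...)
}

// responseSigningInput binds the request ID to the payload hash.
func responseSigningInput(requestID string, payloadHash []byte) []byte {
	out := appendField(nil, []byte(requestID))
	return appendField(out, payloadHash)
}

// Sign builds a response carrying request_id, payload_hash, and signature.
func (s ResponseSigner) Sign(requestID string, payload []byte) SignedResponse {
	sum := sha256.Sum256(payload)
	return SignedResponse{
		RequestID:   requestID,
		Payload:     payload,
		PayloadHash: sum[:],
		Signature:   ed25519.Sign(s.priv, responseSigningInput(requestID, sum[:])),
	}
}

// VerifyResponse is the client-side check: hash first, then signature.
func VerifyResponse(pub ed25519.PublicKey, r SignedResponse) bool {
	sum := sha256.Sum256(r.Payload)
	if len(pub) != ed25519.PublicKeySize || !bytes.Equal(sum[:], r.PayloadHash) {
		return false
	}
	return ed25519.Verify(pub, responseSigningInput(r.RequestID, r.PayloadHash), r.Signature)
}

// demo signs one response, then checks it before and after re-labelling it
// with a different request ID.
func demo() (original, relabelled bool) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	resp := ResponseSigner{priv: priv}.Sign("req-1", []byte("result bytes"))
	original = VerifyResponse(pub, resp)
	resp.RequestID = "req-2"
	relabelled = VerifyResponse(pub, resp)
	return original, relabelled
}

func main() {
	original, relabelled := demo()
	fmt.Println("original verifies:", original)
	fmt.Println("relabelled verifies:", relabelled)
}
```
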
## ~~Phase 15.~~ Session Update and Revocation Events

Status: implemented.

Goal: keep gateway session state current without synchronous hot-path lookups.

Artifacts:

- `EventSubscriber`
- session update handlers
- session revoke handlers

Dependencies: Phase 8.

Acceptance criteria:

- session updates change gateway behavior without per-request sync calls to the
  auth service.

Targeted tests:

- cache update from event;
- revocation event invalidates cached session.

## ~~Phase 16.~~ Authenticated Push Stream

Status: implemented.

Goal: open a verified server-streaming channel for client-facing delivery.

Artifacts:

- `SubscribeEvents` handler
- stream binding to `user_id` and `device_session_id`
- initial server time event

Dependencies: Phase 15.

Acceptance criteria:

- the stream opens only after the full auth pipeline succeeds.

Targeted tests:

- authorized stream open;
- rejected stream open for invalid session;
- first event contains server time.

## ~~Phase 17.~~ Event Fan-Out

Status: implemented.

Goal: deliver client-facing events from internal pub/sub to active streams.

Artifacts:

- `PushHub`
- event fan-out logic
- user and session targeting rules

Dependencies: Phase 16.

Acceptance criteria:

- events are delivered to the correct active streams only.

Targeted tests:

- single-session delivery;
- multi-device delivery for one user;
- unrelated sessions do not receive the event.

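The targeting rules can be sketched with a hub that keeps one buffered channel per device session. `PushHub` is named in the plan; the `Event` shape, the "empty `DeviceSessionID` means all of the user's sessions" rule, and the drop-on-full-buffer policy are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Event targets a user, and optionally one device session of that user.
type Event struct {
	UserID          string
	DeviceSessionID string // empty: deliver to every session of UserID
	Payload         []byte
}

type stream struct {
	userID string
	ch     chan Event
}

// PushHub tracks active streams keyed by device session ID.
type PushHub struct {
	mu      sync.RWMutex
	streams map[string]stream
}

func NewPushHub() *PushHub {
	return &PushHub{streams: make(map[string]stream)}
}

// Register binds a stream to a user and session and returns its channel.
// An existing stream for the same session is closed and replaced.
func (h *PushHub) Register(userID, sessionID string, buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	h.mu.Lock()
	defer h.mu.Unlock()
	if old, ok := h.streams[sessionID]; ok {
		close(old.ch)
	}
	h.streams[sessionID] = stream{userID: userID, ch: ch}
	return ch
}

// Unregister closes and removes the stream, for example on revoke or shutdown.
func (h *PushHub) Unregister(sessionID string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if s, ok := h.streams[sessionID]; ok {
		close(s.ch)
		delete(h.streams, sessionID)
	}
}

// Publish delivers the event to matching streams and returns how many
// accepted it. Sends never block: a full buffer drops the event for that
// stream. Channels are only closed under the write lock, so sending under the
// read lock cannot hit a closed channel.
func (h *PushHub) Publish(e Event) int {
	h.mu.RLock()
	defer h.mu.RUnlock()
	delivered := 0
	for sessionID, s := range h.streams {
		if s.userID != e.UserID {
			continue
		}
		if e.DeviceSessionID != "" && e.DeviceSessionID != sessionID {
			continue
		}
		select {
		case s.ch <- e:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	hub := NewPushHub()
	phone := hub.Register("u1", "s1", 4)
	laptop := hub.Register("u1", "s2", 4)
	other := hub.Register("u2", "s3", 4)

	fmt.Println("delivered:", hub.Publish(Event{UserID: "u1", Payload: []byte("hi")}))
	fmt.Println("queued:", len(phone), len(laptop), len(other))
}
```
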
## ~~Phase 18.~~ Revocation-Driven Stream Teardown

Status: implemented.

Goal: terminate active delivery channels when a session is revoked.

Artifacts:

- stream teardown on revoke
- connection cleanup logic

Dependencies: Phase 17.

Acceptance criteria:

- revocation blocks new unary requests and closes active streams for the same
  session.

Targeted tests:

- revoke closes active stream;
- revoked session cannot reopen the stream.

## ~~Phase 19.~~ Observability and Shutdown Hardening

Status: implemented.
Note: delivered with `zap` structured logging, OpenTelemetry tracing and
metrics, the optional private admin `/metrics` listener, timeout budgets, and
shutdown-driven push-stream teardown.

Goal: make the service operable in production.

Artifacts:

- structured logs
- metrics
- trace propagation
- timeout budgets
- graceful shutdown for unary and streaming traffic

Dependencies: Phase 18.

Acceptance criteria:

- shutdown is deterministic;
- logs and metrics expose stable edge outcomes without leaking secrets.

Targeted tests:

- shutdown closes listeners and active streams;
- secret and signature values are not logged.

## ~~Phase 20.~~ Acceptance Pass

Status: implemented.
Note: acceptance pass reconciled README/OpenAPI/root architecture
documentation, fixed the documented public-auth projected-error contract, and
added focused regression coverage including OpenAPI validation.

Goal: reconcile implementation, documentation, and regression coverage.

Artifacts:

- updated README and PLAN
- final protocol and interface review
- focused regression test run

Dependencies: Phases 1 through 19.

Acceptance criteria:

- implementation matches documented contracts and ordering guarantees;
- docs describe the actual gateway behavior.

Targeted tests:

- run focused package tests for gateway packages;
- rerun cross-cutting regression scenarios.

## Cross-Cutting Regression Scenarios

- `send_email_code` and `confirm_email_code` are available without session auth
  and are still limited by public auth policy.
- Public browser bootstrap and asset bursts do not increase auth abuse counters
  and are not rejected as hostile because of intensity alone.
- Any gRPC command without a valid session is rejected before routing.
- Unknown and revoked sessions are handled predictably and consistently where
  policy requires identical behavior.
- Signature verification fails when `payload_bytes`, `payload_hash`,
  `message_type`, `request_id`, or the signing key changes.
- `payload_hash` is verified before downstream execution.
- Requests outside the freshness window are rejected.
- Reused `request_id` values are rejected within the session replay window.
- Public REST and authenticated gRPC traffic use independent buckets and
  independent abuse telemetry.
- Downstream services receive `AuthenticatedCommand`, not raw REST or gRPC
  transport requests.
- Unary responses preserve `request_id` correlation and are server-signed.
- Streaming connections open only after the auth pipeline and close on revoke.
- Session cache updates from events change gateway behavior without synchronous
  auth-service lookups per request.
- Graceful shutdown terminates unary and streaming traffic cleanly.