gateway readme and plan

Ilia Denisov
2026-03-31 19:56:56 +02:00
parent f616e3f5ca
commit 8cde99936c
2 changed files with 902 additions and 5 deletions
+489 -5
@@ -1,9 +1,493 @@
# Edge Gateway Implementation Plan
## Summary
This plan breaks implementation into small, reviewable phases.
Each phase has a single primary goal, clear deliverables, explicit dependencies,
acceptance criteria, and focused tests.
The intended v1 architecture is:
- unauthenticated public ingress over REST/JSON;
- authenticated ingress over gRPC on HTTP/2;
- FlatBuffers payloads for authenticated business commands;
- protobuf-based gRPC control envelopes;
- authenticated server-streaming push through gRPC;
- separate public traffic classes and isolated anti-abuse counters.
## Assumptions and Defaults
- `message_type` is the stable downstream routing key.
- `protocol_version` covers transport and envelope compatibility, not business
payload schema compatibility.
- FlatBuffers are used for business payload bytes only.
- Browser bootstrap and asset traffic are within gateway scope, even when backed
by a pluggable proxy or handler.
- Long-polling is out of scope for v1.
## Phase 1. Module Skeleton
Goal: create the runnable gateway process skeleton.
Artifacts:
- `cmd/gateway`
- `internal/app`
- base configuration types
- startup and shutdown wiring
Dependencies: none.
Acceptance criteria:
- the process starts with config;
- the process shuts down cleanly on signal;
- lifecycle wiring is testable.
Targeted tests:
- startup with valid config;
- shutdown without leaked goroutines.
## Phase 2. Public REST Server
Goal: add the unauthenticated HTTP server shell.
Artifacts:
- public REST listener
- `GET /healthz`
- `GET /readyz`
- base error serialization
- request classification hook
Dependencies: Phase 1.
Acceptance criteria:
- health endpoints respond deterministically;
- public requests are classified into at least `public_auth` and `browser_*`.
Targeted tests:
- health endpoint responses;
- request classification smoke tests.
## Phase 3. Public Auth REST Handlers
Goal: expose unauthenticated auth commands through REST/JSON.
Artifacts:
- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`
- request and response DTOs
- adapter calls into `AuthServiceClient`
Dependencies: Phase 2.
Acceptance criteria:
- no session authentication is required for these routes;
- handlers delegate only through the auth service adapter.
Targeted tests:
- success and validation errors for both routes;
- no session lookup on public auth paths.
## Phase 4. Public Traffic Classification
Goal: isolate public traffic into stable anti-abuse classes.
Artifacts:
- `PublicTrafficClassifier`
- classes `public_auth`, `browser_bootstrap`, `browser_asset`, `public_misc`
- isolated rate-limit bucket keys
Dependencies: Phase 2.
Acceptance criteria:
- browser traffic does not share buckets with public auth;
- auth counters remain unaffected by asset bursts.
Targeted tests:
- per-class routing tests;
- bucket isolation tests.
## Phase 5. Public REST Anti-Abuse
Goal: add coarse protection to unauthenticated REST traffic.
Artifacts:
- body size limits
- method allow-lists
- malformed request counters
- per-class rate-limit thresholds
Dependencies: Phase 4.
Acceptance criteria:
- first-load browser bursts are not marked hostile because of burst pattern
alone;
- malformed or oversized requests are rejected predictably.
Targeted tests:
- bootstrap burst stays outside auth abuse counters;
- invalid methods and oversized bodies are rejected.
## Phase 6. gRPC Server and Public Contracts
Goal: bring up authenticated transport over gRPC and HTTP/2.
Artifacts:
- gRPC listener
- protobuf service definitions
- `ExecuteCommand`
- `SubscribeEvents`
Dependencies: Phase 1.
Acceptance criteria:
- unary and server-streaming RPCs are reachable;
- the server runs only over HTTP/2.
Targeted tests:
- unary transport smoke test;
- stream transport smoke test.
## Phase 7. Envelope Parsing and Protocol Gate
Goal: validate the gRPC control envelope before any security checks run.
Artifacts:
- envelope parser
- required-field validation
- protocol version gate
Dependencies: Phase 6.
Acceptance criteria:
- unsupported or malformed envelopes are rejected before routing.
Targeted tests:
- missing field rejection;
- unsupported `protocol_version` rejection.
## Phase 8. Session Cache Lookup
Goal: resolve authenticated identity from cache.
Artifacts:
- `SessionCache`
- session lookup pipeline
- revoked versus active session handling
Dependencies: Phase 7.
Acceptance criteria:
- unknown and revoked sessions are blocked before signature verification.
Targeted tests:
- cache hit with active session;
- cache miss reject;
- revoked session reject.
## Phase 9. Payload Hash and Signing Input
Goal: verify payload integrity before signature verification.
Artifacts:
- `payload_hash` verification
- canonical signing input builder
Dependencies: Phase 8.
Acceptance criteria:
- changing payload bytes or any signed envelope field changes the signing input
and therefore invalidates the signature.
Targeted tests:
- payload hash mismatch reject;
- canonical bytes differ when signed fields change.
## Phase 10. Client Signature Verification
Goal: authenticate the request origin using the session public key.
Artifacts:
- signature verifier
- deterministic auth reject mapping
Dependencies: Phase 9.
Acceptance criteria:
- wrong key and invalid signature produce stable rejects.
Targeted tests:
- success case with valid signature;
- bad signature reject;
- wrong-key reject.
## Phase 11. Freshness and Anti-Replay
Goal: enforce transport freshness and replay protection.
Artifacts:
- timestamp freshness window
- `ReplayStore`
- replay reservation and rejection logic
Dependencies: Phase 10.
Acceptance criteria:
- stale requests and duplicate `request_id` values are rejected.
Targeted tests:
- stale timestamp reject;
- replay reject for same session and request ID;
- distinct sessions do not collide.
## Phase 12. Authenticated Rate Limits and Policy
Goal: apply edge policy after transport authenticity is established.
Artifacts:
- rate-limit keys for IP, session, user, and message class
- authenticated policy evaluation hook
Dependencies: Phase 11.
Acceptance criteria:
- authenticated buckets are independent from public REST buckets.
Targeted tests:
- per-dimension throttling;
- bucket isolation from public traffic.
## Phase 13. Internal Authenticated Command and Routing
Goal: forward only verified context to downstream services.
Artifacts:
- `AuthenticatedCommand`
- `DownstreamRouter`
- `DownstreamClient`
Dependencies: Phase 12.
Acceptance criteria:
- downstream services receive verified context only;
- raw transport details do not leak as authoritative input.
Targeted tests:
- route selection by `message_type`;
- downstream receives the expected authenticated context.
## Phase 14. Signed Unary Responses
Goal: return verifiable server responses to authenticated clients.
Artifacts:
- response envelope builder
- payload hash generation
- `ResponseSigner`
Dependencies: Phase 13.
Acceptance criteria:
- unary responses always carry the original `request_id`, a `payload_hash` of
the response payload, and a server signature.
Targeted tests:
- response correlation test;
- server signature generation test.
## Phase 15. Session Update and Revocation Events
Goal: keep gateway session state current without synchronous hot-path lookups.
Artifacts:
- `EventSubscriber`
- session update handlers
- session revoke handlers
Dependencies: Phase 8.
Acceptance criteria:
- session updates change gateway behavior without per-request sync calls to the
auth service.
Targeted tests:
- cache update from event;
- revocation event invalidates cached session.
## Phase 16. Authenticated Push Stream
Goal: open a verified server-streaming channel for client-facing delivery.
Artifacts:
- `SubscribeEvents` handler
- stream binding to `user_id` and `device_session_id`
- initial server time event
Dependencies: Phase 15.
Acceptance criteria:
- the stream opens only after the full auth pipeline succeeds.
Targeted tests:
- authorized stream open;
- rejected stream open for invalid session;
- first event contains server time.
## Phase 17. Event Fan-Out
Goal: deliver client-facing events from internal pub/sub to active streams.
Artifacts:
- `PushHub`
- event fan-out logic
- user and session targeting rules
Dependencies: Phase 16.
Acceptance criteria:
- events are delivered to the correct active streams only.
Targeted tests:
- single-session delivery;
- multi-device delivery for one user;
- unrelated sessions do not receive the event.
## Phase 18. Revocation-Driven Stream Teardown
Goal: terminate active delivery channels when a session is revoked.
Artifacts:
- stream teardown on revoke
- connection cleanup logic
Dependencies: Phase 17.
Acceptance criteria:
- revocation blocks new unary requests and closes active streams for the same
session.
Targeted tests:
- revoke closes active stream;
- revoked session cannot reopen the stream.
## Phase 19. Observability and Shutdown Hardening
Goal: make the service operable in production.
Artifacts:
- structured logs
- metrics
- trace propagation
- timeout budgets
- graceful shutdown for unary and streaming traffic
Dependencies: Phase 18.
Acceptance criteria:
- shutdown is deterministic;
- logs and metrics expose stable edge outcomes without leaking secrets.
Targeted tests:
- shutdown closes listeners and active streams;
- secret and signature values are not logged.
## Phase 20. Acceptance Pass
Goal: reconcile implementation, documentation, and regression coverage.
Artifacts:
- updated README and PLAN
- final protocol and interface review
- focused regression test run
Dependencies: Phases 1 through 19.
Acceptance criteria:
- implementation matches documented contracts and ordering guarantees;
- docs describe the actual gateway behavior.
Targeted tests:
- run focused package tests for gateway packages;
- rerun cross-cutting regression scenarios.
## Cross-Cutting Regression Scenarios
- `send_email_code` and `confirm_email_code` are available without session auth
and are still limited by public auth policy.
- Public browser bootstrap and asset bursts do not increase auth abuse counters
and are not rejected as hostile because of intensity alone.
- Any gRPC command without a valid session is rejected before routing.
- Unknown and revoked sessions are rejected predictably, with identical
client-visible behavior where policy requires it.
- Signature verification fails when `payload_bytes`, `payload_hash`,
`message_type`, `request_id`, or the signing key changes.
- `payload_hash` is verified before downstream execution.
- Requests outside the freshness window are rejected.
- Reused `request_id` values are rejected within the session replay window.
- Public REST and authenticated gRPC traffic use independent buckets and
independent abuse telemetry.
- Downstream services receive `AuthenticatedCommand`, not raw REST or gRPC
transport requests.
- Unary responses preserve `request_id` correlation and are server-signed.
- Streaming connections open only after the auth pipeline and close on revoke.
- Session cache updates from events change gateway behavior without synchronous
auth-service lookups per request.
- Graceful shutdown terminates unary and streaming traffic cleanly.
+413
@@ -1 +1,414 @@
# Edge Gateway
## Purpose
`Edge Gateway` is the only public ingress for Galaxy Plus clients.
It terminates the external transport and security boundary, enforces edge
policies, and routes verified requests to internal services.
The gateway does not implement domain-specific business logic.
Business validation, authorization, ownership checks, and state transitions
remain inside downstream services.
## Trust Boundary
The gateway sits between untrusted external clients and trusted internal
services.
The gateway is responsible for:
- parsing external transport requests;
- classifying public REST traffic;
- authenticating protected gRPC traffic;
- loading session state from cache;
- verifying request freshness and anti-replay constraints;
- applying edge rate limits and anti-abuse policy;
- building an authenticated internal command context;
- routing verified commands to internal services;
- maintaining authenticated push delivery connections.
The gateway is not responsible for:
- deciding whether a user is allowed to execute a business action;
- validating domain invariants;
- storing the source-of-truth session record;
- implementing business idempotency.
## Transport Matrix
The gateway exposes two external transport classes.
| Transport | Audience | Authentication | Payload format | Primary use |
| --- | --- | --- | --- | --- |
| REST/JSON | Public, unauthenticated traffic | No device session auth | JSON | Public auth commands, health checks, browser/bootstrap traffic |
| gRPC over HTTP/2 | Authenticated clients only | Required | FlatBuffers payload inside protobuf control envelope | Verified commands and push delivery |
### Public REST Surface
The public REST surface is used for commands that must work before a device
session exists and for browser-originated traffic that may share the same edge.
Stable public endpoints:
- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`
- `GET /healthz`
- `GET /readyz`
In addition to the fixed endpoints above, the gateway may front browser
bootstrap or asset traffic through a pluggable public handler or proxy.
That traffic belongs to dedicated public route classes and must not share rate
limit buckets or abuse counters with the public auth API.
### Authenticated gRPC Surface
All authenticated client requests use HTTP/2 and gRPC.
The public gRPC service exposes two methods:
- `ExecuteCommand(ExecuteCommandRequest) returns (ExecuteCommandResponse)`
- `SubscribeEvents(SubscribeEventsRequest) returns (stream GatewayEvent)`
`ExecuteCommand` is a generic unary RPC.
The gateway routes the request downstream by `message_type` after transport
verification succeeds.
`SubscribeEvents` is an authenticated server-streaming RPC.
It binds the stream to `user_id` and `device_session_id` and starts by sending
a service event that includes the current server time in milliseconds.
## Envelope and Payload Model
The authenticated transport uses a split contract:
- gRPC control messages are protobuf-based;
- business payload bytes are FlatBuffers;
- signatures are computed over canonical envelope fields and a hash of raw
FlatBuffers bytes.
The gateway treats `payload_bytes` as opaque business data.
It verifies integrity and forwards verified bytes downstream without rewriting
them.
### ExecuteCommandRequest
Required fields:
- `protocol_version`
- `device_session_id`
- `message_type`
- `timestamp_ms`
- `request_id`
- `payload_bytes`
- `payload_hash`
- `signature`
Optional fields:
- `trace_id`
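The required-field check and protocol gate for this envelope might look like the sketch below. It uses a plain Go struct rather than a generated protobuf type. The field types, the `"v1"` version string, and the names `ValidateEnvelope` and `supportedVersions` are assumptions made for illustration.

```go
package main

import "errors"

// ExecuteCommandRequest mirrors the documented envelope fields as a plain Go
// struct. Field types are assumed; a generated protobuf type would differ.
type ExecuteCommandRequest struct {
	ProtocolVersion string
	DeviceSessionID string
	MessageType     string
	TimestampMS     int64
	RequestID       string
	PayloadBytes    []byte
	PayloadHash     []byte
	Signature       []byte
	TraceID         string // optional
}

var (
	ErrMalformed           = errors.New("malformed request")
	ErrUnsupportedProtocol = errors.New("unsupported protocol")
)

// supportedVersions is a hypothetical allow-list; "v1" is an assumed value.
var supportedVersions = map[string]bool{"v1": true}

// ValidateEnvelope checks required fields first, then the protocol version.
func ValidateEnvelope(req ExecuteCommandRequest) error {
	switch {
	case req.ProtocolVersion == "", req.DeviceSessionID == "",
		req.MessageType == "", req.RequestID == "", req.TimestampMS <= 0,
		len(req.PayloadBytes) == 0, len(req.PayloadHash) == 0,
		len(req.Signature) == 0:
		return ErrMalformed
	}
	if !supportedVersions[req.ProtocolVersion] {
		return ErrUnsupportedProtocol
	}
	return nil
}
```

Returning distinct sentinel errors keeps the malformed and unsupported-protocol rejects distinguishable in logs and metrics.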
### ExecuteCommandResponse
Required fields:
- `protocol_version`
- `request_id`
- `timestamp_ms`
- `result_code`
- `payload_bytes`
- `payload_hash`
- `signature`
### SubscribeEventsRequest
The stream open request reuses the authenticated request model.
It contains the same authentication fields as the unary request and either an
empty payload or a minimal connect payload.
Required fields:
- `protocol_version`
- `device_session_id`
- `message_type`
- `timestamp_ms`
- `request_id`
- `payload_hash`
- `signature`
Optional fields:
- `payload_bytes`
- `trace_id`
### GatewayEvent
Every stream event is a client-facing signed server message.
Required fields:
- `event_type`
- `event_id`
- `timestamp_ms`
- `payload_bytes`
- `payload_hash`
- `signature`
Optional fields:
- `request_id`
- `trace_id`
## Verification and Routing Pipeline
The gateway applies the same strict verification order to every authenticated
gRPC request, unary or streaming.
1. Parse the control envelope and validate required fields.
2. Check whether `protocol_version` is supported.
3. Resolve `device_session_id` through `SessionCache`.
4. Reject unknown or revoked sessions.
5. Verify that `payload_hash` matches raw `payload_bytes`.
6. Verify the client signature using the public key from session cache.
7. Verify that `timestamp_ms` is inside the accepted freshness window.
8. Verify anti-replay by checking `device_session_id + request_id`.
9. Apply authenticated rate limit and edge policy checks.
10. Build the authenticated internal command context.
11. Route the command downstream by `message_type`.
No downstream business service should receive a request that has not passed
this full verification pipeline.
## Internal Authenticated Contract
Downstream services should receive an internal authenticated command rather than
raw external gRPC transport data.
The minimum authenticated context is:
- `user_id`
- `device_session_id`
- `message_type`
- verified `payload_bytes`
- `request_id`
- optional `trace_id`
- optional client metadata needed for logs and tracing
Downstream services may trust that the gateway has already performed transport
authentication, freshness verification, and anti-replay checks.
They must still perform business authorization and domain validation.
## Session Model
The Auth / Session Service is the source of truth for device session state.
The gateway is designed to authenticate the hot path from cache.
Expected session fields available to the gateway:
- `device_session_id`
- `user_id`
- client public key
- session status
- revoke metadata
- optional client metadata
### Session Cache
`SessionCache` provides the fast path for:
- session existence checks;
- `device_session_id -> user_id`;
- access to the client public key used for signature verification;
- revoked versus active status checks.
Cache updates are event-driven.
TTL is allowed only as a safety net and must not replace invalidation events.
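An in-memory cache with event-driven updates could be sketched as follows. The `Session` struct, the `MemorySessionCache` type, and its method names are hypothetical. The TTL safety net is omitted to keep the example short.

```go
package main

import (
	"errors"
	"sync"
)

// Session holds the subset of session fields the hot path needs.
type Session struct {
	DeviceSessionID string
	UserID          string
	PublicKey       []byte
	Revoked         bool
}

var (
	ErrUnknownSession = errors.New("unknown session")
	ErrRevokedSession = errors.New("revoked session")
)

// MemorySessionCache is a hypothetical in-process SessionCache.
type MemorySessionCache struct {
	mu       sync.RWMutex
	sessions map[string]Session
}

func NewMemorySessionCache() *MemorySessionCache {
	return &MemorySessionCache{sessions: make(map[string]Session)}
}

// Lookup returns an active session or a reject reason.
func (c *MemorySessionCache) Lookup(id string) (Session, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.sessions[id]
	if !ok {
		return Session{}, ErrUnknownSession
	}
	if s.Revoked {
		return Session{}, ErrRevokedSession
	}
	return s, nil
}

// ApplyUpdate handles a session update event by storing the latest state.
func (c *MemorySessionCache) ApplyUpdate(s Session) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.sessions[s.DeviceSessionID] = s
}

// ApplyRevoke handles a revoke event. The entry is kept and marked revoked so
// that revoked and unknown sessions remain distinguishable.
func (c *MemorySessionCache) ApplyRevoke(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.sessions[id]; ok {
		s.Revoked = true
		c.sessions[id] = s
	}
}
```

Keeping revoked entries instead of deleting them is a design choice in this sketch. It lets the gateway report `revoked session` and `unknown session` as separate reject reasons.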
### Revocation Behavior
When a device session is revoked:
1. the Auth / Session Service updates the source of truth;
2. it publishes a session update or revoke event;
3. the gateway invalidates or updates `SessionCache`;
4. new unary gRPC requests for that session are rejected;
5. active `SubscribeEvents` streams for that session are closed.
## Public Anti-Abuse Model
The public REST layer must distinguish between public auth operations and
browser-originated traffic that may burst during a normal first page load.
The gateway uses these public route classes:
- `public_auth`
- `browser_bootstrap`
- `browser_asset`
- `public_misc`
### Public Auth
`public_auth` includes `send-email-code` and `confirm-email-code`.
This class uses stricter limits and abuse scoring because it directly touches
account and session creation flows.
Controls include:
- per-IP and per-identity rate limits;
- request body size limits;
- method allow-lists;
- malformed request counters;
- elevated logging and security telemetry for repeated failures.
### Browser Bootstrap and Asset Traffic
`browser_bootstrap` and `browser_asset` use separate coarse-grained budgets.
They may exhibit bursty behavior during the first load and therefore must not
be treated as hostile based on burst pattern alone.
This traffic is still constrained by:
- dedicated rate limits;
- method allow-lists;
- body size limits where request bodies are expected;
- protocol and path validation;
- independent abuse telemetry.
The gateway must not merge these buckets or counters with `public_auth`.
## Push Delivery Model
The v1 push channel is a gRPC server stream.
Long-polling is intentionally out of scope for the first version.
Expected stream behavior:
1. the client opens `SubscribeEvents`;
2. the gateway applies the full authenticated ingress verification pipeline;
3. the stream is bound to `user_id` and `device_session_id`;
4. the first service event includes `server_time_ms`;
5. client-facing events from internal pub/sub are fanned out to matching active
streams;
6. revoke events close affected streams.
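The binding, fan-out, and revoke steps above could be sketched with channels standing in for gRPC streams. `PushHub` is named in this document, but the method set, the one-stream-per-session rule, and the policy of skipping streams whose buffers are full are assumptions made for this example.

```go
package main

import "sync"

// Event is a simplified client-facing event for this sketch.
type Event struct {
	EventType string
	Payload   []byte
}

type stream struct {
	userID string
	ch     chan Event
}

// PushHub tracks active streams keyed by device_session_id.
type PushHub struct {
	mu      sync.Mutex
	streams map[string]*stream
}

func NewPushHub() *PushHub {
	return &PushHub{streams: make(map[string]*stream)}
}

// Register binds a stream to (userID, sessionID) and returns its receive
// channel. An existing stream for the same session is closed first.
func (h *PushHub) Register(userID, sessionID string, buffer int) <-chan Event {
	h.mu.Lock()
	defer h.mu.Unlock()
	if old, ok := h.streams[sessionID]; ok {
		close(old.ch)
	}
	s := &stream{userID: userID, ch: make(chan Event, buffer)}
	h.streams[sessionID] = s
	return s.ch
}

// DeliverToUser fans an event out to every active stream of one user and
// returns how many streams accepted it. Full buffers are skipped.
func (h *PushHub) DeliverToUser(userID string, ev Event) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for _, s := range h.streams {
		if s.userID != userID {
			continue
		}
		select {
		case s.ch <- ev:
			delivered++
		default: // slow consumer: drop instead of blocking the hub
		}
	}
	return delivered
}

// Revoke closes and removes the stream bound to a revoked session.
func (h *PushHub) Revoke(sessionID string) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	s, ok := h.streams[sessionID]
	if !ok {
		return false
	}
	close(s.ch)
	delete(h.streams, sessionID)
	return true
}
```

Sends and closes both happen under the same mutex, so the hub never sends on a closed channel. A real `SubscribeEvents` handler would read from the returned channel and end the RPC when the channel closes.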
## Recommended Package Layout
The initial package layout should keep transport, policy, and downstream
adapters separate:
- `cmd/gateway`
- `internal/app`
- `internal/config`
- `internal/restapi`
- `internal/grpcapi`
- `internal/authn`
- `internal/session`
- `internal/replay`
- `internal/ratelimit`
- `internal/downstream`
- `internal/push`
- `internal/events`
- `internal/clock`
## Key Interfaces
The gateway should be built around explicit consumer-side interfaces.
### SessionCache
Provides cached session lookup by `device_session_id`.
Returns enough data to verify signatures and identify the authenticated user.
### ReplayStore
Tracks recently seen `request_id` values per device session and rejects replayed
requests inside the accepted freshness window.
### RateLimiter
Applies independent policies for:
- public REST route classes;
- authenticated gRPC requests by IP;
- authenticated gRPC requests by session;
- authenticated gRPC requests by user;
- authenticated gRPC requests by message class.
### PublicTrafficClassifier
Maps incoming public REST requests to one of the public route classes so that
limits and anti-abuse counters remain isolated.
### AuthServiceClient
Handles public auth commands and session-related updates exchanged with the
Auth / Session Service.
### DownstreamRouter
Resolves the target downstream service or adapter by `message_type`.
### DownstreamClient
Executes a verified authenticated command against a downstream internal service
and returns response payload bytes plus a stable result code.
### EventSubscriber
Subscribes to internal pub/sub topics used for:
- session cache updates;
- revocations;
- client-facing event delivery.
### PushHub
Tracks active `SubscribeEvents` streams, binds them to authenticated identities,
and delivers events to the correct connections.
### ResponseSigner
Signs unary responses and stream events so clients can verify server-originated
messages.
### Clock
Provides current server time and supports consistent freshness-window checks.
## Error Model and Observability
The gateway should expose stable edge-level error classes instead of leaking
internal implementation details.
Minimum error categories:
- malformed request;
- unsupported protocol;
- unknown session;
- revoked session;
- invalid signature;
- stale request;
- replay detected;
- rate limited;
- downstream unavailable;
- internal error.
Observability requirements:
- stable correlation identifiers, including `request_id` and optional `trace_id`;
- structured logs;
- security audit events for rejects and abuse signals;
- metrics keyed by route class, message type, result code, and reject reason;
- no logging of secrets, raw private material, or raw signatures.
## Non-Goals
The gateway is not a business authorization layer and must not grow into a
domain coordinator.
The gateway must not:
- implement business ownership checks;
- validate domain state transitions;
- replace the Auth / Session Service as the session source of truth;
- degrade into a synchronous pass-through that reloads session state for every
authenticated request.