gateway readme and plan
# Edge Gateway

## Purpose

`Edge Gateway` is the only public ingress for Galaxy Plus clients.
It terminates the external transport and security boundary, enforces edge
policies, and routes verified requests to internal services.

The gateway does not implement domain-specific business logic.
Business validation, authorization, ownership checks, and state transitions
remain inside downstream services.

## Trust Boundary

The gateway sits between untrusted external clients and trusted internal
services.

The gateway is responsible for:

- parsing external transport requests;
- classifying public REST traffic;
- authenticating protected gRPC traffic;
- loading session state from cache;
- verifying request freshness and anti-replay constraints;
- applying edge rate limits and anti-abuse policy;
- building an authenticated internal command context;
- routing verified commands to internal services;
- maintaining authenticated push delivery connections.

The gateway is not responsible for:

- deciding whether a user is allowed to execute a business action;
- validating domain invariants;
- storing the source-of-truth session record;
- implementing business idempotency.

## Transport Matrix

The gateway exposes two external transport classes.

| Transport | Audience | Authentication | Payload format | Primary use |
| --- | --- | --- | --- | --- |
| REST/JSON | Public, unauthenticated traffic | No device session auth | JSON | Public auth commands, health checks, browser/bootstrap traffic |
| gRPC over HTTP/2 | Authenticated clients only | Required | FlatBuffers payload inside protobuf control envelope | Verified commands and push delivery |

### Public REST Surface

The public REST surface is used for commands that must work before a device
session exists and for browser-originated traffic that may share the same edge.

Stable public endpoints:

- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`
- `GET /healthz`
- `GET /readyz`

In addition to the fixed endpoints above, the gateway may front browser
bootstrap or asset traffic through a pluggable public handler or proxy.
That traffic belongs to dedicated public route classes and must not share rate
limit buckets or abuse counters with the public auth API.

### Authenticated gRPC Surface

All authenticated client requests use HTTP/2 and gRPC.

The public gRPC service exposes two methods:

- `ExecuteCommand(ExecuteCommandRequest) returns (ExecuteCommandResponse)`
- `SubscribeEvents(SubscribeEventsRequest) returns (stream GatewayEvent)`

`ExecuteCommand` is a generic unary RPC.
The gateway routes the request downstream by `message_type` after transport
verification succeeds.

`SubscribeEvents` is an authenticated server-streaming RPC.
It binds the stream to `user_id` and `device_session_id` and starts by sending
a service event that includes the current server time in milliseconds.

## Envelope and Payload Model

The authenticated transport uses a split contract:

- gRPC control messages are protobuf-based;
- business payload bytes are FlatBuffers;
- signatures are computed over canonical envelope fields and a hash of raw
  FlatBuffers bytes.

The gateway treats `payload_bytes` as opaque business data.
It verifies integrity and forwards verified bytes downstream without rewriting
them.
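
The split between envelope fields and hashed payload bytes can be sketched in Go as follows. This is only an illustration: the document does not name the hash algorithm or the canonical encoding, so SHA-256, the field order, and the length-prefixed layout are all assumptions, as are the names `Envelope`, `PayloadHash`, and `SigningInput`.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Envelope holds the envelope fields covered by the signature.
// The field set mirrors ExecuteCommandRequest; the ordering is an assumption.
type Envelope struct {
	ProtocolVersion string
	DeviceSessionID string
	MessageType     string
	TimestampMs     int64
	RequestID       string
}

// PayloadHash hashes the raw FlatBuffers bytes exactly as received.
// SHA-256 is assumed here; the document does not name the algorithm.
func PayloadHash(payload []byte) []byte {
	sum := sha256.Sum256(payload)
	return sum[:]
}

// SigningInput builds a hypothetical canonical byte string. Each field is
// length-prefixed so that bytes cannot move from one field into the next
// without changing the signed input.
func SigningInput(e Envelope, payloadHash []byte) []byte {
	var buf bytes.Buffer
	writeField := func(b []byte) {
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(b)))
		buf.Write(n[:])
		buf.Write(b)
	}
	writeField([]byte(e.ProtocolVersion))
	writeField([]byte(e.DeviceSessionID))
	writeField([]byte(e.MessageType))
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], uint64(e.TimestampMs))
	writeField(ts[:])
	writeField([]byte(e.RequestID))
	writeField(payloadHash)
	return buf.Bytes()
}

func main() {
	e := Envelope{"v1", "sess-1", "profile.get", 1700000000000, "req-1"}
	h := PayloadHash([]byte("raw flatbuffers bytes"))
	fmt.Printf("payload_hash=%x signing_input_len=%d\n", h, len(SigningInput(e, h)))
}
```

Because only the hash of `payload_bytes` enters the signed input, the gateway can check integrity without parsing the FlatBuffers content, which fits the opaque-payload rule above.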

### ExecuteCommandRequest

Required fields:

- `protocol_version`
- `device_session_id`
- `message_type`
- `timestamp_ms`
- `request_id`
- `payload_bytes`
- `payload_hash`
- `signature`

Optional fields:

- `trace_id`
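
A required-field check for this message might look like the sketch below. The Go types are assumptions (the real message is generated from protobuf), and treating an empty `payload_bytes` as missing is also an assumption. `Validate` and `ErrMalformed` are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMalformed is the stable class for structurally invalid requests.
var ErrMalformed = errors.New("malformed request")

// ExecuteCommandRequest mirrors the documented fields; types are assumed.
type ExecuteCommandRequest struct {
	ProtocolVersion string
	DeviceSessionID string
	MessageType     string
	TimestampMs     int64
	RequestID       string
	PayloadBytes    []byte
	PayloadHash     []byte
	Signature       []byte
	TraceID         string // optional
}

// Validate reports the first missing required field, wrapped in ErrMalformed.
func (r *ExecuteCommandRequest) Validate() error {
	missing := func(name string) error {
		return fmt.Errorf("%w: missing %s", ErrMalformed, name)
	}
	switch {
	case r.ProtocolVersion == "":
		return missing("protocol_version")
	case r.DeviceSessionID == "":
		return missing("device_session_id")
	case r.MessageType == "":
		return missing("message_type")
	case r.TimestampMs <= 0:
		return missing("timestamp_ms")
	case r.RequestID == "":
		return missing("request_id")
	case len(r.PayloadBytes) == 0: // assumption: empty payload counts as missing
		return missing("payload_bytes")
	case len(r.PayloadHash) == 0:
		return missing("payload_hash")
	case len(r.Signature) == 0:
		return missing("signature")
	}
	return nil
}

func main() {
	req := ExecuteCommandRequest{
		ProtocolVersion: "v1", DeviceSessionID: "sess-1", MessageType: "profile.get",
		TimestampMs: 1700000000000, RequestID: "req-1",
		PayloadBytes: []byte{1}, PayloadHash: []byte{2},
	}
	fmt.Println(req.Validate()) // signature is not set
}
```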

### ExecuteCommandResponse

Required fields:

- `protocol_version`
- `request_id`
- `timestamp_ms`
- `result_code`
- `payload_bytes`
- `payload_hash`
- `signature`

### SubscribeEventsRequest

The stream open request reuses the authenticated request model.
It contains the same authentication fields as the unary request and either an
empty payload or a minimal connect payload.

Required fields:

- `protocol_version`
- `device_session_id`
- `message_type`
- `timestamp_ms`
- `request_id`
- `payload_hash`
- `signature`

Optional fields:

- `payload_bytes`
- `trace_id`

### GatewayEvent

Every stream event is a client-facing signed server message.

Required fields:

- `event_type`
- `event_id`
- `timestamp_ms`
- `payload_bytes`
- `payload_hash`
- `signature`

Optional fields:

- `request_id`
- `trace_id`

## Verification and Routing Pipeline

The gateway applies the same strict verification order to all authenticated
gRPC ingress, both unary and streaming.

1. Parse the control envelope and validate required fields.
2. Check whether `protocol_version` is supported.
3. Resolve `device_session_id` through `SessionCache`.
4. Reject unknown or revoked sessions.
5. Verify that `payload_hash` matches raw `payload_bytes`.
6. Verify the client signature using the public key from session cache.
7. Verify that `timestamp_ms` is inside the accepted freshness window.
8. Verify anti-replay by checking `device_session_id + request_id`.
9. Apply authenticated rate limit and edge policy checks.
10. Build the authenticated internal command context.
11. Route the command downstream by `message_type`.

No downstream business service should receive a request that has not passed
this full verification pipeline.
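
The ordering of steps 1 through 8 can be sketched as a single function that fails fast. Several details here are assumptions, not part of the specification: Ed25519 as the signature scheme, SHA-256 as the payload hash, the simplified `signingInput` encoding, the 30 second window, the error class returned for a hash mismatch, and plain maps standing in for `SessionCache` and `ReplayStore`.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"crypto/sha256"
	"errors"
	"fmt"
	"time"
)

var (
	ErrMalformed = errors.New("malformed request")
	ErrProtocol  = errors.New("unsupported protocol")
	ErrUnknown   = errors.New("unknown session")
	ErrRevoked   = errors.New("revoked session")
	ErrSignature = errors.New("invalid signature")
	ErrStale     = errors.New("stale request")
	ErrReplay    = errors.New("replay detected")
)

type Request struct {
	ProtocolVersion, DeviceSessionID, MessageType, RequestID string
	TimestampMs                                              int64
	PayloadBytes, PayloadHash, Signature                     []byte
}

type Session struct {
	UserID    string
	PublicKey ed25519.PublicKey
	Revoked   bool
}

type Verifier struct {
	Sessions map[string]Session // stand-in for SessionCache
	Seen     map[string]bool    // stand-in for ReplayStore
	Now      func() time.Time   // stand-in for Clock
	Window   time.Duration
}

// signingInput is a simplified placeholder for the canonical envelope encoding.
func signingInput(r Request) []byte {
	return []byte(fmt.Sprintf("%s|%s|%s|%d|%s|%x", r.ProtocolVersion,
		r.DeviceSessionID, r.MessageType, r.TimestampMs, r.RequestID, r.PayloadHash))
}

// Verify runs steps 1-8 in order; step 9 (rate limits) is omitted for brevity.
// The returned session feeds steps 10-11 (context building and routing).
func (v *Verifier) Verify(r Request) (Session, error) {
	if r.ProtocolVersion == "" || r.DeviceSessionID == "" || r.MessageType == "" ||
		r.RequestID == "" || r.TimestampMs == 0 || len(r.PayloadHash) == 0 || len(r.Signature) == 0 {
		return Session{}, ErrMalformed // step 1
	}
	if r.ProtocolVersion != "v1" {
		return Session{}, ErrProtocol // step 2
	}
	s, ok := v.Sessions[r.DeviceSessionID] // step 3
	if !ok {
		return Session{}, ErrUnknown // step 4
	}
	if s.Revoked {
		return Session{}, ErrRevoked // step 4
	}
	if sum := sha256.Sum256(r.PayloadBytes); !bytes.Equal(sum[:], r.PayloadHash) {
		return Session{}, ErrMalformed // step 5; the error class is an assumption
	}
	if !ed25519.Verify(s.PublicKey, signingInput(r), r.Signature) {
		return Session{}, ErrSignature // step 6
	}
	if age := v.Now().Sub(time.UnixMilli(r.TimestampMs)).Abs(); age > v.Window {
		return Session{}, ErrStale // step 7
	}
	key := r.DeviceSessionID + "/" + r.RequestID
	if v.Seen[key] {
		return Session{}, ErrReplay // step 8
	}
	v.Seen[key] = true
	return s, nil
}

// demo signs one request and submits it twice; the second attempt is a replay.
func demo() (first, second error) {
	pub, priv, _ := ed25519.GenerateKey(nil)
	now := time.Now()
	v := &Verifier{
		Sessions: map[string]Session{"sess-1": {UserID: "user-1", PublicKey: pub}},
		Seen:     map[string]bool{}, Now: func() time.Time { return now }, Window: 30 * time.Second,
	}
	payload := []byte("raw flatbuffers bytes")
	sum := sha256.Sum256(payload)
	r := Request{ProtocolVersion: "v1", DeviceSessionID: "sess-1", MessageType: "profile.get",
		RequestID: "req-1", TimestampMs: now.UnixMilli(), PayloadBytes: payload, PayloadHash: sum[:]}
	r.Signature = ed25519.Sign(priv, signingInput(r))
	_, first = v.Verify(r)
	_, second = v.Verify(r)
	return first, second
}

func main() { fmt.Println(demo()) }
```

Recording the `request_id` only after the signature and freshness checks pass keeps unauthenticated traffic from filling the replay store.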

## Internal Authenticated Contract

Downstream services should receive an internal authenticated command rather than
raw external gRPC transport data.

The minimum authenticated context is:

- `user_id`
- `device_session_id`
- `message_type`
- verified `payload_bytes`
- `request_id`
- optional `trace_id`
- optional client metadata needed for logs and tracing

Downstream services may trust that the gateway has already performed transport
authentication, freshness verification, and anti-replay checks.
They must still perform business authorization and domain validation.

## Session Model

The Auth / Session Service is the source of truth for device session state.
The gateway is designed to authenticate the hot path from cache.

Expected session fields available to the gateway:

- `device_session_id`
- `user_id`
- client public key
- session status
- revoke metadata
- optional client metadata

### Session Cache

`SessionCache` provides the fast path for:

- session existence checks;
- `device_session_id -> user_id` resolution;
- access to the client public key used for signature verification;
- revoked versus active status checks.

Cache updates are event-driven.
TTL is allowed only as a safety net and must not replace invalidation events.
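
One way to combine event-driven updates with a TTL safety net is sketched below as an in-memory map. The type names, the `Apply`/`Lookup` methods, and the 10 minute TTL are assumptions for illustration; the document does not prescribe a cache backend or API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Session mirrors the session fields the gateway expects; types are assumed.
type Session struct {
	DeviceSessionID string
	UserID          string
	PublicKey       []byte
	Revoked         bool
}

type cacheEntry struct {
	session   Session
	expiresAt time.Time
}

// SessionCache is an in-memory sketch of the cache described above.
type SessionCache struct {
	mu      sync.RWMutex
	ttl     time.Duration
	entries map[string]cacheEntry
}

func NewSessionCache(ttl time.Duration) *SessionCache {
	return &SessionCache{ttl: ttl, entries: make(map[string]cacheEntry)}
}

// Apply handles a session update or revoke event from the Auth / Session
// Service. Revoked sessions stay cached so that they can be told apart from
// unknown ones.
func (c *SessionCache) Apply(s Session) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[s.DeviceSessionID] = cacheEntry{session: s, expiresAt: time.Now().Add(c.ttl)}
}

// Lookup returns the cached session. Entries past the TTL are treated as
// missing: the TTL only bounds staleness if an event was lost.
func (c *SessionCache) Lookup(id string) (Session, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[id]
	if !ok || time.Now().After(e.expiresAt) {
		return Session{}, false
	}
	return e.session, true
}

func main() {
	cache := NewSessionCache(10 * time.Minute)
	cache.Apply(Session{DeviceSessionID: "sess-1", UserID: "user-1"})
	cache.Apply(Session{DeviceSessionID: "sess-1", UserID: "user-1", Revoked: true}) // revoke event
	s, ok := cache.Lookup("sess-1")
	fmt.Println(ok, s.Revoked)
}
```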

### Revocation Behavior

When a device session is revoked:

1. the Auth / Session Service updates the source of truth;
2. it publishes a session update or revoke event;
3. the gateway invalidates or updates `SessionCache`;
4. new unary gRPC requests for that session are rejected;
5. active `SubscribeEvents` streams for that session are closed.

## Public Anti-Abuse Model

The public REST layer must distinguish between public auth operations and
browser-originated traffic that may burst during a normal first page load.

The gateway uses these public route classes:

- `public_auth`
- `browser_bootstrap`
- `browser_asset`
- `public_misc`

### Public Auth

`public_auth` includes `send-email-code` and `confirm-email-code`.
This class uses stricter limits and abuse scoring because it directly touches
account and session creation flows.

Controls include:

- per-IP and per-identity rate limits;
- request body size limits;
- method allow-lists;
- malformed request counters;
- elevated logging and security telemetry for repeated failures.

### Browser Bootstrap and Asset Traffic

`browser_bootstrap` and `browser_asset` use separate coarse-grained budgets.
They may exhibit bursty behavior during the first load and therefore must not
be treated as hostile based on burst pattern alone.

This traffic is still constrained by:

- dedicated rate limits;
- method allow-lists;
- body size limits where request bodies are expected;
- protocol and path validation;
- independent abuse telemetry.

The gateway must not merge these buckets or counters with `public_auth`.
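
Bucket isolation can be illustrated with a small token-bucket limiter keyed by route class and client key. This is a sketch: the token-bucket algorithm, the `Budget` type, and the burst numbers are assumptions, not values taken from this document.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

type RouteClass string

const (
	PublicAuth       RouteClass = "public_auth"
	BrowserBootstrap RouteClass = "browser_bootstrap"
	BrowserAsset     RouteClass = "browser_asset"
	PublicMisc       RouteClass = "public_misc"
)

// Budget is a token-bucket policy; the numbers used below are placeholders.
type Budget struct {
	Burst        float64
	RefillPerSec float64
}

type bucket struct {
	tokens float64
	last   time.Time
}

// Limiter keeps one bucket per (route class, client key) pair, so traffic in
// one class can never drain the budget of another.
type Limiter struct {
	mu      sync.Mutex
	budgets map[RouteClass]Budget
	buckets map[string]*bucket
}

func NewLimiter(budgets map[RouteClass]Budget) *Limiter {
	return &Limiter{budgets: budgets, buckets: make(map[string]*bucket)}
}

// Allow consumes one token if available. Classes without a budget are denied.
func (l *Limiter) Allow(class RouteClass, clientKey string) bool {
	budget, ok := l.budgets[class]
	if !ok {
		return false
	}
	now := time.Now()
	l.mu.Lock()
	defer l.mu.Unlock()
	key := string(class) + "|" + clientKey
	b := l.buckets[key]
	if b == nil {
		b = &bucket{tokens: budget.Burst, last: now}
		l.buckets[key] = b
	}
	b.tokens = math.Min(budget.Burst, b.tokens+now.Sub(b.last).Seconds()*budget.RefillPerSec)
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	l := NewLimiter(map[RouteClass]Budget{
		PublicAuth:   {Burst: 3, RefillPerSec: 0.1},
		BrowserAsset: {Burst: 100, RefillPerSec: 50},
	})
	for i := 0; i < 4; i++ {
		fmt.Println("public_auth:", l.Allow(PublicAuth, "203.0.113.7"))
	}
	fmt.Println("browser_asset:", l.Allow(BrowserAsset, "203.0.113.7"))
}
```

In this sketch an exhausted `public_auth` budget for a client leaves the same client's `browser_asset` budget untouched, which is the isolation property the section requires.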

## Push Delivery Model

The v1 push channel is a gRPC server stream.
Long-polling is intentionally out of scope for the first version.

Expected stream behavior:

1. the client opens `SubscribeEvents`;
2. the gateway applies the full authenticated ingress verification pipeline;
3. the stream is bound to `user_id` and `device_session_id`;
4. the first service event includes `server_time_ms`;
5. client-facing events from internal pub/sub are fanned out to matching active
   streams;
6. revoke events close affected streams.

## Recommended Package Layout

The initial package layout should keep transport, policy, and downstream
adapters separate:

- `cmd/gateway`
- `internal/app`
- `internal/config`
- `internal/restapi`
- `internal/grpcapi`
- `internal/authn`
- `internal/session`
- `internal/replay`
- `internal/ratelimit`
- `internal/downstream`
- `internal/push`
- `internal/events`
- `internal/clock`

## Key Interfaces

The gateway should be built around explicit consumer-side interfaces.

### SessionCache

Provides cached session lookup by `device_session_id`.
Returns enough data to verify signatures and identify the authenticated user.

### ReplayStore

Tracks recently seen `request_id` values per device session and rejects replayed
requests inside the accepted freshness window.
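
A minimal in-memory sketch of such a store follows. The names `ReplayStore.FirstSeen` and `retention` are illustrative. The retention period is assumed to be at least as long as a request can remain fresh, so entries can be dropped once the freshness check alone would reject the request.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ReplayStore remembers (device_session_id, request_id) pairs for a bounded time.
type ReplayStore struct {
	mu        sync.Mutex
	retention time.Duration
	seen      map[string]time.Time // key -> instant after which the entry may be dropped
}

func NewReplayStore(retention time.Duration) *ReplayStore {
	return &ReplayStore{retention: retention, seen: make(map[string]time.Time)}
}

// FirstSeen records the pair and reports true if it was not already present.
// A false result means the request is a replay and must be rejected.
func (s *ReplayStore) FirstSeen(deviceSessionID, requestID string) bool {
	now := time.Now()
	s.mu.Lock()
	defer s.mu.Unlock()
	// A linear sweep keeps the sketch short; a real store would expire
	// entries incrementally or rely on the backend's own TTL support.
	for k, exp := range s.seen {
		if now.After(exp) {
			delete(s.seen, k)
		}
	}
	key := deviceSessionID + "\x00" + requestID
	if _, dup := s.seen[key]; dup {
		return false
	}
	s.seen[key] = now.Add(s.retention)
	return true
}

func main() {
	store := NewReplayStore(time.Minute)
	fmt.Println(store.FirstSeen("sess-1", "req-1")) // new request
	fmt.Println(store.FirstSeen("sess-1", "req-1")) // replay
	fmt.Println(store.FirstSeen("sess-2", "req-1")) // same request_id, other session
}
```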

### RateLimiter

Applies independent policies for:

- public REST route classes;
- authenticated gRPC requests by IP;
- authenticated gRPC requests by session;
- authenticated gRPC requests by user;
- authenticated gRPC requests by message class.

### PublicTrafficClassifier

Maps incoming public REST requests to one of the public route classes so that
limits and anti-abuse counters remain isolated.
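
A classifier of this kind could be as small as the sketch below. Only the two auth paths come from this document. The `/assets/` prefix, the bootstrap paths, and the choice to place health checks in `public_misc` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

type RouteClass string

const (
	PublicAuth       RouteClass = "public_auth"
	BrowserBootstrap RouteClass = "browser_bootstrap"
	BrowserAsset     RouteClass = "browser_asset"
	PublicMisc       RouteClass = "public_misc"
)

// authPaths lists the stable public auth endpoints.
var authPaths = map[string]bool{
	"/api/v1/public/auth/send-email-code":    true,
	"/api/v1/public/auth/confirm-email-code": true,
}

// Classify maps a public request to exactly one route class.
// Auth paths are matched regardless of method so that requests with a
// disallowed method still count against the public_auth abuse counters.
func Classify(method, path string) RouteClass {
	switch {
	case authPaths[path]:
		return PublicAuth
	case method == "GET" && strings.HasPrefix(path, "/assets/"): // hypothetical prefix
		return BrowserAsset
	case method == "GET" && (path == "/" || path == "/index.html"): // hypothetical paths
		return BrowserBootstrap
	default:
		return PublicMisc // includes /healthz and /readyz in this sketch
	}
}

func main() {
	fmt.Println(Classify("POST", "/api/v1/public/auth/send-email-code"))
	fmt.Println(Classify("GET", "/assets/app.js"))
	fmt.Println(Classify("GET", "/"))
	fmt.Println(Classify("GET", "/healthz"))
}
```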

### AuthServiceClient

Handles public auth commands and session-related updates exchanged with the
Auth / Session Service.

### DownstreamRouter

Resolves the target downstream service or adapter by `message_type`.
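
As a sketch, the router can be an exact-match registry from `message_type` to a client. The `Command` type, the `DownstreamClient` method signature, and the exact-match lookup are assumptions; a real router might match on prefixes or message classes instead.

```go
package main

import (
	"context"
	"fmt"
)

// Command is a reduced form of the authenticated internal command context.
type Command struct {
	UserID, DeviceSessionID, MessageType, RequestID string
	PayloadBytes                                    []byte
}

// DownstreamClient is a hypothetical consumer-side interface.
type DownstreamClient interface {
	Execute(ctx context.Context, cmd Command) (payload []byte, resultCode string, err error)
}

// Router resolves clients by exact message_type. It is populated at startup
// and read-only afterwards, so it needs no locking in this sketch.
type Router struct {
	routes map[string]DownstreamClient
}

func NewRouter() *Router { return &Router{routes: make(map[string]DownstreamClient)} }

func (r *Router) Register(messageType string, c DownstreamClient) {
	r.routes[messageType] = c
}

// Resolve returns false for message types with no registered target.
func (r *Router) Resolve(messageType string) (DownstreamClient, bool) {
	c, ok := r.routes[messageType]
	return c, ok
}

// stubClient echoes the payload; it stands in for a real internal service.
type stubClient struct{ name string }

func (s stubClient) Execute(_ context.Context, cmd Command) ([]byte, string, error) {
	return cmd.PayloadBytes, "ok", nil
}

func main() {
	r := NewRouter()
	r.Register("profile.get", stubClient{name: "profile"})

	cmd := Command{UserID: "user-1", MessageType: "profile.get", PayloadBytes: []byte("bytes")}
	if c, ok := r.Resolve(cmd.MessageType); ok {
		payload, code, err := c.Execute(context.Background(), cmd)
		fmt.Println(string(payload), code, err)
	}
	_, ok := r.Resolve("unknown.type")
	fmt.Println("unknown.type routable:", ok)
}
```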

### DownstreamClient

Executes a verified authenticated command against a downstream internal service
and returns response payload bytes plus a stable result code.

### EventSubscriber

Subscribes to internal pub/sub topics used for:

- session cache updates;
- revocations;
- client-facing event delivery.

### PushHub

Tracks active `SubscribeEvents` streams, binds them to authenticated identities,
and delivers events to the correct connections.
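
The hub's bookkeeping can be sketched with buffered channels, where each channel stands in for one open `SubscribeEvents` stream. The method names, the buffer size of 16, and the policy of dropping events for slow consumers are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

type Event struct {
	EventType string
	Payload   []byte
}

// Stream is one active SubscribeEvents call bound to an identity.
// The gRPC handler would read Events and end the RPC when it is closed.
type Stream struct {
	UserID, DeviceSessionID string
	Events                  chan Event
}

type Hub struct {
	mu      sync.Mutex
	streams map[*Stream]struct{}
}

func NewHub() *Hub { return &Hub{streams: make(map[*Stream]struct{})} }

func (h *Hub) Subscribe(userID, deviceSessionID string) *Stream {
	s := &Stream{UserID: userID, DeviceSessionID: deviceSessionID, Events: make(chan Event, 16)}
	h.mu.Lock()
	defer h.mu.Unlock()
	h.streams[s] = struct{}{}
	return s
}

// PublishToUser fans an event out to every active stream of the user and
// returns how many streams accepted it.
func (h *Hub) PublishToUser(userID string, ev Event) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for s := range h.streams {
		if s.UserID != userID {
			continue
		}
		select {
		case s.Events <- ev:
			delivered++
		default: // buffer full; dropping is an assumed policy
		}
	}
	return delivered
}

// CloseSession removes and closes every stream bound to a revoked device
// session. Removal and close happen under the same lock as publishing, so
// PublishToUser never sends on a closed channel.
func (h *Hub) CloseSession(deviceSessionID string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for s := range h.streams {
		if s.DeviceSessionID == deviceSessionID {
			delete(h.streams, s)
			close(s.Events)
		}
	}
}

func main() {
	hub := NewHub()
	phone := hub.Subscribe("user-1", "sess-phone")
	hub.Subscribe("user-1", "sess-laptop")
	fmt.Println("delivered:", hub.PublishToUser("user-1", Event{EventType: "mail.new"}))
	hub.CloseSession("sess-phone") // revoke event
	<-phone.Events                 // drain the buffered event
	_, open := <-phone.Events
	fmt.Println("phone stream open:", open)
}
```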

### ResponseSigner

Signs unary responses and stream events so clients can verify server-originated
messages.

### Clock

Provides current server time and supports consistent freshness-window checks.
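
Injecting the clock makes the freshness check deterministic in tests. The sketch below assumes a symmetric window around server time and a 30 second value; neither is specified in this document, and `FixedClock` and `IsFresh` are illustrative names.

```go
package main

import (
	"fmt"
	"time"
)

// Clock abstracts the time source so freshness checks can be tested.
type Clock interface {
	Now() time.Time
}

// SystemClock reads the wall clock.
type SystemClock struct{}

func (SystemClock) Now() time.Time { return time.Now() }

// FixedClock always returns the same instant, given in Unix milliseconds.
type FixedClock struct{ Ms int64 }

func (c FixedClock) Now() time.Time { return time.UnixMilli(c.Ms) }

// IsFresh reports whether timestampMs lies within +/- window of the clock.
// Accepting timestamps slightly in the future tolerates client clock skew.
func IsFresh(c Clock, timestampMs int64, window time.Duration) bool {
	skew := c.Now().Sub(time.UnixMilli(timestampMs)).Abs()
	return skew <= window
}

func main() {
	clock := FixedClock{Ms: 1_700_000_000_000}
	window := 30 * time.Second // hypothetical window

	fmt.Println(IsFresh(clock, clock.Ms-5_000, window))  // 5 s old
	fmt.Println(IsFresh(clock, clock.Ms-60_000, window)) // 60 s old
	fmt.Println(IsFresh(clock, clock.Ms+10_000, window)) // 10 s ahead
}
```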

## Error Model and Observability

The gateway should expose stable edge-level error classes instead of leaking
internal implementation details.

Minimum error categories:

- malformed request;
- unsupported protocol;
- unknown session;
- revoked session;
- invalid signature;
- stale request;
- replay detected;
- rate limited;
- downstream unavailable;
- internal error.
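
One possible shape for these categories is a small error type whose client-visible text is only the class, with the underlying cause kept for logs. The string codes and the `EdgeError` and `ClassOf` names are assumptions; the document fixes only the categories themselves.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorClass is a stable, client-visible error category.
type ErrorClass string

const (
	ClassMalformed   ErrorClass = "malformed_request"
	ClassProtocol    ErrorClass = "unsupported_protocol"
	ClassUnknown     ErrorClass = "unknown_session"
	ClassRevoked     ErrorClass = "revoked_session"
	ClassSignature   ErrorClass = "invalid_signature"
	ClassStale       ErrorClass = "stale_request"
	ClassReplay      ErrorClass = "replay_detected"
	ClassRateLimited ErrorClass = "rate_limited"
	ClassDownstream  ErrorClass = "downstream_unavailable"
	ClassInternal    ErrorClass = "internal_error"
)

// EdgeError pairs a stable class with an internal cause.
type EdgeError struct {
	Class ErrorClass
	Cause error // for logs only; never rendered to clients
}

// Error returns only the class so internal details cannot leak.
func (e *EdgeError) Error() string { return string(e.Class) }

func (e *EdgeError) Unwrap() error { return e.Cause }

// ClassOf maps any error to a stable class. Errors that were not classified
// explicitly collapse into internal_error.
func ClassOf(err error) ErrorClass {
	var ee *EdgeError
	if errors.As(err, &ee) {
		return ee.Class
	}
	return ClassInternal
}

// errDialTimeout is an example of an internal error that must not leak.
var errDialTimeout = errors.New("dial tcp 10.0.0.7:5432: i/o timeout")

func main() {
	wrapped := fmt.Errorf("execute: %w", &EdgeError{Class: ClassDownstream, Cause: errDialTimeout})
	fmt.Println(ClassOf(wrapped))
	fmt.Println(ClassOf(errDialTimeout))
}
```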

Observability requirements:

- stable correlation identifiers, including `request_id` and optional `trace_id`;
- structured logs;
- security audit events for rejects and abuse signals;
- metrics keyed by route class, message type, result code, and reject reason;
- no logging of secrets, raw private material, or raw signatures.

## Non-Goals

The gateway is not a business authorization layer and must not grow into a
domain coordinator.

The gateway must not:

- implement business ownership checks;
- validate domain state transitions;
- replace the Auth / Session Service as the session source of truth;
- degrade into a synchronous pass-through that reloads session state for every
  authenticated request.