# Edge Gateway
## Purpose
`Edge Gateway` is the only public ingress for Galaxy Plus clients.
It terminates the external transport and security boundary, enforces edge
policies, and routes verified requests to internal services.
The gateway does not implement domain-specific business logic.
Business validation, authorization, ownership checks, and state transitions
remain inside downstream services.
## Trust Boundary
The gateway sits between untrusted external clients and trusted internal
services.
The gateway is responsible for:
- parsing external transport requests;
- classifying public REST traffic;
- authenticating protected gRPC traffic;
- loading session state from cache;
- verifying request freshness and anti-replay constraints;
- applying edge rate limits and anti-abuse policy;
- building an authenticated internal command context;
- routing verified commands to internal services;
- maintaining authenticated push delivery connections.

The gateway is not responsible for:
- deciding whether a user is allowed to execute a business action;
- validating domain invariants;
- storing the source-of-truth session record;
- implementing business idempotency.
## Transport Matrix
The gateway exposes two external transport classes.

| Transport | Audience | Authentication | Payload format | Primary use |
| --- | --- | --- | --- | --- |
| REST/JSON | Public, unauthenticated traffic | No device session auth | JSON | Public auth commands, health checks, browser/bootstrap traffic |
| gRPC over HTTP/2 | Authenticated clients only | Required | FlatBuffers payload inside protobuf control envelope | Verified commands and push delivery |
### Public REST Surface
The public REST surface is used for commands that must work before a device
session exists and for browser-originated traffic that may share the same edge.
Stable public endpoints:
- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`
- `GET /healthz`
- `GET /readyz`

In addition to the fixed endpoints above, the gateway may front browser
bootstrap or asset traffic through a pluggable public handler or proxy.
That traffic belongs to dedicated public route classes and must not share rate
limit buckets or abuse counters with the public auth API.
### Authenticated gRPC Surface
All authenticated client requests use HTTP/2 and gRPC.
The public gRPC service exposes two methods:
- `ExecuteCommand(ExecuteCommandRequest) returns (ExecuteCommandResponse)`
- `SubscribeEvents(SubscribeEventsRequest) returns (stream GatewayEvent)`

`ExecuteCommand` is a generic unary RPC.
The gateway routes the request downstream by `message_type` after transport
verification succeeds.
`SubscribeEvents` is an authenticated server-streaming RPC.
It binds the stream to `user_id` and `device_session_id` and starts by sending
a service event that carries the current server time in milliseconds
(`server_time_ms`).
## Envelope and Payload Model
The authenticated transport uses a split contract:
- gRPC control messages are protobuf-based;
- business payload bytes are FlatBuffers;
- signatures are computed over canonical envelope fields and a hash of raw
  FlatBuffers bytes.

The gateway treats `payload_bytes` as opaque business data.
It verifies integrity and forwards verified bytes downstream without rewriting
them.
### ExecuteCommandRequest
Required fields:
- `protocol_version`
- `device_session_id`
- `message_type`
- `timestamp_ms`
- `request_id`
- `payload_bytes`
- `payload_hash`
- `signature`

Optional fields:
- `trace_id`
### ExecuteCommandResponse
Required fields:
- `protocol_version`
- `request_id`
- `timestamp_ms`
- `result_code`
- `payload_bytes`
- `payload_hash`
- `signature`
### SubscribeEventsRequest
The stream open request reuses the authenticated request model.
It contains the same authentication fields as the unary request and either an
empty payload or a minimal connect payload.
Required fields:
- `protocol_version`
- `device_session_id`
- `message_type`
- `timestamp_ms`
- `request_id`
- `payload_hash`
- `signature`

Optional fields:
- `payload_bytes`
- `trace_id`
### GatewayEvent
Every stream event is a server-signed message delivered to the client.
Required fields:
- `event_type`
- `event_id`
- `timestamp_ms`
- `payload_bytes`
- `payload_hash`
- `signature`

Optional fields:
- `request_id`
- `trace_id`
## Verification and Routing Pipeline
The gateway applies the same strict verification order to every authenticated
gRPC request, unary or streaming.
1. Parse the control envelope and validate required fields.
2. Check whether `protocol_version` is supported.
3. Resolve `device_session_id` through `SessionCache`.
4. Reject unknown or revoked sessions.
5. Verify that `payload_hash` matches raw `payload_bytes`.
6. Verify the client signature using the public key from session cache.
7. Verify that `timestamp_ms` is inside the accepted freshness window.
8. Verify anti-replay by checking that the `(device_session_id, request_id)` pair has not been seen before.
9. Apply authenticated rate limit and edge policy checks.
10. Build the authenticated internal command context.
11. Route the command downstream by `message_type`.

No downstream business service should receive a request that has not passed
this full verification pipeline.
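
The fixed ordering of steps 1 to 10 can be sketched as a single function that returns on the first failing check. Everything named here (`Checks`, `VerifyRequest`, the error variables) is hypothetical; the individual checks are injected as plain functions so that only the ordering is shown. Mapping a payload hash mismatch to `ErrInvalidSignature` is an assumption, since this document does not name a category for it.

```go
package main

import "errors"

// Reject reasons, named after the edge error categories.
var (
	ErrMalformed           = errors.New("malformed request")
	ErrUnsupportedProtocol = errors.New("unsupported protocol")
	ErrUnknownSession      = errors.New("unknown session")
	ErrRevokedSession      = errors.New("revoked session")
	ErrInvalidSignature    = errors.New("invalid signature")
	ErrStaleRequest        = errors.New("stale request")
	ErrReplayDetected      = errors.New("replay detected")
	ErrRateLimited         = errors.New("rate limited")
)

type Request struct {
	ProtocolVersion string
	DeviceSessionID string
	MessageType     string
	RequestID       string
	TimestampMs     int64
	Payload         []byte
	PayloadHash     []byte
	Signature       []byte
}

type Session struct {
	UserID  string
	Revoked bool
}

// Checks holds one function per dependency (all names are hypothetical).
type Checks struct {
	VersionSupported func(version string) bool
	LookupSession    func(deviceSessionID string) (Session, bool)
	HashMatches      func(r Request) bool
	SignatureValid   func(s Session, r Request) bool
	Fresh            func(timestampMs int64) bool
	FirstSeen        func(deviceSessionID, requestID string) bool
	WithinLimits     func(s Session, r Request) bool
}

// Command is the authenticated internal context handed to the router.
type Command struct {
	UserID          string
	DeviceSessionID string
	MessageType     string
	RequestID       string
	Payload         []byte
}

// VerifyRequest runs steps 1-10 in order and stops at the first failure.
// Step 11 (routing) is left to the caller.
func VerifyRequest(c Checks, r Request) (Command, error) {
	if r.ProtocolVersion == "" || r.DeviceSessionID == "" || r.MessageType == "" ||
		r.RequestID == "" || len(r.PayloadHash) == 0 || len(r.Signature) == 0 {
		return Command{}, ErrMalformed // step 1
	}
	if !c.VersionSupported(r.ProtocolVersion) {
		return Command{}, ErrUnsupportedProtocol // step 2
	}
	s, ok := c.LookupSession(r.DeviceSessionID) // step 3
	if !ok {
		return Command{}, ErrUnknownSession // step 4
	}
	if s.Revoked {
		return Command{}, ErrRevokedSession // step 4
	}
	if !c.HashMatches(r) {
		return Command{}, ErrInvalidSignature // step 5 (category is an assumption)
	}
	if !c.SignatureValid(s, r) {
		return Command{}, ErrInvalidSignature // step 6
	}
	if !c.Fresh(r.TimestampMs) {
		return Command{}, ErrStaleRequest // step 7
	}
	// Step 8 runs only after the signature is verified, so unauthenticated
	// traffic cannot fill the replay store.
	if !c.FirstSeen(r.DeviceSessionID, r.RequestID) {
		return Command{}, ErrReplayDetected
	}
	if !c.WithinLimits(s, r) {
		return Command{}, ErrRateLimited // step 9
	}
	return Command{ // step 10
		UserID:          s.UserID,
		DeviceSessionID: r.DeviceSessionID,
		MessageType:     r.MessageType,
		RequestID:       r.RequestID,
		Payload:         r.Payload,
	}, nil
}
```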
## Internal Authenticated Contract
Downstream services should receive an internal authenticated command rather than
raw external gRPC transport data.
The minimum authenticated context is:
- `user_id`
- `device_session_id`
- `message_type`
- verified `payload_bytes`
- `request_id`
- optional `trace_id`
- optional client metadata needed for logs and tracing

Downstream services may trust that the gateway has already performed transport
authentication, freshness verification, and anti-replay checks.
They must still perform business authorization and domain validation.
## Session Model
The Auth / Session Service is the source of truth for device session state.
The gateway authenticates hot-path requests from its session cache instead of
calling the Auth / Session Service on every request.
Expected session fields available to the gateway:
- `device_session_id`
- `user_id`
- client public key
- session status
- revoke metadata
- optional client metadata
### Session Cache
`SessionCache` provides the fast path for:
- session existence checks;
- `device_session_id -> user_id`;
- access to the client public key used for signature verification;
- revoked versus active status checks.

Cache updates are event-driven.
TTL is allowed only as a safety net and must not replace invalidation events.
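
A minimal in-memory sketch of such a cache is shown below, under stated assumptions: the type and method names are hypothetical, time is passed in as milliseconds to keep the example deterministic, and revoked records stay in the cache so that the gateway can tell a revoked session from an unknown one. What happens on a cache miss is outside the scope of this sketch.

```go
package main

import "sync"

type SessionStatus int

const (
	StatusActive SessionStatus = iota
	StatusRevoked
)

// SessionRecord carries the fields the gateway needs on the hot path.
type SessionRecord struct {
	DeviceSessionID string
	UserID          string
	PublicKey       []byte
	Status          SessionStatus
}

type cacheEntry struct {
	record      SessionRecord
	expiresAtMs int64 // TTL safety net only
}

// MemorySessionCache is a hypothetical in-process SessionCache.
type MemorySessionCache struct {
	mu      sync.RWMutex
	ttlMs   int64
	entries map[string]cacheEntry
}

func NewMemorySessionCache(ttlMs int64) *MemorySessionCache {
	return &MemorySessionCache{ttlMs: ttlMs, entries: make(map[string]cacheEntry)}
}

// Apply stores the record carried by a session update or revoke event.
// This is the primary way entries change; the TTL is refreshed as a side effect.
func (c *MemorySessionCache) Apply(rec SessionRecord, nowMs int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[rec.DeviceSessionID] = cacheEntry{record: rec, expiresAtMs: nowMs + c.ttlMs}
}

// Get returns the cached record. Entries past their TTL are treated as misses
// so that a lost invalidation event cannot keep a record alive forever.
func (c *MemorySessionCache) Get(deviceSessionID string, nowMs int64) (SessionRecord, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[deviceSessionID]
	if !ok || nowMs >= e.expiresAtMs {
		return SessionRecord{}, false
	}
	return e.record, true
}
```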
### Revocation Behavior
When a device session is revoked:
1. the Auth / Session Service updates the source of truth;
2. it publishes a session update or revoke event;
3. the gateway invalidates or updates `SessionCache`;
4. new unary gRPC requests for that session are rejected;
5. active `SubscribeEvents` streams for that session are closed.
## Public Anti-Abuse Model
The public REST layer must distinguish between public auth operations and
browser-originated traffic that may burst during a normal first page load.
The gateway uses these public route classes:
- `public_auth`
- `browser_bootstrap`
- `browser_asset`
- `public_misc`
### Public Auth
`public_auth` includes `send-email-code` and `confirm-email-code`.
This class uses stricter limits and abuse scoring because it directly touches
account and session creation flows.
Controls include:
- per-IP and per-identity rate limits;
- request body size limits;
- method allow-lists;
- malformed request counters;
- elevated logging and security telemetry for repeated failures.
### Browser Bootstrap and Asset Traffic
`browser_bootstrap` and `browser_asset` use separate coarse-grained budgets.
They may exhibit bursty behavior during the first load and therefore must not
be treated as hostile based on burst pattern alone.
This traffic is still constrained by:
- dedicated rate limits;
- method allow-lists;
- body size limits where request bodies are expected;
- protocol and path validation;
- independent abuse telemetry.

The gateway must not merge these buckets or counters with `public_auth`.
## Push Delivery Model
The v1 push channel is a gRPC server stream.
Long-polling is intentionally out of scope for the first version.
Expected stream behavior:
1. the client opens `SubscribeEvents`;
2. the gateway applies the full authenticated ingress verification pipeline;
3. the stream is bound to `user_id` and `device_session_id`;
4. the first service event includes `server_time_ms`;
5. client-facing events from internal pub/sub are fanned out to matching active
streams;
6. revoke events close affected streams.
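
Steps 3, 5, and 6 can be sketched with a small hub that tracks streams, fans events out by `user_id`, and closes streams on revocation. This is an illustrative sketch only: the `Hub`, `Stream`, and `Event` types are hypothetical, events are assumed to target a user rather than a single device, and dropping events for a full buffer is an assumed policy. Signing and the initial `server_time_ms` event are omitted.

```go
package main

import "sync"

// Event is a client-facing event addressed to a user (assumption).
type Event struct {
	UserID  string
	Type    string
	Payload []byte
}

// Stream represents one active SubscribeEvents call.
type Stream struct {
	UserID          string
	DeviceSessionID string
	Events          chan Event // closed by the hub when the stream ends
}

type Hub struct {
	mu      sync.Mutex
	streams map[*Stream]struct{}
}

func NewHub() *Hub { return &Hub{streams: make(map[*Stream]struct{})} }

// Subscribe binds a new stream to the authenticated identity.
func (h *Hub) Subscribe(userID, deviceSessionID string, buffer int) *Stream {
	s := &Stream{UserID: userID, DeviceSessionID: deviceSessionID, Events: make(chan Event, buffer)}
	h.mu.Lock()
	h.streams[s] = struct{}{}
	h.mu.Unlock()
	return s
}

// remove closes and forgets a stream. The caller must hold h.mu.
func (h *Hub) remove(s *Stream) bool {
	if _, ok := h.streams[s]; !ok {
		return false // already removed; avoids closing the channel twice
	}
	delete(h.streams, s)
	close(s.Events)
	return true
}

// Unsubscribe is called when the client disconnects.
func (h *Hub) Unsubscribe(s *Stream) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.remove(s)
}

// Publish fans an event out to every active stream of the target user and
// returns how many streams accepted it.
func (h *Hub) Publish(ev Event) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for s := range h.streams {
		if s.UserID != ev.UserID {
			continue
		}
		select {
		case s.Events <- ev:
			delivered++
		default: // buffer full: drop instead of blocking other streams (assumption)
		}
	}
	return delivered
}

// RevokeSession closes every stream bound to the revoked device session.
func (h *Hub) RevokeSession(deviceSessionID string) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	closed := 0
	for s := range h.streams {
		if s.DeviceSessionID == deviceSessionID && h.remove(s) {
			closed++
		}
	}
	return closed
}
```

Sends and closes happen under the same mutex, so a revoke can never race with a send on a closed channel. The gRPC handler would read from `Events` and end the RPC when the channel closes.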
## Recommended Package Layout
The initial package layout should keep transport, policy, and downstream
adapters separate:
- `cmd/gateway`
- `internal/app`
- `internal/config`
- `internal/restapi`
- `internal/grpcapi`
- `internal/authn`
- `internal/session`
- `internal/replay`
- `internal/ratelimit`
- `internal/downstream`
- `internal/push`
- `internal/events`
- `internal/clock`
## Key Interfaces
The gateway should be built around explicit consumer-side interfaces.
### SessionCache
Provides cached session lookup by `device_session_id`.
Returns enough data to verify signatures and identify the authenticated user.
### ReplayStore
Tracks recently seen `request_id` values per device session and rejects replayed
requests inside the accepted freshness window.
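
A minimal in-memory sketch, assuming a single gateway instance and hypothetical names throughout. The retention period is a parameter because it depends on how the freshness window is defined: an entry must be kept for at least as long as its `timestamp_ms` could still be accepted, otherwise a replay could slip through after the entry is purged.

```go
package main

import "sync"

// replayKey uses a struct rather than string concatenation so that
// ("ab", "c") and ("a", "bc") can never collide.
type replayKey struct {
	deviceSessionID string
	requestID       string
}

// MemoryReplayStore is a hypothetical in-process ReplayStore.
type MemoryReplayStore struct {
	mu          sync.Mutex
	retentionMs int64
	seen        map[replayKey]int64 // value: expiry time in milliseconds
}

func NewMemoryReplayStore(retentionMs int64) *MemoryReplayStore {
	return &MemoryReplayStore{retentionMs: retentionMs, seen: make(map[replayKey]int64)}
}

// FirstSeen records the pair and reports whether it was new.
// Check and insert happen under one lock, so two concurrent copies of the
// same request cannot both pass.
func (s *MemoryReplayStore) FirstSeen(deviceSessionID, requestID string, nowMs int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	// Purge expired entries. A full scan keeps the sketch short; a real
	// store would expire entries incrementally.
	for k, expiresAtMs := range s.seen {
		if nowMs >= expiresAtMs {
			delete(s.seen, k)
		}
	}

	k := replayKey{deviceSessionID: deviceSessionID, requestID: requestID}
	if _, duplicate := s.seen[k]; duplicate {
		return false
	}
	s.seen[k] = nowMs + s.retentionMs
	return true
}
```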
### RateLimiter
Applies independent policies for:
- public REST route classes;
- authenticated gRPC requests by IP;
- authenticated gRPC requests by session;
- authenticated gRPC requests by user;
- authenticated gRPC requests by message class.
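
One way to keep these policies independent is a keyed token bucket with one limiter instance per dimension. The sketch below is an assumption-laden illustration: token bucket is a swapped-in technique that this document does not prescribe, all names are hypothetical, and charging every dimension even when one of them denies the request is a design choice, not a stated requirement.

```go
package main

import "sync"

type bucket struct {
	tokens float64
	lastMs int64
}

// Limiter is a keyed token bucket: each key gets its own bucket.
type Limiter struct {
	mu         sync.Mutex
	ratePerSec float64
	burst      float64
	buckets    map[string]*bucket
}

func NewLimiter(ratePerSec, burst float64) *Limiter {
	return &Limiter{ratePerSec: ratePerSec, burst: burst, buckets: make(map[string]*bucket)}
}

// Allow consumes one token for key if available.
func (l *Limiter) Allow(key string, nowMs int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{tokens: l.burst, lastMs: nowMs} // new keys start with a full bucket
		l.buckets[key] = b
	}
	if elapsed := nowMs - b.lastMs; elapsed > 0 {
		b.tokens += float64(elapsed) / 1000 * l.ratePerSec
		if b.tokens > l.burst {
			b.tokens = l.burst
		}
		b.lastMs = nowMs
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens -= 1
	return true
}

// AuthLimits groups one limiter per authenticated dimension.
type AuthLimits struct {
	ByIP, BySession, ByUser, ByMessageClass *Limiter
}

// Allow evaluates every dimension so that each bucket accounts for the
// attempt, then admits the request only if all of them agree.
func (a AuthLimits) Allow(ip, deviceSessionID, userID, messageClass string, nowMs int64) bool {
	okIP := a.ByIP.Allow(ip, nowMs)
	okSession := a.BySession.Allow(deviceSessionID, nowMs)
	okUser := a.ByUser.Allow(userID, nowMs)
	okClass := a.ByMessageClass.Allow(messageClass, nowMs)
	return okIP && okSession && okUser && okClass
}
```

The public REST route classes would use their own `Limiter` instances, which keeps their buckets separate from the authenticated ones by construction.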
### PublicTrafficClassifier
Maps incoming public REST requests to one of the public route classes so that
limits and anti-abuse counters remain isolated.
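
A minimal classifier sketch follows. Only the `/api/v1/public/auth/` prefix comes from this document; the `/assets/` prefix, the `/` and `/index.html` bootstrap paths, and placing `/healthz` and `/readyz` in `public_misc` are assumptions made for illustration.

```go
package main

import "strings"

type RouteClass string

const (
	PublicAuth       RouteClass = "public_auth"
	BrowserBootstrap RouteClass = "browser_bootstrap"
	BrowserAsset     RouteClass = "browser_asset"
	PublicMisc       RouteClass = "public_misc"
)

// Classify maps a request path to a public route class.
// It looks at the path only: a request with a disallowed method still counts
// against the bucket and abuse counters of the class it targets.
func Classify(path string) RouteClass {
	switch {
	case strings.HasPrefix(path, "/api/v1/public/auth/"):
		return PublicAuth
	case strings.HasPrefix(path, "/assets/"): // hypothetical asset prefix
		return BrowserAsset
	case path == "/" || path == "/index.html": // hypothetical bootstrap paths
		return BrowserBootstrap
	default:
		return PublicMisc // includes /healthz and /readyz (assumption)
	}
}
```

The returned class can then serve as the key for rate limit buckets and abuse counters, so that a burst of asset requests never consumes the `public_auth` budget.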
### AuthServiceClient
Handles public auth commands and session-related updates exchanged with the
Auth / Session Service.
### DownstreamRouter
Resolves the target downstream service or adapter by `message_type`.
### DownstreamClient
Executes a verified authenticated command against a downstream internal service
and returns response payload bytes plus a stable result code.
### EventSubscriber
Subscribes to internal pub/sub topics used for:
- session cache updates;
- revocations;
- client-facing event delivery.
### PushHub
Tracks active `SubscribeEvents` streams, binds them to authenticated identities,
and delivers events to the correct connections.
### ResponseSigner
Signs unary responses and stream events so clients can verify server-originated
messages.
### Clock
Provides current server time and supports consistent freshness-window checks.
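
A small sketch of why the clock is an interface: the freshness check can be tested against a fixed time. The names are hypothetical, and the symmetric window (accepting timestamps slightly ahead of server time as well as behind it) is an assumption, since this document does not define the window shape.

```go
package main

import "time"

// Clock abstracts the time source used by freshness checks.
type Clock interface {
	NowMs() int64
}

// SystemClock reads the wall clock.
type SystemClock struct{}

func (SystemClock) NowMs() int64 { return time.Now().UnixMilli() }

// FixedClock always returns the same instant; intended for tests.
type FixedClock int64

func (c FixedClock) NowMs() int64 { return int64(c) }

// Fresh reports whether timestampMs lies within windowMs of the current
// server time, in either direction.
func Fresh(c Clock, timestampMs, windowMs int64) bool {
	delta := c.NowMs() - timestampMs
	if delta < 0 {
		delta = -delta
	}
	return delta <= windowMs
}
```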
## Error Model and Observability
The gateway should expose stable edge-level error classes instead of leaking
internal implementation details.
Minimum error categories:
- malformed request;
- unsupported protocol;
- unknown session;
- revoked session;
- invalid signature;
- stale request;
- replay detected;
- rate limited;
- downstream unavailable;
- internal error.

Observability requirements:
- stable correlation identifiers, including `request_id` and optional `trace_id`;
- structured logs;
- security audit events for rejects and abuse signals;
- metrics keyed by route class, message type, result code, and reject reason;
- no logging of secrets, raw private material, or raw signatures.
## Non-Goals
The gateway is not a business authorization layer and must not grow into a
domain coordinator.
The gateway must not:
- implement business ownership checks;
- validate domain state transitions;
- replace the Auth / Session Service as the session source of truth;
- degrade into a synchronous pass-through that reloads session state for every
authenticated request.