feat: backend service
+18
-42
@@ -7,20 +7,20 @@ sequenceDiagram
participant Client
participant Gateway
participant Limiter as Public anti-abuse
participant Auth as AuthServiceClient
participant Backend as backendclient.RESTClient

Client->>Gateway: POST /api/v1/public/auth/send-email-code
Gateway->>Limiter: classify + rate-limit + body checks
Limiter-->>Gateway: allowed
Gateway->>Auth: SendEmailCode(email)
Auth-->>Gateway: challenge_id
Gateway->>Backend: POST /api/v1/public/auth/send-email-code
Backend-->>Gateway: 200 {challenge_id}
Gateway-->>Client: 200 {challenge_id}

Client->>Gateway: POST /api/v1/public/auth/confirm-email-code
Gateway->>Limiter: classify + rate-limit + body checks
Limiter-->>Gateway: allowed
Gateway->>Auth: ConfirmEmailCode(challenge_id, code, client_public_key, time_zone)
Auth-->>Gateway: device_session_id
Gateway->>Backend: POST /api/v1/public/auth/confirm-email-code
Backend-->>Gateway: 200 {device_session_id}
Gateway-->>Client: 200 {device_session_id}
```

@@ -30,15 +30,14 @@ sequenceDiagram
sequenceDiagram
participant Client
participant Gateway
participant Cache as SessionCache
participant Backend as backendclient.RESTClient
participant Replay as ReplayStore
participant Policy as Rate limit / policy
participant Downstream

Client->>Gateway: ExecuteCommand(envelope, payload_bytes, signature)
Gateway->>Gateway: validate envelope + protocol_version
Gateway->>Cache: lookup(device_session_id)
Cache-->>Gateway: session record
Gateway->>Backend: GET /api/v1/internal/sessions/{device_session_id}
Backend-->>Gateway: session record
Gateway->>Gateway: verify payload_hash
Gateway->>Gateway: verify Ed25519 signature
Gateway->>Gateway: verify freshness window
@@ -46,57 +45,34 @@ sequenceDiagram
Replay-->>Gateway: accepted
Gateway->>Policy: apply IP/session/user/message_type budgets
Policy-->>Gateway: allowed
Gateway->>Downstream: verified authenticated command
Downstream-->>Gateway: result_code + payload_bytes
Gateway->>Backend: PATCH/POST/GET /api/v1/user/...
Backend-->>Gateway: JSON success or error
Gateway->>Gateway: hash payload + sign response
Gateway-->>Client: ExecuteCommandResponse + signature
```

## Direct Gateway -> User Self-Service Flow

```mermaid
sequenceDiagram
participant Client
participant Gateway
participant User as User Service

Client->>Gateway: ExecuteCommand(user.account.get | user.profile.update | user.settings.update)
Gateway->>Gateway: verify envelope + session + signature + replay
Gateway->>Gateway: decode FlatBuffers payload
Gateway->>User: trusted REST/JSON internal request
User-->>Gateway: JSON account aggregate or JSON error envelope
Gateway->>Gateway: encode FlatBuffers success or error payload
Gateway->>Gateway: sign response
Gateway-->>Client: ExecuteCommandResponse(result_code, payload_bytes, signature)
```

## SubscribeEvents Lifecycle

```mermaid
sequenceDiagram
participant Client
participant Gateway
participant Cache as SessionCache
participant Replay as ReplayStore
participant Backend as backend Push.SubscribePush
participant Hub as PushHub
participant Stream as Client event stream
participant Sess as Session event stream
participant Dispatcher

Client->>Gateway: SubscribeEvents(envelope, signature)
Gateway->>Gateway: validate envelope + verify request
Gateway->>Cache: lookup(device_session_id)
Cache-->>Gateway: session record
Gateway->>Replay: reserve(device_session_id, request_id, ttl)
Replay-->>Gateway: accepted
Gateway->>Gateway: lookup session via backend REST
Gateway->>Client: gateway.server_time event
Gateway->>Hub: register(user_id, device_session_id)

Stream-->>Gateway: client-facing event for user_id / device_session_id
Gateway->>Hub: publish signed event
Hub-->>Client: matching event delivery
Backend-->>Dispatcher: PushEvent{ClientEvent}
Dispatcher->>Hub: Publish(push.Event)
Hub-->>Client: matching event delivery (signed envelope)

Sess-->>Gateway: revoked session snapshot
Gateway->>Hub: revoke(device_session_id)
Backend-->>Dispatcher: PushEvent{SessionInvalidation}
Dispatcher->>Hub: RevokeDeviceSession or RevokeAllForUser
Hub-->>Client: stream closes with FAILED_PRECONDITION

Note over Gateway,Hub: During shutdown the gateway closes PushHub before gRPC graceful stop.

@@ -1,43 +1,33 @@
# Decision: Redis configuration shape

PG_PLAN.md §7. Captures the standing rules adopted by Edge Gateway when it
joined the project-wide Redis topology defined in
`ARCHITECTURE.md §Persistence Backends`.
Captures the standing rules adopted by Edge Gateway when it joined the
project-wide Redis topology described in `ARCHITECTURE.md`.

## Context

Gateway intentionally stays Redis-only. All gateway state Redis serves is
TTL-bounded or runtime-coordination state:
Gateway intentionally stays Redis-light. The only Redis state served by
gateway is the replay reservation namespace (short-lived `SETNX` per
authenticated request, bounded by
`GATEWAY_REPLAY_REDIS_RESERVE_TIMEOUT`). Session lookup goes through
backend's REST surface, and inbound events are delivered through the
gRPC `Push.SubscribePush` consumer (see
`gateway/internal/backendclient`).

- the session cache is a read-through projection of authsession's
source-of-truth session records (rebuildable via re-authentication);
- the replay store is a short-lived `SETNX` reservation namespace per
authenticated request (`GATEWAY_REPLAY_REDIS_RESERVE_TIMEOUT`);
- the session-events stream is a runtime fan-out of session lifecycle
updates;
- the client-events stream is a runtime push fan-out.

Stage 7 brought gateway in line with the steady-state rules established in
Stage 0: every Galaxy service uses one master plus zero-or-more replicas
with a mandatory password, no TLS, and no Redis ACL username; the connection
is configured by the shared `pkg/redisconn` helper.
The shared rule is: every Galaxy service uses one master plus
zero-or-more replicas with a mandatory password, no TLS, and no Redis
ACL username; the connection is configured by the shared
`pkg/redisconn` helper.

## Decisions

### One shared `*redis.Client` owned by the runtime

`cmd/gateway/main.go` constructs a single `*redis.Client` via
`internal/redisclient.NewClient`, attaches OpenTelemetry tracing and metrics
via `internal/redisclient.InstrumentClient`, performs one bounded `PING`
via `internal/redisclient.Ping`, and registers `client.Close` for shutdown.
The session cache, replay store, session-events subscriber, and
client-events subscriber all receive this same client.

Adapters no longer build or own a Redis client. Their `Config` structs hold
only behavior settings (key prefix, stream name, per-subsystem timeouts).
Adapter constructors take `(*redis.Client, …)`. The stream subscribers'
`Close`/`Shutdown` methods became no-ops; the runtime's context cancellation
unblocks the `XRead` loop and the runtime closes the shared client.
`internal/redisclient.NewClient`, attaches OpenTelemetry tracing and
metrics via `internal/redisclient.InstrumentClient`, performs one
bounded `PING` via `internal/redisclient.Ping`, and registers
`client.Close` for shutdown. The replay store is the only adapter
backed by Redis.

### One env-var prefix for the connection

@@ -51,17 +41,10 @@ Connection topology is loaded from a single `GATEWAY_REDIS_*` group via
- `GATEWAY_REDIS_DB` (default `0`)
- `GATEWAY_REDIS_OPERATION_TIMEOUT` (default `250ms`)

Per-subsystem behavior env vars keep their existing prefixes — they do not
describe connection topology, only namespace and timing:
Per-subsystem behavior env vars (namespace and timing only):

- `GATEWAY_SESSION_CACHE_REDIS_KEY_PREFIX`,
`GATEWAY_SESSION_CACHE_REDIS_LOOKUP_TIMEOUT`
- `GATEWAY_REPLAY_REDIS_KEY_PREFIX`,
`GATEWAY_REPLAY_REDIS_RESERVE_TIMEOUT`
- `GATEWAY_SESSION_EVENTS_REDIS_STREAM`,
`GATEWAY_SESSION_EVENTS_REDIS_READ_BLOCK_TIMEOUT`
- `GATEWAY_CLIENT_EVENTS_REDIS_STREAM`,
`GATEWAY_CLIENT_EVENTS_REDIS_READ_BLOCK_TIMEOUT`

### Retired env vars (hard removal)

@@ -96,11 +79,8 @@ downstream dashboards will start populating without further changes.

## Consequences

- Gateway test code that previously constructed a Redis client per adapter
must now construct one client and pass it to every adapter under test
(see `internal/session/redis_test.go`, `internal/replay/redis_test.go`,
`internal/events/subscriber_test.go`,
`internal/events/client_subscriber_test.go`).
- Gateway test code constructs one shared client and passes it to the
replay-store adapter under test (see `internal/replay/redis_test.go`).
- Operators must set `GATEWAY_REDIS_PASSWORD`. A passwordless local Redis
is still acceptable as long as a placeholder password is supplied to the
binary; Redis without `requirepass` accepts AUTH unconditionally.

+39
-34
@@ -7,28 +7,30 @@ readiness, shutdown, and push or revoke incidents.

Before starting the process, confirm:

- `GATEWAY_REDIS_MASTER_ADDR` and `GATEWAY_REDIS_PASSWORD` point to the Redis
deployment used for session lookup, replay reservations, session-events
consumption, and client-events fan-out. Optional read replicas may be
listed in `GATEWAY_REDIS_REPLICA_ADDRS` (currently unused; reserved for
future read-routing).
- `GATEWAY_SESSION_EVENTS_REDIS_STREAM` and
`GATEWAY_CLIENT_EVENTS_REDIS_STREAM` reference existing Redis Stream keys
or the names publishers will use.
- `GATEWAY_RESPONSE_SIGNER_PRIVATE_KEY_PEM_PATH` points to a readable PKCS#8
PEM-encoded Ed25519 private key.
- `GATEWAY_REDIS_MASTER_ADDR` and `GATEWAY_REDIS_PASSWORD` point to the
Redis deployment used for anti-replay reservations. Optional read
replicas may be listed in `GATEWAY_REDIS_REPLICA_ADDRS` (currently
unused; reserved for future read-routing).
- `GATEWAY_BACKEND_HTTP_URL`, `GATEWAY_BACKEND_GRPC_PUSH_URL`, and
`GATEWAY_BACKEND_GATEWAY_CLIENT_ID` describe the consolidated backend
service the gateway forwards every public auth and authenticated
user/lobby request to and the gRPC push subscription it opens.
- `GATEWAY_RESPONSE_SIGNER_PRIVATE_KEY_PEM_PATH` points to a readable
PKCS#8 PEM-encoded Ed25519 private key.
- the configured Redis DB and key-prefix settings match the target
environment. Per `ARCHITECTURE.md §Persistence Backends`, Redis traffic is
password-protected and TLS is disabled by policy; the deprecated
`GATEWAY_REDIS_TLS_ENABLED` and `GATEWAY_REDIS_USERNAME` variables are no
longer accepted and cause a hard fail at startup.
environment. Per `ARCHITECTURE.md §Persistence Backends`, Redis traffic
is password-protected and TLS is disabled by policy; the deprecated
`GATEWAY_REDIS_TLS_ENABLED` and `GATEWAY_REDIS_USERNAME` variables are
no longer accepted and cause a hard fail at startup.
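
The Redis settings in the checklist above, including the hard fail on retired variables, might be loaded along the following lines. This is a hedged sketch rather than the actual loader: `RedisConfig`, `LoadRedisConfig`, and `MapLookup` are hypothetical names, and the comma-separated format for `GATEWAY_REDIS_REPLICA_ADDRS` is an assumption. The variable names and the `0` / `250ms` defaults are the ones stated in these docs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// RedisConfig is a hypothetical holder for the GATEWAY_REDIS_* group.
type RedisConfig struct {
	MasterAddr       string
	ReplicaAddrs     []string
	Password         string
	DB               int
	OperationTimeout time.Duration
}

// MapLookup adapts a map to the os.LookupEnv signature (handy in tests).
func MapLookup(env map[string]string) func(string) (string, bool) {
	return func(name string) (string, bool) {
		v, ok := env[name]
		return v, ok
	}
}

// LoadRedisConfig reads the connection settings through lookup, which has
// the same signature as os.LookupEnv.
func LoadRedisConfig(lookup func(string) (string, bool)) (RedisConfig, error) {
	// Retired variables are rejected outright instead of being ignored.
	for _, name := range []string{"GATEWAY_REDIS_TLS_ENABLED", "GATEWAY_REDIS_USERNAME"} {
		if _, set := lookup(name); set {
			return RedisConfig{}, fmt.Errorf("%s is retired and no longer accepted", name)
		}
	}
	cfg := RedisConfig{DB: 0, OperationTimeout: 250 * time.Millisecond}
	addr, ok := lookup("GATEWAY_REDIS_MASTER_ADDR")
	if !ok || addr == "" {
		return RedisConfig{}, fmt.Errorf("GATEWAY_REDIS_MASTER_ADDR is required")
	}
	cfg.MasterAddr = addr
	pass, ok := lookup("GATEWAY_REDIS_PASSWORD")
	if !ok || pass == "" {
		return RedisConfig{}, fmt.Errorf("GATEWAY_REDIS_PASSWORD is required")
	}
	cfg.Password = pass
	if v, ok := lookup("GATEWAY_REDIS_REPLICA_ADDRS"); ok && v != "" {
		cfg.ReplicaAddrs = strings.Split(v, ",") // separator is an assumption
	}
	if v, ok := lookup("GATEWAY_REDIS_DB"); ok {
		n, err := strconv.Atoi(v)
		if err != nil {
			return RedisConfig{}, fmt.Errorf("GATEWAY_REDIS_DB: %w", err)
		}
		cfg.DB = n
	}
	if v, ok := lookup("GATEWAY_REDIS_OPERATION_TIMEOUT"); ok {
		d, err := time.ParseDuration(v)
		if err != nil {
			return RedisConfig{}, fmt.Errorf("GATEWAY_REDIS_OPERATION_TIMEOUT: %w", err)
		}
		cfg.OperationTimeout = d
	}
	return cfg, nil
}
```

In a real binary the call would be `LoadRedisConfig(os.LookupEnv)`. Taking the lookup function as a parameter keeps the sketch testable without mutating the process environment.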

At startup the process opens one shared `*redis.Client` (instrumented via
OpenTelemetry tracing and metrics) and performs one bounded `PING`. The
session cache, replay store, session-events subscriber, and client-events
subscriber all use that client.
At startup the process opens one shared `*redis.Client` (instrumented
via OpenTelemetry tracing and metrics) and performs one bounded `PING`
for the replay store. It also dials backend's gRPC push listener and
opens one `Push.SubscribePush` stream that reconnects with capped
exponential backoff on failure.

Startup fails fast if the ping fails or if the signer key cannot be loaded.
Startup fails fast if the Redis ping fails, the backend URL is
malformed, or the signer key cannot be loaded.

Expected listener state after a healthy start:

@@ -96,13 +98,15 @@ During planned restarts:

If a revoked session still sends traffic or keeps an active stream:

1. verify that the auth/session side published a session snapshot with the
same `device_session_id` and `status=revoked`;
2. verify that the event was written to
`GATEWAY_SESSION_EVENTS_REDIS_STREAM`;
3. verify the gateway is connected to the same Redis address, DB, and stream;
4. confirm the snapshot fields are complete and well-formed;
5. check that a later active snapshot did not overwrite the revoked one.
1. verify that backend recorded the revocation (the
`/api/v1/internal/sessions/{id}` lookup must return `status=revoked`
for that device session);
2. verify that backend emitted the corresponding `session_invalidation`
frame on `Push.SubscribePush` and that the gateway logs a
matching subscription closure;
3. verify the gateway is connected to the same backend instance via
`GATEWAY_BACKEND_HTTP_URL` / `GATEWAY_BACKEND_GRPC_PUSH_URL`;
4. confirm the next authenticated request from that session is rejected.

Expected gateway behavior after the revoke snapshot is consumed:

@@ -116,16 +120,17 @@ Expected gateway behavior after the revoke snapshot is consumed:
If a client reports missing push events:

1. confirm that the client successfully opened `SubscribeEvents`;
2. confirm the stream received the initial `gateway.server_time` bootstrap
event;
3. confirm the gateway consumed the expected entry from
`GATEWAY_CLIENT_EVENTS_REDIS_STREAM`;
4. verify `user_id` and optional `device_session_id` in the stream entry match
the intended target;
2. confirm the stream received the initial `gateway.server_time`
bootstrap event;
3. confirm the gateway consumed the expected `pushv1.PushEvent` from
backend (look for `push_dispatcher` log lines or
`grpc_push_events_total` increments on the backend side);
4. verify `user_id` and optional `device_session_id` on the
`ClientEvent` match the intended target;
5. confirm the event payload fields are well-formed and not dropped as
malformed;
6. check whether the stream was closed earlier because of revoke, shutdown, or
overflow.
6. check whether the stream was closed earlier because of revoke,
shutdown, or overflow.

### Stream Closed Unexpectedly

+22
-23
@@ -14,48 +14,47 @@ flowchart LR
PublicHTTP["Public HTTP listener\n/healthz /readyz /api/v1/public/auth/*"]
AuthGRPC["Authenticated gRPC listener\nExecuteCommand / SubscribeEvents"]
AdminHTTP["Optional admin HTTP listener\n/metrics"]
SessionSnap["In-memory session snapshot cache"]
BackendREST["backendclient.RESTClient\nsessions + public auth + user/lobby"]
BackendPush["backendclient.PushClient\nSubscribePush consumer"]
Replay["Replay reservation client"]
PushHub["PushHub"]
SessSub["Session event subscriber"]
ClientSub["Client event subscriber"]
Dispatcher["Push event dispatcher"]
Telemetry["Logs, traces, metrics"]
end

Public --> PublicHTTP
Authd --> AuthGRPC
AuthGRPC --> SessionSnap
PublicHTTP --> BackendREST
AuthGRPC --> BackendREST
AuthGRPC --> Replay
AuthGRPC --> PushHub
SessSub --> SessionSnap
SessSub --> PushHub
ClientSub --> PushHub
BackendPush --> Dispatcher
Dispatcher --> PushHub
PublicHTTP --> Telemetry
AuthGRPC --> Telemetry
AdminHTTP --> Telemetry

Redis["Redis\nsession records + replay keys + streams"]
AuthSvc["Auth / Session Service"]
Downstream["Downstream business services"]
Redis["Redis\nanti-replay reservations only"]
Backend["backend service\nHTTP + gRPC"]
Metrics["Prometheus / OTLP collectors"]

PublicHTTP -. public auth adapter .-> AuthSvc
SessionSnap --> Redis
BackendREST --> Backend
BackendPush --> Backend
Replay --> Redis
SessSub --> Redis
ClientSub --> Redis
AuthGRPC --> Downstream
Telemetry --> Metrics
```

Notes:

- `cmd/gateway` refuses startup when Redis connectivity or the response signer
is misconfigured.
- `cmd/gateway` refuses startup when Redis connectivity, the backend endpoint,
or the response signer is misconfigured.
- Session lookup is synchronous: every authenticated gRPC request triggers one
`GET /api/v1/internal/sessions/{id}` call to backend; there is no
process-local projection.
- `backendclient.PushClient` keeps a long-lived `Push.SubscribePush` stream
open. The dispatcher converts inbound `pushv1.PushEvent` frames into either
`PushHub.Publish` (for client events) or `PushHub.RevokeDeviceSession` /
`PushHub.RevokeAllForUser` (for `session_invalidation`).
- `user.*` and `lobby.*` authenticated routes are forwarded to backend through
the same REST client, with `X-User-Id` carrying the verified identity.
- The admin listener is optional and serves only Prometheus text metrics.
- Public auth routing stays available without an upstream adapter, but returns
`503 service_unavailable`.
- The default runtime reserves direct `user.*` authenticated self-service
routes. When `GATEWAY_USER_SERVICE_BASE_URL` is unset those routes stay
mounted but fail closed as dependency-unavailable instead of returning a
route miss.
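
The dispatcher note above (client events go to `PushHub.Publish`, `session_invalidation` frames go to one of the revoke calls) can be pictured with a small routing function. Everything below is a sketch with locally defined, hypothetical types: the real `pushv1.PushEvent` is a generated protobuf message, the `Hub` interface only mirrors the three method names mentioned in the notes, and the rule "empty device session ID means revoke all sessions for the user" is an assumption.

```go
package main

import "errors"

// ClientEvent and SessionInvalidation are simplified stand-ins for the
// two payload kinds a push frame can carry.
type ClientEvent struct {
	UserID          string
	DeviceSessionID string // optional target
	Payload         []byte
}

type SessionInvalidation struct {
	UserID          string
	DeviceSessionID string // assumed empty when every session is revoked
}

// PushEvent is a hypothetical stand-in for pushv1.PushEvent.
type PushEvent struct {
	Client       *ClientEvent
	Invalidation *SessionInvalidation
}

// Hub lists the PushHub operations named in the notes.
type Hub interface {
	Publish(ev ClientEvent)
	RevokeDeviceSession(deviceSessionID string)
	RevokeAllForUser(userID string)
}

// RecordingHub is a test double that records which calls were made.
type RecordingHub struct{ Calls []string }

func (h *RecordingHub) Publish(ev ClientEvent) { h.Calls = append(h.Calls, "publish:"+ev.UserID) }
func (h *RecordingHub) RevokeDeviceSession(id string) {
	h.Calls = append(h.Calls, "revoke-session:"+id)
}
func (h *RecordingHub) RevokeAllForUser(id string) { h.Calls = append(h.Calls, "revoke-user:"+id) }

// Dispatch routes one inbound frame to the matching hub operation.
func Dispatch(hub Hub, ev PushEvent) error {
	switch {
	case ev.Client != nil:
		hub.Publish(*ev.Client)
	case ev.Invalidation != nil:
		if ev.Invalidation.DeviceSessionID != "" {
			hub.RevokeDeviceSession(ev.Invalidation.DeviceSessionID)
		} else {
			hub.RevokeAllForUser(ev.Invalidation.UserID)
		}
	default:
		return errors.New("push event carries no known payload")
	}
	return nil
}
```

Keeping the routing in one function makes the two outcomes easy to test in isolation from the gRPC stream that feeds it.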