feat: backend service

Author: Ilia Denisov (committed via GitHub)
Date: 2026-05-06 10:14:55 +03:00
Parent: 3e2622757e
Commit: f446c6a2ac
1486 changed files with 49720 additions and 266401 deletions
@@ -7,20 +7,20 @@ sequenceDiagram
```mermaid
sequenceDiagram
participant Client
participant Gateway
participant Limiter as Public anti-abuse
participant Backend as backendclient.RESTClient
Client->>Gateway: POST /api/v1/public/auth/send-email-code
Gateway->>Limiter: classify + rate-limit + body checks
Limiter-->>Gateway: allowed
Gateway->>Backend: POST /api/v1/public/auth/send-email-code
Backend-->>Gateway: 200 {challenge_id}
Gateway-->>Client: 200 {challenge_id}
Client->>Gateway: POST /api/v1/public/auth/confirm-email-code
Gateway->>Limiter: classify + rate-limit + body checks
Limiter-->>Gateway: allowed
Gateway->>Backend: POST /api/v1/public/auth/confirm-email-code
Backend-->>Gateway: 200 {device_session_id}
Gateway-->>Client: 200 {device_session_id}
```
@@ -30,15 +30,14 @@ sequenceDiagram
```mermaid
sequenceDiagram
participant Client
participant Gateway
participant Backend as backendclient.RESTClient
participant Replay as ReplayStore
participant Policy as Rate limit / policy
Client->>Gateway: ExecuteCommand(envelope, payload_bytes, signature)
Gateway->>Gateway: validate envelope + protocol_version
Gateway->>Backend: GET /api/v1/internal/sessions/{device_session_id}
Backend-->>Gateway: session record
Gateway->>Gateway: verify payload_hash
Gateway->>Gateway: verify Ed25519 signature
Gateway->>Gateway: verify freshness window
@@ -46,57 +45,34 @@ sequenceDiagram
Replay-->>Gateway: accepted
Gateway->>Policy: apply IP/session/user/message_type budgets
Policy-->>Gateway: allowed
Gateway->>Backend: PATCH/POST/GET /api/v1/user/...
Backend-->>Gateway: JSON success or error
Gateway->>Gateway: hash payload + sign response
Gateway-->>Client: ExecuteCommandResponse + signature
```
## SubscribeEvents Lifecycle
```mermaid
sequenceDiagram
participant Client
participant Gateway
participant Replay as ReplayStore
participant Backend as backend Push.SubscribePush
participant Hub as PushHub
participant Stream as Client event stream
participant Sess as Session event stream
participant Dispatcher
Client->>Gateway: SubscribeEvents(envelope, signature)
Gateway->>Gateway: validate envelope + verify request
Gateway->>Replay: reserve(device_session_id, request_id, ttl)
Replay-->>Gateway: accepted
Gateway->>Gateway: lookup session via backend REST
Gateway->>Client: gateway.server_time event
Gateway->>Hub: register(user_id, device_session_id)
Backend-->>Dispatcher: PushEvent{ClientEvent}
Dispatcher->>Hub: Publish(push.Event)
Hub-->>Client: matching event delivery (signed envelope)
Backend-->>Dispatcher: PushEvent{SessionInvalidation}
Dispatcher->>Hub: RevokeDeviceSession or RevokeAllForUser
Hub-->>Client: stream closes with FAILED_PRECONDITION
Note over Gateway,Hub: During shutdown the gateway closes PushHub before gRPC graceful stop.
```
@@ -1,43 +1,33 @@
# Decision: Redis configuration shape
Captures the standing rules adopted by Edge Gateway when it joined the
project-wide Redis topology described in `ARCHITECTURE.md`.
## Context
Gateway intentionally stays Redis-light. The only Redis state served by
gateway is the replay reservation namespace (short-lived `SETNX` per
authenticated request, bounded by
`GATEWAY_REPLAY_REDIS_RESERVE_TIMEOUT`). Session lookup goes through
backend's REST surface, and inbound events are delivered through the
gRPC `Push.SubscribePush` consumer (see
`gateway/internal/backendclient`).
The shared rule is: every Galaxy service uses one master plus
zero-or-more replicas with a mandatory password, no TLS, and no Redis
ACL username; the connection is configured by the shared
`pkg/redisconn` helper.
## Decisions
### One shared `*redis.Client` owned by the runtime
`cmd/gateway/main.go` constructs a single `*redis.Client` via
`internal/redisclient.NewClient`, attaches OpenTelemetry tracing and
metrics via `internal/redisclient.InstrumentClient`, performs one
bounded `PING` via `internal/redisclient.Ping`, and registers
`client.Close` for shutdown. The replay store is the only adapter
backed by Redis.
### One env-var prefix for the connection
@@ -51,17 +41,10 @@ Connection topology is loaded from a single `GATEWAY_REDIS_*` group via
- `GATEWAY_REDIS_DB` (default `0`)
- `GATEWAY_REDIS_OPERATION_TIMEOUT` (default `250ms`)
Per-subsystem behavior env vars (namespace and timing only):
- `GATEWAY_REPLAY_REDIS_KEY_PREFIX`,
`GATEWAY_REPLAY_REDIS_RESERVE_TIMEOUT`
### Retired env vars (hard removal)
@@ -96,11 +79,8 @@ downstream dashboards will start populating without further changes.
## Consequences
- Gateway test code constructs one shared client and passes it to the
replay-store adapter under test (see `internal/replay/redis_test.go`).
- Operators must set `GATEWAY_REDIS_PASSWORD`. A passwordless local Redis
is still acceptable as long as a placeholder password is supplied to the
binary; Redis without `requirepass` accepts AUTH unconditionally.
@@ -7,28 +7,30 @@ readiness, shutdown, and push or revoke incidents.
Before starting the process, confirm:
- `GATEWAY_REDIS_MASTER_ADDR` and `GATEWAY_REDIS_PASSWORD` point to the
Redis deployment used for anti-replay reservations. Optional read
replicas may be listed in `GATEWAY_REDIS_REPLICA_ADDRS` (currently
unused; reserved for future read-routing).
- `GATEWAY_BACKEND_HTTP_URL`, `GATEWAY_BACKEND_GRPC_PUSH_URL`, and
`GATEWAY_BACKEND_GATEWAY_CLIENT_ID` describe the consolidated backend
service the gateway forwards every public auth and authenticated
user/lobby request to and the gRPC push subscription it opens.
- `GATEWAY_RESPONSE_SIGNER_PRIVATE_KEY_PEM_PATH` points to a readable
PKCS#8 PEM-encoded Ed25519 private key.
- the configured Redis DB and key-prefix settings match the target
environment. Per `ARCHITECTURE.md §Persistence Backends`, Redis traffic
is password-protected and TLS is disabled by policy; the deprecated
`GATEWAY_REDIS_TLS_ENABLED` and `GATEWAY_REDIS_USERNAME` variables are
no longer accepted and cause a hard fail at startup.
At startup the process opens one shared `*redis.Client` (instrumented
via OpenTelemetry tracing and metrics) and performs one bounded `PING`
for the replay store. It also dials backend's gRPC push listener and
opens one `Push.SubscribePush` stream that reconnects with capped
exponential backoff on failure.
Startup fails fast if the Redis ping fails, the backend URL is
malformed, or the signer key cannot be loaded.
Expected listener state after a healthy start:
@@ -96,13 +98,15 @@ During planned restarts:
If a revoked session still sends traffic or keeps an active stream:
1. verify that backend recorded the revocation (the
`/api/v1/internal/sessions/{id}` lookup must return `status=revoked`
for that device session);
2. verify that backend emitted the corresponding `session_invalidation`
frame on `Push.SubscribePush` and that the gateway logs a
matching subscription closure;
3. verify the gateway is connected to the same backend instance via
`GATEWAY_BACKEND_HTTP_URL` / `GATEWAY_BACKEND_GRPC_PUSH_URL`;
4. confirm the next authenticated request from that session is rejected.
Expected gateway behavior after the revoke snapshot is consumed:
@@ -116,16 +120,17 @@ Expected gateway behavior after the revoke snapshot is consumed:
If a client reports missing push events:
1. confirm that the client successfully opened `SubscribeEvents`;
2. confirm the stream received the initial `gateway.server_time`
bootstrap event;
3. confirm the gateway consumed the expected `pushv1.PushEvent` from
backend (look for `push_dispatcher` log lines or
`grpc_push_events_total` increments on the backend side);
4. verify `user_id` and optional `device_session_id` on the
`ClientEvent` match the intended target;
5. confirm the event payload fields are well-formed and not dropped as
malformed;
6. check whether the stream was closed earlier because of revoke,
shutdown, or overflow.
### Stream Closed Unexpectedly
@@ -14,48 +14,47 @@ flowchart LR
```mermaid
flowchart LR
PublicHTTP["Public HTTP listener\n/healthz /readyz /api/v1/public/auth/*"]
AuthGRPC["Authenticated gRPC listener\nExecuteCommand / SubscribeEvents"]
AdminHTTP["Optional admin HTTP listener\n/metrics"]
BackendREST["backendclient.RESTClient\nsessions + public auth + user/lobby"]
BackendPush["backendclient.PushClient\nSubscribePush consumer"]
Replay["Replay reservation client"]
PushHub["PushHub"]
Dispatcher["Push event dispatcher"]
Telemetry["Logs, traces, metrics"]
end
Public --> PublicHTTP
Authd --> AuthGRPC
PublicHTTP --> BackendREST
AuthGRPC --> BackendREST
AuthGRPC --> Replay
AuthGRPC --> PushHub
BackendPush --> Dispatcher
Dispatcher --> PushHub
PublicHTTP --> Telemetry
AuthGRPC --> Telemetry
AdminHTTP --> Telemetry
Redis["Redis\nanti-replay reservations only"]
Backend["backend service\nHTTP + gRPC"]
Metrics["Prometheus / OTLP collectors"]
BackendREST --> Backend
BackendPush --> Backend
Replay --> Redis
Telemetry --> Metrics
```
Notes:
- `cmd/gateway` refuses startup when Redis connectivity, the backend endpoint,
or the response signer is misconfigured.
- Session lookup is synchronous: every authenticated gRPC request triggers one
`GET /api/v1/internal/sessions/{id}` call to backend; there is no
process-local projection.
- `backendclient.PushClient` keeps a long-lived `Push.SubscribePush` stream
open. The dispatcher converts inbound `pushv1.PushEvent` frames into either
`PushHub.Publish` (for client events) or `PushHub.RevokeDeviceSession` /
`PushHub.RevokeAllForUser` (for `session_invalidation`).
- `user.*` and `lobby.*` authenticated routes are forwarded to backend through
the same REST client, with `X-User-Id` carrying the verified identity.
- The admin listener is optional and serves only Prometheus text metrics.