docs: reorder & testing

Ilia Denisov
2026-05-07 00:58:53 +03:00
committed by GitHub
parent f446c6a2ac
commit 604fe40bcf
148 changed files with 9150 additions and 2757 deletions
# Galaxy Architecture
Galaxy is a turn-based strategy platform. This document is the source of
truth for the platform architecture and supersedes
`ARCHITECTURE_deprecated.md`. The previous design factored the platform
into nine independently deployed services. This design consolidates all
business logic into a single `backend` service alongside the existing
`gateway` and `game` components.
## 1. Overview
The platform is composed of three executable units:
- **`gateway`** — single public ingress. Owns transport security, request
authentication via Ed25519-signed envelopes, anti-replay, response
signing, and routing of authenticated traffic to `backend`. Stays as a
separate process and is the only component reachable from the public
internet.
- **`backend`** — single internal service that owns every domain concern of
the platform: identity, sessions, lobby, game runtime, mail, push and
email notification delivery, geo signals, and administration. Talks to
Postgres, the Docker daemon, an SMTP relay, and the GeoLite2 country
database. The only consumer of `backend` over the network is `gateway`.
- **`game`** — turn-engine container. One container per active game,
managed exclusively by `backend`. The contract is the OpenAPI document
shipped with the engine module; behaviour is unchanged by this
architecture.
```mermaid
flowchart LR
Client((Client)) -- TLS + Ed25519 envelopes --> Gateway
Gateway -- REST/JSON, X-User-ID --> Backend
Backend -- gRPC stream (push) --> Gateway
Backend -- REST/JSON --> Engine[(Game Engine\ncontainer)]
Backend -- pgx --> Postgres[(Postgres)]
Backend -- Docker API --> Docker[(Docker daemon)]
Backend -- SMTP --> Mail[(SMTP relay)]
Backend -- GeoLite2 lookup --> GeoIP[(GeoLite2 DB)]
Gateway -- anti-replay reservations --> Redis[(Redis)]
```
The MVP runs `gateway` and `backend` as single-instance processes inside a
trusted network. Horizontal scaling, distributed coordination, and
mTLS-secured east-west traffic are explicit future work and are called out
in [§18 Deployment Topology](#18-deployment-topology-informational).
## 2. Component Boundaries
### `backend`
- Owns every persistent record of platform state in a Postgres schema named
`backend`. No other process writes that schema.
- Owns every Docker call to `galaxy-game-{game_id}` containers.
- Owns the SMTP relationship and the durable email outbox.
- Owns the in-memory caches that serve hot reads.
- Exposes one HTTP listener and one gRPC listener. No public ingress.
### `gateway`
- Public ingress. Performs TLS termination, request signature verification,
freshness window enforcement, anti-replay reservations, and rate
limiting before any request is forwarded to `backend`.
- Forwards authenticated requests to `backend` over HTTP/REST with the
resolved `user_id` carried as the `X-User-ID` header. Forwards
unauthenticated public traffic verbatim.
- Subscribes to `backend` over a long-lived gRPC server stream to receive
client push events and session-invalidation notices, signs them, and
delivers them to active client subscriptions.
- Stops everything that can be stopped at the edge. Any check that does
not require backend state — bad signature, stale timestamp, replayed
request_id, malformed envelope, blocked-session shortcut — is enforced
by `gateway` so that backend is not loaded with invalid traffic.
### `game`
- A single game-engine instance per running game, packaged as a Docker
container. Stateful only on its host bind-mounted state directory.
- Reachable inside the trusted network at `http://galaxy-game-{game_id}:8080`.
- Receives all administrative and player-action calls from `backend` only.
## 3. Backend API Surfaces
`backend` exposes one HTTP listener with four API route groups, plus
unauthenticated health probes, all distinguished by middleware. The full
contract lives in `backend/openapi.yaml`.
| Prefix | Authentication | Audience |
| --------------------- | ------------------------------------------------ | --------------------------------------- |
| `/api/v1/public/*` | none | unauthenticated registration |
| `/api/v1/user/*` | `X-User-ID` injected by `gateway` | authenticated end users |
| `/api/v1/internal/*` | none (network-trusted) | gateway-only server-to-server endpoints |
| `/api/v1/admin/*` | HTTP Basic Auth against `admin_accounts` | platform administrators |
| `/healthz`, `/readyz` | none | infrastructure probes |
`backend` derives user identity exclusively from the `X-User-ID` header on
the user surface. Request bodies are never trusted to convey identity.
The admin surface is on the same listener as the user surface; isolation
between admin and the public is provided by Basic Auth and by the trust
boundary described in [§15](#15-transport-security-model-gateway-boundary).
The internal surface is part of that same trust boundary: it is
network-locked rather than auth-locked, and only `gateway` is expected
to call it. The internal surface is read-only with respect to device
sessions — it carries the per-request lookup gateway needs to verify a
signed envelope, and nothing else. Revocations are user-driven (through
the user surface) or admin-driven (through in-process calls inside
backend); see [`FUNCTIONAL.md` §1.5](FUNCTIONAL.md#15-revocation).
JSON bodies use `snake_case` field names everywhere on the wire. Backend,
gateway, and the shared `pkg/model` schemas are aligned on this convention;
any future migration to `camelCase` must happen at the `pkg/model` boundary
and propagate uniformly. Every error response follows the envelope
`{"error": {"code": "<machine-readable>", "message": "<human-readable>"}}`.
The closed set of `code` values is enumerated in
`components/schemas/ErrorBody` of `backend/openapi.yaml`. `409 Conflict` is
the standard status when a request collides with existing state (duplicate
admin username, duplicate `(template_id, idempotency_key)`, resend on a
`sent` mail delivery, lobby state-machine collisions).
## 4. Backend Domain Modules
Each module is a Go package under `backend/internal/`. Modules are wired
by direct struct references; interfaces are introduced only where a test
seam or an external system boundary justifies them.
A few cross-module invariants survive consolidation and are surfaced here
because they cross domain boundaries:
- **`accounts.user_name`** is the immutable login handle assigned at first
sign-in. Backend synthesises it as `Player-XXXXXXXX` (eight
`crypto/rand`-backed alphanumerics, retried on UNIQUE collisions), so a
fresh email always lands a unique account without a client-supplied
name. The column is never overwritten on subsequent sign-ins. A sketch
of the synthesis loop follows this list.
- **`accounts.permanent_block`** is the canonical permanent-block flag.
When set, both `auth.SendEmailCode` and `auth.ConfirmEmailCode` reject
with `400 invalid_request`. The send-time check stops fresh challenges
for already-blocked addresses; the confirm-time check (re-run after
the verification code matches) catches admin blocks applied in the
window between send and confirm. Every other branch on send — including
a `blocked_emails` row, a throttled email, a fresh email — returns the
opaque `{challenge_id}` shape so the endpoint cannot be used to
enumerate accounts.
- **Public lobby games are admin-created** through
`POST /api/v1/admin/games`. The user-facing
`POST /api/v1/user/lobby/games` always emits `private` games owned by
`X-User-ID`. Public games carry `owner_user_id IS NULL`; the partial
index on `(owner_user_id) WHERE visibility = 'private'` keeps the
private-owner lookup efficient.
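The handle synthesis described in the first invariant can be sketched as
follows. This is a minimal illustration rather than the backend's actual
code: `insertAccount` is a hypothetical persistence hook and the retry
budget is arbitrary.
```go
package user

import (
	"crypto/rand"
	"errors"
	"fmt"

	"github.com/jackc/pgx/v5/pgconn"
)

const handleAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

// newUserName builds a handle such as "Player-Xk29QpLm" from crypto/rand.
// Modulo bias is acceptable for a non-secret display handle.
func newUserName() (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	for i, b := range buf {
		buf[i] = handleAlphabet[int(b)%len(handleAlphabet)]
	}
	return "Player-" + string(buf), nil
}

// createAccount retries the insert when accounts.user_name collides.
func createAccount(insertAccount func(userName string) error) (string, error) {
	for attempt := 0; attempt < 5; attempt++ {
		name, err := newUserName()
		if err != nil {
			return "", err
		}
		err = insertAccount(name)
		var pgErr *pgconn.PgError
		if errors.As(err, &pgErr) && pgErr.Code == "23505" { // unique_violation
			continue
		}
		return name, err
	}
	return "", fmt.Errorf("user_name collision budget exhausted")
}
```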
| Package | Responsibility |
| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `backend/internal/config` | Environment-variable loader and validator. |
| `backend/internal/server` | gin engine, listeners, route groups, shared middleware (request id, panic recovery, metrics, tracing). |
| `backend/internal/auth` | Email-code challenges, device sessions, Ed25519 client public keys, send/confirm, user-driven revoke (single + revoke-all), admin-driven revoke (sanctions, soft-delete, in-process), durable revocation audit in `session_revocations`, internal session lookup endpoint for gateway. |
| `backend/internal/user` | User accounts, settings (`preferred_language`, `time_zone`, `declared_country`), entitlements, sanctions, limits, soft delete with in-process cascade. |
| `backend/internal/lobby` | Games, applications, invites, memberships, enrollment state machine, turn schedule, Race Name Directory. |
| `backend/internal/runtime` | Engine version registry, container lifecycle, turn scheduler, `(user_id ↔ race_name ↔ engine_player_uuid)` mapping per game, runtime snapshot publication into `lobby`. |
| `backend/internal/mail` | Postgres outbox, SMTP delivery worker, retry/backoff, dead letters, admin resend. |
| `backend/internal/notification` | Notification intent normalization, idempotency, per-route fan-out into push (gRPC) and email (outbox). |
| `backend/internal/geo` | Per-session country observation, `(user_id, country)` counter, `declared_country` initialisation at registration. |
| `backend/internal/admin` | `admin_accounts` table, env-driven bootstrap, Basic Auth verifier, admin-side operations across other modules. |
| `backend/internal/push` | gRPC server hosting the `SubscribePush` stream consumed by gateway. |
| `backend/internal/engineclient` | Thin REST client to running game engines. Reuses DTOs from `pkg/model/{order,report,rest}`. |
| `backend/internal/dockerclient` | Wrapper around `github.com/docker/docker` for container start, stop, restart, patch, inspect, reconcile. |
| `backend/internal/postgres` | pgx pool, embedded migrations, jet-generated query packages. |
| `backend/internal/telemetry` | OpenTelemetry runtime, zap logger factory, trace-field helpers. |
## 5. Persistence
- A single Postgres database, schema `backend`. `backend` is the only
writer. Every `backend` table lives in this schema.
- Migrations are kept in `backend/internal/postgres/migrations/`,
embedded into the binary, and applied via `pressly/goose/v3` during
startup before any listener opens. The DSN must include
`?search_path=backend` so unqualified reads and writes resolve to the
service-owned schema.
- Queries are written through `go-jet/jet/v2`. Generated code lives in
`backend/internal/postgres/jet/` and is regenerated by `make jet`.
- Every domain identifier is a `uuid` primary key
(`device_session_id`, `user_id`, `game_id`, `application_id`,
`invite_id`, `membership_id`, `delivery_id`, `notification_id`, …).
Identifiers that are not Postgres-side identities (`email`,
`user_name`, `canonical`, `template_id`, `idempotency_key`,
`race_name`) remain `text`.
- Foreign keys are intra-domain only: `accounts → entitlement_*` /
`sanction_*` / `limit_*`; `games → applications` / `invites` /
`memberships` (with `ON DELETE CASCADE`); `mail_payloads →
mail_deliveries → mail_recipients` / `mail_attempts` /
`mail_dead_letters`; `notifications → notification_routes` /
`notification_dead_letters`. Cross-domain references
(`memberships.user_id`, `games.owner_user_id`, etc.) are kept as
opaque `uuid` columns because each domain runs its own cleanup
through the in-process cascade described in [§7](#7-in-process-async-patterns). Adding a database
cascade would either duplicate that work or hide it behind opaque
triggers.
- `created_at`, `updated_at`, `deleted_at` are always `timestamptz`. UTC
normalisation is applied on read and write.
- Idempotency is enforced through UNIQUE indexes on durable tables (for
example `(template_id, idempotency_key)` on `mail_deliveries`,
`race_name_canonical` on registered race names, `(game_id, user_id)` on
`memberships`). There is no separate idempotency table.
- Worker pickup uses `SELECT ... FOR UPDATE SKIP LOCKED` ordered by
`next_attempt_at`. This pattern serves the mail outbox, retry-able
runtime jobs, and any future deferred work.
- `session_revocations` is the append-only audit trail of every device
session revocation, keyed by `revocation_id` (uuid) with
`device_session_id`, `user_id`, `actor_kind`, the actor pair
`actor_user_id uuid` + `actor_username text` (exactly one is
non-NULL per row, enforced by a CHECK constraint), `reason`, and
`revoked_at`. The row is inserted in the same transaction that
flips `device_sessions.status` to `'revoked'`, so a successful
revoke always leaves a matching audit row.
The two-column actor pair is the canonical shape used by every
audit-bearing table — `accounts.deleted_actor_*`,
`entitlement_records`, `entitlement_snapshots`,
`sanction_records.actor_*` + `removed_by_*`, and
`limit_records.actor_*` + `removed_by_*` follow the same convention.
`actor_kind` (or `actor_type` on the user-domain tables) values are
`user`, `admin`, `system`. The Go layer hides the split behind
`user.ActorRef{Type, ID string}`: `Type=="user"` requires `ID` to
be a UUID, `Type=="admin"` stores `ID` as the operator username
(passed to `actor_username`), and `Type=="system"` requires an
empty `ID`. See `backend/internal/user/store.go`
(`actorToColumnArgs`/`actorFromColumns`) for the SQL boundary.
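A minimal sketch of that mapping; the signatures are assumed rather than
copied from `store.go`, and the `system` branch is elided because the text
above does not say how a system actor satisfies the exactly-one-non-NULL
constraint.
```go
package user

import (
	"errors"

	"github.com/google/uuid"
)

type ActorRef struct {
	Type string // "user", "admin", or "system"
	ID   string // user UUID, admin username, or empty
}

// actorToColumns returns the values written into (actor_kind,
// actor_user_id, actor_username); exactly one of the two pointers is
// non-nil, matching the CHECK constraint described above.
func actorToColumns(a ActorRef) (kind string, actorUserID *uuid.UUID, actorUsername *string, err error) {
	switch a.Type {
	case "user":
		id, parseErr := uuid.Parse(a.ID)
		if parseErr != nil {
			return "", nil, nil, errors.New("user actor requires a UUID id")
		}
		return "user", &id, nil, nil
	case "admin":
		if a.ID == "" {
			return "", nil, nil, errors.New("admin actor requires an operator username")
		}
		return "admin", nil, &a.ID, nil
	default:
		// The system branch and unknown kinds are handled by the real
		// actorToColumnArgs; elided in this sketch.
		return "", nil, nil, errors.New("actor kind not covered by this sketch")
	}
}
```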
## 6. In-Memory Cache
Postgres is the cold store. In-memory caches in `backend` serve hot
reads and are warmed at process start.
| Cache | Population | Update path |
| ------------------------------- | --------------------------------------------------------- | -------------------------------------------- |
| Active device sessions | Full table read at startup. | Write-through on create/revoke. |
| User entitlement snapshots | Latest snapshot per active user at startup. | Write-through on entitlement change. |
| Engine version registry | Full table read at startup. | Write-through on admin update. |
| Active runtime records | Full table read at startup. | Write-through on container ops. |
| Active games + memberships | Full table read at startup. | Write-through inside lobby commands. |
| Race Name Directory canonicals | Full table read at startup. | Write-through inside lobby commands. |
| Admin accounts | Full table read at startup. | Write-through on admin CRUD. |
Every cache is bounded to MVP-scale data sets that comfortably fit in
process memory (10K accounts, 1000 active games, 100K device sessions, a
few thousand directory entries — all together well under 100 MB). If a
specific cache is observed to grow beyond a process budget at scale,
moving that cache to Redis must be discussed and approved before
implementation; the architecture leaves `backend` Redis-free by default.
Cache writes happen *after* the matching Postgres mutation commits. A
commit failure leaves the cache in sync with the prior database state.
Each cache exposes a `Ready` flag flipped to `true` after the warm-up
read finishes; the `/readyz` probe waits on every cache being ready
before reporting ready, so the listener never serves a request that
would spuriously miss because of a cold cache.
`gateway` carries a separate, smaller cache: the in-memory session
cache fronting every authenticated request. It is a bounded LRU
(default 50 000 entries) with a safety-net TTL (default 10 minutes).
Misses trigger a single synchronous REST call to backend's
`/api/v1/internal/sessions/{id}` lookup; hits answer the hot path
directly. The cache is kept consistent through the
`session_invalidation` push events backend emits over `Push.SubscribePush`:
each event flips the cached entry to `revoked` so subsequent
authenticated requests bound to that session are rejected at the
edge without another backend round-trip. The TTL covers the case of a
missed event (cursor aged out, gateway restart) by forcing a refresh
at most once per window.
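The gateway-side behaviour can be illustrated with a simplified sketch;
LRU eviction, the real `backendclient` types, and error classification
are elided or assumed.
```go
package session

import (
	"context"
	"sync"
	"time"
)

type Entry struct {
	UserID    string
	PublicKey []byte
	Revoked   bool
	fetchedAt time.Time
}

// backendLookup stands in for the REST call to
// /api/v1/internal/sessions/{id}.
type backendLookup func(ctx context.Context, sessionID string) (Entry, error)

type Cache struct {
	mu     sync.Mutex
	ttl    time.Duration // safety-net TTL, e.g. 10 minutes
	byID   map[string]Entry
	lookup backendLookup
}

func NewCache(ttl time.Duration, lookup backendLookup) *Cache {
	return &Cache{ttl: ttl, byID: make(map[string]Entry), lookup: lookup}
}

// Resolve answers the hot path from memory and falls back to a single
// synchronous backend lookup on a miss or TTL expiry.
func (c *Cache) Resolve(ctx context.Context, sessionID string) (Entry, error) {
	c.mu.Lock()
	e, ok := c.byID[sessionID]
	c.mu.Unlock()
	if ok && time.Since(e.fetchedAt) < c.ttl {
		return e, nil
	}
	fresh, err := c.lookup(ctx, sessionID)
	if err != nil {
		return Entry{}, err
	}
	fresh.fetchedAt = time.Now()
	c.mu.Lock()
	c.byID[sessionID] = fresh
	c.mu.Unlock()
	return fresh, nil
}

// Invalidate is driven by session_invalidation push events: it flips the
// cached entry to revoked so the edge rejects later requests on it.
func (c *Cache) Invalidate(sessionID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.byID[sessionID]; ok {
		e.Revoked = true
		c.byID[sessionID] = e
	}
}
```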
## 7. In-Process Async Patterns
Async work is implemented with goroutines and channels. There is no Redis
pub/sub, no Redis Stream, and no message broker between domain modules.
The following table records how previously inter-service streams are
realised in process. The semantics — when each event fires, how many
times, in which order — are preserved; the transport changes from a
durable stream to an in-process function call or buffered channel.
| Previous external stream | In-process realisation |
| ----------------------------------------------------- | - |
| User lifecycle (block / soft delete) → Lobby cascade | `lobby.OnUserBlocked(user_id)` and `lobby.OnUserDeleted(user_id)` invoked synchronously after `user` commits. |
| Runtime snapshot updates → Lobby denormalisation | `lobby.OnRuntimeSnapshot(snapshot)` invoked from `runtime` after each engine status read. |
| Game finished → Lobby promotion / cleanup | `lobby.OnGameFinished(game_id)`. |
| Lobby start/stop jobs → Runtime container lifecycle | `runtime.StartGame(game_id)` / `runtime.StopGame(game_id)`. Long-running pull/start drained on a per-game worker goroutine, serialised by per-game mutex. |
| Runtime job results → Lobby | Direct return value from `runtime.StartGame`, plus optional `lobby.OnRuntimeJobResult` callback for asynchronous progression. |
| Runtime health events | `runtime` publishes onto an in-process channel; `lobby` and `admin` observers consume. |
| Notification intents | Direct call `notification.Submit(intent)` by producers (lobby, runtime, geo). |
| Mail delivery commands | Direct insert into `mail_deliveries` by producers; mail worker drains the table. |
| Auth → Mail (login codes) | Direct call `mail.EnqueueLoginCode(...)` from `auth.confirmEmailCode`. |
| Gateway client-events stream | Backend `push` server emits `client_event` on the gRPC stream consumed by gateway. |
| Gateway session-events stream | Backend `push` server emits `session_invalidation` on the same gRPC stream. |
Workers drain outstanding work on graceful shutdown in a deterministic
order: stop accepting new HTTP/gRPC traffic → finish in-flight requests →
flush mail outbox writes that already started → flush push events to
gateway buffer → close the Docker client → close the database pool.
The lobby state machine is the only domain whose transitions cross
several producers and consumers. The closed transitions are
`draft → enrollment_open → ready_to_start → starting → running ↔ paused
→ finished`, with `cancelled` reachable from every pre-`finished` state
and `start_failed → ready_to_start` for retry. Owner-driven endpoints
(or admin overrides for public games) trigger transitions; the
`runtime` callback `OnRuntimeJobResult` is the only path that flips
`starting → running` or `starting → start_failed`. `lobby.OnGameFinished`
is invoked when the engine reports the game finished, after which the
runtime container is torn down and Race Name Directory promotions run.
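The closed transition set reads naturally as a lookup table. The sketch
below is illustrative only; in particular, which active state emits the
`finished` edge is an assumption, since `OnGameFinished` is driven by the
engine report.
```go
package lobby

type GameState string

const (
	Draft          GameState = "draft"
	EnrollmentOpen GameState = "enrollment_open"
	ReadyToStart   GameState = "ready_to_start"
	Starting       GameState = "starting"
	Running        GameState = "running"
	Paused         GameState = "paused"
	StartFailed    GameState = "start_failed"
	Finished       GameState = "finished"
	Cancelled      GameState = "cancelled"
)

// allowed lists the forward edges; cancelled is handled separately because
// it is reachable from every pre-finished state.
var allowed = map[GameState][]GameState{
	Draft:          {EnrollmentOpen},
	EnrollmentOpen: {ReadyToStart},
	ReadyToStart:   {Starting},
	Starting:       {Running, StartFailed}, // flipped only by OnRuntimeJobResult
	Running:        {Paused, Finished},
	Paused:         {Running, Finished}, // finished edge assumed from either active state
	StartFailed:    {ReadyToStart},      // retry
}

func CanTransition(from, to GameState) bool {
	if to == Cancelled {
		return from != Finished && from != Cancelled
	}
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}
```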
## 8. Backend ↔ Gateway Communication
There are two channels between `gateway` and `backend`.
**Sync REST (gateway → backend).** Every authenticated user request and
every public auth request goes over plain HTTP/JSON. The gateway sends
`X-User-ID` (when authenticated) and forwards the verified payload. The
backend never re-derives user identity from the body. The session
lookup hits backend's `/api/v1/internal/sessions/{id}` only on a
cache miss in the gateway-side LRU described in [§6](#6-in-memory-cache); backend updates
`device_sessions.last_seen_at` on every successful lookup so admin
operators can observe when each session was last resolved at the edge.
**gRPC stream (gateway ⇄ backend).** Backend exposes a single RPC
`SubscribePush(GatewaySubscribeRequest) returns (stream PushEvent)`. The
gateway opens this stream once at start and keeps it open. Each
`PushEvent` carries a `oneof`:
- `client_event` — opaque payload addressed to `(user_id [, device_session_id])`,
which gateway signs and delivers to active client subscriptions.
- `session_invalidation` — instructs gateway to immediately close any
active streams for `(device_session_id)` or for all sessions of `user_id`,
and to reject in-flight requests bound to those sessions.
Backend keeps a small in-memory ring buffer of recent events keyed by
cursor with TTL equal to the gateway freshness window. On reconnect,
gateway sends its last consumed cursor; backend resumes from the next
event or from a fresh cursor if the requested point has expired.
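A sketch of that buffer and the resume behaviour, with placeholder types
rather than the real `pushv1` messages:
```go
package push

import (
	"sync"
	"time"
)

type bufferedEvent struct {
	Cursor  uint64
	SentAt  time.Time
	Payload []byte // marshalled PushEvent
}

type ringBuffer struct {
	mu     sync.Mutex
	ttl    time.Duration // equal to the gateway freshness window
	limit  int
	next   uint64
	events []bufferedEvent
}

func (r *ringBuffer) Append(payload []byte) uint64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.next++
	r.events = append(r.events, bufferedEvent{Cursor: r.next, SentAt: time.Now(), Payload: payload})
	if r.limit > 0 && len(r.events) > r.limit {
		r.events = r.events[len(r.events)-r.limit:]
	}
	return r.next
}

// ResumeFrom returns every event after the cursor the gateway presented on
// reconnect, or ok=false when that point has aged out and the gateway must
// restart from a fresh cursor.
func (r *ringBuffer) ResumeFrom(cursor uint64) (out []bufferedEvent, ok bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	cutoff := time.Now().Add(-r.ttl)
	for len(r.events) > 0 && r.events[0].SentAt.Before(cutoff) {
		r.events = r.events[1:] // expire by TTL
	}
	if cursor > 0 && len(r.events) > 0 && cursor+1 < r.events[0].Cursor {
		return nil, false // requested point no longer covered by the buffer
	}
	for _, e := range r.events {
		if e.Cursor > cursor {
			out = append(out, e)
		}
	}
	return out, true
}
```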
`gateway` keeps using Redis for anti-replay request_id reservations. No
other gateway↔backend interaction uses Redis.
### Edge enforcement
`gateway` is responsible for stopping every check it can answer locally so
that backend processes only well-shaped, fresh, authentic traffic:
- TLS termination and pinning where applicable.
- Request envelope parsing, payload hash verification, Ed25519 signature
verification, freshness window enforcement, anti-replay reservation.
- Public-facing rate limiting and basic policy.
- Closing of streams marked invalid via `session_invalidation`.
Backend assumes those checks have happened. It runs business validation,
authorisation, and state transitions on top of that assumption.
## 9. Backend ↔ Game Engine Communication
Backend is the only platform participant that talks to `galaxy-game-*`
containers. The contract is the engine OpenAPI document; backend uses the
existing typed DTOs in `pkg/model/{order,report,rest}` and a hand-written
`net/http` client in `backend/internal/engineclient`.
Authenticated client traffic for in-game operations crosses three
serialisation boundaries: signed-gRPC FlatBuffers (client ↔ gateway),
JSON over REST (gateway ↔ backend), and JSON over REST again
(backend ↔ engine). Gateway owns the FB ↔ JSON transcoding for the
three message types `user.games.command`, `user.games.order`,
`user.games.report` (FB schemas in `pkg/schema/fbs/{order,report}`,
encoders in `pkg/transcoder`). Backend never touches FlatBuffers and
never re-interprets the JSON beyond rebinding the actor field from
the runtime player mapping (clients never carry a trusted actor).
Container state is owned by `backend/internal/runtime`:
- `runtime_records` is the persistent map from `game_id` to current
container state.
- `engine_versions` is the registry of allowed engine images and serves as
the source for `image_ref` arbitration. Producers do not pick image
references on their own.
- Patch is semver-patch-only inside the same major/minor line; any
major/minor change requires an explicit stop and start.
- Reconciliation runs at startup and periodically: every container with
the `galaxy.backend` label is matched against `runtime_records`;
unrecorded containers with the label are adopted, missing recorded
containers are marked removed and an internal event is emitted.
- Container naming is fixed: `galaxy-game-{game_id}`; engine endpoint is
always `http://galaxy-game-{game_id}:8080`.
- Engine probes (`/healthz`) feed `runtime` health observations and turn
generation status.
## 10. Geo Profile (reduced)
The geo concern is intentionally minimal.
- At registration (`/api/v1/public/auth/confirm-email-code`), backend looks
up the source IP against the GeoLite2 country database via `pkg/geoip`
and stores the resulting ISO country code in `accounts.declared_country`.
This value is never updated afterwards; there is no version history.
- On every authenticated user-facing request, a fire-and-forget goroutine
performs the same lookup against the request IP and increments
`user_country_counters` by `(user_id, country, count bigint)`. The
request itself does not block on this update.
- There is no aggregation, no automatic flagging, no review
recommendations, no admin notifications, and no detection of account
takeover. Counter data is only available to operators via the admin
surface for manual inspection.
- Geo work is fail-open: any geoip error is logged but never blocks the
user request.
- Source IP for both flows is read from the leftmost `X-Forwarded-For`
entry, falling back to `RemoteAddr` when the header is absent.
Backend trusts the value because the network segment between gateway
and backend is the trust boundary ([§15](#15-transport-security-model-gateway-boundary), [§16](#16-security-boundaries-summary)); duplicating the edge
rate-limit / spoof checks here would be double work.
- Email addresses are never written to logs verbatim. Backend modules
emit a per-process HMAC-SHA256-truncated `email_hash` instead, so
operators can correlate log lines within a single process lifetime
without persisting PII.
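The `email_hash` convention can be sketched with the standard library; key
handling here is illustrative and the truncation length is an assumption.
```go
package telemetry

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
)

// processKey is generated once per process, so hashes correlate only
// within a single process lifetime and never persist PII.
var processKey = func() []byte {
	k := make([]byte, 32)
	if _, err := rand.Read(k); err != nil {
		panic(err) // a process without entropy should not start
	}
	return k
}()

// EmailHash returns the truncated HMAC-SHA256 token used as the
// email_hash log field.
func EmailHash(email string) string {
	mac := hmac.New(sha256.New, processKey)
	mac.Write([]byte(email))
	return hex.EncodeToString(mac.Sum(nil))[:16]
}
```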
## 11. Mail Outbox
Email is delivered through a Postgres-backed outbox.
- Producers (auth login codes, notification routes) write into
`mail_deliveries` with a unique `(template_id, idempotency_key)` and
the rendered payload bytes in `mail_payloads`.
- A worker goroutine selects work from `mail_deliveries` with
`SELECT ... FOR UPDATE SKIP LOCKED`, attempts SMTP delivery via
`wneessen/go-mail`, records the attempt in `mail_attempts`, and either
marks the delivery sent or schedules `next_attempt_at` for retry with
exponential backoff and jitter (the pickup query and backoff step are
sketched at the end of this section).
- After the configured maximum retry budget the delivery moves to
`mail_dead_letters`. The `mail.dead_lettered` notification kind is
reserved in the catalog but has no producer wired up yet, so no
admin notification is emitted today — operator visibility comes
from a log line and the `/api/v1/admin/mail/dead-letters` listing.
- On startup the worker drains everything pending. There is no separate
recovery procedure: starting backend is sufficient.
- Operators can re-enqueue from `mail_dead_letters` through the admin
surface.
The auth path returns success as soon as the delivery row is durably
committed; SMTP completion is asynchronous to the auth request.
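The pickup query and the retry scheduling used by the worker look roughly
like the sketch below; the status values and column names follow this
document, everything else (batch size, backoff base and cap) is an
assumption.
```go
package mail

import (
	"math/rand"
	"time"
)

// pickupSQL claims a batch of due deliveries. SKIP LOCKED lets another
// worker coexist without double-sending the same delivery.
const pickupSQL = `
SELECT delivery_id
FROM mail_deliveries
WHERE status IN ('pending', 'retrying')
  AND next_attempt_at <= now()
ORDER BY next_attempt_at
LIMIT 50
FOR UPDATE SKIP LOCKED`

// nextAttemptAt computes exponential backoff with jitter for the given
// attempt number (1-based), capped at one hour between attempts.
func nextAttemptAt(now time.Time, attempt int) time.Time {
	base := time.Minute << uint(attempt-1) // 1m, 2m, 4m, ...
	if attempt > 7 || base > time.Hour {
		base = time.Hour
	}
	jitter := time.Duration(rand.Int63n(int64(base) / 4)) // up to +25%
	return now.Add(base + jitter)
}
```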
## 12. Notification Pipeline
Notifications are an in-process pipeline. The closed catalog is
defined in `backend/internal/notification/catalog.go` and currently
covers 13 kinds: 10 lobby kinds (invite received/revoked, application
submitted/approved/rejected, membership removed/blocked, race name
registered/pending/expired) and 3 admin-recipient runtime kinds
(image pull failed, container start failed, start config invalid).
Per-kind delivery channels (push, email, or both) and the
admin-vs-per-user recipient routing live in the same file.
For every intent, `notification.Submit` performs:
1. Idempotency check (UNIQUE on `(intent_kind, idempotency_key)`).
2. Recipient resolution against `user`.
3. Per-recipient route materialisation in `notification_routes` as
`push`, `email`, or both, based on the type-specific policy table.
4. Push routes are emitted onto the gRPC `client_event` channel for
the recipient. The dispatcher passes the producer's payload map
through `notification.buildClientPushEvent(kind, payload)`, which
maps the kind to the matching FlatBuffers schema in
`pkg/schema/fbs/notification.fbs` (one table per catalog kind, 1:1
with the camel-case form of the kind plus the `Event` suffix) and
returns a typed `push.Event`. `push.Service` invokes `Marshal` and
places the bytes into `pushv1.ClientEvent.Payload`. An unknown
kind falls back to `push.JSONEvent` so a misconfigured producer
does not silently drop frames; new kinds must ship with a typed
FB schema and a matching `buildClientPushEvent` case rather than
relying on the fallback.
5. Email routes are inserted into `mail_deliveries` with the matching
template id.
6. Malformed intents go to `notification_malformed_intents` and never
block the producer.
Notification persistence is the auditable record of "we tried to tell
this user about this thing"; clients still derive their actual game
state through normal user-facing reads.
## 13. Container Lifecycle (in-process)
`backend/internal/runtime` owns the lifecycle of game-engine containers
and is the only component permitted to issue Docker calls.
- All Docker calls go through `dockerclient`, which is a thin wrapper over
`github.com/docker/docker` configured against `BACKEND_DOCKER_HOST`.
- Per-game container operations are serialised through a per-game mutex
(held in memory) so that concurrent start/stop/patch attempts cannot
race (a sketch follows this list). `runtime_operation_log` records every
operation for audit.
- Long-running pulls and starts execute on worker goroutines; the calling
path returns as soon as the operation is queued, then receives
completion through a callback or a follow-up status read.
- The turn scheduler uses `pkg/cronutil` (a wrapper over
`robfig/cron/v3`) and schedules a tick per running game according to
`games.turn_schedule`. Force-next-turn sets a skip-flag that advances
the next scheduled tick by one cron step.
- Snapshots are read from the engine on a schedule, after every
successful command, and on health probe transitions; each read
publishes a `runtime_snapshot_update` to `lobby` in process.
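The per-game serialisation from the second bullet reduces to one mutex per
`game_id`. A minimal sketch follows; the real `runtime` package may keep
richer per-game state.
```go
package runtime

import "sync"

// gameLocks hands out one mutex per game_id so container operations for
// the same game are serialised while different games proceed in parallel.
type gameLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (g *gameLocks) forGame(gameID string) *sync.Mutex {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.locks == nil {
		g.locks = make(map[string]*sync.Mutex)
	}
	l, ok := g.locks[gameID]
	if !ok {
		l = &sync.Mutex{}
		g.locks[gameID] = l
	}
	return l
}

// withGameLock runs op while holding the game's mutex, so concurrent
// start/stop/patch attempts for one game cannot race.
func (g *gameLocks) withGameLock(gameID string, op func() error) error {
	l := g.forGame(gameID)
	l.Lock()
	defer l.Unlock()
	return op()
}
```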
Containers managed by `backend` carry the Docker label
`galaxy.backend=1`. Reconciliation matches that label against
`runtime_records` so a redeploy of `backend` re-attaches to running
games rather than orphaning them.
Future improvement (not in MVP): introduce a docker-socket-proxy sidecar
(for example `tecnativa/docker-socket-proxy`) and connect `dockerclient`
through it over TCP. Until then `backend` mounts `/var/run/docker.sock`
directly.
## 14. Admin Surface
- Admin authentication is HTTP Basic Auth.
- Credentials live in the Postgres table `admin_accounts` with
`username`, `password_hash` (bcrypt cost 12), `created_at`,
`last_used_at`, `disabled_at`.
- Bootstrap: at startup `backend` reads `BACKEND_ADMIN_BOOTSTRAP_USER`
and `BACKEND_ADMIN_BOOTSTRAP_PASSWORD`; if no `admin_accounts` record
with that username exists, it is inserted with the bcrypt hash. The
insert is idempotent so restarts are safe; a sketch follows this list.
- Existing admins can manage other admins through the same
`/api/v1/admin/admin-accounts` endpoints.
- All other admin endpoints (`/api/v1/admin/users/*`, `/api/v1/admin/games/*`,
`/api/v1/admin/runtimes/*`, `/api/v1/admin/mail/*`,
`/api/v1/admin/notifications/*`) reuse the per-domain logic of the
module they target.
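A sketch of the bootstrap step, assuming a UNIQUE index on `username` and
a generic SQL exec helper; the environment variable names are the ones
listed above.
```go
package admin

import (
	"context"
	"os"

	"golang.org/x/crypto/bcrypt"
)

// ON CONFLICT DO NOTHING makes restarts safe: an existing admin account
// with the same username is never overwritten.
const bootstrapSQL = `
INSERT INTO admin_accounts (username, password_hash, created_at)
VALUES ($1, $2, now())
ON CONFLICT (username) DO NOTHING`

func Bootstrap(ctx context.Context, exec func(ctx context.Context, sql string, args ...any) error) error {
	user := os.Getenv("BACKEND_ADMIN_BOOTSTRAP_USER")
	pass := os.Getenv("BACKEND_ADMIN_BOOTSTRAP_PASSWORD")
	if user == "" || pass == "" {
		return nil // bootstrap credentials not configured
	}
	hash, err := bcrypt.GenerateFromPassword([]byte(pass), 12) // bcrypt cost 12
	if err != nil {
		return err
	}
	return exec(ctx, bootstrapSQL, user, string(hash))
}
```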
## 15. Transport Security Model (gateway boundary)
This section describes the secure exchange model between client and
gateway. It applies at the public boundary and does not rely on backend
behaviour for any of its guarantees.
### Principles
- No browser cookies.
- Authentication is device-session based.
- Each device session is unique and independently revocable.
- No short-lived access tokens or refresh-token flows.
- Requests are authenticated by client signatures.
- Responses and push events are authenticated by server signatures.
- Transport integrity and freshness are verified before any payload is
processed.
### Device session model
After a successful email-code login:
1. The client generates an Ed25519 key pair.
2. The private key remains on the client.
3. The client public key is registered with `backend` as the standard
base64-encoded raw 32-byte Ed25519 key.
4. `backend` creates a persistent device session.
5. The client persists `device_session_id` and the private key.
`backend` stores at least `device_session_id`, `user_id`, the
base64-encoded raw 32-byte Ed25519 client public key, session status,
and revoke metadata.
### Key storage
- Native clients use platform secure storage; private keys never leave
the device.
- Browser/WASM clients use WebCrypto with non-exportable storage where
available. Loss of browser storage is acceptable and is recovered by
re-login.
### Request envelope
Each authenticated request carries `payload_bytes`, a `request_envelope`,
and a signature. The envelope contains:
- `protocol_version` (`v1`)
- `device_session_id`
- `message_type`
- `timestamp_ms`
- `request_id`
- `payload_hash` (raw 32-byte SHA-256 of `payload_bytes`)
The client signs canonical bytes built from:
```text
"galaxy-request-v1" || protocol_version || device_session_id ||
message_type || timestamp_ms || request_id || payload_hash
```
with this binary encoding:
- each `string` and `bytes` field is encoded as `uvarint(len(field_bytes))`
followed by raw bytes;
- `timestamp_ms` is encoded as an 8-byte big-endian unsigned integer;
- fields are appended in the exact order listed.
The signature scheme is Ed25519; the signature field carries the raw
64-byte signature.
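A client-side sketch of the canonical-byte construction and signing.
Whether the `galaxy-request-v1` prefix is itself length-prefixed is an
assumption here; the authoritative encoding is the one the gateway
verifies.
```go
package envelope

import (
	"crypto/ed25519"
	"encoding/binary"
)

type RequestEnvelope struct {
	ProtocolVersion string // "v1"
	DeviceSessionID string
	MessageType     string
	TimestampMS     uint64
	RequestID       string
	PayloadHash     []byte // raw 32-byte SHA-256 of payload_bytes
}

// appendField writes uvarint(len(field)) followed by the raw bytes.
func appendField(dst, field []byte) []byte {
	dst = binary.AppendUvarint(dst, uint64(len(field)))
	return append(dst, field...)
}

// CanonicalRequestBytes appends the fields in the exact order given above;
// timestamp_ms is an 8-byte big-endian unsigned integer.
func CanonicalRequestBytes(e RequestEnvelope) []byte {
	b := appendField(nil, []byte("galaxy-request-v1"))
	b = appendField(b, []byte(e.ProtocolVersion))
	b = appendField(b, []byte(e.DeviceSessionID))
	b = appendField(b, []byte(e.MessageType))
	b = binary.BigEndian.AppendUint64(b, e.TimestampMS)
	b = appendField(b, []byte(e.RequestID))
	b = appendField(b, e.PayloadHash)
	return b
}

// SignRequest produces the raw 64-byte Ed25519 signature the client sends.
func SignRequest(priv ed25519.PrivateKey, e RequestEnvelope) []byte {
	return ed25519.Sign(priv, CanonicalRequestBytes(e))
}
```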
### Response envelope
Each server response carries `payload_bytes`, a `response_envelope`, and
a signature. The envelope contains:
- `protocol_version`
- `request_id`
- `timestamp_ms`
- `result_code`
- `payload_hash`
Canonical bytes:
```text
"galaxy-response-v1" || protocol_version || request_id ||
timestamp_ms || result_code || payload_hash
```
The gateway signs with a PKCS#8 PEM-encoded Ed25519 private key. Clients
verify with a trusted server public key.
### Push events
Each server push event carries `payload_bytes`, an `event_envelope`, and
a signature. Required envelope fields: `event_type`, `event_id`,
`timestamp_ms`, `payload_hash`. Optional: `request_id`, `trace_id`.
Canonical bytes:
```text
"galaxy-event-v1" || event_type || event_id || timestamp_ms ||
request_id || trace_id || payload_hash
```
Gateway signs each event at delivery time using the same Ed25519 key as
for responses. The bootstrap event delivered when a `SubscribeEvents`
stream opens is `event_type = gateway.server_time`, reusing the opening
`request_id` as `event_id` and carrying `server_time_ms` so clients can
calibrate offset without a separate time request.
### Verification order at gateway
Before any payload is forwarded to backend, gateway must:
1. Verify the transport envelope is present and supported.
2. Resolve `device_session_id` (against backend, sync REST).
3. Reject unknown or revoked sessions.
4. Verify the client signature using the stored public key.
5. Verify `payload_hash`.
6. Verify timestamp freshness (symmetric ±5 minutes around server time).
7. Verify anti-replay: reserve `(device_session_id, request_id)` until
`timestamp_ms + freshness_window`.
8. Apply edge rate limits and basic policy.
9. Forward to backend with `X-User-ID` set.
### Verification order at client
Before accepting a response payload, the client must verify the response
signature, that `request_id` matches the corresponding request, the
`payload_hash`, and where applicable the timestamp freshness.
Before accepting a push payload, the client must verify the event
signature, the `payload_hash`, the `request_id` when correlated, and
where applicable the timestamp freshness.
### Anti-replay
Anti-replay uses `(timestamp_ms, request_id)`. Recently seen
`request_id` values are tracked per session in Redis until
`timestamp_ms + freshness_window`. This protects transport freshness
only; business idempotency is a separate concern enforced by backend
domain tables.
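The reservation reduces to a `SET NX` with an expiry derived from the
request timestamp. A sketch using `go-redis`; the key layout is an
assumption and the real store lives in the gateway.
```go
package replay

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

type Store struct {
	rdb *redis.Client
}

// Reserve returns false when (device_session_id, request_id) was already
// seen inside the freshness window, i.e. the request is a replay.
func (s *Store) Reserve(ctx context.Context, sessionID, requestID string, timestampMS int64, freshness time.Duration) (bool, error) {
	key := "replay:" + sessionID + ":" + requestID
	// Keep the reservation until timestamp_ms + freshness_window.
	ttl := time.Until(time.UnixMilli(timestampMS).Add(freshness))
	if ttl <= 0 {
		return false, nil // already stale; the freshness check rejects it anyway
	}
	return s.rdb.SetNX(ctx, key, 1, ttl).Result()
}
```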
### TLS and MITM
Native clients should use TLS pinning (SPKI-based) in addition to the
signed exchange. Browser clients rely on browser-managed TLS and the
signed exchange.
### Threat model boundaries
The transport model protects against tampering in transit, replay inside
the freshness window, use of unknown or revoked sessions, forged server
responses without the gateway signing key, and forged client requests
without the client signing key. It does not prevent a legitimate user
from generating their own valid requests; that is handled by backend
business validation and authorisation.
## 16. Security Boundaries Summary
| Concern | Enforced by | Notes |
| -------------------------------------------------------- | ----------------------- | ----------------------------------------------------------------------------------------------- |
| Public TLS termination, pinning | gateway | Native clients pin SPKI. |
| Request signature, payload hash, freshness, anti-replay | gateway | See [§15](#15-transport-security-model-gateway-boundary). |
| Session lookup | backend (sync REST) + gateway in-memory LRU | gateway-side LRU with TTL safety net ([§6](#6-in-memory-cache)) hits backend's `/api/v1/internal/sessions/{id}` only on miss; no Redis projection. |
| Session revocation propagation | backend → gateway | `session_invalidation` over the gRPC push stream flips the gateway-side cache entry to revoked and closes any active push stream. |
| Authorisation, ownership, state transitions | backend | `X-User-ID` is the sole identity input on the user surface. |
| Edge rate limiting | gateway | Backend has no rate-limit responsibility in MVP. |
| Admin authentication | backend | Basic Auth against `admin_accounts`. |
| Engine API authentication | network | Engine listens only on the trusted network; backend is the only caller. |
### Backend ↔ Gateway trust
The MVP does not require an additional authenticator between gateway and
backend. Backend trusts `X-User-ID` from gateway and accepts gateway
gRPC subscribers without authentication. The trust boundary is the
network: deployment must ensure that only `gateway` can reach
`backend`'s HTTP and gRPC listeners.
This is an explicit, accepted risk. Compromise of the trusted network
between gateway and backend would let any party impersonate any user or
admin against backend. The risk is mitigated only by network isolation
of the deploy. Adding mutual authentication (a pre-shared bearer token
or mTLS between gateway and backend) is a future hardening step;
backend is structured so that adding such a check is a single middleware
addition.
## 17. Observability
- **Tracing and metrics** flow through OpenTelemetry. The default exporter
is OTLP (gRPC or HTTP/protobuf, configurable). Metrics may also be
exposed via a Prometheus pull endpoint when configured.
- **Logging** uses `go.uber.org/zap` in JSON mode. Trace and span ids are
injected into every log entry written inside a request scope.
- Every backend module emits the metrics relevant to its concern: HTTP
request count and duration per route group, gRPC subscription count and
push event throughput, mail outbox depth and per-attempt outcomes,
notification fan-out counts, container operation counts and durations,
Postgres pool stats, geo lookup count and error rate.
- Health probes are unauthenticated `GET /healthz` (process liveness) and
`GET /readyz` (Postgres reachable, migrations applied, caches warmed,
gRPC listener
bound). Probes are excluded from anti-replay and rate limiting.
## 18. Deployment Topology (informational)
- MVP runs three executables: one `gateway` instance, one `backend`
instance, and N `galaxy-game-{game_id}` containers managed by backend.
- One Postgres database is shared by `backend` only.
- One Redis instance is reachable from `gateway` only (anti-replay).
- One SMTP relay is reachable from `backend`.
- The Docker daemon socket is mounted into `backend`.
- The GeoLite2 country database file is mounted at the path given by
`BACKEND_GEOIP_DB_PATH`.
Future scale-out hooks (not in MVP):
- Distributed `backend` requires reintroducing Redis for shared session
cache and runtime job leasing, plus leader election for the turn
scheduler.
- mTLS between gateway and backend.
- Docker-socket-proxy sidecar fronting Docker daemon access.
## 19. Glossary
- **device_session_id** — opaque identifier of an authenticated client
device; primary key of the device session record.
- **race_name** — in-game player display name. Three tiers in the Race
Name Directory: registered (platform-unique), reservation (per-game),
pending_registration (post-capable-finish).
- **canonical key** — lowercased and confusable-folded form of a race
name used for uniqueness checks, computed via `disciplinedware/go-confusables`.
- **capable finish** — a finished game in which the player reached
`max_planets > initial AND max_population > initial`. Only capable
finishes promote a reservation to `pending_registration`.
- **runtime snapshot** — engine-status read materialised into the lobby's
denormalised view: `current_turn`, `runtime_status`,
`engine_health_summary`, `player_turn_stats`.
- **turn cutoff** — the `running → generation_in_progress` CAS transition
that closes the command window. Commands arriving after the CAS are
rejected.
- **outbox** — the durable queue of pending mail rows in
`mail_deliveries`, drained by the mail worker.
- **freshness window** — the symmetric ±5-minute interval around server
time inside which a request `timestamp_ms` is accepted.
- **trust boundary** — the network segment between gateway and backend.
Compromise of this segment defeats backend authentication; deployment
must isolate it.
# Testing
Test strategy and runbook for the [Galaxy Game](ARCHITECTURE.md)
platform. The platform ships three executables — `gateway`,
`backend`, `game` (the engine container) — plus the shared `pkg/*`
libraries. This document defines the layering of tests, the
mandatory minimum coverage per executable, the integration runbook,
and the principles every test must follow.
## Layers
1. **Service tests** verify a single executable in isolation. They
live next to the implementation as `*_test.go` files and use only
in-process or testcontainers-managed dependencies. The package
either runs entirely in process or boots a single Postgres
testcontainer per test.
2. **Inter-service integration tests** verify one cross-process seam
between two real executables (most often `gateway ↔ backend`,
sometimes `backend ↔ game`). They live in
[`galaxy/integration/`](../integration/) and drive the platform
from outside the trust boundary.
3. **Full system tests** are a small, focused subset of the
integration suite that walks an entire user-facing flow from the
client edge through every component the flow touches. They live
in the same `integration/` module and reuse the same fixtures.
Service tests are the cheapest and the broadest; integration tests
are slower and narrower; full-system tests are the slowest and the
narrowest. The pyramid stays in this order — never replace a service
test with a system test.
## Global rules
- Every executable owns the service tests for its packages. Adding a
new package without `_test.go` files is a review block.
- Every cross-process seam must have at least one passing
inter-service test before the seam is wired in production.
- Async flows (mail outbox, notification routes, runtime workers,
push gRPC) get tests for both the success path and the retry /
dead-letter path, and a duplicate-event safety check.
- Sync flows get happy path, validation failure, timeout
propagation, and dependency unavailable.
- Every external or trusted-internal API must have contract tests
alongside behaviour tests. `backend/internal/server/contract_test.go`
is the reference; gateway runs the same shape against
`gateway/openapi.yaml`.
- The integration suite must keep running on a developer machine
with Docker available. The only acceptable `t.Skip` is
`testenv.RequireDocker` (no daemon at all). Any failure deeper
than that — `tcpostgres.Run`, network create, image build, schema
migration — fails the test loudly with `t.Fatal`. The historical
bug we fixed (silent skips on reaper failures masking 27
integration tests as "ok") came from treating an environment
break as a skip.
## Service-specific coverage
### `galaxy/gateway`
Service tests live under `gateway/internal/`:
- Public REST routing, error projection, and OpenAPI contract
validation.
- Authenticated gRPC envelope verification (`grpcapi.Server`):
signature, payload hash, freshness window, anti-replay reservation,
unknown / revoked sessions.
- Session cache (`session.BackendCache`) — the only implementation
in the codebase, a thin wrapper around the `backendclient.RESTClient`
per-request lookup.
- Response signing for unary responses and stream events
(`authn.ResponseSigner`).
- Push hub (`push.Hub`) and push fan-out (`push_fanout.go`).
- Replay store (`replay.RedisStore`) reservation semantics.
- Anti-abuse rate limits per IP / session / user / message class.
### `galaxy/backend`
Service tests live under `backend/internal/`:
- Startup wiring: `app.App` lifecycle, telemetry runtime, Postgres
pool, embedded migrations.
- OpenAPI contract test (`internal/server/contract_test.go`):
validates every documented operation against the live gin engine.
- Domain unit + e2e tests per package (`auth`, `user`, `admin`,
`lobby`, `runtime`, `mail`, `notification`, `geo`, `push`).
E2E tests (`*_e2e_test.go`) spin up a Postgres testcontainer.
- Mail outbox: pickup with `SELECT FOR UPDATE SKIP LOCKED`, retry
with backoff plus jitter, dead-letter past `MAX_ATTEMPTS`,
resend semantics (`pending|retrying|dead_lettered` → re-armed,
`sent` → 409).
- Notification: idempotent `Submit`, route materialisation, push +
email fan-out, `OnUserDeleted` cascade. Coverage of every catalog
kind in `buildClientPushEvent` lives in
`internal/notification/events_test.go`.
- Lobby: state-machine transitions, RND canonicalisation, sweeper.
- Runtime: per-game mutex serialisation, worker pool, scheduler,
reconciler, force-next-turn skip flag.
- Admin: bcrypt cost 12, idempotent bootstrap, write-through cache,
409 Conflict on duplicate username, last-used timestamp.
- Geo: counter increment on every authenticated request,
declared-country write at registration, fail-open semantics.
### `galaxy/game`
The engine has its own service tests under `game/`:
- OpenAPI contract test (`game/openapi_contract_test.go`).
- Engine lifecycle (init, status, turn, banish, command, order,
report) implemented by the engine package suites.
## Integration runbook
### Entry points
```bash
make -C integration preclean # idempotent leftover cleanup
make -C integration integration # preclean + serial test run
make -C integration integration-step # preclean + one-test-at-a-time
```
`integration` runs every test in the module sequentially
(`-p=1 -parallel=1`) — recommended default on a slow / shared
Docker. `integration-step` runs them one at a time with a fresh
preclean before each test and stops on the first failure; useful to
isolate a flake or build up to a full pass without losing context to
subsequent tests.
### Why preclean matters
`preclean` keys off labels and removes:
- Containers labelled `org.testcontainers=true` (every container the
testcontainers-go library brings up — backend, gateway, game,
postgres, redis, mailpit, ryuk).
- Containers labelled `galaxy.backend=1` — engine instances spawned
by backend's runtime adapter directly on the host Docker daemon
(see `backend/internal/dockerclient/types.go`).
- Networks labelled `org.testcontainers=true`.
- Locally-built images labelled `galaxy.test.kind=integration-image`
— the `galaxy/{backend,gateway,game}:integration` builds produced
by `integration/testenv/images.go`. Pulled service images
(`postgres:16-alpine`, `redis:7-alpine`, `axllent/mailpit`,
`testcontainers/ryuk`) are **not** touched, so the cache stays
warm.
### Ryuk reaper
The integration runners disable the testcontainers Ryuk reaper:
```makefile
export TESTCONTAINERS_RYUK_DISABLED = true
```
This is environment-driven, not principled — Ryuk does not start
cleanly on the local colima setup we use, and `preclean` covers the
same job by labels. Re-enable Ryuk by exporting
`TESTCONTAINERS_RYUK_DISABLED=false` (or unset) before invoking the
make target if you have an environment where Ryuk works.
### Cold runs
The first run after a clean checkout (or after `preclean`) rebuilds
three images: `galaxy/backend:integration`,
`galaxy/gateway:integration`, `galaxy/game:integration`. Cold cost
is ~30 s per image. Subsequent runs reuse the build cache; `preclean`
removes the tagged images themselves but BuildKit cache mounts
survive, so re-builds are fast.
## Integration test coverage
Mandatory inter-service coverage in `integration/`:
- **Gateway ↔ Backend (public auth)**:
`auth_flow_test.go` — register + confirm with mailpit-captured
code; declared_country populated; idempotent re-confirm.
- **Gateway ↔ Backend (authenticated user surface)**:
`user_account_test.go`, `user_profile_update_test.go`,
`user_settings_update_test.go` — signed envelope, FlatBuffers
payload, response signature verification, BCP 47 / IANA validation.
- **Gateway ↔ Backend (anti-replay, signature, freshness)**:
`gateway_edge_test.go` — body-too-large, bad signature,
payload_hash mismatch, stale timestamp, unknown session,
unsupported `protocol_version`.
- **Gateway ↔ Backend (push)**:
`notification_flow_test.go`, `session_revoke_test.go` — push
delivery to a SubscribeEvents stream and immediate stream close
on revoke.
- **Gateway ↔ Backend (anti-replay)**:
`anti_replay_test.go` — duplicate `request_id` rejected.
- **Backend ↔ Postgres** is exercised by every backend e2e test
through testcontainers; integration tests do not duplicate it.
- **Backend ↔ SMTP**:
`mail_flow_test.go` — login-code email captured by mailpit; admin
list reaches `sent`; resend on `sent` returns 409.
- **Backend ↔ Game engine**:
`runtime_lifecycle_test.go`, `engine_command_proxy_test.go` —
start container, healthz green, command, force-next-turn, finish,
race name promotion.
- **Admin surface (REST)**:
`admin_flow_test.go`, `admin_global_games_view_test.go`,
`admin_engine_versions_test.go`, `admin_user_sanction_test.go` —
bootstrap + CRUD; visibility split between user and admin queries;
engine-version registry CRUD; permanent block cascade.
- **Lobby flow without engine**:
`lobby_flow_test.go` — owner-creates-private-game →
open-enrollment → invite → redeem → memberships listing.
- **Soft delete cascade**:
`soft_delete_test.go` — `POST /api/v1/user/account/delete`
cascades through auth/lobby/notification/geo; gateway rejects
subsequent calls.
- **Geo counters**:
`geo_counter_increments_test.go` — multiple authenticated
requests with different `X-Forwarded-For` values increment the
user's per-country counter rows.
Full-system flows beyond the inter-service set are intentionally
limited; pick scenarios that exercise the longest vertical slice
the platform supports today.
## Principles
### Service tests
- **Postgres testcontainers must pin no-op observability providers.**
Tests that call `pgshared.OpenPrimary(ctx, cfg)` from
`galaxy/postgres` pass `backendpg.NoObservabilityOptions()...` so
`otelsql` cannot fall through to the global tracer/meter providers.
Without this, an unset OTEL endpoint in the developer environment
can stall the test on a background exporter handshake.
See `backend/internal/postgres/testopts.go` for the helper and
`backend/internal/{auth,user,admin,lobby,mail,notification,runtime,geo,postgres}/`
test files for the established call sites.
- **A bootstrap failure is fatal, not a skip.** A test that needs a
testcontainer must fail loudly when the container fails to come
up. `t.Skipf` is reserved for `testenv.RequireDocker` (no daemon
at all); anything past that — `tcpostgres.Run`, `db.Ping`, schema
migration — uses `t.Fatalf`.
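A hedged sketch of the `startPostgres(t)` shape those call sites share.
The `pgshared`/`backendpg` signatures are assumed from the text above;
the testcontainers calls are the standard module API.
```go
package auth_test

import (
	"context"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	tcpostgres "github.com/testcontainers/testcontainers-go/modules/postgres"
	"github.com/testcontainers/testcontainers-go/wait"
)

// startPostgres boots a dedicated Postgres container and returns its DSN.
// Every bootstrap failure is t.Fatalf, never a skip.
func startPostgres(t *testing.T) string {
	t.Helper()
	ctx := context.Background()
	ctr, err := tcpostgres.Run(ctx, "postgres:16-alpine",
		testcontainers.WithWaitStrategy(
			wait.ForLog("database system is ready to accept connections").WithOccurrence(2)),
	)
	if err != nil {
		t.Fatalf("postgres testcontainer: %v", err)
	}
	t.Cleanup(func() { _ = ctr.Terminate(ctx) })
	dsn, err := ctr.ConnectionString(ctx, "sslmode=disable")
	if err != nil {
		t.Fatalf("connection string: %v", err)
	}
	// The caller opens the pool with the no-op observability options, e.g.
	// pgshared.OpenPrimary(ctx, cfg, backendpg.NoObservabilityOptions()...),
	// so otelsql never falls through to the global providers (signatures
	// assumed).
	return dsn
}
```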
### Integration tests
- **Bootstrap is per-test.** Each test calls `testenv.Bootstrap(t)`
to spin up a dedicated Postgres, Redis, mailpit, backend, and
gateway. Cross-test contamination is impossible.
- **Tests do not call `t.Parallel`.** Docker resource pressure makes
parallel bootstraps flaky on commodity hardware.
- **Anti-abuse limits are loosened by `testenv/gateway.go`.** The
bulk-scenario default lifts every gateway rate-limit class
(`public_auth`, identity-bucket per-email, IP/session/user/message-class)
to 10 000 req/window with a 1 000 burst. Negative-path edge tests in
`gateway_edge_test.go` tighten specific limits
per test to observe the protection firing.
- **Image labels are intentional.** `integration/testenv/images.go`
stamps every locally-built image with
`galaxy.test.kind=integration-image`; `preclean` keys off this
label. Do not strip it from new image builds added to the test
harness.
## Test file ownership matrix
| Suite | Where | Boots | Runs how |
|--------------------------------------------|-------------------|----------------------------------------------------------------------|-------------------------------------------|
| `backend/internal/<pkg>/...` unit | per package | one Postgres testcontainer per test | `go test ./internal/<pkg>/` |
| `backend/push` | `backend/push/` | nothing | `go test ./push/` |
| `gateway/internal/<pkg>/...` unit | per package | mostly nothing; a few use a Redis testcontainer | `go test ./internal/<pkg>/` |
| `pkg/transcoder`, `pkg/postgres` unit | per package | nothing / one tc per test | `go test ./...` from the package |
| `integration/` | `integration/` | postgres + redis + mailpit + backend + gateway (+ optional game) | `make -C integration integration` |
## Adding a new test
1. Decide the layer: service, inter-service, or system. A backend
change usually lands as service tests plus an integration test
for any new cross-process behaviour.
2. Reuse `testenv` fixtures rather than rolling your own container
orchestration.
3. Follow the bootstrap-per-test pattern; do not share a global
stack across tests.
4. Make the test deterministic: explicit timeouts (no
`time.Sleep`), `t.Logf` instead of `fmt.Println`, no
`t.Parallel()` in `integration/`.
5. Service test that hits Postgres: copy the `startPostgres(t)`
helper from one of the existing packages (e.g.
`backend/internal/auth/auth_e2e_test.go`) and pass
`backendpg.NoObservabilityOptions()...` to `pgshared.OpenPrimary`.
6. Integration test: add the file under `integration/`, call
`testenv.Bootstrap(t)`, and use the typed clients exposed by
`testenv` rather than reaching for raw HTTP. New scenarios that
need bespoke gateway env should pass `Extra` through
`BootstrapOptions` so the loosened defaults stay shared.
7. Any test that brings up its own Docker container (rare — most go
through `testenv`) must label the container so `preclean` can
find it on the next run.
## Day-to-day execution
- Run `go test ./<service>/...` for the service you are touching;
this is fast (Postgres testcontainers add ~35 s per package that
uses them).
- Run `make -C integration integration` before opening a PR that
touches a cross-process seam. Cold runs build three Docker images
(`galaxy/backend:integration`, `galaxy/gateway:integration`,
`galaxy/game:integration`) — budget ~3 min for the cold path,
~75 s for the warm path.
- Use `make -C integration integration-step` when a flake or a real
regression needs a per-test isolation pass.
- CI runs every layer on every push. Integration tests rely on a
reachable Docker daemon; missing daemon yields a clear skip from
`testenv.RequireDocker`; anything past that is a hard failure.
## Out-of-scope (legacy architecture)
The previous nine-service architecture defined components that no
longer exist as distinct services. Their behaviour either lives
inside `backend` (and is therefore covered by backend service or
integration tests) or has been removed:
- *Auth/Session Service*, *User Service*, *Notification Service*,
*Mail Service*, *Game Lobby Service*, *Runtime Manager*,
*Game Master*, *Admin Service* — consolidated into
`backend/internal/*`. Inter-service seams between these former
services are now in-process function calls; they are exercised by
backend service tests, not by integration tests.
- *Geo Profile Service* (suspicious-multi-country detection,
review-recommended state, session blocking through geo) — not
implemented. The geo concern is intentionally minimal (see
`ARCHITECTURE.md §10`) and the test plan does not assert on
features we do not ship.
- *Billing Service* — not implemented; no tests required until it
appears.