15d35f6f1f
Engine no longer mints its own game UUID. The orchestrator (backend)
generates the game UUID at game-create time and passes it in the
admin/init request body as the required `gameId` field, so the value
that names the engine container and host bind-mount directory also
ends up inside the engine's state.json.
The engine rejects the zero UUID with 400 and any init that conflicts
with an existing state.json with 409 (a second init on the same gameId
is also a conflict; full idempotency is not part of the contract).
Updates rest.InitRequest, openapi.yaml (schema + 409 response),
controller.GenerateGame/NewGame/buildGameOnMap signatures, the engine
HTTP handler/executor, the backend runtime worker, and the relevant
unit and contract tests. Documentation in game/README.md,
docs/ARCHITECTURE.md, backend/README.md, and backend/docs/{runtime,flows}.md
is updated in the same patch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# backend

`backend` is the consolidated business service of the Galaxy platform. It
owns identity, sessions, lobby, game runtime, mail, notifications, geo
signals, and administration. It is reachable only from `gateway` over
the trusted network. See `../docs/ARCHITECTURE.md` for the platform-level
context, security model, and decision rationale.

## 1. Purpose

A single Go binary that:

- Serves three HTTP route groups (`/api/v1/public/*`, `/api/v1/user/*`,
  `/api/v1/admin/*`) plus health probes.
- Hosts a gRPC `SubscribePush` server consumed by `gateway`.
- Owns one Postgres schema (`backend`).
- Talks to the Docker daemon to run game engine containers.
- Talks to an SMTP relay to send mail through a durable outbox.
- Reads the GeoLite2 country database for source-IP country lookup.

This README describes how the binary is laid out, configured, and run.
The implementation specification lives in `PLAN.md`.

## 2. API Surfaces

| Prefix             | Auth                                     | Audience                        |
| ------------------ | ---------------------------------------- | ------------------------------- |
| `/api/v1/public/*` | none                                     | Registration, code confirmation |
| `/api/v1/user/*`   | `X-User-ID` injected by gateway          | Authenticated end users         |
| `/api/v1/admin/*`  | HTTP Basic Auth against `admin_accounts` | Platform administrators         |
| `/healthz`         | none                                     | Liveness probe                  |
| `/readyz`          | none                                     | Readiness probe                 |

The full contract is documented in `openapi.yaml` and validated at
runtime by the contract tests under `internal/server/`.

## 3. Module Layout

```text
backend/
├── cmd/
│   ├── backend/          # main.go: process entrypoint
│   └── jetgen/           # jet code generator runner
├── internal/
│   ├── admin/            # admin_accounts, Basic Auth verifier, admin operations
│   ├── auth/             # email-code challenges, device sessions, Ed25519 keys
│   ├── config/           # env-var loader, Validate
│   ├── diplomail/        # diplomatic-mail messages, recipients, translations
│   ├── dockerclient/     # docker/docker wrapper for container ops
│   ├── engineclient/     # net/http client to galaxy-game containers
│   ├── geo/              # geoip lookup, declared_country, per-user counters
│   ├── lobby/            # games, applications, invites, memberships, RND
│   ├── mail/             # outbox worker, SMTP delivery, dead letters
│   ├── notification/     # intent normalisation, push + email fan-out
│   ├── postgres/         # pgx pool, embedded migrations, jet/
│   ├── push/             # gRPC SubscribePush server
│   ├── runtime/          # engine version registry, container lifecycle, scheduler
│   ├── server/           # gin engine, route groups, middleware, handlers
│   ├── telemetry/        # otel runtime, zap factory
│   └── user/             # accounts, settings, entitlements, sanctions, soft delete
├── proto/
│   └── push/v1/          # push.proto and generated gRPC code
├── docs/                 # per-stage decision records (one file per decision)
├── openapi.yaml          # full REST contract (public + user + admin)
├── go.mod
├── Makefile              # `make jet` regenerates jet code
└── README.md
```

## 4. Configuration

All configuration is environment-based; there are no flags or files.
`Validate()` is called once at startup; missing required values fail
fast.

| Variable | Required | Default | Purpose |
| --------------------------------------- | -------- | ------------------------ | --------------------------------------------------- |
| `BACKEND_HTTP_LISTEN_ADDR` | no | `:8080` | HTTP listener for REST surfaces and probes. |
| `BACKEND_HTTP_READ_TIMEOUT` | no | `30s` | HTTP read timeout. |
| `BACKEND_HTTP_WRITE_TIMEOUT` | no | `30s` | HTTP write timeout. |
| `BACKEND_HTTP_SHUTDOWN_TIMEOUT` | no | `15s` | Graceful shutdown budget for HTTP server. |
| `BACKEND_SHUTDOWN_TIMEOUT` | no | `30s` | Process-wide cap applied to each component shutdown. |
| `BACKEND_GRPC_PUSH_LISTEN_ADDR` | no | `:8081` | gRPC listener for the push interface. |
| `BACKEND_GRPC_PUSH_SHUTDOWN_TIMEOUT` | no | `10s` | Graceful shutdown budget for the gRPC server. |
| `BACKEND_LOGGING_LEVEL` | no | `info` | zap log level. |
| `BACKEND_POSTGRES_DSN` | yes | — | pgx-style Postgres DSN. Must include `search_path=backend` so unqualified reads and writes resolve to the service-owned schema. |
| `BACKEND_POSTGRES_MAX_CONNS` | no | `25` | Pool max connections. |
| `BACKEND_POSTGRES_MIN_CONNS` | no | `2` | Pool min connections. |
| `BACKEND_POSTGRES_OPERATION_TIMEOUT` | no | `5s` | Default per-statement timeout. |
| `BACKEND_SMTP_HOST` | yes | — | SMTP relay host. |
| `BACKEND_SMTP_PORT` | no | `587` | SMTP relay port. |
| `BACKEND_SMTP_USERNAME` | no | — | SMTP auth username (omit for anonymous). |
| `BACKEND_SMTP_PASSWORD` | no | — | SMTP auth password. |
| `BACKEND_SMTP_FROM` | yes | — | RFC-5321 From address. |
| `BACKEND_SMTP_TLS_MODE` | no | `starttls` | `none`, `starttls`, or `tls`. |
| `BACKEND_MAIL_WORKER_INTERVAL` | no | `2s` | How often the outbox worker scans for new work. |
| `BACKEND_MAIL_MAX_ATTEMPTS` | no | `8` | Maximum delivery attempts before dead-lettering. |
| `BACKEND_DOCKER_HOST` | no | `unix:///var/run/docker.sock` | Docker daemon endpoint. |
| `BACKEND_DOCKER_NETWORK` | yes | — | User-defined Docker bridge network for engines. |
| `BACKEND_GAME_STATE_ROOT` | yes | — | Host directory bind-mounted into engine containers. |
| `BACKEND_ADMIN_BOOTSTRAP_USER` | no | — | Initial admin username; idempotent insert. |
| `BACKEND_ADMIN_BOOTSTRAP_PASSWORD` | no | — | Initial admin password; required if user is set. |
| `BACKEND_GEOIP_DB_PATH` | yes | — | Filesystem path to GeoLite2 Country `.mmdb`. |
| `BACKEND_OTEL_TRACES_EXPORTER` | no | `otlp` | `none`, `otlp`, `stdout`. |
| `BACKEND_OTEL_METRICS_EXPORTER` | no | `otlp` | `none`, `otlp`, `stdout`, `prometheus`. |
| `BACKEND_OTEL_PROTOCOL` | no | `grpc` | `grpc` or `http/protobuf`. OTLP only. |
| `BACKEND_OTEL_ENDPOINT` | no | provider default | OTLP endpoint URL. |
| `BACKEND_OTEL_PROMETHEUS_LISTEN_ADDR` | no | `:9100` | When `BACKEND_OTEL_METRICS_EXPORTER=prometheus`. |
| `BACKEND_SERVICE_NAME` | no | `galaxy-backend` | Resource attribute for telemetry. |
| `BACKEND_FRESHNESS_WINDOW` | no | `5m` | Mirrors gateway freshness window for push cursor TTL. |
| `BACKEND_AUTH_CHALLENGE_TTL` | no | `10m` | Lifetime of an issued `auth_challenges` row. |
| `BACKEND_AUTH_CHALLENGE_MAX_ATTEMPTS` | no | `5` | Maximum confirm-email-code attempts per challenge. |
| `BACKEND_AUTH_CHALLENGE_THROTTLE_WINDOW` | no | `60s` | Rolling window over which challenges are counted toward throttle. |
| `BACKEND_AUTH_CHALLENGE_THROTTLE_MAX` | no | `3` | Max un-consumed, non-expired challenges per email per window before reuse kicks in. |
| `BACKEND_AUTH_USERNAME_MAX_RETRIES` | no | `10` | Retry budget for synthesising a unique placeholder `accounts.user_name` at registration. |
| `BACKEND_LOBBY_SWEEPER_INTERVAL` | no | `60s` | How often the lobby sweeper releases expired pending_registrations and auto-closes enrollment-expired games. |
| `BACKEND_LOBBY_PENDING_REGISTRATION_TTL` | no | `720h` (30 days) | Lifetime of a `pending_registration` Race Name Directory entry awaiting promotion. |
| `BACKEND_LOBBY_INVITE_DEFAULT_TTL` | no | `168h` (7 days) | Default expiry applied to invites whose request body omits `expires_at`. |
| `BACKEND_ENGINE_CALL_TIMEOUT` | no | `60s` | Per-call timeout for engine writes (init, turn, banish, command, order). |
| `BACKEND_ENGINE_PROBE_TIMEOUT` | no | `5s` | Per-call timeout for engine reads (status, report, healthz). |
| `BACKEND_RUNTIME_WORKER_POOL_SIZE` | no | `4` | Long-running runtime job concurrency. |
| `BACKEND_RUNTIME_JOB_QUEUE_SIZE` | no | `64` | Buffered runtime-job channel depth. |
| `BACKEND_RUNTIME_RECONCILE_INTERVAL` | no | `60s` | Interval between reconciler passes against the Docker daemon. |
| `BACKEND_RUNTIME_IMAGE_PULL_POLICY` | no | `if_missing` | Engine image pull policy: `if_missing`, `always`, `never`. |
| `BACKEND_RUNTIME_CONTAINER_LOG_DRIVER` | no | `json-file` | Docker log driver applied to engine containers. |
| `BACKEND_RUNTIME_CONTAINER_LOG_OPTS` | no | — | Comma-separated `key=value` pairs forwarded to the log driver. |
| `BACKEND_RUNTIME_CONTAINER_CPU_QUOTA` | no | `2.0` | Engine container `--cpus`. |
| `BACKEND_RUNTIME_CONTAINER_MEMORY` | no | `512m` | Engine container `--memory`. |
| `BACKEND_RUNTIME_CONTAINER_PIDS_LIMIT` | no | `256` | Engine container `--pids-limit`. |
| `BACKEND_RUNTIME_CONTAINER_STATE_MOUNT` | no | `/var/lib/galaxy-game` | Absolute in-container path for the per-game state bind mount. |
| `BACKEND_RUNTIME_STOP_GRACE_PERIOD` | no | `10s` | SIGTERM-to-SIGKILL grace period for engine container stop. |
| `BACKEND_STACK_LABEL` | no | — | Optional value stamped as `galaxy.stack=<value>` on every engine container backend spawns. Lets host-side tooling (Makefile / CI) scope cleanup to one dev stack. Empty → label is not applied. |
| `BACKEND_NOTIFICATION_ADMIN_EMAIL` | no | — | Recipient address for admin-channel notifications (`runtime.*` kinds). When empty, admin-channel routes are recorded as `skipped` and the catalog is partially silenced. |
| `BACKEND_NOTIFICATION_WORKER_INTERVAL` | no | `5s` | Notification route worker scan interval. |
| `BACKEND_NOTIFICATION_MAX_ATTEMPTS` | no | `8` | Notification route delivery attempts before dead-lettering. |
| `BACKEND_DIPLOMAIL_MAX_BODY_BYTES` | no | `4096` | Maximum size of `diplomail_messages.body` enforced at send time. Tune at runtime without a migration. |
| `BACKEND_DIPLOMAIL_MAX_SUBJECT_BYTES` | no | `256` | Maximum size of `diplomail_messages.subject`. Subject is optional; empty is always accepted. |
| `BACKEND_DIPLOMAIL_TRANSLATOR_URL` | no | — | Base URL of a LibreTranslate-compatible instance (`http://libretranslate:5000`). Empty → translator falls through to no-op (recipients are delivered with the original body). |
| `BACKEND_DIPLOMAIL_TRANSLATOR_TIMEOUT` | no | `10s` | Per-request HTTP timeout for the translation worker. |
| `BACKEND_DIPLOMAIL_TRANSLATOR_MAX_ATTEMPTS` | no | `5` | Number of failed HTTP attempts before the worker delivers the message with the original body (fallback). |
| `BACKEND_DIPLOMAIL_WORKER_INTERVAL` | no | `2s` | How often the async translation worker scans for pending pairs. The worker processes one pair per tick. |

If `BACKEND_ADMIN_BOOTSTRAP_USER` is set without
`BACKEND_ADMIN_BOOTSTRAP_PASSWORD`, `Validate()` fails. If neither is
set, no bootstrap insert happens and operators are expected to have
seeded `admin_accounts` ahead of time.

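The bootstrap rule above can be sketched as a small validation step. This is a minimal illustration and not the service's actual `Validate()`: the `bootstrapConfig` type and the `loadBootstrap` helper are hypothetical names, and the README does not say what happens when only the password is set, so the sketch accepts that case.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// bootstrapConfig is a hypothetical subset of the real configuration struct.
type bootstrapConfig struct {
	User     string
	Password string
}

// loadBootstrap reads the two bootstrap variables through the supplied
// lookup function (os.Getenv in production, a stub in tests).
func loadBootstrap(getenv func(string) string) bootstrapConfig {
	return bootstrapConfig{
		User:     getenv("BACKEND_ADMIN_BOOTSTRAP_USER"),
		Password: getenv("BACKEND_ADMIN_BOOTSTRAP_PASSWORD"),
	}
}

// validate applies the documented rule: a user without a password is a
// configuration error; both empty means the bootstrap insert is skipped.
func (c bootstrapConfig) validate() error {
	if c.User != "" && c.Password == "" {
		return errors.New("BACKEND_ADMIN_BOOTSTRAP_PASSWORD is required when BACKEND_ADMIN_BOOTSTRAP_USER is set")
	}
	return nil
}

func main() {
	cfg := loadBootstrap(os.Getenv)
	if err := cfg.validate(); err != nil {
		fmt.Fprintln(os.Stderr, "config:", err)
		os.Exit(1)
	}
	fmt.Println("bootstrap insert requested:", cfg.User != "")
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the rule testable without touching the process environment.
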
## 5. Persistence

- One Postgres database, schema `backend`. The role used by `backend`
  must own the schema (or be granted `CREATE` on it for migrations).
- Migrations live in `internal/postgres/migrations/`, are embedded into
  the binary via `embed.FS`, and are applied with `pressly/goose/v3`
  before the HTTP listener opens. The startup path also issues a
  `CREATE SCHEMA IF NOT EXISTS backend` so a fresh database does not
  trip goose's bookkeeping table on the first migration.
- Migrations are sequence-numbered (`0000N_*.sql`) and applied
  additively. `00001_init.sql` is the historical baseline; every
  schema change after it is a new file with a higher prefix. See
  `internal/postgres/migrations/README.md` for the authoring rules.
- Queries are written through `go-jet/jet/v2`. The generated code is in
  `internal/postgres/jet/backend/` and is committed; `internal/postgres/jet/jet.go`
  carries package metadata that survives regeneration.
- `make jet` regenerates the jet code: it spins up a transient Postgres
  container, applies the migrations, runs `cmd/jetgen`, and writes the
  output back into `internal/postgres/jet/backend/`. Goose's
  bookkeeping table is dropped before generation so it does not leak
  into the generated package.
- `BACKEND_POSTGRES_DSN` must include `search_path=backend`; the runtime
  pool relies on this so unqualified reads and writes resolve to the
  service-owned schema.

Idempotency is enforced through UNIQUE indexes on durable tables; there
is no separate idempotency-key table. Worker pickup uses `SELECT ...
FOR UPDATE SKIP LOCKED` ordered by `next_attempt_at`.

## 6. In-Memory Cache

`backend` warms the following caches at startup before the HTTP listener
opens:

- Active device sessions (lookup by `device_session_id`).
- User entitlement snapshots (lookup by `user_id`).
- Engine version registry (lookup by version label, populated by `internal/runtime`).
- Active runtime records (lookup by `game_id`, populated by `internal/runtime`).
- Active games and their memberships.
- Race Name Directory canonical keys.
- Admin accounts.

Each cache is updated write-through in the same domain transaction
that touches Postgres. Caches are bounded to MVP-scale data sets; if any
cache grows beyond the budget, the architecture document mandates a
discussion before moving the cache out of process.

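To make the write-through rule concrete, here is a rough sketch under stated assumptions. The generic `writeThrough` type is hypothetical (the real caches are domain-specific), and the `persist` callback stands in for the Postgres transaction; the point is only that the in-memory copy changes when, and only when, the durable write succeeds.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errPersist simulates a failed database write in the demo below.
var errPersist = errors.New("persist failed")

// writeThrough is a hypothetical generic cache used for illustration.
type writeThrough[K comparable, V any] struct {
	mu    sync.RWMutex
	items map[K]V
}

func newWriteThrough[K comparable, V any]() *writeThrough[K, V] {
	return &writeThrough[K, V]{items: make(map[K]V)}
}

// Put runs persist (standing in for the Postgres transaction) and updates
// the cached value only if persist returns nil, so the cache never gets
// ahead of the database.
func (c *writeThrough[K, V]) Put(key K, value V, persist func() error) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if err := persist(); err != nil {
		return err
	}
	c.items[key] = value
	return nil
}

// Get returns the cached value and whether it was present.
func (c *writeThrough[K, V]) Get(key K) (V, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[key]
	return v, ok
}

func main() {
	sessions := newWriteThrough[string, string]()
	_ = sessions.Put("session-1", "user-42", func() error { return nil })
	err := sessions.Put("session-2", "user-43", func() error { return errPersist })
	_, cached := sessions.Get("session-2")
	fmt.Println("second put error:", err, "cached:", cached)
}
```

Holding the lock across `persist` serialises writers, which is a simplification chosen for the sketch rather than a statement about how the service locks its caches.
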
## 7. gRPC Push Interface

The push interface is the only gRPC server hosted by `backend`. The
contract is in `proto/push/v1/push.proto`:

```proto
service Push {
  rpc SubscribePush(GatewaySubscribeRequest) returns (stream PushEvent);
}

message PushEvent {
  oneof kind {
    ClientEvent client_event = 1;
    SessionInvalidation session_invalidation = 2;
  }
  string cursor = 3;
}
```

- `ClientEvent` carries an opaque payload addressed to a `(user_id [,
  device_session_id])`. Gateway signs and forwards it to active client
  subscriptions. Producers do not pass raw bytes to `push.Service`;
  instead they pass a typed `push.Event` (`Kind() string`,
  `Marshal() ([]byte, error)`) and `push.Service` invokes Marshal at
  publish time. Every notification catalog kind (§10) has a 1:1
  FlatBuffers schema in `pkg/schema/fbs/notification.fbs`; the
  notification dispatcher routes `(kind, payload)` to a typed event
  through `notification.buildClientPushEvent`, so client decoders can
  rely on a stable wire shape per kind. `push.JSONEvent` remains as a
  safety net for kinds that arrive without a catalog schema. The frame
  also carries `event_id`, `request_id`, and `trace_id` correlation
  strings populated by backend producers (notification dispatcher
  fills `event_id` from `route_id`, `request_id` from the originating
  intent's `idempotency_key`, and `trace_id` from the active span);
  gateway re-emits the values inside the signed client envelope
  without re-interpreting them.
- `SessionInvalidation` instructs gateway to close active subscriptions
  and reject in-flight requests for the affected sessions.
- `cursor` is a monotonically increasing string. Gateway stores the last
  consumed cursor and uses it on reconnect. The format is opaque to
  gateway; backend only guarantees lexicographic monotonicity within a
  process lifetime, and resets the sequence after a restart.
- Backend keeps an in-memory ring buffer of recent events with a TTL of
  `BACKEND_FRESHNESS_WINDOW`. Cursors that have aged out resume from a
  fresh point.
- A gateway reconnect with the same `gateway_client_id` replaces the
  previous subscription (`codes.Aborted` is returned to the older
  stream). Distinct ids fan out as separate broadcast targets.
- Cursor format is a zero-padded decimal `uint64` string emitted by an
  in-process counter; gateway treats it as opaque.
- Ring buffer eviction is by TTL plus a fixed capacity ceiling.
  Backpressure is per-connection drop-oldest: if the buffered channel
  for a subscriber overflows, the oldest event for that connection is
  discarded and the loss is logged so operators can correlate the gap
  on the gateway side.

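Two of the points above, the zero-padded cursor and per-connection drop-oldest backpressure, can be sketched in a few lines of Go. The names `cursorSeq` and `offer` are hypothetical, the 20-digit width is an assumption (it is the width of the largest `uint64`), and `offer` assumes a single producer per subscriber channel; the actual `internal/push` code may differ on all three.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cursorSeq is an in-process counter; it starts from zero again whenever
// the process restarts.
type cursorSeq struct{ n atomic.Uint64 }

// Next returns the next cursor as a zero-padded decimal string. With a
// fixed width of 20 digits, string order matches numeric order.
func (s *cursorSeq) Next() string {
	return fmt.Sprintf("%020d", s.n.Add(1))
}

// offer delivers ev to a subscriber's buffered channel. When the buffer is
// full it discards the oldest queued event and retries, and it reports
// whether anything was dropped so the caller can log the gap.
func offer(ch chan string, ev string) (dropped bool) {
	if cap(ch) == 0 {
		panic("offer: channel must be buffered")
	}
	for {
		select {
		case ch <- ev:
			return dropped
		default:
		}
		select {
		case <-ch: // buffer full: evict the oldest event
			dropped = true
		default: // a consumer drained the buffer meanwhile; retry the send
		}
	}
}

func main() {
	var seq cursorSeq
	sub := make(chan string, 2)
	for i := 0; i < 3; i++ {
		if offer(sub, seq.Next()) {
			fmt.Println("dropped oldest event for this subscriber")
		}
	}
	fmt.Println(<-sub, <-sub)
}
```
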
## 8. Engine Client

`internal/engineclient` is a thin `net/http`-based client that targets
running engine containers at `http://galaxy-game-{game_id}:8080`. It
uses the DTOs in `pkg/model/{order,report,rest}` directly; it does not
introduce its own request/response types.

Endpoints used:

- `POST /api/v1/admin/init` — the runtime worker passes the canonical
  `game_id` (the same UUID that names the engine container and the
  host bind-mount directory) in the request body so the engine's
  `state.json` shares identity with the backend's `games.game_id`.
- `GET /api/v1/admin/status`
- `PUT /api/v1/admin/turn`
- `POST /api/v1/admin/race/banish`
- `PUT /api/v1/command`
- `PUT /api/v1/order`
- `GET /api/v1/report`
- `GET /healthz`

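As an illustration of the addressing scheme and the init call, the following sketch builds the request without sending it. The `initBody` struct is a hypothetical stand-in for the real DTO in `pkg/model/rest` (which likely carries more fields), and the JSON field name `gameId` is taken from the commit note at the top of this page; treat both as assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// initBody is a hypothetical, reduced version of the init request DTO.
type initBody struct {
	GameID string `json:"gameId"`
}

// engineBaseURL derives the container address from the game identifier,
// following the http://galaxy-game-{game_id}:8080 pattern.
func engineBaseURL(gameID string) string {
	return fmt.Sprintf("http://galaxy-game-%s:8080", gameID)
}

// newInitRequest builds (but does not send) the admin/init call, so the
// same UUID names the container host and appears in the request body.
func newInitRequest(gameID string) (*http.Request, error) {
	payload, err := json.Marshal(initBody{GameID: gameID})
	if err != nil {
		return nil, err
	}
	url := engineBaseURL(gameID) + "/api/v1/admin/init"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newInitRequest("3f2b6c1e-0000-4000-8000-000000000001")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```
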
Engine-version arbitration lives in `internal/runtime`. Patch updates
are semver-patch-only inside the same major/minor line; major or minor
changes require explicit stop and start. Reconciliation adopts
unrecorded containers tagged with the `galaxy.backend=1` label and
marks recorded containers that are missing as removed.

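The patch-only rule can be expressed as a small comparison. This is a sketch, not the arbitration code in `internal/runtime`: `parseVersion` and `canPatchInPlace` are hypothetical names, pre-release and build suffixes are not handled, and since the README does not say whether patch downgrades are allowed, the sketch accepts any patch number within the same major/minor line.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type version struct{ major, minor, patch int }

// parseVersion accepts MAJOR.MINOR.PATCH with an optional leading "v".
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("version %q: want MAJOR.MINOR.PATCH", s)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return version{}, fmt.Errorf("version %q: bad component %q", s, p)
		}
		nums[i] = n
	}
	return version{nums[0], nums[1], nums[2]}, nil
}

// canPatchInPlace reports whether moving from one engine version to another
// stays inside the same major/minor line. Anything else would need an
// explicit stop and start.
func canPatchInPlace(from, to string) (bool, error) {
	f, err := parseVersion(from)
	if err != nil {
		return false, err
	}
	t, err := parseVersion(to)
	if err != nil {
		return false, err
	}
	return f.major == t.major && f.minor == t.minor, nil
}

func main() {
	for _, target := range []string{"1.4.7", "1.5.0", "2.0.0"} {
		ok, err := canPatchInPlace("1.4.2", target)
		fmt.Println("1.4.2 ->", target, "in place:", ok, "err:", err)
	}
}
```
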
## 9. Mail Outbox

Tables in schema `backend`:

- `mail_deliveries` — one row per logical delivery, keyed by
  `(template_id, idempotency_key)`.
- `mail_recipients` — `(delivery_id, address)`.
- `mail_attempts` — append-only attempt log.
- `mail_dead_letters` — terminal failure mirror with the latest payload
  pointer for forensics and resend.
- `mail_payloads` — opaque rendered payload bytes.

Lifecycle:

1. Producer writes the delivery and payload rows in one transaction.
2. The worker picks the row with `SELECT ... FOR UPDATE SKIP LOCKED`,
   sends through SMTP using `wneessen/go-mail`, records the attempt,
   and either marks `sent` or schedules `next_attempt_at` with
   exponential backoff and jitter.
3. After `BACKEND_MAIL_MAX_ATTEMPTS` the delivery moves to
   `mail_dead_letters` and the worker writes an operator log line.
   The `mail.dead_lettered` notification kind is reserved in the
   catalog (see §10) but has no producer wired up yet, so no admin
   email or push event is emitted today; admin observability for
   dead letters relies on the log line and the
   `/api/v1/admin/mail/dead-letters` listing.
4. Operators can resend a `pending`, `retrying`, or `dead_lettered`
   delivery via `POST /api/v1/admin/mail/{delivery_id}/resend`. Resend
   on a `sent` delivery returns `409 Conflict` so operators cannot
   accidentally redeliver an email that already left the relay.

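Step 2 schedules `next_attempt_at` with exponential backoff and jitter. One common way to compute such a delay is sketched below. The base delay, the cap, and the jitter shape (uniform in the upper half of the window) are illustrative assumptions; the README does not state the values or formula the worker actually uses.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay = 2 * time.Second  // assumed first-retry window
	maxDelay  = 10 * time.Minute // assumed ceiling for the window
)

// delayFor returns how long to wait before the given attempt number
// (1-based). The window doubles per attempt up to maxDelay, and the result
// is drawn from [window/2, window) using rnd, which must return a value in
// [0, 1) such as rand.Float64.
func delayFor(attempt int, rnd func() float64) time.Duration {
	window := baseDelay
	for i := 1; i < attempt && window < maxDelay; i++ {
		window *= 2
	}
	if window > maxDelay {
		window = maxDelay
	}
	half := window / 2
	return half + time.Duration(rnd()*float64(window-half))
}

func main() {
	now := time.Now()
	for attempt := 1; attempt <= 8; attempt++ {
		d := delayFor(attempt, rand.Float64)
		fmt.Printf("attempt %d: next_attempt_at = now + %s (%s)\n",
			attempt, d.Round(time.Millisecond), now.Add(d).Format(time.RFC3339))
	}
}
```

Taking the random source as a parameter keeps the function deterministic under test; the jitter itself spreads retries so that deliveries that failed together do not all retry at the same instant.
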
On startup the worker drains every row in `pending` or `retrying`
state. There is no separate recovery flow.

`mail_attempts.attempt_no` is monotonic across the entire history of a
single `delivery_id` — a resend keeps the previous attempts and appends
new ones rather than restarting the counter. `EnqueueLoginCode` uses a
server-side UUID as `idempotency_key` so callers cannot collide; other
template producers (notification routes, future direct callers) supply
a stable key, and the UNIQUE on `(template_id, idempotency_key)`
prevents duplicate delivery rows.

## 10. Notification Catalog

The catalog is the closed set of `notification_kind` values understood
by `internal/notification`. Each kind specifies the channels it fans
out to and the payload fields used by templates and clients. The
`auth.login_code` row is delivered directly through the mail outbox
from `internal/auth` and is not materialised inside
`notification_routes` — the auth flow needs the delivery row to commit
synchronously with the challenge, which the notification dispatcher
cannot guarantee.

| Kind                              | Channels    | Payload essentials           |
| --------------------------------- | ----------- | ---------------------------- |
| `auth.login_code` *(direct mail)* | email       | `code`, `ttl`                |
| `lobby.invite.received`           | push, email | `game_id`, `inviter_user_id` |
| `lobby.invite.revoked`            | push        | `game_id`                    |
| `lobby.application.submitted`     | push        | `game_id`, `application_id`  |
| `lobby.application.approved`      | push, email | `game_id`                    |
| `lobby.application.rejected`      | push, email | `game_id`                    |
| `lobby.membership.removed`        | push, email | `game_id`, `reason`          |
| `lobby.membership.blocked`        | push, email | `game_id`                    |
| `lobby.race_name.registered`      | push        | `race_name`                  |
| `lobby.race_name.pending`         | push, email | `race_name`, `expires_at`    |
| `lobby.race_name.expired`         | push        | `race_name`                  |
| `runtime.image_pull_failed`       | admin email | `game_id`, `image_ref`       |
| `runtime.container_start_failed`  | admin email | `game_id`                    |
| `runtime.start_config_invalid`    | admin email | `game_id`, `reason`          |
| `game.turn.ready`                 | push        | `game_id`, `turn`            |
| `game.paused`                     | push        | `game_id`, `turn`, `reason`  |

Admin-channel kinds (`runtime.*`) deliver email to
`BACKEND_NOTIFICATION_ADMIN_EMAIL`; when the variable is empty, those
routes land in `notification_routes` with `status='skipped'` and the
operator log line records the configuration miss.

`game.turn.ready` and `game.paused` are emitted by
`lobby.Service.OnRuntimeSnapshot`
(`backend/internal/lobby/runtime_hooks.go`):

- `game.turn.ready` fires whenever the engine's `current_turn`
  advances. Idempotency key `turn-ready:<game_id>:<turn>`, JSON
  payload `{game_id, turn}`.
- `game.paused` fires whenever the same hook flips the game
  `running → paused` because a runtime snapshot landed with
  `engine_unreachable` / `generation_failed`. Idempotency key
  `paused:<game_id>:<turn>`, JSON payload
  `{game_id, turn, reason}` (reason carries the runtime status
  that triggered the transition). The runtime scheduler
  (`backend/internal/runtime/scheduler.go`) forwards the failing
  snapshot through `Service.publishFailureSnapshot` so a single
  failing tick reliably reaches lobby.

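The two idempotency-key formats above are simple enough to show directly. In this sketch the helper names and the in-memory `dedupe` set are hypothetical; the set only imitates the effect of the UNIQUE index that rejects a second intent with the same key.

```go
package main

import "fmt"

// turnReadyKey builds the key format turn-ready:<game_id>:<turn>.
func turnReadyKey(gameID string, turn int) string {
	return fmt.Sprintf("turn-ready:%s:%d", gameID, turn)
}

// pausedKey builds the key format paused:<game_id>:<turn>.
func pausedKey(gameID string, turn int) string {
	return fmt.Sprintf("paused:%s:%d", gameID, turn)
}

// dedupe is an in-memory stand-in for a UNIQUE constraint on the key.
type dedupe struct{ seen map[string]struct{} }

func newDedupe() *dedupe { return &dedupe{seen: make(map[string]struct{})} }

// firstTime reports whether key has not been recorded before, and records it.
func (d *dedupe) firstTime(key string) bool {
	if _, ok := d.seen[key]; ok {
		return false
	}
	d.seen[key] = struct{}{}
	return true
}

func main() {
	d := newDedupe()
	game := "3f2b6c1e-0000-4000-8000-000000000001" // made-up identifier
	// The same snapshot observed twice yields one notification intent.
	for i := 0; i < 2; i++ {
		key := turnReadyKey(game, 12)
		fmt.Println(key, "emit:", d.firstTime(key))
	}
	fmt.Println(pausedKey(game, 12), "emit:", d.firstTime(pausedKey(game, 12)))
}
```

Because the turn number is part of the key, a replayed snapshot for the same turn is suppressed while the next turn still produces a fresh notification.
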
Both kinds target every active membership and route through the
push channel only — per-turn / per-pause email would be spam — so
the UI's signed `SubscribeEvents` stream
(`ui/frontend/src/api/events.svelte.ts`) is the sole delivery
path. The order tab consumes them via
`OrderDraftStore.resetForNewTurn` / `markPaused`
(`ui/docs/sync-protocol.md`).

The remaining `game.*` (`game.started`, `game.generation.failed`,
`game.finished`) and `mail.dead_lettered` are reserved kinds without
a producer in the catalog; adding them is an additive change to the
catalog vocabulary and the migration CHECK constraint.

Templates ship in English only; localisation belongs to clients that
render the push payload, not to the backend mail body. Per-route mail
idempotency uses the `route_id` UUID as `idempotency_key`, so retried
notifications and partial failures cannot fan out a duplicate email.

## 11. Geo Profile

`internal/geo` operates on the GeoLite2 Country database loaded from
`BACKEND_GEOIP_DB_PATH` at startup.

- `SetDeclaredCountryAtRegistration(user_id, ip)` is called from
  `auth.confirmEmailCode`. It looks up the country and writes it to
  `accounts.declared_country`. The value is never updated afterwards.
- `IncrementCounterAsync(user_id, ip)` is called from the user-surface
  middleware. It launches a goroutine that looks up the country and
  upserts `(user_id, country, count)` in `user_country_counters`. The
  caller does not block.
- Lookup errors are logged and ignored; geo work never blocks the user.

There is no aggregation, no automatic flagging, no version history of
declared country, and no admin-side review workflow. Counter rows are
exposed to operators via the admin surface for manual inspection only.

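The fire-and-forget shape of `IncrementCounterAsync` might look roughly like this. Everything except the method name is an assumption: the in-memory map stands in for the `user_country_counters` upsert, `stubLookup` replaces the GeoLite2 reader with made-up data, and the `WaitGroup` exists only so the demo can observe the result.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// stubLookup replaces the GeoLite2 reader; the mapping is made-up test data.
func stubLookup(ip string) (string, error) {
	if ip == "203.0.113.7" {
		return "NL", nil
	}
	return "", errors.New("country not found")
}

type counters struct {
	mu     sync.Mutex
	counts map[string]int // key: user_id + "/" + country
	wg     sync.WaitGroup
	lookup func(ip string) (string, error)
}

func newCounters(lookup func(string) (string, error)) *counters {
	return &counters{counts: make(map[string]int), lookup: lookup}
}

// IncrementCounterAsync returns immediately; the lookup and the counter
// update happen on a separate goroutine, and lookup errors are swallowed
// (the real service logs them) so the request path is never affected.
func (c *counters) IncrementCounterAsync(userID, ip string) {
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		country, err := c.lookup(ip)
		if err != nil {
			return
		}
		c.mu.Lock()
		c.counts[userID+"/"+country]++
		c.mu.Unlock()
	}()
}

// Wait blocks until all pending increments have finished (demo/test only).
func (c *counters) Wait() { c.wg.Wait() }

// Count returns the current counter for a user and country.
func (c *counters) Count(userID, country string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[userID+"/"+country]
}

func main() {
	c := newCounters(stubLookup)
	c.IncrementCounterAsync("u1", "203.0.113.7")
	c.IncrementCounterAsync("u1", "198.51.100.9") // lookup fails, ignored
	c.Wait()
	fmt.Println("u1/NL =", c.Count("u1", "NL"))
}
```
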
## 12. Admin Surface

- HTTP Basic Auth credentials are checked against `admin_accounts`
  (Postgres). Passwords are hashed with bcrypt cost 12.
- Bootstrap on startup: if `BACKEND_ADMIN_BOOTSTRAP_USER` is configured
  and no row with that username exists, insert one with the hashed
  bootstrap password. The insert is idempotent.
- Admin endpoints are grouped by domain:
  - `POST/GET /api/v1/admin/admin-accounts/*` — manage admins.
  - `GET/POST /api/v1/admin/users/*` — list, lookup, sanction, limit, soft delete.
  - `GET/POST /api/v1/admin/games/*` — list, create (public-game), inspect, force start/stop, ban member.
  - `GET/POST /api/v1/admin/runtimes/*` — inspect runtime, restart, patch.
  - `GET/POST /api/v1/admin/mail/*` — list deliveries, resend, view attempts.
  - `GET /api/v1/admin/notifications/*` — inspect notifications and dead letters.
- Failed Basic Auth returns `401` with `WWW-Authenticate: Basic realm="galaxy-admin"`.

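The 401 challenge behaviour can be illustrated with a plain `net/http` middleware. Note the assumptions: the service itself uses gin, so this is not its middleware; `requireAdmin`, `staticVerifier`, and `probe` are hypothetical names; and the verifier compares plaintext in constant time only because bcrypt lives outside the standard library, whereas the real check is against bcrypt hashes in `admin_accounts`.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// verifier decides whether a username/password pair is valid.
type verifier func(user, pass string) bool

// staticVerifier is a demo-only check against one fixed credential pair.
func staticVerifier(wantUser, wantPass string) verifier {
	return func(user, pass string) bool {
		u := subtle.ConstantTimeCompare([]byte(user), []byte(wantUser))
		p := subtle.ConstantTimeCompare([]byte(pass), []byte(wantPass))
		return u&p == 1
	}
}

// requireAdmin rejects requests without valid Basic Auth credentials and
// attaches the documented challenge header to the 401 response.
func requireAdmin(verify verifier, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		user, pass, ok := r.BasicAuth()
		if !ok || !verify(user, pass) {
			w.Header().Set("WWW-Authenticate", `Basic realm="galaxy-admin"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for any admin endpoint.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusNoContent)
})

// probe sends one in-memory request and returns the status code and the
// WWW-Authenticate header. An empty user means "no credentials".
func probe(h http.Handler, user, pass string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/api/v1/admin/users", nil)
	if user != "" {
		req.SetBasicAuth(user, pass)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("WWW-Authenticate")
}

func main() {
	h := requireAdmin(staticVerifier("root", "s3cret"), okHandler)
	code, challenge := probe(h, "", "")
	fmt.Println(code, challenge)
	code, _ = probe(h, "root", "s3cret")
	fmt.Println(code)
}
```
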
## 13. Local Run

Prerequisites:

- Go toolchain matching `go.work`.
- Postgres reachable via `BACKEND_POSTGRES_DSN` (a local container is
  fine).
- An SMTP server (`mailhog`, `mailpit`, or any other dev relay) reachable
  via `BACKEND_SMTP_HOST`/`BACKEND_SMTP_PORT`.
- Docker daemon reachable via `BACKEND_DOCKER_HOST` (the local socket is
  the default; running engines through this requires the user-defined
  bridge named in `BACKEND_DOCKER_NETWORK`).
- A GeoLite2 Country `.mmdb` file at `BACKEND_GEOIP_DB_PATH`. For tests,
  use the synthetic mmdb generator under `pkg/geoip/test-data/`.

Run:

```bash
go run ./backend/cmd/backend
```

Migrations are embedded and applied at startup. Bootstrapping the first
admin happens on the first run if the env vars are set. Subsequent
restarts are idempotent.

## 14. Testing

Three levels:

- **Unit tests** colocated with the implementation (`*_test.go` next to
  the file under test). Use `testify` for assertions, `go.uber.org/mock`
  for interface mocking when an external boundary justifies it.
- **Contract tests** under `internal/server/`. Validate every request
  and response against `openapi.yaml` at runtime via `kin-openapi`. New
  endpoints must be added to `openapi.yaml` first; the contract test
  fails until the implementation matches.
- **Integration tests** under `../integration/` (top-level repo
  module). Use `testcontainers-go` for Postgres and optionally for an
  SMTP capture container. Cover the user flows end to end through the
  real backend binary.

`make test` runs unit and contract tests. `make integration-test` runs
the integration suite (requires Docker).

## 15. Telemetry

Required minimum signals:

- `http_requests_total{group, method, path, status}` and
  `http_request_duration_seconds{...}` for each route group.
- `grpc_push_subscribers` (gauge), `grpc_push_events_total{kind}`,
  `grpc_push_dropped_total{gateway_client_id}`.
- `mail_outbox_depth{state}` (gauge), `mail_attempts_total{outcome}`,
  `mail_dead_letters_total`.
- `notification_intents_total{kind, outcome}`,
  `notification_routes_total{channel}`.
- `runtime_container_ops_total{op, outcome}`,
  `runtime_health_probes_total{outcome}`.
- `geo_lookups_total{outcome}`.
- `db_pool_acquires_total`, `db_pool_in_use{...}`, `db_pool_waits_total`.

Tracing covers HTTP request → domain operation → Postgres calls →
external client calls (SMTP, Docker, engine). Every span is linked to
the request id.

Logs are JSON, written to stdout, with `otel_trace_id` and
`otel_span_id` injected when a span context is available. The minimum
fields are `ts`, `level`, `caller`, `service`, `msg`, plus per-call
context.

## 16. Operational Notes

- Graceful shutdown drains in this order on SIGTERM/SIGINT: stop
  accepting new HTTP and gRPC traffic → wait for in-flight requests
  (bounded by `BACKEND_HTTP_SHUTDOWN_TIMEOUT` and the gRPC counterpart)
  → flush mail outbox writes that have already started → drain push
  events to gateway → close the Docker client → close the Postgres pool.
- `/healthz` returns 200 unconditionally as long as the process is
  alive.
- `/readyz` checks: Postgres reachable, migrations applied, gRPC
  listener bound. Returns 503 until all hold.
- Logs are JSON to stdout. Crash dumps go to stderr.
- Configuration changes require a restart; there is no live reload.
- Bootstrap admin password should be rotated through the admin surface
  immediately after the first deploy.

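The difference between the two probes can be sketched with plain `net/http`. The `readiness` struct, its field names, and the `status` helper are hypothetical; the service registers its probes through gin and the actual checks are performed against live dependencies rather than flags.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readiness tracks the three documented conditions. Each flag would be set
// by the component that owns it once its startup step completes.
type readiness struct {
	postgres   atomic.Bool // Postgres reachable
	migrations atomic.Bool // migrations applied
	grpc       atomic.Bool // gRPC listener bound
}

// handler exposes /healthz (always 200 while the process is alive) and
// /readyz (503 until every condition holds).
func (r *readiness) handler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if r.postgres.Load() && r.migrations.Load() && r.grpc.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	return mux
}

// status issues an in-memory GET and returns the response code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	r := &readiness{}
	h := r.handler()
	fmt.Println("healthz:", status(h, "/healthz"), "readyz:", status(h, "/readyz"))
	r.postgres.Store(true)
	r.migrations.Store(true)
	r.grpc.Store(true)
	fmt.Println("healthz:", status(h, "/healthz"), "readyz:", status(h, "/readyz"))
}
```
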
## 17. Service Documentation

Extended service-local documentation lives in [`docs/`](docs/):

- [Documentation index](docs/README.md)
- [Runtime and components](docs/runtime.md)
- [Domain and protocol flows](docs/flows.md)
- [Operator runbook](docs/runbook.md)
- [Configuration and OpenAPI examples](docs/examples.md)

Primary references:

- [`PLAN.md`](PLAN.md) — historical staged build-up of the service.
- [`openapi.yaml`](openapi.yaml) — REST contract.
- [`../docs/ARCHITECTURE.md`](../docs/ARCHITECTURE.md) — workspace-level architecture.