Ilia Denisov 15d35f6f1f
feat(game): canonical gameId in POST /api/v1/admin/init
Engine no longer mints its own game UUID. The orchestrator (backend)
generates the game UUID at game-create time and passes it in the
admin/init request body as the required `gameId` field, so the value
that names the engine container and host bind-mount directory also
ends up inside the engine's state.json.

The engine rejects the zero UUID with 400 and any init that conflicts
with an existing state.json with 409 (a second init on the same gameId
is also a conflict; full idempotency is not part of the contract).

Updates rest.InitRequest, openapi.yaml (schema + 409 response),
controller.GenerateGame/NewGame/buildGameOnMap signatures, the engine
HTTP handler/executor, the backend runtime worker, and the relevant
unit and contract tests. Documentation in game/README.md,
docs/ARCHITECTURE.md, backend/README.md, and backend/docs/{runtime,flows}.md
is updated in the same patch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 13:13:31 +02:00


# backend
`backend` is the consolidated business service of the Galaxy platform. It
owns identity, sessions, lobby, game runtime, mail, notifications, geo
signals, and administration. It is reachable only from `gateway` over
the trusted network. See `../docs/ARCHITECTURE.md` for the platform-level
context, security model, and decision rationale.
## 1. Purpose
A single Go binary that:
- Serves three HTTP route groups (`/api/v1/public/*`, `/api/v1/user/*`,
`/api/v1/admin/*`) plus health probes.
- Hosts a gRPC `SubscribePush` server consumed by `gateway`.
- Owns one Postgres schema (`backend`).
- Talks to the Docker daemon to run game engine containers.
- Talks to an SMTP relay to send mail through a durable outbox.
- Reads the GeoLite2 country database for source-IP country lookup.

This README describes how the binary is laid out, configured, and run.
The implementation specification lives in `PLAN.md`.
## 2. API Surfaces
| Prefix | Auth | Audience |
| ------------------ | ----------------------------------------------- | ------------------------------------- |
| `/api/v1/public/*` | none | Registration, code confirmation |
| `/api/v1/user/*` | `X-User-ID` injected by gateway | Authenticated end users |
| `/api/v1/admin/*` | HTTP Basic Auth against `admin_accounts` | Platform administrators |
| `/healthz` | none | Liveness probe |
| `/readyz` | none | Readiness probe |

The full contract is documented in `openapi.yaml` and validated at
runtime by the contract tests under `internal/server/`.
## 3. Module Layout
```text
backend/
├── cmd/
│ ├── backend/ # main.go: process entrypoint
│ └── jetgen/ # jet code generator runner
├── internal/
│ ├── admin/ # admin_accounts, Basic Auth verifier, admin operations
│ ├── auth/ # email-code challenges, device sessions, Ed25519 keys
│ ├── config/ # env-var loader, Validate
│ ├── diplomail/ # diplomatic-mail messages, recipients, translations
│ ├── dockerclient/ # docker/docker wrapper for container ops
│ ├── engineclient/ # net/http client to galaxy-game containers
│ ├── geo/ # geoip lookup, declared_country, per-user counters
│ ├── lobby/ # games, applications, invites, memberships, RND
│ ├── mail/ # outbox worker, SMTP delivery, dead letters
│ ├── notification/ # intent normalisation, push + email fan-out
│ ├── postgres/ # pgx pool, embedded migrations, jet/
│ ├── push/ # gRPC SubscribePush server
│ ├── runtime/ # engine version registry, container lifecycle, scheduler
│ ├── server/ # gin engine, route groups, middleware, handlers
│ ├── telemetry/ # otel runtime, zap factory
│ └── user/ # accounts, settings, entitlements, sanctions, soft delete
├── proto/
│ └── push/v1/ # push.proto and generated gRPC code
├── docs/ # per-stage decision records (one file per decision)
├── openapi.yaml # full REST contract (public + user + admin)
├── go.mod
├── Makefile # `make jet` regenerates jet code
└── README.md
```
## 4. Configuration
All configuration is environment-based; there are no flags or files.
`Validate()` is called once at startup; missing required values fail
fast.
| Variable | Required | Default | Purpose |
| --------------------------------------- | -------- | ------------------------ | --------------------------------------------------- |
| `BACKEND_HTTP_LISTEN_ADDR` | no | `:8080` | HTTP listener for REST surfaces and probes. |
| `BACKEND_HTTP_READ_TIMEOUT` | no | `30s` | HTTP read timeout. |
| `BACKEND_HTTP_WRITE_TIMEOUT` | no | `30s` | HTTP write timeout. |
| `BACKEND_HTTP_SHUTDOWN_TIMEOUT` | no | `15s` | Graceful shutdown budget for HTTP server. |
| `BACKEND_SHUTDOWN_TIMEOUT` | no | `30s` | Process-wide cap applied to each component shutdown. |
| `BACKEND_GRPC_PUSH_LISTEN_ADDR` | no | `:8081` | gRPC listener for the push interface. |
| `BACKEND_GRPC_PUSH_SHUTDOWN_TIMEOUT` | no | `10s` | Graceful shutdown budget for the gRPC server. |
| `BACKEND_LOGGING_LEVEL` | no | `info` | zap log level. |
| `BACKEND_POSTGRES_DSN` | yes | — | pgx-style Postgres DSN. Must include `search_path=backend` so unqualified reads and writes resolve to the service-owned schema. |
| `BACKEND_POSTGRES_MAX_CONNS` | no | `25` | Pool max connections. |
| `BACKEND_POSTGRES_MIN_CONNS` | no | `2` | Pool min connections. |
| `BACKEND_POSTGRES_OPERATION_TIMEOUT` | no | `5s` | Default per-statement timeout. |
| `BACKEND_SMTP_HOST` | yes | — | SMTP relay host. |
| `BACKEND_SMTP_PORT` | no | `587` | SMTP relay port. |
| `BACKEND_SMTP_USERNAME` | no | — | SMTP auth username (omit for anonymous). |
| `BACKEND_SMTP_PASSWORD` | no | — | SMTP auth password. |
| `BACKEND_SMTP_FROM` | yes | — | RFC-5321 From address. |
| `BACKEND_SMTP_TLS_MODE` | no | `starttls` | `none`, `starttls`, or `tls`. |
| `BACKEND_MAIL_WORKER_INTERVAL` | no | `2s` | How often the outbox worker scans for new work. |
| `BACKEND_MAIL_MAX_ATTEMPTS` | no | `8` | Maximum delivery attempts before dead-lettering. |
| `BACKEND_DOCKER_HOST` | no | `unix:///var/run/docker.sock` | Docker daemon endpoint. |
| `BACKEND_DOCKER_NETWORK` | yes | — | User-defined Docker bridge network for engines. |
| `BACKEND_GAME_STATE_ROOT` | yes | — | Host directory bind-mounted into engine containers. |
| `BACKEND_ADMIN_BOOTSTRAP_USER` | no | — | Initial admin username; idempotent insert. |
| `BACKEND_ADMIN_BOOTSTRAP_PASSWORD` | no | — | Initial admin password; required if user is set. |
| `BACKEND_GEOIP_DB_PATH` | yes | — | Filesystem path to GeoLite2 Country `.mmdb`. |
| `BACKEND_OTEL_TRACES_EXPORTER` | no | `otlp` | `none`, `otlp`, `stdout`. |
| `BACKEND_OTEL_METRICS_EXPORTER` | no | `otlp` | `none`, `otlp`, `stdout`, `prometheus`. |
| `BACKEND_OTEL_PROTOCOL` | no | `grpc` | `grpc` or `http/protobuf`. OTLP only. |
| `BACKEND_OTEL_ENDPOINT` | no | provider default | OTLP endpoint URL. |
| `BACKEND_OTEL_PROMETHEUS_LISTEN_ADDR` | no | `:9100` | When `BACKEND_OTEL_METRICS_EXPORTER=prometheus`. |
| `BACKEND_SERVICE_NAME` | no | `galaxy-backend` | Resource attribute for telemetry. |
| `BACKEND_FRESHNESS_WINDOW` | no | `5m` | Mirrors the gateway freshness window; used as the TTL of the push ring buffer (§7). |
| `BACKEND_AUTH_CHALLENGE_TTL` | no | `10m` | Lifetime of an issued `auth_challenges` row. |
| `BACKEND_AUTH_CHALLENGE_MAX_ATTEMPTS` | no | `5` | Maximum confirm-email-code attempts per challenge. |
| `BACKEND_AUTH_CHALLENGE_THROTTLE_WINDOW`| no | `60s` | Rolling window over which issued challenges are counted toward the throttle. |
| `BACKEND_AUTH_CHALLENGE_THROTTLE_MAX` | no | `3` | Maximum unconsumed, non-expired challenges per email within the window before reuse kicks in. |
| `BACKEND_AUTH_USERNAME_MAX_RETRIES` | no | `10` | Retry budget for synthesising a unique placeholder `accounts.user_name` at registration. |
| `BACKEND_LOBBY_SWEEPER_INTERVAL` | no | `60s` | How often the lobby sweeper releases expired pending_registrations and auto-closes enrollment-expired games. |
| `BACKEND_LOBBY_PENDING_REGISTRATION_TTL`| no | `720h` (30 days) | Lifetime of a `pending_registration` Race Name Directory entry awaiting promotion. |
| `BACKEND_LOBBY_INVITE_DEFAULT_TTL` | no | `168h` (7 days) | Default expiry applied to invites whose request body omits `expires_at`. |
| `BACKEND_ENGINE_CALL_TIMEOUT` | no | `60s` | Per-call timeout for engine writes (init, turn, banish, command, order). |
| `BACKEND_ENGINE_PROBE_TIMEOUT` | no | `5s` | Per-call timeout for engine reads (status, report, healthz). |
| `BACKEND_RUNTIME_WORKER_POOL_SIZE` | no | `4` | Long-running runtime job concurrency. |
| `BACKEND_RUNTIME_JOB_QUEUE_SIZE` | no | `64` | Buffered runtime-job channel depth. |
| `BACKEND_RUNTIME_RECONCILE_INTERVAL` | no | `60s` | Interval between reconciler passes against the Docker daemon. |
| `BACKEND_RUNTIME_IMAGE_PULL_POLICY` | no | `if_missing` | Engine image pull policy: `if_missing`, `always`, `never`. |
| `BACKEND_RUNTIME_CONTAINER_LOG_DRIVER` | no | `json-file` | Docker log driver applied to engine containers. |
| `BACKEND_RUNTIME_CONTAINER_LOG_OPTS` | no | — | Comma-separated `key=value` pairs forwarded to the log driver. |
| `BACKEND_RUNTIME_CONTAINER_CPU_QUOTA` | no | `2.0` | Engine container `--cpus`. |
| `BACKEND_RUNTIME_CONTAINER_MEMORY` | no | `512m` | Engine container `--memory`. |
| `BACKEND_RUNTIME_CONTAINER_PIDS_LIMIT` | no | `256` | Engine container `--pids-limit`. |
| `BACKEND_RUNTIME_CONTAINER_STATE_MOUNT` | no | `/var/lib/galaxy-game` | Absolute in-container path for the per-game state bind mount. |
| `BACKEND_RUNTIME_STOP_GRACE_PERIOD` | no | `10s` | SIGTERM-to-SIGKILL grace period for engine container stop. |
| `BACKEND_STACK_LABEL` | no | — | Optional value stamped as `galaxy.stack=<value>` on every engine container backend spawns. Lets host-side tooling (Makefile / CI) scope cleanup to one dev stack. Empty → label is not applied. |
| `BACKEND_NOTIFICATION_ADMIN_EMAIL` | no | — | Recipient address for admin-channel notifications (`runtime.*` kinds). When empty, admin-channel routes are recorded as `skipped` and the catalog is partially silenced. |
| `BACKEND_NOTIFICATION_WORKER_INTERVAL` | no | `5s` | Notification route worker scan interval. |
| `BACKEND_NOTIFICATION_MAX_ATTEMPTS` | no | `8` | Notification route delivery attempts before dead-lettering. |
| `BACKEND_DIPLOMAIL_MAX_BODY_BYTES` | no | `4096` | Maximum size of `diplomail_messages.body` enforced at send time. Tune at runtime without a migration. |
| `BACKEND_DIPLOMAIL_MAX_SUBJECT_BYTES` | no | `256` | Maximum size of `diplomail_messages.subject`. Subject is optional; empty is always accepted. |
| `BACKEND_DIPLOMAIL_TRANSLATOR_URL` | no | — | Base URL of a LibreTranslate-compatible instance (for example `http://libretranslate:5000`). Empty → the translator falls back to a no-op and recipients receive the original body. |
| `BACKEND_DIPLOMAIL_TRANSLATOR_TIMEOUT` | no | `10s` | Per-request HTTP timeout for the translation worker. |
| `BACKEND_DIPLOMAIL_TRANSLATOR_MAX_ATTEMPTS` | no | `5` | Number of failed HTTP attempts before the worker delivers the message with the original body (fallback). |
| `BACKEND_DIPLOMAIL_WORKER_INTERVAL` | no | `2s` | How often the async translation worker scans for pending pairs. The worker processes one pair per tick. |

If `BACKEND_ADMIN_BOOTSTRAP_USER` is set without
`BACKEND_ADMIN_BOOTSTRAP_PASSWORD`, `Validate()` fails. If neither is
set, no bootstrap insert happens and operators are expected to have
seeded `admin_accounts` ahead of time.
## 5. Persistence
- One Postgres database, schema `backend`. The role used by `backend`
must own the schema (or be granted `CREATE` on it for migrations).
- Migrations live in `internal/postgres/migrations/`, are embedded into
the binary via `embed.FS`, and are applied with `pressly/goose/v3`
before the HTTP listener opens. The startup path also issues a
`CREATE SCHEMA IF NOT EXISTS backend` so that, on a fresh database,
goose can create its bookkeeping table in the `backend` schema before
the first migration runs.
- Migrations are sequence-numbered (`0000N_*.sql`) and applied
additively. `00001_init.sql` is the historical baseline; every
schema change after it is a new file with a higher prefix. See
`internal/postgres/migrations/README.md` for the authoring rules.
- Queries are written through `go-jet/jet/v2`. The generated code is in
`internal/postgres/jet/backend/` and is committed; `internal/postgres/jet/jet.go`
carries package metadata that survives regeneration.
- `make jet` regenerates the jet code: it spins up a transient Postgres
container, applies the migrations, runs `cmd/jetgen`, and writes the
output back into `internal/postgres/jet/backend/`. Goose's
bookkeeping table is dropped before generation so it does not leak
into the generated package.
- `BACKEND_POSTGRES_DSN` must include `search_path=backend`; the runtime
pool relies on this so unqualified reads and writes resolve to the
service-owned schema.

Idempotency is enforced through UNIQUE indexes on durable tables; there
is no separate idempotency-key table. Worker pickup uses `SELECT ...
FOR UPDATE SKIP LOCKED` ordered by `next_attempt_at`.
## 6. In-Memory Cache
`backend` warms the following caches at startup before the HTTP listener
opens:
- Active device sessions (lookup by `device_session_id`).
- User entitlement snapshots (lookup by `user_id`).
- Engine version registry (lookup by version label, populated by `internal/runtime`).
- Active runtime records (lookup by `game_id`, populated by `internal/runtime`).
- Active games and their memberships.
- Race Name Directory canonical keys.
- Admin accounts.
Each cache is updated write-through in the same domain transaction
that touches Postgres. Caches are bounded to MVP-scale data sets; if any
cache grows beyond the budget, the architecture document mandates a
discussion before moving the cache out of process.
## 7. gRPC Push Interface
The push interface is the only gRPC server hosted by `backend`. The
contract is in `proto/push/v1/push.proto`:
```proto
service Push {
  rpc SubscribePush(GatewaySubscribeRequest) returns (stream PushEvent);
}

message PushEvent {
  oneof kind {
    ClientEvent client_event = 1;
    SessionInvalidation session_invalidation = 2;
  }
  string cursor = 3;
}
```
- `ClientEvent` carries an opaque payload addressed to a `(user_id [,
device_session_id])`. Gateway signs and forwards it to active client
subscriptions. Producers do not pass raw bytes to `push.Service`;
instead they pass a typed `push.Event` (`Kind() string`,
`Marshal() ([]byte, error)`) and `push.Service` invokes Marshal at
publish time. Every notification catalog kind (§10) has a 1:1
FlatBuffers schema in `pkg/schema/fbs/notification.fbs`; the
notification dispatcher routes `(kind, payload)` to a typed event
through `notification.buildClientPushEvent`, so client decoders can
rely on a stable wire shape per kind. `push.JSONEvent` remains as a
safety net for kinds that arrive without a catalog schema. The frame
also carries `event_id`, `request_id`, and `trace_id` correlation
strings populated by backend producers (notification dispatcher
fills `event_id` from `route_id`, `request_id` from the originating
intent's `idempotency_key`, and `trace_id` from the active span);
gateway re-emits the values inside the signed client envelope
without re-interpreting them.
- `SessionInvalidation` instructs gateway to close active subscriptions
and reject in-flight requests for the affected sessions.
- `cursor` is a monotonically increasing string. Gateway stores the last
consumed cursor and uses it on reconnect. The format is opaque to
gateway; backend only guarantees lexicographic monotonicity within a
process lifetime, and resets the sequence after a restart.
- Backend keeps an in-memory ring buffer of recent events with a TTL of
`BACKEND_FRESHNESS_WINDOW`. Cursors that have aged out resume from a
fresh point.
- A gateway reconnect with the same `gateway_client_id` replaces the
previous subscription (`codes.Aborted` is returned to the older
stream). Distinct ids fan out as separate broadcast targets.
- Cursor format is a zero-padded decimal `uint64` string emitted by an
in-process counter; gateway treats it as opaque.
- Ring buffer eviction is by TTL plus a fixed capacity ceiling.
Backpressure is per-connection drop-oldest: if the buffered channel
for a subscriber overflows, the oldest event for that connection is
discarded and the loss is logged so operators can correlate the gap
on the gateway side.
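The cursor and backpressure rules above can be sketched like this. The 20-digit padding width, the type names, and the plain-string event payload are assumptions; the README only says the cursor is a zero-padded decimal `uint64`. The TTL-based ring buffer and the `grpc_push_dropped_total` metric are left out for brevity.

```go
package main

import (
	"fmt"
	"log"
	"sync/atomic"
)

// cursorSource hands out cursors as zero-padded decimal uint64 strings.
// Padding to 20 digits (the width of math.MaxUint64) makes plain string
// comparison agree with numeric order, e.g. "...09" < "...10".
type cursorSource struct{ n atomic.Uint64 }

func (c *cursorSource) Next() string {
	return fmt.Sprintf("%020d", c.n.Add(1))
}

// subscriber is a hypothetical per-gateway-connection buffer. Its channel
// must have capacity >= 1 for enqueue's drop-oldest loop to make progress.
type subscriber struct {
	gatewayClientID string
	ch              chan string
	dropped         atomic.Uint64
}

// enqueue never blocks the producer: if the buffer is full it discards the
// oldest queued event, logs the loss, and tries the send again.
func (s *subscriber) enqueue(event string) {
	for {
		select {
		case s.ch <- event:
			return
		default:
		}
		select {
		case old := <-s.ch:
			s.dropped.Add(1)
			log.Printf("push: dropped event %s for gateway %s", old, s.gatewayClientID)
		default: // a consumer drained the buffer meanwhile; retry the send
		}
	}
}

func main() {
	var cursors cursorSource
	sub := &subscriber{gatewayClientID: "gw-1", ch: make(chan string, 2)}
	for i := 0; i < 3; i++ {
		sub.enqueue(cursors.Next())
	}
	fmt.Println(len(sub.ch), sub.dropped.Load()) // 2 1
}
```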
## 8. Engine Client
`internal/engineclient` is a thin `net/http`-based client that targets
running engine containers at `http://galaxy-game-{game_id}:8080`. It
uses the DTOs in `pkg/model/{order,report,rest}` directly; it does not
introduce its own request/response types.
Endpoints used:
- `POST /api/v1/admin/init` — the runtime worker passes the canonical
`game_id` (the same UUID that names the engine container and the
host bind-mount directory) in the request body so the engine's
`state.json` shares identity with the backend's `games.game_id`.
- `GET /api/v1/admin/status`
- `PUT /api/v1/admin/turn`
- `POST /api/v1/admin/race/banish`
- `PUT /api/v1/command`
- `PUT /api/v1/order`
- `GET /api/v1/report`
- `GET /healthz`

Engine-version arbitration lives in `internal/runtime`. Patch updates
are semver-patch-only inside the same major/minor line; major or minor
changes require explicit stop and start. Reconciliation adopts
unrecorded containers tagged with the `galaxy.backend=1` label and
marks recorded containers that are missing as removed.
## 9. Mail Outbox
Tables in schema `backend`:
- `mail_deliveries` — one row per logical delivery, keyed by
`(template_id, idempotency_key)`.
- `mail_recipients` — `(delivery_id, address)`.
- `mail_attempts` — append-only attempt log.
- `mail_dead_letters` — terminal failure mirror with the latest payload
pointer for forensics and resend.
- `mail_payloads` — opaque rendered payload bytes.
Lifecycle:
1. Producer writes the delivery and payload rows in one transaction.
2. The worker picks the row with `SELECT ... FOR UPDATE SKIP LOCKED`,
sends through SMTP using `wneessen/go-mail`, records the attempt,
and either marks `sent` or schedules `next_attempt_at` with
exponential backoff and jitter.
3. After `BACKEND_MAIL_MAX_ATTEMPTS` the delivery moves to
`mail_dead_letters` and the worker writes an operator log line.
The `mail.dead_lettered` notification kind is reserved in the
catalog (see §10) but has no producer wired up yet, so no admin
email or push event is emitted today; admin observability for
dead letters relies on the log line and the
`/api/v1/admin/mail/dead-letters` listing.
4. Operators can resend a `pending`, `retrying`, or `dead_lettered`
delivery via `POST /api/v1/admin/mail/{delivery_id}/resend`. Resend
on a `sent` delivery returns `409 Conflict` so operators cannot
accidentally redeliver an email that already left the relay.

On startup the worker drains every row in `pending` or `retrying`
state. There is no separate recovery flow.
`mail_attempts.attempt_no` is monotonic across the entire history of a
single `delivery_id` — a resend keeps the previous attempts and appends
new ones rather than restarting the counter. `EnqueueLoginCode` uses a
server-side UUID as `idempotency_key` so callers cannot collide; other
template producers (notification routes, future direct callers) supply
a stable key, and the UNIQUE on `(template_id, idempotency_key)`
prevents duplicate delivery rows.
## 10. Notification Catalog
The catalog is the closed set of `notification_kind` values understood
by `internal/notification`. Each kind specifies the channels it fans
out to and the payload fields used by templates and clients. The
`auth.login_code` row is delivered directly through the mail outbox
from `internal/auth` and is not materialised inside
`notification_routes` — the auth flow needs the delivery row to commit
synchronously with the challenge, which the notification dispatcher
cannot guarantee.
| Kind | Channels | Payload essentials |
| ----------------------------------- | ------------- | -------------------------------------------------------- |
| `auth.login_code` *(direct mail)* | email | `code`, `ttl` |
| `lobby.invite.received` | push, email | `game_id`, `inviter_user_id` |
| `lobby.invite.revoked` | push | `game_id` |
| `lobby.application.submitted` | push | `game_id`, `application_id` |
| `lobby.application.approved` | push, email | `game_id` |
| `lobby.application.rejected` | push, email | `game_id` |
| `lobby.membership.removed` | push, email | `game_id`, `reason` |
| `lobby.membership.blocked` | push, email | `game_id` |
| `lobby.race_name.registered` | push | `race_name` |
| `lobby.race_name.pending` | push, email | `race_name`, `expires_at` |
| `lobby.race_name.expired` | push | `race_name` |
| `runtime.image_pull_failed` | admin email | `game_id`, `image_ref` |
| `runtime.container_start_failed` | admin email | `game_id` |
| `runtime.start_config_invalid` | admin email | `game_id`, `reason` |
| `game.turn.ready` | push | `game_id`, `turn` |
| `game.paused` | push | `game_id`, `turn`, `reason` |

Admin-channel kinds (`runtime.*`) deliver email to
`BACKEND_NOTIFICATION_ADMIN_EMAIL`; when the variable is empty, those
routes land in `notification_routes` with `status='skipped'` and the
operator log line records the configuration miss.
`game.turn.ready` and `game.paused` are emitted by
`lobby.Service.OnRuntimeSnapshot`
(`backend/internal/lobby/runtime_hooks.go`):
- `game.turn.ready` fires whenever the engine's `current_turn`
advances. Idempotency key `turn-ready:<game_id>:<turn>`, JSON
payload `{game_id, turn}`.
- `game.paused` fires whenever the same hook flips the game
`running → paused` because a runtime snapshot landed with
`engine_unreachable` / `generation_failed`. Idempotency key
`paused:<game_id>:<turn>`, JSON payload
`{game_id, turn, reason}` (reason carries the runtime status
that triggered the transition). The runtime scheduler
(`backend/internal/runtime/scheduler.go`) forwards the failing
snapshot through `Service.publishFailureSnapshot` so a single
failing tick reliably reaches lobby.

Both kinds target every active membership and route through the
push channel only — per-turn / per-pause email would be spam — so
the UI's signed `SubscribeEvents` stream
(`ui/frontend/src/api/events.svelte.ts`) is the sole delivery
path. The order tab consumes them via
`OrderDraftStore.resetForNewTurn` / `markPaused`
(`ui/docs/sync-protocol.md`).
The remaining `game.*` kinds (`game.started`, `game.generation.failed`,
`game.finished`) and `mail.dead_lettered` are reserved kinds that no
producer emits yet; wiring them up is an additive change to the catalog
vocabulary and the migration CHECK constraint.
Templates ship in English only; localisation belongs to clients that
render the push payload, not to the backend mail body. Per-route mail
idempotency uses the `route_id` UUID as `idempotency_key`, so retried
notifications and partial failures cannot fan out a duplicate email.
## 11. Geo Profile
`internal/geo` operates on the GeoLite2 Country database loaded from
`BACKEND_GEOIP_DB_PATH` at startup.
- `SetDeclaredCountryAtRegistration(user_id, ip)` is called from
`auth.confirmEmailCode`. It looks up the country and writes it to
`accounts.declared_country`. The value is never updated afterwards.
- `IncrementCounterAsync(user_id, ip)` is called from the user-surface
middleware. It launches a goroutine that looks up the country and
upserts `(user_id, country, count)` in `user_country_counters`. The
caller does not block.
- Lookup errors are logged and ignored; geo work never blocks the user.

There is no aggregation, no automatic flagging, no version history of
declared country, no admin-side review workflow. Counter rows are
exposed to operators via the admin surface for manual inspection only.
## 12. Admin Surface
- HTTP Basic Auth credentials are checked against `admin_accounts`
(Postgres). Passwords are hashed with bcrypt cost 12.
- Bootstrap on startup: if `BACKEND_ADMIN_BOOTSTRAP_USER` is configured
and no row with that username exists, insert one with the hashed
bootstrap password. The insert is idempotent.
- Admin endpoints are grouped by domain:
  - `POST/GET /api/v1/admin/admin-accounts/*` — manage admins.
  - `GET/POST /api/v1/admin/users/*` — list, lookup, sanction, limit, soft delete.
  - `GET/POST /api/v1/admin/games/*` — list, create (public-game), inspect, force start/stop, ban member.
  - `GET/POST /api/v1/admin/runtimes/*` — inspect runtime, restart, patch.
  - `GET/POST /api/v1/admin/mail/*` — list deliveries, resend, view attempts.
  - `GET /api/v1/admin/notifications/*` — inspect notifications and dead letters.
- Failed Basic Auth returns `401` with `WWW-Authenticate: Basic realm="galaxy-admin"`.
## 13. Local Run
Prerequisites:
- Go toolchain matching `go.work`.
- Postgres reachable via `BACKEND_POSTGRES_DSN` (a local container is
fine).
- An SMTP server (`mailhog`, `mailpit`, or any other dev relay) reachable
via `BACKEND_SMTP_HOST`/`BACKEND_SMTP_PORT`.
- Docker daemon reachable via `BACKEND_DOCKER_HOST` (the local socket is
the default; running engines through this requires the user-defined
bridge named in `BACKEND_DOCKER_NETWORK`).
- A GeoLite2 Country `.mmdb` file at `BACKEND_GEOIP_DB_PATH`. For tests,
use the synthetic mmdb generator under `pkg/geoip/test-data/`.
Run:
```bash
go run ./backend/cmd/backend
```
Migrations are embedded and applied at startup. Bootstrapping the first
admin happens on the first run if the env vars are set. Subsequent
restarts are idempotent.
## 14. Testing
Three levels:
- **Unit tests** colocated with the implementation (`*_test.go` next to
the file under test). Use `testify` for assertions, `go.uber.org/mock`
for interface mocking when an external boundary justifies it.
- **Contract tests** under `internal/server/`. Validate every request
and response against `openapi.yaml` at runtime via `kin-openapi`. New
endpoints must be added to `openapi.yaml` first; the contract test
fails until the implementation matches.
- **Integration tests** under `../integration/` (top-level repo
module). Use `testcontainers-go` for Postgres and optionally for an
SMTP capture container. Cover the user flows end to end through the
real backend binary.

`make test` runs unit and contract tests. `make integration-test` runs
the integration suite (requires Docker).
## 15. Telemetry
Required minimum signals:
- `http_requests_total{group, method, path, status}` and
`http_request_duration_seconds{...}` for each route group.
- `grpc_push_subscribers` (gauge), `grpc_push_events_total{kind}`,
`grpc_push_dropped_total{gateway_client_id}`.
- `mail_outbox_depth{state}` (gauge), `mail_attempts_total{outcome}`,
`mail_dead_letters_total`.
- `notification_intents_total{kind, outcome}`,
`notification_routes_total{channel}`.
- `runtime_container_ops_total{op, outcome}`,
`runtime_health_probes_total{outcome}`.
- `geo_lookups_total{outcome}`.
- `db_pool_acquires_total`, `db_pool_in_use{...}`, `db_pool_waits_total`.

Tracing covers HTTP request → domain operation → Postgres calls →
external client calls (SMTP, Docker, engine). Every span is linked to
the request id.
Logs are JSON, written to stdout, with `otel_trace_id` and
`otel_span_id` injected when a span context is available. The minimum
fields are `ts`, `level`, `caller`, `service`, `msg`, plus per-call
context.
## 16. Operational Notes
- Graceful shutdown drains in this order on SIGTERM/SIGINT: stop
accepting new HTTP and gRPC traffic → wait for in-flight requests
(bounded by `BACKEND_HTTP_SHUTDOWN_TIMEOUT` and the gRPC counterpart)
→ flush mail outbox writes that have already started → drain push
events to gateway → close the Docker client → close the Postgres pool.
- `/healthz` returns 200 unconditionally as long as the process is
alive.
- `/readyz` checks: Postgres reachable, migrations applied, gRPC
listener bound. Returns 503 until all hold.
- Logs are JSON to stdout. Crash dumps go to stderr.
- Configuration changes require a restart; there is no live reload.
- Bootstrap admin password should be rotated through the admin surface
immediately after the first deploy.
## 17. Service Documentation
Extended service-local documentation lives in [`docs/`](docs/):
- [Documentation index](docs/README.md)
- [Runtime and components](docs/runtime.md)
- [Domain and protocol flows](docs/flows.md)
- [Operator runbook](docs/runbook.md)
- [Configuration and OpenAPI examples](docs/examples.md)

Primary references:
- [`PLAN.md`](PLAN.md) — historical staged build-up of the service.
- [`openapi.yaml`](openapi.yaml) — REST contract.
- [`../docs/ARCHITECTURE.md`](../docs/ARCHITECTURE.md) — workspace-level architecture.