# backend

`backend` is the consolidated business service of the Galaxy platform. It
owns identity, sessions, lobby, game runtime, mail, notifications, geo
signals, and administration. It is reachable only from `gateway` over
the trusted network. See `../docs/ARCHITECTURE.md` for the platform-level
context, security model, and decision rationale.

## 1. Purpose

A single Go binary that:

- Serves three HTTP route groups (`/api/v1/public/*`, `/api/v1/user/*`,
  `/api/v1/admin/*`) plus health probes.
- Hosts a gRPC `SubscribePush` server consumed by `gateway`.
- Owns one Postgres schema (`backend`).
- Talks to the Docker daemon to run game engine containers.
- Talks to an SMTP relay to send mail through a durable outbox.
- Reads the GeoLite2 country database for source-IP country lookup.

This README describes how the binary is laid out, configured, and run.
The implementation specification lives in `PLAN.md`.

## 2. API Surfaces

| Prefix             | Auth                                     | Audience                        |
| ------------------ | ---------------------------------------- | ------------------------------- |
| `/api/v1/public/*` | none                                     | Registration, code confirmation |
| `/api/v1/user/*`   | `X-User-ID` injected by gateway          | Authenticated end users         |
| `/api/v1/admin/*`  | HTTP Basic Auth against `admin_accounts` | Platform administrators         |
| `/healthz`         | none                                     | Liveness probe                  |
| `/readyz`          | none                                     | Readiness probe                 |

The full contract is documented in `openapi.yaml` and validated at
runtime by the contract tests under `internal/server/`.

## 3. Module Layout

```text
backend/
├── cmd/
│   ├── backend/        # main.go: process entrypoint
│   └── jetgen/         # jet code generator runner
├── internal/
│   ├── admin/          # admin_accounts, Basic Auth verifier, admin operations
│   ├── auth/           # email-code challenges, device sessions, Ed25519 keys
│   ├── config/         # env-var loader, Validate
│   ├── dockerclient/   # docker/docker wrapper for container ops
│   ├── engineclient/   # net/http client to galaxy-game containers
│   ├── geo/            # geoip lookup, declared_country, per-user counters
│   ├── lobby/          # games, applications, invites, memberships, RND
│   ├── mail/           # outbox worker, SMTP delivery, dead letters
│   ├── notification/   # intent normalisation, push + email fan-out
│   ├── postgres/       # pgx pool, embedded migrations, jet/
│   ├── push/           # gRPC SubscribePush server
│   ├── runtime/        # engine version registry, container lifecycle, scheduler
│   ├── server/         # gin engine, route groups, middleware, handlers
│   ├── telemetry/      # otel runtime, zap factory
│   └── user/           # accounts, settings, entitlements, sanctions, soft delete
├── proto/
│   └── push/v1/        # push.proto and generated gRPC code
├── docs/               # per-stage decision records (one file per decision)
├── openapi.yaml        # full REST contract (public + user + admin)
├── go.mod
├── Makefile            # `make jet` regenerates jet code
└── README.md
```

## 4. Configuration

All configuration is environment-based; there are no flags or config
files. `Validate()` is called once at startup, and a missing required
value aborts startup immediately.

| Variable                                | Required | Default                       | Purpose                                                                                                                                                                      |
| --------------------------------------- | -------- | ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `BACKEND_HTTP_LISTEN_ADDR`              | no       | `:8080`                       | HTTP listener for REST surfaces and probes.                                                                                                                                  |
| `BACKEND_HTTP_READ_TIMEOUT`             | no       | `30s`                         | HTTP read timeout.                                                                                                                                                           |
| `BACKEND_HTTP_WRITE_TIMEOUT`            | no       | `30s`                         | HTTP write timeout.                                                                                                                                                          |
| `BACKEND_HTTP_SHUTDOWN_TIMEOUT`         | no       | `15s`                         | Graceful shutdown budget for the HTTP server.                                                                                                                                |
| `BACKEND_SHUTDOWN_TIMEOUT`              | no       | `30s`                         | Process-wide cap applied to each component shutdown.                                                                                                                         |
| `BACKEND_GRPC_PUSH_LISTEN_ADDR`         | no       | `:8081`                       | gRPC listener for the push interface.                                                                                                                                        |
| `BACKEND_GRPC_PUSH_SHUTDOWN_TIMEOUT`    | no       | `10s`                         | Graceful shutdown budget for the gRPC server.                                                                                                                                |
| `BACKEND_LOGGING_LEVEL`                 | no       | `info`                        | zap log level.                                                                                                                                                               |
| `BACKEND_POSTGRES_DSN`                  | yes      | —                             | pgx-style Postgres DSN. Must include `search_path=backend` so unqualified reads and writes resolve to the service-owned schema.                                              |
| `BACKEND_POSTGRES_MAX_CONNS`            | no       | `25`                          | Pool max connections.                                                                                                                                                        |
| `BACKEND_POSTGRES_MIN_CONNS`            | no       | `2`                           | Pool min connections.                                                                                                                                                        |
| `BACKEND_POSTGRES_OPERATION_TIMEOUT`    | no       | `5s`                          | Default per-statement timeout.                                                                                                                                               |
| `BACKEND_SMTP_HOST`                     | yes      | —                             | SMTP relay host.                                                                                                                                                             |
| `BACKEND_SMTP_PORT`                     | no       | `587`                         | SMTP relay port.                                                                                                                                                             |
| `BACKEND_SMTP_USERNAME`                 | no       | —                             | SMTP auth username (omit for anonymous).                                                                                                                                     |
| `BACKEND_SMTP_PASSWORD`                 | no       | —                             | SMTP auth password.                                                                                                                                                          |
| `BACKEND_SMTP_FROM`                     | yes      | —                             | RFC 5321 From address.                                                                                                                                                       |
| `BACKEND_SMTP_TLS_MODE`                 | no       | `starttls`                    | `none`, `starttls`, or `tls`.                                                                                                                                                |
| `BACKEND_MAIL_WORKER_INTERVAL`          | no       | `2s`                          | How often the outbox worker scans for new work.                                                                                                                              |
| `BACKEND_MAIL_MAX_ATTEMPTS`             | no       | `8`                           | Maximum delivery attempts before dead-lettering.                                                                                                                             |
| `BACKEND_DOCKER_HOST`                   | no       | `unix:///var/run/docker.sock` | Docker daemon endpoint.                                                                                                                                                      |
| `BACKEND_DOCKER_NETWORK`                | yes      | —                             | User-defined Docker bridge network for engines.                                                                                                                              |
| `BACKEND_GAME_STATE_ROOT`               | yes      | —                             | Host directory bind-mounted into engine containers.                                                                                                                          |
| `BACKEND_ADMIN_BOOTSTRAP_USER`          | no       | —                             | Initial admin username; idempotent insert.                                                                                                                                   |
| `BACKEND_ADMIN_BOOTSTRAP_PASSWORD`      | no       | —                             | Initial admin password; required if the bootstrap user is set.                                                                                                               |
| `BACKEND_GEOIP_DB_PATH`                 | yes      | —                             | Filesystem path to the GeoLite2 Country `.mmdb`.                                                                                                                             |
| `BACKEND_OTEL_TRACES_EXPORTER`          | no       | `otlp`                        | `none`, `otlp`, or `stdout`.                                                                                                                                                 |
| `BACKEND_OTEL_METRICS_EXPORTER`         | no       | `otlp`                        | `none`, `otlp`, `stdout`, or `prometheus`.                                                                                                                                   |
| `BACKEND_OTEL_PROTOCOL`                 | no       | `grpc`                        | `grpc` or `http/protobuf`. OTLP only.                                                                                                                                        |
| `BACKEND_OTEL_ENDPOINT`                 | no       | provider default              | OTLP endpoint URL.                                                                                                                                                           |
| `BACKEND_OTEL_PROMETHEUS_LISTEN_ADDR`   | no       | `:9100`                       | Prometheus exporter listener; used only when `BACKEND_OTEL_METRICS_EXPORTER=prometheus`.                                                                                     |
| `BACKEND_SERVICE_NAME`                  | no       | `galaxy-backend`              | Resource attribute for telemetry.                                                                                                                                            |
| `BACKEND_FRESHNESS_WINDOW`              | no       | `5m`                          | TTL of the push ring buffer (§7); mirrors the gateway freshness window.                                                                                                      |
| `BACKEND_AUTH_CHALLENGE_TTL`            | no       | `10m`                         | Lifetime of an issued `auth_challenges` row.                                                                                                                                 |
| `BACKEND_AUTH_CHALLENGE_MAX_ATTEMPTS`   | no       | `5`                           | Maximum confirm-email-code attempts per challenge.                                                                                                                           |
| `BACKEND_AUTH_CHALLENGE_THROTTLE_WINDOW`| no       | `60s`                         | Rolling window over which challenges are counted toward the throttle.                                                                                                        |
| `BACKEND_AUTH_CHALLENGE_THROTTLE_MAX`   | no       | `3`                           | Maximum unconsumed, unexpired challenges per email within the throttle window; once reached, an existing challenge is reused instead of a new one being issued.              |
| `BACKEND_AUTH_USERNAME_MAX_RETRIES`     | no       | `10`                          | Retry budget for synthesising a unique placeholder `accounts.user_name` at registration.                                                                                     |
| `BACKEND_LOBBY_SWEEPER_INTERVAL`        | no       | `60s`                         | How often the lobby sweeper releases expired pending registrations and auto-closes games whose enrollment has expired.                                                       |
| `BACKEND_LOBBY_PENDING_REGISTRATION_TTL`| no       | `720h` (30 days)              | Lifetime of a `pending_registration` Race Name Directory entry awaiting promotion.                                                                                           |
| `BACKEND_LOBBY_INVITE_DEFAULT_TTL`      | no       | `168h` (7 days)               | Default expiry applied to invites whose request body omits `expires_at`.                                                                                                     |
| `BACKEND_ENGINE_CALL_TIMEOUT`           | no       | `60s`                         | Per-call timeout for engine writes (init, turn, banish, command, order).                                                                                                     |
| `BACKEND_ENGINE_PROBE_TIMEOUT`          | no       | `5s`                          | Per-call timeout for engine reads (status, report, healthz).                                                                                                                 |
| `BACKEND_RUNTIME_WORKER_POOL_SIZE`      | no       | `4`                           | Number of workers executing long-running runtime jobs.                                                                                                                       |
| `BACKEND_RUNTIME_JOB_QUEUE_SIZE`        | no       | `64`                          | Buffered runtime-job channel depth.                                                                                                                                          |
| `BACKEND_RUNTIME_RECONCILE_INTERVAL`    | no       | `60s`                         | Interval between reconciler passes against the Docker daemon.                                                                                                                |
| `BACKEND_RUNTIME_IMAGE_PULL_POLICY`     | no       | `if_missing`                  | Engine image pull policy: `if_missing`, `always`, or `never`.                                                                                                                |
| `BACKEND_RUNTIME_CONTAINER_LOG_DRIVER`  | no       | `json-file`                   | Docker log driver applied to engine containers.                                                                                                                              |
| `BACKEND_RUNTIME_CONTAINER_LOG_OPTS`    | no       | —                             | Comma-separated `key=value` pairs forwarded to the log driver.                                                                                                               |
| `BACKEND_RUNTIME_CONTAINER_CPU_QUOTA`   | no       | `2.0`                         | Engine container `--cpus`.                                                                                                                                                   |
| `BACKEND_RUNTIME_CONTAINER_MEMORY`      | no       | `512m`                        | Engine container `--memory`.                                                                                                                                                 |
| `BACKEND_RUNTIME_CONTAINER_PIDS_LIMIT`  | no       | `256`                         | Engine container `--pids-limit`.                                                                                                                                             |
| `BACKEND_RUNTIME_CONTAINER_STATE_MOUNT` | no       | `/var/lib/galaxy-game`        | Absolute in-container path for the per-game state bind mount.                                                                                                                |
| `BACKEND_RUNTIME_STOP_GRACE_PERIOD`     | no       | `10s`                         | SIGTERM-to-SIGKILL grace period for engine container stop.                                                                                                                   |
| `BACKEND_NOTIFICATION_ADMIN_EMAIL`      | no       | —                             | Recipient address for admin-channel notifications (`runtime.*` kinds). When empty, admin-channel routes are recorded as `skipped` and those kinds produce no email.           |
| `BACKEND_NOTIFICATION_WORKER_INTERVAL`  | no       | `5s`                          | Notification route worker scan interval.                                                                                                                                     |
| `BACKEND_NOTIFICATION_MAX_ATTEMPTS`     | no       | `8`                           | Notification route delivery attempts before dead-lettering.                                                                                                                  |

If `BACKEND_ADMIN_BOOTSTRAP_USER` is set without
`BACKEND_ADMIN_BOOTSTRAP_PASSWORD`, `Validate()` fails. If neither is
set, no bootstrap insert happens and operators are expected to have
seeded `admin_accounts` ahead of time.

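The bootstrap pairing rule can be sketched in Go as follows. It is an illustration only. `adminBootstrap`, `loadAdminBootstrap`, and `errBootstrapPassword` are hypothetical names and not the actual `internal/config` API. The README does not say how a password without a user is handled, so the sketch simply ignores that case.

```go
package main

import "errors"

// adminBootstrap mirrors the two optional bootstrap variables. Hypothetical
// stand-in for the real config struct in internal/config.
type adminBootstrap struct {
	User     string // BACKEND_ADMIN_BOOTSTRAP_USER
	Password string // BACKEND_ADMIN_BOOTSTRAP_PASSWORD
}

// errBootstrapPassword is returned when a user is configured without a password.
var errBootstrapPassword = errors.New(
	"BACKEND_ADMIN_BOOTSTRAP_PASSWORD is required when BACKEND_ADMIN_BOOTSTRAP_USER is set")

// loadAdminBootstrap reads the pair through a lookup function (os.Getenv in
// production) so the rule can be exercised without touching the environment.
func loadAdminBootstrap(getenv func(string) string) adminBootstrap {
	return adminBootstrap{
		User:     getenv("BACKEND_ADMIN_BOOTSTRAP_USER"),
		Password: getenv("BACKEND_ADMIN_BOOTSTRAP_PASSWORD"),
	}
}

// Validate fails fast on a user without a password. Leaving both unset is
// valid and means no bootstrap insert happens.
func (b adminBootstrap) Validate() error {
	if b.User != "" && b.Password == "" {
		return errBootstrapPassword
	}
	return nil
}

// Enabled reports whether startup should run the idempotent admin insert.
func (b adminBootstrap) Enabled() bool { return b.User != "" }
```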
## 5. Persistence

- One Postgres database, schema `backend`. The role used by `backend`
  must own the schema (or be granted `CREATE` on it for migrations).
- Migrations live in `internal/postgres/migrations/`, are embedded into
  the binary via `embed.FS`, and are applied with `pressly/goose/v3`
  before the HTTP listener opens. The startup path also issues
  `CREATE SCHEMA IF NOT EXISTS backend`, so that on a fresh database
  goose can create its bookkeeping table before the first migration runs.
- Pre-production uses one migration file (`00001_init.sql`) covering
  every backend domain (auth, user, admin, lobby, runtime, mail,
  notification, geo). Future migrations are sequence-numbered and
  additive.
- Queries are written with `go-jet/jet/v2`. The generated code lives in
  `internal/postgres/jet/backend/` and is committed;
  `internal/postgres/jet/jet.go` carries package metadata that survives
  regeneration.
- `make jet` regenerates the jet code: it spins up a transient Postgres
  container, applies the migrations, runs `cmd/jetgen`, and writes the
  output back into `internal/postgres/jet/backend/`. Goose's
  bookkeeping table is dropped before generation so it does not leak
  into the generated package.
- `BACKEND_POSTGRES_DSN` must include `search_path=backend`; the runtime
  pool relies on this so unqualified reads and writes resolve to the
  service-owned schema.

Idempotency is enforced through UNIQUE indexes on durable tables; there
is no separate idempotency-key table. Worker pickup uses
`SELECT ... FOR UPDATE SKIP LOCKED` ordered by `next_attempt_at`.

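A startup guard for the `search_path` requirement might look like the sketch below. `dsnSearchPath` and `checkSearchPath` are hypothetical helpers, because the README does not say that `backend` inspects the DSN itself. The parser handles the URL form and a simplified keyword/value form. It does not handle quoted values.

```go
package main

import (
	"errors"
	"net/url"
	"strings"
)

// dsnSearchPath extracts search_path from a pgx-style DSN. It understands the
// URL form (postgres://...?search_path=backend) and a simplified keyword/value
// form (host=db search_path=backend); quoted values with spaces are not handled.
func dsnSearchPath(dsn string) (string, error) {
	if strings.HasPrefix(dsn, "postgres://") || strings.HasPrefix(dsn, "postgresql://") {
		u, err := url.Parse(dsn)
		if err != nil {
			return "", err
		}
		return u.Query().Get("search_path"), nil
	}
	for _, field := range strings.Fields(dsn) {
		if key, value, ok := strings.Cut(field, "="); ok && key == "search_path" {
			return value, nil
		}
	}
	return "", nil
}

// checkSearchPath is a hypothetical guard that rejects a DSN whose search_path
// does not put the service-owned schema first.
func checkSearchPath(dsn string) error {
	sp, err := dsnSearchPath(dsn)
	if err != nil {
		return err
	}
	first, _, _ := strings.Cut(sp, ",")
	if strings.TrimSpace(first) != "backend" {
		return errors.New("BACKEND_POSTGRES_DSN must include search_path=backend")
	}
	return nil
}
```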
## 6. In-Memory Cache

`backend` warms the following caches at startup before the HTTP listener
opens:

- Active device sessions (lookup by `device_session_id`).
- User entitlement snapshots (lookup by `user_id`).
- Engine version registry (lookup by version label, populated by `internal/runtime`).
- Active runtime records (lookup by `game_id`, populated by `internal/runtime`).
- Active games and their memberships.
- Race Name Directory canonical keys.
- Admin accounts.

Each cache is updated write-through in the same domain transaction
that touches Postgres. Caches are bounded to MVP-scale data sets; if any
cache grows beyond the budget, the architecture document mandates a
discussion before moving the cache out of process.

## 7. gRPC Push Interface

The push interface is the only gRPC server hosted by `backend`. The
contract is in `proto/push/v1/push.proto`:

```proto
service Push {
  rpc SubscribePush(GatewaySubscribeRequest) returns (stream PushEvent);
}

message PushEvent {
  oneof kind {
    ClientEvent client_event = 1;
    SessionInvalidation session_invalidation = 2;
  }
  string cursor = 3;
}
```

- `ClientEvent` carries an opaque payload addressed to a `user_id` and,
  optionally, a `device_session_id`. Gateway signs it and forwards it to
  active client subscriptions. Producers do not pass raw bytes to
  `push.Service`. Instead they pass a typed `push.Event` (`Kind() string`,
  `Marshal() ([]byte, error)`), and `push.Service` invokes `Marshal` at
  publish time. Every notification catalog kind (§10) has a 1:1
  FlatBuffers schema in `pkg/schema/fbs/notification.fbs`. The
  notification dispatcher routes `(kind, payload)` to a typed event
  through `notification.buildClientPushEvent`, so client decoders can
  rely on a stable wire shape per kind. `push.JSONEvent` remains as a
  safety net for kinds that arrive without a catalog schema. The frame
  also carries `event_id`, `request_id`, and `trace_id` correlation
  strings populated by backend producers. The notification dispatcher
  fills `event_id` from `route_id`, `request_id` from the originating
  intent's `idempotency_key`, and `trace_id` from the active span.
  Gateway re-emits these values inside the signed client envelope
  without re-interpreting them.
- `SessionInvalidation` instructs gateway to close active subscriptions
  and reject in-flight requests for the affected sessions.
- `cursor` is a monotonically increasing string: a zero-padded decimal
  `uint64` emitted by an in-process counter. Gateway stores the last
  consumed cursor, uses it on reconnect, and treats the format as opaque.
  Backend only guarantees lexicographic monotonicity within a process
  lifetime and resets the sequence after a restart.
- Backend keeps an in-memory ring buffer of recent events with a TTL of
  `BACKEND_FRESHNESS_WINDOW`. A reconnect whose cursor has aged out of
  the buffer resumes from a fresh point.
- A gateway reconnect with the same `gateway_client_id` replaces the
  previous subscription, and the older stream receives `codes.Aborted`.
  Distinct ids fan out as separate broadcast targets.
- Ring buffer eviction is by TTL plus a fixed capacity ceiling.
  Backpressure is per-connection drop-oldest. If the buffered channel
  for a subscriber overflows, the oldest event for that connection is
  discarded and the loss is logged so operators can correlate the gap
  on the gateway side.

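The cursor format and the per-connection drop-oldest policy are sketched below. `cursorCounter` and `subscriber` are hypothetical types. Events are plain strings rather than `PushEvent`, and the TTL ring buffer used for resume is omitted. The sketch shows two things. Padding to 20 digits keeps lexicographic order aligned with numeric order. A full channel sheds its oldest entry instead of blocking the publisher.

```go
package main

import (
	"fmt"
	"sync"
)

// cursorCounter emits cursors as zero-padded decimal uint64 strings. Padding
// to 20 digits (the width of math.MaxUint64) makes lexicographic order match
// numeric order, which is the monotonicity gateway relies on.
type cursorCounter struct {
	mu   sync.Mutex
	next uint64
}

func (c *cursorCounter) Next() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.next++
	return fmt.Sprintf("%020d", c.next)
}

// subscriber is a hypothetical per-connection buffer with drop-oldest
// backpressure: when the channel is full, the oldest queued event is discarded.
type subscriber struct {
	events  chan string
	dropped uint64 // would feed grpc_push_dropped_total and a log line
}

func newSubscriber(capacity int) *subscriber {
	return &subscriber{events: make(chan string, capacity)}
}

// offer enqueues without ever blocking the publisher. It assumes one
// publisher goroutine per subscriber; concurrent offers would need a lock.
func (s *subscriber) offer(ev string) {
	for {
		select {
		case s.events <- ev:
			return
		default:
			select {
			case <-s.events: // discard the oldest queued event
				s.dropped++
			default: // the reader drained the channel meanwhile; retry the send
			}
		}
	}
}
```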
## 8. Engine Client

`internal/engineclient` is a thin `net/http`-based client that targets
running engine containers at `http://galaxy-game-{game_id}:8080`. It
uses the DTOs in `pkg/model/{order,report,rest}` directly and does not
introduce its own request/response types.

Endpoints used:

- `POST /api/v1/admin/init`
- `GET /api/v1/admin/status`
- `PUT /api/v1/admin/turn`
- `POST /api/v1/admin/race/banish`
- `PUT /api/v1/command`
- `PUT /api/v1/order`
- `GET /api/v1/report`
- `GET /healthz`

Engine-version arbitration lives in `internal/runtime`. An in-place
update may only change the semver patch component within the same
major/minor line. A major or minor change requires an explicit stop
and start. The reconciler adopts containers that carry the
`galaxy.backend=1` label but have no runtime record. It also marks
recorded containers that no longer exist as removed.

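The sketch below shows the address scheme and the patch-only rule. It assumes plain `MAJOR.MINOR.PATCH` labels with an optional `v` prefix. It also assumes that a patch downgrade does not count as a patch update. `engineBaseURL` and `isPatchUpdate` are illustrative names, not the `internal/runtime` API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// engineBaseURL builds the in-network address of a game's engine container.
func engineBaseURL(gameID string) string {
	return fmt.Sprintf("http://galaxy-game-%s:8080", gameID)
}

type semver struct{ major, minor, patch int }

// parseSemver accepts "MAJOR.MINOR.PATCH" with an optional leading "v";
// pre-release and build suffixes are out of scope for this sketch.
func parseSemver(s string) (semver, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return semver{}, fmt.Errorf("invalid version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil || v < 0 {
			return semver{}, fmt.Errorf("invalid version %q", s)
		}
		n[i] = v
	}
	return semver{n[0], n[1], n[2]}, nil
}

// isPatchUpdate reports whether moving from `from` to `to` stays inside the
// same major/minor line and raises the patch number. Anything else needs an
// explicit stop and start.
func isPatchUpdate(from, to string) (bool, error) {
	a, err := parseSemver(from)
	if err != nil {
		return false, err
	}
	b, err := parseSemver(to)
	if err != nil {
		return false, err
	}
	return a.major == b.major && a.minor == b.minor && b.patch > a.patch, nil
}
```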
## 9. Mail Outbox

Tables in schema `backend`:

- `mail_deliveries` — one row per logical delivery, keyed by
  `(template_id, idempotency_key)`.
- `mail_recipients` — `(delivery_id, address)`.
- `mail_attempts` — append-only attempt log.
- `mail_dead_letters` — terminal failure mirror with the latest payload
  pointer for forensics and resend.
- `mail_payloads` — opaque rendered payload bytes.

Lifecycle:

1. Producer writes the delivery and payload rows in one transaction.
2. The worker picks the row with `SELECT ... FOR UPDATE SKIP LOCKED`,
   sends through SMTP using `wneessen/go-mail`, records the attempt,
   and either marks `sent` or schedules `next_attempt_at` with
   exponential backoff and jitter.
3. After `BACKEND_MAIL_MAX_ATTEMPTS` the delivery moves to
   `mail_dead_letters` and the worker writes an operator log line.
   The `mail.dead_lettered` notification kind is reserved in the
   catalog (see §10) but has no producer wired up yet, so no admin
   email or push event is emitted today. Admin observability for
   dead letters relies on the log line and the
   `/api/v1/admin/mail/dead-letters` listing.
4. Operators can resend a `pending`, `retrying`, or `dead_lettered`
   delivery via `POST /api/v1/admin/mail/{delivery_id}/resend`. Resend
   on a `sent` delivery returns `409 Conflict` so operators cannot
   accidentally redeliver an email that already left the relay.

On startup the worker drains every row in `pending` or `retrying`
state. There is no separate recovery flow.

`mail_attempts.attempt_no` is monotonic across the entire history of a
single `delivery_id` — a resend keeps the previous attempts and appends
new ones rather than restarting the counter. `EnqueueLoginCode` uses a
server-side UUID as `idempotency_key` so callers cannot collide; other
template producers (notification routes, future direct callers) supply
a stable key, and the UNIQUE on `(template_id, idempotency_key)`
prevents duplicate delivery rows.

## 10. Notification Catalog

The catalog is the closed set of `notification_kind` values understood
by `internal/notification`. Each kind specifies the channels it fans
out to and the payload fields used by templates and clients. The
`auth.login_code` row is delivered directly through the mail outbox
from `internal/auth` and is not materialised inside
`notification_routes` — the auth flow needs the delivery row to commit
synchronously with the challenge, which the notification dispatcher
cannot guarantee.

| Kind                               | Channels    | Payload essentials           |
| ---------------------------------- | ----------- | ---------------------------- |
| `auth.login_code` *(direct mail)*  | email       | `code`, `ttl`                |
| `lobby.invite.received`            | push, email | `game_id`, `inviter_user_id` |
| `lobby.invite.revoked`             | push        | `game_id`                    |
| `lobby.application.submitted`      | push        | `game_id`, `application_id`  |
| `lobby.application.approved`       | push, email | `game_id`                    |
| `lobby.application.rejected`       | push, email | `game_id`                    |
| `lobby.membership.removed`         | push, email | `game_id`, `reason`          |
| `lobby.membership.blocked`         | push, email | `game_id`                    |
| `lobby.race_name.registered`       | push        | `race_name`                  |
| `lobby.race_name.pending`          | push, email | `race_name`, `expires_at`    |
| `lobby.race_name.expired`          | push        | `race_name`                  |
| `runtime.image_pull_failed`        | admin email | `game_id`, `image_ref`       |
| `runtime.container_start_failed`   | admin email | `game_id`                    |
| `runtime.start_config_invalid`     | admin email | `game_id`, `reason`          |
| `game.turn.ready`                  | push        | `game_id`, `turn`            |
| `game.paused`                      | push        | `game_id`, `turn`, `reason`  |

Admin-channel kinds (`runtime.*`) deliver email to
`BACKEND_NOTIFICATION_ADMIN_EMAIL`. When the variable is empty, those
routes land in `notification_routes` with `status='skipped'` and the
operator log line records the configuration miss.

`game.turn.ready` and `game.paused` are emitted by
`lobby.Service.OnRuntimeSnapshot`
(`backend/internal/lobby/runtime_hooks.go`):

- `game.turn.ready` fires whenever the engine's `current_turn`
  advances. Idempotency key `turn-ready:<game_id>:<turn>`, JSON
  payload `{game_id, turn}`.
- `game.paused` fires whenever the same hook flips the game
  `running → paused` because a runtime snapshot landed with
  `engine_unreachable` / `generation_failed`. Idempotency key
  `paused:<game_id>:<turn>`, JSON payload `{game_id, turn, reason}`,
  where `reason` carries the runtime status that triggered the
  transition. The runtime scheduler
  (`backend/internal/runtime/scheduler.go`) forwards the failing
  snapshot through `Service.publishFailureSnapshot`, so a single
  failing tick reliably reaches lobby.

Both kinds target every active membership and are routed through the
push channel only, since a per-turn or per-pause email would be spam.
The UI's signed `SubscribeEvents` stream
(`ui/frontend/src/api/events.svelte.ts`) is therefore their sole
delivery path. The order tab consumes them via
`OrderDraftStore.resetForNewTurn` / `markPaused`
(see `ui/docs/sync-protocol.md`).

The remaining `game.*` kinds (`game.started`, `game.generation.failed`,
`game.finished`) and `mail.dead_lettered` are reserved but have no
producer yet. Wiring one up is an additive change to the catalog
vocabulary and the migration CHECK constraint.

Templates ship in English only. Localisation belongs to the clients that
render the push payload, not to the backend mail body. Per-route mail
idempotency uses the `route_id` UUID as `idempotency_key`, so retried
notifications and partial failures cannot fan out a duplicate email.

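The following sketch illustrates the idempotency keys and JSON payloads for `game.turn.ready` and `game.paused`. It assumes string game IDs and integer turns. `turnReadyKey`, `pausedKey`, and `pauseIntent` are hypothetical helpers, not functions in `runtime_hooks.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// turnReadyPayload and pausedPayload mirror the payload fields listed in the
// catalog; the Go types themselves are a sketch.
type turnReadyPayload struct {
	GameID string `json:"game_id"`
	Turn   int    `json:"turn"`
}

type pausedPayload struct {
	GameID string `json:"game_id"`
	Turn   int    `json:"turn"`
	Reason string `json:"reason"` // runtime status that triggered the pause
}

// turnReadyKey and pausedKey build the idempotency keys, so a replayed
// snapshot for the same game and turn cannot enqueue a second intent.
func turnReadyKey(gameID string, turn int) string {
	return fmt.Sprintf("turn-ready:%s:%d", gameID, turn)
}

func pausedKey(gameID string, turn int) string {
	return fmt.Sprintf("paused:%s:%d", gameID, turn)
}

// pauseIntent returns the key and JSON body for a running → paused
// transition caused by a failing runtime snapshot.
func pauseIntent(gameID string, turn int, runtimeStatus string) (string, []byte, error) {
	body, err := json.Marshal(pausedPayload{GameID: gameID, Turn: turn, Reason: runtimeStatus})
	return pausedKey(gameID, turn), body, err
}
```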
## 11. Geo Profile

`internal/geo` operates on the GeoLite2 Country database loaded from
`BACKEND_GEOIP_DB_PATH` at startup.

- `SetDeclaredCountryAtRegistration(user_id, ip)` is called from
  `auth.confirmEmailCode`. It looks up the country and writes it to
  `accounts.declared_country`. The value is never updated afterwards.
- `IncrementCounterAsync(user_id, ip)` is called from the user-surface
  middleware. It launches a goroutine that looks up the country and
  upserts `(user_id, country, count)` in `user_country_counters`. The
  caller does not block.
- Lookup errors are logged and ignored; geo work never blocks the user.

There is no aggregation, no automatic flagging, no version history of
the declared country, and no admin-side review workflow. Counter rows
are exposed to operators via the admin surface for manual inspection
only.

## 12. Admin Surface

- HTTP Basic Auth credentials are checked against `admin_accounts`
  (Postgres). Passwords are hashed with bcrypt cost 12.
- Bootstrap on startup: if `BACKEND_ADMIN_BOOTSTRAP_USER` is configured
  and no row with that username exists, insert one with the hashed
  bootstrap password. The insert is idempotent.
- Admin endpoints are grouped by domain:
  - `POST/GET /api/v1/admin/admin-accounts/*` — manage admins.
  - `GET/POST /api/v1/admin/users/*` — list, lookup, sanction, limit, soft delete.
  - `GET/POST /api/v1/admin/games/*` — list, create (public-game), inspect, force start/stop, ban member.
  - `GET/POST /api/v1/admin/runtimes/*` — inspect runtime, restart, patch.
  - `GET/POST /api/v1/admin/mail/*` — list deliveries, resend, view attempts.
  - `GET /api/v1/admin/notifications/*` — inspect notifications and dead letters.
- Failed Basic Auth returns `401` with `WWW-Authenticate: Basic realm="galaxy-admin"`.

## 13. Local Run

Prerequisites:

- Go toolchain matching `go.work`.
- Postgres reachable via `BACKEND_POSTGRES_DSN` (a local container is
  fine).
- An SMTP server (`mailhog`, `mailpit`, or any other dev relay) reachable
  via `BACKEND_SMTP_HOST`/`BACKEND_SMTP_PORT`.
- Docker daemon reachable via `BACKEND_DOCKER_HOST`. The local socket is
  the default. Running engines also requires the user-defined bridge
  network named in `BACKEND_DOCKER_NETWORK`.
- A GeoLite2 Country `.mmdb` file at `BACKEND_GEOIP_DB_PATH`. For tests,
  use the synthetic mmdb generator under `pkg/geoip/test-data/`.

Run:

```bash
go run ./backend/cmd/backend
```

Migrations are embedded and applied at startup. Bootstrapping the first
admin happens on the first run if the env vars are set. Subsequent
restarts are idempotent.

## 14. Testing

Three levels:

- **Unit tests** colocated with the implementation (`*_test.go` next to
  the file under test). Use `testify` for assertions, `go.uber.org/mock`
  for interface mocking when an external boundary justifies it.
- **Contract tests** under `internal/server/`. Validate every request
  and response against `openapi.yaml` at runtime via `kin-openapi`. New
  endpoints must be added to `openapi.yaml` first; the contract test
  fails until the implementation matches.
- **Integration tests** under `../integration/` (top-level repo
  module). Use `testcontainers-go` for Postgres and optionally for an
  SMTP capture container. Cover the user flows end to end through the
  real backend binary.

`make test` runs unit and contract tests. `make integration-test` runs
the integration suite (requires Docker).

## 15. Telemetry

Required minimum signals:

- `http_requests_total{group, method, path, status}` and
  `http_request_duration_seconds{...}` for each route group.
- `grpc_push_subscribers` (gauge), `grpc_push_events_total{kind}`,
  `grpc_push_dropped_total{gateway_client_id}`.
- `mail_outbox_depth{state}` (gauge), `mail_attempts_total{outcome}`,
  `mail_dead_letters_total`.
- `notification_intents_total{kind, outcome}`,
  `notification_routes_total{channel}`.
- `runtime_container_ops_total{op, outcome}`,
  `runtime_health_probes_total{outcome}`.
- `geo_lookups_total{outcome}`.
- `db_pool_acquires_total`, `db_pool_in_use{...}`, `db_pool_waits_total`.

Tracing covers HTTP request → domain operation → Postgres calls →
external client calls (SMTP, Docker, engine). Every span is linked to
the request id.

Logs are JSON, written to stdout, with `otel_trace_id` and
`otel_span_id` injected when a span context is available. The minimum
fields are `ts`, `level`, `caller`, `service`, `msg`, plus per-call
context.

## 16. Operational Notes

- Graceful shutdown drains in this order on SIGTERM/SIGINT: stop
  accepting new HTTP and gRPC traffic → wait for in-flight requests
  (bounded by `BACKEND_HTTP_SHUTDOWN_TIMEOUT` and the gRPC counterpart)
  → flush mail outbox writes that have already started → drain push
  events to gateway → close the Docker client → close the Postgres pool.
- `/healthz` returns 200 unconditionally as long as the process is
  alive.
- `/readyz` checks: Postgres reachable, migrations applied, gRPC
  listener bound. Returns 503 until all hold.
- Logs are JSON to stdout. Crash dumps go to stderr.
- Configuration changes require a restart; there is no live reload.
- Bootstrap admin password should be rotated through the admin surface
  immediately after the first deploy.

## 17. Service Documentation

Extended service-local documentation lives in [`docs/`](docs/):

- [Documentation index](docs/README.md)
- [Runtime and components](docs/runtime.md)
- [Domain and protocol flows](docs/flows.md)
- [Operator runbook](docs/runbook.md)
- [Configuration and OpenAPI examples](docs/examples.md)

Primary references:

- [`PLAN.md`](PLAN.md) — historical staged build-up of the service.
- [`openapi.yaml`](openapi.yaml) — REST contract.
- [`../docs/ARCHITECTURE.md`](../docs/ARCHITECTURE.md) — workspace-level architecture.