feat: backend service
# backend

`backend` is the consolidated business service of the Galaxy platform. It
owns identity, sessions, lobby, game runtime, mail, notifications, geo
signals, and administration. It is reachable only from `gateway` over
the trusted network. See `../ARCHITECTURE.md` for the platform-level
context, security model, and decision rationale.

## 1. Purpose

A single Go binary that:

- Serves three HTTP route groups (`/api/v1/public/*`, `/api/v1/user/*`,
  `/api/v1/admin/*`) plus health probes.
- Hosts a gRPC `SubscribePush` server consumed by `gateway`.
- Owns one Postgres schema (`backend`).
- Talks to the Docker daemon to run game engine containers.
- Talks to an SMTP relay to send mail through a durable outbox.
- Reads the GeoLite2 country database for source-IP country lookup.

This README describes how the binary is laid out, configured, and run.
The implementation specification lives in `PLAN.md`.

## 2. API Surfaces

| Prefix             | Auth                                            | Audience                              |
| ------------------ | ----------------------------------------------- | ------------------------------------- |
| `/api/v1/public/*` | none                                            | Registration, code confirmation       |
| `/api/v1/user/*`   | `X-User-ID` injected by gateway                 | Authenticated end users               |
| `/api/v1/admin/*`  | HTTP Basic Auth against `admin_accounts`        | Platform administrators               |
| `/healthz`         | none                                            | Liveness probe                        |
| `/readyz`          | none                                            | Readiness probe                       |

The full contract is documented in `openapi.yaml` and validated at
runtime by the contract tests under `internal/server/`.

## 3. Module Layout

```text
backend/
├── cmd/
│   ├── backend/         # main.go: process entrypoint
│   └── jetgen/          # jet code generator runner
├── internal/
│   ├── admin/           # admin_accounts, Basic Auth verifier, admin operations
│   ├── auth/            # email-code challenges, device sessions, Ed25519 keys
│   ├── config/          # env-var loader, Validate
│   ├── dockerclient/    # docker/docker wrapper for container ops
│   ├── engineclient/    # net/http client to galaxy-game containers
│   ├── geo/             # geoip lookup, declared_country, per-user counters
│   ├── lobby/           # games, applications, invites, memberships, RND
│   ├── mail/            # outbox worker, SMTP delivery, dead letters
│   ├── notification/    # intent normalisation, push + email fan-out
│   ├── postgres/        # pgx pool, embedded migrations, jet/
│   ├── push/            # gRPC SubscribePush server
│   ├── runtime/         # engine version registry, container lifecycle, scheduler
│   ├── server/          # gin engine, route groups, middleware, handlers
│   ├── telemetry/       # otel runtime, zap factory
│   └── user/            # accounts, settings, entitlements, sanctions, soft delete
├── proto/
│   └── push/v1/         # push.proto and generated gRPC code
├── docs/                # per-stage decision records (one file per decision)
├── openapi.yaml         # full REST contract (public + user + admin)
├── go.mod
├── Makefile             # `make jet` regenerates jet code
└── README.md
```

## 4. Configuration

All configuration is environment-based; there are no flags or files.
`Validate()` is called once at startup; missing required values fail
fast.

| Variable                                | Required | Default                  | Purpose                                             |
| --------------------------------------- | -------- | ------------------------ | --------------------------------------------------- |
| `BACKEND_HTTP_LISTEN_ADDR`              | no       | `:8080`                  | HTTP listener for REST surfaces and probes.         |
| `BACKEND_HTTP_READ_TIMEOUT`             | no       | `30s`                    | HTTP read timeout.                                  |
| `BACKEND_HTTP_WRITE_TIMEOUT`            | no       | `30s`                    | HTTP write timeout.                                 |
| `BACKEND_HTTP_SHUTDOWN_TIMEOUT`         | no       | `15s`                    | Graceful shutdown budget for HTTP server.           |
| `BACKEND_SHUTDOWN_TIMEOUT`              | no       | `30s`                    | Process-wide cap applied to each component shutdown. |
| `BACKEND_GRPC_PUSH_LISTEN_ADDR`         | no       | `:8081`                  | gRPC listener for the push interface.               |
| `BACKEND_GRPC_PUSH_SHUTDOWN_TIMEOUT`    | no       | `10s`                    | Graceful shutdown budget for the gRPC server.       |
| `BACKEND_LOGGING_LEVEL`                 | no       | `info`                   | zap log level.                                      |
| `BACKEND_POSTGRES_DSN`                  | yes      | —                        | pgx-style Postgres DSN. Must include `search_path=backend` so unqualified reads and writes resolve to the service-owned schema. |
| `BACKEND_POSTGRES_MAX_CONNS`            | no       | `25`                     | Pool max connections.                               |
| `BACKEND_POSTGRES_MIN_CONNS`            | no       | `2`                      | Pool min connections.                               |
| `BACKEND_POSTGRES_OPERATION_TIMEOUT`    | no       | `5s`                     | Default per-statement timeout.                      |
| `BACKEND_SMTP_HOST`                     | yes      | —                        | SMTP relay host.                                    |
| `BACKEND_SMTP_PORT`                     | no       | `587`                    | SMTP relay port.                                    |
| `BACKEND_SMTP_USERNAME`                 | no       | —                        | SMTP auth username (omit for anonymous).            |
| `BACKEND_SMTP_PASSWORD`                 | no       | —                        | SMTP auth password.                                 |
| `BACKEND_SMTP_FROM`                     | yes      | —                        | RFC-5321 From address.                              |
| `BACKEND_SMTP_TLS_MODE`                 | no       | `starttls`               | `none`, `starttls`, or `tls`.                       |
| `BACKEND_MAIL_WORKER_INTERVAL`          | no       | `2s`                     | How often the outbox worker scans for new work.     |
| `BACKEND_MAIL_MAX_ATTEMPTS`             | no       | `8`                      | Maximum delivery attempts before dead-lettering.    |
| `BACKEND_DOCKER_HOST`                   | no       | `unix:///var/run/docker.sock` | Docker daemon endpoint.                        |
| `BACKEND_DOCKER_NETWORK`                | yes      | —                        | User-defined Docker bridge network for engines.     |
| `BACKEND_GAME_STATE_ROOT`               | yes      | —                        | Host directory bind-mounted into engine containers. |
| `BACKEND_ADMIN_BOOTSTRAP_USER`          | no       | —                        | Initial admin username; idempotent insert.          |
| `BACKEND_ADMIN_BOOTSTRAP_PASSWORD`      | no       | —                        | Initial admin password; required if user is set.    |
| `BACKEND_GEOIP_DB_PATH`                 | yes      | —                        | Filesystem path to GeoLite2 Country `.mmdb`.        |
| `BACKEND_OTEL_TRACES_EXPORTER`          | no       | `otlp`                   | `none`, `otlp`, `stdout`.                           |
| `BACKEND_OTEL_METRICS_EXPORTER`         | no       | `otlp`                   | `none`, `otlp`, `stdout`, `prometheus`.             |
| `BACKEND_OTEL_PROTOCOL`                 | no       | `grpc`                   | `grpc` or `http/protobuf`. OTLP only.               |
| `BACKEND_OTEL_ENDPOINT`                 | no       | provider default         | OTLP endpoint URL.                                  |
| `BACKEND_OTEL_PROMETHEUS_LISTEN_ADDR`   | no       | `:9100`                  | When `BACKEND_OTEL_METRICS_EXPORTER=prometheus`.    |
| `BACKEND_SERVICE_NAME`                  | no       | `galaxy-backend`         | Resource attribute for telemetry.                   |
| `BACKEND_FRESHNESS_WINDOW`              | no       | `5m`                     | Mirrors gateway freshness window for push cursor TTL. |
| `BACKEND_AUTH_CHALLENGE_TTL`            | no       | `10m`                    | Lifetime of an issued `auth_challenges` row.        |
| `BACKEND_AUTH_CHALLENGE_MAX_ATTEMPTS`   | no       | `5`                      | Maximum confirm-email-code attempts per challenge.  |
| `BACKEND_AUTH_CHALLENGE_THROTTLE_WINDOW`| no       | `60s`                    | Rolling window over which challenges are counted toward throttle. |
| `BACKEND_AUTH_CHALLENGE_THROTTLE_MAX`   | no       | `3`                      | Max un-consumed, non-expired challenges per email per window before reuse kicks in. |
| `BACKEND_AUTH_USERNAME_MAX_RETRIES`     | no       | `10`                     | Retry budget for synthesising a unique placeholder `accounts.user_name` at registration. |
| `BACKEND_LOBBY_SWEEPER_INTERVAL`        | no       | `60s`                    | How often the lobby sweeper releases expired pending_registrations and auto-closes enrollment-expired games. |
| `BACKEND_LOBBY_PENDING_REGISTRATION_TTL`| no       | `720h` (30 days)         | Lifetime of a `pending_registration` Race Name Directory entry awaiting promotion. |
| `BACKEND_LOBBY_INVITE_DEFAULT_TTL`      | no       | `168h` (7 days)          | Default expiry applied to invites whose request body omits `expires_at`. |
| `BACKEND_ENGINE_CALL_TIMEOUT`           | no       | `60s`                    | Per-call timeout for engine writes (init, turn, banish, command, order). |
| `BACKEND_ENGINE_PROBE_TIMEOUT`          | no       | `5s`                     | Per-call timeout for engine reads (status, report, healthz). |
| `BACKEND_RUNTIME_WORKER_POOL_SIZE`      | no       | `4`                      | Long-running runtime job concurrency.               |
| `BACKEND_RUNTIME_JOB_QUEUE_SIZE`        | no       | `64`                     | Buffered runtime-job channel depth.                 |
| `BACKEND_RUNTIME_RECONCILE_INTERVAL`    | no       | `60s`                    | Interval between reconciler passes against the Docker daemon. |
| `BACKEND_RUNTIME_IMAGE_PULL_POLICY`     | no       | `if_missing`             | Engine image pull policy: `if_missing`, `always`, `never`. |
| `BACKEND_RUNTIME_CONTAINER_LOG_DRIVER`  | no       | `json-file`              | Docker log driver applied to engine containers.     |
| `BACKEND_RUNTIME_CONTAINER_LOG_OPTS`    | no       | —                        | Comma-separated `key=value` pairs forwarded to the log driver. |
| `BACKEND_RUNTIME_CONTAINER_CPU_QUOTA`   | no       | `2.0`                    | Engine container `--cpus`.                          |
| `BACKEND_RUNTIME_CONTAINER_MEMORY`      | no       | `512m`                   | Engine container `--memory`.                        |
| `BACKEND_RUNTIME_CONTAINER_PIDS_LIMIT`  | no       | `256`                    | Engine container `--pids-limit`.                    |
| `BACKEND_RUNTIME_CONTAINER_STATE_MOUNT` | no       | `/var/lib/galaxy-game`   | Absolute in-container path for the per-game state bind mount. |
| `BACKEND_RUNTIME_STOP_GRACE_PERIOD`     | no       | `10s`                    | SIGTERM-to-SIGKILL grace period for engine container stop. |
| `BACKEND_NOTIFICATION_ADMIN_EMAIL`      | no       | —                        | Recipient address for admin-channel notifications (`runtime.*` kinds). When empty, admin-channel routes are recorded as `skipped` and the catalog is partially silenced. |
| `BACKEND_NOTIFICATION_WORKER_INTERVAL`  | no       | `5s`                     | Notification route worker scan interval.            |
| `BACKEND_NOTIFICATION_MAX_ATTEMPTS`     | no       | `8`                      | Notification route delivery attempts before dead-lettering. |

If `BACKEND_ADMIN_BOOTSTRAP_USER` is set without
`BACKEND_ADMIN_BOOTSTRAP_PASSWORD`, `Validate()` fails. If neither is
set, no bootstrap insert happens and operators are expected to have
seeded `admin_accounts` ahead of time.

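The fail-fast behaviour described in this section can be illustrated with a minimal sketch. The `Config` struct, `envOr`, and `Load` are hypothetical names chosen for this example and cover only a handful of the variables; the real loader in `internal/config` may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// Config is a hypothetical subset of the settings listed in the table.
type Config struct {
	HTTPListenAddr         string
	HTTPReadTimeout        time.Duration
	PostgresDSN            string
	AdminBootstrapUser     string
	AdminBootstrapPassword string
}

// envOr returns the value of key, or def when the variable is unset or empty.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// Load reads the environment and applies the documented defaults.
func Load() (Config, error) {
	readTimeout, err := time.ParseDuration(envOr("BACKEND_HTTP_READ_TIMEOUT", "30s"))
	if err != nil {
		return Config{}, fmt.Errorf("BACKEND_HTTP_READ_TIMEOUT: %w", err)
	}
	return Config{
		HTTPListenAddr:         envOr("BACKEND_HTTP_LISTEN_ADDR", ":8080"),
		HTTPReadTimeout:        readTimeout,
		PostgresDSN:            os.Getenv("BACKEND_POSTGRES_DSN"),
		AdminBootstrapUser:     os.Getenv("BACKEND_ADMIN_BOOTSTRAP_USER"),
		AdminBootstrapPassword: os.Getenv("BACKEND_ADMIN_BOOTSTRAP_PASSWORD"),
	}, nil
}

// Validate reports the first missing required value or inconsistent pair.
func (c Config) Validate() error {
	if c.PostgresDSN == "" {
		return errors.New("BACKEND_POSTGRES_DSN is required")
	}
	if c.AdminBootstrapUser != "" && c.AdminBootstrapPassword == "" {
		return errors.New("BACKEND_ADMIN_BOOTSTRAP_PASSWORD is required when BACKEND_ADMIN_BOOTSTRAP_USER is set")
	}
	return nil
}

func main() {
	cfg, err := Load()
	if err == nil {
		err = cfg.Validate()
	}
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("HTTP listener:", cfg.HTTPListenAddr)
}
```

A real entrypoint would exit with a non-zero status instead of returning; the sketch only prints the error so it can be run without any environment set.
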
## 5. Persistence

- One Postgres database, schema `backend`. The role used by `backend`
  must own the schema (or be granted `CREATE` on it for migrations).
- Migrations live in `internal/postgres/migrations/`, are embedded into
  the binary via `embed.FS`, and are applied with `pressly/goose/v3`
  before the HTTP listener opens. The startup path also issues a
  `CREATE SCHEMA IF NOT EXISTS backend` so a fresh database does not
  trip goose's bookkeeping table on the first migration.
- Pre-production uses one migration file (`00001_init.sql`) covering
  every backend domain (auth, user, admin, lobby, runtime, mail,
  notification, geo). Future migrations are sequence-numbered and
  additive.
- Queries are written through `go-jet/jet/v2`. The generated code is in
  `internal/postgres/jet/backend/` and is committed; `internal/postgres/jet/jet.go`
  carries package metadata that survives regeneration.
- `make jet` regenerates the jet code: it spins up a transient Postgres
  container, applies the migrations, runs `cmd/jetgen`, and writes the
  output back into `internal/postgres/jet/backend/`. Goose's
  bookkeeping table is dropped before generation so it does not leak
  into the generated package.
- `BACKEND_POSTGRES_DSN` must include `search_path=backend`; the runtime
  pool relies on this so unqualified reads and writes resolve to the
  service-owned schema.

Idempotency is enforced through UNIQUE indexes on durable tables; there
is no separate idempotency-key table. Worker pickup uses
`SELECT ... FOR UPDATE SKIP LOCKED` ordered by `next_attempt_at`.

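As an illustration of the worker pickup pattern, the sketch below claims one due row inside a transaction. It uses `database/sql` as a stand-in for the pgx pool and jet-generated queries, and the column names `delivery_id` and `status` are assumptions made for the example rather than the actual schema.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// claimDueDelivery is an illustrative query. The table and next_attempt_at
// come from this README; delivery_id and status are assumed column names.
const claimDueDelivery = `
SELECT delivery_id
FROM mail_deliveries
WHERE status IN ('pending', 'retrying')
  AND next_attempt_at <= now()
ORDER BY next_attempt_at
LIMIT 1
FOR UPDATE SKIP LOCKED`

// claimNext locks and returns one due delivery inside tx. Rows already locked
// by another worker are skipped instead of blocking, so concurrent workers
// never pick the same row. The lock is held until tx commits or rolls back.
func claimNext(ctx context.Context, tx *sql.Tx) (id string, found bool, err error) {
	err = tx.QueryRowContext(ctx, claimDueDelivery).Scan(&id)
	if errors.Is(err, sql.ErrNoRows) {
		return "", false, nil // nothing is due, or every due row is locked
	}
	if err != nil {
		return "", false, fmt.Errorf("claim delivery: %w", err)
	}
	return id, true, nil
}

func main() {
	// No database is opened here; the program only prints the query text.
	fmt.Println(claimDueDelivery)
}
```
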
## 6. In-Memory Cache

`backend` warms the following caches at startup before the HTTP listener
opens:

- Active device sessions (lookup by `device_session_id`).
- User entitlement snapshots (lookup by `user_id`).
- Engine version registry (lookup by version label, populated by `internal/runtime`).
- Active runtime records (lookup by `game_id`, populated by `internal/runtime`).
- Active games and their memberships.
- Race Name Directory canonical keys.
- Admin accounts.

Each cache is updated write-through in the same domain transaction
that touches Postgres. Caches are bounded to MVP-scale data sets; if any
cache grows beyond the budget, the architecture document mandates a
discussion before moving the cache out of process.

## 7. gRPC Push Interface

The push interface is the only gRPC server hosted by `backend`. The
contract is in `proto/push/v1/push.proto`:

```proto
service Push {
  rpc SubscribePush(GatewaySubscribeRequest) returns (stream PushEvent);
}

message PushEvent {
  oneof kind {
    ClientEvent client_event = 1;
    SessionInvalidation session_invalidation = 2;
  }
  string cursor = 3;
}
```

- `ClientEvent` carries an opaque payload addressed to a `(user_id [,
  device_session_id])`. Gateway signs and forwards it to active client
  subscriptions. The frame also carries `event_id`, `request_id`, and
  `trace_id` correlation strings populated by backend producers
  (notification dispatcher fills `event_id` from `route_id`,
  `request_id` from the originating intent's `idempotency_key`, and
  `trace_id` from the active span); gateway re-emits the values inside
  the signed client envelope without re-interpreting them.
- `SessionInvalidation` instructs gateway to close active subscriptions
  and reject in-flight requests for the affected sessions.
- `cursor` is a monotonically increasing string. Gateway stores the last
  consumed cursor and uses it on reconnect. The format is opaque to
  gateway; backend only guarantees lexicographic monotonicity within a
  process lifetime, and resets the sequence after a restart.
- Backend keeps an in-memory ring buffer of recent events with a TTL of
  `BACKEND_FRESHNESS_WINDOW`. Cursors that have aged out resume from a
  fresh point.
- A gateway reconnect with the same `gateway_client_id` replaces the
  previous subscription (`codes.Aborted` is returned to the older
  stream). Distinct ids fan out as separate broadcast targets.
- Cursor format is a zero-padded decimal `uint64` string emitted by an
  in-process counter; gateway treats it as opaque.
- Ring buffer eviction is by TTL plus a fixed capacity ceiling.
  Backpressure is per-connection drop-oldest: if the buffered channel
  for a subscriber overflows, the oldest event for that connection is
  discarded and the loss is logged so operators can correlate the gap
  on the gateway side.

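Two of the points above, the zero-padded cursor and drop-oldest backpressure, can be sketched in a few lines. `formatCursor` and `sendDropOldest` are hypothetical helpers; the 20-digit width is an assumption (it is the smallest width that fits every `uint64`), and the sketch assumes a single producer per subscriber channel with a buffer capacity of at least one.

```go
package main

import "fmt"

// formatCursor renders n as a fixed-width decimal string. Twenty digits fit
// the largest uint64, so string order matches numeric order.
func formatCursor(n uint64) string {
	return fmt.Sprintf("%020d", n)
}

// sendDropOldest enqueues ev on a buffered channel. When the buffer is full
// it discards the oldest queued event and retries, returning how many events
// were dropped. The channel must be buffered and have a single producer.
func sendDropOldest(ch chan string, ev string) (dropped int) {
	for {
		select {
		case ch <- ev:
			return dropped
		default:
			// Buffer full: fall through and evict one event.
		}
		select {
		case <-ch:
			dropped++
		default:
			// A consumer drained the buffer in the meantime; retry the send.
		}
	}
}

func main() {
	ch := make(chan string, 2)
	total := 0
	for n := uint64(1); n <= 3; n++ {
		total += sendDropOldest(ch, formatCursor(n))
	}
	close(ch)
	for ev := range ch {
		fmt.Println(ev)
	}
	fmt.Println("dropped:", total)
}
```

With a capacity of two and three events sent, the program prints cursors `00000000000000000002` and `00000000000000000003`, followed by `dropped: 1`. A production sender would also log each drop, as the section describes.
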
## 8. Engine Client

`internal/engineclient` is a thin `net/http`-based client that targets
running engine containers at `http://galaxy-game-{game_id}:8080`. It
uses the DTOs in `pkg/model/{order,report,rest}` directly; it does not
introduce its own request/response types.

Endpoints used:

- `POST /api/v1/admin/init`
- `GET /api/v1/admin/status`
- `PUT /api/v1/admin/turn`
- `POST /api/v1/admin/race/banish`
- `PUT /api/v1/command`
- `PUT /api/v1/order`
- `GET /api/v1/report`
- `GET /healthz`

Engine-version arbitration lives in `internal/runtime`. In-place updates
are limited to semver patch changes within the same major/minor line;
major or minor changes require an explicit stop and start.
Reconciliation adopts unrecorded containers tagged with the
`galaxy.backend=1` label and marks recorded containers that are missing
as removed.

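The patch-only rule can be expressed as a small predicate. This is a sketch under assumptions: `parseVersion` and `isPatchUpdate` are hypothetical names, version labels are taken to be plain `major.minor.patch` strings, and any patch change (up or down) within the same line is accepted, which may be looser than the real arbitration in `internal/runtime`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a strict "major.minor.patch" label into its numbers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("version %q: want major.minor.patch", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("version %q: bad component %q", v, p)
		}
		out[i] = n
	}
	return out, nil
}

// isPatchUpdate reports whether moving from one version to another keeps the
// major and minor numbers and changes only the patch number.
func isPatchUpdate(from, to string) (bool, error) {
	a, err := parseVersion(from)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(to)
	if err != nil {
		return false, err
	}
	return a[0] == b[0] && a[1] == b[1] && a[2] != b[2], nil
}

func main() {
	for _, target := range []string{"1.4.3", "1.5.0", "2.0.0"} {
		ok, err := isPatchUpdate("1.4.2", target)
		fmt.Println(target, ok, err)
	}
}
```

Starting from `1.4.2`, only `1.4.3` is reported as a patch update; `1.5.0` and `2.0.0` would need the explicit stop and start path.
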
## 9. Mail Outbox

Tables in schema `backend`:

- `mail_deliveries` — one row per logical delivery, keyed by
  `(template_id, idempotency_key)`.
- `mail_recipients` — `(delivery_id, address)`.
- `mail_attempts` — append-only attempt log.
- `mail_dead_letters` — terminal failure mirror with the latest payload
  pointer for forensics and resend.
- `mail_payloads` — opaque rendered payload bytes.

Lifecycle:

1. Producer writes the delivery and payload rows in one transaction.
2. The worker picks the row with `SELECT ... FOR UPDATE SKIP LOCKED`,
   sends through SMTP using `wneessen/go-mail`, records the attempt,
   and either marks `sent` or schedules `next_attempt_at` with
   exponential backoff and jitter.
3. After `BACKEND_MAIL_MAX_ATTEMPTS` the delivery moves to
   `mail_dead_letters`. An admin notification intent is emitted.
4. Operators can resend a `pending`, `retrying`, or `dead_lettered`
   delivery via `POST /api/v1/admin/mail/{delivery_id}/resend`. Resend
   on a `sent` delivery returns `409 Conflict` so operators cannot
   accidentally redeliver an email that already left the relay.

On startup the worker drains every row in `pending` or `retrying`
state. There is no separate recovery flow.

`mail_attempts.attempt_no` is monotonic across the entire history of a
single `delivery_id` — a resend keeps the previous attempts and appends
new ones rather than restarting the counter. `EnqueueLoginCode` uses a
server-side UUID as `idempotency_key` so callers cannot collide; other
template producers (notification routes, future direct callers) supply
a stable key, and the UNIQUE on `(template_id, idempotency_key)`
prevents duplicate delivery rows.

## 10. Notification Catalog

The catalog is the closed set of `notification_kind` values understood
by `internal/notification`. Each kind specifies the channels it fans
out to and the payload fields used by templates and clients. The
`auth.login_code` row is delivered directly through the mail outbox
from `internal/auth` and is not materialised inside
`notification_routes` — the auth flow needs the delivery row to commit
synchronously with the challenge, which the notification dispatcher
cannot guarantee.

| Kind                                | Channels      | Payload essentials                                       |
| ----------------------------------- | ------------- | -------------------------------------------------------- |
| `auth.login_code` *(direct mail)*   | email         | `code`, `ttl`                                            |
| `lobby.invite.received`             | push, email   | `game_id`, `inviter_user_id`                             |
| `lobby.invite.revoked`              | push          | `game_id`                                                |
| `lobby.application.submitted`       | push          | `game_id`, `application_id`                              |
| `lobby.application.approved`        | push, email   | `game_id`                                                |
| `lobby.application.rejected`        | push, email   | `game_id`                                                |
| `lobby.membership.removed`          | push, email   | `game_id`, `reason`                                      |
| `lobby.membership.blocked`          | push, email   | `game_id`                                                |
| `lobby.race_name.registered`        | push          | `race_name`                                              |
| `lobby.race_name.pending`           | push, email   | `race_name`, `expires_at`                                |
| `lobby.race_name.expired`           | push          | `race_name`                                              |
| `runtime.image_pull_failed`         | admin email   | `game_id`, `image_ref`                                   |
| `runtime.container_start_failed`    | admin email   | `game_id`                                                |
| `runtime.start_config_invalid`      | admin email   | `game_id`, `reason`                                      |

Admin-channel kinds (`runtime.*`) deliver email to
`BACKEND_NOTIFICATION_ADMIN_EMAIL`; when the variable is empty, those
routes land in `notification_routes` with `status='skipped'` and the
operator log line records the configuration miss.

`game.*` (`game.started`, `game.turn.ready`, `game.generation.failed`,
`game.finished`) and `mail.dead_lettered` are reserved kinds without a
producer in the catalog; adding them is an additive change to the
catalog vocabulary and the migration CHECK constraint.

Templates ship in English only; localisation belongs to clients that
render the push payload, not to the backend mail body. Per-route mail
idempotency uses the `route_id` UUID as `idempotency_key`, so retried
notifications and partial failures cannot fan out a duplicate email.

## 11. Geo Profile

`internal/geo` operates on the GeoLite2 Country database loaded from
`BACKEND_GEOIP_DB_PATH` at startup.

- `SetDeclaredCountryAtRegistration(user_id, ip)` is called from
  `auth.confirmEmailCode`. It looks up the country and writes it to
  `accounts.declared_country`. The value is never updated afterwards.
- `IncrementCounterAsync(user_id, ip)` is called from the user-surface
  middleware. It launches a goroutine that looks up the country and
  upserts `(user_id, country, count)` in `user_country_counters`. The
  caller does not block.
- Lookup errors are logged and ignored; geo work never blocks the user.

There is no aggregation, no automatic flagging, no version history of
declared country, and no admin-side review workflow. Counter rows are
exposed to operators via the admin surface for manual inspection only.

## 12. Admin Surface

- HTTP Basic Auth credentials are checked against `admin_accounts`
  (Postgres). Passwords are hashed with bcrypt cost 12.
- Bootstrap on startup: if `BACKEND_ADMIN_BOOTSTRAP_USER` is configured
  and no row with that username exists, insert one with the hashed
  bootstrap password. The insert is idempotent.
- Admin endpoints are grouped by domain:
  - `POST/GET /api/v1/admin/admin-accounts/*` — manage admins.
  - `GET/POST /api/v1/admin/users/*` — list, lookup, sanction, limit, soft delete.
  - `GET/POST /api/v1/admin/games/*` — list, create (public-game), inspect, force start/stop, ban member.
  - `GET/POST /api/v1/admin/runtimes/*` — inspect runtime, restart, patch.
  - `GET/POST /api/v1/admin/mail/*` — list deliveries, resend, view attempts.
  - `GET /api/v1/admin/notifications/*` — inspect notifications and dead letters.
- Failed Basic Auth returns `401` with `WWW-Authenticate: Basic realm="galaxy-admin"`.

## 13. Local Run

Prerequisites:

- Go toolchain matching `go.work`.
- Postgres reachable via `BACKEND_POSTGRES_DSN` (a local container is
  fine).
- An SMTP server (`mailhog`, `mailpit`, or any other dev relay) reachable
  via `BACKEND_SMTP_HOST`/`BACKEND_SMTP_PORT`.
- Docker daemon reachable via `BACKEND_DOCKER_HOST` (the local socket is
  the default; running engines through this requires the user-defined
  bridge named in `BACKEND_DOCKER_NETWORK`).
- A GeoLite2 Country `.mmdb` file at `BACKEND_GEOIP_DB_PATH`. For tests,
  use the synthetic mmdb generator under `pkg/geoip/test-data/`.

Run:

```bash
go run ./backend/cmd/backend
```

Migrations are embedded and applied at startup. Bootstrapping the first
admin happens on the first run if the env vars are set. Subsequent
restarts are idempotent.

## 14. Testing

Three levels:

- **Unit tests** colocated with the implementation (`*_test.go` next to
  the file under test). Use `testify` for assertions, `go.uber.org/mock`
  for interface mocking when an external boundary justifies it.
- **Contract tests** under `internal/server/`. Validate every request
  and response against `openapi.yaml` at runtime via `kin-openapi`. New
  endpoints must be added to `openapi.yaml` first; the contract test
  fails until the implementation matches.
- **Integration tests** under `../integration/` (top-level repo
  module). Use `testcontainers-go` for Postgres and optionally for an
  SMTP capture container. Cover the user flows end to end through the
  real backend binary.

`make test` runs unit and contract tests. `make integration-test` runs
the integration suite (requires Docker).

## 15. Telemetry

Required minimum signals:

- `http_requests_total{group, method, path, status}` and
  `http_request_duration_seconds{...}` for each route group.
- `grpc_push_subscribers` (gauge), `grpc_push_events_total{kind}`,
  `grpc_push_dropped_total{gateway_client_id}`.
- `mail_outbox_depth{state}` (gauge), `mail_attempts_total{outcome}`,
  `mail_dead_letters_total`.
- `notification_intents_total{kind, outcome}`,
  `notification_routes_total{channel}`.
- `runtime_container_ops_total{op, outcome}`,
  `runtime_health_probes_total{outcome}`.
- `geo_lookups_total{outcome}`.
- `db_pool_acquires_total`, `db_pool_in_use{...}`, `db_pool_waits_total`.

Tracing covers HTTP request → domain operation → Postgres calls →
external client calls (SMTP, Docker, engine). Every span is linked to
the request id.

Logs are JSON, written to stdout, with `otel_trace_id` and
`otel_span_id` injected when a span context is available. The minimum
fields are `ts`, `level`, `caller`, `service`, `msg`, plus per-call
context.

## 16. Operational Notes

- Graceful shutdown drains in this order on SIGTERM/SIGINT: stop
  accepting new HTTP and gRPC traffic → wait for in-flight requests
  (bounded by `BACKEND_HTTP_SHUTDOWN_TIMEOUT` and the gRPC counterpart)
  → flush mail outbox writes that have already started → drain push
  events to gateway → close the Docker client → close the Postgres pool.
- `/healthz` returns 200 unconditionally as long as the process is
  alive.
- `/readyz` checks: Postgres reachable, migrations applied, gRPC
  listener bound. Returns 503 until all hold.
- Logs are JSON to stdout. Crash dumps go to stderr.
- Configuration changes require a restart; there is no live reload.
- Bootstrap admin password should be rotated through the admin surface
  immediately after the first deploy.

## 17. Service Documentation

Extended service-local documentation lives in [`docs/`](docs/):

- [Documentation index](docs/README.md)
- [Runtime and components](docs/runtime.md)
- [Domain and protocol flows](docs/flows.md)
- [Operator runbook](docs/runbook.md)
- [Configuration and OpenAPI examples](docs/examples.md)

Primary references:

- [`PLAN.md`](PLAN.md) — historical staged build-up of the service.
- [`openapi.yaml`](openapi.yaml) — REST contract.
- [`../ARCHITECTURE.md`](../ARCHITECTURE.md) — workspace-level architecture.