# backend — Implementation Plan

This plan has already been implemented and is kept here for historical reasons.

It should NOT be treated as the source of truth for service functionality.

---

## Summary

This plan is the technical specification for implementing the
consolidated Galaxy `backend` service. It is read together with
`../ARCHITECTURE.md` (architecture and security model) and
`README.md` (module layout, configuration, operations).

After reading those two documents and this plan, an implementing
engineer should not need to ask architectural questions. Every stage is
self-contained inside its domain area; stages run in order; each stage
has explicit Critical files.

The plan does not invent new domain concepts. It catalogues the work
required to assemble what the architecture document already defines.

## ~~Stage 1~~ — Repository cleanup

This stage was implemented and marked as done.

Goal: remove every module whose responsibility moves into `backend`,
and prepare the workspace for the new module.

Actions:

1. `git rm -r authsession/ lobby/ mail/ notification/ gamemaster/
   rtmanager/ geoprofile/ user/ integration/ pkg/redisconn/
   pkg/notificationintent/`.
2. Edit `go.work`:
   - Remove `use` lines for the deleted modules.
   - Remove `replace` lines for `galaxy/redisconn` and
     `galaxy/notificationintent`.
   - Do not add `./backend` yet — the module is created in Stage 2.
3. Confirm that surviving modules still build:
   `go build ./gateway/... ./game/... ./client/... ./pkg/...`.
   Any compile error here means a surviving module imported a
   removed package and must be patched (the only realistic culprit is
   `gateway`, which references `pkg/redisconn` and the deleted streams;
   patches there belong to Stage 6, not Stage 1 — for Stage 1 it is
   acceptable to leave gateway broken if and only if the only failures
   come from imports of removed packages).
4. Run `go vet ./pkg/...` and confirm that it reports no diagnostics.

Out of scope: any code change inside surviving modules. Stage 1 is
purely deletion plus `go.work` edits.

Critical files:

- `go.work`
- the deletion of `authsession/`, `lobby/`, `mail/`, `notification/`,
  `gamemaster/`, `rtmanager/`, `geoprofile/`, `user/`, `integration/`,
  `pkg/redisconn/`, `pkg/notificationintent/`.

Done criteria:

- `git status` shows only deletions plus the `go.work` edit.
- `go build ./pkg/...` is clean.
- `go vet ./pkg/...` is clean.

## ~~Stage 2~~ — Backend skeleton & shared infrastructure

This stage was implemented and marked as done.

Goal: stand up the new module with its boot path, configuration,
telemetry, logger, HTTP listener, Postgres pool, and gRPC listener — all
with empty handlers. After this stage `go run ./backend/cmd/backend`
must boot to a state where probes return 200 and migrations run (with a
placeholder migration file).

Actions:

1. Create `backend/go.mod` with module path `galaxy/backend` and Go
   version matching `go.work`. Add direct dependencies:
   `github.com/gin-gonic/gin`, `github.com/jackc/pgx/v5`,
   `github.com/go-jet/jet/v2`, `github.com/pressly/goose/v3`,
   `go.uber.org/zap`, `go.opentelemetry.io/otel` and the OTLP
   trace/metric exporters used by other services, and the `galaxy/*`
   pkg modules (`postgres`, `model`, `geoip`, `cronutil`, `error`,
   `util`).
2. Add `./backend` to `go.work` `use(...)`.
3. `backend/cmd/backend/main.go` — boot order:
   1. Load configuration with `config.LoadFromEnv()`; call `cfg.Validate()`.
   2. Initialise telemetry (`telemetry.NewProcess(cfg.Telemetry)`). Set
      global tracer and meter providers.
   3. Construct the zap logger; inject trace fields helper.
   4. Open Postgres pool. Apply embedded migrations with goose. Fail
      fast on any error.
   5. Construct module wiring (empty for now; populated in Stage 5).
   6. Start the HTTP server (gin engine with empty route groups, plus
      `/healthz` and `/readyz`).
   7. Start the gRPC push server (no streams accepted yet — Stage 6).
   8. Block on `signal.NotifyContext(ctx, SIGINT, SIGTERM)`; on signal,
      drain in the order described in `README.md` §16.
4. `backend/internal/config/config.go` — env-loader following the
   pattern used by surviving services. Cover every variable listed in
   `README.md` §4. Provide `DefaultConfig()` and `Validate()`.
5. `backend/internal/telemetry/runtime.go` — port the existing service
   pattern verbatim: configurable OTLP gRPC/HTTP exporter, optional
   stdout exporter, Prometheus pull endpoint when configured. Expose
   `TraceFieldsFromContext(ctx) []zap.Field`.
6. `backend/internal/server/server.go` — gin engine, three empty route
   groups, request id middleware, panic recovery middleware, otel
   middleware. Probe handlers in `server/probes.go`.
7. `backend/internal/postgres/pool.go` — pgx pool factory using the
   shared `galaxy/postgres` helper.
8. `backend/internal/postgres/migrations/00001_init.sql` — placeholder file
   containing the `-- +goose Up` and `-- +goose Down` markers and a
   single `CREATE SCHEMA IF NOT EXISTS backend;` statement so the
   migration is non-empty and can be verified.
9. `backend/internal/postgres/migrations/embed.go` — `embed.FS` and
   exported `Migrations() fs.FS` helper.
10. `backend/internal/push/server.go` — gRPC server skeleton bound to
    `cfg.GRPCPushListenAddr`. No service registered yet.
11. `backend/Makefile` — at minimum a `jet` target stub that prints
    "not generated yet"; will be filled in Stage 4.

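
The env-loader in action 4 can be pictured with a minimal sketch. It assumes only two settings; `HTTPListenAddr` and both variable names (`BACKEND_HTTP_LISTEN_ADDR`, `BACKEND_GRPC_PUSH_LISTEN_ADDR`) are hypothetical stand-ins for the list in `README.md` §4, and `LoadFrom(os.Getenv)` plays the role of `config.LoadFromEnv()`.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// Config holds the two listener addresses used in this sketch.
// HTTPListenAddr and both environment variable names are hypothetical.
type Config struct {
	HTTPListenAddr     string
	GRPCPushListenAddr string
}

// DefaultConfig returns values suitable for local development.
func DefaultConfig() Config {
	return Config{HTTPListenAddr: ":8080", GRPCPushListenAddr: ":9090"}
}

// LoadFrom overlays non-empty values returned by lookup onto the defaults.
// Passing os.Getenv reads the real process environment.
func LoadFrom(lookup func(string) string) Config {
	cfg := DefaultConfig()
	if v := lookup("BACKEND_HTTP_LISTEN_ADDR"); v != "" {
		cfg.HTTPListenAddr = v
	}
	if v := lookup("BACKEND_GRPC_PUSH_LISTEN_ADDR"); v != "" {
		cfg.GRPCPushListenAddr = v
	}
	return cfg
}

// Validate rejects addresses that are not host:port and listeners that collide.
func (c Config) Validate() error {
	addrs := map[string]string{"http": c.HTTPListenAddr, "grpc push": c.GRPCPushListenAddr}
	for name, addr := range addrs {
		if _, _, err := net.SplitHostPort(addr); err != nil {
			return fmt.Errorf("%s listen address %q: %w", name, addr, err)
		}
	}
	if c.HTTPListenAddr == c.GRPCPushListenAddr {
		return errors.New("http and grpc push listeners must use different addresses")
	}
	return nil
}

func main() {
	cfg := LoadFrom(os.Getenv)
	if err := cfg.Validate(); err != nil {
		fmt.Fprintln(os.Stderr, "invalid config:", err)
		os.Exit(1)
	}
	fmt.Printf("http=%s grpc=%s\n", cfg.HTTPListenAddr, cfg.GRPCPushListenAddr)
}
```

Taking the lookup function as a parameter keeps the loader testable without touching the process environment.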
Critical files:

- `backend/go.mod`, `go.work`
- `backend/cmd/backend/main.go`
- `backend/internal/config/config.go`
- `backend/internal/telemetry/runtime.go`
- `backend/internal/server/server.go`, `backend/internal/server/probes.go`
- `backend/internal/postgres/pool.go`,
  `backend/internal/postgres/migrations/00001_init.sql`,
  `backend/internal/postgres/migrations/embed.go`
- `backend/internal/push/server.go`
- `backend/Makefile`

Done criteria:

- `go build ./backend/...` is clean.
- `go run ./backend/cmd/backend` starts, applies the placeholder
  migration, opens HTTP and gRPC listeners, and serves `/healthz` 200
  and `/readyz` 200.
- Telemetry output (stdout exporter) shows trace and metric activity on
  a probe hit.

## ~~Stage 3~~ — API contract & routing

This stage was implemented and marked as done.

Goal: define the entire backend REST contract in `openapi.yaml` and
register every handler as a placeholder that returns
`501 Not Implemented`. Wire the middleware stack for each route group.
The contract test suite must validate every endpoint round-trip against
the OpenAPI document and pass on the placeholders.

Actions:

1. Author `backend/openapi.yaml` — single document with three tags
   (`Public`, `User`, `Admin`) and the endpoint set below. Reuse
   schemas from `pkg/model` where possible; keep the rest under
   `components/schemas/*`.
2. Implement middleware in `backend/internal/server/middleware/`:
   - `requestid` — assigns and propagates a request id (Stage 2 may
     have already done this; consolidate here).
   - `logging` — emits an access log entry with trace fields.
   - `metrics` — counters and histograms per route group.
   - `panicrecovery` — converts panics to 500 with structured logging.
   - `userid` — required on `/api/v1/user/*`. Reads `X-User-ID`,
     parses as UUID, places it in the request context. Rejects with
     400 if missing or malformed. Backend trusts the value (see
     architecture trust note).
   - `basicauth` — required on `/api/v1/admin/*`. Stage 3 uses a stub
     verifier that accepts any non-empty username and a fixed password
     read from a test-only env var so contract tests can pass; Stage
     5.3 replaces the verifier with the real Postgres-backed one.
3. Implement handlers per endpoint in
   `backend/internal/server/handlers_<group>_<topic>.go`. Every handler
   returns `501 Not Implemented` with the standard error body
   `{"error":{"code":"not_implemented","message":"..."}}`.
4. Implement the contract test:
   `backend/internal/server/contract_test.go`. Loads
   `backend/openapi.yaml` via `kin-openapi`, builds the gin engine,
   walks every operation, sends a representative request, and
   validates both the request and response against the OpenAPI
   document.
5. Document `openapi.yaml` location and contract test pattern in
   `backend/docs/api-contract.md` (a brief decision record).

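
The `userid` middleware and the placeholder handler from actions 2 and 3 can be sketched together. This is an illustration under stated assumptions, not the shipped code: it uses the standard library's `net/http` instead of gin, a hand-rolled check for the canonical 8-4-4-4-12 UUID form instead of a UUID library, and a hypothetical `invalid_user_id` error code. Only the `not_implemented` body shape comes from the plan.

```go
package main

import (
	"context"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// userIDKey is the context key under which the parsed user id is stored.
type userIDKey struct{}

// writeError emits the standard error body described in the plan.
func writeError(w http.ResponseWriter, status int, code, message string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(map[string]any{
		"error": map[string]string{"code": code, "message": message},
	})
}

// isCanonicalUUID accepts only the 8-4-4-4-12 hexadecimal form.
func isCanonicalUUID(s string) bool {
	if len(s) != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-' {
		return false
	}
	compact := s[0:8] + s[9:13] + s[14:18] + s[19:23] + s[24:36]
	_, err := hex.DecodeString(compact)
	return err == nil
}

// requireUserID rejects requests without a well-formed X-User-ID and
// stores the value in the request context for downstream handlers.
func requireUserID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-User-ID")
		if !isCanonicalUUID(id) {
			// "invalid_user_id" is a hypothetical code.
			writeError(w, http.StatusBadRequest, "invalid_user_id", "X-User-ID must be a UUID")
			return
		}
		ctx := context.WithValue(r.Context(), userIDKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// notImplemented is the Stage 3 placeholder handler.
func notImplemented(w http.ResponseWriter, _ *http.Request) {
	writeError(w, http.StatusNotImplemented, "not_implemented", "endpoint is not implemented yet")
}

// statusFor sends one in-memory request and returns the response status.
func statusFor(userID string) int {
	req := httptest.NewRequest(http.MethodGet, "/api/v1/user/account", nil)
	if userID != "" {
		req.Header.Set("X-User-ID", userID)
	}
	rec := httptest.NewRecorder()
	requireUserID(http.HandlerFunc(notImplemented)).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(""))                                     // 400
	fmt.Println(statusFor("3f0e2c1a-9b7d-4e5f-8a6b-0c1d2e3f4a5b")) // 501
}
```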
### Endpoint inventory

Public (`/api/v1/public/*`):

- `POST /auth/send-email-code` — request body `{email, locale?}`;
  response `{challenge_id}`.
- `POST /auth/confirm-email-code` — request body
  `{challenge_id, code, client_public_key, time_zone}`; response
  `{device_session_id}`.

Probes (root):

- `GET /healthz` — `200` always when the process is alive.
- `GET /readyz` — `200` once Postgres reachable, migrations applied,
  gRPC listener bound; `503` otherwise.

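
The `/readyz` rule above (all three preconditions, otherwise `503`) can be sketched with three atomic flags. The type and field names are hypothetical, and the sketch uses `net/http` instead of gin; the boot sequence would flip each flag as the corresponding step completes.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readiness tracks the three preconditions named for /readyz.
type readiness struct {
	postgresReachable atomic.Bool
	migrationsApplied atomic.Bool
	grpcBound         atomic.Bool
}

// ready reports whether every precondition currently holds.
func (r *readiness) ready() bool {
	return r.postgresReachable.Load() && r.migrationsApplied.Load() && r.grpcBound.Load()
}

// handler answers 200 only when every precondition holds, 503 otherwise.
func (r *readiness) handler(w http.ResponseWriter, _ *http.Request) {
	if !r.ready() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe issues one in-memory GET /readyz and returns the status code.
func probe(r *readiness) int {
	rec := httptest.NewRecorder()
	r.handler(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	var r readiness
	fmt.Println(probe(&r)) // 503: nothing is ready yet

	r.postgresReachable.Store(true)
	r.migrationsApplied.Store(true)
	r.grpcBound.Store(true)
	fmt.Println(probe(&r)) // 200: all preconditions hold
}
```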
User (`/api/v1/user/*`, all require `X-User-ID`):

- `GET /account` — current account view (profile + settings +
  entitlements).
- `PATCH /account/profile` — update mutable profile fields
  (`display_name`).
- `PATCH /account/settings` — update `preferred_language`, `time_zone`.
- `POST /account/delete` — soft delete; the cascade runs in-process.

- `GET /lobby/games` — public list with paging.
- `POST /lobby/games` — create.
- `GET /lobby/games/{game_id}`.
- `PATCH /lobby/games/{game_id}`.
- `POST /lobby/games/{game_id}/open-enrollment`.
- `POST /lobby/games/{game_id}/ready-to-start`.
- `POST /lobby/games/{game_id}/start`.
- `POST /lobby/games/{game_id}/pause`.
- `POST /lobby/games/{game_id}/resume`.
- `POST /lobby/games/{game_id}/cancel`.
- `POST /lobby/games/{game_id}/retry-start`.
- `POST /lobby/games/{game_id}/applications`.
- `POST /lobby/games/{game_id}/applications/{application_id}/approve`.
- `POST /lobby/games/{game_id}/applications/{application_id}/reject`.
- `POST /lobby/games/{game_id}/invites`.
- `POST /lobby/games/{game_id}/invites/{invite_id}/redeem`.
- `POST /lobby/games/{game_id}/invites/{invite_id}/decline`.
- `POST /lobby/games/{game_id}/invites/{invite_id}/revoke`.
- `GET /lobby/games/{game_id}/memberships`.
- `POST /lobby/games/{game_id}/memberships/{membership_id}/remove`.
- `POST /lobby/games/{game_id}/memberships/{membership_id}/block`.

- `GET /lobby/my/games`.
- `GET /lobby/my/applications`.
- `GET /lobby/my/invites`.
- `GET /lobby/my/race-names`.

- `POST /lobby/race-names/register` — promote a `pending_registration`
  to `registered` within the 30-day window.

- `POST /games/{game_id}/commands` — proxy to engine command path.
- `POST /games/{game_id}/orders` — proxy to engine order validation.
- `GET /games/{game_id}/reports/{turn}` — proxy to engine report path.

Admin (`/api/v1/admin/*`, all require Basic Auth):

- `GET /admin-accounts`, `POST /admin-accounts`,
  `GET /admin-accounts/{username}`,
  `POST /admin-accounts/{username}/disable`,
  `POST /admin-accounts/{username}/enable`,
  `POST /admin-accounts/{username}/reset-password`.

- `GET /users`, `GET /users/{user_id}`,
  `POST /users/{user_id}/sanctions`,
  `POST /users/{user_id}/limits`,
  `POST /users/{user_id}/entitlements`,
  `POST /users/{user_id}/soft-delete`.

- `GET /games`, `GET /games/{game_id}`,
  `POST /games/{game_id}/force-start`,
  `POST /games/{game_id}/force-stop`,
  `POST /games/{game_id}/ban-member`.

- `GET /runtimes/{game_id}`,
  `POST /runtimes/{game_id}/restart`,
  `POST /runtimes/{game_id}/patch`,
  `POST /runtimes/{game_id}/force-next-turn`,
  `GET /engine-versions`, `POST /engine-versions`,
  `PATCH /engine-versions/{id}`,
  `POST /engine-versions/{id}/disable`.

- `GET /mail/deliveries`,
  `GET /mail/deliveries/{delivery_id}`,
  `GET /mail/deliveries/{delivery_id}/attempts`,
  `POST /mail/deliveries/{delivery_id}/resend`,
  `GET /mail/dead-letters`.

- `GET /notifications`, `GET /notifications/{notification_id}`,
  `GET /notifications/dead-letters`,
  `GET /notifications/malformed`.

- `GET /geo/users/{user_id}/countries` — counter listing.

Internal (gateway-only, `/api/v1/internal/*`):

- `GET /sessions/{device_session_id}` — gateway session lookup.
- `POST /sessions/{device_session_id}/revoke` — admin or self revoke
  passthrough; backend emits `session_invalidation`.
- `POST /sessions/users/{user_id}/revoke-all`.
- `GET /users/{user_id}/account-internal` — server-to-server fetch
  used by gateway flows that need account state alongside the session.

The internal group is on `/api/v1/internal/*`. The trust model treats
it as part of the user surface (no extra auth in MVP).

Critical files:

- `backend/openapi.yaml`
- `backend/internal/server/router.go`
- `backend/internal/server/middleware/{requestid,logging,metrics,panicrecovery,userid,basicauth}.go`
- `backend/internal/server/handlers_*.go`
- `backend/internal/server/contract_test.go`
- `backend/docs/api-contract.md`

Done criteria:

- `go test ./backend/internal/server/...` is green; the contract test
  exercises every endpoint and validates against `openapi.yaml`.
- Every endpoint returns `501 Not Implemented` with the standard error
  body.
- gin route table at startup matches the OpenAPI inventory exactly.

## ~~Stage 4~~ — Persistence layer

This stage was implemented and marked as done.

Goal: define every `backend` schema table, generate jet code, and make
the wiring of the persistence layer ready for the domain modules.

Actions:

1. Replace `backend/internal/postgres/migrations/00001_init.sql` with
   the full DDL. The schema is `backend`. The expected tables and
   their primary purposes:

   Auth:
   - `device_sessions(device_session_id uuid pk, user_id uuid not null,
     client_public_key bytea not null, status text not null,
     created_at, revoked_at, last_seen_at)` plus indexes on
     `user_id` and `status`.
   - `auth_challenges(challenge_id uuid pk, email text not null,
     code_hash bytea not null, created_at, expires_at, consumed_at,
     attempts int not null default 0)`. Index on `email`.
   - `blocked_emails(email text pk, blocked_at, reason text)`.

   User:
   - `accounts(user_id uuid pk, email text unique not null,
     user_name text unique not null, display_name text not null,
     preferred_language text not null, time_zone text not null,
     declared_country text, permanent_block bool not null default false,
     created_at, updated_at, deleted_at)`.
   - `entitlement_records(record_id uuid pk, user_id uuid not null,
     tier text not null, source text not null, created_at)`.
   - `entitlement_snapshots(user_id uuid pk, tier text not null,
     max_registered_race_names int not null, taken_at timestamptz)`.
     Updated on every entitlement change.
   - `sanction_records`, `sanction_active`, `limit_records`,
     `limit_active` — same shape as the previous `user` service had
     (record + active rollup pattern).

   Admin:
   - `admin_accounts(username text pk, password_hash bytea not null,
     created_at, last_used_at, disabled_at)`.

   Lobby:
   - `games(game_id uuid pk, owner_user_id uuid not null,
     visibility text not null, status text not null, ...)` covering
     enrollment state machine fields documented in
     `ARCHITECTURE_deprecated.md` § Game Lobby.
   - `applications(application_id uuid pk, game_id uuid not null,
     applicant_user_id uuid not null, status text not null, ...)`.
   - `invites(invite_id uuid pk, game_id uuid not null,
     invited_user_id uuid, code text unique, status text, ...)`.
   - `memberships(membership_id uuid pk, game_id uuid not null,
     user_id uuid not null, race_name text not null, status text,
     ...)` plus `unique(game_id, user_id)`.
   - `race_names(name text not null, canonical text not null,
     status text not null, owner_user_id uuid, game_id uuid,
     expires_at, registered_at, ...)` plus
     `unique(canonical) where status in ('registered','reservation','pending_registration')`.

   Runtime:
   - `runtime_records(game_id uuid pk, current_container_id text,
     status text not null, image_ref text, started_at, last_observed_at,
     ...)`.
   - `engine_versions(version text pk, image_ref text not null,
     enabled bool not null default true, created_at, ...)`.
   - `player_mappings(game_id uuid not null, user_id uuid not null,
     race_name text not null, engine_player_uuid uuid not null,
     primary key(game_id, user_id))`.
   - `runtime_operation_log(operation_id uuid pk, game_id uuid,
     op text, status text, started_at, finished_at, error text)`.
   - `runtime_health_snapshots(snapshot_id uuid pk, game_id uuid,
     observed_at, payload jsonb)`.

   Mail:
   - `mail_deliveries(delivery_id uuid pk, template_id text not null,
     idempotency_key text not null, status text not null,
     attempts int not null default 0, next_attempt_at timestamptz,
     payload_id uuid not null, created_at, ...)` plus
     `unique(template_id, idempotency_key)`.
   - `mail_recipients(recipient_id uuid pk, delivery_id uuid not null,
     address text not null, kind text not null)`.
   - `mail_attempts(attempt_id uuid pk, delivery_id uuid, attempt_no int,
     started_at, finished_at, outcome text, error text)`.
   - `mail_dead_letters(dead_letter_id uuid pk, delivery_id uuid,
     archived_at, reason text)`.
   - `mail_payloads(payload_id uuid pk, content_type text not null,
     subject text, body bytea not null)`.

   Notification:
   - `notifications(notification_id uuid pk, kind text not null,
     idempotency_key text not null, user_id uuid, payload jsonb,
     created_at)` plus `unique(kind, idempotency_key)`.
   - `notification_routes(route_id uuid pk, notification_id uuid,
     channel text not null, status text not null, last_attempt_at,
     ...)`.
   - `notification_dead_letters(dead_letter_id uuid pk, notification_id
     uuid, archived_at, reason text)`.
   - `notification_malformed_intents(id uuid pk, received_at, payload
     jsonb, reason text)`.

   Geo:
   - `user_country_counters(user_id uuid not null, country text not null,
     count bigint not null default 0, last_seen_at timestamptz,
     primary key(user_id, country))`.

2. Add `created_at TIMESTAMPTZ DEFAULT now()` to every table; add
   `updated_at` and `deleted_at` where the domain reasons in
   `ARCHITECTURE_deprecated.md` apply. UTC normalisation is performed
   in Go on read and write (the existing `pkg/postgres` helpers cover
   this).

3. `backend/cmd/jetgen/main.go` — port the existing pattern from a
   surviving reference (the previous services' `cmd/jetgen` is a good
   template; adjust import paths to `galaxy/backend`). The tool spins
   up a transient Postgres container, applies the embedded migrations,
   and runs `jet -dsn=...` writing into `internal/postgres/jet/`.

4. `backend/Makefile` — fill in the `jet` target.

5. Run `make jet` and commit `internal/postgres/jet/`.

6. Add `backend/internal/postgres/jet/jet.go` — package doc and
   `//go:generate` comment pointing to `cmd/jetgen`.

7. Sanity test in `backend/internal/postgres/migrations_test.go`:
   spin up a Postgres testcontainer, apply migrations, assert that
   the `backend` schema exists and that every expected table is
   present.

Critical files:

- `backend/internal/postgres/migrations/00001_init.sql`
- `backend/internal/postgres/jet/**`
- `backend/cmd/jetgen/main.go`
- `backend/Makefile`
- `backend/internal/postgres/migrations_test.go`

Done criteria:

- `go test ./backend/internal/postgres/...` is green.
- `make jet` regenerates without diff.
- All tables listed above exist after a fresh migration.

## ~~Stage 5~~ — Domain implementation

Goal: implement domain modules in dependency order. After each substage
the backend is functional for the substage's slice of behaviour. The
contract tests from Stage 3 progressively flip from `501` to actual
responses as each substage replaces placeholders.

Substages run strictly in order. Each substage:

- Implements package code in `backend/internal/<domain>/`.
- Replaces the corresponding `501` handler bodies in
  `backend/internal/server/handlers_*.go` with real logic that calls
  the domain package.
- Adds focused unit and contract coverage for the substage's
  endpoints.
- Wires the new package into `backend/cmd/backend/main.go`.

### ~~5.1~~ — auth

This substage was implemented and marked as done. See
[`docs/stage05_1-auth.md`](docs/stage05_1-auth.md) for the decisions
taken during implementation.

Behaviour:

- `POST /api/v1/public/auth/send-email-code` — generates a challenge,
  hashes the code, persists in `auth_challenges`, calls
  `mail.EnqueueLoginCode(email, code)`. Returns `{challenge_id}` for
  every non-blocked email (existing user, new user, throttled — all
  return identical shape; blocked email rejects with 400 only when the
  block is permanent).
- `POST /api/v1/public/auth/confirm-email-code` — looks up the
  challenge, verifies the code (constant-time), enforces attempt
  ceiling, marks consumed, calls `user.EnsureByEmail(email,
  preferred_language, time_zone)` to obtain the user_id, stores the
  Ed25519 public key, creates a `device_session` row, populates the
  in-memory cache, calls
  `geo.SetDeclaredCountryAtRegistration(user_id, source_ip)`, and
  returns `{device_session_id}`.
- `GET /api/v1/internal/sessions/{device_session_id}` — sync session
  lookup for gateway.
- `POST /api/v1/internal/sessions/{device_session_id}/revoke` and
  `POST /api/v1/internal/sessions/users/{user_id}/revoke-all` — mark
  sessions revoked, evict from in-memory cache, emit
  `session_invalidation` push event (Stage 6 wires the actual
  emission; until then `auth` calls a no-op publisher injected at
  wiring).

Cache: full session table read at startup; write-through on every
mutation.

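
The confirmation step (constant-time verification, attempt ceiling, consume on success) can be sketched in memory as follows. Several details are assumptions made for the example: plain SHA-256 stands in for whatever hash `code_hash` really holds, `maxAttempts` and `challengeTTL` are hypothetical values, and the struct only mirrors the `auth_challenges` columns it needs.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"errors"
	"fmt"
	"time"
)

// maxAttempts and challengeTTL are hypothetical values.
const (
	maxAttempts  = 5
	challengeTTL = 10 * time.Minute
)

var (
	errUnavailable = errors.New("challenge expired or already consumed")
	errTooMany     = errors.New("attempt ceiling reached")
	errMismatch    = errors.New("code does not match")
)

// challenge mirrors the auth_challenges columns used on confirmation.
type challenge struct {
	codeHash   []byte
	expiresAt  time.Time
	consumedAt *time.Time
	attempts   int
}

// hashCode uses plain SHA-256 as a stand-in; the plan does not name the hash.
func hashCode(code string) []byte {
	sum := sha256.Sum256([]byte(code))
	return sum[:]
}

// newChallenge builds a challenge that expires after challengeTTL.
func newChallenge(code string) *challenge {
	return &challenge{codeHash: hashCode(code), expiresAt: time.Now().Add(challengeTTL)}
}

// confirm counts the attempt, compares hashes in constant time, and
// marks the challenge consumed on success.
func (c *challenge) confirm(code string) error {
	now := time.Now()
	if c.consumedAt != nil || !now.Before(c.expiresAt) {
		return errUnavailable
	}
	if c.attempts >= maxAttempts {
		return errTooMany
	}
	c.attempts++
	if subtle.ConstantTimeCompare(hashCode(code), c.codeHash) != 1 {
		return errMismatch
	}
	c.consumedAt = &now
	return nil
}

func main() {
	c := newChallenge("123456")
	fmt.Println(c.confirm("000000")) // wrong code, one attempt used
	fmt.Println(c.confirm("123456")) // succeeds and consumes the challenge
	fmt.Println(c.confirm("123456")) // a consumed challenge cannot be reused
}
```

Counting the attempt before comparing means a wrong guess always costs one attempt, so the ceiling bounds brute-force tries per challenge.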
### ~~5.2~~ — user

This substage was implemented and marked as done. See
[`docs/stage05_2-user.md`](docs/stage05_2-user.md) for the decisions
taken during implementation.

Behaviour:

- Account CRUD limited to allowed mutations on profile and settings.
- `EnsureByEmail` and `ResolveByEmail` for `auth`.
- Entitlement records and snapshots; tier downgrades never revoke
  already-registered race names.
- Sanctions and limits using the record + active rollup pattern.
- Soft delete: writes `deleted_at` and triggers in-process cascade —
  `lobby.OnUserDeleted(user_id)`, `notification.OnUserDeleted(user_id)`,
  `geo.OnUserDeleted(user_id)`. Permanent block triggers
  `lobby.OnUserBlocked(user_id)`.
- Cache: latest entitlement snapshot per user; warmed on startup;
  write-through on entitlement mutation.

### ~~5.3~~ — admin

This substage was implemented and marked as done. See
[`docs/stage05_3-admin.md`](docs/stage05_3-admin.md) for the decisions
taken during implementation.

Behaviour:

- `admin_accounts` CRUD with bcrypt hashing.
- Bootstrap on startup via env vars (`BACKEND_ADMIN_BOOTSTRAP_USER`,
  `BACKEND_ADMIN_BOOTSTRAP_PASSWORD`); idempotent.
- Replace the Stage 3 stub `basicauth` middleware with the real
  Postgres-backed verifier. Constant-time comparison via bcrypt.
- Admin CRUD endpoints across users, games, runtime, mail,
  notification, geo. Each admin endpoint delegates to the domain
  package's admin-facing methods.

Cache: full admin table at startup; write-through on mutation.

### ~~5.4~~ — lobby

This substage was implemented and marked as done. See
[`docs/stage05_4-lobby.md`](docs/stage05_4-lobby.md) for the decisions
taken during implementation.

Behaviour:

- Games CRUD with the enrollment state machine.
- Applications and invites with their lifecycles.
- Memberships with race name binding.
- Race Name Directory: registered, reservation, and
  pending_registration tiers; canonical key via `disciplinedware/go-confusables`;
  uniqueness across all three tiers; capability promotion based on
  `max_planets > initial AND max_population > initial` from the
  runtime snapshot.
- Pending-registration sweeper: scheduled job, releases entries past
  the 30-day window; uses `pkg/cronutil`. The same sweeper auto-closes
  enrollment-expired games whose `approved_count >= min_players`.
- Hooks consumed from other modules:
  - `OnUserBlocked(user_id)` — release all RND/applications/invites/
    memberships in one transaction.
  - `OnUserDeleted(user_id)` — same.
  - `OnRuntimeSnapshot(snapshot)` — update denormalised runtime view
    on the game (current_turn, status, per-member max stats).
  - `OnGameFinished(game_id)` — drive race name promotion logic and
    move game to `finished`.

Cache: active games and memberships, RND canonical set; warmed on
startup; write-through on mutation.

### ~~5.5~~ — runtime (with dockerclient and engineclient)

This substage was implemented and marked as done. See
[`docs/stage05_5-runtime.md`](docs/stage05_5-runtime.md) for the
decisions taken during implementation.

Behaviour:

- Engine version registry CRUD.
- `engineclient` is a thin `net/http` client over `pkg/model` types,
  one method per engine endpoint listed in `README.md` §8.
- `dockerclient` wraps `github.com/docker/docker` for: pull, create,
  start, stop, remove, inspect, list (filtered by the
  `galaxy.backend=1` label), patch (semver-only, validated against
  `engine_versions`).
- Per-game serialisation: a `sync.Map` keyed by `game_id` and holding
  one `*sync.Mutex` per game ensures that concurrent ops on the same
  game run sequentially.
- Worker pool for long-running operations: started in Stage 5.5; jobs
  enqueued on a buffered channel; bounded concurrency.
- `runtime_operation_log` records every op (start time, finish time,
  outcome, error).
- Reconciliation: on startup and on a `pkg/cronutil` schedule, list
  containers labelled `galaxy.backend=1`, match against
  `runtime_records`, adopt unrecorded labelled containers, mark
  recorded but missing as removed. Emit
  `lobby.OnRuntimeJobResult` for each removed.
- Snapshot publication: after every successful engine read or a
  health-probe transition, synthesise a snapshot and call
  `lobby.OnRuntimeSnapshot(snapshot)` synchronously.
- Turn scheduler: `pkg/cronutil` schedule per running game; each tick
  invokes the engine `admin/turn`, on success snapshots and publishes;
  force-next-turn sets a one-shot skip flag stored in
  `runtime_records`.

Cache: active runtime records, engine version registry; warmed on
startup; write-through on mutation.

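
The per-game serialisation rule can be sketched with `sync.Map.LoadOrStore`. The type and function names are hypothetical, and the sketch never removes a mutex once created, which a long-lived process may want to revisit.

```go
package main

import (
	"fmt"
	"sync"
)

// gameLocks hands out one mutex per game id.
type gameLocks struct {
	m sync.Map // game id (string) -> *sync.Mutex
}

// withGame runs fn while holding the mutex that belongs to gameID, so
// operations on the same game never overlap while other games proceed.
func (g *gameLocks) withGame(gameID string, fn func()) {
	v, _ := g.m.LoadOrStore(gameID, &sync.Mutex{})
	mu := v.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()
	fn()
}

// runConcurrently starts n goroutines that all touch the same game and
// returns the counter they incremented under the per-game lock.
func runConcurrently(n int) int {
	var locks gameLocks
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			locks.withGame("game-1", func() { counter++ })
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(runConcurrently(100)) // 100: no increment was lost
}
```

`LoadOrStore` guarantees that racing callers for a new game id all end up with the same mutex, which is what makes the lazily created lock safe.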
### ~~5.6~~ — mail

This substage was implemented and marked as done. See
[`docs/stage05_6-mail.md`](docs/stage05_6-mail.md) for the decisions
taken during implementation.

Behaviour:

- Outbox tables defined in Stage 4.
- Worker goroutine: scans `mail_deliveries` with
  `SELECT ... FOR UPDATE SKIP LOCKED` ordered by `next_attempt_at`,
  attempts SMTP delivery via `wneessen/go-mail`, records in
  `mail_attempts`, updates status, schedules backoff with jitter, or
  dead-letters past the configured maximum attempts.
- Drain on startup: replays all `pending` and `retrying` rows.
- Public API for producers: `EnqueueLoginCode(email, code, ttl)`,
  `EnqueueTemplate(template_id, recipient, payload, idempotency_key)`.
- Admin endpoints implemented: list, view, resend.

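
The "backoff with jitter, or dead-letter" decision of the mail worker can be sketched as a pure function. The three limits below are hypothetical (the plan only says they are configured), and the jitter scheme shown, a delay drawn uniformly from the upper half of an exponentially growing ceiling, is one common choice rather than the one the service necessarily uses.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// All three limits are hypothetical; the plan only says they are configured.
const (
	baseDelay   = 30 * time.Second
	maxDelay    = time.Hour
	maxAttempts = 8
)

// nextAttempt is called after a failed delivery. attempts counts the tries
// made so far, including the one that just failed. It returns the delay
// before the next try, or deadLetter=true once the ceiling is reached.
func nextAttempt(attempts int) (delay time.Duration, deadLetter bool) {
	if attempts >= maxAttempts {
		return 0, true
	}
	// Double the ceiling for every earlier attempt, capped at maxDelay.
	ceiling := baseDelay
	for i := 1; i < attempts && ceiling < maxDelay; i++ {
		ceiling *= 2
	}
	if ceiling > maxDelay {
		ceiling = maxDelay
	}
	// Pick a delay in [ceiling/2, ceiling] so retries spread out.
	half := ceiling / 2
	return half + time.Duration(rand.Int63n(int64(half)+1)), false
}

func main() {
	for attempts := 1; attempts <= maxAttempts; attempts++ {
		delay, dead := nextAttempt(attempts)
		fmt.Printf("attempt %d failed: retry in %v, dead-letter=%v\n", attempts, delay, dead)
	}
}
```

In the worker, the returned delay would be added to the current time and written to `next_attempt_at`.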
### ~~5.7~~ — notification

This substage was implemented and marked as done. See
[`docs/stage05_7-notification.md`](docs/stage05_7-notification.md) for
the decisions taken during implementation.

Behaviour:

- `Submit(intent)` — validate intent shape, enforce idempotency,
  persist `notifications`, materialise `notification_routes`, fan out
  to push (Stage 6 wires the actual push emission; until then a no-op
  publisher) and email (`mail.EnqueueTemplate`).
- Each kind has a fixed channel set documented in `README.md` §10.
- Malformed intents go to `notification_malformed_intents` and never
  block the producer.
- Dead-letter handling: a failed route past max attempts moves to
  `notification_dead_letters`.
- Producers (lobby, runtime, geo, auth) are wired via direct function
  calls.

### ~~5.8~~ — geo

This substage was implemented and marked as done. See
[`docs/stage05_8-geo.md`](docs/stage05_8-geo.md) for the decisions
taken during implementation.

Behaviour:

- Load GeoLite2 Country DB at startup from `BACKEND_GEOIP_DB_PATH`.
- `SetDeclaredCountryAtRegistration(user_id, ip)` — sync; lookup,
  update `accounts.declared_country`. No-op on lookup error.
- `IncrementCounterAsync(user_id, ip)` — fire-and-forget goroutine;
  upsert `user_country_counters` with `count = count + 1`,
  `last_seen_at = now()`.
- Middleware on `/api/v1/user/*` extracts the source IP from
  `X-Forwarded-For` (or `RemoteAddr`) and calls
  `IncrementCounterAsync` after the handler returns successfully.
- `OnUserDeleted(user_id)` — delete the user's counter rows.

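
The source-IP extraction used by the geo middleware can be sketched as below. Which `X-Forwarded-For` entry to trust is an assumption: the sketch takes the left-most parseable address, which is only sound when the gateway in front of the backend controls that header. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// sourceIP returns the left-most valid address from the X-Forwarded-For
// value, falling back to the host part of remoteAddr. It returns an
// empty string when neither yields a parseable IP.
func sourceIP(forwardedFor, remoteAddr string) string {
	for _, part := range strings.Split(forwardedFor, ",") {
		if ip := net.ParseIP(strings.TrimSpace(part)); ip != nil {
			return ip.String()
		}
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present; try the raw value
	}
	if ip := net.ParseIP(host); ip != nil {
		return ip.String()
	}
	return ""
}

func main() {
	fmt.Println(sourceIP("203.0.113.7, 10.0.0.1", "10.0.0.2:54321"))
	fmt.Println(sourceIP("", "[2001:db8::1]:443"))
}
```

An empty result is a natural signal for the middleware to skip the counter update, in line with the no-op-on-lookup-error rule above.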
Critical files (Stage 5 as a whole):

- `backend/internal/auth/**`
- `backend/internal/user/**`
- `backend/internal/admin/**`
- `backend/internal/lobby/**`
- `backend/internal/runtime/**`
- `backend/internal/dockerclient/**`
- `backend/internal/engineclient/**`
- `backend/internal/mail/**`
- `backend/internal/notification/**`
- `backend/internal/geo/**`
- `backend/internal/server/handlers_*.go` (replacing 501 stubs)
- `backend/cmd/backend/main.go` (wiring expansion)

Done criteria:

- All Stage 3 contract tests pass against real responses.
- Each substage adds focused unit tests (`testify`, mocks where
  external boundaries justify them).
- `go run ./backend/cmd/backend` boots, all caches warm, all workers
  start.

## ~~Stage 6~~ — Push gRPC interface and gateway adaptation

Goal: stand up the bidirectional control channel between backend and
gateway. Backend pushes `client_event` and `session_invalidation`;
gateway opens the stream, signs and forwards client events, immediately
acts on session invalidations. Remove every Redis dependency from
gateway except anti-replay reservations.

### ~~6.1~~ — Backend push server
|
|
|
|
This substage was implemented and marked as done. See
|
|
[`docs/stage06_1-push.md`](docs/stage06_1-push.md) for the decisions
|
|
taken during implementation.
|
|
|
|
Actions:
|
|
|
|
1. Author `backend/proto/push/v1/push.proto` with
|
|
`service Push { rpc SubscribePush(GatewaySubscribeRequest) returns
|
|
(stream PushEvent); }` and the message types defined in
|
|
`README.md` §7. Include a `cursor` field (string).
|
|
2. `backend/buf.yaml`, `backend/buf.gen.yaml` mirroring the gateway
|
|
pattern; generate Go bindings into `backend/proto/push/v1/`.
|
|
3. `backend/internal/push/server.go` — gRPC service implementation:
|
|
- Maintains a connection registry keyed by gateway client id (the
|
|
`GatewaySubscribeRequest` provides one; if multiple gateway
|
|
instances connect, each gets its own queue).
|
|
- Holds an in-memory ring buffer keyed by cursor, with TTL equal to
|
|
`BACKEND_FRESHNESS_WINDOW`. Cursors past TTL are discarded.
|
|
- Resume: if the client's cursor is still in the buffer, replay
|
|
from there; otherwise replay nothing and start fresh.
|
|
- Backpressure: per-connection buffered channel; on overflow, drop
|
|
the oldest events for that connection and log.
|
|
4. Provide a publisher API consumed by `auth`, `lobby`, `notification`,
|
|
and `runtime`:
|
|
- `push.PublishClientEvent(user_id, device_session_id?, payload, kind)`.
|
|
- `push.PublishSessionInvalidation(device_session_id|user_id, reason)`.
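
The cursor buffer and resume rule from action 3 can be illustrated with a minimal sketch. Everything named here (`Event`, `ReplayBuffer`, `Append`, `Since`, `demoResume`) is hypothetical and not taken from `backend/internal/push/server.go`. The sketch assumes that a cursor identifies the last event the gateway has seen, so "replay from there" means "replay everything after it", and it uses a slice bounded by age rather than a fixed-size ring. With the fixed timestamps in `demoResume`, the program prints `2 0`.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
	"time"
)

// Event is a hypothetical stand-in for the generated PushEvent message.
type Event struct {
	Cursor  string
	Payload string
	At      time.Time
}

// ReplayBuffer keeps recent events so a reconnecting gateway can resume.
// It is slice-backed and bounded by age only; a real ring would also cap size.
type ReplayBuffer struct {
	mu     sync.Mutex
	ttl    time.Duration // assumed to equal BACKEND_FRESHNESS_WINDOW
	seq    uint64
	events []Event // oldest first
}

func NewReplayBuffer(ttl time.Duration) *ReplayBuffer {
	return &ReplayBuffer{ttl: ttl}
}

// evictLocked discards events older than the TTL. The caller holds b.mu.
func (b *ReplayBuffer) evictLocked(now time.Time) {
	cutoff := now.Add(-b.ttl)
	i := 0
	for i < len(b.events) && b.events[i].At.Before(cutoff) {
		i++
	}
	b.events = b.events[i:]
}

// Append stores a payload and returns the cursor assigned to it.
func (b *ReplayBuffer) Append(payload string, now time.Time) string {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.evictLocked(now)
	b.seq++
	cursor := strconv.FormatUint(b.seq, 10)
	b.events = append(b.events, Event{Cursor: cursor, Payload: payload, At: now})
	return cursor
}

// Since returns the events published after cursor. An unknown or expired
// cursor yields nothing, which means the subscriber starts fresh.
func (b *ReplayBuffer) Since(cursor string, now time.Time) []Event {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.evictLocked(now)
	for i, e := range b.events {
		if e.Cursor == cursor {
			return append([]Event(nil), b.events[i+1:]...)
		}
	}
	return nil
}

// demoResume resumes once inside the TTL and once after it has elapsed.
func demoResume() (replayed, afterExpiry int) {
	t0 := time.Unix(0, 0)
	buf := NewReplayBuffer(30 * time.Second)
	first := buf.Append("a", t0)
	buf.Append("b", t0.Add(1*time.Second))
	buf.Append("c", t0.Add(2*time.Second))
	replayed = len(buf.Since(first, t0.Add(3*time.Second)))
	afterExpiry = len(buf.Since(first, t0.Add(time.Minute)))
	return replayed, afterExpiry
}

func main() {
	replayed, afterExpiry := demoResume()
	fmt.Println(replayed, afterExpiry)
}
```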
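
The backpressure rule from action 3 (drop the oldest events on overflow) can be sketched with a buffered channel and non-blocking sends. The type `connQueue` and its methods are hypothetical names, the sketch assumes a single producer per connection, and it counts drops where the real server would log them.

```go
package main

import "fmt"

// connQueue is a hypothetical per-connection outbound queue.
type connQueue struct {
	ch      chan string
	dropped int // the real server would log each drop instead
}

func newConnQueue(capacity int) *connQueue {
	if capacity < 1 {
		capacity = 1 // an unbuffered channel could never accept a send here
	}
	return &connQueue{ch: make(chan string, capacity)}
}

// Enqueue never blocks. When the channel is full it discards the oldest
// queued event and retries. It assumes one producer per queue.
func (q *connQueue) Enqueue(ev string) {
	for {
		select {
		case q.ch <- ev:
			return
		default:
		}
		select {
		case <-q.ch:
			q.dropped++
		default:
			// A consumer drained the channel in between; retry the send.
		}
	}
}

// demoOverflow pushes four events through a queue of capacity two.
func demoOverflow() (kept []string, dropped int) {
	q := newConnQueue(2)
	for _, ev := range []string{"e1", "e2", "e3", "e4"} {
		q.Enqueue(ev)
	}
	close(q.ch)
	for ev := range q.ch {
		kept = append(kept, ev)
	}
	return kept, q.dropped
}

func main() {
	kept, dropped := demoOverflow()
	fmt.Println(kept, dropped)
}
```

With capacity two and four events, the two newest events survive and two are counted as dropped, so the program prints `[e3 e4] 2`.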
### ~~6.2~~ — Gateway adaptation

This substage was implemented and marked as done. See
[`docs/stage06_2-gateway.md`](docs/stage06_2-gateway.md) for the
decisions taken during implementation.

Actions:

1. Remove `redisconn` usage for session projection and for the two
   stream consumers. Keep `redisconn` only for anti-replay
   reservations.
2. Remove `gateway/internal/config` env vars
   `GATEWAY_SESSION_EVENTS_REDIS_STREAM` and
   `GATEWAY_CLIENT_EVENTS_REDIS_STREAM`. Add
   `GATEWAY_BACKEND_HTTP_URL` and `GATEWAY_BACKEND_GRPC_PUSH_URL`.
3. Add `gateway/internal/backendclient/` with:
   - `RESTClient` — HTTP client for `/api/v1/internal/sessions/...` and
     for forwarding public/user requests.
   - `PushClient` — gRPC client to `SubscribePush` with reconnect
     loop, exponential backoff with jitter, and cursor persistence in
     process memory.
4. Replace gateway session validation with a sync REST call to
   backend per request.
5. Replace gateway client-events Redis consumer with the
   `SubscribePush` consumer. On `client_event`: sign envelope (Ed25519)
   and deliver to the matching client subscription. On
   `session_invalidation`: look up active subscriptions for the target
   sessions, close them, and reject any in-flight authenticated
   request bound to those sessions.
6. Anti-replay request_id reservations remain in Redis (unchanged).
7. Update gateway tests to use a mocked backend HTTP and gRPC server.

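
The `PushClient` reconnect loop from action 3 can be sketched as follows. This is an illustration under assumptions, not the code in `gateway/internal/backendclient/`: `subscribeFunc`, `runPushLoop`, `backoff`, and the demo helpers are hypothetical names, the gRPC stream is replaced by a plain function, and the jitter strategy ("full jitter", a uniform delay between zero and the capped exponential ceiling) is one common choice that the plan does not prescribe. The cursor lives only in a local variable, which matches "cursor persistence in process memory". With the fake subscriber in `demoReconnect`, the program prints `42 3 true`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// backoff returns a random delay in [0, min(limit, base*2^attempt)].
// base must be positive.
func backoff(attempt int, base, limit time.Duration, rnd *rand.Rand) time.Duration {
	ceiling := base
	for i := 0; i < attempt && ceiling < limit; i++ {
		ceiling *= 2
	}
	if ceiling > limit {
		ceiling = limit
	}
	return time.Duration(rnd.Int63n(int64(ceiling) + 1))
}

// subscribeFunc is a hypothetical stand-in for one SubscribePush stream. It
// blocks until the stream ends and returns the last cursor it observed, or
// the cursor it was given when no event arrived.
type subscribeFunc func(ctx context.Context, from string) (last string, err error)

// runPushLoop resubscribes until ctx is cancelled and returns the final cursor.
func runPushLoop(ctx context.Context, subscribe subscribeFunc, base, limit time.Duration, rnd *rand.Rand) string {
	cursor := ""
	attempt := 0
	for ctx.Err() == nil {
		last, err := subscribe(ctx, cursor)
		cursor = last
		if err == nil {
			attempt = 0 // clean end of stream: reconnect promptly
		}
		delay := backoff(attempt, base, limit, rnd)
		attempt++
		select {
		case <-ctx.Done():
		case <-time.After(delay):
		}
	}
	return cursor
}

// demoReconnect fails twice, then delivers cursor "42" and stops the loop.
func demoReconnect() (cursor string, calls int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	subscribe := func(_ context.Context, from string) (string, error) {
		calls++
		if calls < 3 {
			return from, errors.New("backend unavailable")
		}
		cancel()
		return "42", nil
	}
	rnd := rand.New(rand.NewSource(1))
	cursor = runPushLoop(ctx, subscribe, time.Millisecond, 4*time.Millisecond, rnd)
	return cursor, calls
}

// demoBackoffWithinCap checks that delays never exceed the limit.
func demoBackoffWithinCap() bool {
	rnd := rand.New(rand.NewSource(1))
	for attempt := 0; attempt < 20; attempt++ {
		d := backoff(attempt, 100*time.Millisecond, 5*time.Second, rnd)
		if d < 0 || d > 5*time.Second {
			return false
		}
	}
	return true
}

func main() {
	cursor, calls := demoReconnect()
	fmt.Println(cursor, calls, demoBackoffWithinCap())
}
```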
Critical files:

- `backend/proto/push/v1/push.proto`
- `backend/buf.yaml`, `backend/buf.gen.yaml`
- `backend/internal/push/server.go`,
  `backend/internal/push/publisher.go`
- `gateway/internal/backendclient/*.go`
- `gateway/internal/config/config.go` (env var changes)
- `gateway/internal/handlers/*.go` (route forwarding to backend)
- `gateway/internal/auth/*.go` (session lookup → REST)
- `gateway/internal/eventfanout/*.go` (replace Redis consumer with
  gRPC consumer; rename if helpful)

Done criteria:

- `go run ./backend/cmd/backend` and `go run ./gateway/cmd/gateway`
  cooperate end-to-end with no Redis stream usage.
- A revocation through the admin surface causes immediate stream
  closure on the affected client.
- Gateway anti-replay still rejects duplicates.
- gateway test suite green.

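
The "immediate stream closure" criterion can be pictured with a small registry that ties each client subscription to a cancellable context. This is a hypothetical sketch, not the gateway's `eventfanout` code: `subscriptionRegistry`, `Register`, and `Invalidate` are illustrative names, and the sketch assumes that stream handlers and in-flight requests watch the returned context.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// subscriptionRegistry is a hypothetical index of active subscriptions
// keyed by device session id.
type subscriptionRegistry struct {
	mu      sync.Mutex
	nextID  int
	cancels map[string]map[int]context.CancelFunc
}

func newSubscriptionRegistry() *subscriptionRegistry {
	return &subscriptionRegistry{cancels: make(map[string]map[int]context.CancelFunc)}
}

// Register returns a context that is cancelled when the session is
// invalidated, plus a release function for normal teardown.
func (r *subscriptionRegistry) Register(parent context.Context, sessionID string) (context.Context, func()) {
	ctx, cancel := context.WithCancel(parent)
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nextID++
	id := r.nextID
	if r.cancels[sessionID] == nil {
		r.cancels[sessionID] = make(map[int]context.CancelFunc)
	}
	r.cancels[sessionID][id] = cancel
	release := func() {
		r.mu.Lock()
		delete(r.cancels[sessionID], id) // no-op if already invalidated
		if len(r.cancels[sessionID]) == 0 {
			delete(r.cancels, sessionID)
		}
		r.mu.Unlock()
		cancel()
	}
	return ctx, release
}

// Invalidate cancels every subscription bound to the session and reports
// how many were closed.
func (r *subscriptionRegistry) Invalidate(sessionID string) int {
	r.mu.Lock()
	subs := r.cancels[sessionID]
	delete(r.cancels, sessionID)
	r.mu.Unlock()
	for _, cancel := range subs {
		cancel()
	}
	return len(subs)
}

// demoInvalidate revokes one session and leaves another untouched.
func demoInvalidate() (closed int, revokedClosed, otherClosed bool) {
	reg := newSubscriptionRegistry()
	a1, releaseA1 := reg.Register(context.Background(), "session-a")
	a2, releaseA2 := reg.Register(context.Background(), "session-a")
	b, releaseB := reg.Register(context.Background(), "session-b")
	defer releaseA1()
	defer releaseA2()
	defer releaseB()
	closed = reg.Invalidate("session-a")
	return closed, a1.Err() != nil && a2.Err() != nil, b.Err() != nil
}

func main() {
	closed, revokedClosed, otherClosed := demoInvalidate()
	fmt.Println(closed, revokedClosed, otherClosed)
}
```

The demo prints `2 true false`: both subscriptions of the revoked session are cancelled, and the unrelated session keeps running.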
## ~~Stage 7~~ — Integration testing

This stage was implemented and marked as done. See
[`docs/stage07-integration.md`](docs/stage07-integration.md) for the
decisions taken during implementation, including the testenv layout,
the signed-envelope gRPC client, and the per-scenario coverage notes.

Goal: end-to-end coverage of the platform with real binaries and real
infrastructure where practical.

Actions:

1. Recreate the top-level `integration/` module, registered in
   `go.work`. The module hosts black-box test suites that drive
   `gateway` from outside and verify behaviour at the public boundary
   (with `backend` and `game` running in containers).
2. Add testcontainers fixtures: Postgres, an SMTP capture server (for
   example `axllent/mailpit`), the `galaxy/game` engine image, the
   `galaxy/backend` image (built from this repo), and the
   `galaxy/gateway` image. The Docker daemon used by testcontainers
   is the same one backend will use to manage engines.
3. Add a synthetic GeoLite2 mmdb (use `pkg/geoip/test-data/`).
4. Cover scenarios:
   - Registration flow: send-email-code → confirm-email-code →
     `declared_country` populated from synthetic mmdb.
   - User account fetch: `X-User-ID` path returns the expected
     account; geo counter increments per request.
   - Lobby flow: create game → invite → application → ready-to-start
     → start (engine container starts, healthz green, status read) →
     command → force-next-turn → finish → race name promotion.
   - Mail flow: trigger an email-bound notification → SMTP capture
     receives it → admin resend works.
   - Notification flow: lobby invite triggers a push event reaching
     the test client's gateway subscription, plus an email captured
     by SMTP.
   - Admin flow: bootstrap admin authenticates; CRUD admin creates a
     second admin; second admin disables the first.
   - Soft delete flow: user soft-delete cascades; their RND entries,
     memberships, applications, invites, geo counters are released
     or removed.
   - Session revocation: admin revokes a session → push
     `session_invalidation` arrives at gateway → active subscription
     closes; subsequent requests with that `device_session_id`
     rejected by gateway.
   - Anti-replay: same `request_id` replayed within freshness window
     is rejected by gateway.
5. CI: run `go test ./integration/... -tags=integration` (or whichever
   flag the team prefers). Tests requiring real Docker run only when
   a Docker daemon is available; otherwise they skip with a clear
   message.

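
The behaviour that the anti-replay scenario asserts can be modelled in a few lines. The sketch below is an in-memory stand-in and not the gateway's Redis-backed implementation, whose commands this plan does not specify: `replayGuard`, `Reserve`, and the 30-second window are hypothetical, and expired entries are never pruned. With the timestamps in `demoReplay`, the program prints `true false true`: the first use is accepted, the replay inside the window is rejected, and the reservation no longer blocks the id once the window has passed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// replayGuard is a hypothetical in-memory model of request_id reservations.
type replayGuard struct {
	mu     sync.Mutex
	window time.Duration        // assumed freshness window
	seen   map[string]time.Time // request_id -> reservation expiry
}

func newReplayGuard(window time.Duration) *replayGuard {
	return &replayGuard{window: window, seen: make(map[string]time.Time)}
}

// Reserve reports whether the request is fresh. It returns false when the
// same request_id was already reserved and the reservation has not expired.
func (g *replayGuard) Reserve(requestID string, now time.Time) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if expiry, ok := g.seen[requestID]; ok && now.Before(expiry) {
		return false
	}
	g.seen[requestID] = now.Add(g.window)
	return true
}

// demoReplay sends the same request_id three times: first use, a replay
// inside the window, and a reuse after the window has elapsed.
func demoReplay() (first, replay, afterWindow bool) {
	t0 := time.Unix(0, 0)
	guard := newReplayGuard(30 * time.Second)
	first = guard.Reserve("req-1", t0)
	replay = guard.Reserve("req-1", t0.Add(1*time.Second))
	afterWindow = guard.Reserve("req-1", t0.Add(31*time.Second))
	return first, replay, afterWindow
}

func main() {
	first, replay, afterWindow := demoReplay()
	fmt.Println(first, replay, afterWindow)
}
```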
Critical files:

- `integration/go.mod`
- `integration/auth_flow_test.go`
- `integration/lobby_flow_test.go`
- `integration/mail_flow_test.go`
- `integration/notification_flow_test.go`
- `integration/admin_flow_test.go`
- `integration/soft_delete_test.go`
- `integration/session_revoke_test.go`
- `integration/anti_replay_test.go`
- `integration/testenv/*.go` (shared fixtures)

Done criteria:

- `go test ./integration/...` runs the full suite.
- All listed scenarios pass green on a developer machine with Docker
  available.
- Failures produce actionable diagnostics (logs from each component
  attached to the test report).

## Stage acceptance and decision records

After each stage, the implementing engineer writes a short decision
record under `backend/docs/stage<NN>-<topic>.md` capturing any
non-trivial choice made during implementation that is not obvious from
the code or from this plan. Records that contradict this plan must be
brought to the architecture conversation before merge — the plan and
the architecture document are the agreed contract.