feat: backend service
@@ -0,0 +1,868 @@
|
||||
# backend — Implementation Plan
|
||||
|
||||
This plan has already been implemented and is kept here for historical reasons.

It should NOT be treated as the source of truth for service functionality.
|
||||
|
||||
---
|
||||
|
||||
## Summary

This plan is the technical specification for implementing the
consolidated Galaxy `backend` service. It is read together with
`../ARCHITECTURE.md` (architecture and security model) and
`README.md` (module layout, configuration, operations).

After reading those two documents and this plan, an implementing
engineer should not need to ask architectural questions. Every stage is
self-contained inside its domain area; stages run in order; each stage
has explicit Critical files.

The plan does not invent new domain concepts. It catalogues the work
required to assemble what the architecture document already defines.
|
||||
|
||||
## ~~Stage 1~~ — Repository cleanup
|
||||
|
||||
This stage was implemented and marked as done.
|
||||
|
||||
Goal: remove every module whose responsibility moves into `backend`,
|
||||
and prepare the workspace for the new module.
|
||||
|
||||
Actions:
|
||||
|
||||
1. `git rm -r authsession/ lobby/ mail/ notification/ gamemaster/
|
||||
rtmanager/ geoprofile/ user/ integration/ pkg/redisconn/
|
||||
pkg/notificationintent/`.
|
||||
2. Edit `go.work`:
|
||||
- Remove `use` lines for the deleted modules.
|
||||
- Remove `replace` lines for `galaxy/redisconn` and
|
||||
`galaxy/notificationintent`.
|
||||
- Do not add `./backend` yet — the module is created in Stage 2.
|
||||
3. Confirm that surviving modules still build:
`go build ./gateway/... ./game/... ./client/... ./pkg/...`.
Any compile error here means a surviving module imported a
removed package and must be patched. The only realistic culprit is
`gateway`, which references `pkg/redisconn` and the deleted streams.
Patches there belong to Stage 6, not Stage 1. For Stage 1 it is
acceptable to leave `gateway` broken if and only if every failure
comes from an import of a removed package.
|
||||
4. Run `go vet ./pkg/...` and confirm no diagnostic.
|
||||
|
||||
Out of scope: any code change inside surviving modules. Stage 1 is
|
||||
purely deletion plus `go.work` edits.
|
||||
|
||||
Critical files:
|
||||
|
||||
- `go.work`
|
||||
- the deletion of `authsession/`, `lobby/`, `mail/`, `notification/`,
|
||||
`gamemaster/`, `rtmanager/`, `geoprofile/`, `user/`, `integration/`,
|
||||
`pkg/redisconn/`, `pkg/notificationintent/`.
|
||||
|
||||
Done criteria:
|
||||
|
||||
- `git status` shows only deletions plus the `go.work` edit.
|
||||
- `go build ./pkg/...` is clean.
|
||||
- `go vet ./pkg/...` is clean.
|
||||
|
||||
## ~~Stage 2~~ — Backend skeleton & shared infrastructure
|
||||
|
||||
This stage was implemented and marked as done.
|
||||
|
||||
Goal: stand up the new module with its boot path, configuration,
|
||||
telemetry, logger, HTTP listener, Postgres pool, and gRPC listener — all
|
||||
with empty handlers. After this stage `go run ./backend/cmd/backend`
|
||||
must boot to a state where probes return 200 and migrations run (with a
placeholder migration file).
|
||||
|
||||
Actions:
|
||||
|
||||
1. Create `backend/go.mod` with module path `galaxy/backend` and Go
|
||||
version matching `go.work`. Add direct dependencies:
|
||||
`github.com/gin-gonic/gin`, `github.com/jackc/pgx/v5`,
|
||||
`github.com/go-jet/jet/v2`, `github.com/pressly/goose/v3`,
|
||||
`go.uber.org/zap`, `go.opentelemetry.io/otel` and the OTLP
|
||||
trace/metric exporters used by other services, and the `galaxy/*`
|
||||
pkg modules (`postgres`, `model`, `geoip`, `cronutil`, `error`,
|
||||
`util`).
|
||||
2. Add `./backend` to `go.work` `use(...)`.
|
||||
3. `backend/cmd/backend/main.go` — boot order:
|
||||
1. Load `config.LoadFromEnv()`; `cfg.Validate()`.
|
||||
2. Initialise telemetry (`telemetry.NewProcess(cfg.Telemetry)`). Set
|
||||
global tracer and meter providers.
|
||||
3. Construct the zap logger; inject trace fields helper.
|
||||
4. Open Postgres pool. Apply embedded migrations with goose. Fail
|
||||
fast on any error.
|
||||
5. Construct module wiring (empty for now; populated in Stage 5).
|
||||
6. Start the HTTP server (gin engine with empty route groups, plus
|
||||
`/healthz` and `/readyz`).
|
||||
7. Start the gRPC push server (no streams accepted yet — Stage 6).
|
||||
8. Block on `signal.NotifyContext(ctx, SIGINT, SIGTERM)`; on signal,
|
||||
drain in the order described in `README.md` §16.
|
||||
4. `backend/internal/config/config.go` — env-loader following the
|
||||
pattern used by surviving services. Cover every variable listed in
|
||||
`README.md` §4. Provide `DefaultConfig()` and `Validate()`.
|
||||
5. `backend/internal/telemetry/runtime.go` — port the existing service
|
||||
pattern verbatim: configurable OTLP gRPC/HTTP exporter, optional
|
||||
stdout exporter, Prometheus pull endpoint when configured. Expose
|
||||
`TraceFieldsFromContext(ctx) []zap.Field`.
|
||||
6. `backend/internal/server/server.go` — gin engine, three empty route
|
||||
groups, request id middleware, panic recovery middleware, otel
|
||||
middleware. Probe handlers in `server/probes.go`.
|
||||
7. `backend/internal/postgres/pool.go` — pgx pool factory using the
|
||||
shared `galaxy/postgres` helper.
|
||||
8. `backend/internal/postgres/migrations/00001_init.sql` — placeholder
file containing the `-- +goose Up` and `-- +goose Down` markers and a
single `CREATE SCHEMA IF NOT EXISTS backend;` statement so the
migration is non-empty and can be verified.
|
||||
9. `backend/internal/postgres/migrations/embed.go` — `embed.FS` and
|
||||
exported `Migrations() fs.FS` helper.
|
||||
10. `backend/internal/push/server.go` — gRPC server skeleton bound to
|
||||
`cfg.GRPCPushListenAddr`. No service registered yet.
|
||||
11. `backend/Makefile` — at minimum a `jet` target stub that prints
|
||||
"not generated yet"; will be filled in Stage 4.
|
||||
|
||||
Critical files:
|
||||
|
||||
- `backend/go.mod`, `go.work`
|
||||
- `backend/cmd/backend/main.go`
|
||||
- `backend/internal/config/config.go`
|
||||
- `backend/internal/telemetry/runtime.go`
|
||||
- `backend/internal/server/server.go`, `backend/internal/server/probes.go`
|
||||
- `backend/internal/postgres/pool.go`,
|
||||
`backend/internal/postgres/migrations/00001_init.sql`,
|
||||
`backend/internal/postgres/migrations/embed.go`
|
||||
- `backend/internal/push/server.go`
|
||||
- `backend/Makefile`
|
||||
|
||||
Done criteria:
|
||||
|
||||
- `go build ./backend/...` is clean.
|
||||
- `go run ./backend/cmd/backend` starts, applies the placeholder
|
||||
migration, opens HTTP and gRPC listeners, and serves `/healthz` 200
|
||||
and `/readyz` 200.
|
||||
- Telemetry output (stdout exporter) shows trace and metric activity on
|
||||
a probe hit.
|
||||
|
||||
## ~~Stage 3~~ — API contract & routing
|
||||
|
||||
This stage was implemented and marked as done.
|
||||
|
||||
Goal: define the entire backend REST contract in `openapi.yaml` and
|
||||
register every handler as a placeholder that returns
|
||||
`501 Not Implemented`. Wire the middleware stack for each route group.
|
||||
The contract test suite must validate every endpoint round-trip against
|
||||
the OpenAPI document and pass on the placeholders.
|
||||
|
||||
Actions:
|
||||
|
||||
1. Author `backend/openapi.yaml` — single document with three tags
|
||||
(`Public`, `User`, `Admin`) and the endpoint set below. Reuse
|
||||
schemas from `pkg/model` where possible; keep the rest under
|
||||
`components/schemas/*`.
|
||||
2. Implement middleware in `backend/internal/server/middleware/`:
|
||||
- `requestid` — assigns and propagates a request id (Stage 2 may
|
||||
have already done this; consolidate here).
|
||||
- `logging` — emits an access log entry with trace fields.
|
||||
- `metrics` — counters and histograms per route group.
|
||||
- `panicrecovery` — converts panics to 500 with structured logging.
|
||||
- `userid` — required on `/api/v1/user/*`. Reads `X-User-ID`,
|
||||
parses as UUID, places it in the request context. Rejects with
|
||||
400 if missing or malformed. Backend trusts the value (see
|
||||
architecture trust note).
|
||||
- `basicauth` — required on `/api/v1/admin/*`. Stage 3 uses a stub
|
||||
verifier that accepts any non-empty username and a fixed password
|
||||
read from a test-only env var so contract tests can pass; Stage
|
||||
5.3 replaces the verifier with the real Postgres-backed one.
|
||||
3. Implement handlers per endpoint in
|
||||
`backend/internal/server/handlers_<group>_<topic>.go`. Every handler
|
||||
returns `501 Not Implemented` with the standard error body
|
||||
`{"error":{"code":"not_implemented","message":"..."}}`.
|
||||
4. Implement the contract test:
|
||||
`backend/internal/server/contract_test.go`. Loads
|
||||
`backend/openapi.yaml` via `kin-openapi`, builds the gin engine,
|
||||
walks every operation, sends a representative request, and
|
||||
validates both the request and response against the OpenAPI
|
||||
document.
|
||||
5. Document `openapi.yaml` location and contract test pattern in
|
||||
`backend/docs/api-contract.md` (a brief decision record).
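The `userid` middleware from action 2 can be sketched as follows. This is an illustration under stated assumptions: it uses plain `net/http` rather than gin so that it stays self-contained, validates the UUID with a regular expression instead of a UUID library, and answers with a plain-text 400 rather than the standard JSON error body. The names `requireUserID`, `userIDFrom`, and `call` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"regexp"
)

// ctxKey is an unexported type so other packages cannot collide with the key.
type ctxKey struct{}

// uuidPattern accepts the canonical 8-4-4-4-12 hexadecimal form only.
var uuidPattern = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// requireUserID rejects requests without a well-formed X-User-ID header and
// stores the accepted value in the request context.
func requireUserID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-User-ID")
		if !uuidPattern.MatchString(id) {
			http.Error(w, "missing or malformed X-User-ID", http.StatusBadRequest)
			return
		}
		ctx := context.WithValue(r.Context(), ctxKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// userIDFrom returns the user id placed in the context by requireUserID.
func userIDFrom(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(ctxKey{}).(string)
	return id, ok
}

// call sends one request through the middleware and reports the status code
// and the user id that the wrapped handler observed.
func call(header string) (int, string) {
	var seen string
	h := requireUserID(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen, _ = userIDFrom(r.Context())
		w.WriteHeader(http.StatusNoContent)
	}))
	req := httptest.NewRequest(http.MethodGet, "/api/v1/user/account", nil)
	if header != "" {
		req.Header.Set("X-User-ID", header)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, seen
}

func main() {
	fmt.Println(call(""))
	fmt.Println(call("not-a-uuid"))
	fmt.Println(call("123e4567-e89b-12d3-a456-426614174000"))
}
```

The middleware only checks the header's shape. Whether the value can be trusted is decided by the architecture's trust model, not by this code.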
|
||||
|
||||
### Endpoint inventory
|
||||
|
||||
Public (`/api/v1/public/*`):
|
||||
|
||||
- `POST /auth/send-email-code` — request body `{email, locale?}`;
|
||||
response `{challenge_id}`.
|
||||
- `POST /auth/confirm-email-code` — request body
|
||||
`{challenge_id, code, client_public_key, time_zone}`; response
|
||||
`{device_session_id}`.
|
||||
|
||||
Probes (root):
|
||||
|
||||
- `GET /healthz` — `200` always when the process is alive.
|
||||
- `GET /readyz` — `200` once Postgres reachable, migrations applied,
|
||||
gRPC listener bound; `503` otherwise.
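The difference between the two probes can be sketched as below. The three readiness flags mirror the conditions listed for `/readyz`; the type `readiness`, its field names, and the helper `probe` are assumptions made for illustration, and plain `net/http` stands in for gin.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readiness tracks the three conditions the plan lists for /readyz.
type readiness struct {
	postgresReachable atomic.Bool
	migrationsApplied atomic.Bool
	grpcListenerBound atomic.Bool
}

func (r *readiness) ready() bool {
	return r.postgresReachable.Load() &&
		r.migrationsApplied.Load() &&
		r.grpcListenerBound.Load()
}

// newProbeMux registers /healthz (always 200) and /readyz (200 or 503).
func newProbeMux(r *readiness) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if !r.ready() {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probe issues a GET against the handler and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	r := &readiness{}
	mux := newProbeMux(r)
	fmt.Println(probe(mux, "/healthz"), probe(mux, "/readyz"))

	r.postgresReachable.Store(true)
	r.migrationsApplied.Store(true)
	r.grpcListenerBound.Store(true)
	fmt.Println(probe(mux, "/readyz"))
}
```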
|
||||
|
||||
User (`/api/v1/user/*`, all require `X-User-ID`):
|
||||
|
||||
- `GET /account` — current account view (profile + settings +
|
||||
entitlements).
|
||||
- `PATCH /account/profile` — update mutable profile fields
|
||||
(`display_name`).
|
||||
- `PATCH /account/settings` — update `preferred_language`, `time_zone`.
|
||||
- `POST /account/delete` — soft delete; the cascade runs in-process.
|
||||
|
||||
- `GET /lobby/games` — public list with paging.
|
||||
- `POST /lobby/games` — create.
|
||||
- `GET /lobby/games/{game_id}`.
|
||||
- `PATCH /lobby/games/{game_id}`.
|
||||
- `POST /lobby/games/{game_id}/open-enrollment`.
|
||||
- `POST /lobby/games/{game_id}/ready-to-start`.
|
||||
- `POST /lobby/games/{game_id}/start`.
|
||||
- `POST /lobby/games/{game_id}/pause`.
|
||||
- `POST /lobby/games/{game_id}/resume`.
|
||||
- `POST /lobby/games/{game_id}/cancel`.
|
||||
- `POST /lobby/games/{game_id}/retry-start`.
|
||||
- `POST /lobby/games/{game_id}/applications`.
|
||||
- `POST /lobby/games/{game_id}/applications/{application_id}/approve`.
|
||||
- `POST /lobby/games/{game_id}/applications/{application_id}/reject`.
|
||||
- `POST /lobby/games/{game_id}/invites`.
|
||||
- `POST /lobby/games/{game_id}/invites/{invite_id}/redeem`.
|
||||
- `POST /lobby/games/{game_id}/invites/{invite_id}/decline`.
|
||||
- `POST /lobby/games/{game_id}/invites/{invite_id}/revoke`.
|
||||
- `GET /lobby/games/{game_id}/memberships`.
|
||||
- `POST /lobby/games/{game_id}/memberships/{membership_id}/remove`.
|
||||
- `POST /lobby/games/{game_id}/memberships/{membership_id}/block`.
|
||||
|
||||
- `GET /lobby/my/games`.
|
||||
- `GET /lobby/my/applications`.
|
||||
- `GET /lobby/my/invites`.
|
||||
- `GET /lobby/my/race-names`.
|
||||
|
||||
- `POST /lobby/race-names/register` — promote a `pending_registration`
|
||||
to `registered` within the 30-day window.
|
||||
|
||||
- `POST /games/{game_id}/commands` — proxy to engine command path.
|
||||
- `POST /games/{game_id}/orders` — proxy to engine order validation.
|
||||
- `GET /games/{game_id}/reports/{turn}` — proxy to engine report path.
|
||||
|
||||
Admin (`/api/v1/admin/*`, all require Basic Auth):
|
||||
|
||||
- `GET /admin-accounts`, `POST /admin-accounts`,
|
||||
`GET /admin-accounts/{username}`,
|
||||
`POST /admin-accounts/{username}/disable`,
|
||||
`POST /admin-accounts/{username}/enable`,
|
||||
`POST /admin-accounts/{username}/reset-password`.
|
||||
|
||||
- `GET /users`, `GET /users/{user_id}`,
|
||||
`POST /users/{user_id}/sanctions`,
|
||||
`POST /users/{user_id}/limits`,
|
||||
`POST /users/{user_id}/entitlements`,
|
||||
`POST /users/{user_id}/soft-delete`.
|
||||
|
||||
- `GET /games`, `GET /games/{game_id}`,
|
||||
`POST /games/{game_id}/force-start`,
|
||||
`POST /games/{game_id}/force-stop`,
|
||||
`POST /games/{game_id}/ban-member`.
|
||||
|
||||
- `GET /runtimes/{game_id}`,
|
||||
`POST /runtimes/{game_id}/restart`,
|
||||
`POST /runtimes/{game_id}/patch`,
|
||||
`POST /runtimes/{game_id}/force-next-turn`,
|
||||
`GET /engine-versions`, `POST /engine-versions`,
|
||||
`PATCH /engine-versions/{id}`,
|
||||
`POST /engine-versions/{id}/disable`.
|
||||
|
||||
- `GET /mail/deliveries`,
|
||||
`GET /mail/deliveries/{delivery_id}`,
|
||||
`GET /mail/deliveries/{delivery_id}/attempts`,
|
||||
`POST /mail/deliveries/{delivery_id}/resend`,
|
||||
`GET /mail/dead-letters`.
|
||||
|
||||
- `GET /notifications`, `GET /notifications/{notification_id}`,
|
||||
`GET /notifications/dead-letters`,
|
||||
`GET /notifications/malformed`.
|
||||
|
||||
- `GET /geo/users/{user_id}/countries` — counter listing.
|
||||
|
||||
Internal (gateway-only, `/api/v1/internal/*`):
|
||||
|
||||
- `GET /sessions/{device_session_id}` — gateway session lookup.
|
||||
- `POST /sessions/{device_session_id}/revoke` — admin or self revoke
|
||||
passthrough; backend emits `session_invalidation`.
|
||||
- `POST /sessions/users/{user_id}/revoke-all`.
|
||||
- `GET /users/{user_id}/account-internal` — server-to-server fetch
|
||||
used by gateway flows that need account state alongside the session.
|
||||
|
||||
The internal group is on `/api/v1/internal/*`. The trust model treats
|
||||
it as part of the user surface (no extra auth in MVP).
|
||||
|
||||
Critical files:
|
||||
|
||||
- `backend/openapi.yaml`
|
||||
- `backend/internal/server/router.go`
|
||||
- `backend/internal/server/middleware/{requestid,logging,metrics,panicrecovery,userid,basicauth}.go`
|
||||
- `backend/internal/server/handlers_*.go`
|
||||
- `backend/internal/server/contract_test.go`
|
||||
- `backend/docs/api-contract.md`
|
||||
|
||||
Done criteria:
|
||||
|
||||
- `go test ./backend/internal/server/...` is green; the contract test
|
||||
exercises every endpoint and validates against `openapi.yaml`.
|
||||
- Every endpoint returns `501 Not Implemented` with the standard error
|
||||
body.
|
||||
- gin route table at startup matches the OpenAPI inventory exactly.
|
||||
|
||||
## ~~Stage 4~~ — Persistence layer
|
||||
|
||||
This stage was implemented and marked as done.
|
||||
|
||||
Goal: define every `backend` schema table, generate jet code, and make
|
||||
the wiring of the persistence layer ready for the domain modules.
|
||||
|
||||
Actions:
|
||||
|
||||
1. Replace `backend/internal/postgres/migrations/00001_init.sql` with
|
||||
the full DDL. The schema is `backend`. The expected tables and
|
||||
their primary purposes:
|
||||
|
||||
Auth:
|
||||
- `device_sessions(device_session_id uuid pk, user_id uuid not null,
|
||||
client_public_key bytea not null, status text not null,
|
||||
created_at, revoked_at, last_seen_at)` plus indexes on
|
||||
`user_id` and `status`.
|
||||
- `auth_challenges(challenge_id uuid pk, email text not null,
|
||||
code_hash bytea not null, created_at, expires_at, consumed_at,
|
||||
attempts int not null default 0)`. Index on `email`.
|
||||
- `blocked_emails(email text pk, blocked_at, reason text)`.
|
||||
|
||||
User:
|
||||
- `accounts(user_id uuid pk, email text unique not null,
|
||||
user_name text unique not null, display_name text not null,
|
||||
preferred_language text not null, time_zone text not null,
|
||||
declared_country text, permanent_block bool not null default false,
|
||||
created_at, updated_at, deleted_at)`.
|
||||
- `entitlement_records(record_id uuid pk, user_id uuid not null,
|
||||
tier text not null, source text not null, created_at)`.
|
||||
- `entitlement_snapshots(user_id uuid pk, tier text not null,
|
||||
max_registered_race_names int not null, taken_at timestamptz)`.
|
||||
Updated on every entitlement change.
|
||||
- `sanction_records`, `sanction_active`, `limit_records`,
|
||||
`limit_active` — same shape as the previous `user` service had
|
||||
(record + active rollup pattern).
|
||||
|
||||
Admin:
|
||||
- `admin_accounts(username text pk, password_hash bytea not null,
|
||||
created_at, last_used_at, disabled_at)`.
|
||||
|
||||
Lobby:
|
||||
- `games(game_id uuid pk, owner_user_id uuid not null,
|
||||
visibility text not null, status text not null, ...)` covering
|
||||
enrollment state machine fields documented in
|
||||
`ARCHITECTURE_deprecated.md` § Game Lobby.
|
||||
- `applications(application_id uuid pk, game_id uuid not null,
|
||||
applicant_user_id uuid not null, status text not null, ...)`.
|
||||
- `invites(invite_id uuid pk, game_id uuid not null,
|
||||
invited_user_id uuid, code text unique, status text, ...)`.
|
||||
- `memberships(membership_id uuid pk, game_id uuid not null,
|
||||
user_id uuid not null, race_name text not null, status text,
|
||||
...)` plus `unique(game_id, user_id)`.
|
||||
- `race_names(name text not null, canonical text not null,
|
||||
status text not null, owner_user_id uuid, game_id uuid,
|
||||
expires_at, registered_at, ...)` plus
|
||||
`unique(canonical) where status in ('registered','reservation','pending_registration')`.
|
||||
|
||||
Runtime:
|
||||
- `runtime_records(game_id uuid pk, current_container_id text,
|
||||
status text not null, image_ref text, started_at, last_observed_at,
|
||||
...)`.
|
||||
- `engine_versions(version text pk, image_ref text not null,
|
||||
enabled bool not null default true, created_at, ...)`.
|
||||
- `player_mappings(game_id uuid not null, user_id uuid not null,
|
||||
race_name text not null, engine_player_uuid uuid not null,
|
||||
primary key(game_id, user_id))`.
|
||||
- `runtime_operation_log(operation_id uuid pk, game_id uuid,
|
||||
op text, status text, started_at, finished_at, error text)`.
|
||||
- `runtime_health_snapshots(snapshot_id uuid pk, game_id uuid,
|
||||
observed_at, payload jsonb)`.
|
||||
|
||||
Mail:
|
||||
- `mail_deliveries(delivery_id uuid pk, template_id text not null,
|
||||
idempotency_key text not null, status text not null,
|
||||
attempts int not null default 0, next_attempt_at timestamptz,
|
||||
payload_id uuid not null, created_at, ...)` plus
|
||||
`unique(template_id, idempotency_key)`.
|
||||
- `mail_recipients(recipient_id uuid pk, delivery_id uuid not null,
|
||||
address text not null, kind text not null)`.
|
||||
- `mail_attempts(attempt_id uuid pk, delivery_id uuid, attempt_no int,
|
||||
started_at, finished_at, outcome text, error text)`.
|
||||
- `mail_dead_letters(dead_letter_id uuid pk, delivery_id uuid,
|
||||
archived_at, reason text)`.
|
||||
- `mail_payloads(payload_id uuid pk, content_type text not null,
|
||||
subject text, body bytea not null)`.
|
||||
|
||||
Notification:
|
||||
- `notifications(notification_id uuid pk, kind text not null,
|
||||
idempotency_key text not null, user_id uuid, payload jsonb,
|
||||
created_at)` plus `unique(kind, idempotency_key)`.
|
||||
- `notification_routes(route_id uuid pk, notification_id uuid,
|
||||
channel text not null, status text not null, last_attempt_at,
|
||||
...)`.
|
||||
- `notification_dead_letters(dead_letter_id uuid pk, notification_id
|
||||
uuid, archived_at, reason text)`.
|
||||
- `notification_malformed_intents(id uuid pk, received_at, payload
|
||||
jsonb, reason text)`.
|
||||
|
||||
Geo:
|
||||
- `user_country_counters(user_id uuid not null, country text not null,
|
||||
count bigint not null default 0, last_seen_at timestamptz,
|
||||
primary key(user_id, country))`.
|
||||
|
||||
2. Add `created_at TIMESTAMPTZ DEFAULT now()` to every table; add
|
||||
`updated_at` and `deleted_at` where the domain reasons in
|
||||
`ARCHITECTURE_deprecated.md` apply. UTC normalisation is performed
|
||||
in Go on read and write (the existing `pkg/postgres` helpers cover
|
||||
this).
|
||||
|
||||
3. `backend/cmd/jetgen/main.go` — port the existing pattern from a
|
||||
surviving reference (the previous services' `cmd/jetgen` is a good
|
||||
template; adjust import paths to `galaxy/backend`). The tool spins
|
||||
up a transient Postgres container, applies the embedded migrations,
|
||||
and runs `jet -dsn=...` writing into `internal/postgres/jet/`.
|
||||
|
||||
4. `backend/Makefile` — fill in the `jet` target.
|
||||
|
||||
5. Run `make jet` and commit `internal/postgres/jet/`.
|
||||
|
||||
6. Add `backend/internal/postgres/jet/jet.go` — package doc and
|
||||
`//go:generate` comment pointing to `cmd/jetgen`.
|
||||
|
||||
7. Sanity test in `backend/internal/postgres/migrations_test.go`:
|
||||
spin up a Postgres testcontainer, apply migrations, assert that
|
||||
the `backend` schema exists and that every expected table is
|
||||
present.
|
||||
|
||||
Critical files:
|
||||
|
||||
- `backend/internal/postgres/migrations/00001_init.sql`
|
||||
- `backend/internal/postgres/jet/**`
|
||||
- `backend/cmd/jetgen/main.go`
|
||||
- `backend/Makefile`
|
||||
- `backend/internal/postgres/migrations_test.go`
|
||||
|
||||
Done criteria:
|
||||
|
||||
- `go test ./backend/internal/postgres/...` is green.
|
||||
- `make jet` regenerates without diff.
|
||||
- All tables listed above exist after a fresh migration.
|
||||
|
||||
## ~~Stage 5~~ — Domain implementation
|
||||
|
||||
Goal: implement domain modules in dependency order. After each substage
|
||||
the backend is functional for the substage's slice of behaviour. The
|
||||
contract tests from Stage 3 progressively flip from `501` to actual
|
||||
responses as each substage replaces placeholders.
|
||||
|
||||
Substages run strictly in order. Each substage:
|
||||
|
||||
- Implements package code in `backend/internal/<domain>/`.
|
||||
- Replaces the corresponding `501` handler bodies in
|
||||
`backend/internal/server/handlers_*.go` with real logic that calls
|
||||
the domain package.
|
||||
- Adds focused unit and contract coverage for the substage's
|
||||
endpoints.
|
||||
- Wires the new package into `backend/cmd/backend/main.go`.
|
||||
|
||||
### ~~5.1~~ — auth
|
||||
|
||||
This substage was implemented and marked as done. See
|
||||
[`docs/stage05_1-auth.md`](docs/stage05_1-auth.md) for the decisions
|
||||
taken during implementation.
|
||||
|
||||
Behaviour:
|
||||
|
||||
- `POST /api/v1/public/auth/send-email-code` — generates a challenge,
|
||||
hashes the code, persists in `auth_challenges`, calls
|
||||
`mail.EnqueueLoginCode(email, code)`. Returns `{challenge_id}` with an
identical response shape for every non-blocked email (existing user,
new user, or throttled). A blocked email is rejected with 400 only
when the block is permanent.
|
||||
- `POST /api/v1/public/auth/confirm-email-code` — looks up the
|
||||
challenge, verifies the code (constant-time), enforces attempt
|
||||
ceiling, marks consumed, calls `user.EnsureByEmail(email,
|
||||
preferred_language, time_zone)` to obtain the user_id, stores the
|
||||
Ed25519 public key, creates a `device_session` row, populates the
|
||||
in-memory cache, calls
|
||||
`geo.SetDeclaredCountryAtRegistration(user_id, source_ip)`, and
|
||||
returns `{device_session_id}`.
|
||||
- `GET /api/v1/internal/sessions/{device_session_id}` — sync session
|
||||
lookup for gateway.
|
||||
- `POST /api/v1/internal/sessions/{device_session_id}/revoke` and
|
||||
`POST /api/v1/internal/sessions/users/{user_id}/revoke-all` — mark
|
||||
sessions revoked, evict from in-memory cache, emit
|
||||
`session_invalidation` push event (Stage 6 wires the actual
|
||||
emission; until then `auth` calls a no-op publisher injected at
|
||||
wiring).
|
||||
|
||||
Cache: full session table read at startup; write-through on every
|
||||
mutation.
|
||||
|
||||
### ~~5.2~~ — user
|
||||
|
||||
This substage was implemented and marked as done. See
|
||||
[`docs/stage05_2-user.md`](docs/stage05_2-user.md) for the decisions
|
||||
taken during implementation.
|
||||
|
||||
Behaviour:
|
||||
|
||||
- Account CRUD limited to allowed mutations on profile and settings.
|
||||
- `EnsureByEmail` and `ResolveByEmail` for `auth`.
|
||||
- Entitlement records and snapshots; tier downgrades never revoke
|
||||
already-registered race names.
|
||||
- Sanctions and limits using the record + active rollup pattern.
|
||||
- Soft delete: writes `deleted_at` and triggers in-process cascade —
|
||||
`lobby.OnUserDeleted(user_id)`, `notification.OnUserDeleted(user_id)`,
|
||||
`geo.OnUserDeleted(user_id)`. Permanent block triggers
|
||||
`lobby.OnUserBlocked(user_id)`.
|
||||
- Cache: latest entitlement snapshot per user; warmed on startup;
|
||||
write-through on entitlement mutation.
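The "warmed on startup; write-through on mutation" cache pattern used here (and by the other domain modules) can be sketched generically. The `store` interface, the `memStore` stand-in, and the tier names are hypothetical; only the `tier` and `max_registered_race_names` fields come from the `entitlement_snapshots` table.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// snapshot mirrors two columns of entitlement_snapshots.
type snapshot struct {
	Tier                   string
	MaxRegisteredRaceNames int
}

// store stands in for the Postgres repository (hypothetical interface).
type store interface {
	LoadAll() (map[string]snapshot, error)
	Save(userID string, s snapshot) error
}

type cache struct {
	mu     sync.RWMutex
	byUser map[string]snapshot
	store  store
}

// warm reads every row once at startup and copies it into memory.
func warm(st store) (*cache, error) {
	all, err := st.LoadAll()
	if err != nil {
		return nil, err
	}
	byUser := make(map[string]snapshot, len(all))
	for id, s := range all {
		byUser[id] = s
	}
	return &cache{byUser: byUser, store: st}, nil
}

func (c *cache) Get(userID string) (snapshot, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.byUser[userID]
	return s, ok
}

// Put persists first and updates memory only after the write succeeded,
// so the cache never holds a value the database rejected.
func (c *cache) Put(userID string, s snapshot) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if err := c.store.Save(userID, s); err != nil {
		return err
	}
	c.byUser[userID] = s
	return nil
}

// memStore is an in-memory store used only to make the sketch runnable.
type memStore struct {
	rows map[string]snapshot
	fail bool
}

func (m *memStore) LoadAll() (map[string]snapshot, error) { return m.rows, nil }

func (m *memStore) Save(userID string, s snapshot) error {
	if m.fail {
		return errors.New("write failed")
	}
	m.rows[userID] = s
	return nil
}

func main() {
	st := &memStore{rows: map[string]snapshot{"u1": {Tier: "free", MaxRegisteredRaceNames: 1}}}
	c, err := warm(st)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Get("u1"))
	fmt.Println(c.Put("u1", snapshot{Tier: "paid", MaxRegisteredRaceNames: 5}))
	fmt.Println(c.Get("u1"))
}
```

Holding the write lock across the database call serialises writers, which is a simple way to keep memory and storage in the same order. A busier cache might prefer a narrower critical section.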
|
||||
|
||||
### ~~5.3~~ — admin
|
||||
|
||||
This substage was implemented and marked as done. See
|
||||
[`docs/stage05_3-admin.md`](docs/stage05_3-admin.md) for the decisions
|
||||
taken during implementation.
|
||||
|
||||
Behaviour:
|
||||
|
||||
- `admin_accounts` CRUD with bcrypt hashing.
|
||||
- Bootstrap on startup via env vars (`BACKEND_ADMIN_BOOTSTRAP_USER`,
|
||||
`BACKEND_ADMIN_BOOTSTRAP_PASSWORD`); idempotent.
|
||||
- Replace the Stage 3 stub `basicauth` middleware with the real
|
||||
Postgres-backed verifier. Constant-time comparison via bcrypt.
|
||||
- Admin CRUD endpoints across users, games, runtime, mail,
|
||||
notification, geo. Each admin endpoint delegates to the domain
|
||||
package's admin-facing methods.
|
||||
|
||||
Cache: full admin table at startup; write-through on mutation.
|
||||
|
||||
### ~~5.4~~ — lobby
|
||||
|
||||
This substage was implemented and marked as done. See
|
||||
[`docs/stage05_4-lobby.md`](docs/stage05_4-lobby.md) for the decisions
|
||||
taken during implementation.
|
||||
|
||||
Behaviour:
|
||||
|
||||
- Games CRUD with the enrollment state machine.
|
||||
- Applications and invites with their lifecycles.
|
||||
- Memberships with race name binding.
|
||||
- Race Name Directory: registered, reservation, and
|
||||
pending_registration tiers; canonical key via `disciplinedware/go-confusables`;
|
||||
uniqueness across all three tiers; capability promotion based on
|
||||
`max_planets > initial AND max_population > initial` from the
|
||||
runtime snapshot.
|
||||
- Pending-registration sweeper: scheduled job, releases entries past
|
||||
the 30-day window; uses `pkg/cronutil`. The same sweeper auto-closes
|
||||
enrollment-expired games whose `approved_count >= min_players`.
|
||||
- Hooks consumed from other modules:
|
||||
- `OnUserBlocked(user_id)` — release all Race Name Directory (RND)
entries, applications, invites, and memberships in one transaction.
|
||||
- `OnUserDeleted(user_id)` — same.
|
||||
- `OnRuntimeSnapshot(snapshot)` — update denormalised runtime view
|
||||
on the game (current_turn, status, per-member max stats).
|
||||
- `OnGameFinished(game_id)` — drive race name promotion logic and
|
||||
move game to `finished`.
|
||||
|
||||
Cache: active games and memberships, RND canonical set; warmed on
|
||||
startup; write-through on mutation.
|
||||
|
||||
### ~~5.5~~ — runtime (with dockerclient and engineclient)
|
||||
|
||||
This substage was implemented and marked as done. See
|
||||
[`docs/stage05_5-runtime.md`](docs/stage05_5-runtime.md) for the
|
||||
decisions taken during implementation.
|
||||
|
||||
Behaviour:
|
||||
|
||||
- Engine version registry CRUD.
|
||||
- `engineclient` is a thin `net/http` client over `pkg/model` types,
|
||||
one method per engine endpoint listed in `README.md` §8.
|
||||
- `dockerclient` wraps `github.com/docker/docker` for: pull, create,
|
||||
start, stop, remove, inspect, list (filtered by the
|
||||
`galaxy.backend=1` label), patch (semver-only, validated against
|
||||
`engine_versions`).
|
||||
- Per-game serialisation: a `sync.Map` keyed by `game_id` with
`*sync.Mutex` values ensures that concurrent ops on the same game run
sequentially.
|
||||
- Worker pool for long-running operations: started in Stage 5.5; jobs
|
||||
enqueued on a buffered channel; bounded concurrency.
|
||||
- `runtime_operation_log` records every op (start time, finish time,
|
||||
outcome, error).
|
||||
- Reconciliation: on startup and on a `pkg/cronutil` schedule, list
|
||||
containers labelled `galaxy.backend=1`, match against
|
||||
`runtime_records`, adopt unrecorded labelled containers, and mark
records whose container is missing as removed. Emit
`lobby.OnRuntimeJobResult` for each removed record.
|
||||
- Snapshot publication: after every successful engine read or a
|
||||
health-probe transition, synthesise a snapshot and call
|
||||
`lobby.OnRuntimeSnapshot(snapshot)` synchronously.
|
||||
- Turn scheduler: `pkg/cronutil` schedule per running game; each tick
|
||||
invokes the engine `admin/turn` endpoint and, on success, snapshots and publishes;
|
||||
force-next-turn sets a one-shot skip flag stored in
|
||||
`runtime_records`.
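The per-game serialisation described in the list above can be sketched with `sync.Map.LoadOrStore`. The type `gameLocks` and the helper `bump` are hypothetical names; the sketch also never removes mutexes for finished games, which a long-running service may want to handle.

```go
package main

import (
	"fmt"
	"sync"
)

// gameLocks hands out one mutex per game id.
type gameLocks struct {
	m sync.Map // game id (string) -> *sync.Mutex
}

// lock blocks until the caller owns the game's mutex and returns the
// function that releases it.
func (g *gameLocks) lock(gameID string) (unlock func()) {
	v, _ := g.m.LoadOrStore(gameID, &sync.Mutex{})
	mu := v.(*sync.Mutex)
	mu.Lock()
	return mu.Unlock
}

// bump runs n concurrent increments for one game. The counter itself is
// not atomic; it stays correct only because every increment holds the
// per-game lock.
func bump(g *gameLocks, gameID string, n int) int {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			unlock := g.lock(gameID)
			defer unlock()
			counter++
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	var g gameLocks
	fmt.Println(bump(&g, "game-1", 100))
}
```

Operations on different games take different mutexes and therefore do not wait for each other.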
|
||||
|
||||
Cache: active runtime records, engine version registry; warmed on
|
||||
startup; write-through on mutation.
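The worker pool for long-running operations listed in the behaviour above (buffered channel, bounded concurrency) can be sketched as follows. The names `pool`, `submit`, `shutdown`, and `runBatch` are hypothetical, and rejecting a job when the queue is full is one possible policy, not necessarily the one chosen in `docs/stage05_5-runtime.md`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool runs jobs from a buffered queue on a fixed number of goroutines.
type pool struct {
	jobs chan func()
	wg   sync.WaitGroup
}

func newPool(workers, queueSize int) *pool {
	p := &pool{jobs: make(chan func(), queueSize)}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for job := range p.jobs {
				job()
			}
		}()
	}
	return p
}

// submit enqueues a job and reports false when the queue is full.
func (p *pool) submit(job func()) bool {
	select {
	case p.jobs <- job:
		return true
	default:
		return false
	}
}

// shutdown stops intake and waits for every queued job to finish.
func (p *pool) shutdown() {
	close(p.jobs)
	p.wg.Wait()
}

// runBatch pushes n trivial jobs through a pool and reports how many
// completed and the highest number observed running at the same time.
func runBatch(workers, n int) (completed, peak int64) {
	p := newPool(workers, n)
	var running int64
	for i := 0; i < n; i++ {
		p.submit(func() {
			cur := atomic.AddInt64(&running, 1)
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			atomic.AddInt64(&completed, 1)
			atomic.AddInt64(&running, -1)
		})
	}
	p.shutdown()
	return completed, peak
}

func main() {
	completed, peak := runBatch(4, 100)
	fmt.Println(completed, peak <= 4)
}
```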
|
||||
|
||||
### ~~5.6~~ — mail
|
||||
|
||||
This substage was implemented and marked as done. See
|
||||
[`docs/stage05_6-mail.md`](docs/stage05_6-mail.md) for the decisions
|
||||
taken during implementation.
|
||||
|
||||
Behaviour:
|
||||
|
||||
- Outbox tables defined in Stage 4.
|
||||
- Worker goroutine: scans `mail_deliveries` with
|
||||
`SELECT ... FOR UPDATE SKIP LOCKED` ordered by `next_attempt_at`,
|
||||
attempts SMTP delivery via `wneessen/go-mail`, records in
|
||||
`mail_attempts`, updates status, schedules backoff with jitter, or
|
||||
dead-letters past the configured maximum attempts.
|
||||
- Drain on startup: replays all `pending` and `retrying` rows.
|
||||
- Public API for producers: `EnqueueLoginCode(email, code, ttl)`,
|
||||
`EnqueueTemplate(template_id, recipient, payload, idempotency_key)`.
|
||||
- Admin endpoints implemented: list, view, resend.
|
||||
|
||||
### ~~5.7~~ — notification
|
||||
|
||||
This substage was implemented and marked as done. See
|
||||
[`docs/stage05_7-notification.md`](docs/stage05_7-notification.md) for
|
||||
the decisions taken during implementation.
|
||||
|
||||
Behaviour:
|
||||
|
||||
- `Submit(intent)` — validate intent shape, enforce idempotency,
|
||||
persist `notifications`, materialise `notification_routes`, fan out
|
||||
to push (Stage 6 wires the actual push emission; until then a no-op
|
||||
publisher) and email (`mail.EnqueueTemplate`).
|
||||
- Each kind has a fixed channel set documented in `README.md` §10.
|
||||
- Malformed intents go to `notification_malformed_intents` and never
|
||||
block the producer.
|
||||
- Dead-letter handling: a failed route past max attempts moves to
|
||||
`notification_dead_letters`.
|
||||
- Producers (lobby, runtime, geo, auth) are wired via direct function
|
||||
calls.
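The idempotency and malformed-intent handling of `Submit` can be sketched in memory. The map plays the role of the `unique(kind, idempotency_key)` constraint on `notifications`. The type names, the validation rule (non-empty kind and key), the id format, and the example kind are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// intent carries the fields needed for validation and idempotency.
type intent struct {
	Kind           string
	IdempotencyKey string
	UserID         string
}

type notifier struct {
	mu        sync.Mutex
	seen      map[[2]string]string // (kind, idempotency_key) -> notification id
	malformed []intent             // stands in for notification_malformed_intents
	nextID    int
}

func newNotifier() *notifier {
	return &notifier{seen: make(map[[2]string]string)}
}

// Submit returns the notification id and whether this call created it.
// A malformed intent is recorded and dropped without failing the producer.
func (n *notifier) Submit(in intent) (id string, created bool) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if in.Kind == "" || in.IdempotencyKey == "" {
		n.malformed = append(n.malformed, in)
		return "", false
	}
	key := [2]string{in.Kind, in.IdempotencyKey}
	if existing, ok := n.seen[key]; ok {
		return existing, false // duplicate submit: same id, no new routes
	}
	n.nextID++
	id = fmt.Sprintf("n-%d", n.nextID)
	n.seen[key] = id
	return id, true
}

func main() {
	n := newNotifier()
	in := intent{Kind: "game_started", IdempotencyKey: "k1", UserID: "u1"}
	fmt.Println(n.Submit(in))
	fmt.Println(n.Submit(in))
	fmt.Println(n.Submit(intent{IdempotencyKey: "k2"}))
}
```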
|
||||
|
||||
### ~~5.8~~ — geo

This substage was implemented and marked as done. See
[`docs/stage05_8-geo.md`](docs/stage05_8-geo.md) for the decisions
taken during implementation.

Behaviour:

- Load GeoLite2 Country DB at startup from `BACKEND_GEOIP_DB_PATH`.
- `SetDeclaredCountryAtRegistration(user_id, ip)` — sync; lookup,
  update `accounts.declared_country`. No-op on lookup error.
- `IncrementCounterAsync(user_id, ip)` — fire-and-forget goroutine;
  upsert `user_country_counters` with `count = count + 1`,
  `last_seen_at = now()`.
- Middleware on `/api/v1/user/*` extracts the source IP from
  `X-Forwarded-For` (or `RemoteAddr`) and calls
  `IncrementCounterAsync` after the handler returns successfully.
- `OnUserDeleted(user_id)` — delete the user's counter rows.

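The middleware bullet above can be sketched with `net/http` alone. The sketch makes several assumptions that the plan does not spell out: the leftmost `X-Forwarded-For` entry is taken as the client address, "successfully" is read as a status below 400, the user id comes from the `X-User-ID` header, and the names `pickIP`, `statusRecorder` and `geoCounter` are hypothetical. The `increment` callback stands in for `IncrementCounterAsync`.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// pickIP prefers the first X-Forwarded-For entry and falls back to the
// host part of RemoteAddr.
func pickIP(xff, remoteAddr string) string {
	if xff != "" {
		first, _, _ := strings.Cut(xff, ",")
		return strings.TrimSpace(first)
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr
	}
	return host
}

// statusRecorder remembers the status code written by the handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (w *statusRecorder) WriteHeader(code int) {
	w.status = code
	w.ResponseWriter.WriteHeader(code)
}

// geoCounter calls increment after next has returned, and only when the
// response status indicates success.
func geoCounter(next http.Handler, increment func(userID, ip string)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		if rec.status < 400 {
			ip := pickIP(r.Header.Get("X-Forwarded-For"), r.RemoteAddr)
			increment(r.Header.Get("X-User-ID"), ip)
		}
	})
}

func main() {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	h := geoCounter(ok, func(userID, ip string) {
		fmt.Println("increment", userID, ip)
	})

	req := httptest.NewRequest(http.MethodGet, "/api/v1/user/account", nil)
	req.Header.Set("X-Forwarded-For", "203.0.113.7, 10.0.0.1")
	req.Header.Set("X-User-ID", "u-1")
	h.ServeHTTP(httptest.NewRecorder(), req)
}
```

Which `X-Forwarded-For` entry is trustworthy depends on how many proxies sit in front of the service, so the leftmost-entry choice should be checked against the actual deployment.
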
Critical files (Stage 5 as a whole):

- `backend/internal/auth/**`
- `backend/internal/user/**`
- `backend/internal/admin/**`
- `backend/internal/lobby/**`
- `backend/internal/runtime/**`
- `backend/internal/dockerclient/**`
- `backend/internal/engineclient/**`
- `backend/internal/mail/**`
- `backend/internal/notification/**`
- `backend/internal/geo/**`
- `backend/internal/server/handlers_*.go` (replacing 501 stubs)
- `backend/cmd/backend/main.go` (wiring expansion)

Done criteria:

- All Stage 3 contract tests pass against real responses.
- Each substage adds focused unit tests (`testify`, mocks where
  external boundaries justify them).
- `go run ./backend/cmd/backend` boots, all caches warm, all workers
  start.

## ~~Stage 6~~ — Push gRPC interface and gateway adaptation

Goal: stand up the bidirectional control channel between backend and
gateway. Backend pushes `client_event` and `session_invalidation`;
gateway opens the stream, signs and forwards client events, immediately
acts on session invalidations. Remove every Redis dependency from
gateway except anti-replay reservations.

### ~~6.1~~ — Backend push server

This substage was implemented and marked as done. See
[`docs/stage06_1-push.md`](docs/stage06_1-push.md) for the decisions
taken during implementation.

Actions:

1. Author `backend/proto/push/v1/push.proto` with
   `service Push { rpc SubscribePush(GatewaySubscribeRequest) returns
   (stream PushEvent); }` and the message types defined in
   `README.md` §7. Include a `cursor` field (string).
2. `backend/buf.yaml`, `backend/buf.gen.yaml` mirroring the gateway
   pattern; generate Go bindings into `backend/proto/push/v1/`.
3. `backend/internal/push/server.go` — gRPC service implementation:
   - Maintains a connection registry keyed by gateway client id (the
     `GatewaySubscribeRequest` provides one; if multiple gateway
     instances connect, each gets its own queue).
   - Holds an in-memory ring buffer keyed by cursor, with TTL equal to
     `BACKEND_FRESHNESS_WINDOW`. Cursors past TTL are discarded.
   - Resume: if the client's cursor is still in the buffer, replay
     from there; otherwise replay nothing and start fresh.
   - Backpressure: per-connection buffered channel; on overflow, drop
     the oldest events for that connection and log.
4. Provide a publisher API consumed by `auth`, `lobby`, `notification`,
   and `runtime`:
   - `push.PublishClientEvent(user_id, device_session_id?, payload, kind)`.
   - `push.PublishSessionInvalidation(device_session_id|user_id, reason)`.

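The cursor and resume rules from action 3 can be sketched as below. This is an illustration under stated assumptions rather than the server's implementation: it uses a TTL-trimmed slice instead of a fixed-size ring, encodes the string cursor as a decimal sequence number, and treats the client's cursor as "last event seen", so replay starts with the event after it. The names `replayBuffer`, `add`, `resume` and `at` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

type event struct {
	seq     uint64
	at      time.Time
	payload string
}

// replayBuffer keeps events younger than ttl, ordered by sequence number.
// ttl stands in for BACKEND_FRESHNESS_WINDOW.
type replayBuffer struct {
	ttl    time.Duration
	next   uint64
	events []event
}

func newReplayBuffer(ttl time.Duration) *replayBuffer {
	return &replayBuffer{ttl: ttl}
}

// expire drops events older than ttl from the front of the buffer.
func (b *replayBuffer) expire(now time.Time) {
	i := 0
	for i < len(b.events) && now.Sub(b.events[i].at) > b.ttl {
		i++
	}
	b.events = b.events[i:]
}

// add stores a payload and returns the cursor assigned to it.
func (b *replayBuffer) add(payload string, now time.Time) string {
	b.expire(now)
	b.next++
	b.events = append(b.events, event{seq: b.next, at: now, payload: payload})
	return strconv.FormatUint(b.next, 10)
}

// resume returns the payloads recorded after cursor. It returns nil when
// the cursor is unknown or has expired, meaning "start fresh".
func (b *replayBuffer) resume(cursor string, now time.Time) []string {
	b.expire(now)
	seq, err := strconv.ParseUint(cursor, 10, 64)
	if err != nil {
		return nil
	}
	for i, e := range b.events {
		if e.seq == seq {
			out := []string{}
			for _, later := range b.events[i+1:] {
				out = append(out, later.payload)
			}
			return out
		}
	}
	return nil
}

// at is a small helper that builds deterministic timestamps for the demo.
func at(sec int) time.Time { return time.Unix(int64(sec), 0) }

func main() {
	b := newReplayBuffer(60 * time.Second)
	c1 := b.add("a", at(0))
	b.add("b", at(10))
	b.add("c", at(20))
	fmt.Println(b.resume(c1, at(30)))        // cursor still buffered
	fmt.Println(b.resume(c1, at(65)) == nil) // cursor expired
}
```

A caller cannot tell an expired cursor from an unknown one, which matches the plan's rule that both cases replay nothing and start fresh.
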
### ~~6.2~~ — Gateway adaptation

This substage was implemented and marked as done. See
[`docs/stage06_2-gateway.md`](docs/stage06_2-gateway.md) for the
decisions taken during implementation.

Actions:

1. Remove `redisconn` usage for session projection and for the two
   stream consumers. Keep `redisconn` only for anti-replay
   reservations.
2. Remove `gateway/internal/config` env vars
   `GATEWAY_SESSION_EVENTS_REDIS_STREAM` and
   `GATEWAY_CLIENT_EVENTS_REDIS_STREAM`. Add
   `GATEWAY_BACKEND_HTTP_URL` and `GATEWAY_BACKEND_GRPC_PUSH_URL`.
3. Add `gateway/internal/backendclient/` with:
   - `RESTClient` — HTTP client for `/api/v1/internal/sessions/...` and
     for forwarding public/user requests.
   - `PushClient` — gRPC client to `SubscribePush` with reconnect
     loop, exponential backoff with jitter, and cursor persistence in
     process memory.
4. Replace gateway session validation with a sync REST call to
   backend per request.
5. Replace gateway client-events Redis consumer with the
   `SubscribePush` consumer. On `client_event`: sign envelope (Ed25519)
   and deliver to the matching client subscription. On
   `session_invalidation`: look up active subscriptions for the target
   sessions, close them, and reject any in-flight authenticated
   request bound to those sessions.
6. Anti-replay `request_id` reservations remain in Redis (unchanged).
7. Update gateway tests to use a mocked backend HTTP and gRPC server.

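The `session_invalidation` handling in action 5 can be sketched as a small in-memory registry. The type and method names (`registry`, `subscribe`, `invalidate`, `allowed`) are hypothetical, and the sketch assumes each subscription is represented by a `done` channel that its streaming goroutine watches.

```go
package main

import (
	"fmt"
	"sync"
)

// registry tracks open subscriptions per device session and remembers
// which sessions have been invalidated.
type registry struct {
	mu      sync.Mutex
	subs    map[string][]chan struct{}
	revoked map[string]bool
}

func newRegistry() *registry {
	return &registry{
		subs:    make(map[string][]chan struct{}),
		revoked: make(map[string]bool),
	}
}

// subscribe registers a subscription for the session. The returned
// channel is closed when the session is invalidated. It reports false
// if the session was already revoked.
func (r *registry) subscribe(sessionID string) (<-chan struct{}, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.revoked[sessionID] {
		return nil, false
	}
	done := make(chan struct{})
	r.subs[sessionID] = append(r.subs[sessionID], done)
	return done, true
}

// invalidate closes every open subscription of the session, marks the
// session revoked, and returns how many subscriptions were closed.
func (r *registry) invalidate(sessionID string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.revoked[sessionID] = true
	n := len(r.subs[sessionID])
	for _, done := range r.subs[sessionID] {
		close(done)
	}
	delete(r.subs, sessionID)
	return n
}

// allowed reports whether an in-flight request bound to the session may
// still proceed.
func (r *registry) allowed(sessionID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return !r.revoked[sessionID]
}

func main() {
	reg := newRegistry()
	done, _ := reg.subscribe("sess-1")
	fmt.Println("closed:", reg.invalidate("sess-1"))
	<-done // returns immediately because the channel was closed
	fmt.Println("allowed:", reg.allowed("sess-1"))
}
```

Because new requests are validated against backend by a REST call, the `revoked` set only has to cover requests that were already in flight; a real implementation would presumably expire its entries rather than let the map grow without bound.
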
Critical files:

- `backend/proto/push/v1/push.proto`
- `backend/buf.yaml`, `backend/buf.gen.yaml`
- `backend/internal/push/server.go`,
  `backend/internal/push/publisher.go`
- `gateway/internal/backendclient/*.go`
- `gateway/internal/config/config.go` (env var changes)
- `gateway/internal/handlers/*.go` (route forwarding to backend)
- `gateway/internal/auth/*.go` (session lookup → REST)
- `gateway/internal/eventfanout/*.go` (replace Redis consumer with
  gRPC consumer; rename if helpful)

Done criteria:

- `go run ./backend/cmd/backend` and `go run ./gateway/cmd/gateway`
  cooperate end-to-end with no Redis stream usage.
- A revocation through the admin surface causes immediate stream
  closure on the affected client.
- Gateway anti-replay still rejects duplicates.
- The gateway test suite is green.

## ~~Stage 7~~ — Integration testing

This stage was implemented and marked as done. See
[`docs/stage07-integration.md`](docs/stage07-integration.md) for the
decisions taken during implementation, including the testenv layout,
the signed-envelope gRPC client, and the per-scenario coverage notes.

Goal: end-to-end coverage of the platform with real binaries and real
infrastructure where practical.

Actions:

1. Recreate the top-level `integration/` module, registered in
   `go.work`. The module hosts black-box test suites that drive
   `gateway` from outside and verify behaviour at the public boundary
   (with `backend` and `game` running in containers).
2. Add testcontainers fixtures: Postgres, an SMTP capture server (for
   example `axllent/mailpit`), the `galaxy/game` engine image, the
   `galaxy/backend` image (built from this repo), and the
   `galaxy/gateway` image. The Docker daemon used by testcontainers
   is the same one backend will use to manage engines.
3. Add a synthetic GeoLite2 mmdb (use `pkg/geoip/test-data/`).
4. Cover scenarios:
   - Registration flow: send-email-code → confirm-email-code →
     `declared_country` populated from synthetic mmdb.
   - User account fetch: `X-User-ID` path returns the expected
     account; geo counter increments per request.
   - Lobby flow: create game → invite → application → ready-to-start
     → start (engine container starts, healthz green, status read) →
     command → force-next-turn → finish → race name promotion.
   - Mail flow: trigger an email-bound notification → SMTP capture
     receives it → admin resend works.
   - Notification flow: lobby invite triggers a push event reaching
     the test client's gateway subscription, plus an email captured
     by SMTP.
   - Admin flow: bootstrap admin authenticates; CRUD admin creates a
     second admin; second admin disables the first.
   - Soft delete flow: user soft-delete cascades; their RND entries,
     memberships, applications, invites, geo counters are released
     or removed.
   - Session revocation: admin revokes a session → push
     `session_invalidation` arrives at gateway → active subscription
     closes; subsequent requests with that `device_session_id` are
     rejected by gateway.
   - Anti-replay: same `request_id` replayed within freshness window
     is rejected by gateway.
5. CI: run `go test ./integration/... -tags=integration` (or whichever
   flag the team prefers). Tests requiring real Docker run only when
   a Docker daemon is available; otherwise they skip with a clear
   message.

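The skip rule in action 5 can be sketched as a probe plus a pure decision function. This is one possible approach, not necessarily what the suite does: it assumes the `docker` CLI is on `PATH` and uses `docker info` as the availability check, and the names `probeDocker`, `skipReason` and `errNoDocker` are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// errNoDocker marks any failure to reach a Docker daemon.
var errNoDocker = errors.New("docker daemon unreachable")

// probeDocker runs `docker info` with a short timeout. A non-nil error
// means the CLI is missing or the daemon did not answer.
func probeDocker() error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := exec.CommandContext(ctx, "docker", "info").Run(); err != nil {
		return fmt.Errorf("%w: %v", errNoDocker, err)
	}
	return nil
}

// skipReason turns a probe result into a skip message. An empty string
// means the tests should run.
func skipReason(probe func() error) string {
	if err := probe(); err != nil {
		return "skipping integration tests: " + err.Error()
	}
	return ""
}

func main() {
	if reason := skipReason(probeDocker); reason != "" {
		fmt.Println(reason)
		return
	}
	fmt.Println("Docker available; integration tests would run")
}
```

Inside a test, the same helper would be used as `if reason := skipReason(probeDocker); reason != "" { t.Skip(reason) }`, which makes the skip visible in `go test -v` output instead of reporting a failure.
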
Critical files:

- `integration/go.mod`
- `integration/auth_flow_test.go`
- `integration/lobby_flow_test.go`
- `integration/mail_flow_test.go`
- `integration/notification_flow_test.go`
- `integration/admin_flow_test.go`
- `integration/soft_delete_test.go`
- `integration/session_revoke_test.go`
- `integration/anti_replay_test.go`
- `integration/testenv/*.go` (shared fixtures)

Done criteria:

- `go test ./integration/...` runs the full suite.
- All listed scenarios pass on a developer machine with Docker
  available.
- Failures produce actionable diagnostics (logs from each component
  attached to the test report).

## Stage acceptance and decision records

After each stage, the implementing engineer writes a short decision
record under `backend/docs/stage<NN>-<topic>.md` capturing any
non-trivial choice made during implementation that is not obvious from
the code or from this plan. Records that contradict this plan must be
brought to the architecture conversation before merge — the plan and
the architecture document are the agreed contract.