a01f39e4a7fe35e90d6fc389664d79396bec4be3
16 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
601970b028 |
refactor(game): lock-free storage, remove /command, flatten engine wrapper
Three-stage refactor of the game-engine plumbing (game logic untouched): Stage 1 — lock-free persistence + admin serialisation. Remove the file lock from repo/fs (the .lock file, the Read/Write-vs-*Safe duality and the dead ReadSafe polling) and replace the two-step rename with a single atomic rename so concurrent reads are torn-free without a lock. Serialise the state-mutating admin writers (init/turn/banish) with one shared router LimitMiddleware, rewritten to block on the request context instead of a racy shared 100ms timer. Stage 2 — remove the obsolete immediate-command path end to end. Players submit through PUT /api/v1/order; the legacy PUT /api/v1/command path is deleted across game (route, handler, 24 command factories, Ctrl), backend (Commands handler/route, engineclient.ExecuteCommands), gateway (dispatch + executeUserGamesCommand + routing entry), the FlatBuffers/model contract (UserGamesCommand[Response]) and transcoder, plus every affected OpenAPI/README/FUNCTIONAL/ARCHITECTURE doc. The integration proxy test is converted to the order path. Stage 3 — flatten the REST->engine wrapper. Replace the executor adapter, the controller package functions and RepoController with one concrete controller.Service; drop the single-implementation Repo and Storage interfaces (repo.Repo / fs.FS are now concrete). Handlers depend on a thin handler.Engine seam and own the domain->REST projection; storage is resolved once at startup instead of per request. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
15d35f6f1f |
feat(game): canonical gameId in POST /api/v1/admin/init
Engine no longer mints its own game UUID. The orchestrator (backend)
generates the game UUID at game-create time and passes it in the
admin/init request body as the required `gameId` field, so the value
that names the engine container and host bind-mount directory also
ends up inside the engine's state.json.
The engine rejects the zero UUID with 400 and any init that conflicts
with an existing state.json with 409 (a second init on the same gameId
is also a conflict; full idempotency is not part of the contract).
Updates rest.InitRequest, openapi.yaml (schema + 409 response),
controller.GenerateGame/NewGame/buildGameOnMap signatures, the engine
HTTP handler/executor, the backend runtime worker, and the relevant
unit and contract tests. Documentation in game/README.md,
docs/ARCHITECTURE.md, backend/README.md, and backend/docs/{runtime,flows}.md
is updated in the same patch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
009ea560f9 |
feat(lobby): F8-04b hierarchical sidebar + paid-tier gate for create-game
Reshape the lobby UI from a single Overview into a two-level sidebar (games · profile · DEV synthetic-reports) with four games sub-panels (active-past · recruitment · invitations · private-games). Move the `create new game` button into the private-games panel, merge the applications section into recruitment cards as status chips, and add DEV-only synthetic-report loader as a top-level screen. Add a paid-tier gate at backend `lobby.game.create`: free callers get `403 forbidden` before the lobby service is invoked. The UI hides the private-games sub-panel + create button on free tier (DEV affordances flag overrides). Update every integration test that creates a game to use a new `testenv.PromoteToPaid` helper; add a new `TestLobbyFlow_FreeUserCreateGameForbidden`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8565942392 |
feat(deploy): single-origin path-based deployment + project site
Serve the whole stack behind one host: site at /, game UI at /game/, gateway REST at /api + /healthz, Connect at /rpc (prefix stripped by the edge Caddy). The built artifact is domain-agnostic — the UI talks to the gateway same-origin via relative URLs, so the same bundle runs under any host with no rebuild and with CORS disabled. - Rename the Connect proto service galaxy.gateway.v1.EdgeGateway -> edge.v1.Gateway; regenerate Go + TS; public path /rpc/edge.v1.Gateway. - Move the game UI under base path /game (env BASE_PATH); make the manifest, service-worker scope, WASM loader, and all navigation base-aware via a withBase helper. - Relative API + /rpc Connect prefix; Vite dev proxy mirrors the strip. - Rewrite the edge Caddy (dev + prod) for path-based routing; empty CORS allow-lists (same-origin); single host. - New VitePress project site (site/): i18n en/ru with switcher, LaTeX math, minimal monospace theme; built and served at /. - dev-deploy compose/Makefile + CI (dev-deploy, prod-build, new site-build) build and seed the site; probes hit /, /game/, /healthz. - Sync docs (ARCHITECTURE, gateway README/openapi, dev-deploy & local-dev READMEs, CLAUDE.md, ui/PLAN). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
14b65389ef |
feat(gateway): unsigned gateway.heartbeat keeps Safari push streams alive
Tests · UI / test (push) Successful in 2m35s
Tests · Go / test (push) Successful in 1m56s
Tests · UI / test (pull_request) Has been cancelled
Tests · Integration / integration (pull_request) Successful in 1m42s
Tests · Go / test (pull_request) Successful in 2m0s
Browser fetch-streaming layers close response bodies they consider
idle after roughly 15-30 s without incoming bytes. Safari is the
most aggressive, but the symptom matters everywhere: a quiet
SubscribeEvents stream (lobby, between turns, mailbox empty) gets
torn down by the browser, the EventStream singleton reconnects with
backoff, and any push event that fires inside the reconnect window
is lost because `push.Hub` queues are not persisted across
subscription closes. The user-visible failure mode is the
intermittent "Fetch API cannot load … due to access control checks"
console error (a misleading WebKit symptom — CORS headers are
actually present) plus missed turn-ready / mail-received toasts.
Server-side fix: a silence-based heartbeat at the
`authenticatedPushStreamService` wrapper layer. After the signed
`gateway.server_time` bootstrap event, gateway wraps the bound
stream with `heartbeatingStream`. Every tail Send (fan-out, future
variants) resets the silence timer; when the timer elapses, a
goroutine emits `gateway.heartbeat` with only `EventType` set —
everything else stays at proto3 defaults, so the wire frame is
~45 bytes amortised. A `sendMu` serialises the heartbeat goroutine
with tail Sends because grpc.ServerStream.Send is not goroutine-safe.
The heartbeat is intentionally UNSIGNED: heartbeats carry no
payload, dispatch to no handler on the client, and an injected
heartbeat trivially causes no user-visible state change. TLS still
protects the wire and real events keep the signed envelope
unchanged. Documented in `docs/ARCHITECTURE.md` § 15 alongside the
per-scale bandwidth projection (100…100 000 clients × 15…60 s).
Config: new `GATEWAY_PUSH_HEARTBEAT_INTERVAL` (default `15s`,
`0s` disables). Telemetry: new
`gateway.push.heartbeats_sent{outcome}` counter so operators can
budget bandwidth and spot a sudden `outcome=error` bump as an
upstream-failing-before-flush signal.
Client (`ui/frontend/src/api/events.svelte.ts`): early `continue`
on `event.eventType === "gateway.heartbeat"` before `verifyEvent`,
`verifyPayloadHash`, or dispatch — empty signature would otherwise
trip SignatureError and reconnect. A leading heartbeat still flips
`connectionStatus` to `connected` and resets backoff, because
receiving one is proof the stream is healthy.
Tests:
- `push_heartbeat_test.go`: unit tests for the wrapper — zero
interval returns nil, heartbeat fires after silence, real Send
resets the timer, Stop / context-cancel halt the goroutine,
Send errors propagate.
- `server_test.go`: integration tests through the full gateway
pipeline — heartbeat fires after the configured silence window,
zero interval keeps the stream silent.
- `config_test.go`: default applied, env-override parsed,
negative value rejected.
- `events.test.ts`: heartbeat skipped before verification + not
dispatched to handlers; leading heartbeat still flips
`connectionStatus` to `connected`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a338ebf058 |
fix(integration): scope preclean to galaxy.stack=integration
Tests · Integration / integration (pull_request) Successful in 1m37s
Root cause for the long-standing "Dev Sandbox flips to cancelled after dev-deploy" symptom in push-triggered cycles: when `integration.yaml` runs in parallel with `dev-deploy.yaml`, its `integration/scripts/preclean.sh` issues a `docker rm -f` over every container labelled `galaxy.backend=1`. That label is stamped by the backend's runtime adapter on every engine it spawns — including the engines living in the long-lived dev-deploy environment on the same Docker daemon. Each post-merge auto-deploy therefore had the integration preclean wipe the dev-sandbox engine, and the new backend's reconciler tick observed `container disappeared` and cascaded the sandbox into `cancelled`. Fix: - `integration/testenv/backend.go` now sets `BACKEND_STACK_LABEL=integration` on every backend-under-test, so the engines spawned by integration carry `galaxy.stack=integration` in addition to `galaxy.backend=1`. The backend support for this env was added in the previous CI tidy-up PR (#13). - `integration/scripts/preclean.sh` gains a multi-label AND filter helper and uses it to scope engine cleanup to the combination `galaxy.backend=1 AND galaxy.stack=integration`. dev-deploy and local-dev engines carry different `galaxy.stack` values, so the AND match leaves them alone. - `docs/ARCHITECTURE.md` "Container labels" — refreshed to call out the AND-scoping rule and the new integration backend stamp. - `tools/dev-deploy/KNOWN-ISSUES.md` — the sandbox-cancel entry gets an "Update" section recording the root cause and the fix; the status is downgraded to "partially fixed" because the solo `workflow_dispatch` reproduction (which does NOT trigger integration) remains unexplained. - `tools/dev-deploy/KNOWN-ISSUES.md` — separately, document the `docker restart galaxy-dev-backend` failure caused by the runner-workspace bind-mount that surfaced while diagnosing this issue. Workaround: `make -C tools/dev-deploy up` from the persistent checkout. Real fix is a follow-up (bake fixture into image or copy to named volume). Verification: - `go build ./backend/... ./integration/...` — clean. - `bash -n integration/scripts/preclean.sh` — syntax OK. - Live AND-filter check on the dev host: `docker ps -aq --filter label=galaxy.backend=1 --filter label=galaxy.stack=integration` returns nothing while the dev-deploy engine `galaxy-game-80f3ce86-...` keeps running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
daed2690c1 |
fix(compose): keep galaxy.stack label on containers only
The previous commit stamped `galaxy.stack=<value>` on services, volumes, and networks. Putting it on volumes/networks changes their compose config-hash on every label revision, so `docker compose up` tries to recreate them — which on the long-lived dev environment either destroys the postgres data volume or deadlocks while trying to remove `galaxy-dev-internal` with containers still bound to it. Observed live: run #184 hung in compose recreate after the three stateful services were stopped, with no recovery. Containers alone are sufficient for the cleanup contract (we filter containers, not volumes or networks). Roll back the label on volumes and networks in both compose files and capture the rule in docs/ARCHITECTURE.md so the next contributor does not reintroduce it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
a9087691a3 |
chore(ci): tidy CI/dev infra — drop local-ci, lift migration rule, scope by galaxy.stack label
Five connected cleanups across the dev/CI infrastructure:
1. Drop tools/local-ci/. The standalone Gitea + act_runner stack was
the legacy "offline workflow validator"; the per-stage CI gate now
runs on gitea.lan and the directory was only retained as a
fallback. Removing it leaves no operational dependency: backend,
gateway, and game code have no references; documentation that
pointed at it (CLAUDE.md, docs/ARCHITECTURE.md, ui/docs/testing.md,
tools/dev-deploy/README.md, tools/local-dev/README.md) is updated
in this same change. Historical "Verified on local-ci run N"
markers in ui/PLAN.md are preserved unchanged.
2. Lift the pre-production single-migration rule. The rule forced
every schema delta into 00001_init.sql and required a manual
make clean-data wipe on every backward-incompatible change in
tools/dev-deploy/. Future schema deltas now land as additive
sequence-numbered files (00002_*.sql, …) that goose applies
automatically on backend startup; 00001_init.sql becomes an
immutable baseline. Authoring conventions live in
backend/internal/postgres/migrations/README.md. The chain may be
squashed back into a fresh 00001 as a deliberate one-time
operation before the first production deployment.
3. Document the deployment cadence. The dev environment is
single-tenant: pushes to feature/* run the test workflows
(go-unit, ui-test, integration) only; dev-deploy.yaml fires on
push to development. A workflow_dispatch override on
dev-deploy.yaml lets a developer preview a feature branch on the
shared dev environment before merge; the next merge into
development overwrites the manual deploy idempotently.
4. Scope compose-managed resources by an explicit
galaxy.stack=<local-dev|dev-deploy> label. Both compose files
stamp the label on every service, network, and named volume.
Makefiles in tools/local-dev/ and tools/dev-deploy/ filter their
engine-cleanup operations by (stack-label AND engine OCI title)
so they never touch unrelated workloads on the same daemon.
dev-deploy.yaml gains a pre-`compose up` step that reaps stale
exited/dead containers under the dev-deploy stack label.
5. Backend now stamps the same galaxy.stack=<value> label on every
engine container it spawns, sourced from a new BACKEND_STACK_LABEL
env var (empty → label not applied; legacy-safe). Both compose
files set it to their stack name (local-dev / dev-deploy). The
contract is recorded in docs/ARCHITECTURE.md under
"Container labels". A package-level test in
backend/internal/runtime exercises both the label-present and
label-absent paths.
No tests intentionally regressed: go test ./backend/internal/{config,
runtime,dockerclient} is green, both compose files validate cleanly,
and the backend, gateway, and game modules all build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
535e27008f |
diplomail (Stage A): add in-game personal mail subsystem
Phase 28 of ui/PLAN.md needs a persistent player-to-player mail channel; the existing `mail` package is a transactional email outbox and the `notification` catalog is one-way platform events. Stage A lands the schema (diplomail_messages / _recipients / _translations), a single-recipient personal send/read/delete service path, a `diplomail.message.received` push kind plumbed through the notification pipeline, and an unread-counts endpoint that drives the lobby badge. Admin / system mail, lifecycle hooks, paid-tier broadcast, multi-game broadcast, bulk purge and language detection / translation cache come in stages B–D. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f00c8efd18 |
docs: sync project guides to the new CI flow
Aligns the project guides with the branching/CI/environment changes landed in the previous commits: - CLAUDE.md: per-stage CI gate now closes against gitea.lan; describes the main/development/feature/* flow and the workflow surface - docs/ARCHITECTURE.md: new section 18 "CI and Environments" covering branches, workflows, and the local-dev / dev-deploy / local-ci triad; section numbering shifted accordingly - tools/local-ci/README.md: marked as fallback (offline / runner isolation only) - tools/local-dev/README.md and ui/README.md: cross-link to tools/dev-deploy/ for production-shaped testing Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
2ca47eb4df |
ui/phase-25: backend turn-cutoff guard + auto-pause + UI sync protocol
Backend now owns the turn-cutoff and pause guards the order tab relies on: the scheduler flips runtime_status between generation_in_progress and running around every engine tick, a failed tick auto-pauses the game through OnRuntimeSnapshot, and a new game.paused notification kind fans out alongside game.turn.ready. The user-games handlers reject submits with HTTP 409 turn_already_closed or game_paused depending on the runtime state. UI delegates auto-sync to a new OrderQueue: offline detection, single retry on reconnect, conflict / paused classification. OrderDraftStore surfaces conflictBanner / pausedBanner runes, clears them on local mutation or on a game.turn.ready push via resetForNewTurn. The order tab renders the matching banners and the new conflict per-row badge; i18n bundles cover en + ru. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f80c623a74 |
ui/phase-14: rename planet end-to-end + order read-back
Wires the first end-to-end command through the full pipeline:
inspector rename action → local order draft → user.games.order
submit → optimistic overlay on map / inspector → server hydration
on cache miss via the new user.games.order.get message type.
Backend: GET /api/v1/user/games/{id}/orders forwards to engine
GET /api/v1/order. Gateway parses the engine PUT response into the
extended UserGamesOrderResponse FBS envelope and adds
executeUserGamesOrderGet for the read-back path. Frontend ports
ValidateTypeName to TS, lands the inline rename editor + Submit
button, and exposes a renderedReport context so consumers see the
overlay-applied snapshot.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
f57a290432 |
phase 8: lobby UI + cross-stack lobby command catalog + TS FlatBuffers
- Extend pkg/model/lobby and pkg/schema/fbs/lobby.fbs with public-games
list, my-applications/invites lists, game-create, application-submit,
invite-redeem/decline. Mirror the matching transcoder pairs and Go
fixture round-trip tests.
- Wire the seven new lobby message types through
gateway/internal/backendclient/{routes,lobby_commands}.go with
per-command REST helpers, JSON-tolerant decoding of backend wire
shapes, and httptest-based unit coverage for success / 4xx / 5xx /
503 across each command.
- Introduce TS-side FlatBuffers via the `flatbuffers` runtime dep, a
`make fbs-ts` target driving flatc, and the generated bindings under
ui/frontend/src/proto/galaxy/fbs. Phase 7's `user.account.get` decode
now uses these bindings as well, closing the JSON.parse vs
FlatBuffers gap that would have failed against a real local stack.
- Replace the placeholder lobby with five sections (my games, pending
invitations, my applications, public games, create new game) and the
/lobby/create form. Submit-application uses an inline race-name
form on the public-game card; create-game keeps name / description /
turn_schedule / enrollment_ends_at always visible and the rest under
an Advanced toggle with TS-side defaults.
- Update lobby/+page.svelte to throw LobbyError on non-ok result codes;
GalaxyClient.executeCommand now returns { resultCode, payloadBytes }.
- Vitest binding round-trips, lobby.ts wrapper unit tests, lobby-page
+ lobby-create component tests, Playwright lobby-flow.spec covering
create / submit / accept across all four projects. Phase 7 e2e was
migrated to the FlatBuffers fixtures and to click+fill against the
Safari-autofill readonly inputs.
- Mark Phase 8 done in ui/PLAN.md, mirror the wire-format note into
Phase 7, append the new lobby commands to gateway/README.md and
docs/ARCHITECTURE.md, add ui/docs/lobby.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
ecd2bc9348 |
phase 6: web storage layer (KeyStore, Cache, session)
KeyStore + Cache TS interfaces with WebCrypto non-extractable Ed25519 keys persisted via IndexedDB (idb), plus thin api/session.ts that loads or creates the device session at app startup. Vitest unit tests under fake-indexeddb cover both adapters; Playwright e2e verifies the keypair survives reload and produces signatures still verifiable under the persisted public key (gateway round-trip moves to Phase 7's existing acceptance bullet). Browser baseline: WebCrypto Ed25519 — Chrome >=137, Firefox >=130, Safari >=17.4. No JS fallback; ui/docs/storage.md documents the matrix and the WebKit non-determinism quirk. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
118f7c17a2 |
phase 4: connectrpc on the gateway authenticated edge
Replace the native-gRPC server bootstrap with a single `connectrpc.com/connect` HTTP/h2c listener. Connect-Go natively serves Connect, gRPC, and gRPC-Web on the same port, so browsers can now reach the authenticated surface without giving up the gRPC framing native and desktop clients may use later. The decorator stack (envelope → session → payload-hash → signature → freshness/replay → rate-limit → routing/push) is reused unchanged behind a small Connect → gRPC adapter and a `grpc.ServerStream` shim around `*connect.ServerStream`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
604fe40bcf | docs: reorder & testing |