Commit Graph

15 Commits

Author SHA1 Message Date
Ilia Denisov 15d35f6f1f feat(game): canonical gameId in POST /api/v1/admin/init
Tests · Go / test (push) Successful in 1m57s
Tests · Integration / integration (pull_request) Successful in 1m48s
Tests · Go / test (pull_request) Successful in 2m0s
Engine no longer mints its own game UUID. The orchestrator (backend)
generates the game UUID at game-create time and passes it in the
admin/init request body as the required `gameId` field, so the value
that names the engine container and host bind-mount directory also
ends up inside the engine's state.json.

The engine rejects the zero UUID with 400 and any init that conflicts
with an existing state.json with 409 (a second init on the same gameId
is also a conflict; full idempotency is not part of the contract).

Updates rest.InitRequest, openapi.yaml (schema + 409 response),
controller.GenerateGame/NewGame/buildGameOnMap signatures, the engine
HTTP handler/executor, the backend runtime worker, and the relevant
unit and contract tests. Documentation in game/README.md,
docs/ARCHITECTURE.md, backend/README.md, and backend/docs/{runtime,flows}.md
is updated in the same patch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 13:13:31 +02:00
Ilia Denisov 009ea560f9 feat(lobby): F8-04b hierarchical sidebar + paid-tier gate for create-game
Tests · Go / test (push) Successful in 2m17s
Tests · UI / test (push) Waiting to run
Reshape the lobby UI from a single Overview into a two-level sidebar
(games · profile · DEV synthetic-reports) with four games sub-panels
(active-past · recruitment · invitations · private-games). Move the
`create new game` button into the private-games panel, merge the
applications section into recruitment cards as status chips, and add
DEV-only synthetic-report loader as a top-level screen.

Add a paid-tier gate at backend `lobby.game.create`: free callers get
`403 forbidden` before the lobby service is invoked. The UI hides the
private-games sub-panel + create button on free tier (DEV affordances
flag overrides). Update every integration test that creates a game to
use a new `testenv.PromoteToPaid` helper; add a new
`TestLobbyFlow_FreeUserCreateGameForbidden`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 23:53:53 +02:00
Ilia Denisov 8565942392 feat(deploy): single-origin path-based deployment + project site
Build · Site / build (push) Successful in 8s
Tests · Go / test (push) Successful in 2m22s
Tests · UI / test (push) Failing after 2m42s
Serve the whole stack behind one host: site at /, game UI at /game/,
gateway REST at /api + /healthz, Connect at /rpc (prefix stripped by the
edge Caddy). The built artifact is domain-agnostic — the UI talks to the
gateway same-origin via relative URLs, so the same bundle runs under any
host with no rebuild and with CORS disabled.

- Rename the Connect proto service galaxy.gateway.v1.EdgeGateway ->
  edge.v1.Gateway; regenerate Go + TS; public path /rpc/edge.v1.Gateway.
- Move the game UI under base path /game (env BASE_PATH); make the
  manifest, service-worker scope, WASM loader, and all navigation
  base-aware via a withBase helper.
- Relative API + /rpc Connect prefix; Vite dev proxy mirrors the strip.
- Rewrite the edge Caddy (dev + prod) for path-based routing; empty CORS
  allow-lists (same-origin); single host.
- New VitePress project site (site/): i18n en/ru with switcher, LaTeX
  math, minimal monospace theme; built and served at /.
- dev-deploy compose/Makefile + CI (dev-deploy, prod-build, new
  site-build) build and seed the site; probes hit /, /game/, /healthz.
- Sync docs (ARCHITECTURE, gateway README/openapi, dev-deploy &
  local-dev READMEs, CLAUDE.md, ui/PLAN).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:19:07 +02:00
Ilia Denisov 14b65389ef feat(gateway): unsigned gateway.heartbeat keeps Safari push streams alive
Tests · UI / test (push) Successful in 2m35s
Tests · Go / test (push) Successful in 1m56s
Tests · UI / test (pull_request) Has been cancelled
Tests · Integration / integration (pull_request) Successful in 1m42s
Tests · Go / test (pull_request) Successful in 2m0s
Browser fetch-streaming layers close response bodies they consider
idle after roughly 15-30 s without incoming bytes. Safari is the
most aggressive, but the symptom matters everywhere: a quiet
SubscribeEvents stream (lobby, between turns, mailbox empty) gets
torn down by the browser, the EventStream singleton reconnects with
backoff, and any push event that fires inside the reconnect window
is lost because `push.Hub` queues are not persisted across
subscription closes. The user-visible failure mode is the
intermittent "Fetch API cannot load … due to access control checks"
console error (a misleading WebKit symptom — CORS headers are
actually present) plus missed turn-ready / mail-received toasts.

Server-side fix: a silence-based heartbeat at the
`authenticatedPushStreamService` wrapper layer. After the signed
`gateway.server_time` bootstrap event, gateway wraps the bound
stream with `heartbeatingStream`. Every tail Send (fan-out, future
variants) resets the silence timer; when the timer elapses, a
goroutine emits `gateway.heartbeat` with only `EventType` set —
everything else stays at proto3 defaults, so the wire frame is
~45 bytes amortised. A `sendMu` serialises the heartbeat goroutine
with tail Sends because grpc.ServerStream.Send is not goroutine-safe.

The heartbeat is intentionally UNSIGNED: heartbeats carry no
payload, dispatch to no handler on the client, and an injected
heartbeat trivially causes no user-visible state change. TLS still
protects the wire and real events keep the signed envelope
unchanged. Documented in `docs/ARCHITECTURE.md` § 15 alongside the
per-scale bandwidth projection (100…100 000 clients × 15…60 s).

Config: new `GATEWAY_PUSH_HEARTBEAT_INTERVAL` (default `15s`,
`0s` disables). Telemetry: new
`gateway.push.heartbeats_sent{outcome}` counter so operators can
budget bandwidth and spot a sudden `outcome=error` bump as an
upstream-failing-before-flush signal.

Client (`ui/frontend/src/api/events.svelte.ts`): early `continue`
on `event.eventType === "gateway.heartbeat"` before `verifyEvent`,
`verifyPayloadHash`, or dispatch — empty signature would otherwise
trip SignatureError and reconnect. A leading heartbeat still flips
`connectionStatus` to `connected` and resets backoff, because
receiving one is proof the stream is healthy.

Tests:
- `push_heartbeat_test.go`: unit tests for the wrapper — zero
  interval returns nil, heartbeat fires after silence, real Send
  resets the timer, Stop / context-cancel halt the goroutine,
  Send errors propagate.
- `server_test.go`: integration tests through the full gateway
  pipeline — heartbeat fires after the configured silence window,
  zero interval keeps the stream silent.
- `config_test.go`: default applied, env-override parsed,
  negative value rejected.
- `events.test.ts`: heartbeat skipped before verification + not
  dispatched to handlers; leading heartbeat still flips
  `connectionStatus` to `connected`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:29:29 +02:00
Ilia Denisov a338ebf058 fix(integration): scope preclean to galaxy.stack=integration
Tests · Integration / integration (pull_request) Successful in 1m37s
Root cause for the long-standing "Dev Sandbox flips to cancelled
after dev-deploy" symptom in push-triggered cycles: when
`integration.yaml` runs in parallel with `dev-deploy.yaml`, its
`integration/scripts/preclean.sh` issues a `docker rm -f` over every
container labelled `galaxy.backend=1`. That label is stamped by the
backend's runtime adapter on every engine it spawns — including the
engines living in the long-lived dev-deploy environment on the same
Docker daemon. Each post-merge auto-deploy therefore had the
integration preclean wipe the dev-sandbox engine, and the new
backend's reconciler tick observed `container disappeared` and
cascaded the sandbox into `cancelled`.

Fix:

- `integration/testenv/backend.go` now sets
  `BACKEND_STACK_LABEL=integration` on every backend-under-test, so
  the engines spawned by integration carry
  `galaxy.stack=integration` in addition to `galaxy.backend=1`. The
  backend support for this env was added in the previous CI tidy-up
  PR (#13).

- `integration/scripts/preclean.sh` gains a multi-label AND filter
  helper and uses it to scope engine cleanup to the combination
  `galaxy.backend=1 AND galaxy.stack=integration`. dev-deploy and
  local-dev engines carry different `galaxy.stack` values, so the
  AND match leaves them alone.

- `docs/ARCHITECTURE.md` "Container labels" — refreshed to call out
  the AND-scoping rule and the new integration backend stamp.

- `tools/dev-deploy/KNOWN-ISSUES.md` — the sandbox-cancel entry
  gets an "Update" section recording the root cause and the fix; the
  status is downgraded to "partially fixed" because the solo
  `workflow_dispatch` reproduction (which does NOT trigger
  integration) remains unexplained.

- `tools/dev-deploy/KNOWN-ISSUES.md` — separately, document the
  `docker restart galaxy-dev-backend` failure caused by the
  runner-workspace bind-mount that surfaced while diagnosing this
  issue. Workaround: `make -C tools/dev-deploy up` from the
  persistent checkout. Real fix is a follow-up (bake fixture into
  image or copy to named volume).

Verification:

- `go build ./backend/... ./integration/...` — clean.
- `bash -n integration/scripts/preclean.sh` — syntax OK.
- Live AND-filter check on the dev host:
  `docker ps -aq --filter label=galaxy.backend=1 --filter label=galaxy.stack=integration`
  returns nothing while the dev-deploy engine
  `galaxy-game-80f3ce86-...` keeps running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 01:37:55 +02:00
Ilia Denisov daed2690c1 fix(compose): keep galaxy.stack label on containers only
Tests · Integration / integration (pull_request) Successful in 1m41s
Tests · Go / test (pull_request) Successful in 2m0s
The previous commit stamped `galaxy.stack=<value>` on services,
volumes, and networks. Putting it on volumes/networks changes their
compose config-hash on every label revision, so `docker compose up`
tries to recreate them — which on the long-lived dev environment
either destroys the postgres data volume or deadlocks while trying
to remove `galaxy-dev-internal` with containers still bound to it.
Observed live: run #184 hung in compose recreate after the three
stateful services were stopped, with no recovery.

Containers alone are sufficient for the cleanup contract (we filter
containers, not volumes or networks). Roll back the label on volumes
and networks in both compose files and capture the rule in
docs/ARCHITECTURE.md so the next contributor does not reintroduce it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 01:00:21 +02:00
Ilia Denisov a9087691a3 chore(ci): tidy CI/dev infra — drop local-ci, lift migration rule, scope by galaxy.stack label
Tests · Go / test (push) Successful in 2m6s
Tests · Go / test (pull_request) Successful in 3m1s
Tests · Integration / integration (pull_request) Successful in 1m42s
Five connected cleanups across the dev/CI infrastructure:

1. Drop tools/local-ci/. The standalone Gitea + act_runner stack was
   the legacy "offline workflow validator"; the per-stage CI gate now
   runs on gitea.lan and the directory was only retained as a
   fallback. Removing it leaves no operational dependency: backend,
   gateway, and game code have no references; documentation that
   pointed at it (CLAUDE.md, docs/ARCHITECTURE.md, ui/docs/testing.md,
   tools/dev-deploy/README.md, tools/local-dev/README.md) is updated
   in this same change. Historical "Verified on local-ci run N"
   markers in ui/PLAN.md are preserved unchanged.

2. Lift the pre-production single-migration rule. The rule forced
   every schema delta into 00001_init.sql and required a manual
   make clean-data wipe on every backward-incompatible change in
   tools/dev-deploy/. Future schema deltas now land as additive
   sequence-numbered files (00002_*.sql, …) that goose applies
   automatically on backend startup; 00001_init.sql becomes an
   immutable baseline. Authoring conventions live in
   backend/internal/postgres/migrations/README.md. The chain may be
   squashed back into a fresh 00001 as a deliberate one-time
   operation before the first production deployment.

3. Document the deployment cadence. The dev environment is
   single-tenant: pushes to feature/* run the test workflows
   (go-unit, ui-test, integration) only; dev-deploy.yaml fires on
   push to development. A workflow_dispatch override on
   dev-deploy.yaml lets a developer preview a feature branch on the
   shared dev environment before merge; the next merge into
   development overwrites the manual deploy idempotently.

4. Scope compose-managed resources by an explicit
   galaxy.stack=<local-dev|dev-deploy> label. Both compose files
   stamp the label on every service, network, and named volume.
   Makefiles in tools/local-dev/ and tools/dev-deploy/ filter their
   engine-cleanup operations by (stack-label AND engine OCI title)
   so they never touch unrelated workloads on the same daemon.
   dev-deploy.yaml gains a pre-`compose up` step that reaps stale
   exited/dead containers under the dev-deploy stack label.

5. Backend now stamps the same galaxy.stack=<value> label on every
   engine container it spawns, sourced from a new BACKEND_STACK_LABEL
   env var (empty → label not applied; legacy-safe). Both compose
   files set it to their stack name (local-dev / dev-deploy). The
   contract is recorded in docs/ARCHITECTURE.md under
   "Container labels". A package-level test in
   backend/internal/runtime exercises both the label-present and
   label-absent paths.

No tests intentionally regressed: go test ./backend/internal/{config,
runtime,dockerclient} is green, both compose files validate cleanly,
and the backend, gateway, and game modules all build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 23:32:42 +02:00
Ilia Denisov 535e27008f diplomail (Stage A): add in-game personal mail subsystem
Tests · Go / test (push) Successful in 1m44s
Tests · Integration / integration (pull_request) Successful in 1m44s
Tests · Go / test (pull_request) Successful in 2m45s
Phase 28 of ui/PLAN.md needs a persistent player-to-player mail
channel; the existing `mail` package is a transactional email
outbox and the `notification` catalog is one-way platform events.
Stage A lands the schema (diplomail_messages / _recipients /
_translations), a single-recipient personal send/read/delete
service path, a `diplomail.message.received` push kind plumbed
through the notification pipeline, and an unread-counts endpoint
that drives the lobby badge. Admin / system mail, lifecycle hooks,
paid-tier broadcast, multi-game broadcast, bulk purge and language
detection / translation cache come in stages B–D.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:28:55 +02:00
Ilia Denisov f00c8efd18 docs: sync project guides to the new CI flow
go-unit / test (pull_request) Failing after 30s
integration / integration (pull_request) Failing after 34s
ui-test / test (pull_request) Failing after 37s
Aligns the project guides with the branching/CI/environment changes
landed in the previous commits:

- CLAUDE.md: per-stage CI gate now closes against gitea.lan; describes
  the main/development/feature/* flow and the workflow surface
- docs/ARCHITECTURE.md: new section 18 "CI and Environments" covering
  branches, workflows, and the local-dev / dev-deploy / local-ci
  triad; section numbering shifted accordingly
- tools/local-ci/README.md: marked as fallback (offline / runner
  isolation only)
- tools/local-dev/README.md and ui/README.md: cross-link to
  tools/dev-deploy/ for production-shaped testing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 23:26:57 +02:00
Ilia Denisov 2ca47eb4df ui/phase-25: backend turn-cutoff guard + auto-pause + UI sync protocol
Backend now owns the turn-cutoff and pause guards the order tab
relies on: the scheduler flips runtime_status between
generation_in_progress and running around every engine tick, a
failed tick auto-pauses the game through OnRuntimeSnapshot, and a
new game.paused notification kind fans out alongside
game.turn.ready. The user-games handlers reject submits with
HTTP 409 turn_already_closed or game_paused depending on the
runtime state.

UI delegates auto-sync to a new OrderQueue: offline detection,
single retry on reconnect, conflict / paused classification.
OrderDraftStore surfaces conflictBanner / pausedBanner runes,
clears them on local mutation or on a game.turn.ready push via
resetForNewTurn. The order tab renders the matching banners and
the new conflict per-row badge; i18n bundles cover en + ru.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:00:16 +02:00
Ilia Denisov f80c623a74 ui/phase-14: rename planet end-to-end + order read-back
Wires the first end-to-end command through the full pipeline:
inspector rename action → local order draft → user.games.order
submit → optimistic overlay on map / inspector → server hydration
on cache miss via the new user.games.order.get message type.

Backend: GET /api/v1/user/games/{id}/orders forwards to engine
GET /api/v1/order. Gateway parses the engine PUT response into the
extended UserGamesOrderResponse FBS envelope and adds
executeUserGamesOrderGet for the read-back path. Frontend ports
ValidateTypeName to TS, lands the inline rename editor + Submit
button, and exposes a renderedReport context so consumers see the
overlay-applied snapshot.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 11:50:09 +02:00
Ilia Denisov f57a290432 phase 8: lobby UI + cross-stack lobby command catalog + TS FlatBuffers
- Extend pkg/model/lobby and pkg/schema/fbs/lobby.fbs with public-games
  list, my-applications/invites lists, game-create, application-submit,
  invite-redeem/decline. Mirror the matching transcoder pairs and Go
  fixture round-trip tests.
- Wire the seven new lobby message types through
  gateway/internal/backendclient/{routes,lobby_commands}.go with
  per-command REST helpers, JSON-tolerant decoding of backend wire
  shapes, and httptest-based unit coverage for success / 4xx / 5xx /
  503 across each command.
- Introduce TS-side FlatBuffers via the `flatbuffers` runtime dep, a
  `make fbs-ts` target driving flatc, and the generated bindings under
  ui/frontend/src/proto/galaxy/fbs. Phase 7's `user.account.get` decode
  now uses these bindings as well, closing the JSON.parse vs
  FlatBuffers gap that would have failed against a real local stack.
- Replace the placeholder lobby with five sections (my games, pending
  invitations, my applications, public games, create new game) and the
  /lobby/create form. Submit-application uses an inline race-name
  form on the public-game card; create-game keeps name / description /
  turn_schedule / enrollment_ends_at always visible and the rest under
  an Advanced toggle with TS-side defaults.
- Update lobby/+page.svelte to throw LobbyError on non-ok result codes;
  GalaxyClient.executeCommand now returns { resultCode, payloadBytes }.
- Vitest binding round-trips, lobby.ts wrapper unit tests, lobby-page
  + lobby-create component tests, Playwright lobby-flow.spec covering
  create / submit / accept across all four projects. Phase 7 e2e was
  migrated to the FlatBuffers fixtures and to click+fill against the
  Safari-autofill readonly inputs.
- Mark Phase 8 done in ui/PLAN.md, mirror the wire-format note into
  Phase 7, append the new lobby commands to gateway/README.md and
  docs/ARCHITECTURE.md, add ui/docs/lobby.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 18:05:08 +02:00
Ilia Denisov ecd2bc9348 phase 6: web storage layer (KeyStore, Cache, session)
KeyStore + Cache TS interfaces with WebCrypto non-extractable Ed25519
keys persisted via IndexedDB (idb), plus thin api/session.ts that
loads or creates the device session at app startup. Vitest unit
tests under fake-indexeddb cover both adapters; Playwright e2e
verifies the keypair survives reload and produces signatures still
verifiable under the persisted public key (gateway round-trip moves
to Phase 7's existing acceptance bullet).

Browser baseline: WebCrypto Ed25519 — Chrome >=137, Firefox >=130,
Safari >=17.4. No JS fallback; ui/docs/storage.md documents the
matrix and the WebKit non-determinism quirk.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 14:08:09 +02:00
Ilia Denisov 118f7c17a2 phase 4: connectrpc on the gateway authenticated edge
Replace the native-gRPC server bootstrap with a single
`connectrpc.com/connect` HTTP/h2c listener. Connect-Go natively
serves Connect, gRPC, and gRPC-Web on the same port, so browsers can
now reach the authenticated surface without giving up the gRPC
framing native and desktop clients may use later. The decorator
stack (envelope → session → payload-hash → signature →
freshness/replay → rate-limit → routing/push) is reused unchanged
behind a small Connect → gRPC adapter and a `grpc.ServerStream`
shim around `*connect.ServerStream`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 11:49:28 +02:00
Ilia Denisov 604fe40bcf docs: reorder & testing 2026-05-07 00:58:53 +03:00