Stand up a production-mirror monitoring stack in the long-lived dev
contour, all on galaxy-dev-internal with no host ports (reached only via
the in-repo galaxy-dev-caddy):
- Prometheus scrapes backend:9100, gateway:9191, node-exporter and
cadvisor (30s interval, 15d retention); Loki (7d) + promtail (Docker
service discovery by the galaxy.stack=dev-deploy label) for logs;
Tempo (3d) for traces.
- Backend and gateway now export OTLP traces to Tempo over plaintext
gRPC on the internal network (OTEL_EXPORTER_OTLP_INSECURE).
- Grafana provisioned as code (Prometheus/Loki/Tempo datasources plus a
starter dashboard), served under /grafana/ via Caddy sub-path mode;
admin password from the GALAXY_DEV_GRAFANA_ADMIN_PASSWORD secret.
- Expose the Mailpit capture UI under /mailpit/ (Caddy basic-auth +
MP_WEBROOT) so every captured message is readable regardless of relay.
- dev-deploy.yaml seeds the monitoring config to a stable, reboot-
surviving host path and injects the Grafana admin secret.
Per-service memory limits keep the footprint within budget. All
collector config lives under tools/dev-deploy/monitoring/ for dev/prod
parity.
Keep Mailpit as the backend's SMTP submission point and turn on its
relay so OTP/notification mail addressed to the owner reaches a real
Gmail inbox, while everything else stays captured-only.
- mailpit gains --smtp-relay-config + --smtp-relay-matching (default
non-routable, so an unconfigured stack only captures); relay.conf is
mounted from a new galaxy-dev-mailpit-config volume
- tools/dev-deploy/mailpit/relay.conf.tmpl + a dev-deploy.yaml step that
renders it from Gitea secrets (Gmail App Password, never committed)
and seeds the volume; the GALAXY_DEV_MAIL_RELAY_MATCH var drives the
relay-matching recipient
- backend SMTP config unchanged (still -> galaxy-mailpit:1025)
- dev-deploy README documents the relay + required secrets/vars
Verified locally: compose config valid; the rendered relay.conf is
accepted by mailpit v1.21.8 (relay + recipient-matching enabled).
Real Gmail delivery is verified at the dev-deploy preview once the
owner sets the secrets.
Stage 2 of the dev-as-prod-mirror rework. The legacy-report (synthetic)
report loader is already available in the dev-deploy UI: it is gated by
the build-time flag VITE_GALAXY_DEV_AFFORDANCES (set "true" in
dev-deploy.yaml line 89, unset in prod-build.yaml so prod strips it),
not by import.meta.env.DEV. Correct the stale header comment that
claimed import.meta.env.DEV. No functional change — the desired
"loader in dev, absent in prod" posture already holds.
Stage 1 of the dev-as-prod-mirror rework. The auto-provisioned "Dev
Sandbox" game and dummy users are removed so the dev contour starts
empty like prod; the separate legacy-report loader stays as the
test-data path.
- delete backend/internal/devsandbox (package + tests)
- drop the bootstrap call + DevSandboxConfig (struct, Config field,
BACKEND_DEV_SANDBOX_* env, defaults, loader, validation)
- strip BACKEND_DEV_SANDBOX_* from dev-deploy + local-dev compose and
.env.example; the generic engine-recycle / prune-broken-engines logic
stays (it serves real games)
- update tooling docs (dev-deploy README + KNOWN-ISSUES, local-dev
README + Makefile) and stale comments; DeleteGame and
InsertMembershipDirect remain (exercised by lobby integration tests)
No app behaviour change beyond not auto-creating the sandbox game.
Add the mail, notifications, and broadcast pages over the mail, notification,
and diplomail services (no new business logic), completing the operator console.
- GET /_gm/mail deliveries (paginated) + dead-letters
- GET /_gm/mail/deliveries/{id} delivery detail + attempts
- POST /_gm/mail/deliveries/{id}/resend re-enqueue a non-sent delivery
- GET /_gm/notifications notifications + dead-letters + malformed
- GET/POST /_gm/broadcast multi-game admin diplomatic broadcast
Console depends on MailAdmin / NotificationAdmin / DiplomailAdmin interfaces
(satisfied by the concrete services); pages render in tests without a database.
Delivery detail and dead-letters live under /_gm/mail/deliveries/* and
/_gm/mail/... static segments to avoid a param/static route conflict. Resend
and broadcast flow through the CSRF guard.
Tests: mail page, delivery detail (+ not-found), resend (+ bad-CSRF),
notifications overview, broadcast form + send (input assertions) + bad game
ids, and unavailable. Plus an integration test that drives /_gm end to end
through the real gateway → backend (401 challenge + authenticated dashboard).
Docs: backend/docs/admin-console.md page inventory completed.
Add the operator-management page over *admin.Service (no new business logic).
- GET/POST /_gm/operators list + create operator
- POST /_gm/operators/{user}/disable|enable toggle access
- POST /_gm/operators/{user}/reset-password set a new password
Console depends on an OperatorAdmin interface (satisfied by *admin.Service) so
the page renders in tests without a database. Create POST is mounted on the
collection path; per-row disable/enable/reset are guarded by the CSRF middleware
and redirect back. Passwords are never logged.
Tests: list render, create (+ username/password assertions), username-taken
conflict, disable/enable, reset (+ password assertion), missing-password 400,
bad-CSRF 403, and unavailable 503.
Docs: backend/docs/admin-console.md page inventory extended.
Add the games, runtime, and engine-version pages over the existing lobby,
runtime, and engine-version services (no new business logic).
- GET/POST /_gm/games list + create public game
- GET /_gm/games/{id} detail incl. runtime snapshot
- POST /_gm/games/{id}/force-start|stop game state actions
- POST /_gm/games/{id}/ban-member ban a member (uuid + reason)
- POST /_gm/games/{id}/runtime/restart|patch|force-next-turn
- GET/POST /_gm/engine-versions registry + register
- POST /_gm/engine-versions/{ver}/disable disable a version
Console depends on GameAdmin / RuntimeAdmin / EngineVersionAdmin interfaces
(satisfied by the concrete services) so the pages render in tests without a
database. Collection-mutating POSTs are mounted on the collection path to avoid
a static-vs-param route conflict in gin. Writes flow through the CSRF guard and
redirect back; the create form parses datetime-local as UTC.
Tests: list/detail (with and without a runtime), create (visibility/owner/time
assertions), force-start (+ bad-CSRF), ban-member (+ bad uuid), runtime patch
(+ missing version), engine-version list/register/disable, and unavailable.
Docs: backend/docs/admin-console.md page inventory extended.
Add the operator console's user-administration pages over the existing
*user.Service (no new business logic).
- GET /_gm/users paginated account list
- GET /_gm/users/{id} account detail: profile, entitlement, sanctions
- POST /_gm/users/{id}/block apply permanent_block (reason required)
- POST /_gm/users/{id}/entitlement set the entitlement tier
- POST /_gm/users/{id}/soft-delete soft-delete the account (cascades)
The console depends on a UserAdmin interface (satisfied by *user.Service) so the
pages render in tests without a database. All writes flow through the CSRF
guard, carry the operator as the audit actor, and answer with a 303 redirect;
a generic message page handles not-found, validation, and failure notices.
Unblock is intentionally absent — the admin API exposes no remove-sanction
endpoint.
Tests: list/detail render, not-found, block (with actor/scope/reason
assertions), missing-reason 400, bad-CSRF 403, entitlement, soft-delete
redirect, and the service-unavailable path.
Docs: backend/docs/admin-console.md gains the page inventory.
Turn the console landing page into an operational dashboard.
- new internal/opsstatus: read-only Postgres projection via go-jet — ping +
per-status COUNT/GROUP BY on runtime_records, mail_deliveries,
notification_routes, and a malformed-intent count; degrades per-probe into
Snapshot.Errors rather than failing the page
- dashboard renders backend readiness, database health, the three status
tables, the malformed count, and any collection errors; falls back to a
"monitoring not wired" note when no reader is injected
- AdminConsoleHandlers now takes an AdminConsoleDeps struct (Monitor + Ready
added) so later stages add service refs without churning the signature
Tests: opsstatus store test against a Postgres testcontainer (empty schema +
one enqueued delivery); dashboard render tests with a fake reader (with and
without monitoring).
Docs: ARCHITECTURE 14.1 + FUNCTIONAL 10.2.1 (+ru) describe the dashboard.
(Prometheus /metrics exporters were already enabled in dev-deploy in Stage 1.)
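A minimal Go sketch of the per-probe degradation described above. Only
Snapshot.Errors is named in this change; the probe type, the Counts field and
the closures are hypothetical stand-ins for the go-jet queries.

```go
package main

import (
	"errors"
	"fmt"
)

// Snapshot mirrors the idea of the opsstatus snapshot; Counts is a hypothetical field.
type Snapshot struct {
	Counts map[string]int
	Errors []string
}

// probe is a hypothetical named collector standing in for one COUNT query.
type probe struct {
	Name string
	Run  func() (int, error)
}

// collect runs every probe and records failures instead of aborting the page.
func collect(probes []probe) Snapshot {
	snap := Snapshot{Counts: map[string]int{}}
	for _, p := range probes {
		n, err := p.Run()
		if err != nil {
			snap.Errors = append(snap.Errors, p.Name+": "+err.Error())
			continue
		}
		snap.Counts[p.Name] = n
	}
	return snap
}

// demoSnapshot builds a snapshot where one probe succeeds and one fails.
func demoSnapshot() Snapshot {
	return collect([]probe{
		{Name: "mail_deliveries", Run: func() (int, error) { return 1, nil }},
		{Name: "runtime_records", Run: func() (int, error) { return 0, errors.New("relation missing") }},
	})
}

func main() {
	snap := demoSnapshot()
	fmt.Println(snap.Counts, snap.Errors)
}
```

The failing probe contributes one entry to Errors while the healthy probe
still reports its count, so the dashboard can render both.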
site/rules.md is a faithful English mirror of the authoritative Russian
site/ru/rules.md — the same section anchors (so the in-page cross-links
and the RU/EN structure line up), the same LaTeX formulas with English
labels, and the same tables and engine nuances. Rewrite the English home
intro to match the Russian one and link to the rules, and register Rules
in the English sidebar. Completes the bilingual rules.
The duplicate-then-disappearing formulas were a Vue hydration mismatch,
not a CSS problem. markdown-it-mathjax3 v5 pulls the `mathxyjax3` fork,
which emits each formula's CSS as an in-content `<style>` block scoped to
a per-container `#mjx-<id>` that the static build never sets. The orphaned
scoped CSS left the screen-reader MathML twin visible (the duplicate), and
the in-`<main>` `<style>` elements break VitePress/Vue hydration
("Hydration completed but contains mismatches"), which strips the SVG
glyph `<path>`s and blanks every formula after the page finishes loading.
Downgrade to markdown-it-mathjax3 ^4.3.2 — the mathjax-full-based version
VitePress officially supports. It uses `juice` to inline all CSS into the
element `style` attributes (no in-content `<style>`), so hydration is
clean (glyphs survive) and the MathML twin is hidden by its own inlined
style (no duplicate). This also drops the earlier custom.css workaround,
which only treated the symptom and itself blanked the formulas.
Verified with a headless Chromium render of the built /ru/rules: all 10
formulas keep their glyph paths after hydration, no console mismatch, no
duplicate copies.
markdown-it-mathjax3 renders each formula as a visible SVG plus a MathML
twin (<mjx-assistive-mml>) for screen readers, hidden via CSS scoped to a
per-container #mjx-<id> selector. In the static build the containers carry
no id, so that scoped rule matches nothing and the twin renders as a
second, oversized (theme-monospaced) copy of every formula, in every
browser. Add a global visually-hidden rule for mjx-assistive-mml in the
theme CSS: the twin stays in the DOM for assistive tech but is removed
from view and from layout.
Editorial pass over site/ru/rules.md (on top of the verbatim port):
- moved the lore intro to the RU home page, rewritten in a modern voice;
- fixed typos, replaced the TODO/WTF cargo-tech note and the abandoned
(---ссылка---) marker with the verified mechanic and a real cross-link,
dropped the report TODO row;
- wove organic intra-page cross-links (#combat, #movement, #victory, ...);
- documented engine nuances verified against the code: ore auto-farming
and the capital / "запасы промышленности" store (industry capped at
population); cargo lost with ships destroyed in battle; and that a
losing race's colonists at a neutral planet are NOT lost — they stay
aboard (this corrects the audit note, verified in route.go).
Migration: delete game/rules.txt (its content now lives, authoritative,
in site/ru/rules.md) and repoint every reference to it (ui/frontend code
comments + tests, ui/docs, tools, ui/PLAN.md links). Record the
RU-authoritative rule in site/README.md and CLAUDE.md. The English
site/rules.md mirror follows in a separate stage.
Faithful Markdown rendering of game/rules.txt for the site: headings with
stable anchors, GFM tables and LaTeX formulas — the text itself is
unchanged (typos, the TODO/WTF notes, the broken (---ссылка---) marker and
the lore intro are all preserved as-is). The editorial pass (clarity,
nuances, organic cross-links, intro moved to the home page) follows in a
separate commit so its diff isolates exactly what changed relative to the
original. Registers the page in the RU sidebar.
In host-mode the ui-test job runs as root, so vite (test:pwa),
svelte-kit and Playwright write build/, .svelte-kit/, test-results/ and
playwright-report/ root-owned into the shared host workspace. The
act_runner (non-root) then cannot remove them at teardown
("unlinkat ui/frontend/build: permission denied"), which spuriously
marks this or a sibling job that inherits the dirty workspace as failed
— it hit go-unit on the #83 merge even though every test passed.
Add an `if: always()` step that removes those generated dirs while the
job still has root, after the artifact uploads. Keeps the shared
workspace clean for the runner's own teardown and for later jobs.
The committed FlatBuffers bindings were generated by flatc 25.x (the TS
runtime is flatbuffers@25.9.23), but nothing pinned the compiler, so a
regen on a box with an older flatc (Debian apt ships 23.5.26) silently
churns output and flips nullable-scalar builder defaults. PR #82 hit this
and shipped 5 report files from the wrong compiler.
Unify the whole toolchain on 25.9.23 (the only version available as an
npm package, a prebuilt flatc binary, and a Go tag) and make the bindings
reproducible:
- Downgrade the flatbuffers Go module 25.12.19 -> 25.9.23 (schema,
transcoder, gateway, integration) so compiler and both runtimes match.
- Regenerate every schema with flatc 25.9.23. The only resulting change
is order/command-item.ts: the lone straggler still on the old
optional-scalar builder default (cmd_applied/cmd_error_code: 0 -> null).
Inert in practice — the TS side never builds those response-only fields
(the engine sets them in Go); the reader is unchanged.
- Pin the version in tooling: a flatc-check guard in ui/Makefile (fbs-ts)
and a new pkg/schema/fbs/Makefile (fbs-go); both refuse a mismatched
flatc and point at the release binary. Fix the stale apt install hint.
- Add a path-filtered CI guard (.gitea/workflows/fbs-codegen.yaml) that
regenerates with the pinned flatc and fails on any diff.
- Document the pinned version and the regen commands in the schema README.
No wire-format change: Go build/vet, transcoder roundtrip + engine tests,
pnpm check and the full vitest suite (888) stay green.
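The version guard can be sketched in Go as below. This is an illustration
only: the real guard lives in the Makefiles, checkFlatcVersion is a
hypothetical name, and the "flatc version X.Y.Z" output shape is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// pinnedFlatc is the toolchain version named in this commit.
const pinnedFlatc = "25.9.23"

// checkFlatcVersion validates the text printed by `flatc --version`.
// Assumption: the output has the form "flatc version X.Y.Z".
func checkFlatcVersion(output string) error {
	fields := strings.Fields(output)
	if len(fields) == 0 {
		return fmt.Errorf("flatc printed no version; want %s", pinnedFlatc)
	}
	got := fields[len(fields)-1]
	if got != pinnedFlatc {
		return fmt.Errorf("flatc %s found; want %s (use the release binary)", got, pinnedFlatc)
	}
	return nil
}

func main() {
	fmt.Println(checkFlatcVersion("flatc version 23.5.26"))
	fmt.Println(checkFlatcVersion("flatc version 25.9.23"))
}
```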
Surface the inactivity-removal countdown the rules promise but the
engine never reported. A race within five turns of being auto-removed
for inactivity gets a personal warning in its own report; every race
within three turns is listed publicly to all participants.
- model: Report.PersonalExitWarning + RacesLeavingSoon ([]RaceExitNotice)
- fbs: RaceExitNotice table + Report.personal_exit_warning /
races_leaving_soon (regenerated Go + TS bindings)
- transcoder: encode/decode both fields
- engine: ReportExitWarnings fills the recipient's TTL (1..5) and lists
other non-extinct races with TTL 1..3, excluding the recipient itself
- ui: danger-styled personal banner + "races leaving soon" section
(hidden when empty), wired into the report view, EN/RU i18n
- docs: rules.txt report-section list, FUNCTIONAL.md 6.4 + RU mirror
Voluntary quit and idle timeout share the TTL countdown and are not
distinguished, per the agreed scope.
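A rough Go sketch of the two thresholds, assuming simplified stand-in types
(race, exitNotice) and a hypothetical exitWarnings helper rather than the real
ReportExitWarnings signature:

```go
package main

import "fmt"

// race and exitNotice are hypothetical stand-ins for the engine model.
type race struct {
	Name    string
	TTL     int
	Extinct bool
}

type exitNotice struct {
	Name      string
	TurnsLeft int
}

// exitWarnings returns the recipient's own countdown (0 when no warning
// applies) and the public list of other races about to leave.
func exitWarnings(races []race, recipient int) (personal int, leaving []exitNotice) {
	if ttl := races[recipient].TTL; ttl >= 1 && ttl <= 5 {
		personal = ttl
	}
	for i, r := range races {
		if i == recipient || r.Extinct {
			continue // never list the recipient or an extinct race
		}
		if r.TTL >= 1 && r.TTL <= 3 {
			leaving = append(leaving, exitNotice{Name: r.Name, TurnsLeft: r.TTL})
		}
	}
	return personal, leaving
}

func main() {
	races := []race{{"A", 4, false}, {"B", 2, false}, {"C", 3, true}, {"D", 5, false}}
	fmt.Println(exitWarnings(races, 0))
}
```

Race A sees its own warning at TTL 4 but is not yet public; B is listed
publicly; C is skipped as extinct; D is outside the three-turn window.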
A bundle of small rules-vs-engine corrections:
- Science proportions: accept a sum that equals 1 only up to float
rounding (was an exact != 1 comparison); the rules example is reworded
so it is unambiguous that proportions are fractions summing to 1.
- Generation: super-big planets get a resource strictly above 0 (minimum
0.001, was a hard 0.1); the rules table is fixed for big planets (1-10,
not 0.1-10) and the false "0.1-20 / average 1.5" resource claim removed.
- Dismantle over a neutral planet now unloads the colonists and settles
it (the planet becomes the race's); over a foreign planet they are
still lost. The rules clause is clarified for own / neutral / foreign.
- Report: ship-production entries are written at the compacted report
index (was the planet's map index, which could write past the grown
slice and panic); the incoming-group "remaining distance" is measured
from the group's current hyperspace position, not its origin planet
(matching OtherGroup).
- validator: the cargo-value error now carries the cargo value, not the
shields value.
Tests added for each behavioural fix; rules.txt updated in the same patch.
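For the science-proportions fix, a tolerance check along these lines
illustrates the idea. The epsilon value and the sumsToOne name are assumptions;
the commit does not state the tolerance the engine uses.

```go
package main

import (
	"fmt"
	"math"
)

// sumTolerance is an assumed epsilon, chosen only for this sketch.
const sumTolerance = 1e-9

// sumsToOne accepts proportions whose sum equals 1 up to float rounding.
func sumsToOne(parts []float64) bool {
	sum := 0.0
	for _, p := range parts {
		sum += p
	}
	return math.Abs(sum-1) <= sumTolerance
}

func main() {
	tenths := make([]float64, 10)
	for i := range tenths {
		tenths[i] = 0.1
	}
	fmt.Println(sumsToOne(tenths))
	fmt.Println(sumsToOne([]float64{0.5, 0.4}))
}
```

Ten proportions of 0.1 accumulate rounding error in float64, which is the
kind of input an exact comparison can reject and the tolerance accepts.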
TurnWipeExtinctRaces iterated only non-extinct races, so an
administratively banished race (flagged extinct, TTL untouched) was never
wiped: its planets stayed owned and its ships lingered, while the race
itself could no longer act. The loop now covers every race and wipes when
either an active race's TTL has run out (idle / quit) or an extinct race
still holds assets (banish). The asset check makes repeated passes
idempotent.
wipeRace already matched the rules for exclusion (ships removed, planets
uninhabited, industry and capital cleared, material retained), so the
behaviour is just documented in game/README.md.
Tests: banish releases planets and ships on the next turn (and is
idempotent); idle-timeout wipe still fires under the new iterator.
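The wipe condition can be sketched as follows. The race struct, its fields and
the wipe helper are hypothetical; in particular, treating "TTL has run out" as
TTL <= 0 and flagging the race extinct on wipe are assumptions of this sketch.

```go
package main

import "fmt"

// race is a hypothetical, flattened view of the engine's race state.
type race struct {
	Name    string
	Extinct bool
	TTL     int
	Planets int
	Ships   int
}

// shouldWipe covers both cases: an active race whose TTL ran out, and an
// extinct (banished) race that still holds assets.
func shouldWipe(r race) bool {
	if r.Extinct {
		return r.Planets > 0 || r.Ships > 0
	}
	return r.TTL <= 0
}

// wipe releases the race's assets; afterwards shouldWipe reports false.
func wipe(r *race) {
	r.Extinct = true
	r.Planets = 0
	r.Ships = 0
}

func main() {
	races := []race{
		{Name: "banished", Extinct: true, TTL: 7, Planets: 2, Ships: 3},
		{Name: "idle", TTL: 0, Planets: 1},
		{Name: "active", TTL: 4, Planets: 5, Ships: 1},
	}
	for pass := 1; pass <= 2; pass++ {
		wiped := 0
		for i := range races {
			if shouldWipe(races[i]) {
				wipe(&races[i])
				wiped++
			}
		}
		fmt.Printf("pass %d wiped %d\n", pass, wiped)
	}
}
```

The second pass wipes nothing, which is the idempotency the asset check
provides.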
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the rules ("Бомбардировка планет"), a planet is bombed from the
strongest attacking power downwards, and a planet bombed to extinction
keeps its material and capital stockpiles but loses its working industry.
ProduceBombings now sorts attacking races by total bombing power
(descending) instead of iterating the attacker map in random order, and
on a wipe zeroes the planet's industry (Free already keeps capital and
material). bombingPower is extracted as a shared helper.
The rules already describe both, so no documentation change. Tests:
bombing order by power, and industry collapse with capital/material kept
on a wipe.
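The ordering change, sketched in Go. bombingOrder is a hypothetical helper, not
the real ProduceBombings code, and the name tie-break is an assumption added
here so the result is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// bombingOrder returns attacker names sorted by total bombing power,
// strongest first. Ties are broken by name (an assumption of this sketch).
func bombingOrder(power map[string]float64) []string {
	names := make([]string, 0, len(power))
	for name := range power {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		if power[names[i]] != power[names[j]] {
			return power[names[i]] > power[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}

func main() {
	fmt.Println(bombingOrder(map[string]float64{"a": 1, "b": 5, "c": 5}))
}
```

Go randomises map iteration order, so collecting the keys and sorting them is
what makes the bombing sequence reproducible.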
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bring the report's foreign-group and foreign-class visibility in line
with the rules (game/rules.txt "Движение" and the report sections):
- incoming groups (heading to one of the recipient's planets) are shown
only within the recipient's visibility range (driveTech*30); beyond it
a group is hidden even though it is inbound;
- the unidentified-group list now uses the visibility range (it used the
flight range, driveTech*40), excludes groups heading to the recipient's
planets (those belong to the incoming list), and reports each group
once (it previously emitted an entry per in-range owned planet);
- ship classes met in a battle the recipient took part in or witnessed
now appear in OtherShipClass, with the design looked up from the owner
race's ship types (the battle report carries only the class name).
The rules already describe this behaviour and the report wire shape is
unchanged, so no documentation change. Tests added for all three.
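A sketch of the unidentified-group filter under stated assumptions: the point
and group types are hypothetical, distance is plain Euclidean, and the range
boundary is treated as inclusive. Only the driveTech*30 figure comes from this
change.

```go
package main

import (
	"fmt"
	"math"
)

type point struct{ X, Y float64 }

// group is a hypothetical foreign ship group in hyperspace.
type group struct {
	ID       int
	Pos      point
	DestMine bool // true when heading to one of the recipient's planets
}

// visibilityRange follows the driveTech*30 figure from the commit.
func visibilityRange(driveTech float64) float64 { return driveTech * 30 }

// unidentified lists each non-inbound group once if any owned planet sees it.
func unidentified(groups []group, planets []point, driveTech float64) []int {
	var ids []int
	for _, g := range groups {
		if g.DestMine {
			continue // inbound groups belong to the incoming list
		}
		for _, p := range planets {
			if math.Hypot(g.Pos.X-p.X, g.Pos.Y-p.Y) <= visibilityRange(driveTech) {
				ids = append(ids, g.ID)
				break // report the group once, not once per planet
			}
		}
	}
	return ids
}

func main() {
	planets := []point{{0, 0}, {10, 0}}
	groups := []group{
		{ID: 1, Pos: point{5, 0}},
		{ID: 2, Pos: point{35, 0}},
		{ID: 3, Pos: point{45, 0}},
		{ID: 4, Pos: point{1, 1}, DestMine: true},
	}
	fmt.Println(unidentified(groups, planets, 1))
}
```

Group 3 sits 35 units from the nearest planet: inside the old flight range of
40 but outside the visibility range of 30, so it is no longer listed.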
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
TurnPlanetProductions started its production budget from
PlanetProductionCapacity, which already subtracts the reserved upgrade
cost, and then subtracted each applied upgrade's cost again in the apply
loop — charging every applied upgrade twice. That both starved the
planet's build/research budget and could skip upgrades that were in fact
affordable.
The budget now starts from the planet's full production potential and the
apply loop deducts each upgrade once; PlanetProductionCapacity stays the
report's net-of-upgrades "free L".
Test: TestUpgradeDoesNotDoubleChargeProduction; the TestProduceShips MAT
expectation is updated to the once-charged value.
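The once-charged budget, as a minimal sketch. applyUpgrades is a hypothetical
helper, and skipping (rather than stopping at) an unaffordable upgrade is an
assumption; the real loop works on engine types.

```go
package main

import "fmt"

// applyUpgrades starts from the full production potential and charges each
// affordable upgrade exactly once. It returns the budget left over and the
// number of upgrades applied.
func applyUpgrades(potential float64, costs []float64) (left float64, applied int) {
	left = potential
	for _, c := range costs {
		if c > left {
			continue // not affordable with what remains
		}
		left -= c
		applied++
	}
	return left, applied
}

func main() {
	// Potential 100 with upgrades costing 30 and 50 leaves 20 for builds.
	fmt.Println(applyUpgrades(100, []float64{30, 50}))
	// Starting from the net-of-upgrades figure (100-80 = 20) reproduces the
	// old behaviour: both upgrades look unaffordable and are skipped.
	fmt.Println(applyUpgrades(20, []float64{30, 50}))
}
```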
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the documented turn order (game/rules.txt "Последовательность
действий"), no ship should dodge the pre-departure battle by slipping
into hyperspace. MakeTurn now runs merge -> battle -> load+launch routed
groups -> fly -> merge -> battle, so:
- ships ordered to depart (Launched) and ships being upgraded now take
part in the pre-departure battle at their planet (CollectPlanetGroups /
FilterBattleGroups); only survivors then enter hyperspace;
- routed transports are loaded and launched AFTER that battle, so they
fight empty and cannot escape it.
A just-launched group has no stored hyperspace position, so moveShipGroup
starts its first leg from the origin planet; the previous code read the
nil launch coordinate and would panic.
Because upgrading groups can now lose ships in the battle, the pending
upgrade cost is recomputed from the group's current ship count instead of
the value stored when the order was validated.
Rules: reordered "Последовательность действий" and rewrote the combat
note that ordered/routed ships skip the battle.
Tests: launched-group move from origin, launched/upgrade groups taking
part in battle, upgrade cost tracking ship losses.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The battle engine diverged from the documented combat model
(game/rules.txt "Сражения") in three ways:
- the destruction roll was inverted (rand >= p), so a near-certain hit
destroyed its target only ~(1-p) of the time;
- a whole group fired as a single ship (Armament shots per round)
regardless of its ship count, so fleet size never affected offence;
- the defending mass used the whole group's full mass instead of one
target ship's, weakening grouped ships' shields by ~Number^(1/3).
SingleBattle now resolves ship by ship: every living ship fires once per
round in random order across all groups, each gun targets a random enemy
ship (weighted by group size), and the destruction roll matches the
documented probability. FilterBattleOpponents evaluates per-ship mass.
Also fixes opponent-map initialisation in ProduceBattles that kept only
an attacker's last opponent.
The rules already describe this model, so no documentation change is
needed. Tests: per-ship one-sided wipe, destruction-roll direction, and
the updated per-ship-mass probability expectation.
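Two of the fixes can be illustrated in isolation. Both helpers below are
hypothetical simplifications, not the SingleBattle code: destroyed shows the
corrected roll direction, and targetGroup shows how a uniform pick over ships
weights groups by size.

```go
package main

import (
	"fmt"
	"math/rand"
)

// destroyed reports whether a shot with kill probability p destroys its
// target, given a uniform roll in [0, 1). The inverted form was roll >= p.
func destroyed(roll, p float64) bool { return roll < p }

// targetGroup maps a ship index in [0, total ships) to its group index, so
// a uniform ship pick is weighted by group size. It returns -1 when the
// index is out of range.
func targetGroup(sizes []int, ship int) int {
	if ship < 0 {
		return -1
	}
	for i, n := range sizes {
		if ship < n {
			return i
		}
		ship -= n
	}
	return -1
}

func main() {
	sizes := []int{1, 9}
	total := 10
	hits := make([]int, len(sizes))
	for i := 0; i < 1000; i++ {
		hits[targetGroup(sizes, rand.Intn(total))]++
	}
	fmt.Println(hits, destroyed(rand.Float64(), 0.95))
}
```

With group sizes 1 and 9, roughly nine shots in ten land on the larger group,
which is how fleet size feeds into the outcome.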
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up tidy after the cross-service /command removal (#73):
- Rename the router test double dummyExecutor -> fakeEngine (and the
newExecutor / setupRouterExecutor helpers -> newFakeEngine /
setupRouterEngine): it implements handler.Engine now, "executor" was a
leftover of the removed adapter. Test-only.
- Regenerate the ui/core canon signing golden onto user.games.order
(request_user_games_command.json -> request_user_games_order.json, fresh
canonical bytes + Ed25519 signature) and drop the last
user.games.command references from the Go/TS tests and docs.
- Align game openapi: CommandRequest.cmd no longer carries minItems: 1. It
is now used only by PUT /api/v1/order, which accepts an empty batch
(clearing the player's stored order, equivalent to removing every
command); the contract test freezes the empty-allowed shape.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Backend e2e tests (and, more rarely, service startup) intermittently
failed applying migrations with `driver: bad connection`: a freshly
started Postgres — notably a test container — can reset a pooled
connection moments after it reports ready, killing the migration
transaction. The harness already waits for the double "ready" log and
pings before migrating, yet goose can still draw a connection that
Postgres then resets.
ApplyMigrations now wraps the schema-create + goose run in a bounded
retry that fires only on transient connection errors (driver.ErrBadConn
and the connection-failure messages Postgres drivers surface); both
steps are idempotent, so a retry resumes cleanly. Deterministic SQL
errors still fail fast.
Fixes the intermittent TestDiplomailAsyncFallbackOnUnsupportedPair (and
the eight other testcontainer e2e harnesses that share ApplyMigrations).
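A sketch of a bounded retry that fires only on transient connection errors.
The helper names, the attempt count and the message fragments are assumptions
for illustration; only driver.ErrBadConn is taken from this change.

```go
package main

import (
	"database/sql/driver"
	"errors"
	"fmt"
	"strings"
	"time"
)

// isTransient reports whether err looks like a dropped connection.
// The message fragments are illustrative, not the project's actual list.
func isTransient(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, driver.ErrBadConn) {
		return true
	}
	msg := strings.ToLower(err.Error())
	return strings.Contains(msg, "connection reset") ||
		strings.Contains(msg, "connection refused")
}

// retryTransient runs op up to maxAttempts times, retrying only transient errors.
func retryTransient(maxAttempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		err = op()
		if err == nil || !isTransient(err) {
			return err // success, or a deterministic error: fail fast
		}
		if i < maxAttempts-1 {
			time.Sleep(delay)
		}
	}
	return err
}

var (
	errReset  = fmt.Errorf("apply migrations: %w", driver.ErrBadConn)
	errSyntax = errors.New("syntax error at or near \"CREAT\"")
)

// countCalls reports how often op runs when it fails `failures` times.
func countCalls(failure error, failures int) (calls int, err error) {
	err = retryTransient(5, time.Millisecond, func() error {
		calls++
		if calls <= failures {
			return failure
		}
		return nil
	})
	return calls, err
}

func main() {
	fmt.Println(countCalls(errReset, 2))
	fmt.Println(countCalls(errSyntax, 2))
}
```

Retrying is only safe because the wrapped steps are idempotent, as noted
above; a non-idempotent operation would need a different guard.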
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three-stage refactor of the game-engine plumbing (game logic untouched):
Stage 1 — lock-free persistence + admin serialisation. Remove the file
lock from repo/fs (the .lock file, the Read/Write-vs-*Safe duality and the
dead ReadSafe polling) and replace the two-step rename with a single atomic
rename so concurrent reads are torn-free without a lock. Serialise the
state-mutating admin writers (init/turn/banish) with one shared router
LimitMiddleware, rewritten to block on the request context instead of a
racy shared 100ms timer.
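
The single-rename write from Stage 1 follows a common pattern, sketched here
with hypothetical names (writeAtomic, roundTrip); the real repo/fs code may
differ in naming, sync policy and error handling.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic writes data to a temp file in the target directory and renames
// it over path. On POSIX filesystems the rename replaces the target
// atomically, so a reader sees either the old or the new file.
func writeAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// roundTrip writes twice to one path in a scratch dir and reads it back.
func roundTrip(first, second string) (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "state.json")
	if err := writeAtomic(path, []byte(first)); err != nil {
		return "", err
	}
	if err := writeAtomic(path, []byte(second)); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	fmt.Println(roundTrip(`{"turn":0}`, `{"turn":1}`))
}
```

The temp file is created in the same directory as the target so the rename
never crosses a filesystem boundary.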
Stage 2 — remove the obsolete immediate-command path end to end. Players
submit through PUT /api/v1/order; the legacy PUT /api/v1/command path is
deleted across game (route, handler, 24 command factories, Ctrl), backend
(Commands handler/route, engineclient.ExecuteCommands), gateway (dispatch +
executeUserGamesCommand + routing entry), the FlatBuffers/model contract
(UserGamesCommand[Response]) and transcoder, plus every affected
OpenAPI/README/FUNCTIONAL/ARCHITECTURE doc. The integration proxy test is
converted to the order path.
Stage 3 — flatten the REST->engine wrapper. Replace the executor adapter,
the controller package functions and RepoController with one concrete
controller.Service; drop the single-implementation Repo and Storage
interfaces (repo.Repo / fs.FS are now concrete). Handlers depend on a thin
handler.Engine seam and own the domain->REST projection; storage is
resolved once at startup instead of per request.
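
For the Stage 1 admin serialisation, blocking on the request context can be
sketched with a channel semaphore. The limiter type and its methods are
hypothetical and leave out the router wiring of the real LimitMiddleware.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// limiter is a hypothetical counting semaphore built on a buffered channel.
type limiter struct{ slots chan struct{} }

func newLimiter(n int) *limiter { return &limiter{slots: make(chan struct{}, n)} }

// acquire blocks until a slot frees up or ctx is done, whichever is first.
func (l *limiter) acquire(ctx context.Context) error {
	select {
	case l.slots <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// release frees a slot taken by a successful acquire.
func (l *limiter) release() { <-l.slots }

// acquireWithin tries to take a slot, giving up after ms milliseconds.
func acquireWithin(l *limiter, ms int) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel()
	return l.acquire(ctx)
}

func main() {
	l := newLimiter(1)
	fmt.Println(acquireWithin(l, 50)) // takes the only slot
	fmt.Println(acquireWithin(l, 20)) // slot busy: context deadline exceeded
	l.release()
	fmt.Println(acquireWithin(l, 50)) // slot free again
}
```

Each waiter is bounded by its own request context, so there is no shared
timer for concurrent requests to race on.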
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Engine no longer mints its own game UUID. The orchestrator (backend)
generates the game UUID at game-create time and passes it in the
admin/init request body as the required `gameId` field, so the value
that names the engine container and host bind-mount directory also
ends up inside the engine's state.json.
The engine rejects the zero UUID with 400 and any init that conflicts
with an existing state.json with 409 (a second init on the same gameId
is also a conflict; full idempotency is not part of the contract).
Updates rest.InitRequest, openapi.yaml (schema + 409 response),
controller.GenerateGame/NewGame/buildGameOnMap signatures, the engine
HTTP handler/executor, the backend runtime worker, and the relevant
unit and contract tests. Documentation in game/README.md,
docs/ARCHITECTURE.md, backend/README.md, and backend/docs/{runtime,flows}.md
is updated in the same patch.
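The rejection rules reduce to a small decision, sketched below. initRejection
is a hypothetical helper; the [16]byte UUID representation and the order of
the two checks are assumptions, and the success status is deliberately left
out because this message does not state it.

```go
package main

import (
	"fmt"
	"net/http"
)

// initRejection returns the HTTP status for a rejected init request, or 0
// when the request may proceed.
func initRejection(gameID [16]byte, stateExists bool) int {
	if gameID == ([16]byte{}) {
		return http.StatusBadRequest // zero UUID
	}
	if stateExists {
		return http.StatusConflict // state.json already present
	}
	return 0
}

func main() {
	id := [16]byte{0x01}
	fmt.Println(initRejection([16]byte{}, false))
	fmt.Println(initRejection(id, true))
	fmt.Println(initRejection(id, false))
}
```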
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #59. Engine returns 202 + per-command
cmdApplied/cmdErrorCode/cmdErrorMessage instead of a blanket 500;
pkg/error consts reshelved onto 1xxx/2xxx/3xxx; UI keeps the sync banner
green on per-command rejection, surfaces the engine reason inline, and
hydrates per-command verdicts from the server on game re-entry.
Dev-deploy now recycles game containers when the galaxy-engine:dev SHA
drifts.
The `rename-planet` and `ship-classes` rejected-submit specs broke on
the previous commit because:
1. `tests/e2e/fixtures/order-fbs.ts` builds the FBS response without
`forceDefaults(true)`, and flatbuffers@25's TS codegen now elides
`cmd_applied=false` against its int8 default of 0. The encoded
payload no longer carried the rejection, so the UI decoded the row
as `applied` and the assertions on the `rejected` status text
failed first. The production Go transcoder already force-slots
the field; mirror that behaviour in the e2e fixture.
2. The specs themselves still asserted the old blanket
`data-sync-status="error"` on per-command rejection. After the
previous commit's behaviour change the bar stays `synced` for
per-command rejection (only genuine transport failures keep the
red banner + Retry), so the assertions now read the row's inline
reason text instead.
`tests/e2e/fixtures/order-fbs.ts` also gains the `cmdErrorMessage`
field so future fixtures can mirror the engine's rejection reason
through the round trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues surfaced once the per-command rejection from the previous
commit actually reached the UI:
1. Sync banner falsely red. `OrderDraftStore.runSync` flipped
`syncStatus = "error"` whenever any command was rejected and
advertised a Retry button. A per-command rejection is a
player-correctable state — the round trip succeeded, the engine
just refused that command — so the retry can't help. Keep
`syncStatus = "synced"` on `success`; the red row highlight is
the visible cue.
2. Rejection reason missing. Add `cmd_error_message: string` to
`CommandItem` in `pkg/schema/fbs/order.fbs` (appended last to
preserve existing slot offsets) and regenerate the Go + TS stubs
for that one type. Plumb the message through `CommandMeta`,
`Controller.applyCommand`'s `m.Result(code, message)` call, the
Go transcoder, the UI decoders in `submit.ts` / `order-load.ts`,
and the `OrderDraftStore.errorMessages` map. `order-tab.svelte`
renders it as an italic danger-coloured line under rejected
commands, with new CSS for `.error-reason`.
3. Verdict lost on navigation. `order-load.ts.decodeCommand` never
read `cmdApplied`/`cmdErrorCode`, so `hydrateFromServer` fell
back to a blanket "applied" status — a previously-rejected
command came back green after a lobby → game round trip. Extend
the fetch decoder to populate `statuses`/`errorCodes`/
`errorMessages` maps and have `hydrateFromServer` use them.
Engine-side persistence already records the verdict on disk —
verified against the live `0000/order/<id>.json`.
`flatbuffers@25` elides default-int8/int64 fields on write. The Go
transcoder already force-slots `cmd_applied=false` / `cmd_error_code=0`,
and the new test fixtures set `builder.forceDefaults(true)` to mirror
that behaviour so the round trip survives.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`backend`'s reconciler adopts pre-existing `galaxy-game-*` containers
without comparing their image SHA against the freshly-built
`galaxy-engine:dev`, so a long-lived sandbox keeps serving the
previous engine code after a redeploy. Issue #59 surfaced
this: after the per-command-rejection fix was deployed via
`workflow_dispatch`, the running sandbox container was still on the
old image SHA and the browser kept seeing the 503/unavailable response.
Adds a `Recycle engine containers on image drift` step right before
`Reap stray dev-deploy containers`. The step compares the new
`galaxy-engine:dev` SHA against every running `galaxy-game-*`
container and, on drift, stops the backend, removes the container,
wipes the bind-mounted per-game state directory (Engine.Init() writes
turn-0 over any pre-existing `turn-N` files — silent state corruption
otherwise), and cascade-deletes the lobby `games` row. The
`dev-sandbox` bootstrap on the next backend boot finds no live
sandbox and provisions a fresh one on the new engine image.
When the engine sources are unchanged, the BuildKit cache hits and
the SHA stays the same — the recycle step is a no-op and the running
games keep their state across the deploy. Verified end-to-end against
the live dev environment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>