Serve the whole stack behind one host: site at /, game UI at /game/,
gateway REST at /api + /healthz, Connect at /rpc (prefix stripped by the
edge Caddy). The built artifact is domain-agnostic — the UI talks to the
gateway same-origin via relative URLs, so the same bundle runs under any
host with no rebuild and with CORS disabled.
- Rename the Connect proto service galaxy.gateway.v1.EdgeGateway ->
edge.v1.Gateway; regenerate Go + TS; public path /rpc/edge.v1.Gateway.
- Move the game UI under base path /game (env BASE_PATH); make the
manifest, service-worker scope, WASM loader, and all navigation
base-aware via a withBase helper.
- Relative API + /rpc Connect prefix; Vite dev proxy mirrors the strip.
- Rewrite the edge Caddy (dev + prod) for path-based routing; empty CORS
allow-lists (same-origin); single host.
- New VitePress project site (site/): i18n en/ru with switcher, LaTeX
math, minimal monospace theme; built and served at /.
- dev-deploy compose/Makefile + CI (dev-deploy, prod-build, new
site-build) build and seed the site; probes hit /, /game/, /healthz.
- Sync docs (ARCHITECTURE, gateway README/openapi, dev-deploy &
local-dev READMEs, CLAUDE.md, ui/PLAN).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Native SvelteKit service worker (src/service-worker.ts): a version-keyed
cache precaches the app shell + build artefacts (incl. core.wasm) +
static files; activate purges old caches; the gateway is never
intercepted; navigations fall back to the cached shell offline. Adds
static/manifest.webmanifest, a generated placeholder icon set
(scripts/gen-pwa-icons.mjs — dependency-free pure-Node PNG encoder), and
manifest / theme-color / apple-touch tags in app.html.
Gated by Playwright against a production preview (playwright.pwa.config.ts
+ tests/pwa/pwa.spec.ts via `pnpm test:pwa`, wired into ui-test):
manifest + installable icons, SW registration + a single version-keyed
cache, and offline shell load. Lighthouse is not used — its PWA category
was removed in v12.
Docs: ui/docs/pwa-strategy.md (+ index); F5 marked done.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
core.wasm and wasm_exec.js are no longer tracked (untracked + gitignored).
A reusable composite action .gitea/actions/build-wasm installs TinyGo
(actions/cache'd) and runs `make -C ui wasm`; it runs in all three
frontend-building workflows — ui-test (before Playwright; Vitest uses the
fake Core and needs no build), dev-deploy, and prod-build. ui-test gains a
Go setup (TinyGo shells out to Go); the deploy workflows already had one.
Docs: ui/docs/wasm-toolchain.md, ui/README.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`pnpm/action-setup@v4` defaults to installing pnpm in the shared
`~/setup-pnpm`. On the single host-mode runner $HOME is shared across
concurrent jobs, so when two pnpm jobs overlap (e.g. a post-merge
`dev-deploy` and `ui-test`, which sit in different concurrency groups)
their self-installers race and one fails with
`ENOTEMPTY ... rmdir '~/setup-pnpm/node_modules/.bin/store/v11/files'`
before the tests even run.
Point each step's `dest` at `${{ runner.temp }}/setup-pnpm` (a per-job
isolated directory) so concurrent jobs never share the install location.
The action still adds `dest` to PATH, so setup-node's pnpm cache and
later `pnpm` calls are unaffected; the pnpm package store stays shared
(safe — pnpm locks it). Applied to the three workflows that set up pnpm:
ui-test, dev-deploy, prod-build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`docker restart galaxy-dev-backend` failed with "not a directory"
after every dev-deploy workflow run. Root cause: the compose file
bind-mounted the geoip database via a relative path
(`../../pkg/geoip/test-data/test-data/GeoIP2-Country-Test.mmdb`).
When the Gitea runner invoked `docker compose up`, the path
resolved against the runner's ephemeral workspace under
`/home/runner/.cache/act/<hash>/hostexecutor/...`. The bind source
baked into the running container therefore pointed at that
ephemeral path; the runner deleted the workspace once the workflow
finished, and any later `docker restart` could not remount.
Replace the bind with a named volume `galaxy-dev-geoip-data`,
seeded at deploy time:
- `tools/dev-deploy/docker-compose.yml`: mount
`galaxy-dev-geoip-data:/var/lib/galaxy:ro` instead of a relative
bind. Declare the volume in the top-level `volumes:` block.
- `.gitea/workflows/dev-deploy.yaml`: new `Seed geoip volume` step
(placed right after the existing UI-volume seed) copies the
fixture from `pkg/geoip/test-data/test-data/` into the named
volume via an ephemeral alpine container, the same pattern UI
seeding already uses.
- `tools/dev-deploy/Makefile`: new `seed-geoip` target performs
the same copy from the persistent checkout. `up` and `rebuild`
now depend on it, so a hand-run `make -C tools/dev-deploy up`
populates the volume without operator action.
- `tools/dev-deploy/README.md`: updated the make-targets table to
list `seed-geoip`.
- `tools/dev-deploy/KNOWN-ISSUES.md`: the entry for the restart
failure is downgraded to a "fixed" postmortem; the symptom,
cause, and where the fix lives are kept for future reference.
Verification on the dev host (this branch checked out):
$ make -C tools/dev-deploy up # populates the volume, brings stack healthy
$ docker restart galaxy-dev-backend # used to error "not a directory"
$ until [ "$(docker inspect -f '{{.State.Health.Status}}' galaxy-dev-backend)" = "healthy" ]; do sleep 2; done
$ echo "ok" # backend up 6s, healthy
The pre-existing sandbox engine `galaxy-game-80f3ce86-...` survived
both `make up` and `docker restart` untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five connected cleanups across the dev/CI infrastructure:
1. Drop tools/local-ci/. The standalone Gitea + act_runner stack was
the legacy "offline workflow validator"; the per-stage CI gate now
runs on gitea.lan and the directory was only retained as a
fallback. Removing it leaves no operational dependency: backend,
gateway, and game code have no references; documentation that
pointed at it (CLAUDE.md, docs/ARCHITECTURE.md, ui/docs/testing.md,
tools/dev-deploy/README.md, tools/local-dev/README.md) is updated
in this same change. Historical "Verified on local-ci run N"
markers in ui/PLAN.md are preserved unchanged.
2. Lift the pre-production single-migration rule. The rule forced
every schema delta into 00001_init.sql and required a manual
make clean-data wipe on every backward-incompatible change in
tools/dev-deploy/. Future schema deltas now land as additive
sequence-numbered files (00002_*.sql, …) that goose applies
automatically on backend startup; 00001_init.sql becomes an
immutable baseline. Authoring conventions live in
backend/internal/postgres/migrations/README.md. The chain may be
squashed back into a fresh 00001 as a deliberate one-time
operation before the first production deployment.
3. Document the deployment cadence. The dev environment is
single-tenant: pushes to feature/* run the test workflows
(go-unit, ui-test, integration) only; dev-deploy.yaml fires on
push to development. A workflow_dispatch override on
dev-deploy.yaml lets a developer preview a feature branch on the
shared dev environment before merge; the next merge into
development overwrites the manual deploy idempotently.
4. Scope compose-managed resources by an explicit
galaxy.stack=<local-dev|dev-deploy> label. Both compose files
stamp the label on every service, network, and named volume.
Makefiles in tools/local-dev/ and tools/dev-deploy/ filter their
engine-cleanup operations by (stack-label AND engine OCI title)
so they never touch unrelated workloads on the same daemon.
dev-deploy.yaml gains a pre-`compose up` step that reaps stale
exited/dead containers under the dev-deploy stack label.
5. Backend now stamps the same galaxy.stack=<value> label on every
engine container it spawns, sourced from a new BACKEND_STACK_LABEL
env var (empty → label not applied; legacy-safe). Both compose
files set it to their stack name (local-dev / dev-deploy). The
contract is recorded in docs/ARCHITECTURE.md under
"Container labels". A package-level test in
backend/internal/runtime exercises both the label-present and
label-absent paths.
No tests intentionally regressed: go test ./backend/internal/{config,
runtime,dockerclient} is green, both compose files validate cleanly,
and the backend, gateway, and game modules all build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two long-standing dev-environment ergonomics had not survived the
move from the bespoke local-dev stack to the CI-driven dev-deploy:
1. `BACKEND_DEV_SANDBOX_EMAIL` defaulted to an empty string in the
dev-deploy compose, so the auto-provisioned "Dev Sandbox" game
never appeared on `https://www.galaxy.lan`. Bake `dev@galaxy.lan`
as the default — matches `.env.example` and lets a developer who
logs in with that email find a ready-to-play game in the lobby.
2. The lobby's synthetic-report loader was gated on
`import.meta.env.DEV`, which is true only for `vite dev` (the
tools/local-dev path). The long-lived dev environment builds
with `vite build` (production mode), so the section was always
stripped from its bundle. Gate it on an explicit
`VITE_GALAXY_DEV_AFFORDANCES` flag instead and set it both in
`.env.development` (preserves `pnpm dev` behaviour) and in the
`dev-deploy.yaml` build step. The `prod-build.yaml` build path
leaves the flag unset, so production stays clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two problems showed up while trying to log into the long-lived dev
environment with the dev-fixed code `123456`:
1. `ConfirmEmailCode` checked the per-challenge attempts ceiling
*before* the dev-fixed-code override. A developer who burned past
`ChallengeMaxAttempts` on an existing un-consumed challenge (easy
to trigger when the throttle reuses one challenge_id) hit
`ErrTooManyAttempts` and the UI rendered "code expired or already
used" even though the fixed code was correct. Reorder so the
dev-fixed-code branch runs first and bypasses both the bcrypt
verify and the attempts gate. Production stays unaffected
because production loaders refuse to set `DevFixedCode`.
2. `dev-deploy.yaml` only fires on push to `development`, so the
matching docker-compose default change for
`BACKEND_AUTH_DEV_FIXED_CODE` could not reach the running stack
before this PR merged. Add `workflow_dispatch: {}` so a developer
can deploy any branch — typically a feature branch under review —
from the Gitea Actions UI without waiting for the merge.
Covered by a new `TestConfirmEmailCodeDevFixedCodeBypassesAttemptsCeiling`
integration test that burns through the ceiling with wrong codes
then proves the dev-fixed code still produces a session.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`cancel-in-progress: true` killed run #73 even though it was the
only ui-test in its concurrency group — Gitea appears to cancel the
in-progress job on its own under that setting in some edge cases.
Switch to a singleton group with `cancel-in-progress: false`. The
new behaviour is simple queueing: only one ui-test workflow runs at
a time across the repository, the rest wait. Vite-on-:5173 cannot
collide because there is never a second ui-test alive. The wall-time
hit is bounded — ui-test is ~2 minutes — and bursts are rare enough
that queueing is cheap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`gitea.ref` differs between push (`refs/heads/<branch>`) and
pull_request (`refs/pull/N/head`) events even for the same commit,
so the two parallel runs land in different concurrency groups and
the Vite-on-:5173 collision is not suppressed. Switching the key to
the head sha (`gitea.event.pull_request.head.sha || gitea.sha`)
collapses both events into one bucket, leaving exactly one ui-test
alive per commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two ui-test jobs cannot coexist on the same host: Playwright's
`webServer` spec spawns `pnpm dev` on :5173, and on a host-mode
runner the port lives in the host namespace shared by every job.
ui-test #67 hit "Error: http://localhost:5173 is already used"
because a parallel job's Vite still held the port.
Two changes:
1. `concurrency: ui-test-${{ gitea.ref }}` with `cancel-in-progress:
true`. New push/PR runs against the same ref kill any earlier
ui-test before starting, so we never have two `pnpm dev`s alive
at once.
2. `pkill -f 'vite dev' || true` plus `fuser -k 5173/tcp` right
before Playwright. Defence in depth in case the concurrency
cancellation does not reap the spawned shell promptly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Gitea Actions cache service now answers on 10.200.0.1:43513
(post nftables fix on the runner side). Turn `cache: true` and
`cache: pnpm` back on so setup-go/setup-node can use it for
cross-job tarball caching on top of the host-persistent caches we
already rely on.
The setup-* actions still tolerate the cache being unavailable, so
this is reversible to `cache: false` if the service goes away again.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`cache: true` (setup-go) and `cache: pnpm` (setup-node) make the
actions push and pull tarballs through the Gitea Actions cache
service at 192.168.0.222:43513. That endpoint currently does not
answer, so every workflow burns minutes per run on reserveCache
retries before the action gives up.
In host-mode the real caches live under the runner user's $HOME
(~/go/pkg/mod, ~/.cache/go-build, ~/.local/share/pnpm,
~/.cache/ms-playwright) and persist between jobs without any
actions/cache plumbing. Switching cache: off avoids the zombie
retries and uses the local disk caches the runner already has warm.
Reviving the cache service is a separate TODO. Until then this is
the simpler and faster baseline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`playwright install --with-deps` shells out to `sudo apt-get install`
for the system libraries that headless browsers need. In a job
container that runs as root this is silent; on a host-mode runner the
non-interactive sudo prompts for a password, fails three times, and
the step exits 1.
Drop --with-deps. The system .so libraries are installed once on the
host via `pnpm exec playwright install-deps` (or the equivalent
apt-get incantation); workflow runs only need to fetch the browser
binaries themselves, which lives under the runner user's home and
needs no privilege.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The act_runner now executes jobs natively on the host (no per-job
container), so actions/checkout uses the host's system CA store,
which already trusts the host-Caddy root CA. The workaround that
disabled TLS verification for `git fetch` is no longer needed and
just hides legitimate cert issues if they ever appear.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches the `name:` field on every workflow to the bulleted style:
Tests · Go (go-unit.yaml)
Tests · UI (ui-test.yaml)
Tests · Integration (integration.yaml)
Deploy · Dev (dev-deploy.yaml)
Build · Prod (prod-build.yaml)
Deploy · Prod (deploy-prod.yaml)
File names stay the same so existing path filters and any URL
references continue to work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs surfaced on the first real merge into development:
1. `${{ env.HOME }}` evaluates to empty string at the workflow stage,
so GALAXY_DEV_GAME_STATE_DIR became `/.galaxy-dev/game-state`.
Resolve in the shell instead of YAML.
2. The compose bind-mount of GeoIP2-Country-Test.mmdb referenced a
path inside the runner's workspace volume, which the host Docker
daemon cannot see — it created an empty directory and the backend
crashed with "geoip database: is a directory" in a restart loop.
Bake the file into the backend image so dev-deploy no longer needs
a bind-mount; local-dev compose still mounts it on top for swap-in
during development.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Gitea host serves https://gitea.iliadenisov.ru with a cert signed
by host-Caddy's internal CA, which the runner-image's CA bundle does
not trust. actions/checkout@v4 fails on `git fetch` as a result, so
every workflow on gitea.lan has been failing — visible only now that
we made gitea.lan the primary CI target.
Sets GIT_SSL_NO_VERIFY=true on every workflow as a quick fix. Safe in
practice because both endpoints sit on the same LAN. The long-term
fix is to bake the Caddy root CA into the runner image and drop this
env.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reshapes .gitea/workflows/ around the new main ← development ←
feature/* branching model:
- go-unit.yaml — Go unit tests, runs on push/PR matching Go paths
- ui-test.yaml — narrowed to Vitest + Playwright only (Go tests now
live in go-unit.yaml)
- integration.yaml — testcontainers suite, fires on PR to
development/main and on push to development
- dev-deploy.yaml — builds the stack and (re)deploys tools/dev-deploy/
on every merge into development
- prod-build.yaml — builds prod images on push to main and uploads
docker save bundles as artifacts (30-day retention)
- deploy-prod.yaml — workflow_dispatch placeholder for the future
SSH-based rollout
ui-release.yaml is removed; its v* tag trigger is superseded by
prod-build.yaml plus the manual deploy-prod entry point.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ui-test workflow gains a `!**/*.md` negation so commits touching only
markdown (READMEs, PLAN.md updates, topic docs) no longer kick off the
full Go + Vitest + Playwright pipeline. Mixed commits keep triggering
because at least one positive path (`ui/**`, `gateway/**`, …) still
matches.
Project CLAUDE.md adds a per-stage CI gate section so the local
Gitea Actions runner is exercised at the close of every stage from
any PLAN.md, with the push step pre-authorised.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two issues surfaced by the first end-to-end ui-test.yaml run on a
clean Linux runner that don't reproduce locally:
- pkg/geoip tests load fixtures from the pkg/geoip/test-data git
submodule (MaxMind-DB). actions/checkout@v4 does not fetch
submodules by default, so the fixture path is missing on the
runner. Both ui-test and ui-release workflows now check out with
submodules: recursive.
- pkg/util/TestWritable asserts that /usr/lib is not writable, which
holds for unprivileged users but fails inside the catthehacker
workflow container that runs as root. Skip that branch when
os.Geteuid() == 0; the root-only "the writable dir is writable"
branch still runs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Vitest + @testing-library/jest-dom matchers wired through tests/setup.ts.
Playwright with four projects: chromium-desktop, webkit-desktop,
chromium-mobile-iphone-13, chromium-mobile-pixel-5; traces and
screenshots retained on failure.
.gitea/workflows/ui-test.yaml runs Tier 1 on every push and pull
request: monorepo Go service tests (backend with -p 1 to dodge
testcontainer contention; gateway, game, every pkg/<name> module),
pnpm install --frozen-lockfile, playwright install --with-deps,
pnpm test, pnpm exec playwright test. Uploads playwright-report
and test-results on failure. Integration suite stays gated behind
make -C integration integration; deprecated client/ excluded.
.gitea/workflows/ui-release.yaml mirrors Tier 1 on v* tag push and
keeps commented placeholders for visual regression (Phase 33) and
macOS iOS smoke (Phase 32).
ui/docs/testing.md documents both tiers and the local invocations
that mirror what CI runs. ui/PLAN.md Phase 2 marked done; Phase 3
gains a bullet to extend the go test command with ./ui/core/...;
Phase 36 has the renamed release workflow path.
tools/local-ci/ ships a self-contained docker-compose for verifying
workflows against a local Gitea + arm64 act_runner before pushing
to a real instance.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>