Commit Graph

14 Commits

Author SHA1 Message Date
Ilia Denisov 8565942392 feat(deploy): single-origin path-based deployment + project site
Build · Site / build (push) Successful in 8s
Tests · Go / test (push) Successful in 2m22s
Tests · UI / test (push) Failing after 2m42s
Serve the whole stack behind one host: site at /, game UI at /game/,
gateway REST at /api + /healthz, Connect at /rpc (prefix stripped by the
edge Caddy). The built artifact is domain-agnostic — the UI talks to the
gateway same-origin via relative URLs, so the same bundle runs under any
host with no rebuild and with CORS disabled.
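
The routing rules above can be sketched as a pure Go function that maps a request path to an upstream and to the path that upstream receives. This only illustrates the rules and is not the Caddyfile. The upstream names (`site`, `ui`, `gateway-rest`, `gateway-connect`) and the method name `Example` are hypothetical. As stated above, only `/rpc` has its prefix stripped. Keeping `/game` on UI requests is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// hasSegmentPrefix reports whether path is exactly prefix or continues
// with "/" after it, so "/rpcx" does not match "/rpc".
func hasSegmentPrefix(path, prefix string) bool {
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

// route mirrors the single-origin layout: site at /, UI at /game/, gateway
// REST at /api and /healthz, Connect at /rpc with the prefix stripped.
// Upstream names are hypothetical placeholders.
func route(path string) (upstream, forwarded string) {
	switch {
	case hasSegmentPrefix(path, "/rpc"):
		// The edge strips /rpc, so the gateway sees /edge.v1.Gateway/...
		forwarded = strings.TrimPrefix(path, "/rpc")
		if forwarded == "" {
			forwarded = "/"
		}
		return "gateway-connect", forwarded
	case hasSegmentPrefix(path, "/api"), path == "/healthz":
		return "gateway-rest", path
	case hasSegmentPrefix(path, "/game"):
		// Assumed: the UI is built with BASE_PATH=/game, so the prefix is kept.
		return "ui", path
	default:
		return "site", path
	}
}

func main() {
	for _, p := range []string{"/", "/game/", "/healthz", "/rpc/edge.v1.Gateway/Example"} {
		up, fwd := route(p)
		fmt.Printf("%-30s -> %s %s\n", p, up, fwd)
	}
}
```

Because every route lives on one origin, the browser never makes a cross-origin request. That is why the CORS allow-lists can stay empty.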

- Rename the Connect proto service galaxy.gateway.v1.EdgeGateway ->
  edge.v1.Gateway; regenerate Go + TS; public path /rpc/edge.v1.Gateway.
- Move the game UI under base path /game (env BASE_PATH); make the
  manifest, service-worker scope, WASM loader, and all navigation
  base-aware via a withBase helper.
- Relative API + /rpc Connect prefix; Vite dev proxy mirrors the strip.
- Rewrite the edge Caddy (dev + prod) for path-based routing; empty CORS
  allow-lists (same-origin); single host.
- New VitePress project site (site/): i18n en/ru with switcher, LaTeX
  math, minimal monospace theme; built and served at /.
- dev-deploy compose/Makefile + CI (dev-deploy, prod-build, new
  site-build) build and seed the site; probes hit /, /game/, /healthz.
- Sync docs (ARCHITECTURE, gateway README/openapi, dev-deploy &
  local-dev READMEs, CLAUDE.md, ui/PLAN).
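
The real `withBase` helper lives in the TypeScript UI. The sketch below shows the same idea in Go: prefix an app-relative path with the configured base. The signature and the trailing-slash handling are assumptions, not the UI's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// withBase joins a base path such as "/game" (from BASE_PATH) with an
// app-relative path. Hypothetical port; the real helper is in the UI.
func withBase(base, p string) string {
	// "/game/" and "/game" behave the same; a bare "/" becomes "".
	base = strings.TrimRight(base, "/")
	if !strings.HasPrefix(p, "/") {
		p = "/" + p
	}
	return base + p
}

func main() {
	fmt.Println(withBase("/game", "/lobby"))    // /game/lobby
	fmt.Println(withBase("/game/", "sw.js"))    // /game/sw.js
	fmt.Println(withBase("", "/manifest.json")) // /manifest.json
}
```

Routing every URL through one helper means that changing `BASE_PATH` moves the manifest, service-worker scope, WASM URL, and links together.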

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:19:07 +02:00
Ilia Denisov b729036778 build(ui): build core.wasm in CI, stop committing the binary (F6)
Tests · UI / test (push) Successful in 3m48s
Tests · UI / test (pull_request) Successful in 2m35s
core.wasm and wasm_exec.js are no longer tracked (untracked + gitignored).
A reusable composite action .gitea/actions/build-wasm installs TinyGo
(cached via actions/cache) and runs `make -C ui wasm`; it is used in all three
frontend-building workflows — ui-test (before Playwright; Vitest uses the
fake Core and needs no build), dev-deploy, and prod-build. ui-test gains a
Go setup (TinyGo shells out to Go); the deploy workflows already had one.

Docs: ui/docs/wasm-toolchain.md, ui/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:29:33 +02:00
Ilia Denisov b24d53b82f ci: install pnpm into a per-job dir to fix the host-runner setup race
Tests · UI / test (push) Waiting to run
Tests · UI / test (pull_request) Successful in 1m56s
`pnpm/action-setup@v4` defaults to installing pnpm in the shared
`~/setup-pnpm`. On the single host-mode runner $HOME is shared across
concurrent jobs, so when two pnpm jobs overlap (e.g. a post-merge
`dev-deploy` and `ui-test`, which sit in different concurrency groups)
their self-installers race and one fails with
`ENOTEMPTY ... rmdir '~/setup-pnpm/node_modules/.bin/store/v11/files'`
before the tests even run.

Point each step's `dest` at `${{ runner.temp }}/setup-pnpm` (a per-job
isolated directory) so concurrent jobs never share the install location.
The action still adds `dest` to PATH, so setup-node's pnpm cache and
later `pnpm` calls are unaffected; the pnpm package store stays shared
(safe — pnpm locks it). Applied to the three workflows that set up pnpm:
ui-test, dev-deploy, prod-build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 17:26:49 +02:00
Ilia Denisov f70258849f fix(dev-deploy): seed geoip onto a named volume
`docker restart galaxy-dev-backend` failed with "not a directory"
after every dev-deploy workflow run. Root cause: the compose file
bind-mounted the geoip database via a relative path
(`../../pkg/geoip/test-data/test-data/GeoIP2-Country-Test.mmdb`).
When the Gitea runner invoked `docker compose up`, the path
resolved against the runner's ephemeral workspace under
`/home/runner/.cache/act/<hash>/hostexecutor/...`. The bind source
baked into the running container therefore pointed at that
ephemeral path; the runner deleted the workspace once the workflow
finished, and any later `docker restart` could not remount.

Replace the bind with a named volume `galaxy-dev-geoip-data`,
seeded at deploy time:

- `tools/dev-deploy/docker-compose.yml`: mount
  `galaxy-dev-geoip-data:/var/lib/galaxy:ro` instead of a relative
  bind. Declare the volume in the top-level `volumes:` block.

- `.gitea/workflows/dev-deploy.yaml`: new `Seed geoip volume` step
  (placed right after the existing UI-volume seed) copies the
  fixture from `pkg/geoip/test-data/test-data/` into the named
  volume via an ephemeral alpine container, the same pattern UI
  seeding already uses.

- `tools/dev-deploy/Makefile`: new `seed-geoip` target performs
  the same copy from the persistent checkout. `up` and `rebuild`
  now depend on it, so a hand-run `make -C tools/dev-deploy up`
  populates the volume without operator action.

- `tools/dev-deploy/README.md`: updated the make-targets table to
  list `seed-geoip`.

- `tools/dev-deploy/KNOWN-ISSUES.md`: the entry for the restart
  failure is downgraded to a "fixed" postmortem; the symptom,
  cause, and where the fix lives are kept for future reference.
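
The seeding pattern runs a throwaway alpine container that mounts the named volume and copies the fixture into it. The workflow's exact command is not reproduced here. This Go sketch builds a plausible `docker run` argument list. The container mount points `/src` and `/dst` and the host checkout path are illustrative.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// seedArgs returns arguments for `docker run` that copy one file from a
// host directory into a named volume via an ephemeral alpine container.
// Mount points /src and /dst are illustrative, not taken from the workflow.
func seedArgs(volume, hostDir, file string) []string {
	return []string{
		"run", "--rm", // remove the container once the copy finishes
		"-v", volume + ":/dst", // the named volume being seeded
		"-v", hostDir + ":/src:ro", // the fixture directory, read-only
		"alpine",
		"cp", path.Join("/src", file), path.Join("/dst", file),
	}
}

func main() {
	args := seedArgs("galaxy-dev-geoip-data",
		"/path/to/checkout/pkg/geoip/test-data/test-data",
		"GeoIP2-Country-Test.mmdb")
	fmt.Println("docker " + strings.Join(args, " "))
}
```

The volume is owned by Docker rather than by a workspace path. So it survives the runner deleting its checkout, and `docker restart` can always remount it.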

Verification on the dev host (this branch checked out):

  $ make -C tools/dev-deploy up                # populates the volume, brings stack healthy
  $ docker restart galaxy-dev-backend          # used to error "not a directory"
  $ until [ "$(docker inspect -f '{{.State.Health.Status}}' galaxy-dev-backend)" = "healthy" ]; do sleep 2; done
  $ echo "ok"                                   # backend up 6s, healthy

The pre-existing sandbox engine `galaxy-game-80f3ce86-...` survived
both `make up` and `docker restart` untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 01:59:38 +02:00
Ilia Denisov a9087691a3 chore(ci): tidy CI/dev infra — drop local-ci, lift migration rule, scope by galaxy.stack label
Tests · Go / test (push) Successful in 2m6s
Tests · Go / test (pull_request) Successful in 3m1s
Tests · Integration / integration (pull_request) Successful in 1m42s
Five connected cleanups across the dev/CI infrastructure:

1. Drop tools/local-ci/. The standalone Gitea + act_runner stack was
   the legacy "offline workflow validator"; the per-stage CI gate now
   runs on gitea.lan and the directory was only retained as a
   fallback. Removing it leaves no operational dependency: backend,
   gateway, and game code have no references; documentation that
   pointed at it (CLAUDE.md, docs/ARCHITECTURE.md, ui/docs/testing.md,
   tools/dev-deploy/README.md, tools/local-dev/README.md) is updated
   in this same change. Historical "Verified on local-ci run N"
   markers in ui/PLAN.md are preserved unchanged.

2. Lift the pre-production single-migration rule. The rule forced
   every schema delta into 00001_init.sql and required a manual
   make clean-data wipe on every backward-incompatible change in
   tools/dev-deploy/. Future schema deltas now land as additive
   sequence-numbered files (00002_*.sql, …) that goose applies
   automatically on backend startup; 00001_init.sql becomes an
   immutable baseline. Authoring conventions live in
   backend/internal/postgres/migrations/README.md. The chain may be
   squashed back into a fresh 00001 as a deliberate one-time
   operation before the first production deployment.

3. Document the deployment cadence. The dev environment is
   single-tenant: pushes to feature/* run the test workflows
   (go-unit, ui-test, integration) only; dev-deploy.yaml fires on
   push to development. A workflow_dispatch override on
   dev-deploy.yaml lets a developer preview a feature branch on the
   shared dev environment before merge; the next merge into
   development overwrites the manual deploy idempotently.

4. Scope compose-managed resources by an explicit
   galaxy.stack=<local-dev|dev-deploy> label. Both compose files
   stamp the label on every service, network, and named volume.
   Makefiles in tools/local-dev/ and tools/dev-deploy/ filter their
   engine-cleanup operations by (stack-label AND engine OCI title)
   so they never touch unrelated workloads on the same daemon.
   dev-deploy.yaml gains a pre-`compose up` step that reaps stale
   exited/dead containers under the dev-deploy stack label.

5. Backend now stamps the same galaxy.stack=<value> label on every
   engine container it spawns, sourced from a new BACKEND_STACK_LABEL
   env var (empty → label not applied; legacy-safe). Both compose
   files set it to their stack name (local-dev / dev-deploy). The
   contract is recorded in docs/ARCHITECTURE.md under
   "Container labels". A package-level test in
   backend/internal/runtime exercises both the label-present and
   label-absent paths.
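
The label contract in items 4 and 5 can be sketched in Go. The backend adds `galaxy.stack` only when `BACKEND_STACK_LABEL` is non-empty, and cleanup selects a container only when both the stack label and the engine's OCI title match. The function names and the engine title value are hypothetical. `org.opencontainers.image.title` is the standard OCI annotation key and is assumed here to be the "engine OCI title" that the Makefiles filter on.

```go
package main

import "fmt"

const (
	stackLabel = "galaxy.stack"
	// Standard OCI annotation key; assumed to carry the engine title.
	ociTitle = "org.opencontainers.image.title"
)

// engineLabels copies base and adds galaxy.stack=<stack> only when stack is
// non-empty, mirroring the "empty -> label not applied" rule.
func engineLabels(base map[string]string, stack string) map[string]string {
	out := make(map[string]string, len(base)+1)
	for k, v := range base {
		out[k] = v
	}
	if stack != "" {
		out[stackLabel] = stack
	}
	return out
}

// reapable reports whether a container belongs to this stack AND is an
// engine. Empty inputs never match, so unlabeled workloads are left alone.
func reapable(labels map[string]string, stack, engineTitle string) bool {
	if stack == "" || engineTitle == "" {
		return false
	}
	return labels[stackLabel] == stack && labels[ociTitle] == engineTitle
}

func main() {
	const title = "galaxy-game" // hypothetical engine image title
	l := engineLabels(map[string]string{ociTitle: title}, "dev-deploy")
	fmt.Println(reapable(l, "dev-deploy", title)) // true
	fmt.Println(reapable(l, "local-dev", title))  // false
}
```

Requiring both labels is the safety margin: a container from the other stack, or any unrelated workload on the same daemon, fails at least one check.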

No tests intentionally regressed: go test ./backend/internal/{config,
runtime,dockerclient} is green, both compose files validate cleanly,
and the backend, gateway, and game modules all build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 23:32:42 +02:00
Ilia Denisov 81917acc3e dev-deploy: enable Dev Sandbox bootstrap and synthetic-report loader
Tests · UI / test (push) Has been cancelled
Tests · Integration / integration (pull_request) Successful in 1m47s
Tests · Go / test (pull_request) Successful in 2m4s
Tests · UI / test (pull_request) Successful in 2m23s
Two long-standing dev-environment ergonomics had not survived the
move from the bespoke local-dev stack to the CI-driven dev-deploy:

1. `BACKEND_DEV_SANDBOX_EMAIL` defaulted to an empty string in the
   dev-deploy compose, so the auto-provisioned "Dev Sandbox" game
   never appeared on `https://www.galaxy.lan`. Bake `dev@galaxy.lan`
   as the default — matches `.env.example` and lets a developer who
   logs in with that email find a ready-to-play game in the lobby.

2. The lobby's synthetic-report loader was gated on
   `import.meta.env.DEV`, which is true only for `vite dev` (the
   tools/local-dev path). The long-lived dev environment builds
   with `vite build` (production mode), so the section was always
   stripped from its bundle. Gate it on an explicit
   `VITE_GALAXY_DEV_AFFORDANCES` flag instead and set it both in
   `.env.development` (preserves `pnpm dev` behaviour) and in the
   `dev-deploy.yaml` build step. The `prod-build.yaml` build path
   leaves the flag unset, so production stays clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 21:46:24 +02:00
Ilia Denisov 859b157a59 auth dev-fixed-code bypasses attempts cap; dev-deploy gains manual dispatch
Tests · Go / test (pull_request) Successful in 2m9s
Tests · Go / test (push) Successful in 2m9s
Tests · Integration / integration (pull_request) Successful in 1m49s
Tests · UI / test (pull_request) Successful in 2m51s
Two problems showed up while trying to log into the long-lived dev
environment with the dev-fixed code `123456`:

1. `ConfirmEmailCode` checked the per-challenge attempts ceiling
   *before* the dev-fixed-code override. A developer who burned past
   `ChallengeMaxAttempts` on an existing un-consumed challenge (easy
   to trigger when the throttle reuses one challenge_id) hit
   `ErrTooManyAttempts` and the UI rendered "code expired or already
   used" even though the fixed code was correct. Reorder so the
   dev-fixed-code branch runs first and bypasses both the bcrypt
   verify and the attempts gate. Production stays unaffected
   because production loaders refuse to set `DevFixedCode`.

2. `dev-deploy.yaml` only fires on push to `development`, so the
   matching docker-compose default change for
   `BACKEND_AUTH_DEV_FIXED_CODE` could not reach the running stack
   before this PR merged. Add `workflow_dispatch: {}` so a developer
   can deploy any branch — typically a feature branch under review —
   from the Gitea Actions UI without waiting for the merge.
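
The reordering in item 1 can be sketched in Go. This is not the actual `ConfirmEmailCode`. The `challenge` type, the `ErrWrongCode` value, and the `verify` callback (standing in for the bcrypt comparison) are hypothetical. The sketch shows the order of checks: the dev-fixed code is tested first, so neither the attempts gate nor the hash verify can reject it.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrTooManyAttempts = errors.New("too many attempts")
	ErrWrongCode       = errors.New("wrong code") // hypothetical
)

// challenge is a simplified stand-in for a stored email challenge.
type challenge struct {
	attempts int
	codeHash string
}

// confirm checks a submitted code. devFixedCode is "" in production,
// which disables the bypass entirely.
func confirm(ch *challenge, code, devFixedCode string, maxAttempts int,
	verify func(hash, code string) bool) error {
	// 1. Dev-fixed code first: skips both the attempts ceiling and verify.
	if devFixedCode != "" && code == devFixedCode {
		return nil
	}
	// 2. Attempts ceiling.
	if ch.attempts >= maxAttempts {
		return ErrTooManyAttempts
	}
	ch.attempts++
	// 3. Hash verify (bcrypt in the real service).
	if !verify(ch.codeHash, code) {
		return ErrWrongCode
	}
	return nil
}

func main() {
	plain := func(hash, code string) bool { return hash == code } // fake verify
	ch := &challenge{codeHash: "999999"}
	for i := 0; i < 5; i++ {
		_ = confirm(ch, "000000", "123456", 3, plain) // burn the ceiling
	}
	fmt.Println(confirm(ch, "000000", "123456", 3, plain)) // too many attempts
	fmt.Println(confirm(ch, "123456", "123456", 3, plain)) // <nil>
}
```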

Covered by a new `TestConfirmEmailCodeDevFixedCodeBypassesAttemptsCeiling`
integration test that burns through the ceiling with wrong codes
then proves the dev-fixed code still produces a session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 21:28:30 +02:00
Ilia Denisov 2a95bf4a50 ci: re-enable actions cache now that the runner serves it
Tests · UI / test (push) Successful in 2m20s
Tests · Go / test (push) Failing after 2m21s
Tests · Go / test (pull_request) Successful in 1m40s
Tests · Integration / integration (pull_request) Successful in 1m46s
Tests · UI / test (pull_request) Successful in 2m2s
The Gitea Actions cache service now answers on 10.200.0.1:43513
(post nftables fix on the runner side). Turn `cache: true` and
`cache: pnpm` back on so setup-go/setup-node can use it for
cross-job tarball caching on top of the host-persistent caches we
already rely on.

The setup-* actions still tolerate the cache being unavailable, so
this is reversible to `cache: false` if the service goes away again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:39:39 +02:00
Ilia Denisov 8058f26397 ci: drop cache: setting in setup-go/setup-node
Tests · Go / test (push) Successful in 2m21s
Tests · UI / test (push) Successful in 2m22s
Tests · Go / test (pull_request) Successful in 3m14s
Tests · Integration / integration (pull_request) Successful in 1m37s
Tests · UI / test (pull_request) Successful in 2m7s
`cache: true` (setup-go) and `cache: pnpm` (setup-node) make the
actions push and pull tarballs through the Gitea Actions cache
service at 192.168.0.222:43513. That endpoint currently does not
answer, so every workflow burns minutes per run on reserveCache
retries before the action gives up.

In host-mode the real caches live under the runner user's $HOME
(~/go/pkg/mod, ~/.cache/go-build, ~/.local/share/pnpm,
~/.cache/ms-playwright) and persist between jobs without any
actions/cache plumbing. Turning the cache off avoids the zombie
retries and uses the local disk caches the runner already has warm.

Reviving the cache service is a separate TODO. Until then this is
the simpler and faster baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 06:39:22 +02:00
Ilia Denisov 4a88b24f4b ci: drop GIT_SSL_NO_VERIFY now that runner is host-mode
The act_runner now executes jobs natively on the host (no per-job
container), so actions/checkout uses the host's system CA store,
which already trusts the host-Caddy root CA. The workaround that
disabled TLS verification for `git fetch` is no longer needed and
just hides legitimate cert issues if they ever appear.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:04:11 +02:00
Ilia Denisov 9ebb2e7f0f ci: rename workflows for Gitea UI readability
Tests · Go / test (push) Successful in 2m31s
Tests · Integration / integration (pull_request) Successful in 2m23s
Tests · Go / test (pull_request) Successful in 2m50s
Tests · UI / test (push) Successful in 13m2s
Tests · UI / test (pull_request) Successful in 13m22s
Switches the `name:` field on every workflow to a middle-dot style:

  Tests · Go            (go-unit.yaml)
  Tests · UI            (ui-test.yaml)
  Tests · Integration   (integration.yaml)
  Deploy · Dev          (dev-deploy.yaml)
  Build · Prod          (prod-build.yaml)
  Deploy · Prod         (deploy-prod.yaml)

File names stay the same so existing path filters and any URL
references continue to work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 00:22:53 +02:00
Ilia Denisov 0da360a644 dev-deploy: fix backend startup in CI
Two bugs surfaced on the first real merge into development:

1. `${{ env.HOME }}` evaluates to an empty string at the workflow stage,
   so GALAXY_DEV_GAME_STATE_DIR became `/.galaxy-dev/game-state`.
   Resolve in the shell instead of YAML.

2. The compose bind-mount of GeoIP2-Country-Test.mmdb referenced a
   path inside the runner's workspace volume, which the host Docker
   daemon cannot see — it created an empty directory and the backend
   crashed with "geoip database: is a directory" in a restart loop.
   Bake the file into the backend image so dev-deploy no longer needs
   a bind-mount; local-dev compose still mounts it on top for swap-in
   during development.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 00:22:16 +02:00
Ilia Denisov c6c5f3c8dd ci: skip TLS verify for actions/checkout on LAN Gitea
go-unit / test (push) Successful in 2m28s
go-unit / test (pull_request) Successful in 2m30s
integration / integration (pull_request) Successful in 2m20s
ui-test / test (push) Successful in 13m5s
ui-test / test (pull_request) Successful in 14m31s
The Gitea host serves https://gitea.iliadenisov.ru with a cert signed
by host-Caddy's internal CA, which the runner-image's CA bundle does
not trust. actions/checkout@v4 fails on `git fetch` as a result, so
every workflow on gitea.lan has been failing — visible only now that
we made gitea.lan the primary CI target.

Sets GIT_SSL_NO_VERIFY=true on every workflow as a quick fix. Safe in
practice because both endpoints sit on the same LAN. The long-term
fix is to bake the Caddy root CA into the runner image and drop this
env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 23:43:51 +02:00
Ilia Denisov f316952c12 ci: split workflows for linear development flow
Reshapes .gitea/workflows/ around the new main ← development ←
feature/* branching model:

- go-unit.yaml — Go unit tests, runs on push/PR matching Go paths
- ui-test.yaml — narrowed to Vitest + Playwright only (Go tests now
  live in go-unit.yaml)
- integration.yaml — testcontainers suite, fires on PR to
  development/main and on push to development
- dev-deploy.yaml — builds the stack and (re)deploys tools/dev-deploy/
  on every merge into development
- prod-build.yaml — builds prod images on push to main and uploads
  docker save bundles as artifacts (30-day retention)
- deploy-prod.yaml — workflow_dispatch placeholder for the future
  SSH-based rollout

ui-release.yaml is removed; its v* tag trigger is superseded by
prod-build.yaml plus the manual deploy-prod entry point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 23:26:46 +02:00