The previous commit stamped `galaxy.stack=<value>` on services,
volumes, and networks. Putting it on volumes/networks changes their
compose config-hash on every label revision, so `docker compose up`
tries to recreate them — which on the long-lived dev environment
either destroys the postgres data volume or deadlocks while trying
to remove `galaxy-dev-internal` with containers still bound to it.
Observed live: run #184 hung in compose recreate after the three
stateful services were stopped, with no recovery.
Containers alone are sufficient for the cleanup contract (we filter
containers, not volumes or networks). Roll back the label on volumes
and networks in both compose files and capture the rule in
docs/ARCHITECTURE.md so the next contributor does not reintroduce it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five connected cleanups across the dev/CI infrastructure:
1. Drop tools/local-ci/. The standalone Gitea + act_runner stack was
the legacy "offline workflow validator"; the per-stage CI gate now
runs on gitea.lan and the directory was only retained as a
fallback. Removing it leaves no operational dependency: backend,
gateway, and game code have no references; documentation that
pointed at it (CLAUDE.md, docs/ARCHITECTURE.md, ui/docs/testing.md,
tools/dev-deploy/README.md, tools/local-dev/README.md) is updated
in this same change. Historical "Verified on local-ci run N"
markers in ui/PLAN.md are preserved unchanged.
2. Lift the pre-production single-migration rule. The rule forced
every schema delta into 00001_init.sql and required a manual
make clean-data wipe on every backward-incompatible change in
tools/dev-deploy/. Future schema deltas now land as additive
sequence-numbered files (00002_*.sql, …) that goose applies
automatically on backend startup; 00001_init.sql becomes an
immutable baseline. Authoring conventions live in
backend/internal/postgres/migrations/README.md. The chain may be
squashed back into a fresh 00001 as a deliberate one-time
operation before the first production deployment.
3. Document the deployment cadence. The dev environment is
single-tenant: pushes to feature/* run the test workflows
(go-unit, ui-test, integration) only; dev-deploy.yaml fires on
push to development. A workflow_dispatch override on
dev-deploy.yaml lets a developer preview a feature branch on the
shared dev environment before merge; the next merge into
development overwrites the manual deploy idempotently.
4. Scope compose-managed resources by an explicit
galaxy.stack=<local-dev|dev-deploy> label. Both compose files
stamp the label on every service, network, and named volume.
Makefiles in tools/local-dev/ and tools/dev-deploy/ filter their
engine-cleanup operations by (stack-label AND engine OCI title)
so they never touch unrelated workloads on the same daemon.
dev-deploy.yaml gains a pre-`compose up` step that reaps stale
exited/dead containers under the dev-deploy stack label.
5. Backend now stamps the same galaxy.stack=<value> label on every
engine container it spawns, sourced from a new BACKEND_STACK_LABEL
env var (empty → label not applied; legacy-safe). Both compose
files set it to their stack name (local-dev / dev-deploy). The
contract is recorded in docs/ARCHITECTURE.md under
"Container labels". A package-level test in
backend/internal/runtime exercises both the label-present and
label-absent paths.
No tests intentionally regressed: go test ./backend/internal/{config,
runtime,dockerclient} is green, both compose files validate cleanly,
and the backend, gateway, and game modules all build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The public REST listener already exposes
`GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS`; the authenticated
Connect-Web listener on the separate gRPC port had no equivalent.
That worked in `tools/local-dev` (Vite proxy makes everything
same-origin) and would work in production once UI and gateway share
a single hostname, but the long-lived dev environment serves the
UI from `https://www.galaxy.lan` and the gateway from
`https://api.galaxy.lan` — every `/galaxy.gateway.v1.EdgeGateway/*`
fetch failed in the browser with the WebKit "Load failed" generic
message because the response carried no `Access-Control-Allow-Origin`
header. Lobby rendered as "[unknown] Load failed" with no game.
Mirror the public-REST CORS surface for the authenticated handler:
- new env `GATEWAY_AUTHENTICATED_GRPC_CORS_ALLOWED_ORIGINS`;
- new `AuthenticatedGRPCConfig.CORSAllowedOrigins` field;
- new `grpcapi.withCORS` middleware wrapping the Connect mux;
- dev-deploy stack sets the env to `https://www.galaxy.lan`.
The middleware speaks plain net/http (the Connect handler is mounted
on a ServeMux, not gin), handles preflight 204 immediately, and
exposes the Connect-Web header set the browser needs to read the
response (`Grpc-Status`, `Grpc-Message`, `Connect-Protocol-Version`).
Empty allow-list disables the middleware — production stays at
"single hostname" by default.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two long-standing dev-environment ergonomics had not survived the
move from the bespoke local-dev stack to the CI-driven dev-deploy:
1. `BACKEND_DEV_SANDBOX_EMAIL` defaulted to an empty string in the
dev-deploy compose, so the auto-provisioned "Dev Sandbox" game
never appeared on `https://www.galaxy.lan`. Bake `dev@galaxy.lan`
as the default — matches `.env.example` and lets a developer who
logs in with that email find a ready-to-play game in the lobby.
2. The lobby's synthetic-report loader was gated on
`import.meta.env.DEV`, which is true only for `vite dev` (the
tools/local-dev path). The long-lived dev environment builds
with `vite build` (production mode), so the section was always
stripped from its bundle. Gate it on an explicit
`VITE_GALAXY_DEV_AFFORDANCES` flag instead and set it both in
`.env.development` (preserves `pnpm dev` behaviour) and in the
`dev-deploy.yaml` build step. The `prod-build.yaml` build path
leaves the flag unset, so production stays clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The long-lived dev environment now opts into the bcrypt-bypass on a
fresh `up`/`rebuild` so a returning developer can sign in with `123456`
even after the matching browser session was cleared (the real emailed
code is single-use). Set the variable to an empty string in `.env` to
force real Mailpit codes (mail-flow QA).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a `GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS` env-driven allow-list
on the public REST server so the dev UI on https://www.galaxy.lan can
call https://api.galaxy.lan without the browser blocking the
cross-origin response. Defaults to empty (no CORS) so the production
posture stays closed.
The middleware mounts before route classification and anti-abuse, so
OPTIONS preflights never charge against per-class rate-limit buckets.
`tools/dev-deploy/docker-compose.yml` opts the dev gateway into a
single allowed origin (`https://www.galaxy.lan`); local-dev keeps the
defaults because Vite proxies through the same origin.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With the runner in host-mode, compose bind-mount paths resolve to
real host paths the Docker daemon can see, so the GeoIP file no
longer needs to be baked into the backend image to survive CI. Bring
back the bind-mount of `pkg/geoip/test-data/.../mmdb`, matching how
local-dev sources it. Image now only carries the backend binary,
symmetric with the production `backend/Dockerfile`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs surfaced on the first real merge into development:
1. `${{ env.HOME }}` evaluates to empty string at the workflow stage,
so GALAXY_DEV_GAME_STATE_DIR became `/.galaxy-dev/game-state`.
Resolve in the shell instead of YAML.
2. The compose bind-mount of GeoIP2-Country-Test.mmdb referenced a
path inside the runner's workspace volume, which the host Docker
daemon cannot see — it created an empty directory and the backend
crashed with "geoip database: is a directory" in a restart loop.
Bake the file into the backend image so dev-deploy no longer needs
a bind-mount; local-dev compose still mounts it on top for swap-in
during development.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A docker-compose stack that hosts postgres, redis, mailpit, backend,
gateway, and an app-routing Caddy. Reachable through the host Caddy at
https://www.galaxy.lan (static SPA) and https://api.galaxy.lan (REST +
gRPC). Coexists with tools/local-dev/ and tools/local-ci/ by giving
every name (compose project, container, network, volume) a distinct
galaxy-dev-* prefix.
State is persisted in named volumes; game-state lives under
${GALAXY_DEV_GAME_STATE_DIR:-$HOME/.galaxy-dev/game-state} so the
default works for a non-root runner without sudo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>