The previous commit stamped `galaxy.stack=<value>` on services,
volumes, and networks. Putting it on volumes/networks changes their
compose config-hash on every label revision, so `docker compose up`
tries to recreate them — which on the long-lived dev environment
either destroys the postgres data volume or deadlocks while trying
to remove `galaxy-dev-internal` with containers still bound to it.
Observed live: run #184 hung in compose recreate after the three
stateful services were stopped, with no recovery.
Containers alone are sufficient for the cleanup contract (we filter
containers, not volumes or networks). Roll back the label on volumes
and networks in both compose files and capture the rule in
docs/ARCHITECTURE.md so the next contributor does not reintroduce it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five connected cleanups across the dev/CI infrastructure:
1. Drop tools/local-ci/. The standalone Gitea + act_runner stack was
the legacy "offline workflow validator"; the per-stage CI gate now
runs on gitea.lan and the directory was only retained as a
fallback. Removing it leaves no operational dependency: backend,
gateway, and game code have no references; documentation that
pointed at it (CLAUDE.md, docs/ARCHITECTURE.md, ui/docs/testing.md,
tools/dev-deploy/README.md, tools/local-dev/README.md) is updated
in this same change. Historical "Verified on local-ci run N"
markers in ui/PLAN.md are preserved unchanged.
2. Lift the pre-production single-migration rule. The rule forced
every schema delta into 00001_init.sql and required a manual
make clean-data wipe on every backward-incompatible change in
tools/dev-deploy/. Future schema deltas now land as additive
sequence-numbered files (00002_*.sql, …) that goose applies
automatically on backend startup; 00001_init.sql becomes an
immutable baseline. Authoring conventions live in
backend/internal/postgres/migrations/README.md. The chain may be
squashed back into a fresh 00001 as a deliberate one-time
operation before the first production deployment.
3. Document the deployment cadence. The dev environment is
single-tenant: pushes to feature/* run the test workflows
(go-unit, ui-test, integration) only; dev-deploy.yaml fires on
push to development. A workflow_dispatch override on
dev-deploy.yaml lets a developer preview a feature branch on the
shared dev environment before merge; the next merge into
development overwrites the manual deploy idempotently.
4. Scope compose-managed resources by an explicit
galaxy.stack=<local-dev|dev-deploy> label. Both compose files
stamp the label on every service, network, and named volume.
Makefiles in tools/local-dev/ and tools/dev-deploy/ filter their
engine-cleanup operations by (stack-label AND engine OCI title)
so they never touch unrelated workloads on the same daemon.
dev-deploy.yaml gains a pre-`compose up` step that reaps stale
exited/dead containers under the dev-deploy stack label.
5. Backend now stamps the same galaxy.stack=<value> label on every
engine container it spawns, sourced from a new BACKEND_STACK_LABEL
env var (empty → label not applied; legacy-safe). Both compose
files set it to their stack name (local-dev / dev-deploy). The
contract is recorded in docs/ARCHITECTURE.md under
"Container labels". A package-level test in
backend/internal/runtime exercises both the label-present and
label-absent paths.
No tests intentionally regressed: go test ./backend/internal/{config,
runtime,dockerclient} is green, both compose files validate cleanly,
and the backend, gateway, and game modules all build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The compose stack hard-coded host ports (postgres 5433, redis 6380,
mailpit 8025, gateway REST 8080, gateway gRPC 9090) — fine for a
clean dev machine, painful when those ports collide with other
services on the same host (e.g. a `crowdsec` sitting on
127.0.0.1:8080 or a Prometheus instance on :9090).
Every host-port mapping is now `${LOCAL_DEV_*_PORT:-<old-default>}`,
so the defaults match prior behaviour for everyone and a per-host
override is a single environment variable away. `.env` carries the
overrides as commented-out lines so the customisation surface is
discoverable without grepping the compose file. README's
"Port 8080 already in use" troubleshooting entry now points at the
new variables and the optional `docker-compose.override.yml`
workflow.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three fixes around the dev sandbox end-to-end path. Each one was
flushed out by an actual login walkthrough after the previous
commit.
Backend bootstrap now treats `cancelled`, `finished`, and
`start_failed` as terminal: the per-boot find-or-create skips such
games and provisions a fresh one. Without this, a single bad
shutdown cascade leaves the developer staring at a dead lobby tile
forever (cancelled games don't transition back). Covered by
TestTerminalSandboxStatus.
Tools/local-dev: stop killing engine containers in `make down`. The
runtime treats the disappearance of an engine as a real failure
(cascading the lobby game to `cancelled`); leaving the container
running across `down/up` lets the runtime reconciler re-attach on
the next boot. The teardown happens only in `make clean`, where the
DB is wiped anyway. Compose now also exposes :9090 (authenticated
EdgeGateway listener) on the host so the Vite dev proxy can reach
the Connect-Web surface, and bumps the gateway anti-abuse limits
for `public_misc` so the same surface is not blanket-rejected with
413.
Ui/frontend: the lobby's `My Games` cards are now clickable only
for the playable statuses (`running`, `paused`, `finished`). All
other statuses render as disabled buttons so a click on a draft or
cancelled game no longer drops the user on a 404 — the in-game
view at /games/:id/* doesn't exist before Phase 10 and never makes
sense for a cancelled game. Vite proxy splits the dev targets so
`/api/*` continues to talk to the REST listener and
`/galaxy.gateway.v1.EdgeGateway/*` is routed to the Connect-Web
listener via VITE_DEV_GRPC_PROXY_TARGET (defaults to :9090).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds backend/internal/devsandbox: an idempotent boot-time hook that,
when BACKEND_DEV_SANDBOX_EMAIL is set, ensures (1) the configured
engine_version row, (2) the real dev user, (3) PlayerCount-1
deterministic dummy users, (4) a private "Dev Sandbox" game with a
year-out turn schedule, (5) memberships for every participant via
the new lobby.Service.InsertMembershipDirect helper, (6) a drive of
the lifecycle to running. Re-running on a populated DB is a no-op;
partial states from earlier crashes are recovered.
tools/local-dev gains the matching env vars in .env, surfaces them
in compose, and acquires a `make build-engine` target that builds
galaxy-engine:local-dev from game/Dockerfile (a prerequisite of
`up`/`rebuild`). The compose game-state mount is changed from a
named volume to a host bind on /tmp/galaxy-game-state so backend's
bind-mount source for spawned engine containers resolves on the
docker daemon.
After `make -C tools/local-dev up`, login as dev@local.test with
the dev code 123456 and the Dev Sandbox already shows up in My
Games. Per-user behaviour for the same email survives a backend
restart.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds tools/local-dev/ with postgres + redis + mailpit + backend +
gateway plus a Make wrapper, so `make -C tools/local-dev up` brings
the full authenticated stack online and `pnpm -C ui/frontend dev`
talks to it directly. The committed `.env.development` already
points at the stack and pins the matching gateway response public
key from the dev keypair under tools/local-dev/keys/.
The backend ships a new opt-in env, BACKEND_AUTH_DEV_FIXED_CODE
(`tools/local-dev/.env` defaults it to 123456). When set,
ConfirmEmailCode accepts that literal in addition to the real
bcrypt-verified code; SendEmailCode still queues a real email so
Mailpit captures the issued code at http://localhost:8025/, and
both paths coexist. The override is rejected as non-six-digit by
config validation and emits a loud warning at backend startup.
The local-dev Dockerfiles mirror backend/Dockerfile and
gateway/Dockerfile but switch the runtime stage to alpine so
docker-compose healthchecks can wget /healthz; the gateway
Dockerfile additionally copies ui/core/ into the build context
because gateway/go.mod's `replace galaxy/core => ../ui/core` is
required to compile the gateway main.
Smoke tested:
- `make -C tools/local-dev up` boots all five services to healthy.
- send-email-code + confirm-email-code with code=123456 returns a
device_session_id; a real code in Mailpit also redeems
successfully.
- `pnpm test` 14/14, `pnpm exec playwright test` 44/44.
- `go test ./backend/internal/config/...` green.
Docs: tools/local-dev/README.md, tools/local-dev/keys/README.md,
new "Local development stack" section in ui/docs/testing.md, and a
short pointer in ui/README.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>