galaxy-game/tools/local-dev/README.md
Ilia Denisov edc9709bd6 local-dev: auto-recreate engine containers when bind-mount disappears
After a host reboot macOS clears /private/tmp, so the per-game
bind-mount source under /tmp/galaxy-game-state/<uuid> vanishes and
Docker refuses to restart the long-lived engine container under
`restart: unless-stopped`. The container then sits in `exited` state
and the dev sandbox is unreachable until the developer manually rms
it and runs `make up` twice.

Fix `make -C tools/local-dev up` to heal this in one cycle:

1. `prune-broken-engines` (new make target wired into `up`) walks
   every container labelled `galaxy-game-engine` and removes the ones
   not in `running` / `restarting` state. Healthy long-lived
   containers survive normal up/down cycles untouched.
2. The backend now runs a single reconciliation pass before the
   dev-sandbox bootstrap (`Reconciler().Tick(ctx)` in main.go).
   Without it, bootstrap would reuse the soon-to-be-cancelled game
   that the periodic ticker is about to mark `removed`. The pre-tick
   cascades the orphan runtime row through markRemoved → lobby
   cancel before bootstrap purges terminal sandbox games and creates
   a fresh one — so a single `make up` lands a working sandbox with
   a brand new state directory.

README troubleshooting section documents the symptom and the
recovery so the bind-mount-source error message is greppable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 22:27:31 +02:00


# `tools/local-dev/` — Galaxy local development stack
A docker-compose stack that brings up postgres + redis + mailpit +
backend + gateway so the UI Vite dev server (run on the host) can
talk to a real authenticated stack without any cloud dependency.

The stack is the recommended baseline for UI work that goes beyond
the mocked Playwright fixtures: every payload exercises the real
FlatBuffers wire, every authenticated call verifies the response
signature against the dev keypair, and every email passes through
Mailpit's web UI for inspection.

This stack is **not** a CI gate (that role belongs to
[`tools/local-ci/`](../local-ci/README.md), which boots a Gitea +
Actions runner and replays workflow files). The two stacks are
independent and can coexist on the same machine; they bind different
ports and use different networks.
## Bring it up
```sh
make -C tools/local-dev up
```
`up` builds the local-dev backend and gateway images on first run
(pulls postgres, redis, mailpit), waits for every service to report
healthy, and returns. Subsequent invocations reuse the built images.

After the stack is healthy:
```sh
pnpm -C ui/frontend dev
```
Open <http://localhost:5173> for the UI and
<http://localhost:8025> for Mailpit.

The first `make up` builds the engine image (`galaxy-engine:local-dev`)
from `game/Dockerfile`. Subsequent invocations skip the build when the
image already exists; force a rebuild with `docker rmi galaxy-engine:local-dev`
followed by `make build-engine`.
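
The skip-when-present behaviour described above can be sketched with `docker image inspect`, which exits non-zero when a tag is missing. This is an illustrative sketch, not the Makefile's actual recipe: the function name `build_engine_if_missing` and the build context (`.`, the repository root) are assumptions.

```sh
# Sketch only: approximates "build the engine image unless the tag exists".
# The image tag and game/Dockerfile come from this README; the function name
# and the build context (repository root) are assumptions.
ENGINE_IMAGE="galaxy-engine:local-dev"

build_engine_if_missing() {
  # `docker image inspect` exits non-zero when the tag is absent.
  if docker image inspect "$ENGINE_IMAGE" >/dev/null 2>&1; then
    echo "skip: $ENGINE_IMAGE already present"
  else
    echo "build: $ENGINE_IMAGE"
    docker build -t "$ENGINE_IMAGE" -f game/Dockerfile .
  fi
}
```

Removing the tag with `docker rmi galaxy-engine:local-dev` makes the inspect call fail, which is why the forced rebuild starts with that step.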
## Daily flow
```sh
make -C tools/local-dev up      # bring up (idempotent, fast on warm cache)
pnpm -C ui/frontend dev         # in another terminal
# ...edit UI, browse, repeat...
make -C tools/local-dev down    # stop containers, keep state
```
State persists in named Docker volumes between `up`/`down` cycles, so
games created on Tuesday survive into Wednesday. Wipe with
`make clean` when you want a fresh database.
## Logging in
Two paths coexist by default:
1. **Fixed dev code (fast).** `tools/local-dev/.env` ships
`BACKEND_AUTH_DEV_FIXED_CODE=123456`. After requesting a code in
the UI, type `123456`; `ConfirmEmailCode` accepts that literal
in addition to the real bcrypt-verified code stored on the
challenge row. The override emits a loud warning at backend boot
and is rejected by the production env loader (`BACKEND_ENV` guard
in `backend/internal/config`).
2. **Real Mailpit code.** Open <http://localhost:8025>, find the
most recent message, copy the six-digit code, paste it into the
UI. This exercises the full mail outbox path, including SMTP
handoff and gomail TLS-mode handling.

To force the second path (no fast-bypass), edit
`tools/local-dev/.env` and clear `BACKEND_AUTH_DEV_FIXED_CODE`, then
`make rebuild` (or simply `docker compose up -d backend` to recreate
the backend with the new env).
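
Clearing the variable can also be scripted. The helper below is a minimal sketch under the assumption that "clear" means leaving the assignment in place with an empty value; the function name `clear_fixed_code` is hypothetical and not part of the repository.

```sh
# Sketch only: blank out BACKEND_AUTH_DEV_FIXED_CODE in an env file so that
# only real Mailpit codes are accepted. The function name is hypothetical.
clear_fixed_code() {
  env_file="$1"
  tmp="$env_file.tmp"
  # Rewrite only the matching assignment; every other line is kept as-is.
  sed 's/^BACKEND_AUTH_DEV_FIXED_CODE=.*/BACKEND_AUTH_DEV_FIXED_CODE=/' "$env_file" > "$tmp" &&
    mv "$tmp" "$env_file"
}

# Usage (from the repository root):
#   clear_fixed_code tools/local-dev/.env
#   make -C tools/local-dev rebuild
```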
## Auto-provisioned dev sandbox
`make up` provisions a private game called **Dev Sandbox** owned by
the dev user (default `dev@local.test`). The flow is implemented in
`backend/internal/devsandbox` and runs on every backend boot when
`BACKEND_DEV_SANDBOX_EMAIL` is non-empty in `tools/local-dev/.env`.

Bootstrap is idempotent: re-running `make up` after a `make down`
finds the existing user, dummy participants, game, and memberships
without creating duplicates. If a previous boot crashed mid-way
(game stuck in `enrollment_open` or `ready_to_start`), the next boot
resumes the lifecycle.

To log in straight into the sandbox:
1. `make -C tools/local-dev up`
2. `pnpm -C ui/frontend dev` (in another terminal)
3. Open <http://localhost:5173/login>, enter `dev@local.test`, then
the dev code `123456`.
4. The lobby shows **Dev Sandbox** in *My Games*; click in.

To disable the bootstrap, clear `BACKEND_DEV_SANDBOX_EMAIL` in
`tools/local-dev/.env` and `docker compose up -d backend` (or
`make rebuild`). Existing users / games are not removed.

Terminal sandbox games (anything in `cancelled`, `finished`, or
`start_failed`) are deleted on every boot before find-or-create
runs. The cascade declared in `00001_init.sql` removes the
matching memberships, applications, invites, runtime records,
and player mappings in the same write, so the dev user's lobby
shows exactly one running tile at all times. Cancelling the
sandbox manually and running `docker compose restart backend`
(or `make rebuild`) yields a fresh game without leaving dead
tiles behind.

The bootstrap requires:
- `galaxy-engine:local-dev` Docker image (`make build-engine`).
- `BACKEND_DEV_SANDBOX_ENGINE_VERSION` parses as plain semver
(`MAJOR.MINOR.PATCH`); the default `0.1.0` is what the bootstrap
registers in the `engine_versions` row that points at the image.
- `BACKEND_DEV_SANDBOX_PLAYER_COUNT` ≥ 20 (the engine's minimum;
19 deterministic dummies fill the slots so the single real user
can start the game).
- A frozen turn schedule (`0 0 1 1 *` — once a year) so the visible
game state stays at turn 1 until you explicitly progress it.
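
The two environment-level requirements above can be checked before booting the stack. The following is a hedged sketch: the function names are hypothetical, and the backend performs its own validation, which may be stricter than these checks.

```sh
# Sketch only: pre-flight checks mirroring the requirements listed above.
# Function names are hypothetical; the backend's own validation is authoritative.
is_plain_semver() {
  # Accept exactly MAJOR.MINOR.PATCH with numeric components.
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

is_valid_player_count() {
  # Reject empty or non-numeric input first, then enforce the minimum of 20.
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 20 ]
}

check_sandbox_env() {
  is_plain_semver "$BACKEND_DEV_SANDBOX_ENGINE_VERSION" ||
    { echo "engine version must be MAJOR.MINOR.PATCH"; return 1; }
  is_valid_player_count "$BACKEND_DEV_SANDBOX_PLAYER_COUNT" ||
    { echo "player count must be >= 20"; return 1; }
  echo "ok"
}
```

With the documented default `0.1.0` and a player count of 20, `check_sandbox_env` prints `ok`; a value such as `v0.1.0` or `19` is rejected.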
## Network map
```
host                                                          compose network "galaxy-local-dev-net"
┌────────────────────────────────┐                            ┌────────────────────────────────┐
│ browser   localhost:5173       │── pnpm dev (Vite, host) ──┐│                                │
│   ↳ /api/* proxied             │───────────────────────────▶│ gateway:8080                   │
│   ↳ /galaxy.gateway...         │───────────────────────────▶│                                │
│ browser   localhost:8025       │───────────────────────────▶│ mailpit:8025                   │
│ psql      localhost:5433       │───────────────────────────▶│ postgres:5432                  │
│ redis-cli localhost:6380       │───────────────────────────▶│ redis:6379                     │
└────────────────────────────────┘                            │   ↳ backend:8080 (HTTP)        │
                                                              │   ↳ backend:8081 (gRPC push)   │
                                                              │   ↳ mailpit:1025 (SMTP in)     │
                                                              └────────────────────────────────┘
```
Vite's dev server proxies `/api` and `/galaxy.gateway.v1.EdgeGateway`
to the gateway, so every browser request stays same-origin (no CORS
preflight). From a browser tab, the gateway is therefore reached
through Vite at <http://localhost:5173>, not directly at
<http://localhost:8080>. Direct curl/wget against
<http://localhost:8080> still works for diagnostic probes; only
browser-side requests are proxied.

Mailpit (8025), postgres (5433), and redis (6380) remain directly
reachable for diagnostics (`make psql`,
`redis-cli -h localhost -p 6380 -a galaxy-dev`).

To point the proxy at a non-local gateway, run
`VITE_DEV_PROXY_TARGET=http://gateway.host:8080 pnpm -C ui/frontend dev`
— no compose changes needed.
## Refreshing after Go-side changes
`make up` reuses any pre-built images and, by default, only rebuilds
the engine image (`build-engine`) when the tag is missing. Touching
backend or gateway code (handlers, routes, transcoders, model
constants) **does not** trigger a rebuild on its own: the next
`docker compose up -d` reattaches to the stale image and the
new behaviour silently never takes effect. After any change under
`backend/`, `gateway/`, `pkg/`, or the FBS schemas, force a
rebuild:
```sh
make -C tools/local-dev rebuild
```
`rebuild` runs `compose build --no-cache backend gateway` followed
by `up -d --wait`, so the next request through the stack hits the
new code. Engine code lives in a separate image — touch the engine
and run `make stop-engines` plus `docker rmi galaxy-engine:local-dev`
before `make up` (or `make build-engine`) so per-game containers
respawn from the freshly built layers.
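
The rule of thumb in this section can be expressed as a small path classifier. This is a sketch under stated assumptions, not a tool that ships with the repository: the function name `refresh_action` is hypothetical, engine sources are assumed to live under `game/` (the README only names `game/Dockerfile`), and FlatBuffers schemas are assumed to end in `.fbs`.

```sh
# Sketch only: map changed paths to the refresh step this section describes.
# Assumptions: engine sources live under game/ and schema files end in .fbs.
refresh_action() {
  # Reads one path per line on stdin; prints "rebuild", "engine", both,
  # or nothing when no image is affected.
  rebuild=0
  engine=0
  while IFS= read -r path; do
    case "$path" in
      backend/*|gateway/*|pkg/*|*.fbs) rebuild=1 ;;
      game/*) engine=1 ;;
    esac
  done
  [ "$rebuild" -eq 1 ] && echo "rebuild"
  [ "$engine" -eq 1 ] && echo "engine"
  return 0
}

# Usage: git diff --name-only | refresh_action
```

A `rebuild` line corresponds to `make -C tools/local-dev rebuild`; an `engine` line corresponds to the `make stop-engines`, `docker rmi galaxy-engine:local-dev`, `make up` sequence described above.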
## Make targets
```text
make up             Bring up the stack (build engine + compose images if needed) and wait for health
make rebuild        Rebuild the backend / gateway images (ignores cache)
make build-engine   Build galaxy-engine:local-dev from game/Dockerfile (no-op if image already present)
make down           Stop containers, keep volumes
make clean          Stop and wipe volumes (postgres + game-state)
make logs           Tail every service's logs
make logs-backend   Tail backend only
make logs-gateway   Tail gateway only
make logs-mail      Tail mailpit only
make psql           Open a psql shell as galaxy@galaxy_backend
make status         docker compose ps
```
## Files
- `docker-compose.yml` — five services: postgres, redis, mailpit,
backend, gateway, plus shared network and volumes.
- `backend.Dockerfile`, `gateway.Dockerfile` — local-dev runtime
images built on alpine (so `wget` is available for the compose
healthchecks). The build stage mirrors `backend/Dockerfile` and
`gateway/Dockerfile` exactly.
- `Makefile` — wrapper over `docker compose` that keeps the muscle
memory close to `tools/local-ci/`'s Makefile.
- `.env` — committed defaults for the compose `${VAR:-}`
expansions. Edit per-developer or override via your shell.
- `keys/gateway-response.pem`, `keys/gateway-response.pub` — dev-only
Ed25519 keypair used by the gateway for response signing. Pairs
with the `VITE_GATEWAY_RESPONSE_PUBLIC_KEY` value in
`ui/frontend/.env.development`. See `keys/README.md` before
rotating.
- `keys/regenerate.go` — one-shot Go helper that regenerates the
pair and prints the new base64 public key.
## Troubleshooting
- **Lobby shows "no games yet" after `make clean && make up`** —
the browser still holds a keypair + device session bound to the
user_id from the previous DB. The new user has the same email
(`dev@local.test`) but a fresh user_id, so the old keypair
authenticates against a session row that no longer exists or
points at the wrong account. Open the page in an incognito
window, or wipe site data for `localhost:5173` (DevTools →
Application → Storage → Clear site data) and log in again.
- **`make down` leaves a `galaxy-game-…` container behind** — fixed
in this Makefile: `make down` and `make clean` now stop spawned
engine containers via the
`org.opencontainers.image.title=galaxy-game-engine` label. To stop
them by hand without touching
the rest of the stack, `make stop-engines`.
- **Engine container exits with `bind source path does not exist:
/tmp/galaxy-game-state/<uuid>` after a host reboot** — macOS clears
`/private/tmp` on reboot, so the per-game state directory the
long-lived engine container bind-mounts is gone and Docker refuses
to restart it under `restart: unless-stopped`. `make up` auto-heals
this in one cycle: `prune-broken-engines` (runs as part of `up`)
removes every engine container that is not in `running` /
`restarting` state, the backend's pre-bootstrap reconciler tick
cascades the orphan runtime row to `removed`, the lobby cancels
the matching sandbox game, and the dev-sandbox bootstrap purges
the cancelled tile and provisions a fresh sandbox with a brand
new state directory. To run the cleanup by hand without restarting
the rest of the stack, `make prune-broken-engines`.
- **`make up` reports a build error mentioning `pkg/cronutil`** —
upstream module list drifted; copy any new `pkg/<name>/` line into
the local-dev `backend.Dockerfile` / `gateway.Dockerfile` to match
`backend/Dockerfile` / `gateway/Dockerfile`.
- **Gateway exits at boot with "redis: …"** — the redis container is
still bootstrapping. `make up` waits for healthchecks; if
it times out, increase `start_period` in the gateway service or
inspect `make logs-redis`.
- **Login form rejects every code** — confirm
`BACKEND_AUTH_DEV_FIXED_CODE` is set in `tools/local-dev/.env` and
the backend has been recreated since the last edit
(`docker compose up -d backend`). Real Mailpit codes work
regardless.
- **UI talks to old gateway**: Vite caches `import.meta.env` at boot.
Restart `pnpm dev` after editing
`ui/frontend/.env.development.local`.
- **Port 8080 already in use** — stop the conflicting service or
edit the host-side mapping in `docker-compose.yml` (gateway's
`ports:` entry) plus the matching `VITE_GATEWAY_BASE_URL` in
`ui/frontend/.env.development.local`.
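
For the bind-mount entry above, the pruning step can be approximated as follows. This is a sketch of the described behaviour, not the Makefile's actual recipe: the function names are hypothetical, and it assumes a Docker CLI recent enough that `docker ps --format` supports the `{{.State}}` placeholder.

```sh
# Sketch only: approximates what `prune-broken-engines` is described as doing.
# The label comes from this README; function names are hypothetical and the
# real Makefile recipe may differ.
ENGINE_LABEL="org.opencontainers.image.title=galaxy-game-engine"

select_broken() {
  # Reads "<id> <state>" lines; prints the ids whose state is neither
  # running nor restarting.
  while read -r id state; do
    case "$state" in
      running|restarting) ;;
      *) [ -n "$id" ] && echo "$id" ;;
    esac
  done
  return 0
}

prune_broken_engines() {
  # List every engine container (any state), keep the broken ones, remove them.
  docker ps -a --filter "label=$ENGINE_LABEL" --format '{{.ID}} {{.State}}' |
    select_broken |
    while read -r id; do
      docker rm -f "$id"
    done
}
```

Keeping the state filter in its own function makes the selection rule easy to check without a Docker daemon; healthy containers in `running` or `restarting` state are never passed to `docker rm`.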
## Relationship to other infrastructure
- `tools/local-ci/` — Gitea + Actions runner, replays
`.gitea/workflows/*` against a pushed branch. Different stack,
different purpose; coexists with local-dev on the same machine.
- `integration/testenv/` — testcontainers harness used by
`make -C integration integration`. Uses the same images
(`backend/Dockerfile`, `gateway/Dockerfile`) at production
defaults; do not confuse with this local-dev stack, which carries
alpine-runtime images for ergonomics and the dev-mode auth
override.