local-dev: auto-recreate engine containers when bind-mount disappears

After a host reboot, macOS clears /private/tmp, so the per-game
bind-mount source under /tmp/galaxy-game-state/<uuid> vanishes and
Docker refuses to restart the long-lived engine container under
`restart: unless-stopped`. The container then sits in `exited` state
and the dev sandbox is unreachable until the developer manually
removes it and runs `make up` twice.

Fix `make -C tools/local-dev up` to heal this in one cycle:

1. `prune-broken-engines` (new make target wired into `up`) walks
   every container labelled `galaxy-game-engine` and removes the ones
   not in `running` / `restarting` state. Healthy long-lived
   containers survive normal up/down cycles untouched.
2. The backend now runs a single reconciliation pass before the
   dev-sandbox bootstrap (`Reconciler().Tick(ctx)` in main.go).
   Without it, bootstrap would reuse the soon-to-be-cancelled game
   that the periodic ticker is about to mark `removed`. The pre-tick
   cascades the orphan runtime row through markRemoved → lobby
   cancel before bootstrap purges terminal sandbox games and creates
   a fresh one — so a single `make up` lands a working sandbox with
   a brand new state directory.

README troubleshooting section documents the symptom and the
recovery so the bind-mount-source error message is greppable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ilia Denisov
2026-05-10 22:27:31 +02:00
parent 5a3bec5acd
commit edc9709bd6
3 changed files with 59 additions and 2 deletions
@@ -226,6 +226,19 @@ make status docker compose ps
engine containers via the `org.opencontainers.image.title=
galaxy-game-engine` label. To stop them by hand without touching
the rest of the stack, `make stop-engines`.
- **Engine container exits with `bind source path does not exist:
/tmp/galaxy-game-state/<uuid>` after a host reboot** — macOS clears
`/private/tmp` on reboot, so the per-game state directory the
long-lived engine container bind-mounts is gone and Docker refuses
to restart it under `restart: unless-stopped`. `make up` auto-heals
this in one cycle: `prune-broken-engines` (runs as part of `up`)
removes every engine container that is not in `running` /
`restarting` state, the backend's pre-bootstrap reconciler tick
cascades the orphan runtime row to `removed`, the lobby cancels
the matching sandbox game, and the dev-sandbox bootstrap purges
the cancelled tile and provisions a fresh sandbox with a brand
new state directory. To run the cleanup by hand without restarting
the rest of the stack, `make prune-broken-engines`.
- **`make up` reports a build error mentioning `pkg/cronutil`** —
upstream module list drifted; copy any new `pkg/<name>/` line into
the local-dev `backend.Dockerfile` / `gateway.Dockerfile` to match