local-dev: auto-recreate engine containers when bind-mount disappears

After a host reboot, macOS clears /private/tmp, so the per-game
bind-mount source under /tmp/galaxy-game-state/<uuid> vanishes and
Docker refuses to restart the long-lived engine container under
`restart: unless-stopped`. The container then sits in `exited` state
and the dev sandbox is unreachable until the developer manually
removes it and runs `make up` twice.

Fix `make -C tools/local-dev up` to heal this in one cycle:

1. `prune-broken-engines` (new make target wired into `up`) walks
   every container labelled `galaxy-game-engine` and removes the ones
   not in `running` / `restarting` state. Healthy long-lived
   containers survive normal up/down cycles untouched.
2. The backend now runs a single reconciliation pass before the
   dev-sandbox bootstrap (`Reconciler().Tick(ctx)` in main.go).
   Without it, bootstrap would reuse the soon-to-be-cancelled game
   that the periodic ticker is about to mark `removed`. The pre-tick
   cascades the orphan runtime row through markRemoved → lobby
   cancel before bootstrap purges terminal sandbox games and creates
   a fresh one — so a single `make up` lands a working sandbox with
   a brand new state directory.

The README troubleshooting section now documents the symptom and the
recovery so the bind-mount-source error message is greppable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ilia Denisov
2026-05-10 22:27:31 +02:00
parent 5a3bec5acd
commit edc9709bd6
3 changed files with 59 additions and 2 deletions
@@ -266,6 +266,21 @@ func run(ctx context.Context) (err error) {
	)
	runtimeGateway.svc = runtimeSvc
	// Run a single reconciliation pass before the dev-sandbox
	// bootstrap so any runtime row pointing at a vanished engine
	// container (host reboot wiped /tmp/galaxy-game-state/<uuid>;
	// `tools/local-dev`'s `prune-broken-engines` target reaped the
	// husk) is already cascaded through `markRemoved` → lobby
	// `cancelled` by the time the bootstrap walks the sandbox list.
	// Without this pre-tick the bootstrap would reuse the
	// soon-to-be-cancelled game and force the developer into a
	// second `make up` cycle to land a healthy sandbox. Failures are
	// non-fatal: the periodic ticker started later catches up, and
	// the worst case degrades to the legacy two-cycle recovery.
	if err := runtimeSvc.Reconciler().Tick(ctx); err != nil {
		logger.Warn("pre-bootstrap reconciler tick failed", zap.Error(err))
	}
	if err := devsandbox.Bootstrap(ctx, devsandbox.Deps{
		Users: userSvc,
		Lobby: lobbySvc,