local-dev: auto-recreate engine containers when bind-mount disappears

After a host reboot, macOS clears /private/tmp, so the per-game
bind-mount source under /tmp/galaxy-game-state/<uuid> vanishes and
Docker refuses to restart the long-lived engine container under
`restart: unless-stopped`. The container then sits in `exited` state
and the dev sandbox is unreachable until the developer manually
removes it and runs `make up` twice.

Fix `make -C tools/local-dev up` to heal this in one cycle:

1. `prune-broken-engines` (new make target wired into `up`) walks
   every container labelled `galaxy-game-engine` and removes the ones
   not in `running` / `restarting` state. Healthy long-lived
   containers survive normal up/down cycles untouched.

2. The backend now runs a single reconciliation pass before the
   dev-sandbox bootstrap (`Reconciler().Tick(ctx)` in main.go).
   Without it, bootstrap would reuse the soon-to-be-cancelled game
   that the periodic ticker is about to mark `removed`. The pre-tick
   cascades the orphan runtime row through markRemoved → lobby cancel
   before bootstrap purges terminal sandbox games and creates a fresh
   one, so a single `make up` lands a working sandbox with a brand
   new state directory.

README troubleshooting section documents the symptom and the recovery
so the bind-mount-source error message is greppable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -1,4 +1,4 @@
-.PHONY: help up down logs status rebuild clean psql logs-backend logs-gateway logs-mail build-engine stop-engines wait
+.PHONY: help up down logs status rebuild clean psql logs-backend logs-gateway logs-mail build-engine stop-engines prune-broken-engines wait
 
 .DEFAULT_GOAL := help
 
@@ -17,6 +17,7 @@ help:
 	@echo " make rebuild Force rebuild of backend / gateway images and bring up"
 	@echo " make build-engine Build the engine image $(ENGINE_IMAGE) used by the dev sandbox"
 	@echo " make stop-engines Stop and remove only the per-game engine containers"
+	@echo " make prune-broken-engines Remove non-running engine containers Docker can't heal (run inside 'up')"
 	@echo " make clean Stop everything (incl. engines) and wipe volumes + game state"
 	@echo " make logs Tail all logs"
 	@echo " make logs-backend Tail only the backend logs"
@@ -32,7 +33,7 @@ help:
 	@echo "Default login for the auto-provisioned dev sandbox: dev@local.test"
 	@echo "(see BACKEND_DEV_SANDBOX_EMAIL in .env). Login code: 123456."
 
-up: build-engine
+up: build-engine prune-broken-engines
 	$(COMPOSE) up -d --wait
 
 rebuild: build-engine
@@ -70,6 +71,34 @@ stop-engines:
 		docker rm -f $$ids >/dev/null; \
 	fi
 
+# Remove engine containers Docker can no longer heal on its own.
+# After a host reboot, the per-game bind-mount source under
+# /tmp/galaxy-game-state/<uuid> may have been wiped (macOS clears
+# /private/tmp on reboot), so `restart: unless-stopped` cannot
+# revive the container — Docker refuses to start it with a missing
+# bind-mount source and leaves it stuck in `exited` / `created`
+# state. This target prunes the husks before `compose up`; the
+# backend's pre-bootstrap reconciler tick (`backend/cmd/backend/main.go`)
+# then cascades the orphan runtime row to `removed`, the lobby
+# cancels the game, and the dev-sandbox bootstrap purges the
+# cancelled tile and provisions a fresh sandbox in the same
+# `make up` cycle. Healthy `running` / `restarting` containers are
+# left intact so a long-lived sandbox survives normal up/down
+# cycles.
+prune-broken-engines:
+	@ids=""; \
+	for cid in $$(docker ps -aq --filter label=$(ENGINE_LABEL) 2>/dev/null); do \
+		state=$$(docker inspect -f '{{.State.Status}}' $$cid 2>/dev/null); \
+		case "$$state" in \
+			running|restarting) ;; \
+			*) ids="$$ids $$cid";; \
+		esac; \
+	done; \
+	if [ -n "$$ids" ]; then \
+		echo "removing non-running engine containers (post-reboot cleanup):$$ids"; \
+		docker rm -f $$ids >/dev/null; \
+	fi
+
 logs:
 	$(COMPOSE) logs -f --tail=100
 
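The keep/prune decision made by `prune-broken-engines` can be previewed without removing anything. The following is a minimal dry-run sketch, not part of the commit: `classify_state` and `preview_prune` are hypothetical helper names, and the fallback label `galaxy-game-engine` is an assumption taken from the commit message (the Makefile reads the real value from `$(ENGINE_LABEL)`). It uses only the same `docker ps` and `docker inspect` calls as the target itself.

```shell
#!/bin/sh
# Hypothetical helper mirroring the case statement in prune-broken-engines:
# prints "keep" for states the target leaves alone, "prune" for anything else
# (including an empty state, which is what a failed inspect would yield).
classify_state() {
    case "$1" in
        running|restarting) echo keep ;;
        *) echo prune ;;
    esac
}

# Dry run: print "<container id> <state> <decision>" for every engine
# container, removing nothing. The default label is an assumption; export
# ENGINE_LABEL to match the value defined in the Makefile.
preview_prune() {
    label="${ENGINE_LABEL:-galaxy-game-engine}"
    for cid in $(docker ps -aq --filter "label=$label" 2>/dev/null); do
        state=$(docker inspect -f '{{.State.Status}}' "$cid" 2>/dev/null)
        printf '%s\t%s\t%s\n' "$cid" "${state:-unknown}" "$(classify_state "$state")"
    done
}
```

Sourcing the file and calling `preview_prune` before `make -C tools/local-dev up` shows which containers the target would remove; after a reboot that wiped the bind-mount source, the affected engine containers would be expected to appear as `exited` or `created` with the decision `prune`.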