fix(dev-deploy): recycle engine containers on galaxy-engine:dev SHA drift
`backend`'s reconciler adopts pre-existing `galaxy-game-*` containers without comparing their image SHA against the freshly-built `galaxy-engine:dev`, so without this fix a long-lived sandbox keeps serving the previous engine code after a redeploy. Issue #59 surfaced this: after the per-command-rejection fix was deployed via `workflow_dispatch`, the running sandbox container was still on the old image SHA and the browser kept seeing the 503/unavailable response.

Adds a `Recycle engine containers on image drift` step right before `Reap stray dev-deploy containers`. The step compares the new `galaxy-engine:dev` SHA against every running `galaxy-game-*` container and, on drift, stops the backend, removes the container, wipes the bind-mounted per-game state directory (without the wipe, `Engine.Init()` would write turn-0 over any pre-existing `turn-N` files and silently corrupt the game state), and cascade-deletes the lobby `games` row. The `dev-sandbox` bootstrap on the next backend boot finds no live sandbox and provisions a fresh one on the new engine image.

When the engine sources are unchanged, the BuildKit cache hits and the SHA stays the same, so the recycle step is a no-op and running games keep their state across the deploy.

Verified end-to-end against the live dev environment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -235,6 +235,29 @@ The deploy is idempotent — when the PR later merges into
healthcheck steps, overwriting whatever the manual dispatch left
behind. There is no separate state to clean up between the two paths.

### Engine image drift recycle

`backend` spawns one engine container per game (the long-lived "Dev
Sandbox" plus any user-created games), and the reconciler reattaches
to whatever it finds with the `galaxy.stack=dev-deploy` label. That
reattach does not check the running container's image SHA against the
freshly-built `galaxy-engine:dev` tag, so without intervention a
reattached container keeps serving the previous engine code after a
redeploy.
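
The missing check can be sketched in a few lines of Python: given the
image ID of the freshly built tag and the image ID of each running
container, list the engine containers that have drifted. This is an
illustration, not the workflow's code. It assumes "SHA" means the local
image ID, which could be read with `docker image inspect --format
'{{.Id}}' galaxy-engine:dev` and `docker inspect --format '{{.Image}}'
<container>`; the container names in the usage lines are hypothetical.

```python
"""Sketch of the image-drift check (illustrative, not the workflow's code).

Assumes the IDs were read beforehand, e.g. with
    docker image inspect --format '{{.Id}}' galaxy-engine:dev
    docker inspect --format '{{.Image}}' <container>
"""

ENGINE_PREFIX = "galaxy-game-"  # name prefix of per-game engine containers


def find_drifted(new_image_id: str, running: dict[str, str]) -> list[str]:
    """Return engine containers whose image ID differs from new_image_id.

    `running` maps container name -> image ID ("sha256:..."). Containers
    outside the galaxy-game-* namespace are ignored.
    """
    return sorted(
        name
        for name, image_id in running.items()
        if name.startswith(ENGINE_PREFIX) and image_id != new_image_id
    )


if __name__ == "__main__":
    # Hypothetical container names and shortened IDs.
    running = {
        "galaxy-game-sandbox": "sha256:aaa",  # still on the old image
        "galaxy-game-42": "sha256:bbb",       # already on the new image
        "galaxy-backend": "sha256:ccc",       # not an engine container
    }
    print(find_drifted("sha256:bbb", running))  # ['galaxy-game-sandbox']
```

Comparing image IDs rather than tags matters here: a rebuild moves the
`galaxy-engine:dev` tag to the new image, while an existing container
keeps pointing at the image it was created from.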

The `dev-deploy.yaml` workflow handles this in the
`Recycle engine containers on image drift` step. When `docker build`
produces a new `galaxy-engine:dev` SHA, the step compares it against
the image of every running `galaxy-game-*` container and, for each
drifted one, stops the backend, removes the container, wipes its
bind-mounted state directory (without the wipe, `Engine.Init()` would
write turn-0 over the pre-existing `turn-N` files and silently corrupt
the game's state), and cascade-deletes the lobby `games` row. The
`dev-sandbox` bootstrap on the next backend boot finds no live
sandbox and provisions a fresh one on the new engine image.
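
The ordering of the recycle can be sketched as a pure planning function
that emits actions instead of running them. Everything concrete in it is
hypothetical: the `STATE_ROOT` path, deriving a game ID by stripping the
`galaxy-game-` prefix, and the action names. The sketch also stops the
backend once, before touching any container, rather than per container;
that is one reading of the step, not a description of the workflow.

```python
"""Sketch of the recycle ordering (hypothetical paths, IDs, and actions)."""

from pathlib import PurePosixPath

ENGINE_PREFIX = "galaxy-game-"
STATE_ROOT = PurePosixPath("/srv/galaxy/games")  # hypothetical bind-mount root


def recycle_plan(drifted: list[str]) -> list[tuple[str, str]]:
    """Build an ordered list of (action, target) pairs.

    The backend is stopped once, up front, so its reconciler cannot
    reattach to a container halfway through the recycle. Each drifted
    container is then removed, its state directory wiped (a stale
    directory would get turn-0 written over its turn-N files), and its
    lobby `games` row deleted (the cascade removes dependent rows).
    """
    if not drifted:
        return []  # no drift: nothing to do, running games keep their state
    plan: list[tuple[str, str]] = [("stop", "backend")]
    for name in drifted:
        game_id = name.removeprefix(ENGINE_PREFIX)  # hypothetical name -> ID
        plan.append(("rm_container", name))
        plan.append(("wipe_dir", str(STATE_ROOT / game_id)))
        plan.append(("delete_games_row", game_id))
    return plan


if __name__ == "__main__":
    for action, target in recycle_plan(["galaxy-game-7"]):
        print(f"{action:<17} {target}")
```

Restarting the backend is deliberately left out of the plan: the fresh
sandbox comes from the `dev-sandbox` bootstrap on that next boot, not
from the recycle itself.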

When the engine sources are unchanged, the BuildKit cache hits and
the SHA stays the same, so the recycle step is a no-op and the running
games keep their state across the deploy.
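
The no-op case follows from content addressing: Docker's image ID is
the SHA-256 digest of the image's config JSON, which lists the layer
digests, so when every layer comes from cache and the config comes out
byte-identical, the ID is identical too. The toy model below shows this
with `hashlib`. The config dicts are made up, and canonicalizing with
`sort_keys` is a simplification (Docker hashes the exact config bytes it
stored).

```python
"""Toy model of content-addressed image IDs (not Docker's implementation)."""

import hashlib
import json


def image_id(config: dict) -> str:
    """Toy content address: sha256 over a canonical JSON encoding of config."""
    blob = json.dumps(config, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def recycle_is_noop(new_id: str, running_ids: list[str]) -> bool:
    """Nothing to recycle when every running container uses new_id."""
    return all(rid == new_id for rid in running_ids)


if __name__ == "__main__":
    # Hypothetical config: layer digests plus an entrypoint.
    cfg = {"rootfs": {"diff_ids": ["sha256:l1", "sha256:l2"]},
           "config": {"Entrypoint": ["/engine"]}}
    old_id = image_id(cfg)
    rebuilt = image_id(dict(cfg))  # full cache hit: same config, same ID
    changed = image_id({**cfg, "rootfs": {"diff_ids": ["sha256:l1", "sha256:l3"]}})
    print(recycle_is_noop(rebuilt, [old_id]))  # True
    print(recycle_is_noop(changed, [old_id]))  # False
```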

## Relationship to other infrastructure

- `tools/local-dev/` — single-developer playground, host-port mapped,