chore(dev-deploy): KNOWN-ISSUES entry for sandbox-cancel after redispatch #12
Captures the diagnostic notes for the long-running dev environment issue where the auto-provisioned Dev Sandbox game flips to `cancelled` ~15 minutes after a `dev-deploy.yaml` redispatch, with no `engine spawned` log between bootstrap and the reconcile cancel, and the engine container missing from `docker ps -a` entirely. This is a docs-only change.
`tools/dev-deploy/KNOWN-ISSUES.md` records:

- What was confirmed: `AutoRemove=false`, no `runtime.Service.Shutdown` reaps engines on backend exit, and two controlled redispatches under `docker events` capture failed to reproduce the destroy.
- Hypotheses (`docker prune` cron, manual `docker rm`, `dockerd` restart, idle-state engine crash) rejected by the project owner after the investigation.
- The `dev-deploy.yaml` CI job sequence, plus the concrete next step (a `docker events --since 0` capture armed before the deploy starts).
- A **Status: parked** note. The bug is mildly disruptive and a `tools/dev-deploy/` rewrite is on the medium-term list; reopen if the symptom recurs with a fresh trace attached.

No code changes.
A live `docker inspect` of an engine container and two redispatch runs with `docker events` captured confirm:

- Engine has no `com.docker.compose.*` labels and `AutoRemove=false`, so `--remove-orphans` cannot reap it.
- Two consecutive `dev-deploy.yaml` redispatches with an engine already running emitted `die` / `destroy` events only for `galaxy-dev-{backend,api,caddy}` — never for the engine.
- The reconciler tick that fires 60s after backend recreate correctly matched the surviving engine in both cases (`status=running` in both `games` and `runtime_records`).
- `runtime.Service` has no `Shutdown` that proactively removes engine containers, so a graceful backend exit also leaves them alone.

The repro window therefore needs a separate trigger that removed the engine container outside of compose. The new hypotheses point at host-side `docker prune` jobs, a `dockerd` restart that lost the container, or an early `Engine.Init` failure that exited the engine before `status=running` reached the runtime row. The investigation list now leads with `journalctl -u docker` and the host crontab — those are the cheapest checks to confirm or rule out next.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>