Commit Graph

3 Commits

Author SHA1 Message Date
Ilia Denisov 49f614926a KNOWN-ISSUES: park sandbox-cancel; owner rejected host-side hypotheses
After the live investigation, the project owner confirms that none
of the host-side cleanup paths apply: no docker prune cron, no
manual `docker rm`, no `dockerd` restart in the window, and the
engine binary does not crash while idling on API calls.

Replace the host-side hypothesis list with a one-line note that
they were considered and rejected, narrow the open suspicion to
the `dev-deploy.yaml` job sequence (`docker build` + `docker
compose build` + the alpine `docker run --rm` for UI seeding +
`docker compose up -d --wait --remove-orphans`), and park the
entry. Reopen if the symptom recurs with a fresh
`docker events --since 0` capture armed before the deploy
starts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 23:16:51 +02:00
Ilia Denisov cadb72b412 KNOWN-ISSUES: rule out compose orphan reap; narrow to host-side reap
Tests · UI / test (push) Successful in 2m36s
Tests · Go / test (push) Successful in 2m38s
A live `docker inspect` of an engine container and two redispatch
runs with `docker events` captured confirm:

- Engine has no `com.docker.compose.*` labels and `AutoRemove=false`,
  so `--remove-orphans` cannot reap it.
- Two consecutive `dev-deploy.yaml` redispatches with an engine
  already running emitted `die` / `destroy` events only for
  `galaxy-dev-{backend,api,caddy}` — never for the engine.
- The reconciler tick that fires 60s after backend recreate
  correctly matched the surviving engine in both cases
  (`status=running` in both `games` and `runtime_records`).
- `runtime.Service` has no `Shutdown` that proactively removes
  engine containers, so a graceful backend exit also leaves them
  alone.

The repro window therefore needs a separate trigger that removed
the engine container outside of compose. The new hypotheses point
at host-side `docker prune` jobs, a `dockerd` restart that lost the
container, or an early `Engine.Init` failure that exited the engine
before `status=running` reached the runtime row. The investigation
list now leads with `journalctl -u docker` and the host crontab —
those are the cheapest checks to confirm or rule out next.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 23:10:13 +02:00
Ilia Denisov 5177fef2ef tools/dev-deploy: log the sandbox-cancellation TODO
Capture the diagnostic notes for the issue we hit after every
`dev-deploy.yaml` redispatch: the freshly-bootstrapped "Dev Sandbox"
game ends up `cancelled` ~15 minutes later, with the runtime
reconciler reporting "container disappeared". The engine never
shows up in `docker ps -a --filter label=galaxy-game-engine`, so
either it never spawned or it was removed before any host-side
snapshot.

`KNOWN-ISSUES.md` records the symptom, the log excerpt, three
working hypotheses (runtime spawn race, `--remove-orphans`
interaction, engine `--rm` lifecycle), and the investigation
checklist before opening an issue. The README gets a one-line
pointer so future redeploys land on the doc immediately.

No code change — this is the placeholder so the next person
investigating the cancellation pattern does not have to
rediscover the diagnostic from scratch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:56:25 +02:00