a338ebf058
Tests · Integration / integration (pull_request) Successful in 1m37s
Root cause for the long-standing "Dev Sandbox flips to cancelled after dev-deploy" symptom in push-triggered cycles: when `integration.yaml` runs in parallel with `dev-deploy.yaml`, its `integration/scripts/preclean.sh` issues a `docker rm -f` over every container labelled `galaxy.backend=1`. The backend's runtime adapter stamps that label on every engine it spawns, including the engines living in the long-lived dev-deploy environment on the same Docker daemon. Each post-merge auto-deploy therefore had the integration preclean wipe the dev-sandbox engine. The new backend's reconciler tick then observed `container disappeared` and cascaded the sandbox into `cancelled`.

Fix:

- `integration/testenv/backend.go` now sets `BACKEND_STACK_LABEL=integration` on every backend-under-test, so the engines spawned by integration carry `galaxy.stack=integration` in addition to `galaxy.backend=1`. Backend support for this env var was added in the previous CI tidy-up PR (#13).
- `integration/scripts/preclean.sh` gains a multi-label AND filter helper and uses it to scope engine cleanup to the combination `galaxy.backend=1 AND galaxy.stack=integration`. dev-deploy and local-dev engines carry different `galaxy.stack` values, so the AND match leaves them alone.
- `docs/ARCHITECTURE.md` "Container labels": refreshed to call out the AND-scoping rule and the new integration backend stamp.
- `tools/dev-deploy/KNOWN-ISSUES.md`: the sandbox-cancel entry gets an "Update" section recording the root cause and the fix. Its status is downgraded to "partially fixed" because the solo `workflow_dispatch` reproduction (which does NOT trigger integration) remains unexplained.
- `tools/dev-deploy/KNOWN-ISSUES.md`: separately, document the `docker restart galaxy-dev-backend` failure caused by the runner-workspace bind-mount, which surfaced while diagnosing this issue. Workaround: `make -C tools/dev-deploy up` from the persistent checkout. The real fix is a follow-up (bake the fixture into the image or copy it to a named volume).

Verification:

- `go build ./backend/... ./integration/...`: clean.
- `bash -n integration/scripts/preclean.sh`: syntax OK.
- Live AND-filter check on the dev host: `docker ps -aq --filter label=galaxy.backend=1 --filter label=galaxy.stack=integration` returns nothing while the dev-deploy engine `galaxy-game-80f3ce86-...` keeps running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
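The scoping fix relies on Docker combining repeated `--filter label=...` flags with AND: a container is selected only when it carries every requested label. The toy pure-Bash model below illustrates that rule and shows why an engine with a different `galaxy.stack` value survives. It is a sketch under stated assumptions, not how Docker evaluates filters internally. The `matches_all_labels` helper and its space-separated label string are hypothetical. The `galaxy.stack=dev-deploy` value is also made up, because the commit only says that dev-deploy engines carry a *different* stack value.

```bash
#!/usr/bin/env bash
# Toy model of the label-filter AND rule (hypothetical helper, not part of the
# commit): a container matches only if it carries EVERY requested key=value
# label. Presence-only filters (label=key) are not modelled.

# matches_all_labels "<space-separated container labels>" <required label>...
# Returns 0 when all required labels are present, 1 otherwise.
matches_all_labels() {
    # Pad with spaces so "galaxy.backend=1" cannot match "galaxy.backend=10".
    local have=" $1 "
    shift
    local want
    for want in "$@"; do
        case "$have" in
            *" $want "*) ;;   # this label is present, check the next one
            *) return 1 ;;    # a single missing label rejects the container
        esac
    done
    return 0
}

# Integration engine: both labels present, so preclean would select it.
if matches_all_labels "galaxy.backend=1 galaxy.stack=integration" \
    galaxy.backend=1 galaxy.stack=integration; then
    echo "integration engine: selected"
fi

# dev-deploy engine (hypothetical stack value): galaxy.stack differs, so it is kept.
if ! matches_all_labels "galaxy.backend=1 galaxy.stack=dev-deploy" \
    galaxy.backend=1 galaxy.stack=integration; then
    echo "dev-deploy engine: kept"
fi
```

Before this fix, the filter contained `galaxy.backend=1` alone, so both calls would have returned 0 and the dev-deploy engine would have been selected as well.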
100 lines · 3.6 KiB · Bash · Executable File
#!/usr/bin/env bash
# Pre-run cleanup for galaxy/integration. Idempotent and safe to call
# repeatedly; runs before each integration test session to wipe state
# left over from earlier runs.
#
# What we touch:
# 1. Containers labelled `org.testcontainers=true`: every container
#    brought up by testcontainers-go (our backend/gateway/game plus
#    postgres/redis/mailpit/ryuk service containers).
# 2. Containers labelled `galaxy.backend=1` AND
#    `galaxy.stack=integration`: engine instances spawned by the
#    backend-under-test on the host Docker daemon (see
#    `backend/internal/dockerclient/types.go` and the
#    `BACKEND_STACK_LABEL=integration` env in
#    `integration/testenv/backend.go`). The stack-label filter is
#    what keeps dev-deploy / local-dev engines on the same host
#    safe: they carry `galaxy.backend=1` too but a different
#    `galaxy.stack` value, so the AND match leaves them alone.
# 3. Networks labelled `org.testcontainers=true`: networks created
#    by testcontainers-go for cross-container wiring.
# 4. Images labelled `galaxy.test.kind=integration-image`: local
#    builds of galaxy/{backend,gateway,game}:integration. Pulled
#    service images (postgres, redis, ryuk, mailpit) are NOT touched
#    so the cache stays warm between runs.
#
# What we never touch:
# - Containers / images without one of the labels above.
# - User-managed images and volumes.
# - dev-deploy / local-dev engines (they share the `galaxy.backend=1`
#   label, but their `galaxy.stack` value differs from `integration`).

set -euo pipefail

remove_containers_with_label() {
    local description="${!#}"
    local labels=("${@:1:$#-1}")
    local filter_args=()
    local label
    for label in "${labels[@]}"; do
        filter_args+=("--filter" "label=$label")
    done
    local ids
    ids=$(docker ps -aq "${filter_args[@]}" 2>/dev/null || true)
    if [ -z "$ids" ]; then
        return
    fi
    local count
    count=$(printf '%s\n' "$ids" | wc -l | tr -d ' ')
    echo "preclean: removing $count $description"
    # shellcheck disable=SC2086
    docker rm -f $ids >/dev/null 2>&1 || true
}

remove_networks_with_label() {
    local label="$1"
    local description="$2"
    local ids
    ids=$(docker network ls -q --filter "label=$label" 2>/dev/null || true)
    if [ -z "$ids" ]; then
        return
    fi
    local count
    count=$(printf '%s\n' "$ids" | wc -l | tr -d ' ')
    echo "preclean: removing $count $description"
    # shellcheck disable=SC2086
    docker network rm $ids >/dev/null 2>&1 || true
}

remove_images_with_label() {
    local label="$1"
    local description="$2"
    local ids
    ids=$(docker images -q --filter "label=$label" 2>/dev/null || true)
    if [ -z "$ids" ]; then
        return
    fi
    local count
    count=$(printf '%s\n' "$ids" | sort -u | wc -l | tr -d ' ')
    echo "preclean: removing $count $description"
    # shellcheck disable=SC2086
    docker rmi -f $ids >/dev/null 2>&1 || true
}

if ! command -v docker >/dev/null 2>&1; then
    echo "preclean: docker CLI not found, nothing to do" >&2
    exit 0
fi

if ! docker info >/dev/null 2>&1; then
    echo "preclean: docker daemon unreachable, nothing to do" >&2
    exit 0
fi

remove_containers_with_label "org.testcontainers=true" "testcontainers-managed containers"
remove_containers_with_label "galaxy.backend=1" "galaxy.stack=integration" "integration-owned engine containers"
remove_networks_with_label "org.testcontainers=true" "testcontainers-managed networks"
remove_images_with_label "galaxy.test.kind=integration-image" "integration-built images"

echo "preclean: done"
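The engine cleanup now depends on a second label. On a shared host, it can help to preview what the AND scope would select before anything is deleted. The sketch below is a minimal dry run that assumes the Docker CLI is on `PATH`. `preview_engine_cleanup` is a hypothetical helper and is not part of this commit. It lists every `galaxy.backend=1` container together with its `galaxy.stack` label and reports whether the `galaxy.backend=1 AND galaxy.stack=integration` scope used by `preclean.sh` would remove it. It never deletes anything.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of preclean.sh): preview which
# galaxy.backend=1 containers the integration-scoped cleanup would remove.
set -euo pipefail

preview_engine_cleanup() {
    local name stack
    # {{.Label "galaxy.stack"}} renders as an empty string when the label is
    # absent. Container names never contain spaces, so a space separator is safe.
    docker ps -a --filter "label=galaxy.backend=1" \
        --format '{{.Names}} {{.Label "galaxy.stack"}}' |
        while read -r name stack; do
            if [ "$stack" = "integration" ]; then
                printf 'would remove: %s\n' "$name"
            else
                printf 'would keep:   %s (galaxy.stack=%s)\n' "$name" "${stack:-<unset>}"
            fi
        done
}
```

Call `preview_engine_cleanup` from a shell that has the function loaded. On a host like the one described in the commit message, the dev-deploy engine should show up on a "would keep" line. The stack comparison is done client-side rather than with a second `--filter`, so the containers that are spared are listed too, not just the ones that would be removed.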