e038ea6154
`backend`'s reconciler adopts pre-existing `galaxy-game-*` containers without comparing their image SHA against the freshly built `galaxy-engine:dev`, so a long-lived sandbox keeps serving the previous engine code after a redeploy.

Issue #59 surfaced this: after the per-command-rejection fix was deployed via `workflow_dispatch`, the running sandbox container was still on the old image SHA and the browser kept seeing the 503/unavailable response.

Adds a `Recycle engine containers on image drift` step right before `Reap stray dev-deploy containers`. The step compares the new `galaxy-engine:dev` SHA against every running `galaxy-game-*` container and, on drift, stops the backend, removes the container, wipes the bind-mounted per-game state directory (Engine.Init() writes turn-0 over any pre-existing `turn-N` files, which would otherwise silently corrupt state), and cascade-deletes the lobby `games` row. The `dev-sandbox` bootstrap on the next backend boot finds no live sandbox and provisions a fresh one on the new engine image.

When the engine sources are unchanged, the BuildKit cache hits and the SHA stays the same, so the recycle step is a no-op and running games keep their state across the deploy.

Verified end-to-end against the live dev environment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
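The drift check described above boils down to comparing one image ID against the image ID of each running engine container. Below is a minimal sketch of that comparison, pulled out of the workflow so it can be exercised without a Docker daemon. The function name `drifted_game_ids` and its `<container-name> <image-id>` stdin format are assumptions made for illustration; the workflow itself does not define them.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not defined by the workflow: reads
# "<container-name> <image-id>" pairs on stdin and prints the game ID of
# every container whose image ID differs from the one passed as $1.
# The game ID is the container name with its `galaxy-game-` prefix removed,
# mirroring `${c#galaxy-game-}` in the recycle step.
drifted_game_ids() {
  local want="$1" name img
  while read -r name img; do
    # Skip blank lines so an empty container listing yields no output.
    [ -n "$name" ] || continue
    if [ "$img" != "$want" ]; then
      printf '%s\n' "${name#galaxy-game-}"
    fi
  done
}

# Possible wiring against a live daemon (commented out so the sketch runs
# without Docker):
#   new_sha=$(docker image inspect galaxy-engine:dev --format '{{.Id}}')
#   for c in $(docker ps --filter "name=galaxy-game-" --format '{{.Names}}'); do
#     echo "$c $(docker inspect "$c" --format '{{.Image}}')"
#   done | drifted_game_ids "$new_sha"
```

Keeping the comparison separate from the `docker` calls makes it testable against canned input; the workflow inlines the same logic directly in its loop.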
261 lines
11 KiB
YAML
name: Deploy · Dev

# Builds the Galaxy stack and (re)deploys it into the long-lived dev
# environment on the host running this Gitea Actions runner. Triggered
# on every merge into `development`. Branch protections on `development`
# guarantee the commit already passed `go-unit`, `ui-test`, and
# `integration` as part of the PR that produced this push, so this
# workflow does not re-run those tests; it focuses on packaging and
# rollout.
#
# `workflow_dispatch` is also accepted so a developer can deploy any
# branch (typically a feature branch under active review) into the
# shared dev environment from the Gitea Actions UI without waiting for
# the PR to merge first. The deploy job picks up whatever the chosen
# ref is and runs the same packaging + healthcheck steps as the merge path.

on:
  push:
    branches:
      - development
    paths:
      - 'backend/**'
      - 'gateway/**'
      - 'game/**'
      - 'pkg/**'
      - 'ui/**'
      - 'site/**'
      - 'go.work'
      - 'go.work.sum'
      - 'tools/dev-deploy/**'
      - '.gitea/workflows/dev-deploy.yaml'
      - '!**/*.md'
  workflow_dispatch: {}

jobs:
  deploy:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          submodules: recursive

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.work
          cache: true

      - name: Set up pnpm
        uses: pnpm/action-setup@v4
        with:
          version: 11.0.7
          # Install pnpm into a per-job directory so concurrent jobs on
          # the shared host runner do not race on the default
          # `~/setup-pnpm` (the self-installer otherwise fails with
          # `ENOTEMPTY` while cleaning a sibling job's install).
          dest: ${{ runner.temp }}/setup-pnpm

      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: pnpm
          cache-dependency-path: ui/pnpm-lock.yaml

      - name: Install UI dependencies
        working-directory: ui
        run: pnpm install --frozen-lockfile

      - name: Build core.wasm
        uses: ./.gitea/actions/build-wasm

      - name: Build UI frontend
        working-directory: ui/frontend
        env:
          # Single-origin deployment: an empty base URL means the
          # gateway shares the document origin (REST at /api, Connect at
          # /rpc). The game UI is served under the /game/ base path.
          VITE_GATEWAY_BASE_URL: ""
          BASE_PATH: /game
          # Surface the synthetic-report loader and similar dev-only
          # affordances in the long-lived dev bundle. The prod build
          # path (`prod-build.yaml`) leaves this flag unset, so those
          # affordances stay stripped from the production bundle.
          VITE_GALAXY_DEV_AFFORDANCES: "true"
        run: |
          # The response-signing public key is committed in
          # `.env.development` alongside its private counterpart in
          # `tools/local-dev/keys/`. Pull it from there at build time so
          # the production-mode bundle ships the same key the dev
          # gateway uses to sign. `cut -f2-` keeps everything after the
          # first `=`, so base64 padding in the key is not truncated.
          export VITE_GATEWAY_RESPONSE_PUBLIC_KEY="$(grep -E '^VITE_GATEWAY_RESPONSE_PUBLIC_KEY=' .env.development | cut -d= -f2-)"
          pnpm build

      - name: Install site dependencies
        working-directory: site
        run: pnpm install --frozen-lockfile

      - name: Build project site
        working-directory: site
        run: pnpm build

      - name: Build galaxy-engine image
        working-directory: ${{ gitea.workspace }}
        run: |
          docker build \
            -t galaxy-engine:dev \
            -f game/Dockerfile \
            .

      - name: Build backend + gateway images
        working-directory: tools/dev-deploy
        run: |
          docker compose build galaxy-backend galaxy-api

      - name: Seed UI volume
        run: |
          docker volume create galaxy-dev-ui-dist >/dev/null
          docker run --rm \
            -v galaxy-dev-ui-dist:/dst \
            -v "${{ gitea.workspace }}/ui/frontend/build:/src:ro" \
            alpine sh -c 'rm -rf /dst/* /dst/.??* 2>/dev/null; cp -a /src/. /dst/'

      - name: Seed site volume
        run: |
          docker volume create galaxy-dev-site-dist >/dev/null
          docker run --rm \
            -v galaxy-dev-site-dist:/dst \
            -v "${{ gitea.workspace }}/site/.vitepress/dist:/src:ro" \
            alpine sh -c 'rm -rf /dst/* /dst/.??* 2>/dev/null; cp -a /src/. /dst/'

      - name: Seed geoip volume
        run: |
          # Copy the GeoIP test fixture into a named volume so the
          # backend can mount it as /var/lib/galaxy. A bind-mount with
          # a relative path would resolve against this runner's
          # ephemeral workspace under /home/runner/.cache/act/<hash>/,
          # which the runner deletes once the workflow ends; the next
          # `docker restart galaxy-dev-backend` would then fail with
          # "not a directory" because the mount source vanished.
          docker volume create galaxy-dev-geoip-data >/dev/null
          docker run --rm \
            -v galaxy-dev-geoip-data:/dst \
            -v "${{ gitea.workspace }}/pkg/geoip/test-data/test-data:/src:ro" \
            alpine sh -c 'cp /src/GeoIP2-Country-Test.mmdb /dst/geoip.mmdb'

      - name: Recycle engine containers on image drift
        run: |
          # Compare the freshly built `galaxy-engine:dev` SHA against
          # every running `galaxy-game-*` container. The backend
          # reconciler adopts pre-existing labelled engine containers
          # without checking image drift, so a running sandbox would
          # otherwise keep serving the previous engine code until the
          # container is recycled by hand. This step makes the recycle
          # automatic, but only when it is actually needed:
          #
          # * BuildKit cache hit on the `Build galaxy-engine image`
          #   step → `galaxy-engine:dev` keeps its previous SHA →
          #   no drift → no-op (no engine source change to deploy).
          # * engine source change → fresh SHA → for each drifted
          #   container we stop the backend, remove the container,
          #   wipe its bind-mounted state directory (Engine.Init()
          #   writes turn-0 over any pre-existing `turn-N` files,
          #   which would otherwise silently corrupt state), and
          #   cascade-delete the lobby `games` row (the FKs in
          #   `00001_init.sql` drop the matching `runtime_records`,
          #   `memberships`, `player_mappings`, etc. in the same
          #   write). The `dev-sandbox` bootstrap on the next backend
          #   boot finds no live sandbox and provisions a fresh one on
          #   the new engine image.
          #
          # The backend is stopped first to keep the reconciler from
          # racing the recycle (mid-stream adoption / restart). The
          # subsequent `Bring up the stack` step restarts it.
          set -u
          new_sha=$(docker image inspect galaxy-engine:dev --format '{{.Id}}')
          echo "fresh galaxy-engine:dev = $new_sha"

          drift=()
          for c in $(docker ps --filter "name=galaxy-game-" --format '{{.Names}}'); do
            cur=$(docker inspect "$c" --format '{{.Image}}')
            if [ "$cur" != "$new_sha" ]; then
              drift+=("${c#galaxy-game-}")
              echo " drift: $c was on $cur"
            else
              echo " match: $c"
            fi
          done
          if [ ${#drift[@]} -eq 0 ]; then
            echo "no drift detected — recycle skipped"
          else
            docker stop -t 30 galaxy-dev-backend >/dev/null 2>&1 || true
            state_root="$HOME/.galaxy-dev/game-state"
            for gid in "${drift[@]}"; do
              echo "recycling $gid"
              docker rm -f "galaxy-game-$gid" >/dev/null 2>&1 || true
              # Wipe the per-game state dir as root inside a throwaway
              # container so we can remove files left behind by the
              # engine container even when its uid differs from the
              # runner's.
              docker run --rm -v "$state_root:/state" alpine \
                sh -c "rm -rf -- /state/$gid"
            done
            ids_csv=$(printf "'%s'," "${drift[@]}")
            ids_csv=${ids_csv%,}
            docker exec galaxy-dev-postgres psql -v ON_ERROR_STOP=1 \
              -U galaxy -d galaxy_backend \
              -c "DELETE FROM backend.games WHERE game_id IN (${ids_csv});"
          fi

      - name: Reap stray dev-deploy containers
        run: |
          # Remove any non-running compose-managed containers from
          # earlier deploys before `compose up`. Filter by the stack
          # label so we never touch unrelated workloads on the same
          # daemon. Running engine containers spawned by backend with
          # the same label are left intact when their image SHA still
          # matches the freshly built `galaxy-engine:dev` (handled by
          # the preceding `Recycle engine containers on image drift`
          # step); the reconciler reattaches them on backend boot.
          ids=$(docker ps -aq \
            --filter "label=galaxy.stack=dev-deploy" \
            --filter "status=exited" \
            --filter "status=created" \
            --filter "status=dead")
          if [ -n "$ids" ]; then
            echo "reaping: $ids"
            docker rm -f $ids
          fi

      - name: Bring up the stack
        working-directory: tools/dev-deploy
        run: |
          # Resolve in the shell, not in a YAML expression: `env.HOME`
          # is empty at the workflow-evaluation stage.
          export GALAXY_DEV_GAME_STATE_DIR="$HOME/.galaxy-dev/game-state"
          mkdir -p "$GALAXY_DEV_GAME_STATE_DIR"
          docker compose up -d --wait --remove-orphans

      - name: Probe the stack
        run: |
          set -e
          # Probe `galaxy.lan` by name so the request goes through the
          # same routing as a browser on the host: the host Caddy on
          # :443 (which has `tls internal`) terminates TLS and forwards
          # into the edge network. We accept the host's internal CA via
          # -k because the runner image has no reason to trust it.
          curl -sk --max-time 10 https://galaxy.lan/healthz \
            | tee /tmp/healthz
          test -s /tmp/healthz
          curl -sk --max-time 10 -o /dev/null -w '%{http_code}\n' \
            https://galaxy.lan/ | tee /tmp/site_status
          grep -qE '^(200|304)$' /tmp/site_status
          curl -sk --max-time 10 -o /dev/null -w '%{http_code}\n' \
            https://galaxy.lan/game/ | tee /tmp/game_status
          grep -qE '^(200|304)$' /tmp/game_status