fix(dev-deploy): seed geoip onto a named volume

`docker restart galaxy-dev-backend` failed with "not a directory"
after every dev-deploy workflow run. Root cause: the compose file
bind-mounted the geoip database via a relative path
(`../../pkg/geoip/test-data/test-data/GeoIP2-Country-Test.mmdb`).
When the Gitea runner invoked `docker compose up`, the path
resolved against the runner's ephemeral workspace under
`/home/runner/.cache/act/<hash>/hostexecutor/...`. The bind source
baked into the running container therefore pointed at that
ephemeral path; the runner deleted the workspace once the workflow
finished, so any later `docker restart` could not remount it.
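You can reproduce the path arithmetic behind this without Docker. The shell sketch below resolves the old relative bind source against two candidate compose directories, treating `..` purely lexically. It assumes GNU coreutils `realpath` (for `-m` and `-s`). The `abc123` hash and the `/srv/galaxy` checkout are made-up placeholders, not paths from the dev host.

```shell
#!/bin/sh
# Resolve a relative bind source against the directory holding the
# compose file, purely lexically. Assumes GNU realpath: -m accepts
# paths whose components do not exist, -s skips symlink expansion.
resolve_bind_source() {
    compose_dir=$1
    rel_path=$2
    realpath -m -s "$compose_dir/$rel_path"
}

# Relative source used by the old bind-mount in docker-compose.yml.
GEOIP_REL=../../pkg/geoip/test-data/test-data/GeoIP2-Country-Test.mmdb

# Compose driven from the runner's act workspace ("abc123" is a
# placeholder hash): the result lives inside a directory the runner
# deletes after the workflow.
resolve_bind_source /home/runner/.cache/act/abc123/hostexecutor/tools/dev-deploy "$GEOIP_REL"

# Compose driven from a persistent checkout (hypothetical /srv/galaxy).
resolve_bind_source /srv/galaxy/tools/dev-deploy "$GEOIP_REL"
```

The first call prints `/home/runner/.cache/act/abc123/hostexecutor/pkg/geoip/test-data/test-data/GeoIP2-Country-Test.mmdb`. The second prints `/srv/galaxy/pkg/geoip/test-data/test-data/GeoIP2-Country-Test.mmdb`. Only the second path outlives the workflow run.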

Replace the bind with a named volume `galaxy-dev-geoip-data`,
seeded at deploy time:

- `tools/dev-deploy/docker-compose.yml`: mount
  `galaxy-dev-geoip-data:/var/lib/galaxy:ro` instead of a relative
  bind. Declare the volume in the top-level `volumes:` block.

- `.gitea/workflows/dev-deploy.yaml`: new `Seed geoip volume` step
  (placed right after the existing UI-volume seed) copies
  `GeoIP2-Country-Test.mmdb` from `pkg/geoip/test-data/test-data/`
  into the named volume as `geoip.mmdb`, via an ephemeral alpine
  container, the same pattern the UI seeding already uses.

- `tools/dev-deploy/Makefile`: new `seed-geoip` target performs
  the same copy from the persistent checkout. `up` and `rebuild`
  now depend on it, so a hand-run `make -C tools/dev-deploy up`
  populates the volume without a separate seeding step.

- `tools/dev-deploy/README.md`: updated the make-targets table to
  list `seed-geoip`.

- `tools/dev-deploy/KNOWN-ISSUES.md`: the restart-failure entry is
  now marked fixed and kept as a postmortem, preserving the symptom,
  the cause, and pointers to where the fix lives.
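The workflow step and the `seed-geoip` target run the same two Docker commands. Below is a minimal sketch of that copy as one reusable shell function. `seed_geoip_volume` and its `repo_root` argument are hypothetical names. The `test -s` guard, which fails if the copied file is missing or empty, is an addition that neither the workflow nor the Makefile has.

```shell
#!/bin/sh
# Hypothetical helper mirroring the copy done by the workflow step and
# the `seed-geoip` Makefile target. Volume name, fixture directory and
# in-volume file name are the ones used in this commit.
GEOIP_VOLUME=galaxy-dev-geoip-data
GEOIP_FIXTURE_DIR=pkg/geoip/test-data/test-data

seed_geoip_volume() {
    repo_root=$1
    # Re-creating an existing local volume of the same name is not an
    # error, so re-running the seed is safe.
    docker volume create "$GEOIP_VOLUME" >/dev/null || return 1
    # Mount the fixture directory read-only and copy the one file in,
    # renamed to geoip.mmdb so it appears as /var/lib/galaxy/geoip.mmdb
    # once the backend mounts the volume at /var/lib/galaxy.
    docker run --rm \
        -v "$GEOIP_VOLUME:/dst" \
        -v "$repo_root/$GEOIP_FIXTURE_DIR:/src:ro" \
        alpine sh -c 'cp /src/GeoIP2-Country-Test.mmdb /dst/geoip.mmdb && test -s /dst/geoip.mmdb'
}
```

From the repository root it would be called as `seed_geoip_volume "$PWD"`. The host path is quoted here, unlike the unquoted `$(REPO_ROOT)` in the Makefile recipe.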

Verification on the dev host (this branch checked out):

  $ make -C tools/dev-deploy up                # populates the volume, brings stack healthy
  $ docker restart galaxy-dev-backend          # used to error "not a directory"
  $ until [ "$(docker inspect -f '{{.State.Health.Status}}' galaxy-dev-backend)" = "healthy" ]; do sleep 2; done
  $ echo "ok"                                   # backend up 6s, healthy
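The `until` loop above waits forever if the backend never turns healthy. A bounded variant is sketched below as a shell function. The name `wait_healthy` and its default 60-second timeout are illustrative choices. It relies on the container defining a healthcheck, so that `.State.Health.Status` is populated.

```shell
#!/bin/sh
# Poll a container's health status every 2 s and give up after a
# deadline (in seconds) instead of looping forever.
wait_healthy() {
    container=$1
    timeout_s=${2:-60}
    elapsed=0
    while [ "$elapsed" -lt "$timeout_s" ]; do
        status=$(docker inspect -f '{{.State.Health.Status}}' "$container" 2>/dev/null)
        if [ "$status" = "healthy" ]; then
            echo "ok: $container healthy after ${elapsed}s"
            return 0
        fi
        sleep 2
        elapsed=$((elapsed + 2))
    done
    echo "timeout: $container not healthy after ${timeout_s}s (last status: ${status:-unknown})" >&2
    return 1
}
```

Example usage: `docker restart galaxy-dev-backend && wait_healthy galaxy-dev-backend 120`.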

The pre-existing sandbox engine `galaxy-game-80f3ce86-...` survived
both `make up` and `docker restart` untouched.
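The verification could also confirm which kind of mount now backs `/var/lib/galaxy`. The sketch below lists a container's mounts through `docker inspect`'s Go-template output and prints the `Type` of the mount at a given destination. `mount_type_for` is a hypothetical helper name. After this change, `galaxy-dev-backend` should report `volume`; `bind` would mean the old relative bind had come back.

```shell
#!/bin/sh
# Print the mount type ("volume", "bind", ...) for a destination path
# inside a container; exit non-zero if no mount targets that path.
mount_type_for() {
    container=$1
    dest=$2
    docker inspect -f '{{range .Mounts}}{{println .Type .Destination}}{{end}}' "$container" |
        awk -v d="$dest" '$2 == d { print $1; found = 1 } END { exit !found }'
}
```

Example usage: `mount_type_for galaxy-dev-backend /var/lib/galaxy`.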

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Author: Ilia Denisov
Date: 2026-05-19 01:59:38 +02:00
Parent: d19aa3aac5
Commit: f70258849f
5 changed files with 68 additions and 33 deletions
`.gitea/workflows/dev-deploy.yaml` (+15)
@@ -104,6 +104,21 @@ jobs:
 -v "${{ gitea.workspace }}/ui/frontend/build:/src:ro" \
 alpine sh -c 'rm -rf /dst/* /dst/.??* 2>/dev/null; cp -a /src/. /dst/'

+- name: Seed geoip volume
+run: |
+# Copy the GeoIP test fixture into a named volume so the
+# backend can mount it as /var/lib/galaxy. A bind-mount with
+# a relative path would resolve against this runner's
+# ephemeral workspace under /home/runner/.cache/act/<hash>/,
+# which the runner deletes once the workflow ends — the next
+# `docker restart galaxy-dev-backend` would then fail with
+# "not a directory" because the mount source vanished.
+docker volume create galaxy-dev-geoip-data >/dev/null
+docker run --rm \
+-v galaxy-dev-geoip-data:/dst \
+-v "${{ gitea.workspace }}/pkg/geoip/test-data/test-data:/src:ro" \
+alpine sh -c 'cp /src/GeoIP2-Country-Test.mmdb /dst/geoip.mmdb'
+
 - name: Reap stray dev-deploy containers
 run: |
 # Remove any non-running compose-managed containers from
`tools/dev-deploy/KNOWN-ISSUES.md` (+22 -27)
@@ -162,9 +162,12 @@ redeploys can short-circuit the diagnostic loop.

 ## `docker restart galaxy-dev-backend` fails after the CI runner cleans up

+**Status: fixed (2026-05-19).** Kept here as a postmortem in case
+the symptom resurfaces in a different form.
+
 ### Symptom

-`docker restart galaxy-dev-backend` from the host fails with:
+`docker restart galaxy-dev-backend` from the host failed with:

 ```text
 Error response from daemon: ... error mounting
@@ -172,36 +175,28 @@ Error response from daemon: ... error mounting
 to rootfs at "/var/lib/galaxy/geoip.mmdb": ... not a directory
 ```

-The container ends up `Exited (127)` and never comes back.
+The container ended up `Exited (127)` and never came back.

 ### Cause

-`tools/dev-deploy/docker-compose.yml` mounts the geoip database via
-a path relative to the compose file
+`tools/dev-deploy/docker-compose.yml` used to mount the geoip
+database via a path relative to the compose file
 (`../../pkg/geoip/test-data/test-data/GeoIP2-Country-Test.mmdb`). When
-the `dev-deploy.yaml` Gitea runner invokes `docker compose up` it
-resolves that relative path against the runner's ephemeral workspace
+the `dev-deploy.yaml` Gitea runner invoked `docker compose up`, it
+resolved that relative path against the runner's ephemeral workspace
 under `/home/runner/.cache/act/<hash>/hostexecutor/tools/dev-deploy/`,
-so the bind-mount source baked into the running container points at
-that ephemeral path. The runner deletes the workspace once the
-workflow ends, the source disappears, and the next `docker restart`
-fails to remount it.
+so the bind-mount source baked into the running container pointed at
+that ephemeral path. The runner deleted the workspace once the
+workflow ended, the source disappeared, and the next `docker restart`
+failed to remount it.

-### Workaround
+### Fix

-Bring the stack back up from a stable workspace, which re-binds the
-mount source to the persistent checkout:
-
-```sh
-make -C tools/dev-deploy up
-```
-
-This restarts every service (including the broken `galaxy-dev-backend`)
-with a stable source path.
-
-### Status
-
-Open. The clean fix is either to bake the geoip test fixture into
-the backend image (no host bind-mount) or to copy it onto a named
-volume during `dev-deploy.yaml` and bind that instead. Either change
-removes the runner-workspace dependency entirely.
+Replaced the bind-mount with a named volume,
+`galaxy-dev-geoip-data`, seeded by the `dev-deploy.yaml` workflow
+(and by the new `make seed-geoip` target) at deploy time. The
+backend mounts the volume as `/var/lib/galaxy:ro`, so the bind
+source is a Docker-managed volume — independent of the runner
+workspace — and survives a `docker restart`. See
+`.gitea/workflows/dev-deploy.yaml` ("Seed geoip volume" step) and
+`tools/dev-deploy/Makefile` (`seed-geoip` target).
`tools/dev-deploy/Makefile` (+18 -4)
@@ -1,4 +1,4 @@
-.PHONY: help up down rebuild logs status clean-data health psql build-engine seed-ui
+.PHONY: help up down rebuild logs status clean-data health psql build-engine seed-ui seed-geoip
 .DEFAULT_GOAL := help


@@ -18,10 +18,11 @@ COMPOSE := docker compose

 help:
 @echo "Long-lived Galaxy dev environment (https://*.galaxy.lan):"
-@echo " make up Build images, ensure engine image, bring stack up"
+@echo " make up Build images, ensure engine image, seed geoip, bring stack up"
 @echo " make rebuild Force rebuild of backend / gateway images and bring up"
 @echo " make build-engine Build $(ENGINE_IMAGE) from game/Dockerfile (no-op if present)"
 @echo " make seed-ui Build ui/frontend and load into galaxy-dev-ui-dist volume"
+@echo " make seed-geoip Copy GeoIP fixture into galaxy-dev-geoip-data volume"
 @echo " make down Stop containers, keep named volumes"
 @echo " make logs Tail all logs"
 @echo " make status docker compose ps"
@@ -35,11 +36,11 @@ help:
 @echo " - host Caddy proxying *.galaxy.lan into that network"
 @echo " - game-state dir: $(GALAXY_DEV_GAME_STATE_DIR) (auto-created)"

-up: build-engine
+up: build-engine seed-geoip
 mkdir -p "$(GALAXY_DEV_GAME_STATE_DIR)"
 $(COMPOSE) up -d --wait

-rebuild: build-engine
+rebuild: build-engine seed-geoip
 $(COMPOSE) build --no-cache galaxy-backend galaxy-api
 mkdir -p "$(GALAXY_DEV_GAME_STATE_DIR)"
 $(COMPOSE) up -d --wait
@@ -52,6 +53,19 @@ build-engine:
 docker build -t $(ENGINE_IMAGE) -f $(REPO_ROOT)/game/Dockerfile $(REPO_ROOT); \
 fi

+# Copy the GeoIP fixture into a named volume the backend mounts as
+# /var/lib/galaxy. Using a volume avoids a bind-mount that would
+# resolve against an ephemeral workspace path when compose is driven
+# from the Gitea runner (see tools/dev-deploy/KNOWN-ISSUES.md for the
+# breakage that bind-mounts caused on `docker restart`).
+seed-geoip:
+@echo "seeding GeoIP fixture into galaxy-dev-geoip-data…"
+docker volume create galaxy-dev-geoip-data >/dev/null
+docker run --rm \
+-v galaxy-dev-geoip-data:/dst \
+-v $(REPO_ROOT)/pkg/geoip/test-data/test-data:/src:ro \
+alpine sh -c 'cp /src/GeoIP2-Country-Test.mmdb /dst/geoip.mmdb'
+
 # Build the UI frontend and load the resulting build/ directory into
 # the named volume Caddy serves from. Used by the dev-deploy workflow
 # and by anyone bringing the stack up by hand.
`tools/dev-deploy/README.md` (+2 -1)
@@ -153,9 +153,10 @@ The same volume-persistence model applies to `tools/local-dev/`.
 ## Make targets

 ```text
-make up Build images, ensure engine image, bring stack up (waits for health)
+make up Build images, ensure engine image, seed geoip, bring stack up
 make rebuild Rebuild backend / gateway images (ignores cache), then up
 make seed-ui pnpm build + load build/ into galaxy-dev-ui-dist volume
+make seed-geoip Copy pkg/geoip fixture into galaxy-dev-geoip-data volume
 make build-engine Build galaxy-engine:dev (no-op if image already present)
 make down Stop containers, keep named volumes
 make logs Tail compose logs
`tools/dev-deploy/docker-compose.yml` (+11 -1)
@@ -144,7 +144,15 @@ services:
 target: ${GALAXY_DEV_GAME_STATE_DIR}
 bind:
 create_host_path: true
-- ../../pkg/geoip/test-data/test-data/GeoIP2-Country-Test.mmdb:/var/lib/galaxy/geoip.mmdb:ro
+# The geoip database lives on a named volume seeded by the
+# `dev-deploy.yaml` workflow (or by `make seed-geoip` when
+# bringing the stack up by hand). A bind-mount with a relative
+# path would resolve against the runner's ephemeral workspace
+# under /home/runner/.cache/act/<hash>/, which the runner
+# deletes after the workflow ends — and the next
+# `docker restart galaxy-dev-backend` would then fail with
+# "not a directory" because the mount source vanished.
+- galaxy-dev-geoip-data:/var/lib/galaxy:ro
 networks:
 - galaxy-internal
 healthcheck:
@@ -258,3 +266,5 @@ volumes:
 name: galaxy-dev-caddy-data
 galaxy-dev-ui-dist:
 name: galaxy-dev-ui-dist
+galaxy-dev-geoip-data:
+name: galaxy-dev-geoip-data