chore(ci): tidy CI/dev infra — drop local-ci, lift migration rule, scope by galaxy.stack label

Five connected cleanups across the dev/CI infrastructure:

1. Drop tools/local-ci/. The standalone Gitea + act_runner stack was
   the legacy "offline workflow validator"; the per-stage CI gate now
   runs on gitea.lan and the directory was only retained as a
   fallback. Removing it leaves no operational dependency: backend,
   gateway, and game code have no references; documentation that
   pointed at it (CLAUDE.md, docs/ARCHITECTURE.md, ui/docs/testing.md,
   tools/dev-deploy/README.md, tools/local-dev/README.md) is updated
   in this same change. Historical "Verified on local-ci run N"
   markers in ui/PLAN.md are preserved unchanged.

2. Lift the pre-production single-migration rule. The rule forced
   every schema delta into 00001_init.sql and required a manual
   make clean-data wipe on every backward-incompatible change in
   tools/dev-deploy/. Future schema deltas now land as additive
   sequence-numbered files (00002_*.sql, …) that goose applies
   automatically on backend startup; 00001_init.sql becomes an
   immutable baseline. Authoring conventions live in
   backend/internal/postgres/migrations/README.md. The chain may be
   squashed back into a fresh 00001 as a deliberate one-time
   operation before the first production deployment.

3. Document the deployment cadence. The dev environment is
   single-tenant: pushes to feature/* run only the test workflows
   (go-unit, ui-test, integration); dev-deploy.yaml fires on
   push to development. A workflow_dispatch override on
   dev-deploy.yaml lets a developer preview a feature branch on the
   shared dev environment before merge. The deploy is idempotent,
   so the next merge into development overwrites the manual one.

4. Scope compose-managed resources by an explicit
   galaxy.stack=<local-dev|dev-deploy> label. Both compose files
   stamp the label on every service, network, and named volume.
   Makefiles in tools/local-dev/ and tools/dev-deploy/ filter their
   engine-cleanup operations by (stack-label AND engine OCI title)
   so they never touch unrelated workloads on the same daemon.
   dev-deploy.yaml gains a pre-`compose up` step that reaps stale
   exited/dead containers under the dev-deploy stack label.

5. Backend now stamps the same galaxy.stack=<value> label on every
   engine container it spawns, sourced from a new BACKEND_STACK_LABEL
   env var (empty → label not applied; legacy-safe). Both compose
   files set it to their stack name (local-dev / dev-deploy). The
   contract is recorded in docs/ARCHITECTURE.md under
   "Container labels". A package-level test in
   backend/internal/runtime exercises both the label-present and
   label-absent paths.
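
The label contract in items 4 and 5 can be sketched as follows. This is
a minimal illustration, not the backend's actual code: the helper names
engineLabels and cleanupFilterArgs are hypothetical, and only the label
key, the OCI title value, and the BACKEND_STACK_LABEL variable come from
this change. It relies on Docker matching a container only when every
repeated label filter matches.

```go
package main

import (
	"fmt"
	"os"
)

// Label key and engine title taken from this commit's Makefiles and compose files.
const (
	stackLabelKey    = "galaxy.stack"
	engineTitleLabel = "org.opencontainers.image.title=galaxy-game-engine"
)

// engineLabels (hypothetical helper) returns the labels to stamp on a
// spawned engine container. An empty stack value adds no label, which
// mirrors the legacy-safe behaviour described in item 5.
func engineLabels(stack string) map[string]string {
	labels := map[string]string{}
	if stack != "" {
		labels[stackLabelKey] = stack
	}
	return labels
}

// cleanupFilterArgs (hypothetical helper) builds the `docker ps` arguments
// a cleanup target would use. Both label filters must match, so only
// engine containers belonging to the given stack are selected.
func cleanupFilterArgs(stack string) []string {
	return []string{
		"ps", "-aq",
		"--filter", "label=" + stackLabelKey + "=" + stack,
		"--filter", "label=" + engineTitleLabel,
	}
}

func main() {
	// BACKEND_STACK_LABEL is the env var introduced by this change.
	stack := os.Getenv("BACKEND_STACK_LABEL")
	fmt.Println(engineLabels(stack))
	fmt.Println(cleanupFilterArgs("dev-deploy"))
}
```

Keeping the key in one constant means the stamping side and the
filtering side cannot drift apart silently.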

No known regressions:
go test ./backend/internal/{config,runtime,dockerclient} passes, both
compose files validate cleanly, and the backend, gateway, and game
modules all build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ilia Denisov
2026-05-18 23:32:42 +02:00
parent 5eec7013ba
commit a9087691a3
23 changed files with 325 additions and 532 deletions
+6 -3
@@ -4,6 +4,7 @@
 REPO_ROOT := $(realpath $(CURDIR)/../..)
 ENGINE_IMAGE := galaxy-engine:dev
+STACK_LABEL := galaxy.stack=dev-deploy
 ENGINE_LABEL := org.opencontainers.image.title=galaxy-game-engine
 # Game-state root lives under the invoking user's home by default so
 # `make up` works without sudo. Override `GALAXY_DEV_GAME_STATE_DIR`
@@ -93,12 +94,14 @@ psql:
 clean-data:
 	@echo "Stopping containers and engines, then wiping volumes + game-state…"
-	@ids=$$(docker ps -aq --filter label=$(ENGINE_LABEL)); \
+	$(COMPOSE) down -v
+	@ids=$$(docker ps -aq \
+		--filter "label=$(STACK_LABEL)" \
+		--filter "label=$(ENGINE_LABEL)"); \
 	if [ -n "$$ids" ]; then \
-		echo "stopping engine containers"; \
+		echo "stopping engine containers for $(STACK_LABEL)"; \
 		docker rm -f $$ids >/dev/null; \
 	fi
-	$(COMPOSE) down -v
 	@if [ -d "$(GALAXY_DEV_GAME_STATE_DIR)" ]; then \
 		echo "wiping $(GALAXY_DEV_GAME_STATE_DIR)"; \
 		docker run --rm -v "$(GALAXY_DEV_GAME_STATE_DIR):/state" alpine sh -c 'rm -rf /state/*' 2>/dev/null \
+28 -8
@@ -135,17 +135,20 @@ exec galaxy-mailpit wget -qO- localhost:8025/messages` and similar.
 ## Persistent state and schema changes
 The dev Postgres volume `galaxy-dev-postgres-data` survives redeploys.
-Until the pre-production migration rule is lifted, every
-backward-incompatible change to `backend/internal/postgres/migrations/00001_init.sql`
-needs a manual wipe before the next deploy succeeds:
+Schema deltas land as additive, sequence-numbered migration files
+(`backend/internal/postgres/migrations/0000N_*.sql`) and `pressly/goose`
+applies them on backend startup without operator action.
+Use `make -C tools/dev-deploy clean-data` only when you deliberately
+want a fresh database (debugging schema drift, exercising the
+bootstrap path from scratch, etc.):
 ```sh
 make -C tools/dev-deploy clean-data
 make -C tools/dev-deploy up
 ```
-This is the same caveat as `tools/local-dev/`, just with a different
-volume name.
+The same volume-persistence model applies to `tools/local-dev/`.
 ## Make targets
@@ -183,13 +186,30 @@ See [`KNOWN-ISSUES.md`](KNOWN-ISSUES.md) for symptoms that surface
 in the long-lived dev environment but are not yet fixed (currently:
 the sandbox game flipping to `cancelled` after a redispatch).
+## Deployment cadence
+This environment is single-tenant: one live deployment, redeployed by
+the `dev-deploy.yaml` workflow on every merge into `development`. PR
+branches do not auto-deploy here — pushes to `feature/*` only run the
+test workflows (`go-unit`, `ui-test`, `integration`).
+To put a feature branch on the shared dev environment before its PR
+merges (e.g. to validate a UI flow against the real Caddy edge), run
+the workflow manually:
+1. Push the branch (`git push gitea HEAD`).
+2. Gitea UI → **Actions → Deploy · Dev → Run workflow**, pick the
+   feature ref.
+The deploy is idempotent — when the PR later merges into
+`development`, the regular push trigger fires the same packaging and
+healthcheck steps, overwriting whatever the manual dispatch left
+behind. There is no separate state to clean up between the two paths.
 ## Relationship to other infrastructure
 - `tools/local-dev/` — single-developer playground, host-port mapped,
   Vite dev server on the side. Recommended for active UI work.
-- `tools/local-ci/` — Gitea + act runner for **fallback** workflow
-  testing without `gitea.lan`. Optional, not part of the per-stage CI
-  gate anymore.
 - `.gitea/workflows/dev-deploy.yaml` — the CI side of this stack:
   builds images, seeds the UI volume, runs `docker compose up -d` on
   every merge into `development`. The Makefile in this directory is
+21
@@ -22,6 +22,8 @@ services:
     image: postgres:16-alpine
     container_name: galaxy-dev-postgres
     restart: unless-stopped
+    labels:
+      galaxy.stack: dev-deploy
     environment:
       POSTGRES_USER: galaxy
       POSTGRES_PASSWORD: galaxy
@@ -41,6 +43,8 @@ services:
     image: redis:7-alpine
     container_name: galaxy-dev-redis
     restart: unless-stopped
+    labels:
+      galaxy.stack: dev-deploy
     command:
       - redis-server
       - --requirepass
@@ -62,6 +66,8 @@ services:
     image: axllent/mailpit:v1.21
     container_name: galaxy-dev-mailpit
     restart: unless-stopped
+    labels:
+      galaxy.stack: dev-deploy
     networks:
       - galaxy-internal
     healthcheck:
@@ -78,6 +84,8 @@ services:
     image: galaxy/backend:dev
     container_name: galaxy-dev-backend
     restart: unless-stopped
+    labels:
+      galaxy.stack: dev-deploy
     user: "0:0"
     depends_on:
       galaxy-postgres:
@@ -94,6 +102,7 @@ services:
       BACKEND_SMTP_FROM: "galaxy-backend@galaxy.lan"
       BACKEND_SMTP_TLS_MODE: none
       BACKEND_DOCKER_NETWORK: galaxy-dev-internal
+      BACKEND_STACK_LABEL: dev-deploy
       BACKEND_GAME_STATE_ROOT: ${GALAXY_DEV_GAME_STATE_DIR}
       BACKEND_GEOIP_DB_PATH: /var/lib/galaxy/geoip.mmdb
       BACKEND_NOTIFICATION_ADMIN_EMAIL: admin@galaxy.lan
@@ -152,6 +161,8 @@ services:
     image: galaxy/gateway:dev
     container_name: galaxy-dev-api
     restart: unless-stopped
+    labels:
+      galaxy.stack: dev-deploy
     depends_on:
       galaxy-backend:
         condition: service_healthy
@@ -209,6 +220,8 @@ services:
     image: caddy:2.11.2-alpine
     container_name: galaxy-dev-caddy
     restart: unless-stopped
+    labels:
+      galaxy.stack: dev-deploy
     depends_on:
       galaxy-api:
         condition: service_healthy
@@ -225,6 +238,8 @@ networks:
     name: galaxy-dev-internal
     driver: bridge
     internal: false
+    labels:
+      galaxy.stack: dev-deploy
   edge:
     name: ${GALAXY_EDGE_NETWORK:-edge}
     external: true
@@ -232,7 +247,13 @@ networks:
 volumes:
   galaxy-dev-postgres-data:
     name: galaxy-dev-postgres-data
+    labels:
+      galaxy.stack: dev-deploy
   galaxy-dev-caddy-data:
     name: galaxy-dev-caddy-data
+    labels:
+      galaxy.stack: dev-deploy
   galaxy-dev-ui-dist:
     name: galaxy-dev-ui-dist
+    labels:
+      galaxy.stack: dev-deploy