From a9087691a3b108fbea0a2a936b952a28cefebbb0 Mon Sep 17 00:00:00 2001
From: Ilia Denisov
Date: Mon, 18 May 2026 23:32:42 +0200
Subject: [PATCH 1/2] =?UTF-8?q?chore(ci):=20tidy=20CI/dev=20infra=20?=
 =?UTF-8?q?=E2=80=94=20drop=20local-ci,=20lift=20migration=20rule,=20scope?=
 =?UTF-8?q?=20by=20galaxy.stack=20label?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Five connected cleanups across the dev/CI infrastructure:

1. Drop tools/local-ci/. The standalone Gitea + act_runner stack was
   the legacy "offline workflow validator"; the per-stage CI gate now
   runs on gitea.lan and the directory was only retained as a
   fallback. Removing it leaves no operational dependency: backend,
   gateway, and game code have no references; documentation that
   pointed at it (CLAUDE.md, docs/ARCHITECTURE.md, ui/docs/testing.md,
   tools/dev-deploy/README.md, tools/local-dev/README.md) is updated
   in this same change. Historical "Verified on local-ci run N"
   markers in ui/PLAN.md are preserved unchanged.

2. Lift the pre-production single-migration rule. The rule forced
   every schema delta into 00001_init.sql and required a manual
   make clean-data wipe on every backward-incompatible change in
   tools/dev-deploy/. Future schema deltas now land as additive
   sequence-numbered files (00002_*.sql, …) that goose applies
   automatically on backend startup; 00001_init.sql becomes an
   immutable baseline. Authoring conventions live in
   backend/internal/postgres/migrations/README.md. The chain may be
   squashed back into a fresh 00001 as a deliberate one-time
   operation before the first production deployment.

3. Document the deployment cadence. The dev environment is
   single-tenant: pushes to feature/* run the test workflows
   (go-unit, ui-test, integration) only; dev-deploy.yaml fires on
   push to development. A workflow_dispatch override on
   dev-deploy.yaml lets a developer preview a feature branch on the
   shared dev environment before merge; the next merge into
   development overwrites the manual deploy idempotently.

4. Scope compose-managed resources by an explicit galaxy.stack=
   label. Both compose files stamp the label on every service,
   network, and named volume. Makefiles in tools/local-dev/ and
   tools/dev-deploy/ filter their engine-cleanup operations by
   (stack-label AND engine OCI title) so they never touch unrelated
   workloads on the same daemon. dev-deploy.yaml gains a
   pre-`compose up` step that reaps stale non-running
   (exited/created/dead) containers under the dev-deploy stack label.

5. Backend now stamps the same galaxy.stack= label on every engine
   container it spawns, sourced from a new BACKEND_STACK_LABEL env
   var (empty → label not applied; legacy-safe). Both compose files
   set it to their stack name (local-dev / dev-deploy). The contract
   is recorded in docs/ARCHITECTURE.md under "Container labels". A
   package-level test in backend/internal/runtime exercises both the
   label-present and label-absent paths.

No tests intentionally regressed:
go test ./backend/internal/{config,runtime,dockerclient} is green,
both compose files validate cleanly, and the backend, gateway, and
game modules all build.

Co-Authored-By: Claude Opus 4.7 (1M context) --- .gitea/workflows/dev-deploy.yaml | 18 +++ CLAUDE.md | 46 +++++--- backend/README.md | 9 +- backend/docs/runbook.md | 9 +- backend/internal/config/config.go | 10 ++ .../internal/postgres/migrations/README.md | 60 ++++++---- backend/internal/runtime/service.go | 34 ++++-- .../internal/runtime/service_internal_test.go | 51 +++++++++ docs/ARCHITECTURE.md | 34 +++++- tools/dev-deploy/Makefile | 9 +- tools/dev-deploy/README.md | 36 ++++-- tools/dev-deploy/docker-compose.yml | 21 ++++ tools/local-ci/.gitignore | 1 - tools/local-ci/Makefile | 42 ------- tools/local-ci/README.md | 106 ------------------ tools/local-ci/bootstrap.sh | 86 -------------- tools/local-ci/config.yaml | 35 ------ tools/local-ci/docker-compose.override.yml | 16 --- tools/local-ci/docker-compose.yml | 78 ------------- tools/local-dev/Makefile | 23 +++- tools/local-dev/README.md | 25 +++-- tools/local-dev/docker-compose.yml | 17 +++ ui/docs/testing.md | 91 ++------------- 23 files changed, 325 insertions(+), 532 deletions(-) create mode 100644 backend/internal/runtime/service_internal_test.go delete mode 100644 tools/local-ci/.gitignore delete mode 100644 tools/local-ci/Makefile delete mode 100644 tools/local-ci/README.md delete mode 100755 tools/local-ci/bootstrap.sh delete mode 100644 tools/local-ci/config.yaml delete mode 100644 tools/local-ci/docker-compose.override.yml delete mode 100644 tools/local-ci/docker-compose.yml diff --git a/.gitea/workflows/dev-deploy.yaml b/.gitea/workflows/dev-deploy.yaml index 3eb6305..589cd7d 100644 --- a/.gitea/workflows/dev-deploy.yaml +++ b/.gitea/workflows/dev-deploy.yaml @@ -104,6 +104,24 @@ jobs: -v "${{ gitea.workspace }}/ui/frontend/build:/src:ro" \ alpine sh -c 'rm -rf /dst/* /dst/.??* 2>/dev/null; cp -a /src/. /dst/' + - name: Reap stray dev-deploy containers + run: | + # Remove any non-running compose-managed containers from + # earlier deploys before `compose up`. 
Filter by the stack + # label so we never touch unrelated workloads on the same + # daemon. Running containers (incl. engine instances backend + # spawned itself with the same label) are left intact — + # those are reattached by the backend reconciler on boot. + ids=$(docker ps -aq \ + --filter "label=galaxy.stack=dev-deploy" \ + --filter "status=exited" \ + --filter "status=created" \ + --filter "status=dead") + if [ -n "$ids" ]; then + echo "reaping: $ids" + docker rm -f $ids + fi + - name: Bring up the stack working-directory: tools/dev-deploy run: | diff --git a/CLAUDE.md b/CLAUDE.md index a58e51e..e1e330e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -46,7 +46,7 @@ Branches: it auto-deploys to the dev environment via `dev-deploy.yaml` (reachable at `https://www.galaxy.lan` / `https://api.galaxy.lan`). - `feature/*` — short-lived branches off `development`. Merged back - via PR; only then do they reach the dev environment. + via PR; only then do they reach the dev environment automatically. Workflows in `.gitea/workflows/`: @@ -55,10 +55,24 @@ Workflows in `.gitea/workflows/`: | `go-unit.yaml` | push + PR matching Go paths | Fast Go unit tests. | | `ui-test.yaml` | push + PR matching `ui/**` | Vitest + Playwright. | | `integration.yaml` | PR to `development`/`main`; push to `development` | testcontainers integration suite. | -| `dev-deploy.yaml` | push to `development` | Build images + (re)deploy to `tools/dev-deploy/`. | +| `dev-deploy.yaml` | push to `development`; `workflow_dispatch` on any ref | Build images + (re)deploy to `tools/dev-deploy/`. | | `prod-build.yaml` | push to `main` | Build prod images and `docker save` into artifacts. | | `deploy-prod.yaml` | `workflow_dispatch` | Manual rollout (placeholder until prod host exists). | +### Deployment cadence + +The long-lived dev environment (`tools/dev-deploy/`) is single-tenant: +one live deployment, redeployed on every merge into `development`. 
+While a PR is open the dev environment stays on whatever was last +merged — pushes to `feature/*` only fire the test workflows +(`go-unit`, `ui-test`, `integration`), not `dev-deploy.yaml`. + +To preview an unmerged feature branch on the shared dev environment, +trigger `dev-deploy.yaml` manually from the Gitea UI +(**Actions → Deploy · Dev → Run workflow**) and pick the feature ref. +The deploy is idempotent: the next merge into `development` simply +overwrites whatever the manual dispatch left behind. + ## Per-stage CI gate Every completed stage from any `PLAN.md` (per-service or `ui/PLAN.md`) @@ -72,10 +86,6 @@ short version: 4. Only after every workflow that fired is `success` may the stage be marked done in the corresponding `PLAN.md`. -`tools/local-ci/` is now an opt-in fallback for testing workflow -changes without `gitea.lan` (offline iterations, runner-isolation -debugging). It is no longer required for the per-stage gate. - ## Decisions during stage implementation Stages from `PLAN.md` produce decisions. Those decisions never live in a @@ -102,18 +112,22 @@ The existing codebase of `galaxy/` may be modified or extended when a plan stage requires it. All such changes must be covered by new or updated tests and reflected in documentation when they affect documented behavior. -## Pre-production migration rule +## Migrations -The platform is not yet in production. Schema changes for `backend` go -into the existing `backend/internal/postgres/migrations/00001_init.sql` -file rather than into new `00002_*`-prefixed files. Local databases and -integration test harnesses are recreated from scratch on every pull. +Schema changes for `backend` go into a new `0000N_*.sql` file under +`backend/internal/postgres/migrations/` with a monotonically increasing +prefix. `00001_init.sql` is the historical baseline and stays +immutable; every subsequent change is its own additive migration with +matching Up/Down sides. 
`pressly/goose/v3` (embedded into the backend +binary) applies pending migrations on startup, so the long-lived dev +environment picks up schema deltas without a manual reset. -**This rule is removed before the first production deployment.** From -that point on every schema change becomes a new migration file with a -monotonically increasing prefix, and `00001_init.sql` becomes immutable -history. See `backend/internal/postgres/migrations/README.md` for -details. +Before the first production deployment the migration chain may be +squashed back into a single fresh `00001_init.sql` for a clean slate; +plan that work as an explicit task when it lands. See +`backend/internal/postgres/migrations/README.md` for the local +authoring conventions (file naming, transactional vs. non-transactional +sections, backward-compatible deletes, rollback expectations). ## Documentation discipline diff --git a/backend/README.md b/backend/README.md index 27505cf..a27452e 100644 --- a/backend/README.md +++ b/backend/README.md @@ -129,6 +129,7 @@ fast. | `BACKEND_RUNTIME_CONTAINER_PIDS_LIMIT` | no | `256` | Engine container `--pids-limit`. | | `BACKEND_RUNTIME_CONTAINER_STATE_MOUNT` | no | `/var/lib/galaxy-game` | Absolute in-container path for the per-game state bind mount. | | `BACKEND_RUNTIME_STOP_GRACE_PERIOD` | no | `10s` | SIGTERM-to-SIGKILL grace period for engine container stop. | +| `BACKEND_STACK_LABEL` | no | — | Optional value stamped as `galaxy.stack=` on every engine container backend spawns. Lets host-side tooling (Makefile / CI) scope cleanup to one dev stack. Empty → label is not applied. | | `BACKEND_NOTIFICATION_ADMIN_EMAIL` | no | — | Recipient address for admin-channel notifications (`runtime.*` kinds). When empty, admin-channel routes are recorded as `skipped` and the catalog is partially silenced. | | `BACKEND_NOTIFICATION_WORKER_INTERVAL` | no | `5s` | Notification route worker scan interval. 
| | `BACKEND_NOTIFICATION_MAX_ATTEMPTS` | no | `8` | Notification route delivery attempts before dead-lettering. | @@ -153,10 +154,10 @@ seeded `admin_accounts` ahead of time. before the HTTP listener opens. The startup path also issues a `CREATE SCHEMA IF NOT EXISTS backend` so a fresh database does not trip goose's bookkeeping table on the first migration. -- Pre-production uses one migration file (`00001_init.sql`) covering - every backend domain (auth, user, admin, lobby, runtime, mail, - notification, geo). Future migrations are sequence-numbered and - additive. +- Migrations are sequence-numbered (`0000N_*.sql`) and applied + additively. `00001_init.sql` is the historical baseline; every + schema change after it is a new file with a higher prefix. See + `internal/postgres/migrations/README.md` for the authoring rules. - Queries are written through `go-jet/jet/v2`. The generated code is in `internal/postgres/jet/backend/` and is committed; `internal/postgres/jet/jet.go` carries package metadata that survives regeneration. diff --git a/backend/docs/runbook.md b/backend/docs/runbook.md index 9d28e38..3ab5869 100644 --- a/backend/docs/runbook.md +++ b/backend/docs/runbook.md @@ -28,10 +28,11 @@ test stack. The list mirrors the steady-state behaviour documented in ## Migrations `pressly/goose/v3` applies embedded migrations from -`internal/postgres/migrations/`. The pre-production set ships as -`00001_init.sql` plus additive numbered files. Backend always runs -`CREATE SCHEMA IF NOT EXISTS backend` before goose so a fresh database -does not trip the bookkeeping table on the first migration. +`internal/postgres/migrations/`. Migrations are additive, +sequence-numbered files (`00001_init.sql` is the baseline). Backend +always runs `CREATE SCHEMA IF NOT EXISTS backend` before goose so a +fresh database does not trip the bookkeeping table on the first +migration. 
`internal/postgres/migrations_test.go` asserts that the migration produces the expected table set; adding a table without updating the diff --git a/backend/internal/config/config.go b/backend/internal/config/config.go index bd981ab..ee6cae9 100644 --- a/backend/internal/config/config.go +++ b/backend/internal/config/config.go @@ -91,6 +91,7 @@ const ( envRuntimeContainerPIDsLimit = "BACKEND_RUNTIME_CONTAINER_PIDS_LIMIT" envRuntimeContainerStateMount = "BACKEND_RUNTIME_CONTAINER_STATE_MOUNT" envRuntimeStopGracePeriod = "BACKEND_RUNTIME_STOP_GRACE_PERIOD" + envRuntimeStackLabel = "BACKEND_STACK_LABEL" envNotificationAdminEmail = "BACKEND_NOTIFICATION_ADMIN_EMAIL" envNotificationWorkerInterval = "BACKEND_NOTIFICATION_WORKER_INTERVAL" @@ -409,6 +410,14 @@ type RuntimeConfig struct { // StopGracePeriod is the docker stop SIGTERM-to-SIGKILL grace period // applied during stop / cancel / restart / patch. StopGracePeriod time.Duration + + // StackLabel is the optional value backend stamps as + // `galaxy.stack=` on every engine container it spawns. It + // lets host-side tooling (Makefile, CI workflows) scope cleanup + // operations to a single dev stack without touching unrelated + // workloads on the same Docker daemon. When empty, the label is + // not applied. + StackLabel string } // DiplomailConfig bounds the diplomatic-mail subsystem. 
Both limits @@ -705,6 +714,7 @@ func LoadFromEnv() (Config, error) { if cfg.Runtime.StopGracePeriod, err = loadDuration(envRuntimeStopGracePeriod, cfg.Runtime.StopGracePeriod); err != nil { return Config{}, err } + cfg.Runtime.StackLabel = strings.TrimSpace(loadString(envRuntimeStackLabel, cfg.Runtime.StackLabel)) cfg.Notification.AdminEmail = loadString(envNotificationAdminEmail, cfg.Notification.AdminEmail) if cfg.Notification.WorkerInterval, err = loadDuration(envNotificationWorkerInterval, cfg.Notification.WorkerInterval); err != nil { diff --git a/backend/internal/postgres/migrations/README.md b/backend/internal/postgres/migrations/README.md index fdb85cf..d131f49 100644 --- a/backend/internal/postgres/migrations/README.md +++ b/backend/internal/postgres/migrations/README.md @@ -1,26 +1,46 @@ # Backend migrations -Goose migrations embedded into the backend binary by `embed.go`. Applied -at startup before any listener opens (see `internal/postgres`). +Goose (`pressly/goose/v3`) migrations embedded into the backend binary +by `embed.go`. Applied at startup before any listener opens — see +`internal/postgres`. -## Pre-production single-file rule +## Authoring conventions -**While the platform is not yet in production, every schema change goes -into the existing `00001_init.sql` file** rather than a new -`00002_*`-prefixed file. The intent is to keep the schema in one -canonical place so reviewers and developers do not have to reconstruct -the latest shape from a chain of incremental migrations. +- Each schema change is a new file with a monotonically increasing + numeric prefix and a snake-case slug: + `0000N_short_description.sql`. Reuse of a prefix is forbidden once + the file is merged. +- `00001_init.sql` is the historical baseline. Treat it as immutable + history; do not edit it to land new schema. Squashing the chain back + into a fresh `00001` is reserved for the explicit pre-production + cut-over. 
+- Every file MUST contain both an `-- +goose Up` and `-- +goose Down` + section, even if Down is a single `DROP …` for the same artefacts. + Down migrations are exercised by the schema test and serve as the + documented rollback path. +- Destructive changes (dropping columns/tables, renaming with data + loss) MUST be split into at least two migrations so the chain stays + rollable forward and backward without coordinated code+schema + windows: + 1. add the new shape, dual-write the data, leave the old shape in + place; + 2. once all readers have switched, drop the old shape in a follow-up + migration. +- Migrations are applied automatically on backend startup, so a fresh + push to `development` plus the `dev-deploy.yaml` workflow brings the + long-lived dev database up to head without manual intervention. + `make -C tools/dev-deploy clean-data` is only needed when a developer + deliberately wants a fresh database. +- The integration harness (`backend/internal/postgres/migrations_test.go`) + spins up a disposable Postgres per run and asserts the final table + set. When a migration adds or removes tables, update the expected + list in the same patch. -Operationally this means that pulling a branch with schema changes -requires a fresh database — the only consumer today is local development -and integration tests, both of which spin up disposable Postgres -instances. +## Pre-production squash -> **Remove this rule before the first production deployment.** From -> that point on every schema change must be a new migration file with a -> monotonically increasing prefix, and `00001_init.sql` becomes -> immutable history. - -If you need to make a change, edit `00001_init.sql` directly. Down -migrations should still be kept in sync (they live at the bottom of the -file — currently a single `DROP SCHEMA backend CASCADE`). +The chain may be squashed back into one clean `00001_init.sql` before +the first production deployment. 
That is a deliberate, one-time +operation; until then, additive numbered files are the rule. After the +squash this file gets a short note that `00001_init.sql` represents +the production baseline and the policy above continues to apply for +every later migration. diff --git a/backend/internal/runtime/service.go b/backend/internal/runtime/service.go index d4495e1..a7d13e8 100644 --- a/backend/internal/runtime/service.go +++ b/backend/internal/runtime/service.go @@ -537,10 +537,7 @@ func (s *Service) runStart(ctx context.Context, op OperationLog) error { Env: map[string]string{ "GAME_STATE_PATH": statePath, }, - Labels: map[string]string{ - "galaxy.game_id": gameID.String(), - "galaxy.engine_version": version.Version, - }, + Labels: s.engineLabels(gameID.String(), version.Version), BindMounts: []dockerclient.BindMount{ { HostPath: hostStatePath, @@ -735,10 +732,7 @@ func (s *Service) runPatch(ctx context.Context, op OperationLog, target EngineVe Env: map[string]string{ "GAME_STATE_PATH": statePath, }, - Labels: map[string]string{ - "galaxy.game_id": op.GameID.String(), - "galaxy.engine_version": target.Version, - }, + Labels: s.engineLabels(op.GameID.String(), target.Version), BindMounts: []dockerclient.BindMount{ {HostPath: hostStatePath, MountPath: s.deps.Config.ContainerStateMount}, }, @@ -938,6 +932,30 @@ func (s *Service) upsertRuntimeRecord(ctx context.Context, in runtimeRecordInser // containers attach to. Wired from cfg.Docker.Network through Deps. func (s *Service) dockerNetwork() string { return s.deps.DockerNetwork } +// engineLabels returns the label set stamped on every engine container +// spawned for gameID running engineVersion. The runtime adapter merges +// `dockerclient.ManagedLabel` separately; this helper covers the +// game-scoped labels plus an optional `galaxy.stack=` from the +// runtime config so host-side tooling can scope cleanup by dev stack +// without touching unrelated workloads. 
+func (s *Service) engineLabels(gameID, engineVersion string) map[string]string { + return engineLabels(gameID, engineVersion, s.deps.Config.StackLabel) +} + +// engineLabels is the side-effect-free part of `(*Service).engineLabels`, +// exposed at package scope so unit tests can exercise the labelling +// rules without building a full Service. +func engineLabels(gameID, engineVersion, stackLabel string) map[string]string { + labels := map[string]string{ + "galaxy.game_id": gameID, + "galaxy.engine_version": engineVersion, + } + if stackLabel != "" { + labels["galaxy.stack"] = stackLabel + } + return labels +} + // waitForEngineHealthz polls the engine `/healthz` endpoint until it // responds 2xx or until the timeout elapses. The Docker daemon // reports a container as `running` as soon as the entrypoint starts, diff --git a/backend/internal/runtime/service_internal_test.go b/backend/internal/runtime/service_internal_test.go new file mode 100644 index 0000000..12a5420 --- /dev/null +++ b/backend/internal/runtime/service_internal_test.go @@ -0,0 +1,51 @@ +package runtime + +import "testing" + +func TestEngineLabels(t *testing.T) { + t.Parallel() + + cases := []struct { + name string + gameID string + version string + stackLabel string + want map[string]string + }{ + { + name: "stack label omitted when empty", + gameID: "11111111-1111-1111-1111-111111111111", + version: "0.1.0", + stackLabel: "", + want: map[string]string{ + "galaxy.game_id": "11111111-1111-1111-1111-111111111111", + "galaxy.engine_version": "0.1.0", + }, + }, + { + name: "stack label included when set", + gameID: "22222222-2222-2222-2222-222222222222", + version: "0.2.3", + stackLabel: "dev-deploy", + want: map[string]string{ + "galaxy.game_id": "22222222-2222-2222-2222-222222222222", + "galaxy.engine_version": "0.2.3", + "galaxy.stack": "dev-deploy", + }, + }, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + t.Parallel() + got := engineLabels(tc.gameID, tc.version, 
tc.stackLabel) + if len(got) != len(tc.want) { + t.Fatalf("len(labels) = %d, want %d (got %v)", len(got), len(tc.want), got) + } + for k, v := range tc.want { + if got[k] != v { + t.Errorf("labels[%q] = %q, want %q", k, got[k], v) + } + } + }) + } +} diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 53665ae..a2dec3e 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -808,10 +808,17 @@ Workflows under `.gitea/workflows/`: | `go-unit.yaml` | push + PR matching Go paths | Fast Go unit tests. | | `ui-test.yaml` | push + PR matching `ui/**` | Vitest + Playwright. | | `integration.yaml` | PR to `development` / `main`; push to `development` | testcontainers integration suite. | -| `dev-deploy.yaml` | push to `development` | Build images, seed UI volume, `compose up` against `tools/dev-deploy/`. | +| `dev-deploy.yaml` | push to `development`; `workflow_dispatch` on any ref | Build images, seed UI volume, `compose up` against `tools/dev-deploy/`. | | `prod-build.yaml` | push to `main` | Build production images and persist `docker save` bundles as artifacts. | | `deploy-prod.yaml` | manual `workflow_dispatch` | Placeholder for the future SSH-based production rollout. | +Deployment cadence: the dev environment is single-tenant. Pushes to +`feature/*` branches run only the test workflows; `dev-deploy.yaml` +does not auto-fire. To preview a feature branch on the shared dev +environment, trigger `dev-deploy.yaml` manually from the Gitea UI +against the desired ref. The deploy is idempotent — the next merge +into `development` overwrites the manually deployed state. + Environments: - **`tools/local-dev/`** — single-developer playground. Bound to @@ -823,9 +830,28 @@ Environments: and are shipped to the production host via `docker save` → `ssh prod docker load` → `docker compose up -d`. -`tools/local-ci/` remains as an opt-in fallback runner for testing -workflow changes without `gitea.lan`. 
It is no longer part of the -per-stage CI gate; see `CLAUDE.md` for the gate definition. +### Container labels + +Every Docker resource Galaxy creates carries an opinionated label so +that host-side tooling (Makefiles, CI workflows, `preclean.sh`) can +scope its operations to Galaxy-owned objects and never touch unrelated +workloads on the shared daemon. + +| Label | Values | Set by | Used by | +|-------|--------|--------|---------| +| `galaxy.stack` | `local-dev`, `dev-deploy`, `integration` | `tools/{local-dev,dev-deploy}/docker-compose.yml` for compose-managed resources; backend reads `BACKEND_STACK_LABEL` and stamps engines it spawns. | `tools/{local-dev,dev-deploy}/Makefile`, `.gitea/workflows/dev-deploy.yaml`. | +| `galaxy.backend` | `1` | `backend/internal/dockerclient` adapter on every engine container. | `integration/scripts/preclean.sh`. | +| `galaxy.game_id` | `` | Backend on engine create. | Reconciler reattach loop. | +| `galaxy.engine_version` | `` | Backend on engine create. | Reconciler version checks. | +| `galaxy.test.kind` | `integration-image` | `integration/testenv/images.go` on local image builds. | `integration/scripts/preclean.sh` (filter for `docker rmi`). | +| `org.testcontainers` | `true` | `testcontainers-go` (automatic). | `integration/scripts/preclean.sh`. | + +The contract: any Makefile target, CI step, or script that issues +`docker rm` / `docker rmi` / `docker network rm` MUST scope itself via +one of the labels above. Compose-managed resources are additionally +scoped by their compose project name (`galaxy-dev`, `galaxy-local-dev`), +which Compose enforces on `docker compose up/down`; the labels make the +contract explicit and survive hand-rolled cleanup commands as well. ## 19. Deployment Topology (informational) diff --git a/tools/dev-deploy/Makefile b/tools/dev-deploy/Makefile index 4e154b8..e9f1260 100644 --- a/tools/dev-deploy/Makefile +++ b/tools/dev-deploy/Makefile @@ -4,6 +4,7 @@ REPO_ROOT := $(realpath $(CURDIR)/../..) 
ENGINE_IMAGE := galaxy-engine:dev +STACK_LABEL := galaxy.stack=dev-deploy ENGINE_LABEL := org.opencontainers.image.title=galaxy-game-engine # Game-state root lives under the invoking user's home by default so # `make up` works without sudo. Override `GALAXY_DEV_GAME_STATE_DIR` @@ -93,12 +94,14 @@ psql: clean-data: @echo "Stopping containers and engines, then wiping volumes + game-state…" - @ids=$$(docker ps -aq --filter label=$(ENGINE_LABEL)); \ + $(COMPOSE) down -v + @ids=$$(docker ps -aq \ + --filter "label=$(STACK_LABEL)" \ + --filter "label=$(ENGINE_LABEL)"); \ if [ -n "$$ids" ]; then \ - echo "stopping engine containers…"; \ + echo "stopping engine containers for $(STACK_LABEL)…"; \ docker rm -f $$ids >/dev/null; \ fi - $(COMPOSE) down -v @if [ -d "$(GALAXY_DEV_GAME_STATE_DIR)" ]; then \ echo "wiping $(GALAXY_DEV_GAME_STATE_DIR)…"; \ docker run --rm -v "$(GALAXY_DEV_GAME_STATE_DIR):/state" alpine sh -c 'rm -rf /state/*' 2>/dev/null \ diff --git a/tools/dev-deploy/README.md b/tools/dev-deploy/README.md index 5d3f68c..fb308a7 100644 --- a/tools/dev-deploy/README.md +++ b/tools/dev-deploy/README.md @@ -135,17 +135,20 @@ exec galaxy-mailpit wget -qO- localhost:8025/messages` and similar. ## Persistent state and schema changes The dev Postgres volume `galaxy-dev-postgres-data` survives redeploys. -Until the pre-production migration rule is lifted, every -backward-incompatible change to `backend/internal/postgres/migrations/00001_init.sql` -needs a manual wipe before the next deploy succeeds: +Schema deltas land as additive, sequence-numbered migration files +(`backend/internal/postgres/migrations/0000N_*.sql`) and `pressly/goose` +applies them on backend startup without operator action. 
+ +Use `make -C tools/dev-deploy clean-data` only when you deliberately +want a fresh database (debugging schema drift, exercising the +bootstrap path from scratch, etc.): ```sh make -C tools/dev-deploy clean-data make -C tools/dev-deploy up ``` -This is the same caveat as `tools/local-dev/`, just with a different -volume name. +The same volume-persistence model applies to `tools/local-dev/`. ## Make targets @@ -183,13 +186,30 @@ See [`KNOWN-ISSUES.md`](KNOWN-ISSUES.md) for symptoms that surface in the long-lived dev environment but are not yet fixed (currently: the sandbox game flipping to `cancelled` after a redispatch). +## Deployment cadence + +This environment is single-tenant: one live deployment, redeployed by +the `dev-deploy.yaml` workflow on every merge into `development`. PR +branches do not auto-deploy here — pushes to `feature/*` only run the +test workflows (`go-unit`, `ui-test`, `integration`). + +To put a feature branch on the shared dev environment before its PR +merges (e.g. to validate a UI flow against the real Caddy edge), run +the workflow manually: + +1. Push the branch (`git push gitea HEAD`). +2. Gitea UI → **Actions → Deploy · Dev → Run workflow**, pick the + feature ref. + +The deploy is idempotent — when the PR later merges into +`development`, the regular push trigger fires the same packaging and +healthcheck steps, overwriting whatever the manual dispatch left +behind. There is no separate state to clean up between the two paths. + ## Relationship to other infrastructure - `tools/local-dev/` — single-developer playground, host-port mapped, Vite dev server on the side. Recommended for active UI work. -- `tools/local-ci/` — Gitea + act runner for **fallback** workflow - testing without `gitea.lan`. Optional, not part of the per-stage CI - gate anymore. - `.gitea/workflows/dev-deploy.yaml` — the CI side of this stack: builds images, seeds the UI volume, runs `docker compose up -d` on every merge into `development`. 
The Makefile in this directory is diff --git a/tools/dev-deploy/docker-compose.yml b/tools/dev-deploy/docker-compose.yml index 0962b3e..dbc0cc1 100644 --- a/tools/dev-deploy/docker-compose.yml +++ b/tools/dev-deploy/docker-compose.yml @@ -22,6 +22,8 @@ services: image: postgres:16-alpine container_name: galaxy-dev-postgres restart: unless-stopped + labels: + galaxy.stack: dev-deploy environment: POSTGRES_USER: galaxy POSTGRES_PASSWORD: galaxy @@ -41,6 +43,8 @@ services: image: redis:7-alpine container_name: galaxy-dev-redis restart: unless-stopped + labels: + galaxy.stack: dev-deploy command: - redis-server - --requirepass @@ -62,6 +66,8 @@ services: image: axllent/mailpit:v1.21 container_name: galaxy-dev-mailpit restart: unless-stopped + labels: + galaxy.stack: dev-deploy networks: - galaxy-internal healthcheck: @@ -78,6 +84,8 @@ services: image: galaxy/backend:dev container_name: galaxy-dev-backend restart: unless-stopped + labels: + galaxy.stack: dev-deploy user: "0:0" depends_on: galaxy-postgres: @@ -94,6 +102,7 @@ services: BACKEND_SMTP_FROM: "galaxy-backend@galaxy.lan" BACKEND_SMTP_TLS_MODE: none BACKEND_DOCKER_NETWORK: galaxy-dev-internal + BACKEND_STACK_LABEL: dev-deploy BACKEND_GAME_STATE_ROOT: ${GALAXY_DEV_GAME_STATE_DIR} BACKEND_GEOIP_DB_PATH: /var/lib/galaxy/geoip.mmdb BACKEND_NOTIFICATION_ADMIN_EMAIL: admin@galaxy.lan @@ -152,6 +161,8 @@ services: image: galaxy/gateway:dev container_name: galaxy-dev-api restart: unless-stopped + labels: + galaxy.stack: dev-deploy depends_on: galaxy-backend: condition: service_healthy @@ -209,6 +220,8 @@ services: image: caddy:2.11.2-alpine container_name: galaxy-dev-caddy restart: unless-stopped + labels: + galaxy.stack: dev-deploy depends_on: galaxy-api: condition: service_healthy @@ -225,6 +238,8 @@ networks: name: galaxy-dev-internal driver: bridge internal: false + labels: + galaxy.stack: dev-deploy edge: name: ${GALAXY_EDGE_NETWORK:-edge} external: true @@ -232,7 +247,13 @@ networks: volumes: 
galaxy-dev-postgres-data: name: galaxy-dev-postgres-data + labels: + galaxy.stack: dev-deploy galaxy-dev-caddy-data: name: galaxy-dev-caddy-data + labels: + galaxy.stack: dev-deploy galaxy-dev-ui-dist: name: galaxy-dev-ui-dist + labels: + galaxy.stack: dev-deploy diff --git a/tools/local-ci/.gitignore b/tools/local-ci/.gitignore deleted file mode 100644 index 4c49bd7..0000000 --- a/tools/local-ci/.gitignore +++ /dev/null @@ -1 +0,0 @@ -.env diff --git a/tools/local-ci/Makefile b/tools/local-ci/Makefile deleted file mode 100644 index 82be826..0000000 --- a/tools/local-ci/Makefile +++ /dev/null @@ -1,42 +0,0 @@ -.PHONY: help up down logs status clean push - -.DEFAULT_GOAL := help - -COMPOSE := docker compose -GITEA_USER := galaxy -GITEA_PASS := galaxy-dev -REPO_NAME := galaxy -REMOTE_NAME := local-gitea -REPO_ROOT := $(realpath $(CURDIR)/../..) -GIT := git -C $(REPO_ROOT) -REMOTE_URL := http://$(GITEA_USER):$(GITEA_PASS)@localhost:3000/$(GITEA_USER)/$(REPO_NAME).git - -help: - @echo "Local Gitea CI for galaxy:" - @echo " make up Bring up Gitea + runner (idempotent)" - @echo " make down Stop both containers" - @echo " make logs Tail logs" - @echo " make status Show container status" - @echo " make push Push current branch to local Gitea" - @echo " make clean Stop and wipe all local state" - -up: - @./bootstrap.sh - -down: - $(COMPOSE) down - -logs: - $(COMPOSE) logs -f --tail=50 - -status: - $(COMPOSE) ps - -push: - @$(GIT) remote get-url $(REMOTE_NAME) >/dev/null 2>&1 || \ - $(GIT) remote add $(REMOTE_NAME) $(REMOTE_URL) - $(GIT) push $(REMOTE_NAME) HEAD - -clean: - $(COMPOSE) down -v - rm -f .env diff --git a/tools/local-ci/README.md b/tools/local-ci/README.md deleted file mode 100644 index 1115f01..0000000 --- a/tools/local-ci/README.md +++ /dev/null @@ -1,106 +0,0 @@ -# Local Gitea CI (fallback) - -> **Status:** fallback / opt-in. The primary CI target is now -> `gitea.lan` with its host-mode `act_runner`. 
The per-stage CI gate -> closes against `gitea.lan`, not against this stack. Use this -> directory when you want to validate `.gitea/workflows/*` without -> reaching `gitea.lan` — for example, when iterating on a workflow -> file from a flight without LAN access — or when isolating a runner -> issue from production-shaped infrastructure. - -Self-contained Gitea + Actions runner for verifying -`.gitea/workflows/*` honestly before pushing to `gitea.lan`. Runs -natively on arm64 (Apple Silicon) — every image below has an arm64 -variant, so Docker pulls the right architecture and the runner -executes workflow steps without QEMU emulation. - -## Prerequisites - -- Docker (Colima or Docker Desktop) -- `python3`, `curl`, `bash` — all built into macOS - -## First time - -```sh -make -C tools/local-ci up -``` - -This: - -1. brings up the Gitea container; -2. creates an admin user (`galaxy` / `galaxy-dev`); -3. creates the `galaxy/galaxy` repo; -4. fetches a runner registration token from the Gitea API; -5. brings up the runner with that token (the runner persists its - credentials in a Docker volume and ignores the token on subsequent - restarts). - -The script is idempotent — re-running it is safe. - -## Pushing a branch - -```sh -make -C tools/local-ci push -``` - -This adds a `local-gitea` remote on the first run and then pushes the -current `HEAD`. Equivalent manual flow: - -```sh -git remote add local-gitea \ - http://galaxy:galaxy-dev@localhost:3000/galaxy/galaxy.git -git push local-gitea HEAD -``` - -The Tier 1 workflow fires on `push` to any branch and the Tier 2 -workflow fires on tags matching `v*`. 
Watch runs at: - - - -## Operational targets - -| Target | What it does | -| ---------------- | -------------------------------------------- | -| `make up` | Bring up Gitea + runner (idempotent) | -| `make down` | Stop both containers (state preserved) | -| `make logs` | Tail logs from both containers | -| `make status` | Show container status | -| `make push` | Push current `HEAD` to local Gitea | -| `make clean` | Stop and wipe all local state (full reset) | - -## What's in the box - -| Component | Image | Role | -| ---------- | ---------------------------------- | ------------------------------------------- | -| Gitea | `gitea/gitea:1.23` | Server with SQLite backend | -| act_runner | `gitea/act_runner:0.6.1` | Single-capacity runner registered on boot | -| Workflow | `catthehacker/ubuntu:act-latest` | Image spawned per job (multi-arch) | - -The runner mounts the host Docker socket and spawns workflow -containers on the same Docker network as Gitea, so -`actions/checkout` reaches the server at `http://gitea:3000` from -inside spawned containers. - -## Caveats - -- Gitea's `ROOT_URL` is set to `http://gitea:3000/` so spawned - workflow containers reach the server through the compose network. - The web UI works at `http://localhost:3000` via port mapping, but - copy-paste URLs in the UI may show `gitea:3000` instead of - `localhost:3000`. Harmless for local dev; switch the host part by - hand when copying. -- The runner is single-capacity (`runner.capacity: 1` in - `config.yaml`). Concurrent jobs queue. Bump if you need parallel - jobs. -- First push from a fresh checkout uploads the full repo history - (~tens of MB). Subsequent pushes are deltas. -- `actions/upload-artifact@v4` requires Gitea ≥ 1.21 — we pin - `1.23` to stay above the cutoff. -- Workflow steps run as `root` inside the spawned container; this - matches the upstream catthehacker behaviour. Keep that in mind if - you add steps that touch host-mounted directories. 
-- On Apple Silicon the runner image and its catthehacker child run - natively as arm64. Some pre-built tools that ship in the image are - amd64-only and would fall back to QEMU; `setup-go`, `setup-node`, - and `pnpm/action-setup` all download arm64 binaries themselves, so - the workflow steps we care about stay native. diff --git a/tools/local-ci/bootstrap.sh b/tools/local-ci/bootstrap.sh deleted file mode 100755 index 7e81dc1..0000000 --- a/tools/local-ci/bootstrap.sh +++ /dev/null @@ -1,86 +0,0 @@ -#!/usr/bin/env bash -# Bring up Gitea, create the admin user and the galaxy/galaxy repo, -# fetch a runner registration token, bring up the runner. -# Idempotent — re-runnable. -set -euo pipefail - -cd "$(dirname "$0")" - -GITEA_USER=galaxy -GITEA_PASS=galaxy-dev -GITEA_EMAIL=galaxy@local -REPO_NAME=galaxy -GITEA_URL=http://localhost:3000 - -echo ">>> Bringing up Gitea..." -docker compose up -d gitea - -echo ">>> Waiting for Gitea API..." -for _ in $(seq 1 120); do - if curl -fsS "${GITEA_URL}/api/v1/version" >/dev/null 2>&1; then - echo "Gitea is up." - break - fi - sleep 1 -done - -if ! curl -fsS "${GITEA_URL}/api/v1/version" >/dev/null 2>&1; then - echo "Gitea did not come up within 120 seconds." >&2 - docker compose logs gitea | tail -30 >&2 - exit 1 -fi - -echo ">>> Creating admin user (idempotent)..." -docker compose exec -T gitea su git -c " - gitea admin user create \ - --username ${GITEA_USER} \ - --password ${GITEA_PASS} \ - --email ${GITEA_EMAIL} \ - --admin \ - --must-change-password=false 2>&1 || true -" - -echo ">>> Creating repo (idempotent)..." -HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' \ - -u "${GITEA_USER}:${GITEA_PASS}" \ - -H "Content-Type: application/json" \ - -d "{\"name\":\"${REPO_NAME}\",\"private\":true,\"auto_init\":false}" \ - "${GITEA_URL}/api/v1/user/repos") -case "${HTTP_CODE}" in - 201) echo "Repo created." ;; - 409) echo "Repo already exists." ;; - *) - echo "Unexpected response (${HTTP_CODE}) creating repo." 
>&2 - exit 1 - ;; -esac - -echo ">>> Fetching runner registration token..." -RUNNER_TOKEN=$(curl -fsS \ - -u "${GITEA_USER}:${GITEA_PASS}" \ - "${GITEA_URL}/api/v1/admin/runners/registration-token" \ - | python3 -c "import json, sys; print(json.load(sys.stdin)['token'])") - -# act_runner uses RUNNER_TOKEN only on the first boot. After registration -# it persists credentials in the named runner-data volume (/data/.runner) -# and ignores the env token on subsequent restarts. Writing a fresh token -# every time is harmless. -echo "RUNNER_TOKEN=${RUNNER_TOKEN}" > .env - -echo ">>> Bringing up runner..." -docker compose up -d runner - -cat </dev/null || true - git push local-gitea HEAD - - open http://localhost:3000/${GITEA_USER}/${REPO_NAME}/actions - -Or use \`make push\` from this directory. -EOF diff --git a/tools/local-ci/config.yaml b/tools/local-ci/config.yaml deleted file mode 100644 index 8f34468..0000000 --- a/tools/local-ci/config.yaml +++ /dev/null @@ -1,35 +0,0 @@ -# act_runner configuration. -# -# The `ubuntu-latest` label is mapped to catthehacker/ubuntu:act-latest, -# which is multi-arch — Docker on Apple Silicon pulls the arm64 variant -# and runs it natively (no QEMU). The same image is what `act` uses -# locally, so workflows behave the same. - -log: - level: info - -runner: - file: /data/.runner - capacity: 1 - fetch_timeout: 5s - fetch_interval: 2s - labels: - - "ubuntu-latest:docker://catthehacker/ubuntu:act-latest" - -cache: - enabled: true - dir: /data/cache - -container: - # Spawned workflow containers join the same network as Gitea so - # actions/checkout and other steps can reach the server at - # http://gitea:3000. 
- network: galaxy-local-gitea-net - privileged: false - options: "" - workdir_parent: "" - valid_volumes: [] - force_pull: false - -host: - workdir_parent: "" diff --git a/tools/local-ci/docker-compose.override.yml b/tools/local-ci/docker-compose.override.yml deleted file mode 100644 index f1555ff..0000000 --- a/tools/local-ci/docker-compose.override.yml +++ /dev/null @@ -1,16 +0,0 @@ -# Local-only override: this developer's host already runs another -# Gitea instance bound to 0.0.0.0:3000 and 0.0.0.0:2222, so the -# default port mappings in docker-compose.yml conflict. Remap the -# local-ci Gitea to 13000 (HTTP) and 12222 (SSH) on the host. The -# in-network ports stay 3000 / 22 — runners and workflow containers -# keep reaching Gitea by hostname through the compose network. -# -# This file is intentionally NOT committed to the repo; it captures -# per-host port allocation. Use `make -C tools/local-ci push` only -# after pointing the `local-gitea` git remote at the override port. - -services: - gitea: - ports: !override - - "13000:3000" - - "12222:22" diff --git a/tools/local-ci/docker-compose.yml b/tools/local-ci/docker-compose.yml deleted file mode 100644 index 2586dcb..0000000 --- a/tools/local-ci/docker-compose.yml +++ /dev/null @@ -1,78 +0,0 @@ -# Local Gitea + Actions runner for verifying .gitea/workflows/*. -# Runs natively on arm64 (Apple Silicon) — every image below is multi-arch. -# -# Browser: http://localhost:3000 -# API: http://localhost:3000/api/v1 -# Push URL: http://galaxy:galaxy-dev@localhost:3000/galaxy/galaxy.git -# Actions: http://localhost:3000/galaxy/galaxy/actions -# -# `bootstrap.sh` (or `make up`) brings everything up and registers the -# runner. State persists in named Docker volumes; `make clean` wipes them. 
- -services: - gitea: - image: gitea/gitea:1.23 - container_name: galaxy-local-gitea - restart: unless-stopped - environment: - USER_UID: "1000" - USER_GID: "1000" - GITEA__database__DB_TYPE: sqlite3 - GITEA__database__PATH: /data/gitea/gitea.db - # ROOT_URL uses the in-network hostname so the runner and spawned - # workflow containers reach Gitea through the compose network. - # The browser still works at http://localhost:3000 via the port - # mapping below; UI-generated copy URLs may show "gitea:3000", - # which is harmless for local dev. - GITEA__server__ROOT_URL: http://gitea:3000/ - GITEA__server__SSH_PORT: "2222" - GITEA__actions__ENABLED: "true" - GITEA__security__INSTALL_LOCK: "true" - GITEA__service__DISABLE_REGISTRATION: "true" - ports: - - "3000:3000" - - "2222:22" - volumes: - - gitea-data:/data - networks: - - gitea-net - healthcheck: - test: - - CMD-SHELL - - wget -q -O- http://localhost:3000/api/v1/version >/dev/null || exit 1 - interval: 5s - timeout: 3s - retries: 30 - start_period: 5s - - runner: - image: gitea/act_runner:0.6.1 - container_name: galaxy-local-runner - restart: unless-stopped - depends_on: - gitea: - condition: service_healthy - environment: - CONFIG_FILE: /config/config.yaml - GITEA_INSTANCE_URL: http://gitea:3000 - # Provided by bootstrap.sh in the .env file. After the first - # successful registration, act_runner persists credentials in - # /data/.runner and ignores this token on subsequent restarts. 
- GITEA_RUNNER_REGISTRATION_TOKEN: ${RUNNER_TOKEN:-} - GITEA_RUNNER_NAME: galaxy-local - volumes: - - /var/run/docker.sock:/var/run/docker.sock - - runner-data:/data - - ./config.yaml:/config/config.yaml:ro - networks: - - gitea-net - -networks: - gitea-net: - name: galaxy-local-gitea-net - -volumes: - gitea-data: - name: galaxy-local-gitea-data - runner-data: - name: galaxy-local-runner-data diff --git a/tools/local-dev/Makefile b/tools/local-dev/Makefile index d4444c8..4981f23 100644 --- a/tools/local-dev/Makefile +++ b/tools/local-dev/Makefile @@ -5,9 +5,16 @@ COMPOSE := docker compose REPO_ROOT := $(realpath $(CURDIR)/../..) ENGINE_IMAGE := galaxy-engine:local-dev -# Label set by the engine `Dockerfile` runtime stage; used to find -# engine containers spawned by backend's runtime that fall outside -# `docker compose down`'s scope. +# Engine containers spawned by backend's runtime fall outside the +# compose project. We identify them by two labels: +# STACK_LABEL — backend stamps this on every engine it spawns from +# this stack (see BACKEND_STACK_LABEL env in the +# compose file); +# ENGINE_LABEL — image-level OCI title baked into the engine +# Dockerfile. +# Both filters together select exactly this stack's engine containers +# and never compose-managed services or unrelated workloads. +STACK_LABEL := galaxy.stack=local-dev ENGINE_LABEL := org.opencontainers.image.title=galaxy-game-engine help: @@ -65,9 +72,11 @@ clean: stop-engines # cascade the game to `cancelled`. We only remove them as part of # `clean`, where the whole DB is wiped anyway. stop-engines: - @ids=$$(docker ps -aq --filter label=$(ENGINE_LABEL)); \ + @ids=$$(docker ps -aq \ + --filter "label=$(STACK_LABEL)" \ + --filter "label=$(ENGINE_LABEL)"); \ if [ -n "$$ids" ]; then \ - echo "stopping engine containers…"; \ + echo "stopping engine containers for $(STACK_LABEL)…"; \ docker rm -f $$ids >/dev/null; \ fi @@ -87,7 +96,9 @@ stop-engines: # cycles. 
prune-broken-engines: @ids=""; \ - for cid in $$(docker ps -aq --filter label=$(ENGINE_LABEL) 2>/dev/null); do \ + for cid in $$(docker ps -aq \ + --filter "label=$(STACK_LABEL)" \ + --filter "label=$(ENGINE_LABEL)" 2>/dev/null); do \ state=$$(docker inspect -f '{{.State.Status}}' $$cid 2>/dev/null); \ case "$$state" in \ running|restarting) ;; \ diff --git a/tools/local-dev/README.md b/tools/local-dev/README.md index d15a405..428172b 100644 --- a/tools/local-dev/README.md +++ b/tools/local-dev/README.md @@ -15,10 +15,10 @@ This stack is **not** a CI gate (the per-stage CI gate now lives on the **long-lived dev environment** at [`tools/dev-deploy/`](../dev-deploy/README.md), which is redeployed on every merge into `development` and is reachable as -`https://www.galaxy.lan` / `https://api.galaxy.lan`. The three stacks -(`tools/local-dev/`, `tools/dev-deploy/`, and the fallback -`tools/local-ci/`) coexist on the same host because every name — -compose project, container, network, volume — is distinct. +`https://www.galaxy.lan` / `https://api.galaxy.lan`. The two stacks +(`tools/local-dev/` and `tools/dev-deploy/`) coexist on the same host +because every name — compose project, container, network, volume — is +distinct. ## Bring it up @@ -203,8 +203,8 @@ make status docker compose ps images built on alpine (so `wget` is available for the compose healthchecks). The build stage mirrors `backend/Dockerfile` and `gateway/Dockerfile` exactly. -- `Makefile` — wrapper over `docker compose` that keeps the muscle - memory close to `tools/local-ci/`'s Makefile. +- `Makefile` — wrapper over `docker compose` with thin targets for the + most common dev cycles. - `.env` — committed defaults for the compose `${VAR:-}` expansions. Edit per-developer or override via your shell. 
- `keys/gateway-response.pem`, `keys/gateway-response.pub` — dev-only @@ -290,12 +290,13 @@ make status docker compose ps ## Relationship to other infrastructure -- `tools/local-ci/` — Gitea + Actions runner, replays - `.gitea/workflows/*` against a pushed branch. Different stack, - different purpose; coexists with local-dev on the same machine. +- `tools/dev-deploy/` — long-lived dev environment redeployed on every + merge into `development`; reachable at `https://www.galaxy.lan` / + `https://api.galaxy.lan`. Distinct compose project, container names, + network and volumes. - `integration/testenv/` — testcontainers harness used by - `make -C integration integration`. Uses the same images - (`backend/Dockerfile`, `gateway/Dockerfile`) at production - defaults; do not confuse with this local-dev stack, which carries + `make -C integration integration`. Uses the canonical + `backend/Dockerfile` / `gateway/Dockerfile` at production defaults; + do not confuse with this local-dev stack, which carries alpine-runtime images for ergonomics and the dev-mode auth override. diff --git a/tools/local-dev/docker-compose.yml b/tools/local-dev/docker-compose.yml index ac6d724..9e52c39 100644 --- a/tools/local-dev/docker-compose.yml +++ b/tools/local-dev/docker-compose.yml @@ -19,11 +19,15 @@ # can log in without touching Mailpit. Real codes still arrive in # Mailpit; both paths coexist. 
+name: galaxy-local-dev + services: postgres: image: postgres:16-alpine container_name: galaxy-local-dev-postgres restart: unless-stopped + labels: + galaxy.stack: local-dev environment: POSTGRES_USER: galaxy POSTGRES_PASSWORD: galaxy @@ -45,6 +49,8 @@ services: image: redis:7-alpine container_name: galaxy-local-dev-redis restart: unless-stopped + labels: + galaxy.stack: local-dev command: - redis-server - --requirepass @@ -68,6 +74,8 @@ services: image: axllent/mailpit:v1.21 container_name: galaxy-local-dev-mailpit restart: unless-stopped + labels: + galaxy.stack: local-dev ports: - "${LOCAL_DEV_MAILPIT_PORT:-8025}:8025" networks: @@ -86,6 +94,8 @@ services: image: galaxy/backend:local-dev container_name: galaxy-local-dev-backend restart: unless-stopped + labels: + galaxy.stack: local-dev user: "0:0" depends_on: postgres: @@ -102,6 +112,7 @@ services: BACKEND_SMTP_FROM: "galaxy-backend@galaxy.local" BACKEND_SMTP_TLS_MODE: none BACKEND_DOCKER_NETWORK: galaxy-local-dev-net + BACKEND_STACK_LABEL: local-dev BACKEND_GAME_STATE_ROOT: /tmp/galaxy-game-state BACKEND_GEOIP_DB_PATH: /var/lib/galaxy/geoip.mmdb BACKEND_NOTIFICATION_ADMIN_EMAIL: admin@galaxy.local @@ -144,6 +155,8 @@ services: image: galaxy/gateway:local-dev container_name: galaxy-local-dev-gateway restart: unless-stopped + labels: + galaxy.stack: local-dev depends_on: backend: condition: service_healthy @@ -205,7 +218,11 @@ services: networks: galaxy-net: name: galaxy-local-dev-net + labels: + galaxy.stack: local-dev volumes: postgres-data: name: galaxy-local-dev-postgres-data + labels: + galaxy.stack: local-dev diff --git a/ui/docs/testing.md b/ui/docs/testing.md index 3fe103c..396b4ac 100644 --- a/ui/docs/testing.md +++ b/ui/docs/testing.md @@ -106,8 +106,6 @@ addition to the real Mailpit code; see for the full runbook (regenerating the dev keypair, switching the mode off, troubleshooting common boot issues). 
-The local-dev stack is independent from the local-ci stack below; -they bind different ports and can run side by side. ## Synthetic reports for visual testing (DEV) @@ -159,92 +157,19 @@ record in the parser's `README.md` that the new field cannot be derived from legacy text. This keeps the synthetic-mode coverage in step with the contract as the UI grows. -## Local CI verification +## CI verification -`tools/local-ci/` ships a self-contained Gitea + Actions runner via -docker-compose so workflow changes are exercised end-to-end on a real -runner before pushing to a remote Gitea instance. On Apple Silicon -the runner and every spawned workflow container are arm64-native -(no QEMU). Full runbook lives in -[`../../tools/local-ci/README.md`](../../tools/local-ci/README.md); -the cheat sheet below covers the operations needed when working a -phase that touches CI. +Workflow changes are exercised on the primary CI host (`gitea.lan`). +Push the branch (`git push gitea …`), then open the run in the Gitea +UI to inspect the status and logs. See `CLAUDE.md` (`## Per-stage CI +gate`) for the per-stage workflow. -### Bring up / push / tear down - -```sh -make -C tools/local-ci up # idempotent: gitea + runner + admin user + repo -make -C tools/local-ci push # add `local-gitea` remote (first call) and push HEAD -make -C tools/local-ci status # docker compose ps -make -C tools/local-ci logs # tail container logs -make -C tools/local-ci down # stop, keep state -make -C tools/local-ci clean # stop and wipe volumes for a fresh start -``` - -Default credentials baked in: `galaxy:galaxy-dev` (admin user, also -the owner of the `galaxy/galaxy` repo). Web UI on -; runs at -. - -### Inspect a run from the shell - -The Gitea Actions API is on `http://localhost:3000/api/v1` with basic -auth. Useful for verifying a workflow change without opening the -browser: - -```sh -# Latest workflow runs — `status` is a human-readable string here: -# "running" / "success" / "failure" / "cancelled". 
-curl -s -u galaxy:galaxy-dev \ - 'http://localhost:3000/api/v1/repos/galaxy/galaxy/actions/tasks?limit=5' \ - | python3 -m json.tool - -# Tight one-liner for the latest run only: -curl -s -u galaxy:galaxy-dev \ - 'http://localhost:3000/api/v1/repos/galaxy/galaxy/actions/tasks?limit=1' \ - | python3 -c 'import json, sys; r=json.load(sys.stdin)["workflow_runs"][0]; print(r["run_number"], r["status"], r["display_title"])' -``` - -Step-by-step workflow output is stored zstd-compressed under -`/data/gitea/actions_log/galaxy/galaxy//.log.zst` -inside the gitea container: - -```sh -docker compose -f tools/local-ci/docker-compose.yml exec -T gitea sh -c ' - apk add --quiet zstd - zstdcat /data/gitea/actions_log/galaxy/galaxy/01/1.log.zst -' | less -``` - -`` is the run number, zero-padded to two digits -(`01`, `02`, …); `` is the 1-based index of the job -inside that run (only `1` for the current single-job workflows). - -### Typical phase workflow - -When a phase changes anything under `.gitea/workflows/` or surfaces -new tests in CI: - -1. Local sanity first — run the affected commands directly - (`pnpm test`, `pnpm exec playwright test`, the targeted - `go test ./...` slice). -2. Commit and `make -C tools/local-ci push`. -3. Poll the API for the latest run; once it leaves `running`, - inspect status. On failure pull the log via the snippet above. -4. Fix and repeat. The runner is always-on; each push triggers a - fresh run (test cache is cleared by `-count=1` so a green run is - honest). - -### Quick syntax-only dry-run with `act` - -For a sub-second check that the workflow YAML is well-formed and -action references resolve, without pulling images and without -running anything: +For a sub-second syntax check of a workflow YAML without pulling +images or running anything: ```sh act -W .gitea/workflows/ui-test.yaml -n push ``` `act` doesn't honour Gitea-specific behaviours (artifact storage, -secrets, run triggers). 
Use it for syntax checks; fall back to the -local Gitea above for honest end-to-end verification. +secrets, run triggers); use it only for syntax checks. -- 2.52.0 From daed2690c1e43e0b58d79d8afc8dd454b58f7dad Mon Sep 17 00:00:00 2001 From: Ilia Denisov Date: Tue, 19 May 2026 01:00:21 +0200 Subject: [PATCH 2/2] fix(compose): keep galaxy.stack label on containers only MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous commit stamped `galaxy.stack=` on services, volumes, and networks. Putting it on volumes/networks changes their compose config-hash on every label revision, so `docker compose up` tries to recreate them — which on the long-lived dev environment either destroys the postgres data volume or deadlocks while trying to remove `galaxy-dev-internal` with containers still bound to it. Observed live: run #184 hung in compose recreate after the three stateful services were stopped, with no recovery. Containers alone are sufficient for the cleanup contract (we filter containers, not volumes or networks). Roll back the label on volumes and networks in both compose files and capture the rule in docs/ARCHITECTURE.md so the next contributor does not reintroduce it. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/ARCHITECTURE.md | 19 ++++++++++++++----- tools/dev-deploy/docker-compose.yml | 17 +++++++++-------- tools/local-dev/docker-compose.yml | 7 +++---- 3 files changed, 26 insertions(+), 17 deletions(-) diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index a2dec3e..1947197 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -832,14 +832,14 @@ Environments: ### Container labels -Every Docker resource Galaxy creates carries an opinionated label so -that host-side tooling (Makefiles, CI workflows, `preclean.sh`) can -scope its operations to Galaxy-owned objects and never touch unrelated -workloads on the shared daemon. 
+Every Galaxy-managed Docker **container** carries an opinionated +label so that host-side tooling (Makefiles, CI workflows, +`preclean.sh`) can scope its operations to Galaxy-owned containers +and never touch unrelated workloads on the shared daemon. | Label | Values | Set by | Used by | |-------|--------|--------|---------| -| `galaxy.stack` | `local-dev`, `dev-deploy`, `integration` | `tools/{local-dev,dev-deploy}/docker-compose.yml` for compose-managed resources; backend reads `BACKEND_STACK_LABEL` and stamps engines it spawns. | `tools/{local-dev,dev-deploy}/Makefile`, `.gitea/workflows/dev-deploy.yaml`. | +| `galaxy.stack` | `local-dev`, `dev-deploy`, `integration` | `tools/{local-dev,dev-deploy}/docker-compose.yml` for compose-managed services; backend reads `BACKEND_STACK_LABEL` and stamps engines it spawns. | `tools/{local-dev,dev-deploy}/Makefile`, `.gitea/workflows/dev-deploy.yaml`. | | `galaxy.backend` | `1` | `backend/internal/dockerclient` adapter on every engine container. | `integration/scripts/preclean.sh`. | | `galaxy.game_id` | `` | Backend on engine create. | Reconciler reattach loop. | | `galaxy.engine_version` | `` | Backend on engine create. | Reconciler version checks. | @@ -853,6 +853,15 @@ scoped by their compose project name (`galaxy-dev`, `galaxy-local-dev`), which Compose enforces on `docker compose up/down`; the labels make the contract explicit and survive hand-rolled cleanup commands as well. +**Scope deliberately limited to containers.** Labels are NOT stamped +on named volumes or user-defined networks. Adding labels there would +change the compose config-hash for the volume/network on every label +revision and force `docker compose up` to recreate them — which for a +postgres data volume means destroying the database, and for a shared +network can deadlock if any container is still attached. Containers +alone are sufficient for the cleanup contract; stateful resources stay +untouched by compose between deploys. + ## 19. 
Deployment Topology (informational) - MVP runs three executables: one `gateway` instance, one `backend` diff --git a/tools/dev-deploy/docker-compose.yml b/tools/dev-deploy/docker-compose.yml index dbc0cc1..8fdfc2b 100644 --- a/tools/dev-deploy/docker-compose.yml +++ b/tools/dev-deploy/docker-compose.yml @@ -238,22 +238,23 @@ networks: name: galaxy-dev-internal driver: bridge internal: false - labels: - galaxy.stack: dev-deploy edge: name: ${GALAXY_EDGE_NETWORK:-edge} external: true +# Note: `galaxy.stack=dev-deploy` is intentionally stamped only on +# services (containers). Stamping it on networks or named volumes +# changes the compose config-hash for those resources, and on a +# subsequent `compose up` compose tries to recreate them — for the +# `galaxy-dev-postgres-data` volume that means destroying the +# database, and for `galaxy-dev-internal` it can deadlock if any +# container is still attached. Per-container labels are sufficient +# for the CI/cleanup contract; we filter containers, not volumes or +# networks. volumes: galaxy-dev-postgres-data: name: galaxy-dev-postgres-data - labels: - galaxy.stack: dev-deploy galaxy-dev-caddy-data: name: galaxy-dev-caddy-data - labels: - galaxy.stack: dev-deploy galaxy-dev-ui-dist: name: galaxy-dev-ui-dist - labels: - galaxy.stack: dev-deploy diff --git a/tools/local-dev/docker-compose.yml b/tools/local-dev/docker-compose.yml index 9e52c39..5a7db40 100644 --- a/tools/local-dev/docker-compose.yml +++ b/tools/local-dev/docker-compose.yml @@ -218,11 +218,10 @@ services: networks: galaxy-net: name: galaxy-local-dev-net - labels: - galaxy.stack: local-dev +# See note in tools/dev-deploy/docker-compose.yml — labels live only +# on services (containers), not on volumes or networks, to keep the +# compose config-hash for stateful resources stable across deploys. volumes: postgres-data: name: galaxy-local-dev-postgres-data - labels: - galaxy.stack: local-dev -- 2.52.0