galaxy-game/tools/dev-deploy/README.md
Ilia Denisov f70258849f fix(dev-deploy): seed geoip onto a named volume
`docker restart galaxy-dev-backend` failed with "not a directory"
after every dev-deploy workflow run. Root cause: the compose file
bind-mounted the geoip database via a relative path
(`../../pkg/geoip/test-data/test-data/GeoIP2-Country-Test.mmdb`).
When the Gitea runner invoked `docker compose up`, the path
resolved against the runner's ephemeral workspace under
`/home/runner/.cache/act/<hash>/hostexecutor/...`. The bind source
baked into the running container therefore pointed at that
ephemeral path; the runner deleted the workspace once the workflow
finished, and any later `docker restart` could not remount.

Replace the bind with a named volume `galaxy-dev-geoip-data`,
seeded at deploy time:

- `tools/dev-deploy/docker-compose.yml`: mount
  `galaxy-dev-geoip-data:/var/lib/galaxy:ro` instead of a relative
  bind. Declare the volume in the top-level `volumes:` block.

- `.gitea/workflows/dev-deploy.yaml`: new `Seed geoip volume` step
  (placed right after the existing UI-volume seed) copies the
  fixture from `pkg/geoip/test-data/test-data/` into the named
  volume via an ephemeral alpine container, the same pattern UI
  seeding already uses.

- `tools/dev-deploy/Makefile`: new `seed-geoip` target performs
  the same copy from the persistent checkout. `up` and `rebuild`
  now depend on it, so a hand-run `make -C tools/dev-deploy up`
  populates the volume without operator action.

- `tools/dev-deploy/README.md`: updated the make-targets table to
  list `seed-geoip`.

- `tools/dev-deploy/KNOWN-ISSUES.md`: the entry for the restart
  failure is downgraded to a "fixed" postmortem; the symptom,
  cause, and where the fix lives are kept for future reference.

Verification on the dev host (this branch checked out):

  $ make -C tools/dev-deploy up                # populates the volume, brings stack healthy
  $ docker restart galaxy-dev-backend          # used to error "not a directory"
  $ until [ "$(docker inspect -f '{{.State.Health.Status}}' galaxy-dev-backend)" = "healthy" ]; do sleep 2; done
  $ echo "ok"                                   # backend up 6s, healthy

The pre-existing sandbox engine `galaxy-game-80f3ce86-...` survived
both `make up` and `docker restart` untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 01:59:38 +02:00


# `tools/dev-deploy/` — long-lived Galaxy dev environment
A docker-compose stack that runs the Galaxy backend, gateway, supporting
services, and a small Caddy in front of them, reachable through the host
Caddy at `https://www.galaxy.lan` and `https://api.galaxy.lan`. Used by
the `dev-deploy.yaml` Gitea Actions workflow as the canonical dev target
on every merge into the `development` branch, and runnable by hand
through this Makefile for local debugging of the deploy plumbing
itself.
This stack is **not** the developer's primary playground for UI work —
that role still belongs to [`tools/local-dev/`](../local-dev/README.md),
which is faster (Vite HMR, host-side dev server) and isolated to one
developer. The two stacks coexist on the same host because every name
is distinct:
|                  | `tools/local-dev/`        | `tools/dev-deploy/`              |
|------------------|---------------------------|----------------------------------|
| Compose project  | `local-dev`               | `galaxy-dev`                     |
| Container prefix | `galaxy-local-dev-*`      | `galaxy-dev-*`                   |
| Network          | `galaxy-local-dev-net`    | `galaxy-dev-internal`, `edge`    |
| Volumes          | `galaxy-local-dev-*`      | `galaxy-dev-*`                   |
| Host ports       | 5433/6380/8025/8080/9090  | none (only `edge` network)       |
| Game state       | `/tmp/galaxy-game-state`  | `/var/lib/galaxy-dev/game-state` |
| Engine image     | `galaxy-engine:local-dev` | `galaxy-engine:dev`              |
## Prerequisites
The host must already provide:
- Docker daemon reachable as the user running `make` (member of the
`docker` group, no sudo).
- An external bridge network named `edge` (or whatever
`GALAXY_EDGE_NETWORK` overrides to):
```sh
docker network create edge
```
- A host Caddy listening on `:80`/`:443`, attached to the `edge`
network, and proxying `www.galaxy.lan` and `api.galaxy.lan` to
`galaxy-caddy:80`. Example fragment for the host Caddyfile:
```caddy
www.galaxy.lan, api.galaxy.lan {
tls internal
reverse_proxy galaxy-caddy:80
}
```
- Game-state directory writable by the user running `make`. Default
is `${HOME}/.galaxy-dev/game-state`; `make up` creates it on demand.
Override by exporting `GALAXY_DEV_GAME_STATE_DIR` (e.g. to
`/var/lib/galaxy-dev/game-state` once the host is provisioned for
it).
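
The prerequisites above can be checked before the first `make up`. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it only covers the `edge` network and the game-state directory (the host Caddy is not probed).

```sh
# Preflight sketch for the prerequisites listed above.
# check_edge_network and check_game_state_dir are hypothetical helper names.

# Succeeds when the external bridge network exists.
check_edge_network() {
    net="${GALAXY_EDGE_NETWORK:-edge}"
    if docker network inspect "$net" >/dev/null 2>&1; then
        echo "ok: network $net exists"
    else
        echo "missing: network $net (create it with: docker network create $net)"
        return 1
    fi
}

# Succeeds when the game-state directory is absent (make up creates it)
# or is an existing directory writable by the current user.
check_game_state_dir() {
    dir="${GALAXY_DEV_GAME_STATE_DIR:-$HOME/.galaxy-dev/game-state}"
    if [ ! -e "$dir" ]; then
        echo "note: $dir does not exist yet"
    elif [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir is writable"
    else
        echo "problem: $dir is not a writable directory"
        return 1
    fi
}
```

Run `check_edge_network && check_game_state_dir` from the shell that will run `make`, so group membership and environment overrides match.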
## Bring it up
```sh
make -C tools/dev-deploy up
```
`up` (re)builds the backend and gateway images (from the `../local-dev/`
Dockerfiles), makes sure the engine image `galaxy-engine:dev` exists,
seeds the geoip volume, brings the stack up, and waits for healthchecks.
It does **not** seed the UI volume; CI normally does that. The first
time you run by hand:
```sh
make -C tools/dev-deploy seed-ui
make -C tools/dev-deploy up
make -C tools/dev-deploy health
```
`seed-ui` runs `pnpm build` in `ui/frontend/`, then copies the resulting
`build/` tree into the `galaxy-dev-ui-dist` volume. Subsequent CI deploys
overwrite this volume automatically.
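
Copying a directory into a named volume is typically done through a short-lived container that mounts both. The sketch below illustrates that pattern under assumptions: `seed_volume` is a hypothetical name, the `alpine:3` image and the `/src` and `/dst` mount points are choices made for this example, and the real `seed-ui` and `seed-geoip` targets may differ in detail.

```sh
# seed_volume SRC_DIR VOLUME
# Replace the contents of a named docker volume with the contents of SRC_DIR.
# Hypothetical helper; the Makefile targets may be implemented differently.
seed_volume() {
    src="$1"
    volume="$2"
    if [ ! -d "$src" ]; then
        echo "seed_volume: $src is not a directory" >&2
        return 1
    fi
    # docker run -v wants an absolute host path for a bind mount;
    # a bare relative name would be taken as a volume name.
    src_abs="$(cd "$src" && pwd)"
    # Mount the source read-only, wipe the volume, then copy everything in.
    docker run --rm \
        -v "$src_abs:/src:ro" \
        -v "$volume:/dst" \
        alpine:3 \
        sh -c 'find /dst -mindepth 1 -delete && cp -a /src/. /dst/'
}
```

For example, `seed_volume ui/frontend/build galaxy-dev-ui-dist` after `pnpm build`, or `seed_volume pkg/geoip/test-data/test-data galaxy-dev-geoip-data` for the geoip fixture. Because the data ends up inside the volume, the container no longer depends on the source directory still existing at restart time.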
## Daily flow
```sh
make -C tools/dev-deploy rebuild # rebuild backend/gateway images + up
make -C tools/dev-deploy logs # tail compose logs
make -C tools/dev-deploy health # probe https://*.galaxy.lan
make -C tools/dev-deploy down # stop, keep state
```
State persists in named volumes between `up`/`down` cycles. The
`development` branch keeps the dev environment continuously usable —
games created last week survive into this week unless somebody
calls `make clean-data`.
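
Scripts that chain onto `make up` or a `docker restart` sometimes need to block until a container reports healthy. A minimal sketch with a timeout follows; `wait_healthy` is a hypothetical helper, and it assumes the container defines a healthcheck.

```sh
# wait_healthy CONTAINER [TIMEOUT_SECONDS]
# Poll the container's health status every 2 seconds until it is "healthy"
# or the timeout (default 60s) runs out. Hypothetical helper for illustration.
wait_healthy() {
    name="$1"
    timeout="${2:-60}"
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # An inspect failure (e.g. container not created yet) counts as "not healthy".
        status="$(docker inspect -f '{{.State.Health.Status}}' "$name" 2>/dev/null)" || status=""
        if [ "$status" = "healthy" ]; then
            return 0
        fi
        sleep 2
        waited=$((waited + 2))
    done
    echo "wait_healthy: $name not healthy after ${timeout}s" >&2
    return 1
}
```

For example: `docker restart galaxy-dev-backend && wait_healthy galaxy-dev-backend 120`.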
## Logging in
The same dev-mode email-code override as `tools/local-dev/` applies,
and the dev-deploy compose ships with it enabled by default:
1. Enter `dev@galaxy.lan` (or whatever `BACKEND_DEV_SANDBOX_EMAIL`
resolves to) in the login form.
2. Submit `123456` as the code. The docker-compose default for
`BACKEND_AUTH_DEV_FIXED_CODE` is `123456`; the real emailed code (stored
bcrypt-hashed) remains valid as a fallback. To force real Mailpit codes
(e.g. for mail-flow QA), set `BACKEND_AUTH_DEV_FIXED_CODE=` (empty) in a
local `.env` and run `make rebuild`.
The fixed-code override is rejected by production env loaders, so it
cannot leak into the prod environment.
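
The `.env` override can be scripted so that repeated runs do not pile up duplicate lines. This is a sketch only: `set_env_var` is a hypothetical helper, and the `.env` location next to the compose file is an assumption.

```sh
# set_env_var FILE KEY VALUE
# Set KEY=VALUE in a dotenv-style file, replacing any earlier assignment of KEY.
# Hypothetical helper for illustration.
set_env_var() {
    file="$1"
    key="$2"
    value="$3"
    touch "$file"
    # Keep every line except previous assignments of KEY.
    # grep exits non-zero when nothing is left, which is fine here.
    grep -v "^${key}=" "$file" > "$file.tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}
```

For example, `set_env_var tools/dev-deploy/.env BACKEND_AUTH_DEV_FIXED_CODE ""` followed by `make -C tools/dev-deploy rebuild`. One thing worth checking in the compose file: with Compose interpolation, `${VAR:-default}` falls back to the default when `VAR` is empty as well as when it is unset, while `${VAR-default}` falls back only when it is unset, so an empty value takes effect only with the latter form.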
## Networking
```
Browser
│ https://www.galaxy.lan, https://api.galaxy.lan
host-Caddy (:80, :443, TLS, attached to `edge` network)
│ reverse_proxy *.galaxy.lan → galaxy-caddy:80
galaxy-caddy (networks: edge + galaxy-dev-internal)
│ www.galaxy.lan → file_server /srv/galaxy-ui (volume galaxy-dev-ui-dist)
│ api.galaxy.lan → reverse_proxy galaxy-api:8080
galaxy-dev-internal
├─ galaxy-api (gateway: :8080 REST, :9090 gRPC)
├─ galaxy-backend (backend: :8080 HTTP, :8081 gRPC push)
├─ galaxy-postgres (postgres: :5432)
├─ galaxy-redis (redis: :6379)
├─ galaxy-mailpit (mailpit: :8025 UI, :1025 SMTP)
└─ engine containers (spawned by backend on demand)
```
The compose project deliberately exposes no host ports. Diagnostics
that used to go through `localhost:8025` etc. now go through the
container network: `docker compose -f tools/dev-deploy/docker-compose.yml
exec galaxy-mailpit wget -qO- localhost:8025/messages` and similar.
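
Since every diagnostic goes through `docker compose -f ...`, a small wrapper saves typing. A minimal sketch, where `dc` and `GALAXY_REPO_ROOT` are hypothetical names introduced for this example:

```sh
# dc ARGS...
# Run docker compose against the dev-deploy compose file.
# GALAXY_REPO_ROOT (hypothetical) points at the repository checkout;
# it defaults to the current directory.
dc() {
    docker compose \
        -f "${GALAXY_REPO_ROOT:-.}/tools/dev-deploy/docker-compose.yml" \
        "$@"
}
```

With it, the Mailpit query above becomes `dc exec galaxy-mailpit wget -qO- localhost:8025/messages`, and `dc ps` or `dc logs -f galaxy-backend` work the same way.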
## Persistent state and schema changes
The dev Postgres volume `galaxy-dev-postgres-data` survives redeploys.
Schema deltas land as additive, sequence-numbered migration files
(`backend/internal/postgres/migrations/0000N_*.sql`) and `pressly/goose`
applies them on backend startup without operator action.
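
When adding a migration, the next file name follows from the highest existing sequence number. The sketch below derives it from the `0000N_*.sql` pattern quoted above; `next_migration` is a hypothetical helper, and the five-digit width is inferred from that pattern.

```sh
# next_migration DIR SLUG
# Print the file name for the next sequence-numbered migration in DIR.
# Hypothetical helper; assumes five-digit prefixes as in 0000N_*.sql.
next_migration() {
    dir="$1"
    slug="$2"
    last=0
    for f in "$dir"/[0-9][0-9][0-9][0-9][0-9]_*.sql; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # Take the five-digit prefix and strip leading zeros so it is
        # never read as an octal number.
        n="$(basename "$f" | cut -c1-5 | sed 's/^0*//')"
        n="${n:-0}"
        if [ "$n" -gt "$last" ]; then
            last="$n"
        fi
    done
    printf '%05d_%s.sql\n' "$((last + 1))" "$slug"
}
```

For example, `next_migration backend/internal/postgres/migrations add_player_index` prints the name to create; the slug here is made up.
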
Use `make -C tools/dev-deploy clean-data` only when you deliberately
want a fresh database (debugging schema drift, exercising the
bootstrap path from scratch, etc.):
```sh
make -C tools/dev-deploy clean-data
make -C tools/dev-deploy up
```
The same volume-persistence model applies to `tools/local-dev/`.
## Make targets
```text
make up Build images, ensure engine image, seed geoip, bring stack up
make rebuild Rebuild backend / gateway images (ignores cache), then up
make seed-ui pnpm build + load build/ into galaxy-dev-ui-dist volume
make seed-geoip Copy pkg/geoip fixture into galaxy-dev-geoip-data volume
make build-engine Build galaxy-engine:dev (no-op if image already present)
make down Stop containers, keep named volumes
make logs Tail compose logs
make status docker compose ps
make health curl https://www.galaxy.lan + https://api.galaxy.lan/healthz
make psql psql as galaxy@galaxy_backend
make clean-data Stop everything and wipe volumes + game-state dir
```
## Files
- `docker-compose.yml` — six services: postgres, redis, mailpit,
galaxy-backend, galaxy-api, galaxy-caddy. Reuses the alpine-runtime
Dockerfiles from `../local-dev/` so the backend healthcheck can run
`wget`. Reuses the dev keypair from `../local-dev/keys/`.
- `Caddyfile.dev` — the application-routing Caddy config, mounted into
`galaxy-caddy` at `/etc/caddy/Caddyfile`.
- `Caddyfile.prod` — placeholder for a future prod deployment; not used
by this compose.
- `Makefile` — wrapper over `docker compose` with helpers for engine,
UI and geoip seeding, health probes, and full wipe.
- `.env.example` — non-secret defaults for the compose `${VAR:-}`
expansions. Copy to `.env` if you want host-local overrides.
## Known issues
See [`KNOWN-ISSUES.md`](KNOWN-ISSUES.md) for symptoms that surface
in the long-lived dev environment but are not yet fixed (currently:
the sandbox game flipping to `cancelled` after a redispatch).
## Deployment cadence
This environment is single-tenant: one live deployment, redeployed by
the `dev-deploy.yaml` workflow on every merge into `development`. PR
branches do not auto-deploy here — pushes to `feature/*` only run the
test workflows (`go-unit`, `ui-test`, `integration`).
To put a feature branch on the shared dev environment before its PR
merges (e.g. to validate a UI flow against the real Caddy edge), run
the workflow manually:
1. Push the branch (`git push gitea HEAD`).
2. Gitea UI → **Actions → Deploy · Dev → Run workflow**, pick the
feature ref.
The deploy is idempotent — when the PR later merges into
`development`, the regular push trigger fires the same packaging and
healthcheck steps, overwriting whatever the manual dispatch left
behind. There is no separate state to clean up between the two paths.
## Relationship to other infrastructure
- `tools/local-dev/` — single-developer playground, host-port mapped,
Vite dev server on the side. Recommended for active UI work.
- `.gitea/workflows/dev-deploy.yaml` — the CI side of this stack:
builds images, seeds the UI and geoip volumes, runs `docker compose up -d` on
every merge into `development`. The Makefile in this directory is
what that workflow ultimately calls into.