galaxy-game/tools/dev-deploy/README.md
Ilia Denisov e038ea6154
fix(dev-deploy): recycle engine containers on galaxy-engine:dev SHA drift
`backend`'s reconciler adopts pre-existing `galaxy-game-*` containers
without comparing their image SHA against the freshly-built
`galaxy-engine:dev`, so a long-lived sandbox would otherwise keep
serving the previous engine code after a redeploy. Issue #59 surfaced
this: after the per-command-rejection fix was deployed via
`workflow_dispatch`, the running sandbox container was still on the
old image SHA and the browser kept seeing the 503/unavailable response.

Adds a `Recycle engine containers on image drift` step right before
`Reap stray dev-deploy containers`. The step compares the new
`galaxy-engine:dev` SHA against every running `galaxy-game-*`
container and, on drift, stops the backend, removes the container,
wipes the bind-mounted per-game state directory (Engine.Init() writes
turn-0 over any pre-existing `turn-N` files — silent state corruption
otherwise), and cascade-deletes the lobby `games` row. The
`dev-sandbox` bootstrap on the next backend boot finds no live
sandbox and provisions a fresh one on the new engine image.

When the engine sources are unchanged, the BuildKit cache hits and
the SHA stays the same — the recycle step is a no-op and the running
games keep their state across the deploy. Verified end-to-end against
the live dev environment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 10:47:25 +02:00


# `tools/dev-deploy/` — long-lived Galaxy dev environment
A docker-compose stack that runs the Galaxy backend, gateway, supporting
services, and a small Caddy in front of them, reachable through the host
Caddy at a single origin (`https://galaxy.lan` in dev). The stack is
single-origin and path-based: the project site, the game UI, and both
gateway surfaces live behind one host with no host name baked into the
artifacts. The `dev-deploy.yaml` Gitea Actions workflow uses it as the
canonical dev target on every merge into the `development` branch, and
it can be run by hand through this Makefile for local debugging of the
deploy plumbing itself.
The application Caddy (`Caddyfile.dev`) is the authoritative routing
source; its header comment documents the exact topology:
```text
/ -> project site (galaxy-dev-site-dist -> /srv/galaxy-site)
/game/* -> game UI (galaxy-dev-ui-dist -> /srv/galaxy-ui)
/api/*, /healthz -> gateway public REST (galaxy-api:8080)
/rpc/* -> gateway Connect/gRPC-web (galaxy-api:9090)
```
The `/rpc` prefix is stripped before the gateway, and the game UI bundle
is built with base path `/game`.
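As a rough mental model of the table above, the path matching can be sketched as a shell `case` statement. This is an illustration only: `route_for` is a hypothetical helper that does not exist in the repository, it mirrors just the four documented patterns, and `Caddyfile.dev` remains the authoritative source.

```sh
# Hypothetical helper: print the upstream that the documented topology
# assigns to a request path. Illustration only; Caddyfile.dev is authoritative.
route_for() {
  case "$1" in
    /game/*)          echo "file_server /srv/galaxy-ui" ;;
    /api/*|/healthz)  echo "reverse_proxy galaxy-api:8080" ;;
    # ${1#/rpc} drops the /rpc prefix, as the gateway never sees it.
    /rpc/*)           echo "reverse_proxy galaxy-api:9090 ${1#/rpc}" ;;
    # Everything else falls through to the project site.
    *)                echo "file_server /srv/galaxy-site" ;;
  esac
}
```

For example, `route_for /rpc/x.Y/Z` prints the gRPC-web upstream followed by the stripped path `/x.Y/Z`.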
This stack is **not** the developer's primary playground for UI work —
that role still belongs to [`tools/local-dev/`](../local-dev/README.md),
which is faster (Vite HMR, host-side dev server) and isolated to one
developer. The two stacks coexist on the same host because every name
is distinct:
| | `tools/local-dev/` | `tools/dev-deploy/` |
|------------------|------------------------------|-----------------------------|
| Compose project | `local-dev` | `galaxy-dev` |
| Container prefix | `galaxy-local-dev-*` | `galaxy-dev-*` |
| Network | `galaxy-local-dev-net` | `galaxy-dev-internal`, `edge` |
| Volumes | `galaxy-local-dev-*` | `galaxy-dev-*` |
| Host ports | 5433/6380/8025/8080/9090 | none (only `edge` network) |
| Game state       | `/tmp/galaxy-game-state`     | `${HOME}/.galaxy-dev/game-state` (default; overridable) |
| Engine image | `galaxy-engine:local-dev` | `galaxy-engine:dev` |
## Prerequisites
The host must already provide:
- Docker daemon reachable as the user running `make` (member of the
  `docker` group, no sudo).
- An external bridge network named `edge` (or whatever
  `GALAXY_EDGE_NETWORK` overrides to):
  ```sh
  docker network create edge
  ```
- A host Caddy listening on `:80`/`:443`, attached to the `edge`
  network, and proxying the single dev host `galaxy.lan` to
  `galaxy-caddy:80`. The host Caddy only needs that one host;
  `Caddyfile.dev` does the path-based fan-out behind it. Example
  fragment for the host Caddyfile:
  ```caddy
  galaxy.lan {
      tls internal
      reverse_proxy galaxy-caddy:80
  }
  ```
- Game-state directory writable by the user running `make`. Default
  is `${HOME}/.galaxy-dev/game-state`; `make up` creates it on demand.
  Override by exporting `GALAXY_DEV_GAME_STATE_DIR` (e.g. to
  `/var/lib/galaxy-dev/game-state` once the host is provisioned for
  it).
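
The prerequisites above can be checked before the first `make up` with a small preflight. The sketch below is an assumption about how one might do that, not something the Makefile ships: `edge_network`, `game_state_dir`, and `preflight` are hypothetical names, and only the two environment variables and their defaults come from this README.

```sh
# Resolve the settings described above, falling back to the documented defaults.
edge_network()   { echo "${GALAXY_EDGE_NETWORK:-edge}"; }
game_state_dir() { echo "${GALAXY_DEV_GAME_STATE_DIR:-${HOME}/.galaxy-dev/game-state}"; }

# Hypothetical preflight: report what is missing instead of failing mid-deploy.
preflight() {
  # The daemon must be reachable without sudo.
  docker info >/dev/null 2>&1 \
    || { echo "docker daemon not reachable as $(id -un)" >&2; return 1; }
  # The external edge network must already exist.
  docker network inspect "$(edge_network)" >/dev/null 2>&1 \
    || { echo "missing network: run 'docker network create $(edge_network)'" >&2; return 1; }
  # The game-state directory must exist (created on demand) and be writable.
  mkdir -p "$(game_state_dir)" && [ -w "$(game_state_dir)" ] \
    || { echo "game-state dir not writable: $(game_state_dir)" >&2; return 1; }
}
```

Running `preflight && make -C tools/dev-deploy up` would then stop early with a readable message when a prerequisite is missing. The host Caddy is not checked here, because its configuration lives outside this stack.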
## Bring it up
```sh
make -C tools/dev-deploy up
```
`up` (re)builds the backend and gateway images (from the Dockerfiles
shared with `../local-dev/`), makes sure the engine image
`galaxy-engine:dev` exists, seeds the geoip volume, brings the stack
up, and waits for healthchecks. It does **not** seed the UI or site
volumes; that is normally done by CI.
The first time you run by hand:
```sh
make -C tools/dev-deploy seed-site
make -C tools/dev-deploy seed-ui
make -C tools/dev-deploy up
make -C tools/dev-deploy health
```
`seed-ui` runs `pnpm build` in `ui/frontend/` (base path `/game`), then
copies the resulting `build/` tree into the `galaxy-dev-ui-dist` volume.
`seed-site` builds the VitePress project site in `site/` and copies its
`.vitepress/dist/` output into the `galaxy-dev-site-dist` volume.
Subsequent CI deploys overwrite both volumes automatically.
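
The Makefile recipes are not reproduced here. One common way to load a build directory into a named volume is a throwaway container that mounts both, sketched below. `seed_volume` is a hypothetical helper and the `alpine:3` image is an assumption; only the volume names and build output paths come from this README.

```sh
# Hypothetical helper: copy a host directory into a named Docker volume.
# Usage: seed_volume <source-dir> <volume-name>
seed_volume() {
  src_dir=$1
  volume=$2
  # Fail before touching Docker when the build output is missing.
  [ -d "$src_dir" ] || { echo "missing build output: $src_dir" >&2; return 1; }
  # Bind mounts need an absolute path, so resolve the source directory first.
  # Old non-hidden files are removed so stale assets do not linger in the volume.
  docker run --rm \
    -v "$volume":/dst \
    -v "$(cd "$src_dir" && pwd)":/src:ro \
    alpine:3 sh -c 'rm -rf /dst/* && cp -a /src/. /dst/'
}
```

Under these assumptions, `seed_volume ui/frontend/build galaxy-dev-ui-dist` would correspond to the copy half of `seed-ui`, and `seed_volume site/.vitepress/dist galaxy-dev-site-dist` to the copy half of `seed-site`.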
## Daily flow
```sh
make -C tools/dev-deploy rebuild # rebuild backend/gateway images + up
make -C tools/dev-deploy logs # tail compose logs
make -C tools/dev-deploy health # probe https://galaxy.lan/ , /game/ , /healthz
make -C tools/dev-deploy down # stop, keep state
```
State persists in named volumes between `up`/`down` cycles. The
`development` branch keeps the dev environment continuously usable —
games created last week survive into this week unless somebody
calls `make clean-data`.
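
The `health` target in the daily flow above is described as probing three URLs on the single origin. A minimal sketch of such a probe follows, assuming `curl` is available on the host; `health_urls` and `probe_all` are hypothetical names, not the Makefile's actual recipe.

```sh
# Hypothetical helper: list the URLs the health target is described as probing.
health_urls() {
  base=${1:-https://galaxy.lan}
  for path in / /game/ /healthz; do
    echo "${base}${path}"
  done
}

# Probe each URL in turn; -f makes curl exit non-zero on HTTP status >= 400.
probe_all() {
  health_urls "$1" | while read -r url; do
    curl -fsS -o /dev/null "$url" || { echo "FAIL $url" >&2; exit 1; }
    echo "ok   $url"
  done
}
```

Because the host Caddy example uses `tls internal`, `curl` verifies the certificate only if the host trusts Caddy's local CA; otherwise the probe fails on TLS before it reaches the stack.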
## Logging in
The same dev-mode email-code override as `tools/local-dev/` applies,
and the dev-deploy compose ships with it enabled by default:

1. Enter `dev@galaxy.lan` (or whatever `BACKEND_DEV_SANDBOX_EMAIL`
   resolves to) in the login form.
2. Submit `123456` as the code. The docker-compose default for
   `BACKEND_AUTH_DEV_FIXED_CODE` is `123456`, so the bcrypt-hashed
   email code stays a fallback. To force real Mailpit codes (e.g. for
   mail-flow QA), set `BACKEND_AUTH_DEV_FIXED_CODE=` (empty) in a
   local `.env` and `make rebuild`.

The fixed-code override is rejected by production env loaders, so it
cannot leak into the prod environment.
## Networking
```text
Browser
│ https://galaxy.lan/ (one origin, path-based)
host-Caddy (:80, :443, TLS, attached to `edge` network)
│ reverse_proxy galaxy.lan → galaxy-caddy:80
galaxy-caddy (networks: edge + galaxy-dev-internal)
│ / -> file_server /srv/galaxy-site (volume galaxy-dev-site-dist)
│ /game/* -> file_server /srv/galaxy-ui (volume galaxy-dev-ui-dist)
│ /api/*, /healthz -> reverse_proxy galaxy-api:8080
│ /rpc/* -> reverse_proxy galaxy-api:9090 (strips /rpc)
galaxy-dev-internal
├─ galaxy-api (gateway: :8080 REST, :9090 gRPC)
├─ galaxy-backend (backend: :8080 HTTP, :8081 gRPC push)
├─ galaxy-postgres (postgres: :5432)
├─ galaxy-redis (redis: :6379)
├─ galaxy-mailpit (mailpit: :8025 UI, :1025 SMTP)
└─ engine containers (spawned by backend on demand)
```
The compose project deliberately exposes no host ports. Diagnostics
that used to go through `localhost:8025` etc. now go through the
container network: `docker compose -f tools/dev-deploy/docker-compose.yml
exec galaxy-mailpit wget -qO- localhost:8025/messages` and similar.
## Persistent state and schema changes
The dev Postgres volume `galaxy-dev-postgres-data` survives redeploys.
Schema deltas land as additive, sequence-numbered migration files
(`backend/internal/postgres/migrations/0000N_*.sql`) and `pressly/goose`
applies them on backend startup without operator action.
Use `make -C tools/dev-deploy clean-data` only when you deliberately
want a fresh database (debugging schema drift, exercising the
bootstrap path from scratch, etc.):
```sh
make -C tools/dev-deploy clean-data
make -C tools/dev-deploy up
```
The same volume-persistence model applies to `tools/local-dev/`.
## Make targets
```text
make up Build images, ensure engine image, seed geoip, bring stack up
make rebuild Rebuild backend / gateway images (ignores cache), then up
make seed-ui pnpm build (base /game) + load build/ into galaxy-dev-ui-dist volume
make seed-site vitepress build + load site dist into galaxy-dev-site-dist volume
make seed-geoip Copy pkg/geoip fixture into galaxy-dev-geoip-data volume
make build-engine Build galaxy-engine:dev (no-op if image already present)
make down Stop containers, keep named volumes
make logs Tail compose logs
make status docker compose ps
make health curl https://galaxy.lan/ , /game/ , and /healthz
make psql psql as galaxy@galaxy_backend
make clean-data Stop everything and wipe volumes + game-state dir
```
## Files
- `docker-compose.yml` — six services: postgres, redis, mailpit,
galaxy-backend, galaxy-api, galaxy-caddy. `galaxy-caddy` mounts both
the `galaxy-dev-site-dist` (`/srv/galaxy-site`) and
`galaxy-dev-ui-dist` (`/srv/galaxy-ui`) volumes and reverse-proxies
both gateway tiers (REST/health on `:8080`, Connect/gRPC-web on
`:9090`). Reuses the alpine-runtime Dockerfiles from `../local-dev/`
so the backend healthcheck can run `wget`. Reuses the dev keypair
from `../local-dev/keys/`.
- `Caddyfile.dev` — the application-routing Caddy config and the
authoritative single-origin path topology, mounted into `galaxy-caddy`
at `/etc/caddy/Caddyfile`.
- `Caddyfile.prod` — placeholder for a future prod deployment; not used
by this compose.
- `Makefile` — wrapper over `docker compose` with helpers for engine,
site/UI seeding, health probes, and full wipe.
- `.env.example` — non-secret defaults for the compose `${VAR:-}`
expansions. Copy to `.env` if you want host-local overrides.
## Known issues
See [`KNOWN-ISSUES.md`](KNOWN-ISSUES.md) for symptoms that surface
in the long-lived dev environment but are not yet fixed (currently:
the sandbox game flipping to `cancelled` after a redispatch).
## Deployment cadence
This environment is single-tenant: one live deployment, redeployed by
the `dev-deploy.yaml` workflow on every merge into `development`. PR
branches do not auto-deploy here — pushes to `feature/*` only run the
test workflows (`go-unit`, `ui-test`, `integration`).
To put a feature branch on the shared dev environment before its PR
merges (e.g. to validate a UI flow against the real Caddy edge), run
the workflow manually:

1. Push the branch (`git push gitea HEAD`).
2. Gitea UI → **Actions → Deploy · Dev → Run workflow**, pick the
   feature ref.

The deploy is idempotent: when the PR later merges into
`development`, the regular push trigger fires the same packaging and
healthcheck steps, overwriting whatever the manual dispatch left
behind. There is no separate state to clean up between the two paths.
### Engine image drift recycle
`backend` spawns one engine container per game (the long-lived "Dev
Sandbox" plus any user-created games) and the reconciler reattaches
to whatever it finds with the `galaxy.stack=dev-deploy` label. That
reattach does not compare the running container's image SHA against
the freshly built `galaxy-engine:dev` tag, so without intervention an
adopted container would keep serving the previous engine code after a
redeploy.
The `dev-deploy.yaml` workflow handles this in the
`Recycle engine containers on image drift` step, which runs right
before `Reap stray dev-deploy containers`. The step compares the new
`galaxy-engine:dev` SHA against every running `galaxy-game-*`
container and, for each drifted one, stops the backend, removes the
container, wipes its bind-mounted state directory, and cascade-deletes
the lobby `games` row. The wipe is required because `Engine.Init()`
writes turn-0 over any pre-existing `turn-N` files, which would
otherwise silently corrupt the game state. The `dev-sandbox` bootstrap
on the next backend boot finds no live sandbox and provisions a fresh
one on the new engine image.
When the engine sources are unchanged, the BuildKit cache hits and
the SHA stays the same — the recycle step is a no-op and the running
games keep their state across the deploy.
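
The comparison at the heart of the recycle step can be sketched as follows. This is not the workflow's actual script, which is not shown in this README: `drifted_containers` and `list_drifted_engines` are hypothetical names, and the sketch assumes that "SHA" means the image ID reported by `docker image inspect` and `docker inspect`.

```sh
# Pure comparison: read "name image_id" pairs on stdin and print the names
# whose image ID differs from the freshly built one passed as $1.
drifted_containers() {
  new_id=$1
  while read -r name image_id; do
    [ -n "$name" ] || continue
    [ "$image_id" = "$new_id" ] || echo "$name"
  done
}

# Docker-facing wrapper (assumed shape): pair every running galaxy-game-*
# container with the image ID it was started from, then filter for drift.
list_drifted_engines() {
  new_id=$(docker image inspect --format '{{.Id}}' galaxy-engine:dev) || return 1
  docker ps --filter 'name=galaxy-game-' --format '{{.Names}}' |
    while read -r name; do
      echo "$name $(docker inspect --format '{{.Image}}' "$name")"
    done | drifted_containers "$new_id"
}
```

The sketch only reports drift. The destructive follow-up (stopping the backend, removing the container, wiping the state directory, deleting the `games` row) is deliberately left out. An empty result corresponds to the no-op case described above.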
## Relationship to other infrastructure
- `tools/local-dev/` — single-developer playground, host-port mapped,
Vite dev server on the side. Recommended for active UI work.
- `.gitea/workflows/dev-deploy.yaml` — the CI side of this stack:
builds images, seeds the site and UI volumes, runs `docker compose
up -d` on every merge into `development`. The Makefile in this
directory is what that workflow ultimately calls into.