25 Commits

Author SHA1 Message Date
developer 77cb7c78b6 Merge pull request #9: ui-test singleton queue
Tests · UI / test (push) Successful in 2m14s
Replaces per-sha cancel-in-progress (which fired spurious self-cancels) with a singleton queueing group. ui-test #74 (push) and #75 (pull_request) both green at ~2m, queue-not-cancel verified.
2026-05-15 06:57:09 +00:00
Ilia Denisov 1a0e3e992f ci/ui-test: queue runs in one bucket instead of cancelling
Tests · UI / test (push) Waiting to run
Tests · UI / test (pull_request) Successful in 2m20s
`cancel-in-progress: true` killed run #73 even though it was the
only ui-test in its concurrency group — Gitea appears to cancel the
in-progress job on its own under that setting in some edge cases.

Switch to a singleton group with `cancel-in-progress: false`. The
new behaviour is simple queueing: only one ui-test workflow runs at
a time across the repository, the rest wait. Vite-on-:5173 cannot
collide because there is never a second ui-test alive. The wall-time
hit is bounded — ui-test is ~2 minutes — and bursts are rare enough
that queueing is cheap.
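The resulting workflow-level block (as it appears in the diff below for `ui-test.yaml`):

```yaml
# One repo-wide bucket: later ui-test runs queue behind the
# in-progress one instead of cancelling it.
concurrency:
  group: ui-test-singleton
  cancel-in-progress: false
```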

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 08:51:54 +02:00
developer faf598b2cd Merge pull request #8: Playwright tuning + concurrency for ui-test
Tests · UI / test (push) Failing after 6s
Deploy · Dev / deploy (push) Successful in 32s
Caps Playwright at 4 workers + 4 retries to absorb the host-mode flake budget, and serialises ui-test runs by head sha so push and pull_request events for the same commit cannot collide on Vite :5173.
2026-05-15 06:49:06 +00:00
Ilia Denisov 6e6186a571 ci/ui-test: key concurrency by head sha, not gitea.ref
Tests · UI / test (push) Has been cancelled
Tests · UI / test (pull_request) Successful in 2m17s
`gitea.ref` differs between push (`refs/heads/<branch>`) and
pull_request (`refs/pull/N/head`) events even for the same commit,
so the two parallel runs land in different concurrency groups and
the Vite-on-:5173 collision is not suppressed. Switching the key to
the head sha (`gitea.event.pull_request.head.sha || gitea.sha`)
collapses both events into one bucket, leaving exactly one ui-test
alive per commit.
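In workflow terms the key change is one expression (the `cancel-in-progress: true` shown here is the behaviour at this commit; a later commit flips it to queueing):

```yaml
# push and pull_request events for the same commit resolve to the same
# group key, so only one ui-test per commit stays alive.
concurrency:
  group: ui-test-${{ gitea.event.pull_request.head.sha || gitea.sha }}
  cancel-in-progress: true
```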

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 08:46:00 +02:00
Ilia Denisov e3bb30201d ci/ui-test: serialise per-ref + clear stale Vite before Playwright
Tests · UI / test (pull_request) Failing after 6s
Tests · UI / test (push) Successful in 2m21s
Two ui-test jobs cannot coexist on the same host: Playwright's
`webServer` spec spawns `pnpm dev` on :5173, and on a host-mode
runner the port lives in the host namespace shared by every job.
ui-test #67 hit "Error: http://localhost:5173 is already used"
because a parallel job's Vite still held the port.

Two changes:

1. `concurrency: ui-test-${{ gitea.ref }}` with `cancel-in-progress:
   true`. New push/PR runs against the same ref kill any earlier
   ui-test before starting, so we never have two `pnpm dev`s alive
   at once.
2. `pkill -f 'vite dev' || true` plus `fuser -k 5173/tcp` right
   before Playwright. Defence in depth in case the concurrency
   cancellation does not reap the spawned shell promptly.
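Sketch of the cleanup step described in point 2 (step name and the `|| true` on `fuser` are assumptions, not copied from the actual diff):

```yaml
# Runs immediately before Playwright so a stale dev server left by a
# cancelled run cannot hold :5173.
- name: Clear stale Vite
  run: |
    pkill -f 'vite dev' || true
    fuser -k 5173/tcp || true
```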

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 08:42:08 +02:00
Ilia Denisov 7ff81de2b6 ui/frontend: cap Playwright at 4 workers, retry 4 times
Tests · UI / test (pull_request) Failing after 26s
Tests · UI / test (push) Successful in 2m21s
Under host-mode runner the default 6 workers + 1 retry consistently
land on ~7 flakies and an occasional hard fail per ui-test run
(ui-test #59 most recently). Workers share CPU and the host Docker
daemon with gitea, the long-lived dev stack, and the user's host
Caddy; the extra wall time from contention pushes individual
expectations past their timeouts.

Lower the worker cap to 4 to keep parallelism but give each worker
real CPU headroom, and raise retries to 4 so the rare slow page is
absorbed without surfacing as failure.
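In `playwright.config.ts` terms the change is two fields (a sketch; the rest of the project config is elided and assumed):

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  workers: 4, // down from the default 6 on this host; real CPU headroom per worker
  retries: 4, // up from 1; absorbs the rare slow page under contention
  // ...remaining config elided...
});
```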

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 08:39:22 +02:00
developer 9d65bf5157 Merge pull request #7: flaky RandomSuffix + CORS allow-list
Tests · Integration / integration (push) Successful in 1m40s
Deploy · Dev / deploy (push) Successful in 34s
Tests · Go / test (push) Successful in 1m40s
Fixes the birthday-collision flake in TestRandomSuffixGenerator and adds an env-driven CORS allow-list on the public gateway so the dev UI on https://www.galaxy.lan can reach https://api.galaxy.lan.
2026-05-15 06:35:58 +00:00
Ilia Denisov 1855e43699 gateway: add CORS allow-list for the public REST surface
Tests · Go / test (push) Successful in 1m42s
Tests · Go / test (pull_request) Successful in 1m45s
Tests · Integration / integration (pull_request) Successful in 1m36s
Adds a `GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS` env-driven allow-list
on the public REST server so the dev UI on https://www.galaxy.lan can
call https://api.galaxy.lan without the browser blocking the
cross-origin response. Defaults to empty (no CORS) so the production
posture stays closed.

The middleware mounts before route classification and anti-abuse, so
OPTIONS preflights never charge against per-class rate-limit buckets.

`tools/dev-deploy/docker-compose.yml` opts the dev gateway into a
single allowed origin (`https://www.galaxy.lan`); local-dev keeps the
defaults because Vite proxies through the same origin.
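The dev opt-in in `tools/dev-deploy/docker-compose.yml` reduces to a single environment entry (service name taken from the compose build step elsewhere in this log; the list separator is an assumption):

```yaml
services:
  galaxy-api:
    environment:
      # Allow-list of origins; empty or unset keeps CORS disabled,
      # which is the closed production posture.
      GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS: https://www.galaxy.lan
```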

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:58:14 +02:00
Ilia Denisov 7bce67462c pkg/util: harden TestRandomSuffixGenerator against birthday collisions
The previous test asserted that no two adjacent samples from a
~10 000-element space were equal across 100 iterations. The birthday
math gives that adjacency check a ~1 % flake rate per run; with the
new gitea.lan CI volume that turned into observable random failures
(go-unit #51 on feature/enable-actions-cache hit "Should not be:
'6635'").

Replace adjacency with a distinctness floor over a wider 200-sample
draw. A stuck generator (single value) lands at 1 unique; a
256-element range lands at ~196; the natural full-range generator
lands at ~198. A floor of 150 catches the failure modes the test was
actually written to guard against and never trips on legitimate
randomness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:58:02 +02:00
developer 2be7e5c110 Merge pull request #6: re-enable actions cache
Deploy · Dev / deploy (push) Successful in 27s
Tests · Go / test (push) Successful in 1m43s
Tests · Integration / integration (push) Successful in 1m42s
Tests · UI / test (push) Failing after 2m10s
Cache service answers on 10.200.0.1:43513 after the nftables fix. setup-go/setup-node opt back into cache: true / cache: pnpm. Cache hit verified in run #55 (ui-test on PR head).
2026-05-15 05:46:57 +00:00
Ilia Denisov 2a95bf4a50 ci: re-enable actions cache now that the runner serves it
Tests · UI / test (push) Successful in 2m20s
Tests · Go / test (push) Failing after 2m21s
Tests · Go / test (pull_request) Successful in 1m40s
Tests · Integration / integration (pull_request) Successful in 1m46s
Tests · UI / test (pull_request) Successful in 2m2s
The Gitea Actions cache service now answers on 10.200.0.1:43513
(post nftables fix on the runner side). Turn `cache: true` and
`cache: pnpm` back on so setup-go/setup-node can use it for
cross-job tarball caching on top of the host-persistent caches we
already rely on.

The setup-* actions still tolerate the cache being unavailable, so
this is reversible to `cache: false` if the service goes away again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:39:39 +02:00
developer fd071260ec Merge pull request #5: drop cache: setting in setup-go / setup-node
Deploy · Dev / deploy (push) Successful in 31s
Tests · Go / test (push) Successful in 1m44s
Tests · Integration / integration (push) Successful in 1m42s
Tests · UI / test (push) Successful in 2m11s
Avoids the zombie reserveCache retries against the unreachable Gitea Actions cache service on :43513. Host-mode runner keeps caches warm in $HOME without needing actions/cache plumbing.
2026-05-14 04:47:56 +00:00
Ilia Denisov 8058f26397 ci: drop cache: setting in setup-go/setup-node
Tests · Go / test (push) Successful in 2m21s
Tests · UI / test (push) Successful in 2m22s
Tests · Go / test (pull_request) Successful in 3m14s
Tests · Integration / integration (pull_request) Successful in 1m37s
Tests · UI / test (pull_request) Successful in 2m7s
`cache: true` (setup-go) and `cache: pnpm` (setup-node) make the
actions push and pull tarballs through the Gitea Actions cache
service at 192.168.0.222:43513. That endpoint currently does not
answer, so every workflow burns minutes per run on reserveCache
retries before the action gives up.

In host-mode the real caches live under the runner user's $HOME
(~/go/pkg/mod, ~/.cache/go-build, ~/.local/share/pnpm,
~/.cache/ms-playwright) and persist between jobs without any
actions/cache plumbing. Switching cache: off avoids the zombie
retries and uses the local disk caches the runner already has warm.

Reviving the cache service is a separate TODO. Until then this is
the simpler and faster baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 06:39:22 +02:00
developer 660044559c Merge pull request #4: cleanup after host-mode runner
Deploy · Dev / deploy (push) Successful in 28s
Tests · Go / test (push) Successful in 1m41s
Tests · Integration / integration (push) Successful in 1m45s
Tests · UI / test (push) Successful in 2m14s
Drops the docker-in-docker workarounds (GIT_SSL_NO_VERIFY env, GeoIP image bake, playwright --with-deps) now that act_runner executes jobs natively on the host.
2026-05-14 04:31:27 +00:00
Ilia Denisov 9135991887 ci/ui-test: drop --with-deps now that runner is host-mode
Tests · Go / test (pull_request) Successful in 2m6s
Tests · UI / test (push) Failing after 2m32s
Tests · Integration / integration (pull_request) Successful in 1m52s
Tests · UI / test (pull_request) Successful in 2m3s
`playwright install --with-deps` shells out to `sudo apt-get install`
for the system libraries that headless browsers need. In a job
container that runs as root this is silent; on a host-mode runner
sudo cannot obtain a password non-interactively, fails after three
attempts, and the step exits 1.

Drop --with-deps. The system .so libraries are installed once on the
host via `pnpm exec playwright install-deps` (or the equivalent
apt-get incantation); workflow runs only need to fetch the browser
binaries themselves, which live under the runner user's home and
need no privilege.
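The split sketched as workflow YAML (step name and working directory follow the conventions of the other workflows in this repo, not the actual diff):

```yaml
# One-time, on the runner host, outside CI (privileged):
#   cd ui/frontend && pnpm exec playwright install-deps
#
# Per-run, in ui-test.yaml (unprivileged; fetches browser binaries
# into the runner user's home):
- name: Install Playwright browsers
  working-directory: ui/frontend
  run: pnpm exec playwright install
```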

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:59:45 +02:00
Ilia Denisov bb74e3336e dev-deploy: restore GeoIP bind-mount, drop image bake
Tests · Integration / integration (pull_request) Successful in 2m14s
Tests · Go / test (pull_request) Successful in 2m19s
Tests · UI / test (pull_request) Failing after 51m17s
With the runner in host-mode, compose bind-mount paths resolve to
real host paths the Docker daemon can see, so the GeoIP file no
longer needs to be baked into the backend image to survive CI. Bring
back the bind-mount of `pkg/geoip/test-data/.../mmdb`, matching how
local-dev sources it. Image now only carries the backend binary,
symmetric with the production `backend/Dockerfile`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:04:11 +02:00
Ilia Denisov 4a88b24f4b ci: drop GIT_SSL_NO_VERIFY now that runner is host-mode
The act_runner now executes jobs natively on the host (no per-job
container), so actions/checkout uses the host's system CA store,
which already trusts the host-Caddy root CA. The workaround that
disabled TLS verification for `git fetch` is no longer needed and
just hides legitimate cert issues if they ever appear.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:04:11 +02:00
developer fe8ad6a02a Merge pull request 'dev-deploy: fix backend startup in CI' (#3) from feature/dev-deploy-followups into development
Tests · Integration / integration (push) Successful in 2m16s
Tests · Go / test (push) Successful in 2m39s
Tests · UI / test (push) Successful in 12m29s
Deploy · Dev / deploy (push) Successful in 44s
Reviewed-on: https://gitea.dev/developer/galaxy-game/pulls/3
2026-05-13 22:42:03 +00:00
Ilia Denisov 9ebb2e7f0f ci: rename workflows for Gitea UI readability
Tests · Go / test (push) Successful in 2m31s
Tests · Integration / integration (pull_request) Successful in 2m23s
Tests · Go / test (pull_request) Successful in 2m50s
Tests · UI / test (push) Successful in 13m2s
Tests · UI / test (pull_request) Successful in 13m22s
Switches the `name:` field on every workflow to the bulleted style:

  Tests · Go            (go-unit.yaml)
  Tests · UI            (ui-test.yaml)
  Tests · Integration   (integration.yaml)
  Deploy · Dev          (dev-deploy.yaml)
  Build · Prod          (prod-build.yaml)
  Deploy · Prod         (deploy-prod.yaml)

File names stay the same so existing path filters and any URL
references continue to work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 00:22:53 +02:00
Ilia Denisov 0da360a644 dev-deploy: fix backend startup in CI
Two bugs surfaced on the first real merge into development:

1. `${{ env.HOME }}` evaluates to an empty string at the workflow
   stage, so GALAXY_DEV_GAME_STATE_DIR became `/.galaxy-dev/game-state`.
   Resolve in the shell instead of in YAML.

2. The compose bind-mount of GeoIP2-Country-Test.mmdb referenced a
   path inside the runner's workspace volume, which the host Docker
   daemon cannot see — it created an empty directory and the backend
   crashed with "geoip database: is a directory" in a restart loop.
   Bake the file into the backend image so dev-deploy no longer needs
   a bind-mount; local-dev compose still mounts it on top for swap-in
   during development.
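The first bug is easy to demonstrate in plain bash (illustrative; the "broken" value is what the empty YAML-stage expansion produced):

```shell
# What `${{ env.HOME }}` expanded to at workflow-evaluation time: "".
broken="/.galaxy-dev/game-state"

# Resolving inside the step's shell uses the runner user's real HOME.
fixed="$HOME/.galaxy-dev/game-state"
echo "broken: $broken"
echo "fixed:  $fixed"
```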

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 00:22:16 +02:00
developer 6686059535 Merge pull request 'tools/dev-deploy: long-lived dev environment behind host Caddy' (#2) from feature/ci-reorg-and-dev-deploy into development
go-unit / test (push) Successful in 2m32s
integration / integration (push) Successful in 2m3s
dev-deploy / deploy (push) Failing after 5m7s
ui-test / test (push) Has been cancelled
Reviewed-on: https://gitea.dev/developer/galaxy-game/pulls/2
2026-05-13 22:10:24 +00:00
Ilia Denisov c6c5f3c8dd ci: skip TLS verify for actions/checkout on LAN Gitea
go-unit / test (push) Successful in 2m28s
go-unit / test (pull_request) Successful in 2m30s
integration / integration (pull_request) Successful in 2m20s
ui-test / test (push) Successful in 13m5s
ui-test / test (pull_request) Successful in 14m31s
The Gitea host serves https://gitea.iliadenisov.ru with a cert signed
by host-Caddy's internal CA, which the runner-image's CA bundle does
not trust. actions/checkout@v4 fails on `git fetch` as a result, so
every workflow on gitea.lan has been failing — visible only now that
we made gitea.lan the primary CI target.

Sets GIT_SSL_NO_VERIFY=true on every workflow as a quick fix. Safe in
practice because both endpoints sit on the same LAN. The long-term
fix is to bake the Caddy root CA into the runner image and drop this
env.
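As applied, the quick fix is a single workflow-level env entry (workflow-level placement inferred from "on every workflow" above):

```yaml
# Applies to every step, including actions/checkout's internal
# `git fetch`. Remove once the runner trusts the Caddy root CA.
env:
  GIT_SSL_NO_VERIFY: "true"
```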

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 23:43:51 +02:00
Ilia Denisov f00c8efd18 docs: sync project guides to the new CI flow
go-unit / test (pull_request) Failing after 30s
integration / integration (pull_request) Failing after 34s
ui-test / test (pull_request) Failing after 37s
Aligns the project guides with the branching/CI/environment changes
landed in the previous commits:

- CLAUDE.md: per-stage CI gate now closes against gitea.lan; describes
  the main/development/feature/* flow and the workflow surface
- docs/ARCHITECTURE.md: new section 18 "CI and Environments" covering
  branches, workflows, and the local-dev / dev-deploy / local-ci
  triad; section numbering shifted accordingly
- tools/local-ci/README.md: marked as fallback (offline / runner
  isolation only)
- tools/local-dev/README.md and ui/README.md: cross-link to
  tools/dev-deploy/ for production-shaped testing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 23:26:57 +02:00
Ilia Denisov f316952c12 ci: split workflows for linear development flow
Reshapes .gitea/workflows/ around the new main ← development ←
feature/* branching model:

- go-unit.yaml — Go unit tests, runs on push/PR matching Go paths
- ui-test.yaml — narrowed to Vitest + Playwright only (Go tests now
  live in go-unit.yaml)
- integration.yaml — testcontainers suite, fires on PR to
  development/main and on push to development
- dev-deploy.yaml — builds the stack and (re)deploys tools/dev-deploy/
  on every merge into development
- prod-build.yaml — builds prod images on push to main and uploads
  docker save bundles as artifacts (30-day retention)
- deploy-prod.yaml — workflow_dispatch placeholder for the future
  SSH-based rollout

ui-release.yaml is removed; its v* tag trigger is superseded by
prod-build.yaml plus the manual deploy-prod entry point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 23:26:46 +02:00
Ilia Denisov 00c79064fc tools/dev-deploy: long-lived dev environment behind host Caddy
A docker-compose stack that hosts postgres, redis, mailpit, backend,
gateway, and an app-routing Caddy. Reachable through the host Caddy at
https://www.galaxy.lan (static SPA) and https://api.galaxy.lan (REST +
gRPC). Coexists with tools/local-dev/ and tools/local-ci/ by giving
every name (compose project, container, network, volume) a distinct
galaxy-dev-* prefix.

State is persisted in named volumes; game-state lives under
${GALAXY_DEV_GAME_STATE_DIR:-$HOME/.galaxy-dev/game-state} so the
default works for a non-root runner without sudo.
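The naming and state conventions look roughly like this in compose terms (a sketch: the container-side path and the volume name beyond the galaxy-dev-* prefix are assumptions):

```yaml
name: galaxy-dev  # compose project prefix -> galaxy-dev-* containers
services:
  galaxy-backend:
    volumes:
      # Default resolves under the runner user's HOME, so a non-root
      # runner needs no sudo to create it.
      - ${GALAXY_DEV_GAME_STATE_DIR:-$HOME/.galaxy-dev/game-state}:/var/lib/galaxy/game-state
volumes:
  galaxy-dev-pgdata: {}
```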

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 23:26:35 +02:00
25 changed files with 1367 additions and 244 deletions
.gitea/workflows/deploy-prod.yaml (+31)
@@ -0,0 +1,31 @@
name: Deploy · Prod
# Placeholder for the production rollout workflow. Today it only proves
# the manual entry point works; the actual `docker save | ssh prod
# docker load` + remote `docker compose up -d` pipeline is wired in
# once the production host, SSH credentials, and DNS are decided.
on:
  workflow_dispatch:
    inputs:
      image_tag:
        description: "Image tag to deploy (commit-<sha12>, produced by prod-build.yaml)"
        required: true
        type: string
jobs:
  deploy:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
    steps:
      - name: Announce target
        run: |
          echo "Would deploy image tag: ${{ inputs.image_tag }}"
          echo "TODO:"
          echo " 1. Download galaxy-images-${{ inputs.image_tag }} from prod-build artifacts."
          echo " 2. scp the .tar.gz bundles to the production host."
          echo " 3. ssh prod 'docker load -i ...' for backend / gateway / engine."
          echo " 4. ssh prod 'docker compose -f /opt/galaxy/docker-compose.yml up -d'."
          echo " 5. Probe https://api.galaxy.com/healthz and roll back on failure."
.gitea/workflows/dev-deploy.yaml (+117)
@@ -0,0 +1,117 @@
name: Deploy · Dev
# Builds the Galaxy stack and (re)deploys it into the long-lived dev
# environment on the host running this Gitea Actions runner. Triggered
# on every merge into `development`. Branch protections on `development`
# guarantee the commit already passed `go-unit`, `ui-test`, and
# `integration` as part of the PR that produced this push, so this
# workflow does not re-run those tests — it focuses on packaging and
# rollout.
on:
  push:
    branches:
      - development
    paths:
      - 'backend/**'
      - 'gateway/**'
      - 'game/**'
      - 'pkg/**'
      - 'ui/**'
      - 'go.work'
      - 'go.work.sum'
      - 'tools/dev-deploy/**'
      - '.gitea/workflows/dev-deploy.yaml'
      - '!**/*.md'
jobs:
  deploy:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.work
          cache: true
      - name: Set up pnpm
        uses: pnpm/action-setup@v4
        with:
          version: 11.0.7
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: pnpm
          cache-dependency-path: ui/pnpm-lock.yaml
      - name: Install UI dependencies
        working-directory: ui
        run: pnpm install --frozen-lockfile
      - name: Build UI frontend
        working-directory: ui/frontend
        env:
          VITE_GATEWAY_BASE_URL: https://api.galaxy.lan
        run: |
          # The response-signing public key is committed in
          # `.env.development` alongside its private counterpart in
          # `tools/local-dev/keys/`. Pull it from there at build time so
          # the production-mode bundle ships the same key the dev
          # gateway uses to sign.
          export VITE_GATEWAY_RESPONSE_PUBLIC_KEY="$(grep -E '^VITE_GATEWAY_RESPONSE_PUBLIC_KEY=' .env.development | cut -d= -f2)"
          pnpm build
      - name: Build galaxy-engine image
        working-directory: ${{ gitea.workspace }}
        run: |
          docker build \
            -t galaxy-engine:dev \
            -f game/Dockerfile \
            .
      - name: Build backend + gateway images
        working-directory: tools/dev-deploy
        run: |
          docker compose build galaxy-backend galaxy-api
      - name: Seed UI volume
        run: |
          docker volume create galaxy-dev-ui-dist >/dev/null
          docker run --rm \
            -v galaxy-dev-ui-dist:/dst \
            -v "${{ gitea.workspace }}/ui/frontend/build:/src:ro" \
            alpine sh -c 'rm -rf /dst/* /dst/.??* 2>/dev/null; cp -a /src/. /dst/'
      - name: Bring up the stack
        working-directory: tools/dev-deploy
        run: |
          # Resolve in the shell, not in YAML expressions — `env.HOME`
          # is empty at the workflow-evaluation stage.
          export GALAXY_DEV_GAME_STATE_DIR="$HOME/.galaxy-dev/game-state"
          mkdir -p "$GALAXY_DEV_GAME_STATE_DIR"
          docker compose up -d --wait --remove-orphans
      - name: Probe the stack
        run: |
          set -e
          # The probe goes through the same routing as a browser on
          # the host: the host Caddy on :443 (which has `tls internal`)
          # terminates and forwards into the edge network. We accept
          # the host's internal CA via -k because the runner image has
          # no reason to trust it.
          curl -sk --max-time 10 https://api.galaxy.lan/healthz \
            | tee /tmp/healthz
          test -s /tmp/healthz
          curl -sk --max-time 10 -o /dev/null -w '%{http_code}\n' \
            https://www.galaxy.lan/ | tee /tmp/www_status
          grep -qE '^(200|304)$' /tmp/www_status
.gitea/workflows/go-unit.yaml (+78)
@@ -0,0 +1,78 @@
name: Tests · Go
# Fast unit tests for the Go side of the monorepo. Runs on every push
# and pull request whose path filter matches a Go source directory.
# The integration suite (testcontainers-driven, slow) lives in
# `integration.yaml` and only fires for PRs into `development`/`main`
# and pushes to `development`.
on:
  push:
    paths:
      - 'backend/**'
      - 'gateway/**'
      - 'game/**'
      - 'pkg/**'
      - 'ui/core/**'
      - 'go.work'
      - 'go.work.sum'
      - '.gitea/workflows/go-unit.yaml'
      - '!**/*.md'
  pull_request:
    paths:
      - 'backend/**'
      - 'gateway/**'
      - 'game/**'
      - 'pkg/**'
      - 'ui/core/**'
      - 'go.work'
      - 'go.work.sum'
      - '.gitea/workflows/go-unit.yaml'
      - '!**/*.md'
jobs:
  test:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.work
          cache: true
      - name: Run Go tests
        # client/ is the deprecated Fyne client; excluded from CI per
        # ui/PLAN.md §74. -count=1 disables Go's test cache so a green
        # run never depends on a previous runner's cached state. The
        # backend suite is run with -p 1 because most backend packages
        # spawn their own Postgres testcontainer, and parallel
        # Postgres bootstraps starve each other on a constrained
        # runner. pkg modules are listed one by one because ./pkg/...
        # does not recurse across the independent go.work modules
        # under pkg/.
        run: |
          go test -count=1 -p 1 ./backend/...
          go test -count=1 \
            ./gateway/... \
            ./game/... \
            ./ui/core/... \
            ./pkg/calc/... \
            ./pkg/connector/... \
            ./pkg/cronutil/... \
            ./pkg/error/... \
            ./pkg/geoip/... \
            ./pkg/model/... \
            ./pkg/postgres/... \
            ./pkg/redisconn/... \
            ./pkg/schema/... \
            ./pkg/storage/... \
            ./pkg/transcoder/... \
            ./pkg/util/...
.gitea/workflows/integration.yaml (+65)
@@ -0,0 +1,65 @@
name: Tests · Integration
# Full integration suite (testcontainers-driven, ~5-10 minutes). Heavy
# enough that we do not run it on every push to a feature branch — only
# when there is an open PR aimed at `development`/`main`, or after a
# merge into `development`. The unit jobs (`go-unit.yaml`,
# `ui-test.yaml`) keep guarding fast feedback on every push.
on:
  pull_request:
    branches:
      - development
      - main
    paths:
      - 'backend/**'
      - 'gateway/**'
      - 'game/**'
      - 'pkg/**'
      - 'ui/core/**'
      - 'integration/**'
      - 'go.work'
      - 'go.work.sum'
      - '.gitea/workflows/integration.yaml'
      - '!**/*.md'
  push:
    branches:
      - development
    paths:
      - 'backend/**'
      - 'gateway/**'
      - 'game/**'
      - 'pkg/**'
      - 'ui/core/**'
      - 'integration/**'
      - 'go.work'
      - 'go.work.sum'
      - '.gitea/workflows/integration.yaml'
      - '!**/*.md'
jobs:
  integration:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.work
          cache: true
      - name: Run integration suite
        # `make integration` precleans leftover docker-compose state and
        # then runs every test under integration/ serially (-p=1
        # -parallel=1, 15-minute per-test timeout). Testcontainers
        # reaches the host's docker daemon via the socket Gitea exposes
        # to the runner; the workflow inherits the same access the
        # runner has.
        run: make -C integration integration
.gitea/workflows/prod-build.yaml (+116)
@@ -0,0 +1,116 @@
name: Build · Prod
# Builds the production-grade Docker images and the UI bundle on every
# merge into `main`, then saves the artifacts so a future
# `deploy-prod.yaml` run can ship them to the production host. This
# workflow does not deploy anything by itself — production rollout is
# strictly manual (workflow_dispatch on `deploy-prod.yaml`).
on:
  push:
    branches:
      - main
    paths:
      - 'backend/**'
      - 'gateway/**'
      - 'game/**'
      - 'pkg/**'
      - 'ui/**'
      - 'go.work'
      - 'go.work.sum'
      - '.gitea/workflows/prod-build.yaml'
      - '!**/*.md'
jobs:
  build:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.work
          cache: true
      - name: Set up pnpm
        uses: pnpm/action-setup@v4
        with:
          version: 11.0.7
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: pnpm
          cache-dependency-path: ui/pnpm-lock.yaml
      - name: Resolve image tag
        id: tag
        run: |
          short_sha=$(git rev-parse --short=12 HEAD)
          echo "tag=commit-${short_sha}" >>"$GITHUB_OUTPUT"
      - name: Build backend image
        run: |
          docker build \
            -t "galaxy/backend:${{ steps.tag.outputs.tag }}" \
            -f backend/Dockerfile \
            .
      - name: Build gateway image
        run: |
          docker build \
            -t "galaxy/gateway:${{ steps.tag.outputs.tag }}" \
            -f gateway/Dockerfile \
            .
      - name: Build engine image
        run: |
          docker build \
            -t "galaxy/game-engine:${{ steps.tag.outputs.tag }}" \
            -f game/Dockerfile \
            .
      - name: Install UI dependencies
        working-directory: ui
        run: pnpm install --frozen-lockfile
      - name: Build UI bundle
        working-directory: ui/frontend
        env:
          VITE_GATEWAY_BASE_URL: https://api.galaxy.com
        run: |
          # Production response-signing public key is not in the repo
          # yet (the dev key in `tools/local-dev/keys/` is for dev
          # only). When real prod keys exist, source them from a Gitea
          # Actions secret and set VITE_GATEWAY_RESPONSE_PUBLIC_KEY
          # here. Until then the prod bundle compiles with the dev
          # key as a placeholder so the artifact exists.
          export VITE_GATEWAY_RESPONSE_PUBLIC_KEY="$(grep -E '^VITE_GATEWAY_RESPONSE_PUBLIC_KEY=' .env.development | cut -d= -f2)"
          pnpm build
      - name: Save images as artifact bundles
        run: |
          mkdir -p artifacts
          docker save "galaxy/backend:${{ steps.tag.outputs.tag }}" \
            | gzip >"artifacts/backend-${{ steps.tag.outputs.tag }}.tar.gz"
          docker save "galaxy/gateway:${{ steps.tag.outputs.tag }}" \
            | gzip >"artifacts/gateway-${{ steps.tag.outputs.tag }}.tar.gz"
          docker save "galaxy/game-engine:${{ steps.tag.outputs.tag }}" \
            | gzip >"artifacts/game-engine-${{ steps.tag.outputs.tag }}.tar.gz"
          tar -C ui/frontend -czf \
            "artifacts/ui-dist-${{ steps.tag.outputs.tag }}.tar.gz" build
      - name: Upload images
        uses: actions/upload-artifact@v4
        with:
          name: galaxy-images-${{ steps.tag.outputs.tag }}
          path: artifacts/*.tar.gz
          retention-days: 30
.gitea/workflows/ui-release.yaml (-148, deleted)
@@ -1,148 +0,0 @@
name: ui-release
# Tier 2 (release) workflow. Runs on tag push.
#
# Currently mirrors the Tier 1 step set. Visual regression baseline
# checks and the macOS-runner iOS smoke job are landed in later phases
# of ui/PLAN.md and live as commented sections at the end of this file
# until those phases ship.
on:
  push:
    tags:
      - 'v*'
jobs:
  test:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.work
          cache: true
      - name: Run Go tests
        # client/ is the deprecated Fyne client; excluded from CI per
        # ui/PLAN.md §74. -count=1 disables Go's test cache so a green
        # run never depends on a previous runner's cached state. The
        # backend suite is run with -p 1 because most backend packages
        # spawn their own Postgres testcontainer, and parallel
        # Postgres bootstraps starve each other on a constrained
        # runner. pkg modules are listed one by one because ./pkg/...
        # does not recurse across the independent go.work modules
        # under pkg/.
        run: |
          go test -count=1 -p 1 ./backend/...
          go test -count=1 \
            ./gateway/... \
            ./game/... \
            ./ui/core/... \
            ./pkg/calc/... \
            ./pkg/connector/... \
            ./pkg/cronutil/... \
            ./pkg/error/... \
            ./pkg/geoip/... \
            ./pkg/model/... \
            ./pkg/postgres/... \
            ./pkg/redisconn/... \
            ./pkg/schema/... \
            ./pkg/storage/... \
            ./pkg/transcoder/... \
            ./pkg/util/...
      - name: Set up pnpm
        uses: pnpm/action-setup@v4
        with:
          version: 11.0.7
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: pnpm
          cache-dependency-path: ui/pnpm-lock.yaml
      - name: Install npm dependencies
        working-directory: ui
        run: pnpm install --frozen-lockfile
      - name: Install Playwright browsers
        working-directory: ui/frontend
        run: pnpm exec playwright install --with-deps
      - name: Run Vitest
        working-directory: ui/frontend
        run: pnpm test
      - name: Run Playwright
        working-directory: ui/frontend
        run: pnpm exec playwright test
      - name: Upload Playwright report on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: ui/frontend/playwright-report/
          retention-days: 14
      - name: Upload Playwright traces on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-traces
          path: ui/frontend/test-results/
          retention-days: 14
# visual-regression: enabled in Phase 33 of ui/PLAN.md, once the PWA
# shell and service worker land and a snapshot baseline is committed
# under ui/frontend/tests/__snapshots__/.
#
# visual-regression:
#   runs-on: ubuntu-latest
#   needs: test
#   steps:
#     - uses: actions/checkout@v4
#     - uses: pnpm/action-setup@v4
#       with: { version: 11.0.7 }
#     - uses: actions/setup-node@v4
#       with:
#         node-version: 22
#         cache: pnpm
#         cache-dependency-path: ui/pnpm-lock.yaml
#     - working-directory: ui
#       run: pnpm install --frozen-lockfile
#     - working-directory: ui/frontend
#       run: pnpm exec playwright install --with-deps
#     - working-directory: ui/frontend
#       run: pnpm exec playwright test --grep @visual
# ios-smoke: enabled in Phase 32 of ui/PLAN.md, once the Capacitor
# wrapper lands. Runs a Capacitor + Appium smoke against an iOS
# simulator on a macOS runner.
#
# ios-smoke:
#   runs-on: macos-13
#   needs: test
#   steps:
#     - uses: actions/checkout@v4
#     - uses: pnpm/action-setup@v4
#       with: { version: 11.0.7 }
#     - uses: actions/setup-node@v4
#       with:
#         node-version: 22
#         cache: pnpm
#         cache-dependency-path: ui/pnpm-lock.yaml
#     - working-directory: ui
#       run: pnpm install --frozen-lockfile
#     - working-directory: ui/mobile
#       run: pnpm exec cap sync ios && pnpm exec appium-smoke ios
.gitea/workflows/ui-test.yaml (+33 -60)
@@ -1,41 +1,33 @@
-name: ui-test
+name: Tests · UI
-# Tier 1 (per-PR) workflow. Runs Vitest + Playwright for the UI client and
-# the monorepo Go service tests (everything except the integration suite,
-# which lives behind `make -C integration integration` and needs a Docker
-# daemon set up for testcontainers).
-#
-# The path filter is intentionally broad until a dedicated go-test
-# workflow is introduced; this is the only CI gate today.
+# UI-side unit and end-to-end tests (Vitest + Playwright). The Go side
+# of the workspace is tested in `go-unit.yaml`. Both workflows can run
+# in parallel for a push that touches Go and UI together.
 on:
   push:
     paths:
       - 'ui/**'
-      - 'backend/**'
-      - 'gateway/**'
-      - 'game/**'
-      - 'pkg/**'
-      - 'go.work'
-      - 'go.work.sum'
       - '.gitea/workflows/ui-test.yaml'
+      # Skip docs-only commits. Negation removes pure markdown changes;
+      # mixed commits (code + .md) still match a positive pattern above
+      # and trigger the workflow. Image and other binary asset paths
+      # are already outside the positive list.
       - '!**/*.md'
   pull_request:
     paths:
       - 'ui/**'
-      - 'backend/**'
-      - 'gateway/**'
-      - 'game/**'
-      - 'pkg/**'
-      - 'go.work'
-      - 'go.work.sum'
       - '.gitea/workflows/ui-test.yaml'
       - '!**/*.md'
+# Playwright launches its own `pnpm dev` on :5173, and in host-mode
+# the runner shares the host's port namespace with every other job,
+# so two parallel ui-test runs collide on EADDRINUSE. Serialise via a
+# singleton concurrency group with queueing — new runs wait their
+# turn instead of cancelling the in-progress one. cancel-in-progress
+# is explicitly false because Gitea has shown spurious self-cancel
+# behaviour under cancel-in-progress: true even when no other run
+# shares the group.
+concurrency:
+  group: ui-test-singleton
+  cancel-in-progress: false
 jobs:
   test:
     runs-on: ubuntu-latest
@@ -48,41 +40,6 @@ jobs:
         with:
           submodules: recursive
-      - name: Set up Go
uses: actions/setup-go@v5
with:
go-version-file: go.work
cache: true
- name: Run Go tests
# client/ is the deprecated Fyne client; excluded from CI per
# ui/PLAN.md §74. -count=1 disables Go's test cache so a green
# run never depends on a previous runner's cached state. The
# backend suite is run with -p 1 because most backend packages
# spawn their own Postgres testcontainer, and parallel
# Postgres bootstraps starve each other on a constrained
# runner. pkg modules are listed one by one because ./pkg/...
# does not recurse across the independent go.work modules
# under pkg/.
run: |
go test -count=1 -p 1 ./backend/...
go test -count=1 \
./gateway/... \
./game/... \
./ui/core/... \
./pkg/calc/... \
./pkg/connector/... \
./pkg/cronutil/... \
./pkg/error/... \
./pkg/geoip/... \
./pkg/model/... \
./pkg/postgres/... \
./pkg/redisconn/... \
./pkg/schema/... \
./pkg/storage/... \
./pkg/transcoder/... \
./pkg/util/...
- name: Set up pnpm - name: Set up pnpm
uses: pnpm/action-setup@v4 uses: pnpm/action-setup@v4
with: with:
@@ -100,13 +57,29 @@ jobs:
run: pnpm install --frozen-lockfile run: pnpm install --frozen-lockfile
- name: Install Playwright browsers - name: Install Playwright browsers
# `--with-deps` would shell out to `sudo apt-get install` for
# the system .so libraries, which the host-mode runner cannot
# run non-interactively. The host has the deps installed once,
# globally; we only need to fetch the browser binaries here.
# If a future run fails with missing libraries, install them
# on the host via `pnpm exec playwright install-deps` (one
# shot, requires sudo).
working-directory: ui/frontend working-directory: ui/frontend
run: pnpm exec playwright install --with-deps run: pnpm exec playwright install
- name: Run Vitest - name: Run Vitest
working-directory: ui/frontend working-directory: ui/frontend
run: pnpm test run: pnpm test
- name: Clear stale Vite from :5173
# Defence in depth in case a previous job's webServer survived
# the concurrency-cancel — `pkill` does not fail when there is
# nothing to kill, and `fuser -k` cleans up anything else
# holding the port.
run: |
pkill -f 'vite dev' || true
fuser -k 5173/tcp 2>/dev/null || true
- name: Run Playwright - name: Run Playwright
working-directory: ui/frontend working-directory: ui/frontend
run: pnpm exec playwright test run: pnpm exec playwright test
+35 -20
@@ -34,32 +34,47 @@ This repository hosts the Galaxy Game project.
   deeper than what fits in `README.md` (per-feature design notes,
   protocol specs, runbooks). Not stage-by-stage history.
+## Branching and CI flow
+
+Branches:
+
+- `main` — production-track. Direct pushes are disallowed; the only
+  way in is a PR merge from `development`. A merge fires
+  `prod-build.yaml` which packages the artifacts; production rollout
+  is manual through `deploy-prod.yaml`.
+- `development` — long-lived dev integration branch. Every merge into
+  it auto-deploys to the dev environment via `dev-deploy.yaml`
+  (reachable at `https://www.galaxy.lan` / `https://api.galaxy.lan`).
+- `feature/*` — short-lived branches off `development`. Merged back
+  via PR; only then do they reach the dev environment.
+
+Workflows in `.gitea/workflows/`:
+
+| File | Trigger | What it does |
+|------|---------|--------------|
+| `go-unit.yaml` | push + PR matching Go paths | Fast Go unit tests. |
+| `ui-test.yaml` | push + PR matching `ui/**` | Vitest + Playwright. |
+| `integration.yaml` | PR to `development`/`main`; push to `development` | testcontainers integration suite. |
+| `dev-deploy.yaml` | push to `development` | Build images + (re)deploy to `tools/dev-deploy/`. |
+| `prod-build.yaml` | push to `main` | Build prod images and `docker save` into artifacts. |
+| `deploy-prod.yaml` | `workflow_dispatch` | Manual rollout (placeholder until prod host exists). |
 ## Per-stage CI gate
 Every completed stage from any `PLAN.md` (per-service or `ui/PLAN.md`)
-must be exercised on the local Gitea Actions runner before being
-declared done. The runbook lives in `tools/local-ci/README.md`; the
-short version is:
+must be exercised on `gitea.lan` before being declared done. The
+short version:
-1. Commit the stage changes.
+1. Commit the stage changes on the feature branch.
-2. `make -C tools/local-ci push` pushes `HEAD` to the local Gitea
-   instance and triggers every workflow that matches the changed
-   paths.
-3. Poll the latest run via the API snippet in `ui/docs/testing.md`
-   (or the Gitea UI on `http://localhost:3000`) until it leaves
+2. `git push gitea …` to publish the branch.
+3. Poll the latest run in the Gitea UI (or the API) until it leaves
    `running`. Inspect the log on failure.
-4. Only after the run is `success` may the stage be marked done in
-   the corresponding `PLAN.md`.
+4. Only after every workflow that fired is `success` may the stage be
+   marked done in the corresponding `PLAN.md`.
-This applies even when the local unit-test suite is green —
-workflow-only failures (path filters, action-version mismatches,
-missing secrets, runner-only environment differences) are cheap to
-catch here and expensive to catch on a remote PR. The push step is
-implicitly authorised: do not ask for confirmation on every stage.
-If `tools/local-ci` is not running, bring it up first
-(`make -C tools/local-ci up`); do not skip this gate. The single
-exception is when the user explicitly waives it for a stage.
+`tools/local-ci/` is now an opt-in fallback for testing workflow
+changes without `gitea.lan` (offline iterations, runner-isolation
+debugging). It is no longer required for the per-stage gate.
 ## Decisions during stage implementation
+45 -2
@@ -751,7 +751,50 @@ addition.
 `GET /readyz` (Postgres reachable, migrations applied, gRPC listener
 bound). Probes are excluded from anti-replay and rate limiting.
-## 18. Deployment Topology (informational)
+## 18. CI and Environments
+
+The repository is a monorepo and intentionally so — semver tags and
+per-service rollouts are achievable without splitting the code into
+multiple repositories.
+
+Branches:
+
+- `main` — production-track. Direct pushes are disallowed; the only
+  way in is a PR merge from `development`.
+- `development` — long-lived dev integration branch. Every merge
+  triggers an auto-deploy into the long-lived dev environment on the
+  CI host, reachable through the host Caddy at
+  `https://www.galaxy.lan` and `https://api.galaxy.lan`.
+- `feature/*` — short-lived branches off `development`. Merged back
+  via PR; PRs run unit + integration checks before merge.
+
+Workflows under `.gitea/workflows/`:
+
+| File | Trigger | Purpose |
+|------|---------|---------|
+| `go-unit.yaml` | push + PR matching Go paths | Fast Go unit tests. |
+| `ui-test.yaml` | push + PR matching `ui/**` | Vitest + Playwright. |
+| `integration.yaml` | PR to `development` / `main`; push to `development` | testcontainers integration suite. |
+| `dev-deploy.yaml` | push to `development` | Build images, seed UI volume, `compose up` against `tools/dev-deploy/`. |
+| `prod-build.yaml` | push to `main` | Build production images and persist `docker save` bundles as artifacts. |
+| `deploy-prod.yaml` | manual `workflow_dispatch` | Placeholder for the future SSH-based production rollout. |
+
+Environments:
+
+- **`tools/local-dev/`** — single-developer playground. Bound to
+  host ports, Vite dev server runs on the host. Not driven by CI.
+- **`tools/dev-deploy/`** — long-lived dev environment behind
+  `*.galaxy.lan`, redeployed on every merge into `development`.
+- **production** — future. Images come from the
+  `galaxy-images-commit-<sha>` artifact produced by `prod-build.yaml`
+  and are shipped to the production host via `docker save` →
+  `ssh prod docker load` → `docker compose up -d`.
+
+`tools/local-ci/` remains as an opt-in fallback runner for testing
+workflow changes without `gitea.lan`. It is no longer part of the
+per-stage CI gate; see `CLAUDE.md` for the gate definition.
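The save-and-ship rollout in the production bullet above can be sketched in a few shell commands. This is an illustrative sketch only — `prod`, the image names, and the bundle path are placeholders, not the real deploy plumbing:

```sh
# Hypothetical shape of the prod-build → deploy-prod handoff.
set -euo pipefail

sha="$(git rev-parse --short HEAD)"
bundle="galaxy-images-commit-${sha}.tar"

# prod-build side: bundle the freshly built images into one artifact.
docker save -o "${bundle}" galaxy-backend:"${sha}" galaxy-api:"${sha}"

# deploy-prod side: ship the bundle, load it, roll the stack.
scp "${bundle}" "prod:/tmp/${bundle}"
ssh prod "docker load -i /tmp/${bundle} && docker compose up -d"
```

The `docker save`/`docker load` pair keeps the production host registry-free, at the cost of shipping the full image tarball on every rollout.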
+## 19. Deployment Topology (informational)
 - MVP runs three executables: one `gateway` instance, one `backend`
   instance, and N `galaxy-game-{game_id}` containers managed by backend.
@@ -770,7 +813,7 @@ Future scale-out hooks (not in MVP):
 - mTLS between gateway and backend.
 - Docker-socket-proxy sidecar fronting Docker daemon access.
-## 19. Glossary
+## 20. Glossary
 - **device_session_id** — opaque identifier of an authenticated client
   device; primary key of the device session record.
+22
@@ -40,6 +40,12 @@ const (
 	// the keep-alive idle timeout for the public REST listener.
 	publicHTTPIdleTimeoutEnvVar = "GATEWAY_PUBLIC_HTTP_IDLE_TIMEOUT"
+	// publicHTTPCORSAllowedOriginsEnvVar names the environment variable that
+	// configures the comma-separated list of browser origins permitted to
+	// call the public REST surface. An empty value disables CORS entirely;
+	// requests without an Origin header still pass through normally.
+	publicHTTPCORSAllowedOriginsEnvVar = "GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS"
 	// publicAuthUpstreamTimeoutEnvVar names the environment variable that
 	// configures the timeout budget used for public auth upstream calls.
 	publicAuthUpstreamTimeoutEnvVar = "GATEWAY_PUBLIC_AUTH_UPSTREAM_TIMEOUT"
@@ -457,6 +463,12 @@ type PublicHTTPConfig struct {
 	// AntiAbuse configures the public REST anti-abuse middleware.
 	AntiAbuse PublicHTTPAntiAbuseConfig
+	// CORSAllowedOrigins is the exact-match list of browser origins
+	// permitted to call the public REST surface. Empty disables CORS:
+	// requests without an Origin header continue to work, cross-origin
+	// requests are subject to the browser's default same-origin policy.
+	CORSAllowedOrigins []string
 }
 // BackendConfig describes the consolidated backend service the gateway
@@ -814,6 +826,16 @@ func LoadFromEnv() (Config, error) {
 	}
 	cfg.PublicHTTP.AuthUpstreamTimeout = publicAuthUpstreamTimeout
+	if v, ok := os.LookupEnv(publicHTTPCORSAllowedOriginsEnvVar); ok {
+		origins := make([]string, 0)
+		for part := range strings.SplitSeq(v, ",") {
+			if trimmed := strings.TrimSpace(part); trimmed != "" {
+				origins = append(origins, trimmed)
+			}
+		}
+		cfg.PublicHTTP.CORSAllowedOrigins = origins
+	}
 	if v, ok := os.LookupEnv(backendHTTPURLEnvVar); ok {
 		cfg.Backend.HTTPBaseURL = v
 	}
+18
@@ -158,6 +158,7 @@ func TestLoadFromEnvAppliesPublicAndAuthGRPCDefaults(t *testing.T) {
 	assert.Equal(t, defaultPublicHTTPReadTimeout, cfg.PublicHTTP.ReadTimeout)
 	assert.Equal(t, defaultPublicHTTPIdleTimeout, cfg.PublicHTTP.IdleTimeout)
 	assert.Equal(t, defaultPublicAuthUpstreamTimeout, cfg.PublicHTTP.AuthUpstreamTimeout)
+	assert.Empty(t, cfg.PublicHTTP.CORSAllowedOrigins, "default disables CORS")
 	assert.Equal(t, defaultAuthenticatedGRPCAddr, cfg.AuthenticatedGRPC.Addr)
 	assert.Equal(t, defaultAuthenticatedGRPCConnectionTimeout, cfg.AuthenticatedGRPC.ConnectionTimeout)
@@ -165,6 +166,22 @@
 	assert.Equal(t, defaultAuthenticatedGRPCFreshnessWindow, cfg.AuthenticatedGRPC.FreshnessWindow)
 }
+func TestLoadFromEnvParsesCORSAllowedOrigins(t *testing.T) {
+	configEnvMu.Lock()
+	defer configEnvMu.Unlock()
+	resetEnv(t)
+	setBaseRequiredEnv(t)
+	t.Setenv(publicHTTPCORSAllowedOriginsEnvVar, "https://www.galaxy.lan, , https://staging.galaxy.lan")
+	cfg, err := LoadFromEnv()
+	require.NoError(t, err)
+	assert.Equal(t,
+		[]string{"https://www.galaxy.lan", "https://staging.galaxy.lan"},
+		cfg.PublicHTTP.CORSAllowedOrigins,
+		"comma-separated list is split, whitespace-trimmed, and empty segments dropped")
+}
 // resetEnv clears every env var the gateway config might read so that
 // individual tests can build the exact environment they need without
 // leakage from a previous test.
@@ -179,6 +196,7 @@ func resetEnv(t *testing.T) {
 		publicHTTPReadTimeoutEnvVar,
 		publicHTTPIdleTimeoutEnvVar,
 		publicAuthUpstreamTimeoutEnvVar,
+		publicHTTPCORSAllowedOriginsEnvVar,
 		backendHTTPURLEnvVar,
 		backendGRPCPushURLEnvVar,
 		backendGatewayClientIDEnvVar,
+52
@@ -0,0 +1,52 @@
package restapi
import (
"net/http"
"github.com/gin-gonic/gin"
)
// withCORS returns a gin middleware that handles browser CORS preflight and
// attaches Access-Control-Allow-* response headers when the request's Origin
// is on the configured allow-list. Origins are compared exactly: scheme,
// host, and port must match. An empty allow-list disables the middleware —
// requests pass through untouched. Requests without an Origin header always
// pass through, the middleware only acts when a browser actually asks.
//
// The middleware mounts before the anti-abuse layer so OPTIONS preflights
// do not count against the rate-limit buckets for the eventual real call.
func withCORS(allowedOrigins []string) gin.HandlerFunc {
allowed := make(map[string]struct{}, len(allowedOrigins))
for _, origin := range allowedOrigins {
allowed[origin] = struct{}{}
}
if len(allowed) == 0 {
return func(c *gin.Context) { c.Next() }
}
return func(c *gin.Context) {
origin := c.GetHeader("Origin")
if origin == "" {
c.Next()
return
}
if _, ok := allowed[origin]; !ok {
c.Next()
return
}
c.Header("Access-Control-Allow-Origin", origin)
c.Header("Vary", "Origin")
c.Header("Access-Control-Allow-Credentials", "true")
if c.Request.Method == http.MethodOptions {
c.Header("Access-Control-Allow-Methods", "GET, POST, PUT, PATCH, DELETE, OPTIONS")
if reqHeaders := c.GetHeader("Access-Control-Request-Headers"); reqHeaders != "" {
c.Header("Access-Control-Allow-Headers", reqHeaders)
} else {
c.Header("Access-Control-Allow-Headers", "Content-Type, Authorization")
}
c.Header("Access-Control-Max-Age", "3600")
c.AbortWithStatus(http.StatusNoContent)
return
}
c.Next()
}
}
+120
@@ -0,0 +1,120 @@
package restapi
import (
"net/http"
"net/http/httptest"
"testing"
"github.com/gin-gonic/gin"
"github.com/stretchr/testify/assert"
)
func init() {
gin.SetMode(gin.TestMode)
}
func newCORSRouter(allowedOrigins []string) *gin.Engine {
router := gin.New()
router.Use(withCORS(allowedOrigins))
router.GET("/api/v1/public/probe", func(c *gin.Context) {
c.JSON(http.StatusOK, statusResponse{Status: "ok"})
})
return router
}
func TestWithCORSAllowsListedOrigin(t *testing.T) {
t.Parallel()
router := newCORSRouter([]string{"https://www.galaxy.lan"})
req := httptest.NewRequest(http.MethodGet, "/api/v1/public/probe", nil)
req.Header.Set("Origin", "https://www.galaxy.lan")
recorder := httptest.NewRecorder()
router.ServeHTTP(recorder, req)
assert.Equal(t, http.StatusOK, recorder.Code)
assert.Equal(t, "https://www.galaxy.lan", recorder.Header().Get("Access-Control-Allow-Origin"))
assert.Equal(t, "Origin", recorder.Header().Get("Vary"))
assert.Equal(t, "true", recorder.Header().Get("Access-Control-Allow-Credentials"))
}
func TestWithCORSPreflightShortCircuits(t *testing.T) {
t.Parallel()
router := newCORSRouter([]string{"https://www.galaxy.lan"})
req := httptest.NewRequest(http.MethodOptions, "/api/v1/public/probe", nil)
req.Header.Set("Origin", "https://www.galaxy.lan")
req.Header.Set("Access-Control-Request-Method", "POST")
req.Header.Set("Access-Control-Request-Headers", "Content-Type, X-Galaxy-Trace")
recorder := httptest.NewRecorder()
router.ServeHTTP(recorder, req)
assert.Equal(t, http.StatusNoContent, recorder.Code)
assert.Equal(t, "https://www.galaxy.lan", recorder.Header().Get("Access-Control-Allow-Origin"))
assert.Contains(t, recorder.Header().Get("Access-Control-Allow-Methods"), "POST")
assert.Equal(t, "Content-Type, X-Galaxy-Trace", recorder.Header().Get("Access-Control-Allow-Headers"))
assert.Equal(t, "3600", recorder.Header().Get("Access-Control-Max-Age"))
assert.Empty(t, recorder.Body.String(), "preflight must not return a body")
}
func TestWithCORSPreflightFallbackHeadersWhenRequestHeadersMissing(t *testing.T) {
t.Parallel()
router := newCORSRouter([]string{"https://www.galaxy.lan"})
req := httptest.NewRequest(http.MethodOptions, "/api/v1/public/probe", nil)
req.Header.Set("Origin", "https://www.galaxy.lan")
recorder := httptest.NewRecorder()
router.ServeHTTP(recorder, req)
assert.Equal(t, http.StatusNoContent, recorder.Code)
assert.Equal(t, "Content-Type, Authorization", recorder.Header().Get("Access-Control-Allow-Headers"))
}
func TestWithCORSRejectsUnknownOrigin(t *testing.T) {
t.Parallel()
router := newCORSRouter([]string{"https://www.galaxy.lan"})
req := httptest.NewRequest(http.MethodGet, "/api/v1/public/probe", nil)
req.Header.Set("Origin", "https://evil.example.com")
recorder := httptest.NewRecorder()
router.ServeHTTP(recorder, req)
assert.Equal(t, http.StatusOK, recorder.Code, "real call must still succeed; the browser is the one that blocks the response")
assert.Empty(t, recorder.Header().Get("Access-Control-Allow-Origin"), "no allow-origin header for rejected origin")
}
func TestWithCORSPassThroughWithoutOriginHeader(t *testing.T) {
t.Parallel()
router := newCORSRouter([]string{"https://www.galaxy.lan"})
req := httptest.NewRequest(http.MethodGet, "/api/v1/public/probe", nil)
recorder := httptest.NewRecorder()
router.ServeHTTP(recorder, req)
assert.Equal(t, http.StatusOK, recorder.Code)
assert.Empty(t, recorder.Header().Get("Access-Control-Allow-Origin"))
}
func TestWithCORSDisabledByEmptyConfig(t *testing.T) {
t.Parallel()
router := newCORSRouter(nil)
req := httptest.NewRequest(http.MethodGet, "/api/v1/public/probe", nil)
req.Header.Set("Origin", "https://www.galaxy.lan")
recorder := httptest.NewRecorder()
router.ServeHTTP(recorder, req)
assert.Equal(t, http.StatusOK, recorder.Code)
assert.Empty(t, recorder.Header().Get("Access-Control-Allow-Origin"))
}
+4
@@ -278,6 +278,10 @@ func newPublicHandlerWithConfig(cfg config.PublicHTTPConfig, deps ServerDependen
 	}))
 	router.Use(otelgin.Middleware("galaxy-edge-gateway-public"))
 	router.Use(withPublicObservability(deps.Logger.Named("public_http"), deps.Telemetry))
+	// CORS runs before the route classifier and anti-abuse layers so
+	// preflight OPTIONS calls answer with 204 immediately and never
+	// count against any rate-limit bucket.
+	router.Use(withCORS(cfg.CORSAllowedOrigins))
 	router.Use(withPublicRouteClass(deps.Classifier))
 	router.Use(withPublicAntiAbuse(cfg.AntiAbuse, deps.Limiter, deps.Observer))
+16 -4
@@ -282,12 +282,24 @@ func TestAppendRandomSuffixGenerator(t *testing.T) {
 }
 func TestRandomSuffixGenerator(t *testing.T) {
-	var last string
-	for range 100 {
+	// The generator draws from a ~10 000 element space (Intn(9999)
+	// formatted as four digits). Comparing each sample against the
+	// previous one with NotEqual flaked ~1 % per 100-iteration run on
+	// natural collisions. Count unique values instead — if the
+	// generator ever gets stuck on a tiny range we still catch it,
+	// without depending on the birthday paradox not firing today.
+	const samples = 200
+	seen := make(map[string]struct{}, samples)
+	for range samples {
 		s := util.RandomSuffixGenerator()
 		assert.Len(t, s, 4)
-		assert.NotEqual(t, last, s)
 		assert.True(t, strings.ContainsFunc(s, func(r rune) bool { return r >= '0' && r <= '9' }))
-		last = s
+		seen[s] = struct{}{}
 	}
+	// In 200 draws from ~10 000 the expected number of unique values
+	// is ~198; a stuck generator (single value) would land at 1, a
+	// 256-value range at ~139. A floor of 150 sits above both failure
+	// modes and well below the healthy mean.
+	assert.GreaterOrEqual(t, len(seen), 150,
+		"RandomSuffixGenerator drew from too small a range over %d samples", samples)
 }
+21
@@ -0,0 +1,21 @@
# Defaults for the long-lived dev stack. Copy to `.env` and edit
# per-environment overrides. Everything in this file is non-secret;
# real credentials would go through Gitea Actions secrets and never
# this file.
#
# The compose `${VAR:-default}` expansions fall back to the values
# baked into `docker-compose.yml`, so this file documents the knobs
# rather than driving them.
# Auto-provisioned sandbox bootstrap. Empty disables the bootstrap.
BACKEND_DEV_SANDBOX_EMAIL=dev@galaxy.lan
BACKEND_DEV_SANDBOX_ENGINE_IMAGE=galaxy-engine:dev
BACKEND_DEV_SANDBOX_ENGINE_VERSION=0.1.0
BACKEND_DEV_SANDBOX_PLAYER_COUNT=20
# `123456` short-circuits the email-code path for the dev account.
# Leave empty in environments where real Mailpit codes must be used.
BACKEND_AUTH_DEV_FIXED_CODE=123456
# Name of the external Docker bridge the host Caddy is attached to.
GALAXY_EDGE_NETWORK=edge
+25
@@ -0,0 +1,25 @@
# Application-routing Caddy for the long-lived dev environment.
# Listens only on the `edge` Docker network; TLS termination and the
# real `:80`/`:443` listeners belong to the host Caddy in front of us.
#
# `/srv/galaxy-ui` is mounted from the `galaxy-dev-ui-dist` named volume,
# refreshed on every dev-deploy run.
{
auto_https off
}
:80 {
@frontend host www.galaxy.lan
handle @frontend {
root * /srv/galaxy-ui
try_files {path} /index.html
file_server
encode zstd gzip
}
@api host api.galaxy.lan
handle @api {
reverse_proxy galaxy-api:8080
}
}
+15
@@ -0,0 +1,15 @@
# Production placeholder. Mirrors `Caddyfile.dev` but uses real
# hostnames and lets Caddy auto-provision TLS certificates. Not used
# until prod-deploy plumbing exists; kept under version control so the
# dev/prod surface stays symmetric.
www.galaxy.com {
root * /srv/galaxy-ui
try_files {path} /index.html
file_server
encode zstd gzip
}
api.galaxy.com {
reverse_proxy galaxy-api:8080
}
+105
@@ -0,0 +1,105 @@
.PHONY: help up down rebuild logs status clean-data health psql build-engine seed-ui
.DEFAULT_GOAL := help
REPO_ROOT := $(realpath $(CURDIR)/../..)
ENGINE_IMAGE := galaxy-engine:dev
ENGINE_LABEL := org.opencontainers.image.title=galaxy-game-engine
# Game-state root lives under the invoking user's home by default so
# `make up` works without sudo. Override `GALAXY_DEV_GAME_STATE_DIR`
# in the environment or `.env` to relocate (e.g. /var/lib/galaxy-dev/
# game-state in a production-shaped host). The value flows through to
# both the compose bind-mount and the backend's
# `BACKEND_GAME_STATE_ROOT`.
export GALAXY_DEV_GAME_STATE_DIR ?= $(HOME)/.galaxy-dev/game-state
COMPOSE := docker compose
help:
@echo "Long-lived Galaxy dev environment (https://*.galaxy.lan):"
@echo " make up Build images, ensure engine image, bring stack up"
@echo " make rebuild Force rebuild of backend / gateway images and bring up"
@echo " make build-engine Build $(ENGINE_IMAGE) from game/Dockerfile (no-op if present)"
@echo " make seed-ui Build ui/frontend and load into galaxy-dev-ui-dist volume"
@echo " make down Stop containers, keep named volumes"
@echo " make logs Tail all logs"
@echo " make status docker compose ps"
@echo " make health Probe the stack through the host Caddy"
@echo " make psql Open a psql shell as galaxy@galaxy_backend"
@echo " make clean-data Stop everything and wipe named volumes + game-state"
@echo ""
@echo "Requires:"
@echo " - external Docker network '$${GALAXY_EDGE_NETWORK:-edge}'"
@echo " (docker network create edge)"
@echo " - host Caddy proxying *.galaxy.lan into that network"
@echo " - game-state dir: $(GALAXY_DEV_GAME_STATE_DIR) (auto-created)"
up: build-engine
mkdir -p "$(GALAXY_DEV_GAME_STATE_DIR)"
$(COMPOSE) up -d --wait
rebuild: build-engine
$(COMPOSE) build --no-cache galaxy-backend galaxy-api
mkdir -p "$(GALAXY_DEV_GAME_STATE_DIR)"
$(COMPOSE) up -d --wait
build-engine:
@if docker image inspect $(ENGINE_IMAGE) >/dev/null 2>&1; then \
echo "$(ENGINE_IMAGE) already built; skipping (use 'docker rmi $(ENGINE_IMAGE)' to force a rebuild)."; \
else \
echo "building $(ENGINE_IMAGE)"; \
docker build -t $(ENGINE_IMAGE) -f $(REPO_ROOT)/game/Dockerfile $(REPO_ROOT); \
fi
# Build the UI frontend and load the resulting build/ directory into
# the named volume Caddy serves from. Used by the dev-deploy workflow
# and by anyone bringing the stack up by hand.
seed-ui:
@if [ ! -d $(REPO_ROOT)/ui/frontend/node_modules ]; then \
echo "installing UI dependencies…"; \
(cd $(REPO_ROOT)/ui && pnpm install --frozen-lockfile); \
fi
@echo "building UI (vite build)…"
(cd $(REPO_ROOT)/ui/frontend && \
VITE_GATEWAY_BASE_URL=https://api.galaxy.lan \
VITE_GATEWAY_RESPONSE_PUBLIC_KEY=$$(cat $(REPO_ROOT)/ui/frontend/.env.development \
| sed -n 's/^VITE_GATEWAY_RESPONSE_PUBLIC_KEY=//p') \
pnpm build)
@echo "loading build/ into galaxy-dev-ui-dist volume…"
docker volume create galaxy-dev-ui-dist >/dev/null
docker run --rm \
-v galaxy-dev-ui-dist:/dst \
-v $(REPO_ROOT)/ui/frontend/build:/src:ro \
alpine sh -c 'rm -rf /dst/* /dst/.??* 2>/dev/null; cp -a /src/. /dst/'
down:
$(COMPOSE) down
logs:
$(COMPOSE) logs -f --tail=100
status:
$(COMPOSE) ps
health:
@echo "Frontend (https://www.galaxy.lan):"
@curl -sS -o /dev/null -w " HTTP %{http_code}\n" https://www.galaxy.lan/ || echo " unreachable"
@echo "API healthz (https://api.galaxy.lan/healthz):"
@curl -sS -o /dev/null -w " HTTP %{http_code}\n" https://api.galaxy.lan/healthz || echo " unreachable"
psql:
$(COMPOSE) exec galaxy-postgres psql -U galaxy -d galaxy_backend
clean-data:
@echo "Stopping containers and engines, then wiping volumes + game-state…"
@ids=$$(docker ps -aq --filter label=$(ENGINE_LABEL)); \
if [ -n "$$ids" ]; then \
echo "stopping engine containers…"; \
docker rm -f $$ids >/dev/null; \
fi
$(COMPOSE) down -v
@if [ -d "$(GALAXY_DEV_GAME_STATE_DIR)" ]; then \
echo "wiping $(GALAXY_DEV_GAME_STATE_DIR)"; \
docker run --rm -v "$(GALAXY_DEV_GAME_STATE_DIR):/state" alpine sh -c 'rm -rf /state/*' 2>/dev/null \
|| rm -rf "$(GALAXY_DEV_GAME_STATE_DIR)"/* 2>/dev/null || true; \
fi
+188
@@ -0,0 +1,188 @@
# `tools/dev-deploy/` — long-lived Galaxy dev environment
A docker-compose stack that runs the Galaxy backend, gateway, supporting
services, and a small Caddy in front of them, reachable through the host
Caddy at `https://www.galaxy.lan` and `https://api.galaxy.lan`. Used by
the `dev-deploy.yaml` Gitea Actions workflow as the canonical dev target
on every merge into the `development` branch, and runnable by hand
through this Makefile for local debugging of the deploy plumbing
itself.
This stack is **not** the developer's primary playground for UI work —
that role still belongs to [`tools/local-dev/`](../local-dev/README.md),
which is faster (Vite HMR, host-side dev server) and isolated to one
developer. The two stacks coexist on the same host because every name
is distinct:
| | `tools/local-dev/` | `tools/dev-deploy/` |
|------------------|------------------------------|-----------------------------|
| Compose project | `local-dev` | `galaxy-dev` |
| Container prefix | `galaxy-local-dev-*` | `galaxy-dev-*` |
| Network | `galaxy-local-dev-net` | `galaxy-dev-internal`, `edge` |
| Volumes | `galaxy-local-dev-*` | `galaxy-dev-*` |
| Host ports | 5433/6380/8025/8080/9090 | none (only `edge` network) |
| Game state | `/tmp/galaxy-game-state` | `/var/lib/galaxy-dev/game-state` |
| Engine image | `galaxy-engine:local-dev` | `galaxy-engine:dev` |
## Prerequisites
The host must already provide:
- Docker daemon reachable as the user running `make` (member of the
`docker` group, no sudo).
- An external bridge network named `edge` (or whatever
`GALAXY_EDGE_NETWORK` overrides to):
```sh
docker network create edge
```
- A host Caddy listening on `:80`/`:443`, attached to the `edge`
network, and proxying `www.galaxy.lan` and `api.galaxy.lan` to
`galaxy-caddy:80`. Example fragment for the host Caddyfile:
```caddy
www.galaxy.lan, api.galaxy.lan {
tls internal
reverse_proxy galaxy-caddy:80
}
```
- Game-state directory writable by the user running `make`. Default
is `${HOME}/.galaxy-dev/game-state`; `make up` creates it on demand.
Override by exporting `GALAXY_DEV_GAME_STATE_DIR` (e.g. to
`/var/lib/galaxy-dev/game-state` once the host is provisioned for
it).
## Bring it up
```sh
make -C tools/dev-deploy up
```
`up` (re)builds the backend and gateway images (from the Dockerfiles
shared with `tools/local-dev/`), makes sure the engine image
`galaxy-engine:dev` exists, and waits for healthchecks. It
does **not** seed the UI volume — that is normally done by CI. The first
time you run the stack by hand:
```sh
make -C tools/dev-deploy seed-ui
make -C tools/dev-deploy up
make -C tools/dev-deploy health
```
`seed-ui` runs `pnpm build` in `ui/frontend/`, then copies the resulting
`build/` tree into the `galaxy-dev-ui-dist` volume. Subsequent CI deploys
overwrite this volume automatically.
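Conceptually, `seed-ui` amounts to the following (a sketch, not the authoritative Makefile recipe; paths and the exact copy mechanics may differ):

```shell
# Build the UI bundle on the host...
(cd ui/frontend && pnpm build)

# ...then copy build/ into the named volume via a throwaway container,
# replacing whatever a previous seed or CI deploy left there.
docker run --rm \
  -v "$PWD/ui/frontend/build:/src:ro" \
  -v galaxy-dev-ui-dist:/dst \
  alpine sh -c 'rm -rf /dst/* && cp -a /src/. /dst/'
```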
## Daily flow
```sh
make -C tools/dev-deploy rebuild # rebuild backend/gateway images + up
make -C tools/dev-deploy logs # tail compose logs
make -C tools/dev-deploy health # probe https://*.galaxy.lan
make -C tools/dev-deploy down # stop, keep state
```
State persists in named volumes between `up`/`down` cycles. The
`development` branch keeps the dev environment continuously usable —
games created last week survive into this week unless somebody
calls `make clean-data`.
## Logging in
The same dev-mode email-code override as `tools/local-dev/` applies:
1. Enter `dev@galaxy.lan` (or whatever `BACKEND_DEV_SANDBOX_EMAIL`
resolves to) in the login form.
2. Submit `123456` as the code if `BACKEND_AUTH_DEV_FIXED_CODE` is
non-empty. Otherwise open Mailpit at
`http://galaxy-mailpit:8025/` from inside the network or proxy it
through the host Caddy when needed.
The fixed-code override is rejected by production env loaders, so it
cannot leak into the prod environment.
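When the fixed code is disabled, the code can also be fished out of Mailpit from the command line. A sketch assuming Mailpit's v1 HTTP API (`/api/v1/messages`) and a six-digit code format:

```shell
# Grab the newest message and extract the first 6-digit code from it.
docker compose -f tools/dev-deploy/docker-compose.yml exec galaxy-mailpit \
  wget -qO- 'http://localhost:8025/api/v1/messages?limit=1' \
  | grep -oE '[0-9]{6}' | head -n1
```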
## Networking
```
Browser
   │ https://www.galaxy.lan, https://api.galaxy.lan
host-Caddy (:80, :443, TLS, attached to `edge` network)
   │ reverse_proxy *.galaxy.lan → galaxy-caddy:80
galaxy-caddy (networks: edge + galaxy-dev-internal)
   │ www.galaxy.lan → file_server /srv/galaxy-ui (volume galaxy-dev-ui-dist)
   │ api.galaxy.lan → reverse_proxy galaxy-api:8080
galaxy-dev-internal
   ├─ galaxy-api      (gateway: :8080 REST, :9090 gRPC)
   ├─ galaxy-backend  (backend: :8080 HTTP, :8081 gRPC push)
   ├─ galaxy-postgres (postgres: :5432)
   ├─ galaxy-redis    (redis: :6379)
   ├─ galaxy-mailpit  (mailpit: :8025 UI, :1025 SMTP)
   └─ engine containers (spawned by backend on demand)
```
The compose project deliberately exposes no host ports. Diagnostics
that used to go through `localhost:8025` etc. now go through the
container network: `docker compose -f tools/dev-deploy/docker-compose.yml
exec galaxy-mailpit wget -qO- localhost:8025/messages` and similar.
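The same pattern covers the other services; for example (a sketch; service names per the compose file, credentials per the dev defaults):

```shell
cd tools/dev-deploy

# Redis: authenticated PING should answer PONG.
docker compose exec galaxy-redis redis-cli -a galaxy-dev PING

# Postgres: readiness probe as the galaxy user.
docker compose exec galaxy-postgres pg_isready -U galaxy -d galaxy_backend

# Gateway: same healthz endpoint the compose healthcheck polls.
docker compose exec galaxy-api wget -qO- http://localhost:8080/healthz
```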
## Persistent state and schema changes
The dev Postgres volume `galaxy-dev-postgres-data` survives redeploys.
Until the pre-production migration rule is lifted, every
backward-incompatible change to `backend/internal/postgres/migrations/00001_init.sql`
needs a manual wipe before the next deploy succeeds:
```sh
make -C tools/dev-deploy clean-data
make -C tools/dev-deploy up
```
This is the same caveat as `tools/local-dev/`, just with a different
volume name.
## Make targets
```text
make up            Build images, ensure engine image, bring stack up (waits for health)
make rebuild       Rebuild backend / gateway images (ignores cache), then up
make seed-ui       pnpm build + load build/ into galaxy-dev-ui-dist volume
make build-engine  Build galaxy-engine:dev (no-op if image already present)
make down          Stop containers, keep named volumes
make logs          Tail compose logs
make status        docker compose ps
make health        curl https://www.galaxy.lan + https://api.galaxy.lan/healthz
make psql          psql as galaxy@galaxy_backend
make clean-data    Stop everything and wipe volumes + game-state dir
```
## Files
- `docker-compose.yml` — six services: postgres, redis, mailpit,
galaxy-backend, galaxy-api, galaxy-caddy. Reuses the alpine-runtime
Dockerfiles from `../local-dev/` so the backend healthcheck can run
`wget`. Reuses the dev keypair from `../local-dev/keys/`.
- `Caddyfile.dev` — the application-routing Caddy config, mounted into
`galaxy-caddy` at `/etc/caddy/Caddyfile`.
- `Caddyfile.prod` — placeholder for a future prod deployment; not used
by this compose.
- `Makefile` — wrapper over `docker compose` with helpers for engine,
UI seeding, health probes, and full wipe.
- `.env.example` — non-secret defaults for the compose `${VAR:-}`
expansions. Copy to `.env` if you want host-local overrides.
## Relationship to other infrastructure
- `tools/local-dev/` — single-developer playground, host-port mapped,
Vite dev server on the side. Recommended for active UI work.
- `tools/local-ci/` — Gitea + act runner for **fallback** workflow
testing without `gitea.lan`. Optional, not part of the per-stage CI
gate anymore.
- `.gitea/workflows/dev-deploy.yaml` — the CI side of this stack:
builds images, seeds the UI volume, runs `docker compose up -d` on
every merge into `development`. The Makefile in this directory is
what that workflow ultimately calls into.
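In shell terms, the deploy step of that workflow reduces to roughly this sequence (a sketch; `dev-deploy.yaml` itself is authoritative and may interleave image pushes or cache steps):

```shell
# Run from the repository checkout on the deploy host.
make -C tools/dev-deploy build-engine  # ensure galaxy-engine:dev exists
make -C tools/dev-deploy seed-ui       # refresh the galaxy-dev-ui-dist volume
make -C tools/dev-deploy up            # build images, compose up, wait for health
make -C tools/dev-deploy health        # probe the https://*.galaxy.lan endpoints
```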
@@ -0,0 +1,227 @@
# Long-lived dev environment for the Galaxy stack, deployed by the
# `dev-deploy.yaml` Gitea Actions workflow on every merge into the
# `development` branch and (optionally) by `make -C tools/dev-deploy up`
# from a developer shell on the same host.
#
# The stack is reachable from a browser only through the host Caddy on
# the machine, which terminates TLS and forwards `*.galaxy.lan` into the
# external `edge` Docker network where `galaxy-caddy` does app-routing.
# No service in this compose project binds a host port — coexistence
# with `tools/local-dev/` (which listens on localhost:5433/6380/8025/...)
# is achieved by distinct names, networks, and volumes.
#
# Browser → host-Caddy (:80/:443) → galaxy-caddy → {galaxy-api, /srv/galaxy-ui}
#
# Persistent state lives in named volumes under the `galaxy-dev-*`
# prefix and survives redeploys and compose rebuilds.
name: galaxy-dev

services:
  galaxy-postgres:
    image: postgres:16-alpine
    container_name: galaxy-dev-postgres
    restart: unless-stopped
    environment:
      POSTGRES_USER: galaxy
      POSTGRES_PASSWORD: galaxy
      POSTGRES_DB: galaxy_backend
    volumes:
      - galaxy-dev-postgres-data:/var/lib/postgresql/data
    networks:
      - galaxy-internal
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U galaxy -d galaxy_backend"]
      interval: 3s
      timeout: 3s
      retries: 30
      start_period: 5s

  galaxy-redis:
    image: redis:7-alpine
    container_name: galaxy-dev-redis
    restart: unless-stopped
    command:
      - redis-server
      - --requirepass
      - galaxy-dev
      - --appendonly
      - "no"
      - --save
      - ""
    networks:
      - galaxy-internal
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "galaxy-dev", "PING"]
      interval: 3s
      timeout: 3s
      retries: 30
      start_period: 3s

  galaxy-mailpit:
    image: axllent/mailpit:v1.21
    container_name: galaxy-dev-mailpit
    restart: unless-stopped
    networks:
      - galaxy-internal
    healthcheck:
      test: ["CMD", "wget", "-q", "-O-", "http://localhost:8025/livez"]
      interval: 3s
      timeout: 3s
      retries: 30
      start_period: 3s

  galaxy-backend:
    build:
      context: ../..
      dockerfile: tools/local-dev/backend.Dockerfile
    image: galaxy/backend:dev
    container_name: galaxy-dev-backend
    restart: unless-stopped
    user: "0:0"
    depends_on:
      galaxy-postgres:
        condition: service_healthy
      galaxy-mailpit:
        condition: service_healthy
    environment:
      BACKEND_LOGGING_LEVEL: info
      BACKEND_HTTP_LISTEN_ADDR: ":8080"
      BACKEND_GRPC_PUSH_LISTEN_ADDR: ":8081"
      BACKEND_POSTGRES_DSN: "postgres://galaxy:galaxy@galaxy-postgres:5432/galaxy_backend?search_path=backend&sslmode=disable"
      BACKEND_SMTP_HOST: galaxy-mailpit
      BACKEND_SMTP_PORT: "1025"
      BACKEND_SMTP_FROM: "galaxy-backend@galaxy.lan"
      BACKEND_SMTP_TLS_MODE: none
      BACKEND_DOCKER_NETWORK: galaxy-dev-internal
      BACKEND_GAME_STATE_ROOT: ${GALAXY_DEV_GAME_STATE_DIR}
      BACKEND_GEOIP_DB_PATH: /var/lib/galaxy/geoip.mmdb
      BACKEND_NOTIFICATION_ADMIN_EMAIL: admin@galaxy.lan
      BACKEND_MAIL_WORKER_INTERVAL: 500ms
      BACKEND_NOTIFICATION_WORKER_INTERVAL: 500ms
      BACKEND_OTEL_TRACES_EXPORTER: none
      BACKEND_OTEL_METRICS_EXPORTER: none
      BACKEND_AUTH_DEV_FIXED_CODE: ${BACKEND_AUTH_DEV_FIXED_CODE:-}
      BACKEND_DEV_SANDBOX_EMAIL: ${BACKEND_DEV_SANDBOX_EMAIL:-}
      BACKEND_DEV_SANDBOX_ENGINE_IMAGE: ${BACKEND_DEV_SANDBOX_ENGINE_IMAGE:-galaxy-engine:dev}
      BACKEND_DEV_SANDBOX_ENGINE_VERSION: ${BACKEND_DEV_SANDBOX_ENGINE_VERSION:-0.1.0}
      BACKEND_DEV_SANDBOX_PLAYER_COUNT: ${BACKEND_DEV_SANDBOX_PLAYER_COUNT:-20}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      # Game-state root must resolve to the same absolute path inside
      # the backend container and on the Docker daemon host, because
      # the backend hands that path to the daemon when it spawns
      # engine containers. The Makefile exports
      # `GALAXY_DEV_GAME_STATE_DIR` to `${HOME}/.galaxy-dev/game-state`
      # by default, so a non-root runner user can write to it without
      # sudo; the prefix is also distinct from the one used by
      # `tools/local-dev/`, so the two stacks do not collide on the
      # same host.
      - type: bind
        source: ${GALAXY_DEV_GAME_STATE_DIR}
        target: ${GALAXY_DEV_GAME_STATE_DIR}
        bind:
          create_host_path: true
      - ../../pkg/geoip/test-data/test-data/GeoIP2-Country-Test.mmdb:/var/lib/galaxy/geoip.mmdb:ro
    networks:
      - galaxy-internal
    healthcheck:
      test: ["CMD", "wget", "-q", "-O-", "http://localhost:8080/healthz"]
      interval: 3s
      timeout: 3s
      retries: 60
      start_period: 10s

  galaxy-api:
    build:
      context: ../..
      dockerfile: tools/local-dev/gateway.Dockerfile
    image: galaxy/gateway:dev
    container_name: galaxy-dev-api
    restart: unless-stopped
    depends_on:
      galaxy-backend:
        condition: service_healthy
      galaxy-redis:
        condition: service_healthy
    environment:
      GATEWAY_LOG_LEVEL: info
      GATEWAY_PUBLIC_HTTP_ADDR: ":8080"
      GATEWAY_AUTHENTICATED_GRPC_ADDR: ":9090"
      GATEWAY_BACKEND_HTTP_URL: "http://galaxy-backend:8080"
      GATEWAY_BACKEND_GRPC_PUSH_URL: "galaxy-backend:8081"
      GATEWAY_BACKEND_GATEWAY_CLIENT_ID: dev-gateway-1
      GATEWAY_RESPONSE_SIGNER_PRIVATE_KEY_PEM_PATH: /run/secrets/gateway-response.pem
      GATEWAY_REDIS_MASTER_ADDR: "galaxy-redis:6379"
      GATEWAY_REDIS_PASSWORD: galaxy-dev
      # UI lives on https://www.galaxy.lan; the API is on
      # https://api.galaxy.lan. Browsers therefore issue cross-origin
      # requests to the gateway and need an explicit allow-list.
      GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS: "https://www.galaxy.lan"
      # Anti-abuse defaults are looser than production: the dev
      # environment is shared by a handful of trusted testers who
      # frequently hammer the same identity to reproduce flows.
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_PUBLIC_AUTH_RATE_LIMIT_REQUESTS: "10000"
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_PUBLIC_AUTH_RATE_LIMIT_BURST: "1000"
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_SEND_EMAIL_CODE_IDENTITY_RATE_LIMIT_REQUESTS: "10000"
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_SEND_EMAIL_CODE_IDENTITY_RATE_LIMIT_BURST: "1000"
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_CONFIRM_EMAIL_CODE_IDENTITY_RATE_LIMIT_REQUESTS: "10000"
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_CONFIRM_EMAIL_CODE_IDENTITY_RATE_LIMIT_BURST: "1000"
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_PUBLIC_MISC_MAX_BODY_BYTES: "131072"
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_PUBLIC_MISC_RATE_LIMIT_REQUESTS: "10000"
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_PUBLIC_MISC_RATE_LIMIT_BURST: "1000"
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_BROWSER_BOOTSTRAP_MAX_BODY_BYTES: "65536"
      GATEWAY_PUBLIC_HTTP_ANTI_ABUSE_BROWSER_ASSET_MAX_BODY_BYTES: "65536"
      GATEWAY_AUTHENTICATED_GRPC_ANTI_ABUSE_IP_RATE_LIMIT_REQUESTS: "10000"
      GATEWAY_AUTHENTICATED_GRPC_ANTI_ABUSE_IP_RATE_LIMIT_BURST: "1000"
      GATEWAY_AUTHENTICATED_GRPC_ANTI_ABUSE_SESSION_RATE_LIMIT_REQUESTS: "10000"
      GATEWAY_AUTHENTICATED_GRPC_ANTI_ABUSE_SESSION_RATE_LIMIT_BURST: "1000"
      GATEWAY_AUTHENTICATED_GRPC_ANTI_ABUSE_USER_RATE_LIMIT_REQUESTS: "10000"
      GATEWAY_AUTHENTICATED_GRPC_ANTI_ABUSE_USER_RATE_LIMIT_BURST: "1000"
      GATEWAY_AUTHENTICATED_GRPC_ANTI_ABUSE_MESSAGE_CLASS_RATE_LIMIT_REQUESTS: "10000"
      GATEWAY_AUTHENTICATED_GRPC_ANTI_ABUSE_MESSAGE_CLASS_RATE_LIMIT_BURST: "1000"
    volumes:
      - ../local-dev/keys/gateway-response.pem:/run/secrets/gateway-response.pem:ro
    networks:
      - galaxy-internal
    healthcheck:
      test: ["CMD", "wget", "-q", "-O-", "http://localhost:8080/healthz"]
      interval: 3s
      timeout: 3s
      retries: 30
      start_period: 5s

  galaxy-caddy:
    image: caddy:2.11.2-alpine
    container_name: galaxy-dev-caddy
    restart: unless-stopped
    depends_on:
      galaxy-api:
        condition: service_healthy
    volumes:
      - ./Caddyfile.dev:/etc/caddy/Caddyfile:ro
      - galaxy-dev-caddy-data:/data
      - galaxy-dev-ui-dist:/srv/galaxy-ui:ro
    networks:
      - galaxy-internal
      - edge

networks:
  galaxy-internal:
    name: galaxy-dev-internal
    driver: bridge
    internal: false
  edge:
    name: ${GALAXY_EDGE_NETWORK:-edge}
    external: true

volumes:
  galaxy-dev-postgres-data:
    name: galaxy-dev-postgres-data
  galaxy-dev-caddy-data:
    name: galaxy-dev-caddy-data
  galaxy-dev-ui-dist:
    name: galaxy-dev-ui-dist
@@ -1,9 +1,17 @@
-# Local Gitea CI
+# Local Gitea CI (fallback)
+
+> **Status:** fallback / opt-in. The primary CI target is now
+> `gitea.lan` with its host-mode `act_runner`. The per-stage CI gate
+> closes against `gitea.lan`, not against this stack. Use this
+> directory when you want to validate `.gitea/workflows/*` without
+> reaching `gitea.lan` — for example, when iterating on a workflow
+> file from a flight without LAN access — or when isolating a runner
+> issue from production-shaped infrastructure.
 
 Self-contained Gitea + Actions runner for verifying
-`.gitea/workflows/*` honestly before pushing to a real Gitea instance.
-Runs natively on arm64 (Apple Silicon) — every image below has an
-arm64 variant, so Docker pulls the right architecture and the runner
-executes workflow steps without QEMU emulation.
+`.gitea/workflows/*` honestly before pushing to `gitea.lan`. Runs
+natively on arm64 (Apple Silicon) — every image below has an arm64
+variant, so Docker pulls the right architecture and the runner
+executes workflow steps without QEMU emulation.
 
 ## Prerequisites
@@ -10,11 +10,15 @@ FlatBuffers wire, every authenticated call verifies the response
 signature against the dev keypair, and every email passes through
 Mailpit's web UI for inspection.
 
-This stack is **not** a CI gate (that role belongs to
-[`tools/local-ci/`](../local-ci/README.md), which boots a Gitea +
-Actions runner and replays workflow files). The two stacks are
-independent and can coexist on the same machine; they bind different
-ports and use different networks.
+This stack is **not** a CI gate (the per-stage CI gate now lives on
+`gitea.lan`; see project-level `CLAUDE.md`). It is also distinct from
+the **long-lived dev environment** at
+[`tools/dev-deploy/`](../dev-deploy/README.md), which is redeployed on
+every merge into `development` and is reachable as
+`https://www.galaxy.lan` / `https://api.galaxy.lan`. The three stacks
+(`tools/local-dev/`, `tools/dev-deploy/`, and the fallback
+`tools/local-ci/`) coexist on the same host because every name —
+compose project, container, network, volume — is distinct.
 
 ## Bring it up
@@ -153,6 +153,12 @@ The stack accepts a fixed dev code (`123456`) in addition to the
 real Mailpit-delivered one. Full runbook in
 [`../tools/local-dev/README.md`](../tools/local-dev/README.md).
 
+For testing the production-shaped surface — Caddy in front of the
+gateway, statically served UI bundle, real `https://*.galaxy.lan`
+hostnames — use the long-lived dev environment at
+[`../tools/dev-deploy/`](../tools/dev-deploy/README.md). It is
+redeployed by Gitea Actions on every merge into `development`.
+
 ## Per-phase docs
 
 Topic docs live under `ui/docs/` and are added per phase as they're
@@ -5,7 +5,13 @@ export default defineConfig({
   testDir: "tests/e2e",
   fullyParallel: true,
   forbidOnly: !!process.env.CI,
-  retries: process.env.CI ? 1 : 0,
+  // host-mode CI runner shares CPU/IO with the long-lived dev stack,
+  // gitea, and the user's host Caddy. The default 6 workers + 1
+  // retry produced ~7 flakies + 1 hard fail per ui-test run; cap at
+  // 4 workers (still parallel) and allow 4 retries to ride out
+  // transient timing hiccups without inflating wall time.
+  workers: 4,
+  retries: process.env.CI ? 4 : 0,
   reporter: [["list"], ["html", { open: "never" }]],
   use: {
     baseURL: "http://localhost:5173",