After the live investigation, the project owner confirms that none
of the host-side cleanup paths apply: no docker prune cron, no
manual `docker rm`, no `dockerd` restart in the window, and the
engine binary does not crash while idling on API calls.
Replace the host-side hypothesis list with a one-line note that
they were considered and rejected, narrow the open suspicion to
the `dev-deploy.yaml` job sequence (`docker build` + `docker
compose build` + the alpine `docker run --rm` for UI seeding +
`docker compose up -d --wait --remove-orphans`), and park the
entry. Reopen if the symptom recurs with a fresh
`docker events --since 0` capture armed before the deploy
starts.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A live `docker inspect` of an engine container and `docker events`
captures from two redispatch runs confirm:
- Engine has no `com.docker.compose.*` labels and `AutoRemove=false`,
so `--remove-orphans` cannot reap it.
- Two consecutive `dev-deploy.yaml` redispatches with an engine
already running emitted `die` / `destroy` events only for
`galaxy-dev-{backend,api,caddy}` — never for the engine.
- The reconciler tick that fires 60s after backend recreate
correctly matched the surviving engine in both cases
(`status=running` in both `games` and `runtime_records`).
- `runtime.Service` has no `Shutdown` that proactively removes
engine containers, so a graceful backend exit also leaves them
alone.
The repro window therefore needs a separate trigger that removes
the engine container outside of compose. The new hypotheses point
at host-side `docker prune` jobs, a `dockerd` restart that lost the
container, or an early `Engine.Init` failure that exited the engine
before `status=running` reached the runtime row. The investigation
list now leads with `journalctl -u docker` and the host crontab —
those are the cheapest checks to confirm or rule out next.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Capture the diagnostic notes for the issue we hit after every
`dev-deploy.yaml` redispatch: the freshly bootstrapped "Dev Sandbox"
game ends up `cancelled` ~15 minutes later, with the runtime
reconciler reporting "container disappeared". The engine never
shows up in `docker ps -a --filter label=galaxy-game-engine`, so
either it never spawned or it was removed before any host-side
snapshot.
`KNOWN-ISSUES.md` records the symptom, the log excerpt, three
working hypotheses (runtime spawn race, `--remove-orphans`
interaction, engine `--rm` lifecycle), and the investigation
checklist before opening an issue. The README gets a one-line
pointer so future redeploys land on the doc immediately.
No code change — this is the placeholder so the next person
investigating the cancellation pattern does not have to
rediscover the diagnostic from scratch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The public REST listener already exposes
`GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS`; the authenticated
Connect-Web listener on the separate gRPC port had no equivalent.
That worked in `tools/local-dev` (Vite proxy makes everything
same-origin) and would work in production once UI and gateway share
a single hostname, but the long-lived dev environment serves the
UI from `https://www.galaxy.lan` and the gateway from
`https://api.galaxy.lan` — every `/galaxy.gateway.v1.EdgeGateway/*`
fetch failed in the browser with WebKit's generic "Load failed"
message, because the response carried no `Access-Control-Allow-Origin`
header. The lobby rendered as "[unknown] Load failed" with no game.
Mirror the public-REST CORS surface for the authenticated handler:
- new env `GATEWAY_AUTHENTICATED_GRPC_CORS_ALLOWED_ORIGINS`;
- new `AuthenticatedGRPCConfig.CORSAllowedOrigins` field;
- new `grpcapi.withCORS` middleware wrapping the Connect mux;
- dev-deploy stack sets the env to `https://www.galaxy.lan`.
The middleware speaks plain net/http (the Connect handler is mounted
on a ServeMux, not gin), handles preflight 204 immediately, and
exposes the Connect-Web header set the browser needs to read the
response (`Grpc-Status`, `Grpc-Message`, `Connect-Protocol-Version`).
Empty allow-list disables the middleware — production stays at
"single hostname" by default.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`api.galaxy.lan` was proxying every path to `galaxy-api:8080` (the
public REST listener), so authenticated Connect-Web calls
(`/galaxy.gateway.v1.EdgeGateway/ExecuteCommand`,
`/galaxy.gateway.v1.EdgeGateway/SubscribeEvents`) collapsed to a 404
from the public route table — the lobby loaded the static bundle
but every authenticated query failed silently.
Split routing by path: `/galaxy.gateway.v1.EdgeGateway/*` goes to
the authenticated listener on `:9090`, everything else stays on
`:8080`. Mirrors the Vite dev-server proxy in
`ui/frontend/vite.config.ts`.
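In Caddyfile terms the split looks roughly like this (the matcher path, upstream name, and ports are from this change; the site-block layout is an assumption):

```caddyfile
api.galaxy.lan {
	@connect path /galaxy.gateway.v1.EdgeGateway/*
	handle @connect {
		reverse_proxy galaxy-api:9090
	}
	handle {
		reverse_proxy galaxy-api:8080
	}
}
```

`handle` blocks are mutually exclusive, so the authenticated Connect paths peel off to `:9090` and everything else keeps the old `:8080` behaviour.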
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two long-standing dev-environment conveniences did not survive the
move from the bespoke local-dev stack to the CI-driven dev-deploy:
1. `BACKEND_DEV_SANDBOX_EMAIL` defaulted to an empty string in the
dev-deploy compose, so the auto-provisioned "Dev Sandbox" game
never appeared on `https://www.galaxy.lan`. Bake `dev@galaxy.lan`
as the default — matches `.env.example` and lets a developer who
logs in with that email find a ready-to-play game in the lobby.
2. The lobby's synthetic-report loader was gated on
`import.meta.env.DEV`, which is true only for `vite dev` (the
tools/local-dev path). The long-lived dev environment builds
with `vite build` (production mode), so the section was always
stripped from its bundle. Gate it on an explicit
`VITE_GALAXY_DEV_AFFORDANCES` flag instead and set it both in
`.env.development` (preserves `pnpm dev` behaviour) and in the
`dev-deploy.yaml` build step. The `prod-build.yaml` build path
leaves the flag unset, so production stays clean.
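Fix 1 amounts to a compose default along these lines (service name illustrative); fix 2 is the one-line `VITE_GALAXY_DEV_AFFORDANCES=true` in `.env.development` and the dev-deploy build step:

```yaml
services:
  galaxy-dev-backend:
    environment:
      # ":-" keeps a host-level override possible while baking the default.
      BACKEND_DEV_SANDBOX_EMAIL: ${BACKEND_DEV_SANDBOX_EMAIL:-dev@galaxy.lan}
```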
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The long-lived dev environment now opts into the bcrypt-bypass on a
fresh `up`/`rebuild` so a returning developer can sign in with `123456`
even after the matching browser session was cleared (the real emailed
code is single-use). Set the variable to an empty string in `.env` to
force real Mailpit codes (mail-flow QA).
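Shape of the toggle in `.env` — the commit does not name the variable, so the name below is a placeholder, and whether the value carries the code itself is an assumption:

```shell
# Placeholder name; non-empty enables the bypass code shown above.
BACKEND_DEV_LOGIN_BYPASS_CODE=123456
# Empty forces the real single-use Mailpit codes:
# BACKEND_DEV_LOGIN_BYPASS_CODE=
```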
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a `GATEWAY_PUBLIC_HTTP_CORS_ALLOWED_ORIGINS` env-driven allow-list
on the public REST server so the dev UI on https://www.galaxy.lan can
call https://api.galaxy.lan without the browser blocking the
cross-origin response. Defaults to empty (no CORS) so the production
posture stays closed.
The middleware mounts before route classification and anti-abuse, so
OPTIONS preflights never charge against per-class rate-limit buckets.
`tools/dev-deploy/docker-compose.yml` opts the dev gateway into a
single allowed origin (`https://www.galaxy.lan`); local-dev keeps the
defaults because Vite proxies through the same origin.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With the runner in host-mode, compose bind-mount paths resolve to
real host paths the Docker daemon can see, so the GeoIP file no
longer needs to be baked into the backend image to survive CI. Bring
back the bind-mount of `pkg/geoip/test-data/.../mmdb`, matching how
local-dev sources it. Image now only carries the backend binary,
symmetric with the production `backend/Dockerfile`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs surfaced on the first real merge into development:
1. `${{ env.HOME }}` evaluates to an empty string at the workflow
stage, so `GALAXY_DEV_GAME_STATE_DIR` became
`/.galaxy-dev/game-state`. Resolve `$HOME` in the shell instead of
in YAML.
2. The compose bind-mount of `GeoIP2-Country-Test.mmdb` referenced a
path inside the runner's workspace volume, which the host Docker
daemon cannot see — it created an empty directory and the backend
crashed with "geoip database: is a directory" in a restart loop.
Bake the file into the backend image so dev-deploy no longer needs
a bind-mount; local-dev compose still mounts it on top for swap-in
during development.
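The shell-side resolution for bug 1 looks roughly like this (step name illustrative; `$GITHUB_ENV` is the standard Actions mechanism for exporting to later steps):

```yaml
- name: Resolve game-state dir in the shell, where $HOME is populated
  run: echo "GALAXY_DEV_GAME_STATE_DIR=$HOME/.galaxy-dev/game-state" >> "$GITHUB_ENV"
```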
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A docker-compose stack that hosts postgres, redis, mailpit, backend,
gateway, and an app-routing Caddy. Reachable through the host Caddy at
https://www.galaxy.lan (static SPA) and https://api.galaxy.lan (REST +
gRPC). Coexists with tools/local-dev/ and tools/local-ci/ by giving
every name (compose project, container, network, volume) a distinct
galaxy-dev-* prefix.
State is persisted in named volumes; game-state lives under
${GALAXY_DEV_GAME_STATE_DIR:-$HOME/.galaxy-dev/game-state} so the
default works for a non-root runner without sudo.
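The game-state mount then reads as below — the expansion with its default is from this change, while the service name (beyond the galaxy-dev-* prefix) and container path are illustrative:

```yaml
services:
  galaxy-dev-backend:
    volumes:
      - ${GALAXY_DEV_GAME_STATE_DIR:-$HOME/.galaxy-dev/game-state}:/var/lib/galaxy/game-state
```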
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>