Commit Graph

6 Commits

Author SHA1 Message Date
Ilia Denisov a338ebf058 fix(integration): scope preclean to galaxy.stack=integration
Tests · Integration / integration (pull_request) Successful in 1m37s
Root cause for the long-standing "Dev Sandbox flips to cancelled
after dev-deploy" symptom in push-triggered cycles: when
`integration.yaml` runs in parallel with `dev-deploy.yaml`, its
`integration/scripts/preclean.sh` issues a `docker rm -f` over every
container labelled `galaxy.backend=1`. That label is stamped by the
backend's runtime adapter on every engine it spawns — including the
engines living in the long-lived dev-deploy environment on the same
Docker daemon. Each post-merge auto-deploy therefore had the
integration preclean wipe the dev-sandbox engine, and the new
backend's reconciler tick observed `container disappeared` and
cascaded the sandbox into `cancelled`.

Fix:

- `integration/testenv/backend.go` now sets
  `BACKEND_STACK_LABEL=integration` on every backend-under-test, so
  the engines spawned by integration carry
  `galaxy.stack=integration` in addition to `galaxy.backend=1`. The
  backend support for this env was added in the previous CI tidy-up
  PR (#13).

- `integration/scripts/preclean.sh` gains a multi-label AND filter
  helper and uses it to scope engine cleanup to the combination
  `galaxy.backend=1 AND galaxy.stack=integration`. dev-deploy and
  local-dev engines carry different `galaxy.stack` values, so the
  AND match leaves them alone.

- `docs/ARCHITECTURE.md` "Container labels" — refreshed to call out
  the AND-scoping rule and the new integration backend stamp.

- `tools/dev-deploy/KNOWN-ISSUES.md` — the sandbox-cancel entry
  gets an "Update" section recording the root cause and the fix; the
  status is downgraded to "partially fixed" because the solo
  `workflow_dispatch` reproduction (which does NOT trigger
  integration) remains unexplained.

- `tools/dev-deploy/KNOWN-ISSUES.md` — separately, document the
  `docker restart galaxy-dev-backend` failure caused by the
  runner-workspace bind-mount that surfaced while diagnosing this
  issue. Workaround: `make -C tools/dev-deploy up` from the
  persistent checkout. Real fix is a follow-up (bake fixture into
  image or copy to named volume).

Verification:

- `go build ./backend/... ./integration/...` — clean.
- `bash -n integration/scripts/preclean.sh` — syntax OK.
- Live AND-filter check on the dev host:
  `docker ps -aq --filter label=galaxy.backend=1 --filter label=galaxy.stack=integration`
  returns nothing while the dev-deploy engine
  `galaxy-game-80f3ce86-...` keeps running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 01:37:55 +02:00
Ilia Denisov 9d2504c42d backend: embed tzdata so time.LoadLocation works in distroless/alpine
`time.LoadLocation` is called from
backend/internal/server/handlers_public_auth.go:108 (confirm-email-code)
and backend/internal/user/account.go:218 (user.settings.update). Both
runtime images shipped today have no tzdata — production
backend/Dockerfile uses gcr.io/distroless/static-debian12:nonroot, and
local-dev tools/local-dev/backend.Dockerfile uses alpine:3.20 without
the optional tzdata apk — so the container-side binary resolves only
the no-data fallback (UTC and fixed offsets) and rejects every real
IANA zone with HTTP 400 `invalid_request: time_zone must be a valid
IANA zone`.

Adding `import _ "time/tzdata"` to backend's main is the idiomatic
Go fix: the binary embeds the IANA database, time.LoadLocation works
on every base image, no Dockerfile changes needed. Cost is ~800 KB
of binary growth — invisible next to the existing /usr/local/bin/backend
size and well below any container layer threshold.

The OpenAPI spec already documents the field as "IANA time-zone
identifier" (gateway/openapi.yaml:205, backend/openapi.yaml:2334)
and the UI sends Intl.DateTimeFormat().resolvedOptions().timeZone,
so neither the contract nor the client needs a change.

Why this slipped through: backend unit tests run as a host Go test
process (developer's tzdata covers them), Playwright tests mock the
gateway (backend never reached), and the integration suite — the only
layer that exercises the real backend container — uses
RegisterSession which hardcoded `time_zone="UTC"`. Switching that
default to "Europe/Berlin" makes every integration scenario that
enrols a pilot exercise the tzdata path, so the next regression
surfaces in the integration run instead of escaping into manual
smoke. (The integration suite is not in the per-PR workflow yet; that
gap is tracked separately.)

Verified end-to-end against `tools/local-dev`:
  - Europe/Amsterdam, Asia/Tokyo, America/Los_Angeles → 200 +
    device_session_id (was 400 before this patch).
  - Mars/Olympus still → 400 (validation behaviour unchanged).
Host tests: backend/internal/{auth,user,config} green.
UI: pnpm test 14/14, CI=1 pnpm exec playwright test 44/44.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-08 11:58:47 +02:00
Ilia Denisov 89bf7e6576 phase 4: drop stale gRPC nomenclature from integration tests
Phase 4 replaced the gateway's authenticated edge listener with a
Connect-Go HTTP/h2c bootstrap that natively serves Connect, gRPC,
and gRPC-Web. Sweep the integration suite so test names, comments,
and helper docs match the new transport posture: rename
TestUserAccount_GetThroughGatewayGRPC to TestUserAccount_GetThroughGatewayEdge,
flip "authenticated gRPC" / "signed gRPC" / "gateway gRPC" comments
to "authenticated edge", and align testenv doc strings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 11:52:17 +02:00
Ilia Denisov 118f7c17a2 phase 4: connectrpc on the gateway authenticated edge
Replace the native-gRPC server bootstrap with a single
`connectrpc.com/connect` HTTP/h2c listener. Connect-Go natively
serves Connect, gRPC, and gRPC-Web on the same port, so browsers can
now reach the authenticated surface without giving up the gRPC
framing native and desktop clients may use later. The decorator
stack (envelope → session → payload-hash → signature →
freshness/replay → rate-limit → routing/push) is reused unchanged
behind a small Connect → gRPC adapter and a `grpc.ServerStream`
shim around `*connect.ServerStream`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 11:49:28 +02:00
Ilia Denisov 604fe40bcf docs: reorder & testing 2026-05-07 00:58:53 +03:00
Ilia Denisov f446c6a2ac feat: backend service 2026-05-06 10:14:55 +03:00