dev-deploy: production mirror + full observability behind the /_gm gate #88

Merged
developer merged 8 commits from feature/dev-prod-mirror into development 2026-06-01 04:56:46 +00:00

Turns the long-lived dev environment into a production mirror with a full observability stack, and puts the operator console + Grafana + Mailpit behind a single /_gm Basic Auth gate.

Stages

  • Remove dev-sandbox everywhere (backend package, compose, local-dev); keep only the legacy-report loader (UI gated by VITE_GALAXY_DEV_AFFORDANCES).
  • Mailpit relay to a real Gmail (credentials live in Mailpit, not the backend; backend↔SMTP stays prod-identical).
  • Observability stack: Prometheus (15d) + Loki (7d) + promtail + Tempo (3d) + node-exporter + cAdvisor + Grafana on galaxy-dev-internal, no host ports. backend :9100 / gateway :9191 scraped; both export OTLP traces to Tempo.
  • Single /_gm gate: one Basic Auth (the admin-console account) fronts the console, /_gm/grafana/ (Grafana anonymous-Admin, own login disabled) and /_gm/mailpit/ (Mailpit MP_WEBROOT); links added to the console nav.
  • Docs: ARCHITECTURE §17, tools/dev-deploy/monitoring/README.md, dev-deploy README, FUNCTIONAL §10.2.1 (+ ru mirror).

New Gitea settings

  • secrets: GALAXY_DEV_MAIL_RELAY_USERNAME, GALAXY_DEV_MAIL_RELAY_PASSWORD, GALAXY_DEV_GRAFANA_ADMIN_PASSWORD (optional — Grafana is anonymous behind the gate)
  • var: GALAXY_DEV_MAIL_RELAY_MATCH

Verified live (galaxy.lan, 2026-06-01)

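13/13 containers healthy; Prometheus 5/5 targets up; Loki ingesting logs; Tempo receiving backend traces; Grafana 11.4 healthy. /_gm/ returns 401 without credentials, 200 with the gm account and 401 with a wrong password; /_gm/grafana/ returns 200 with an anonymous Grafana session; /_gm/mailpit/ returns 200 with auth and 401 without; the former top-level /grafana/ and /mailpit/ routes are removed.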
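The gate checks listed here can be captured as a small table-driven smoke test. This is only a sketch: `EXPECTED`, `gate_mismatches` and the `fetch` callback are hypothetical names, not part of the repository, and the Grafana row assumes the request carries the gate credentials (the "anonymous" part refers to Grafana's own session).

```python
from typing import Callable, List, Optional

# (path, credentials, expected HTTP status), transcribed from the checks above.
# "gm" is the admin-console account; "wrong" stands for any bad password;
# None means no Authorization header at all.
EXPECTED = [
    ("/_gm/", None, 401),
    ("/_gm/", "gm", 200),
    ("/_gm/", "wrong", 401),
    ("/_gm/grafana/", "gm", 200),  # assumption: sent with the gate credentials
    ("/_gm/mailpit/", "gm", 200),
    ("/_gm/mailpit/", None, 401),
]

# A fetch callback takes (path, credentials) and returns the HTTP status code.
Fetch = Callable[[str, Optional[str]], int]


def gate_mismatches(fetch: Fetch) -> List[str]:
    """Return one message per check whose status differs from EXPECTED."""
    problems = []
    for path, creds, want in EXPECTED:
        got = fetch(path, creds)
        if got != want:
            who = creds or "no credentials"
            problems.append(f"{path} as {who}: got {got}, want {want}")
    return problems
```

In practice `fetch` would wrap an HTTP client pointed at the dev origin; passing it in keeps the table itself testable without a network.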

Production deployment (ssh + docker save/load) is out of scope; the compose is structured for reuse.

developer added 8 commits 2026-06-01 04:46:43 +00:00
refactor(dev): remove the dev-sandbox bootstrap everywhere
Tests · Go / test (push) Successful in 1m59s
0cae89cba2
Stage 1 of the dev-as-prod-mirror rework. The auto-provisioned "Dev
Sandbox" game and dummy users are removed so the dev contour starts
empty like prod; the separate legacy-report loader stays as the
test-data path.

- delete backend/internal/devsandbox (package + tests)
- drop the bootstrap call + DevSandboxConfig (struct, Config field,
  BACKEND_DEV_SANDBOX_* env, defaults, loader, validation)
- strip BACKEND_DEV_SANDBOX_* from dev-deploy + local-dev compose and
  .env.example; the generic engine-recycle / prune-broken-engines logic
  stays (it serves real games)
- update tooling docs (dev-deploy README + KNOWN-ISSUES, local-dev
  README + Makefile) and stale comments; DeleteGame and
  InsertMembershipDirect remain (exercised by lobby integration tests)

No app behaviour change beyond not auto-creating the sandbox game.
docs(ui): correct the synthetic-report loader gate comment
Tests · UI / test (push) Successful in 3m16s
225f89fad6
Stage 2 of the dev-as-prod-mirror rework. The legacy-report (synthetic)
loader is already available in the dev-deploy UI: it is gated by
the build-time flag VITE_GALAXY_DEV_AFFORDANCES (set "true" in
dev-deploy.yaml line 89, unset in prod-build.yaml so prod strips it),
not by import.meta.env.DEV. Correct the stale header comment that
claimed import.meta.env.DEV. No functional change — the desired
"loader in dev, absent in prod" posture already holds.
Keep Mailpit as the backend's SMTP submission point and turn on its
relay so OTP/notification mail addressed to the owner reaches a real
Gmail inbox, while everything else stays captured-only.

- mailpit gains --smtp-relay-config + --smtp-relay-matching (default
  non-routable, so an unconfigured stack only captures); relay.conf is
  mounted from a new galaxy-dev-mailpit-config volume
- tools/dev-deploy/mailpit/relay.conf.tmpl + a dev-deploy.yaml step that
  renders it from Gitea secrets (Gmail App Password, never committed)
  and seeds the volume; the GALAXY_DEV_MAIL_RELAY_MATCH var drives the
  relay-matching recipient
- backend SMTP config unchanged (still -> galaxy-mailpit:1025)
- dev-deploy README documents the relay + required secrets/vars

Verified locally: compose config valid; the rendered relay.conf is
accepted by mailpit v1.21.8 (relay + recipient-matching enabled).
Real Gmail delivery is verified at the dev-deploy preview once the
owner sets the secrets.
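
The capture-versus-relay split described in this commit can be illustrated with a short sketch. It assumes the relay-matching value is treated as a regular expression searched against the recipient address; `should_relay` and `NON_ROUTABLE_DEFAULT` are hypothetical names, and the default pattern shown is a stand-in, not the value used in the compose file.

```python
import re

# Hypothetical stand-in for the "non-routable" default: an address under the
# reserved .invalid TLD, which no real recipient can have.
NON_ROUTABLE_DEFAULT = r"^nobody@relay\.invalid$"


def should_relay(recipient: str, pattern: str = NON_ROUTABLE_DEFAULT) -> bool:
    """True if a captured message to `recipient` would also be relayed."""
    return re.search(pattern, recipient) is not None
```

With the default pattern nothing matches, so an unconfigured stack only captures; once GALAXY_DEV_MAIL_RELAY_MATCH holds the owner's address, only mail to that address leaves Mailpit.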
Stand up a production-mirror monitoring stack in the long-lived dev
contour, all on galaxy-dev-internal with no host ports (reached only via
the in-repo galaxy-dev-caddy):

- Prometheus scrapes backend:9100, gateway:9191, node-exporter and
  cadvisor (30s interval, 15d retention); Loki (7d) + promtail (Docker
  service discovery by the galaxy.stack=dev-deploy label) for logs;
  Tempo (3d) for traces.
- Backend and gateway now export OTLP traces to Tempo over plaintext
  gRPC on the internal network (OTEL_EXPORTER_OTLP_INSECURE).
- Grafana provisioned as code (Prometheus/Loki/Tempo datasources plus a
  starter dashboard), served under /grafana/ via Caddy sub-path mode;
  admin password from the GALAXY_DEV_GRAFANA_ADMIN_PASSWORD secret.
- Expose the Mailpit capture UI under /mailpit/ (Caddy basic-auth +
  MP_WEBROOT) so every captured message is readable regardless of relay.
- dev-deploy.yaml seeds the monitoring config to a stable, reboot-
  surviving host path and injects the Grafana admin secret.

Per-service memory limits keep the footprint within budget. All
collector config lives under tools/dev-deploy/monitoring/ for dev/prod
parity.
Deploy wiring for the observability stack (the services and collector
config landed in the previous commit):

- Caddyfile.dev: route /grafana/* to galaxy-grafana:3000 (Caddy
  sub-path mode, Grafana keeps its own login) and /mailpit/* to
  galaxy-mailpit:8025 behind dev basic-auth, so the captured-mail UI
  (every message, relayed or not) and Grafana are reachable through the
  single dev origin.
- dev-deploy.yaml: seed the monitoring config tree to a stable,
  reboot-surviving host path (GALAXY_DEV_MONITORING_DIR) before bringing
  the stack up, and inject the Grafana admin password from a Gitea
  secret (GALAXY_DEV_GRAFANA_ADMIN_PASSWORD; empty falls back to the
  compose default).
MP_WEBROOT=/mailpit prefixes every Mailpit HTTP route, including the
/livez health endpoint. The container healthcheck still probed
http://localhost:8025/livez, which now 404s, so Mailpit reported
unhealthy; the backend depends_on it with condition: service_healthy
and never started, cascading to the gateway and Caddy and failing
`docker compose up --wait`. Point the healthcheck at /mailpit/livez.
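
The path arithmetic behind this fix is small enough to sketch. The helper below is hypothetical (it is not part of the deploy tooling) and simply shows how the liveness URL moves when a web root is set.

```python
def mailpit_livez_url(webroot: str, host: str = "localhost", port: int = 8025) -> str:
    """Build the liveness URL for a Mailpit instance served under `webroot`."""
    # Normalise "/mailpit", "mailpit/" and "/mailpit/" to the same prefix.
    prefix = webroot.strip("/")
    path = f"/{prefix}/livez" if prefix else "/livez"
    return f"http://{host}:{port}{path}"
```

An empty web root gives the URL the healthcheck originally probed, while `mailpit_livez_url("/mailpit")` gives the corrected one.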
Consolidate the operator console and the observability / captured-mail
UIs behind a single Basic Auth gate, so one password (the admin-console
account, dev: gm/gm-dev-password) unlocks all three, with links in the
console nav:

- Caddyfile.dev: a single basic_auth on /_gm/* fronts nested routes —
  /_gm/grafana/ -> Grafana, /_gm/mailpit/ -> Mailpit, catch-all -> the
  gateway/backend console. Caddy forwards the same Authorization header,
  which the backend console also accepts, so there is one prompt. The
  former top-level /grafana/ and /mailpit/ routes are removed.
- Grafana: served under /_gm/grafana/ (sub-path) as anonymous Admin with
  the login form and basic auth disabled, so it relies solely on the
  /_gm gate and ignores the forwarded credentials.
- Mailpit: MP_WEBROOT=/_gm/mailpit (and the healthcheck path) so its UI
  lives under the gate.
- Operator console: add Grafana and Mailpit links to the nav.
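
The ordering of the gate (authenticate first, then pick the nested route, with the console as catch-all) can be modelled in a few lines. This is a sketch of the behaviour described above, not the Caddyfile itself: `route` and `basic_auth_header` are hypothetical helpers, "console" is a placeholder label for the gateway/backend upstream, and status 200 stands in for "forwarded upstream".

```python
import base64
from typing import Optional, Tuple

# Dev-only account quoted in the commit message above.
GATE_USER, GATE_PASSWORD = "gm", "gm-dev-password"


def basic_auth_header(user: str, password: str) -> str:
    """Encode credentials as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def route(path: str, authorization: Optional[str]) -> Tuple[int, Optional[str]]:
    """Return (status, upstream) for a request under the /_gm gate."""
    if not path.startswith("/_gm/"):
        raise ValueError("path is outside the /_gm gate")
    # One Basic Auth check covers every nested route.
    if authorization != basic_auth_header(GATE_USER, GATE_PASSWORD):
        return 401, None
    # Most specific prefixes first; everything else falls through to the console.
    if path.startswith("/_gm/grafana/"):
        return 200, "galaxy-grafana:3000"
    if path.startswith("/_gm/mailpit/"):
        return 200, "galaxy-mailpit:8025"
    return 200, "console"
```

The same header is forwarded to whichever upstream is chosen: the console accepts it, while Grafana (anonymous Admin, basic auth disabled) ignores it.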
docs: observability stack + the single /_gm gate for Grafana/Mailpit
Tests · Go / test (pull_request) Successful in 1m56s
Tests · Integration / integration (pull_request) Successful in 1m41s
Tests · UI / test (pull_request) Successful in 3m23s
814eae0802
- ARCHITECTURE §17: the dev (production-mirror) collection stack
  (Prometheus / Loki / Tempo / promtail / node-exporter / cAdvisor) and
  the single /_gm Basic Auth gate fronting Grafana and the Mailpit UI.
- tools/dev-deploy/monitoring/README.md (new): services, what is
  collected, Grafana-behind-the-gate access, config delivery, tuning.
- tools/dev-deploy/README.md: an Observability section; the Mailpit UI
  under /_gm/mailpit/; Networking diagram and Files list updated.
- FUNCTIONAL §10.2.1 (+ ru mirror): the operator console nav links to
  Grafana and Mailpit under the same /_gm gate, one sign-in for all.
owner approved these changes 2026-06-01 04:54:07 +00:00
developer merged commit a19512adaa into development 2026-06-01 04:56:46 +00:00
developer deleted branch feature/dev-prod-mirror 2026-06-01 04:56:46 +00:00

Reference: developer/galaxy-game#88