developer/galaxy-game

Ilia Denisov 814eae0802

docs: observability stack + the single /_gm gate for Grafana/Mailpit

- ARCHITECTURE §17: the dev (production-mirror) collection stack
  (Prometheus / Loki / Tempo / promtail / node-exporter / cAdvisor) and
  the single /_gm Basic Auth gate fronting Grafana and the Mailpit UI.
- tools/dev-deploy/monitoring/README.md (new): services, what is
  collected, Grafana-behind-the-gate access, config delivery, tuning.
- tools/dev-deploy/README.md: an Observability section; the Mailpit UI
  under /_gm/mailpit/; Networking diagram and Files list updated.
- FUNCTIONAL §10.2.1 (+ ru mirror): the operator console nav links to
  Grafana and Mailpit under the same /_gm gate, one sign-in for all.

2026-06-01 06:37:24 +02:00

Name	Last commit	Date
`grafana`	feat(dev-deploy): full observability stack (Prometheus/Grafana/Loki/Tempo)	2026-05-31 23:39:06 +02:00
`loki`	feat(dev-deploy): full observability stack (Prometheus/Grafana/Loki/Tempo)	2026-05-31 23:39:06 +02:00
`prometheus`	feat(dev-deploy): full observability stack (Prometheus/Grafana/Loki/Tempo)	2026-05-31 23:39:06 +02:00
`promtail`	feat(dev-deploy): full observability stack (Prometheus/Grafana/Loki/Tempo)	2026-05-31 23:39:06 +02:00
`tempo`	feat(dev-deploy): full observability stack (Prometheus/Grafana/Loki/Tempo)	2026-05-31 23:39:06 +02:00
`README.md`	docs: observability stack + the single /_gm gate for Grafana/Mailpit	2026-06-01 06:37:24 +02:00

README.md

`tools/dev-deploy/monitoring/` — observability stack

The long-lived dev environment runs a full metrics + logs + traces stack alongside the application as a production mirror: the same compose fragment and collector configs are meant to back production later. Every collector lives on the internal galaxy-dev-internal network and publishes no host port. The browser-reachable pieces (Grafana and the Mailpit UI) sit behind the operator console's single /_gm Basic Auth gate — see ../README.md and ARCHITECTURE.md §14.

Services

Service	Image	Role	Reachable
`galaxy-prometheus`	`prom/prometheus`	Scrape + store metrics (15d)	internal `:9090`
`galaxy-loki`	`grafana/loki`	Log store (7d)	internal `:3100`
`galaxy-promtail`	`grafana/promtail`	Ship container logs to Loki	—
`galaxy-tempo`	`grafana/tempo`	Trace store (3d), OTLP receiver	internal `:3200`, OTLP `:4317`/`:4318`
`galaxy-node-exporter`	`prom/node-exporter`	Host metrics	internal `:9100`
`galaxy-cadvisor`	`cadvisor`	Per-container CPU/memory/IO	internal `:8080`
`galaxy-grafana`	`grafana/grafana`	Dashboards + Explore	Caddy `/_gm/grafana/`

What is collected

Metrics. Prometheus (30s interval) scrapes the backend Prometheus endpoint (galaxy-backend:9100), the gateway admin endpoint (galaxy-api:9191), node-exporter (host) and cAdvisor (per container). Engine containers expose no /metrics; cAdvisor covers their resource use.
Logs. promtail discovers containers through the Docker API, filtered to the galaxy.stack=dev-deploy label, and ships their stdout/stderr to Loki labelled by container.
Traces. backend and gateway export OTLP traces over gRPC to Tempo (galaxy-tempo:4317), plaintext on the internal network (OTEL_EXPORTER_OTLP_INSECURE=true, since Tempo's receiver is not TLS-wrapped inside the contour).
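As an illustration of the metrics part, the four scrape targets named above could be expressed in a Prometheus config roughly like this. It is a minimal sketch, not the repository's actual file: the job names are hypothetical, the node-exporter and cAdvisor host names are taken from the Services table, and the default `/metrics` path is assumed for every target (the gateway admin endpoint may well use a different path).

```yaml
# Sketch of a prometheus.yml for the targets listed above.
# Job names are illustrative assumptions, not taken from the repository.
global:
  scrape_interval: 30s          # matches the documented 30s interval

scrape_configs:
  - job_name: backend           # hypothetical job name
    static_configs:
      - targets: ["galaxy-backend:9100"]

  - job_name: gateway           # hypothetical job name
    # metrics_path defaults to /metrics; adjust if the admin endpoint differs
    static_configs:
      - targets: ["galaxy-api:9191"]

  - job_name: node-exporter     # host metrics
    static_configs:
      - targets: ["galaxy-node-exporter:9100"]

  - job_name: cadvisor          # per-container metrics, including engine containers
    static_configs:
      - targets: ["galaxy-cadvisor:8080"]
```

All targets are addressed by container name because every collector sits on the internal network and publishes no host port.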

Grafana access (behind the `/_gm` gate)

Grafana is served under /_gm/grafana/ (GF_SERVER_ROOT_URL + GF_SERVER_SERVE_FROM_SUB_PATH=true) behind the shared operator gate: the Caddy /_gm/* Basic Auth (the admin-console account) is the only barrier. Grafana itself runs as anonymous Admin with its login form and basic auth disabled (GF_AUTH_ANONYMOUS_ENABLED=true, GF_AUTH_ANONYMOUS_ORG_ROLE=Admin, GF_AUTH_DISABLE_LOGIN_FORM=true, GF_AUTH_BASIC_ENABLED=false), so it ignores the forwarded credentials and asks for no second password. GALAXY_DEV_GRAFANA_ADMIN_PASSWORD still seeds the admin user for provisioning/API use.
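The settings above could appear in the compose service roughly as follows. This is a sketch under assumptions: the host in `GF_SERVER_ROOT_URL` is a placeholder, and wiring `GALAXY_DEV_GRAFANA_ADMIN_PASSWORD` into Grafana's `GF_SECURITY_ADMIN_PASSWORD` is a guess at how the seeding is done, not something the README states.

```yaml
# Sketch of the Grafana service environment; values marked "assumed" are illustrative.
services:
  galaxy-grafana:
    image: grafana/grafana
    environment:
      # Placeholder host (assumed); only the /_gm/grafana/ sub-path comes from the README.
      GF_SERVER_ROOT_URL: "https://dev.example.invalid/_gm/grafana/"
      GF_SERVER_SERVE_FROM_SUB_PATH: "true"
      # Anonymous Admin: the Caddy /_gm/* Basic Auth is the only barrier.
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"
      GF_AUTH_DISABLE_LOGIN_FORM: "true"
      # With basic auth off, Grafana ignores the Authorization header Caddy forwards.
      GF_AUTH_BASIC_ENABLED: "false"
      # Assumed mapping: seeds the admin user for provisioning/API use.
      GF_SECURITY_ADMIN_PASSWORD: "${GALAXY_DEV_GRAFANA_ADMIN_PASSWORD}"
    networks:
      - galaxy-dev-internal
```

Because anonymous users get the Admin role, this arrangement is only safe as long as Grafana is unreachable except through the gate, which is why the service publishes no host port.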

Datasources (Prometheus, Loki, Tempo) and a starter dashboard (grafana/dashboards/galaxy-overview.json) are provisioned as code under grafana/provisioning/.
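A datasource provisioning file for the three stores might look like the sketch below. The file location and datasource names are assumptions; the URLs use the internal ports from the Services table.

```yaml
# Sketch of a Grafana datasource provisioning file, for example
# grafana/provisioning/datasources/datasources.yaml (file name assumed).
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                      # Grafana's backend performs the queries
    url: http://galaxy-prometheus:9090
    isDefault: true

  - name: Loki
    type: loki
    access: proxy
    url: http://galaxy-loki:3100

  - name: Tempo
    type: tempo
    access: proxy
    url: http://galaxy-tempo:3200      # Tempo's HTTP query port, not the OTLP receiver
```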

Config delivery

dev-deploy.yaml copies this directory to a stable host path ($HOME/.galaxy-dev/monitoring, exported as GALAXY_DEV_MONITORING_DIR) before compose up, and the compose binds it read-only into the collectors. A stable path — not the ephemeral CI workspace — keeps the mounts valid across container restarts and host reboots (the same lesson as the geoip volume; see ../KNOWN-ISSUES.md).
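The read-only bind described above could be written in the compose file along these lines. It is a sketch for one collector only; the file name inside the `prometheus` directory and the in-container path are assumptions.

```yaml
# Sketch: binding the stable host copy of the config into a collector, read-only.
services:
  galaxy-prometheus:
    image: prom/prometheus
    volumes:
      # GALAXY_DEV_MONITORING_DIR resolves to $HOME/.galaxy-dev/monitoring on the host.
      # The prometheus.yml file name is assumed; ":ro" makes the mount read-only.
      - ${GALAXY_DEV_MONITORING_DIR}/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    networks:
      - galaxy-dev-internal            # internal network only, no "ports:" entry
```

Since the source path lives outside the CI workspace, the mount still resolves after the workspace is cleaned up or the host reboots.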

Tuning (cost knobs)

Defaults favour the smallest workable footprint; all are config/compose values:

Prometheus scrape_interval=30s, --storage.tsdb.retention.time=15d.
Loki retention_period=168h (7d); Tempo block_retention=72h (3d).
cAdvisor --housekeeping_interval=30s.
Per-service deploy.resources.limits.memory caps (~1.5 GB total cap; steady-state well under that).
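As a rough illustration, the compose-level knobs from this list could be set as shown below. The flags and retention values come from the list; the memory figures are hypothetical placeholders, and the Loki and Tempo retention settings live in their own config files and are not shown.

```yaml
# Sketch of the compose-side tuning knobs; memory limits are hypothetical values.
services:
  galaxy-prometheus:
    image: prom/prometheus
    command:
      # Overriding "command" replaces the image's default flags,
      # so the config file path has to be passed explicitly.
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=15d
    deploy:
      resources:
        limits:
          memory: 256M                 # hypothetical per-service cap

  galaxy-cadvisor:
    command:
      - --housekeeping_interval=30s    # less frequent housekeeping lowers CPU cost
    deploy:
      resources:
        limits:
          memory: 128M                 # hypothetical per-service cap
```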

The seven always-on containers cost roughly 1.1 GB of steady-state RAM and 1.5–2.5 GB of disk at these retention windows. cAdvisor is the main CPU cost; on a constrained host it can be dropped, since host and app metrics still cover most needs.
