docs: observability stack + the single /_gm gate for Grafana/Mailpit

- ARCHITECTURE §17: the dev (production-mirror) collection stack
  (Prometheus / Loki / Tempo / promtail / node-exporter / cAdvisor) and
  the single /_gm Basic Auth gate fronting Grafana and the Mailpit UI.
- tools/dev-deploy/monitoring/README.md (new): services, what is
  collected, Grafana-behind-the-gate access, config delivery, tuning.
- tools/dev-deploy/README.md: an Observability section; the Mailpit UI
  under /_gm/mailpit/; Networking diagram and Files list updated.
- FUNCTIONAL §10.2.1 (+ ru mirror): the operator console nav links to
  Grafana and Mailpit under the same /_gm gate, one sign-in for all.
Ilia Denisov
2026-06-01 06:37:24 +02:00
parent cb8491c200
commit 814eae0802
5 changed files with 140 additions and 5 deletions
+13
@@ -888,6 +888,19 @@ addition.
- Health probes are unauthenticated `GET /healthz` (process liveness) and
`GET /readyz` (Postgres reachable, migrations applied, gRPC listener
bound). Probes are excluded from anti-replay and rate limiting.
+- **Collection (dev, production mirror).** The long-lived dev environment
+(`tools/dev-deploy/`) runs a full metrics + logs + traces stack on its
+internal network with no host ports: Prometheus scrapes the backend
+(`:9100`) and gateway (`:9191`) endpoints plus `node-exporter` and
+cAdvisor; Tempo ingests OTLP traces from backend and gateway; Loki
+stores container logs shipped by promtail (Docker service discovery on
+the `galaxy.stack=dev-deploy` label). Grafana (provisioned datasources
+and dashboards) and the Mailpit capture UI are reached only through the
+operator console's single `/_gm` Basic Auth gate (§14.1), at
+`/_gm/grafana/` and `/_gm/mailpit/`, so one password covers the
+console and both UIs. Retention is kept short (Prometheus 15d, Loki
+7d, Tempo 3d). The same compose fragment is meant to back production.
+See `tools/dev-deploy/monitoring/README.md`.
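The health-probe contract in the list above (`GET /healthz` for liveness, `GET /readyz` for readiness) could look roughly like the following in Go. This is a minimal sketch, not the project's code: the `Check` type, the `up`/`down` stand-in probes, and the helper names are all hypothetical, and the real readiness checks (Postgres reachable, migrations applied, gRPC listener bound) are only represented by named placeholders.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
	"net/http/httptest"
)

// Check is one named readiness condition (hypothetical type).
type Check struct {
	Name  string
	Probe func() error
}

// up and down are stand-in probes. Real ones would ping Postgres, verify the
// applied migrations, and confirm the gRPC listener is bound.
func up() error { return nil }

func down() error { return errors.New("not ready") }

// getOnly rejects anything but GET, since both probes are plain GET endpoints.
func getOnly(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.Header().Set("Allow", http.MethodGet)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		next(w, r)
	}
}

// healthz reports liveness: if this handler runs, the process is up.
func healthz(w http.ResponseWriter, _ *http.Request) {
	w.Write([]byte("ok\n"))
}

// readyz answers 200 only when every check passes, otherwise 503 with the
// names and errors of the failed checks.
func readyz(checks []Check) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		failed := map[string]string{}
		for _, c := range checks {
			if err := c.Probe(); err != nil {
				failed[c.Name] = err.Error()
			}
		}
		status := http.StatusOK
		if len(failed) > 0 {
			status = http.StatusServiceUnavailable
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		json.NewEncoder(w).Encode(map[string]interface{}{
			"ready":  len(failed) == 0,
			"failed": failed,
		})
	}
}

// newProbeMux mounts both probes with no auth, anti-replay, or rate-limit
// middleware in front of them.
func newProbeMux(checks []Check) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", getOnly(healthz))
	mux.HandleFunc("/readyz", getOnly(readyz(checks)))
	return mux
}

// probeStatus calls a handler in-process and returns the status code it wrote.
func probeStatus(h http.Handler, method, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}
```

To serve the probes, the mux returned by `newProbeMux` can be passed to `http.ListenAndServe`. Keeping liveness independent of the readiness checks means a database outage marks the process as not ready without making it look dead.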
## 18. CI and Environments
+4 -1
@@ -1182,7 +1182,10 @@ The console landing page is a dashboard that summarises operational
health: whether the backend is ready and the database reachable, how many
game runtimes sit in each state, and the depth of the mail and
notification queues. It is a read-only point-in-time view for quick
-triage, not a metrics history.
+triage, not a metrics history. The console nav also links to Grafana
+(metrics, logs and traces) and the Mailpit capture UI, which the
+deployment serves under the same `/_gm` Basic Auth gate, so one sign-in
+covers the console and both UIs.
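The "one gate, one sign-in" idea can be illustrated with a small Go sketch. The source does not say which component implements the `/_gm` gate, so everything here is an assumption for illustration: the function names (`basicAuth`, `newGate`, `proxyTo`, `stub`, `gateStatus`) are hypothetical, and the upstreams are assumed to be configured to serve from their `/_gm/...` sub-paths because the proxy forwards the path unchanged.

```go
package main

import (
	"crypto/subtle"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// basicAuth guards next with one HTTP Basic Auth credential pair.
func basicAuth(user, pass string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		userOK := subtle.ConstantTimeCompare([]byte(u), []byte(user)) == 1
		passOK := subtle.ConstantTimeCompare([]byte(p), []byte(pass)) == 1
		if !ok || !userOK || !passOK {
			w.Header().Set("WWW-Authenticate", `Basic realm="gm"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newGate mounts the console and both UIs under /_gm behind a single check,
// so one credential pair covers all three.
func newGate(user, pass string, console, grafana, mailpit http.Handler) http.Handler {
	mux := http.NewServeMux()
	mux.Handle("/_gm/grafana/", grafana)
	mux.Handle("/_gm/mailpit/", mailpit)
	mux.Handle("/_gm/", console)
	return basicAuth(user, pass, mux)
}

// proxyTo forwards requests to an upstream and keeps the /_gm/... path, so
// the upstream is assumed to serve from that sub-path.
func proxyTo(rawURL string) (http.Handler, error) {
	target, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	return httputil.NewSingleHostReverseProxy(target), nil
}

// stub is a hypothetical stand-in upstream for exercising the gate offline.
func stub(body string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, body)
	})
}

// gateStatus sends one GET through h and returns the status code. An empty
// user means the request carries no credentials.
func gateStatus(h http.Handler, path, user, pass string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if user != "" {
		req.SetBasicAuth(user, pass)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}
```

In a deployment the stubs would be replaced by handlers from `proxyTo`, pointed at whatever internal addresses the Grafana and Mailpit containers use. Wrapping the whole mux in one `basicAuth` call, rather than each route separately, is what keeps the three surfaces on the same credentials.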
### 10.3 Admin account management
+3 -1
@@ -1218,7 +1218,9 @@ admin API, or through the server-rendered web co
health: whether the backend is ready and the DB is reachable, how many game
runtimes are in each state, and how deep the mail and notification queues are.
It is a read-only point-in-time snapshot for quick diagnosis, not a metrics
-history.
+history. The console navigation also leads to Grafana (metrics, logs and
+traces) and the Mailpit mail-capture UI, which the deployment serves behind
+the same `/_gm` Basic Auth gate, so one sign-in covers the console and both UIs.
### 10.3 Admin account management