Stage 12: observability & performance (OTel/OTLP, metrics, guest GC) #13
Re-scopes Stage 12 to observability + performance + guest GC, and adds Stage 13 (alphabet-on-wire) and Stage 14 (CI/deploy) to PLAN.md. Includes a shared pkg/telemetry (none/stdout/otlp), gateway+connector telemetry parity, otelgrpc on the gRPC hops, domain metrics (variant-attributed), and the TODO-3 guest reaper. Tests + docs included.
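For reviewers, the none/stdout/otlp selection can be pictured roughly as below. This is a minimal stdlib-only sketch, not the code in pkg/telemetry: `ExporterKind`, `ParseExporter`, the case-insensitive matching and the whitespace trimming are all assumptions made for illustration. The only behaviour taken from this PR is that the three modes exist and that an unset value falls back to none.

```go
package main

import (
	"fmt"
	"strings"
)

// ExporterKind is a hypothetical name for the exporter mode; the real
// type in pkg/telemetry may look different.
type ExporterKind string

const (
	ExporterNone   ExporterKind = "none"
	ExporterStdout ExporterKind = "stdout"
	ExporterOTLP   ExporterKind = "otlp"
)

// ParseExporter maps a raw config value to one of the three modes.
// An empty value falls back to "none"; anything else is rejected.
// Trimming and lower-casing are assumptions of this sketch.
func ParseExporter(raw string) (ExporterKind, error) {
	switch k := ExporterKind(strings.ToLower(strings.TrimSpace(raw))); k {
	case "":
		return ExporterNone, nil
	case ExporterNone, ExporterStdout, ExporterOTLP:
		return k, nil
	default:
		return "", fmt.Errorf("unknown telemetry exporter %q (want none, stdout or otlp)", raw)
	}
}

func main() {
	for _, raw := range []string{"", "stdout", "otlp", "jaeger"} {
		kind, err := ParseExporter(raw)
		fmt.Printf("input=%q kind=%q err=%v\n", raw, kind, err)
	}
}
```

Rejecting unknown values at config load time, rather than silently falling back, keeps a typo from disabling telemetry unnoticed.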
- pkg/telemetry: shared OTel provider bootstrap (none/stdout/otlp + W3C propagators + Go runtime metrics); backend/internal/telemetry becomes a thin facade keeping its gin middleware.
- Telemetry parity: gateway and the Telegram connector gain telemetry runtimes and config (GATEWAY_/TELEGRAM_ SERVICE_NAME + OTEL_*); otelgrpc instruments the backend push server, the gateway's backend+connector clients and the connector server. Default exporter stays none (collector/dashboards are Stage 14).
- Operational metrics (variant attribute on game-scoped ones): game_replay_duration, game_move_validate_duration, games_started_total, games_abandoned_total, game_cache_active, chat_messages_total{kind}, gateway edge_request_duration. Wired via the SetMetrics setter pattern (default no-op meter).
- TODO-3: account.GuestReaper deletes guests with no game seat past BACKEND_GUEST_RETENTION (default 30d, swept every BACKEND_GUEST_REAP_INTERVAL).
- Tests: pkg/telemetry exporter selection; game/social/edge metric recording via a manual reader; config (otlp accepted, guest knobs); inttest guest reaper.
- Docs: PLAN.md re-scopes Stage 12 and adds Stage 13 (alphabet-on-wire) + Stage 14 (CI/deploy) with the agreed dictionary-versioning resolution; ARCHITECTURE 11/13, TESTING, the three READMEs and FUNCTIONAL(+ru) updated.
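The SetMetrics setter pattern with a no-op default, mentioned under operational metrics, might look something like the sketch below. It deliberately avoids the OTel SDK so it runs on the standard library alone: `Recorder`, `noopRecorder`, `countingRecorder` and `GameService` are hypothetical stand-ins, and the real code presumably records through an OTel meter instead. Only the `SetMetrics` name, the no-op default and the variant attribute come from this PR.

```go
package main

import "fmt"

// Recorder is a hypothetical, narrowed stand-in for the metric
// instruments a service records into.
type Recorder interface {
	GameStarted(variant string)
}

// noopRecorder discards every measurement; it is the default so that
// services never need a nil check before recording.
type noopRecorder struct{}

func (noopRecorder) GameStarted(string) {}

// countingRecorder counts started games per variant. It plays the role
// a manual reader plays in tests.
type countingRecorder struct {
	started map[string]int
}

func (c *countingRecorder) GameStarted(variant string) { c.started[variant]++ }

// GameService is a hypothetical service that emits one metric.
type GameService struct {
	metrics Recorder
}

// NewGameService wires the no-op recorder by default.
func NewGameService() *GameService {
	return &GameService{metrics: noopRecorder{}}
}

// SetMetrics swaps in a real recorder; nil resets to the no-op default.
func (s *GameService) SetMetrics(r Recorder) {
	if r == nil {
		r = noopRecorder{}
	}
	s.metrics = r
}

// Start records a started game attributed to its variant.
func (s *GameService) Start(variant string) {
	s.metrics.GameStarted(variant)
}

func main() {
	svc := NewGameService()
	svc.Start("classic") // safe: recorded into the no-op default

	rec := &countingRecorder{started: map[string]int{}}
	svc.SetMetrics(rec)
	svc.Start("classic")
	svc.Start("blitz")
	fmt.Println(rec.started["classic"], rec.started["blitz"])
}
```

The setter keeps constructors unchanged for callers that do not care about metrics, which is likely why existing tests keep passing without a telemetry runtime.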