# Runtime and Components

The diagram below focuses on the deployed `galaxy/notification` process and its runtime dependencies.

```mermaid
flowchart LR
    subgraph Producers
        GM["Game Master"]
        Lobby["Game Lobby"]
        Geo["Geo Profile Service"]
    end

    subgraph Notify["Notification Service process"]
        Probe["Private probe HTTP listener\n/healthz /readyz"]
        Consumer["Notification intent consumer"]
        Accept["Intent acceptance service"]
        Push["Push route publisher"]
        Email["Email route publisher"]
        Telemetry["Logs, traces, metrics"]
    end

    User["User Service"]
    Gateway["Edge Gateway\nclient-event stream consumer"]
    Mail["Mail Service\ncommand stream consumer"]
    Redis["Redis\nstate + streams + schedules"]

    GM --> Redis
    Lobby --> Redis
    Geo --> Redis
    Consumer --> Redis
    Consumer --> Accept
    Accept --> User
    Accept --> Redis
    Push --> Redis
    Email --> Redis
    Push --> Gateway
    Email --> Mail
    Probe --> Telemetry
    Consumer --> Telemetry
    Push --> Telemetry
    Email --> Telemetry
```

## Listener

`notification` exposes exactly one HTTP listener:

| Listener | Default addr | Purpose |
| --- | --- | --- |
| Internal probe HTTP | `:8092` | Private liveness and readiness probes |

Shared listener defaults:

- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`

Probe routes:

- `GET /healthz` returns `{"status":"ok"}`
- `GET /readyz` returns `{"status":"ready"}`
- `readyz` is process-local after successful startup and does not perform a live Redis ping per request

Intentional omissions:

- no public listener
- no operator API
- there is no `/metrics` route

## Startup Wiring

`cmd/notification` loads config, constructs logging, and builds the runtime through `internal/app.NewRuntime`.

The runtime wires:

- Redis client with startup connectivity check
- `User Service` HTTP client for recipient enrichment
- private probe HTTP server
- plain `XREAD` intent consumer
- `push` route publisher for `Gateway`
- `email` route publisher for `Mail Service`
- Redis-backed accepted-intent, route, idempotency, malformed-intent, dead-letter, stream-offset, and schedule stores
- OpenTelemetry traces and metrics exporters

Startup fails fast on invalid configuration or unavailable Redis.

## Background Components

### Intent consumer

- reads one plain `XREAD` stream, default `notification:intents`
- starts from stored offset or `0-0`
- advances offset only after durable acceptance or durable malformed-intent recording
- stops without offset advancement when `User Service` enrichment has a temporary failure

### Acceptance service

- validates the normalized intent envelope
- applies idempotency rules for `(producer, idempotency_key)`
- enriches user-targeted recipients before durable route write
- materializes route slots for `push` and `email`
- stores malformed-intent records for invalid payloads, idempotency conflicts, and unresolved users

### Push publisher

- scans `notification:route_schedule`
- processes only scheduled route IDs beginning with `push:`
- coordinates replicas with temporary route leases
- publishes Gateway client events with `XADD MAXLEN ~`
- omits `device_session_id` so Gateway fans out to all active streams for the target user

### Email publisher

- scans `notification:route_schedule`
- processes only scheduled route IDs beginning with `email:`
- coordinates replicas with temporary route leases
- publishes Mail Service generic commands with plain `XADD`
- always uses `payload_mode=template`

## Configuration Groups

Required:

- `NOTIFICATION_REDIS_ADDR`
- `NOTIFICATION_USER_SERVICE_BASE_URL`

Core process config:

- `NOTIFICATION_SHUTDOWN_TIMEOUT`
- `NOTIFICATION_LOG_LEVEL`

Internal HTTP config:

- `NOTIFICATION_INTERNAL_HTTP_ADDR` with default `:8092`
- `NOTIFICATION_INTERNAL_HTTP_READ_HEADER_TIMEOUT` with default `2s`
- `NOTIFICATION_INTERNAL_HTTP_READ_TIMEOUT` with default `10s`
- `NOTIFICATION_INTERNAL_HTTP_IDLE_TIMEOUT` with default `1m`

Redis connectivity:

- `NOTIFICATION_REDIS_USERNAME`
- `NOTIFICATION_REDIS_PASSWORD`
- `NOTIFICATION_REDIS_DB`
- `NOTIFICATION_REDIS_TLS_ENABLED`
- `NOTIFICATION_REDIS_OPERATION_TIMEOUT`
- `NOTIFICATION_INTENTS_STREAM`
- `NOTIFICATION_INTENTS_READ_BLOCK_TIMEOUT`
- `NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM`
- `NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM_MAX_LEN`
- `NOTIFICATION_MAIL_DELIVERY_COMMANDS_STREAM`

Retry and retention:

- `NOTIFICATION_PUSH_RETRY_MAX_ATTEMPTS`
- `NOTIFICATION_EMAIL_RETRY_MAX_ATTEMPTS`
- `NOTIFICATION_ROUTE_BACKOFF_MIN`
- `NOTIFICATION_ROUTE_BACKOFF_MAX`
- `NOTIFICATION_ROUTE_LEASE_TTL`
- `NOTIFICATION_DEAD_LETTER_TTL`
- `NOTIFICATION_RECORD_TTL`
- `NOTIFICATION_IDEMPOTENCY_TTL`

User enrichment:

- `NOTIFICATION_USER_SERVICE_TIMEOUT` with default `1s`

Administrator routing:

- `NOTIFICATION_ADMIN_EMAILS_GEO_REVIEW_RECOMMENDED`
- `NOTIFICATION_ADMIN_EMAILS_GAME_GENERATION_FAILED`
- `NOTIFICATION_ADMIN_EMAILS_LOBBY_RUNTIME_PAUSED_AFTER_START`
- `NOTIFICATION_ADMIN_EMAILS_LOBBY_APPLICATION_SUBMITTED`

Telemetry:

- `OTEL_SERVICE_NAME`
- `OTEL_TRACES_EXPORTER`
- `OTEL_METRICS_EXPORTER`
- `OTEL_EXPORTER_OTLP_PROTOCOL`
- `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL`
- `OTEL_EXPORTER_OTLP_METRICS_PROTOCOL`
- `NOTIFICATION_OTEL_STDOUT_TRACES_ENABLED`
- `NOTIFICATION_OTEL_STDOUT_METRICS_ENABLED`

## Runtime Notes

- `Notification Service` does not create or own notification audiences; it trusts producers to publish concrete user recipients.
- Administrator recipients are type-specific configuration, not a global list.
- A missing user is treated as a producer input defect.
- A temporary `User Service` outage pauses stream progress for the affected entry and allows replay after restart.
- Go producers use `galaxy/notificationintent` to build compatible intents.
- Producers append intents with plain `XADD`; producer-side publish failure is notification degradation and must not roll back already committed source business state.
- Dead-letter replay is performed by publishing a new compatible intent with a new `idempotency_key`.
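
The producer-side rule in the notes above (append with plain `XADD`, and treat a failed publish as notification degradation rather than a reason to undo business state) can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the `galaxy/notificationintent` API: the `streamAppender` interface, the `memoryAppender` stand-in, the `publishIntent` and `completeGame` functions, and the stream field names are all hypothetical. A real producer would back the interface with its Redis client.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// streamAppender is a hypothetical abstraction over a plain Redis XADD.
// A real implementation would wrap whichever Redis client the producer uses.
type streamAppender interface {
	Append(stream string, fields map[string]string) error
}

// memoryAppender is an in-memory stand-in used only for this sketch.
type memoryAppender struct {
	entries []map[string]string
	fail    bool // when true, Append simulates an unavailable Redis
}

func (m *memoryAppender) Append(stream string, fields map[string]string) error {
	if m.fail {
		return errors.New("redis unavailable")
	}
	m.entries = append(m.entries, fields)
	return nil
}

// publishIntent appends one intent and reports whether it was published.
// A failure is logged and reported to the caller, never escalated into an
// error that could abort the surrounding business operation.
func publishIntent(a streamAppender, producer, idempotencyKey, payload string) bool {
	err := a.Append("notification:intents", map[string]string{
		// Field names are assumptions made for this illustration.
		"producer":        producer,
		"idempotency_key": idempotencyKey,
		"payload":         payload,
	})
	if err != nil {
		log.Printf("notification degraded: intent %q not published: %v", idempotencyKey, err)
		return false
	}
	return true
}

// completeGame commits business state first, then publishes best-effort.
// The committed flag stands in for a real database commit.
func completeGame(a streamAppender, committed *bool) bool {
	*committed = true // source business state is already durable here
	return publishIntent(a, "game-master", "game-42-finished", `{"game_id":"42"}`)
}

func main() {
	var committed bool
	down := &memoryAppender{fail: true}
	published := completeGame(down, &committed)
	// The business state stays committed even though the publish failed.
	fmt.Println(committed, published) // prints "true false"
}
```

Returning a boolean instead of an error keeps the degradation visible to metrics and logs while making it awkward for a caller to roll back on publish failure by accident.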