# Runtime and Components
The diagram below focuses on the deployed `galaxy/notification` process and
its runtime dependencies.
```mermaid
flowchart LR
subgraph Producers
GM["Game Master"]
Lobby["Game Lobby"]
Geo["Geo Profile Service"]
end
subgraph Notify["Notification Service process"]
Probe["Private probe HTTP listener\n/healthz /readyz"]
Consumer["Notification intent consumer"]
Accept["Intent acceptance service"]
Push["Push route publisher"]
Email["Email route publisher"]
Telemetry["Logs, traces, metrics"]
end
User["User Service"]
Gateway["Edge Gateway\nclient-event stream consumer"]
Mail["Mail Service\ncommand stream consumer"]
Redis["Redis\nstate + streams + schedules"]
GM --> Redis
Lobby --> Redis
Geo --> Redis
Consumer --> Redis
Consumer --> Accept
Accept --> User
Accept --> Redis
Push --> Redis
Email --> Redis
Push --> Gateway
Email --> Mail
Probe --> Telemetry
Consumer --> Telemetry
Push --> Telemetry
Email --> Telemetry
```
## Listener
`notification` exposes exactly one HTTP listener:

| Listener | Default addr | Purpose |
| --- | --- | --- |
| Internal probe HTTP | `:8092` | Private liveness and readiness probes |

Shared listener defaults:
- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`

Probe routes:
- `GET /healthz` returns `{"status":"ok"}`
- `GET /readyz` returns `{"status":"ready"}`
- `/readyz` is process-local after successful startup and does not perform a
  live Redis ping per request

Intentional omissions:
- no public listener
- no operator API
- no `/metrics` route
## Startup Wiring
`cmd/notification` loads config, constructs logging, and builds the runtime
through `internal/app.NewRuntime`.

The runtime wires:
- Redis client with startup connectivity check
- `User Service` HTTP client for recipient enrichment
- private probe HTTP server
- plain `XREAD` intent consumer
- `push` route publisher for `Gateway`
- `email` route publisher for `Mail Service`
- Redis-backed accepted-intent, route, idempotency, malformed-intent,
  dead-letter, stream-offset, and schedule stores
- OpenTelemetry traces and metrics exporters

Startup fails fast on invalid configuration or unavailable Redis.
## Background Components
### Intent consumer
- reads one plain `XREAD` stream, default `notification:intents`
- starts from stored offset or `0-0`
- advances offset only after durable acceptance or durable malformed-intent
recording
- stops without offset advancement when `User Service` enrichment has a
temporary failure
### Acceptance service
- validates the normalized intent envelope
- applies idempotency rules for `(producer, idempotency_key)`
- enriches user-targeted recipients before durable route write
- materializes route slots for `push` and `email`
- stores malformed-intent records for invalid payloads, idempotency conflicts,
and unresolved users
### Push publisher
- scans `notification:route_schedule`
- processes only scheduled route IDs beginning with `push:`
- coordinates replicas with temporary route leases
- publishes Gateway client events with `XADD MAXLEN ~`
- omits `device_session_id` so Gateway fans out to all active streams for the
target user
### Email publisher
- scans `notification:route_schedule`
- processes only scheduled route IDs beginning with `email:`
- coordinates replicas with temporary route leases
- publishes Mail Service generic commands with plain `XADD`
- always uses `payload_mode=template`
## Configuration Groups
Required:

- `NOTIFICATION_REDIS_ADDR`
- `NOTIFICATION_USER_SERVICE_BASE_URL`

Core process config:

- `NOTIFICATION_SHUTDOWN_TIMEOUT`
- `NOTIFICATION_LOG_LEVEL`

Internal HTTP config:

- `NOTIFICATION_INTERNAL_HTTP_ADDR` with default `:8092`
- `NOTIFICATION_INTERNAL_HTTP_READ_HEADER_TIMEOUT` with default `2s`
- `NOTIFICATION_INTERNAL_HTTP_READ_TIMEOUT` with default `10s`
- `NOTIFICATION_INTERNAL_HTTP_IDLE_TIMEOUT` with default `1m`

Redis connectivity:

- `NOTIFICATION_REDIS_USERNAME`
- `NOTIFICATION_REDIS_PASSWORD`
- `NOTIFICATION_REDIS_DB`
- `NOTIFICATION_REDIS_TLS_ENABLED`
- `NOTIFICATION_REDIS_OPERATION_TIMEOUT`
- `NOTIFICATION_INTENTS_STREAM`
- `NOTIFICATION_INTENTS_READ_BLOCK_TIMEOUT`
- `NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM`
- `NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM_MAX_LEN`
- `NOTIFICATION_MAIL_DELIVERY_COMMANDS_STREAM`

Retry and retention:

- `NOTIFICATION_PUSH_RETRY_MAX_ATTEMPTS`
- `NOTIFICATION_EMAIL_RETRY_MAX_ATTEMPTS`
- `NOTIFICATION_ROUTE_BACKOFF_MIN`
- `NOTIFICATION_ROUTE_BACKOFF_MAX`
- `NOTIFICATION_ROUTE_LEASE_TTL`
- `NOTIFICATION_DEAD_LETTER_TTL`
- `NOTIFICATION_RECORD_TTL`
- `NOTIFICATION_IDEMPOTENCY_TTL`

User enrichment:

- `NOTIFICATION_USER_SERVICE_TIMEOUT` with default `1s`

Administrator routing:

- `NOTIFICATION_ADMIN_EMAILS_GEO_REVIEW_RECOMMENDED`
- `NOTIFICATION_ADMIN_EMAILS_GAME_GENERATION_FAILED`
- `NOTIFICATION_ADMIN_EMAILS_LOBBY_RUNTIME_PAUSED_AFTER_START`
- `NOTIFICATION_ADMIN_EMAILS_LOBBY_APPLICATION_SUBMITTED`

Telemetry:

- `OTEL_SERVICE_NAME`
- `OTEL_TRACES_EXPORTER`
- `OTEL_METRICS_EXPORTER`
- `OTEL_EXPORTER_OTLP_PROTOCOL`
- `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL`
- `OTEL_EXPORTER_OTLP_METRICS_PROTOCOL`
- `NOTIFICATION_OTEL_STDOUT_TRACES_ENABLED`
- `NOTIFICATION_OTEL_STDOUT_METRICS_ENABLED`
## Runtime Notes
- `Notification Service` does not create or own notification audiences; it
trusts producers to publish concrete user recipients.
- Administrator recipients are type-specific configuration, not a global list.
- A missing user is treated as a producer input defect.
- A temporary `User Service` outage pauses stream progress for the affected
entry and allows replay after restart.
- Go producers use `galaxy/notificationintent` to build compatible intents.
- Producers append intents with plain `XADD`; producer-side publish failure is
notification degradation and must not roll back already committed source
business state.
- Dead-letter replay is performed by publishing a new compatible intent with a
new `idempotency_key`.