# Notification Service

Canonical references:

- [Service-local docs](docs/README.md)
- [Intent AsyncAPI contract](api/intents-asyncapi.yaml)
- [Probe OpenAPI contract](openapi.yaml)
- [Gateway push model](../gateway/README.md)
- [Mail async command contract](../mail/api/delivery-commands-asyncapi.yaml)
- [Notification FlatBuffers payloads](../pkg/schema/fbs/notification.fbs)
- [System architecture](../ARCHITECTURE.md)

## Purpose

`Notification Service` is the internal asynchronous orchestration layer for
platform notifications.
It accepts normalized notification intents from upstream services, materializes
per-recipient routes, enriches user-targeted routes through `User Service`,
publishes client-facing push events toward `Gateway`, publishes non-auth email
commands toward `Mail Service`, and isolates transient downstream failures with
independent retry budgets per channel.

The service is intentionally not a source of truth for:

- game state
- lobby membership
- invite ownership
- review flags
- notification preferences
- email delivery attempts

## Responsibility Boundaries

`Notification Service` is responsible for:

- consuming normalized notification intents from a dedicated Redis Stream
- validating intent envelopes and rejecting malformed or conflicting duplicates
- persisting durable notification and route state
- resolving user contact data from `User Service` by `user_id`
- selecting locale from `User Service.preferred_language` with `en` fallback
- shaping lightweight push payloads for user-facing events
- publishing template-mode email commands to `Mail Service`
- retrying route publication independently for `push` and `email`
- persisting dead-letter entries for exhausted routes

`Notification Service` is not responsible for:

- computing business audiences from `game_id` or other domain identifiers
- owning administrator identity or administrator user records
- sending auth-code email
- storing per-user notification preferences in v1
- exposing an operator
REST API in v1

The key design rule is that upstream producers must publish the concrete
`recipient_user_id` values for user-targeted notification intents.
For administrator-only notification types, recipient email addresses are
resolved from `Notification Service` configuration by `notification_type`.
Private-game invite notifications in v1 remain user-bound by internal
`user_id` values and must not target recipients by raw email address.

## Runtime Surface

The implemented process contains:

- one private internal HTTP probe listener
- process-wide structured logging
- process-wide OpenTelemetry runtime
- one shared `galaxy/notificationintent` producer contract module
- one shared Redis client with startup connectivity check
- one trusted `User Service` HTTP enrichment client
- one plain-`XREAD` notification-intent consumer
- one long-lived `push` route publisher
- one long-lived `email` route publisher
- durable accepted-intent, route, idempotency, malformed-intent, and stream-offset storage in Redis
- user-targeted route enrichment during intent acceptance before durable write
- client-facing `push` publication toward `Gateway`
- template-mode `email` publication toward `Mail Service`
- durable `push` and `email` retry, dead-letter, and temporary lease coordination in Redis
- OpenTelemetry counters and observable gauges for intent intake, user enrichment, route publication, route schedule depth, and intent stream lag
- graceful shutdown on process cancellation

Probe contract:

- `GET /healthz` returns `{"status":"ok"}`
- `GET /readyz` returns `{"status":"ready"}`
- `readyz` is process-local after successful startup and does not perform a live Redis ping per request
- there is no `/metrics` route

Runtime behavior:

- the intent consumer reads `notification:intents` with plain `XREAD`
- when no stored stream offset exists, the consumer starts from `0-0`
- the persisted offset advances only after durable acceptance or durable malformed-intent recording
- user-targeted
routes are enriched through `GET /api/v1/internal/users/{user_id}` before durable route write
- `404 subject_not_found` from `User Service` is recorded under malformed-intent storage with `failure_code=recipient_not_found`
- temporary `User Service` lookup failures stop the consumer before stream-offset advance
- due `push` routes are published toward `Gateway` from the shared `notification:route_schedule`
- due `email` routes are published toward `Mail Service` from the shared `notification:route_schedule`
- the `push` publisher claims only routes whose `route_id` starts with `push:`
- the `email` publisher claims only routes whose `route_id` starts with `email:`
- replicas coordinate through temporary Redis lease `notification:route_leases::`
- `Gateway` publication uses `XADD MAXLEN ~` with `NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM_MAX_LEN`
- `event_id` equals `/`
- `Mail Service` publication uses plain `XADD` with no stream trimming
- `delivery_id` equals `/`
- `idempotency_key` equals `notification:/`
- `requested_at_ms` equals `accepted_at_ms`
- `request_id` and `trace_id` are forwarded when present
- `device_session_id` is intentionally omitted so `Gateway` fans the event out to every active stream of that user
- Go producers use `galaxy/notificationintent` to construct and publish compatible intents into `notification:intents`
- producer publication uses plain `XADD` without stream trimming or hidden helper retries
- a producer-side notification publication failure is notification degradation and must not roll back the source business state
- metric export uses the configured OpenTelemetry exporters only
- there is still no `/metrics` route
- `notification.route_schedule.depth` and `notification.route_schedule.oldest_age_ms` are derived from `notification:route_schedule`
- `notification.intent_stream.oldest_unprocessed_age_ms` is derived from the persisted intent stream offset and the configured ingress stream
- manual dead-letter replay is performed by
publishing a new compatible intent with a new `idempotency_key`; existing dead-letter records remain audit history until TTL expiry

The target process shape is one internal-only process with:

- one notification-intent consumer
- one `push` route publisher for `Gateway`
- one `email` route publisher for `Mail Service`

Intentional runtime omissions in v1:

- no public ingress
- no dedicated operator REST API
- no direct client delivery
- no direct SMTP integration

## Configuration

Required:

- `NOTIFICATION_REDIS_ADDR`
- `NOTIFICATION_USER_SERVICE_BASE_URL`

Primary configuration groups:

- process and logging:
  - `NOTIFICATION_SHUTDOWN_TIMEOUT`
  - `NOTIFICATION_LOG_LEVEL`
- internal probe HTTP:
  - `NOTIFICATION_INTERNAL_HTTP_ADDR` with default `:8092`
  - `NOTIFICATION_INTERNAL_HTTP_READ_HEADER_TIMEOUT` with default `2s`
  - `NOTIFICATION_INTERNAL_HTTP_READ_TIMEOUT` with default `10s`
  - `NOTIFICATION_INTERNAL_HTTP_IDLE_TIMEOUT` with default `1m`
- Redis connectivity:
  - `NOTIFICATION_REDIS_USERNAME`
  - `NOTIFICATION_REDIS_PASSWORD`
  - `NOTIFICATION_REDIS_DB`
  - `NOTIFICATION_REDIS_TLS_ENABLED`
  - `NOTIFICATION_REDIS_OPERATION_TIMEOUT`
- stream names:
  - `NOTIFICATION_INTENTS_STREAM` with default `notification:intents`
  - `NOTIFICATION_INTENTS_READ_BLOCK_TIMEOUT` with default `2s`
  - `NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM` with default `gateway:client-events`
  - `NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM_MAX_LEN` with default `1024`
  - `NOTIFICATION_MAIL_DELIVERY_COMMANDS_STREAM` with default `mail:delivery_commands`
- retry and dead-letter:
  - `NOTIFICATION_PUSH_RETRY_MAX_ATTEMPTS` with default `3`
  - `NOTIFICATION_EMAIL_RETRY_MAX_ATTEMPTS` with default `7`
  - `NOTIFICATION_ROUTE_BACKOFF_MIN` with default `1s`
  - `NOTIFICATION_ROUTE_BACKOFF_MAX` with default `5m`
  - `NOTIFICATION_ROUTE_LEASE_TTL` with default `5s`
  - `NOTIFICATION_DEAD_LETTER_TTL` with default `720h`
  - `NOTIFICATION_RECORD_TTL` with default `720h`
  - `NOTIFICATION_IDEMPOTENCY_TTL` with default `168h`
- `User Service`
enrichment:
  - `NOTIFICATION_USER_SERVICE_TIMEOUT` with default `1s`
- administrator routing:
  - `NOTIFICATION_ADMIN_EMAILS_GEO_REVIEW_RECOMMENDED`
  - `NOTIFICATION_ADMIN_EMAILS_GAME_GENERATION_FAILED`
  - `NOTIFICATION_ADMIN_EMAILS_LOBBY_RUNTIME_PAUSED_AFTER_START`
  - `NOTIFICATION_ADMIN_EMAILS_LOBBY_APPLICATION_SUBMITTED`
- OpenTelemetry:
  - standard `OTEL_*` variables
  - `NOTIFICATION_OTEL_STDOUT_TRACES_ENABLED`
  - `NOTIFICATION_OTEL_STDOUT_METRICS_ENABLED`

Each administrator configuration variable stores a comma-separated list of
email addresses for exactly one `notification_type`.
v1 does not use one global admin-recipient list shared across all
administrative events.

## Stable Input Contract

The service accepts intents from one dedicated Redis Stream:

- `notification:intents`

The canonical envelope is defined in
[api/intents-asyncapi.yaml](api/intents-asyncapi.yaml).
Go producers should use the shared `galaxy/notificationintent` module to build
and append compatible stream entries instead of duplicating field names,
payload structs, or validation rules locally.

Required envelope fields:

- `notification_type`
- `producer`
- `audience_kind`
- `idempotency_key`
- `occurred_at_ms`
- `payload_json`

Optional envelope fields:

- `recipient_user_ids_json`
- `request_id`
- `trace_id`

Rules:

- `audience_kind=user` requires `recipient_user_ids_json` with one or more unique stable `user_id` values
- `audience_kind=admin_email` forbids `recipient_user_ids_json`
- `recipient_user_ids_json` is normalized as an unordered recipient set, so duplicate `user_id` values are invalid and element order does not affect idempotency
- `request_id` and `trace_id` are observability-only metadata and do not participate in the idempotency fingerprint
- `payload_json` is type-specific, must remain backward-compatible for each `notification_type`, and is normalized structurally for duplicate detection: insignificant whitespace and object key order are ignored while array order remains significant
- a replay with the same `(producer, idempotency_key)` and the same normalized payload is treated as a successful duplicate
- a replay with the same `(producer, idempotency_key)` but different normalized content is recorded as a conflicting duplicate under malformed-intent storage with `failure_code=idempotency_conflict` and must not create new routes
- during user enrichment, a missing `user_id` in `User Service` is recorded under malformed-intent storage with `failure_code=recipient_not_found`

Malformed stream entries do not create durable notification records.
They are logged, metered, and recorded separately for operator inspection.
Accepted intents use the original Redis Stream `stream_entry_id` as
`notification_id`.

## Notification Catalog

`payload_json` fields are normalized by the producer before publication.
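
The structural normalization that the input contract describes for duplicate
detection can be sketched as follows. This is a minimal illustration, not the
service's actual implementation: the helper names `canonicalJSON` and
`fingerprint`, the choice of SHA-256, and the exact set of fields mixed into
the fingerprint are assumptions. It relies on `encoding/json` emitting map
keys in sorted order, so whitespace and key order drop out while array order
is kept.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// canonicalJSON re-encodes a JSON document so that insignificant whitespace
// and object key order no longer matter. Array order is preserved.
func canonicalJSON(raw string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.UseNumber() // keep numeric literals instead of converting to float64
	var v any
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v) // map keys are written in sorted order
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// fingerprint is a hypothetical request fingerprint. request_id and trace_id
// are deliberately not inputs; which other fields participate is an assumption.
func fingerprint(notificationType, audienceKind string, recipientUserIDs []string, payloadJSON string) (string, error) {
	payload, err := canonicalJSON(payloadJSON)
	if err != nil {
		return "", err
	}
	// Recipients form an unordered set: sort a copy and reject duplicates.
	ids := append([]string(nil), recipientUserIDs...)
	sort.Strings(ids)
	for i := 1; i < len(ids); i++ {
		if ids[i] == ids[i-1] {
			return "", fmt.Errorf("duplicate recipient user_id %q", ids[i])
		}
	}
	idsJSON, err := json.Marshal(ids)
	if err != nil {
		return "", err
	}
	parts := []string{notificationType, audienceKind, string(idsJSON), payload}
	sum := sha256.Sum256([]byte(strings.Join(parts, "\n")))
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := fingerprint("game.turn.ready", "user", []string{"u2", "u1"}, `{"game_id":"g1","turn_number":3}`)
	b, _ := fingerprint("game.turn.ready", "user", []string{"u1", "u2"}, `{ "turn_number": 3, "game_id": "g1" }`)
	fmt.Println("same fingerprint:", a == b)
}
```

Under these assumptions, a replay whose fingerprint matches the stored one
would count as a successful duplicate, and a mismatch for the same
`(producer, idempotency_key)` would be an `idempotency_conflict`.
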

| `notification_type` | Producer | Audience | Channels | Required `payload_json` fields |
| --- | --- | --- | --- | --- |
| `geo.review_recommended` | `Geo Profile Service` (`geoprofile`) | configured admin email list (`audience_kind=admin_email`) | `email` | `user_id`, `user_email`, `observed_country`, `usual_connection_country`, `review_reason` |
| `game.turn.ready` | `Game Master` (`game_master`) | active accepted participants (`audience_kind=user`) | `push+email` | `game_id`, `game_name`, `turn_number` |
| `game.finished` | `Game Master` (`game_master`) | active accepted participants (`audience_kind=user`) | `push+email` | `game_id`, `game_name`, `final_turn_number` |
| `game.generation_failed` | `Game Master` (`game_master`) | configured admin email list (`audience_kind=admin_email`) | `email` | `game_id`, `game_name`, `failure_reason` |
| `lobby.runtime_paused_after_start` | `Game Lobby` (`game_lobby`) | configured admin email list (`audience_kind=admin_email`) | `email` | `game_id`, `game_name` |
| `lobby.application.submitted` | `Game Lobby` (`game_lobby`) | private owner (`audience_kind=user`) or public admins (`audience_kind=admin_email`) | private: `push+email`, public: `email` | `game_id`, `game_name`, `applicant_user_id`, `applicant_name` |
| `lobby.membership.approved` | `Game Lobby` (`game_lobby`) | applicant user (`audience_kind=user`) | `push+email` | `game_id`, `game_name` |
| `lobby.membership.rejected` | `Game Lobby` (`game_lobby`) | applicant user (`audience_kind=user`) | `push+email` | `game_id`, `game_name` |
| `lobby.invite.created` | `Game Lobby` (`game_lobby`) | invited user (`audience_kind=user`) | `push+email` | `game_id`, `game_name`, `inviter_user_id`, `inviter_name` |
| `lobby.invite.redeemed` | `Game Lobby` (`game_lobby`) | private-game owner (`audience_kind=user`) | `push+email` | `game_id`, `game_name`, `invitee_user_id`, `invitee_name` |
| `lobby.invite.expired` | `Game Lobby` (`game_lobby`) | private-game owner (`audience_kind=user`) | `email` | `game_id`, `game_name`, `invitee_user_id`, `invitee_name` |

Rules:

- v1 supports exactly the eleven `notification_type` values listed above
- `lobby.application.submitted` keeps one stable `notification_type` and one stable `payload_json` shape; private games publish `audience_kind=user` while public games publish `audience_kind=admin_email`
- `lobby.invite.revoked` deliberately produces no notification in v1 and remains outside the supported catalog
- private-game invite notifications remain user-bound by internal `user_id`

## Recipient Enrichment And Locale Policy

For `audience_kind=user`, `Notification Service` resolves user records through
the trusted `User Service` lookup endpoint:

- `GET /api/v1/internal/users/{user_id}`

The response supplies:

- `email`
- `preferred_language`

Locale rules:

- current implemented support is exactly one locale: `en`
- exact `preferred_language` is used when supported by `Mail Service`
- unsupported, empty, or invalid language values fall back to `en`
- no intermediate locale reduction is used in v1
- the same resolved locale drives both `push` payload localization decisions and `Mail Service` template selection
- enrichment runs during intent acceptance before durable route write
- `404 subject_not_found` from `User Service` is treated as permanent producer input error and becomes malformed-intent `recipient_not_found`
- temporary `User Service` failures stop the consumer before stream-offset advance so the same stream entry is retried after restart

For `audience_kind=admin_email`, `Notification Service` does not consult
`User Service` and instead resolves recipients from type-specific config.

## Push Contract Toward Gateway

Push events are published into the existing `Gateway` client-events stream.

Stable routing rules:

- `event_type` equals `notification_type`
- `event_id` equals `/`
- `user_id` is derived from `recipient_ref=user:` for user-targeted routes
- `request_id` and `trace_id` are forwarded when present
- `device_session_id` is intentionally omitted so `Gateway` fans the event out to every active stream of that user

`Notification Service` appends `Gateway` events with `XADD MAXLEN ~` using
`NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM_MAX_LEN`.

User-facing push payloads use
[pkg/schema/fbs/notification.fbs](../pkg/schema/fbs/notification.fbs).

| `notification_type` | FlatBuffers table | Payload fields |
| --- | --- | --- |
| `game.turn.ready` | `notification.GameTurnReadyEvent` | `game_id`, `turn_number` |
| `game.finished` | `notification.GameFinishedEvent` | `game_id`, `final_turn_number` |
| `lobby.application.submitted` | `notification.LobbyApplicationSubmittedEvent` | `game_id`, `applicant_user_id` |
| `lobby.membership.approved` | `notification.LobbyMembershipApprovedEvent` | `game_id` |
| `lobby.membership.rejected` | `notification.LobbyMembershipRejectedEvent` | `game_id` |
| `lobby.invite.created` | `notification.LobbyInviteCreatedEvent` | `game_id`, `inviter_user_id` |
| `lobby.invite.redeemed` | `notification.LobbyInviteRedeemedEvent` | `game_id`, `invitee_user_id` |

Only the seven user-facing push notification types above are represented in
`notification.fbs`.
`geo.review_recommended`, `game.generation_failed`,
`lobby.runtime_paused_after_start`, and `lobby.invite.expired` remain outside
this schema because they are email-only in v1.
Checked-in generated Go bindings for this schema live under
[`../pkg/schema/fbs/notification`](../pkg/schema/fbs/notification).

`notification_type` alone determines the concrete FlatBuffers table.
No extra envelope or FlatBuffers `union` is added in v1.

The push payload must stay lightweight and must not attempt to mirror full
game, lobby, or profile state.
`game_name`, human-readable user names, and other full business-state fields
stay out of the push schema.
Clients react to the notification and then fetch fresh business state through
normal service APIs.

## Email Contract Toward Mail Service

Email routes are published to `Mail Service` through `mail:delivery_commands`
using the existing generic async command contract.

Rules:

- `delivery_id` equals `/`
- `source` is always `notification`
- `payload_mode` is always `template`
- `idempotency_key` equals `notification:/`
- `requested_at_ms` equals `accepted_at_ms`
- `request_id` and `trace_id` are forwarded when present
- `payload_json.to` contains exactly one resolved recipient email
- `payload_json.cc`, `payload_json.bcc`, `payload_json.reply_to`, and `payload_json.attachments` are empty arrays in v1
- `template_id` equals `notification_type`
- `locale` is the resolved language from the enrichment step or `en`
- template variables are passed through from normalized `payload_json`

`Notification Service` appends `Mail Service` commands with plain `XADD` and
does not manage retention or trimming of `mail:delivery_commands`.

Auth-code email remains a direct `Auth / Session Service -> Mail Service` flow
and does not pass through `Notification Service`.
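
The `locale` rule above, together with the locale policy in the enrichment
section, amounts to an exact-match lookup with an `en` fallback and no
intermediate reduction (for example, `en-US` is not reduced to `en`; it simply
falls back). A minimal sketch, in which the function name `resolveLocale` and
the `supported` set are hypothetical:

```go
package main

import "fmt"

// fallbackLocale is the locale used whenever the preferred one is unusable.
const fallbackLocale = "en"

// resolveLocale returns preferred only when it exactly matches a supported
// locale. Empty, invalid, or unsupported values all resolve to the fallback.
func resolveLocale(preferred string, supported map[string]bool) string {
	if supported[preferred] {
		return preferred
	}
	return fallbackLocale
}

func main() {
	// The document states that exactly one locale is currently implemented.
	supported := map[string]bool{"en": true}
	for _, preferred := range []string{"en", "en-US", "de", ""} {
		fmt.Printf("%q -> %s\n", preferred, resolveLocale(preferred, supported))
	}
}
```

With only `en` supported, every input resolves to `en`; the lookup starts to
matter once `Mail Service` gains additional template locales.
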

Initial notification-owned template assets:

| `notification_type` | `template_id` | Required assets |
| --- | --- | --- |
| `geo.review_recommended` | `geo.review_recommended` | `en/subject.tmpl`, `en/text.tmpl` |
| `game.turn.ready` | `game.turn.ready` | `en/subject.tmpl`, `en/text.tmpl` |
| `game.finished` | `game.finished` | `en/subject.tmpl`, `en/text.tmpl` |
| `game.generation_failed` | `game.generation_failed` | `en/subject.tmpl`, `en/text.tmpl` |
| `lobby.runtime_paused_after_start` | `lobby.runtime_paused_after_start` | `en/subject.tmpl`, `en/text.tmpl` |
| `lobby.application.submitted` | `lobby.application.submitted` | `en/subject.tmpl`, `en/text.tmpl` |
| `lobby.membership.approved` | `lobby.membership.approved` | `en/subject.tmpl`, `en/text.tmpl` |
| `lobby.membership.rejected` | `lobby.membership.rejected` | `en/subject.tmpl`, `en/text.tmpl` |
| `lobby.invite.created` | `lobby.invite.created` | `en/subject.tmpl`, `en/text.tmpl` |
| `lobby.invite.redeemed` | `lobby.invite.redeemed` | `en/subject.tmpl`, `en/text.tmpl` |
| `lobby.invite.expired` | `lobby.invite.expired` | `en/subject.tmpl`, `en/text.tmpl` |

`auth.login_code` does not belong to the notification-owned template set.

## Route Model

One accepted intent materializes:

- one `notification_record`
- zero or more `notification_route` entries

Each route represents exactly one `(channel, recipient_ref)` pair.

Stable route statuses:

- `pending`
- `published`
- `failed`
- `dead_letter`
- `skipped`

Rules:

- `pending` means the route is ready for first publish or retry
- `published` means the route was durably handed off to its downstream channel
- `failed` means the last publish attempt failed and a later retry is scheduled
- `dead_letter` means the route exhausted its retry budget
- `skipped` means the route slot was durably materialized but intentionally not emitted

Materialization rules:

- every derived `recipient_ref` receives one `push` route slot and one `email` route slot, except that an empty administrator email list materializes one synthetic `config:` recipient slot with only a skipped `email` route
- a route slot whose channel is outside the notification type channel matrix is materialized as `skipped`
- `recipient_ref` is `user:` for user-targeted routes
- `recipient_ref` is `email:` for configured administrator email routes
- when an administrator email list is empty, the service materializes one synthetic recipient slot `config:` with one skipped `email` route so the configuration gap remains durable and operator-visible
- `route_id` is mandatory and equals `:`

The service-local aggregate notification status is derived from routes and is
not a separate durable source of truth.
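
The materialization rules above can be illustrated with a small sketch. It is
not the service's code: the `Route` type and the function names are
hypothetical, the channel matrix holds only three rows copied from the
notification catalog, and the suffixes appended after `user:`, `email:`, and
`config:` (user ID, email address, and notification type) are assumptions,
since this document does not spell them out.

```go
package main

import "fmt"

// Route is a hypothetical, reduced view of one materialized route slot.
type Route struct {
	Channel      string
	RecipientRef string
	Status       string
}

// channelMatrix lists the channels each notification type may emit on.
// Only three catalog rows are reproduced here for illustration.
var channelMatrix = map[string]map[string]bool{
	"game.turn.ready":        {"push": true, "email": true},
	"lobby.invite.expired":   {"email": true},
	"game.generation_failed": {"email": true},
}

// slots gives every recipient one push slot and one email slot. A slot whose
// channel is outside the type's channel matrix is materialized as skipped.
func slots(notificationType string, recipientRefs []string) []Route {
	var routes []Route
	for _, ref := range recipientRefs {
		for _, channel := range []string{"push", "email"} {
			status := "skipped"
			if channelMatrix[notificationType][channel] {
				status = "pending"
			}
			routes = append(routes, Route{Channel: channel, RecipientRef: ref, Status: status})
		}
	}
	return routes
}

// materializeUser builds slots for user-targeted intents.
func materializeUser(notificationType string, userIDs []string) []Route {
	refs := make([]string, len(userIDs))
	for i, id := range userIDs {
		refs[i] = "user:" + id // assumed recipient_ref suffix
	}
	return slots(notificationType, refs)
}

// materializeAdmin builds slots for administrator email intents. An empty
// list yields one synthetic config slot with only a skipped email route.
func materializeAdmin(notificationType string, adminEmails []string) []Route {
	if len(adminEmails) == 0 {
		ref := "config:" + notificationType // assumed recipient_ref suffix
		return []Route{{Channel: "email", RecipientRef: ref, Status: "skipped"}}
	}
	refs := make([]string, len(adminEmails))
	for i, email := range adminEmails {
		refs[i] = "email:" + email // assumed recipient_ref suffix
	}
	return slots(notificationType, refs)
}

func main() {
	fmt.Println(materializeUser("lobby.invite.expired", []string{"u1"}))
	fmt.Println(materializeAdmin("game.generation_failed", nil))
}
```

In this sketch an email-only type such as `lobby.invite.expired` still gets a
`push` slot per recipient, but that slot is `skipped` rather than `pending`.
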

## Redis Logical Model

Storage rules:

- durable records are stored as strict JSON blobs
- timestamps are stored in Unix milliseconds
- dynamic Redis key segments are base64url-encoded
- `notification:route_schedule` is one shared sorted set for both `push` and `email`

| Logical artifact | Redis key |
| --- | --- |
| `notification_record` | `notification:records:` |
| `notification_route` | `notification:routes::` |
| temporary route lease | `notification:route_leases::` |
| `notification_idempotency_record` | `notification:idempotency::` |
| `notification_dead_letter_entry` | `notification:dead_letters::` |
| malformed intent record | `notification:malformed_intents:` |
| stream offset record | `notification:stream_offsets:` |
| ingress stream | `notification:intents` |
| route schedule sorted set | `notification:route_schedule` |

| Record | Frozen fields |
| --- | --- |
| `notification_record` | `notification_id`, `notification_type`, `producer`, `audience_kind`, normalized `recipient_user_ids`, normalized `payload_json`, `idempotency_key`, `request_fingerprint`, optional `request_id`, optional `trace_id`, `occurred_at_ms`, `accepted_at_ms`, `updated_at_ms` |
| `notification_route` | `notification_id`, `route_id`, `channel`, `recipient_ref`, `status`, `attempt_count`, `max_attempts`, `next_attempt_at_ms`, optional `resolved_email`, optional `resolved_locale`, optional `last_error_classification`, optional `last_error_message`, optional `last_error_at_ms`, `created_at_ms`, `updated_at_ms`, optional `published_at_ms`, optional `dead_lettered_at_ms`, optional `skipped_at_ms` |
| `notification_idempotency_record` | `producer`, `idempotency_key`, `notification_id`, `request_fingerprint`, `created_at_ms`, `expires_at_ms` |
| `notification_dead_letter_entry` | `notification_id`, `route_id`, `channel`, `recipient_ref`, `final_attempt_count`, `max_attempts`, `failure_classification`, `failure_message`, `created_at_ms`, optional `recovery_hint` |
| malformed intent record | `stream_entry_id`, optional `notification_type`, optional `producer`, optional `idempotency_key`, `failure_code`, `failure_message`, `raw_fields_json`, `recorded_at_ms` |
| stream offset record | `stream`, `last_processed_entry_id`, `updated_at_ms` |

`notification_record.recipient_user_ids` stores a normalized array of unique
`user_id` values and is omitted for `audience_kind=admin_email`.
`notification_record.payload_json` stores the canonical normalized JSON string
used for idempotency fingerprinting.

Temporary route lease keys store one opaque worker token and use
`NOTIFICATION_ROUTE_LEASE_TTL`; they are service-local coordination state
rather than durable records.

`notification:route_schedule` stores one member per scheduled route where
score = `next_attempt_at_ms` and member = full Redis route key with encoded
dynamic segments.

Newly accepted publishable routes enter the schedule immediately with
`status=pending` and `next_attempt_at_ms = accepted_at_ms`.
`failed` routes remain scheduled for retry.
`published`, `dead_letter`, and `skipped` are absent from the schedule.
Only the current lease holder may finalize one due publication attempt.

## Retry And Dead-Letter Policy

Retry budgets are channel-specific:

- `push` publication to `Gateway`: `3` attempts total
- `email` publication to `Mail Service`: `7` attempts total

Rules:

- the first publication attempt happens immediately at `accepted_at_ms`
- after failed attempt `N`, the next delay is `clamp(NOTIFICATION_ROUTE_BACKOFF_MIN * 2^(N-1), NOTIFICATION_ROUTE_BACKOFF_MIN, NOTIFICATION_ROUTE_BACKOFF_MAX)`
- no jitter is added to the retry delay
- `push` and `email` routes are retried independently
- the shared schedule is filtered by route prefix so `push` publishers claim only `push:` routes and `email` publishers claim only `email:` routes
- `push` and `email` replicas coordinate through `notification:route_leases::` with `NOTIFICATION_ROUTE_LEASE_TTL`
- `push` publication failures are classified minimally as `payload_encoding_failed` and `gateway_stream_publish_failed`
- `email` publication failures are classified minimally as `payload_encoding_failed` and `mail_stream_publish_failed`
- when a route exhausts its retry budget, it transitions to `dead_letter`, creates `notification_dead_letter_entry`, and is removed from `notification:route_schedule`
- one exhausted route entering `dead_letter` must not roll back or invalidate a sibling route that already reached `published`
- service restarts resume from durable route state and persisted stream offsets

Retention rules:

- `notification_record` and `notification_route` use `NOTIFICATION_RECORD_TTL`
- `notification_idempotency_record` uses `NOTIFICATION_IDEMPOTENCY_TTL`
- `notification_dead_letter_entry` and malformed intent records use `NOTIFICATION_DEAD_LETTER_TTL`
- stream offset records do not use TTL

## Observability

The service instruments:

- internal probe HTTP requests
- internal probe HTTP listener startup and shutdown events
- structured logs for accepted, duplicate, and rejected notification intents
- structured logs for `push` and `email` route publication, retry, and dead-letter transitions
- accepted and duplicate intent outcomes
- malformed intents, including idempotency conflicts and unresolved recipients
- user-enrichment lookup outcomes
- route publish attempts, retries, and dead-letter transitions
- current route-schedule depth and oldest scheduled route age
- oldest unprocessed intent stream entry age

Metric names:

- `notification.intent.outcomes`
- `notification.intent.malformed`
- `notification.user_enrichment.attempts`
- `notification.route.publish_attempts`
- `notification.route.retries`
- `notification.route.dead_letters`
- `notification.route_schedule.depth`
- `notification.route_schedule.oldest_age_ms`
- `notification.intent_stream.oldest_unprocessed_age_ms`

Metrics intentionally avoid high-cardinality attributes such as `user_id`,
email address, `notification_id`, `route_id`, and `idempotency_key`.
Metric attributes may include `notification_type`, `producer`,
`audience_kind`, `channel`, `result`, `outcome`, `failure_code`, and
`failure_classification`.

Structured logs for intent intake, duplicate resolution, malformed-intent
recording, route publication, retry scheduling, and dead-letter transitions
use the same field names where the value exists:

- `notification_id`
- `notification_type`
- `producer`
- `audience_kind`
- `idempotency_key`
- `route_id`
- `channel`
- `request_id`
- `trace_id`

OpenTelemetry trace context is logged as `otel_trace_id` and `otel_span_id`
when the active context carries a valid span.

## Recovery

The supported manual replay path for a dead-lettered notification route is to
publish a new compatible intent to `notification:intents`.

Recovery rules:

- inspect the `notification_dead_letter_entry`, `notification_route`, and owning `notification_record`
- confirm the downstream dependency or payload problem has been corrected
- publish a new intent with the same semantic `payload_json` and audience fields, but with a new producer-owned `idempotency_key`
- keep the old `notification_dead_letter_entry` untouched as audit history until its configured TTL expires

Manual Redis mutation of an existing route record or
`notification:route_schedule` is not a supported replay workflow.

## Verification

Focused service-local coverage verifies:

- configuration loading and validation
- `GET /healthz`
- `GET /readyz`
- absence of `/metrics`
- Redis startup fast-fail behavior
- graceful shutdown of the private probe listener
- valid intent acceptance
- malformed intent rejection
- duplicate and conflicting duplicate handling
- user-targeted route enrichment from `User Service`
- `recipient_not_found` malformed-intent recording for unresolved `user_id`
- temporary `User Service` failure handling without stream-offset advance
- FlatBuffers payload encoding for all seven user-facing `push` `notification_type` values
- template-mode `Mail Service` command encoding for user and administrator `email` routes
- due-route loading, lease acquisition, route publication, retry reschedule, and dead-letter persistence in Redis
- `push` worker success, retry, and duplicate-prevention behavior across concurrent replicas
- `email` worker success, retry, and duplicate-prevention behavior across concurrent replicas
- OpenTelemetry metric recording for intent outcomes, malformed intents, user enrichment, route publication attempts, retries, dead letters, route-schedule gauges, and intent-stream lag
- Redis-backed route-schedule and intent-stream lag snapshots
- structured log field helper coverage through intake and publisher tests
- intent-consumer restart from `0-0` and from persisted stream offsets
- runtime wiring of the intent
consumer and both route publishers
- shared `galaxy/notificationintent` producer constructors, validation, and Redis Stream publication compatibility

Cross-service coverage verifies:

- `Notification Service -> User Service` enrichment compatibility and failure handling
- `Notification Service -> Gateway` push compatibility for every user-facing `notification_type`
- `Notification Service -> Mail Service` template-mode handoff for every supported email type
- producer compatibility for `Game Master`, `Game Lobby`, and `Geo Profile Service` through `galaxy/notificationintent`
- explicit regression that auth-code email still bypasses `Notification Service`
- real black-box `Notification Service -> Gateway` push fan-out coverage
- real black-box `Notification Service -> Mail Service` template-mode handoff coverage

Real producer-boundary suites for `Game Master`, `Game Lobby`, and
`Geo Profile Service` should be added only when those service boundaries exist
in code.