galaxy-game/notification/README.md
2026-04-25 23:20:55 +02:00

Notification Service

Canonical references:

Purpose

Notification Service is the internal asynchronous orchestration layer for platform notifications.

It accepts normalized notification intents from upstream services, materializes per-recipient routes, enriches user-targeted routes through User Service, publishes client-facing push events toward Gateway, publishes non-auth email commands toward Mail Service, and isolates transient downstream failures with independent retry budgets per channel.

The service is intentionally not a source of truth for:

  • game state
  • lobby membership
  • invite ownership
  • review flags
  • notification preferences
  • email delivery attempts

Responsibility Boundaries

Notification Service is responsible for:

  • consuming normalized notification intents from a dedicated Redis Stream
  • validating intent envelopes and rejecting malformed or conflicting duplicates
  • persisting durable notification and route state
  • resolving user contact data from User Service by user_id
  • selecting locale from User Service.preferred_language with en fallback
  • shaping lightweight push payloads for user-facing events
  • publishing template-mode email commands to Mail Service
  • retrying route publication independently for push and email
  • persisting dead-letter entries for exhausted routes

Notification Service is not responsible for:

  • computing business audiences from game_id or other domain identifiers
  • owning administrator identity or administrator user records
  • sending auth-code email
  • storing per-user notification preferences in v1
  • exposing an operator REST API in v1

The key design rule is that upstream producers must publish the concrete recipient_user_id values for user-targeted notification intents. For administrator-only notification types, recipient email addresses are resolved from Notification Service configuration by notification_type. Private-game invite notifications in v1 remain user-bound by internal user_id values and must not target recipients by raw email address.

Runtime Surface

The implemented process contains:

  • one private internal HTTP probe listener
  • process-wide structured logging
  • process-wide OpenTelemetry runtime
  • one shared galaxy/notificationintent producer contract module
  • one shared Redis client with startup connectivity check
  • one trusted User Service HTTP enrichment client
  • one plain-XREAD notification-intent consumer
  • one long-lived push route publisher
  • one long-lived email route publisher
  • durable accepted-intent, route, idempotency, malformed-intent, and stream-offset storage in Redis
  • user-targeted route enrichment during intent acceptance before durable write
  • client-facing push publication toward Gateway
  • template-mode email publication toward Mail Service
  • durable push and email retry, dead-letter, and temporary lease coordination in Redis
  • OpenTelemetry counters and observable gauges for intent intake, user enrichment, route publication, route schedule depth, and intent stream lag
  • graceful shutdown on process cancellation

Probe contract:

  • GET /healthz returns {"status":"ok"}
  • GET /readyz returns {"status":"ready"}
  • readyz is process-local after successful startup and does not perform a live Redis ping per request
  • there is no /metrics route
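
The probe contract above can be sketched with the standard library alone. This is a minimal illustration, not the service's actual wiring: `newProbeMux`, `probeHandler`, and `probe` are hypothetical names, and the `Content-Type` header and the 405 response for non-GET methods are assumptions the contract does not state. The sketch also assumes startup has already succeeded, so `/readyz` answers without touching Redis.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newProbeMux registers only the two probe routes (hypothetical helper).
// No /metrics route is registered, so it falls through to 404.
func newProbeMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", probeHandler(`{"status":"ok"}`))
	mux.HandleFunc("/readyz", probeHandler(`{"status":"ready"}`))
	return mux
}

// probeHandler answers GET with a fixed JSON body. It is process-local:
// it performs no Redis ping or other downstream call per request.
func probeHandler(body string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			// Assumption: non-GET methods are rejected with 405.
			w.Header().Set("Allow", http.MethodGet)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		// Assumption: the body is served as application/json.
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, body)
	}
}

// probe issues an in-memory GET and returns the status code and body.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	mux := newProbeMux()
	for _, path := range []string{"/healthz", "/readyz", "/metrics"} {
		code, body := probe(mux, path)
		fmt.Println(path, code, body)
	}
}
```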

Runtime behavior:

  • the intent consumer reads notification:intents with plain XREAD
  • when no stored stream offset exists, the consumer starts from 0-0
  • the persisted offset advances only after durable acceptance or durable malformed-intent recording
  • user-targeted routes are enriched through GET /api/v1/internal/users/{user_id} before durable route write
  • 404 subject_not_found from User Service is recorded under malformed-intent storage with failure_code=recipient_not_found
  • temporary User Service lookup failures stop the consumer before stream-offset advance
  • due push routes are published toward Gateway from the shared notification:route_schedule
  • due email routes are published toward Mail Service from the shared notification:route_schedule
  • the push publisher claims only routes whose route_id starts with push:
  • the email publisher claims only routes whose route_id starts with email:
  • replicas coordinate through temporary Redis lease notification:route_leases:<notification_id>:<route_id>
  • Gateway publication uses XADD MAXLEN ~ with NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM_MAX_LEN
  • event_id equals <notification_id>/<route_id>
  • Mail Service publication uses plain XADD with no stream trimming
  • delivery_id equals <notification_id>/<route_id>
  • idempotency_key equals notification:<notification_id>/<route_id>
  • requested_at_ms equals accepted_at_ms
  • request_id and trace_id are forwarded when present
  • device_session_id is intentionally omitted so Gateway fans the event out to every active stream of that user
  • Go producers use galaxy/notificationintent to construct and publish compatible intents into notification:intents
  • producer publication uses plain XADD without stream trimming or hidden helper retries
  • a producer-side notification publication failure is notification degradation and must not roll back the source business state
  • metric export uses the configured OpenTelemetry exporters only; there is no /metrics route
  • notification.route_schedule.depth and notification.route_schedule.oldest_age_ms are derived from notification:route_schedule
  • notification.intent_stream.oldest_unprocessed_age_ms is derived from the persisted intent stream offset and the configured ingress stream
  • manual dead-letter replay is performed by publishing a new compatible intent with a new idempotency_key; existing dead-letter records remain audit history until TTL expiry

The target process shape is one internal-only process with:

  • one notification-intent consumer
  • one push route publisher for Gateway
  • one email route publisher for Mail Service

Intentional runtime omissions in v1:

  • no public ingress
  • no dedicated operator REST API
  • no direct client delivery
  • no direct SMTP integration

Configuration

Required:

  • NOTIFICATION_REDIS_ADDR
  • NOTIFICATION_USER_SERVICE_BASE_URL

Primary configuration groups:

  • process and logging:
    • NOTIFICATION_SHUTDOWN_TIMEOUT
    • NOTIFICATION_LOG_LEVEL
  • internal probe HTTP:
    • NOTIFICATION_INTERNAL_HTTP_ADDR with default :8092
    • NOTIFICATION_INTERNAL_HTTP_READ_HEADER_TIMEOUT with default 2s
    • NOTIFICATION_INTERNAL_HTTP_READ_TIMEOUT with default 10s
    • NOTIFICATION_INTERNAL_HTTP_IDLE_TIMEOUT with default 1m
  • Redis connectivity:
    • NOTIFICATION_REDIS_USERNAME
    • NOTIFICATION_REDIS_PASSWORD
    • NOTIFICATION_REDIS_DB
    • NOTIFICATION_REDIS_TLS_ENABLED
    • NOTIFICATION_REDIS_OPERATION_TIMEOUT
  • stream names:
    • NOTIFICATION_INTENTS_STREAM with default notification:intents
    • NOTIFICATION_INTENTS_READ_BLOCK_TIMEOUT with default 2s
    • NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM with default gateway:client-events
    • NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM_MAX_LEN with default 1024
    • NOTIFICATION_MAIL_DELIVERY_COMMANDS_STREAM with default mail:delivery_commands
  • retry and dead-letter:
    • NOTIFICATION_PUSH_RETRY_MAX_ATTEMPTS with default 3
    • NOTIFICATION_EMAIL_RETRY_MAX_ATTEMPTS with default 7
    • NOTIFICATION_ROUTE_BACKOFF_MIN with default 1s
    • NOTIFICATION_ROUTE_BACKOFF_MAX with default 5m
    • NOTIFICATION_ROUTE_LEASE_TTL with default 5s
    • NOTIFICATION_DEAD_LETTER_TTL with default 720h
    • NOTIFICATION_RECORD_TTL with default 720h
    • NOTIFICATION_IDEMPOTENCY_TTL with default 168h
  • User Service enrichment:
    • NOTIFICATION_USER_SERVICE_TIMEOUT with default 1s
  • administrator routing:
    • NOTIFICATION_ADMIN_EMAILS_GEO_REVIEW_RECOMMENDED
    • NOTIFICATION_ADMIN_EMAILS_GAME_GENERATION_FAILED
    • NOTIFICATION_ADMIN_EMAILS_LOBBY_RUNTIME_PAUSED_AFTER_START
    • NOTIFICATION_ADMIN_EMAILS_LOBBY_APPLICATION_SUBMITTED
  • OpenTelemetry:
    • standard OTEL_* variables
    • NOTIFICATION_OTEL_STDOUT_TRACES_ENABLED
    • NOTIFICATION_OTEL_STDOUT_METRICS_ENABLED

Each administrator configuration variable stores a comma-separated list of email addresses for exactly one notification_type. v1 does not use one global admin-recipient list shared across all administrative events.
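
As an illustration of the per-type administrator lists, the following sketch parses each variable independently. The helper names `parseAdminEmails` and `loadAdminEmails` are hypothetical. Trimming, lowercasing, and de-duplication are assumptions about what "normalized" means here; the README does not define the normalization.

```go
package main

import (
	"fmt"
	"net/mail"
	"os"
	"strings"
)

// adminEmailVars maps each administrator notification_type to its own variable.
var adminEmailVars = map[string]string{
	"geo.review_recommended":           "NOTIFICATION_ADMIN_EMAILS_GEO_REVIEW_RECOMMENDED",
	"game.generation_failed":           "NOTIFICATION_ADMIN_EMAILS_GAME_GENERATION_FAILED",
	"lobby.runtime_paused_after_start": "NOTIFICATION_ADMIN_EMAILS_LOBBY_RUNTIME_PAUSED_AFTER_START",
	"lobby.application.submitted":      "NOTIFICATION_ADMIN_EMAILS_LOBBY_APPLICATION_SUBMITTED",
}

// parseAdminEmails splits a comma-separated list, skips empty items,
// validates each address, and drops duplicates (assumed normalization).
func parseAdminEmails(raw string) ([]string, error) {
	seen := map[string]bool{}
	var out []string
	for _, part := range strings.Split(raw, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		addr, err := mail.ParseAddress(part)
		if err != nil {
			return nil, fmt.Errorf("invalid admin email %q: %w", part, err)
		}
		norm := strings.ToLower(addr.Address)
		if !seen[norm] {
			seen[norm] = true
			out = append(out, norm)
		}
	}
	return out, nil
}

// loadAdminEmails resolves one list per notification_type. An unset
// variable yields an empty list rather than an error.
func loadAdminEmails(getenv func(string) string) (map[string][]string, error) {
	out := make(map[string][]string, len(adminEmailVars))
	for notificationType, name := range adminEmailVars {
		emails, err := parseAdminEmails(getenv(name))
		if err != nil {
			return nil, fmt.Errorf("%s: %w", name, err)
		}
		out[notificationType] = emails
	}
	return out, nil
}

func main() {
	lists, err := loadAdminEmails(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	for notificationType, emails := range lists {
		fmt.Println(notificationType, emails)
	}
}
```

Passing the lookup function as a parameter keeps the loader testable without mutating the process environment.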

Stable Input Contract

The service accepts intents from one dedicated Redis Stream:

  • notification:intents

The canonical envelope is defined in api/intents-asyncapi.yaml. Go producers should use the shared galaxy/notificationintent module to build and append compatible stream entries instead of duplicating field names, payload structs, or validation rules locally.

Required envelope fields:

  • notification_type
  • producer
  • audience_kind
  • idempotency_key
  • occurred_at_ms
  • payload_json

Optional envelope fields:

  • recipient_user_ids_json
  • request_id
  • trace_id

Rules:

  • audience_kind=user requires recipient_user_ids_json with one or more unique stable user_id values
  • audience_kind=admin_email forbids recipient_user_ids_json
  • recipient_user_ids_json is normalized as an unordered recipient set, so duplicate user_id values are invalid and element order does not affect idempotency
  • request_id and trace_id are observability-only metadata and do not participate in the idempotency fingerprint
  • payload_json is type-specific, must remain backward-compatible for each notification_type, and is normalized structurally for duplicate detection: insignificant whitespace and object key order are ignored while array order remains significant
  • a replay with the same (producer, idempotency_key) and the same normalized payload is treated as a successful duplicate
  • a replay with the same (producer, idempotency_key) but different normalized content is recorded as a conflicting duplicate under malformed-intent storage with failure_code=idempotency_conflict and must not create new routes
  • during user enrichment, a missing user_id in User Service is recorded under malformed-intent storage with failure_code=recipient_not_found
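
The duplicate-detection rules above could be approximated as follows. This is a sketch under stated assumptions: SHA-256 over a JSON blob is an illustrative choice, and the set of fields that enter the fingerprint (`notification_type`, `audience_kind`, recipients, payload) is assumed, since the README only states that `request_id` and `trace_id` are excluded. `canonicalJSON` and `requestFingerprint` are hypothetical names. The sketch relies on `encoding/json` emitting object keys in sorted order while preserving array order.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// canonicalJSON re-encodes a JSON document so that insignificant
// whitespace and object key order no longer matter. Array order is kept.
func canonicalJSON(raw string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.UseNumber() // keep number literals as written instead of float64
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	if dec.More() {
		return "", errors.New("trailing data after JSON value")
	}
	out, err := json.Marshal(v) // map keys are emitted sorted
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// fingerprintInput lists the fields assumed to participate in the fingerprint.
type fingerprintInput struct {
	NotificationType string          `json:"notification_type"`
	AudienceKind     string          `json:"audience_kind"`
	RecipientUserIDs []string        `json:"recipient_user_ids"`
	Payload          json.RawMessage `json:"payload"`
}

// requestFingerprint treats recipients as an unordered set and rejects duplicates.
func requestFingerprint(notificationType, audienceKind string, recipients []string, payloadJSON string) (string, error) {
	ids := append([]string(nil), recipients...)
	sort.Strings(ids)
	for i := 1; i < len(ids); i++ {
		if ids[i] == ids[i-1] {
			return "", fmt.Errorf("duplicate recipient user_id %q", ids[i])
		}
	}
	payload, err := canonicalJSON(payloadJSON)
	if err != nil {
		return "", fmt.Errorf("payload_json: %w", err)
	}
	blob, err := json.Marshal(fingerprintInput{
		NotificationType: notificationType,
		AudienceKind:     audienceKind,
		RecipientUserIDs: ids,
		Payload:          json.RawMessage(payload),
	})
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(blob)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	fp, err := requestFingerprint("game.turn.ready", "user",
		[]string{"u2", "u1"}, `{"turn_number": 7, "game_id": "g1", "game_name": "Demo"}`)
	fmt.Println(fp, err)
}
```

A replay whose fingerprint matches the stored one for the same (producer, idempotency_key) would be a successful duplicate; a mismatch would be the idempotency_conflict case.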

Malformed stream entries do not create durable notification records. They are logged, metered, and recorded separately for operator inspection. Accepted intents use the original Redis Stream stream_entry_id as notification_id.
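
The envelope rules can be read as a validation pass over the raw stream fields. The following is only a sketch: `validateEnvelope` is a hypothetical name, the error messages are illustrative, and the canonical rules remain those in api/intents-asyncapi.yaml. An entry that fails such a check would be the kind recorded as malformed rather than accepted.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
)

// requiredFields are the required envelope fields listed in this README.
var requiredFields = []string{
	"notification_type", "producer", "audience_kind",
	"idempotency_key", "occurred_at_ms", "payload_json",
}

// validateEnvelope checks one stream entry represented as field/value pairs.
func validateEnvelope(fields map[string]string) error {
	for _, name := range requiredFields {
		if fields[name] == "" {
			return fmt.Errorf("missing required field %q", name)
		}
	}
	if _, err := strconv.ParseInt(fields["occurred_at_ms"], 10, 64); err != nil {
		return fmt.Errorf("occurred_at_ms: %w", err)
	}
	if !json.Valid([]byte(fields["payload_json"])) {
		return errors.New("payload_json is not valid JSON")
	}
	raw, hasRecipients := fields["recipient_user_ids_json"]
	switch fields["audience_kind"] {
	case "user":
		if !hasRecipients {
			return errors.New("audience_kind=user requires recipient_user_ids_json")
		}
		var ids []string
		if err := json.Unmarshal([]byte(raw), &ids); err != nil {
			return fmt.Errorf("recipient_user_ids_json: %w", err)
		}
		if len(ids) == 0 {
			return errors.New("recipient_user_ids_json must list at least one user_id")
		}
		seen := make(map[string]bool, len(ids))
		for _, id := range ids {
			if id == "" || seen[id] {
				return fmt.Errorf("empty or duplicate user_id %q", id)
			}
			seen[id] = true
		}
	case "admin_email":
		if hasRecipients {
			return errors.New("audience_kind=admin_email forbids recipient_user_ids_json")
		}
	default:
		return fmt.Errorf("unknown audience_kind %q", fields["audience_kind"])
	}
	return nil
}

func main() {
	err := validateEnvelope(map[string]string{
		"notification_type":       "game.turn.ready",
		"producer":                "game_master",
		"audience_kind":           "user",
		"idempotency_key":         "turn-7",
		"occurred_at_ms":          "1700000000000",
		"payload_json":            `{"game_id":"g1","game_name":"Demo","turn_number":7}`,
		"recipient_user_ids_json": `["u1","u2"]`,
	})
	fmt.Println("validation error:", err)
}
```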

Notification Catalog

payload_json fields are normalized by the producer before publication.

| notification_type | Producer | Audience | Channels | Required payload_json fields |
| --- | --- | --- | --- | --- |
| geo.review_recommended | Geo Profile Service (geoprofile) | configured admin email list (audience_kind=admin_email) | email | user_id, user_email, observed_country, usual_connection_country, review_reason |
| game.turn.ready | Game Master (game_master) | active accepted participants (audience_kind=user) | push+email | game_id, game_name, turn_number |
| game.finished | Game Master (game_master) | active accepted participants (audience_kind=user) | push+email | game_id, game_name, final_turn_number |
| game.generation_failed | Game Master (game_master) | configured admin email list (audience_kind=admin_email) | email | game_id, game_name, failure_reason |
| lobby.runtime_paused_after_start | Game Lobby (game_lobby) | configured admin email list (audience_kind=admin_email) | email | game_id, game_name |
| lobby.application.submitted | Game Lobby (game_lobby) | private owner (audience_kind=user) or public admins (audience_kind=admin_email) | private: push+email, public: email | game_id, game_name, applicant_user_id, applicant_name |
| lobby.membership.approved | Game Lobby (game_lobby) | applicant user (audience_kind=user) | push+email | game_id, game_name |
| lobby.membership.rejected | Game Lobby (game_lobby) | applicant user (audience_kind=user) | push+email | game_id, game_name |
| lobby.membership.blocked | Game Lobby (game_lobby) | private-game owner (audience_kind=user) | push+email | game_id, game_name, membership_user_id, membership_user_name, reason |
| lobby.invite.created | Game Lobby (game_lobby) | invited user (audience_kind=user) | push+email | game_id, game_name, inviter_user_id, inviter_name |
| lobby.invite.redeemed | Game Lobby (game_lobby) | private-game owner (audience_kind=user) | push+email | game_id, game_name, invitee_user_id, invitee_name |
| lobby.invite.expired | Game Lobby (game_lobby) | private-game owner (audience_kind=user) | email | game_id, game_name, invitee_user_id, invitee_name |
| lobby.race_name.registration_eligible | Game Lobby (game_lobby) | capable member (audience_kind=user) | push+email | game_id, game_name, race_name, eligible_until_ms |
| lobby.race_name.registered | Game Lobby (game_lobby) | registering user (audience_kind=user) | push+email | race_name |
| lobby.race_name.registration_denied | Game Lobby (game_lobby) | incapable member (audience_kind=user) | email | game_id, game_name, race_name, reason |

Rules:

  • v1 supports exactly the fifteen notification_type values listed above
  • lobby.application.submitted keeps one stable notification_type and one stable payload_json shape; private games publish audience_kind=user while public games publish audience_kind=admin_email
  • lobby.invite.revoked deliberately produces no notification in v1 and remains outside the supported catalog
  • private-game invite notifications remain user-bound by internal user_id
  • lobby.race_name.registration_eligible and lobby.race_name.registration_denied are emitted by Game Lobby at game_finished based on capability evaluation; the former always pairs with a 30-day eligible_until_ms window
  • lobby.race_name.registered is emitted on successful lobby.race_name.register commit

Recipient Enrichment And Locale Policy

For audience_kind=user, Notification Service resolves user records through the trusted User Service lookup endpoint:

  • GET /api/v1/internal/users/{user_id}

The response supplies:

  • email
  • preferred_language

Locale rules:

  • current implemented support is exactly one locale: en
  • exact preferred_language is used when supported by Mail Service
  • unsupported, empty, or invalid language values fall back to en
  • no intermediate locale reduction is used in v1
  • the same resolved locale drives both push payload localization decisions and Mail Service template selection
  • enrichment runs during intent acceptance before durable route write
  • 404 subject_not_found from User Service is treated as permanent producer input error and becomes malformed-intent recipient_not_found
  • temporary User Service failures stop the consumer before stream-offset advance so the same stream entry is retried after restart

For audience_kind=admin_email, Notification Service does not consult User Service and instead resolves recipients from type-specific config.
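
The locale rules reduce to an exact-match lookup with a fixed fallback. In this minimal sketch, `resolveLocale` and the `supported` set are hypothetical; the implemented set contains only `en`. The `de` entry used in the usage loop is purely illustrative and shows that a value such as `de-AT` is not reduced to `de`.

```go
package main

import "fmt"

// fallbackLocale is used for unsupported, empty, or invalid values.
const fallbackLocale = "en"

// resolveLocale returns preferredLanguage only on an exact match against
// the supported set. No prefix or region reduction is attempted.
func resolveLocale(preferredLanguage string, supported map[string]bool) string {
	if supported[preferredLanguage] {
		return preferredLanguage
	}
	return fallbackLocale
}

func main() {
	// "de" is a hypothetical second locale, not part of the current service.
	supported := map[string]bool{"en": true, "de": true}
	for _, lang := range []string{"en", "de", "de-AT", "", "xx"} {
		fmt.Printf("%q -> %s\n", lang, resolveLocale(lang, supported))
	}
}
```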

Push Contract Toward Gateway

Push events are published into the existing Gateway client-events stream.

Stable routing rules:

  • event_type equals notification_type
  • event_id equals <notification_id>/<route_id>
  • user_id is derived from recipient_ref=user:<user_id> for user-targeted routes
  • request_id and trace_id are forwarded when present
  • device_session_id is intentionally omitted so Gateway fans the event out to every active stream of that user

Notification Service appends Gateway events with XADD MAXLEN ~ using NOTIFICATION_GATEWAY_CLIENT_EVENTS_STREAM_MAX_LEN.

User-facing push payloads use pkg/schema/fbs/notification.fbs.

| notification_type | FlatBuffers table | Payload fields |
| --- | --- | --- |
| game.turn.ready | notification.GameTurnReadyEvent | game_id, turn_number |
| game.finished | notification.GameFinishedEvent | game_id, final_turn_number |
| lobby.application.submitted | notification.LobbyApplicationSubmittedEvent | game_id, applicant_user_id |
| lobby.membership.approved | notification.LobbyMembershipApprovedEvent | game_id |
| lobby.membership.rejected | notification.LobbyMembershipRejectedEvent | game_id |
| lobby.membership.blocked | notification.LobbyMembershipBlockedEvent | game_id, membership_user_id, reason |
| lobby.invite.created | notification.LobbyInviteCreatedEvent | game_id, inviter_user_id |
| lobby.invite.redeemed | notification.LobbyInviteRedeemedEvent | game_id, invitee_user_id |
| lobby.race_name.registration_eligible | notification.LobbyRaceNameRegistrationEligibleEvent | game_id, race_name, eligible_until_ms |
| lobby.race_name.registered | notification.LobbyRaceNameRegisteredEvent | race_name |

Only the ten user-facing push notification types above are represented in notification.fbs. geo.review_recommended, game.generation_failed, lobby.runtime_paused_after_start, lobby.invite.expired, and lobby.race_name.registration_denied remain outside this schema because they are email-only in v1.

Checked-in generated Go bindings for this schema live under ../pkg/schema/fbs/notification.

notification_type alone determines the concrete FlatBuffers table. No extra envelope or FlatBuffers union is added in v1.

The push payload must stay lightweight and must not attempt to mirror full game, lobby, or profile state. game_name, human-readable user names, and other full business-state fields stay out of the push schema. Clients react to the notification and then fetch fresh business state through normal service APIs.

Email Contract Toward Mail Service

Email routes are published to Mail Service through mail:delivery_commands using the existing generic async command contract.

Rules:

  • delivery_id equals <notification_id>/<route_id>
  • source is always notification
  • payload_mode is always template
  • idempotency_key equals notification:<notification_id>/<route_id>
  • requested_at_ms equals accepted_at_ms
  • request_id and trace_id are forwarded when present
  • payload_json.to contains exactly one resolved recipient email
  • payload_json.cc, payload_json.bcc, payload_json.reply_to, and payload_json.attachments are empty arrays in v1
  • template_id equals notification_type
  • locale is the resolved language from the enrichment step or en
  • template variables are passed through from normalized payload_json

Notification Service appends Mail Service commands with plain XADD and does not manage retention or trimming of mail:delivery_commands.

Auth-code email remains a direct Auth / Session Service -> Mail Service flow and does not pass through Notification Service.

Initial notification-owned template assets:

| notification_type | template_id | Required assets |
| --- | --- | --- |
| geo.review_recommended | geo.review_recommended | en/subject.tmpl, en/text.tmpl |
| game.turn.ready | game.turn.ready | en/subject.tmpl, en/text.tmpl |
| game.finished | game.finished | en/subject.tmpl, en/text.tmpl |
| game.generation_failed | game.generation_failed | en/subject.tmpl, en/text.tmpl |
| lobby.runtime_paused_after_start | lobby.runtime_paused_after_start | en/subject.tmpl, en/text.tmpl |
| lobby.application.submitted | lobby.application.submitted | en/subject.tmpl, en/text.tmpl |
| lobby.membership.approved | lobby.membership.approved | en/subject.tmpl, en/text.tmpl |
| lobby.membership.rejected | lobby.membership.rejected | en/subject.tmpl, en/text.tmpl |
| lobby.membership.blocked | lobby.membership.blocked | en/subject.tmpl, en/text.tmpl |
| lobby.invite.created | lobby.invite.created | en/subject.tmpl, en/text.tmpl |
| lobby.invite.redeemed | lobby.invite.redeemed | en/subject.tmpl, en/text.tmpl |
| lobby.invite.expired | lobby.invite.expired | en/subject.tmpl, en/text.tmpl |
| lobby.race_name.registration_eligible | lobby.race_name.registration_eligible | en/subject.tmpl, en/text.tmpl |
| lobby.race_name.registered | lobby.race_name.registered | en/subject.tmpl, en/text.tmpl |
| lobby.race_name.registration_denied | lobby.race_name.registration_denied | en/subject.tmpl, en/text.tmpl |

auth.login_code does not belong to the notification-owned template set.

Route Model

One accepted intent materializes:

  • one notification_record
  • zero or more notification_route entries

Each route represents exactly one (channel, recipient_ref) pair.

Stable route statuses:

  • pending
  • published
  • failed
  • dead_letter
  • skipped

Rules:

  • pending means the route is ready for first publish or retry
  • published means the route was durably handed off to its downstream channel
  • failed means the last publish attempt failed and a later retry is scheduled
  • dead_letter means the route exhausted its retry budget
  • skipped means the route slot was durably materialized but intentionally not emitted

Materialization rules:

  • every derived recipient_ref receives one push route slot and one email route slot; the synthetic config:<notification_type> slot used for an empty administrator email list is the only exception
  • a route slot whose channel is outside the notification type channel matrix is materialized as skipped
  • recipient_ref is user:<user_id> for user-targeted routes
  • recipient_ref is email:<normalized_address> for configured administrator email routes
  • when an administrator email list is empty, the service materializes one synthetic recipient slot config:<notification_type> with one skipped email route so the configuration gap remains durable and operator-visible
  • route_id is mandatory and equals <channel>:<recipient_ref>
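
The materialization rules can be sketched as a pure function from an accepted intent to its route slots. The names `slot`, `channelEnabled`, and `materialize` are hypothetical, and the sketch assumes recipient_ref values were already derived. The push-capable set mirrors the notification catalog, where every type has an email channel and push applies only to user audiences.

```go
package main

import "fmt"

// slot is one materialized route (hypothetical type).
type slot struct {
	RouteID, Channel, RecipientRef, Status string
}

// pushTypes are the push-capable notification types from the catalog.
var pushTypes = map[string]bool{
	"game.turn.ready": true, "game.finished": true,
	"lobby.application.submitted": true, "lobby.membership.approved": true,
	"lobby.membership.rejected": true, "lobby.membership.blocked": true,
	"lobby.invite.created": true, "lobby.invite.redeemed": true,
	"lobby.race_name.registration_eligible": true, "lobby.race_name.registered": true,
}

// channelEnabled reports whether the channel is inside the type's channel matrix.
func channelEnabled(notificationType, audienceKind, channel string) bool {
	switch channel {
	case "email":
		return true // every catalog type includes email
	case "push":
		return audienceKind == "user" && pushTypes[notificationType]
	}
	return false
}

// materialize builds one push slot and one email slot per recipient_ref.
// An empty administrator list yields one synthetic skipped email route.
func materialize(notificationType, audienceKind string, recipientRefs []string) []slot {
	if audienceKind == "admin_email" && len(recipientRefs) == 0 {
		ref := "config:" + notificationType
		return []slot{{RouteID: "email:" + ref, Channel: "email", RecipientRef: ref, Status: "skipped"}}
	}
	var slots []slot
	for _, ref := range recipientRefs {
		for _, channel := range []string{"push", "email"} {
			status := "skipped"
			if channelEnabled(notificationType, audienceKind, channel) {
				status = "pending"
			}
			// route_id equals <channel>:<recipient_ref>
			slots = append(slots, slot{RouteID: channel + ":" + ref, Channel: channel, RecipientRef: ref, Status: status})
		}
	}
	return slots
}

func main() {
	for _, s := range materialize("lobby.invite.expired", "user", []string{"user:u1"}) {
		fmt.Printf("%+v\n", s)
	}
}
```

Only slots in pending status would enter notification:route_schedule; skipped slots stay durable for inspection but are never published.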

The service-local aggregate notification status is derived from routes and is not a separate durable source of truth.

Redis Logical Model

Storage rules:

  • durable records are stored as strict JSON blobs
  • timestamps are stored in Unix milliseconds
  • dynamic Redis key segments are base64url-encoded
  • notification:route_schedule is one shared sorted set for both push and email

| Logical artifact | Redis key |
| --- | --- |
| notification_record | notification:records:<notification_id> |
| notification_route | notification:routes:<notification_id>:<route_id> |
| temporary route lease | notification:route_leases:<notification_id>:<route_id> |
| notification_idempotency_record | notification:idempotency:<producer>:<idempotency_key> |
| notification_dead_letter_entry | notification:dead_letters:<notification_id>:<route_id> |
| malformed intent record | notification:malformed_intents:<stream_entry_id> |
| stream offset record | notification:stream_offsets:<stream> |
| ingress stream | notification:intents |
| route schedule sorted set | notification:route_schedule |

| Record | Frozen fields |
| --- | --- |
| notification_record | notification_id, notification_type, producer, audience_kind, normalized recipient_user_ids, normalized payload_json, idempotency_key, request_fingerprint, optional request_id, optional trace_id, occurred_at_ms, accepted_at_ms, updated_at_ms |
| notification_route | notification_id, route_id, channel, recipient_ref, status, attempt_count, max_attempts, next_attempt_at_ms, optional resolved_email, optional resolved_locale, optional last_error_classification, optional last_error_message, optional last_error_at_ms, created_at_ms, updated_at_ms, optional published_at_ms, optional dead_lettered_at_ms, optional skipped_at_ms |
| notification_idempotency_record | producer, idempotency_key, notification_id, request_fingerprint, created_at_ms, expires_at_ms |
| notification_dead_letter_entry | notification_id, route_id, channel, recipient_ref, final_attempt_count, max_attempts, failure_classification, failure_message, created_at_ms, optional recovery_hint |
| malformed intent record | stream_entry_id, optional notification_type, optional producer, optional idempotency_key, failure_code, failure_message, raw_fields_json, recorded_at_ms |
| stream offset record | stream, last_processed_entry_id, updated_at_ms |

notification_record.recipient_user_ids stores a normalized array of unique user_id values and is omitted for audience_kind=admin_email. notification_record.payload_json stores the canonical normalized JSON string used for idempotency fingerprinting.

Temporary route lease keys store one opaque worker token and use NOTIFICATION_ROUTE_LEASE_TTL; they are service-local coordination state rather than durable records. Only the current lease holder may finalize one due publication attempt.

notification:route_schedule stores one member per scheduled route, where score = next_attempt_at_ms and member = the full Redis route key with encoded dynamic segments. Newly accepted publishable routes enter the schedule immediately with status=pending and next_attempt_at_ms = accepted_at_ms. Routes in failed status remain scheduled for retry. Routes in published, dead_letter, or skipped status are absent from the schedule.
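
A schedule member can be illustrated by encoding the dynamic key segments and decoding them again when a publisher filters by channel. This sketch assumes unpadded base64url (`base64.RawURLEncoding`); the README does not say whether padding is used. `routeKey`, `parseRouteKey`, and `claimableBy` are hypothetical helpers.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

const routeKeyPrefix = "notification:routes:"

// encodeSegment encodes one dynamic key segment (assumed: no padding).
func encodeSegment(s string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(s))
}

// routeKey builds notification:routes:<notification_id>:<route_id>
// with both dynamic segments encoded, so ':' inside them is safe.
func routeKey(notificationID, routeID string) string {
	return routeKeyPrefix + encodeSegment(notificationID) + ":" + encodeSegment(routeID)
}

// parseRouteKey reverses routeKey and returns the decoded segments.
func parseRouteKey(key string) (string, string, error) {
	if !strings.HasPrefix(key, routeKeyPrefix) {
		return "", "", fmt.Errorf("not a route key: %q", key)
	}
	parts := strings.Split(strings.TrimPrefix(key, routeKeyPrefix), ":")
	if len(parts) != 2 {
		return "", "", fmt.Errorf("expected two encoded segments in %q", key)
	}
	notificationID, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return "", "", err
	}
	routeID, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return "", "", err
	}
	return string(notificationID), string(routeID), nil
}

// claimableBy reports whether a schedule member belongs to the given
// channel, based on the decoded route_id prefix ("push:" or "email:").
func claimableBy(channel, member string) bool {
	_, routeID, err := parseRouteKey(member)
	return err == nil && strings.HasPrefix(routeID, channel+":")
}

func main() {
	member := routeKey("1700000000000-0", "push:user:u1")
	score := int64(1700000000123) // next_attempt_at_ms
	fmt.Println(member, score, claimableBy("push", member), claimableBy("email", member))
}
```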

Retry And Dead-Letter Policy

Retry budgets are channel-specific:

  • push publication to Gateway: 3 attempts total
  • email publication to Mail Service: 7 attempts total

Rules:

  • the first publication attempt happens immediately at accepted_at_ms
  • after failed attempt N, the next delay is clamp(NOTIFICATION_ROUTE_BACKOFF_MIN * 2^(N-1), NOTIFICATION_ROUTE_BACKOFF_MIN, NOTIFICATION_ROUTE_BACKOFF_MAX)
  • no jitter is added to the retry delay
  • push and email routes are retried independently
  • the shared schedule is filtered by route prefix so push publishers claim only push: routes and email publishers claim only email: routes
  • push and email replicas coordinate through notification:route_leases:<notification_id>:<route_id> with NOTIFICATION_ROUTE_LEASE_TTL
  • push publication failures are classified minimally as payload_encoding_failed and gateway_stream_publish_failed
  • email publication failures are classified minimally as payload_encoding_failed and mail_stream_publish_failed
  • when a route exhausts its retry budget, it transitions to dead_letter, creates notification_dead_letter_entry, and is removed from notification:route_schedule
  • one exhausted route entering dead_letter must not roll back or invalidate a sibling route that already reached published
  • service restarts resume from durable route state and persisted stream offsets
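
The retry formula and the budget check can be sketched as follows. `retryDelay` and `statusAfterFailure` are hypothetical names; the constants mirror the documented defaults. Doubling stops once the delay reaches the upper bound, which keeps the arithmetic from overflowing for large attempt numbers.

```go
package main

import (
	"fmt"
	"time"
)

// Documented defaults for NOTIFICATION_ROUTE_BACKOFF_MIN and _MAX.
const (
	defaultBackoffMin = time.Second
	defaultBackoffMax = 5 * time.Minute
)

// retryDelay returns clamp(lo * 2^(failedAttempt-1), lo, hi) without jitter.
func retryDelay(failedAttempt int, lo, hi time.Duration) time.Duration {
	d := lo
	for i := 1; i < failedAttempt && d < hi; i++ {
		d *= 2
	}
	if d < lo {
		d = lo
	}
	if d > hi {
		d = hi
	}
	return d
}

// statusAfterFailure decides the route status after a failed attempt:
// "dead_letter" once the budget is exhausted, otherwise "failed".
func statusAfterFailure(attemptCount, maxAttempts int) string {
	if attemptCount >= maxAttempts {
		return "dead_letter"
	}
	return "failed"
}

func main() {
	const emailMaxAttempts = 7 // NOTIFICATION_EMAIL_RETRY_MAX_ATTEMPTS default
	for attempt := 1; attempt <= emailMaxAttempts; attempt++ {
		status := statusAfterFailure(attempt, emailMaxAttempts)
		if status == "dead_letter" {
			fmt.Printf("attempt %d failed -> %s\n", attempt, status)
			break
		}
		fmt.Printf("attempt %d failed -> %s, retry in %s\n",
			attempt, status, retryDelay(attempt, defaultBackoffMin, defaultBackoffMax))
	}
}
```

With the defaults, an email route that keeps failing waits 1s, 2s, 4s, 8s, 16s, and 32s between its seven attempts, so the upper bound of 5m only takes effect when the budget or the minimum delay is configured higher.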

Retention rules:

  • notification_record and notification_route use NOTIFICATION_RECORD_TTL
  • notification_idempotency_record uses NOTIFICATION_IDEMPOTENCY_TTL
  • notification_dead_letter_entry and malformed intent records use NOTIFICATION_DEAD_LETTER_TTL
  • stream offset records do not use TTL

Observability

The service instruments:

  • internal probe HTTP requests
  • internal probe HTTP listener startup and shutdown events
  • structured logs for accepted, duplicate, and rejected notification intents
  • structured logs for push and email route publication, retry, and dead-letter transitions
  • accepted and duplicate intent outcomes
  • malformed intents, including idempotency conflicts and unresolved recipients
  • user-enrichment lookup outcomes
  • route publish attempts, retries, and dead-letter transitions
  • current route-schedule depth and oldest scheduled route age
  • oldest unprocessed intent stream entry age

Metric names:

  • notification.intent.outcomes
  • notification.intent.malformed
  • notification.user_enrichment.attempts
  • notification.route.publish_attempts
  • notification.route.retries
  • notification.route.dead_letters
  • notification.route_schedule.depth
  • notification.route_schedule.oldest_age_ms
  • notification.intent_stream.oldest_unprocessed_age_ms

Metrics intentionally avoid high-cardinality attributes such as user_id, email address, notification_id, route_id, and idempotency_key.

Metric attributes may include notification_type, producer, audience_kind, channel, result, outcome, failure_code, and failure_classification.

Structured logs for intent intake, duplicate resolution, malformed-intent recording, route publication, retry scheduling, and dead-letter transitions use the same field names where the value exists:

  • notification_id
  • notification_type
  • producer
  • audience_kind
  • idempotency_key
  • route_id
  • channel
  • request_id
  • trace_id

OpenTelemetry trace context is logged as otel_trace_id and otel_span_id when the active context carries a valid span.

Recovery

The supported manual replay path for a dead-lettered notification route is to publish a new compatible intent to notification:intents.

Recovery rules:

  • inspect the notification_dead_letter_entry, notification_route, and owning notification_record
  • confirm the downstream dependency or payload problem has been corrected
  • publish a new intent with the same semantic payload_json and audience fields, but with a new producer-owned idempotency_key
  • keep the old notification_dead_letter_entry untouched as audit history until its configured TTL expires

Manual Redis mutation of an existing route record or notification:route_schedule is not a supported replay workflow.

Verification

Focused service-local coverage verifies:

  • configuration loading and validation
  • GET /healthz
  • GET /readyz
  • absence of /metrics
  • Redis startup fast-fail behavior
  • graceful shutdown of the private probe listener
  • valid intent acceptance
  • malformed intent rejection
  • duplicate and conflicting duplicate handling
  • user-targeted route enrichment from User Service
  • recipient_not_found malformed-intent recording for unresolved user_id
  • temporary User Service failure handling without stream-offset advance
  • FlatBuffers payload encoding for all ten user-facing push notification_type values
  • template-mode Mail Service command encoding for user and administrator email routes
  • due-route loading, lease acquisition, route publication, retry reschedule, and dead-letter persistence in Redis
  • push worker success, retry, and duplicate-prevention behavior across concurrent replicas
  • email worker success, retry, and duplicate-prevention behavior across concurrent replicas
  • OpenTelemetry metric recording for intent outcomes, malformed intents, user enrichment, route publication attempts, retries, dead letters, route-schedule gauges, and intent-stream lag
  • Redis-backed route-schedule and intent-stream lag snapshots
  • structured log field helper coverage through intake and publisher tests
  • intent-consumer restart from 0-0 and from persisted stream offsets
  • runtime wiring of the intent consumer and both route publishers
  • shared galaxy/notificationintent producer constructors, validation, and Redis Stream publication compatibility

Cross-service coverage verifies:

  • Notification Service -> User Service enrichment compatibility and failure handling
  • Notification Service -> Gateway push compatibility for every user-facing notification_type
  • Notification Service -> Mail Service template-mode handoff for every supported email type
  • producer compatibility for Game Master, Game Lobby, and Geo Profile Service through galaxy/notificationintent
  • explicit regression that auth-code email still bypasses Notification Service
  • real black-box Notification Service -> Gateway push fan-out coverage
  • real black-box Notification Service -> Mail Service template-mode handoff coverage

Real producer-boundary suites for Game Master, Game Lobby, and Geo Profile Service should be added only when those service boundaries exist in code.