# Mail Service

`Mail Service` is the internal e-mail delivery service of Galaxy.

Canonical contracts:

- [Internal REST API](api/internal-openapi.yaml)
- [Async generic command contract](api/delivery-commands-asyncapi.yaml)
- [Extended service docs](docs/README.md)

## Purpose

`Mail Service` owns durable intake, rendering, execution, retry, audit, and operator recovery for outbound e-mail.

It does not decide whether a business event should become e-mail. That decision belongs to `Notification Service`.

## Responsibility Boundaries

`Mail Service` is responsible for:

- direct auth-code mail intake from `Auth / Session Service`
- async generic mail intake from `Notification Service`
- validation of recipient envelope, payload shape, locale, and attachments
- deterministic template rendering for template-mode deliveries
- provider execution through `stub` or `smtp`
- retry scheduling, dead-letter escalation, and operator-visible audit state
- trusted operator reads and resend by clone creation

`Mail Service` is not responsible for:

- end-user authentication or authorization
- notification preference ownership
- deciding whether non-auth mail should be sent at all
- direct calls from `Geo Profile Service`
- hot-reloading templates or editing template catalog state at runtime

Cross-service routing rules:

- `Auth / Session Service -> Mail Service` is synchronous trusted REST
- `Notification Service -> Mail Service` is asynchronous `Redis Streams`
- `Geo Profile Service` must route optional admin e-mail through `Notification Service`, not directly to `Mail Service`

## Runtime Surface

`cmd/mail` starts one internal-only process with:

- one trusted internal HTTP listener on `MAIL_INTERNAL_HTTP_ADDR`
- one async command consumer
- one attempt scheduler
- one attempt worker pool
- one cleanup worker

The service has no public ingress and no dedicated admin listener.


Intentional runtime omissions:

- no `/healthz`
- no `/readyz`
- no `/metrics`

Operational behavior:

- startup performs bounded Redis connectivity checks and fails fast on invalid runtime configuration
- the template catalog is parsed once at startup and kept immutable for the lifetime of the process
- template changes require a process restart
- operator handlers execute under `MAIL_OPERATOR_REQUEST_TIMEOUT`

## Configuration

Required for all starts:

- `MAIL_REDIS_ADDR`

Primary configuration groups:

- process and logging:
  - `MAIL_SHUTDOWN_TIMEOUT`
  - `MAIL_LOG_LEVEL`
- internal HTTP:
  - `MAIL_INTERNAL_HTTP_ADDR`
  - `MAIL_INTERNAL_HTTP_READ_HEADER_TIMEOUT`
  - `MAIL_INTERNAL_HTTP_READ_TIMEOUT`
  - `MAIL_INTERNAL_HTTP_IDLE_TIMEOUT`
- Redis connectivity:
  - `MAIL_REDIS_USERNAME`
  - `MAIL_REDIS_PASSWORD`
  - `MAIL_REDIS_DB`
  - `MAIL_REDIS_TLS_ENABLED`
  - `MAIL_REDIS_OPERATION_TIMEOUT`
  - `MAIL_REDIS_COMMAND_STREAM`
- SMTP provider:
  - `MAIL_SMTP_MODE=stub|smtp`
  - `MAIL_SMTP_ADDR`
  - `MAIL_SMTP_USERNAME`
  - `MAIL_SMTP_PASSWORD`
  - `MAIL_SMTP_FROM_EMAIL`
  - `MAIL_SMTP_FROM_NAME`
  - `MAIL_SMTP_TIMEOUT`
  - `MAIL_SMTP_INSECURE_SKIP_VERIFY`
- template catalog:
  - `MAIL_TEMPLATE_DIR`
- worker and operator behavior:
  - `MAIL_ATTEMPT_WORKER_CONCURRENCY`
  - `MAIL_STREAM_BLOCK_TIMEOUT`
  - `MAIL_OPERATOR_REQUEST_TIMEOUT`
- OpenTelemetry:
  - `OTEL_SERVICE_NAME`
  - `OTEL_TRACES_EXPORTER`
  - `OTEL_METRICS_EXPORTER`
  - `OTEL_EXPORTER_OTLP_PROTOCOL`
  - `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL`
  - `OTEL_EXPORTER_OTLP_METRICS_PROTOCOL`
  - `MAIL_OTEL_STDOUT_TRACES_ENABLED`
  - `MAIL_OTEL_STDOUT_METRICS_ENABLED`

Defaults worth knowing:

- `MAIL_INTERNAL_HTTP_ADDR=:8080`
- `MAIL_SMTP_MODE=stub`
- `MAIL_SMTP_TIMEOUT=15s`
- `MAIL_TEMPLATE_DIR=templates`
- `MAIL_ATTEMPT_WORKER_CONCURRENCY=4`
- `MAIL_STREAM_BLOCK_TIMEOUT=2s`
- `MAIL_OPERATOR_REQUEST_TIMEOUT=5s`
- `MAIL_SHUTDOWN_TIMEOUT=5s`

Additional SMTP note:

- `MAIL_SMTP_INSECURE_SKIP_VERIFY` defaults to `false`; enabling it is intended only for local self-signed SMTP capture or similar non-production environments

Current implementation caveats:

- `MAIL_REDIS_COMMAND_STREAM` is effective for the async command consumer
- `MAIL_REDIS_ATTEMPT_SCHEDULE_KEY` and `MAIL_REDIS_DEAD_LETTER_PREFIX` are parsed, but the Redis adapters still use the fixed keys `mail:attempt_schedule` and `mail:dead_letters:`
- `MAIL_IDEMPOTENCY_TTL`, `MAIL_DELIVERY_TTL`, and `MAIL_ATTEMPT_TTL` are parsed, but the Redis adapters still enforce fixed retentions of `7d`, `30d`, and `90d` respectively

## Stable Input Contracts

### 1. Auth delivery REST

Route:

- `POST /api/v1/internal/login-code-deliveries`

Headers:

- required `Idempotency-Key`

Request body:

- `email`
- `code`
- `locale`

Stable success outcomes:

- `sent`
- `suppressed`

Important semantics:

- `sent` means the request was durably accepted into the internal mail-delivery pipeline
- `sent` does not mean that SMTP delivery has already completed
- new durable auth deliveries surface as:
  - `queued` in `MAIL_SMTP_MODE=smtp`
  - `suppressed` in `MAIL_SMTP_MODE=stub`
- duplicate replays with the same normalized request return the same stable outcome
- mismatched replays on the same `(source, idempotency_key)` return `409 conflict`

### 2. Async generic command intake

Ingress stream:

- `mail:delivery_commands`

Stable envelope fields:

- `delivery_id`
- `source`
- `payload_mode`
- `idempotency_key`
- `request_id`
- `trace_id`
- `payload_json`

Contract rules:

- async `source` is fixed to `notification`
- supported `payload_mode` values are `rendered` and `template`
- `request_id` and `trace_id` are observability-only metadata and do not participate in idempotency fingerprinting
- malformed commands are metered, logged, and recorded as dedicated malformed-command entries
- malformed commands do not create a durable delivery record
- stream offset advances only after durable acceptance or durable malformed-command recording

### 3. Trusted operator REST

Routes:

- `GET /api/v1/internal/deliveries`
- `GET /api/v1/internal/deliveries/{delivery_id}`
- `GET /api/v1/internal/deliveries/{delivery_id}/attempts`
- `POST /api/v1/internal/deliveries/{delivery_id}/resend`

List filters:

- `recipient`
- `status`
- `source`
- `template_id`
- `idempotency_key`
- `from_created_at_ms`
- `to_created_at_ms`
- `limit`
- `cursor`

Stable list behavior:

- ordering is `created_at_ms DESC`, then `delivery_id DESC`
- cursor is an opaque base64url encoding of `created_at_ms:delivery_id`
- `idempotency_key` without `source` matches across all stable sources

Stable resend rules:

- resend is clone-only
- resend is allowed only for terminal delivery states
- resend creates a new delivery with `source=operator_resend`
- resend clones preserve audit history of the original instead of mutating it

## Delivery Model

### Source vocabulary

Stable `mail_delivery.source` values:

- `authsession`
- `notification`
- `operator_resend`

### Payload modes

Stable `mail_delivery.payload_mode` values:

- `rendered`
- `template`

Rules:

- `rendered` stores final `subject`, `text_body`, and optional `html_body`
- `template` stores `template_id`, canonical `locale`, and strict JSON-object `template_variables`
- raw attachment bodies are stored separately from the delivery audit record

### Delivery statuses

Stable operator-visible `mail_delivery.status` values:

- `queued`
- `rendered`
- `sending`
- `sent`
- `suppressed`
- `failed`
- `dead_letter`

Status meanings:

- `queued`: durable intake completed and the next attempt is scheduled
- `rendered`: template content has been materialized
- `sending`: one worker currently owns the active attempt
- `sent`: provider accepted the envelope
- `suppressed`: delivery was intentionally skipped as a successful business outcome
- `failed`: terminal failure without dead-letter escalation
- `dead_letter`: retry budget was exhausted and operator follow-up is required

Stable transition rules:

- newly accepted durable deliveries surface as `queued` or `suppressed`
- `queued -> rendered` is used only for `payload_mode=template`
- `queued|rendered -> sending` happens on successful claim
- `sending -> sent|suppressed|failed|queued|dead_letter` depends on provider classification and retry policy

The internal type `delivery.StatusAccepted` still exists in code, but it is not part of the stable public delivery-status vocabulary and is not emitted by the current runtime.

### Attempt statuses

Stable `mail_attempt.status` values:

- `scheduled`
- `in_progress`
- `render_failed`
- `provider_accepted`
- `provider_rejected`
- `transport_failed`
- `timed_out`

Rules:

- there is at most one active `in_progress` attempt per delivery
- `render_failed` means template rendering failed before provider execution
- `provider_accepted` ends the delivery as `sent`
- `provider_rejected` is used for:
  - provider-side suppression ending in `suppressed`
  - permanent provider failure ending in `failed`
- retryable paths are expressed through:
  - `transport_failed`
  - `timed_out`

## Template and Locale Policy

Template layout:

- `//subject.tmpl`
- `//text.tmpl`
- optional `//html.tmpl`

Required auth fallback files:

- `auth.login_code/en/subject.tmpl`
- `auth.login_code/en/text.tmpl`

Rendering rules:

- the process loads the full catalog at startup
- exact locale match is attempted first
- the only fallback locale is `en`
- there are no intermediate reductions such as `fr-CA -> fr -> en`
- `locale_fallback_used=true` is stored durably when fallback is applied
- subject and text use `text/template`
- optional HTML uses `html/template`
- missing required variables and template lookup failures are classified into stable render-failure codes

## Redis Logical Model

Primary keys:

- `mail:deliveries:`
- `mail:attempts::`
- `mail:idempotency::`
- `mail:dead_letters:`
- `mail:delivery_payloads:`
- `mail:malformed_commands:`
- `mail:stream_offsets:`

Scheduling and ingress keys:

- `mail:delivery_commands`
- `mail:attempt_schedule`

Operator indexes:

- `mail:idx:recipient:`
- `mail:idx:status:`
- `mail:idx:source:`
- `mail:idx:template:`
- `mail:idx:idempotency::`
- `mail:idx:created_at`
- `mail:idx:malformed_command:created_at`

Storage rules:

- dynamic Redis key segments are base64url-encoded
- durable records are stored as strict JSON blobs
- timestamps are stored in Unix milliseconds
- raw attachment payloads are separated from audit metadata
- malformed async commands are stored idempotently by `stream_entry_id`

Current fixed retentions:

- idempotency: `7d`
- deliveries and payload audit: `30d`
- attempts and dead letters: `90d`
- malformed commands: `90d`

## Provider, Retry, and Failure Policy

Provider modes:

- `stub`
- `smtp`

SMTP rules:

- outbound SMTP requires `STARTTLS`
- servers without `STARTTLS` support are treated as permanent failure
- SMTP authentication is enabled only when both username and password are set

Retry ladder:

- attempt `1 -> 2`: `1m`
- attempt `2 -> 3`: `5m`
- attempt `3 -> 4`: `30m`
- after attempt `4`: `dead_letter`

Failure handling:

- retryable provider failures become `transport_failed` or `timed_out`, then either reschedule or escalate to `dead_letter`
- permanent provider failures become `failed`
- render failures become `failed` with `render_failed`
- stale claimed work is recovered after `MAIL_SMTP_TIMEOUT + 30s`

## Observability

The runtime exports telemetry through configured OpenTelemetry exporters only.


Main signals:

- `mail.delivery.accepted_auth`
- `mail.delivery.accepted_generic`
- `mail.delivery.suppressed`
- `mail.delivery.status_transitions`
- `mail.attempt.outcomes`
- `mail.delivery.dead_letters`
- `mail.template.locale_fallback`
- `mail.attempt_schedule.depth`
- `mail.attempt_schedule.oldest_age_ms`
- `mail.provider.send.duration_ms`
- `mail.stream_commands.malformed`

Additional behavior:

- internal HTTP uses `otelhttp`
- Redis clients use `redisotel`
- structured logs include `otel_trace_id` and `otel_span_id` when available

## Verification

Relevant commands:

- `cd mail && go test ./...`
- `cd integration && go test ./authsessionmail/...`
- `cd integration && go test ./gatewayauthsessionmail/...`

Extended references:

- [Runtime and components](docs/runtime.md)
- [Main flows](docs/flows.md)
- [Configuration and contract examples](docs/examples.md)
- [Operator runbook](docs/runbook.md)