feat: mail service
# Mail Service

`Mail Service` is the internal e-mail delivery service of Galaxy.

Canonical contracts:

- [Internal REST API](api/internal-openapi.yaml)
- [Async generic command contract](api/delivery-commands-asyncapi.yaml)
- [Extended service docs](docs/README.md)

## Purpose

`Mail Service` owns durable intake, rendering, execution, retry, audit, and
operator recovery for outbound e-mail.

It does not decide whether a business event should become e-mail. That
decision belongs to `Notification Service`.

## Responsibility Boundaries

`Mail Service` is responsible for:

- direct auth-code mail intake from `Auth / Session Service`
- async generic mail intake from `Notification Service`
- validation of recipient envelope, payload shape, locale, and attachments
- deterministic template rendering for template-mode deliveries
- provider execution through `stub` or `smtp`
- retry scheduling, dead-letter escalation, and operator-visible audit state
- trusted operator reads and resend by clone creation

`Mail Service` is not responsible for:

- end-user authentication or authorization
- notification preference ownership
- deciding whether non-auth mail should be sent at all
- direct calls from `Geo Profile Service`
- hot-reloading templates or editing template catalog state at runtime

Cross-service routing rules:

- `Auth / Session Service -> Mail Service` is synchronous trusted REST
- `Notification Service -> Mail Service` is asynchronous `Redis Streams`
- `Geo Profile Service` must route optional admin e-mail through
  `Notification Service`, not directly to `Mail Service`

## Runtime Surface

`cmd/mail` starts one internal-only process with:

- one trusted internal HTTP listener on `MAIL_INTERNAL_HTTP_ADDR`
- one async command consumer
- one attempt scheduler
- one attempt worker pool
- one cleanup worker

The service has no public ingress and no dedicated admin listener.

Intentional runtime omissions:

- no `/healthz`
- no `/readyz`
- no `/metrics`

Operational behavior:

- startup performs bounded Redis connectivity checks and fails fast on invalid
  runtime configuration
- the template catalog is parsed once at startup and kept immutable for the
  lifetime of the process
- template changes require process restart
- operator handlers execute under `MAIL_OPERATOR_REQUEST_TIMEOUT`

## Configuration

Required for all starts:

- `MAIL_REDIS_ADDR`

Primary configuration groups:

- process and logging:
  - `MAIL_SHUTDOWN_TIMEOUT`
  - `MAIL_LOG_LEVEL`
- internal HTTP:
  - `MAIL_INTERNAL_HTTP_ADDR`
  - `MAIL_INTERNAL_HTTP_READ_HEADER_TIMEOUT`
  - `MAIL_INTERNAL_HTTP_READ_TIMEOUT`
  - `MAIL_INTERNAL_HTTP_IDLE_TIMEOUT`
- Redis connectivity:
  - `MAIL_REDIS_USERNAME`
  - `MAIL_REDIS_PASSWORD`
  - `MAIL_REDIS_DB`
  - `MAIL_REDIS_TLS_ENABLED`
  - `MAIL_REDIS_OPERATION_TIMEOUT`
  - `MAIL_REDIS_COMMAND_STREAM`
- SMTP provider:
  - `MAIL_SMTP_MODE=stub|smtp`
  - `MAIL_SMTP_ADDR`
  - `MAIL_SMTP_USERNAME`
  - `MAIL_SMTP_PASSWORD`
  - `MAIL_SMTP_FROM_EMAIL`
  - `MAIL_SMTP_FROM_NAME`
  - `MAIL_SMTP_TIMEOUT`
  - `MAIL_SMTP_INSECURE_SKIP_VERIFY`
- template catalog:
  - `MAIL_TEMPLATE_DIR`
- worker and operator behavior:
  - `MAIL_ATTEMPT_WORKER_CONCURRENCY`
  - `MAIL_STREAM_BLOCK_TIMEOUT`
  - `MAIL_OPERATOR_REQUEST_TIMEOUT`
- OpenTelemetry:
  - `OTEL_SERVICE_NAME`
  - `OTEL_TRACES_EXPORTER`
  - `OTEL_METRICS_EXPORTER`
  - `OTEL_EXPORTER_OTLP_PROTOCOL`
  - `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL`
  - `OTEL_EXPORTER_OTLP_METRICS_PROTOCOL`
  - `MAIL_OTEL_STDOUT_TRACES_ENABLED`
  - `MAIL_OTEL_STDOUT_METRICS_ENABLED`

Defaults worth knowing:

- `MAIL_INTERNAL_HTTP_ADDR=:8080`
- `MAIL_SMTP_MODE=stub`
- `MAIL_SMTP_TIMEOUT=15s`
- `MAIL_TEMPLATE_DIR=templates`
- `MAIL_ATTEMPT_WORKER_CONCURRENCY=4`
- `MAIL_STREAM_BLOCK_TIMEOUT=2s`
- `MAIL_OPERATOR_REQUEST_TIMEOUT=5s`
- `MAIL_SHUTDOWN_TIMEOUT=5s`

Additional SMTP note:

- `MAIL_SMTP_INSECURE_SKIP_VERIFY` defaults to `false` and is intended only
  for local self-signed SMTP capture or similar non-production environments

Current implementation caveats:

- `MAIL_REDIS_COMMAND_STREAM` is honored by the async command consumer
- `MAIL_REDIS_ATTEMPT_SCHEDULE_KEY` and `MAIL_REDIS_DEAD_LETTER_PREFIX` are
  parsed, but the Redis adapters still use the fixed keys
  `mail:attempt_schedule` and `mail:dead_letters:<delivery_id>`
- `MAIL_IDEMPOTENCY_TTL`, `MAIL_DELIVERY_TTL`, and `MAIL_ATTEMPT_TTL` are
  parsed, but the Redis adapters still enforce fixed retentions of `7d`, `30d`,
  and `90d` respectively

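
The required variable and the documented defaults can be illustrated with a small loader. This is a minimal sketch, not the service's actual configuration code: the `Config` struct, `loadConfig`, and the helper names are hypothetical, only a subset of variables is covered, and treating an empty value as "unset" and parsing timeouts as Go duration strings are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Config is a hypothetical subset of the runtime settings listed above.
type Config struct {
	RedisAddr        string
	InternalHTTPAddr string
	SMTPMode         string
	SMTPTimeout      time.Duration
	ShutdownTimeout  time.Duration
}

// orDefault returns def when the value is empty (assumption: empty means unset).
func orDefault(value, def string) string {
	if value == "" {
		return def
	}
	return value
}

// parseDuration reads key as a Go duration string such as "15s".
func parseDuration(getenv func(string) string, key string, def time.Duration) (time.Duration, error) {
	raw := getenv(key)
	if raw == "" {
		return def, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", key, err)
	}
	return d, nil
}

// loadConfig reads settings through getenv, so a real process could pass
// os.Getenv while tests pass a map-backed lookup.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		RedisAddr:        getenv("MAIL_REDIS_ADDR"),
		InternalHTTPAddr: orDefault(getenv("MAIL_INTERNAL_HTTP_ADDR"), ":8080"),
		SMTPMode:         orDefault(getenv("MAIL_SMTP_MODE"), "stub"),
	}
	if cfg.RedisAddr == "" {
		return Config{}, errors.New("MAIL_REDIS_ADDR is required")
	}
	if cfg.SMTPMode != "stub" && cfg.SMTPMode != "smtp" {
		return Config{}, fmt.Errorf("MAIL_SMTP_MODE: unsupported value %q", cfg.SMTPMode)
	}
	var err error
	if cfg.SMTPTimeout, err = parseDuration(getenv, "MAIL_SMTP_TIMEOUT", 15*time.Second); err != nil {
		return Config{}, err
	}
	if cfg.ShutdownTimeout, err = parseDuration(getenv, "MAIL_SHUTDOWN_TIMEOUT", 5*time.Second); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	// Example environment; the Redis address is illustrative only.
	env := map[string]string{"MAIL_REDIS_ADDR": "localhost:6379"}
	cfg, err := loadConfig(func(key string) string { return env[key] })
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Failing on a missing `MAIL_REDIS_ADDR` or an unknown SMTP mode mirrors the fail-fast startup behavior described under Runtime Surface.
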
## Stable Input Contracts

### 1. Auth delivery REST

Route:

- `POST /api/v1/internal/login-code-deliveries`

Headers:

- required `Idempotency-Key`

Request body:

- `email`
- `code`
- `locale`

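
A caller could build this request as follows. This is a sketch under stated assumptions: the base URL, the idempotency key value, the `loginCodeDelivery` type, and the JSON `Content-Type` header are illustrative and not taken from the contract; the authoritative shape is in `api/internal-openapi.yaml`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// loginCodeDelivery mirrors the three documented request body fields.
type loginCodeDelivery struct {
	Email  string `json:"email"`
	Code   string `json:"code"`
	Locale string `json:"locale"`
}

// newLoginCodeRequest builds the POST request without sending it.
// baseURL is a hypothetical internal address of the mail process.
func newLoginCodeRequest(baseURL, idempotencyKey string, body loginCodeDelivery) (*http.Request, error) {
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	url := baseURL + "/api/v1/internal/login-code-deliveries"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json") // assumption
	req.Header.Set("Idempotency-Key", idempotencyKey)  // required by the contract
	return req, nil
}

func main() {
	req, err := newLoginCodeRequest("http://mail.internal:8080", "login-7f3a", loginCodeDelivery{
		Email:  "user@example.com",
		Code:   "123456",
		Locale: "en",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Idempotency-Key"))
}
```
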
Stable success outcomes:

- `sent`
- `suppressed`

Important semantics:

- `sent` means the request was durably accepted into the internal
  mail-delivery pipeline
- `sent` does not mean that SMTP delivery has already completed
- new durable auth deliveries surface as:
  - `queued` in `MAIL_SMTP_MODE=smtp`
  - `suppressed` in `MAIL_SMTP_MODE=stub`
- duplicate replays with the same normalized request return the same stable
  outcome
- mismatched replays on the same `(source, idempotency_key)` return
  `409 conflict`

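
The replay rules can be modeled with an in-memory map keyed by `(source, idempotency_key)`. This is only a sketch of the semantics: the real service keeps idempotency records in Redis, and the `store` type, the SHA-256 fingerprint, and the normalization shown here (trimmed, lowercased e-mail) are assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// errConflict stands in for the documented `409 conflict` response.
var errConflict = errors.New("409 conflict")

type key struct{ source, idempotencyKey string }

type record struct{ fingerprint, outcome string }

// store is a hypothetical in-memory stand-in for the durable idempotency records.
type store struct{ records map[key]record }

// fingerprint hashes a normalized request. The normalization is an assumption.
func fingerprint(email, code, locale string) string {
	norm := strings.ToLower(strings.TrimSpace(email)) + "\x00" + code + "\x00" + locale
	sum := sha256.Sum256([]byte(norm))
	return hex.EncodeToString(sum[:])
}

// accept returns the stored outcome for a matching replay, errConflict for a
// mismatched replay, and records newOutcome for a first-time request.
func (s *store) accept(source, idempotencyKey, fp, newOutcome string) (string, error) {
	k := key{source, idempotencyKey}
	if prev, ok := s.records[k]; ok {
		if prev.fingerprint != fp {
			return "", errConflict
		}
		return prev.outcome, nil
	}
	s.records[k] = record{fingerprint: fp, outcome: newOutcome}
	return newOutcome, nil
}

func main() {
	s := &store{records: map[key]record{}}
	fp := fingerprint("User@Example.com ", "123456", "en")

	first, _ := s.accept("authsession", "k1", fp, "sent")
	replay, _ := s.accept("authsession", "k1", fingerprint("user@example.com", "123456", "en"), "sent")
	_, err := s.accept("authsession", "k1", fingerprint("user@example.com", "999999", "en"), "sent")

	fmt.Println(first, replay, err)
}
```
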
### 2. Async generic command intake

Ingress stream:

- `mail:delivery_commands`

Stable envelope fields:

- `delivery_id`
- `source`
- `payload_mode`
- `idempotency_key`
- `request_id`
- `trace_id`
- `payload_json`

Contract rules:

- async `source` is fixed to `notification`
- supported `payload_mode` values are `rendered` and `template`
- `request_id` and `trace_id` are observability-only metadata and do not
  participate in idempotency fingerprinting
- malformed commands are metered, logged, and recorded as dedicated
  malformed-command entries
- malformed commands do not create a durable delivery record
- stream offset advances only after durable acceptance or durable
  malformed-command recording

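
The envelope rules above can be expressed as a small validation function over the stream entry fields. This is a sketch, not the consumer's actual code: `validateCommand` is a hypothetical name, treating `request_id` and `trace_id` as optional is an assumption, and so is requiring `payload_json` to decode as a JSON object. The payload contents in the example are illustrative; the authoritative schema is in `api/delivery-commands-asyncapi.yaml`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validateCommand checks one stream entry against the documented contract rules.
// A non-nil error would route the entry to malformed-command recording.
func validateCommand(fields map[string]string) error {
	// request_id and trace_id are observability-only, so they are not required here (assumption).
	for _, name := range []string{"delivery_id", "source", "payload_mode", "idempotency_key", "payload_json"} {
		if fields[name] == "" {
			return fmt.Errorf("missing field %q", name)
		}
	}
	if fields["source"] != "notification" {
		return fmt.Errorf("unsupported source %q", fields["source"])
	}
	switch fields["payload_mode"] {
	case "rendered", "template":
	default:
		return fmt.Errorf("unsupported payload_mode %q", fields["payload_mode"])
	}
	var payload map[string]json.RawMessage
	if err := json.Unmarshal([]byte(fields["payload_json"]), &payload); err != nil {
		return fmt.Errorf("payload_json: %w", err)
	}
	if payload == nil { // the JSON literal null decodes without error
		return fmt.Errorf("payload_json must be a JSON object")
	}
	return nil
}

func main() {
	cmd := map[string]string{
		"delivery_id":     "dlv-1",
		"source":          "notification",
		"payload_mode":    "rendered",
		"idempotency_key": "evt-42",
		"request_id":      "req-1",
		"trace_id":        "trace-1",
		"payload_json":    `{"subject":"Hello","text_body":"Hi there"}`, // illustrative payload
	}
	fmt.Println(validateCommand(cmd))

	cmd["source"] = "authsession"
	fmt.Println(validateCommand(cmd))
}
```
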
### 3. Trusted operator REST

Routes:

- `GET /api/v1/internal/deliveries`
- `GET /api/v1/internal/deliveries/{delivery_id}`
- `GET /api/v1/internal/deliveries/{delivery_id}/attempts`
- `POST /api/v1/internal/deliveries/{delivery_id}/resend`

List filters:

- `recipient`
- `status`
- `source`
- `template_id`
- `idempotency_key`
- `from_created_at_ms`
- `to_created_at_ms`
- `limit`
- `cursor`

Stable list behavior:

- ordering is `created_at_ms DESC`, then `delivery_id DESC`
- cursor is an opaque base64url encoding of `created_at_ms:delivery_id`
- `idempotency_key` without `source` matches across all stable sources

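
The list ordering and cursor format can be sketched from the server's point of view. Clients should keep treating the cursor as opaque. The `delivery` type and function names are hypothetical, and the choice of unpadded base64url (`base64.RawURLEncoding`) is an assumption, since the text does not say whether padding is used.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// delivery holds only the two fields that define list position.
type delivery struct {
	CreatedAtMs int64
	DeliveryID  string
}

// encodeCursor encodes "created_at_ms:delivery_id" as unpadded base64url (assumption).
func encodeCursor(d delivery) string {
	raw := strconv.FormatInt(d.CreatedAtMs, 10) + ":" + d.DeliveryID
	return base64.RawURLEncoding.EncodeToString([]byte(raw))
}

// decodeCursor reverses encodeCursor. It splits on the first colon, so a
// delivery_id that itself contains ':' survives the round trip.
func decodeCursor(cursor string) (delivery, error) {
	raw, err := base64.RawURLEncoding.DecodeString(cursor)
	if err != nil {
		return delivery{}, fmt.Errorf("decode cursor: %w", err)
	}
	ms, id, ok := strings.Cut(string(raw), ":")
	if !ok || id == "" {
		return delivery{}, fmt.Errorf("malformed cursor")
	}
	createdAt, err := strconv.ParseInt(ms, 10, 64)
	if err != nil {
		return delivery{}, fmt.Errorf("malformed cursor timestamp: %w", err)
	}
	return delivery{CreatedAtMs: createdAt, DeliveryID: id}, nil
}

// sortForListing orders by created_at_ms DESC, then delivery_id DESC.
func sortForListing(ds []delivery) {
	sort.Slice(ds, func(i, j int) bool {
		if ds[i].CreatedAtMs != ds[j].CreatedAtMs {
			return ds[i].CreatedAtMs > ds[j].CreatedAtMs
		}
		return ds[i].DeliveryID > ds[j].DeliveryID
	})
}

func main() {
	page := []delivery{{1000, "a"}, {2000, "a"}, {2000, "b"}}
	sortForListing(page)
	last := page[len(page)-1]

	next, err := decodeCursor(encodeCursor(last))
	if err != nil {
		panic(err)
	}
	fmt.Println(page, next)
}
```
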
Stable resend rules:

- resend is clone-only
- resend is allowed only for terminal delivery states
- resend creates a new delivery with `source=operator_resend`
- resend clones preserve audit history of the original instead of mutating it

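
A clone-only resend might look like the sketch below. Several details are assumptions rather than documented behavior: the `delivery` struct, the set of terminal states (taken here to be `sent`, `suppressed`, `failed`, and `dead_letter`), and the clone starting in `queued`.

```go
package main

import (
	"errors"
	"fmt"
)

// delivery is a hypothetical, heavily reduced delivery record.
type delivery struct {
	ID        string
	Source    string
	Status    string
	Recipient string
}

// terminalStatuses is an inferred set; the text only says "terminal delivery states".
var terminalStatuses = map[string]bool{
	"sent": true, "suppressed": true, "failed": true, "dead_letter": true,
}

var errNotTerminal = errors.New("resend is allowed only for terminal deliveries")

// resend returns a new delivery cloned from orig. orig is passed by value,
// so the original record is never mutated.
func resend(orig delivery, newID string) (delivery, error) {
	if !terminalStatuses[orig.Status] {
		return delivery{}, errNotTerminal
	}
	clone := orig
	clone.ID = newID
	clone.Source = "operator_resend"
	clone.Status = "queued" // assumption: the clone enters the pipeline like any new delivery
	return clone, nil
}

func main() {
	orig := delivery{ID: "dlv-1", Source: "notification", Status: "dead_letter", Recipient: "user@example.com"}
	clone, err := resend(orig, "dlv-2")
	fmt.Println(clone, err)
	fmt.Println(orig) // unchanged
}
```
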
## Delivery Model

### Source vocabulary

Stable `mail_delivery.source` values:

- `authsession`
- `notification`
- `operator_resend`

### Payload modes

Stable `mail_delivery.payload_mode` values:

- `rendered`
- `template`

Rules:

- `rendered` stores final `subject`, `text_body`, and optional `html_body`
- `template` stores `template_id`, canonical `locale`, and strict JSON-object
  `template_variables`
- raw attachment bodies are stored separately from the delivery audit record

### Delivery statuses

Stable operator-visible `mail_delivery.status` values:

- `queued`
- `rendered`
- `sending`
- `sent`
- `suppressed`
- `failed`
- `dead_letter`

Status meanings:

- `queued`: durable intake completed and the next attempt is scheduled
- `rendered`: template content has been materialized
- `sending`: one worker currently owns the active attempt
- `sent`: provider accepted the envelope
- `suppressed`: delivery was intentionally skipped as a successful business
  outcome
- `failed`: terminal failure without dead-letter escalation
- `dead_letter`: retry budget was exhausted and operator follow-up is required

Stable transition rules:

- newly accepted durable deliveries surface as `queued` or `suppressed`
- `queued -> rendered` is used only for `payload_mode=template`
- `queued|rendered -> sending` happens on successful claim
- `sending -> sent|suppressed|failed|queued|dead_letter` depends on provider
  classification and retry policy

The internal type `delivery.StatusAccepted` still exists in code, but it is
not part of the stable public delivery-status vocabulary and is not emitted by
the current runtime.

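
The listed transition rules can be written down as a lookup table. The sketch encodes only the transitions stated above; the `Status` type, constant names, and `canTransition` are hypothetical and do not reflect the actual `delivery` package.

```go
package main

import "fmt"

// Status is a hypothetical type for the stable delivery-status vocabulary.
type Status string

const (
	StatusQueued     Status = "queued"
	StatusRendered   Status = "rendered"
	StatusSending    Status = "sending"
	StatusSent       Status = "sent"
	StatusSuppressed Status = "suppressed"
	StatusFailed     Status = "failed"
	StatusDeadLetter Status = "dead_letter"
)

// transitions lists, for each status, the statuses the rules above allow next.
var transitions = map[Status][]Status{
	StatusQueued:   {StatusRendered, StatusSending},
	StatusRendered: {StatusSending},
	StatusSending:  {StatusSent, StatusSuppressed, StatusFailed, StatusQueued, StatusDeadLetter},
}

// canTransition reports whether from -> to is allowed for the given payload mode.
func canTransition(from, to Status, payloadMode string) bool {
	// queued -> rendered is used only for template-mode deliveries.
	if from == StatusQueued && to == StatusRendered && payloadMode != "template" {
		return false
	}
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(StatusQueued, StatusRendered, "template")) // true
	fmt.Println(canTransition(StatusQueued, StatusRendered, "rendered")) // false
	fmt.Println(canTransition(StatusSending, StatusQueued, "rendered"))  // true: retry reschedule
	fmt.Println(canTransition(StatusSent, StatusSending, "rendered"))    // false: sent has no outgoing transitions
}
```
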
### Attempt statuses

Stable `mail_attempt.status` values:

- `scheduled`
- `in_progress`
- `render_failed`
- `provider_accepted`
- `provider_rejected`
- `transport_failed`
- `timed_out`

Rules:

- there is at most one active `in_progress` attempt per delivery
- `render_failed` means template rendering failed before provider execution
- `provider_accepted` ends the delivery as `sent`
- `provider_rejected` is used for:
  - provider-side suppression ending in `suppressed`
  - permanent provider failure ending in `failed`
- retryable paths are expressed through:
  - `transport_failed`
  - `timed_out`

## Template and Locale Policy

Template layout:

- `<template_id>/<locale>/subject.tmpl`
- `<template_id>/<locale>/text.tmpl`
- optional `<template_id>/<locale>/html.tmpl`

Required auth fallback files:

- `auth.login_code/en/subject.tmpl`
- `auth.login_code/en/text.tmpl`

Rendering rules:

- the process loads the full catalog at startup
- exact locale match is attempted first
- the only fallback locale is `en`
- there are no intermediate reductions such as `fr-CA -> fr -> en`
- `locale_fallback_used=true` is stored durably when fallback is applied
- subject and text use `text/template`
- optional HTML uses `html/template`
- missing required variables and template lookup failures are classified into
  stable render-failure codes

## Redis Logical Model

Primary keys:

- `mail:deliveries:<delivery_id>`
- `mail:attempts:<delivery_id>:<attempt_no>`
- `mail:idempotency:<source>:<idempotency_key>`
- `mail:dead_letters:<delivery_id>`
- `mail:delivery_payloads:<delivery_id>`
- `mail:malformed_commands:<stream_entry_id>`
- `mail:stream_offsets:<stream>`

Scheduling and ingress keys:

- `mail:delivery_commands`
- `mail:attempt_schedule`

Operator indexes:

- `mail:idx:recipient:<email>`
- `mail:idx:status:<status>`
- `mail:idx:source:<source>`
- `mail:idx:template:<template_id>`
- `mail:idx:idempotency:<source>:<idempotency_key>`
- `mail:idx:created_at`
- `mail:idx:malformed_command:created_at`

Storage rules:

- dynamic Redis key segments are base64url-encoded
- durable records are stored as strict JSON blobs
- timestamps are stored in Unix milliseconds
- raw attachment payloads are separated from audit metadata
- malformed async commands are stored idempotently by `stream_entry_id`

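
Encoding dynamic segments keeps characters such as `:` or `@` inside an identifier from being confused with key separators. The helpers below are a sketch of that rule for three of the key shapes; the function names are hypothetical, and the unpadded base64url variant is an assumption because the text does not specify padding.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// segment encodes one dynamic key segment as unpadded base64url (assumption).
func segment(s string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(s))
}

// deliveryKey builds mail:deliveries:<delivery_id>.
func deliveryKey(deliveryID string) string {
	return "mail:deliveries:" + segment(deliveryID)
}

// idempotencyKey builds mail:idempotency:<source>:<idempotency_key>.
func idempotencyKey(source, key string) string {
	return "mail:idempotency:" + segment(source) + ":" + segment(key)
}

// recipientIndexKey builds mail:idx:recipient:<email>.
func recipientIndexKey(email string) string {
	return "mail:idx:recipient:" + segment(email)
}

func main() {
	// A delivery_id containing ':' still yields exactly three colon-separated parts.
	fmt.Println(deliveryKey("a:b"))
	fmt.Println(idempotencyKey("notification", "evt:42"))
	fmt.Println(recipientIndexKey("user@example.com"))
}
```
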
Current fixed retentions:

- idempotency: `7d`
- deliveries and payload audit: `30d`
- attempts and dead letters: `90d`
- malformed commands: `90d`

## Provider, Retry, and Failure Policy

Provider modes:

- `stub`
- `smtp`

SMTP rules:

- outbound SMTP requires `STARTTLS`
- servers without `STARTTLS` support are treated as permanent failure
- SMTP authentication is enabled only when both username and password are set

Retry ladder:

- attempt `1 -> 2`: `1m`
- attempt `2 -> 3`: `5m`
- attempt `3 -> 4`: `30m`
- after attempt `4`: `dead_letter`

Failure handling:

- retryable provider failures become `transport_failed` or `timed_out`, then
  either reschedule or escalate to `dead_letter`
- permanent provider failures become `failed`
- render failures become `failed` with `render_failed`
- stale claimed work is recovered after `MAIL_SMTP_TIMEOUT + 30s`

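
The retry ladder and the stale-claim threshold reduce to a few lines of arithmetic. This is a sketch with hypothetical function names; it covers only retryable failures (`transport_failed`, `timed_out`) and assumes attempt numbers start at 1.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelays[i] is the wait between attempt i+1 and attempt i+2.
var retryDelays = []time.Duration{time.Minute, 5 * time.Minute, 30 * time.Minute}

// nextRetry returns the delay before the next attempt after a retryable
// failure of attempt failedAttemptNo. ok is false when no further attempt is
// scheduled, which for attempt 4 means escalation to dead_letter.
func nextRetry(failedAttemptNo int) (delay time.Duration, ok bool) {
	idx := failedAttemptNo - 1
	if idx < 0 || idx >= len(retryDelays) {
		return 0, false
	}
	return retryDelays[idx], true
}

// staleClaimAfter returns how long a claimed attempt may stay in progress
// before it is considered stale: MAIL_SMTP_TIMEOUT + 30s.
func staleClaimAfter(smtpTimeout time.Duration) time.Duration {
	return smtpTimeout + 30*time.Second
}

func main() {
	for attempt := 1; attempt <= 4; attempt++ {
		if delay, ok := nextRetry(attempt); ok {
			fmt.Printf("attempt %d failed: retry in %s\n", attempt, delay)
		} else {
			fmt.Printf("attempt %d failed: dead_letter\n", attempt)
		}
	}
	// With the default MAIL_SMTP_TIMEOUT of 15s the threshold is 45s.
	fmt.Println("stale claim after:", staleClaimAfter(15*time.Second))
}
```
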
## Observability

The runtime exports telemetry through configured OpenTelemetry exporters only.

Main signals:

- `mail.delivery.accepted_auth`
- `mail.delivery.accepted_generic`
- `mail.delivery.suppressed`
- `mail.delivery.status_transitions`
- `mail.attempt.outcomes`
- `mail.delivery.dead_letters`
- `mail.template.locale_fallback`
- `mail.attempt_schedule.depth`
- `mail.attempt_schedule.oldest_age_ms`
- `mail.provider.send.duration_ms`
- `mail.stream_commands.malformed`

Additional behavior:

- internal HTTP uses `otelhttp`
- Redis clients use `redisotel`
- structured logs include `otel_trace_id` and `otel_span_id` when available

## Verification

Relevant commands:

- `cd mail && go test ./...`
- `cd integration && go test ./authsessionmail/...`
- `cd integration && go test ./gatewayauthsessionmail/...`

Extended references:

- [Runtime and components](docs/runtime.md)
- [Main flows](docs/flows.md)
- [Configuration and contract examples](docs/examples.md)
- [Operator runbook](docs/runbook.md)