Mail Service

Mail Service is the internal e-mail delivery service of Galaxy.

Canonical contracts:

Purpose

Mail Service owns durable intake, rendering, execution, retry, audit, and operator recovery for outbound e-mail.

It does not decide whether a business event should become e-mail. That decision belongs to Notification Service.

Responsibility Boundaries

Mail Service is responsible for:

  • direct auth-code mail intake from Auth / Session Service
  • async generic mail intake from Notification Service
  • validation of recipient envelope, payload shape, locale, and attachments
  • deterministic template rendering for template-mode deliveries
  • provider execution through stub or smtp
  • retry scheduling, dead-letter escalation, and operator-visible audit state
  • trusted operator reads and resend by clone creation

Mail Service is not responsible for:

  • end-user authentication or authorization
  • notification preference ownership
  • deciding whether non-auth mail should be sent at all
  • direct calls from Geo Profile Service
  • hot-reloading templates or editing template catalog state at runtime

Cross-service routing rules:

  • Auth / Session Service -> Mail Service is synchronous trusted REST
  • Notification Service -> Mail Service is asynchronous Redis Streams
  • Geo Profile Service must route optional admin e-mail through Notification Service, not directly to Mail Service
  • auth-code delivery remains a direct Auth / Session Service -> Mail Service flow and does not pass through Notification Service

Runtime Surface

cmd/mail starts one internal-only process with:

  • one trusted internal HTTP listener on MAIL_INTERNAL_HTTP_ADDR
  • one async command consumer
  • one attempt scheduler
  • one attempt worker pool
  • one cleanup worker

The service has no public ingress and no dedicated admin listener.

Intentional runtime omissions:

  • no /healthz
  • no /readyz
  • no /metrics

Operational behavior:

  • startup performs bounded Redis connectivity checks and fails fast on invalid runtime configuration
  • the template catalog is parsed once at startup and kept immutable for the lifetime of the process
  • template changes require process restart
  • operator handlers execute under MAIL_OPERATOR_REQUEST_TIMEOUT

Configuration

Required for all starts:

  • MAIL_REDIS_ADDR

Primary configuration groups:

  • process and logging:
    • MAIL_SHUTDOWN_TIMEOUT
    • MAIL_LOG_LEVEL
  • internal HTTP:
    • MAIL_INTERNAL_HTTP_ADDR
    • MAIL_INTERNAL_HTTP_READ_HEADER_TIMEOUT
    • MAIL_INTERNAL_HTTP_READ_TIMEOUT
    • MAIL_INTERNAL_HTTP_IDLE_TIMEOUT
  • Redis connectivity:
    • MAIL_REDIS_USERNAME
    • MAIL_REDIS_PASSWORD
    • MAIL_REDIS_DB
    • MAIL_REDIS_TLS_ENABLED
    • MAIL_REDIS_OPERATION_TIMEOUT
    • MAIL_REDIS_COMMAND_STREAM
  • SMTP provider:
    • MAIL_SMTP_MODE=stub|smtp
    • MAIL_SMTP_ADDR
    • MAIL_SMTP_USERNAME
    • MAIL_SMTP_PASSWORD
    • MAIL_SMTP_FROM_EMAIL
    • MAIL_SMTP_FROM_NAME
    • MAIL_SMTP_TIMEOUT
    • MAIL_SMTP_INSECURE_SKIP_VERIFY
  • template catalog:
    • MAIL_TEMPLATE_DIR
  • worker and operator behavior:
    • MAIL_ATTEMPT_WORKER_CONCURRENCY
    • MAIL_STREAM_BLOCK_TIMEOUT
    • MAIL_OPERATOR_REQUEST_TIMEOUT
  • OpenTelemetry:
    • OTEL_SERVICE_NAME
    • OTEL_TRACES_EXPORTER
    • OTEL_METRICS_EXPORTER
    • OTEL_EXPORTER_OTLP_PROTOCOL
    • OTEL_EXPORTER_OTLP_TRACES_PROTOCOL
    • OTEL_EXPORTER_OTLP_METRICS_PROTOCOL
    • MAIL_OTEL_STDOUT_TRACES_ENABLED
    • MAIL_OTEL_STDOUT_METRICS_ENABLED

Defaults worth knowing:

  • MAIL_INTERNAL_HTTP_ADDR=:8080
  • MAIL_SMTP_MODE=stub
  • MAIL_SMTP_TIMEOUT=15s
  • MAIL_TEMPLATE_DIR=templates
  • MAIL_ATTEMPT_WORKER_CONCURRENCY=4
  • MAIL_STREAM_BLOCK_TIMEOUT=2s
  • MAIL_OPERATOR_REQUEST_TIMEOUT=5s
  • MAIL_SHUTDOWN_TIMEOUT=5s

Additional SMTP note:

  • MAIL_SMTP_INSECURE_SKIP_VERIFY defaults to false; enabling it is intended only for local self-signed SMTP capture or similar non-production environments
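
The required variable and the documented defaults could be applied at startup roughly as follows. This is a minimal sketch, not the service's actual loader: the Config struct, the loadConfig helper, the treatment of empty values as unset, and the use of Go duration syntax for timeouts are all assumptions, and only a subset of the variables is covered.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"time"
)

// Config holds an illustrative subset of the settings listed above.
// The struct and its field names are hypothetical.
type Config struct {
	RedisAddr         string
	InternalHTTPAddr  string
	SMTPMode          string
	SMTPTimeout       time.Duration
	WorkerConcurrency int
}

// loadConfig reads settings through lookup (os.LookupEnv in a real process)
// and applies the documented defaults. Empty values are treated as unset,
// which is an assumption of this sketch.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}
	cfg := Config{
		RedisAddr:        get("MAIL_REDIS_ADDR", ""),
		InternalHTTPAddr: get("MAIL_INTERNAL_HTTP_ADDR", ":8080"),
		SMTPMode:         get("MAIL_SMTP_MODE", "stub"),
	}
	if cfg.RedisAddr == "" {
		return Config{}, errors.New("MAIL_REDIS_ADDR is required")
	}
	if cfg.SMTPMode != "stub" && cfg.SMTPMode != "smtp" {
		return Config{}, fmt.Errorf("MAIL_SMTP_MODE must be stub or smtp, got %q", cfg.SMTPMode)
	}
	var err error
	if cfg.SMTPTimeout, err = time.ParseDuration(get("MAIL_SMTP_TIMEOUT", "15s")); err != nil {
		return Config{}, fmt.Errorf("MAIL_SMTP_TIMEOUT: %w", err)
	}
	if cfg.WorkerConcurrency, err = strconv.Atoi(get("MAIL_ATTEMPT_WORKER_CONCURRENCY", "4")); err != nil {
		return Config{}, fmt.Errorf("MAIL_ATTEMPT_WORKER_CONCURRENCY: %w", err)
	}
	return cfg, nil
}

func main() {
	// A fake environment keeps the example deterministic.
	env := map[string]string{"MAIL_REDIS_ADDR": "127.0.0.1:6379"}
	cfg, err := loadConfig(func(k string) (string, bool) { v, ok := env[k]; return v, ok })
	fmt.Println(cfg, err)
}
```

Passing the lookup function instead of calling os.LookupEnv directly keeps the fail-fast validation testable without touching the process environment.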

Current implementation caveats:

  • MAIL_REDIS_COMMAND_STREAM is honored by the async command consumer
  • MAIL_REDIS_ATTEMPT_SCHEDULE_KEY and MAIL_REDIS_DEAD_LETTER_PREFIX are parsed but not yet honored: the Redis adapters still use the fixed keys mail:attempt_schedule and mail:dead_letters:<delivery_id>
  • MAIL_IDEMPOTENCY_TTL, MAIL_DELIVERY_TTL, and MAIL_ATTEMPT_TTL are parsed but not yet honored: the Redis adapters still enforce fixed retentions of 7d, 30d, and 90d, respectively

Stable Input Contracts

1. Auth delivery REST

Route:

  • POST /api/v1/internal/login-code-deliveries

Headers:

  • required Idempotency-Key

Request body:

  • email
  • code
  • locale

Stable success outcomes:

  • sent
  • suppressed

Important semantics:

  • sent means the request was durably accepted into the internal mail-delivery pipeline
  • sent does not mean that SMTP delivery has already completed
  • new durable auth deliveries are recorded with delivery status:
    • queued in MAIL_SMTP_MODE=smtp
    • suppressed in MAIL_SMTP_MODE=stub
  • duplicate replays with the same normalized request return the same stable outcome
  • mismatched replays on the same (source, idempotency_key) return 409 conflict
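
A caller such as Auth / Session Service might build the request like this. The route, the Idempotency-Key header, and the three body fields come from the contract above; the base URL, the helper name, the JSON encoding with a Content-Type header, and the sample values are assumptions for illustration. Response parsing is omitted because the response body shape is not specified here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// loginCodeDelivery mirrors the documented request body fields.
type loginCodeDelivery struct {
	Email  string `json:"email"`
	Code   string `json:"code"`
	Locale string `json:"locale"`
}

// newLoginCodeRequest builds the POST request for one auth-code delivery.
// The helper itself is hypothetical; reusing the same idempotencyKey with the
// same body is what makes a retry a safe duplicate replay.
func newLoginCodeRequest(baseURL, idempotencyKey string, body loginCodeDelivery) (*http.Request, error) {
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	url := baseURL + "/api/v1/internal/login-code-deliveries"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json") // assumed encoding
	req.Header.Set("Idempotency-Key", idempotencyKey)
	return req, nil
}

func main() {
	req, err := newLoginCodeRequest("http://mail.internal:8080", "login-7f3a", loginCodeDelivery{
		Email: "player@example.com", Code: "123456", Locale: "en",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Idempotency-Key"))
}
```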

2. Async generic command intake

Ingress stream:

  • mail:delivery_commands

Stable envelope fields:

  • delivery_id
  • source
  • payload_mode
  • idempotency_key
  • requested_at_ms
  • request_id
  • trace_id
  • payload_json
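
On the publisher side, the envelope could be modeled as a struct that flattens into stream field/value pairs. This sketch assumes each envelope field is written as its own stream field with a string value; the Command type and StreamFields method are hypothetical, and the payload_json schema is not defined in this document, so an empty object stands in for it.

```go
package main

import (
	"fmt"
	"strconv"
)

// Command carries the stable envelope fields of one stream entry.
type Command struct {
	DeliveryID     string
	Source         string
	PayloadMode    string
	IdempotencyKey string
	RequestedAtMS  int64 // publisher-side request time, Unix milliseconds
	RequestID      string
	TraceID        string
	PayloadJSON    string
}

// StreamFields flattens the command into the field/value pairs that a
// Redis XADD call would carry.
func (c Command) StreamFields() map[string]string {
	return map[string]string{
		"delivery_id":     c.DeliveryID,
		"source":          c.Source,
		"payload_mode":    c.PayloadMode,
		"idempotency_key": c.IdempotencyKey,
		"requested_at_ms": strconv.FormatInt(c.RequestedAtMS, 10),
		"request_id":      c.RequestID,
		"trace_id":        c.TraceID,
		"payload_json":    c.PayloadJSON,
	}
}

func main() {
	cmd := Command{
		DeliveryID:     "d-1",
		Source:         "notification",
		PayloadMode:    "template",
		IdempotencyKey: "turn-ready-42",
		RequestedAtMS:  1776840585000,
		PayloadJSON:    "{}", // placeholder; the real schema is not shown here
	}
	for k, v := range cmd.StreamFields() {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```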

Contract rules:

  • async source is fixed to notification
  • supported payload_mode values are rendered and template
  • Notification Service uses only payload_mode=template for notification-generated mail, even though the generic async contract supports both rendered and template
  • notification-owned template_id values are identical to the notification_type vocabulary, for example game.turn.ready and lobby.membership.approved
  • the real Notification Service -> Mail Service integration suite verifies template-mode handoff for notification-owned mail
  • requested_at_ms stores the publisher-side original request timestamp in Unix milliseconds
  • request_id and trace_id are observability-only metadata and do not participate in idempotency fingerprinting
  • malformed commands are metered, logged, and recorded as dedicated malformed-command entries
  • malformed commands do not create a durable delivery record
  • stream offset advances only after durable acceptance or durable malformed-command recording
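
The source and payload_mode rules, and the exclusion of request_id and trace_id from idempotency fingerprinting, can be sketched as below. Which fields actually feed the fingerprint is not stated above, so hashing source, payload_mode, and payload_json is an assumption, as are the SHA-256 choice and the function names. Payload normalization is skipped.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// command holds the envelope fields this sketch needs.
type command struct {
	Source, PayloadMode, IdempotencyKey string
	RequestID, TraceID                  string
	PayloadJSON                         string
}

// validate enforces the documented source and payload_mode rules.
func validate(c command) error {
	if c.Source != "notification" {
		return errors.New("async source must be notification")
	}
	if c.PayloadMode != "rendered" && c.PayloadMode != "template" {
		return fmt.Errorf("unsupported payload_mode %q", c.PayloadMode)
	}
	if c.IdempotencyKey == "" {
		return errors.New("idempotency_key is required")
	}
	return nil
}

// fingerprint hashes only the fields assumed to define request identity.
// RequestID and TraceID are deliberately left out, so replays that differ
// only in observability metadata compare equal.
func fingerprint(c command) string {
	h := sha256.New()
	for _, part := range []string{c.Source, c.PayloadMode, c.PayloadJSON} {
		// Length-prefix each part so adjacent fields cannot run together.
		fmt.Fprintf(h, "%d:%s;", len(part), part)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := command{Source: "notification", PayloadMode: "template", IdempotencyKey: "k", RequestID: "r1", PayloadJSON: "{}"}
	b := a
	b.RequestID, b.TraceID = "r2", "t2"
	fmt.Println(validate(a) == nil, fingerprint(a) == fingerprint(b))
}
```

A replay whose fingerprint matches the stored one for the same (source, idempotency_key) would be treated as a duplicate; a different fingerprint corresponds to a mismatched replay.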

3. Trusted operator REST

Routes:

  • GET /api/v1/internal/deliveries
  • GET /api/v1/internal/deliveries/{delivery_id}
  • GET /api/v1/internal/deliveries/{delivery_id}/attempts
  • POST /api/v1/internal/deliveries/{delivery_id}/resend

List filters:

  • recipient
  • status
  • source
  • template_id
  • idempotency_key
  • from_created_at_ms
  • to_created_at_ms
  • limit
  • cursor

Stable list behavior:

  • ordering is created_at_ms DESC, then delivery_id DESC
  • cursor is an opaque base64url encoding of created_at_ms:delivery_id
  • idempotency_key without source matches across all stable sources
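
The cursor format above can be sketched as an encode/decode pair. Unpadded base64url (base64.RawURLEncoding) is an assumption, since padding is not specified here, and the function names are hypothetical. Clients should still treat the cursor as opaque.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// encodeCursor packs the sort position of the last returned row.
func encodeCursor(createdAtMS int64, deliveryID string) string {
	raw := strconv.FormatInt(createdAtMS, 10) + ":" + deliveryID
	return base64.RawURLEncoding.EncodeToString([]byte(raw))
}

// decodeCursor reverses encodeCursor. It splits on the first colon only, so a
// delivery_id that itself contains colons survives the round trip.
func decodeCursor(cursor string) (int64, string, error) {
	raw, err := base64.RawURLEncoding.DecodeString(cursor)
	if err != nil {
		return 0, "", fmt.Errorf("invalid cursor: %w", err)
	}
	ms, id, ok := strings.Cut(string(raw), ":")
	if !ok || id == "" {
		return 0, "", errors.New("invalid cursor: missing delivery_id")
	}
	createdAtMS, err := strconv.ParseInt(ms, 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("invalid cursor: %w", err)
	}
	return createdAtMS, id, nil
}

func main() {
	c := encodeCursor(1776840585000, "d-42")
	ms, id, err := decodeCursor(c)
	fmt.Println(c, ms, id, err)
}
```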

Stable resend rules:

  • resend is clone-only
  • resend is allowed only for terminal delivery states
  • resend creates a new delivery with source=operator_resend
  • resend preserves the audit history of the original delivery instead of mutating it
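
The resend rules can be sketched as a pure function over a delivery record. The Delivery struct, the helper name, and the choice of queued as the clone's initial status are assumptions; the terminal set (sent, suppressed, failed, dead_letter) is inferred from the status meanings in this document, not quoted from code.

```go
package main

import "fmt"

// Delivery is a trimmed, hypothetical view of a delivery record.
type Delivery struct {
	ID        string
	Source    string
	Status    string
	Recipient string
}

// terminalStatuses is the assumed set of states from which resend is allowed.
var terminalStatuses = map[string]bool{
	"sent": true, "suppressed": true, "failed": true, "dead_letter": true,
}

// cloneForResend returns a new delivery and leaves the original untouched,
// because orig is passed and copied by value.
func cloneForResend(orig Delivery, newID string) (Delivery, error) {
	if !terminalStatuses[orig.Status] {
		return Delivery{}, fmt.Errorf("delivery %s is %s; resend requires a terminal state", orig.ID, orig.Status)
	}
	clone := orig
	clone.ID = newID
	clone.Source = "operator_resend"
	clone.Status = "queued" // assumed initial status of the clone
	return clone, nil
}

func main() {
	orig := Delivery{ID: "d-1", Source: "notification", Status: "dead_letter", Recipient: "player@example.com"}
	clone, err := cloneForResend(orig, "d-2")
	fmt.Println(orig.Status, clone.Source, clone.Status, err)
}
```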

Delivery Model

Source vocabulary

Stable mail_delivery.source values:

  • authsession
  • notification
  • operator_resend

Payload modes

Stable mail_delivery.payload_mode values:

  • rendered
  • template

Rules:

  • rendered stores final subject, text_body, and optional html_body
  • template stores template_id, canonical locale, and strict JSON-object template_variables
  • raw attachment bodies are stored separately from the delivery audit record

Delivery statuses

Stable operator-visible mail_delivery.status values:

  • queued
  • rendered
  • sending
  • sent
  • suppressed
  • failed
  • dead_letter

Status meanings:

  • queued: durable intake completed and the next attempt is scheduled
  • rendered: template content has been materialized
  • sending: one worker currently owns the active attempt
  • sent: provider accepted the envelope
  • suppressed: delivery was intentionally skipped as a successful business outcome
  • failed: terminal failure without dead-letter escalation
  • dead_letter: retry budget was exhausted and operator follow-up is required

Stable transition rules:

  • newly accepted durable deliveries surface as queued or suppressed
  • queued -> rendered is used only for payload_mode=template
  • queued|rendered -> sending happens on successful claim
  • sending -> sent|suppressed|failed|queued|dead_letter depends on provider classification and retry policy
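
The transition rules above translate directly into a small guard function. This sketch encodes only the listed rules; the function name is hypothetical, and any transitions the runtime performs beyond this list are not modeled.

```go
package main

import "fmt"

// canTransition reports whether a delivery may move from one status to
// another under the listed transition rules. payloadMode matters only for
// queued -> rendered, which is reserved for template-mode deliveries.
func canTransition(from, to, payloadMode string) bool {
	switch from {
	case "queued":
		return to == "sending" || (to == "rendered" && payloadMode == "template")
	case "rendered":
		return to == "sending"
	case "sending":
		switch to {
		case "sent", "suppressed", "failed", "queued", "dead_letter":
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("queued", "rendered", "template")) // allowed
	fmt.Println(canTransition("queued", "rendered", "rendered")) // rejected
	fmt.Println(canTransition("sent", "queued", "template"))     // rejected
}
```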

The internal type delivery.StatusAccepted still exists in code, but it is not part of the stable public delivery-status vocabulary and is not emitted by the current runtime.

Attempt statuses

Stable mail_attempt.status values:

  • scheduled
  • in_progress
  • render_failed
  • provider_accepted
  • provider_rejected
  • transport_failed
  • timed_out

Rules:

  • there is at most one active in_progress attempt per delivery
  • render_failed means template rendering failed before provider execution
  • provider_accepted ends the delivery as sent
  • provider_rejected is used for:
    • provider-side suppression ending in suppressed
    • permanent provider failure ending in failed
  • retryable paths are expressed through:
    • transport_failed
    • timed_out

Template and Locale Policy

Template layout:

  • <template_id>/<locale>/subject.tmpl
  • <template_id>/<locale>/text.tmpl
  • optional <template_id>/<locale>/html.tmpl

Required auth fallback files:

  • auth.login_code/en/subject.tmpl
  • auth.login_code/en/text.tmpl

Notification-owned English template directories are frozen by ../notification/README.md and the service-local Notification Service docs. auth.login_code remains the required auth template family for the direct Auth / Session Service -> Mail Service flow and is not part of the notification-owned template set.

Rendering rules:

  • the process loads the full catalog at startup
  • exact locale match is attempted first
  • the only fallback locale is en
  • there are no intermediate reductions such as fr-CA -> fr -> en
  • locale_fallback_used=true is stored durably when fallback is applied
  • subject and text use text/template
  • optional HTML uses html/template
  • missing required variables and template lookup failures are classified into stable render-failure codes
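
The lookup and fallback rules can be sketched with text/template. The catalog type, the error values, the sample subject text, and the use of the missingkey=error option to detect missing variables are assumptions; the stable render-failure codes themselves are not listed in this document and are not reproduced here.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"text/template"
)

// catalog maps template_id -> locale -> parsed subject template.
type catalog map[string]map[string]*template.Template

var (
	errTemplateNotFound = errors.New("template not found")
	errLocaleNotFound   = errors.New("no template for locale or en fallback")
)

// resolve tries the exact locale first and falls back to en only.
// There is no intermediate reduction such as fr-CA -> fr.
func (c catalog) resolve(templateID, locale string) (*template.Template, bool, error) {
	locales, ok := c[templateID]
	if !ok {
		return nil, false, errTemplateNotFound
	}
	if t, ok := locales[locale]; ok {
		return t, false, nil
	}
	if t, ok := locales["en"]; ok {
		return t, true, nil
	}
	return nil, false, errLocaleNotFound
}

// renderSubject returns the rendered subject and whether the en fallback was used.
func renderSubject(c catalog, templateID, locale string, vars map[string]any) (string, bool, error) {
	t, fallbackUsed, err := c.resolve(templateID, locale)
	if err != nil {
		return "", false, err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, vars); err != nil {
		return "", fallbackUsed, fmt.Errorf("render failed: %w", err)
	}
	return buf.String(), fallbackUsed, nil
}

// demoCatalog builds an in-memory catalog with made-up subject text.
func demoCatalog() catalog {
	t := template.Must(template.New("subject").
		Option("missingkey=error"). // a missing map key becomes an execution error
		Parse("Your Galaxy login code is {{.code}}"))
	return catalog{"auth.login_code": {"en": t}}
}

func main() {
	subject, fallback, err := renderSubject(demoCatalog(), "auth.login_code", "fr-CA", map[string]any{"code": "123456"})
	fmt.Println(subject, fallback, err)
}
```

The returned fallback flag is the value that would be persisted as locale_fallback_used.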

Redis Logical Model

Primary keys:

  • mail:deliveries:<delivery_id>
  • mail:attempts:<delivery_id>:<attempt_no>
  • mail:idempotency:<source>:<idempotency_key>
  • mail:dead_letters:<delivery_id>
  • mail:delivery_payloads:<delivery_id>
  • mail:malformed_commands:<stream_entry_id>
  • mail:stream_offsets:<stream>

Scheduling and ingress keys:

  • mail:delivery_commands
  • mail:attempt_schedule

Operator indexes:

  • mail:idx:recipient:<email>
  • mail:idx:status:<status>
  • mail:idx:source:<source>
  • mail:idx:template:<template_id>
  • mail:idx:idempotency:<source>:<idempotency_key>
  • mail:idx:created_at
  • mail:idx:malformed_command:created_at

Storage rules:

  • dynamic Redis key segments are base64url-encoded
  • durable records are stored as strict JSON blobs
  • timestamps are stored in Unix milliseconds
  • raw attachment payloads are separated from audit metadata
  • malformed async commands are stored idempotently by stream_entry_id
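
A key builder following the first storage rule might look like this. It assumes unpadded base64url and assumes every dynamic segment, including numeric ones, is encoded; the helper names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// encodeSegment makes an arbitrary value safe to embed between ":" separators.
func encodeSegment(s string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(s))
}

// redisKey joins a fixed prefix with base64url-encoded dynamic segments.
func redisKey(prefix string, dynamic ...string) string {
	parts := []string{prefix}
	for _, d := range dynamic {
		parts = append(parts, encodeSegment(d))
	}
	return strings.Join(parts, ":")
}

func main() {
	fmt.Println(redisKey("mail:deliveries", "d-42"))
	fmt.Println(redisKey("mail:idempotency", "notification", "turn-ready:42"))
	fmt.Println(redisKey("mail:idx:created_at")) // no dynamic segment
}
```

Encoding the segments means a value containing ":" (an idempotency key, for example) cannot be confused with a key separator.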

Current fixed retentions:

  • idempotency: 7d
  • deliveries and payload audit: 30d
  • attempts and dead letters: 90d
  • malformed commands: 90d

Provider, Retry, and Failure Policy

Provider modes:

  • stub
  • smtp

SMTP rules:

  • outbound SMTP requires STARTTLS
  • servers without STARTTLS support are treated as a permanent failure
  • SMTP authentication is enabled only when both username and password are set
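
With the standard net/smtp package, the SMTP rules could be checked roughly as follows. The interface, the helper names, and the choice of the PLAIN mechanism are assumptions; staticExtensions exists only so the example runs without a live server.

```go
package main

import (
	"errors"
	"fmt"
	"net/smtp"
)

// extensionChecker is the one method of *smtp.Client this sketch needs.
type extensionChecker interface {
	Extension(ext string) (bool, string)
}

// Compile-time check that *smtp.Client satisfies the interface.
var _ extensionChecker = (*smtp.Client)(nil)

var errNoSTARTTLS = errors.New("smtp server does not advertise STARTTLS: permanent failure")

// requireSTARTTLS fails when the server does not offer STARTTLS.
func requireSTARTTLS(c extensionChecker) error {
	if ok, _ := c.Extension("STARTTLS"); !ok {
		return errNoSTARTTLS
	}
	return nil
}

// authFor returns nil (no authentication) unless both credentials are set.
func authFor(host, username, password string) smtp.Auth {
	if username == "" || password == "" {
		return nil
	}
	return smtp.PlainAuth("", username, password, host) // PLAIN is assumed
}

// staticExtensions is a stand-in for a connected client in this example.
type staticExtensions map[string]bool

func (s staticExtensions) Extension(ext string) (bool, string) { return s[ext], "" }

func main() {
	fmt.Println(requireSTARTTLS(staticExtensions{"STARTTLS": true}))
	fmt.Println(requireSTARTTLS(staticExtensions{}))
	fmt.Println(authFor("smtp.example.com", "mailer", "") == nil)
}
```

In a real send path, a successful requireSTARTTLS check would be followed by the client's StartTLS call before any authentication or message data is sent.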

Retry ladder:

  • attempt 1 -> 2: 1m
  • attempt 2 -> 3: 5m
  • attempt 3 -> 4: 30m
  • after attempt 4: dead_letter

Failure handling:

  • retryable provider failures become transport_failed or timed_out, then either reschedule or escalate to dead_letter
  • permanent provider failures become failed
  • render failures end the delivery as failed, with the attempt recorded as render_failed
  • stale claimed work is recovered after MAIL_SMTP_TIMEOUT + 30s
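
The retry ladder and the stale-claim rule reduce to a lookup table and one addition. A minimal sketch, with hypothetical names; attempt numbers are assumed to be 1-based, and a rescheduled delivery is assumed to return to queued.

```go
package main

import (
	"fmt"
	"time"
)

// retryLadder[i] is the delay between attempt i+1 and attempt i+2.
var retryLadder = []time.Duration{time.Minute, 5 * time.Minute, 30 * time.Minute}

// staleClaimGrace is added to the SMTP timeout before claimed work is recovered.
const staleClaimGrace = 30 * time.Second

// afterRetryableFailure decides what follows a transport_failed or timed_out
// attempt. It returns the next delivery status and, when rescheduling, the
// delay before the next attempt. Attempt numbers outside 1..3 have no ladder
// entry and escalate to dead_letter.
func afterRetryableFailure(attemptNo int) (string, time.Duration) {
	if attemptNo >= 1 && attemptNo <= len(retryLadder) {
		return "queued", retryLadder[attemptNo-1]
	}
	return "dead_letter", 0
}

// staleClaimAfter returns how long a claim may be held before it is
// considered stale and recovered.
func staleClaimAfter(smtpTimeout time.Duration) time.Duration {
	return smtpTimeout + staleClaimGrace
}

func main() {
	for attempt := 1; attempt <= 4; attempt++ {
		status, delay := afterRetryableFailure(attempt)
		fmt.Printf("attempt %d failed -> %s (retry in %s)\n", attempt, status, delay)
	}
	fmt.Println("stale after:", staleClaimAfter(15*time.Second))
}
```

With the default MAIL_SMTP_TIMEOUT of 15s, a claim would be treated as stale after 45s.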

Observability

The runtime exports telemetry through configured OpenTelemetry exporters only.

Main signals:

  • mail.delivery.accepted_auth
  • mail.delivery.accepted_generic
  • mail.delivery.suppressed
  • mail.delivery.status_transitions
  • mail.attempt.outcomes
  • mail.delivery.dead_letters
  • mail.template.locale_fallback
  • mail.attempt_schedule.depth
  • mail.attempt_schedule.oldest_age_ms
  • mail.provider.send.duration_ms
  • mail.stream_commands.malformed

Additional behavior:

  • internal HTTP uses otelhttp
  • Redis clients use redisotel
  • structured logs include otel_trace_id and otel_span_id when available

Verification

Relevant commands:

  • cd mail && go test ./...
  • cd integration && go test ./authsessionmail/...
  • cd integration && go test ./gatewayauthsessionmail/...

Extended references: