# Mail Service Implementation Plan

This plan has already been implemented and is kept here for historical reasons.
It should NOT be treated as the source of truth for service functionality.

## Summary

This plan describes the full v1 implementation path for `galaxy/mail`.
It is intentionally decision-complete: the implementer should not need to invent service boundaries, storage layout, contracts, or retry semantics while building the service.

The target outcome is one runnable internal service that:

- accepts auth-code mail synchronously over trusted REST
- consumes generic non-auth mail asynchronously from `Redis Streams`
- renders templates or accepts pre-rendered content
- delivers through SMTP or a deterministic stub
- retries bounded transient failures
- stores durable delivery audit state
- exposes trusted operator reads and resend controls

## Global Rules

- keep one logical delivery equal to one SMTP envelope
- keep `suppressed` separate from failure
- require explicit idempotency for every accepted command
- prefer deterministic Redis-backed scheduling over in-memory timers
- keep operator inspection possible without direct Redis access
- treat filesystem templates as the v1 source of truth
- keep public and trusted contracts explicit and versionable

## Target Runtime Layout

```text
mail/
├── cmd/
│   └── mail/
│       └── main.go
├── internal/
│   ├── app/
│   │   ├── app.go
│   │   ├── bootstrap.go
│   │   └── runtime.go
│   ├── config/
│   │   ├── config.go
│   │   ├── env.go
│   │   └── validation.go
│   ├── domain/
│   │   ├── delivery/
│   │   │   ├── model.go
│   │   │   ├── state.go
│   │   │   └── errors.go
│   │   ├── attempt/
│   │   │   ├── model.go
│   │   │   ├── state.go
│   │   │   └── policy.go
│   │   ├── idempotency/
│   │   │   └── model.go
│   │   ├── template/
│   │   │   ├── model.go
│   │   │   └── locale.go
│   │   └── common/
│   │       ├── email.go
│   │       ├── locale.go
│   │       ├── attachment.go
│   │       └── ids.go
│   ├── ports/
│   │   ├── deliverystore.go
│   │   ├── attemptstore.go
│   │   ├── idempotencystore.go
│   │   ├── commandsubscriber.go
│   │   ├── attemptscheduler.go
│   │   ├── templatecatalog.go
│   │   ├── provider.go
│   │   ├── clock.go
│   │   └── idgenerator.go
│   ├── service/
│   │   ├── acceptauthdelivery/
│   │   ├── acceptgenericdelivery/
│   │   ├── executeattempt/
│   │   ├── listdeliveries/
│   │   ├── getdelivery/
│   │   ├── listattempts/
│   │   └── resenddelivery/
│   ├── api/
│   │   ├── internalhttp/
│   │   └── streamcommand/
│   ├── adapters/
│   │   ├── redis/
│   │   ├── smtp/
│   │   ├── templates/
│   │   ├── stubprovider/
│   │   ├── clock/
│   │   └── id/
│   ├── worker/
│   │   ├── command_consumer.go
│   │   ├── scheduler.go
│   │   ├── attempt_worker.go
│   │   └── cleanup_worker.go
│   ├── observability/
│   │   ├── logging.go
│   │   ├── metrics.go
│   │   └── tracing.go
│   └── testkit/
│       ├── redis.go
│       ├── provider.go
│       ├── clock.go
│       ├── templates.go
│       └── commands.go
├── templates/
│   └── ...
├── docs/
│   ├── README.md
│   └── stage-01-vocabulary-and-ownership.md
├── README.md
└── PLAN.md
```

## Target Configuration

Planned environment variables:

- `MAIL_INTERNAL_HTTP_ADDR` with default `:8080`
- `MAIL_REDIS_ADDR` required
- `MAIL_REDIS_COMMAND_STREAM` with default `mail:delivery_commands`
- `MAIL_REDIS_ATTEMPT_SCHEDULE_KEY` with default `mail:attempt_schedule`
- `MAIL_REDIS_DEAD_LETTER_PREFIX` with default `mail:dead_letters:`
- `MAIL_SMTP_MODE=stub|smtp` with default `stub`
- `MAIL_SMTP_ADDR` required in `smtp` mode
- `MAIL_SMTP_USERNAME` optional
- `MAIL_SMTP_PASSWORD` optional
- `MAIL_SMTP_FROM_EMAIL` required in `smtp` mode
- `MAIL_SMTP_FROM_NAME` optional
- `MAIL_SMTP_TIMEOUT` with default `15s`
- `MAIL_TEMPLATE_DIR` with default `templates`
- `MAIL_ATTEMPT_WORKER_CONCURRENCY` with default `4`
- `MAIL_STREAM_BLOCK_TIMEOUT` with default `2s`
- `MAIL_OPERATOR_REQUEST_TIMEOUT` with default `5s`
- `MAIL_IDEMPOTENCY_TTL` with default `168h`
- `MAIL_DELIVERY_TTL` with default `720h`
- `MAIL_ATTEMPT_TTL` with default `2160h`

## ~~Stage 01.~~ Freeze Vocabulary and Ownership

Status: implemented.

### Goal

Freeze the service vocabulary and remove cross-service ambiguity before any implementation work starts.

### Tasks

- Freeze that `Mail Service` owns delivery acceptance, attempts, retry, suppression, audit, and resend.
- Freeze that `Notification Service` owns the business decision to request non-auth mail.
- Freeze that `Auth / Session Service` uses the dedicated auth REST contract.
- Freeze that `Geo Profile Service` routes optional admin mail through `Notification Service`, not directly to `Mail Service`.
- Freeze that operator APIs are part of v1, not a later add-on.

### Artifacts

- stable service README
- aligned architecture references
- list of accepted source values:
  - `authsession`
  - `notification`
  - `operator_resend`

### Exit Criteria

- no document still treats `Geo Profile Service` as a direct `Mail Service` caller
- no document claims that all `Mail Service` callers use the same transport

### Targeted Tests

- documentation review only

## ~~Stage 02.~~ Define the Domain Model and State Rules

Status: implemented.

### Goal

Describe the logical delivery entities and freeze their valid state transitions.

### Tasks

- Define `mail_delivery`, `mail_attempt`, `mail_idempotency_record`, `mail_template`, and `mail_dead_letter_entry`.
- Freeze delivery states:
  - `accepted`
  - `queued`
  - `rendered`
  - `sending`
  - `sent`
  - `suppressed`
  - `failed`
  - `dead_letter`
- Freeze attempt states:
  - `scheduled`
  - `in_progress`
  - `provider_accepted`
  - `provider_rejected`
  - `transport_failed`
  - `timed_out`
- Freeze resend as clone-only with immutable parent history.
- Freeze terminal-state resend eligibility:
  - `sent`
  - `suppressed`
  - `failed`
  - `dead_letter`

### Artifacts

- domain models
- state transition table
- resend eligibility rules

### Exit Criteria

- every use case can rely on one explicit state machine

### Targeted Tests

- unit tests for allowed and forbidden delivery transitions
- unit tests for resend eligibility

## ~~Stage 03.~~ Freeze the Redis Physical Model

Status: implemented.

### Goal

Lock the Redis layout so repository and scheduling adapters can be implemented without revisiting the data design.

### Tasks

- Freeze primary keys:
  - `mail:deliveries:`
  - `mail:attempts::`
  - `mail:idempotency::`
  - `mail:dead_letters:`
- Freeze scheduler and ingress keys:
  - `mail:delivery_commands`
  - `mail:attempt_schedule`
- Freeze search indexes:
  - `mail:idx:recipient:`
  - `mail:idx:status:`
  - `mail:idx:source:`
  - `mail:idx:template:`
  - `mail:idx:idempotency::`
  - `mail:idx:created_at`
- Freeze storage format:
  - canonical JSON blob in Redis string keys for delivery and attempt records
  - sorted-set indexes scored by `created_at_ms`
- Explicitly reject Redis storage for template contents in v1 because the template catalog is filesystem-backed.
- Freeze retention:
  - idempotency `7d`
  - delivery `30d`
  - attempts and dead letters `90d`
- Freeze atomic write boundaries:
  - reserve idempotency
  - store delivery
  - schedule first attempt
  - create resend clone

### Artifacts

- Redis key catalog
- atomicity notes for Lua or optimistic transaction usage
- retention and cleanup notes

### Exit Criteria

- the Redis adapters can be implemented without unresolved naming or transactional questions

### Targeted Tests

- repository tests for key naming
- atomicity tests for duplicate idempotency races
- cleanup tests for TTL-driven record expiry

## ~~Stage 04.~~ Freeze the Auth REST Contract

Status: implemented.

### Goal

Define the direct trusted contract from `Auth / Session Service`.

### Tasks

- Freeze route `POST /api/v1/internal/login-code-deliveries`.
- Freeze required `Idempotency-Key` header.
- Freeze body fields:
  - `email`
  - `code`
  - `locale`
- Freeze success outcomes:
  - `sent`
  - `suppressed`
- Freeze trusted error codes:
  - `invalid_request`
  - `internal_error`
  - `service_unavailable`
- Freeze the meaning of `sent` as durable acceptance into the mail pipeline, not immediate SMTP completion.
- Freeze auth-client behavior of no automatic retry on upstream or transport failures.

### Artifacts

- request/response DTOs
- handler contract notes
- error mapping table

### Exit Criteria

- the auth REST client and server can be built from the frozen contract

### Targeted Tests

- strict JSON decoding tests
- required header validation tests
- idempotent repeat request tests
- sent versus suppressed response tests

## ~~Stage 05.~~ Freeze the Async Generic Contract

Status: implemented.

### Goal

Define the exact `Redis Streams` command format used by `Notification Service`.

### Tasks

- Freeze the stream name `mail:delivery_commands`.
- Freeze required fields:
  - `delivery_id`
  - `source`
  - `payload_mode`
  - `idempotency_key`
  - `requested_at_ms`
  - `payload_json`
- Freeze optional fields:
  - `request_id`
  - `trace_id`
- Freeze that async `source` accepts only:
  - `notification`
- Freeze payload modes:
  - `rendered`
  - `template`
- Freeze the rendered payload shape with:
  - recipient envelope
  - `subject`
  - `text_body`
  - optional `html_body`
  - attachments
- Freeze the template payload shape with:
  - recipient envelope
  - `template_id`
  - `locale`
  - `variables`
  - attachments
- Freeze duplicate handling by `(source, idempotency_key)`.
- Freeze `request_id` and `trace_id` as tracing-only metadata excluded from the idempotency fingerprint.
- Freeze the malformed-command path into dedicated operator-visible `mail_malformed_command_entry` state outside `mail_delivery`.

### Artifacts

- stream field catalog
- typed stream command contract
- `AsyncAPI` specification
- `payload_json` schema notes
- malformed command handling rules

### Exit Criteria

- `Notification Service` can publish one command without needing a follow-up design round

### Targeted Tests

- strict stream-entry decoding tests
- duplicate idempotency tests
- malformed command recording-contract tests
- rendered and template payload acceptance tests

## ~~Stage 06.~~ Build the Runnable Service Skeleton

Status: implemented.

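
As orientation for the config loading and startup validation this stage covers, the following is a minimal sketch that reads a small subset of the variables from the Target Configuration section and applies their documented defaults. The `Config` type, `getenv` helper, `LoadConfig`, and `Validate` are hypothetical names chosen for illustration; the real `internal/config` package may be organized differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// Config holds an illustrative subset of the planned settings (hypothetical type).
type Config struct {
	HTTPAddr    string
	RedisAddr   string
	SMTPMode    string
	SMTPAddr    string
	SMTPTimeout time.Duration
}

// getenv returns the value of key, or def when the variable is unset or empty.
func getenv(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// LoadConfig reads the environment, applies the defaults listed in the plan,
// and validates the result.
func LoadConfig() (Config, error) {
	timeout, err := time.ParseDuration(getenv("MAIL_SMTP_TIMEOUT", "15s"))
	if err != nil {
		return Config{}, fmt.Errorf("MAIL_SMTP_TIMEOUT: %w", err)
	}
	cfg := Config{
		HTTPAddr:    getenv("MAIL_INTERNAL_HTTP_ADDR", ":8080"),
		RedisAddr:   os.Getenv("MAIL_REDIS_ADDR"),
		SMTPMode:    getenv("MAIL_SMTP_MODE", "stub"),
		SMTPAddr:    os.Getenv("MAIL_SMTP_ADDR"),
		SMTPTimeout: timeout,
	}
	return cfg, cfg.Validate()
}

// Validate enforces the "required" and "required in smtp mode" rules.
func (c Config) Validate() error {
	if c.RedisAddr == "" {
		return errors.New("MAIL_REDIS_ADDR is required")
	}
	switch c.SMTPMode {
	case "stub":
		// No SMTP settings are needed for the deterministic stub.
	case "smtp":
		if c.SMTPAddr == "" {
			return errors.New("MAIL_SMTP_ADDR is required in smtp mode")
		}
	default:
		return fmt.Errorf("MAIL_SMTP_MODE: unsupported value %q", c.SMTPMode)
	}
	return nil
}

func main() {
	cfg, err := LoadConfig()
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Validating once at startup keeps the "startup failure on invalid Redis config" test a plain unit test over `Validate`, with no Redis connection involved.
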
### Goal

Create one runnable internal process with config, Redis, HTTP server, and workers.

### Tasks

- Implement `cmd/mail`.
- Implement config loading and validation.
- Wire Redis client, template catalog, provider adapter, HTTP server, and workers.
- Add graceful shutdown across:
  - HTTP server
  - stream consumer
  - scheduler
  - attempt workers
  - cleanup worker
- Add startup validation for required Redis and provider config.

### Artifacts

- runnable `cmd/mail`
- bootstrap wiring
- graceful shutdown logic

### Exit Criteria

- the process starts and stops cleanly with valid config

### Targeted Tests

- startup with stub mode
- startup failure on invalid Redis config
- graceful shutdown without leaked goroutines

## ~~Stage 07.~~ Implement Auth Delivery Acceptance

Status: implemented.

### Goal

Accept auth-code deliveries synchronously and durably.

### Tasks

- Implement the auth acceptance use case.
- Validate `email`, `code`, `locale`, and `Idempotency-Key`.
- Classify explicit suppression without treating it as failure.
- Persist delivery, idempotency record, and first scheduled attempt atomically.
- Keep `suppressed` acceptance as the explicit exception that persists only delivery plus idempotency state without a first attempt.
- Return stable `sent` or `suppressed`.
- Add telemetry for accepted auth requests.
- Reject mismatched replays with the same idempotency key.

### Artifacts

- auth acceptance service
- internal HTTP handler
- DTO validation and error mapping

### Exit Criteria

- auth requests create one durable delivery or fail closed without partial state

### Targeted Tests

- valid request accepted as `sent`
- valid request accepted as `suppressed` without attempt scheduling
- duplicate identical request returns same result
- duplicate mismatched request is rejected
- Redis persistence failure surfaces `503 service_unavailable`

## ~~Stage 08.~~ Implement Async Generic Acceptance

Status: implemented.

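
The decode-and-validate step of this stage depends on the field catalog frozen in Stage 05. The sketch below shows one way a raw stream entry could be turned into a typed command. The `Command` type and `DecodeCommand` function are hypothetical, the entry is modelled as a plain `map[string]string` rather than any Redis client's own type, and rejecting unknown fields is an assumption about what "strict" decoding means.

```go
package main

import (
	"fmt"
	"strconv"
)

// Command is a hypothetical typed view of one stream entry.
type Command struct {
	DeliveryID     string
	Source         string
	PayloadMode    string
	IdempotencyKey string
	RequestedAtMs  int64
	PayloadJSON    string
	RequestID      string // optional, tracing only
	TraceID        string // optional, tracing only
}

// Field names taken from the Stage 05 contract.
var (
	requiredFields = []string{
		"delivery_id", "source", "payload_mode",
		"idempotency_key", "requested_at_ms", "payload_json",
	}
	optionalFields = []string{"request_id", "trace_id"}
)

// DecodeCommand validates a raw stream entry and converts it to a Command.
// Any error here would route the entry to the malformed-command path.
func DecodeCommand(fields map[string]string) (Command, error) {
	known := make(map[string]bool)
	for _, name := range requiredFields {
		if fields[name] == "" {
			return Command{}, fmt.Errorf("missing required field %q", name)
		}
		known[name] = true
	}
	for _, name := range optionalFields {
		known[name] = true
	}
	// Assumption: strict decoding rejects fields outside the catalog.
	for name := range fields {
		if !known[name] {
			return Command{}, fmt.Errorf("unknown field %q", name)
		}
	}
	if fields["source"] != "notification" {
		return Command{}, fmt.Errorf("unsupported source %q", fields["source"])
	}
	if m := fields["payload_mode"]; m != "rendered" && m != "template" {
		return Command{}, fmt.Errorf("unsupported payload_mode %q", m)
	}
	ms, err := strconv.ParseInt(fields["requested_at_ms"], 10, 64)
	if err != nil {
		return Command{}, fmt.Errorf("requested_at_ms: %w", err)
	}
	return Command{
		DeliveryID:     fields["delivery_id"],
		Source:         fields["source"],
		PayloadMode:    fields["payload_mode"],
		IdempotencyKey: fields["idempotency_key"],
		RequestedAtMs:  ms,
		PayloadJSON:    fields["payload_json"],
		RequestID:      fields["request_id"],
		TraceID:        fields["trace_id"],
	}, nil
}

func main() {
	cmd, err := DecodeCommand(map[string]string{
		"delivery_id":     "d-1",
		"source":          "notification",
		"payload_mode":    "template",
		"idempotency_key": "k-1",
		"requested_at_ms": "1700000000000",
		"payload_json":    "{}",
	})
	fmt.Println(cmd.DeliveryID, cmd.PayloadMode, err)
}
```
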
### Goal

Consume generic mail commands from `Redis Streams` and convert them into durable deliveries.

### Tasks

- Implement plain `XREAD`-based stream consumption.
- Decode and validate stream entries.
- Persist one delivery and schedule one first attempt atomically.
- Advance the consumer offset only after durable acceptance.
- Meter malformed entries and record them as operator-visible `mail_malformed_command_entry` state.
- Keep duplicate idempotency requests as no-op accepts.

### Artifacts

- stream consumer worker
- generic acceptance service
- malformed command recorder

### Exit Criteria

- valid commands are never lost after they are read from the stream

### Targeted Tests

- rendered command acceptance
- template command acceptance
- duplicate command no-op behavior
- malformed command recording
- consumer restart continuing from the correct offset

## ~~Stage 09.~~ Implement the Template Catalog and Rendering

Status: implemented.

### Goal

Provide deterministic rendering for template-mode deliveries.

### Tasks

- Implement filesystem-backed template discovery under `templates/`.
- Freeze directory layout as `//subject.tmpl`, `text.tmpl`, and optional `html.tmpl`.
- Implement locale validation and fallback to `en`.
- Record `locale_fallback_used`.
- Validate required variables before rendering.
- Reject unknown missing required variables deterministically.
- Add dedicated auth template family:
  - `auth.login_code`

### Artifacts

- template catalog adapter
- renderer
- auth template assets

### Exit Criteria

- template mode always produces one final deterministic subject/body bundle or one classified render failure

### Targeted Tests

- exact locale render
- unsupported locale fallback to `en`
- missing `en` fallback failure
- missing required variable failure
- deterministic render snapshots

## ~~Stage 10.~~ Implement the Provider Layer

Status: implemented.

### Goal

Provide concrete delivery adapters for SMTP and deterministic local testing.

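
To make the idea of one port with two adapters concrete, here is a minimal sketch of a provider interface with a scriptable stub. The four outcome values are the classifications this stage freezes; the `Provider`, `Message`, `StubProvider`, and `runScript` names are assumptions for illustration and are not taken from the actual `ports/provider.go`.

```go
package main

import (
	"context"
	"fmt"
)

// Outcome is the classified result of one provider call.
type Outcome string

const (
	Accepted         Outcome = "accepted"
	Suppressed       Outcome = "suppressed"
	TransientFailure Outcome = "transient_failure"
	PermanentFailure Outcome = "permanent_failure"
)

// Message is a deliberately small, hypothetical envelope.
type Message struct {
	To       string
	Subject  string
	TextBody string
}

// Provider is the port that both the SMTP and stub adapters would satisfy.
type Provider interface {
	Send(ctx context.Context, msg Message) Outcome
}

// StubProvider replays scripted outcomes in order and repeats the last one
// once the script is exhausted. It is not safe for concurrent use.
type StubProvider struct {
	script []Outcome
	calls  int
}

// NewStubProvider builds a stub; an empty script always yields Accepted.
func NewStubProvider(script ...Outcome) *StubProvider {
	return &StubProvider{script: script}
}

// Send ignores the message and returns the next scripted outcome.
func (s *StubProvider) Send(_ context.Context, _ Message) Outcome {
	if len(s.script) == 0 {
		return Accepted
	}
	i := s.calls
	if i >= len(s.script) {
		i = len(s.script) - 1
	}
	s.calls++
	return s.script[i]
}

// runScript sends n messages through p and collects the outcomes.
func runScript(p Provider, n int) []Outcome {
	out := make([]Outcome, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, p.Send(context.Background(), Message{To: "user@example.com"}))
	}
	return out
}

func main() {
	stub := NewStubProvider(TransientFailure, Accepted)
	fmt.Println(runScript(stub, 3))
}
```

A scripted stub of this shape lets retry-chain tests drive "transient failure, then success" deterministically without a real SMTP server.
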
### Tasks

- Freeze provider result classifications:
  - `accepted`
  - `suppressed`
  - `transient_failure`
  - `permanent_failure`
- Implement SMTP adapter with:
  - dial/connect
  - optional auth
  - envelope mapping
  - MIME body construction
  - inline attachment mapping
  - timeout classification
- Implement stub adapter with scriptable outcomes.
- Redact provider summaries before storing them in audit fields.

### Artifacts

- SMTP adapter
- stub provider adapter
- MIME builder helpers

### Exit Criteria

- one attempt can be executed against either adapter with stable classified outcomes

### Targeted Tests

- SMTP request construction tests
- attachment mapping tests
- timeout classification tests
- stub scripted outcome tests

## ~~Stage 11.~~ Implement the Attempt Scheduler and Workers

Status: implemented.

### Goal

Run due attempts exactly once per scheduled slot and apply retry policy.

### Tasks

- Implement `mail:attempt_schedule` claim logic.
- Enforce at most one active attempt per delivery.
- Execute provider calls through the attempt service.
- Schedule retries at:
  - `1m`
  - `5m`
  - `30m`
- Transition exhausted deliveries to `dead_letter`.
- Keep recoverable state across process restarts.
- Ensure claimed but unfinished work becomes visible again after worker crash recovery.

### Artifacts

- scheduler worker
- attempt worker
- retry planner
- dead-letter writer

### Exit Criteria

- the service survives restarts and resumes scheduled work without duplicate attempt ownership

### Targeted Tests

- immediate first attempt
- transient retry chain to success
- retry exhaustion to dead letter
- crash recovery of in-progress attempt ownership

## ~~Stage 12.~~ Implement the Operator API

Status: implemented.

### Goal

Provide trusted read and resend controls without direct Redis access.

### Tasks

- Implement delivery lookup by `delivery_id`.
- Implement filtered list with deterministic cursor pagination.
- Implement attempt history reads.
- Implement resend clone creation.
- Freeze cursor format as opaque base64 of `created_at_ms:delivery_id`.
- Reject resend for non-terminal statuses.

### Artifacts

- operator HTTP handlers
- list query DTOs
- resend service

### Exit Criteria

- operators can inspect and resend deliveries safely through the service API

### Targeted Tests

- list filtering by recipient, status, source, template, and idempotency key
- cursor pagination tests
- resend allowed for terminal states only
- resend creates a linked clone rather than mutating the original

## ~~Stage 13.~~ Add Observability and Runbook Coverage

Status: implemented.

### Goal

Make the service operable without reading the code.

### Tasks

- Add counters for:
  - accepted auth deliveries
  - accepted generic deliveries
  - suppressed deliveries
  - delivery statuses
  - attempt outcomes
  - dead letters
  - locale fallback
- Add gauges or histograms for:
  - scheduled depth
  - oldest scheduled age
  - SMTP latency
- Add structured logs with:
  - `delivery_id`
  - `source`
  - `template_id`
  - `attempt_no`
- Add traces around:
  - acceptance
  - rendering
  - provider send
  - resend
- Write operator runbook content for:
  - backlog growth
  - dead-letter spikes
  - repeated suppressions
  - SMTP auth or timeout failures
  - malformed stream commands

### Artifacts

- telemetry runtime
- logging helpers
- runbook section drafts

### Exit Criteria

- common failure modes are visible and actionable

### Targeted Tests

- metric emission tests
- log field presence tests
- trace smoke tests where practical

## ~~Stage 14.~~ Complete the Test Matrix

Status: implemented.

### Goal

Reach a safe verification baseline across unit, integration, and end-to-end scenarios.

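
Retry planning is one of the unit-test targets in this stage, and it is easiest to test when it is a pure function. The sketch below assumes the `1m`, `5m`, `30m` schedule from Stage 11 means one delay per failed attempt, measured from that failure, with the delivery moving to `dead_letter` once the list is exhausted. The plan does not spell out either point, and `NextRetry` is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelays mirrors the retry schedule listed in Stage 11.
var retryDelays = []time.Duration{
	time.Minute,
	5 * time.Minute,
	30 * time.Minute,
}

// NextRetry takes the 1-based number of the attempt that just failed
// transiently and returns the delay before the next attempt. The second
// result is false when no retries remain, which under this sketch means
// the delivery should transition to dead_letter.
func NextRetry(failedAttemptNo int) (time.Duration, bool) {
	idx := failedAttemptNo - 1
	if idx < 0 || idx >= len(retryDelays) {
		return 0, false
	}
	return retryDelays[idx], true
}

func main() {
	for attempt := 1; attempt <= 4; attempt++ {
		delay, ok := NextRetry(attempt)
		fmt.Printf("attempt %d failed: retry=%v delay=%s\n", attempt, ok, delay)
	}
}
```

Because the planner takes no clock and touches no storage, the retry-exhaustion scenario reduces to a table of attempt numbers and expected delays.
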
### Tasks

- Add unit tests for:
  - validation
  - state transitions
  - idempotency
  - rendering
  - provider classification
  - retry planning
- Add integration tests for:
  - auth REST to durable delivery
  - stream command to durable delivery
  - attempt execution against stub provider
  - operator API against Redis-backed state
- Add end-to-end scenarios for:
  - auth `sent`
  - auth `suppressed`
  - template locale fallback
  - transient retry to success
  - retry exhaustion to dead letter
  - duplicate idempotency key
  - resend clone
  - graceful shutdown with pending work

### Artifacts

- unit test suite
- integration harness
- end-to-end scenarios

### Exit Criteria

- the planned behavior is covered closely enough to refactor safely

### Targeted Tests

- execute the smallest relevant subset:
  - `go test ./mail/...`
  - focused integration packages once they exist

## ~~Stage 15.~~ Align Cross-Service Documentation

Status: implemented.

### Goal

Update existing documentation so the repository tells one coherent story about `Mail Service`.

### Tasks

- Update `ARCHITECTURE.md`:
  - direct auth mail is synchronous trusted REST
  - generic notification mail is asynchronous through `Notification Service`
  - clarify that durable acceptance may precede SMTP completion
- Update `geoprofile` docs:
  - remove direct `Geo Profile Service -> Mail Service`
  - route optional admin mail through `Notification Service`
- Update `authsession` docs:
  - clarify localized mail acceptance semantics
  - clarify that `sent` means accepted into the mail pipeline
- Update `gateway` docs:
  - document `Accept-Language` as the public auth locale source
  - keep JSON bodies unchanged
- Update `user` docs:
  - document the auth-provided preferred-language candidate rule for new-user creation

### Artifacts

- aligned service READMEs and docs
- aligned architecture narrative

### Exit Criteria

- no first-class document contradicts the new `Mail Service` model

### Targeted Tests

- documentation review
- contract-document sync review

## Final Acceptance Checklist

The implementation is complete only when all of the following hold:

- the process starts with Redis and stub provider config
- auth REST intake works with explicit idempotency
- async generic stream intake works with duplicate suppression
- template rendering and locale fallback are deterministic
- SMTP and stub providers both work through the same port
- retries and dead-letter flow operate after restarts
- operator reads and resend clone work
- metrics, logs, and traces cover the main failure modes
- repository documentation is aligned with the final service model
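
For the operator-read item in the checklist, the opaque cursor frozen in Stage 12 (base64 of `created_at_ms:delivery_id`) can be sketched as a small encode/decode pair. The plan only says "base64"; the unpadded URL-safe variant used here is an assumption, as are the `EncodeCursor` and `DecodeCursor` names.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// EncodeCursor packs the sort key of the last returned delivery into an
// opaque token. The base64 variant is an assumption of this sketch.
func EncodeCursor(createdAtMs int64, deliveryID string) string {
	raw := strconv.FormatInt(createdAtMs, 10) + ":" + deliveryID
	return base64.RawURLEncoding.EncodeToString([]byte(raw))
}

// DecodeCursor reverses EncodeCursor and rejects malformed tokens.
func DecodeCursor(cursor string) (int64, string, error) {
	raw, err := base64.RawURLEncoding.DecodeString(cursor)
	if err != nil {
		return 0, "", fmt.Errorf("invalid cursor encoding: %w", err)
	}
	// Split on the first colon only, so a delivery ID may itself contain colons.
	ms, id, ok := strings.Cut(string(raw), ":")
	if !ok || id == "" {
		return 0, "", errors.New("invalid cursor shape")
	}
	createdAtMs, err := strconv.ParseInt(ms, 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("invalid cursor timestamp: %w", err)
	}
	return createdAtMs, id, nil
}

func main() {
	cursor := EncodeCursor(1700000000000, "d-42")
	ms, id, err := DecodeCursor(cursor)
	fmt.Println(cursor, ms, id, err)
}
```

Carrying both `created_at_ms` and `delivery_id` in the cursor gives pagination a tie-breaker when several deliveries share the same millisecond.
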