Mail Service Implementation Plan
This plan has already been implemented and is kept here for historical reasons.
It should NOT be treated as the source of truth for service functionality.
Summary
This plan describes the full v1 implementation path for galaxy/mail.
It is intentionally decision-complete: the implementer should not need to
invent service boundaries, storage layout, contracts, or retry semantics while
building the service.
The target outcome is one runnable internal service that:
- accepts auth-code mail synchronously over trusted REST
- consumes generic non-auth mail asynchronously from Redis Streams
- renders templates or accepts pre-rendered content
- delivers through SMTP or a deterministic stub
- retries bounded transient failures
- stores durable delivery audit state
- exposes trusted operator reads and resend controls
Global Rules
- keep one logical delivery equal to one SMTP envelope
- keep suppressed separate from failure
- require explicit idempotency for every accepted command
- prefer deterministic Redis-backed scheduling over in-memory timers
- keep operator inspection possible without direct Redis access
- treat filesystem templates as the v1 source of truth
- keep public and trusted contracts explicit and versionable
Target Runtime Layout
mail/
├── cmd/
│   └── mail/
│       └── main.go
├── internal/
│   ├── app/
│   │   ├── app.go
│   │   ├── bootstrap.go
│   │   └── runtime.go
│   ├── config/
│   │   ├── config.go
│   │   ├── env.go
│   │   └── validation.go
│   ├── domain/
│   │   ├── delivery/
│   │   │   ├── model.go
│   │   │   ├── state.go
│   │   │   └── errors.go
│   │   ├── attempt/
│   │   │   ├── model.go
│   │   │   ├── state.go
│   │   │   └── policy.go
│   │   ├── idempotency/
│   │   │   └── model.go
│   │   ├── template/
│   │   │   ├── model.go
│   │   │   └── locale.go
│   │   └── common/
│   │       ├── email.go
│   │       ├── locale.go
│   │       ├── attachment.go
│   │       └── ids.go
│   ├── ports/
│   │   ├── deliverystore.go
│   │   ├── attemptstore.go
│   │   ├── idempotencystore.go
│   │   ├── commandsubscriber.go
│   │   ├── attemptscheduler.go
│   │   ├── templatecatalog.go
│   │   ├── provider.go
│   │   ├── clock.go
│   │   └── idgenerator.go
│   ├── service/
│   │   ├── acceptauthdelivery/
│   │   ├── acceptgenericdelivery/
│   │   ├── executeattempt/
│   │   ├── listdeliveries/
│   │   ├── getdelivery/
│   │   ├── listattempts/
│   │   └── resenddelivery/
│   ├── api/
│   │   ├── internalhttp/
│   │   └── streamcommand/
│   ├── adapters/
│   │   ├── redis/
│   │   ├── smtp/
│   │   ├── templates/
│   │   ├── stubprovider/
│   │   ├── clock/
│   │   └── id/
│   ├── worker/
│   │   ├── command_consumer.go
│   │   ├── scheduler.go
│   │   ├── attempt_worker.go
│   │   └── cleanup_worker.go
│   ├── observability/
│   │   ├── logging.go
│   │   ├── metrics.go
│   │   └── tracing.go
│   └── testkit/
│       ├── redis.go
│       ├── provider.go
│       ├── clock.go
│       ├── templates.go
│       └── commands.go
├── templates/
│   └── ...
├── docs/
│   ├── README.md
│   └── stage-01-vocabulary-and-ownership.md
├── README.md
└── PLAN.md
Target Configuration
Planned environment variables:
- MAIL_INTERNAL_HTTP_ADDR with default :8080
- MAIL_REDIS_ADDR required
- MAIL_REDIS_COMMAND_STREAM with default mail:delivery_commands
- MAIL_REDIS_ATTEMPT_SCHEDULE_KEY with default mail:attempt_schedule
- MAIL_REDIS_DEAD_LETTER_PREFIX with default mail:dead_letters:
- MAIL_SMTP_MODE=stub|smtp with default stub
- MAIL_SMTP_ADDR required in smtp mode
- MAIL_SMTP_USERNAME optional
- MAIL_SMTP_PASSWORD optional
- MAIL_SMTP_FROM_EMAIL required in smtp mode
- MAIL_SMTP_FROM_NAME optional
- MAIL_SMTP_TIMEOUT with default 15s
- MAIL_TEMPLATE_DIR with default templates
- MAIL_ATTEMPT_WORKER_CONCURRENCY with default 4
- MAIL_STREAM_BLOCK_TIMEOUT with default 2s
- MAIL_OPERATOR_REQUEST_TIMEOUT with default 5s
- MAIL_IDEMPOTENCY_TTL with default 168h
- MAIL_DELIVERY_TTL with default 720h
- MAIL_ATTEMPT_TTL with default 2160h
Stage 01. Freeze Vocabulary and Ownership
Status: implemented.
Goal
Freeze the service vocabulary and remove cross-service ambiguity before any implementation work starts.
Tasks
- Freeze that Mail Service owns delivery acceptance, attempts, retry, suppression, audit, and resend.
- Freeze that Notification Service owns the business decision to request non-auth mail.
- Freeze that Auth / Session Service uses the dedicated auth REST contract.
- Freeze that Geo Profile Service routes optional admin mail through Notification Service, not directly to Mail Service.
- Freeze that operator APIs are part of v1, not a later add-on.
Artifacts
- stable service README
- aligned architecture references
- list of accepted source values:
  - authsession
  - notification
  - operator_resend
Exit Criteria
- no document still treats Geo Profile Service as a direct Mail Service caller
- no document claims that all Mail Service callers use the same transport
Targeted Tests
- documentation review only
Stage 02. Define the Domain Model and State Rules
Status: implemented.
Goal
Describe the logical delivery entities and freeze their valid state transitions.
Tasks
- Define mail_delivery, mail_attempt, mail_idempotency_record, mail_template, and mail_dead_letter_entry.
- Freeze delivery states:
  - accepted
  - queued
  - rendered
  - sending
  - sent
  - suppressed
  - failed
  - dead_letter
- Freeze attempt states:
  - scheduled
  - in_progress
  - provider_accepted
  - provider_rejected
  - transport_failed
  - timed_out
- Freeze resend as clone-only with immutable parent history.
- Freeze terminal-state resend eligibility:
  - sent
  - suppressed
  - failed
  - dead_letter
Artifacts
- domain models
- state transition table
- resend eligibility rules
Exit Criteria
- every use case can rely on one explicit state machine
Targeted Tests
- unit tests for allowed and forbidden delivery transitions
- unit tests for resend eligibility
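The resend eligibility rule from this stage can be sketched as a small lookup over the frozen delivery states. The type and constant names are hypothetical; only the state values and the four resend-eligible terminal states come from the plan. The full transition table is not reproduced here because this document does not list it.

```go
package main

// DeliveryState is one of the frozen delivery states.
type DeliveryState string

const (
	StateAccepted   DeliveryState = "accepted"
	StateQueued     DeliveryState = "queued"
	StateRendered   DeliveryState = "rendered"
	StateSending    DeliveryState = "sending"
	StateSent       DeliveryState = "sent"
	StateSuppressed DeliveryState = "suppressed"
	StateFailed     DeliveryState = "failed"
	StateDeadLetter DeliveryState = "dead_letter"
)

// resendEligible lists the terminal states from which a resend clone may be created.
var resendEligible = map[DeliveryState]bool{
	StateSent:       true,
	StateSuppressed: true,
	StateFailed:     true,
	StateDeadLetter: true,
}

// CanResend reports whether a delivery in state s may be cloned for resend.
// Unknown states are treated as not eligible.
func CanResend(s DeliveryState) bool {
	return resendEligible[s]
}
```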
Stage 03. Freeze the Redis Physical Model
Status: implemented.
Goal
Lock the Redis layout so repository and scheduling adapters can be implemented without revisiting the data design.
Tasks
- Freeze primary keys:
  - mail:deliveries:<delivery_id>
  - mail:attempts:<delivery_id>:<attempt_no>
  - mail:idempotency:<source>:<idempotency_key>
  - mail:dead_letters:<delivery_id>
- Freeze scheduler and ingress keys:
  - mail:delivery_commands
  - mail:attempt_schedule
- Freeze search indexes:
  - mail:idx:recipient:<email>
  - mail:idx:status:<status>
  - mail:idx:source:<source>
  - mail:idx:template:<template_id>
  - mail:idx:idempotency:<source>:<idempotency_key>
  - mail:idx:created_at
- Freeze storage format:
  - canonical JSON blob in Redis string keys for delivery and attempt records
  - sorted-set indexes scored by created_at_ms
- Explicitly reject Redis storage for template contents in v1 because the template catalog is filesystem-backed.
- Freeze retention:
  - idempotency 7d
  - delivery 30d
  - attempts and dead letters 90d
- Freeze atomic write boundaries:
  - reserve idempotency
  - store delivery
  - schedule first attempt
  - create resend clone
Artifacts
- Redis key catalog
- atomicity notes for Lua or optimistic transaction usage
- retention and cleanup notes
Exit Criteria
- the Redis adapters can be implemented without unresolved naming or transactional questions
Targeted Tests
- repository tests for key naming
- atomicity tests for duplicate idempotency races
- cleanup tests for TTL-driven record expiry
Stage 04. Freeze the Auth REST Contract
Status: implemented.
Goal
Define the direct trusted contract from Auth / Session Service.
Tasks
- Freeze route POST /api/v1/internal/login-code-deliveries.
- Freeze required Idempotency-Key header.
- Freeze body fields:
  - email
  - code
  - locale
- Freeze success outcomes:
  - sent
  - suppressed
- Freeze trusted error codes:
  - invalid_request
  - internal_error
  - service_unavailable
- Freeze the meaning of sent as durable acceptance into the mail pipeline, not immediate SMTP completion.
- Freeze auth-client behavior of no automatic retry on upstream or transport failures.
Artifacts
- request/response DTOs
- handler contract notes
- error mapping table
Exit Criteria
- the auth REST client and server can be built from the frozen contract
Targeted Tests
- strict JSON decoding tests
- required header validation tests
- idempotent repeat request tests
- sent versus suppressed response tests
Stage 05. Freeze the Async Generic Contract
Status: implemented.
Goal
Define the exact Redis Streams command format used by Notification Service.
Tasks
- Freeze the stream name mail:delivery_commands.
- Freeze required fields:
  - delivery_id
  - source
  - payload_mode
  - idempotency_key
  - requested_at_ms
  - payload_json
- Freeze optional fields:
  - request_id
  - trace_id
- Freeze that async source accepts only:
  - notification
- Freeze payload modes:
  - rendered
  - template
- Freeze the rendered payload shape with:
  - recipient envelope
  - subject
  - text_body
  - optional html_body
  - attachments
- Freeze the template payload shape with:
  - recipient envelope
  - template_id
  - locale
  - variables
  - attachments
- Freeze duplicate handling by (source, idempotency_key).
- Freeze request_id and trace_id as tracing-only metadata excluded from the idempotency fingerprint.
- Freeze the malformed-command path into dedicated operator-visible mail_malformed_command_entry state outside mail_delivery.
Artifacts
- stream field catalog
- typed stream command contract
- AsyncAPI specification
- payload_json schema notes
- malformed command handling rules
Exit Criteria
- Notification Service can publish one command without needing a follow-up design round
Targeted Tests
- strict stream-entry decoding tests
- duplicate idempotency tests
- malformed command recording-contract tests
- rendered and template payload acceptance tests
Stage 06. Build the Runnable Service Skeleton
Status: implemented.
Goal
Create one runnable internal process with config, Redis, HTTP server, and workers.
Tasks
- Implement cmd/mail.
- Implement config loading and validation.
- Wire Redis client, template catalog, provider adapter, HTTP server, and workers.
- Add graceful shutdown across:
  - HTTP server
  - stream consumer
  - scheduler
  - attempt workers
  - cleanup worker
- Add startup validation for required Redis and provider config.
Artifacts
- runnable cmd/mail
- bootstrap wiring
- graceful shutdown logic
Exit Criteria
- the process starts and stops cleanly with valid config
Targeted Tests
- startup with stub mode
- startup failure on invalid Redis config
- graceful shutdown without leaked goroutines
Stage 07. Implement Auth Delivery Acceptance
Status: implemented.
Goal
Accept auth-code deliveries synchronously and durably.
Tasks
- Implement the auth acceptance use case.
- Validate email, code, locale, and Idempotency-Key.
- Classify explicit suppression without treating it as failure.
- Persist delivery, idempotency record, and first scheduled attempt atomically.
- Keep suppressed acceptance as the explicit exception that persists only delivery plus idempotency state without a first attempt.
- Return stable sent or suppressed.
- Add telemetry for accepted auth requests.
- Reject mismatched replays with the same idempotency key.
Artifacts
- auth acceptance service
- internal HTTP handler
- DTO validation and error mapping
Exit Criteria
- auth requests create one durable delivery or fail closed without partial state
Targeted Tests
- valid request accepted as sent
- valid request accepted as suppressed without attempt scheduling
- duplicate identical request returns same result
- duplicate mismatched request is rejected
- Redis persistence failure surfaces 503 service_unavailable
Stage 08. Implement Async Generic Acceptance
Status: implemented.
Goal
Consume generic mail commands from Redis Streams and convert them into
durable deliveries.
Tasks
- Implement plain XREAD-based stream consumption.
- Decode and validate stream entries.
- Persist one delivery and schedule one first attempt atomically.
- Advance the consumer offset only after durable acceptance.
- Meter malformed entries and record them as operator-visible mail_malformed_command_entry state.
- Keep duplicate idempotency requests as no-op accepts.
Artifacts
- stream consumer worker
- generic acceptance service
- malformed command recorder
Exit Criteria
- valid commands are never lost after they are read from the stream
Targeted Tests
- rendered command acceptance
- template command acceptance
- duplicate command no-op behavior
- malformed command recording
- consumer restart continuing from the correct offset
Stage 09. Implement the Template Catalog and Rendering
Status: implemented.
Goal
Provide deterministic rendering for template-mode deliveries.
Tasks
- Implement filesystem-backed template discovery under templates/.
- Freeze directory layout as <template_id>/<locale>/subject.tmpl, text.tmpl, and optional html.tmpl.
- Implement locale validation and fallback to en.
- Record locale_fallback_used.
- Validate required variables before rendering.
- Reject unknown missing required variables deterministically.
- Add dedicated auth template family:
  - auth.login_code
Artifacts
- template catalog adapter
- renderer
- auth template assets
Exit Criteria
- template mode always produces one final deterministic subject/body bundle or one classified render failure
Targeted Tests
- exact locale render
- unsupported locale fallback to en
- missing en fallback failure
- deterministic render snapshots
Stage 10. Implement the Provider Layer
Status: implemented.
Goal
Provide concrete delivery adapters for SMTP and deterministic local testing.
Tasks
- Freeze provider result classifications:
  - accepted
  - suppressed
  - transient_failure
  - permanent_failure
- Implement SMTP adapter with:
  - dial/connect
  - optional auth
  - envelope mapping
  - MIME body construction
  - inline attachment mapping
  - timeout classification
- Implement stub adapter with scriptable outcomes.
- Redact provider summaries before storing them in audit fields.
Artifacts
- SMTP adapter
- stub provider adapter
- MIME builder helpers
Exit Criteria
- one attempt can be executed against either adapter with stable classified outcomes
Targeted Tests
- SMTP request construction tests
- attachment mapping tests
- timeout classification tests
- stub scripted outcome tests
Stage 11. Implement the Attempt Scheduler and Workers
Status: implemented.
Goal
Run due attempts exactly once per scheduled slot and apply retry policy.
Tasks
- Implement mail:attempt_schedule claim logic.
- Enforce at most one active attempt per delivery.
- Execute provider calls through the attempt service.
- Schedule retries at:
  - 1m
  - 5m
  - 30m
- Transition exhausted deliveries to dead_letter.
- Keep recoverable state across process restarts.
- Ensure claimed but unfinished work becomes visible again after worker crash recovery.
Artifacts
- scheduler worker
- attempt worker
- retry planner
- dead-letter writer
Exit Criteria
- the service survives restarts and resumes scheduled work without duplicate attempt ownership
Targeted Tests
- immediate first attempt
- transient retry chain to success
- retry exhaustion to dead letter
- crash recovery of in-progress attempt ownership
Stage 12. Implement the Operator API
Status: implemented.
Goal
Provide trusted read and resend controls without direct Redis access.
Tasks
- Implement delivery lookup by delivery_id.
- Implement filtered list with deterministic cursor pagination.
- Implement attempt history reads.
- Implement resend clone creation.
- Freeze cursor format as opaque base64 of created_at_ms:delivery_id.
- Reject resend for non-terminal statuses.
Artifacts
- operator HTTP handlers
- list query DTOs
- resend service
Exit Criteria
- operators can inspect and resend deliveries safely through the service API
Targeted Tests
- list filtering by recipient, status, source, template, and idempotency key
- cursor pagination tests
- resend allowed for terminal states only
- resend creates a linked clone rather than mutating the original
Stage 13. Add Observability and Runbook Coverage
Status: implemented.
Goal
Make the service operable without reading the code.
Tasks
- Add counters for:
  - accepted auth deliveries
  - accepted generic deliveries
  - suppressed deliveries
  - delivery statuses
  - attempt outcomes
  - dead letters
  - locale fallback
- Add gauges or histograms for:
  - scheduled depth
  - oldest scheduled age
  - SMTP latency
- Add structured logs with:
  - delivery_id
  - source
  - template_id
  - attempt_no
- Add traces around:
  - acceptance
  - rendering
  - provider send
  - resend
- Write operator runbook content for:
  - backlog growth
  - dead-letter spikes
  - repeated suppressions
  - SMTP auth or timeout failures
  - malformed stream commands
Artifacts
- telemetry runtime
- logging helpers
- runbook section drafts
Exit Criteria
- common failure modes are visible and actionable
Targeted Tests
- metric emission tests
- log field presence tests
- trace smoke tests where practical
Stage 14. Complete the Test Matrix
Status: implemented.
Goal
Reach a safe verification baseline across unit, integration, and end-to-end scenarios.
Tasks
- Add unit tests for:
  - validation
  - state transitions
  - idempotency
  - rendering
  - provider classification
  - retry planning
- Add integration tests for:
  - auth REST to durable delivery
  - stream command to durable delivery
  - attempt execution against stub provider
  - operator API against Redis-backed state
- Add end-to-end scenarios for:
  - auth sent
  - auth suppressed
  - template locale fallback
  - transient retry to success
  - retry exhaustion to dead letter
  - duplicate idempotency key
  - resend clone
  - graceful shutdown with pending work
Artifacts
- unit test suite
- integration harness
- end-to-end scenarios
Exit Criteria
- the planned behavior is covered closely enough to refactor safely
Targeted Tests
- execute the smallest relevant subset:
  - go test ./mail/...
  - focused integration packages once they exist
Stage 15. Align Cross-Service Documentation
Status: implemented.
Goal
Update existing documentation so the repository tells one coherent story about Mail Service.
Tasks
- Update ARCHITECTURE.md:
  - direct auth mail is synchronous trusted REST
  - generic notification mail is asynchronous through Notification Service
  - clarify that durable acceptance may precede SMTP completion
- Update geoprofile docs:
  - remove direct Geo Profile Service -> Mail Service
  - route optional admin mail through Notification Service
- Update authsession docs:
  - clarify localized mail acceptance semantics
  - clarify that sent means accepted into the mail pipeline
- Update gateway docs:
  - document Accept-Language as the public auth locale source
  - keep JSON bodies unchanged
- Update user docs:
  - document the auth-provided preferred-language candidate rule for new-user creation
Artifacts
- aligned service READMEs and docs
- aligned architecture narrative
Exit Criteria
- no first-class document contradicts the new Mail Service model
Targeted Tests
- documentation review
- contract-document sync review
Final Acceptance Checklist
The implementation is complete only when all of the following hold:
- the process starts with Redis and stub provider config
- auth REST intake works with explicit idempotency
- async generic stream intake works with duplicate suppression
- template rendering and locale fallback are deterministic
- SMTP and stub providers both work through the same port
- retries and dead-letter flow operate after restarts
- operator reads and resend clone work
- metrics, logs, and traces cover the main failure modes
- repository documentation is aligned with the final service model