feat: mail service

Ilia Denisov
2026-04-17 18:39:16 +02:00
committed by GitHub
parent 23ffcb7535
commit 5b7593e6f6
183 changed files with 31215 additions and 248 deletions
@@ -0,0 +1,834 @@
# Mail Service Implementation Plan
This plan has already been implemented and is kept here for historical reasons.
It should NOT be treated as the source of truth for service functionality.
## Summary
This plan describes the full v1 implementation path for `galaxy/mail`.
It is intentionally decision-complete: the implementer should not need to
invent service boundaries, storage layout, contracts, or retry semantics while
building the service.
The target outcome is one runnable internal service that:
- accepts auth-code mail synchronously over trusted REST
- consumes generic non-auth mail asynchronously from `Redis Streams`
- renders templates or accepts pre-rendered content
- delivers through SMTP or a deterministic stub
- retries bounded transient failures
- stores durable delivery audit state
- exposes trusted operator reads and resend controls
## Global Rules
- keep one logical delivery equal to one SMTP envelope
- keep `suppressed` separate from failure
- require explicit idempotency for every accepted command
- prefer deterministic Redis-backed scheduling over in-memory timers
- keep operator inspection possible without direct Redis access
- treat filesystem templates as the v1 source of truth
- keep public and trusted contracts explicit and versionable
## Target Runtime Layout
```text
mail/
├── cmd/
│   └── mail/
│       └── main.go
├── internal/
│   ├── app/
│   │   ├── app.go
│   │   ├── bootstrap.go
│   │   └── runtime.go
│   ├── config/
│   │   ├── config.go
│   │   ├── env.go
│   │   └── validation.go
│   ├── domain/
│   │   ├── delivery/
│   │   │   ├── model.go
│   │   │   ├── state.go
│   │   │   └── errors.go
│   │   ├── attempt/
│   │   │   ├── model.go
│   │   │   ├── state.go
│   │   │   └── policy.go
│   │   ├── idempotency/
│   │   │   └── model.go
│   │   ├── template/
│   │   │   ├── model.go
│   │   │   └── locale.go
│   │   └── common/
│   │       ├── email.go
│   │       ├── locale.go
│   │       ├── attachment.go
│   │       └── ids.go
│   ├── ports/
│   │   ├── deliverystore.go
│   │   ├── attemptstore.go
│   │   ├── idempotencystore.go
│   │   ├── commandsubscriber.go
│   │   ├── attemptscheduler.go
│   │   ├── templatecatalog.go
│   │   ├── provider.go
│   │   ├── clock.go
│   │   └── idgenerator.go
│   ├── service/
│   │   ├── acceptauthdelivery/
│   │   ├── acceptgenericdelivery/
│   │   ├── executeattempt/
│   │   ├── listdeliveries/
│   │   ├── getdelivery/
│   │   ├── listattempts/
│   │   └── resenddelivery/
│   ├── api/
│   │   ├── internalhttp/
│   │   └── streamcommand/
│   ├── adapters/
│   │   ├── redis/
│   │   ├── smtp/
│   │   ├── templates/
│   │   ├── stubprovider/
│   │   ├── clock/
│   │   └── id/
│   ├── worker/
│   │   ├── command_consumer.go
│   │   ├── scheduler.go
│   │   ├── attempt_worker.go
│   │   └── cleanup_worker.go
│   ├── observability/
│   │   ├── logging.go
│   │   ├── metrics.go
│   │   └── tracing.go
│   └── testkit/
│       ├── redis.go
│       ├── provider.go
│       ├── clock.go
│       ├── templates.go
│       └── commands.go
├── templates/
│   └── ...
├── docs/
│   ├── README.md
│   └── stage-01-vocabulary-and-ownership.md
├── README.md
└── PLAN.md
```
## Target Configuration
Planned environment variables:
- `MAIL_INTERNAL_HTTP_ADDR` with default `:8080`
- `MAIL_REDIS_ADDR` required
- `MAIL_REDIS_COMMAND_STREAM` with default `mail:delivery_commands`
- `MAIL_REDIS_ATTEMPT_SCHEDULE_KEY` with default `mail:attempt_schedule`
- `MAIL_REDIS_DEAD_LETTER_PREFIX` with default `mail:dead_letters:`
- `MAIL_SMTP_MODE=stub|smtp` with default `stub`
- `MAIL_SMTP_ADDR` required in `smtp` mode
- `MAIL_SMTP_USERNAME` optional
- `MAIL_SMTP_PASSWORD` optional
- `MAIL_SMTP_FROM_EMAIL` required in `smtp` mode
- `MAIL_SMTP_FROM_NAME` optional
- `MAIL_SMTP_TIMEOUT` with default `15s`
- `MAIL_TEMPLATE_DIR` with default `templates`
- `MAIL_ATTEMPT_WORKER_CONCURRENCY` with default `4`
- `MAIL_STREAM_BLOCK_TIMEOUT` with default `2s`
- `MAIL_OPERATOR_REQUEST_TIMEOUT` with default `5s`
- `MAIL_IDEMPOTENCY_TTL` with default `168h`
- `MAIL_DELIVERY_TTL` with default `720h`
- `MAIL_ATTEMPT_TTL` with default `2160h`
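
The variables above could be loaded roughly as follows. This is a minimal sketch, not the service's actual `internal/config` code: the `Config` struct, `LoadConfig`, and the helper names are hypothetical, and only a subset of the variables is shown. The variable names, defaults, and the "required in `smtp` mode" rule come from the list above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// Config holds a subset of the planned settings. The struct and field names
// are illustrative; only the variable names and defaults come from the plan.
type Config struct {
	InternalHTTPAddr string
	RedisAddr        string
	SMTPMode         string
	SMTPAddr         string
	SMTPFromEmail    string
	SMTPTimeout      time.Duration
	IdempotencyTTL   time.Duration
}

// envOr returns the value of key, or def when the variable is unset or empty.
func envOr(getenv func(string) string, key, def string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return def
}

// durationOr parses a Go duration string such as "15s" or "168h".
func durationOr(getenv func(string) string, key string, def time.Duration) (time.Duration, error) {
	v := getenv(key)
	if v == "" {
		return def, nil
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", key, err)
	}
	return d, nil
}

// LoadConfig reads settings through getenv so that tests can pass a map
// lookup instead of the process environment.
func LoadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		InternalHTTPAddr: envOr(getenv, "MAIL_INTERNAL_HTTP_ADDR", ":8080"),
		RedisAddr:        getenv("MAIL_REDIS_ADDR"),
		SMTPMode:         envOr(getenv, "MAIL_SMTP_MODE", "stub"),
		SMTPAddr:         getenv("MAIL_SMTP_ADDR"),
		SMTPFromEmail:    getenv("MAIL_SMTP_FROM_EMAIL"),
	}
	var err error
	if cfg.SMTPTimeout, err = durationOr(getenv, "MAIL_SMTP_TIMEOUT", 15*time.Second); err != nil {
		return Config{}, err
	}
	if cfg.IdempotencyTTL, err = durationOr(getenv, "MAIL_IDEMPOTENCY_TTL", 168*time.Hour); err != nil {
		return Config{}, err
	}
	if cfg.RedisAddr == "" {
		return Config{}, errors.New("MAIL_REDIS_ADDR is required")
	}
	switch cfg.SMTPMode {
	case "stub":
		// The stub provider needs no SMTP settings.
	case "smtp":
		if cfg.SMTPAddr == "" || cfg.SMTPFromEmail == "" {
			return Config{}, errors.New("MAIL_SMTP_ADDR and MAIL_SMTP_FROM_EMAIL are required in smtp mode")
		}
	default:
		return Config{}, fmt.Errorf("MAIL_SMTP_MODE: unsupported value %q", cfg.SMTPMode)
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing `getenv` as a parameter keeps validation testable without mutating the process environment.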
## ~~Stage 01.~~ Freeze Vocabulary and Ownership
Status: implemented.
### Goal
Freeze the service vocabulary and remove cross-service ambiguity before any
implementation work starts.
### Tasks
- Freeze that `Mail Service` owns delivery acceptance, attempts, retry,
suppression, audit, and resend.
- Freeze that `Notification Service` owns the business decision to request
non-auth mail.
- Freeze that `Auth / Session Service` uses the dedicated auth REST contract.
- Freeze that `Geo Profile Service` routes optional admin mail through
`Notification Service`, not directly to `Mail Service`.
- Freeze that operator APIs are part of v1, not a later add-on.
### Artifacts
- stable service README
- aligned architecture references
- list of accepted source values:
  - `authsession`
  - `notification`
  - `operator_resend`
### Exit Criteria
- no document still treats `Geo Profile Service` as a direct `Mail Service`
caller
- no document claims that all `Mail Service` callers use the same transport
### Targeted Tests
- documentation review only
## ~~Stage 02.~~ Define the Domain Model and State Rules
Status: implemented.
### Goal
Describe the logical delivery entities and freeze their valid state
transitions.
### Tasks
- Define `mail_delivery`, `mail_attempt`, `mail_idempotency_record`,
`mail_template`, and `mail_dead_letter_entry`.
- Freeze delivery states:
  - `accepted`
  - `queued`
  - `rendered`
  - `sending`
  - `sent`
  - `suppressed`
  - `failed`
  - `dead_letter`
- Freeze attempt states:
  - `scheduled`
  - `in_progress`
  - `provider_accepted`
  - `provider_rejected`
  - `transport_failed`
  - `timed_out`
- Freeze resend as clone-only with immutable parent history.
- Freeze terminal-state resend eligibility:
  - `sent`
  - `suppressed`
  - `failed`
  - `dead_letter`
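
The delivery states and the resend eligibility rule can be expressed as typed constants plus one lookup. This is a sketch only: the Go identifiers (`DeliveryState`, `CanResend`) are hypothetical, while the state names and the set of resend-eligible states are taken from the lists above. The full transition table is not reproduced here because this plan does not spell it out.

```go
package main

import "fmt"

// DeliveryState is one of the frozen delivery states.
type DeliveryState string

const (
	StateAccepted   DeliveryState = "accepted"
	StateQueued     DeliveryState = "queued"
	StateRendered   DeliveryState = "rendered"
	StateSending    DeliveryState = "sending"
	StateSent       DeliveryState = "sent"
	StateSuppressed DeliveryState = "suppressed"
	StateFailed     DeliveryState = "failed"
	StateDeadLetter DeliveryState = "dead_letter"
)

// resendEligible lists the terminal states from which a resend clone
// may be created.
var resendEligible = map[DeliveryState]bool{
	StateSent:       true,
	StateSuppressed: true,
	StateFailed:     true,
	StateDeadLetter: true,
}

// CanResend reports whether a delivery in state s may be cloned for resend.
// Unknown states are never eligible.
func CanResend(s DeliveryState) bool {
	return resendEligible[s]
}

func main() {
	for _, s := range []DeliveryState{StateQueued, StateSent, StateDeadLetter} {
		fmt.Printf("%s resend allowed: %v\n", s, CanResend(s))
	}
}
```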
### Artifacts
- domain models
- state transition table
- resend eligibility rules
### Exit Criteria
- every use case can rely on one explicit state machine
### Targeted Tests
- unit tests for allowed and forbidden delivery transitions
- unit tests for resend eligibility
## ~~Stage 03.~~ Freeze the Redis Physical Model
Status: implemented.
### Goal
Lock the Redis layout so repository and scheduling adapters can be implemented
without revisiting the data design.
### Tasks
- Freeze primary keys:
  - `mail:deliveries:<delivery_id>`
  - `mail:attempts:<delivery_id>:<attempt_no>`
  - `mail:idempotency:<source>:<idempotency_key>`
  - `mail:dead_letters:<delivery_id>`
- Freeze scheduler and ingress keys:
  - `mail:delivery_commands`
  - `mail:attempt_schedule`
- Freeze search indexes:
  - `mail:idx:recipient:<email>`
  - `mail:idx:status:<status>`
  - `mail:idx:source:<source>`
  - `mail:idx:template:<template_id>`
  - `mail:idx:idempotency:<source>:<idempotency_key>`
  - `mail:idx:created_at`
- Freeze storage format:
  - canonical JSON blob in Redis string keys for delivery and attempt records
  - sorted-set indexes scored by `created_at_ms`
- Explicitly reject Redis storage for template contents in v1 because the
  template catalog is filesystem-backed.
- Freeze retention:
  - idempotency `7d`
  - delivery `30d`
  - attempts and dead letters `90d`
- Freeze atomic write boundaries:
  - reserve idempotency
  - store delivery
  - schedule first attempt
  - create resend clone
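
Key naming is easiest to keep consistent when every key is built by one small helper. The sketch below covers a few of the keys and the retention values listed above; the function and constant names are hypothetical, and only the key patterns and durations come from this plan.

```go
package main

import (
	"fmt"
	"time"
)

// Retention periods from the plan: 7d, 30d, and 90d.
const (
	IdempotencyRetention = 7 * 24 * time.Hour
	DeliveryRetention    = 30 * 24 * time.Hour
	AttemptRetention     = 90 * 24 * time.Hour // also applies to dead letters
)

// DeliveryKey builds mail:deliveries:<delivery_id>.
func DeliveryKey(deliveryID string) string {
	return "mail:deliveries:" + deliveryID
}

// AttemptKey builds mail:attempts:<delivery_id>:<attempt_no>.
func AttemptKey(deliveryID string, attemptNo int) string {
	return fmt.Sprintf("mail:attempts:%s:%d", deliveryID, attemptNo)
}

// IdempotencyKey builds mail:idempotency:<source>:<idempotency_key>.
func IdempotencyKey(source, idempotencyKey string) string {
	return "mail:idempotency:" + source + ":" + idempotencyKey
}

// DeadLetterKey builds mail:dead_letters:<delivery_id>.
func DeadLetterKey(deliveryID string) string {
	return "mail:dead_letters:" + deliveryID
}

// StatusIndexKey builds mail:idx:status:<status>.
func StatusIndexKey(status string) string {
	return "mail:idx:status:" + status
}

func main() {
	fmt.Println(DeliveryKey("d-1"))
	fmt.Println(AttemptKey("d-1", 2))
	fmt.Println(IdempotencyKey("notification", "k-42"))
	fmt.Println(StatusIndexKey("sent"), DeliveryRetention)
}
```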
### Artifacts
- Redis key catalog
- atomicity notes for Lua or optimistic transaction usage
- retention and cleanup notes
### Exit Criteria
- the Redis adapters can be implemented without unresolved naming or
transactional questions
### Targeted Tests
- repository tests for key naming
- atomicity tests for duplicate idempotency races
- cleanup tests for TTL-driven record expiry
## ~~Stage 04.~~ Freeze the Auth REST Contract
Status: implemented.
### Goal
Define the direct trusted contract from `Auth / Session Service`.
### Tasks
- Freeze route `POST /api/v1/internal/login-code-deliveries`.
- Freeze required `Idempotency-Key` header.
- Freeze body fields:
  - `email`
  - `code`
  - `locale`
- Freeze success outcomes:
  - `sent`
  - `suppressed`
- Freeze trusted error codes:
  - `invalid_request`
  - `internal_error`
  - `service_unavailable`
- Freeze the meaning of `sent` as durable acceptance into the mail pipeline,
not immediate SMTP completion.
- Freeze auth-client behavior of no automatic retry on upstream or transport
failures.
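
Strict decoding of the request could look like the sketch below. It assumes that all three body fields are required and non-empty and that unknown fields and trailing data are rejected; the plan only states that decoding is strict, so those details are assumptions. `LoginCodeRequest`, `DecodeLoginCodeRequest`, and `ErrInvalidRequest` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// LoginCodeRequest mirrors the frozen body fields.
type LoginCodeRequest struct {
	Email  string `json:"email"`
	Code   string `json:"code"`
	Locale string `json:"locale"`
}

// ErrInvalidRequest maps to the trusted error code invalid_request.
var ErrInvalidRequest = errors.New("invalid_request")

// DecodeLoginCodeRequest validates the Idempotency-Key header value and
// strictly decodes the JSON body.
func DecodeLoginCodeRequest(idempotencyKey string, body []byte) (LoginCodeRequest, error) {
	if idempotencyKey == "" {
		return LoginCodeRequest{}, fmt.Errorf("%w: missing Idempotency-Key header", ErrInvalidRequest)
	}
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields() // unknown fields are an error, not ignored
	var req LoginCodeRequest
	if err := dec.Decode(&req); err != nil {
		return LoginCodeRequest{}, fmt.Errorf("%w: %v", ErrInvalidRequest, err)
	}
	// Anything other than EOF here means data followed the JSON object.
	if _, err := dec.Token(); err != io.EOF {
		return LoginCodeRequest{}, fmt.Errorf("%w: trailing data after JSON body", ErrInvalidRequest)
	}
	if req.Email == "" || req.Code == "" || req.Locale == "" {
		return LoginCodeRequest{}, fmt.Errorf("%w: email, code, and locale are required", ErrInvalidRequest)
	}
	return req, nil
}

func main() {
	body := []byte(`{"email":"user@example.com","code":"123456","locale":"en"}`)
	req, err := DecodeLoginCodeRequest("key-1", body)
	fmt.Println(req, err)
}
```

In an HTTP handler the first argument would come from `r.Header.Get("Idempotency-Key")`, and any `ErrInvalidRequest` would be mapped to the `invalid_request` error code.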
### Artifacts
- request/response DTOs
- handler contract notes
- error mapping table
### Exit Criteria
- the auth REST client and server can be built from the frozen contract
### Targeted Tests
- strict JSON decoding tests
- required header validation tests
- idempotent repeat request tests
- sent versus suppressed response tests
## ~~Stage 05.~~ Freeze the Async Generic Contract
Status: implemented.
### Goal
Define the exact `Redis Streams` command format used by
`Notification Service`.
### Tasks
- Freeze the stream name `mail:delivery_commands`.
- Freeze required fields:
  - `delivery_id`
  - `source`
  - `payload_mode`
  - `idempotency_key`
  - `requested_at_ms`
  - `payload_json`
- Freeze optional fields:
  - `request_id`
  - `trace_id`
- Freeze that async `source` accepts only:
  - `notification`
- Freeze payload modes:
  - `rendered`
  - `template`
- Freeze the rendered payload shape with:
  - recipient envelope
  - `subject`
  - `text_body`
  - optional `html_body`
  - attachments
- Freeze the template payload shape with:
  - recipient envelope
  - `template_id`
  - `locale`
  - `variables`
  - attachments
- Freeze duplicate handling by `(source, idempotency_key)`.
- Freeze `request_id` and `trace_id` as tracing-only metadata excluded from
the idempotency fingerprint.
- Freeze the malformed-command path into dedicated operator-visible
`mail_malformed_command_entry` state outside `mail_delivery`.
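
A stream entry arrives as a flat field/value map, so the envelope can be validated before `payload_json` is parsed. The following is a sketch under assumptions: rejecting unknown fields and non-positive timestamps is one interpretation of "strict", and `StreamCommand` and `DecodeStreamCommand` are hypothetical names. Field names, the allowed `source`, and the payload modes come from the lists above.

```go
package main

import (
	"fmt"
	"strconv"
)

// StreamCommand is the typed envelope of one stream entry.
type StreamCommand struct {
	DeliveryID     string
	Source         string
	PayloadMode    string
	IdempotencyKey string
	RequestedAtMs  int64
	PayloadJSON    string
	RequestID      string // optional, tracing only
	TraceID        string // optional, tracing only
}

var requiredFields = []string{
	"delivery_id", "source", "payload_mode",
	"idempotency_key", "requested_at_ms", "payload_json",
}

var optionalFields = []string{"request_id", "trace_id"}

func isKnownField(name string) bool {
	for _, f := range requiredFields {
		if f == name {
			return true
		}
	}
	for _, f := range optionalFields {
		if f == name {
			return true
		}
	}
	return false
}

// DecodeStreamCommand validates the envelope fields of one stream entry.
// A returned error means the entry should take the malformed-command path.
func DecodeStreamCommand(fields map[string]string) (StreamCommand, error) {
	for name := range fields {
		if !isKnownField(name) {
			return StreamCommand{}, fmt.Errorf("unknown field %q", name)
		}
	}
	for _, name := range requiredFields {
		if fields[name] == "" {
			return StreamCommand{}, fmt.Errorf("missing required field %q", name)
		}
	}
	if fields["source"] != "notification" {
		return StreamCommand{}, fmt.Errorf("unsupported source %q", fields["source"])
	}
	switch fields["payload_mode"] {
	case "rendered", "template":
	default:
		return StreamCommand{}, fmt.Errorf("unsupported payload_mode %q", fields["payload_mode"])
	}
	ms, err := strconv.ParseInt(fields["requested_at_ms"], 10, 64)
	if err != nil || ms <= 0 {
		return StreamCommand{}, fmt.Errorf("invalid requested_at_ms %q", fields["requested_at_ms"])
	}
	return StreamCommand{
		DeliveryID:     fields["delivery_id"],
		Source:         fields["source"],
		PayloadMode:    fields["payload_mode"],
		IdempotencyKey: fields["idempotency_key"],
		RequestedAtMs:  ms,
		PayloadJSON:    fields["payload_json"],
		RequestID:      fields["request_id"],
		TraceID:        fields["trace_id"],
	}, nil
}

func main() {
	cmd, err := DecodeStreamCommand(map[string]string{
		"delivery_id": "d-1", "source": "notification", "payload_mode": "template",
		"idempotency_key": "k-1", "requested_at_ms": "1713371956000", "payload_json": "{}",
	})
	fmt.Printf("%+v %v\n", cmd, err)
}
```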
### Artifacts
- stream field catalog
- typed stream command contract
- `AsyncAPI` specification
- `payload_json` schema notes
- malformed command handling rules
### Exit Criteria
- `Notification Service` can publish one command without needing a follow-up
design round
### Targeted Tests
- strict stream-entry decoding tests
- duplicate idempotency tests
- malformed command recording-contract tests
- rendered and template payload acceptance tests
## ~~Stage 06.~~ Build the Runnable Service Skeleton
Status: implemented.
### Goal
Create one runnable internal process with config, Redis, HTTP server, and
workers.
### Tasks
- Implement `cmd/mail`.
- Implement config loading and validation.
- Wire Redis client, template catalog, provider adapter, HTTP server, and
workers.
- Add graceful shutdown across:
  - HTTP server
  - stream consumer
  - scheduler
  - attempt workers
  - cleanup worker
- Add startup validation for required Redis and provider config.
### Artifacts
- runnable `cmd/mail`
- bootstrap wiring
- graceful shutdown logic
### Exit Criteria
- the process starts and stops cleanly with valid config
### Targeted Tests
- startup with stub mode
- startup failure on invalid Redis config
- graceful shutdown without leaked goroutines
## ~~Stage 07.~~ Implement Auth Delivery Acceptance
Status: implemented.
### Goal
Accept auth-code deliveries synchronously and durably.
### Tasks
- Implement the auth acceptance use case.
- Validate `email`, `code`, `locale`, and `Idempotency-Key`.
- Classify explicit suppression without treating it as failure.
- Persist delivery, idempotency record, and first scheduled attempt
atomically.
- Keep `suppressed` acceptance as the explicit exception that persists only
delivery plus idempotency state without a first attempt.
- Return stable `sent` or `suppressed`.
- Add telemetry for accepted auth requests.
- Reject mismatched replays with the same idempotency key.
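
The replay rules (identical repeat returns the stored outcome, mismatched repeat is rejected) can be illustrated with an in-memory store. This is only a sketch of the semantics: the planned service reserves idempotency atomically in Redis together with the delivery, and the fingerprint function, `IdempotencyStore`, and `Reserve` shown here are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// ErrIdempotencyConflict signals a replay whose payload differs from the
// request that first used the key.
var ErrIdempotencyConflict = errors.New("idempotency key reused with a different request")

type record struct {
	fingerprint string
	outcome     string
}

// IdempotencyStore is an in-memory stand-in for the Redis-backed store.
type IdempotencyStore struct {
	mu      sync.Mutex
	records map[string]record
}

func NewIdempotencyStore() *IdempotencyStore {
	return &IdempotencyStore{records: make(map[string]record)}
}

// Fingerprint hashes the fields that define request equality. Each part is
// length-prefixed so that ("ab", "c") and ("a", "bc") hash differently.
func Fingerprint(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		fmt.Fprintf(h, "%d:%s|", len(p), p)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Reserve records the outcome for (source, key) on first use. On a repeat it
// returns the stored outcome with replay=true, or ErrIdempotencyConflict
// when the fingerprint differs.
func (s *IdempotencyStore) Reserve(source, key, fingerprint, outcome string) (stored string, replay bool, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	id := source + ":" + key
	if prev, ok := s.records[id]; ok {
		if prev.fingerprint != fingerprint {
			return "", false, ErrIdempotencyConflict
		}
		return prev.outcome, true, nil
	}
	s.records[id] = record{fingerprint: fingerprint, outcome: outcome}
	return outcome, false, nil
}

func main() {
	store := NewIdempotencyStore()
	fp := Fingerprint("user@example.com", "123456", "en")
	fmt.Println(store.Reserve("authsession", "k-1", fp, "sent"))
	fmt.Println(store.Reserve("authsession", "k-1", fp, "sent"))
	fmt.Println(store.Reserve("authsession", "k-1", Fingerprint("user@example.com", "654321", "en"), "sent"))
}
```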
### Artifacts
- auth acceptance service
- internal HTTP handler
- DTO validation and error mapping
### Exit Criteria
- auth requests create one durable delivery or fail closed without partial
state
### Targeted Tests
- valid request accepted as `sent`
- valid request accepted as `suppressed` without attempt scheduling
- duplicate identical request returns same result
- duplicate mismatched request is rejected
- Redis persistence failure surfaces `503 service_unavailable`
## ~~Stage 08.~~ Implement Async Generic Acceptance
Status: implemented.
### Goal
Consume generic mail commands from `Redis Streams` and convert them into
durable deliveries.
### Tasks
- Implement plain `XREAD`-based stream consumption.
- Decode and validate stream entries.
- Persist one delivery and schedule one first attempt atomically.
- Advance the consumer offset only after durable acceptance.
- Meter malformed entries and record them as operator-visible
`mail_malformed_command_entry` state.
- Keep duplicate idempotency requests as no-op accepts.
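
The rule "advance the consumer offset only after durable acceptance" reduces to a loop that stops at the first entry it cannot persist. The sketch below uses plain Go types instead of a Redis client; `Entry`, `ConsumeBatch`, and the `accept` callback are hypothetical, and where the offset itself is stored is not specified here.

```go
package main

import (
	"errors"
	"fmt"
)

// Entry is one stream entry as returned by a read from the command stream.
type Entry struct {
	ID     string
	Fields map[string]string
}

// errStoreUnavailable simulates a persistence failure in the demo.
var errStoreUnavailable = errors.New("store unavailable")

// ConsumeBatch calls accept for each entry in order. It returns the ID of
// the last entry that was durably accepted, which is the offset to resume
// from. On the first failure it stops without advancing past that entry, so
// the entry is read again on the next iteration or after a restart.
func ConsumeBatch(lastID string, batch []Entry, accept func(Entry) error) (string, error) {
	for _, e := range batch {
		if err := accept(e); err != nil {
			return lastID, fmt.Errorf("entry %s: %w", e.ID, err)
		}
		lastID = e.ID
	}
	return lastID, nil
}

func main() {
	batch := []Entry{{ID: "1-0"}, {ID: "2-0"}, {ID: "3-0"}}
	last, err := ConsumeBatch("0-0", batch, func(e Entry) error {
		if e.ID == "2-0" {
			return errStoreUnavailable
		}
		return nil
	})
	fmt.Println(last, err)
}
```

Because an entry can be processed again after a failure, the `accept` step has to be idempotent, which is what the duplicate no-op rule above provides.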
### Artifacts
- stream consumer worker
- generic acceptance service
- malformed command recorder
### Exit Criteria
- valid commands are never lost after they are read from the stream
### Targeted Tests
- rendered command acceptance
- template command acceptance
- duplicate command no-op behavior
- malformed command recording
- consumer restart continuing from the correct offset
## ~~Stage 09.~~ Implement the Template Catalog and Rendering
Status: implemented.
### Goal
Provide deterministic rendering for template-mode deliveries.
### Tasks
- Implement filesystem-backed template discovery under `templates/`.
- Freeze directory layout as `<template_id>/<locale>/subject.tmpl`,
`text.tmpl`, and optional `html.tmpl`.
- Implement locale validation and fallback to `en`.
- Record `locale_fallback_used`.
- Validate required variables before rendering.
- Reject unknown or missing required variables deterministically.
- Add dedicated auth template family:
  - `auth.login_code`
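
Locale fallback and missing-variable rejection can be sketched with the standard `text/template` package over an `fs.FS`. This is an illustration, not the service's renderer: it assumes `text/template` syntax, handles only `subject.tmpl` and `text.tmpl`, and uses an in-memory `fstest.MapFS` with made-up template bodies in place of the real `templates/` directory. `Render`, `Rendered`, and `demoFS` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io/fs"
	"testing/fstest"
	"text/template"
)

const fallbackLocale = "en"

// Rendered is the final subject/body bundle.
type Rendered struct {
	Subject            string
	Text               string
	LocaleFallbackUsed bool
}

// resolveLocale picks the requested locale when its subject template exists
// and falls back to "en" otherwise.
func resolveLocale(fsys fs.FS, templateID, locale string) (string, bool, error) {
	if _, err := fs.Stat(fsys, templateID+"/"+locale+"/subject.tmpl"); err == nil {
		return locale, false, nil
	}
	if _, err := fs.Stat(fsys, templateID+"/"+fallbackLocale+"/subject.tmpl"); err == nil {
		return fallbackLocale, true, nil
	}
	return "", false, fmt.Errorf("template %q has no %q fallback", templateID, fallbackLocale)
}

func renderFile(fsys fs.FS, name string, vars map[string]any) (string, error) {
	src, err := fs.ReadFile(fsys, name)
	if err != nil {
		return "", err
	}
	// missingkey=error turns an absent variable into a render failure
	// instead of printing "<no value>".
	tmpl, err := template.New(name).Option("missingkey=error").Parse(string(src))
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, vars); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// Render produces the subject and text body for one template and locale.
func Render(fsys fs.FS, templateID, locale string, vars map[string]any) (Rendered, error) {
	resolved, fallback, err := resolveLocale(fsys, templateID, locale)
	if err != nil {
		return Rendered{}, err
	}
	dir := templateID + "/" + resolved
	subject, err := renderFile(fsys, dir+"/subject.tmpl", vars)
	if err != nil {
		return Rendered{}, err
	}
	text, err := renderFile(fsys, dir+"/text.tmpl", vars)
	if err != nil {
		return Rendered{}, err
	}
	return Rendered{Subject: subject, Text: text, LocaleFallbackUsed: fallback}, nil
}

// demoFS is an in-memory catalog with illustrative template bodies.
func demoFS() fs.FS {
	return fstest.MapFS{
		"auth.login_code/en/subject.tmpl": {Data: []byte("Your login code")},
		"auth.login_code/en/text.tmpl":    {Data: []byte("Code: {{.code}}")},
	}
}

func main() {
	out, err := Render(demoFS(), "auth.login_code", "fr", map[string]any{"code": "123456"})
	fmt.Printf("%+v %v\n", out, err)
}
```

With a real directory the same code would receive `os.DirFS(dir)` for the configured `MAIL_TEMPLATE_DIR`.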
### Artifacts
- template catalog adapter
- renderer
- auth template assets
### Exit Criteria
- template mode always produces one final deterministic subject/body bundle or
one classified render failure
### Targeted Tests
- exact locale render
- unsupported locale fallback to `en`
- missing `en` fallback failure
- missing required variable failure
- deterministic render snapshots
## ~~Stage 10.~~ Implement the Provider Layer
Status: implemented.
### Goal
Provide concrete delivery adapters for SMTP and deterministic local testing.
### Tasks
- Freeze provider result classifications:
  - `accepted`
  - `suppressed`
  - `transient_failure`
  - `permanent_failure`
- Implement SMTP adapter with:
  - dial/connect
  - optional auth
  - envelope mapping
  - MIME body construction
  - inline attachment mapping
  - timeout classification
- Implement stub adapter with scriptable outcomes.
- Redact provider summaries before storing them in audit fields.
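
One plausible way to map SMTP outcomes onto the frozen classifications follows the reply-code classes of RFC 5321: 2yz is positive completion, 4yz is a transient negative reply, and 5yz is a permanent negative reply. The mapping below is an assumption rather than the adapter's documented behavior, in particular the choice to retry every error that carries no SMTP reply code. `suppressed` is not derived from a reply code in this sketch. `Result`, `ClassifyReplyCode`, and `ClassifyError` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/textproto"
)

// Result is one of the frozen provider result classifications.
type Result string

const (
	ResultAccepted         Result = "accepted"
	ResultSuppressed       Result = "suppressed"
	ResultTransientFailure Result = "transient_failure"
	ResultPermanentFailure Result = "permanent_failure"
)

// ClassifyReplyCode maps a final SMTP reply code to a result.
func ClassifyReplyCode(code int) Result {
	switch {
	case code >= 200 && code < 300:
		return ResultAccepted
	case code >= 400 && code < 500:
		return ResultTransientFailure
	default:
		// 5yz and any unexpected code: assumed not worth retrying.
		return ResultPermanentFailure
	}
}

// ClassifyError classifies the error returned by an SMTP send.
func ClassifyError(err error) Result {
	if err == nil {
		return ResultAccepted
	}
	// net/smtp reports unexpected server replies as *textproto.Error.
	var reply *textproto.Error
	if errors.As(err, &reply) {
		return ClassifyReplyCode(reply.Code)
	}
	// Timeouts and other transport errors carry no SMTP verdict; this
	// sketch assumes they are retryable.
	return ResultTransientFailure
}

func main() {
	fmt.Println(ClassifyError(nil))
	fmt.Println(ClassifyError(&textproto.Error{Code: 451, Msg: "try again later"}))
	fmt.Println(ClassifyError(&textproto.Error{Code: 550, Msg: "mailbox unavailable"}))
	fmt.Println(ClassifyError(errors.New("dial tcp: i/o timeout")))
}
```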
### Artifacts
- SMTP adapter
- stub provider adapter
- MIME builder helpers
### Exit Criteria
- one attempt can be executed against either adapter with stable classified
outcomes
### Targeted Tests
- SMTP request construction tests
- attachment mapping tests
- timeout classification tests
- stub scripted outcome tests
## ~~Stage 11.~~ Implement the Attempt Scheduler and Workers
Status: implemented.
### Goal
Run due attempts exactly once per scheduled slot and apply retry policy.
### Tasks
- Implement `mail:attempt_schedule` claim logic.
- Enforce at most one active attempt per delivery.
- Execute provider calls through the attempt service.
- Schedule retries at:
  - `1m`
  - `5m`
  - `30m`
- Transition exhausted deliveries to `dead_letter`.
- Keep recoverable state across process restarts.
- Ensure claimed but unfinished work becomes visible again after worker crash
recovery.
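
The retry schedule can be captured in a small planner. The sketch assumes that each delay is measured from the attempt that just failed and that a delivery gets one initial attempt plus three retries before it moves to `dead_letter`; the plan lists the delays but does not state either detail explicitly. `NextRetry` is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelays holds the planned backoff steps: 1m, 5m, 30m.
var retryDelays = []time.Duration{time.Minute, 5 * time.Minute, 30 * time.Minute}

// NextRetry returns the delay before the next attempt, given the 1-based
// number of the attempt that just failed transiently. ok is false when the
// retry budget is exhausted and the delivery should become dead_letter.
func NextRetry(failedAttemptNo int) (delay time.Duration, ok bool) {
	if failedAttemptNo < 1 || failedAttemptNo > len(retryDelays) {
		return 0, false
	}
	return retryDelays[failedAttemptNo-1], true
}

func main() {
	for attempt := 1; attempt <= 4; attempt++ {
		delay, ok := NextRetry(attempt)
		if !ok {
			fmt.Printf("attempt %d failed: dead_letter\n", attempt)
			continue
		}
		fmt.Printf("attempt %d failed: retry in %s\n", attempt, delay)
	}
}
```

The scheduler would add the returned delay to the current time and use the result as the score of the delivery's entry in `mail:attempt_schedule`.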
### Artifacts
- scheduler worker
- attempt worker
- retry planner
- dead-letter writer
### Exit Criteria
- the service survives restarts and resumes scheduled work without duplicate
attempt ownership
### Targeted Tests
- immediate first attempt
- transient retry chain to success
- retry exhaustion to dead letter
- crash recovery of in-progress attempt ownership
## ~~Stage 12.~~ Implement the Operator API
Status: implemented.
### Goal
Provide trusted read and resend controls without direct Redis access.
### Tasks
- Implement delivery lookup by `delivery_id`.
- Implement filtered list with deterministic cursor pagination.
- Implement attempt history reads.
- Implement resend clone creation.
- Freeze cursor format as opaque base64 of `created_at_ms:delivery_id`.
- Reject resend for non-terminal statuses.
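
The cursor format is simple enough to show as a round trip. The base64 variant is not stated in this plan; the sketch assumes unpadded URL-safe encoding so that the cursor can be used in a query string unescaped. `EncodeCursor` and `DecodeCursor` are hypothetical names.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// EncodeCursor builds the opaque cursor from created_at_ms and delivery_id.
func EncodeCursor(createdAtMs int64, deliveryID string) string {
	raw := strconv.FormatInt(createdAtMs, 10) + ":" + deliveryID
	return base64.RawURLEncoding.EncodeToString([]byte(raw))
}

// DecodeCursor reverses EncodeCursor and rejects malformed input. Only the
// first ":" separates the two parts, so a delivery ID may contain colons.
func DecodeCursor(cursor string) (createdAtMs int64, deliveryID string, err error) {
	raw, err := base64.RawURLEncoding.DecodeString(cursor)
	if err != nil {
		return 0, "", fmt.Errorf("invalid cursor: %w", err)
	}
	ms, id, found := strings.Cut(string(raw), ":")
	if !found || id == "" {
		return 0, "", fmt.Errorf("invalid cursor: missing delivery_id")
	}
	createdAtMs, err = strconv.ParseInt(ms, 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("invalid cursor: %w", err)
	}
	return createdAtMs, id, nil
}

func main() {
	cursor := EncodeCursor(1713371956000, "d-1")
	ms, id, err := DecodeCursor(cursor)
	fmt.Println(cursor, ms, id, err)
}
```

Pairing the timestamp with `delivery_id` gives a total order even when several deliveries share the same `created_at_ms`, which is what makes pagination deterministic.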
### Artifacts
- operator HTTP handlers
- list query DTOs
- resend service
### Exit Criteria
- operators can inspect and resend deliveries safely through the service API
### Targeted Tests
- list filtering by recipient, status, source, template, and idempotency key
- cursor pagination tests
- resend allowed for terminal states only
- resend creates a linked clone rather than mutating the original
## ~~Stage 13.~~ Add Observability and Runbook Coverage
Status: implemented.
### Goal
Make the service operable without reading the code.
### Tasks
- Add counters for:
  - accepted auth deliveries
  - accepted generic deliveries
  - suppressed deliveries
  - delivery statuses
  - attempt outcomes
  - dead letters
  - locale fallback
- Add gauges or histograms for:
  - scheduled depth
  - oldest scheduled age
  - SMTP latency
- Add structured logs with:
  - `delivery_id`
  - `source`
  - `template_id`
  - `attempt_no`
- Add traces around:
  - acceptance
  - rendering
  - provider send
  - resend
- Write operator runbook content for:
  - backlog growth
  - dead-letter spikes
  - repeated suppressions
  - SMTP auth or timeout failures
  - malformed stream commands
### Artifacts
- telemetry runtime
- logging helpers
- runbook section drafts
### Exit Criteria
- common failure modes are visible and actionable
### Targeted Tests
- metric emission tests
- log field presence tests
- trace smoke tests where practical
## ~~Stage 14.~~ Complete the Test Matrix
Status: implemented.
### Goal
Reach a safe verification baseline across unit, integration, and end-to-end
scenarios.
### Tasks
- Add unit tests for:
  - validation
  - state transitions
  - idempotency
  - rendering
  - provider classification
  - retry planning
- Add integration tests for:
  - auth REST to durable delivery
  - stream command to durable delivery
  - attempt execution against stub provider
  - operator API against Redis-backed state
- Add end-to-end scenarios for:
  - auth `sent`
  - auth `suppressed`
  - template locale fallback
  - transient retry to success
  - retry exhaustion to dead letter
  - duplicate idempotency key
  - resend clone
  - graceful shutdown with pending work
### Artifacts
- unit test suite
- integration harness
- end-to-end scenarios
### Exit Criteria
- the planned behavior is covered closely enough to refactor safely
### Targeted Tests
- execute the smallest relevant subset:
  - `go test ./mail/...`
  - focused integration packages once they exist
## ~~Stage 15.~~ Align Cross-Service Documentation
Status: implemented.
### Goal
Update existing documentation so the repository tells one coherent story about
`Mail Service`.
### Tasks
- Update `ARCHITECTURE.md`:
  - direct auth mail is synchronous trusted REST
  - generic notification mail is asynchronous through `Notification Service`
  - clarify that durable acceptance may precede SMTP completion
- Update `geoprofile` docs:
  - remove direct `Geo Profile Service -> Mail Service`
  - route optional admin mail through `Notification Service`
- Update `authsession` docs:
  - clarify localized mail acceptance semantics
  - clarify that `sent` means accepted into the mail pipeline
- Update `gateway` docs:
  - document `Accept-Language` as the public auth locale source
  - keep JSON bodies unchanged
- Update `user` docs:
  - document the auth-provided preferred-language candidate rule for new-user
    creation
### Artifacts
- aligned service READMEs and docs
- aligned architecture narrative
### Exit Criteria
- no first-class document contradicts the new `Mail Service` model
### Targeted Tests
- documentation review
- contract-document sync review
## Final Acceptance Checklist
The implementation is complete only when all of the following hold:
- the process starts with Redis and stub provider config
- auth REST intake works with explicit idempotency
- async generic stream intake works with duplicate suppression
- template rendering and locale fallback are deterministic
- SMTP and stub providers both work through the same port
- retries and dead-letter flow operate after restarts
- operator reads and resend clone work
- metrics, logs, and traces cover the main failure modes
- repository documentation is aligned with the final service model