galaxy-game/mail/PLAN.md
2026-04-17 18:39:16 +02:00

Mail Service Implementation Plan

This plan has already been implemented and is kept here for historical reasons.

It should NOT be treated as the source of truth for service functionality.

Summary

This plan describes the full v1 implementation path for galaxy/mail. It is intentionally decision-complete: the implementer should not need to invent service boundaries, storage layout, contracts, or retry semantics while building the service.

The target outcome is one runnable internal service that:

  • accepts auth-code mail synchronously over trusted REST
  • consumes generic non-auth mail asynchronously from Redis Streams
  • renders templates or accepts pre-rendered content
  • delivers through SMTP or a deterministic stub
  • retries bounded transient failures
  • stores durable delivery audit state
  • exposes trusted operator reads and resend controls

Global Rules

  • keep one logical delivery equal to one SMTP envelope
  • keep suppressed separate from failure
  • require explicit idempotency for every accepted command
  • prefer deterministic Redis-backed scheduling over in-memory timers
  • keep operator inspection possible without direct Redis access
  • treat filesystem templates as the v1 source of truth
  • keep public and trusted contracts explicit and versionable

Target Runtime Layout

mail/
├── cmd/
│   └── mail/
│       └── main.go
├── internal/
│   ├── app/
│   │   ├── app.go
│   │   ├── bootstrap.go
│   │   └── runtime.go
│   ├── config/
│   │   ├── config.go
│   │   ├── env.go
│   │   └── validation.go
│   ├── domain/
│   │   ├── delivery/
│   │   │   ├── model.go
│   │   │   ├── state.go
│   │   │   └── errors.go
│   │   ├── attempt/
│   │   │   ├── model.go
│   │   │   ├── state.go
│   │   │   └── policy.go
│   │   ├── idempotency/
│   │   │   └── model.go
│   │   ├── template/
│   │   │   ├── model.go
│   │   │   └── locale.go
│   │   └── common/
│   │       ├── email.go
│   │       ├── locale.go
│   │       ├── attachment.go
│   │       └── ids.go
│   ├── ports/
│   │   ├── deliverystore.go
│   │   ├── attemptstore.go
│   │   ├── idempotencystore.go
│   │   ├── commandsubscriber.go
│   │   ├── attemptscheduler.go
│   │   ├── templatecatalog.go
│   │   ├── provider.go
│   │   ├── clock.go
│   │   └── idgenerator.go
│   ├── service/
│   │   ├── acceptauthdelivery/
│   │   ├── acceptgenericdelivery/
│   │   ├── executeattempt/
│   │   ├── listdeliveries/
│   │   ├── getdelivery/
│   │   ├── listattempts/
│   │   └── resenddelivery/
│   ├── api/
│   │   ├── internalhttp/
│   │   └── streamcommand/
│   ├── adapters/
│   │   ├── redis/
│   │   ├── smtp/
│   │   ├── templates/
│   │   ├── stubprovider/
│   │   ├── clock/
│   │   └── id/
│   ├── worker/
│   │   ├── command_consumer.go
│   │   ├── scheduler.go
│   │   ├── attempt_worker.go
│   │   └── cleanup_worker.go
│   ├── observability/
│   │   ├── logging.go
│   │   ├── metrics.go
│   │   └── tracing.go
│   └── testkit/
│       ├── redis.go
│       ├── provider.go
│       ├── clock.go
│       ├── templates.go
│       └── commands.go
├── templates/
│   └── ...
├── docs/
│   ├── README.md
│   └── stage-01-vocabulary-and-ownership.md
├── README.md
└── PLAN.md

Target Configuration

Planned environment variables:

  • MAIL_INTERNAL_HTTP_ADDR with default :8080
  • MAIL_REDIS_ADDR required
  • MAIL_REDIS_COMMAND_STREAM with default mail:delivery_commands
  • MAIL_REDIS_ATTEMPT_SCHEDULE_KEY with default mail:attempt_schedule
  • MAIL_REDIS_DEAD_LETTER_PREFIX with default mail:dead_letters:
  • MAIL_SMTP_MODE=stub|smtp with default stub
  • MAIL_SMTP_ADDR required in smtp mode
  • MAIL_SMTP_USERNAME optional
  • MAIL_SMTP_PASSWORD optional
  • MAIL_SMTP_FROM_EMAIL required in smtp mode
  • MAIL_SMTP_FROM_NAME optional
  • MAIL_SMTP_TIMEOUT with default 15s
  • MAIL_TEMPLATE_DIR with default templates
  • MAIL_ATTEMPT_WORKER_CONCURRENCY with default 4
  • MAIL_STREAM_BLOCK_TIMEOUT with default 2s
  • MAIL_OPERATOR_REQUEST_TIMEOUT with default 5s
  • MAIL_IDEMPOTENCY_TTL with default 168h
  • MAIL_DELIVERY_TTL with default 720h
  • MAIL_ATTEMPT_TTL with default 2160h
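
The defaults and required-value rules above could be loaded roughly as follows. This is a minimal sketch covering only a subset of the variables; the `Config` struct, the `LoadConfig` function, and the choice to treat an empty value as unset are assumptions made for illustration, not the service's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Config holds a subset of the planned MAIL_* settings (hypothetical shape).
type Config struct {
	HTTPAddr       string
	RedisAddr      string
	SMTPMode       string
	SMTPAddr       string
	SMTPFromEmail  string
	SMTPTimeout    time.Duration
	IdempotencyTTL time.Duration
}

// LoadConfig reads variables through lookup (os.LookupEnv in a real process),
// applies the documented defaults, and validates required values.
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	// Assumption: an empty value is treated the same as an unset variable.
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}
	duration := func(key, def string) (time.Duration, error) {
		d, err := time.ParseDuration(get(key, def))
		if err != nil {
			return 0, fmt.Errorf("%s: %w", key, err)
		}
		return d, nil
	}

	cfg := Config{
		HTTPAddr:      get("MAIL_INTERNAL_HTTP_ADDR", ":8080"),
		RedisAddr:     get("MAIL_REDIS_ADDR", ""),
		SMTPMode:      get("MAIL_SMTP_MODE", "stub"),
		SMTPAddr:      get("MAIL_SMTP_ADDR", ""),
		SMTPFromEmail: get("MAIL_SMTP_FROM_EMAIL", ""),
	}
	var err error
	if cfg.SMTPTimeout, err = duration("MAIL_SMTP_TIMEOUT", "15s"); err != nil {
		return Config{}, err
	}
	if cfg.IdempotencyTTL, err = duration("MAIL_IDEMPOTENCY_TTL", "168h"); err != nil {
		return Config{}, err
	}

	if cfg.RedisAddr == "" {
		return Config{}, errors.New("MAIL_REDIS_ADDR is required")
	}
	switch cfg.SMTPMode {
	case "stub":
		// No SMTP settings are needed for the deterministic stub.
	case "smtp":
		if cfg.SMTPAddr == "" || cfg.SMTPFromEmail == "" {
			return Config{}, errors.New("MAIL_SMTP_ADDR and MAIL_SMTP_FROM_EMAIL are required in smtp mode")
		}
	default:
		return Config{}, fmt.Errorf("MAIL_SMTP_MODE: unsupported value %q", cfg.SMTPMode)
	}
	return cfg, nil
}

func main() {
	env := map[string]string{"MAIL_REDIS_ADDR": "localhost:6379"}
	cfg, err := LoadConfig(func(k string) (string, bool) { v, ok := env[k]; return v, ok })
	fmt.Println(cfg.HTTPAddr, cfg.SMTPMode, cfg.SMTPTimeout, err)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the validation rules testable without touching the process environment.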

Stage 01. Freeze Vocabulary and Ownership

Status: implemented.

Goal

Freeze the service vocabulary and remove cross-service ambiguity before any implementation work starts.

Tasks

  • Freeze that Mail Service owns delivery acceptance, attempts, retry, suppression, audit, and resend.
  • Freeze that Notification Service owns the business decision to request non-auth mail.
  • Freeze that Auth / Session Service uses the dedicated auth REST contract.
  • Freeze that Geo Profile Service routes optional admin mail through Notification Service, not directly to Mail Service.
  • Freeze that operator APIs are part of v1, not a later add-on.

Artifacts

  • stable service README
  • aligned architecture references
  • list of accepted source values:
    • authsession
    • notification
    • operator_resend

Exit Criteria

  • no document still treats Geo Profile Service as a direct Mail Service caller
  • no document claims that all Mail Service callers use the same transport

Targeted Tests

  • documentation review only

Stage 02. Define the Domain Model and State Rules

Status: implemented.

Goal

Describe the logical delivery entities and freeze their valid state transitions.

Tasks

  • Define mail_delivery, mail_attempt, mail_idempotency_record, mail_template, and mail_dead_letter_entry.
  • Freeze delivery states:
    • accepted
    • queued
    • rendered
    • sending
    • sent
    • suppressed
    • failed
    • dead_letter
  • Freeze attempt states:
    • scheduled
    • in_progress
    • provider_accepted
    • provider_rejected
    • transport_failed
    • timed_out
  • Freeze resend as clone-only with immutable parent history.
  • Freeze terminal-state resend eligibility:
    • sent
    • suppressed
    • failed
    • dead_letter
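
The frozen delivery states and the resend eligibility rule could be expressed as below. This is a sketch only: the `DeliveryStatus` type and `CanResend` helper are hypothetical names, and the full transition table is deliberately left out because this plan lists the states but not the allowed edges.

```go
package main

import "fmt"

// DeliveryStatus is a hypothetical typed wrapper for the frozen delivery states.
type DeliveryStatus string

const (
	StatusAccepted   DeliveryStatus = "accepted"
	StatusQueued     DeliveryStatus = "queued"
	StatusRendered   DeliveryStatus = "rendered"
	StatusSending    DeliveryStatus = "sending"
	StatusSent       DeliveryStatus = "sent"
	StatusSuppressed DeliveryStatus = "suppressed"
	StatusFailed     DeliveryStatus = "failed"
	StatusDeadLetter DeliveryStatus = "dead_letter"
)

// resendable lists the terminal states that the plan marks as resend-eligible.
var resendable = map[DeliveryStatus]bool{
	StatusSent:       true,
	StatusSuppressed: true,
	StatusFailed:     true,
	StatusDeadLetter: true,
}

// CanResend reports whether a delivery in status s may be cloned for resend.
// Unknown statuses are rejected because the map lookup yields false.
func CanResend(s DeliveryStatus) bool {
	return resendable[s]
}

func main() {
	for _, s := range []DeliveryStatus{StatusQueued, StatusSending, StatusSent, StatusDeadLetter} {
		fmt.Printf("%s resendable=%v\n", s, CanResend(s))
	}
}
```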

Artifacts

  • domain models
  • state transition table
  • resend eligibility rules

Exit Criteria

  • every use case can rely on one explicit state machine

Targeted Tests

  • unit tests for allowed and forbidden delivery transitions
  • unit tests for resend eligibility

Stage 03. Freeze the Redis Physical Model

Status: implemented.

Goal

Lock the Redis layout so repository and scheduling adapters can be implemented without revisiting the data design.

Tasks

  • Freeze primary keys:
    • mail:deliveries:<delivery_id>
    • mail:attempts:<delivery_id>:<attempt_no>
    • mail:idempotency:<source>:<idempotency_key>
    • mail:dead_letters:<delivery_id>
  • Freeze scheduler and ingress keys:
    • mail:delivery_commands
    • mail:attempt_schedule
  • Freeze search indexes:
    • mail:idx:recipient:<email>
    • mail:idx:status:<status>
    • mail:idx:source:<source>
    • mail:idx:template:<template_id>
    • mail:idx:idempotency:<source>:<idempotency_key>
    • mail:idx:created_at
  • Freeze storage format:
    • canonical JSON blob in Redis string keys for delivery and attempt records
    • sorted-set indexes scored by created_at_ms
  • Explicitly reject Redis storage for template contents in v1 because the template catalog is filesystem-backed.
  • Freeze retention:
    • idempotency 7d
    • delivery 30d
    • attempts and dead letters 90d
  • Freeze atomic write boundaries:
    • reserve idempotency
    • store delivery
    • schedule first attempt
    • create resend clone
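
One way to keep the key catalog in a single place is a set of small builder functions, sketched below. The function names are hypothetical, and the sketch assumes identifiers are embedded verbatim; whether values containing `:` need escaping is not stated in this plan.

```go
package main

import "fmt"

// Primary record keys from the frozen Redis layout.

func DeliveryKey(deliveryID string) string {
	return "mail:deliveries:" + deliveryID
}

func AttemptKey(deliveryID string, attemptNo int) string {
	return fmt.Sprintf("mail:attempts:%s:%d", deliveryID, attemptNo)
}

func IdempotencyKey(source, idempotencyKey string) string {
	return fmt.Sprintf("mail:idempotency:%s:%s", source, idempotencyKey)
}

func DeadLetterKey(deliveryID string) string {
	return "mail:dead_letters:" + deliveryID
}

// Sorted-set index keys; members are scored by created_at_ms.

func RecipientIndexKey(email string) string {
	return "mail:idx:recipient:" + email
}

func StatusIndexKey(status string) string {
	return "mail:idx:status:" + status
}

const (
	CommandStreamKey   = "mail:delivery_commands"
	AttemptScheduleKey = "mail:attempt_schedule"
	CreatedAtIndexKey  = "mail:idx:created_at"
)

func main() {
	fmt.Println(DeliveryKey("d-123"))
	fmt.Println(AttemptKey("d-123", 2))
	fmt.Println(IdempotencyKey("notification", "abc"))
	fmt.Println(StatusIndexKey("dead_letter"))
}
```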

Artifacts

  • Redis key catalog
  • atomicity notes for Lua or optimistic transaction usage
  • retention and cleanup notes

Exit Criteria

  • the Redis adapters can be implemented without unresolved naming or transactional questions

Targeted Tests

  • repository tests for key naming
  • atomicity tests for duplicate idempotency races
  • cleanup tests for TTL-driven record expiry

Stage 04. Freeze the Auth REST Contract

Status: implemented.

Goal

Define the direct trusted contract from Auth / Session Service.

Tasks

  • Freeze route POST /api/v1/internal/login-code-deliveries.
  • Freeze required Idempotency-Key header.
  • Freeze body fields:
    • email
    • code
    • locale
  • Freeze success outcomes:
    • sent
    • suppressed
  • Freeze trusted error codes:
    • invalid_request
    • internal_error
    • service_unavailable
  • Freeze the meaning of sent as durable acceptance into the mail pipeline, not immediate SMTP completion.
  • Freeze auth-client behavior of no automatic retry on upstream or transport failures.
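
Strict decoding of the request body might look like the sketch below, using the standard library decoder's `DisallowUnknownFields`. The `LoginCodeRequest` type and `DecodeLoginCodeRequest` function are hypothetical, and treating all three fields as required non-empty strings is an assumption; the `Idempotency-Key` header check is not shown.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// LoginCodeRequest mirrors the frozen body fields (hypothetical DTO name).
type LoginCodeRequest struct {
	Email  string `json:"email"`
	Code   string `json:"code"`
	Locale string `json:"locale"`
}

// DecodeLoginCodeRequest rejects unknown fields, trailing data, and empty
// values, mapping every problem to the invalid_request error code.
func DecodeLoginCodeRequest(body []byte) (LoginCodeRequest, error) {
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields()

	var req LoginCodeRequest
	if err := dec.Decode(&req); err != nil {
		return LoginCodeRequest{}, fmt.Errorf("invalid_request: %w", err)
	}
	// A second token of any kind means the body held more than one JSON value.
	if _, err := dec.Token(); err != io.EOF {
		return LoginCodeRequest{}, errors.New("invalid_request: trailing data after JSON body")
	}
	// Assumption: all three fields are required and must be non-empty.
	if req.Email == "" || req.Code == "" || req.Locale == "" {
		return LoginCodeRequest{}, errors.New("invalid_request: email, code, and locale are required")
	}
	return req, nil
}

func main() {
	req, err := DecodeLoginCodeRequest([]byte(`{"email":"user@example.com","code":"123456","locale":"en"}`))
	fmt.Println(req, err)

	_, err = DecodeLoginCodeRequest([]byte(`{"email":"user@example.com","code":"123456","locale":"en","extra":1}`))
	fmt.Println(err)
}
```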

Artifacts

  • request/response DTOs
  • handler contract notes
  • error mapping table

Exit Criteria

  • the auth REST client and server can be built from the frozen contract

Targeted Tests

  • strict JSON decoding tests
  • required header validation tests
  • idempotent repeat request tests
  • sent versus suppressed response tests

Stage 05. Freeze the Async Generic Contract

Status: implemented.

Goal

Define the exact Redis Streams command format used by Notification Service.

Tasks

  • Freeze the stream name mail:delivery_commands.
  • Freeze required fields:
    • delivery_id
    • source
    • payload_mode
    • idempotency_key
    • requested_at_ms
    • payload_json
  • Freeze optional fields:
    • request_id
    • trace_id
  • Freeze that async source accepts only:
    • notification
  • Freeze payload modes:
    • rendered
    • template
  • Freeze the rendered payload shape with:
    • recipient envelope
    • subject
    • text_body
    • optional html_body
    • attachments
  • Freeze the template payload shape with:
    • recipient envelope
    • template_id
    • locale
    • variables
    • attachments
  • Freeze duplicate handling by (source, idempotency_key).
  • Freeze request_id and trace_id as tracing-only metadata excluded from the idempotency fingerprint.
  • Freeze the malformed-command path into dedicated operator-visible mail_malformed_command_entry state outside mail_delivery.
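
A typed decoder for one stream entry could be sketched as follows. It assumes the entry arrives as a flat `map[string]string` (Redis client libraries differ in how they expose stream fields), and the `Command` type, `DecodeCommand` function, and the rejection of unknown fields are illustrative choices rather than the service's confirmed behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Command is a hypothetical typed view of one mail:delivery_commands entry.
type Command struct {
	DeliveryID     string
	Source         string
	PayloadMode    string
	IdempotencyKey string
	RequestedAtMs  int64
	PayloadJSON    json.RawMessage
	RequestID      string // optional, tracing only
	TraceID        string // optional, tracing only
}

var (
	requiredFields = []string{"delivery_id", "source", "payload_mode", "idempotency_key", "requested_at_ms", "payload_json"}
	optionalFields = []string{"request_id", "trace_id"}
)

// DecodeCommand validates the field set and converts it into a Command.
// Any error here would route the entry to the malformed-command path.
func DecodeCommand(fields map[string]string) (Command, error) {
	allowed := make(map[string]bool)
	for _, name := range requiredFields {
		if fields[name] == "" {
			return Command{}, fmt.Errorf("missing required field %q", name)
		}
		allowed[name] = true
	}
	for _, name := range optionalFields {
		allowed[name] = true
	}
	// Assumption: strict decoding rejects fields outside the frozen catalog.
	for name := range fields {
		if !allowed[name] {
			return Command{}, fmt.Errorf("unknown field %q", name)
		}
	}
	if fields["source"] != "notification" {
		return Command{}, fmt.Errorf("unsupported source %q", fields["source"])
	}
	if mode := fields["payload_mode"]; mode != "rendered" && mode != "template" {
		return Command{}, fmt.Errorf("unsupported payload_mode %q", mode)
	}
	requestedAt, err := strconv.ParseInt(fields["requested_at_ms"], 10, 64)
	if err != nil {
		return Command{}, fmt.Errorf("requested_at_ms: %w", err)
	}
	if !json.Valid([]byte(fields["payload_json"])) {
		return Command{}, fmt.Errorf("payload_json is not valid JSON")
	}
	return Command{
		DeliveryID:     fields["delivery_id"],
		Source:         fields["source"],
		PayloadMode:    fields["payload_mode"],
		IdempotencyKey: fields["idempotency_key"],
		RequestedAtMs:  requestedAt,
		PayloadJSON:    json.RawMessage(fields["payload_json"]),
		RequestID:      fields["request_id"],
		TraceID:        fields["trace_id"],
	}, nil
}

func main() {
	cmd, err := DecodeCommand(map[string]string{
		"delivery_id":     "d-1",
		"source":          "notification",
		"payload_mode":    "template",
		"idempotency_key": "k-1",
		"requested_at_ms": "1713371956000",
		"payload_json":    `{"template_id":"auth.login_code","locale":"en"}`,
	})
	fmt.Println(cmd.DeliveryID, cmd.PayloadMode, cmd.RequestedAtMs, err)
}
```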

Artifacts

  • stream field catalog
  • typed stream command contract
  • AsyncAPI specification
  • payload_json schema notes
  • malformed command handling rules

Exit Criteria

  • Notification Service can publish one command without needing a follow-up design round

Targeted Tests

  • strict stream-entry decoding tests
  • duplicate idempotency tests
  • malformed command recording-contract tests
  • rendered and template payload acceptance tests

Stage 06. Build the Runnable Service Skeleton

Status: implemented.

Goal

Create one runnable internal process with config, Redis, HTTP server, and workers.

Tasks

  • Implement cmd/mail.
  • Implement config loading and validation.
  • Wire Redis client, template catalog, provider adapter, HTTP server, and workers.
  • Add graceful shutdown across:
    • HTTP server
    • stream consumer
    • scheduler
    • attempt workers
    • cleanup worker
  • Add startup validation for required Redis and provider config.

Artifacts

  • runnable cmd/mail
  • bootstrap wiring
  • graceful shutdown logic

Exit Criteria

  • the process starts and stops cleanly with valid config

Targeted Tests

  • startup with stub mode
  • startup failure on invalid Redis config
  • graceful shutdown without leaked goroutines

Stage 07. Implement Auth Delivery Acceptance

Status: implemented.

Goal

Accept auth-code deliveries synchronously and durably.

Tasks

  • Implement the auth acceptance use case.
  • Validate email, code, locale, and Idempotency-Key.
  • Classify explicit suppression without treating it as failure.
  • Persist delivery, idempotency record, and first scheduled attempt atomically.
  • Keep suppressed acceptance as the explicit exception that persists only delivery plus idempotency state without a first attempt.
  • Return stable sent or suppressed.
  • Add telemetry for accepted auth requests.
  • Reject mismatched replays with the same idempotency key.
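
The replay rules (identical repeat returns the stored result, mismatched repeat is rejected) can be illustrated with a request fingerprint. The sketch below is an assumption-heavy illustration: hashing the body fields with SHA-256, the `Fingerprint` and `Reserve` names, and the in-memory map standing in for the Redis idempotency record are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Outcome classifies one idempotency reservation attempt.
type Outcome string

const (
	FirstSeen Outcome = "first_seen" // proceed with durable acceptance
	Replay    Outcome = "replay"     // return the previously stored result
	Conflict  Outcome = "conflict"   // same key, different request: reject
)

// Fingerprint hashes the request fields that define "the same request".
// Each part is length-prefixed so ("ab","c") and ("a","bc") differ.
func Fingerprint(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		fmt.Fprintf(h, "%d:%s", len(p), p)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Store is an in-memory stand-in for mail:idempotency:<source>:<key> records.
type Store struct {
	records map[string]string
}

func NewStore() *Store {
	return &Store{records: make(map[string]string)}
}

// Reserve records the fingerprint on first use and classifies repeats.
func (s *Store) Reserve(source, key, fingerprint string) Outcome {
	id := source + ":" + key
	existing, ok := s.records[id]
	switch {
	case !ok:
		s.records[id] = fingerprint
		return FirstSeen
	case existing == fingerprint:
		return Replay
	default:
		return Conflict
	}
}

func main() {
	store := NewStore()
	fp := Fingerprint("user@example.com", "123456", "en")
	fmt.Println(store.Reserve("authsession", "key-1", fp))
	fmt.Println(store.Reserve("authsession", "key-1", fp))
	fmt.Println(store.Reserve("authsession", "key-1", Fingerprint("user@example.com", "654321", "en")))
}
```

In the real service the reservation would have to happen inside the same atomic write as the delivery record, which this in-memory sketch does not attempt to model.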

Artifacts

  • auth acceptance service
  • internal HTTP handler
  • DTO validation and error mapping

Exit Criteria

  • auth requests create one durable delivery or fail closed without partial state

Targeted Tests

  • valid request accepted as sent
  • valid request accepted as suppressed without attempt scheduling
  • duplicate identical request returns same result
  • duplicate mismatched request is rejected
  • Redis persistence failure surfaces 503 service_unavailable

Stage 08. Implement Async Generic Acceptance

Status: implemented.

Goal

Consume generic mail commands from Redis Streams and convert them into durable deliveries.

Tasks

  • Implement plain XREAD-based stream consumption.
  • Decode and validate stream entries.
  • Persist one delivery and schedule one first attempt atomically.
  • Advance the consumer offset only after durable acceptance.
  • Meter malformed entries and record them as operator-visible mail_malformed_command_entry state.
  • Keep duplicate idempotency requests as no-op accepts.

Artifacts

  • stream consumer worker
  • generic acceptance service
  • malformed command recorder

Exit Criteria

  • valid commands are never lost after they are read from the stream

Targeted Tests

  • rendered command acceptance
  • template command acceptance
  • duplicate command no-op behavior
  • malformed command recording
  • consumer restart continuing from the correct offset

Stage 09. Implement the Template Catalog and Rendering

Status: implemented.

Goal

Provide deterministic rendering for template-mode deliveries.

Tasks

  • Implement filesystem-backed template discovery under templates/.
  • Freeze directory layout as <template_id>/<locale>/subject.tmpl, text.tmpl, and optional html.tmpl.
  • Implement locale validation and fallback to en.
  • Record locale_fallback_used.
  • Validate required variables before rendering.
  • Reject unknown variables and missing required variables deterministically.
  • Add dedicated auth template family:
    • auth.login_code
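
The directory layout and the fallback to `en` could be resolved as in the sketch below. The `ResolveTemplateDir` function and its `exists` callback are hypothetical; a real catalog would more likely scan the filesystem once at startup, and locale validation is not shown.

```go
package main

import (
	"fmt"
	"path"
)

// ResolveTemplateDir picks <template_id>/<locale> when subject.tmpl and
// text.tmpl both exist there, otherwise falls back to <template_id>/en.
// The second result corresponds to the locale_fallback_used audit flag.
func ResolveTemplateDir(exists func(string) bool, templateID, locale string) (string, bool, error) {
	for _, candidate := range []string{locale, "en"} {
		dir := path.Join(templateID, candidate)
		// html.tmpl is optional, so only the two required files are checked.
		if exists(path.Join(dir, "subject.tmpl")) && exists(path.Join(dir, "text.tmpl")) {
			return dir, candidate != locale, nil
		}
	}
	return "", false, fmt.Errorf("template %q has no usable %q or en variant", templateID, locale)
}

func main() {
	files := map[string]bool{
		"auth.login_code/en/subject.tmpl": true,
		"auth.login_code/en/text.tmpl":    true,
		"auth.login_code/de/subject.tmpl": true,
		"auth.login_code/de/text.tmpl":    true,
	}
	exists := func(p string) bool { return files[p] }

	for _, locale := range []string{"de", "fr"} {
		dir, fallback, err := ResolveTemplateDir(exists, "auth.login_code", locale)
		fmt.Println(dir, fallback, err)
	}
}
```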

Artifacts

  • template catalog adapter
  • renderer
  • auth template assets

Exit Criteria

  • template mode always produces one final deterministic subject/body bundle or one classified render failure

Targeted Tests

  • exact locale render
  • unsupported locale fallback to en
  • missing en fallback failure
  • missing required variable failure
  • deterministic render snapshots

Stage 10. Implement the Provider Layer

Status: implemented.

Goal

Provide concrete delivery adapters for SMTP and deterministic local testing.

Tasks

  • Freeze provider result classifications:
    • accepted
    • suppressed
    • transient_failure
    • permanent_failure
  • Implement SMTP adapter with:
    • dial/connect
    • optional auth
    • envelope mapping
    • MIME body construction
    • inline attachment mapping
    • timeout classification
  • Implement stub adapter with scriptable outcomes.
  • Redact provider summaries before storing them in audit fields.
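
A provider port with a scriptable stub might be shaped like the sketch below. The `Provider` interface, `Message` type, and `StubProvider` are hypothetical names; defaulting to `accepted` once the script is exhausted is an assumption, and the SMTP adapter is not shown.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Result is one of the frozen provider result classifications.
type Result string

const (
	ResultAccepted         Result = "accepted"
	ResultSuppressed       Result = "suppressed"
	ResultTransientFailure Result = "transient_failure"
	ResultPermanentFailure Result = "permanent_failure"
)

// Message is a simplified stand-in for the rendered mail content.
type Message struct {
	To       string
	Subject  string
	TextBody string
}

// Provider is the port that both the SMTP and stub adapters would satisfy.
type Provider interface {
	Send(ctx context.Context, msg Message) (Result, error)
}

// StubProvider returns scripted results in order, then accepts everything.
type StubProvider struct {
	mu     sync.Mutex
	script []Result
	sent   []Message
}

func NewStubProvider(script ...Result) *StubProvider {
	return &StubProvider{script: script}
}

func (s *StubProvider) Send(ctx context.Context, msg Message) (Result, error) {
	// A cancelled or expired context is reported as a transient failure.
	if err := ctx.Err(); err != nil {
		return ResultTransientFailure, err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sent = append(s.sent, msg)
	if len(s.script) == 0 {
		return ResultAccepted, nil
	}
	next := s.script[0]
	s.script = s.script[1:]
	return next, nil
}

// sendN is a small demo helper that sends n identical messages.
func sendN(p Provider, n int) []Result {
	results := make([]Result, 0, n)
	for i := 0; i < n; i++ {
		r, _ := p.Send(context.Background(), Message{To: "user@example.com", Subject: "hi", TextBody: "body"})
		results = append(results, r)
	}
	return results
}

func main() {
	stub := NewStubProvider(ResultTransientFailure, ResultAccepted)
	fmt.Println(sendN(stub, 3))
}
```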

Artifacts

  • SMTP adapter
  • stub provider adapter
  • MIME builder helpers

Exit Criteria

  • one attempt can be executed against either adapter with stable classified outcomes

Targeted Tests

  • SMTP request construction tests
  • attachment mapping tests
  • timeout classification tests
  • stub scripted outcome tests

Stage 11. Implement the Attempt Scheduler and Workers

Status: implemented.

Goal

Run due attempts exactly once per scheduled slot and apply retry policy.

Tasks

  • Implement mail:attempt_schedule claim logic.
  • Enforce at most one active attempt per delivery.
  • Execute provider calls through the attempt service.
  • Schedule retries at:
    • 1m
    • 5m
    • 30m
  • Transition exhausted deliveries to dead_letter.
  • Keep recoverable state across process restarts.
  • Ensure claimed but unfinished work becomes visible again after worker crash recovery.
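
The retry schedule can be captured in a small planner, sketched below. It assumes the listed values are delays measured from the previous failed attempt and that a fourth transient failure exhausts the delivery; neither detail is spelled out in this plan, and `NextRetry` is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelays holds the planned backoff steps: 1m, 5m, 30m.
var retryDelays = []time.Duration{time.Minute, 5 * time.Minute, 30 * time.Minute}

// NextRetry returns the delay before the next attempt after failedAttempts
// transient failures. ok is false when retries are exhausted, in which case
// the delivery would move to dead_letter.
func NextRetry(failedAttempts int) (delay time.Duration, ok bool) {
	if failedAttempts < 1 || failedAttempts > len(retryDelays) {
		return 0, false
	}
	return retryDelays[failedAttempts-1], true
}

func main() {
	for failed := 1; failed <= 4; failed++ {
		if delay, ok := NextRetry(failed); ok {
			fmt.Printf("after failure %d: retry in %s\n", failed, delay)
		} else {
			fmt.Printf("after failure %d: dead_letter\n", failed)
		}
	}
}
```

Under these assumptions the next attempt's score in `mail:attempt_schedule` would be the failure time plus the returned delay, which keeps scheduling in Redis rather than in in-memory timers.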

Artifacts

  • scheduler worker
  • attempt worker
  • retry planner
  • dead-letter writer

Exit Criteria

  • the service survives restarts and resumes scheduled work without duplicate attempt ownership

Targeted Tests

  • immediate first attempt
  • transient retry chain to success
  • retry exhaustion to dead letter
  • crash recovery of in-progress attempt ownership

Stage 12. Implement the Operator API

Status: implemented.

Goal

Provide trusted read and resend controls without direct Redis access.

Tasks

  • Implement delivery lookup by delivery_id.
  • Implement filtered list with deterministic cursor pagination.
  • Implement attempt history reads.
  • Implement resend clone creation.
  • Freeze cursor format as opaque base64 of created_at_ms:delivery_id.
  • Reject resend for non-terminal statuses.
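
The cursor format can be sketched as a pair of encode and decode helpers. The plan only says base64; the unpadded URL-safe variant used here is an assumption chosen so the cursor can travel in a query string, and the function names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// EncodeCursor packs created_at_ms:delivery_id into an opaque token.
func EncodeCursor(createdAtMs int64, deliveryID string) string {
	raw := strconv.FormatInt(createdAtMs, 10) + ":" + deliveryID
	return base64.RawURLEncoding.EncodeToString([]byte(raw))
}

// DecodeCursor reverses EncodeCursor and rejects malformed tokens.
func DecodeCursor(cursor string) (int64, string, error) {
	raw, err := base64.RawURLEncoding.DecodeString(cursor)
	if err != nil {
		return 0, "", fmt.Errorf("invalid cursor encoding: %w", err)
	}
	// Split on the first colon only, so a delivery ID may itself contain colons.
	msPart, deliveryID, ok := strings.Cut(string(raw), ":")
	if !ok || deliveryID == "" {
		return 0, "", errors.New("invalid cursor: expected created_at_ms:delivery_id")
	}
	createdAtMs, err := strconv.ParseInt(msPart, 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("invalid cursor timestamp: %w", err)
	}
	return createdAtMs, deliveryID, nil
}

func main() {
	cursor := EncodeCursor(1713371956000, "d-123")
	ms, id, err := DecodeCursor(cursor)
	fmt.Println(cursor, ms, id, err)
}
```

Pairing the timestamp with the delivery ID gives a total order even when two deliveries share the same `created_at_ms`, which is what makes the pagination deterministic.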

Artifacts

  • operator HTTP handlers
  • list query DTOs
  • resend service

Exit Criteria

  • operators can inspect and resend deliveries safely through the service API

Targeted Tests

  • list filtering by recipient, status, source, template, and idempotency key
  • cursor pagination tests
  • resend allowed for terminal states only
  • resend creates a linked clone rather than mutating the original

Stage 13. Add Observability and Runbook Coverage

Status: implemented.

Goal

Make the service operable without reading the code.

Tasks

  • Add counters for:
    • accepted auth deliveries
    • accepted generic deliveries
    • suppressed deliveries
    • delivery statuses
    • attempt outcomes
    • dead letters
    • locale fallback
  • Add gauges or histograms for:
    • scheduled depth
    • oldest scheduled age
    • SMTP latency
  • Add structured logs with:
    • delivery_id
    • source
    • template_id
    • attempt_no
  • Add traces around:
    • acceptance
    • rendering
    • provider send
    • resend
  • Write operator runbook content for:
    • backlog growth
    • dead-letter spikes
    • repeated suppressions
    • SMTP auth or timeout failures
    • malformed stream commands

Artifacts

  • telemetry runtime
  • logging helpers
  • runbook section drafts

Exit Criteria

  • common failure modes are visible and actionable

Targeted Tests

  • metric emission tests
  • log field presence tests
  • trace smoke tests where practical

Stage 14. Complete the Test Matrix

Status: implemented.

Goal

Reach a safe verification baseline across unit, integration, and end-to-end scenarios.

Tasks

  • Add unit tests for:
    • validation
    • state transitions
    • idempotency
    • rendering
    • provider classification
    • retry planning
  • Add integration tests for:
    • auth REST to durable delivery
    • stream command to durable delivery
    • attempt execution against stub provider
    • operator API against Redis-backed state
  • Add end-to-end scenarios for:
    • auth sent
    • auth suppressed
    • template locale fallback
    • transient retry to success
    • retry exhaustion to dead letter
    • duplicate idempotency key
    • resend clone
    • graceful shutdown with pending work

Artifacts

  • unit test suite
  • integration harness
  • end-to-end scenarios

Exit Criteria

  • the planned behavior is covered closely enough to refactor safely

Targeted Tests

  • execute the smallest relevant subset:
    • go test ./mail/...
    • focused integration packages once they exist

Stage 15. Align Cross-Service Documentation

Status: implemented.

Goal

Update existing documentation so the repository tells one coherent story about Mail Service.

Tasks

  • Update ARCHITECTURE.md:
    • direct auth mail is synchronous trusted REST
    • generic notification mail is asynchronous through Notification Service
    • clarify that durable acceptance may precede SMTP completion
  • Update geoprofile docs:
    • remove direct Geo Profile Service -> Mail Service
    • route optional admin mail through Notification Service
  • Update authsession docs:
    • clarify localized mail acceptance semantics
    • clarify that sent means accepted into the mail pipeline
  • Update gateway docs:
    • document Accept-Language as the public auth locale source
    • keep JSON bodies unchanged
  • Update user docs:
    • document the auth-provided preferred-language candidate rule for new-user creation

Artifacts

  • aligned service READMEs and docs
  • aligned architecture narrative

Exit Criteria

  • no first-class document contradicts the new Mail Service model

Targeted Tests

  • documentation review
  • contract-document sync review

Final Acceptance Checklist

The implementation is complete only when all of the following hold:

  • the process starts with Redis and stub provider config
  • auth REST intake works with explicit idempotency
  • async generic stream intake works with duplicate suppression
  • template rendering and locale fallback are deterministic
  • SMTP and stub providers both work through the same port
  • retries and dead-letter flow operate after restarts
  • operator reads and resend clone work
  • metrics, logs, and traces cover the main failure modes
  • repository documentation is aligned with the final service model