feat: use postgres
@@ -9,6 +9,7 @@ Sections:
|
||||
- [Main flows](flows.md)
|
||||
- [Configuration and contract examples](examples.md)
|
||||
- [Operator runbook](runbook.md)
|
||||
- [PostgreSQL migration decisions (Stage 4 of `PG_PLAN.md`)](postgres-migration.md)
|
||||
|
||||
Primary references:
|
||||
|
||||
|
||||
@@ -8,7 +8,9 @@ unless explicitly stated otherwise.
|
||||
Minimal local runtime with stub provider:
|
||||
|
||||
```dotenv
|
||||
MAIL_REDIS_ADDR=127.0.0.1:6379
|
||||
MAIL_REDIS_MASTER_ADDR=127.0.0.1:6379
|
||||
MAIL_REDIS_PASSWORD=local
|
||||
MAIL_POSTGRES_PRIMARY_DSN=postgres://mailservice:mailservice@127.0.0.1:5432/galaxy?search_path=mail&sslmode=disable
|
||||
MAIL_INTERNAL_HTTP_ADDR=:8080
|
||||
MAIL_TEMPLATE_DIR=templates
|
||||
MAIL_SMTP_MODE=stub
|
||||
@@ -20,7 +22,9 @@ OTEL_METRICS_EXPORTER=none
|
||||
SMTP-backed shape:
|
||||
|
||||
```dotenv
|
||||
MAIL_REDIS_ADDR=127.0.0.1:6379
|
||||
MAIL_REDIS_MASTER_ADDR=127.0.0.1:6379
|
||||
MAIL_REDIS_PASSWORD=local
|
||||
MAIL_POSTGRES_PRIMARY_DSN=postgres://mailservice:mailservice@127.0.0.1:5432/galaxy?search_path=mail&sslmode=disable
|
||||
MAIL_INTERNAL_HTTP_ADDR=:8080
|
||||
MAIL_TEMPLATE_DIR=templates
|
||||
|
||||
|
||||
+21
-20
@@ -6,22 +6,22 @@
|
||||
sequenceDiagram
|
||||
participant Auth as Auth / Session Service
|
||||
participant Mail as Mail Service
|
||||
participant Redis
|
||||
participant Postgres
|
||||
participant Scheduler
|
||||
participant SMTP as Provider
|
||||
|
||||
Auth->>Mail: POST /api/v1/internal/login-code-deliveries + Idempotency-Key
|
||||
Mail->>Mail: validate request and idempotency scope
|
||||
alt MAIL_SMTP_MODE = stub
|
||||
Mail->>Redis: persist delivery as suppressed
|
||||
Mail->>Postgres: persist delivery as suppressed
|
||||
Mail-->>Auth: 200 {outcome=suppressed}
|
||||
else MAIL_SMTP_MODE = smtp
|
||||
Mail->>Redis: persist delivery as queued + attempt #1 scheduled
|
||||
Mail->>Postgres: persist delivery as queued + attempt #1 scheduled
|
||||
Mail-->>Auth: 200 {outcome=sent}
|
||||
Scheduler->>Redis: claim due attempt
|
||||
Scheduler->>Postgres: claim due attempt (FOR UPDATE SKIP LOCKED)
|
||||
Scheduler->>SMTP: send rendered auth mail
|
||||
SMTP-->>Scheduler: accepted or classified failure
|
||||
Scheduler->>Redis: commit sent / retry / failed / dead_letter
|
||||
Scheduler->>Postgres: commit sent / retry / failed / dead_letter
|
||||
end
|
||||
```
|
||||
|
||||
@@ -36,16 +36,17 @@ sequenceDiagram
|
||||
participant Stream as Redis Stream mail:delivery_commands
|
||||
participant Consumer as Command consumer
|
||||
participant Mail as Mail Service
|
||||
participant Postgres
|
||||
participant Redis
|
||||
|
||||
Notify->>Stream: XADD generic command
|
||||
Consumer->>Stream: XREAD from last stored offset
|
||||
Consumer->>Mail: decode and validate command
|
||||
alt malformed or conflicting command
|
||||
Mail->>Redis: record malformed command entry
|
||||
Mail->>Postgres: record malformed command entry
|
||||
Consumer->>Redis: save stream offset
|
||||
else valid command
|
||||
Mail->>Redis: persist delivery + first attempt + optional payload bundle
|
||||
Mail->>Postgres: persist delivery + first attempt + optional payload bundle
|
||||
Consumer->>Redis: save stream offset
|
||||
end
|
||||
```
|
||||
@@ -55,29 +56,29 @@ sequenceDiagram
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Scheduler
|
||||
participant Redis
|
||||
participant Postgres
|
||||
participant Worker as Attempt worker
|
||||
participant SMTP as Provider
|
||||
|
||||
Scheduler->>Redis: find next due delivery
|
||||
Scheduler->>Redis: load work item
|
||||
Scheduler->>Postgres: find next due delivery (next_attempt_at <= now)
|
||||
Scheduler->>Postgres: load work item (delivery + active attempt)
|
||||
alt template delivery not yet rendered
|
||||
Scheduler->>Redis: render and store materialized content
|
||||
Scheduler->>Postgres: render and store materialized content
|
||||
end
|
||||
Scheduler->>Redis: claim scheduled attempt
|
||||
Scheduler->>Postgres: claim scheduled attempt (FOR UPDATE SKIP LOCKED)
|
||||
Scheduler->>Worker: enqueue claimed work
|
||||
Worker->>SMTP: send materialized message
|
||||
SMTP-->>Worker: accepted / suppressed / transient_failure / permanent_failure
|
||||
alt accepted
|
||||
Worker->>Redis: commit sent + provider_accepted
|
||||
Worker->>Postgres: commit sent + provider_accepted
|
||||
else suppressed
|
||||
Worker->>Redis: commit suppressed + provider_rejected
|
||||
Worker->>Postgres: commit suppressed + provider_rejected
|
||||
else transient failure before retry budget ends
|
||||
Worker->>Redis: commit transport_failed|timed_out + next scheduled attempt
|
||||
Worker->>Postgres: commit transport_failed|timed_out + next scheduled attempt
|
||||
else retry budget exhausted
|
||||
Worker->>Redis: commit dead_letter + dead-letter entry
|
||||
Worker->>Postgres: commit dead_letter + dead-letter entry
|
||||
else permanent failure
|
||||
Worker->>Redis: commit failed + provider_rejected
|
||||
Worker->>Postgres: commit failed + provider_rejected
|
||||
end
|
||||
```
|
||||
|
||||
@@ -87,12 +88,12 @@ sequenceDiagram
|
||||
sequenceDiagram
|
||||
participant Ops as Trusted operator
|
||||
participant Mail as Mail Service
|
||||
participant Redis
|
||||
participant Postgres
|
||||
|
||||
Ops->>Mail: POST /api/v1/internal/deliveries/{delivery_id}/resend
|
||||
Mail->>Redis: load original delivery and optional payload bundle
|
||||
Mail->>Postgres: load original delivery and optional payload bundle
|
||||
Mail->>Mail: verify original status is terminal
|
||||
Mail->>Redis: create clone delivery with source=operator_resend
|
||||
Mail->>Postgres: create clone delivery with source=operator_resend
|
||||
Mail-->>Ops: 200 {delivery_id=<clone>}
|
||||
```
|
||||
|
||||
|
||||
@@ -0,0 +1,236 @@
# PostgreSQL Migration

PG_PLAN.md §4 migrated `galaxy/mail` from a Redis-only durable store to the
steady-state split codified in `ARCHITECTURE.md §Persistence Backends`:
PostgreSQL is the source of truth for table-shaped business state, and Redis
keeps only the inbound `mail:delivery_commands` stream and its persisted
consumer offset.

This document records the schema decisions and the non-obvious agreements
behind them. Use it together with the migration script
(`internal/adapters/postgres/migrations/00001_init.sql`) and the runtime
wiring (`internal/app/runtime.go`).

## Outcomes

- Schema `mail` (provisioned externally) holds the durable state:
  `deliveries`, `delivery_recipients`, `attempts`, `dead_letters`,
  `delivery_payloads`, `malformed_commands`.
- The runtime opens one PostgreSQL pool via `pkg/postgres.OpenPrimary`,
  applies embedded goose migrations strictly before any HTTP listener
  becomes ready, and exits non-zero when migration or ping fails.
- The runtime opens one shared `*redis.Client` via
  `pkg/redisconn.NewMasterClient` and passes it to the command consumer and
  the stream offset store; neither holds its own connection topology fields
  any longer.
- The Redis adapter package (`internal/adapters/redisstate/`) is reduced to
  the surviving `StreamOffsetStore` plus a slim `Keyspace` exposing only
  `StreamOffset(stream)` and `DeliveryCommands()`. The Lua-backed atomic
  writer, the secondary index keys, the recipient/template/status indexes,
  the idempotency keyspace, and the per-record TTL constants are gone.
- Configuration drops `MAIL_REDIS_USERNAME`, `MAIL_REDIS_TLS_ENABLED`,
  `MAIL_REDIS_ATTEMPT_SCHEDULE_KEY`, `MAIL_REDIS_DEAD_LETTER_PREFIX`,
  `MAIL_DELIVERY_TTL`, and `MAIL_ATTEMPT_TTL`. `MAIL_REDIS_ADDR` becomes
  `MAIL_REDIS_MASTER_ADDR` + optional `MAIL_REDIS_REPLICA_ADDRS`.
  PostgreSQL-specific knobs live under `MAIL_POSTGRES_*`. New retention
  knobs (`MAIL_DELIVERY_RETENTION`, `MAIL_MALFORMED_COMMAND_RETENTION`,
  `MAIL_CLEANUP_INTERVAL`) drive a periodic SQL retention worker.

## Decisions

### 1. One schema, externally-provisioned role

**Decision.** The `mail` schema and the matching `mailservice` role are
created outside the migration sequence (in tests, by
`integration/internal/harness/postgres_container.go::EnsureRoleAndSchema`;
in production, by an ops init script not in scope for this stage). The
embedded migration `00001_init.sql` only contains DDL for tables and
indexes and assumes it runs as the schema owner with `search_path=mail`.

**Why.** Mixing role creation, schema creation, and table DDL into one
script forces every consumer of the migration to run as a superuser. The
schema-per-service architectural rule
(`ARCHITECTURE.md §Persistence Backends`) lines up neatly with the
operational split: ops provisions roles and schemas, the service applies
schema-scoped migrations.

### 2. Idempotency record IS the deliveries row

**Decision.** The deliveries table carries `source`,
`idempotency_key`, `request_fingerprint`, and `idempotency_expires_at`
columns and a `UNIQUE (source, idempotency_key)` constraint. Acceptance
flows insert the row directly; a duplicate request races on the UNIQUE
constraint and surfaces as `acceptauthdelivery.ErrConflict` /
`acceptgenericdelivery.ErrConflict`. There is no separate idempotency
table.

**Why.** PG_PLAN.md §3 fixed this rule for every PG-backed service. With
the reservation living on the durable record, recovery is a single fact
("the row either exists or it does not"); no Redis-loss window can make a
duplicate sneak through. Resend deliveries store an empty
`request_fingerprint` and a synthetic far-future `idempotency_expires_at`;
the read helper treats those rows as non-idempotent so future operator
queries cannot mistake a clone for a hit.

### 3. Recipients live in a normalised side table

**Decision.** A `delivery_recipients(delivery_id, kind, position, email)`
table stores envelope addresses with a `kind` CHECK constraint
(`'to'|'cc'|'bcc'|'reply_to'`) and an `email` index that excludes
`reply_to`. The deliveries row does not embed envelope JSON.

**Why.** PG_PLAN.md §4 prescribed `INDEX on … recipient as needed`. A
normalised table makes future recipient-filtered listing slot in without
schema work and lets the existing operator listing implement the
recipient filter as `delivery_id IN (SELECT … FROM delivery_recipients
WHERE … lower(email) = lower($1))`. The Redis adapter previously
maintained one index key per recipient; the same observable behaviour
now comes for free from the PostgreSQL row layout plus a single index.

### 4. Timestamps are uniformly `timestamptz` and always UTC at the boundary

**Decision.** Every time-valued column on every Stage 4 table uses
PostgreSQL's `timestamptz`. The domain model continues to use
`time.Time` / `*time.Time`; the adapter normalises every `time.Time`
parameter to UTC at the binding site (`record.X.UTC()` or the
`nullableTime` helper that wraps `*time.Time`), and re-wraps every
scanned `time.Time` with `.UTC()` (directly or via `timeFromNullable`)
before it leaves the adapter. The architecture-wide form of this rule
lives in `ARCHITECTURE.md §Persistence Backends → Timestamp handling`.

**Why.** PG_PLAN.md §4 originally specified mixed column types
(`timestamptz` on deliveries, `bigint` epoch-ms on attempts/dead_letters/
malformed_commands). User Service Stage 3 already uses `timestamptz` for
every table and the runtime contract tests expect Go-level `time.Time`
semantics throughout. Keeping the same shape across services reduces
adapter-layer complexity and avoids two parallel encoding paths in the
mailstore. The deviation from the literal plan is intentional and is
documented here. The defensive UTC rule on both sides eliminates the
class of bug where the pgx driver returns scanned values in
`time.Local`, which silently breaks equality tests, JSON formatting,
and comparison against pointer fields.

### 5. Attempt scheduler reads via `FOR UPDATE SKIP LOCKED`

**Decision.** The attempt scheduler uses two indexed predicates:

- `SELECT delivery_id FROM deliveries WHERE next_attempt_at IS NOT NULL
  AND next_attempt_at <= $now ORDER BY next_attempt_at ASC LIMIT $n` to
  surface due deliveries (partial index `deliveries_due_idx`).
- `SELECT … FROM deliveries WHERE delivery_id = $id AND status IN
  ('queued','rendered') AND next_attempt_at IS NOT NULL AND next_attempt_at
  <= $now FOR UPDATE SKIP LOCKED` inside the claim transaction.

The `next_attempt_at` column is maintained explicitly: acceptance and
attempt-commit transactions write it from the active scheduled attempt;
claim sets it to NULL (the row is now `sending` and stops being a
scheduling candidate); a recovery commit re-populates it for the next
attempt.

**Why.** `FOR UPDATE SKIP LOCKED` lets multiple scheduler instances run
concurrently without serialising work on a single sorted set. Maintaining
`next_attempt_at` in lockstep with the active attempt keeps the partial
index small and avoids reading attempt rows during the hot-path schedule
query. The previous Redis ZSET sort key was implicit; the SQL column is
explicit, which removes a class of "the index is out of sync with the
record" bugs that Lua-coordinated mutations made possible.

### 6. Recovery uses the most-recent attempt by exact `attempt_no`

**Decision.** `LoadWorkItem(deliveryID)` reads the delivery row and then
the attempt row whose `attempt_no = delivery.attempt_count`. Concurrent
commits that update the count and insert a new attempt are tolerated:
the load lookup uses an exact key and never observes a partial state.

**Why.** A naive `ORDER BY attempt_no DESC LIMIT 1` racing against a
commit that already wrote the next attempt but had not yet committed
the parent delivery row could observe `attempt_no=count+1` while the
delivery still reports `count`. Keying the read by the count
deterministically returns the delivery's view of its own active attempt
even under concurrent worker progress.
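
The difference between the two lookups can be shown with a small in-memory model. The `attempt` and `delivery` types and both functions are hypothetical simplifications, not the mailstore's types; the slice stands in for the rows a reader might see in the race described above.

```go
package main

import "fmt"

// attempt and delivery are simplified, hypothetical row shapes.
type attempt struct {
	No     int
	Status string
}

type delivery struct {
	AttemptCount int
}

// latestAttempt mimics ORDER BY attempt_no DESC LIMIT 1.
func latestAttempt(attempts []attempt) (attempt, bool) {
	var best attempt
	found := false
	for _, a := range attempts {
		if !found || a.No > best.No {
			best, found = a, true
		}
	}
	return best, found
}

// attemptByCount mimics WHERE attempt_no = delivery.attempt_count.
func attemptByCount(d delivery, attempts []attempt) (attempt, bool) {
	for _, a := range attempts {
		if a.No == d.AttemptCount {
			return a, true
		}
	}
	return attempt{}, false
}

func main() {
	// The racy view: attempt #2 is already visible, the count still says 1.
	d := delivery{AttemptCount: 1}
	attempts := []attempt{
		{No: 1, Status: "transport_failed"},
		{No: 2, Status: "scheduled"},
	}

	naive, _ := latestAttempt(attempts)
	exact, _ := attemptByCount(d, attempts)
	fmt.Println(naive.No, exact.No)
}
```

The naive lookup returns attempt 2, which disagrees with the delivery's own count, while the exact-key lookup returns attempt 1 and stays consistent with the delivery row that was read.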

### 7. Periodic SQL retention replaces Redis index cleanup

**Decision.** A new `worker.SQLRetentionWorker` runs two config-driven
DELETE statements:

- `DELETE FROM deliveries WHERE created_at < now() - $delivery_retention`
  cascades to `attempts`, `dead_letters`, `delivery_payloads`, and
  `delivery_recipients` via `ON DELETE CASCADE`.
- `DELETE FROM malformed_commands WHERE recorded_at < now() - $malformed_retention`
  is a standalone retention pass.

Three new env vars (`MAIL_DELIVERY_RETENTION`, `MAIL_MALFORMED_COMMAND_RETENTION`,
`MAIL_CLEANUP_INTERVAL`) drive the worker. `MAIL_IDEMPOTENCY_TTL` survives
unchanged: it controls the per-acceptance `idempotency_expires_at` column
the service layer materialises on each row.

**Why.** PostgreSQL maintains its own indexes; the previous
`redisstate.IndexCleaner` had nothing to do once secondary index keys
were gone. A per-table retention worker is the simplest model that keeps
the mail database from accumulating audit history forever, while leaving
the per-acceptance idempotency window controlled by its existing knob.

### 8. Shared Redis client with consumer-driven shutdown

**Decision.** `internal/app/runtime.go` constructs one
`redisconn.NewMasterClient(cfg.Redis.Conn)` and passes it to both the
stream offset store and the command consumer. The consumer's `Shutdown`
closes the shared client to break the in-flight blocking `XREAD`; the
runtime's cleanup function tolerates `redis.ErrClosed` so a double-close
is benign.

**Why.** Each subsequent PG_PLAN stage (Notification, Lobby) ships a
similar pattern; sharing one client is the shape we want all stages to
converge on. The dedicated client for the consumer was an artefact of
the Redis-only architecture and multiplied TCP connections, ping points,
and OpenTelemetry instrumentation hooks for no functional benefit.

### 9. Query layer is `go-jet/jet/v2`

**Decision.** All `mailstore` packages build SQL through the jet
builder API (`pgtable.<Table>.INSERT/SELECT/UPDATE/DELETE` plus the
`pg.AND/OR/SET/IN/...` DSL). `cmd/jetgen` (invoked via `make jet`)
brings up a transient PostgreSQL container, applies the embedded
migrations, and runs
`github.com/go-jet/jet/v2/generator/postgres.GenerateDB` against the
provisioned schema; the generated table/model code lives under
`internal/adapters/postgres/jet/mail/{model,table}/*.go` and is
committed to the repo, so build consumers do not need Docker.
Statements are run through the `database/sql` API
(`stmt.Sql() → db/tx.Exec/Query/QueryRow`); manual scanners preserve
the codecs.go boundary translations and domain-type mapping.

**Why.** Aligns with `PG_PLAN.md` §Library stack ("Query layer:
`github.com/go-jet/jet/v2` (PostgreSQL dialect). Generated code lives
under each service `internal/adapters/postgres/jet/`, regenerated via
a `make jet` target and committed to the repo"). Constructs the jet
builder does not cover natively (`FOR UPDATE`, `FOR UPDATE SKIP
LOCKED`, keyset-pagination row-comparison, JSONB params,
`LOWER(...)` on subselects) are expressed through the per-DSL helpers
(`.FOR(pg.UPDATE())`, `.FOR(pg.UPDATE().SKIP_LOCKED())`, `pg.LOWER`,
`OR/AND` expansion of cursor predicates).

## Cross-References

- `PG_PLAN.md §4` (Stage 4: Mail Service migration).
- `ARCHITECTURE.md §Persistence Backends`.
- `internal/adapters/postgres/migrations/00001_init.sql` and
  `internal/adapters/postgres/migrations/migrations.go`.
- `internal/adapters/postgres/mailstore/{store,deliveries,
  auth_acceptance,generic_acceptance,render,operator,
  attempt_execution,malformed_command,codecs,helpers}.go` plus the
  testcontainers-backed unit suite under
  `mailstore/{harness,store}_test.go`.
- `internal/adapters/postgres/jet/mail/{model,table}/*.go` (committed
  generated code) plus `cmd/jetgen/main.go` and the `make jet`
  Makefile target that regenerate it.
- `internal/config/{config,env,validation}.go` (PostgresConfig + the
  `redisconn.Config`-shaped Redis envelope).
- `internal/app/{runtime,bootstrap}.go` (shared Redis client + PG pool
  open + migration + mailstore wiring).
- `internal/worker/sqlretention.go` (periodic SQL retention worker).
- `internal/adapters/redisstate/{keyspace,offset_codec,stream_offset_store}.go`
  (surviving slim Redis surface).
- `integration/internal/harness/mailservice.go` (per-suite Postgres
  container + `mail`/`mailservice` provisioning).
+22
-13
@@ -7,21 +7,25 @@ verification, shutdown, and common `Mail Service` incidents.
|
||||
|
||||
Before starting the process, confirm:
|
||||
|
||||
- `MAIL_REDIS_ADDR` points to the Redis deployment that stores deliveries,
|
||||
attempts, idempotency reservations, malformed commands, and stream offsets
|
||||
- the configured Redis ACL, DB, TLS, and timeout settings match the target
|
||||
environment
|
||||
- `MAIL_REDIS_MASTER_ADDR` and `MAIL_REDIS_PASSWORD` point to the Redis
|
||||
deployment that hosts the inbound `mail:delivery_commands` Stream and the
|
||||
persisted consumer offset
|
||||
- `MAIL_POSTGRES_PRIMARY_DSN` points to the PostgreSQL deployment whose
|
||||
`mail` schema (provisioned externally for the `mailservice` role) holds the
|
||||
durable mail state — deliveries, attempts, dead letters, payloads,
|
||||
idempotency reservations, malformed commands
|
||||
- `MAIL_TEMPLATE_DIR` points to the intended immutable template catalog
|
||||
- if `MAIL_SMTP_MODE=smtp`, the SMTP address, sender identity, and optional
|
||||
credentials are configured together
|
||||
- the OpenTelemetry exporter settings point at the intended collector when
|
||||
traces or metrics are expected outside the process
|
||||
|
||||
At startup the process performs bounded `PING` checks for both Redis clients
|
||||
used by the runtime and parses the full template catalog.
|
||||
At startup the process pings the shared Redis master client, opens the
|
||||
PostgreSQL pool, applies embedded goose migrations strictly before any HTTP
|
||||
listener opens, parses the full template catalog, and only then starts the
|
||||
internal HTTP listener and background workers.
|
||||
|
||||
Startup fails fast if those checks fail or if the template catalog cannot be
|
||||
loaded.
|
||||
Startup fails fast if any of those steps fail.
|
||||
|
||||
Known startup caveats:
|
||||
|
||||
@@ -36,11 +40,13 @@ Known startup caveats:
|
||||
Practical readiness verification is:
|
||||
|
||||
1. confirm the process emitted startup logs for the internal HTTP listener,
|
||||
command consumer, scheduler, and worker pool
|
||||
command consumer, scheduler, attempt worker pool, and SQL retention
|
||||
worker
|
||||
2. open a TCP connection to `MAIL_INTERNAL_HTTP_ADDR`
|
||||
3. issue one trusted smoke request such as
|
||||
`GET /api/v1/internal/deliveries/does-not-exist`
|
||||
4. verify Redis connectivity and OpenTelemetry exporter health out of band
|
||||
4. verify Redis and PostgreSQL connectivity, plus OpenTelemetry exporter
|
||||
health, out of band
|
||||
|
||||
Expected steady-state signals:
|
||||
|
||||
@@ -58,14 +64,15 @@ Shutdown behavior:
|
||||
|
||||
- coordinated shutdown is bounded by `MAIL_SHUTDOWN_TIMEOUT`
|
||||
- the internal HTTP listener is stopped before process resources are closed
|
||||
- Redis clients are closed after the app stops
|
||||
- the Redis master client and PostgreSQL pool are closed after the app stops
|
||||
- OpenTelemetry providers are flushed during runtime cleanup
|
||||
|
||||
During a planned restart:
|
||||
|
||||
1. send `SIGTERM`
|
||||
2. wait for listener and worker shutdown logs
|
||||
3. restart the process with the same Redis and template configuration
|
||||
3. restart the process with the same Redis, PostgreSQL, and template
|
||||
configuration
|
||||
4. repeat the steady-state verification steps
|
||||
|
||||
## Incident Triage
|
||||
@@ -81,7 +88,9 @@ Symptoms:
|
||||
Checks:
|
||||
|
||||
1. confirm the scheduler is still logging regular activity
|
||||
2. confirm Redis connectivity and latency for attempt-schedule keys
|
||||
2. confirm PostgreSQL connectivity and latency on the `deliveries`
|
||||
`(next_attempt_at)` partial index — scheduler claims rely on
|
||||
`FOR UPDATE SKIP LOCKED`, so contention here surfaces as backlog
|
||||
3. confirm attempt workers are running and not blocked on SMTP
|
||||
4. inspect `mail.provider.send.duration_ms` for elevated latency
|
||||
5. verify `MAIL_ATTEMPT_WORKER_CONCURRENCY` is appropriate for the workload
|
||||
|
||||
+27
-17
@@ -104,17 +104,21 @@ configuration or unavailable Redis.
|
||||
- processes only already claimed work items
|
||||
- concurrency is controlled by `MAIL_ATTEMPT_WORKER_CONCURRENCY`
|
||||
|
||||
### Cleanup worker
|
||||
### SQL retention worker
|
||||
|
||||
- removes stale delivery-index members after primary delivery expiry
|
||||
- does not clean `mail:attempt_schedule`
|
||||
- does not clean malformed-command index entries
|
||||
- periodically deletes expired `deliveries` rows whose retention window has
|
||||
elapsed; cascades to `attempts`, `dead_letters`, `delivery_payloads`, and
|
||||
`delivery_recipients`
|
||||
- periodically deletes expired `malformed_commands` rows
|
||||
- runs an immediate first pass at startup, then on `MAIL_CLEANUP_INTERVAL`
|
||||
|
||||
## Configuration Groups
|
||||
|
||||
Required for all starts:
|
||||
|
||||
- `MAIL_REDIS_ADDR`
|
||||
- `MAIL_REDIS_MASTER_ADDR`
|
||||
- `MAIL_REDIS_PASSWORD`
|
||||
- `MAIL_POSTGRES_PRIMARY_DSN`
|
||||
|
||||
Core process config:
|
||||
|
||||
@@ -128,16 +132,23 @@ Internal HTTP config:
|
||||
- `MAIL_INTERNAL_HTTP_READ_TIMEOUT`
|
||||
- `MAIL_INTERNAL_HTTP_IDLE_TIMEOUT`
|
||||
|
||||
Redis connectivity:
|
||||
Redis connectivity (`pkg/redisconn` shape):
|
||||
|
||||
- `MAIL_REDIS_USERNAME`
|
||||
- `MAIL_REDIS_MASTER_ADDR`
|
||||
- `MAIL_REDIS_REPLICA_ADDRS`
|
||||
- `MAIL_REDIS_PASSWORD`
|
||||
- `MAIL_REDIS_DB`
|
||||
- `MAIL_REDIS_TLS_ENABLED`
|
||||
- `MAIL_REDIS_OPERATION_TIMEOUT`
|
||||
- `MAIL_REDIS_COMMAND_STREAM`
|
||||
- `MAIL_REDIS_ATTEMPT_SCHEDULE_KEY`
|
||||
- `MAIL_REDIS_DEAD_LETTER_PREFIX`
|
||||
|
||||
PostgreSQL connectivity (`pkg/postgres` shape):
|
||||
|
||||
- `MAIL_POSTGRES_PRIMARY_DSN`
|
||||
- `MAIL_POSTGRES_REPLICA_DSNS`
|
||||
- `MAIL_POSTGRES_OPERATION_TIMEOUT`
|
||||
- `MAIL_POSTGRES_MAX_OPEN_CONNS`
|
||||
- `MAIL_POSTGRES_MAX_IDLE_CONNS`
|
||||
- `MAIL_POSTGRES_CONN_MAX_LIFETIME`
|
||||
|
||||
SMTP provider:
|
||||
|
||||
@@ -157,8 +168,9 @@ Templates and workers:
|
||||
- `MAIL_STREAM_BLOCK_TIMEOUT`
|
||||
- `MAIL_OPERATOR_REQUEST_TIMEOUT`
|
||||
- `MAIL_IDEMPOTENCY_TTL`
|
||||
- `MAIL_DELIVERY_TTL`
|
||||
- `MAIL_ATTEMPT_TTL`
|
||||
- `MAIL_DELIVERY_RETENTION`
|
||||
- `MAIL_MALFORMED_COMMAND_RETENTION`
|
||||
- `MAIL_CLEANUP_INTERVAL`
|
||||
|
||||
Telemetry:
|
||||
|
||||
@@ -174,13 +186,11 @@ Telemetry:
|
||||
## Runtime Notes
|
||||
|
||||
- `MAIL_REDIS_COMMAND_STREAM` is the only Redis key override that currently
|
||||
changes runtime behavior
|
||||
changes runtime behavior; durable mail state otherwise lives in PostgreSQL
|
||||
- `MAIL_SMTP_INSECURE_SKIP_VERIFY` is a local-development escape hatch for
|
||||
self-signed SMTP capture only and should remain disabled in production
|
||||
- attempt-schedule and dead-letter key overrides are parsed but not yet wired
|
||||
into Redis adapters
|
||||
- retention overrides are parsed but storage still uses the fixed `7d`, `30d`,
|
||||
and `90d` values
|
||||
- the SQL retention worker is the only periodic durable cleanup; PostgreSQL
|
||||
indexes are maintained by the engine
|
||||
- template catalog parsing is eager and immutable
|
||||
- auth deliveries in `MAIL_SMTP_MODE=stub` surface as `suppressed`
|
||||
- auth deliveries in `MAIL_SMTP_MODE=smtp` surface as `queued` and later move
|
||||
|
||||