feat: use postgres
+22
-13
@@ -7,21 +7,25 @@ verification, shutdown, and common `Mail Service` incidents.
 
 Before starting the process, confirm:
 
-- `MAIL_REDIS_ADDR` points to the Redis deployment that stores deliveries,
-  attempts, idempotency reservations, malformed commands, and stream offsets
-- the configured Redis ACL, DB, TLS, and timeout settings match the target
-  environment
+- `MAIL_REDIS_MASTER_ADDR` and `MAIL_REDIS_PASSWORD` point to the Redis
+  deployment that hosts the inbound `mail:delivery_commands` Stream and the
+  persisted consumer offset
+- `MAIL_POSTGRES_PRIMARY_DSN` points to the PostgreSQL deployment whose
+  `mail` schema (provisioned externally for the `mailservice` role) holds the
+  durable mail state: deliveries, attempts, dead letters, payloads,
+  idempotency reservations, malformed commands
 - `MAIL_TEMPLATE_DIR` points to the intended immutable template catalog
 - if `MAIL_SMTP_MODE=smtp`, the SMTP address, sender identity, and optional
   credentials are configured together
 - the OpenTelemetry exporter settings point at the intended collector when
   traces or metrics are expected outside the process
 
-At startup the process performs bounded `PING` checks for both Redis clients
-used by the runtime and parses the full template catalog.
+At startup the process pings the shared Redis master client, opens the
+PostgreSQL pool, applies embedded goose migrations strictly before any HTTP
+listener opens, parses the full template catalog, and only then starts the
+internal HTTP listener and background workers.
 
-Startup fails fast if those checks fail or if the template catalog cannot be
-loaded.
+Startup fails fast if any of those steps fails.
 
 Known startup caveats:
 
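The pre-start checklist in this hunk can be partly automated. The following is a minimal sketch, not part of the commit: the helper names are hypothetical, and it assumes all four variables named in the new checklist are mandatory (the runbook does not say whether `MAIL_REDIS_PASSWORD` may be empty).

```python
import os

# Variable names come from the checklist; the helpers are hypothetical.
REQUIRED_VARS = (
    "MAIL_REDIS_MASTER_ADDR",
    "MAIL_REDIS_PASSWORD",
    "MAIL_POSTGRES_PRIMARY_DSN",
    "MAIL_TEMPLATE_DIR",
)


def missing_settings(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def template_dir_exists(env=None):
    """Report whether MAIL_TEMPLATE_DIR names an existing directory."""
    env = os.environ if env is None else env
    return os.path.isdir(env.get("MAIL_TEMPLATE_DIR", ""))
```

A wrapper script could refuse to launch the process when `missing_settings()` is non-empty. This only checks presence; it does not prove that the DSN or Redis address points at the intended deployment.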
@@ -36,11 +40,13 @@ Known startup caveats:
 Practical readiness verification is:
 
 1. confirm the process emitted startup logs for the internal HTTP listener,
-   command consumer, scheduler, and worker pool
+   command consumer, scheduler, attempt worker pool, and SQL retention
+   worker
 2. open a TCP connection to `MAIL_INTERNAL_HTTP_ADDR`
 3. issue one trusted smoke request such as
    `GET /api/v1/internal/deliveries/does-not-exist`
-4. verify Redis connectivity and OpenTelemetry exporter health out of band
+4. verify Redis and PostgreSQL connectivity, plus OpenTelemetry exporter
+   health, out of band
 
 Expected steady-state signals:
 
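Steps 2 and 3 of the readiness verification can be scripted. This is a rough sketch under two assumptions the runbook does not confirm: the internal listener speaks plain HTTP without TLS, and `MAIL_INTERNAL_HTTP_ADDR` can be split into a host and a port. The function names are hypothetical, and the expected status code for the smoke request is not stated in the source, so the sketch only returns it.

```python
import http.client
import socket

# Smoke path quoted from the runbook.
SMOKE_PATH = "/api/v1/internal/deliveries/does-not-exist"


def tcp_reachable(host, port, timeout=2.0):
    """Step 2: return True if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def smoke_status(host, port, path=SMOKE_PATH, timeout=2.0):
    """Step 3: issue one GET and return the HTTP status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

Any well-formed HTTP response shows that the listener is accepting requests; which status counts as healthy depends on the service's handling of unknown delivery IDs.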
@@ -58,14 +64,15 @@ Shutdown behavior:
 
 - coordinated shutdown is bounded by `MAIL_SHUTDOWN_TIMEOUT`
 - the internal HTTP listener is stopped before process resources are closed
-- Redis clients are closed after the app stops
+- the Redis master client and PostgreSQL pool are closed after the app stops
 - OpenTelemetry providers are flushed during runtime cleanup
 
 During a planned restart:
 
 1. send `SIGTERM`
 2. wait for listener and worker shutdown logs
-3. restart the process with the same Redis, and template configuration
+3. restart the process with the same Redis, PostgreSQL, and template
+   configuration
 4. repeat the steady-state verification steps
 
 ## Incident Triage
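The first two planned-restart steps can be sketched as a bounded wait after `SIGTERM`. This assumes the operator's script supervises the process directly as a child; the helper name is hypothetical. The wait bound would normally be chosen somewhat above `MAIL_SHUTDOWN_TIMEOUT`.

```python
import signal
import subprocess


def stop_gracefully(proc, shutdown_timeout):
    """Send SIGTERM, then wait up to shutdown_timeout seconds for exit.

    Returns the exit code, or None if the process is still running.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=shutdown_timeout)
    except subprocess.TimeoutExpired:
        return None
```

A `None` result means the process outlived the bound and needs investigation before restarting. The sketch deliberately does not escalate to `SIGKILL`.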
@@ -81,7 +88,9 @@ Symptoms:
 Checks:
 
 1. confirm the scheduler is still logging regular activity
-2. confirm Redis connectivity and latency for attempt-schedule keys
+2. confirm PostgreSQL connectivity and latency on the `deliveries`
+   `(next_attempt_at)` partial index; scheduler claims rely on
+   `FOR UPDATE SKIP LOCKED`, so contention here surfaces as backlog
 3. confirm attempt workers are running and not blocked on SMTP
 4. inspect `mail.provider.send.duration_ms` for elevated latency
 5. verify `MAIL_ATTEMPT_WORKER_CONCURRENCY` is appropriate for the workload
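For context on check 2, a claim built on `FOR UPDATE SKIP LOCKED` typically looks like the statement below. This is an illustrative sketch, not the service's actual query: only the `mail` schema, the `deliveries` table, the `next_attempt_at` column, and the locking clause come from the runbook. The `id` column, the `IS NOT NULL` predicate, and the batch-size parameter are assumptions, and the `%(name)s` placeholder style assumes a psycopg-style driver.

```python
# Hypothetical claim statement; see the assumptions listed above.
CLAIM_DUE_SQL = """
SELECT id
FROM mail.deliveries
WHERE next_attempt_at IS NOT NULL
  AND next_attempt_at <= now()
ORDER BY next_attempt_at
LIMIT %(batch_size)s
FOR UPDATE SKIP LOCKED
"""


def claim_params(batch_size):
    """Validate and package the batch size for the claim statement."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {"batch_size": batch_size}
```

Such a statement has to run inside a transaction, because the row locks are held until commit or rollback. Rows locked by another claimer are skipped rather than waited on, so a long-running claim transaction or a slow index scan tends to show up as a growing backlog and not as a lock timeout.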