feat: use postgres

Ilia Denisov
2026-04-26 20:34:39 +02:00
committed by GitHub
parent 48b0056b49
commit fe829285a6
365 changed files with 29223 additions and 24049 deletions
+22 -13
@@ -7,21 +7,25 @@ verification, shutdown, and common `Mail Service` incidents.

 Before starting the process, confirm:

-- `MAIL_REDIS_ADDR` points to the Redis deployment that stores deliveries,
-  attempts, idempotency reservations, malformed commands, and stream offsets
-- the configured Redis ACL, DB, TLS, and timeout settings match the target
-  environment
+- `MAIL_REDIS_MASTER_ADDR` and `MAIL_REDIS_PASSWORD` point to the Redis
+  deployment that hosts the inbound `mail:delivery_commands` Stream and the
+  persisted consumer offset
+- `MAIL_POSTGRES_PRIMARY_DSN` points to the PostgreSQL deployment whose
+  `mail` schema (provisioned externally for the `mailservice` role) holds the
+  durable mail state: deliveries, attempts, dead letters, payloads,
+  idempotency reservations, and malformed commands
 - `MAIL_TEMPLATE_DIR` points to the intended immutable template catalog
 - if `MAIL_SMTP_MODE=smtp`, the SMTP address, sender identity, and optional
   credentials are configured together
 - the OpenTelemetry exporter settings point at the intended collector when
   traces or metrics are expected outside the process

-At startup the process performs bounded `PING` checks for both Redis clients
-used by the runtime and parses the full template catalog.
+At startup the process pings the shared Redis master client, opens the
+PostgreSQL pool, applies embedded goose migrations strictly before any HTTP
+listener opens, parses the full template catalog, and only then starts the
+internal HTTP listener and background workers.

-Startup fails fast if those checks fail or if the template catalog cannot be
-loaded.
+Startup fails fast if any of those steps fail.

 Known startup caveats:

@@ -36,11 +40,13 @@ Known startup caveats:
 Practical readiness verification is:

 1. confirm the process emitted startup logs for the internal HTTP listener,
-   command consumer, scheduler, and worker pool
+   command consumer, scheduler, attempt worker pool, and SQL retention
+   worker
 2. open a TCP connection to `MAIL_INTERNAL_HTTP_ADDR`
 3. issue one trusted smoke request such as
    `GET /api/v1/internal/deliveries/does-not-exist`
-4. verify Redis connectivity and OpenTelemetry exporter health out of band
+4. verify Redis and PostgreSQL connectivity, plus OpenTelemetry exporter
+   health, out of band

 Expected steady-state signals:

@@ -58,14 +64,15 @@ Shutdown behavior:

 - coordinated shutdown is bounded by `MAIL_SHUTDOWN_TIMEOUT`
 - the internal HTTP listener is stopped before process resources are closed
-- Redis clients are closed after the app stops
+- the Redis master client and PostgreSQL pool are closed after the app stops
 - OpenTelemetry providers are flushed during runtime cleanup

 During a planned restart:

 1. send `SIGTERM`
 2. wait for listener and worker shutdown logs
-3. restart the process with the same Redis and template configuration
+3. restart the process with the same Redis, PostgreSQL, and template
+   configuration
 4. repeat the steady-state verification steps

 ## Incident Triage
@@ -81,7 +88,9 @@ Symptoms:
 Checks:

 1. confirm the scheduler is still logging regular activity
-2. confirm Redis connectivity and latency for attempt-schedule keys
+2. confirm PostgreSQL connectivity and latency for the partial index on
+   `deliveries (next_attempt_at)`; scheduler claims rely on
+   `FOR UPDATE SKIP LOCKED`, so contention here surfaces as backlog
 3. confirm attempt workers are running and not blocked on SMTP
 4. inspect `mail.provider.send.duration_ms` for elevated latency
 5. verify `MAIL_ATTEMPT_WORKER_CONCURRENCY` is appropriate for the workload
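The claim pattern behind check 2 can be illustrated with `database/sql`. Only these elements come from the runbook: the `deliveries` table in the `mail` schema, the `next_attempt_at` column, the partial index, and `FOR UPDATE SKIP LOCKED`. Everything else is an assumption made for illustration: the index name, its `IS NOT NULL` predicate, the `id` column, and the helper names `partialIndexDDL`, `claimDueSQL`, and `claimDue`.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// partialIndexDDL is a hypothetical shape for the partial index; the index
// name and the IS NOT NULL predicate are assumptions.
const partialIndexDDL = `CREATE INDEX deliveries_next_attempt_at_idx
    ON mail.deliveries (next_attempt_at)
    WHERE next_attempt_at IS NOT NULL`

// claimDueSQL repeats the index predicate verbatim so the query's WHERE
// clause clearly implies it. The id column is an assumption.
const claimDueSQL = `SELECT id
FROM mail.deliveries
WHERE next_attempt_at IS NOT NULL
  AND next_attempt_at <= $1
ORDER BY next_attempt_at
LIMIT $2
FOR UPDATE SKIP LOCKED`

// claimDue locks up to limit due deliveries inside tx. Rows already locked
// by another scheduler are skipped instead of waited on; the locks last
// until tx commits or rolls back.
func claimDue(ctx context.Context, tx *sql.Tx, now time.Time, limit int) ([]int64, error) {
	rows, err := tx.QueryContext(ctx, claimDueSQL, now, limit)
	if err != nil {
		return nil, fmt.Errorf("claim due deliveries: %w", err)
	}
	defer rows.Close()

	var ids []int64
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}

func main() {
	// No database here; print the statements an operator could compare
	// against the live schema and EXPLAIN output.
	fmt.Println(partialIndexDDL)
	fmt.Println(claimDueSQL)
}
```

With `SKIP LOCKED`, a scheduler never waits on rows another transaction has locked; it just claims fewer of them. Backlog therefore tends to come from two sources: slow scans on the index, or claim transactions held open for a long time. Keeping the claiming transaction short is a common mitigation, for example committing before any SMTP work. That is an assumption about how the service should be structured, not something the runbook states. PostgreSQL only considers a partial index when the query's conditions imply the index predicate, and repeating the predicate verbatim is the simplest way to make that hold.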