diplomail (Stage F): docs + edge-case tests + LibreTranslate recipe

Closes the documentation gaps from the freshly-audited diplomail implementation.

FUNCTIONAL.md gains a §11 "Diplomatic mail" with the full user-facing story across all five stages, mirrored into FUNCTIONAL_ru.md as the project conventions require.

A new backend/docs/diplomail-translator-setup.md captures the LibreTranslate operational recipe (Docker image, env wiring, manual smoke test, troubleshooting).

The package README gains a "Multi-instance posture" note documenting the deliberate absence of FOR UPDATE in the worker pickup query: single-instance is safe today; multi-instance scaling will revisit the claim mechanism.

Two small edge-case tests round things out: malformed LibreTranslate response bodies (single string, short array, empty array, missing field) must surface as errors so the worker falls back instead of crashing; and an empty translation queue must produce zero events on three consecutive Worker.Tick calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,164 @@
# LibreTranslate setup for diplomatic mail

This document describes how to run the LibreTranslate backend that the
diplomatic-mail subsystem uses for body translation. The instructions
target three audiences: developers spinning up LibreTranslate
alongside `tools/local-dev`, operators preparing a real deployment,
and reviewers verifying the end-to-end translation flow by hand.

## When you need LibreTranslate

The diplomatic-mail worker runs unconditionally, so `make up` and
`make test` both work without any translator. With
`BACKEND_DIPLOMAIL_TRANSLATOR_URL` unset, the noop translator
short-circuits the pipeline: messages are delivered in the original
language, and the inbox handler returns the original body to every
reader.

You only need LibreTranslate when you want to exercise the
cross-language path: the sender writes in language X, the recipient's
`accounts.preferred_language` is Y, and the worker is expected to fetch
a Y rendering. The pipeline is otherwise identical and unaware of
which engine is producing translations.

## Running a local instance

LibreTranslate ships a public Docker image at
`libretranslate/libretranslate`. The image is ~3 GB on first pull
because it bundles every supported language model; subsequent runs
reuse the layer cache.

The simplest setup is a one-shot container:

```bash
docker run --rm -d --name libretranslate \
  -p 5000:5000 \
  -e LT_LOAD_ONLY=en,ru \
  libretranslate/libretranslate:latest
```

The `LT_LOAD_ONLY` whitelist trims the loaded model set so the
container fits in ~600 MB of RAM. Drop the variable to load every
language pair LibreTranslate ships.

LibreTranslate boots in ~30 seconds (cold) or ~5 seconds (warm
model cache). Wait until `curl -s http://localhost:5000/languages`
returns a JSON array before pointing backend at it.
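
The readiness check above can also be scripted. The following is a minimal sketch, not part of backend: `looksReady` and `waitForLibreTranslate` are hypothetical helper names, and the only behaviour assumed is the one stated above, namely that a booted instance answers `GET /languages` with HTTP 200 and a JSON array.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// looksReady reports whether a /languages response indicates a booted
// instance: HTTP 200 with a JSON array body (a JSON null is rejected).
func looksReady(status int, body []byte) bool {
	if status != http.StatusOK {
		return false
	}
	var langs []json.RawMessage
	return json.Unmarshal(body, &langs) == nil && langs != nil
}

// waitForLibreTranslate polls baseURL+"/languages" every `every` until
// looksReady passes or ctx is cancelled. Hypothetical helper.
func waitForLibreTranslate(ctx context.Context, baseURL string, every time.Duration) error {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/languages", nil)
		if err != nil {
			return err
		}
		// Connection errors are expected while the container boots; keep polling.
		if resp, err := http.DefaultClient.Do(req); err == nil {
			body, readErr := io.ReadAll(resp.Body)
			resp.Body.Close()
			if readErr == nil && looksReady(resp.StatusCode, body) {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("libretranslate not ready: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}
```

A caller would typically wrap this in `context.WithTimeout` (60 seconds comfortably covers the cold-boot figure quoted above) and pass `http://localhost:5000` as `baseURL`.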

## Wiring backend to it

Add three env vars to the backend process:

```
BACKEND_DIPLOMAIL_TRANSLATOR_URL=http://localhost:5000
BACKEND_DIPLOMAIL_TRANSLATOR_TIMEOUT=10s
BACKEND_DIPLOMAIL_TRANSLATOR_MAX_ATTEMPTS=5
```

When backend lives inside the `tools/local-dev` Docker network and
LibreTranslate runs on the host, replace `localhost` with the host's
docker-bridge address (`http://host.docker.internal:5000` on
Docker Desktop; `http://172.17.0.1:5000` on a Linux bridge by
default).

For a stack-internal deployment, drop LibreTranslate into the same
Docker Compose file alongside backend and reach it by its service
name:

```yaml
services:
  libretranslate:
    image: libretranslate/libretranslate:latest
    environment:
      LT_LOAD_ONLY: "en,ru"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:5000/languages"]
      interval: 5s
      timeout: 2s
      retries: 12

  backend:
    environment:
      BACKEND_DIPLOMAIL_TRANSLATOR_URL: "http://libretranslate:5000"
    depends_on:
      libretranslate:
        condition: service_healthy
```

## Manual smoke test

Once both services are up:

1. Register two accounts via the public auth flow. Set the second
   account's `preferred_language` to a value that differs from the
   sender's writing language (e.g. sender writes in English, second
   account is `ru`).
2. Create a private game with the first account, invite the second,
   and make sure both end up as active members.
3. Send a personal message: `POST /api/v1/user/games/{id}/mail/messages`
   with the body in English.
4. Watch backend logs for the diplomail worker. After ~2 seconds you
   should see `translator attempt succeeded` (or an equivalent INFO
   line) and the recipient's `available_at` set.
5. As the second account, fetch
   `GET /api/v1/user/games/{id}/mail/messages/{message_id}`. The
   response should carry both `body` (English original) and
   `translated_body` (Russian) along with the `translation_lang`
   and `translator` fields.
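
If step 4 never logs a success, it can help to exercise LibreTranslate's `POST /translate` endpoint in isolation. The sketch below builds a request body and validates a response. It rests on two assumptions that this document does not state outright: that subject and body travel together as a two-element `q` array, and that a valid reply therefore carries exactly two strings in `translatedText` (the malformed-array test in this commit suggests as much). `buildTranslateRequest` and `parseTranslateResponse` are hypothetical names, not the backend client's API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// translateRequest mirrors the JSON body of LibreTranslate's POST /translate.
// Sending q as [subject, body] is an assumption about what backend does.
type translateRequest struct {
	Q      []string `json:"q"`
	Source string   `json:"source"`
	Target string   `json:"target"`
	Format string   `json:"format"`
}

// buildTranslateRequest encodes one subject/body pair as a request body.
func buildTranslateRequest(source, target, subject, body string) ([]byte, error) {
	return json.Marshal(translateRequest{
		Q:      []string{subject, body},
		Source: source,
		Target: target,
		Format: "text",
	})
}

// parseTranslateResponse accepts only a two-element string array in
// translatedText. A bare string, a short or empty array, and a missing
// field all produce an error instead of a half-filled result.
func parseTranslateResponse(raw []byte) (subject, body string, err error) {
	var resp struct {
		TranslatedText json.RawMessage `json:"translatedText"`
	}
	if err := json.Unmarshal(raw, &resp); err != nil {
		return "", "", fmt.Errorf("decode response: %w", err)
	}
	if len(resp.TranslatedText) == 0 {
		return "", "", errors.New("translatedText missing")
	}
	var parts []string
	if err := json.Unmarshal(resp.TranslatedText, &parts); err != nil {
		return "", "", fmt.Errorf("translatedText is not a string array: %w", err)
	}
	if len(parts) != 2 {
		return "", "", fmt.Errorf("translatedText has %d elements, want 2", len(parts))
	}
	return parts[0], parts[1], nil
}
```

The bytes returned by `buildTranslateRequest("en", "ru", subject, body)` can be posted with `Content-Type: application/json` to `http://localhost:5000/translate`; feeding the reply to `parseTranslateResponse` shows whether the instance answers in the expected shape.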

## Operational notes

- **Resource budget.** With `LT_LOAD_ONLY=en,ru` the container peaks
  around 800 MB resident; with all languages, ~3 GB. Plan accordingly.
- **CPU.** LibreTranslate is CPU-bound. One translation of a
  200-word body takes ~200 ms on a modern x86 core; the diplomail worker
  is single-threaded by design, so steady-state throughput is
  `1 / avg_latency` per backend instance (about 5 translations per
  second at 200 ms).
- **Outage behaviour.** A LibreTranslate outage stalls delivery of
  pending pairs by at most ~31 seconds per pair (the worker's
  exponential backoff schedule), after which the worker falls back to
  the original body. Inbox listings never depend on the translator's
  availability.
- **API key.** Backend does not send an API key. Self-hosted
  deployments without `LT_API_KEYS` configured accept anonymous
  POSTs by default, which matches our deployment posture
  (LibreTranslate sits on the internal docker network, not
  reachable from outside).
- **Models.** Adding a new target language is an operator-side
  task: install the corresponding Argos model into the
  LibreTranslate container (`argospm install …`) and either restart
  the container or send a SIGHUP. The diplomail pipeline notices
  the new language pair automatically; there is no allow-list
  inside backend.
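
The ~31-second figure in the outage note is consistent with a doubling backoff that starts at one second and runs for the five attempts configured via `BACKEND_DIPLOMAIL_TRANSLATOR_MAX_ATTEMPTS=5`: 1 + 2 + 4 + 8 + 16 = 31. The base delay and the doubling factor are assumptions made for illustration; the worker's actual schedule lives in the diplomail package and may differ. `backoffDelay` and `worstCaseStall` are hypothetical names.

```go
package main

import "time"

// backoffDelay returns the wait associated with the given 1-based attempt,
// doubling from base: base, 2*base, 4*base, ...
// The doubling schedule is an assumption, not the worker's confirmed policy.
func backoffDelay(base time.Duration, attempt int) time.Duration {
	return base << (attempt - 1)
}

// worstCaseStall sums the delays across maxAttempts failed attempts,
// i.e. how long one pair can wait before the worker falls back.
func worstCaseStall(base time.Duration, maxAttempts int) time.Duration {
	var total time.Duration
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		total += backoffDelay(base, attempt)
	}
	return total
}
```

Under these assumptions `worstCaseStall(time.Second, 5)` evaluates to 31 seconds, so raising the attempt limit by one would roughly double the worst-case stall.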

## Troubleshooting

- **`translator: do request: dial tcp ...: connect: connection refused`.**
  LibreTranslate is not listening on the configured address. Verify
  with `curl "${BACKEND_DIPLOMAIL_TRANSLATOR_URL}/languages"`. On Docker
  setups, double-check the bridge-address discussion above.
- **`translator: libretranslate http 400`** in worker logs but the
  language pair clearly exists.
  Make sure the request used the two-letter codes (`en`, not
  `en-US`). Backend normalises before sending; if you see a region
  subtag in the log, file an issue against `internal/diplomail`,
  because the normalisation should be unconditional.
- **`translator: libretranslate http 503`.**
  Container is still loading models. Wait for `/languages` to
  respond `200`. The worker retries with backoff, so steady-state
  operation recovers automatically.
- **Worker logs only "noop translator returned, delivering fallback".**
  `BACKEND_DIPLOMAIL_TRANSLATOR_URL` is empty in the backend
  process. Confirm with `docker compose exec backend env | grep DIPLOMAIL`.
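
For reference when reading logs against the HTTP 400 entry above, language-tag normalisation of the kind described there can be sketched as follows. This is an illustration only: `normalizeLang` is a hypothetical name, and the real implementation in `internal/diplomail` may handle more cases.

```go
package main

import "strings"

// normalizeLang reduces a tag such as "en-US" or "pt_BR" to its lowercase
// primary subtag ("en", "pt"), the form the troubleshooting entry says
// LibreTranslate expects. Hypothetical helper, not the backend's code.
func normalizeLang(tag string) string {
	tag = strings.TrimSpace(tag)
	// Cut at the first subtag separator; both "-" and "_" occur in practice.
	if i := strings.IndexAny(tag, "-_"); i >= 0 {
		tag = tag[:i]
	}
	return strings.ToLower(tag)
}
```

A log line that shows `en-US` reaching LibreTranslate would mean a step like this was skipped somewhere on the request path.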

## Future work

- Adding an OpenTelemetry counter and histogram for translator
  outcomes is tracked in the diplomail package README; the metrics
  will surface in Grafana once LibreTranslate is deployed.
- Email-alerting on prolonged outage (e.g. ≥ N consecutive failures
  in M minutes) is planned through a new
  `diplomail.translator.unhealthy` notification kind. Not wired
  yet; current monitoring lives in zap logs.

@@ -136,6 +136,29 @@ through standard OpenTelemetry export — translation outcomes
surface in `diplomail.worker` logs at Info / Warn levels;
Grafana / Prometheus dashboards live outside this package.

### Multi-instance posture (known limitation)

`PickPendingTranslationPair` intentionally drops `FOR UPDATE`: the
worker is single-threaded per process, and we did not want a slow
LibreTranslate HTTP call to keep a row-lock open. The cost is a
small window where two backend instances pulling at the same
moment can both claim the same pair: the cache-write side stays
clean (`INSERT … ON CONFLICT DO NOTHING`), but each instance will
publish its own push event to every recipient of the pair, so the
duplicate push is the visible failure mode.

The current deployment runs a single backend instance, so the
window does not exist today. When the platform scales to multiple
instances, we will revisit the pickup query, either by holding
the lock through the HTTP call (with a short timeout to bound the
worst case) or by introducing a `claimed_at` column and a
short-lived advisory lease. The change is local to this package
and does not affect callers.

For the LibreTranslate operational recipe (installing, wiring,
manual smoke test), see
[`backend/docs/diplomail-translator-setup.md`](../../docs/diplomail-translator-setup.md).

## Push integration

Every successful send emits a `diplomail.message.received` push

@@ -808,6 +808,36 @@ func TestDiplomailLifecycleMembershipKick(t *testing.T) {
	}
}

// TestDiplomailWorkerTickOnEmptyQueueIsNoop confirms the async
// worker tolerates an empty pending queue: no error, no panic, no
// publisher events. Belt-and-suspenders for the case where backend
// starts, mounts the worker as an `app.Component`, and ticks before
// any user has sent mail.
func TestDiplomailWorkerTickOnEmptyQueueIsNoop(t *testing.T) {
	db := startPostgres(t)
	ctx := context.Background()

	publisher := &recordingPublisher{}
	svc := diplomail.NewService(diplomail.Deps{
		Store:        diplomail.NewStore(db),
		Memberships:  &staticMembershipLookup{},
		Notification: publisher,
		Config: config.DiplomailConfig{
			MaxBodyBytes:    4096,
			MaxSubjectBytes: 256,
		},
	})
	worker := diplomail.NewWorker(svc)
	for i := 0; i < 3; i++ {
		if err := worker.Tick(ctx); err != nil {
			t.Fatalf("tick %d on empty queue: %v", i, err)
		}
	}
	if got := publisher.snapshot(); len(got) != 0 {
		t.Fatalf("publisher fired %d events on empty queue", len(got))
	}
}

// TestDiplomailAsyncTranslationDelivery covers the Stage E flow:
//  1. SendPersonal where recipient.preferred_language != body_lang
//     materialises a recipient with `AvailableAt == nil`; the inbox

@@ -139,3 +139,35 @@ func TestLibreTranslateRequiresURL(t *testing.T) {
		t.Fatalf("expected error for empty URL")
	}
}

// TestLibreTranslateRejectsMalformedArray defends against a server
// that returns a partial / unexpected `translatedText` payload. The
// client must surface an error (not panic, not return a half-empty
// Result) so the worker can decide between retry and fallback.
func TestLibreTranslateRejectsMalformedArray(t *testing.T) {
	t.Parallel()
	cases := []struct {
		name string
		body string
	}{
		{"single string", `{"translatedText": "only one"}`},
		{"array of one", `{"translatedText": ["only one"]}`},
		{"empty array", `{"translatedText": []}`},
		{"missing field", `{"foo":"bar"}`},
	}
	for _, tc := range cases {
		body := tc.body
		t.Run(tc.name, func(t *testing.T) {
			t.Parallel()
			server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
				_, _ = w.Write([]byte(body))
			}))
			t.Cleanup(server.Close)
			tr, _ := NewLibreTranslate(LibreTranslateConfig{URL: server.URL})
			res, err := tr.Translate(context.Background(), "en", "ru", "subject", "body")
			if err == nil {
				t.Fatalf("expected error for malformed body %q, got %+v", body, res)
			}
		})
	}
}