galaxy-game/backend/docs/diplomail-translator-setup.md

# LibreTranslate setup for diplomatic mail

This document describes how to run the LibreTranslate backend that the
diplomatic-mail subsystem uses for body translation. The instructions
target three audiences: developers spinning up LibreTranslate
alongside `tools/local-dev`, operators preparing a real deployment,
and reviewers verifying the end-to-end translation flow by hand.

## When you need LibreTranslate

The diplomatic-mail worker runs unconditionally — `make up` and `make
test` both work without any translator. With
`BACKEND_DIPLOMAIL_TRANSLATOR_URL` unset, the noop translator
short-circuits the pipeline: messages are delivered in the original
language, and the inbox handler returns the original body to every
reader.

You only need LibreTranslate when you want to exercise the
cross-language path: the sender writes in language X, the recipient's
`accounts.preferred_language` is Y, and the worker is expected to
fetch a Y rendering. The pipeline is otherwise identical and unaware
of which engine is producing translations.

## Running a local instance

LibreTranslate ships a public Docker image at
`libretranslate/libretranslate`. The image is ~3 GB on first pull
because it bundles every supported language model; subsequent runs
reuse the layer cache.

The simplest setup is a one-shot container:

```bash
docker run --rm -d --name libretranslate \
  -p 5000:5000 \
  -e LT_LOAD_ONLY=en,ru \
  libretranslate/libretranslate:latest
```

The `LT_LOAD_ONLY` whitelist trims the loaded model set so the
container fits in ~600 MB of RAM. Drop the variable to load every
language pair LibreTranslate ships.

LibreTranslate boots in ~30 seconds (cold) or ~5 seconds (warm
model cache). Wait until `curl -s http://localhost:5000/languages`
returns a JSON array before pointing backend at it.
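
The wait can be scripted. The sketch below polls `/languages` until the response starts with `[`, then offers a helper for a direct call to LibreTranslate's `/translate` endpoint. The function names, the 60-try budget, and the `LT_POLL_INTERVAL` variable are illustrative and not part of the project tooling.

```bash
#!/usr/bin/env bash
# Poll LibreTranslate until /languages answers with a JSON array.
# Arguments: base URL (default http://localhost:5000), max tries (default 60).
wait_for_libretranslate() {
  local url="${1:-http://localhost:5000}" tries="${2:-60}" i=1 body
  while [ "$i" -le "$tries" ]; do
    # -f makes curl fail on HTTP errors, e.g. while models are still loading.
    body="$(curl -fsS "${url}/languages" 2>/dev/null || true)"
    case "$body" in
      \[*) echo "LibreTranslate ready after ${i} attempt(s)"; return 0 ;;
    esac
    sleep "${LT_POLL_INTERVAL:-1}"
    i=$((i + 1))
  done
  echo "LibreTranslate not ready after ${tries} attempts" >&2
  return 1
}

# Translate one string directly against LibreTranslate, bypassing backend.
# Arguments: base URL, source language, target language, text.
lt_translate() {
  local url="$1" source="$2" target="$3" text="$4"
  curl -fsS "${url}/translate" \
    --data-urlencode "q=${text}" \
    --data-urlencode "source=${source}" \
    --data-urlencode "target=${target}" \
    --data-urlencode "format=text"
}
```

For example, `wait_for_libretranslate http://localhost:5000 && lt_translate http://localhost:5000 en ru "Hello"` should print a JSON object with a `translatedText` field once the `en` and `ru` models are loaded.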

## Pointing backend at it

Add three env vars to the backend process:

```
BACKEND_DIPLOMAIL_TRANSLATOR_URL=http://localhost:5000
BACKEND_DIPLOMAIL_TRANSLATOR_TIMEOUT=10s
BACKEND_DIPLOMAIL_TRANSLATOR_MAX_ATTEMPTS=5
```

When backend lives inside the `tools/local-dev` Docker network and
LibreTranslate runs on the host, `localhost` resolves to the backend
container itself. Point the URL at the host instead:
`http://host.docker.internal:5000` on Docker Desktop, or
`http://172.17.0.1:5000` on Linux with the default bridge.

For a stack-internal deployment, drop LibreTranslate into the same
Docker compose file alongside backend and reach it by its service
name:

```yaml
services:
  libretranslate:
    image: libretranslate/libretranslate:latest
    environment:
      LT_LOAD_ONLY: "en,ru"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:5000/languages"]
      interval: 5s
      timeout: 2s
      retries: 12

  backend:
    environment:
      BACKEND_DIPLOMAIL_TRANSLATOR_URL: "http://libretranslate:5000"
    depends_on:
      libretranslate:
        condition: service_healthy
```

## Manual smoke test

Once both services are up:

1. Register two accounts via the public auth flow. Set the second
   account's `preferred_language` to a value that differs from the
   sender's writing language (e.g. sender writes in English, second
   account is `ru`).
2. Create a private game with the first account, invite the second,
   land both as active members.
3. Send a personal message: `POST /api/v1/user/games/{id}/mail/messages`
   with the body in English.
4. Watch backend logs for the diplomail worker. After ~2 seconds you
   should see `translator attempt succeeded` (or equivalent INFO
   line) and the recipient flipped to `available_at`.
5. As the second account, fetch
   `GET /api/v1/user/games/{id}/mail/messages/{message_id}`. The
   response should carry both `body` (English original) and
   `translated_body` (Russian) along with the `translation_lang`
   and `translator` fields.
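
Step 5 can be checked from a shell. The sketch below assumes a Bearer token in the `Authorization` header and takes the base URL as an argument; neither is confirmed by this document, so adjust both to whatever the public auth flow actually issues. The field check is a crude substring match on the four keys named in step 5, not a JSON parser.

```bash
#!/usr/bin/env bash
# Fetch one message as the recipient.
# ASSUMPTION: Bearer-token auth; base_url, token and both ids are placeholders.
fetch_message() {
  local base_url="$1" token="$2" game_id="$3" message_id="$4"
  curl -fsS -H "Authorization: Bearer ${token}" \
    "${base_url}/api/v1/user/games/${game_id}/mail/messages/${message_id}"
}

# Read a message JSON document on stdin and report which of the
# expected keys are absent. Returns 0 only if all four are present.
check_translation_fields() {
  local json key missing=0
  json="$(cat)"
  for key in body translated_body translation_lang translator; do
    case "$json" in
      *"\"${key}\""*) ;;
      *) echo "missing field: ${key}" >&2; missing=1 ;;
    esac
  done
  return "$missing"
}
```

Usage, with hypothetical variables: `fetch_message "$BASE_URL" "$TOKEN" "$GAME_ID" "$MESSAGE_ID" | check_translation_fields && echo "translation present"`.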

## Operational notes

- **Resource budget.** With `LT_LOAD_ONLY=en,ru` the container peaks
  around 800 MB resident; with all languages, ~3 GB. Plan accordingly.
- **CPU.** LibreTranslate is CPU-bound. One translation of a
  200-word body takes ~200 ms on a modern x86 core; the diplomail
  worker is single-threaded by design, so steady-state throughput is
  at most `1 / avg_latency` per backend instance (about 5
  translations per second at 200 ms each).
- **Outage behaviour.** A LibreTranslate outage stalls delivery of
  pending pairs by at most ~31 seconds per pair (the worker's
  exponential backoff schedule); the worker then falls back to the
  original body. Inbox listings never depend on the translator's
  availability.
- **API key.** Backend does not send an API key. Self-hosted
  deployments without `LT_API_KEYS` configured accept anonymous
  POSTs by default, which matches our deployment posture
  (LibreTranslate sits on the internal docker network, not
  reachable from outside).
- **Models.** Adding a new target language is an operator-side
  task: install the corresponding Argos model into the
  LibreTranslate container (`argospm install …`) and either restart
  the container or send a SIGHUP. The diplomail pipeline notices
  the new language pair automatically — there is no allow-list
  inside backend.
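
The throughput and outage figures above are simple arithmetic, sketched here so they can be recomputed for other settings. One schedule that sums to 31 seconds is five waits that start at 1 second and double each time; that this is the worker's actual schedule is an assumption, so check `internal/diplomail` before relying on it. Both helper names are illustrative.

```bash
#!/usr/bin/env bash
# Total wait of an exponential backoff: base, 2*base, 4*base, ...
# Arguments: base delay in seconds, number of waits.
backoff_total() {
  local base="$1" steps="$2" total=0 delay
  delay="$base"
  while [ "$steps" -gt 0 ]; do
    total=$((total + delay))
    delay=$((delay * 2))
    steps=$((steps - 1))
  done
  echo "$total"
}

# Upper bound on translations per second for one single-threaded worker.
# Argument: average latency in milliseconds (integer division, rounds down).
max_throughput_per_sec() {
  local avg_latency_ms="$1"
  echo $((1000 / avg_latency_ms))
}

backoff_total 1 5            # prints 31
max_throughput_per_sec 200   # prints 5
```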

## Troubleshooting

- **`translator: do request: dial tcp ...: connect: connection refused`.**
  LibreTranslate is not listening on the configured address. Verify
  with `curl "${BACKEND_DIPLOMAIL_TRANSLATOR_URL}/languages"` (the
  variable already includes the `http://` scheme). On Docker setups,
  double-check the bridge addresses discussed above.
- **`translator: libretranslate http 400`** in worker logs but the
  language pair clearly exists.
  Make sure the request used the two-letter codes (`en`, not
  `en-US`). Backend normalises before sending; if you see a region
  subtag in the log, file an issue against `internal/diplomail` —
  the normalisation should be unconditional.
- **`translator: libretranslate http 503`.**
  The container is still loading models. Wait for `/languages` to
  respond `200`. The worker retries with backoff, so delivery
  recovers on its own once the models are loaded.
- **Worker logs only "noop translator returned, delivering
  fallback".**
  `BACKEND_DIPLOMAIL_TRANSLATOR_URL` is empty in the backend
  process. Confirm with `docker compose exec backend env | grep
  DIPLOMAIL`.
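
For the `http 400` case, the normalisation described above amounts to dropping the region subtag and lowercasing what remains. The function below is an illustration of that rule for checking log values by hand; it is not the code in `internal/diplomail`, and the name `normalise_lang` is made up for this example.

```bash
#!/usr/bin/env bash
# Reduce a language tag to its bare primary code:
# cut everything from the first '-' or '_', then lowercase.
normalise_lang() {
  printf '%s\n' "${1%%[-_]*}" | tr '[:upper:]' '[:lower:]'
}

normalise_lang en-US   # prints en
normalise_lang ru      # prints ru
```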

## Future work

- Adding an OpenTelemetry counter and histogram for translator
  outcomes is tracked in the diplomail package README; the metrics
  will surface in Grafana once LibreTranslate is deployed.
- Email-alerting on prolonged outage (e.g. ≥ N consecutive failures
  in M minutes) is planned through a new
  `diplomail.translator.unhealthy` notification kind. Not wired
  yet — current monitoring lives in zap logs.