# Auth / Session Service

## Run and Dependencies

`cmd/authsession` starts two HTTP listeners:

- public REST on `AUTHSESSION_PUBLIC_HTTP_ADDR` with default `:8080`
- trusted internal REST on `AUTHSESSION_INTERNAL_HTTP_ADDR` with default `:8081`

Startup requires:

- one reachable Redis deployment configured by `AUTHSESSION_REDIS_ADDR`

That Redis deployment is used for:

- source-of-truth challenges
- source-of-truth device sessions
- dynamic active-session limit config
- gateway session projection cache and stream updates
- send-email-code resend throttling

Optional integrations:

- `AUTHSESSION_USER_SERVICE_MODE=stub|rest`
- `AUTHSESSION_MAIL_SERVICE_MODE=stub|rest`
- OTLP telemetry through standard `OTEL_*` variables
- stdout telemetry through
  `AUTHSESSION_OTEL_STDOUT_TRACES_ENABLED` and
  `AUTHSESSION_OTEL_STDOUT_METRICS_ENABLED`

Operational caveats:

- the service exposes no `/healthz`, `/readyz`, or `/metrics` endpoints
- user-service and mail-service default to in-process stub adapters until
  `rest` mode is configured
- startup performs bounded Redis `PING` checks for every Redis-backed adapter
  and fails fast if Redis is unreachable or runtime config is invalid

Additional module docs:

- [Public REST contract](api/public-openapi.yaml)
- [Internal REST contract](api/internal-openapi.yaml)
- [Documentation index](docs/README.md)
- [Edge Gateway README](../gateway/README.md)

## Purpose

`Auth / Session Service` owns e-mail-code authentication and the lifecycle of
device sessions.

It is the source of truth for:

- authentication challenges
- device sessions
- revoke and block state
- publication of session lifecycle updates consumed by
  [`Edge Gateway`](../gateway/README.md)

The service is intentionally not on the hot path for every authenticated
request. Gateway authenticates steady-state requests from its own cache and
session-lifecycle updates rather than through a synchronous round-trip to auth
for each command.

## Responsibilities

The service is responsible for:

- public auth commands:
  - `send-email-code`
  - `confirm-email-code`
- creating device sessions after successful confirmation
- registering the client public key for a newly created session
- revoking one device session
- revoking all sessions of one user
- blocking a user or e-mail subject for future auth flows
- persisting source-of-truth session state
- projecting session state into gateway-consumable Redis data
- exposing a trusted internal REST API for read, revoke, and block operations

The service is not responsible for:

- verifying authenticated transport signatures on every business request
- gateway anti-replay for authenticated command traffic
- downstream business authorization
- direct push delivery to clients
- long-lived hot-path session caching inside gateway
- mail-service implementation details beyond the dedicated login-code delivery
  REST contract

## Position in the System

```mermaid
flowchart LR
    Client["Client"]
    Gateway["Edge Gateway"]
    Auth["Auth / Session Service"]
    User["User Service"]
    Mail["Mail Service"]
    Redis["Redis"]
    Business["Business Services"]

    Client --> Gateway
    Gateway --> Auth
    Gateway --> Business
    Auth --> User
    Auth --> Mail
    Auth --> Redis
    Redis --> Gateway
```

## Main Principles

- public auth stays synchronous
- `send-email-code` returns `challenge_id`
- `confirm-email-code` returns a ready `device_session_id`
- no pending async session-provisioning stage exists
- source-of-truth session state and gateway-facing projection remain separate
- Redis is the initial backend, but the domain and service layers stay storage
  agnostic behind ports
- `send-email-code` stays success-shaped for existing, new, blocked, and
  throttled e-mail flows
- `confirm-email-code` supports short-window idempotent retry for the same
  confirmed challenge and the same `client_public_key`
- active-session limits are configuration driven:
  - absent limit means disabled
  - limit overflow rejects new session creation explicitly
  - the service does not evict existing sessions to make room

## Gateway-Facing Public Contract

Gateway already exposes the public REST auth surface and delegates it to this
service:

- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`

The effective DTO contract is:

| Operation | Request | Success response |
| --- | --- | --- |
| `POST /api/v1/public/auth/send-email-code` | `{ "email": string }` | `{ "challenge_id": string }` |
| `POST /api/v1/public/auth/confirm-email-code` | `{ "challenge_id": string, "code": string, "client_public_key": string, "time_zone": string }` | `{ "device_session_id": string }` |

`send-email-code` may also receive an optional public `Accept-Language` header
forwarded by gateway. Auth resolves the first supported BCP 47 language tag from
that header and falls back to `en` when no supported value is present. The
resolved value is used as the auth-mail locale in the dedicated `Mail Service`
REST contract. It is also stored on the challenge as the create-only
preferred-language candidate for a later first-user ensure step. The created
`challenge_id` is sent to `Mail Service` as the raw `Idempotency-Key` header
value of that REST call.

`client_public_key` is the standard base64-encoded raw 32-byte Ed25519 public
key registered for the created device session.

`time_zone` is the client-selected IANA time zone name. During the current
rollout phase, a successful confirm forwards create-only user registration
context to `User Service`. This context consists of the preferred-language
candidate stored at `send-email-code` time and the supplied `time_zone`.
`User Service` validates `preferred_language` as BCP 47 and canonicalizes the
stored value on creation. The derived language value must therefore already be
a valid BCP 47 tag before auth forwards it.

Public boundary rules:

- requests and responses are JSON only
- request DTOs reject unknown fields
- empty bodies, malformed JSON, trailing JSON input, and unknown fields return
  `400 invalid_request`
- surrounding ASCII and Unicode whitespace is trimmed from input string fields
  before validation
- `confirm-email-code` requires a non-empty `time_zone` and validates it as an
  IANA time zone name
- `send-email-code` remains success-shaped for existing, new, blocked, and
  throttled e-mail paths
- `send-email-code` may use the optional public `Accept-Language` header to
  derive and store the auth-mail locale and the future create-only
  `preferred_language` candidate; unsupported or missing values fall back to
  `en`
- `confirm-email-code` returns a ready `device_session_id` synchronously on
  success
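The following sketch covers the `confirm-email-code` boundary rules and assumes the standard library JSON decoder. `decodeStrict` and `normalize` are hypothetical names. `strings.TrimSpace` removes leading and trailing Unicode White_Space, which includes ASCII whitespace. `time/tzdata` is embedded so IANA names resolve even on hosts without a zoneinfo database.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"io"
	"strings"
	"time"
	_ "time/tzdata" // embed the IANA database so LoadLocation works everywhere
)

type confirmEmailCodeRequest struct {
	ChallengeID     string `json:"challenge_id"`
	Code            string `json:"code"`
	ClientPublicKey string `json:"client_public_key"`
	TimeZone        string `json:"time_zone"`
}

// decodeStrict rejects empty bodies, malformed JSON, unknown fields, and trailing input.
func decodeStrict(body []byte, dst any) error {
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields()
	if err := dec.Decode(dst); err != nil {
		return err // io.EOF here means the body was empty
	}
	// A second Decode must hit EOF; anything else is trailing JSON input.
	if err := dec.Decode(&struct{}{}); !errors.Is(err, io.EOF) {
		return errors.New("trailing JSON input")
	}
	return nil
}

// normalize trims every string field and validates time_zone as an IANA name.
func (r *confirmEmailCodeRequest) normalize() error {
	r.ChallengeID = strings.TrimSpace(r.ChallengeID)
	r.Code = strings.TrimSpace(r.Code)
	r.ClientPublicKey = strings.TrimSpace(r.ClientPublicKey)
	r.TimeZone = strings.TrimSpace(r.TimeZone)
	if r.TimeZone == "" {
		return errors.New("time_zone must not be empty")
	}
	// LoadLocation also accepts "Local", which is not an IANA name.
	if _, err := time.LoadLocation(r.TimeZone); err != nil || r.TimeZone == "Local" {
		return errors.New("time_zone is not a valid IANA time zone name")
	}
	return nil
}
```

Every error returned here would map to `400 invalid_request` with a field-specific message.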

Stable public business-error contract:

| HTTP status | `error.code` | Stable `error.message` |
| --- | --- | --- |
| `400` | `invalid_request` | field-specific validation detail |
| `400` | `invalid_code` | `confirmation code is invalid` |
| `400` | `invalid_client_public_key` | `client_public_key is not a valid base64-encoded raw 32-byte Ed25519 public key` |
| `403` | `blocked_by_policy` | `authentication is blocked by policy` |
| `404` | `challenge_not_found` | `challenge not found` |
| `409` | `session_limit_exceeded` | `active session limit would be exceeded` |
| `410` | `challenge_expired` | `challenge expired` |
| `503` | `service_unavailable` | `service is unavailable` |

The public error envelope is always:

```json
{
  "error": {
    "code": "string",
    "message": "string"
  }
}
```
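A small value type per table row keeps the envelope and the stable messages in one place. The Go names below are hypothetical; only the codes, statuses, and messages come from the table.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// publicError mirrors one row of the stable business-error table.
type publicError struct {
	Status  int
	Code    string
	Message string
}

// A few rows from the table; the remaining codes follow the same pattern.
var (
	errInvalidCode        = publicError{http.StatusBadRequest, "invalid_code", "confirmation code is invalid"}
	errBlockedByPolicy    = publicError{http.StatusForbidden, "blocked_by_policy", "authentication is blocked by policy"}
	errChallengeNotFound  = publicError{http.StatusNotFound, "challenge_not_found", "challenge not found"}
	errChallengeExpired   = publicError{http.StatusGone, "challenge_expired", "challenge expired"}
	errServiceUnavailable = publicError{http.StatusServiceUnavailable, "service_unavailable", "service is unavailable"}
)

type errorEnvelope struct {
	Error struct {
		Code    string `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// marshalError renders the documented envelope for e.
func marshalError(e publicError) []byte {
	var env errorEnvelope
	env.Error.Code = e.Code
	env.Error.Message = e.Message
	b, _ := json.Marshal(env) // cannot fail for two string fields
	return b
}

// writeError sends e with its HTTP status and a JSON content type.
func writeError(w http.ResponseWriter, e publicError) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(e.Status)
	_, _ = w.Write(marshalError(e))
}
```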

## Trusted Internal API

The trusted internal REST surface lives under `/api/v1/internal` and is
documented in [`api/internal-openapi.yaml`](api/internal-openapi.yaml).

Implemented endpoints:

- `GET /api/v1/internal/sessions/{device_session_id}`
- `GET /api/v1/internal/users/{user_id}/sessions`
- `POST /api/v1/internal/sessions/{device_session_id}/revoke`
- `POST /api/v1/internal/users/{user_id}/sessions/revoke-all`
- `POST /api/v1/internal/user-blocks`

Key internal API properties:

- all bodies are JSON only
- `ListUserSessions` returns sessions newest-first and is unpaginated in v1
- revoke and block mutations require audit metadata as `reason_code` and
  `actor`
- `BlockUser` accepts exactly one of `user_id` or `email`
- mutating operations are idempotent and return explicit acknowledgement
  payloads rather than empty `204` responses
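The `BlockUser` input rules can be checked before any side effects. This request shape, including a plain-string `actor`, is hypothetical; the authoritative schema is [`api/internal-openapi.yaml`](api/internal-openapi.yaml).

```go
package main

import (
	"errors"
	"strings"
)

// blockUserRequest is a hypothetical shape for POST /api/v1/internal/user-blocks.
type blockUserRequest struct {
	UserID     string `json:"user_id,omitempty"`
	Email      string `json:"email,omitempty"`
	ReasonCode string `json:"reason_code"`
	Actor      string `json:"actor"`
}

// validate enforces "exactly one subject" plus the required audit metadata.
func (r blockUserRequest) validate() error {
	hasUser := strings.TrimSpace(r.UserID) != ""
	hasEmail := strings.TrimSpace(r.Email) != ""
	if hasUser == hasEmail { // both set, or neither set
		return errors.New("exactly one of user_id or email is required")
	}
	if strings.TrimSpace(r.ReasonCode) == "" || strings.TrimSpace(r.Actor) == "" {
		return errors.New("reason_code and actor are required")
	}
	return nil
}
```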

Stable internal error surface:

| HTTP status | `error.code` | Stable `error.message` |
| --- | --- | --- |
| `400` | `invalid_request` | field-specific validation detail |
| `404` | `session_not_found` | `session not found` |
| `404` | `subject_not_found` | `subject not found` |
| `500` | `internal_error` | `internal server error` |
| `503` | `service_unavailable` | `service is unavailable` |

## Challenge Model

A challenge represents one short-lived public e-mail-code flow.

Core fields:

- `challenge_id`
- normalized e-mail
- hashed confirmation code
- `status`
- `delivery_state`
- creation and expiration timestamps
- send and confirm attempt counters
- minimal abuse metadata
- stored preferred-language candidate derived at send time
- optional confirmation metadata used for idempotent retry

### Challenge States

Supported `challenge.Status` values:

- `pending_send`
- `sent`
- `delivery_suppressed`
- `delivery_throttled`
- `confirmed_pending_expire`
- `expired`
- `failed`
- `cancelled`

Supported `challenge.DeliveryState` values:

- `pending`
- `sent`
- `suppressed`
- `throttled`
- `failed`

For the dedicated `Mail Service` REST contract, `delivery_state=sent` means
auth successfully handed the request off to
`POST /api/v1/internal/login-code-deliveries` and the mail-delivery pipeline,
using the created `challenge_id` as the raw `Idempotency-Key` header value.
It does not imply that the SMTP exchange with the provider had completed before
`challenge_id` was returned to the caller.

Policy rules:

- initial challenge TTL is `5m`
- confirmed-challenge retention for idempotent retry is `5m`
- max invalid confirm attempts is `5`
- every `send-email-code` call creates a fresh challenge
- resend throttling is e-mail scoped with a fixed `1m` cooldown
- a throttled send still creates a fresh challenge in
  `status=delivery_throttled` and `delivery_state=throttled`
- throttled sends do not call `UserDirectory` and do not call `MailSender`
- blocked sends outside the throttle path become `delivery_suppressed`

Fresh confirm semantics:

- only `sent` and `delivery_suppressed` accept a first successful confirm
- `pending_send`, `delivery_throttled`, `failed`, and `cancelled` return
  `invalid_code`
- expired challenges return `challenge_expired` while the Redis grace window
  keeps the record present, then `challenge_not_found` after cleanup removes
  the key

Idempotent retry semantics:

- a repeated confirm with the same `challenge_id`, valid `code`, and identical
  `client_public_key` on `confirmed_pending_expire` returns the same
  `device_session_id`
- the same confirmed challenge with a different `client_public_key` fails as
  `invalid_code`
- idempotent retry republishes the stored gateway session view
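The fresh-confirm and retry rules reduce to a decision on the stored status. This sketch makes three assumptions: the submitted code has already matched the stored hash, the challenge key still exists (a missing key is `challenge_not_found`), and keys are compared as submitted strings. The outcome type is hypothetical.

```go
package main

// challengeStatus mirrors the documented challenge.Status values.
type challengeStatus string

const (
	statusPendingSend            challengeStatus = "pending_send"
	statusSent                   challengeStatus = "sent"
	statusDeliverySuppressed     challengeStatus = "delivery_suppressed"
	statusDeliveryThrottled      challengeStatus = "delivery_throttled"
	statusConfirmedPendingExpire challengeStatus = "confirmed_pending_expire"
	statusExpired                challengeStatus = "expired"
	statusFailed                 challengeStatus = "failed"
	statusCancelled              challengeStatus = "cancelled"
)

// confirmOutcome is a hypothetical result type for decideConfirm.
type confirmOutcome string

const (
	outcomeCreateSession confirmOutcome = "create_session"
	outcomeReturnStored  confirmOutcome = "return_stored_session"
	outcomeInvalidCode   confirmOutcome = "invalid_code"
	outcomeExpired       confirmOutcome = "challenge_expired"
)

// decideConfirm maps a challenge status to the documented confirm result.
func decideConfirm(st challengeStatus, storedKey, suppliedKey string) confirmOutcome {
	switch st {
	case statusSent, statusDeliverySuppressed:
		return outcomeCreateSession
	case statusConfirmedPendingExpire:
		if storedKey == suppliedKey {
			return outcomeReturnStored // same device_session_id; projection is republished
		}
		return outcomeInvalidCode
	case statusExpired:
		return outcomeExpired
	default: // pending_send, delivery_throttled, failed, cancelled
		return outcomeInvalidCode
	}
}
```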

## Device Session And Revoke Model

A device session is created only after successful confirmation.

Core fields:

- `device_session_id`
- `user_id`
- parsed client public key
- `status`
- `created_at`
- optional revocation metadata

Supported session states:

- `active`
- `revoked`

Built-in revoke reason codes:

- `device_logout`
- `logout_all`
- `admin_revoke`
- `user_blocked`
- `confirm_race_repair` for best-effort cleanup of superseded sessions created
  during a confirm race

Revoke behavior is intentionally separated by use case:

- revoke one device session
- revoke all sessions of one user
- block a subject and revoke active sessions implied by that subject

Internal mutation responses report only sessions changed by the current call,
so repeated idempotent operations may return:

- `already_revoked` with `affected_session_count=0`
- `no_active_sessions` with `affected_session_count=0`
- `already_blocked` with `affected_session_count=0`

## User Resolution And Session Limits

`Auth / Session Service` does not own durable user records. It delegates to
`UserDirectory` for:

- resolve-by-email without mutation
- ensure existing-or-created user during confirm
- existence checks for stable `user_id`
- block-by-user-id and block-by-email operations

Supported user-resolution outcomes:

- `existing`
- `creatable`
- `blocked`

Supported ensure-user outcomes:

- `existing`
- `created`
- `blocked`

Session-limit rules:

- the value is loaded from a shared config provider
- absent value means the limit is disabled
- active sessions are counted before creating a new one
- limit overflow returns `session_limit_exceeded`
- the service never silently revokes an existing session to satisfy the limit

## Gateway Projection Model

Gateway-facing session projection is separate from source-of-truth
`devicesession.Session`.

Each successful projection publish writes:

- one Redis KV snapshot under
  `<gateway_session_cache_key_prefix><device_session_id>`
- one full-snapshot Redis Stream event under the session-events stream

The default gateway-facing namespaces are:

- cache key prefix: `gateway:session:`
- session-events stream: `gateway:session_events`

Projected fields are intentionally limited to what gateway consumes:

- `device_session_id`
- `user_id`
- `client_public_key`
- `status`
- optional `revoked_at_ms`

Revoke reason and actor metadata stay in the authsession source of truth and
are not projected to gateway.
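The projected view and key naming could look like this. JSON encoding is an assumption, since the README does not fix the wire format. A publisher would then write the snapshot under `cacheKey(...)` (for example with `SET`) and append the same snapshot to the stream (for example with `XADD`).

```go
package main

import "encoding/json"

const (
	defaultCacheKeyPrefix = "gateway:session:"
	defaultEventsStream   = "gateway:session_events"
)

// gatewaySessionView carries only the fields gateway consumes; revoke reason
// and actor are deliberately absent.
type gatewaySessionView struct {
	DeviceSessionID string `json:"device_session_id"`
	UserID          string `json:"user_id"`
	ClientPublicKey string `json:"client_public_key"`
	Status          string `json:"status"`
	RevokedAtMs     *int64 `json:"revoked_at_ms,omitempty"`
}

// cacheKey builds <gateway_session_cache_key_prefix><device_session_id>.
func cacheKey(prefix, deviceSessionID string) string {
	return prefix + deviceSessionID
}

// encodeView produces the snapshot written to both the KV entry and the stream event.
func encodeView(v gatewaySessionView) ([]byte, error) {
	return json.Marshal(v)
}
```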

## Consistency Model

Source of truth is written first. Gateway projection is published only after
the source-of-truth write succeeds.

Caller-visible rules:

- if projection publication does not reach its required success threshold, the
  public or internal call returns `service_unavailable`
- already-written source-of-truth state is intentionally preserved
- the documented repair path is to repeat the same confirm or revoke command

Projection publish rules:

- request-path projection publish uses a bounded retry loop with `3` total
  attempts
- repeated publishes are safe because the cache snapshot is overwritten and
  duplicate full-snapshot stream events remain valid under gateway's
  later-event-wins model
- `confirm-email-code` rereads the stored session after the challenge CAS
  succeeds and republishes that current view, so a revoke or block that lands
  concurrently is not masked in gateway by a stale active projection
- idempotent confirm retry also republishes the stored session view
- best-effort cleanup of superseded confirm-race sessions uses the same
  publish helper but is not part of the caller-visible success contract

## Runtime Summary

Runtime wiring is implemented in [`internal/app`](internal/app) and
[`cmd/authsession`](cmd/authsession/main.go).

Process-local collaborators:

- system UTC clock
- crypto-random `challenge_id` and `device_session_id` generators
- crypto-random 6-digit confirmation-code generator
- bcrypt-backed code hashing
- structured logging through `zap`
- process telemetry through OpenTelemetry
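Assuming the standard library, the 6-digit code generator can draw uniformly from `[0, 1000000)` with `crypto/rand.Int` and zero-pad the result. The opaque-ID format shown (16 random bytes as unpadded base64url) is hypothetical. Hashing would then go through `golang.org/x/crypto/bcrypt` (`GenerateFromPassword`, `CompareHashAndPassword`), which is omitted here to keep the sketch dependency-free.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"math/big"
)

// newConfirmationCode returns a uniformly distributed, zero-padded 6-digit code.
func newConfirmationCode() (string, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(1_000_000))
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%06d", n.Int64()), nil
}

// newOpaqueID returns 16 random bytes as unpadded base64url (hypothetical format
// for challenge_id and device_session_id).
func newOpaqueID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}
```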

Redis-backed adapters:

- challenge store
- session store
- session-limit config provider
- gateway projection publisher
- send-email-code abuse protector

External service adapters:

- user-service:
  - default `stub`
  - optional REST adapter with one retry for read-style methods on transport
    errors and HTTP `502`, `503`, or `504`
  - mutation methods do not auto-retry
- mail-service:
  - default `stub`
  - optional REST adapter with no automatic retry on transport or upstream
    failure, to avoid duplicate deliveries

Listener defaults:

- public HTTP: `:8080`
- internal HTTP: `:8081`
- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`
- per-request use-case timeout: `3s`

For detailed runtime behavior, configuration groups, operational notes, and
examples, see [`docs/README.md`](docs/README.md).

## Non-Goals

- making authsession a hot synchronous dependency for every authenticated
  gateway command
- moving business authorization into authsession
- exposing revoke or read operations as public unauthenticated routes
- introducing short-lived access-token or refresh-token flows
- adding pending async session provisioning after confirm