# Auth / Session Service

## Run and Dependencies

`cmd/authsession` starts two HTTP listeners:

- public REST on `AUTHSESSION_PUBLIC_HTTP_ADDR` with default `:8080`
- trusted internal REST on `AUTHSESSION_INTERNAL_HTTP_ADDR` with default `:8081`

Startup requires:

- one reachable Redis deployment configured by `AUTHSESSION_REDIS_ADDR`

That Redis deployment is used for:

- source-of-truth challenges
- source-of-truth device sessions
- dynamic active-session limit config
- gateway session projection cache and stream updates
- send-email-code resend throttling

Optional integrations:

- `AUTHSESSION_USER_SERVICE_MODE=stub|rest`
- `AUTHSESSION_MAIL_SERVICE_MODE=stub|rest`
- OTLP telemetry through standard `OTEL_*` variables
- stdout telemetry through `AUTHSESSION_OTEL_STDOUT_TRACES_ENABLED` and
  `AUTHSESSION_OTEL_STDOUT_METRICS_ENABLED`

Operational caveats:

- the service exposes no `/healthz`, `/readyz`, or `/metrics` endpoints
- user-service and mail-service default to in-process stub adapters until
  `rest` mode is configured
- startup performs bounded Redis `PING` checks for every Redis-backed adapter
  and fails fast if Redis or runtime config is invalid

Additional module docs:

- [Public REST contract](api/public-openapi.yaml)
- [Internal REST contract](api/internal-openapi.yaml)
- [Documentation index](docs/README.md)
- [Edge Gateway README](../gateway/README.md)

## Purpose

`Auth / Session Service` owns e-mail-code authentication and the lifecycle of
device sessions.

It is the source of truth for:

- authentication challenges
- device sessions
- revoke and block state
- publication of session lifecycle updates consumed by
  [`Edge Gateway`](../gateway/README.md)

The service is intentionally not on the hot path for every authenticated
request. Gateway authenticates the steady-state request path from its own cache
and session-lifecycle updates rather than by synchronous round-trips back to
auth for each command.

## Responsibilities

The service is responsible for:

- public auth commands:
  - `send-email-code`
  - `confirm-email-code`
- creating device sessions after successful confirmation
- registering the client public key for a newly created session
- revoking one device session
- revoking all sessions of one user
- blocking a user or e-mail subject for future auth flows
- persisting source-of-truth session state
- projecting session state into gateway-consumable Redis data
- exposing a trusted internal REST API for read, revoke, and block operations

The service is not responsible for:

- verifying authenticated transport signatures on every business request
- gateway anti-replay for authenticated command traffic
- downstream business authorization
- direct push delivery to clients
- long-lived hot-path session caching inside gateway
- mail-service implementation details beyond the dedicated login-code delivery
  REST contract

## Position in the System

```mermaid
flowchart LR
    Client["Client"]
    Gateway["Edge Gateway"]
    Auth["Auth / Session Service"]
    User["User Service"]
    Mail["Mail Service"]
    Redis["Redis"]
    Business["Business Services"]

    Client --> Gateway
    Gateway --> Auth
    Gateway --> Business
    Auth --> User
    Auth --> Mail
    Auth --> Redis
    Redis --> Gateway
```

## Main Principles

- public auth stays synchronous
- `send-email-code` returns `challenge_id`
- `confirm-email-code` returns a ready `device_session_id`
- no pending async session-provisioning stage exists
- source-of-truth session state and gateway-facing projection remain separate
- Redis is the initial backend, but the domain and service layers stay storage
  agnostic behind ports
- `send-email-code` stays success-shaped for existing, new, blocked, and
  throttled e-mail flows
- `confirm-email-code` supports short-window idempotent retry for the same
  confirmed challenge and the same `client_public_key`
- active-session limits are configuration driven:
  - absent limit means disabled
  - limit overflow rejects new session creation
    explicitly
  - the service does not evict existing sessions to make room

## Gateway-Facing Public Contract

Gateway already exposes the public REST auth surface and delegates it to this
service:

- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`

The effective DTO contract is:

| Operation | Request | Success response |
| --- | --- | --- |
| `POST /api/v1/public/auth/send-email-code` | `{ "email": string }` | `{ "challenge_id": string }` |
| `POST /api/v1/public/auth/confirm-email-code` | `{ "challenge_id": string, "code": string, "client_public_key": string, "time_zone": string }` | `{ "device_session_id": string }` |

`send-email-code` may additionally receive the optional public
`Accept-Language` header through gateway. Auth resolves the first supported
BCP 47 language tag from that header, falls back to `en` when no supported
value is available, uses the resolved value as the auth-mail locale for the
dedicated `Mail Service` REST contract, and stores it on the challenge as the
create-only preferred-language candidate for a later first-user ensure step.
The created `challenge_id` is sent to `Mail Service` as the raw
`Idempotency-Key` header value of that dedicated REST call.

`client_public_key` is the standard base64-encoded raw 32-byte Ed25519 public
key registered for the created device session. `time_zone` is the
client-selected IANA time zone name.

During the current rollout phase, successful confirms forward create-only user
registration context to `User Service` as the stored preferred-language
candidate from `send-email-code` and the supplied `time_zone`. `User Service`
now validates `preferred_language` as BCP 47 and canonicalizes the stored value
on creation, so the derived public language value must already be a valid
BCP 47 tag before auth forwards it.

Public boundary rules:

- requests and responses are JSON only
- request DTOs reject unknown fields
- empty bodies, malformed JSON, trailing JSON input, and unknown fields return
  `400 invalid_request`
- surrounding ASCII and Unicode whitespace is trimmed from input string fields
  before validation
- `confirm-email-code` requires a non-empty `time_zone` and validates it as an
  IANA time zone name
- `send-email-code` remains success-shaped for existing, new, blocked, and
  throttled e-mail paths
- `send-email-code` may use optional public `Accept-Language` to derive and
  store the auth-mail locale plus future create-only `preferred_language`
  candidate; unsupported or missing values fall back to `en`
- `confirm-email-code` returns a ready `device_session_id` synchronously on
  success

Stable public business-error contract:

| HTTP status | `error.code` | Stable `error.message` |
| --- | --- | --- |
| `400` | `invalid_request` | field-specific validation detail |
| `400` | `invalid_code` | `confirmation code is invalid` |
| `400` | `invalid_client_public_key` | `client_public_key is not a valid base64-encoded raw 32-byte Ed25519 public key` |
| `403` | `blocked_by_policy` | `authentication is blocked by policy` |
| `404` | `challenge_not_found` | `challenge not found` |
| `409` | `session_limit_exceeded` | `active session limit would be exceeded` |
| `410` | `challenge_expired` | `challenge expired` |
| `503` | `service_unavailable` | `service is unavailable` |

The public error envelope is always:

```json
{
  "error": {
    "code": "string",
    "message": "string"
  }
}
```

## Trusted Internal API

The trusted internal REST surface lives under `/api/v1/internal` and is
documented in [`api/internal-openapi.yaml`](api/internal-openapi.yaml).

Implemented endpoints:

- `GET /api/v1/internal/sessions/{device_session_id}`
- `GET /api/v1/internal/users/{user_id}/sessions`
- `POST /api/v1/internal/sessions/{device_session_id}/revoke`
- `POST /api/v1/internal/users/{user_id}/sessions/revoke-all`
- `POST /api/v1/internal/user-blocks`

Key internal API properties:

- all bodies are JSON only
- `ListUserSessions` is newest-first and unpaginated in v1
- revoke and block mutations require audit metadata as `reason_code` and
  `actor`
- `BlockUser` accepts exactly one of `user_id` or `email`
- mutating operations are idempotent and return explicit acknowledgement
  payloads rather than empty `204` responses

Stable internal error surface:

| HTTP status | `error.code` | Stable `error.message` |
| --- | --- | --- |
| `400` | `invalid_request` | field-specific validation detail |
| `404` | `session_not_found` | `session not found` |
| `404` | `subject_not_found` | `subject not found` |
| `500` | `internal_error` | `internal server error` |
| `503` | `service_unavailable` | `service is unavailable` |

## Challenge Model

A challenge represents one short-lived public e-mail-code flow.

Core fields:

- `challenge_id`
- normalized e-mail
- hashed confirmation code
- `status`
- `delivery_state`
- creation and expiration timestamps
- send and confirm attempt counters
- minimal abuse metadata
- stored preferred-language candidate derived at send time
- optional confirmation metadata used for idempotent retry

### Challenge States

Supported `challenge.Status` values:

- `pending_send`
- `sent`
- `delivery_suppressed`
- `delivery_throttled`
- `confirmed_pending_expire`
- `expired`
- `failed`
- `cancelled`

Supported `challenge.DeliveryState` values:

- `pending`
- `sent`
- `suppressed`
- `throttled`
- `failed`

For the dedicated `Mail Service` REST contract, `delivery_state=sent` means auth
successfully handed the request off to
`POST /api/v1/internal/login-code-deliveries` and the mail-delivery pipeline.
That call uses the created `challenge_id` as the raw `Idempotency-Key` header
value. It does not require that the SMTP provider exchange already completed
before `challenge_id` was returned to the caller.

Policy rules:

- initial challenge TTL is `5m`
- confirmed-challenge retention for idempotent retry is `5m`
- max invalid confirm attempts is `5`
- every `send-email-code` call creates a fresh challenge
- resend throttling is e-mail scoped with a fixed `1m` cooldown
- a throttled send still creates a fresh challenge in
  `status=delivery_throttled` and `delivery_state=throttled`
- throttled sends do not call `UserDirectory` and do not call `MailSender`
- blocked sends outside the throttle path become `delivery_suppressed`

Fresh confirm semantics:

- only `sent` and `delivery_suppressed` accept a first successful confirm
- `pending_send`, `delivery_throttled`, `failed`, and `cancelled` return
  `invalid_code`
- expired challenges return `challenge_expired` while the Redis grace window
  keeps the record present, then `challenge_not_found` after cleanup removes
  the key

Idempotent retry semantics:

- a repeated confirm with the same `challenge_id`, valid `code`, and identical
  `client_public_key` on `confirmed_pending_expire` returns the same
  `device_session_id`
- the same confirmed challenge with a different `client_public_key` fails as
  `invalid_code`
- idempotent retry republishes the stored gateway session view

## Device Session And Revoke Model

A device session is created only after successful confirmation.

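
Which challenge states allow that confirmation follows the fresh-confirm and idempotent-retry semantics in the Challenge Model. The following is a minimal sketch of that decision table, assuming the challenge record still exists and the submitted code has already been verified as valid. The `confirmOutcome` function and the outcome labels `first_confirm` and `return_existing_session` are hypothetical; the status values and error codes are the documented ones.

```go
package main

import "fmt"

// Status mirrors the documented challenge.Status values.
type Status string

const (
	StatusPendingSend            Status = "pending_send"
	StatusSent                   Status = "sent"
	StatusDeliverySuppressed     Status = "delivery_suppressed"
	StatusDeliveryThrottled      Status = "delivery_throttled"
	StatusConfirmedPendingExpire Status = "confirmed_pending_expire"
	StatusExpired                Status = "expired"
	StatusFailed                 Status = "failed"
	StatusCancelled              Status = "cancelled"
)

// confirmOutcome maps a stored challenge status to the confirm result.
// It assumes the record is still present and the code itself is valid.
func confirmOutcome(s Status, sameClientPublicKey bool) string {
	switch s {
	case StatusSent, StatusDeliverySuppressed:
		// Only these two states accept a first successful confirm.
		return "first_confirm"
	case StatusConfirmedPendingExpire:
		// Idempotent retry requires the identical client_public_key.
		if sameClientPublicKey {
			return "return_existing_session"
		}
		return "invalid_code"
	case StatusExpired:
		// Applies while the grace window keeps the record present.
		return "challenge_expired"
	case StatusPendingSend, StatusDeliveryThrottled, StatusFailed, StatusCancelled:
		return "invalid_code"
	default:
		// Assumption: unknown states are treated as non-confirmable.
		return "invalid_code"
	}
}

func main() {
	fmt.Println(confirmOutcome(StatusSent, false))
	fmt.Println(confirmOutcome(StatusConfirmedPendingExpire, true))
	fmt.Println(confirmOutcome(StatusDeliveryThrottled, false))
}
```
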
Core fields:

- `device_session_id`
- `user_id`
- parsed client public key
- `status`
- `created_at`
- optional revocation metadata

Supported session states:

- `active`
- `revoked`

Built-in revoke reason codes:

- `device_logout`
- `logout_all`
- `admin_revoke`
- `user_blocked`
- `confirm_race_repair` for best-effort cleanup of superseded sessions created
  during a confirm race

Revoke behavior is intentionally separated by use case:

- revoke one device session
- revoke all sessions of one user
- block a subject and revoke active sessions implied by that subject

Internal mutation responses report only sessions changed by the current call,
so repeated idempotent operations may return:

- `already_revoked` with `affected_session_count=0`
- `no_active_sessions` with `affected_session_count=0`
- `already_blocked` with `affected_session_count=0`

## User Resolution And Session Limits

`Auth / Session Service` does not own durable user records.

It delegates to `UserDirectory` for:

- resolve-by-email without mutation
- ensure existing-or-created user during confirm
- existence checks for stable `user_id`
- block-by-user-id and block-by-email operations

Supported user-resolution outcomes:

- `existing`
- `creatable`
- `blocked`

Supported ensure-user outcomes:

- `existing`
- `created`
- `blocked`

Session-limit rules:

- the value is loaded from a shared config provider
- absent value means the limit is disabled
- active sessions are counted before creating a new one
- limit overflow returns `session_limit_exceeded`
- the service never silently revokes an existing session to satisfy the limit

## Gateway Projection Model

Gateway-facing session projection is separate from source-of-truth
`devicesession.Session`.

Each successful projection publish writes:

- one Redis KV snapshot under ``
- one full-snapshot Redis Stream event under the session-events stream

The default gateway-facing namespaces are:

- cache key prefix: `gateway:session:`
- session-events stream: `gateway:session_events`

Projected fields are intentionally limited to what gateway consumes:

- `device_session_id`
- `user_id`
- `client_public_key`
- `status`
- optional `revoked_at_ms`

Revoke reason and actor metadata stay in authsession source of truth and are
not projected to gateway.

## Consistency Model

Source of truth is written first. Gateway projection is published only after
the source-of-truth write succeeds.

Caller-visible rules:

- if projection publication does not reach its required success threshold, the
  public or internal call returns `service_unavailable`
- already-written source-of-truth state is intentionally preserved
- the documented repair path is to repeat the same confirm or revoke command

Projection publish rules:

- request-path projection publish uses a bounded retry loop with `3` total
  attempts
- repeated publishes are safe because the cache snapshot is overwritten and
  duplicate full-snapshot stream events remain valid under gateway's
  later-event-wins model
- `confirm-email-code` rereads the stored session after the challenge CAS
  succeeds and republishes that current view so a concurrent revoke or block
  cannot overwrite source of truth with a stale active projection
- idempotent confirm retry also republishes the stored session view
- best-effort cleanup of superseded confirm-race sessions uses the same publish
  helper but is not part of the caller-visible success contract

## Runtime Summary

Runtime wiring is implemented in [`internal/app`](internal/app) and
[`cmd/authsession`](cmd/authsession/main.go).

Process-local collaborators:

- system UTC clock
- crypto-random `challenge_id` and `device_session_id` generators
- crypto-random 6-digit confirmation-code generator
- bcrypt-backed code hashing
- structured logging through `zap`
- process telemetry through OpenTelemetry

Redis-backed adapters:

- challenge store
- session store
- session-limit config provider
- gateway projection publisher
- send-email-code abuse protector

External service adapters:

- user-service:
  - default `stub`
  - optional REST adapter with one retry for read-style methods on transport
    errors and HTTP `502`, `503`, or `504`
  - mutation methods do not auto-retry
- mail-service:
  - default `stub`
  - optional REST adapter with no automatic retry on transport or upstream
    failure, to avoid duplicate deliveries

Listener defaults:

- public HTTP: `:8080`
- internal HTTP: `:8081`
- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`
- per-request use-case timeout: `3s`

For detailed runtime behavior, configuration groups, operational notes, and
examples, see [`docs/README.md`](docs/README.md).

## Non-Goals

- making authsession a hot synchronous dependency for every authenticated
  gateway command
- moving business authorization into authsession
- exposing revoke or read operations as public unauthenticated routes
- introducing short-lived access-token or refresh-token flows
- adding pending async session provisioning after confirm