AUTHSESSION_USER_SERVICE_MODE=stub|rest
AUTHSESSION_MAIL_SERVICE_MODE=stub|rest
OTLP telemetry through standard OTEL_* variables
stdout telemetry through AUTHSESSION_OTEL_STDOUT_TRACES_ENABLED and AUTHSESSION_OTEL_STDOUT_METRICS_ENABLED

Operational caveats:

the service exposes no /healthz, /readyz, or /metrics endpoints
user-service and mail-service default to in-process stub adapters until rest mode is configured
startup performs bounded Redis PING checks for every Redis-backed adapter and fails fast if Redis is unreachable or the runtime config is invalid

Additional module docs:

Purpose

Auth / Session Service owns e-mail-code authentication and the lifecycle of device sessions.

It is the source of truth for:

authentication challenges
device sessions
revoke and block state
publication of session lifecycle updates consumed by Edge Gateway

The service is intentionally not on the hot path for every authenticated request. Gateway authenticates steady-state requests from its own session cache, kept current by session-lifecycle updates, instead of making a synchronous round-trip to auth for each command.

Responsibilities

The service is responsible for:

public auth commands:
- send-email-code
- confirm-email-code
creating device sessions after successful confirmation
registering the client public key for a newly created session
revoking one device session
revoking all sessions of one user
blocking a user or e-mail subject for future auth flows
persisting source-of-truth session state
projecting session state into gateway-consumable Redis data
exposing a trusted internal REST API for read, revoke, and block operations

The service is not responsible for:

verifying authenticated transport signatures on every business request
gateway anti-replay for authenticated command traffic
downstream business authorization
direct push delivery to clients
long-lived hot-path session caching inside gateway
mail-service implementation details beyond the mail-delivery contract

Position in the System

flowchart LR
    Client["Client"]
    Gateway["Edge Gateway"]
    Auth["Auth / Session Service"]
    User["User Service"]
    Mail["Mail Service"]
    Redis["Redis"]
    Business["Business Services"]

    Client --> Gateway
    Gateway --> Auth
    Gateway --> Business
    Auth --> User
    Auth --> Mail
    Auth --> Redis
    Redis --> Gateway

Main Principles

public auth stays synchronous
send-email-code returns challenge_id
confirm-email-code returns a ready device_session_id
no pending async session-provisioning stage exists
source-of-truth session state and gateway-facing projection remain separate
Redis is the initial backend, but the domain and service layers stay storage agnostic behind ports
send-email-code stays success-shaped for existing, new, blocked, and throttled e-mail flows
confirm-email-code supports short-window idempotent retry for the same confirmed challenge and the same client_public_key
active-session limits are configuration driven:
- absent limit means disabled
- limit overflow rejects new session creation explicitly
- the service does not evict existing sessions to make room

Gateway-Facing Public Contract

Gateway already exposes the public REST auth surface and delegates it to this service:

POST /api/v1/public/auth/send-email-code
POST /api/v1/public/auth/confirm-email-code

The effective DTO contract is:

Operation	Request	Success response
`POST /api/v1/public/auth/send-email-code`	`{ "email": string }`	`{ "challenge_id": string }`
`POST /api/v1/public/auth/confirm-email-code`	`{ "challenge_id": string, "code": string, "client_public_key": string, "time_zone": string }`	`{ "device_session_id": string }`

client_public_key is the standard base64-encoded raw 32-byte Ed25519 public key registered for the created device session, and time_zone is the client-selected IANA time zone name. During the current rollout phase, a successful confirm forwards create-only user registration context to User Service with preferred_language="en" and the supplied time_zone; the fixed language value stays in place until gateway GeoIP-based language derivation is deployed.

Public boundary rules:

requests and responses are JSON only
request DTOs reject unknown fields
empty bodies, malformed JSON, trailing JSON input, and unknown fields return 400 invalid_request
surrounding ASCII and Unicode whitespace is trimmed from input string fields before validation
confirm-email-code requires a non-empty time_zone and validates it as an IANA time zone name
send-email-code remains success-shaped for existing, new, blocked, and throttled e-mail paths
confirm-email-code returns a ready device_session_id synchronously on success
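The boundary rules above map naturally onto encoding/json's strict decoding options. This is a minimal sketch for the confirm-email-code DTO, assuming IANA validation is done with time.LoadLocation against the embedded time/tzdata database. decodeConfirmRequest and errInvalidRequest are hypothetical names, and the field-specific error detail is omitted.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"io"
	"strings"
	"time"
	_ "time/tzdata" // embed the IANA database so LoadLocation works without system zoneinfo
)

// confirmEmailCodeRequest mirrors the public confirm-email-code DTO.
type confirmEmailCodeRequest struct {
	ChallengeID     string `json:"challenge_id"`
	Code            string `json:"code"`
	ClientPublicKey string `json:"client_public_key"`
	TimeZone        string `json:"time_zone"`
}

// errInvalidRequest (hypothetical) stands in for the 400 invalid_request error.
var errInvalidRequest = errors.New("invalid_request")

// decodeConfirmRequest (hypothetical) applies the public boundary rules to a raw body.
func decodeConfirmRequest(body []byte) (confirmEmailCodeRequest, error) {
	var req confirmEmailCodeRequest
	if len(bytes.TrimSpace(body)) == 0 {
		return req, errInvalidRequest // empty body
	}
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields() // unknown fields make the Decode below fail
	if err := dec.Decode(&req); err != nil {
		return req, errInvalidRequest // malformed JSON or unknown field
	}
	// Anything other than whitespace after the first JSON value is trailing input.
	var extra json.RawMessage
	if err := dec.Decode(&extra); err != io.EOF {
		return req, errInvalidRequest
	}
	// strings.TrimSpace strips leading and trailing Unicode whitespace, ASCII included.
	req.ChallengeID = strings.TrimSpace(req.ChallengeID)
	req.Code = strings.TrimSpace(req.Code)
	req.ClientPublicKey = strings.TrimSpace(req.ClientPublicKey)
	req.TimeZone = strings.TrimSpace(req.TimeZone)
	// LoadLocation maps "" to UTC and "Local" to the host zone, so reject both explicitly.
	if req.TimeZone == "" || req.TimeZone == "Local" {
		return req, errInvalidRequest
	}
	if _, err := time.LoadLocation(req.TimeZone); err != nil {
		return req, errInvalidRequest // not a known IANA zone name
	}
	return req, nil
}
```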

Stable public business-error contract:

HTTP status	`error.code`	Stable `error.message`
`400`	`invalid_request`	field-specific validation detail
`400`	`invalid_code`	`confirmation code is invalid`
`400`	`invalid_client_public_key`	`client_public_key is not a valid base64-encoded raw 32-byte Ed25519 public key`
`403`	`blocked_by_policy`	`authentication is blocked by policy`
`404`	`challenge_not_found`	`challenge not found`
`409`	`session_limit_exceeded`	`active session limit would be exceeded`
`410`	`challenge_expired`	`challenge expired`
`503`	`service_unavailable`	`service is unavailable`

The public error envelope is always:

{
  "error": {
    "code": "string",
    "message": "string"
  }
}
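The error table and envelope above can be modelled as plain Go values. This is a minimal sketch that shows only three table rows. The publicError, encodeError, and writePublicError names are hypothetical, and the real handlers may be structured differently.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// errorBody and errorEnvelope reproduce the fixed public error shape.
type errorBody struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

type errorEnvelope struct {
	Error errorBody `json:"error"`
}

// publicError (hypothetical) pairs an HTTP status with one row of the stable error table.
type publicError struct {
	Status  int
	Code    string
	Message string
}

var (
	errChallengeNotFound    = publicError{Status: http.StatusNotFound, Code: "challenge_not_found", Message: "challenge not found"}
	errSessionLimitExceeded = publicError{Status: http.StatusConflict, Code: "session_limit_exceeded", Message: "active session limit would be exceeded"}
	errServiceUnavailable   = publicError{Status: http.StatusServiceUnavailable, Code: "service_unavailable", Message: "service is unavailable"}
)

// encodeError (hypothetical) renders the envelope body for one stable error.
func encodeError(e publicError) ([]byte, error) {
	return json.Marshal(errorEnvelope{Error: errorBody{Code: e.Code, Message: e.Message}})
}

// writePublicError (hypothetical) sends the status code and the JSON envelope.
func writePublicError(w http.ResponseWriter, e publicError) {
	body, _ := encodeError(e) // marshalling two string fields cannot fail
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(e.Status)
	_, _ = w.Write(body)
}
```

Keeping code and message together in one value makes it hard for a handler to pair a stable code with an ad hoc message.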

Trusted Internal API

The trusted internal REST surface lives under /api/v1/internal and is documented in api/internal-openapi.yaml.

Implemented endpoints:

GET /api/v1/internal/sessions/{device_session_id}
GET /api/v1/internal/users/{user_id}/sessions
POST /api/v1/internal/sessions/{device_session_id}/revoke
POST /api/v1/internal/users/{user_id}/sessions/revoke-all
POST /api/v1/internal/user-blocks

Key internal API properties:

all bodies are JSON only
ListUserSessions is newest-first and unpaginated in v1
revoke and block mutations require audit metadata in the reason_code and actor fields
BlockUser accepts exactly one of user_id or email
mutating operations are idempotent and return explicit acknowledgement payloads rather than empty 204 responses
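The "exactly one of user_id or email" rule and the audit-metadata requirement reduce to a small validation step. This is a minimal sketch that assumes actor is a plain string; api/internal-openapi.yaml may define it differently. blockUserRequest and validate are hypothetical names.

```go
package main

import (
	"errors"
	"strings"
)

// blockUserRequest is a hypothetical shape for POST /api/v1/internal/user-blocks.
type blockUserRequest struct {
	UserID     string `json:"user_id,omitempty"`
	Email      string `json:"email,omitempty"`
	ReasonCode string `json:"reason_code"`
	Actor      string `json:"actor"` // assumed to be a string in this sketch
}

// validate (hypothetical) enforces exactly one subject plus audit metadata.
func (r blockUserRequest) validate() error {
	hasUser := strings.TrimSpace(r.UserID) != ""
	hasEmail := strings.TrimSpace(r.Email) != ""
	if hasUser == hasEmail { // both set, or neither set
		return errors.New("exactly one of user_id or email must be set")
	}
	if strings.TrimSpace(r.ReasonCode) == "" || strings.TrimSpace(r.Actor) == "" {
		return errors.New("reason_code and actor are required")
	}
	return nil
}
```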

Stable internal error surface:

HTTP status	`error.code`	Stable `error.message`
`400`	`invalid_request`	field-specific validation detail
`404`	`session_not_found`	`session not found`
`404`	`subject_not_found`	`subject not found`
`500`	`internal_error`	`internal server error`
`503`	`service_unavailable`	`service is unavailable`

Challenge Model

A challenge represents one short-lived public e-mail-code flow.

Core fields:

challenge_id
normalized e-mail
hashed confirmation code
status
delivery_state
creation and expiration timestamps
send and confirm attempt counters
minimal abuse metadata
optional confirmation metadata used for idempotent retry

Challenge States

Supported challenge.Status values:

pending_send
sent
delivery_suppressed
delivery_throttled
confirmed_pending_expire
expired
failed
cancelled

Supported challenge.DeliveryState values:

pending
sent
suppressed
throttled
failed

Policy rules:

initial challenge TTL is 5m
confirmed-challenge retention for idempotent retry is 5m
max invalid confirm attempts is 5
every send-email-code call creates a fresh challenge
resend throttling is e-mail scoped with a fixed 1m cooldown
a throttled send still creates a fresh challenge in status=delivery_throttled and delivery_state=throttled
throttled sends do not call UserDirectory and do not call MailSender
non-throttled sends for a blocked subject become delivery_suppressed

Fresh confirm semantics:

only sent and delivery_suppressed accept a first successful confirm
pending_send, delivery_throttled, failed, and cancelled return invalid_code
expired challenges return challenge_expired while the Redis grace window keeps the record present, then challenge_not_found after cleanup removes the key
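The fresh-confirm rules above amount to a status lookup. This is a minimal sketch that assumes callers route confirmed_pending_expire to the idempotent retry path before this check, and that expiry is already reflected in the stored status. checkFreshConfirm and the sentinel errors are hypothetical names.

```go
package main

import "errors"

// Hypothetical sentinels for the public error codes.
var (
	errInvalidCode       = errors.New("invalid_code")
	errChallengeExpired  = errors.New("challenge_expired")
	errChallengeNotFound = errors.New("challenge_not_found")
)

// checkFreshConfirm (hypothetical) maps a loaded challenge to the first-confirm
// outcome. found=false models a key already removed by Redis cleanup.
func checkFreshConfirm(found bool, status string) error {
	if !found {
		return errChallengeNotFound
	}
	switch status {
	case "sent", "delivery_suppressed":
		return nil // the submitted code is then checked against the stored hash
	case "expired":
		return errChallengeExpired // record still inside the Redis grace window
	default: // pending_send, delivery_throttled, failed, cancelled
		return errInvalidCode
	}
}
```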

Idempotent retry semantics:

a repeated confirm with the same challenge_id, valid code, and identical client_public_key on confirmed_pending_expire returns the same device_session_id
the same confirmed challenge with a different client_public_key fails as invalid_code
idempotent retry republishes the stored gateway session view
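Idempotent retry hinges on comparing the resubmitted key with the one stored at first confirm. This is a minimal sketch under that reading. The confirmedChallenge fields and retryConfirm are hypothetical names, and codeValid stands in for the bcrypt comparison against the stored hash.

```go
package main

import (
	"bytes"
	"errors"
)

var errInvalidCode = errors.New("invalid_code")

// confirmedChallenge (hypothetical) holds the confirmation metadata kept for retries.
type confirmedChallenge struct {
	Status          string
	ClientPublicKey []byte // key registered by the first successful confirm
	DeviceSessionID string // session created by that confirm
}

// retryConfirm (hypothetical) returns the original device_session_id only when the
// code is valid and client_public_key is byte-for-byte identical.
func retryConfirm(ch confirmedChallenge, codeValid bool, clientPublicKey []byte) (string, error) {
	if ch.Status != "confirmed_pending_expire" || !codeValid {
		return "", errInvalidCode // defensive: callers route only confirmed challenges here
	}
	if !bytes.Equal(ch.ClientPublicKey, clientPublicKey) {
		return "", errInvalidCode // a different key never reuses the session
	}
	// The caller republishes the stored gateway session view before responding.
	return ch.DeviceSessionID, nil
}
```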

Device Session And Revoke Model

A device session is created only after successful confirmation.

Core fields:

device_session_id
user_id
parsed client public key
status
created_at
optional revocation metadata

Supported session states:

active
revoked

Built-in revoke reason codes:

device_logout
logout_all
admin_revoke
user_blocked
confirm_race_repair, used for best-effort cleanup of superseded sessions created during a confirm race

Revoke behavior is intentionally separated by use case:

revoke one device session
revoke all sessions of one user
block a subject and revoke active sessions implied by that subject

Internal mutation responses report only sessions changed by the current call, so repeated idempotent operations may return:

already_revoked with affected_session_count=0
no_active_sessions with affected_session_count=0
already_blocked with affected_session_count=0
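Counting only the sessions changed by the current call is what makes a repeated revoke-all report zero. This is a minimal in-memory sketch of that idea for revoke-all. The "outcome" JSON field name and the "revoked" label for a call that did change sessions are placeholders that the source does not define. revokeAllForUser and deviceSession are hypothetical, and revocation metadata (reason_code, actor, timestamps) is omitted.

```go
package main

// revokeAllResult is a hypothetical acknowledgement payload.
type revokeAllResult struct {
	Outcome              string `json:"outcome"` // hypothetical field name
	AffectedSessionCount int    `json:"affected_session_count"`
}

// deviceSession is a hypothetical, stripped-down session record.
type deviceSession struct {
	ID     string
	UserID string
	Status string // "active" or "revoked"
}

// revokeAllForUser (hypothetical) revokes every active session of userID and
// counts only the sessions this call changed, so a repeat call reports zero.
func revokeAllForUser(sessions []*deviceSession, userID string) revokeAllResult {
	changed := 0
	for _, s := range sessions {
		if s.UserID == userID && s.Status == "active" {
			s.Status = "revoked"
			changed++
		}
	}
	if changed == 0 {
		return revokeAllResult{Outcome: "no_active_sessions"}
	}
	return revokeAllResult{Outcome: "revoked", AffectedSessionCount: changed} // "revoked" is a placeholder label
}
```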

User Resolution And Session Limits

Auth / Session Service does not own durable user records. It delegates to UserDirectory for:

resolve-by-email without mutation
ensure existing-or-created user during confirm
existence checks for stable user_id
block-by-user-id and block-by-email operations

Supported user-resolution outcomes:

existing
creatable
blocked

Supported ensure-user outcomes:

existing
created
blocked

Session-limit rules:

the limit value is loaded from a shared config provider
absent value means the limit is disabled
active sessions are counted before creating a new one
limit overflow returns session_limit_exceeded
the service never silently revokes an existing session to satisfy the limit
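The session-limit rules above fit in a few lines if an absent limit is modelled as a nil pointer. That representation is an assumption of this sketch; checkSessionLimit and errSessionLimitExceeded are hypothetical names.

```go
package main

import "errors"

var errSessionLimitExceeded = errors.New("session_limit_exceeded")

// checkSessionLimit (hypothetical) runs before a new session is created.
// A nil limit means the limit is disabled; existing sessions are never evicted.
func checkSessionLimit(limit *int, activeCount int) error {
	if limit == nil {
		return nil // absent value: no limit
	}
	if activeCount >= *limit { // creating one more would exceed the limit
		return errSessionLimitExceeded
	}
	return nil
}
```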

Gateway Projection Model

Gateway-facing session projection is separate from source-of-truth devicesession.Session.

Each successful projection publish writes:

one Redis KV snapshot under <gateway_session_cache_key_prefix><device_session_id>
one full-snapshot Redis Stream event under the session-events stream

The default gateway-facing namespaces are:

cache key prefix: gateway:session:
session-events stream: gateway:session_events

Projected fields are intentionally limited to what gateway consumes:

device_session_id
user_id
client_public_key
status
optional revoked_at_ms

Revoke reason and actor metadata stay in authsession source of truth and are not projected to gateway.
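The projected view and its cache key are simple to express as a Go struct. This is a minimal sketch that assumes the snapshot is serialized as JSON and that client_public_key travels as its base64 string. Neither choice is stated above. gatewaySessionView, cacheKey, and encodeView are hypothetical, and the Redis writes themselves are not shown.

```go
package main

import "encoding/json"

// defaultCacheKeyPrefix is the documented default gateway cache key prefix.
const defaultCacheKeyPrefix = "gateway:session:"

// gatewaySessionView (hypothetical) carries only the fields gateway consumes;
// reason and actor metadata are deliberately absent.
type gatewaySessionView struct {
	DeviceSessionID string `json:"device_session_id"`
	UserID          string `json:"user_id"`
	ClientPublicKey string `json:"client_public_key"`
	Status          string `json:"status"`
	RevokedAtMs     *int64 `json:"revoked_at_ms,omitempty"` // omitted for active sessions
}

// cacheKey (hypothetical) builds <prefix><device_session_id>.
func cacheKey(prefix, deviceSessionID string) string {
	return prefix + deviceSessionID
}

// encodeView (hypothetical) serializes the snapshot written to the KV key and stream.
func encodeView(v gatewaySessionView) ([]byte, error) {
	return json.Marshal(v)
}
```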

Consistency Model

Source of truth is written first. Gateway projection is published only after the source-of-truth write succeeds.

Caller-visible rules:

if projection publication does not reach its required success threshold, the public or internal call returns service_unavailable
already-written source-of-truth state is intentionally preserved
the documented repair path is to repeat the same confirm or revoke command

Projection publish rules:

request-path projection publish uses a bounded retry loop with 3 total attempts
repeated publishes are safe because the cache snapshot is overwritten and duplicate full-snapshot stream events remain valid under gateway's later-event-wins model
confirm-email-code rereads the stored session after the challenge CAS succeeds and republishes that current view, so the gateway projection reflects any concurrent revoke or block instead of a stale active view
idempotent confirm retry also republishes the stored session view
best-effort cleanup of superseded confirm-race sessions uses the same publish helper but is not part of the caller-visible success contract

Runtime Summary

Runtime wiring is implemented in internal/app and cmd/authsession.

Process-local collaborators:

system UTC clock
crypto-random challenge_id and device_session_id generators
crypto-random 6-digit confirmation-code generator
bcrypt-backed code hashing
structured logging through zap
process telemetry through OpenTelemetry
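A crypto-random 6-digit code can be drawn uniformly with crypto/rand and math/big, as in the minimal sketch below. newConfirmationCode is a hypothetical name. The bcrypt hashing step is not shown because it lives in golang.org/x/crypto/bcrypt rather than the standard library.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// newConfirmationCode (hypothetical) returns a uniformly random code in
// 000000..999999, zero-padded to six digits.
func newConfirmationCode() (string, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(1_000_000)) // uniform in [0, 1e6)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%06d", n.Int64()), nil
}
```

rand.Int samples without modulo bias, which a naive `randomUint32 % 1_000_000` would introduce.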

Redis-backed adapters:

challenge store
session store
session-limit config provider
gateway projection publisher
send-email-code abuse protector

External service adapters:

user-service:
- default stub
- optional REST adapter with one retry for read-style methods on transport errors and HTTP 502, 503, or 504
- mutation methods do not auto-retry
mail-service:
- default stub
- optional REST adapter with no automatic retry on transport or upstream failure, to avoid duplicate deliveries

Listener defaults:

public HTTP: :8080
internal HTTP: :8081
read-header timeout: 2s
read timeout: 10s
idle timeout: 1m
per-request use-case timeout: 3s

For detailed runtime behavior, configuration groups, operational notes, and examples, see docs/README.md.

Non-Goals

making authsession a hot synchronous dependency for every authenticated gateway command
moving business authorization into authsession
exposing revoke or read operations as public unauthenticated routes
introducing short-lived access-token or refresh-token flows
adding pending async session provisioning after confirm