Auth / Session Service
Run and Dependencies
cmd/authsession starts two HTTP listeners:
- public REST on AUTHSESSION_PUBLIC_HTTP_ADDR with default :8080
- trusted internal REST on AUTHSESSION_INTERNAL_HTTP_ADDR with default :8081
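As a minimal sketch of how the two listener addresses could be resolved from the environment, the following uses only the variable names and defaults listed above. The helper names envOr and LoadListenerAddrs are hypothetical, and treating an empty variable as unset is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"os"
)

// envOr returns the value of the environment variable key, or fallback
// when the variable is unset or empty (treating empty as unset is an
// assumption of this sketch).
func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// ListenerAddrs holds the two listen addresses described above.
type ListenerAddrs struct {
	Public   string
	Internal string
}

// LoadListenerAddrs is a hypothetical helper, not the service's real loader.
func LoadListenerAddrs() ListenerAddrs {
	return ListenerAddrs{
		Public:   envOr("AUTHSESSION_PUBLIC_HTTP_ADDR", ":8080"),
		Internal: envOr("AUTHSESSION_INTERNAL_HTTP_ADDR", ":8081"),
	}
}

func main() {
	addrs := LoadListenerAddrs()
	fmt.Printf("public=%s internal=%s\n", addrs.Public, addrs.Internal)
}
```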
Startup requires:
- one reachable Redis deployment configured by AUTHSESSION_REDIS_ADDR
That Redis deployment is used for:
- source-of-truth challenges
- source-of-truth device sessions
- dynamic active-session limit config
- gateway session projection cache and stream updates
- send-email-code resend throttling
Optional integrations:
- AUTHSESSION_USER_SERVICE_MODE=stub|rest
- AUTHSESSION_MAIL_SERVICE_MODE=stub|rest
- OTLP telemetry through standard OTEL_* variables
- stdout telemetry through AUTHSESSION_OTEL_STDOUT_TRACES_ENABLED and AUTHSESSION_OTEL_STDOUT_METRICS_ENABLED
Operational caveats:
- the service exposes no /healthz, /readyz, or /metrics endpoints
- user-service and mail-service default to in-process stub adapters until rest mode is configured
- startup performs bounded Redis PING checks for every Redis-backed adapter and fails fast if Redis or runtime config is invalid
Additional module docs:
Purpose
Auth / Session Service owns e-mail-code authentication and the lifecycle of
device sessions.
It is the source of truth for:
- authentication challenges
- device sessions
- revoke and block state
- publication of session lifecycle updates consumed by Edge Gateway
The service is intentionally not on the hot path for every authenticated request. Gateway authenticates the steady-state request path from its own cache and session-lifecycle updates rather than by synchronous round-trips back to auth for each command.
Responsibilities
The service is responsible for:
- public auth commands:
  - send-email-code
  - confirm-email-code
- creating device sessions after successful confirmation
- registering the client public key for a newly created session
- revoking one device session
- revoking all sessions of one user
- blocking a user or e-mail subject for future auth flows
- persisting source-of-truth session state
- projecting session state into gateway-consumable Redis data
- exposing a trusted internal REST API for read, revoke, and block operations
The service is not responsible for:
- verifying authenticated transport signatures on every business request
- gateway anti-replay for authenticated command traffic
- downstream business authorization
- direct push delivery to clients
- long-lived hot-path session caching inside gateway
- mail-service implementation details beyond the mail-delivery contract
Position in the System
flowchart LR
Client["Client"]
Gateway["Edge Gateway"]
Auth["Auth / Session Service"]
User["User Service"]
Mail["Mail Service"]
Redis["Redis"]
Business["Business Services"]
Client --> Gateway
Gateway --> Auth
Gateway --> Business
Auth --> User
Auth --> Mail
Auth --> Redis
Redis --> Gateway
Main Principles
- public auth stays synchronous
  - send-email-code returns challenge_id
  - confirm-email-code returns a ready device_session_id
  - no pending async session-provisioning stage exists
- source-of-truth session state and gateway-facing projection remain separate
- Redis is the initial backend, but the domain and service layers stay storage agnostic behind ports
- send-email-code stays success-shaped for existing, new, blocked, and throttled e-mail flows
- confirm-email-code supports short-window idempotent retry for the same confirmed challenge and the same client_public_key
- active-session limits are configuration driven:
  - absent limit means disabled
  - limit overflow rejects new session creation explicitly
  - the service does not evict existing sessions to make room
Gateway-Facing Public Contract
Gateway already exposes the public REST auth surface and delegates it to this service:
- POST /api/v1/public/auth/send-email-code
- POST /api/v1/public/auth/confirm-email-code
The effective DTO contract is:
| Operation | Request | Success response |
|---|---|---|
| POST /api/v1/public/auth/send-email-code | { "email": string } | { "challenge_id": string } |
| POST /api/v1/public/auth/confirm-email-code | { "challenge_id": string, "code": string, "client_public_key": string, "time_zone": string } | { "device_session_id": string } |
client_public_key is the standard base64-encoded raw 32-byte Ed25519 public
key registered for the created device session.
time_zone is the client-selected IANA time zone name. During the current
rollout phase, successful confirms forward create-only user registration
context to User Service as preferred_language="en" and the supplied
time_zone until gateway geoip-based language derivation is deployed.
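The two field formats above can be checked with the Go standard library alone. The sketch below is an illustration under stated assumptions, not the service's validator: the function and error names are hypothetical, the time zone database is embedded through time/tzdata so lookups do not depend on the host, and the key check covers only encoding and length (the standard library does not offer a curve-point validity check for raw Ed25519 keys).

```go
package main

import (
	"crypto/ed25519"
	"encoding/base64"
	"errors"
	"fmt"
	"time"
	_ "time/tzdata" // embed the IANA database so lookups do not depend on the host
)

// Hypothetical error values; the messages mirror the wording used in this document.
var (
	ErrInvalidClientPublicKey = errors.New("client_public_key is not a valid base64-encoded raw 32-byte Ed25519 public key")
	ErrInvalidTimeZone        = errors.New("time_zone must be an IANA time zone name")
)

// ParseClientPublicKey decodes standard base64 and enforces the raw 32-byte length.
func ParseClientPublicKey(encoded string) (ed25519.PublicKey, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil || len(raw) != ed25519.PublicKeySize {
		return nil, ErrInvalidClientPublicKey
	}
	return ed25519.PublicKey(raw), nil
}

// ValidateTimeZone accepts only names that time.LoadLocation can resolve.
// LoadLocation also accepts "" (UTC) and "Local", which are not IANA names,
// so both are rejected explicitly.
func ValidateTimeZone(name string) error {
	if name == "" || name == "Local" {
		return ErrInvalidTimeZone
	}
	if _, err := time.LoadLocation(name); err != nil {
		return ErrInvalidTimeZone
	}
	return nil
}

func main() {
	key := base64.StdEncoding.EncodeToString(make([]byte, ed25519.PublicKeySize))
	_, err := ParseClientPublicKey(key)
	fmt.Println("key accepted:", err == nil)
	fmt.Println("zone accepted:", ValidateTimeZone("Europe/Berlin") == nil)
}
```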
Public boundary rules:
- requests and responses are JSON only
- request DTOs reject unknown fields
- empty bodies, malformed JSON, trailing JSON input, and unknown fields return 400 invalid_request
- surrounding ASCII and Unicode whitespace is trimmed from input string fields before validation
- confirm-email-code requires a non-empty time_zone and validates it as an IANA time zone name
- send-email-code remains success-shaped for existing, new, blocked, and throttled e-mail paths
- confirm-email-code returns a ready device_session_id synchronously on success
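The decoding rules above map fairly directly onto encoding/json. The following is a sketch of one way to enforce them for the send-email-code body; the type and function names are hypothetical, and rejecting an e-mail that is empty after trimming is an assumption of this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// SendEmailCodeRequest mirrors the documented request DTO.
type SendEmailCodeRequest struct {
	Email string `json:"email"`
}

// ErrInvalidRequest stands in for the 400 invalid_request outcome.
var ErrInvalidRequest = errors.New("invalid_request")

// DecodeSendEmailCode is a hypothetical strict decoder.
func DecodeSendEmailCode(body []byte) (SendEmailCodeRequest, error) {
	var req SendEmailCodeRequest
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields()
	// Fails on an empty body, malformed JSON, or an unknown field.
	if err := dec.Decode(&req); err != nil {
		return SendEmailCodeRequest{}, ErrInvalidRequest
	}
	// A second Decode must report io.EOF; anything else is trailing input.
	if err := dec.Decode(&struct{}{}); err != io.EOF {
		return SendEmailCodeRequest{}, ErrInvalidRequest
	}
	// strings.TrimSpace removes ASCII and Unicode whitespace.
	req.Email = strings.TrimSpace(req.Email)
	if req.Email == "" { // assumption: an empty e-mail is a validation error
		return SendEmailCodeRequest{}, ErrInvalidRequest
	}
	return req, nil
}

func main() {
	req, err := DecodeSendEmailCode([]byte(`{"email":"  user@example.com "}`))
	fmt.Printf("%q %v\n", req.Email, err)
}
```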
Stable public business-error contract:
| HTTP status | error.code | Stable error.message |
|---|---|---|
| 400 | invalid_request | field-specific validation detail |
| 400 | invalid_code | confirmation code is invalid |
| 400 | invalid_client_public_key | client_public_key is not a valid base64-encoded raw 32-byte Ed25519 public key |
| 403 | blocked_by_policy | authentication is blocked by policy |
| 404 | challenge_not_found | challenge not found |
| 409 | session_limit_exceeded | active session limit would be exceeded |
| 410 | challenge_expired | challenge expired |
| 503 | service_unavailable | service is unavailable |
The public error envelope is always:
{
"error": {
"code": "string",
"message": "string"
}
}
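A minimal sketch of how a handler could emit this envelope in Go follows. The type and function names are hypothetical and the Content-Type header value is an assumption; only the JSON shape and the status/code pairs come from the contract above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errorBody and errorEnvelope mirror the documented envelope shape.
type errorBody struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

type errorEnvelope struct {
	Error errorBody `json:"error"`
}

// WriteError is a hypothetical helper that writes one envelope.
func WriteError(w http.ResponseWriter, status int, code, message string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(errorEnvelope{Error: errorBody{Code: code, Message: message}})
}

// RenderError runs WriteError against an in-memory recorder and returns
// the status and body, which keeps the sketch runnable without a server.
func RenderError(status int, code, message string) (int, string) {
	rec := httptest.NewRecorder()
	WriteError(rec, status, code, message)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := RenderError(http.StatusGone, "challenge_expired", "challenge expired")
	fmt.Print(status, " ", body)
}
```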
Trusted Internal API
The trusted internal REST surface lives under /api/v1/internal and is
documented in api/internal-openapi.yaml.
Implemented endpoints:
- GET /api/v1/internal/sessions/{device_session_id}
- GET /api/v1/internal/users/{user_id}/sessions
- POST /api/v1/internal/sessions/{device_session_id}/revoke
- POST /api/v1/internal/users/{user_id}/sessions/revoke-all
- POST /api/v1/internal/user-blocks
Key internal API properties:
- all bodies are JSON only
- ListUserSessions is newest-first and unpaginated in v1
- revoke and block mutations require audit metadata as reason_code and actor
- BlockUser accepts exactly one of user_id or email
- mutating operations are idempotent and return explicit acknowledgement payloads rather than empty 204 responses
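The "exactly one of" rule and the required audit metadata can be illustrated with a small validation sketch. The struct below is hypothetical: the authoritative schema is in api/internal-openapi.yaml, and modelling actor as a plain string is an assumption made only to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// BlockUserRequest is a hypothetical request shape for this sketch only.
type BlockUserRequest struct {
	UserID     string `json:"user_id,omitempty"`
	Email      string `json:"email,omitempty"`
	ReasonCode string `json:"reason_code"`
	Actor      string `json:"actor"` // assumption: the real actor may be structured
}

// ErrInvalidRequest stands in for the 400 invalid_request outcome.
var ErrInvalidRequest = errors.New("invalid_request")

// Validate enforces exactly one subject selector plus the audit metadata.
func (r BlockUserRequest) Validate() error {
	hasUser := r.UserID != ""
	hasEmail := r.Email != ""
	if hasUser == hasEmail { // both set or both empty
		return fmt.Errorf("%w: exactly one of user_id or email is required", ErrInvalidRequest)
	}
	if r.ReasonCode == "" || r.Actor == "" {
		return fmt.Errorf("%w: reason_code and actor are required", ErrInvalidRequest)
	}
	return nil
}

func main() {
	ok := BlockUserRequest{Email: "user@example.com", ReasonCode: "user_blocked", Actor: "admin"}
	both := BlockUserRequest{UserID: "u1", Email: "user@example.com", ReasonCode: "user_blocked", Actor: "admin"}
	fmt.Println(ok.Validate())
	fmt.Println(both.Validate())
}
```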
Stable internal error surface:
| HTTP status | error.code | Stable error.message |
|---|---|---|
| 400 | invalid_request | field-specific validation detail |
| 404 | session_not_found | session not found |
| 404 | subject_not_found | subject not found |
| 500 | internal_error | internal server error |
| 503 | service_unavailable | service is unavailable |
Challenge Model
A challenge represents one short-lived public e-mail-code flow.
Core fields:
- challenge_id
- normalized e-mail
- hashed confirmation code
- status
- delivery_state
- creation and expiration timestamps
- send and confirm attempt counters
- minimal abuse metadata
- optional confirmation metadata used for idempotent retry
Challenge States
Supported challenge.Status values:
- pending_send
- sent
- delivery_suppressed
- delivery_throttled
- confirmed_pending_expire
- expired
- failed
- cancelled
Supported challenge.DeliveryState values:
- pending
- sent
- suppressed
- throttled
- failed
Policy rules:
- initial challenge TTL is 5m
- confirmed-challenge retention for idempotent retry is 5m
- max invalid confirm attempts is 5
- every send-email-code call creates a fresh challenge
- resend throttling is e-mail scoped with a fixed 1m cooldown
- a throttled send still creates a fresh challenge in status=delivery_throttled and delivery_state=throttled
- throttled sends do not call UserDirectory and do not call MailSender
- blocked sends outside the throttle path become delivery_suppressed
Fresh confirm semantics:
- only sent and delivery_suppressed accept a first successful confirm
- pending_send, delivery_throttled, failed, and cancelled return invalid_code
- expired challenges return challenge_expired while the Redis grace window keeps the record present, then challenge_not_found after cleanup removes the key
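The status rules for a fresh confirm can be read as a small decision table. The sketch below encodes only that table; the Outcome type and function name are hypothetical, code verification and attempt counting are out of scope, and mapping unknown statuses to invalid_code is an assumption.

```go
package main

import "fmt"

// Status holds the documented challenge.Status values.
type Status string

const (
	StatusPendingSend            Status = "pending_send"
	StatusSent                   Status = "sent"
	StatusDeliverySuppressed     Status = "delivery_suppressed"
	StatusDeliveryThrottled      Status = "delivery_throttled"
	StatusConfirmedPendingExpire Status = "confirmed_pending_expire"
	StatusExpired                Status = "expired"
	StatusFailed                 Status = "failed"
	StatusCancelled              Status = "cancelled"
)

// Outcome is a hypothetical label for what the confirm path does next.
type Outcome string

const (
	OutcomeProceed          Outcome = "proceed"          // continue with code verification
	OutcomeIdempotentRetry  Outcome = "idempotent_retry" // handled by the retry rules
	OutcomeInvalidCode      Outcome = "invalid_code"
	OutcomeChallengeExpired Outcome = "challenge_expired"
)

// ConfirmOutcomeForStatus maps a stored challenge status to the next step.
func ConfirmOutcomeForStatus(s Status) Outcome {
	switch s {
	case StatusSent, StatusDeliverySuppressed:
		return OutcomeProceed
	case StatusConfirmedPendingExpire:
		return OutcomeIdempotentRetry
	case StatusExpired:
		return OutcomeChallengeExpired
	default:
		// pending_send, delivery_throttled, failed, cancelled, and
		// (by assumption) any unknown status.
		return OutcomeInvalidCode
	}
}

func main() {
	for _, s := range []Status{StatusSent, StatusDeliveryThrottled, StatusExpired} {
		fmt.Println(s, "->", ConfirmOutcomeForStatus(s))
	}
}
```

A challenge whose key has already been removed never reaches this function; that case surfaces as challenge_not_found at the store lookup.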
Idempotent retry semantics:
- a repeated confirm with the same challenge_id, valid code, and identical client_public_key on confirmed_pending_expire returns the same device_session_id
- the same confirmed challenge with a different client_public_key fails as invalid_code
- idempotent retry republishes the stored gateway session view
Device Session And Revoke Model
A device session is created only after successful confirmation.
Core fields:
- device_session_id
- user_id
- parsed client public key
- status
- created_at
- optional revocation metadata
Supported session states:
- active
- revoked
Built-in revoke reason codes:
- device_logout
- logout_all
- admin_revoke
- user_blocked
- confirm_race_repair for best-effort cleanup of superseded sessions created during a confirm race
Revoke behavior is intentionally separated by use case:
- revoke one device session
- revoke all sessions of one user
- block a subject and revoke active sessions implied by that subject
Internal mutation responses report only sessions changed by the current call, so repeated idempotent operations may return:
- already_revoked with affected_session_count=0
- no_active_sessions with affected_session_count=0
- already_blocked with affected_session_count=0
User Resolution And Session Limits
Auth / Session Service does not own durable user records. It delegates to
UserDirectory for:
- resolve-by-email without mutation
- ensure existing-or-created user during confirm
- existence checks for stable user_id
- block-by-user-id and block-by-email operations
Supported user-resolution outcomes:
- existing
- creatable
- blocked
Supported ensure-user outcomes:
- existing
- created
- blocked
Session-limit rules:
- the value is loaded from a shared config provider
- absent value means the limit is disabled
- active sessions are counted before creating a new one
- limit overflow returns session_limit_exceeded
- the service never silently revokes an existing session to satisfy the limit
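The session-limit rules reduce to one comparison made before the new session is written. In the sketch below a nil pointer models the absent (disabled) limit; the function name and that representation are assumptions of this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSessionLimitExceeded stands in for the session_limit_exceeded outcome.
var ErrSessionLimitExceeded = errors.New("session_limit_exceeded")

// CheckSessionLimit reports whether one more session may be created.
// limit == nil models an absent value, which disables the check.
func CheckSessionLimit(limit *int, activeCount int) error {
	if limit == nil {
		return nil
	}
	// Creating one more session would exceed the limit when the user
	// already holds limit (or more) active sessions.
	if activeCount >= *limit {
		return ErrSessionLimitExceeded
	}
	return nil
}

func main() {
	limit := 3
	fmt.Println(CheckSessionLimit(nil, 100))   // limit disabled
	fmt.Println(CheckSessionLimit(&limit, 2))  // room for one more
	fmt.Println(CheckSessionLimit(&limit, 3))  // would overflow
}
```

Nothing in this check revokes an existing session; on overflow the caller simply rejects the new one.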
Gateway Projection Model
Gateway-facing session projection is separate from source-of-truth
devicesession.Session.
Each successful projection publish writes:
- one Redis KV snapshot under &lt;gateway_session_cache_key_prefix&gt;&lt;device_session_id&gt;
- one full-snapshot Redis Stream event under the session-events stream
The default gateway-facing namespaces are:
- cache key prefix: gateway:session:
- session-events stream: gateway:session_events
Projected fields are intentionally limited to what gateway consumes:
- device_session_id
- user_id
- client_public_key
- status
- optional revoked_at_ms
Revoke reason and actor metadata stay in authsession source of truth and are not projected to gateway.
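To make the projection shape concrete, the sketch below builds the cache key from the default prefix and serializes the projected fields. The field names and the prefix come from this section; the struct name, the helper names, and the use of JSON as the snapshot encoding are assumptions, since the wire format is not specified here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// defaultCacheKeyPrefix is the documented default gateway cache key prefix.
const defaultCacheKeyPrefix = "gateway:session:"

// GatewaySessionView is a hypothetical struct holding only the projected fields.
type GatewaySessionView struct {
	DeviceSessionID string `json:"device_session_id"`
	UserID          string `json:"user_id"`
	ClientPublicKey string `json:"client_public_key"`
	Status          string `json:"status"`
	RevokedAtMs     *int64 `json:"revoked_at_ms,omitempty"` // omitted while active
}

// CacheKey joins the prefix and the session id, as in <prefix><device_session_id>.
func CacheKey(prefix, deviceSessionID string) string {
	return prefix + deviceSessionID
}

// EncodeView serializes a view as JSON (the encoding is an assumption).
func EncodeView(v GatewaySessionView) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	view := GatewaySessionView{DeviceSessionID: "s1", UserID: "u1", ClientPublicKey: "key", Status: "active"}
	payload, _ := EncodeView(view)
	fmt.Println(CacheKey(defaultCacheKeyPrefix, view.DeviceSessionID))
	fmt.Println(payload)
}
```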
Consistency Model
Source of truth is written first. Gateway projection is published only after the source-of-truth write succeeds.
Caller-visible rules:
- if projection publication does not reach its required success threshold, the public or internal call returns service_unavailable
- already-written source-of-truth state is intentionally preserved
- the documented repair path is to repeat the same confirm or revoke command
Projection publish rules:
- request-path projection publish uses a bounded retry loop with 3 total attempts
- repeated publishes are safe because the cache snapshot is overwritten and duplicate full-snapshot stream events remain valid under gateway's later-event-wins model
- confirm-email-code rereads the stored session after the challenge CAS succeeds and republishes that current view so a concurrent revoke or block cannot overwrite source of truth with a stale active projection
- idempotent confirm retry also republishes the stored session view
- best-effort cleanup of superseded confirm-race sessions uses the same publish helper but is not part of the caller-visible success contract
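The bounded retry loop can be sketched as follows. This is an illustration, not the service's publish helper: the function and error names are hypothetical, the publish callback stands in for the Redis writes, and the loop retries immediately because no backoff policy is documented here.

```go
package main

import (
	"errors"
	"fmt"
)

// maxPublishAttempts is the documented total attempt count.
const maxPublishAttempts = 3

// ErrServiceUnavailable stands in for the service_unavailable outcome.
var ErrServiceUnavailable = errors.New("service_unavailable")

// errTransient is a placeholder failure used by the demo below.
var errTransient = errors.New("transient publish failure")

// PublishWithRetry calls publish up to maxPublishAttempts times and stops
// at the first success. Repeating the call is safe only because publishes
// are idempotent, as described in the rules above.
func PublishWithRetry(publish func() error) error {
	var lastErr error
	for attempt := 1; attempt <= maxPublishAttempts; attempt++ {
		if lastErr = publish(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("%w: %v", ErrServiceUnavailable, lastErr)
}

func main() {
	calls := 0
	err := PublishWithRetry(func() error {
		calls++
		if calls < maxPublishAttempts {
			return errTransient // fail the first two attempts
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

When all attempts fail, the source-of-truth write has already happened, so the caller reports service_unavailable and relies on the repeated command to republish.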
Runtime Summary
Runtime wiring is implemented in internal/app and
cmd/authsession.
Process-local collaborators:
- system UTC clock
- crypto-random challenge_id and device_session_id generators
- crypto-random 6-digit confirmation-code generator
- bcrypt-backed code hashing
- structured logging through zap
- process telemetry through OpenTelemetry
Redis-backed adapters:
- challenge store
- session store
- session-limit config provider
- gateway projection publisher
- send-email-code abuse protector
External service adapters:
- user-service:
  - default stub
  - optional REST adapter with one retry for read-style methods on transport errors and HTTP 502, 503, or 504
  - mutation methods do not auto-retry
- mail-service:
  - default stub
  - optional REST adapter with no automatic retry on transport or upstream failure, to avoid duplicate deliveries
Listener defaults:
- public HTTP: :8080
- internal HTTP: :8081
- read-header timeout: 2s
- read timeout: 10s
- idle timeout: 1m
- per-request use-case timeout: 3s
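As an illustration of how these defaults map onto net/http, the sketch below configures an http.Server with the listed timeouts and applies the per-request limit through a context deadline. The function names are hypothetical, and enforcing the use-case timeout in middleware is an assumption about placement, not a description of internal/app.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// useCaseTimeout is the documented per-request use-case timeout.
const useCaseTimeout = 3 * time.Second

// withUseCaseTimeout attaches a deadline to each request context so that
// downstream calls honoring the context stop after the timeout.
func withUseCaseTimeout(d time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), d)
		defer cancel()
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// NewServer is a hypothetical constructor using the listener defaults above.
func NewServer(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           withUseCaseTimeout(useCaseTimeout, handler),
		ReadHeaderTimeout: 2 * time.Second,
		ReadTimeout:       10 * time.Second,
		IdleTimeout:       time.Minute,
	}
}

func main() {
	public := NewServer(":8080", http.NewServeMux())
	internal := NewServer(":8081", http.NewServeMux())
	// The servers are only constructed here; a real process would call
	// ListenAndServe on each in its own goroutine.
	fmt.Println(public.Addr, public.ReadHeaderTimeout, public.ReadTimeout, public.IdleTimeout)
	fmt.Println(internal.Addr)
}
```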
For detailed runtime behavior, configuration groups, operational notes, and
examples, see docs/README.md.
Non-Goals
- making authsession a hot synchronous dependency for every authenticated gateway command
- moving business authorization into authsession
- exposing revoke or read operations as public unauthenticated routes
- introducing short-lived access-token or refresh-token flows
- adding pending async session provisioning after confirm