# Auth / Session Service

## Run and Dependencies

`cmd/authsession` starts two HTTP listeners:

- public REST on `AUTHSESSION_PUBLIC_HTTP_ADDR` with default `:8080`
- trusted internal REST on `AUTHSESSION_INTERNAL_HTTP_ADDR` with default `:8081`

Startup requires:

- one reachable Redis deployment configured by `AUTHSESSION_REDIS_ADDR`

That Redis deployment is used for:

- source-of-truth challenges
- source-of-truth device sessions
- dynamic active-session limit config
- gateway session projection cache and stream updates
- send-email-code resend throttling

Optional integrations:

- `AUTHSESSION_USER_SERVICE_MODE=stub|rest`
- `AUTHSESSION_MAIL_SERVICE_MODE=stub|rest`
- OTLP telemetry through standard `OTEL_*` variables
- stdout telemetry through
  `AUTHSESSION_OTEL_STDOUT_TRACES_ENABLED` and
  `AUTHSESSION_OTEL_STDOUT_METRICS_ENABLED`

Operational caveats:

- the service exposes no `/healthz`, `/readyz`, or `/metrics` endpoints
- user-service and mail-service default to in-process stub adapters until
  `rest` mode is configured
- startup performs bounded Redis `PING` checks for every Redis-backed adapter
  and fails fast if Redis or runtime config is invalid

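
The defaults and fail-fast behavior above can be illustrated with a minimal
sketch. The `Config` struct, `LookupFunc`, `envOr`, and `LoadConfig` names are
hypothetical and not taken from `internal/app`. The sketch also assumes that an
unset `AUTHSESSION_REDIS_ADDR` is a startup error, which this README does not
state explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// Config holds only the settings named in this README (hypothetical type).
type Config struct {
	PublicHTTPAddr   string
	InternalHTTPAddr string
	RedisAddr        string
	UserServiceMode  string
	MailServiceMode  string
}

// LookupFunc matches the signature of os.LookupEnv so tests can inject a fake.
type LookupFunc func(key string) (string, bool)

// envOr returns the looked-up value, or fallback when unset or empty.
func envOr(lookup LookupFunc, key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// LoadConfig applies the documented defaults and rejects invalid values early.
func LoadConfig(lookup LookupFunc) (Config, error) {
	cfg := Config{
		PublicHTTPAddr:   envOr(lookup, "AUTHSESSION_PUBLIC_HTTP_ADDR", ":8080"),
		InternalHTTPAddr: envOr(lookup, "AUTHSESSION_INTERNAL_HTTP_ADDR", ":8081"),
		RedisAddr:        envOr(lookup, "AUTHSESSION_REDIS_ADDR", ""),
		UserServiceMode:  envOr(lookup, "AUTHSESSION_USER_SERVICE_MODE", "stub"),
		MailServiceMode:  envOr(lookup, "AUTHSESSION_MAIL_SERVICE_MODE", "stub"),
	}
	// Assumption: no default Redis address exists, so an empty value fails fast.
	if cfg.RedisAddr == "" {
		return Config{}, errors.New("AUTHSESSION_REDIS_ADDR must be set")
	}
	modes := map[string]string{
		"AUTHSESSION_USER_SERVICE_MODE": cfg.UserServiceMode,
		"AUTHSESSION_MAIL_SERVICE_MODE": cfg.MailServiceMode,
	}
	for name, mode := range modes {
		if mode != "stub" && mode != "rest" {
			return Config{}, fmt.Errorf("%s must be stub or rest, got %q", name, mode)
		}
	}
	return cfg, nil
}

func main() {
	// A fake environment keeps the example deterministic; a real process
	// would pass os.LookupEnv instead.
	env := map[string]string{"AUTHSESSION_REDIS_ADDR": "127.0.0.1:6379"}
	cfg, err := LoadConfig(func(k string) (string, bool) { v, ok := env[k]; return v, ok })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```
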
Additional module docs:

- [Public REST contract](api/public-openapi.yaml)
- [Internal REST contract](api/internal-openapi.yaml)
- [Documentation index](docs/README.md)
- [Edge Gateway README](../gateway/README.md)

## Purpose

`Auth / Session Service` owns e-mail-code authentication and the lifecycle of
device sessions.

It is the source of truth for:

- authentication challenges
- device sessions
- revoke and block state
- publication of session lifecycle updates consumed by
  [`Edge Gateway`](../gateway/README.md)

The service is intentionally not on the hot path for every authenticated
request. Gateway authenticates the steady-state request path from its own cache
and session-lifecycle updates rather than by synchronous round-trips back to
auth for each command.

## Responsibilities

The service is responsible for:

- public auth commands:
  - `send-email-code`
  - `confirm-email-code`
- creating device sessions after successful confirmation
- registering the client public key for a newly created session
- revoking one device session
- revoking all sessions of one user
- blocking a user or e-mail subject for future auth flows
- persisting source-of-truth session state
- projecting session state into gateway-consumable Redis data
- exposing a trusted internal REST API for read, revoke, and block operations

The service is not responsible for:

- verifying authenticated transport signatures on every business request
- gateway anti-replay for authenticated command traffic
- downstream business authorization
- direct push delivery to clients
- long-lived hot-path session caching inside gateway
- mail-service implementation details beyond the mail-delivery contract

## Position in the System

```mermaid
flowchart LR
    Client["Client"]
    Gateway["Edge Gateway"]
    Auth["Auth / Session Service"]
    User["User Service"]
    Mail["Mail Service"]
    Redis["Redis"]
    Business["Business Services"]

    Client --> Gateway
    Gateway --> Auth
    Gateway --> Business
    Auth --> User
    Auth --> Mail
    Auth --> Redis
    Redis --> Gateway
```

## Main Principles

- public auth stays synchronous
  - `send-email-code` returns `challenge_id`
  - `confirm-email-code` returns a ready `device_session_id`
  - no pending async session-provisioning stage exists
- source-of-truth session state and gateway-facing projection remain separate
- Redis is the initial backend, but the domain and service layers stay
  storage-agnostic behind ports
- `send-email-code` stays success-shaped for existing, new, blocked, and
  throttled e-mail flows
- `confirm-email-code` supports short-window idempotent retry for the same
  confirmed challenge and the same `client_public_key`
- active-session limits are configuration-driven:
  - absent limit means disabled
  - limit overflow rejects new session creation explicitly
  - the service does not evict existing sessions to make room

## Gateway-Facing Public Contract

Gateway already exposes the public REST auth surface and delegates it to this
service:

- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`

The effective DTO contract is:

| Operation | Request | Success response |
| --- | --- | --- |
| `POST /api/v1/public/auth/send-email-code` | `{ "email": string }` | `{ "challenge_id": string }` |
| `POST /api/v1/public/auth/confirm-email-code` | `{ "challenge_id": string, "code": string, "client_public_key": string, "time_zone": string }` | `{ "device_session_id": string }` |

`client_public_key` is the standard base64-encoded raw 32-byte Ed25519 public
key registered for the created device session.
`time_zone` is the client-selected IANA time zone name. During the current
rollout phase, successful confirms forward create-only user registration
context to `User Service` as `preferred_language="en"` and the supplied
`time_zone` until gateway geoip-based language derivation is deployed.
`User Service` now validates `preferred_language` as BCP 47 and canonicalizes
the stored value on creation, so any future derived language must already be a
valid BCP 47 tag before auth forwards it.

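
As an illustration of the two field formats described above, the following
sketch validates `client_public_key` and `time_zone` using only the Go standard
library. `ParseClientPublicKey` and `ValidateTimeZone` are hypothetical names,
not the service's actual helpers. The blank import of `time/tzdata` is an
assumption made so that the example does not depend on a system zoneinfo
database.

```go
package main

import (
	"crypto/ed25519"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
	"time"
	_ "time/tzdata" // assumption: embed zoneinfo so lookups work on any host
)

var (
	ErrInvalidClientPublicKey = errors.New("invalid_client_public_key")
	ErrInvalidTimeZone        = errors.New("invalid time_zone")
)

// ParseClientPublicKey decodes standard (padded) base64 and requires exactly
// 32 raw bytes, the size of an Ed25519 public key.
func ParseClientPublicKey(s string) (ed25519.PublicKey, error) {
	raw, err := base64.StdEncoding.DecodeString(strings.TrimSpace(s))
	if err != nil || len(raw) != ed25519.PublicKeySize {
		return nil, ErrInvalidClientPublicKey
	}
	return ed25519.PublicKey(raw), nil
}

// ValidateTimeZone accepts names that time.LoadLocation can resolve.
// LoadLocation treats "" as UTC and "Local" as the host zone, so both are
// rejected explicitly because neither is a client-supplied IANA name.
func ValidateTimeZone(name string) error {
	name = strings.TrimSpace(name)
	if name == "" || name == "Local" {
		return ErrInvalidTimeZone
	}
	if _, err := time.LoadLocation(name); err != nil {
		return ErrInvalidTimeZone
	}
	return nil
}

func main() {
	key := base64.StdEncoding.EncodeToString(make([]byte, ed25519.PublicKeySize))
	_, keyErr := ParseClientPublicKey(key)
	fmt.Println("key error:", keyErr)
	fmt.Println("zone error:", ValidateTimeZone("Europe/Berlin"))
	fmt.Println("zone error:", ValidateTimeZone("Not/AZone"))
}
```
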
Public boundary rules:

- requests and responses are JSON only
- request DTOs reject unknown fields
- empty bodies, malformed JSON, trailing JSON input, and unknown fields return
  `400 invalid_request`
- surrounding ASCII and Unicode whitespace is trimmed from input string fields
  before validation
- `confirm-email-code` requires a non-empty `time_zone` and validates it as an
  IANA time zone name
- `send-email-code` remains success-shaped for existing, new, blocked, and
  throttled e-mail paths
- `confirm-email-code` returns a ready `device_session_id` synchronously on
  success

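
One way to enforce the decoding rules above with `encoding/json` is sketched
below. `DecodeStrict`, `SendEmailCodeRequest`, and `decodeSendEmailCode` are
illustrative names. The approach shown (a second `Decode` that must return
`io.EOF`) is a common Go idiom and is not necessarily how this service
implements the rule.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

var ErrInvalidRequest = errors.New("invalid_request")

// SendEmailCodeRequest mirrors the documented request DTO.
type SendEmailCodeRequest struct {
	Email string `json:"email"`
}

// DecodeStrict rejects empty bodies, malformed JSON, unknown fields, and any
// trailing input after the first JSON value.
func DecodeStrict(r io.Reader, dst any) error {
	dec := json.NewDecoder(r)
	dec.DisallowUnknownFields()
	if err := dec.Decode(dst); err != nil {
		// An empty body surfaces here as io.EOF.
		return fmt.Errorf("%w: %v", ErrInvalidRequest, err)
	}
	// Only trailing whitespace may follow, so a second Decode must hit EOF.
	var extra json.RawMessage
	if err := dec.Decode(&extra); err != io.EOF {
		return fmt.Errorf("%w: trailing JSON input", ErrInvalidRequest)
	}
	return nil
}

// decodeSendEmailCode decodes a body and trims surrounding whitespace from
// the string field before any further validation.
func decodeSendEmailCode(body string) (SendEmailCodeRequest, error) {
	var req SendEmailCodeRequest
	if err := DecodeStrict(strings.NewReader(body), &req); err != nil {
		return SendEmailCodeRequest{}, err
	}
	req.Email = strings.TrimSpace(req.Email)
	return req, nil
}

func main() {
	for _, body := range []string{
		`{"email":" pilot@example.com "}`,
		`{"email":"pilot@example.com","extra":1}`,
		`{"email":"pilot@example.com"} {}`,
		``,
	} {
		req, err := decodeSendEmailCode(body)
		fmt.Printf("%q -> %+v, err=%v\n", body, req, err)
	}
}
```
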
Stable public business-error contract:

| HTTP status | `error.code` | Stable `error.message` |
| --- | --- | --- |
| `400` | `invalid_request` | field-specific validation detail |
| `400` | `invalid_code` | `confirmation code is invalid` |
| `400` | `invalid_client_public_key` | `client_public_key is not a valid base64-encoded raw 32-byte Ed25519 public key` |
| `403` | `blocked_by_policy` | `authentication is blocked by policy` |
| `404` | `challenge_not_found` | `challenge not found` |
| `409` | `session_limit_exceeded` | `active session limit would be exceeded` |
| `410` | `challenge_expired` | `challenge expired` |
| `503` | `service_unavailable` | `service is unavailable` |

The public error envelope is always:

```json
{
  "error": {
    "code": "string",
    "message": "string"
  }
}
```

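
A small sketch of producing that envelope from a handler follows. The
`ErrorEnvelope`, `ErrorBody`, `WriteError`, and `renderError` names are
hypothetical, and the `Content-Type` value is an assumption consistent with the
JSON-only rule rather than something this README specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ErrorBody and ErrorEnvelope mirror the documented error shape.
type ErrorBody struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

type ErrorEnvelope struct {
	Error ErrorBody `json:"error"`
}

// WriteError writes the status line and the JSON envelope.
func WriteError(w http.ResponseWriter, status int, code, message string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	// Encoding a fixed struct of strings cannot fail; a write error means the
	// client is gone, so it is ignored in this sketch.
	_ = json.NewEncoder(w).Encode(ErrorEnvelope{Error: ErrorBody{Code: code, Message: message}})
}

// renderError runs WriteError against an in-memory recorder.
func renderError(status int, code, message string) (int, string) {
	rec := httptest.NewRecorder()
	WriteError(rec, status, code, message)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := renderError(http.StatusGone, "challenge_expired", "challenge expired")
	fmt.Println(status)
	fmt.Print(body)
}
```
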
## Trusted Internal API

The trusted internal REST surface lives under `/api/v1/internal` and is
documented in [`api/internal-openapi.yaml`](api/internal-openapi.yaml).

Implemented endpoints:

- `GET /api/v1/internal/sessions/{device_session_id}`
- `GET /api/v1/internal/users/{user_id}/sessions`
- `POST /api/v1/internal/sessions/{device_session_id}/revoke`
- `POST /api/v1/internal/users/{user_id}/sessions/revoke-all`
- `POST /api/v1/internal/user-blocks`

Key internal API properties:

- all bodies are JSON only
- `ListUserSessions` is newest-first and unpaginated in v1
- revoke and block mutations require audit metadata as `reason_code` and
  `actor`
- `BlockUser` accepts exactly one of `user_id` or `email`
- mutating operations are idempotent and return explicit acknowledgement
  payloads rather than empty `204` responses

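
The "exactly one of" rule and the required audit metadata could be checked
along these lines. This is only a sketch: the `BlockUserRequest` struct is
hypothetical, and `actor` is modeled as a plain string purely for illustration.
The authoritative field shapes are in `api/internal-openapi.yaml`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// BlockUserRequest is an illustrative DTO; actor is assumed to be a string.
type BlockUserRequest struct {
	UserID     string `json:"user_id,omitempty"`
	Email      string `json:"email,omitempty"`
	ReasonCode string `json:"reason_code"`
	Actor      string `json:"actor"`
}

// Validate enforces exactly one subject selector plus audit metadata.
func (r BlockUserRequest) Validate() error {
	hasUser := strings.TrimSpace(r.UserID) != ""
	hasEmail := strings.TrimSpace(r.Email) != ""
	if hasUser == hasEmail {
		// Both set or both empty: the subject is ambiguous or missing.
		return errors.New("exactly one of user_id or email must be provided")
	}
	if strings.TrimSpace(r.ReasonCode) == "" {
		return errors.New("reason_code is required")
	}
	if strings.TrimSpace(r.Actor) == "" {
		return errors.New("actor is required")
	}
	return nil
}

func main() {
	ok := BlockUserRequest{Email: "pilot@example.com", ReasonCode: "admin_revoke", Actor: "ops"}
	both := BlockUserRequest{UserID: "u1", Email: "pilot@example.com", ReasonCode: "admin_revoke", Actor: "ops"}
	fmt.Println(ok.Validate())
	fmt.Println(both.Validate())
}
```
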
Stable internal error surface:

| HTTP status | `error.code` | Stable `error.message` |
| --- | --- | --- |
| `400` | `invalid_request` | field-specific validation detail |
| `404` | `session_not_found` | `session not found` |
| `404` | `subject_not_found` | `subject not found` |
| `500` | `internal_error` | `internal server error` |
| `503` | `service_unavailable` | `service is unavailable` |

## Challenge Model

A challenge represents one short-lived public e-mail-code flow.

Core fields:

- `challenge_id`
- normalized e-mail
- hashed confirmation code
- `status`
- `delivery_state`
- creation and expiration timestamps
- send and confirm attempt counters
- minimal abuse metadata
- optional confirmation metadata used for idempotent retry

### Challenge States

Supported `challenge.Status` values:

- `pending_send`
- `sent`
- `delivery_suppressed`
- `delivery_throttled`
- `confirmed_pending_expire`
- `expired`
- `failed`
- `cancelled`

Supported `challenge.DeliveryState` values:

- `pending`
- `sent`
- `suppressed`
- `throttled`
- `failed`

Policy rules:

- initial challenge TTL is `5m`
- confirmed-challenge retention for idempotent retry is `5m`
- max invalid confirm attempts is `5`
- every `send-email-code` call creates a fresh challenge
- resend throttling is e-mail scoped with a fixed `1m` cooldown
- a throttled send still creates a fresh challenge in
  `status=delivery_throttled` and `delivery_state=throttled`
- throttled sends do not call `UserDirectory` and do not call `MailSender`
- blocked sends outside the throttle path become `delivery_suppressed`

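
The e-mail-scoped cooldown can be sketched with an in-memory stand-in for the
Redis-backed abuse protector. `ResendThrottle`, `ResendCooldown`, and
`demoStart` are hypothetical names. The sketch assumes that a throttled attempt
does not extend the cooldown window, which the policy rules above do not
specify.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ResendCooldown is the fixed per-e-mail cooldown from the policy rules.
const ResendCooldown = time.Minute

// demoStart is an arbitrary fixed instant used to keep the example deterministic.
var demoStart = time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)

// ResendThrottle is an in-memory stand-in for the Redis-backed protector.
type ResendThrottle struct {
	mu       sync.Mutex
	cooldown time.Duration
	lastSend map[string]time.Time // keyed by normalized e-mail
}

func NewResendThrottle(cooldown time.Duration) *ResendThrottle {
	return &ResendThrottle{cooldown: cooldown, lastSend: make(map[string]time.Time)}
}

// Allow reports whether delivery for email may proceed at now. An allowed
// call starts a new cooldown window; a throttled call leaves it unchanged
// (assumption, see the lead-in).
func (t *ResendThrottle) Allow(email string, now time.Time) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if last, ok := t.lastSend[email]; ok && now.Sub(last) < t.cooldown {
		return false
	}
	t.lastSend[email] = now
	return true
}

func main() {
	th := NewResendThrottle(ResendCooldown)
	email := "pilot@example.com"
	for _, offset := range []time.Duration{0, 30 * time.Second, time.Minute} {
		allowed := th.Allow(email, demoStart.Add(offset))
		fmt.Printf("t+%v allowed=%v\n", offset, allowed)
	}
}
```

In either case the caller still creates a fresh challenge. A `false` result
only decides that the challenge is stored as `delivery_throttled` and that
neither `UserDirectory` nor `MailSender` is called.
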
Fresh confirm semantics:

- only `sent` and `delivery_suppressed` accept a first successful confirm
- `pending_send`, `delivery_throttled`, `failed`, and `cancelled` return
  `invalid_code`
- expired challenges return `challenge_expired` while the Redis grace window
  keeps the record present, then `challenge_not_found` after cleanup removes
  the key

Idempotent retry semantics:

- a repeated confirm with the same `challenge_id`, valid `code`, and identical
  `client_public_key` on `confirmed_pending_expire` returns the same
  `device_session_id`
- the same confirmed challenge with a different `client_public_key` fails as
  `invalid_code`
- idempotent retry republishes the stored gateway session view

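
The status-dependent confirm rules above amount to a small decision function.
The following sketch is illustrative only: the `Challenge` struct, the
`Outcome` values `proceed_to_session_creation` and `return_existing_session`,
and `DecideConfirm` are hypothetical names. Attempt counting, block checks, and
the `challenge_not_found` case for a missing record are deliberately left out.

```go
package main

import "fmt"

type Status string

const (
	StatusPendingSend            Status = "pending_send"
	StatusSent                   Status = "sent"
	StatusDeliverySuppressed     Status = "delivery_suppressed"
	StatusDeliveryThrottled      Status = "delivery_throttled"
	StatusConfirmedPendingExpire Status = "confirmed_pending_expire"
	StatusExpired                Status = "expired"
	StatusFailed                 Status = "failed"
	StatusCancelled              Status = "cancelled"
)

type Outcome string

const (
	OutcomeProceed          Outcome = "proceed_to_session_creation" // hypothetical
	OutcomeReturnExisting   Outcome = "return_existing_session"     // hypothetical
	OutcomeInvalidCode      Outcome = "invalid_code"
	OutcomeChallengeExpired Outcome = "challenge_expired"
)

// Challenge carries only the fields this decision needs (illustrative).
type Challenge struct {
	Status             Status
	ConfirmedPublicKey string // stored when the challenge was first confirmed
}

// DecideConfirm maps a stored challenge plus the request to an outcome.
func DecideConfirm(c Challenge, codeValid bool, clientPublicKey string) Outcome {
	switch c.Status {
	case StatusExpired:
		return OutcomeChallengeExpired
	case StatusSent, StatusDeliverySuppressed:
		if codeValid {
			return OutcomeProceed
		}
		return OutcomeInvalidCode
	case StatusConfirmedPendingExpire:
		// Idempotent retry requires the same key as the first confirm.
		if codeValid && clientPublicKey == c.ConfirmedPublicKey {
			return OutcomeReturnExisting
		}
		return OutcomeInvalidCode
	default:
		// pending_send, delivery_throttled, failed, cancelled.
		return OutcomeInvalidCode
	}
}

func main() {
	confirmed := Challenge{Status: StatusConfirmedPendingExpire, ConfirmedPublicKey: "key-a"}
	fmt.Println(DecideConfirm(Challenge{Status: StatusSent}, true, "key-a"))
	fmt.Println(DecideConfirm(confirmed, true, "key-a"))
	fmt.Println(DecideConfirm(confirmed, true, "key-b"))
	fmt.Println(DecideConfirm(Challenge{Status: StatusDeliveryThrottled}, true, "key-a"))
}
```
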
## Device Session And Revoke Model

A device session is created only after successful confirmation.

Core fields:

- `device_session_id`
- `user_id`
- parsed client public key
- `status`
- `created_at`
- optional revocation metadata

Supported session states:

- `active`
- `revoked`

Built-in revoke reason codes:

- `device_logout`
- `logout_all`
- `admin_revoke`
- `user_blocked`
- `confirm_race_repair` for best-effort cleanup of superseded sessions created
  during a confirm race

Revoke behavior is intentionally separated by use case:

- revoke one device session
- revoke all sessions of one user
- block a subject and revoke active sessions implied by that subject

Internal mutation responses report only sessions changed by the current call,
so repeated idempotent operations may return:

- `already_revoked` with `affected_session_count=0`
- `no_active_sessions` with `affected_session_count=0`
- `already_blocked` with `affected_session_count=0`

## User Resolution And Session Limits

`Auth / Session Service` does not own durable user records. It delegates to
`UserDirectory` for:

- resolve-by-email without mutation
- ensure existing-or-created user during confirm
- existence checks for stable `user_id`
- block-by-user-id and block-by-email operations

Supported user-resolution outcomes:

- `existing`
- `creatable`
- `blocked`

Supported ensure-user outcomes:

- `existing`
- `created`
- `blocked`

Session-limit rules:

- the value is loaded from a shared config provider
- absent value means the limit is disabled
- active sessions are counted before creating a new one
- limit overflow returns `session_limit_exceeded`
- the service never silently revokes an existing session to satisfy the limit

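
The session-limit rules reduce to a short check performed before session
creation. In this sketch `CheckSessionLimit` is a hypothetical name and a `nil`
pointer stands in for "absent value". How the real config provider represents
absence is not documented here.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrSessionLimitExceeded = errors.New("session_limit_exceeded")

// CheckSessionLimit decides whether one more session may be created.
// A nil limit means the limit is disabled. Existing sessions are never
// revoked to make room; the new session is rejected instead.
func CheckSessionLimit(limit *int, activeCount int) error {
	if limit == nil {
		return nil
	}
	if activeCount+1 > *limit {
		return ErrSessionLimitExceeded
	}
	return nil
}

func main() {
	limit := 2
	fmt.Println(CheckSessionLimit(&limit, 1))
	fmt.Println(CheckSessionLimit(&limit, 2))
	fmt.Println(CheckSessionLimit(nil, 100))
}
```
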
## Gateway Projection Model

Gateway-facing session projection is separate from source-of-truth
`devicesession.Session`.

Each successful projection publish writes:

- one Redis KV snapshot under
  `<gateway_session_cache_key_prefix><device_session_id>`
- one full-snapshot Redis Stream event under the session-events stream

The default gateway-facing namespaces are:

- cache key prefix: `gateway:session:`
- session-events stream: `gateway:session_events`

Projected fields are intentionally limited to what gateway consumes:

- `device_session_id`
- `user_id`
- `client_public_key`
- `status`
- optional `revoked_at_ms`

Revoke reason and actor metadata stay in authsession source of truth and are
not projected to gateway.

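
To make the projection shape concrete, here is a sketch of the projected view
and its cache key. `GatewaySessionView`, `CacheKey`, and `mustJSON` are
hypothetical names, and JSON is assumed as the snapshot encoding for
illustration only. The actual wire format is defined by the gateway contract.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	DefaultCacheKeyPrefix      = "gateway:session:"
	DefaultSessionEventsStream = "gateway:session_events"
)

// GatewaySessionView holds only the fields projected to gateway. Revoke
// reason and actor metadata are intentionally absent.
type GatewaySessionView struct {
	DeviceSessionID string `json:"device_session_id"`
	UserID          string `json:"user_id"`
	ClientPublicKey string `json:"client_public_key"`
	Status          string `json:"status"`
	RevokedAtMS     *int64 `json:"revoked_at_ms,omitempty"` // set only when revoked
}

// CacheKey builds the KV snapshot key for one device session.
func CacheKey(prefix, deviceSessionID string) string {
	return prefix + deviceSessionID
}

// mustJSON encodes a view. Marshalling this struct cannot fail, so an error
// would indicate a programming mistake.
func mustJSON(v GatewaySessionView) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	revokedAt := int64(1700000000000)
	view := GatewaySessionView{
		DeviceSessionID: "ds_123",
		UserID:          "u_42",
		ClientPublicKey: "base64-key",
		Status:          "revoked",
		RevokedAtMS:     &revokedAt,
	}
	fmt.Println(CacheKey(DefaultCacheKeyPrefix, view.DeviceSessionID))
	fmt.Println(mustJSON(view))
}
```
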
## Consistency Model

Source of truth is written first. Gateway projection is published only after
the source-of-truth write succeeds.

Caller-visible rules:

- if projection publication does not reach its required success threshold, the
  public or internal call returns `service_unavailable`
- already-written source-of-truth state is intentionally preserved
- the documented repair path is to repeat the same confirm or revoke command

Projection publish rules:

- request-path projection publish uses a bounded retry loop with `3` total
  attempts
- repeated publishes are safe because the cache snapshot is overwritten and
  duplicate full-snapshot stream events remain valid under gateway's
  later-event-wins model
- `confirm-email-code` rereads the stored session after the challenge CAS
  succeeds and republishes that current view so a concurrent revoke or block
  cannot overwrite source of truth with a stale active projection
- idempotent confirm retry also republishes the stored session view
- best-effort cleanup of superseded confirm-race sessions uses the same
  publish helper but is not part of the caller-visible success contract

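
The bounded request-path retry can be sketched as follows. `PublishWithRetry`
and `publishCount` are hypothetical names, and the loop retries immediately
because no backoff policy is documented above. The publish function stands in
for the combined KV snapshot write and stream append.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

var ErrServiceUnavailable = errors.New("service_unavailable")

// publishAttempts is the documented total number of attempts.
const publishAttempts = 3

// PublishWithRetry calls publish up to publishAttempts times and maps a final
// failure to service_unavailable. Retrying is safe only because publishing is
// idempotent: the snapshot is overwritten and duplicate events stay valid.
func PublishWithRetry(ctx context.Context, publish func(context.Context) error) error {
	var lastErr error
	for attempt := 1; attempt <= publishAttempts; attempt++ {
		// Stop early when the per-request deadline has already passed.
		if err := ctx.Err(); err != nil {
			lastErr = err
			break
		}
		if lastErr = publish(ctx); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("%w: %v", ErrServiceUnavailable, lastErr)
}

// publishCount runs the loop against a fake publisher that fails the first
// `failures` calls, and reports how many calls were made.
func publishCount(failures int) (calls int, err error) {
	err = PublishWithRetry(context.Background(), func(context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("redis write failed")
		}
		return nil
	})
	return calls, err
}

func main() {
	for _, failures := range []int{0, 2, 3} {
		calls, err := publishCount(failures)
		fmt.Printf("failures=%d calls=%d err=%v\n", failures, calls, err)
	}
}
```
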
## Runtime Summary

Runtime wiring is implemented in [`internal/app`](internal/app) and
[`cmd/authsession`](cmd/authsession/main.go).

Process-local collaborators:

- system UTC clock
- crypto-random `challenge_id` and `device_session_id` generators
- crypto-random 6-digit confirmation-code generator
- bcrypt-backed code hashing
- structured logging through `zap`
- process telemetry through OpenTelemetry

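
For reference, a crypto-random 6-digit code can be generated with the standard
library alone. `NewConfirmationCode` is a hypothetical name and this is a
sketch of the general technique, not the service's generator. Hashing the
result with bcrypt is omitted because it needs a package outside the standard
library.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// NewConfirmationCode returns a uniformly distributed code in the range
// "000000" to "999999". rand.Int avoids the modulo bias that a naive
// byte-to-digit mapping would introduce.
func NewConfirmationCode() (string, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(1_000_000))
	if err != nil {
		return "", err
	}
	// Zero-pad so that small values still produce six digits.
	return fmt.Sprintf("%06d", n.Int64()), nil
}

func main() {
	code, err := NewConfirmationCode()
	if err != nil {
		fmt.Println("entropy source failed:", err)
		return
	}
	fmt.Println(len(code), "digits generated")
}
```
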
Redis-backed adapters:

- challenge store
- session store
- session-limit config provider
- gateway projection publisher
- send-email-code abuse protector

External service adapters:

- user-service:
  - default `stub`
  - optional REST adapter with one retry for read-style methods on transport
    errors and HTTP `502`, `503`, or `504`
  - mutation methods do not auto-retry
- mail-service:
  - default `stub`
  - optional REST adapter with no automatic retry on transport or upstream
    failure, to avoid duplicate deliveries

Listener defaults:

- public HTTP: `:8080`
- internal HTTP: `:8081`
- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`
- per-request use-case timeout: `3s`

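
These defaults map directly onto `net/http` server fields. The sketch below is
an assumption about how they could be wired: `NewServer` and
`withUseCaseTimeout` are hypothetical names, and the use-case timeout is
modeled as a request-context deadline, which may differ from the real
`internal/app` wiring.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

const (
	ReadHeaderTimeout = 2 * time.Second
	ReadTimeout       = 10 * time.Second
	IdleTimeout       = time.Minute
	UseCaseTimeout    = 3 * time.Second
)

// withUseCaseTimeout bounds each request's context so downstream calls
// (Redis, user-service, mail-service) inherit the deadline.
func withUseCaseTimeout(d time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), d)
		defer cancel()
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// NewServer builds a listener configuration with the documented defaults.
func NewServer(addr string, h http.Handler) *http.Server {
	if h == nil {
		h = http.NotFoundHandler()
	}
	return &http.Server{
		Addr:              addr,
		Handler:           withUseCaseTimeout(UseCaseTimeout, h),
		ReadHeaderTimeout: ReadHeaderTimeout,
		ReadTimeout:       ReadTimeout,
		IdleTimeout:       IdleTimeout,
	}
}

func main() {
	// The servers are only constructed here; a real process would call
	// ListenAndServe on each in its own goroutine.
	public := NewServer(":8080", http.NewServeMux())
	internal := NewServer(":8081", http.NewServeMux())
	fmt.Println(public.Addr, internal.Addr, public.ReadHeaderTimeout, public.ReadTimeout, public.IdleTimeout)
}
```
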
For detailed runtime behavior, configuration groups, operational notes, and
examples, see [`docs/README.md`](docs/README.md).

## Non-Goals

- making authsession a hot synchronous dependency for every authenticated
  gateway command
- moving business authorization into authsession
- exposing revoke or read operations as public unauthenticated routes
- introducing short-lived access-token or refresh-token flows
- adding pending async session provisioning after confirm