# Auth / Session Service

## Run and Dependencies

`cmd/authsession` starts two HTTP listeners:

- public REST on `AUTHSESSION_PUBLIC_HTTP_ADDR` with default `:8080`
- trusted internal REST on `AUTHSESSION_INTERNAL_HTTP_ADDR` with default `:8081`

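The listener configuration above could be resolved roughly as follows. This is a minimal sketch that assumes plain environment lookups; the `envOr` helper, the `listenerAddrs` type, and the choice to treat an empty value as unset are illustrative assumptions, not the service's actual config loader.

```go
package main

import (
	"fmt"
	"os"
)

// envOr returns the value of the named environment variable, or def when the
// variable is unset or empty (treating empty as unset is an assumption).
func envOr(name, def string) string {
	if v := os.Getenv(name); v != "" {
		return v
	}
	return def
}

// listenerAddrs is a hypothetical holder for the two listener addresses.
type listenerAddrs struct {
	Public   string
	Internal string
}

// loadListenerAddrs applies the documented defaults :8080 and :8081.
func loadListenerAddrs() listenerAddrs {
	return listenerAddrs{
		Public:   envOr("AUTHSESSION_PUBLIC_HTTP_ADDR", ":8080"),
		Internal: envOr("AUTHSESSION_INTERNAL_HTTP_ADDR", ":8081"),
	}
}

func main() {
	addrs := loadListenerAddrs()
	fmt.Printf("public=%s internal=%s\n", addrs.Public, addrs.Internal)
}
```
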
Startup requires:

- one reachable Redis deployment configured by `AUTHSESSION_REDIS_ADDR`

That Redis deployment is used for:

- source-of-truth challenges
- source-of-truth device sessions
- dynamic active-session limit config
- gateway session projection cache and stream updates
- send-email-code resend throttling

Optional integrations:

- `AUTHSESSION_USER_SERVICE_MODE=stub|rest`
- `AUTHSESSION_MAIL_SERVICE_MODE=stub|rest`
- OTLP telemetry through standard `OTEL_*` variables
- stdout telemetry through
  `AUTHSESSION_OTEL_STDOUT_TRACES_ENABLED` and
  `AUTHSESSION_OTEL_STDOUT_METRICS_ENABLED`

Operational caveats:

- the service exposes no `/healthz`, `/readyz`, or `/metrics` endpoints
- user-service and mail-service default to in-process stub adapters until
  `rest` mode is configured
- startup performs bounded Redis `PING` checks for every Redis-backed adapter
  and fails fast if Redis or runtime config is invalid

Additional module docs:

- [Public REST contract](api/public-openapi.yaml)
- [Internal REST contract](api/internal-openapi.yaml)
- [Documentation index](docs/README.md)
- [Edge Gateway README](../gateway/README.md)

## Purpose

`Auth / Session Service` owns e-mail-code authentication and the lifecycle of
device sessions.

It is the source of truth for:

- authentication challenges
- device sessions
- revoke and block state
- publication of session lifecycle updates consumed by
  [`Edge Gateway`](../gateway/README.md)

The service is intentionally not on the hot path for every authenticated
request. Gateway authenticates the steady-state request path from its own cache
and session-lifecycle updates rather than by synchronous round-trips back to
auth for each command.

## Responsibilities

The service is responsible for:

- public auth commands:
  - `send-email-code`
  - `confirm-email-code`
- creating device sessions after successful confirmation
- registering the client public key for a newly created session
- revoking one device session
- revoking all sessions of one user
- blocking a user or e-mail subject for future auth flows
- persisting source-of-truth session state
- projecting session state into gateway-consumable Redis data
- exposing a trusted internal REST API for read, revoke, and block operations

The service is not responsible for:

- verifying authenticated transport signatures on every business request
- gateway anti-replay for authenticated command traffic
- downstream business authorization
- direct push delivery to clients
- long-lived hot-path session caching inside gateway
- mail-service implementation details beyond the mail-delivery contract

## Position in the System

```mermaid
flowchart LR
    Client["Client"]
    Gateway["Edge Gateway"]
    Auth["Auth / Session Service"]
    User["User Service"]
    Mail["Mail Service"]
    Redis["Redis"]
    Business["Business Services"]

    Client --> Gateway
    Gateway --> Auth
    Gateway --> Business
    Auth --> User
    Auth --> Mail
    Auth --> Redis
    Redis --> Gateway
```

## Main Principles

- public auth stays synchronous
- `send-email-code` returns `challenge_id`
- `confirm-email-code` returns a ready `device_session_id`
- no pending async session-provisioning stage exists
- source-of-truth session state and gateway-facing projection remain separate
- Redis is the initial backend, but the domain and service layers stay
  storage-agnostic behind ports
- `send-email-code` stays success-shaped for existing, new, blocked, and
  throttled e-mail flows
- `confirm-email-code` supports short-window idempotent retry for the same
  confirmed challenge and the same `client_public_key`
- active-session limits are configuration driven:
  - absent limit means disabled
  - limit overflow rejects new session creation explicitly
  - the service does not evict existing sessions to make room

## Gateway-Facing Public Contract

Gateway already exposes the public REST auth surface and delegates it to this
service:

- `POST /api/v1/public/auth/send-email-code`
- `POST /api/v1/public/auth/confirm-email-code`

The effective DTO contract is:

| Operation | Request | Success response |
| --- | --- | --- |
| `POST /api/v1/public/auth/send-email-code` | `{ "email": string }` | `{ "challenge_id": string }` |
| `POST /api/v1/public/auth/confirm-email-code` | `{ "challenge_id": string, "code": string, "client_public_key": string }` | `{ "device_session_id": string }` |

`client_public_key` is the standard base64-encoded raw 32-byte Ed25519 public
key registered for the created device session.

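The `client_public_key` encoding rule can be illustrated with a small parser. This is a sketch under stated assumptions: `parseClientPublicKey` and `errInvalidClientPublicKey` are hypothetical names, and the check covers only standard base64 decoding and the 32-byte length, which is all the contract above specifies.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// errInvalidClientPublicKey carries the documented stable error message.
var errInvalidClientPublicKey = errors.New(
	"client_public_key is not a valid base64-encoded raw 32-byte Ed25519 public key")

// parseClientPublicKey decodes standard (padded) base64 and requires exactly
// ed25519.PublicKeySize (32) raw bytes. Hypothetical helper, not service code.
func parseClientPublicKey(encoded string) (ed25519.PublicKey, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil || len(raw) != ed25519.PublicKeySize {
		return nil, errInvalidClientPublicKey
	}
	return ed25519.PublicKey(raw), nil
}

func main() {
	// A client would generate the key pair and send only the public half.
	pub, _, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	encoded := base64.StdEncoding.EncodeToString(pub)

	parsed, err := parseClientPublicKey(encoded)
	fmt.Println(len(parsed), err)

	_, err = parseClientPublicKey("too-short")
	fmt.Println(err)
}
```
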
Public boundary rules:

- requests and responses are JSON only
- request DTOs reject unknown fields
- empty bodies, malformed JSON, trailing JSON input, and unknown fields return
  `400 invalid_request`
- surrounding ASCII and Unicode whitespace is trimmed from input string fields
  before validation
- `send-email-code` remains success-shaped for existing, new, blocked, and
  throttled e-mail paths
- `confirm-email-code` returns a ready `device_session_id` synchronously on
  success

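The strict-decoding rules above map naturally onto `encoding/json`. The following is a minimal sketch, not the service's handler code: `decodeSendEmailCode` and `errInvalidRequest` are hypothetical names, and the request is taken as a string only to keep the example self-contained.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// errInvalidRequest stands in for the `400 invalid_request` outcome.
var errInvalidRequest = errors.New("invalid_request")

// sendEmailCodeRequest mirrors the documented request DTO.
type sendEmailCodeRequest struct {
	Email string `json:"email"`
}

// decodeSendEmailCode rejects empty bodies, malformed JSON, unknown fields,
// and trailing input, then trims surrounding whitespace before validation.
func decodeSendEmailCode(body string) (sendEmailCodeRequest, error) {
	var req sendEmailCodeRequest
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&req); err != nil {
		return sendEmailCodeRequest{}, errInvalidRequest
	}
	// A second Decode must report io.EOF; anything else is trailing input.
	var extra json.RawMessage
	if err := dec.Decode(&extra); !errors.Is(err, io.EOF) {
		return sendEmailCodeRequest{}, errInvalidRequest
	}
	// strings.TrimSpace removes both ASCII and Unicode whitespace.
	req.Email = strings.TrimSpace(req.Email)
	return req, nil
}

func main() {
	for _, body := range []string{
		`{"email":"  user@example.com "}`,
		`{"email":"a@b.c","extra":1}`,
		`{"email":"a@b.c"} {}`,
		``,
	} {
		req, err := decodeSendEmailCode(body)
		fmt.Printf("%q -> email=%q err=%v\n", body, req.Email, err)
	}
}
```
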
Stable public business-error contract:

| HTTP status | `error.code` | Stable `error.message` |
| --- | --- | --- |
| `400` | `invalid_request` | field-specific validation detail |
| `400` | `invalid_code` | `confirmation code is invalid` |
| `400` | `invalid_client_public_key` | `client_public_key is not a valid base64-encoded raw 32-byte Ed25519 public key` |
| `403` | `blocked_by_policy` | `authentication is blocked by policy` |
| `404` | `challenge_not_found` | `challenge not found` |
| `409` | `session_limit_exceeded` | `active session limit would be exceeded` |
| `410` | `challenge_expired` | `challenge expired` |
| `503` | `service_unavailable` | `service is unavailable` |

The public error envelope is always:

```json
{
  "error": {
    "code": "string",
    "message": "string"
  }
}
```

## Trusted Internal API

The trusted internal REST surface lives under `/api/v1/internal` and is
documented in [`api/internal-openapi.yaml`](api/internal-openapi.yaml).

Implemented endpoints:

- `GET /api/v1/internal/sessions/{device_session_id}`
- `GET /api/v1/internal/users/{user_id}/sessions`
- `POST /api/v1/internal/sessions/{device_session_id}/revoke`
- `POST /api/v1/internal/users/{user_id}/sessions/revoke-all`
- `POST /api/v1/internal/user-blocks`

Key internal API properties:

- all bodies are JSON only
- `ListUserSessions` is newest-first and unpaginated in v1
- revoke and block mutations require audit metadata as `reason_code` and
  `actor`
- `BlockUser` accepts exactly one of `user_id` or `email`
- mutating operations are idempotent and return explicit acknowledgement
  payloads rather than empty `204` responses

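The "exactly one of `user_id` or `email`" rule and the audit-metadata requirement can be sketched as a request validator. The struct shape below is an assumption for illustration (in particular, `actor` is modelled as a plain string); the authoritative schema is the internal OpenAPI document.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidRequest stands in for the `400 invalid_request` outcome.
var errInvalidRequest = errors.New("invalid_request")

// blockUserRequest is a hypothetical DTO for POST /api/v1/internal/user-blocks.
// The field types are assumptions; only the field names come from the README.
type blockUserRequest struct {
	UserID     string `json:"user_id,omitempty"`
	Email      string `json:"email,omitempty"`
	ReasonCode string `json:"reason_code"`
	Actor      string `json:"actor"`
}

// validate enforces exactly one subject selector plus audit metadata.
func (r blockUserRequest) validate() error {
	hasUser, hasEmail := r.UserID != "", r.Email != ""
	if hasUser == hasEmail { // both set or both empty
		return fmt.Errorf("%w: exactly one of user_id or email is required", errInvalidRequest)
	}
	if r.ReasonCode == "" || r.Actor == "" {
		return fmt.Errorf("%w: reason_code and actor are required", errInvalidRequest)
	}
	return nil
}

func main() {
	ok := blockUserRequest{UserID: "u-1", ReasonCode: "example_reason", Actor: "ops"}
	both := blockUserRequest{UserID: "u-1", Email: "a@b.c", ReasonCode: "example_reason", Actor: "ops"}
	fmt.Println(ok.validate())
	fmt.Println(both.validate())
}
```
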
Stable internal error surface:

| HTTP status | `error.code` | Stable `error.message` |
| --- | --- | --- |
| `400` | `invalid_request` | field-specific validation detail |
| `404` | `session_not_found` | `session not found` |
| `404` | `subject_not_found` | `subject not found` |
| `500` | `internal_error` | `internal server error` |
| `503` | `service_unavailable` | `service is unavailable` |

## Challenge Model

A challenge represents one short-lived public e-mail-code flow.

Core fields:

- `challenge_id`
- normalized e-mail
- hashed confirmation code
- `status`
- `delivery_state`
- creation and expiration timestamps
- send and confirm attempt counters
- minimal abuse metadata
- optional confirmation metadata used for idempotent retry

### Challenge States

Supported `challenge.Status` values:

- `pending_send`
- `sent`
- `delivery_suppressed`
- `delivery_throttled`
- `confirmed_pending_expire`
- `expired`
- `failed`
- `cancelled`

Supported `challenge.DeliveryState` values:

- `pending`
- `sent`
- `suppressed`
- `throttled`
- `failed`

Policy rules:

- initial challenge TTL is `5m`
- confirmed-challenge retention for idempotent retry is `5m`
- max invalid confirm attempts is `5`
- every `send-email-code` call creates a fresh challenge
- resend throttling is e-mail scoped with a fixed `1m` cooldown
- a throttled send still creates a fresh challenge in
  `status=delivery_throttled` and `delivery_state=throttled`
- throttled sends do not call `UserDirectory` and do not call `MailSender`
- blocked sends outside the throttle path become `delivery_suppressed`

Fresh confirm semantics:

- only `sent` and `delivery_suppressed` accept a first successful confirm
- `pending_send`, `delivery_throttled`, `failed`, and `cancelled` return
  `invalid_code`
- expired challenges return `challenge_expired` while the Redis grace window
  keeps the record present, then `challenge_not_found` after cleanup removes
  the key

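The fresh-confirm rules above amount to a gate on the stored challenge status. The sketch below is illustrative only: the `Status` constants reuse the documented string values, while `gateFreshConfirm` and its return convention are assumptions. It does not cover code verification, attempt counting, or the `challenge_not_found` case, which arises when no record exists at all.

```go
package main

import "fmt"

// Status mirrors the documented challenge.Status string values.
type Status string

const (
	StatusPendingSend            Status = "pending_send"
	StatusSent                   Status = "sent"
	StatusDeliverySuppressed     Status = "delivery_suppressed"
	StatusDeliveryThrottled      Status = "delivery_throttled"
	StatusConfirmedPendingExpire Status = "confirmed_pending_expire"
	StatusExpired                Status = "expired"
	StatusFailed                 Status = "failed"
	StatusCancelled              Status = "cancelled"
)

// gateFreshConfirm reports whether a first confirm may proceed to the code
// check. When proceed is false, errorCode is the public error code, or ""
// for confirmed_pending_expire, which the idempotent retry rules handle.
func gateFreshConfirm(s Status) (errorCode string, proceed bool) {
	switch s {
	case StatusSent, StatusDeliverySuppressed:
		return "", true
	case StatusExpired:
		return "challenge_expired", false
	case StatusConfirmedPendingExpire:
		return "", false
	default:
		// pending_send, delivery_throttled, failed, cancelled. Mapping any
		// unknown status to invalid_code as well is an assumption.
		return "invalid_code", false
	}
}

func main() {
	for _, s := range []Status{StatusSent, StatusDeliveryThrottled, StatusExpired} {
		code, ok := gateFreshConfirm(s)
		fmt.Printf("%s -> proceed=%v code=%q\n", s, ok, code)
	}
}
```
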
Idempotent retry semantics:

- a repeated confirm with the same `challenge_id`, valid `code`, and identical
  `client_public_key` on `confirmed_pending_expire` returns the same
  `device_session_id`
- the same confirmed challenge with a different `client_public_key` fails as
  `invalid_code`
- idempotent retry republishes the stored gateway session view

## Device Session And Revoke Model

A device session is created only after successful confirmation.

Core fields:

- `device_session_id`
- `user_id`
- parsed client public key
- `status`
- `created_at`
- optional revocation metadata

Supported session states:

- `active`
- `revoked`

Built-in revoke reason codes:

- `device_logout`
- `logout_all`
- `admin_revoke`
- `user_blocked`
- `confirm_race_repair` for best-effort cleanup of superseded sessions created
  during a confirm race

Revoke behavior is intentionally separated by use case:

- revoke one device session
- revoke all sessions of one user
- block a subject and revoke active sessions implied by that subject

Internal mutation responses report only sessions changed by the current call,
so repeated idempotent operations may return:

- `already_revoked` with `affected_session_count=0`
- `no_active_sessions` with `affected_session_count=0`
- `already_blocked` with `affected_session_count=0`

## User Resolution And Session Limits

`Auth / Session Service` does not own durable user records. It delegates to
`UserDirectory` for:

- resolve-by-email without mutation
- ensure existing-or-created user during confirm
- existence checks for stable `user_id`
- block-by-user-id and block-by-email operations

Supported user-resolution outcomes:

- `existing`
- `creatable`
- `blocked`

Supported ensure-user outcomes:

- `existing`
- `created`
- `blocked`

Session-limit rules:

- the value is loaded from a shared config provider
- absent value means the limit is disabled
- active sessions are counted before creating a new one
- limit overflow returns `session_limit_exceeded`
- the service never silently revokes an existing session to satisfy the limit

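The session-limit rules reduce to a small pre-creation check. This sketch assumes the absent limit is modelled as a nil pointer and that "overflow" means the new session would push the active count past the limit; `checkSessionLimit` and `errSessionLimitExceeded` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errSessionLimitExceeded stands in for the `409 session_limit_exceeded` outcome.
var errSessionLimitExceeded = errors.New("active session limit would be exceeded")

// checkSessionLimit is called before creating a session. A nil limit means
// the limit is disabled. It never revokes anything; it only accepts or rejects.
func checkSessionLimit(limit *int, activeSessions int) error {
	if limit == nil {
		return nil
	}
	// Creating one more session must not push the count above the limit.
	if activeSessions+1 > *limit {
		return errSessionLimitExceeded
	}
	return nil
}

func main() {
	limit := 2
	fmt.Println(checkSessionLimit(nil, 50))   // disabled
	fmt.Println(checkSessionLimit(&limit, 1)) // room for one more
	fmt.Println(checkSessionLimit(&limit, 2)) // would exceed
}
```
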
## Gateway Projection Model

Gateway-facing session projection is separate from source-of-truth
`devicesession.Session`.

Each successful projection publish writes:

- one Redis KV snapshot under
  `<gateway_session_cache_key_prefix><device_session_id>`
- one full-snapshot Redis Stream event under the session-events stream

The default gateway-facing namespaces are:

- cache key prefix: `gateway:session:`
- session-events stream: `gateway:session_events`

Projected fields are intentionally limited to what gateway consumes:

- `device_session_id`
- `user_id`
- `client_public_key`
- `status`
- optional `revoked_at_ms`

Revoke reason and actor metadata stay in authsession source of truth and are
not projected to gateway.

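The projected snapshot and its cache key can be sketched as below. The field names and the default prefix come from this section; encoding the snapshot as JSON, the Go field types, and the helper names `cacheKey` and `snapshotJSON` are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// defaultCacheKeyPrefix is the documented default gateway cache namespace.
const defaultCacheKeyPrefix = "gateway:session:"

// sessionProjection carries only the fields gateway consumes. Revoke reason
// and actor are deliberately absent. JSON encoding is an assumption.
type sessionProjection struct {
	DeviceSessionID string `json:"device_session_id"`
	UserID          string `json:"user_id"`
	ClientPublicKey string `json:"client_public_key"`
	Status          string `json:"status"`
	RevokedAtMS     *int64 `json:"revoked_at_ms,omitempty"`
}

// cacheKey builds <gateway_session_cache_key_prefix><device_session_id>.
func cacheKey(prefix, deviceSessionID string) string {
	return prefix + deviceSessionID
}

// snapshotJSON renders the full snapshot written to both cache and stream.
func snapshotJSON(p sessionProjection) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	p := sessionProjection{
		DeviceSessionID: "s1",
		UserID:          "u1",
		ClientPublicKey: "k",
		Status:          "active",
	}
	body, err := snapshotJSON(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(cacheKey(defaultCacheKeyPrefix, p.DeviceSessionID))
	fmt.Println(body)
}
```
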
## Consistency Model

Source of truth is written first. Gateway projection is published only after
the source-of-truth write succeeds.

Caller-visible rules:

- if projection publication does not reach its required success threshold, the
  public or internal call returns `service_unavailable`
- already-written source-of-truth state is intentionally preserved
- the documented repair path is to repeat the same confirm or revoke command

Projection publish rules:

- request-path projection publish uses a bounded retry loop with `3` total
  attempts
- repeated publishes are safe because the cache snapshot is overwritten and
  duplicate full-snapshot stream events remain valid under gateway's
  later-event-wins model
- `confirm-email-code` rereads the stored session after the challenge CAS
  succeeds and republishes that current view so a concurrent revoke or block
  cannot overwrite source of truth with a stale active projection
- idempotent confirm retry also republishes the stored session view
- best-effort cleanup of superseded confirm-race sessions uses the same
  publish helper but is not part of the caller-visible success contract

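The bounded publish loop described above might look like the following sketch. Only the limit of `3` total attempts and the `service_unavailable` outcome come from this section; `publishWithRetry`, the absence of any backoff between attempts, and the `simulate` helper are assumptions for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// maxPublishAttempts is the documented total attempt budget.
const maxPublishAttempts = 3

// errServiceUnavailable stands in for the `503 service_unavailable` outcome.
var errServiceUnavailable = errors.New("service_unavailable")

// publishWithRetry calls publish up to maxPublishAttempts times and stops at
// the first success. Retrying is safe because each publish overwrites the
// cache snapshot and emits a full-snapshot event.
func publishWithRetry(ctx context.Context, publish func(context.Context) error) error {
	var lastErr error
	for attempt := 1; attempt <= maxPublishAttempts; attempt++ {
		if err := ctx.Err(); err != nil {
			lastErr = err
			break
		}
		if lastErr = publish(ctx); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("%w: %v", errServiceUnavailable, lastErr)
}

// simulate runs the loop against a fake publisher that fails its first
// `failures` calls and reports how many calls were made.
func simulate(failures int) (calls int, err error) {
	err = publishWithRetry(context.Background(), func(context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("redis write failed")
		}
		return nil
	})
	return calls, err
}

func main() {
	for _, failures := range []int{0, 2, 3} {
		calls, err := simulate(failures)
		fmt.Printf("failures=%d calls=%d err=%v\n", failures, calls, err)
	}
}
```

In this sketch the source-of-truth write is assumed to have already succeeded, so a returned error leaves that state in place and the caller repairs by repeating the same command.
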
## Runtime Summary

Runtime wiring is implemented in [`internal/app`](internal/app) and
[`cmd/authsession`](cmd/authsession/main.go).

Process-local collaborators:

- system UTC clock
- crypto-random `challenge_id` and `device_session_id` generators
- crypto-random 6-digit confirmation-code generator
- bcrypt-backed code hashing
- structured logging through `zap`
- process telemetry through OpenTelemetry

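A crypto-random 6-digit code generator of the kind listed above can be written with the standard library alone. This is a sketch, not the service's generator: `newConfirmationCode` is a hypothetical name, and allowing leading zeros (codes from `000000` to `999999`) is an assumption.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// newConfirmationCode draws a uniform value in [0, 1000000) from the system
// CSPRNG and zero-pads it to six digits.
func newConfirmationCode() (string, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(1_000_000))
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%06d", n.Int64()), nil
}

func main() {
	code, err := newConfirmationCode()
	if err != nil {
		panic(err)
	}
	fmt.Println(code)
}
```

`rand.Int` avoids modulo bias, which a naive `randomBytes % 1000000` approach would introduce.
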
Redis-backed adapters:

- challenge store
- session store
- session-limit config provider
- gateway projection publisher
- send-email-code abuse protector

External service adapters:

- user-service:
  - default `stub`
  - optional REST adapter with one retry for read-style methods on transport
    errors and HTTP `502`, `503`, or `504`
  - mutation methods do not auto-retry
- mail-service:
  - default `stub`
  - optional REST adapter with no automatic retry on transport or upstream
    failure, to avoid duplicate deliveries

Listener defaults:

- public HTTP: `:8080`
- internal HTTP: `:8081`
- read-header timeout: `2s`
- read timeout: `10s`
- idle timeout: `1m`
- per-request use-case timeout: `3s`

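The listener defaults above correspond directly to `net/http` server fields. The sketch below shows one way to apply them; `newServer` and `withUseCaseTimeout` are hypothetical helpers, and enforcing the per-request use-case timeout through a context-deadline middleware is an assumption about how that limit is applied.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// useCaseTimeout is the documented per-request use-case budget.
const useCaseTimeout = 3 * time.Second

// withUseCaseTimeout attaches a deadline to each request context so that
// downstream use cases can observe cancellation.
func withUseCaseTimeout(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), useCaseTimeout)
		defer cancel()
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// newServer applies the documented read-header, read, and idle timeouts.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           withUseCaseTimeout(h),
		ReadHeaderTimeout: 2 * time.Second,
		ReadTimeout:       10 * time.Second,
		IdleTimeout:       time.Minute,
	}
}

func main() {
	public := newServer(":8080", http.NotFoundHandler())
	internal := newServer(":8081", http.NotFoundHandler())
	fmt.Println(public.Addr, internal.Addr, public.ReadHeaderTimeout, public.ReadTimeout, public.IdleTimeout)
}
```
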
For detailed runtime behavior, configuration groups, operational notes, and
examples, see [`docs/README.md`](docs/README.md).

## Non-Goals

- making authsession a hot synchronous dependency for every authenticated
  gateway command
- moving business authorization into authsession
- exposing revoke or read operations as public unauthenticated routes
- introducing short-lived access-token or refresh-token flows
- adding pending async session provisioning after confirm