Auth / Session Service

Run and Dependencies

cmd/authsession starts two HTTP listeners:

  • public REST on AUTHSESSION_PUBLIC_HTTP_ADDR with default :8080
  • trusted internal REST on AUTHSESSION_INTERNAL_HTTP_ADDR with default :8081

Startup requires:

  • one reachable Redis deployment configured by AUTHSESSION_REDIS_ADDR

That Redis deployment is used for:

  • source-of-truth challenges
  • source-of-truth device sessions
  • dynamic active-session limit config
  • gateway session projection cache and stream updates
  • send-email-code resend throttling

Optional integrations:

  • AUTHSESSION_USER_SERVICE_MODE=stub|rest
  • AUTHSESSION_MAIL_SERVICE_MODE=stub|rest
  • OTLP telemetry through standard OTEL_* variables
  • stdout telemetry through AUTHSESSION_OTEL_STDOUT_TRACES_ENABLED and AUTHSESSION_OTEL_STDOUT_METRICS_ENABLED

Operational caveats:

  • the service exposes no /healthz, /readyz, or /metrics endpoints
  • user-service and mail-service default to in-process stub adapters until rest mode is configured
  • startup performs bounded Redis PING checks for every Redis-backed adapter and fails fast if Redis is unreachable or runtime config is invalid
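
The listener defaults and the required Redis address can be sketched as a small loader. This is a minimal illustration, not the service's real wiring: the Config type, the LoadConfig function, and the injected getenv lookup are assumptions; only the variable names and defaults come from this section.

```go
package main

import "errors"

// Config holds the listener and Redis settings named in this section.
// The type and function names are illustrative, not the service's real ones.
type Config struct {
	PublicHTTPAddr   string
	InternalHTTPAddr string
	RedisAddr        string
}

// LoadConfig reads settings through getenv (os.Getenv in a real process)
// and fails fast when the required Redis address is missing.
func LoadConfig(getenv func(string) string) (Config, error) {
	orDefault := func(key, fallback string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return fallback
	}
	cfg := Config{
		PublicHTTPAddr:   orDefault("AUTHSESSION_PUBLIC_HTTP_ADDR", ":8080"),
		InternalHTTPAddr: orDefault("AUTHSESSION_INTERNAL_HTTP_ADDR", ":8081"),
		RedisAddr:        getenv("AUTHSESSION_REDIS_ADDR"),
	}
	if cfg.RedisAddr == "" {
		return Config{}, errors.New("AUTHSESSION_REDIS_ADDR is required")
	}
	return cfg, nil
}
```

Passing the lookup function instead of calling os.Getenv directly keeps the loader testable without touching the process environment.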

Purpose

Auth / Session Service owns e-mail-code authentication and the lifecycle of device sessions.

It is the source of truth for:

  • authentication challenges
  • device sessions
  • revoke and block state
  • publication of session lifecycle updates consumed by Edge Gateway

The service is intentionally not on the hot path for every authenticated request. Gateway authenticates steady-state requests from its own cache, which session-lifecycle updates keep current, instead of making a synchronous round trip back to this service for each command.

Responsibilities

The service is responsible for:

  • public auth commands:
    • send-email-code
    • confirm-email-code
  • creating device sessions after successful confirmation
  • registering the client public key for a newly created session
  • revoking one device session
  • revoking all sessions of one user
  • blocking a user or e-mail subject for future auth flows
  • persisting source-of-truth session state
  • projecting session state into gateway-consumable Redis data
  • exposing a trusted internal REST API for read, revoke, and block operations

The service is not responsible for:

  • verifying authenticated transport signatures on every business request
  • gateway anti-replay for authenticated command traffic
  • downstream business authorization
  • direct push delivery to clients
  • long-lived hot-path session caching inside gateway
  • mail-service implementation details beyond the mail-delivery contract

Position in the System

flowchart LR
    Client["Client"]
    Gateway["Edge Gateway"]
    Auth["Auth / Session Service"]
    User["User Service"]
    Mail["Mail Service"]
    Redis["Redis"]
    Business["Business Services"]

    Client --> Gateway
    Gateway --> Auth
    Gateway --> Business
    Auth --> User
    Auth --> Mail
    Auth --> Redis
    Redis --> Gateway

Main Principles

  • public auth stays synchronous
  • send-email-code returns challenge_id
  • confirm-email-code returns a ready device_session_id
  • no pending async session-provisioning stage exists
  • source-of-truth session state and gateway-facing projection remain separate
  • Redis is the initial backend, but the domain and service layers stay storage agnostic behind ports
  • send-email-code stays success-shaped for existing, new, blocked, and throttled e-mail flows
  • confirm-email-code supports short-window idempotent retry for the same confirmed challenge and the same client_public_key
  • active-session limits are configuration driven:
    • absent limit means disabled
    • limit overflow rejects new session creation explicitly
    • the service does not evict existing sessions to make room

Gateway-Facing Public Contract

Gateway already exposes the public REST auth surface and delegates it to this service:

  • POST /api/v1/public/auth/send-email-code
  • POST /api/v1/public/auth/confirm-email-code

The effective DTO contract is:

| Operation | Request | Success response |
| --- | --- | --- |
| POST /api/v1/public/auth/send-email-code | { "email": string } | { "challenge_id": string } |
| POST /api/v1/public/auth/confirm-email-code | { "challenge_id": string, "code": string, "client_public_key": string, "time_zone": string } | { "device_session_id": string } |

client_public_key is the standard base64-encoded raw 32-byte Ed25519 public key registered for the created device session. time_zone is the client-selected IANA time zone name. During the current rollout phase, successful confirms forward create-only user registration context to User Service as preferred_language="en" and the supplied time_zone until gateway geoip-based language derivation is deployed.
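
The two field formats described above can be checked with the Go standard library alone. The sketch below is one possible shape, assuming hypothetical helper names (ParseClientPublicKey, ValidateTimeZone) and sentinel errors; it also embeds the IANA database through time/tzdata so lookups do not depend on the host, which is a choice made here and not something the source states.

```go
package main

import (
	"crypto/ed25519"
	"encoding/base64"
	"errors"
	"time"
	_ "time/tzdata" // embed the IANA database so lookups do not depend on the host
)

// Sentinel errors are hypothetical names used only in this sketch.
var (
	ErrInvalidClientPublicKey = errors.New("invalid_client_public_key")
	ErrInvalidTimeZone        = errors.New("invalid time_zone")
)

// ParseClientPublicKey decodes standard padded base64 and requires
// exactly 32 raw bytes, the size of an Ed25519 public key.
func ParseClientPublicKey(s string) (ed25519.PublicKey, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil || len(raw) != ed25519.PublicKeySize {
		return nil, ErrInvalidClientPublicKey
	}
	return ed25519.PublicKey(raw), nil
}

// ValidateTimeZone accepts only names that resolve in the IANA database.
// "" and "Local" are rejected explicitly because time.LoadLocation
// treats them as UTC and the host zone respectively.
func ValidateTimeZone(name string) error {
	if name == "" || name == "Local" {
		return ErrInvalidTimeZone
	}
	if _, err := time.LoadLocation(name); err != nil {
		return ErrInvalidTimeZone
	}
	return nil
}
```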

Public boundary rules:

  • requests and responses are JSON only
  • request DTOs reject unknown fields
  • empty bodies, malformed JSON, trailing JSON input, and unknown fields return 400 invalid_request
  • surrounding ASCII and Unicode whitespace is trimmed from input string fields before validation
  • confirm-email-code requires a non-empty time_zone and validates it as an IANA time zone name
  • send-email-code remains success-shaped for existing, new, blocked, and throttled e-mail paths
  • confirm-email-code returns a ready device_session_id synchronously on success
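
The strict decoding rules above map onto encoding/json fairly directly. The following is a sketch under assumptions: the DecodeSendEmailCode function, the request struct, and the sentinel error are illustrative names, and the real handler may structure validation differently.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"io"
	"strings"
)

// ErrInvalidRequest stands in for the 400 invalid_request outcome.
var ErrInvalidRequest = errors.New("invalid_request")

// SendEmailCodeRequest mirrors the documented request body.
type SendEmailCodeRequest struct {
	Email string `json:"email"`
}

// DecodeSendEmailCode rejects empty bodies, malformed JSON, unknown fields,
// and trailing JSON input, then trims surrounding whitespace from the field.
func DecodeSendEmailCode(body []byte) (SendEmailCodeRequest, error) {
	var req SendEmailCodeRequest
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&req); err != nil {
		return SendEmailCodeRequest{}, ErrInvalidRequest
	}
	// A second Decode must report io.EOF; anything else is trailing input.
	if err := dec.Decode(&struct{}{}); err != io.EOF {
		return SendEmailCodeRequest{}, ErrInvalidRequest
	}
	// strings.TrimSpace removes both ASCII and Unicode whitespace.
	req.Email = strings.TrimSpace(req.Email)
	if req.Email == "" {
		return SendEmailCodeRequest{}, ErrInvalidRequest
	}
	return req, nil
}
```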

Stable public business-error contract:

| HTTP status | error.code | Stable error.message |
| --- | --- | --- |
| 400 | invalid_request | field-specific validation detail |
| 400 | invalid_code | confirmation code is invalid |
| 400 | invalid_client_public_key | client_public_key is not a valid base64-encoded raw 32-byte Ed25519 public key |
| 403 | blocked_by_policy | authentication is blocked by policy |
| 404 | challenge_not_found | challenge not found |
| 409 | session_limit_exceeded | active session limit would be exceeded |
| 410 | challenge_expired | challenge expired |
| 503 | service_unavailable | service is unavailable |

The public error envelope is always:

{
  "error": {
    "code": "string",
    "message": "string"
  }
}
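
A client or test harness can model the envelope and the status mapping with two small types and a lookup table. This is a sketch only: the type and function names are assumptions, while the codes and statuses are copied from the public error contract above.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// ErrorBody and ErrorEnvelope mirror the documented public error envelope.
type ErrorBody struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

type ErrorEnvelope struct {
	Error ErrorBody `json:"error"`
}

// publicStatus maps each stable public error code to its HTTP status.
var publicStatus = map[string]int{
	"invalid_request":           http.StatusBadRequest,
	"invalid_code":              http.StatusBadRequest,
	"invalid_client_public_key": http.StatusBadRequest,
	"blocked_by_policy":         http.StatusForbidden,
	"challenge_not_found":       http.StatusNotFound,
	"session_limit_exceeded":    http.StatusConflict,
	"challenge_expired":         http.StatusGone,
	"service_unavailable":       http.StatusServiceUnavailable,
}

// PublicStatus returns the HTTP status for a known public error code.
func PublicStatus(code string) (int, bool) {
	status, ok := publicStatus[code]
	return status, ok
}

// EncodeError renders the envelope as compact JSON.
func EncodeError(code, message string) ([]byte, error) {
	return json.Marshal(ErrorEnvelope{Error: ErrorBody{Code: code, Message: message}})
}
```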

Trusted Internal API

The trusted internal REST surface lives under /api/v1/internal and is documented in api/internal-openapi.yaml.

Implemented endpoints:

  • GET /api/v1/internal/sessions/{device_session_id}
  • GET /api/v1/internal/users/{user_id}/sessions
  • POST /api/v1/internal/sessions/{device_session_id}/revoke
  • POST /api/v1/internal/users/{user_id}/sessions/revoke-all
  • POST /api/v1/internal/user-blocks

Key internal API properties:

  • all bodies are JSON only
  • ListUserSessions is newest-first and unpaginated in v1
  • revoke and block mutations require audit metadata as reason_code and actor
  • BlockUser accepts exactly one of user_id or email
  • mutating operations are idempotent and return explicit acknowledgement payloads rather than empty 204 responses
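
The "exactly one of user_id or email" rule and the required audit metadata can be expressed as a single validation step. The sketch below assumes a flat request shape with string fields; the authoritative schema is the one in api/internal-openapi.yaml, and actor in particular may be richer than a plain string.

```go
package main

import (
	"errors"
	"strings"
)

// ErrInvalidRequest stands in for the 400 invalid_request outcome.
var ErrInvalidRequest = errors.New("invalid_request")

// BlockUserRequest is a hypothetical flat view of the block request.
type BlockUserRequest struct {
	UserID     string
	Email      string
	ReasonCode string
	Actor      string
}

// Validate enforces exactly one subject selector plus audit metadata.
func (r BlockUserRequest) Validate() error {
	hasUser := strings.TrimSpace(r.UserID) != ""
	hasEmail := strings.TrimSpace(r.Email) != ""
	if hasUser == hasEmail { // both set or both empty
		return ErrInvalidRequest
	}
	if strings.TrimSpace(r.ReasonCode) == "" || strings.TrimSpace(r.Actor) == "" {
		return ErrInvalidRequest
	}
	return nil
}
```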

Stable internal error surface:

| HTTP status | error.code | Stable error.message |
| --- | --- | --- |
| 400 | invalid_request | field-specific validation detail |
| 404 | session_not_found | session not found |
| 404 | subject_not_found | subject not found |
| 500 | internal_error | internal server error |
| 503 | service_unavailable | service is unavailable |

Challenge Model

A challenge represents one short-lived public e-mail-code flow.

Core fields:

  • challenge_id
  • normalized e-mail
  • hashed confirmation code
  • status
  • delivery_state
  • creation and expiration timestamps
  • send and confirm attempt counters
  • minimal abuse metadata
  • optional confirmation metadata used for idempotent retry

Challenge States

Supported challenge.Status values:

  • pending_send
  • sent
  • delivery_suppressed
  • delivery_throttled
  • confirmed_pending_expire
  • expired
  • failed
  • cancelled

Supported challenge.DeliveryState values:

  • pending
  • sent
  • suppressed
  • throttled
  • failed

Policy rules:

  • initial challenge TTL is 5m
  • confirmed-challenge retention for idempotent retry is 5m
  • max invalid confirm attempts is 5
  • every send-email-code call creates a fresh challenge
  • resend throttling is e-mail scoped with a fixed 1m cooldown
  • a throttled send still creates a fresh challenge in status=delivery_throttled and delivery_state=throttled
  • throttled sends do not call UserDirectory and do not call MailSender
  • blocked sends outside the throttle path become delivery_suppressed

Fresh confirm semantics:

  • only sent and delivery_suppressed accept a first successful confirm
  • pending_send, delivery_throttled, failed, and cancelled return invalid_code
  • expired challenges return challenge_expired while the Redis grace window keeps the record present, then challenge_not_found after cleanup removes the key

Idempotent retry semantics:

  • a repeated confirm with the same challenge_id, valid code, and identical client_public_key on confirmed_pending_expire returns the same device_session_id
  • the same confirmed challenge with a different client_public_key fails as invalid_code
  • idempotent retry republishes the stored gateway session view
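
The fresh-confirm and idempotent-retry rules can be summarized as one switch over the challenge status. This is an illustrative sketch: the Outcome names and the DecideConfirm function are assumptions, and the real service also enforces attempt counters and TTL checks that are left out here.

```go
package main

// Status values are taken from the challenge.Status list.
type Status string

const (
	StatusPendingSend            Status = "pending_send"
	StatusSent                   Status = "sent"
	StatusDeliverySuppressed     Status = "delivery_suppressed"
	StatusDeliveryThrottled      Status = "delivery_throttled"
	StatusConfirmedPendingExpire Status = "confirmed_pending_expire"
	StatusExpired                Status = "expired"
	StatusFailed                 Status = "failed"
	StatusCancelled              Status = "cancelled"
)

// Outcome names are hypothetical labels for this sketch.
type Outcome string

const (
	OutcomeCreateSession    Outcome = "create_session"
	OutcomeReturnExisting   Outcome = "return_existing_session"
	OutcomeInvalidCode      Outcome = "invalid_code"
	OutcomeChallengeExpired Outcome = "challenge_expired"
)

// DecideConfirm maps a stored challenge status plus the code and key
// comparison results to the caller-visible outcome.
func DecideConfirm(status Status, codeValid, sameClientKey bool) Outcome {
	switch status {
	case StatusExpired:
		return OutcomeChallengeExpired
	case StatusSent, StatusDeliverySuppressed:
		if codeValid {
			return OutcomeCreateSession
		}
		return OutcomeInvalidCode
	case StatusConfirmedPendingExpire:
		if codeValid && sameClientKey {
			return OutcomeReturnExisting
		}
		return OutcomeInvalidCode
	default: // pending_send, delivery_throttled, failed, cancelled
		return OutcomeInvalidCode
	}
}
```

A challenge whose key has already been cleaned up never reaches this function; the lookup fails first and surfaces as challenge_not_found.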

Device Session And Revoke Model

A device session is created only after successful confirmation.

Core fields:

  • device_session_id
  • user_id
  • parsed client public key
  • status
  • created_at
  • optional revocation metadata

Supported session states:

  • active
  • revoked

Built-in revoke reason codes:

  • device_logout
  • logout_all
  • admin_revoke
  • user_blocked
  • confirm_race_repair for best-effort cleanup of superseded sessions created during a confirm race

Revoke behavior is intentionally separated by use case:

  • revoke one device session
  • revoke all sessions of one user
  • block a subject and revoke active sessions implied by that subject

Internal mutation responses report only sessions changed by the current call, so repeated idempotent operations may return:

  • already_revoked with affected_session_count=0
  • no_active_sessions with affected_session_count=0
  • already_blocked with affected_session_count=0
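
The "report only what this call changed" rule is easiest to see on a single-session revoke. The sketch below works on an in-memory value and is not the stored model: Session, RevokeAck, and RevokeOne are hypothetical, and the outcome label "revoked" for the first successful call is an assumption, since this section names only the repeated-call outcomes.

```go
package main

// Session is a reduced in-memory stand-in for a device session.
type Session struct {
	DeviceSessionID string
	Status          string // "active" or "revoked"
	RevokeReason    string
}

// RevokeAck is the explicit acknowledgement returned to the caller.
type RevokeAck struct {
	Outcome              string
	AffectedSessionCount int
}

// RevokeOne revokes an active session once. Repeated calls leave the
// original revocation metadata untouched and report zero affected sessions.
func RevokeOne(s *Session, reasonCode string) RevokeAck {
	if s.Status == "revoked" {
		return RevokeAck{Outcome: "already_revoked", AffectedSessionCount: 0}
	}
	s.Status = "revoked"
	s.RevokeReason = reasonCode
	return RevokeAck{Outcome: "revoked", AffectedSessionCount: 1}
}
```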

User Resolution And Session Limits

Auth / Session Service does not own durable user records. It delegates to UserDirectory for:

  • resolve-by-email without mutation
  • ensure existing-or-created user during confirm
  • existence checks for stable user_id
  • block-by-user-id and block-by-email operations

Supported user-resolution outcomes:

  • existing
  • creatable
  • blocked

Supported ensure-user outcomes:

  • existing
  • created
  • blocked

Session-limit rules:

  • the value is loaded from a shared config provider
  • absent value means the limit is disabled
  • active sessions are counted before creating a new one
  • limit overflow returns session_limit_exceeded
  • the service never silently revokes an existing session to satisfy the limit
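
The limit rules reduce to one comparison made before session creation. In this sketch a nil pointer models the absent config value; the function name and sentinel error are illustrative, not taken from the codebase.

```go
package main

import "errors"

// ErrSessionLimitExceeded stands in for the 409 session_limit_exceeded outcome.
var ErrSessionLimitExceeded = errors.New("session_limit_exceeded")

// CheckSessionLimit is called with the current active-session count before
// a new session is created. A nil limit means the limit is disabled.
// Existing sessions are never revoked here to make room.
func CheckSessionLimit(limit *int, activeCount int) error {
	if limit == nil {
		return nil
	}
	if activeCount >= *limit { // creating one more would exceed the limit
		return ErrSessionLimitExceeded
	}
	return nil
}
```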

Gateway Projection Model

Gateway-facing session projection is separate from source-of-truth devicesession.Session.

Each successful projection publish writes:

  • one Redis KV snapshot under <gateway_session_cache_key_prefix><device_session_id>
  • one full-snapshot Redis Stream event under the session-events stream

The default gateway-facing namespaces are:

  • cache key prefix: gateway:session:
  • session-events stream: gateway:session_events

Projected fields are intentionally limited to what gateway consumes:

  • device_session_id
  • user_id
  • client_public_key
  • status
  • optional revoked_at_ms

Revoke reason and actor metadata stay in authsession source of truth and are not projected to gateway.
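
A projection record limited to the fields above might look like the following. The field list and default namespaces come from this section; the JSON encoding of the snapshot, the struct name, and the helper names are assumptions, since the wire format is not specified here.

```go
package main

import "encoding/json"

// Default gateway-facing namespaces from this section.
const (
	DefaultCacheKeyPrefix      = "gateway:session:"
	DefaultSessionEventsStream = "gateway:session_events"
)

// SessionProjection carries only what gateway consumes. Revoke reason and
// actor are deliberately absent.
type SessionProjection struct {
	DeviceSessionID string `json:"device_session_id"`
	UserID          string `json:"user_id"`
	ClientPublicKey string `json:"client_public_key"`
	Status          string `json:"status"`
	RevokedAtMs     *int64 `json:"revoked_at_ms,omitempty"`
}

// CacheKey builds the KV snapshot key for one device session.
func CacheKey(prefix, deviceSessionID string) string {
	return prefix + deviceSessionID
}

// Snapshot encodes the projection; JSON is an assumed encoding.
func (p SessionProjection) Snapshot() ([]byte, error) {
	return json.Marshal(p)
}
```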

Consistency Model

Source of truth is written first. Gateway projection is published only after the source-of-truth write succeeds.

Caller-visible rules:

  • if projection publication does not reach its required success threshold, the public or internal call returns service_unavailable
  • already-written source-of-truth state is intentionally preserved
  • the documented repair path is to repeat the same confirm or revoke command

Projection publish rules:

  • request-path projection publish uses a bounded retry loop with 3 total attempts
  • repeated publishes are safe because the cache snapshot is overwritten and duplicate full-snapshot stream events remain valid under gateway's later-event-wins model
  • confirm-email-code rereads the stored session after the challenge CAS succeeds and republishes that current view, so a confirm that races with a revoke or block cannot leave a stale active projection in place of the revoked source-of-truth state
  • idempotent confirm retry also republishes the stored session view
  • best-effort cleanup of superseded confirm-race sessions uses the same publish helper but is not part of the caller-visible success contract
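
The bounded retry around projection publish can be sketched as a plain loop. The helper below is an assumption about shape only: it takes the publish step as a function, makes at most 3 attempts, and surfaces a service_unavailable error when all of them fail. No backoff is shown because this section does not describe one.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrServiceUnavailable stands in for the caller-visible failure.
var ErrServiceUnavailable = errors.New("service_unavailable")

// maxPublishAttempts is the documented total, not a retry count.
const maxPublishAttempts = 3

// PublishWithRetry calls publish until it succeeds or the attempt budget
// is spent. Repeating publish is safe because the snapshot is overwritten
// and duplicate full-snapshot events remain valid for gateway.
func PublishWithRetry(publish func() error) error {
	var lastErr error
	for attempt := 1; attempt <= maxPublishAttempts; attempt++ {
		if lastErr = publish(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("%w: projection publish failed after %d attempts: %v",
		ErrServiceUnavailable, maxPublishAttempts, lastErr)
}
```

The source-of-truth write has already happened by the time this helper runs, so a returned error leaves that state in place and the caller repeats the same command to repair the projection.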

Runtime Summary

Runtime wiring is implemented in internal/app and cmd/authsession.

Process-local collaborators:

  • system UTC clock
  • crypto-random challenge_id and device_session_id generators
  • crypto-random 6-digit confirmation-code generator
  • bcrypt-backed code hashing
  • structured logging through zap
  • process telemetry through OpenTelemetry
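
A crypto-random 6-digit code can be drawn uniformly with crypto/rand. This sketch is not the service's generator: the function name is hypothetical, and allowing leading zeros (for example 004217) is an assumption about the code format.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// NewConfirmationCode returns a uniformly random code in 000000..999999.
// rand.Int avoids the modulo bias of reducing raw random bytes by hand.
func NewConfirmationCode() (string, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(1_000_000))
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%06d", n.Int64()), nil
}
```

Only the hash of the code would be stored on the challenge; hashing is left out here because bcrypt lives outside the standard library.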

Redis-backed adapters:

  • challenge store
  • session store
  • session-limit config provider
  • gateway projection publisher
  • send-email-code abuse protector

External service adapters:

  • user-service:
    • default stub
    • optional REST adapter with one retry for read-style methods on transport errors and HTTP 502, 503, or 504
    • mutation methods do not auto-retry
  • mail-service:
    • default stub
    • optional REST adapter with no automatic retry on transport or upstream failure, to avoid duplicate deliveries
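
The user-service retry policy above can be captured in one predicate. The sketch assumes a hypothetical ShouldRetry helper that the adapter would consult after each attempt; "one retry" is read as two attempts in total.

```go
package main

import "net/http"

// ShouldRetry reports whether a user-service call may be attempted again.
// Only read-style methods retry, only once, and only on transport errors
// or HTTP 502, 503, or 504. Mutations never auto-retry.
func ShouldRetry(isRead bool, attemptsMade int, transportFailed bool, status int) bool {
	if !isRead || attemptsMade >= 2 {
		return false
	}
	if transportFailed {
		return true
	}
	switch status {
	case http.StatusBadGateway, http.StatusServiceUnavailable, http.StatusGatewayTimeout:
		return true
	}
	return false
}
```

The mail-service adapter would have no equivalent predicate, since it never retries automatically.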

Listener defaults:

  • public HTTP: :8080
  • internal HTTP: :8081
  • read-header timeout: 2s
  • read timeout: 10s
  • idle timeout: 1m
  • per-request use-case timeout: 3s
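
The listener timeouts map onto net/http server fields, and the use-case timeout fits naturally as a context deadline. The following is a sketch with assumed names (NewServer, withUseCaseTimeout); how the real runtime applies the 3s budget is not stated in this section.

```go
package main

import (
	"context"
	"net/http"
	"time"
)

// useCaseTimeout is the documented per-request budget.
const useCaseTimeout = 3 * time.Second

// withUseCaseTimeout attaches a deadline to each request context so
// downstream calls can stop when the budget is spent.
func withUseCaseTimeout(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), useCaseTimeout)
		defer cancel()
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// NewServer builds one listener with the documented default timeouts.
func NewServer(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           withUseCaseTimeout(handler),
		ReadHeaderTimeout: 2 * time.Second,
		ReadTimeout:       10 * time.Second,
		IdleTimeout:       time.Minute,
	}
}
```

The same constructor would be called twice, once for :8080 and once for :8081, each with its own handler.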

For detailed runtime behavior, configuration groups, operational notes, and examples, see docs/README.md.

Non-Goals

  • making authsession a hot synchronous dependency for every authenticated gateway command
  • moving business authorization into authsession
  • exposing revoke or read operations as public unauthenticated routes
  • introducing short-lived access-token or refresh-token flows
  • adding pending async session provisioning after confirm