galaxy-game/user/docs/runbook.md
2026-04-17 18:39:16 +02:00

Runbook

Startup Checklist

Before starting userservice, verify:

  • USERSERVICE_REDIS_ADDR points to the intended Redis instance
  • internal HTTP bind address is free
  • optional admin metrics listener does not collide with another process
  • domain-events stream settings match the environment that consumes them
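Part of this checklist can be scripted. The following is a minimal sketch, not the service's own tooling: it assumes USERSERVICE_REDIS_ADDR uses a host:port form and that the internal HTTP bind address is known to the operator as an IPv4 host:port string. The helper names (split_host_port, port_is_free, preflight) are hypothetical.

```python
import socket


def split_host_port(addr: str) -> tuple[str, int]:
    """Split 'host:port' into its parts; raise ValueError if malformed."""
    host, sep, port = addr.rpartition(":")
    if not sep or not host or not port.isdigit() or int(port) > 65535:
        raise ValueError(f"expected host:port, got {addr!r}")
    return host, int(port)


def port_is_free(host: str, port: int) -> bool:
    """Return True if an IPv4 TCP listener can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def preflight(env: dict, http_bind: str) -> list[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    try:
        # Assumption: the variable holds host:port.
        split_host_port(env.get("USERSERVICE_REDIS_ADDR", ""))
    except ValueError as exc:
        problems.append(f"USERSERVICE_REDIS_ADDR: {exc}")
    try:
        host, port = split_host_port(http_bind)
    except ValueError as exc:
        problems.append(f"HTTP bind address: {exc}")
    else:
        if not port_is_free(host, port):
            problems.append(f"HTTP bind address {http_bind} is already in use")
    return problems
```

For example, preflight(dict(os.environ), "127.0.0.1:8080") with a placeholder bind address returns a list of problems to resolve before starting the process. The same port_is_free check can be applied to the admin metrics listener.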

Expected startup behavior:

  • configuration is loaded and validated first
  • Redis-backed stores and publishers are constructed
  • startup fails fast on Redis misconfiguration or connectivity failure

Health And Readiness

userservice does not expose public health endpoints.

Operational readiness is typically checked through one trusted internal route, for example:

  • GET /api/v1/internal/users/{user_id}/exists

Call the route with a user_id that is guaranteed not to exist. A healthy process returns 200 with {"exists":false}.
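The readiness rule above can be expressed as a small check. This is a sketch under assumptions: the function names are hypothetical, and how the request is sent (HTTP client, base URL, any trusted-caller credentials) is left to the environment.

```python
import json


def probe_path(user_id: str) -> str:
    """Build the internal existence route for a given user_id."""
    return f"/api/v1/internal/users/{user_id}/exists"


def is_ready(status: int, body: bytes) -> bool:
    """Return True only for the documented healthy answer: 200 and {"exists": false}."""
    if status != 200:
        return False
    try:
        return json.loads(body) == {"exists": False}
    except ValueError:
        # Covers invalid JSON and undecodable bytes.
        return False
```

Note that a 200 with {"exists":true} is treated as not ready here, because it means the probe user_id was not actually missing and the probe itself needs fixing.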

If admin metrics are enabled, /metrics on the admin listener is an additional process-level operational endpoint.

Common Failure Modes

Redis unavailable

Symptoms:

  • process fails during startup
  • internal API returns 503 service_unavailable
  • domain events stop being published

Checks:

  • connectivity to USERSERVICE_REDIS_ADDR
  • Redis ACL credentials
  • Redis DB number
  • TLS settings match between userservice and the Redis instance
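To narrow down which of these checks applies, a raw PING over a plaintext TCP connection can help. This is a rough sketch, not part of userservice: the function names and the category labels are made up for illustration, and reading an empty reply as a TLS mismatch is only a hint, since other causes can also close the connection.

```python
import socket


def classify_reply(reply: bytes) -> str:
    """Map the first bytes of a Redis reply to a likely failure category."""
    if reply.startswith(b"+PONG"):
        return "ok"
    if reply.startswith((b"-NOAUTH", b"-WRONGPASS", b"-NOPERM")):
        return "credentials"  # check Redis ACL user and password
    if reply.startswith(b"-ERR DB index"):
        return "db_number"  # reply to SELECT with an out-of-range index
    if not reply:
        return "closed"  # possibly a plaintext client talking to a TLS port
    return "unexpected"


def ping(host: str, port: int, timeout: float = 2.0) -> bytes:
    """Send an inline PING without TLS and return the first reply bytes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"PING\r\n")
        return conn.recv(256)
```

A connection error raised by ping points at plain connectivity to USERSERVICE_REDIS_ADDR, while a "credentials" result points at the ACL settings.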

Invalid registration context

Symptoms:

  • ensure-by-email returns 400 invalid_request

Checks:

  • preferred_language is a valid BCP 47 tag
  • time_zone is a valid IANA time-zone name
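Both values can be sanity-checked before calling ensure-by-email. The sketch below is an approximation and may not match the service's own validation: the language pattern covers only the common language, script, region, and variant subtags rather than the full BCP 47 grammar, and the time-zone check depends on the tz database available to the local Python installation.

```python
import re
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Simplified well-formedness pattern: language[-Script][-REGION][-variant]*.
# It does not cover extensions, private use, or grandfathered tags.
_LANG_TAG = re.compile(
    r"[A-Za-z]{2,3}"
    r"(-[A-Za-z]{4})?"
    r"(-([A-Za-z]{2}|[0-9]{3}))?"
    r"(-([A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"
)


def looks_like_language_tag(tag: str) -> bool:
    """Return True if tag matches the simplified BCP 47 shape above."""
    return _LANG_TAG.fullmatch(tag) is not None


def is_known_time_zone(name: str) -> bool:
    """Return True if the local tz database knows this IANA zone name."""
    if not name:
        return False
    try:
        ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError, OSError):
        return False
    return True
```

A common cause of this failure is an underscore form such as en_US, which the pattern rejects.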

race_name conflict

Symptoms:

  • profile update returns 409 conflict

Checks:

  • desired race name is not already reserved under canonical uniqueness rules
  • user is not currently blocked by profile_update_block

declared-country sync rejected

Symptoms:

  • geo sync returns 400 invalid_request

Checks:

  • country code is uppercase ISO 3166-1 alpha-2
  • trusted caller is using the intended internal route
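The first check is easy to run on the caller side. This is a minimal shape check with a hypothetical function name; it confirms two uppercase ASCII letters but does not confirm that the code is actually assigned in ISO 3166-1.

```python
import re

_ALPHA2 = re.compile(r"[A-Z]{2}")


def has_alpha2_shape(code: str) -> bool:
    """Return True for exactly two uppercase ASCII letters, e.g. 'DE'."""
    return _ALPHA2.fullmatch(code) is not None
```

Lowercase input such as de and alpha-3 input such as DEU both fail this check, which matches the two most likely caller mistakes.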

Safe Rollout Notes

  • Keep Auth / Session Service and User Service aligned on the current registration_context shape.
  • During the current rollout, the active create-path contract is the authsession-provided preferred_language, derived from the public Accept-Language header with fallback to en.
  • Gateway direct user.* self-service routing depends on the internal REST routes staying stable.
  • Do not roll out billing-driven entitlement mutations on the assumption that another service owns current entitlement state. User Service remains the source of truth for current entitlement.
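To illustrate the preferred_language contract mentioned above, here is one possible derivation from an Accept-Language header. It is an assumption for illustration only: the actual logic lives in Auth / Session Service and is not described in this runbook, so the function name and the tie-breaking behavior are hypothetical.

```python
from typing import Optional


def preferred_language(accept_language: Optional[str], fallback: str = "en") -> str:
    """Pick the highest-weighted tag from Accept-Language, else the fallback."""
    best_tag, best_q = None, 0.0
    for part in (accept_language or "").split(","):
        pieces = [p.strip() for p in part.split(";")]
        tag = pieces[0]
        if not tag or tag == "*":
            continue  # a wildcard carries no concrete language
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Strict comparison keeps the earliest tag on equal weights.
        if q > best_q:
            best_tag, best_q = tag, q
    return best_tag or fallback
```

With this sketch, a missing header, an empty header, and a bare * all resolve to en.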

Debugging Data Mismatches

When a caller reports mismatched user state:

  1. Read the current account aggregate through the trusted internal route.
  2. Confirm whether the discrepancy is in source-of-truth state or in a downstream projection.
  3. If the issue concerns declared-country workflow history, switch to Geo Profile Service; User Service stores only the current effective value.
  4. If the issue concerns authenticated edge transport, verify the same user through gateway user.account.get to distinguish transport problems from source-of-truth problems.
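For step 2, a field-by-field comparison makes the discrepancy concrete. The sketch below assumes both views are available as flat JSON objects; the function name and the sample field names are illustrative and not the documented payload shape.

```python
ABSENT = "<absent>"  # marker for a key that one side does not have


def diff_fields(source: dict, projection: dict) -> dict:
    """Return {field: (source_value, projection_value)} for every differing field."""
    differences = {}
    for key in sorted(source.keys() | projection.keys()):
        left = source.get(key, ABSENT)
        right = projection.get(key, ABSENT)
        if left != right:
            differences[key] = (left, right)
    return differences


# Example with made-up values: User Service state vs. a downstream copy.
source_of_truth = {"race_name": "Zorg", "time_zone": "Europe/Berlin"}
downstream = {"race_name": "Zorg", "time_zone": "UTC", "cached_at": "2026-04-17"}
print(diff_fields(source_of_truth, downstream))
```

An empty result suggests the projection matches the aggregate, so the problem is more likely on the transport side covered in step 4.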