# Runbook
## Startup Checklist
Before starting `userservice`, verify:
- `USERSERVICE_REDIS_ADDR` points to the intended Redis instance
- internal HTTP bind address is free
- optional admin metrics listener does not collide with another process
- domain-events stream settings match the environment that consumes them
Expected startup behavior:
- configuration is loaded and validated first
- Redis-backed stores and publishers are constructed
- startup fails fast on Redis misconfiguration or connectivity failure
## Health And Readiness
`userservice` does not expose public health endpoints.
Operational readiness is typically checked by calling one trusted internal
route with a guaranteed-missing `user_id`, for example:
- `GET /api/v1/internal/users/{user_id}/exists`
A healthy process returns `200` with `{"exists":false}`.
If admin metrics are enabled, `/metrics` on the admin listener serves as an
additional process-level operational endpoint.
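The readiness rule described in this section can be expressed as a small probe. This is a sketch under assumptions: the base URL and the probe user ID are placeholders, and the service may require headers or network placement that are not shown here. Only the route and the expected `200` with `{"exists":false}` come from this runbook.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// isReady applies the runbook's rule: status 200 and a body of {"exists":false}.
func isReady(status int, body []byte) bool {
	if status != http.StatusOK {
		return false
	}
	var payload struct {
		Exists *bool `json:"exists"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return false
	}
	// A missing "exists" field is treated as not ready.
	return payload.Exists != nil && !*payload.Exists
}

// probe calls the exists route for a user ID that must never be allocated.
func probe(baseURL, missingUserID string) error {
	client := &http.Client{Timeout: 2 * time.Second}
	endpoint := baseURL + "/api/v1/internal/users/" + url.PathEscape(missingUserID) + "/exists"
	resp, err := client.Get(endpoint)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(io.LimitReader(resp.Body, 4096))
	if err != nil {
		return err
	}
	if !isReady(resp.StatusCode, body) {
		return fmt.Errorf("not ready: status %d, body %q", resp.StatusCode, body)
	}
	return nil
}

func main() {
	// Both values are placeholders, not taken from the service configuration.
	if err := probe("http://127.0.0.1:8080", "readiness-probe-missing-user"); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ready")
}
```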
## Common Failure Modes
### Redis unavailable
Symptoms:
- process fails during startup
- internal API returns `503 service_unavailable`
- domain events stop being published
Checks:
- connectivity to `USERSERVICE_REDIS_ADDR`
- Redis ACL credentials
- Redis DB number
- TLS setting mismatch
### Invalid registration context
Symptoms:
- `ensure-by-email` returns `400 invalid_request`
Checks:
- `preferred_language` is a valid BCP 47 tag
- `time_zone` is a valid IANA time-zone name
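Both checks can be approximated on the caller side before retrying `ensure-by-email`. This is a rough sketch, not the validator the service uses: the language check is a deliberately narrow pattern that accepts only `language[-Script][-Region]`, whereas real BCP 47 allows more forms, and the time-zone check relies on Go's `time.LoadLocation`.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
	_ "time/tzdata" // embed the IANA database so the check works without system zoneinfo
)

// langTag accepts only language[-Script][-Region]; real BCP 47 allows more.
var langTag = regexp.MustCompile(`^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?$`)

// plausibleLanguage reports whether tag matches the simplified pattern.
func plausibleLanguage(tag string) bool { return langTag.MatchString(tag) }

// validTimeZone reports whether name resolves in the IANA database.
// "" and "Local" are rejected because time.LoadLocation treats them specially.
func validTimeZone(name string) bool {
	if name == "" || name == "Local" {
		return false
	}
	_, err := time.LoadLocation(name)
	return err == nil
}

func main() {
	// Sample inputs only; they are not taken from the service's test suite.
	for _, tag := range []string{"en", "pt-BR", "en_US"} {
		fmt.Printf("preferred_language %q plausible: %v\n", tag, plausibleLanguage(tag))
	}
	for _, tz := range []string{"Europe/Berlin", "Berlin"} {
		fmt.Printf("time_zone %q valid: %v\n", tz, validTimeZone(tz))
	}
}
```

A tag that passes this pattern can still be rejected by the service, so treat a pass as "worth sending" rather than "guaranteed valid".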
### Profile update rejected
Symptoms:
- profile update returns `400 invalid_request` or `409 conflict`
Checks:
- submitted `display_name` passes `pkg/util/string.go:ValidateTypeName`; empty
values are accepted and reset the stored display name
- user is not currently blocked by `profile_update_block`
- `user_name` is immutable; any attempt to mutate it surfaces as
`409 conflict`
### Declared-country sync rejected
Symptoms:
- geo sync returns `400 invalid_request`
Checks:
- country code is uppercase ISO 3166-1 alpha-2
- trusted caller is using the intended internal route
## Safe Rollout Notes
- Keep `Auth / Session Service` and `User Service` aligned on the current
`registration_context` shape.
- During the current rollout, the active create-path contract is the
`preferred_language` that authsession provides: derived from the public
`Accept-Language` header, with fallback to `en`.
- Gateway direct `user.*` self-service routing depends on the internal REST
routes staying stable.
- Do not roll out billing-driven entitlement mutations on the assumption that
another service owns current entitlement state. `User Service` remains the
source of truth for current entitlement.
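To make the `preferred_language` contract from the notes above concrete, here is one way the derivation from `Accept-Language` with an `en` fallback could look. This is an illustrative sketch only; the function name is hypothetical, and the actual authsession logic may differ, for example in how it handles ties or unsupported languages.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// preferredLanguage returns the highest-weighted tag from an Accept-Language
// header value, or "en" when the header is empty, unparsable, or only "*".
// On equal weights the tag listed first wins.
func preferredLanguage(header string) string {
	best, bestQ := "en", -1.0
	for _, part := range strings.Split(header, ",") {
		tag, params, _ := strings.Cut(strings.TrimSpace(part), ";")
		tag = strings.TrimSpace(tag)
		if tag == "" || tag == "*" {
			continue
		}
		q := 1.0 // a missing q parameter means full weight
		params = strings.TrimSpace(params)
		if strings.HasPrefix(params, "q=") {
			parsed, err := strconv.ParseFloat(strings.TrimPrefix(params, "q="), 64)
			if err != nil {
				continue // skip entries with a malformed weight
			}
			q = parsed
		}
		// q=0 means "not acceptable", so such entries never win.
		if q > 0 && q > bestQ {
			best, bestQ = tag, q
		}
	}
	return best
}

func main() {
	// Sample header values for illustration.
	for _, h := range []string{"de-DE,de;q=0.9,en;q=0.8", "en;q=0.5, fr;q=0.9", "*", ""} {
		fmt.Printf("%q -> %s\n", h, preferredLanguage(h))
	}
}
```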
## Debugging Data Mismatches
When a caller reports mismatched user state:
1. Read the current account aggregate through the trusted internal route.
2. Confirm whether the discrepancy is in source-of-truth state or in a
downstream projection.
3. If the issue concerns declared-country workflow history, switch to
`Geo Profile Service`; `User Service` stores only the current effective value.
4. If the issue concerns authenticated edge transport, verify the same user
through gateway `user.account.get` to distinguish transport problems from
source-of-truth problems.