# Runbook

## Startup Checklist

Before starting `userservice`, verify:

- `USERSERVICE_REDIS_ADDR` points to the intended Redis instance
- internal HTTP bind address is free
- optional admin metrics listener does not collide with another process
- domain-events stream settings match the environment that consumes them

Expected startup behavior:

- configuration is loaded and validated first
- Redis-backed stores and publishers are constructed
- startup fails fast on Redis misconfiguration or connectivity failure

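The first two checklist items can be checked before launch with a small preflight script. The following is a minimal Python sketch, not part of `userservice`: it assumes `USERSERVICE_REDIS_ADDR` has the form `host:port` (IPv4 or hostname), and the helper names and the bind address passed to `preflight` are hypothetical.

```python
import os
import socket


def split_host_port(addr: str) -> tuple[str, int]:
    """Split 'host:port' into (host, port); raises ValueError if malformed."""
    host, sep, port = addr.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {addr!r}")
    return host, int(port)


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def bind_address_is_free(host: str, port: int) -> bool:
    """Return True if a fresh socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
            return True
        except OSError:
            return False


def preflight(http_bind: str) -> list[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    # Assumption: the variable holds a plain host:port value.
    redis_addr = os.environ.get("USERSERVICE_REDIS_ADDR", "")
    if not redis_addr:
        problems.append("USERSERVICE_REDIS_ADDR is not set")
    else:
        try:
            host, port = split_host_port(redis_addr)
            if not can_connect(host, port):
                problems.append(f"cannot reach Redis at {redis_addr}")
        except ValueError as exc:
            problems.append(str(exc))
    bind_host, bind_port = split_host_port(http_bind)
    if not bind_address_is_free(bind_host, bind_port):
        problems.append(f"{http_bind} is already in use")
    return problems
```

Calling `preflight("127.0.0.1:8080")` (an example address, not the service default) returns a list of problems to resolve before starting the process. A successful TCP connect only shows that something is listening; it does not prove the endpoint is the intended Redis instance.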

## Health And Readiness

`userservice` does not expose public health endpoints.

Operational readiness is typically checked by calling one trusted internal
route with a guaranteed-missing `user_id`, for example:

- `GET /api/v1/internal/users/{user_id}/exists`

A healthy process returns `200` with `{"exists":false}`.

If admin metrics are enabled, `/metrics` on the admin listener is the
additional process-level operational endpoint.

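The readiness probe described above can be scripted. This Python sketch uses only the standard library; the base URL is supplied by the caller, and using a random UUID as the guaranteed-missing `user_id` is an assumption (substitute whatever reserved ID your environment guarantees is absent).

```python
import json
import urllib.error
import urllib.request
import uuid


def probe_url(base_url: str, user_id: str) -> str:
    """Build the exists-route URL for the given user_id."""
    return f"{base_url.rstrip('/')}/api/v1/internal/users/{user_id}/exists"


def is_healthy_response(status: int, body: bytes) -> bool:
    """A healthy process answers 200 with {"exists": false} for a missing user."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("exists") is False


def check_readiness(base_url: str, timeout: float = 2.0) -> bool:
    """Call the exists route with an ID that is assumed not to exist."""
    # Assumption: a random UUID is accepted as a user_id and is never present.
    missing_id = str(uuid.uuid4())
    url = probe_url(base_url, missing_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy_response(resp.status, resp.read())
    except (urllib.error.URLError, OSError):
        # Covers connection failures, timeouts, and non-2xx responses.
        return False
```

Keeping the response check in a pure function (`is_healthy_response`) makes it easy to reuse from an orchestrator probe or a smoke test without touching the network.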

## Common Failure Modes

### Redis unavailable

Symptoms:

- process fails during startup
- internal API returns `503 service_unavailable`
- domain events stop being published

Checks:

- connectivity to `USERSERVICE_REDIS_ADDR`
- Redis ACL credentials
- Redis DB number
- TLS setting mismatch

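Several of the checks above can be narrowed down with an unauthenticated `PING` sent over a raw socket. The sketch below is a diagnostic aid under stated assumptions, not the service's own Redis client: the helper names are hypothetical, and the wording of the returned diagnoses is illustrative. It relies only on the Redis wire protocol (RESP), where a successful `PING` yields `+PONG` and an instance requiring credentials answers with a `-NOAUTH` error.

```python
import socket
import ssl

PING = b"*1\r\n$4\r\nPING\r\n"  # RESP encoding of the PING command


def classify_reply(reply: bytes) -> str:
    """Map the first bytes of a Redis reply to a likely cause."""
    if reply.startswith(b"+PONG"):
        return "ok"
    if reply.startswith(b"-NOAUTH"):
        return "reachable, but credentials are required"
    if reply.startswith(b"-"):
        first_line = reply[1:].split(b"\r\n")[0]
        return "redis error: " + first_line.decode(errors="replace")
    if not reply:
        return "connection closed without a reply (possible TLS mismatch)"
    return "unexpected reply (possibly not a Redis endpoint)"


def ping(host: str, port: int, use_tls: bool = False, timeout: float = 2.0) -> str:
    """Send an unauthenticated PING and describe what came back."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            if use_tls:
                ctx = ssl.create_default_context()
                with ctx.wrap_socket(raw, server_hostname=host) as tls:
                    tls.sendall(PING)
                    return classify_reply(tls.recv(256))
            raw.sendall(PING)
            return classify_reply(raw.recv(256))
    except ssl.SSLError as exc:
        return f"TLS handshake failed: {exc}"
    except OSError as exc:
        return f"connection failed: {exc}"
```

Running `ping` once with `use_tls=False` and once with `use_tls=True` usually reveals a TLS setting mismatch. Verifying ACL credentials and the DB number still requires an authenticated client, which this sketch deliberately leaves out.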

### Invalid registration context

Symptoms:

- `ensure-by-email` returns `400 invalid_request`

Checks:

- `preferred_language` is a valid BCP 47 tag
- `time_zone` is a valid IANA time-zone name

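When triaging a rejected `registration_context`, both fields can be pre-checked on the caller's side. The Python sketch below is an approximation and may not match the service's own validation: the language check is a simplified shape test rather than a full BCP 47 validator, the time-zone check depends on the tz database available to the local Python runtime, and the function names are hypothetical.

```python
import re
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Simplified shape check: language, optional script, optional region, variants.
# Not a full BCP 47 validator: no extensions, private use, or registry lookup.
_LANG_TAG = re.compile(
    r"[A-Za-z]{2,3}"                                # primary language subtag
    r"(-[A-Za-z]{4})?"                              # optional script, e.g. Hans
    r"(-([A-Za-z]{2}|[0-9]{3}))?"                   # optional region, e.g. US or 419
    r"(-([A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"   # optional variants
)


def looks_like_bcp47(tag: str) -> bool:
    """True if the tag has the common language[-script][-region] shape."""
    return _LANG_TAG.fullmatch(tag) is not None


def is_iana_time_zone(name: str) -> bool:
    """True if the locally available tz database knows the name."""
    try:
        ZoneInfo(name)
        return True
    except (ZoneInfoNotFoundError, ValueError, OSError):
        return False
```

For example, `looks_like_bcp47("en_US")` is false because BCP 47 separates subtags with hyphens, which is a common cause of `400 invalid_request` when callers pass POSIX locale names.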

### Profile update rejected

Symptoms:

- profile update returns `400 invalid_request` or `409 conflict`

Checks:

- submitted `display_name` passes `pkg/util/string.go:ValidateTypeName`; empty
  values are accepted and reset the stored display name
- user is not currently blocked by `profile_update_block`
- `user_name` is immutable; any attempt to change it surfaces as
  `409 conflict`

### Declared-country sync rejected

Symptoms:

- geo sync returns `400 invalid_request`

Checks:

- country code is uppercase ISO 3166-1 alpha-2
- trusted caller is using the intended internal route

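The country-code check can be reproduced on the caller's side with a shape test. This is a minimal Python sketch with a hypothetical helper name; it verifies only that the value is exactly two uppercase ASCII letters, not that the code is actually assigned in ISO 3166-1.

```python
import re

_ALPHA2_SHAPE = re.compile(r"[A-Z]{2}")


def has_alpha2_shape(code: str) -> bool:
    """True for exactly two uppercase ASCII letters; assignment is not checked."""
    return _ALPHA2_SHAPE.fullmatch(code) is not None
```

Lowercase input such as `"de"` and alpha-3 codes such as `"DEU"` both fail this test, which matches the two most likely caller mistakes.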

## Safe Rollout Notes

- Keep `Auth / Session Service` and `User Service` aligned on the current
  `registration_context` shape.
- During the current rollout, treat the authsession-provided
  `preferred_language` (derived from the public `Accept-Language` header, with
  fallback to `en`) as the active create-path contract.
- Gateway direct `user.*` self-service routing depends on the internal REST
  routes staying stable.
- Do not roll out billing-driven entitlement mutations assuming another
  service owns current entitlement state. `User Service` remains the source of
  truth for current entitlement.

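To make the `preferred_language` contract concrete, here is one way such a derivation could look. This Python sketch is an assumption about the behavior, not the authsession implementation: it picks the language range with the highest `q` value from an `Accept-Language` header and falls back to `en` when nothing usable is present.

```python
from typing import Optional


def preferred_language(accept_language: Optional[str], fallback: str = "en") -> str:
    """Pick the highest-q language range from an Accept-Language value."""
    best_tag, best_q = fallback, 0.0
    for part in (accept_language or "").split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip()
        if not tag or tag == "*":
            continue  # wildcards carry no concrete language
        q = 1.0  # a range without an explicit weight defaults to 1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                continue  # skip ranges with a malformed weight
        if q > best_q:  # strict comparison keeps the first of equal weights
            best_tag, best_q = tag, q
    return best_tag
```

With this sketch, `"en-US,en;q=0.9,de;q=0.8"` yields `en-US`, while a missing header or a bare `*` yields the `en` fallback. If the real derivation differs (for example, by normalizing to a supported-language list), both services still need to agree on the resulting value.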

## Debugging Data Mismatches

When a caller reports mismatched user state:

1. Read the current account aggregate through the trusted internal route.
2. Confirm whether the discrepancy is in source-of-truth state or in a
   downstream projection.
3. If the issue concerns declared-country workflow history, switch to
   `Geo Profile Service`; `User Service` stores only the current effective value.
4. If the issue concerns authenticated edge transport, verify the same user
   through gateway `user.account.get` to distinguish transport problems from
   source-of-truth problems.

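For steps 2 and 4, it helps to compare the two views of the same user field by field. The Python sketch below assumes both responses have already been fetched and decoded into dictionaries; the helper name and any field names used with it are hypothetical.

```python
def diff_fields(source: dict, other: dict) -> dict:
    """Return {field: (source_value, other_value)} for fields that differ."""
    keys = source.keys() | other.keys()
    return {
        key: (source.get(key), other.get(key))
        for key in sorted(keys)
        if source.get(key) != other.get(key)
    }
```

An empty result when comparing the internal route against gateway `user.account.get` suggests the transport is faithful and the reported problem lies in source-of-truth state or in the caller's own projection. A field missing on one side is reported with `None` for that side.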