# TESTING.md

## Purpose

This document defines the testing strategy for the [Galaxy Game](ARCHITECTURE.md) platform and provides a staged testing matrix aligned with the agreed service implementation order.

The strategy is built around the current architecture constraints:

* `Edge Gateway` is the single public ingress and owns the external transport, authenticated gRPC verification pipeline, routing, and push delivery.
* `Auth / Session Service` is the source of truth for challenges and `device_session`, but it must not become the hot-path dependency for every authenticated request.
* `Geo Profile Service` is asynchronous and auxiliary; it must not block the current request and only affects subsequent requests.
* Internal event propagation already exists as an architectural pattern through Redis-backed cache updates and pub/sub-style flows.

## Global Testing Strategy

* Start with **service tests** for each service in isolation.
* As soon as a new service is integrated with already implemented services, add **inter-service integration tests** for that concrete boundary.
* Only after all major components are implemented, add **full system tests** that exercise complete end-to-end platform flows.
* Do not postpone all integration testing until the end.
* Do not try to replace service tests with end-to-end tests.
* Keep most tests deterministic and cheap to run.
* Use real Redis in integration tests where Redis is part of the service contract.
* Keep `Mail Service` stubbed in most integration and system tests, except for a small dedicated smoke suite for the real mail adapter.
* Prefer fake or test-specific implementations for external side effects until the corresponding real service is intentionally introduced.
* For every new service:

  * first add service tests;
  * then add inter-service tests against already implemented services;
  * then add regression scenarios to the growing system test suite.
* For asynchronous flows:

  * test both successful delivery and delayed/eventual delivery;
  * test duplicate event handling;
  * test retry-safe and idempotent consumption;
  * test observability of stuck or failed processing.
* For synchronous flows:

  * test happy path, validation failures, timeout propagation, dependency unavailability, and deterministic error mapping.
* Every service with an external or trusted internal API must have contract tests in addition to behavioral tests.
* Every service that publishes or consumes Redis Stream events must have schema/contract tests for those event payloads.
* Full system tests should be small in number but broad in vertical coverage.

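
The asynchronous-flow rules in this section (duplicate event handling, retry-safe and idempotent consumption) can be exercised with a test of roughly the following shape. This is a minimal Python sketch, not the platform's implementation: `IdempotentConsumer`, the `event_id` and `payload` keys, and the in-memory dedup set are all hypothetical stand-ins for whatever a real service uses.

```python
class IdempotentConsumer:
    """Hypothetical consumer that applies each event at most once, keyed by event_id."""

    def __init__(self):
        self.seen_ids = set()   # stand-in for a durable dedup store (assumption)
        self.applied = []       # side effects that actually happened

    def handle(self, event):
        """Return True if the event was applied, False if it was a duplicate."""
        if event["event_id"] in self.seen_ids:
            return False
        self.applied.append(event["payload"])
        self.seen_ids.add(event["event_id"])
        return True


def deliver_with_redelivery(consumer, events):
    """Simulate at-least-once delivery: every event is delivered twice in a row."""
    return [consumer.handle(event) for event in events for _ in range(2)]
```

A test would assert that the second delivery of each event is reported as a duplicate and that the side-effect list contains each payload exactly once.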
## Test Layer Definitions

### Service tests

Service tests verify one component in isolation.

They include:

* domain/model tests;
* use-case/service-layer tests;
* adapter tests for storage, queues, clocks, IDs, and protocol encoding;
* API handler/controller tests;
* contract tests for DTOs and stable error surfaces;
* service-local integration tests with owned infrastructure such as Redis.

### Inter-service integration tests

Inter-service integration tests verify one real boundary between two or more already implemented services.

They include:

* synchronous API compatibility;
* event publication and consumption;
* error propagation across service boundaries;
* cache/projection compatibility;
* retry and idempotency behavior across the seam;
* compatibility of internal authenticated context and domain decisions.

### Full system tests

Full system tests verify complete user or admin flows through the real architecture.

They include:

* gateway ingress;
* authentication;
* user/profile state;
* game lifecycle;
* notifications and push;
* runtime orchestration;
* administrative operations;
* failure and recovery behavior across multiple services.

## Test Environment Rules

* Use an isolated Redis instance per integration test suite or per test worker.
* Use a stub `Mail Service` by default.
* Use fake/test doubles for not-yet-implemented downstream services.
* Introduce real downstream services progressively as they are implemented.
* Use a test engine container or test engine stub for `Game Master` and `Runtime Manager` tests before relying on a real production engine image.
* Use deterministic test clocks where scheduling or expiration matters.
* Make async tests wait on observable states, not arbitrary sleeps, whenever possible.
* Keep one small smoke suite for:

  * real Redis;
  * real runtime backend path;
  * real SMTP adapter later;
  * real signed gateway request/response flow.

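
Two of the rules above, deterministic test clocks and waiting on observable state rather than sleeping, are easiest to see in code. The following is a minimal Python sketch under stated assumptions: `FakeClock` and `wait_until` are hypothetical helper names, and the default timeout and polling interval are arbitrary illustrative values.

```python
import time


class FakeClock:
    """Deterministic clock: time only moves when the test advances it."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        self._now += seconds


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll an observable condition instead of sleeping for a fixed duration.

    Returns True as soon as the predicate holds, or its final value once the
    timeout has elapsed.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()
```

Code under test would receive the clock as a dependency, so expiration logic can be driven with `clock.advance(...)`, while `wait_until` bounds how long an async test waits for a visible state change.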
## Recommended Service Implementation and Testing Order

The testing plan follows this service order:

* `Edge Gateway Service`
* `Auth / Session Service`
* `User Service`
* `Mail Service`
* `Notification Service`
* `Game Lobby Service`
* `Runtime Manager`
* `Game Master`
* `Admin Service`
* `Geo Profile Service`
* `Billing Service`

---

## 1. [Edge Gateway](gateway/README.md) Service

### Service tests

* Public REST routing tests:

  * `GET /healthz`
  * `GET /readyz`
  * mounted public auth routes
  * wrong-method and not-found handling
  * public route-class classification for auth, browser bootstrap, browser asset, and misc traffic
  * isolation of browser/public-auth rate-limit buckets
  * rejection of oversized public request bodies
  * `RemoteAddr`-based public IP derivation that ignores forwarded proxy headers
  * public rate-limit behavior
  * stable projection of upstream public auth errors
  * sensitive-field redaction in public-auth logs
  * public OpenAPI contract validation
  * admin `/metrics` availability only on the private admin listener
* Authenticated gRPC envelope validation tests:

  * missing required fields
  * unsupported `protocol_version`
  * parsed envelope attachment before delegate execution
  * malformed `payload_hash`
  * mismatched `payload_hash`
  * invalid signature
  * stale timestamp
  * replay detection
  * unknown session
  * revoked session
* Session cache behavior tests:

  * cache hit
  * cache miss
  * malformed cached record
  * read-through local-cache warming after first fallback lookup
  * local hit skips fallback lookup
  * cache invalidation/update handling
* Response signing tests:

  * signed unary response generation
  * unary response fails closed when the response signer is unavailable
  * signed bootstrap push event generation
  * bootstrap push fails closed when the response signer is unavailable
  * signed stream event generation
* Routing tests:

  * unrouted `message_type`
  * downstream timeout mapping
  * downstream availability mapping
  * authenticated internal command context construction
  * verified trace/span context propagation downstream
  * graceful drain of in-flight unary requests on shutdown
  * sensitive transport material redaction in authenticated logs
* Push tests:

  * `SubscribeEvents` binds `user_id` and `device_session_id`
  * bootstrap server-time event is emitted
  * user-targeted events fan out to all matching user sessions
  * session-targeted events reach only the addressed session
  * stream queue overflow closes only the affected stream
  * revoked session closes matching streams only
  * revoked-session stream reopen is rejected
  * active streams close with deterministic status on gateway shutdown
* Anti-abuse tests:

  * IP/session/user/message-class buckets
  * interaction between rate limits and verification order
  * authenticated/public anti-abuse bucket isolation
  * authenticated policy-hook input and reject mapping
* Redis adapter tests:

  * session cache lookup
  * replay reservation
  * client event stream consumption
  * session event stream consumption
  * subscriber start-from-tail semantics
  * malformed-event drop/evict-and-continue behavior
  * later-event-wins behavior for session snapshots
  * subscriber shutdown interrupts blocking reads

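
Several of the envelope validation cases listed above (missing fields, unsupported `protocol_version`, malformed or mismatched `payload_hash`, stale timestamp, replay) can be pinned down against a small reference check. The Python sketch below is illustrative only and makes several assumptions that the source does not confirm: the hash is taken to be hex-encoded SHA-256, the `timestamp` and `request_id` field names, the 30-second freshness window, the supported version set, and the rejection strings are all hypothetical. Signature and session checks are deliberately left out.

```python
import hashlib

SUPPORTED_VERSIONS = {1}   # assumption: illustrative value only
MAX_SKEW_SECONDS = 30      # assumption: illustrative freshness window


def verify_envelope(envelope, payload, now, seen_request_ids):
    """Return "ok" or a stable rejection reason for one request envelope."""
    for field in ("protocol_version", "payload_hash", "timestamp", "request_id"):
        if field not in envelope:
            return "missing_field"
    if envelope["protocol_version"] not in SUPPORTED_VERSIONS:
        return "unsupported_protocol_version"
    claimed = envelope["payload_hash"]
    well_formed = (
        isinstance(claimed, str)
        and len(claimed) == 64
        and all(ch in "0123456789abcdef" for ch in claimed)
    )
    if not well_formed:
        return "malformed_payload_hash"
    if claimed != hashlib.sha256(payload).hexdigest():
        return "payload_hash_mismatch"
    if abs(now - envelope["timestamp"]) > MAX_SKEW_SECONDS:
        return "stale_timestamp"
    if envelope["request_id"] in seen_request_ids:
        return "replay"
    seen_request_ids.add(envelope["request_id"])  # replay reservation
    return "ok"
```

Returning one deterministic reason per failure makes each listed case a single-assertion test, and it also lets a test fix the order in which checks are applied.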
### Inter-service integration tests at this stage

* `Gateway <-> Redis`

  * session cache compatibility
  * replay reservation semantics
  * session update warms local cache without repeated fallback lookups
  * revoked snapshot invalidates authenticated requests without fallback lookup
  * client-event stream consumption for push fan-out
  * session-event stream consumption for revoke propagation and push teardown
* `Gateway <-> stub Auth adapter`

  * public auth passthrough
  * timeout/error projection
* `Gateway <-> fake downstream`

  * verified authenticated command routing
  * signed response generation after downstream success

### Regression tests to keep from this stage onward

* Authenticated request verification pipeline remains stable.
* Public auth routes remain mounted and deterministic.
* Public route classes and anti-abuse buckets remain isolated.
* Admin metrics stay off the public ingress.
* Push bootstrap event remains signed and schema-compatible.
* Push revoke and shutdown close streams with stable status mapping.
* Gateway logs remain free of sensitive request/auth material.

---

## 2. [Auth / Session](authsession/README.md) Service

### Service tests

* Challenge lifecycle tests:

  * challenge creation
  * TTL expiration
  * resend throttling
  * `delivery_throttled` challenge creation without `UserDirectory` or `MailSender` calls
  * `delivery_suppressed` behavior for blocked subjects
  * expiry grace-window transition from `challenge_expired` to `challenge_not_found`
  * delivery state transitions
  * invalid confirm attempt limits
  * success-shaped `send-email-code` behavior
* Confirm flow tests:

  * valid `challenge_id + code + client_public_key`
  * malformed `client_public_key`
  * blocked user
  * existing user
  * creatable user
  * short-window idempotent confirm retry
  * projection repair on repeated confirm after prior publish failure
  * same challenge plus different public key failure
  * confirm-race cleanup of superseded sessions
  * session-limit exceeded
* Session lifecycle tests:

  * create session
  * revoke one session
  * revoke all sessions
  * block user/email and revoke implied sessions
  * `already_revoked`, `no_active_sessions`, and `already_blocked` acknowledgement semantics
* Projection tests:

  * source-of-truth session write
  * gateway KV snapshot write
  * gateway session stream event publish
  * repeated publish idempotency
  * stored session reread before publish to avoid stale active projection
* Public API tests:

  * JSON decoding, input validation, and invalid-request mapping
  * public error mapping
  * stable success DTO shape
  * end-to-end public HTTP send/confirm scenarios
  * timeout mapping and invalid-success-payload rejection
  * stable public OpenAPI validation and gateway contract parity
  * stable public error examples
  * trace/metric emission and sensitive-field log redaction
* Internal API tests:

  * `GetSession`
  * `ListUserSessions`
  * `RevokeDeviceSession`
  * `RevokeAllUserSessions`
  * `BlockUser`
  * path/body validation and invalid-request mapping
  * end-to-end internal HTTP read/revoke/block scenarios
  * timeout mapping and invalid-success-payload rejection
  * stable internal OpenAPI validation and frozen mutation DTO/enums
  * trace/metric emission and sensitive-field log redaction
* Redis adapter tests:

  * challenge store
  * session store
  * config provider
  * projection publisher
* Runtime and architecture tests:

  * public/internal HTTP server lifecycle
  * intentional absence of `/healthz`, `/readyz`, and `/metrics`
  * runtime wiring for `stub|rest` user-service and mail-service adapters
  * startup fail-fast on Redis-backed ping failure
  * storage-agnostic core for domain/service/ports layers

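
The expiry grace-window case listed above (`challenge_expired` during grace, `challenge_not_found` afterward) is a good fit for a clock-driven boundary test. The Python sketch below shows one way such a classification could look. The `expires_at` field, the `grace_seconds` parameter, the `"active"` label, and the choice to treat a deleted record the same as a record past its grace window are all assumptions made for illustration.

```python
def challenge_status(challenge, now, grace_seconds):
    """Classify a challenge lookup at time `now` (seconds).

    `challenge` is None when the record no longer exists, otherwise a dict
    with an `expires_at` timestamp.
    """
    if challenge is None:
        return "challenge_not_found"
    if now < challenge["expires_at"]:
        return "active"
    if now < challenge["expires_at"] + grace_seconds:
        return "challenge_expired"
    return "challenge_not_found"
```

With a deterministic clock, a test can step `now` across both boundaries (expiry and end of grace) and assert the exact instant at which each outcome changes.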
### Inter-service integration tests with already implemented components

* `Gateway <-> Auth / Session`

  * public `send-email-code`
  * public `confirm-email-code`
  * upstream timeout handling
  * public error passthrough
* `Auth / Session <-> Redis`

  * challenge persistence
  * session persistence
  * session projection compatibility
  * duplicate publish keeps gateway cache canonical
* `Gateway <-> Auth / Session <-> Redis`

  * login creates session
  * session projection becomes visible to gateway
  * repeated confirm repairs a previously failed projection publish
  * revoked session invalidates gateway authentication path
  * revoked session closes gateway push stream
  * malformed client public key keeps stable client-facing error
* `Auth / Session <-> stub Mail`

  * auth code send path
  * suppression path
  * explicit mail failure path
* `Auth / Session <-> Mail REST`

  * sent/suppressed/failure compatibility
  * blocked/throttled sends skip mail delivery
* `Auth / Session <-> User REST`

  * resolve-by-email compatibility for public send
  * ensure-user compatibility for confirm
  * exists/block compatibility for internal revoke/block flows

### Regression tests to keep from this stage onward

* `confirm-email-code` always returns a ready `device_session_id`.
* Gateway continues authenticating from cache rather than synchronous auth lookups.
* Confirm idempotency window behavior remains stable.
* Projection repair-on-retry remains safe after source-of-truth commits.
* Confirm-race cleanup does not leave multiple active winner sessions.
* Projection repair continues working after process restart.
* Redis reconnect on the same live process preserves recovery semantics.
* Expired challenges continue returning `challenge_expired` during grace and `challenge_not_found` after TTL cleanup.
* Large session-list and bulk-revoke paths remain stable.
* Concurrent confirm, revoke-all, and block flows do not leak active sessions.
* Session projection remains compatible with gateway expectations.

---

## 3. [User](user/README.md) Service

### Service tests

* User creation and identity tests:

  * create user
  * find by email
  * exact-after-trim e-mail storage and lookup semantics
  * generated default `race_name` for new users
  * `race_name` uniqueness and confusable-substitution policy
  * tariff/entitlement fields
* Profile tests:

  * allowed profile reads
  * allowed profile edits
  * forbidden profile edits
  * self-service rejection for e-mail and `declared_country` mutations
  * `profile_update_block` sanction gating for profile/settings writes
  * settings reads/writes
  * BCP 47 and IANA validation for settings values
* Restriction/sanction tests:

  * block flags
  * user limits
  * override fields
  * declared current sanctions view
  * effective sanction/limit snapshot shaping for downstream consumers
* Entitlement tests:

  * free user
  * paid placeholder states
  * default simultaneous-game limit and per-user overrides
  * entitlement, sanction, and limit interaction rules
* Internal/admin-oriented tests:

  * resolve existing/creatable/blocked decision for auth
  * `ensure-by-email` create-only `registration_context` semantics
  * current `declared_country` read/write path
  * exact lookup by `user_id`, exact-after-trim `email`, and exact `race_name`
  * paginated filtered listing with deterministic ordering
* Storage and API contract tests:

  * public/trusted endpoints
  * stable DTO mapping
  * Redis persistence if used directly in v1

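
The "exact-after-trim" e-mail semantics mentioned above can be made concrete with a small storage double. The Python sketch below reads the phrase as: surrounding whitespace is trimmed, and the remaining string is compared exactly, with no case folding or other normalization. That reading is an assumption, and `InMemoryUserStore` with its methods is a hypothetical test double rather than the service's storage API.

```python
class InMemoryUserStore:
    """Hypothetical user store illustrating trim-then-exact e-mail matching."""

    def __init__(self):
        self._user_id_by_email = {}

    @staticmethod
    def _key(email):
        # Trim only: no case folding or other normalization (assumed semantics).
        return email.strip()

    def create(self, user_id, email):
        key = self._key(email)
        if key in self._user_id_by_email:
            raise ValueError("email already registered")
        self._user_id_by_email[key] = user_id

    def find_by_email(self, email):
        """Return the user id for this e-mail, or None if there is no exact match."""
        return self._user_id_by_email.get(self._key(email))
```

Under this reading, `" Pilot@Example.com "` and `"Pilot@Example.com"` resolve to the same user, while `"pilot@example.com"` does not.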
### Inter-service integration tests with already implemented components

* `Auth / Session <-> User`

  * resolve existing user
  * create new user during confirm
  * blocked-by-policy outcome
* `Gateway <-> User`

  * authenticated `user.account.get`
  * authenticated successful `user.profile.update`
  * authenticated successful `user.settings.update`
  * `profile_update_block` conflict projection
  * invalid-request projection for malformed self-service payload values
* `Gateway <-> Auth / Session <-> User`

  * first registration by email
  * repeat login by same email without overwriting create-only settings
  * blocked email/user behavior

### Regression tests to keep from this stage onward

* User resolution outcomes remain stable for auth flow.
* User-facing profile APIs do not bypass auth/session rules.
* `registration_context` stays create-only and does not overwrite existing users.
* `race_name` uniqueness policy remains stable for self-service and auth-created users.
* User limit and sanction data stay compatible with downstream consumers.

---

## 4. Mail Service

### Service tests

* Mail command validation tests:

  * recipient validation
  * template selection
  * payload rendering
* Internal queue tests:

  * enqueue
  * dequeue
  * retry
  * permanent failure
  * idempotent duplicate suppression where applicable
* Delivery adapter tests:

  * stub adapter behavior
  * future SMTP adapter smoke behavior
* Operational tests:

  * queue backlog metrics
  * dead-letter or failure recording behavior
  * timeout handling

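
The internal queue cases above (enqueue, dequeue, retry, permanent failure, backlog visibility) can be tested against a stub delivery adapter without any real mail transport. The following Python sketch assumes a simple attempt-count policy. `MailQueue`, `max_attempts`, the `failed` list, and the boolean `send` callable are hypothetical names chosen for illustration.

```python
from collections import deque


class MailQueue:
    """Hypothetical in-memory queue with bounded retries and failure recording."""

    def __init__(self, send, max_attempts=3):
        self._send = send                  # stub adapter: callable(message) -> bool
        self._max_attempts = max_attempts
        self._pending = deque()
        self.failed = []                   # stand-in for dead-letter recording

    def enqueue(self, message):
        self._pending.append((message, 0))

    def run_once(self):
        """Try each currently pending message once; requeue it or record failure."""
        for _ in range(len(self._pending)):
            message, attempts = self._pending.popleft()
            if self._send(message):
                continue
            attempts += 1
            if attempts >= self._max_attempts:
                self.failed.append(message)
            else:
                self._pending.append((message, attempts))

    def backlog(self):
        return len(self._pending)
```

Driving the queue with a stub that always rejects one message lets a test observe the retry, the eventual permanent failure, and the backlog value after each pass.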
### Inter-service integration tests with already implemented components

* `Auth / Session <-> Mail`

  * direct auth-code send
  * explicit mail failure behavior
  * suppression path still preserves correct auth semantics
* `Gateway <-> Auth / Session <-> Mail`

  * public auth flow still behaves correctly with mail delivery involved
* Keep `Mail Service` stubbed in most broader suites.
* Add only a small dedicated smoke suite for the real mail adapter.

### Regression tests to keep from this stage onward

* Auth code mail remains a direct dependency of auth flow.
* Mail failures do not corrupt auth challenge/session state.
* Stub mail remains the default for most non-mail-focused suites.

---

## 5. Notification Service

### Service tests

* Event intake tests:

  * accepted event types
  * malformed event rejection
  * idempotent duplicate handling
* Routing decision tests:

  * push only
  * email only
  * push and email
  * discard/no-delivery cases
* Rendering tests:

  * event-to-notification mapping
  * payload shaping for push
  * payload shaping for email
* Failure isolation tests:

  * push failure does not corrupt email route decision
  * email failure does not corrupt push route decision
  * retriable delivery behavior
* Redis/event bus tests:

  * consume domain/integration events
  * publish client-facing events for gateway
  * enqueue mail commands for mail service

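
The routing-decision and failure-isolation cases above lend themselves to table-driven tests. The Python sketch below shows the general shape. The event type names in `ROUTES` are invented for illustration and are not the platform's event catalogue, and `route` and `deliver` are hypothetical helpers.

```python
# Hypothetical routing table: event type -> delivery channels.
ROUTES = {
    "game.turn_generated": {"push"},
    "game.invite_received": {"push", "email"},
    "game.finished": {"email"},
}


def route(event_type):
    """Return the channels for an event type; unknown types are discarded."""
    return frozenset(ROUTES.get(event_type, ()))


def deliver(event_type, senders):
    """Attempt every routed channel; one channel failing must not skip another."""
    results = {}
    for channel in sorted(route(event_type)):
        try:
            senders[channel](event_type)
            results[channel] = "sent"
        except Exception:
            results[channel] = "retry"   # left for a later retry pass
    return results
```

A failure-isolation test passes a push sender that raises and an email sender that records, then asserts that email was still sent and push was marked for retry.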
### Inter-service integration tests with already implemented components

* `Notification <-> Gateway`

  * client-facing event publication and push delivery
  * user-targeted vs session-targeted push routing
* `Notification <-> Mail`

  * non-auth email delivery
  * retry/failure isolation
* `Lobby/other fake producers <-> Notification`

  * domain event intake compatibility
* Assert explicitly that auth-code emails still bypass notification and go directly from auth to mail.

### Regression tests to keep from this stage onward

* Notification stays delivery/orchestration-only and does not become source of truth.
* Non-auth notifications consistently go through notification service.
* Gateway push compatibility remains stable.

---

## 6. Game Lobby Service

### Service tests

* Game lifecycle tests:

  * `draft`
  * `enrollment_open`
  * `enrollment_closed`
  * `ready_to_start`
  * `starting`
  * `running`
  * `paused`
  * `finished`
  * `cancelled`
* Public/private game rules:

  * public game creation by admin only
  * private game creation entitlement checks
  * visibility rules for private games
* Invite lifecycle tests:

  * invite code creation
  * invite code redemption
  * invite approval/rejection
  * invite expiration if applicable later
* Application and approval tests:

  * public game application
  * manual approval
  * duplicate application handling
* Membership tests:

  * invited
  * pending
  * accepted
  * removed
  * blocked from party
* User list/read-model tests:

  * active games
  * finished games
  * pending applications
  * invited games
* Start-preparation tests:

  * roster validation
  * schedule validation
  * engine version target validation
  * readiness to start
* Runtime snapshot import tests:

  * `current_turn`
  * `runtime_status`
  * `engine_health_summary`

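
Lifecycle tests over the statuses listed above are usually easiest to write against an explicit transition table. The Python sketch below uses the status names from this document, but the allowed edges between them are an assumption made purely for illustration; the authoritative transitions belong to the lobby service design, not to this sketch.

```python
# Status names come from the list above; the edges are ASSUMED for illustration.
ALLOWED = {
    "draft": {"enrollment_open", "cancelled"},
    "enrollment_open": {"enrollment_closed", "cancelled"},
    "enrollment_closed": {"ready_to_start", "cancelled"},
    "ready_to_start": {"starting", "cancelled"},
    "starting": {"running", "cancelled"},
    "running": {"paused", "finished"},
    "paused": {"running", "finished"},
    "finished": set(),
    "cancelled": set(),
}


def transition(current, target):
    """Return the new status, or raise ValueError for an illegal transition."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

With a table like this, one test walks the happy path and another iterates over every status pair that is absent from the table and asserts rejection.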
### Inter-service integration tests with already implemented components

* `Gateway <-> Game Lobby`

  * authenticated platform-level command routing
  * owner-only commands before start
* `Lobby <-> User`

  * entitlement checks for private game creation
  * per-user simultaneous-game limits
  * sanctions affecting join/create flows
* `Lobby <-> Notification`

  * invite events
  * approval/rejection events
  * game status change events at platform level
* `Lobby <-> Auth / Session`

  * authenticated context correctly propagated from gateway
* Keep runtime launch boundaries stubbed until `Runtime Manager` exists.

### Regression tests to keep from this stage onward

* `Lobby` remains source of truth for platform game metadata and membership.
* `Lobby` user-facing game lists remain independent from `Game Master`.
* Private-game visibility and invite semantics remain stable.

---

## 7. Runtime Manager

### Service tests

* Runtime job tests:

  * start container
  * stop container
  * restart container
  * patch container
  * inspect/status
* Invariant tests:

  * one game -> one container
  * one container -> one game
* Monitoring tests:

  * health probe collection
  * health event publication
  * container disappearance handling
  * restart/patch result reporting
* Failure tests:

  * Docker API unavailable
  * image missing
  * startup timeout
  * stop timeout
  * patch failure
* Event publication tests:

  * runtime job completion events
  * technical health events
  * duplicate event safety

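
The two invariants above (one game -> one container, one container -> one game) amount to a one-to-one mapping, which can be checked against a small in-memory model. The Python sketch below is one possible shape for such a model; `ContainerBindings` and its method names are hypothetical.

```python
class ContainerBindings:
    """Hypothetical one-to-one registry between game ids and container ids."""

    def __init__(self):
        self._container_by_game = {}
        self._game_by_container = {}

    def bind(self, game_id, container_id):
        if game_id in self._container_by_game:
            raise ValueError("game already has a container")
        if container_id in self._game_by_container:
            raise ValueError("container already serves a game")
        self._container_by_game[game_id] = container_id
        self._game_by_container[container_id] = game_id

    def unbind(self, game_id):
        container_id = self._container_by_game.pop(game_id)
        del self._game_by_container[container_id]

    def container_for(self, game_id):
        return self._container_by_game.get(game_id)
```

Invariant tests then try to violate each direction in turn and assert that both attempts are rejected, and that a container can be reused only after its game is unbound.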
### Inter-service integration tests with already implemented components

* `Lobby <-> Runtime Manager`

  * async start job request
  * completion event consumption
  * full fail-start path
* `Runtime Manager <-> Notification`

  * optional operational event routing if enabled
* Use a fake or test runtime backend first, then a targeted smoke suite against a real local Docker backend.

### Regression tests to keep from this stage onward

* Runtime Manager remains the only component talking to Docker API.
* Runtime job event contracts remain stable for `Lobby` and later `Game Master`.

---

## 8. Game Master

### Service tests

* Runtime registry tests:

  * register running game
  * unregister/stop game
  * runtime state transitions
* Engine version registry tests:

  * version registration
  * patch compatibility policy
  * version-specific options
* Runtime metadata tests:

  * current turn
  * runtime status
  * generation status
  * engine health summary
  * patch state
* Membership/runtime mapping tests:

  * `user_id -> engine player UUID`
  * game-scoped engine identifiers
* Scheduling tests:

  * scheduled turn generation
  * cutoff enforcement
  * manual force-next-turn
  * skip-next-scheduled-slot after manual generation
* Failure tests:

  * `generation_failed`
  * `engine_unreachable`
  * runtime recovery from engine errors
* Post-start administrative tests:

  * `stop game`
  * `patch engine`
  * temporary player removal at platform gate only
  * final player removal/deactivation inside engine
* Engine mediation tests:

  * engine setup after lobby metadata persistence
  * engine finish notification handling

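
The scheduling rule above, where a manual force-next-turn causes the next scheduled slot to be skipped, is simple to model and test without real timers. The following Python sketch is a hypothetical model, not the Game Master implementation: `TurnScheduler` and its method names are invented, and it assumes that repeated manual generations before a slot still skip only one slot.

```python
class TurnScheduler:
    """Hypothetical model of scheduled and manual turn generation."""

    def __init__(self):
        self.current_turn = 0
        self._skip_next_slot = False

    def force_next_turn(self):
        """Manual generation: advance the turn and suppress the next slot."""
        self.current_turn += 1
        self._skip_next_slot = True

    def on_scheduled_slot(self):
        """Handle one scheduled slot; return True if a turn was generated."""
        if self._skip_next_slot:
            self._skip_next_slot = False
            return False
        self.current_turn += 1
        return True
```

A test fires a slot, forces a turn, then fires two more slots, asserting that exactly the first slot after the manual generation is skipped.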
### Inter-service integration tests with already implemented components

* `Gateway <-> Game Master`

  * running-game command routing with `game_id`
  * runtime-admin commands for running games
  * system admin vs private-owner privileges where applicable
* `Game Master <-> Lobby`

  * running-game registration after successful container start
  * membership lookup/cached authorization
  * runtime snapshot backfill into lobby
  * finished-game notification to lobby
* `Game Master <-> Runtime Manager`

  * patch/stop/restart jobs
  * runtime health event consumption
* `Game Master <-> Notification`

  * new turn event publication
  * game finished event publication
  * generation failure admin notification
* `Game Master <-> test engine container`

  * command proxying
  * status read
  * setup call
  * finish callback

### Regression tests to keep from this stage onward

* `Game Master` remains the only service allowed to call game engine containers.
* Turn cutoff logic stays authoritative at platform level.
* Manual next-turn generation always suppresses the next scheduled slot.
* Runtime snapshot compatibility with `Lobby` remains stable.

---

## 9. Admin Service

### Service tests

* Admin API surface tests:

  * admin-only route handling
  * DTO validation
  * aggregation/read models
* Orchestration tests:

  * forwards trusted operations to downstream services
  * error aggregation and normalization
  * partial failure handling for multi-step admin workflows
* Role-handling tests:

  * admin-only enforcement assumptions
  * no accidental privilege leak into normal user flows

### Inter-service integration tests with already implemented components

* `Gateway <-> Admin`

  * separate admin REST surface
  * admin-authenticated request handling
* `Admin <-> User`

  * user restriction/sanction/admin reads
* `Admin <-> Lobby`

  * public game administration
  * global read of private games
* `Admin <-> Game Master`

  * runtime administration
  * global status reads
  * patch/stop/force-next-turn
* `Admin <-> Auth / Session`

  * session revoke/block operations if exposed through admin workflows
* `Admin <-> Notification`

  * admin-generated notifications where needed

### Regression tests to keep from this stage onward

* Admin Service remains orchestration/backend only.
* System admin capabilities remain separate from private-owner capabilities.

---

## 10. [Geo Profile](geoprofile/README.md) Service

### Service tests

* Ingest tests:

  * enqueue authenticated observation
  * ingest validation
  * malformed FlatBuffers payload rejection
  * required-scalar-field validation
  * non-blocking acceptance
* Worker pipeline tests:

  * geo lookup
  * geo lookup miss handling
  * country aggregation
  * `usual_connection_country` derivation
  * suspicious multi-country detection
  * review recommendation calculation
  * queue retry-safe processing
* State tests:

  * durable `country_review_recommended`
  * declared-country version history
  * declared-country version lifecycle: `recorded`, `applied`, `sync_failed`
  * session block action history
* Admin/query API tests:

  * list review candidates
  * stable ordering and pagination for candidate queries
  * read user geo profile
  * grouping by `device_session_id` in review/read responses
  * apply approved declared-country change
* Queue and lag tests:

  * backlog observability
  * duplicate observation safety
  * delayed processing behavior
  * retry and failure observability

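
The worker pipeline cases above (country aggregation, `usual_connection_country` derivation, suspicious multi-country detection, lookup misses) need deterministic reference behavior to test against. The Python sketch below assumes one plausible rule set: the usual country is the most frequently observed one with alphabetical tie-breaking, a lookup miss is represented as `None`, and more than `max_countries` distinct countries counts as suspicious. None of these rules is confirmed by this document; they are placeholders for the real policy.

```python
from collections import Counter


def usual_connection_country(observations):
    """Most frequently observed country; ties break alphabetically.

    `observations` is a list of country codes where None marks a geo lookup
    miss. Returns None when no country was resolved at all.
    """
    counts = Counter(code for code in observations if code is not None)
    if not counts:
        return None
    return min(counts, key=lambda code: (-counts[code], code))


def is_suspicious(observations, max_countries=2):
    """Flag a profile that was seen from more than `max_countries` countries."""
    distinct = {code for code in observations if code is not None}
    return len(distinct) > max_countries
```

Fixing the tie-break rule explicitly matters for tests: without it, the derived country could depend on observation order and make assertions flaky.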
### Inter-service integration tests with already implemented components

* `Gateway <-> Geo`

  * async observation publish from authenticated request context
  * fail-open edge behavior when geo ingest is unavailable
* `Geo <-> Auth / Session`

  * suspicious session block request
  * subsequent-request effect rather than current-request effect
* `Geo <-> User`

  * synchronous update of current `declared_country`
  * no divergence between history and current value
* `Geo <-> Notification`

  * review-recommended event fan-out
  * optional admin notification flow
* Keep geo processing fail-open relative to gameplay in all integration tests.

### Regression tests to keep from this stage onward

* Geo processing never blocks the current gameplay request.
* Review-recommended state remains queryable even when event/mail side effects fail.
* Session suspicion affects only later requests via auth/session.
* Geo owns history, while user service owns current effective declared country.

---

## 11. Billing Service

### Service tests

* Payment event intake tests:

  * accepted event types
  * malformed event rejection
  * idempotent duplicate handling
* Entitlement mapping tests:

  * free
  * monthly-paid
  * annual-paid
  * once-forever-paid
* Lifecycle tests:

  * activate paid entitlement
  * expire renewable entitlement
  * cancel paid entitlement
  * preserve perpetual entitlement
* Failure tests:

  * unknown user
  * invalid payment state
  * downstream user update failure

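
The lifecycle cases above distinguish renewable entitlements, which expire, from a perpetual one, which must be preserved. A minimal Python sketch of that distinction follows. The plan names are taken from the entitlement mapping list, while the `paid_until` timestamp, the fallback to `free` on expiry, and the function name are assumptions made for illustration.

```python
RENEWABLE_PLANS = {"monthly-paid", "annual-paid"}


def entitlement_after_expiry_check(plan, paid_until, now):
    """Return the effective plan at time `now`.

    Renewable plans fall back to "free" once `paid_until` has passed
    (assumed policy); "free" and "once-forever-paid" are never changed.
    """
    if plan in RENEWABLE_PLANS and now >= paid_until:
        return "free"
    return plan
```

Lifecycle tests can then cover both sides of the expiry boundary for each renewable plan and assert that the perpetual plan is unchanged for any value of `now`.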
### Inter-service integration tests with already implemented components

* `Billing <-> User`

  * entitlement updates become current source of truth in user service
* `Billing <-> Notification`

  * optional billing-related user/admin notifications
* `Gateway <-> User` regression:

  * user-facing entitlement reads reflect billing-fed updates correctly

### Regression tests to keep from this stage onward

* Other services never depend directly on billing for live entitlement decisions.
* `User Service` remains the source of truth for current entitlement.

---

## Full System Tests
|
|
|
|
These tests are added only after all major components are implemented.
|
|
|
|
By default, they should use:
|
|
|
|
* real gateway;
|
|
* real auth/session;
|
|
* real user;
|
|
* real notification;
|
|
* real lobby;
|
|
* real runtime manager;
|
|
* real game master;
|
|
* real admin;
|
|
* real geo;
|
|
* real Redis;
|
|
* stub `Mail Service` by default;
|
|
* test engine container or stable test engine image.

### A. Authentication and session lifecycle

* Register/login via email code through gateway.
* Confirm that `device_session_id` becomes usable through gateway without synchronous auth lookups on every request.
* Confirm that repeated `confirm-email-code` within the idempotency window returns the same `device_session_id`.
* Revoke one session and verify:

  * authenticated requests fail for that session;
  * only push streams bound to that session are closed.
* Revoke all sessions of a user and verify all sessions are rejected afterward.
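
The idempotency check on `confirm-email-code` is easy to get subtly wrong, so it helps to pin down the expected behaviour with a fake and an injectable clock. This is a minimal sketch under stated assumptions: `FakeAuth`, the 60-second window, the `ds_` prefix, and the decision to reject a repeat after the window are all hypothetical. The source only states that a repeat inside the window returns the same `device_session_id`.

```python
import secrets
import time


class FakeAuth:
    """In-memory stand-in for confirm-email-code, for illustration only."""

    def __init__(self, window_seconds: float = 60.0, clock=time.monotonic):
        self.window_seconds = window_seconds  # assumed idempotency window
        self.clock = clock  # injectable so tests can control time
        # (challenge_id, code) -> (device_session_id, confirmed_at)
        self._confirmed = {}

    def confirm_email_code(self, challenge_id: str, code: str) -> str:
        now = self.clock()
        hit = self._confirmed.get((challenge_id, code))
        if hit is not None:
            session_id, confirmed_at = hit
            if now - confirmed_at <= self.window_seconds:
                return session_id  # repeat inside the window: same session
            # Assumed behaviour once the window has passed.
            raise ValueError("challenge already consumed")
        session_id = "ds_" + secrets.token_hex(8)
        self._confirmed[(challenge_id, code)] = (session_id, now)
        return session_id
```

In a system test the same assertion is made through the gateway: two confirmations with identical inputs inside the window must yield one `device_session_id`, not two sessions.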

### B. User profile and entitlement flow

* Read and update allowed user profile fields through gateway.
* Read tariff/entitlement and user limits through gateway.
* Verify that private-party creation entitlement decisions reflect current user-service state.
* Once the Billing Service is integrated, verify that billing-fed entitlement changes become visible through user-service reads.

### C. Public game lifecycle

* Admin creates a public game.
* Users see it in public lists.
* Users apply.
* Admin approves roster.
* Lobby validates readiness.
* Runtime Manager starts container.
* Lobby persists metadata.
* Game Master registers the running game and initializes engine.
* Game becomes visible as running in user lists.

### D. Private game lifecycle

* Eligible user creates private game.
* Owner creates invite code.
* Another user redeems invite code and applies.
* Owner approves application.
* Owner starts game.
* Running registration completes.
* Only authorized users see the private game.

### E. Running-game command and push flow

* Player sends valid game command before cutoff.
* Gateway authenticates and routes to Game Master.
* Game Master verifies access and forwards to engine.
* Scheduled turn generation occurs.
* Player receives lightweight push notification through gateway.
* Player separately fetches updated per-player game state.

### F. Force-next-turn flow

* Running game has a fixed schedule.
* Owner or admin triggers manual next-turn generation.
* Current turn increments.
* Next scheduled slot is skipped.
* Subsequent scheduled generation happens only after the following valid slot.
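
The slot arithmetic behind this scenario is worth making explicit, because an off-by-one error here silently produces a double turn. The sketch below assumes one reading of the steps above: slots lie on a fixed grid `anchor + k * interval`, the first slot strictly after the forced generation is skipped, and scheduled generation resumes at the slot after that. The function names and the grid model are hypothetical and are not taken from the Game Master implementation.

```python
def next_scheduled_slot(anchor: int, interval: int, now: int) -> int:
    """Return the first slot strictly after `now` on the grid anchor + k * interval."""
    k = (now - anchor) // interval + 1
    return anchor + k * interval


def next_generation_after_force(anchor: int, interval: int, forced_at: int) -> int:
    """Skip the slot that would have fired next and resume at the one after it."""
    skipped = next_scheduled_slot(anchor, interval, forced_at)
    return skipped + interval
```

With an hourly schedule anchored at `0` and a forced turn at second `1000`, the slot at `3600` is skipped and the next scheduled generation is expected at `7200`. A system test can assert exactly this by advancing a fake clock past `3600` and checking that the turn counter did not move.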

### G. Runtime failure flow

* Scheduled turn generation fails.
* Game Master marks `generation_failed`.
* Lobby receives updated runtime snapshot.
* Only administrators are notified through notification flow.
* Users can still observe the degraded state through status reads.

### H. Start failure and recovery flow

* Lobby requests runtime start.
* Runtime Manager starts container.
* Simulate metadata persistence failure in Lobby.
* Verify container is removed and game is not left half-started.
* Simulate successful metadata persistence but Game Master registration failure.
* Verify game is marked `paused` and admin is notified.
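
The two failure branches above amount to a small compensation sequence, which can be sketched with injected callables so that a test can fail each step on demand. This is an illustrative sketch only: `FakeRuntime`, `start_game`, and the `"start_failed"` status name are hypothetical, and keeping the container alive in the `paused` branch is an assumption that the source does not state. Only the `paused` status and the admin notification come from the scenario itself.

```python
class FakeRuntime:
    """Tracks which game containers exist, for illustration only."""

    def __init__(self):
        self.containers = set()

    def start(self, game_id):
        self.containers.add(game_id)

    def remove(self, game_id):
        self.containers.discard(game_id)


def start_game(game_id, runtime, persist_metadata, register_with_game_master, notify_admin):
    """Run the start sequence and return the resulting game status."""
    runtime.start(game_id)
    try:
        persist_metadata(game_id)
    except Exception:
        # Compensate so that no half-started game is left behind.
        runtime.remove(game_id)
        return "start_failed"  # hypothetical status name
    try:
        register_with_game_master(game_id)
    except Exception:
        # Metadata exists, so the game is parked rather than rolled back.
        notify_admin(game_id, "game master registration failed")
        return "paused"
    return "running"
```

Injecting the failing step keeps the test deterministic: passing a `persist_metadata` that raises must leave `runtime.containers` empty, while a raising `register_with_game_master` must yield `"paused"` and exactly one admin notification.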

### I. Temporary vs final player removal flow

* Temporarily remove player after game start.
* Verify player can no longer send commands through platform.
* Verify engine still keeps the slot.
* Final-remove or account-block the player.
* Verify Game Master sends engine admin command to deactivate/remove the player.

### J. Notification routing flow

* Lobby emits invite/application/approval events.
* Notification Service sends push through gateway.
* Non-auth email notifications route through Notification Service to Mail Service.
* Auth-code emails remain direct `Auth / Session -> Mail`.

### K. Geo auxiliary flow

* Authenticated traffic generates geo observations.
* Suspicious multi-country pattern is detected.
* Current triggering request still succeeds.
* Auth / Session blocks the suspicious session.
* Next request from that session is rejected.
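
The key property in this flow is ordering: the geo analysis runs after the triggering request has already returned, so only the next request sees the block. A minimal sketch of that ordering follows, with gateway, geo, and auth/session collapsed into one hypothetical `FakePlatform` object. The `max_countries` threshold and the queue-based hand-off are assumptions for illustration and do not describe the real detection rule.

```python
from collections import defaultdict


class FakePlatform:
    """Gateway, geo, and auth/session collapsed into one object, for illustration only."""

    def __init__(self, max_countries: int = 2):
        self.max_countries = max_countries  # hypothetical suspicion threshold
        self.countries = defaultdict(set)  # session_id -> observed countries
        self.blocked = set()  # blocked session ids
        self.pending = []  # observations not yet processed by geo

    def request(self, session_id: str, country: str) -> str:
        if session_id in self.blocked:
            return "rejected"
        # Only enqueue the observation; the request path never waits for geo.
        self.pending.append((session_id, country))
        return "ok"

    def run_geo_worker(self) -> None:
        """Asynchronous step: analyse observations after the requests have returned."""
        while self.pending:
            session_id, country = self.pending.pop(0)
            self.countries[session_id].add(country)
            if len(self.countries[session_id]) > self.max_countries:
                self.blocked.add(session_id)
```

A test sends requests from three countries on one session, asserts that all three return `"ok"`, runs the worker, and then asserts that the fourth request is `"rejected"` while an unrelated session is unaffected.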

### L. Admin supervision flow

* System admin uses admin REST through gateway.
* Admin can view public and private games.
* Admin can inspect running-game runtime state.
* Admin can stop game, patch engine, and force next turn.
* Admin can block users and revoke sessions through appropriate downstream APIs.

## Ongoing Regression Policy

* Every time a new service is added, its service tests are mandatory before merging.
* Every new service boundary must add at least one inter-service integration suite against already implemented neighbors.
* Every bug found in integration or system testing must produce:

  * one narrow regression test at the lowest useful level;
  * and, if applicable, one broader integration or system scenario.
* The full system suite should stay intentionally limited to high-value vertical slices, not explode into a giant matrix.

## Practical Rule of Execution

* During early development:

  * run service tests on every change;
  * run inter-service tests for affected neighboring services on every branch;
  * run a reduced smoke subset of system tests in CI.
* During stabilization:

  * keep service and integration tests mandatory in CI;
  * expand system tests around the critical product flows only.

## Summary

The project-wide testing strategy is fixed as follows:

* first, **service tests** inside each component;
* then, as components appear, **inter-service integration tests** between real neighboring services;
* finally, after all major components are implemented, **full system tests** for complete end-to-end platform flows.

This order is mandatory for the project because the architecture contains several critical stateful and asynchronous seams:

* gateway verification and routing;
* auth/session projection into gateway cache;
* push delivery through gateway;
* Redis Streams event propagation;
* runtime job completion;
* lobby/game-master synchronization;
* geo post-factum protective actions.