# TESTING.md

## Purpose

This document defines the testing strategy for the Galaxy Plus platform and provides a staged testing matrix aligned with the agreed service implementation order.

The strategy is built around the current architecture constraints:

* `Edge Gateway` is the single public ingress and owns the external transport, authenticated gRPC verification pipeline, routing, and push delivery.
* `Auth / Session Service` is the source of truth for challenges and `device_session`, but it must not become a hot-path dependency for every authenticated request.
* `Geo Profile Service` is asynchronous and auxiliary; it must not block the current request and only affects subsequent requests.
* Internal event propagation already exists as an architectural pattern through Redis-backed cache updates and pub/sub-style flows.

## Global Testing Strategy

* Start with **service tests** for each service in isolation.
* As soon as a new service is integrated with already implemented services, add **inter-service integration tests** for that concrete boundary.
* Only after all major components are implemented, add **full system tests** that exercise complete end-to-end platform flows.
* Do not postpone all integration testing until the end.
* Do not try to replace service tests with end-to-end tests.
* Keep most tests deterministic and cheap to run.
* Use real Redis in integration tests where Redis is part of the service contract.
* Keep `Mail Service` stubbed in most integration and system tests, except for a small dedicated smoke suite for the real mail adapter.
* Prefer fake or test-specific implementations for external side effects until the corresponding real service is intentionally introduced.
* For every new service:

  * first add service tests;
  * then add inter-service tests against already implemented services;
  * then add regression scenarios to the growing system test suite.
* For asynchronous flows:

  * test both successful delivery and delayed/eventual delivery;
  * test duplicate event handling;
  * test retry-safe and idempotent consumption;
  * test observability of stuck or failed processing.
* For synchronous flows:

  * test happy path, validation failures, timeout propagation, dependency unavailability, and deterministic error mapping.
* Every service with an external or trusted internal API must have contract tests in addition to behavioral tests.
* Every service that publishes or consumes Redis Stream events must have schema/contract tests for those event payloads.
* Full system tests should be small in number but broad in vertical coverage.

## Test Layer Definitions

### Service tests

Service tests verify one component in isolation.

They include:

* domain/model tests;
* use-case/service-layer tests;
* adapter tests for storage, queues, clocks, IDs, and protocol encoding;
* API handler/controller tests;
* contract tests for DTOs and stable error surfaces;
* service-local integration tests with owned infrastructure such as Redis.

### Inter-service integration tests

Inter-service integration tests verify one real boundary between two or more already implemented services.

They include:

* synchronous API compatibility;
* event publication and consumption;
* error propagation across service boundaries;
* cache/projection compatibility;
* retry and idempotency behavior across the seam;
* compatibility of internal authenticated context and domain decisions.

### Full system tests

Full system tests verify complete user or admin flows through the real architecture.

They include:

* gateway ingress;
* authentication;
* user/profile state;
* game lifecycle;
* notifications and push;
* runtime orchestration;
* administrative operations;
* failure and recovery behavior across multiple services.

## Test Environment Rules

* Use an isolated Redis instance per integration test suite or per test worker.
* Use a stub `Mail Service` by default.
* Use fake/test doubles for not-yet-implemented downstream services.
* Introduce real downstream services progressively as they are implemented.
* Use a test engine container or test engine stub for `Game Master` and `Runtime Manager` tests before relying on a real production engine image.
* Use deterministic test clocks where scheduling or expiration matters.
* Make async tests wait on observable states, not arbitrary sleeps, whenever possible.
* Keep one small smoke suite for:

  * real Redis;
  * real runtime backend path;
  * real SMTP adapter, once it is implemented;
  * real signed gateway request/response flow.

## Recommended Service Implementation and Testing Order

The testing plan follows this service order:

* `Edge Gateway Service`
* `Auth / Session Service`
* `User Service`
* `Mail Service`
* `Notification Service`
* `Game Lobby Service`
* `Runtime Manager`
* `Game Master`
* `Admin Service`
* `Geo Profile Service`
* `Billing Service`

---

## 1. Edge Gateway Service

### Service tests

* Public REST routing tests:

  * `GET /healthz`
  * `GET /readyz`
  * mounted public auth routes
  * rejection of oversized public request bodies
  * public rate-limit behavior
  * stable projection of upstream public auth errors
* Authenticated gRPC envelope validation tests:

  * missing required fields
  * unsupported `protocol_version`
  * malformed `payload_hash`
  * mismatched `payload_hash`
  * invalid signature
  * stale timestamp
  * replay detection
  * unknown session
  * revoked session
* Session cache behavior tests:

  * cache hit
  * cache miss
  * malformed cached record
  * cache invalidation/update handling
* Response signing tests:

  * signed unary response generation
  * signed bootstrap push event generation
  * signed stream event generation
* Routing tests:

  * unrouted `message_type`
  * downstream timeout mapping
  * downstream availability mapping
  * authenticated internal command context construction
* Push tests:

  * `SubscribeEvents` binds `user_id` and `device_session_id`
  * bootstrap server-time event is emitted
  * stream queue overflow closes only the affected stream
  * revoked session closes matching streams only
* Anti-abuse tests:

  * IP/session/user/message-class buckets
  * interaction between rate limits and verification order
* Redis adapter tests:

  * session cache lookup
  * replay reservation
  * client event stream consumption
  * session event stream consumption

### Inter-service integration tests at this stage

* `Gateway <-> Redis`

  * session cache compatibility
  * replay reservation semantics
  * event stream consumption for push
* `Gateway <-> stub Auth adapter`

  * public auth passthrough
  * timeout/error projection
* `Gateway <-> fake downstream`

  * verified authenticated command routing
  * signed response generation after downstream success

### Regression tests to keep from this stage onward

* Authenticated request verification pipeline remains stable.
* Public auth routes remain mounted and deterministic.
* Push bootstrap event remains signed and schema-compatible.

---

## 2. Auth / Session Service

### Service tests

* Challenge lifecycle tests:

  * challenge creation
  * TTL expiration
  * resend throttling
  * delivery state transitions
  * invalid confirm attempt limits
  * success-shaped `send-email-code` behavior
* Confirm flow tests:

  * valid `challenge_id + code + client_public_key`
  * malformed `client_public_key`
  * blocked user
  * existing user
  * creatable user
  * short-window idempotent confirm retry
  * same challenge plus different public key failure
  * session-limit exceeded
* Session lifecycle tests:

  * create session
  * revoke one session
  * revoke all sessions
  * block user/email and revoke implied sessions
  * already-revoked and already-blocked idempotent results
* Projection tests:

  * source-of-truth session write
  * gateway KV snapshot write
  * gateway session stream event publish
  * repeated publish idempotency
* Public API tests:

  * JSON decoding and unknown field rejection
  * public error mapping
  * stable success DTO shape
* Internal API tests:

  * `GetSession`
  * `ListUserSessions`
  * `RevokeDeviceSession`
  * `RevokeAllUserSessions`
  * `BlockUser`
* Redis adapter tests:

  * challenge store
  * session store
  * config provider
  * projection publisher

### Inter-service integration tests with already implemented components

* `Gateway <-> Auth / Session`

  * public `send-email-code`
  * public `confirm-email-code`
  * upstream timeout handling
  * public error passthrough
* `Auth / Session <-> Redis`

  * challenge persistence
  * session persistence
  * session projection compatibility
* `Gateway <-> Auth / Session <-> Redis`

  * login creates session
  * session projection becomes visible to gateway
  * revoked session invalidates gateway authentication path
  * revoked session closes gateway push stream
* `Auth / Session <-> stub Mail`

  * auth code send path
  * suppression path
  * explicit mail failure path

### Regression tests to keep from this stage onward

* `confirm-email-code` always returns a ready `device_session_id`.
* Gateway continues authenticating from its cache rather than through synchronous auth lookups.
* Confirm idempotency window behavior remains stable.
* Session projection remains compatible with gateway expectations.

---

## 3. User Service

### Service tests

* User creation and identity tests:

  * create user
  * find by email
  * normalized email uniqueness
  * role assignment
  * tariff/entitlement fields
* Profile tests:

  * allowed profile reads
  * allowed profile edits
  * forbidden profile edits
  * settings reads/writes
* Restriction/sanction tests:

  * block flags
  * user limits
  * override fields
  * declared current sanctions view
* Entitlement tests:

  * free user
  * paid placeholder states
  * default simultaneous-game limit and per-user overrides
* Internal/admin-oriented tests:

  * resolve existing/creatable/blocked decision for auth
  * current `declared_country` read/write path
* Storage and API contract tests:

  * public/trusted endpoints
  * stable DTO mapping
  * Redis persistence if used directly in v1

### Inter-service integration tests with already implemented components

* `Auth / Session <-> User`

  * resolve existing user
  * create new user during confirm
  * blocked-by-policy outcome
* `Gateway <-> User`

  * authenticated profile read
  * authenticated allowed profile update
  * tariff and settings read paths
* `Gateway <-> Auth / Session <-> User`

  * first registration by email
  * repeat login by same email
  * blocked email/user behavior

### Regression tests to keep from this stage onward

* User resolution outcomes remain stable for auth flow.
* User-facing profile APIs do not bypass auth/session rules.
* User limit and sanction data stay compatible with downstream consumers.

---

## 4. Mail Service

### Service tests

* Mail command validation tests:

  * recipient validation
  * template selection
  * payload rendering
* Internal queue tests:

  * enqueue
  * dequeue
  * retry
  * permanent failure
  * idempotent duplicate suppression where applicable
* Delivery adapter tests:

  * stub adapter behavior
  * future SMTP adapter smoke behavior
* Operational tests:

  * queue backlog metrics
  * dead-letter or failure recording behavior
  * timeout handling

### Inter-service integration tests with already implemented components

* `Auth / Session <-> Mail`

  * direct auth-code send
  * explicit mail failure behavior
  * suppression path still preserves correct auth semantics
* `Gateway <-> Auth / Session <-> Mail`

  * public auth flow still behaves correctly with mail delivery involved
* Keep `Mail Service` stubbed in most broader suites.
* Add only a small dedicated smoke suite for the real mail adapter.

### Regression tests to keep from this stage onward

* Auth-code mail remains a direct dependency of the auth flow.
* Mail failures do not corrupt auth challenge/session state.
* Stub mail remains the default for most non-mail-focused suites.

---

## 5. Notification Service

### Service tests

* Event intake tests:

  * accepted event types
  * malformed event rejection
  * idempotent duplicate handling
* Routing decision tests:

  * push only
  * email only
  * push and email
  * discard/no-delivery cases
* Rendering tests:

  * event-to-notification mapping
  * payload shaping for push
  * payload shaping for email
* Failure isolation tests:

  * push failure does not corrupt email route decision
  * email failure does not corrupt push route decision
  * retriable delivery behavior
* Redis/event bus tests:

  * consume domain/integration events
  * publish client-facing events for gateway
  * enqueue mail commands for mail service
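
Duplicate handling and routing decisions can be tested together with a consumer keyed by event ID. The event type names and the routing table below are hypothetical. The point is that a redelivered event is a no-op and an unknown type is discarded instead of failing the consumer.

```go
package main

// Event is a hypothetical domain event consumed by Notification Service.
type Event struct {
	ID     string
	Type   string
	UserID string
}

// Route is the delivery decision for one event; the zero value means discard.
type Route struct{ Push, Email bool }

// routingTable is an assumed mapping; real event types may differ.
var routingTable = map[string]Route{
	"lobby.invite_created":       {Push: true, Email: true},
	"lobby.application_approved": {Push: true},
	"billing.receipt":            {Email: true},
}

type Notifier struct {
	processed map[string]bool
	Pushed    []string // user IDs that received a push
	Mailed    []string // user IDs that received an email
}

// Handle is idempotent by event ID. In this in-memory sketch, delivery cannot
// fail, so marking the event as processed first is safe; a real consumer
// would mark it only after the outbound writes succeed.
func (n *Notifier) Handle(e Event) {
	if n.processed == nil {
		n.processed = map[string]bool{}
	}
	if n.processed[e.ID] {
		return
	}
	n.processed[e.ID] = true
	r := routingTable[e.Type]
	if r.Push {
		n.Pushed = append(n.Pushed, e.UserID)
	}
	if r.Email {
		n.Mailed = append(n.Mailed, e.UserID)
	}
}
```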

### Inter-service integration tests with already implemented components

* `Notification <-> Gateway`

  * client-facing event publication and push delivery
  * user-targeted vs session-targeted push routing
* `Notification <-> Mail`

  * non-auth email delivery
  * retry/failure isolation
* `Lobby/other fake producers <-> Notification`

  * domain event intake compatibility
* Assert explicitly that auth-code emails still bypass `Notification Service` and go directly from `Auth / Session` to `Mail Service`.

### Regression tests to keep from this stage onward

* Notification stays delivery/orchestration-only and does not become a source of truth.
* Non-auth notifications consistently go through notification service.
* Gateway push compatibility remains stable.

---

## 6. Game Lobby Service

### Service tests

* Game lifecycle tests:

  * `draft`
  * `enrollment_open`
  * `enrollment_closed`
  * `ready_to_start`
  * `starting`
  * `running`
  * `paused`
  * `finished`
  * `cancelled`
* Public/private game rules:

  * public game creation by admin only
  * private game creation entitlement checks
  * visibility rules for private games
* Invite lifecycle tests:

  * invite code creation
  * invite code redemption
  * invite approval/rejection
  * invite expiration (if introduced later)
* Application and approval tests:

  * public game application
  * manual approval
  * duplicate application handling
* Membership tests:

  * invited
  * pending
  * accepted
  * removed
  * blocked from party
* User list/read-model tests:

  * active games
  * finished games
  * pending applications
  * invited games
* Start-preparation tests:

  * roster validation
  * schedule validation
  * engine version target validation
  * readiness to start
* Runtime snapshot import tests:

  * `current_turn`
  * `runtime_status`
  * `engine_health_summary`
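
The lifecycle list above becomes testable once legal transitions are written down as data. The transition table in this sketch is an assumption for illustration, since this document lists the states but not the allowed edges. Treat it as a template for table-driven tests rather than as the Lobby's actual rules.

```go
package main

import "fmt"

// Status mirrors the lifecycle states listed for Game Lobby.
type Status string

const (
	Draft            Status = "draft"
	EnrollmentOpen   Status = "enrollment_open"
	EnrollmentClosed Status = "enrollment_closed"
	ReadyToStart     Status = "ready_to_start"
	Starting         Status = "starting"
	Running          Status = "running"
	Paused           Status = "paused"
	Finished         Status = "finished"
	Cancelled        Status = "cancelled"
)

// allowed is an ASSUMED transition table for illustration only.
var allowed = map[Status][]Status{
	Draft:            {EnrollmentOpen, Cancelled},
	EnrollmentOpen:   {EnrollmentClosed, Cancelled},
	EnrollmentClosed: {EnrollmentOpen, ReadyToStart, Cancelled},
	ReadyToStart:     {Starting, Cancelled},
	Starting:         {Running, Paused, ReadyToStart}, // ReadyToStart: rolled-back start (assumed)
	Running:          {Paused, Finished},
	Paused:           {Running, Finished},
	// Finished and Cancelled are terminal: no outgoing edges.
}

// Transition returns an error unless from -> to is an allowed edge.
func Transition(from, to Status) error {
	for _, next := range allowed[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("illegal lobby transition %s -> %s", from, to)
}
```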

### Inter-service integration tests with already implemented components

* `Gateway <-> Game Lobby`

  * authenticated platform-level command routing
  * owner-only commands before start
* `Lobby <-> User`

  * entitlement checks for private game creation
  * per-user simultaneous-game limits
  * sanctions affecting join/create flows
* `Lobby <-> Notification`

  * invite events
  * approval/rejection events
  * game status change events at platform level
* `Lobby <-> Auth / Session`

  * authenticated context correctly propagated from gateway
* Keep runtime launch boundaries stubbed until `Runtime Manager` exists.

### Regression tests to keep from this stage onward

* `Lobby` remains source of truth for platform game metadata and membership.
* `Lobby` user-facing game lists remain independent from `Game Master`.
* Private-game visibility and invite semantics remain stable.

---

## 7. Runtime Manager

### Service tests

* Runtime job tests:

  * start container
  * stop container
  * restart container
  * patch container
  * inspect/status
* Invariant tests:

  * one game -> one container
  * one container -> one game
* Monitoring tests:

  * health probe collection
  * health event publication
  * container disappearance handling
  * restart/patch result reporting
* Failure tests:

  * Docker API unavailable
  * image missing
  * startup timeout
  * stop timeout
  * patch failure
* Event publication tests:

  * runtime job completion events
  * technical health events
  * duplicate event safety

### Inter-service integration tests with already implemented components

* `Lobby <-> Runtime Manager`

  * async start job request
  * completion event consumption
  * full fail-start path
* `Runtime Manager <-> Notification`

  * operational event routing, if enabled
* Use a fake or test runtime backend first, then a targeted smoke suite against a real local Docker backend.

### Regression tests to keep from this stage onward

* Runtime Manager remains the only component that talks to the Docker API.
* Runtime job event contracts remain stable for `Lobby` and later `Game Master`.

---

## 8. Game Master

### Service tests

* Runtime registry tests:

  * register running game
  * unregister/stop game
  * runtime state transitions
* Engine version registry tests:

  * version registration
  * patch compatibility policy
  * version-specific options
* Runtime metadata tests:

  * current turn
  * runtime status
  * generation status
  * engine health summary
  * patch state
* Membership/runtime mapping tests:

  * `user_id -> engine player UUID`
  * game-scoped engine identifiers
* Scheduling tests:

  * scheduled turn generation
  * cutoff enforcement
  * manual force-next-turn
  * skip-next-scheduled-slot after manual generation
* Failure tests:

  * `generation_failed`
  * `engine_unreachable`
  * runtime recovery from engine errors
* Post-start administrative tests:

  * `stop game`
  * `patch engine`
  * temporary player removal at platform gate only
  * final player removal/deactivation inside engine
* Engine mediation tests:

  * engine setup after lobby metadata persistence
  * engine finish notification handling

### Inter-service integration tests with already implemented components

* `Gateway <-> Game Master`

  * running-game command routing with `game_id`
  * runtime-admin commands for running games
  * system admin vs private-owner privileges where applicable
* `Game Master <-> Lobby`

  * running-game registration after successful container start
  * membership lookup/cached authorization
  * runtime snapshot backfill into lobby
  * finished-game notification to lobby
* `Game Master <-> Runtime Manager`

  * patch/stop/restart jobs
  * runtime health event consumption
* `Game Master <-> Notification`

  * new turn event publication
  * game finished event publication
  * generation failure admin notification
* `Game Master <-> test engine container`

  * command proxying
  * status read
  * setup call
  * finish callback

### Regression tests to keep from this stage onward

* `Game Master` remains the only service allowed to call game engine containers.
* Turn cutoff logic stays authoritative at platform level.
* Manual next-turn generation always suppresses the next scheduled slot.
* Runtime snapshot compatibility with `Lobby` remains stable.

---

## 9. Admin Service

### Service tests

* Admin API surface tests:

  * admin-only route handling
  * DTO validation
  * aggregation/read models
* Orchestration tests:

  * forwards trusted operations to downstream services
  * error aggregation and normalization
  * partial failure handling for multi-step admin workflows
* Role-handling tests:

  * admin-only enforcement assumptions
  * no accidental privilege leak into normal user flows

### Inter-service integration tests with already implemented components

* `Gateway <-> Admin`

  * separate admin REST surface
  * admin-authenticated request handling
* `Admin <-> User`

  * user restriction/sanction/admin reads
* `Admin <-> Lobby`

  * public game administration
  * global read of private games
* `Admin <-> Game Master`

  * runtime administration
  * global status reads
  * patch/stop/force-next-turn
* `Admin <-> Auth / Session`

  * session revoke/block operations if exposed through admin workflows
* `Admin <-> Notification`

  * admin-generated notifications where needed

### Regression tests to keep from this stage onward

* Admin Service remains a backend orchestration layer only.
* System admin capabilities remain separate from private-owner capabilities.

---

## 10. Geo Profile Service

### Service tests

* Ingest tests:

  * enqueue authenticated observation
  * ingest validation
  * non-blocking acceptance
* Worker pipeline tests:

  * geo lookup
  * country aggregation
  * `usual_connection_country` derivation
  * suspicious multi-country detection
  * review recommendation calculation
* State tests:

  * durable `country_review_recommended`
  * declared-country version history
  * session block action history
* Admin/query API tests:

  * list review candidates
  * read user geo profile
  * apply approved declared-country change
* Queue and lag tests:

  * backlog observability
  * duplicate observation safety
  * delayed processing behavior
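
Non-blocking acceptance can be sketched as a bounded buffer with a non-blocking send. When the buffer is full, the observation is dropped and counted instead of delaying the request. The `Ingest` type and the drop-on-full policy are assumptions. The real ingest path may use a Redis Stream rather than an in-process channel.

```go
package main

import "sync/atomic"

// Observation is a hypothetical geo observation from an authenticated request.
type Observation struct {
	UserID          string
	DeviceSessionID string
	IP              string
}

// Ingest never blocks the caller: a full buffer drops the observation and
// counts it, so geo processing stays fail-open relative to gameplay.
type Ingest struct {
	dropped int64 // first field keeps 64-bit alignment for atomic ops
	queue   chan Observation
}

func NewIngest(capacity int) *Ingest {
	return &Ingest{queue: make(chan Observation, capacity)}
}

// Submit reports whether the observation was queued.
func (g *Ingest) Submit(o Observation) bool {
	select {
	case g.queue <- o:
		return true
	default:
		atomic.AddInt64(&g.dropped, 1)
		return false
	}
}

// Backlog and Dropped expose the lag signals a test or metric can assert on.
func (g *Ingest) Backlog() int   { return len(g.queue) }
func (g *Ingest) Dropped() int64 { return atomic.LoadInt64(&g.dropped) }
```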

### Inter-service integration tests with already implemented components

* `Gateway <-> Geo`

  * async observation publish from authenticated request context
* `Geo <-> Auth / Session`

  * suspicious session block request
  * subsequent-request effect rather than current-request effect
* `Geo <-> User`

  * synchronous update of current `declared_country`
  * no divergence between history and current value
* `Geo <-> Notification`

  * review-recommended event fan-out
  * optional admin notification flow
* Keep geo processing fail-open relative to gameplay in all integration tests.

### Regression tests to keep from this stage onward

* Geo processing never blocks the current gameplay request.
* Session suspicion affects only later requests via auth/session.
* Geo owns the declared-country history, while User Service owns the current effective declared country.

---

## 11. Billing Service

### Service tests

* Payment event intake tests:

  * accepted event types
  * malformed event rejection
  * idempotent duplicate handling
* Entitlement mapping tests:

  * free
  * monthly-paid
  * annual-paid
  * once-forever-paid
* Lifecycle tests:

  * activate paid entitlement
  * expire renewable entitlement
  * cancel paid entitlement
  * preserve perpetual entitlement
* Failure tests:

  * unknown user
  * invalid payment state
  * downstream user update failure

### Inter-service integration tests with already implemented components

* `Billing <-> User`

  * entitlement updates are written to user service, which becomes the source of truth for current entitlement
* `Billing <-> Notification`

  * optional billing-related user/admin notifications
* `Gateway <-> User` regression:

  * user-facing entitlement reads reflect billing-fed updates correctly

### Regression tests to keep from this stage onward

* Other services never depend directly on billing for live entitlement decisions.
* `User Service` remains the source of truth for current entitlement.

---

## Full System Tests

These tests are added only after all major components are implemented.

By default, they should use:

* real gateway;
* real auth/session;
* real user;
* real notification;
* real lobby;
* real runtime manager;
* real game master;
* real admin;
* real geo;
* real Redis;
* stub `Mail Service` by default;
* test engine container or stable test engine image.

### A. Authentication and session lifecycle

* Register/login via email code through gateway.
* Confirm that `device_session_id` becomes usable through gateway without synchronous auth lookups on every request.
* Confirm that repeated `confirm-email-code` within the idempotency window returns the same `device_session_id`.
* Revoke one session and verify:

  * authenticated requests fail for that session;
  * only push streams bound to that session are closed.
* Revoke all sessions of a user and verify all sessions are rejected afterward.

### B. User profile and entitlement flow

* Read and update allowed user profile fields through gateway.
* Read tariff/entitlement and user limits through gateway.
* Verify that private-party creation entitlement decisions reflect current user-service state.
* Once Billing Service is in place, verify that billing-fed entitlement changes become visible through user-service reads.

### C. Public game lifecycle

* Admin creates a public game.
* Users see it in public lists.
* Users apply.
* Admin approves roster.
* Lobby validates readiness.
* Runtime Manager starts container.
* Lobby persists metadata.
* Game Master registers the running game and initializes engine.
* Game becomes visible as running in user lists.

### D. Private game lifecycle

* Eligible user creates private game.
* Owner creates invite code.
* Another user redeems invite code and applies.
* Owner approves application.
* Owner starts game.
* Running registration completes.
* Only authorized users see the private game.

### E. Running-game command and push flow

* Player sends valid game command before cutoff.
* Gateway authenticates and routes to Game Master.
* Game Master verifies access and forwards to engine.
* Scheduled turn generation occurs.
* Player receives lightweight push notification through gateway.
* Player separately fetches updated per-player game state.

### F. Force-next-turn flow

* Running game has a fixed schedule.
* Owner or admin triggers manual next-turn generation.
* Current turn increments.
* Next scheduled slot is skipped.
* Subsequent scheduled generation happens only after the following valid slot.

### G. Runtime failure flow

* Scheduled turn generation fails.
* Game Master marks `generation_failed`.
* Lobby receives updated runtime snapshot.
* Only administrators are notified through the notification flow.
* Users can still observe degraded problem state through status reads.

### H. Start failure and recovery flow

* Lobby requests runtime start.
* Runtime Manager starts container.
* Simulate metadata persistence failure in Lobby.
* Verify container is removed and game is not left half-started.
* Simulate successful metadata persistence but Game Master registration failure.
* Verify game is marked `paused` and admin is notified.

### I. Temporary vs final player removal flow

* Temporarily remove player after game start.
* Verify player can no longer send commands through platform.
* Verify engine still keeps the slot.
* Final-remove or account-block the player.
* Verify Game Master sends engine admin command to deactivate/remove the player.

### J. Notification routing flow

* Lobby emits invite/application/approval events.
* Notification Service sends push through gateway.
* Non-auth email notifications route through Notification Service to Mail Service.
* Auth-code emails remain direct `Auth / Session -> Mail`.

### K. Geo auxiliary flow

* Authenticated traffic generates geo observations.
* Suspicious multi-country pattern is detected.
* Current triggering request still succeeds.
* Auth / Session blocks the suspicious session.
* Next request from that session is rejected.

### L. Admin supervision flow

* System admin uses admin REST through gateway.
* Admin can view public and private games.
* Admin can inspect running-game runtime state.
* Admin can stop game, patch engine, and force next turn.
* Admin can block users and revoke sessions through appropriate downstream APIs.

## Ongoing Regression Policy

* Every time a new service is added, its service tests are mandatory before merging.
* Every new service boundary must add at least one inter-service integration suite against already implemented neighbors.
* Every bug found in integration or system testing must produce:

  * one narrow regression test at the lowest useful level;
  * and, if applicable, one broader integration or system scenario.
* The full system suite should stay intentionally limited to high-value vertical slices rather than grow into a large combinatorial matrix.

## Practical Rule of Execution

* During early development:

  * run service tests on every change;
  * run inter-service tests for affected neighboring services on every branch;
  * run a reduced smoke subset of system tests in CI.
* During stabilization:

  * keep service and integration tests mandatory in CI;
  * expand system tests around the critical product flows only.

## Summary

The project-wide testing strategy is fixed as follows:

* first, **service tests** inside each component;
* then, as components appear, **inter-service integration tests** between real neighboring services;
* finally, after all major components are implemented, **full system tests** for complete end-to-end platform flows.

This order is mandatory for the project because the architecture contains several critical stateful and asynchronous seams:

* gateway verification and routing;
* auth/session projection into gateway cache;
* push delivery through gateway;
* Redis Streams event propagation;
* runtime job completion;
* lobby/game-master synchronization;
* geo post-factum protective actions.