galaxy-game/geoprofile/PLAN.md
2026-04-22 08:49:45 +02:00

Implementation Plan for Geo Profile Service

Planning Principles

This plan is aligned with the agreed architecture and is written for an experienced developer implementing an internal microservice in a trusted environment.

Execution priorities:

  • Keep the edge path non-blocking.
  • Keep the service boundary narrow.
  • Build append/update-only ingest first.
  • Preserve a clear ownership split with User Service.
  • Defer threshold tuning until after the basic data model is working.
  • Avoid unnecessary infrastructure on the first iteration.

Stage 01 — Freeze Service Vocabulary and Contracts

Goal:

  • Remove naming ambiguity before any implementation begins.

Tasks:

  • Choose the final service name used in repository, configuration, and docs.
  • Freeze the country-related domain terms:
    • declared_country
    • observed_country
    • usual_connection_country
    • country_review_recommended
  • Freeze cross-service ownership rules.
  • Write a short internal ADR describing why the latest declared_country lives in User Service while version history lives in Geo Profile Service.
  • Write a short internal ADR describing why the edge path is async FlatBuffers instead of request-response RPC.

Exit criteria:

  • No domain term remains overloaded or unclear.
  • No service boundary question remains unresolved.

Stage 02 — Define the Minimal Domain Model

Goal:

  • Describe the persistent state before choosing transport or storage details.

Tasks:

  • Define conceptual entities and their relationships.
  • Freeze mandatory fields for:
    • country observation
    • per-session country ranking
    • review flag state
    • declared country version history
    • session block request log
  • Decide which timestamps are mandatory on each entity.
  • Decide whether optional hashed IP storage exists at all in v1.
  • Decide whether declared-country version records need explicit lifecycle state:
    • recorded
    • applied
    • sync_failed

Recommended minimal entities:

  • country_observation
  • device_session_country_score
  • user_review_state
  • declared_country_version
  • session_block_action
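
As a starting point for discussion, the recommended entities could be sketched as plain records. This is only an illustration: the entity names and the three lifecycle states come from the plan, while every field name, type, and the assumption that countries are stored as ISO 3166-1 alpha-2 codes are hypothetical until the model is frozen.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class VersionState(Enum):
    # Lifecycle states named in the tasks above; adopting them is still an open decision.
    RECORDED = "recorded"
    APPLIED = "applied"
    SYNC_FAILED = "sync_failed"


@dataclass(frozen=True)
class CountryObservation:
    user_id: str
    device_session_id: str
    observed_country: str          # assumed: ISO 3166-1 alpha-2 code
    observed_at: datetime


@dataclass
class DeviceSessionCountryScore:
    device_session_id: str
    country: str
    score: float
    updated_at: datetime


@dataclass
class UserReviewState:
    user_id: str
    country_review_recommended: bool
    changed_at: datetime


@dataclass(frozen=True)
class DeclaredCountryVersion:
    user_id: str
    version: int
    declared_country: str
    actor: str                     # hypothetical field: who triggered the change
    created_at: datetime
    state: VersionState = VersionState.RECORDED
    reason: Optional[str] = None


@dataclass(frozen=True)
class SessionBlockAction:
    user_id: str
    device_session_id: str
    requested_at: datetime
    outcome: Optional[str] = None  # assumed: filled in once the downstream call returns
```

Hashed IP storage is deliberately absent from the sketch, since its existence in v1 is still an open decision.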

Exit criteria:

  • The storage layer can be designed directly from the domain model.
  • The model reflects all agreed semantics and no extra features.

Stage 03 — Design the Ingest Message Schema

Goal:

  • Freeze the binary contract from Edge Service to Geo Profile Service.

Tasks:

  • Create the FlatBuffers schema for the async ingest message.
  • Limit message fields to:
    • user_id
    • device_session_id
    • ip_address
  • Define allowed field types and byte layout.
  • Define message versioning strategy for future backward-compatible additions.
  • Decide how schema version is represented.
  • Define receiver behavior for malformed messages.

Important constraints:

  • No protobuf wrapper.
  • No business reply payload.
  • No external validation of identifiers.
  • Only schema-level validation on receipt.
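
A minimal sketch of what schema-level validation on receipt might look like once the FlatBuffers payload has been decoded into its three fields. The decoding step is omitted, and the function name, the `MAX_ID_LENGTH` limit, and the choice of a textual `ip_address` are assumptions rather than decisions made in this plan.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

MAX_ID_LENGTH = 128  # assumed limit, not part of the plan


@dataclass(frozen=True)
class IngestMessage:
    user_id: str
    device_session_id: str
    ip_address: str


def validate_ingest(user_id, device_session_id, ip_address) -> Optional[IngestMessage]:
    """Return a message if the fields are well-formed, else None (malformed)."""
    for value in (user_id, device_session_id):
        # Shape check only: no lookup against any other service.
        if not isinstance(value, str) or not (0 < len(value) <= MAX_ID_LENGTH):
            return None
    if not isinstance(ip_address, str):
        return None
    try:
        parsed = ipaddress.ip_address(ip_address)
    except ValueError:
        return None
    # Keep the canonical textual form so equivalent IPv6 spellings compare equal.
    return IngestMessage(user_id, device_session_id, str(parsed))
```

Returning `None` instead of raising keeps the malformed-message policy (drop, count, or log) in the receiver, where it still has to be defined.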

Exit criteria:

  • Edge Service and Geo Profile Service can generate compatible FlatBuffers code.
  • A message evolution path exists that does not break v1.

Stage 04 — Choose and Implement the Async Ingest Transport

Goal:

  • Implement the simplest possible binary ingress path that does not behave like normal RPC.

Tasks:

  • Choose the concrete transport for internal binary publication.
  • Recommended default:
    • internal HTTP endpoint
    • application/octet-stream
    • FlatBuffers body
    • empty response body
    • status-only acknowledgement
  • Implement the receiver endpoint in Geo Profile Service.
  • Implement an async publisher client in Edge Service.
  • Ensure the edge client publishes out-of-band from the main request execution path.
  • Ensure publication failures never affect request progression at the edge.
  • Add metrics for publish attempts, successes, and failures.

Important note:

  • The edge path must remain operational even if Geo Profile Service is completely unavailable.
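
One way to keep publication out of the request path is a bounded in-process buffer drained by a background thread. The sketch below is illustrative only: the class name, the `send` callable (assumed to perform the HTTP POST and raise on failure), and the buffer size are hypothetical, and the counters are plain integers for brevity rather than real metrics.

```python
import queue
import threading
from typing import Callable


class AsyncPublisher:
    """Hands payloads to a background thread; never blocks or raises on the caller."""

    def __init__(self, send: Callable[[bytes], None], max_pending: int = 1000):
        self._send = send  # assumed: performs the HTTP POST, raises on failure
        self._pending = queue.Queue(maxsize=max_pending)
        self.attempts = 0
        self.successes = 0
        self.failures = 0
        self.dropped = 0
        threading.Thread(target=self._run, daemon=True).start()

    def publish(self, payload: bytes) -> None:
        try:
            self._pending.put_nowait(payload)
        except queue.Full:
            # Shed load instead of slowing down the request path.
            self.dropped += 1

    def _run(self) -> None:
        while True:
            payload = self._pending.get()
            self.attempts += 1
            try:
                self._send(payload)
                self.successes += 1
            except Exception:
                # Publication failures are counted, never propagated.
                self.failures += 1
            finally:
                self._pending.task_done()

    def drain(self) -> None:
        """Wait for queued payloads; intended for tests and shutdown only."""
        self._pending.join()
```

With this shape, an unavailable Geo Profile Service only increments `failures` (or `dropped` once the buffer fills); the request handler that called `publish` has already moved on.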

Exit criteria:

  • The edge can publish authenticated observations without blocking the main API flow.
  • Transport failures do not change edge business behavior.

Stage 05 — Build the Internal Durable Queue

Goal:

  • Decouple acceptance of ingress messages from their processing.

Tasks:

  • Select the simplest queue implementation inside the service.
  • Prefer a durable queue over an in-memory-only queue.
  • Implement enqueue-on-receive behavior.
  • Implement worker dequeue behavior.
  • Define queue item lifecycle:
    • accepted
    • processing
    • processed
    • failed
  • Define retry strategy for worker failures.
  • Define dead-letter or failure-handling strategy if retries are exhausted.
  • Add queue metrics:
    • depth
    • oldest item age
    • processing rate
    • failure count

Recommended starting point:

  • Database-backed queue table or similarly simple durable append structure.
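
A database-backed queue can be quite small. The following sketch uses SQLite purely for illustration; the table name, column names, and `MAX_ATTEMPTS` retry budget are assumptions, and the claim step assumes a single worker (multiple workers would need an atomic claim).

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS ingest_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload BLOB NOT NULL,
    state TEXT NOT NULL DEFAULT 'accepted',
    attempts INTEGER NOT NULL DEFAULT 0,
    accepted_at REAL NOT NULL
)
"""

MAX_ATTEMPTS = 3  # assumed retry budget


def open_queue(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn, payload: bytes) -> int:
    with conn:  # commit before acknowledging the sender
        cur = conn.execute(
            "INSERT INTO ingest_queue (payload, accepted_at) VALUES (?, ?)",
            (payload, time.time()),
        )
    return cur.lastrowid


def claim_next(conn):
    """Move the oldest accepted item to 'processing' and return (id, payload)."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM ingest_queue "
            "WHERE state = 'accepted' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE ingest_queue SET state = 'processing', attempts = attempts + 1 "
            "WHERE id = ?",
            (row[0],),
        )
    return row


def finish(conn, item_id: int, ok: bool) -> None:
    with conn:
        if ok:
            conn.execute(
                "UPDATE ingest_queue SET state = 'processed' WHERE id = ?", (item_id,)
            )
        else:
            # Retry until the budget is spent, then park the item as 'failed'.
            conn.execute(
                "UPDATE ingest_queue SET state = CASE WHEN attempts >= ? "
                "THEN 'failed' ELSE 'accepted' END WHERE id = ?",
                (MAX_ATTEMPTS, item_id),
            )
```

Items left in `failed` act as a simple dead-letter set, and queue depth and oldest item age can be read with plain queries over `state` and `accepted_at`.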

Exit criteria:

  • Geo Profile Service can accept messages quickly and process them later.
  • Worker failures never silently lose already accepted work.

Stage 06 — Add Local Geo-IP Resolution

Goal:

  • Resolve country from IP locally and cheaply.

Tasks:

  • Choose the Geo-IP database for v1.
  • Add a loader for the local country database.
  • Implement a lookup adapter that maps an IP address to a country.
  • Define how unknown, invalid, or non-resolvable IPs are handled.
  • Add a periodic database refresh job.
  • Add health signals for Geo-IP database presence and age.

Design constraints:

  • Country only.
  • No external network lookup during request processing.
  • No Geo-IP version persistence with each observation.
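
The lookup adapter can hide the chosen database behind a single method. The sketch below stands in for a real Geo-IP database with an in-memory list of networks and a linear longest-prefix scan; the class name, the `UNKNOWN` sentinel value, and the input format are assumptions made for illustration.

```python
import ipaddress

UNKNOWN = "ZZ"  # assumed sentinel for invalid or unresolved addresses


class CountryLookup:
    """Longest-prefix match over (network, country) pairs held in memory."""

    def __init__(self, entries):
        # Most specific networks first, so the first hit is the longest prefix.
        self._networks = sorted(
            ((ipaddress.ip_network(cidr), country) for cidr, country in entries),
            key=lambda item: item[0].prefixlen,
            reverse=True,
        )

    def country_for(self, ip_text: str) -> str:
        try:
            ip = ipaddress.ip_address(ip_text)
        except ValueError:
            return UNKNOWN  # invalid input is treated like a lookup miss
        for network, country in self._networks:
            if ip.version == network.version and ip in network:
                return country
        return UNKNOWN
```

Collapsing invalid and non-resolvable addresses into one sentinel is only one possible policy; the task list above leaves that decision open.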

Exit criteria:

  • Workers can resolve country from IP locally.
  • Geo-IP database refresh is operationally manageable.

Stage 07 — Persist Observation Facts

Goal:

  • Materialize observed_country as stored domain facts.

Tasks:

  • Implement the observation persistence model.
  • Store at minimum:
    • user_id
    • device_session_id
    • observed_country
    • observation time
  • Decide whether observations are stored as full facts, time-bucketed facts, or a hybrid model.
  • Keep storage bounded and suitable for later aggregation.
  • Add read support needed for internal recalculation and admin inspection.

Constraints:

  • Do not turn this into a raw per-request IP audit log.
  • Prefer country-level facts over low-level network data.

Exit criteria:

  • The service stores enough observed-country history to support ranking and review.

Stage 08 — Implement Per-Session Country Ranking

Goal:

  • Maintain ranked countries per device_session_id.

Tasks:

  • Define the initial scoring algorithm based on recent activity with decay.
  • Implement score update on each processed observation.
  • Persist ranked country scores per device_session_id.
  • Define how ties are handled.
  • Define how stale scores decay or are compacted over time.
  • Expose enough state for later admin inspection.

Important constraints:

  • No active-day model in v1.
  • No heavy analytics pipeline.
  • Keep updates cheap enough for continuous background processing.
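
One candidate for a cheap decay-based score is exponential decay with a fixed half-life, where each observation adds one point to its country. The plan does not prescribe this formula; the half-life value, the function names, and the tie-break on country code are all assumptions.

```python
import math

HALF_LIFE_SECONDS = 7 * 24 * 3600  # assumed tuning value


def decayed(score: float, last_update: float, now: float) -> float:
    """Exponentially decay a score for the time elapsed since its last update."""
    elapsed = max(0.0, now - last_update)
    return score * math.pow(0.5, elapsed / HALF_LIFE_SECONDS)


def record_observation(scores: dict, country: str, now: float) -> None:
    """scores maps country -> (score, last_update) for one device_session_id."""
    score, last_update = scores.get(country, (0.0, now))
    scores[country] = (decayed(score, last_update, now) + 1.0, now)


def ranked(scores: dict, now: float) -> list:
    """Countries ordered by current decayed score; ties break on country code."""
    current = {c: decayed(s, t, now) for c, (s, t) in scores.items()}
    return sorted(current, key=lambda c: (-current[c], c))
```

Each update touches one row and needs no history scan, which keeps it compatible with continuous background processing. Stale entries can be compacted by dropping countries whose decayed score falls below a small cutoff.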

Exit criteria:

  • Each device_session_id has a current ranked country list.
  • Ranking is stable and cheap to update.

Stage 09 — Compute usual_connection_country

Goal:

  • Derive a current per-session representative country from the ranking.

Tasks:

  • Define the selection rule for the top country.
  • Decide whether a minimum score or minimum margin is needed before setting a value.
  • Persist the current usual_connection_country per device_session_id.
  • Add recalculation hooks when session country scores change.
  • Add tests for common drift scenarios:
    • one stable country
    • gradual shift over time
    • alternating countries
    • sparse activity
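
The selection rule and the drift scenarios above can be explored with a small pure function. This sketch assumes both a minimum score and a minimum margin are adopted; the threshold values and the choice to return `None` when evidence is weak are hypothetical.

```python
from typing import Optional

MIN_SCORE = 3.0   # assumed threshold
MIN_MARGIN = 1.5  # assumed lead required over the runner-up


def select_usual_country(scores: dict) -> Optional[str]:
    """scores maps country -> current score for one device_session_id."""
    if not scores:
        return None
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    top_country, top_score = ordered[0]
    runner_up = ordered[1][1] if len(ordered) > 1 else 0.0
    if top_score < MIN_SCORE or top_score - runner_up < MIN_MARGIN:
        return None  # not enough evidence to name a usual country
    return top_country
```

For example, `select_usual_country({"DE": 5.0, "FR": 4.8})` yields `None` for an alternating pattern, while `{"DE": 2.0, "FR": 6.0}` yields `"FR"` after a gradual shift. Whether a weak result should clear the stored value or keep the previous one is a separate decision.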

Exit criteria:

  • usual_connection_country can be read directly without recomputing the full score set every time.

Stage 10 — Implement Review Recommendation State

Goal:

  • Persist and expose country_review_recommended.

Tasks:

  • Define the initial rule that sets the review flag.
  • Persist review state at user level.
  • Detect transitions from false to true.
  • Ensure repeated writes do not keep re-emitting the same transition indefinitely.
  • Add API access for reading the flag.
  • Add background recalculation entry points if the rule changes later.

Design requirement:

  • Review state must live in storage and be queryable even if event delivery fails.
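
Transition awareness comes down to comparing the stored value with the new one at write time. In the sketch below a dictionary stands in for durable storage and the function name is hypothetical; a real implementation would need the read and write to be atomic.

```python
def update_review_flag(store: dict, user_id: str, recommended: bool) -> bool:
    """Persist the flag and report whether this write is a false -> true transition."""
    previous = store.get(user_id, False)
    store[user_id] = recommended
    return recommended and not previous
```

Only a `True` return value would trigger downstream notification, so repeated writes of the same state stay silent while the stored flag remains queryable.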

Exit criteria:

  • The flag is durable, queryable, and transition-aware.

Stage 11 — Publish Review Events and Optional Email

Goal:

  • Add auxiliary notifications for review-worthy users.

Tasks:

  • Define the normalized notification-intent payload for geo.review_recommended.
  • Implement intent publication on transition to true.
  • Implement configuration-driven administrator-notification handoff through Notification Service.
  • Add notification deduplication or transition-only logic to prevent spam.
  • Add failure metrics for both event publication and downstream notification handoff.

Important constraints:

  • The event bus is not the authoritative source of truth.
  • Email is optional; business correctness must not depend on it.

Exit criteria:

  • Review transitions can notify administrators without becoming a dependency for state correctness.

Stage 12 — Implement Suspicious Multi-Country Session Detection

Goal:

  • Detect suspicious short-window cross-country behavior across sessions of the same user.

Tasks:

  • Define the initial heuristic for suspicious mixed-country windows.
  • Decide which session becomes the target of blocking when a conflict appears.
  • Implement detection logic using stored observations and/or per-session summaries.
  • Add persistence for suspicion evidence or at least action logs.
  • Keep the heuristic configurable, not hard-coded deep in the codebase.

Important constraints:

  • The current triggering request is allowed to continue.
  • Only suspicious device_session_id values are blocked.
  • The entire user account is never blocked by this service.
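
As one possible initial heuristic, two sessions of the same user conflict when they are observed in different countries within a short window. The sketch is a quadratic scan for clarity; the window length and function name are assumptions, and it deliberately returns conflicting pairs without choosing which session to block, since that decision is still open.

```python
from itertools import combinations

WINDOW_SECONDS = 600  # assumed; meant to come from configuration


def find_conflicts(observations, window=WINDOW_SECONDS):
    """observations: iterable of (device_session_id, country, unix_time) for one user.

    Returns pairs of session ids seen in different countries within `window` seconds.
    """
    conflicts = set()
    for (s1, c1, t1), (s2, c2, t2) in combinations(observations, 2):
        if s1 != s2 and c1 != c2 and abs(t1 - t2) <= window:
            conflicts.add(tuple(sorted((s1, s2))))
    return conflicts
```

A country change inside a single session is not reported, which matches the constraint that only cross-session conflicts lead to blocking.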

Exit criteria:

  • The service can identify suspicious session patterns and produce a block action request.

Stage 13 — Integrate Session Blocking with Auth / Session Service

Goal:

  • Make suspicious session handling operational.

Tasks:

  • Define the internal API contract for session blocking.
  • Implement the client toward Auth / Session Service.
  • Ensure block requests are idempotent.
  • Record block requests and outcomes locally for inspection.
  • Add retry or failure-handling policy for temporary downstream failures.
  • Add metrics for block attempts, successes, and failures.
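
Idempotency can be approached with a deterministic request key plus a local log of outcomes. In this sketch the key derivation, the status strings, and the `send` callable (assumed to call Auth / Session Service and raise on failure) are hypothetical, and a dictionary stands in for the persisted log.

```python
import hashlib


def block_request_key(user_id: str, device_session_id: str) -> str:
    """Deterministic key so retries of the same block map to one downstream action."""
    raw = f"{user_id}:{device_session_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class BlockLog:
    """In-memory stand-in for the local log of block requests and outcomes."""

    def __init__(self, send):
        self._send = send   # assumed: calls Auth / Session Service, raises on failure
        self.entries = {}   # key -> "requested" | "succeeded" | "failed"

    def request_block(self, user_id: str, device_session_id: str) -> str:
        key = block_request_key(user_id, device_session_id)
        if self.entries.get(key) == "succeeded":
            return "succeeded"  # already done; do not call downstream again
        self.entries[key] = "requested"
        try:
            self._send(key, device_session_id)
            self.entries[key] = "succeeded"
        except Exception:
            self.entries[key] = "failed"  # left for a later retry
        return self.entries[key]
```

Passing the same key downstream lets Auth / Session Service deduplicate as well, provided the agreed contract supports it.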

Exit criteria:

  • Geo Profile Service can request blocking of suspicious sessions and track the result.

Stage 14 — Implement Declared Country Version History

Goal:

  • Add versioned history of declared_country inside Geo Profile Service.

Tasks:

  • Define the version record schema.
  • Persist all approved changes as immutable version records.
  • Add actor metadata needed for internal audit:
    • who triggered the change
    • when it happened
    • optional reason or comment
  • Implement version lifecycle state if adopted:
    • recorded
    • applied
    • sync_failed
  • Add read support for history in admin APIs.

Important constraint:

  • Version history is owned only by Geo Profile Service.

Exit criteria:

  • The service can preserve the full change history independently from User Service.

Stage 15 — Implement Current Country Sync to User Service

Goal:

  • Keep the latest effective declared_country centralized in User Service.

Tasks:

  • Define the internal REST contract to update current declared_country in User Service.
  • Implement synchronous update from Geo Profile Service.
  • Ensure that a history version does not become effective until the sync succeeds.
  • Implement failure handling and status persistence when sync fails.
  • Add retry tooling or operator visibility for failed syncs.

Design requirement:

  • No other service should bypass this write path.
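
The record-then-sync ordering can be sketched as follows, assuming the optional lifecycle states are adopted. The function name, the record fields, and the `sync(user_id, country)` client call (assumed to raise on failure) are hypothetical; a list stands in for one user's version store.

```python
def apply_country_change(history: list, user_id: str, country: str, actor: str, sync) -> dict:
    """Record a version, push it to User Service, and mark it applied or sync_failed.

    `history` holds the versions of one user; `sync` is a hypothetical client call.
    """
    version = {
        "user_id": user_id,
        "version": len(history) + 1,
        "declared_country": country,
        "actor": actor,
        "state": "recorded",
    }
    history.append(version)  # persisted first, but not yet effective
    try:
        sync(user_id, country)
    except Exception:
        version["state"] = "sync_failed"  # stays visible to operators for retry
    else:
        version["state"] = "applied"      # effective only after the sync succeeds
    return version
```

A caller would report success only for `applied` and surface `sync_failed` as a recoverable failure, so the version history and User Service cannot diverge silently.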

Exit criteria:

  • Approved changes update both version history and current user state without silent divergence.

Stage 16 — Build the Internal Read APIs

Goal:

  • Expose the minimum trusted JSON REST API required for operations and admin tooling.

Tasks:

  • Implement review-candidate listing endpoint.
  • Support at least:
    • country_review_recommended=true
    • pagination
    • stable ordering
  • Implement user geo-profile endpoint.
  • Group returned data by device_session_id.
  • Include:
    • review flag
    • per-session ranked countries
    • usual_connection_country
    • observation summaries
    • declared country history
    • block-action history if useful
  • Add authentication and authorization appropriate for trusted internal callers.
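
Pagination with stable ordering is often easiest with a keyset cursor rather than offsets. The sketch below orders review candidates by `user_id` and uses the last returned id as the cursor; the function name, parameters, and the dictionary standing in for storage are assumptions, and it presumes user ids are non-empty strings.

```python
def list_review_candidates(states: dict, after_user_id: str = "", limit: int = 50):
    """Page through flagged users ordered by user_id (keyset pagination).

    `states` maps user_id -> country_review_recommended and stands in for storage.
    Returns (page, next_cursor); next_cursor is None on the last page.
    """
    flagged = sorted(u for u, flag in states.items() if flag and u > after_user_id)
    page = flagged[:limit]
    next_cursor = page[-1] if len(flagged) > limit else None
    return page, next_cursor
```

Because the cursor is a key rather than a position, users whose flag changes between requests cannot shift later pages.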

Exit criteria:

  • Admin tools can list users for review and inspect full geo-related user state.

Stage 17 — Build the Internal Command API for Country Change Application

Goal:

  • Expose the internal command path for approved declared_country changes.

Tasks:

  • Implement the trusted internal command endpoint.
  • Accept the approved new country and actor metadata.
  • Write the new version record.
  • Synchronize current value into User Service.
  • Return success only if the change is fully applied.
  • Return a recoverable failure state if sync fails.

Clarification:

  • Public user-facing request creation is outside this service boundary unless explicitly added later.
  • This command API is for internal orchestration of approved changes.

Exit criteria:

  • Admin or internal orchestration can apply a country change through one controlled path.

Stage 18 — Add Admin-Oriented Data Shaping

Goal:

  • Make the returned data useful for manual decisions without overloading the API consumer.

Tasks:

  • Shape user geo-profile responses around manual review needs.
  • Include compact ranked-country views per session.
  • Include enough timestamps to understand temporal drift.
  • Include current review recommendation state.
  • Include declared-country version chain in a readable order.
  • Avoid leaking unnecessary low-level network data.

Exit criteria:

  • The admin interface can render useful country history and session separation without extra joins.

Stage 19 — Add Observability and Operational Controls

Goal:

  • Make the service operable in production before traffic ramps up.

Tasks:

  • Add metrics for every critical path:
    • ingest publish receipt
    • queue depth and lag
    • worker throughput
    • Geo-IP lookup failures
    • ranking updates
    • review-flag transitions
    • block requests
    • user-service sync failures
    • mail and event failures
  • Add structured logs with correlation identifiers where possible.
  • Add readiness and liveness endpoints.
  • Add dashboards and alerts for:
    • queue lag
    • persistent sync failures
    • spike in suspicious session blocks
    • Geo-IP database stale age

Exit criteria:

  • Production operation does not depend on manual log-grepping.

Stage 20 — Add Test Coverage in Increasing Layers

Goal:

  • Validate the service incrementally, from pure logic up to full integration.

Tasks:

  • Add unit tests for:
    • Geo-IP lookup adapter
    • ranking logic
    • usual_connection_country selection
    • review recommendation logic
    • suspicious session detection
  • Add storage tests for:
    • observation persistence
    • version history
    • queue behavior
  • Add integration tests for:
    • edge-style ingest acceptance
    • worker processing
    • User Service sync behavior
    • Auth / Session Service block calls
    • event and mail side effects
  • Add failure-path tests:
    • malformed FlatBuffers payload
    • queue retry
    • Geo-IP lookup miss
    • User Service sync failure
    • block-request downstream failure

Exit criteria:

  • The highest-risk logic and all external integrations are covered.

Stage 21 — Add Data Migration and Backfill Strategy

Goal:

  • Prepare for safe rollout in an existing microservice environment.

Tasks:

  • Create initial database migrations.
  • Define zero-data bootstrap behavior for new users and sessions.
  • Define how existing users with already populated declared_country in User Service appear in Geo Profile Service before any version history exists.
  • Decide whether an initial synthetic version record is needed for current production users.
  • Add operational scripts for repair and backfill if required.

Exit criteria:

  • The service can be introduced without corrupting current user country state.

Stage 22 — Roll Out in Shadow Mode

Goal:

  • Validate the service behavior before relying on its outputs operationally.

Tasks:

  • Deploy Geo Profile Service without enabling admin actions or session blocking.
  • Publish ingest data from edge asynchronously.
  • Process observations and compute derived state silently.
  • Observe queue behavior, lookup correctness, score stability, and storage growth.
  • Compare resulting data shape against expected real traffic behavior.
  • Tune thresholds for:
    • review recommendation
    • suspicious mixed-country detection
    • score decay

Exit criteria:

  • The service behaves sanely on production-shaped traffic without affecting users.

Stage 23 — Enable Review Workflow

Goal:

  • Turn on the first internal functionality with real consumers.

Tasks:

  • Enable review-candidate listing in the admin interface.
  • Enable user geo-profile rendering.
  • Enable approved country-change application path.
  • Keep session blocking disabled if needed for a staged rollout.
  • Verify that User Service stays consistent with declared-country version history.

Exit criteria:

  • Administrators can inspect users and apply country changes safely.

Stage 24 — Enable Suspicious Session Blocking

Goal:

  • Turn on the account-protection part of the service.

Tasks:

  • Enable session-block command emission to Auth / Session Service.
  • Start with conservative thresholds.
  • Monitor false positives closely.
  • Add temporary operational kill-switches for the detection path.
  • Verify that only suspicious sessions are blocked and not entire accounts.

Exit criteria:

  • The service can protect accounts without destabilizing the rest of the platform.

Stage 25 — Stabilize and Simplify

Goal:

  • Remove accidental complexity after the first complete iteration.

Tasks:

  • Review actual queue backlog behavior.
  • Review observation retention cost.
  • Review whether optional hashed IP storage is still unnecessary.
  • Review scoring tunability versus implementation complexity.
  • Remove dead code and speculative abstractions.
  • Freeze the v1 API once real consumers are stable.

Exit criteria:

  • The service remains small, understandable, and aligned with its original narrow purpose.

Delivery Sequence Summary

Recommended delivery order:

  • Domain vocabulary and ownership
  • Domain model
  • FlatBuffers schema
  • Async ingest transport
  • Internal durable queue
  • Geo-IP lookup
  • Observation persistence
  • Session ranking
  • usual_connection_country
  • Review state
  • Event and mail notifications
  • Suspicious-session detection
  • Session blocking integration
  • Declared-country versioning
  • Sync to User Service
  • Admin read API
  • Country-change command API
  • Admin-oriented data shaping
  • Observability
  • Tests
  • Data migration and backfill
  • Shadow rollout
  • Review enablement
  • Blocking enablement
  • Cleanup

Final Acceptance Criteria

The implementation may be considered complete for v1 when all of the following are true:

  • Edge Service publishes authenticated country observations asynchronously without affecting request processing.
  • Geo Profile Service resolves and stores observed_country.
  • The service maintains per-device_session_id country ranking and usual_connection_country.
  • country_review_recommended is durable, queryable, and not event-dependent.
  • Admin tooling can fetch review candidates and per-user geo profiles.
  • Approved declared_country changes are versioned in Geo Profile Service and synchronized into User Service.
  • Suspicious sessions can be blocked through Auth / Session Service.
  • Optional email and event notifications work without becoming correctness dependencies.
  • The service is observable and operable under real traffic.