galaxy-game/geoprofile/PLAN.md
2026-04-26 20:34:39 +02:00

Implementation Plan for Geo Profile Service

Planning Principles

This plan is aligned with the agreed architecture and is written for an experienced developer implementing an internal microservice in a trusted environment.

Execution priorities:

  • Keep the edge path non-blocking.
  • Keep the service boundary narrow.
  • Build append/update-only ingest first.
  • Preserve clear ownership split with User Service.
  • Defer threshold tuning until after the basic data model is working.
  • Avoid unnecessary infrastructure on the first iteration.

Stage 00 — Persistence Stack and Backend Assignment

Goal:

  • Pin the platform-wide persistence stack and the per-service backend ownership before any feature stage begins, so that subsequent stages design schemas, queries, and worker loops consistently with the project-wide rules in ../ARCHITECTURE.md §Persistence Backends and the staged migration plan in ../PG_PLAN.md.

This stage is documentation-only: no code exists in this service yet, and this stage adds none. It is a prerequisite to every later stage and ships as part of PG_PLAN.md Stage 8.

Tasks:

  • Adopt the shared Postgres helper pkg/postgres for every durable storage path:

    • driver github.com/jackc/pgx/v5, exposed as *sql.DB via github.com/jackc/pgx/v5/stdlib;
    • query layer github.com/go-jet/jet/v2 (PostgreSQL dialect) with generated code under internal/adapters/postgres/jet/, regenerated by a per-service make jet target and committed to the repo;
    • migrations via github.com/pressly/goose/v3 library API embedded with //go:embed, applied at service startup before any HTTP listener becomes ready, with non-zero exit on failure;
    • github.com/testcontainers/testcontainers-go (modules/postgres) for unit tests and for hosting the transient instance used by make jet.
  • Adopt the shared Redis helper pkg/redisconn for every Redis client:

    • master/replica/password connection shape;
    • mandatory password;
    • no TLS_ENABLED, no USERNAME (rejected at startup with a clear error from pkg/redisconn.LoadFromEnv).
  • Own the geoprofile schema in the shared galaxy PostgreSQL database. Connect with a dedicated geoprofile PG role whose grants are restricted to its own schema (defense-in-depth, expressed in the initial migration).

  • Lay out the postgres-backed adapter directory consistently with the PG-migrated services:

    geoprofile/
      internal/
        adapters/
          postgres/
            migrations/         # *.sql files + migrations.go (//go:embed)
            jet/                # generated code, commit-checked
            <storeName>/        # adapter implementations matching
                                # internal/ports
        config/
          config.go             # Postgres + Redis schemas
      Makefile                  # `jet` target: testcontainers + goose + jet
    
  • Backend assignment for the entities listed in README.md §Data Entities:

    • PostgreSQL (geoprofile schema, source of truth):

      • country_observation — durable observed-country fact rows.
      • device_session_country_score — per-device_session_id weighted country aggregates.
      • device_session_geo_state — current usual_connection_country per device_session_id.
      • user_review_state — country_review_recommended flag and last evaluation timestamp.
      • declared_country_version — immutable history of approved declared_country changes (with version status recorded / applied / sync_failed).
      • session_block_action — local audit of block-request outcomes.
      • Ingest-queue lifecycle from §Stage 05 (accepted / processing / processed / failed) is materialised as status / next_attempt_at columns on the durable observation row, not as a Redis ZSET. Workers select pending work via SELECT ... FOR UPDATE SKIP LOCKED, mirroring the pattern already in use by Mail and Notification.
    • Redis (pkg/redisconn):

      • only ephemeral runtime-coordination signals if any appear during implementation — for example, transition-deduplication windows for review-flag notifications, short worker leases on processing claims. No durable business state.
      • the notification:intents Redis Stream is used by this service only as a producer to publish geo.review_recommended intents (see §Stage 11 and README.md §Integration with Notification Service); that connection is built via pkg/redisconn.
  • Idempotency, if added for ingest deduplication, is a UNIQUE constraint on the durable observation row, never a separate Redis kv. Retry scheduling, if added for worker reprocessing or User Service sync retries, is a column on the durable record, worked off via FOR UPDATE SKIP LOCKED. Both rules align this service with the platform-wide pattern.

  • Time-valued columns are timestamptz. Adapters normalise every time.Time value crossing the SQL boundary to time.UTC on bind and scan, per ../ARCHITECTURE.md §Persistence Backends — Timestamp handling.

  • Configuration (target):

    • PostgreSQL knobs (loaded via pkg/postgres.LoadFromEnv("GEOPROFILE")):

      • GEOPROFILE_POSTGRES_PRIMARY_DSN (required; postgres://geoprofile:<pwd>@<host>:5432/galaxy?search_path=geoprofile&sslmode=disable);
      • GEOPROFILE_POSTGRES_REPLICA_DSNS (optional, comma-separated; reserved for future read-routing, not consumed yet);
      • GEOPROFILE_POSTGRES_OPERATION_TIMEOUT, GEOPROFILE_POSTGRES_MAX_OPEN_CONNS, GEOPROFILE_POSTGRES_MAX_IDLE_CONNS, GEOPROFILE_POSTGRES_CONN_MAX_LIFETIME.
    • Redis knobs (loaded via pkg/redisconn.LoadFromEnv("GEOPROFILE")):

      • GEOPROFILE_REDIS_MASTER_ADDR (required), GEOPROFILE_REDIS_REPLICA_ADDRS (optional, comma-separated);
      • GEOPROFILE_REDIS_PASSWORD (required);
      • GEOPROFILE_REDIS_DB, GEOPROFILE_REDIS_OPERATION_TIMEOUT.
  • Per-service decision record geoprofile/docs/postgres-migration.md is created by the stage that actually implements the service. It must capture: schema and role grants, queue materialisation choice, retry pattern, and any non-trivial deviation from the platform-wide rules (analogous to ../user/docs/postgres-migration.md, ../mail/docs/postgres-migration.md, ../notification/docs/postgres-migration.md, and ../lobby/docs/postgres-migration.md).
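The timestamp rule in the tasks above (normalise every time.Time value to UTC on bind and scan) can be illustrated with a minimal Go sketch. The helper names normalizeUTC and sampleLocalTime are hypothetical and not part of pkg/postgres; the point is only that converting to UTC keeps the instant and changes the attached location.

```go
package main

import (
	"fmt"
	"time"
)

// normalizeUTC returns t expressed in UTC. The instant is unchanged; only the
// location attached to the value differs. An adapter would call this before
// binding a parameter and after scanning a timestamptz column.
func normalizeUTC(t time.Time) time.Time {
	return t.UTC()
}

// sampleLocalTime builds a timestamp in a fixed +02:00 zone for the demo.
func sampleLocalTime() time.Time {
	zone := time.FixedZone("UTC+2", 2*60*60)
	return time.Date(2026, time.April, 26, 20, 34, 39, 0, zone)
}

func main() {
	local := sampleLocalTime()
	utc := normalizeUTC(local)
	// The formatted value carries a Z suffix and still equals the local instant.
	fmt.Println(utc.Format(time.RFC3339), utc.Equal(local))
}
```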

Exit criteria:

  • The persistence stack and schema ownership are fixed and visible to implementers.
  • Every later stage (Stage 01+) designs schemas and queries on top of the geoprofile Postgres schema, or — for any ephemeral signal — on top of pkg/redisconn.
  • ../ARCHITECTURE.md §Persistence Backends and ../PG_PLAN.md remain the canonical references; this PLAN points at them rather than duplicating their content.

Stage 01 — Freeze Service Vocabulary and Contracts

Goal:

  • Remove naming ambiguity before any implementation begins.

Tasks:

  • Choose the final service name used in repository, configuration, and docs.
  • Freeze the country-related domain terms:
    • declared_country
    • observed_country
    • usual_connection_country
    • country_review_recommended
  • Freeze cross-service ownership rules.
  • Write a short internal ADR describing why the latest declared_country lives in User Service while version history lives in Geo Profile Service.
  • Write a short internal ADR describing why the edge path is async FlatBuffers instead of request-response RPC.

Exit criteria:

  • No domain term remains overloaded or unclear.
  • No service boundary question remains unresolved.

Stage 02 — Define the Minimal Domain Model

Goal:

  • Describe the persistent state before choosing transport or storage details.

Tasks:

  • Define conceptual entities and their relationships.
  • Freeze mandatory fields for:
    • country observation
    • per-session country ranking
    • review flag state
    • declared country version history
    • session block request log
  • Decide which timestamps are mandatory on each entity.
  • Decide whether optional hashed IP storage exists at all in v1.
  • Decide whether declared-country version records need explicit lifecycle state:
    • recorded
    • applied
    • sync_failed

Recommended minimal entities:

  • country_observation
  • device_session_country_score
  • user_review_state
  • declared_country_version
  • session_block_action

Exit criteria:

  • The storage layer can be designed directly from the domain model.
  • The model reflects all agreed semantics and no extra features.

Stage 03 — Design the Ingest Message Schema

Goal:

  • Freeze the binary contract from Edge Service to Geo Profile Service.

Tasks:

  • Create the FlatBuffers schema for the async ingest message.
  • Limit message fields to:
    • user_id
    • device_session_id
    • ip_address
  • Define allowed field types and byte layout.
  • Define message versioning strategy for future backward-compatible additions.
  • Decide how schema version is represented.
  • Define receiver behavior for malformed messages.

Important constraints:

  • No protobuf wrapper.
  • No business reply payload.
  • No external validation of identifiers.
  • Only schema-level validation on receipt.
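As an illustration of "only schema-level validation on receipt", the sketch below checks a decoded message without consulting any other service. It assumes the three fields are decoded as strings, which the plan has not yet frozen; the Observation type, the validate function, and the error values are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// Observation mirrors the three ingest fields after FlatBuffers decoding.
// String-typed fields are an assumption made for this sketch.
type Observation struct {
	UserID          string
	DeviceSessionID string
	IPAddress       string
}

var (
	errEmptyUserID  = errors.New("user_id is empty")
	errEmptySession = errors.New("device_session_id is empty")
	errBadIP        = errors.New("ip_address is not a valid IP")
)

// validate performs shape checks only. It never asks User Service or
// Auth / Session Service whether the identifiers exist.
func validate(o Observation) error {
	if o.UserID == "" {
		return errEmptyUserID
	}
	if o.DeviceSessionID == "" {
		return errEmptySession
	}
	if _, err := netip.ParseAddr(o.IPAddress); err != nil {
		return errBadIP
	}
	return nil
}

func main() {
	samples := []Observation{
		{UserID: "u-1", DeviceSessionID: "s-1", IPAddress: "192.0.2.10"},
		{UserID: "u-1", DeviceSessionID: "s-1", IPAddress: "not-an-ip"},
	}
	for _, o := range samples {
		fmt.Println(validate(o))
	}
}
```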

Exit criteria:

  • Edge Service and Geo Profile Service can generate compatible FlatBuffers code.
  • Message evolution path exists without breaking v1.

Stage 04 — Choose and Implement the Async Ingest Transport

Goal:

  • Implement the simplest possible binary ingress path that does not behave like normal RPC.

Tasks:

  • Choose the concrete transport for internal binary publication.
  • Recommended default:
    • internal HTTP endpoint
    • application/octet-stream
    • FlatBuffers body
    • empty response body
    • status-only acknowledgement
  • Implement the receiver endpoint in Geo Profile Service.
  • Implement an async publisher client in Edge Service.
  • Ensure the edge client publishes out-of-band from the main request execution path.
  • Ensure the edge ignores publication failures for request progression.
  • Add metrics for publish attempts, successes, and failures.
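One way to keep publication out-of-band is a bounded in-process buffer with a non-blocking enqueue, drained by a background goroutine. The sketch below assumes the recommended HTTP transport; the Publisher type, its methods, the buffer size, the timeout, and the URL are all hypothetical, and metrics are left out for brevity.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
	"time"
)

// Publisher buffers encoded observations and posts them in the background.
type Publisher struct {
	url    string
	queue  chan []byte
	client *http.Client
}

func NewPublisher(url string, buffer int) *Publisher {
	return &Publisher{
		url:    url,
		queue:  make(chan []byte, buffer),
		client: &http.Client{Timeout: 2 * time.Second},
	}
}

// TryPublish never blocks the request path. When the buffer is full the
// observation is dropped and false is returned.
func (p *Publisher) TryPublish(payload []byte) bool {
	select {
	case p.queue <- payload:
		return true
	default:
		return false
	}
}

// Run drains the buffer until ctx is cancelled. Delivery failures are
// ignored here; a real client would count them in metrics.
func (p *Publisher) Run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case payload := <-p.queue:
			req, err := http.NewRequestWithContext(ctx, http.MethodPost, p.url, bytes.NewReader(payload))
			if err != nil {
				continue
			}
			req.Header.Set("Content-Type", "application/octet-stream")
			resp, err := p.client.Do(req)
			if err != nil {
				continue
			}
			resp.Body.Close() // status-only acknowledgement, body is empty
		}
	}
}

func main() {
	// Run is not started here, so the second publish finds the buffer full.
	p := NewPublisher("http://geoprofile.internal/ingest", 1)
	fmt.Println(p.TryPublish([]byte("first")), p.TryPublish([]byte("second")))
}
```

Dropping on a full buffer is a deliberate choice in this sketch: it trades occasional lost observations for a hard guarantee that the edge request path never waits on Geo Profile Service.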

Important note:

  • The edge path must remain operational even if Geo Profile Service is completely unavailable.

Exit criteria:

  • The edge can publish authenticated observations without blocking the main API flow.
  • Transport failures do not change edge business behavior.

Stage 05 — Build the Internal Durable Queue

Goal:

  • Decouple acceptance of ingress messages from their processing.

Tasks:

  • Select the simplest queue implementation inside the service.
  • Prefer a durable queue over an in-memory-only queue.
  • Implement enqueue-on-receive behavior.
  • Implement worker dequeue behavior.
  • Define queue item lifecycle:
    • accepted
    • processing
    • processed
    • failed
  • Define retry strategy for worker failures.
  • Define dead-letter or failure-handling strategy if retries are exhausted.
  • Add queue metrics:
    • depth
    • oldest item age
    • processing rate
    • failure count
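The queue item lifecycle and retry strategy listed above could be modelled as a small state machine. This is a sketch under stated assumptions: the allowed transitions, the limit of 5 attempts, and the capped exponential backoff are illustrative values, not decisions recorded in this plan.

```go
package main

import (
	"fmt"
	"time"
)

type Status string

const (
	Accepted   Status = "accepted"
	Processing Status = "processing"
	Processed  Status = "processed"
	Failed     Status = "failed"
)

// allowed lists the legal next states. Processing may return to Accepted
// when an attempt fails and a retry is scheduled.
var allowed = map[Status][]Status{
	Accepted:   {Processing},
	Processing: {Processed, Accepted, Failed},
}

func canTransition(from, to Status) bool {
	for _, s := range allowed[from] {
		if s == to {
			return true
		}
	}
	return false
}

const maxAttempts = 5 // hypothetical retry budget

// retryDelay doubles from 1s per attempt (1-based) and is capped at 1 minute.
func retryDelay(attempt int) time.Duration {
	if attempt < 1 {
		attempt = 1
	}
	if attempt > 6 {
		return time.Minute
	}
	return time.Second << (attempt - 1)
}

// afterFailure returns the next status and the delay before the next attempt.
// attempt is the number of attempts made so far.
func afterFailure(attempt int) (Status, time.Duration) {
	if attempt >= maxAttempts {
		return Failed, 0
	}
	return Accepted, retryDelay(attempt)
}

func main() {
	fmt.Println(canTransition(Accepted, Processing), canTransition(Processed, Accepted))
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		status, delay := afterFailure(attempt)
		fmt.Println(attempt, status, delay)
	}
}
```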

Recommended starting point:

  • Database-backed queue table or similarly simple durable append structure.
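For a database-backed queue, a worker can claim pending rows with FOR UPDATE SKIP LOCKED so that concurrent workers never pick the same item. The sketch below uses plain database/sql for brevity, whereas Stage 00 prescribes go-jet for the real query layer. The table name follows Stage 00; the id, status, and next_attempt_at columns and the claimBatch helper are assumptions.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// claimSQL marks up to $1 due rows as processing and returns their ids.
// Rows locked by another worker are skipped instead of waited on.
const claimSQL = `
UPDATE country_observation
   SET status = 'processing'
 WHERE id IN (
       SELECT id
         FROM country_observation
        WHERE status = 'accepted'
          AND next_attempt_at <= now()
        ORDER BY next_attempt_at
        LIMIT $1
          FOR UPDATE SKIP LOCKED)
RETURNING id`

// claimBatch runs the claim statement and collects the claimed ids.
func claimBatch(ctx context.Context, db *sql.DB, limit int) ([]int64, error) {
	rows, err := db.QueryContext(ctx, claimSQL, limit)
	if err != nil {
		return nil, fmt.Errorf("claim batch: %w", err)
	}
	defer rows.Close()

	var ids []int64
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			return nil, fmt.Errorf("scan claimed id: %w", err)
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}

func main() {
	// No driver is registered in this sketch, so only the statement is shown.
	fmt.Println(claimSQL)
}
```

Because the claim commits immediately, a worker that crashes leaves rows in processing; some lease or timeout is then needed to return them to accepted.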

Exit criteria:

  • Geo Profile Service can accept messages quickly and process them later.
  • Worker failures do not lose already accepted work silently.

Stage 06 — Add Local Geo-IP Resolution

Goal:

  • Resolve country from IP locally and cheaply.

Tasks:

  • Choose the Geo-IP database for v1.
  • Add a loader for the local country database.
  • Implement lookup adapter for IP to country.
  • Define how unknown, invalid, or non-resolvable IPs are handled.
  • Add a periodic database refresh job.
  • Add health signals for Geo-IP database presence and age.
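The lookup adapter and its handling of invalid or non-resolvable IPs might look like the sketch below. It uses a tiny in-memory prefix table in place of a real Geo-IP database, which this plan has not chosen yet. The Resolver type is hypothetical, the prefixes are documentation ranges, and XA and XB are user-assigned placeholder country codes.

```go
package main

import (
	"fmt"
	"net/netip"
)

type rangeEntry struct {
	prefix  netip.Prefix
	country string
}

// Resolver maps IP addresses to country codes from local data only.
type Resolver struct {
	ranges []rangeEntry
}

// NewDemoResolver returns a resolver with two placeholder ranges.
func NewDemoResolver() *Resolver {
	return &Resolver{ranges: []rangeEntry{
		{netip.MustParsePrefix("192.0.2.0/24"), "XA"},
		{netip.MustParsePrefix("2001:db8::/32"), "XB"},
	}}
}

// Country returns the country for ip. ok is false for unparsable input and
// for addresses that no range covers; the caller decides how to record that.
func (r *Resolver) Country(ip string) (country string, ok bool) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", false
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	for _, e := range r.ranges {
		if e.prefix.Contains(addr) {
			return e.country, true
		}
	}
	return "", false
}

func main() {
	r := NewDemoResolver()
	for _, ip := range []string{"192.0.2.7", "::ffff:192.0.2.7", "198.51.100.1", "not-an-ip"} {
		country, ok := r.Country(ip)
		fmt.Printf("%q -> %q %v\n", ip, country, ok)
	}
}
```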

Design constraints:

  • Country only.
  • No external network lookup during request processing.
  • No Geo-IP version persistence with each observation.

Exit criteria:

  • Workers can resolve country from IP locally.
  • Geo-IP database refresh is operationally manageable.

Stage 07 — Persist Observation Facts

Goal:

  • Materialize observed_country as stored domain facts.

Tasks:

  • Implement the observation persistence model.
  • Store at minimum:
    • user_id
    • device_session_id
    • observed_country
    • observation time
  • Decide whether observations are stored as full facts, time-bucketed facts, or a hybrid model.
  • Keep storage bounded and suitable for later aggregation.
  • Add read support needed for internal recalculation and admin inspection.

Constraints:

  • Do not turn this into a raw per-request IP audit log.
  • Prefer country-level facts over low-level network data.

Exit criteria:

  • The service stores enough observed-country history to support ranking and review.

Stage 08 — Implement Per-Session Country Ranking

Goal:

  • Maintain ranked countries per device_session_id.

Tasks:

  • Define the initial scoring algorithm: recent activity weighted with time decay.
  • Implement score update on each processed observation.
  • Persist ranked country scores per device_session_id.
  • Define how ties are handled.
  • Define how stale scores decay or are compacted over time.
  • Expose enough state for later admin inspection.
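A possible starting point for the scoring tasks above is exponential decay with a half-life: each stored score is multiplied by 2^(-elapsed/halfLife) before the new observation adds 1 to its country. The half-life, the increment of 1, and the alphabetical tie-break are assumptions of this sketch, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// decay halves score once per halfLifeSec of elapsed time.
func decay(score, elapsedSec, halfLifeSec float64) float64 {
	if elapsedSec < 0 {
		elapsedSec = 0 // out-of-order observations must not inflate scores
	}
	return score * math.Exp2(-elapsedSec/halfLifeSec)
}

type CountryScore struct {
	Country   string
	Score     float64
	UpdatedAt int64 // unix seconds of the last update
}

// observe decays every score of one device session up to now, then adds 1
// to the observed country.
func observe(scores map[string]*CountryScore, country string, now int64, halfLifeSec float64) {
	for _, s := range scores {
		s.Score = decay(s.Score, float64(now-s.UpdatedAt), halfLifeSec)
		s.UpdatedAt = now
	}
	s, ok := scores[country]
	if !ok {
		s = &CountryScore{Country: country, UpdatedAt: now}
		scores[country] = s
	}
	s.Score++
}

// ranked returns scores in descending order; ties break by country code so
// the ranking is deterministic.
func ranked(scores map[string]*CountryScore) []CountryScore {
	out := make([]CountryScore, 0, len(scores))
	for _, s := range scores {
		out = append(out, *s)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Country < out[j].Country
	})
	return out
}

func main() {
	scores := map[string]*CountryScore{}
	observe(scores, "XA", 0, 3600)
	observe(scores, "XB", 3600, 3600)
	for _, s := range ranked(scores) {
		fmt.Printf("%s %.2f\n", s.Country, s.Score)
	}
}
```

The per-session country set is expected to be small, so decaying every row on each update stays cheap; decaying lazily at read time is an alternative with the same result.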

Important constraints:

  • No active-day model in v1.
  • No heavy analytics pipeline.
  • Keep updates cheap enough for continuous background processing.

Exit criteria:

  • Each device_session_id has a current ranked country list.
  • Ranking is stable and cheap to update.

Stage 09 — Compute usual_connection_country

Goal:

  • Derive a current per-session representative country from the ranking.

Tasks:

  • Define the selection rule for the top country.
  • Decide whether a minimum score or minimum margin is needed before setting a value.
  • Persist the current usual_connection_country per device_session_id.
  • Add recalculation hooks when session country scores change.
  • Add tests for common drift scenarios:
    • one stable country
    • gradual shift over time
    • alternating countries
    • sparse activity
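The selection rule and the drift scenarios above can be exercised with a small pure function. In this sketch a country is selected only when the top score reaches a minimum and leads the runner-up by a minimum margin; both thresholds and the pickUsual name are hypothetical. A false result is meant as "keep the previously stored value".

```go
package main

import (
	"fmt"
	"sort"
)

// pickUsual returns the top-ranked country when it is strong enough and
// clearly ahead of the second one. Ties in score break by country code.
func pickUsual(scores map[string]float64, minScore, minMargin float64) (string, bool) {
	if len(scores) == 0 {
		return "", false
	}
	countries := make([]string, 0, len(scores))
	for c := range scores {
		countries = append(countries, c)
	}
	sort.Slice(countries, func(i, j int) bool {
		a, b := countries[i], countries[j]
		if scores[a] != scores[b] {
			return scores[a] > scores[b]
		}
		return a < b
	})
	top := countries[0]
	if scores[top] < minScore {
		return "", false // sparse activity: not enough evidence
	}
	if len(countries) > 1 && scores[top]-scores[countries[1]] < minMargin {
		return "", false // alternating countries: no clear leader
	}
	return top, true
}

func main() {
	cases := map[string]map[string]float64{
		"one stable country":    {"XA": 5},
		"gradual shift":         {"XA": 1.2, "XB": 3.4},
		"alternating countries": {"XA": 3.0, "XB": 2.9},
		"sparse activity":       {"XA": 0.2},
	}
	for name, scores := range cases {
		country, ok := pickUsual(scores, 1.0, 0.5)
		fmt.Printf("%s: %q %v\n", name, country, ok)
	}
}
```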

Exit criteria:

  • usual_connection_country can be read directly without recomputing the full score set every time.

Stage 10 — Implement Review Recommendation State

Goal:

  • Persist and expose country_review_recommended.

Tasks:

  • Define the initial rule that sets the review flag.
  • Persist review state at user level.
  • Detect transitions from false to true.
  • Ensure repeated writes do not keep re-emitting the same transition indefinitely.
  • Add API access for reading the flag.
  • Add background recalculation entry points if the rule changes later.
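Transition detection can be kept separate from the rule itself. The sketch below assumes a placeholder rule (any session whose usual_connection_country differs from declared_country) purely for illustration; the actual rule is still to be defined in this stage. ReviewState, shouldReview, and Apply are hypothetical names.

```go
package main

import "fmt"

// ReviewState is the per-user durable record.
type ReviewState struct {
	Recommended bool
	EvaluatedAt int64 // unix seconds of the last evaluation
}

// shouldReview is a placeholder rule: recommend review when any session has
// a known usual country that differs from the declared one.
func shouldReview(declared string, usualBySession map[string]string) bool {
	for _, usual := range usualBySession {
		if usual != "" && usual != declared {
			return true
		}
	}
	return false
}

// Apply stores the new evaluation and reports whether it was a
// false-to-true transition. Repeated true evaluations report false, so a
// notification is emitted once per transition.
func (s *ReviewState) Apply(recommended bool, now int64) (transitioned bool) {
	transitioned = !s.Recommended && recommended
	s.Recommended = recommended
	s.EvaluatedAt = now
	return transitioned
}

func main() {
	state := &ReviewState{}
	sessions := map[string]string{"s-1": "XA", "s-2": "XB"}
	for now := int64(1); now <= 3; now++ {
		emit := state.Apply(shouldReview("XA", sessions), now)
		fmt.Println(state.Recommended, emit)
	}
}
```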

Design requirement:

  • Review state must live in storage and be queryable even if event delivery fails.

Exit criteria:

  • The flag is durable, queryable, and transition-aware.

Stage 11 — Publish Review Events and Optional Email

Goal:

  • Add auxiliary notifications for review-worthy users.

Tasks:

  • Define the normalized notification-intent payload for geo.review_recommended.
  • Implement intent publication on transition to true.
  • Implement configuration-driven administrator-notification handoff through Notification Service.
  • Add notification deduplication or transition-only logic to prevent spam.
  • Add failure metrics for both event publication and downstream notification handoff.

Important constraints:

  • The event bus is not the authoritative source of truth.
  • Email is optional and non-blocking for business correctness.

Exit criteria:

  • Review transitions can notify administrators without becoming a dependency for state correctness.

Stage 12 — Implement Suspicious Multi-Country Session Detection

Goal:

  • Detect suspicious short-window cross-country behavior across sessions of the same user.

Tasks:

  • Define the initial heuristic for suspicious mixed-country windows.
  • Decide which session becomes the target of blocking when a conflict appears.
  • Implement detection logic using stored observations and/or per-session summaries.
  • Add persistence for suspicion evidence or at least action logs.
  • Keep the heuristic configurable, not hard-coded deep in the codebase.
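A first heuristic for mixed-country windows could flag any two sessions of the same user that are observed in different countries within a configurable window. The sketch below implements only that detection step under this assumption; it deliberately does not decide which session becomes the blocking target. The types and the findConflicts function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Observation is one stored country fact for a single user.
type Observation struct {
	DeviceSessionID string
	Country         string
	At              int64 // unix seconds
}

// Conflict names two sessions seen in different countries close in time.
// SessionA is always the lexicographically smaller id.
type Conflict struct {
	SessionA, SessionB string
}

// findConflicts scans one user's observations and returns each conflicting
// session pair once. windowSec is the configurable detection window.
func findConflicts(obs []Observation, windowSec int64) []Conflict {
	sorted := append([]Observation(nil), obs...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].At < sorted[j].At })

	seen := map[Conflict]bool{}
	var out []Conflict
	for i := range sorted {
		for j := i + 1; j < len(sorted); j++ {
			a, b := sorted[i], sorted[j]
			if b.At-a.At > windowSec {
				break // later observations are even further away
			}
			if a.DeviceSessionID == b.DeviceSessionID || a.Country == b.Country {
				continue
			}
			c := Conflict{a.DeviceSessionID, b.DeviceSessionID}
			if c.SessionA > c.SessionB {
				c.SessionA, c.SessionB = c.SessionB, c.SessionA
			}
			if !seen[c] {
				seen[c] = true
				out = append(out, c)
			}
		}
	}
	return out
}

func main() {
	obs := []Observation{
		{"s-1", "XA", 0},
		{"s-2", "XB", 60},
		{"s-1", "XA", 5000},
	}
	fmt.Println(findConflicts(obs, 300))
}
```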

Important constraints:

  • The current triggering request is allowed to continue.
  • Only suspicious device_session_id values are blocked.
  • The entire user account is never blocked by this service.

Exit criteria:

  • The service can identify suspicious session patterns and produce a block action request.

Stage 13 — Integrate Session Blocking with Auth / Session Service

Goal:

  • Make suspicious session handling operational.

Tasks:

  • Define the internal API contract for session blocking.
  • Implement the client toward Auth / Session Service.
  • Ensure block requests are idempotent.
  • Record block requests and outcomes locally for inspection.
  • Add retry or failure-handling policy for temporary downstream failures.
  • Add metrics for block attempts, successes, and failures.
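Idempotent block requests can be approached with a deterministic key per (device_session_id, reason) pair plus a local outcome log. The sketch below keeps the log in memory and takes the downstream call as a function argument; in the service the log would be the session_block_action table. The key derivation, outcome strings, and all names are assumptions, and the real contract with Auth / Session Service is not yet defined.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blockRequestKey derives a stable key so that retries of the same request
// are recognisable on both sides.
func blockRequestKey(deviceSessionID, reason string) string {
	sum := sha256.Sum256([]byte(deviceSessionID + "\x00" + reason))
	return hex.EncodeToString(sum[:])
}

// BlockLog records the last outcome per key.
type BlockLog struct {
	outcomes map[string]string
}

func NewBlockLog() *BlockLog {
	return &BlockLog{outcomes: map[string]string{}}
}

// Request calls send unless the same request already succeeded. A failed
// outcome is recorded and may be retried by calling Request again.
func (l *BlockLog) Request(deviceSessionID, reason string, send func(key string) error) (string, error) {
	key := blockRequestKey(deviceSessionID, reason)
	if l.outcomes[key] == "succeeded" {
		return key, nil
	}
	if err := send(key); err != nil {
		l.outcomes[key] = "failed"
		return key, fmt.Errorf("block session %s: %w", deviceSessionID, err)
	}
	l.outcomes[key] = "succeeded"
	return key, nil
}

func main() {
	blockLog := NewBlockLog()
	calls := 0
	send := func(key string) error { calls++; return nil }
	blockLog.Request("s-2", "multi_country", send)
	blockLog.Request("s-2", "multi_country", send)
	fmt.Println("downstream calls:", calls)
}
```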

Exit criteria:

  • Geo Profile Service can request blocking of suspicious sessions and track the result.

Stage 14 — Implement Declared Country Version History

Goal:

  • Add versioned history of declared_country inside Geo Profile Service.

Tasks:

  • Define the version record schema.
  • Persist all approved changes as immutable version records.
  • Add actor metadata needed for internal audit:
    • who triggered the change
    • when it happened
    • optional reason or comment
  • Implement version lifecycle state if adopted:
    • recorded
    • applied
    • sync_failed
  • Add read support for history in admin APIs.

Important constraint:

  • Version history is owned only by Geo Profile Service.

Exit criteria:

  • The service can preserve the full change history independently from User Service.

Stage 15 — Implement Current Country Sync to User Service

Goal:

  • Keep the latest effective declared_country centralized in User Service.

Tasks:

  • Define the internal REST contract to update current declared_country in User Service.
  • Implement synchronous update from Geo Profile Service.
  • Ensure that a history version does not become effective until the sync succeeds.
  • Implement failure handling and status persistence when sync fails.
  • Add retry tooling or operator visibility for failed syncs.
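The rule that a history version does not become effective until the sync succeeds maps onto the recorded / applied / sync_failed lifecycle from Stage 02. The sketch below shows that ordering with the User Service call abstracted as a function; the Version type, applyVersion, and errSyncUnavailable are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Version is one declared_country history record.
type Version struct {
	Country string
	Status  string // recorded, applied, or sync_failed
}

// errSyncUnavailable stands in for a failed User Service call in the demo.
var errSyncUnavailable = errors.New("user service unavailable")

// applyVersion records the version first, then synchronises the current
// value. The version is marked applied only after sync returns nil;
// otherwise it stays visible as sync_failed for retry tooling.
func applyVersion(v *Version, sync func(country string) error) error {
	v.Status = "recorded"
	if err := sync(v.Country); err != nil {
		v.Status = "sync_failed"
		return fmt.Errorf("sync declared_country: %w", err)
	}
	v.Status = "applied"
	return nil
}

func main() {
	ok := &Version{Country: "XA"}
	fmt.Println(applyVersion(ok, func(string) error { return nil }), ok.Status)

	bad := &Version{Country: "XB"}
	fmt.Println(applyVersion(bad, func(string) error { return errSyncUnavailable }), bad.Status)
}
```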

Design requirement:

  • No other service should bypass this write path.

Exit criteria:

  • Approved changes update both version history and current user state without silent divergence.

Stage 16 — Build the Internal Read APIs

Goal:

  • Expose the minimum trusted JSON REST API required for operations and admin tooling.

Tasks:

  • Implement review-candidate listing endpoint.
  • Support at least:
    • country_review_recommended=true
    • pagination
    • stable ordering
  • Implement user geo-profile endpoint.
  • Group returned data by device_session_id.
  • Include:
    • review flag
    • per-session ranked countries
    • usual_connection_country
    • observation summaries
    • declared country history
    • block-action history if useful
  • Add authentication and authorization appropriate for trusted internal callers.
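For the review-candidate listing, pagination with stable ordering is commonly done as keyset pagination: order by a unique key and return everything after the last key the caller saw. The sketch below shows the idea over an in-memory sorted slice; ordering by user id, the cursor format, and the default limit of 50 are assumptions rather than part of the agreed API.

```go
package main

import (
	"fmt"
	"sort"
)

// pageAfter returns up to limit ids that sort strictly after the cursor,
// plus the cursor for the next page (empty when there are no more items).
// sortedIDs must be in ascending order without duplicates.
func pageAfter(sortedIDs []string, after string, limit int) (page []string, next string) {
	if limit <= 0 {
		limit = 50 // hypothetical default page size
	}
	start := sort.Search(len(sortedIDs), func(i int) bool { return sortedIDs[i] > after })
	end := start + limit
	if end > len(sortedIDs) {
		end = len(sortedIDs)
	}
	page = sortedIDs[start:end]
	if end < len(sortedIDs) && len(page) > 0 {
		next = page[len(page)-1]
	}
	return page, next
}

func main() {
	ids := []string{"u-1", "u-2", "u-3"}
	cursor := ""
	for {
		page, next := pageAfter(ids, cursor, 2)
		fmt.Println(page, next)
		if next == "" {
			break
		}
		cursor = next
	}
}
```

Unlike offset-based paging, the cursor stays valid when rows are inserted or removed between requests, which matters for a list that changes as review flags flip.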

Exit criteria:

  • Admin tools can list users for review and inspect full geo-related user state.

Stage 17 — Build the Internal Command API for Country Change Application

Goal:

  • Expose the internal command path for approved declared_country changes.

Tasks:

  • Implement the trusted internal command endpoint.
  • Accept the approved new country and actor metadata.
  • Write the new version record.
  • Synchronize current value into User Service.
  • Return success only if the change is fully applied.
  • Return a recoverable failure state if sync fails.

Clarification:

  • Public user-facing request creation is outside this service boundary unless explicitly added later.
  • This command API is for internal orchestration of approved changes.

Exit criteria:

  • Admin or internal orchestration can apply a country change through one controlled path.

Stage 18 — Add Admin-Oriented Data Shaping

Goal:

  • Make the returned data useful for manual decisions without overloading the API consumer.

Tasks:

  • Shape user geo-profile responses around manual review needs.
  • Include compact ranked-country views per session.
  • Include enough timestamps to understand temporal drift.
  • Include current review recommendation state.
  • Include declared-country version chain in a readable order.
  • Avoid leaking unnecessary low-level network data.

Exit criteria:

  • The admin interface can render useful country history and session separation without extra joins.

Stage 19 — Add Observability and Operational Controls

Goal:

  • Make the service operable in production before traffic ramps up.

Tasks:

  • Add metrics for every critical path:
    • ingest publish receipt
    • queue depth and lag
    • worker throughput
    • Geo-IP lookup failures
    • ranking updates
    • review-flag transitions
    • block requests
    • user-service sync failures
    • mail and event failures
  • Add structured logs with correlation identifiers where possible.
  • Add readiness and liveness endpoints.
  • Add dashboards and alerts for:
    • queue lag
    • persistent sync failures
    • spike in suspicious session blocks
    • Geo-IP database stale age

Exit criteria:

  • Production operation does not depend on manual log-grepping.

Stage 20 — Add Test Coverage in Increasing Layers

Goal:

  • Validate the service incrementally, from pure logic up to full integration.

Tasks:

  • Add unit tests for:
    • Geo-IP lookup adapter
    • ranking logic
    • usual_connection_country selection
    • review recommendation logic
    • suspicious session detection
  • Add storage tests for:
    • observation persistence
    • version history
    • queue behavior
  • Add integration tests for:
    • edge-style ingest acceptance
    • worker processing
    • User Service sync behavior
    • Auth / Session Service block calls
    • event and mail side effects
  • Add failure-path tests:
    • malformed FlatBuffers payload
    • queue retry
    • Geo-IP lookup miss
    • User Service sync failure
    • block-request downstream failure

Exit criteria:

  • The highest-risk logic and all external integrations are covered.

Stage 21 — Add Data Migration and Backfill Strategy

Goal:

  • Prepare for safe rollout in an existing microservice environment.

Tasks:

  • Create initial database migrations.
  • Define zero-data bootstrap behavior for new users and sessions.
  • Define how existing users with already populated declared_country in User Service appear in Geo Profile Service before any version history exists.
  • Decide whether an initial synthetic version record is needed for current production users.
  • Add operational scripts for repair and backfill if required.

Exit criteria:

  • The service can be introduced without corrupting current user country state.

Stage 22 — Roll Out in Shadow Mode

Goal:

  • Validate the service behavior before relying on its outputs operationally.

Tasks:

  • Deploy Geo Profile Service without enabling admin actions or session blocking.
  • Publish ingest data from edge asynchronously.
  • Process observations and compute derived state silently.
  • Observe queue behavior, lookup correctness, score stability, and storage growth.
  • Compare resulting data shape against expected real traffic behavior.
  • Tune thresholds for:
    • review recommendation
    • suspicious mixed-country detection
    • score decay

Exit criteria:

  • The service behaves sanely on production-shaped traffic without affecting users.

Stage 23 — Enable Review Workflow

Goal:

  • Turn on the first internal functionality that has real consumers.

Tasks:

  • Enable review-candidate listing in the admin interface.
  • Enable user geo-profile rendering.
  • Enable approved country-change application path.
  • Keep session blocking disabled if needed for a staged rollout.
  • Verify that User Service stays consistent with declared-country version history.

Exit criteria:

  • Administrators can inspect users and apply country changes safely.

Stage 24 — Enable Suspicious Session Blocking

Goal:

  • Turn on the account-protection part of the service.

Tasks:

  • Enable session-block command emission to Auth / Session Service.
  • Start with conservative thresholds.
  • Monitor false positives closely.
  • Add temporary operational kill-switches for the detection path.
  • Verify that only suspicious sessions are blocked and not entire accounts.

Exit criteria:

  • The service can protect accounts without destabilizing the rest of the platform.

Stage 25 — Stabilize and Simplify

Goal:

  • Remove accidental complexity after the first complete iteration.

Tasks:

  • Review actual queue backlog behavior.
  • Review observation retention cost.
  • Review whether optional hashed IP storage is still unnecessary.
  • Review scoring tunability versus implementation complexity.
  • Remove dead code and speculative abstractions.
  • Freeze the v1 API once real consumers are stable.

Exit criteria:

  • The service remains small, understandable, and aligned with its original narrow purpose.

Delivery Sequence Summary

Recommended delivery order:

  • Persistence stack and backend assignment
  • Domain vocabulary and ownership
  • Domain model
  • FlatBuffers schema
  • Async ingest transport
  • Internal durable queue
  • Geo-IP lookup
  • Observation persistence
  • Session ranking
  • usual_connection_country
  • Review state
  • Event and mail notifications
  • Suspicious-session detection
  • Session blocking integration
  • Declared-country versioning
  • Sync to User Service
  • Admin read API
  • Country-change command API
  • Admin-oriented data shaping
  • Observability
  • Tests
  • Data migration and backfill
  • Shadow rollout
  • Review enablement
  • Blocking enablement
  • Cleanup

Final Acceptance Criteria

The implementation may be considered complete for v1 when all of the following are true:

  • Edge Service publishes authenticated country observations asynchronously without affecting request processing.
  • Geo Profile Service resolves and stores observed_country.
  • The service maintains per-device_session_id country ranking and usual_connection_country.
  • country_review_recommended is durable, queryable, and not event-dependent.
  • Admin tooling can fetch review candidates and per-user geo profiles.
  • Approved declared_country changes are versioned in Geo Profile Service and synchronized into User Service.
  • Suspicious sessions can be blocked through Auth / Session Service.
  • Optional email and event notifications work without becoming correctness dependencies.
  • The service is observable and operable under real traffic.