docs: geoprofile service
# Implementation Plan for Geo Profile Service

## Planning Principles

This plan is aligned with the agreed architecture and is written for an experienced developer implementing an internal microservice in a trusted environment.

Execution priorities:

- Keep the edge path non-blocking.
- Keep the service boundary narrow.
- Build append/update-only ingest first.
- Preserve a clear ownership split with `User Service`.
- Defer threshold tuning until after the basic data model is working.
- Avoid unnecessary infrastructure in the first iteration.

## Stage 01 — Freeze Service Vocabulary and Contracts

Goal:

- Remove naming ambiguity before any implementation begins.

Tasks:

- Choose the final service name used in repository, configuration, and docs.
- Freeze the country-related domain terms:
  - `declared_country`
  - `observed_country`
  - `usual_connection_country`
  - `country_review_recommended`
- Freeze cross-service ownership rules.
- Write a short internal ADR describing why the latest `declared_country` lives in `User Service` while version history lives in Geo Profile Service.
- Write a short internal ADR describing why the edge path is async FlatBuffers instead of request-response RPC.

Exit criteria:

- No domain term remains overloaded or unclear.
- No service boundary question remains unresolved.

## Stage 02 — Define the Minimal Domain Model

Goal:

- Describe the persistent state before choosing transport or storage details.

Tasks:

- Define conceptual entities and their relationships.
- Freeze mandatory fields for:
  - country observation
  - per-session country ranking
  - review flag state
  - declared country version history
  - session block request log
- Decide which timestamps are mandatory on each entity.
- Decide whether optional hashed IP storage exists at all in v1.
- Decide whether declared-country version records need explicit lifecycle state:
  - `recorded`
  - `applied`
  - `sync_failed`

Recommended minimal entities:

- `country_observation`
- `device_session_country_score`
- `user_review_state`
- `declared_country_version`
- `session_block_action`

Exit criteria:

- The storage layer can be designed directly from the domain model.
- The model reflects all agreed semantics and no extra features.

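
The recommended minimal entities for Stage 02 could be sketched as plain data classes before any storage decision is made. This is only an illustration: every field name beyond the entity names and the three lifecycle states is an assumption, as is the use of string identifiers and two-letter country codes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class VersionState(str, Enum):
    """Lifecycle states named in the plan for declared-country versions."""
    RECORDED = "recorded"
    APPLIED = "applied"
    SYNC_FAILED = "sync_failed"


@dataclass(frozen=True)
class CountryObservation:
    user_id: str                 # identifier type is an assumption
    device_session_id: str
    observed_country: str        # two-letter country code assumed
    observed_at: datetime


@dataclass
class DeviceSessionCountryScore:
    device_session_id: str
    country: str
    score: float
    updated_at: datetime


@dataclass
class UserReviewState:
    user_id: str
    country_review_recommended: bool
    changed_at: datetime


@dataclass(frozen=True)
class DeclaredCountryVersion:
    user_id: str
    version: int
    declared_country: str
    actor: str                   # who triggered the change
    created_at: datetime
    state: VersionState = VersionState.RECORDED
    reason: Optional[str] = None


@dataclass
class SessionBlockAction:
    device_session_id: str
    user_id: str
    requested_at: datetime
    outcome: Optional[str] = None  # filled in once the downstream call returns


example = DeclaredCountryVersion("u1", 1, "DE", "admin-7", datetime.now(timezone.utc))
```

Keeping observations and version records frozen mirrors the intent that they are facts, while scores and review state are mutable summaries.
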
## Stage 03 — Design the Ingest Message Schema

Goal:

- Freeze the binary contract from `Edge Service` to Geo Profile Service.

Tasks:

- Create the FlatBuffers schema for the async ingest message.
- Limit message fields to:
  - `user_id`
  - `device_session_id`
  - `ip_address`
- Define allowed field types and byte layout.
- Define message versioning strategy for future backward-compatible additions.
- Decide how the schema version is represented.
- Define receiver behavior for malformed messages.

Important constraints:

- No protobuf wrapper.
- No business reply payload.
- No external validation of identifiers.
- Only schema-level validation on receipt.

Exit criteria:

- `Edge Service` and Geo Profile Service can generate compatible FlatBuffers code.
- A message evolution path exists that does not break v1.

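
As a rough sketch of what "only schema-level validation on receipt" might mean for Stage 03, the function below checks the three message fields after they have been read from the buffer. The FlatBuffers decoding itself is not shown. The field types (string identifiers, raw 4- or 16-byte IP address) and the length limit are assumptions, since the plan leaves field types open.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Union

MAX_ID_LENGTH = 64  # assumed limit, not taken from the plan


@dataclass(frozen=True)
class IngestMessage:
    user_id: str
    device_session_id: str
    ip_address: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def validate_ingest_fields(user_id, device_session_id, ip_bytes) -> Optional[IngestMessage]:
    """Return a validated message, or None if any field is malformed.

    Identifiers are only checked for shape; they are not looked up anywhere,
    in line with the "no external validation of identifiers" constraint.
    """
    for value in (user_id, device_session_id):
        if not isinstance(value, str) or not 0 < len(value) <= MAX_ID_LENGTH:
            return None
    if not isinstance(ip_bytes, (bytes, bytearray)) or len(ip_bytes) not in (4, 16):
        return None
    # Packed 4-byte input yields an IPv4Address, 16-byte input an IPv6Address.
    address = ipaddress.ip_address(bytes(ip_bytes))
    return IngestMessage(user_id, device_session_id, address)
```

Returning `None` rather than raising keeps the receiver's malformed-message policy (drop, count, log) a separate decision.
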
## Stage 04 — Choose and Implement the Async Ingest Transport

Goal:

- Implement the simplest possible binary ingress path that does not behave like normal RPC.

Tasks:

- Choose the concrete transport for internal binary publication.
- Recommended default:
  - internal HTTP endpoint
  - `application/octet-stream`
  - FlatBuffers body
  - empty response body
  - status-only acknowledgement
- Implement the receiver endpoint in Geo Profile Service.
- Implement an async publisher client in `Edge Service`.
- Ensure the edge client publishes out-of-band from the main request execution path.
- Ensure publication failures never affect request progression at the edge.
- Add metrics for publish attempts, successes, and failures.

Important note:

- The edge path must remain operational even if Geo Profile Service is completely unavailable.

Exit criteria:

- The edge can publish authenticated observations without blocking the main API flow.
- Transport failures do not change edge business behavior.

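
One possible shape for the edge-side publisher in Stage 04 is a bounded in-memory buffer drained by a background thread. The sketch below is written under assumptions: `send` stands in for whatever call actually posts the FlatBuffers body, the class and metric names are hypothetical, and dropping on a full buffer is one policy among several.

```python
import queue
import threading
from collections import Counter
from typing import Callable


class FireAndForgetPublisher:
    """Publishes payloads out-of-band; the caller never waits and never sees errors."""

    def __init__(self, send: Callable[[bytes], None], capacity: int = 1000):
        self._send = send  # hypothetical transport call, e.g. an HTTP POST
        self._buffer: "queue.Queue[bytes]" = queue.Queue(maxsize=capacity)
        # Simple counters; a real service would use its metrics library.
        self.metrics = Counter()
        threading.Thread(target=self._run, daemon=True).start()

    def publish(self, payload: bytes) -> None:
        """Never blocks and never raises; drops the payload when the buffer is full."""
        self.metrics["attempts"] += 1
        try:
            self._buffer.put_nowait(payload)
        except queue.Full:
            self.metrics["dropped"] += 1

    def _run(self) -> None:
        while True:
            payload = self._buffer.get()
            try:
                self._send(payload)
                self.metrics["successes"] += 1
            except Exception:
                # Transport failures are counted and otherwise ignored.
                self.metrics["failures"] += 1
            finally:
                self._buffer.task_done()

    def drain(self) -> None:
        """Wait until buffered payloads have been handled (useful in tests)."""
        self._buffer.join()
```

With this shape, a completely unavailable Geo Profile Service shows up only in the `failures` and `dropped` counters, not in request latency.
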
## Stage 05 — Build the Internal Durable Queue

Goal:

- Decouple acceptance of ingress messages from their processing.

Tasks:

- Select the simplest queue implementation inside the service.
- Prefer a durable queue over an in-memory-only queue.
- Implement enqueue-on-receive behavior.
- Implement worker dequeue behavior.
- Define queue item lifecycle:
  - accepted
  - processing
  - processed
  - failed
- Define retry strategy for worker failures.
- Define dead-letter or failure-handling strategy if retries are exhausted.
- Add queue metrics:
  - depth
  - oldest item age
  - processing rate
  - failure count

Recommended starting point:

- Database-backed queue table or similarly simple durable append structure.

Exit criteria:

- Geo Profile Service can accept messages quickly and process them later.
- Worker failures do not silently lose already accepted work.

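
The database-backed queue table suggested for Stage 05 might look roughly like the following, using SQLite purely as a stand-in for the service's real database. The table name, column names, and retry budget are assumptions, and the sketch assumes a single worker; several concurrent workers would need row locking or an atomic claim statement.

```python
import sqlite3
import time

MAX_ATTEMPTS = 3  # assumed retry budget


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS ingest_queue (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               payload BLOB NOT NULL,
               state TEXT NOT NULL DEFAULT 'accepted',
               attempts INTEGER NOT NULL DEFAULT 0,
               accepted_at REAL NOT NULL)"""
    )
    return db


def enqueue(db: sqlite3.Connection, payload: bytes) -> None:
    """Enqueue-on-receive: store the raw message and return immediately."""
    with db:
        db.execute(
            "INSERT INTO ingest_queue (payload, accepted_at) VALUES (?, ?)",
            (payload, time.time()),
        )


def claim_next(db: sqlite3.Connection):
    """Move the oldest accepted item to 'processing'; return (id, payload) or None."""
    with db:
        row = db.execute(
            "SELECT id, payload FROM ingest_queue "
            "WHERE state = 'accepted' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute(
            "UPDATE ingest_queue SET state = 'processing', attempts = attempts + 1 "
            "WHERE id = ?",
            (row[0],),
        )
        return row


def finish(db: sqlite3.Connection, item_id: int, ok: bool) -> None:
    """Mark processed on success; otherwise retry until the budget is spent."""
    with db:
        if ok:
            db.execute("UPDATE ingest_queue SET state = 'processed' WHERE id = ?", (item_id,))
        else:
            db.execute(
                "UPDATE ingest_queue SET state = CASE WHEN attempts >= ? "
                "THEN 'failed' ELSE 'accepted' END WHERE id = ?",
                (MAX_ATTEMPTS, item_id),
            )
```

Items left in `failed` act as a simple dead-letter set, and the depth and oldest-item-age metrics can be read from the same table.
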
## Stage 06 — Add Local Geo-IP Resolution

Goal:

- Resolve country from IP locally and cheaply.

Tasks:

- Choose the Geo-IP database for v1.
- Add a loader for the local country database.
- Implement a lookup adapter from IP to country.
- Define how unknown, invalid, or non-resolvable IPs are handled.
- Add a periodic database refresh job.
- Add health signals for Geo-IP database presence and age.

Design constraints:

- Country only.
- No external network lookup during request processing.
- No Geo-IP version persistence with each observation.

Exit criteria:

- Workers can resolve country from IP locally.
- Geo-IP database refresh is operationally manageable.

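
The lookup adapter in Stage 06 could hide the chosen Geo-IP database behind a small interface that returns a country or nothing. In this sketch the in-memory CIDR table is only a stand-in for the real local database, the class name is hypothetical, and the country codes in the usage note are made up for documentation address ranges.

```python
import ipaddress
from typing import Dict, Optional


class CountryLookup:
    """Adapter around a local country table; the table format is a stand-in."""

    def __init__(self, networks: Dict[str, str]):
        # networks maps CIDR strings to country codes.
        self._networks = [
            (ipaddress.ip_network(cidr), country) for cidr, country in networks.items()
        ]
        # Longest prefix first so the most specific network wins.
        self._networks.sort(key=lambda pair: pair[0].prefixlen, reverse=True)

    def country_for(self, ip: str) -> Optional[str]:
        """Return the country code, or None for invalid or unresolvable input."""
        try:
            address = ipaddress.ip_address(ip)
        except ValueError:
            return None  # invalid input is treated like an unknown address
        for network, country in self._networks:
            if address.version == network.version and address in network:
                return country
        return None
```

For example, `CountryLookup({"203.0.113.0/24": "DE"}).country_for("203.0.113.7")` resolves to the hypothetical mapping, while a malformed string yields `None` instead of an exception, so workers can count the miss and move on.
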
## Stage 07 — Persist Observation Facts

Goal:

- Materialize `observed_country` as stored domain facts.

Tasks:

- Implement the observation persistence model.
- Store at minimum:
  - `user_id`
  - `device_session_id`
  - `observed_country`
  - observation time
- Decide whether observations are stored as full facts, time-bucketed facts, or a hybrid model.
- Keep storage bounded and suitable for later aggregation.
- Add read support needed for internal recalculation and admin inspection.

Constraints:

- Do not turn this into a raw per-request IP audit log.
- Prefer country-level facts over low-level network data.

Exit criteria:

- The service stores enough observed-country history to support ranking and review.

## Stage 08 — Implement Per-Session Country Ranking

Goal:

- Maintain ranked countries per `device_session_id`.

Tasks:

- Define the initial scoring algorithm using recent activity with decay.
- Implement score update on each processed observation.
- Persist ranked country scores per `device_session_id`.
- Define how ties are handled.
- Define how stale scores decay or are compacted over time.
- Expose enough state for later admin inspection.

Important constraints:

- No active-day model in v1.
- No heavy analytics pipeline.
- Keep updates cheap enough for continuous background processing.

Exit criteria:

- Each `device_session_id` has a current ranked country list.
- Ranking is stable and cheap to update.

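
One candidate for the "recent activity with decay" scoring in Stage 08 is exponential decay with a half-life: every observation first ages all existing scores for the session, then adds one point to the observed country. The plan does not prescribe this algorithm. The half-life value, the class name, and the alphabetical tie-break are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

HALF_LIFE_SECONDS = 7 * 24 * 3600  # assumed starting value, to be tuned later


@dataclass
class SessionScores:
    """Decayed country scores for one device_session_id."""
    scores: Dict[str, float] = field(default_factory=dict)
    updated_at: float = 0.0  # timestamp the stored scores refer to

    def observe(self, country: str, at: float) -> None:
        # Age every existing score by the time elapsed since the last update.
        elapsed = max(0.0, at - self.updated_at)
        factor = 0.5 ** (elapsed / HALF_LIFE_SECONDS)
        self.scores = {c: s * factor for c, s in self.scores.items()}
        # Then credit the newly observed country.
        self.scores[country] = self.scores.get(country, 0.0) + 1.0
        self.updated_at = max(self.updated_at, at)

    def ranked(self) -> List[Tuple[str, float]]:
        # Highest score first; ties are broken alphabetically for stability.
        return sorted(self.scores.items(), key=lambda item: (-item[1], item[0]))
```

Each update touches only the handful of countries stored for one session, which keeps the cost suitable for continuous background processing.
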
## Stage 09 — Compute usual_connection_country

Goal:

- Derive a current per-session representative country from the ranking.

Tasks:

- Define the selection rule for the top country.
- Decide whether a minimum score or minimum margin is needed before setting a value.
- Persist the current `usual_connection_country` per `device_session_id`.
- Add recalculation hooks when session country scores change.
- Add tests for common drift scenarios:
  - one stable country
  - gradual shift over time
  - alternating countries
  - sparse activity

Exit criteria:

- `usual_connection_country` can be read directly without recomputing the full score set every time.

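
If Stage 09 adopts both a minimum score and a minimum margin, the selection rule might be sketched as below. The threshold values and the function name are assumptions, and keeping the current value when the evidence is weak is one possible policy, not a decision recorded in the plan.

```python
from typing import Dict, Optional

MIN_SCORE = 2.0   # assumed: top country needs at least this much evidence
MIN_MARGIN = 0.5  # assumed: top country must lead the runner-up by this much


def select_usual_country(
    scores: Dict[str, float], current: Optional[str] = None
) -> Optional[str]:
    """Pick usual_connection_country from decayed scores.

    When the evidence is too weak or too close to call, the current value
    is kept, which avoids flapping between alternating countries.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    if not ranked or ranked[0][1] < MIN_SCORE:
        return current
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if ranked[0][1] - runner_up < MIN_MARGIN:
        return current
    return ranked[0][0]
```

The drift scenarios listed in the tasks map onto this directly: a stable country wins outright, sparse activity stays below `MIN_SCORE`, and alternating countries fail the margin check until one clearly leads.
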
## Stage 10 — Implement Review Recommendation State

Goal:

- Persist and expose `country_review_recommended`.

Tasks:

- Define the initial rule that sets the review flag.
- Persist review state at user level.
- Detect transitions from `false` to `true`.
- Ensure repeated writes do not keep re-emitting the same transition indefinitely.
- Add API access for reading the flag.
- Add background recalculation entry points if the rule changes later.

Design requirement:

- Review state must live in storage and be queryable even if event delivery fails.

Exit criteria:

- The flag is durable, queryable, and transition-aware.

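
The transition-awareness required in Stage 10 comes down to comparing the stored flag with the new value at write time. A minimal sketch follows, with an in-memory dictionary standing in for the persisted `user_review_state`; the class and method names are hypothetical.

```python
from typing import Dict


class ReviewStateStore:
    """In-memory stand-in for the persisted user-level review state."""

    def __init__(self) -> None:
        self._flags: Dict[str, bool] = {}

    def set_flag(self, user_id: str, recommended: bool) -> bool:
        """Store the flag; return True only on a false -> true transition.

        Writing the same value again returns False, so callers that emit
        events or mail on True do not re-emit for repeated writes.
        """
        previous = self._flags.get(user_id, False)
        self._flags[user_id] = recommended
        return recommended and not previous

    def get_flag(self, user_id: str) -> bool:
        # Users without any stored state are treated as not flagged.
        return self._flags.get(user_id, False)
```

Because the flag is written before anything is emitted, a failed event delivery leaves the stored state correct and queryable.
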
## Stage 11 — Publish Review Events and Optional Email

Goal:

- Add auxiliary notifications for review-worthy users.

Tasks:

- Define the event payload for `country_review_recommended=true`.
- Implement event publication on transition to `true`.
- Implement configuration-driven email notification through `Mail Service`.
- Add notification deduplication or transition-only logic to prevent spam.
- Add failure metrics for both event publication and mail send.

Important constraints:

- The event bus is not the authoritative source of truth.
- Email is optional and non-blocking for business correctness.

Exit criteria:

- Review transitions can notify administrators without becoming a dependency for state correctness.

## Stage 12 — Implement Suspicious Multi-Country Session Detection

Goal:

- Detect suspicious short-window cross-country behavior across sessions of the same user.

Tasks:

- Define the initial heuristic for suspicious mixed-country windows.
- Decide which session becomes the target of blocking when a conflict appears.
- Implement detection logic using stored observations and/or per-session summaries.
- Add persistence for suspicion evidence or at least action logs.
- Keep the heuristic configurable, not hard-coded deep in the codebase.

Important constraints:

- The current triggering request is allowed to continue.
- Only suspicious `device_session_id` values are blocked.
- The entire user account is never blocked by this service.

Exit criteria:

- The service can identify suspicious session patterns and produce a block action request.

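
The heuristic in Stage 12 is left open by the plan. Purely as an illustration of its shape, the sketch below flags a conflict when two or more sessions of one user appear in different countries inside a short window, and targets the sessions that were not seen in a reference country. The window length, the choice of reference country, and the targeting rule are all assumptions.

```python
from typing import Dict, Iterable, Set, Tuple

WINDOW_SECONDS = 600.0  # assumed default; meant to come from configuration

# (timestamp, device_session_id, observed_country) for a single user
Observation = Tuple[float, str, str]


def find_sessions_to_block(
    observations: Iterable[Observation],
    reference_country: str,
    window: float = WINDOW_SECONDS,
) -> Set[str]:
    """Return device_session_id values to block; never the whole account."""
    ordered = sorted(observations)
    if not ordered:
        return set()
    # Only look at the window that ends at the most recent observation.
    cutoff = ordered[-1][0] - window
    countries_by_session: Dict[str, Set[str]] = {}
    for timestamp, session, country in ordered:
        if timestamp >= cutoff:
            countries_by_session.setdefault(session, set()).add(country)
    all_countries = set().union(*countries_by_session.values())
    # A conflict needs at least two sessions and at least two countries.
    if len(countries_by_session) < 2 or len(all_countries) < 2:
        return set()
    return {
        session
        for session, countries in countries_by_session.items()
        if reference_country not in countries
    }
```

Passing the window as a parameter keeps the threshold out of the detection code, so it can be tuned during the shadow rollout.
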
## Stage 13 — Integrate Session Blocking with Auth / Session Service

Goal:

- Make suspicious session handling operational.

Tasks:

- Define the internal API contract for session blocking.
- Implement the client toward `Auth / Session Service`.
- Ensure block requests are idempotent.
- Record block requests and outcomes locally for inspection.
- Add retry or failure-handling policy for temporary downstream failures.
- Add metrics for block attempts, successes, and failures.

Exit criteria:

- Geo Profile Service can request blocking of suspicious sessions and track the result.

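
Idempotent block requests in Stage 13 could rely on a deterministic key derived from the session and the reason, so that retries and repeated detections reuse the same key. In this sketch the `send` callable stands in for the real `Auth / Session Service` client, the dictionary stands in for the `session_block_action` log, and the key format and retry count are assumptions.

```python
import hashlib
from typing import Callable, Dict


def block_request_key(device_session_id: str, reason: str) -> str:
    """Same session and reason always produce the same idempotency key."""
    return hashlib.sha256(f"{device_session_id}:{reason}".encode()).hexdigest()


class SessionBlocker:
    def __init__(self, send: Callable[[str, str], None], max_attempts: int = 3):
        self._send = send  # hypothetical client call: send(key, device_session_id)
        self._max_attempts = max_attempts
        self.log: Dict[str, str] = {}  # key -> last recorded outcome

    def request_block(self, device_session_id: str, reason: str) -> str:
        key = block_request_key(device_session_id, reason)
        if self.log.get(key) == "succeeded":
            return "succeeded"  # already done; do not call downstream again
        for _ in range(self._max_attempts):
            try:
                self._send(key, device_session_id)
                self.log[key] = "succeeded"
                return "succeeded"
            except ConnectionError:
                # Temporary downstream failure: record it and retry.
                self.log[key] = "failed"
        return "failed"
```

Sending the same key on every attempt lets the downstream service deduplicate as well, in case a request succeeded but its response was lost.
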
## Stage 14 — Implement Declared Country Version History

Goal:

- Add versioned history of `declared_country` inside Geo Profile Service.

Tasks:

- Define the version record schema.
- Persist all approved changes as immutable version records.
- Add actor metadata needed for internal audit:
  - who triggered the change
  - when it happened
  - optional reason or comment
- Implement version lifecycle state if adopted:
  - `recorded`
  - `applied`
  - `sync_failed`
- Add read support for history in admin APIs.

Important constraint:

- Version history is owned only by Geo Profile Service.

Exit criteria:

- The service can preserve the full change history independently from `User Service`.

## Stage 15 — Implement Current Country Sync to User Service

Goal:

- Keep the latest effective `declared_country` centralized in `User Service`.

Tasks:

- Define the internal REST contract to update current `declared_country` in `User Service`.
- Implement synchronous update from Geo Profile Service.
- Ensure that a history version does not become effective until the sync succeeds.
- Implement failure handling and status persistence when sync fails.
- Add retry tooling or operator visibility for failed syncs.

Design requirement:

- No other service should bypass this write path.

Exit criteria:

- Approved changes update both version history and current user state without silent divergence.

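
The rule in Stage 15 that a version does not become effective until the sync succeeds can be sketched with the `recorded`, `applied`, and `sync_failed` states. Here `sync_to_user_service` stands in for the real REST call, the in-memory list stands in for the version table, and all class and method names are hypothetical. Only the lifecycle state changes after a record is written; the other fields stay as recorded.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Version:
    user_id: str
    number: int
    declared_country: str
    actor: str
    state: str = "recorded"


class DeclaredCountryHistory:
    def __init__(self, sync_to_user_service: Callable[[str, str], None]):
        self._sync = sync_to_user_service  # hypothetical User Service client
        self.versions: List[Version] = []

    def apply_change(self, user_id: str, country: str, actor: str) -> Version:
        number = 1 + sum(1 for v in self.versions if v.user_id == user_id)
        version = Version(user_id, number, country, actor)
        self.versions.append(version)  # recorded first, before any remote call
        try:
            self._sync(user_id, country)
            version.state = "applied"
        except Exception:
            # Kept in history so operators can see and retry the failed sync.
            version.state = "sync_failed"
        return version

    def effective_country(self, user_id: str) -> Optional[str]:
        """Only versions whose sync succeeded count as effective."""
        applied = [
            v for v in self.versions if v.user_id == user_id and v.state == "applied"
        ]
        return applied[-1].declared_country if applied else None
```

A failed sync therefore leaves a visible `sync_failed` record without changing what either service considers the current country.
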
## Stage 16 — Build the Internal Read APIs

Goal:

- Expose the minimum trusted JSON REST API required for operations and admin tooling.

Tasks:

- Implement review-candidate listing endpoint.
- Support at least:
  - `country_review_recommended=true`
  - pagination
  - stable ordering
- Implement user geo-profile endpoint.
- Group returned data by `device_session_id`.
- Include:
  - review flag
  - per-session ranked countries
  - `usual_connection_country`
  - observation summaries
  - declared country history
  - block-action history if useful
- Add authentication and authorization appropriate for trusted internal callers.

Exit criteria:

- Admin tools can list users for review and inspect full geo-related user state.

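
Pagination with stable ordering for the review-candidate listing in Stage 16 can be done with a cursor on the sort key instead of an offset. In the sketch below, ordering by `user_id` and the function signature are assumptions, and the dictionary stands in for a storage query.

```python
from typing import Dict, List, Optional, Tuple


def list_review_candidates(
    flags: Dict[str, bool], limit: int, after: Optional[str] = None
) -> Tuple[List[str], Optional[str]]:
    """Return (page, next_cursor) of users with country_review_recommended=true.

    flags maps user_id to the stored flag. Ordering by user_id is stable, and
    resuming strictly after the last returned id means rows inserted or removed
    between calls cannot shift the page boundaries.
    """
    candidates = sorted(user for user, flagged in flags.items() if flagged)
    if after is not None:
        candidates = [user for user in candidates if user > after]
    page = candidates[:limit]
    # A cursor is returned only when more results remain after this page.
    next_cursor = page[-1] if len(candidates) > limit else None
    return page, next_cursor
```

A caller keeps passing the returned cursor as `after` until it comes back as `None`.
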
## Stage 17 — Build the Internal Command API for Country Change Application

Goal:

- Expose the internal command path for approved `declared_country` changes.

Tasks:

- Implement the trusted internal command endpoint.
- Accept the approved new country and actor metadata.
- Write the new version record.
- Synchronize current value into `User Service`.
- Return success only if the change is fully applied.
- Return a recoverable failure state if sync fails.

Clarification:

- Public user-facing request creation is outside this service boundary unless explicitly added later.
- This command API is for internal orchestration of approved changes.

Exit criteria:

- Admin or internal orchestration can apply a country change through one controlled path.

## Stage 18 — Add Admin-Oriented Data Shaping

Goal:

- Make the returned data useful for manual decisions without overloading the API consumer.

Tasks:

- Shape user geo-profile responses around manual review needs.
- Include compact ranked-country views per session.
- Include enough timestamps to understand temporal drift.
- Include current review recommendation state.
- Include declared-country version chain in a readable order.
- Avoid leaking unnecessary low-level network data.

Exit criteria:

- The admin interface can render useful country history and session separation without extra joins.

## Stage 19 — Add Observability and Operational Controls

Goal:

- Make the service operable in production before traffic ramps up.

Tasks:

- Add metrics for every critical path:
  - ingest publish receipt
  - queue depth and lag
  - worker throughput
  - Geo-IP lookup failures
  - ranking updates
  - review-flag transitions
  - block requests
  - user-service sync failures
  - mail and event failures
- Add structured logs with correlation identifiers where possible.
- Add readiness and liveness endpoints.
- Add dashboards and alerts for:
  - queue lag
  - persistent sync failures
  - spike in suspicious session blocks
  - Geo-IP database stale age

Exit criteria:

- Production operation does not depend on manual log-grepping.

## Stage 20 — Add Test Coverage in Increasing Layers

Goal:

- Validate the service incrementally, from pure logic up to full integration.

Tasks:

- Add unit tests for:
  - Geo-IP lookup adapter
  - ranking logic
  - `usual_connection_country` selection
  - review recommendation logic
  - suspicious session detection
- Add storage tests for:
  - observation persistence
  - version history
  - queue behavior
- Add integration tests for:
  - edge-style ingest acceptance
  - worker processing
  - `User Service` sync behavior
  - `Auth / Session Service` block calls
  - event and mail side effects
- Add failure-path tests:
  - malformed FlatBuffers payload
  - queue retry
  - Geo-IP lookup miss
  - `User Service` sync failure
  - block-request downstream failure

Exit criteria:

- The highest-risk logic and all external integrations are covered.

## Stage 21 — Add Data Migration and Backfill Strategy

Goal:

- Prepare for safe rollout in an existing microservice environment.

Tasks:

- Create initial database migrations.
- Define zero-data bootstrap behavior for new users and sessions.
- Define how existing users with already populated `declared_country` in `User Service` appear in Geo Profile Service before any version history exists.
- Decide whether an initial synthetic version record is needed for current production users.
- Add operational scripts for repair and backfill if required.

Exit criteria:

- The service can be introduced without corrupting current user country state.

## Stage 22 — Roll Out in Shadow Mode

Goal:

- Validate the service behavior before relying on its outputs operationally.

Tasks:

- Deploy Geo Profile Service without enabling admin actions or session blocking.
- Publish ingest data from edge asynchronously.
- Process observations and compute derived state silently.
- Observe queue behavior, lookup correctness, score stability, and storage growth.
- Compare resulting data shape against expected real traffic behavior.
- Tune thresholds for:
  - review recommendation
  - suspicious mixed-country detection
  - score decay

Exit criteria:

- The service behaves sanely on production-shaped traffic without affecting users.

## Stage 23 — Enable Review Workflow

Goal:

- Turn on the first functionality that internal consumers actually rely on.

Tasks:

- Enable review-candidate listing in the admin interface.
- Enable user geo-profile rendering.
- Enable approved country-change application path.
- Keep session blocking disabled if needed for a staged rollout.
- Verify that `User Service` stays consistent with declared-country version history.

Exit criteria:

- Administrators can inspect users and apply country changes safely.

## Stage 24 — Enable Suspicious Session Blocking

Goal:

- Turn on the account-protection part of the service.

Tasks:

- Enable session-block command emission to `Auth / Session Service`.
- Start with conservative thresholds.
- Monitor false positives closely.
- Add temporary operational kill-switches for the detection path.
- Verify that only suspicious sessions are blocked and not entire accounts.

Exit criteria:

- The service can protect accounts without destabilizing the rest of the platform.

## Stage 25 — Stabilize and Simplify

Goal:

- Remove accidental complexity after the first complete iteration.

Tasks:

- Review actual queue backlog behavior.
- Review observation retention cost.
- Review whether optional hashed IP storage is still unnecessary.
- Review scoring tunability versus implementation complexity.
- Remove dead code and speculative abstractions.
- Freeze the v1 API once real consumers are stable.

Exit criteria:

- The service remains small, understandable, and aligned with its original narrow purpose.

## Delivery Sequence Summary

Recommended delivery order:

- Domain vocabulary and ownership
- Domain model
- FlatBuffers schema
- Async ingest transport
- Internal durable queue
- Geo-IP lookup
- Observation persistence
- Session ranking
- `usual_connection_country`
- Review state
- Event and mail notifications
- Suspicious-session detection
- Session blocking integration
- Declared-country versioning
- Sync to `User Service`
- Admin read API
- Country-change command API
- Admin-oriented data shaping
- Observability
- Tests
- Data migration and backfill
- Shadow rollout
- Review enablement
- Blocking enablement
- Cleanup

## Final Acceptance Criteria

The implementation may be considered complete for v1 when all of the following are true:

- `Edge Service` publishes authenticated country observations asynchronously without affecting request processing.
- Geo Profile Service resolves and stores `observed_country`.
- The service maintains per-`device_session_id` country ranking and `usual_connection_country`.
- `country_review_recommended` is durable, queryable, and not event-dependent.
- Admin tooling can fetch review candidates and per-user geo profiles.
- Approved `declared_country` changes are versioned in Geo Profile Service and synchronized into `User Service`.
- Suspicious sessions can be blocked through `Auth / Session Service`.
- Optional email and event notifications work without becoming correctness dependencies.
- The service is observable and operable under real traffic.