# PostgreSQL Migration Plan

This plan has already been implemented and is kept here for historical reasons. It should NOT be treated as the source of truth for service functionality.

## Context

The Galaxy Game project currently uses Redis as the only persistence backend across all implemented services (`user`, `mail`, `notification`, `lobby`, `gateway`, `authsession`). Redis serves two kinds of state. The first is ephemeral and runtime-coordination state, where it shines (Streams, caches, replay keys, runtime queues, session caches, leases). The second is table-shaped business state, where it is a poor fit (durable user accounts, entitlements/sanctions, mail audit records, notification routes/idempotency, lobby memberships and invites). Replication and standby for Redis are not configured anywhere. There is no SQL/migration tooling in the repo at all.

We migrate to a Redis + PostgreSQL split where each backend owns the data it serves best. PostgreSQL becomes the source of truth for table-shaped business state and gives us ACID transactions, mature physical/logical replication, and backup/restore via `pg_dump` and WAL archiving. Redis remains the source of truth for streams, pub/sub, caches, leases, replay keys, rate limits, session caches, runtime queues, and stream consumer offsets.

The plan migrates only services that are already implemented and explicitly excludes `galaxy/game`. It lands the steady-state architecture rules first, in one authoritative document (`ARCHITECTURE.md`). It then walks each service end to end (code, tests, service-local README/docs, and integration suites) so that no intermediate commit leaves docs and code in conflict.

## Confirmed decisions (with project owner)

1. **Documentation strategy**: `ARCHITECTURE.md` is updated as the very first stage with the architecture-wide rules. Each per-service README and per-service `docs/` tree changes inside that service's own stage, paired with code and tests.
   This keeps `ARCHITECTURE.md` ≡ policy and README ≡ current state, and ensures any commit can be checked out without code/doc divergence.
2. **Service scope**: full migration of durable storage to PostgreSQL for `user`, `mail`, `notification`, `lobby`. For `gateway` and `authsession`, only a Redis configuration refactor (master/replica + mandatory password, drop `TLS_ENABLED` / `USERNAME`); these services intentionally stay Redis-only. `geoprofile` has no implementation; its `PLAN.md` and `README.md` absorb the new persistence rules so that a future implementation follows them.
3. **Idempotency and retry-schedule placement**: idempotency records and retry schedule queues live in PostgreSQL on the same table as the durable record they protect (`(producer, idempotency_key)` UNIQUE on `records`, `next_attempt_at` column on `deliveries` / `routes`). One source of truth, no dual-write hazard between PG and Redis ZSETs.
4. **Stack**: `github.com/jackc/pgx/v5` driver, exposed as `*sql.DB` via `github.com/jackc/pgx/v5/stdlib`. `github.com/go-jet/jet/v2` for type-safe query building + code generation, generated against a testcontainers PostgreSQL instance with migrations applied (Makefile target per service). `github.com/pressly/goose/v3` library API for embedded migrations applied at service startup; the `goose` CLI may be used for local development and rollback investigations but is not in the service binary path.
5. **Code**: all PostgreSQL queries must use pre-generated `jet` code and the appropriate builders rather than raw SQL, unless a business scenario cannot be expressed because `go-jet` lacks the required functionality.

## Architectural rules (target steady-state)

These rules land in `ARCHITECTURE.md` in Stage 0 and govern every subsequent service stage.

### Backend assignment

PostgreSQL is the source of truth for:

- Domain entities with table-shaped business state (`accounts`, `entitlement_records`, `sanction_records`, `limit_records`, `blocked_emails`, `deliveries`, `attempts`, `dead_letters`, `malformed_commands`, `notification_records`, `notification_routes`, `games`, `applications`, `invites`, `memberships`, `race_names`).
- Idempotency records (UNIQUE constraint on the durable table, not a separate kv).
- Retry scheduling state (`next_attempt_at` column + supporting index on the durable table).
- Audit history records that must outlive any Redis snapshot.

Redis is the source of truth for:

- Redis Streams used as the event bus (`user:domain_events`, `user:lifecycle_events`, `gm:lobby_events`, `runtime:job_results`, `notification:intents`, `gateway:client-events`, `mail:delivery_commands`).
- Stream consumer offsets (small runtime coordination state, rebuildable).
- Caches and projections (gateway session cache).
- Replay reservation keys.
- Rate limit counters.
- Runtime coordination locks/leases (e.g. notification `route_leases`).
- Authentication challenge state and active session tokens (TTL-bounded; loss is recoverable by re-authentication).
- Ephemeral per-game runtime aggregates that are deleted at game finish (lobby `game_turn_stats`, `gap_activated_at`, capability evaluation marker).

### Database topology

- Single PostgreSQL database `galaxy`.
- Schema-per-service: `user`, `mail`, `notification`, `lobby`. Reserved for later: `geoprofile`. Not allocated unless needed: `gateway`, `authsession`.
- Per-service PostgreSQL role with grants restricted to its own schema (defense-in-depth, simple to express in the initial migration).
- Authentication: username + password only. `sslmode=disable`. No client certificates, no SCRAM channel binding, no custom auth plugins.
- Each service connects to one primary plus zero-or-more read-only replicas.
  In this iteration only the primary is used; the replica pool is wired but receives no traffic. Future read-routing is non-breaking.

### Redis topology

- Each service connects to one master Redis plus zero-or-more replica Redis hosts.
- All connections use a mandatory password. `USERNAME`/ACL is not used. TLS is off.
- In this iteration only the master is used; the replica list is wired but unused, so routing reads to replicas later is a non-breaking switch.
- Existing env vars `*_REDIS_TLS_ENABLED` and `*_REDIS_USERNAME` are removed (hard rename; no backward-compat shim, since this is a fresh project with no production deploys to migrate).

### Library stack

- Driver: `github.com/jackc/pgx/v5` (modern, actively maintained), exposed to `database/sql` via `github.com/jackc/pgx/v5/stdlib` so go-jet's `qrm.Queryable` interface is satisfied without changes.
- Query layer: `github.com/go-jet/jet/v2` (PostgreSQL dialect). Generated code lives under each service's `internal/adapters/postgres/jet/`, is regenerated via a `make jet` target, and is committed to the repo.
- Migrations: `github.com/pressly/goose/v3` library API; migration files embedded via `//go:embed *.sql`; applied at startup, before opening any HTTP/gRPC listener; non-zero exit on failure.
- Test infrastructure: `github.com/testcontainers/testcontainers-go` plus the `modules/postgres` submodule; the same setup is reused by `make jet` to host a transient instance for jet codegen.

### Migration discipline

- Forward-only sequence-numbered files: `00001_init.sql`, `00002_*.sql`, …
- Lowercase snake_case names; goose `-- +goose Up` / `-- +goose Down` markers. Statements that contain semicolons of their own (e.g. PL/pgSQL function bodies) are wrapped in `-- +goose StatementBegin` / `-- +goose StatementEnd`; goose already runs each migration file in a transaction unless the file is marked `-- +goose NO TRANSACTION`.
- Migrations apply at service startup; the service exits non-zero on failure.
- A per-service decision record at `galaxy//docs/postgres-migration.md` captures schema decisions and any non-trivial deviation from the rules.
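
The naming half of the migration discipline can be checked mechanically, for example from a unit test that lists the embedded `*.sql` files. The sketch below assumes a hypothetical `validateMigrationNames` helper (it is not part of goose) and five-digit sequence prefixes as in `00001_init.sql`. It rejects names that are not lowercase snake_case, as well as gaps and duplicate sequence numbers.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// migrationName matches files such as "00001_init.sql" or "00002_race_names.sql":
// a five-digit sequence number, an underscore, and a lowercase snake_case name.
var migrationName = regexp.MustCompile(`^(\d{5})_([a-z0-9]+(?:_[a-z0-9]+)*)\.sql$`)

// validateMigrationNames (hypothetical helper) checks that every file follows the
// naming rule and that sequence numbers start at 1 and increase by exactly one,
// which rules out gaps and duplicates.
func validateMigrationNames(files []string) error {
	sorted := append([]string(nil), files...)
	sort.Strings(sorted) // zero-padded prefixes sort numerically
	for i, f := range sorted {
		m := migrationName.FindStringSubmatch(f)
		if m == nil {
			return fmt.Errorf("migration %q: want NNNNN_snake_case.sql", f)
		}
		seq, err := strconv.Atoi(m[1])
		if err != nil {
			return fmt.Errorf("migration %q: bad sequence number: %w", f, err)
		}
		if seq != i+1 {
			return fmt.Errorf("migration %q: want sequence %05d", f, i+1)
		}
	}
	return nil
}
```

Where such a check runs (a unit test or a CI step) is an assumption; the plan itself only states the naming rules.
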

### Per-service code organisation

```text
galaxy//
  internal/
    adapters/
      postgres/
        migrations/   # *.sql files + migrations.go (//go:embed)
        jet/          # generated; commit-checked
        /             # adapter implementations matching internal/ports
    config/
      config.go       # adds Postgres + new Redis schema
  Makefile            # `jet` target: testcontainers + goose + jet
```

### Test patterns

- Per-service unit tests run against a real PostgreSQL via `testcontainers-go`; they replace the corresponding miniredis test path wherever storage moved to PG.
- Shared port-test suites (e.g. `lobby/internal/ports/racenamedirtest/`) gain a Postgres harness; they remain backend-agnostic in shape.
- `integration/internal/harness/postgres_container.go` is added; integration suites that need PG declare it next to their existing Redis container.
- Stub adapters (`*stub/`) are kept where the in-memory port is useful for tests that don't need a real backend. Redis adapters that previously implemented these ports are removed (no dead code).

### Configuration env vars (target)

For each service `` ∈ { `USERSERVICE`, `MAIL`, `NOTIFICATION`, `LOBBY`, `GATEWAY`, `AUTHSESSION` }:

- `_REDIS_MASTER_ADDR` (required)
- `_REDIS_REPLICA_ADDRS` (optional, comma-separated; default empty)
- `_REDIS_PASSWORD` (required)
- `_REDIS_DB` (default 0)
- `_REDIS_OPERATION_TIMEOUT` (default 250ms)

For PG-backed services (`USERSERVICE`, `MAIL`, `NOTIFICATION`, `LOBBY`):

- `_POSTGRES_PRIMARY_DSN` (required; e.g. `postgres://userservice:secret@postgres:5432/galaxy?search_path=user&sslmode=disable`)
- `_POSTGRES_REPLICA_DSNS` (optional, comma-separated)
- `_POSTGRES_OPERATION_TIMEOUT` (default 1s)
- `_POSTGRES_MAX_OPEN_CONNS` (default 25)
- `_POSTGRES_MAX_IDLE_CONNS` (default 5)
- `_POSTGRES_CONN_MAX_LIFETIME` (default 30m)

The DSN sets `search_path=` so unqualified table references resolve into the service-owned schema; `sslmode=disable` is set explicitly per the "no TLS" requirement.

Service-prefix-specific stream/keyspace env vars (`*_REDIS_DOMAIN_EVENTS_STREAM`, `*_REDIS_LIFECYCLE_EVENTS_STREAM`, `*_REDIS_KEYSPACE_PREFIX`, `MAIL_REDIS_COMMAND_STREAM`, etc.) keep their current names and semantics; they describe stream/key shapes, not connection topology.

---

## Stages

Each stage is independently executable and shippable.

### ~~Stage 0~~ — Architecture-wide rules and PG_PLAN.md materialisation

This stage is implemented.

**Goal**: land the steady-state rules in `ARCHITECTURE.md` and place `PG_PLAN.md` at the project root so subsequent `/stage-implementation` invocations have an authoritative reference.

**Actions**:

1. Write the contents of this plan file to `/Users/id/src/go/galaxy/PG_PLAN.md`.
2. Add a new section to `ARCHITECTURE.md` (e.g. `§9 Persistence Backends`) capturing every rule under the *Architectural rules* heading above: backend assignment, database/Redis topology, library stack, migration discipline, code organisation, test patterns, env-var conventions.
3. Add a short *Migration Window* sub-section to `ARCHITECTURE.md` noting that until all `PG_PLAN.md` stages complete, each service's `README.md` continues to describe its actual current state; this caveat is removed in Stage 9.
4. Adjust `ARCHITECTURE.md §8` (publisher rules) so cross-references distinguish "Redis Stream" (event bus, stays Redis) from "PG-backed table" (durable record).

**Files (modified / new)**:

- `/Users/id/src/go/galaxy/PG_PLAN.md` — new
- `/Users/id/src/go/galaxy/ARCHITECTURE.md` — modified

**Out of scope**: zero service code, zero per-service README/docs, zero `go.mod` changes, zero new dependencies in service modules.

**Verification**:

- `git diff --stat` reports two paths only: `PG_PLAN.md`, `ARCHITECTURE.md`.
- `ARCHITECTURE.md` reads coherently end to end, with the new section cross-referenced from §8 and from any other place that today says "Redis is the v1 backend".
- Manual: read `PG_PLAN.md` top to bottom and confirm every architectural decision matches the section in `ARCHITECTURE.md`.

---

### ~~Stage 1~~ — Shared infrastructure packages (`pkg/postgres`, `pkg/redisconn`)

This stage is implemented.

**Goal**: provide one canonical helper each for Postgres and Redis so per-service stages don't reinvent connection/migration wiring. No service consumes them yet.

**Files (new)**:

- `pkg/postgres/config.go` — `Config` struct (PrimaryDSN, ReplicaDSNs, OperationTimeout, MaxOpenConns, MaxIdleConns, ConnMaxLifetime); helper `LoadFromEnv(prefix string) (Config, error)` that reads `_POSTGRES_*`.
- `pkg/postgres/open.go` — `OpenPrimary(ctx, cfg) (*sql.DB, error)` and `OpenReplicas(ctx, cfg) ([]*sql.DB, error)` using `pgx.ConnConfig` → `stdlib.OpenDB(...)`; configures pool sizes and per-statement context timeout.
- `pkg/postgres/migrate.go` — `RunMigrations(ctx context.Context, db *sql.DB, fs embed.FS) error` wrapping `goose.SetBaseFS(fs)` + `goose.UpContext`.
- `pkg/postgres/otel.go` — `Instrument(db *sql.DB, telemetry telemetry.Runtime)` applying `otelsql.RegisterDBStatsMetrics` and statement spans.
- `pkg/postgres/postgres_test.go` — testcontainers-backed smoke test: open primary, run a one-line migration, insert/select.
- `pkg/redisconn/config.go` — `Config` struct (MasterAddr, ReplicaAddrs, Password, DB, OperationTimeout); helper `LoadFromEnv(prefix string) (Config, error)` that reads `_REDIS_*` (the new shape only; rejects deprecated TLS/USERNAME vars with a clear error).
- `pkg/redisconn/client.go` — `NewMasterClient(cfg) *redis.Client` and `NewReplicaClients(cfg) []*redis.Client` (the latter returns nil/empty when replicas are not configured).
- `pkg/redisconn/otel.go` — `Instrument(client *redis.Client, telemetry telemetry.Runtime)` applying `redisotel.InstrumentTracing` / `InstrumentMetrics`.
- `pkg/redisconn/redisconn_test.go` — miniredis-backed config and master client tests.

**Files (touched)**:

- `pkg/go.mod` — add `github.com/jackc/pgx/v5` (the module that also provides the `pgx/v5/stdlib` package), `github.com/pressly/goose/v3`, `github.com/testcontainers/testcontainers-go/modules/postgres`, and `github.com/XSAM/otelsql` for DB instrumentation (alternative: `go.nhat.io/otelsql`; pick one in implementation).
- `go.work` — confirm `pkg/` is registered (it already is).

**Verification**:

- `cd /Users/id/src/go/galaxy/pkg && go test ./postgres/... ./redisconn/...` passes locally with Docker available.
- `go vet ./...` is clean.

---

### ~~Stage 2~~ — Integration test harness extension

This stage is implemented.

**Goal**: extend `integration/internal/harness/` with a Postgres container helper and a service-bootstrap helper that builds the per-service DSN with the right `search_path`. All existing integration suites stay green.

**Files (new)**:

- `integration/internal/harness/postgres_container.go` — `StartPostgresContainer(t testing.TB) *PostgresRuntime`. The runtime exposes `BaseDSN()`, `DSNForSchema(schema, role string) string`, and `EnsureRoleAndSchema(ctx, schema, role, password string) error` so each test can prepare an isolated schema for the service it is booting.
- `integration/internal/harness/postgres_container_test.go` — smoke test.

**Files (touched)**:

- `integration/internal/harness/binary.go` — extend the `Process`/launch helpers with `WithPostgres(rt *PostgresRuntime, schema, role string)`, which injects the right `_POSTGRES_PRIMARY_DSN`. (The existing API already takes `env map[string]string`; this is a thin wrapper.)
- `integration/go.mod` — add the testcontainers Postgres module.

**Out of scope**: no integration suite is wired to Postgres yet; each service stage wires in its own suites.

**Verification**:

- `cd integration && go test ./internal/harness/...` passes.
- `cd integration && go test ./...` is still green for all existing suites (Redis-only services remain Redis-only).
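
What `EnsureRoleAndSchema` and `DSNForSchema` need to produce can be sketched with two pure helpers. The helper names and the explicit `password` parameter are assumptions; the planned `DSNForSchema(schema, role string)` presumably reuses the password given to `EnsureRoleAndSchema`. Identifiers are always double-quoted because `user` is a reserved word in PostgreSQL: `CREATE SCHEMA user` is a syntax error, while `CREATE SCHEMA "user"` works. The password is inlined as a quoted literal because `CREATE ROLE` does not accept bind parameters.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// quoteIdent double-quotes a PostgreSQL identifier, doubling embedded quotes.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// quoteLiteral single-quotes a string literal, doubling embedded quotes.
// Assumes standard_conforming_strings=on (the PostgreSQL default).
func quoteLiteral(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// bootstrapStatements (hypothetical) returns the DDL a harness could run as the
// superuser so one service gets its own login role and a schema owned by it.
// CREATE ROLE is not idempotent, which is fine for a fresh per-test container.
func bootstrapStatements(schema, role, password string) []string {
	s, r := quoteIdent(schema), quoteIdent(role)
	return []string{
		fmt.Sprintf("CREATE ROLE %s LOGIN PASSWORD %s", r, quoteLiteral(password)),
		fmt.Sprintf("CREATE SCHEMA IF NOT EXISTS %s AUTHORIZATION %s", s, r),
	}
}

// dsnForSchema (hypothetical) rewrites a base DSN so it logs in as role and
// resolves unqualified table names in schema, with TLS disabled per the plan.
func dsnForSchema(baseDSN, schema, role, password string) (string, error) {
	u, err := url.Parse(baseDSN)
	if err != nil {
		return "", fmt.Errorf("parse base DSN: %w", err)
	}
	u.User = url.UserPassword(role, password)
	q := u.Query()
	q.Set("search_path", schema)
	q.Set("sslmode", "disable")
	u.RawQuery = q.Encode() // Encode sorts keys, so the output is deterministic
	return u.String(), nil
}
```
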

---

### ~~Stage 3~~ — User Service migration (pilot)

**Goal**: replace User Service's Redis durable storage with PostgreSQL. The two Redis Streams (`user:domain_events`, `user:lifecycle_events`) remain on Redis. This stage is the pilot; subsequent service stages copy its shape.

**Schema (`user` schema)**:

- `accounts` (user_id PK, email UNIQUE, user_name UNIQUE, display_name, preferred_language, time_zone, declared_country, created_at, updated_at, deleted_at).
- `blocked_emails` (email PK, reason_code, blocked_at, actor_type, actor_id, resolved_user_id).
- `entitlement_records` (record_id PK, user_id FK, plan_code, is_paid, starts_at, ends_at, source, actor_type, actor_id, reason_code, updated_at).
- `entitlement_snapshots` (user_id PK FK → accounts, …current effective values mirroring Redis snapshot shape).
- `sanction_records` (record_id PK, user_id FK, sanction_code, scope, reason_code, actor_type, actor_id, applied_at, expires_at, removed_at, removed_by_type, removed_by_id, removed_reason_code).
- `sanction_active` (user_id, sanction_code, record_id) PRIMARY KEY (user_id, sanction_code).
- `limit_records`, `limit_active` — analogous to sanctions.
- Indexes: `accounts(created_at DESC, user_id DESC)` for newest-first pagination; `accounts(declared_country)`; `entitlement_snapshots(plan_code, is_paid)`; `entitlement_snapshots(ends_at) WHERE is_paid AND ends_at IS NOT NULL`; `sanction_active(sanction_code)`; `limit_active(limit_code)`. Eligibility flags become computed predicates on these columns.

**Files (new)**:

- `galaxy/user/internal/adapters/postgres/migrations/00001_init.sql` — full schema with grants (`GRANT USAGE ON SCHEMA "user" TO userservice; GRANT … ON ALL TABLES …;`; the schema name must be double-quoted in SQL because `USER` is a reserved word).
- `galaxy/user/internal/adapters/postgres/migrations/migrations.go` — `//go:embed *.sql` and a `Migrations() embed.FS` accessor.
- `galaxy/user/internal/adapters/postgres/jet/...` — generated code (commit-checked).
- `galaxy/user/internal/adapters/postgres/userstore/store.go` — Postgres implementation of `ports.UserAccountStore` and `ports.AuthDirectoryStore`.
- `galaxy/user/internal/adapters/postgres/userstore/entitlement_store.go` — Postgres implementation of `EntitlementSnapshotStore` and `EntitlementHistoryStore`.
- `galaxy/user/internal/adapters/postgres/userstore/policy_store.go` — Postgres implementation of `SanctionStore` and `LimitStore`.
- `galaxy/user/internal/adapters/postgres/userstore/list_store.go` — Postgres implementation of `UserListStore` (pagination + filters expressed as SQL).
- `galaxy/user/internal/adapters/postgres/userstore/store_test.go` and siblings — testcontainers-backed unit tests covering the same matrix the current Redis tests cover.
- `galaxy/user/Makefile` — `jet` target.
- `galaxy/user/docs/postgres-migration.md` — decision record (schema shape, why we keep `entitlement_snapshots` denormalised, eligibility expressed as SQL predicates, schema role grants).

**Files (touched)**:

- `galaxy/user/internal/config/config.go` — add Postgres config; refactor Redis config to master/replica/password (drop `TLS_ENABLED`, `USERNAME`).
- `galaxy/user/internal/config/config_test.go` — update to the new env shape.
- `galaxy/user/internal/app/runtime.go` — open the Postgres pool, run migrations on startup before listeners open, and wire the postgres adapters into services. The Redis client now serves only the two stream publishers.
- `galaxy/user/README.md` — replace "Redis-backed user state" with the new persistence model; update the env-var section.
- `galaxy/user/docs/runbook.md`, `galaxy/user/docs/runtime.md`, `galaxy/user/docs/examples.md` — update storage references and config sections.
- `galaxy/user/go.mod` — add `github.com/jackc/pgx/v5` (which also provides `pgx/v5/stdlib`), `github.com/pressly/goose/v3`, `github.com/go-jet/jet/v2`, `github.com/testcontainers/testcontainers-go/modules/postgres`. Use `pkg/postgres`, `pkg/redisconn`.

**Files (deleted)**:

- `galaxy/user/internal/adapters/redis/userstore/` — entire directory.
- The portions of `galaxy/user/internal/adapters/redisstate/keyspace.go` that defined account/entitlement/sanction/limit/index keys (keep only what the `domainevents` and `lifecycleevents` publishers still require; if nothing, delete the file outright).

**Files retained on Redis**:

- `galaxy/user/internal/adapters/redis/domainevents/publisher.go`.
- `galaxy/user/internal/adapters/redis/lifecycleevents/publisher.go`.

**Touched integration suites** (each gets a Postgres container in addition to the existing Redis one):

- `integration/authsessionuser/`
- `integration/gatewayauthsessionuser/`
- `integration/gatewayauthsessionusermail/`
- `integration/notificationuser/`
- `integration/lobbyuser/`

**Verification**:

- `cd galaxy/user && make jet && go test ./...` (Docker needed).
- `cd integration && go test ./authsessionuser/... ./gatewayauthsessionuser/... ./gatewayauthsessionusermail/... ./notificationuser/... ./lobbyuser/...`
- Manual smoke test against a `docker-compose` stack (PG + Redis with passwords) using flows from `galaxy/user/docs/examples.md`.

---

### ~~Stage 4~~ — Mail Service migration

This stage is implemented.

**Goal**: move durable mail storage (deliveries, attempts, dead letters, malformed commands, payloads, idempotency, attempt schedule) into PostgreSQL. Keep Redis only for the inbound `mail:delivery_commands` stream and its consumer offset.

**Schema (`mail` schema)**:

- `deliveries` (delivery_id PK, source, status, recipient_envelope JSONB, subject, text_body, html_body, payload_mode, template_id, idempotency_source, idempotency_key, locale_fallback_used, next_attempt_at, attempt_count, max_attempts, created_at, updated_at).
  - INDEX (status, next_attempt_at) for the scheduler.
  - UNIQUE (idempotency_source, idempotency_key) — the idempotency record IS this row (no separate kv).
  - INDEX (created_at DESC) for operator listings; INDEX on status, source, template_id, recipient as needed.
- `attempts` (delivery_id FK, attempt_no, status, provider_summary, scheduled_for_ms, started_at_ms, completed_at_ms, PRIMARY KEY (delivery_id, attempt_no)).
- `dead_letters` (delivery_id PK FK, final_attempt_count, max_attempts, failure_classification, failure_message, created_at_ms).
- `delivery_payloads` (delivery_id PK FK, template_variables JSONB).
- `malformed_commands` (stream_entry_id PK, failure_code, failure_message, raw_fields JSONB, recorded_at_ms; INDEX (recorded_at_ms)).

**Files**: mirror Stage 3 (postgres adapter package, migrations, jet codegen, Makefile, decision record, removal of the corresponding `internal/adapters/redisstate/*` files for migrated entities, retention of stream offset and consumer wiring on Redis).

**Worker change**: the mail attempt scheduler loop replaces `ZRANGEBYSCORE` over `mail:attempt_schedule` with `SELECT … FROM deliveries WHERE status IN ('queued','retry_pending') AND next_attempt_at <= now() ORDER BY next_attempt_at LIMIT N FOR UPDATE SKIP LOCKED`.

**Files (deleted)**:

- `galaxy/mail/internal/adapters/redisstate/auth_acceptance_store.go`
- `galaxy/mail/internal/adapters/redisstate/generic_acceptance_store.go`
- `galaxy/mail/internal/adapters/redisstate/attempt_execution_store.go`
- `galaxy/mail/internal/adapters/redisstate/operator_store.go`
- `galaxy/mail/internal/adapters/redisstate/malformed_command_store.go`
- `galaxy/mail/internal/adapters/redisstate/render_store.go`
- The portions of `galaxy/mail/internal/adapters/redisstate/keyspace.go` no longer used (`mail:attempt_schedule`, `mail:idempotency:*`, all delivery/attempt/dead-letter/index keys).

**Files retained on Redis**:

- `galaxy/mail/internal/adapters/redisstate/stream_offset_store.go` (offset for the `mail:delivery_commands` consumer).
- The command stream consumer wiring itself.

**Touched integration suites**:

- `integration/authsessionmail/`
- `integration/gatewayauthsessionmail/`
- `integration/gatewayauthsessionusermail/`
- `integration/notificationmail/`

**Verification**: per the Stage 3 pattern, plus an end-to-end smoke test that pushes a delivery through `retry_pending` → `provider_accepted` using the SMTP stub.

---

### ~~Stage 5~~ — Notification Service migration

This stage is implemented.

**Goal**: move durable notification storage (records, routes, idempotency, dead letters, malformed intents) into PostgreSQL. Keep Redis for the inbound `notification:intents` stream, the outbound `gateway:client-events` stream, the outbound `mail:delivery_commands` stream, the corresponding stream offsets, and the short-lived per-route lease (`route_leases:*`).

**Schema (`notification` schema)**:

- `records` (notification_id PK, notification_type, producer, audience_kind, recipient_user_ids JSONB, payload JSONB, idempotency_key, request_fingerprint, request_id, trace_id, occurred_at_ms, accepted_at_ms, updated_at_ms).
  - UNIQUE (producer, idempotency_key) — the idempotency record IS this row.
- `routes` (notification_id, route_id, channel, recipient_ref, status, attempt_count, max_attempts, next_attempt_at_ms, resolved_email, resolved_locale, last_error_classification, last_error_message, last_error_at_ms, created_at_ms, updated_at_ms, published_at_ms, dead_lettered_at_ms, skipped_at_ms, PRIMARY KEY (notification_id, route_id)).
  - INDEX (status, next_attempt_at_ms) for the scheduler.
- `dead_letters` (notification_id, route_id PK FK, channel, recipient_ref, final_attempt_count, max_attempts, failure_classification, failure_message, recovery_hint, created_at_ms).
- `malformed_intents` (stream_entry_id PK, notification_type, producer, idempotency_key, failure_code, failure_message, raw_fields JSONB, recorded_at_ms).

**Worker change**: the route publisher selects work via the same `FOR UPDATE SKIP LOCKED` pattern as Mail.
The Redis lease is still used as a short-lived, per-process exclusivity hint atop the SQL claim.

**Files (deleted)**:

- `galaxy/notification/internal/adapters/redisstate/acceptance_store.go`
- `galaxy/notification/internal/adapters/redisstate/route_state_store.go`
- `galaxy/notification/internal/adapters/redisstate/malformed_intent_store.go`
- The portions of `galaxy/notification/internal/adapters/redisstate/keyspace.go` no longer used (records, routes, idempotency, dead_letters, malformed_intents).

**Files retained on Redis**:

- `galaxy/notification/internal/adapters/redisstate/stream_offset_store.go`.
- Route lease key generator (still under `redisstate/`, narrowed to leases only).
- All stream consumer/publisher wiring.

**Touched integration suites**:

- `integration/notificationgateway/`
- `integration/notificationmail/`
- `integration/notificationuser/`

---

### ~~Stage 6A~~ — Lobby Service: core enrollment entities

**Goal**: move `Game`, `Application`, `Invite`, `Membership` records and their indexes into PostgreSQL. RaceNameDirectory, GameTurnStats, GapActivation, EvaluationGuard, and StreamOffset remain on Redis until later sub-stages.

**Schema (`lobby` schema, partial)**:

- `games` (game_id PK, owner_id, kind ('public'|'private'), status, created_at, updated_at, runtime_snapshot JSONB, runtime_binding JSONB, …other denormalised game settings).
  - INDEX (status, created_at).
  - INDEX (owner_id) WHERE kind = 'private'.
- `applications` (application_id PK, game_id FK, user_id, status, canonical_key, submitted_at, decided_at).
  - PARTIAL UNIQUE INDEX (user_id, game_id) WHERE status = 'active' — enforces the single-active constraint at the DB level (replaces `lobby:user_game_application:*:*`).
  - INDEX (game_id), INDEX (user_id).
- `invites` (invite_id PK, game_id FK, inviter_id, invitee_id, race_name, status, created_at, expires_at, decided_at).
  - INDEX (game_id), INDEX (invitee_id), INDEX (inviter_id).
  - INDEX (status, expires_at) for an expiration scanner, if one is needed.
- `memberships` (membership_id PK, game_id FK, user_id, status, joined_at, canonical_key, …).
  - INDEX (game_id), INDEX (user_id).

**Files (new)**:

- `galaxy/lobby/internal/adapters/postgres/migrations/00001_core_entities.sql`.
- `galaxy/lobby/internal/adapters/postgres/migrations/migrations.go`.
- `galaxy/lobby/internal/adapters/postgres/jet/...`.
- `galaxy/lobby/internal/adapters/postgres/gamestore/store.go`.
- `galaxy/lobby/internal/adapters/postgres/applicationstore/store.go`.
- `galaxy/lobby/internal/adapters/postgres/invitestore/store.go`.
- `galaxy/lobby/internal/adapters/postgres/membershipstore/store.go`.
- Test files for each store using the existing test patterns.
- `galaxy/lobby/Makefile` (`jet` target).
- `galaxy/lobby/docs/postgres-migration.md` (decision record covering this sub-stage and what is intentionally left for 6B/6C).

**Files (touched)**:

- `galaxy/lobby/internal/config/config.go` — add Postgres config; refactor Redis config to the new shape.
- `galaxy/lobby/internal/app/runtime.go` — open the Postgres pool, run migrations on startup, and wire the core PG-backed stores into services. RaceNameDirectory and the stats/guard stores stay wired to Redis until 6B/6C.
- `galaxy/lobby/README.md` and `galaxy/lobby/docs/runbook.md` — updated to describe core entities on PG, with RND/stats still on Redis until 6B/6C.

**Files (deleted)**:

- `galaxy/lobby/internal/adapters/redisstate/gamestore.go`, `applicationstore.go`, `invitestore.go`, `membershipstore.go`.
- The corresponding sections of `redisstate/keyspace.go`.

**Stub adapters retained**: `gamestub/`, `applicationstub/`, `invitestub/`, `membershipstub/` stay; they are pure in-memory port implementations useful for tests that don't need a real PG.

**Touched integration suites**:

- `integration/lobbyuser/`
- `integration/lobbynotification/`

**Verification**: per the Stage 3 pattern, plus the existing lobby HTTP contract tests against the public/internal ports.

---

### ~~Stage 6B~~ — Lobby Service: RaceNameDirectory

This stage is implemented.

**Goal**: replace the Lua-backed Redis `RaceNameDirectory` with a PG implementation that preserves the two-tier model (registered / reservation / pending_registration) and atomic registration semantics via SQL transactions and, where required, advisory locks.

**Schema (additions to `lobby` schema)**:

- `race_names` (canonical_key PK, holder_user_id, binding_kind ('registered' | 'reserved' | 'pending_registration'), source_game_id, eligible_until_ms, registered_at_ms, reserved_at_ms).
  - INDEX (holder_user_id) for `ListRegistered`/`ListReservations`/`ListPendingRegistrations` queries.
  - PARTIAL INDEX (eligible_until_ms) WHERE binding_kind = 'pending_registration' for the expiration scanner.
- The confusable-pair policy is enforced at write time inside `BEGIN … COMMIT` transactions; `Reserve`/`Register`/`MarkPendingRegistration` use `SELECT … FOR UPDATE` on the canonical keys involved (or PG advisory locks keyed by `hashtext(canonical_key)`) to serialise concurrent attempts. `SELECT … FOR UPDATE` can only lock rows that already exist, so concurrent first-time claims are serialised by the advisory lock (or, for an identical key, by the primary-key constraint).

**Files (new)**:

- `galaxy/lobby/internal/adapters/postgres/migrations/00002_race_names.sql`.
- `galaxy/lobby/internal/adapters/postgres/racenamedir/directory.go` — Postgres implementation of `ports.RaceNameDirectory`.
- `galaxy/lobby/internal/adapters/postgres/racenamedir/directory_test.go` — runs the existing shared suite at `galaxy/lobby/internal/ports/racenamedirtest/suite.go`.

**Files (touched)**:

- `galaxy/lobby/internal/app/runtime.go` — wire PG RND.
- `galaxy/lobby/internal/ports/racenamedirtest/suite.go` — only shape-preserving updates if the suite assumed Redis-only behaviour (e.g. SCAN-based list ordering).
- `galaxy/lobby/README.md`, `galaxy/lobby/docs/runbook.md` — RND is now PG-backed; the canonical_lookup cache is no longer needed (an indexed PG lookup is fast enough; remove the Redis cache key from `redisstate/keyspace.go`).

**Files (deleted)**:

- `galaxy/lobby/internal/adapters/redisstate/racenamedir.go` and the embedded Lua scripts.
- `galaxy/lobby/internal/adapters/racenamestub/` stays (useful for unit tests that don't need PG).

**Worker change**: the pending-registration expiration worker switches from `ZRANGEBYSCORE` on `lobby:race_names:pending_index` to `SELECT … FROM race_names WHERE binding_kind = 'pending_registration' AND eligible_until_ms <= $1`, where `$1` is the current Unix time in milliseconds (`eligible_until_ms` is an integer millisecond column, so comparing it with `now()` directly would be a type error).

**Verification**: shared port suite (`racenamedirtest`) green against the PG adapter; lobby unit tests green; `integration/lobbyuser/`, `integration/lobbynotification/` green.

---

### ~~Stage 6C~~ — Lobby Service: workers, ephemeral stores, cleanup

This stage is implemented.

**Goal**: finish the lobby migration. Confirm what stays Redis-only, update workers that touch both backends, and drop dead Redis adapters.

**Stays on Redis (per architectural rules)**:

- `GameTurnStatsStore` — ephemeral per-game aggregate, deleted at game finish, rebuildable from GM events.
- `EvaluationGuardStore` — ephemeral marker.
- `GapActivationStore` — short-lived gap-window timestamp cache.
- `StreamOffsetStore` — runtime coordination per the architectural rule.
- All stream consumers and publishers (`gm:lobby_events`, `runtime:job_results`, `user:lifecycle_events`, `notification:intents`).

This is documented in `galaxy/lobby/docs/postgres-migration.md`.

**Files (touched)**:

- `galaxy/lobby/internal/worker/gmevents/consumer.go` — write durable updates via the PG-backed `GameStore`.
- `galaxy/lobby/internal/worker/runtimejobresult/consumer.go` — same.
- `galaxy/lobby/internal/adapters/userlifecycle/consumer.go` (and the worker that drives it) — RND release and the membership/application/invite cascade all flow through PG.
- `galaxy/lobby/internal/worker/pendingregistration/worker.go` — PG-based scan, no Redis ZSET.
- `galaxy/lobby/internal/worker/enrollmentautomation/worker.go` — uses PG `GameStore.GetByStatus("enrollment_open")`.
- `galaxy/lobby/internal/adapters/redisstate/keyspace.go` — pruned to the remaining Redis keys (turn stats, gap activation, evaluation guard, stream offsets, lifecycle stream consumer state).
- `galaxy/lobby/README.md`, `galaxy/lobby/docs/runtime.md`, `galaxy/lobby/docs/runbook.md`, `galaxy/lobby/docs/examples.md` — finalised storage descriptions.

**Files (deleted)**:

- Anything left in `galaxy/lobby/internal/adapters/redisstate/` whose only consumer was a port that is now PG-backed (see 6A/6B deletions).

**Verification**:

- All previously green lobby unit tests pass with PG-backed adapters.
- `integration/lobbyuser/`, `integration/lobbynotification/` pass.
- `grep -rn "redisstate" galaxy/lobby/internal/` matches only code that serves the keys intentionally retained on Redis.

---

### ~~Stage 7~~ — Gateway and Auth/Session: Redis configuration refactor

This stage is implemented.

**Goal**: apply the new Redis configuration shape (master/replica/password, drop TLS/USERNAME) to Gateway and Auth/Session. No PG migration; these services intentionally stay Redis-only.

**Files (touched)**:

- `galaxy/gateway/internal/config/config.go` — switch `RedisConfig` fields to the `pkg/redisconn.Config` shape; update the three prefixes: `GATEWAY_SESSION_CACHE_REDIS_*`, `GATEWAY_REPLAY_REDIS_*`, `GATEWAY_SESSION_EVENTS_REDIS_*`. Drop `TLS_ENABLED`, `USERNAME`.
- `galaxy/gateway/internal/session/redis.go`, `galaxy/gateway/internal/replay/redis.go`, `galaxy/gateway/internal/events/subscriber.go` — adopt the new client constructor via `pkg/redisconn`.
- `galaxy/gateway/internal/config/config_test.go`, `galaxy/gateway/internal/session/redis_test.go`, `galaxy/gateway/internal/replay/redis_test.go` — updated to the new env shape.
- `galaxy/authsession/internal/config/config.go` — same pattern; drop TLS, USERNAME.
- `galaxy/authsession/internal/adapters/redis/sessionstore/store.go`, `challengestore/store.go`, `projectionpublisher/publisher.go`, `sendemailcodeabuse/protector.go`, `configprovider/store.go` — adopt the new client.
- `galaxy/authsession/internal/config/config_test.go` — updated.
- `galaxy/gateway/README.md`, `galaxy/authsession/README.md`, `galaxy/gateway/docs/runbook.md`, `galaxy/authsession/docs/runbook.md` — note that Redis-only is intentional and reference the `ARCHITECTURE.md` rule on TTL-bounded auth state.

**No deletions of business logic**; only the env-var refactor and adapter plumbing through `pkg/redisconn`.

**Touched integration suites**:

- `integration/gatewayauthsession/`
- `integration/authsession/`
- (every suite that boots gateway or authsession picks up the new env vars via the harness; confirm none still pass `*_REDIS_TLS_ENABLED` or `*_REDIS_USERNAME`).

**Verification**:

- `cd galaxy/gateway && go test ./...`
- `cd galaxy/authsession && go test ./...`
- `cd integration && go test ./gatewayauthsession/... ./authsession/...`

---

### ~~Stage 8~~ — GeoProfile: documentation only

**Goal**: ensure the GeoProfile plan and README reflect the new persistence rules so its future implementation follows them. No code exists yet.

**Files (touched)**:

- `galaxy/geoprofile/PLAN.md` — add a stage referencing `pkg/postgres` and `pkg/redisconn`; specify that observed-country aggregates, declared_country history and review records will live in a `geoprofile` schema, while ephemeral per-session signals (if any) stay on Redis.
- `galaxy/geoprofile/README.md` — note ownership of the `geoprofile` schema and the stack choices.

**No code change**.

---

### ~~Stage 9~~ — Final sweep

**Goal**: confirm there is no dead Redis adapter code, no orphaned stub, and no broken doc reference. Remove the *Migration Window* caveat from `ARCHITECTURE.md` once all stages are done.

**Activities**:

- Walk every PG-backed service: `grep -rn "redis" galaxy//internal/adapters/` and verify every match belongs to a still-active stream/cache/runtime use case.
- Walk integration suites: confirm each one provisions only the containers it actually needs and sets no stale env vars.
- Update `ARCHITECTURE.md` to drop the *Migration Window* sub-section.
- Squash each service's sequence of migration `.sql` files into a single initial migration, rewriting the SQL rather than concatenating the files. The project is still in development, so every schema change can go directly into the first (and only) migration of the relevant service. Record this rule in `ARCHITECTURE.md` as well.
- One round of `go test ./...` in every module plus `cd integration && go test ./...`.

**Verification**:

- All tests pass in every module.
- No file matches `// TODO.*postgres` or `// TODO.*migrate`.
- `git grep -nE 'REDIS_(TLS_ENABLED|USERNAME)'` returns nothing under `galaxy/` (these env vars are fully retired).

---

## Verification strategy (whole project)

After each stage:

- `cd /Users/id/src/go/galaxy/pkg && go test ./...`
- `cd /Users/id/src/go/galaxy/ && go test ./...` (with Docker available for testcontainers).
- `cd /Users/id/src/go/galaxy/integration && go test .//...`
- Manual smoke test against a `docker-compose` stack (PG + Redis, both with passwords) using the example flows in each service's `docs/examples.md`.

After Stage 9:

- `cd /Users/id/src/go/galaxy/integration && go test ./...` end to end against real PG + real Redis.
- Confirm `git grep -nE 'REDIS_(TLS_ENABLED|USERNAME)'` returns nothing under `galaxy/`.
- Confirm `git grep -nE 'TODO.*(postgres|migrate)'` returns nothing (`-E` is required; without it `(`, `|` and `)` are matched literally).

## Out of scope

- `galaxy/game` — explicitly excluded by the project owner.
- Production deployment manifests (Helm/k8s) — local `docker-compose` is enough for development.
- Backup/restore tooling configuration — `pg_dump` and WAL archiving are available out of the box; operational setup is not part of this plan.
- Sentinel/Cluster Redis topology code paths — config exposes replica addresses for future use; no failover routing implemented yet.
- Read-traffic routing to PG replicas — config exposes `*_POSTGRES_REPLICA_DSNS` for future use; no routing implemented yet.
- `golangci-lint` config addition — not part of this migration.
- CI pipeline — no `.github/workflows/` exists; not added by this plan.

## Risks and notes

- **`go-jet` codegen requires a live database**. The `make jet` target per service uses `testcontainers-go` to bring up a transient PG, applies the same goose migrations the service applies at startup, then runs `jet -dsn=… -path=internal/adapters/postgres/jet`. Generated code is committed, so consumers don't need Docker just to build.
- **Schema-per-service vs single-DB cross-service joins**: there are no cross-schema joins in this plan. Each service reads only its own schema; cross-service data flows go via Redis Streams (event bus) or HTTP contracts (User Service is queried by Lobby for eligibility), same as today. The DB-level role grants enforce this.
- **Pending registration expiration worker**: under Redis it scanned a global ZSET; under PG it does an indexed scan. The partial index on `eligible_until_ms WHERE binding_kind='pending_registration'` keeps the scan cheap.
- **Idempotency under crash**: with idempotency expressed as a UNIQUE constraint on the durable record, recovery is "the row either exists or it doesn't". There is no Redis-loss window where duplicates can sneak through.
- **lib/pq vs pgx (revisit)**: confirmed pgx/v5 + jet via the stdlib adapter. The `make jet` target passes `-source=postgres` to jet (the dialect is independent of which Go driver runs the queries at runtime).
- **No backward-compat shim for env vars**: `*_REDIS_TLS_ENABLED` and `*_REDIS_USERNAME` are retired in one cut. Any external dev environment that sets these will fail fast at startup with a clear error emitted by `pkg/redisconn.LoadFromEnv`.
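The fail-fast behaviour in the last risk can be sketched as a small startup check. This assumes the check runs over `os.Environ()` and treats any `<PREFIX>_REDIS_TLS_ENABLED` or `<PREFIX>_REDIS_USERNAME` variable as an error. The helper names `retiredRedisVars` and `checkRetiredRedisEnv` are hypothetical. The real `pkg/redisconn.LoadFromEnv` may detect the variables per prefix and word its error differently.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// retiredRedisSuffixes lists env-var suffixes dropped in Stage 7.
// Hypothetical helper data, not the real pkg/redisconn API.
var retiredRedisSuffixes = []string{"TLS_ENABLED", "USERNAME"}

// retiredRedisVars returns, sorted, every "<PREFIX>_REDIS_<SUFFIX>" key found
// in environ (entries in "KEY=VALUE" form, as returned by os.Environ).
func retiredRedisVars(environ []string) []string {
	var found []string
	for _, kv := range environ {
		key, _, _ := strings.Cut(kv, "=")
		for _, suffix := range retiredRedisSuffixes {
			if strings.HasSuffix(key, "_REDIS_"+suffix) {
				found = append(found, key)
			}
		}
	}
	sort.Strings(found)
	return found
}

// checkRetiredRedisEnv returns an error naming every offending variable,
// so a single startup failure lists everything the operator must remove.
func checkRetiredRedisEnv(environ []string) error {
	found := retiredRedisVars(environ)
	if len(found) == 0 {
		return nil
	}
	return fmt.Errorf("retired Redis env vars are no longer supported, remove: %s",
		strings.Join(found, ", "))
}

// checkProcessEnv is how a service's startup path might invoke the check.
func checkProcessEnv() error {
	return checkRetiredRedisEnv(os.Environ())
}
```

Reporting all offending variables at once, instead of stopping at the first one, avoids a fix-restart-fail loop in dev environments that still carry several old settings.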
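Stage 6B serialises `Reserve`/`Register`/`MarkPendingRegistration` with advisory locks keyed by `hashtext(canonical_key)`. One common pattern for that is sketched below: take transaction-scoped locks (`pg_advisory_xact_lock`) on every canonical key involved, for example a name and its confusable counterparts, in a deterministic order. This is an assumption-laden sketch, not the shipped adapter. `lockOrder` and `lockCanonicalKeys` are hypothetical names, and the raw SQL stands in for whatever the adapter expresses through `jet`.

```go
package main

import (
	"context"
	"database/sql"
	"sort"
)

// lockOrder dedupes and sorts canonical keys so every transaction acquires
// its advisory locks in the same global order.
func lockOrder(keys []string) []string {
	seen := make(map[string]struct{}, len(keys))
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		if _, ok := seen[k]; ok {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// lockCanonicalKeys takes transaction-scoped advisory locks inside tx.
// PostgreSQL releases them automatically at COMMIT or ROLLBACK, and they
// also cover keys that have no race_names row yet.
func lockCanonicalKeys(ctx context.Context, tx *sql.Tx, keys []string) error {
	for _, k := range lockOrder(keys) {
		if _, err := tx.ExecContext(ctx, `SELECT pg_advisory_xact_lock(hashtext($1))`, k); err != nil {
			return err
		}
	}
	return nil
}
```

Sorting by key rather than by hash means a rare `hashtext` collision could still form a lock cycle. PostgreSQL's deadlock detector then aborts one transaction, so callers should treat that error as retryable.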