A Go library that returns every legal play ranked by score and scores or validates plays, using the Appel-Jacobson DAWG algorithm over github.com/iliadenisov/dafsa v1.1.0. - DAWG move generation (across / down / both), full tournament scoring with a per-tile breakdown; public Solver: GenerateMoves (ranked), ScorePlay, ValidatePlay. - Rulesets: English Scrabble, Russian Scrabble, Эрудит (parameterizable Ruleset). - cmd/builddict (build the DAWG from the dictionaries submodule), cmd/stress (self-play benchmark), selfplay engine; brute-force test oracle. - A GADDAG was implemented, benchmarked and removed (the DAWG was smaller and faster for a scoring solver); see RESULTS.md and ALGORITHM.md.
10 KiB
Scrabble Solver — Implementation Plan
Outcome (current state)
Both generators were implemented and verified to produce identical moves, then compared
by self-play stress test (RESULTS.md). The GADDAG was removed: for a scoring solver
it was ~7× larger and no faster than the DAWG, which is now the sole generator.
Shipped: the DAWG generator, full scoring + breakdown, the public Solver
(GenerateMoves/ScorePlay/ValidatePlay), and three rulesets (English Scrabble,
Russian Scrabble, Эрудит). The rest of this document is the original roadmap, kept for
history; the DAWG/GADDAG comparison it describes is preserved in RESULTS.md.
Context
We are building a Go library that, given a dictionary, a current game position and a
player's rack, returns every legal new play ranked by descending score. The core is a
fast finite-automaton move generator based on two papers (analysed in ALGORITHM.md):
- Appel & Jacobson, The World's Fastest Scrabble Program (CACM 1988) — the DAWG
algorithm (cross-checks, anchor squares,
LeftPart/ExtendRight, edge encoding, cross-sums for scoring, transpose for the perpendicular direction). - Gordon, A Faster Scrabble Move Generation Algorithm (SP&E 1994) — the GADDAG
(
REV(x)◊yrepresentation, singleGen/GoOngenerator, deterministic cross-sets, construction algorithm).
The graph engine is github.com/iliadenisov/dafsa — a compact, bit-packed minimized
DAWG with a 6-bit "compact alphabet" (alphabet.Indexer, ≤63 symbols) and an
index-based (*B) API, checked out locally at ../dafsa.
Headline approach
Implement BOTH generators — DAWG and GADDAG — behind one shared Generator
interface, then decide which becomes the production default empirically, via a clean
self-play stress test (two greedy players, several games) on the same dictionary,
measuring speed and memory. The choice is made after implementation and measurement.
Both implementations are kept; the comparison output (RESULTS.md) picks the default.
Locked decisions
| # | Topic | Decision |
|---|---|---|
| 1 | Core algorithm | Implement both: DAWG (Appel-Jacobson) and GADDAG (Gordon, over dafsa as a DAWG of {REV(x)◊y} with per-node final flags). Pick the default after a self-play stress test. |
| 2 | dafsa changes | Edit ../dafsa directly, wire via go.mod replace. Leave a spec/CHANGELOG there. |
| 3 | Ruleset scope | Default standard English Scrabble, fully parameterizable (board geometry, premium layout DL/TL/DW/TW, tile values & counts, alphabet, blank count, bingo bonus). Must support Russian "Эрудит" (same 15×15 board + premiums; different tile values/counts; Cyrillic alphabet; one word per move, horizontal or vertical). |
| 4 | Scoring | Full tournament scoring + breakdown: main word + all cross-words + premiums (newly-placed tiles only) + bingo bonus; result carries formed cross-words and a per-tile breakdown. |
| 5 | Symbol encoding | 0x80 = wildcard/blank flag (board/rack/output only — never in the graph). GADDAG separator ◊ = index == alphabet size (cbits minimal; measured optimum). |
| 6 | State model | Compact byte board is the generation core; a structured Play type + a constructor that applies plays to build a board provide the full game-state overlay. |
| 7 | API scope | Generation + scoring + validation of arbitrary plays. |
| 8 | Dictionary | kamilmielnik/scrabble-dictionaries as a git submodule; cmd/builddict builds serialized structures cached in testdata/ (gitignored). English now; Russian later. |
Static structure probe (informs expectations; NOT the decision)
Full SOWPODS (267,752 words, Σlen = 2,439,269), built through dafsa:
| Structure | nodes | bytes | bits/char | build | ns/arc |
|---|---|---|---|---|---|
| DAWG (a–z) | 77,808 | 750 KB | 2.46 | 186 ms | 48.7 |
| GADDAG sep=size(26) · cbits5 | 587,940 | 5.37 MB | 17.61 | 2.92 s | 61.0 |
| GADDAG sep=62 · cbits6 (measured) | 587,940 | 5.53 MB | 18.13 | — | 60.9 |
| GADDAG sep=0x40 · cbits7 (extrapolated) | 587,940 | ~5.69 MB | ~18.6 | — | ~61 |
GADDAG is ~7× the DAWG and ~25% costlier per arc, but ~2× faster at actual move generation (Gordon Table IV: ~2.5× fewer arcs). The stress test settles it.
Deliverable documents
ALGORITHM.md— single source of truth (verbatim pseudocode + our adaptation).PLAN.md— this plan.RESULTS.md— stress-test comparison + the production-default decision.
Architecture & package layout
scrabble-solver/
go.mod # + replace github.com/iliadenisov/dafsa => ../dafsa
PLAN.md ALGORITHM.md RESULTS.md README.md
dictionaries/ # git submodule: kamilmielnik/scrabble-dictionaries
testdata/ # gitignored: cached serialized DAWG + GADDAG
internal/
gaddag/ # REV(x)◊y transform + build + traversal wrapper over dafsa
dictdawg/ # plain-DAWG build + traversal wrapper over dafsa
encoding/ # byte conventions (wildcard 0x80, separator, board cells)
board/ # compact board grid, transpose, premium layout
rack/ # rack as per-letter counts + blanks
rules/ # Ruleset: geometry, premiums, tile values/counts, alphabet, bonus
scrabble/ (public pkg) # Solver + Generator interface; Play/Move types;
gen_dawg.go # DAWG generator (LeftPart/ExtendRight)
gen_gaddag.go # GADDAG generator (Gen/GoOn)
selfplay/ # bag + greedy player + game loop (self-play engine)
cmd/builddict/ # word list -> serialized DAWG/GADDAG -> testdata
cmd/stress/ # run N self-play games per generator, emit comparison
Shared Generator interface so the harness can swap implementations:
type Generator interface {
GenerateMoves(b *board.Board, r rack.Rack, mode Mode) []Move // ranked, descending score
Name() string
}
Board, rack, rules and scoring are shared; cross-set computation is per-generator (DAWG: probe the dictionary DAWG incl. the non-deterministic left set; GADDAG: deterministic GADDAG walk) — that difference is part of what is measured.
Changes to ../dafsa (additive ⇒ backward compatible)
- Low-level traversal API:
type Node(opaque bit-offset);Root() Node;Next(n, ch) (child, final, ok)(= Gordon'sNextArc, wraps privategetEdge);Arcs(n, fn)(wrapsgetNode, for blanks/cross-sets); a reusable allocation-free cursor for hot-pathNext. - Custom-alphabet persistence:
WriteTo/SaveWith(allow non-embedded alphabet);ReadWith/LoadWith(inject a known indexer, skip language reconstruction). - (Optional) accurate serialized arc/node count; document that
NumEdges()is build-time.
Data model & compact formats
- Byte symbol: low 6 bits = alphabet index;
0x80= wildcard/blank (I/O only);◊= indexlen(alphabet)(GADDAG graph only). - Board:
[]byte, row-major.0= empty; occupied =letterIndex+1; blank =(letterIndex+1) | 0x80. Helpers:At,Set,Transpose, premium lookup. - Rack:
[]bytecounts, lengthalphabetSize+1; last slot = blank count. Play:{Row, Col; Dir; Tiles []byte (0x80 flags); Main; CrossWords; Score; Breakdown}— input for apply/validate/score and the output element.- Modes:
Both,Horizontal,Vertical.
Staged implementation
- Stage 0 — Scaffolding & docs:
ALGORITHM.md,PLAN.md,dictionaries/submodule,go.modreplace,.gitignore. - Stage 1 — dafsa traversal API (shared):
Node,Root,Next,Arcs, cursor; tests. - Stage 2 — dafsa custom-alphabet persistence:
SaveWith/ReadWith; round-trip. - Stage 3 — Shared infra: encoding, board (+transpose), rack, rules (EN; Эрудит stub),
scoring,
Generatorinterface,Move/Playtypes. - Stage 4 — Dictionary build:
internal/dictdawg+internal/gaddag;cmd/builddictcaching serialized DAWG and GADDAG intestdata. - Stage 5 — Cross-sets: DAWG cross-checks (incl. non-deterministic left set) and GADDAG deterministic cross-sets; validated against each other + brute force on a small lexicon.
- Stage 6 — DAWG generator (
LeftPart/ExtendRight). - Stage 7 — GADDAG generator (
Gen/GoOn). - Stage 8 — Correctness gate: DAWG and GADDAG identical move sets on random positions (each move once) + brute force on a tiny dictionary. Must pass before perf comparison.
- Stage 9 — Self-play stress test:
selfplayengine (bag, racks, greedy policy, seeded RNG, end conditions);cmd/stressplays N games per generator measuring time, arcs, allocations (runtime.MemStats), peak RSS (/proc/self/statusVmHWM), footprint; emitRESULTS.md. - Stage 10 — Decision + public API: choose default from
RESULTS.md(both selectable); finalizeSolverAPI, Play↔board constructors, examples. - Stage 11 — Polish: benchmarks, README, optional prebuilt-graph distribution.
Verification
go test ./...,go vet, lint green per stage.- Mutual oracle (Stage 8): identical move sets; brute force on a tiny dictionary.
- Build EN structures from the SOWPODS submodule via
cmd/builddict; runGenerateMoveson canonical positions (e.g. Gordon's "CARE on ABLE") and assert top moves/scores. - Run
cmd/stress(100–1000 seeded games per generator) →RESULTS.md.
Assumptions & caveats
- Both algorithms ship; the production default is decided by the stress test. Both remain selectable.
- Self-play policy defaults to greedy (deterministic tie-break, seeded RNG); tunable.
- Separator = real 27th token (
◊, index = size,cbits=5);0x40reserved on the board. Wildcard/blank =0x80, never in the graph. - Stateless per-call generation in v1 (anchors + cross-sets recomputed per call); incremental maintenance is a later optimization (both generators run stateless — a fairness note for the comparison).
- Persistence stores only the graph; the (custom) alphabet is injected on load.
- Russian "Эрудит" alphabet specifics (Е/Ё handling, tile values/counts) resolved at Stage 3/4; "one word per move, H or V" is satisfied by the modes.
- The final-flag GADDAG is larger than Gordon's letter-set form; letter-sets-on-arcs remain a possible future optimization.