A Go library that returns every legal play ranked by score and scores or validates plays, using the Appel-Jacobson DAWG algorithm over github.com/iliadenisov/dafsa v1.1.0. - DAWG move generation (across / down / both), full tournament scoring with a per-tile breakdown; public Solver: GenerateMoves (ranked), ScorePlay, ValidatePlay. - Rulesets: English Scrabble, Russian Scrabble, Эрудит (parameterizable Ruleset). - cmd/builddict (build the DAWG from the dictionaries submodule), cmd/stress (self-play benchmark), selfplay engine; brute-force test oracle. - A GADDAG was implemented, benchmarked and removed (the DAWG was smaller and faster for a scoring solver); see RESULTS.md and ALGORITHM.md.
7.4 KiB
Scrabble move generation — algorithm reference (single source of truth)
This is the authoritative description of the algorithm the solver implements: the DAWG move generator of Appel & Jacobson. It is distilled from the source paper (extracted from the PDF in this repo) plus our adaptation. Work from this file; do not re-parse the PDFs.
Source: [AJ88] A. Appel, G. Jacobson, The World's Fastest Scrabble Program, Commun. ACM 31(5):572–578, 585 (1988).
A GADDAG (Gordon, 1994) was also implemented and benchmarked against the DAWG, then removed: for a scoring solver it was ~7× larger and no faster. See
RESULTS.md.
Notation: letters are alphabet indexes (English a..z → 0..25). "Across" =
horizontal play; "down" = vertical. The board grid and all I/O use the byte encoding in
internal/encoding (low 6 bits = letter+1, 0 = empty, bit 7 = blank).
1. Reduction to one dimension [AJ88 §3.1]
Every play is either across (one row) or down (one column). Down plays are across plays on the transposed board, so the generator implements only "across" and runs on the board and/or its transpose. This is the two-mode requirement:
OnlyHorizontal= across on the board.OnlyVertical= across on the transpose.Both= both. (A move found on the transpose has its coordinates swapped back.)
Anchors [AJ88 §3.1.2]
An across word must include a newly-placed tile adjacent to a tile already on the board. Generation starts only from anchors — empty squares orthogonally adjacent to a filled square — which guarantees connectivity and prunes most of the search. The very first move has no anchors and is special-cased through the board's centre square.
Cross-checks (cross-sets) [AJ88 §3.1.1]
For each empty square the cross-set is the set of letters that, placed there, form a
legal perpendicular word with the tiles above/below it; it is stored as a bit-vector
(letterSet). A square with no perpendicular neighbour allows every letter. Cross-sets
let move generation stay one-dimensional. Off-board squares are treated as blocking
sentinels.
2. Lexicon: trie → DAWG [AJ88 §3.2]
The lexicon is a trie whose edges are labelled by letters; a word is a root-to-node
path; terminal nodes are marked. Minimizing the trie (merging equivalent sub-tries)
yields a DAWG, the minimum-state automaton for the word set. dafsa is exactly such
a minimized, bit-packed DAWG, with a compact ≤6-bit alphabet and a per-node final
flag. Its node identity is a bit-offset; edges of a node are sorted by letter.
3. Move generation — [AJ88 §3.3] (verbatim)
Two phases per anchor: place a left part, then extend right. The left part is
either tiles already on the board (read them, then ExtendRight from the corresponding
node) or rack tiles placed by a pruned DAWG traversal bounded by limit = number of
non-anchor squares to the left (this bound, reset at each anchor, makes each move appear
once).
ExtendRight(PartialWord, node N, square):
if square is vacant then
if N is terminal then LegalMove(PartialWord)
for each edge E out of N:
let l = letter of E
if l is in the rack and l is in the cross-set of square then
remove a tile l from the rack
ExtendRight(PartialWord || l, node(E), square+1)
put the tile l back into the rack
else
let l = letter occupying square
if N has an edge labelled l leading to N' then
ExtendRight(PartialWord || l, N', square+1)
LeftPart(PartialWord, node N, limit):
ExtendRight(PartialWord, N, AnchorSquare)
if limit > 0 then
for each edge E out of N:
let l = letter of E
if l is in the rack then
remove a tile l from the rack
LeftPart(PartialWord || l, node(E), limit - 1)
put the tile l back into the rack
To generate from an anchor with k non-anchor squares to its left: LeftPart("", root, k). Implementation notes:
- A word is recorded only past the anchor (
square > anchor), so every recorded play covers the anchor square — the connection guaranteed by the anchor's filled neighbour. - Empty prefix when a tile sits left of the anchor: skip
LeftPart;ExtendRightdirectly from the node reached by walking the on-board left context. - Blanks: when scanning the rack for a letter, also allow a blank to stand for it; the placed tile is flagged (bit 7) and scores 0; restore on backtrack.
- First move: the only anchor is the centre; the left limit equals its column, so the word covers the centre.
This maps onto the dafsa traversal API: Cursor.Root, Cursor.Next(node, letter)
(an edge, with the destination's final flag), Cursor.Final(node), and Cursor.Arcs
(enumerate a node's edges, used for placements and cross-sets).
4. Cross-set computation
For a square with tiles above (top→bottom) and below, the cross-set is
{ X : above·X·below ∈ dict }:
- Right extension (no
below): deterministic —Xjust completes the prefixabove. Walkaboveto a node, then read the completers (one arc enumeration: the letters whose arc leads straight to an accepting node). - Left extension (tiles
below): non-deterministic — probe eachX(walkabove,X, thenbelow, test acceptance). This is the asymmetry inherent to a left-to-right DAWG.
Cross-sets are recomputed per generation for the squares that need them (cached within a
call). Scoring is done separately by Evaluate (§5), so cross-sums are not precomputed.
5. Scoring (full tournament rules + breakdown)
A play's score = main word + every cross-word formed + the all-tiles bonus. Per word:
- Sum
tileValue(letter)over its tiles; a blank scores 0. - A letter premium (DL/TL) multiplies the value of a tile placed on it only when newly placed this turn.
- A word premium (DW/TW) multiplies the whole word only when a newly-placed tile sits on it; multiple word premiums multiply.
- Each cross-word counts its new tile plus the existing perpendicular run.
The all-tiles bonus is added when the play uses a full rack. Board geometry, premium
layout, tile values/counts, blank count, rack size and the bonus are all part of the
rules.Ruleset (English, RussianScrabble, Erudit). Evaluate returns the main
word, the cross-words and a per-tile breakdown. ValidatePlay adds dictionary and
connectivity checks.
6. Rulesets
- English Scrabble: 15×15, standard premiums, 26 letters, 100 tiles (98 + 2 blanks), 7-tile rack, 50-point bonus.
- Russian Scrabble: 33-letter alphabet (incl. Ё), standard board, 104 tiles, 50-bonus.
- Эрудит: 33-letter alphabet with Ё unused (no tile — fold Ё→Е when preparing the
dictionary,
wordlist.FoldYo; an out-of-engine step), the centre does not double, 131 tiles (128 + 3 blanks), blanks score 0, 15-point bonus.
7. Special cases checklist
- First move: no anchors; the play must cover the centre.
- Blanks: any letter, score 0, flagged via bit 7; expand to every cross-set-allowed letter during generation.
- Off-board sentinels: stop extension at the edge.
- A single newly-placed tile can form both an across and a down word.
- Dedup: each legal move is generated once (anchor + left-part limit); a canonical move key guards against any residual duplicate.