# Scrabble move generation — algorithm reference (single source of truth) This is the authoritative description of the algorithm the solver implements: the DAWG move generator of Appel & Jacobson. It is distilled from the source paper (extracted from the PDF in this repo) plus our adaptation. **Work from this file; do not re-parse the PDFs.** Source: **[AJ88]** A. Appel, G. Jacobson, *The World's Fastest Scrabble Program*, Commun. ACM 31(5):572–578, 585 (1988). > A GADDAG (Gordon, 1994) was also implemented and benchmarked against the DAWG, then > **removed**: for a scoring solver it was ~7× larger and no faster. See `RESULTS.md`. Notation: letters are alphabet **indexes** (English `a..z` → `0..25`). "Across" = horizontal play; "down" = vertical. The board grid and all I/O use the byte encoding in `internal/encoding` (low 6 bits = letter+1, `0` = empty, bit 7 = blank). --- ## 1. Reduction to one dimension [AJ88 §3.1] Every play is either **across** (one row) or **down** (one column). Down plays are across plays on the **transposed** board, so the generator implements only "across" and runs on the board and/or its transpose. This is the two-mode requirement: - `OnlyHorizontal` = across on the board. - `OnlyVertical` = across on the transpose. - `Both` = both. (A move found on the transpose has its coordinates swapped back.) ### Anchors [AJ88 §3.1.2] An across word must include a newly-placed tile adjacent to a tile already on the board. Generation starts only from **anchors** — empty squares orthogonally adjacent to a filled square — which guarantees connectivity and prunes most of the search. The very first move has no anchors and is special-cased through the board's centre square. ### Cross-checks (cross-sets) [AJ88 §3.1.1] For each empty square the **cross-set** is the set of letters that, placed there, form a legal **perpendicular** word with the tiles above/below it; it is stored as a bit-vector (`letterSet`). A square with no perpendicular neighbour allows every letter. Cross-sets let move generation stay one-dimensional. Off-board squares are treated as blocking sentinels. --- ## 2. Lexicon: trie → DAWG [AJ88 §3.2] The lexicon is a trie whose edges are labelled by letters; a word is a root-to-node path; terminal nodes are marked. Minimizing the trie (merging equivalent sub-tries) yields a **DAWG**, the minimum-state automaton for the word set. `dafsa` is exactly such a minimized, bit-packed DAWG, with a compact ≤6-bit alphabet and a **per-node** `final` flag. Its node identity is a bit-offset; edges of a node are sorted by letter. --- ## 3. Move generation — [AJ88 §3.3] (verbatim) Two phases per anchor: place a **left part**, then **extend right**. The left part is either tiles already on the board (read them, then `ExtendRight` from the corresponding node) or rack tiles placed by a pruned DAWG traversal bounded by `limit` = number of non-anchor squares to the left (this bound, reset at each anchor, makes each move appear once). ``` ExtendRight(PartialWord, node N, square): if square is vacant then if N is terminal then LegalMove(PartialWord) for each edge E out of N: let l = letter of E if l is in the rack and l is in the cross-set of square then remove a tile l from the rack ExtendRight(PartialWord || l, node(E), square+1) put the tile l back into the rack else let l = letter occupying square if N has an edge labelled l leading to N' then ExtendRight(PartialWord || l, N', square+1) LeftPart(PartialWord, node N, limit): ExtendRight(PartialWord, N, AnchorSquare) if limit > 0 then for each edge E out of N: let l = letter of E if l is in the rack then remove a tile l from the rack LeftPart(PartialWord || l, node(E), limit - 1) put the tile l back into the rack ``` To generate from an anchor with `k` non-anchor squares to its left: `LeftPart("", root, k)`. Implementation notes: - A word is **recorded only past the anchor** (`square > anchor`), so every recorded play covers the anchor square — the connection guaranteed by the anchor's filled neighbour. - **Empty prefix** when a tile sits left of the anchor: skip `LeftPart`; `ExtendRight` directly from the node reached by walking the on-board left context. - **Blanks**: when scanning the rack for a letter, also allow a blank to stand for it; the placed tile is flagged (bit 7) and scores 0; restore on backtrack. - **First move**: the only anchor is the centre; the left limit equals its column, so the word covers the centre. This maps onto the `dafsa` traversal API: `Cursor.Root`, `Cursor.Next(node, letter)` (an edge, with the destination's `final` flag), `Cursor.Final(node)`, and `Cursor.Arcs` (enumerate a node's edges, used for placements and cross-sets). --- ## 4. Cross-set computation For a square with tiles `above` (top→bottom) and `below`, the cross-set is `{ X : above·X·below ∈ dict }`: - **Right extension** (no `below`): deterministic — `X` just completes the prefix `above`. Walk `above` to a node, then read the **completers** (one arc enumeration: the letters whose arc leads straight to an accepting node). - **Left extension** (tiles `below`): non-deterministic — probe each `X` (walk `above`, `X`, then `below`, test acceptance). This is the asymmetry inherent to a left-to-right DAWG. Cross-sets are recomputed per generation for the squares that need them (cached within a call). Scoring is done separately by `Evaluate` (§5), so cross-sums are not precomputed. --- ## 5. Scoring (full tournament rules + breakdown) A play's score = main word + every cross-word formed + the all-tiles bonus. Per word: - Sum `tileValue(letter)` over its tiles; a **blank** scores 0. - A **letter** premium (DL/TL) multiplies the value of a tile placed on it **only when newly placed** this turn. - A **word** premium (DW/TW) multiplies the whole word **only when a newly-placed tile sits on it**; multiple word premiums multiply. - Each cross-word counts its new tile plus the existing perpendicular run. The all-tiles bonus is added when the play uses a full rack. Board geometry, premium layout, tile values/counts, blank count, rack size and the bonus are all part of the `rules.Ruleset` (`English`, `RussianScrabble`, `Erudit`). `Evaluate` returns the main word, the cross-words and a per-tile breakdown. `ValidatePlay` adds dictionary and connectivity checks. --- ## 6. Rulesets - **English** Scrabble: 15×15, standard premiums, 26 letters, 100 tiles (98 + 2 blanks), 7-tile rack, 50-point bonus. - **Russian Scrabble**: 33-letter alphabet (incl. Ё), standard board, 104 tiles, 50-bonus. - **Эрудит**: 33-letter alphabet with **Ё unused** (no tile — fold Ё→Е when preparing the dictionary, `wordlist.FoldYo`; an out-of-engine step), the **centre does not double**, 131 tiles (128 + 3 blanks), blanks score 0, **15-point bonus**. --- ## 7. Special cases checklist - **First move**: no anchors; the play must cover the centre. - **Blanks**: any letter, score 0, flagged via bit 7; expand to every cross-set-allowed letter during generation. - **Off-board sentinels**: stop extension at the edge. - **A single newly-placed tile** can form both an across and a down word. - **Dedup**: each legal move is generated once (anchor + left-part limit); a canonical move key guards against any residual duplicate.