Files
Ilia Denisov 15c7959d96 Implement Scrabble move generator (DAWG) with English and Russian rules
A Go library that returns every legal play ranked by score and scores or
validates plays, using the Appel-Jacobson DAWG algorithm over
github.com/iliadenisov/dafsa v1.1.0.

- DAWG move generation (across / down / both), full tournament scoring with a
  per-tile breakdown; public Solver: GenerateMoves (ranked), ScorePlay,
  ValidatePlay.
- Rulesets: English Scrabble, Russian Scrabble, Эрудит (parameterizable Ruleset).
- cmd/builddict (build the DAWG from the dictionaries submodule), cmd/stress
  (self-play benchmark), selfplay engine; brute-force test oracle.
- A GADDAG was implemented, benchmarked and removed (the DAWG was smaller and
  faster for a scoring solver); see RESULTS.md and ALGORITHM.md.
2026-06-01 16:07:32 +02:00

7.4 KiB
Raw Permalink Blame History

Scrabble move generation — algorithm reference (single source of truth)

This is the authoritative description of the algorithm the solver implements: the DAWG move generator of Appel & Jacobson. It is distilled from the source paper (extracted from the PDF in this repo) plus our adaptation. Work from this file; do not re-parse the PDFs.

Source: [AJ88] A. Appel, G. Jacobson, The World's Fastest Scrabble Program, Commun. ACM 31(5):572578, 585 (1988).

A GADDAG (Gordon, 1994) was also implemented and benchmarked against the DAWG, then removed: for a scoring solver it was ~7× larger and no faster. See RESULTS.md.

Notation: letters are alphabet indexes (English a..z0..25). "Across" = horizontal play; "down" = vertical. The board grid and all I/O use the byte encoding in internal/encoding (low 6 bits = letter+1, 0 = empty, bit 7 = blank).


1. Reduction to one dimension [AJ88 §3.1]

Every play is either across (one row) or down (one column). Down plays are across plays on the transposed board, so the generator implements only "across" and runs on the board and/or its transpose. This is the two-mode requirement:

  • OnlyHorizontal = across on the board.
  • OnlyVertical = across on the transpose.
  • Both = both. (A move found on the transpose has its coordinates swapped back.)

Anchors [AJ88 §3.1.2]

An across word must include a newly-placed tile adjacent to a tile already on the board. Generation starts only from anchors — empty squares orthogonally adjacent to a filled square — which guarantees connectivity and prunes most of the search. The very first move has no anchors and is special-cased through the board's centre square.

Cross-checks (cross-sets) [AJ88 §3.1.1]

For each empty square the cross-set is the set of letters that, placed there, form a legal perpendicular word with the tiles above/below it; it is stored as a bit-vector (letterSet). A square with no perpendicular neighbour allows every letter. Cross-sets let move generation stay one-dimensional. Off-board squares are treated as blocking sentinels.


2. Lexicon: trie → DAWG [AJ88 §3.2]

The lexicon is a trie whose edges are labelled by letters; a word is a root-to-node path; terminal nodes are marked. Minimizing the trie (merging equivalent sub-tries) yields a DAWG, the minimum-state automaton for the word set. dafsa is exactly such a minimized, bit-packed DAWG, with a compact ≤6-bit alphabet and a per-node final flag. Its node identity is a bit-offset; edges of a node are sorted by letter.


3. Move generation — [AJ88 §3.3] (verbatim)

Two phases per anchor: place a left part, then extend right. The left part is either tiles already on the board (read them, then ExtendRight from the corresponding node) or rack tiles placed by a pruned DAWG traversal bounded by limit = number of non-anchor squares to the left (this bound, reset at each anchor, makes each move appear once).

ExtendRight(PartialWord, node N, square):
    if square is vacant then
        if N is terminal then LegalMove(PartialWord)
        for each edge E out of N:
            let l = letter of E
            if l is in the rack and l is in the cross-set of square then
                remove a tile l from the rack
                ExtendRight(PartialWord || l, node(E), square+1)
                put the tile l back into the rack
    else
        let l = letter occupying square
        if N has an edge labelled l leading to N' then
            ExtendRight(PartialWord || l, N', square+1)

LeftPart(PartialWord, node N, limit):
    ExtendRight(PartialWord, N, AnchorSquare)
    if limit > 0 then
        for each edge E out of N:
            let l = letter of E
            if l is in the rack then
                remove a tile l from the rack
                LeftPart(PartialWord || l, node(E), limit - 1)
                put the tile l back into the rack

To generate from an anchor with k non-anchor squares to its left: LeftPart("", root, k). Implementation notes:

  • A word is recorded only past the anchor (square > anchor), so every recorded play covers the anchor square — the connection guaranteed by the anchor's filled neighbour.
  • Empty prefix when a tile sits left of the anchor: skip LeftPart; ExtendRight directly from the node reached by walking the on-board left context.
  • Blanks: when scanning the rack for a letter, also allow a blank to stand for it; the placed tile is flagged (bit 7) and scores 0; restore on backtrack.
  • First move: the only anchor is the centre; the left limit equals its column, so the word covers the centre.

This maps onto the dafsa traversal API: Cursor.Root, Cursor.Next(node, letter) (an edge, with the destination's final flag), Cursor.Final(node), and Cursor.Arcs (enumerate a node's edges, used for placements and cross-sets).


4. Cross-set computation

For a square with tiles above (top→bottom) and below, the cross-set is { X : above·X·below ∈ dict }:

  • Right extension (no below): deterministic — X just completes the prefix above. Walk above to a node, then read the completers (one arc enumeration: the letters whose arc leads straight to an accepting node).
  • Left extension (tiles below): non-deterministic — probe each X (walk above, X, then below, test acceptance). This is the asymmetry inherent to a left-to-right DAWG.

Cross-sets are recomputed per generation for the squares that need them (cached within a call). Scoring is done separately by Evaluate (§5), so cross-sums are not precomputed.


5. Scoring (full tournament rules + breakdown)

A play's score = main word + every cross-word formed + the all-tiles bonus. Per word:

  • Sum tileValue(letter) over its tiles; a blank scores 0.
  • A letter premium (DL/TL) multiplies the value of a tile placed on it only when newly placed this turn.
  • A word premium (DW/TW) multiplies the whole word only when a newly-placed tile sits on it; multiple word premiums multiply.
  • Each cross-word counts its new tile plus the existing perpendicular run.

The all-tiles bonus is added when the play uses a full rack. Board geometry, premium layout, tile values/counts, blank count, rack size and the bonus are all part of the rules.Ruleset (English, RussianScrabble, Erudit). Evaluate returns the main word, the cross-words and a per-tile breakdown. ValidatePlay adds dictionary and connectivity checks.


6. Rulesets

  • English Scrabble: 15×15, standard premiums, 26 letters, 100 tiles (98 + 2 blanks), 7-tile rack, 50-point bonus.
  • Russian Scrabble: 33-letter alphabet (incl. Ё), standard board, 104 tiles, 50-bonus.
  • Эрудит: 33-letter alphabet with Ё unused (no tile — fold Ё→Е when preparing the dictionary, wordlist.FoldYo; an out-of-engine step), the centre does not double, 131 tiles (128 + 3 blanks), blanks score 0, 15-point bonus.

7. Special cases checklist

  • First move: no anchors; the play must cover the centre.
  • Blanks: any letter, score 0, flagged via bit 7; expand to every cross-set-allowed letter during generation.
  • Off-board sentinels: stop extension at the edge.
  • A single newly-placed tile can form both an across and a down word.
  • Dedup: each legal move is generated once (anchor + left-part limit); a canonical move key guards against any residual duplicate.