15c7959d96
A Go library that returns every legal play ranked by score and scores or validates plays, using the Appel-Jacobson DAWG algorithm over github.com/iliadenisov/dafsa v1.1.0. - DAWG move generation (across / down / both), full tournament scoring with a per-tile breakdown; public Solver: GenerateMoves (ranked), ScorePlay, ValidatePlay. - Rulesets: English Scrabble, Russian Scrabble, Эрудит (parameterizable Ruleset). - cmd/builddict (build the DAWG from the dictionaries submodule), cmd/stress (self-play benchmark), selfplay engine; brute-force test oracle. - A GADDAG was implemented, benchmarked and removed (the DAWG was smaller and faster for a scoring solver); see RESULTS.md and ALGORITHM.md.
165 lines
7.4 KiB
Markdown
165 lines
7.4 KiB
Markdown
# Scrabble move generation — algorithm reference (single source of truth)
|
||
|
||
This is the authoritative description of the algorithm the solver implements: the DAWG
|
||
move generator of Appel & Jacobson. It is distilled from the source paper (extracted
|
||
from the PDF in this repo) plus our adaptation. **Work from this file; do not re-parse
|
||
the PDFs.**
|
||
|
||
Source: **[AJ88]** A. Appel, G. Jacobson, *The World's Fastest Scrabble Program*,
|
||
Commun. ACM 31(5):572–578, 585 (1988).
|
||
|
||
> A GADDAG (Gordon, 1994) was also implemented and benchmarked against the DAWG, then
|
||
> **removed**: for a scoring solver it was ~7× larger and no faster. See `RESULTS.md`.
|
||
|
||
Notation: letters are alphabet **indexes** (English `a..z` → `0..25`). "Across" =
|
||
horizontal play; "down" = vertical. The board grid and all I/O use the byte encoding in
|
||
`internal/encoding` (low 6 bits = letter+1, `0` = empty, bit 7 = blank).
|
||
|
||
---
|
||
|
||
## 1. Reduction to one dimension [AJ88 §3.1]
|
||
|
||
Every play is either **across** (one row) or **down** (one column). Down plays are
|
||
across plays on the **transposed** board, so the generator implements only "across" and
|
||
runs on the board and/or its transpose. This is the two-mode requirement:
|
||
|
||
- `OnlyHorizontal` = across on the board.
|
||
- `OnlyVertical` = across on the transpose.
|
||
- `Both` = both. (A move found on the transpose has its coordinates swapped back.)
|
||
|
||
### Anchors [AJ88 §3.1.2]
|
||
|
||
An across word must include a newly-placed tile adjacent to a tile already on the board.
|
||
Generation starts only from **anchors** — empty squares orthogonally adjacent to a
|
||
filled square — which guarantees connectivity and prunes most of the search. The very
|
||
first move has no anchors and is special-cased through the board's centre square.
|
||
|
||
### Cross-checks (cross-sets) [AJ88 §3.1.1]
|
||
|
||
For each empty square the **cross-set** is the set of letters that, placed there, form a
|
||
legal **perpendicular** word with the tiles above/below it; it is stored as a bit-vector
|
||
(`letterSet`). A square with no perpendicular neighbour allows every letter. Cross-sets
|
||
let move generation stay one-dimensional. Off-board squares are treated as blocking
|
||
sentinels.
|
||
|
||
---
|
||
|
||
## 2. Lexicon: trie → DAWG [AJ88 §3.2]
|
||
|
||
The lexicon is a trie whose edges are labelled by letters; a word is a root-to-node
|
||
path; terminal nodes are marked. Minimizing the trie (merging equivalent sub-tries)
|
||
yields a **DAWG**, the minimum-state automaton for the word set. `dafsa` is exactly such
|
||
a minimized, bit-packed DAWG, with a compact ≤6-bit alphabet and a **per-node** `final`
|
||
flag. Its node identity is a bit-offset; edges of a node are sorted by letter.
|
||
|
||
---
|
||
|
||
## 3. Move generation — [AJ88 §3.3] (verbatim)
|
||
|
||
Two phases per anchor: place a **left part**, then **extend right**. The left part is
|
||
either tiles already on the board (read them, then `ExtendRight` from the corresponding
|
||
node) or rack tiles placed by a pruned DAWG traversal bounded by `limit` = number of
|
||
non-anchor squares to the left (this bound, reset at each anchor, makes each move appear
|
||
once).
|
||
|
||
```
|
||
ExtendRight(PartialWord, node N, square):
|
||
if square is vacant then
|
||
if N is terminal then LegalMove(PartialWord)
|
||
for each edge E out of N:
|
||
let l = letter of E
|
||
if l is in the rack and l is in the cross-set of square then
|
||
remove a tile l from the rack
|
||
ExtendRight(PartialWord || l, node(E), square+1)
|
||
put the tile l back into the rack
|
||
else
|
||
let l = letter occupying square
|
||
if N has an edge labelled l leading to N' then
|
||
ExtendRight(PartialWord || l, N', square+1)
|
||
|
||
LeftPart(PartialWord, node N, limit):
|
||
ExtendRight(PartialWord, N, AnchorSquare)
|
||
if limit > 0 then
|
||
for each edge E out of N:
|
||
let l = letter of E
|
||
if l is in the rack then
|
||
remove a tile l from the rack
|
||
LeftPart(PartialWord || l, node(E), limit - 1)
|
||
put the tile l back into the rack
|
||
```
|
||
|
||
To generate from an anchor with `k` non-anchor squares to its left: `LeftPart("", root,
|
||
k)`. Implementation notes:
|
||
|
||
- A word is **recorded only past the anchor** (`square > anchor`), so every recorded play
|
||
covers the anchor square — the connection guaranteed by the anchor's filled neighbour.
|
||
- **Empty prefix** when a tile sits left of the anchor: skip `LeftPart`; `ExtendRight`
|
||
directly from the node reached by walking the on-board left context.
|
||
- **Blanks**: when scanning the rack for a letter, also allow a blank to stand for it;
|
||
the placed tile is flagged (bit 7) and scores 0; restore on backtrack.
|
||
- **First move**: the only anchor is the centre; the left limit equals its column, so the
|
||
word covers the centre.
|
||
|
||
This maps onto the `dafsa` traversal API: `Cursor.Root`, `Cursor.Next(node, letter)`
|
||
(an edge, with the destination's `final` flag), `Cursor.Final(node)`, and `Cursor.Arcs`
|
||
(enumerate a node's edges, used for placements and cross-sets).
|
||
|
||
---
|
||
|
||
## 4. Cross-set computation
|
||
|
||
For a square with tiles `above` (top→bottom) and `below`, the cross-set is
|
||
`{ X : above·X·below ∈ dict }`:
|
||
|
||
- **Right extension** (no `below`): deterministic — `X` just completes the prefix
|
||
`above`. Walk `above` to a node, then read the **completers** (one arc enumeration:
|
||
the letters whose arc leads straight to an accepting node).
|
||
- **Left extension** (tiles `below`): non-deterministic — probe each `X` (walk `above`,
|
||
`X`, then `below`, test acceptance). This is the asymmetry inherent to a left-to-right
|
||
DAWG.
|
||
|
||
Cross-sets are recomputed per generation for the squares that need them (cached within a
|
||
call). Scoring is done separately by `Evaluate` (§5), so cross-sums are not precomputed.
|
||
|
||
---
|
||
|
||
## 5. Scoring (full tournament rules + breakdown)
|
||
|
||
A play's score = main word + every cross-word formed + the all-tiles bonus. Per word:
|
||
|
||
- Sum `tileValue(letter)` over its tiles; a **blank** scores 0.
|
||
- A **letter** premium (DL/TL) multiplies the value of a tile placed on it **only when
|
||
newly placed** this turn.
|
||
- A **word** premium (DW/TW) multiplies the whole word **only when a newly-placed tile
|
||
sits on it**; multiple word premiums multiply.
|
||
- Each cross-word counts its new tile plus the existing perpendicular run.
|
||
|
||
The all-tiles bonus is added when the play uses a full rack. Board geometry, premium
|
||
layout, tile values/counts, blank count, rack size and the bonus are all part of the
|
||
`rules.Ruleset` (`English`, `RussianScrabble`, `Erudit`). `Evaluate` returns the main
|
||
word, the cross-words and a per-tile breakdown. `ValidatePlay` adds dictionary and
|
||
connectivity checks.
|
||
|
||
---
|
||
|
||
## 6. Rulesets
|
||
|
||
- **English** Scrabble: 15×15, standard premiums, 26 letters, 100 tiles (98 + 2 blanks),
|
||
7-tile rack, 50-point bonus.
|
||
- **Russian Scrabble**: 33-letter alphabet (incl. Ё), standard board, 104 tiles, 50-bonus.
|
||
- **Эрудит**: 33-letter alphabet with **Ё unused** (no tile — fold Ё→Е when preparing the
|
||
dictionary, `wordlist.FoldYo`; an out-of-engine step), the **centre does not double**,
|
||
131 tiles (128 + 3 blanks), blanks score 0, **15-point bonus**.
|
||
|
||
---
|
||
|
||
## 7. Special cases checklist
|
||
|
||
- **First move**: no anchors; the play must cover the centre.
|
||
- **Blanks**: any letter, score 0, flagged via bit 7; expand to every cross-set-allowed
|
||
letter during generation.
|
||
- **Off-board sentinels**: stop extension at the edge.
|
||
- **A single newly-placed tile** can form both an across and a down word.
|
||
- **Dedup**: each legal move is generated once (anchor + left-part limit); a canonical
|
||
move key guards against any residual duplicate.
|