Implement Scrabble move generator (DAWG) with English and Russian rules
A Go library that returns every legal play ranked by score and scores or validates plays, using the Appel-Jacobson DAWG algorithm over github.com/iliadenisov/dafsa v1.1.0. - DAWG move generation (across / down / both), full tournament scoring with a per-tile breakdown; public Solver: GenerateMoves (ranked), ScorePlay, ValidatePlay. - Rulesets: English Scrabble, Russian Scrabble, Эрудит (parameterizable Ruleset). - cmd/builddict (build the DAWG from the dictionaries submodule), cmd/stress (self-play benchmark), selfplay engine; brute-force test oracle. - A GADDAG was implemented, benchmarked and removed (the DAWG was smaller and faster for a scoring solver); see RESULTS.md and ALGORITHM.md.
This commit is contained in:
+164
@@ -0,0 +1,164 @@
|
||||
# Scrabble move generation — algorithm reference (single source of truth)
|
||||
|
||||
This is the authoritative description of the algorithm the solver implements: the DAWG
|
||||
move generator of Appel & Jacobson. It is distilled from the source paper (extracted
|
||||
from the PDF in this repo) plus our adaptation. **Work from this file; do not re-parse
|
||||
the PDFs.**
|
||||
|
||||
Source: **[AJ88]** A. Appel, G. Jacobson, *The World's Fastest Scrabble Program*,
|
||||
Commun. ACM 31(5):572–578, 585 (1988).
|
||||
|
||||
> A GADDAG (Gordon, 1994) was also implemented and benchmarked against the DAWG, then
|
||||
> **removed**: for a scoring solver it was ~7× larger and no faster. See `RESULTS.md`.
|
||||
|
||||
Notation: letters are alphabet **indexes** (English `a..z` → `0..25`). "Across" =
|
||||
horizontal play; "down" = vertical. The board grid and all I/O use the byte encoding in
|
||||
`internal/encoding` (low 6 bits = letter+1, `0` = empty, bit 7 = blank).
|
||||
|
||||
---
|
||||
|
||||
## 1. Reduction to one dimension [AJ88 §3.1]
|
||||
|
||||
Every play is either **across** (one row) or **down** (one column). Down plays are
|
||||
across plays on the **transposed** board, so the generator implements only "across" and
|
||||
runs on the board and/or its transpose. This is the two-mode requirement:
|
||||
|
||||
- `OnlyHorizontal` = across on the board.
|
||||
- `OnlyVertical` = across on the transpose.
|
||||
- `Both` = both. (A move found on the transpose has its coordinates swapped back.)
|
||||
|
||||
### Anchors [AJ88 §3.1.2]
|
||||
|
||||
An across word must include a newly-placed tile adjacent to a tile already on the board.
|
||||
Generation starts only from **anchors** — empty squares orthogonally adjacent to a
|
||||
filled square — which guarantees connectivity and prunes most of the search. The very
|
||||
first move has no anchors and is special-cased through the board's centre square.
|
||||
|
||||
### Cross-checks (cross-sets) [AJ88 §3.1.1]
|
||||
|
||||
For each empty square the **cross-set** is the set of letters that, placed there, form a
|
||||
legal **perpendicular** word with the tiles above/below it; it is stored as a bit-vector
|
||||
(`letterSet`). A square with no perpendicular neighbour allows every letter. Cross-sets
|
||||
let move generation stay one-dimensional. Off-board squares are treated as blocking
|
||||
sentinels.
|
||||
|
||||
---
|
||||
|
||||
## 2. Lexicon: trie → DAWG [AJ88 §3.2]
|
||||
|
||||
The lexicon is a trie whose edges are labelled by letters; a word is a root-to-node
|
||||
path; terminal nodes are marked. Minimizing the trie (merging equivalent sub-tries)
|
||||
yields a **DAWG**, the minimum-state automaton for the word set. `dafsa` is exactly such
|
||||
a minimized, bit-packed DAWG, with a compact ≤6-bit alphabet and a **per-node** `final`
|
||||
flag. Its node identity is a bit-offset; edges of a node are sorted by letter.
|
||||
|
||||
---
|
||||
|
||||
## 3. Move generation — [AJ88 §3.3] (verbatim)
|
||||
|
||||
Two phases per anchor: place a **left part**, then **extend right**. The left part is
|
||||
either tiles already on the board (read them, then `ExtendRight` from the corresponding
|
||||
node) or rack tiles placed by a pruned DAWG traversal bounded by `limit` = number of
|
||||
non-anchor squares to the left (this bound, reset at each anchor, makes each move appear
|
||||
once).
|
||||
|
||||
```
|
||||
ExtendRight(PartialWord, node N, square):
|
||||
if square is vacant then
|
||||
if N is terminal then LegalMove(PartialWord)
|
||||
for each edge E out of N:
|
||||
let l = letter of E
|
||||
if l is in the rack and l is in the cross-set of square then
|
||||
remove a tile l from the rack
|
||||
ExtendRight(PartialWord || l, node(E), square+1)
|
||||
put the tile l back into the rack
|
||||
else
|
||||
let l = letter occupying square
|
||||
if N has an edge labelled l leading to N' then
|
||||
ExtendRight(PartialWord || l, N', square+1)
|
||||
|
||||
LeftPart(PartialWord, node N, limit):
|
||||
ExtendRight(PartialWord, N, AnchorSquare)
|
||||
if limit > 0 then
|
||||
for each edge E out of N:
|
||||
let l = letter of E
|
||||
if l is in the rack then
|
||||
remove a tile l from the rack
|
||||
LeftPart(PartialWord || l, node(E), limit - 1)
|
||||
put the tile l back into the rack
|
||||
```
|
||||
|
||||
To generate from an anchor with `k` non-anchor squares to its left: `LeftPart("", root,
|
||||
k)`. Implementation notes:
|
||||
|
||||
- A word is **recorded only past the anchor** (`square > anchor`), so every recorded play
|
||||
covers the anchor square — the connection guaranteed by the anchor's filled neighbour.
|
||||
- **Empty prefix** when a tile sits left of the anchor: skip `LeftPart`; `ExtendRight`
|
||||
directly from the node reached by walking the on-board left context.
|
||||
- **Blanks**: when scanning the rack for a letter, also allow a blank to stand for it;
|
||||
the placed tile is flagged (bit 7) and scores 0; restore on backtrack.
|
||||
- **First move**: the only anchor is the centre; the left limit equals its column, so the
|
||||
word covers the centre.
|
||||
|
||||
This maps onto the `dafsa` traversal API: `Cursor.Root`, `Cursor.Next(node, letter)`
|
||||
(an edge, with the destination's `final` flag), `Cursor.Final(node)`, and `Cursor.Arcs`
|
||||
(enumerate a node's edges, used for placements and cross-sets).
|
||||
|
||||
---
|
||||
|
||||
## 4. Cross-set computation
|
||||
|
||||
For a square with tiles `above` (top→bottom) and `below`, the cross-set is
|
||||
`{ X : above·X·below ∈ dict }`:
|
||||
|
||||
- **Right extension** (no `below`): deterministic — `X` just completes the prefix
|
||||
`above`. Walk `above` to a node, then read the **completers** (one arc enumeration:
|
||||
the letters whose arc leads straight to an accepting node).
|
||||
- **Left extension** (tiles `below`): non-deterministic — probe each `X` (walk `above`,
|
||||
`X`, then `below`, test acceptance). This is the asymmetry inherent to a left-to-right
|
||||
DAWG.
|
||||
|
||||
Cross-sets are recomputed per generation for the squares that need them (cached within a
|
||||
call). Scoring is done separately by `Evaluate` (§5), so cross-sums are not precomputed.
|
||||
|
||||
---
|
||||
|
||||
## 5. Scoring (full tournament rules + breakdown)
|
||||
|
||||
A play's score = main word + every cross-word formed + the all-tiles bonus. Per word:
|
||||
|
||||
- Sum `tileValue(letter)` over its tiles; a **blank** scores 0.
|
||||
- A **letter** premium (DL/TL) multiplies the value of a tile placed on it **only when
|
||||
newly placed** this turn.
|
||||
- A **word** premium (DW/TW) multiplies the whole word **only when a newly-placed tile
|
||||
sits on it**; multiple word premiums multiply.
|
||||
- Each cross-word counts its new tile plus the existing perpendicular run.
|
||||
|
||||
The all-tiles bonus is added when the play uses a full rack. Board geometry, premium
|
||||
layout, tile values/counts, blank count, rack size and the bonus are all part of the
|
||||
`rules.Ruleset` (`English`, `RussianScrabble`, `Erudit`). `Evaluate` returns the main
|
||||
word, the cross-words and a per-tile breakdown. `ValidatePlay` adds dictionary and
|
||||
connectivity checks.
|
||||
|
||||
---
|
||||
|
||||
## 6. Rulesets
|
||||
|
||||
- **English** Scrabble: 15×15, standard premiums, 26 letters, 100 tiles (98 + 2 blanks),
|
||||
7-tile rack, 50-point bonus.
|
||||
- **Russian Scrabble**: 33-letter alphabet (incl. Ё), standard board, 104 tiles, 50-bonus.
|
||||
- **Эрудит**: 33-letter alphabet with **Ё unused** (no tile — fold Ё→Е when preparing the
|
||||
dictionary, `wordlist.FoldYo`; an out-of-engine step), the **centre does not double**,
|
||||
131 tiles (128 + 3 blanks), blanks score 0, **15-point bonus**.
|
||||
|
||||
---
|
||||
|
||||
## 7. Special cases checklist
|
||||
|
||||
- **First move**: no anchors; the play must cover the centre.
|
||||
- **Blanks**: any letter, score 0, flagged via bit 7; expand to every cross-set-allowed
|
||||
letter during generation.
|
||||
- **Off-board sentinels**: stop extension at the edge.
|
||||
- **A single newly-placed tile** can form both an across and a down word.
|
||||
- **Dedup**: each legal move is generated once (anchor + left-part limit); a canonical
|
||||
move key guards against any residual duplicate.
|
||||
Reference in New Issue
Block a user