Files
scrabble-solver/RESULTS.md
T
Ilia Denisov 15c7959d96 Implement Scrabble move generator (DAWG) with English and Russian rules
A Go library that returns every legal play ranked by score and scores or
validates plays, using the Appel-Jacobson DAWG algorithm over
github.com/iliadenisov/dafsa v1.1.0.

- DAWG move generation (across / down / both), full tournament scoring with a
  per-tile breakdown; public Solver: GenerateMoves (ranked), ScorePlay,
  ValidatePlay.
- Rulesets: English Scrabble, Russian Scrabble, Эрудит (parameterizable Ruleset).
- cmd/builddict (build the DAWG from the dictionaries submodule), cmd/stress
  (self-play benchmark), selfplay engine; brute-force test oracle.
- A GADDAG was implemented, benchmarked and removed (the DAWG was smaller and
  faster for a scoring solver); see RESULTS.md and ALGORITHM.md.
2026-06-01 16:07:32 +02:00

88 lines
4.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# DAWG vs GADDAG — self-play stress test
> **Note.** The GADDAG generator has since been **removed** from the codebase — the DAWG
> is the sole move generator. This document is kept as the record of the comparison that
> justified that choice. `cmd/stress` now benchmarks the DAWG alone.
The two move generators were compared by playing greedy AI-vs-AI games on the same
dictionary with the same seeds, measuring speed and memory. Reproduce with:
```
go run ./cmd/builddict # build testdata/sowpods.{dawg,gaddag}
go run ./cmd/stress -games 200
```
## Setup
- Dictionary: English **SOWPODS**, 267,752 words (215 letters).
- Board: standard 15×15; greedy player (highest-scoring move), seeded RNG.
- 200 games per generator, identical seeds; mode = both orientations.
- Machine: this dev container (Go 1.26, 12 cores; single-threaded run).
## Results (200 games)
Includes the **deterministic cross-set optimization** (one arc enumeration via
`completers()` for one-sided squares instead of probing every letter); both one-sided
cases are deterministic for the GADDAG, only the right-extension is for the DAWG.
| metric | DAWG | GADDAG |
|---|---:|---:|
| structure size | **732.5 KB** | 5.12 MB |
| games / turns / plays | 200 / 4783 / 4769 | 200 / 4783 / 4769 |
| moves generated | 3,880,236 | 3,880,236 |
| generation time | **23.68 s** | 25.98 s |
| µs / move-generation call | **4951** | 5431 |
| moves generated / sec | **163,863** | 149,383 |
| arcs traversed | **261.8 M** | 276.0 M |
| arcs / move generated | **67.5** | 71.1 |
| heap allocated | 16.79 GB | 16.79 GB |
| GC cycles | 5624 | 5605 |
| avg final game score | 849.3 | 849.3 |
`GADDAG vs DAWG: 1.10× generation time, 7.16× structure size, 1.00× heap.`
Peak process RSS (both structures mapped): ~42 MB.
The optimization cut the GADDAG's cross-set arcs (285.5 M → 276.0 M) and narrowed the
arc gap (1.08× → 1.05×), but the verdict is unchanged: **the GADDAG is still ~10 %
slower, 7× larger and traverses slightly more arcs.** End-to-end time is dominated by
the shared per-move scoring (`Evaluate`, ~3.9 M calls), not by graph search, so the
search-algorithm difference barely moves the total — and what difference remains favours
the narrower, smaller DAWG.
## Interpretation
- **Correctness.** Both generators produce the *identical* set of moves and scores at
every position (identical turns/plays/moves/score above, and the Stage-8 mutual oracle
agreeing on 119 positions over 5 games). They differ only in how they search.
- **Speed.** The DAWG is ~10 % faster end-to-end and traverses ~8 % **fewer** arcs.
- **The GADDAG does not win here, contrary to Gordon's paper.** Two reasons:
1. We use the **final-flag GADDAG** (the minimized DAWG of `REV(x)◊y`, completion via
accepting states) so that `dafsa` can be used essentially unmodified. This variant
lacks Gordon's *letter-sets-on-arcs* compression, so it is both ~7× larger and has
wider nodes — it traverses *more* arcs, not fewer, erasing the theoretical edge.
2. The workload is a **scoring solver**: every generated move is scored (cross-words,
premiums, bonus) by the shared `Evaluate`. That shared per-move cost dominates, so
the search-algorithm difference is small — and what remains favours the simpler,
narrower DAWG.
- **Memory.** The GADDAG structure is 7.16× larger (5.12 MB vs 732 KB). Per-move heap
allocation is identical (dominated by shared scoring), and overall RSS is modest.
## Decision
**Use the DAWG (Appel-Jacobson) generator as the production default.** For this library
(a move generator that scores and ranks every play) it is smaller (7×), at least as fast,
and simpler to operate: it needs no separator symbol or custom alphabet, `dafsa`'s
`Save`/`Load` work unchanged, and it requires the fewest `dafsa` additions (only the
shared low-level traversal API).
The **GADDAG generator is kept and remains selectable** (`scrabble.NewGADDAGGenerator`),
both as an alternative and as a continuing correctness oracle for the DAWG.
### Caveats / what could change the picture
- A **letter-set GADDAG** (Gordon's true compressed form) plus **incremental cross-set
maintenance** would shrink the GADDAG and cut its arc count; it might then beat the DAWG
on raw move generation. This was not pursued: the DAWG already meets the goal, and the
7× size gap is decisive for a scoring-solver workload where generation is not the
bottleneck. It remains a documented future optimization.