From 0b760cd359b4e4a9dedd51304fd414ec02dfc51f Mon Sep 17 00:00:00 2001
From: Ilia Denisov
Date: Tue, 2 Jun 2026 00:00:52 +0200
Subject: [PATCH] docs: add project CLAUDE.md (architecture, layout, build/test guide)

---
 CLAUDE.md | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 CLAUDE.md

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..d866811
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,49 @@
+# scrabble-solver — project guide
+
+A Go library that, given a dictionary, a board position and a rack, returns every legal
+play ranked by score, and scores/validates arbitrary plays. The move generator is the
+**DAWG** algorithm (Appel & Jacobson) over `github.com/iliadenisov/dafsa` — a bit-packed,
+minimised DAWG with a compact ≤63-symbol alphabet. A GADDAG generator was also built,
+measured by self-play, and **removed**: DAWG won for this scoring-solver workload
+(~7× smaller, comparable speed) — see `RESULTS.md`.
+
+Module `scrabble-solver`, Go 1.26. Rulesets: English Scrabble, Russian Scrabble, and
+Russian **Эрудит** (`rules` package); Эрудит has no Ё tile and folds Ё→Е in its dictionary.
+
+## Layout
+
+- `scrabble/` — the public API: `Solver` (`NewSolver`, `GenerateMoves`, `ScorePlay`,
+  `ValidatePlay`), the `Move`/`Placement`/`Word` types, the DAWG generator and scoring.
+- `board/`, `rack/`, `rules/` — board grid (+ transpose), rack as per-letter counts,
+  and rulesets (geometry, premium layout, tile values/counts, alphabet, bonus):
+  `rules.English()`, `rules.RussianScrabble()`, `rules.Erudit()`.
+- `internal/` — `dictdawg` (build/load/serialise DAWGs over dafsa), `wordlist`
+  (encode/filter/sort/dedupe + `FoldYo`), `graph`, `dict`.
+- `cmd/builddict` — word list → serialised DAWG (`-alphabet latin|russian`).
+- `cmd/stress`, `selfplay/` — the self-play stress harness behind `RESULTS.md`.
+- `dawg/` — **committed** dictionaries: `en_sowpods.dawg`, `ru_scrabble.dawg`,
+  `ru_erudit.dawg` (Ё→Е folded). Rebuild with `make dawg`.
+- `dictionaries/` — `kamilmielnik/scrabble-dictionaries` git submodule (English source).
+- `dictprep/` — self-contained tooling that turns the Russian academic orthographic
+  dictionary into a common-noun word list. See `dictprep/README.md`. Committed output is
+  `dictprep/russian/{all,scrabble}.txt` (+ `orfo_dict_2025.{pdf,txt}`, `manual_confirm.txt`).
+  Running Stage 2 needs a Python venv with `mawo-pymorphy3` and the `libmorph` apt packages
+  (see `dictprep/README.md`).
+
+## Build & test
+
+    go test ./...       # all packages green; also run go vet ./... and gofmt
+    make dawg           # rebuild dawg/*.dawg from the word lists
+
+Scoring and move generation are validated against **real tournament games** in GCG format
+(`scrabble/gcg_test.go` + `scrabble/testdata/*.gcg`, including the 700+ club): for every
+move the test checks the score, the running total, and that the generator actually
+produces the played move with that score — canonical play, not invented cases.
+
+## Key facts
+
+- Compact byte encoding: low 6 bits = alphabet index; `0x80` = blank/wildcard (board, rack
+  and output bytes only — never inside the graph). The public API is byte-indexed.
+- DAWG is the production generator; the GADDAG was removed after measurement.
+- Detailed docs: `ALGORITHM.md` (the algorithm — single source of truth), `PLAN.md`
+  (design and decisions), `RESULTS.md` (DAWG-vs-GADDAG), `dictprep/README.md` (RU pipeline).
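
Note on the "Compact byte encoding" bullet under Key facts: the layout it describes (low 6 bits for the alphabet index, `0x80` as the blank flag) can be illustrated with a small standalone sketch. This is not the library's code. The names `encodeTile`, `decodeTile`, `indexMask` and `blankFlag` are hypothetical, and the sketch assumes that a blank assigned to a letter carries that letter's index in the low bits alongside the flag, which the guide does not state explicitly.

```go
package main

import "fmt"

const (
	indexMask = 0x3F // low 6 bits hold the alphabet index (at most 63 symbols)
	blankFlag = 0x80 // high bit marks a blank/wildcard tile
)

// encodeTile packs an alphabet index and a blank marker into one byte.
// Hypothetical helper: the real package may expose this differently.
func encodeTile(index byte, blank bool) byte {
	b := index & indexMask
	if blank {
		b |= blankFlag
	}
	return b
}

// decodeTile splits a board, rack or output byte into its alphabet index
// and blank marker. Per the guide, graph edges never carry the blank flag,
// so only the masked index would be used when walking the DAWG.
func decodeTile(b byte) (index byte, blank bool) {
	return b & indexMask, b&blankFlag != 0
}

func main() {
	plain := encodeTile(5, false)
	wild := encodeTile(5, true)
	fmt.Printf("0x%02x 0x%02x\n", plain, wild)

	idx, isBlank := decodeTile(wild)
	fmt.Println(idx, isBlank)
}
```

Masking with `indexMask` before a graph lookup is one way to keep the blank flag out of the DAWG while still letting the output record which tiles were blanks.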