Publish as versioned Gitea module; move dictionary pipeline out

- Rename module to gitea.iliadenisov.ru/developer/scrabble-solver so it can be
  consumed as a versioned dependency (no go.work replace / CI clone).
- De-internalize wordlist and dictdawg as public packages.
- Remove cmd/builddict, dictprep/, the dictionaries submodule and the dawg
  Makefile: the word-list parsing and DAWG build now live in the separate
  scrabble-dictionary repository, which publishes the DAWG set as a release artifact.
- internal/dict loads the committed dawg/en_sowpods.dawg fixture for cmd/stress.
- Update README/CLAUDE docs accordingly.
Ilia Denisov
2026-06-04 19:11:46 +02:00
parent 63a7c663bf
commit 256999b42c
41 changed files with 93 additions and 402477 deletions
+11 -12
@@ -17,23 +17,21 @@ Russian **Эрудит** (`rules` package); Эрудит has no Ё tile and fold
- `board/`, `rack/`, `rules/` — board grid (+ transpose), rack as per-letter counts,
and rulesets (geometry, premium layout, tile values/counts, alphabet, bonus):
`rules.English()`, `rules.RussianScrabble()`, `rules.Erudit()`.
- `internal/` — `dictdawg` (build/load/serialise DAWGs over dafsa), `wordlist`
(encode/filter/sort/dedupe + `FoldYo`), `graph`, `dict`.
- `cmd/builddict` — word list → serialised DAWG (`-alphabet latin|russian`).
- `dictdawg/`, `wordlist/` — **public** helpers: `dictdawg` (build/load/serialise DAWGs
over dafsa), `wordlist` (encode/filter/sort/dedupe + `FoldYo`). Imported by the separate
`scrabble-dictionary` repo that builds and publishes the DAWG set.
- `internal/` — `encoding`, `graph`, `dict` (loads the committed `dawg/en_sowpods.dawg`
for `cmd/stress`).
- `cmd/stress`, `selfplay/` — the self-play stress harness behind `RESULTS.md`.
- `dawg/` — **committed** dictionaries: `en_sowpods.dawg`, `ru_scrabble.dawg`,
`ru_erudit.dawg` (Ё→Е folded). Rebuild with `make dawg`.
- `dictionaries/` — `kamilmielnik/scrabble-dictionaries` git submodule (English source).
- `dictprep/` — self-contained tooling that turns the Russian academic orthographic
dictionary into a common-noun word list. See `dictprep/README.md`. Committed output is
`dictprep/russian/{all,scrabble}.txt` (+ `orfo_dict_2025.{pdf,txt}`, `manual_confirm.txt`).
Running Stage 2 needs a Python venv with `mawo-pymorphy3` and the `libmorph` apt packages
(see `dictprep/README.md`).
`ru_erudit.dawg` (Ё→Е folded). The word-list sources and build pipeline live in the
separate [`scrabble-dictionary`](https://gitea.iliadenisov.ru/developer/scrabble-dictionary)
repo (which publishes the DAWG set as a release artifact); these committed copies are
test fixtures.
## Build & test
go test ./... # all packages green; also run go vet ./... and gofmt
make dawg # rebuild dawg/*.dawg from the word lists
Scoring and move generation are validated against **real tournament games** in GCG format
(`scrabble/gcg_test.go` + `scrabble/testdata/*.gcg`, including the 700+ club): for every
@@ -46,4 +44,5 @@ produces the played move with that score — canonical play, not invented cases.
and output bytes only — never inside the graph). The public API is byte-indexed.
- DAWG is the production generator; the GADDAG was removed after measurement.
- Detailed docs: `ALGORITHM.md` (the algorithm — single source of truth), `PLAN.md`
(design and decisions), `RESULTS.md` (DAWG-vs-GADDAG), `dictprep/README.md` (RU pipeline).
(design and decisions), `RESULTS.md` (DAWG-vs-GADDAG). The RU word-list pipeline and the
DAWG build now live in the `scrabble-dictionary` repo.