Consolidate the scattered build inputs (dictionaries/english/, dictprep/russian/) into one sources/ tree keyed by the variant labels (scrabble_en/scrabble_ru/erudit_ru), and move the Russian prep pipeline to tools/. The dawg outputs and their filenames (en_sowpods/ru_scrabble/ru_erudit) are unchanged and rebuild byte-identical, so the release artifact and the backend are unaffected. ru_stage2.py OUT_DIR and the ruwords flag defaults are repointed to sources/scrabble_ru/; Makefile, CI, the cmd/builddict default, and README are updated; pipeline intermediates are git-ignored. Verified: make dawg is byte-identical to the committed baseline, and py_compile + go vet pass on the moved tools. The full Russian regeneration pipeline (pymorphy3/libmorph/orfo PDF) was not run here.
scrabble-dictionary
Versioned dictionary artifacts for the Scrabble game backend: the word-list sources and the build pipeline that produces the dictionary DAWGs, published as a release artifact (the DAWGs are data, not a Go module).
The build uses the published scrabble-solver dictdawg/wordlist packages (pinned in go.mod) over
github.com/iliadenisov/{dafsa,alphabet} (v1.1.0), so the on-disk format and letter indexing
match the running backend exactly. There is no index drift, because the backend pins the same
dafsa/alphabet. The DAWGs this repo builds are byte-identical to the solver's committed test
fixtures.
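A byte-identity claim like this can be checked with a plain file comparison. The helper below is a minimal sketch, not part of this repo: the name `same_bytes` is illustrative, and the location of the solver's fixtures is not stated here, so the example path in the comment is hypothetical.

```shell
# Sketch: report whether two files are byte-for-byte identical.
# Both paths are supplied by the caller; no fixture location is assumed.
same_bytes() {
    # cmp -s prints nothing and exits 0 only when the contents match exactly.
    if cmp -s "$1" "$2"; then
        echo "identical: $1"
    else
        echo "DIFFERENT: $1"
        return 1
    fi
}

# Example (hypothetical fixture path):
#   same_bytes dawg/en_sowpods.dawg /path/to/solver/fixtures/en_sowpods.dawg
```

The non-zero return on a mismatch lets the check act as a gate in a script or Makefile recipe.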
Artifact
make dawg builds three DAWGs into dawg/:
| file | variant | source |
|---|---|---|
| en_sowpods.dawg | English (SOWPODS) | sources/scrabble_en/sowpods.txt |
| ru_scrabble.dawg | Russian Scrabble | sources/scrabble_ru/scrabble.txt |
| ru_erudit.dawg | Эрудит (Erudit) | sources/erudit_ru/erudit.txt (Ё→Е folded scrabble.txt, via tools/fold_yo.py) |
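For readers unfamiliar with the Ё→Е fold, the transformation can be sketched in a few lines of shell. This is an illustrative equivalent only, assuming the fold is a plain letter substitution in both cases; tools/fold_yo.py is the authoritative implementation and may differ in details such as de-duplication or ordering. The function name `fold_yo` is borrowed from the script name for readability.

```shell
# Hypothetical shell equivalent of the Ё→Е fold (not the repo's script).
# Reads a word list on stdin and writes the folded list to stdout.
fold_yo() {
    # Replace lower- and upper-case ё with е; every other byte passes through.
    sed -e 's/ё/е/g' -e 's/Ё/Е/g'
}

# Example:
#   fold_yo < sources/scrabble_ru/scrabble.txt
```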
The CI (.gitea/workflows/build.yaml) rebuilds them on every push/PR as a validation gate
(inlined go run, no make/python needed on the runner). Release artifacts are published per
version (see Release below): the three DAWGs packaged flat into scrabble-dawg-<tag>.tar.gz
and attached to the Gitea release for the vX.Y.Z tag. The backend deploy unpacks that tarball
into BACKEND_DICT_DIR; one semver label versions the whole set (additive — a new version is
a new release, never breaking a running backend).
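The deploy-side unpack described above might look like the following sketch. BACKEND_DICT_DIR is taken from the text; the function name `unpack_dicts` and its argument order are assumptions made for illustration, and the actual deploy script is not shown in this repo.

```shell
# Sketch of unpacking a release tarball into the backend dictionary directory.
# $1 = path to scrabble-dawg-<tag>.tar.gz, $2 = destination directory.
unpack_dicts() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # The tarball is packaged flat, so the .dawg files land directly in $dest.
    tar xzf "$tarball" -C "$dest"
}

# Example:
#   unpack_dicts scrabble-dawg-vX.Y.Z.tar.gz "$BACKEND_DICT_DIR"
```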
Sources / provenance
- English: sources/scrabble_en/sowpods.txt, vendored from kamilmielnik/scrabble-dictionaries.
- Russian: sources/scrabble_ru/scrabble.txt, derived from the Russian academic orthographic dictionary by the tooling under tools/ (see tools/README.md); sources/erudit_ru/erudit.txt is its Ё→Е folded form (tools/fold_yo.py). Only the prepared word lists are vendored; the heavy upstream source (the orfo PDF/text) is not.
Build
make dawg # -> dawg/{en_sowpods,ru_scrabble,ru_erudit}.dawg
Requires Go (module deps fetched with GOPRIVATE=gitea.iliadenisov.ru/*, exported by the
Makefile). No python is needed for the build — the Ё→Е fold is committed as erudit.txt;
regenerate it with python3 tools/fold_yo.py sources/scrabble_ru/scrabble.txt > sources/erudit_ru/erudit.txt.
Release
CI builds and validates the DAWGs but does not upload them (the release upload needs a write token, kept out of CI for now — a future enhancement). To publish a version, tag it and attach the artifact to its Gitea release:
make dawg
tar czf scrabble-dawg-vX.Y.Z.tar.gz -C dawg en_sowpods.dawg ru_scrabble.dawg ru_erudit.dawg
# create the Gitea release for tag vX.Y.Z and upload scrabble-dawg-vX.Y.Z.tar.gz as an asset
The backend consumes it at
https://gitea.iliadenisov.ru/developer/scrabble-dictionary/releases/download/vX.Y.Z/scrabble-dawg-vX.Y.Z.tar.gz.
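Since the tag appears twice in that URL, a small helper avoids mismatches between the path and the file name. The sketch below follows the URL pattern shown above; the function names are illustrative, and whether the download needs authentication is not stated here, so `fetch_release` assumes anonymous access.

```shell
# Build the asset URL for a given tag, following the pattern above.
release_url() {
    tag=$1
    echo "https://gitea.iliadenisov.ru/developer/scrabble-dictionary/releases/download/${tag}/scrabble-dawg-${tag}.tar.gz"
}

# Download the asset for a tag into the current directory.
# -f makes curl fail on HTTP errors instead of saving an error page;
# -L follows redirects. Assumes no credentials are required.
fetch_release() {
    curl -fsSL -o "scrabble-dawg-$1.tar.gz" "$(release_url "$1")"
}

# Example:
#   fetch_release vX.Y.Z
```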