Ilia Denisov d04470b741
Initial dictionary producer: builddict, word-list sources, DAWG build + CI
- builddict drives the de-internalized scrabble-solver dictdawg/wordlist builders
  (pinned v1.0.0) to produce the three DAWGs (en_sowpods, ru_scrabble, ru_erudit),
  byte-identical to the solver's committed fixtures (same dafsa/alphabet v1.1.0 -> no
  index drift with the running backend).
- Sources: english/sowpods.txt vendored from kamilmielnik/scrabble-dictionaries;
  russian/scrabble.txt + the dictprep tooling moved out of scrabble-solver.
- CI builds the DAWGs on push/PR and, on a vX.Y.Z tag, packages them flat into
  scrabble-dawg-<tag>.tar.gz and attaches it to the Gitea release.
2026-06-04 19:18:19 +02:00

# scrabble-dictionary
Versioned **dictionary artifacts** for the Scrabble game backend: the word-list sources and
the build pipeline that produces the dictionary DAWGs, published as a **release artifact**
(the DAWGs are data, not a Go module).
The build uses the published
[`scrabble-solver`](https://gitea.iliadenisov.ru/developer/scrabble-solver) `dictdawg`/`wordlist`
packages (pinned in `go.mod`) over `github.com/iliadenisov/{dafsa,alphabet}` (v1.1.0), so the
on-disk format and letter indexing match the running backend **exactly** — there is no index
drift, because the backend pins the same `dafsa`/`alphabet`. The DAWGs this repo builds are
byte-identical to the solver's committed test fixtures.
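
The byte-identity claim can be spot-checked locally with a comparison along these lines. This is a minimal sketch: the function name `same_dawgs` is hypothetical, and the location of the solver's fixtures is not stated in this README, so the reference directory is passed in explicitly. Only the three file names come from this repo.

```sh
# Sketch: compare freshly built DAWGs against a directory of reference copies.
# Assumption: the reference directory holds files with the same three names.
same_dawgs() {
    built_dir=$1
    ref_dir=$2
    status=0
    for name in en_sowpods ru_scrabble ru_erudit; do
        # cmp -s exits non-zero on any byte difference or if a file is missing.
        if cmp -s "$built_dir/$name.dawg" "$ref_dir/$name.dawg" 2>/dev/null; then
            echo "ok      $name.dawg"
        else
            echo "DIFFERS $name.dawg"
            status=1
        fi
    done
    return "$status"
}
```

After `make dawg`, a call such as `same_dawgs dawg /path/to/reference/dawgs` (placeholder path) prints one line per file and returns non-zero if any file differs.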
## Artifact
`make dawg` builds three DAWGs into `dawg/`:

| file | variant | source |
| --- | --- | --- |
| `en_sowpods.dawg` | English (SOWPODS) | `dictionaries/english/sowpods.txt` |
| `ru_scrabble.dawg` | Russian Scrabble | `dictprep/russian/scrabble.txt` |
| `ru_erudit.dawg` | Erudit (Эрудит) | the Russian list with Ё→Е folded (`dictprep/fold_yo.py`) |

The CI (`.gitea/workflows/build.yaml`) builds them on every push/PR and, on a `vX.Y.Z` tag,
packages them flat into `scrabble-dawg-<tag>.tar.gz` and attaches it to the Gitea release. The
backend deploy unpacks that tarball into `BACKEND_DICT_DIR`. **One semver label versions the
whole set.** Releases are additive: a new version is a new release and never breaks a running
backend.
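
The packaging and deploy steps described above amount to roughly the following. This is a sketch only, not the contents of `.gitea/workflows/build.yaml`: the function names `pack_dawgs` and `unpack_dawgs` are hypothetical, and the exact `tar` invocation used by CI is an assumption. The archive name pattern, the flat layout, and `BACKEND_DICT_DIR` come from this README.

```sh
# Sketch of "package flat" and "unpack into BACKEND_DICT_DIR".
# Assumption: plain tar with gzip; the real workflow may differ.

# pack_dawgs TAG [SRC_DIR]: write scrabble-dawg-TAG.tar.gz to the current
# directory with the .dawg files at the archive root (no dawg/ prefix).
pack_dawgs() {
    tag=$1
    src=${2:-dawg}
    # -C switches into SRC_DIR first, so member names carry no directory part.
    tar -czf "scrabble-dawg-$tag.tar.gz" -C "$src" \
        en_sowpods.dawg ru_scrabble.dawg ru_erudit.dawg
}

# unpack_dawgs ARCHIVE DEST_DIR: extract the flat archive into DEST_DIR.
unpack_dawgs() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

With a placeholder tag, `pack_dawgs v1.2.3` followed by `unpack_dawgs scrabble-dawg-v1.2.3.tar.gz "$BACKEND_DICT_DIR"` leaves the three `.dawg` files directly under the dictionary directory.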
## Sources / provenance
- **English:** `dictionaries/english/sowpods.txt`, vendored from
[`kamilmielnik/scrabble-dictionaries`](https://github.com/kamilmielnik/scrabble-dictionaries).
- **Russian:** `dictprep/russian/scrabble.txt`, derived from the Russian academic orthographic
dictionary by the tooling under `dictprep/` (see `dictprep/README.md`). Only the prepared word
list is vendored; the heavy upstream source (the orfo PDF/text) is not.
## Build
```sh
make dawg # -> dawg/{en_sowpods,ru_scrabble,ru_erudit}.dawg
```
Requires Go (module deps fetched with `GOPRIVATE=gitea.iliadenisov.ru/*`, exported by the
Makefile) and `python3` (for the Ё→Е fold).
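
For orientation, the Ё→Е fold behind `ru_erudit.dawg` is conceptually a character substitution over the word list. The sketch below illustrates the idea in shell; it is not the contents of `dictprep/fold_yo.py`. The function name `fold_yo`, the one-word-per-line input, the handling of both letter cases, and the de-duplication step are all assumptions.

```sh
# Illustration only: fold Ё→Е (and ё→е) in a one-word-per-line list on stdin.
# Assumptions: UTF-8 input; duplicates created by the fold are dropped.
fold_yo() {
    # Each substitution replaces one literal character with another.
    # LC_ALL=C makes sort compare raw bytes, so the output order is stable
    # across machines; -u drops words that became identical after folding.
    sed -e 's/Ё/Е/g' -e 's/ё/е/g' | LC_ALL=C sort -u
}
```

For example, `fold_yo < dictprep/russian/scrabble.txt` would write a folded, de-duplicated list to standard output. The real build relies on the Python script instead.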