Initial dictionary producer: builddict, word-list sources, DAWG build + CI
build / dawg (push) Failing after 2s
- builddict drives the de-internalized scrabble-solver dictdawg/wordlist builders (pinned v1.0.0) to produce the three DAWGs (en_sowpods, ru_scrabble, ru_erudit), byte-identical to the solver's committed fixtures (same dafsa/alphabet v1.1.0 -> no index drift with the running backend).
- Sources: english/sowpods.txt vendored from kamilmielnik/scrabble-dictionaries; russian/scrabble.txt + the dictprep tooling moved out of scrabble-solver.
- CI builds the DAWGs on push/PR and, on a vX.Y.Z tag, packages them flat into scrabble-dawg-<tag>.tar.gz and attaches it to the Gitea release.
@@ -0,0 +1,44 @@
# scrabble-dictionary

Versioned **dictionary artifacts** for the Scrabble game backend: the word-list sources and
the build pipeline that produces the dictionary DAWGs, published as a **release artifact**
(the DAWGs are data, not a Go module).

The build uses the published
[`scrabble-solver`](https://gitea.iliadenisov.ru/developer/scrabble-solver) `dictdawg`/`wordlist`
packages (pinned in `go.mod`) over `github.com/iliadenisov/{dafsa,alphabet}` (v1.1.0), so the
on-disk format and letter indexing match the running backend **exactly**. There is no index
drift, because the backend pins the same `dafsa`/`alphabet` versions. The DAWGs this repo builds
are byte-identical to the solver's committed test fixtures.
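
One way to check the byte-identical claim locally is to compare file digests between the built DAWGs and a checkout of the solver's fixtures. The sketch below is an assumption-laden illustration, not part of this repo: the function names are hypothetical, and the location of the solver's fixture directory is not stated here, so it is passed in as a parameter.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched(built_dir: Path, fixture_dir: Path, names: list[str]) -> list[str]:
    """Return the names that are missing on either side or differ in content."""
    bad = []
    for name in names:
        built, fixture = built_dir / name, fixture_dir / name
        if not (built.is_file() and fixture.is_file()):
            bad.append(name)  # missing on one side counts as a mismatch
        elif sha256_of(built) != sha256_of(fixture):
            bad.append(name)
    return bad
```

Calling `mismatched(Path("dawg"), fixtures, ["en_sowpods.dawg", "ru_scrabble.dawg", "ru_erudit.dawg"])`, where `fixtures` points at wherever the solver's fixtures are checked out, should return an empty list if the claim holds.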

## Artifact

`make dawg` builds three DAWGs into `dawg/`:

| file | variant | source |
| --- | --- | --- |
| `en_sowpods.dawg` | English (SOWPODS) | `dictionaries/english/sowpods.txt` |
| `ru_scrabble.dawg` | Russian Scrabble | `dictprep/russian/scrabble.txt` |
| `ru_erudit.dawg` | Эрудит | the Russian list with Ё→Е folded (`dictprep/fold_yo.py`) |

The CI (`.gitea/workflows/build.yaml`) builds them on every push/PR and, on a `vX.Y.Z` tag,
packages them flat into `scrabble-dawg-<tag>.tar.gz` and attaches it to the Gitea release. The
backend deploy unpacks that tarball into `BACKEND_DICT_DIR`; **one semver label versions the
whole set**. Releases are additive: a new version is a new release and never breaks a running
backend.
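
To illustrate what "packaged flat" means, here is a minimal sketch of building such a tarball. It is not the CI's actual implementation (the workflow file is not shown here); the function name `package_flat` and its parameters are hypothetical. Only the archive name pattern `scrabble-dawg-<tag>.tar.gz` comes from the text above.

```python
import tarfile
from pathlib import Path


def package_flat(dawg_dir: Path, tag: str, out_dir: Path) -> Path:
    """Write scrabble-dawg-<tag>.tar.gz with every .dawg file at the archive root."""
    out = out_dir / f"scrabble-dawg-{tag}.tar.gz"
    with tarfile.open(str(out), "w:gz") as tar:
        for dawg in sorted(dawg_dir.glob("*.dawg")):
            # arcname drops the directory part, so the archive has no subfolders
            tar.add(str(dawg), arcname=dawg.name)
    return out
```

Because the members carry no directory prefix, unpacking the archive into a target directory places the `.dawg` files directly in it, which matches how the deploy is described as unpacking into `BACKEND_DICT_DIR`.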

## Sources / provenance

- **English:** `dictionaries/english/sowpods.txt`, vendored from
  [`kamilmielnik/scrabble-dictionaries`](https://github.com/kamilmielnik/scrabble-dictionaries).
- **Russian:** `dictprep/russian/scrabble.txt`, derived from the Russian academic orthographic
  dictionary by the tooling under `dictprep/` (see `dictprep/README.md`). Only the prepared word
  list is vendored; the heavy upstream source (the orfo PDF/text) is not.

## Build

```sh
make dawg   # -> dawg/{en_sowpods,ru_scrabble,ru_erudit}.dawg
```

Requires Go (module deps fetched with `GOPRIVATE=gitea.iliadenisov.ru/*`, exported by the
Makefile) and `python3` (for the Ё→Е fold).
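
For readers unfamiliar with the Ё→Е fold, the sketch below shows the general idea. It is an illustration only, not the contents of `dictprep/fold_yo.py`: the function names are hypothetical, both letter cases are handled because the case of the word list is not stated here, and dropping duplicates created by the fold is an assumption about what a folded list would need.

```python
# Map both cases of Ё to Е; every other character is left untouched.
_YO_FOLD = str.maketrans({"Ё": "Е", "ё": "е"})


def fold_yo(word: str) -> str:
    """Return the word with every Ё/ё replaced by Е/е."""
    return word.translate(_YO_FOLD)


def fold_word_list(words):
    """Fold each word, skip blanks, and drop duplicates the fold creates.

    First-seen order is kept. Deduplication is an assumption of this sketch.
    """
    seen = set()
    folded = []
    for raw in words:
        word = fold_yo(raw.strip())
        if word and word not in seen:
            seen.add(word)
            folded.append(word)
    return folded
```

For example, a list containing both `ёлка` and `елка` collapses to a single `елка` entry after folding.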