CI: build-only validation (no make/python/contexts); commit folded erudit.txt

- build.yaml dropped the release step whose ${{ github.* }} contexts failed the Gitea
  workflow compile (the run produced 0 jobs); it now inlines go run (no make dependency)
  and reads the committed dictprep/russian/erudit.txt (no python dependency).
- erudit.txt is scrabble.txt with Ё→Е folded (dictprep/fold_yo.py); it reproduces the
  canonical ru_erudit.dawg byte-for-byte. Release artifacts are published manually for now
  (see README).
Ilia Denisov
2026-06-04 19:43:44 +02:00
parent d04470b741
commit 1d34753611
4 changed files with 83384 additions and 42 deletions
+27 -8
@@ -19,20 +19,23 @@ byte-identical to the solver's committed test fixtures.
| --- | --- | --- |
| `en_sowpods.dawg` | English (SOWPODS) | `dictionaries/english/sowpods.txt` |
| `ru_scrabble.dawg` | Russian Scrabble | `dictprep/russian/scrabble.txt` |
| `ru_erudit.dawg` | Эрудит | the Russian list with Ё→Е folded (`dictprep/fold_yo.py`) |
| `ru_erudit.dawg` | Эрудит | `dictprep/russian/erudit.txt` (Ё→Е folded `scrabble.txt`, via `dictprep/fold_yo.py`) |
The CI (`.gitea/workflows/build.yaml`) builds them on every push/PR and, on a `vX.Y.Z` tag,
packages them flat into `scrabble-dawg-<tag>.tar.gz` and attaches it to the Gitea release. The
backend deploy unpacks that tarball into `BACKEND_DICT_DIR`; **one semver label versions the
whole set** (additive — a new version is a new release, never breaking a running backend).
The CI (`.gitea/workflows/build.yaml`) rebuilds them on every push/PR as a validation gate
(inlined `go run`, no `make`/`python` needed on the runner). Release artifacts are published per
version (see **Release** below): the three DAWGs packaged flat into `scrabble-dawg-<tag>.tar.gz`
and attached to the Gitea release for the `vX.Y.Z` tag. The backend deploy unpacks that tarball
into `BACKEND_DICT_DIR`; **one semver label versions the whole set** (additive — a new version is
a new release, never breaking a running backend).
## Sources / provenance
- **English:** `dictionaries/english/sowpods.txt`, vendored from
[`kamilmielnik/scrabble-dictionaries`](https://github.com/kamilmielnik/scrabble-dictionaries).
- **Russian:** `dictprep/russian/scrabble.txt`, derived from the Russian academic orthographic
dictionary by the tooling under `dictprep/` (see `dictprep/README.md`). Only the prepared word
list is vendored; the heavy upstream source (the orfo PDF/text) is not.
dictionary by the tooling under `dictprep/` (see `dictprep/README.md`); `dictprep/russian/erudit.txt`
is its Ё→Е folded form (`dictprep/fold_yo.py`). Only the prepared word lists are vendored; the
heavy upstream source (the orfo PDF/text) is not.
## Build
@@ -41,4 +44,20 @@ make dawg # -> dawg/{en_sowpods,ru_scrabble,ru_erudit}.dawg
```
Requires Go (module deps fetched with `GOPRIVATE=gitea.iliadenisov.ru/*`, exported by the
Makefile) and `python3` (for the Ё→Е fold).
Makefile). No `python` is needed for the build — the Ё→Е fold is committed as `erudit.txt`;
regenerate it with `python3 dictprep/fold_yo.py dictprep/russian/scrabble.txt > dictprep/russian/erudit.txt`.
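For orientation, the Ё→Е fold can be sketched in plain shell. This is only an illustration under the assumption that the fold is a pure character substitution; the shell function `fold_yo` below is a hypothetical stand-in, and the real `dictprep/fold_yo.py` may do more (its source is not shown here).

```sh
# Hypothetical stand-in for dictprep/fold_yo.py.
# Assumption: the fold only maps Ё→Е and ё→е and changes nothing else.
fold_yo() {
  sed -e 's/Ё/Е/g' -e 's/ё/е/g'
}

# Reads a word list on stdin, writes the folded list on stdout.
printf 'ёлка\nЁж\n' | fold_yo
# prints:
# елка
# Еж
```

Because the folded list is committed, it can drift from `scrabble.txt`; piping the regenerate command above into `cmp - dictprep/russian/erudit.txt` instead of redirecting it is one way to check that the two are still in sync.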
## Release
CI builds and validates the DAWGs but does not upload them (the release upload needs a write
token, kept out of CI for now — a future enhancement). To publish a version, tag it and attach
the artifact to its Gitea release:
```sh
make dawg
tar czf scrabble-dawg-vX.Y.Z.tar.gz -C dawg en_sowpods.dawg ru_scrabble.dawg ru_erudit.dawg
# create the Gitea release for tag vX.Y.Z and upload scrabble-dawg-vX.Y.Z.tar.gz as an asset
```
The backend consumes it at
`https://gitea.iliadenisov.ru/developer/scrabble-dictionary/releases/download/vX.Y.Z/scrabble-dawg-vX.Y.Z.tar.gz`.
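A consumer could fetch and unpack a release roughly as follows. This is a sketch, not the backend's actual deploy step: the function names `dawg_release_url` and `fetch_dawgs` are hypothetical, and it assumes the release asset can be downloaded without authentication.

```sh
# Build the asset URL for a given tag, following the URL pattern shown above.
dawg_release_url() {
  printf 'https://gitea.iliadenisov.ru/developer/scrabble-dictionary/releases/download/%s/scrabble-dawg-%s.tar.gz\n' "$1" "$1"
}

# Download the tarball for tag $1 and unpack it flat into directory $2.
# Assumption: no auth token is needed to read the release asset.
fetch_dawgs() {
  version=$1
  dest=$2
  tmp=$(mktemp) || return 1
  curl -fsSL -o "$tmp" "$(dawg_release_url "$version")" &&
    mkdir -p "$dest" &&
    tar xzf "$tmp" -C "$dest"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Show the URL that would be fetched for a placeholder tag.
dawg_release_url vX.Y.Z
```

Calling `fetch_dawgs vX.Y.Z "$BACKEND_DICT_DIR"` with a real tag would then leave the three `.dawg` files directly in that directory, since the tarball is packaged flat.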