CI: build-only validation (no make/python/contexts); commit folded erudit.txt

- build.yaml dropped the release step whose ${{ github.* }} contexts failed the Gitea
  workflow compile (the run produced 0 jobs); it now inlines go run (no make dependency)
  and reads the committed dictprep/russian/erudit.txt (no python dependency).
- erudit.txt is scrabble.txt with Ё→Е folded (dictprep/fold_yo.py); it reproduces the
  canonical ru_erudit.dawg byte-for-byte. Release artifacts are published manually for now
  (see README).
Ilia Denisov
2026-06-04 19:43:44 +02:00
parent d04470b741
commit 1d34753611
4 changed files with 83384 additions and 42 deletions
+10 -29
@@ -1,14 +1,15 @@
name: build
# Builds the dictionary DAWGs on every push/PR (validation) and, on a vX.Y.Z tag,
# packages them flat into scrabble-dawg-<tag>.tar.gz and attaches it to the Gitea release.
# The build pins the published scrabble-solver builders (GOPRIVATE -> direct VCS fetch from
# this Gitea), so the on-disk format matches the running backend exactly.
# Validation gate: rebuilds the three dictionary DAWGs on every push/PR and checks they are
# non-empty. The build pins the published scrabble-solver builders (GOPRIVATE -> direct VCS
# fetch from this Gitea), so the on-disk format and letter indexing match the running backend
# exactly. Release artifacts (scrabble-dawg-<tag>.tar.gz attached to the Gitea release) are
# published from this output; see README.md. Inlined go run (no make/python dependency on the
# runner).
on:
push:
branches: [master]
tags: ['v*']
pull_request:
branches: [master]
@@ -32,31 +33,11 @@ jobs:
- name: Build DAWGs
run: |
make dawg
mkdir -p dawg
go run ./cmd/builddict -dict dictionaries/english/sowpods.txt -alphabet latin -name en_sowpods -out dawg
go run ./cmd/builddict -dict dictprep/russian/scrabble.txt -alphabet russian -name ru_scrabble -out dawg
go run ./cmd/builddict -dict dictprep/russian/erudit.txt -alphabet russian -name ru_erudit -out dawg
ls -la dawg/
for f in en_sowpods ru_scrabble ru_erudit; do
test -s "dawg/$f.dawg" || { echo "missing dawg/$f.dawg"; exit 1; }
done
- name: Package and publish release artifact
if: startsWith(github.ref, 'refs/tags/v')
env:
TOKEN: ${{ github.token }}
API: ${{ github.server_url }}/api/v1/repos/${{ github.repository }}
run: |
set -eo pipefail
tag="${GITHUB_REF_NAME}"
art="scrabble-dawg-${tag}.tar.gz"
tar czf "$art" -C dawg en_sowpods.dawg ru_scrabble.dawg ru_erudit.dawg
# Create the release (or fetch it if it already exists), then upload the asset.
code=$(curl -sS -o /tmp/rel.json -w '%{http_code}' -X POST "$API/releases" \
-H "Authorization: token $TOKEN" -H 'Content-Type: application/json' \
-d "{\"tag_name\":\"$tag\",\"name\":\"$tag\",\"body\":\"Dictionary DAWG set $tag (en_sowpods, ru_scrabble, ru_erudit).\"}")
if [ "$code" != "201" ]; then
echo "release POST returned $code; fetching existing release for tag $tag"
curl -sS -o /tmp/rel.json "$API/releases/tags/$tag" -H "Authorization: token $TOKEN"
fi
rel_id=$(python3 -c 'import json;print(json.load(open("/tmp/rel.json"))["id"])')
curl -sS -X POST "$API/releases/$rel_id/assets?name=$art" \
-H "Authorization: token $TOKEN" -F "attachment=@$art" -o /tmp/asset.json
echo "published $art to release $rel_id"
+4 -5
@@ -5,14 +5,14 @@
# format and letter indexing match the running backend exactly (no index drift):
# en_sowpods.dawg — English SOWPODS (Latin alphabet)
# ru_scrabble.dawg — Russian Scrabble nouns (Cyrillic, 33 letters)
# ru_erudit.dawg — Эрудит (the same list with Ё→Е folded and de-duped)
# ru_erudit.dawg — Эрудит (the Ё→Е folded + de-duped list, committed as russian/erudit.txt)
#
# The CI workflow packages dawg/*.dawg into a release artifact on a vX.Y.Z tag.
# CI builds the DAWGs as a validation gate; release artifacts are published from this output
# (see README.md). Regenerate russian/erudit.txt from scrabble.txt with dictprep/fold_yo.py.
export GOPRIVATE := gitea.iliadenisov.ru/*
GO ?= go
PYTHON ?= python3
DAWG_DIR := dawg
BUILDDICT := $(GO) run ./cmd/builddict
@@ -27,8 +27,7 @@ dawg-ru:
$(BUILDDICT) -dict dictprep/russian/scrabble.txt -alphabet russian -name ru_scrabble -out $(DAWG_DIR)
dawg-erudit:
$(PYTHON) dictprep/fold_yo.py dictprep/russian/scrabble.txt > /tmp/ru_erudit_words.txt
$(BUILDDICT) -dict /tmp/ru_erudit_words.txt -alphabet russian -name ru_erudit -out $(DAWG_DIR)
$(BUILDDICT) -dict dictprep/russian/erudit.txt -alphabet russian -name ru_erudit -out $(DAWG_DIR)
clean-dawg:
rm -f $(DAWG_DIR)/*.dawg
+27 -8
@@ -19,20 +19,23 @@ byte-identical to the solver's committed test fixtures.
| --- | --- | --- |
| `en_sowpods.dawg` | English (SOWPODS) | `dictionaries/english/sowpods.txt` |
| `ru_scrabble.dawg` | Russian Scrabble | `dictprep/russian/scrabble.txt` |
| `ru_erudit.dawg` | Эрудит | the Russian list with Ё→Е folded (`dictprep/fold_yo.py`) |
| `ru_erudit.dawg` | Эрудит | `dictprep/russian/erudit.txt` (Ё→Е folded `scrabble.txt`, via `dictprep/fold_yo.py`) |
The CI (`.gitea/workflows/build.yaml`) builds them on every push/PR and, on a `vX.Y.Z` tag,
packages them flat into `scrabble-dawg-<tag>.tar.gz` and attaches it to the Gitea release. The
backend deploy unpacks that tarball into `BACKEND_DICT_DIR`; **one semver label versions the
whole set** (additive — a new version is a new release, never breaking a running backend).
The CI (`.gitea/workflows/build.yaml`) rebuilds them on every push/PR as a validation gate
(inlined `go run`, no `make`/`python` needed on the runner). Release artifacts are published per
version (see **Release** below): the three DAWGs packaged flat into `scrabble-dawg-<tag>.tar.gz`
and attached to the Gitea release for the `vX.Y.Z` tag. The backend deploy unpacks that tarball
into `BACKEND_DICT_DIR`; **one semver label versions the whole set** (additive — a new version is
a new release, never breaking a running backend).
## Sources / provenance
- **English:** `dictionaries/english/sowpods.txt`, vendored from
[`kamilmielnik/scrabble-dictionaries`](https://github.com/kamilmielnik/scrabble-dictionaries).
- **Russian:** `dictprep/russian/scrabble.txt`, derived from the Russian academic orthographic
dictionary by the tooling under `dictprep/` (see `dictprep/README.md`). Only the prepared word
list is vendored; the heavy upstream source (the orfo PDF/text) is not.
dictionary by the tooling under `dictprep/` (see `dictprep/README.md`); `dictprep/russian/erudit.txt`
is its Ё→Е folded form (`dictprep/fold_yo.py`). Only the prepared word lists are vendored; the
heavy upstream source (the orfo PDF/text) is not.
## Build
@@ -41,4 +44,20 @@ make dawg # -> dawg/{en_sowpods,ru_scrabble,ru_erudit}.dawg
```
Requires Go (module deps fetched with `GOPRIVATE=gitea.iliadenisov.ru/*`, exported by the
Makefile) and `python3` (for the Ё→Е fold).
Makefile). No `python` is needed for the build — the Ё→Е fold is committed as `erudit.txt`;
regenerate it with `python3 dictprep/fold_yo.py dictprep/russian/scrabble.txt > dictprep/russian/erudit.txt`.
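The fold itself is a small text transform. As a rough illustration only, a shell equivalent might look like the sketch below. The real logic lives in `dictprep/fold_yo.py`, which is not shown here, so its exact ordering and de-duplication behaviour are assumptions, and `fold_yo` is a hypothetical helper name.

```sh
# Hypothetical stand-in for dictprep/fold_yo.py (the real script is not shown here).
# Assumes UTF-8 input with one word per line.
fold_yo() {
    # Replace both cases of Ё with Е, then drop repeated words,
    # keeping the first occurrence of each in its original position.
    sed -e 's/Ё/Е/g' -e 's/ё/е/g' "$1" | awk '!seen[$0]++'
}
```

If the committed list was produced by an equivalent transform, `fold_yo dictprep/russian/scrabble.txt | cmp - dictprep/russian/erudit.txt` should report no difference. A mismatch is a cue to rerun `fold_yo.py` rather than proof of a bug, since the Python script may order or filter words differently.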
## Release
CI builds and validates the DAWGs but does not upload them (the release upload needs a write
token, kept out of CI for now — a future enhancement). To publish a version, tag it and attach
the artifact to its Gitea release:
```sh
make dawg
tar czf scrabble-dawg-vX.Y.Z.tar.gz -C dawg en_sowpods.dawg ru_scrabble.dawg ru_erudit.dawg
# create the Gitea release for tag vX.Y.Z and upload scrabble-dawg-vX.Y.Z.tar.gz as an asset
```
The backend consumes it at
`https://gitea.iliadenisov.ru/developer/scrabble-dictionary/releases/download/vX.Y.Z/scrabble-dawg-vX.Y.Z.tar.gz`.
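Because the backend unpacks the tarball straight into `BACKEND_DICT_DIR`, it may be worth confirming that the archive is flat before uploading it. A minimal sketch, assuming a `tar` that understands `tzf` (GNU or BSD); `check_flat` is a hypothetical helper, not part of this repository.

```sh
# Hypothetical helper: succeeds only if the archive holds exactly the
# three DAWG files at its top level (no dawg/ prefix, no extra entries).
check_flat() {
    want=$(printf '%s\n' en_sowpods.dawg ru_erudit.dawg ru_scrabble.dawg)
    # List the archive members and sort them bytewise for a stable comparison.
    got=$(tar tzf "$1" | LC_ALL=C sort)
    [ "$got" = "$want" ]
}
```

Usage would be `check_flat scrabble-dawg-vX.Y.Z.tar.gz && echo "flat, ok to upload"`. An archive created without `-C dawg` lists its members as `dawg/...` and fails the check.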
File diff suppressed because it is too large