CI: build-only validation (no make/python/contexts); commit folded erudit.txt
build / dawg (pull_request) Successful in 1m11s
- build.yaml dropped the release step whose ${{ github.* }} contexts failed the Gitea
workflow compile (the run produced 0 jobs); it now inlines go run (no make dependency)
and reads the committed dictprep/russian/erudit.txt (no python dependency).
- erudit.txt is scrabble.txt with Ё→Е folded (dictprep/fold_yo.py); it reproduces the
canonical ru_erudit.dawg byte-for-byte. Release artifacts are published manually for now
(see README).
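The Ё→Е fold described above can be sketched as follows. This is a minimal illustration, not the contents of `dictprep/fold_yo.py`, which this page does not show: the function name `fold_yo` and the choice to keep first-seen order are assumptions. The real script may sort or normalise differently.

```python
def fold_yo(words):
    """Fold Ё→Е in both cases, then drop duplicates, keeping first-seen order.

    Hypothetical helper: the real dictprep/fold_yo.py is not shown on this page.
    """
    seen = set()
    folded = []
    for word in words:
        # Strip the trailing newline/whitespace, then replace both letter cases.
        w = word.strip().replace("ё", "е").replace("Ё", "Е")
        # Skip blank lines and words that collapsed onto an earlier entry.
        if w and w not in seen:
            seen.add(w)
            folded.append(w)
    return folded
```

Folding can make two distinct entries identical (for example a list that already contains both spellings of a word), which is why the de-dupe step comes after the replacement rather than before it.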
+10
-29
@@ -1,14 +1,15 @@
 name: build
 
-# Builds the dictionary DAWGs on every push/PR (validation) and, on a vX.Y.Z tag,
-# packages them flat into scrabble-dawg-<tag>.tar.gz and attaches it to the Gitea release.
-# The build pins the published scrabble-solver builders (GOPRIVATE -> direct VCS fetch from
-# this Gitea), so the on-disk format matches the running backend exactly.
+# Validation gate: rebuilds the three dictionary DAWGs on every push/PR and checks they are
+# non-empty. The build pins the published scrabble-solver builders (GOPRIVATE -> direct VCS
+# fetch from this Gitea), so the on-disk format and letter indexing match the running backend
+# exactly. Release artifacts (scrabble-dawg-<tag>.tar.gz attached to the Gitea release) are
+# published from this output; see README.md. Inlined go run (no make/python dependency on the
+# runner).
 
 on:
   push:
     branches: [master]
-    tags: ['v*']
   pull_request:
     branches: [master]
 
@@ -32,31 +33,11 @@ jobs:
 
       - name: Build DAWGs
         run: |
-          make dawg
+          mkdir -p dawg
+          go run ./cmd/builddict -dict dictionaries/english/sowpods.txt -alphabet latin -name en_sowpods -out dawg
+          go run ./cmd/builddict -dict dictprep/russian/scrabble.txt -alphabet russian -name ru_scrabble -out dawg
+          go run ./cmd/builddict -dict dictprep/russian/erudit.txt -alphabet russian -name ru_erudit -out dawg
           ls -la dawg/
           for f in en_sowpods ru_scrabble ru_erudit; do
             test -s "dawg/$f.dawg" || { echo "missing dawg/$f.dawg"; exit 1; }
           done
-
-      - name: Package and publish release artifact
-        if: startsWith(github.ref, 'refs/tags/v')
-        env:
-          TOKEN: ${{ github.token }}
-          API: ${{ github.server_url }}/api/v1/repos/${{ github.repository }}
-        run: |
-          set -eo pipefail
-          tag="${GITHUB_REF_NAME}"
-          art="scrabble-dawg-${tag}.tar.gz"
-          tar czf "$art" -C dawg en_sowpods.dawg ru_scrabble.dawg ru_erudit.dawg
-          # Create the release (or fetch it if it already exists), then upload the asset.
-          code=$(curl -sS -o /tmp/rel.json -w '%{http_code}' -X POST "$API/releases" \
-            -H "Authorization: token $TOKEN" -H 'Content-Type: application/json' \
-            -d "{\"tag_name\":\"$tag\",\"name\":\"$tag\",\"body\":\"Dictionary DAWG set $tag (en_sowpods, ru_scrabble, ru_erudit).\"}")
-          if [ "$code" != "201" ]; then
-            echo "release POST returned $code; fetching existing release for tag $tag"
-            curl -sS -o /tmp/rel.json "$API/releases/tags/$tag" -H "Authorization: token $TOKEN"
-          fi
-          rel_id=$(python3 -c 'import json;print(json.load(open("/tmp/rel.json"))["id"])')
-          curl -sS -X POST "$API/releases/$rel_id/assets?name=$art" \
-            -H "Authorization: token $TOKEN" -F "attachment=@$art" -o /tmp/asset.json
-          echo "published $art to release $rel_id"
@@ -5,14 +5,14 @@
 # format and letter indexing match the running backend exactly (no index drift):
 # en_sowpods.dawg — English SOWPODS (Latin alphabet)
 # ru_scrabble.dawg — Russian Scrabble nouns (Cyrillic, 33 letters)
-# ru_erudit.dawg — Эрудит (the same list with Ё→Е folded and de-duped)
+# ru_erudit.dawg — Эрудит (the Ё→Е folded + de-duped list, committed as russian/erudit.txt)
 #
-# The CI workflow packages dawg/*.dawg into a release artifact on a vX.Y.Z tag.
+# CI builds the DAWGs as a validation gate; release artifacts are published from this output
+# (see README.md). Regenerate russian/erudit.txt from scrabble.txt with dictprep/fold_yo.py.
 
 export GOPRIVATE := gitea.iliadenisov.ru/*
 
 GO ?= go
-PYTHON ?= python3
 DAWG_DIR := dawg
 BUILDDICT := $(GO) run ./cmd/builddict
 
@@ -27,8 +27,7 @@ dawg-ru:
 	$(BUILDDICT) -dict dictprep/russian/scrabble.txt -alphabet russian -name ru_scrabble -out $(DAWG_DIR)
 
 dawg-erudit:
-	$(PYTHON) dictprep/fold_yo.py dictprep/russian/scrabble.txt > /tmp/ru_erudit_words.txt
-	$(BUILDDICT) -dict /tmp/ru_erudit_words.txt -alphabet russian -name ru_erudit -out $(DAWG_DIR)
+	$(BUILDDICT) -dict dictprep/russian/erudit.txt -alphabet russian -name ru_erudit -out $(DAWG_DIR)
 
 clean-dawg:
 	rm -f $(DAWG_DIR)/*.dawg
@@ -19,20 +19,23 @@ byte-identical to the solver's committed test fixtures.
 | --- | --- | --- |
 | `en_sowpods.dawg` | English (SOWPODS) | `dictionaries/english/sowpods.txt` |
 | `ru_scrabble.dawg` | Russian Scrabble | `dictprep/russian/scrabble.txt` |
-| `ru_erudit.dawg` | Эрудит | the Russian list with Ё→Е folded (`dictprep/fold_yo.py`) |
+| `ru_erudit.dawg` | Эрудит | `dictprep/russian/erudit.txt` (Ё→Е folded `scrabble.txt`, via `dictprep/fold_yo.py`) |
 
-The CI (`.gitea/workflows/build.yaml`) builds them on every push/PR and, on a `vX.Y.Z` tag,
-packages them flat into `scrabble-dawg-<tag>.tar.gz` and attaches it to the Gitea release. The
-backend deploy unpacks that tarball into `BACKEND_DICT_DIR`; **one semver label versions the
-whole set** (additive — a new version is a new release, never breaking a running backend).
+The CI (`.gitea/workflows/build.yaml`) rebuilds them on every push/PR as a validation gate
+(inlined `go run`, no `make`/`python` needed on the runner). Release artifacts are published per
+version (see **Release** below): the three DAWGs packaged flat into `scrabble-dawg-<tag>.tar.gz`
+and attached to the Gitea release for the `vX.Y.Z` tag. The backend deploy unpacks that tarball
+into `BACKEND_DICT_DIR`; **one semver label versions the whole set** (additive — a new version is
+a new release, never breaking a running backend).
 
 ## Sources / provenance
 
 - **English:** `dictionaries/english/sowpods.txt`, vendored from
 [`kamilmielnik/scrabble-dictionaries`](https://github.com/kamilmielnik/scrabble-dictionaries).
 - **Russian:** `dictprep/russian/scrabble.txt`, derived from the Russian academic orthographic
-dictionary by the tooling under `dictprep/` (see `dictprep/README.md`). Only the prepared word
-list is vendored; the heavy upstream source (the orfo PDF/text) is not.
+dictionary by the tooling under `dictprep/` (see `dictprep/README.md`); `dictprep/russian/erudit.txt`
+is its Ё→Е folded form (`dictprep/fold_yo.py`). Only the prepared word lists are vendored; the
+heavy upstream source (the orfo PDF/text) is not.
 
 ## Build
 
@@ -41,4 +44,20 @@ make dawg # -> dawg/{en_sowpods,ru_scrabble,ru_erudit}.dawg
 ```
 
 Requires Go (module deps fetched with `GOPRIVATE=gitea.iliadenisov.ru/*`, exported by the
-Makefile) and `python3` (for the Ё→Е fold).
+Makefile). No `python` is needed for the build — the Ё→Е fold is committed as `erudit.txt`;
+regenerate it with `python3 dictprep/fold_yo.py dictprep/russian/scrabble.txt > dictprep/russian/erudit.txt`.
+
+## Release
+
+CI builds and validates the DAWGs but does not upload them (the release upload needs a write
+token, kept out of CI for now — a future enhancement). To publish a version, tag it and attach
+the artifact to its Gitea release:
+
+```sh
+make dawg
+tar czf scrabble-dawg-vX.Y.Z.tar.gz -C dawg en_sowpods.dawg ru_scrabble.dawg ru_erudit.dawg
+# create the Gitea release for tag vX.Y.Z and upload scrabble-dawg-vX.Y.Z.tar.gz as an asset
+```
+
+The backend consumes it at
+`https://gitea.iliadenisov.ru/developer/scrabble-dictionary/releases/download/vX.Y.Z/scrabble-dawg-vX.Y.Z.tar.gz`.
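For anyone scripting the manual release step from the README hunk, the flat packaging (`tar czf ... -C dawg ...`) could be reproduced in Python roughly as below. This is a sketch under assumptions: `package_dawgs` and `DAWG_NAMES` are hypothetical names, not part of the repository, and the non-empty check simply mirrors the `test -s` gate in the workflow. It does not create the Gitea release or upload the asset.

```python
import tarfile
from pathlib import Path

# The three artifacts named in the workflow and README.
DAWG_NAMES = ("en_sowpods", "ru_scrabble", "ru_erudit")


def package_dawgs(dawg_dir, tag, out_dir="."):
    """Pack the three DAWGs flat (no directory prefix) into scrabble-dawg-<tag>.tar.gz.

    Hypothetical helper illustrating the manual release step.
    """
    dawg_dir = Path(dawg_dir)
    archive = Path(out_dir) / f"scrabble-dawg-{tag}.tar.gz"
    files = [dawg_dir / f"{name}.dawg" for name in DAWG_NAMES]
    for path in files:
        # Same condition as `test -s`: the file must exist and be non-empty.
        if not path.is_file() or path.stat().st_size == 0:
            raise FileNotFoundError(f"missing or empty {path}")
    with tarfile.open(archive, "w:gz") as tar:
        for path in files:
            # arcname drops the dawg/ prefix, so the archive unpacks flat.
            tar.add(path, arcname=path.name)
    return archive
```

Calling `package_dawgs("dawg", "vX.Y.Z")` after `make dawg` would produce an archive with the same member names as the `tar` command in the README. Byte-level identity with the `tar` output is not guaranteed, since gzip and tar headers carry timestamps.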
File diff suppressed because it is too large