Initial dictionary producer: builddict, word-list sources, DAWG build + CI
build / dawg (push) Failing after 2s
- builddict drives the de-internalized scrabble-solver dictdawg/wordlist builders (pinned v1.0.0) to produce the three DAWGs (en_sowpods, ru_scrabble, ru_erudit), byte-identical to the solver's committed fixtures (same dafsa/alphabet v1.1.0 -> no index drift with the running backend).
- Sources: english/sowpods.txt vendored from kamilmielnik/scrabble-dictionaries; russian/scrabble.txt + the dictprep tooling moved out of scrabble-solver.
- CI builds the DAWGs on push/PR and, on a vX.Y.Z tag, packages them flat into scrabble-dawg-<tag>.tar.gz and attaches it to the Gitea release.
@@ -0,0 +1,27 @@
#!/usr/bin/env python3
"""Fold Ё/ё → Е/е in a word list and de-duplicate — the dictionary prep for "Эрудит".

The Эрудит ruleset has no Ё tile and treats Е/Ё as one letter, so its dictionary must be
folded before the DAWG is built. Folding merges pairs like ёж/еж, hence the de-dup. Output
is sorted (Russian order over the 32 folded letters) and LF-separated.

Run: python3 dictprep/fold_yo.py dictprep/russian/scrabble.txt > /tmp/ru_erudit_words.txt
"""
import sys

ORDER = {c: i for i, c in enumerate("абвгдежзийклмнопрстуфхцчшщъыьэюя")}  # 32 letters, no ё


def key(w):
    return [ORDER.get(c, 99) for c in w]


def main():
    src = sys.argv[1] if len(sys.argv) > 1 else "/dev/stdin"
    words = {line.strip().replace("ё", "е").replace("Ё", "Е") for line in open(src, encoding="utf-8")}
    words.discard("")
    sys.stdout.write("\n".join(sorted(words, key=key)) + "\n")


if __name__ == "__main__":
    main()
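To illustrate what the fold does to a small input, here is a minimal sketch that applies the same three steps (fold Ё/ё, de-duplicate, sort by the 32-letter order) to an in-memory list instead of a file. The helper names `fold_yo` and `fold_and_sort` and the sample words are assumptions made for this illustration; they are not part of the committed script.

```python
# Illustrative sketch only: fold_yo() and fold_and_sort() are hypothetical
# helpers that mirror the steps of the script on an in-memory list.
RUSSIAN_NO_YO = "абвгдежзийклмнопрстуфхцчшщъыьэюя"  # 32 letters, no ё
ORDER = {c: i for i, c in enumerate(RUSSIAN_NO_YO)}


def fold_yo(word):
    """Replace Ё/ё with Е/е, leaving every other character untouched."""
    return word.replace("ё", "е").replace("Ё", "Е")


def fold_and_sort(words):
    """Fold, drop empty entries and duplicates, then sort in Russian order."""
    folded = {fold_yo(w.strip()) for w in words}
    folded.discard("")
    # Characters outside the 32-letter alphabet get rank 99 and sort last.
    return sorted(folded, key=lambda w: [ORDER.get(c, 99) for c in w])


# Made-up sample: "ёж" and "еж" collapse into a single entry after folding.
sample = ["ёж", "еж", "ель", "яма", "", "ёлка\n"]
result = fold_and_sort(sample)
print(result)
```

With this sample, the pair ёж/еж merges into one word and ёлка becomes елка, which then sorts before ель because к precedes ь in the alphabet.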