540ee32178
Build a committed Russian common-noun word list (dictprep/russian/scrabble.txt) from the RAN orthographic dictionary, for the Эрудит ruleset.

- Stage 1 (Go, dictprep/ruwords): orfo_dict_2025.txt -> all.txt; extracts headwords, reconstructs "ед." singulars (suppressing plurals), pairs "и" variants.
- Stage 2 (Python brain, dictprep/ru_stage2.py): OpenCorpora (mawo-pymorphy3) + libmorph + orthographic notes select common nouns (nom. sing.); --trace explains a word's fate, --dump writes the in-memory buckets.
- libmorph C++ bridge (libmorph_check.cpp); manual_confirm.txt is merged in.
- orfo_dict_2025.txt is the committed pdftotext source of truth.
- See dictprep/README.md for methodology and reproducibility.
19 lines
550 B
Plaintext
# Cached serialized dictionaries, built from the dictionaries/ submodule by
# cmd/builddict. They are reproducible artifacts, not source.
/testdata/*.dawg
/testdata/*.gaddag
/testdata/*.bin

# Local scratch
/tmp/

# Compiled libmorph bridge (build artifact; see dictprep/README.md)
/dictprep/libmorph_check

# Stage 2 --dump debug buckets (committed: all, scrabble, manual_confirm, orfo_dict_2025)
/dictprep/russian/undefined.txt
/dictprep/russian/adjectives.txt
/dictprep/russian/verbs.txt
/dictprep/russian/singulars.txt
/dictprep/russian/fate.tsv