Publish as versioned Gitea module; move dictionary pipeline out

- Rename module to gitea.iliadenisov.ru/developer/scrabble-solver so it can be consumed as a versioned dependency (no go.work replace / CI clone).
- De-internalize wordlist and dictdawg as public packages.
- Remove cmd/builddict, dictprep/, the dictionaries submodule and the dawg Makefile: the word-list parsing and DAWG build now live in the separate scrabble-dictionary repository, which publishes the DAWG set as a release artifact.
- internal/dict loads the committed dawg/en_sowpods.dawg fixture for cmd/stress.
- Update README/CLAUDE docs accordingly.
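
The import-path changes in the hunks below follow one simple mapping, sketched here in Go. This is only an illustration: `rewriteImport` and the `promoted` table are hypothetical names, not part of the repository, and the commit does not say how the rewrite was actually performed. The sketch assumes what the diff shows: the module prefix changes from `scrabble-solver` to the Gitea path, `internal/wordlist` and `internal/dictdawg` move to the module root, and the remaining `internal/` packages stay where they are.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	oldModule = "scrabble-solver"
	newModule = "gitea.iliadenisov.ru/developer/scrabble-solver"
)

// promoted lists the packages this commit moves out of internal/ (assumed from the diff).
var promoted = map[string]bool{"wordlist": true, "dictdawg": true}

// rewriteImport is a hypothetical helper that maps an old import path to its
// new location. Paths outside the old module are returned unchanged.
func rewriteImport(path string) string {
	if path != oldModule && !strings.HasPrefix(path, oldModule+"/") {
		return path // third-party or standard-library import
	}
	rest := strings.TrimPrefix(strings.TrimPrefix(path, oldModule), "/")
	if strings.HasPrefix(rest, "internal/") {
		sub := strings.TrimPrefix(rest, "internal/")
		name := strings.SplitN(sub, "/", 2)[0]
		if promoted[name] {
			rest = sub // de-internalized: drop the internal/ segment
		}
	}
	if rest == "" {
		return newModule
	}
	return newModule + "/" + rest
}

func main() {
	for _, p := range []string{
		"scrabble-solver/board",
		"scrabble-solver/internal/encoding",
		"scrabble-solver/internal/dictdawg",
		"github.com/iliadenisov/alphabet",
	} {
		fmt.Printf("%s -> %s\n", p, rewriteImport(p))
	}
}
```

Keeping the promoted packages in a small table makes the public/internal split explicit: anything not listed there keeps its `internal/` segment and so remains unimportable from outside the module.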
2026-06-04 19:11:46 +02:00
parent 63a7c663bf
commit 256999b42c
41 changed files with 93 additions and 402477 deletions
@@ -1,3 +0,0 @@
-[submodule "dictionaries"]
-	path = dictionaries
-	url = https://github.com/kamilmielnik/scrabble-dictionaries
@@ -17,23 +17,21 @@ Russian **Эрудит** (`rules` package); Эрудит has no Ё tile and fold
 - `board/`, `rack/`, `rules/` — board grid (+ transpose), rack as per-letter counts,
  and rulesets (geometry, premium layout, tile values/counts, alphabet, bonus):
  `rules.English()`, `rules.RussianScrabble()`, `rules.Erudit()`.
-- `internal/` — `dictdawg` (build/load/serialise DAWGs over dafsa), `wordlist`
-  (encode/filter/sort/dedupe + `FoldYo`), `graph`, `dict`.
-- `cmd/builddict` — word list → serialised DAWG (`-alphabet latin|russian`).
+- `dictdawg/`, `wordlist/` — **public** helpers: `dictdawg` (build/load/serialise DAWGs
+  over dafsa), `wordlist` (encode/filter/sort/dedupe + `FoldYo`). Imported by the separate
+  `scrabble-dictionary` repo that builds and publishes the DAWG set.
+- `internal/` — `encoding`, `graph`, `dict` (loads the committed `dawg/en_sowpods.dawg`
+  for `cmd/stress`).
 - `cmd/stress`, `selfplay/` — the self-play stress harness behind `RESULTS.md`.
 - `dawg/` — **committed** dictionaries: `en_sowpods.dawg`, `ru_scrabble.dawg`,
-  `ru_erudit.dawg` (Ё→Е folded). Rebuild with `make dawg`.
-- `dictionaries/` — `kamilmielnik/scrabble-dictionaries` git submodule (English source).
-- `dictprep/` — self-contained tooling that turns the Russian academic orthographic
-  dictionary into a common-noun word list. See `dictprep/README.md`. Committed output is
-  `dictprep/russian/{all,scrabble}.txt` (+ `orfo_dict_2025.{pdf,txt}`, `manual_confirm.txt`).
-  Running Stage 2 needs a Python venv with `mawo-pymorphy3` and the `libmorph` apt packages
-  (see `dictprep/README.md`).
+  `ru_erudit.dawg` (Ё→Е folded). The word-list sources and build pipeline live in the
+  separate [`scrabble-dictionary`](https://gitea.iliadenisov.ru/developer/scrabble-dictionary)
+  repo (which publishes the DAWG set as a release artifact); these committed copies are
+  test fixtures.

 ## Build & test

    go test ./...            # all packages green; also run go vet ./... and gofmt
-    make dawg                # rebuild dawg/*.dawg from the word lists

 Scoring and move generation are validated against **real tournament games** in GCG format
 (`scrabble/gcg_test.go` + `scrabble/testdata/*.gcg`, including the 700+ club): for every
@@ -46,4 +44,5 @@ produces the played move with that score — canonical play, not invented cases.
  and output bytes only — never inside the graph). The public API is byte-indexed.
 - DAWG is the production generator; the GADDAG was removed after measurement.
 - Detailed docs: `ALGORITHM.md` (the algorithm — single source of truth), `PLAN.md`
-  (design and decisions), `RESULTS.md` (DAWG-vs-GADDAG), `dictprep/README.md` (RU pipeline).
+  (design and decisions), `RESULTS.md` (DAWG-vs-GADDAG). The RU word-list pipeline and the
+  DAWG build now live in the `scrabble-dictionary` repo.
@@ -1,28 +0,0 @@
-# Scrabble-solver build helpers.
-#
-# `make dawg` (re)builds the committed dictionary DAWGs under dawg/ from their word lists:
-#   en_sowpods.dawg  — English SOWPODS (Latin alphabet)
-#   ru_scrabble.dawg — Russian Scrabble nouns (Cyrillic, 33 letters)
-#   ru_erudit.dawg   — Эрудит (the same list with Ё→Е folded and de-duped)
-
-GO        ?= go
-PYTHON    ?= python3
-DAWG_DIR  := dawg
-BUILDDICT := $(GO) run ./cmd/builddict
-
-.PHONY: dawg dawg-en dawg-ru dawg-erudit clean-dawg
-
-dawg: dawg-en dawg-ru dawg-erudit
-
-dawg-en:
-	$(BUILDDICT) -dict dictionaries/english/sowpods.txt -alphabet latin -name en_sowpods -out $(DAWG_DIR)
-
-dawg-ru:
-	$(BUILDDICT) -dict dictprep/russian/scrabble.txt -alphabet russian -name ru_scrabble -out $(DAWG_DIR)
-
-dawg-erudit:
-	$(PYTHON) dictprep/fold_yo.py dictprep/russian/scrabble.txt > /tmp/ru_erudit_words.txt
-	$(BUILDDICT) -dict /tmp/ru_erudit_words.txt -alphabet russian -name ru_erudit -out $(DAWG_DIR)
-
-clean-dawg:
-	rm -f $(DAWG_DIR)/*.dawg
@@ -24,27 +24,24 @@ See [`ALGORITHM.md`](ALGORITHM.md) for the algorithm (the single source of truth
 ```
 scrabble/        public API: Solver, Move/Play types, DAWG generator, scoring, validation
 board/ rack/ rules/   board grid (+transpose), rack, rulesets (English/Russian/Эрудит)
-internal/        encoding (byte conventions), wordlist, dictdawg, dict, graph
-cmd/builddict/   word list -> serialized DAWG in testdata
+wordlist/ dictdawg/   public word-list parsing and DAWG build/load helpers
+internal/        encoding (byte conventions), dict (committed-DAWG loader), graph
 cmd/stress/      greedy self-play benchmark of the generator
 selfplay/        bag + greedy player + game loop
 ```

 ## Setup

-```sh
-git submodule update --init            # the dictionaries submodule (SOWPODS, TWL06, …)
-go run ./cmd/builddict                 # build testdata/sowpods.dawg (≈0.2 s, ~730 KB)
-```
-
-`go.mod` carries `replace github.com/iliadenisov/dafsa => ../dafsa`: the solver needs
-dafsa's low-level traversal `Cursor` (see the patch notes in `../dafsa/SCRABBLE_API.md`).
+The committed dictionary DAWGs under `dawg/` (`en_sowpods`, `ru_scrabble`, `ru_erudit`)
+are used directly — no build step. The word-list parsing and DAWG build pipeline lives in
+the separate [`scrabble-dictionary`](https://gitea.iliadenisov.ru/developer/scrabble-dictionary)
+repository, which publishes the DAWG set as a release artifact.

 ## Usage

 ```go
 rs := rules.English()
-finder, _ := dict.EnglishDAWG()            // loads testdata/sowpods.dawg
+finder, _ := dict.EnglishDAWG()            // loads dawg/en_sowpods.dawg
 s := scrabble.NewSolver(rs, finder)

 b := board.New(rs.Rows, rs.Cols)           // empty board (first move)
@@ -9,7 +9,7 @@ import (

 	"github.com/iliadenisov/alphabet"

-	"scrabble-solver/internal/encoding"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/internal/encoding"
 )

 // Board is a row-major grid of encoded cells.
@@ -5,8 +5,8 @@ import (

 	"github.com/iliadenisov/alphabet"

-	"scrabble-solver/board"
-	"scrabble-solver/internal/encoding"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/internal/encoding"
 )

 func TestParseAndAccess(t *testing.T) {
@@ -1,75 +0,0 @@
-// Command builddict converts a word list into a serialized DAWG. By default it reads the
-// English SOWPODS list (Latin alphabet); pass -alphabet russian for the Cyrillic lists.
-package main
-
-import (
-	"flag"
-	"fmt"
-	"log"
-	"os"
-	"path/filepath"
-	"time"
-
-	"github.com/iliadenisov/alphabet"
-
-	"scrabble-solver/internal/dictdawg"
-	"scrabble-solver/internal/wordlist"
-)
-
-func main() {
-	dict := flag.String("dict", "dictionaries/english/sowpods.txt", "word list file (one word per line)")
-	out := flag.String("out", "testdata", "output directory")
-	name := flag.String("name", "sowpods", "base name for the output file")
-	minLen := flag.Int("min", 2, "minimum word length")
-	maxLen := flag.Int("max", 15, "maximum word length")
-	alpha := flag.String("alphabet", "latin", "alphabet: latin (English) or russian")
-	flag.Parse()
-
-	var idx alphabet.Indexer
-	switch *alpha {
-	case "latin":
-		idx = alphabet.Latin()
-	case "russian":
-		idx = alphabet.Embedded(alphabet.Langs.LangRu)
-	default:
-		log.Fatalf("unknown -alphabet %q (want latin or russian)", *alpha)
-	}
-
-	t0 := time.Now()
-	words, err := wordlist.Read(*dict, idx, *minLen, *maxLen)
-	if err != nil {
-		log.Fatalf("read %s: %v", *dict, err)
-	}
-	fmt.Printf("loaded %d words from %s in %s\n", len(words), *dict, time.Since(t0).Round(time.Millisecond))
-
-	if err := os.MkdirAll(*out, 0o755); err != nil {
-		log.Fatal(err)
-	}
-
-	t := time.Now()
-	f, err := dictdawg.Build(idx, words)
-	if err != nil {
-		log.Fatalf("build dawg: %v", err)
-	}
-	path := filepath.Join(*out, *name+".dawg")
-	if err := dictdawg.Save(f, path); err != nil {
-		log.Fatalf("save: %v", err)
-	}
-	size := int64(0)
-	if fi, err := os.Stat(path); err == nil {
-		size = fi.Size()
-	}
-	fmt.Printf("DAWG %d nodes, %s, built+saved in %s -> %s\n",
-		f.NumNodes(), humanBytes(size), time.Since(t).Round(time.Millisecond), path)
-}
-
-func humanBytes(n int64) string {
-	switch {
-	case n >= 1<<20:
-		return fmt.Sprintf("%.2f MB", float64(n)/(1<<20))
-	case n >= 1<<10:
-		return fmt.Sprintf("%.1f KB", float64(n)/(1<<10))
-	default:
-		return fmt.Sprintf("%d B", n)
-	}
-}
@@ -12,10 +12,10 @@ import (
 	"strings"
 	"time"

-	"scrabble-solver/internal/dict"
-	"scrabble-solver/rules"
-	"scrabble-solver/scrabble"
-	"scrabble-solver/selfplay"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/internal/dict"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/scrabble"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/selfplay"
 )

 func main() {
@@ -24,7 +24,7 @@ func main() {

 	rs := rules.English()
 	if !dict.EnglishAvailable() {
-		log.Fatal("English dictionary not available; run `go run ./cmd/builddict` first")
+		log.Fatal("English dictionary not available: dawg/en_sowpods.dawg missing")
 	}
 	f, err := dict.EnglishDAWG()
 	if err != nil {
@@ -6,8 +6,8 @@ import (

 	"github.com/iliadenisov/alphabet"

-	"scrabble-solver/internal/dictdawg"
-	"scrabble-solver/internal/wordlist"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/dictdawg"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/wordlist"
 )

 func TestBuildAndQuery(t *testing.T) {
@@ -1,164 +0,0 @@
-# Russian word-list preparation (`dictprep`)
-
-Builds the Russian **noun** word list for the Scrabble/Эрудит solver out of the official
-Russian academic **orthographic dictionary**, cross-checked against two independent
-morphological dictionaries.
-
-The goal of the pipeline is a list of **common nouns in the nominative singular**
-(`dictprep/russian/scrabble.txt`), plus an ambiguous tail for manual review.
-
-> This directory is self-contained tooling for *building* the word list. It is not part
-> of the solver library. The committed result lives in `dictprep/russian/`.
-
-## Source
-
-`orfo_dict_2025.pdf` — *Русский орфографический словарь РАН* (≈ 200 000 entries), the
-authority for **spelling**. It encodes declension type in its grammatical notes but does
-**not** reliably mark part of speech.
-
-- Source: <https://ruslang.ru/sites/default/files/doc/normativnyje_slovari/orfograficheskij_slovar.pdf>
-- Mirror: <https://rus-gos.spbu.ru/index.php/dictionary>
-
-The PDF is git-ignored (large, third-party); place it here as `orfo_dict_2025.pdf`. Its
-pdftotext output is committed as `russian/orfo_dict_2025.txt`, so the word list rebuilds
-from the text alone — the binary PDF is needed only to regenerate that text.
-
-## Outputs (`dictprep/russian/`)
-
-The committed result is **three** files; every other bucket stays in the Stage-2
-process's memory (dump it with `--dump`, query it with `--trace WORD`).
-
-| File | Committed | Meaning |
-|------|:--:|---------|
-| `orfo_dict_2025.txt` | ✓ | the pdftotext output — the parsed source of truth (the PDF binary is not needed to rebuild). |
-| `all.txt` | ✓ | Stage 1 base: every clean Cyrillic headword/variant; a plural headword with a singular is replaced by that singular. |
-| `manual_confirm.txt` | ✓ | hand-reviewed nouns from the undefined tail; the brain merges them into the result. |
-| `scrabble.txt` | ✓ | **Stage 2 result**: common nouns, nominative singular (+ pluralia tantum), length 2–15 — the working dictionary. |
-| `undefined.txt` | — | the ambiguous tail; kept in memory, written only with `--dump`. |
-
-`--dump` also writes `adjectives.txt`, `verbs.txt`, `singulars.txt` and `fate.tsv` (every
-word with the reason it did or did not reach the dictionary); these are git-ignored debug
-artifacts. Stage 1 also writes `/tmp/ru_{skip,singulars,variants}.txt`, intermediate inputs
-the brain consumes.
-
-## Prerequisites
-
-```sh
-# 1. pdftotext (Poppler)
-sudo apt-get install -y poppler-utils
-
-# 2. Go toolchain (Stage 1) — already required by the parent module
-
-# 3. Python + the OpenCorpora analyser (Stage 2)
-sudo apt-get install -y python3-venv python3-pip
-python3 -m venv ru-venv
-ru-venv/bin/pip install mawo-pymorphy3            # bundles OpenCorpora 2025 (words.dawg)
-
-# 4. libmorph — the independent morphological dictionary (Stage 2 cross-check)
-sudo apt-get install -y morphrus morphrus-dev moonycode-dev morphapi-dev
-g++ -std=c++17 -O2 dictprep/libmorph_check.cpp -lmorphrus -lmoonycode -o dictprep/libmorph_check
-```
-
-If `dictprep/libmorph_check` is absent, Stage 2 still runs — it simply drops libmorph from
-the stack and reports `libmorph_helper=MISSING`.
-
-## How to run
-
-```sh
-# Stage 0 — PDF -> plain text (committed as the source of truth; run once)
-pdftotext dictprep/orfo_dict_2025.pdf dictprep/russian/orfo_dict_2025.txt
-
-# Stage 1 — build the base word list (Go): dictprep/russian/all.txt + /tmp/ru_*.txt
-go run ./dictprep/ruwords
-
-# Stage 2 — the brain (Python + mawo + libmorph): writes scrabble.txt
-ru-venv/bin/python dictprep/ru_stage2.py
-
-# ask how a word did or did not reach the dictionary
-ru-venv/bin/python dictprep/ru_stage2.py --trace травмпункт
-# also write the in-memory buckets (undefined, adjectives, verbs, singulars, fate.tsv)
-ru-venv/bin/python dictprep/ru_stage2.py --dump
-```
-
-`-from`/`-to` (defaulting to 452/168808) bound the column word-list section of
-`russian/orfo_dict_2025.txt` (line 452 = the first entry `а1, …`; line 168808 = the last,
-`я́щурный`). The preface above line 452 is prose and is skipped. Verify these bounds if the
-PDF is re-exported.
-
-## Algorithm
-
-### Stage 1 — `ruwords` (Go)
-
-Per dictionary line in `[from, to]` it collects, normalised (stress marks U+0300/U+0301
-stripped, lowercased, `ё` kept, hyphenated/capitalised/non-Cyrillic rejected):
-
-- the **headword** (leading token). Leading whitespace including the form-feed `\f`
-  pdftotext puts at every page top is trimmed — otherwise the first headword of each page
-  is lost;
-- the **singular of a plural headword** when the entry gives it after `ед.`, in full
-  (`ящеры, …, ед. ящер`) or as a replacement suffix (`…, ед. -вец`, spliced where the
-  suffix best overlaps the headword); the plural is then dropped (a plural that has a
-  singular is never needed) and the singular is also recorded (`/tmp/ru_singulars.txt`);
-- **variant headwords** after `и` that carry their own grammatical note
-  (`аблатив, -а и аблятив, -а`; `регги и реггей, нескл.`), excluding inflected forms.
-
-Everything else (every maximal Cyrillic token not selected above) goes to
-`/tmp/ru_skip.txt`, a safety net for a later morphology re-check.
-
-### Stage 2 — `ru_stage2.py` (Python)
-
-Each Stage-1 word (length 2–15) is routed by three sources, most authoritative first:
-
-1. **OpenCorpora** (`words.dawg`, read directly — *not* the predictor): a common-noun
-   reading ⇒ keep the OpenCorpora lemma. The full OpenCorpora common-noun lexicon is also
-   added (so nouns absent from the PDF are included).
-2. **libmorph** (independent dictionary, via `libmorph_check`): a common-noun reading ⇒
-   keep the libmorph lemma. The two dictionaries are treated as **complementary** — a noun
-   reading in *either* is enough (their disagreements were reviewed and resolved this way,
-   since each is incomplete in different places). A singular reconstructed from "ед." that
-   neither dictionary knows is accepted as a noun (the orthographic note attests it).
-3. A word **both dictionaries miss** is classified by the orthographic **note**
-   (`-ая, -ое` ⇒ adjective; `-ть`, `сов./несов.` ⇒ verb; single genitive `-а/-и` or
-   `нескл., м./ж./с.` ⇒ noun). A note-noun goes straight to `scrabble.txt`; an adjective or
-   verb is dropped; anything undecided goes to `undefined.txt`.
-4. **Variant rescue**: when the dictionary joins two spellings with "и" (`травмопункт и
-   травмпункт`, `регги и реггей`) and one is already a confirmed noun, the other is moved
-   from review/undefined into the result as well, propagated transitively through chains.
-   The plural-form variants the dictionaries already resolve never reach this step.
-
-The nominative singular always comes from the dictionary that recognised the word, or from
-the orthographic `ед.` note — never from a predictor guess (libmorph and the predictor
-mis-lemmatise out-of-dictionary words, e.g. `витебчане → витебчан` instead of `витебчанин`).
-
-### The libmorph bridge — `libmorph_check.cpp`
-
-libmorph (A. Kovalenko, MIT) ships as `libmorphrus.so`. `libmorph_check` is a thin
-stdin→stdout filter: one UTF-8 word per line in, one line out:
-
-```
-<known>\t<pos>:<lemma>\t<pos>:<lemma>...
-```
-
-`<known>` is `CheckWord` (1 = in the dictionary). `<pos>` is `wdInfo & 0x3f`, the part of
-speech. The codes were reverse-engineered (the docs omit the table):
-
-| codes | part of speech |
-|------|----------------|
-| **7–21, 24** | **noun** (all genders / declensions / animacy; pluralia tantum is 24) |
-| 1–3 | verb · 25, 27 adjective · 28–32 pronoun · 33–36 numeral |
-| 38–39 | **proper noun** (excluded) · 48–58 comparative/adverb · 49–53 function words |
-
-The analyser instance is requested with the key `libmorph.api.v4:utf-8` so words are
-passed and lemmas returned in UTF-8.
-
-## Notes & caveats
-
-- The hard tail (≈ 35 000 Stage-1 words / our candidates) is in **no** morphological
-  dictionary; only the orthographic dictionary attests them, so the PDF note is the sole
-  signal there. Compound and very recent nouns (`робототехник`, `толкинист`) live here.
-- OpenCorpora and libmorph are near-equal in size (≈ 99 500 words each on `all.txt`)
-  and ≈ 96 % overlapping, but **complementary** (each contributes ≈ 2 200 unique nouns),
-  which is why both are kept. The mawo *predictor* "knows" ~98 % of everything by guessing
-  and is therefore used only as a weak confirming vote, never as dictionary membership.
-- Licensing: OpenCorpora data is CC BY-SA 3.0; libmorph is MIT; the orthographic
-  dictionary has its own copyright. A list derived from CC BY-SA data inherits that licence.
@@ -1,27 +0,0 @@
-#!/usr/bin/env python3
-"""Fold Ё/ё → Е/е in a word list and de-duplicate — the dictionary prep for "Эрудит".
-
-The Эрудит ruleset has no Ё tile and treats Е/Ё as one letter, so its dictionary must be
-folded before the DAWG is built. Folding merges pairs like ёж/еж, hence the de-dup. Output
-is sorted (Russian order over the 32 folded letters) and LF-separated.
-
-Run:  python3 dictprep/fold_yo.py dictprep/russian/scrabble.txt > /tmp/ru_erudit_words.txt
-"""
-import sys
-
-ORDER = {c: i for i, c in enumerate("абвгдежзийклмнопрстуфхцчшщъыьэюя")}  # 32 letters, no ё
-
-
-def key(w):
-    return [ORDER.get(c, 99) for c in w]
-
-
-def main():
-    src = sys.argv[1] if len(sys.argv) > 1 else "/dev/stdin"
-    words = {line.strip().replace("ё", "е").replace("Ё", "Е") for line in open(src, encoding="utf-8")}
-    words.discard("")
-    sys.stdout.write("\n".join(sorted(words, key=key)) + "\n")
-
-
-if __name__ == "__main__":
-    main()
@@ -1,47 +0,0 @@
-// libmorph_check: a thin stdin->stdout bridge to the libmorph Russian morphological
-// analyser, for use by the Stage-2 classifier (scripts/ru_stage2.py).
-//
-// Reads one word per line (bytes are passed through verbatim — the caller encodes to
-// the code page the libmorph char interface expects, CP1251). For each word it writes
-// a line:
-//
-//     <known>\t<pos>:<lemma>\t<pos>:<lemma>...
-//
-// where <known> is CheckWord's result (1 = in the dictionary, 0 = not), and each
-// following field is one lexeme: its part of speech (wdInfo & 0x3f) and lemma.
-//
-// Build: g++ -std=c++17 -O2 scripts/libmorph_check.cpp -lmorphrus -lmoonycode -o libmorph_check
-#include <libmorph/rus.h>
-#include <libmorph/api.hpp>
-#include <cstdio>
-#include <iostream>
-#include <string>
-
-int main(int argc, char** argv) {
-  // The factory key selects the code page: "libmorph.api.v4:<charset>". Use the
-  // UTF-8 instance so words pass through verbatim. IMlmaMbXX only adds non-virtual
-  // convenience wrappers over IMlmaMb, so the filled pointer can be used as such.
-  const char* key = argc > 1 ? argv[1] : "libmorph.api.v4:utf-8";
-  IMlmaMbXX* mlma = nullptr;
-  int rc = mlmaruGetAPI(key, (void**)&mlma);
-  if (mlma == nullptr) {
-    std::fprintf(stderr, "libmorph_check: GetAPI('%s') failed, rc=%d\n", key, rc);
-    return 1;
-  }
-  std::string line;
-  while (std::getline(std::cin, line)) {
-    if (!line.empty() && line.back() == '\r') line.pop_back();
-    IMlmaMbXX::inword w(line.c_str(), line.size());
-    int known = mlma->CheckWord(w, sfIgnoreCapitals);
-    std::cout << known;
-    try {
-      for (auto& lx : mlma->Lemmatize(w, sfIgnoreCapitals)) {
-        unsigned pos = lx.ngrams > 0 ? (lx.pgrams[0].wdInfo & 0x3f) : 0xffu;
-        std::cout << '\t' << pos << ':' << (lx.plemma ? lx.plemma : "");
-      }
-    } catch (...) {
-    }
-    std::cout << '\n';
-  }
-  return 0;
-}
@@ -1,341 +0,0 @@
-#!/usr/bin/env python3
-"""Stage 2 — the "brain" of the Russian Scrabble word-list pipeline.
-
-It reads the Stage-1 base word list (built once by ruwords so the heavy PDF is not
-re-parsed) together with the grammatical notes and the singular/variant structure, runs
-the whole noun-selection logic in memory, and writes a minimal result:
-
-    dictprep/russian/scrabble.txt   — the working dictionary (common nouns, nom. sing.)
-    dictprep/russian/undefined.txt  — the ambiguous tail, left for manual review
-
-(dictprep/russian/all.txt is the Stage-1 base.) Every other bucket — adjectives, verbs,
-the merged note-nouns, singulars, variants — stays in memory. Pass --dump to also write
-them; pass --trace WORD to ask how a single word did or did not reach the dictionary.
-
-Note: all.txt is a plain word list, so the grammatical notes, "ед." singulars and "и"
-variants are read from the pdftotext output (slov.txt) and the Stage-1 side files; the
-expensive PDF parse itself runs only once.
-
-Sources, most authoritative first: OpenCorpora (mawo-pymorphy3), libmorph (libmorph_check),
-and the orthographic dictionary's own notes. See dictprep/README.md.
-
-Run:  ru-venv/bin/python dictprep/ru_stage2.py [--dump] [--trace WORD]
-"""
-import argparse
-import os
-import re
-import subprocess
-
-HERE = os.path.dirname(os.path.abspath(__file__))
-OUT_DIR = os.path.join(HERE, "russian")
-SLOV = os.path.join(OUT_DIR, "orfo_dict_2025.txt")  # committed pdftotext output (source of truth)
-WL_FROM, WL_TO = 452, 168808  # 1-based inclusive bounds of the column word-list section
-OC_CACHE = "/tmp/oc_nouns.txt"
-LIBMORPH_BIN = os.path.join(HERE, "libmorph_check")
-
-ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
-ORDER = {c: i for i, c in enumerate(ALPHABET)}
-PROPER = {"Name", "Surn", "Patr", "Geox", "Orgn", "Trad"}
-LIBMORPH_NOUN_CODES = set(range(7, 22)) | {24}  # 7..21 plus 24 (pluralia tantum)
-ADJ_END = {"ая", "яя", "ое", "ее", "ье", "ья", "ьи"}
-VERB3 = ("ет", "ёт", "ит", "ют", "ут", "ает", "яет", "ует", "уют", "нет", "жет", "чет")
-GENPL = ("ов", "ёв", "ев", "ей")
-
-
-def key(w):
-    return [ORDER.get(c, 99) for c in w]
-
-
-def destress(s):
-    return "".join(c for c in s if ord(c) not in (0x0300, 0x0301)).lower()
-
-
-def cyr_ok(w):
-    return 2 <= len(w) <= 15 and all(("а" <= c <= "я") or c == "ё" for c in w)
-
-
-def load(p):
-    return [l.strip() for l in open(p, encoding="utf-8") if l.strip()] if os.path.exists(p) else []
-
-
-def write(path, words):
-    os.makedirs(os.path.dirname(path), exist_ok=True)
-    open(path, "w", encoding="utf-8").write("\n".join(sorted(set(words), key=key)) + "\n")
-
-
-import mawo_pymorphy3  # noqa: E402
-
-M = mawo_pymorphy3.MorphAnalyzer()
-D = M._dawg_dict
-
-
-def oc_noun_lemmas():
-    """Every common-noun lemma (nom. sing. / pluralia tantum) in OpenCorpora's words.dawg."""
-    gp, pt = D.get_paradigm, D.parse_tag_string
-    para0, tagc = {}, {}
-
-    def g0(pid):
-        r = para0.get(pid)
-        if r is None:
-            suf0, tag0, pre0 = gp(pid, 0)
-            _, gr = pt(tag0)
-            r = (pre0, suf0, gr)
-            para0[pid] = r
-        return r
-
-    def gt(pid, idx):
-        k = (pid, idx)
-        r = tagc.get(k)
-        if r is None:
-            suf, tag, pre = gp(pid, idx)
-            pos, gr = pt(tag)
-            r = (suf, pre, pos, gr)
-            tagc[k] = r
-        return r
-
-    out = set()
-    for word, rec in D.words_dawg.iteritems():
-        pid, idx = rec
-        suf, pre, pos, gr = gt(pid, idx)
-        if pos != "NOUN":
-            continue
-        pre0, suf0, gr0 = g0(pid)
-        if (PROPER & gr) or (PROPER & gr0):
-            continue
-        stem = word[len(pre):len(word) - len(suf)] if suf else word[len(pre):]
-        out.add(pre0 + stem + suf0)
-    return {w for w in out if cyr_ok(w)}
-
-
-def oc_status(word):
-    """(is_common_noun, in_dictionary) for word, from OpenCorpora only."""
-    parses = D.get_word_parses(word)
-    if not parses:
-        return False, False
-    gp, pt = D.get_paradigm, D.parse_tag_string
-    for pid, idx in parses:
-        suf, tag, pre = gp(pid, idx)
-        pos, gr = pt(tag)
-        if pos == "NOUN":
-            _, tag0, _ = gp(pid, 0)
-            _, gr0 = pt(tag0)
-            if not (PROPER & gr or PROPER & gr0):
-                return True, True
-    return False, True
-
-
-def libmorph_analyze(words):
-    """Map each word to (known, noun_lemma, codes) per libmorph; noun_lemma is None when it
-    is not a common noun there. Empty result if the helper binary is not built."""
-    words = list(words)
-    if not words or not os.path.exists(LIBMORPH_BIN):
-        return {}
-    proc = subprocess.run([LIBMORPH_BIN], input="\n".join(words), capture_output=True, text=True)
-    out = {}
-    for w, line in zip(words, proc.stdout.split("\n")):
-        fields = line.split("\t")
-        known = fields[:1] == ["1"]
-        codes, noun_lemmas = set(), []
-        for field in fields[1:]:
-            code, _, lex = field.partition(":")
-            if code.isdigit():
-                codes.add(int(code))
-                if int(code) in LIBMORPH_NOUN_CODES:
-                    noun_lemmas.append(lex)
-        lemma = (w if w in noun_lemmas else noun_lemmas[0]) if noun_lemmas else None
-        out[w] = (known, lemma, codes)
-    return out
-
-
-def build_notes():
-    """Map each headword (destressed, lowercased) to its grammatical note."""
-    def is_hw(ch):
-        o = ord(ch)
-        return (0x0430 <= o <= 0x044F) or (0x0410 <= o <= 0x042F) or o in (0x0401, 0x0451, 0x0300, 0x0301)
-
-    hmap = {}
-    lines = open(SLOV, encoding="utf-8").read().split("\n")
-    for l in lines[WL_FROM - 1:WL_TO]:
-        s = l.lstrip()
-        e = 0
-        for ch in s:
-            if is_hw(ch):
-                e += 1
-            else:
-                break
-        hw = destress(s[:e])
-        if hw and hw not in hmap:
-            hmap[hw] = destress(s[e:]).strip()
-    return hmap
-
-
-def classify(w, note):
-    """Coarse part of speech of an out-of-dictionary word from its PDF note."""
-    if note is None:
-        return "amb"
-    n = re.sub(r"\([^)]*\)", "", note).strip()  # drop domain/etymology parentheticals
-    if "кр. ф" in n or "кр.ф" in n or "прич." in n or "прил." in n:
-        return "adj"
-    ends = re.findall(r"-([а-яё]+)", n)
-    if any(e in ADJ_END for e in ends):
-        return "adj"
-    if "сов." in n or "несов." in n or "безл." in n:
-        return "verb"
-    if w.endswith("ся"):  # reflexive: no Russian noun ends in -ся
-        return "verb"
-    if any(e.endswith(VERB3) for e in ends) and not any(m in n for m in ("ед.", "тв.", "род.", "м.", "ж.", "с.")):
-        return "verb"
-    if n == "" and w.endswith(("ый", "ий", "ой", "ая", "ое", "ые", "ие", "яя", "ее")):
-        return "adj"
-    if "нескл" in n:
-        return "noun" if any(g in n for g in ("м.", "ж.", "с.", "мн.")) else "amb"
-    if ends:
-        return "noun"
-    if n == "" and w.endswith(("ать", "ять", "еть", "ить", "оть", "уть", "ыть", "ти", "чь")):
-        return "verb"
-    return "amb"
-
-
-def singular(w, note):
-    """Nominative singular of a noun headword from the PDF note (authoritative) or, for a
-    plural headword without an explicit singular, the mawo lemma; pluralia tantum kept."""
-    n = note or ""
-    full = re.search(r"ед\.\s+([а-яё]+)", n)
-    if full:
-        return full.group(1)
-    suf = re.search(r"ед\.\s+-([а-яё]+)", n)
-    if suf:
-        s = suf.group(1)
-        i = w.rfind(s[0])
-        return w[:i] + s if i > 0 else w
-    ends = re.findall(r"-([а-яё]+)", re.sub(r"\([^)]*\)", "", n))
-    if ends and ends[0].endswith(GENPL):
-        for p in M.parse(w):
-            if str(p.tag.POS) == "NOUN":
-                return p.normal_form
-        return w
-    return w
-
-
-def build():
-    """Run the whole pipeline in memory. Returns the result sets plus a `fate` map giving
-    every word's outcome, so a word's path can be traced or the buckets dumped."""
-    oc = set(load(OC_CACHE)) or oc_noun_lemmas()
-    if not os.path.exists(OC_CACHE):
-        write(OC_CACHE, oc)
-    hmap = build_notes()
-    all_words = load(os.path.join(OUT_DIR, "all.txt"))
-    ed_nouns = set(load("/tmp/ru_singulars.txt"))
-    pairs = [tuple(p) for l in load("/tmp/ru_variants.txt") if len(p := l.split("\t")) == 2]
-    pdf = [w for w in all_words if cyr_ok(w)]
-    lm = libmorph_analyze(pdf)
-
-    def to_singular(w):
-        s = singular(w, hmap.get(w))
-        return s if cyr_ok(s) else w
-
-    fate = {}
-    scrabble = set(oc)
-    adj, verb, amb = [], [], []
-    for w in pdf:
-        oc_noun, oc_known = oc_status(w)
-        if oc_noun:
-            fate[w] = "scrabble: сущ. по OpenCorpora"
-            continue
-        lm_known, lm_lemma, _ = lm.get(w, (False, None, frozenset()))
-        if lm_lemma is not None:
-            s = lm_lemma if cyr_ok(lm_lemma) else to_singular(w)
-            scrabble.add(s)
-            fate[w] = "scrabble: сущ. по libmorph" + ("" if s == w else f" → {s}")
-            continue
-        if oc_known or lm_known:
-            fate[w] = "отброшено: словарь знает как не-существительное"
-            continue
-        if w in ed_nouns:
-            scrabble.add(w)
-            fate[w] = "scrabble: ед.ч. по помете «ед.»"
-            continue
-        c = classify(w, hmap.get(w))
-        if c == "noun":
-            s = to_singular(w)
-            scrabble.add(s)
-            fate[w] = "scrabble: сущ. по помете орфословаря" + ("" if s == w else f" → {s}")
-        elif c == "adj":
-            adj.append(w)
-            fate[w] = "отброшено: прилагательное (помета орфословаря)"
-        elif c == "verb":
-            verb.append(w)
-            fate[w] = "отброшено: глагол (помета орфословаря)"
-        else:
-            amb.append(w)
-            fate[w] = "undefined: неоднозначное (нет в словарях, помета не определяет)"
-
-    # Manual confirmations: nouns the maintainer approved from the undefined tail.
-    for w in load(os.path.join(OUT_DIR, "manual_confirm.txt")):
-        if cyr_ok(w):
-            scrabble.add(w)
-            fate[w] = "scrabble: подтверждено вручную (manual_confirm.txt)"
-
-    # Variant rescue: a word joined by "и" to a confirmed noun is itself a noun.
-    pending = set(amb) - scrabble
-    changed = True
-    while changed:
-        changed = False
-        for a, b in pairs:
-            for x, y in ((a, b), (b, a)):
-                if x in scrabble and y in pending:
-                    scrabble.add(y)
-                    pending.discard(y)
-                    fate[y] = f"scrabble: вариант от «{x}» (через «и»)"
-                    changed = True
-
-    undefined = [w for w in amb if w not in scrabble]
-    return {
-        "oc": oc, "scrabble": scrabble, "undefined": undefined,
-        "adjectives": adj, "verbs": verb, "singulars": ed_nouns,
-        "fate": fate, "all": set(all_words),
-    }
-
-
-def trace(word, r):
-    w = destress(word)
-    if w in r["fate"]:
-        return r["fate"][w]
-    if w in r["scrabble"]:
-        return "scrabble: лексикон OpenCorpora" if w in r["oc"] else "scrabble: производная/лемма"
-    if w not in r["all"]:
-        return "нет в russian_all (не извлечено на Stage 1 — нет в .pdf, либо имя собств./дефис/форма)"
-    if not cyr_ok(w):
-        return "отсеяно: длина или символы вне диапазона (2–15 кириллица)"
-    return "не определено"
-
-
-def main():
-    ap = argparse.ArgumentParser(description="Stage 2 brain: build the noun dictionary, trace a word, or dump buckets.")
-    ap.add_argument("--dump", action="store_true", help="also write the in-memory buckets (adjectives, verbs, singulars, variants, fate)")
-    ap.add_argument("--trace", metavar="WORD", help="report how WORD did or did not reach the dictionary, then exit")
-    args = ap.parse_args()
-
-    r = build()
-    if args.trace:
-        print(f"{args.trace}: {trace(args.trace, r)}")
-        return
-
-    write(os.path.join(OUT_DIR, "scrabble.txt"), r["scrabble"])
-    print(f"=> dictprep/russian/scrabble.txt   {len(r['scrabble'])}")
-    print(f"   undefined kept in memory: {len(set(r['undefined']))} (use --dump to write it)")
-    if args.dump:
-        write(os.path.join(OUT_DIR, "undefined.txt"), r["undefined"])
-        write(os.path.join(OUT_DIR, "adjectives.txt"), r["adjectives"])
-        write(os.path.join(OUT_DIR, "verbs.txt"), r["verbs"])
-        write(os.path.join(OUT_DIR, "singulars.txt"), r["singulars"])
-        fate_path = os.path.join(OUT_DIR, "fate.tsv")
-        os.makedirs(OUT_DIR, exist_ok=True)
-        with open(fate_path, "w", encoding="utf-8") as f:
-            for w in sorted(r["fate"], key=key):
-                f.write(f"{w}\t{r['fate'][w]}\n")
-        print(f"   dumped: undefined.txt ({len(set(r['undefined']))}), adjectives.txt, verbs.txt, singulars.txt, fate.tsv")
-
-
-if __name__ == "__main__":
-    main()
@@ -1,135 +0,0 @@
-артгруппа
-бутень
-вебинар
-видеодневник
-водозащита
-генацвале
-жакоб
-оберфюрер
-околоть
-особина
-полбазара
-полбака
-полбалкона
-полбанана
-полбарана
-полбатальона
-полбатона
-полбиблиотеки
-полблокнота
-полбокала
-полбуханки
-полвагона
-полвечера
-полвзвода
-полвинта
-полгазеты
-полгектара
-полгостиницы
-полграмма
-полгруппы
-полдачи
-полдвора
-полдекабря
-полдеревни
-полдетсада
-полдивана
-полдивизии
-полдыни
-полжурнала
-ползавода
-ползарплаты
-полздания
-полканикул
-полканистры
-полкартофелины
-полкастрюли
-полквартиры
-полкилограмма
-полкласса
-полкниги
-полколлекции
-полкольца
-полкоманды
-полкоробки
-полкочана
-полкурса
-полкуска
-полмагазина
-полмандарина
-полмарта
-полматча
-полмиллиметра
-полмузея
-полноября
-полпакета
-полпарка
-полпартии
-полпинты
-полпирога
-полпирожка
-полпируэта
-полпоезда
-полполена
-полполка
-полполки
-полполосы
-полпомидора
-полпоросёнка
-полпосёлка
-полпредовский
-полпроцента
-полпузырька
-полрайона
-полромана
-полроты
-полрулона
-полряда
-полсада
-полсажени
-полсезона
-полсентября
-полсловаря
-полсостава
-полсрока
-полстада
-полстены
-полстолетия
-полстраницы
-полстроки
-полтаблетки
-полтайма
-полтакта
-полтарелки
-полтетради
-полтома
-полтона
-полторта
-полтысячелетия
-полтюбика
-полусанаторий
-полфакультета
-полфевраля
-полфлакона
-полфразы
-полхаты
-полцарства
-полцентнера
-полцистерны
-полчайника
-полчемодана
-полшажка
-полшажочка
-полшара
-полшкафа
-полшколы
-полщеки
-принт
-промо
-рентгеноаппарат
-сивец
-соцнаём
-срывка
-флеш
-флешмобер
-шиноремонт
@@ -1,434 +0,0 @@
-// Command ruwords extracts a clean Cyrillic word list from the plain text of a Russian
-// orthographic dictionary (the output of `pdftotext`).
-//
-// Stage 1 (this tool): from the column word-list section [from, to] it collects, per
-// entry, the headword (the leading token). When the headword is plural and the entry
-// gives its singular after "ед." — in full ("ящеры, …, ед. ящер") or as a replacement
-// suffix ("…, ед. -вец") — only the singular is kept, since a plural that has a singular
-// is never needed. It drops stress marks, lowercases, keeps ё, and discards proper nouns
-// (capitalized), hyphenated words, acronyms and non-Cyrillic tokens. The result is
-// de-duplicated and sorted in Russian alphabetical order (ё right after е), LF-separated.
-//
-// It also collects a variant headword joined by "и" when it carries its own grammatical
-// note (e.g. "аблатив, -а и аблятив, -а"). Suffix-singular reconstruction is heuristic;
-// Stage 2 (dictprep/ru_stage2.py) re-checks the words against real dictionaries.
-//
-//	pdftotext dictprep/orfo_dict_2025.pdf /tmp/slov.txt
-//	go run ./dictprep/ruwords -in /tmp/slov.txt -from 452 -to 168808 \
-//	    -out russian_all.txt -skip russian_skip.txt
-package main
-
-import (
-	"bufio"
-	"flag"
-	"fmt"
-	"log"
-	"os"
-	"path/filepath"
-	"sort"
-	"strings"
-	"unicode"
-)
-
-// ruAlphabet is the Russian alphabet in collation order (ё directly after е).
-const ruAlphabet = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
-
-var ruRank = func() map[rune]int {
-	m := make(map[rune]int, len(ruAlphabet))
-	for i, r := range []rune(ruAlphabet) {
-		m[r] = i
-	}
-	return m
-}()
-
-func isCyrLetter(r rune) bool {
-	return (r >= 'а' && r <= 'я') || (r >= 'А' && r <= 'Я') || r == 'ё' || r == 'Ё'
-}
-
-func isUpperCyr(r rune) bool { return (r >= 'А' && r <= 'Я') || r == 'Ё' }
-
-func isStress(r rune) bool { return r == 0x0300 || r == 0x0301 }
-
-// cleanWord normalizes a run of letters/stress-marks into a lowercase Cyrillic word, or
-// returns ok=false for proper nouns (capitalized), hyphenated or non-Cyrillic runs.
-func cleanWord(run []rune) (string, bool) {
-	if len(run) == 0 || isUpperCyr(run[0]) {
-		return "", false
-	}
-	var b strings.Builder
-	for _, r := range run {
-		switch {
-		case isStress(r), r == '\u00ad': // drop stress accents and soft hyphens
-		case r == '-': // a real hyphen means a hyphenated word: reject it
-			return "", false
-		default:
-			b.WriteRune(unicode.ToLower(r))
-		}
-	}
-	w := b.String()
-	if w == "" {
-		return "", false
-	}
-	for _, r := range w {
-		if !((r >= 'а' && r <= 'я') || r == 'ё') {
-			return "", false
-		}
-	}
-	return w, true
-}
-
-// headword returns the entry's headword: the leading run of letters, stress marks and
-// hyphens, normalized.
-func headword(line string) (string, bool) {
-	// Trim leading whitespace, including the form-feed (U+000C) that pdftotext puts at
-	// the top of each page — otherwise the first headword on every page is lost.
-	line = strings.TrimLeftFunc(line, unicode.IsSpace)
-	var run []rune
-	for _, r := range line {
-		if isCyrLetter(r) || isStress(r) || r == '-' || r == '\u00ad' {
-			run = append(run, r)
-		} else {
-			break
-		}
-	}
-	return cleanWord(run)
-}
-
-// embeddedSingulars returns the singular form of a plural headword spelled out after
-// "ед.", either in full ("ед. ящер") or as a replacement suffix ("ед. -вец",
-// reconstructed from headword). It skips gender marks ("ед. м") and abbreviations that
-// merely start with "ед." ("ед. измер.", "ден. ед.").
-func embeddedSingulars(line, headword string) []string {
-	var out []string
-	for i := 0; ; {
-		j := strings.Index(line[i:], "ед.")
-		if j < 0 {
-			break
-		}
-		i += j + len("ед.")
-		rest := strings.TrimLeft(line[i:], "  \t")
-
-		if strings.HasPrefix(rest, "-") { // suffix form: reconstruct from the headword
-			var suf []rune
-			for _, r := range rest[len("-"):] {
-				if isCyrLetter(r) || isStress(r) {
-					suf = append(suf, r)
-				} else {
-					break
-				}
-			}
-			if s, ok := cleanWord(suf); ok && len([]rune(s)) >= 2 {
-				if recon := reconstructSingular(headword, s); recon != "" {
-					out = append(out, recon)
-				}
-			}
-			continue
-		}
-
-		var run []rune
-		consumed := 0
-		for _, r := range rest {
-			if isCyrLetter(r) || isStress(r) {
-				run = append(run, r)
-				consumed += len(string(r))
-			} else {
-				break
-			}
-		}
-		if len(run) == 0 {
-			continue
-		}
-		if strings.HasPrefix(rest[consumed:], ".") {
-			continue // an abbreviation like "ед. измер." rather than a singular form
-		}
-		w, ok := cleanWord(run)
-		if !ok || len([]rune(w)) < 2 { // 2+ letters excludes the gender marks м/ж/с
-			continue
-		}
-		out = append(out, w)
-	}
-	return out
-}
-
-// reconstructSingular builds the singular from a plural headword and the replacement
-// suffix from "ед. -<suffix>", splicing where the suffix best overlaps the tail of the
-// headword (the position of longest common prefix between the suffix and a headword
-// suffix). It is a heuristic; Stage 2 re-checks the words against real dictionaries.
-func reconstructSingular(headword, suffix string) string {
-	hw, sf := []rune(headword), []rune(suffix)
-	bestK, bestLen := -1, 0
-	for k := 0; k < len(hw); k++ {
-		m := 0
-		for k+m < len(hw) && m < len(sf) && hw[k+m] == sf[m] {
-			m++
-		}
-		if m > bestLen {
-			bestK, bestLen = k, m
-		}
-	}
-	if bestK < 0 {
-		return ""
-	}
-	return string(hw[:bestK]) + suffix
-}
-
-// headwordNotes are the grammatical notes that mark a parallel headword (a lemma) after
-// "и", as opposed to an inflected form. A "-" ending also marks one; form labels such as
-// деепр. (gerund) or сравн. (comparative) deliberately do not.
-var headwordNotes = map[string]bool{
-	"нескл": true, "неизм": true, "предлог": true, "предл": true, "нареч": true,
-	"нар": true, "прил": true, "союз": true, "частица": true, "част": true,
-	"межд": true, "мн": true, "ед": true, "тв": true, "числ": true, "мест": true,
-	"м": true, "ж": true, "с": true, "вводн": true, "сказ": true,
-}
-
-// variantNoteOK reports whether the note following a candidate variant marks a headword:
-// a "-" inflection ending or one of headwordNotes (and not a bare inflected word).
-func variantNoteOK(note string) bool {
-	if strings.HasPrefix(note, "-") {
-		return true
-	}
-	var stem []rune
-	for _, r := range note {
-		if (r >= 'а' && r <= 'я') || r == 'ё' {
-			stem = append(stem, r)
-		} else {
-			break
-		}
-	}
-	return headwordNotes[string(stem)]
-}
-
-// variants returns the second (and further) headwords of an entry, written as a parallel
-// form after " и ", e.g. "аблатив, -а и аблятив, -а" yields "аблятив" and "регги и реггей,
-// нескл." yields "реггей". Requiring a headword note after the comma keeps this from
-// matching "и" inside examples or picking up inflected forms.
-func variants(line string) []string {
-	var out []string
-	const sep = " и "
-	for i := 0; ; {
-		j := strings.Index(line[i:], sep)
-		if j < 0 {
-			break
-		}
-		i += j + len(sep)
-		rest := line[i:]
-		var run []rune
-		consumed := 0
-		for _, r := range rest {
-			if isCyrLetter(r) || isStress(r) {
-				run = append(run, r)
-				consumed += len(string(r))
-			} else {
-				break
-			}
-		}
-		if len(run) == 0 {
-			continue
-		}
-		after := rest[consumed:]
-		if !strings.HasPrefix(after, ", ") || !variantNoteOK(after[len(", "):]) {
-			continue
-		}
-		if w, ok := cleanWord(run); ok && len([]rune(w)) >= 2 {
-			out = append(out, w)
-		}
-	}
-	return out
-}
-
-// normToken normalizes any token (a run of letters and stress marks) for the skip set:
-// lowercase, stress removed, kept only if it is 2+ all-Cyrillic letters. Unlike
-// cleanWord it does NOT reject capitalized tokens — a lowercased proper noun belongs in
-// the skip set so it can be re-checked by a morphological analyzer.
-func normToken(run []rune) (string, bool) {
-	var b strings.Builder
-	for _, r := range run {
-		if isStress(r) {
-			continue
-		}
-		b.WriteRune(unicode.ToLower(r))
-	}
-	w := b.String()
-	if len([]rune(w)) < 2 {
-		return "", false
-	}
-	for _, r := range w {
-		if !((r >= 'а' && r <= 'я') || r == 'ё') {
-			return "", false
-		}
-	}
-	return w, true
-}
-
-// tokens returns every maximal run of Cyrillic letters (plus stress marks) in the line,
-// normalized; runs are split on every other character (so hyphens split a word).
-func tokens(line string) []string {
-	var out []string
-	var run []rune
-	flush := func() {
-		if len(run) > 0 {
-			if w, ok := normToken(run); ok {
-				out = append(out, w)
-			}
-			run = run[:0]
-		}
-	}
-	for _, r := range line {
-		if isCyrLetter(r) || isStress(r) {
-			run = append(run, r)
-		} else {
-			flush()
-		}
-	}
-	flush()
-	return out
-}
-
-func lessRu(a, b string) bool {
-	ra, rb := []rune(a), []rune(b)
-	for i := 0; i < len(ra) && i < len(rb); i++ {
-		if ra[i] != rb[i] {
-			return ruRank[ra[i]] < ruRank[rb[i]]
-		}
-	}
-	return len(ra) < len(rb)
-}
-
-func sortedRu(set map[string]struct{}) []string {
-	words := make([]string, 0, len(set))
-	for w := range set {
-		words = append(words, w)
-	}
-	sort.Slice(words, func(i, j int) bool { return lessRu(words[i], words[j]) })
-	return words
-}
-
-func writeWords(path string, words []string) error {
-	if dir := filepath.Dir(path); dir != "" && dir != "." {
-		if err := os.MkdirAll(dir, 0o755); err != nil {
-			return err
-		}
-	}
-	o, err := os.Create(path)
-	if err != nil {
-		return err
-	}
-	w := bufio.NewWriter(o)
-	for _, word := range words {
-		w.WriteString(word)
-		w.WriteByte('\n')
-	}
-	if err := w.Flush(); err != nil {
-		o.Close()
-		return err
-	}
-	return o.Close()
-}
-
-func main() {
-	in := flag.String("in", "dictprep/russian/orfo_dict_2025.txt", "plain-text dictionary (pdftotext output)")
-	out := flag.String("out", "dictprep/russian/all.txt", "output: the base word list (clean headwords + reconstructed singulars + variants)")
-	skip := flag.String("skip", "/tmp/ru_skip.txt", "output: every other token, for a later morphology re-check")
-	sings := flag.String("singulars", "/tmp/ru_singulars.txt", "output: singulars reconstructed from \"ед.\" (known nouns)")
-	varsOut := flag.String("variants", "/tmp/ru_variants.txt", "output: variant pairs joined by \"и\" (primary<TAB>variant)")
-	from := flag.Int("from", 452, "first line of the word-list section (1-based, inclusive)")
-	to := flag.Int("to", 168808, "last line of the word-list section (inclusive)")
-	flag.Parse()
-	if *in == "" {
-		log.Fatal("ruwords: -in is required")
-	}
-
-	f, err := os.Open(*in)
-	if err != nil {
-		log.Fatal(err)
-	}
-	defer f.Close()
-
-	all := make(map[string]struct{})
-	allTokens := make(map[string]struct{})
-	singulars := make(map[string]struct{})
-	variantPairs := make(map[string]struct{})
-	entries, fromHead, fromSing, fromVar := 0, 0, 0, 0
-	sc := bufio.NewScanner(f)
-	sc.Buffer(make([]byte, 1<<20), 1<<20)
-	for line := 0; sc.Scan(); {
-		line++
-		if line < *from || line > *to {
-			continue
-		}
-		entries++
-		text := sc.Text()
-		hw, hwOK := headword(text)
-		var sings []string
-		if hwOK {
-			sings = embeddedSingulars(text, hw)
-		}
-		primary := ""
-		if len(sings) > 0 {
-			// the headword is plural and the entry gives its singular: keep only the singular
-			primary = sings[0]
-			for _, w := range sings {
-				if _, seen := all[w]; !seen {
-					fromSing++
-					all[w] = struct{}{}
-				}
-				singulars[w] = struct{}{}
-			}
-		} else if hwOK {
-			primary = hw
-			if _, seen := all[hw]; !seen {
-				fromHead++
-			}
-			all[hw] = struct{}{}
-		}
-		for _, w := range variants(text) {
-			if _, seen := all[w]; !seen {
-				fromVar++
-				all[w] = struct{}{}
-			}
-			if primary != "" && primary != w {
-				variantPairs[primary+"\t"+w] = struct{}{}
-			}
-		}
-		for _, w := range tokens(text) {
-			allTokens[w] = struct{}{}
-		}
-	}
-	if err := sc.Err(); err != nil {
-		log.Fatal(err)
-	}
-
-	skipSet := make(map[string]struct{})
-	for w := range allTokens {
-		if _, ok := all[w]; !ok {
-			skipSet[w] = struct{}{}
-		}
-	}
-
-	allWords := sortedRu(all)
-	skipWords := sortedRu(skipSet)
-	if err := writeWords(*out, allWords); err != nil {
-		log.Fatal(err)
-	}
-	if err := writeWords(*skip, skipWords); err != nil {
-		log.Fatal(err)
-	}
-	if err := writeWords(*sings, sortedRu(singulars)); err != nil {
-		log.Fatal(err)
-	}
-	pairList := make([]string, 0, len(variantPairs))
-	for p := range variantPairs {
-		pairList = append(pairList, p)
-	}
-	sort.Strings(pairList)
-	if err := writeWords(*varsOut, pairList); err != nil {
-		log.Fatal(err)
-	}
-
-	fmt.Printf("scanned %d entries\n", entries)
-	fmt.Printf("  %-20s %7d words (%d headwords + %d embedded singulars + %d variants)\n", *out, len(allWords), fromHead, fromSing, fromVar)
-	fmt.Printf("  %-20s %7d words (tokens not in %s; for a morphology re-check)\n", *skip, len(skipWords), *out)
-	fmt.Printf("  %-20s %7d words (singulars from \"ед.\"; known nouns)\n", *sings, len(singulars))
-	fmt.Printf("  %-20s %7d pairs (variants joined by \"и\")\n", *varsOut, len(variantPairs))
-}
@@ -1,4 +1,4 @@
-module scrabble-solver
+module gitea.iliadenisov.ru/developer/scrabble-solver

 go 1.26.3

@@ -1,24 +1,18 @@
-// Package dict loads the English test dictionary as a DAWG, preferring the serialized
-// cache under testdata and falling back to building from the dictionaries submodule.
-// Paths are resolved relative to the repository root so it works both from the repo root
-// (commands) and from a package directory (tests).
+// Package dict loads the English test dictionary as a DAWG from the committed
+// dawg/en_sowpods.dawg fixture, for the cmd/stress benchmark. The dictionary build
+// pipeline (word-list parsing and DAWG construction from sources) now lives in the
+// separate scrabble-dictionary repository; this package only loads the committed
+// artifact. Paths are resolved relative to the repository root so it works both from
+// the repo root (commands) and from a package directory (tests).
 package dict

 import (
 	"os"
 	"path/filepath"

-	"github.com/iliadenisov/alphabet"
 	dawg "github.com/iliadenisov/dafsa"

-	"scrabble-solver/internal/dictdawg"
-	"scrabble-solver/internal/wordlist"
-)
-
-// MinLen and MaxLen bound playable word lengths (a 15x15 board holds at most 15).
-const (
-	MinLen = 2
-	MaxLen = 15
+	"gitea.iliadenisov.ru/developer/scrabble-solver/dictdawg"
 )

 func exists(p string) bool { _, err := os.Stat(p); return err == nil }
@@ -42,35 +36,11 @@ func Root() string {
 	}
 }

-// DAWGCache and WordlistPath locate the English cache file and source word list,
-// relative to the repository root.
-func DAWGCache() string    { return filepath.Join(Root(), "testdata", "sowpods.dawg") }
-func WordlistPath() string { return filepath.Join(Root(), "dictionaries", "english", "sowpods.txt") }
+// DAWGCache locates the committed English DAWG, relative to the repository root.
+func DAWGCache() string { return filepath.Join(Root(), "dawg", "en_sowpods.dawg") }

-// EnglishAvailable reports whether the English dictionary can be loaded (cache or source).
-func EnglishAvailable() bool {
-	return exists(DAWGCache()) || exists(WordlistPath())
-}
+// EnglishAvailable reports whether the committed English DAWG is present.
+func EnglishAvailable() bool { return exists(DAWGCache()) }

-// EnglishWords returns the encoded English word list (from the submodule source).
-func EnglishWords() ([][]byte, error) {
-	return wordlist.Read(WordlistPath(), alphabet.Latin(), MinLen, MaxLen)
-}
-
-// EnglishDAWG returns the English DAWG, loading the cache if present, otherwise building
-// it from the word list and caching it (best effort).
-func EnglishDAWG() (dawg.Finder, error) {
-	if exists(DAWGCache()) {
-		return dictdawg.Load(DAWGCache())
-	}
-	words, err := EnglishWords()
-	if err != nil {
-		return nil, err
-	}
-	f, err := dictdawg.Build(alphabet.Latin(), words)
-	if err != nil {
-		return nil, err
-	}
-	_ = dictdawg.Save(f, DAWGCache())
-	return f, nil
-}
+// EnglishDAWG loads the committed English DAWG.
+func EnglishDAWG() (dawg.Finder, error) { return dictdawg.Load(DAWGCache()) }
@@ -6,7 +6,7 @@ import (
 	"github.com/iliadenisov/alphabet"
 	dawg "github.com/iliadenisov/dafsa"

-	"scrabble-solver/internal/graph"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/internal/graph"
 )

 // TestSpellSmoke also exercises the go.mod replace => ../dafsa wiring and the new
@@ -1,8 +1,8 @@
 package scrabble

 import (
-	"scrabble-solver/board"
-	"scrabble-solver/internal/encoding"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/internal/encoding"
 )

 // Apply places a move's newly-placed tiles on the board. The move must be legal for the
@@ -3,8 +3,8 @@ package scrabble
 import (
 	dawg "github.com/iliadenisov/dafsa"

-	"scrabble-solver/board"
-	"scrabble-solver/internal/encoding"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/internal/encoding"
 )

 // letterSet is a bit set over alphabet letter indexes (alphabets are at most 63
@@ -6,8 +6,8 @@ import (
 	"github.com/iliadenisov/alphabet"
 	dawg "github.com/iliadenisov/dafsa"

-	"scrabble-solver/internal/dictdawg"
-	"scrabble-solver/internal/wordlist"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/dictdawg"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/wordlist"
 )

 func bruteCrossSet(words [][]byte, above, below []byte, size int) letterSet {
@@ -8,9 +8,9 @@ import (
 	"strings"
 	"testing"

-	"scrabble-solver/board"
-	"scrabble-solver/internal/dictdawg"
-	"scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/dictdawg"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
 )

 // TestScoreRealGames replays real tournament games recorded in GCG format and checks that
@@ -19,11 +19,11 @@ import (
 //
 // The games come from cross-tables.com (annotated self-play) and are stored under
 // testdata/. They use the standard English board and SOWPODS, so the test loads the
-// committed dawg/en_sowpods.dawg (build it with `make dawg`).
+// committed dawg/en_sowpods.dawg.
 func TestScoreRealGames(t *testing.T) {
 	finder, err := dictdawg.Load("../dawg/en_sowpods.dawg")
 	if err != nil {
-		t.Skipf("need dawg/en_sowpods.dawg (run `make dawg`): %v", err)
+		t.Skipf("need dawg/en_sowpods.dawg: %v", err)
 	}
 	s := NewSolver(rules.English(), finder)
 	games, _ := filepath.Glob("testdata/*.gcg")
@@ -1,9 +1,9 @@
 package scrabble

 import (
-	"scrabble-solver/board"
-	"scrabble-solver/rack"
-	"scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rack"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
 )

 // generateBoth runs an across-generator on the board (for horizontal plays) and on its
@@ -3,10 +3,10 @@ package scrabble
 import (
 	dawg "github.com/iliadenisov/dafsa"

-	"scrabble-solver/board"
-	"scrabble-solver/internal/encoding"
-	"scrabble-solver/rack"
-	"scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/internal/encoding"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rack"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
 )

 // DAWGGenerator generates moves with the Appel-Jacobson two-phase algorithm
@@ -5,12 +5,12 @@ import (

 	"github.com/iliadenisov/alphabet"

-	"scrabble-solver/board"
-	"scrabble-solver/internal/dictdawg"
-	"scrabble-solver/internal/encoding"
-	"scrabble-solver/internal/wordlist"
-	"scrabble-solver/rack"
-	"scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/dictdawg"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/internal/encoding"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rack"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/wordlist"
 )

 func makeRack(letters string, blanks int) rack.Rack {
@@ -1,8 +1,8 @@
 package scrabble

 import (
-	"scrabble-solver/board"
-	"scrabble-solver/rack"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rack"
 )

 // Generator produces every legal play for a position. The DAWG generator
@@ -1,9 +1,9 @@
 package scrabble

 import (
-	"scrabble-solver/board"
-	"scrabble-solver/rack"
-	"scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rack"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
 )

 // dict is a membership set of words (alphabet-index strings) for the oracle.
@@ -5,9 +5,9 @@ import (
 	"fmt"
 	"sort"

-	"scrabble-solver/board"
-	"scrabble-solver/internal/encoding"
-	"scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/internal/encoding"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
 )

 // coord maps a line coordinate (fixed, axis) to a board (row, col) for direction dir.
@@ -3,9 +3,9 @@ package scrabble
 import (
 	"testing"

-	"scrabble-solver/board"
-	"scrabble-solver/internal/encoding"
-	"scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/internal/encoding"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
 )

 const plain7 = `.......
@@ -7,9 +7,9 @@ import (

 	dawg "github.com/iliadenisov/dafsa"

-	"scrabble-solver/board"
-	"scrabble-solver/rack"
-	"scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rack"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
 )

 // Solver is the high-level entry point: it generates ranked plays and scores or
@@ -5,9 +5,9 @@ import (

 	"github.com/iliadenisov/alphabet"

-	"scrabble-solver/board"
-	"scrabble-solver/internal/dictdawg"
-	"scrabble-solver/internal/wordlist"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/dictdawg"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/wordlist"
 )

 func newTestSolver(t *testing.T) *Solver {
@@ -7,10 +7,10 @@ import (
 	"sort"
 	"time"

-	"scrabble-solver/board"
-	"scrabble-solver/rack"
-	"scrabble-solver/rules"
-	"scrabble-solver/scrabble"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/board"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rack"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/scrabble"
 )

 // blankTile marks a blank in the bag and in a player's hand.
@@ -5,11 +5,11 @@ import (

 	"github.com/iliadenisov/alphabet"

-	"scrabble-solver/internal/dictdawg"
-	"scrabble-solver/internal/wordlist"
-	"scrabble-solver/rules"
-	"scrabble-solver/scrabble"
-	"scrabble-solver/selfplay"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/dictdawg"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/rules"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/scrabble"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/selfplay"
+	"gitea.iliadenisov.ru/developer/scrabble-solver/wordlist"
 )

 func TestPlayGameSmoke(t *testing.T) {