TRACE
Textual Reuse, Alignment, and Collation Engine: a Python library for philological alignment with pluggable language packs. It supports pairwise (v0.1) and simultaneous multi-witness (v0.2) alignment.
TRACE is built for textual criticism, manuscript witness comparison, and the creation of digital synopses and critical editions. The core is language-agnostic; the first shipped language pack covers Biblical and Rabbinic Hebrew (hbo).
At a glance
- Tokenizer pipeline with editorial-marker awareness (`[reconstructed]`, `⟦deletion⟧`, `〈insertion〉`, `(expanded)`, lacunae).
- Tiered scoring that returns (score, reason) per token pair: `EXACT`, `NIQQUD_STRIPPED`, `PLENE_DEFECTIVE`, `ABBREVIATION`, `ORTHOGRAPHIC`, `INSERTION`, `OMISSION`, `NO_MATCH`.
- Pairwise aligner: semi-global Needleman–Wunsch with affine gap penalties (Gotoh) and a multi-token abbreviation lookahead (`ר"י ↔ רבי ישמעאל`).
- Multi-witness aligner (v0.2): N witnesses aligned simultaneously into a canonical variant graph plus a derived aligned table, via pairwise distances → UPGMA guide tree → POA-based progressive merge. Determinism and lossless reconstruction are pinned by property tests.
- Hebrew language pack with niqqud strip, plene/defective skeleton matching, gershayim/maqqef tokenizer hooks, and a seed lexicon of rabbinic abbreviations (extendable via `Lexica.merge()`).
- I/O for plain text, JSON (round-trip for both pairwise and multi-witness results), eScriptorium exports, and TEI XML.
- Reproducible: every `AlignmentResult` / `MultiAlignmentResult` carries `trace_version` and `language_pack_version` in its params.
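To give a feel for what tiered scoring with a (score, reason) result could look like, here is a minimal standalone sketch. It does not use TRACE's API: the names `Reason`, `TIER_SCORES`, `strip_niqqud`, `skeleton`, and `score_pair` are hypothetical, the tier weights are invented for illustration, and only three of the match tiers listed above are modeled. The plene/defective check is a deliberately crude assumption (drop every vav and yod), not the library's actual rule.

```python
import unicodedata
from enum import Enum


class Reason(Enum):
    EXACT = "EXACT"
    NIQQUD_STRIPPED = "NIQQUD_STRIPPED"
    PLENE_DEFECTIVE = "PLENE_DEFECTIVE"
    NO_MATCH = "NO_MATCH"


# Hypothetical tier weights; TRACE's real values are not stated in this page.
TIER_SCORES = {
    Reason.EXACT: 1.0,
    Reason.NIQQUD_STRIPPED: 0.9,
    Reason.PLENE_DEFECTIVE: 0.8,
    Reason.NO_MATCH: -1.0,
}

# Vav (U+05D5) and yod (U+05D9), the letters most often used as matres lectionis.
MATRES = {"\u05D5", "\u05D9"}


def strip_niqqud(token: str) -> str:
    """Remove combining marks (vowel points, dagesh, cantillation) from a token."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def skeleton(token: str) -> str:
    """Crude consonantal skeleton: unpointed token with every vav and yod dropped."""
    return "".join(ch for ch in strip_niqqud(token) if ch not in MATRES)


def score_pair(a: str, b: str) -> tuple[float, Reason]:
    """Return (score, reason) for the first tier at which the two tokens match."""
    if a == b:
        reason = Reason.EXACT
    elif strip_niqqud(a) == strip_niqqud(b):
        reason = Reason.NIQQUD_STRIPPED
    elif skeleton(a) and skeleton(a) == skeleton(b):
        # Require a non-empty skeleton so tokens made only of vav/yod never match here.
        reason = Reason.PLENE_DEFECTIVE
    else:
        reason = Reason.NO_MATCH
    return TIER_SCORES[reason], reason


if __name__ == "__main__":
    plene = "\u05D3\u05D5\u05D9\u05D3"      # dalet-vav-yod-dalet (plene spelling of David)
    defective = "\u05D3\u05D5\u05D3"        # dalet-vav-dalet (defective spelling)
    print(score_pair(plene, defective))     # lands in the PLENE_DEFECTIVE tier
```

Returning the reason alongside the score keeps each alignment decision auditable: a downstream apparatus can distinguish a purely orthographic variant from a substantive one without re-deriving why two tokens were paired.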
Get going
Documentation
- Installation
- Usage
- Details
- FAQ
  - Why is the package on PyPI called `tracealign` but the project is TRACE?
  - Does TRACE work for languages other than Hebrew?
  - Why semi-global instead of global or local alignment?
  - How does the abbreviation lookahead work?
  - How is `total_score` computed?
  - Is TRACE fast enough for production alignments?
  - How do I extend the Hebrew abbreviation lexicon?
  - What about `<choice>`, `<corr>`, `<reg>`, `<expan>` in TEI?
  - Can I use TRACE for plagiarism / text-reuse detection?
  - How are alignment results meant to be persisted?
  - What’s the v0.2 outlook?
  - How does multi-witness alignment differ from pairwise?
  - Is `align_multi` deterministic?
  - How big can multi-witness alignments get?
  - Why UPGMA and not Neighbor-Joining for the guide tree?
  - Can I add a new witness to an existing alignment incrementally?
  - How do I persist a multi-witness result?
- Contributing
Project status
TRACE is an early-stage research library. v0.1.x ships the pairwise aligner and the Hebrew pack; v0.2 adds the multi-witness master alignment graph. Future stages cover Geniza fragment anchor detection, text-reuse detection, apparatus / critical-edition generation, cross-tradition Hexapla-style alignment, stemmatic reconstruction, allusion detection, citation graphs, and multi-millennial reception history. See the roadmap for the long-term ten-stage plan.
License
MIT © 2026 Benjamin Schnabel.