Chronological record of every adversarial test run against the H12 decoder. Both passed and failed tests are documented. This log exists so that reviewers can verify we are testing our own hypothesis honestly.
Methodology: Every test compares H12 against random decoders using the same architecture with randomized consonant mappings. The random decoders share H12's vowel-insertion mechanism and abugida structure — only the 10 consonant mappings are randomized from the same 11-consonant, 5-vowel inventory.
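The comparison loop behind every Z-score below can be sketched as follows. This is an illustrative sketch, not the actual H12 implementation: the 11-consonant inventory and glyph names are placeholders, and the real decoder's mapping tables live in the scripts.

```python
import random

# Illustrative 11-consonant, 5-vowel inventory (placeholder, not the H12 tables).
CONSONANTS = list("kgtdnpbmlsv")
VOWELS = list("aeiou")

def random_consonant_mapping(eva_consonant_glyphs, rng):
    """Assign each EVA consonant glyph a random consonant from the inventory,
    leaving the vowel-insertion mechanism and abugida structure untouched."""
    return {glyph: rng.choice(CONSONANTS) for glyph in eva_consonant_glyphs}

def z_score(h12_value, null_values):
    """Standard score of H12's metric against the random-decoder null."""
    n = len(null_values)
    mean = sum(null_values) / n
    sd = (sum((v - mean) ** 2 for v in null_values) / n) ** 0.5
    return (h12_value - mean) / sd
```

Each test below computes some metric on the H12 decode, recomputes it for N random-mapping decoders, and reports `z_score(h12_metric, null_metrics)`.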
- Script: scripts/validate_coverage.py
- Result: Tier 1 (glossed) = 90.7%, Total known (Tier 1+2+3) = 99.4%, Unknown = 0.6%
- Verdict: PASS
- Script: scripts/validate_vowel_final.py
- Result: 99.73% of tokens end in vowels (35,798/35,894)
- Note: This is a property of the abugida decoder mechanism, shared by all random decoders. It confirms the encoding is abugida-based (South Asian, not European) but does not discriminate H12 from random decoders.
- Verdict: N/A as discriminator (confirms abugida encoding)
- Script: scripts/validate_domain_clustering.py
- Result: 46.1% medical vocabulary, 80.7% domain-consistent. 9x enrichment over ~5% random baseline.
- Verdict: PASS
- Script: scripts/validate_phonotactics.py
- Result: Sinhala ranks #2-3 across CV-pattern measures. Hindi ranks #1 on all measures.
- Note: Hindi's higher ranking is explained by shared Indic phonotactic structure. The test confirms the decoded output has Indic phonotactic properties but does not uniquely identify Sinhala. Other tests (cross-language discrimination, Panchavidha) provide language-specific evidence.
- Verdict: PARTIAL (confirms Indic, does not uniquely identify Sinhala)
- Script: scripts/structural_multilang_test.py (Panchavidha section)
- Result: 6 H12-decoded terms map onto the 5 classical Ayurvedic dosage forms. 1,560 tokens. Z = 7.2 vs 200 random decoders. 1/200 beat H12.
- Details: ugeda=Churna, ugeea=Sneha (fat-soluble), ea=Ghrita (ghee), uteda=Kashaya (decoction), mea=Madhu (honey), gula=Vati (pill). Random decoders produce gibberish from the same EVA words.
- Verdict: STRONG PASS (Z=7.2, CIRCULAR — see disclosures)
- Script: scripts/validate_external_pharma.py
- Result: 7,130 tokens (19.9%) match 150 independently compiled pharmaceutical terms. Uncontrolled: Z=2.3, 3/200 beat H12. Controlled (o->u only): Z=3.4, 0/38 beat H12.
- Post-hoc analysis: All 3 "beating" random decoders share o->a mapping (vowel degeneration inflating matches) + sh->m (same as H12). With o->a removed, all 3 fall below H12.
- Sources: Bodleian Library MSS (Liyanaratne 1992), Yogaratnakaraya, Charaka/Sushruta Samhita, Sri Lanka Ayurvedic Drugs Corporation formulary.
- Verdict: STRONG PASS (Controlled Z=3.4, p < 0.001)
- Script: scripts/crosslang_discrimination_test.py
- Result: Sinhala 5,244 tokens (14.6%), Pali 1,306 (3.6%), Hindi 332 (0.9%), Malayalam 77 (0.2%), Tamil 3 (0.0%). Discrimination ratio 4.02x. Sinhala Z=1.87 (p=0.031).
- Note: Sinhala wins 199/200 random decoder trials, indicating short CV syllable patterns broadly favor Sinhala. H12's specific advantage is modest (Z=1.87) but real.
- Verdict: WEAK PASS
- Script: scripts/folio_pharma_clustering.py
- Result: Medical vs non-medical section clustering ratio 1.17x, Z = 0.2.
- Why it failed: The entire manuscript appears to be medical text. There are no true non-medical control sections — even "Stars" and "Zodiac" pages contain medical vocabulary. This is actually consistent with the hypothesis (the manuscript is a medical teaching manual throughout) but means the test cannot discriminate.
- Verdict: FAILED (informative — entire manuscript is medical)
- Script: scripts/recipe_sequence_test.py
- Result: Forward momentum Z=0.9, recipe consistency Z=-0.7, collocation Z=0.1. All 3 sub-tests failed.
- Why it failed: The matched pharmaceutical vocabulary (ula, gena, ura) is too generic and ubiquitous to reveal recipe phase ordering. These terms appear in nearly every folio regardless of phase.
- Verdict: FAILED (matched vocabulary too generic)
- Script: scripts/sov_syntax_test.py
- Result: 77.1% postposition-after-noun (Z=8.10), 66.2% noun-before-verb (Z=5.07), 56.2% verb-final (Z=2.17). SOV word order 8/8, SVO 0/8, VSO 0/8. 1000 scrambled-word-order controls.
- Cross-section consistency: HERBAL 67.0%/76.3%, ZODIAC 73.9%/76.6%, RECIPE 65.0%/78.2% postpositional — all sections show SOV.
- Note: Z-scores are STRONGER than the original estimates (8.10 vs 7.04 for postpositional, 5.07 vs 4.19 for noun-before-verb) due to the reproducible implementation with explicit word-classification lists.
- Verdict: STRONG PASS (Z=8.10, p < 10^-12)
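The postposition-after-noun measure and its scrambled-word-order control can be sketched as below. The word-class lists here are hypothetical stand-ins; the actual test uses explicit classification lists built from the H12 glosses.

```python
import random

# Hypothetical word-class lists (illustrative only; the real test ships
# explicit classification lists for the decoded vocabulary).
NOUNS = {"gena", "ura", "ula"}
POSTPOSITIONS = {"ta", "gen"}

def postposition_after_noun_rate(tokens):
    """Fraction of postposition tokens immediately preceded by a noun."""
    hits = total = 0
    for prev, cur in zip(tokens, tokens[1:]):
        if cur in POSTPOSITIONS:
            total += 1
            hits += prev in NOUNS
    return hits / total if total else 0.0

def scrambled_baseline(tokens, n_trials=1000, seed=0):
    """Null distribution of the rate under random word-order permutations."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_trials):
        shuffled = list(tokens)
        rng.shuffle(shuffled)
        rates.append(postposition_after_noun_rate(shuffled))
    return rates
```

The observed rate is then converted to a Z-score against the scrambled distribution.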
- Script: scripts/collocation_test.py
- Result: 16/36 (44%) pharmaceutical collocations observed, 5 STRONG (>5x baseline), 5 MODERATE (2-5x). H12 produces 16 hits vs random decoder avg 3.3 (max 9), giving a 4.8x advantage.
- Controls: Random word-pair baseline (1.43x), shuffled word order (p=0.983, NOT significant), random consonant mappings (H12=16 vs avg=3.3).
- Note: Shuffled word order test FAILED (p=0.983) — shuffled order produces similar hit counts, meaning the collocations are driven by global word frequencies rather than local adjacency. The random decoder comparison is the meaningful control.
- Verdict: MODERATELY SUPPORTED (composite 4/7; H12 4.8x random decoders, but shuffled order test failed)
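The >5x / 2-5x baselines can be illustrated with a frequency-product enrichment measure: observed adjacent pair count over the count expected from global word frequencies alone. This is a sketch of the idea, not necessarily the script's exact baseline.

```python
from collections import Counter

def adjacency_enrichment(tokens, pair):
    """Observed adjacent count of an ordered word pair, divided by the count
    expected if tokens were ordered at random (frequency-product baseline)."""
    a, b = pair
    n = len(tokens)
    observed = sum(1 for bigram in zip(tokens, tokens[1:]) if bigram == pair)
    freq = Counter(tokens)
    expected = (n - 1) * (freq[a] / n) * (freq[b] / n)
    return observed / expected if expected else 0.0
```

Note that, as the shuffled-order control above shows, a frequency-product baseline is exactly the component the shuffled test failed to beat, which is why the random-decoder comparison is the meaningful control here.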
- Script: scripts/keyword_section_clustering.py
- Result: Chi-squared = 2,015.09, Z = 30.30 vs 1000 random folio-shuffle baseline (0/1000 shuffles reached the observed value). Decoded keyword semantic profiles (PLANT, PREPARATION, LIQUID, BODY, DISEASE, etc.) differ significantly across manuscript sections.
- Note: HERBAL sections do NOT have the highest plant keyword proportion (ZODIAC does at 9.1% vs HERBAL 3.5%). PHARMA sections do NOT have the highest preparation proportion (BALNEO does at 34.4% vs PHARMA 18.2%). The clustering is statistically real but the section-label mapping is imperfect.
- Significance: This test IS Naibbe-proof because it depends on folio-level content variation within the manuscript, not just global H12 output frequencies.
- Verdict: STRONG PASS (Z=30.30, p < 0.001)
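The chi-squared statistic over section-by-category keyword counts can be computed as below; in the test, the Z-score comes from recomputing this statistic on random folio-to-section shuffles. A minimal sketch, assuming a dict-of-dicts contingency table rather than the script's actual data layout.

```python
def chi_squared(table):
    """Pearson chi-squared for a {section: {category: count}} contingency table."""
    sections = list(table)
    categories = sorted({c for row in table.values() for c in row})
    row_totals = {s: sum(table[s].values()) for s in sections}
    col_totals = {c: sum(table[s].get(c, 0) for s in sections) for c in categories}
    grand = sum(row_totals.values())
    stat = 0.0
    for s in sections:
        for c in categories:
            expected = row_totals[s] * col_totals[c] / grand
            if expected:
                stat += (table[s].get(c, 0) - expected) ** 2 / expected
    return stat
```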
- Script: scripts/entropy_analysis.py
- Result: EVA h2 = 2.358 bits, H12 decoded h2 = 2.339 bits (delta = -0.019). Decoded text is NOT closer to natural-language entropy (~3.3 bits) than raw EVA.
- Context: Literature reports Voynichese h2 ≈ 2 bits vs natural languages h2 ≈ 3-4 bits. The H12 decoder does not resolve this gap. The low entropy may reflect the restricted phoneme inventory (14 phonemes) of spoken Elu.
- Verdict: NEUTRAL (entropy unchanged by decoding)
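For reference, h2 (second-order, i.e. conditional character entropy) can be estimated as follows. This is an unsmoothed maximum-likelihood sketch, not the script's implementation.

```python
import math
from collections import Counter

def h2(text):
    """Second-order entropy H(X_n | X_{n-1}) in bits: the average
    uncertainty of a character given the preceding character."""
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    total = sum(bigrams.values())
    ent = 0.0
    for (a, _), count in bigrams.items():
        p_joint = count / total
        p_cond = count / contexts[a]
        ent -= p_joint * math.log2(p_cond)
    return ent
```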
- Script: scripts/entropy_analysis.py
- Result: EVA shows RTL optimization (4-gram perplexity ratio RTL/LTR = 0.899). H12 decoded shows LTR optimization (ratio = 1.804). Sinhala dictionary shows LTR optimization (ratio = 1.118). English shows LTR optimization (ratio = 1.093).
- Significance: The directional FLIP from RTL→LTR when decoding through H12 is consistent with abugida encoding rules that transform position-dependent Sinhala phoneme sequences into EVA patterns with reversed directional properties.
- Verdict: PASS (supports abugida encoding hypothesis)
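The RTL/LTR perplexity ratio can be sketched with an unsmoothed n-gram model evaluated on the text itself. Illustrative only: comparable cross-corpus measurements would need smoothing and held-out evaluation, which this sketch omits.

```python
import math
from collections import Counter

def ngram_logloss(text, n=4):
    """Average -log2 probability of each character given the previous n-1,
    using unsmoothed MLE counts from the text itself."""
    contexts = Counter(text[i:i + n - 1] for i in range(len(text) - n + 1))
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    loss = 0.0
    for gram, count in grams.items():
        loss -= count * math.log2(count / contexts[gram[:-1]])
    return loss / total

def rtl_ltr_ratio(text, n=4):
    """Perplexity ratio RTL/LTR: < 1 means the text is better modelled
    right-to-left, > 1 better left-to-right."""
    ltr = 2 ** ngram_logloss(text, n)
    rtl = 2 ** ngram_logloss(text[::-1], n)
    return rtl / ltr
```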
- Script: scripts/dual_null_model_test.py
- Result: 200 constrained (abugida-preserving) + 200 unconstrained (fully random) trials. Three of five tests significant (Z >= 2.0) under both nulls: pharma vocab (Z=2.31/73.81), collocations (Z=5.32/inf), section clustering (Z=3.40/116.95). SOV syntax NOT significant under either null (post-after-noun Z=0.85, NbV Z=1.50).
- Note: Z-scores higher under unconstrained null as expected. Constrained null is the more conservative test. SOV non-significance means word-order statistics partially reflect EVA token distribution, not just grammar.
- Verdict: SUPPORTED (3/5 significant under conservative constrained null)
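The two nulls differ only in what a random mapping is allowed to do. A sketch with illustrative glyph and phoneme inventories (placeholders, not the decoder's tables):

```python
import random

# Illustrative inventories; the real ones come from the H12 decoder.
VOWELS = list("aeiou")
CONSONANTS = list("kgtdnpbmlsv")

def unconstrained_mapping(glyphs, rng):
    """Fully random null: any glyph may become any phoneme, so the
    abugida structure is not preserved."""
    return {g: rng.choice(VOWELS + CONSONANTS) for g in glyphs}

def constrained_mapping(consonant_glyphs, vowel_glyphs, rng):
    """Abugida-preserving null: consonant glyphs stay consonants and vowel
    glyphs stay vowels, randomizing only within each class."""
    mapping = {g: rng.choice(CONSONANTS) for g in consonant_glyphs}
    mapping.update({g: rng.choice(VOWELS) for g in vowel_glyphs})
    return mapping
```

The constrained null is the more conservative comparison because it already shares the structural property that several tests would otherwise reward.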
- Script: scripts/stability_robustness_test.py
- Result: 3 claims tested under 4 perturbation types (bootstrap 100x80%, vocab pruning top-5/10/20, section splits, alternate tokenization). External pharma: ROBUST. Keyword clustering: ROBUST. SOV syntax: FRAGILE (NbV drops from 63.2% to 44.1% under top-10 pruning).
- Note: SOV fragility is consistent with dual null model finding. Postposition-after-noun metric remains stable (77.3% → 74.2%).
- Verdict: 2/3 ROBUST, 1/3 FRAGILE
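One plausible reading of the "bootstrap 100x80%" perturbation is a generic subsampling harness like the following (names hypothetical; the script may resample with replacement instead):

```python
import random

def perturbation_runs(items, metric, n_runs=100, frac=0.8, seed=0):
    """Recompute a metric on repeated random 80% subsamples of the data
    (sampled without replacement) to gauge its stability."""
    rng = random.Random(seed)
    k = int(len(items) * frac)
    return [metric(rng.sample(items, k)) for _ in range(n_runs)]
```

A claim is ROBUST when the metric stays in its qualitative regime across all runs, FRAGILE when some runs flip it.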
- Script: scripts/multiple_testing_correction.py
- Result: 3 primary tests corrected as a family at α=0.05 (SOV reclassified as conditional corroboration after the dual null model). Bonferroni (α/3=0.017): 3/3 PASS. Holm-Bonferroni: 3/3 PASS. BH-FDR: 3/3 PASS. Sensitivity analysis correcting all 8 quantitative tests: 5/8 Bonferroni, 6/8 Holm-Bonferroni, 7/8 BH-FDR.
- Note: Primary test classification follows Bender & Lange (2001): directional hypotheses with Z >= 3.0 and non-circular design. Collocation Z=5.32 from dual null model (N=200 constrained decoders). Original N=10 estimate was Z=4.69.
- Verdict: ALL 3/3 primary survive most conservative correction (SOV treated as conditional corroboration, not primary)
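The three correction procedures are standard; minimal reference implementations (sketches for checking the PASS counts, not the script itself):

```python
def bonferroni(pvals, alpha=0.05):
    """Reject flags under plain Bonferroni: each test at alpha / m."""
    return [p <= alpha / len(pvals) for p in pvals]

def holm_bonferroni(pvals, alpha=0.05):
    """Reject flags under Holm's step-down procedure."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    reject = [False] * len(pvals)
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (len(pvals) - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject flags under the Benjamini-Hochberg FDR procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            max_k = rank  # largest rank passing its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            reject[i] = True
    return reject
```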
- Script: scripts/holdout_validation.py
- Result: Corpus split by odd/even folio numbers (TRAIN: 17,163 tokens, TEST: 18,753 tokens, both spanning all sections). Three holdout tests: (A) pharmaceutical vocabulary generalises at 14.8x random baseline (Z=19.7); (B) 154/160 TRAIN collocations replicate in TEST above random pairs (Z=21.0); (C) TRAIN semantic profiles predict TEST folio sections at 43.5% vs 14.2% chance (Z=4.3). Supplementary chi-squared on TEST alone: Z=14.2.
- Verdict: ALL 3/3 holdout tests PASS. Primary claims generalise to unseen data.
- Result: Fisher's method on the 3 primary p-values yields χ²=510.1 (df=6, p ≪ 10⁻¹⁰⁰). Conservative variant (excluding the dominant clustering test): p = 4.5 × 10⁻¹⁰ (χ²=49.5, df=4).
- Verdict: COMBINED p-value computed; replaces the earlier informal "effectively zero" language.
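Fisher's method combines k independent p-values into a chi-squared statistic with 2k degrees of freedom. A self-contained sketch; the closed-form tail probability below is valid only for even degrees of freedom, which Fisher's method always produces.

```python
import math

def fisher_combined(pvals):
    """Fisher's method: chi-squared statistic and its degrees of freedom."""
    stat = -2 * sum(math.log(p) for p in pvals)
    return stat, 2 * len(pvals)

def chi2_sf(x, df):
    """Chi-squared survival function P(X > x), closed form for even df:
    exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    assert df % 2 == 0, "closed form holds for even df only"
    k = df // 2
    term = 1.0
    total = 0.0
    for i in range(k):
        total += term
        term *= (x / 2) / (i + 1)
    return math.exp(-x / 2) * total
```

For example, two tests at p = 0.05 combine to roughly p ≈ 0.017, already stronger than either alone.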
We actively invite the community to propose, implement, or run additional tests. Particularly valuable:
- Independent Sinhala/Elu linguistic assessment of the decoded text
- Additional cross-language discrimination with larger vocabularies
- Statistical tests for hidden structure we haven't considered
- Comparison against other proposed Voynich decipherments using the same random-decoder methodology
To propose a test, open an issue. To submit a test, follow the pattern of existing scripts: decode with H12, run the same test on random decoders, report the Z-score.
```shell
# Clone and run any test yourself
git clone https://github.com/kamb-code/Voynich.git
cd Voynich
python scripts/validate_coverage.py
python scripts/validate_external_pharma.py
python scripts/crosslang_discrimination_test.py
```

All scripts use relative paths from the scripts directory. The only external dependency is NumPy (required only for validate_phonotactics.py).