Chronological record of every adversarial test run against the H12 decoder. Both passed and failed tests are documented. This log exists so that reviewers can verify we are testing our own hypothesis honestly.
Methodology: Every test compares H12 against random decoders using the same architecture with randomized consonant mappings. The random decoders share H12's vowel-insertion mechanism and abugida structure — only the 10 consonant mappings are randomized from the same 11-consonant, 5-vowel inventory.
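The comparison loop behind every Z-score below can be sketched as follows. This is an illustrative sketch, not the actual H12 implementation: the 11-consonant inventory and glyph names are placeholders, and the real decoder's mapping tables live in the scripts.

```python
import random

# Illustrative 11-consonant, 5-vowel inventory (placeholder, not the H12 tables).
CONSONANTS = list("kgtdnpbmlsv")
VOWELS = list("aeiou")

def random_consonant_mapping(eva_consonant_glyphs, rng):
    """Assign each EVA consonant glyph a random consonant from the inventory,
    leaving the vowel-insertion mechanism and abugida structure untouched."""
    return {glyph: rng.choice(CONSONANTS) for glyph in eva_consonant_glyphs}

def z_score(h12_value, null_values):
    """Standard score of H12's metric against the random-decoder null."""
    n = len(null_values)
    mean = sum(null_values) / n
    sd = (sum((v - mean) ** 2 for v in null_values) / n) ** 0.5
    return (h12_value - mean) / sd
```

Each test below computes some metric on the H12 decode, recomputes it for N random-mapping decoders, and reports `z_score(h12_metric, null_metrics)`.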
- Script: scripts/validate_coverage.py
- Result: Tier 1 (glossed) = 90.7%, Total known (Tier 1+2+3) = 99.4%, Unknown = 0.6%
- Verdict: PASS
- Script: scripts/validate_vowel_final.py
- Result: 99.73% of tokens end in vowels (35,798/35,894)
- Note: This is a property of the abugida decoder mechanism, shared by all random decoders. It confirms the encoding is abugida-based (South Asian, not European) but does not discriminate H12 from random decoders.
- Verdict: N/A as discriminator (confirms abugida encoding)
- Script: scripts/validate_domain_clustering.py
- Result: 46.1% medical vocabulary, 80.7% domain-consistent. 9x enrichment over ~5% random baseline.
- Verdict: PASS
- Script: scripts/validate_phonotactics.py
- Result: Sinhala ranks #2-3 across CV-pattern measures. Hindi ranks #1 on all measures.
- Note: Hindi's higher ranking is explained by shared Indic phonotactic structure. The test confirms the decoded output has Indic phonotactic properties but does not uniquely identify Sinhala. Other tests (cross-language discrimination, Panchavidha) provide language-specific evidence.
- Verdict: PARTIAL (confirms Indic, does not uniquely identify Sinhala)
- Script: scripts/structural_multilang_test.py (Panchavidha section)
- Result: 6 H12-decoded terms map onto the 5 classical Ayurvedic dosage forms. 1,560 tokens. Z = 7.2 vs 200 random decoders. 1/200 beat H12.
- Details: ugeda=Churna, ugeea=Sneha (fat-soluble), ea=Ghrita (ghee), uteda=Kashaya (decoction), mea=Madhu (honey), gula=Vati (pill). Random decoders produce gibberish from the same EVA words.
- Verdict: STRONG PASS (Z=7.2, CIRCULAR — see disclosures)
- Script: scripts/validate_external_pharma.py
- Result: 7,130 tokens (19.9%) match 150 independently compiled pharmaceutical terms. Uncontrolled: Z=2.3, 3/200 beat H12. Controlled (o->u only): Z=3.4, 0/38 beat H12.
- Post-hoc analysis: All 3 "beating" random decoders share o->a mapping (vowel degeneration inflating matches) + sh->m (same as H12). With o->a removed, all 3 fall below H12.
- Sources: Bodleian Library MSS (Liyanaratne 1992), Yogaratnakaraya, Charaka/Sushruta Samhita, Sri Lanka Ayurvedic Drugs Corporation formulary.
- Verdict: STRONG PASS (Controlled Z=3.4, p < 0.001)
- Script: scripts/crosslang_discrimination_test.py
- Result: Sinhala 5,244 tokens (14.6%), Pali 1,306 (3.6%), Hindi 332 (0.9%), Malayalam 77 (0.2%), Tamil 3 (0.0%). Discrimination ratio 4.02x. Sinhala Z=1.87 (p=0.031).
- Note: Sinhala wins 199/200 random decoder trials, indicating short CV syllable patterns broadly favor Sinhala. H12's specific advantage is modest (Z=1.87) but real.
- Verdict: WEAK PASS
- Script: scripts/folio_pharma_clustering.py
- Result: Medical vs non-medical section clustering ratio 1.17x, Z = 0.2.
- Why it failed: The entire manuscript appears to be medical text. There are no true non-medical control sections — even "Stars" and "Zodiac" pages contain medical vocabulary. This is actually consistent with the hypothesis (the manuscript is a medical teaching manual throughout) but means the test cannot discriminate.
- Verdict: FAILED (informative — entire manuscript is medical)
- Script: scripts/recipe_sequence_test.py
- Result: Forward momentum Z=0.9, recipe consistency Z=-0.7, collocation Z=0.1. All 3 sub-tests failed.
- Why it failed: The matched pharmaceutical vocabulary (ula, gena, ura) is too generic and ubiquitous to reveal recipe phase ordering. These terms appear in nearly every folio regardless of phase.
- Verdict: FAILED (matched vocabulary too generic)
- Script: scripts/sov_syntax_test.py
- Result: 77.1% postposition-after-noun (Z=8.10), 66.2% noun-before-verb (Z=5.07), 56.2% verb-final (Z=2.17). SOV word order 8/8, SVO 0/8, VSO 0/8. 1000 scrambled-word-order controls.
- Cross-section consistency: HERBAL 67.0%/76.3%, ZODIAC 73.9%/76.6%, RECIPE 65.0%/78.2% postpositional — all sections show SOV.
- Note: Z-scores are STRONGER than the original estimates (8.10 vs 7.04 for postpositional, 5.07 vs 4.19 for noun-before-verb) due to the reproducible implementation with explicit word-classification lists.
- Verdict: STRONG PASS (Z=8.10, p < 10^-12)
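The postposition-after-noun measure and its scrambled-word-order control can be sketched as below. The word-class lists here are hypothetical stand-ins; the actual test uses explicit classification lists built from the H12 glosses.

```python
import random

# Hypothetical word-class lists (illustrative only; the real test ships
# explicit classification lists for the decoded vocabulary).
NOUNS = {"gena", "ura", "ula"}
POSTPOSITIONS = {"ta", "gen"}

def postposition_after_noun_rate(tokens):
    """Fraction of postposition tokens immediately preceded by a noun."""
    hits = total = 0
    for prev, cur in zip(tokens, tokens[1:]):
        if cur in POSTPOSITIONS:
            total += 1
            hits += prev in NOUNS
    return hits / total if total else 0.0

def scrambled_baseline(tokens, n_trials=1000, seed=0):
    """Null distribution of the rate under random word-order permutations."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_trials):
        shuffled = list(tokens)
        rng.shuffle(shuffled)
        rates.append(postposition_after_noun_rate(shuffled))
    return rates
```

The observed rate is then converted to a Z-score against the scrambled distribution.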
- Script: scripts/collocation_test.py
- Result: 16/36 (44%) pharmaceutical collocations observed, 5 STRONG (>5x baseline), 5 MODERATE (2-5x). H12 produces 16 hits vs random decoder avg 3.3 (max 9), giving a 4.8x advantage.
- Controls: Random word-pair baseline (1.43x), shuffled word order (p=0.983, NOT significant), random consonant mappings (H12=16 vs avg=3.3).
- Note: Shuffled word order test FAILED (p=0.983) — shuffled order produces similar hit counts, meaning the collocations are driven by global word frequencies rather than local adjacency. The random decoder comparison is the meaningful control.
- Verdict: MODERATELY SUPPORTED (composite 4/7; H12 4.8x random decoders, but shuffled order test failed)
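The >5x / 2-5x baselines can be illustrated with a frequency-product enrichment measure: observed adjacent pair count over the count expected from global word frequencies alone. This is a sketch of the idea, not necessarily the script's exact baseline.

```python
from collections import Counter

def adjacency_enrichment(tokens, pair):
    """Observed adjacent count of an ordered word pair, divided by the count
    expected if tokens were ordered at random (frequency-product baseline)."""
    a, b = pair
    n = len(tokens)
    observed = sum(1 for bigram in zip(tokens, tokens[1:]) if bigram == pair)
    freq = Counter(tokens)
    expected = (n - 1) * (freq[a] / n) * (freq[b] / n)
    return observed / expected if expected else 0.0
```

Note that, as the shuffled-order control above shows, a frequency-product baseline is exactly the component the shuffled test failed to beat, which is why the random-decoder comparison is the meaningful control here.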
- Script: scripts/keyword_section_clustering.py
- Result: Chi-squared = 2,015.09, Z = 30.30 vs 1000 random folio-shuffle baseline (0/1000 shuffles reached the observed value). Decoded keyword semantic profiles (PLANT, PREPARATION, LIQUID, BODY, DISEASE, etc.) differ significantly across manuscript sections.
- Note: HERBAL sections do NOT have the highest plant keyword proportion (ZODIAC does at 9.1% vs HERBAL 3.5%). PHARMA sections do NOT have the highest preparation proportion (BALNEO does at 34.4% vs PHARMA 18.2%). The clustering is statistically real but the section-label mapping is imperfect.
- Significance: This test IS Naibbe-proof because it depends on folio-level content variation within the manuscript, not just global H12 output frequencies.
- Verdict: STRONG PASS (Z=30.30, p < 0.001)
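The chi-squared statistic over section-by-category keyword counts can be computed as below; in the test, the Z-score comes from recomputing this statistic on random folio-to-section shuffles. A minimal sketch, assuming a dict-of-dicts contingency table rather than the script's actual data layout.

```python
def chi_squared(table):
    """Pearson chi-squared for a {section: {category: count}} contingency table."""
    sections = list(table)
    categories = sorted({c for row in table.values() for c in row})
    row_totals = {s: sum(table[s].values()) for s in sections}
    col_totals = {c: sum(table[s].get(c, 0) for s in sections) for c in categories}
    grand = sum(row_totals.values())
    stat = 0.0
    for s in sections:
        for c in categories:
            expected = row_totals[s] * col_totals[c] / grand
            if expected:
                stat += (table[s].get(c, 0) - expected) ** 2 / expected
    return stat
```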
- Script: scripts/entropy_analysis.py
- Result: EVA h2 = 2.358 bits, H12 decoded h2 = 2.339 bits (delta = -0.019). Decoded text is NOT closer to natural-language entropy (~3.3 bits) than raw EVA.
- Context: Literature reports Voynichese h2 ≈ 2 bits vs natural languages h2 ≈ 3-4 bits. The H12 decoder does not resolve this gap. The low entropy may reflect the restricted phoneme inventory (14 phonemes) of spoken Elu.
- Verdict: NEUTRAL (entropy unchanged by decoding)
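For reference, h2 (second-order, i.e. conditional character entropy) can be estimated as follows. This is an unsmoothed maximum-likelihood sketch, not the script's implementation.

```python
import math
from collections import Counter

def h2(text):
    """Second-order entropy H(X_n | X_{n-1}) in bits: the average
    uncertainty of a character given the preceding character."""
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    total = sum(bigrams.values())
    ent = 0.0
    for (a, _), count in bigrams.items():
        p_joint = count / total
        p_cond = count / contexts[a]
        ent -= p_joint * math.log2(p_cond)
    return ent
```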
- Script: scripts/entropy_analysis.py
- Result: EVA shows RTL optimization (4-gram perplexity ratio RTL/LTR = 0.899). H12 decoded shows LTR optimization (ratio = 1.804). Sinhala dictionary shows LTR optimization (ratio = 1.118). English shows LTR optimization (ratio = 1.093).
- Significance: The directional FLIP from RTL→LTR when decoding through H12 is consistent with abugida encoding rules that transform position-dependent Sinhala phoneme sequences into EVA patterns with reversed directional properties.
- Verdict: PASS (supports abugida encoding hypothesis)
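The RTL/LTR perplexity ratio can be sketched with an unsmoothed n-gram model evaluated on the text itself. Illustrative only: comparable cross-corpus measurements would need smoothing and held-out evaluation, which this sketch omits.

```python
import math
from collections import Counter

def ngram_logloss(text, n=4):
    """Average -log2 probability of each character given the previous n-1,
    using unsmoothed MLE counts from the text itself."""
    contexts = Counter(text[i:i + n - 1] for i in range(len(text) - n + 1))
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    loss = 0.0
    for gram, count in grams.items():
        loss -= count * math.log2(count / contexts[gram[:-1]])
    return loss / total

def rtl_ltr_ratio(text, n=4):
    """Perplexity ratio RTL/LTR: < 1 means the text is better modelled
    right-to-left, > 1 better left-to-right."""
    ltr = 2 ** ngram_logloss(text, n)
    rtl = 2 ** ngram_logloss(text[::-1], n)
    return rtl / ltr
```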
- Script: scripts/dual_null_model_test.py
- Result: 200 constrained (abugida-preserving) + 200 unconstrained (fully random) trials. Three of five tests significant (Z >= 2.0) under both nulls: pharma vocab (Z=2.31/73.81), collocations (Z=5.32/inf), section clustering (Z=3.40/116.95). SOV syntax NOT significant under either null (post-after-noun Z=0.85, NbV Z=1.50).
- Note: Z-scores higher under unconstrained null as expected. Constrained null is the more conservative test. SOV non-significance means word-order statistics partially reflect EVA token distribution, not just grammar.
- Verdict: SUPPORTED (3/5 significant under conservative constrained null)
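The two nulls differ only in what a random mapping is allowed to do. A sketch with illustrative glyph and phoneme inventories (placeholders, not the decoder's tables):

```python
import random

# Illustrative inventories; the real ones come from the H12 decoder.
VOWELS = list("aeiou")
CONSONANTS = list("kgtdnpbmlsv")

def unconstrained_mapping(glyphs, rng):
    """Fully random null: any glyph may become any phoneme, so the
    abugida structure is not preserved."""
    return {g: rng.choice(VOWELS + CONSONANTS) for g in glyphs}

def constrained_mapping(consonant_glyphs, vowel_glyphs, rng):
    """Abugida-preserving null: consonant glyphs stay consonants and vowel
    glyphs stay vowels, randomizing only within each class."""
    mapping = {g: rng.choice(CONSONANTS) for g in consonant_glyphs}
    mapping.update({g: rng.choice(VOWELS) for g in vowel_glyphs})
    return mapping
```

The constrained null is the more conservative comparison because it already shares the structural property that several tests would otherwise reward.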
- Script: scripts/stability_robustness_test.py
- Result: 3 claims tested under 4 perturbation types (bootstrap 100x80%, vocab pruning top-5/10/20, section splits, alternate tokenization). External pharma: ROBUST. Keyword clustering: ROBUST. SOV syntax: FRAGILE (NbV drops from 63.2% to 44.1% under top-10 pruning).
- Note: SOV fragility is consistent with dual null model finding. Postposition-after-noun metric remains stable (77.3% → 74.2%).
- Verdict: 2/3 ROBUST, 1/3 FRAGILE
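One plausible reading of the "bootstrap 100x80%" perturbation is a generic subsampling harness like the following (names hypothetical; the script may resample with replacement instead):

```python
import random

def perturbation_runs(items, metric, n_runs=100, frac=0.8, seed=0):
    """Recompute a metric on repeated random 80% subsamples of the data
    (sampled without replacement) to gauge its stability."""
    rng = random.Random(seed)
    k = int(len(items) * frac)
    return [metric(rng.sample(items, k)) for _ in range(n_runs)]
```

A claim is ROBUST when the metric stays in its qualitative regime across all runs, FRAGILE when some runs flip it.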
- Script: scripts/multiple_testing_correction.py
- Result: 3 primary tests corrected as a family at α=0.05 (SOV reclassified as conditional corroboration after the dual null model). Bonferroni (α/3=0.017): 3/3 PASS. Holm-Bonferroni: 3/3 PASS. BH-FDR: 3/3 PASS. Sensitivity analysis correcting all 8 quantitative tests: 5/8 Bonferroni, 6/8 Holm-Bonferroni, 7/8 BH-FDR.
- Note: Primary test classification follows Bender & Lange (2001): directional hypotheses with Z >= 3.0 and non-circular design. Collocation Z=5.32 from dual null model (N=200 constrained decoders). Original N=10 estimate was Z=4.69.
- Verdict: ALL 3/3 primary survive most conservative correction (SOV treated as conditional corroboration, not primary)
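The three correction procedures are standard; minimal reference implementations (sketches for checking the PASS counts, not the script itself):

```python
def bonferroni(pvals, alpha=0.05):
    """Reject flags under plain Bonferroni: each test at alpha / m."""
    return [p <= alpha / len(pvals) for p in pvals]

def holm_bonferroni(pvals, alpha=0.05):
    """Reject flags under Holm's step-down procedure."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    reject = [False] * len(pvals)
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (len(pvals) - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject flags under the Benjamini-Hochberg FDR procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            max_k = rank  # largest rank passing its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            reject[i] = True
    return reject
```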
- Script: scripts/holdout_validation.py
- Result: Corpus split by odd/even folio numbers (TRAIN: 17,163 tokens, TEST: 18,753 tokens, both spanning all sections). Three holdout tests: (A) pharmaceutical vocabulary generalises at 14.8x random baseline (Z=19.7); (B) 154/160 TRAIN collocations replicate in TEST above random pairs (Z=21.0); (C) TRAIN semantic profiles predict TEST folio sections at 43.5% vs 14.2% chance (Z=4.3). Supplementary chi-squared on TEST alone: Z=14.2.
- Verdict: ALL 3/3 holdout tests PASS. Primary claims generalise to unseen data.
- Result: Fisher's method on the 3 primary p-values yields χ²=510.1 (df=6, p ≪ 10⁻¹⁰⁰). Conservative variant (excluding the dominant clustering test): p = 4.5 × 10⁻¹⁰ (χ²=49.5, df=4).
- Verdict: COMBINED p-value computed; replaces the earlier informal "effectively zero" language.
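Fisher's method combines k independent p-values into a chi-squared statistic with 2k degrees of freedom. A self-contained sketch; the closed-form tail probability below is valid only for even degrees of freedom, which Fisher's method always produces.

```python
import math

def fisher_combined(pvals):
    """Fisher's method: chi-squared statistic and its degrees of freedom."""
    stat = -2 * sum(math.log(p) for p in pvals)
    return stat, 2 * len(pvals)

def chi2_sf(x, df):
    """Chi-squared survival function P(X > x), closed form for even df:
    exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    assert df % 2 == 0, "closed form holds for even df only"
    k = df // 2
    term = 1.0
    total = 0.0
    for i in range(k):
        total += term
        term *= (x / 2) / (i + 1)
    return math.exp(-x / 2) * total
```

For example, two tests at p = 0.05 combine to roughly p ≈ 0.017, already stronger than either alone.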
We actively invite the community to propose, implement, or run additional tests. Particularly valuable:
- Independent Sinhala/Elu linguistic assessment of the decoded text
- Additional cross-language discrimination with larger vocabularies
- Statistical tests for hidden structure we haven't considered
- Comparison against other proposed Voynich decipherments using the same random-decoder methodology
To propose a test, open an issue. To submit a test, follow the pattern of existing scripts: decode with H12, run the same test on random decoders, report the Z-score.
```shell
# Clone and run any test yourself
git clone https://github.com/kamb-code/Voynich.git
cd Voynich
python scripts/validate_coverage.py
python scripts/validate_external_pharma.py
python scripts/crosslang_discrimination_test.py
```

All scripts use relative paths from the scripts directory. The only external dependency is NumPy (required only for validate_phonotactics.py).