
feat(jepa_t_ingest): Wave-14a L-S50 — plaintext → ternary triplet pipeline for JEPA-T #811

Open
gHashTag wants to merge 1 commit into main from feat/jepa-t-ingest

@gHashTag
Owner

Closes #807

Summary

Adds crates/jepa_t_ingest/ — a Rust-only (R1 CROWN) crate that streams plaintext corpora into ternary-quantized triplet sequences for JEPA-T training on Trinity silicon (Wave-14a L-S50).

Quantizer — Wave-9b RTL byte-for-byte match

pub fn quantize_phi_prior(fp_q15: i16) -> i8

Threshold: φ⁻² in Q1.15 = 12533 (0x30F5):

  • if fp_q15 >= +12533 → +1
  • if fp_q15 <= -12533 → -1
  • else → 0
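
For reviewers who want the rule in executable form, here is a minimal sketch of the quantizer as described above. Only the signature and the decimal threshold 12533 come from this PR; the constant name `PHI_PRIOR_THRESHOLD_Q15` is an assumption made for illustration, and the body may differ from what is in src/lib.rs.

```rust
/// Threshold in Q1.15, as stated in the PR description.
/// The constant name is hypothetical; only the value 12533 is from the PR.
pub const PHI_PRIOR_THRESHOLD_Q15: i16 = 12533;

/// Maps a Q1.15 fixed-point sample onto the ternary alphabet {-1, 0, +1}.
pub fn quantize_phi_prior(fp_q15: i16) -> i8 {
    if fp_q15 >= PHI_PRIOR_THRESHOLD_Q15 {
        1
    } else if fp_q15 <= -PHI_PRIOR_THRESHOLD_Q15 {
        -1
    } else {
        // Dead zone: every value strictly between -12533 and +12533.
        0
    }
}
```

Both comparisons are inclusive, so ±12533 already saturate to ±1 while ±12532 stay at 0, which is the boundary the tests in tests/quantize.rs check.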

Deliverables

| File | Description |
| --- | --- |
| Cargo.toml | edition 2021, Apache-2.0, author Dmitrii Vasilev |
| src/lib.rs | quantize_phi_prior, IngestConfig, Triplet, ingest_text |
| src/bin/jepa_t_ingest.rs | CLI: --input corpus.txt --output triplets.bin |
| tests/quantize.rs | Boundary tests ±12532→0, ±12533→±1, 0, ±0x7FFF; exhaustive 65536-input scan |
| tests/ingest.rs | Golden corpus byte-compare (3 triplets, 192 bytes each) |
| README.md | Full API docs + quantizer table |
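
To make the 192-byte figure concrete, here is a rough sketch of how a triplet could serialize. The names `Triplet`, anchor, positive, and negative appear in this PR (the latter three in the test names); the 64-trit component width, the one-byte-per-trit two's-complement encoding, the field order, and the `COMPONENT_LEN` constant are all assumptions chosen so that 3 × 64 = 192. The real layout in src/lib.rs may differ.

```rust
/// Hypothetical component width, chosen so that 3 * 64 = 192 bytes.
pub const COMPONENT_LEN: usize = 64;

/// One training triplet of ternary values (each element is -1, 0, or +1).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Triplet {
    pub anchor: [i8; COMPONENT_LEN],
    pub positive: [i8; COMPONENT_LEN],
    pub negative: [i8; COMPONENT_LEN],
}

impl Triplet {
    /// Serializes anchor, then positive, then negative.
    /// Assumed encoding: one two's-complement byte per trit (-1 becomes 0xFF).
    pub fn to_bytes(&self) -> Vec<u8> {
        self.anchor
            .iter()
            .chain(self.positive.iter())
            .chain(self.negative.iter())
            .map(|&trit| trit as u8)
            .collect()
    }
}
```

Under these assumptions, the golden corpus of 3 triplets would occupy 3 × 192 = 576 bytes in triplets.bin.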

cargo test -p jepa_t_ingest output

running 8 tests (unit)
test unit_tests::ingest_empty_returns_empty ... ok
test unit_tests::ingest_produces_valid_ternary ... ok
test unit_tests::ingest_short_returns_empty ... ok
test unit_tests::quantize_max_values ... ok
test unit_tests::quantize_negative_boundary ... ok
test unit_tests::quantize_positive_boundary ... ok
test unit_tests::quantize_zero ... ok
test unit_tests::triplet_to_bytes_length ... ok
test result: ok. 8 passed; 0 failed

running 10 tests (tests/ingest.rs)
test empty_corpus_returns_empty ... ok
test all_values_are_ternary ... ok
test golden_corpus_triplet_count ... ok
test golden_first_triplet_anchor ... ok
test golden_first_triplet_negative ... ok
test golden_first_triplet_positive ... ok
test golden_total_bytes ... ok
test large_stride_still_produces_triplets ... ok
test single_token_returns_empty ... ok
test triplet_serialises_to_192_bytes ... ok
test result: ok. 10 passed; 0 failed

running 12 tests (tests/quantize.rs)
test boundary_minus_12533_is_negative_one ... ok
test boundary_minus_12532_is_zero ... ok
test boundary_plus_12532_is_zero ... ok
test boundary_plus_12533_is_positive_one ... ok
test max_i16_is_positive_one ... ok
test min_i16_is_negative_one ... ok
test mid_range_negative_is_zero ... ok
test mid_range_positive_is_zero ... ok
test one_above_threshold_is_positive_one ... ok
test one_below_negative_threshold_is_negative_one ... ok
test output_always_ternary_for_all_i16 ... ok
test zero_is_zero ... ok
test result: ok. 12 passed; 0 failed

Doc-tests jepa_t_ingest: 2 passed; 0 failed

TOTAL: 32 tests, 0 failures

Quantizer Parity Confirmation

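| Input | Expected | Actual | Status |
| --- | --- | --- | --- |
| +12532 | 0 | 0 | ✓ PASS |
| +12533 | +1 | +1 | ✓ PASS |
| -12532 | 0 | 0 | ✓ PASS |
| -12533 | -1 | -1 | ✓ PASS |
| 0 | 0 | 0 | ✓ PASS |
| +0x7FFF | +1 | +1 | ✓ PASS |
| -0x8000 | -1 | -1 | ✓ PASS |
| all 65536 i16 inputs | ternary | ternary | ✓ PASS (exhaustive) |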
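
The exhaustive row can be reproduced with a loop like the one below. This is a sketch, not the code in tests/quantize.rs: the quantizer is restated inline from the rule in this PR so the block stands alone, and the helper name `scan_all_i16` is hypothetical.

```rust
/// Quantizer restated from the PR description (threshold 12533, inclusive).
fn quantize_phi_prior(fp_q15: i16) -> i8 {
    if fp_q15 >= 12533 {
        1
    } else if fp_q15 <= -12533 {
        -1
    } else {
        0
    }
}

/// Walks every i16 input and returns how many map to (-1, 0, +1).
/// Panics if any output falls outside the ternary alphabet.
fn scan_all_i16() -> (u32, u32, u32) {
    let mut counts = (0u32, 0u32, 0u32);
    // The inclusive range covers i16::MIN and i16::MAX without overflow.
    for x in i16::MIN..=i16::MAX {
        match quantize_phi_prior(x) {
            -1 => counts.0 += 1,
            0 => counts.1 += 1,
            1 => counts.2 += 1,
            other => panic!("non-ternary output {} for input {}", other, x),
        }
    }
    counts
}
```

The three counts must sum to 65536. The negative bucket holds one more input than the positive bucket because i16 has no +32768 counterpart to -32768.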

R1 CROWN

No Python files in crates/jepa_t_ingest/. Rust ONLY.

References

  • Wave-9b RTL phi_prior_quantizer.v
  • DOI: 10.5281/zenodo.19227877
  • φ² + φ⁻² = 3; ternary alphabet {−1, 0, +1}

Closes #807

Add `crates/jepa_t_ingest/` — a Rust-only (R1 CROWN) crate that
streams plaintext corpora into ternary-quantized triplet sequences for
JEPA-T training on Trinity silicon.

## Quantizer — Wave-9b RTL byte-for-byte match

  pub fn quantize_phi_prior(fp_q15: i16) -> i8

Threshold: φ⁻² in Q1.15 = 12533 (0x30F5)
  if fp_q15 >= +12533  →  +1
  if fp_q15 <= -12533  →  -1
  else                 →   0

## Deliverables

- Cargo.toml  (edition 2021, Apache-2.0)
- src/lib.rs  (quantize_phi_prior, IngestConfig, Triplet, ingest_text)
- src/bin/jepa_t_ingest.rs  (CLI: --input corpus.txt --output triplets.bin)
- tests/quantize.rs  (boundary: ±12532→0, ±12533→±1, 0, ±0x7FFF;
                      exhaustive 65536-input scan)
- tests/ingest.rs   (golden corpus byte-compare, 3 triplets, 192 bytes each)
- README.md

## Test results

cargo test -p jepa_t_ingest: 32 tests, 0 failures
cargo build --release --bin jepa_t_ingest: success

Signed-off-by: Dmitrii Vasilev <admin@t27.ai>
Development

Successfully merging this pull request may close these issues.

[Wave-14a] JEPA-T ternary ingest pipeline (L-S50)
