
Commit d8b7b8e

docs(jina): JinaV5 docstring — point at existing precision-path primitives
User directive: item 11 should reference existing code, NOT duplicate it ("Only document, use, don't duplicate."). Updated the ModelSource::JinaV5 variant docstring to:

1. Correct "Qwen3 base" → "Qwen 3.5 base" (per the user's Qwopus/Qwen3.5 clarification; Qwopus and Jina v5 share the Qwen 3.x family).
2. Add the Reader-LM v3 alias explicitly: "Also known as Reader-LM v3 (same model, alternate name — BERT 3.x architecture lineage; NOT the older Qwen2-based Reader-LM 1.5B/v1/v2)".
3. Document the canonical precision path by CITING EXISTING PRIMITIVES with file:line references. No new code, no duplicated conversion logic:
   - crate::hpc::gguf::read_tensor_f32 (src/hpc/gguf.rs:188): F16/F32/BF16/Q8_0 → Vec<f32> loader; handles the F16 → F32 transient upcast in a single call
   - crate::hpc::gguf::f16_to_f32 (src/hpc/gguf.rs:417): scalar per-element F16 → F32 primitive (used internally by read_tensor_f32)
   - crate::hpc::quantized::f32_to_bf16_rounded (src/hpc/quantized.rs:80): F32 working format → BF16 storage conversion
   - crate::hpc::quantized::f32_vec_to_bf16: slice variant of the above
   - crate::hpc::quantized::bf16_gemm_f32 (src/hpc/quantized.rs:108): BF16 GEMM with F32 accumulation (the actual BF16 compute primitive)
   - crate::simd::F32x16::mul_add / F32x8 / F64x8 (src/simd.rs:206): hardware FMA primitive (the "add_mul" the user was referring to). Lowers to VFMADD*PS / VFMADD*PD (AVX-FMA); VDPBF16PS (AVX-512-BF16) is a separate BF16 dot-product instruction, not what these f32/f64 lanes emit.
4. Explicit anti-patterns:
   - Never F16 → BF16 direct (the exponent fields differ: 5-bit, bias 15 vs 8-bit, bias 127; going through F32 is exact until the final rounding, which drops 3 mantissa bits, 10 → 7)
   - Never 8-bit quantization as compute precision (only as the final calibrated storage format)
   - No F32 in hot loops (F32 is strictly a transient upcast pipe)
5. Reference the external calibration path for completeness: lance-graph/crates/bgz-tensor/src/gamma_phi.rs::calibrate_gamma (HDR-TV-style per-role normalizer, not an ndarray-internal primitive).

Verified before commit (per the "verify assumed validity" rule):
- cargo check --lib: clean, pre-existing warnings only
- cargo test --lib hpc::jina::runtime: 11 tests pass, including test_jina_runtime_loads and test_jina_v4_explicit_route (both still assert JinaV4 because JINA still loads v4 bytes pre-bake)
- All cited symbols verified to exist at the file:line references via grep:
  * src/hpc/gguf.rs:188 read_tensor_f32 ✓
  * src/hpc/gguf.rs:417 f16_to_f32 ✓
  * src/hpc/quantized.rs:80 f32_to_bf16_rounded ✓ (confirmed wrapper line)
  * src/hpc/quantized.rs:108 bf16_gemm_f32 ✓
  * src/simd.rs:206 mul_add ✓

Pure docstring change: no code behavior change, no new dependencies, no new functions. Fully additive.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
1 parent 2a7f89e commit d8b7b8e
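
Step 1 of the documented precision path widens F16 to F32, which is why F32 can serve as a lossless transient. The crate's `f16_to_f32` is cited but its signature is not shown, so the following is a hypothetical stand-alone sketch (`f16_bits_to_f32`, taking raw IEEE binary16 bits as `u16`), not the crate's code. Every binary16 value, subnormals included, is exactly representable in f32, so no rounding occurs.

```rust
/// Hypothetical sketch: decode IEEE binary16 bits to f32 (exact widening).
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h & 0x8000) as u32) << 16; // sign bit moves from bit 15 to bit 31
    let exp = ((h >> 10) & 0x1F) as u32; // 5-bit exponent, bias 15
    let mant = (h & 0x03FF) as u32; // 10-bit stored mantissa

    let magnitude = match (exp, mant) {
        (0, 0) => 0, // signed zero
        // Subnormal: value = m * 2^-24. Both factors are exact in f32 and
        // 2^-24 is a power of two, so the product is exact (and normal in f32).
        (0, m) => (m as f32 * f32::from_bits(0x3380_0000)).to_bits(),
        (0x1F, 0) => 0x7F80_0000, // infinity
        (0x1F, m) => 0x7F80_0000 | (m << 13), // NaN, payload preserved
        // Normal: rebias the exponent (15 -> 127), left-align the mantissa.
        (e, m) => ((e + 127 - 15) << 23) | (m << 13),
    };
    f32::from_bits(sign | magnitude)
}
```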

1 file changed: src/hpc/jina/runtime.rs (48 additions, 5 deletions)
```diff
@@ -40,14 +40,57 @@ pub enum ModelSource {
     /// that specifically need v4 behavior. Weights pre-baked at
     /// `weights/jina_base17_20k.bin` + `weights/jina_palette_20k.bin`.
     JinaV4,
-    /// Jina v5 small (151K tokens, 1024D hidden, Qwen3 base, SiLU activation).
-    /// **MAIN ROUTE** per AdaWorldAPI model registry (CLAUDE.md): Jina v5 is
-    /// the canonical ground-truth anchor. Same BPE as Reranker v3.
+    /// Jina v5 small (151K tokens, 1024D hidden, Qwen 3.5 base, SiLU activation).
+    /// Also known as **Reader-LM v3** (same model, alternate name — BERT 3.x
+    /// architecture lineage; NOT the older Qwen2-based Reader-LM 1.5B/v1/v2).
+    ///
+    /// **MAIN ROUTE** per AdaWorldAPI model registry (`lance-graph/CLAUDE.md`
+    /// → Model Registry → Production models): Jina v5 is the canonical
+    /// ground-truth anchor. Same Qwen 3.x BPE as Reranker v3, Qwopus.
+    ///
+    /// # Precision path (use existing primitives, do not duplicate)
+    ///
+    /// Jina v5 is published in F16 only. The canonical ingestion chain uses
+    /// existing primitives in this crate — do NOT write new conversion code:
+    ///
+    /// 1. **Load** F16/F32/BF16/Q8_0 tensors via
+    /// [`crate::hpc::gguf::read_tensor_f32`] (`src/hpc/gguf.rs:188`)
+    /// which returns `Vec<f32>`. The scalar
+    /// [`crate::hpc::gguf::f16_to_f32`] (`src/hpc/gguf.rs:417`) does the
+    /// per-element F16 → F32 conversion.
+    ///
+    /// 2. **F32 is transient**: the `Vec<f32>` from step 1 is never persisted.
+    /// It is a momentary upcast pipe between F16 source bytes and BF16
+    /// working format. No F32 in hot loops.
+    ///
+    /// 3. **Convert to BF16** via
+    /// [`crate::hpc::quantized::f32_to_bf16_rounded`] (`src/hpc/quantized.rs:80`)
+    /// or [`crate::hpc::quantized::f32_vec_to_bf16`] for the whole slice.
+    ///
+    /// 4. **Compute in BF16** via
+    /// [`crate::hpc::quantized::bf16_gemm_f32`] (`src/hpc/quantized.rs:108`).
+    /// F32-precision accumulation via fused hardware FMA.
+    /// The fused multiply-add primitive is exposed on SIMD lane types as
+    /// [`crate::simd::F32x16::mul_add`] / `F32x8::mul_add` / `F64x8::mul_add`
+    /// (`src/simd.rs:206`), lowering to VFMADD*PS / VFMADD*PD (AVX-FMA).
+    /// VDPBF16PS (AVX-512-BF16) is a separate BF16-pair dot-product instruction.
+    ///
+    /// 5. **Store** at runtime as Base17 i16 fixed-point (34-byte plane) or
+    /// palette u8 index, after GammaProfile-calibrated quantization (see
+    /// `lance-graph/crates/bgz-tensor/src/gamma_phi.rs::calibrate_gamma`
+    /// for the per-role HDR-TV-style normalizer).
+    ///
+    /// Never F16 → BF16 direct: the exponent fields differ (5-bit, bias 15 vs
+    /// 8-bit, bias 127), so go via F32; the only loss is 3 mantissa bits (10 → 7).
+    /// Never 8-bit quantization as compute precision, only as calibrated storage.
+    ///
+    /// # Weight baking status
     ///
     /// Weights NOT yet baked at compile time — the v5 bake pipeline must
     /// produce `weights/jina_v5_base17_151k.bin` + `weights/jina_v5_palette_151k.bin`
-    /// before this variant is actually loadable via the `JINA_V5` static.
-    /// Until then, the main-route alias `JINA` falls back to v4 bytes.
+    /// before this variant is actually loadable via the `JINA_V5` static
+    /// (to be added when the bake runs). Until then, the main-route alias
+    /// `JINA` falls back to v4 bytes via `JINA_V4_BASE17` / `JINA_V4_PALETTE`.
     ///
     /// See the TODO block above `JINA_V4_BASE17` for the exact swap sequence.
     JinaV5,
```
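
Step 3 narrows F32 to BF16. Assuming the crate's `f32_to_bf16_rounded` uses IEEE round-to-nearest-even (the commit does not say), a minimal stand-alone sketch looks like this; `f32_to_bf16_rne`, `bf16_to_f32` and `f32_slice_to_bf16` are hypothetical names. BF16 keeps F32's 8-bit exponent, so the largest finite F16 value, 65504, stays finite (it rounds to 65536); the precision cost of F16 → F32 → BF16 is in the mantissa, 10 stored bits down to 7.

```rust
/// Hypothetical sketch: round an f32 to bfloat16 (raw `u16` bits) with
/// round-to-nearest, ties-to-even. BF16 keeps f32's sign bit and 8-bit
/// exponent and shortens the stored mantissa from 23 bits to 7.
fn f32_to_bf16_rne(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Truncate, then set the top mantissa bit so the result stays a quiet NaN.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Adding 0x7FFF plus the lowest kept bit rounds to nearest and breaks
    // exact ties toward an even result. A carry out of the mantissa bumps the
    // exponent; at the very top of the f32 range this yields +/-infinity.
    let lsb = (bits >> 16) & 1;
    let rounded = bits.wrapping_add(0x7FFF + lsb);
    (rounded >> 16) as u16
}

/// Widen bfloat16 bits back to f32. Exact: the bits become the high half.
fn bf16_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// Hypothetical slice variant: convert a whole tensor element by element.
fn f32_slice_to_bf16(xs: &[f32]) -> Vec<u16> {
    xs.iter().map(|&x| f32_to_bf16_rne(x)).collect()
}
```

The relative rounding error of this conversion is at most 2^-8 for normal values, which is the mantissa-only loss the precision path accepts.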

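Step 4 keeps storage in BF16 but accumulates in F32. The commit cites `bf16_gemm_f32` without showing its signature, so `bf16_gemm_f32_ref` below is a hypothetical scalar reference, assuming row-major matrices and BF16 values passed as raw `u16` bits. It uses the standard library's `f32::mul_add` (a fused multiply-add with a single rounding) as the scalar analogue of the lane-wise `mul_add` cited in the doc comment.

```rust
/// Widen bfloat16 bits to f32 (exact).
fn bf16_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// Hypothetical scalar reference: C (m x n, f32) = A (m x k, bf16) * B (k x n, bf16),
/// all row-major. Each BF16 operand is widened to f32 and accumulated in an
/// f32 register with a fused multiply-add.
fn bf16_gemm_f32_ref(a: &[u16], b: &[u16], c: &mut [f32], m: usize, k: usize, n: usize) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32; // F32 accumulator
            for p in 0..k {
                let x = bf16_to_f32(a[i * k + p]);
                let y = bf16_to_f32(b[p * n + j]);
                acc = x.mul_add(y, acc); // acc = x * y + acc, rounded once
            }
            c[i * n + j] = acc;
        }
    }
}
```

A tuned kernel would block these loops and vectorize across columns with the SIMD lane types; the reference version only pins down the numerics (BF16 inputs, F32 accumulator, one rounding per multiply-add). Without hardware FMA enabled, `f32::mul_add` typically falls back to a slower software routine.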
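
Step 5 persists calibrated low-precision values. The Base17 plane layout and `calibrate_gamma` are not part of this commit, so the sketch below substitutes a much simpler technique, a per-tensor symmetric abs-max scale, purely to show the shape of the step: calibrate once from F32, then store i16 fixed-point values or u8 palette indices. All function names are hypothetical, and this is not the GammaProfile / HDR-TV-style normalizer the doc comment cites.

```rust
/// Hypothetical per-tensor symmetric calibration: choose `scale` so the
/// largest |x| maps to i16::MAX. A simple stand-in, not GammaProfile.
fn calibrate_scale(xs: &[f32]) -> f32 {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    if max_abs == 0.0 {
        1.0 // all-zero tensor: any positive scale works
    } else {
        max_abs / i16::MAX as f32
    }
}

/// Quantize to i16 fixed point: q = round(x / scale), saturated to the i16 range.
fn quantize_i16(xs: &[f32], scale: f32) -> Vec<i16> {
    xs.iter()
        .map(|&x| (x / scale).round().clamp(i16::MIN as f32, i16::MAX as f32) as i16)
        .collect()
}

/// Recover approximate values: x is approximately q * scale.
fn dequantize_i16(qs: &[i16], scale: f32) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * scale).collect()
}

/// Palette storage: index of the nearest codebook entry (at most 256, so it fits u8).
fn palette_index(x: f32, palette: &[f32]) -> u8 {
    assert!(!palette.is_empty() && palette.len() <= 256);
    let mut best = 0;
    for (i, &p) in palette.iter().enumerate() {
        if (x - p).abs() < (x - palette[best]).abs() {
            best = i;
        }
    }
    best as u8
}
```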