Commit 2a7f89e

feat(jina): scaffold Jina v5 as main route, preserve v4 explicit route
Authorization: user directive "you can upgrade ndarray code from jina 4 to jina5 but don't delete v4, just wire v5 as main route."

This is additive scaffolding: Jina v5 bytes are NOT yet baked into weights/jina_v5_base17_151k.bin + weights/jina_v5_palette_151k.bin, so the `JINA` main-route static continues to load v4 bytes today. What this commit establishes is the migration path:

1. `ModelSource::JinaV5` variant added to the enum with a full docstring describing the Qwen3 base, 151K BPE tokens, 1024D hidden, SiLU activation. Explicitly marked as the MAIN ROUTE target per AdaWorldAPI model registry.

2. Internal weight-byte statics renamed for clarity:
     JINA_BASE17  → JINA_V4_BASE17
     JINA_PALETTE → JINA_V4_PALETTE
   These are file-private `static` (not `pub`), so the rename does not affect any downstream caller. Names make v4-specificity explicit so the future JINA_V5_BASE17 / JINA_V5_PALETTE add-in is unambiguous.

3. `pub static JINA_V4` added as an explicit legacy-route accessor. Semantically identical to `JINA` today; the difference appears only AFTER v5 bake, at which point:
     - `JINA` will load v5 bytes (main route advances)
     - `JINA_V4` will still load v4 bytes (backward compat preserved)
   Tests that need v4 specifically can reference JINA_V4 directly and will NOT be silently upgraded to v5.

4. `JINA` main-route static keeps its current v4 load BUT gains a detailed docstring + inline TODO(jina-v5-bake) pointing at the exact one-line swap required when v5 weights are baked:
     ModelRuntime::load(ModelSource::JinaV5, JINA_V5_BASE17, JINA_V5_PALETTE)

5. New test `test_jina_v4_explicit_route` asserts that `&*JINA_V4` loads with `source == ModelSource::JinaV4` and `vocab_size() == 20000`. This test MUST still pass after any future v5 swap; it is the backward-compat guarantee that v4 is never silently deleted.

6. Existing test `test_jina_runtime_loads` is kept unchanged (still asserts `JINA == JinaV4`) because JINA currently loads v4. Its docstring notes that after v5 bake this test must be updated to assert JinaV5 source and ~151000 vocab_size.

Verified:
  - `cargo check --lib` → clean (pre-existing warnings only, zero new)
  - `cargo test --lib hpc::jina::runtime` → test_jina_runtime_loads PASS
  - `cargo test --lib hpc::jina::runtime` → test_jina_v4_explicit_route PASS

Not in this commit (deferred, pending v5 bake pipeline):
  - Actual JINA_V5_BASE17 / JINA_V5_PALETTE include_bytes statics
  - Swapping JINA's load to JinaV5
  - New test asserting JINA.source == JinaV5 (would replace the current assertion in test_jina_runtime_loads after bake)
  - GammaProfile per-role calibration for the v5 weights (related but separate: see lance-graph/crates/bgz-tensor/src/gamma_phi.rs and the "γ+φ as HDR-TV-style distribution normalizer" architectural note)

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
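The main-route / explicit-route split described in the commit message can be sketched in isolation. The block below is a minimal illustration, not the code in `src/hpc/jina/runtime.rs`: `ModelRuntime` is a stripped-down stand-in, `load` takes a single byte slice instead of the real base17 + palette pair, `byte_len` is a hypothetical helper, and the weight bytes are a 34-byte placeholder rather than an `include_bytes!` payload.

```rust
use std::sync::LazyLock;

// Stand-in for the enum in runtime.rs (only the two Jina variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ModelSource {
    JinaV4,
    JinaV5,
}

// Simplified, hypothetical runtime: the real struct holds more state.
#[derive(Debug)]
pub struct ModelRuntime {
    pub source: ModelSource,
    base17: &'static [u8],
}

impl ModelRuntime {
    // Assumed, simplified signature: the real `load` also takes a palette.
    pub fn load(source: ModelSource, base17: &'static [u8]) -> Self {
        ModelRuntime { source, base17 }
    }

    // Hypothetical helper, used here only to show which bytes were loaded.
    pub fn byte_len(&self) -> usize {
        self.base17.len()
    }
}

// Placeholder bytes; the real statics are produced by include_bytes!.
static JINA_V4_BASE17: &[u8] = &[0u8; 34];

/// Main route: after the v5 bake, only this initializer changes.
pub static JINA: LazyLock<ModelRuntime> =
    LazyLock::new(|| ModelRuntime::load(ModelSource::JinaV4, JINA_V4_BASE17));

/// Explicit legacy route: stays pinned to v4 whatever `JINA` loads.
pub static JINA_V4: LazyLock<ModelRuntime> =
    LazyLock::new(|| ModelRuntime::load(ModelSource::JinaV4, JINA_V4_BASE17));

fn main() {
    // Today both routes resolve to v4.
    assert_eq!(JINA.source, JINA_V4.source);
    // The legacy route must never report v5.
    assert_ne!(JINA_V4.source, ModelSource::JinaV5);
    println!("main route: {:?}, legacy route: {:?}", JINA.source, JINA_V4.source);
}
```

Keeping two separate statics, rather than making `JINA_V4` an alias of `JINA`, is what lets the main route advance later without touching callers that pinned v4.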
1 parent 76a0a45 commit 2a7f89e

1 file changed

Lines changed: 68 additions & 6 deletions

src/hpc/jina/runtime.rs

@@ -13,8 +13,20 @@ use std::sync::LazyLock;

 /// Embedded weight files (compiled into the binary via include_bytes!).
 /// Zero file I/O at runtime — the weights ARE the binary.
-static JINA_BASE17: &[u8] = include_bytes!("weights/jina_base17_20k.bin");
-static JINA_PALETTE: &[u8] = include_bytes!("weights/jina_palette_20k.bin");
+///
+/// Naming convention: {model}_{aspect}_{vocab_size}k.bin
+///   - aspect = base17 (token embeddings) or palette (256-entry lookup)
+///   - vocab_size = approximate token count in thousands
+static JINA_V4_BASE17: &[u8] = include_bytes!("weights/jina_base17_20k.bin");
+static JINA_V4_PALETTE: &[u8] = include_bytes!("weights/jina_palette_20k.bin");
+
+// TODO(jina-v5-bake): When the bake pipeline produces Jina v5 weights
+// (151K Qwen3 BPE tokens, 1024D hidden → 34-byte Base17), add:
+//     static JINA_V5_BASE17: &[u8] = include_bytes!("weights/jina_v5_base17_151k.bin");
+//     static JINA_V5_PALETTE: &[u8] = include_bytes!("weights/jina_v5_palette_151k.bin");
+// Then swap the `JINA` LazyLock load line below to use JinaV5. See
+// `JINA` / `JINA_V4` / `JINA_V5` statics near end of file for the wiring.
+
 static GPT2_BASE17: &[u8] = include_bytes!("weights/gpt2_base17_50k.bin");
 static GPT2_PALETTE: &[u8] = include_bytes!("weights/gpt2_palette_50k.bin");
 static BERT_BASE17: &[u8] = include_bytes!("weights/bert_base17_30k.bin");
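The `{model}_{aspect}_{vocab_size}k.bin` convention from the doc comment in this hunk can be illustrated with a small helper. This is a sketch only: `weight_file_name` is a hypothetical function that does not exist in the crate, and truncating the token count to whole thousands is an assumption. It was chosen because it reproduces the existing GPT-2 (50,257 tokens) and BERT base uncased (30,522 tokens) file names.

```rust
/// Hypothetical helper: build a weight file name from the convention
/// `{model}_{aspect}_{vocab_size}k.bin`.
/// Assumption: the "k" figure is the token count truncated to thousands.
fn weight_file_name(model: &str, aspect: &str, vocab_size: usize) -> String {
    format!("{model}_{aspect}_{}k.bin", vocab_size / 1000)
}

fn main() {
    // Names already present in the diff.
    println!("{}", weight_file_name("jina", "base17", 20_000));
    println!("{}", weight_file_name("gpt2", "palette", 50_257));
    println!("{}", weight_file_name("bert", "base17", 30_522));
    // Name the TODO block reserves for the future v5 bake.
    println!("{}", weight_file_name("jina_v5", "base17", 151_000));
}
```

Note that the v4 files carry no version tag (`jina_base17_20k.bin`), so only the v5 files use a versioned model prefix.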
@@ -23,9 +35,23 @@ static BERT_PALETTE: &[u8] = include_bytes!("weights/bert_palette_30k.bin");
 /// Which model's weights to use.
 #[derive(Clone, Copy, Debug, PartialEq, Eq)]
 pub enum ModelSource {
-    /// Jina v4 text-retrieval (20K tokens, 2048D original).
+    /// Jina v4 text-retrieval (20K tokens, 2048D original, XLM-R base).
+    /// LEGACY route. Kept for backward compatibility and direct-access callers
+    /// that specifically need v4 behavior. Weights pre-baked at
+    /// `weights/jina_base17_20k.bin` + `weights/jina_palette_20k.bin`.
     JinaV4,
-    /// GPT-2 small (50K tokens, 768D original). Same BPE as Jina.
+    /// Jina v5 small (151K tokens, 1024D hidden, Qwen3 base, SiLU activation).
+    /// **MAIN ROUTE** per AdaWorldAPI model registry (CLAUDE.md): Jina v5 is
+    /// the canonical ground-truth anchor. Same BPE as Reranker v3.
+    ///
+    /// Weights NOT yet baked at compile time — the v5 bake pipeline must
+    /// produce `weights/jina_v5_base17_151k.bin` + `weights/jina_v5_palette_151k.bin`
+    /// before this variant is actually loadable via the `JINA_V5` static.
+    /// Until then, the main-route alias `JINA` falls back to v4 bytes.
+    ///
+    /// See the TODO block above `JINA_V4_BASE17` for the exact swap sequence.
+    JinaV5,
+    /// GPT-2 small (50K tokens, 768D original). Same BPE as Jina v4.
     Gpt2,
     /// BERT base uncased (30K tokens, 768D original). WordPiece tokenizer.
     Bert,
@@ -190,9 +216,33 @@ fn build_similarity_table(palette: &JinaPalette) -> [f32; 256] {
 // Global LazyLock runtimes — loaded once, used forever
 // ============================================================================

-/// Jina v4 runtime (20K tokens). LazyLock: zero cost after first access.
+/// Jina **main route**. LazyLock: zero cost after first access.
+///
+/// Today this loads Jina v4 bytes (20K tokens) because v5 weights are not yet
+/// baked into `weights/`. When the v5 bake pipeline produces
+/// `weights/jina_v5_base17_151k.bin` + `weights/jina_v5_palette_151k.bin`,
+/// swap the load line below to:
+///
+/// ```ignore
+/// ModelRuntime::load(ModelSource::JinaV5, JINA_V5_BASE17, JINA_V5_PALETTE)
+/// ```
+///
+/// Callers should use `JINA` for default behavior. Only use `JINA_V4`
+/// explicitly when v4-specific behavior is required (e.g., backward-compat
+/// tests).
 pub static JINA: LazyLock<ModelRuntime> = LazyLock::new(|| {
-    ModelRuntime::load(ModelSource::JinaV4, JINA_BASE17, JINA_PALETTE)
+    // TODO(jina-v5-bake): swap to JinaV5 when v5 weights exist.
+    ModelRuntime::load(ModelSource::JinaV4, JINA_V4_BASE17, JINA_V4_PALETTE)
+});
+
+/// Jina **v4 explicit route** (20K tokens, XLM-R base). LEGACY.
+///
+/// Use this when a caller specifically needs v4 behavior and should NOT be
+/// silently upgraded to v5 when the main route is swapped. Today this is
+/// functionally identical to `JINA` (both load v4 bytes), but after the v5
+/// bake `JINA` will load v5 while `JINA_V4` keeps loading v4.
+pub static JINA_V4: LazyLock<ModelRuntime> = LazyLock::new(|| {
+    ModelRuntime::load(ModelSource::JinaV4, JINA_V4_BASE17, JINA_V4_PALETTE)
 });

 /// GPT-2 runtime (50K tokens). Same BPE as Jina → interoperable palettes.
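The "loaded once, used forever" behavior these statics rely on comes from `std::sync::LazyLock`: the closure runs on first dereference and the result is cached, so later accesses only pay for an initialization check. The following sketch demonstrates this with a call counter. `RUNTIME`, `INIT_CALLS`, and `init_calls_after` are illustrative names, and the `Vec<u8>` payload stands in for a real `ModelRuntime`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Counts how many times the initializer closure actually runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Illustrative stand-in for a runtime static such as `JINA`.
static RUNTIME: LazyLock<Vec<u8>> = LazyLock::new(|| {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    vec![0u8; 34] // placeholder payload
});

/// Dereference `RUNTIME` `accesses` times, then report how often the
/// initializer has run so far.
fn init_calls_after(accesses: usize) -> usize {
    for _ in 0..accesses {
        let _len = RUNTIME.len(); // auto-deref forces initialization once
    }
    INIT_CALLS.load(Ordering::SeqCst)
}

fn main() {
    // Repeated access never re-runs the initializer.
    assert_eq!(init_calls_after(3), 1);
    assert_eq!(init_calls_after(5), 1);
    println!("initializer ran {} time(s)", INIT_CALLS.load(Ordering::SeqCst));
}
```

Because `JINA` and `JINA_V4` are separate `LazyLock` values, each runs its own initializer the first time it is touched, even though both currently load the same v4 bytes.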
@@ -211,12 +261,24 @@ mod tests {

     #[test]
     fn test_jina_runtime_loads() {
+        // Main route. Today this is v4; when v5 is baked, update this test to
+        // assert source == JinaV5 and vocab_size == ~151000.
         let rt = &*JINA;
         assert_eq!(rt.source, ModelSource::JinaV4);
         assert_eq!(rt.vocab_size(), 20000);
         assert!((rt.similarity[0] - 1.0).abs() < 0.01, "self-similarity should be ~1.0");
     }

+    #[test]
+    fn test_jina_v4_explicit_route() {
+        // Legacy v4-specific accessor. After v5 bake, this test MUST still
+        // pass (v4 is the backward-compat guarantee — never deleted).
+        let rt = &*JINA_V4;
+        assert_eq!(rt.source, ModelSource::JinaV4);
+        assert_eq!(rt.vocab_size(), 20000);
+        assert!((rt.similarity[0] - 1.0).abs() < 0.01, "self-similarity should be ~1.0");
+    }
+
     #[test]
     fn test_gpt2_runtime_loads() {
         let rt = &*GPT2;
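The comment in `test_jina_runtime_loads` says the post-bake assertion should check for a vocab size of "~151000" rather than an exact value. One way to express such an approximate check is sketched below. `vocab_is_about` is a hypothetical helper, and both the 1,000-token tolerance and the 151,500 sample value are illustrative assumptions, not figures taken from the commit.

```rust
/// Hypothetical helper: true when `actual` lies within `tolerance`
/// tokens of `expected`.
fn vocab_is_about(actual: usize, expected: usize, tolerance: usize) -> bool {
    actual.abs_diff(expected) <= tolerance
}

fn main() {
    // v4 today: the tests in this commit assert the exact value 20000.
    assert!(vocab_is_about(20_000, 20_000, 0));

    // After a v5 bake: accept any count near 151K (tolerance is assumed).
    let sample_v5_vocab = 151_500; // illustrative, not a measured value
    assert!(vocab_is_about(sample_v5_vocab, 151_000, 1_000));

    // A v4-sized runtime must not pass the v5 check.
    assert!(!vocab_is_about(20_000, 151_000, 1_000));
    println!("approximate vocab checks passed");
}
```

Once the baked file exists and its exact token count is known, an exact `assert_eq!` like the one used for v4 would be the stricter choice.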
