Merged
4 changes: 2 additions & 2 deletions .claude/knowledge/pr-x12-bgz-jc-substrate-synergies.md
@@ -303,7 +303,7 @@ This is the doc-level value of PR-X12: bgz code + PR-X12 docs = a complete archi

## 5. Gaps — what doesn't exist yet

### 5.1 `jd-nd` — the missing ndarray-side proof crate
### 5.1 `jd-nd` — the missing ndarray-side proof crate (Gap **G-1**)

The Explore search confirmed: `jd-nd` does not exist in `/home/user/ndarray/`. The math-proof infrastructure on the ndarray side lives ad-hoc inside `src/hpc/` modules (`deepnsm.rs`, `jina/runtime.rs`) as TODO comments.

@@ -335,7 +335,7 @@ ndarray/crates/jd-nd/

**Why now:** R-11's latency CI needs a *correctness* twin. Latency that's fast but wrong is the worst outcome. jd-nd is the structural place for those proofs.

### 5.2 Cronbach / ICC research crate
### 5.2 Cronbach / ICC research crate (Gap **G-2**)

`lance-graph/crates/lance-graph-codec-research/` exists per the Explore agent's report, **but its scope is FFT (rustfft) variants**, not Cronbach's α / ICC / encoding-reliability psychometrics.

@@ -308,12 +308,14 @@ Updating the inventory from `pr-x12-bgz-jc-substrate-synergies.md` §7 with the

**Total estimated gap-closing work: 8-12 weeks** across the seven items, all incremental on existing infrastructure. None of them require new research; all are wiring existing primitives into the codec.

Two prior gaps from the earlier doc remain:
Two prior gaps from the earlier doc remain (their canonical IDs are owned by `pr-x12-bgz-jc-substrate-synergies.md` §5; cross-referenced here):

| Gap (prior) | Component | Cost |
| Gap (cross-ref) | Component | Cost |
|---|---|---|
| **G-8** | `jd-nd` crate does not exist (ndarray-side proof crate) | 2-3 weeks skeleton + ongoing |
| **G-9** | Cronbach/ICC encoding-reliability research crate not implemented | 1-2 weeks skeleton + 2-3 weeks PoC |
| **bgz-jc G-1** (§5.1) | `jd-nd` crate does not exist (ndarray-side proof crate) | 2-3 weeks skeleton + ongoing |
| **bgz-jc G-2** (§5.2) | Cronbach/ICC encoding-reliability research crate not implemented | 1-2 weeks skeleton + 2-3 weeks PoC |

The G-1..G-7 IDs in §5 of *this* doc are local to the cam-pq / sigker / dn_tree binding; bgz-jc's G-1 / G-2 are a separate namespace owned by that doc. When citing cross-doc, prefix with the source (e.g., "bgz-jc G-1" vs "cam-pq G-1") to avoid the collision the previous G-8 / G-9 labelling implied.

**Grand total: ~11-17 weeks** of substrate-binding + gap-closing work, parallelizable. PR-X12 codec body (~1500 LoC per R-3) is independent of this and can ship sooner.

66 changes: 59 additions & 7 deletions .claude/knowledge/pr-x12-canon-resolutions-delta.md
@@ -9,15 +9,16 @@

## 0. What's actually new

The merged canon (`bc9da4ad`) argued the architecture; canon-resolutions makes it falsifiable. Five categories of novel content survive the delta filter:
The merged canon (`bc9da4ad`) argued the architecture; canon-resolutions makes it falsifiable. Six categories of novel content survive the delta filter:

1. **Concrete trait signatures** — R-1 (`Basis<T>` + `LinearReduce` split), §8 surface (`PredictiveSignal`, `CurveOrder<const N>`, `RdoMetric`)
2. **Quantified budgets** — R-3 LoC envelope per sub-card / per consumer + audit rule; R-4 four Plan G thresholds; R-11 4K@60fps latency budget
3. **Math identities** — R-6 SSD-via-VNNI (`||A||² - 2A·B + ||B||²`), R-7 tropical-GEMM partition (`O(4^d) → O(d²)`)
3. **Math identities** — R-6 SSD-via-VNNI (`||A||² - 2A·B + ||B||²`), R-7 tropical-GEMM partition (`O(4^d) → O(d²)`, kernel at `bgz17::scalar_sparse::tropical_spmv`)
4. **Type-level invariants** — R-2 bit-15/bit-14 split, R-9 topology-FREE codec
5. **Phasing patterns** — R-8 confidence-gate framing, R-13 Option-A-then-B for federated codebook
5. **Phasing patterns** — R-8 confidence-gate framing, R-13 Option-A-then-B for federated codebook (primitives: `cam_pq` + `bgz-hhtl-d` + `dn_tree` + `merkle_tree`)
6. **Formal-correctness + stream lane (post-merge)** — R-14 (`jc::pflug` Pillar 10 + `jc::hambly_lyons` Pillar 11), R-15 (`SignatureBasis<DEPTH>` as fifth Plan G lane)

Plus the synthesis layer: §9 falsifiability matrix (24 rows), §10 sequencing with named gates, §12 compaction-preservation contract.
Plus the synthesis layer: §9 falsifiability matrix (24+3 rows including R-14/R-15), §10 sequencing with named gates, §12 compaction-preservation contract.
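The R-6 identity in item 3 can be checked with a small scalar sketch. This is only an illustration of `||A||² - 2A·B + ||B||²`; the function names are hypothetical, and the dot-product term is the part an integer dot-product instruction such as VNNI would presumably accelerate in the real kernel.

```rust
/// Sum of squared differences, computed term by term.
fn ssd_direct(a: &[u8], b: &[u8]) -> i64 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as i64 - y as i64;
            d * d
        })
        .sum()
}

/// The same quantity via ||A||^2 - 2 A.B + ||B||^2.
/// Only the middle term couples the two blocks; the norms can be cached per block.
fn ssd_via_dot(a: &[u8], b: &[u8]) -> i64 {
    let norm_a: i64 = a.iter().map(|&x| (x as i64) * (x as i64)).sum();
    let norm_b: i64 = b.iter().map(|&y| (y as i64) * (y as i64)).sum();
    let dot: i64 = a.iter().zip(b).map(|(&x, &y)| (x as i64) * (y as i64)).sum();
    norm_a - 2 * dot + norm_b
}
```

Because `||A||²` and `||B||²` depend on one block each, a search over many candidate blocks only needs one dot product per candidate.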

---

@@ -216,7 +217,9 @@ Tropical-semiring (+, min) formulation:

At 4K 132K CTUs/frame: ~4 ms vs ~64 ms just for partition RDO. At 60 fps, the difference between fitting and missing budget.

**Dep direction:** `ndarray-codec → lance-graph::blasgraph` (tropical-GEMM kernels live in blasgraph). Allowed post-Plan-H because ndarray-codec is a sibling crate, not the bottom.
**Dep direction:** `ndarray-codec → lance-graph::blasgraph` (tropical-GEMM kernels nominally live in blasgraph). Allowed post-Plan-H because ndarray-codec is a sibling crate, not the bottom.

**Actual kernel home (current):** `lance-graph::bgz17::scalar_sparse::tropical_spmv`. The `blasgraph` namespace is the eventual abstraction; until that lands, ndarray-codec depends on bgz17 directly. Cite the symbol when wiring A6, not the namespace.

**Plan A6 (1 week) ships this.** λ-RDO knob scales edge weights; tropical-GEMM relaxation computes optimal mode tree.
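As a rough illustration of the (+, min) relaxation step, the sketch below does one dense min-plus matrix-vector product. It is not the signature of `tropical_spmv`, which this diff does not show; `min_plus_matvec`, the dense layout, and the `u32` cost type are assumptions made for the example.

```rust
/// "No edge" marker; saturating_add keeps INF + w == INF.
const INF: u32 = u32::MAX;

/// One min-plus relaxation: y[i] = min_j (weights[i][j] + x[j]).
/// Ordinary multiply becomes add and ordinary add becomes min.
fn min_plus_matvec(weights: &[Vec<u32>], x: &[u32]) -> Vec<u32> {
    weights
        .iter()
        .map(|row| {
            row.iter()
                .zip(x)
                .map(|(&w, &xj)| w.saturating_add(xj))
                .min()
                .unwrap_or(INF) // empty row: nothing reachable
        })
        .collect()
}
```

In this framing a λ-RDO knob would scale the entries of `weights` before relaxation; how the real kernel stores sparse rows is not covered here.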

@@ -292,6 +295,16 @@ Pattern: ship simplest-that-works, measure, escalate. Don't pick best-in-theory

Wire-format hook for Option A: `WorkerId: u16` + `CodebookHash: u64` in frame header.

**Implementation primitives** (already exist; PR-X12 only adds the wire format + `CodebookHandle` trait):

| Concern | Crate / module |
|---|---|
| Codebook training (k-means + CAM-PQ) | `ndarray::hpc::cam_pq::CamCodebook` |
| Deployed encoding format | `lance-graph::bgz-tensor::Codebook4096` / `bgz-hhtl-d` |
| Online plastic updates (SharedClusterWide) | `ndarray::hpc::dn_tree` |
| Integrity proof (Blake3-48 Merkle root, xor_diff) | `ndarray::hpc::merkle_tree` |
| Gossip protocol | `q2` (external) |
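
A minimal sketch of the Option A wire-format hook (`WorkerId: u16` + `CodebookHash: u64`) follows. The struct name `CodebookTag`, the little-endian byte order, and the packed 10-byte layout are assumptions; the source only fixes the two field widths.

```rust
/// Hypothetical packed size: 2 bytes worker id + 8 bytes codebook hash.
const TAG_LEN: usize = 10;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CodebookTag {
    worker_id: u16,
    codebook_hash: u64,
}

impl CodebookTag {
    /// Serialize as little-endian (byte order is an assumption).
    fn to_bytes(self) -> [u8; TAG_LEN] {
        let mut out = [0u8; TAG_LEN];
        out[..2].copy_from_slice(&self.worker_id.to_le_bytes());
        out[2..].copy_from_slice(&self.codebook_hash.to_le_bytes());
        out
    }

    /// Parse from the start of a header buffer; None if it is too short.
    fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() < TAG_LEN {
            return None;
        }
        let worker_id = u16::from_le_bytes([buf[0], buf[1]]);
        let mut hash = [0u8; 8];
        hash.copy_from_slice(&buf[2..TAG_LEN]);
        Some(Self { worker_id, codebook_hash: u64::from_le_bytes(hash) })
    }
}
```

A decoder could compare `codebook_hash` against the hash of its locally loaded codebook and refuse to decode on mismatch, which is one plausible use of the field.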

### 5.3 Streaming flush granularity (R-12)

Per-CTU default. `FlushUnit` 2-bit tag in frame header:
@@ -405,9 +418,48 @@ Citation IDs (R-1..R-13) stable. Canon IDs (M:E-*, M:H-*, M:H-NEW-*, M:T-*, A:E-

---

## 11. The single load-bearing paragraph (§13)
## 11. Formal-correctness layer (R-14) — post-merge addition

The substrate-binding doc (`pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md`) surfaced two formal proofs in `lance-graph::jc` that the codec inherits without re-proving:

| Pillar | Crate / module | What it proves | Status |
|---|---|---|---|
| **Pillar 10** (Pflug-Pichler) | `jc::pflug` | Nested-distance Lipschitz on Sigma DN-trees: CAM-PQ tree quantization preserves FreeEnergy within Lε | Active in default zero-dep build |
| **Pillar 11** (Hambly-Lyons) | `jc::hambly_lyons` | Signature uniqueness on tree-quotient: any path of bounded variation is uniquely determined by its full signature up to tree-like equivalence (Annals 171(1), arXiv:math/0507536) | Active under `--features hambly-lyons` (PR #348, 2026-05-07); probe passes (forward<1e-9, converse>0.05, ratio≥1e6) |

R-4's quality-floor rows for video / KV / gradient inherit Pillar 10's Lipschitz bound. R-15's signature lane gates on Pillar 11.

**Open work (G-4):** PR #350 corrects `sigker::signature_kernel_pde`'s known Goursat-PDE math bug; Pillar 11's probe deliberately uses `signature_truncated` (tensor-algebra) until PR #350 lands. Production-scale benchmarking pending.

---

## 12. Stream-signal codec lane (R-15) — post-merge addition

`SignatureBasis<const DEPTH: usize>: Basis<f32>` is the fifth concrete `Basis<T>` impl, complementing the four lanes in §1's table:

```rust
// New: ndarray::hpc::signature (~1 wk, wraps sigker::signature_truncated)
impl<const DEPTH: usize> Basis<f32> for SignatureBasis<DEPTH> {
    fn dim(&self) -> usize { /* truncated tensor-algebra dim */ }
    fn apply(&self, path: &[f32], signature: &mut [f32]) {
        // iterated-integral truncation via sigker::signature_truncated
    }
    fn invert(&self, _sig: &[f32], _path: &mut [f32]) {
        unimplemented!("path-from-signature is unique only up to tree-like \
                        equivalence per R-14 Pillar 11")
    }
}
```

**Plan G gets a fifth lane: "stream signal"** — audio waveforms / time-series / gesture / handwriting paths. Codec is `SignatureBasis<DEPTH=3>` + standard rANS over the four-mode taxonomy; quality floor inherits from Pillar 11 (R-14); compression target ~10× over raw f32 path samples (calibrate during Plan G).

**Why `signature_truncated` not `signature_kernel_pde`:** the PDE form ships a known divergence bug (PR #350). The tensor-algebra path is correct today and is what Pillar 11 cites.
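
To make the tensor-algebra path concrete, here is a depth-2 sketch for a piecewise-linear path. It is not the `sigker::signature_truncated` API; the function names, the flat row-major point layout, and the choice to exclude the constant level-0 term from the dimension count are all assumptions.

```rust
/// Number of signature terms for levels 1..=depth over `channels` dimensions.
/// (Excluding the constant level-0 term is an assumption about the convention.)
fn truncated_signature_dim(channels: usize, depth: u32) -> usize {
    (1..=depth).map(|k| channels.pow(k)).sum()
}

/// Depth-2 signature of a piecewise-linear path stored as consecutive points,
/// `channels` values per point. Returns (level 1, level 2 in row-major order).
fn signature_depth2(path: &[f32], channels: usize) -> (Vec<f32>, Vec<f32>) {
    assert!(channels > 0, "need at least one channel");
    let mut s1 = vec![0.0f32; channels];
    let mut s2 = vec![0.0f32; channels * channels];
    let points: Vec<&[f32]> = path.chunks_exact(channels).collect();
    for pair in points.windows(2) {
        let delta: Vec<f32> = pair[1].iter().zip(pair[0]).map(|(b, a)| b - a).collect();
        // Chen's identity for appending one linear segment:
        // S2 += S1 (x) delta + 0.5 * delta (x) delta, then S1 += delta.
        for i in 0..channels {
            for j in 0..channels {
                s2[i * channels + j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j];
            }
        }
        for i in 0..channels {
            s1[i] += delta[i];
        }
    }
    (s1, s2)
}
```

A path that goes out along a segment and straight back, for example (0,0) → (1,2) → (0,0), yields all-zero levels here, the same as a path that never moves. That is the tree-like ambiguity that leaves `invert` unimplemented above.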

---

## 13. The single load-bearing paragraph (canon-resolutions §13)

> *The merged canon committed to the right architectural synthesis (M:E-A, M:E-D, M:E-G, M:E-I) but left the load-bearing contracts unsigned. Canon-resolutions commits them: `Basis<T>` + `LinearReduce` are two traits not one (R-1); bit 14 of the leaf header is consumer-typed and bit 15 universal (R-2); generic codec body ≤1500 LoC with ≤200 LoC per consumer (R-3); four threshold pairs gate Plan G's pass criteria (R-4); the trajectory is Plan G (2 wks) → Plan A7 critical path (1.5 wks) → Phase 2 consumers parallel (3 wks); end state is one binary, four loads, ~2 KLoC stack demonstrating M:H-NEW-1 in ~10.5 weeks of wall-clock. Every claim in §9 has a test; Plan G's bench-harness binary is the audit. The falsifiability is the point.*
> *The merged canon committed to the right architectural synthesis (M:E-A, M:E-D, M:E-G, M:E-I) but left the load-bearing contracts unsigned. Canon-resolutions commits them: `Basis<T>` + `LinearReduce` are two traits not one (R-1); bit 14 of the leaf header is consumer-typed and bit 15 universal (R-2); generic codec body ≤1500 LoC with ≤200 LoC per consumer (R-3); four threshold pairs gate Plan G's pass criteria (R-4); the trajectory is Plan G (2 wks) → Plan A7 critical path (1.5 wks) → Phase 2 consumers parallel (3 wks); end state is one binary, four loads, ~2 KLoC stack demonstrating M:H-NEW-1 in ~10.5 weeks of wall-clock. Every claim in §9 has a test; Plan G's bench-harness binary is the audit. The falsifiability is the point. The substrate-binding follow-up (R-14, R-15) adds a formal-correctness layer via `jc` pillars and a fifth stream-signal lane via `SignatureBasis<DEPTH>`.*

---

8 changes: 5 additions & 3 deletions .claude/knowledge/pr-x12-gguf-llm-weights-encoding.md
@@ -131,7 +131,7 @@ Crucially, the residual is **rANS-coded with a Gaussian-tail prior** (R-10). GGU

For weights that are too extreme to fit any basin (the activation outliers that LLM.int8() and SmoothQuant fight over), encode as Escape + raw f16 value. ~3-5% of weights per layer, but they carry disproportionate information.

The PR-X12 wire format already supports Escape as the lossy-fallback path (with the codec body warning per M:T new items). For LLM weights, Escape *must be lossless* — no truncation of outliers. This is an additional R-N candidate.
The PR-X12 wire format already supports Escape as the lossy-fallback path (with the codec body warning per M:T new items). For LLM weights, Escape *must be lossless* — no truncation of outliers. This is an additional R-N candidate; see §10 falsifier **F-4** for the wire-format mechanism (rANS bypass channel in the A8 framing layer) and the HEVC-escape-coefficient precedent.

---

@@ -266,7 +266,9 @@ Per GEMM operation (e.g., compute attn_q @ x for batch):

The CTU bitstream is read forward-only (rANS is a streaming codec) and the decoded weights live in L1/L2 cache just long enough to be GEMM'd. **No full-tensor dequantize buffer needed.** For a 4096 × 4096 attention projection, the dequantize buffer would be 32 MB (f16); PR-X12 streams in ~3-4 MB of bitstream, decodes to ~64 KB cache-resident windows, GEMMs each window, drops it.

**Memory savings:** on a memory-constrained edge device (8 GB RAM), this turns "loads 4 GB model + needs 1 GB dequant scratch" into "loads 3 GB model + needs 64 KB scratch." A 7B model at PR-X12 is genuinely runnable on a phone-class device, where GGUF Q4 is borderline.
**Memory savings (weights only):** on a memory-constrained edge device (8 GB RAM), this turns "loads 4 GB model + needs 1 GB dequant scratch" into "loads 3 GB model + needs 64 KB scratch."

**Phone-class caveat — weights are not the only memory load.** The KV cache scales with context length and is independent of weight compression: for a 7B model at 8K context, KV cache is ~2 GB in fp16 / ~1 GB in int8, and grows linearly with context. PR-X12 weight compression alone takes a 7B from "borderline" to "easier" on phone-class hardware, but **the KV cache lane (Plan D, M:H-3, R-4) is the second lever** that has to compress for full phone-class viability at non-trivial context. Both lanes are needed; this lens only addresses the weights side.
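
The linear growth of the KV cache can be made explicit with the usual sizing formula. The layer count, KV-head count, and head dimension in the usage note are hypothetical Llama-style values, not taken from the source, so the results bracket rather than reproduce the ~2 GB figure.

```rust
/// KV cache size in bytes: one K and one V tensor per layer, each
/// kv_heads * head_dim wide, one row per cached token.
fn kv_cache_bytes(
    layers: u64,
    kv_heads: u64,
    head_dim: u64,
    context_tokens: u64,
    bytes_per_elem: u64,
) -> u64 {
    2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem
}
```

With 32 layers, head dimension 128, and 8192 tokens in fp16, this gives 4 GiB for 32 KV heads and 1 GiB for 8 KV heads (grouped-query attention). Halving `bytes_per_elem` for int8 halves either figure, and doubling the context doubles it.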

**Latency:** the streaming decode happens in the same loop body as the GEMM accumulate. On a modern arch with VNNI + AMX, the decode cost (~5-10 cycles per cell, branchless via R-1's lookup-table pattern) is hidden by GEMM latency. **Estimated overhead: < 5% versus pre-dequantized GEMM.**
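
The decode-in-the-GEMM-loop pattern can be sketched as a windowed matrix-vector product. The `decode_window` callback stands in for the rANS/CTU decoder, which is not shown in this diff; its signature, the row-window granularity, and the `f32` element type are assumptions.

```rust
/// y = W * x where W is never fully materialized: `decode_window(row0, out)`
/// fills `out` with the rows starting at `row0` (row-major), and the scratch
/// buffer is reused for every window.
fn streaming_matvec<F>(
    rows: usize,
    cols: usize,
    window_rows: usize,
    mut decode_window: F,
    x: &[f32],
) -> Vec<f32>
where
    F: FnMut(usize, &mut [f32]),
{
    assert!(window_rows > 0 && cols > 0 && x.len() == cols);
    let mut y = vec![0.0f32; rows];
    // The only weight storage: one window, sized to stay cache-resident.
    let mut scratch = vec![0.0f32; window_rows * cols];
    let mut row0 = 0;
    while row0 < rows {
        let n = window_rows.min(rows - row0);
        decode_window(row0, &mut scratch[..n * cols]);
        for (r, w_row) in scratch[..n * cols].chunks_exact(cols).enumerate() {
            y[row0 + r] = w_row.iter().zip(x).map(|(w, xi)| w * xi).sum();
        }
        row0 += n;
    }
    y
}
```

Peak scratch is `window_rows * cols` elements regardless of `rows`, which is the property the text relies on when it replaces a 32 MB dequantize buffer with a ~64 KB window.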

@@ -345,7 +347,7 @@ Concrete implications:

4. **Do** keep R-13's federated codebook policy. The LLM use case is the strongest motivation: per-model codebooks are 13 MB; without R-13, a hard-coded codebook would not work for arbitrary LLMs.

5. **Reserve** an `EncodingDomain::LLMWeights` discriminant in the codec metadata header (separate from the 16-bit per-CTU header). The codec body doesn't read this — it just stamps the file with a domain tag so decoders know which basin codebook to load.
5. **Reserve** the *enum-discriminant slot* for `EncodingDomain::LLMWeights` in the codec metadata header *now*, even though the actual LLM-lane decoder lands post-PR-X12 (per implication #2). The header reserves a fixed-size domain-tag field (separate from the 16-bit per-CTU header); the LLMWeights value of that field stays unimplemented in PR-X12, but the slot is forward-compatibility-locked so a future PR can add the variant without a wire-format break. The codec body doesn't read this — it stamps the file with a domain tag so decoders know which basin codebook to load.

6. **Bench against AWQ at parity perplexity, not just Q4_K_M.** Q4_K_M is a conservative baseline; AWQ + GPTQ are the actual state of the art. If PR-X12 can match AWQ at smaller storage, the case is strong; if not, ship at "drop-in GGUF replacement" framing only.
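
Implication 5's "reserve the slot now, implement later" idea could look like the sketch below. Only the `EncodingDomain::LLMWeights` name comes from the source; the one-byte tag width, the other variants, their numeric values, and the error type are hypothetical.

```rust
/// Domain tag stamped in the codec metadata header.
/// All numeric values here are placeholders, not the real wire format.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum EncodingDomain {
    Video = 0,
    KvCache = 1,
    Gradient = 2,
    LLMWeights = 3, // slot reserved; decoder lands in a later PR
}

#[derive(Debug, PartialEq, Eq)]
enum DomainError {
    /// The tag is a known, reserved value with no decoder yet.
    Reserved(EncodingDomain),
    /// The tag is not assigned at all.
    Unknown(u8),
}

/// Map a header tag to a domain, rejecting reserved and unassigned values
/// with distinct errors so old decoders fail cleanly on newer files.
fn parse_domain(tag: u8) -> Result<EncodingDomain, DomainError> {
    match tag {
        0 => Ok(EncodingDomain::Video),
        1 => Ok(EncodingDomain::KvCache),
        2 => Ok(EncodingDomain::Gradient),
        3 => Err(DomainError::Reserved(EncodingDomain::LLMWeights)),
        other => Err(DomainError::Unknown(other)),
    }
}
```

Adding the LLM lane later would then mean changing one match arm from `Err(Reserved(..))` to `Ok(..)`, with no change to the header layout.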
