diff --git a/.claude/knowledge/pr-x12-bgz-jc-substrate-synergies.md b/.claude/knowledge/pr-x12-bgz-jc-substrate-synergies.md
index e715d403..313833af 100644
--- a/.claude/knowledge/pr-x12-bgz-jc-substrate-synergies.md
+++ b/.claude/knowledge/pr-x12-bgz-jc-substrate-synergies.md
@@ -303,7 +303,7 @@ This is the doc-level value of PR-X12: bgz code + PR-X12 docs = a complete archi
 ## 5. Gaps — what doesn't exist yet
-### 5.1 `jd-nd` — the missing ndarray-side proof crate
+### 5.1 `jd-nd` — the missing ndarray-side proof crate (Gap **G-1**)
 The Explore search confirmed: `jd-nd` does not exist in `/home/user/ndarray/`. The math-proof infrastructure on the ndarray side lives ad-hoc inside `src/hpc/` modules (`deepnsm.rs`, `jina/runtime.rs`) as TODO comments.
@@ -335,7 +335,7 @@ ndarray/crates/jd-nd/
 **Why now:** R-11's latency CI needs a *correctness* twin. Latency that's fast but wrong is the worst outcome. jd-nd is the structural place for those proofs.
-### 5.2 Cronbach / ICC research crate
+### 5.2 Cronbach / ICC research crate (Gap **G-2**)
 `lance-graph/crates/lance-graph-codec-research/` exists per the Explore agent's report, **but its scope is FFT (rustfft) variants**, not Cronbach's α / ICC / encoding-reliability psychometrics.
diff --git a/.claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md b/.claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md
index 3ba29be3..e1deb77d 100644
--- a/.claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md
+++ b/.claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md
@@ -308,12 +308,14 @@ Updating the inventory from `pr-x12-bgz-jc-substrate-synergies.md` §7 with the
 **Total estimated gap-closing work: 8-12 weeks** across the seven items, all incremental on existing infrastructure. None of them require new research; all are wiring existing primitives into the codec.
-Two prior gaps from the earlier doc remain:
+Two prior gaps from the earlier doc remain (their canonical IDs are owned by `pr-x12-bgz-jc-substrate-synergies.md` §5; cross-referenced here):
-| Gap (prior) | Component | Cost |
+| Gap (cross-ref) | Component | Cost |
 |---|---|---|
-| **G-8** | `jd-nd` crate does not exist (ndarray-side proof crate) | 2-3 weeks skeleton + ongoing |
-| **G-9** | Cronbach/ICC encoding-reliability research crate not implemented | 1-2 weeks skeleton + 2-3 weeks PoC |
+| **bgz-jc G-1** (§5.1) | `jd-nd` crate does not exist (ndarray-side proof crate) | 2-3 weeks skeleton + ongoing |
+| **bgz-jc G-2** (§5.2) | Cronbach/ICC encoding-reliability research crate not implemented | 1-2 weeks skeleton + 2-3 weeks PoC |
+
+The G-1..G-7 IDs in §5 of *this* doc are local to the cam-pq / sigker / dn_tree binding; bgz-jc's G-1 / G-2 are a separate namespace owned by that doc. When citing cross-doc, prefix with the source (e.g., "bgz-jc G-1" vs "cam-pq G-1") to avoid the collision the previous G-8 / G-9 labelling implied.
 **Grand total: ~11-17 weeks** of substrate-binding + gap-closing work, parallel-able. PR-X12 codec body (~1500 LoC per R-3) is independent of this and can ship sooner.
diff --git a/.claude/knowledge/pr-x12-canon-resolutions-delta.md b/.claude/knowledge/pr-x12-canon-resolutions-delta.md
index ad7b923f..fd6795b7 100644
--- a/.claude/knowledge/pr-x12-canon-resolutions-delta.md
+++ b/.claude/knowledge/pr-x12-canon-resolutions-delta.md
@@ -9,15 +9,16 @@
 ## 0. What's actually new
-The merged canon (`bc9da4ad`) argued the architecture; canon-resolutions makes it falsifiable. Five categories of novel content survive the delta filter:
+The merged canon (`bc9da4ad`) argued the architecture; canon-resolutions makes it falsifiable. Six categories of novel content survive the delta filter:
 1. **Concrete trait signatures** — R-1 (`Basis` + `LinearReduce` split), §8 surface (`PredictiveSignal`, `CurveOrder`, `RdoMetric`)
 2.
**Quantified budgets** — R-3 LoC envelope per sub-card / per consumer + audit rule; R-4 four Plan G thresholds; R-11 4K@60fps latency budget
-3. **Math identities** — R-6 SSD-via-VNNI (`||A||² - 2A·B + ||B||²`), R-7 tropical-GEMM partition (`O(4^d) → O(d²)`)
+3. **Math identities** — R-6 SSD-via-VNNI (`||A||² - 2A·B + ||B||²`), R-7 tropical-GEMM partition (`O(4^d) → O(d²)`, kernel at `bgz17::scalar_sparse::tropical_spmv`)
 4. **Type-level invariants** — R-2 bit-15/bit-14 split, R-9 topology-FREE codec
-5. **Phasing patterns** — R-8 confidence-gate framing, R-13 Option-A-then-B for federated codebook
+5. **Phasing patterns** — R-8 confidence-gate framing, R-13 Option-A-then-B for federated codebook (primitives: `cam_pq` + `bgz-hhtl-d` + `dn_tree` + `merkle_tree`)
+6. **Formal-correctness + stream lane (post-merge)** — R-14 (`jc::pflug` Pillar 10 + `jc::hambly_lyons` Pillar 11), R-15 (`SignatureBasis` as fifth Plan G lane)
-Plus the synthesis layer: §9 falsifiability matrix (24 rows), §10 sequencing with named gates, §12 compaction-preservation contract.
+Plus the synthesis layer: §9 falsifiability matrix (24+3 rows including R-14/R-15), §10 sequencing with named gates, §12 compaction-preservation contract.
 ---
@@ -216,7 +217,9 @@ Tropical-semiring (+, min) formulation:
 At 4K 132K CTUs/frame: ~4 ms vs ~64 ms just for partition RDO. At 60 fps, the difference between fitting and missing budget.
-**Dep direction:** `ndarray-codec → lance-graph::blasgraph` (tropical-GEMM kernels live in blasgraph). Allowed post-Plan-H because ndarray-codec is a sibling crate, not the bottom.
+**Dep direction:** `ndarray-codec → lance-graph::blasgraph` (tropical-GEMM kernels nominally live in blasgraph). Allowed post-Plan-H because ndarray-codec is a sibling crate, not the bottom.
+
+**Actual kernel home (current):** `lance-graph::bgz17::scalar_sparse::tropical_spmv`. The `blasgraph` namespace is the eventual abstraction; until that lands, ndarray-codec depends on bgz17 directly.
Cite the symbol when wiring A6, not the namespace.
 **Plan A6 (1 week) ships this.** λ-RDO knob scales edge weights; tropical-GEMM relaxation computes optimal mode tree.
@@ -292,6 +295,16 @@ Pattern: ship simplest-that-works, measure, escalate. Don't pick best-in-theory
 Wire-format hook for Option A: `WorkerId: u16` + `CodebookHash: u64` in frame header.
+**Implementation primitives** (already exist; PR-X12 only adds the wire format + `CodebookHandle` trait):
+
+| Concern | Crate / module |
+|---|---|
+| Codebook training (k-means + CAM-PQ) | `ndarray::hpc::cam_pq::CamCodebook` |
+| Deployed encoding format | `lance-graph::bgz-tensor::Codebook4096` / `bgz-hhtl-d` |
+| Online plastic updates (SharedClusterWide) | `ndarray::hpc::dn_tree` |
+| Integrity proof (Blake3-48 Merkle root, xor_diff) | `ndarray::hpc::merkle_tree` |
+| Gossip protocol | `q2` (external) |
+
 ### 5.3 Streaming flush granularity (R-12)
 Per-CTU default. `FlushUnit` 2-bit tag in frame header:
@@ -405,9 +418,48 @@ Citation IDs (R-1..R-13) stable. Canon IDs (M:E-*, M:H-*, M:H-NEW-*, M:T-*, A:E-
 ---
-## 11. The single load-bearing paragraph (§13)
+## 11.
Formal-correctness layer (R-14) — post-merge addition
+
+The substrate-binding doc (`pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md`) surfaced two formal proofs in `lance-graph::jc` that the codec inherits without re-proving:
+
+| Pillar | Crate / module | What it proves | Status |
+|---|---|---|---|
+| **Pillar 10** (Pflug-Pichler) | `jc::pflug` | Nested-distance Lipschitz on Sigma DN-trees: CAM-PQ tree quantization preserves FreeEnergy within Lε | Active in default zero-dep build |
+| **Pillar 11** (Hambly-Lyons) | `jc::hambly_lyons` | Signature uniqueness on tree-quotient: any path of bounded variation is uniquely determined by its truncated signature up to tree-like equivalence (Annals 171(1), arXiv:math/0507536) | Active under `--features hambly-lyons` (PR #348, 2026-05-07); probe passes (forward<1e-9, converse>0.05, ratio≥1e6) |
+
+R-4's quality-floor rows for video / KV / gradient inherit Pillar 10's Lipschitz bound. R-15's signature lane gates on Pillar 11.
+
+**Open work (G-4):** PR #350 corrects `sigker::signature_kernel_pde`'s known Goursat-PDE math bug; Pillar 11's probe deliberately uses `signature_truncated` (tensor-algebra) until PR #350 lands. Production-scale benchmarking pending.
+
+---
+
+## 12. Stream-signal codec lane (R-15) — post-merge addition
+
+`SignatureBasis: Basis` is the fifth concrete `Basis` impl, complementing the four lanes in §1's table:
+
+```rust
+// New: ndarray::hpc::signature (~1 wk, wraps sigker::signature_truncated)
+impl Basis for SignatureBasis {
+    fn dim(&self) -> usize { /* truncated tensor-algebra dim */ }
+    fn apply(&self, path: &[f32], signature: &mut [f32]) {
+        // iterated-integral truncation via sigker::signature_truncated
+    }
+    fn invert(&self, _sig: &[f32], _path: &mut [f32]) {
+        unimplemented!("path-from-signature is unique only up to tree-like \
+                        equivalence per R-14 Pillar 11")
+    }
+}
+```
+
+**Plan G gets a fifth lane: "stream signal"** — audio waveforms / time-series / gesture / handwriting paths.
Codec is `SignatureBasis` + standard rANS over the four-mode taxonomy; quality floor inherits from Pillar 11 (R-14); compression target ~10× over raw f32 path samples (calibrate during Plan G).
+
+**Why `signature_truncated` not `signature_kernel_pde`:** the PDE form ships a known divergence bug (PR #350). The tensor-algebra path is correct today and is what Pillar 11 cites.
+
+---
+
+## 13. The single load-bearing paragraph (canon-resolutions §13)
-> *The merged canon committed to the right architectural synthesis (M:E-A, M:E-D, M:E-G, M:E-I) but left the load-bearing contracts unsigned. Canon-resolutions commits them: `Basis` + `LinearReduce` are two traits not one (R-1); bit 14 of the leaf header is consumer-typed and bit 15 universal (R-2); generic codec body ≤1500 LoC with ≤200 LoC per consumer (R-3); four threshold pairs gate Plan G's pass criteria (R-4); the trajectory is Plan G (2 wks) → Plan A7 critical path (1.5 wks) → Phase 2 consumers parallel (3 wks); end state is one binary, four loads, ~2 KLoC stack demonstrating M:H-NEW-1 in ~10.5 weeks of wall-clock. Every claim in §9 has a test; Plan G's bench-harness binary is the audit. The falsifiability is the point.*
+> *The merged canon committed to the right architectural synthesis (M:E-A, M:E-D, M:E-G, M:E-I) but left the load-bearing contracts unsigned. Canon-resolutions commits them: `Basis` + `LinearReduce` are two traits not one (R-1); bit 14 of the leaf header is consumer-typed and bit 15 universal (R-2); generic codec body ≤1500 LoC with ≤200 LoC per consumer (R-3); four threshold pairs gate Plan G's pass criteria (R-4); the trajectory is Plan G (2 wks) → Plan A7 critical path (1.5 wks) → Phase 2 consumers parallel (3 wks); end state is one binary, four loads, ~2 KLoC stack demonstrating M:H-NEW-1 in ~10.5 weeks of wall-clock. Every claim in §9 has a test; Plan G's bench-harness binary is the audit. The falsifiability is the point.
The substrate-binding follow-up (R-14, R-15) adds a formal-correctness layer via `jc` pillars and a fifth stream-signal lane via `SignatureBasis`.*
 ---
diff --git a/.claude/knowledge/pr-x12-gguf-llm-weights-encoding.md b/.claude/knowledge/pr-x12-gguf-llm-weights-encoding.md
index eda384c5..e1fb0c91 100644
--- a/.claude/knowledge/pr-x12-gguf-llm-weights-encoding.md
+++ b/.claude/knowledge/pr-x12-gguf-llm-weights-encoding.md
@@ -131,7 +131,7 @@ Crucially, the residual is **rANS-coded with a Gaussian-tail prior** (R-10). GGU
 For weights that are too extreme to fit any basin (the activation outliers that LLM.int8() and SmoothQuant fight over), encode as Escape + raw f16 value. ~3-5% of weights per layer, but they carry disproportionate information.
-The PR-X12 wire format already supports Escape as the lossy-fallback path (with the codec body warning per M:T new items). For LLM weights, Escape *must be lossless* — no truncation of outliers. This is an additional R-N candidate.
+The PR-X12 wire format already supports Escape as the lossy-fallback path (with the codec body warning per M:T new items). For LLM weights, Escape *must be lossless* — no truncation of outliers. This is an additional R-N candidate; see §10 falsifier **F-4** for the wire-format mechanism (rANS bypass channel in the A8 framing layer) and the HEVC-escape-coefficient precedent.
 ---
@@ -266,7 +266,9 @@ Per GEMM operation (e.g., compute attn_q @ x for batch):
 The CTU bitstream is read forward-only (rANS is a streaming codec) and the decoded weights live in L1/L2 cache just long enough to be GEMM'd. **No full-tensor dequantize buffer needed.** For a 4096 × 4096 attention projection, the dequantize buffer would be 32 MB (f16); PR-X12 streams in ~3-4 MB of bitstream, decodes to ~64 KB cache-resident windows, GEMMs each window, drops it.
-**Memory savings:** on a memory-constrained edge device (8 GB RAM), this turns "loads 4 GB model + needs 1 GB dequant scratch" into "loads 3 GB model + needs 64 KB scratch." A 7B model at PR-X12 is genuinely runnable on a phone-class device, where GGUF Q4 is borderline.
+**Memory savings (weights only):** on a memory-constrained edge device (8 GB RAM), this turns "loads 4 GB model + needs 1 GB dequant scratch" into "loads 3 GB model + needs 64 KB scratch."
+
+**Phone-class caveat — weights are not the only memory load.** The KV cache scales with context length and is independent of weight compression: for a 7B model at 8K context, KV cache is ~2 GB in fp16 / ~1 GB in int8, and grows linearly with context. PR-X12 weight compression alone takes a 7B from "borderline" to "easier" on phone-class hardware, but **the KV cache lane (Plan D, M:H-3, R-4) is the second lever** that has to compress for full phone-class viability at non-trivial context. Both lanes are needed; this lens only addresses the weights side.
 **Latency:** the streaming decode happens in the same loop body as the GEMM accumulate. On a modern arch with VNNI + AMX, the decode cost (~5-10 cycles per cell, branchless via R-1's lookup-table pattern) is hidden by GEMM latency. **Estimated overhead: < 5% versus pre-dequantized GEMM.**
@@ -345,7 +347,7 @@ Concrete implications:
 4. **Do** keep R-13's federated codebook policy. The LLM use case is the strongest motivation: per-model codebooks are 13 MB; without R-13, a hard-coded codebook would not work for arbitrary LLMs.
-5. **Reserve** an `EncodingDomain::LLMWeights` discriminant in the codec metadata header (separate from the 16-bit per-CTU header). The codec body doesn't read this — it just stamps the file with a domain tag so decoders know which basin codebook to load.
+5.
**Reserve** the *enum-discriminant slot* for `EncodingDomain::LLMWeights` in the codec metadata header *now*, even though the actual LLM-lane decoder lands post-PR-X12 (per implication #2). The header reserves a fixed-size domain-tag field (separate from the 16-bit per-CTU header); the LLMWeights value of that field stays unimplemented in PR-X12, but the slot is forward-compatibility-locked so a future PR can add the variant without a wire-format break. The codec body doesn't read this — it stamps the file with a domain tag so decoders know which basin codebook to load.
 6. **Bench against AWQ at parity perplexity, not just Q4_K_M.** Q4_K_M is a conservative baseline; AWQ + GPTQ are the actual state of the art. If PR-X12 can match AWQ at smaller storage, the case is strong; if not, ship at "drop-in GGUF replacement" framing only.
diff --git a/.claude/knowledge/pr-x12-substrate-canon-resolutions.md b/.claude/knowledge/pr-x12-substrate-canon-resolutions.md
index 5bd633ba..26e99042 100644
--- a/.claude/knowledge/pr-x12-substrate-canon-resolutions.md
+++ b/.claude/knowledge/pr-x12-substrate-canon-resolutions.md
@@ -24,8 +24,11 @@ were raised in review:
   (R-5 through R-7 restorations)
 - **§6** — three pieces of detail from session B the merge underrepresented
   (R-8 through R-10 restorations)
-- **§7** — three commitments missing from both originals and from the
-  merge (R-11 through R-13 new specs)
+- **§7** — five commitments missing from both originals and from the
+  merge: R-11 through R-13 (latency, flush granularity, federated
+  codebook) plus R-14 (formal correctness via `jc` pillars) and R-15
+  (`SignatureBasis` as fifth Plan G lane), the latter two
+  surfaced post-merge by the substrate-binding docs
 Then five integration pieces that make the resolutions actionable:
@@ -36,9 +39,10 @@ Then five integration pieces that make the resolutions actionable:
 - **§11** — end-state + trajectory (think it from the end)
 - **§12** — compaction-preservation contract
-Citation
IDs: `R-1` through `R-13` for resolutions. Canon IDs (`M:E-*`,
-`A:E-*`, `B:E-*`, `M:H-*`, `M:T-*`) remain stable; this doc adds, does
-not renumber.
+Citation IDs: `R-1` through `R-15` for resolutions (R-14, R-15
+appended post-merge from the substrate-binding doc; numbering remains
+append-only). Canon IDs (`M:E-*`, `A:E-*`, `B:E-*`, `M:H-*`, `M:T-*`)
+remain stable; this doc adds, does not renumber.
 Sister docs (read order):
@@ -543,6 +547,14 @@ ships tropical-GEMM kernels. No new code in ndarray; cross-repo dep from
 ndarray-codec → lance-graph::blasgraph (after Plan H extraction, this is
 dep-allowed because ndarray-codec is a sibling, not the bottom).
+**Actual kernel home (current).** The tropical-GEMM kernel lives today
+at `lance-graph::bgz17::scalar_sparse::tropical_spmv` — NOT in an
+abstract `blasgraph` namespace. The codec's tropical-GEMM call is
+`bgz17::scalar_sparse::tropical_spmv(edge_weights, dag)`. The
+`lance-graph::blasgraph` name above is the eventual abstraction layer
+(post-Plan-H extraction); until that lands, ndarray-codec depends on
+bgz17 directly. Cite the symbol, not the namespace, when wiring A6.
+
 **Plan A6 RDO (1 week) ships this.** The λ-RDO knob (per A:§10.3) and
 the tropical-GEMM partition solver are the same kernel: λ scales the
 edge weights, the relaxation computes the optimal mode tree.
@@ -935,10 +947,135 @@ empirically; v3 (research-grade) tries Option C.
 R-4 gradient threshold (8× compression at <0.5% loss delta). At that
 point, Plan F v1 escalates to Option B in a follow-up PR.
+**Implementation primitives (current substrate, no new code required):**
+
+| Concern | Crate / module |
+|---------|----------------|
+| Codebook training (k-means + CAM-PQ) | `ndarray::hpc::cam_pq::CamCodebook` (`train_geometric` / `train_semantic` / `train_hybrid`) |
+| Deployed encoding format (per-shard) | `lance-graph::bgz-tensor::Codebook4096` and the `bgz-hhtl-d` shared-palette variant |
+| Online plastic updates (`SharedClusterWide`) | `ndarray::hpc::dn_tree` (quaternary plastic memory, partial-Hamming descent) |
+| Integrity proof for distributed updates | `ndarray::hpc::merkle_tree` (Blake3-48-bit, 1 KB root, `xor_diff` panCAKES compression) |
+| Gossip protocol (cluster-wide) | `q2` (external — implements the wire protocol) |
+
+The four policy modes (`LocalEphemeral` / `SharedClusterWide` /
+`SharedRegional` / `PretrainedStatic`) compose these primitives
+differently; the codec body exposes a `CodebookHandle` trait, and the
+primitives plug in via that trait. **PR-X12 contributes the wire format
++ trait + Option A; the primitives above already exist.**
+
 **Cite as R-13 in Plan F PR description.**
 ---
+### R-14 — Formal correctness via `lance-graph::jc` pillars
+
+**Problem.** Canon and resolutions describe the codec's empirical
+behaviour (R-4 thresholds, R-11 latency) but never name the formal
+correctness proofs the substrate already carries. Without a citation,
+"the codec is correct" is unverifiable; with citations, the codec
+inherits machine-checked guarantees from existing crates.
+
+**Resolution.** Pin both pillars and what each proves.
+
+**Two formal proofs in `lance-graph::jc`:**
+
+- **Quantization correctness (Pillar 10, Pflug-Pichler):**
+  nested-distance Lipschitz on Sigma DN-trees. Proves that CAM-PQ tree
+  quantization preserves the FreeEnergy functional within a Lipschitz
+  factor Lε. **This is the proof PR-X12 cites for "wire-format
+  quantization is faithful."** Implementation: `jc::pflug` (active in
+  default build, zero-dep).
+- **Path-signature correctness (Pillar 11, Hambly-Lyons):**
+  signature uniqueness on tree-quotient. Proves that any path of
+  bounded variation is uniquely determined by its truncated signature
+  up to tree-like equivalence (Annals of Mathematics 171(1):109–167,
+  arXiv:math/0507536). **This is the proof PR-X12 cites for the
+  `SignatureBasis` lane (R-15).** Implementation:
+  `jc::hambly_lyons` (active under `--features hambly-lyons`, since
+  PR #348 landed on 2026-05-07).
+
+**What the codec inherits.** Both pillars exist; the codec cites them
+and does not reprove. R-4's "Quality floor" rows for video / KV /
+gradient inherit Pillar 10's Lipschitz bound automatically. R-15's
+signature-lane gates on Pillar 11.
+
+**Status.**
+
+- Pillar 10: active in default zero-dep build.
+- Pillar 11: active under `--features hambly-lyons`; passes its probe
+  (forward < 1e-9, converse > 0.05, discrimination ratio ≥ 1e6 over
+  N=100 random pairs in d=3 at depth-2).
+- Production-scale benchmarking + PR #350 (`signature_kernel_pde`
+  Goursat-PDE math correction) remain open — see Gap G-4 in
+  `pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md`. Pillar 11's
+  probe deliberately uses `signature_truncated` (tensor-algebra path),
+  not the buggy PDE form.
+
+**Falsifies if.** Pillar 10 ever flips state (a regression in the
+Pflug-Pichler proof bound) — Plan G's video / KV / gradient quality
+floors lose their formal underwriting and become empirical-only.
+
+**Cite as R-14 in any PR claiming "codec output is faithful to
+input" or wiring `SignatureBasis` (R-15).**
+
+---
+
+### R-15 — `SignatureBasis` as `Basis` impl
+
+**Problem.** R-1 commits the `Basis` shape; the canon lists three
+concrete impls (`DctIIBasis` for video, `EwaSplatBasis` for 3DGS,
+`ShSpectralBasis` for splat SH). No `Basis` impl targets
+*streams* — audio waveforms, time-series, gesture/handwriting paths.
+Plan G has only four lanes; path-structured signals are unaddressed.
+
+**Resolution.** Commit `SignatureBasis: Basis`
+as the fifth concrete impl, wrapping the path-signature kernel from
+the external `lance-graph::sigker` crate.
+
+```rust
+// Concrete impl, lives in ndarray::hpc::signature (new module, ~1 wk)
+impl Basis for SignatureBasis {
+    fn dim(&self) -> usize { /* truncated tensor-algebra dim at DEPTH */ }
+    fn apply(&self, path: &[f32], signature: &mut [f32]) {
+        // iterated-integral truncation against sigker::signature_truncated
+    }
+    fn invert(&self, _sig: &[f32], _path: &mut [f32]) {
+        // signature → path is many-to-one (tree-quotient); document as N/A
+        unimplemented!("signature inversion is N/A — path unique only up to \
+                        tree-like equivalence per R-14 / Pillar 11")
+    }
+}
+```
+
+**Why `signature_truncated` and not `signature_kernel_pde`.** The
+PDE form in sigker ships a known math bug (PR #350: Goursat-PDE form
+diverges from the true kernel `I₀(2·√⟨u, v⟩)` at moderate inner
+products). The tensor-algebra path (`signature_truncated`) is correct
+today and is what jc Pillar 11 cites. R-15 wraps the truncated path;
+the PDE form becomes available after PR #350 lands.
+
+**Plan G gets a fifth lane.** "Stream signal" mode:
+
+- Input: audio waveform / time-series / gesture stream
+- Codec: `SignatureBasis` truncates path signature, residuals
+  go through standard rANS via the four-mode taxonomy
+- Quality floor: signature-uniqueness preservation per Pillar 11
+- Compression target: ~10× over raw f32 path samples (estimate;
+  calibrate during Plan G)
+
+**Falsifies if.** `SignatureBasis` plus rANS fails to
+reconstruct the path within ε under Pillar 11's discrimination ratio.
+At that point, raise DEPTH or fall back to per-block DCT-II for the
+stream lane.
+
+**Cost.** ~1 week wrapper around `sigker::signature_truncated` +
+basis-trait plumbing + Plan G fifth-lane wiring.
+
+**Cite as R-15 in any PR adding a stream-signal codec lane or
+wiring `SignatureBasis`.**
+
+---
+
 ## 8.
The canonical contracts — concrete trait signatures
 All three plug-points (per M:E-E) get concrete signatures here. These
@@ -1108,6 +1245,9 @@ that decides whether each holy-grail claim is demonstrated.
 | R-11 (4K 60fps SIMD-batched) | this doc | Plan G video latency assert | Per-CTU encode time | ≤210 ns/CTU on Sapphire Rapids |
 | R-12 (per-CTU flush) | this doc | A8 frame-header parse + decode | First-CTU latency | First CTU decodable before frame complete |
 | R-13 (Option A per-shard) | this doc | Plan F on BERT-glue | 8× compression + accuracy | Holds; else escalate to Option B |
+| R-14 (Pillar 10 active) | this doc | `cargo test -p jc` (default features) | Pflug-Pichler Lipschitz bound | Pillar 10 probe green |
+| R-14 (Pillar 11 active) | this doc | `cargo test -p jc --features hambly-lyons` | Signature uniqueness probe | forward < 1e-9, converse > 0.05, ratio ≥ 1e6 |
+| R-15 (SignatureBasis lane) | this doc | Plan G stream-signal lane | signature-space discrimination under Pillar 11 (forward-only — path inversion is N/A per R-15) | forward < 1e-9, converse > 0.05, ratio ≥ 1e6 (or agreed DEPTH-specific floor) |
 **Every row of this matrix is a test.** Plan G's bench harness binary
 emits a JSON report containing the actual measurement for each row;
@@ -1234,7 +1374,7 @@ When this doc is summarised across context windows, preserve at minimum:
 1. **The five "merged well" items** from §3 (M:E-A / M:E-D / M:E-G /
    M:E-I / M:E-F) — these are the canon's load-bearing synthesis.
-2. **The thirteen resolutions** R-1 through R-13 with their summary
+2.
**The fifteen resolutions** R-1 through R-15 with their summary
    commitments:
    - R-1: `LinearReduce` two-trait shape
    - R-2: bit 15 universal, bit 14 consumer-typed
@@ -1242,13 +1382,19 @@ When this doc is summarised across context windows, preserve at minimum:
    - R-4: 4 threshold pairs (video, splat, kv, grad)
    - R-5: DCT crossover ~64 blocks
    - R-6: SSD via VNNI ≥30× over SAD
-   - R-7: tropical-GEMM partition O(4^d) → O(d²)
+   - R-7: tropical-GEMM partition O(4^d) → O(d²) via
+     `bgz17::scalar_sparse::tropical_spmv`
    - R-8: Plan G is confidence gate
    - R-9: topology-FREE codec layer
    - R-10: ~4 bits/Gaussian near target, ~1 bit stretch
    - R-11: 210 ns/CTU SIMD-batched encode
    - R-12: per-CTU flush default; per-bucket Plan F
-   - R-13: Option A (per-shard codebook) for Plan F v1
+   - R-13: Option A (per-shard codebook) for Plan F v1; primitives are
+     `cam_pq` + `bgz-hhtl-d` + `dn_tree` + `merkle_tree`
+   - R-14: formal correctness via `jc::pflug` (Pillar 10) +
+     `jc::hambly_lyons` (Pillar 11, feature-gated)
+   - R-15: `SignatureBasis: Basis` as fifth Plan G lane
+     (stream signal)
 3. **The trajectory** from §2 — Phase 0 → A7 → parallelise → Phase 2
 4. **The five-category architecture** including `ndarray-codec`
 5. **The four traits** as the canonical contracts:
@@ -1258,7 +1404,7 @@ When this doc is summarised across context windows, preserve at minimum:
 7. **The falsifiability matrix in §9** — every claim has a test; not
    every claim will pass; that's the design
-**Citation IDs in this doc** (R-1 .. R-13) are stable. Canon IDs
+**Citation IDs in this doc** (R-1 .. R-15) are stable. Canon IDs
 (M:E-*, M:H-*, M:H-NEW-*, M:T-*, A:E-*, A:H-*, A:T-*, B:E-*, B:HG-*,
 B:D-*) remain stable per canon's §10. Append, never renumber.
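Reviewer aside on the R-6 entry in the preservation list above: the SSD-via-VNNI claim rests on the identity SSD(A, B) = ||A||² - 2A·B + ||B||², which lets a sum of squared differences be built from dot products. The sketch below only checks that identity in plain scalar Rust. The names `ssd_direct`, `dot`, and `ssd_via_dot` are hypothetical and chosen for illustration; they are not symbols from `ndarray` or `bgz17`, and no VNNI intrinsics are used.

```rust
/// Direct sum of squared differences over two equal-length i8 blocks.
/// (Hypothetical helper, not part of any crate named in the diff.)
fn ssd_direct(a: &[i8], b: &[i8]) -> i64 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as i64 - y as i64;
            d * d
        })
        .sum()
}

/// Dot product widened to i64. This is the operation a dot-product
/// instruction would accelerate; here it is a plain scalar loop.
fn dot(a: &[i8], b: &[i8]) -> i64 {
    a.iter().zip(b).map(|(&x, &y)| x as i64 * y as i64).sum()
}

/// SSD computed through the identity ||A||^2 - 2 A.B + ||B||^2.
fn ssd_via_dot(a: &[i8], b: &[i8]) -> i64 {
    assert_eq!(a.len(), b.len());
    dot(a, a) - 2 * dot(a, b) + dot(b, b)
}

fn main() {
    let a: [i8; 4] = [1, -2, 3, 127];
    let b: [i8; 4] = [4, 5, -6, -128];
    // Both routes must agree, including at the i8 extremes.
    assert_eq!(ssd_direct(&a, &b), ssd_via_dot(&a, &b));
    println!("ssd = {}", ssd_via_dot(&a, &b));
}
```

In a real encoder the two norm terms can be computed once per block and reused across candidates, leaving one dot product per comparison; whether that yields the ≥30× figure cited for R-6 is what the Plan G bench would have to show.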
diff --git a/.claude/knowledge/pr-x12-woa-multiarch-orchestration.md b/.claude/knowledge/pr-x12-woa-multiarch-orchestration.md
index 0da19ed7..e3f53e81 100644
--- a/.claude/knowledge/pr-x12-woa-multiarch-orchestration.md
+++ b/.claude/knowledge/pr-x12-woa-multiarch-orchestration.md
@@ -1,15 +1,15 @@
 # PR-X12 — WoA Orchestration & Multi-Arch Dispatch Lens
 > Date: 2026-05-22
-> Status: **perspective doc** — examines how the orchestration crates (`woa-rs`, `woa`, `q2`, `surrealdb`, `MedCare-rs`, `smb-office-rs`) consume the PR-X12 substrate, and how PR-X12's per-arch dispatch decisions (R-4, R-5, R-11) generalise to the entire HPC stack.
+> Status: **perspective doc** — examines how the orchestration crates (`woa-rs`, `woa`, `q2`, `surrealdb`, `MedCare-rs`, `smb-office-rs`) consume the PR-X12 substrate, and how PR-X12's per-arch polyfill decisions (R-4, R-5, R-11) generalise to the entire HPC stack.
 >
-> Premise: PR-X12 is not just a codec project. It's the **per-arch dispatch contract** that every consumer above `ndarray` will inherit. The codec is the first non-trivial test of whether that contract holds.
+> Premise: PR-X12 is not just a codec project. It's the **per-arch polyfill contract** that every consumer above `ndarray` will inherit. The codec is the first non-trivial test of whether that contract holds.
 ---
 ## 0. Thesis
-**Every consumer crate dispatches kernels across {Intel SPR, AMD Zen 4-5, ARM Graviton 3-4, Apple Silicon, NVIDIA Hopper-Blackwell} via the same `ndarray::hpc` capability traits.** PR-X12's per-arch DCT crossover (R-5) and latency assertion (R-11) aren't codec-specific — they're the canonical shape of how any consumer crate gates fast-paths. If the codec's per-arch story is wrong, the entire HPC consumer ecosystem inherits the bug.
+**Every consumer crate calls the same `ndarray::simd::*` / `ndarray::hpc::*` polyfill surface, regardless of which arch the binary was built for.** The polyfill is a per-arch swap underneath, selected by `cfg(target_feature = ...)` at compile time (per §3 and the W1a contract). PR-X12's per-arch DCT crossover (R-5) and latency assertion (R-11) aren't codec-specific — they're the canonical shape of how any consumer crate's per-arch story bottoms out at the polyfill. If the codec's per-arch story is wrong, the entire HPC consumer ecosystem inherits the bug.
 ---
@@ -23,18 +23,18 @@ In a real deployment, a `woa-rs` agent processing a request might:
 4. Update node-local cache (`surrealdb`)
 5. Emit response stream (codec again)
-Steps 1, 2, 3, 5 all hit the `ndarray::hpc` BLAS layer. Each step has a per-arch fast-path: SPR uses AMX, Zen 4 uses VNNI+AVX-512, Graviton 3 uses SVE2, Apple uses NEON/AMX, Hopper uses tensor cores. **None of the consumer crates know which fast-path is active.** They call `blas_level2::batched_gemm` and the substrate dispatches.
+Steps 1, 2, 3, 5 all bottom out at `ndarray::simd::*` and `ndarray::hpc::*`. Each is a polyfill consumer — they call e.g. `blas_level2::batched_gemm` and get whatever backend the binary was compiled with. **None of the consumer crates know which backend is active**, and they MUST NOT: backend-specific symbols (AMX bytecode, AVX-512 asm, NEON intrinsics, SVE2 predicates) live exclusively inside `src/simd_.rs` and never reach a consumer's source. The fleet ships per-arch binaries (§3.2); each binary embeds one backend file via cfg.
-This is what makes PR-X12's R-4 / R-11 architecture-conditional bench gates *substrate policy*, not codec policy. R-4 says "Plan G clears at most on 1 of: SPR / Zen 4 / Graviton 3 / Apple M-class," and R-11 adds latency assertions. That same gate structure applies to:
+This is what makes PR-X12's R-4 / R-11 architecture-conditional bench gates *substrate policy*, not codec policy.
R-4 says "Plan G clears on each of: SPR / Zen 4 / Graviton 3 / Apple M-class" (per-arch CI matrix), and R-11 adds per-arch latency assertions. That same gate structure applies to:
-- `burn` model serving (forward pass per arch)
-- `candle` quantized inference (q4/q8 per arch)
-- `lance-graph::blasgraph` graph queries (tropical-GEMM per arch)
-- `surrealdb` HNSW search (vector dist per arch)
-- `MedCare-rs` DICOM transform (DCT + wavelet per arch)
-- `smb-office-rs` OCR + layout (conv + attention per arch)
+- `burn` model serving (forward pass: same Rust, per-arch binary)
+- `candle` quantized inference (q4/q8: same Rust, per-arch binary)
+- `lance-graph::blasgraph` graph queries (tropical-GEMM: same Rust, per-arch binary)
+- `surrealdb` HNSW search (vector dist: same Rust, per-arch binary)
+- `MedCare-rs` DICOM transform (DCT + wavelet: same Rust, per-arch binary)
+- `smb-office-rs` OCR + layout (conv + attention: same Rust, per-arch binary)
-Every one of these inherits the dispatch contract. PR-X12 is the first to make it visible.
+Every one of these inherits the polyfill contract: identical consumer-facing Rust, one cfg-selected backend per build. PR-X12 is the first to make the parity-test obligation visible.
 ---
@@ -53,105 +53,121 @@ Every one of these inherits the dispatch contract.
PR-X12 is the first to make i │ surrealdb, MedCare-rs, smb-office-rs │ │ (Each: ~1-5K LoC of generic code + traits) │ └────────────────────┬───────────────────────────────┘ - │ capability traits, target_feature + │ same Rust API on every arch ▼ ┌────────────────────────────────────────────────────┐ -│ ndarray::hpc (the dispatch substrate) │ +│ ndarray::hpc + ndarray::simd (polyfill substrate) │ │ blas_level{1,2,3}, fft, cam_pq, activations, │ │ simd_int_ops, bf16_tile_gemm │ │ (~15K LoC; PR-X12 ratchets at this layer) │ └────────────────────┬───────────────────────────────┘ - │ per-arch SIMD intrinsics + │ cfg(target_feature = …) picks ONE ▼ ┌────────────────────────────────────────────────────┐ -│ Hardware: SPR / Zen / Graviton / Apple / Hopper │ -└────────────────────────────────────────────────────┘ +│ Backend file (one per binary): │ +│ simd_avx512.rs → asm/intrinsics + AMX bytecode │ +│ simd_neon.rs → NEON / SVE2 intrinsics │ +│ simd_scalar.rs → portable fallback │ +└────────────────────┬───────────────────────────────┘ + ▼ + Hardware: SPR / Zen / Graviton / Apple ``` -**WoA never touches `target_feature` directly.** Its job is async scheduling, transport (Q2 over QUIC), persistence (surrealdb), and policy. The SIMD dispatch happens one layer below, in the consumer crates calling `ndarray::hpc`. +**WoA never touches `target_feature` directly.** Its job is async task scheduling, transport (Q2 over QUIC), persistence (surrealdb), and policy. Per-arch SIMD code lives exclusively inside the backend file (`simd_.rs`); the polyfill above swaps which file is compiled in via cfg. -This separation is what makes R-3's LoC envelope (≤1500 LoC codec body) tractable. The codec crate doesn't dispatch — it calls the substrate. WoA doesn't dispatch — it calls the codec, which calls the substrate. Per-arch code lives once, in `ndarray::hpc`. +This separation is what makes R-3's LoC envelope (≤1500 LoC codec body) tractable. 
The codec crate doesn't choose a backend — it calls the polyfill. WoA doesn't choose a backend — it calls the codec, which calls the polyfill. Per-arch code lives once, inside `src/simd_.rs`, behind the polyfill surface. --- -## 3. Per-arch dispatch as a substrate property +## 3. Per-arch substrate via compile-time polyfill -The PR-X12 substrate (per merged canon §M:E-G, §M:E-H, R-4, R-5, R-11) implements per-arch dispatch via three mechanisms: +The PR-X12 substrate follows the project's W1a consumer contract (see `CLAUDE.md` and `.claude/knowledge/vertical-simd-consumer-contract.md`): **all dispatch is polyfill**. The stack has three layers, and only the bottom one is allowed to know about specific architectures: -### 3.1 Compile-time `target_feature` +```text +┌────────────────────────────────────────────────────────────┐ +│ Consumers — codec encode/decode bodies, downstream crates │ +│ (ndarray-codec, burn, candle, lance-graph, surrealdb, │ +│ MedCare-rs, smb-office-rs, q2, WoA scheduler) │ +│ Call ndarray::simd::* directly. Never name a backend. │ +└────────────────────────┬───────────────────────────────────┘ + │ identical signatures everywhere + ▼ +┌────────────────────────────────────────────────────────────┐ +│ Polyfill surface — src/simd.rs │ +│ cfg(target_feature = ...) re-exports exactly ONE backend │ +│ to compile in. Same fn names, same types, every arch. │ +└────────────────────────┬───────────────────────────────────┘ + │ cfg substitutes one file + ▼ +┌────────────────────────────────────────────────────────────┐ +│ Backend — simd_avx512.rs / simd_neon.rs / simd_scalar.rs │ +│ This is where AMX bytecode, AVX-512 asm/intrinsics, │ +│ NEON loads, SVE2 predicates LIVE. Implementation detail. │ +│ Consumers above never reach in here. 
│ +└────────────────────────────────────────────────────────────┘ +``` -```rust -// In ndarray::hpc::blas_level2::batched_gemm: +There is **no runtime CPU detection, no `HwCaps`/`CpuCaps` branching, no `if has_avx512 else …` dispatch, and no `unsafe { runtime_branch }` chain.** The target CPU is fixed at build time via `.cargo/config.toml` (`target-cpu=x86-64-v4` makes AVX-512 mandatory on x86_64) or via the target triple for non-x86 builds. One build, one backend file compiled in, one path. -#[cfg(target_arch = "x86_64")] -mod x86_dispatch { - #[target_feature(enable = "avx512f,avx512bw,avx512vnni")] - pub unsafe fn batched_gemm_vnni(...) { /* VNNI path */ } +### 3.1 The polyfill primitive: cfg-selected per-arch files - #[target_feature(enable = "amx-tile,amx-int8,amx-bf16")] - pub unsafe fn batched_gemm_amx(...) { /* AMX path */ } -} +The pattern already shipping in `src/simd*.rs` (per `CLAUDE.md` Repository Structure): -#[cfg(target_arch = "aarch64")] -mod arm_dispatch { - #[target_feature(enable = "sve2")] - pub unsafe fn batched_gemm_sve2(...) { /* SVE2 path */ } +```rust +// src/simd.rs — consumer-facing surface, re-exports a single backend +#[cfg(target_feature = "avx512f")] +pub use crate::simd_avx512::*; - #[target_feature(enable = "neon,fp16")] - pub unsafe fn batched_gemm_neon_fp16(...) { /* Apple Silicon */ } -} +#[cfg(all(not(target_feature = "avx512f"), target_arch = "aarch64"))] +pub use crate::simd_neon::*; + +#[cfg(not(any(target_feature = "avx512f", target_arch = "aarch64")))] +pub use crate::simd_scalar::*; ``` -### 3.2 Runtime feature detection (cached at process start) +Each backend file implements the same public functions with identical signatures; **the actual AMX bytecode / AVX-512 asm / NEON intrinsics / SVE2 predicates are contained inside those files** and never escape. The W1a contract requires all three backends + a parity test before any new primitive lands. 
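The cfg-selected re-export and the parity obligation described above can be sketched in a single file. This is an illustration only: the module names `backend_scalar` and `backend_lanes`, the function `sum_i32`, and the helper `parity_holds` are hypothetical and are not part of `ndarray::simd`. Both stand-in backends are compiled here so that parity can be checked on one host; in the scheme the text describes, only one backend file is compiled into a given binary.

```rust
// Hypothetical stand-in for a portable fallback backend.
mod backend_scalar {
    /// Sums the slice into a 64-bit accumulator, one element at a time.
    pub fn sum_i32(xs: &[i32]) -> i64 {
        xs.iter().map(|&x| i64::from(x)).sum()
    }
}

// Hypothetical stand-in for a vector backend: eight independent lanes,
// written in plain Rust so the sketch compiles on any target.
mod backend_lanes {
    /// Same name and signature as the scalar backend.
    pub fn sum_i32(xs: &[i32]) -> i64 {
        let mut lanes = [0i64; 8];
        let mut chunks = xs.chunks_exact(8);
        for chunk in &mut chunks {
            for (lane, &x) in lanes.iter_mut().zip(chunk) {
                *lane += i64::from(x);
            }
        }
        // Elements that did not fill a whole chunk of eight.
        let tail: i64 = chunks.remainder().iter().map(|&x| i64::from(x)).sum();
        lanes.iter().sum::<i64>() + tail
    }
}

// Consumer-facing surface: exactly one backend is re-exported per build.
#[cfg(target_feature = "avx512f")]
pub use backend_lanes::sum_i32;
#[cfg(not(target_feature = "avx512f"))]
pub use backend_scalar::sum_i32;

/// Parity check: the lane backend must agree with the scalar reference.
pub fn parity_holds(xs: &[i32]) -> bool {
    backend_scalar::sum_i32(xs) == backend_lanes::sum_i32(xs)
}
```

A consumer would call `sum_i32` without naming either module. Integer addition is associative, so the two backends agree exactly; a floating-point primitive would additionally need a fixed reduction order to pass the same check.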
-```rust -// In ndarray::hpc::capability: -pub static CAP: OnceLock = OnceLock::new(); - -pub struct HwCaps { - pub has_amx: bool, - pub has_vnni: bool, - pub has_sve2: bool, - pub has_neon_fp16: bool, - pub l1_cache_size: usize, - pub vec_width_bits: u16, - // ... more as new features land -} +**The codec body is a consumer of this polyfill.** When `ndarray-codec` writes encoding code — Skip/Merge/Delta/Escape mode selection, basin lookups, tropical-GEMM RDO, rANS state-machine ticks, EWA splat composition — it calls `ndarray::simd::*` exactly the way `burn` / `candle` / `lance-graph` do. **The codec does not know it is on AMX.** It does not reach for `simd_avx512::*` directly, does not name a backend symbol, does not branch on architecture. The cfg at the polyfill layer picks the right backend at build time; the encoder is identical Rust across all architectures. -pub fn batched_gemm(input: ...) { - let caps = CAP.get().unwrap(); - if caps.has_amx { unsafe { batched_gemm_amx(input) } } - else if caps.has_vnni { unsafe { batched_gemm_vnni(input) } } - else if caps.has_sve2 { unsafe { batched_gemm_sve2(input) } } - // ... - else { batched_gemm_scalar(input) } -} -``` +**Escape hatch (rare).** A very small number of hot inner loops may need to drop below the polyfill into a backend-specific intrinsic for performance reasons that the polyfill surface genuinely cannot express. When that happens: the violation lives inside `src/simd_.rs` (where backend-specific code is already at home), is `cfg`-gated to that arch, is parity-tested against the other backends' equivalent, and gets a `// SAFETY:` + agent audit per `CLAUDE.md`'s sentinel-qa rule. **It is the exception, not the model.** No consumer crate — codec body included — is ever the right place for it. 
-### 3.3 Per-arch tunable crossover (R-5 generalised)
+### 3.2 Build-time CPU selection (not runtime detection)

-Some operations have a "small N: scalar, large N: SIMD" crossover that varies per arch:
+Target CPU is decided once, at build time:
+
+| Mechanism | Source | Effect |
+|---|---|---|
+| `.cargo/config.toml` `target-cpu=x86-64-v4` | repo policy | AVX-512 mandatory on x86_64 (per `CLAUDE.md`) |
+| `--target aarch64-apple-darwin` | CI / fleet build matrix | NEON-fp16 backend compiles in |
+| `--target aarch64-unknown-linux-gnu` + SVE2 target-feature | Graviton build | SVE2 backend compiles in |
+
+The WoA fleet ships **per-arch binaries**, not a fat binary that probes. Q2 distributes the right binary to each node based on the node's already-known architecture (declared at registration time, not detected per request). Cross-arch determinism (§6 below) is enforced because each binary embeds exactly one backend and the W1a parity test gates every primitive at the substrate layer.
+
+### 3.3 Per-arch tunable crossover (R-5)
+
+Some operations (DCT-II vs GEMM, basin-lookup width, etc.) have a "small N: scalar path, large N: SIMD path" crossover whose break-even N varies per backend. The crossover lives in the **same polyfill** as the SIMD primitives: a `cfg(target_feature = ...)`-selected `const`.
 ```rust
-const DCT_BATCH_CROSSOVER: usize = match Arch::CURRENT {
-    Arch::SapphireRapids => 64, // AMX wins above this
-    Arch::IceLakeServer => 32, // AVX-512 narrower; lower crossover
-    Arch::Zen4 => 96, // Zen's AVX-512 emulation widens crossover
-    Arch::AppleM3 => 256, // NEON's narrower; only worth at large N
-    Arch::GravitonV3 => 128, // SVE2 mid-range
-    Arch::Generic => usize::MAX, // Always scalar fallback
-};
+// src/hpc/dct_crossover.rs — one const per backend file, cfg-selected
+//
+// simd_avx512.rs: pub const DCT_BATCH_CROSSOVER: usize = 64;
+// simd_neon.rs (Apple Silicon): pub const DCT_BATCH_CROSSOVER: usize = 256;
+// simd_scalar.rs: pub const DCT_BATCH_CROSSOVER: usize = usize::MAX;

 pub fn dct_apply(input: &[i16], output: &mut [i16]) {
     if N >= DCT_BATCH_CROSSOVER {
-        unsafe { dct_gemm_path(input, output) }
+        dct_gemm_path(input, output) // calls into ndarray::simd::*
     } else {
-        dct_butterfly_path(input, output)
+        dct_butterfly_path(input, output) // also calls into ndarray::simd::*
     }
 }
 ```

-R-5 commits these crossovers as **bench-tunable constants**, not hand-guessed numbers. Plan G's codec-bench includes a calibration sub-target that emits the right `const` values per arch via build script.
+The integer `DCT_BATCH_CROSSOVER` comes from one of two places:
+1. **Hand-tuned default**: a known-good number per backend, checked into the backend file.
+2. **Plan G calibration override**: `build.rs` may consult `CARGO_CFG_TARGET_FEATURE` + a pre-recorded calibration artifact from `codec-bench` and emit a refined const into `OUT_DIR`, included by the backend file. This is still compile-time selection — the build script never probes the host CPU, only reads Cargo's target-config env vars.
+
+Either way the constant is **fixed in the compiled binary**. R-5 commits these crossovers as bench-tunable but compile-time-fixed; the `cfg(target_feature)`-selected backend file is the single source of truth.
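A compile-time crossover of the kind described above can be sketched as follows. The three values (64, 256, `usize::MAX`) are the ones quoted in the backend comments above, and the `cfg` arms mirror the re-export conditions from §3.1. The `DctPath` enum and the `select_path` function are hypothetical names introduced for illustration; the real `dct_apply` and its two paths are not reproduced here.

```rust
// Exactly one of these three definitions survives cfg evaluation,
// so the crossover is a plain constant in the compiled binary.
#[cfg(target_feature = "avx512f")]
pub const DCT_BATCH_CROSSOVER: usize = 64;

#[cfg(all(not(target_feature = "avx512f"), target_arch = "aarch64"))]
pub const DCT_BATCH_CROSSOVER: usize = 256;

#[cfg(not(any(target_feature = "avx512f", target_arch = "aarch64")))]
pub const DCT_BATCH_CROSSOVER: usize = usize::MAX;

/// Hypothetical label for the two code paths named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DctPath {
    Gemm,
    Butterfly,
}

/// Picks a path from the batch size alone. No CPU probing happens here:
/// the comparison is against a constant fixed at build time.
pub const fn select_path(batch: usize) -> DctPath {
    if batch >= DCT_BATCH_CROSSOVER {
        DctPath::Gemm
    } else {
        DctPath::Butterfly
    }
}
```

Because `select_path` is a `const fn` over a `const`, a caller with a compile-time batch size lets the compiler fold the branch away entirely, which is consistent with the "one build, one path" framing.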
--- @@ -173,7 +189,7 @@ PR-X12 (R-11) commits a budget on `T_codec`: | Tropical-GEMM RDO | ≤ 50 µs per CTU on SPR | derived from R-7 cost analysis | | Basis::apply (DCT) | ≤ 2 µs per 32×32 block on SPR | derived from R-5 | -**WoA's contract:** if any of these are violated on a supported arch, the consumer can either accept the slowdown or refuse to schedule the request. WoA has visibility into per-arch dispatch quality via the substrate's metrics endpoint: +**WoA's contract:** if any of these are violated on a supported arch, the consumer can either accept the slowdown or refuse to schedule the request. WoA has visibility into per-arch polyfill performance (which backend was compiled into the binary it's running, plus stage-latency telemetry) via the substrate's metrics endpoint: ```rust ndarray::hpc::metrics::stage_latency_p99(stage: StageId) -> Duration; @@ -228,7 +244,7 @@ This is a model for many features that look "out of scope" for PR-X12 but actual - Federated codebook → swap pointer to handle (R-13) - 3DGS scene anchor → add SceneAnchor header_kind (x266 doc) -- GPU offload → add `Reducer::dispatch_target() -> DispatchTarget` (Plan E adjacent) +- GPU offload → add a `Reducer::backend_target() -> BackendTarget` hook to let consumers opt into a GPU polyfill at compile time (Plan E adjacent; still cfg-selected, not runtime-branched) - Speculative decode → add `Frame::is_speculative()` bit in header reserved field None of these are PR-X12 scope. All of them require ≤50 LoC of "anchor" in PR-X12. The discipline of M:H-NEW-2 + R-3's LoC envelope is what makes future anchoring possible without forking the codec. @@ -290,7 +306,7 @@ Quick tour of what each crate inherits from PR-X12 substrate decisions: ### 8.1 `burn` (model training/inference) -Uses `blas_level3::gemm` for matrix multiply, `activations` for nonlinearities, `cam_pq` for KV cache compression. Per-arch dispatch via the same target_feature paths. 
Will benefit directly from PR-X12's R-4 / R-11 latency-assertion infrastructure when it lands (burn has wanted this for ~14 months). +Uses `blas_level3::gemm` for matrix multiply, `activations` for nonlinearities, `cam_pq` for KV cache compression. Per-arch polyfill via the same `cfg(target_feature)` mechanism — `burn` itself never names a backend. Will benefit directly from PR-X12's R-4 / R-11 latency-assertion infrastructure when it lands (burn has wanted this for ~14 months). ### 8.2 `candle` (quantized inference) @@ -323,7 +339,7 @@ Owns the federation policy (R-13), the codec version negotiation, and the per-ar In light of the above, the irreducible commitments PR-X12 must keep for the consumer ecosystem: 1. **Substrate API stability** — `blas_level2::batched_gemm`, `cam_pq::kmeans`, `fft::dct_apply`, `activations::conv2d` keep their signatures across PR-X12 changes. Additions OK, breaks not OK. -2. **Per-arch dispatch transparency** — consumers continue calling capability-trait methods; the substrate continues choosing the right SIMD path. +2. **Per-arch polyfill transparency** — consumers continue calling the `ndarray::simd::*` / `ndarray::hpc::*` surface unchanged across arches; cfg at the polyfill layer selects exactly one backend at build time. Consumers never name a backend symbol. 3. **`Reducer` ordered-sum guarantee** — any consumer using `OrderedKahanReducer` (or similar) continues to get bit-exact cross-arch reductions. 4. **Latency-assertion CI infrastructure** — R-11's framework is consumer-callable for their own benches; not codec-private. 5. **Codebook handle indirection** (R-13) — the codec ships with the handle pattern, consumers can swap codebooks without forking. 
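Commitment 3 rests on the reduction order being fixed by the algorithm rather than by a backend's vector width. A minimal sketch of that idea, assuming strict IEEE 754 `f32` arithmetic with no reassociation or fused operations; `ordered_kahan_sum` is a hypothetical free function and is not the `OrderedKahanReducer` API, whose actual shape is not shown in this document.

```rust
/// Compensated (Kahan) summation over the slice in index order.
/// The order of operations depends only on the input, never on lane count,
/// so two builds that follow IEEE 754 produce the same bits.
pub fn ordered_kahan_sum(xs: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    let mut compensation = 0.0f32; // running estimate of lost low-order bits
    for &x in xs {
        let y = x - compensation;
        let t = sum + y;
        // (t - sum) recovers what was actually added; subtracting y
        // leaves the rounding error of this step.
        compensation = (t - sum) - y;
        sum = t;
    }
    sum
}
```

A backend is free to vectorise other work around such a reducer, but the reduction itself stays sequential; that is the price of the bit-exact guarantee in this sketch.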
diff --git a/.claude/knowledge/pr-x12-x266-3dgs-spacetime-upscaling.md b/.claude/knowledge/pr-x12-x266-3dgs-spacetime-upscaling.md
index 14ba0f2d..b22eb80a 100644
--- a/.claude/knowledge/pr-x12-x266-3dgs-spacetime-upscaling.md
+++ b/.claude/knowledge/pr-x12-x266-3dgs-spacetime-upscaling.md
@@ -268,12 +268,14 @@ Nothing in this doc is in PR-X12 scope. What it requires from PR-X12:

 | Requirement | Source | Status |
 |---|---|---|
-| `Basis` trait with parametric `apply` | R-1, M:E-A | landed in concept; implementation in Plan A4 |
+| `Basis` trait with parametric `apply` | R-1, M:E-A | **canon-fixed** (R-1 trait shape committed); **implementation** scheduled in Plan A4 |
 | EWA splat rasterizer as `Basis` impl | Plan E | scheduled |
-| Codec body decoupled from specific basis | M:H-NEW-2 LoC envelope | enforced via R-3 audit |
-| Header byte stable across basis swaps | R-2, M:E-J bits 0-1 | landed |
+| Codec body decoupled from specific basis | M:H-NEW-2 LoC envelope | enforced via R-3 audit rule (doc commitment; CI check pending) |
+| Header byte stable across basis swaps | R-2, M:E-J bits 0-1 | **canon-fixed** (R-2 commits bits 0-1 = `header_kind`); wire-format implementation in Plan A8 |
 | Plan G video lane validates per-arch latency | R-4, R-11 | scheduled |
-| Federated codebook policy for scene anchors | R-13 | landed |
+| Federated codebook policy for scene anchors | R-13 | **canon-fixed** (R-13 commits Option A: per-shard codebook for Plan F v1); implementation in Plan F |
+
+**"Canon-fixed"** = the resolution doc commits the design; **"scheduled"** = the implementation has a named plan card. None of the above have shipping code today.

 The path to x266-like capability is: