diff --git a/.claude/knowledge/pr-x12-bgz-jc-substrate-synergies.md b/.claude/knowledge/pr-x12-bgz-jc-substrate-synergies.md
index e715d403..313833af 100644
--- a/.claude/knowledge/pr-x12-bgz-jc-substrate-synergies.md
+++ b/.claude/knowledge/pr-x12-bgz-jc-substrate-synergies.md
@@ -303,7 +303,7 @@ This is the doc-level value of PR-X12: bgz code + PR-X12 docs = a complete archi
 
 ## 5. Gaps — what doesn't exist yet
 
-### 5.1 `jd-nd` — the missing ndarray-side proof crate
+### 5.1 `jd-nd` — the missing ndarray-side proof crate (Gap **G-1**)
 
 The Explore search confirmed: `jd-nd` does not exist in `/home/user/ndarray/`. The math-proof infrastructure on the ndarray side lives ad-hoc inside `src/hpc/` modules (`deepnsm.rs`, `jina/runtime.rs`) as TODO comments.
 
@@ -335,7 +335,7 @@ ndarray/crates/jd-nd/
 
 **Why now:** R-11's latency CI needs a *correctness* twin. Latency that's fast but wrong is the worst outcome. jd-nd is the structural place for those proofs.
 
-### 5.2 Cronbach / ICC research crate
+### 5.2 Cronbach / ICC research crate (Gap **G-2**)
 
 `lance-graph/crates/lance-graph-codec-research/` exists per the Explore agent's report, **but its scope is FFT (rustfft) variants**, not Cronbach's α / ICC / encoding-reliability psychometrics.
 
diff --git a/.claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md b/.claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md
index 3ba29be3..e1deb77d 100644
--- a/.claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md
+++ b/.claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md
@@ -308,12 +308,14 @@ Updating the inventory from `pr-x12-bgz-jc-substrate-synergies.md` §7 with the
 
 **Total estimated gap-closing work: 8-12 weeks** across the seven items, all incremental on existing infrastructure. None of them require new research; all are wiring existing primitives into the codec.
 
-Two prior gaps from the earlier doc remain:
+Two prior gaps from the earlier doc remain (their canonical IDs are owned by `pr-x12-bgz-jc-substrate-synergies.md` §5; cross-referenced here):
 
-| Gap (prior) | Component | Cost |
+| Gap (cross-ref) | Component | Cost |
 |---|---|---|
-| **G-8** | `jd-nd` crate does not exist (ndarray-side proof crate) | 2-3 weeks skeleton + ongoing |
-| **G-9** | Cronbach/ICC encoding-reliability research crate not implemented | 1-2 weeks skeleton + 2-3 weeks PoC |
+| **bgz-jc G-1** (§5.1) | `jd-nd` crate does not exist (ndarray-side proof crate) | 2-3 weeks skeleton + ongoing |
+| **bgz-jc G-2** (§5.2) | Cronbach/ICC encoding-reliability research crate not implemented | 1-2 weeks skeleton + 2-3 weeks PoC |
+
+The G-1..G-7 IDs in §5 of *this* doc are local to the cam-pq / sigker / dn_tree binding; bgz-jc's G-1 / G-2 are a separate namespace owned by that doc. When citing cross-doc, prefix with the source (e.g., "bgz-jc G-1" vs "cam-pq G-1") to avoid the collision the previous G-8 / G-9 labelling implied.
 
 **Grand total: ~11-17 weeks** of substrate-binding + gap-closing work, parallel-able. PR-X12 codec body (~1500 LoC per R-3) is independent of this and can ship sooner.
 
diff --git a/.claude/knowledge/pr-x12-canon-resolutions-delta.md b/.claude/knowledge/pr-x12-canon-resolutions-delta.md
index ad7b923f..fd6795b7 100644
--- a/.claude/knowledge/pr-x12-canon-resolutions-delta.md
+++ b/.claude/knowledge/pr-x12-canon-resolutions-delta.md
@@ -9,15 +9,16 @@
 
 ## 0. What's actually new
 
-The merged canon (`bc9da4ad`) argued the architecture; canon-resolutions makes it falsifiable. Five categories of novel content survive the delta filter:
+The merged canon (`bc9da4ad`) argued the architecture; canon-resolutions makes it falsifiable. Six categories of novel content survive the delta filter:
 
 1. **Concrete trait signatures** — R-1 (`Basis<T>` + `LinearReduce` split), §8 surface (`PredictiveSignal`, `CurveOrder<const N>`, `RdoMetric`)
 2. **Quantified budgets** — R-3 LoC envelope per sub-card / per consumer + audit rule; R-4 four Plan G thresholds; R-11 4K@60fps latency budget
-3. **Math identities** — R-6 SSD-via-VNNI (`||A||² - 2A·B + ||B||²`), R-7 tropical-GEMM partition (`O(4^d) → O(d²)`)
+3. **Math identities** — R-6 SSD-via-VNNI (`||A||² - 2A·B + ||B||²`), R-7 tropical-GEMM partition (`O(4^d) → O(d²)`, kernel at `bgz17::scalar_sparse::tropical_spmv`)
 4. **Type-level invariants** — R-2 bit-15/bit-14 split, R-9 topology-FREE codec
-5. **Phasing patterns** — R-8 confidence-gate framing, R-13 Option-A-then-B for federated codebook
+5. **Phasing patterns** — R-8 confidence-gate framing, R-13 Option-A-then-B for federated codebook (primitives: `cam_pq` + `bgz-hhtl-d` + `dn_tree` + `merkle_tree`)
+6. **Formal-correctness + stream lane (post-merge)** — R-14 (`jc::pflug` Pillar 10 + `jc::hambly_lyons` Pillar 11), R-15 (`SignatureBasis<DEPTH>` as fifth Plan G lane)
 
-Plus the synthesis layer: §9 falsifiability matrix (24 rows), §10 sequencing with named gates, §12 compaction-preservation contract.
+Plus the synthesis layer: §9 falsifiability matrix (24+3 rows including R-14/R-15), §10 sequencing with named gates, §12 compaction-preservation contract.
 
 ---
 
@@ -216,7 +217,9 @@ Tropical-semiring (+, min) formulation:
 
 At 4K 132K CTUs/frame: ~4 ms vs ~64 ms just for partition RDO. At 60 fps, the difference between fitting and missing budget.
 
-**Dep direction:** `ndarray-codec → lance-graph::blasgraph` (tropical-GEMM kernels live in blasgraph). Allowed post-Plan-H because ndarray-codec is a sibling crate, not the bottom.
+**Dep direction:** `ndarray-codec → lance-graph::blasgraph` (tropical-GEMM kernels nominally live in blasgraph). Allowed post-Plan-H because ndarray-codec is a sibling crate, not the bottom.
+
+**Actual kernel home (current):** `lance-graph::bgz17::scalar_sparse::tropical_spmv`. The `blasgraph` namespace is the eventual abstraction; until that lands, ndarray-codec depends on bgz17 directly. Cite the symbol when wiring A6, not the namespace.
 
 **Plan A6 (1 week) ships this.** λ-RDO knob scales edge weights; tropical-GEMM relaxation computes optimal mode tree.
 
@@ -292,6 +295,16 @@ Pattern: ship simplest-that-works, measure, escalate. Don't pick best-in-theory
 
 Wire-format hook for Option A: `WorkerId: u16` + `CodebookHash: u64` in frame header.
 
+**Implementation primitives** (already exist; PR-X12 only adds the wire format + `CodebookHandle` trait):
+
+| Concern | Crate / module |
+|---|---|
+| Codebook training (k-means + CAM-PQ) | `ndarray::hpc::cam_pq::CamCodebook` |
+| Deployed encoding format | `lance-graph::bgz-tensor::Codebook4096` / `bgz-hhtl-d` |
+| Online plastic updates (SharedClusterWide) | `ndarray::hpc::dn_tree` |
+| Integrity proof (Blake3-48 Merkle root, xor_diff) | `ndarray::hpc::merkle_tree` |
+| Gossip protocol | `q2` (external) |
+
 ### 5.3 Streaming flush granularity (R-12)
 
 Per-CTU default. `FlushUnit` 2-bit tag in frame header:
@@ -405,9 +418,48 @@ Citation IDs (R-1..R-13) stable. Canon IDs (M:E-*, M:H-*, M:H-NEW-*, M:T-*, A:E-
 
 ---
 
-## 11. The single load-bearing paragraph (§13)
+## 11. Formal-correctness layer (R-14) — post-merge addition
+
+The substrate-binding doc (`pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md`) surfaced two formal proofs in `lance-graph::jc` that the codec inherits without re-proving:
+
+| Pillar | Crate / module | What it proves | Status |
+|---|---|---|---|
+| **Pillar 10** (Pflug-Pichler) | `jc::pflug` | Nested-distance Lipschitz on Sigma DN-trees: CAM-PQ tree quantization preserves FreeEnergy within Lε | Active in default zero-dep build |
+| **Pillar 11** (Hambly-Lyons) | `jc::hambly_lyons` | Signature uniqueness on tree-quotient: any path of bounded variation is uniquely determined by its full signature up to tree-like equivalence (Annals 171(1), arXiv:math/0507536) | Active under `--features hambly-lyons` (PR #348, 2026-05-07); probe passes (forward<1e-9, converse>0.05, ratio≥1e6) |
+
+R-4's quality-floor rows for video / KV / gradient inherit Pillar 10's Lipschitz bound. R-15's signature lane gates on Pillar 11.
+
+**Open work (cam-pq G-4):** PR #350 corrects `sigker::signature_kernel_pde`'s known Goursat-PDE math bug; Pillar 11's probe deliberately uses `signature_truncated` (tensor-algebra) until PR #350 lands. Production-scale benchmarking is still pending.
+
+---
+
+## 12. Stream-signal codec lane (R-15) — post-merge addition
+
+`SignatureBasis<const DEPTH: usize>: Basis<f32>` is the fifth concrete `Basis<T>` impl, complementing the four lanes in §1's table:
+
+```rust
+// New: ndarray::hpc::signature (~1 wk, wraps sigker::signature_truncated)
+impl<const DEPTH: usize> Basis<f32> for SignatureBasis<DEPTH> {
+    fn dim(&self) -> usize { /* truncated tensor-algebra dim */ }
+    fn apply(&self, path: &[f32], signature: &mut [f32]) {
+        // iterated-integral truncation via sigker::signature_truncated
+    }
+    fn invert(&self, _sig: &[f32], _path: &mut [f32]) {
+        unimplemented!("path-from-signature is unique only up to tree-like \
+                        equivalence per R-14 Pillar 11")
+    }
+}
+```
+
+**Plan G gets a fifth lane: "stream signal"** — audio waveforms / time-series / gesture / handwriting paths. Codec is `SignatureBasis<DEPTH=3>` + standard rANS over the four-mode taxonomy; quality floor inherits from Pillar 11 (R-14); compression target ~10× over raw f32 path samples (calibrate during Plan G).
+
+**Why `signature_truncated` not `signature_kernel_pde`:** the PDE form ships a known divergence bug (PR #350). The tensor-algebra path is correct today and is what Pillar 11 cites.
+
+---
+
+## 13. The single load-bearing paragraph (canon-resolutions §13)
 
-> *The merged canon committed to the right architectural synthesis (M:E-A, M:E-D, M:E-G, M:E-I) but left the load-bearing contracts unsigned. Canon-resolutions commits them: `Basis<T>` + `LinearReduce` are two traits not one (R-1); bit 14 of the leaf header is consumer-typed and bit 15 universal (R-2); generic codec body ≤1500 LoC with ≤200 LoC per consumer (R-3); four threshold pairs gate Plan G's pass criteria (R-4); the trajectory is Plan G (2 wks) → Plan A7 critical path (1.5 wks) → Phase 2 consumers parallel (3 wks); end state is one binary, four loads, ~2 KLoC stack demonstrating M:H-NEW-1 in ~10.5 weeks of wall-clock. Every claim in §9 has a test; Plan G's bench-harness binary is the audit. The falsifiability is the point.*
+> *The merged canon committed to the right architectural synthesis (M:E-A, M:E-D, M:E-G, M:E-I) but left the load-bearing contracts unsigned. Canon-resolutions commits them: `Basis<T>` + `LinearReduce` are two traits not one (R-1); bit 14 of the leaf header is consumer-typed and bit 15 universal (R-2); generic codec body ≤1500 LoC with ≤200 LoC per consumer (R-3); four threshold pairs gate Plan G's pass criteria (R-4); the trajectory is Plan G (2 wks) → Plan A7 critical path (1.5 wks) → Phase 2 consumers parallel (3 wks); end state is one binary, four loads, ~2 KLoC stack demonstrating M:H-NEW-1 in ~10.5 weeks of wall-clock. Every claim in §9 has a test; Plan G's bench-harness binary is the audit. The falsifiability is the point. The substrate-binding follow-up (R-14, R-15) adds a formal-correctness layer via `jc` pillars and a fifth stream-signal lane via `SignatureBasis<DEPTH>`.*
 
 ---
 
diff --git a/.claude/knowledge/pr-x12-gguf-llm-weights-encoding.md b/.claude/knowledge/pr-x12-gguf-llm-weights-encoding.md
index eda384c5..e1fb0c91 100644
--- a/.claude/knowledge/pr-x12-gguf-llm-weights-encoding.md
+++ b/.claude/knowledge/pr-x12-gguf-llm-weights-encoding.md
@@ -131,7 +131,7 @@ Crucially, the residual is **rANS-coded with a Gaussian-tail prior** (R-10). GGU
 
 For weights that are too extreme to fit any basin (the activation outliers that LLM.int8() and SmoothQuant fight over), encode as Escape + raw f16 value. ~3-5% of weights per layer, but they carry disproportionate information.
 
-The PR-X12 wire format already supports Escape as the lossy-fallback path (with the codec body warning per M:T new items). For LLM weights, Escape *must be lossless* — no truncation of outliers. This is an additional R-N candidate.
+The PR-X12 wire format already supports Escape as the lossy-fallback path (with the codec body warning per M:T new items). For LLM weights, Escape *must be lossless* — no truncation of outliers. This is an additional R-N candidate; see §10 falsifier **F-4** for the wire-format mechanism (rANS bypass channel in the A8 framing layer) and the HEVC-escape-coefficient precedent.
 
 ---
 
@@ -266,7 +266,9 @@ Per GEMM operation (e.g., compute attn_q @ x for batch):
 
 The CTU bitstream is read forward-only (rANS is a streaming codec) and the decoded weights live in L1/L2 cache just long enough to be GEMM'd. **No full-tensor dequantize buffer needed.** For a 4096 × 4096 attention projection, the dequantize buffer would be 32 MB (f16); PR-X12 streams in ~3-4 MB of bitstream, decodes to ~64 KB cache-resident windows, GEMMs each window, drops it.
 
-**Memory savings:** on a memory-constrained edge device (8 GB RAM), this turns "loads 4 GB model + needs 1 GB dequant scratch" into "loads 3 GB model + needs 64 KB scratch." A 7B model at PR-X12 is genuinely runnable on a phone-class device, where GGUF Q4 is borderline.
+**Memory savings (weights only):** on a memory-constrained edge device (8 GB RAM), this turns "loads 4 GB model + needs 1 GB dequant scratch" into "loads 3 GB model + needs 64 KB scratch."
+
+**Phone-class caveat: weights are not the only memory load.** The KV cache scales with context length and is independent of weight compression. Its size is `2 × layers × kv_heads × head_dim × bytes × tokens`, so for a 7B model at 8K context the figure is model-dependent: about 4 GB in fp16 with full multi-head attention (32 layers, 32 KV heads of dim 128), about 1 GB with 8 KV heads (grouped-query attention), and half of either in int8. It grows linearly with context. PR-X12 weight compression alone takes a 7B model from "borderline" to "easier" on phone-class hardware, but **the KV cache lane (Plan D, M:H-3, R-4) is the second lever** that has to compress for full phone-class viability at non-trivial context. Both lanes are needed; this lens only addresses the weights side.
 
 **Latency:** the streaming decode happens in the same loop body as the GEMM accumulate. On a modern arch with VNNI + AMX, the decode cost (~5-10 cycles per cell, branchless via R-1's lookup-table pattern) is hidden by GEMM latency. **Estimated overhead: < 5% versus pre-dequantized GEMM.**
 
@@ -345,7 +347,7 @@ Concrete implications:
 
 4. **Do** keep R-13's federated codebook policy. The LLM use case is the strongest motivation: per-model codebooks are 13 MB; without R-13, a hard-coded codebook would not work for arbitrary LLMs.
 
-5. **Reserve** an `EncodingDomain::LLMWeights` discriminant in the codec metadata header (separate from the 16-bit per-CTU header). The codec body doesn't read this — it just stamps the file with a domain tag so decoders know which basin codebook to load.
+5. **Reserve** the *enum-discriminant slot* for `EncodingDomain::LLMWeights` in the codec metadata header *now*, even though the actual LLM-lane decoder lands post-PR-X12 (per implication #2). The header reserves a fixed-size domain-tag field (separate from the 16-bit per-CTU header); the LLMWeights value of that field stays unimplemented in PR-X12, but the slot is forward-compatibility-locked so a future PR can add the variant without a wire-format break. The codec body doesn't read this — it stamps the file with a domain tag so decoders know which basin codebook to load.
 
 6. **Bench against AWQ at parity perplexity, not just Q4_K_M.** Q4_K_M is a conservative baseline; AWQ + GPTQ are the actual state of the art. If PR-X12 can match AWQ at smaller storage, the case is strong; if not, ship at "drop-in GGUF replacement" framing only.
 
diff --git a/.claude/knowledge/pr-x12-substrate-canon-resolutions.md b/.claude/knowledge/pr-x12-substrate-canon-resolutions.md
index 5bd633ba..26e99042 100644
--- a/.claude/knowledge/pr-x12-substrate-canon-resolutions.md
+++ b/.claude/knowledge/pr-x12-substrate-canon-resolutions.md
@@ -24,8 +24,11 @@ were raised in review:
   (R-5 through R-7 restorations)
 - **§6** — three pieces of detail from session B the merge underrepresented
   (R-8 through R-10 restorations)
-- **§7** — three commitments missing from both originals and from the
-  merge (R-11 through R-13 new specs)
+- **§7** — five commitments missing from both originals and from the
+  merge: R-11 through R-13 (latency, flush granularity, federated
+  codebook) plus R-14 (formal correctness via `jc` pillars) and R-15
+  (`SignatureBasis<DEPTH>` as fifth Plan G lane), the latter two
+  surfaced post-merge by the substrate-binding docs
 
 Then five integration pieces that make the resolutions actionable:
 
@@ -36,9 +39,10 @@ Then five integration pieces that make the resolutions actionable:
 - **§11** — end-state + trajectory (think it from the end)
 - **§12** — compaction-preservation contract
 
-Citation IDs: `R-1` through `R-13` for resolutions. Canon IDs (`M:E-*`,
-`A:E-*`, `B:E-*`, `M:H-*`, `M:T-*`) remain stable; this doc adds, does
-not renumber.
+Citation IDs: `R-1` through `R-15` for resolutions (R-14, R-15
+appended post-merge from the substrate-binding doc; numbering remains
+append-only). Canon IDs (`M:E-*`, `A:E-*`, `B:E-*`, `M:H-*`, `M:T-*`)
+remain stable; this doc adds, does not renumber.
 
 Sister docs (read order):
 
@@ -543,6 +547,14 @@ ships tropical-GEMM kernels. No new code in ndarray; cross-repo dep
 from ndarray-codec → lance-graph::blasgraph (after Plan H extraction,
 this is dep-allowed because ndarray-codec is a sibling, not the bottom).
 
+**Actual kernel home (current).** The tropical-GEMM kernel lives today
+at `lance-graph::bgz17::scalar_sparse::tropical_spmv` — NOT in an
+abstract `blasgraph` namespace. The codec's tropical-GEMM call is
+`bgz17::scalar_sparse::tropical_spmv(edge_weights, dag)`. The
+`lance-graph::blasgraph` name above is the eventual abstraction layer
+(post-Plan-H extraction); until that lands, ndarray-codec depends on
+bgz17 directly. Cite the symbol, not the namespace, when wiring A6.
+
 **Plan A6 RDO (1 week) ships this.** The λ-RDO knob (per A:§10.3) and
 the tropical-GEMM partition solver are the same kernel: λ scales the
 edge weights, the relaxation computes the optimal mode tree.
@@ -935,10 +947,135 @@ empirically; v3 (research-grade) tries Option C.
 R-4 gradient threshold (8× compression at <0.5% loss delta). At that
 point, Plan F v1 escalates to Option B in a follow-up PR.
 
+**Implementation primitives (current substrate, no new code required):**
+
+| Concern | Crate / module |
+|---------|----------------|
+| Codebook training (k-means + CAM-PQ) | `ndarray::hpc::cam_pq::CamCodebook` (`train_geometric` / `train_semantic` / `train_hybrid`) |
+| Deployed encoding format (per-shard) | `lance-graph::bgz-tensor::Codebook4096` and the `bgz-hhtl-d` shared-palette variant |
+| Online plastic updates (`SharedClusterWide`) | `ndarray::hpc::dn_tree` (quaternary plastic memory, partial-Hamming descent) |
+| Integrity proof for distributed updates | `ndarray::hpc::merkle_tree` (Blake3-48-bit, 1 KB root, `xor_diff` panCAKES compression) |
+| Gossip protocol (cluster-wide) | `q2` (external — implements the wire protocol) |
+
+The four policy modes (`LocalEphemeral` / `SharedClusterWide` /
+`SharedRegional` / `PretrainedStatic`) compose these primitives
+differently; the codec body exposes a `CodebookHandle` trait, and the
+primitives plug in via that trait. **PR-X12 contributes the wire format,
+the trait, and Option A; the primitives above already exist.**
+
 **Cite as R-13 in Plan F PR description.**
 
 ---
 
+### R-14 — Formal correctness via `lance-graph::jc` pillars
+
+**Problem.** Canon and resolutions describe the codec's empirical
+behaviour (R-4 thresholds, R-11 latency) but never name the formal
+correctness proofs the substrate already carries. Without a citation,
+"the codec is correct" is unverifiable; with citations, the codec
+inherits machine-checked guarantees from existing crates.
+
+**Resolution.** Pin both pillars and what each proves.
+
+**Two formal proofs in `lance-graph::jc`:**
+
+- **Quantization correctness (Pillar 10, Pflug-Pichler):**
+  nested-distance Lipschitz on Sigma DN-trees. Proves that CAM-PQ tree
+  quantization preserves the FreeEnergy functional within a Lipschitz
+  factor Lε. **This is the proof PR-X12 cites for "wire-format
+  quantization is faithful."** Implementation: `jc::pflug` (active in
+  default build, zero-dep).
+- **Path-signature correctness (Pillar 11, Hambly-Lyons):**
+  signature uniqueness on tree-quotient. Proves that any path of
+  bounded variation is uniquely determined by its full signature
+  up to tree-like equivalence (Annals of Mathematics 171(1):109–167,
+  arXiv:math/0507536). **This is the proof PR-X12 cites for the
+  `SignatureBasis<DEPTH>` lane (R-15).** Implementation:
+  `jc::hambly_lyons` (active under `--features hambly-lyons`, since
+  PR #348 landed on 2026-05-07).
+
+**What the codec inherits.** Both pillars exist; the codec cites them
+and does not reprove. R-4's "Quality floor" rows for video / KV /
+gradient inherit Pillar 10's Lipschitz bound automatically. R-15's
+signature-lane gates on Pillar 11.
+
+**Status.**
+
+- Pillar 10: active in default zero-dep build.
+- Pillar 11: active under `--features hambly-lyons`; passes its probe
+  (forward < 1e-9, converse > 0.05, discrimination ratio ≥ 1e6 over
+  N=100 random pairs in d=3 at depth-2).
+- Production-scale benchmarking + PR #350 (`signature_kernel_pde`
+  Goursat-PDE math correction) remain open — see Gap G-4 in
+  `pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md`. Pillar 11's
+  probe deliberately uses `signature_truncated` (tensor-algebra path),
+  not the buggy PDE form.
+
+**Falsifies if.** Pillar 10 ever flips state (a regression in the
+Pflug-Pichler proof bound) — Plan G's video / KV / gradient quality
+floors lose their formal underwriting and become empirical-only.
+
+**Cite as R-14 in any PR claiming "codec output is faithful to
+input" or wiring `SignatureBasis` (R-15).**
+
+---
+
+### R-15 — `SignatureBasis<const DEPTH: usize>` as `Basis<f32>` impl
+
+**Problem.** R-1 commits the `Basis<T>` shape; the canon lists three
+concrete impls (`DctIIBasis<N>` for video, `EwaSplatBasis` for 3DGS,
+`ShSpectralBasis<L>` for splat SH). No `Basis<T>` impl targets
+*streams* — audio waveforms, time-series, gesture/handwriting paths.
+Plan G has only four lanes; path-structured signals are unaddressed.
+
+**Resolution.** Commit `SignatureBasis<const DEPTH: usize>: Basis<f32>`
+as a new concrete impl backing a fifth Plan G lane, wrapping the
+path-signature kernel from the external `lance-graph::sigker` crate.
+
+```rust
+// Concrete impl, lives in ndarray::hpc::signature (new module, ~1 wk)
+impl<const DEPTH: usize> Basis<f32> for SignatureBasis<DEPTH> {
+    fn dim(&self) -> usize { /* truncated tensor-algebra dim at DEPTH */ }
+    fn apply(&self, path: &[f32], signature: &mut [f32]) {
+        // iterated-integral truncation against sigker::signature_truncated
+    }
+    fn invert(&self, _sig: &[f32], _path: &mut [f32]) {
+        // signature → path is many-to-one (tree-quotient); document as N/A
+        unimplemented!("signature inversion is N/A — path unique only up to \
+                        tree-like equivalence per R-14 / Pillar 11")
+    }
+}
+```
+
+**Why `signature_truncated` and not `signature_kernel_pde`.** The
+PDE form in sigker ships a known math bug (PR #350: Goursat-PDE form
+diverges from the true kernel `I₀(2·√⟨u, v⟩)` at moderate inner
+products). The tensor-algebra path (`signature_truncated`) is correct
+today and is what jc Pillar 11 cites. R-15 wraps the truncated path;
+the PDE form becomes available after PR #350 lands.
+
+**Plan G gets a fifth lane.** "Stream signal" mode:
+
+- Input: audio waveform / time-series / gesture stream
+- Codec: `SignatureBasis<DEPTH=3>` truncates path signature, residuals
+  go through standard rANS via the four-mode taxonomy
+- Quality floor: signature-uniqueness preservation per Pillar 11
+- Compression target: ~10× over raw f32 path samples (estimate;
+  calibrate during Plan G)
+
+**Falsifies if.** `SignatureBasis<DEPTH=3>` plus rANS fails to
+keep decoded signatures within Pillar 11's probe thresholds (forward
+error and discrimination ratio; path inversion is N/A). At that point,
+raise DEPTH or fall back to per-block DCT-II for the stream lane.
+
+**Cost.** ~1 week wrapper around `sigker::signature_truncated` +
+basis-trait plumbing + Plan G fifth-lane wiring.
+
+**Cite as R-15 in any PR adding a stream-signal codec lane or
+wiring `SignatureBasis`.**
+
+---
+
 ## 8. The canonical contracts — concrete trait signatures
 
 All three plug-points (per M:E-E) get concrete signatures here. These
@@ -1108,6 +1245,9 @@ that decides whether each holy-grail claim is demonstrated.
 | R-11 (4K 60fps SIMD-batched) | this doc | Plan G video latency assert | Per-CTU encode time | ≤210 ns/CTU on Sapphire Rapids |
 | R-12 (per-CTU flush) | this doc | A8 frame-header parse + decode | First-CTU latency | First CTU decodable before frame complete |
 | R-13 (Option A per-shard) | this doc | Plan F on BERT-glue | 8× compression + accuracy | Holds; else escalate to Option B |
+| R-14 (Pillar 10 active) | this doc | `cargo test -p jc` (default features) | Pflug-Pichler Lipschitz bound | Pillar 10 probe green |
+| R-14 (Pillar 11 active) | this doc | `cargo test -p jc --features hambly-lyons` | Signature uniqueness probe | forward < 1e-9, converse > 0.05, ratio ≥ 1e6 |
+| R-15 (SignatureBasis lane) | this doc | Plan G stream-signal lane | signature-space discrimination under Pillar 11 (forward-only — path inversion is N/A per R-15) | forward < 1e-9, converse > 0.05, ratio ≥ 1e6 (or agreed DEPTH-specific floor) |
 
 **Every row of this matrix is a test.** Plan G's bench harness binary
 emits a JSON report containing the actual measurement for each row;
@@ -1234,7 +1374,7 @@ When this doc is summarised across context windows, preserve at minimum:
 
 1. **The five "merged well" items** from §3 (M:E-A / M:E-D / M:E-G /
    M:E-I / M:E-F) — these are the canon's load-bearing synthesis.
-2. **The thirteen resolutions** R-1 through R-13 with their summary
+2. **The fifteen resolutions** R-1 through R-15 with their summary
    commitments:
    - R-1: `LinearReduce<Basis>` two-trait shape
    - R-2: bit 15 universal, bit 14 consumer-typed
@@ -1242,13 +1382,19 @@ When this doc is summarised across context windows, preserve at minimum:
    - R-4: 4 threshold pairs (video, splat, kv, grad)
    - R-5: DCT crossover ~64 blocks
    - R-6: SSD via VNNI ≥30× over SAD
-   - R-7: tropical-GEMM partition O(4^d) → O(d²)
+   - R-7: tropical-GEMM partition O(4^d) → O(d²) via
+     `bgz17::scalar_sparse::tropical_spmv`
    - R-8: Plan G is confidence gate
    - R-9: topology-FREE codec layer
    - R-10: ~4 bits/Gaussian near target, ~1 bit stretch
    - R-11: 210 ns/CTU SIMD-batched encode
    - R-12: per-CTU flush default; per-bucket Plan F
-   - R-13: Option A (per-shard codebook) for Plan F v1
+   - R-13: Option A (per-shard codebook) for Plan F v1; primitives are
+     `cam_pq` + `bgz-hhtl-d` + `dn_tree` + `merkle_tree`
+   - R-14: formal correctness via `jc::pflug` (Pillar 10) +
+     `jc::hambly_lyons` (Pillar 11, feature-gated)
+   - R-15: `SignatureBasis<DEPTH>: Basis<f32>` as fifth Plan G lane
+     (stream signal)
 3. **The trajectory** from §2 — Phase 0 → A7 → parallelise → Phase 2
 4. **The five-category architecture** including `ndarray-codec`
 5. **The four traits** as the canonical contracts:
@@ -1258,7 +1404,7 @@ When this doc is summarised across context windows, preserve at minimum:
 7. **The falsifiability matrix in §9** — every claim has a test;
    not every claim will pass; that's the design
 
-**Citation IDs in this doc** (R-1 .. R-13) are stable. Canon IDs
+**Citation IDs in this doc** (R-1 .. R-15) are stable. Canon IDs
 (M:E-*, M:H-*, M:H-NEW-*, M:T-*, A:E-*, A:H-*, A:T-*, B:E-*, B:HG-*,
 B:D-*) remain stable per canon's §10. Append, never renumber.
 
diff --git a/.claude/knowledge/pr-x12-woa-multiarch-orchestration.md b/.claude/knowledge/pr-x12-woa-multiarch-orchestration.md
index 0da19ed7..e3f53e81 100644
--- a/.claude/knowledge/pr-x12-woa-multiarch-orchestration.md
+++ b/.claude/knowledge/pr-x12-woa-multiarch-orchestration.md
@@ -1,15 +1,15 @@
 # PR-X12 — WoA Orchestration & Multi-Arch Dispatch Lens
 
 > Date: 2026-05-22
-> Status: **perspective doc** — examines how the orchestration crates (`woa-rs`, `woa`, `q2`, `surrealdb`, `MedCare-rs`, `smb-office-rs`) consume the PR-X12 substrate, and how PR-X12's per-arch dispatch decisions (R-4, R-5, R-11) generalise to the entire HPC stack.
+> Status: **perspective doc** — examines how the orchestration crates (`woa-rs`, `woa`, `q2`, `surrealdb`, `MedCare-rs`, `smb-office-rs`) consume the PR-X12 substrate, and how PR-X12's per-arch polyfill decisions (R-4, R-5, R-11) generalise to the entire HPC stack.
 >
-> Premise: PR-X12 is not just a codec project. It's the **per-arch dispatch contract** that every consumer above `ndarray` will inherit. The codec is the first non-trivial test of whether that contract holds.
+> Premise: PR-X12 is not just a codec project. It's the **per-arch polyfill contract** that every consumer above `ndarray` will inherit. The codec is the first non-trivial test of whether that contract holds.
 
 ---
 
 ## 0. Thesis
 
-**Every consumer crate dispatches kernels across {Intel SPR, AMD Zen 4-5, ARM Graviton 3-4, Apple Silicon, NVIDIA Hopper-Blackwell} via the same `ndarray::hpc` capability traits.** PR-X12's per-arch DCT crossover (R-5) and latency assertion (R-11) aren't codec-specific — they're the canonical shape of how any consumer crate gates fast-paths. If the codec's per-arch story is wrong, the entire HPC consumer ecosystem inherits the bug.
+**Every consumer crate calls the same `ndarray::simd::*` / `ndarray::hpc::*` polyfill surface, regardless of which arch the binary was built for.** The polyfill is a per-arch swap underneath, selected by `cfg(target_feature = ...)` at compile time (per §3 and the W1a contract). PR-X12's per-arch DCT crossover (R-5) and latency assertion (R-11) aren't codec-specific — they're the canonical shape of how any consumer crate's per-arch story bottoms out at the polyfill. If the codec's per-arch story is wrong, the entire HPC consumer ecosystem inherits the bug.
 
 ---
 
@@ -23,18 +23,18 @@ In a real deployment, a `woa-rs` agent processing a request might:
 4. Update node-local cache (`surrealdb`)
 5. Emit response stream (codec again)
 
-Steps 1, 2, 3, 5 all hit the `ndarray::hpc` BLAS layer. Each step has a per-arch fast-path: SPR uses AMX, Zen 4 uses VNNI+AVX-512, Graviton 3 uses SVE2, Apple uses NEON/AMX, Hopper uses tensor cores. **None of the consumer crates know which fast-path is active.** They call `blas_level2::batched_gemm` and the substrate dispatches.
+Steps 1, 2, 3, 5 all bottom out at `ndarray::simd::*` and `ndarray::hpc::*`. Each step is a polyfill consumer: it calls, e.g., `blas_level2::batched_gemm` and gets whatever backend the binary was compiled with. **None of the consumer crates knows which backend is active**, and they MUST NOT: backend-specific symbols (AMX bytecode, AVX-512 asm, NEON intrinsics, SVE2 predicates) live exclusively inside `src/simd_<arch>.rs` and never reach a consumer's source. The fleet ships per-arch binaries (§3.2); each binary embeds one backend file via cfg.
 
-This is what makes PR-X12's R-4 / R-11 architecture-conditional bench gates *substrate policy*, not codec policy. R-4 says "Plan G clears at most on 1 of: SPR / Zen 4 / Graviton 3 / Apple M-class," and R-11 adds latency assertions. That same gate structure applies to:
+This is what makes PR-X12's R-4 / R-11 architecture-conditional bench gates *substrate policy*, not codec policy. R-4 says "Plan G clears on each of: SPR / Zen 4 / Graviton 3 / Apple M-class" (per-arch CI matrix), and R-11 adds per-arch latency assertions. That same gate structure applies to:
 
-- `burn` model serving (forward pass per arch)
-- `candle` quantized inference (q4/q8 per arch)
-- `lance-graph::blasgraph` graph queries (tropical-GEMM per arch)
-- `surrealdb` HNSW search (vector dist per arch)
-- `MedCare-rs` DICOM transform (DCT + wavelet per arch)
-- `smb-office-rs` OCR + layout (conv + attention per arch)
+- `burn` model serving (forward pass: same Rust, per-arch binary)
+- `candle` quantized inference (q4/q8: same Rust, per-arch binary)
+- `lance-graph::blasgraph` graph queries (tropical-GEMM: same Rust, per-arch binary)
+- `surrealdb` HNSW search (vector dist: same Rust, per-arch binary)
+- `MedCare-rs` DICOM transform (DCT + wavelet: same Rust, per-arch binary)
+- `smb-office-rs` OCR + layout (conv + attention: same Rust, per-arch binary)
 
-Every one of these inherits the dispatch contract. PR-X12 is the first to make it visible.
+Every one of these inherits the polyfill contract: identical consumer-facing Rust, one cfg-selected backend per build. PR-X12 is the first to make the parity-test obligation visible.
 
 ---
 
@@ -53,105 +53,121 @@ Every one of these inherits the dispatch contract. PR-X12 is the first to make i
 │   surrealdb, MedCare-rs, smb-office-rs             │
 │   (Each: ~1-5K LoC of generic code + traits)       │
 └────────────────────┬───────────────────────────────┘
-                     │ capability traits, target_feature
+                     │ same Rust API on every arch
                      ▼
 ┌────────────────────────────────────────────────────┐
-│ ndarray::hpc (the dispatch substrate)              │
+│ ndarray::hpc + ndarray::simd (polyfill substrate)  │
 │   blas_level{1,2,3}, fft, cam_pq, activations,     │
 │   simd_int_ops, bf16_tile_gemm                     │
 │   (~15K LoC; PR-X12 ratchets at this layer)        │
 └────────────────────┬───────────────────────────────┘
-                     │ per-arch SIMD intrinsics
+                     │ cfg(target_feature = …) picks ONE
                      ▼
 ┌────────────────────────────────────────────────────┐
-│ Hardware: SPR / Zen / Graviton / Apple / Hopper    │
-└────────────────────────────────────────────────────┘
+│ Backend file (one per binary):                     │
+│   simd_avx512.rs  →  asm/intrinsics + AMX bytecode │
+│   simd_neon.rs    →  NEON / SVE2 intrinsics        │
+│   simd_scalar.rs  →  portable fallback             │
+└────────────────────┬───────────────────────────────┘
+                     ▼
+        Hardware: SPR / Zen / Graviton / Apple
 ```
 
-**WoA never touches `target_feature` directly.** Its job is async scheduling, transport (Q2 over QUIC), persistence (surrealdb), and policy. The SIMD dispatch happens one layer below, in the consumer crates calling `ndarray::hpc`.
+**WoA never touches `target_feature` directly.** Its job is async task scheduling, transport (Q2 over QUIC), persistence (surrealdb), and policy. Per-arch SIMD code lives exclusively inside the backend file (`simd_<arch>.rs`); the polyfill above swaps which file is compiled in via cfg.
 
-This separation is what makes R-3's LoC envelope (≤1500 LoC codec body) tractable. The codec crate doesn't dispatch — it calls the substrate. WoA doesn't dispatch — it calls the codec, which calls the substrate. Per-arch code lives once, in `ndarray::hpc`.
+This separation is what makes R-3's LoC envelope (≤1500 LoC codec body) tractable. The codec crate doesn't choose a backend — it calls the polyfill. WoA doesn't choose a backend — it calls the codec, which calls the polyfill. Per-arch code lives once, inside `src/simd_<arch>.rs`, behind the polyfill surface.
 
 ---
 
-## 3. Per-arch dispatch as a substrate property
+## 3. Per-arch substrate via compile-time polyfill
 
-The PR-X12 substrate (per merged canon §M:E-G, §M:E-H, R-4, R-5, R-11) implements per-arch dispatch via three mechanisms:
+The PR-X12 substrate follows the project's W1a consumer contract (see `CLAUDE.md` and `.claude/knowledge/vertical-simd-consumer-contract.md`): **all dispatch is polyfill**. The stack has three layers, and only the bottom one is allowed to know about specific architectures:
 
-### 3.1 Compile-time `target_feature`
+```text
+┌────────────────────────────────────────────────────────────┐
+│ Consumers — codec encode/decode bodies, downstream crates  │
+│   (ndarray-codec, burn, candle, lance-graph, surrealdb,    │
+│    MedCare-rs, smb-office-rs, q2, WoA scheduler)           │
+│   Call ndarray::simd::* directly. Never name a backend.    │
+└────────────────────────┬───────────────────────────────────┘
+                         │ identical signatures everywhere
+                         ▼
+┌────────────────────────────────────────────────────────────┐
+│ Polyfill surface — src/simd.rs                             │
+│   cfg(target_feature = ...) re-exports exactly ONE backend │
+│   to compile in. Same fn names, same types, every arch.    │
+└────────────────────────┬───────────────────────────────────┘
+                         │ cfg substitutes one file
+                         ▼
+┌────────────────────────────────────────────────────────────┐
+│ Backend — simd_avx512.rs / simd_neon.rs / simd_scalar.rs   │
+│   This is where AMX bytecode, AVX-512 asm/intrinsics,      │
+│   NEON loads, SVE2 predicates LIVE. Implementation detail. │
+│   Consumers above never reach in here.                     │
+└────────────────────────────────────────────────────────────┘
+```
 
-```rust
-// In ndarray::hpc::blas_level2::batched_gemm:
+There is **no runtime CPU detection, no `HwCaps`/`CpuCaps` branching, no `if has_avx512 else …` dispatch, and no `unsafe { runtime_branch }` chain.** The target CPU is fixed at build time via `.cargo/config.toml` (`target-cpu=x86-64-v4` makes AVX-512 mandatory on x86_64) or via the target triple for non-x86 builds. One build, one backend file compiled in, one path.
 
-#[cfg(target_arch = "x86_64")]
-mod x86_dispatch {
-    #[target_feature(enable = "avx512f,avx512bw,avx512vnni")]
-    pub unsafe fn batched_gemm_vnni(...) { /* VNNI path */ }
+### 3.1 The polyfill primitive: cfg-selected per-arch files
 
-    #[target_feature(enable = "amx-tile,amx-int8,amx-bf16")]
-    pub unsafe fn batched_gemm_amx(...) { /* AMX path */ }
-}
+The pattern already shipping in `src/simd*.rs` (per `CLAUDE.md` Repository Structure):
 
-#[cfg(target_arch = "aarch64")]
-mod arm_dispatch {
-    #[target_feature(enable = "sve2")]
-    pub unsafe fn batched_gemm_sve2(...) { /* SVE2 path */ }
+```rust
+// src/simd.rs — consumer-facing surface, re-exports a single backend
+#[cfg(target_feature = "avx512f")]
+pub use crate::simd_avx512::*;
 
-    #[target_feature(enable = "neon,fp16")]
-    pub unsafe fn batched_gemm_neon_fp16(...) { /* Apple Silicon */ }
-}
+#[cfg(all(not(target_feature = "avx512f"), target_arch = "aarch64"))]
+pub use crate::simd_neon::*;
+
+#[cfg(not(any(target_feature = "avx512f", target_arch = "aarch64")))]
+pub use crate::simd_scalar::*;
 ```
 
-### 3.2 Runtime feature detection (cached at process start)
+Each backend file implements the same public functions with identical signatures; **the actual AMX bytecode / AVX-512 asm / NEON intrinsics / SVE2 predicates are contained inside those files** and never escape. The W1a contract requires all three backends + a parity test before any new primitive lands.
 
-```rust
-// In ndarray::hpc::capability:
-pub static CAP: OnceLock<HwCaps> = OnceLock::new();
-
-pub struct HwCaps {
-    pub has_amx: bool,
-    pub has_vnni: bool,
-    pub has_sve2: bool,
-    pub has_neon_fp16: bool,
-    pub l1_cache_size: usize,
-    pub vec_width_bits: u16,
-    // ... more as new features land
-}
+**The codec body is a consumer of this polyfill.** When `ndarray-codec` writes encoding code — Skip/Merge/Delta/Escape mode selection, basin lookups, tropical-GEMM RDO, rANS state-machine ticks, EWA splat composition — it calls `ndarray::simd::*` exactly the way `burn` / `candle` / `lance-graph` do. **The codec does not know it is on AMX.** It does not reach for `simd_avx512::*` directly, does not name a backend symbol, does not branch on architecture. The cfg at the polyfill layer picks the right backend at build time; the encoder is identical Rust across all architectures.
 
-pub fn batched_gemm(input: ...) {
-    let caps = CAP.get().unwrap();
-    if caps.has_amx { unsafe { batched_gemm_amx(input) } }
-    else if caps.has_vnni { unsafe { batched_gemm_vnni(input) } }
-    else if caps.has_sve2 { unsafe { batched_gemm_sve2(input) } }
-    // ...
-    else { batched_gemm_scalar(input) }
-}
-```
+**Escape hatch (rare).** A very small number of hot inner loops may need to drop below the polyfill into a backend-specific intrinsic for performance reasons that the polyfill surface genuinely cannot express. When that happens: the violation lives inside `src/simd_<arch>.rs` (where backend-specific code is already at home), is `cfg`-gated to that arch, is parity-tested against the other backends' equivalent, and gets a `// SAFETY:` + agent audit per `CLAUDE.md`'s sentinel-qa rule. **It is the exception, not the model.** No consumer crate — codec body included — is ever the right place for it.
 
-### 3.3 Per-arch tunable crossover (R-5 generalised)
+### 3.2 Build-time CPU selection (not runtime detection)
 
-Some operations have a "small N: scalar, large N: SIMD" crossover that varies per arch:
+Target CPU is decided once, at build time:
+
+| Mechanism | Source | Effect |
+|---|---|---|
+| `.cargo/config.toml` `target-cpu=x86-64-v4` | repo policy | AVX-512 mandatory on x86_64 (per `CLAUDE.md`) |
+| `--target aarch64-apple-darwin` | CI / fleet build matrix | NEON-fp16 backend compiles in |
+| `--target aarch64-unknown-linux-gnu` + SVE2 target-feature | Graviton build | SVE2 path in `simd_neon.rs` compiles in |
+
+The WoA fleet ships **per-arch binaries**, not a fat binary that probes. Q2 distributes the right binary to each node based on the node's already-known architecture (declared at registration time, not detected per request). Cross-arch determinism (§6 below) is enforced because each binary embeds exactly one backend and the W1a parity test gates every primitive at the substrate layer.
+
+### 3.3 Per-arch tunable crossover (R-5)
+
+Some operations (DCT-II vs GEMM, basin-lookup width, etc.) have a small-N / large-N crossover (for the DCT: butterfly path below the threshold, GEMM path at or above it) whose break-even N varies per backend. Both paths go through the polyfill, and the crossover lives in the **same polyfill** as the SIMD primitives: a `cfg(target_feature = ...)`-selected `const`.
 
 ```rust
-const DCT_BATCH_CROSSOVER: usize = match Arch::CURRENT {
-    Arch::SapphireRapids => 64,   // AMX wins above this
-    Arch::IceLakeServer => 32,    // AVX-512 narrower; lower crossover
-    Arch::Zen4 => 96,             // Zen's AVX-512 emulation widens crossover
-    Arch::AppleM3 => 256,         // NEON's narrower; only worth at large N
-    Arch::GravitonV3 => 128,      // SVE2 mid-range
-    Arch::Generic => usize::MAX,  // Always scalar fallback
-};
+// src/hpc/dct_crossover.rs — one const per backend file, cfg-selected
+//
+//   simd_avx512.rs:                pub const DCT_BATCH_CROSSOVER: usize = 64;
+//   simd_neon.rs (Apple Silicon):  pub const DCT_BATCH_CROSSOVER: usize = 256;
+//   simd_scalar.rs:                pub const DCT_BATCH_CROSSOVER: usize = usize::MAX;
 
 pub fn dct_apply<const N: usize>(input: &[i16], output: &mut [i16]) {
     if N >= DCT_BATCH_CROSSOVER {
-        unsafe { dct_gemm_path(input, output) }
+        dct_gemm_path(input, output)      // calls into ndarray::simd::*
     } else {
-        dct_butterfly_path(input, output)
+        dct_butterfly_path(input, output) // also calls into ndarray::simd::*
     }
 }
 ```
 
-R-5 commits these crossovers as **bench-tunable constants**, not hand-guessed numbers. Plan G's codec-bench includes a calibration sub-target that emits the right `const` values per arch via build script.
+The integer `DCT_BATCH_CROSSOVER` comes from one of two places:
+1. **Hand-tuned default**: a known-good number per backend, checked into the backend file.
+2. **Plan G calibration override**: `build.rs` may consult `CARGO_CFG_TARGET_FEATURE` + a pre-recorded calibration artifact from `codec-bench` and emit a refined const into `OUT_DIR`, included by the backend file. This is still compile-time selection — the build script never probes the host CPU, only reads Cargo's target-config env vars.
+
+Either way the constant is **fixed in the compiled binary**. R-5 commits these crossovers as bench-tunable but compile-time-fixed; the `cfg(target_feature)`-selected backend file is the single source of truth.
 
 ---
 
@@ -173,7 +189,7 @@ PR-X12 (R-11) commits a budget on `T_codec`:
 | Tropical-GEMM RDO | ≤ 50 µs per CTU on SPR | derived from R-7 cost analysis |
 | Basis::apply (DCT) | ≤ 2 µs per 32×32 block on SPR | derived from R-5 |
 
-**WoA's contract:** if any of these are violated on a supported arch, the consumer can either accept the slowdown or refuse to schedule the request. WoA has visibility into per-arch dispatch quality via the substrate's metrics endpoint:
+**WoA's contract:** if any of these are violated on a supported arch, the consumer can either accept the slowdown or refuse to schedule the request. WoA has visibility into per-arch polyfill performance (which backend was compiled into the binary it's running, plus stage-latency telemetry) via the substrate's metrics endpoint:
 
 ```rust
 ndarray::hpc::metrics::stage_latency_p99(stage: StageId) -> Duration;
@@ -228,7 +244,7 @@ This is a model for many features that look "out of scope" for PR-X12 but actual
 
 - Federated codebook → swap pointer to handle (R-13)
 - 3DGS scene anchor → add SceneAnchor header_kind (x266 doc)
-- GPU offload → add `Reducer::dispatch_target() -> DispatchTarget` (Plan E adjacent)
+- GPU offload → add a `Reducer::backend_target() -> BackendTarget` hook to let consumers opt into a GPU polyfill at compile time (Plan E adjacent; still cfg-selected, not runtime-branched)
 - Speculative decode → add `Frame::is_speculative()` bit in header reserved field
 
 None of these are PR-X12 scope. All of them require ≤50 LoC of "anchor" in PR-X12. The discipline of M:H-NEW-2 + R-3's LoC envelope is what makes future anchoring possible without forking the codec.
@@ -290,7 +306,7 @@ Quick tour of what each crate inherits from PR-X12 substrate decisions:
 
 ### 8.1 `burn` (model training/inference)
 
-Uses `blas_level3::gemm` for matrix multiply, `activations` for nonlinearities, `cam_pq` for KV cache compression. Per-arch dispatch via the same target_feature paths. Will benefit directly from PR-X12's R-4 / R-11 latency-assertion infrastructure when it lands (burn has wanted this for ~14 months).
+Uses `blas_level3::gemm` for matrix multiply, `activations` for nonlinearities, `cam_pq` for KV cache compression. Per-arch polyfill via the same `cfg(target_feature)` mechanism — `burn` itself never names a backend. Will benefit directly from PR-X12's R-4 / R-11 latency-assertion infrastructure when it lands (burn has wanted this for ~14 months).
 
 ### 8.2 `candle` (quantized inference)
 
@@ -323,7 +339,7 @@ Owns the federation policy (R-13), the codec version negotiation, and the per-ar
 In light of the above, the irreducible commitments PR-X12 must keep for the consumer ecosystem:
 
 1. **Substrate API stability** — `blas_level2::batched_gemm`, `cam_pq::kmeans`, `fft::dct_apply`, `activations::conv2d` keep their signatures across PR-X12 changes. Additions OK, breaks not OK.
-2. **Per-arch dispatch transparency** — consumers continue calling capability-trait methods; the substrate continues choosing the right SIMD path.
+2. **Per-arch polyfill transparency** — consumers continue calling the `ndarray::simd::*` / `ndarray::hpc::*` surface unchanged across arches; cfg at the polyfill layer selects exactly one backend at build time. Consumers never name a backend symbol.
 3. **`Reducer<T>` ordered-sum guarantee** — any consumer using `OrderedKahanReducer` (or similar) continues to get bit-exact cross-arch reductions.
 4. **Latency-assertion CI infrastructure** — R-11's framework is consumer-callable for their own benches; not codec-private.
 5. **Codebook handle indirection** (R-13) — the codec ships with the handle pattern, consumers can swap codebooks without forking.
diff --git a/.claude/knowledge/pr-x12-x266-3dgs-spacetime-upscaling.md b/.claude/knowledge/pr-x12-x266-3dgs-spacetime-upscaling.md
index 14ba0f2d..b22eb80a 100644
--- a/.claude/knowledge/pr-x12-x266-3dgs-spacetime-upscaling.md
+++ b/.claude/knowledge/pr-x12-x266-3dgs-spacetime-upscaling.md
@@ -268,12 +268,14 @@ Nothing in this doc is in PR-X12 scope. What it requires from PR-X12:
 
 | Requirement | Source | Status |
 |---|---|---|
-| `Basis<T>` trait with parametric `apply` | R-1, M:E-A | landed in concept; implementation in Plan A4 |
+| `Basis<T>` trait with parametric `apply` | R-1, M:E-A | **canon-fixed** (R-1 trait shape committed); **implementation** scheduled in Plan A4 |
 | EWA splat rasterizer as `Basis<f16>` impl | Plan E | scheduled |
-| Codec body decoupled from specific basis | M:H-NEW-2 LoC envelope | enforced via R-3 audit |
-| Header byte stable across basis swaps | R-2, M:E-J bits 0-1 | landed |
+| Codec body decoupled from specific basis | M:H-NEW-2 LoC envelope | enforced via R-3 audit rule (doc commitment; CI check pending) |
+| Header byte stable across basis swaps | R-2, M:E-J bits 0-1 | **canon-fixed** (R-2 commits bits 0-1 = `header_kind`); wire-format implementation in Plan A8 |
 | Plan G video lane validates per-arch latency | R-4, R-11 | scheduled |
-| Federated codebook policy for scene anchors | R-13 | landed |
+| Federated codebook policy for scene anchors | R-13 | **canon-fixed** (R-13 commits Option A: per-shard codebook for Plan F v1); implementation in Plan F |
+
+**"Canon-fixed"** = the resolution doc commits the design; **"scheduled"** = the implementation has a named plan card. None of the above have shipping code today.
 
 The path to x266-like capability is: