docs(pr-x12): address PR #197 CodeRabbit review — 7 open items

claude · claude · commit 21b61ebba0de · 2026-05-22T17:07:00.000Z
Arithmetic errors
- R-11 (canon-resolutions): 132,710 was 8×8 leaves/frame, not 64×64 CTUs (which is 2,040). Relabelled the per-CTU breakdown table and budget derivation to per-leaf; clarified the 1 CTU = ~64 leaves at max split depth relationship. Latency narrative unchanged.
- R-12 (canon-resolutions): per-CTU flush rate at 4K was 80,000/sec but 2,040 CTUs × 60 fps = 122,400/sec; 1080p figure refined to 30,600/sec (510 CTUs × 60).
- x266 §6 bandwidth: HEVC at 4K/60 from 1080p/30 scales as 4 (res) × 2 (fps) × 6.25 MB = 50 MB, not 4 × 4 × 6.25 = 100 MB. PR-X12 wins by ~6× (not 12×); crossover ~1.3× (not 3×).

Bits 14-15 consistency (3 sites reconciled to R-2 canon)
- merged-canon M:E-J §3 formalisation note: replaced stale 'bits 14-15 = leaf_size' with R-2's bit 15 = UNIVERSAL inter-tier / bit 14 = CONSUMER-TYPED via ConsumerProfile. Leaf size lives in Ctu<const N> at the type level (M:E-G), not in header bits.
- x266 §5 header layout: same R-2 alignment for the HEVC-compatible header block.
- anti-neural §6: enhancement-layer flag moved from per-leaf bit 14 (claimed by R-2 consumer-typed demux) to the frame header reserved area (alongside ConsumerProfile and FlushUnit per R-2/R-12).

Path / citation corrections (cam-pq-sigker-dn-tree doc)
- Absolute /home/user/ndarray/... paths replaced with repo-relative src/hpc/... since this repo IS ndarray.
- /home/user/lance-graph/... paths marked as external repo (adaworldapi/lance-graph), not resolvable in this checkout.
- Hambly-Lyons 2010 citation now includes arXiv:math/0507536 and Annals of Mathematics 171(1):109-167.

All claims re-verified by reading each affected file in full (canon-resolutions 1281 lines, x266 321, merged-canon 647, anti-neural 337, cam-pq 395) before edits, the same protocol applied to the bgz-jc proofread.

https://claude.ai/code/session_01HbqooFZHAjaUtFEzhA1R2u
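
The corrected arithmetic can be re-derived with a short Python sketch. This is an illustrative check, not code from the repo: the helper names (`ctus_per_frame`, `flushes_per_sec`, `hevc_scaled_mb`) are hypothetical. It assumes that partial CTUs at the frame edge round up to whole CTUs (which is what yields 2,040 and 510) and that HEVC size scales linearly with pixel count and frame rate, as the x266 §6 comparison does.

```python
# Illustrative re-derivation of the corrected figures; helper names are
# hypothetical and not part of the ndarray repo.

def ctus_per_frame(width: int, height: int, ctu: int = 64) -> int:
    """CTUs per frame, assuming partial CTUs at the frame edge round up."""
    cols = -(-width // ctu)   # ceiling division
    rows = -(-height // ctu)
    return cols * rows

def flushes_per_sec(width: int, height: int, fps: int, ctu: int = 64) -> int:
    """Per-CTU flush rate: one flush per CTU per frame."""
    return ctus_per_frame(width, height, ctu) * fps

def hevc_scaled_mb(native_mb: float, res_factor: float, fps_factor: float) -> float:
    """HEVC size re-encoded at a higher target, assuming linear scaling
    with pixel count and frame rate (the model used in x266 §6)."""
    return native_mb * res_factor * fps_factor

if __name__ == "__main__":
    print(ctus_per_frame(3840, 2160))        # 2040 (60 x 34)
    print(ctus_per_frame(1920, 1080))        # 510  (30 x 17)
    print((64 * 64) // (8 * 8))              # 64 leaves per CTU at max split
    print(flushes_per_sec(3840, 2160, 60))   # 122400
    print(flushes_per_sec(1920, 1080, 60))   # 30600
    hevc = hevc_scaled_mb(6.25, 4, 2)        # 1080p/30 -> 4K/60
    print(hevc)                              # 50.0
    print(round(hevc / 8.3, 1))              # 6.0  (PR-X12 advantage)
    print(round(8.3 / 6.25, 2))              # 1.33 (crossover factor)
```

The 2,160-pixel frame height is not a multiple of 64, so the ceiling division matters: a floor would give 60 × 33 = 1,980 CTUs and understate the 4K flush rate.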
diff --git a/.claude/knowledge/pr-x12-anti-neural-lookup-inversion.md b/.claude/knowledge/pr-x12-anti-neural-lookup-inversion.md
@@ -234,7 +234,7 @@ For these use cases, the right architecture is a **layered codec**:
 1. **Base layer:** PR-X12 frozen-lookup codec for the bits-actually-transmitted
 2. **Enhancement layer:** NN generative refinement at the decoder (optional, off by default)
 
-The base layer guarantees fidelity bounded by Shannon. The enhancement layer provides perceptual hallucination when the user opts in. PR-X12's wire format reserves a single bit (M:E-J bit 14 currently used for leaf_size; one of the reserved bits in future revisions) for the "enhancement layer available" flag.
+The base layer guarantees fidelity bounded by Shannon. The enhancement layer provides perceptual hallucination when the user opts in. PR-X12's wire format reserves a single bit in the **frame header** (alongside `ConsumerProfile` and `FlushUnit` per R-2 / R-12) for the "enhancement layer available" flag — not in the per-leaf 16-bit header, whose bit 14 is already claimed by R-2's consumer-typed demux and whose bit 15 is the universal inter-tier reference.
 
 This is also the right architecture for high-stakes content (legal, medical, scientific): always run the base layer, never run the enhancement layer. Determinism preserved.
 
diff --git a/.claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md b/.claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md
@@ -17,7 +17,7 @@
 
 ### 1.1 What it is
 
-**Location:** `/home/user/ndarray/src/hpc/cam_pq.rs`
+**Location:** `src/hpc/cam_pq.rs` (this repo)
 
 **Algorithm:** Content-Addressable Memory (CAM) + Product Quantization (PQ). Unifies FAISS PQ6×8 (48-bit fingerprints, 6 subspaces × 256 centroids each) with CLAM 48-bit archetypes into a single codec.
 
@@ -108,7 +108,7 @@ The codebook implementation is `cam_pq::CamCodebook`. The four policy variants c
 
 ### 2.1 What it is
 
-**Location:** `/home/user/lance-graph/crates/sigker/`
+**Location:** `crates/sigker/` in the external `adaworldapi/lance-graph` repo (not in this `ndarray` repo)
 
 **Algorithm:** Path-signature representations for sequential / path-structured data. Implements Chen-Lyons signatures S(X) = (1, ∫dX, ∫∫dX⊗dX, …) up to depth N, with shuffle-product algebra and proven uniqueness.
 
@@ -134,7 +134,7 @@ pub struct CodecRouteSigker { /* lance-graph codec routing integration */ }
 |---|---|---|
 | Chen, "Iterated integrals and exponential homomorphisms" | 1957 | Original signature construction |
 | Lyons, "Differential equations driven by rough signals" | 1998 | Rough path theory, signature universal approximator |
-| Hambly-Lyons, "Uniqueness for the signature of a path of bounded variation" | 2010 | **Theorem 4: signatures uniquely determine paths up to tree-like equivalence** |
+| Hambly-Lyons, "Uniqueness for the signature of a path of bounded variation" (**arXiv:math/0507536**, Annals of Mathematics 171(1):109–167) | 2010 | **Theorem 4: signatures uniquely determine paths up to tree-like equivalence** |
 | Salvi-Cass-Foster-Lyons-Lemercier | 2020 | **arXiv:2006.14794** — Goursat-PDE solver for signature kernel, O(T₁·T₂·d), no signature materialization |
 | Cuchiero-Schmocker-Teichmann | 2021 | **Randomized signature universality**: any continuous path-functional ≈ linear combo of randomized-signature coordinates |
 
@@ -206,7 +206,7 @@ This unlocks: **path-structured codec lanes** in Plan G (audio waveforms, time-s
 
 ### 3.1 dn_tree — quaternary plastic memory
 
-**Location:** `/home/user/ndarray/src/hpc/dn_tree.rs`
+**Location:** `src/hpc/dn_tree.rs` (this repo)
 
 **Algorithm:** Quaternary hierarchical bitmap summary tree for plastic graph traversal. Adapted from "On Demand Memory Specialization for Distributed Graph Processing" (2013). Properties:
 
@@ -222,7 +222,7 @@ This unlocks: **path-structured codec lanes** in Plan G (audio waveforms, time-s
 
 ### 3.2 merkle_tree — integrity proof for CogRecord regions
 
-**Location:** `/home/user/ndarray/src/hpc/merkle_tree.rs`
+**Location:** `src/hpc/merkle_tree.rs` (this repo)
 
 **Algorithm:** 8-Kbit Merkle tree built from CogRecord regions as a compressed searchable proxy. Properties:
 
@@ -376,15 +376,16 @@ This doc (#4) and the bgz/jc doc (#3) are the ones that ground PR-X12 in working
 - **GGUF lens (activation-aware RDO claim):** `pr-x12-gguf-llm-weights-encoding.md` §5 — supported by G-1 closure
 - **Anti-neural lens (lookup-table cost analysis):** `pr-x12-anti-neural-lookup-inversion.md` §3 — supported by G-4 + G-5 closure
 - **Multi-arch lens (determinism + integrity):** `pr-x12-woa-multiarch-orchestration.md` §6 — supported by G-4 + G-7 closure
-- **Source code references:**
-  - `/home/user/ndarray/src/hpc/cam_pq.rs` — the codebook trainer
-  - `/home/user/ndarray/src/hpc/dn_tree.rs` — quaternary plastic memory
-  - `/home/user/ndarray/src/hpc/merkle_tree.rs` — Blake3-48-bit Merkle
-  - `/home/user/lance-graph/crates/sigker/` — Chen-Lyons signatures
-  - `/home/user/lance-graph/crates/sigker/src/` — `signature_kernel_pde`, `RandomizedSignature`, `CodecRouteSigker`
-  - `/home/user/lance-graph/crates/jc/src/hambly_lyons.rs` — Pillar 11 (active under `--features hambly-lyons`; DEFERRED only in default zero-dep build)
-  - `/home/user/lance-graph/crates/jc/src/pflug.rs` — Pillar 10 (nested-distance Lipschitz on Sigma DN-trees, certifies CAM-PQ)
-  - `/home/user/lance-graph/crates/bgz-tensor/src/adaptive_codec.rs` — cam_pq imports
+- **Source code references (in this repo `adaworldapi/ndarray`):**
+  - `src/hpc/cam_pq.rs` — the codebook trainer
+  - `src/hpc/dn_tree.rs` — quaternary plastic memory
+  - `src/hpc/merkle_tree.rs` — Blake3-48-bit Merkle
+- **Source code references (external repo `adaworldapi/lance-graph`):**
+  - `crates/sigker/` — Chen-Lyons signatures
+  - `crates/sigker/src/` — `signature_kernel_pde`, `RandomizedSignature`, `CodecRouteSigker`
+  - `crates/jc/src/hambly_lyons.rs` — Pillar 11 (active under `--features hambly-lyons`; DEFERRED only in default zero-dep build)
+  - `crates/jc/src/pflug.rs` — Pillar 10 (nested-distance Lipschitz on Sigma DN-trees, certifies CAM-PQ)
+  - `crates/bgz-tensor/src/adaptive_codec.rs` — cam_pq imports
 - **arXiv anchors for sigker:**
   - **2006.14794** (Salvi-Cass-Foster-Lyons-Lemercier 2020) — Goursat PDE for signature kernel
   - Hambly-Lyons 2010 — signature uniqueness theorem
diff --git a/.claude/knowledge/pr-x12-substrate-canon-resolutions.md b/.claude/knowledge/pr-x12-substrate-canon-resolutions.md
@@ -767,11 +767,17 @@ to the budget.
 ```text
 4K = 3840 × 2160 = 8.3 M pixels
 60 fps = 16.67 ms/frame
-At 64×64 CTU: 132,710 CTUs/frame
-Per-CTU budget: 16.67 / 132710 = 125 ns/CTU
+At 8×8 leaf granularity (HEVC's smallest CU; the unit at which the
+encoder's inner-loop work is paid):
+                              132,710 leaves/frame
+                              (= 2,040 CTUs/frame at 64×64, × ~64
+                               leaves/CTU at maximum split depth;
+                               130,560 = 2,040 × 64 (unpadded 3840·2160/64
+                               = 129,600), with ~1.6 % bias for chroma alignment)
+Per-leaf budget: 16.67 ms / 132,710 = 125 ns/leaf
 ```
 
-**Encoder per-CTU breakdown (scalar reference, current):**
+**Encoder per-leaf breakdown (scalar reference, current):**
 
 | Stage | Scalar cost | SIMD-batched target |
 |-------|-------------|---------------------|
@@ -781,12 +787,12 @@ Per-CTU budget: 16.67 / 132710 = 125 ns/CTU
 | transform (A4, 8×8 DCT-II butterfly) | ~30 ns | ~30 ns |
 | quantize (i8 round) | ~5 ns | ~5 ns |
 | rANS encode (A7) | ~40 ns | ~40 ns |
-| **Total per-CTU** | **~960 ns** | **~210 ns** |
+| **Total per-leaf** | **~960 ns** | **~210 ns** |
 
-**At scalar reference (960 ns/CTU): 4K @ 60 fps requires 132710 ×
+**At scalar reference (960 ns/leaf): 4K @ 60 fps requires 132,710 ×
 960 ns = 127 ms/frame. Misses 60 fps by 7.6×.**
 
-**At SIMD-batched (210 ns/CTU): 132710 × 210 ns = 28 ms/frame. Misses
+**At SIMD-batched (210 ns/leaf): 132,710 × 210 ns = 28 ms/frame. Misses
 60 fps by 1.7×; needs further work but in the same order of magnitude.**
 
 **To hit 60 fps 4K real-time** requires the SIMD-batched-encode path
@@ -797,8 +803,8 @@ reference only.
 **Implication for Plan G.** The `--mode video` threshold (R-4)
 includes a latency assertion: total encode time for the Big Buck Bunny
 1080p clip must complete within (clip duration × 0.5). At 1080p that's
-33,825 CTUs/frame × 210 ns × 30 fps = ~213 ms/sec, well within budget.
-4K is the stretch target.
+~32,400 leaves/frame × 210 ns × 30 fps = ~204 ms/sec, well within
+budget. 4K is the stretch target.
 
 **Cite as R-11 in any encoder-path PR description; the latency
 budget is the gate that determines whether SIMD-batched encode is P0
@@ -813,13 +819,14 @@ Different answers make Plan A8 substantially different shapes.
 
 **Resolution.** Commit per-CTU as the default; per-bucket for Plan F.
 
-**Per-CTU flush (committed default):**
+**Per-CTU flush (committed default; CTU = 64×64 cells, so 4096 cells/CTU,
+2,040 CTUs/frame at 4K and ~510 CTUs/frame at 1080p):**
 
 ```text
 Buffer size:   ~12 KB per CTU
                  = 4096 cells × avg 3 bytes (mode-distribution per R-10)
-Flush rate:    ~80,000 flushes/sec at 4K 60 fps  (132710 CTU/frame × 60)
-               ~30,000 flushes/sec at 1080p 60 fps
+Flush rate:    ~122,400 flushes/sec at 4K 60 fps  (2,040 CTUs/frame × 60)
+               ~30,600 flushes/sec at 1080p 60 fps (510 CTUs/frame × 60)
 Latency:       sub-ms per CTU; consumer can start decoding the first
                CTU before encoder finishes the frame
 ```
diff --git a/.claude/knowledge/pr-x12-substrate-merged-canon.md b/.claude/knowledge/pr-x12-substrate-merged-canon.md
@@ -278,7 +278,18 @@ pub trait PredictiveSignal {
 
 ### M:E-J — The reserved header bits 14-15 carry causal-edge metadata for free
 
-> [Formalised post-merge as **R-2**: 16-bit header bit layout pinned — bits 0-1 = `header_kind`, bits 2-13 = `basin_index`, bits 14-15 = `leaf_size ∈ {8,16,32,64}` (which subsumes M:E-G's `Ctu<const N>`). The causal-tier reading below remains valid but is now the *interpretation*, not the on-wire layout — see R-2 and R-8.]
+> [Formalised post-merge as **R-2**: 16-bit header bit layout pinned —
+> bits 0-1 = `header_kind`, bits 2-13 = `basin_index`,
+> **bit 15 = UNIVERSAL "has inter-tier reference"** (identical across
+> all four consumers; A3-inter cross-tier link),
+> **bit 14 = CONSUMER-TYPED via the frame header's `ConsumerProfile`
+> tag** (cognitive: Pearl-rung high bit; video: reserved=0;
+> splat: LOD-cascade-source flag; gradient: worker-shard parity).
+> Leaf size (8/16/32/64) is encoded structurally via M:E-G's
+> `Ctu<const N>` at the type level, NOT in header bits 14-15. The
+> causal-tier reading below is the historical motivation for bit 14;
+> R-2 generalises it to the four-consumer demux. See
+> `pr-x12-substrate-canon-resolutions.md` §R-2.]
 
 A's E-15 (reserved bits 14-15 are inter-tier link) + A's T-22 (causal-edge v2 mantissa: Intervention=+6, Counterfactual=-6):
 
diff --git a/.claude/knowledge/pr-x12-x266-3dgs-spacetime-upscaling.md b/.claude/knowledge/pr-x12-x266-3dgs-spacetime-upscaling.md
@@ -163,7 +163,13 @@ Building on M:E-J's 16-bit header layout (header_kind ∈ {Skip, Merge, Delta, E
 HEVC-compatible PR-X12 header (16 bits, R-2):
     bits 0-1:   header_kind {Skip, Merge, Delta, Escape}
     bits 2-13:  basin_index (12 bits, M:E-J)
-    bits 14-15: leaf_size ∈ {8, 16, 32, 64}
+    bit  14:    CONSUMER-TYPED (semantic per frame-header `ConsumerProfile`;
+                cognitive: Pearl-rung high bit; video: reserved=0;
+                splat: LOD-cascade-source flag; gradient: worker-shard parity)
+    bit  15:    UNIVERSAL "has inter-tier reference" (A3-inter); identical
+                across all four consumers
+    NOTE: leaf-size (8/16/32/64) is encoded structurally via `Ctu<const N>`
+    (M:E-G) at the type level, not via header bits.
 
 x266 extension (NOT in PR-X12 scope, future):
     bits 0-1:   header_kind, now 4 variants
@@ -214,16 +220,17 @@ PR-X12 + 3DGS anchor (single anchor for the clip):
 → HEVC wins by ~25% for native (1080p, 30 fps) playback.
 
 BUT for 4K @ 60 fps playback:
-    HEVC: re-encode at 4K/60fps target = 4 × 4 × 6.25 = 100 MB
-            (or super-res upscaling at decode = 6.25 MB + neural inference)
+    HEVC: re-encode at 4K/60fps target = 4 (res) × 2 (fps) × 6.25 MB = 50 MB
+            (4× pixel scaling × 2× framerate scaling × 6.25 MB native size;
+             or super-res upscaling at decode = 6.25 MB + neural inference)
     PR-X12 + 3DGS: same 8.3 MB
             decoder rasterizes at (4K, 60 fps); the math is in the scene
 
-→ PR-X12 wins by 12× for high-resolution playback,
+→ PR-X12 wins by ~6× for high-resolution playback,
    AND playback is deterministic (no neural model versioning).
 ```
 
-**Where the crossover sits:** PR-X12 + 3DGS becomes a win when the playback target (W × H × fps) exceeds the encode target by ~3×. At 1× (native), HEVC is a hair cheaper. At 12× (4K@60 from 1080p@24), PR-X12 dominates.
+**Where the crossover sits:** PR-X12 + 3DGS becomes a win when the playback target (W × H × fps) exceeds the encode target by ~1.3× (the point at which HEVC's re-encoded size crosses the fixed 8.3 MB PR-X12 budget). At 1× (native), HEVC is a hair cheaper. At 8× pixel-bandwidth (4K@60 from 1080p@30), PR-X12 dominates by ~6×.
 
 This matches the intuition that **3DGS is a scene model**, not a frame model — its compression ratio improves with resolution, while HEVC's degrades.