
Commit 4d0ec96

docs(codec): address PR #196 CodeRabbit + Codex review
Two CodeRabbit MD040/MD027 findings + one Codex P2 stale-fact finding, all verified valid against current code.

CodeRabbit (markdownlint):
- pr-x12-codec-cognitive-substrate-mapping.md: 4 untyped code fences → ```text (Skip/Merge/Delta block at line 56; depth-0..3 quad-tree blocks for spatial/attention/gradient hierarchies at lines 80/91/104)
- pr-x12-substrate-merged-canon.md: collapse the multi-space blockquote list prefix to `> - ` (lines 6-7, MD027 multi-space-after-blockquote), 2 untyped fences → ```text (architecture-rule list at line 165, sequencing diagram at line 411), and "delta IS" → "delta is" (stylistic)

Codex P2 (stale doc vs shipped code):
- Doc cited `MAX_BASIN_IDX = 4095` and framed the BASIN_NONE collision as "pending / not yet merged / not yet pushed", but PR #195 commit 2423298 already shipped `MAX_BASIN_IDX = 4094` with `BASIN_NONE = 4095` as a reserved sentinel, plus bijective `pack_leaf` via the `?` operator with 3 regression tests.

Updated:
- §4.1 line 125: `MAX_BASIN_IDX = 4094` + sentinel-range explainer
- §4.3 heading + body: "(resolved in PR #195)" with commit cite
- §"Still open on PR #195" block: → "Resolved in PR #195 follow-up"
- §12 debt-inventory T-1, T-2: marked ~~RESOLVED~~ with commit ref

No content edits beyond what the findings asked for; citation IDs unchanged.
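For readers without PR #195 open, the two code facts this doc update tracks can be sketched as follows. This is a minimal illustration, not the shipped `mode.rs`: the `Mode` enum, the `LeafCu` field set, the tag values, the header layout (tag in the bits above the 12-bit basin field), and the return type are all assumptions made for the example. Only the constant values and the use of `?` in place of `unwrap_or` come from the commit message above.

```rust
// Hypothetical sketch; field names and bit layout are assumptions.
pub const BASIN_BITS: u32 = 12;
/// Reserved "no basin assigned" sentinel: the all-ones 12-bit pattern.
pub const BASIN_NONE: u16 = (1 << BASIN_BITS) - 1; // 4095
/// Largest index that refers to a real codebook entry.
pub const MAX_BASIN_IDX: u16 = (1 << BASIN_BITS) - 2; // 4094

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Skip,
    Merge,
    Delta,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LeafCu {
    pub mode: Mode,
    pub basin: Option<u16>,
    pub merge_dir: Option<u8>,
    pub delta: Option<i8>,
}

/// Packs a leaf into a 16-bit header plus an optional payload byte.
/// Returns `None` for a malformed leaf instead of silently rewriting it.
pub fn pack_leaf(leaf: &LeafCu) -> Option<(u16, Option<u8>)> {
    let basin = match leaf.basin {
        Some(idx) if idx <= MAX_BASIN_IDX => idx,
        Some(_) => return None, // 4095 and above would collide with the sentinel
        None => BASIN_NONE,
    };
    let (tag, payload) = match leaf.mode {
        Mode::Skip => (0u16, None),
        // `?` propagates a missing field as None; `unwrap_or(0)` would hide it.
        Mode::Merge => (1, Some(leaf.merge_dir?)),
        Mode::Delta => (2, Some(leaf.delta? as u8)),
    };
    Some(((tag << BASIN_BITS) | basin, payload))
}
```

With the sentinel outside the valid index range, a header whose basin field reads 4095 can only mean "no basin", and a Merge leaf without a direction fails to encode rather than decoding later as a different leaf.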
1 parent 4529c01 commit 4d0ec96

2 files changed

Lines changed: 18 additions & 18 deletions


.claude/knowledge/pr-x12-codec-cognitive-substrate-mapping.md

Lines changed: 13 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -53,7 +53,7 @@ bit 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
5353

5454
Two bits = four modes. Not more, not fewer. The four modes form a **strict cost lattice**:
5555

56-
```
56+
```text
5757
Skip (2 bytes total) ⊂ free
5858
Merge (3 bytes) ⊂ borrow from neighbour
5959
Delta (3 bytes) ⊂ store quantized perturbation
@@ -77,7 +77,7 @@ Monotone cost ordering is what lets `predict_intra` use a "first-fit cheapest" d
7777

7878
### 3.1 Spatial hierarchy (HEVC's original use)
7979

80-
```
80+
```text
8181
depth 0 (64×64 CTU) ↔ one BlockedGrid L1 block (`ctu.rs:236`)
8282
depth 1 (32×32 split) ↔ one CU at split-level 1
8383
depth 2 (16×16 split) ↔ ...
@@ -88,7 +88,7 @@ depth 3 (8×8 leaf) ↔ leaf CU (smallest cognitively-meaningful unit)
8888

8989
### 3.2 Attention hierarchy (rediscovery as transformer)
9090

91-
```
91+
```text
9292
depth 0 ↔ one attention layer
9393
depth 1 ↔ one attention head (32×32 = 1024 attention slots)
9494
depth 2 ↔ one multi-query-attention group (4 heads sharing KV)
@@ -101,7 +101,7 @@ Mistral / Llama4 sliding-window attention is **exactly depth-3 leaf processing**
101101

102102
### 3.3 Gradient-update hierarchy (the optimizer mapping)
103103

104-
```
104+
```text
105105
depth 0 ↔ "should this parameter tensor be touched this step?"
106106
depth 1 ↔ "which block of this tensor needs update?"
107107
depth 2 ↔ "which row of that block?"
@@ -122,7 +122,7 @@ This is **what DeepSpeed-ZeRO does informally** with `bf16_compress`, `int8_comp
122122

123123
### 4.1 The 12-bit basin = 4096-entry vocabulary
124124

125-
`MAX_BASIN_IDX = (1 << 12) - 1 = 4095` (`mode.rs:71`). Each `LeafCu` carries a 12-bit index into the per-frame basin codebook. For:
125+
`MAX_BASIN_IDX = (1 << 12) - 2 = 4094` (`mode.rs:79`), with `BASIN_NONE = 4095` reserved as the absent-basin sentinel — the 12-bit header field encodes `0..=4094` for real basins plus `4095` for "no basin assigned". Each `LeafCu` carries a 12-bit index into the per-frame basin codebook. For:
126126

127127
- **Video**: 4096 palette entries per GOP — orders of magnitude more than HEVC SCC's 64-entry cap
128128
- **Splats**: 4096 splat archetypes (colour clusters × scale clusters × view-direction clusters) — covers a non-toy scene
@@ -145,9 +145,9 @@ The SCC team had to cap palette at 64 entries rebuilt per-CTU because that was t
145145

146146
**Holy grail claim H-1**: PR-X12 + cam_pq gives you the screen-content video codec HEVC SCC was trying to be in 2013 — without retrofitting, just by composing existing modules. Cite this when somebody asks "why is the basin field 12 bits and not 8 like HEVC SCC".
147147

148-
### 4.3 BASIN_NONE sentinel collision (PR #195 open issue)
148+
### 4.3 BASIN_NONE sentinel collision (resolved in PR #195)
149149

150-
`BASIN_NONE = MAX_BASIN_IDX = 4095` (`mode.rs:79`) — basin 4095 is ambiguous on the wire (real basin vs "no basin" sentinel). Fix in PR #195: `MAX_BASIN_IDX = 4094`, `BASIN_NONE = 4095`. Costs one codebook entry (irrelevant for k-means usage). Flagged by CodeRabbit, not yet pushed. **Tracked in §12 below.**
150+
Original bug: `BASIN_NONE = MAX_BASIN_IDX = 4095` (pre-fix `mode.rs`) — basin 4095 was ambiguous on the wire (real basin vs "no basin" sentinel). Shipped fix in PR #195 (commit `24232985`): `MAX_BASIN_IDX = (1 << 12) - 2 = 4094`, `BASIN_NONE = 4095` reserved. Costs one codebook entry (irrelevant for k-means usage). Originally flagged by CodeRabbit; merged.
151151

152152
---
153153

@@ -300,11 +300,11 @@ The mappings above are dense but specific. The holy grail claims are general and
300300
- ✅ Escape allocator collision (P1) → `Option<&mut u32>` cursor
301301
- ✅ NESW/NEWS doc mismatch (P1) → explicit slot table
302302

303-
**Still open on PR #195**:
304-
- 🟡 `BASIN_NONE == MAX_BASIN_IDX == 4095` ambiguity → shrink MAX_BASIN_IDX to 4094
305-
- 🟡 `pack_leaf` `unwrap_or` fallbacks → switch to `?` operator (non-bijective serialisation)
303+
**Resolved in PR #195 follow-up (commit `24232985`)**:
304+
- `BASIN_NONE == MAX_BASIN_IDX == 4095` ambiguity → `MAX_BASIN_IDX = 4094`, `BASIN_NONE = 4095` reserved
305+
- `pack_leaf` `unwrap_or` fallbacks → switched to `?` operator (bijective serialisation; 3 regression tests added)
306306

307-
Track in §12 below.
307+
Both originally listed in §12 below; entries updated.
308308

309309
### 10.2 PR-X12 A4 — transform
310310

@@ -451,8 +451,8 @@ Track in §12 below.
451451
- None currently.
452452

453453
**Severity P1** (fix before next-sub-card):
454-
- *T-1*: `BASIN_NONE == MAX_BASIN_IDX` collision (`mode.rs:79`). Fix: `MAX_BASIN_IDX = 4094, BASIN_NONE = 4095`. Costs 1 codebook entry. **Flagged by CodeRabbit on PR #195, not yet merged.**
455-
- *T-2*: `pack_leaf` `unwrap_or` fallbacks (`mode.rs:194-210`). Make encode bijective: `leaf.merge_dir?` etc. Malformed `LeafCu` should be a None return, not a silent rewrite. **Flagged by CodeRabbit on PR #195, not yet merged.**
454+
- ~~*T-1*: `BASIN_NONE == MAX_BASIN_IDX` collision (`mode.rs:79`).~~ **RESOLVED** via PR #195 commit `24232985`: `MAX_BASIN_IDX = 4094, BASIN_NONE = 4095`. Costs 1 codebook entry.
455+
- ~~*T-2*: `pack_leaf` `unwrap_or` fallbacks (`mode.rs:194-210`).~~ **RESOLVED** via PR #195 commit `24232985`: switched to `?` operator; 3 regression tests added (`leaf_pack_rejects_malformed_{merge,delta,escape}_without_*`).
456456

457457
**Severity P2** (fix in follow-up):
458458
- *T-3*: A3-intra currently scans NEWS without RDO; replace with λ-weighted RDO when A6 lands. Today's first-fit policy is the right default for λ=0 but suboptimal for typical λ.

.claude/knowledge/pr-x12-substrate-merged-canon.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -3,8 +3,8 @@
33
> Date: 2026-05-22
44
> Status: **MERGED CANON** — synthesises two parallel sessions' findings into one doc
55
> Supersedes (for new content; keep originals for archeology):
6-
> - `pr-x12-codec-cognitive-substrate-mapping.md` (session A: opus 4.7 main thread, this branch)
7-
> - `pr-x12-cross-domain-synergies.md` (session B: parallel thread, PR #195 branch, commit `01c77ccc`)
6+
> - `pr-x12-codec-cognitive-substrate-mapping.md` (session A: opus 4.7 main thread, this branch)
7+
> - `pr-x12-cross-domain-synergies.md` (session B: parallel thread, merged via PR #195, commit `01c77ccc`)
88
> Sister doc: `pr-x12-codec-x265-design.md` (the mechanical spec, untouched)
99
1010
---
@@ -162,7 +162,7 @@ A's T-16/T-17 (cross-repo dep direction problem) + B's D-STACK-6/D-STACK-12 (Lan
162162
The resolution is **already implicit** in the merged claim: after PR-X12 stabilises, extract `crate::hpc::codec::*` into a sibling crate `ndarray-codec`. Both `ndarray` and `lance-graph` then depend on it. The codec lives at the dep-bottom layer not as "ndarray hardware" but as **its own architectural category**.
163163

164164
→ Action: add a **fifth category** to the architecture rule in CLAUDE.md:
165-
```
165+
```text
166166
- ndarray = hardware (SIMD, Palette, Base17, SpoDistanceMatrices, read_bgz7_file)
167167
- ndarray-codec = compression substrate (Ctu, LeafCu, predict_intra, rANS) ← NEW
168168
- lance-graph = thinking (NarsTruth, NarsEngine, TripleModel, AutocompleteCache)
@@ -300,7 +300,7 @@ Merge of A's H-1..H-7 + B's HG1..HG6 + two new M:H-* claims that emerge from the
300300

301301
**M:H-NEW-1** — The same Rust binary consumes (4K video frames | 1M-Gaussian 3DGS scene | 7B-LLM gradient stream | attention KV cache) and emits a compressed Lance column. One CLI. One codec. Four loads. **This is the falsifiability test** — build it (Plan G, the bench harness), prove HG1/H-7 by demonstration, not by argument.
302302

303-
**M:H-NEW-2** `trait PredictiveSignal` + `trait LinearReduce<Basis>` + `trait CurveOrder<const N: usize>` factor the codec into three plug-points (per M:E-E + M:E-A + M:E-B). The codec body is `<150 LoC` of generic glue. Domain consumers ship `<200 LoC` of trait impls. **Total stack for all four industries: ~2 KLoC.** Compared to ~50 KLoC per-domain implementations elsewhere. The 25× code-density delta IS the architectural payoff that justifies the eight sub-cards.
303+
**M:H-NEW-2** `trait PredictiveSignal` + `trait LinearReduce<Basis>` + `trait CurveOrder<const N: usize>` factor the codec into three plug-points (per M:E-E + M:E-A + M:E-B). The codec body is `<150 LoC` of generic glue. Domain consumers ship `<200 LoC` of trait impls. **Total stack for all four industries: ~2 KLoC.** Compared to ~50 KLoC per-domain implementations elsewhere. The 25× code-density delta is the architectural payoff that justifies the eight sub-cards.
304304

305305
---
306306

@@ -408,7 +408,7 @@ Replaces both A:§10 and B:§5 plan lists. Critical path resolved per M:E-F.
408408

409409
## 6. Sequencing diagram
410410

411-
```
411+
```text
412412
┌──────────────────────────────────────┐
413413
│ Plan G (multi-domain bench) │
414414
│ 2 weeks — UNFALSIFIABILITY GATE │
