Skip to content

Commit a075e12

Browse files
committed
fix(codec): delete BASIN_NONE sentinel — restore full 4096-entry HHTL codebook
PR #195's follow-up commit 2423298 resolved the BASIN_NONE/MAX_BASIN_IDX collision by shrinking MAX_BASIN_IDX to 4094 with BASIN_NONE = 4095 as a distinct sentinel. The shrink was the wrong fix — the sentinel itself was the bug. Reference (the authoritative source the code drifted from): src/hpc/ogit_bridge/assets/cognitive/entities/Leaf.ttl — the HHTL ontology defines the codebook as 16 Hips × 16 Twigs × 16 Leaves = 4096 Leaves per Heel, every Leaf carrying a real basinSignature. No slot is reserved for absence; the ontology forbids "no basin". What BASIN_NONE actually was: authoring-time epistemic uncertainty ("encoder hasn't decided yet for this cell") smuggled into wire-format ontological uncertainty ("this cell exists but has no basin"). A category error — Rust's idiom would have surfaced this as Option<u16> in the encoder's transient scratch state and a plain u16 on the persistent record. The sentinel value bypassed that hygiene by hiding optionality inside a magic u16, invisible to the typechecker. Confirmation that the sentinel was load-bearing for nothing: full audit of BASIN_NONE producers across src/ found zero — no LeafCu::skip(...), no basin_idx: BASIN_NONE anywhere. The only references were the const definition, a re-export, one regression test that constructed it, and predict.rs:is_no_basin (a predicate with no callers that emit BASIN_NONE leaves). Defensive infrastructure for a feature the ontology forbids and the implementation never used. Changes: - mode.rs: MAX_BASIN_IDX = (1 << 12) - 1 = 4095 (full 12-bit range, 4096 real basins). BASIN_NONE deleted. The 2 sentinel-distinctness regression tests replaced with one test asserting the full range. Docs reframed around the HHTL ontology + the Option<u16> scratch-state contract. - predict.rs: is_no_basin deleted; its only doctest+test deleted. BASIN_NONE import dropped. - mod.rs: BASIN_NONE and is_no_basin removed from re-exports. - pr-x12-codec-cognitive-substrate-mapping.md §4.1/§4.3/§10.1/§12: framing updated. §4.3 now records the full arc (collision → shrink → delete) as design archeology rather than as an open issue. Validation: - cargo check --features codec --lib → clean - cargo clippy --features codec --lib -- -D warnings → clean - cargo test --features codec --lib hpc::codec → 54 tests pass - cargo test --features codec --doc hpc::codec → 13 doctests pass Net diff: -77, +38 lines. Pure subtraction modulo doc reframe.
1 parent 4d0ec96 commit a075e12

4 files changed

Lines changed: 38 additions & 77 deletions

File tree

.claude/knowledge/pr-x12-codec-cognitive-substrate-mapping.md

Lines changed: 14 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -122,7 +122,7 @@ This is **what DeepSpeed-ZeRO does informally** with `bf16_compress`, `int8_comp
122122

123123
### 4.1 The 12-bit basin = 4096-entry vocabulary
124124

125-
`MAX_BASIN_IDX = (1 << 12) - 2 = 4094` (`mode.rs:79`), with `BASIN_NONE = 4095` reserved as the absent-basin sentinel — the 12-bit header field encodes `0..=4094` for real basins plus `4095` for "no basin assigned". Each `LeafCu` carries a 12-bit index into the per-frame basin codebook. For:
125+
`MAX_BASIN_IDX = (1 << 12) - 1 = 4095` (`mode.rs:79`). The full 12-bit range addresses 4096 real basins — every `LeafCu` carries an index into a fully-populated per-Heel codebook. No slot is reserved as a sentinel: the HHTL ontology (`Heel > Hip > Twig > Leaf`, see `src/hpc/ogit_bridge/assets/cognitive/entities/Leaf.ttl`) defines the codebook as `16 Hips × 16 Twigs × 16 Leaves = 4096 Leaves per Heel`, every Leaf carrying a real `basinSignature`. Authoring-time uncertainty ("not yet decided") stays in the encoder's `Option<u16>` scratch state and never leaks onto the wire. For:
126126

127127
- **Video**: 4096 palette entries per GOP — orders of magnitude more than HEVC SCC's 64-entry cap
128128
- **Splats**: 4096 splat archetypes (colour clusters × scale clusters × view-direction clusters) — covers a non-toy scene
@@ -145,9 +145,16 @@ The SCC team had to cap palette at 64 entries rebuilt per-CTU because that was t
145145

146146
**Holy grail claim H-1**: PR-X12 + cam_pq gives you the screen-content video codec HEVC SCC was trying to be in 2013 — without retrofitting, just by composing existing modules. Cite this when somebody asks "why is the basin field 12 bits and not 8 like HEVC SCC".
147147

148-
### 4.3 BASIN_NONE sentinel collision (resolved in PR #195)
148+
### 4.3 BASIN_NONE — the sentinel that shouldn't have existed
149149

150-
Original bug: `BASIN_NONE = MAX_BASIN_IDX = 4095` (pre-fix `mode.rs`) — basin 4095 was ambiguous on the wire (real basin vs "no basin" sentinel). Shipped fix in PR #195 (commit `24232985`): `MAX_BASIN_IDX = (1 << 12) - 2 = 4094`, `BASIN_NONE = 4095` reserved. Costs one codebook entry (irrelevant for k-means usage). Originally flagged by CodeRabbit; merged.
150+
Arc summary (for archeology):
151+
152+
- **Original state** (PR #195 first push): `BASIN_NONE = MAX_BASIN_IDX = 4095` — same value used both as "max valid basin" and "no basin assigned" sentinel. Ambiguous on the wire. CodeRabbit flagged.
153+
- **First fix** (PR #195 follow-up commit `24232985`): shrunk `MAX_BASIN_IDX` to 4094, kept `BASIN_NONE = 4095` distinct. Resolved the ambiguity but sacrificed one codebook entry and propagated the sentinel into the wire format.
154+
- **Reference check**: the HHTL ontology (`Leaf.ttl`) requires 4096 real Leaves per Heel, every Leaf a real basin. The sentinel itself was the wrong design, not just its value.
155+
- **Final fix**: `BASIN_NONE` and `is_no_basin` deleted entirely; `MAX_BASIN_IDX` restored to `(1 << 12) - 1 = 4095`. Authoring-time "not yet decided" uncertainty stays as an `Option<u16>` in encoder scratch state and is resolved before any leaf reaches the wire. The wire format only ever sees committed basins.
156+
157+
**The deeper read**: `BASIN_NONE` was authoring-time epistemic uncertainty (encoder mid-decision) smuggled into wire-format ontological uncertainty (cell exists but has no basin) — a category error. Rust's idiom would have surfaced this as `Option<u16>` in the encoder's transient state and `u16` on the persistent record. The sentinel value bypassed that hygiene by hiding optionality inside a magic `u16` value. The cleanup restores the two-types-for-two-lifecycles separation.
151158

152159
---
153160

@@ -300,9 +307,9 @@ The mappings above are dense but specific. The holy grail claims are general and
300307
- ✅ Escape allocator collision (P1) → `Option<&mut u32>` cursor
301308
- ✅ NESW/NEWS doc mismatch (P1) → explicit slot table
302309

303-
**Resolved in PR #195 follow-up (commit `24232985`)**:
304-
-`BASIN_NONE == MAX_BASIN_IDX == 4095` ambiguity → `MAX_BASIN_IDX = 4094`, `BASIN_NONE = 4095` reserved
305-
-`pack_leaf` `unwrap_or` fallbacks → switched to `?` operator (bijective serialisation; 3 regression tests added)
310+
**Resolved across PR #195 + follow-up**:
311+
-`pack_leaf` `unwrap_or` fallbacks (PR #195 commit `24232985`) → switched to `?` operator (bijective serialisation; 3 regression tests added)
312+
-`BASIN_NONE` sentinel removed entirely (follow-up to PR #195): `MAX_BASIN_IDX = 4095` restores full 4096-entry HHTL codebook; authoring-time uncertainty lives in encoder `Option<u16>` scratch, not the wire format. See §4.3 for the arc.
306313

307314
Both originally listed in §12 below; entries updated.
308315

@@ -451,7 +458,7 @@ Both originally listed in §12 below; entries updated.
451458
- None currently.
452459

453460
**Severity P1** (fix before next-sub-card):
454-
- ~~*T-1*: `BASIN_NONE == MAX_BASIN_IDX` collision (`mode.rs:79`).~~ **RESOLVED** via PR #195 commit `24232985`: `MAX_BASIN_IDX = 4094, BASIN_NONE = 4095`. Costs 1 codebook entry.
461+
- ~~*T-1*: `BASIN_NONE == MAX_BASIN_IDX` collision.~~ **RESOLVED** via two-step landing: PR #195 commit `24232985` shrunk `MAX_BASIN_IDX` to 4094 with `BASIN_NONE = 4095` as a distinct sentinel; the follow-up cleanup deleted `BASIN_NONE` and `is_no_basin` entirely and restored `MAX_BASIN_IDX = 4095`. Reference: the HHTL ontology (`Leaf.ttl`) defines 4096 real Leaves per Heel — no slot reserved for absence. See §4.3.
455462
- ~~*T-2*: `pack_leaf` `unwrap_or` fallbacks (`mode.rs:194-210`).~~ **RESOLVED** via PR #195 commit `24232985`: switched to `?` operator; 3 regression tests added (`leaf_pack_rejects_malformed_{merge,delta,escape}_without_*`).
456463

457464
**Severity P2** (fix in follow-up):

src/hpc/codec/mod.rs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ pub mod predict;
3232
pub use ctu::{CellMode, MergeDir, MAX_QUAD_TREE_NODES, MAX_SPLIT_DEPTH};
3333
pub use ctu::{Ctu, CtuArena, CtuPartition, LeafCu, MaxSplitDepthReached, MergeError, NodeIdx};
3434
pub use mode::{
35-
pack_header, pack_leaf, pack_merge_dir, packed_byte_len, unpack_header, unpack_leaf, unpack_merge_dir, BASIN_NONE,
35+
pack_header, pack_leaf, pack_merge_dir, packed_byte_len, unpack_header, unpack_leaf, unpack_merge_dir,
3636
MAX_BASIN_IDX,
3737
};
38-
pub use predict::{is_no_basin, predict_intra, IntraConfig, IntraContext};
38+
pub use predict::{predict_intra, IntraConfig, IntraContext};

src/hpc/codec/mode.rs

Lines changed: 22 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -59,40 +59,25 @@ use super::ctu::{CellMode, LeafCu, MergeDir};
5959
// Header pack / unpack (16-bit)
6060
// ════════════════════════════════════════════════════════════════════
6161

62-
/// Maximum encodable real `basin_idx`. Equal to `(1 << 12) - 2 = 4094`
63-
/// so that the all-ones 12-bit pattern (`0xFFF = 4095`) is reserved as
64-
/// the [`BASIN_NONE`] sentinel — without that reservation, basin 4095
65-
/// would round-trip ambiguously with "no basin assigned".
62+
/// Maximum encodable `basin_idx`. Equal to `(1 << 12) - 1 = 4095`,
63+
/// the full 12-bit range. Every value `0..=MAX_BASIN_IDX` addresses a
64+
/// real basin in the per-Heel codebook — there is no reserved sentinel.
6665
///
67-
/// The on-wire 12-bit field still holds any value `0..=0xFFF`; only the
68-
/// encoder's *valid-basin* range is restricted to `0..=MAX_BASIN_IDX`.
69-
/// [`BASIN_NONE`] is encodable in the header field too (when an encoder
70-
/// emits a "no basin" record), but it must never appear as a real basin
71-
/// codebook index.
66+
/// The HHTL ontology (`Heel > Hip > Twig > Leaf`, see the entity TTLs
67+
/// under `src/hpc/ogit_bridge/assets/cognitive/entities/`) defines the
68+
/// codebook as `16 Hips × 16 Twigs × 16 Leaves = 4096 Leaves per Heel`,
69+
/// every Leaf carrying a real `basinSignature`. Absence is not a state:
70+
/// authoring-time "not yet decided" lives in the encoder's `Option<u16>`
71+
/// scratch state, never on the wire.
7272
///
7373
/// ```
74-
/// use ndarray::hpc::codec::{BASIN_NONE, MAX_BASIN_IDX};
75-
/// assert_eq!(MAX_BASIN_IDX, (1 << 12) - 2);
76-
/// assert_eq!(MAX_BASIN_IDX, 4094);
77-
/// assert!(MAX_BASIN_IDX < BASIN_NONE);
74+
/// use ndarray::hpc::codec::MAX_BASIN_IDX;
75+
/// assert_eq!(MAX_BASIN_IDX, (1 << 12) - 1);
76+
/// assert_eq!(MAX_BASIN_IDX, 4095);
7877
/// ```
79-
pub const MAX_BASIN_IDX: u16 = (1 << 12) - 2; // 4094
80-
81-
/// Tag inside the per-frame basin codebook for "no basin assigned"
82-
/// (encoder-side sentinel during mode decision). Equal to `0xFFF`
83-
/// (the all-ones 12-bit pattern) so it sits one slot above the highest
84-
/// real basin index ([`MAX_BASIN_IDX`]).
85-
///
86-
/// ```
87-
/// use ndarray::hpc::codec::{BASIN_NONE, MAX_BASIN_IDX};
88-
/// assert_eq!(BASIN_NONE, 4095);
89-
/// assert_eq!(BASIN_NONE, MAX_BASIN_IDX + 1);
90-
/// ```
91-
pub const BASIN_NONE: u16 = (1 << 12) - 1;
78+
pub const MAX_BASIN_IDX: u16 = (1 << 12) - 1; // 4095
9279

9380
/// Private: 12-bit mask for the basin field of the packed header.
94-
/// Independent of [`MAX_BASIN_IDX`] so that [`BASIN_NONE`] (which sits
95-
/// in the 12-bit field but is not a real basin) still round-trips.
9681
const BASIN_FIELD_MASK: u16 = 0x0FFF;
9782

9883
/// Pack `(mode, basin_idx)` into a 16-bit header.
@@ -442,26 +427,18 @@ mod tests {
442427
}
443428

444429
#[test]
445-
fn basin_none_distinct_from_max_basin_idx() {
446-
// Regression for the BASIN_NONE/MAX_BASIN_IDX collision: the
447-
// sentinel must sit one slot above the highest real basin so
448-
// basin 4094 is unambiguously "a real basin" and 4095 is
449-
// unambiguously "no basin assigned".
450-
assert_eq!(MAX_BASIN_IDX, 4094);
451-
assert_eq!(BASIN_NONE, 4095);
452-
assert!(MAX_BASIN_IDX < BASIN_NONE);
430+
fn max_basin_idx_fills_full_12bit_range() {
431+
// The codec follows the HHTL ontology: 16 × 16 × 16 = 4096 Leaves
432+
// per Heel, every value `0..=MAX_BASIN_IDX` is a real basin. No
433+
// sentinel slot is reserved.
434+
assert_eq!(MAX_BASIN_IDX, (1 << 12) - 1);
435+
assert_eq!(MAX_BASIN_IDX, 4095);
453436
}
454437

455438
#[test]
456-
fn header_round_trips_max_basin_idx_and_basin_none_distinctly() {
457-
// Both values fit in the 12-bit field; the encoder treats them
458-
// as different. (Decoders that route on BASIN_NONE need to
459-
// compare against the sentinel explicitly.)
460-
let real = pack_header(CellMode::Skip, MAX_BASIN_IDX);
461-
let none = pack_header(CellMode::Skip, BASIN_NONE);
462-
assert_ne!(real, none);
463-
assert_eq!(unpack_header(real), (CellMode::Skip, MAX_BASIN_IDX));
464-
assert_eq!(unpack_header(none), (CellMode::Skip, BASIN_NONE));
439+
fn header_round_trips_max_basin_idx() {
440+
let h = pack_header(CellMode::Skip, MAX_BASIN_IDX);
441+
assert_eq!(unpack_header(h), (CellMode::Skip, MAX_BASIN_IDX));
465442
}
466443

467444
#[test]

src/hpc/codec/predict.rs

Lines changed: 0 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -58,7 +58,6 @@
5858
//! reference + reconstruction parity test pin the math.
5959
6060
use super::ctu::{CellMode, LeafCu, MergeDir};
61-
use super::mode::BASIN_NONE;
6261

6362
// ════════════════════════════════════════════════════════════════════
6463
// Inputs to the encoder mode decision
@@ -293,21 +292,6 @@ fn merge_dir_from_index(i: usize) -> MergeDir {
293292
}
294293
}
295294

296-
/// Sanity-check sentinel: returns `true` iff the resolved basin index
297-
/// is the "no basin" marker. Encoders that compute basins lazily can
298-
/// short-circuit Skip/Merge/Delta and emit Escape directly when this
299-
/// fires.
300-
///
301-
/// ```
302-
/// use ndarray::hpc::codec::{is_no_basin, BASIN_NONE};
303-
/// assert!(is_no_basin(BASIN_NONE));
304-
/// assert!(!is_no_basin(0));
305-
/// ```
306-
#[inline]
307-
pub fn is_no_basin(basin_idx: u16) -> bool {
308-
basin_idx == BASIN_NONE
309-
}
310-
311295
// ════════════════════════════════════════════════════════════════════
312296
// Tests
313297
// ════════════════════════════════════════════════════════════════════
@@ -478,13 +462,6 @@ mod tests {
478462
assert_eq!(decoded, leaf);
479463
}
480464

481-
#[test]
482-
fn is_no_basin_sentinel_round_trip() {
483-
assert!(is_no_basin(BASIN_NONE));
484-
assert!(!is_no_basin(0));
485-
assert!(!is_no_basin(100));
486-
}
487-
488465
#[test]
489466
fn overflow_delta_does_not_alias_to_merge() {
490467
// Regression for the wrapping-cast Merge alias bug:

0 commit comments

Comments
 (0)