337 changes: 337 additions & 0 deletions .claude/knowledge/pr-x12-anti-neural-lookup-inversion.md

Large diffs are not rendered by default.

471 changes: 471 additions & 0 deletions .claude/knowledge/pr-x12-bgz-jc-substrate-synergies.md

Large diffs are not rendered by default.

396 changes: 396 additions & 0 deletions .claude/knowledge/pr-x12-cam-pq-sigker-dn-tree-substrate-bindings.md

Large diffs are not rendered by default.

424 changes: 424 additions & 0 deletions .claude/knowledge/pr-x12-canon-resolutions-delta.md

Large diffs are not rendered by default.

18 changes: 18 additions & 0 deletions .claude/knowledge/pr-x12-codec-cognitive-substrate-mapping.md
@@ -5,6 +5,16 @@
> Scope: ndarray codec ↔ Gaussian splat ↔ cognitive shaders ↔ blasgraph/MKL ↔ gradient optimization
> Status: **survives compaction** — load-bearing claim mapping + integration plan + debt inventory
> Companion to: `pr-x12-codec-x265-design.md` (the as-shipped HEVC-analog spec) — this doc is the *generalisation* of that spec across the rest of the stack
>
> **Post-merge formalisation (2026-05-22):** the bench / cost / dep-direction claims below have been numbered and pinned in `pr-x12-canon-resolutions-delta.md`:
> - §4.1 (4096-entry basin codebook) → **R-10** (sub-1-bit commitment), **R-13** (federated codebook policy)
> - §5.3 (DCT-II / GEMM crossover) → **R-5** (per-arch crossover constants, bench-tuned)
> - §13.1 (block-matched ME → batched i8 GEMM) → **R-6** (ME via SSD identity, VNNI path)
> - §13.3 (CTU partition as tropical-GEMM) → **R-7** (kernel home in `lance-graph::blasgraph`, dep direction allowed)
> - Plan G (bench harness) → **R-4** (architecture-conditional gate), **R-11** (latency assertions per stage)
>
> Perspective lenses written 2026-05-22 (sibling docs):
> `pr-x12-x265-blasgraph-gemm.md` · `pr-x12-x266-3dgs-spacetime-upscaling.md` · `pr-x12-woa-multiarch-orchestration.md` · `pr-x12-anti-neural-lookup-inversion.md` · `pr-x12-gguf-llm-weights-encoding.md` · **`pr-x12-bgz-jc-substrate-synergies.md`** (grounds PR-X12 in already-implemented `bgz17`/`bgz-tensor`/`bgz-hhtl-d`/`jc` crates)

---

@@ -120,6 +130,8 @@ This is **what DeepSpeed-ZeRO does informally** with `bf16_compress`, `int8_comp

## 4. Palette / basin codebook — what HEVC SCC tried and missed

> [Codebook lifecycle pinned post-merge as **R-13**: the codec exposes the basin codebook as a swappable handle (LocalEphemeral | SharedClusterWide | SharedRegional | PretrainedStatic). The 4096-entry capacity claim below is unchanged; what's new is that the codebook is *not baked* into the codec — orchestration (q2 / woa-rs) picks the right one per request.]
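
A minimal sketch of what a swappable codebook handle could look like, assuming nothing beyond the R-13 note above. The four scope names come from the note; `CodebookScope`, `CodebookHandle`, its methods, and the `u64` signature payload are hypothetical names chosen for illustration and are not taken from the codec's actual API.

```rust
/// Capacity of one basin codebook: the full 12-bit index range (§4.1).
const CODEBOOK_CAPACITY: usize = 1 << 12;

/// Lifecycle scopes named in the R-13 note. The enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CodebookScope {
    LocalEphemeral,
    SharedClusterWide,
    SharedRegional,
    PretrainedStatic,
}

/// Hypothetical handle: the codec receives a codebook per request
/// instead of baking one in.
struct CodebookHandle {
    scope: CodebookScope,
    /// One signature per basin. `u64` is a placeholder payload type.
    signatures: Vec<u64>,
}

impl CodebookHandle {
    /// Accept only fully populated codebooks, so no index is a sentinel.
    fn new(scope: CodebookScope, signatures: Vec<u64>) -> Option<Self> {
        if signatures.len() == CODEBOOK_CAPACITY {
            Some(Self { scope, signatures })
        } else {
            None
        }
    }

    fn scope(&self) -> CodebookScope {
        self.scope
    }

    /// Resolve a wire index to its basin signature.
    fn lookup(&self, basin_idx: u16) -> Option<u64> {
        self.signatures.get(usize::from(basin_idx)).copied()
    }
}
```

Under this sketch the orchestration layer would construct the handle and pass it in by reference; the codec never decides which scope applies.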

### 4.1 The 12-bit basin = 4096-entry vocabulary

`MAX_BASIN_IDX = (1 << 12) - 1 = 4095` (`mode.rs:79`). The full 12-bit range addresses 4096 real basins — every `LeafCu` carries an index into a fully-populated per-Heel codebook. No slot is reserved as a sentinel: the HHTL ontology (`Heel > Hip > Twig > Leaf`, see `src/hpc/ogit_bridge/assets/cognitive/entities/Leaf.ttl`) defines the codebook as `16 Hips × 16 Twigs × 16 Leaves = 4096 Leaves per Heel`, every Leaf carrying a real `basinSignature`. Authoring-time uncertainty ("not yet decided") stays in the encoder's `Option<u16>` scratch state and never leaks onto the wire. For:
@@ -171,6 +183,8 @@ This is **the most underrated** of the four mappings. Optimizer research treats

### 5.3 The DCT-II / GEMM tradeoff (for downstream batched encode)

> [Resolved post-merge as **R-5**: per-arch crossover constants, calibrated by Plan G's `codec-bench`. Concrete defaults landed in canon-resolutions-delta §R-5 — SPR=64, ICX=32, Zen4=96, Apple M=256, Graviton=128. See `pr-x12-x265-blasgraph-gemm.md` §2.2 for the full GEMM-form derivation.]
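
One way the per-arch crossover constants might be consumed is sketched below. The five values are the defaults quoted in the R-5 note; everything else is an assumption. In particular, the note does not state the unit of the constants, so this sketch assumes they are a batch size in blocks at or above which batched GEMM is preferred over per-block butterflies. `Arch`, `DctPath`, `crossover`, and `choose_dct_path` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Arch {
    Spr,
    Icx,
    Zen4,
    AppleM,
    Graviton,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DctPath {
    Butterfly,
    BatchedGemm,
}

/// Default crossover constants quoted in the R-5 note.
/// ASSUMPTION: interpreted here as "blocks per batch".
fn crossover(arch: Arch) -> usize {
    match arch {
        Arch::Spr => 64,
        Arch::Icx => 32,
        Arch::Zen4 => 96,
        Arch::AppleM => 256,
        Arch::Graviton => 128,
    }
}

/// Pick the transform path for a batch of equally sized blocks.
fn choose_dct_path(arch: Arch, batch_blocks: usize) -> DctPath {
    if batch_blocks >= crossover(arch) {
        DctPath::BatchedGemm
    } else {
        DctPath::Butterfly
    }
}
```

If the real constants measure something else (for example a matrix dimension), only `crossover`'s doc comment and the comparison need to change; the dispatch shape stays the same.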

Single 32×32 DCT-II via butterflies: ~80 ops. Same via GEMM (`C = A @ DCT_BASIS`): ~32K ops. **Per-block, butterfly wins by 400×**. But:

- For a 4K frame with ~1024 CUs, batched GEMM amortises hardware fusion
@@ -496,6 +510,8 @@ Six places where blasgraph + MKL change the algorithmic complexity, not just con

### 13.1 Block-matched ME → batched i8gemm (E-7)

> [Pinned as **R-6**: the SSD-via-GEMM identity is the canonical ME path; the API lives at `ndarray::hpc::blas_level2::batched_ssd_search`. The 50× win is reproduced in the GEMM-lens companion doc; the bench is asserted by the Plan G video lane (R-4).]

Classical ME: SAD over 32×32 window. Reformulate as SSD via `||A||² - 2A·B + ||B||²` — middle term is a GEMM. AVX-512 VNNI `i8gemm_i32` does a whole CTU's motion candidates in one call. **~50× over hand-tuned NEON/AVX2 SAD.**
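
The identity above can be checked with a small scalar sketch. It is not the `batched_ssd_search` API and uses no SIMD; the function names are hypothetical. The point is that once both norms are known, the only term that couples the reference block to the candidates is the dot product, which for many candidates at once is a matrix product.

```rust
/// Direct sum of squared differences between two equal-length i8 blocks.
fn ssd_direct(a: &[i8], b: &[i8]) -> i32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as i32 - y as i32;
            d * d
        })
        .sum()
}

/// SSD of one reference block against every candidate, computed as
/// ||A||^2 - 2 A.B + ||B||^2. `candidates` is row-major, one block per row.
/// The dot products are the part a batched i8 GEMM would compute.
fn ssd_via_dot(a: &[i8], candidates: &[i8]) -> Vec<i32> {
    let n = a.len();
    if n == 0 {
        return Vec::new();
    }
    let norm_a: i32 = a.iter().map(|&x| (x as i32) * (x as i32)).sum();
    candidates
        .chunks_exact(n)
        .map(|b| {
            let norm_b: i32 = b.iter().map(|&y| (y as i32) * (y as i32)).sum();
            let dot: i32 = a.iter().zip(b).map(|(&x, &y)| (x as i32) * (y as i32)).sum();
            norm_a - 2 * dot + norm_b
        })
        .collect()
}

/// Index and cost of the best-matching candidate, if any.
fn best_candidate(a: &[i8], candidates: &[i8]) -> Option<(usize, i32)> {
    ssd_via_dot(a, candidates)
        .into_iter()
        .enumerate()
        .min_by_key(|&(_, cost)| cost)
}
```

Note that this ranks candidates by SSD, not SAD, so the chosen motion vector can differ from a SAD search on blocks with a few large outliers.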

### 13.2 Batched DCT-II via MKL sgemm (E-7-variant)
@@ -504,6 +520,8 @@ Per-block butterfly wins for single 32×32. Per-frame batched `C = A_batch @ DCT

### 13.3 CTU partition mode-decision as tropical-GEMM (E-8)

> [Pinned as **R-7**: tropical-GEMM kernel lives in `lance-graph::blasgraph::tropical_gemm`; the codec calls into it. The `ndarray-codec → lance-graph` dep direction was confirmed *allowed* post-merge (both are sibling crates above `ndarray::hpc` and below `woa-rs`). See R-7 in the delta doc for the dep-graph audit.]

x265 spends ~30% CPU on recursive partition RDO. Reformulate: each partition is a node in an 85-node DAG, edges = split/merge transitions, weights = ΔRDO. Optimal partition = shortest path. blasgraph's tropical-semiring GEMM (`D ← min(D, D + W)`) solves all partitions in **one batched matrix-relax**. `O(4^d)` → `O(d²)` per CTU.
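
A dense scalar sketch of the min-plus relaxation, to make the `D ← min(D, D + W)` step concrete. Here `D + W` is read as the tropical matrix product (minimum over the inner index of `D[i][k] + W[k][j]`); that reading, the `INF` sentinel, and the function names are assumptions, and this is not the `lance-graph::blasgraph` kernel. The 85-node count is consistent with a full quadtree of depth 3 (1 + 4 + 16 + 64), though the sketch works for any DAG.

```rust
/// "No edge" sentinel, kept well below i64::MAX so sums cannot overflow.
const INF: i64 = i64::MAX / 4;

/// One relaxation step: out = min(D, D ⊗ W), with ⊗ the min-plus product.
fn relax(d: &[Vec<i64>], w: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let n = d.len();
    let mut out = d.to_vec();
    for i in 0..n {
        for k in 0..n {
            if d[i][k] >= INF {
                continue;
            }
            for j in 0..n {
                if w[k][j] >= INF {
                    continue;
                }
                let cand = d[i][k] + w[k][j];
                if cand < out[i][j] {
                    out[i][j] = cand;
                }
            }
        }
    }
    out
}

/// All-pairs cheapest transition cost: relax until nothing changes.
/// On a DAG this converges within `n` steps, even with negative weights.
fn shortest_paths(w: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let n = w.len();
    let mut d = w.to_vec();
    for i in 0..n {
        d[i][i] = d[i][i].min(0);
    }
    for _ in 0..n {
        let next = relax(&d, w);
        if next == d {
            break;
        }
        d = next;
    }
    d
}
```

Each `relax` call is one matrix-shaped operation over the whole graph, which is what lets a batched kernel evaluate every partition path of a CTU together rather than recursing per split.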

### 13.4 CABAC context modeling → tiny transformer (E-9)
23 changes: 23 additions & 0 deletions .claude/knowledge/pr-x12-cross-domain-synergies.md
@@ -10,6 +23 @@
> Companion to `.claude/knowledge/pr-x12-codec-x265-design.md` (the
> mechanical design). This doc captures the **why-it-generalizes**
> that the design doc deliberately scopes out.
>
> **Post-merge resolutions (2026-05-22):** the load-bearing claims below
> are now numbered in `pr-x12-canon-resolutions-delta.md`:
> - §E1 (topology-free `MergeDir`) → **R-9** (4-way alphabet stays canonical;
> wider topologies layered, not swapped — `Topology` trait deferred)
> - §HG2 (sub-1-bit-per-Gaussian) → **R-10** (sub-1-bit-per-token via
> Gaussian-tail rANS where source supports it; falsified by Plan G entropy bench)
> - §E9 (splat3d × codec = same pipeline) → **R-1** (`LinearReduce<T>` +
> `Basis<T>` trait surface; codec body never imports a specific basis impl)
> - §Plan A (A7 rANS critical) → **R-3** (codec-body LoC envelope ≤ 1500,
> A7 must fit) + **R-4** (Plan G arch-conditional bench gates the claim)
>
> Perspective lenses landed 2026-05-22:
> `pr-x12-x265-blasgraph-gemm.md` · `pr-x12-x266-3dgs-spacetime-upscaling.md`
> · `pr-x12-woa-multiarch-orchestration.md` · `pr-x12-anti-neural-lookup-inversion.md`
> · `pr-x12-gguf-llm-weights-encoding.md` (the fifth load — static LLM weight tensors)
> · **`pr-x12-bgz-jc-substrate-synergies.md`** (PR-X12 grounded: bgz17/bgz-tensor/bgz-hhtl-d/jc already implement most of the substrate)

## TL;DR

@@ -186,6 +203,8 @@ literature snapshot I'm working from; **claim** is the right word, not

### E1. **`MergeDir` is a topology, not a direction.**

> [Resolved post-merge as **R-9**: the 4-way alphabet *stays* canonical on the wire — `{N, E, W, S}` discriminant is pinned for HEVC compatibility. Wider topologies (6-way 3D, 8-way diagonal-aware) layer *above* the codec via a `Topology<Mode>` trait, but the wire format does not extend. See `pr-x12-canon-resolutions-delta.md` §R-9 for the rationale: extending the wire alphabet to 6/8 ways would invalidate HEVC's 2-bit `header_kind` field and break the goal of being decodable by spec-conformant HEVC tooling.]
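
To illustrate why a 4-way alphabet fits a 2-bit field exactly, here is a small sketch of a 2-bit discriminant round trip. The variant order follows the `{N, E, W, S}` listing in the note, but the concrete numeric values (0 to 3 in that order), the packing of four symbols per byte, and the helper names are assumptions made for the example, not the pinned wire layout.

```rust
/// 4-way merge alphabet. Numeric values are ASSUMED, not taken from the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum MergeDir {
    North = 0,
    East = 1,
    West = 2,
    South = 3,
}

impl MergeDir {
    fn to_bits(self) -> u8 {
        self as u8
    }

    /// Decode from the low two bits; every 2-bit pattern is a valid symbol.
    fn from_bits(bits: u8) -> MergeDir {
        match bits & 0b11 {
            0 => MergeDir::North,
            1 => MergeDir::East,
            2 => MergeDir::West,
            _ => MergeDir::South,
        }
    }
}

/// Pack four symbols into one byte, first symbol in the lowest bits.
fn pack4(dirs: [MergeDir; 4]) -> u8 {
    dirs[0].to_bits()
        | (dirs[1].to_bits() << 2)
        | (dirs[2].to_bits() << 4)
        | (dirs[3].to_bits() << 6)
}

/// Inverse of `pack4`.
fn unpack4(byte: u8) -> [MergeDir; 4] {
    [
        MergeDir::from_bits(byte),
        MergeDir::from_bits(byte >> 2),
        MergeDir::from_bits(byte >> 4),
        MergeDir::from_bits(byte >> 6),
    ]
}
```

A fifth or sixth symbol has no spare bit pattern here, which is the constraint the note cites for keeping wider topologies above the wire format.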

`{North, East, West, South}` happens to be a 2D Cartesian raster
mental model. The codec doesn't care. The discriminant alphabet just
needs to be a 4-way categorical over "which of 4 neighbours did I
@@ -271,6 +290,8 @@ The user's "Pertuberationslernen" instinct lands here.

### E9. **The `splat3d` PRs 1-7 (May sprint) and the `codec` PRs are the SAME pipeline shifted 90°.**

> [Formalised post-merge as **R-1**: the unified pipeline lives in `ndarray::hpc::LinearReduce<T>`, which decomposes into `Basis<T>` (basis-as-data; DCT, EWA splat, wavelet, and k-means prototype are all `Basis<T>` impls) and `Reducer<T>` (the reduction: rANS-encode, alpha-composite, sum-reduce, softmax). The codec body dispatches via the trait and *never imports a specific basis impl*, which is what makes the "same pipeline shifted 90°" claim mechanically real.]
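
The trait split described in the note might look roughly like the sketch below. Only the names `Basis<T>` and `Reducer<T>` come from the note; the method signatures, the free function `linear_reduce`, and the toy `Haar2` and `SumReduce` impls are invented for illustration and should not be read as the real `ndarray::hpc` surface.

```rust
/// Basis-as-data: turns raw samples into coefficients.
/// The method signature is an assumption.
trait Basis<T> {
    fn project(&self, samples: &[T]) -> Vec<T>;
}

/// The reduction applied to coefficients. The signature is an assumption.
trait Reducer<T> {
    type Output;
    fn reduce(&self, coeffs: &[T]) -> Self::Output;
}

/// Generic pipeline body: it sees only the two traits, never a concrete basis.
fn linear_reduce<T, B, R>(basis: &B, reducer: &R, samples: &[T]) -> R::Output
where
    B: Basis<T>,
    R: Reducer<T>,
{
    let coeffs = basis.project(samples);
    reducer.reduce(&coeffs)
}

/// Toy basis: two-point average and half-difference (a Haar-like step).
struct Haar2;

impl Basis<f32> for Haar2 {
    fn project(&self, samples: &[f32]) -> Vec<f32> {
        samples
            .chunks_exact(2)
            .flat_map(|p| [(p[0] + p[1]) * 0.5, (p[0] - p[1]) * 0.5])
            .collect()
    }
}

/// Toy reducer: plain sum of coefficients.
struct SumReduce;

impl Reducer<f32> for SumReduce {
    type Output = f32;
    fn reduce(&self, coeffs: &[f32]) -> f32 {
        coeffs.iter().sum()
    }
}
```

Swapping `Haar2` for a DCT or splat basis, or `SumReduce` for an entropy coder, would leave `linear_reduce` untouched, which is the property the note relies on.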

The splat3d forward pipeline is: project → tile-bin → mode-decide
(which Gaussian contributes at which pixel) → alpha-composite. The
codec pipeline is: build codebook → block-partition → mode-decide
@@ -468,6 +489,8 @@ codec for the manifold of predictable codebook-coded signals."*

### HG2. **Sub-1-bit-per-Gaussian 3DGS compression.**

> [Committed post-merge as **R-10**: sub-1-bit-per-token where the source distribution supports it (heavy-tailed residual after basin lookup). The mechanism is basin codebook (12-bit fingerprint → 4096 entries) + Gaussian-tail rANS, both already in scope. Falsifier: Plan G entropy bench at < 1.0 bit-per-token on the held-out Bbb/3DGS test corpus. See R-10 in the delta doc and `pr-x12-anti-neural-lookup-inversion.md` §3.1 for why this lookup-table substrate hits the Shannon bound within ε ≤ 0.2 dB.]
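
For intuition on the sub-1-bit falsifier: an entropy coder such as rANS can spend a fractional number of bits per symbol, so the average drops below 1.0 whenever the empirical entropy of the symbol stream does. The sketch below computes that entropy from symbol counts. It is not the Plan G bench, and the function name is hypothetical.

```rust
/// Empirical Shannon entropy in bits per symbol, from symbol counts.
/// This is the lower bound an ideal entropy coder approaches on that stream.
fn entropy_bits_per_symbol(counts: &[u64]) -> f64 {
    let total: u64 = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}
```

A uniform two-symbol stream costs exactly 1 bit per symbol, while a three-symbol stream in which one symbol accounts for 90% of occurrences already falls below 1 bit. Whether real post-lookup residuals are that skewed is what the bench has to establish.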

Stock 3DGS: ~250 bytes/Gaussian raw, ~50 bytes after PLY-trim.
PR-X12 mode-coded + A7 rANS: ~3-8 bits/Gaussian for the dominant
modes. **30-60× over current state of the art.** A 1M-Gaussian