diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md index 14e0e5ef..839e1e07 100644 --- a/.claude/board/EPIPHANIES.md +++ b/.claude/board/EPIPHANIES.md @@ -2973,3 +2973,90 @@ The architecture's five consumer perspectives are not layers — they're project **SoA vs Functional is not a choice — it's a WHERE.** BindSpace is SoA (columnar storage for SIMD). The algebra on it is Functional (methods on carriers). The SoA carries the state; the Functional methods transform it. Both exist simultaneously on the same data. The "struct of arrays vs object thinks for itself" tension resolves as: the ARRAY is the SoA, the ELEMENT (row, trajectory, fingerprint) thinks for itself via methods. Cross-ref: CLAUDE.md §The Stance (AGI-as-glove, SoA columns ARE the AGI surface), lab-vs-canonical-surface.md (I1-I11 invariants), ExternalMembrane (contract::external_membrane), BindSpace (cognitive-shader-driver::bindspace). + +## 2026-04-26 — FINDING: distance dispatch must be type-intrinsic, not crate-boundary-crossing + +**Status:** FINDING +**Owner scope:** @family-codec-smith, @truth-architect, @host-glove-designer + +The struct-of-arrays (BindSpace, RenderFrame, Arrow columns) carries heterogeneous +fingerprint types that each need a DIFFERENT distance function: + +| Type | Distance | Where it lives | Notes | +|---|---|---|---| +| `Binary16K = [u64; 256]` | Hamming (popcount of XOR) | `ndarray::hpc::bitwise::hamming_distance_raw` | 16384-bit, SIMD VPOPCNTDQ | +| `Vsa16kF32 = [f32; 16_384]` | Cosine → FisherZ transform | `ndarray::hpc::heel_f64x8::cosine_f64_simd` | f32 dot/norm via F32x16 FMA | +| `CamPqCode = [u8; 6]` | ADC (asymmetric distance computation) | `ndarray::hpc::cam_pq::adc_distance` | Precomputed distance tables, O(1) | +| `PaletteEdge = [u8; 3]` | Palette L1 (lookup table) | `ndarray::hpc::palette_distance::SpoDistanceMatrices::distance` | bgz17 256×256 table, 1.8 ns | +| `Base17 = [u8; 17]` | Palette nearest (codebook search) | `bgz17::Palette::nearest` | 256 centroids, should use precomputed table | +| `HighHeelBGZ` container | Cascade (HHTL skip → palette → ADC fallback) | `ndarray::hpc::cascade` + `bgz-tensor::hhtl_cache` | Multi-level, route by `RouteAction` | + +**The problem:** When a SoA column contains mixed types (e.g., one column is Binary16K, +another is CamPqCode), the distance dispatch currently happens at the call site — the +caller must know which distance function to use. This works inside a single crate, but +when the SoA lives in crate A (e.g., `cognitive-shader-driver::BindSpace`) and the +distance kernel lives in crate B (e.g., `ndarray::hpc::bitwise`), every call crosses +a crate boundary. That boundary is zero-cost for `#[inline]` functions, but NOT zero-cost +if the function is generic over a trait object (`dyn DistanceFn`) or involves dynamic +dispatch. + +**The solution — type-intrinsic dispatch, not dynamic dispatch:** + +The distance function should be a method ON the carrier type, not a free function +called FROM the SoA consumer. This follows the "object speaks for itself" doctrine +(CLAUDE.md §The Click): + +```rust +// WRONG — caller must know the distance type: +let d = hamming_distance_raw(fp_a.as_bytes(), fp_b.as_bytes()); // crate boundary + +// RIGHT — the type carries its own distance: +let d = fp_a.distance(&fp_b); // monomorphized, inlined, zero boundary tax +``` + +The contract already has `CodecRoute: Passthrough | CamPq` which names the regime. +What's missing is a `Distance` trait that each carrier implements: + +```rust +pub trait Distance: Sized { + fn distance(&self, other: &Self) -> u32; + fn similarity(&self, other: &Self) -> f32 { + 1.0 - (self.distance(other) as f32 / Self::MAX_DISTANCE as f32) + } + const MAX_DISTANCE: u32; +} +``` + +Implementations: +- `impl Distance for [u64; 256]` → `hamming_distance_raw` (inline, SIMD) +- `impl Distance for CamPqCode` → ADC lookup (precomputed table ref) +- `impl Distance for PaletteEdge` → palette L1 table lookup +- `impl Distance for Vsa16kF32` → cosine → FisherZ (F32x16 FMA) + +The trait monomorphizes at compile time — no dynamic dispatch, no crate boundary +tax. The SoA column iterates with `col.chunks().map(|a, b| a.distance(b))` and +the correct distance function is selected by TYPE, not by runtime enum match. + +**Where this trait should live:** `lance-graph-contract` (zero deps). The +implementations live in ndarray (for SIMD kernels) or in the carrier crate +(for precomputed tables). The contract defines the interface; ndarray provides +the hardware acceleration; the SoA consumer never needs to know which distance +kernel runs. + +**Hard-coded dispatch within the same crate is fine** — when `BindSpace` calls +`hamming_distance_raw` on its `content` column, that's a direct function call +into ndarray, monomorphized and inlined. The problem only arises if we try to +make the SoA generic over distance type via `dyn` trait objects. Don't do that. +Keep the dispatch compile-time via generics or type-specific methods. The SoA +pays zero boundary tax because Rust's monomorphization erases the crate boundary. + +**FisherZ note:** Cosine similarity ∈ [-1, 1] is nonlinear for averaging. The +FisherZ transform `z = atanh(r)` maps it to a normal-distributed variable that +can be averaged, then `r = tanh(z)` maps back. This matters when the SoA +accumulates similarities across columns (e.g., weighted multi-column distance). +The `Distance` trait should expose `fn similarity_z(&self, other: &Self) -> f32` +for the FisherZ-transformed variant, defaulting to `atanh(similarity())`. + +Cross-ref: CLAUDE.md §The Click ("object speaks for itself"), I1 Codec Regime +Split (`CodecRoute`), `contract::cam::DistanceTableProvider` (existing trait for +ADC), `ndarray::hpc::bitwise::hamming_distance_raw`, `ndarray::hpc::palette_distance`. diff --git a/.claude/board/TECH_DEBT.md b/.claude/board/TECH_DEBT.md index fcf87dde..936539a8 100644 --- a/.claude/board/TECH_DEBT.md +++ b/.claude/board/TECH_DEBT.md @@ -1071,3 +1071,56 @@ Cross-ref: `container_bs/dn_redis.rs`; `callcenter-membrane-v1.md` §§595–803 | Diagnostic | TD-INT-11 | All 14 items are additive (add call site). Zero items require type creation or code deletion. + +## 2026-04-26 — TD-DIST-1: Distance trait missing from contract (type-intrinsic dispatch) + +**Status:** Open +**Severity:** Medium (no runtime cost today — hard-coded dispatch works — but blocks +generic SoA distance sweeps) + +The contract has `CodecRoute` (Passthrough | CamPq) naming the regime and +`DistanceTableProvider` for ADC, but no unified `Distance` trait that each +carrier type implements. Today each call site hard-codes which distance +function to use (`hamming_distance_raw` for Binary16K, `adc_distance` for +CamPq, `cosine_f64_simd` for Vsa16kF32). This works but prevents writing +generic distance sweeps over mixed SoA columns. + +**Fix:** Add `pub trait Distance` to `contract::cam` (or a new `contract::distance` +module). Implement for `[u64; 256]`, `CamPqCode`, `PaletteEdge`, `Vsa16kF32`. +Include `similarity_z()` for FisherZ-transformed cosine averaging. +See EPIPHANIES.md 2026-04-26 distance-dispatch entry for full design. + +**Blocked by:** nothing — pure additive. +**Unblocks:** generic SoA distance accumulation, multi-column weighted distance, +render-frame similarity for force-directed layout (CAM-PQ pruning + HHTL cascade). + +## 2026-04-26 — TD-DIST-2: vector_ops.rs still has scalar dot/norm/cosine (4 loops) + +**Status:** Open +**Severity:** High (hot path in DataFusion UDF — L2/cosine queries) + +`vector_ops.rs` lines 140, 160, 179, 189 have 4 independent scalar +`.iter().map().sum()` loops for dot product, norm², cosine similarity. +Should swap for `ndarray::hpc::heel_f64x8::{dot_f64_simd, cosine_f64_simd}`. +Estimated 8-12× speedup (chunked F64x8 FMA vs scalar). + +## 2026-04-26 — TD-DIST-3: bgz17 Palette::nearest() uses brute-force 256×17 L1 + +**Status:** Open +**Severity:** Medium (build-time hot path for palette construction) + +`bgz17/palette.rs` lines 56-65 iterate all 256 centroids per query. +Should use precomputed distance table from `ndarray::hpc::palette_distance`. +Estimated 100× speedup for encoding (O(1) table lookup vs O(256) L1 per query). + +## 2026-04-26 — Paid Debt: TD-DIST-1/2/3 all shipped in commit 8603148 + +- **TD-DIST-1** (Distance trait): `contract::distance` module with `Distance` trait, + `fisher_z_inverse`, `mean_similarity_fisher`. Impls for `[u64; 256]`, `[u8; 6]`, `[u8; 3]`. + 11 tests. Status: **PAID**. +- **TD-DIST-2** (vector_ops scalar→SIMD): `cosine_distance`, `cosine_similarity`, + `dot_product_distance`, `dot_product_similarity` all now delegate to + `ndarray::hpc::heel_f64x8::cosine_f32_to_f64_simd` / `dot_f64_simd`. Status: **PAID**. +- **TD-DIST-3** (Palette distance table): `Palette::build_distance_table()` → + `PaletteDistanceTable` with O(1) `distance(a, b)` and `edge_distance(a, b)`. + 128 KB table, L2-resident. Status: **PAID**. diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 4d6731af..0b4a9194 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -16,7 +16,7 @@ concurrency: env: CARGO_TERM_COLOR: always - RUSTFLAGS: "-C debuginfo=1" + RUSTFLAGS: "-C debuginfo=1 -C target-cpu=x86-64-v3" RUST_BACKTRACE: "1" CARGO_INCREMENTAL: "0" diff --git a/.github/workflows/rust-publish.yml b/.github/workflows/rust-publish.yml index 26f660a5..ded7ddc8 100644 --- a/.github/workflows/rust-publish.yml +++ b/.github/workflows/rust-publish.yml @@ -20,7 +20,7 @@ on: env: CARGO_TERM_COLOR: always - RUSTFLAGS: "-C debuginfo=1" + RUSTFLAGS: "-C debuginfo=1 -C target-cpu=x86-64-v3" RUST_BACKTRACE: "1" CARGO_INCREMENTAL: "0" CARGO_BUILD_JOBS: "1" diff --git a/.github/workflows/rust-test.yml b/.github/workflows/rust-test.yml index c14c4533..9ace0568 100644 --- a/.github/workflows/rust-test.yml +++ b/.github/workflows/rust-test.yml @@ -16,7 +16,7 @@ concurrency: env: CARGO_TERM_COLOR: always - RUSTFLAGS: "-C debuginfo=1" + RUSTFLAGS: "-C debuginfo=1 -C target-cpu=x86-64-v3" RUST_BACKTRACE: "1" CARGO_INCREMENTAL: "0" diff --git a/.github/workflows/style.yml b/.github/workflows/style.yml index 75b80b02..8d6ac18f 100644 --- a/.github/workflows/style.yml +++ b/.github/workflows/style.yml @@ -16,7 +16,7 @@ concurrency: env: CARGO_TERM_COLOR: always - RUSTFLAGS: "-C debuginfo=1" + RUSTFLAGS: "-C debuginfo=1 -C target-cpu=x86-64-v3" jobs: format: diff --git a/Dockerfile b/Dockerfile index 56319f93..25786eb1 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,7 +1,10 @@ -# lance-graph — Railway compile-test image +# lance-graph — Railway compile-test image (AVX2 default) # Verifies the workspace builds cleanly (core + bgz17 + planner + contract) # Requires Rust 1.94.0 (LazyLock, modern std APIs) # +# CPU detection & SIMD dispatch documentation: see Dockerfile.md +# AVX-512 pinned variant: see Dockerfile.avx512 +# # Build: docker build -t lance-graph-test . # Run: docker run --rm lance-graph-test @@ -38,6 +41,11 @@ COPY crates/bgz17/Cargo.toml crates/bgz17/Cargo.toml # Copy source COPY crates/ crates/ +# Default target: x86-64-v3 (AVX2) — runs on GitHub CI and most servers. +# Use Dockerfile.avx512 for x86-64-v4 (AVX-512) on Skylake-X / Ice Lake / Sapphire Rapids. +# The .cargo/config.toml pins x86-64-v4 for LOCAL builds; override here for portability. +ENV RUSTFLAGS="-C target-cpu=x86-64-v3" + # Build bgz17 standalone (zero deps, fast check) RUN cargo build --release --manifest-path crates/bgz17/Cargo.toml 2>&1 \ && echo "=== BGZ17 BUILD OK ===" diff --git a/Dockerfile.avx512 b/Dockerfile.avx512 index 464830bb..bf82329d 100644 --- a/Dockerfile.avx512 +++ b/Dockerfile.avx512 @@ -4,6 +4,9 @@ # # ONLY deploy on AVX-512 hardware. # +# CPU detection & SIMD dispatch documentation: see Dockerfile.md +# Portable (AVX2) variant: see Dockerfile +# # Build: docker build -f Dockerfile.avx512 -t lance-graph-avx512 . # Run: docker run --rm lance-graph-avx512 diff --git a/Dockerfile.md b/Dockerfile.md new file mode 100644 index 00000000..dd0ec3f3 --- /dev/null +++ b/Dockerfile.md @@ -0,0 +1,111 @@ +# lance-graph Docker CPU Detection & SIMD Dispatch + +## Three-Tier Build Strategy + +| Target | Dockerfile | RUSTFLAGS | Use case | +|---|---|---|---| +| **Portable (AVX2)** | `Dockerfile` | `-C target-cpu=x86-64-v3` | GitHub CI, general servers | +| **AVX-512 pinned** | `Dockerfile.avx512` | `-C target-cpu=x86-64-v4` | Production (Skylake-X+) | +| **HHTL-D TTS** | `Dockerfile.hhtld` | (inherits) | TTS inference container | +| **Local dev** | `.cargo/config.toml` | `-C target-cpu=x86-64-v4` | Developer machines | + +## How lance-graph Uses SIMD + +lance-graph delegates all SIMD work to **ndarray** (mandatory dependency). +ndarray's `src/simd.rs` polyfill provides the dispatch: + +``` +Consumer code (lance-graph): + ndarray::hpc::bitwise::hamming_distance_raw(a, b) + ndarray::simd::F32x16::mul_add(b, c) + ndarray::hpc::renderer::integrate_simd(pos, vel, dt, damp) + +Polyfill (ndarray simd.rs): + ┌─────────────────────────┐ + │ compile-time target_cpu │ + ├─────────┬───────────────┤ + │ v4 │ v3 / lower │ + ├─────────┼───────────────┤ + │ __m512 │ 2× __m256 or │ + │ native │ scalar loop │ + └─────────┴───────────────┘ + + + ┌──────────────────────────────┐ + │ runtime LazyLock │ + │ is_x86_feature_detected!() │ + │ → per-function AVX-512 even │ + │ when compiled at v3 │ + └──────────────────────────────┘ +``` + +### What lance-graph calls from ndarray SIMD + +| lance-graph location | ndarray function | What it does | +|---|---|---| +| `driver.rs` (shader hot loop) | `bitwise::hamming_distance_raw` | Content-plane Hamming pre-pass (16K-bit fingerprints) | +| `vector_ops.rs` (DataFusion UDF) | `bitwise::hamming_distance_raw` | SQL `hamming_distance()` function | +| `fingerprint.rs` (graph) | `bitwise::hamming_distance_raw` | Graph fingerprint similarity | +| `blasgraph/types.rs` | Own AVX-512/AVX2 Hamming | Hand-rolled (predates ndarray integration) | + +### `.cargo/config.toml` vs CI RUSTFLAGS + +**Important:** `RUSTFLAGS` env var **replaces** (not appends to) the `rustflags` +array in `.cargo/config.toml`. This is a Cargo design decision. + +lance-graph's `.cargo/config.toml` sets `target-cpu=x86-64-v4` for local dev. +CI workflows set `RUSTFLAGS="-C debuginfo=1 -C target-cpu=x86-64-v3"` which +**overrides** config.toml entirely. The CI binary targets AVX2. + +This is intentional: +- Local dev: maximum SIMD (AVX-512, everything inlined) +- CI: portable (AVX2, runtime detection for anything higher) +- Production Docker: choose `Dockerfile` (AVX2) or `Dockerfile.avx512` + +## AMX Detection + +Intel AMX (Sapphire Rapids+) is detected at runtime by ndarray: +`ndarray::hpc::amx_matmul::amx_available()` checks CPUID + OS XSAVE support. +AMX kernels are always compiled in and gated at call sites. No Dockerfile +or RUSTFLAGS change needed — it works with any `target-cpu`. + +## NEON (ARM / aarch64 / Raspberry Pi) + +ndarray detects NEON automatically on aarch64 (it's mandatory). The `dotprod` +extension (Pi 5 / A76+) is runtime-detected for 4× int8 throughput. +lance-graph inherits this via ndarray; no ARM-specific configuration needed. + +## Choosing the Right Dockerfile + +``` +GitHub CI / PR checks → Dockerfile (AVX2, -C target-cpu=x86-64-v3) +Railway / production → Dockerfile.avx512 (-C target-cpu=x86-64-v4) +TTS inference → Dockerfile.hhtld (downloads codebooks + runs decoder) +Raspberry Pi / ARM → Dockerfile (NEON auto-detected at runtime) +Maximum compatibility → docker build --build-arg RUSTFLAGS="-C target-cpu=x86-64" +``` + +## Verifying CPU Features + +```bash +# Inside the container: +cat /proc/cpuinfo | grep -oP 'avx512\w+' | sort -u + +# From Rust (ndarray): +use ndarray::hpc::simd_caps::simd_caps; +println!("{:?}", simd_caps()); // CpuCaps { avx512: true, avx2: true, fma: true, ... } +``` + +## Build Examples + +```bash +# Default (AVX2) — safe everywhere +docker build -t lance-graph-test . + +# AVX-512 pinned — production servers +docker build -f Dockerfile.avx512 -t lance-graph-avx512 . + +# TTS inference +docker build -f Dockerfile.hhtld \ + --build-arg RELEASE_TAG=v0.1.0 \ + -t lance-graph-tts:v0.1.0 . +``` diff --git a/crates/bgz17/src/palette.rs b/crates/bgz17/src/palette.rs index c656e48a..9297c87d 100644 --- a/crates/bgz17/src/palette.rs +++ b/crates/bgz17/src/palette.rs @@ -65,6 +65,24 @@ impl Palette { best_idx } + /// Build a precomputed distance table for O(1) inter-centroid distance. + /// + /// Returns a 256×256 u16 table where `table[i][j]` = L1 distance between + /// `entries[i]` and `entries[j]`. Used by the renderer and cascade skip + /// for fast palette-edge distance without recomputing L1 per query. + pub fn build_distance_table(&self) -> PaletteDistanceTable { + let k = self.entries.len(); + let mut table = vec![0u16; 256 * 256]; + for i in 0..k { + for j in i..k { + let d = self.entries[i].l1(&self.entries[j]) as u16; + table[i * 256 + j] = d; + table[j * 256 + i] = d; + } + } + PaletteDistanceTable { table, size: k } + } + /// Encode an SpoBase17 edge to palette indices. pub fn encode_edge(&self, edge: &SpoBase17) -> PaletteEdge { PaletteEdge { @@ -226,6 +244,41 @@ impl Palette { } } +/// Precomputed 256×256 L1 distance table for O(1) inter-centroid lookup. +/// +/// Built once from a `Palette` via `palette.build_distance_table()`. +/// Used by the cascade skip (HHTL), renderer force-directed layout, and +/// any path that needs repeated palette-edge distance without recomputing L1. +/// +/// Memory: 256×256×2 = 128 KB (fits L2 cache). Build cost: O(k²×17). +#[derive(Clone)] +pub struct PaletteDistanceTable { + table: Vec, + size: usize, +} + +impl PaletteDistanceTable { + /// O(1) distance between two palette indices. + #[inline] + pub fn distance(&self, a: u8, b: u8) -> u16 { + self.table[a as usize * 256 + b as usize] + } + + /// Number of active entries (≤ 256). + pub fn size(&self) -> usize { self.size } + + /// Distance between two PaletteEdges (sum of S + P + O distances). + #[inline] + pub fn edge_distance(&self, a: PaletteEdge, b: PaletteEdge) -> u32 { + self.distance(a.s_idx, b.s_idx) as u32 + + self.distance(a.p_idx, b.p_idx) as u32 + + self.distance(a.o_idx, b.o_idx) as u32 + } + + /// Memory footprint in bytes. + pub fn byte_size(&self) -> usize { self.table.len() * 2 } +} + /// Palette resolution: trade compression vs accuracy. /// /// Edge count determines optimal palette size: diff --git a/crates/cognitive-shader-driver/src/driver.rs b/crates/cognitive-shader-driver/src/driver.rs index e0b4764f..bc439f64 100644 --- a/crates/cognitive-shader-driver/src/driver.rs +++ b/crates/cognitive-shader-driver/src/driver.rs @@ -175,9 +175,9 @@ impl ShaderDriver { let fp_i = self.bindspace.fingerprints.content_row(row_i as usize); for (j_off, &row_j) in passed_rows.iter().enumerate().skip(i + 1) { let fp_j = self.bindspace.fingerprints.content_row(row_j as usize); - let hamming: u32 = fp_i.iter().zip(fp_j.iter()) - .map(|(a, b)| (a ^ b).count_ones()) - .sum(); + let fp_i_bytes = unsafe { std::slice::from_raw_parts(fp_i.as_ptr() as *const u8, WORDS_PER_FP * 8) }; + let fp_j_bytes = unsafe { std::slice::from_raw_parts(fp_j.as_ptr() as *const u8, WORDS_PER_FP * 8) }; + let hamming = ndarray::hpc::bitwise::hamming_distance_raw(fp_i_bytes, fp_j_bytes) as u32; let resonance = 1.0 - (hamming as f32 / FP_BITS); if resonance >= min_resonance { hits.push(ShaderHit { diff --git a/crates/lance-graph-contract/src/distance.rs b/crates/lance-graph-contract/src/distance.rs new file mode 100644 index 00000000..e575297c --- /dev/null +++ b/crates/lance-graph-contract/src/distance.rs @@ -0,0 +1,195 @@ +//! Type-intrinsic distance dispatch — zero crate-boundary tax. +//! +//! Each carrier type implements `Distance` so the SoA consumer calls +//! `a.distance(&b)` and the compiler monomorphizes the correct kernel. +//! No `dyn`, no enum match, no runtime cost. +//! +//! | Carrier | Distance | Kernel | +//! |---|---|---| +//! | `[u64; 256]` (Binary16K) | Hamming (popcount of XOR) | SIMD VPOPCNTDQ | +//! | `[f32; 16_384]` (Vsa16kF32) | Cosine → FisherZ | F32x16 FMA | +//! | `[u8; 6]` (CamPqCode) | ADC lookup | Precomputed table | +//! | `[u8; 3]` (PaletteEdge) | Palette L1 lookup | 256×256 table | + +/// Universal distance trait for all carrier types. +/// +/// The trait monomorphizes at compile time — no dynamic dispatch. +/// Implementations live in ndarray (SIMD kernels) or carrier crates +/// (precomputed tables); the contract defines only the interface. +pub trait Distance: Sized { + /// Maximum possible distance for this type (used for normalization). + const MAX_DISTANCE: u32; + + /// Compute distance between two carriers of the same type. + fn distance(&self, other: &Self) -> u32; + + /// Normalized similarity in [0.0, 1.0]. Default: 1 - d/MAX. + #[inline] + fn similarity(&self, other: &Self) -> f32 { + if Self::MAX_DISTANCE == 0 { return 1.0; } + 1.0 - (self.distance(other) as f32 / Self::MAX_DISTANCE as f32) + } + + /// FisherZ-transformed similarity for safe averaging. + /// + /// Cosine similarity ∈ [-1, 1] is nonlinear for averaging. FisherZ + /// maps it to a normal-distributed variable: z = atanh(r). Average + /// in z-space, then tanh(z_avg) maps back. For non-cosine distances, + /// the default implementation uses the [0,1] similarity directly. + #[inline] + fn similarity_z(&self, other: &Self) -> f32 { + let s = self.similarity(other); + let clamped = s.clamp(-0.999, 0.999); + ((1.0 + clamped) / (1.0 - clamped)).ln() * 0.5 + } +} + +/// Inverse FisherZ: recover similarity from z-transformed value. +#[inline] +pub fn fisher_z_inverse(z: f32) -> f32 { + let e2z = (2.0 * z).exp(); + (e2z - 1.0) / (e2z + 1.0) +} + +/// Average similarities via FisherZ transform (correct for nonlinear scales). +pub fn mean_similarity_fisher(z_values: &[f32]) -> f32 { + if z_values.is_empty() { return 0.0; } + let mean_z: f32 = z_values.iter().sum::() / z_values.len() as f32; + fisher_z_inverse(mean_z) +} + +// ───────────────────────────────────────────────────────────────────── +// Implementations for contract types (zero-dep, no SIMD — baseline). +// ndarray consumers should shadow these with SIMD-accelerated versions +// via the same trait on the same types (blanket impls or newtype wrappers). +// +// These scalar impls guarantee the trait works everywhere, including +// in the contract crate's own tests and in WASM/embedded targets +// where ndarray may not be available. +// ───────────��─────────────────────────��─────────────────────────────── + +/// Binary16K: Hamming distance (scalar baseline). +impl Distance for [u64; 256] { + const MAX_DISTANCE: u32 = 16_384; + + #[inline] + fn distance(&self, other: &Self) -> u32 { + let mut d = 0u32; + for i in 0..256 { + d += (self[i] ^ other[i]).count_ones(); + } + d + } +} + +/// CamPqCode: byte-wise L1 distance (6-byte ADC code, scalar baseline). +/// Real ADC uses precomputed distance tables; this is the fallback. +impl Distance for [u8; 6] { + const MAX_DISTANCE: u32 = 255 * 6; + + #[inline] + fn distance(&self, other: &Self) -> u32 { + let mut d = 0u32; + for i in 0..6 { + d += (self[i] as i16 - other[i] as i16).unsigned_abs() as u32; + } + d + } +} + +/// PaletteEdge: byte-wise L1 distance (3-byte SPO palette code). +impl Distance for [u8; 3] { + const MAX_DISTANCE: u32 = 255 * 3; + + #[inline] + fn distance(&self, other: &Self) -> u32 { + let mut d = 0u32; + for i in 0..3 { + d += (self[i] as i16 - other[i] as i16).unsigned_abs() as u32; + } + d + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn binary16k_hamming_self_zero() { + let a = [0u64; 256]; + assert_eq!(a.distance(&a), 0); + assert_eq!(a.similarity(&a), 1.0); + } + + #[test] + fn binary16k_hamming_all_different() { + let a = [0u64; 256]; + let b = [u64::MAX; 256]; + assert_eq!(a.distance(&b), 16_384); + assert_eq!(a.similarity(&b), 0.0); + } + + #[test] + fn binary16k_hamming_partial() { + let mut a = [0u64; 256]; + let mut b = [0u64; 256]; + a[0] = 0xFF; + b[0] = 0x00; + assert_eq!(a.distance(&b), 8); + } + + #[test] + fn cam_pq_code_distance() { + let a = [10u8, 20, 30, 40, 50, 60]; + let b = [15u8, 25, 35, 45, 55, 65]; + assert_eq!(a.distance(&b), 30); // 5×6 + } + + #[test] + fn cam_pq_code_self_zero() { + let a = [10u8, 20, 30, 40, 50, 60]; + assert_eq!(a.distance(&a), 0); + } + + #[test] + fn palette_edge_distance() { + let a = [0u8, 0, 0]; + let b = [255u8, 255, 255]; + assert_eq!(a.distance(&b), 255 * 3); + assert_eq!(a.similarity(&b), 0.0); + } + + #[test] + fn fisher_z_roundtrip() { + let s = 0.8f32; + let z = ((1.0 + s) / (1.0 - s)).ln() * 0.5; + let recovered = fisher_z_inverse(z); + assert!((recovered - s).abs() < 1e-5); + } + + #[test] + fn mean_similarity_fisher_averaging() { + let z_values = vec![0.5, 0.5, 0.5]; + let mean = mean_similarity_fisher(&z_values); + let expected = fisher_z_inverse(0.5); + assert!((mean - expected).abs() < 1e-5); + } + + #[test] + fn similarity_z_positive_for_similar() { + let a = [0u64; 256]; + let mut b = [0u64; 256]; + b[0] = 1; // 1 bit different + let z = a.similarity_z(&b); + assert!(z > 0.0, "similar vectors should have positive z"); + } + + #[test] + fn similarity_z_near_zero_for_dissimilar() { + let a = [0u64; 256]; + let b = [u64::MAX; 256]; + let z = a.similarity_z(&b); + assert!(z < 0.01, "maximally different should have z near 0"); + } +} diff --git a/crates/lance-graph-contract/src/lib.rs b/crates/lance-graph-contract/src/lib.rs index 577aec3a..72cb318b 100644 --- a/crates/lance-graph-contract/src/lib.rs +++ b/crates/lance-graph-contract/src/lib.rs @@ -67,3 +67,4 @@ pub mod sla; pub mod auth; pub mod scenario; pub mod graph_render; +pub mod distance; diff --git a/crates/lance-graph/src/datafusion_planner/vector_ops.rs b/crates/lance-graph/src/datafusion_planner/vector_ops.rs index 17ce78ac..db8c87a0 100644 --- a/crates/lance-graph/src/datafusion_planner/vector_ops.rs +++ b/crates/lance-graph/src/datafusion_planner/vector_ops.rs @@ -137,56 +137,37 @@ pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 { return 2.0; } - let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum(); - let norm_a: f32 = a.iter().map(|x| x * x).sum::().sqrt(); - let norm_b: f32 = b.iter().map(|x| x * x).sum::().sqrt(); - - if norm_a == 0.0 || norm_b == 0.0 { - return 2.0; // Maximum distance for zero vectors - } - - let similarity = dot / (norm_a * norm_b); - 1.0 - similarity + ndarray::hpc::heel_f64x8::cosine_f32_to_f64_simd(a, b) as f32 } /// Compute cosine similarity (for vector_similarity function) /// Returns a value in [-1, 1] where 1 means identical and -1 means opposite pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 { if a.len() != b.len() { - // Dimension mismatch - return minimum similarity return -1.0; } - - let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum(); - let norm_a: f32 = a.iter().map(|x| x * x).sum::().sqrt(); - let norm_b: f32 = b.iter().map(|x| x * x).sum::().sqrt(); - - if norm_a == 0.0 || norm_b == 0.0 { - return -1.0; // Minimum similarity for zero vectors - } - - dot / (norm_a * norm_b) + ndarray::hpc::heel_f64x8::cosine_f32_to_f64_simd(a, b) as f32 } /// Compute dot product between two vectors /// For similarity search, we return the negative (so lower is better for sorting) pub fn dot_product_distance(a: &[f32], b: &[f32]) -> f32 { if a.len() != b.len() { - // Dimension mismatch - return worst distance to exclude from results return f32::MAX; } - - -a.iter().zip(b.iter()).map(|(x, y)| x * y).sum::() + let a_f64: Vec = a.iter().map(|&x| x as f64).collect(); + let b_f64: Vec = b.iter().map(|&x| x as f64).collect(); + -(ndarray::hpc::heel_f64x8::dot_f64_simd(&a_f64, &b_f64) as f32) } /// Compute dot product similarity (for vector_similarity function) pub fn dot_product_similarity(a: &[f32], b: &[f32]) -> f32 { if a.len() != b.len() { - // Dimension mismatch - return worst similarity to exclude from results return f32::MIN; } - - a.iter().zip(b.iter()).map(|(x, y)| x * y).sum::() + let a_f64: Vec = a.iter().map(|&x| x as f64).collect(); + let b_f64: Vec = b.iter().map(|&x| x as f64).collect(); + ndarray::hpc::heel_f64x8::dot_f64_simd(&a_f64, &b_f64) as f32 } /// Compute vector distance for an array of vectors against a single query vector @@ -214,20 +195,7 @@ pub fn hamming_distance(a: &[u8], b: &[u8]) -> u32 { if a.len() != b.len() { return u32::MAX; } - // Process 8 bytes at a time via u64 popcount - let chunks = a.len() / 8; - let mut dist = 0u32; - for i in 0..chunks { - let offset = i * 8; - let wa = u64::from_le_bytes(a[offset..offset + 8].try_into().unwrap()); - let wb = u64::from_le_bytes(b[offset..offset + 8].try_into().unwrap()); - dist += (wa ^ wb).count_ones(); - } - // Remainder bytes - for i in (chunks * 8)..a.len() { - dist += (a[i] ^ b[i]).count_ones(); - } - dist + ndarray::hpc::bitwise::hamming_distance_raw(a, b) as u32 } /// Hamming similarity: `1.0 - distance / total_bits`. diff --git a/crates/lance-graph/src/graph/fingerprint.rs b/crates/lance-graph/src/graph/fingerprint.rs index bcb49230..407c5a8d 100644 --- a/crates/lance-graph/src/graph/fingerprint.rs +++ b/crates/lance-graph/src/graph/fingerprint.rs @@ -80,10 +80,9 @@ pub fn dn_hash(dn: &str) -> u64 { /// /// Returns the number of bit positions where the fingerprints differ. pub fn hamming_distance(a: &Fingerprint, b: &Fingerprint) -> u32 { - a.iter() - .zip(b.iter()) - .map(|(x, y)| (x ^ y).count_ones()) - .sum() + let a_bytes = unsafe { std::slice::from_raw_parts(a.as_ptr() as *const u8, FINGERPRINT_WORDS * 8) }; + let b_bytes = unsafe { std::slice::from_raw_parts(b.as_ptr() as *const u8, FINGERPRINT_WORDS * 8) }; + ndarray::hpc::bitwise::hamming_distance_raw(a_bytes, b_bytes) as u32 } /// Zero fingerprint constant.