feat: Distance trait + SIMD Hamming/cosine wiring + PaletteDistanceTable + Dockerfile docs #269
…AVX2 default
Hamming SIMD wiring (ndarray is mandatory, no reason for scalar):
- driver.rs:178 — shader content pre-pass now calls
ndarray::hpc::bitwise::hamming_distance_raw() instead of
scalar iter().zip().map(xor.count_ones()).sum() over 256 u64 words
- vector_ops.rs:213 — DataFusion UDF hamming_distance delegates to
ndarray::hpc::bitwise::hamming_distance_raw()
- fingerprint.rs:82 — graph fingerprint Hamming delegates to ndarray
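For reference, the scalar path that these three call sites replaced can be sketched as below. This is a minimal illustration of the XOR-and-popcount loop over 256 u64 words described above; the function name `hamming_scalar` is hypothetical, and the sketch does not reproduce the ndarray kernel or its signature.

```rust
/// Scalar Hamming distance over a 16,384-bit fingerprint (256 x u64).
/// Hypothetical baseline for illustration; not the ndarray SIMD kernel.
pub fn hamming_scalar(a: &[u64; 256], b: &[u64; 256]) -> u32 {
    a.iter()
        .zip(b.iter())
        // XOR leaves a 1 in every position where the words differ.
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}
```

A SIMD kernel computes the same sum, but processes several words per instruction instead of one.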
CI/Docker fix — x86-64-v3 (AVX2) as the default everywhere:
- .github/workflows/{build,rust-test,style,rust-publish}.yml all had
RUSTFLAGS without target-cpu, which overrode .cargo/config.toml's
x86-64-v4 and compiled at BASELINE x86-64 (no AVX at all).
Now: RUSTFLAGS includes -C target-cpu=x86-64-v3 so CI gets AVX2.
- Dockerfile: added ENV RUSTFLAGS="-C target-cpu=x86-64-v3" so the
default Docker image runs on AVX2+ hardware (GitHub CI, most servers).
Dockerfile.avx512 still pins x86-64-v4 for deployment.
The split:
LOCAL (.cargo/config.toml) → x86-64-v4 (AVX-512, developer machines)
CI / Docker default → x86-64-v3 (AVX2, GitHub runners)
Dockerfile.avx512 → x86-64-v4 (AVX-512, production deploy)
ndarray's simd.rs polyfill detects AVX-512 at runtime regardless of
compile target, so the AVX2 binary still dispatches to AVX-512 kernels
on capable hardware.
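The runtime half of that dispatch can be sketched roughly as follows. This is an assumption-laden illustration, not ndarray's simd.rs: the `Tier` variants, the `TIER` static, `detect_tier`, and `hamming_dispatch` are hypothetical names, and every tier falls back to the same portable loop here so the sketch stays self-contained.

```rust
use std::sync::LazyLock;

/// Hypothetical tier enum; the real `Tier` in ndarray may look different.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Avx512,
    Avx2,
    Scalar,
}

/// The CPU is probed once, on first access, independent of the compile target.
pub static TIER: LazyLock<Tier> = LazyLock::new(detect_tier);

#[cfg(target_arch = "x86_64")]
fn detect_tier() -> Tier {
    if std::is_x86_feature_detected!("avx512f") {
        Tier::Avx512
    } else if std::is_x86_feature_detected!("avx2") {
        Tier::Avx2
    } else {
        Tier::Scalar
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn detect_tier() -> Tier {
    Tier::Scalar
}

/// Portable XOR + popcount loop, used as a stand-in for every kernel.
fn hamming_portable(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Select a kernel by detected tier. A real implementation would call
/// distinct AVX-512 / AVX2 kernels in the first two cases.
pub fn hamming_dispatch(a: &[u64], b: &[u64]) -> u32 {
    match *TIER {
        Tier::Avx512 | Tier::Avx2 | Tier::Scalar => hamming_portable(a, b),
    }
}
```

Because the probe happens at run time, a binary compiled for x86-64-v3 can still take the AVX-512 branch on hardware that supports it.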
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Comprehensive doc covering the three-tier build strategy (AVX2 default / AVX-512 pinned / local dev), two-layer dispatch model (compile-time cfg(target_feature) + runtime LazyLock<Tier>), AMX detection, NEON/ARM, RUSTFLAGS vs .cargo/config.toml override behavior, and which lance-graph locations call ndarray SIMD. Also: Dockerfile + Dockerfile.avx512 headers now reference Dockerfile.md.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
EPIPHANY: Distance dispatch must be type-intrinsic, not crate-boundary-crossing. The `Distance` trait on carrier types monomorphizes at compile time: zero dynamic dispatch, zero crate boundary tax. Contract defines the interface, ndarray provides SIMD kernels. Includes FisherZ note for cosine similarity averaging across SoA columns. Full type→distance mapping table (Binary16K→Hamming, Vsa16kF32→cosine/FisherZ, CamPq→ADC, PaletteEdge→L1 table, Base17→nearest, HighHeelBGZ→cascade).
TECH_DEBT:
- TD-DIST-1: Distance trait missing from contract (blocks generic SoA sweeps)
- TD-DIST-2: vector_ops.rs 4 scalar dot/norm/cosine loops (8-12× speedup available)
- TD-DIST-3: bgz17 Palette::nearest() brute-force 256×17 (100× speedup via table)
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…T-1/2/3)
TD-DIST-1: contract::distance module — type-intrinsic dispatch trait.
- Distance trait with distance(), similarity(), similarity_z() (FisherZ)
- impl for [u64; 256] (Binary16K → Hamming), [u8; 6] (CamPq → L1),
[u8; 3] (PaletteEdge → L1)
- fisher_z_inverse(), mean_similarity_fisher() for safe averaging
- Scalar baseline impls (work in WASM/embedded; ndarray shadows with SIMD)
- 11 tests
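A rough sketch of what such a type-intrinsic trait could look like follows. The method and helper names come from the commit message, but every signature, the `MAX` constant, the linear distance-to-similarity mapping, the clamp bound, and the generic `nearest` helper are assumptions made for illustration; the actual contract::distance module may differ.

```rust
/// Hypothetical shape of the trait; signatures are assumptions.
pub trait Distance {
    /// Largest raw distance this carrier type can produce (assumed helper).
    const MAX: f64;
    fn distance(&self, other: &Self) -> u32;
    /// Assumed mapping: linear rescale of distance into [-1, 1].
    fn similarity(&self, other: &Self) -> f64 {
        1.0 - 2.0 * f64::from(self.distance(other)) / Self::MAX
    }
    /// Fisher z-transform of the similarity.
    fn similarity_z(&self, other: &Self) -> f64 {
        fisher_z(self.similarity(other))
    }
}

/// atanh(r), clamped so r = +-1 does not produce infinity.
pub fn fisher_z(r: f64) -> f64 {
    r.clamp(-0.999_999, 0.999_999).atanh()
}

pub fn fisher_z_inverse(z: f64) -> f64 {
    z.tanh()
}

/// Average similarities in z-space, then map back with tanh.
pub fn mean_similarity_fisher(sims: &[f64]) -> f64 {
    if sims.is_empty() {
        return 0.0;
    }
    let mean_z = sims.iter().map(|&r| fisher_z(r)).sum::<f64>() / sims.len() as f64;
    fisher_z_inverse(mean_z)
}

/// Binary16K carrier: Hamming distance over 256 x u64 (scalar baseline).
impl Distance for [u64; 256] {
    const MAX: f64 = 16_384.0;
    fn distance(&self, other: &Self) -> u32 {
        self.iter().zip(other).map(|(a, b)| (a ^ b).count_ones()).sum()
    }
}

/// PaletteEdge carrier: L1 over three u8 indices.
impl Distance for [u8; 3] {
    const MAX: f64 = 765.0; // 3 * 255
    fn distance(&self, other: &Self) -> u32 {
        self.iter().zip(other).map(|(a, b)| u32::from(a.abs_diff(*b))).sum()
    }
}

/// Generic sweep: monomorphized per carrier type, no `dyn`, no enum match.
pub fn nearest<T: Distance>(query: &T, items: &[T]) -> Option<usize> {
    items
        .iter()
        .enumerate()
        .min_by_key(|(_, item)| query.distance(item))
        .map(|(i, _)| i)
}
```

Averaging in z-space rather than averaging raw similarities is what makes the mean "safe": the transform stretches values near ±1, so high correlations are not underweighted by a plain arithmetic mean.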
TD-DIST-2: vector_ops.rs cosine/dot → ndarray SIMD.
- cosine_distance/similarity → ndarray::hpc::heel_f64x8::cosine_f32_to_f64_simd
- dot_product_distance/similarity → ndarray::hpc::heel_f64x8::dot_f64_simd
- Estimated 8-12× speedup on DataFusion UDF path
TD-DIST-3: bgz17 PaletteDistanceTable — O(1) inter-centroid lookup.
- Palette::build_distance_table() → 256×256 u16 table (128 KB, L2-resident)
- PaletteDistanceTable::distance(a, b) → single array index
- edge_distance(a, b) → sum of S+P+O table lookups
- Palette::nearest() unchanged (still brute-force for query→centroid);
table is for centroid↔centroid (cascade skip, renderer force layout)
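The table idea can be sketched as below, under several stated assumptions: `Base17` is modeled as 17 i16 values, a single table is shared by the S, P, and O positions, the table is sized k×k for whatever palette is passed in (256×256 for a full palette), and narrowing to u16 saturates rather than using a plain cast. None of these details are confirmed by the commit; only the method names are taken from it.

```rust
/// Stand-in for a palette centroid (assumed layout: 17 x i16).
#[derive(Clone, Copy)]
pub struct Base17(pub [i16; 17]);

impl Base17 {
    /// L1 distance, widened to i32 so differences cannot overflow.
    pub fn l1(&self, other: &Self) -> u32 {
        self.0
            .iter()
            .zip(other.0.iter())
            .map(|(a, b)| (i32::from(*a) - i32::from(*b)).unsigned_abs())
            .sum()
    }
}

/// Precomputed centroid-to-centroid distances, row-major k x k.
pub struct PaletteDistanceTable {
    k: usize,
    table: Vec<u16>,
}

impl PaletteDistanceTable {
    pub fn build(entries: &[Base17]) -> Self {
        let k = entries.len();
        let mut table = vec![0u16; k * k];
        for i in 0..k {
            for j in i..k {
                // Assumption of this sketch: saturate instead of wrapping.
                let d = u16::try_from(entries[i].l1(&entries[j])).unwrap_or(u16::MAX);
                // The distance is symmetric, so fill both triangles.
                table[i * k + j] = d;
                table[j * k + i] = d;
            }
        }
        Self { k, table }
    }

    /// Single array index. Panics if an index is >= the palette size.
    pub fn distance(&self, a: u8, b: u8) -> u16 {
        self.table[usize::from(a) * self.k + usize::from(b)]
    }

    /// Sum of the subject, predicate, and object lookups.
    pub fn edge_distance(&self, a: [u8; 3], b: [u8; 3]) -> u32 {
        (0..3).map(|i| u32::from(self.distance(a[i], b[i]))).sum()
    }
}
```

With 256 entries the table holds 65,536 u16 values, which is the 128 KB figure quoted above.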
All three TD-DIST items from TECH_DEBT.md addressed. 260 contract tests pass,
126 bgz17 tests pass, workspace compiles clean.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 68993903ad
    let similarity = dot / (norm_a * norm_b);
    1.0 - similarity
    ndarray::hpc::heel_f64x8::cosine_f32_to_f64_simd(a, b) as f32
Restore cosine distance semantics in vector_ops
cosine_distance now returns cosine_f32_to_f64_simd directly, but that helper is a cosine similarity value (used as such throughout the repo), while this API is documented and consumed as distance (1 - similarity) in compute_vector_distances. With the current code, cosine ranking is wrong (e.g., identical vectors produce ~1 instead of 0, orthogonal vectors produce 0 instead of 1), so nearest-neighbor ordering for DistanceMetric::Cosine is inverted/corrupted.
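The distinction the review draws can be illustrated with a scalar sketch. The function names mirror the ones discussed, but the bodies are hypothetical stand-ins (including the zero-vector handling) and do not call the ndarray helper.

```rust
/// Scalar cosine similarity, accumulated in f64 (stand-in for a SIMD kernel).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (x, y) in a.iter().zip(b.iter()) {
        let (x, y) = (f64::from(*x), f64::from(*y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    // Assumption of this sketch: a zero vector has similarity 0.
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    (dot / (norm_a.sqrt() * norm_b.sqrt())) as f32
}

/// Distance is 1 - similarity: 0 for identical direction, 1 for orthogonal.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    1.0 - cosine_similarity(a, b)
}
```

Returning the similarity where a distance is expected reverses the sort order for nearest-neighbor ranking, which is the regression described above.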
    let mut table = vec![0u16; 256 * 256];
    for i in 0..k {
        for j in i..k {
            let d = self.entries[i].l1(&self.entries[j]) as u16;
Avoid truncating palette L1 distances to u16
Base17::l1 returns u32 and can exceed 65,535 (17 dimensions of i16 differences), but build_distance_table narrows each value to u16. This silently wraps large distances before they are used by PaletteDistanceTable::distance/edge_distance, causing incorrect palette-edge scores and wrong decisions in any path relying on the precomputed table.
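A small worked example of the narrowing behavior, assuming 17 dimensions of i16 as the comment states. The helper names are hypothetical; saturation is shown as one possible alternative, not as the fix the project adopted.

```rust
/// Worst case for 17 i16 dimensions: each |difference| can reach 65,535.
pub const WORST_CASE_L1: u32 = 17 * 65_535; // 1,114,095

/// `as u16` keeps only the low 16 bits, so large values wrap silently.
pub fn narrow_wrapping(d: u32) -> u16 {
    d as u16
}

/// One alternative: clamp to u16::MAX so "far" never looks "near".
pub fn narrow_saturating(d: u32) -> u16 {
    u16::try_from(d).unwrap_or(u16::MAX)
}
```

For instance, a true distance of 70,000 wraps to 4,464 under the cast, so a distant pair of centroids would be scored as a close one.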
Summary
Five commits, follow-on to merged PR #268:
SIMD Hamming wiring (was scalar)
- cognitive-shader-driver/src/driver.rs:178 - shader content pre-pass now calls ndarray::hpc::bitwise::hamming_distance_raw() instead of scalar iter().zip().map(xor.count_ones()).sum() over 256 u64 words. ~8-16× speedup (AVX-512 VPOPCNTDQ).
- lance-graph/src/datafusion_planner/vector_ops.rs:213 - DataFusion UDF hamming_distance delegates to ndarray.
- lance-graph/src/graph/fingerprint.rs:82 - graph fingerprint Hamming delegates to ndarray.
CI / Docker AVX2 default
- Workflows (build.yml, rust-test.yml, style.yml, rust-publish.yml) had RUSTFLAGS: "-C debuginfo=1", which overrode .cargo/config.toml's target-cpu=x86-64-v4 and compiled at baseline x86-64 (no AVX at all).
- Now RUSTFLAGS: "-C debuginfo=1 -C target-cpu=x86-64-v3", so CI gets AVX2.
- Dockerfile: ENV RUSTFLAGS="-C target-cpu=x86-64-v3" so the default Docker image runs on AVX2+ hardware. Dockerfile.avx512 unchanged.
Dockerfile.md documentation
- … .cargo/config.toml override behavior, AMX, NEON, decision flowchart.
Distance trait (contract::distance) - TD-DIST-1
- Distance trait with distance(), similarity(), similarity_z() (FisherZ transform).
- fp_a.distance(&fp_b) monomorphizes at compile time: zero crate boundary tax. No dyn, no enum match.
- Impls for [u64; 256] (Binary16K → Hamming), [u8; 6] (CamPq → L1), [u8; 3] (PaletteEdge → L1).
- fisher_z_inverse() + mean_similarity_fisher() for safe averaging across SoA columns.
SIMD cosine/dot wiring - TD-DIST-2
- vector_ops.rs cosine/dot replaced 4 scalar loops with ndarray::hpc::heel_f64x8::{cosine_f32_to_f64_simd, dot_f64_simd}. Estimated 8-12× speedup on DataFusion UDF path.
bgz17 PaletteDistanceTable - TD-DIST-3
- Palette::build_distance_table() → 256×256 u16 table (128 KB, L2-resident).
- PaletteDistanceTable::distance(a, b) → O(1) array index.
- edge_distance(a, b) → S+P+O table lookups.
Board hygiene
- EPIPHANIES.md: Distance dispatch FINDING (type-intrinsic dispatch, FisherZ note, full type→distance mapping table).
- TECH_DEBT.md: TD-DIST-1/2/3 opened, then marked PAID in same session.
Test plan
- cargo check workspace clean
- … -C target-cpu=x86-64-v3
Commits
- 2d7e6b3 fix: wire ndarray SIMD Hamming into all scalar hot paths + CI/Docker AVX2 default
- ca4eb8b docs: Dockerfile.md - CPU detection & SIMD dispatch documentation
- 3a983b4 docs: distance dispatch epiphany + 3 tech debt entries (TD-DIST-1/2/3)
- 277232b feat: Distance trait + SIMD cosine/dot + PaletteDistanceTable (TD-DIST-1/2/3)
- 6899390 chore(board): mark TD-DIST-1/2/3 paid in commit 8603148
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Generated by Claude Code