forked from lance-format/lance-graph
feat: Distance trait + SIMD Hamming/cosine wiring + PaletteDistanceTable + Dockerfile docs #269
Merged
5 commits
a6cec70
fix: wire ndarray SIMD Hamming into all scalar hot paths + CI/Docker …
claude ca4eb8b
docs: Dockerfile.md — CPU detection & SIMD dispatch documentation
claude 3a983b4
docs: distance dispatch epiphany + 3 tech debt entries (TD-DIST-1/2/3)
claude 277232b
feat: Distance trait + SIMD cosine/dot + PaletteDistanceTable (TD-DIS…
claude 6899390
chore(board): mark TD-DIST-1/2/3 paid in commit 8603148
claude
@@ -0,0 +1,111 @@
# lance-graph Docker CPU Detection & SIMD Dispatch

## Three-Tier Build Strategy

| Target | Dockerfile | RUSTFLAGS | Use case |
|---|---|---|---|
| **Portable (AVX2)** | `Dockerfile` | `-C target-cpu=x86-64-v3` | GitHub CI, general servers |
| **AVX-512 pinned** | `Dockerfile.avx512` | `-C target-cpu=x86-64-v4` | Production (Skylake-X+) |
| **HHTL-D TTS** | `Dockerfile.hhtld` | (inherits) | TTS inference container |
| **Local dev** | `.cargo/config.toml` | `-C target-cpu=x86-64-v4` | Developer machines |

## How lance-graph Uses SIMD

lance-graph delegates its SIMD work to **ndarray** (a mandatory dependency); the one exception is the hand-rolled Hamming kernel in `blasgraph/types.rs`, listed in the table below.
ndarray's `src/simd.rs` polyfill provides the dispatch:

```
Consumer code (lance-graph):
  ndarray::hpc::bitwise::hamming_distance_raw(a, b)
  ndarray::simd::F32x16::mul_add(b, c)
  ndarray::hpc::renderer::integrate_simd(pos, vel, dt, damp)

Polyfill (ndarray simd.rs):
  ┌─────────────────────────┐
  │ compile-time target_cpu │
  ├─────────┬───────────────┤
  │ v4      │ v3 / lower    │
  ├─────────┼───────────────┤
  │ __m512  │ 2× __m256 or  │
  │ native  │ scalar loop   │
  └─────────┴───────────────┘
              +
  ┌──────────────────────────────┐
  │ runtime LazyLock<Tier>       │
  │ is_x86_feature_detected!()   │
  │ → per-function AVX-512 even  │
  │   when compiled at v3        │
  └──────────────────────────────┘
```

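The runtime half of the diagram can be sketched as follows. This is a minimal illustration and not ndarray's actual `simd.rs`: the `Tier` enum and the `detect_tier`, `hamming_scalar`, and `hamming` names are assumptions made for the example, and every tier falls back to the same scalar kernel where a real implementation would call `#[target_feature]`-gated kernels.

```rust
use std::sync::LazyLock;

/// Hypothetical dispatch tiers; the real `Tier` in ndarray may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Avx512,
    Avx2,
    Scalar,
}

/// Probe the CPU once. On non-x86 targets this always yields `Scalar`.
fn detect_tier() -> Tier {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx512f") {
            return Tier::Avx512;
        }
        if std::is_x86_feature_detected!("avx2") {
            return Tier::Avx2;
        }
    }
    Tier::Scalar
}

/// Cached result of the probe, initialised on first use.
static TIER: LazyLock<Tier> = LazyLock::new(detect_tier);

/// Portable reference kernel: XOR the words and count the set bits.
fn hamming_scalar(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Dispatching entry point. The SIMD arms are placeholders in this sketch.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "fingerprints must have equal length");
    match *TIER {
        Tier::Avx512 => hamming_scalar(a, b), // an AVX-512 kernel would go here
        Tier::Avx2 => hamming_scalar(a, b),   // an AVX2 kernel would go here
        Tier::Scalar => hamming_scalar(a, b),
    }
}
```

Because the tier is chosen at runtime, a binary compiled at `x86-64-v3` can still take the AVX-512 arm on hardware that supports it, which is the behaviour the lower box of the diagram describes.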
### What lance-graph calls from ndarray SIMD

| lance-graph location | ndarray function | What it does |
|---|---|---|
| `driver.rs` (shader hot loop) | `bitwise::hamming_distance_raw` | Content-plane Hamming pre-pass (16K-bit fingerprints) |
| `vector_ops.rs` (DataFusion UDF) | `bitwise::hamming_distance_raw` | SQL `hamming_distance()` function |
| `fingerprint.rs` (graph) | `bitwise::hamming_distance_raw` | Graph fingerprint similarity |
| `blasgraph/types.rs` | Own AVX-512/AVX2 Hamming | Hand-rolled (predates ndarray integration) |

### `.cargo/config.toml` vs CI RUSTFLAGS

**Important:** the `RUSTFLAGS` environment variable **replaces** (it does not append to) the `rustflags`
array in `.cargo/config.toml`. Cargo takes rustflags from a single source, and the environment variable has priority over the config file.

lance-graph's `.cargo/config.toml` sets `target-cpu=x86-64-v4` for local dev.
CI workflows set `RUSTFLAGS="-C debuginfo=1 -C target-cpu=x86-64-v3"`, which
**overrides** config.toml entirely, so the CI binary targets AVX2.

This is intentional:
- Local dev: maximum SIMD (AVX-512, everything inlined)
- CI: portable (AVX2, runtime detection for anything higher)
- Production Docker: choose `Dockerfile` (AVX2) or `Dockerfile.avx512`

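The replace-not-append rule can be modelled in a few lines. This is a deliberately simplified sketch: `effective_rustflags` is a hypothetical function, not Cargo's code, and it ignores the other sources Cargo consults (such as `CARGO_ENCODED_RUSTFLAGS` and `target.<triple>.rustflags`).

```rust
/// Simplified model: the first source that is present wins, and sources are
/// never merged. `env_rustflags` stands for the `RUSTFLAGS` variable,
/// `config_rustflags` for the `rustflags` array in `.cargo/config.toml`.
fn effective_rustflags(env_rustflags: Option<&str>, config_rustflags: &[&str]) -> Vec<String> {
    match env_rustflags {
        // Environment variable set: config.toml is ignored entirely.
        Some(flags) => flags.split_whitespace().map(str::to_owned).collect(),
        // Otherwise the config array is used as written.
        None => config_rustflags.iter().map(|s| s.to_string()).collect(),
    }
}
```

With the CI value from above and a config array of `["-C", "target-cpu=x86-64-v4"]`, the result contains only `-C debuginfo=1 -C target-cpu=x86-64-v3`; the `x86-64-v4` flag is dropped rather than combined.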
## AMX Detection

Intel AMX (Sapphire Rapids+) is detected at runtime by ndarray:
`ndarray::hpc::amx_matmul::amx_available()` checks CPUID + OS XSAVE support.
AMX kernels are always compiled in and gated at call sites. No Dockerfile
or RUSTFLAGS change is needed; it works with any `target-cpu`.

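For readers unfamiliar with what a "CPUID + OS XSAVE" check involves, here is a rough sketch. It is not ndarray's `amx_available()`: `amx_usable` and `amx_available_sketch` are hypothetical names, and the sketch only tests the AMX-TILE bit plus the two XCR0 tile-state bits, which is an assumption about what such a check would cover.

```rust
/// CPUID.(EAX=7,ECX=0):EDX bit 24 advertises AMX tile support (AMX-TILE).
const CPUID7_EDX_AMX_TILE: u32 = 1 << 24;
/// XCR0 bits 17 (XTILECFG) and 18 (XTILEDATA): the OS saves AMX tile state.
const XCR0_AMX_STATE: u64 = (1 << 17) | (1 << 18);

/// Pure decision logic, kept separate from the hardware probes so it can be tested.
fn amx_usable(cpuid7_edx: u32, osxsave: bool, xcr0: u64) -> bool {
    let cpu_has_tiles = (cpuid7_edx & CPUID7_EDX_AMX_TILE) != 0;
    let os_saves_tiles = osxsave && (xcr0 & XCR0_AMX_STATE) == XCR0_AMX_STATE;
    cpu_has_tiles && os_saves_tiles
}

/// Reads the real CPUID/XCR0 values on x86_64.
#[cfg(target_arch = "x86_64")]
fn amx_available_sketch() -> bool {
    use std::arch::x86_64::{__cpuid, __cpuid_count, _xgetbv};
    // SAFETY: CPUID exists on every x86_64 CPU, and XGETBV is executed only
    // after CPUID.1:ECX.OSXSAVE (bit 27) confirms the OS has enabled it.
    unsafe {
        if __cpuid(0).eax < 7 {
            return false; // leaf 7 not implemented
        }
        let osxsave = (__cpuid(1).ecx & (1 << 27)) != 0;
        if !osxsave {
            return false;
        }
        amx_usable(__cpuid_count(7, 0).edx, osxsave, _xgetbv(0))
    }
}

/// AMX does not exist on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn amx_available_sketch() -> bool {
    false
}
```

Since the probe runs at runtime, nothing in it depends on `target-cpu`, which is consistent with the statement that no RUSTFLAGS change is required.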
## NEON (ARM / aarch64 / Raspberry Pi)

ndarray uses NEON automatically on aarch64, where it is a mandatory part of the architecture. The `dotprod`
extension (Pi 5 / A76+) is runtime-detected for 4× int8 throughput.
lance-graph inherits this via ndarray; no ARM-specific configuration is needed.

## Choosing the Right Dockerfile

```
GitHub CI / PR checks   → Dockerfile (AVX2, -C target-cpu=x86-64-v3)
Railway / production    → Dockerfile.avx512 (-C target-cpu=x86-64-v4)
TTS inference           → Dockerfile.hhtld (downloads codebooks + runs decoder)
Raspberry Pi / ARM      → Dockerfile (NEON auto-detected at runtime)
Maximum compatibility   → docker build --build-arg RUSTFLAGS="-C target-cpu=x86-64"
```

## Verifying CPU Features

```bash
# Inside the container:
grep -oP 'avx512\w+' /proc/cpuinfo | sort -u

# From Rust (ndarray), in a Rust source file rather than the shell:
#   use ndarray::hpc::simd_caps::simd_caps;
#   println!("{:?}", simd_caps()); // CpuCaps { avx512: true, avx2: true, fma: true, ... }
```

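`simd_caps()` reports what the CPU supports at runtime. To check what the binary itself was compiled for (that is, which `target-cpu` actually took effect), the standard `cfg!(target_feature = ...)` macro can be used. A small sketch follows; `compiled_simd_level` is a hypothetical helper and not part of ndarray or lance-graph.

```rust
/// Reports the widest SIMD feature this binary was *compiled* with.
/// `cfg!` is evaluated at compile time, so the answer reflects RUSTFLAGS,
/// not the CPU the binary happens to run on.
fn compiled_simd_level() -> &'static str {
    if cfg!(target_feature = "avx512f") {
        "avx512f"
    } else if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "neon") {
        "neon"
    } else {
        "baseline"
    }
}
```

A binary built with `-C target-cpu=x86-64-v3` would be expected to report `avx2`, and one built with `-C target-cpu=x86-64-v4` to report `avx512f`.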
## Build Examples

```bash
# Default (AVX2): runs on any x86-64-v3 host
docker build -t lance-graph-test .

# AVX-512 pinned: production servers
docker build -f Dockerfile.avx512 -t lance-graph-avx512 .

# TTS inference
docker build -f Dockerfile.hhtld \
  --build-arg RELEASE_TAG=v0.1.0 \
  -t lance-graph-tts:v0.1.0 .
```
`Base17::l1` returns `u32` and can exceed 65,535 (17 dimensions of `i16` differences), but `build_distance_table` narrows each value to `u16`. This silently wraps large distances before they are used by `PaletteDistanceTable::distance`/`edge_distance`, causing incorrect palette-edge scores and wrong decisions in any path relying on the precomputed table.
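The wraparound is easy to reproduce in isolation. The sketch below assumes a layout for `Base17` (17 `i16` coordinates) based only on the description in this comment; the struct, `narrow_wrapping`, and `narrow_saturating` are stand-ins for illustration, not the PR's code.

```rust
/// Assumed stand-in for the PR's `Base17`: 17 signed 16-bit coordinates.
#[derive(Clone, Copy)]
struct Base17 {
    dims: [i16; 17],
}

impl Base17 {
    /// L1 distance, widened to i32 per coordinate so the subtraction cannot overflow.
    /// The maximum is 17 * 65_535 = 1_114_095, which fits in u32 but not in u16.
    fn l1(&self, other: &Base17) -> u32 {
        self.dims
            .iter()
            .zip(other.dims.iter())
            .map(|(&a, &b)| (a as i32 - b as i32).unsigned_abs())
            .sum()
    }
}

/// What an `as u16` cast does: keeps only the low 16 bits.
fn narrow_wrapping(d: u32) -> u16 {
    d as u16
}

/// One possible fix: clamp to `u16::MAX` so large distances stay large.
fn narrow_saturating(d: u32) -> u16 {
    u16::try_from(d).unwrap_or(u16::MAX)
}
```

For two points that differ by 4,000 in every coordinate, `l1` is 68,000; the wrapping cast stores 2,464, while the saturating version stores 65,535. Saturating preserves ordering only up to the clamp, so widening the table entries to `u32` is the other option if exact large distances matter.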