Claude/unified query planner a w8ax#65
Merged
F32x8 and F64x4 previously implemented only Add and Mul. The AVX2 fallback for F32x16 needs all four arithmetic ops (Add, Sub, Mul, Div) on the 256-bit types. This change is additive only: no existing code changed.

https://claude.ai/code/session_01BTATTRUACijvsK4hqmKUBR
F32x16, F64x8, U8x64, I32x16, I64x8, U32x16, U64x8: all composed from 2× AVX2 halves (F32x8/F64x4 for float, array loops for integer). Same API as the simd_avx512.rs types. simd.rs will LazyLock-dispatch between the two files based on runtime CPU detection.

Add/Sub/Mul/Div on F32x16 dispatch to 2× F32x8 operations (AVX2). Integer types use array loops (AVX2 lacks 512-bit integer SIMD).

https://claude.ai/code/session_01BTATTRUACijvsK4hqmKUBR
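The composition described above can be sketched roughly as follows. This is a minimal illustration, not the code from simd_avx2.rs: it assumes F32x8 can be stood in for by a plain `[f32; 8]` array (the real type presumably wraps `__m256`), and the `lo`/`hi` field names, the `impl_lanewise!` macro, and the `splat`/`from_array`/`to_array` helpers are hypothetical.

```rust
use std::ops::{Add, Div, Mul, Sub};

/// Portable stand-in for the 256-bit base type.
/// Assumption: the real F32x8 wraps an AVX register; an array keeps this sketch runnable anywhere.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x8(pub [f32; 8]);

/// 512-bit-wide type composed from two 256-bit halves (field names are hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x16 {
    lo: F32x8,
    hi: F32x8,
}

/// Implements one arithmetic trait for both types:
/// lane-wise on F32x8, and as two F32x8 operations on F32x16.
macro_rules! impl_lanewise {
    ($trait:ident, $method:ident, $op:tt) => {
        impl $trait for F32x8 {
            type Output = F32x8;
            fn $method(self, rhs: F32x8) -> F32x8 {
                F32x8(std::array::from_fn(|i| self.0[i] $op rhs.0[i]))
            }
        }
        impl $trait for F32x16 {
            type Output = F32x16;
            fn $method(self, rhs: F32x16) -> F32x16 {
                // One 16-lane operation becomes two 8-lane operations.
                F32x16 { lo: self.lo $op rhs.lo, hi: self.hi $op rhs.hi }
            }
        }
    };
}

impl_lanewise!(Add, add, +);
impl_lanewise!(Sub, sub, -);
impl_lanewise!(Mul, mul, *);
impl_lanewise!(Div, div, /);

impl F32x16 {
    /// Broadcasts one value to all 16 lanes.
    pub fn splat(v: f32) -> Self {
        F32x16 { lo: F32x8([v; 8]), hi: F32x8([v; 8]) }
    }

    /// Splits a 16-element array into the low and high halves.
    pub fn from_array(a: [f32; 16]) -> Self {
        let mut lo = [0.0f32; 8];
        let mut hi = [0.0f32; 8];
        lo.copy_from_slice(&a[..8]);
        hi.copy_from_slice(&a[8..]);
        F32x16 { lo: F32x8(lo), hi: F32x8(hi) }
    }

    /// Joins the two halves back into a 16-element array.
    pub fn to_array(self) -> [f32; 16] {
        let mut out = [0.0f32; 16];
        out[..8].copy_from_slice(&self.lo.0);
        out[8..].copy_from_slice(&self.hi.0);
        out
    }
}
```

Because every F32x16 operator forwards to the F32x8 operators, the wide type can only offer the ops the base type already has, which would explain why Sub and Div had to land on F32x8 first.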
simd.rs now re-exports from simd_avx2 (2× __m256 composed types) instead of simd_avx512 (__m512 native) for all 512-bit types. This eliminates the SIGILL risk on x86_64 without AVX-512.

The AVX2 composed types use 2× F32x8 per F32x16 operation: correct on all hardware, at the cost of 2 instructions where native AVX-512 needs 1.

BLAS hot paths (dot, axpy, gemm) still dispatch to AVX-512 kernels via native.rs LazyLock<Tier>, so there is no performance regression for inner loops. The simd.rs types serve HPC consumer code.

LazyLock<Tier> detection added to simd.rs (same pattern as native.rs). F32x8/F64x4 (256-bit AVX2 base types) always re-exported from simd_avx512.

1422/1423 tests pass (1 pre-existing causal_diff failure).

https://claude.ai/code/session_01BTATTRUACijvsK4hqmKUBR
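A LazyLock-based tier detection of the kind mentioned above might look roughly like this. It is a hedged sketch, not the contents of native.rs or simd.rs: the `Tier` variants, `detect_tier`, `tier`, and the `dot` placeholder kernel are assumed names, and every match arm falls back to the same portable loop where real code would call a tier-specific kernel.

```rust
use std::sync::LazyLock;

/// Instruction-set tier chosen once at runtime (variant names are assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tier {
    Scalar,
    Avx2,
    Avx512,
}

/// Probes the CPU; on non-x86_64 targets this always reports `Scalar`.
fn detect_tier() -> Tier {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            return Tier::Avx512;
        }
        if is_x86_feature_detected!("avx2") {
            return Tier::Avx2;
        }
    }
    Tier::Scalar
}

/// Detection runs on first access and the result is cached for the process lifetime.
static TIER: LazyLock<Tier> = LazyLock::new(detect_tier);

/// Returns the cached tier.
pub fn tier() -> Tier {
    *TIER
}

/// Portable reference kernel used as a stand-in for every tier in this sketch.
fn dot_portable(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Dispatches on the cached tier. A real implementation would call
/// AVX-512 or AVX2 kernels in the first two arms instead of the portable loop.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    match tier() {
        Tier::Avx512 => dot_portable(a, b), // placeholder for an AVX-512 kernel
        Tier::Avx2 => dot_portable(a, b),   // placeholder for an AVX2 kernel
        Tier::Scalar => dot_portable(a, b),
    }
}
```

Checking the tier before choosing a kernel is what avoids the SIGILL case: an AVX-512 instruction is only reached after the CPU has reported support for it.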
- Remove burn from exclude list (all crates now in workspace)
- Add [lib] section to burn Cargo.toml (edition 2024 requires explicit target)
- p64: 23 tests pass, phyllotactic-manifold: 14 tests pass
- Full workspace compiles clean

https://claude.ai/code/session_01BTATTRUACijvsK4hqmKUBR
7 sections, 11 tests, zero new types, only mappings:

1. SIMD manifold: expand_manifold_simd() via F64x8 + SPIRAL7_X/Y
2. SIMD attention: attend_batch_8() with VPOPCNTDQ fast path via simd_caps()
3. NARS bridge: resonance_to_nars(), nars_to_branch_byte()
4. CausalEdge64 compat: bit layout, palette addressing, layer mask mapping
5. ThinkingStyle cache: 6 styles in LazyLock, ordinal + name lookup
6. Semiring mapping: semiring name → CombineMode + ContraMode
7. DeepNSM palette: distance matrix → Palette64 interaction bitmap

Re-exports: Palette64, Palette3D, ThinkingStyle, HeelPlanes, CombineMode, ContraMode, predicate, manifold_consts

p64 + phyllotactic-manifold added as path deps in Cargo.toml.

https://claude.ai/code/session_01BTATTRUACijvsK4hqmKUBR
No description provided.