Commit 7caefe9

feat(simd): re-export f32_to_bf16_batch_rne / f32_to_bf16_scalar_rne
Makes the pure AVX-512-F RNE routines from commit c489d31 reachable as `ndarray::simd::f32_to_bf16_batch_rne` and `ndarray::simd::f32_to_bf16_scalar_rne` for consumer code in lance-graph. Without this re-export, callers would have to reach into the private `simd_avx512` module path, which is not `pub mod` in `lib.rs`.

Doc comment on the re-export explicitly pins the workspace-wide "never scalar ever" rule for F32→BF16: consumer hot loops use `f32_to_bf16_batch_rne` exclusively (500-20,000× faster than scalar via AMX/AVX-512-BF16 tiles), and `f32_to_bf16_scalar_rne` is exposed only as a unit-test reference implementation. Cross-references the Certification Process section in `lance-graph/CLAUDE.md`.

Companion commit in lance-graph updates `seven_lane_encoder.rs` Lane 6 to call the batch primitive instead of its previous element-wise truncation loop.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
1 parent c489d31 commit 7caefe9

1 file changed

Lines changed: 14 additions & 0 deletions

src/simd.rs

@@ -105,6 +105,20 @@ pub use crate::simd_avx512::{
     bf16_to_f32_scalar, f32_to_bf16_scalar,
     bf16_to_f32_batch, f32_to_bf16_batch,
 };
+
+// BF16 RNE (round-to-nearest-even) path — pure AVX-512-F, byte-exact vs
+// hardware `_mm512_cvtneps_pbh` on Sapphire Rapids+ (verified on 1M inputs
+// in ndarray::simd_avx512::tests). Consumer code should call
+// `f32_to_bf16_batch_rne` in hot loops (500-20000× faster than the scalar
+// path via AMX / AVX-512 tiles); `f32_to_bf16_scalar_rne` is exposed only
+// as a unit-test reference implementation and MUST NOT be called in hot
+// loops per the workspace-wide "never scalar ever" rule for F32→BF16.
+// See lance-graph/CLAUDE.md § Certification Process.
+#[cfg(target_arch = "x86_64")]
+pub use crate::simd_avx512::{
+    f32_to_bf16_scalar_rne,
+    f32_to_bf16_batch_rne,
+};
 // BF16 SIMD types only available when avx512bf16 is enabled at compile time
 #[cfg(all(target_arch = "x86_64", target_feature = "avx512bf16"))]
 pub use crate::simd_avx512::{BF16x16, BF16x8};
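
For readers unfamiliar with the difference between truncation and round-to-nearest-even (RNE) for F32→BF16, the following is a minimal portable sketch of the usual bit-level technique. It is not the ndarray implementation: the function names ending in `_sketch` are hypothetical, the real signatures of `f32_to_bf16_scalar_rne` and `f32_to_bf16_batch_rne` are not shown in this commit, and the sketch makes no attempt to reproduce hardware handling of denormals or NaN payloads, so it should not be assumed byte-exact against `_mm512_cvtneps_pbh`.

```rust
/// Hypothetical helper (not the ndarray routine): F32 -> BF16 by truncation,
/// i.e. simply dropping the low 16 bits of the f32 bit pattern.
fn f32_to_bf16_truncate_sketch(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Hypothetical helper (not the ndarray routine): F32 -> BF16 with
/// round-to-nearest, ties-to-even, using the common "add a bias, then shift"
/// trick.
fn f32_to_bf16_rne_sketch(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Adding the rounding bias could carry out of a NaN's mantissa, so
        // NaNs are handled separately: keep the top 16 bits and force the
        // quiet bit so the result is still a NaN.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Lowest bit that survives the shift. Adding it to 0x7FFF makes exact
    // halfway cases round towards the even BF16 value.
    let lsb = (bits >> 16) & 1;
    let rounded = bits.wrapping_add(0x7FFF + lsb);
    (rounded >> 16) as u16
}

/// Hypothetical batch wrapper: a plain loop over the scalar sketch. A real
/// batch routine would process many lanes per instruction instead.
fn f32_to_bf16_batch_rne_sketch(src: &[f32], dst: &mut [u16]) {
    assert_eq!(src.len(), dst.len(), "src and dst must have the same length");
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = f32_to_bf16_rne_sketch(s);
    }
}
```

The two approaches differ only when the discarded low 16 bits are non-zero. For example, the input bit pattern `0x3F80_8001` lies just above the midpoint between two BF16 values: truncation yields `0x3F80`, while RNE yields `0x3F81`. An exact midpoint such as `0x3F81_8000` rounds to the even neighbour `0x3F82`.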
