Commit c489d31

feat(simd_avx512): pure AVX-512-F RNE path for f32 → bf16
Adds f32_to_bf16_x16_rne (a 16-lane AVX-512-F routine) and the scalar/batch wrappers f32_to_bf16_scalar_rne / f32_to_bf16_batch_rne.

Output is byte-identical to _mm512_cvtneps_pbh on every f32 input (normals, subnormals, ±0, ±Inf, qNaN, sNaN) while requiring only the skylake-x AVX-512-F baseline, so the certification harness in thinking-engine gets a deterministic F32 → BF16 primitive across CPU generations.

The algorithm follows the Intel SDM VCVTNEPS2BF16 pseudocode:

- NaN → (bits >> 16) | 0x0040 (forced quiet bit)
- zero or subnormal → sign bit only (DAZ-style flush)
- everything else → (bits + 0x7FFF + ((bits >> 16) & 1)) >> 16 (RNE bias trick)

Verified byte-for-byte against _mm512_cvtneps_pbh on ~1,000,100 f32 inputs (systematic corpus + xorshift stream) and against a ties-to-even sweep over every f32 exponent.

The legacy truncation primitive f32_to_bf16_scalar and the existing f32_to_bf16_batch dispatch are intentionally left untouched; this commit only adds new symbols.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
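For readers without the diff at hand, the three rules in the commit message can be sketched in scalar form. This is a minimal illustration, assuming Rust, and is not the code added by this commit: the name `f32_to_bf16_rne_sketch` is hypothetical, and the 16-lane routine presumably applies the same per-lane logic with AVX-512-F masks and blends.

```rust
/// Hypothetical scalar sketch of the rounding rules listed in the commit
/// message (not the committed implementation). Returns the raw bf16 bits.
pub fn f32_to_bf16_rne_sketch(x: f32) -> u16 {
    let bits = x.to_bits();
    let exp = bits & 0x7F80_0000; // 8-bit exponent field, in place
    let man = bits & 0x007F_FFFF; // 23-bit mantissa field

    // NaN: keep the upper 16 bits and force the quiet bit (bit 6 of bf16).
    if exp == 0x7F80_0000 && man != 0 {
        return ((bits >> 16) as u16) | 0x0040;
    }

    // Zero or subnormal: keep only the sign bit (DAZ-style flush).
    if exp == 0 {
        return ((bits >> 16) as u16) & 0x8000;
    }

    // Normals and infinities: add 0x7FFF plus the lowest kept bit, then
    // truncate. A tie (discarded half exactly 0x8000) rounds up only when
    // the kept value is odd, which is round-to-nearest-even.
    let lsb = (bits >> 16) & 1;
    (bits.wrapping_add(0x7FFF + lsb) >> 16) as u16
}
```

With this sketch, `f32_to_bf16_rne_sketch(1.0)` yields `0x3F80`, and the tie inputs `0x3F80_8000` and `0x3F81_8000` round to `0x3F80` and `0x3F82` respectively, both even. Infinities pass through the bias branch unchanged because their low 16 bits are zero.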
1 parent 17bfde3 commit c489d31

1 file changed

Lines changed: 517 additions & 0 deletions
