Commit c489d31
feat(simd_avx512): pure AVX-512-F RNE path for f32 → bf16
Adds f32_to_bf16_x16_rne (a 16-lane AVX-512-F routine) and the scalar/batch
wrappers f32_to_bf16_scalar_rne / f32_to_bf16_batch_rne. Output is
byte-identical to _mm512_cvtneps_pbh for every class of f32 input (normals,
subnormals, ±0, ±Inf, qNaN, sNaN) while requiring only the skylake-x
AVX-512-F baseline rather than AVX512_BF16, so the certification harness in
thinking-engine gets a deterministic F32 → BF16 primitive across CPU
generations.
Algorithm follows the Intel SDM VCVTNEPS2BF16 pseudocode:
- NaN → (bits >> 16) | 0x0040 (forced quiet bit)
- subnormal → sign bit only (DAZ-style flush)
- everything else → (bits + 0x7FFF + ((bits >> 16) & 1)) >> 16 (RNE bias trick)
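
The three cases above can be sketched as a scalar routine. This is a minimal
illustration, not the code added by this commit: the function name
f32_to_bf16_rne_sketch and its signature are assumptions made for the example,
and the real symbols may be structured differently.

```rust
/// Hypothetical scalar reference for the three cases listed above.
/// The name and signature are illustrative, not taken from the commit.
pub fn f32_to_bf16_rne_sketch(x: f32) -> u16 {
    let bits = x.to_bits();
    let exp = (bits >> 23) & 0xFF; // 8-bit biased exponent
    let mant = bits & 0x007F_FFFF; // 23-bit mantissa

    if exp == 0xFF && mant != 0 {
        // NaN: keep the upper 16 bits and force the bf16 quiet bit.
        return ((bits >> 16) as u16) | 0x0040;
    }
    if exp == 0 && mant != 0 {
        // Subnormal: flush to zero, preserving only the sign bit.
        return ((bits >> 16) as u16) & 0x8000;
    }
    // Everything else (normals, ±0, ±Inf): add 0x7FFF plus the lowest
    // kept bit, then truncate. Exact ties land on the even neighbour.
    let lsb = (bits >> 16) & 1;
    (bits.wrapping_add(0x7FFF + lsb) >> 16) as u16
}
```

For example, under this sketch 1.0 maps to 0x3F80, the signalling NaN pattern
0x7F800001 maps to 0x7FC0, and a negative subnormal maps to 0x8000.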
Verified against _mm512_cvtneps_pbh byte-for-byte on ~1,000,100 f32 inputs
(systematic corpus + xorshift stream) and against a ties-to-even sweep
over every f32 exponent. Legacy truncation primitive f32_to_bf16_scalar
and the existing f32_to_bf16_batch dispatch are intentionally left
untouched — this commit only adds new symbols.
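
A ties-to-even sweep of the kind mentioned above might look like the following
sketch. It is not the commit's test code: the helper names rne_bias and
ties_to_even_sweep are hypothetical, and the sketch assumes the sweep covers
only normal exponents (1 through 0xFE), since exponent 0 is flushed and 0xFF is
Inf/NaN. The handful of mantissa patterns tried per exponent is likewise an
arbitrary choice for illustration.

```rust
/// Hypothetical helper: the RNE bias step for bit patterns that are
/// neither NaN nor subnormal.
pub fn rne_bias(bits: u32) -> u16 {
    ((bits + 0x7FFF + ((bits >> 16) & 1)) >> 16) as u16
}

/// Builds exact halfway cases for every normal exponent and both signs,
/// and asserts that each one rounds to the neighbour with an even low bit.
/// Returns the number of tie patterns checked.
pub fn ties_to_even_sweep() -> u32 {
    let mut checked = 0;
    for sign in [0u32, 0x8000_0000] {
        for exp in 1u32..=0xFE {
            // A few values for the 7 mantissa bits that bf16 keeps.
            for upper in [0u32, 1, 0x7E, 0x7F] {
                let kept = sign | (exp << 23) | (upper << 16);
                let tie = kept | 0x8000; // discarded half is exactly 1/2 ULP
                let hi = (kept >> 16) as u16;
                // Even low bit stays put; odd low bit rounds up to even.
                let expected = if hi & 1 == 0 { hi } else { hi + 1 };
                assert_eq!(rne_bias(tie), expected);
                checked += 1;
            }
        }
    }
    checked
}
```

Rounding up from the largest finite pattern carries into the exponent and
yields ±Inf, which is the expected round-to-nearest-even overflow behaviour.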
https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A1
parent 17bfde3
commit c489d31
1 file changed
Lines changed: 517 additions & 0 deletions