I8x16::saturating_abs now uses _mm_abs_epi8 + _mm_min_epu8 (the contract's
VPABSB correction: VPABSB returns 0x80 for i8::MIN, VPMINUB clamps to 0x7f)
instead of a per-lane branching scalar loop, so all 16 lanes are branchless.
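
A minimal sketch of the abs-then-unsigned-min trick described above, not the code from this commit. The function names and the `[i8; 16]` signature are hypothetical stand-ins for the I8x16 type, and the runtime SSSE3 check with a scalar fallback is an assumption made so the sketch runs anywhere:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference: i8::MIN saturates to i8::MAX.
fn saturating_abs_scalar(lanes: [i8; 16]) -> [i8; 16] {
    lanes.map(|x| x.saturating_abs())
}

/// Hypothetical SIMD path: abs on all 16 lanes, then an unsigned clamp.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn saturating_abs_ssse3(lanes: [i8; 16]) -> [i8; 16] {
    let v = _mm_loadu_si128(lanes.as_ptr() as *const __m128i);
    // The abs instruction wraps: |i8::MIN| comes back as 0x80.
    let abs = _mm_abs_epi8(v);
    // Every other lane is already <= 0x7f, so an unsigned min against
    // 0x7f only changes the 0x80 lanes, turning them into i8::MAX.
    let clamped = _mm_min_epu8(abs, _mm_set1_epi8(0x7f));
    let mut out = [0i8; 16];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, clamped);
    out
}

/// Dispatches to the SIMD path when SSSE3 is available, else scalar.
fn saturating_abs_16(lanes: [i8; 16]) -> [i8; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was checked at runtime just above.
            return unsafe { saturating_abs_ssse3(lanes) };
        }
    }
    saturating_abs_scalar(lanes)
}
```

Comparing `saturating_abs_16` against `saturating_abs_scalar` on an input that contains `i8::MIN` is the same shape of check as the scalar-reference corpus test.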
Also adds the binding W1a unit tests that #203 shipped without (only
rust,ignore doctests existed): saturating_abs(i8::MIN)==i8::MAX for I8x16
and I8x32, a scalar-reference corpus, i4 sign-extension, U64x8 popcnt /
xor_popcount, and gather_u16. All 6 pass on the v3 build.
Not changed (measured, not assumed): U64x8::popcnt on AVX2 already lowers
to hardware POPCNT via count_ones; gather_u16 stays scalar because a 32-bit
_mm256_i32gather over a &[u16] over-reads past the last index (no 16-bit
hardware gather exists).
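
A small sketch of the over-read argument above, under stated assumptions: the `gather_u16` signature and the `dword_gather_byte_range` helper are hypothetical, and the helper only models the address arithmetic of a 32-bit gather lane with scale 2 (it performs no SIMD load):

```rust
use std::ops::Range;

/// Hypothetical scalar gather: one bounds-checked 16-bit load per index.
fn gather_u16(table: &[u16], indices: &[u32]) -> Vec<u16> {
    indices.iter().map(|&i| table[i as usize]).collect()
}

/// Byte range one 32-bit gather lane would read for element index `i`
/// when addressing a u16 table with a scale of 2 bytes per index.
fn dword_gather_byte_range(i: usize) -> Range<usize> {
    let start = i * 2;
    // A 32-bit lane always loads 4 bytes, i.e. elements i and i + 1.
    start..start + 4
}
```

For a 4-element table (8 bytes), the last index maps to bytes 6..10, which is 2 bytes past the end of the slice. The scalar version touches only bytes 6..8.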
https://claude.ai/code/session_017GFLBnDy23AWBqvkbHHC41