docs(simd): flip MX-T1a cells + lock in asm-byte rule for AMX/F16
Two updates to the agnostic-surface CPU matrix following the
MX-T1a landing (b5bca4e) and the user directive on instruction
encoding strategy:
1. Matrix § C cells flipped from ⚠️ scalar → ✅ for
add_i8 / sub_i8 / add_i16 across every CPU column. The path
per backend is documented inline (zmm _mm512_add_epi8 on
AVX-512-BW, 2× ymm _mm256_add_epi8 on AVX2 via I8x64 polyfill,
vaddq_s8 on NEON, scalar wrapping_add elsewhere).
2. § J Phase 0 grows an entry for MX-T1a, and gains a NEW
"Design rule for AMX / F16 / FP16 paths" subsection that
codifies the asm-byte encoding requirement for Phases 1b
(AMX-INT8 arm of gemm_u8_i8), 3b (AVX-512-FP16 native
F16x16 ops), 3c (NEON BF16+FP16), and 4d (AMX-FP16 on GNR).
The rule:
* AMX intrinsics are nightly-only on Rust 1.95 (issue
#126622) → use asm!(".byte 0xc4, 0xe2, 0x73, 0x5e, 0xc1")
style per the existing simd_amx.rs pattern.
* AVX-512-FP16 intrinsics are still subject to stabilization
churn → the same asm-byte encoding sidesteps the Rust release dance.
* NEON FP16 (FMLA v.8h, BFDOT, BFMMLA, USDOT): historically
nightly-gated → use .inst 0x0e40cc20-style encoding on
AArch64 (same idea, different assembler directive).
* Each newly-encoded instruction lands with an objdump -d
verification check in the doc-comment ("verified working"
— same convention as simd_amx.rs:16-19).
* Does NOT apply to instructions WITH stable intrinsics on
Rust 1.95: _mm512_dpbusd_epi32 (avx512vnni), F16C
_mm256_cvtph_ps, _mm512_cvtne2ps2bf16 (avx512bf16), etc.
Those continue using direct intrinsics per existing
simd_avx512.rs patterns.
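A minimal sketch of the encoding technique the rule describes, using a
harmless integer add rather than a real AMX or FP16 instruction so it
runs on any x86-64 or AArch64 machine. The function name
add_via_raw_encoding is hypothetical and is not taken from simd_amx.rs;
the doc-comment only imitates the "objdump -d" convention described
above, and the encodings should be re-checked the same way before reuse.

```rust
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
use std::arch::asm;

/// Wrapping add emitted as raw bytes instead of a mnemonic.
/// Encoding: 48 01 c8 = `add rax, rcx` (REX.W, opcode 01 /r, ModRM 0xc8).
/// Convention per the rule: confirm the bytes with `objdump -d` on the built object.
#[cfg(target_arch = "x86_64")]
pub fn add_via_raw_encoding(a: u64, b: u64) -> u64 {
    let mut acc = a;
    // SAFETY: the instruction only touches rax, rcx and the flags, all of
    // which are declared (flags are clobbered by default in `asm!`).
    unsafe {
        asm!(
            ".byte 0x48, 0x01, 0xc8",
            inout("rax") acc,
            in("rcx") b,
            options(nomem, nostack),
        );
    }
    acc
}

/// AArch64 variant: same idea, but the assembler directive is `.inst`.
/// Encoding: 0x8b010000 = `add x0, x0, x1`.
#[cfg(target_arch = "aarch64")]
pub fn add_via_raw_encoding(a: u64, b: u64) -> u64 {
    let mut acc = a;
    // SAFETY: the instruction only reads x0/x1 and writes x0, as declared.
    unsafe {
        asm!(
            ".inst 0x8b010000",
            inout("x0") acc,
            in("x1") b,
            options(nomem, nostack),
        );
    }
    acc
}

/// Scalar fallback for every other architecture.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn add_via_raw_encoding(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

Because the operands are pinned to explicit registers, the raw bytes and
the operand list must agree; that coupling is what the objdump check is
meant to catch.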
The rule prevents a future regression in which a session reaches for
nightly avx512fp16 intrinsics, fails to compile on the project's
stable toolchain, and then drops back to the scalar polyfill. This is
the same shape of regression that removed array_windows/add_mul in
the prior session and was recovered in 0a46e7f.
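For the add_i8 paths listed in item 1, a rough sketch of the AVX2 and
scalar arms might look like the following. The function names and the
[i8; 64] signature are assumptions for illustration, not the project's
actual I8x64 polyfill API; the AVX-512-BW and NEON arms are omitted.

```rust
/// 64-lane wrapping i8 add: 2x ymm on AVX2, scalar wrapping_add elsewhere.
pub fn add_i8x64(a: &[i8; 64], b: &[i8; 64]) -> [i8; 64] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime just above.
            return unsafe { add_i8x64_avx2(a, b) };
        }
    }
    add_i8x64_scalar(a, b)
}

/// Scalar arm: one wrapping_add per lane.
fn add_i8x64_scalar(a: &[i8; 64], b: &[i8; 64]) -> [i8; 64] {
    let mut out = [0i8; 64];
    for i in 0..64 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// AVX2 arm: two 256-bit halves, each added with _mm256_add_epi8.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_i8x64_avx2(a: &[i8; 64], b: &[i8; 64]) -> [i8; 64] {
    use std::arch::x86_64::{
        __m256i, _mm256_add_epi8, _mm256_loadu_si256, _mm256_storeu_si256,
    };
    let mut out = [0i8; 64];
    for half in 0..2 {
        let off = half * 32;
        // Unaligned loads and stores: the arrays carry no alignment guarantee.
        let va = _mm256_loadu_si256(a.as_ptr().add(off) as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr().add(off) as *const __m256i);
        let sum = _mm256_add_epi8(va, vb);
        _mm256_storeu_si256(out.as_mut_ptr().add(off) as *mut __m256i, sum);
    }
    out
}
```

Both arms wrap on overflow, so 127 + 1 yields -128 in every lane
regardless of which path is taken.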
https://claude.ai/code/session_01HbqooFZHAjaUtFEzhA1R2u