These are not in the cross-arch parity surface; consumers requesting
256-bit / 512-bit shapes go through the composed wrappers.

### Gaps surfaced 2026-05-20

- **`F32x8` / `F64x4` are available on every AVX-capable x86 host**, including
  the v3 / AVX2 path. They share the `__m256` / `__m256d` declarations exposed
  by `simd_avx512.rs`, but those types need only AVX, not AVX-512, so they work
  on any host with AVX support (Intel Sandy Bridge and later). The previous
  matrix marked them `❌` in the v3 column; this is corrected above.
- **`U32x8` / `U64x4`** exist only in `simd_nightly` (via `core::simd`).
  There is no native or polyfill wrapper on x86 or aarch64. Add them to
  `simd_avx512` and `simd_scalar` if a consumer needs them at 256-bit width.
- **`I32x8` / `I64x4` / `U16x16`** are missing from every backend (including
  nightly). These are theoretical 256-bit shapes that no consumer has needed
  yet; add them to the backlog if one does.
- **`F32Mask8` / `F64Mask4`** are declared in `simd_scalar` as
  `F32Mask8Scalar` / `F64Mask4Scalar` (the rename resolved a
  duplicate-declaration conflict on i686; see `src/simd_scalar.rs:340-345`).
  They are not surfaced through `crate::simd::*`. If consumers want these mask
  widths, expose them and unify the names by dropping the `Scalar` suffix. On
  AVX-512, `__mmask8` already maps natively to `F64Mask8`; the 256-bit f64
  lane width needs only a 4-bit mask, which `__mmask8` can hold but which is
  not yet typed as `F64Mask4`.
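To make the AVX-versus-AVX-512 point concrete, here is a minimal sketch of an 8-lane f32 add on `__m256` that enables only the `avx` target feature and checks for it at runtime. `add_f32x8` and its array-based signature are hypothetical and are not the crate's `F32x8` API.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256, _mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};

/// Hypothetical 8-lane f32 add over plain arrays (not the crate's `F32x8` API).
/// Picks the AVX path at runtime; only AVX is required, never AVX-512.
pub fn add_f32x8(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed on this host just above.
            return unsafe { add_f32x8_avx(a, b) };
        }
    }
    // Portable fallback for hosts (or targets) without AVX.
    std::array::from_fn(|i| a[i] + b[i])
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_f32x8_avx(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    unsafe {
        // `__m256` holds eight f32 lanes; loadu/storeu accept unaligned arrays.
        let va: __m256 = _mm256_loadu_ps(a.as_ptr());
        let vb: __m256 = _mm256_loadu_ps(b.as_ptr());
        _mm256_storeu_ps(out.as_mut_ptr(), _mm256_add_ps(va, vb));
    }
    out
}
```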
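If a consumer does need `U32x8`, the `simd_scalar` half could start as small as a lane array with wrapping arithmetic. The sketch below is an assumption about shape only: the method names (`splat`, `from_array`, `to_array`) borrow `core::simd` conventions and are hypothetical for this crate.

```rust
use std::ops::Add;

/// Hypothetical scalar polyfill for a 256-bit `U32x8` (8 x u32 lanes).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U32x8([u32; 8]);

impl U32x8 {
    /// Broadcasts one value to all 8 lanes.
    pub fn splat(v: u32) -> Self {
        U32x8([v; 8])
    }

    pub fn from_array(lanes: [u32; 8]) -> Self {
        U32x8(lanes)
    }

    pub fn to_array(self) -> [u32; 8] {
        self.0
    }
}

impl Add for U32x8 {
    type Output = Self;

    /// Lane-wise add that wraps on overflow instead of panicking.
    fn add(self, rhs: Self) -> Self {
        U32x8(std::array::from_fn(|i| self.0[i].wrapping_add(rhs.0[i])))
    }
}
```

Wrapping add keeps the polyfill consistent with `core::simd` and with `_mm256_add_epi32`, both of which wrap on overflow.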
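A typed `F64Mask4` could live in a `u8`, the same width as `__mmask8`, with the upper four bits forced to zero. This is a minimal sketch assuming a bitmask-backed representation; the type and method names are hypothetical.

```rust
/// Hypothetical `F64Mask4`: one bit per f64 lane of a 256-bit vector,
/// stored in the low 4 bits of a `u8` (the same width as AVX-512's `__mmask8`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct F64Mask4(u8);

impl F64Mask4 {
    /// Bits that correspond to real lanes; the upper 4 bits are always zero.
    const VALID: u8 = 0b1111;

    /// Builds a mask from per-lane booleans; lane `i` maps to bit `i`.
    pub fn from_array(lanes: [bool; 4]) -> Self {
        let mut bits = 0u8;
        for (i, &on) in lanes.iter().enumerate() {
            if on {
                bits |= 1u8 << i;
            }
        }
        F64Mask4(bits)
    }

    /// Wraps a raw 8-bit mask (e.g. an `__mmask8` from an AVX-512 compare),
    /// clearing the 4 bits that have no corresponding lane.
    pub fn from_bitmask(raw: u8) -> Self {
        F64Mask4(raw & Self::VALID)
    }

    pub fn to_bitmask(self) -> u8 {
        self.0
    }

    /// Returns whether `lane` (0..4) is set.
    pub fn test(self, lane: usize) -> bool {
        assert!(lane < 4, "F64Mask4 has only 4 lanes");
        ((self.0 >> lane) & 1) == 1
    }

    pub fn any(self) -> bool {
        self.0 != 0
    }

    pub fn all(self) -> bool {
        self.0 == Self::VALID
    }
}
```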
### Read of the matrix
Ranked by P0 (blocks current CI / consumers) → P3 (nice-to-have).
|**TD-SIMD-5**|**P1**| Scalar fallback inline in `simd.rs` (`pub(crate) mod scalar`) breaks backend symmetry: every other backend lives in its own file. | inspection | Promote to `src/simd_scalar.rs`; `simd.rs` becomes pure dispatch. Mostly mechanical refactor. |
|**TD-SIMD-6**|**P2**| No `runtime-dispatch` feature / `simd_runtime` module exists yet. Distributing release binaries to heterogeneous silicon currently requires a recompile per target. |`grep -r "LazyLock<CpuCaps>"` only matches reporting code in `simd.rs:52-55`| New module wiring per-op trampolines from the compiled-in backends. ~300 LoC plus one new Cargo feature. |
|**TD-SIMD-7**|**P2**| Compile-time arms in `simd.rs:153-194` are duplicated four times (one per type group: F32x16, F64x8, U8x32, BF16x16). Adding a new lane type requires copy-pasting four `#[cfg(...)]` arms. | inspection | A single source-of-truth `macro_rules!` that emits the arms. ~50 LoC. |
|**TD-SIMD-8**|**P2**|`F16x16` in `src/simd_half.rs:123` is a scalar `[u16; 16]` polyfill: every arithmetic op upcasts to f32, computes, and downcasts. Consumers using `crate::simd::F16x16` get scalar performance even on hosts whose F16C extension provides `vcvtph2ps` / `vcvtps2ph` (every x86-64-v3 and AVX-512 host). (`F16Scaler` in `simd_avx2.rs:2566` is unrelated: it is a *scaling context* for range-normalizing values before f16 encoding, not the F16x16 SIMD type.) | inspection of `src/simd_half.rs:115-150`|(a) Replace the `[u16; 16]` storage with `__m256i` + `_mm256_cvtph_ps` / `_mm256_cvtps_ph` under `target_feature = "f16c"` (F16C is part of the x86-64-v3 baseline, so every v3 and AVX-512 host has it; native f16 arithmetic via AVX512-FP16 is a separate, later extension). Each of these intrinsics converts 8 lanes, so a 16-lane op needs two calls per direction. (b) Add an `F16x16Scalar` alias and route consumers explicitly. (c) Add a doc warning at the type level pointing at the architecture doc. ~80 LoC. |
|**TD-SIMD-9**|**P3**| No CI matrix entry for the `nightly-simd` polyfill path. |`.github/workflows/ci.yaml`| Add a `nightly-simd-polyfill` job that builds with `--features nightly-simd` on nightly rustc. ~20 LoC YAML. |
|**TD-SIMD-10**|**P3**| No CI matrix entry for `.cargo/config-avx512.toml`, so the AVX-512 deployment path can silently bit-rot between PRs. |`.github/workflows/ci.yaml`| Add an `avx-512-explicit` job using a runner with AVX-512 silicon. ~20 LoC YAML; runner availability TBD. |
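For TD-SIMD-6, one per-op trampoline could look like the sketch below: CPU detection runs once inside a `LazyLock`, which caches a function pointer to the best compiled-in backend. The `sum` op, `SumFn`, and the two-backend set (scalar and AVX) are hypothetical; a real `simd_runtime` module would repeat this per op and per backend. `LazyLock` needs Rust 1.80 or later.

```rust
use std::sync::LazyLock;

/// Signature every backend of this one (hypothetical) op must share.
type SumFn = fn(&[f32]) -> f32;

/// Portable backend: always available.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// AVX backend wrapper. Private: only reachable through `SUM`, which
/// selects it after confirming AVX at runtime.
#[cfg(target_arch = "x86_64")]
fn sum_avx(xs: &[f32]) -> f32 {
    // SAFETY: `SUM` only stores this function after `is_x86_feature_detected!("avx")`.
    unsafe { sum_avx_impl(xs) }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn sum_avx_impl(xs: &[f32]) -> f32 {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_setzero_ps, _mm256_storeu_ps};
    let chunks = xs.chunks_exact(8);
    let tail = chunks.remainder();
    let mut lanes = [0.0f32; 8];
    unsafe {
        // Accumulate 8 lanes at a time, then reduce the lanes and the tail.
        let mut acc = _mm256_setzero_ps();
        for c in chunks {
            acc = _mm256_add_ps(acc, _mm256_loadu_ps(c.as_ptr()));
        }
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    }
    lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

/// The trampoline: CPU detection runs once, on first use; later calls read
/// the cached function pointer and call through it.
static SUM: LazyLock<SumFn> = LazyLock::new(|| {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            return sum_avx as SumFn;
        }
    }
    sum_scalar as SumFn
});

/// Public entry point consumers would call.
pub fn sum(xs: &[f32]) -> f32 {
    (*SUM)(xs)
}
```

Because the AVX path reassociates the additions, its result can differ from the scalar path in the last bits for inputs whose sums are not exactly representable.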
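For TD-SIMD-7, a single `macro_rules!` can emit the `#[cfg]` arms for a whole list of lane types. This sketch assumes just two backends selected on `target_feature = "avx512f"`, with stand-in modules in place of the real backend files; all module names and the `BACKEND` constant are hypothetical.

```rust
mod simd {
    // Stand-ins for two backend files; the real crate has more backends.
    mod backend_avx512 {
        pub struct F32x16;
        pub struct F64x8;
        impl F32x16 { pub const BACKEND: &'static str = "avx512"; }
        impl F64x8 { pub const BACKEND: &'static str = "avx512"; }
    }

    mod backend_scalar {
        pub struct F32x16;
        pub struct F64x8;
        impl F32x16 { pub const BACKEND: &'static str = "scalar"; }
        impl F64x8 { pub const BACKEND: &'static str = "scalar"; }
    }

    /// Emits every `#[cfg]` arm once for the whole list of types: a new lane
    /// type is one more identifier in the invocation, and a new backend is
    /// one more arm here instead of one per type group.
    macro_rules! dispatch_types {
        ($($ty:ident),+ $(,)?) => {
            #[cfg(target_feature = "avx512f")]
            pub use self::backend_avx512::{$($ty),+};
            #[cfg(not(target_feature = "avx512f"))]
            pub use self::backend_scalar::{$($ty),+};
        };
    }

    dispatch_types!(F32x16, F64x8);
}
```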
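For TD-SIMD-8 option (a), the widening step might look like this sketch. `widen_f16x16` is a hypothetical free function over the polyfill's `[u16; 16]` storage. It shows that `_mm256_cvtph_ps` widens 8 halves per call, so 16 lanes take two conversions (AVX-512F's `_mm512_cvtph_ps` can widen all 16 at once); the scalar branch is an exact binary16-to-f32 decoder that serves as both fallback and reference.

```rust
/// Scalar reference: IEEE 754 binary16 bits -> f32 (exact; every f16 fits in f32).
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = u32::from(h >> 15) << 31;
    let exp = u32::from((h >> 10) & 0x1f);
    let mant = u32::from(h & 0x3ff);
    match (exp, mant) {
        // Signed zero.
        (0, 0) => f32::from_bits(sign),
        // Subnormal: value = mant * 2^-24.
        (0, m) => {
            let mag = m as f32 * (1.0 / 16_777_216.0);
            if sign != 0 { -mag } else { mag }
        }
        // Infinity / NaN: keep the payload in the top mantissa bits.
        (0x1f, m) => f32::from_bits(sign | 0x7f80_0000 | (m << 13)),
        // Normal: rebias the exponent from 15 to 127.
        (e, m) => f32::from_bits(sign | ((e + 112) << 23) | (m << 13)),
    }
}

/// Hypothetical widening step for a `[u16; 16]`-backed F16x16: 16 halves -> 16 f32s.
pub fn widen_f16x16(bits: [u16; 16]) -> [f32; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("f16c") {
            // SAFETY: both required features were detected at runtime.
            return unsafe { widen_f16x16_f16c(bits) };
        }
    }
    bits.map(f16_bits_to_f32)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,f16c")]
unsafe fn widen_f16x16_f16c(bits: [u16; 16]) -> [f32; 16] {
    use std::arch::x86_64::*;
    let mut out = [0.0f32; 16];
    unsafe {
        // One 256-bit load holds all 16 halves...
        let v = _mm256_loadu_si256(bits.as_ptr() as *const __m256i);
        // ...but VCVTPH2PS widens 8 halves (128 bits) per call, so convert each half.
        let lo = _mm256_cvtph_ps(_mm256_castsi256_si128(v));
        let hi = _mm256_cvtph_ps(_mm256_extractf128_si256::<1>(v));
        _mm256_storeu_ps(out.as_mut_ptr(), lo);
        _mm256_storeu_ps(out.as_mut_ptr().add(8), hi);
    }
    out
}
```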