@@ -23,7 +23,7 @@ Same set as `td-simd-cpu-dispatch-matrix.md` § "Master matrix — x86_64" and
2323| Z5 | `znver5` / `Zen4Avx512` (same dispatch) | AMD 2024 | same as Z4 + minor uarch |
2424| ARL | `arrowlake` / `ArrowLake` | Intel 2024 | AVX2+FMA + AVX-VNNI+VNNI-INT8 |
2525| HSW | `x86-64-v3` / `HaswellAvx2` | Intel 2013→2021 | AVX2+FMA (no VNNI/AVX-512) |
26- | A76 | `cortex-a76` / `A76DotProd` | ARMv8.2 (Pi 5, M1) | NEON+dotprod+bf16+fp16 |
26+ | A76 | `cortex-a76` / `A76DotProd` | ARMv8.2 (Pi 5) | NEON+dotprod+fp16 (no bf16 / i8mm; those are V8.6+, see § M) |
2727| A72 | `cortex-a72` / `A72Fast` | ARMv8.0 (Pi 4) | NEON only (no dotprod) |
2828| A53 | `cortex-a53` / `A53Baseline` | ARMv8.0 (Pi 3/Z2W) | NEON, lower IPC |
2929| SCA | scalar fallback | wasm32/riscv/i686 | no SIMD |
@@ -530,6 +530,76 @@ verifies that no per-CPU regression has crept in vs the historical baseline:
530530 `crate::simd::*`, this table must grow a row. Reviewers should reject
531531 PRs that add a public symbol without a corresponding matrix entry.
532532
533+ ## M. AArch64 ground-truth core enumeration (GCC source)
534+
535+ The matrix above uses three aarch64 columns (A53 / A72 / A76) that
536+ each cover a *dispatch tier*: multiple physical cores share the same
537+ SIMD primitive set. The authoritative per-core feature membership is
538+ in GCC's `gcc/config/aarch64/aarch64-cores.def`, scraped 2026-05-21:
539+
540+ | Core | GCC arch | Explicit feature flags |
541+ | --- | --- | --- |
542+ | **A53 / A72 / A76 tier** (baseline NEON, optional dotprod + fp16, NO bf16) | | |
543+ | `cortex-a53` | V8-A | `(CRC)` |
544+ | `cortex-a72` | V8-A | `(CRC)` |
545+ | `cortex-a76` | V8.2-A | `F16, RCPC, DOTPROD` |
546+ | `cortex-a78` | V8.2-A | `F16, RCPC, DOTPROD, SSBS, PROFILE` |
547+ | `cortex-x1` | V8.2-A | `F16, RCPC, DOTPROD, SSBS, PROFILE` |
548+ | `neoverse-n1` | V8.2-A | `F16, RCPC, DOTPROD, PROFILE` |
549+ | `apple-m1` | V8.5-A | `()`; V8.5 baseline includes F16 + dotprod, NO bf16 / i8mm |
550+ | **V8.6-A tier** (BF16 + I8MM via baseline) | | |
551+ | `apple-m2` | V8.6-A | `()`; V8.6 baseline adds bf16 + i8mm (SVE / SVE2 are not part of the V8.6 baseline) |
552+ | `apple-m3` | V8.6-A | same |
553+ | `oryon-1` | V8.6-A | `CRYPTO, SM4, SHA3, F16` (Snapdragon X Elite / Plus) |
554+ | `ampere1` | V8.6-A | `F16, RNG, AES, SHA3` |
555+ | `ampere1a` | V8.6-A | `F16, RNG, AES, SHA3, SM4, MEMTAG` |
556+ | **V8.7-A tier** (V8.6-A baseline + LS64; MOPS is a V8.8-A feature) | | |
557+ | `apple-m4` | V8.7-A | `()` |
558+ | `ampere1b` | V8.7-A | `F16, RNG, AES, SHA3, SM4, MEMTAG, CSSC` |
559+ | **V9.0-A tier** (SVE2 baseline + explicit bf16 / i8mm) | | |
560+ | `cortex-a510` | V9-A | `SVE2_BITPERM, MEMTAG, I8MM, BF16` |
561+ | `cortex-a710` | V9-A | `SVE2_BITPERM, MEMTAG, I8MM, BF16` |
562+ | `cortex-a715` | V9-A | `SVE2_BITPERM, MEMTAG, I8MM, BF16` |
563+ | `cortex-x2` | V9-A | `SVE2_BITPERM, MEMTAG, I8MM, BF16` |
564+ | `cortex-x3` | V9-A | `SVE2_BITPERM, MEMTAG, I8MM, BF16` |
565+ | `neoverse-n2` | V9-A | `I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE` |
566+ | `neoverse-v2` | V9-A | `I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE` (Graviton 4) |
567+ | `grace` | V9-A | `I8MM, BF16, SVE2_BITPERM, SVE2_AES, SVE2_SHA3, SVE2_SM4, PROFILE` |
568+ | **V8.4-A SVE tier** (Graviton 3's odd one) | | |
569+ | `neoverse-v1` | V8.4-A | `SVE, I8MM, BF16, PROFILE, SSBS, RNG` |
570+ | **V9.2-A tier** (V9 + V8.7 features) | | |
571+ | `cortex-a520` | V9.2-A | `SVE2_BITPERM, MEMTAG` |
572+ | `cortex-a720` | V9.2-A | `SVE2_BITPERM, MEMTAG, PROFILE` |
573+ | `cortex-a725` | V9.2-A | `SVE2_BITPERM, MEMTAG, PROFILE` |
574+ | `cortex-x4` | V9.2-A | `SVE2_BITPERM, MEMTAG, PROFILE` |
575+ | `cortex-x925` | V9.2-A | `SVE2_BITPERM, MEMTAG, PROFILE` |
576+ | `neoverse-n3` | V9.2-A | `SVE2_BITPERM, RNG, MEMTAG, PROFILE` |
577+ | `neoverse-v3` | V9.2-A | `SVE2_BITPERM, RNG, LS64, MEMTAG, PROFILE` |
578+
579+ **Dispatch tier mapping (which matrix column each core lands in):**
580+
581+ | Tier (matrix col.) | Cores |
582+ | --- | --- |
583+ | A53 | `cortex-a53`, older V8.0-A |
584+ | A72 | `cortex-a72`, V8.0-A + CRC |
585+ | A76 (V8.2 through V8.5 with dotprod + fp16, NO bf16 / i8mm) | `cortex-a76`, `cortex-a78`, `cortex-x1`, `neoverse-n1`, `apple-m1` |
586+ | **(new tier: V8.6+ / V9 with bf16 + i8mm)** | `apple-m2`+, `oryon-1` (Snapdragon X), `cortex-a510`+, `neoverse-n2` / `v2` / `grace`, `ampere1`+ |
587+ | **(new tier: V8.4-A + SVE + bf16 + i8mm)** | `neoverse-v1` (Graviton 3; the only V8.4-A core with explicit SVE + bf16 + i8mm) |
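
The tier mapping above can be read as a small decision procedure over feature flags. The following is a minimal sketch, not the crate's actual dispatcher: the `Tier` enum, the `Features` struct, and `classify` are hypothetical names introduced here for illustration. It assumes that bf16 + i8mm select the new tier, that SVE without SVE2 singles out the `neoverse-v1` case, and that A53 and A72 cannot be told apart by feature flags alone (both list only `CRC` in the table), so they share one variant here.

```rust
/// Dispatch tiers from the mapping table. All names are hypothetical;
/// `Bf16I8mm` and `SveBf16I8mm` stand for the two proposed new tiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    /// A53 / A72: NEON only. Separating the two needs core identity,
    /// not feature flags, so this sketch keeps them together.
    NeonOnly,
    /// A76 column: dotprod + fp16, no bf16 / i8mm.
    A76DotProd,
    /// Proposed tier: bf16 + i8mm (V8.6+ / V9 cores).
    Bf16I8mm,
    /// Proposed tier: SVE (without SVE2) + bf16 + i8mm (neoverse-v1).
    SveBf16I8mm,
}

/// Feature flags relevant to tier selection (hypothetical struct).
#[derive(Debug, Clone, Copy, Default)]
struct Features {
    dotprod: bool,
    fp16: bool,
    bf16: bool,
    i8mm: bool,
    sve: bool,
    sve2: bool,
}

/// Pick the highest tier whose required features are all present.
fn classify(f: Features) -> Tier {
    if f.bf16 && f.i8mm {
        if f.sve && !f.sve2 {
            Tier::SveBf16I8mm
        } else {
            Tier::Bf16I8mm
        }
    } else if f.dotprod && f.fp16 {
        Tier::A76DotProd
    } else {
        Tier::NeonOnly
    }
}
```

On an aarch64 target the flags could plausibly be filled in at startup from `std::arch::is_aarch64_feature_detected!`; keeping `classify` a pure function makes the mapping testable on any host.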
588+
589+ The matrix's three aarch64 columns cover the bottom of the dispatch
590+ ladder. The bf16 / i8mm tier (which would carry NEON BFMMLA / BFDOT /
591+ USDOT / FMLA.8h) needs its own column in a future revision. When the
592+ NEON BF16 asm-byte arm lands (Phase 3b in § J), every V8.6+ core
593+ listed above gets covered by the same dispatch arm.
594+
595+ **Source provenance:** scraped from
596+ `https://raw.githubusercontent.com/gcc-mirror/gcc/master/gcc/config/aarch64/aarch64-cores.def`
597+ (GCC trunk, 2026-05-21). The `AARCH64_CORE(...)` macro emits the
598+ canonical name → arch → feature-string mapping; GCC's
599+ `(define_insn ...)` patterns in `aarch64-simd.md` give the bit
600+ encodings for the asm-byte rule (`.inst 0xXXXXXXXX`) that Phase 3b
601+ will use for BFMMLA / BFDOT / FMLA.8h / USDOT.
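
To make the scrape reproducible, the name → arch → feature mapping could be extracted mechanically. The sketch below is illustrative only: `CoreEntry`, `split_top_level`, and `parse_core_line` are hypothetical helpers, and it assumes each entry sits on one line with the core name as the first macro argument, the architecture as the fourth, and the parenthesised feature list as the fifth. That argument order is an assumption about the `.def` layout and should be checked against the scraped file.

```rust
/// One parsed `AARCH64_CORE(...)` entry (hypothetical struct).
#[derive(Debug, PartialEq, Eq)]
struct CoreEntry {
    name: String,
    arch: String,
    features: Vec<String>,
}

/// Split on commas that are not nested inside parentheses, so a feature
/// list like `(F16, RCPC)` stays in one piece.
fn split_top_level(s: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut depth = 0usize;
    let mut cur = String::new();
    for c in s.chars() {
        match c {
            '(' => {
                depth += 1;
                cur.push(c);
            }
            ')' => {
                depth = depth.saturating_sub(1);
                cur.push(c);
            }
            ',' if depth == 0 => {
                parts.push(cur.trim().to_string());
                cur.clear();
            }
            _ => cur.push(c),
        }
    }
    if !cur.trim().is_empty() {
        parts.push(cur.trim().to_string());
    }
    parts
}

/// Parse a single-line `AARCH64_CORE(...)` entry; returns `None` for
/// anything that does not match the assumed shape.
fn parse_core_line(line: &str) -> Option<CoreEntry> {
    let body = line.trim().strip_prefix("AARCH64_CORE")?.trim_start();
    let body = body.strip_prefix('(')?.trim_end().strip_suffix(')')?;
    let args = split_top_level(body);
    if args.len() < 5 {
        return None;
    }
    // Assumed positions: 0 = quoted core name, 3 = arch, 4 = feature list.
    let name = args[0].trim_matches('"').to_string();
    let arch = args[3].clone();
    let feats = args[4].strip_prefix('(')?.strip_suffix(')')?;
    let features = feats
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect();
    Some(CoreEntry { name, arch, features })
}
```

An empty feature list `()` parses to an empty vector, which corresponds to the table rows whose features come entirely from the architecture baseline.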
602+
533603## L. Provenance
534604
535605- CPU feature presence: sourced from `td-simd-cpu-dispatch-matrix.md`.