Commit b7d36bc

feat(simd): cpu-* cargo features for compile-time SimdProfile pinning
Phase 3 T3.2: add 13 mutually exclusive cargo features that pin
simd_profile() to a const at compile time, bypassing the runtime
LazyLock detection from T3.1. There is one feature per non-Scalar
variant of the SimdProfile enum.

Features (cargo feature → SimdProfile variant, LLVM target-cpu name):

  cpu-gnr        → GraniteRapids    (graniterapids)
  cpu-spr        → SapphireRapids   (sapphirerapids)
  cpu-zen4       → Zen4Avx512       (znver4)
  cpu-cpl        → CooperLake       (cooperlake)
  cpu-tigerlake  → TigerLakeU       (tigerlake)
  cpu-icx        → IceLakeSp        (icelake-server)
  cpu-clx        → CascadeLake      (cascadelake)
  cpu-skx        → SkylakeX         (skylake-avx512)
  cpu-arrowlake  → ArrowLake        (arrowlake)
  cpu-haswell    → HaswellAvx2      (haswell)
  cpu-a76        → A76DotProd       (cortex-a76)
  cpu-a72        → A72Fast          (cortex-a72)
  cpu-a53        → A53Baseline      (cortex-a53)

Mutual exclusion (integration-plan risk #1: "if a user sets cpu-spr AND
has runtime detection on Zen 4, the binary SIGILLs on AMX instructions")
is enforced by a const assert: each enabled cpu-* feature adds 1 to
_PIN_COUNT, and the assert fails the build if the sum exceeds one.

Implementation: const fn pinned_profile() -> Option<SimdProfile> walks
a cfg cascade and returns the active variant, or None. simd_profile()
exists in two cfg-gated forms: a runtime LazyLock variant, compiled when
no cpu-* feature is set, and a const fn that folds to the pinned variant,
compiled when one is. Pinned binaries do not link the LazyLock.

The const helper is_pinned() reports whether compile-time dispatch is
active. It serves consumer-facing diagnostics and gates arch-detection
tests that no longer apply once pinning overrides the hardware.

Tests: the existing x86_target_lands_inside_x86_family and
aarch64_target_lands_inside_aarch64_family tests return early when
pinned. Two new tests cover pinning: pinning_default_is_off asserts that
a build with no cpu-* feature is not pinned, and pinning_consistency
asserts that simd_profile() equals the pinned const when one is set.

Verification:
  - 2077/2077 lib tests pass under both the default and
    --features cpu-spr configurations.
  - cargo clippy --lib -- -D warnings is clean in both modes.
  - --features "cpu-spr,cpu-zen4" fails to build with the const-assert
    message, as intended.

The default (no feature) path is byte-identical to the T3.1 runtime
detection, so there is no regression risk to the merged #181 work or to
the e40f3a3 SimdProfile commit.
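
The mutual-exclusion check reduces to summing booleans in a const and asserting on the total. A minimal standalone sketch, with plain `bool` consts (hypothetical names) standing in for the `cfg!(feature = "cpu-...")` results so it compiles without a Cargo manifest:

```rust
// Stand-ins for `cfg!(feature = "cpu-...")`, which expands to a `bool`
// literal at compile time. Here exactly one "feature" is on.
const CPU_SPR: bool = false;
const CPU_ZEN4: bool = false;
const CPU_A76: bool = true;

// Each active feature contributes 1 to the count.
const PIN_COUNT: u32 = CPU_SPR as u32 + CPU_ZEN4 as u32 + CPU_A76 as u32;

// An unnamed const is evaluated during compilation, so a failing
// assert! here is a build error rather than a runtime panic.
const _: () = assert!(PIN_COUNT <= 1, "pin features are mutually exclusive");
```

Flipping `CPU_SPR` to `true` as well raises `PIN_COUNT` to 2 and the crate stops compiling with the assert message. Because the check is a single sum rather than a `compile_error!` gate per pair of features, it grows linearly with the number of features instead of quadratically.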
1 parent 0c3c13d commit b7d36bc

2 files changed

Lines changed: 262 additions & 4 deletions

Cargo.toml

Lines changed: 30 additions & 0 deletions
@@ -252,6 +252,36 @@ splat3d = ["std"]
 # quad-tree partition; the entropy coder + RDO loop land in later workers.
 codec = ["std"]
 
+# ── Phase 3 T3.2: compile-time SimdProfile pinning ───────────────────
+#
+# Each cpu-<codename> feature, when enabled, makes
+# `crate::simd::simd_profile()` fold to a const at compile time and
+# bypass the runtime LazyLock detection. Pair with the matching
+# `-Ctarget-cpu=<llvm-name>` in `.cargo/config.toml` (or `RUSTFLAGS`)
+# for full effect — the cargo feature picks the *dispatch* variant,
+# while `-Ctarget-cpu` picks the *codegen* variant. Both together
+# produce a binary that is specialised to one silicon family.
+#
+# Features are MUTUALLY EXCLUSIVE — enable at most one. A compile-time
+# assert in `src/hpc/simd_profile.rs` enforces this. Multiple
+# pinning features active = build error.
+#
+# Codename → SimdProfile variant mapping (see
+# `.claude/knowledge/td-simd-cpu-dispatch-matrix.md`):
+cpu-gnr = [] # GraniteRapids — target-cpu=graniterapids
+cpu-spr = [] # SapphireRapids — target-cpu=sapphirerapids
+cpu-zen4 = [] # Zen4Avx512 — target-cpu=znver4 (or znver5)
+cpu-cpl = [] # CooperLake — target-cpu=cooperlake
+cpu-tigerlake = [] # TigerLakeU — target-cpu=tigerlake
+cpu-icx = [] # IceLakeSp — target-cpu=icelake-server
+cpu-clx = [] # CascadeLake — target-cpu=cascadelake
+cpu-skx = [] # SkylakeX — target-cpu=skylake-avx512
+cpu-arrowlake = [] # ArrowLake — target-cpu=arrowlake
+cpu-haswell = [] # HaswellAvx2 — target-cpu=haswell (or znver3)
+cpu-a76 = [] # A76DotProd — target-cpu=cortex-a76
+cpu-a72 = [] # A72Fast — target-cpu=cortex-a72
+cpu-a53 = [] # A53Baseline — target-cpu=cortex-a53
+
 # no_std polyfill for `static LazyLock` in `src/simd.rs` (sprint A12).
 # Pulls in `portable-atomic` with the `critical-section` impl plus the
 # `critical-section` runtime so we can build a once-cell-style cache for
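
The comment above splits the job in two: the cargo feature selects the dispatch variant, while `-Ctarget-cpu` selects what the compiler may emit. `cfg!(target_feature = "...")` reports the codegen side at compile time, so the two can be cross-checked. A sketch, assuming a hypothetical `Pin` enum and illustrative per-variant feature lists (not the crate's dispatch matrix):

```rust
/// Hypothetical subset of the pinnable variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pin {
    HaswellAvx2,
    SkylakeX,
    A76DotProd,
}

/// `true` when the codegen features chosen via `-Ctarget-cpu` (or
/// `-Ctarget-feature`) already cover the ISA that `pin` dispatches to.
/// The per-variant feature lists are illustrative assumptions.
pub const fn codegen_covers(pin: Pin) -> bool {
    match pin {
        Pin::HaswellAvx2 => cfg!(target_feature = "avx2") && cfg!(target_feature = "fma"),
        Pin::SkylakeX => cfg!(target_feature = "avx512f") && cfg!(target_feature = "avx512bw"),
        Pin::A76DotProd => cfg!(target_feature = "dotprod"),
    }
}

// A build-time guard in the real crate could then look like this
// (hypothetical; not part of the commit):
// #[cfg(all(feature = "cpu-haswell", not(target_feature = "avx2")))]
// compile_error!("cpu-haswell: also build with -Ctarget-cpu=haswell");
```

Because `codegen_covers` is a `const fn`, its result can feed a const assert or a `compile_error!` gate. Whether a mismatch should be a hard error or just a diagnostic depends on whether the dispatched kernels carry their own `#[target_feature]` attributes.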

src/hpc/simd_profile.rs

Lines changed: 232 additions & 4 deletions
@@ -225,18 +225,200 @@ impl SimdProfile {
     }
 }
 
+// ────────────────────────────────────────────────────────────────────
+// Phase 3 T3.2 — compile-time pinning via `cpu-*` cargo features.
+//
+// When any `cpu-<codename>` feature is set, `PINNED_PROFILE` is `Some(_)`
+// and `simd_profile()` folds to the const at compile time — no LazyLock
+// initialisation, no branch, no atomics. Without any feature set,
+// behaviour matches the original runtime LazyLock detection.
+//
+// Mutual exclusion: at most ONE cpu-* feature may be enabled. The
+// `_PIN_COUNT` const assert below fires at compile time if more than
+// one is active, mirroring integration-plan risk #1 ("if a user sets
+// cpu-spr AND has runtime detection on Zen 4, the binary SIGILLs").
+// ────────────────────────────────────────────────────────────────────
+
+const _PIN_COUNT: u32 = 0
+    + cfg!(feature = "cpu-gnr") as u32
+    + cfg!(feature = "cpu-spr") as u32
+    + cfg!(feature = "cpu-zen4") as u32
+    + cfg!(feature = "cpu-cpl") as u32
+    + cfg!(feature = "cpu-tigerlake") as u32
+    + cfg!(feature = "cpu-icx") as u32
+    + cfg!(feature = "cpu-clx") as u32
+    + cfg!(feature = "cpu-skx") as u32
+    + cfg!(feature = "cpu-arrowlake") as u32
+    + cfg!(feature = "cpu-haswell") as u32
+    + cfg!(feature = "cpu-a76") as u32
+    + cfg!(feature = "cpu-a72") as u32
+    + cfg!(feature = "cpu-a53") as u32;
+
+const _: () = assert!(
+    _PIN_COUNT <= 1,
+    "cpu-* cargo features are mutually exclusive: enable at most one (cpu-gnr, cpu-spr, cpu-zen4, cpu-cpl, cpu-tigerlake, cpu-icx, cpu-clx, cpu-skx, cpu-arrowlake, cpu-haswell, cpu-a76, cpu-a72, cpu-a53)"
+);
+
+/// The compile-time pinned profile, or `None` when runtime detection is in
+/// effect. `Some(_)` exactly when one of the `cpu-*` cargo features is
+/// enabled; mutually exclusive features are enforced by the `_PIN_COUNT`
+/// const assert above.
+///
+/// Consumers wanting branch-free dispatch on pinned builds can match on
+/// this const directly — the optimiser folds `Some(SimdProfile::X)` into
+/// the call site and the `None`-arm runtime path is eliminated. Returned
+/// from a `const fn` so call sites in const contexts (e.g. `const` array
+/// initialisers for dispatch tables) work as well.
+pub const fn pinned_profile() -> Option<SimdProfile> {
+    #[cfg(feature = "cpu-gnr")]
+    {
+        return Some(SimdProfile::GraniteRapids);
+    }
+    #[cfg(feature = "cpu-spr")]
+    {
+        return Some(SimdProfile::SapphireRapids);
+    }
+    #[cfg(feature = "cpu-zen4")]
+    {
+        return Some(SimdProfile::Zen4Avx512);
+    }
+    #[cfg(feature = "cpu-cpl")]
+    {
+        return Some(SimdProfile::CooperLake);
+    }
+    #[cfg(feature = "cpu-tigerlake")]
+    {
+        return Some(SimdProfile::TigerLakeU);
+    }
+    #[cfg(feature = "cpu-icx")]
+    {
+        return Some(SimdProfile::IceLakeSp);
+    }
+    #[cfg(feature = "cpu-clx")]
+    {
+        return Some(SimdProfile::CascadeLake);
+    }
+    #[cfg(feature = "cpu-skx")]
+    {
+        return Some(SimdProfile::SkylakeX);
+    }
+    #[cfg(feature = "cpu-arrowlake")]
+    {
+        return Some(SimdProfile::ArrowLake);
+    }
+    #[cfg(feature = "cpu-haswell")]
+    {
+        return Some(SimdProfile::HaswellAvx2);
+    }
+    #[cfg(feature = "cpu-a76")]
+    {
+        return Some(SimdProfile::A76DotProd);
+    }
+    #[cfg(feature = "cpu-a72")]
+    {
+        return Some(SimdProfile::A72Fast);
+    }
+    #[cfg(feature = "cpu-a53")]
+    {
+        return Some(SimdProfile::A53Baseline);
+    }
+    #[allow(unreachable_code)]
+    None
+}
+
+/// `true` when a `cpu-*` cargo feature has pinned the profile at compile
+/// time, `false` when runtime detection is in use. Equivalent to
+/// `pinned_profile().is_some()` but spelled out for grep-ability.
+pub const fn is_pinned() -> bool {
+    pinned_profile().is_some()
+}
+
+// The LazyLock only exists when no cpu-* feature is set. With pinning,
+// linking the LazyLock would defeat the purpose — we want every code
+// path that touches `simd_profile()` to fold to a const.
+#[cfg(not(any(
+    feature = "cpu-gnr",
+    feature = "cpu-spr",
+    feature = "cpu-zen4",
+    feature = "cpu-cpl",
+    feature = "cpu-tigerlake",
+    feature = "cpu-icx",
+    feature = "cpu-clx",
+    feature = "cpu-skx",
+    feature = "cpu-arrowlake",
+    feature = "cpu-haswell",
+    feature = "cpu-a76",
+    feature = "cpu-a72",
+    feature = "cpu-a53",
+)))]
 static PROFILE: LazyLock<SimdProfile> = LazyLock::new(SimdProfile::detect);
 
-/// Get the resolved silicon profile, detected once at first access.
+/// Get the resolved silicon profile.
+///
+/// **Default (no `cpu-*` feature):** detected once at first access via
+/// `LazyLock`; subsequent calls are a single pointer deref to a `Copy`
+/// enum — no atomics, no branching.
+///
+/// **Pinned (one `cpu-*` feature set):** returns the pinned const
+/// directly. The compiler folds this call into the matching variant at
+/// every call site; the LazyLock is not linked into the binary.
 ///
-/// All subsequent calls are a single pointer deref to a `Copy` enum —
-/// no atomics, no branching. Pair with `*Dispatch` static tables to make
-/// per-call dispatch a single indirect call after monomorphisation.
+/// Pair with `*Dispatch` static tables to make per-call dispatch a single
+/// indirect call after monomorphisation (runtime path) or to fold to a
+/// direct call (pinned path).
+#[cfg(not(any(
+    feature = "cpu-gnr",
+    feature = "cpu-spr",
+    feature = "cpu-zen4",
+    feature = "cpu-cpl",
+    feature = "cpu-tigerlake",
+    feature = "cpu-icx",
+    feature = "cpu-clx",
+    feature = "cpu-skx",
+    feature = "cpu-arrowlake",
+    feature = "cpu-haswell",
+    feature = "cpu-a76",
+    feature = "cpu-a72",
+    feature = "cpu-a53",
+)))]
 #[inline(always)]
 pub fn simd_profile() -> SimdProfile {
     *PROFILE
 }
 
+/// Get the resolved silicon profile (pinned variant).
+///
+/// A `cpu-*` cargo feature is active: this returns the pinned constant
+/// directly, foldable at every call site. The runtime LazyLock is not
+/// linked into the binary. See the documentation on the runtime variant
+/// for the call-site contract.
+#[cfg(any(
+    feature = "cpu-gnr",
+    feature = "cpu-spr",
+    feature = "cpu-zen4",
+    feature = "cpu-cpl",
+    feature = "cpu-tigerlake",
+    feature = "cpu-icx",
+    feature = "cpu-clx",
+    feature = "cpu-skx",
+    feature = "cpu-arrowlake",
+    feature = "cpu-haswell",
+    feature = "cpu-a76",
+    feature = "cpu-a72",
+    feature = "cpu-a53",
+))]
+#[inline(always)]
+pub const fn simd_profile() -> SimdProfile {
+    // SAFETY of the unwrap: the cfg gate above guarantees at least one
+    // cpu-* feature is set, and `pinned_profile()` returns Some(_) under
+    // any of those gates. Const-evaluable since the inner cfg cascade is
+    // resolved at compile time.
+    match pinned_profile() {
+        Some(p) => p,
+        None => SimdProfile::Scalar, // unreachable; const fn can't panic cleanly
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
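
The two cfg-gated `simd_profile()` forms above follow a general "pin or detect" shape. A minimal standalone sketch of that shape, under stated assumptions: `Profile`, `pinned()`, `profile()` and the `pin-avx2` feature are hypothetical stand-ins (two variants instead of fourteen), and detection uses std's `is_x86_feature_detected!`. With no feature enabled, the runtime form is the one that compiles.

```rust
use std::sync::LazyLock;

/// Hypothetical two-variant stand-in for `SimdProfile`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Scalar,
    Avx2,
}

impl Profile {
    /// Runtime detection; only an x86_64 host can report `Avx2`.
    fn detect() -> Self {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                return Profile::Avx2;
            }
        }
        Profile::Scalar
    }
}

/// Compile-time pin: `Some(_)` only when the hypothetical `pin-avx2`
/// feature is enabled.
pub const fn pinned() -> Option<Profile> {
    if cfg!(feature = "pin-avx2") {
        Some(Profile::Avx2)
    } else {
        None
    }
}

// Runtime form: the LazyLock only exists when nothing is pinned.
#[cfg(not(feature = "pin-avx2"))]
static PROFILE: LazyLock<Profile> = LazyLock::new(Profile::detect);

#[cfg(not(feature = "pin-avx2"))]
pub fn profile() -> Profile {
    *PROFILE
}

// Pinned form: a `const fn`, so it is also usable in const items.
#[cfg(feature = "pin-avx2")]
pub const fn profile() -> Profile {
    Profile::Avx2
}
```

One consequence of the split: only the pinned form is a `const fn`, so a caller that writes `const P: Profile = profile();` compiles on pinned builds and fails on default ones. Also, after initialisation a `LazyLock` deref still checks its init state (an atomic acquire load plus a branch in current std), which is cheap but not free; the pinned form removes even that.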
@@ -273,6 +455,11 @@ mod tests {
 
     #[test]
     fn x86_target_lands_inside_x86_family() {
+        // Pinning overrides hardware detection — the test only describes
+        // the default (runtime-detection) path.
+        if is_pinned() {
+            return;
+        }
         #[cfg(target_arch = "x86_64")]
         {
             let p = simd_profile();
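
With a `cpu-*` feature set, these tests return early and `simd_profile()` no longer consults the hardware, so a pinned binary run on the wrong silicon (risk #1) would only find out via SIGILL. One possible mitigation, not part of this commit, is a startup check that compares the pinned variant against runtime feature detection and fails with a clear error. A sketch, assuming a hypothetical three-variant `Profile` and illustrative feature lists; `is_x86_feature_detected!` is the real std macro:

```rust
/// Hypothetical three-variant stand-in for `SimdProfile`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Scalar,
    HaswellAvx2,
    SkylakeX,
}

/// Can the *running* CPU execute code built for `p`?
/// The per-variant feature lists are illustrative, not the crate's tables.
pub fn host_supports(p: Profile) -> bool {
    match p {
        Profile::Scalar => true,
        #[cfg(target_arch = "x86_64")]
        Profile::HaswellAvx2 => {
            is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma")
        }
        #[cfg(target_arch = "x86_64")]
        Profile::SkylakeX => {
            is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512bw")
        }
        // Reached only on non-x86_64 hosts, where x86 profiles cannot run.
        #[allow(unreachable_patterns)]
        _ => false,
    }
}

/// Startup guard for a pinned build: return an error up front instead of
/// letting the first unsupported instruction raise SIGILL later.
pub fn check_pin(pinned: Option<Profile>) -> Result<(), String> {
    match pinned {
        // Runtime detection already matched the profile to the host.
        None => Ok(()),
        Some(p) if host_supports(p) => Ok(()),
        Some(p) => Err(format!("built for {p:?}, but this CPU lacks the required ISA")),
    }
}
```

In a real pinned binary the check would run once at startup on the crate's `pinned_profile()` result, with per-variant feature lists taken from the dispatch matrix rather than the illustrative ones here.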
@@ -290,13 +477,54 @@ mod tests {
 
     #[test]
     fn aarch64_target_lands_inside_aarch64_family() {
+        if is_pinned() {
+            return;
+        }
         #[cfg(target_arch = "aarch64")]
         {
             let p = simd_profile();
             assert!(p.is_aarch64(), "aarch64 silicon resolved as {:?}", p);
         }
     }
 
+    #[test]
+    fn pinning_default_is_off() {
+        // The default build (no cpu-* feature) must NOT be pinned, so
+        // downstream consumers don't get surprised by compile-time
+        // dispatch they didn't opt into.
+        #[cfg(not(any(
+            feature = "cpu-gnr",
+            feature = "cpu-spr",
+            feature = "cpu-zen4",
+            feature = "cpu-cpl",
+            feature = "cpu-tigerlake",
+            feature = "cpu-icx",
+            feature = "cpu-clx",
+            feature = "cpu-skx",
+            feature = "cpu-arrowlake",
+            feature = "cpu-haswell",
+            feature = "cpu-a76",
+            feature = "cpu-a72",
+            feature = "cpu-a53",
+        )))]
+        {
+            assert!(!is_pinned());
+            assert_eq!(pinned_profile(), None);
+        }
+    }
+
+    #[test]
+    fn pinning_consistency() {
+        // When pinning is in effect, simd_profile() must equal the
+        // pinned const — hardware detection is bypassed entirely.
+        if let Some(pinned) = pinned_profile() {
+            assert!(is_pinned());
+            assert_eq!(simd_profile(), pinned);
+        } else {
+            assert!(!is_pinned());
+        }
+    }
+
     #[test]
     fn has_avx512_is_subset_of_is_x86() {
         for &p in &[
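
The same thirteen-feature `any(...)` list is repeated four times in this diff (on `PROFILE`, on both `simd_profile()` forms, and in `pinning_default_is_off`), so adding a variant means touching every copy. One alternative, not what the commit does, is to derive a single cfg in a build script. A sketch, assuming a hypothetical cfg name `simd_pinned`; the `CARGO_FEATURE_<NAME>` environment variables and the `rustc-cfg` / `rustc-check-cfg` directives are standard Cargo build-script features, and the `cargo::` directive syntax assumes a recent Cargo:

```rust
/// The pinning features from Cargo.toml.
const PIN_FEATURES: [&str; 13] = [
    "cpu-gnr", "cpu-spr", "cpu-zen4", "cpu-cpl", "cpu-tigerlake", "cpu-icx", "cpu-clx",
    "cpu-skx", "cpu-arrowlake", "cpu-haswell", "cpu-a76", "cpu-a72", "cpu-a53",
];

/// Cargo exposes each enabled feature to build scripts as
/// `CARGO_FEATURE_<NAME>`: upper-cased, with `-` replaced by `_`.
fn env_key(feature: &str) -> String {
    format!("CARGO_FEATURE_{}", feature.to_uppercase().replace('-', "_"))
}

/// Build-script directives: always declare the hypothetical `simd_pinned`
/// cfg (so check-cfg accepts it), and set it when any pin feature is on.
fn pin_directives(is_set: impl Fn(&str) -> bool) -> Vec<String> {
    let active = PIN_FEATURES.iter().filter(|f| is_set(&env_key(f))).count();
    let mut out = vec![String::from("cargo::rustc-check-cfg=cfg(simd_pinned)")];
    if active > 0 {
        out.push(String::from("cargo::rustc-cfg=simd_pinned"));
    }
    out
}

// In build.rs this would be driven by the real environment:
// fn main() {
//     for line in pin_directives(|key| std::env::var_os(key).is_some()) {
//         println!("{line}");
//     }
// }
```

With `simd_pinned` available, the four gates shrink to `#[cfg(simd_pinned)]` and `#[cfg(not(simd_pinned))]`. The `_PIN_COUNT` const assert keeps working unchanged; alternatively the build script could panic when `active > 1` and report the conflict before the crate itself is compiled.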
