
Emit branchless Math.Min/Max in ScalarEmitter for supported types #106

Open. jonathanpeppers wants to merge 5 commits into main from jonathanpeppers/branchless-scalar-minmax

jonathanpeppers commented May 3, 2026

Summary

Updates ScalarEmitter.EmitSortMethod in the source generator to emit platform-specific compare-and-swap patterns for numeric types. By default, the generated code checks X86Base.IsSupported: on x86 it uses branchless Math.Min/Math.Max (which the JIT lowers to cmov), and on other platforms such as ARM it uses a branching if/swap (where branch prediction outperforms csel data-dependency chains). Because the JIT treats X86Base.IsSupported as a constant, it dead-code-eliminates the unused path.

Adds a Branchless property to SortingNetworkAttribute for explicit control.

Fixes #32

Changes

ScalarEmitter.cs

Default (auto-detect) — generated code:

if (X86Base.IsSupported)
{
    { int t0 = Math.Min(e0, e1); int t1 = Math.Max(e0, e1); e0 = t0; e1 = t1; }
}
else
{
    if (e0 > e1) { int temp = e0; e0 = e1; e1 = temp; }
}

Branchless = true — generated code:

{ int t0 = Math.Min(e0, e1); int t1 = Math.Max(e0, e1); e0 = t0; e1 = t1; }

Branchless = false — generated code:

if (e0 > e1) { int temp = e0; e0 = e1; e1 = temp; }
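
To see the three emitted shapes side by side, here is a minimal runnable sketch. It is not the generator's output: the class name CompareSwapDemo, the Cas delegate, and the 4-element network are assumptions made for illustration, and only the bodies of the two swap forms are copied from the patterns above.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Hypothetical demo type; not part of the SortingNetworks library.
public static class CompareSwapDemo
{
    private delegate void Cas(ref int lo, ref int hi);

    // Branchless form: both outputs are always computed and written.
    public static void SwapBranchless(ref int lo, ref int hi)
    {
        int t0 = Math.Min(lo, hi);
        int t1 = Math.Max(lo, hi);
        lo = t0;
        hi = t1;
    }

    // Branching form: writes happen only when the pair is out of order.
    public static void SwapBranching(ref int lo, ref int hi)
    {
        if (lo > hi) { int temp = lo; lo = hi; hi = temp; }
    }

    // Auto-detect form: picks a path based on X86Base.IsSupported.
    public static void SwapAuto(ref int lo, ref int hi)
    {
        if (X86Base.IsSupported) SwapBranchless(ref lo, ref hi);
        else SwapBranching(ref lo, ref hi);
    }

    // Sorts 4 values with the 5-comparator network (0,1) (2,3) (0,2) (1,3) (1,2).
    // branchless: null = auto-detect, true = Min/Max, false = if/swap.
    public static int[] Sort4(int[] input, bool? branchless = null)
    {
        Cas cas;
        if (branchless == true) cas = SwapBranchless;
        else if (branchless == false) cas = SwapBranching;
        else cas = SwapAuto;

        int e0 = input[0], e1 = input[1], e2 = input[2], e3 = input[3];
        cas(ref e0, ref e1);
        cas(ref e2, ref e3);
        cas(ref e0, ref e2);
        cas(ref e1, ref e3);
        cas(ref e1, ref e2);
        return new[] { e0, e1, e2, e3 };
    }
}
```

All three modes should return the same sorted result for any input; they differ only in the machine code the JIT is expected to produce. The delegate indirection here is for the demo only, since real generated code inlines each compare-and-swap.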

SortingNetworkAttribute.cs

New Branchless property:

[SortingNetwork(27, typeof(int))]                        // auto-detect (default)
[SortingNetwork(27, typeof(int), Branchless = true)]     // force branchless
[SortingNetwork(27, typeof(int), Branchless = false)]    // force branching
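
Attribute properties cannot be declared as bool?, so "unset" has to be detected from whether the named argument was written at all. The sketch below illustrates that idea with runtime reflection (CustomAttributeData.NamedArguments) rather than Roslyn, so it runs without the generator. DemoSortingNetworkAttribute, the three sorter classes, and BranchlessReader are hypothetical names, and the real attribute's shape may differ.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for SortingNetworkAttribute.
[AttributeUsage(AttributeTargets.Class)]
public sealed class DemoSortingNetworkAttribute : Attribute
{
    public DemoSortingNetworkAttribute(int size, Type elementType)
    {
        Size = size;
        ElementType = elementType;
    }

    public int Size { get; }
    public Type ElementType { get; }

    // Plain bool: the tri-state comes from checking whether it was specified.
    public bool Branchless { get; set; }
}

[DemoSortingNetwork(27, typeof(int))]
public static class AutoSorter { }

[DemoSortingNetwork(27, typeof(int), Branchless = true)]
public static class BranchlessSorter { }

[DemoSortingNetwork(27, typeof(int), Branchless = false)]
public static class BranchingSorter { }

public static class BranchlessReader
{
    // Returns null when Branchless was not written on the attribute,
    // otherwise the value that was written.
    public static bool? Read(Type annotated)
    {
        CustomAttributeData data = annotated.GetCustomAttributesData()
            .First(d => d.AttributeType == typeof(DemoSortingNetworkAttribute));

        foreach (CustomAttributeNamedArgument arg in data.NamedArguments)
        {
            if (arg.MemberName == nameof(DemoSortingNetworkAttribute.Branchless))
                return arg.TypedValue.Value is true;
        }
        return null;
    }
}
```

Reading the named-argument list instead of the property value is what lets Branchless = false be distinguished from the default.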

Type Coverage

| Pattern | Types |
| --- | --- |
| Platform-adaptive (x86: Min/Max, ARM: if/swap) | byte, sbyte, short, ushort, int, uint, long, ulong, float, double |
| Branching always | char, nint/nuint (delegates), string, custom IComparable<T> |
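
The allowlist in the table can be sketched as a simple type check. This version uses System.TypeCode on runtime types so that it runs without Roslyn, which is an assumption made for illustration. MinMaxAllowlist is a hypothetical name, and the list of cases simply mirrors the ten types in the table.

```csharp
using System;

// Hypothetical runtime analog of the generator's type allowlist.
public static class MinMaxAllowlist
{
    // True for the ten primitive types listed as platform-adaptive above.
    public static bool SupportsBranchlessMinMax(Type type)
    {
        // Enums report their underlying TypeCode, so exclude them explicitly.
        if (type.IsEnum) return false;

        switch (Type.GetTypeCode(type))
        {
            case TypeCode.Byte:
            case TypeCode.SByte:
            case TypeCode.Int16:
            case TypeCode.UInt16:
            case TypeCode.Int32:
            case TypeCode.UInt32:
            case TypeCode.Int64:
            case TypeCode.UInt64:
            case TypeCode.Single:
            case TypeCode.Double:
                return true;
            default:
                // char, nint/nuint, string, and custom types fall through
                // to the branching pattern.
                return false;
        }
    }
}
```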

Benchmark Results

CI run: https://github.com/jonathanpeppers/SortingNetworks/actions/runs/25268017631

int scalar sizes 23-32 — GeneratedSort Ratio vs ArraySort (Random data)

| Size | Ubuntu x64 (EPYC 7763) | Windows x64 (EPYC 9V74) | macOS ARM (M1) | ARM64 (Neoverse-N2) |
| --- | --- | --- | --- | --- |
| 23 | 0.82x | 0.68x | 0.75x | 1.56x (scalar only) |
| 24 | 0.73x | 1.02x | 0.77x | 1.34x |
| 25 | 0.87x | 0.85x | 0.75x | 1.89x |
| 26 | 0.81x | 0.84x | 0.76x | 1.43x |
| 27 | 0.69x | 0.83x | 0.76x | 1.37x |
| 28 | 0.73x | 0.73x | 0.60x | 1.22x |

31 0.75x 1.68x
32 0.78x 1.78x

x86: 13-32% faster at most sizes, because cmov eliminates branch misprediction on random data. The exception is Windows x64 at size 24 (1.02x, roughly on par with Array.Sort).

macOS M1 (ARM): 23-40% faster. Apple Silicon's wide out-of-order pipeline and excellent branch predictor handle sorting-network branches well with if/swap.

Neoverse-N2 (ARM): Ratios >1x on Random/Reversed/Duplicates are expected, since this is a narrower ARM server core where Array.Sort's introsort is competitive at these sizes. Sorted data is 0.53-0.62x (38-47% faster) because branching if/swap skips swaps entirely.

SIMD sizes 27-28 (unchanged — SIMD path is not affected)

| Platform | Size 27 Random | Size 28 Random |
| --- | --- | --- |
| Ubuntu x64 | 0.69x | 0.73x |
| Windows x64 | 0.83x | 0.73x |
| macOS M1 | 0.76x | 0.60x |

Test Results

All 470 tests pass, including 4 new test methods:

  • PlatformSpecific_EmittedForNumericTypes (10 types)
  • BranchlessTrue_EmitsOnlyMinMax (int, float)
  • BranchlessFalse_EmitsOnlyBranching (int, float)
  • BranchingSwap_EmittedForNonMinMaxTypes (char)
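
The kind of check these tests perform can be sketched with a tiny string emitter. This is an illustrative assumption, not the real ScalarEmitter: MiniScalarEmitter and EmitCompareSwap are hypothetical names, and the emitted text is modeled on the three generated-code shapes shown in the PR description.

```csharp
using System;

// Hypothetical miniature of the emitter's compare-and-swap text generation.
public static class MiniScalarEmitter
{
    // branchless: null = auto-detect, true = Min/Max only, false = if/swap only.
    public static string EmitCompareSwap(string type, string a, string b, bool? branchless)
    {
        string minMax =
            $"{{ {type} t0 = Math.Min({a}, {b}); {type} t1 = Math.Max({a}, {b}); {a} = t0; {b} = t1; }}";
        string branching =
            $"if ({a} > {b}) {{ {type} temp = {a}; {a} = {b}; {b} = temp; }}";

        if (branchless == true) return minMax;
        if (branchless == false) return branching;

        // Auto-detect: wrap both forms in the platform check.
        return "if (X86Base.IsSupported)\n{\n    " + minMax +
               "\n}\nelse\n{\n    " + branching + "\n}";
    }
}
```

A test in the style of BranchlessTrue_EmitsOnlyMinMax would then assert that the output contains Math.Min and Math.Max and does not contain the if/swap text, with the reverse for Branchless = false.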

Copilot AI review requested due to automatic review settings May 3, 2026 00:06

Copilot AI left a comment

Pull request overview

This PR updates the source generator’s scalar emitter so generated unrolled sort methods can use System.Math.Min/System.Math.Max compare-and-swap patterns for selected primitive numeric types instead of the previous branching swap form. In the broader codebase, this affects the scalar fallback code that SortingNetworkGenerator emits for generated sorters.

Changes:

  • Added a type allowlist in ScalarEmitter for primitives intended to use Math.Min/Math.Max.
  • Switched scalar compare-and-swap emission to choose between branchless Math.Min/Math.Max and the existing branching logic.
  • Added comments documenting the new branchless-emission path.

Comment thread SortingNetworks.Generators/ScalarEmitter.cs Outdated
Comment thread SortingNetworks.Generators/ScalarEmitter.cs
Comment thread SortingNetworks.Generators/ScalarEmitter.cs Outdated
Comment thread SortingNetworks.Generators/ScalarEmitter.cs Outdated
Comment thread SortingNetworks.Generators/ScalarEmitter.cs Outdated
@jonathanpeppers jonathanpeppers force-pushed the jonathanpeppers/branchless-scalar-minmax branch 3 times, most recently from a4566ce to fd866b6 Compare May 3, 2026 19:35
jonathanpeppers and others added 4 commits May 4, 2026 20:49
Update ScalarEmitter.EmitSortMethod to generate Math.Min/Math.Max
compare-and-swap patterns instead of branching if/swap for types with
direct System.Math overloads (byte, sbyte, short, ushort, int, uint,
long, ulong, float, double). The JIT lowers these to branchless cmov
instructions on x64.

Types without Math.Min/Max overloads (char, nint, nuint, string) and
custom IComparable<T> types retain the existing branching pattern.

Fixes #32

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Replace string-based MathMinMaxTypes HashSet with SpecialType-based
  SupportsBranchlessMinMax() method, avoiding duplication with the
  generator's existing type metadata
- Keep float/double in branchless path (NaN is unsupported per #10/#11)
- Add generator tests asserting Math.Min/Max emitted for numeric types
  and branching if/swap for char
- Update class-level XML docs to reflect branchless vs branching paths
- Update performance.instructions.md and README.md scalar examples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- ScalarEmitter now emits a runtime X86Base.IsSupported check for numeric
  types: Math.Min/Max (cmov) on x86, if/swap on ARM where branch prediction
  outperforms csel data-dependency chains. The JIT dead-code-eliminates the
  unused path.
- Added Branchless property to SortingNetworkAttribute for explicit control:
  Branchless = true forces Math.Min/Max, false forces if/swap, unset (default)
  uses runtime auto-detection.
- Generator reads the Branchless named argument via Roslyn NamedArguments and
  passes it through NetworkRequest to ScalarEmitter.
- Updated tests: PlatformSpecific_EmittedForNumericTypes (10 types),
  BranchlessTrue_EmitsOnlyMinMax, BranchlessFalse_EmitsOnlyBranching.
- Updated performance.instructions.md and README.md with platform-specific
  documentation and Branchless attribute examples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add int scalar sizes 23-32 benchmark table showing platform-specific
results across Ubuntu x64, Windows x64, and macOS ARM. Update ARM64
int detailed results with latest CI run data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jonathanpeppers jonathanpeppers force-pushed the jonathanpeppers/branchless-scalar-minmax branch from fd866b6 to 57acd68 Compare May 5, 2026 01:51
Development

Successfully merging this pull request may close these issues.

Perf: Branchless compare-and-swap in ScalarEmitter source generator

2 participants