Emit branchless Math.Min/Max in ScalarEmitter for supported types #106
Open
jonathanpeppers wants to merge 5 commits into
Pull request overview
This PR updates the source generator’s scalar emitter so generated unrolled sort methods can use System.Math.Min/System.Math.Max compare-and-swap patterns for selected primitive numeric types instead of the previous branching swap form. In the broader codebase, this affects the scalar fallback code that SortingNetworkGenerator emits for generated sorters.
Changes:
- Added a type allowlist in ScalarEmitter for primitives intended to use Math.Min/Math.Max.
- Switched scalar compare-and-swap emission to choose between branchless Math.Min/Math.Max and the existing branching logic.
- Added comments documenting the new branchless-emission path.
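The allowlist comes down to a type test. The sketch below is a hypothetical runtime analog that uses the ten primitives listed in the commits. According to a later commit, the PR's final version checks Roslyn SpecialType values on type symbols inside the generator rather than System.Type values, so this only illustrates the decision. It is not the generator's code.

```csharp
using System;

// Hypothetical runtime analog of the allowlist. The real generator inspects
// Roslyn type symbols (SpecialType); this sketch compares System.Type values.
static bool SupportsBranchlessMinMax(Type t) =>
    t == typeof(byte) || t == typeof(sbyte) ||
    t == typeof(short) || t == typeof(ushort) ||
    t == typeof(int) || t == typeof(uint) ||
    t == typeof(long) || t == typeof(ulong) ||
    t == typeof(float) || t == typeof(double);

// char, nint/nuint, string and custom IComparable<T> types are not in the list,
// so they keep the branching if/swap form.
Console.WriteLine(SupportsBranchlessMinMax(typeof(int)));   // True
Console.WriteLine(SupportsBranchlessMinMax(typeof(char)));  // False
```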
Branch updated: a4566ce to fd866b6
Update ScalarEmitter.EmitSortMethod to generate Math.Min/Math.Max compare-and-swap patterns instead of branching if/swap for types with direct System.Math overloads (byte, sbyte, short, ushort, int, uint, long, ulong, float, double). The JIT lowers these to branchless cmov instructions on x64. Types without Math.Min/Max overloads (char, nint, nuint, string) and custom IComparable<T> types retain the existing branching pattern.

Fixes #32
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
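This commit swaps one compare-and-swap shape for another. The following sketch shows both shapes for int. CompareSwapBranching and CompareSwapMinMax are hypothetical helper names. Because the generated sort methods are unrolled, the real output presumably inlines equivalent statements at each comparator instead of calling helpers. Whether the JIT actually emits cmov for the Min/Max form depends on the runtime version and target, so treat that part as the commit's claim rather than a guarantee.

```csharp
using System;

// Branching form: swap only when the pair is out of order.
static void CompareSwapBranching(ref int a, ref int b)
{
    if (a > b)
    {
        int t = a;
        a = b;
        b = t;
    }
}

// Branchless form: both outputs are computed unconditionally.
// Math.Max reads the original a because a is assigned last.
static void CompareSwapMinMax(ref int a, ref int b)
{
    int lo = Math.Min(a, b);
    b = Math.Max(a, b);
    a = lo;
}

int x = 7, y = 3;
CompareSwapMinMax(ref x, ref y);
Console.WriteLine($"{x} {y}"); // 3 7
```

Both forms leave the smaller value in a and the larger in b. The difference is only whether the work depends on a predicted branch.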
- Replace string-based MathMinMaxTypes HashSet with SpecialType-based SupportsBranchlessMinMax() method, avoiding duplication with the generator's existing type metadata
- Keep float/double in branchless path (NaN is unsupported per #10/#11)
- Add generator tests asserting Math.Min/Max emitted for numeric types and branching if/swap for char
- Update class-level XML docs to reflect branchless vs branching paths
- Update performance.instructions.md and README.md scalar examples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
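Keeping float/double on the branchless path is only safe if NaN never appears in the input. The sketch below shows why. Since .NET Core 3.0, Math.Min and Math.Max return NaN when either argument is NaN, so a Min/Max compare-and-swap overwrites the non-NaN value. A comparison such as a > b is false when a NaN is involved, so the branching form leaves the pair unchanged. The helper names are hypothetical and do not come from the PR.

```csharp
using System;

// Min/Max compare-and-swap for double, mirroring the branchless pattern.
static (double Lo, double Hi) MinMaxSwap(double a, double b) => (Math.Min(a, b), Math.Max(a, b));

// Branching compare-and-swap: any comparison with NaN is false, so no swap happens.
static (double Lo, double Hi) BranchingSwap(double a, double b) => a > b ? (b, a) : (a, b);

var mm = MinMaxSwap(double.NaN, 1.0);
var br = BranchingSwap(double.NaN, 1.0);
Console.WriteLine(double.IsNaN(mm.Hi)); // True: the 1.0 was overwritten by NaN
Console.WriteLine(br.Hi == 1.0);        // True: the branching form kept both values
```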
- ScalarEmitter now emits a runtime X86Base.IsSupported check for numeric types: Math.Min/Max (cmov) on x86, if/swap on ARM where branch prediction outperforms csel data-dependency chains. The JIT dead-code-eliminates the unused path.
- Added Branchless property to SortingNetworkAttribute for explicit control: Branchless = true forces Math.Min/Max, false forces if/swap, unset (default) uses runtime auto-detection.
- Generator reads the Branchless named argument via Roslyn NamedArguments and passes it through NetworkRequest to ScalarEmitter.
- Updated tests: PlatformSpecific_EmittedForNumericTypes (10 types), BranchlessTrue_EmitsOnlyMinMax, BranchlessFalse_EmitsOnlyBranching.
- Updated performance.instructions.md and README.md with platform-specific documentation and Branchless attribute examples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
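The three modes in this commit (auto-detect, forced Min/Max, forced if/swap) can be modeled with a nullable flag. The sketch below is hypothetical and assumes .NET 5 or later for X86Base. CompareSwap and its bool? parameter are illustrative names only. Per the commit, the real generator detects the "unset" state by checking whether the Branchless named argument is present, and it makes this choice when emitting source rather than at run time.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper (not part of the PR) modeling the three Branchless modes:
//   null  -> auto-detect: Math.Min/Math.Max when X86Base.IsSupported, else if/swap
//   true  -> always Math.Min/Math.Max
//   false -> always if/swap
static void CompareSwap(ref int a, ref int b, bool? branchless)
{
    bool useMinMax = branchless ?? X86Base.IsSupported;
    if (useMinMax)
    {
        // Both results are computed unconditionally; no data-dependent branch.
        int lo = Math.Min(a, b);
        b = Math.Max(a, b);
        a = lo;
    }
    else if (a > b)
    {
        // Branching form: swap only when the pair is out of order.
        (a, b) = (b, a);
    }
}

int p = 9, q = 4;
CompareSwap(ref p, ref q, branchless: null);
Console.WriteLine($"{p} {q}"); // 4 9, whichever path ran on this machine
```

In generated code, X86Base.IsSupported is a JIT-time constant, which is why the unused path can be removed. In this helper, the nullable parameter is an ordinary runtime value, so both paths stay in the compiled method.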
Add int scalar sizes 23-32 benchmark table showing platform-specific results across Ubuntu x64, Windows x64, and macOS ARM. Update ARM64 int detailed results with latest CI run data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Branch updated: fd866b6 to 57acd68
Summary
Updates ScalarEmitter.EmitSortMethod in the source generator to emit platform-specific compare-and-swap patterns for numeric types. By default, a runtime X86Base.IsSupported check selects branchless Math.Min/Math.Max on x86 (the JIT lowers these to cmov) or branching if/swap on ARM (where branch prediction outperforms csel data-dependency chains). The JIT dead-code-eliminates the unused path.

Adds a Branchless property to SortingNetworkAttribute for explicit control.

Fixes #32
Changes
ScalarEmitter.cs
Default (auto-detect), generated code:
Branchless = true, generated code:
Branchless = false, generated code:

SortingNetworkAttribute.cs
New Branchless property:

Type Coverage
- Platform-specific (Math.Min/Math.Max or if/swap): byte, sbyte, short, ushort, int, uint, long, ulong, float, double
- Branching if/swap only: char, nint/nuint (delegates), string, custom IComparable<T>

Benchmark Results
CI run: https://github.com/jonathanpeppers/SortingNetworks/actions/runs/25268017631
int scalar sizes 23-32: GeneratedSort ratio vs ArraySort (Random data)
x86: 13-32% faster than Array.Sort across all sizes; cmov eliminates branch misprediction on random data.

macOS M1 (ARM): 23-40% faster; Apple Silicon's wide out-of-order pipeline and excellent branch predictor handle sorting-network branches well with if/swap.

Neoverse-N2 (ARM): ratios above 1x on Random/Reversed/Duplicates are expected. This is a narrower ARM server core where Array.Sort's introsort is competitive at these sizes. On Sorted data the ratio is 0.53-0.62x (about 38-47% less time than Array.Sort) because every if/swap comparison is false on sorted input, so no swaps run.

SIMD sizes 27-28 (unchanged; the SIMD path is not affected)
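To read the Ratio column, assume (as is the usual baseline convention) that it is GeneratedSort's mean time divided by ArraySort's. Under that assumption, a ratio below 1x converts to time saved and speedup as shown below, using the Sorted-data range quoted for Neoverse-N2:

```latex
\begin{aligned}
r &= \frac{t_{\text{GeneratedSort}}}{t_{\text{ArraySort}}}, \qquad
\text{time saved} = 1 - r, \qquad
\text{speedup} = \frac{1}{r} \\
r = 0.62 &\;\Rightarrow\; 1 - 0.62 = 0.38 \;(38\%\ \text{less time}), \quad \tfrac{1}{0.62} \approx 1.61\times \\
r = 0.53 &\;\Rightarrow\; 1 - 0.53 = 0.47 \;(47\%\ \text{less time}), \quad \tfrac{1}{0.53} \approx 1.89\times
\end{aligned}
```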
Test Results
All 470 tests pass, including 4 new test methods:
- PlatformSpecific_EmittedForNumericTypes (10 types)
- BranchlessTrue_EmitsOnlyMinMax (int, float)
- BranchlessFalse_EmitsOnlyBranching (int, float)
- BranchingSwap_EmittedForNonMinMaxTypes (char)