Summary
Update ScalarEmitter.EmitSortMethod in the source generator to emit branchless Math.Min/Math.Max compare-and-swap patterns instead of the current branching if (a > b) swap pattern for supported numeric types.
Current Generated Code
The ScalarEmitter (line 42) currently emits:
if (e1 > e26) { byte temp = e1; e1 = e26; e26 = temp; }
This compiles to cmp + conditional branch (jg/jle). For random data, each branch has ~50% probability, which is worst case for branch predictors.
Proposed Generated Code
For types with Math.Min/Math.Max overloads, emit:
{ int t0 = System.Math.Min(e1, e26); int t1 = System.Math.Max(e1, e26); e1 = t0; e26 = t1; }
The .NET JIT lowers Math.Min/Math.Max to branchless cmov instructions on x64.
Scope of Change
File: SortingNetworks.Generators/ScalarEmitter.cs
Types that get branchless Min/Max: byte, sbyte, short, ushort, int, uint, long, ulong, float, double
Types that keep branching: char, nint, nuint (no direct Math.Min/Math.Max overloads), string, and custom IComparable<T> types
Not changed:
ApplyNetworkWithComparer (loop-based IComparer path) - unchanged
- SIMD emitters (
SimdX86Emitter, SimdArmEmitter) - already branchless
- Network data fields - unchanged
Important Caveats - Benchmark Required
CMOV is not universally faster than branches for sorting networks:
- Data dependency chains: CMOV creates a serial dependency. Branches allow speculative execution, which can be faster when the predictor is accurate.
- Predictable inputs: For sorted/reversed/near-sorted data, the branch predictor quickly learns the pattern and branches win.
- CPU microarchitecture matters: CMOV latency varies (1-2 cycles on modern Intel/AMD), but branch misprediction penalty is ~12-20 cycles.
References
Summary
Update
ScalarEmitter.EmitSortMethodin the source generator to emit branchlessMath.Min/Math.Maxcompare-and-swap patterns instead of the current branchingif (a > b) swappattern for supported numeric types.Current Generated Code
The
ScalarEmitter(line 42) currently emits:This compiles to
cmp+ conditional branch (jg/jle). For random data, each branch has ~50% probability, which is worst case for branch predictors.Proposed Generated Code
For types with
Math.Min/Math.Maxoverloads, emit:The .NET JIT lowers
Math.Min/Math.Maxto branchlesscmovinstructions on x64.Scope of Change
File:
SortingNetworks.Generators/ScalarEmitter.csTypes that get branchless Min/Max:
byte,sbyte,short,ushort,int,uint,long,ulong,float,doubleTypes that keep branching:
char,nint,nuint(no directMath.Min/Math.Maxoverloads),string, and customIComparable<T>typesNot changed:
ApplyNetworkWithComparer(loop-based IComparer path) - unchangedSimdX86Emitter,SimdArmEmitter) - already branchlessImportant Caveats - Benchmark Required
CMOV is not universally faster than branches for sorting networks:
References