Skip to content

Fix operand order in accumulate for non-commutative operators#78

Open
jariji wants to merge 1 commit intoJuliaGPU:mainfrom
jariji:fix-noncommutative-accumulate
Open

Fix operand order in accumulate for non-commutative operators#78
jariji wants to merge 1 commit intoJuliaGPU:mainfrom
jariji:fix-noncommutative-accumulate

Conversation

@jariji
Copy link

@jariji jariji commented Mar 4, 2026

Resolves #67

Summary

I noticed accumulate gives unexpected results when used with non-commutative operators. It looks like a few code paths apply op(right, left) rather than op(left, right), which is fine for + but matters for things like matrix multiply.

Changes

CPU (accumulate_1d_cpu.jl):

  • Cross-task prefix combination: op(v[i], shared[itask-1])op(shared[itask-1], v[i])

GPU (accumulate_1d_gpu.jl):

  • Up-sweep reduction: op(temp[_bi], temp[_ai])op(temp[_ai], temp[_bi])
  • _accumulate_previous! (DecoupledLookback): earlier block's value as left operand
  • _accumulate_previous_coupled_preblocks! (ScanPrefixes): accumulate earlier chunks left-to-right, then prepend to running prefix (also avoids unsigned reverse-loop issue on GPU)

Also added a test using 2x2 matrix multiplication as a non-commutative operator — the existing tests all use + so they couldn't catch this.

Test plan

  • All existing accumulate tests still pass
  • New accumulate_1d_noncommutative tests pass on CPU (203 tests across random sizes and block sizes)
  • Tested on AMDGPU (ROCArray) — 201/201 pass including small block_size=16
  • Confirmed the new test catches the issue when the fix is reverted

The scan implementation applied `op(right, left)` instead of
`op(left, right)` in several places, producing wrong results for
associative-but-non-commutative operators (e.g. matrix multiply).
This was invisible in existing tests since they all used `+`.

Fixes:
- CPU: cross-task prefix combination (accumulate_1d_cpu.jl)
- GPU: up-sweep reduction (accumulate_1d_gpu.jl line 70)
- GPU: decoupled lookback prefix accumulation (lines 176, 179)
- GPU: coupled preblocks prefix accumulation (lines 239-241)

Adds test using 2x2 matrix multiplication as a non-commutative op.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected result from accumulate

1 participant