Fix operand order in accumulate for non-commutative operators #78
Open
jariji wants to merge 1 commit into JuliaGPU:main from
The scan implementation applied `op(right, left)` instead of `op(left, right)` in several places, producing wrong results for associative-but-non-commutative operators (e.g. matrix multiply). This was invisible in existing tests since they all used `+`.

Fixes:
- CPU: cross-task prefix combination (accumulate_1d_cpu.jl)
- GPU: up-sweep reduction (accumulate_1d_gpu.jl line 70)
- GPU: decoupled lookback prefix accumulation (lines 176, 179)
- GPU: coupled preblocks prefix accumulation (lines 239-241)

Adds test using 2x2 matrix multiplication as a non-commutative op.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolves #67
Summary
I noticed `accumulate` gives unexpected results when used with non-commutative operators. It looks like a few code paths apply `op(right, left)` rather than `op(left, right)`, which is fine for `+` but matters for things like matrix multiply.

Changes

CPU (`accumulate_1d_cpu.jl`):
- `op(v[i], shared[itask-1])` → `op(shared[itask-1], v[i])`

GPU (`accumulate_1d_gpu.jl`):
- `op(temp[_bi], temp[_ai])` → `op(temp[_ai], temp[_bi])`
- `_accumulate_previous!` (DecoupledLookback): earlier block's value as left operand
- `_accumulate_previous_coupled_preblocks!` (ScanPrefixes): accumulate earlier chunks left-to-right, then prepend to running prefix (also avoids unsigned reverse-loop issue on GPU)

Also added a test using 2x2 matrix multiplication as a non-commutative operator. The existing tests all use `+`, so they couldn't catch this.

Test plan
- `accumulate_1d_noncommutative` tests pass on CPU (203 tests across random sizes and block sizes)
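
For reviewers who want to see why the operand order matters, here is a minimal serial sketch of a chunked inclusive scan. `chunked_accumulate` is a hypothetical helper written only for this illustration; it is not the code in `accumulate_1d_cpu.jl` and it assumes nothing about the package's internals beyond the "scan each chunk, then combine with the prefix of earlier chunks" structure described above.

```julia
# Hypothetical illustration, not the package's implementation:
# a serial two-pass chunked inclusive scan.
function chunked_accumulate(op, v::AbstractVector, nchunks::Integer)
    n = length(v)
    out = copy(v)
    # Chunk c covers indices bounds[c]+1 : bounds[c+1]
    bounds = [div(n * c, nchunks) for c in 0:nchunks]

    # Pass 1: independent inclusive scan inside each chunk
    for c in 1:nchunks
        for i in bounds[c]+2:bounds[c+1]
            out[i] = op(out[i-1], out[i])
        end
    end

    # Pass 2: fold the result of everything before the chunk into the chunk
    for c in 2:nchunks
        lo, hi = bounds[c] + 1, bounds[c+1]
        # Skip empty chunks and chunks with nothing before them
        (lo > hi || lo == 1) && continue
        prefix = out[lo-1]  # already holds the full scan up to index lo-1
        for i in lo:hi
            # Earlier elements must be the LEFT operand
            out[i] = op(prefix, out[i])
        end
    end
    return out
end

# String concatenation (`*` on strings) is associative but not commutative
words = ["a", "b", "c", "d", "e"]
println(chunked_accumulate(*, words, 2))  # ["a", "ab", "abc", "abcd", "abcde"]
```

Writing `op(out[i], prefix)` in the second pass instead would give `"cab"` at index 3 in this example, while the same swap is undetectable with `+`.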