Refactor DeviceReduce dispatch logic by bernhardmgruber · Pull Request #9088 · NVIDIA/cccl

bernhardmgruber · 2026-05-21T09:22:52Z

No description provided.

coderabbitai · 2026-05-21T09:25:39Z

📝 Walkthrough

Summary by CodeRabbit

Refactor
- Reworked internal device reduction logic to centralize min/max handling, standardize determinism fallbacks for sum/min/max operations, and simplify dispatch paths.
- No public API changes; reduction results are more consistent, and reduction operations behave more reliably under different environment settings.

suggestion:

Walkthrough

This PR refactors determinism dispatch for DeviceReduce: reduce_impl now uses compile-time determinism branches that call detail::dispatch_with_env_and_tuning; determinism fallback computation was moved into __transform_reduce; min/max behavior was centralized into __minmax_reduce; Sum/Min/Max env overloads now delegate to those helpers.
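As a rough illustration of the compile-time branching described in the walkthrough, the following host-only sketch selects a dispatch path with `if constexpr` on a determinism tag type. All names here (`gpu_to_gpu_t`, `run_to_run_t`, `not_guaranteed_t`, `dispatch_path`, `select_path`) are hypothetical stand-ins and not the actual CUB declarations; the real `reduce_impl` forwards to `detail::dispatch_with_env_and_tuning` rather than returning an enum.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for the determinism levels named above.
struct gpu_to_gpu_t {};
struct run_to_run_t {};
struct not_guaranteed_t {};

// Hypothetical marker for which dispatch routine would be called.
enum class dispatch_path { gpu_to_gpu, not_guaranteed, run_to_run };

// Sketch of the branch structure: one explicit branch per special tag,
// and a default branch for everything else.
template <class DeterminismT>
constexpr dispatch_path select_path(DeterminismT)
{
  if constexpr (std::is_same_v<DeterminismT, gpu_to_gpu_t>)
  {
    return dispatch_path::gpu_to_gpu;
  }
  else if constexpr (std::is_same_v<DeterminismT, not_guaranteed_t>)
  {
    return dispatch_path::not_guaranteed;
  }
  else
  {
    return dispatch_path::run_to_run; // default branch
  }
}

// The branch is resolved at compile time, so it can be checked statically.
static_assert(select_path(gpu_to_gpu_t{}) == dispatch_path::gpu_to_gpu);
static_assert(select_path(not_guaranteed_t{}) == dispatch_path::not_guaranteed);
static_assert(select_path(run_to_run_t{}) == dispatch_path::run_to_run);
```

Because only the taken branch is instantiated, a dispatch routine that would not compile for a given type combination is never instantiated unless its tag is actually selected.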

Changes

Device reduce determinism dispatch refactoring

| Layer / File(s) | Summary |
|---|---|
| reduce_impl determinism dispatch `cub/cub/device/device_reduce.cuh` | `reduce_impl` dispatch switched from lambda-based tuning/query to explicit `if constexpr` branches on determinism tags (`gpu_to_gpu`, `not_guaranteed`, default), each calling the corresponding dispatch routine via `detail::dispatch_with_env_and_tuning`. |
| Transform-reduce determinism handling `cub/cub/device/device_reduce.cuh` | `__transform_reduce` now computes `default_determinism_t` from environment requirements, defines determinism-dependent fallback predicates for integral types and float/double `plus`/min/max operators, and refines the "4B or greater" condition to use `sizeof(AccumT)`. |
| Min/max reduce helper introduction `cub/cub/device/device_reduce.cuh` | New private `__minmax_reduce` helper centralizes min/max env determinism handling: rejects `gpu_to_gpu`, forces `run_to_run`, computes output limits, and forwards to `reduce_impl` with `identity` transform. |
| Public API rewiring `cub/cub/device/device_reduce.cuh` | `Sum` env overload simplified to call `__transform_reduce` with `plus`/`identity`. `Min` and `Max` env overloads compute `OutputT` and limits, then delegate to `__minmax_reduce`, removing duplicated determinism bodies. |
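The min/max handling summarized in the table can be pictured with a small host-only sketch. It assumes that "rejects `gpu_to_gpu`" means a compile-time error and that "output limits" means the usual initial values (largest value for a min reduction, lowest value for a max reduction). The names `minmax_determinism`, `min_init`, and `max_init` are hypothetical and do not appear in the PR.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical tag types standing in for the determinism levels.
struct gpu_to_gpu_t {};
struct run_to_run_t {};
struct not_guaranteed_t {};

// Sketch: reject gpu_to_gpu at compile time and map every other request
// to run_to_run.
template <class RequestedT>
struct minmax_determinism
{
  static_assert(!std::is_same_v<RequestedT, gpu_to_gpu_t>,
                "gpu_to_gpu determinism is rejected for min/max in this sketch");
  using type = run_to_run_t;
};

template <class RequestedT>
using minmax_determinism_t = typename minmax_determinism<RequestedT>::type;

// Assumed meaning of "output limits": the starting value of each reduction.
template <class OutputT>
constexpr OutputT min_init()
{
  return std::numeric_limits<OutputT>::max(); // any input is <= this value
}

template <class OutputT>
constexpr OutputT max_init()
{
  return std::numeric_limits<OutputT>::lowest(); // any input is >= this value
}

// A not_guaranteed request is upgraded to run_to_run.
static_assert(std::is_same_v<minmax_determinism_t<not_guaranteed_t>, run_to_run_t>);
static_assert(min_init<int>() == std::numeric_limits<int>::max());
static_assert(max_init<float>() == std::numeric_limits<float>::lowest());
```

Naming `minmax_determinism_t<gpu_to_gpu_t>` would trigger the `static_assert`, which is one way to turn an unsupported request into a readable compile-time diagnostic.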

Suggested reviewers

shwina
pauleonix

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 306f06d3-aefb-42db-b33e-5527015ec795

📥 Commits

Reviewing files that changed from the base of the PR and between 8b47911 and b295493.

📒 Files selected for processing (1)

cub/cub/device/device_reduce.cuh

coderabbitai

🧹 Nitpick comments (1)

cub/cub/device/device_reduce.cuh (1)

114-161: important: This changes dispatch and determinism selection on a Device* algorithm path. Please do the required before/after SASS comparison and run the CUB benchmarks before merge; otherwise codegen and throughput regressions in the new dispatch split stay unvalidated.

As per coding guidelines, **/cub/**/device*.{cu,cuh,h}: Verify no SASS code generation changes occur for Device* algorithms in CUB by comparing generated SASS output before and after changes; Run benchmark tests using the CUB Benchmarks framework when modifying Device* algorithms to verify no performance regressions occur.

Also applies to: 173-240, 244-270, 549-551

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6fde9ca1-11b8-4a40-aa81-d6719380fcf5

📥 Commits

Reviewing files that changed from the base of the PR and between b295493 and 96c47e0.

📒 Files selected for processing (1)

cub/cub/device/device_reduce.cuh

github-actions · 2026-05-21T14:44:01Z

🥳 CI Workflow Results

🟩 Finished in 2h 33m: Pass: 100%/285 | Total: 11d 04h | Max: 2h 32m | Hits: 21%/914257

tpn

Reviewed the DeviceReduce dispatch refactor. No blocking issues from me; the CI summary is green.

NaderAlAwar · 2026-05-22T13:06:31Z

-    return reduce_impl(
-      d_in, d_out, num_items, ::cuda::std::plus<>{}, ::cuda::std::identity{}, InitT{}, determinism_t{}, env);
+    using accum_t = ::cuda::std::__accumulator_t<::cuda::std::plus<>, cub::detail::it_value_t<InputIteratorT>, OutputT>;
+    return __transform_reduce<accum_t>(


Critical: this breaks Sum(uint8_t*, uint8_t*, ...), which no longer compiles. Prior to this change, we fell back to run-to-run determinism for output types smaller than 4 bytes, but __transform_reduce checks AccumT instead of OutputT. For uint8_t, AccumT is int due to integer promotion, so the kernel attempts atomicAdd(unsigned char*, int), which doesn't compile.

This issue already existed for Reduce() before this PR, but Sum had its own detection logic.
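To make the promotion issue concrete, here is a host-only sketch of why a size check on the accumulator type passes for `uint8_t` while the same check on the output type does not. `accum_sketch_t` is a simplified, hypothetical stand-in for `::cuda::std::__accumulator_t` (it ignores the separate init/output type), and `is_4b_or_greater_sketch` only mirrors the idea of the "4B or greater" predicate; neither is the actual CUB code. The last assertion assumes a platform where `int` is at least 4 bytes.

```cpp
#include <cstdint>
#include <functional>
#include <type_traits>

// Simplified stand-in for the accumulator type: the decayed result of
// applying plus<> to two values of the input type.
template <class T>
using accum_sketch_t = std::decay_t<std::invoke_result_t<std::plus<>, T, T>>;

// Sketch of a "4 bytes or greater" predicate.
template <class T>
constexpr bool is_4b_or_greater_sketch = sizeof(T) >= 4;

// Integer promotion: adding two uint8_t values yields an int.
static_assert(std::is_same_v<accum_sketch_t<std::uint8_t>, int>);

// Checking the output type rejects uint8_t ...
static_assert(!is_4b_or_greater_sketch<std::uint8_t>);

// ... but checking the accumulator type accepts it (assuming 32-bit int),
// even though the output pointer still refers to a 1-byte type.
static_assert(is_4b_or_greater_sketch<accum_sketch_t<std::uint8_t>>);
```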

Thinking about this some more, it is probably okay to merge this PR, and I'll create a fix for __transform_reduce to check OutputT instead of AccumT. You can drive-by fix it if you want, but the fix is slightly involved and I would like to add tests to verify the behavior. We basically need to check two things:

1. OutputT == AccumT

2. atomicAdd() is supported for that type. This is mostly handled by is_4b_or_greater, but it occurs to me now that there is no overload for long long.

I'm going to approve. Let me know if you prefer fixing it here; otherwise I can fix it after this gets merged.
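One possible shape for the two checks listed in this comment is sketched below as host-only C++. It is not the proposed fix: `has_atomic_add_sketch` and `can_use_atomic_path` are hypothetical names, and the type list is an assumption covering only the common scalar `atomicAdd` overloads (`int`, `unsigned int`, `unsigned long long`, `float`, and `double`, the last of which needs compute capability 6.0 or newer). Half-precision and vector overloads are left out.

```cpp
#include <type_traits>

// Assumption: scalar types with a native CUDA atomicAdd overload.
// Signed long long is deliberately absent, matching the remark above.
template <class T>
constexpr bool has_atomic_add_sketch =
  std::is_same_v<T, int> || std::is_same_v<T, unsigned int> ||
  std::is_same_v<T, unsigned long long> || std::is_same_v<T, float> ||
  std::is_same_v<T, double>;

// Both conditions must hold: the output and accumulator types match, and
// that type has an atomicAdd overload.
template <class OutputT, class AccumT>
constexpr bool can_use_atomic_path =
  std::is_same_v<OutputT, AccumT> && has_atomic_add_sketch<OutputT>;

// uint8_t output with an int accumulator: types differ, so fall back.
static_assert(!can_use_atomic_path<unsigned char, int>);

// float output with a float accumulator: both checks pass.
static_assert(can_use_atomic_path<float, float>);

// long long passes a size check but has no atomicAdd overload.
static_assert(!can_use_atomic_path<long long, long long>);
```

Listing the supported types explicitly, rather than relying on a size check alone, is what catches the `long long` case.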

bernhardmgruber requested a review from a team as a code owner May 21, 2026 09:22

bernhardmgruber requested a review from fbusato May 21, 2026 09:22

github-project-automation Bot added this to CCCL May 21, 2026

github-project-automation Bot moved this to Todo in CCCL May 21, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 21, 2026

coderabbitai Bot reviewed May 21, 2026

bernhardmgruber added 3 commits May 21, 2026 14:08

Refactor DeviceReduce dispatch logic

8b40372

Refactor DeviceReduce dispatch logic

fe6dc9b

Review

96c47e0

bernhardmgruber force-pushed the ref_reduce branch from b295493 to 96c47e0 Compare May 21, 2026 12:08

coderabbitai Bot reviewed May 21, 2026

tpn approved these changes May 21, 2026

NaderAlAwar approved these changes May 22, 2026

bernhardmgruber wants to merge 3 commits into NVIDIA:main from bernhardmgruber:ref_reduce

3 participants
