Skip to content

Use block-scoped atomics in BlockHistogramAtomic on SM 60+#8162

Open
NitishNaineni wants to merge 1 commit intoNVIDIA:mainfrom
NitishNaineni:histogram-atomic-scope
Open

Use block-scoped atomics in BlockHistogramAtomic on SM 60+#8162
NitishNaineni wants to merge 1 commit intoNVIDIA:mainfrom
NitishNaineni:histogram-atomic-scope

Conversation

@NitishNaineni
Copy link

@NitishNaineni NitishNaineni commented Mar 25, 2026

Closes #3357

Summary

  • Replaces atomicAdd with atomicAdd_block in BlockHistogramAtomic::Composite on SM 60+ using NV_IF_TARGET, with fallback to atomicAdd on older architectures.
  • This matches the pattern already used in agent_histogram.cuh (lines 312, 325, 350).

Codegen verification

SASS confirms the scope change on SM 86 (RTX 3080):

  • Before: RED.E.ADD.STRONG.GPU
  • After: RED.E.ADD.STRONG.SM

Performance was identical on Ampere in my benchmarks (1024 threads × 16 items, per-block global histograms, various bin counts and grid sizes).

Test plan

  • cub.test.block.histogram.bins_32 — all 1512 assertions passed
  • cub.test.block.histogram.bins_256 — all 1512 assertions passed
  • cub.test.block.histogram.bins_1024 — all 1512 assertions passed

@NitishNaineni NitishNaineni requested a review from a team as a code owner March 25, 2026 01:02
@github-project-automation github-project-automation bot moved this to Todo in CCCL Mar 25, 2026
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Mar 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment on lines +56 to +58
NV_IF_TARGET(NV_PROVIDES_SM_60,
(atomicAdd_block(histogram + items[i], 1);),
(atomicAdd(histogram + items[i], 1);));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only suppport SM75 and up, so this should just be

Suggested change
NV_IF_TARGET(NV_PROVIDES_SM_60,
(atomicAdd_block(histogram + items[i], 1);),
(atomicAdd(histogram + items[i], 1);));
atomicAdd_block(histogram + items[i], 1);

Copy link
Contributor

@bernhardmgruber bernhardmgruber Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do accept PRs that add/maintain support for older architectures though. So this change is fine. But we are also fine with just the version that @miscco suggested.

@NitishNaineni I leave the choice to you whether you maintain the fallback for below SM60 or not.

@github-project-automation github-project-automation bot moved this from Todo to In Progress in CCCL Mar 25, 2026
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

[FEA]: Reduce scope of histogram atomics

3 participants