[cub] Fix compilation warnings with `nvc++` by davebayer · Pull Request #9068 · NVIDIA/cccl

davebayer · 2026-05-19T08:02:26Z

This PR fixes compilation warnings emitted during CUB compilation with nvc++.

coderabbitai · 2026-05-19T08:12:38Z

📝 Walkthrough

Summary by CodeRabbit

Bug Fixes
- Enhanced compatibility with the NVHPC CUDA compiler for atomic operations and mathematical functions.
- Improved device/host code dispatch for atomic operations and memory operations.
Refactor
- Optimized compiler-specific code paths for better cross-compiler support and reliability.

Walkthrough

This PR applies NVHPC compiler workarounds for atomicAdd_block and __umul64hi by using fallback implementations, migrates ten atomic dispatcher functions to the NV_IF_ELSE_TARGET pattern for consistent device/host selection, and adjusts test storage declarations and inline assembly constraints.

Changes

NVHPC Compiler Workarounds

Layer / File(s)	Summary
Histogram accumulation atomic workaround `cub/cub/agent/agent_histogram.cuh`	Three accumulation paths (RLE per-run, RLE final-pixel, non-RLE per-pixel) now use plain `atomicAdd` under NVHPC, bypassing `atomicAdd_block` to work around a compiler issue.
64-bit multiply-high inline PTX `cub/cub/detail/fast_modulo_division.cuh`, `libcudacxx/include/cuda/__cmath/mul_hi.h`	NVHPC builds use inline PTX (`mul.hi.u64`) instead of `__umul64hi` for 64-bit high-word multiplication; non-NVHPC continues using the intrinsic.
Complex literal API decoration for NVHPC `libcudacxx/include/cuda/std/__complex/literals.h`	Long-double complex literal overloads conditionally omit `_CCCL_API` decoration when compiling with NVHPC to avoid hard errors in device code.

Atomic Dispatcher Migration to NV_IF_ELSE_TARGET

Layer / File(s)	Summary
Fence and signal fence dispatchers `libcudacxx/include/cuda/std/__atomic/types/base.h`	`__atomic_thread_fence_dispatch` and `__atomic_signal_fence_dispatch` migrate to `NV_IF_ELSE_TARGET` for device/host routing.
Store, load, and exchange dispatchers `libcudacxx/include/cuda/std/__atomic/types/base.h`	`__atomic_store/load/exchange_dispatch` functions switch to `NV_IF_ELSE_TARGET` with updated return/flow control.
Compare-exchange dispatchers `libcudacxx/include/cuda/std/__atomic/types/base.h`	Strong and weak compare-exchange dispatchers migrate to `NV_IF_ELSE_TARGET` with `__result` assigned via wrapped expressions.
Fetch-style operation dispatchers `libcudacxx/include/cuda/std/__atomic/types/base.h`	Fetch-add, fetch-sub, fetch-and/or/xor, and min/max dispatchers convert to `NV_IF_ELSE_TARGET` with device branches returning through expressions.
Memory comparison dispatcher `libcudacxx/include/cuda/std/__atomic/types/common.h`	`__atomic_memcmp` refactors to `NV_IF_ELSE_TARGET`, wrapping device-side bytewise comparison in a statement-expression.

Test and Implementation Adjustments

Layer / File(s)	Summary
Storage duration adjustments `cub/test/catch2_test_device_exclusive_scan_noncommutative.cu`, `cub/test/catch2_test_device_segmented_scan_multi_segment.cu`, `cub/test/catch2_test_device_segmented_scan_noncommutative.cu`	Lookup table declarations change from `static constexpr` to `constexpr`, removing static storage while preserving const-initialization.
Host/device qualification cleanup `cub/test/catch2_test_device_partition_flagged.cu`	`__host__ __device__` qualifiers removed from `operator<<` friend declaration in test helper type.
Inline assembly and control flow `cub/test/catch2_test_device_transform.cu`, `cub/cub/block/block_load_to_shared.cuh`	Shared-memory references in kernel inline-assembly use input-only constraints; `__try_wait()` adds `_CCCL_UNREACHABLE()` after dispatcher.
Type casting in floating-point test `cub/test/internal/catch2_test_integer_utils.cu`	Integer operation result explicitly cast to floating-point type `T` before comparison in assertion.

Suggested reviewers

pciolkosz
ericniebler
bernhardmgruber

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

libcudacxx/include/cuda/std/__complex/literals.h (1)

49-53: ⚡ Quick win

suggestion: Use _CCCL_HOST_API on the NVHPC branch instead of leaving these overloads undecorated.
This workaround clearly wants host-only literals under NVHPC, but dropping the API macro entirely makes the availability of these public overloads depend on compiler defaults. Switching the NVHPC branch to _CCCL_HOST_API keeps the intent explicit and preserves the normal libcudacxx annotation contract. As per coding guidelines, "ensure functions are properly annotated with _CCCL_HOST_API, _CCCL_DEVICE_API, or _CCCL_API".

Also applies to: 58-62, 67-71, 81-85

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6f58e9e1-4468-4f3c-9fd3-231eb8cb0750

📥 Commits

Reviewing files that changed from the base of the PR and between 96e580b and eb2cfbc.

📒 Files selected for processing (13)

cub/cub/agent/agent_histogram.cuh
cub/cub/block/block_load_to_shared.cuh
cub/cub/detail/fast_modulo_division.cuh
cub/test/catch2_test_device_exclusive_scan_noncommutative.cu
cub/test/catch2_test_device_partition_flagged.cu
cub/test/catch2_test_device_segmented_scan_multi_segment.cu
cub/test/catch2_test_device_segmented_scan_noncommutative.cu
cub/test/catch2_test_device_transform.cu
cub/test/internal/catch2_test_integer_utils.cu
libcudacxx/include/cuda/__cmath/mul_hi.h
libcudacxx/include/cuda/std/__atomic/types/base.h
libcudacxx/include/cuda/std/__atomic/types/common.h
libcudacxx/include/cuda/std/__complex/literals.h

miscco · 2026-05-19T09:32:36Z

    NV_IS_HOST,
      (return static_cast<unsigned_t>((static_cast<larger_t>(value) * multiplier) >> NumBits);),
      ({return (sizeof(T) == 8)
+        // nvc++ doesn't implement __umul64hi and crashes, so we replace it with inline PTX
+#if _CCCL_CUDA_COMPILER(NVHPC)
+        ? [value, multiplier](){
+          unsigned long long result;
+          asm("mul.hi.u64 %0, %1, %2;" : "=l"(result) : "l"(value), "l"(multiplier));
+          return static_cast<unsigned_t>(result);
+        }()
+#else // ^^^ _CCCL_CUDA_COMPILER(NVHPC) ^^^ / vvv !_CCCL_CUDA_COMPILER(NVHPC) vvv
        ? static_cast<unsigned_t>(__umul64hi(value, multiplier))
+#endif // ^^^ !_CCCL_CUDA_COMPILER(NVHPC) ^^^
        : static_cast<unsigned_t>((static_cast<larger_t>(value) * multiplier) >> NumBits);}));


Critical: We use conditional compilation inside an NV_IF_TARGET. you need to wrap outside of it

bernhardmgruber · 2026-05-19T09:38:29Z

+  if (threadIdx.x == 0)
+  {
+    ssmem = 0;
+  }


Suggestion: can you please add a comment why this is necessary? And that it's because of nvc++.

bernhardmgruber · 2026-05-19T09:39:06Z

      {
        [[maybe_unused]] const auto __lhs1 = static_cast<unsigned long long>(__lhs);
        [[maybe_unused]] const auto __rhs1 = static_cast<unsigned long long>(__rhs);
+        // nvc++ doesn't implement __umul64hi and crashes, so we replace it with inline PTX


Suggestion: Consider filing a bug report

I've already told @dkolsen-pgi

bernhardmgruber · 2026-05-19T09:40:51Z

+            // nvc++ has a bug that it does not treat atomicAdd_block as a builtin and ptxas fails due to unresolved
+            // symbol.
+#if _CCCL_CUDA_COMPILER(NVHPC)
+            atomicAdd(privatized_histograms[ch] + bins[pixel], accumulator);


Q: Why hasn't this come up yet? Or: why hasn't this been reported by the nvc++ team yet? What change makes this a problem now? I want to understand where this is coming from.

Probably because they don't use histogram. And we don't use it anywhere else

github-actions · 2026-05-19T12:42:55Z

😬 CI Workflow Results

🟥 Finished in 4h 38m: Pass: 78%/341 | Total: 10d 10h | Max: 1h 40m | Hits: 32%/1718464

See results here.

jrhemstad

I'd like to discuss the motivation for this before adding additional complexity to CUB's source code.

[cub] Fix compilation with nvc++

eb2cfbc

davebayer requested review from a team as code owners May 19, 2026 08:02

github-project-automation Bot added this to CCCL May 19, 2026

davebayer requested a review from ericniebler May 19, 2026 08:02

github-project-automation Bot moved this to Todo in CCCL May 19, 2026

davebayer requested a review from srinivasyadav18 May 19, 2026 08:02

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 19, 2026

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

miscco requested changes May 19, 2026

View reviewed changes

github-project-automation Bot moved this from In Review to In Progress in CCCL May 19, 2026

bernhardmgruber reviewed May 19, 2026

View reviewed changes

jrhemstad requested changes May 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[cub] Fix compilation warnings with `nvc++`#9068

[cub] Fix compilation warnings with `nvc++`#9068
davebayer wants to merge 1 commit into
NVIDIA:mainfrom
davebayer:fix_nvhpc_cub_warnings

davebayer commented May 19, 2026

Uh oh!

coderabbitai Bot commented May 19, 2026

Summary by CodeRabbit

Walkthrough

Changes

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

miscco May 19, 2026

Uh oh!

bernhardmgruber May 19, 2026

Uh oh!

bernhardmgruber May 19, 2026

Uh oh!

davebayer May 19, 2026

Uh oh!

bernhardmgruber May 19, 2026

Uh oh!

davebayer May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

jrhemstad left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

davebayer commented May 19, 2026

Uh oh!

coderabbitai Bot commented May 19, 2026

Summary by CodeRabbit

Walkthrough

Changes

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

miscco May 19, 2026

Choose a reason for hiding this comment

Uh oh!

bernhardmgruber May 19, 2026

Choose a reason for hiding this comment

Uh oh!

bernhardmgruber May 19, 2026

Choose a reason for hiding this comment

Uh oh!

davebayer May 19, 2026

Choose a reason for hiding this comment

Uh oh!

bernhardmgruber May 19, 2026

Choose a reason for hiding this comment

Uh oh!

davebayer May 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 19, 2026

😬 CI Workflow Results

🟥 Finished in 4h 38m: Pass: 78%/341 | Total: 10d 10h | Max: 1h 40m | Hits: 32%/1718464

Uh oh!

jrhemstad left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants