[cub] Replace cub parameter framework with cuda::argument#9074
pciolkosz wants to merge 22 commits into
📝 Walkthrough
Summary by CodeRabbit
Walkthrough
This PR introduces a new CUDA argument wrapper system with static and runtime bounds validation, then refactors CUB's batched top-K dispatch, kernel, and agent layers to use unified
Changes
CUDA Argument Infrastructure
CUB Batched Top-K Refactor
Actionable comments posted: 2
🧹 Nitpick comments (3)
cub/cub/detail/segmented_params.cuh (1)
31-43: 💤 Low value
suggestion: Missing `[[nodiscard]]` on `get_param` overloads. Per coding guidelines, most functions with non-void return should have this attribute.

     _CCCL_TEMPLATE(class _Tp)
     _CCCL_REQUIRES((!::cuda::argument::__is_wrapper_v<::cuda::std::remove_cv_t<::cuda::std::remove_reference_t<_Tp>>>) )
    -_CCCL_HOST_DEVICE constexpr auto get_param(_Tp&& __arg, [[maybe_unused]] size_t __index) noexcept
    +[[nodiscard]] _CCCL_HOST_DEVICE constexpr auto get_param(_Tp&& __arg, [[maybe_unused]] size_t __index) noexcept

Same applies to the other `get_param` overloads on lines 46-47, 53-54, 67-68, 74-75. As per coding guidelines, most functions with a non-void return type should use `[[nodiscard]]`.

cub/cub/device/dispatch/dispatch_batched_topk.cuh (1)
51-66: 💤 Low value
suggestion: Both `wrap_select_direction` overloads return non-void and should have `[[nodiscard]]`.

    -_CCCL_HOST_DEVICE inline auto wrap_select_direction(detail::topk::select dir)
    +[[nodiscard]] _CCCL_HOST_DEVICE inline auto wrap_select_direction(detail::topk::select dir)

    -_CCCL_HOST_DEVICE auto wrap_select_direction(IteratorT iter)
    +[[nodiscard]] _CCCL_HOST_DEVICE auto wrap_select_direction(IteratorT iter)

libcudacxx/include/cuda/__argument/argument_bounds.h (1)
103-113: ⚡ Quick win
suggestion: Complete Doxygen tags for the documented `__bounds` overloads. The documented non-void factory functions currently only provide `//! @brief`; add `//! @param` for each parameter and `//! @return` for both overloads to satisfy header documentation requirements.

As per coding guidelines: "When a function is documented with Doxygen, it must include: `//! @brief`, `//! @param[in/out/in,out]` for every parameter, and `//! @return` for non-void functions."
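To make the last two nitpicks concrete, here is a minimal sketch of a factory function that carries both `[[nodiscard]]` and the full set of Doxygen tags the guideline asks for. The names `runtime_bounds_sketch` and `make_bounds_sketch` are hypothetical and only illustrate the shape; they are not the PR's `__bounds` overloads.

```cpp
// Hypothetical stand-in for a runtime bounds type; not the PR's actual type.
template <class T>
struct runtime_bounds_sketch
{
  T lowest;
  T max;
};

//! @brief Creates a closed runtime interval from two limits.
//! @param[in] lowest Smallest value the argument may take.
//! @param[in] max Largest value the argument may take.
//! @return A runtime_bounds_sketch holding both limits.
template <class T>
[[nodiscard]] constexpr runtime_bounds_sketch<T> make_bounds_sketch(T lowest, T max) noexcept
{
  return runtime_bounds_sketch<T>{lowest, max};
}
```

With `[[nodiscard]]` in place, a call whose result is discarded is expected to produce a compiler warning, which is the point of the guideline for value-returning factories.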
ℹ️ Review info
📒 Files selected for processing (20)
cub/benchmarks/bench/segmented_topk/fixed/keys.cu
cub/benchmarks/bench/segmented_topk/variable/keys.cu
cub/cub/agent/agent_batched_topk.cuh
cub/cub/detail/segmented_params.cuh
cub/cub/device/dispatch/dispatch_batched_topk.cuh
cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh
cub/test/catch2_test_device_segmented_topk_keys.cu
cub/test/catch2_test_device_segmented_topk_pairs.cu
libcudacxx/include/cuda/__argument/argument.h
libcudacxx/include/cuda/__argument/argument_bounds.h
libcudacxx/include/cuda/argument
libcudacxx/include/cuda/std/__internal/namespaces.h
libcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpp
libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp
libcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpp
libcudacxx/test/libcudacxx/cuda/argument/dynamic_argument.pass.cpp
libcudacxx/test/libcudacxx/cuda/argument/static_argument.pass.cpp
libcudacxx/test/libcudacxx/cuda/argument/static_bounds_conversion.fail.cpp
libcudacxx/test/libcudacxx/cuda/argument/usage_example.pass.cpp
libcudacxx/test/support/test_macros.h
    template <auto _Lowest, auto _Max>
    _CCCL_API constexpr __immediate(_Arg __arg, __static_bounds<_Lowest, _Max>) noexcept
        : arg{::cuda::std::move(__arg)}
    {
      __validate_bounds();
      __validate_value();
    }
important: These constructors accept a __static_bounds<_Lowest, _Max> argument but validate against the class template parameter _StaticBounds, so explicitly-instantiated types can silently ignore the bounds token passed at construction. Add a compile-time constraint (for example, static_assert(::cuda::std::is_same_v<_StaticBounds, __static_bounds<_Lowest, _Max>>) or a requires clause) so construction fails when the token and _StaticBounds disagree.
Also applies to: 294-302
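The `static_assert` variant of this suggestion can be sketched in plain C++17 as follows. The types `static_bounds` and `immediate` are simplified, hypothetical stand-ins for `__static_bounds` and `__immediate`; the real classes carry validation logic that is omitted here.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the PR's __static_bounds tag type.
template <auto Lowest, auto Max>
struct static_bounds
{
  static constexpr auto lowest = Lowest;
  static constexpr auto max    = Max;
};

// Hypothetical stand-in for __immediate: a value plus class-level bounds.
template <class Arg, class StaticBounds>
struct immediate
{
  Arg arg;

  template <auto Lowest, auto Max>
  constexpr immediate(Arg a, static_bounds<Lowest, Max>) noexcept
      : arg{std::move(a)}
  {
    // Reject a token that disagrees with the class-level bounds.
    static_assert(std::is_same_v<StaticBounds, static_bounds<Lowest, Max>>,
                  "bounds token must match the wrapper's StaticBounds");
  }
};

// Deduction guide: when the wrapper type is not spelled out, the token
// determines StaticBounds, so the assertion holds by construction.
template <class Arg, auto Lowest, auto Max>
immediate(Arg, static_bounds<Lowest, Max>) -> immediate<Arg, static_bounds<Lowest, Max>>;
```

In this sketch, `immediate<int, static_bounds<0, 10>>{5, static_bounds<0, 5>{}}` no longer compiles, while deduced construction such as `immediate{7, static_bounds<1, 8>{}}` is unaffected.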
    template <auto _Lowest, auto _Max>
    _CCCL_API constexpr __deferred_base(_Arg __arg, __static_bounds<_Lowest, _Max>) noexcept
        : arg{::cuda::std::move(__arg)}
    {
      __validate_bounds_intersection<__element_type, _StaticBounds>(__runtime_bounds_);
    }
important: __deferred_base has the same bounds-token mismatch risk: constructor parameters carry __static_bounds<_Lowest, _Max> but all checks use _StaticBounds. This can make user-provided static bounds inert for explicitly-typed wrappers. Enforce _StaticBounds == __static_bounds<_Lowest, _Max> at compile time (or remove the redundant bounds parameter in favor of _StaticBounds-typed overloads).
Also applies to: 357-366
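The second option mentioned above, dropping the redundant template parameters and taking the class-level bounds type directly, might look like the sketch below. `bounds_token` and `deferred_wrapper` are hypothetical names, and the runtime-bounds intersection check from the real `__deferred_base` is left out.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the PR's __static_bounds tag type.
template <auto Lowest, auto Max>
struct bounds_token
{};

// Hypothetical stand-in for __deferred_base with a StaticBounds-typed constructor.
template <class Arg, class StaticBounds>
struct deferred_wrapper
{
  Arg arg;

  // The token parameter is the class-level bounds type itself, so a
  // mismatched token is rejected by overload resolution.
  constexpr deferred_wrapper(Arg a, StaticBounds) noexcept
      : arg{std::move(a)}
  {}
};
```

One design difference from the `static_assert` approach: here a mismatch is visible to traits such as `std::is_constructible_v`, so it can participate in SFINAE instead of producing a hard error inside the constructor body.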
😬 CI Workflow Results
🟥 Finished in 4h 13m: Pass: 94%/341 | Total: 10d 11h | Max: 4h 13m | Hits: 41%/1754003
This PR replaces most of the functionality in `segmented_params.cuh` with the `cuda::argument` wrappers from #8875. It also contains the changes from #8875, since that PR is not merged yet.
Two things remain from the original implementation: the static dispatch over a bounded set of values, and `get_param`, which either gets an item from a sequence at a given index or returns a uniform value, depending on the argument. Both seemed a better fit for cub-specific functionality, but it's not set in stone.