Recover warp_shuffle original behavior (revert #8210) #8254
fbusato wants to merge 2 commits into NVIDIA:main
[[nodiscard]] _CCCL_DEVICE_API warp_shuffle_result<_Up> warp_shuffle_idx(
  const _Tp& __data, int __src_lane, uint32_t __lane_mask = 0xFFFFFFFF, ::cuda::std::integral_constant<int, _Width> = {})
{
  static_assert(::cuda::std::is_default_constructible_v<_Tp>, "_Tp must be default constructible");
question: Instead of removing these checks wholesale, should we add explicit exceptions for known types, together with a way for people to declare their own types as valid for use with these APIs?
As @fbusato explained to me, the problem is that a struct containing __half won't be trivially copyable. However, we do the same thing for cuda::std::bit_cast and no one has complained yet.
But I think we should keep at least the requirement on default constructibility.
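To make the point about composition concrete, here is a minimal host-only sketch. `half_like` and `pixel` are hypothetical stand-ins, not CUDA or CCCL types; the sketch assumes the problematic type has a user-provided copy constructor, which may or may not match how `__half` is defined in a given toolkit version.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a type that is bitwise-copyable in practice
// but whose copy constructor is user-provided (an assumption for this sketch).
struct half_like
{
  std::uint16_t bits;
  half_like() = default;
  half_like(const half_like& other) noexcept : bits(other.bits) {}
  half_like& operator=(const half_like&) = default;
};

// A user struct composed of such members inherits the trait failure.
struct pixel
{
  half_like r, g, b;
};

// The user-provided copy constructor makes the type non-trivially copyable...
static_assert(!std::is_trivially_copyable_v<half_like>);
// ...and the property propagates to any aggregate that contains it.
static_assert(!std::is_trivially_copyable_v<pixel>);
// Default constructibility is unaffected, so that check can stay.
static_assert(std::is_default_constructible_v<pixel>);
```

Under these assumptions a trivially-copyable check rejects `pixel`, while a default-constructible check alone still accepts it.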
I would second keeping the default-constructibility check, because it gives a much clearer error message than what the C++ compiler generates 5 lines below.
I'd recommend taking a look at what we did in cuCollections by offering a custom is_bitwise_comparable trait. By default, we use has_unique_object_representations<T>, but that is false for floating-point values due to NaNs. However, for the majority of use cases that doesn't matter, so we allow an escape hatch: specializing is_bitwise_comparable to opt in. When this situation arises, we emit a helpful diagnostic pointing people towards specializing is_bitwise_comparable.
We could do something similar here.
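A rough host-only sketch of what such an opt-in trait could look like for shuffles. Every name here (`is_bitwise_shuffleable`, `bitwise_copy`, `half_like`, `not_opted_in`) is hypothetical and exists only for illustration; none of it is an existing CCCL or cuCollections API, and `bitwise_copy` merely stands in for the byte-wise transfer a warp shuffle performs.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical opt-in trait: defaults to the strict standard checks.
template <class T>
struct is_bitwise_shuffleable
    : std::bool_constant<std::is_trivially_copyable_v<T> && std::is_default_constructible_v<T>>
{};

template <class T>
inline constexpr bool is_bitwise_shuffleable_v = is_bitwise_shuffleable<T>::value;

// Stand-in for a type that is safe to copy bitwise but fails the standard trait.
struct half_like
{
  std::uint16_t bits;
  half_like() = default;
  half_like(const half_like& other) noexcept : bits(other.bits) {}
  half_like& operator=(const half_like& other) noexcept { bits = other.bits; return *this; }
};

// Same shape, but nobody has vouched for it, so it stays rejected.
struct not_opted_in
{
  std::uint16_t bits;
  not_opted_in() = default;
  not_opted_in(const not_opted_in& other) noexcept : bits(other.bits) {}
};

// Escape hatch: the library (or a user) declares the type valid.
template <>
struct is_bitwise_shuffleable<half_like> : std::true_type
{};

// Stand-in for a shuffle: moves the object as raw bytes.
template <class T>
T bitwise_copy(const T& data)
{
  static_assert(is_bitwise_shuffleable_v<T>,
                "T is not known to be bitwise-copyable; specialize is_bitwise_shuffleable<T> to opt in");
  T out;
  // Casting to void* signals that the byte-wise copy is intentional.
  std::memcpy(static_cast<void*>(&out), static_cast<const void*>(&data), sizeof(T));
  return out;
}
```

With this shape, `bitwise_copy` accepts `int` and `half_like`, while instantiating it with `not_opted_in` fails at compile time with a message that names the trait to specialize.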
The idea is a bit invasive but nice. The problem affects other warp instructions as well, so this solution applies to all of them. We can specialize the new type trait for reduced-precision floating-point types and arrays of them.
I opened an RFE for the compiler (nvbug 5497120) a while ago. We can rely on the proposed solution until we get an official workaround.
The funny part is that has_unique_object_representations<T> reports __half and __nv_bfloat16 as having unique object representations, even though this is not the case.
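One possible explanation, sketched on the host with a hypothetical `half_bits` type (not the real `__half` definition): if a half-precision type stores its payload in a 16-bit integer member, the trait only sees the integer, not the floating-point semantics layered on top of it.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in: a half-precision value stored as raw 16-bit storage.
struct half_bits
{
  std::uint16_t bits;
};

// IEEE 754 binary16: bit 15 is the sign, so +0 is 0x0000 and -0 is 0x8000.
constexpr bool is_zero(half_bits h)
{
  return (h.bits & 0x7FFFu) == 0;
}

// The trait inspects the integer member and reports "unique"...
static_assert(std::has_unique_object_representations_v<half_bits>);
// ...whereas built-in floating-point types are reported as non-unique
// (as observed on mainstream compilers such as GCC and Clang).
static_assert(!std::has_unique_object_representations_v<float>);
// Yet two different bit patterns encode the same numeric value.
static_assert(is_zero({0x0000}) && is_zero({0x8000}));
```

This is only an illustration of the mechanism under the stated assumption about storage; the actual layout of `__half` and `__nv_bfloat16` should be checked against the toolkit headers.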
CI Workflow Results: Finished in 1h 49m. Pass: 100%/99 | Total: 2d 03h | Max: 1h 30m | Hits: 94%/257973
Description
#8210 enforces additional constraints on the types allowed in cuda::device::warp_shuffle_*, namely that they be default constructible and trivially copyable. While this is conceptually correct, the new constraints prevent using warp shuffle instructions with types that satisfy these properties in practice but for which the type traits fail, e.g. __half, __nv_bfloat16, other reduced-precision floating-point types, and compositions of them such as arrays and structures. Enforcing such constraints prevents using these types in many contexts, e.g. CUB.
This PR restores the original behavior until we find a reliable way to address the problem.