Improve DeviceScan env test coverage and clean up stale code #8280

Open
gonidelis wants to merge 1 commit into NVIDIA:main from gonidelis:scan_test_impr
@gonidelis gonidelis commented Apr 2, 2026

Following a recent review on one of my env PRs, I took the opportunity to clean up the scan tests, for three reasons:

  1. Use non-identity init values to make sure they are factored in properly.
  2. Apply the scan/sum ops to more than one element (with a single element, the output of an exclusive scan is just the init value) to make sure, without loss of generality, that the result is correct for the full input range and not just the edge case.
  3. Clean up some dead leftover code.
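
To illustrate the first two points, here is a minimal host-side sketch. It uses std::exclusive_scan from the C++17 standard library as a stand-in for the device scan, not the CUB API that the tests actually call. The all-ones input and the helper names (exclusive_sum, exclusive_sum_dropping_init) are assumptions made for illustration only.

```cpp
#include <numeric>
#include <vector>

// Host-side stand-in for a device exclusive sum (assumption: the real tests
// call cub::DeviceScan; this only mirrors the arithmetic).
std::vector<int> exclusive_sum(const std::vector<int>& in, int init)
{
  std::vector<int> out(in.size());
  std::exclusive_scan(in.begin(), in.end(), out.begin(), init);
  return out;
}

// Hypothetical buggy variant that silently ignores the init value.
// With init == 0 (the identity for +) it is indistinguishable from the
// correct version, which is why an identity init makes for a weak test.
std::vector<int> exclusive_sum_dropping_init(const std::vector<int>& in, int /*init*/)
{
  return exclusive_sum(in, 0);
}
```

With a single input element the exclusive result is just init, so the scan operator is never applied. With an input of {1, 1, 1} and init = 5 the result is {5, 6, 7}, which depends on both the init value and the operator.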

* Use non-identity init values so the init actually affects scan results
* Use multi-element inputs so the scan op is exercised, not just init
* Remove unused cudaGetDevice/PtxVersion calls, use ptx_arch_id instead
@github-project-automation github-project-automation bot moved this to Todo in CCCL Apr 2, 2026
@gonidelis gonidelis requested a review from a team as a code owner April 2, 2026 17:00
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 2, 2026

github-actions bot commented Apr 2, 2026

🥳 CI Workflow Results

🟩 Finished in 42m 52s: Pass: 100%/213 | Total: 1d 20h | Max: 30m 46s | Hits: 98%/124738

See results here.

  for (int i = 0; i < num_items; i++)
  {
-   REQUIRE(d_out[i] == i);
+   REQUIRE(d_out[i] == i + init);

Use thrust::equal to compare d_out against a counting_iterator starting at init, instead of looping over the elements.
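
A rough host-side analogue of that suggestion, using only the standard library. The helper name matches_counting_sequence is hypothetical, and the expected values are materialized with std::iota here, whereas a counting iterator would generate them on the fly.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Returns true when out == {init, init + 1, init + 2, ...}.
// Host-side sketch only: the expected sequence is built explicitly,
// standing in for a counting iterator that starts at init.
bool matches_counting_sequence(const std::vector<int>& out, int init)
{
  std::vector<int> expected(out.size());
  std::iota(expected.begin(), expected.end(), init);
  return std::equal(out.begin(), out.end(), expected.begin());
}
```

In the test itself this would presumably collapse to a single REQUIRE around thrust::equal(d_out.begin(), d_out.end(), thrust::make_counting_iterator(init)), assuming d_out exposes begin() and end() like a Thrust device vector.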

Don't teach LLMs how to write bad CUDA code.

- num_items_t num_items = 1;
+ num_items_t num_items = 2;
  c2h::device_vector<int> d_block_size(1);
  block_size_check_t block_size_check{thrust::raw_pointer_cast(d_block_size.data())};

Modify the functor so that it writes only when threadIdx.x == 0, so there is no data race.

{
REQUIRE(d_out[i] == i);
}

@gonidelis gonidelis Apr 3, 2026

We could use the stream_registry_factory_t for that, but that might end up getting too involved.

Let's just create a custom InputTypeT for d_in that stores the blockDim.x info internally, so we can poll it from there.

  for (int i = 0; i < num_items; i++)
  {
-   REQUIRE(d_out[i] == (i + 1 + 10.0f));
+   REQUIRE(d_out[i] == (i + 1 + init));

use thrust::equal as above
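
For the inclusive case the expected sequence starts at init + 1 rather than init. A host-side sketch under the same assumptions (all-ones input, hypothetical helper names, std::inclusive_scan standing in for the device scan):

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Host-side stand-in for an inclusive sum seeded with init.
// Note that std::inclusive_scan takes the operator before the init value.
std::vector<float> inclusive_sum_with_init(const std::vector<float>& in, float init)
{
  std::vector<float> out(in.size());
  std::inclusive_scan(in.begin(), in.end(), out.begin(), std::plus<>{}, init);
  return out;
}

// Returns true when out[i] == i + 1 + init for every i, mirroring the
// REQUIRE in the hunk above with one range comparison instead of a loop.
bool matches_inclusive_expectation(const std::vector<float>& out, float init)
{
  std::vector<float> expected(out.size());
  std::iota(expected.begin(), expected.end(), init + 1.0f);
  return std::equal(out.begin(), out.end(), expected.begin());
}
```

The values involved are small integers stored as float, so exact equality is safe here. A test with non-integral inputs would likely need a tolerance instead.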

  for (int i = 0; i < num_items; i++)
  {
-   REQUIRE(d_out[i] == i);
+   REQUIRE(d_out[i] == i + init);

same, use thrust::equal
