Asserting current device and CUB stream matches #9119

Open
thom-gg wants to merge 7 commits into NVIDIA:main from thom-gg:validate-device-and-stream-matches


@thom-gg thom-gg commented May 23, 2026

Description

closes #7782

This PR adds an assertion to every dispatch entry point to ensure that the current device matches the device of the CUB stream. The assertion is called at the very beginning of each dispatch function, hence the number of modified files.

The assertion itself uses cudaStreamGetDevice, which was introduced in CTK 12.8, so it is guarded by the macro _CCCL_CTK_AT_LEAST(12, 8).

I'm new to the project, so I'm unsure whether there is a better place to call the assertion than in every dispatch file, and whether the assertion belongs in cub/cub/util_device.cuh, where I put it, or elsewhere. Please tell me if this issue should be addressed differently and I'll try to do it!

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@thom-gg thom-gg requested a review from a team as a code owner May 23, 2026 15:52
@thom-gg thom-gg requested a review from fbusato May 23, 2026 15:52
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 23, 2026
Contributor

copy-pr-bot Bot commented May 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 23, 2026
Contributor

coderabbitai Bot commented May 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fc347435-9ca3-4ec6-b1e8-58763662dfc2

📥 Commits

Reviewing files that changed from the base of the PR and between 329bf57 and efd3b9a.

📒 Files selected for processing (4)
  • cub/cub/util_device.cuh
  • cub/test/catch2_test_device_for.cu
  • cub/test/catch2_test_device_for_api.cu
  • libcudacxx/include/cuda/__driver/driver_api.h

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Added upfront CUDA stream↔device validation across many device dispatch paths so invalid stream/device combinations are detected early and reported before work is launched.
  • New Features

    • Introduced a runtime stream/device validation utility and non-throwing driver-API wrappers to reliably surface validation results.
  • Tests

    • Added tests for cross-device stream behavior (both success and expected-failure cases).

important:

Walkthrough

Driver-API no-throw wrappers and cub::detail::validate_stream_device(cudaStream_t) were added; dispatch entrypoints now call it up-front and return early on mismatch; tests for cross-device stream behavior were added.

Changes

Stream-device validation layer

| Layer / File(s) | Summary |
| --- | --- |
| Core validation utility: cub/cub/util_device.cuh, libcudacxx/include/cuda/__driver/driver_api.h | Added cub::detail::validate_stream_device(cudaStream_t) and non-throwing driver wrappers (__ctxPushNoThrow, __ctxPopNoThrow, __ctxGetDeviceNoThrow, __streamGetCtxNoThrow). |
| Dispatch validation rollout: cub/cub/device/dispatch/*.cuh (adjacent_difference, batch_memcpy, batched_topk, find, for, histogram, merge, merge_sort, radix_sort, reduce, reduce_by_key, reduce_deterministic, reduce_nondeterministic, rle, scan, scan_by_key, segmented_radix_sort, segmented_reduce, segmented_scan, segmented_sort, select_if, three_way_partition, topk, transform, unique_by_key) | Inserted validate_stream_device(stream) at start of dispatch entrypoints and helper dispatch functions; each returns early on validation error before PTX/compute-capability queries, temporary-storage sizing, or kernel dispatch. |
| Tests: cub/test/catch2_test_device_for.cu, cub/test/catch2_test_device_for_api.cu | Added tests: one disables validation and exercises cross-device stream usage; another asserts cudaErrorInvalidDevice when calling ForEachN with a stream from a different device. |

Assessment against linked issues

Objective Addressed Explanation
Add stream-device validation to CUB dispatch functions [#7782]

Suggested reviewers

  • fbusato
  • bernhardmgruber
  • srinivasyadav18

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (16)
cub/cub/device/dispatch/dispatch_merge_sort.cuh (1)

406-406: ⚡ Quick win

suggestion: qualify validate_stream_device(stream) with its global namespace-qualified symbol (matching its declaration namespace) instead of using unqualified lookup.
As per coding guidelines "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 477-477

cub/cub/device/dispatch/dispatch_radix_sort.cuh (1)

1141-1141: ⚡ Quick win

suggestion: use the global namespace-qualified form of validate_stream_device(stream) at both dispatch entry points to satisfy the free-function qualification rule.
As per coding guidelines "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 1206-1206

cub/cub/device/dispatch/dispatch_reduce.cuh (1)

481-481: ⚡ Quick win

suggestion: qualify validate_stream_device(stream) from the global namespace in both locations rather than relying on unqualified lookup.
As per coding guidelines "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 754-754

cub/cub/device/dispatch/dispatch_reduce_by_key.cuh (1)

609-609: ⚡ Quick win

suggestion: switch both validate_stream_device(stream) calls to the fully qualified global-namespace symbol to comply with the free-function call rule.
As per coding guidelines "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 698-698

cub/cub/device/dispatch/dispatch_reduce_deterministic.cuh (1)

342-342: ⚡ Quick win

suggestion: qualify validate_stream_device(stream) from the global namespace instead of calling it unqualified.
As per coding guidelines "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

cub/cub/device/dispatch/dispatch_reduce_nondeterministic.cuh (1)

176-176: ⚡ Quick win

suggestion: call validate_stream_device(stream) via its global namespace-qualified symbol here.
As per coding guidelines "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

cub/cub/device/dispatch/dispatch_rle.cuh (1)

608-608: ⚡ Quick win

suggestion: make both validate_stream_device(stream) calls fully qualified from the global namespace to align with project call-qualification rules.
As per coding guidelines "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 666-666

cub/cub/device/dispatch/dispatch_scan.cuh (1)

865-865: ⚡ Quick win

suggestion: use the global namespace-qualified form for validate_stream_device(stream) in both locations rather than unqualified calls.
As per coding guidelines "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 933-933

cub/cub/device/dispatch/dispatch_scan_by_key.cuh (1)

599-599: ⚡ Quick win

suggestion: Qualify validate_stream_device from the global namespace in both dispatch entrypoints to match the repository call-style rule.

As per coding guidelines, "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 737-737

cub/cub/device/dispatch/dispatch_segmented_radix_sort.cuh (1)

620-620: ⚡ Quick win

suggestion: Use a globally qualified call for validate_stream_device at both insertion points to keep dispatch code aligned with repository qualification rules.

As per coding guidelines, "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 907-907

cub/cub/device/dispatch/dispatch_segmented_reduce.cuh (1)

424-424: ⚡ Quick win

suggestion: Fully qualify validate_stream_device from global scope in both dispatch paths for consistency with the project’s free-function call rule.

As per coding guidelines, "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 531-531

cub/cub/device/dispatch/dispatch_segmented_scan.cuh (1)

132-132: ⚡ Quick win

suggestion: Qualify validate_stream_device from the global namespace here to satisfy the repository’s free-function qualification requirement.

As per coding guidelines, "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

cub/cub/device/dispatch/dispatch_segmented_sort.cuh (1)

692-692: ⚡ Quick win

suggestion: Switch both validate_stream_device invocations to globally qualified form to match the enforced free-function qualification convention.

As per coding guidelines, "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 1285-1285

cub/cub/device/dispatch/dispatch_select_if.cuh (1)

846-846: ⚡ Quick win

suggestion: Apply global qualification to validate_stream_device in both dispatch entrypoints to comply with the project-wide free-function call convention.

As per coding guidelines, "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 1105-1105

cub/cub/device/dispatch/dispatch_three_way_partition.cuh (1)

367-367: ⚡ Quick win

suggestion: Use globally qualified validate_stream_device calls in both updated dispatch layers to align with the mandatory free-function qualification rule.

As per coding guidelines, "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".

Also applies to: 438-438

cub/cub/device/dispatch/dispatch_topk.cuh (1)

478-478: ⚡ Quick win

suggestion: Qualify validate_stream_device from global scope in this dispatch entrypoint to satisfy the repository free-function qualification rule.

As per coding guidelines, "All calls to free functions must be fully qualified from the global namespace, e.g. ::cuda::ceil_div, even when calling functions in the same namespace".


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5d183277-1a01-4666-9f1b-617c330bdabb

📥 Commits

Reviewing files that changed from the base of the PR and between c47f140 and e23c56c.

📒 Files selected for processing (26)
  • cub/cub/device/dispatch/dispatch_adjacent_difference.cuh
  • cub/cub/device/dispatch/dispatch_batch_memcpy.cuh
  • cub/cub/device/dispatch/dispatch_batched_topk.cuh
  • cub/cub/device/dispatch/dispatch_find.cuh
  • cub/cub/device/dispatch/dispatch_for.cuh
  • cub/cub/device/dispatch/dispatch_histogram.cuh
  • cub/cub/device/dispatch/dispatch_merge.cuh
  • cub/cub/device/dispatch/dispatch_merge_sort.cuh
  • cub/cub/device/dispatch/dispatch_radix_sort.cuh
  • cub/cub/device/dispatch/dispatch_reduce.cuh
  • cub/cub/device/dispatch/dispatch_reduce_by_key.cuh
  • cub/cub/device/dispatch/dispatch_reduce_deterministic.cuh
  • cub/cub/device/dispatch/dispatch_reduce_nondeterministic.cuh
  • cub/cub/device/dispatch/dispatch_rle.cuh
  • cub/cub/device/dispatch/dispatch_scan.cuh
  • cub/cub/device/dispatch/dispatch_scan_by_key.cuh
  • cub/cub/device/dispatch/dispatch_segmented_radix_sort.cuh
  • cub/cub/device/dispatch/dispatch_segmented_reduce.cuh
  • cub/cub/device/dispatch/dispatch_segmented_scan.cuh
  • cub/cub/device/dispatch/dispatch_segmented_sort.cuh
  • cub/cub/device/dispatch/dispatch_select_if.cuh
  • cub/cub/device/dispatch/dispatch_three_way_partition.cuh
  • cub/cub/device/dispatch/dispatch_topk.cuh
  • cub/cub/device/dispatch/dispatch_transform.cuh
  • cub/cub/device/dispatch/dispatch_unique_by_key.cuh
  • cub/cub/util_device.cuh

Comment thread cub/cub/device/dispatch/dispatch_adjacent_difference.cuh Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2d2b79d9-a47e-40c6-99d9-5d9e96a54b55

📥 Commits

Reviewing files that changed from the base of the PR and between e23c56c and a8d21f9.

📒 Files selected for processing (26)
  • cub/cub/device/dispatch/dispatch_adjacent_difference.cuh
  • cub/cub/device/dispatch/dispatch_batch_memcpy.cuh
  • cub/cub/device/dispatch/dispatch_batched_topk.cuh
  • cub/cub/device/dispatch/dispatch_find.cuh
  • cub/cub/device/dispatch/dispatch_for.cuh
  • cub/cub/device/dispatch/dispatch_histogram.cuh
  • cub/cub/device/dispatch/dispatch_merge.cuh
  • cub/cub/device/dispatch/dispatch_merge_sort.cuh
  • cub/cub/device/dispatch/dispatch_radix_sort.cuh
  • cub/cub/device/dispatch/dispatch_reduce.cuh
  • cub/cub/device/dispatch/dispatch_reduce_by_key.cuh
  • cub/cub/device/dispatch/dispatch_reduce_deterministic.cuh
  • cub/cub/device/dispatch/dispatch_reduce_nondeterministic.cuh
  • cub/cub/device/dispatch/dispatch_rle.cuh
  • cub/cub/device/dispatch/dispatch_scan.cuh
  • cub/cub/device/dispatch/dispatch_scan_by_key.cuh
  • cub/cub/device/dispatch/dispatch_segmented_radix_sort.cuh
  • cub/cub/device/dispatch/dispatch_segmented_reduce.cuh
  • cub/cub/device/dispatch/dispatch_segmented_scan.cuh
  • cub/cub/device/dispatch/dispatch_segmented_sort.cuh
  • cub/cub/device/dispatch/dispatch_select_if.cuh
  • cub/cub/device/dispatch/dispatch_three_way_partition.cuh
  • cub/cub/device/dispatch/dispatch_topk.cuh
  • cub/cub/device/dispatch/dispatch_transform.cuh
  • cub/cub/device/dispatch/dispatch_unique_by_key.cuh
  • cub/cub/util_device.cuh
✅ Files skipped from review due to trivial changes (1)
  • cub/cub/device/dispatch/dispatch_histogram.cuh

Comment thread cub/cub/device/dispatch/dispatch_scan_by_key.cuh
Comment thread cub/cub/device/dispatch/dispatch_segmented_reduce.cuh
Contributor

@bernhardmgruber bernhardmgruber left a comment

Thanks a lot for this contribution! Please add a unit test to at least one algorithm calling it with a stream that does not match the current device. This test must be written in a way that it also works if there is only one GPU/device in the system (just succeeding is fine I think). I can try it briefly on my machine where I have two GPUs.

Comment thread cub/cub/util_device.cuh Outdated
Comment on lines +467 to +471
error = cudaStreamGetDevice(stream, &streamDevice);
if (error != cudaSuccess)
{
return error;
}
Contributor

Suggestion: Let's not reuse the error variable:

Suggested change
- error = cudaStreamGetDevice(stream, &streamDevice);
- if (error != cudaSuccess)
- {
-   return error;
- }
+ if (const auto error = cudaStreamGetDevice(stream, &streamDevice); error != cudaSuccess)
+ {
+   return error;
+ }
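
The suggestion relies on the C++17 `if` statement with an initializer, which scopes each status value to its own check so it cannot leak into later code. A minimal sketch of the pattern follows. It uses hypothetical stand-ins (`status`, `query_stream_device`, `query_current_device`, `validate`) instead of the CUDA runtime calls, so it only illustrates the control flow and is not the code from this PR:

```cpp
#include <cassert>

// Hypothetical stand-in for cudaError_t; not the real CUDA type.
enum class status { success, invalid_value, invalid_device };

// Hypothetical query: reports the device that owns a stream handle.
// A negative handle plays the role of an invalid stream.
inline status query_stream_device(int stream_handle, int& device)
{
  if (stream_handle < 0)
  {
    return status::invalid_value;
  }
  device = stream_handle % 2; // pretend streams alternate between two devices
  return status::success;
}

// Hypothetical query: reports the current device (always device 0 here).
inline status query_current_device(int& device)
{
  device = 0;
  return status::success;
}

inline status validate(int stream_handle)
{
  int stream_device = -1;
  // `error` exists only inside this if statement, so it cannot be reused by accident.
  if (const auto error = query_stream_device(stream_handle, stream_device); error != status::success)
  {
    return error;
  }

  int current_device = -1;
  if (const auto error = query_current_device(current_device); error != status::success)
  {
    return error;
  }

  return current_device == stream_device ? status::success : status::invalid_device;
}
```

With these stand-ins, `validate(0)` succeeds, `validate(1)` reports a device mismatch, and `validate(-1)` forwards the failure of the first query.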

Comment thread cub/cub/util_device.cuh Outdated
Comment on lines +473 to +477
error = cudaGetDevice(&currentDevice);
if (error != cudaSuccess)
{
return error;
}
Contributor

Suggested change
- error = cudaGetDevice(&currentDevice);
- if (error != cudaSuccess)
- {
-   return error;
- }
+ if (const auto error = cudaGetDevice(&currentDevice); error != cudaSuccess)
+ {
+   return error;
+ }

Comment thread cub/cub/util_device.cuh Outdated
return cudaErrorInvalidDevice;
}
# endif // _CCCL_CTK_AT_LEAST(12,8)
return error;
Contributor

Suggested change
- return error;
+ return cudaSuccess;

Comment thread cub/cub/util_device.cuh Outdated
CUB_RUNTIME_FUNCTION _CCCL_FORCEINLINE cudaError_t validate_stream_device(cudaStream_t stream)
{
cudaError_t error = cudaSuccess;
# if _CCCL_CTK_AT_LEAST(12, 8)
Contributor

Important: sometimes users violate our API requirements, but their software ran fine for a long time. They would be upset if we suddenly enforce requirements, causing their software to break. Let's add a macro to disable this new feature:

Suggested change
- # if _CCCL_CTK_AT_LEAST(12, 8)
+ # if _CCCL_CTK_AT_LEAST(12, 8) && !defined(CCCL_DISABLE_STREAM_DEVICE_CHECK)

If possible, add a unit test that calls a simple algorithm like DeviceFor with a stream and a different current device and define the CCCL_DISABLE_STREAM_DEVICE_CHECK macro, to see whether the escape hatch works.
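
To illustrate how such an escape hatch behaves, here is a minimal sketch in plain C++. The macro name `CCCL_DISABLE_STREAM_DEVICE_CHECK` comes from the suggestion above. Everything else (`status`, `validate_devices`, `check_enabled`) is hypothetical, and the CTK version condition is left out for brevity:

```cpp
#include <cassert>

// Hypothetical stand-in for cudaError_t.
enum class status { success, invalid_device };

// The comparison is compiled only when the opt-out macro is absent.
inline status validate_devices(int current_device, int stream_device)
{
#if !defined(CCCL_DISABLE_STREAM_DEVICE_CHECK)
  if (current_device != stream_device)
  {
    return status::invalid_device;
  }
#else
  (void) current_device; // silence unused-parameter warnings
  (void) stream_device;
#endif
  return status::success;
}

// True when this translation unit was built with the check enabled.
constexpr bool check_enabled =
#if !defined(CCCL_DISABLE_STREAM_DEVICE_CHECK)
  true;
#else
  false;
#endif
```

Because the macro is evaluated by the preprocessor, it has to be defined before this code is seen in a translation unit, for example on the compiler command line or ahead of the first include.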

Comment thread cub/cub/util_device.cuh Outdated
Comment on lines +465 to +467
# if _CCCL_CTK_AT_LEAST(12, 8)
int streamDevice;
error = cudaStreamGetDevice(stream, &streamDevice);
Contributor

We can make this function work even before CTK 12.8 by using the CUDA Driver API. We already have this implemented for cuda::stream_ref. It should look like this:

{
  ::CUdevice current_device;
  if (const auto error = ::cuda::__driver::__ctxGetDeviceNoThrow(current_device); error != cudaSuccess)
  {
    return error;
  }

  ::CUcontext stream_ctx;
  if (const auto error = ::cuda::__driver::__streamGetCtxNoThrow(stream_ctx, stream); error != cudaSuccess)
  {
    return error;
  }

  if (const auto error = ::cuda::__driver::__ctxPushNoThrow(stream_ctx); error != cudaSuccess)
  {
    return error;
  }

  ::CUdevice stream_device;
  if (const auto error = ::cuda::__driver::__ctxGetDeviceNoThrow(stream_device); error != cudaSuccess)
  {
    return error;
  }

  if (const auto error = ::cuda::__driver::__ctxPopNoThrow(); error != cudaSuccess)
  {
    return error;
  }

  _CCCL_ASSERT(current_device == stream_device, "current device must match CUB stream device");
}

The only problem is that we need to add __meowNoThrow variants of all context-related driver APIs to <cuda/__driver/driver_api.h>.

If you don't feel comfortable doing this, I will make a follow up PR after this one is merged :)
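
One thing to watch in a push/query/pop sequence like the one above is that an early return between the push and the pop leaves the stream's context on the stack. A common way to guarantee the pop is a small scope guard. The sketch below shows the idea in plain C++ with a hypothetical in-memory context stack (`context_stack`, `scoped_context`, `validate`, and the assumption that a context id equals its device id are all illustrative, not driver API):

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the driver's per-thread context stack.
// For simplicity, a context id doubles as its device id.
inline std::vector<int>& context_stack()
{
  static std::vector<int> stack{0}; // context 0 is current initially
  return stack;
}

// Pushes a context on construction and pops it on destruction,
// so every return path restores the previous context.
struct scoped_context
{
  explicit scoped_context(int ctx) { context_stack().push_back(ctx); }
  ~scoped_context() { context_stack().pop_back(); }
  scoped_context(const scoped_context&)            = delete;
  scoped_context& operator=(const scoped_context&) = delete;
};

enum class status { success, query_failed, invalid_device };

// `simulate_query_failure` models a device query that fails while the
// stream's context is pushed.
inline status validate(int stream_ctx, bool simulate_query_failure)
{
  const int current_device = context_stack().back();
  int stream_device        = -1;
  {
    const scoped_context guard{stream_ctx};
    if (simulate_query_failure)
    {
      return status::query_failed; // the guard still pops the context here
    }
    stream_device = context_stack().back();
  }
  return current_device == stream_device ? status::success : status::invalid_device;
}
```

A destructor cannot return an error code, so a real implementation built on non-throwing driver wrappers would still need to decide how to surface a failing pop. That choice is outside this sketch.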

Comment thread cub/cub/util_device.cuh Outdated
Comment on lines +464 to +484
cudaError_t error = cudaSuccess;
# if _CCCL_CTK_AT_LEAST(12, 8)
int streamDevice;
error = cudaStreamGetDevice(stream, &streamDevice);
if (error != cudaSuccess)
{
return error;
}
int currentDevice;
error = cudaGetDevice(&currentDevice);
if (error != cudaSuccess)
{
return error;
}
_CCCL_ASSERT(currentDevice == streamDevice, "current device must match CUB stream device");
if (currentDevice != streamDevice)
{
return cudaErrorInvalidDevice;
}
# endif // _CCCL_CTK_AT_LEAST(12,8)
return error;
Contributor

Critical: Since this is an assertion, we need to make sure all of the CUDA Runtime/Driver calls are done only when assertions are enabled, because they won't get optimized out and can introduce some unwanted overhead.
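
A minimal sketch of what this could look like, assuming a preprocessor switch named `CCCL_ENABLE_ASSERTIONS` (the name used later in this thread; how CCCL actually wires it up is not shown here). The query function and call counter are hypothetical and only demonstrate that no query runs when the switch is off:

```cpp
#include <cassert>

// Hypothetical stand-in for cudaError_t.
enum class status { success, invalid_device };

// Counts how often the (hypothetical) device query runs.
inline int& query_count()
{
  static int count = 0;
  return count;
}

// Hypothetical stand-in for a runtime or driver query with non-trivial cost.
inline int query_device(int handle)
{
  ++query_count();
  return handle;
}

inline status validate(int current_handle, int stream_handle)
{
#if defined(CCCL_ENABLE_ASSERTIONS)
  // Queries are issued only in builds that enable assertions.
  if (query_device(current_handle) != query_device(stream_handle))
  {
    return status::invalid_device;
  }
#else
  (void) current_handle; // no queries, no overhead
  (void) stream_handle;
#endif
  return status::success;
}
```

In a build without the switch, `validate` reduces to returning success and `query_count()` stays at zero. In a build with it, a mismatch is reported as an error.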

@github-project-automation github-project-automation Bot moved this from In Review to In Progress in CCCL May 25, 2026
@thom-gg thom-gg requested a review from a team as a code owner May 25, 2026 14:45
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bd797629-5166-4034-acec-48de9bb9ed1e

📥 Commits

Reviewing files that changed from the base of the PR and between 8de3755 and 329bf57.

📒 Files selected for processing (4)
  • cub/cub/util_device.cuh
  • cub/test/catch2_test_device_for.cu
  • cub/test/catch2_test_device_for_api.cu
  • libcudacxx/include/cuda/__driver/driver_api.h

Comment thread cub/cub/util_device.cuh Outdated
Comment thread cub/cub/util_device.cuh
Comment thread cub/test/catch2_test_device_for_api.cu
Comment on lines +18 to +19
#define CCCL_DISABLE_STREAM_DEVICE_CHECK

Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: defining CCCL_DISABLE_STREAM_DEVICE_CHECK at file scope disables validation for every test in this translation unit, not just the new escape-hatch case. Move this case to a dedicated test file (or dedicated compile target) so default behavior tests stay meaningful.

Comment thread cub/test/catch2_test_device_for.cu
Comment thread libcudacxx/include/cuda/__driver/driver_api.h Outdated
… end of tests, and making sure pop always gets executed in validate_stream_device
Author

thom-gg commented May 25, 2026

Hi, thanks to you both for the feedback:

  • I added the NoThrow variants of the driver API methods
  • used them in validate_stream_device to support all CTK versions, not only >= 12.8
  • stopped reusing the error variables
  • added a macro, CCCL_DISABLE_STREAM_DEVICE_CHECK, to disable the check
  • guarded the check behind a CCCL_ENABLE_ASSERTIONS macro
  • wrote 2 unit tests that launch DeviceFor on a stream from a device other than the current one: one test that should fail, and one that defines CCCL_DISABLE_STREAM_DEVICE_CHECK and should therefore skip the check and succeed. I wasn't able to run the tests since I only have one GPU, but I did compile them. If there are fewer than 2 GPUs, the tests are skipped.

Happy to keep modifying this if needed :)

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Validate current device and CUB stream matches

3 participants