Benchmarks: Micro benchmark - add nvbench based kernel-launch, sleep-kernel & auto-throughput #750

Open
WenqingLan1 wants to merge 47 commits into microsoft:main from WenqingLan1:feat/third_party/nvbench

@WenqingLan1 WenqingLan1 commented Oct 9, 2025

This pull request adds support for NVBench-based GPU micro-benchmarks to SuperBench.

  • Integrated the NVBench submodule
  • Implemented three benchmarks
    • nvbench-sleep-kernel
    • nvbench-kernel-launch
    • nvbench-auto-throughput
  • Updated documentation and added example scripts

Example config:

version: v0.12
superbench:
  enable:
  # nvbench benchmarks
  - nvbench-sleep-kernel:single
  - nvbench-sleep-kernel:list
  - nvbench-sleep-kernel:range
  - nvbench-sleep-kernel:range-step
  - nvbench-kernel-launch
  - nvbench-auto-throughput
  - nvbench-auto-throughput:stride-list
  - nvbench-auto-throughput:stride-range
  var:
    default_local_mode: &default_local_mode
      modes:
      - name: local
        proc_num: 4
        prefix: CUDA_VISIBLE_DEVICES={proc_rank}
        parallel: yes
  benchmarks:
    nvbench-sleep-kernel:single:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "50"                   # Single value format
        timeout: 30
    nvbench-sleep-kernel:list:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "[25,50,75]"         # List format - no spaces after commas
        timeout: 30
    nvbench-sleep-kernel:range:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "[0:5]"           # Range format
        timeout: 30
    nvbench-sleep-kernel:range-step:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "[0:50:10]"         # Range with step format
        timeout: 30
    nvbench-kernel-launch:
      <<: *default_local_mode
      timeout: 300
    nvbench-auto-throughput:
      <<: *default_local_mode
      timeout: 600
      parameters:
        stride: "[1,2,4,8]"              # List format for stride
        block_size: "[128,256,512,1024]"  # List format for block size
    nvbench-auto-throughput:stride-list:
      <<: *default_local_mode
      timeout: 600
      parameters:
        stride: "[1,2,4,8]"              # List format
        block_size: "[256,512]"
    nvbench-auto-throughput:stride-range:
      <<: *default_local_mode
      timeout: 600
      parameters:
        stride: "[1:8:2]"                # Range with step format
        block_size: "256"                 # Single value format
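
The four parameter formats used in the config above (single value, list, range, range with step) can be illustrated with a small parser. This is a minimal sketch, not the code from this PR: the function name `expand_axis` is hypothetical, and it assumes that ranges include their end point, which the config comments do not state.

```python
import re
from typing import List

_INT = r"-?\d+"


def expand_axis(spec: str, inclusive: bool = True) -> List[int]:
    """Expand "50", "[25,50,75]", "[0:5]" or "[0:50:10]" into a list of ints.

    Hypothetical helper for illustration only. Whether the real benchmarks
    treat the range end as inclusive is an assumption (see `inclusive`).
    """
    text = spec.strip()

    # Single value format: "50"
    if re.fullmatch(_INT, text):
        return [int(text)]

    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"unrecognized format: {spec!r}")
    body = text[1:-1]

    # Range format "[start:stop]" or range with step "[start:stop:step]"
    match = re.fullmatch(rf"({_INT}):({_INT})(?::({_INT}))?", body)
    if match:
        start, stop = int(match.group(1)), int(match.group(2))
        step = int(match.group(3)) if match.group(3) else 1
        if step <= 0:
            raise ValueError("step must be positive")
        end = stop + 1 if inclusive else stop
        return list(range(start, end, step))

    # List format "[a,b,c]": no spaces after commas, as the config notes
    if re.fullmatch(rf"{_INT}(,{_INT})*", body):
        return [int(value) for value in body.split(",")]

    raise ValueError(f"unrecognized format: {spec!r}")


if __name__ == "__main__":
    for example in ["50", "[25,50,75]", "[0:5]", "[0:50:10]", "[1:8:2]"]:
        print(example, "->", expand_axis(example))
```

Rejecting a list that contains spaces mirrors the "no spaces after commas" note in the config; a more forgiving parser could strip whitespace instead.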

@WenqingLan1 WenqingLan1 requested a review from a team as a code owner October 9, 2025 23:12
@WenqingLan1 WenqingLan1 added the benchmarks (SuperBench Benchmarks) and micro-benchmarks (Micro Benchmark Test for SuperBench Benchmarks) labels Oct 9, 2025

codecov Bot commented Oct 10, 2025

Codecov Report

❌ Patch coverage is 98.20628% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.03%. Comparing base (3c95714) to head (e253b85).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rbench/benchmarks/micro_benchmarks/nvbench_base.py | 97.91% | 2 Missing ⚠️ |
| ...hmarks/micro_benchmarks/nvbench_auto_throughput.py | 98.07% | 1 Missing ⚠️ |
| ...enchmarks/micro_benchmarks/nvbench_sleep_kernel.py | 97.67% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   85.69%   86.03%   +0.34%     
==========================================
  Files         103      107       +4     
  Lines        7890     8113     +223     
==========================================
+ Hits         6761     6980     +219     
- Misses       1129     1133       +4     
| Flag | Coverage Δ |
|---|---|
| cpu-python3.10-unit-test | 71.18% <98.17%> (+0.75%) ⬆️ |
| cpu-python3.12-unit-test | 71.18% <98.17%> (+0.75%) ⬆️ |
| cpu-python3.7-unit-test | 70.64% <98.20%> (+0.78%) ⬆️ |
| cuda-unit-test | 83.99% <98.17%> (+0.39%) ⬆️ |
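
The percentages in the coverage diff above follow from its Hits and Lines rows. The sketch below recomputes them. It assumes the patch consists of the 223 added lines with 219 of them hit, which is what the deltas in the diff suggest but the report does not state outright; `coverage_pct` is a hypothetical helper name.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Return coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


# Values copied from the coverage diff (base = main, head = this PR).
base = coverage_pct(6761, 7890)
head = coverage_pct(6980, 8113)

# Assumption: patch lines and hits equal the Lines and Hits deltas.
patch = coverage_pct(6980 - 6761, 8113 - 7890)

print(f"base  {base:.2f}%")
print(f"head  {head:.2f}%")
print(f"delta {head - base:+.2f}%")
print(f"patch {patch:.5f}%")
```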

Copilot AI review requested due to automatic review settings February 26, 2026 22:04

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 30 changed files in this pull request and generated 2 comments.


Comment thread dockerfile/cuda13.0.dockerfile Outdated
Comment thread .gitmodules
Comment thread superbench/benchmarks/micro_benchmarks/__init__.py
Comment thread dockerfile/rocm5.0.x.dockerfile
@microsoft microsoft deleted a comment from Copilot AI Mar 10, 2026
@microsoft microsoft deleted a comment from Copilot AI Mar 10, 2026
Copilot AI review requested due to automatic review settings March 10, 2026 20:55

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 29 changed files in this pull request and generated 5 comments.


Comment thread third_party/Makefile
Comment thread superbench/benchmarks/micro_benchmarks/nvbench_base.py Outdated
Comment thread tests/benchmarks/micro_benchmarks/test_nvbench_base.py Outdated
Comment thread dockerfile/cuda12.9.dockerfile
Comment thread tests/benchmarks/micro_benchmarks/test_nvbench_sleep_kernel.py
Copilot AI review requested due to automatic review settings March 10, 2026 21:31

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 29 changed files in this pull request and generated 4 comments.


Comment thread tests/benchmarks/micro_benchmarks/test_nvbench_sleep_kernel.py Outdated
@microsoft microsoft deleted a comment from Copilot AI Mar 10, 2026
@microsoft microsoft deleted a comment from Copilot AI Mar 10, 2026
@microsoft microsoft deleted a comment from Copilot AI Mar 10, 2026
@WenqingLan1 WenqingLan1 changed the title from "Benchmarks: Micro benchmark - add nvbench based kernel-launch & sleep-kernel" to "Benchmarks: Micro benchmark - add nvbench based kernel-launch, sleep-kernel & auto-throughput" Mar 25, 2026
Copilot AI review requested due to automatic review settings April 8, 2026 20:27

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 29 changed files in this pull request and generated 7 comments.


Comment thread superbench/benchmarks/micro_benchmarks/__init__.py
Comment thread superbench/benchmarks/micro_benchmarks/nvbench/sleep_kernel.cu
Comment thread superbench/benchmarks/micro_benchmarks/nvbench/kernel_launch.cu
Comment thread superbench/benchmarks/micro_benchmarks/nvbench/CMakeLists.txt
Comment thread superbench/benchmarks/micro_benchmarks/nvbench/CMakeLists.txt
Comment thread superbench/benchmarks/micro_benchmarks/nvbench_auto_throughput.py
Comment thread dockerfile/cuda13.0.dockerfile
Copilot AI review requested due to automatic review settings April 22, 2026 19:50

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 29 changed files in this pull request and generated 2 comments.


Comment on lines +1 to +8
cmake_minimum_required(VERSION 3.18)
project(nvbench_benchmarks LANGUAGES CUDA)

# Check if we have a recent enough CMake for nvbench (which requires 3.30.4)
if(CMAKE_VERSION VERSION_LESS "3.30.4")
    message(STATUS "CMake version ${CMAKE_VERSION} is less than 3.30.4 (required by nvbench), skipping nvbench benchmarks")
    return()
endif()

Copilot AI Apr 22, 2026

This CMakeLists declares project(... LANGUAGES CUDA) before checking the CMake version / CUDA availability. If this directory is configured on a machine without a CUDA toolchain (or when CMake < 3.30.4), configuration can fail before reaching the intended “skip” logic. Consider moving the CMake version guard above project() and using project(... LANGUAGES CXX) + include(cuda_common.cmake)/enable_language(CUDA) only inside the CUDAToolkit_FOUND branch.

Comment thread third_party/Makefile
cd ./nvbandwidth && git apply ../nvbandwidth.patch && cp ../nvbandwidth_testcases_patched.h ./testcases_patched.h && cmake . && make && cd ..
cp -v ./nvbandwidth/nvbandwidth $(SB_MICRO_PATH)/bin

# Build nvbench

Copilot AI Apr 22, 2026

New cuda_nvbench target isn’t listed in the Makefile’s .PHONY targets. If a file/directory named cuda_nvbench exists, make cuda_nvbench may become a no-op. Add cuda_nvbench to the .PHONY list to ensure the recipe always runs.

Suggested change
- # Build nvbench
+ # Build nvbench
+ .PHONY: cuda_nvbench

