Add ROCm 6.4, 7.0 and 7.2 support #810
Co-authored-by: Copilot <copilot@github.com>
# Conflicts: # superbench/benchmarks/micro_benchmarks/gpu_stream/CMakeLists.txt
Pull request overview
Adds ROCm 6.4 / 7.0 / 7.2 support across SuperBench micro-benchmarks and container images, including making parsing/build steps resilient to ROCm toolchain/output changes.
Changes:
- Make `hipblaslt-bench` parsing robust to evolving output schemas by using header-based column lookup, and extend unit tests to cover the newer format.
- Enable `gpu-stream` on ROCm by switching memory-clock queries from NVML to `rocm_smi` and updating the CMake build flow to support HIP/hipify.
- Introduce new ROCm 6.4 / 7.0 / 7.2 Dockerfiles and a standalone CMake build script for `hipblaslt-bench` on ROCm 7.2.
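A minimal sketch of header-based column lookup, assuming the output is plain comma-separated text with one header line followed by a data row. `parse_by_header` and the column names (`m`, `gflops`, `us`, `batch_count`) are hypothetical placeholders, not the real `hipblaslt-bench` schema or the parser in this PR; real output may carry prefixes or extra lines that this sketch does not handle.

```python
from typing import Dict, List


def parse_by_header(lines: List[str], wanted: List[str]) -> Dict[str, str]:
    """Return {column: value} for each name in `wanted`.

    Hypothetical helper: scans for the first comma-separated line whose
    fields include every wanted name, treats it as the header, and reads
    the next line as the data row.
    """
    for i, line in enumerate(lines):
        header = [field.strip() for field in line.split(',')]
        if not all(name in header for name in wanted):
            continue
        if i + 1 >= len(lines):
            raise ValueError('header found but no data row follows it')
        values = [field.strip() for field in lines[i + 1].split(',')]
        if len(values) != len(header):
            raise ValueError('data row width does not match the header')
        # Resolve positions from the header, so column order can change freely.
        return {name: values[header.index(name)] for name in wanted}
    # No line contained all wanted names: the schema changed incompatibly.
    raise ValueError(f'no header line contains all of {wanted}')


if __name__ == '__main__':
    # Made-up sample outputs; the "new" one inserts an extra column.
    old = ['m,n,k,gflops,us', '1024,1024,1024,5000.0,42.0']
    new = ['m,n,k,batch_count,gflops,us', '1024,1024,1024,1,5000.0,42.0']
    for output in (old, new):
        print(parse_by_header(output, ['m', 'gflops', 'us']))
```

Because the lookup keys on names, inserting a column shifts positions without breaking the parse, and a missing or renamed column raises `ValueError` instead of silently returning the wrong field.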
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| third_party/Makefile | Injects `#include <cassert>` into HIP’s hipBusBandwidth sample so it builds with the newer ROCm clang. |
| tests/benchmarks/micro_benchmarks/test_hipblaslt_function.py | Adds a positive test for the newer hipblaslt-bench CSV schema. |
| superbench/benchmarks/micro_benchmarks/hipblaslt_function.py | Updates result parsing to map columns by header name instead of fixed indices. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/gpu_stream_utils.hpp | Switches NVML include to rocm_smi when building under HIP. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/gpu_stream.cu | Adds ROCm SMI-based memory clock querying for ROCm builds. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/CMakeLists.txt | Adds a CUDA vs ROCm build split; hipifies sources and links rocm_smi on ROCm. |
| superbench/benchmarks/micro_benchmarks/gpu_stream.py | Registers gpu-stream for ROCm in the benchmark registry. |
| dockerfile/rocm7.2.x.dockerfile | New ROCm 7.2 image; builds hipblaslt-bench via standalone CMake to avoid upstream 7.2 build issues. |
| dockerfile/rocm7.0.x.dockerfile | New ROCm 7.0 image with updated RCCL/hipBLASLt build flow and TE install. |
| dockerfile/rocm6.4.x.dockerfile | New ROCm 6.4 image with hipBLASLt build adjustments and related environment setup. |
| dockerfile/rocm6.2.x.dockerfile | Updates Intel MLC download URL/version. |
| dockerfile/etc/hipblaslt-bench-standalone.cmake | Adds standalone CMakeLists content to build only hipblaslt-bench against system hipBLASLt. |
```cmake
# Place this file at the root of an upstream hipBLASLt source tree
# (e.g. cp this to /path/to/hipBLASLt/CMakeLists-bench.txt) and invoke:
#
# cmake -B build -S /path/to/hipBLASLt -P /path/to/this/file
#
# Or use it as the top-level CMakeLists.txt by overwriting it.
```
The usage instructions are incorrect: `cmake -P` runs CMake in script mode and will not configure or generate a build from a file that declares `project()` and targets. Either instruct users to copy this file over the top-level `CMakeLists.txt` and run a normal `cmake -S ... -B ...`, or provide a separate script-mode (`-P`) driver that configures a build directory via `cmake -S ... -B ...`.
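The first option the reviewer suggests can be sketched as follows, assuming the standalone file is copied over the top-level `CMakeLists.txt` of a hipBLASLt checkout. The helper names and the `hipblaslt-bench` target name are hypothetical; only the `cmake -S ... -B ...` and `cmake --build ... --target ...` invocations are standard CMake command-line usage. With `dry_run=True` (the default) nothing is copied or executed; the planned commands are only returned.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def plan_commands(src: Path, build: Path, target: str = 'hipblaslt-bench') -> List[List[str]]:
    """Hypothetical helper: the commands for a normal (project-mode) build."""
    return [
        # Configure a build tree. There is deliberately no -P (script mode),
        # which would only execute the file as a script and generate nothing.
        ['cmake', '-S', str(src), '-B', str(build)],
        # Build only the benchmark target (the target name is an assumption).
        ['cmake', '--build', str(build), '--target', target],
    ]


def configure_and_build(standalone: Path, src: Path, build: Path,
                        dry_run: bool = True) -> List[List[str]]:
    """Copy the standalone file over src/CMakeLists.txt, then configure and build."""
    cmds = plan_commands(src, build)
    if not dry_run:
        shutil.copyfile(standalone, src / 'CMakeLists.txt')
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```

Keeping the command list separate from execution makes the flow easy to review or log before anything overwrites the upstream `CMakeLists.txt`.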
@arsdragonfly Thanks for your contribution! Could we break this PR down into 3 separate PRs, one per ROCm version? Thanks!
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
```dockerfile
./bootstrap --prefix=/usr --no-system-curl --parallel=16 && \
make -j ${NUM_MAKE_JOBS} && \
make install && \
rm -rf /tmp/cmake-${required_version}* \
```
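The excerpt builds CMake from source at `${required_version}`. One way to skip that step when the base image already ships a new-enough CMake is a version guard like the one below. This is a hypothetical sketch, not what the Dockerfile does; it assumes a purely numeric `X.Y.Z` required version and relies only on `cmake --version` printing a first line of the form `cmake version X.Y.Z`.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_cmake_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from `cmake --version` output, if present."""
    match = re.search(r'cmake version (\d+)\.(\d+)\.(\d+)', text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_cmake_version_text() -> Optional[str]:
    """Return `cmake --version` output, or None when cmake is not on PATH."""
    exe = shutil.which('cmake')
    if exe is None:
        return None
    return subprocess.run([exe, '--version'], capture_output=True, text=True).stdout


def needs_source_build(required_version: str, version_text: Optional[str]) -> bool:
    """True when cmake is missing, unparsable, or older than required_version."""
    required = tuple(int(part) for part in required_version.split('.'))
    have = parse_cmake_version(version_text) if version_text else None
    return have is None or have < required


if __name__ == '__main__':
    print(needs_source_build('3.26.4', installed_cmake_version_text()))
```

Tuple comparison orders versions component by component, so `(3, 28, 3)` correctly counts as newer than `(3, 26, 4)`.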
```python
BenchmarkRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.CUDA)
BenchmarkRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.ROCM)
```
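The two calls register the same benchmark name under two platforms. A toy registry keyed by `(name, platform)` shows why both lines are needed: a lookup on a platform that was never registered fails. `ToyRegistry`, its `lookup` method, and this `Platform` enum are hypothetical stand-ins, not SuperBench's `BenchmarkRegistry` or its actual data structures.

```python
from enum import Enum
from typing import Dict, Tuple


class Platform(Enum):
    """Hypothetical stand-in for SuperBench's Platform enum."""
    CUDA = 'CUDA'
    ROCM = 'ROCm'


class ToyRegistry:
    """Hypothetical registry that keys benchmark classes by (name, platform)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, Platform], type] = {}

    def register_benchmark(self, name: str, cls: type, platform: Platform) -> None:
        self._entries[(name, platform)] = cls

    def lookup(self, name: str, platform: Platform) -> type:
        # Raises KeyError when the name was never registered for this platform.
        return self._entries[(name, platform)]


class GpuStreamBenchmark:
    """Placeholder for the real benchmark class."""


registry = ToyRegistry()
registry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.CUDA)
registry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.ROCM)
```

In this toy model, keeping only the CUDA line makes the ROCm lookup raise `KeyError`, which loosely mirrors the benchmark being unavailable on ROCm until it is registered there.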
```dockerfile
# Install TransformerEngine — ROCm 7.0 has hip_fp4.h and gfx950 support,
# so we can use the latest dev branch with full CK fused attention.
RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git && \
cd TransformerEngine && \
```
```dockerfile
RUN python3 -m pip install onnxscript && \
git clone --recursive https://github.com/ROCm/TransformerEngine.git && \
cd TransformerEngine && \
```
```dockerfile
# Install TransformerEngine — ROCm 7.0 has hip_fp4.h and gfx950 support,
# so we can use the latest dev branch with full CK fused attention.
RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git && \
cd TransformerEngine && \
NVTE_FRAMEWORK=pytorch \
NVTE_FUSED_ATTN_CK=0 \
NVTE_FUSED_ATTN_AOTRITON=1 \
```
No description provided.