Add ROCm 7.2 support #814
Draft
arsdragonfly wants to merge 3 commits into microsoft:main
Add dockerfile/rocm6.4.x.dockerfile and required cross-cutting changes:
- gpu_stream: register on Platform.ROCM and port the microbenchmark to HIP.
- third_party/rccl-tests: bump submodule for ROCm 6.4 compatibility.
- dockerfile/rocm6.2.x.dockerfile: bump Intel MLC to v3.12 (3.10 mirror gone).
- CI: add rocm6.4 entry to build-image workflow.

Add dockerfile/rocm7.0.x.dockerfile and required cross-cutting changes:
- third_party/Makefile: inject <cassert> into hipBusBandwidth.cpp; newer clang in ROCm >= 7 rejects assert() without it.
- hipblaslt_function: parse hipblaslt-bench output by header-name lookup so the benchmark works across both the legacy 23-column and the new 33+ column schemas introduced in hipblaslt 7.x. Add a regression test for the new schema.
- CI: add rocm7.0 entry to build-image workflow.

Stacks on top of the ROCm 6.4 PR.

Add dockerfile/rocm7.2.x.dockerfile plus a standalone CMake script that builds only hipblaslt-bench against system-installed hipBLASLt. The upstream 7.2 source tree pulls in AMD-internal headers and a tensilelite-host C++ library that conflict with building only the bench tool, so a minimal top-level CMakeLists.txt is supplied via dockerfile/etc/. Also add the rocm7.2 entry to the build-image workflow.

Stacks on top of the ROCm 7.0 PR.
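
The `<cassert>` injection mentioned in the ROCm 7.0 commit can be pictured as an idempotent "add the include if it is missing" step. The PR does this from third_party/Makefile and the exact mechanism is not shown on this page, so the Python below is only a sketch of the idea; `ensure_include` is a hypothetical helper, not code from the PR.

```python
def ensure_include(source: str, header: str = "cassert") -> str:
    """Return `source` with `#include <header>` prepended unless already present.

    Hypothetical helper for illustration only; the PR performs the
    equivalent patch from third_party/Makefile.
    """
    directive = f"#include <{header}>"
    # Leave the file untouched when the directive is already there, so
    # running the patch step twice does not duplicate the include.
    if any(line.strip() == directive for line in source.splitlines()):
        return source
    return directive + "\n" + source


if __name__ == "__main__":
    sample = "#include <hip/hip_runtime.h>\nint main() { assert(1); return 0; }\n"
    print(ensure_include(sample))
```

Making the step idempotent matters for image builds, where the same Makefile target may run more than once against the same checkout.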
Pull request overview
Adds ROCm 7.2 support to SuperBench’s container/build pipeline, while the stacked diff also carries the earlier ROCm 6.4/7.0 support and shared ROCm benchmark changes that those images depend on.
Changes:
- Adds new ROCm 6.4, 7.0, and 7.2 Dockerfiles plus workflow matrix entries to build those images.
- Updates ROCm micro-benchmark support by making `hipblaslt-bench` parsing schema-aware and enabling `gpu-stream` on ROCm.
- Adjusts supporting build assets for newer ROCm toolchains, including the hipBLASLt standalone CMake path, hipBusBandwidth fix, and MLC version bump.
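
To illustrate what "schema-aware" parsing means here, the sketch below looks columns up by header name instead of by fixed position, so extra columns in a newer schema do not shift the values being read. It is a minimal sketch, not the code in hipblaslt_function.py: `parse_bench_rows` is a hypothetical function, and the column names in the sample data are made up for the example rather than taken from real hipblaslt-bench output.

```python
from typing import Dict, List


def parse_bench_rows(raw_output: str, wanted: List[str]) -> List[Dict[str, str]]:
    """Extract the `wanted` columns from comma-separated output by header name."""
    if not wanted:
        return []
    rows: List[Dict[str, str]] = []
    header_index = None
    for line in raw_output.splitlines():
        cells = [cell.strip() for cell in line.split(",")]
        # Treat any line containing every wanted name as a header line and
        # (re)build the name -> position map from it.
        if all(name in cells for name in wanted):
            header_index = {name: cells.index(name) for name in wanted}
            continue
        # Ignore lines before the first header and blank or truncated lines.
        if header_index is None or len(cells) <= max(header_index.values()):
            continue
        rows.append({name: cells[pos] for name, pos in header_index.items()})
    return rows


if __name__ == "__main__":
    # Illustrative data only: a narrow "legacy" layout and a wider "new" one.
    legacy = "m,n,k,gflops,us\n4096,4096,4096,100.5,12.3\n"
    newer = "m,extra,n,k,more,gflops,us\n4096,x,4096,4096,y,200.1,6.7\n"
    print(parse_bench_rows(legacy, ["m", "gflops"]))
    print(parse_bench_rows(newer, ["m", "gflops"]))
```

The same call works for both layouts because only the header line decides where each value is read from.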
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| third_party/Makefile | Patches ROCm HIP bandwidth sample build for newer clang. |
| tests/benchmarks/micro_benchmarks/test_hipblaslt_function.py | Adds regression coverage for newer hipblaslt-bench output format. |
| superbench/benchmarks/micro_benchmarks/hipblaslt_function.py | Switches hipBLASLt result parsing to header-based column lookup. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/gpu_stream.cu | Adds ROCm SMI-based clock handling to gpu_stream. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/gpu_stream_utils.hpp | Adds ROCm-specific header selection for gpu_stream. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/CMakeLists.txt | Introduces ROCm/HIP build flow for gpu_stream. |
| superbench/benchmarks/micro_benchmarks/gpu_stream.py | Registers gpu-stream for ROCm. |
| dockerfile/rocm7.2.x.dockerfile | Adds the new ROCm 7.2 image definition and special hipBLASLt bench build path. |
| dockerfile/rocm7.0.x.dockerfile | Adds the ROCm 7.0 image definition and related dependency handling. |
| dockerfile/rocm6.4.x.dockerfile | Adds the ROCm 6.4 image definition and compatibility workarounds. |
| dockerfile/rocm6.2.x.dockerfile | Updates Intel MLC download to a newer available version. |
| dockerfile/etc/hipblaslt-bench-standalone.cmake | Adds standalone CMake logic for building only hipblaslt-bench on ROCm 7.2. |
| .github/workflows/build-image.yml | Adds CI matrix entries for ROCm 6.4, 7.0, and 7.2 image builds. |
BenchmarkRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.CUDA)
BenchmarkRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.ROCM)
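
The two calls above register one benchmark class under the same name for two platforms. As a rough mental model, a registry keyed by (name, platform) behaves as sketched below. This is a toy, not SuperBench's `BenchmarkRegistry`: `ToyRegistry`, its `select` method, and the stand-in `Platform` enum and `GpuStreamBenchmark` class are all hypothetical names defined only for this example.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Platform(Enum):
    """Stand-in for the platform enum; values are illustrative."""
    CUDA = "CUDA"
    ROCM = "ROCm"


class GpuStreamBenchmark:
    """Placeholder class standing in for the real benchmark."""


class ToyRegistry:
    """Toy registry keyed by (benchmark name, platform)."""

    _benchmarks: Dict[Tuple[str, Platform], type] = {}

    @classmethod
    def register_benchmark(cls, name: str, class_def: type, platform: Platform) -> None:
        # Registering the same name on another platform adds a second key
        # rather than overwriting the first entry.
        cls._benchmarks[(name, platform)] = class_def

    @classmethod
    def select(cls, name: str, platform: Platform) -> Optional[type]:
        # Returns None when the benchmark is not registered for the platform.
        return cls._benchmarks.get((name, platform))


ToyRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.CUDA)
ToyRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.ROCM)

if __name__ == "__main__":
    print(ToyRegistry.select('gpu-stream', Platform.ROCM))
```

Under this model, omitting the second call would leave `gpu-stream` unavailable on ROCm even though the class itself is platform-neutral.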
benchmark._result = BenchmarkResult(self.benchmark_name, BenchmarkType.MICRO, ReturnCode.SUCCESS, run_count=1)
self.assertTrue(benchmark._process_raw_result(0, new_format_raw_output))
self.assertEqual(ReturnCode.SUCCESS, benchmark.return_code)
self.assertEqual(2, len(benchmark.result))
make -j $(nproc) install && \
ldconfig && \
cd / && \
rm -rf /tmp/openmpi-${OPENMPI_VERSION}*
Comment on lines +199 to +202
# Install TransformerEngine — ROCm 7.0 has hip_fp4.h and gfx950 support,
# so we can use the latest dev branch with full CK fused attention.
RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git && \
cd TransformerEngine && \
Comment on lines +206 to +209
RUN python3 -m pip install onnxscript && \
git clone --recursive https://github.com/ROCm/TransformerEngine.git && \
cd TransformerEngine && \
NVTE_FRAMEWORK=pytorch \
Comment on lines +13 to +14
# - hipblaslt: release-staging/rocm-rel-7.2
# - rocblas: release-staging/rocm-rel-7.2
git apply ../megatron_deepspeed_rocm6.patch

# Install TransformerEngine — ROCm 7.0 has hip_fp4.h and gfx950 support,
# so we can use the latest dev branch with full CK fused attention.
Description
Adds a ROCm 7.2 dockerfile.
Depends on #813 (which depends on #812). While the parent PRs are
unmerged, this PR's diff also includes the ROCm 6.4 and 7.0 changes;
once they merge, this diff shrinks to just the 7.2-specific commit.
This PR is the third of three stacked PRs that split #810 by ROCm version:
- Add ROCm 6.4 support
- Add ROCm 7.0 support
- Add ROCm 7.2 support

Changes (7.2-specific)
- `dockerfile/rocm7.2.x.dockerfile`: new ROCm 7.2 image.
- `dockerfile/etc/hipblaslt-bench-standalone.cmake`: minimal top-level CMake script that builds only `hipblaslt-bench` against the system-installed hipBLASLt. The upstream 7.2 source tree pulls in AMD-internal "origami" headers and a tensilelite-host C++ library that conflict with building only the bench tool, so the dockerfile copies this file in as `hipBLASLt/CMakeLists.txt` and configures it as a normal CMake project.
- `.github/workflows/build-image.yml`: add `rocm7.2` matrix entry.