Add ROCm 7.2 support #814

Draft
arsdragonfly wants to merge 3 commits into microsoft:main from arsdragonfly:arsdragonfly/rocm-7.2

@arsdragonfly

Description

Adds a ROCm 7.2 dockerfile.

Depends on #813 (which depends on #812). While the parent PRs are
unmerged, this PR's diff also includes the ROCm 6.4 and 7.0 changes;
once they merge, this diff shrinks to just the 7.2-specific commit.

This PR is the third of three stacked PRs that split #810 by ROCm version:

  1. Add ROCm 6.4 support (#812)
  2. Add ROCm 7.0 support (#813)
  3. Add ROCm 7.2 support (this PR)

Changes (7.2-specific)

  • dockerfile/rocm7.2.x.dockerfile — new ROCm 7.2 image.
  • dockerfile/etc/hipblaslt-bench-standalone.cmake — minimal top-level
    CMake script that builds only hipblaslt-bench against the
    system-installed hipBLASLt. The upstream 7.2 source tree pulls in
    AMD-internal "origami" headers and a tensilelite-host C++ library that
    conflict with building only the bench tool, so the dockerfile copies
    this file in as hipBLASLt/CMakeLists.txt and configures it as a normal
    CMake project.
  • .github/workflows/build-image.yml — add rocm7.2 matrix entry.

Add dockerfile/rocm6.4.x.dockerfile and required cross-cutting changes:
- gpu_stream: register on Platform.ROCM and port the microbenchmark to HIP.
- third_party/rccl-tests: bump submodule for ROCm 6.4 compatibility.
- dockerfile/rocm6.2.x.dockerfile: bump Intel MLC to v3.12 (3.10 mirror gone).
- CI: add rocm6.4 entry to build-image workflow.

Add dockerfile/rocm7.0.x.dockerfile and required cross-cutting changes:
- third_party/Makefile: inject <cassert> into hipBusBandwidth.cpp; newer clang
  in ROCm >= 7 rejects assert() without it.
- hipblaslt_function: parse hipblaslt-bench output by header-name lookup so
  the benchmark works across both the legacy 23-column and the new 33+ column
  schemas introduced in hipblaslt 7.x. Add a regression test for the new
  schema.
- CI: add rocm7.0 entry to build-image workflow.
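
The header-name lookup described for hipblaslt_function above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the function name `parse_bench_rows`, the column names, and the sample values are all hypothetical, and real hipblaslt-bench output has far more columns.

```python
import csv
import io


def parse_bench_rows(raw_output, wanted):
    """Return one dict per data row, keeping only the `wanted` columns.

    Columns are located by header name, so extra or reordered columns
    in a newer schema do not shift the values that get read.
    """
    reader = csv.reader(io.StringIO(raw_output.strip()))
    header = [name.strip() for name in next(reader)]
    # Map each wanted column name to its position in this particular header.
    index = {name: header.index(name) for name in wanted}
    rows = []
    for fields in reader:
        if len(fields) < len(header):
            continue  # skip truncated or non-data lines
        rows.append({name: fields[i].strip() for name, i in index.items()})
    return rows


# Hypothetical legacy-style and newer-style outputs (column names are made up).
legacy = "m,n,k,gflops\n4096,4096,8192,123.4\n"
newer = "m,n,extra,k,other,gflops\n4096,4096,x,8192,y,567.8\n"

for text in (legacy, newer):
    print(parse_bench_rows(text, ["m", "n", "k", "gflops"]))
```

In this sketch a missing column raises `ValueError` from `header.index`, which surfaces a schema change immediately instead of silently reading the wrong field, as a fixed positional index would.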

Stacks on top of the ROCm 6.4 PR.

Add dockerfile/rocm7.2.x.dockerfile plus a standalone CMake script that
builds only hipblaslt-bench against system-installed hipBLASLt. The upstream
7.2 source tree pulls in AMD-internal headers and a tensilelite-host C++
library that conflict with building only the bench tool, so a minimal
top-level CMakeLists.txt is supplied via dockerfile/etc/.

Also add the rocm7.2 entry to the build-image workflow.

Stacks on top of the ROCm 7.0 PR.
Copilot AI review requested due to automatic review settings May 5, 2026 00:25

Copilot AI left a comment

Pull request overview

Adds ROCm 7.2 support to SuperBench’s container/build pipeline, while the stacked diff also carries the earlier ROCm 6.4/7.0 support and shared ROCm benchmark changes that those images depend on.

Changes:

  • Adds new ROCm 6.4, 7.0, and 7.2 Dockerfiles plus workflow matrix entries to build those images.
  • Updates ROCm micro-benchmark support by making hipblaslt-bench parsing schema-aware and enabling gpu-stream on ROCm.
  • Adjusts supporting build assets for newer ROCm toolchains, including the hipBLASLt standalone CMake path, hipBusBandwidth fix, and MLC version bump.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| third_party/Makefile | Patches ROCm HIP bandwidth sample build for newer clang. |
| tests/benchmarks/micro_benchmarks/test_hipblaslt_function.py | Adds regression coverage for newer hipblaslt-bench output format. |
| superbench/benchmarks/micro_benchmarks/hipblaslt_function.py | Switches hipBLASLt result parsing to header-based column lookup. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/gpu_stream.cu | Adds ROCm SMI-based clock handling to gpu_stream. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/gpu_stream_utils.hpp | Adds ROCm-specific header selection for gpu_stream. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/CMakeLists.txt | Introduces ROCm/HIP build flow for gpu_stream. |
| superbench/benchmarks/micro_benchmarks/gpu_stream.py | Registers gpu-stream for ROCm. |
| dockerfile/rocm7.2.x.dockerfile | Adds the new ROCm 7.2 image definition and special hipBLASLt bench build path. |
| dockerfile/rocm7.0.x.dockerfile | Adds the ROCm 7.0 image definition and related dependency handling. |
| dockerfile/rocm6.4.x.dockerfile | Adds the ROCm 6.4 image definition and compatibility workarounds. |
| dockerfile/rocm6.2.x.dockerfile | Updates Intel MLC download to a newer available version. |
| dockerfile/etc/hipblaslt-bench-standalone.cmake | Adds standalone CMake logic for building only hipblaslt-bench on ROCm 7.2. |
| .github/workflows/build-image.yml | Adds CI matrix entries for ROCm 6.4, 7.0, and 7.2 image builds. |


BenchmarkRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.CUDA)
BenchmarkRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.ROCM)
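
For readers unfamiliar with per-platform registration, the two calls above can be read against a toy model like the one below. This is only a sketch: `ToyRegistry`, its methods, and the stand-in `GpuStream` class are hypothetical and do not reflect how SuperBench's `BenchmarkRegistry` is implemented.

```python
from enum import Enum


class Platform(Enum):
    CUDA = "cuda"
    ROCM = "rocm"


class ToyRegistry:
    """Toy registry keyed by (benchmark name, platform)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, cls, platform):
        # The same class may be registered once per supported platform.
        self._entries[(name, platform)] = cls

    def lookup(self, name, platform):
        # Returns None when the benchmark was never registered for this platform.
        return self._entries.get((name, platform))


class GpuStream:
    """Stand-in for a benchmark class (hypothetical)."""


registry = ToyRegistry()
registry.register("gpu-stream", GpuStream, Platform.CUDA)
registry.register("gpu-stream", GpuStream, Platform.ROCM)
```

Under this model, a benchmark registered only for CUDA would simply not be found when looked up for ROCm, which is the gap the second registration closes.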
benchmark._result = BenchmarkResult(self.benchmark_name, BenchmarkType.MICRO, ReturnCode.SUCCESS, run_count=1)
self.assertTrue(benchmark._process_raw_result(0, new_format_raw_output))
self.assertEqual(ReturnCode.SUCCESS, benchmark.return_code)
self.assertEqual(2, len(benchmark.result))
make -j $(nproc) install && \
ldconfig && \
cd / && \
rm -rf /tmp/openmpi-${OPENMPI_VERSION}*
Comment on lines +199 to +202
# Install TransformerEngine — ROCm 7.0 has hip_fp4.h and gfx950 support,
# so we can use the latest dev branch with full CK fused attention.
RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git && \
cd TransformerEngine && \
Comment on lines +206 to +209
RUN python3 -m pip install onnxscript && \
git clone --recursive https://github.com/ROCm/TransformerEngine.git && \
cd TransformerEngine && \
NVTE_FRAMEWORK=pytorch \
Comment on lines +13 to +14
# - hipblaslt: release-staging/rocm-rel-7.2
# - rocblas: release-staging/rocm-rel-7.2
git apply ../megatron_deepspeed_rocm6.patch

# Install TransformerEngine — ROCm 7.0 has hip_fp4.h and gfx950 support,
# so we can use the latest dev branch with full CK fused attention.