Add ROCm 6.4 support #812

Draft
arsdragonfly wants to merge 1 commit into microsoft:main from arsdragonfly:arsdragonfly/rocm-6.4

@arsdragonfly

Description

Adds a ROCm 6.4 dockerfile and the cross-cutting changes that first become
relevant at this version.

This PR is the first of three stacked PRs that split #810 by ROCm version:

  1. This PR: Add ROCm 6.4 support
  2. Add ROCm 7.0 support (stacked on this one)
  3. Add ROCm 7.2 support (stacked on the 7.0 PR)

Changes

  • dockerfile/rocm6.4.x.dockerfile — new ROCm 6.4 image.
  • superbench/benchmarks/micro_benchmarks/gpu_stream* — register the
    gpu-stream microbenchmark on Platform.ROCM and port it to HIP so it
    builds for any ROCm dockerfile.
  • third_party/rccl-tests — submodule bump for ROCm 6.4 compatibility.
  • dockerfile/rocm6.2.x.dockerfile — bump Intel MLC to v3.12 (the v3.10
    download mirror is gone).
  • .github/workflows/build-image.yml — add rocm6.4 matrix entry.

Notes

Splits the original branch arsdragonfly/rocm-refresh (existing PR #810) by
ROCm version so each image can be reviewed and built independently.

Add dockerfile/rocm6.4.x.dockerfile and required cross-cutting changes:
- gpu_stream: register on Platform.ROCM and port the microbenchmark to HIP.
- third_party/rccl-tests: bump submodule for ROCm 6.4 compatibility.
- dockerfile/rocm6.2.x.dockerfile: bump Intel MLC to v3.12 (3.10 mirror gone).
- CI: add rocm6.4 entry to build-image workflow.

Copilot AI review requested due to automatic review settings May 5, 2026 00:25
This was referenced May 5, 2026

Copilot AI left a comment

Pull request overview

Adds ROCm 6.4 container support and extends the existing gpu-stream microbenchmark so it can be built/registered on ROCm (HIP) in addition to CUDA.

Changes:

  • Added a new rocm6.4.x Dockerfile and wired it into the CI image build matrix.
  • Ported gpu-stream C++ microbenchmark build logic to support a ROCm/HIP build path (hipify + rocm_smi) and updated the benchmark registration to include Platform.ROCM.
  • Updated the ROCm 6.2 image to use a new Intel MLC v3.12 download URL.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| superbench/benchmarks/micro_benchmarks/gpu_stream/gpu_stream.cu | Adds ROCm SMI path to retrieve memory clock rate under HIP/ROCm. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/gpu_stream_utils.hpp | Switches NVML include to ROCm SMI include when building for HIP/AMD. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/CMakeLists.txt | Adds CUDA-vs-HIP build split and hipify-based ROCm build pipeline; links rocm_smi. |
| superbench/benchmarks/micro_benchmarks/gpu_stream.py | Registers gpu-stream for Platform.ROCM. |
| dockerfile/rocm6.4.x.dockerfile | Introduces new ROCm 6.4 build image with updated dependency/tooling steps. |
| dockerfile/rocm6.2.x.dockerfile | Updates Intel MLC download URL to v3.12. |
| .github/workflows/build-image.yml | Adds rocm6.4 to the Docker build matrix. |

Comment on lines +537 to +546

    rsmi_frequencies_t freq{};
    ret = rsmi_dev_gpu_clk_freq_get(static_cast<uint32_t>(gpu_id), RSMI_CLK_TYPE_MEM, &freq);
    if (ret != RSMI_STATUS_SUCCESS) {
        std::cerr << "Failed to get memory clock from ROCm SMI: status=" << ret << std::endl;
        rsmi_shut_down();
        return -1.0f;
    }

    // freq.current is the index of the active frequency level; values are in Hz.
    float clock_mhz = static_cast<float>(freq.frequency[freq.current]) / 1.0e6f;
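
The lookup in the snippet above indexes `freq.frequency` with `freq.current` and converts Hz to MHz. A minimal sketch of that lookup with a bounds guard follows, written in Python for brevity. `Frequencies` and `current_clock_mhz` are hypothetical names, not part of ROCm SMI or this PR; the dataclass only mirrors the fields used above, plus a `num_supported` count that is assumed to describe how many entries are valid.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frequencies:
    """Hypothetical stand-in for the fields of rsmi_frequencies_t used above."""
    num_supported: int    # assumed: number of valid entries in `frequency`
    current: int          # index of the active frequency level
    frequency: List[int]  # frequency levels, in Hz


def current_clock_mhz(freq: Frequencies) -> Optional[float]:
    """Return the active level in MHz, or None if the index is out of range."""
    valid = min(freq.num_supported, len(freq.frequency))
    if not 0 <= freq.current < valid:
        return None
    return freq.frequency[freq.current] / 1.0e6


if __name__ == "__main__":
    levels = Frequencies(num_supported=2, current=1,
                         frequency=[500_000_000, 900_000_000])
    print(current_clock_mhz(levels))
```

Returning `None` here plays the role of the `-1.0f` error return in the C++ path; whether the PR needs such a guard depends on what ROCm SMI guarantees about `current`.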

Comment on lines +110 to +116

    if(HIP_UNCACHED_MEMORY)
        add_compile_definitions(HIP_UNCACHED_MEMORY)
    endif()

    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2")

    add_executable(gpu_stream ${HIP_SOURCES})

Comment on lines +177 to +188

    # Fix: copy the set before iterating. Patch all joblib instances system-wide.
    RUN pip install "joblib>=1.4.2" && \
        find / -path '*/joblib/parallel.py' -not -path '*/.git/*' -exec sed -i \
        's/timeout_control_job = next(iter(self\._jobs_set), None)/timeout_control_job = next(iter(set(self._jobs_set)), None)/' {} +
    RUN cd third_party && \
        git clone -b release-staging/rocm-rel-6.4 https://github.com/ROCmSoftwarePlatform/hipBLASLt.git && \
        sed -i 's/host-x86_64-unknown-linux,/host-x86_64-unknown-linux-gnu,/' \
        hipBLASLt/tensilelite/Tensile/BuildCommands/SharedCommands.py && \
        cd hipBLASLt && ./install.sh -dc && \
        find /opt -path '*/joblib/parallel.py' -not -path '*/.git/*' -exec sed -i \
        's/timeout_control_job = next(iter(self\._jobs_set), None)/timeout_control_job = next(iter(set(self._jobs_set)), None)/' {} + && \
        cp -v build/release/clients/staging/hipblaslt-bench /opt/superbench/bin/
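
To make the `sed` substitution in the `RUN` steps above easier to follow, here is a small Python sketch of its effect on one source line. The `OLD` and `NEW` strings are copied from the `sed` expression; `patch_source` and `first_job` are hypothetical helper names used only for illustration, and nothing here is taken from joblib itself.

```python
# Target and replacement text, as they appear in the sed expression above.
OLD = "timeout_control_job = next(iter(self._jobs_set), None)"
NEW = "timeout_control_job = next(iter(set(self._jobs_set)), None)"


def patch_source(text: str) -> str:
    """Apply the same literal substitution the sed command performs."""
    return text.replace(OLD, NEW)


def first_job(jobs):
    """The idiom the patch introduces: snapshot the set, then take one element or None."""
    return next(iter(set(jobs)), None)


if __name__ == "__main__":
    line = "        " + OLD
    once = patch_source(line)
    # The patched text no longer contains OLD, so a second pass is a no-op.
    print(once == patch_source(once))
```

Because the patched line no longer matches the original pattern, running the substitution twice (once over `/` and again over `/opt`, as the Dockerfile does) should leave already-patched files unchanged.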

Comment on lines +177 to +187

    # Fix: copy the set before iterating. Patch all joblib instances system-wide.
    RUN pip install "joblib>=1.4.2" && \
        find / -path '*/joblib/parallel.py' -not -path '*/.git/*' -exec sed -i \
        's/timeout_control_job = next(iter(self\._jobs_set), None)/timeout_control_job = next(iter(set(self._jobs_set)), None)/' {} +
    RUN cd third_party && \
        git clone -b release-staging/rocm-rel-6.4 https://github.com/ROCmSoftwarePlatform/hipBLASLt.git && \
        sed -i 's/host-x86_64-unknown-linux,/host-x86_64-unknown-linux-gnu,/' \
        hipBLASLt/tensilelite/Tensile/BuildCommands/SharedCommands.py && \
        cd hipBLASLt && ./install.sh -dc && \
        find /opt -path '*/joblib/parallel.py' -not -path '*/.git/*' -exec sed -i \
        's/timeout_control_job = next(iter(self\._jobs_set), None)/timeout_control_job = next(iter(set(self._jobs_set)), None)/' {} + && \

Comment on lines 119 to +120

    BenchmarkRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.CUDA)
    BenchmarkRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.ROCM)
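
The two registration calls above bind the same benchmark class to two platforms. The sketch below is a toy registry keyed by `(name, platform)` that shows how such a per-platform lookup could behave. It is not SuperBench's `BenchmarkRegistry` implementation: `ToyRegistry`, its `lookup` method, and the stand-in `Platform` and `GpuStreamBenchmark` classes are all hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Tuple, Type


class Platform(Enum):
    """Stand-in for the platform enum referenced above (hypothetical values)."""
    CUDA = "CUDA"
    ROCM = "ROCm"


class GpuStreamBenchmark:
    """Placeholder for the real benchmark class."""


class ToyRegistry:
    """Toy registry: one entry per (benchmark name, platform) pair."""

    def __init__(self) -> None:
        self._benchmarks: Dict[Tuple[str, Platform], Type] = {}

    def register_benchmark(self, name: str, cls: Type, platform: Platform) -> None:
        self._benchmarks[(name, platform)] = cls

    def lookup(self, name: str, platform: Platform) -> Optional[Type]:
        """Return the registered class, or None if the pair was never registered."""
        return self._benchmarks.get((name, platform))


registry = ToyRegistry()
registry.register_benchmark("gpu-stream", GpuStreamBenchmark, platform=Platform.CUDA)
registry.register_benchmark("gpu-stream", GpuStreamBenchmark, platform=Platform.ROCM)
```

Under this model, dropping the second call would make a `("gpu-stream", Platform.ROCM)` lookup return `None`, which is the gap the added registration line closes.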