A portable implementation of the hipBLAS API on top of chipStar and CLBlast.
chipBLAS lets HIP applications using hipblas.h run on any device
chipStar's OpenCL backend supports — Intel GPUs, PoCL CPU, AMD via Mesa
Clover, etc. — without rocBLAS or cuBLAS.
chipStar exposes the underlying OpenCL handles backing each HIP stream via
hipGetBackendNativeHandles(). chipBLAS pulls those handles out, hands
them to CLBlast, and stages user data through cl_mem buffers (CLBlast's
required input format). The hipBLAS calls are thin shims over the matching
CLBlastSgemm / CLBlastSgemv / etc. entry points.
user app ──hipBLAS API──► libhipblas.so (chipBLAS)
│
├── handle: cl_context, cl_command_queue
│ ◄── hipGetBackendNativeHandles()
└── exec: stage cl_mem ↔ HIP ptr
└── CLBlast{S,D,C,Z}{gemm,...}
Early. The handle and stream plumbing work; the BLAS subset shipping today is:
- L1:
hipblasSaxpy,hipblasDaxpy,hipblasSscal,hipblasDscal - L2:
hipblasSgemv,hipblasDgemv - L3:
hipblasSgemm,hipblasDgemm,hipblasCgemm,hipblasZgemm
Backends:
- OpenCL — supported. chipStar's hipMalloc returns device-address pointers
that are not directly usable as
cl_mem, so the v0 bridge stages user data through host bounce buffers (oneclCreateBuffer+clEnqueueWriteBufferper input, oneclEnqueueReadBufferper output). Functional but not free — a directcl_meminterop path requires a new hook in chipStar (hipGetClMemFromDevPtror similar) and is on the roadmap. - Level Zero — not supported. CLBlast is OpenCL-only; on an L0 chipStar
build, exec entry points return
HIPBLAS_STATUS_NOT_SUPPORTED. Build chipStar against OpenCL or useH4I-hipBLAS.
Pointer mode: HOST only. hipblasSetPointerMode(handle, DEVICE) is
recorded but ignored; alpha/beta are dereferenced on the host.
- chipStar, installed and built against OpenCL (provides
CHIPTargets.cmake) - OpenCL ICD loader + headers
- CMake 3.20+
- A C++17 compiler
git clone --recursive https://github.com/<you>/chipBLAS
cd chipBLAS
# If you forgot --recursive:
git submodule add https://github.com/CNugteren/CLBlast.git third_party/CLBlast
cmake -S . -B build \
-DCMAKE_PREFIX_PATH=/path/to/chipStar/install
cmake --build build -j
ctest --test-dir build --output-on-failureNumerical tests compare GPU results to a host reference in test/blas_reference.hh (same formulas as the BLAS routines). Run everything with:
ctest --test-dir build --output-on-failureThe conformance executable (test_conformance) bundles the broader L1/L2/L3 scenario set in one process (exit 0 only if all pass).
To use a system-installed CLBlast instead of the vendored submodule:
cmake -S . -B build -DCHIPBLAS_USE_VENDORED_CLBLAST=OFF \
-DCMAKE_PREFIX_PATH="/path/to/chipStar/install;/path/to/clblast"CMake's default find_package(OpenCL) selects Apple's OpenCL.framework,
which is incompatible with the ICD loader chipStar links against — the
two stacks segfault when a queue from one is passed to the other. The
build sets CMAKE_FIND_FRAMEWORK=NEVER automatically; you also need to
point CMake at chipStar's bundled CL headers and the loader you want
CLBlast to link against:
cmake -S . -B build \
-DCMAKE_PREFIX_PATH=/path/to/chipStar/install \
-DOPENCL_INCLUDE_DIRS=/path/to/chipStar/install/include \
-DOPENCL_LIBRARIES=/path/to/ocl-icd-loader/lib/libOpenCL.dylib \
-DOpenCL_INCLUDE_DIR=/path/to/chipStar/install/include \
-DOpenCL_LIBRARY=/path/to/ocl-icd-loader/lib/libOpenCL.dylibBoth upper- and lower-case variants of OPENCL_* are needed because
CLBlast's bundled FindOpenCL.cmake uses the older spelling while CMake's
own module uses the newer one.
PoCL / limited OpenCL stacks: use CHIP_OCL_USE_ALLOC_STRATEGY=svm for
canonical HIP pointers with the USE_HOST_PTR bridge (both
test-opencl-macos and test-opencl-linux CI jobs set this). If
CLBlast returns kNoHalfPrecision (-2045) on hipblasHalf* paths, set
CHIPBLAS_SKIP_HALF_API_SURFACE so api_surface can complete (CI sets
this on both self-hosted jobs).
#include <hip/hip_runtime.h>
#include <hipblas/hipblas.h>
float *dA, *dB, *dC;
hipMalloc(&dA, M*K*sizeof(float));
hipMalloc(&dB, K*N*sizeof(float));
hipMalloc(&dC, M*N*sizeof(float));
// ... copy inputs in ...
hipblasHandle_t h;
hipblasCreate(&h);
float alpha = 1.0f, beta = 0.0f;
hipblasSgemm(h, HIPBLAS_OP_N, HIPBLAS_OP_N,
M, N, K, &alpha, dA, M, dB, K, &beta, dC, M);
hipblasDestroy(h);MIT. CLBlast is bundled as a submodule under its own Apache-2.0 license.