Summary
The chipStar module cache (~/.cache/chipStar/) is write-only for hipRTC-compiled kernels. Each run writes new cache entries that are never loaded on subsequent runs, causing unbounded disk growth with no cross-run speedup.
Statically compiled HIP programs (typical hipcc approach) are not affected -- their cache works reasonably.
Root Cause
hipRTC's SPIR-V output is non-deterministic across runs for two independent reasons:
1. Random temp directory paths embedded in SPIR-V
hiprtcCompileProgram() creates a random temporary directory and its absolute path leaks into the compiled SPIR-V through four places in spirv_hiprtc.cc:
// createCompileCommand():
Append("-I" + WorkingDirectory.string()); // -I flag with random path
Append(SourceFile.string()); // absolute source path
Append(OutputFile.string()); // absolute output path
// createSourceFile() — lowered names magic variable:
File << ... << LoweredNamesFile << ";"; // absolute path as string literal
Clang embeds these paths in SPIR-V metadata (DICompileUnit, module ID). The _chip_name_expr_output_file string literal (used by the HipEmitLoweredNames pass) is compiled directly into the SPIR-V. When users pass -g, clang also embeds the working directory as DW_AT_comp_dir.
2. LLVM non-deterministic code generation
Even after fixing all path-related issues, LLVM produces non-deterministic SPIR-V for some kernels. Observed differences include basic block names with different numeric suffixes (loadstoreloop264 vs loadstoreloop252) and different SPIR-V instruction IDs for the same logical blocks. This is a known class of LLVM issue caused by hash table iteration order depending on pointer addresses.
In testing with 5 real-world kernels, 2 produced deterministic SPIR-V after path fixes; 3 remained non-deterministic (36–1327 differing bytes in ~125K object files).
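The byte counts above can be reproduced with a simple positional comparison of two object files from separate runs. The following is a minimal sketch, not the tool used for the numbers in this report; countDifferingBytes is a hypothetical helper name, and it assumes both files have already been read into memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts byte positions at which two buffers differ.
// Any length mismatch is added to the count, so a result of 0 means the
// two buffers are byte-for-byte identical.
std::size_t countDifferingBytes(const std::vector<std::uint8_t> &A,
                                const std::vector<std::uint8_t> &B) {
  const std::size_t Common = std::min(A.size(), B.size());
  std::size_t Diff = std::max(A.size(), B.size()) - Common;
  for (std::size_t I = 0; I < Common; ++I)
    if (A[I] != B[I])
      ++Diff;
  return Diff;
}
```

A positional comparison overcounts when bytes are inserted or removed, since everything after the shift is reported as different, so it is only a rough indicator.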
Observed Behavior
$ rm -rf ~/.cache/chipStar/*
$ CHIP_LOGLEVEL=info ./my_hiprtc_program # run 1
# Log shows "Kernel compilation took X seconds" (no "Loaded from cache")
$ ls ~/.cache/chipStar/ | wc -l
3
$ CHIP_LOGLEVEL=info ./my_hiprtc_program # run 2
# Still no "Loaded from cache"
$ ls ~/.cache/chipStar/ | wc -l
6 # 3 new files, different names
Cache grows by N files per run (one per kernel module). Entries are never read. Timings are identical whether the cache directory is populated or empty.
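The new file names on every run are consistent with a cache keyed on the module's bytes. The sketch below illustrates that mechanism under the assumption (not confirmed here against the chipStar source) that the key is a content hash of the SPIR-V binary. FNV-1a stands in for whatever hash the real cache uses, and fakeModule is a hypothetical stand-in for a compiled module with an embedded path.

```cpp
#include <cstdint>
#include <string>

// FNV-1a 64-bit hash; an assumed stand-in for the real cache key function.
std::uint64_t hashModuleBytes(const std::string &Bytes) {
  std::uint64_t H = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char C : Bytes) {
    H ^= C;
    H *= 1099511628211ULL; // FNV prime
  }
  return H;
}

// Hypothetical module image: identical kernel code, but the compile
// directory is embedded in the bytes, as it is in the hipRTC output.
std::string fakeModule(const std::string &CompileDir) {
  return "kernel-body;" + CompileDir + "/source.hip";
}
```

With this model, fakeModule("/tmp/chip-a") and fakeModule("/tmp/chip-b") hash to different keys even though the kernel is unchanged, so the second run misses the cache and writes a new entry. If the embedded directory is the same on both runs (for example "."), the keys match.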
Proposed Fix
Path fixes (necessary for any future cache solution)
- cd into the working directory in executeCommand()
- Remove -I<WorkingDirectory>: clang searches the source file's directory by default for quoted includes
- Use SourceFile.filename() and OutputFile.filename() instead of absolute paths
- Use LoweredNamesFile.filename() in createSourceFile() for the _chip_name_expr_output_file magic variable
- Add -fdebug-compilation-dir=. to prevent cwd embedding when -g is used
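The path fixes above could look roughly like the following. This is a sketch rather than the actual patch to spirv_hiprtc.cc: makeRelativeCompileArgs is a hypothetical function, the "-o" argument is an assumption about how the output file is passed, and the caller is assumed to have changed into the working directory before running the command.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: builds the path-dependent part of the compile command
// using file names only, so the random temp directory never reaches clang.
// Assumes the command is executed with the temp directory as cwd.
std::vector<std::string> makeRelativeCompileArgs(const fs::path &SourceFile,
                                                 const fs::path &OutputFile) {
  return {
      "-fdebug-compilation-dir=.",     // keep cwd out of debug info under -g
      SourceFile.filename().string(),  // relative source path
      "-o",                            // assumed output flag
      OutputFile.filename().string(),  // relative output path
  };
}
```

Two runs with different temp directories but the same file names then produce identical argument lists, which removes the path-related source of differences in the SPIR-V.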
Disable cache for dynamically loaded modules
Since LLVM non-determinism cannot be fixed at the chipStar level, skip cache load() and save() for modules loaded via hipModuleLoadData():
- Add IsDynamicLoad flag to SPVModule (set in hipModuleLoadDataInternal)
- Check the flag in CHIPModuleOpenCL::compile() to skip cache operations
This prevents unbounded growth while leaving the cache fully functional for statically compiled HIP programs.
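The control flow of this change can be sketched as follows. ModuleSketch, CacheSketch and compileSketch are hypothetical simplified types for illustration only; they are not the real SPVModule or CHIPModuleOpenCL interfaces, and the in-memory map stands in for the on-disk cache.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for SPVModule with the proposed flag.
struct ModuleSketch {
  std::string Binary;
  bool IsDynamicLoad = false; // would be set in hipModuleLoadDataInternal
};

// Hypothetical in-memory stand-in for the on-disk module cache.
struct CacheSketch {
  std::map<std::string, std::string> Entries;

  std::optional<std::string> load(const std::string &Key) const {
    auto It = Entries.find(Key);
    if (It == Entries.end())
      return std::nullopt;
    return It->second;
  }

  void save(const std::string &Key, const std::string &Compiled) {
    Entries[Key] = Compiled;
  }
};

// Skips both load() and save() for dynamically loaded modules, so they
// never add entries; other modules use the cache as before.
std::string compileSketch(const ModuleSketch &M, CacheSketch &Cache) {
  if (!M.IsDynamicLoad)
    if (auto Hit = Cache.load(M.Binary))
      return *Hit;
  std::string Compiled = "compiled:" + M.Binary; // placeholder for real work
  if (!M.IsDynamicLoad)
    Cache.save(M.Binary, Compiled);
  return Compiled;
}
```

Skipping load() as well as save() avoids paying for a lookup that, given the non-determinism described above, would almost never hit.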
Workaround
export CHIP_MODULE_CACHE_DIR=""
Environment
- macOS on Apple Silicon (ARM64), POCL as OpenCL backend
- chipStar main branch (post-LLVM 21 merge)
- POCL main branch with LLVM 21
- CHIP_BE=opencl, CHIP_DEVICE_TYPE=pocl
- The LLVM non-determinism is not platform-specific and likely affects other backends