Describe the bug
On branch-2.2, compiling the Cascaded algorithm with the -G flag changes the static shared memory alignment behavior, causing misalignment errors.
Steps/Code to reproduce bug
- Modify CMakeLists.txt:
set(CMAKE_CUDA_FLAGS_DEBUG "${CMAKE_CUDA_FLAGS_DEBUG};-g")
set(CMAKE_CUDA_FLAGS_DEBUG "${CMAKE_CUDA_FLAGS_DEBUG};-G")
- Run tests/test_cascaded.cpp with the following test case:
TEST_CASE("comp/decomp cascaded-small-uint64", "[nvcomp][small]")
To get the debug build to compile, I also reduced default_chunk_size from 4096 to 2048.
Expected behavior
The test should pass without errors.
Additional context
After debugging, I explicitly declared alignment for shared memory allocation in the following files:
CascadedHlifKernels.cu:122
__shared__ __align__(sizeof(data_type)) uint8_t shmem[shmem_size];
CascadedKernels.cuh:801
__shared__ __align__(sizeof(data_type)) uint32_t chunk_metadata[max_chunk_metadata_size / sizeof(uint32_t)];
After making these changes, all tests in test_cascaded.cpp passed. Correctness should not depend on the compiler's optimization level, so I believe the missing explicit alignment is a bug.
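For illustration only, here is a minimal host-side C++ sketch of why the explicit alignment matters. It is not nvCOMP code and it does not reproduce the -G behavior. `alignas` stands in for CUDA's `__align__`, and `RawBytes`, `AlignedBytes`, `is_aligned`, and `buffer_is_aligned_for` are hypothetical names introduced for this example. The point is that a plain uint8_t array only guarantees 1-byte alignment, so reinterpreting it as a wider type is safe only if the compiler happens to place it on a suitable boundary.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if addr is a multiple of alignment.
constexpr bool is_aligned(std::uintptr_t addr, std::size_t alignment) {
  return addr % alignment == 0;
}

// Byte buffer with no alignment request. The language only guarantees
// alignof(std::uint8_t), so any wider alignment is up to the compiler.
template <std::size_t N>
struct RawBytes {
  std::uint8_t bytes[N];
};

// Same storage, but explicitly aligned for T. This mirrors the
// __align__(sizeof(data_type)) qualifier added in the fix above.
template <typename T, std::size_t N>
struct AlignedBytes {
  alignas(sizeof(T)) std::uint8_t bytes[N];
};

// Runtime check that the buffer can be safely viewed as an array of T.
template <typename T, std::size_t N>
bool buffer_is_aligned_for(const AlignedBytes<T, N>& buf) {
  return is_aligned(reinterpret_cast<std::uintptr_t>(buf.bytes), alignof(T));
}
```

With the explicit request, `buffer_is_aligned_for` holds for every instance of `AlignedBytes<std::uint64_t, 64>` regardless of optimization level, whereas `RawBytes<64>` carries no such guarantee.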
Environment details
- Environment location: Ubuntu-22.04
- Driver Version: 555.99
- CUDA Version: 12.5
- GPU: NVIDIA GeForce RTX 3080
- Method of nvCOMP install: branch-2.2 source code