Merged
185 commits
a024f61
jit lto interleaved scan
divyegala Oct 2, 2025
45da4aa
fix dependencies.yaml
divyegala Oct 2, 2025
a7c8621
generate files at build time, use tags to avoid compilation of types
divyegala Oct 4, 2025
eb2d74b
passing tests
divyegala Oct 5, 2025
d2318e8
update gitignore
divyegala Oct 6, 2025
5e6afcd
separate out distance function from main kernel
divyegala Oct 6, 2025
6eee4da
fix deps
divyegala Oct 6, 2025
1de8f28
add filters as jit device functions, rework caching logic
divyegala Oct 7, 2025
84c6020
lto post lambda, cleanup files, generate cmake in build dir
divyegala Oct 7, 2025
22680c8
don't read hardcoded kernels, use generator properly
divyegala Oct 8, 2025
37f1163
random cmake changes carried over from 25.10
divyegala Oct 8, 2025
0ae5383
cmake format
divyegala Oct 8, 2025
fe56aec
remove dep on kernel list
divyegala Oct 8, 2025
40c8fd6
attempt to solve overlinking problem
divyegala Oct 9, 2025
e87a8c7
reorder if-else in compiler check
divyegala Oct 9, 2025
179d733
Merge branch 'branch-25.12' into jit-lto-ivf-flat-interleaved
divyegala Oct 9, 2025
32a67bd
use cudart apis
divyegala Oct 9, 2025
c27612e
merge
divyegala Oct 9, 2025
a4b48b1
attempt to link cudart
divyegala Oct 9, 2025
d5d692e
revert cudart link, try all arch build of jit lto fatbin sources
divyegala Oct 9, 2025
1c6dd94
cmake format
divyegala Oct 9, 2025
30f5ab6
missing shared mem setting
divyegala Oct 10, 2025
9674969
separate cuda 12 and 13 compilation
divyegala Oct 22, 2025
24fc47d
merge upstream
divyegala Oct 22, 2025
db9a487
remove bench
divyegala Oct 22, 2025
aa9294f
c include directory
divyegala Oct 22, 2025
2eb77fe
style check
divyegala Oct 22, 2025
6c685fa
merge upstream
divyegala Oct 22, 2025
3e35b99
guard cuda calls and use shared_ptr
divyegala Oct 23, 2025
d0ff62c
add AlgorithmPlanner to main target
divyegala Oct 23, 2025
eb87577
merge upstream
divyegala Oct 23, 2025
445a6c4
remove nvjitlink as cuda 12 dep
divyegala Oct 23, 2025
92a27d4
address review
divyegala Oct 24, 2025
8549172
merge upstream
divyegala Oct 24, 2025
67579f4
add include guard
divyegala Oct 27, 2025
7ad8774
add and remove couple of comments
divyegala Oct 27, 2025
816a480
merge upstream
divyegala Oct 27, 2025
ab35ef3
delete readme
divyegala Oct 27, 2025
cdd4c85
increase warmup time
divyegala Oct 27, 2025
87334b2
merge upstream
divyegala Oct 27, 2025
c1eff9f
use new copyright
divyegala Oct 27, 2025
ece09b8
new copyright
divyegala Oct 27, 2025
4dacc6e
remove one more straggling comment
divyegala Oct 27, 2025
1fd95cd
use raft expects
divyegala Oct 27, 2025
64cde0d
Merge branch 'main' into jit-lto-ivf-flat-interleaved
divyegala Oct 27, 2025
5ac127b
merge upstream
divyegala Dec 12, 2025
78002c6
address review
divyegala Dec 12, 2025
9ad6a0b
pre-commit
divyegala Dec 12, 2025
bf4c4ad
address review
divyegala Dec 12, 2025
18b2af9
Generate kernel files in CMake instead of Python
KyleFromNVIDIA Dec 12, 2025
ece5cad
Merge remote-tracking branch 'refs/remotes/github/divyegala/jit-lto-i…
KyleFromNVIDIA Dec 12, 2025
8ce70c2
Style
KyleFromNVIDIA Dec 12, 2025
fdc4239
Style
KyleFromNVIDIA Dec 12, 2025
be3cf0d
Style
KyleFromNVIDIA Dec 12, 2025
7e644c3
Lint
KyleFromNVIDIA Dec 12, 2025
235938a
Style, lint
KyleFromNVIDIA Dec 12, 2025
e3b749d
Fix nvjitlink_checker
KyleFromNVIDIA Dec 15, 2025
f42ae3f
Style
KyleFromNVIDIA Dec 15, 2025
b606df9
Merge branch 'main' into jit-lto-ivf-flat-interleaved
KyleFromNVIDIA Dec 15, 2025
5ce7aab
Refactor JIT LTO kernel compilation
KyleFromNVIDIA Dec 15, 2025
eaad347
Style
KyleFromNVIDIA Dec 15, 2025
eb3b468
pic
KyleFromNVIDIA Dec 15, 2025
912279c
style
KyleFromNVIDIA Dec 15, 2025
19f1af3
Verbose build
KyleFromNVIDIA Dec 15, 2025
087b943
static
KyleFromNVIDIA Dec 15, 2025
c16e109
style
KyleFromNVIDIA Dec 15, 2025
323b79f
TARGET_OBJECTS
KyleFromNVIDIA Dec 15, 2025
9f13e73
Disable sccache
KyleFromNVIDIA Dec 16, 2025
eaf9d39
Recache
KyleFromNVIDIA Dec 16, 2025
ce40c51
Revert CI debugging
KyleFromNVIDIA Dec 16, 2025
0d0abb9
Install and link object library
KyleFromNVIDIA Dec 17, 2025
84bfa92
Style
KyleFromNVIDIA Dec 17, 2025
21241eb
Alias
KyleFromNVIDIA Dec 17, 2025
7c0ac13
Make cuvs_jit_lto_kernels a static library
KyleFromNVIDIA Dec 17, 2025
880dbf2
Style
KyleFromNVIDIA Dec 17, 2025
d04d7c1
rapids_cuda_init_architectures() for C tests
KyleFromNVIDIA Dec 17, 2025
19581f9
Be more specific about where we search for libclang
KyleFromNVIDIA Dec 17, 2025
a61f019
More libclang updates
KyleFromNVIDIA Dec 17, 2025
2eeb913
Revert "Fix libclang download for Rust, CUDA initialization for C tests"
KyleFromNVIDIA Dec 17, 2025
55ec26c
Merge branch 'main' into jit-lto-ivf-flat-interleaved
KyleFromNVIDIA Dec 18, 2025
10228c5
Merge branch 'main' into jit-lto-ivf-flat-interleaved
KyleFromNVIDIA Dec 18, 2025
031ce21
Merge branch 'main' into jit-lto-ivf-flat-interleaved
KyleFromNVIDIA Jan 14, 2026
088c21e
Copyright
KyleFromNVIDIA Jan 14, 2026
8ca1062
Apply suggestions from code review
divyegala Jan 22, 2026
d5ab5bf
merge upstream
divyegala Jan 22, 2026
b8c0d42
address some review comments
divyegala Jan 22, 2026
17d34ae
remove too many underscores
divyegala Jan 22, 2026
45a5146
FEA Add initial commit of prototype/pseudo-code for proposed UDF APIs…
dantegd Jan 26, 2026
447532e
stitch together
divyegala Jan 30, 2026
e1627d1
add udf to cmakelists
divyegala Jan 30, 2026
f7ea581
udfs working e2e
divyegala Jan 30, 2026
8b2775c
run benchmarks
divyegala Feb 3, 2026
e9c77d9
working through
divyegala Feb 3, 2026
adcfb8f
fixed overhead
divyegala Feb 4, 2026
282b376
Simplify
KyleFromNVIDIA Feb 4, 2026
609a4d6
Merge branch 'main' into jit-lto-ivf-flat-interleaved
KyleFromNVIDIA Feb 4, 2026
3115d07
address reviews
divyegala Feb 4, 2026
bb524ae
Merge remote-tracking branch 'origin/main' into jit-lto-ivf-flat-inte…
divyegala Feb 4, 2026
30a8a9f
Merge branch 'jit-lto-ivf-flat-interleaved' of github.com:divyegala/c…
divyegala Feb 4, 2026
72ddb36
Merge branch 'main' into jit-lto-ivf-flat-interleaved
divyegala Feb 5, 2026
4bd2102
add to docs and log about jit
divyegala Feb 10, 2026
fb722f0
Merge branch 'jit-lto-ivf-flat-interleaved' of github.com:divyegala/c…
divyegala Feb 10, 2026
3523b96
Merge remote-tracking branch 'origin/main' into jit-lto-ivf-flat-inte…
divyegala Feb 10, 2026
ba758a2
address review
divyegala Feb 10, 2026
42b78ae
rename inner_product to inner_prod
divyegala Feb 10, 2026
2e3a471
Merge remote-tracking branch 'origin/main' into jit-lto-ivf-flat-inte…
divyegala Feb 10, 2026
bfc6c09
fix merge conflict
divyegala Feb 10, 2026
f6377fa
include header and form better log
divyegala Feb 10, 2026
26abc7b
Merge branch 'jit-lto-ivf-flat-interleaved' into ivf-flat-search-udf
divyegala Feb 10, 2026
fb7f105
merge
divyegala Feb 10, 2026
533b770
address review and move
divyegala Feb 11, 2026
af23585
Merge remote-tracking branch 'origin/main' into jit-lto-ivf-flat-inte…
divyegala Feb 11, 2026
78c59d9
one more fix
divyegala Feb 11, 2026
7f8802b
correct path
divyegala Feb 11, 2026
27acbb6
merge upstream
divyegala Feb 13, 2026
d11edfd
Merge branch 'jit-lto-ivf-flat-interleaved' into ivf-flat-search-udf
divyegala Feb 13, 2026
64f6ad8
merge upstream
divyegala Feb 15, 2026
f1888a2
more cleaning
divyegala Feb 15, 2026
b596e79
merge cleanly
divyegala Feb 15, 2026
9c4980f
add nvrtc as a dependency
divyegala Feb 15, 2026
f27eeb2
fix build errors
divyegala Feb 15, 2026
bc5c90e
guard udf use
divyegala Feb 15, 2026
55c32f4
compiler definition on headers
divyegala Feb 15, 2026
1866475
guard udf test
divyegala Feb 15, 2026
3e9f5f3
Merge branch 'main' into ivf-flat-search-udf
divyegala Feb 17, 2026
caf8d03
Merge branch 'main' into ivf-flat-search-udf
divyegala Feb 18, 2026
736dc75
Ignore cache-host run exports
bdice Feb 18, 2026
f83f595
Merge branch 'main' into ivf-flat-search-udf
divyegala Feb 18, 2026
88a4b6e
respond to reviews
divyegala Feb 19, 2026
101c5ee
Merge remote-tracking branch 'origin/main' into ivf-flat-search-udf
divyegala Feb 19, 2026
5d3a9df
Merge branch 'ivf-flat-search-udf' of github.com:divyegala/cuvs into …
divyegala Feb 19, 2026
63c7300
pin cupy to <14.0 for cuda 12 wheels
divyegala Feb 19, 2026
faa9339
add includes
divyegala Feb 19, 2026
73e8fa0
fix logging
divyegala Feb 19, 2026
fef68d3
fix macro
divyegala Feb 19, 2026
995f998
Merge branch 'main' into ivf-flat-search-udf
divyegala Feb 20, 2026
3256a8e
attempt to fix devcontainer error
divyegala Feb 20, 2026
32a5d9f
Merge remote-tracking branch 'origin/main' into ivf-flat-search-udf
divyegala Feb 20, 2026
592af70
Merge branch 'ivf-flat-search-udf' of github.com:divyegala/cuvs into …
divyegala Feb 20, 2026
43501b7
address review comments
divyegala Feb 20, 2026
56467e8
merge upstream
divyegala Mar 3, 2026
748daec
remove < cupy 14
divyegala Mar 3, 2026
8481346
add missing TU
divyegala Mar 3, 2026
d22b048
express deps
divyegala Mar 4, 2026
db75968
remove genex
divyegala Mar 4, 2026
9e0ba6c
typo
divyegala Mar 4, 2026
a67d653
case sensitive
divyegala Mar 4, 2026
3f09b32
Add cudart to cuda-toolkit extras
bdice Mar 4, 2026
77114a4
Merge remote-tracking branch 'origin/main' into ivf-flat-search-udf
divyegala Mar 4, 2026
9069280
fix tests
divyegala Mar 4, 2026
81ed0a2
Apply suggestions from code review
divyegala Mar 5, 2026
342b5cc
address reviews
divyegala Mar 5, 2026
492f293
Merge branch 'main' into ivf-flat-search-udf
divyegala Mar 5, 2026
705dcf9
fix
divyegala Mar 6, 2026
ac10f8d
Merge branch 'ivf-flat-search-udf' of github.com:divyegala/cuvs into …
divyegala Mar 6, 2026
4d61d4d
Merge branch 'main' into ivf-flat-search-udf
divyegala Mar 6, 2026
cb3e23d
address review
divyegala Mar 11, 2026
217d42d
Merge branch 'main' of github.com:rapidsai/cuvs into ivf-flat-search-udf
divyegala Mar 11, 2026
bf7de27
add more docs
divyegala Mar 12, 2026
1407f60
Merge remote-tracking branch 'origin/release/26.04' into ivf-flat-sea…
divyegala Mar 12, 2026
0fbfaae
tests running
divyegala Apr 2, 2026
27f28fe
Merge remote-tracking branch 'upstream/main' into ivf-flat-search-udf
divyegala Apr 2, 2026
c87b67c
merge better
divyegala Apr 2, 2026
6d83226
Merge branch 'main' into ivf-flat-search-udf
divyegala Apr 6, 2026
569706b
Merge remote-tracking branch 'origin/main' into ivf-flat-search-udf
divyegala Apr 8, 2026
ffadc27
address review feedback part 1
divyegala Apr 9, 2026
2994c62
address reviews for tests, try to add fp16 embedding with AI
divyegala Apr 9, 2026
9b24f05
no embedding headers
divyegala Apr 13, 2026
dea79bd
no embedding headers
divyegala Apr 13, 2026
2756334
update ivf pq
divyegala Apr 13, 2026
852684b
comprehensive type checks for wheel builds
divyegala Apr 13, 2026
d1c5d29
exclude nvrtc
divyegala Apr 13, 2026
33db1d8
clean up recipe
divyegala Apr 14, 2026
27abacc
Merge branch 'main' into ivf-flat-search-udf
divyegala Apr 14, 2026
56de24a
ignore run export
divyegala Apr 14, 2026
b6c1d85
Merge remote-tracking branch 'origin/main' into ivf-flat-search-udf
divyegala Apr 14, 2026
564f91f
Merge branch 'ivf-flat-search-udf' of github.com:divyegala/cuvs into …
divyegala Apr 14, 2026
6749933
attempt to fix dp4a link issues; other reviews
divyegala Apr 14, 2026
906bd98
add chebyshev
divyegala Apr 14, 2026
4a76e2f
add expect throw test
divyegala Apr 14, 2026
f63d40a
try different nvrtc version get
divyegala Apr 15, 2026
f0cc886
brackets
divyegala Apr 15, 2026
9a346c6
merge upstream
divyegala Apr 15, 2026
b52ce36
comments
divyegala Apr 15, 2026
3fde027
review comments
divyegala Apr 15, 2026
1 change: 1 addition & 0 deletions ci/build_wheel.sh
@@ -42,6 +42,7 @@ EXCLUDE_ARGS=(
--exclude "libcusparse.so.*"
--exclude "libnccl.so.*"
--exclude "libnvJitLink.so.*"
--exclude "libnvrtc.so.*"
--exclude "libraft.so"
--exclude "librapids_logger.so"
--exclude "librmm.so"
1 change: 1 addition & 0 deletions conda/environments/all_cuda-129_arch-aarch64.yaml
@@ -12,6 +12,7 @@ dependencies:
- clang==20.1.8
- cmake>=3.30.4
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-python>=12.9.2,<13.0
1 change: 1 addition & 0 deletions conda/environments/all_cuda-129_arch-x86_64.yaml
@@ -12,6 +12,7 @@ dependencies:
- clang==20.1.8
- cmake>=3.30.4
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-python>=12.9.2,<13.0
1 change: 1 addition & 0 deletions conda/environments/all_cuda-131_arch-aarch64.yaml
@@ -12,6 +12,7 @@ dependencies:
- clang==20.1.8
- cmake>=3.30.4
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-python>=13.0.1,<14.0
1 change: 1 addition & 0 deletions conda/environments/all_cuda-131_arch-x86_64.yaml
@@ -12,6 +12,7 @@ dependencies:
- clang==20.1.8
- cmake>=3.30.4
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-python>=13.0.1,<14.0
1 change: 1 addition & 0 deletions conda/environments/bench_ann_cuda-129_arch-aarch64.yaml
@@ -11,6 +11,7 @@ dependencies:
- click
- cmake>=3.30.4
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-python>=12.9.2,<13.0
1 change: 1 addition & 0 deletions conda/environments/bench_ann_cuda-129_arch-x86_64.yaml
@@ -11,6 +11,7 @@ dependencies:
- click
- cmake>=3.30.4
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-python>=12.9.2,<13.0
1 change: 1 addition & 0 deletions conda/environments/bench_ann_cuda-131_arch-aarch64.yaml
@@ -11,6 +11,7 @@ dependencies:
- click
- cmake>=3.30.4
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-python>=13.0.1,<14.0
1 change: 1 addition & 0 deletions conda/environments/bench_ann_cuda-131_arch-x86_64.yaml
@@ -11,6 +11,7 @@ dependencies:
- click
- cmake>=3.30.4
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-python>=13.0.1,<14.0
1 change: 1 addition & 0 deletions conda/environments/go_cuda-129_arch-aarch64.yaml
@@ -12,6 +12,7 @@ dependencies:
- cmake>=3.30.4
- cuda-cudart-dev
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-version=12.9
1 change: 1 addition & 0 deletions conda/environments/go_cuda-129_arch-x86_64.yaml
@@ -12,6 +12,7 @@ dependencies:
- cmake>=3.30.4
- cuda-cudart-dev
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-version=12.9
1 change: 1 addition & 0 deletions conda/environments/go_cuda-131_arch-aarch64.yaml
@@ -12,6 +12,7 @@ dependencies:
- cmake>=3.30.4
- cuda-cudart-dev
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-version=13.1
1 change: 1 addition & 0 deletions conda/environments/go_cuda-131_arch-x86_64.yaml
@@ -12,6 +12,7 @@ dependencies:
- cmake>=3.30.4
- cuda-cudart-dev
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-version=13.1
1 change: 1 addition & 0 deletions conda/environments/rust_cuda-129_arch-aarch64.yaml
@@ -11,6 +11,7 @@ dependencies:
- cmake>=3.30.4
- cuda-cudart-dev
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-version=12.9
1 change: 1 addition & 0 deletions conda/environments/rust_cuda-129_arch-x86_64.yaml
@@ -11,6 +11,7 @@ dependencies:
- cmake>=3.30.4
- cuda-cudart-dev
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-version=12.9
1 change: 1 addition & 0 deletions conda/environments/rust_cuda-131_arch-aarch64.yaml
@@ -11,6 +11,7 @@ dependencies:
- cmake>=3.30.4
- cuda-cudart-dev
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-version=13.1
1 change: 1 addition & 0 deletions conda/environments/rust_cuda-131_arch-x86_64.yaml
@@ -11,6 +11,7 @@ dependencies:
- cmake>=3.30.4
- cuda-cudart-dev
- cuda-nvcc
- cuda-nvrtc-dev
- cuda-nvtx-dev
- cuda-profiler-api
- cuda-version=13.1
16 changes: 15 additions & 1 deletion conda/recipes/libcuvs/recipe.yaml
@@ -78,6 +78,7 @@ cache:
- nccl ${{ nccl_version }}
- cuda-version =${{ cuda_version }}
- cuda-cudart-dev
- cuda-nvrtc-dev
- cuda-profiler-api
- libcublas-dev
- libcurand-dev
@@ -114,6 +115,7 @@ outputs:
- nccl ${{ nccl_version }}
- cuda-version =${{ cuda_version }}
- cuda-cudart-dev
- cuda-nvrtc-dev
- cuda-profiler-api
- libcublas-dev
- libcurand-dev
@@ -124,6 +126,7 @@ outputs:
- ${{ pin_compatible("cuda-version", upper_bound="x", lower_bound="x") }}
- libraft-headers =${{ minor_version }}
- librmm =${{ minor_version }}
- cuda-nvrtc
- nccl
- libcublas
- libcurand
@@ -132,6 +135,7 @@ outputs:
ignore_run_exports:
by_name:
- cuda-cudart
- cuda-nvrtc
- cuda-version
- libaio
- libboost
@@ -173,6 +177,7 @@ outputs:
- nccl ${{ nccl_version }}
- cuda-version =${{ cuda_version }}
- cuda-cudart-dev
- cuda-nvrtc-dev
- cuda-profiler-api
- libcublas-dev
- libcurand-dev
@@ -184,6 +189,7 @@ outputs:
- ${{ pin_subpackage("libcuvs-headers", exact=True) }}
- libraft-headers =${{ minor_version }}
- librmm =${{ minor_version }}
- cuda-nvrtc
- nccl
- libcublas
- libcurand
@@ -192,6 +198,7 @@ outputs:
ignore_run_exports:
by_name:
- cuda-cudart
- cuda-nvrtc
- cuda-version
- libaio
- libboost
@@ -231,6 +238,7 @@ outputs:
- nccl ${{ nccl_version }}
- cuda-version =${{ cuda_version }}
- cuda-cudart-dev
- cuda-nvrtc-dev
- cuda-profiler-api
- libcublas-dev
- libcurand-dev
@@ -242,6 +250,7 @@ outputs:
- ${{ pin_subpackage("libcuvs-headers", exact=True) }}
- libraft-headers =${{ minor_version }}
- librmm =${{ minor_version }}
- cuda-nvrtc
- nccl
- libcublas
- libcurand
@@ -250,6 +259,7 @@ outputs:
ignore_run_exports:
by_name:
- cuda-cudart
- cuda-nvrtc
- cuda-version
- libaio
- libboost
@@ -306,6 +316,7 @@ outputs:
ignore_run_exports:
by_name:
- cuda-cudart
- cuda-nvrtc
- cuda-version
- libaio
- libboost
@@ -394,10 +405,12 @@ outputs:
- libcusolver-dev
- libcusparse-dev
- libnvjitlink-dev
- cuda-nvrtc-dev
run:
- ${{ pin_subpackage("libcuvs-headers", exact=True) }}
- ${{ pin_subpackage("libcuvs", exact=True) }}
- ${{ pin_compatible("cuda-version", upper_bound="x", lower_bound="x") }}
- cuda-nvrtc
- nccl
- libcublas
- libcurand
@@ -406,6 +419,7 @@ outputs:
ignore_run_exports:
by_name:
- cuda-cudart
- cuda-nvrtc
- cuda-version
- libaio
- libboost
@@ -478,14 +492,14 @@ outputs:
ignore_run_exports:
by_name:
- cuda-cudart
- cuda-nvrtc
- cuda-version
- libcublas
- libcurand
- libcusolver
- libcusparse
- librmm
- mkl
- nccl
about:
homepage: ${{ load_from_file("python/cuvs_bench/pyproject.toml").project.urls.Homepage }}
license: ${{ load_from_file("python/cuvs_bench/pyproject.toml").project.license }}
10 changes: 8 additions & 2 deletions cpp/CMakeLists.txt
@@ -847,6 +847,7 @@ if(NOT BUILD_CPU_ONLY)
src/detail/jit_lto/AlgorithmPlanner.cpp
src/detail/jit_lto/FragmentEntry.cpp
src/detail/jit_lto/nvjitlink_checker.cpp
src/detail/jit_lto/NVRTCLTOFragmentCompiler.cpp
src/distance/detail/kernels/gram_matrix.cu
src/distance/detail/kernels/kernel_factory.cu
src/distance/detail/kernels/kernel_matrices.cu
@@ -1033,8 +1034,12 @@ if(NOT BUILD_CPU_ONLY)
$<BUILD_LOCAL_INTERFACE:$<TARGET_NAME_IF_EXISTS:NCCL::NCCL>>
$<BUILD_LOCAL_INTERFACE:$<TARGET_NAME_IF_EXISTS:hnswlib::hnswlib>>
$<$<BOOL:${CUVS_NVTX}>:CUDA::nvtx3>
-PRIVATE rmm::rmm $<TARGET_NAME_IF_EXISTS:OpenMP::OpenMP_CXX>
-$<COMPILE_ONLY:nvidia::cutlass::cutlass> $<COMPILE_ONLY:cuco::cuco> CUDA::nvJitLink
+PRIVATE rmm::rmm
+$<TARGET_NAME_IF_EXISTS:OpenMP::OpenMP_CXX>
+$<COMPILE_ONLY:nvidia::cutlass::cutlass>
+$<COMPILE_ONLY:cuco::cuco>
+CUDA::nvJitLink
+CUDA::nvrtc
)
set_property(TARGET cuvs PROPERTY NO_CUDART_DEP ON)

@@ -1093,6 +1098,7 @@ SECTIONS
PRIVATE rmm::rmm
$<TARGET_NAME_IF_EXISTS:OpenMP::OpenMP_CXX>
CUDA::nvJitLink
CUDA::nvrtc
$<$<BOOL:${CUVS_NVTX}>:CUDA::nvtx3>
$<COMPILE_ONLY:nvidia::cutlass::cutlass>
$<COMPILE_ONLY:cuco::cuco>
2 changes: 0 additions & 2 deletions cpp/include/cuvs/detail/jit_lto/AlgorithmLauncher.hpp
@@ -44,5 +44,3 @@ struct AlgorithmLauncher {
cudaKernel_t kernel;
cudaLibrary_t library;
};
-
-std::unordered_map<std::string, std::shared_ptr<AlgorithmLauncher>>& get_cached_launchers();
16 changes: 15 additions & 1 deletion cpp/include/cuvs/detail/jit_lto/AlgorithmPlanner.hpp
@@ -6,16 +6,26 @@
#pragma once

#include <memory>
#include <shared_mutex>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>
#include <vector>

#include "AlgorithmLauncher.hpp"
#include "FragmentEntry.hpp"

struct LauncherJitCache {
std::shared_mutex mutex;
std::unordered_map<std::string, std::shared_ptr<AlgorithmLauncher>> launchers;
};

struct AlgorithmPlanner {
-AlgorithmPlanner(std::string entrypoint) : entrypoint(std::move(entrypoint)) {}
+AlgorithmPlanner(std::string entrypoint, LauncherJitCache& jit_cache)
+: entrypoint(std::move(entrypoint)), jit_cache_(jit_cache)
+{
+}

std::shared_ptr<AlgorithmLauncher> get_launcher();

@@ -37,4 +47,8 @@ struct AlgorithmPlanner {
private:
std::string get_fragments_key() const;
std::shared_ptr<AlgorithmLauncher> build();

std::shared_ptr<AlgorithmLauncher> read_cache(std::string const& launch_key) const;

LauncherJitCache& jit_cache_;
};
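
The LauncherJitCache above pairs a std::shared_mutex with a map of launchers, and AlgorithmPlanner gains a const read_cache() next to build(). One common way to use such a pair is a shared-lock lookup followed by an exclusive-lock re-check before building. The sketch below is only an illustration of that pattern, not the PR's implementation: Launcher, LauncherCache, and get_or_build are hypothetical stand-ins, and building under the exclusive lock is an assumption.

```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for AlgorithmLauncher (the real type holds a
// cudaKernel_t and a cudaLibrary_t).
struct Launcher {
  std::string key;
};

// Same shape as LauncherJitCache in the diff, with the stand-in type.
struct LauncherCache {
  std::shared_mutex mutex;
  std::unordered_map<std::string, std::shared_ptr<Launcher>> launchers;
};

// Shared (reader) lock: any number of threads may look up concurrently.
inline std::shared_ptr<Launcher> read_cache(LauncherCache& cache, std::string const& key)
{
  std::shared_lock<std::shared_mutex> lock(cache.mutex);
  auto it = cache.launchers.find(key);
  return it == cache.launchers.end() ? nullptr : it->second;
}

// Fast path under the shared lock; slow path takes the exclusive lock and
// re-checks, because another thread may have inserted the entry in between.
inline std::shared_ptr<Launcher> get_or_build(LauncherCache& cache, std::string const& key)
{
  if (auto hit = read_cache(cache, key)) { return hit; }

  std::unique_lock<std::shared_mutex> lock(cache.mutex);
  auto it = cache.launchers.find(key);
  if (it != cache.launchers.end()) { return it->second; }

  // Assumption: the expensive link step would run here, under the lock.
  auto built = std::make_shared<Launcher>(Launcher{key});
  cache.launchers.emplace(key, built);
  return built;
}
```

Passing the cache in by reference, as the new AlgorithmPlanner constructor does, lets each algorithm own its cache instead of sharing the removed global get_cached_launchers() map.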
17 changes: 17 additions & 0 deletions cpp/include/cuvs/detail/jit_lto/FragmentEntry.hpp
@@ -45,3 +45,20 @@ struct StaticFatbinFragmentEntry final : FatbinFragmentEntry {
static const uint8_t* const data;
static const size_t length;
};

struct UDFFatbinFragment final : FatbinFragmentEntry {
UDFFatbinFragment(std::string key, std::vector<uint8_t> bytes)
: key_(std::move(key)), bytes_(std::move(bytes))
{
}

const uint8_t* get_data() const override { return bytes_.data(); }

size_t get_length() const override { return bytes_.size(); }

const char* get_key() const override { return key_.c_str(); }

private:
std::string key_;
std::vector<uint8_t> bytes_;
};
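
Unlike StaticFatbinFragmentEntry, whose data and length are static members, UDFFatbinFragment owns its key and bytes, so the pointers it hands out stay valid for as long as the fragment lives. A minimal sketch of that ownership model follows. FragmentBase, OwnedFragment, and total_bytes are hypothetical names; the real FatbinFragmentEntry base is not shown in this diff and may declare more than these three accessors.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the FatbinFragmentEntry interface.
struct FragmentBase {
  virtual ~FragmentBase() = default;
  virtual const std::uint8_t* get_data() const = 0;
  virtual std::size_t get_length() const       = 0;
  virtual const char* get_key() const          = 0;
};

// Owns both the key and the compiled bytes, mirroring UDFFatbinFragment.
struct OwnedFragment final : FragmentBase {
  OwnedFragment(std::string key, std::vector<std::uint8_t> bytes)
    : key_(std::move(key)), bytes_(std::move(bytes))
  {
  }

  const std::uint8_t* get_data() const override { return bytes_.data(); }
  std::size_t get_length() const override { return bytes_.size(); }
  const char* get_key() const override { return key_.c_str(); }

 private:
  std::string key_;
  std::vector<std::uint8_t> bytes_;
};

// Example consumer that only sees the base interface, as a linker step would.
inline std::size_t total_bytes(std::vector<FragmentBase const*> const& fragments)
{
  std::size_t total = 0;
  for (auto const* fragment : fragments) { total += fragment->get_length(); }
  return total;
}
```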
29 changes: 29 additions & 0 deletions cpp/include/cuvs/detail/jit_lto/NVRTCLTOFragmentCompiler.hpp
@@ -0,0 +1,29 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION.
* SPDX-License-Identifier: Apache-2.0
*/

#pragma once

#include <cuvs/detail/jit_lto/FragmentEntry.hpp>

#include <memory>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <vector>

struct NVRTCLTOFragmentCompiler {
NVRTCLTOFragmentCompiler();

std::vector<std::string> standard_compile_opts;
std::unordered_map<std::string, std::vector<uint8_t>> cache;
mutable std::shared_mutex cache_mutex_;

std::unique_ptr<UDFFatbinFragment> compile(std::string const& key, std::string const& code);

private:
std::unique_ptr<UDFFatbinFragment> read_cache(std::string const& key) const;
};

NVRTCLTOFragmentCompiler& nvrtc_compiler();
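
The header declares compile(key, code) alongside a byte cache keyed by string, which suggests each distinct UDF source is compiled once and then served from the cache. The sketch below illustrates that memoization with an injected backend so it runs without CUDA. MemoizingCompiler and its backend callback are hypothetical; in the PR the backend would presumably be NVRTC producing LTO IR, and compiling outside the lock is an assumption of this sketch.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical memoizing front end; `backend` stands in for the real compiler.
struct MemoizingCompiler {
  explicit MemoizingCompiler(std::function<Bytes(std::string const&)> backend)
    : backend_(std::move(backend))
  {
  }

  // Returns a copy of the cached bytes so every caller owns its own buffer.
  Bytes compile(std::string const& key, std::string const& code)
  {
    {
      std::shared_lock<std::shared_mutex> read_lock(mutex_);
      auto it = cache_.find(key);
      if (it != cache_.end()) { return it->second; }
    }

    // Compile without holding the lock so readers are not blocked.
    Bytes compiled = backend_(code);

    std::unique_lock<std::shared_mutex> write_lock(mutex_);
    // emplace keeps the existing entry if another thread inserted first.
    return cache_.emplace(key, std::move(compiled)).first->second;
  }

 private:
  std::function<Bytes(std::string const&)> backend_;
  std::unordered_map<std::string, Bytes> cache_;
  std::shared_mutex mutex_;
};
```

Because the lookup uses only the key, the caller has to make sure the key uniquely identifies the source text; a second call with the same key and different code returns the first result.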
@@ -22,6 +22,7 @@ struct tag_acc_u32 {};
// Tag types for distance metrics with full template info
struct tag_metric_euclidean {};
struct tag_metric_inner_product {};
struct tag_metric_custom_udf {};

// Tag types for post-processing
struct tag_post_process_identity {};
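
Empty tag structs like these carry no data and exist only to select code at compile time. The sketch below shows the general tag-dispatch idiom with two of the tag names from this hunk. The accumulate overloads and compute_distance are hypothetical and run on the host; the PR appears to use the tags to pick JIT LTO fragments, which this example does not attempt to reproduce.

```cpp
// Tag names mirror the header above; everything else here is hypothetical.
struct tag_metric_euclidean {};
struct tag_metric_inner_product {};

// One overload per tag: overload resolution picks the metric at compile time.
inline float accumulate(tag_metric_euclidean, float acc, float x, float y)
{
  float const diff = x - y;
  return acc + diff * diff;  // squared L2 contribution
}

inline float accumulate(tag_metric_inner_product, float acc, float x, float y)
{
  return acc + x * y;
}

// Generic loop; the tag type parameter decides which overload is called.
template <typename MetricTag>
float compute_distance(float const* a, float const* b, int n)
{
  float acc = 0.0f;
  for (int i = 0; i < n; ++i) {
    acc = accumulate(MetricTag{}, acc, a[i], b[i]);
  }
  return acc;
}
```

A new tag such as tag_metric_custom_udf would slot in by adding one more overload, leaving the generic loop untouched.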