[FEA] View Type PQ Preprocessor by tarang-jain · Pull Request #1764 · rapidsai/cuvs

tarang-jain · 2026-02-03T11:56:29Z

Separate VPQ codebooks from encoded data by introducing vpq_codebooks<MathT> (PIMPL with owning/view variants). vpq_dataset is now a simple {codebooks, data} struct. quantizer<T> holds only vpq_codebooks since it doesn't need encoded data. Adds a view-type pq::build() overload for pre-computed external codebooks.
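To make the layout in the description concrete, here is a minimal host-only sketch. It is not the cuVS implementation: `std::vector` and the hand-rolled `host_matrix` / `matrix_view` types stand in for RAFT device matrices and views, and `vpq_codebooks_iface`, `vpq_codebooks_view`, and all member names are assumptions made for illustration. Only the names `vpq_codebooks`, `vpq_codebooks_owning`, `vpq_dataset`, and `quantizer` come from the PR, and the quantizer's parameters are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a non-owning row-major matrix view (not a RAFT type).
template <typename T>
struct matrix_view {
  const T* ptr;
  std::size_t rows;
  std::size_t cols;
};

// Hypothetical stand-in for an owning matrix (not a RAFT type).
template <typename T>
struct host_matrix {
  std::vector<T> values;
  std::size_t rows;
  std::size_t cols;
  matrix_view<T> view() const { return {values.data(), rows, cols}; }
};

// Interface hidden behind the handle; the name is an assumption.
template <typename MathT>
struct vpq_codebooks_iface {
  virtual ~vpq_codebooks_iface() = default;
  virtual matrix_view<MathT> vq_code_book() const = 0;
  virtual matrix_view<MathT> pq_code_book() const = 0;
};

// Owning variant: the codebook buffers live inside this object.
template <typename MathT>
struct vpq_codebooks_owning : vpq_codebooks_iface<MathT> {
  vpq_codebooks_owning(host_matrix<MathT>&& vq, host_matrix<MathT>&& pq)
    : vq_(std::move(vq)), pq_(std::move(pq)) {
    assert(pq_.values.size() == pq_.rows * pq_.cols);
  }
  matrix_view<MathT> vq_code_book() const override { return vq_.view(); }
  matrix_view<MathT> pq_code_book() const override { return pq_.view(); }
  host_matrix<MathT> vq_, pq_;
};

// View variant: refers to codebooks owned elsewhere (pre-computed, external).
template <typename MathT>
struct vpq_codebooks_view : vpq_codebooks_iface<MathT> {
  vpq_codebooks_view(matrix_view<MathT> vq, matrix_view<MathT> pq) : vq_(vq), pq_(pq) {}
  matrix_view<MathT> vq_code_book() const override { return vq_; }
  matrix_view<MathT> pq_code_book() const override { return pq_; }
  matrix_view<MathT> vq_, pq_;
};

// PIMPL handle: callers see one type regardless of who owns the memory.
template <typename MathT>
class vpq_codebooks {
 public:
  explicit vpq_codebooks(std::unique_ptr<vpq_codebooks_iface<MathT>> impl)
    : impl_(std::move(impl)) {
    assert(impl_ != nullptr);
  }
  matrix_view<MathT> vq_code_book() const { return impl_->vq_code_book(); }
  matrix_view<MathT> pq_code_book() const { return impl_->pq_code_book(); }

 private:
  std::unique_ptr<vpq_codebooks_iface<MathT>> impl_;
};

// The dataset is a plain {codebooks, data} pair.
template <typename MathT>
struct vpq_dataset {
  vpq_codebooks<MathT> codebooks;
  std::vector<std::uint8_t> data;  // encoded rows
};

// The quantizer needs only the codebooks, never the encoded data.
template <typename T>
struct quantizer {
  vpq_codebooks<T> codebooks;
};
```

In this sketch a view-backed handle never allocates or copies codebook memory, so the external buffers must outlive any `quantizer` built from them.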

copy-pr-bot · 2026-02-03T11:56:33Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


cpp/include/cuvs/preprocessing/quantize/pq.hpp

cpp/src/preprocessing/quantize/detail/pq.cuh

cpp/include/cuvs/neighbors/common.hpp

cpp/src/neighbors/vpq_dataset_impl.hpp



tfeher

Hi Tarang, thank you for the PR! Overall looks good, but there is an issue with a copy operation that should be fixed before we can merge this.

Nitpick, but I find the expression 'View Type PQ preprocessor' confusing. Could you please update the PR description to explain that this PR provides owning and non-owning variants of the vpq_dataset?

cpp/src/preprocessing/quantize/detail/pq.cuh

cpp/include/cuvs/preprocessing/quantize/pq.hpp

cpp/include/cuvs/neighbors/common.hpp

tfeher · 2026-04-01T23:05:37Z

cpp/src/neighbors/detail/vpq_dataset.cuh

+  // Copy the data from the source (data type is uint8_t, independent of MathT)
+  auto data_view = src.data();
+  auto data      = raft::make_device_matrix<uint8_t, IdxT, raft::row_major>(
+    res, data_view.extent(0), data_view.extent(1));
+  raft::copy(data.data_handle(),
+             data_view.data_handle(),
+             data_view.size(),
+             raft::resource::get_cuda_stream(res));


src.data() is the whole dataset. Although it is already compressed, it can still be HUGE. We definitely do not want to copy it. Note that the function takes src as an rvalue reference, so we can just move src.data() into the new vpq_dataset and take ownership of it.
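The difference between the two approaches can be sketched on the host as follows. This is an illustration only: `encoded_dataset`, `rebuild_by_copy`, and `rebuild_by_move` are hypothetical names, and `std::vector<std::uint8_t>` stands in for the device matrix that holds the encoded rows.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical host-only stand-in for a dataset holding encoded rows.
struct encoded_dataset {
  std::vector<std::uint8_t> data;
};

// Copying variant: allocates a second buffer and duplicates every byte.
inline encoded_dataset rebuild_by_copy(const encoded_dataset& src) {
  return encoded_dataset{std::vector<std::uint8_t>(src.data.begin(), src.data.end())};
}

// Moving variant: src is an rvalue reference, so its buffer can be taken over
// without touching the individual elements.
inline encoded_dataset rebuild_by_move(encoded_dataset&& src) {
  return encoded_dataset{std::move(src.data)};
}
```

With the moving variant, the cost is independent of the dataset size, which is the point of the review comment.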


Co-authored-by: Tamas Bela Feher <tfeher@nvidia.com>

tarang-jain · 2026-04-03T01:34:55Z

@tfeher I have updated the PR description.



lowener

Looks very good, thanks! Just have a remark on build()

lowener · 2026-04-09T10:22:15Z

cpp/tests/neighbors/ann_scann.cuh

    cuvs::preprocessing::quantize::pq::quantizer<float> quantizer{
      pq_params,
-      cuvs::neighbors::vpq_dataset<float, int64_t>{
-        std::move(vq_codebook), std::move(pq_codebook_copy), std::move(empty_data)}};
+      cuvs::preprocessing::quantize::pq::vpq_codebooks<float>{
+        std::make_unique<cuvs::preprocessing::quantize::pq::vpq_codebooks_owning<float>>(
+          std::move(vq_codebook), std::move(pq_codebook_copy))}};


Use the new pq::build() with views that you're introducing in this PR. That will also get rid of the relative include of src.

lowener · 2026-04-09T10:26:16Z

cpp/src/neighbors/detail/vamana/vamana_build.cuh

    auto quantizer = cuvs::preprocessing::quantize::pq::quantizer<float>(
      pq_params,
-      cuvs::neighbors::vpq_dataset<float, int64_t>{
-        raft::make_device_matrix<float, uint32_t, raft::row_major>(res, 0, 0),
-        std::move(pq_codebook),
-        raft::make_device_matrix<uint8_t, int64_t, raft::row_major>(res, 0, 0)});
+      cuvs::preprocessing::quantize::pq::vpq_codebooks<float>{
+        std::make_unique<cuvs::preprocessing::quantize::pq::vpq_codebooks_owning<float>>(
+          raft::make_device_matrix<float, uint32_t, raft::row_major>(res, 0, 0),
+          std::move(pq_codebook))});


Use the new pq::build() with views

lowener · 2026-04-09T10:42:09Z

cpp/src/preprocessing/quantize/detail/vpq_dataset_impl.hpp

+  vpq_codebooks_owning(raft::device_matrix<math_type, uint32_t, raft::row_major>&& vq_code_book,
+                       raft::device_matrix<math_type, uint32_t, raft::row_major>&& pq_code_book)


Nitpick: Would it make sense to make the vq optional in both structs? If it defaults to nullopt, it would also follow the other functions' parameter order.
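One possible reading of this suggestion, sketched with host-only types: the PQ codebook comes first and the VQ codebook is a trailing `std::optional` parameter that defaults to `std::nullopt`. The class name `codebooks_with_optional_vq` and its members are hypothetical, and `std::vector` stands in for the device matrices; the PR itself may resolve the nitpick differently.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical owning holder where the VQ codebook may be absent.
template <typename MathT>
class codebooks_with_optional_vq {
 public:
  explicit codebooks_with_optional_vq(
    std::vector<MathT>&& pq_code_book,
    std::optional<std::vector<MathT>> vq_code_book = std::nullopt)
    : pq_(std::move(pq_code_book)), vq_(std::move(vq_code_book)) {}

  bool has_vq() const { return vq_.has_value(); }
  const std::vector<MathT>& pq_code_book() const { return pq_; }

  // Precondition: has_vq() is true.
  const std::vector<MathT>& vq_code_book() const {
    assert(has_vq());
    return *vq_;
  }

 private:
  std::vector<MathT> pq_;
  std::optional<std::vector<MathT>> vq_;
};
```

Callers that only have a PQ codebook would then no longer need to construct an empty 0x0 VQ matrix as a placeholder.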

first commit

2accbba

github-project-automation bot added this to Unstructured Data Processing Feb 3, 2026

update vpq_dataset

4c6182c

tarang-jain self-assigned this Feb 3, 2026

tarang-jain added the feature request and non-breaking labels Feb 3, 2026

tarang-jain and others added 5 commits February 3, 2026 04:52

clean pimpl separation

f18e00c

fix vpq_build

fa70a01

revert changes to quantizer struct

bf763e3

Merge branch 'main' into view-pq-quantizer

ac85ece

Merge branch 'main' into view-pq-quantizer

2728273

cjnolet moved this to In Progress in Unstructured Data Processing Feb 6, 2026

tarang-jain and others added 5 commits February 10, 2026 21:32

Merge branch 'main' into view-pq-quantizer

a0f6c76

Merge branch 'main' into view-pq-quantizer

1620486

make user class pure pimpl

b0aaa05

Merge branch 'release/26.04' of https://github.com/rapidsai/cuvs into…

b479c34

… view-pq-quantizer

fixes

04be0a0

tarang-jain changed the base branch from main to release/26.04 March 13, 2026 23:21

tarang-jain marked this pull request as ready for review March 13, 2026 23:28

tarang-jain requested review from a team as code owners March 13, 2026 23:28

tarang-jain and others added 3 commits March 16, 2026 10:40

style

f80280e

fix tests

51e8209

Merge branch 'release/26.04' into view-pq-quantizer

c558964

lowener requested changes Mar 18, 2026


tarang-jain and others added 4 commits March 19, 2026 15:40

move vpq_dataset class

ebfc7d2

Merge branch 'release/26.04' into view-pq-quantizer

b949c2c

fix style

a8b3ce4

Merge branch 'view-pq-quantizer' of https://github.com/tarang-jain/cuvs…

4e9565f

… into view-pq-quantizer

tarang-jain and others added 3 commits March 30, 2026 15:05

Merge branch 'view-pq-quantizer' of https://github.com/tarang-jain/cuvs…

be681d3

… into view-pq-quantizer

Merge branch 'release/26.04' of https://github.com/rapidsai/cuvs into…

65437ec

… view-pq-quantizer

Merge branch 'release/26.04' into view-pq-quantizer

fc30857

tfeher requested changes Apr 1, 2026


Merge branch 'view-pq-quantizer' of https://github.com/tarang-jain/cuvs…

faa46f9

… into view-pq-quantizer

aamijar changed the base branch from release/26.04 to main April 2, 2026 03:10

tarang-jain added the breaking label and removed the non-breaking label Apr 3, 2026

tarang-jain and others added 2 commits April 2, 2026 18:30

Update cpp/include/cuvs/preprocessing/quantize/pq.hpp

2c1aa71

Co-authored-by: Tamas Bela Feher <tfeher@nvidia.com>

Update cpp/src/preprocessing/quantize/detail/pq.cuh

d6a8364

Co-authored-by: Tamas Bela Feher <tfeher@nvidia.com>

tarang-jain added 14 commits April 2, 2026 18:37

Merge branch 'view-pq-quantizer' of https://github.com/tarang-jain/cuvs…

963f16e

… into view-pq-quantizer

merge upstream; resolve merge conflicts

cbb5d75

update namespace

68f016d

fix compilation

0070cda

create vpq_codebooks

819eef8

reduce diff

19fe976

fix compilation

77bd557

pre-commit

1c85f16

revert bm change

4708434

rm unnecessary commits

55db0a4

fix error message and copyright

6408cbb

fix condition check

c4250f5

change trailing return type

8c1f792

Merge branch 'main' of https://github.com/rapidsai/cuvs into view-pq-…

f5ac6ba

…quantizer

tarang-jain requested review from lowener and tfeher April 6, 2026 22:12

tarang-jain and others added 2 commits April 7, 2026 18:27

add non const getters

4b7015f

Merge branch 'main' into view-pq-quantizer

cc2a900

lowener reviewed Apr 9, 2026



tarang-jain wants to merge 45 commits into rapidsai:main from tarang-jain:view-pq-quantizer


4 participants
