Implement data-blind scalar quantization #16030

Open
mccullocht wants to merge 8 commits into apache:main from mccullocht:sq-data-blind

@mccullocht mccullocht commented May 4, 2026

Add an option to the quantization format to enable or disable centering (enabled by default). When centering is disabled we also stop writing the float vectors, which can lead to significant storage savings. Merges get special handling: we check whether all of the input is in the same encoding, and transcode if some of the input is float vectors.
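
To make the centering toggle concrete, here is a minimal sketch. It assumes a plain per-vector min/max 8-bit quantizer, which is much simpler than the OSQ code this PR touches, and every name in it (`CenteringSketch`, `quantize`, `dequantize`) is hypothetical. The point it illustrates: with centering disabled, nothing computed from the data set (the centroid) is needed to encode or decode a vector.

```java
import java.util.Arrays;

// Hypothetical illustration only; NOT the OSQ implementation in Lucene.
class CenteringSketch {
  /** 8-bit codes plus the per-vector interval needed to decode them. */
  record Quantized(byte[] codes, float min, float max) {}

  /** Quantizes v to 8 bits. A null centroid means centering is disabled. */
  static Quantized quantize(float[] v, float[] centroid) {
    float[] x = new float[v.length];
    float min = Float.POSITIVE_INFINITY;
    float max = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < v.length; i++) {
      x[i] = centroid == null ? v[i] : v[i] - centroid[i];
      min = Math.min(min, x[i]);
      max = Math.max(max, x[i]);
    }
    // Map [min, max] onto the 256 available levels; a constant vector maps to level 0.
    float scale = max > min ? 255f / (max - min) : 0f;
    byte[] codes = new byte[x.length];
    for (int i = 0; i < x.length; i++) {
      codes[i] = (byte) Math.round((x[i] - min) * scale);
    }
    return new Quantized(codes, min, max);
  }

  /** Inverse of quantize; the same centroid (or null) must be supplied. */
  static float[] dequantize(Quantized q, float[] centroid) {
    float step = (q.max() - q.min()) / 255f;
    float[] out = new float[q.codes().length];
    for (int i = 0; i < out.length; i++) {
      float x = q.min() + (q.codes()[i] & 0xFF) * step; // read the byte as unsigned
      out[i] = centroid == null ? x : x + centroid[i];
    }
    return out;
  }

  public static void main(String[] args) {
    float[] v = {0.5f, -0.25f, 1.0f};
    float[] centroid = {0.1f, 0.1f, 0.1f}; // made-up centroid for the centered case
    System.out.println(Arrays.toString(dequantize(quantize(v, null), null)));
    System.out.println(Arrays.toString(dequantize(quantize(v, centroid), centroid)));
  }
}
```

Both calls reconstruct `v` to within half a quantization step; only the centered one depends on a value derived from the rest of the data.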

Large portions of this change were generated using Claude Code. I reviewed, tweaked, and tested the code before putting it up for review.

This change is made as a new codec because the format changes: the center vector is dropped when centering is disabled. This is not strictly necessary, as we could write a zero vector instead, but I plan to make other format changes related to data blindness; see #16029.

luceneutil results -- 1M Cohere vectors, 8-bit quantization.
before:

recall  latency(ms)  netCPU  avgCpuCount     nDoc  searchType  topK  fanout  resultSimilarity  decay  resultCount  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  filterStrategy  filterSelectivity  overSample  vec_disk(MB)  vec_RAM(MB)  bp-reorder  indexType
 0.974        2.304   2.297        0.997  1000000         KNN   100     100               N/A    N/A      100.000       64        250     8 bits     8619    132.85       7527.40          235.00             1         5047.27            null                N/A       1.000      4898.071      991.821       false       HNSW

after:

recall  latency(ms)  netCPU  avgCpuCount     nDoc  searchType  topK  fanout  resultSimilarity  decay  resultCount  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  filterStrategy  filterSelectivity  overSample  vec_disk(MB)  vec_RAM(MB)  bp-reorder  indexType
 0.972        2.281   2.274        0.997  1000000         KNN   100     100               N/A    N/A      100.000       64        250     8 bits     8612    143.06       6990.07          160.33             1         1140.98            null                N/A       1.000      4898.071      991.821       false       HNSW

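The harness extrapolates vector size from the input size (vec_disk is identical in both runs), so go by the on-disk index_size number instead: 5047.27 MB before vs. 1140.98 MB after, about 4.4x smaller. Force merge is faster (235.00 s vs. 160.33 s) since we don't have to re-quantize vectors on merge. Recall is very similar (0.974 vs. 0.972), but YMMV.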
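
For anyone who wants to check the ratios, this small sketch recomputes them from the two result rows above. The class and method names are hypothetical; the inputs are copied from the index_size(MB) and force_merge(s) columns.

```java
// Recomputes before/after ratios from the luceneutil rows quoted above.
class BenchmarkRatios {
  static double ratio(double before, double after) {
    return before / after;
  }

  public static void main(String[] args) {
    // index_size(MB): 5047.27 before, 1140.98 after
    System.out.printf("index_size shrinks by %.2fx%n", ratio(5047.27, 1140.98));
    // force_merge(s): 235.00 before, 160.33 after
    System.out.printf("force_merge speeds up by %.2fx%n", ratio(235.00, 160.33));
  }
}
```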

mccullocht added 7 commits May 2, 2026 21:51
Allow callers to disable centering at the format level, which also
disables writing of float vectors since they are no longer needed.

Includes a path to handle a mix of centered and uncentered segments as
input. In this case the uncentered segments, which have no float vectors,
will be dequantized and requantized, but this case should be relatively
uncommon.

Includes OSQ changes to allow a zero vector for COSINE if the vector is
not a unit vector. Maybe fix this in upstream callers?
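
The mixed-input merge path described in this commit message can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in the PR: `MergePlanSketch`, `Segment`, `Plan`, and `planFor` are hypothetical names, and it assumes that a centered segment always has raw float vectors available while an uncentered one has only quantized codes.

```java
import java.util.List;

// Hypothetical sketch of the per-segment merge decision; names are not from the PR.
class MergePlanSketch {
  enum Plan { COPY_CODES, QUANTIZE_FROM_FLOATS, DEQUANTIZE_AND_REQUANTIZE }

  /** Assumption: centered segments keep float vectors, uncentered segments do not. */
  record Segment(String name, boolean centered) {}

  static Plan planFor(Segment segment, List<Segment> inputs, boolean targetCentered) {
    boolean allUncentered = inputs.stream().noneMatch(Segment::centered);
    if (!targetCentered && allUncentered) {
      // Same encoding everywhere: existing codes can be reused without re-quantizing.
      return Plan.COPY_CODES;
    }
    if (segment.centered()) {
      // Float vectors are available, so quantize directly into the target encoding.
      return Plan.QUANTIZE_FROM_FLOATS;
    }
    // Mixed input and no floats for this segment: go through a lossy round trip.
    return Plan.DEQUANTIZE_AND_REQUANTIZE;
  }

  public static void main(String[] args) {
    Segment a = new Segment("a", true);
    Segment b = new Segment("b", false);
    List<Segment> mixed = List.of(a, b);
    for (Segment s : mixed) {
      System.out.println(s.name() + " -> " + planFor(s, mixed, false));
    }
  }
}
```

Only the last branch re-quantizes already-quantized data, which is why the mixed case costs more than the uniform one.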
@mccullocht mccullocht added this to the 10.5.0 milestone May 4, 2026