Implement data-blind scalar quantization #16030
Open
mccullocht wants to merge 8 commits into
Allow callers to disable centering at the format level, which also disables writing of float vectors since they are no longer needed. Includes a path to handle a mix of centered and uncentered segments as input. In this case the uncentered segments, which have no float vectors, are dequantized and requantized, but this should be relatively uncommon. Includes OSQ changes to allow a zero vector for COSINE if the vector is not a unit vector. Maybe this should be fixed in upstream callers instead?
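To make the centering toggle concrete, here is a minimal sketch of an 8-bit scalar quantizer that optionally subtracts a centroid first. It is an illustration only and not the OSQ code in this PR: the class `ScalarQuantSketch`, the `Quantized` record, and the per-vector min/max interval are all assumptions made for the example. With a `null` centroid, the codes depend only on the vector itself, which is the "data-blind" property.

```java
/** Hypothetical illustration only; not Lucene's OSQ implementation. */
final class ScalarQuantSketch {
  /** Quantized codes plus the per-vector interval needed to decode them. */
  record Quantized(byte[] codes, float lower, float upper) {}

  /**
   * Quantizes to 8 bits per dimension over the vector's own [min, max] interval.
   * A non-null centroid is subtracted first (centering); a null centroid means
   * the result depends only on the vector itself (data-blind).
   */
  static Quantized quantize(float[] vector, float[] centroid) {
    float[] v = new float[vector.length];
    float lower = Float.POSITIVE_INFINITY;
    float upper = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < v.length; i++) {
      v[i] = centroid == null ? vector[i] : vector[i] - centroid[i];
      lower = Math.min(lower, v[i]);
      upper = Math.max(upper, v[i]);
    }
    float step = (upper - lower) / 255f;
    byte[] codes = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      // A constant vector has step == 0; map every dimension to code 0.
      int q = step == 0f ? 0 : Math.round((v[i] - lower) / step);
      codes[i] = (byte) Math.min(255, Math.max(0, q));
    }
    return new Quantized(codes, lower, upper);
  }

  /** Approximate inverse of quantize; pass the same centroid (or null). */
  static float[] dequantize(Quantized q, float[] centroid) {
    float step = (q.upper() - q.lower()) / 255f;
    float[] out = new float[q.codes().length];
    for (int i = 0; i < out.length; i++) {
      out[i] = q.lower() + (q.codes()[i] & 0xFF) * step; // treat byte as unsigned
      if (centroid != null) {
        out[i] += centroid[i];
      }
    }
    return out;
  }
}
```

In this sketch, changing the centroid changes the codes for the same input vector, so centered codes go stale whenever the centroid moves. Codes produced with a `null` centroid never do.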
Add an option to the quantization format to enable or disable centering (enabled by default). When centering is disabled we also stop writing the float vectors which can lead to significant storage savings. Special handling is included during merges -- we check that all of the input is in the same encoding, and handle transcoding if some of the input is float vectors.
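The merge handling can be read as a per-segment decision. The sketch below is one plausible reading of the description, not the code in this PR. The enum names and the `plan` method are hypothetical, and it assumes quantized bytes are copied as-is only when every input already matches an uncentered target.

```java
import java.util.List;

/** Hypothetical sketch of the merge decision; names are not from the PR. */
final class MergePlanSketch {
  enum Encoding { CENTERED_WITH_FLOATS, UNCENTERED_NO_FLOATS }

  enum Action { COPY_QUANTIZED, QUANTIZE_FROM_FLOATS, DEQUANTIZE_THEN_REQUANTIZE }

  /** Chooses how to bring one input segment's vectors into the merged segment. */
  static Action plan(Encoding segment, List<Encoding> allInputs, Encoding target) {
    boolean uniform = allInputs.stream().allMatch(e -> e == target);
    if (uniform && target == Encoding.UNCENTERED_NO_FLOATS) {
      // Data-blind codes do not depend on other vectors, so they can be reused.
      return Action.COPY_QUANTIZED;
    }
    if (segment == Encoding.UNCENTERED_NO_FLOATS) {
      // Mixed input and no floats on disk: reconstruct from codes, then requantize.
      return Action.DEQUANTIZE_THEN_REQUANTIZE;
    }
    // Float vectors are available, so quantize directly from them.
    return Action.QUANTIZE_FROM_FLOATS;
  }
}
```

The dequantize-then-requantize path is lossy, but it only applies to mixed input, which should be rare.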
Large portions of this change were generated using Claude Code. I reviewed, tweaked, and tested the code before putting it up for review.
This change is being made as a new codec because the format changes to drop the center vector when centering is disabled. This is not strictly necessary, since we could write a zero vector instead, but I have plans to make other format changes related to data blindness, see #16029.
luceneutil results -- 1M cohere vectors, 8 bit quantization.
before:
after:
The harness extrapolates vector size from the input size, so trust the on-disk index_size number instead: the index is about 4x smaller. Force merge is faster since we don't have to re-quantize vectors on merge. Recall is very similar, but YMMV.
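For a rough sense of where the savings come from, the arithmetic below estimates raw vector bytes with and without the float copy. It is a back-of-the-envelope sketch: `VectorStorageSketch`, the dimension count, and the per-vector corrective byte count are placeholder assumptions, not values taken from this benchmark or from the format.

```java
/** Hypothetical estimate of per-vector payload; ignores all other index data. */
final class VectorStorageSketch {
  /**
   * @param dims number of dimensions (placeholder, not from the benchmark)
   * @param bits bits per quantized dimension
   * @param keepFloats whether the original float32 vector is also stored
   * @param correctiveBytes assumed per-vector metadata stored with the codes
   */
  static long bytesPerVector(int dims, int bits, boolean keepFloats, int correctiveBytes) {
    long quantized = ((long) dims * bits + 7) / 8 + correctiveBytes; // round up to whole bytes
    long floats = keepFloats ? (long) dims * Float.BYTES : 0L;
    return quantized + floats;
  }

  public static void main(String[] args) {
    long before = bytesPerVector(1024, 8, true, 16);
    long after = bytesPerVector(1024, 8, false, 16);
    System.out.println(before + " bytes vs " + after + " bytes per vector");
  }
}
```

This counts vector payload only. The on-disk index also holds other structures (for example, any graph index), so the measured ratio need not match this estimate.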