Add an (optional) field-writer creation strategy to Lucene99FlatVectorsWriter #16053
Open
ldematte wants to merge 4 commits into
Conversation
ldematte
commented
May 13, 2026
private final List<FieldWriter<?>> fields = new ArrayList<>();
private record FieldData(FlatFieldVectorsWriter<?> fieldWriter, FieldInfo fieldInfo) {}

private final List<FieldData> fields = new ArrayList<>();
Author
We could store the FlatFieldVectorsWriters and FieldInfos separately in two arrays, but I think using a small record here is cleaner.
`Lucene99FlatVectorsWriter` hard-codes the per-field storage for vector values to an on-heap `ArrayList<T>` (`ArrayList<float[]>` or `ArrayList<byte[]>`), managed by a private nested class `Lucene99FlatVectorsWriter.FieldWriter<T>`.

Different users of `Lucene99FlatVectorsFormat` may want to change how vectors are stored in memory; this PR proposes a simple change to decouple the external interface and the write pipeline implemented by `Lucene99FlatVectorsWriter` from the vectors' memory storage.

In particular, this PR adds a public constructor overload to `Lucene99FlatVectorsWriter` that accepts a `FlatFieldVectorsWriter` factory: when the factory is supplied, `addField(FieldInfo)` uses it to obtain the per-field storage; when the existing two-arg ctor is used, the current behavior is preserved exactly via a default hardcoded `fieldWriterFactory`.

`Lucene99FlatVectorsWriter` write pipeline is unchanged in shape, but is refactored internally to read its FieldWriter's state through accessor methods rather than direct field reads. This makes it possible to introduce a `Delegating` variant of `FieldWriter` that forwards to the injected strategy, without the need to open up the class visibility. `FieldWriter<T>` remains a private nested class. No public API surface of `FlatFieldVectorsWriter<T>` (or any other class) is changed.

A new `TestKnnVectorsFormatCustomWriter` is introduced; this test exercises the new constructor against the full `BaseKnnVectorsFormatTestCase` suite, using a paged storage strategy as a concrete example of a non-default `FlatFieldVectorsWriter`.

The purpose of the test is 2-fold: 1) ensure that the use of a different, custom strategy does not break any existing invariants, and 2) showcase how a different `FlatFieldVectorsWriter` can work. The test `FlatFieldVectorsWriter` stores vectors in a `List<ByteBuffer>` of fixed-size pages, and exposes them back through `FlatFieldVectorsWriter#getVectors()` via an `AbstractList` adapter that materializes a heap array per access. The on-disk format produced is identical to the default configuration -- only the in-memory accumulation differs.
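
To make the paged strategy concrete, here is a minimal standalone sketch of the storage idea only: float vectors packed into fixed-size `ByteBuffer` pages, read back through an `AbstractList` view that copies into a fresh heap array on each access. It is not the code from this PR and does not extend `FlatFieldVectorsWriter`; the class `PagedFloatVectorStore`, its methods, and the `vectorsPerPage` parameter are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical paged store for float vectors; an illustration, not a Lucene class. */
final class PagedFloatVectorStore {
  private final int dimension;
  private final int vectorsPerPage;
  private final int bytesPerVector;
  private final List<ByteBuffer> pages = new ArrayList<>();
  private int count; // number of vectors added so far

  PagedFloatVectorStore(int dimension, int vectorsPerPage) {
    if (dimension <= 0 || vectorsPerPage <= 0) {
      throw new IllegalArgumentException("dimension and vectorsPerPage must be positive");
    }
    this.dimension = dimension;
    this.vectorsPerPage = vectorsPerPage;
    this.bytesPerVector = Math.multiplyExact(dimension, Float.BYTES);
  }

  /** Copies the vector into the current page, allocating a new page when the last one is full. */
  void add(float[] vector) {
    if (vector.length != dimension) {
      throw new IllegalArgumentException("expected " + dimension + " dimensions, got " + vector.length);
    }
    if (count % vectorsPerPage == 0) {
      pages.add(ByteBuffer.allocate(Math.multiplyExact(vectorsPerPage, bytesPerVector)));
    }
    ByteBuffer page = pages.get(pages.size() - 1);
    int offset = (count % vectorsPerPage) * bytesPerVector;
    for (int i = 0; i < dimension; i++) {
      page.putFloat(offset + i * Float.BYTES, vector[i]); // absolute put, position untouched
    }
    count++;
  }

  int pageCount() {
    return pages.size();
  }

  /** Read-only view over the stored vectors; every get() materializes a new float[]. */
  List<float[]> asList() {
    return new AbstractList<float[]>() {
      @Override
      public float[] get(int index) {
        if (index < 0 || index >= count) {
          throw new IndexOutOfBoundsException("index " + index + ", size " + count);
        }
        ByteBuffer page = pages.get(index / vectorsPerPage);
        int offset = (index % vectorsPerPage) * bytesPerVector;
        float[] out = new float[dimension];
        for (int i = 0; i < dimension; i++) {
          out[i] = page.getFloat(offset + i * Float.BYTES);
        }
        return out;
      }

      @Override
      public int size() {
        return count;
      }
    };
  }
}
```

In this sketch, adding three 2-dimensional vectors with `vectorsPerPage = 2` allocates two pages, and two calls to `asList().get(0)` return distinct arrays with equal contents. The trade-off is one array allocation per read in exchange for keeping the accumulated vectors out of individually allocated `float[]` objects.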