
Add an (optional) field-writer creation strategy to Lucene99FlatVectorsWriter #16053

Open
ldematte wants to merge 4 commits into apache:main from ldematte:flat-vectors-field-writer-strategy


@ldematte ldematte commented May 12, 2026

Lucene99FlatVectorsWriter hard-codes the per-field storage for vector values to an on-heap ArrayList<T> (ArrayList<float[]> or ArrayList<byte[]>), managed by a private nested class Lucene99FlatVectorsWriter.FieldWriter<T>.

Different users of Lucene99FlatVectorsFormat may want to change how vectors are stored in memory; this PR proposes a simple change to decouple the external interface and the write pipeline implemented by Lucene99FlatVectorsWriter from the vectors' memory storage.

In particular, this PR adds a public constructor overload to Lucene99FlatVectorsWriter that accepts a FlatFieldVectorsWriter factory. When the factory is supplied, addField(FieldInfo) uses it to obtain the per-field storage; when the existing two-argument constructor is used, the current behavior is preserved exactly through a hard-coded default fieldWriterFactory.
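As a rough illustration of the constructor-overload pattern described above, here is a minimal, self-contained sketch. It does not use the Lucene classes: `FieldStore`, `HeapFieldStore`, `SketchWriter`, and the `Function<String, FieldStore>` factory type are all hypothetical stand-ins, and the actual signatures in the PR may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for per-field vector storage (not the Lucene API).
interface FieldStore {
  void add(float[] vector);

  List<float[]> vectors();
}

// Default strategy: accumulate vectors in an on-heap ArrayList.
final class HeapFieldStore implements FieldStore {
  private final List<float[]> vectors = new ArrayList<>();

  @Override
  public void add(float[] vector) {
    vectors.add(vector.clone());
  }

  @Override
  public List<float[]> vectors() {
    return vectors;
  }
}

// Hypothetical writer with two constructors: the existing one keeps the
// default storage, the new overload accepts a per-field storage factory.
final class SketchWriter {
  private final Function<String, FieldStore> storeFactory;
  private final List<FieldStore> fields = new ArrayList<>();

  // Existing constructor: behavior is preserved through a default factory.
  SketchWriter() {
    this(fieldName -> new HeapFieldStore());
  }

  // New overload: the caller decides how each field stores its vectors.
  SketchWriter(Function<String, FieldStore> storeFactory) {
    this.storeFactory = storeFactory;
  }

  // Obtains the per-field storage from the factory and registers it.
  FieldStore addField(String fieldName) {
    FieldStore store = storeFactory.apply(fieldName);
    fields.add(store);
    return store;
  }

  int fieldCount() {
    return fields.size();
  }
}
```

With this shape, callers of the no-argument constructor see no change, while a caller that passes a factory gets one invocation of it per `addField` call.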

The Lucene99FlatVectorsWriter write pipeline is unchanged in shape, but is refactored internally to read its FieldWriter's state through accessor methods rather than direct field reads. This makes it possible to introduce a Delegating variant of FieldWriter that forwards to the injected strategy, without opening up the class's visibility. FieldWriter<T> remains a private nested class. No public API surface of FlatFieldVectorsWriter<T> (or any other class) is changed.
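The accessor-based refactoring can be pictured with a small sketch. Every name below (`VectorStorage`, `FieldState`, `HeapFieldState`, `DelegatingFieldState`, `WritePipelineSketch`) is hypothetical and chosen only for illustration; in the PR the corresponding class is a private nested class of Lucene99FlatVectorsWriter, and its real structure may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical injected storage strategy (illustrative; not a Lucene interface).
interface VectorStorage {
  void add(int docId, float[] vector);

  List<float[]> vectors();

  List<Integer> docIds();
}

// State is exposed only through accessors, so the pipeline never reads fields directly.
abstract class FieldState {
  abstract void addValue(int docId, float[] vector);

  abstract List<float[]> getVectors();

  abstract List<Integer> getDocIds();
}

// Default variant: owns its on-heap lists.
final class HeapFieldState extends FieldState {
  private final List<float[]> vectors = new ArrayList<>();
  private final List<Integer> docIds = new ArrayList<>();

  @Override
  void addValue(int docId, float[] vector) {
    vectors.add(vector.clone());
    docIds.add(docId);
  }

  @Override
  List<float[]> getVectors() {
    return vectors;
  }

  @Override
  List<Integer> getDocIds() {
    return docIds;
  }
}

// Delegating variant: forwards every call to the injected strategy.
final class DelegatingFieldState extends FieldState {
  private final VectorStorage delegate;

  DelegatingFieldState(VectorStorage delegate) {
    this.delegate = delegate;
  }

  @Override
  void addValue(int docId, float[] vector) {
    delegate.add(docId, vector);
  }

  @Override
  List<float[]> getVectors() {
    return delegate.vectors();
  }

  @Override
  List<Integer> getDocIds() {
    return delegate.docIds();
  }
}

final class WritePipelineSketch {
  // Stand-in for the write pipeline: emits "docId:dimension;" per vector,
  // using accessors only, so it works with either variant.
  static String describe(FieldState field) {
    StringBuilder sb = new StringBuilder();
    List<float[]> vectors = field.getVectors();
    List<Integer> docIds = field.getDocIds();
    for (int i = 0; i < vectors.size(); i++) {
      sb.append(docIds.get(i)).append(':').append(vectors.get(i).length).append(';');
    }
    return sb.toString();
  }
}
```

Because `describe` depends only on the accessors, the same pipeline code runs unchanged against the default and the delegating variant.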

A new TestKnnVectorsFormatCustomWriter is introduced; this test exercises the new constructor against the full BaseKnnVectorsFormatTestCase suite, using a paged storage strategy as a concrete example of a non-default FlatFieldVectorsWriter.

The purpose of the test is twofold: (1) to ensure that the use of a different, custom strategy does not break any existing invariants, and (2) to showcase how a different FlatFieldVectorsWriter can work. The test FlatFieldVectorsWriter stores vectors in a List<ByteBuffer> of fixed-size pages, and exposes them back through FlatFieldVectorsWriter#getVectors() via an AbstractList adapter that materializes a heap array per access. The on-disk format produced is identical to the default configuration; only the in-memory accumulation differs.
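To make the paged layout concrete, here is a minimal sketch of the idea: float vectors packed into fixed-size `ByteBuffer` pages and read back through an `AbstractList` view that copies one vector into a fresh heap array per access. The class name `PagedFloatVectors`, its constructor parameters, and the use of heap buffers via `ByteBuffer.allocate` are assumptions made for illustration; this is not the test code from the PR.

```java
import java.nio.ByteBuffer;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical paged store for fixed-dimension float vectors.
final class PagedFloatVectors {
  private final int dim;
  private final int vectorsPerPage;
  private final List<ByteBuffer> pages = new ArrayList<>();
  private int count;

  PagedFloatVectors(int dim, int vectorsPerPage) {
    if (dim <= 0 || vectorsPerPage <= 0) {
      throw new IllegalArgumentException("dim and vectorsPerPage must be positive");
    }
    this.dim = dim;
    this.vectorsPerPage = vectorsPerPage;
  }

  // Copies the vector into the current page, allocating a new page when needed.
  void add(float[] vector) {
    if (vector.length != dim) {
      throw new IllegalArgumentException("expected dimension " + dim);
    }
    int slot = count % vectorsPerPage;
    if (slot == 0) {
      pages.add(ByteBuffer.allocate(vectorsPerPage * dim * Float.BYTES));
    }
    ByteBuffer page = pages.get(pages.size() - 1);
    int base = slot * dim * Float.BYTES;
    for (int i = 0; i < dim; i++) {
      page.putFloat(base + i * Float.BYTES, vector[i]); // absolute put, position untouched
    }
    count++;
  }

  // Read-only List view; each get() materializes a new heap array.
  List<float[]> asList() {
    return new AbstractList<float[]>() {
      @Override
      public float[] get(int index) {
        if (index < 0 || index >= count) {
          throw new IndexOutOfBoundsException("index " + index);
        }
        ByteBuffer page = pages.get(index / vectorsPerPage);
        int base = (index % vectorsPerPage) * dim * Float.BYTES;
        float[] out = new float[dim];
        for (int i = 0; i < dim; i++) {
          out[i] = page.getFloat(base + i * Float.BYTES);
        }
        return out;
      }

      @Override
      public int size() {
        return count;
      }
    };
  }
}
```

The per-access copy trades some allocation for a simple contract: callers receive an ordinary `float[]` and never hold a reference into a page. Swapping `ByteBuffer.allocate` for `ByteBuffer.allocateDirect` would move the pages off-heap without changing the view.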

private final List<FieldWriter<?>> fields = new ArrayList<>();
private record FieldData(FlatFieldVectorsWriter<?> fieldWriter, FieldInfo fieldInfo) {}

private final List<FieldData> fields = new ArrayList<>();
Author


We could store FlatFieldVectorsWriters and FieldInfos separately in two arrays, but I think using a small record here is cleaner.

@ldematte ldematte marked this pull request as ready for review May 13, 2026 16:55