[lake/lance] Refactor LanceArrowWriter #2345

XuQianJin-Stars · 2026-01-11T10:14:13Z

Purpose

Linked issue: close #1569

Refactor fluss-lake-lance module to eliminate code duplication by reusing fluss-common's ArrowFieldWriter implementations.
Issue: The fluss-lake-lance module previously maintained a separate LanceArrowWriter with duplicate FieldWriter implementations because fluss-common uses shaded Arrow API while Lance library requires non-shaded Arrow API.
Solution: Implement ArrowDataConverter to bridge shaded and non-shaded Arrow via zero-copy off-heap memory sharing. This allows lance module to reuse fluss-common's ArrowFieldWriter implementations, eliminating the need for duplicate writer code.

Brief change log

1. Eliminated Code Duplication (~400 lines removed)

Removed LanceArrowWriter class and all its inner FieldWriter classes from fluss-lake-lance
No longer need to maintain separate writer implementations for lance module
All Arrow writing logic now unified through fluss-common's ArrowFieldWriter

2. Created ArrowDataConverter (Zero-Copy Bridge)

Implements efficient conversion between shaded and non-shaded Arrow VectorSchemaRoot
Uses direct off-heap memory sharing via ByteBuffer transfer (zero serialization overhead)
Key insight: Both shaded and non-shaded Arrow use identical off-heap memory layout
Extracts ByteBuffer from shaded ArrowBuf and directly copies to non-shaded ArrowBuf

3. Created ShadedArrowBatchWriter (Reuse Adapter)

Wraps shaded Arrow VectorSchemaRoot and reuses ArrowUtils.createArrowFieldWriter()
Provides simple batch writing interface: writeRow(), finish(), reset()
Enables lance module to leverage all existing ArrowFieldWriter implementations from fluss-common

4. Refactored LanceLakeWriter (Unified Writer Path)

Changed from non-shaded Arrow writer to shaded Arrow writer + converter
Write path: InternalRow → ShadedArrowBatchWriter → ArrowDataConverter → Lance Fragment.create()
Uses ArrayList buffer to collect rows, writes in batches
Maintains both shaded and non-shaded allocators for conversion process

5. Architecture Benefits

Single source of truth: Only fluss-common maintains ArrowFieldWriter implementations
Zero maintenance overhead: New data type support only needs changes in fluss-common
Performance: Zero-copy memory sharing eliminates serialization overhead
Consistency: Lance and other modules use identical Arrow writing logic

Tests

Unit Tests

✅ LanceTieringTest.testTieringWriteTable (with/without partitions)
✅ LakeEnabledTableCreateITCase (table creation with various data types)

Verification

Compilation successful
No behavioral changes to external APIs
Off-heap memory sharing verified through reflection-based buffer access

API and Format

API Changes: None - This is an internal refactoring

External APIs (LakeWriter, LanceLakeTieringFactory) remain unchanged
Lance dataset format and compatibility are preserved

Internal Changes:

LanceLakeWriter now uses ShadedArrowBatchWriter + ArrowDataConverter
New ArrowDataConverter class for zero-copy shaded/non-shaded conversion
New ShadedArrowBatchWriter class to reuse fluss-common's ArrowFieldWriter
Removed LanceArrowWriter and its duplicate FieldWriter implementations

Documentation

Documentation Updates: Not required

This is an internal code refactoring
No new features exposed to users
Existing Lance documentation remains valid

Code Quality Improvements:

Eliminated code duplication (~400 lines of duplicate FieldWriter code removed)
Unified Arrow writing logic across all modules
Better maintainability (changes to ArrowFieldWriter only need to be made once)
Performance optimization through zero-copy memory sharing
Reduced future maintenance burden for new data type support

wuchong · 2026-01-11T16:10:58Z

Just to double-check, doesn’t this still retain the LanceArrowWriters?

The original goal of the issue was to reuse the ArrowWriter implementations from fluss-common to avoid maintaining two separate copies. Could we consider removing LanceArrowWriters entirely and relying on the shared utilities instead?

wuchong

Thanks @XuQianJin-Stars , I left some comments.

wuchong · 2026-01-15T08:06:04Z

fluss-lake/fluss-lake-lance/pom.xml

            <groupId>org.apache.fluss</groupId>
            <artifactId>fluss-common</artifactId>
            <version>${project.version}</version>
-            <scope>provided</scope>


Why need to include fluss-common into shaded jar? If this is needed for testing, we can add a test scope.

@luoyuxia could you also help to check this ?

Yes, we don't need to include fluss-common

wuchong · 2026-01-15T08:09:30Z

...ake/fluss-lake-lance/src/main/java/org/apache/fluss/lake/lance/utils/ArrowDataConverter.java

+    private static List<org.apache.fluss.shaded.arrow.org.apache.arrow.memory.ArrowBuf>
+            getFieldBuffers(
+                    org.apache.fluss.shaded.arrow.org.apache.arrow.vector.FieldVector vector) {
+        try {
+            Method method = vector.getClass().getMethod("getFieldBuffers");
+            return (List<org.apache.fluss.shaded.arrow.org.apache.arrow.memory.ArrowBuf>)
+                    method.invoke(vector);
+        } catch (Exception e) {
+            throw new RuntimeException("Failed to get field buffers from shaded vector", e);
+        }
+    }
+
+    private static int getValueCount(
+            org.apache.fluss.shaded.arrow.org.apache.arrow.vector.FieldVector vector) {
+        try {
+            Method method = vector.getClass().getMethod("getValueCount");
+            return (int) method.invoke(vector);
+        } catch (Exception e) {
+            throw new RuntimeException("Failed to get value count from shaded vector", e);
+        }
+    }
+
+    private static ByteBuffer getByteBuffer(
+            org.apache.fluss.shaded.arrow.org.apache.arrow.memory.ArrowBuf buf) {
+        try {
+            Method method = buf.getClass().getMethod("nioBuffer", long.class, int.class);
+            return (ByteBuffer) method.invoke(buf, 0L, (int) buf.capacity());
+        } catch (Exception e) {
+            try {
+                Field field = buf.getClass().getDeclaredField("memoryAddress");
+                field.setAccessible(true);
+                long address = (long) field.get(buf);
+                return null;
+            } catch (Exception ex) {
+                throw new RuntimeException("Failed to get ByteBuffer from ArrowBuf", ex);


Why introduce these reflections? It seems they provide these methods and can be directly invoked.

wuchong · 2026-01-15T08:16:50Z

...uss-lake-lance/src/main/java/org/apache/fluss/lake/lance/tiering/ShadedArrowBatchWriter.java

+            FieldVector fieldVector = shadedRoot.getVector(i);
+            fieldWriters[i] = ArrowUtils.createArrowFieldWriter(fieldVector, rowType.getTypeAt(i));


Can we direclty use ArrowWriter instead of ArrowFieldWriter? It seems here missed to call initFieldVector which has been done in ArrowWriter.

wuchong · 2026-01-15T08:18:26Z

...uss-lake-lance/src/main/java/org/apache/fluss/lake/lance/tiering/ShadedArrowBatchWriter.java

+        shadedRoot.setRowCount(recordsCount);
+    }
+
+    public void reset() {


This is never called, should we call it in LanceLakeWriter#complete?

wuchong · 2026-01-15T08:19:20Z

...lake/fluss-lake-lance/src/main/java/org/apache/fluss/lake/lance/tiering/LanceLakeWriter.java

    @Override
    public void write(LogRecord record) throws IOException {
-        arrowWriter.write(record);
+        buffer.add(record.getRow());


Why buffer it first instead of writing to arrow directly? This introduce doubled memory overhead.

wuchong · 2026-01-15T08:20:53Z

...lake/fluss-lake-lance/src/main/java/org/apache/fluss/lake/lance/tiering/LanceLakeWriter.java

+            List<FragmentMetadata> fragments =
+                    Fragment.create(datasetUri, nonShadedAllocator, nonShadedRoot, writeParams);
+
+            allFragments.addAll(fragments);


No need to use a memory shared variable allFragments, this can be a local variable and as a return value of the flush() method? I can't find the reset of allFragments.

XuQianJin-Stars force-pushed the refactor/lance-arrow-writer branch 2 times, most recently from 8630ee7 to a1ed91b Compare January 11, 2026 11:58

XuQianJin-Stars changed the title ~~[lake/lance] refactor LanceArrowWriter~~ [lake/lance] Refactor LanceArrowWriter Jan 11, 2026

wuchong reviewed Jan 15, 2026

View reviewed changes

XuQianJin-Stars added 3 commits January 16, 2026 14:07

[lake/lance] Refactor LanceArrowWriter

68ed5b1

[lake/lance] Refactor LanceArrowWriter

f16aa53

[lake/lance] Refactor LanceArrowWriter

458473b

XuQianJin-Stars force-pushed the refactor/lance-arrow-writer branch from 1baa6a1 to 458473b Compare January 16, 2026 06:32

XuQianJin-Stars added 2 commits January 16, 2026 14:39

[lake/lance] Refactor LanceArrowWriter

36b0c48

[lake/lance] Refactor LanceArrowWriter

6930906

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lake/lance] Refactor LanceArrowWriter #2345

[lake/lance] Refactor LanceArrowWriter #2345

Uh oh!

XuQianJin-Stars commented Jan 11, 2026 •

edited

Loading

Uh oh!

wuchong commented Jan 11, 2026

Uh oh!

wuchong left a comment

Uh oh!

wuchong Jan 15, 2026

Uh oh!

luoyuxia Jan 15, 2026

Uh oh!

wuchong Jan 15, 2026

Uh oh!

wuchong Jan 15, 2026

Uh oh!

wuchong Jan 15, 2026

Uh oh!

wuchong Jan 15, 2026

Uh oh!

wuchong Jan 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		FieldVector fieldVector = shadedRoot.getVector(i);
		fieldWriters[i] = ArrowUtils.createArrowFieldWriter(fieldVector, rowType.getTypeAt(i));

[lake/lance] Refactor LanceArrowWriter #2345

Are you sure you want to change the base?

[lake/lance] Refactor LanceArrowWriter #2345

Uh oh!

Conversation

XuQianJin-Stars commented Jan 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Brief change log

Tests

API and Format

Documentation

Uh oh!

wuchong commented Jan 11, 2026

Uh oh!

wuchong left a comment

Choose a reason for hiding this comment

Uh oh!

wuchong Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

luoyuxia Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

wuchong Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

wuchong Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

wuchong Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

wuchong Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

wuchong Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

XuQianJin-Stars commented Jan 11, 2026 •

edited

Loading