@Shekharrajak
Contributor

Fixes #2944

The failure happens because RangeExec and other non-Comet Spark operators produce Spark's OnHeapColumnVector instead of the Arrow arrays that the native writer expects.

What changes are included in this PR?

  • Modified CometNativeWriteExec.doExecuteColumnar() to detect when the child operator is not a CometPlan
  • Added automatic conversion of Spark columnar batches to Arrow format using CometArrowConverters.columnarBatchToArrowBatchIter()
  • Added support for row-based input by converting rows to Arrow batches using CometArrowConverters.rowToArrowBatchIter()
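The three cases listed above can be sketched as a single dispatch on the kind of child output. This is only an illustrative model, not the PR's code: every type and function here (`ChildOutput`, `WriterInput`, `columnarToArrow`, `rowsToArrow`) is a hypothetical stand-in, integers stand in for real column data, and no Spark or Comet API is used.

```scala
// Simplified stand-ins; these are NOT Spark's or Comet's real classes.
final case class ArrowBatch(values: Vector[Int])          // what the native writer expects
final case class SparkColumnarBatch(values: Vector[Int])  // e.g. backed by OnHeapColumnVector

// The three kinds of input the writer may receive from its child.
sealed trait ChildOutput
final case class FromCometPlan(batches: Iterator[ArrowBatch]) extends ChildOutput
final case class FromSparkColumnar(batches: Iterator[SparkColumnarBatch]) extends ChildOutput
final case class FromSparkRows(rows: Iterator[Int]) extends ChildOutput

object WriterInput {
  // Hypothetical analogue of a columnar-batch-to-Arrow conversion.
  def columnarToArrow(it: Iterator[SparkColumnarBatch]): Iterator[ArrowBatch] =
    it.map(b => ArrowBatch(b.values))

  // Hypothetical analogue of a row-to-Arrow conversion:
  // rows are grouped into batches of at most batchSize elements.
  def rowsToArrow(rows: Iterator[Int], batchSize: Int): Iterator[ArrowBatch] =
    rows.grouped(batchSize).map(chunk => ArrowBatch(chunk.toVector))

  // Pass Arrow input through unchanged; convert everything else first.
  def toArrow(child: ChildOutput, batchSize: Int = 2): Iterator[ArrowBatch] =
    child match {
      case FromCometPlan(batches)     => batches
      case FromSparkColumnar(batches) => columnarToArrow(batches)
      case FromSparkRows(rows)        => rowsToArrow(rows, batchSize)
    }
}
```

In this model the writer always consumes `Iterator[ArrowBatch]`, so the choice of conversion is confined to one `match` and the write path itself does not need to know where the data came from.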

How are these changes tested?

Added two new tests in CometParquetWriterSuite:

@Shekharrajak force-pushed the fix/issue-2944-local-writer-arrow-array branch from 5a6966b to 37e4a23 on January 12, 2026, 18:42
@mbutrovich
Contributor

Should this just be a modification to shouldApplySparkToColumnar in CometExecRule to insert the operator instead of duplicating that operator's logic?
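For illustration, the alternative raised in this question, inserting a conversion operator during planning rather than converting inside the writer, might look roughly like the toy model below. All names (`Plan`, `ToArrow`, `NativeWrite`, `InsertConversion`) are hypothetical and do not reflect the actual behaviour of `CometExecRule` or `shouldApplySparkToColumnar`.

```scala
// Toy plan tree; these are NOT Spark's or Comet's real classes.
sealed trait Plan
final case class CometOp(name: String) extends Plan    // already produces Arrow
final case class SparkOp(name: String) extends Plan    // produces Spark rows or columnar batches
final case class ToArrow(child: Plan) extends Plan     // hypothetical conversion operator
final case class NativeWrite(child: Plan) extends Plan // hypothetical native writer node

object InsertConversion {
  // Wrap a non-Comet child of the writer in a conversion node;
  // leave every other plan unchanged.
  def apply(plan: Plan): Plan = plan match {
    case NativeWrite(child: SparkOp) => NativeWrite(ToArrow(child))
    case other                       => other
  }
}
```

Under this assumption the writer node stays free of conversion logic, because the planner guarantees that its child already yields Arrow data.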

Development

Successfully merging this pull request may close these issues.

Comet fails when local writer enabled

2 participants