feat(write): add write pipeline with DataFusion INSERT INTO/OVERWRITE support #234
JingsongLi merged 4 commits into apache:main
Conversation
```rust
let row = BinaryRow::from_serialized_bytes(&msg.partition)?;
let mut spec = HashMap::new();
for (i, key) in partition_keys.iter().enumerate() {
    if let Some(datum) = extract_datum(&row, i, &data_types[i])? {
```
This will drop NULL partition keys from the overwrite predicate. I reproduced a case where overwriting the NULL partition also deletes other partitions.
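One way to address this, as a minimal sketch: keep an explicit `None` entry for NULL partition keys so the overwrite predicate still pins that partition instead of silently losing the constraint. Here `build_overwrite_spec` and the `Option<String>` row shape are illustrative stand-ins for the PR's `BinaryRow`/`extract_datum` types.

```rust
use std::collections::HashMap;

/// Build an overwrite partition spec that keeps NULL values as explicit
/// None entries. Skipping them (as in the snippet above) turns
/// "overwrite the NULL partition" into a weaker predicate that can
/// match, and delete, other partitions.
fn build_overwrite_spec(
    partition_keys: &[&str],
    row: &[Option<String>],
) -> HashMap<String, Option<String>> {
    let mut spec = HashMap::new();
    for (i, key) in partition_keys.iter().enumerate() {
        // Insert the key even when the datum is NULL; the predicate
        // layer can translate None into an `IS NULL` check.
        spec.insert(key.to_string(), row[i].clone());
    }
    spec
}
```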
```rust
let datum = extract_datum_from_arrow(batch, row_idx, field_idx, field.data_type())?;
if let Some(d) = datum {
    datums.push((d, field.data_type().clone()));
}
```
This will drop NULL bucket-key fields before hashing. Java preserves NULL positions here; see FixedBucketRowKeyExtractorTest.testUnCompactDecimalAndTimestampNullValueBucketNumber.
https://github.com/apache/paimon/blob/master/paimon-core/src/test/java/org/apache/paimon/table/sink/FixedBucketRowKeyExtractorTest.java
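A sketch of the behavior the Java test pins down, using a simplified stand-in hash rather than Paimon's actual bucket hash: NULL fields must keep their positions in the hashed sequence instead of being skipped before hashing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Compute a bucket from bucket-key fields, keeping NULLs in place.
/// Dropping None entries before hashing would make e.g. (1, NULL, 2)
/// hash the same as (1, 2), which disagrees with the Java extractor.
/// DefaultHasher is a placeholder for the real bucket hash function.
fn bucket_of(fields: &[Option<i64>], num_buckets: u64) -> u64 {
    let mut h = DefaultHasher::new();
    for f in fields {
        // Option<i64> hashes its None/Some discriminant too, so a NULL
        // field still contributes to the hash at its position.
        f.hash(&mut h);
    }
    h.finish() % num_buckets
}
```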
Good point! We should also fix NULL bucket value handling in TableScan.
+1
leaves12138
left a comment
Solid write pipeline implementation. The architecture mirrors the paimon-python design well and the delta-rs style direct-write pattern is a good fit.
Highlights:
- `TableWrite` + `DataFileWriter`: Clean per-(partition, bucket) writer model. The `divide_by_partition_bucket` routing via `arrow_select::take` is correct for now. Background file close via `JoinSet` in `roll_file()` and parallel `prepare_commit` with `try_join_all` are well thought out.
- `PaimonDataSink`: Proper `DataSink` implementation with `write_all` for INSERT and overwrite support. The dynamic partition predicate extraction from commit messages for OVERWRITE is the right approach.
- `TableCommit` refactoring: Splitting into explicit `commit()` (APPEND) and `overwrite()` (dynamic partition overwrite) methods is cleaner than the implicit `overwrite_partition` constructor arg.
- Integration tests: Comprehensive E2E coverage: unpartitioned, partitioned, fixed bucket, multi-commit, column projection, bucket filtering.
- `CoreOptions` additions: `file.compression`, `target-file-size`, `write.parquet-buffer-size` are the right knobs to expose.
Minor notes (non-blocking):
- `divide_by_partition_bucket` creates one `UInt32Array` of indices per row. For large batches this could be optimized with batch-level partition extraction (e.g., sort-by-partition then slice), but the current approach is correct and simple for a first pass.
- `DataFileMeta` uses `EMPTY_SERIALIZED_ROW` for `min_key`/`max_key` and zero sequence numbers; this is fine for append-only but worth a TODO note if PK/compaction support is planned later.
- The NULL datum handling fix (from the review thread) is correct: dropping NULL from bucket key datums and partition predicate specs was a real bug.
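The routing pattern discussed above can be sketched with plain std types (Arrow arrays and the `arrow_select::take` call omitted; `group_rows_by_key` is an illustrative name): group row indices by (partition, bucket) in one pass over the batch, then each index group would feed a single `take` call to slice out that writer's rows.

```rust
use std::collections::HashMap;

/// Group row indices by their (partition, bucket) key in one pass.
/// In the real pipeline, each Vec<u32> would be turned into a
/// UInt32Array and passed to `arrow_select::take` to materialize the
/// sub-batch for that (partition, bucket) writer.
fn group_rows_by_key(keys: &[(String, u32)]) -> HashMap<(String, u32), Vec<u32>> {
    let mut groups: HashMap<(String, u32), Vec<u32>> = HashMap::new();
    for (row_idx, key) in keys.iter().enumerate() {
        groups.entry(key.clone()).or_default().push(row_idx as u32);
    }
    groups
}
```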
+1, good to merge.
Purpose
Subtask of #232
Add TableWrite for writing Arrow RecordBatches to Paimon append-only tables. Each (partition, bucket) pair gets its own DataFileWriter with direct writes (matching delta-rs DeltaWriter pattern). File rolling uses tokio::spawn for background close, and prepare_commit uses try_join_all for parallel finalization across partition writers.
Key components:
- `TableWrite`: routes batches by partition/bucket, holds `DataFileWriter`s
- `DataFileWriter`: manages parquet file lifecycle with rolling support
- `WriteBuilder`: creates `TableWrite` and `TableCommit` instances
- `PaimonDataSink`: DataFusion `DataSink` integration for INSERT/OVERWRITE
- `FormatFileWriter`: extended with `flush()` and `in_progress_size()`
Configurable options via `CoreOptions`:
- `file.compression` (default: zstd)
- `target-file-size` (default: 256MB)
- `write.parquet-buffer-size` (default: 256MB)
Includes E2E integration tests for unpartitioned, partitioned, fixed-bucket, multi-commit, column projection, and bucket filtering.
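The file-rolling behavior described above can be modeled with a small sketch (struct and field names are illustrative, not the PR's exact API): accumulate the in-progress size as batches are written, and once it reaches the target file size, close the current file and start a new one.

```rust
/// Minimal model of file rolling: accumulate bytes, and when the
/// in-progress size reaches the target, "close" the file (record its
/// size) and start a new one. The real DataFileWriter instead closes
/// files on a background task (tokio::spawn) so writes are not blocked.
struct RollingWriter {
    target_file_size: u64,  // corresponds to the target-file-size option
    in_progress: u64,       // what in_progress_size() would report
    closed_files: Vec<u64>, // sizes of rolled (closed) files
}

impl RollingWriter {
    fn new(target_file_size: u64) -> Self {
        Self { target_file_size, in_progress: 0, closed_files: Vec::new() }
    }

    fn write(&mut self, bytes: u64) {
        self.in_progress += bytes;
        if self.in_progress >= self.target_file_size {
            // Roll: record the finished file and reset the counter.
            self.closed_files.push(self.in_progress);
            self.in_progress = 0;
        }
    }
}
```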
Brief change log
Tests
API and Format
Documentation