Is your feature request related to a problem or challenge?
`arrays_zip` (`arrays_zip_inner_with_field` in `datafusion/functions-nested/src/arrays_zip.rs`) always assembles its output by walking every row through per-column `MutableArrayData` builders, copying each input slice one row at a time (`builder.extend(0, start, end)`) and padding shorter rows with NULLs (`builder.extend_nulls(...)`).
When the inputs form a perfect zip — every input array has identical per-row element lengths, no null list rows with non-zero element slots, and therefore no null padding is needed — this row-by-row copy is wasted work. In that case the resulting struct child columns are bit-identical to the (concatenated) input value arrays, and the list offsets are identical to the inputs' offsets.
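To illustrate why the row-by-row copy is redundant in the perfect-zip case, here is a minimal sketch using plain slices rather than arrow's `MutableArrayData` (the function name `copy_per_row` is hypothetical, not from the codebase): when the offsets tile the values buffer contiguously from start to end, the per-row copy reproduces the input buffer exactly.

```rust
// Hedged illustration with plain Vec/slices, standing in for the
// per-row builder.extend(0, start, end) pattern described above.
fn copy_per_row(values: &[i32], offsets: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(values.len());
    // Walk every row, copying its slice [start, end) one row at a time.
    for w in offsets.windows(2) {
        let (start, end) = (w[0] as usize, w[1] as usize);
        out.extend_from_slice(&values[start..end]);
    }
    out
}

fn main() {
    let values = [10, 20, 30, 40];
    // Offsets covering the whole buffer contiguously: the copy is a no-op
    // reconstruction of `values`, which is exactly the wasted work.
    let copied = copy_per_row(&values, &[0, 2, 4]);
    assert_eq!(copied, values.to_vec());
    println!("{copied:?}");
}
```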
Describe the solution you'd like
Detect the perfect-zip case up front and skip the `MutableArrayData` path entirely:
- Build the output struct child columns directly from the original input value `ArrayRef`s (clone / concat, no per-row copy).
- Reuse an input array's offset buffer for the resulting `ListArray` instead of rebuilding it.
This keeps the existing general path as a fallback for the ragged / null-padded cases.
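The up-front detection could look roughly like the sketch below, operating on each input list's offsets and row validity. This is a hypothetical helper (`is_perfect_zip` is not an existing DataFusion function), written over plain slices rather than arrow buffers to keep it self-contained; the real implementation would compare `OffsetBuffer`s and `NullBuffer`s.

```rust
/// Hedged sketch: the inputs form a "perfect zip" when every input
/// shares identical per-row offsets and every null row occupies zero
/// element slots (so no NULL padding would be needed).
fn is_perfect_zip(offsets: &[&[i32]], validity: &[Option<&[bool]>]) -> bool {
    let first = match offsets.first() {
        Some(o) => *o,
        None => return false,
    };
    // All inputs must have identical per-row element lengths.
    if !offsets.iter().all(|o| *o == first) {
        return false;
    }
    // A null list row with a non-zero slot would still need padding,
    // so it disqualifies the fast path.
    for valid in validity.iter().flatten() {
        for (row, &is_valid) in valid.iter().enumerate() {
            if !is_valid && first[row + 1] != first[row] {
                return false;
            }
        }
    }
    true
}

fn main() {
    // Two inputs, identical offsets, no nulls: fast path applies.
    assert!(is_perfect_zip(&[&[0, 2, 4], &[0, 2, 4]], &[None, None]));
    // Ragged inputs: fall back to the general MutableArrayData path.
    assert!(!is_perfect_zip(&[&[0, 2, 4], &[0, 1, 4]], &[None, None]));
}
```

If the check passes, the output struct children are clones of the input value arrays and the first input's offsets can be reused verbatim; otherwise the existing builder path runs unchanged.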
Describe alternatives you've considered
Keep the current always-copy implementation. It is correct but does avoidable work for the common case where all zipped arrays line up.
Additional context
Raised by @paleolimbot while reviewing #21984:
> Not here, but for the perfect zip (all value arrays the same length, no nulls with non-zero element slot lengths, no null padding needed) this should ideally be just clones of the original arrayrefs
Split out of #21984 (a metadata-propagation bugfix) since this is an orthogonal performance optimization that warrants its own benchmarks and edge-case tests.