Current shuffle format has too much overhead with default batch size #3882

@andygrove

Description

Describe the bug

The current shuffle format writes each batch using the Arrow IPC Stream format, with a single batch per stream instance, which means the schema is re-encoded for every batch. There may also be overhead from creating a new compression codec for each batch.

In one example with the default batch size, Comet shuffle files were 50% larger than Spark shuffle files, and overall query performance was 10% slower than Spark. After doubling the batch size, Comet shuffle files were only 8% larger than Spark's and performance was 15% faster than Spark.

Increasing the batch size consistently improves performance, but potentially at the cost of downstream operators using more memory, although we have not measured this.
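The size numbers above can be understood with a simple model: if the schema (plus stream framing) is re-encoded once per batch, the fixed overhead scales with the number of batches, so doubling the batch size roughly halves it. The sketch below is illustrative only (all byte counts are hypothetical assumptions, not measurements of Comet's actual format):

```python
# Illustrative model (not Comet's actual on-disk format): estimate shuffle
# file size when the schema is re-encoded for every batch, as happens when
# each batch is written as its own Arrow IPC stream.

def shuffle_file_size(total_rows, batch_size, row_bytes, schema_overhead_bytes):
    """Rough file size: row payload plus one schema/framing copy per batch."""
    num_batches = -(-total_rows // batch_size)  # ceiling division
    payload = total_rows * row_bytes
    overhead = num_batches * schema_overhead_bytes
    return payload + overhead

# Hypothetical numbers: 1M rows, 100 bytes/row, 2 KiB schema/framing per batch.
small = shuffle_file_size(1_000_000, 8192, 100, 2048)
large = shuffle_file_size(1_000_000, 16384, 100, 2048)

# Doubling the batch size halves the number of batches, so the schema
# overhead halves while the payload stays the same.
print(small - large)  # bytes saved by doubling the batch size → 124928
```

This also suggests why writing many batches into a single long-lived IPC stream (encoding the schema once, reusing one codec) would shrink the files without touching the batch size.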

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata

Labels

area:shuffle (Shuffle (JVM and native)), bug (Something isn't working), performance, priority:high (Crashes, panics, segfaults, major functional breakage)
