Skip to content

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Jan 12, 2026

Which issue does this PR close?

Closes #3067

Builds on #3077

Rationale for this change

When a native plan has round-robin partitioning, we do columnar-to-row, fall back to Spark for partitioning, then row-to-columnar before writing the shuffle file. This is extremely inefficient.

What changes are included in this PR?

Comet's native shuffle now supports df.repartition(n) (round-robin partitioning). However, it is disabled by default because it produces different partition assignments than Spark. Spark's round-robin implementation sorts rows by their binary UnsafeRow representation before assigning partitions to ensure deterministic output for fault tolerance. Since Comet uses Arrow format internally (which has a completely different binary layout), we cannot match Spark's exact partition assignments.

Instead of true round-robin, Comet implements it as hash partitioning on ALL columns. This achieves the same semantic goals:

  • Even distribution - rows are distributed evenly across partitions
  • Deterministic - same input always produces the same partition assignments (important for fault tolerance / task retries)
  • No semantic grouping - unlike hash partitioning on specific columns, this doesn't group related rows together

To enable, set spark.comet.native.shuffle.partitioning.roundrobin.enabled=true.

How are these changes tested?

@codecov-commenter
Copy link

codecov-commenter commented Jan 12, 2026

Codecov Report

❌ Patch coverage is 68.57143% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.55%. Comparing base (f09f8af) to head (b35e402).
⚠️ Report is 843 commits behind head on main.

Files with missing lines Patch % Lines
...t/execution/shuffle/CometShuffleExchangeExec.scala 43.75% 7 Missing and 2 partials ⚠️
...k/src/main/scala/org/apache/comet/serde/hash.scala 0.00% 0 Missing and 1 partial ⚠️
...t/execution/shuffle/CometNativeShuffleWriter.scala 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3076      +/-   ##
============================================
+ Coverage     56.12%   59.55%   +3.42%     
- Complexity      976     1382     +406     
============================================
  Files           119      167      +48     
  Lines         11743    15572    +3829     
  Branches       2251     2585     +334     
============================================
+ Hits           6591     9274    +2683     
- Misses         4012     4998     +986     
- Partials       1140     1300     +160     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andygrove andygrove changed the title feat: experimental support for round-robin partitioning in native shuffle feat: Add support for round-robin partitioning in native shuffle Jan 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add round-robin partitioning support to native shuffle

2 participants