
Conversation

@andygrove (Member) commented Jan 12, 2026

Which issue does this PR close?

Closes #.

Rationale for this change

This is needed for the following features:

  • Support complex types as partition keys in hash partitioning in native shuffle
  • Support round-robin partitioning in native shuffle

What changes are included in this PR?

How are these changes tested?

codecov-commenter commented Jan 12, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.50%. Comparing base (f09f8af) to head (078b204).
⚠️ Report is 845 commits behind head on main.

Files with missing lines                               | Patch % | Lines
...k/src/main/scala/org/apache/comet/serde/hash.scala | 75.00%  | 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3077      +/-   ##
============================================
+ Coverage     56.12%   59.50%   +3.37%     
- Complexity      976     1381     +405     
============================================
  Files           119      167      +48     
  Lines         11743    15560    +3817     
  Branches       2251     2586     +335     
============================================
+ Hits           6591     9259    +2668     
- Misses         4012     5002     +990     
- Partials       1140     1299     +159     

☔ View full report in Codecov by Sentry.

// Hash each element in sequence, chaining the hash values
for elem_idx in 0..len {
    let elem_array = values.slice(start + elem_idx, 1);
    let mut single_hash = [*hash];
Contributor:

why is single_hash an array?

Member Author (andygrove):

single_hash is an array because the recursive hash method interface expects a slice of hashes, and this lets us reuse that code rather than add another version of it.
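As a rough illustration of that reuse (a minimal sketch with hypothetical function names, not the actual Comet code): the shared routine takes a mutable slice with one hash per row, so wrapping the current chained hash in a one-element array lets the same routine hash a single sliced element.

use arrow::array::Array;

// Hypothetical stand-in for the slice-based hash routine: one hash seed per
// row, each updated in place (the real code applies murmur3 recursively).
fn hash_array(values: &dyn Array, hashes: &mut [u32]) {
    for (i, h) in hashes.iter_mut().enumerate() {
        // placeholder mixing, only to keep the sketch self-contained
        *h = h.wrapping_mul(31).wrapping_add(values.is_valid(i) as u32);
    }
}

// Hashing one element of a nested array: slice out a length-1 view and wrap
// the current chained hash in a one-element array so the same slice-based
// routine can be reused instead of adding a scalar variant.
fn hash_list_element(values: &dyn Array, start: usize, elem_idx: usize, hash: &mut u32) {
    let elem_array = values.slice(start + elem_idx, 1); // length-1 view of the child array
    let mut single_hash = [*hash];                      // [u32; 1] borrows as &mut [u32]
    hash_array(elem_array.as_ref(), &mut single_hash);
    *hash = single_hash[0];                             // carry the chained value forward
}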

/**
 * These tests verify that Comet's native implementation of murmur3 hash produces identical
 * results to Spark's implementation for all supported data types.
 */
class CometHashExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
Contributor:

Should we add this expression to one of the fuzz suites?

Member Author (andygrove):

Sure. Are you thinking more about the fuzz data generation aspect, or testing across different scans/shuffles, or both?

Contributor:

Mostly just interested in the fuzz data generation; it seems like a good schema to throw at this.

Member Author (andygrove):

yup, that already uncovered a bug 💪
