
[AURON #2234] Handle Hudi scan options case-insensitively #2235

Open

weimingdiit wants to merge 1 commit into apache:master from weimingdiit:fix/hudi-case-insensitive-scan-options


@weimingdiit
Contributor

Which issue does this PR close?

Closes #2234

Rationale for this change

Auron's Hudi native scan detection looks up several Hudi datasource options and table properties by key, including the table type, base file format, table path, and time travel options.

These lookups were previously case-sensitive. Spark generally treats datasource option keys case-insensitively, so users can pass mixed-case Hudi option keys that Spark accepts but that Auron's exact-match lookups miss, causing Auron to overlook important scan metadata.

This can lead to incorrect native scan conversion decisions. For example, if a mixed-case MOR table type option goes undetected, a query that should fall back to Spark could be treated as eligible for native scan.
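To make the failure mode concrete, here is a minimal sketch. It assumes the table type is stored under a `hoodie.table.type` key with the value `MERGE_ON_READ`; both are illustrative assumptions, not the constants used in the patch. An exact-match `Map.get` misses the mixed-case key, while a lookup that compares keys with `equalsIgnoreCase` finds it.

```scala
// Hypothetical demo; key name and value are assumptions for illustration only.
object CaseSensitiveLookupDemo {
  val TableTypeKey = "hoodie.table.type" // assumed key
  val MergeOnRead = "MERGE_ON_READ"      // assumed value

  // Exact-match lookup: only finds the key if its casing matches exactly.
  def isMorCaseSensitive(options: Map[String, String]): Boolean =
    options.get(TableTypeKey).contains(MergeOnRead)

  // Key lookup that ignores case; the value is still compared exactly.
  def isMorCaseInsensitive(options: Map[String, String]): Boolean =
    options
      .collectFirst { case (k, v) if k.equalsIgnoreCase(TableTypeKey) => v }
      .contains(MergeOnRead)

  def main(args: Array[String]): Unit = {
    val userOptions = Map("Hoodie.Table.Type" -> "MERGE_ON_READ")
    println(isMorCaseSensitive(userOptions))   // false: the MOR table goes undetected
    println(isMorCaseInsensitive(userOptions)) // true
  }
}
```

In a scan-eligibility check, the `false` from the first variant is the unsafe outcome: the code never sees the MOR setting that should trigger a fallback to Spark.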

What changes are included in this PR?

This PR makes Hudi scan option and property lookup case-insensitive for:

  • Hudi table type
  • Hudi base file format
  • Hudi table path
  • Hudi time travel options
  • Hudi catalog/storage properties
  • Hudi table properties loaded from .hoodie/hoodie.properties

It also adds unit test coverage for mixed-case Hudi option keys.
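A case-insensitive lookup that works for both option maps and `.hoodie/hoodie.properties` could look like the following sketch. The object and method names are hypothetical and are not taken from `HudiScanSupport.scala`. The design choice shown (prefer an exact-case match, then fall back to an ignore-case scan) is one reasonable option, not necessarily the patch's behavior.

```scala
import java.io.StringReader
import java.util.Properties
import scala.collection.JavaConverters._

// Hypothetical helpers; names are illustrative, not the patch's API.
object CaseInsensitiveLookup {

  // Prefer an exact-case match, else the first key equal to `key` ignoring case.
  // If several keys differ only in casing, which one wins in the fallback is unspecified.
  def get(options: Map[String, String], key: String): Option[String] =
    options.get(key).orElse {
      options.collectFirst { case (k, v) if k.equalsIgnoreCase(key) => v }
    }

  // Same lookup for java.util.Properties, e.g. as loaded from hoodie.properties.
  def get(props: Properties, key: String): Option[String] =
    Option(props.getProperty(key)).orElse {
      props.stringPropertyNames().asScala.collectFirst {
        case k if k.equalsIgnoreCase(key) => props.getProperty(k)
      }
    }

  def main(args: Array[String]): Unit = {
    val options = Map("Hoodie.Table.Type" -> "MERGE_ON_READ", "PATH" -> "/tmp/hudi_table")
    println(get(options, "hoodie.table.type")) // Some(MERGE_ON_READ)
    println(get(options, "path"))              // Some(/tmp/hudi_table)

    val props = new Properties()
    props.load(new StringReader("Hoodie.Table.Type=COPY_ON_WRITE\n"))
    println(get(props, "hoodie.table.type"))   // Some(COPY_ON_WRITE)
  }
}
```

For option maps alone, Spark also provides `org.apache.spark.sql.util.CaseInsensitiveStringMap`. Whether the patch uses it or hand-rolled helpers is not stated here.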

Are there any user-facing changes?

No API changes. This only makes Hudi native scan detection more compatible, and safer, when the casing of Hudi option keys differs.

How was this patch tested?

CI.

weimingdiit marked this pull request as ready for review May 5, 2026 09:51
slfan1989 requested a review from Copilot May 5, 2026 12:23

Copilot AI left a comment


Pull request overview

This PR improves the robustness of Auron’s Hudi native scan detection by making option/property lookups case-insensitive, aligning behavior with common Spark datasource option handling and reducing the risk of unsafe native scan eligibility decisions when users supply mixed-case Hudi keys.

Changes:

  • Normalize Hudi scan option and catalog/storage property lookups to be case-insensitive (table type, base file format, path, time travel keys).
  • Make .hoodie/hoodie.properties lookups case-insensitive for relevant keys.
  • Add a unit test covering mixed-case Hudi option keys.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • thirdparty/auron-hudi/src/main/scala/org/apache/spark/sql/auron/hudi/HudiScanSupport.scala: introduces shared case-insensitive lookup helpers and applies them across the Hudi support detection paths (options, catalog, metadata properties, time travel).
  • thirdparty/auron-hudi/src/test/scala/org/apache/spark/sql/auron/hudi/HudiScanSupportSuite.scala: adds a unit test validating that mixed-case option keys are handled as expected.
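A mixed-case check in the spirit of the new suite might look like the sketch below. It uses plain `assert` instead of ScalaTest to stay self-contained, and it assumes `as.of.instant` as the Hudi time travel key. Both the key list and the `hasTimeTravel` helper are hypothetical and do not reproduce the actual test.

```scala
// Hypothetical check; the time travel key list is an assumption for illustration.
object MixedCaseOptionsCheck {
  val TimeTravelKeys = Seq("as.of.instant") // assumed Hudi time travel key

  // True if any known time travel key is present, regardless of key casing.
  def hasTimeTravel(options: Map[String, String]): Boolean =
    options.keys.exists(k => TimeTravelKeys.exists(_.equalsIgnoreCase(k)))

  def main(args: Array[String]): Unit = {
    assert(hasTimeTravel(Map("as.of.instant" -> "20240101000000")))
    assert(hasTimeTravel(Map("AS.OF.Instant" -> "20240101000000")))
    assert(!hasTimeTravel(Map("path" -> "/tmp/hudi_table")))
    println("mixed-case checks passed")
  }
}
```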


weimingdiit force-pushed the fix/hudi-case-insensitive-scan-options branch 2 times, most recently from ef6158f to c3005c1 May 6, 2026 02:21
Signed-off-by: weimingdiit <weimingdiit@gmail.com>
weimingdiit force-pushed the fix/hudi-case-insensitive-scan-options branch from c3005c1 to 9ff82bc May 6, 2026 02:25
