[AURON #2221] Remove hard-coded Iceberg scan class name detection using type-based check #2226
Open
guixiaowen wants to merge 4 commits into apache:master
…g Iceberg table types.
added 2 commits
May 3, 2026 12:45
…on using type-based check
…on using type-based check
Pull request overview
This PR updates Auron’s Iceberg integration to avoid brittle string-based detection of Iceberg scan/partition classes, moving toward type-based checks to better tolerate Iceberg refactoring.
Changes:
- Replaced hard-coded `scan.getClass.getName` prefix/equality checks with a class-based gate for identifying Iceberg batch scans.
- Replaced hard-coded `SparkInputPartition` class-name equality with a class-based check.
- Added a small utility under `org.apache.iceberg.spark.source` to expose `Class[_]` handles for (likely) package-private Iceberg classes.
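The helper pattern described in the last bullet can be sketched as follows. This is only an illustration under stated assumptions: it uses stand-in classes instead of the real Iceberg ones so that it runs on its own, and the names `source`, `getClassOfSparkInputPartition`, `newScanForDemo`, and `HelperDemo` are hypothetical. Only `getClassOfSparkBatchQueryScan` appears in this PR's diff.

```scala
// Stand-in for the Iceberg package (hypothetical); the real helper is placed
// under org.apache.iceberg.spark.source so it can see restricted classes.
object source {
  // Hypothetical stand-ins, visible only inside `source`,
  // mimicking package-private Java classes.
  private[source] class SparkBatchQueryScan
  private[source] class SparkInputPartition

  // Declared in the same scope, so it may name the restricted classes and
  // hand out plain Class[_] handles to callers outside that scope.
  object AuronIcebergSourceUtil {
    def getClassOfSparkBatchQueryScan: Class[_] = classOf[SparkBatchQueryScan]
    // Hypothetical accessor, assumed by analogy with the one above.
    def getClassOfSparkInputPartition: Class[_] = classOf[SparkInputPartition]
    // Demo-only hook: builds an instance typed as AnyRef for the check below.
    def newScanForDemo(): AnyRef = new SparkBatchQueryScan
  }
}

object HelperDemo {
  import source.AuronIcebergSourceUtil

  // A caller outside `source` compares Class objects without ever naming
  // the restricted type or spelling out its fully qualified name.
  def isBatchQueryScan(scan: AnyRef): Boolean =
    scan.getClass == AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan
}
```

The design choice here is that the helper leaks only `Class[_]` values, so the caller's code never needs compile-time access to the restricted types themselves.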
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala | Swaps string-based Iceberg scan/partition detection to class-based checks. |
| thirdparty/auron-iceberg/src/main/scala/org/apache/iceberg/spark/source/AuronIcebergSourceUtil.scala | Introduces helper to access Class objects for Iceberg source types. |
  // Changelog scan carries row-level changes; not supported by native COW-only path.
- if (scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan") {
+ if (!(scan.getClass == AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan)) {
Which issue does this PR close?
Closes #2221
Rationale for this change
This PR removes string-based detection and replaces it with type-based checking.
The current implementation relies on hard-coded class name strings to detect Iceberg scan types:
```scala
if (!scanClassName.startsWith("org.apache.iceberg.spark.source.")) {
  return None
}
if (scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan") {
  return None
}
if (className != "org.apache.iceberg.spark.source.SparkInputPartition") {
  return None
}
```
This approach introduces tight coupling to Iceberg internal class naming and has several drawbacks:
- Fragile to upstream refactoring (class/package rename)
- Lacks type safety
- Hard to maintain and extend
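To illustrate the first two drawbacks, here is a small self-contained sketch. It does not use Iceberg; `FragilityDemo`, `RenamedBatchScan`, and the string `"example.OldBatchScan"` are hypothetical stand-ins for a class that was renamed upstream.

```scala
object FragilityDemo {
  // Hypothetical stand-in for a scan class after an upstream rename.
  class RenamedBatchScan

  // String-based gate: still compiles after the rename, but the stale
  // name simply never matches again, so the failure is silent.
  def matchesByName(scan: AnyRef): Boolean =
    scan.getClass.getName == "example.OldBatchScan"

  // Type-based gate: classOf refers to a real symbol, so a rename shows up
  // as a compile error here instead of a silent runtime mismatch.
  def matchesByType(scan: AnyRef): Boolean =
    scan.getClass == classOf[RenamedBatchScan]
}
```

With a stale string the name check quietly rejects every scan, whereas the type-based check keeps working once the code compiles against the renamed class.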
Notes on Changelog Scan
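SparkChangelogScan is not a subclass of SparkBatchQueryScan, so the class-based check against SparkBatchQueryScan already rejects changelog scans and the separate string comparison is no longer needed.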
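A minimal sketch of why a single class check covers the changelog case, assuming the relationship stated above. The classes below are hypothetical stand-ins declared locally, not the real Iceberg types, and `SubclassedBatchScan` is invented purely to show how the two comparison styles differ.

```scala
object ChangelogDemo {
  // Hypothetical stand-ins for illustration only.
  class SparkBatchQueryScan
  class SparkChangelogScan // deliberately unrelated to SparkBatchQueryScan
  class SubclassedBatchScan extends SparkBatchQueryScan

  // Exact class equality, the style used in this PR's diff.
  def isExactBatchScan(scan: AnyRef): Boolean =
    scan.getClass == classOf[SparkBatchQueryScan]

  // Alternative: instance check, which would also accept subclasses.
  def isBatchScanOrSubclass(scan: AnyRef): Boolean =
    classOf[SparkBatchQueryScan].isInstance(scan)
}
```

Both styles reject the unrelated changelog class. They differ only for subclasses: exact equality rejects them, while `isInstance` accepts them. Which behavior is wanted depends on whether Iceberg is expected to subclass the batch scan in the future.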
What changes are included in this PR?
Three conditional checks were changed to compare against class objects instead of hard-coded class-name strings.
Benefits
- Removes hard-coded class name dependency
- Improves type safety and readability
- More robust against Iceberg internal refactoring
- Cleaner and more maintainable logic
Are there any user-facing changes?
No changes.
How was this patch tested?
No new tests were added; the change relies on existing unit tests.