chore: run Spark 3.4 tests with native_datafusion scan#3722
chore: run Spark 3.4 tests with native_datafusion scan#3722andygrove wants to merge 10 commits intoapache:mainfrom
native_datafusion scan#3722Conversation
Tag tests with IgnoreCometNativeDataFusion and IgnoreCometNativeScan to match tags already applied in the 3.5.8 diff, plus 3 tests that are specific to Spark 3.4.
…diff Remove imports of IgnoreCometNativeDataFusion in DynamicPartitionPruningSuite and FileBasedDataSourceSuite since both files are in the org.apache.spark.sql package where IgnoreCometNativeDataFusion is defined, making the import redundant and causing "permanently hidden" compilation errors.
… diff The test override in SQLTestUtils checked DisableAdaptiveExecution first, so tests with both DisableAdaptiveExecution and IgnoreCometNativeDataFusion tags (like "static scan metrics") would enter the DAE branch and never check the ignore tag. Reorder to match the 3.5.8 diff: check Comet skip tags first with early returns, then handle DisableAdaptiveExecution.
native_datafusion scan [WIP]native_datafusion scan
| assert(bJoinExec.isEmpty) | ||
| val smJoinExec = collect(joinedDF.queryExecution.executedPlan) { | ||
| case smJoin: SortMergeJoinExec => smJoin | ||
| + case smJoin: CometSortMergeJoinExec => smJoin |
There was a problem hiding this comment.
what's the change here (and in the two changed lines below)?
There was a problem hiding this comment.
There is no change here. The + at the start is not indicating that anything changed in this PR. This is part of the diff that is already on main.
|
|
||
| val scanNodes = query.queryExecution.executedPlan.collect { | ||
| case scan: FileSourceScanExec => scan | ||
| + case scan: CometScanExec => scan |
There was a problem hiding this comment.
Is this not sufficient to enable the test?
There was a problem hiding this comment.
No. We would need to add CometNativeScanExec here to support native_datafusion. I am not sure if that is sufficient to actually make the test pass though.
There was a problem hiding this comment.
Again, this code is not changed in the PR - the test is just ignored in this PR for native_datafusion scan.
|
I'll check it this afternoon |
dev/diffs/3.4.3.diff
Outdated
| } | ||
|
|
||
| - test("filter pushdown - StringPredicate") { | ||
| + test("filter pushdown - StringPredicate", IgnoreCometNativeScan("cannot be pushed down")) { |
There was a problem hiding this comment.
Is this a new issue?
There was a problem hiding this comment.
hmm this should have been ignored with IgnoreCometDataFusionScan not IgnoreCometNativeScan. I will update this.
There was a problem hiding this comment.
do we have existing GH issue link to follow this later?
There was a problem hiding this comment.
I went ahead and updated this in this PR
comphead
left a comment
There was a problem hiding this comment.
Thanks @andygrove looks like #3311 is a biggest villain for this PR, marking the issue so we can reenable spark tests once #3311 fixed
… diff - static scan metrics → apache#3442 (DPP not supported in native_datafusion) - caseSensitive and SPARK-25207 duplicate fields → apache#3760 (new issue) - Python UDF filter pushdown → fixed by adding CometNativeScanExec pattern matches, removed IgnoreCometNativeDataFusion tag
I updated the diff to link to new specific issues so that we can close the old #3311 issue which covered multiple |
Which issue does this PR close?
Part of #3321
Related to #3311
Rationale for this change
The
native_datafusionSpark SQL tests were already running for Spark 3.5 but not for Spark 3.4. Adding 3.4 coverage revealed test failures that need to be skipped with the appropriate ignore tags.What changes are included in this PR?
native_datafusionCI workflowdev/diffs/3.4.3.diffto tag failing tests withIgnoreCometNativeDataFusionandIgnoreCometNativeScan, matching tags already applied in the 3.5.8 diff plus 3 tests specific to Spark 3.4:How are these changes tested?
By running the Spark SQL
native_datafusiontests in CI for Spark 3.4.3.