[spark] Split non-partition and partition predicates from pushPredicates to limit pushdown by Yohahaha · Pull Request #3397 · apache/fluss

Yohahaha · 2026-05-28T15:44:20Z

Purpose

Linked issue: close #xxx

Fix a bug where LIMIT is never pushed down when a query has partition filters (e.g., WHERE dt = 'x' LIMIT 1). The root cause is that pushPredicates returns all predicates including partition ones, causing Spark to keep a Filter node. Spark's pushDownLimit only invokes pushLimit when there are no filters (PhysicalOperation(_, Nil, ...)), so the Filter node blocks limit pushdown.
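The gating described above can be modelled in a few lines. This is a simplified sketch, not Spark's actual optimizer code: `Plan`, `Scan`, `planWith` and this `pushDownLimit` are hypothetical stand-ins, and predicates are plain strings.

```scala
// Hypothetical, simplified model of the behaviour described above.
// None of these types are Spark's; they only mirror the shape of the rule.
object LimitPushdownSketch {
  // Stand-in for a scan that may or may not have received a limit.
  final case class Scan(pushedLimit: Option[Int] = None)

  // Stand-in for a plan: the filters kept above the scan, plus the scan itself.
  final case class Plan(postScanFilters: Seq[String], scan: Scan)

  // Whatever pushPredicates returns is assumed to stay above the scan as a Filter.
  def planWith(returnedByPushPredicates: Seq[String]): Plan =
    Plan(returnedByPushPredicates, Scan())

  // The limit reaches the scan only when no filter remains above it.
  def pushDownLimit(plan: Plan, limit: Int): Plan =
    if (plan.postScanFilters.isEmpty) plan.copy(scan = plan.scan.copy(pushedLimit = Some(limit)))
    else plan // the remaining Filter blocks limit pushdown
}
```

With the old behaviour, `planWith(Seq("dt = 'x'"))` keeps a filter and the limit is never pushed. Once the partition predicate is no longer returned, `planWith(Seq.empty)` lets the limit through.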

Brief change log

SparkPartitionPredicate.scala: Refactor extract() to return nonPartitionPredicates and partitionPredicate.
FlussScanBuilder.scala: Return only non-partition predicates from pushPredicates() across all scan builders (partition filters, ARROW filters, lake filters).

Tests

API and Format

Documentation

Yohahaha · 2026-05-30T09:45:28Z

@fresh-borzoni @luoyuxia @YannByron PTAL!

fresh-borzoni

@Yohahaha Ty for the PR and nice catch, LGTM overall.
A couple of minor comments, PTAL

fresh-borzoni · 2026-05-31T02:18:13Z

+    val (partitionPredicates, nonPartitionPredicates) = predicates.partition {
+      sparkPredicate =>
+        SparkPredicateConverter
+          .convert(rowType, sparkPredicate)


Small thing: we call convert(...) here to classify the predicate, then call it again right below to actually use it. Could we convert once and reuse the result?

Something like mapping each predicate to its converted form, then partition-ing on isDefined. Only runs at planning time so not a big deal, just avoids doing the same work twice.
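A rough sketch of that suggestion, under assumptions: `SparkPred`, `FlussPred` and this `convert` are hypothetical stand-ins (the real `SparkPredicateConverter.convert` signature is not shown in the diff), and `convert` is assumed to return an `Option`.

```scala
// Hypothetical sketch: convert each predicate once, then split on isDefined.
object ConvertOnceSketch {
  // Stand-ins for Spark's Predicate and the converted Fluss predicate.
  final case class SparkPred(column: String, value: String)
  final case class FlussPred(column: String, value: String)

  // Assumed converter: Some(...) when the predicate is convertible against
  // the given columns, None otherwise.
  def convert(columns: Set[String], p: SparkPred): Option[FlussPred] =
    if (columns.contains(p.column)) Some(FlussPred(p.column, p.value)) else None

  // Returns (converted predicates, Spark predicates that did not convert).
  def split(columns: Set[String], predicates: Seq[SparkPred]): (Seq[FlussPred], Seq[SparkPred]) = {
    // Pair every predicate with its conversion result; convert runs once per predicate.
    val paired = predicates.map(p => p -> convert(columns, p))
    val (converted, rest) = paired.partition { case (_, c) => c.isDefined }
    (converted.collect { case (_, Some(c)) => c }, rest.map(_._1))
  }
}
```

For example, `split(Set("dt"), Seq(SparkPred("dt", "x"), SparkPred("id", "1")))` yields the converted `dt` predicate on the left and the untouched `id` predicate on the right.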

fresh-borzoni · 2026-05-31T02:28:02Z

  def flussConfig: FlussConfiguration

  override def pushPredicates(predicates: Array[Predicate]): Array[Predicate] = {
+    val nonPartitionPredicates = super.pushPredicates(predicates)


Here the lake path calls super.pushPredicates, which for ARROW tables runs convertAndStorePredicates and sets pushedPredicate/acceptedPredicates, then we overwrite both a few lines down.
So that work gets thrown away every time.

Could we pull the partition-extraction bit into a small helper and call that instead of super, so we skip the ARROW step we don't use?

Yohahaha · 2026-05-31

I did a small refactor of the filter pushdown. The lake and non-lake paths can now each implement their own pushdown logic, while the common partition pushdown logic lives in the parent class.

You can see the new interface and class inheritance structure below:

SupportsPushDownV2Filters (Spark)
│
└── FlussSupportsPushDownPartitionFilters
    │
    ├── FlussSupportsPushDownV2Filters
    │   │
    │   ├── FlussAppendScanBuilder
    │   └── FlussUpsertScanBuilder
    │
    └── FlussLakeSupportsPushDownV2Filters
        │
        ├── FlussLakeAppendScanBuilder
        └── FlussLakeUpsertScanBuilder
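For readers without the diff at hand, the shape of this hierarchy might look roughly like the sketch below. It is illustrative only: every name is a hypothetical stand-in rather than the PR's code, predicates are plain strings, and the partition check is a naive prefix match.

```scala
// Hypothetical sketch of a shared partition-extraction helper in a parent trait,
// with the lake and non-lake paths implementing pushPredicates independently.
object PushdownHierarchySketch {
  type Pred = String // stand-in for Spark's Predicate

  // Stand-in for Spark's SupportsPushDownV2Filters.
  trait PushDownV2Filters {
    def pushPredicates(predicates: Array[Pred]): Array[Pred]
  }

  // Parent trait: owns the common partition pushdown logic.
  trait PartitionFilters extends PushDownV2Filters {
    def partitionKeys: Set[String]
    private var stored: Array[Pred] = Array.empty[Pred]
    def pushedPartitionPredicates: Array[Pred] = stored

    // Stores the partition predicates and returns only the remaining ones.
    protected def extractPartitionPredicates(predicates: Array[Pred]): Array[Pred] = {
      val (part, rest) =
        predicates.partition(p => partitionKeys.exists(k => p.startsWith(k + " ")))
      stored = part
      rest
    }
  }

  // Non-lake path: partition extraction, then a placeholder for the ARROW step.
  trait NonLakeFilters extends PartitionFilters {
    var arrowPredicates: Array[Pred] = Array.empty[Pred]
    override def pushPredicates(predicates: Array[Pred]): Array[Pred] = {
      val rest = extractPartitionPredicates(predicates)
      arrowPredicates = rest // placeholder for ARROW-specific conversion
      rest
    }
  }

  // Lake path: calls the helper directly, so no ARROW work is done and discarded.
  trait LakeFilters extends PartitionFilters {
    override def pushPredicates(predicates: Array[Pred]): Array[Pred] =
      extractPartitionPredicates(predicates) // lake-specific conversion would go here
  }

  final class AppendBuilder(val partitionKeys: Set[String]) extends NonLakeFilters
  final class LakeAppendBuilder(val partitionKeys: Set[String]) extends LakeFilters
}
```

In this sketch both builders return only the non-partition predicates, so a query that filters on partition columns alone leaves nothing above the scan.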

fix

c576288

Yohahaha mentioned this pull request May 28, 2026

[spark] Implement SupportsPushDownLimit DSv2 interface #3346

Open

Yohahaha added 2 commits May 29, 2026 09:43

fix style

c28a45d

fix lake batch partition filter pushdown

840022d

fresh-borzoni reviewed May 31, 2026

fix comments

7277582

Yohahaha wants to merge 4 commits into apache:main from Yohahaha:spark-pushdown-partition-filter

2 participants
