What is the problem the feature request solves?
Runtime filtering is a high-impact optimization for join-heavy workloads: by filtering data at scan time, it can dramatically reduce I/O (often 90% or more on selective joins) and improve query performance.
Runtime filters are lightweight data structures (IN sets, Min/Max bounds, Bloom filters) built during the hash join build phase and pushed down to scan operators so that non-matching data is filtered out before it is read from storage.
Example:

```sql
-- User writes standard SQL
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'USA';
```
- Comet detects the join opportunity
- Builds a filter from the `customers` table during the join build phase
- Applies the filter to the `orders` scan automatically
- The query runs faster with less I/O
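Conceptually, the steps above amount to injecting a derived predicate into the probe-side scan. A minimal Python sketch of the idea (the tables, values, and filter logic are illustrative only, not Comet's actual implementation):

```python
# Illustrative sketch of runtime filtering; not Comet's actual code.

# Build side: customers, with the query's country = 'USA' predicate.
customers = [
    {"id": 1, "name": "Alice", "country": "USA"},
    {"id": 2, "name": "Bob", "country": "DE"},
    {"id": 3, "name": "Carol", "country": "USA"},
]

# Hash join build phase: collect the join keys that survive the
# build-side filter. This set becomes the runtime IN filter.
in_filter = {c["id"] for c in customers if c["country"] == "USA"}

# Probe side: the runtime filter is applied at scan time, so rows that
# cannot possibly match are skipped before they reach the join operator.
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 3},
]
scanned = [o for o in orders if o["customer_id"] in in_filter]

print([o["order_id"] for o in scanned])  # [10, 12]
```

Only orders 10 and 12 survive the scan; order 11 never reaches the join because its customer is not in the build-side key set.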
Filter Types
- IN Filter (small cardinality, <1000)
- Min/Max Filter (numeric/date types)
- Bloom Filter (large cardinality, future; probabilistic data structure)
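As a rough illustration of the Bloom filter variant, here is a minimal sketch (the hash scheme and sizes are arbitrary; a real implementation would size the bit array from the expected key count and the configured false-positive rate):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter sketch: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into a single int

    def _positions(self, key):
        # Derive num_hashes bit positions from a salted SHA-256 of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

bf = BloomFilter()
for customer_id in [1, 3, 7]:
    bf.add(customer_id)

print(bf.might_contain(3))    # True
print(bf.might_contain(999))  # almost certainly False
```

Because a Bloom filter admits false positives but no false negatives, it is safe to use for scan-time pruning: it may let a few non-matching rows through to the join, but it never discards a matching row.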
The system should automatically select the optimal filter type:
- Numeric/date → Min/Max Filter (most efficient)
- Small cardinality → IN Filter
- Large cardinality → Bloom Filter
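The selection heuristic above could be sketched as follows (the function name, type checks, and threshold handling are placeholders; the real logic would be driven by build-side statistics):

```python
import datetime

# Hypothetical selection logic; the threshold mirrors the proposed
# spark.comet.runtimeFilter.inFilterThreshold default of 1000.
IN_FILTER_THRESHOLD = 1000

def choose_filter(build_keys):
    """Pick a runtime filter type from the build-side join keys."""
    distinct = set(build_keys)
    # Numeric/date keys admit a cheap Min/Max range filter.
    if all(isinstance(k, (int, float, datetime.date)) for k in distinct):
        return ("minmax", min(distinct), max(distinct))
    # Small cardinality: an exact IN set is cheap to build and probe.
    if len(distinct) < IN_FILTER_THRESHOLD:
        return ("in", distinct)
    # Large cardinality: fall back to a Bloom filter (future work).
    return ("bloom", None)

print(choose_filter([5, 1, 9]))      # ('minmax', 1, 9)
print(choose_filter(["a", "b"])[0])  # in
```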
Users should be able to control runtime filters via Spark configuration:
```scala
// Enable/disable runtime filters
spark.conf.set("spark.comet.runtimeFilter.enabled", true)

// Adjust thresholds
spark.conf.set("spark.comet.runtimeFilter.inFilterThreshold", 1000)
spark.conf.set("spark.comet.runtimeFilter.bloomFilterFpp", 0.01)
```
Describe the potential solution
No response
Additional context
No response