Describe the bug
The logical optimizer rule optimize_projections can cause recursive CTEs to fail or return incorrect results. This is caused by attempting to remove projections based on what the outer query returns independently of what the recursive term uses.
This is introduced by PR #16696 to solve #16684. I think the best approach to solve this bug while keeping #16684 fixed is to optimize the projections of the static term as if it was a regular query, and not attempt to optimize based on the outer query. As far as I can tell PostgreSQL and DuckDB work like this. As a bonus, I think it would solve this issue as well: #17853.
If anyone has a better suggestion please let me know.
To Reproduce
For example, this query should return two rows:
with recursive t(k, v) as (
select 1 k, 10 v
union all
select 2, 20 from t where k = 1
)
select v from t;
+----+
| v |
+----+
| 10 |
+----+
-- k is removed from the static term, but it is used in the recursive term
SubqueryAlias: t
Projection: v AS v
RecursiveQuery: is_distinct=false
Projection: Int64(10) AS v
EmptyRelation: rows=1
Projection: Int64(20)
Filter: t.k = Int64(1)
TableScan: t projection=[k]
This query fails:
with recursive t(k, v) as (
select 1 k, 10 v
union all
select 2, 20 from t where v = 10
)
select v from t;
Arrow error: Schema error: project index 1 out of bounds, max field 1
Expected behavior
Both queries should return 2 rows.
Additional context
No response
Describe the bug
The logical optimizer rule
optimize_projectionscan cause recursive CTEs to fail or return incorrect results. This is caused by attempting to remove projections based on what the outer query returns independently of what the recursive term uses.This is introduced by PR #16696 to solve #16684. I think the best approach to solve this bug while keeping #16684 fixed is to optimize the projections of the static term as if it was a regular query, and not attempt to optimize based on the outer query. As far as I can tell PostgreSQL and DuckDB work like this. As a bonus, I think it would solve this issue as well: #17853.
If anyone has a better suggestion please let me know.
To Reproduce
For example, this query should return two rows:
This query fails:
Expected behavior
Both queries should return 2 rows.
Additional context
No response