fix: raise AmbiguousReference error for duplicate column names in subquery by xiedeyantu · Pull Request #21236 · apache/datafusion

xiedeyantu · 2026-03-29T13:49:13Z

Which issue does this PR close?

Closes Unambiguous Column Reference Error Not Triggered in SQL Query #21232.

Rationale for this change

When two joined tables share a column with the same name (e.g. age), a SELECT * inside a derived table subquery produces duplicate column names. Previously, referencing such a column by its unqualified name from the outer query silently succeeded instead of raising an ambiguity error, violating standard SQL semantics.
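
To illustrate why the outer reference used to resolve silently, here is a minimal sketch rather than DataFusion's actual code. It assumes the renaming scheme mentioned in the review thread below (later duplicates become name:N); the helpers `rename_duplicates` and `find_unqualified` and the exact numbering of N are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the renaming step: the first occurrence of a
/// name is kept as-is, later occurrences become `name:N` (numbering assumed).
fn rename_duplicates(names: &[&str]) -> Vec<String> {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    names
        .iter()
        .map(|&name| {
            let count = seen.entry(name).or_insert(0);
            let out = if *count == 0 {
                name.to_string()
            } else {
                format!("{name}:{count}")
            };
            *count += 1;
            out
        })
        .collect()
}

/// Naive lookup: indices of all fields whose post-rename name equals `name`.
fn find_unqualified(fields: &[String], name: &str) -> Vec<usize> {
    fields
        .iter()
        .enumerate()
        .filter(|(_, f)| f.as_str() == name)
        .map(|(i, _)| i)
        .collect()
}
```

With input columns aid, age, bid, age, the renamed schema contains only one field literally named age, so a naive lookup finds a single match and has no reason to report ambiguity.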

What changes are included in this PR?

Added an ambiguous_names: HashSet<String> field to DFSchema to track column names that are structurally ambiguous in a given schema context.
Added DFSchema::with_ambiguous_names (builder) and DFSchema::ambiguous_names (accessor) methods.
In SubqueryAlias::try_new, after unique_field_aliases renames duplicate columns to keep the Arrow schema valid, the original (pre-rename) names are collected into ambiguous_names and attached to the output schema.
In DFSchema::qualified_field_with_unqualified_name, any lookup of an ambiguous name now immediately returns SchemaError::AmbiguousReference.
In Column::normalize_with_schemas_and_ambiguity_check, even a single structural match is rejected when the containing schema has flagged the name as ambiguous.
Updated the bad_extension_planner snapshot test to include the new ambiguous_names field in the DFSchema debug output.
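
The lookup behavior described in the list above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the real DFSchema: `SimpleSchema`, `LookupError`, and `field_index` are hypothetical names, and only the `with_ambiguous_names` builder name is borrowed from the description.

```rust
use std::collections::HashSet;

/// Hypothetical error type standing in for SchemaError.
#[derive(Debug, PartialEq)]
enum LookupError {
    AmbiguousReference(String),
    FieldNotFound(String),
}

/// Hypothetical, simplified schema: post-rename field names plus the set of
/// original names that were flagged as ambiguous.
struct SimpleSchema {
    fields: Vec<String>,
    ambiguous_names: HashSet<String>,
}

impl SimpleSchema {
    fn new(fields: Vec<String>) -> Self {
        Self { fields, ambiguous_names: HashSet::new() }
    }

    /// Builder-style setter, mirroring the name used in the PR description.
    fn with_ambiguous_names(mut self, names: HashSet<String>) -> Self {
        self.ambiguous_names = names;
        self
    }

    /// Reject flagged names before any positional search, so that even a
    /// single structural match is reported as ambiguous.
    fn field_index(&self, name: &str) -> Result<usize, LookupError> {
        if self.ambiguous_names.contains(name) {
            return Err(LookupError::AmbiguousReference(name.to_string()));
        }
        self.fields
            .iter()
            .position(|f| f.as_str() == name)
            .ok_or_else(|| LookupError::FieldNotFound(name.to_string()))
    }
}
```

The point of checking the set first is that the renamed schema alone cannot distinguish "one column named age" from "several columns named age, all but one renamed".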

Are these changes tested?

The existing join_with_ambiguous_column, order_by_ambiguous_name, and group_by_ambiguous_name tests continue to pass. A new test case covering the reported scenario (select age from (SELECT * FROM a join b on a.aid = b.bid) as t) should be added to datafusion/sql/tests/sql_integration.rs.

Are there any user-facing changes?

Yes. Queries that previously silently resolved an ambiguous column reference through a derived-table subquery will now fail with "Schema error: Ambiguous reference to unqualified field <name>", consistent with standard SQL behavior and with how DataFusion already handles the same ambiguity at the direct JOIN level.

xudong963

Could you please add some slt tests? For example, multiple joins (2, 3, etc.) with duplicated columns.

xudong963 · 2026-03-30T08:01:19Z

datafusion/expr/src/logical_plan/plan.rs

        let aliases = unique_field_aliases(plan.schema().fields());
        let is_projection_needed = aliases.iter().any(Option::is_some);

+        // Collect the set of unqualified field names that are ambiguous in this


This marks a name as ambiguous when unique_field_aliases provides a rename for it. But unique_field_aliases renames only the later duplicates: if there are 3 columns named age, the 2nd and 3rd get renamed.

The code collects field.name() for each renamed field, so it collects "age" twice and puts it in the set. That works correctly due to HashSet dedup, but the first age (which was NOT renamed) is also ambiguous and only ends up in the set because one of the later duplicates shares its name. This is coincidentally correct but fragile: if unique_field_aliases ever changed to rename ALL duplicates (including the first), or if it renamed to something other than name:N, the logic could break. 🤔

A cleaner approach: count occurrences of each name and mark any name appearing 2+ times.
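
A minimal sketch of that counting approach, assuming the field names are available as plain strings; the function name `ambiguous_names` and its signature are hypothetical and not taken from the PR's final code:

```rust
use std::collections::{HashMap, HashSet};

/// Returns every name that appears two or more times in `field_names`,
/// independent of how (or whether) duplicates are later renamed.
fn ambiguous_names(field_names: &[&str]) -> HashSet<String> {
    // First pass: count occurrences of each unqualified name.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &name in field_names {
        *counts.entry(name).or_insert(0) += 1;
    }
    // Second pass: keep only the names seen at least twice.
    counts
        .into_iter()
        .filter(|&(_, n)| n >= 2)
        .map(|(name, _)| name.to_string())
        .collect()
}
```

Because this looks only at the original names, it does not depend on which duplicates get renamed or on the format of the generated aliases.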

@xudong963 Thank you very much for your review and excellent suggestions. My initial intention for this PR was to eliminate the unique naming convention—specifically, the name:N format—but it appears to play a critical role internally (it would be great if you could provide some further details on this). Consequently, I introduced an additional ambiguous_names field to track duplicate column names. To be honest, I felt this approach lacked elegance, but I couldn't come up with a better alternative at the time. Having reviewed your suggestions, however, I now believe they offer a superior solution; I will proceed to refactor this PR based on that approach. I will also add the corresponding SLT tests.

xiedeyantu · 2026-03-30T14:33:51Z

@xudong963 I have made the changes based on your suggestions and added tests. There is one failing item in the CI, but it appears unrelated to the current PR, as the main branch shows the same failure. Could you please review this again? Thanks!

fix: raise AmbiguousReference error for duplicate column names in sub…

579aa74

…query derived tables

github-actions bot added the logical-expr (Logical plan and expressions), core (Core DataFusion crate), and common (Related to common crate) labels Mar 29, 2026

fix fmt

d14f7a4

xudong963 changed the title ~~fix: raise AmbiguousReference error for duplicate column names in sub…~~ fix: raise AmbiguousReference error for duplicate column names in subquery Mar 30, 2026

xudong963 reviewed Mar 30, 2026

xiedeyantu added 2 commits March 30, 2026 19:53

change to count occurrences of each name

8f9a111

added tests

bef7bc7

github-actions bot added the sqllogictest (SQL Logic Tests, .slt) label Mar 30, 2026

Merge branch 'main' into ambiguous-names

c46092a

xiedeyantu requested a review from xudong963 March 30, 2026 23:04

xiedeyantu wants to merge 5 commits into apache:main from xiedeyantu:ambiguous-names

2 participants