Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Add an api-proposal skill
Adds Enumerable.FullJoin and Queryable.FullJoin with 2 overloads each (with/without IEqualityComparer), following the same pattern as the existing LeftJoin/RightJoin operators added in .NET 10. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tagging subscribers to this area: @dotnet/area-system-linq
Pull request overview
This PR adds the FullJoin LINQ operator to complement the existing Join, LeftJoin, and RightJoin operators. FullJoin performs a full outer join, returning all elements from both sequences with matched pairs where keys correlate, plus unmatched elements from either side paired with a default counterpart.
Changes:
- Adds `FullJoin` operator to `System.Linq.Enumerable` with two overloads (with and without custom comparer)
- Adds corresponding `FullJoin` operator to `System.Linq.Queryable` for query provider translation
- Implements comprehensive test coverage for both `Enumerable` and `Queryable` versions
- Updates reference assemblies to expose the new public API
- Adds API proposal skill documentation and guidelines (separate feature for future proposals)
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/libraries/System.Linq/src/System/Linq/FullJoin.cs | Core implementation of FullJoin using hash-based join with HashSet tracking for unmatched inner elements |
| src/libraries/System.Linq/src/System.Linq.csproj | Adds FullJoin.cs to the project |
| src/libraries/System.Linq/ref/System.Linq.cs | Reference assembly API surface additions for FullJoin |
| src/libraries/System.Linq/tests/FullJoinTests.cs | Comprehensive tests covering edge cases, null handling, custom comparers, and argument validation |
| src/libraries/System.Linq/tests/System.Linq.Tests.csproj | Adds FullJoinTests.cs to test project |
| src/libraries/System.Linq.Queryable/src/System/Linq/Queryable.cs | Queryable implementation wrapping calls to Enumerable.FullJoin with expression tree generation |
| src/libraries/System.Linq.Queryable/ref/System.Linq.Queryable.cs | Reference assembly API surface additions for Queryable.FullJoin |
| src/libraries/System.Linq.Queryable/tests/FullJoinTests.cs | Queryable-specific tests for expression tree generation and argument validation |
| src/libraries/System.Linq.Queryable/tests/System.Linq.Queryable.Tests.csproj | Adds FullJoinTests.cs to Queryable test project |
| .github/skills/api-proposal/SKILL.md | New API proposal skill documentation (separate feature) |
| .github/skills/api-proposal/references/api-proposal-checklist.md | API proposal quality checklist (separate feature) |
| .github/skills/api-proposal/references/proposal-examples.md | Curated API proposal examples (separate feature) |
| /// person => person, | ||
| /// pet => pet.Owner, | ||
| /// (person, pet) => | ||
| /// new { OwnerName = person?.Name, Pet = pet?.Name }); |
The example doesn't demonstrate the comparer parameter, which is misleading because this overload exists specifically to accept a custom comparer. The example should either pass a comparer argument (e.g., StringComparer.OrdinalIgnoreCase if joining on strings) or mirror the first overload's example while passing null for the comparer to show that it's optional.
| /// new { OwnerName = person?.Name, Pet = pet?.Name }); | |
| /// new { OwnerName = person?.Name, Pet = pet?.Name }, | |
| /// comparer: null); |
| /// The <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> method generates a <see cref="MethodCallExpression"/> that represents | ||
| /// calling <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> itself as a constructed generic method. | ||
| /// It then passes the <see cref="MethodCallExpression"/> to the <see cref="IQueryProvider.CreateQuery{TElement}(Expression)"/> method | ||
| /// of the <see cref="IQueryProvider"/> represented by the <see cref="IQueryable.Provider"/> property of the <paramref name="outer"/> parameter. | ||
| /// </para> | ||
| /// <para> | ||
| /// The query behavior that occurs as a result of executing an expression tree that represents calling | ||
| /// <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> depends on the implementation of the type of the <paramref name="outer"/> parameter. |
The XML documentation cref references are incorrect. They reference Expression{Func{TOuter, TInner, TResult}} but the actual method signature uses Expression{Func{TOuter?, TInner?, TResult}} with nullable type parameters. All cref tags should use Expression{Func{TOuter?, TInner?, TResult}} to match the actual signature. This occurs on lines 1119, 1120, and 1126.
| /// The <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> method generates a <see cref="MethodCallExpression"/> that represents | |
| /// calling <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> itself as a constructed generic method. | |
| /// It then passes the <see cref="MethodCallExpression"/> to the <see cref="IQueryProvider.CreateQuery{TElement}(Expression)"/> method | |
| /// of the <see cref="IQueryProvider"/> represented by the <see cref="IQueryable.Provider"/> property of the <paramref name="outer"/> parameter. | |
| /// </para> | |
| /// <para> | |
| /// The query behavior that occurs as a result of executing an expression tree that represents calling | |
| /// <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> depends on the implementation of the type of the <paramref name="outer"/> parameter. | |
| /// The <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter?, TInner?, TResult}})"/> method generates a <see cref="MethodCallExpression"/> that represents | |
| /// calling <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter?, TInner?, TResult}})"/> itself as a constructed generic method. | |
| /// It then passes the <see cref="MethodCallExpression"/> to the <see cref="IQueryProvider.CreateQuery{TElement}(Expression)"/> method | |
| /// of the <see cref="IQueryProvider"/> represented by the <see cref="IQueryable.Provider"/> property of the <paramref name="outer"/> parameter. | |
| /// </para> | |
| /// <para> | |
| /// The query behavior that occurs as a result of executing an expression tree that represents calling | |
| /// <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter?, TInner?, TResult}})"/> depends on the implementation of the type of the <paramref name="outer"/> parameter. |
| /// The <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}}, IEqualityComparer{TKey})"/> method generates a <see cref="MethodCallExpression"/> that represents | ||
| /// calling <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}}, IEqualityComparer{TKey})"/> itself as a constructed generic method. | ||
| /// It then passes the <see cref="MethodCallExpression"/> to the <see cref="IQueryProvider.CreateQuery{TElement}(Expression)"/> method | ||
| /// of the <see cref="IQueryProvider"/> represented by the <see cref="IQueryable.Provider"/> property of the <paramref name="outer"/> parameter. | ||
| /// </para> | ||
| /// <para> | ||
| /// The query behavior that occurs as a result of executing an expression tree that represents calling | ||
| /// <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}}, IEqualityComparer{TKey})"/> depends on the implementation of the type of the <paramref name="outer"/> parameter. |
The XML documentation cref references are incorrect. They reference Expression{Func{TOuter, TInner, TResult}} but the actual method signature uses Expression{Func{TOuter?, TInner?, TResult}} with nullable type parameters. All cref tags should use Expression{Func{TOuter?, TInner?, TResult}} to match the actual signature. This occurs on lines 1172, 1173, and 1179.
| /// List{Person} people = new List{Person} { terry, charlotte, tom }; | ||
| /// List{Pet} pets = new List{Pet} { barley, boots, whiskers, daisy, stray }; |
The XML documentation example code contains syntax errors. List{Person} and List{Pet} should be List&lt;Person&gt; and List&lt;Pet&gt; respectively (using XML entity encoding for angle brackets in XML documentation comments).
| /// /* | ||
| /// This code produces the following output: | ||
| /// | ||
| /// Adams, Terry - Barley | ||
| /// Adams, Terry - Boots | ||
| /// Weiss, Charlotte - Whiskers | ||
| /// Chapkin, Tom - NONE | ||
| /// NONE - Daisy | ||
| /// */ |
The example output is incomplete. The example code includes 5 pets (barley, boots, whiskers, daisy, stray), but the output only shows 4 results. The stray pet (with Owner = null) should appear in the output as "NONE - Stray" since it has no matching person. The complete output should be:
Adams, Terry - Barley
Adams, Terry - Boots
Weiss, Charlotte - Whiskers
Chapkin, Tom - NONE
NONE - Daisy
NONE - Stray
| /// List{Person} people = new List{Person} { terry, charlotte, tom }; | ||
| /// List{Pet} pets = new List{Pet} { barley, boots, whiskers, daisy, stray }; |
The XML documentation example code contains syntax errors. List{Person} and List{Pet} should be List&lt;Person&gt; and List&lt;Pet&gt; respectively (using XML entity encoding for angle brackets in XML documentation comments).
| /// /* | ||
| /// This code produces the following output: | ||
| /// | ||
| /// Adams, Terry - Barley | ||
| /// Adams, Terry - Boots | ||
| /// Weiss, Charlotte - Whiskers | ||
| /// Chapkin, Tom - NONE | ||
| /// NONE - Daisy | ||
| /// */ |
The example output is incomplete. The example code includes 5 pets (barley, boots, whiskers, daisy, stray), but the output only shows 4 results. The stray pet (with Owner = null) should appear in the output as "NONE - Stray" since it has no matching person.
[API Proposal]: Introduce FullJoin() LINQ operator
Background and Motivation
The Problem
LINQ currently has `Join` (inner join), `LeftJoin` (left outer join, added in .NET 10), and `RightJoin` (right outer join, added in .NET 10), but it lacks a full outer join operator. A full outer join returns all elements from both sequences: matched pairs where keys correlate, plus unmatched elements from either side paired with a default counterpart.

This is the last missing join type from the standard SQL join family. Its absence is blocking EF Core from translating LINQ queries to SQL `FULL OUTER JOIN` (dotnet/efcore#37633).

Real-World Scenario: Hybrid Search
The primary motivating scenario is hybrid search in databases, where two search result sets (e.g., full-text search + vector similarity search) need to be joined and ranked together via techniques like Reciprocal Rank Fusion (RRF). In this scenario, highly-ranking results from either search method should be included even if there's no corresponding result from the other method — this is precisely the semantics of a full outer join.
See this blog post for example usage with SQL Server.
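To make the scenario concrete, here is a small sketch of Reciprocal Rank Fusion over two ranked result lists. The document IDs, the `Fuse` helper, and the constant `k = 60` (a commonly used default) are assumptions for illustration and are not part of the proposal. Since `FullJoin` is not available yet, the sketch emulates the full outer join by taking the union of the keys from both rank dictionaries, so an ID that appears in only one list still contributes to the fused ranking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HybridSearchSketch
{
    // RRF: score(d) = sum over result lists of 1 / (k + rank of d), with 1-based ranks.
    // Assumes IDs are unique within each list (ToDictionary throws on duplicates).
    public static List<(string Id, double Score)> Fuse(
        IReadOnlyList<string> fullText, IReadOnlyList<string> vector, int k = 60)
    {
        Dictionary<string, int> textRank = ToRanks(fullText);
        Dictionary<string, int> vectorRank = ToRanks(vector);

        // Union of keys = full outer join on Id: IDs missing from one side contribute 0 there.
        return textRank.Keys.Union(vectorRank.Keys)
            .Select(id => (Id: id, Score: Contribution(textRank, id, k) + Contribution(vectorRank, id, k)))
            .OrderByDescending(x => x.Score)
            .ThenBy(x => x.Id, StringComparer.Ordinal) // deterministic tie-break
            .ToList();
    }

    private static Dictionary<string, int> ToRanks(IReadOnlyList<string> ids) =>
        ids.Select((id, index) => (Id: id, Rank: index + 1))
           .ToDictionary(x => x.Id, x => x.Rank);

    private static double Contribution(Dictionary<string, int> ranks, string id, int k) =>
        ranks.TryGetValue(id, out int rank) ? 1.0 / (k + rank) : 0.0;

    public static void Main()
    {
        var fused = Fuse(new[] { "a", "b", "c" }, new[] { "c", "d" });
        Console.WriteLine(string.Join(", ", fused.Select(x => x.Id))); // c, a, b, d
    }
}
```

Here `c` ranks first because it appears in both lists, while `d` is still included even though the full-text search never returned it.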
Existing Workarounds and Why They're Insufficient
Without `FullJoin`, users must combine multiple operators.

These workarounds have three problems:
- … `FullJoin` operator can do everything in a single pass using one hash table.
- … `FULL OUTER JOIN` by LINQ providers like EF Core.

Prior Art
- `FULL OUTER JOIN` is standard across most major relational databases (SQL Server, PostgreSQL, Oracle, SQLite 3.39+).
- `pd.merge(left, right, how='outer')`: the `how='outer'` parameter performs a full outer join.
- … `cogroup` and explicit collection operations.
- `LeftJoin` and `RightJoin` were added in .NET 10 (#110292), establishing the naming convention and API shape that this proposal follows.

API Proposal
System.Linq.Enumerable

     namespace System.Linq;

     public static partial class Enumerable
     {
         public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(...);
         public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(..., IEqualityComparer<TKey>? comparer);
         public static IEnumerable<TResult> LeftJoin<TOuter, TInner, TKey, TResult>(...);
         public static IEnumerable<TResult> LeftJoin<TOuter, TInner, TKey, TResult>(..., IEqualityComparer<TKey>? comparer);
         public static IEnumerable<TResult> RightJoin<TOuter, TInner, TKey, TResult>(...);
         public static IEnumerable<TResult> RightJoin<TOuter, TInner, TKey, TResult>(..., IEqualityComparer<TKey>? comparer);

    +    public static IEnumerable<TResult> FullJoin<TOuter, TInner, TKey, TResult>(
    +        this IEnumerable<TOuter> outer,
    +        IEnumerable<TInner> inner,
    +        Func<TOuter, TKey> outerKeySelector,
    +        Func<TInner, TKey> innerKeySelector,
    +        Func<TOuter?, TInner?, TResult> resultSelector);
    +
    +    public static IEnumerable<TResult> FullJoin<TOuter, TInner, TKey, TResult>(
    +        this IEnumerable<TOuter> outer,
    +        IEnumerable<TInner> inner,
    +        Func<TOuter, TKey> outerKeySelector,
    +        Func<TInner, TKey> innerKeySelector,
    +        Func<TOuter?, TInner?, TResult> resultSelector,
    +        IEqualityComparer<TKey>? comparer);
     }

System.Linq.Queryable

     namespace System.Linq;

     public static partial class Queryable
     {
    +    public static IQueryable<TResult> FullJoin<TOuter, TInner, TKey, TResult>(
    +        this IQueryable<TOuter> outer,
    +        IEnumerable<TInner> inner,
    +        Expression<Func<TOuter, TKey>> outerKeySelector,
    +        Expression<Func<TInner, TKey>> innerKeySelector,
    +        Expression<Func<TOuter?, TInner?, TResult>> resultSelector);
    +
    +    public static IQueryable<TResult> FullJoin<TOuter, TInner, TKey, TResult>(
    +        this IQueryable<TOuter> outer,
    +        IEnumerable<TInner> inner,
    +        Expression<Func<TOuter, TKey>> outerKeySelector,
    +        Expression<Func<TInner, TKey>> innerKeySelector,
    +        Expression<Func<TOuter?, TInner?, TResult>> resultSelector,
    +        IEqualityComparer<TKey>? comparer);
     }

Key Design Point: Both Parameters Nullable in Result Selector
Unlike `LeftJoin` (where only `TInner?` is nullable) and `RightJoin` (where only `TOuter?` is nullable), `FullJoin` makes both parameters nullable in the result selector: `Func<TOuter?, TInner?, TResult>`. This reflects the semantics: either side can be unmatched, so either can be `default`.

API Usage
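As a hedged illustration of the nullable result selector, the sketch below joins people to pets. Because `FullJoin` is only proposed here, it does not call the real operator: `FullJoinSketch` is a hypothetical stand-in written for this example and restricted to reference types, and the `Person`/`Pet` records and sample data are assumptions. It uses a lookup of the inner sequence plus a set of matched keys, which is in the spirit of the strategy the proposal describes but is not the PR's implementation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public record Person(string Name);
public record Pet(string Name, string OwnerName);

public static class FullJoinDemo
{
    // Hypothetical stand-in for the proposed operator (reference types only).
    public static IEnumerable<TResult> FullJoinSketch<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter?, TInner?, TResult> resultSelector,
        IEqualityComparer<TKey>? comparer = null)
        where TOuter : class
        where TInner : class
    {
        ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector, comparer);
        var matchedKeys = new HashSet<TKey>(comparer);

        // Pass 1: every outer element, paired with each match or with null.
        foreach (TOuter o in outer)
        {
            TKey key = outerKeySelector(o);
            if (lookup.Contains(key))
            {
                matchedKeys.Add(key);
                foreach (TInner i in lookup[key])
                    yield return resultSelector(o, i);
            }
            else
            {
                yield return resultSelector(o, null);
            }
        }

        // Pass 2: inner elements whose key never matched, paired with null.
        foreach (IGrouping<TKey, TInner> group in lookup)
        {
            if (matchedKeys.Contains(group.Key)) continue;
            foreach (TInner i in group)
                yield return resultSelector(null, i);
        }
    }

    public static List<string> Run()
    {
        var people = new List<Person> { new("Terry"), new("Charlotte"), new("Tom") };
        var pets = new List<Pet>
        {
            new("Barley", "Terry"), new("Boots", "Terry"),
            new("Whiskers", "Charlotte"), new("Daisy", "Magnus"),
        };

        // Both lambda parameters may be null, so both are accessed with ?.
        return people.FullJoinSketch(
                pets,
                person => person.Name,
                pet => pet.OwnerName,
                (person, pet) => (person?.Name ?? "NONE") + " - " + (pet?.Name ?? "NONE"))
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

With this sample data the sketch prints `Terry - Barley`, `Terry - Boots`, `Charlotte - Whiskers`, `Tom - NONE`, and finally `NONE - Daisy`: outer elements come first in their original order, followed by the unmatched inner element.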
Result Ordering
The result ordering follows this pattern:
- … `default` for the inner side.
- … `default` for the outer side, in the order they appeared in the inner lookup.

This is consistent with SQL `FULL OUTER JOIN` semantics where the left side's rows come first, followed by unmatched right-side rows.

Implementation
The implementation uses the same hash join strategy as `LeftJoin`/`RightJoin`, building a `Lookup` of the inner sequence, then iterating the outer sequence. Additionally, it tracks which inner groupings were matched (via a `HashSet`) so it can yield unmatched inner elements in a second pass.

This requires a single construction of the inner lookup (O(n) where n = inner count), a single pass over the outer sequence, and a single pass over unmatched inner groupings. The `HashSet` for tracking matched groupings adds O(k) space where k = number of distinct matched inner keys, which is bounded by both the outer and inner sequence sizes.

Design Decisions
- Naming: `FullJoin` (not `FullOuterJoin`): Follows the convention established by `LeftJoin`/`RightJoin` in .NET 10, which omitted "Outer" from the name despite representing outer joins. The SQL standard name is `FULL OUTER JOIN`, but `FullJoin` is shorter, consistent, and unambiguous.
- No early-exit optimization for empty sequences: Unlike `LeftJoin` (which returns `[]` when `outer` is empty) and `RightJoin` (which returns `[]` when `inner` is empty), `FullJoin` cannot short-circuit on either being empty: an empty outer with a non-empty inner must still yield unmatched inner elements, and vice versa.
- HashSet for tracking matched groupings: An alternative approach would be to add a `_matched` field to the `Grouping<TKey, TElement>` class, but this would pollute the shared `Lookup` type with FullJoin-specific state. Using an external `HashSet<Grouping>` (with reference equality) is cleaner and doesn't impact other operators.

Alternatives Considered
- Express as `LeftJoin.Concat(RightJoin.Where(unmatched))`: Requires two full hash-table builds and two passes over the data. Cannot be translated to a single SQL `FULL OUTER JOIN` by query providers.
- Express via `GroupJoin` + `SelectMany`: Even more complex than the dual-join approach, and equally un-translatable to SQL.
- Add only the `Queryable` version: Would allow EF Core translation but leave in-memory LINQ without an efficient implementation. The .NET 10 precedent of adding both `Enumerable` and `Queryable` versions for `LeftJoin`/`RightJoin` argues against this.

Risks
- `FullJoin` does not conflict with any existing LINQ operators. Users who have defined their own `FullJoin` extension methods could see ambiguity, but this is the same (accepted) risk that `LeftJoin`/`RightJoin` introduced.
- … `LeftJoin`/`RightJoin`, the Queryable version is represented via existing `MethodCallExpression`s. LINQ providers can pattern-match on the method info to translate to SQL.

Open Questions
- C# query syntax: The C# language team has an active proposal for `left`/`right` join modifiers. Should a `full join` modifier be considered in that same proposal? This is a language concern and doesn't block the API addition.
- Value types and the "default vs. not found" ambiguity: As discussed in the `LeftJoin` proposal (#110292), when `TOuter` or `TInner` is a value type, `default` is indistinguishable from a legitimate value. This is the same trade-off accepted for `LeftJoin`/`RightJoin` and other operators like `FirstOrDefault`. A future `Optional<T>` or discriminated union type could address this, but making the API more complex for this edge case is not warranted.
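The ambiguity can be seen without the new operator at all. The sketch below uses the long-standing `FirstOrDefault` to show that, for `int`, a missing element and a legitimate `0` produce the same result; the helper names and sample arrays are assumptions for illustration. Projecting to `int?` first is one existing way to tell the two cases apart, and the same trick would presumably apply to the sequences passed to a join.

```csharp
using System;
using System.Linq;

public static class DefaultAmbiguityDemo
{
    // Returns the first value below 1, or default(int) == 0 when there is none.
    public static int FirstSmallOrDefault(int[] values) =>
        values.FirstOrDefault(x => x < 1);

    // Same query over int?, so "not found" becomes null instead of 0.
    public static int? FirstSmallOrNull(int[] values) =>
        values.Select(x => (int?)x).FirstOrDefault(x => x < 1);

    public static void Main()
    {
        int[] containsZero = { 0, 5 };
        int[] noSmallValues = { 7, 9 };

        // Both calls return 0, so the caller cannot tell "found 0" from "found nothing".
        Console.WriteLine(FirstSmallOrDefault(containsZero) == FirstSmallOrDefault(noSmallValues)); // True

        // With a nullable projection the two cases differ.
        Console.WriteLine(FirstSmallOrNull(containsZero).HasValue);  // True
        Console.WriteLine(FirstSmallOrNull(noSmallValues).HasValue); // False
    }
}
```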