

Add FullJoin LINQ operator #124806

Draft
eiriktsarpalis wants to merge 2 commits into dotnet:main from eiriktsarpalis:feature/full-outer-join


@eiriktsarpalis
Member

[API Proposal]: Introduce FullJoin() LINQ operator

Background and Motivation

The Problem

LINQ currently has Join (inner join), LeftJoin (left outer join, added in .NET 10), and RightJoin (right outer join, added in .NET 10), but it lacks a full outer join operator. A full outer join returns all elements from both sequences: matched pairs where keys correlate, plus unmatched elements from either side paired with a default counterpart.

This is the last missing join type from the standard SQL join family. Its absence is blocking EF Core from translating LINQ queries to SQL FULL OUTER JOIN (dotnet/efcore#37633).

Real-World Scenario: Hybrid Search

The primary motivating scenario is hybrid search in databases, where two search result sets (e.g., full-text search and vector similarity search) need to be joined and ranked together via techniques like Reciprocal Rank Fusion (RRF). Highly ranked results from either search method should be included even if the other method did not return them; this is precisely the semantics of a full outer join.

See this blog post for example usage with SQL Server.
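To make the scenario concrete, here is a minimal Reciprocal Rank Fusion sketch over two ranked lists, where each document scores the sum of 1/(k + rank) over the lists that contain it. Because FullJoin does not exist yet, the full-outer-join step is emulated by taking the union of document ids from both rank dictionaries. The document ids and the constant k = 60 are illustrative assumptions, not values from the proposal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ranked results (best first) from each search method.
string[] fullTextHits = { "doc1", "doc2", "doc3" };
string[] vectorHits = { "doc3", "doc1", "doc4" };

// RRF smoothing constant; 60 is a commonly used value (assumption, tune per workload).
const double K = 60;

// 1-based rank of each document within each result set.
Dictionary<string, int> ftRank = fullTextHits
    .Select((id, index) => (id, rank: index + 1))
    .ToDictionary(x => x.id, x => x.rank);
Dictionary<string, int> vecRank = vectorHits
    .Select((id, index) => (id, rank: index + 1))
    .ToDictionary(x => x.id, x => x.rank);

// Full-outer-join semantics emulated with a key union: every document from either
// side appears once, and a side with no match contributes 0 to the fused score.
List<(string Id, double Score)> fused = ftRank.Keys
    .Union(vecRank.Keys)
    .Select(id => (
        Id: id,
        Score: (ftRank.TryGetValue(id, out int r1) ? 1.0 / (K + r1) : 0.0)
             + (vecRank.TryGetValue(id, out int r2) ? 1.0 / (K + r2) : 0.0)))
    .OrderByDescending(x => x.Score)
    .ToList();

foreach ((string docId, double fusedScore) in fused)
    Console.WriteLine($"{docId}: {fusedScore:F5}");
```

doc4 appears only in the vector results yet still receives a fused score. An inner join would drop it, and so would a left join driven from the full-text side.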

Existing Workarounds and Why They're Insufficient

Without FullJoin, users must combine multiple operators:

// Workaround 1: LeftJoin + RightJoin + filtering (two hash tables, two passes over the data)
var fullJoin1 = outers.LeftJoin(inners, o => o.Key, i => i.Key, (o, i) => new { o, i })
    .Concat(
        outers.RightJoin(inners, o => o.Key, i => i.Key, (o, i) => new { o, i })
              // Keep only unmatched inners; only works if Outer is a reference type.
              .Where(x => x.o is null));

// Workaround 2: GroupJoin + SelectMany + Concat (even more complex)
var fullJoin2 = outers
    .GroupJoin(inners, o => o.Key, i => i.Key, (o, matches) => new { o, matches })
    .SelectMany(x => x.matches.DefaultIfEmpty(), (x, i) => new { x.o, i })
    .Concat(
        // O(n*m): rescans outers for every inner element.
        inners.Where(i => !outers.Any(o => o.Key == i.Key))
              .Select(i => new { o = default(Outer), i }));

These workarounds have three problems:

  1. Complexity: They require combining three or more operators in non-obvious ways, and they are error-prone and difficult to read.
  2. Performance: They require multiple passes over the data and/or multiple hash table constructions (Workaround 2 even rescans outers for every inner element, which is O(n·m)), whereas a dedicated FullJoin operator enumerates each input once and builds a single hash table.
  3. Query translation: None of these combinations can be recognized and translated to SQL FULL OUTER JOIN by LINQ providers like EF Core.
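A runnable version of Workaround 2 makes these costs concrete. The inputs are hypothetical anonymous-type records. Unlike the snippet above, this sketch precomputes the outer keys in a HashSet. That tweak is not part of the original workaround: it removes the O(n·m) rescan, but it still needs a second hash table and a second pass over outers, which a single FullJoin operator would avoid.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inputs; anonymous types are reference types, so "no match" shows up as null.
var outers = new[] { new { Key = 1, Label = "o1" }, new { Key = 2, Label = "o2" } };
var inners = new[] { new { Key = 2, Label = "i2" }, new { Key = 3, Label = "i3" } };

// Extra hash table over the outer keys (the tweak described above).
HashSet<int> outerKeys = outers.Select(o => o.Key).ToHashSet();

List<string> rows = outers
    // Left-outer half: each outer with its matches, or a single null when none.
    .GroupJoin(inners, o => o.Key, i => i.Key, (o, matches) => new { o, matches })
    .SelectMany(x => x.matches.DefaultIfEmpty(), (x, i) => $"{x.o.Label} | {i?.Label ?? "null"}")
    // Right-only half: inners whose key never occurs on the outer side.
    .Concat(inners
        .Where(i => !outerKeys.Contains(i.Key))
        .Select(i => $"null | {i.Label}"))
    .ToList();

rows.ForEach(Console.WriteLine);
// Prints:
// o1 | null
// o2 | i2
// null | i3
```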

Prior Art

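  • SQL: FULL OUTER JOIN is part of the SQL standard and is supported by most major relational databases (SQL Server, PostgreSQL, Oracle, SQLite 3.39+). MySQL is a notable exception: it has no FULL OUTER JOIN, so users typically emulate it with a UNION of a LEFT JOIN and a RIGHT JOIN.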
  • Python (pandas): pd.merge(left, right, how='outer') performs a full outer join.
  • Java (Streams): The Streams API has no built-in full join, so users fall back to manual map-based merging or third-party libraries.
  • Scala: The standard collections have no full join operator; Apache Spark's Scala API provides one for pair RDDs (fullOuterJoin), alongside the lower-level cogroup.
  • .NET 10: LeftJoin and RightJoin were added in .NET 10 (#110292), establishing the naming convention and API shape that this proposal follows.

API Proposal

System.Linq.Enumerable

 namespace System.Linq;

 public static partial class Enumerable
 {
     public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(...);
     public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(..., IEqualityComparer<TKey>? comparer);

     public static IEnumerable<TResult> LeftJoin<TOuter, TInner, TKey, TResult>(...);
     public static IEnumerable<TResult> LeftJoin<TOuter, TInner, TKey, TResult>(..., IEqualityComparer<TKey>? comparer);

     public static IEnumerable<TResult> RightJoin<TOuter, TInner, TKey, TResult>(...);
     public static IEnumerable<TResult> RightJoin<TOuter, TInner, TKey, TResult>(..., IEqualityComparer<TKey>? comparer);

+    public static IEnumerable<TResult> FullJoin<TOuter, TInner, TKey, TResult>(
+        this IEnumerable<TOuter> outer,
+        IEnumerable<TInner> inner,
+        Func<TOuter, TKey> outerKeySelector,
+        Func<TInner, TKey> innerKeySelector,
+        Func<TOuter?, TInner?, TResult> resultSelector);
+
+    public static IEnumerable<TResult> FullJoin<TOuter, TInner, TKey, TResult>(
+        this IEnumerable<TOuter> outer,
+        IEnumerable<TInner> inner,
+        Func<TOuter, TKey> outerKeySelector,
+        Func<TInner, TKey> innerKeySelector,
+        Func<TOuter?, TInner?, TResult> resultSelector,
+        IEqualityComparer<TKey>? comparer);
 }

System.Linq.Queryable

 namespace System.Linq;

 public static partial class Queryable
 {
+    public static IQueryable<TResult> FullJoin<TOuter, TInner, TKey, TResult>(
+        this IQueryable<TOuter> outer,
+        IEnumerable<TInner> inner,
+        Expression<Func<TOuter, TKey>> outerKeySelector,
+        Expression<Func<TInner, TKey>> innerKeySelector,
+        Expression<Func<TOuter?, TInner?, TResult>> resultSelector);
+
+    public static IQueryable<TResult> FullJoin<TOuter, TInner, TKey, TResult>(
+        this IQueryable<TOuter> outer,
+        IEnumerable<TInner> inner,
+        Expression<Func<TOuter, TKey>> outerKeySelector,
+        Expression<Func<TInner, TKey>> innerKeySelector,
+        Expression<Func<TOuter?, TInner?, TResult>> resultSelector,
+        IEqualityComparer<TKey>? comparer);
 }

Key Design Point: Both Parameters Nullable in Result Selector

Unlike LeftJoin (where only TInner? is nullable) and RightJoin (where only TOuter? is nullable), FullJoin makes both parameters nullable in the result selector: Func<TOuter?, TInner?, TResult>. This reflects the semantics: either side can be unmatched, so either can be default.

API Usage

// Basic full outer join: merge two search result sets
var hybridResults = fullTextResults.FullJoin(
    vectorResults,
    ft => ft.DocumentId,
    vr => vr.DocumentId,
    (ft, vr) => new
    {
        DocumentId = ft?.DocumentId ?? vr!.DocumentId,
        FullTextScore = ft?.Score ?? 0,
        VectorScore = vr?.Score ?? 0
    });

// With custom comparer
var merged = customers.FullJoin(
    orders,
    c => c.Name,
    o => o.CustomerName,
    (c, o) => new { Customer = c?.Name ?? "Unknown", OrderId = o?.Id ?? 0 },
    StringComparer.OrdinalIgnoreCase);

Result Ordering

The result ordering follows this pattern:

  1. First, all outer elements are enumerated in their original order. For each outer element, if matching inner elements exist, a result is yielded for each match; if no match exists, a result is yielded with default for the inner side.
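  2. After all outer elements are processed, any inner elements that were not matched by any outer element are yielded with default for the outer side, in inner lookup order. Groupings are visited in the order their keys first appear in the inner sequence, and elements keep their original relative order within each grouping, so unmatched inner elements with keys 1, 2, 1 are yielded as 1, 1, 2.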

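SQL itself does not guarantee any row order for FULL OUTER JOIN unless the query specifies ORDER BY, so this ordering is a deterministic choice made by the LINQ operator. It keeps the outer-first order that LeftJoin already produces and appends the unmatched inner elements at the end.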

Implementation

The implementation uses the same hash join strategy as LeftJoin/RightJoin, building a Lookup of the inner sequence, then iterating the outer sequence. Additionally, it tracks which inner groupings were matched (via a HashSet) so it can yield unmatched inner elements in a second pass:

1. Build a Lookup from inner elements (keyed by innerKeySelector)
2. For each outer element:
   a. Look up matching inner grouping
   b. If found: yield (outer, inner) for each match; mark grouping as matched
   c. If not found: yield (outer, default)
3. Iterate all inner groupings; for each unmatched grouping:
   a. yield (default, inner) for each element

This requires a single construction of the inner lookup (O(n) where n = inner count), a single pass over the outer sequence, and a single pass over unmatched inner groupings. The HashSet for tracking matched groupings adds O(k) space where k = number of distinct matched inner keys, which is bounded by both the outer and inner sequence sizes.
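The three steps above can be sketched as follows. This is a hypothetical FullJoinSketch local function, not the PR's FullJoin.cs, written so it runs without the proposed API. It makes two assumptions that go beyond the description. First, it records matched keys in a HashSet<TKey> built with the same comparer instead of grouping references, because ILookup does not hand out its groupings by key; since each key maps to exactly one grouping, this is equivalent. Second, like Join, it never matches a null outer key, but it keeps null-keyed inner elements so they still appear as unmatched. The data mirror the people/pets example from the XML docs, and Daisy's owner is invented.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data modelled on the people/pets XML doc example; keys are owner names.
// Daisy's owner "Hedlund, Magnus" is an assumption (someone absent from `people`).
string[] people = { "Adams, Terry", "Weiss, Charlotte", "Chapkin, Tom" };
var pets = new[]
{
    new { Name = "Barley", Owner = (string?)"Adams, Terry" },
    new { Name = "Boots", Owner = (string?)"Adams, Terry" },
    new { Name = "Whiskers", Owner = (string?)"Weiss, Charlotte" },
    new { Name = "Daisy", Owner = (string?)"Hedlund, Magnus" },
    new { Name = "Stray", Owner = (string?)null },
};

List<string> lines = FullJoinSketch(
        people,
        pets,
        person => person,
        pet => pet.Owner,
        (person, pet) => $"{person ?? "NONE"} - {pet?.Name ?? "NONE"}")
    .ToList();

lines.ForEach(Console.WriteLine);
// Prints:
// Adams, Terry - Barley
// Adams, Terry - Boots
// Weiss, Charlotte - Whiskers
// Chapkin, Tom - NONE
// NONE - Daisy
// NONE - Stray

// Hypothetical stand-in for the proposed Enumerable.FullJoin; NOT the PR's implementation.
static IEnumerable<TResult> FullJoinSketch<TOuter, TInner, TKey, TResult>(
    IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter?, TInner?, TResult> resultSelector,
    IEqualityComparer<TKey>? comparer = null)
{
    // Validate arguments eagerly; the join itself is deferred until enumeration.
    ArgumentNullException.ThrowIfNull(outer);
    ArgumentNullException.ThrowIfNull(inner);
    ArgumentNullException.ThrowIfNull(outerKeySelector);
    ArgumentNullException.ThrowIfNull(innerKeySelector);
    ArgumentNullException.ThrowIfNull(resultSelector);
    return Iterate();

    IEnumerable<TResult> Iterate()
    {
        // Step 1: one hash table over the inner sequence. ToLookup keeps null keys,
        // so inner elements with a null key still surface as unmatched in step 3.
        ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector, comparer);

        // Matched keys stand in for the proposal's HashSet of matched groupings.
        var matchedKeys = new HashSet<TKey>(comparer);

        // Step 2: a single pass over outer, preserving its order.
        foreach (TOuter o in outer)
        {
            TKey key = outerKeySelector(o);
            // Assumption: as with Join, a null key never matches anything.
            if (key is not null && lookup.Contains(key))
            {
                matchedKeys.Add(key);
                foreach (TInner i in lookup[key])
                    yield return resultSelector(o, i);
            }
            else
            {
                yield return resultSelector(o, default);
            }
        }

        // Step 3: inner groupings that no outer element matched, in lookup order.
        foreach (IGrouping<TKey, TInner> grouping in lookup)
        {
            if (matchedKeys.Contains(grouping.Key))
                continue;
            foreach (TInner i in grouping)
                yield return resultSelector(default, i);
        }
    }
}
```

The printed rows follow the Result Ordering rules above, including the "NONE - Stray" row that the review flags as missing from the XML doc output.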

Design Decisions

  1. Naming: FullJoin (not FullOuterJoin): Follows the convention established by LeftJoin/RightJoin in .NET 10, which omitted "Outer" from the name despite representing outer joins. The SQL standard name is FULL OUTER JOIN, but FullJoin is shorter, consistent, and unambiguous.

  2. No early-exit optimization for empty sequences: Unlike LeftJoin (which returns [] when outer is empty) and RightJoin (which returns [] when inner is empty), FullJoin cannot short-circuit on either being empty — an empty outer with a non-empty inner must still yield unmatched inner elements, and vice versa.

  3. HashSet for tracking matched groupings: An alternative approach would be to add a _matched field to the Grouping<TKey, TElement> class, but this would pollute the shared Lookup type with FullJoin-specific state. Using an external HashSet<Grouping> (with reference equality) is cleaner and doesn't impact other operators.

Alternatives Considered

  1. Express as LeftJoin.Concat(RightJoin.Where(unmatched)): Requires two full hash-table builds and two passes over the data. Cannot be translated to a single SQL FULL OUTER JOIN by query providers.

  2. Express via GroupJoin + SelectMany: Even more complex than the dual-join approach, and equally untranslatable to SQL.

  3. Add only the Queryable version: Would allow EF Core translation but leave in-memory LINQ without an efficient implementation. The .NET 10 precedent of adding both Enumerable and Queryable versions for LeftJoin/RightJoin argues against this.

Risks

  • No binary breaking changes: This is a purely additive API.
  • Source breaking changes: Extremely unlikely. The new method name FullJoin does not conflict with any existing LINQ operators. Users who have defined their own FullJoin extension methods could see ambiguity, but this is the same (accepted) risk that LeftJoin/RightJoin introduced.
  • No expression tree changes needed: Like LeftJoin/RightJoin, the Queryable version is represented via existing MethodCallExpressions. LINQ providers can pattern-match on the method info to translate to SQL.
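To illustrate the last point, here is a sketch of the kind of check a query provider performs. Because Queryable.FullJoin is not shipped yet, it uses the existing Queryable.Join over LINQ's in-memory EnumerableQuery provider. The sketch inspects the MethodCallExpression that Queryable records, and a provider would presumably recognize FullJoin the same way. The sources and the matching logic are illustrative assumptions, not EF Core's code.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical sources; AsQueryable() wraps them in LINQ's in-memory EnumerableQuery provider.
IQueryable<int> outer = new[] { 1, 2 }.AsQueryable();
int[] inner = { 2, 3 };

IQueryable<int> query = outer.Join(inner, o => o, i => i, (o, i) => o + i);

// Queryable.Join records itself as a MethodCallExpression instead of executing.
var call = (MethodCallExpression)query.Expression;

// A provider can pattern-match on the MethodInfo to choose a translation.
bool isQueryableJoin =
    call.Method.DeclaringType == typeof(Queryable) && call.Method.Name == nameof(Queryable.Join);

Console.WriteLine($"{call.Method.DeclaringType!.Name}.{call.Method.Name}, {call.Arguments.Count} arguments, matched: {isQueryableJoin}");
// Prints: Queryable.Join, 5 arguments, matched: True
```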

Open Questions

  1. C# query syntax: The C# language team has an active proposal for left/right join modifiers. Should a full join modifier be considered in that same proposal? This is a language concern and doesn't block the API addition.

  2. Value types and the "default vs. not found" ambiguity: As discussed in the LeftJoin proposal (#110292), when TOuter or TInner is a value type, default is indistinguishable from a legitimate value. This is the same trade-off accepted for LeftJoin/RightJoin and other operators like FirstOrDefault. A future Optional<T> or discriminated union type could address this, but making the API more complex for this edge case is not warranted.
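The same ambiguity is already observable today with GroupJoin and DefaultIfEmpty, which produce the same left-join shape. This sketch uses hypothetical data in which a score of 0 is legitimate. It shows that "matched with 0" and "unmatched" both come out as 0. It also shows one possible caller-side mitigation, which is not part of this proposal: lift the projected value to a nullable type before the outer-join step.

```csharp
using System;
using System.Linq;

// Hypothetical data: outer ids, and inner (Id, Score) pairs where 0 is a real score.
int[] ids = { 1, 2 };
var scores = new[] { (Id: 1, Score: 0) };

// With int scores, id 1 (matched, score 0) and id 2 (unmatched) both yield 0.
int[] raw = ids
    .GroupJoin(scores, id => id, s => s.Id, (id, ms) => ms.Select(m => m.Score).DefaultIfEmpty())
    .SelectMany(x => x)
    .ToArray();

// Lifting the projected value to int? first keeps the two cases apart.
int?[] lifted = ids
    .GroupJoin(scores, id => id, s => s.Id, (id, ms) => ms.Select(m => (int?)m.Score).DefaultIfEmpty())
    .SelectMany(x => x)
    .ToArray();

Console.WriteLine(string.Join(", ", raw));                                        // 0, 0
Console.WriteLine(string.Join(", ", lifted.Select(v => v?.ToString() ?? "null"))); // 0, null
```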

eiriktsarpalis and others added 2 commits February 24, 2026 16:19
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add an api-proposal skill
Adds Enumerable.FullJoin and Queryable.FullJoin with 2 overloads each
(with/without IEqualityComparer), following the same pattern as the
existing LeftJoin/RightJoin operators added in .NET 10.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds the FullJoin LINQ operator to complement the existing Join, LeftJoin, and RightJoin operators. FullJoin performs a full outer join, returning all elements from both sequences with matched pairs where keys correlate, plus unmatched elements from either side paired with a default counterpart.

Changes:

  • Adds FullJoin operator to System.Linq.Enumerable with two overloads (with and without custom comparer)
  • Adds corresponding FullJoin operator to System.Linq.Queryable for query provider translation
  • Implements comprehensive test coverage for both Enumerable and Queryable versions
  • Updates reference assemblies to expose the new public API
  • Adds API proposal skill documentation and guidelines (separate feature for future proposals)

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/libraries/System.Linq/src/System/Linq/FullJoin.cs | Core implementation of FullJoin using hash-based join with HashSet tracking for unmatched inner elements |
| src/libraries/System.Linq/src/System.Linq.csproj | Adds FullJoin.cs to the project |
| src/libraries/System.Linq/ref/System.Linq.cs | Reference assembly API surface additions for FullJoin |
| src/libraries/System.Linq/tests/FullJoinTests.cs | Comprehensive tests covering edge cases, null handling, custom comparers, and argument validation |
| src/libraries/System.Linq/tests/System.Linq.Tests.csproj | Adds FullJoinTests.cs to test project |
| src/libraries/System.Linq.Queryable/src/System/Linq/Queryable.cs | Queryable implementation wrapping calls to Enumerable.FullJoin with expression tree generation |
| src/libraries/System.Linq.Queryable/ref/System.Linq.Queryable.cs | Reference assembly API surface additions for Queryable.FullJoin |
| src/libraries/System.Linq.Queryable/tests/FullJoinTests.cs | Queryable-specific tests for expression tree generation and argument validation |
| src/libraries/System.Linq.Queryable/tests/System.Linq.Queryable.Tests.csproj | Adds FullJoinTests.cs to Queryable test project |
| .github/skills/api-proposal/SKILL.md | New API proposal skill documentation (separate feature) |
| .github/skills/api-proposal/references/api-proposal-checklist.md | API proposal quality checklist (separate feature) |
| .github/skills/api-proposal/references/proposal-examples.md | Curated API proposal examples (separate feature) |

/// person => person,
/// pet => pet.Owner,
/// (person, pet) =>
/// new { OwnerName = person?.Name, Pet = pet?.Name });

Copilot AI Feb 24, 2026


The example code doesn't demonstrate the use of the comparer parameter, which is misleading because the example for this overload should show how to use a custom comparer. It should either pass a comparer argument (e.g., StringComparer.OrdinalIgnoreCase if joining on strings) or be identical to the first overload's example but pass null for the comparer to show that it's optional.

Suggested change
/// new { OwnerName = person?.Name, Pet = pet?.Name });
/// new { OwnerName = person?.Name, Pet = pet?.Name },
/// comparer: null);

Copilot uses AI. Check for mistakes.
Comment on lines +1119 to +1126
/// The <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> method generates a <see cref="MethodCallExpression"/> that represents
/// calling <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> itself as a constructed generic method.
/// It then passes the <see cref="MethodCallExpression"/> to the <see cref="IQueryProvider.CreateQuery{TElement}(Expression)"/> method
/// of the <see cref="IQueryProvider"/> represented by the <see cref="IQueryable.Provider"/> property of the <paramref name="outer"/> parameter.
/// </para>
/// <para>
/// The query behavior that occurs as a result of executing an expression tree that represents calling
/// <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> depends on the implementation of the type of the <paramref name="outer"/> parameter.

Copilot AI Feb 24, 2026


The XML documentation cref references are incorrect. They reference Expression{Func{TOuter, TInner, TResult}} but the actual method signature uses Expression{Func{TOuter?, TInner?, TResult}} with nullable type parameters. All cref tags should use Expression{Func{TOuter?, TInner?, TResult}} to match the actual signature. This occurs on lines 1119, 1120, and 1126.

Suggested change
/// The <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> method generates a <see cref="MethodCallExpression"/> that represents
/// calling <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> itself as a constructed generic method.
/// It then passes the <see cref="MethodCallExpression"/> to the <see cref="IQueryProvider.CreateQuery{TElement}(Expression)"/> method
/// of the <see cref="IQueryProvider"/> represented by the <see cref="IQueryable.Provider"/> property of the <paramref name="outer"/> parameter.
/// </para>
/// <para>
/// The query behavior that occurs as a result of executing an expression tree that represents calling
/// <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}})"/> depends on the implementation of the type of the <paramref name="outer"/> parameter.
/// The <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter?, TInner?, TResult}})"/> method generates a <see cref="MethodCallExpression"/> that represents
/// calling <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter?, TInner?, TResult}})"/> itself as a constructed generic method.
/// It then passes the <see cref="MethodCallExpression"/> to the <see cref="IQueryProvider.CreateQuery{TElement}(Expression)"/> method
/// of the <see cref="IQueryProvider"/> represented by the <see cref="IQueryable.Provider"/> property of the <paramref name="outer"/> parameter.
/// </para>
/// <para>
/// The query behavior that occurs as a result of executing an expression tree that represents calling
/// <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter?, TInner?, TResult}})"/> depends on the implementation of the type of the <paramref name="outer"/> parameter.

Comment on lines +1172 to +1179
/// The <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}}, IEqualityComparer{TKey})"/> method generates a <see cref="MethodCallExpression"/> that represents
/// calling <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}}, IEqualityComparer{TKey})"/> itself as a constructed generic method.
/// It then passes the <see cref="MethodCallExpression"/> to the <see cref="IQueryProvider.CreateQuery{TElement}(Expression)"/> method
/// of the <see cref="IQueryProvider"/> represented by the <see cref="IQueryable.Provider"/> property of the <paramref name="outer"/> parameter.
/// </para>
/// <para>
/// The query behavior that occurs as a result of executing an expression tree that represents calling
/// <see cref="FullJoin{TOuter, TInner, TKey, TResult}(IQueryable{TOuter}, IEnumerable{TInner}, Expression{Func{TOuter, TKey}}, Expression{Func{TInner, TKey}}, Expression{Func{TOuter, TInner, TResult}}, IEqualityComparer{TKey})"/> depends on the implementation of the type of the <paramref name="outer"/> parameter.

Copilot AI Feb 24, 2026


The XML documentation cref references are incorrect. They reference Expression{Func{TOuter, TInner, TResult}} but the actual method signature uses Expression{Func{TOuter?, TInner?, TResult}} with nullable type parameters. All cref tags should use Expression{Func{TOuter?, TInner?, TResult}} to match the actual signature. This occurs on lines 1172, 1173, and 1179.

Comment on lines +54 to +55
/// List{Person} people = new List{Person} { terry, charlotte, tom };
/// List{Pet} pets = new List{Pet} { barley, boots, whiskers, daisy, stray };

Copilot AI Feb 24, 2026


The XML documentation example code contains syntax errors. List{Person} and List{Pet} should be List&lt;Person&gt; and List&lt;Pet&gt; respectively (using XML entity encoding for angle brackets in XML documentation comments).

Comment on lines +77 to +85
/// /*
/// This code produces the following output:
///
/// Adams, Terry - Barley
/// Adams, Terry - Boots
/// Weiss, Charlotte - Whiskers
/// Chapkin, Tom - NONE
/// NONE - Daisy
/// */

Copilot AI Feb 24, 2026


The example output is incomplete. The example code includes 5 pets (barley, boots, whiskers, daisy, stray), but the output only shows 4 results. The stray pet (with Owner = null) should appear in the output as "NONE - Stray" since it has no matching person. The complete output should be:
Adams, Terry - Barley
Adams, Terry - Boots
Weiss, Charlotte - Whiskers
Chapkin, Tom - NONE
NONE - Daisy
NONE - Stray

Comment on lines +160 to +161
/// List{Person} people = new List{Person} { terry, charlotte, tom };
/// List{Pet} pets = new List{Pet} { barley, boots, whiskers, daisy, stray };

Copilot AI Feb 24, 2026


The XML documentation example code contains syntax errors. List{Person} and List{Pet} should be List&lt;Person&gt; and List&lt;Pet&gt; respectively (using XML entity encoding for angle brackets in XML documentation comments).

Comment on lines +183 to +191
/// /*
/// This code produces the following output:
///
/// Adams, Terry - Barley
/// Adams, Terry - Boots
/// Weiss, Charlotte - Whiskers
/// Chapkin, Tom - NONE
/// NONE - Daisy
/// */

Copilot AI Feb 24, 2026


The example output is incomplete. The example code includes 5 pets (barley, boots, whiskers, daisy, stray), but the output only shows 4 results. The stray pet (with Owner = null) should appear in the output as "NONE - Stray" since it has no matching person.
