3 changes: 2 additions & 1 deletion global.json
@@ -1,6 +1,7 @@
 {
   "sdk": {
-    "version": "10.0.105"
+    "version": "10.0.105",
+    "rollForward": "latestMajor"
   },
   "tools": {
     "dotnet": "10.0.105",
@@ -6,7 +6,9 @@

The [Microsoft.Extensions.DataIngestion.Abstractions](https://www.nuget.org/packages/Microsoft.Extensions.DataIngestion.Abstractions) package provides the core exchange types, including [`IngestionDocument`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestiondocument), [`IngestionChunker<T>`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestionchunker-1), [`IngestionChunkProcessor<T>`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestionchunkprocessor-1), and [`IngestionChunkWriter<T>`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestionchunkwriter-1). Any .NET library that provides document processing capabilities can implement these abstractions to enable seamless integration with consuming code.

- The [Microsoft.Extensions.DataIngestion](https://www.nuget.org/packages/Microsoft.Extensions.DataIngestion) package has an implicit dependency on the `Microsoft.Extensions.DataIngestion.Abstractions` package. This package enables you to easily integrate components such as enrichment processors, vector storage writers, and telemetry into your applications using familiar dependency injection and pipeline patterns. For example, it provides processors for sentiment analysis, keyword extraction, and summarization that can be chained together in ingestion pipelines.
+ The [Microsoft.Extensions.DataIngestion](https://www.nuget.org/packages/Microsoft.Extensions.DataIngestion) package has an implicit dependency on the `Microsoft.Extensions.DataIngestion.Abstractions` package. This package enables you to easily integrate components such as enrichment processors, vector storage writers, and telemetry into your applications using familiar dependency injection and pipeline patterns.
+
+ > **Note:** Retrieval abstractions (`RetrievalPipeline`, `RetrievalQuery`, `RetrievalQueryProcessor`, etc.) live in the separate [`Microsoft.Extensions.DataRetrieval`](../Microsoft.Extensions.DataRetrieval/) package family.

## Which package to reference

@@ -0,0 +1,28 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

namespace Microsoft.Extensions.DataRetrieval;

/// <summary>
/// Defines a re-ranking strategy for retrieval results.
/// </summary>
/// <remarks>
/// Re-rankers score and reorder retrieval chunks based on relevance to the query.
/// Implementations may use LLM-based scoring, cross-encoder models (e.g., ONNX),
/// or other ranking strategies.
/// </remarks>
public interface IReranker
{
    /// <summary>
    /// Re-ranks the provided chunks based on their relevance to the query.
    /// </summary>
    /// <param name="query">The search query.</param>
    /// <param name="chunks">The chunks to re-rank.</param>
    /// <param name="cancellationToken">The token to monitor for cancellation requests.</param>
    /// <returns>The re-ranked chunks, ordered by relevance (highest first).</returns>
    Task<IReadOnlyList<RetrievalChunk>> RerankAsync(string query, IReadOnlyList<RetrievalChunk> chunks, CancellationToken cancellationToken = default);
}
@@ -0,0 +1,30 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Threading;
using System.Threading.Tasks;

namespace Microsoft.Extensions.DataRetrieval;

/// <summary>
/// Defines the contract for a retrieval pipeline that processes queries and returns results.
/// </summary>
/// <remarks>
/// Enables DI registration and testability. Consumers depend on <see cref="IRetriever"/>
/// rather than a concrete pipeline implementation, allowing mocking in tests and
/// swappable retrieval strategies.
/// </remarks>
public interface IRetriever
{
    /// <summary>
    /// Retrieves results for the specified query.
    /// </summary>
    /// <param name="query">The user query.</param>
    /// <param name="topK">Maximum number of results to retrieve.</param>
    /// <param name="cancellationToken">The token to monitor for cancellation requests.</param>
    /// <returns>The retrieval results.</returns>
    Task<RetrievalResults> RetrieveAsync(
        string query,
        int topK = 5,
        CancellationToken cancellationToken = default);
}
@@ -0,0 +1,23 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFrameworks>$(TargetFrameworks);netstandard2.0</TargetFrameworks>
    <RootNamespace>Microsoft.Extensions.DataRetrieval</RootNamespace>
    <Description>Abstractions representing Data Retrieval components for RAG.</Description>
    <Workstream>RAG</Workstream>
    <PackageTags>RAG;retrieval;search;reranking</PackageTags>
    <ForceLatestDotnetVersions>true</ForceLatestDotnetVersions>
    <Stage>preview</Stage>
    <EnablePackageValidation>false</EnablePackageValidation>
    <MinCodeCoverage>75</MinCodeCoverage>
    <MinMutationScore>75</MinMutationScore>
    <!-- Convert abstract class into interface -->
    <NoWarn>$(NoWarn);S1694;S2368</NoWarn>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="System.Memory" Condition="'$(TargetFrameworkIdentifier)' != '.NETCoreApp'" />
    <PackageReference Include="Microsoft.Bcl.AsyncInterfaces" Condition="'$(TargetFrameworkIdentifier)' != '.NETCoreApp'" />
  </ItemGroup>

</Project>
@@ -0,0 +1,40 @@
# Microsoft.Extensions.DataRetrieval.Abstractions

Abstractions for building composable retrieval pipelines in .NET RAG (Retrieval-Augmented Generation) applications. The retrieval abstractions are the symmetric counterpart to [`Microsoft.Extensions.DataIngestion.Abstractions`](../Microsoft.Extensions.DataIngestion.Abstractions/): ingestion writes data in, and retrieval reads relevant data out.

## Core Types

| Type | Description |
|------|-------------|
| `RetrievalQuery` | Query text with support for variants (multi-query expansion) and metadata for inter-processor communication. |
| `RetrievalChunk` | A single retrieved chunk with content, relevance score, and record metadata. |
| `RetrievalResults` | Collection of retrieved chunks with pipeline-level metadata (e.g., CRAG scores, reranking info). |
| `RetrievalQueryProcessor` | Abstract base class for pre-search processors (query expansion, HyDE, adaptive routing). |
| `RetrievalResultProcessor` | Abstract base class for post-search processors (re-ranking, CRAG quality gating). |
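| `IReranker` | Interface for re-ranking strategies (LLM-based, cross-encoder, ONNX models). |
| `IRetriever` | Interface for a retrieval pipeline that takes a query and returns `RetrievalResults`; enables DI registration and mocking in tests. |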
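
The XML documentation for `RetrievalQuery.Variants` states that each variant is searched independently and that the results are merged using Reciprocal Rank Fusion (RRF). The following is a minimal sketch of that idea, not the pipeline's actual implementation: `RrfSketch`, `Fuse`, the use of plain string keys to identify chunks, and the constant `k = 60` are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; it is not part of the package.
public static class RrfSketch
{
    // Fuses several ranked lists of chunk keys into a single ranking.
    // Each key earns 1 / (k + rank) for every list it appears in, where rank is 1-based.
    public static IReadOnlyList<string> Fuse(IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
    {
        var scores = new Dictionary<string, double>();

        foreach (IReadOnlyList<string> list in rankedLists)
        {
            for (int i = 0; i < list.Count; i++)
            {
                // TryGetValue leaves 'current' at 0 when the key has not been seen yet.
                scores.TryGetValue(list[i], out double current);
                scores[list[i]] = current + (1.0 / (k + i + 1));
            }
        }

        // Highest fused score first; ties are broken by key so the output is deterministic.
        return scores
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .Select(pair => pair.Key)
            .ToList();
    }
}
```

For example, fusing the rankings `a, b, c` and `b, c, d` yields `b, c, a, d`: keys that appear in both lists accumulate two contributions and therefore outrank keys that appear in only one.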

## Which package to reference

Libraries that provide implementations of the abstractions (e.g., custom query processors, re-rankers) should reference only `Microsoft.Extensions.DataRetrieval.Abstractions`.
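
As a rough sketch of what such a library might contain, the following implements one pre-search processor and one re-ranker against the abstractions in this package. `LowercaseVariantProcessor` and `KeywordOverlapReranker` are hypothetical names, and the keyword-overlap scoring is a deliberately simple stand-in for an LLM-based or cross-encoder strategy. The code assumes the project references `Microsoft.Extensions.DataRetrieval.Abstractions`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DataRetrieval;

// Hypothetical pre-search processor: adds a lowercase variant of the original query text.
public sealed class LowercaseVariantProcessor : RetrievalQueryProcessor
{
    public override Task<RetrievalQuery> ProcessAsync(RetrievalQuery query, CancellationToken cancellationToken = default)
    {
        string lowered = query.Text.ToLowerInvariant();

        // Assumes Variants is still a growable list; a caller that replaced it
        // with a fixed-size collection would make Add throw.
        if (!query.Variants.Contains(lowered))
        {
            query.Variants.Add(lowered);
        }

        return Task.FromResult(query);
    }
}

// Hypothetical re-ranker: scores each chunk by how many query terms its content contains.
public sealed class KeywordOverlapReranker : IReranker
{
    public Task<IReadOnlyList<RetrievalChunk>> RerankAsync(string query, IReadOnlyList<RetrievalChunk> chunks, CancellationToken cancellationToken = default)
    {
        string[] terms = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (RetrievalChunk chunk in chunks)
        {
            // Overwrites the original vector-search score with the overlap count.
            chunk.Score = terms.Count(term => chunk.Content.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0);
        }

        // OrderByDescending is a stable sort, so chunks with equal scores keep their input order.
        IReadOnlyList<RetrievalChunk> ordered = chunks.OrderByDescending(chunk => chunk.Score).ToList();
        return Task.FromResult(ordered);
    }
}
```

Both types depend only on the abstractions, so a library shipping them would not need a reference to the orchestrator package.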

Applications that need the full pipeline orchestrator (`RetrievalPipeline`) should reference `Microsoft.Extensions.DataRetrieval` instead (which itself references the abstractions).
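
On the application side, the `IRetriever` remarks suggest depending on the interface rather than on a concrete pipeline. The sketch below shows one way that could look. `ContextBuilder` and `FixedRetriever` are hypothetical types introduced only for illustration, and how `RetrievalPipeline` is constructed or registered is not shown because its API is outside this package.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DataRetrieval;

// Hypothetical application service that depends only on IRetriever.
public sealed class ContextBuilder
{
    private readonly IRetriever _retriever;

    public ContextBuilder(IRetriever retriever) => _retriever = retriever;

    // Retrieves up to three chunks and joins their content into a single prompt context.
    public async Task<string> BuildContextAsync(string question, CancellationToken cancellationToken = default)
    {
        RetrievalResults results = await _retriever.RetrieveAsync(question, topK: 3, cancellationToken: cancellationToken);
        return string.Join("\n---\n", results.Chunks.Select(chunk => chunk.Content));
    }
}

// Hypothetical in-memory stub, useful in unit tests in place of a real pipeline.
public sealed class FixedRetriever : IRetriever
{
    private readonly RetrievalChunk[] _chunks;

    public FixedRetriever(params RetrievalChunk[] chunks) => _chunks = chunks;

    // Ignores the query text and returns the first topK preconfigured chunks.
    public Task<RetrievalResults> RetrieveAsync(string query, int topK = 5, CancellationToken cancellationToken = default)
    {
        var results = new RetrievalResults { Chunks = _chunks.Take(topK).ToList() };
        return Task.FromResult(results);
    }
}
```

Because `ContextBuilder` only sees `IRetriever`, a test can pass a `FixedRetriever` while production code supplies whatever implementation the application registers.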

## Install the package

From the command-line:

```console
dotnet add package Microsoft.Extensions.DataRetrieval.Abstractions --prerelease
```

Or directly in the C# project file:

```xml
<ItemGroup>
  <PackageReference Include="Microsoft.Extensions.DataRetrieval.Abstractions" Version="[CURRENTVERSION]" />
</ItemGroup>
```

## Feedback & Contributing

We welcome feedback and contributions in [our GitHub repo](https://github.com/dotnet/extensions).
@@ -0,0 +1,42 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;

namespace Microsoft.Extensions.DataRetrieval;

/// <summary>
/// Represents a single chunk returned from retrieval.
/// </summary>
public sealed class RetrievalChunk
{
    /// <summary>
    /// Initializes a new instance of the <see cref="RetrievalChunk"/> class.
    /// </summary>
    /// <param name="content">The text content of the chunk.</param>
    /// <param name="score">The relevance score from vector search.</param>
    public RetrievalChunk(string content, double score)
    {
        Content = content;
        Score = score;
    }

    /// <summary>
    /// Gets the text content of this chunk.
    /// </summary>
    public string Content { get; }

    /// <summary>
    /// Gets or sets the relevance score.
    /// </summary>
    public double Score { get; set; }

    /// <summary>
    /// Gets the underlying record data as key-value pairs.
    /// </summary>
    /// <remarks>
    /// Contains the full record fields from the vector store, enabling
    /// downstream consumers to reconstruct strongly-typed records.
    /// </remarks>
    public IDictionary<string, object?> Record { get; } = new Dictionary<string, object?>();
}
@@ -0,0 +1,42 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;

namespace Microsoft.Extensions.DataRetrieval;

/// <summary>
/// Represents a retrieval query with optional expanded variants and metadata.
/// </summary>
public sealed class RetrievalQuery
{
    /// <summary>
    /// Initializes a new instance of the <see cref="RetrievalQuery"/> class.
    /// </summary>
    /// <param name="text">The original query text.</param>
    public RetrievalQuery(string text)
    {
        Text = text;
        Variants = [text];
    }

    /// <summary>
    /// Gets the original query text.
    /// </summary>
    public string Text { get; }

    /// <summary>
    /// Gets or sets the query variants to search with.
    /// </summary>
    /// <remarks>
    /// Pre-query processors may expand a single query into multiple variants
    /// (e.g., multi-query expansion, HyDE). Each variant is searched independently
    /// and results are merged using Reciprocal Rank Fusion.
    /// </remarks>
    public IList<string> Variants { get; set; }

    /// <summary>
    /// Gets the metadata associated with this query.
    /// </summary>
    public IDictionary<string, object?> Metadata { get; } = new Dictionary<string, object?>();
}
@@ -0,0 +1,26 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Threading;
using System.Threading.Tasks;

namespace Microsoft.Extensions.DataRetrieval;

/// <summary>
/// Processes a retrieval query before vector search is performed.
/// </summary>
/// <remarks>
/// Pre-search processors transform or expand a <see cref="RetrievalQuery"/>
/// before it is sent to the vector store. Examples include multi-query expansion,
/// HyDE (Hypothetical Document Embeddings), and adaptive routing.
/// </remarks>
public abstract class RetrievalQueryProcessor
{
    /// <summary>
    /// Processes the query asynchronously before vector search.
    /// </summary>
    /// <param name="query">The retrieval query to process.</param>
    /// <param name="cancellationToken">The token to monitor for cancellation requests.</param>
    /// <returns>The processed query.</returns>
    public abstract Task<RetrievalQuery> ProcessAsync(RetrievalQuery query, CancellationToken cancellationToken = default);
}
@@ -0,0 +1,27 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Threading;
using System.Threading.Tasks;

namespace Microsoft.Extensions.DataRetrieval;

/// <summary>
/// Processes retrieval results after vector search is performed.
/// </summary>
/// <remarks>
/// Post-search processors transform or filter <see cref="RetrievalResults"/>
/// after they are returned from the vector store. Examples include re-ranking,
/// CRAG (Corrective RAG) quality validation, and deduplication.
/// </remarks>
public abstract class RetrievalResultProcessor
{
    /// <summary>
    /// Processes the results asynchronously after vector search.
    /// </summary>
    /// <param name="results">The retrieval results to process.</param>
    /// <param name="query">The original query (for context during processing).</param>
    /// <param name="cancellationToken">The token to monitor for cancellation requests.</param>
    /// <returns>The processed results.</returns>
    public abstract Task<RetrievalResults> ProcessAsync(RetrievalResults results, RetrievalQuery query, CancellationToken cancellationToken = default);
}
@@ -0,0 +1,26 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;

namespace Microsoft.Extensions.DataRetrieval;

/// <summary>
/// Represents the results of a retrieval operation.
/// </summary>
public sealed class RetrievalResults
{
    /// <summary>
    /// Gets or sets the retrieved chunks, ordered by relevance.
    /// </summary>
    public IList<RetrievalChunk> Chunks { get; set; } = [];

    /// <summary>
    /// Gets the metadata from the retrieval pipeline.
    /// </summary>
    /// <remarks>
    /// Pipeline processors may add metadata such as CRAG quality scores,
    /// reranking diagnostics, or query expansion details.
    /// </remarks>
    public IDictionary<string, object?> Metadata { get; } = new Dictionary<string, object?>();
}
@@ -0,0 +1,10 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

namespace Microsoft.Extensions.DataRetrieval;

internal static class DiagnosticsConstants
{
    internal const string ActivitySourceName = "Experimental.Microsoft.Extensions.DataRetrieval";
    internal const string ErrorTypeTagName = "error.type";
}
21 changes: 21 additions & 0 deletions src/Libraries/Microsoft.Extensions.DataRetrieval/Log.cs
@@ -0,0 +1,21 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using Microsoft.Extensions.Logging;

#pragma warning disable S109 // Magic numbers should not be used

namespace Microsoft.Extensions.DataRetrieval
{
    internal static partial class Log
    {
        [LoggerMessage(0, LogLevel.Debug, "Running query processor: {processor}.")]
        internal static partial void RunningQueryProcessor(this ILogger logger, string processor);

        [LoggerMessage(1, LogLevel.Debug, "Searching variant: {variant}.")]
        internal static partial void SearchingVariant(this ILogger logger, string variant);

        [LoggerMessage(2, LogLevel.Debug, "Running result processor: {processor}.")]
        internal static partial void RunningResultProcessor(this ILogger logger, string processor);
    }
}
@@ -0,0 +1,34 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFrameworks>$(TargetFrameworks);netstandard2.0</TargetFrameworks>
    <RootNamespace>Microsoft.Extensions.DataRetrieval</RootNamespace>
    <Description>Data Retrieval utilities for RAG.</Description>
    <Workstream>RAG</Workstream>
    <PackageTags>RAG;retrieval;search;reranking</PackageTags>
    <UseLoggingGenerator>true</UseLoggingGenerator>
    <DisableMicrosoftExtensionsLoggingSourceGenerator>false</DisableMicrosoftExtensionsLoggingSourceGenerator>
    <ForceLatestDotnetVersions>true</ForceLatestDotnetVersions>
    <Stage>preview</Stage>
    <EnablePackageValidation>false</EnablePackageValidation>
    <MinCodeCoverage>75</MinCodeCoverage>
    <MinMutationScore>75</MinMutationScore>
    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
  </PropertyGroup>

  <ItemGroup>
    <ProjectReference Include="..\Microsoft.Extensions.DataRetrieval.Abstractions\Microsoft.Extensions.DataRetrieval.Abstractions.csproj" />
  </ItemGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.Bcl.Memory" Condition="'$(TargetFrameworkIdentifier)' != '.NETCoreApp'" />
    <PackageReference Include="Microsoft.Extensions.Logging.Abstractions" />
    <PackageReference Include="Microsoft.Extensions.VectorData.Abstractions" />
    <PackageReference Include="System.Diagnostics.DiagnosticSource" />
  </ItemGroup>

  <ItemGroup Condition="!$([MSBuild]::IsTargetFrameworkCompatible('$(TargetFramework)', 'net10.0'))">
    <PackageReference Include="System.Linq.AsyncEnumerable" />
  </ItemGroup>

</Project>