Support for retriever-augmented models. #1125
akshathmangudi wants to merge 8 commits into huggingface:main
Conversation
Pull request overview
This PR implements support for evaluating Retrieval-Augmented Generation (RAG) systems within LightEval, addressing issue #1109. It introduces a flexible adapter pattern that allows users to plug in any retriever and generator combination to evaluate RAG systems on existing LightEval benchmarks.
Changes:
- Added `RAGAdapterModel` base class implementing the `LightevalModel` interface, with protocols for retriever and generator components
- Extended the `ModelResponse` dataclass with an optional `metadata` field for storing retrieval information
- Provided a working example implementation using sentence transformers for retrieval and T5 for generation, with a TriviaQA-focused document corpus
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| src/lighteval/models/rag/rag_model.py | Core RAG adapter implementation with RetrieverProtocol, GeneratorProtocol, ContextFormatter utility class, and RAGAdapterModel base class |
| src/lighteval/models/model_output.py | Added optional metadata field to ModelResponse for storing retrieval information and other model-specific data |
| examples/custom_models/rag_model_example.py | Complete working example with SimpleVectorRetriever and SimpleGenerator demonstrating RAG evaluation on TriviaQA-style tasks |
| src/lighteval/models/custom/rag_adapters.py | Placeholder file marked "TO BE IMPLEMENTED" |
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.
cc: @NathanHB
ISSUE SUMMARY
Resolves #1109
This PR implements evaluation of RAG systems in LightEval. It provides a flexible adapter pattern that allows users to plug in any retriever and generator combination, enabling evaluation of RAG systems on existing LightEval benchmarks.
The implementation provides:
`RAGAdapterModel`: A base class that implements the `LightevalModel` interface. The RAG adapter works by:
- Receiving `Doc` objects from the evaluation task
- Retrieving relevant documents and formatting them as context for the generator
- Returning the generated answers as `ModelResponse` objects

This allows RAG systems to be evaluated on benchmarks like TriviaQA, MMLU, etc. using the same metrics (exact_match, F1, ROUGE) as traditional language models.
You can take a look at the example provided in `examples/custom_models/rag_model_example.py`.

Quick Start (for the example)
```
lighteval custom \
    "rag-flan" \
    "examples/custom_models/rag_model_example.py" \
    "triviaqa" \
    --max-samples 10 \
    --save-details
```

To implement your own RAG Model
Step 1. Implement Retriever
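A retriever only needs to satisfy the PR's `RetrieverProtocol`. The sketch below is illustrative: the exact protocol signature is an assumption, and the toy `KeywordRetriever` uses word overlap instead of the sentence-transformer embeddings used by the PR's example, so it runs without extra dependencies.

```python
from typing import Protocol


class RetrieverProtocol(Protocol):
    """Assumed shape of the PR's RetrieverProtocol: given a query,
    return the top-k most relevant documents."""

    def retrieve(self, query: str, k: int = 3) -> list[str]: ...


class KeywordRetriever:
    """Toy retriever scoring documents by word overlap with the query.
    (The PR's example uses sentence-transformer embeddings instead.)"""

    def __init__(self, corpus: list[str]) -> None:
        self.corpus = corpus

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        query_words = set(query.lower().split())
        # Rank documents by the number of shared words with the query.
        scored = sorted(
            self.corpus,
            key=lambda doc: len(query_words & set(doc.lower().split())),
            reverse=True,
        )
        return scored[:k]
```

Any object with a matching `retrieve` method works; no inheritance is required thanks to structural typing.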
Step 2. Implement Generator
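Similarly, the generator is any object matching the assumed `GeneratorProtocol`. The stand-in below just extracts an answer from the retrieved context so it can run without model weights; the PR's example wraps a T5/FLAN model instead, and the `Context:`/`Question:` prompt layout is an assumption for this sketch.

```python
from typing import Protocol


class GeneratorProtocol(Protocol):
    """Assumed shape of the PR's GeneratorProtocol: produce an answer
    from a context-augmented prompt."""

    def generate(self, prompt: str, max_new_tokens: int = 64) -> str: ...


class TemplateGenerator:
    """Toy generator: answers with the first line of the retrieved
    context. A real implementation would call an LLM here."""

    def generate(self, prompt: str, max_new_tokens: int = 64) -> str:
        # Assume a "Context:\n...\n\nQuestion: ..." prompt layout;
        # fall back to the whole prompt if the markers are absent.
        context = prompt.split("Context:")[-1].split("Question:")[0].strip()
        first_line = context.splitlines()[0] if context else ""
        # Crude token budget: truncate to max_new_tokens words.
        return " ".join(first_line.split()[:max_new_tokens])
```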
Step 3. Create RAG Model
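Composing the two pieces might look like the sketch below. `SimpleRAGModel` and its `answer` method are illustrative stand-ins: the PR's actual `RAGAdapterModel` implements the full `LightevalModel` interface (consuming `Doc` objects and returning `ModelResponse` objects) rather than a single-question helper.

```python
class SimpleRAGModel:
    """Illustrative composition of a retriever and a generator.

    The real RAGAdapterModel from the PR wires this retrieve-then-generate
    flow into LightEval's model interface; this class only shows the core
    data flow."""

    def __init__(self, retriever, generator, k: int = 3) -> None:
        self.retriever = retriever  # any object with retrieve(query, k)
        self.generator = generator  # any object with generate(prompt)
        self.k = k

    def answer(self, question: str) -> str:
        # 1. Retrieve the top-k documents for the question.
        docs = self.retriever.retrieve(question, k=self.k)
        # 2. Format them as context (assumed prompt layout).
        prompt = "Context:\n" + "\n".join(docs) + f"\n\nQuestion: {question}"
        # 3. Generate an answer from the augmented prompt.
        return self.generator.generate(prompt)
```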
Step 4. Evaluate
(OR) using the Python API
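The PR's actual Python entry points are not shown in this excerpt, so the snippet below only mimics the evaluation loop in plain Python: a toy `exact_match` metric scored over a small (question, gold answer) set. All names here are hypothetical; in the real workflow the adapter is routed through LightEval's standard pipeline and metrics.

```python
def exact_match(prediction: str, gold: str) -> bool:
    """Toy case-insensitive exact-match metric
    (LightEval ships its own metric implementations)."""
    return prediction.strip().lower() == gold.strip().lower()


def evaluate_rag(model, dataset: list[tuple[str, str]]) -> float:
    """Score a model over (question, gold_answer) pairs and return accuracy.

    `model` is any object exposing an answer(question) -> str method,
    a hypothetical interface used only for this sketch."""
    hits = sum(exact_match(model.answer(q), gold) for q, gold in dataset)
    return hits / len(dataset)
```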
Limitations