Support for retriever-augmented models. #1125
akshathmangudi wants to merge 8 commits into huggingface:main
Conversation
Pull request overview
This PR implements support for evaluating Retrieval-Augmented Generation (RAG) systems within LightEval, addressing issue #1109. It introduces a flexible adapter pattern that allows users to plug in any retriever and generator combination to evaluate RAG systems on existing LightEval benchmarks.
Changes:
- Added `RAGAdapterModel` base class implementing the `LightevalModel` interface, with protocols for retriever and generator components
- Extended the `ModelResponse` dataclass with an optional `metadata` field for storing retrieval information
- Provided a working example implementation using sentence transformers for retrieval and T5 for generation, with a TriviaQA-focused document corpus
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| src/lighteval/models/rag/rag_model.py | Core RAG adapter implementation with RetrieverProtocol, GeneratorProtocol, ContextFormatter utility class, and RAGAdapterModel base class |
| src/lighteval/models/model_output.py | Added optional metadata field to ModelResponse for storing retrieval information and other model-specific data |
| examples/custom_models/rag_model_example.py | Complete working example with SimpleVectorRetriever and SimpleGenerator demonstrating RAG evaluation on TriviaQA-style tasks |
| src/lighteval/models/custom/rag_adapters.py | Placeholder file marked "TO BE IMPLEMENTED" |
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.
cc: @NathanHB
ISSUE SUMMARY
Resolves #1109
This PR implements evaluation of RAG systems in LightEval. It provides a flexible adapter pattern that allows users to plug in any retriever and generator combination, enabling evaluation of RAG systems on existing LightEval benchmarks.
The implementation provides:
`RAGAdapterModel`: A base class that implements the `LightevalModel` interface. The RAG adapter works by:
- Receiving `Doc` objects from the evaluation task
- Retrieving relevant documents and formatting them as context for the generator
- Returning the generated answers as `ModelResponse` objects

This allows RAG systems to be evaluated on benchmarks like TriviaQA, MMLU, etc. using the same metrics (exact_match, F1, ROUGE) as traditional language models.
You can take a look at the example provided in `examples/custom_models/rag_model_example.py`.

Quick Start (for the example)
```
lighteval custom \
    "rag-flan" \
    "examples/custom_models/rag_model_example.py" \
    "triviaqa" \
    --max-samples 10 \
    --save-details
```

To implement your own RAG Model
Step 1. Implement Retriever
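A retriever only needs to satisfy the PR's `RetrieverProtocol`. The sketch below is illustrative: the exact protocol signature is an assumption, and the toy `KeywordRetriever` uses word overlap instead of the sentence-transformer embeddings used by the PR's example, so it runs without extra dependencies.

```python
from typing import Protocol


class RetrieverProtocol(Protocol):
    """Assumed shape of the PR's RetrieverProtocol: given a query,
    return the top-k most relevant documents."""

    def retrieve(self, query: str, k: int = 3) -> list[str]: ...


class KeywordRetriever:
    """Toy retriever scoring documents by word overlap with the query.
    (The PR's example uses sentence-transformer embeddings instead.)"""

    def __init__(self, corpus: list[str]) -> None:
        self.corpus = corpus

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        query_words = set(query.lower().split())
        # Rank documents by the number of shared words with the query.
        scored = sorted(
            self.corpus,
            key=lambda doc: len(query_words & set(doc.lower().split())),
            reverse=True,
        )
        return scored[:k]
```

Any object with a matching `retrieve` method works; no inheritance is required thanks to structural typing.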
Step 2. Implement Generator
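Similarly, the generator is any object matching the assumed `GeneratorProtocol`. The stand-in below just extracts an answer from the retrieved context so it can run without model weights; the PR's example wraps a T5/FLAN model instead, and the `Context:`/`Question:` prompt layout is an assumption for this sketch.

```python
from typing import Protocol


class GeneratorProtocol(Protocol):
    """Assumed shape of the PR's GeneratorProtocol: produce an answer
    from a context-augmented prompt."""

    def generate(self, prompt: str, max_new_tokens: int = 64) -> str: ...


class TemplateGenerator:
    """Toy generator: answers with the first line of the retrieved
    context. A real implementation would call an LLM here."""

    def generate(self, prompt: str, max_new_tokens: int = 64) -> str:
        # Assume a "Context:\n...\n\nQuestion: ..." prompt layout;
        # fall back to the whole prompt if the markers are absent.
        context = prompt.split("Context:")[-1].split("Question:")[0].strip()
        first_line = context.splitlines()[0] if context else ""
        # Crude token budget: truncate to max_new_tokens words.
        return " ".join(first_line.split()[:max_new_tokens])
```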
Step 3. Create RAG Model
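Composing the two pieces might look like the sketch below. `SimpleRAGModel` and its `answer` method are illustrative stand-ins: the PR's actual `RAGAdapterModel` implements the full `LightevalModel` interface (consuming `Doc` objects and returning `ModelResponse` objects) rather than a single-question helper.

```python
class SimpleRAGModel:
    """Illustrative composition of a retriever and a generator.

    The real RAGAdapterModel from the PR wires this retrieve-then-generate
    flow into LightEval's model interface; this class only shows the core
    data flow."""

    def __init__(self, retriever, generator, k: int = 3) -> None:
        self.retriever = retriever  # any object with retrieve(query, k)
        self.generator = generator  # any object with generate(prompt)
        self.k = k

    def answer(self, question: str) -> str:
        # 1. Retrieve the top-k documents for the question.
        docs = self.retriever.retrieve(question, k=self.k)
        # 2. Format them as context (assumed prompt layout).
        prompt = "Context:\n" + "\n".join(docs) + f"\n\nQuestion: {question}"
        # 3. Generate an answer from the augmented prompt.
        return self.generator.generate(prompt)
```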
Step 4. Evaluate
(OR) using the Python API
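The PR's actual Python entry points are not shown in this excerpt, so the snippet below only mimics the evaluation loop in plain Python: a toy `exact_match` metric scored over a small (question, gold answer) set. All names here are hypothetical; in the real workflow the adapter is routed through LightEval's standard pipeline and metrics.

```python
def exact_match(prediction: str, gold: str) -> bool:
    """Toy case-insensitive exact-match metric
    (LightEval ships its own metric implementations)."""
    return prediction.strip().lower() == gold.strip().lower()


def evaluate_rag(model, dataset: list[tuple[str, str]]) -> float:
    """Score a model over (question, gold_answer) pairs and return accuracy.

    `model` is any object exposing an answer(question) -> str method,
    a hypothetical interface used only for this sketch."""
    hits = sum(exact_match(model.answer(q), gold) for q, gold in dataset)
    return hits / len(dataset)
```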
Limitations