# Evaluation Results

This directory contains the detailed evaluation results.

It includes comparative visualizations at the root level, as well as individual subdirectories containing the raw data and detailed metrics for each specific model.

## 📊 High-Level Benchmarks

The following graphs provide a summary of performance across all evaluated models and topics.

| Visualization | Description |
| --- | --- |
| Model Pass Rate | Comparative performance of all models. |
| Metric Pass Rate | Breakdown of success rates across the different metrics for all models. |
| Topic Pass Rate | Breakdown of success rates across the different documentation categories. |
| Topic Pass Rate (w/o Faithfulness) | Topic breakdown excluding the faithfulness metric, for broader relevancy analysis. |

## 📂 Model Directories

Detailed logs, CSVs, and per-metric breakdowns can be found in each model's folder:


## 📏 Metrics Configured

For this evaluation, every Q&A pair was tested against five metrics at the Turn Level.

### 1. Ragas Metrics

See the official [Ragas documentation](https://docs.ragas.io/).

**Response Evaluation**

**Context Evaluation**
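
As an illustration of how turn-level Ragas scoring works, here is a minimal sketch. The exact metric split (`faithfulness`/`answer_relevancy` for the response, `context_precision`/`context_recall` for the contexts), the dataset column names, and the judge configuration are assumptions; only faithfulness is confirmed by this README.

```python
# A minimal sketch of scoring one turn with Ragas. The metric set below
# is an assumption -- this README only confirms that faithfulness was
# among the configured metrics. Ragas needs an LLM judge configured
# (an OpenAI key by default), and the column names here follow older
# Ragas releases; they may differ by version.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# One evaluated turn: question, model answer, retrieved contexts,
# and the reference ("ground truth") answer.
turn = Dataset.from_dict({
    "question": ["How do I check the cluster version?"],
    "answer": ["Run `oc get clusterversion`."],
    "contexts": [["`oc get clusterversion` prints the current cluster version."]],
    "ground_truth": ["Use `oc get clusterversion`."],
})

scores = evaluate(
    turn,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(scores)  # one score in [0, 1] per metric
```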

### 2. Custom Metrics

**Response Evaluation**

- `answer_correctness`: A custom-logic metric that validates the accuracy of the final response by comparing it with the `expected_response`.
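
For illustration, a minimal sketch of what such a comparison could look like; the actual metric's similarity measure, pass threshold, and any LLM judging are not documented here and are assumptions.

```python
# Hypothetical sketch of an answer_correctness-style check. The real
# lightspeed-evaluation logic is not shown in this README, so this
# token-overlap version is illustrative only.
def answer_correctness(response: str, expected_response: str,
                       threshold: float = 0.7) -> bool:
    """Pass the turn when enough of the expected answer appears in the response."""
    want = set(expected_response.lower().split())
    got = set(response.lower().split())
    if not want:  # nothing expected: pass only an empty response
        return not got
    recall = len(want & got) / len(want)
    return recall >= threshold
```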

## 📖 Reference: Understanding the Results

Each model directory above contains standard output files generated by the `lightspeed-evaluation` tool. Use this guide to interpret the data.
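
For example, a per-metric pass rate like the one charted above can be recomputed from a model's raw results; the file name and the `metric`/`passed` column names below are assumptions about the `lightspeed-evaluation` output format.

```python
# Sketch of computing per-metric pass rates from one model's results CSV.
# The file name and column names are assumptions, not the documented schema.
import pandas as pd

df = pd.read_csv("results.csv")
pass_rate = df.groupby("metric")["passed"].mean()
print(pass_rate.sort_values())  # fraction of turns passing each metric
```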