# Evaluation Results

This directory contains the detailed evaluation results.

It includes comparative visualizations at the root level, as well as individual subdirectories containing the raw data and detailed metrics for each specific model.

## 📊 High-Level Benchmarks

The following graphs provide a summary of performance across all evaluated models and topics.

| Visualization | Description |
| --- | --- |
| Model Pass Rate | Comparative performance of all models. |
| Metric Pass Rate | Breakdown of success rates across the different metrics for all models. |
| Topic Pass Rate | Breakdown of success rates across the different documentation categories. |
| Topic Pass Rate (w/o Faithfulness) | Topic breakdown excluding the faithfulness metric, for broader relevancy analysis. |

## 📂 Model Directories

Detailed logs, CSVs, and per-metric breakdowns can be found in each model's folder:


## 📏 Metrics Configured

For this evaluation, every Q&A pair was tested against five metrics at the Turn Level.

### 1. Ragas Metrics

See the official [Ragas documentation](https://docs.ragas.io/).

**Response Evaluation**

**Context Evaluation**
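
As an illustration of how turn-level Ragas scoring works, here is a minimal sketch. The exact metric split (`faithfulness`/`answer_relevancy` for the response, `context_precision`/`context_recall` for the contexts), the dataset column names, and the judge configuration are assumptions; only faithfulness is confirmed by this README.

```python
# A minimal sketch of scoring one turn with Ragas. The metric set below
# is an assumption -- this README only confirms that faithfulness was
# among the configured metrics. Ragas needs an LLM judge configured
# (an OpenAI key by default), and the column names here follow older
# Ragas releases; they may differ by version.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# One evaluated turn: question, model answer, retrieved contexts,
# and the reference ("ground truth") answer.
turn = Dataset.from_dict({
    "question": ["How do I check the cluster version?"],
    "answer": ["Run `oc get clusterversion`."],
    "contexts": [["`oc get clusterversion` prints the current cluster version."]],
    "ground_truth": ["Use `oc get clusterversion`."],
})

scores = evaluate(
    turn,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(scores)  # one score in [0, 1] per metric
```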

### 2. Custom Metrics

**Response Evaluation**

- `answer_correctness`: A custom-logic metric that validates the accuracy of the final response by comparing it with the `expected_response`.
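
For illustration, a minimal sketch of what such a comparison could look like; the actual metric's similarity measure, pass threshold, and any LLM judging are not documented here and are assumptions.

```python
# Hypothetical sketch of an answer_correctness-style check. The real
# lightspeed-evaluation logic is not shown in this README, so this
# token-overlap version is illustrative only.
def answer_correctness(response: str, expected_response: str,
                       threshold: float = 0.7) -> bool:
    """Pass the turn when enough of the expected answer appears in the response."""
    want = set(expected_response.lower().split())
    got = set(response.lower().split())
    if not want:  # nothing expected: pass only an empty response
        return not got
    recall = len(want & got) / len(want)
    return recall >= threshold
```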

## 📖 Reference: Understanding the Results

Each model directory above contains standard output files generated by the `lightspeed-evaluation` tool. Use this guide to interpret the data.
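
For example, a per-metric pass rate like the one charted above can be recomputed from a model's raw results; the file name and the `metric`/`passed` column names below are assumptions about the `lightspeed-evaluation` output format.

```python
# Sketch of computing per-metric pass rates from one model's results CSV.
# The file name and column names are assumptions, not the documented schema.
import pandas as pd

df = pd.read_csv("results.csv")
pass_rate = df.groupby("metric")["passed"].mean()
print(pass_rate.sort_values())  # fraction of turns passing each metric
```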