Pondera is a lightweight, YAML-first framework to evaluate AI models and agents with pluggable runners and an LLM-as-a-judge.
Updated Oct 23, 2025 - Python
Analyze Claude Code session logs and generate efficiency reports, cost diagnostics, and actionable recommendations. This project reads local JSONL session logs, computes deterministic efficiency signals, and can optionally add recommendations from a local LLM via Ollama.
Bilingual (English/Portuguese) LLM annotation dataset for quality evaluation.