Official Python implementation of the ICLR 2026 paper "Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data?".
- [2026/01/26] Our paper has been accepted to ICLR 2026. Thanks!
Here's an overview of the process behind our EssenceBench method:
Notably, the method can be run on a CPU without any GPUs!
To get started with EssenceBench, follow the installation instructions below.
- Clone the repo

```shell
git clone https://github.com/gszfwsb/EssenceBench.git
```

- Install dependencies

```shell
conda create -n essencebench python=3.11
conda activate essencebench
pip install -r requirements.txt
```

- Download and preprocess data (e.g. gsm8k)
```shell
export HF_ENDPOINT=https://huggingface.co  # or https://hf-mirror.com
export HF_TOKEN=hf_**********************  # your Hugging Face token

# To process another dataset, replace "gsm8k" with one of
# "arc", "truthfulqa", "winogrande", "hellaswag", or "mmlu".
python process_data_main.py \
    --datasets gsm8k \
    --temp_dir ./ \
    --data_subdir data
```
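After preprocessing, it can be worth sanity-checking the generated file before launching selection. The sketch below is a minimal, hypothetical helper (not part of the repo) that assumes the output is standard JSONL, i.e. one JSON object per line, as the `.jsonl` extension suggests:

```python
import json


def count_jsonl_items(path):
    """Count non-empty records in a JSONL file (one JSON object per line)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def peek_jsonl_keys(path):
    """Return the keys of the first record, to eyeball the schema."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                return sorted(json.loads(line).keys())
    return []
```

For example, `count_jsonl_items("./data/gsm8k/gsm8k_discard_items_transform.jsonl")` should report how many items survived preprocessing.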
- Subset selection (e.g. gsm8k)

```shell
# To run another dataset, replace "gsm8k" with one of
# "arc", "truthfulqa", "winogrande", "hellaswag", or "mmlu",
# e.g. --data ./data/arc/arc_discard_items_transform.jsonl
python code/subset_selection/train_multi_round.py \
    --data ./data/gsm8k/gsm8k_discard_items_transform.jsonl \
    -k 50 \
    --rounds 3 \
    --gens 2000 \
    --ablation full
```
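To run selection over every supported dataset in one pass, a loop like the following should work. This is a convenience sketch, not a repo script, and it assumes each dataset has already been preprocessed into the same `<name>_discard_items_transform.jsonl` layout shown above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Loop over all datasets listed in the instructions above (assumed preprocessed).
for ds in gsm8k arc truthfulqa winogrande hellaswag mmlu; do
    python code/subset_selection/train_multi_round.py \
        --data "./data/${ds}/${ds}_discard_items_transform.jsonl" \
        -k 50 \
        --rounds 3 \
        --gens 2000 \
        --ablation full
done
```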
If you have any questions, please contact us:
- Shaobo Wang (shaobowang1009@sjtu.edu.cn)
- Cong Wang (2149367891a@gmail.com)
- Wenjie Fu (22300240028@m.fudan.edu.cn)
If you find EssenceBench useful for your research and applications, please cite using this BibTeX:
```bibtex
@inproceedings{wang2026EssenceBench,
  title={Rethinking LLM Evaluation: Can We Evaluate LLMs with 200{\texttimes} Less Data?},
  author={Shaobo Wang and Cong Wang and Wenjie Fu and Yue Min and Mingquan Feng and Isabel Guan and Xuming Hu and Conghui He and Cunxiang Wang and Kexin Yang and Xingzhang Ren and Fei Huang and Dayiheng Liu and Linfeng Zhang},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```