gszfwsb/EssenceBench

[ICLR2026] Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data?

Official Python implementation of the ICLR 2026 paper "Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data?".

🔥 News

  • [2026/01/26] Our paper has been accepted to ICLR 2026. Thanks!

🚀 Pipeline

Here's an overview of the process behind our EssenceBench method:

*(Pipeline overview figure)*

Notably, the method runs entirely on CPU, with no GPUs required!

🛠️ Getting Started

To get started with EssenceBench, follow the installation instructions below.

  1. Clone the repo

```shell
git clone https://github.com/gszfwsb/EssenceBench.git
```

  2. Install dependencies

```shell
conda create -n essencebench python=3.11
conda activate essencebench
pip install -r requirements.txt
```

  3. Download and preprocess data (e.g., gsm8k)

```shell
export HF_ENDPOINT=https://huggingface.co # or https://hf-mirror.com
export HF_TOKEN=hf_********************** # your Hugging Face token

# To run another dataset, replace "gsm8k" with one of
# "arc", "truthfulqa", "winogrande", "hellaswag", or "mmlu".
python process_data_main.py \
    --datasets gsm8k \
    --temp_dir ./ \
    --data_subdir data
```

  4. Subset selection (e.g., gsm8k)

```shell
python code/subset_selection/train_multi_round.py \
    --data ./data/gsm8k/gsm8k_discard_items_transform.jsonl \
    -k 50 \
    --rounds 3 \
    --gens 2000 \
    --ablation full

# To run another dataset, replace "gsm8k" with one of
# "arc", "truthfulqa", "winogrande", "hellaswag", or "mmlu",
# e.g. --data ./data/arc/arc_discard_items_transform.jsonl
```
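The subset-selection flags (`-k`, `--rounds`, `--gens`) suggest an evolutionary search over k-item subsets of the benchmark. As a rough, self-contained illustration of that idea (not the repo's implementation; the toy data, fitness definition, and all names here are hypothetical), a genetic search that picks k items whose per-model accuracy tracks the full benchmark might look like:

```python
import random

random.seed(0)

# Toy setup: 0/1 outcomes of M models on N benchmark items.
M, N, K = 8, 200, 20
scores = [[1 if random.random() < (0.3 + 0.5 * m / M) else 0 for _ in range(N)]
          for m in range(M)]
full_acc = [sum(row) / N for row in scores]  # full-benchmark accuracy per model

def fitness(subset):
    # Lower is better: squared drift of subset accuracies from full-set accuracies.
    sub_acc = [sum(row[i] for i in subset) / len(subset) for row in scores]
    return sum((a - b) ** 2 for a, b in zip(sub_acc, full_acc))

def mutate(subset):
    # Swap one chosen item for one unchosen item.
    s = set(subset)
    s.remove(random.choice(sorted(s)))
    s.add(random.choice([i for i in range(N) if i not in s]))
    return sorted(s)

def evolve(pop_size=40, gens=200):
    # Elitist genetic search: keep the best half, refill with mutated survivors.
    pop = [sorted(random.sample(range(N), K)) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=fitness)

best = evolve()
print(f"{K}-item subset, fitness {fitness(best):.6f}")
```

The actual method evolves much larger subsets over multiple rounds (`--rounds 3 --gens 2000`), but the shape is the same: a cheap, gradient-free search over item subsets, which is why the whole pipeline fits on a CPU.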

📮 Contact

If you have any questions, please contact us:

  • Shaobo Wang (shaobowang1009@sjtu.edu.cn)
  • Cong Wang (2149367891a@gmail.com)
  • Wenjie Fu (22300240028@m.fudan.edu.cn)

📌 Citation

If you find EssenceBench useful for your research and applications, please cite using this BibTeX:

@inproceedings{wang2026EssenceBench,
      title={Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data?}, 
      author={Shaobo Wang and Cong Wang and Wenjie Fu and Yue Min and Mingquan Feng and Isabel Guan and Xuming Hu and Conghui He and Cunxiang Wang and Kexin Yang and Xingzhang Ren and Fei Huang and Dayiheng Liu and Linfeng Zhang},
      booktitle={International Conference on Learning Representations (ICLR)},
      year={2026}
}
