
This repository contains the code and instructions necessary to reproduce the experiments presented in the paper: "STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models", accepted to ICLR 2026.
STAR (Similarity-guided Teacher-Assisted Refinement) is a novel holistic framework designed to effectively transfer the function calling capabilities of large language models (LLMs) to super-tiny, cost-efficient models. Our STAR training curriculum involves two processes:
- Constrained Knowledge Distillation (CKD): The selected teacher's knowledge is transferred to a super-tiny student model (e.g., 0.6B) using our CKD objective, which ensures training stability and preserves exploratory capacity.
- Similarity-guided Reinforcement Learning (Sim-RL): The distilled student model is polished with Sim-RL to enhance its generalization capability and optimize its performance on complex problems.
- [2026.02.04] We released the STAR codebase, including implementations for CKD and Sim-RL.
- [2026.02.04] Our paper is now available on arXiv: 2602.03022.
- [2026.01.26] Our paper has been accepted to ICLR 2026!
Our STAR models establish new state-of-the-art performance in their size classes. The STAR framework significantly closes the performance gap with much larger models.
We rely on uv for Python environment management and OpenRLHF as the RL training framework.
- Create the Python environment:

```shell
# Create a virtual environment using uv
uv venv --seed --python 3.12 ./train-env

# Install dependencies
uv pip sync -p ./train-env/bin/python ./requirements_uv.txt
source ./train-env/bin/activate
```
- Install the patched OpenRLHF:

```shell
# Clone the specific commit of OpenRLHF
git clone https://github.com/OpenRLHF/OpenRLHF.git
cd OpenRLHF
git checkout c1fc63a9f7e1837577a76b0c688809b3c0bdc644

# Apply the patch for CKD functionality
git apply ../0001-add-ckd.patch
cd ..
```
Download the base models from Hugging Face. We use the Qwen-8B model as the teacher and smaller models as students.
```shell
# Teacher model
huggingface-cli download star-lab/Teacher-8B --local-dir models/Teacher-8B

# Student models (e.g., 0.6B)
huggingface-cli download Qwen/Qwen3-0.6B --local-dir models/Qwen3-0.6B
```

Both CKD and Sim-RL require datasets in jsonlines format, where each line is a JSON object with two fields:
- inputs: The prompt formatted with the Qwen chat template.
- outputs: The response formatted with the Qwen chat template.
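For reference, one line of such a file might look like the following sketch. The `get_weather` call and its arguments are made up for illustration; the `<|im_start|>`/`<|im_end|>` markers follow the Qwen chat template.

```python
import json

# One hypothetical training instance in the two-field jsonlines format.
# The get_weather tool call is invented purely for illustration.
record = {
    "inputs": (
        "<|im_start|>user\nWhat is the weather in Paris?<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
    "outputs": (
        '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}'
        "\n</tool_call><|im_end|>"
    ),
}

# Each dataset line is this object serialized as a single JSON string.
line = json.dumps(record)
print(sorted(json.loads(line).keys()))  # -> ['inputs', 'outputs']
```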
We recommend organizing your data into a structured format first (e.g., using the messages API format) and then converting it.
Example structured format:
```json
{
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "...", "tool_calls": [...], "reasoning_content": "..."},
    {"role": "tool", "content": "..."}
  ],
  "tools": [
    {"name": "...", "description": "...", "parameters": ...}
  ]
}
```

For demonstration purposes, we provide example_messages.jsonl, a file containing 128 instances randomly sampled from the XLAM dataset.
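The conversion from the structured format to the two-field format can be sketched roughly as below. The chat markers are hard-coded Qwen-style for illustration; a real pipeline would use the model tokenizer's `apply_chat_template`, and the sketch assumes the final message is the assistant reply.

```python
import json

def messages_to_example(item):
    """Convert one structured record (messages API format) into the
    two-field jsonlines training format. Illustrative only: markers are
    hard-coded, tool_calls/tools rendering is omitted, and the last
    message is assumed to be the assistant response."""
    msgs = item["messages"]
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in msgs[:-1]
    )
    return {
        # Everything before the final turn, plus the assistant header.
        "inputs": prompt + "<|im_start|>assistant\n",
        # The final assistant turn becomes the training target.
        "outputs": f"{msgs[-1]['content']}<|im_end|>",
    }

# Hypothetical record in the structured format.
item = {
    "messages": [
        {"role": "user", "content": "List the available tools."},
        {"role": "assistant", "content": "There is one tool: get_weather."},
    ],
    "tools": [{"name": "get_weather", "description": "...", "parameters": {}}],
}
print(json.dumps(messages_to_example(item)))
```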
Sim-RL Datasets:
The datasets for Sim-RL training are prepared by running the scripts/prepare_rl_data.sh script, which automates downloading and preprocessing the ToolMind, XLAM, ToolAce, and Hammer datasets.
Notes:
- ToolMind Dataset: The Tool-use-synthetic dataset is no longer publicly available and has been substituted in this work with the ToolMind dataset.
- XLAM Dataset Access: Access to the XLAM dataset is gated and requires authorization; you can request access here. Upon approval, set a personal Hugging Face access token as the HF_TOKEN environment variable before running the script:
```shell
export HF_TOKEN="your_token_here"
bash scripts/prepare_rl_data.sh
```
CKD Datasets:
The dataset for CKD is generated via a two-stage process. First, we perform rollouts with the teacher model (Teacher-8B) to generate synthetic trajectories from a set of seed messages. Second, these trajectories are converted into a structured training format, incorporating the teacher's reasoning chains. These steps are executed as follows:
```shell
# 1. Generate synthetic trajectories via teacher model rollouts
python teacher_rollout.py --input=example_messages.jsonl --output=kd_messages.jsonl --model-path ./models/Teacher-8B --rollout-n 8 --dp-size 8

# 2. Convert trajectories to a structured training set with reasoning
python messages_to_trainset.py --input=kd_messages.jsonl --output=kd_data.jsonl --tokenizer-path=./models/Teacher-8B --add-reasoning-content
```

Before starting a training run, prepare the environment:
```shell
export PYTHONPATH=$PWD/OpenRLHF
ray start --head --node-ip-address 0.0.0.0 --num-gpus 8 --disable-usage-stats
```

First, distill knowledge from the teacher model to the student using CKD. This step requires training data generated by the teacher model; we provide teacher_rollout.py as a reference for generating these samples.
After preparing your models and data, edit the paths in scripts/train_ckd.sh and run it:
```shell
bash scripts/train_ckd.sh
```

The distilled student model will be saved to the path specified in the script (e.g., checkpoints/student-0.6b-ckd).
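The exact CKD objective is specified in the paper and the 0001-add-ckd.patch. Purely as a rough illustration of how a "constrained" distillation term can bound per-token updates, the hypothetical sketch below clamps the forward KL between teacher and student token distributions; the clipping threshold and form of the constraint are assumptions, not the paper's formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def constrained_kl(teacher_logits, student_logits, clip=5.0):
    """Forward KL(teacher || student) for one token position, clipped
    at `clip` so a single token cannot dominate the loss. Illustrative
    only; see the paper for the actual CKD objective."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return min(kl, clip)

# Identical distributions incur zero divergence.
print(constrained_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # -> 0.0
```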
Next, refine the CKD-distilled student model using Sim-RL to further boost its capabilities. While Sim-RL can be applied to any base model, it is most effective when used on a model already trained with CKD.
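The similarity-guided reward used by Sim-RL is defined in the paper. As a rough, hypothetical illustration only, the sketch below scores a generated function call against a reference: the function name must match exactly, and argument-level agreement contributes fractionally.

```python
import json

def similarity_reward(pred_call, ref_call):
    """Hypothetical similarity-guided reward for one function call.
    This is an illustration of the idea, not the paper's formulation:
    0.0 for unparsable output or a wrong function name, otherwise
    0.5 plus up to 0.5 for matching reference arguments."""
    try:
        pred, ref = json.loads(pred_call), json.loads(ref_call)
    except json.JSONDecodeError:
        return 0.0  # unparsable output earns no reward
    if pred.get("name") != ref.get("name"):
        return 0.0
    ref_args = ref.get("arguments", {})
    if not ref_args:
        return 1.0
    matched = sum(
        1 for k, v in ref_args.items()
        if pred.get("arguments", {}).get(k) == v
    )
    return 0.5 + 0.5 * matched / len(ref_args)

ref = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "C"}}'
pred = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(similarity_reward(pred, ref))  # -> 0.75
```

A graded reward like this gives the policy a denser learning signal than exact-match alone, which is the intuition behind similarity-guided RL for function calling.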
Update the model and data paths in scripts/train_sim_rl.sh and run the script:
```shell
bash scripts/train_sim_rl.sh
```

The final STAR-0.6B model will be saved to the path specified in the script (e.g., checkpoints/star-0.6b).
This project is built upon the OpenRLHF framework. We thank the original authors for their significant open-source contributions.
If you find this work useful, please kindly cite our paper:
@misc{ni2026starsimilarityguidedteacherassistedrefinement,
title={STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models},
author={Jiliang Ni and Jiachen Pu and Zhongyi Yang and Jingfeng Luo and Conggang Hu},
year={2026},
eprint={2602.03022},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2602.03022},
}