STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models


Algorithm Platform Team, AI Hardware Division, Alibaba

This repository contains the code and instructions necessary to reproduce the experiments presented in the paper: "STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models", accepted to ICLR 2026.

STAR (Similarity-guided Teacher-Assisted Refinement) is a holistic framework for transferring the function calling capabilities of large language models (LLMs) to super-tiny, cost-efficient models. The STAR training curriculum consists of two stages:

  1. Constrained Knowledge Distillation (CKD): The selected teacher's knowledge is transferred to a super-tiny student model (e.g., 0.6B) using our novel Constrained Knowledge Distillation (CKD) objective, which ensures training stability and preserves exploratory capacity.
  2. Similarity-guided Reinforcement Learning (Sim-RL): The distilled student model is polished with Sim-RL to enhance its generalization capability and optimize its performance on complex problems.

πŸ”₯ News

  • [2026.02.04] We released the STAR codebase, including implementations for CKD and Sim-RL.
  • [2026.02.04] Our paper is now available on arXiv: 2602.03022.
  • [2026.01.26] Our paper has been accepted to ICLR 2026!

πŸ’‘ Main Results

Our STAR models establish new state-of-the-art performance in their size classes. The STAR framework significantly closes the performance gap with much larger models.

Main Results Table

πŸ› οΈ Installation

We rely on uv for Python environment management and on OpenRLHF as our RL training framework.

  1. Create Python Environment

    # Create a virtual environment using uv
    uv venv --seed --python 3.12 ./train-env
    
    # Install dependencies
    uv pip sync -p ./train-env/bin/python ./requirements_uv.txt
    
    source ./train-env/bin/activate
  2. Install Patched OpenRLHF

    # Clone the specific commit of OpenRLHF
    git clone https://github.com/OpenRLHF/OpenRLHF.git
    cd OpenRLHF
    git checkout c1fc63a9f7e1837577a76b0c688809b3c0bdc644
    
    # Apply the patch for CKD functionality
    git apply ../0001-add-ckd.patch
    cd ..

🎯 Quick Start

Model Preparation

Download the base models from Hugging Face. We use the 8B model below as the teacher and smaller Qwen3 models as students.

# Teacher Model
huggingface-cli download star-lab/Teacher-8B --local-dir models/Teacher-8B

# Student Models (e.g., 0.6B)
huggingface-cli download Qwen/Qwen3-0.6B --local-dir models/Qwen3-0.6B

Data Preparation

Both CKD and Sim-RL require datasets in JSON Lines format, where each line is a JSON object with two fields:

  • inputs: The prompt formatted with the Qwen chat template.
  • outputs: The response formatted with the Qwen chat template.

We recommend organizing your data into a structured format first (e.g., using the messages API format) and then converting it.

Example structured format:

{
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "...", "tool_calls": [...], "reasoning_content": "..."},
        {"role": "tool", "content": "..."}
    ],
    "tools": [
        {"name": "...", "description": "...", "parameters": ...}
    ]
}

For demonstration purposes, we provide example_messages.jsonl, a file containing 128 instances randomly sampled from the XLAM dataset.
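The conversion from the structured messages format to the inputs/outputs fields can be sketched as follows. Here `render` is a simplified stand-in for the real Qwen chat template (in practice you would use the tokenizer's `apply_chat_template`); the field names mirror the format above, everything else is illustrative.

```python
import json

def render(m):
    # Simplified stand-in for the Qwen chat template renderer.
    return f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"

def to_train_record(example):
    """Split a structured messages record into the `inputs`/`outputs`
    fields expected by the CKD and Sim-RL training data."""
    msgs = example["messages"]
    # The final assistant turn becomes the target response;
    # everything before it becomes the prompt.
    cut = max(i for i, m in enumerate(msgs) if m["role"] == "assistant")
    return {
        "inputs": "".join(render(m) for m in msgs[:cut]),
        "outputs": render(msgs[cut]),
    }

example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather in Paris?"},
        {"role": "assistant", "content": "Calling get_weather..."},
    ],
    "tools": [],
}

record = to_train_record(example)
line = json.dumps(record)  # one line of the JSON Lines training file
```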

Sim-RL Datasets:

The datasets for Sim-RL training can be prepared by running scripts/prepare_rl_data.sh, which automates downloading and preprocessing the ToolMind, XLAM, ToolAce, and Hammer datasets.

Notes:

  1. ToolMind Dataset: The Tool-use-synthetic dataset is no longer publicly available; in this work it has been substituted with the ToolMind dataset.
  2. XLAM Dataset Access: The XLAM dataset is gated and requires authorization; request access on the dataset's Hugging Face page. Upon approval, set a personal Hugging Face access token as the HF_TOKEN environment variable before running the script:
export HF_TOKEN="your_token_here"
bash scripts/prepare_rl_data.sh

CKD Datasets:

The dataset for CKD is generated via a two-stage process. First, we perform rollouts with the teacher model (Teacher-8B) to generate synthetic trajectories from a set of seed messages. Second, these trajectories are converted into a structured training format, incorporating the teacher's reasoning chains. These steps are executed as follows:

# 1. Generate synthetic trajectories via teacher model rollouts
python teacher_rollout.py --input=example_messages.jsonl --output=kd_messages.jsonl --model-path ./models/Teacher-8B --rollout-n 8 --dp-size 8

# 2. Convert trajectories to a structured training set with reasoning
python messages_to_trainset.py --input=kd_messages.jsonl --output=kd_data.jsonl --tokenizer-path=./models/Teacher-8B --add-reasoning-content
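Before launching training it can be useful to sanity-check the generated file. The helper below is a hypothetical validator (not part of this repository) that asserts every line of a JSON Lines file carries non-empty `inputs` and `outputs` string fields:

```python
import io
import json

def validate_jsonl(stream):
    """Yield records from a JSON Lines stream, raising on any line
    that lacks a non-empty `inputs` or `outputs` string field."""
    for n, line in enumerate(stream, 1):
        rec = json.loads(line)
        for field in ("inputs", "outputs"):
            if not isinstance(rec.get(field), str) or not rec[field]:
                raise ValueError(f"line {n}: bad or missing {field!r}")
        yield rec

# A one-line stand-in for kd_data.jsonl:
sample = io.StringIO(
    '{"inputs": "<|im_start|>user\\nhi<|im_end|>\\n", '
    '"outputs": "<|im_start|>assistant\\nhello<|im_end|>"}\n'
)
records = list(validate_jsonl(sample))
```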

Training: The STAR Curriculum

Before starting, prepare the environment for a training run:

export PYTHONPATH=$PWD/OpenRLHF
ray start --head --node-ip-address 0.0.0.0 --num-gpus 8 --disable-usage-stats

Phase 1: Constrained Knowledge Distillation (CKD)

First, distill knowledge from the teacher model to the student using CKD. This step requires training data generated by the teacher model. We provide teacher_rollout.py as a reference for generating these samples.
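For background, the distillation term that objectives like CKD build on is the token-level KL divergence between the teacher's and student's output distributions. The sketch below computes it for one token position over a toy three-word vocabulary; this is generic distillation background only, not the paper's CKD objective, which adds constraints for training stability and exploratory capacity.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(teacher_logits, student_logits):
    """Forward KL(teacher || student) at one token position --
    the vanilla knowledge-distillation term."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Differing distributions give a positive loss; identical ones give ~0.
loss = kd_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0])
```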

After preparing your models and data, edit the paths in scripts/train_ckd.sh and run it:

bash scripts/train_ckd.sh

The distilled student model will be saved to the path specified in the script (e.g., checkpoints/student-0.6b-ckd).

Phase 2: Similarity-guided Reinforcement Learning (Sim-RL)

Next, refine the CKD-distilled student model using Sim-RL to further boost its capabilities. While Sim-RL can be applied to any base model, it is most effective when used on a model already trained with CKD.
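To illustrate the general shape of a similarity-guided reward, the toy function below scores a model prediction against a reference completion with a generic sequence-similarity ratio in [0, 1]. This is a stand-in for illustration only; the actual Sim-RL reward is the one defined in the paper.

```python
import difflib

def similarity_reward(prediction, reference):
    """Illustrative similarity-based reward in [0, 1] -- a generic
    sequence-similarity stand-in, NOT the paper's Sim-RL reward."""
    return difflib.SequenceMatcher(None, prediction, reference).ratio()

# A near-miss (wrong casing in an argument) still earns partial credit,
# unlike a binary exact-match reward.
pred = '{"name": "get_weather", "arguments": {"city": "paris"}}'
ref  = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
r = similarity_reward(pred, ref)
```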

Update the model and data paths in scripts/train_sim_rl.sh and run the script:

bash scripts/train_sim_rl.sh

The final STAR-0.6B model will be saved to the path specified in the script (e.g., checkpoints/star-0.6b).
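After training, the model's function calls can be parsed from its generations. Qwen-style models typically wrap each call in `<tool_call>` tags containing a JSON object; the extractor below is a minimal regex-based sketch for illustration, not the official parser.

```python
import json
import re

# Matches Qwen-style <tool_call>{...}</tool_call> blocks (illustrative).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.S)

def extract_tool_calls(text):
    """Return the JSON tool-call objects embedded in a generation."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

generation = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(generation)
```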

πŸ™πŸ» Acknowledgements

This project is built upon the OpenRLHF framework. We thank the original authors for their significant open-source contributions.

⭐️ Citation

If you find this work useful, please cite our paper:

@misc{ni2026starsimilarityguidedteacherassistedrefinement,
      title={STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models}, 
      author={Jiliang Ni and Jiachen Pu and Zhongyi Yang and Jingfeng Luo and Conggang Hu},
      year={2026},
      eprint={2602.03022},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.03022}, 
}
