STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models


Algorithm Platform Team, AI Hardware Division, Alibaba

This repository contains the code and instructions necessary to reproduce the experiments presented in the paper: "STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models", accepted to ICLR 2026.

STAR (Similarity-guided Teacher-Assisted Refinement) is a holistic framework for transferring the function calling capabilities of large language models (LLMs) to super-tiny, cost-efficient models. The STAR training curriculum consists of two stages:

  1. Constrained Knowledge Distillation (CKD): The selected teacher's knowledge is transferred to a super-tiny student model (e.g., 0.6B) using our novel Constrained Knowledge Distillation (CKD) objective, which ensures training stability and preserves exploratory capacity.
  2. Similarity-guided Reinforcement Learning (Sim-RL): The distilled student model is polished with Sim-RL to enhance its generalization capability and optimize its performance on complex problems.

πŸ”₯ News

  • [2026.02.04] We released the STAR codebase, including implementations for CKD and Sim-RL.
  • [2026.02.04] Our paper is now available on arXiv: 2602.03022.
  • [2026.01.26] Our paper has been accepted to ICLR 2026!

πŸ’‘ Main Results

Our STAR models establish new state-of-the-art performance in their size classes. The STAR framework significantly closes the performance gap with much larger models.

Main Results Table

πŸ› οΈ Installation

We rely on uv for Python environment management and on OpenRLHF as our RL training framework.

  1. Create Python Environment

    # Create a virtual environment using uv
    uv venv --seed --python 3.12 ./train-env
    
    # Install dependencies
    uv pip sync -p ./train-env/bin/python ./requirements_uv.txt
    
    source ./train-env/bin/activate
  2. Install Patched OpenRLHF

    # Clone the specific commit of OpenRLHF
    git clone https://github.com/OpenRLHF/OpenRLHF.git
    cd OpenRLHF
    git checkout c1fc63a9f7e1837577a76b0c688809b3c0bdc644
    
    # Apply the patch for CKD functionality
    git apply ../0001-add-ckd.patch
    cd ..

🎯 Quick Start

Model Preparation

Download the base models from Hugging Face. We use the 8B model below as the teacher and smaller Qwen3 models as students.

# Teacher Model
huggingface-cli download star-lab/Teacher-8B --local-dir models/Teacher-8B

# Student Models (e.g., 0.6B)
huggingface-cli download Qwen/Qwen3-0.6B --local-dir models/Qwen3-0.6B

Data Preparation

Both CKD and Sim-RL require datasets in JSON Lines format, where each line is a JSON object with two fields:

  • inputs: The prompt formatted with the Qwen chat template.
  • outputs: The response formatted with the Qwen chat template.

We recommend organizing your data into a structured format first (e.g., using the messages API format) and then converting it.

Example structured format:

{
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "...", "tool_calls": [...], "reasoning_content": "..."},
        {"role": "tool", "content": "..."}
    ],
    "tools": [
        {"name": "...", "description": "...", "parameters": ...}
    ]
}

For demonstration purposes, we provide example_messages.jsonl, a file containing 128 instances randomly sampled from the XLAM dataset.
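The conversion from the structured messages format to the inputs/outputs fields can be sketched as follows. Here `render` is a simplified stand-in for the real Qwen chat template (in practice you would use the tokenizer's `apply_chat_template`); the field names mirror the format above, everything else is illustrative.

```python
import json

def render(m):
    # Simplified stand-in for the Qwen chat template renderer.
    return f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"

def to_train_record(example):
    """Split a structured messages record into the `inputs`/`outputs`
    fields expected by the CKD and Sim-RL training data."""
    msgs = example["messages"]
    # The final assistant turn becomes the target response;
    # everything before it becomes the prompt.
    cut = max(i for i, m in enumerate(msgs) if m["role"] == "assistant")
    return {
        "inputs": "".join(render(m) for m in msgs[:cut]),
        "outputs": render(msgs[cut]),
    }

example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather in Paris?"},
        {"role": "assistant", "content": "Calling get_weather..."},
    ],
    "tools": [],
}

record = to_train_record(example)
line = json.dumps(record)  # one line of the JSON Lines training file
```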

Sim-RL Datasets:

The datasets for Sim-RL training can be prepared by running scripts/prepare_rl_data.sh, which automates downloading and preprocessing the ToolMind, XLAM, ToolAce, and Hammer datasets.

Notes:

  1. ToolMind Dataset: The Tool-use-synthetic dataset is no longer publicly available; in this work it has been substituted with the ToolMind dataset.
  2. XLAM Dataset Access: The XLAM dataset is gated and requires authorization; request access on the dataset's Hugging Face page. Upon approval, set a personal Hugging Face access token as the HF_TOKEN environment variable before running the script:
export HF_TOKEN="your_token_here"
bash scripts/prepare_rl_data.sh

CKD Datasets:

The dataset for CKD is generated via a two-stage process. First, we perform rollouts with the teacher model (Teacher-8B) to generate synthetic trajectories from a set of seed messages. Second, these trajectories are converted into a structured training format, incorporating the teacher's reasoning chains. These steps are executed as follows:

# 1. Generate synthetic trajectories via teacher model rollouts
python teacher_rollout.py --input=example_messages.jsonl --output=kd_messages.jsonl --model-path ./models/Teacher-8B --rollout-n 8 --dp-size 8

# 2. Convert trajectories to a structured training set with reasoning
python messages_to_trainset.py --input=kd_messages.jsonl --output=kd_data.jsonl --tokenizer-path=./models/Teacher-8B --add-reasoning-content
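Before launching training it can be useful to sanity-check the generated file. The helper below is a hypothetical validator (not part of this repository) that asserts every line of a JSON Lines file carries non-empty `inputs` and `outputs` string fields:

```python
import io
import json

def validate_jsonl(stream):
    """Yield records from a JSON Lines stream, raising on any line
    that lacks a non-empty `inputs` or `outputs` string field."""
    for n, line in enumerate(stream, 1):
        rec = json.loads(line)
        for field in ("inputs", "outputs"):
            if not isinstance(rec.get(field), str) or not rec[field]:
                raise ValueError(f"line {n}: bad or missing {field!r}")
        yield rec

# A one-line stand-in for kd_data.jsonl:
sample = io.StringIO(
    '{"inputs": "<|im_start|>user\\nhi<|im_end|>\\n", '
    '"outputs": "<|im_start|>assistant\\nhello<|im_end|>"}\n'
)
records = list(validate_jsonl(sample))
```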

Training: The STAR Curriculum

Before starting, prepare the environment for a training run:

export PYTHONPATH=$PWD/OpenRLHF
ray start --head --node-ip-address 0.0.0.0 --num-gpus 8 --disable-usage-stats

Phase 1: Constrained Knowledge Distillation (CKD)

First, distill knowledge from the teacher model to the student using CKD. This step requires training data generated by the teacher model. We provide teacher_rollout.py as a reference for generating these samples.
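For background, the distillation term that objectives like CKD build on is the token-level KL divergence between the teacher's and student's output distributions. The sketch below computes it for one token position over a toy three-word vocabulary; this is generic distillation background only, not the paper's CKD objective, which adds constraints for training stability and exploratory capacity.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(teacher_logits, student_logits):
    """Forward KL(teacher || student) at one token position --
    the vanilla knowledge-distillation term."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Differing distributions give a positive loss; identical ones give ~0.
loss = kd_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0])
```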

After preparing your models and data, edit the paths in scripts/train_ckd.sh and run it:

bash scripts/train_ckd.sh

The distilled student model will be saved to the path specified in the script (e.g., checkpoints/student-0.6b-ckd).

Phase 2: Similarity-guided Reinforcement Learning (Sim-RL)

Next, refine the CKD-distilled student model using Sim-RL to further boost its capabilities. While Sim-RL can be applied to any base model, it is most effective when used on a model already trained with CKD.
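To illustrate the general shape of a similarity-guided reward, the toy function below scores a model prediction against a reference completion with a generic sequence-similarity ratio in [0, 1]. This is a stand-in for illustration only; the actual Sim-RL reward is the one defined in the paper.

```python
import difflib

def similarity_reward(prediction, reference):
    """Illustrative similarity-based reward in [0, 1] -- a generic
    sequence-similarity stand-in, NOT the paper's Sim-RL reward."""
    return difflib.SequenceMatcher(None, prediction, reference).ratio()

# A near-miss (wrong casing in an argument) still earns partial credit,
# unlike a binary exact-match reward.
pred = '{"name": "get_weather", "arguments": {"city": "paris"}}'
ref  = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
r = similarity_reward(pred, ref)
```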

Update the model and data paths in scripts/train_sim_rl.sh and run the script:

bash scripts/train_sim_rl.sh

The final STAR-0.6B model will be saved to the path specified in the script (e.g., checkpoints/star-0.6b).
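After training, the model's function calls can be parsed from its generations. Qwen-style models typically wrap each call in `<tool_call>` tags containing a JSON object; the extractor below is a minimal regex-based sketch for illustration, not the official parser.

```python
import json
import re

# Matches Qwen-style <tool_call>{...}</tool_call> blocks (illustrative).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.S)

def extract_tool_calls(text):
    """Return the JSON tool-call objects embedded in a generation."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

generation = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(generation)
```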

πŸ™πŸ» Acknowledgements

This project is built upon the OpenRLHF framework. We thank the original authors for their significant open-source contributions.

⭐️ Citation

If you find this work useful, please cite our paper:

@misc{ni2026starsimilarityguidedteacherassistedrefinement,
      title={STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models}, 
      author={Jiliang Ni and Jiachen Pu and Zhongyi Yang and Jingfeng Luo and Conggang Hu},
      year={2026},
      eprint={2602.03022},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.03022}, 
}
