HenryHe0123/UI-RFT

UI-RFT: Reinforcement Fine-Tuning for GUI Grounding

🔥 Overview

We introduce UI-RFT, the first framework to use rule-based reinforcement learning (RL) to improve the GUI grounding capabilities of vision-language models (VLMs).

This work is a course project for CS3316 Reinforcement Learning.
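
To make "rule-based RL" concrete: a rule-based reward scores a model response with a deterministic check instead of a learned reward model. The sketch below is an illustration, not the reward used in this repository. It assumes the model answers with a click point written as `(x, y)` in pixel coordinates and that each sample carries a target bounding box. The names `parse_point` and `grounding_reward` and the output format are hypothetical.

```python
import re

# Hypothetical output format: the model is assumed to answer with a click
# point written as "(x, y)" in pixel coordinates.
POINT_PATTERN = re.compile(
    r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)"
)


def parse_point(response):
    """Return the first (x, y) pair found in the response, or None."""
    match = POINT_PATTERN.search(response)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def grounding_reward(response, bbox):
    """Return 1.0 if the predicted point lies inside bbox, else 0.0.

    bbox is (x_min, y_min, x_max, y_max) for the target UI element.
    """
    point = parse_point(response)
    if point is None:
        return 0.0  # unparseable output earns no reward
    x, y = point
    x_min, y_min, x_max, y_max = bbox
    inside = x_min <= x <= x_max and y_min <= y <= y_max
    return 1.0 if inside else 0.0
```

A binary reward of this kind needs no reasoning trace to be scored, only a parseable final answer.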

📌 Takeaway

  • Reinforcement fine-tuning with only 128 high-quality samples significantly enhances GUI grounding.
  • GUI grounding is a fundamental visual ability of VLMs, and it can be improved without long reasoning chains.

⚙️ Usage

Training

To train the VLM with verl:

./train.sh

Evaluation

To test the VLM on ScreenSpot:

python ./screenspot/test.py

To test the VLM on ScreenSpot-Pro:

python ./screenspot/test-pro.py
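
GUI grounding benchmarks of this kind usually count a prediction as correct when the predicted click point falls inside the target element's bounding box, and report accuracy per split. The sketch below shows that computation under assumed inputs. It is not the logic of the test scripts above. The record keys `group`, `pred`, and `bbox` and the function names are hypothetical.

```python
from collections import defaultdict


def point_in_bbox(point, bbox):
    """Check whether point (x, y) lies inside bbox (x_min, y_min, x_max, y_max)."""
    x, y = point
    x_min, y_min, x_max, y_max = bbox
    return x_min <= x <= x_max and y_min <= y <= y_max


def accuracy_by_group(records):
    """Compute click accuracy for each group.

    records: iterable of dicts with hypothetical keys
      'group' (split name), 'pred' ((x, y) or None), 'bbox' (target box).
    A missing prediction (None) counts as incorrect.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        totals[record["group"]] += 1
        pred = record["pred"]
        if pred is not None and point_in_bbox(pred, record["bbox"]):
            hits[record["group"]] += 1
    return {group: hits[group] / totals[group] for group in totals}
```

Counting a missing prediction as incorrect keeps the denominator equal to the number of samples, so unparseable outputs cannot raise the score.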

🙏 Acknowledgement

We sincerely thank Yan Ma for his invaluable and insightful discussions.
