
TA-VLA: Elucidating the Design Space of
Torque-aware Vision-Language-Action Models

CoRL 2025

Zongzheng Zhang*1 · Haobo Xu*2 · Zhuo Yang*1 ·
Chenghao Yue1 · Zehao Lin1 · Huan-ang Gao1 · Ziwei Wang3 · Hao Zhao1,2

1 Beijing Academy of Artificial Intelligence (BAAI),
2 Institute for AI Industry Research (AIR), Tsinghua University,
3 Nanyang Technological University
(* indicates equal contribution)

Project Page | arXiv | Code


This repository provides the implementation of TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models on openpi. It is branched from the original repository at commit cd82848. Please refer to the original repository for environment setup and training details. The following focuses only on the differences from the upstream repository.

Torque Input Types

The file src/openpi/shared/effort_type.py defines the ways in which torque information is fed into openpi, covering all the experiments described in the paper.
The types are:

  • NO
    No effort is fed to the model, but TavlaInputs still processes the effort field so that norm_stats can be computed.
    Used for the baseline model.

  • STATE
    Inserts the current effort into the last 14 state dimensions (state[-14:]) so that the action expert takes it into account.
    Corresponds to the DePre method in Section 4.1.

  • LLM
    Projects the current effort into a token and passes it to the LLM along with the image and language tokens.
    Projector MLP: Linear(in, 2*w) -> swish -> Linear(2*w, w); see the sketch after this list.
    Corresponds to the Enc method in Section 4.1.

  • LLM_HIS_C
    Concatenates current and historical effort, projects it into a token, and passes it to the LLM.
    Corresponds to Enc-1 in Section 4.2.

  • LLM_HIS_T
    Projects current and historical effort into tokens separately and passes them to the LLM.
    Corresponds to Enc-H in Section 4.2.

  • EXPERT
    Projects effort into a token and passes it to the action expert (a component of the LLM) together with state and action tokens.
    Corresponds to DePost in Section 4.1.

  • EXPERT_HIS_C
    Concatenates current and historical effort, projects it into a token, and passes it to the action expert.
    Corresponds to Dec-1 in Section 4.2.

  • EXPERT_HIS_T
    Projects current and historical effort into tokens separately and passes them to the action expert.
    Corresponds to Dec-H in Section 4.2.

  • EXPERT_FUT
    Not an input type per se, but predicts future effort along with actions.
    Corresponds to Sections 5 and 6 ($π_0$ + obj).

  • EXPERT_HIS_C_FUT
    Inputs concatenated historical effort to the action expert and outputs future effort.
    Corresponds to Sections 5 and 6 ($π_0$ + obs + obj).

  • EXPERT_HIS_C_L_FUT
    Inputs the concatenated historical effort as the last token and outputs future effort.
    A positional variant of the previous type; in our tests it did not improve performance.

Note: These torque-handling implementations have only been tested on $π_0$ and may not be compatible with $π_0$-FAST.
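For reference, the projector MLP described under the LLM type can be sketched in Flax roughly as follows. The class name EffortProjector and the width field are illustrative and not the exact identifiers used in src/openpi/shared/effort_type.py.

```python
import flax.linen as nn
import jax.numpy as jnp


class EffortProjector(nn.Module):
    """Illustrative sketch of the effort projector: Linear(in, 2*w) -> swish -> Linear(2*w, w)."""

    width: int  # embedding width of the target token stream (LLM or action expert)

    @nn.compact
    def __call__(self, effort: jnp.ndarray) -> jnp.ndarray:
        # effort: [batch, effort_dim], e.g. 14 joint torques per frame
        x = nn.Dense(2 * self.width)(effort)
        x = nn.swish(x)
        x = nn.Dense(self.width)(x)
        return x  # [batch, width], consumed as a single extra token
```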

Dataset

As in the original openpi implementation, we use datasets in the standard lerobot format. The difference is that we expect an additional field observation.effort storing the per-frame joint torque, analogous to how observation.state stores per-frame joint angles. We provide a sample dataset for the button-pressing task here.
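As a rough illustration (field names follow the convention above; the 14-dimensional shapes assume a dual-arm setup and are examples only), a single frame is expected to carry:

```python
import numpy as np

# Illustrative per-frame layout; image keys, timestamps, etc. follow the
# standard lerobot format and are omitted here.
frame = {
    "observation.state":  np.zeros(14, dtype=np.float32),  # per-frame joint angles
    "observation.effort": np.zeros(14, dtype=np.float32),  # per-frame joint torques (additional field)
    "action":             np.zeros(14, dtype=np.float32),
}
```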

Training

Refer to the example configurations provided in src/openpi/training/config.py. When using effort inputs, be sure to pass the corresponding effort_history parameter.
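As a schematic only (the real configuration classes live in src/openpi/training/config.py and their field names may differ), the pairing of an effort type with a history length looks like this:

```python
from dataclasses import dataclass


@dataclass
class EffortSettings:
    # One of the torque input types listed above, e.g. "LLM_HIS_T" or "EXPERT_HIS_C".
    effort_type: str = "EXPERT_HIS_C"
    # Number of past effort frames fed to the model; only meaningful for the
    # history-based (*_HIS_*) types. The value 10 is purely illustrative.
    effort_history: int = 10
```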

Deployment

For data collection and model deployment, we use a modified version of the AgileX official example code. In addition to reading torque values from the ROS topic, this version maintains a historical torque buffer for policies that require past torque information.
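A minimal sketch of such a buffer, assuming a ROS 1 driver that publishes sensor_msgs/JointState (the topic name and history length below are assumptions; substitute those of your setup):

```python
from collections import deque

import numpy as np
import rospy
from sensor_msgs.msg import JointState

HISTORY_LEN = 10  # should match the policy's effort_history

# Rolling buffer holding the most recent joint-torque readings.
effort_buffer = deque(maxlen=HISTORY_LEN)


def joint_state_callback(msg: JointState) -> None:
    # JointState.effort carries the per-joint torque reported by the arm driver.
    effort_buffer.append(np.asarray(msg.effort, dtype=np.float32))


rospy.init_node("effort_buffer_node")
rospy.Subscriber("/joint_states", JointState, joint_state_callback, queue_size=1)
```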

Citation

If you find this project useful, feel free to cite our work!

@article{zhang2025ta,
  title={TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models},
  author={Zhang, Zongzheng and Xu, Haobo and Yang, Zhuo and Yue, Chenghao and Lin, Zehao and Gao, Huan-ang and Wang, Ziwei and Zhao, Hao},
  journal={arXiv preprint arXiv:2509.07962},
  year={2025}
}
