CoRL 2025
Zongzheng Zhang\*1 · Haobo Xu\*2 · Zhuo Yang\*1 ·
Chenghao Yue1 · Zehao Lin1 · Huan-ang Gao1 · Ziwei Wang3 · Hao Zhao1,2
1 Beijing Academy of Artificial Intelligence (BAAI),
2 Institute for AI Industry Research (AIR), Tsinghua University,
3 Nanyang Technological University
(* indicates equal contribution)
Project Page | arXiv | Code
This repository provides the implementation of TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models on openpi.
It is branched from the original repository at commit cd82848.
Please refer to the original repository for environment setup and training details.
The following focuses only on the differences from the upstream repository.
The file `src/openpi/shared/effort_type.py` defines the ways in which torque information is fed into openpi, covering all the experiments described in the paper.
The types are:
- `NO`
  No effort is used, but `TavlaInputs` still processes it for `norm_stats` computation.
  Used for the baseline model.
- `STATE`
  Inserts the current effort into the last `state[-14:]` so that it will be considered by the action expert.
  Corresponds to the DePre method in Section 4.1.
- `LLM`
  Projects the current effort into a token and passes it to the LLM along with image and language tokens.
  Projector MLP: `Linear(in, 2*w) -> swish -> Linear(2*w, w)`.
  Corresponds to the Enc method in Section 4.1.
- `LLM_HIS_C`
  Concatenates current and historical effort, projects it into a token, and passes it to the LLM.
  Corresponds to Enc-1 in Section 4.2.
- `LLM_HIS_T`
  Projects current and historical effort into tokens separately and passes them to the LLM.
  Corresponds to Enc-H in Section 4.2.
- `EXPERT`
  Projects effort into a token and passes it to the action expert (a component of the LLM) together with state and action tokens.
  Corresponds to DePost in Section 4.1.
- `EXPERT_HIS_C`
  Concatenates current and historical effort, projects it into a token, and passes it to the action expert.
  Corresponds to Dec-1 in Section 4.2.
- `EXPERT_HIS_T`
  Projects current and historical effort into tokens separately and passes them to the action expert.
  Corresponds to Dec-H in Section 4.2.
- `EXPERT_FUT`
  Not an input type per se; predicts future effort along with actions.
  Corresponds to Sections 5 and 6 ($π_0$ + obj).
- `EXPERT_HIS_C_FUT`
  Inputs concatenated historical effort to the action expert and outputs future effort.
  Corresponds to Sections 5 and 6 ($π_0$ + obs + obj).
- `EXPERT_HIS_C_L_FUT`
  Inputs concatenated historical effort as the last token and outputs future effort.
  A positional variant of the previous type, tested without performance improvement.
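The projector MLP used by the LLM and EXPERT variants can be sketched as below. This is a minimal NumPy illustration of the `Linear(in, 2*w) -> swish -> Linear(2*w, w)` shape only; the class name `EffortProjector`, the initialization scheme, and the 14-joint/1024-width dimensions are assumptions, not the repository's actual implementation.

```python
import numpy as np

def swish(x):
    # swish / SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

class EffortProjector:
    """Sketch of the Linear(in, 2*w) -> swish -> Linear(2*w, w) projector.

    Maps a flat effort vector (e.g. 14 joint torques) to a single
    token of width w. Names, shapes, and init are illustrative.
    """

    def __init__(self, in_dim, w, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, (in_dim, 2 * w))
        self.b1 = np.zeros(2 * w)
        self.w2 = rng.normal(0.0, 0.02, (2 * w, w))
        self.b2 = np.zeros(w)

    def __call__(self, effort):
        h = swish(effort @ self.w1 + self.b1)
        return h @ self.w2 + self.b2

# Example: project 14 joint torques into one token of width 1024.
proj = EffortProjector(in_dim=14, w=1024)
token = proj(np.zeros(14))
```

The history-based variants differ only in what is fed to the projector: the `*_HIS_C` types concatenate past and current effort into one longer input vector, while the `*_HIS_T` types apply the projector per time step to produce multiple tokens.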
Note: These torque-handling implementations have only been tested on
As in the original openpi implementation, we use datasets in the standard lerobot format.
The difference is that we expect an additional field `observation.effort` storing the per-frame joint torques, analogous to how `observation.state` stores the per-frame joint angles.
We provide a sample dataset for the button-pressing task here.
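A single frame in such a dataset might look like the following sketch. Field names other than `observation.effort` mirror the standard lerobot schema; the 14-dimensional shapes (two 7-DoF arms) are assumptions for illustration.

```python
import numpy as np

# Sketch of one frame in a lerobot-style dataset extended with
# per-frame joint torques. Shapes (14 joints) are assumptions.
frame = {
    "observation.state": np.zeros(14, dtype=np.float32),   # joint angles
    "observation.effort": np.zeros(14, dtype=np.float32),  # joint torques
    "action": np.zeros(14, dtype=np.float32),
    "timestamp": 0.0,
}

# The effort field is read per-frame exactly like the state field.
effort = frame["observation.effort"]
```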
Refer to the example configurations provided in `src/openpi/training/config.py`.
When using effort inputs, be sure to pass the corresponding `effort_history` parameter.
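As a sketch of how the effort type and history length fit together (the field names follow the text above, but this dataclass is illustrative and not the repository's actual config class):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EffortConfig:
    # Which variant from effort_type.py to use, e.g. "EXPERT_HIS_C".
    effort_type: str = "NO"
    # Number of past effort frames; the *_HIS_* variants need it > 0.
    effort_history: int = 0

    def __post_init__(self):
        if "HIS" in self.effort_type and self.effort_history <= 0:
            raise ValueError(
                "History-based effort types require effort_history > 0")

# A history-based decoder-side variant with a 5-frame torque window.
cfg = EffortConfig(effort_type="EXPERT_HIS_C", effort_history=5)
```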
For data collection and model deployment, we use a modified version of the AgileX official example code. In addition to reading torque values from the ROS topic, this version maintains a historical torque buffer for policies that require past torque information.
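The historical torque buffer described above can be sketched with a fixed-length deque. The class name, the idea of zero pre-filling, and the 14-joint shape are assumptions; only the buffer-of-recent-torques behavior comes from the text.

```python
from collections import deque
import numpy as np

class TorqueHistoryBuffer:
    """Fixed-length buffer of the most recent joint-torque readings.

    Pre-filled with zeros so that history-based policies receive a
    full window from the very first control step (an assumption of
    this sketch, not necessarily the deployed code's behavior).
    """

    def __init__(self, history_len=5, num_joints=14):
        self.buf = deque(
            [np.zeros(num_joints, dtype=np.float32)] * history_len,
            maxlen=history_len,
        )

    def push(self, effort):
        # Called whenever a new torque reading arrives (e.g. from a
        # ROS topic callback); the oldest entry is dropped automatically.
        self.buf.append(np.asarray(effort, dtype=np.float32))

    def history(self):
        # Oldest-to-newest stack, shape (history_len, num_joints).
        return np.stack(self.buf)

buf = TorqueHistoryBuffer(history_len=3, num_joints=14)
buf.push(np.ones(14))
window = buf.history()
```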
If you find this project useful, feel free to cite our work!
```bibtex
@article{zhang2025ta,
  title={TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models},
  author={Zhang, Zongzheng and Xu, Haobo and Yang, Zhuo and Yue, Chenghao and Lin, Zehao and Gao, Huan-ang and Wang, Ziwei and Zhao, Hao},
  journal={arXiv preprint arXiv:2509.07962},
  year={2025}
}
```