FALCON | From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors (ICLR 2026)
Chenchen Liu, Dong Wang, Francis E. H. Tay, Sijin Chen, Ziwei Liu, Yuxiao Liu*†, Xinghang Li*, Pan Zhou*
*Corresponding Author  †Project Lead
ByteDance Seed
National University of Singapore · Nanyang Technological University · Tsinghua University · Singapore Management University
---
[26/01/2026] Thrilled to share that our paper has been accepted to ICLR 2026! Code will be open-sourced soon. Stay tuned!
---
[20/10/2025] Existing vision-language-action (VLA) models act in the 3D real world but are typically built on 2D encoders, leaving a spatial reasoning gap that limits generalization and adaptability. In this work, we introduce FALCON (From Spatial to Action), a novel paradigm that injects rich 3D spatial tokens into the action head of a VLA model, enabling robust spatial understanding and SOTA performance across diverse manipulation tasks without disrupting vision-language alignment. See our paper here.
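To make the paradigm above concrete, here is a minimal, hypothetical sketch of the idea: tokens from a 3D spatial encoder are concatenated with the vision-language tokens only at the action head, so the VL backbone's alignment is untouched. All function names, token shapes, and values below are illustrative placeholders, not the actual FALCON implementation.

```python
# Toy sketch of injecting 3D spatial tokens into a VLA action head.
# Every component here is a stand-in; shapes and values are illustrative.

def spatial_encoder(observation):
    # Stand-in for a 3D spatial foundation model producing spatial tokens.
    return [[0.1, 0.2], [0.3, 0.4]]  # two spatial tokens of dim 2

def vl_backbone(image, instruction):
    # Stand-in for the (unchanged) 2D vision-language backbone.
    return [[1.0, 0.0], [0.0, 1.0]]  # two VL tokens of dim 2

def action_head(tokens):
    # Toy action head: mean-pool all incoming tokens into one action vector.
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]

def falcon_style_policy(image, instruction, observation):
    vl_tokens = vl_backbone(image, instruction)    # VL path is unmodified
    spatial_tokens = spatial_encoder(observation)  # 3D spatial priors
    # Injection happens only at the action head, preserving VL alignment.
    return action_head(vl_tokens + spatial_tokens)

action = falcon_style_policy(None, "pick up the cube", None)
print(action)
```

The key design point the sketch mirrors: the spatial tokens enter as extra inputs to the action head rather than being fused inside the VL backbone.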
For more sim/real-world benchmark results, please refer to our paper.
- Release the code and models of FALCON.
- Release the CALVIN & SimplerEnv evaluation code and model weights for FALCON series.
- Release pre-training / fine-tuning code for FALCON series.
- Release the code for real-world deployment of FALCON via ManiUniCon.
If you find this project useful in your research, please consider citing:
@article{zhang2025spatial,
  title={From spatial to actions: Grounding vision-language-action model in spatial foundation priors},
  author={Zhang, Zhengshen and Li, Hao and Dai, Yalun and Zhu, Zhengbang and Zhou, Lei and Liu, Chenchen and Wang, Dong and Tay, Francis EH and Chen, Sijin and Liu, Ziwei and others},
  journal={arXiv preprint arXiv:2510.17439},
  year={2025}
}



