Zixin Zhang1*, Chenfei Liao1*, Hongfei Zhang1, Harold H. Chen1, Kanghao Chen1, Zichen Wen3, Litao Guo1, Bin Ren4, Xu Zheng1, Yinchuan Li6, Xuming Hu1, Nicu Sebe5, Ying-Cong Chen1,2†
1HKUST(GZ), 2HKUST, 3SJTU, 4MBZUAI, 5UniTrento, 6Knowin
*Equal contribution    †Corresponding author
Official repository for the paper: Panoramic Affordance Prediction.
Affordance prediction serves as a critical bridge between perception and action in embodied AI. However, existing research is confined to pinhole camera models, which suffer from narrow Fields of View (FoV) and fragmented observations. In this paper, we present the first exploration into Panoramic Affordance Prediction, utilizing 360-degree imagery to capture global spatial relationships and holistic scene understanding.
The codebase is currently undergoing internal review and clean-up.
We plan to release the following components soon (within two weeks):
- PAP-12K Dataset (all full-resolution images, QA annotations, and segmentation masks)
- Evaluation Scripts for the benchmark
- Source Code for the PAP inference pipeline
Please stay tuned for updates!
- New Task: We introduce the First Exploration into Panoramic Affordance Prediction, overcoming the "tunnel vision" of traditional pinhole-camera-based affordance methods.
- PAP-12K Dataset (100% Real-World): A large-scale benchmark featuring 1,003 natively captured ultra-high-resolution (12K) panoramic images from diverse indoor environments, coupled with over 13,000 carefully annotated reasoning-based QA pairs with pixel-level affordance masks.
- PAP Framework: A training-free, coarse-to-fine pipeline mimicking human foveal vision to handle panoramic challenges like geometric distortion, scale variations, and boundary discontinuity.
PAP-12K is explicitly designed to encapsulate the unique challenges of 360° Equirectangular Projection (ERP) imagery. Unlike synthetic or web-crawled datasets, all 1,003 ultra-high-resolution (11904×5952) panoramic images in PAP-12K were natively captured in real-world environments using professional 360° cameras. This ensures authentic geometric distortions, lighting conditions, and natural object scales, bridging the gap between static dataset evaluation and practical robotic applications.
Key challenges captured include:
- Geometric Distortion: Objects suffer from severe stretching near the poles.
- Extreme Scale Variations: Unconstrained environments yield interactive targets that occupy only a minute fraction of the full panorama.
- Boundary Discontinuity: Continuous objects are split at image edges.
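The pole-ward stretching above follows directly from the ERP mapping: every image row spans the full 360° of longitude, so the effective horizontal magnification grows as 1/cos(latitude) and diverges at the poles. A quick NumPy illustration (the function name is ours, not part of the released code):

```python
import numpy as np

def erp_horizontal_stretch(lat_deg):
    """Horizontal stretch factor of an equirectangular image at a latitude.

    In ERP, each pixel row covers the full 360 degrees of longitude, but the
    corresponding circle on the sphere shrinks by cos(latitude), so objects
    are stretched horizontally by 1 / cos(latitude).
    """
    return 1.0 / np.cos(np.radians(lat_deg))

# Near the equator objects keep roughly true proportions (factor ~1);
# at 60 degrees latitude they appear about twice as wide (factor ~2);
# at 85 degrees the stretch already exceeds a factor of 11.
for lat in (0.0, 60.0, 85.0):
    print(lat, erp_horizontal_stretch(lat))
```

This is why the dataset's near-pole objects are so heavily deformed, and why naive 2D detectors trained on pinhole imagery struggle there.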
(Dataset download links and formatting instructions will be provided here soon.)
Our proposed PAP framework operates in three primary stages to tackle 360-degree scenes:
- Recursive Visual Routing: Uses numerical grid prompting to guide Vision-Language Models (VLMs) to dynamically "zoom in" and coarsely locate target tools.
- Adaptive Gaze: Projects the spherical region onto a tailored perspective plane to act as a domain adapter, eliminating geometric distortions and boundary discontinuities.
- Cascaded Affordance Grounding: Deploys robust 2D vision models (Open-Vocabulary Detector + SAM) within the rectified patch to extract precise, instance-level masks.
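The "Adaptive Gaze" stage is, at its core, a tangent-plane (gnomonic-style) reprojection: cast rays through a virtual pinhole camera aimed at the region of interest and sample the ERP image along them, which removes both the pole stretching and the seam discontinuity inside the patch. A minimal NumPy sketch of this idea (a simplified illustration under our own conventions, with nearest-neighbour sampling; not the paper's actual implementation):

```python
import numpy as np

def erp_to_perspective(erp, center_lon, center_lat, fov_deg, out_hw):
    """Sample a perspective patch from an equirectangular (ERP) image.

    center_lon / center_lat: viewing direction in degrees.
    fov_deg: horizontal field of view of the virtual pinhole camera.
    out_hw: (height, width) of the output patch.
    """
    H, W = erp.shape[:2]
    h, w = out_hw
    # Focal length of the virtual camera, derived from the requested FoV.
    f = (w / 2) / np.tan(np.radians(fov_deg) / 2)
    uu, vv = np.meshgrid(np.arange(w) - (w - 1) / 2,
                         np.arange(h) - (h - 1) / 2)
    # Unit rays in the camera frame (x right, y up, z forward).
    d = np.stack([uu, -vv, np.full_like(uu, f, dtype=float)], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    lat0, lon0 = np.radians(center_lat), np.radians(center_lon)
    # Pitch about x so the optical axis reaches latitude lat0 ...
    ca, sa = np.cos(lat0), np.sin(lat0)
    X = d[..., 0]
    Y = d[..., 1] * ca + d[..., 2] * sa
    Z = -d[..., 1] * sa + d[..., 2] * ca
    # ... then yaw about y to longitude lon0.
    cb, sb = np.cos(lon0), np.sin(lon0)
    Xw, Zw = X * cb + Z * sb, -X * sb + Z * cb
    lon = np.arctan2(Xw, Zw)                 # in [-pi, pi)
    lat = np.arcsin(np.clip(Y, -1.0, 1.0))   # in [-pi/2, pi/2]
    # Spherical coordinates -> ERP pixel grid; longitude wraps around,
    # which is what heals boundary discontinuities inside the patch.
    px = np.rint((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    py = np.clip(np.rint((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
    return erp[py, px]
```

The rectified patch can then be handed to standard 2D models (e.g., an open-vocabulary detector followed by SAM, as in the cascaded grounding stage), whose predicted masks are mapped back to ERP coordinates by inverting the same ray correspondence.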
If you have any questions or suggestions, please feel free to contact us at zzhang300@connect.hkust-gz.edu.cn, cliao127@connect.hkust-gz.edu.cn.
