Tianchong Jiang1, Jingtian Ji1, Xiangshan Tan1, Jiading Fang2,*, Anand Bhattad3, Vitor Guizilini4,†, Matthew R. Walter1,†
1TTIC 2Waymo 3Johns Hopkins University 4Toyota Research Institute
Accepted to ICRA 2026 (Vienna, June 2026)
First, clone the repo and cd into it.
git clone https://github.com/ripl/CamPoseOpensource
cd CamPoseOpensource
Then, run the setup script. It will set up the conda environment and download the data.
bash setup.sh
Activate the conda environment with
conda activate know_your_camera
If you only need ManiSkill or robosuite, comment out the lines that install the other one.
You can run training in robosuite with
python policy_robosuite/train.py
or in ManiSkill with
python policy_maniskill/train.py
Every experiment in the paper is specified in reproduce/paper_runs.yaml, keyed by figure (e.g. fig6) and entry. To launch one run, pass the figure, entry, and seed:
python reproduce/reproduce.py --paper_item fig6 --exp lift_randomized_with_conditioning --seed 0
This invokes the matching train.py with the exact overrides and seed used for the paper.
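If you want to script a sweep yourself (e.g., all seeds of one entry), you can loop over the same CLI. A minimal sketch, with an illustrative seed count of three:

import subprocess

# Launch each seed of one paper entry sequentially; on a cluster you would
# typically submit these as separate jobs instead.
for seed in range(3):
    subprocess.run(
        [
            "python", "reproduce/reproduce.py",
            "--paper_item", "fig6",
            "--exp", "lift_randomized_with_conditioning",
            "--seed", str(seed),
        ],
        check=True,
    )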
If you use a coding agent (Cursor, Claude Code, Codex, etc.), you can point it at reproduce/SKILL.md and just say, e.g., "reproduce fig 6 lift randomized with conditioning"; it will ask about your scheduler and draft a job script. I honestly don't know how well this works in practice yet.
Results will not be bitwise identical across machines — this is not guaranteed on modern GPUs (see this blog for background) — but numbers should match the paper in expectation. If something looks off, or you hit any other issue, I'd really appreciate hearing about it — please open an issue or email tianchongj [at] ttic [dot] edu.
Training runs are long (typically hours to a day per seed on one GPU), so in practice you'll want a cluster (SLURM or similar).
To add camera conditioning to your policy, you can use the following minimal snippet to compute a Plücker raymap from camera intrinsics and extrinsics. (It assumes the OpenCV convention, i.e., image origin at the top-left and +z pointing forward.)
import torch

def get_plucker_raymap(K, c2w, height, width):
    """intrinsics K (3,3), cam2world c2w (4,4), height int, width int.
    Returns an (H, W, 6) map of Plücker coordinates (direction, moment)."""
    # Pixel-center coordinates on the image grid.
    vv, uu = torch.meshgrid(
        torch.arange(height, device=K.device, dtype=K.dtype) + 0.5,
        torch.arange(width, device=K.device, dtype=K.dtype) + 0.5,
        indexing="ij",
    )
    # Homogeneous pixel coordinates [u, v, 1].
    rays = torch.stack([uu, vv, torch.ones_like(uu)], dim=-1)
    # Unproject with K^-1, rotate into world space, and normalize.
    d_world = torch.nn.functional.normalize(
        (rays @ torch.linalg.inv(K).T) @ c2w[:3, :3].T,
        dim=-1,
        eps=1e-9,
    )
    # Plücker moment m = o × d, with o the camera center in world coordinates.
    # Note: torch.linalg.cross broadcasts over (1, 1, 3); torch.cross does not.
    o = c2w[:3, 3].view(1, 1, 3)
    m = torch.linalg.cross(o, d_world, dim=-1)
    return torch.cat([d_world, m], dim=-1)
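For example, with made-up intrinsics for a 640×480 camera and a hypothetical camera translation, plus a sanity check of the Plücker identity (any point p on a ray satisfies p × d = m):

K = torch.tensor([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
c2w = torch.eye(4)
c2w[:3, 3] = torch.tensor([0.1, -0.2, 0.3])  # hypothetical camera center

raymap = get_plucker_raymap(K, c2w, height=480, width=640)
print(raymap.shape)  # torch.Size([480, 640, 6])

# Plücker sanity check: for any point p = o + t*d on a ray, p × d == m.
d, m = raymap[..., :3], raymap[..., 3:]
p = c2w[:3, 3] + 2.5 * d  # any t works
assert torch.allclose(torch.linalg.cross(p, d), m, atol=1e-5)

One common choice is to concatenate the 6-channel raymap with the RGB observation along the channel dimension before the vision encoder, but the right integration point depends on your architecture.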
If you find this work useful, please cite:
@article{jiang2025knowyourcamera,
  title   = {Do You Know Where Your Camera Is? {V}iew-Invariant Policy Learning with Camera Conditioning},
  author  = {Tianchong Jiang and Jingtian Ji and Xiangshan Tan and Jiading Fang and Anand Bhattad and Vitor Guizilini and Matthew R. Walter},
  journal = {arXiv preprint arXiv:2510.02268},
  year    = {2025},
}