
Do You Know Where Your Camera Is? View-Invariant Policy Learning with Camera Conditioning

Tianchong Jiang1, Jingtian Ji1, Xiangshan Tan1, Jiading Fang2,*, Anand Bhattad3, Vitor Guizilini4,†, Matthew R. Walter1,†

1TTIC   2Waymo   3Johns Hopkins University   4Toyota Research Institute

Paper PDF Project Page

Accepted to ICRA 2026 (Vienna, June 2026)

Installation

First, clone the repo and cd into it.

git clone https://github.com/ripl/CamPoseOpensource
cd CamPoseOpensource

Then, run the setup script. It will set up the conda environment and download the data.

bash setup.sh

Activate the conda environment with

conda activate know_your_camera

If you only need one of ManiSkill or robosuite, comment out the lines that install the other.

How to run

You can run training in robosuite with

python policy_robosuite/train.py

or in ManiSkill with

python policy_maniskill/train.py

Reproducing the paper

Every experiment in the paper is specified in reproduce/paper_runs.yaml, keyed by figure (e.g. fig6) and entry. To launch one run, pass the figure, entry, and seed:

python reproduce/reproduce.py --paper_item fig6 --exp lift_randomized_with_conditioning --seed 0

This invokes the matching train.py with the exact overrides and seed used for the paper.

If you use a coding agent (Cursor, Claude Code, Codex, etc.), you can point it at reproduce/SKILL.md and just say e.g. "reproduce fig 6 lift randomized with conditioning" — it will ask about your scheduler and draft a job script. I honestly don't know how well this works in practice yet.

Results will not be bitwise identical across machines — this is not guaranteed on modern GPUs (see this blog for background) — but numbers should match the paper in expectation. If something looks off, or you hit any other issue, I'd really appreciate hearing about it — please open an issue or email tianchongj [at] ttic [dot] edu.

Training runs are long (typically hours to a day per seed on one GPU), so in practice you'll want a cluster (SLURM or similar).
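If your cluster runs SLURM, a job script along these lines can wrap the reproduce command. This is a hypothetical sketch, not a script shipped with the repo: the job name, GPU/time limits, and log path are placeholders you should adapt to your scheduler, and `SEED` is an assumed environment variable you can set via `sbatch --export=SEED=1`.

```shell
#!/bin/bash
#SBATCH --job-name=camcond-fig6
#SBATCH --gres=gpu:1
#SBATCH --time=24:00:00
#SBATCH --output=logs/%x-%j.out

# Activate the environment created by setup.sh.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate know_your_camera

# One seed per job; figure/entry/seed as in the example above.
python reproduce/reproduce.py \
    --paper_item fig6 \
    --exp lift_randomized_with_conditioning \
    --seed "${SEED:-0}"
```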

Plücker Snippet

To add camera conditioning to your own policy, you can use the following minimal snippet to compute a Plücker raymap from camera intrinsics and extrinsics. (It assumes the OpenCV convention: image origin at the top-left, +z pointing forward.)

import torch

def get_plucker_raymap(K, c2w, height, width):
    """Plücker raymap of shape (height, width, 6) from intrinsics
    K (3, 3) and camera-to-world extrinsics c2w (4, 4)."""
    # Pixel-center coordinates (OpenCV convention: origin at top-left).
    vv, uu = torch.meshgrid(
        torch.arange(height, device=K.device, dtype=K.dtype) + 0.5,
        torch.arange(width, device=K.device, dtype=K.dtype) + 0.5,
        indexing="ij",
    )
    # Homogeneous pixel coordinates (u, v, 1).
    rays = torch.stack([uu, vv, torch.ones_like(uu)], dim=-1)
    # Unproject to camera space, rotate into world space, normalize.
    d_world = torch.nn.functional.normalize(
        (rays @ torch.linalg.inv(K).T) @ c2w[:3, :3].T,
        dim=-1,
        eps=1e-9,
    )
    # Camera center in world coordinates; Plücker moment m = o × d.
    o = c2w[:3, 3].view(1, 1, 3)
    m = torch.linalg.cross(o, d_world, dim=-1)
    return torch.cat([d_world, m], dim=-1)
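As a quick sanity check on the Plücker parameterization (a small standalone sketch with made-up values, not part of the repo): for a ray with origin o and unit direction d, the moment m = o × d is orthogonal to d, and m is unchanged if o is replaced by any other point o + t·d on the same line. This invariance is what makes the raymap a representation of lines rather than of a particular camera-center point.

```python
import torch

# One ray: an origin o and a unit viewing direction d (arbitrary values).
o = torch.tensor([0.5, -1.0, 2.0])
d = torch.nn.functional.normalize(torch.tensor([0.2, 0.3, 1.0]), dim=-1)

m = torch.linalg.cross(o, d)  # Plücker moment m = o × d

# The moment is orthogonal to the direction...
assert torch.allclose(torch.dot(d, m), torch.tensor(0.0), atol=1e-6)

# ...and invariant to sliding the origin along the ray.
for t in (0.5, -3.0):
    m_shifted = torch.linalg.cross(o + t * d, d)
    assert torch.allclose(m, m_shifted, atol=1e-6)

print("Plücker checks passed")
```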

BibTeX

If you find this work useful, please cite:

@article{jiang2025knowyourcamera,
  title     = {Do You Know Where Your Camera Is? {V}iew-Invariant Policy Learning with Camera Conditioning},
  author    = {Tianchong Jiang and Jingtian Ji and Xiangshan Tan and Jiading Fang and Anand Bhattad and Vitor Guizilini and Matthew R. Walter},
  journal   = {arXiv preprint arXiv:2510.02268},
  year      = {2025},
}
