This tutorial demonstrates how to perform live PyCuVSLAM tracking using stereo images and depth data from a ZED stereo camera.
Refer to the Installation Guide for instructions on installing and configuring all required dependencies.
Install the ZED SDK for Linux x86 or JetPack by following the official documentation.
Note: If you already have the ZED SDK installed, you can install the Python API separately using this script.
During installation, make sure to set up the ZED Python API:
- When prompted, press Y to install the Python API:
Do you want to install the Python API (recommended) [Y/n] ?
- If you installed PyCuVSLAM using Conda or venv, specify the path to your environment's Python executable when prompted:
Please specify your python executable: /path/to/env/python
USB Camera Troubleshooting: Occasionally, the USB camera may fail to start. If this occurs, briefly disconnect and reconnect it to the USB port.
The ZED camera can provide either rectified or distorted stereo images. By default, rectified images are used for visual odometry. If you need to use distorted images, update the following flag at the top of run_stereo.py:
RAW = True
Tip: Rectified images are generally preferred for most visual odometry applications, as they simplify downstream processing. Use distorted images only if you have a specific requirement.
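A minimal sketch of how the RAW flag might select which views to retrieve from the camera. The ZED SDK view names (sl.VIEW.LEFT, sl.VIEW.LEFT_UNRECTIFIED, and their right-side counterparts) are shown as strings here so the sketch runs without the SDK; the exact retrieval logic in run_stereo.py may differ:

```python
RAW = True  # flag at the top of run_stereo.py: True -> unrectified (distorted) images

# ZED SDK view enum names (sl.VIEW.*), represented as strings for illustration
left_view = "LEFT_UNRECTIFIED" if RAW else "LEFT"
right_view = "RIGHT_UNRECTIFIED" if RAW else "RIGHT"
```

With RAW = False the rectified LEFT/RIGHT views are used, which is the default behavior described above.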
This script has been developed and validated for the ZED2 camera using distortion coefficients in the format:
To start stereo visual odometry, run:
python3 run_stereo.py
After starting, you should see a visualization similar to the following:

Performance Note: The PyCuVSLAM stereo tracker requires reliably synchronized stereo pairs and a stable frame rate. If you see warnings about low FPS, try reducing the camera resolution or decreasing the frame rate in your camera settings.
Monocular-Depth Visual Odometry requires pixel-to-pixel alignment between the camera image and its corresponding depth image. For ZED cameras, the depth image is aligned with the left camera image by default.
PyCuVSLAM expects the depth image in uint16 format, while the camera images (either RGB or grayscale) should be in uint8 format. A scale factor is used to convert pixel values to meters. In the provided script, depth is output in millimeters (sl.UNIT.MILLIMETER) with a scale factor of 1000. If you need different depth units, modify these settings in run_rgbd.py.
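The relationship between the uint16 depth image and metric depth can be illustrated with a small NumPy sketch (the array values here are made up):

```python
import numpy as np

# Hypothetical 2x2 depth image in uint16 millimeters, as produced with
# sl.UNIT.MILLIMETER in the provided script
depth_mm = np.array([[500, 1000],
                     [2000, 4500]], dtype=np.uint16)

depth_scale_factor = 1000  # millimeters per meter, matching run_rgbd.py

# Convert pixel values to metric depth by dividing by the scale factor
depth_m = depth_mm.astype(np.float32) / depth_scale_factor
```

If you switch the ZED output to different units, the scale factor must change accordingly (e.g. sl.UNIT.CENTIMETER would pair with a factor of 100).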
By default, the script uses the sl.DEPTH_MODE.PERFORMANCE depth mode of ZED SDK. For details on other depth modes and GPU acceleration, refer to the official ZED documentation.
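A configuration sketch of how the depth mode and units are typically set through the ZED Python API before opening the camera (this is a simplified fragment, not the verbatim setup from run_rgbd.py):

```python
import pyzed.sl as sl  # requires the ZED SDK and its Python API

init_params = sl.InitParameters()
init_params.depth_mode = sl.DEPTH_MODE.PERFORMANCE   # default mode used by the script
init_params.coordinate_units = sl.UNIT.MILLIMETER    # matches depth_scale_factor = 1000

cam = sl.Camera()
status = cam.open(init_params)  # fails if no ZED camera is connected
```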
To run monocular-depth visual odometry, execute:
python3 run_rgbd.py
You should see a visualization showing the camera trajectory along with the input RGB and depth images:
For stereo depth cameras without an IR emitter (an active emitter's pattern can interfere with stereo image capture), you can add the stereo camera input to the RGBD tracker. In this mode, cuVSLAM still uses a single visual tracker (cuvslam.Tracker.OdometryMode.RGBD), predicting the camera pose from both image-depth and stereo-image input data. This approach combines the inputs at the solver level, rather than simply blending the results of two separate trackers at the post-processing stage.
To enable Mono-Depth + Stereo visual tracking, set the RUN_STEREO = True flag at the top of run_rgbd.py, then run:
python3 run_rgbd.py
You should see a visualization displaying the camera trajectory along with the input stereo RGB and depth images:
When using the Mono-Depth + Stereo tracker, ensure that depth_camera_id corresponds to the index of the camera aligned with the depth image:
rgbd_settings = vslam.Tracker.OdometryRGBDSettings()
rgbd_settings.depth_scale_factor = 1000
rgbd_settings.depth_camera_id = 0
rgbd_settings.enable_depth_stereo_tracking = RUN_STEREO
This index should match the position of the left camera in the list of cameras passed to cuvslam.Rig():
rig.cameras = [left_camera, right_camera]
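The correspondence between depth_camera_id and the rig's camera list can be illustrated with plain Python stand-ins for the camera objects (the string values are hypothetical placeholders):

```python
# Stand-ins for the cuvslam camera objects built in run_rgbd.py
left_camera, right_camera = "left_camera", "right_camera"

cameras = [left_camera, right_camera]  # same order as passed to cuvslam.Rig()
depth_camera_id = 0                    # ZED depth is aligned with the left camera

# depth_camera_id is simply an index into the rig's camera list
aligned_camera = cameras[depth_camera_id]
```

If you reorder the cameras in the rig, depth_camera_id must be updated to keep pointing at the camera the depth image is aligned with.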
