Flow Poke Transformer Data Preprocessing

Training the Flow Poke Transformer relies on large-scale video datasets with pre-extracted tracks. Both the dataset and the tracking method are interchangeable: no aspect of the model depends on a specific choice of either.

We generally use sharded data: multiple samples are combined into "shards" using webdataset, which reduces the load on file servers in HPC systems.
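To make the idea of a shard concrete: a webdataset shard is a plain tar archive in which each sample is identified by the part of the member name before the first dot. The sketch below packs videos into such archives using only the standard library. It is an illustration, not the code in shard_videos.py; the function name write_shards, the shard naming scheme, and the samples_per_shard default are assumptions made for this example.

```python
import tarfile
from pathlib import Path


def write_shards(video_paths, out_dir, samples_per_shard=100):
    """Pack videos into numbered tar shards and return the shard paths.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Sort so that the shard contents are reproducible across runs.
    video_paths = sorted(Path(p) for p in video_paths)

    shard_paths = []
    for start in range(0, len(video_paths), samples_per_shard):
        shard_path = out_dir / f"shard-{len(shard_paths):06d}.tar"
        chunk = video_paths[start:start + samples_per_shard]
        with tarfile.open(shard_path, "w") as tar:
            for idx, video in enumerate(chunk, start=start):
                # webdataset groups members by the name before the first dot,
                # so every video gets a unique, dot-free key.
                tar.add(video, arcname=f"{idx:08d}{video.suffix}")
        shard_paths.append(shard_path)
    return shard_paths
```

Reading many small video files individually costs one file-server request per file, whereas a shard is fetched as a single sequential read, which is where the reduced load comes from.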

Preparing a Dataset Yourself

  1. Collect a set of videos for your target domain.
  2. Install the additional requirements for preprocessing, listed in requirements.txt.
  3. Create shards from the videos by running python shard_videos.py /path/to/videos /path/to/output/shards. The script assumes that videos are stored as mp4 files; adapt the glob pattern in the code to change this. Make sure that the tracking script in the next step can decode your videos.
  4. For each shard, perform tracking. We provide a reference script using CoTracker3 that can be run by invoking python track_shard.py /path/to/shard.tar /path/to/preprocessed/output/shard.tar. This will use the first visible CUDA GPU to perform the preprocessing by default.
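
Since tracking runs once per shard, a small driver can loop over all shards. The following is a minimal sketch, assuming the shards are the *.tar files in one directory and that each output keeps its input's file name. The helper names build_tracking_jobs and run_tracking_jobs are hypothetical and not part of the repository. Because the script uses the first visible CUDA GPU, setting the CUDA_VISIBLE_DEVICES environment variable selects which device that is.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_tracking_jobs(shard_dir, out_dir, script="track_shard.py"):
    """Return one (command, output_path) pair per shard without an output yet.

    Hypothetical helper; assumes outputs reuse the input shard's file name.
    """
    shard_dir, out_dir = Path(shard_dir), Path(out_dir)
    jobs = []
    for shard in sorted(shard_dir.glob("*.tar")):
        target = out_dir / shard.name
        if target.exists():
            # Allows resuming, but a partially written output from a crashed
            # run would also be skipped; delete such files before rerunning.
            continue
        jobs.append(([sys.executable, script, str(shard), str(target)], target))
    return jobs


def run_tracking_jobs(jobs, gpu_id=0):
    """Run the jobs one after another on the selected GPU."""
    # Only expose one device, so "first visible CUDA GPU" means gpu_id.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    for cmd, target in jobs:
        target.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, env=env, check=True)
```

A call such as run_tracking_jobs(build_tracking_jobs("/path/to/output/shards", "/path/to/preprocessed/output"), gpu_id=0) would then process every remaining shard sequentially. To use several GPUs, one option is to start one such process per device, each with a different gpu_id and a disjoint subset of the jobs.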