
UCF-CRCV/TF-CoVR


From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos


Animesh Gupta¹  |  Jay Parmar¹  |  Ishan Rajendrakumar Dave²  |  Mubarak Shah¹

¹University of Central Florida  ²Adobe



Accepted to the NeurIPS 2025 Datasets and Benchmarks Track.

If you like our project, please give us a star ⭐ on GitHub for the latest updates.


Composed Video Retrieval (CoVR) retrieves a target video given a query video and a modification text describing the intended change. Existing CoVR benchmarks emphasize appearance shifts or coarse event changes and therefore do not test the ability to capture subtle, fast-paced temporal differences. We introduce TF-CoVR, the first large-scale benchmark dedicated to temporally fine-grained CoVR. TF-CoVR focuses on gymnastics and diving and provides 180K triplets drawn from FineGym and FineDiving. Previous CoVR benchmarks that focus on the temporal aspect link each query to a single target segment taken from the same video, which limits their practical usefulness. In TF-CoVR, we instead construct each <query, modification> pair by prompting an LLM with the label differences between clips drawn from different videos; every pair is thus associated with multiple valid target videos (3.9 on average), reflecting real-world tasks such as sports-highlight generation. To model these temporal dynamics, we propose TF-CoVR-Base, a concise two-stage training framework: (i) pre-train a video encoder on fine-grained action classification to obtain temporally discriminative embeddings; (ii) align the composed query with candidate videos using contrastive learning. We conduct the first comprehensive study of image, video, and general multimodal embedding (GME) models on temporally fine-grained composed retrieval in both zero-shot and fine-tuning regimes. On TF-CoVR, TF-CoVR-Base improves zero-shot mAP@50 from 5.92 (LanguageBind) to 7.51, and after fine-tuning raises the state of the art from 19.83 to 27.22.
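Stage (ii) aligns composed queries with candidate videos through contrastive learning. As an illustration only, the sketch below implements a standard InfoNCE loss with in-batch negatives in plain Python. It assumes each composed query (query video plus modification text) has already been encoded into a single vector; the function names and the 0.07 temperature are hypothetical choices and are not taken from the TF-CoVR-Base code, whose exact loss may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(queries: List[Vector], targets: List[Vector], temperature: float = 0.07) -> float:
    """Mean query-to-target InfoNCE loss (hypothetical helper, generic formulation).

    queries[i] is the embedding of the i-th composed query and targets[i] is the
    embedding of its matching target video; every other target in the batch is
    treated as a negative.
    """
    if len(queries) != len(targets):
        raise ValueError("queries and targets must have the same length")
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, t) / temperature for t in targets]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching target (index i) as the positive class.
        total += log_denom - logits[i]
    return total / len(queries)
```

Because each TF-CoVR query can have several valid targets, a real implementation may need to avoid treating other valid targets that happen to share a batch as negatives; this sketch does not handle that case.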

Environment Setup

cd TF-CoVR/
conda create -n tfcovr python=3.10 -y
conda activate tfcovr
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
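After installation, a quick sanity check can confirm that the key packages import. The sketch below is a hypothetical helper, not part of the repository: "clip" is the import name of the openai/CLIP package installed above, while "torch" is assumed (not confirmed) to be among the dependencies in requirements.txt.

```python
import importlib.util
import sys
from typing import Iterable, List

# Assumed module list: "clip" comes from openai/CLIP; "torch" is an assumption
# about requirements.txt. Extend as needed.
REQUIRED_MODULES = ("torch", "clip")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The conda environment above is created with Python 3.10.
    if sys.version_info[:2] != (3, 10):
        print(f"Warning: expected Python 3.10, found {sys.version.split()[0]}")
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```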

Pretrained weights

  1. Download our stage 1 pretrained weights from Google Drive here.
  2. Download our stage 2 pretrained weights from Google Drive here.

Dataset

Please follow the instructions in DATASET.md to access the dataset.

AIM Embeddings

  1. Follow DATASET.md to obtain the original videos and convert them to mp4 format.
  2. In aim_emb.py, update the path to the videos and the path where the embeddings will be saved.
  3. Please run the following command to generate the embeddings:
    cd AIM_Embeddings
    python aim_emb.py model.ckpt.path="stage-1-checkpoint-path"
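Embedding generation over a large video collection may be interrupted partway. The sketch below lists the videos that still lack an embedding file. It assumes one embedding file per video, named after the video's file stem; the function name and the .pth suffix are hypothetical, so adjust them to match what aim_emb.py actually writes.

```python
from pathlib import Path
from typing import List, Tuple


def pending_videos(video_dir: str, emb_dir: str, emb_suffix: str = ".pth") -> List[Tuple[Path, Path]]:
    """Return (video, expected_embedding) pairs whose embedding file does not exist yet.

    Hypothetical helper: assumes embeddings are stored flat in emb_dir as
    <video_stem><emb_suffix>.
    """
    pairs = []
    for video in sorted(Path(video_dir).rglob("*.mp4")):
        out = Path(emb_dir) / (video.stem + emb_suffix)
        if not out.exists():
            pairs.append((video, out))
    return pairs
```

Because the output name uses only the file stem, videos with the same name in different subfolders would map to the same embedding path; use a layout that mirrors the folder structure if that can happen.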
    

Training

To reproduce the TF-CoVR-Base results on TF-CoVR, run the following command:
python train.py data=finegd-covr-aim trainer=gpu model=aim model/ckpt=aim test=finegd-test-aim

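Testing

To evaluate a checkpoint, run the following command, replacing /checkpoint/path/ with the path to your checkpoint: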

python test.py data=finegd-covr-aim trainer=gpu model=aim_clip model/ckpt=aim test=finegd-test-aim-clip machine.num_workers=8 trainer.max_epochs=100 model.ckpt.path=/checkpoint/path/
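Results on TF-CoVR are reported as mAP@50, and each query can have several valid target videos. The sketch below shows one common way to compute mAP@K in that setting, normalizing each query's average precision by min(K, number of valid targets). It is an illustration under that assumption, not the evaluation code in test.py, which may normalize differently; the function names are hypothetical.

```python
from typing import Iterable, Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 50) -> float:
    """AP@k for a single query with possibly several valid target videos.

    ranked: retrieved video IDs, best match first (assumed free of duplicates).
    relevant: IDs of all valid target videos for this query.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, video_id in enumerate(ranked[:k], start=1):
        if video_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / min(k, len(relevant))


def map_at_k(rankings: Iterable[Sequence[str]], targets: Iterable[Set[str]], k: int = 50) -> float:
    """Mean of AP@k over all queries."""
    scores = [average_precision_at_k(r, t, k) for r, t in zip(rankings, targets)]
    return sum(scores) / len(scores) if scores else 0.0


# Two toy queries: the first has two valid targets found at ranks 2 and 4.
rankings = [["v3", "v1", "v7", "v2"], ["v5", "v9"]]
targets = [{"v1", "v2"}, {"v5"}]
print(map_at_k(rankings, targets))  # → 0.75
```

The scores in this README appear to be on a 0 to 100 scale, so multiply the result by 100 to compare.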

Citation

If you use this dataset and/or this code in your work, please cite our paper:

@misc{gupta2025playreplaycomposedvideo,
      title={From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos}, 
      author={Animesh Gupta and Jay Parmar and Ishan Rajendrakumar Dave and Mubarak Shah},
      year={2025},
      eprint={2506.05274},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.05274}, 
}

🙏 Acknowledgements

This repository borrows code from CoVR. We thank the authors for releasing their code.

