Referring Expression Comprehension: Grounding natural language in visual scenes using Words as Classifiers approach
-
Updated
Feb 7, 2026 - Python
Referring Expression Comprehension: Grounding natural language in visual scenes using Words as Classifiers approach
A modular multimodal framework that generates object masks from text prompts. It uses a lightweight cross-modal decoder to fuse features from interchangeable vision and language backbones.
Add a description, image, and links to the refcoco topic page so that developers can more easily learn about it.
To associate your repository with the refcoco topic, visit your repo's landing page and select "manage topics."