Hands-on tutorials for curating data across all modalities with NeMo Curator. Each tutorial is a complete working example with detailed explanations.
New to NeMo Curator? Start with the Getting Started Guide or try the `quickstart.py` example to understand core concepts.
| Modality | Description | Key Tutorials |
|---|---|---|
| Text | Natural language processing and curation | Deduplication, Classification, Quality Assessment, Tokenization |
| Image | Computer vision and image processing | Aesthetic Classification, NSFW Detection, Deduplication |
| Video | Video processing and analysis | Clipping, Frame Extraction, Filtering |
| Audio | Speech and audio data curation | FLEURS Dataset Processing |
The `quickstart.py` example demonstrates NeMo Curator's foundational architecture:
- Task: Defines a data processing objective
- ProcessingStage: Performs an individual processing step
- Pipeline: Orchestrates multiple stages
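To make the relationship between these three concepts concrete, here is a minimal, library-free sketch in plain Python. It does not use NeMo Curator's actual classes or signatures: the `Task`, `ProcessingStage`, and `Pipeline` classes below are hypothetical stand-ins that only borrow the names, and the two example stages (`NormalizeStage`, `MinLengthStage`) are invented for illustration. Refer to `quickstart.py` for the real API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical stand-in: a named batch of text records to process."""
    task_id: str
    records: List[str]


class ProcessingStage:
    """Hypothetical stand-in: one step that maps a Task to a new Task."""

    def process(self, task: Task) -> Task:
        raise NotImplementedError


class NormalizeStage(ProcessingStage):
    """Illustrative stage: trim whitespace and lowercase every record."""

    def process(self, task: Task) -> Task:
        cleaned = [record.strip().lower() for record in task.records]
        return Task(task.task_id, cleaned)


class MinLengthStage(ProcessingStage):
    """Illustrative stage: drop records shorter than min_chars."""

    def __init__(self, min_chars: int) -> None:
        self.min_chars = min_chars

    def process(self, task: Task) -> Task:
        kept = [r for r in task.records if len(r) >= self.min_chars]
        return Task(task.task_id, kept)


class Pipeline:
    """Hypothetical stand-in: runs stages in the order they were added."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stages: List[ProcessingStage] = []

    def add_stage(self, stage: ProcessingStage) -> "Pipeline":
        self.stages.append(stage)
        return self  # returning self allows chained add_stage calls

    def run(self, tasks: List[Task]) -> List[Task]:
        results = []
        for task in tasks:
            for stage in self.stages:
                task = stage.process(task)
            results.append(task)
        return results


if __name__ == "__main__":
    pipeline = (
        Pipeline("demo")
        .add_stage(NormalizeStage())
        .add_stage(MinLengthStage(min_chars=5))
    )
    tasks = [Task("t0", ["  Hello World  ", "hi", "NeMo Curator tutorials"])]
    for result in pipeline.run(tasks):
        print(result.task_id, result.records)
```

The design point this sketch illustrates is the separation of concerns: each stage only knows how to transform one task, while the pipeline owns ordering and iteration, so stages can be reordered or swapped without touching one another.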
| Category | Links |
|---|---|
| Getting Started | Installation • Configuration • Core Concepts |
| Modality Guides | Text Curation • Image Curation • Video Curation |
| Advanced | Custom Pipelines • Execution Backends • API Reference |
Documentation: Main Docs • GitHub Discussions