NeMo Curator Tutorials

Hands-on tutorials for curating text, image, video, and audio data with NeMo Curator. Each tutorial is a complete working example with detailed explanations.

Quick Start

New to NeMo Curator? Start with the Getting Started Guide or try the quickstart.py example to understand core concepts.

Tutorials by Modality

| Modality | Description | Key Tutorials |
|---|---|---|
| Text | Natural language processing and curation | Deduplication, Classification, Quality Assessment, Tokenization |
| Image | Computer vision and image processing | Aesthetic Classification, NSFW Detection, Deduplication |
| Video | Video processing and analysis | Clipping, Frame Extraction, Filtering |
| Audio | Speech and audio data curation | FLEURS Dataset Processing |

Core Concepts Example

The quickstart.py example demonstrates NeMo Curator's foundational architecture:

  • Task: Define data processing objectives
  • ProcessingStage: Individual processing steps
  • Pipeline: Orchestrate multiple stages
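To make the three concepts concrete, here is a minimal, self-contained sketch of the Task / ProcessingStage / Pipeline pattern in plain Python. It does not use NeMo Curator's actual classes: `Task`, `ProcessingStage`, `Pipeline`, `LowercaseStage`, `MinLengthFilter`, and the `process`, `add_stage`, and `run` methods are hypothetical stand-ins defined here for illustration only. Consult quickstart.py and the API Reference for the real interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class Task(Generic[T]):
    """Hypothetical: a unit of work carrying an ID and a data payload."""
    task_id: str
    data: T


class ProcessingStage:
    """Hypothetical: one processing step that maps a Task to a new Task."""
    name = "stage"

    def process(self, task: Task) -> Task:
        raise NotImplementedError


class LowercaseStage(ProcessingStage):
    """Hypothetical stage: lowercases every string in the payload."""
    name = "lowercase"

    def process(self, task: Task[List[str]]) -> Task[List[str]]:
        return Task(task.task_id, [s.lower() for s in task.data])


class MinLengthFilter(ProcessingStage):
    """Hypothetical stage: drops strings shorter than min_chars."""
    name = "min_length_filter"

    def __init__(self, min_chars: int) -> None:
        self.min_chars = min_chars

    def process(self, task: Task[List[str]]) -> Task[List[str]]:
        return Task(task.task_id, [s for s in task.data if len(s) >= self.min_chars])


@dataclass
class Pipeline:
    """Hypothetical orchestrator: runs each task through every stage in order."""
    stages: List[ProcessingStage] = field(default_factory=list)

    def add_stage(self, stage: ProcessingStage) -> Pipeline:
        self.stages.append(stage)
        return self  # returning self allows chained add_stage calls

    def run(self, tasks: List[Task]) -> List[Task]:
        results = []
        for task in tasks:
            for stage in self.stages:
                task = stage.process(task)  # output of one stage feeds the next
            results.append(task)
        return results


pipeline = Pipeline().add_stage(LowercaseStage()).add_stage(MinLengthFilter(min_chars=5))
out = pipeline.run([Task("batch-0", ["Hello World", "Hi", "NeMo Curator"])])
print(out[0].data)  # ['hello world', 'nemo curator']
```

Keeping each stage a small, single-purpose object means stages can be reordered or swapped without touching the orchestration logic, which is the property the real Pipeline abstraction is meant to provide.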

Documentation Links

| Category | Links |
|---|---|
| Getting Started | Installation, Configuration, Core Concepts |
| Modality Guides | Text Curation, Image Curation, Video Curation |
| Advanced | Custom Pipelines, Execution Backends, API Reference |

Support

Documentation: Main Docs, GitHub Discussions