CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Dagger is a configuration-driven framework that transforms YAML definitions into Apache Airflow DAGs. It uses dataset lineage (matching one task's outputs to another task's inputs) to build dependency graphs automatically across workflows.

Common Commands

Development Setup

make install-dev      # Create venv, install package in editable mode with dev/test deps
source venv/bin/activate

Testing

make test             # Run all tests with coverage (sets AIRFLOW_HOME automatically)

# Run a single test file
AIRFLOW_HOME=$(pwd)/tests/fixtures/config_finder/root/ ENV=local pytest -s tests/path/to/test_file.py

# Run a specific test
AIRFLOW_HOME=$(pwd)/tests/fixtures/config_finder/root/ ENV=local pytest -s tests/path/to/test_file.py::test_function_name

Linting

make lint             # Run flake8 on dagger and tests directories
black dagger tests    # Format code

Local Airflow Testing

make test-airflow     # Build and start Airflow in Docker (localhost:8080, user: dev_user, pass: dev_user)
make stop-airflow     # Stop Airflow containers

CLI

dagger --help
dagger list-tasks     # Show available task types
dagger list-ios       # Show available IO types
dagger init-pipeline  # Create a new pipeline.yaml
dagger init-task --type=<task_type>  # Add a task configuration
dagger init-io --type=<io_type>      # Add an IO definition
dagger print-graph    # Visualize dependency graph

Architecture

Core Flow

1. ConfigFinder discovers pipeline directories (each with pipeline.yaml + task YAML files)
2. ConfigProcessor loads YAML configs with environment variable support
3. TaskFactory/IOFactory use reflection to instantiate task/IO objects from YAML
4. TaskGraph builds a 3-layer graph: Pipeline → Task → Dataset nodes
5. DagCreator traverses the graph and generates Airflow DAGs using OperatorFactory
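
The lineage matching behind this flow can be sketched in a few lines. This is a minimal illustration, not Dagger's implementation: the `Task` dataclass, `build_task_dependencies`, and the dataset alias strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a task: a name plus dataset aliases."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def build_task_dependencies(tasks):
    """Map each task name to the sorted names of the tasks producing its inputs."""
    # Index every dataset alias by the task that writes it.
    producers = {}
    for task in tasks:
        for alias in task.outputs:
            producers[alias] = task.name

    # A task depends on whichever task produces one of its inputs.
    dependencies = {}
    for task in tasks:
        upstream = {producers[a] for a in task.inputs if a in producers}
        upstream.discard(task.name)  # ignore self-references
        dependencies[task.name] = sorted(upstream)
    return dependencies


# Made-up aliases; real aliases come from each IO's alias() method.
tasks = [
    Task("extract", outputs=["s3://raw/orders"]),
    Task("transform", inputs=["s3://raw/orders"], outputs=["athena://core.orders"]),
    Task("report", inputs=["athena://core.orders"]),
]
deps = build_task_dependencies(tasks)
print(deps)
```

No task names its upstream explicitly here. The ordering follows from the shared dataset aliases alone, which is the idea the graph layer builds on.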

Key Directories

dagger/pipeline/tasks/ - Task type definitions (DbtTask, SparkTask, AthenaTransformTask, etc.)
dagger/pipeline/ios/ - IO type definitions (S3, Redshift, Athena, Databricks, etc.)
dagger/dag_creator/airflow/operator_creators/ - One creator per task type, translates tasks to Airflow operators
dagger/graph/ - Graph construction from task inputs/outputs
dagger/config_finder/ - YAML discovery and loading
tests/fixtures/config_finder/root/dags/ - Example DAG configurations for testing

Adding a New Task Type

1. Create the task definition in dagger/pipeline/tasks/ as a subclass of Task
2. Create any needed IOs in dagger/pipeline/ios/ (only if the task uses new data sources)
3. Create an operator creator in dagger/dag_creator/airflow/operator_creators/
4. Register the new creator in dagger/dag_creator/airflow/operator_factory.py
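
As a rough sketch of how these pieces relate, the example below uses stand-in classes so it runs on its own. Everything in it is an assumption made for illustration: `EchoTask`, `EchoCreator`, the `ref_name` attribute, the `create_operator` method, and the `CREATORS` dictionary are hypothetical, and the real base classes and registration mechanism may look different.

```python
class Task:
    """Stand-in base class; the real one lives in dagger/pipeline/tasks/."""

    def __init__(self, name, config):
        self._name = name
        self._config = config

    @property
    def name(self):
        return self._name


class EchoTask(Task):
    """Hypothetical new task type with one task-specific parameter."""
    ref_name = "echo"

    def __init__(self, name, config):
        super().__init__(name, config)
        self._message = config["message"]

    @property
    def message(self):
        return self._message


class EchoCreator:
    """Hypothetical operator creator for EchoTask."""
    ref_name = "echo"

    def __init__(self, task):
        self._task = task

    def create_operator(self):
        # A real creator would return an Airflow operator; a dict keeps this runnable.
        return {
            "task_id": self._task.name,
            "bash_command": f"echo {self._task.message}",
        }


# Stand-in for the registration step in operator_factory.py.
CREATORS = {EchoCreator.ref_name: EchoCreator}

task = EchoTask("say_hello", {"message": "hello"})
operator = CREATORS[task.ref_name](task).create_operator()
print(operator)
```

The task exposes `message` as an explicit property, so the creator never has to reach into the raw config.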

Configuration Files

pipeline.yaml - Pipeline metadata (owner, schedule, alerts, airflow_parameters)
[taskname].yaml - Task configs (type, inputs, outputs, task-specific params)
dagger_config.yaml - System config (Neo4j, Elasticsearch, Spark settings)
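
To make the shape of a task config concrete, here is a small sketch of a parsed task file as a Python dictionary, with environment variables substituted. The `${VAR}` placeholder syntax, the `expand_env` helper, the `athena_transform` type name, and the `path` and `table` keys are assumptions for illustration only; check the fixtures under tests/ for the real format.

```python
from string import Template


def expand_env(value, env):
    """Recursively substitute ${VAR} placeholders in strings, lists and dicts."""
    if isinstance(value, str):
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        return Template(value).safe_substitute(env)
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    return value


# Hypothetical parsed task file: type, inputs, outputs.
task_config = {
    "type": "athena_transform",
    "inputs": [{"type": "s3", "path": "s3://${DATA_BUCKET}/raw/orders"}],
    "outputs": [{"type": "athena", "table": "orders_${ENV}"}],
}

resolved = expand_env(task_config, {"DATA_BUCKET": "my-bucket", "ENV": "local"})
print(resolved["inputs"][0]["path"])
print(resolved["outputs"][0]["table"])
```

Passing `os.environ` as the second argument would resolve placeholders from the real environment.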

Key Patterns

Factory Pattern: TaskFactory/IOFactory auto-discover types via reflection
Strategy Pattern: OperatorCreator subclasses handle task-specific operator creation
Dataset Aliasing: IO alias() method enables automatic dependency detection across pipelines
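
The factory pattern above can be illustrated with Python's built-in `__subclasses__()` hook. This is a sketch of the general technique under assumed names (`ref_name`, `create`), not the actual TaskFactory code, which may discover types differently.

```python
class Task:
    """Stand-in base class for this sketch."""
    ref_name = None

    def __init__(self, name):
        self.name = name


class DbtTask(Task):
    ref_name = "dbt"


class SparkTask(Task):
    ref_name = "spark"


class TaskFactory:
    """Builds tasks by looking up Task subclasses instead of a hand-written map."""

    def __init__(self):
        # Reflection step: collect every direct subclass, keyed by its ref_name.
        self._types = {cls.ref_name: cls for cls in Task.__subclasses__()}

    def create(self, task_type, name):
        if task_type not in self._types:
            raise ValueError(f"Unknown task type: {task_type}")
        return self._types[task_type](name)


factory = TaskFactory()
task = factory.create("spark", "daily_aggregation")
print(type(task).__name__)
```

With this approach, defining a new subclass is enough for the factory to pick it up, provided the module that defines it has been imported.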

Coding Standards

Avoid getattr

Do not use getattr to access task or IO properties. Instead, define explicit properties on the class. This gives you:

Type safety and IDE autocompletion
Clear interface contracts
Easier debugging and testing

# Bad - avoid this pattern
value = getattr(self._task, 'some_property', default)

# Good - use explicit properties
value = self._task.some_property  # Property defined on task class
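
A fuller sketch of the preferred pattern follows. The class names and the `executor_memory` parameter are hypothetical and chosen only to show where the default lives.

```python
class SparkTask:
    """Hypothetical task class; executor_memory is an illustrative parameter."""

    def __init__(self, config):
        # Resolve the default once, where the attribute is defined.
        self._executor_memory = config.get("executor_memory", "2g")

    @property
    def executor_memory(self):
        return self._executor_memory


class SparkCreator:
    """Hypothetical creator that reads the property directly."""

    def __init__(self, task):
        self._task = task

    def memory_argument(self):
        # A typo in the property name raises AttributeError
        # instead of silently falling back to a default.
        return f"--executor-memory {self._task.executor_memory}"


creator = SparkCreator(SparkTask({}))
print(creator.memory_argument())
```

The default is declared in one place on the task class, so every consumer sees the same value.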
