This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Dagger is a configuration-driven framework that transforms YAML definitions into Apache Airflow DAGs. It uses dataset lineage (matching inputs/outputs) to automatically build dependency graphs across workflows.
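For example, lineage matching works purely by dataset identity. A minimal sketch, assuming hypothetical task files and field names (the real schema comes from `dagger init-task` / `dagger init-io`):

```yaml
# extract_users.yaml (hypothetical)
type: athena_transform
outputs:
  - type: athena
    name: users_cleaned      # dataset produced by this task

# load_users.yaml (hypothetical)
type: redshift_load
inputs:
  - type: athena
    name: users_cleaned      # same dataset -> load_users depends on extract_users
```

Because `load_users` consumes the dataset that `extract_users` produces, the dependency is wired automatically; no explicit task ordering is declared.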
## Development Commands

```bash
make install-dev   # Create venv, install package in editable mode with dev/test deps
source venv/bin/activate

make test          # Run all tests with coverage (sets AIRFLOW_HOME automatically)

# Run a single test file
AIRFLOW_HOME=$(pwd)/tests/fixtures/config_finder/root/ ENV=local pytest -s tests/path/to/test_file.py

# Run a specific test
AIRFLOW_HOME=$(pwd)/tests/fixtures/config_finder/root/ ENV=local pytest -s tests/path/to/test_file.py::test_function_name

make lint          # Run flake8 on dagger and tests directories
black dagger tests # Format code

make test-airflow  # Build and start Airflow in Docker (localhost:8080, user: dev_user, pass: dev_user)
make stop-airflow  # Stop Airflow containers
```

## CLI Commands

```bash
dagger --help
dagger list-tasks                     # Show available task types
dagger list-ios                       # Show available IO types
dagger init-pipeline                  # Create a new pipeline.yaml
dagger init-task --type=<task_type>   # Add a task configuration
dagger init-io --type=<io_type>       # Add an IO definition
dagger print-graph                    # Visualize dependency graph
```

## Architecture

- ConfigFinder discovers pipeline directories (each with `pipeline.yaml` + task YAML files)
- ConfigProcessor loads YAML configs with environment variable support
- TaskFactory/IOFactory use reflection to instantiate task/IO objects from YAML
- TaskGraph builds a 3-layer graph: Pipeline → Task → Dataset nodes
- DagCreator traverses the graph and generates Airflow DAGs using OperatorFactory
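The lineage-matching step at the heart of graph construction can be sketched as follows (a simplified illustration, not Dagger's actual data structures): a task becomes downstream of whichever task produces one of its input datasets.

```python
def build_dependencies(tasks):
    """tasks: dict of task_name -> {"inputs": set, "outputs": set}.
    Returns dict of task_name -> set of upstream task names."""
    # Index every dataset by the task that produces it
    producers = {}
    for name, task in tasks.items():
        for dataset in task["outputs"]:
            producers[dataset] = name

    # A task depends on the producer of each of its inputs
    deps = {name: set() for name in tasks}
    for name, task in tasks.items():
        for dataset in task["inputs"]:
            if dataset in producers:
                deps[name].add(producers[dataset])
    return deps

tasks = {
    "extract": {"inputs": set(), "outputs": {"s3://bucket/users"}},
    "load":    {"inputs": {"s3://bucket/users"}, "outputs": {"redshift://users"}},
}
print(build_dependencies(tasks))  # -> {'extract': set(), 'load': {'extract'}}
```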
## Key Directories

- `dagger/pipeline/tasks/` - Task type definitions (DbtTask, SparkTask, AthenaTransformTask, etc.)
- `dagger/pipeline/ios/` - IO type definitions (S3, Redshift, Athena, Databricks, etc.)
- `dagger/dag_creator/airflow/operator_creators/` - One creator per task type; translates tasks to Airflow operators
- `dagger/graph/` - Graph construction from task inputs/outputs
- `dagger/config_finder/` - YAML discovery and loading
- `tests/fixtures/config_finder/root/dags/` - Example DAG configurations for testing
## Adding a New Task Type

1. Create the task definition in `dagger/pipeline/tasks/` (subclass of `Task`)
2. Create any needed IOs in `dagger/pipeline/ios/` (if new data sources are involved)
3. Create an operator creator in `dagger/dag_creator/airflow/operator_creators/`
4. Register it in `dagger/dag_creator/airflow/operator_factory.py`
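As a rough sketch of step 1 (the class name, `ref_name` hook, and config fields here are hypothetical — mirror an existing task in `dagger/pipeline/tasks/` for the real base-class API):

```python
# Stand-in for the real base class in dagger/pipeline/
class Task:
    def __init__(self, config: dict):
        self._config = config

class MyNewTask(Task):
    ref_name = "my_new_task"  # hypothetical: the `type:` value used in the task YAML

    def __init__(self, config: dict):
        super().__init__(config)
        self._script_path = config["script_path"]

    @property
    def script_path(self) -> str:
        # Explicit property, per the repo convention (no getattr)
        return self._script_path
```

The matching operator creator would then read `task.script_path` when building the Airflow operator for this type.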
## Configuration Files

- `pipeline.yaml` - Pipeline metadata (owner, schedule, alerts, airflow_parameters)
- `[taskname].yaml` - Task configs (type, inputs, outputs, task-specific params)
- `dagger_config.yaml` - System config (Neo4j, Elasticsearch, Spark settings)
## Design Patterns

- Factory Pattern: TaskFactory/IOFactory auto-discover types via reflection
- Strategy Pattern: OperatorCreator subclasses handle task-specific operator creation
- Dataset Aliasing: the IO `alias()` method enables automatic dependency detection across pipelines
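The reflection-based auto-discovery can be sketched like this (illustrative only, not Dagger's implementation): the factory scans a namespace for `Task` subclasses and indexes them by a declared type name.

```python
import inspect

class Task:
    ref_name = None  # hypothetical hook: the YAML `type:` value

class DbtTask(Task):
    ref_name = "dbt"

class SparkTask(Task):
    ref_name = "spark"

def discover(namespace: dict) -> dict:
    """Map ref_name -> class for every concrete Task subclass in scope."""
    registry = {}
    for obj in namespace.values():
        if inspect.isclass(obj) and issubclass(obj, Task) and obj is not Task:
            registry[obj.ref_name] = obj
    return registry

registry = discover(globals())
# registry["dbt"] is DbtTask; a factory can now instantiate from the YAML `type:` field
```

Adding a new type then requires no factory edits beyond dropping the class into the scanned module.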
## Coding Conventions

Do not use `getattr` for accessing task or IO properties. Instead, define explicit properties on the class. This ensures:

- Type safety and IDE autocompletion
- Clear interface contracts
- Easier debugging and testing

```python
# Bad - avoid this pattern
value = getattr(self._task, 'some_property', default)

# Good - use explicit properties
value = self._task.some_property  # Property defined on task class
```
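Expanding the "good" pattern, a task class would expose each config value as a real property with its default resolved in one place (the class and field names here are illustrative):

```python
class ExampleTask:
    def __init__(self, config: dict):
        # Resolve the default once, at construction time
        self._timeout = config.get("timeout", 300)

    @property
    def timeout(self) -> int:
        """Seconds before the task is killed (default 300)."""
        return self._timeout

task = ExampleTask({"timeout": 600})
print(task.timeout)  # -> 600; discoverable by IDEs, unlike a getattr lookup
```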