Complete API reference for docling-graph modules, classes, and functions.
What's Included:
- Pipeline API
- Configuration classes
- Protocol definitions
- Exception hierarchy
- Converter classes
- Extractor classes
- Exporter classes
- LLM client interfaces
Pipeline API
Main entry point for document processing.
run_pipeline()- Execute the pipeline- Pipeline stages and orchestration
Configuration API
Type-safe configuration classes.
PipelineConfig- Main configuration classModelConfig- Model configurationLLMConfig/VLMConfig- Backend configs
Protocols
Protocol definitions for type-safe interfaces.
ExtractionBackendProtocol- VLM backendsTextExtractionBackendProtocol- LLM backendsLLMClientProtocol- LLM clientsExtractorProtocol- Extraction strategies
Exceptions
Exception hierarchy and error handling.
DoclingGraphError- Base exceptionConfigurationError- Config errorsClientError- API errorsExtractionError- Extraction failuresValidationError- Data validationGraphError- Graph operationsPipelineError- Pipeline execution
Converters
Graph conversion from Pydantic models.
GraphConverter- Convert models to graphsNodeIDRegistry- Stable node IDs- Graph construction utilities
Extractors
Document extraction strategies.
OneToOne- Per-page extractionManyToOne- Consolidated extraction- Backend implementations
- Chunking and batching
Exporters
Graph export formats.
CSVExporter- Neo4j-compatible CSVCypherExporter- Cypher scriptsJSONExporter- JSON formatDoclingExporter- Docling documents
LLM Clients
LiteLLM-backed client for all LLM calls.
LiteLLMClient- Provider-agnostic client
docling_graph/
├── __init__.py # Public API exports
├── pipeline.py # run_pipeline()
├── config.py # PipelineConfig
├── protocols.py # Protocol definitions
├── exceptions.py # Exception hierarchy
│
├── core/ # Core processing
│ ├── converters/ # Graph conversion
│ ├── extractors/ # Extraction strategies
│ ├── exporters/ # Export formats
│ └── visualizers/ # Visualization
│
├── llm_clients/ # LLM integrations
│ ├── base.py
│ ├── ollama.py
│ ├── mistral.py
│ ├── openai.py
│ ├── gemini.py
│ └── vllm.py
│
└── pipeline/ # Pipeline orchestration
├── context.py
├── stages.py
└── orchestrator.py
# Main API
from docling_graph import run_pipeline, PipelineConfig
# Configuration classes
from docling_graph import (
LLMConfig,
VLMConfig,
ModelConfig,
ModelsConfig
)# Protocols
from docling_graph.protocols import (
ExtractionBackendProtocol,
TextExtractionBackendProtocol,
LLMClientProtocol
)
# Exceptions
from docling_graph.exceptions import (
DoclingGraphError,
ConfigurationError,
ClientError,
ExtractionError,
ValidationError,
GraphError,
PipelineError
)
# Converters
from docling_graph.core.converters import GraphConverter
# Extractors
from docling_graph.core.extractors import OneToOne, ManyToOne
# Exporters
from docling_graph.core.exporters import (
CSVExporter,
CypherExporter,
JSONExporter
)from typing import Any, Dict, List, Type, Union
from pathlib import Path
from pydantic import BaseModel
import networkx as nx
# Configuration
config: PipelineConfig
config_dict: Dict[str, Any]
# Templates
template: Type[BaseModel]
model_instance: BaseModel
models: List[BaseModel]
# Graphs
graph: nx.MultiDiGraph
# Paths
source: Union[str, Path]
output_dir: Pathimport docling_graph
# Get version
print(docling_graph.__version__) # e.g., "v1.2.0"
# Check available exports
print(docling_graph.__all__)
# ['run_pipeline', 'PipelineConfig', 'LLMConfig', ...]These APIs are stable and safe to use:
run_pipeline()PipelineConfig- All configuration classes
- Exception hierarchy
- Public protocols
These are internal and may change:
pipeline.orchestratorinternalscore.extractors.backendsinternalscore.utilsmodules
These are experimental:
- Custom stage APIs
- Advanced pipeline customization
Deprecated features will:
- Be marked with
@deprecateddecorator - Emit
DeprecationWarning - Be documented in CHANGELOG
- Be removed after 2 minor versions
Example:
import warnings
@deprecated("Use PipelineConfig instead")
def old_function():
warnings.warn(
"old_function is deprecated, use PipelineConfig",
DeprecationWarning,
stacklevel=2
)All public APIs use type hints:
def run_pipeline(config: Union[PipelineConfig, Dict[str, Any]]) -> PipelineContext:
"""Type-safe function signature; returns pipeline context with graph and results."""
passConfiguration uses Pydantic for validation:
config = PipelineConfig(
source="doc.pdf",
template="templates.MyTemplate",
backend="llm" # Validated at runtime
)Extensibility through protocols:
class MyBackend(TextExtractionBackendProtocol):
"""Custom backend implementing protocol."""
passClear error hierarchy:
try:
run_pipeline(config)
except ConfigurationError as e:
print(f"Config error: {e.message}")
print(f"Details: {e.details}")from docling_graph import PipelineConfig
config = PipelineConfig(
source="document.pdf",
template="templates.MyTemplate",
backend="llm",
inference="local"
)
run_pipeline(config)from docling_graph import run_pipeline
from docling_graph.exceptions import ExtractionError
config = {
"source": "document.pdf",
"template": "templates.MyTemplate",
"backend": "llm",
"inference": "remote",
"model_override": "mistral-small-latest",
"use_chunking": True,
"export_format": "cypher"
}
try:
run_pipeline(config)
except ExtractionError as e:
print(f"Extraction failed: {e}")- Pipeline API → - Main entry point
- Configuration API → - Configuration classes
- Protocols → - Protocol definitions
- Exceptions → - Exception hierarchy
- Converters → - Graph conversion
- Extractors → - Extraction strategies
- Exporters → - Export formats
- LLM Clients → - LLM integrations
See Development Guide for:
- Adding new APIs
- API design guidelines
- Documentation standards
- Testing requirements