Open-source data contract enforcement for modern data teams.
Define contracts in YAML. Sync to dbt. Validate in CI. Block bad data before it reaches production.
Official ODCS Vendor — Listed on the Bitol registry alongside Data Contract CLI, Data Caterer, and DQC.ai.
89% of data teams report pain points with data modeling and ownership. Data contracts are the solution — but the tooling is fragmented:
- dbt tests → SQL only, no formal contract, no pre-ingestion validation
- Great Expectations → verbose Python, steep learning curve, no standard format
- Soda → good YAML checks, but no CI/CD gate, no stakeholder reporting, no ODCS
- Data Contract CLI → ODCS compatible, but no dbt sync, no scoring, no CI gate
DataVow covers the full lifecycle: define → sync dbt → validate → block → report. One tool. One standard.
```bash
pip install datavow

# Initialize a project
datavow init my-project

# Define a contract
datavow define contracts/orders.yaml

# Validate data against contracts
datavow validate contracts/orders.yaml --source data/orders.csv

# Generate an HTML report
datavow report contracts/orders.yaml --source data/orders.csv --format html

# Run in CI mode (exit code 1 on critical violations)
datavow ci contracts/ --source data/
```

Define schemas, quality rules, and SLAs in readable YAML. DataVow supports both its own format and native ODCS v3.1 contracts, auto-detected with no configuration needed.
```yaml
apiVersion: datavow/v1
kind: DataContract
metadata:
  name: orders
  version: 1.0.0
  owner: data-team@company.com
  domain: sales
schema:
  type: table
  fields:
    - name: order_id
      type: integer
      required: true
      unique: true
    - name: customer_email
      type: string
      required: true
      pii: true
quality:
  rules:
    - name: no_negative_totals
      type: sql
      query: "SELECT COUNT(*) FROM {table} WHERE total_amount < 0"
      threshold: 0
      severity: CRITICAL
```

One command generates dbt-native tests from your contracts. It works on every dbt adapter, with no connector needed.
```bash
# Generate dbt tests from contracts
datavow dbt sync contracts/ --dbt-project-dir .
# Generates generic + singular tests from your contracts
# All tagged `datavow` for easy filtering
```

Run data contract validation as an Airflow task with `DataVowOperator`.
```bash
pip install "datavow[airflow]"  # requires apache-airflow>=2.7
```

Standalone DAG:
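```python
from datetime import datetime

from airflow import DAG
from datavow.airflow.operators import DataVowOperator

# DAG id, start date, and schedule are illustrative; adjust them to your deployment.
with DAG(dag_id="datavow_orders", start_date=datetime(2024, 1, 1), schedule=None, catchup=False):
    validate = DataVowOperator(
        task_id="validate_orders",
        contract_path="/data/contracts/orders.yaml",
        data_path="/data/bronze/orders.parquet",
        on_failure="fail",  # fail | warn | skip
        fail_on="strained",  # strained (<95) | broken (<80) | shattered (<50)
        report_format="html",
        report_path="/data/reports/orders.html",
    )
```

Lakecast YAML (ADR-013):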
```yaml
tasks:
  - name: validate_orders
    type: datavow
    contract: contracts/orders.yaml
    source: "{{ params.bronze_path }}/orders.parquet"
    fail_on: broken
```

Each run pushes the following values to XCom:

| XCom Key | Description |
|---|---|
| `vow_score` | Integer 0–100 |
| `vow_verdict` | Vow Kept / Strained / Broken / Shattered |
| `violations_critical` | Count of CRITICAL failures |
| `violations_warning` | Count of WARNING failures |
| `violations_info` | Count of INFO failures |
| `contract_name` | Contract name from YAML |
| `report_path` | Path to generated report (if any) |
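Downstream tasks can read these values with Airflow's `TaskInstance.xcom_pull`. As a minimal sketch, here is a branch callable (for example, for a `BranchPythonOperator`) that assumes the task id `validate_orders` from the DAG example and an operator configured so the DAG keeps running on violations (such as `on_failure="warn"`). The downstream task ids `publish_orders` and `quarantine_orders` are hypothetical, as is the publishing policy.

```python
def choose_next_task(ti) -> str:
    """Branch on DataVow results pushed to XCom by the validate_orders task."""
    # Pull the score and the CRITICAL count pushed by the validation task.
    score = ti.xcom_pull(task_ids="validate_orders", key="vow_score")
    critical = ti.xcom_pull(task_ids="validate_orders", key="violations_critical")

    # Missing results are treated as a failed validation.
    if score is None or critical is None:
        return "quarantine_orders"  # hypothetical downstream task id

    # Hypothetical policy: publish only with no CRITICAL failure and a kept vow (>= 95).
    if critical == 0 and score >= 95:
        return "publish_orders"  # hypothetical downstream task id
    return "quarantine_orders"
```

Passing `ti` explicitly keeps the function testable without a running Airflow instance.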
K8s executor: Imports are lazy — the scheduler node does not need datavow installed.
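The lazy-import idea can be illustrated with a generic pattern: the task object stores only configuration when the DAG file is parsed, and imports its heavy dependency inside `execute`, which runs on the worker. This is a hypothetical sketch (`LazyValidationTask` is not part of DataVow), not the operator's actual source.

```python
import importlib


class LazyValidationTask:
    """Holds configuration at parse time; imports its dependency only at run time."""

    def __init__(self, module_name: str):
        # Constructing the task imports nothing, so parsing needs no extra packages.
        self.module_name = module_name

    def execute(self) -> str:
        # The dependency is imported here, on the machine that actually runs the task.
        module = importlib.import_module(self.module_name)
        return module.__name__


# Construction succeeds even if the named module is not installed on this machine.
task = LazyValidationTask("some_package_only_on_workers")
```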
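`Vow Score = 100 - (20 × CRITICAL + 5 × WARNING + 1 × INFO)`, where CRITICAL, WARNING, and INFO are the numbers of failures at each severity.

| Score | Verdict | Meaning |
|---|---|---|
| 95–100 | ✅ Vow Kept | Fully compliant, ship it |
| 80–94 | ⚠️ Vow Strained | Action needed |
| 50–79 | 🔧 Vow Broken | Blocking issues |
| 0–49 | ❌ Vow Shattered | Critical violations |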
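The formula and bands can be checked with a small Python sketch. Clamping negative results to 0 is an assumption inferred from the 0–100 range of `vow_score`; the function names are hypothetical.

```python
# Penalty weights per violation severity, as given in the Vow Score formula.
WEIGHTS = {"CRITICAL": 20, "WARNING": 5, "INFO": 1}

# Verdict bands, checked from the highest floor down.
BANDS = [
    (95, "Vow Kept"),
    (80, "Vow Strained"),
    (50, "Vow Broken"),
    (0, "Vow Shattered"),
]


def vow_score(critical: int, warning: int, info: int) -> int:
    """Compute the Vow Score; clamping at 0 is an assumption."""
    penalty = (
        WEIGHTS["CRITICAL"] * critical
        + WEIGHTS["WARNING"] * warning
        + WEIGHTS["INFO"] * info
    )
    return max(0, 100 - penalty)


def vow_verdict(score: int) -> str:
    """Map a score to its verdict band."""
    for floor, verdict in BANDS:
        if score >= floor:
            return verdict
    raise ValueError(f"score out of range: {score}")


# One CRITICAL and one WARNING failure: 100 - (20 + 5) = 75, which falls in the Broken band.
score = vow_score(critical=1, warning=1, info=0)
print(score, vow_verdict(score))
```

A single CRITICAL failure already drops the score to 80, so any contract configured with `fail_on="strained"` blocks on the first critical violation.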
Block bad data automatically. No manual intervention.
GitHub Action (Marketplace):
```yaml
- uses: ludovicschmetz-stack/datavow-action@v1
  with:
    contracts: contracts/
    source: data/
    fail-on: critical
    comment-on-pr: "true"
```

dbt on-run-end hook (datavow-dbt):
```yaml
# dbt_project.yml
on-run-end:
  - "{{ datavow_summary() }}"

vars:
  datavow_fail_on: broken  # block pipeline on Vow Broken or worse
```

```bash
# Validate a contract against the ODCS v3.1 JSON Schema
datavow odcs check contracts/orders.yaml

# Convert ODCS native → DataVow format
datavow odcs convert contracts/orders-odcs.yaml -o contracts/orders.yaml
```

DataVow bundles the official ODCS v3.1.0 JSON Schema (2928 lines, Draft 2019-09). No other CLI tool does this.
| Command | Description |
|---|---|
| `datavow init` | Initialize project with config and example contract |
| `datavow define` | Create or edit a data contract interactively |
| `datavow validate` | Validate data against contracts |
| `datavow report` | Generate HTML or Markdown reports |
| `datavow ci` | CI mode: validate + exit code 0/1 |
| `datavow dbt generate` | Auto-generate contracts from dbt manifest |
| `datavow dbt validate` | Validate against dbt warehouse (via `profiles.yml`) |
| `datavow dbt sync` | Generate dbt tests from contracts |
| `datavow dbt ci` | Full pipeline: sync → dbt test → Vow Score |
| `datavow odcs check` | Validate contract against ODCS v3.1 JSON Schema |
| `datavow odcs convert` | Convert ODCS native → DataVow format |
DataVow validates files and databases via DuckDB:
| Source | How |
|---|---|
| CSV, Parquet, JSON, TSV | Direct file validation |
| PostgreSQL | datavow validate --source postgresql://... |
| DuckDB | datavow validate --source path/to/db.duckdb |
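As an illustration of direct file validation, the field-level rules of the example `orders` contract (`required`, `unique`) can be sketched with Python's standard `csv` module. This is not DataVow's engine, which runs on DuckDB; the rules are hard-coded instead of parsed from YAML, and treating an empty cell as a missing value is an assumption.

```python
import csv
import io
from collections import Counter

# Field rules from the orders contract, hard-coded here instead of parsed from YAML.
FIELDS = {
    "order_id": {"required": True, "unique": True},
    "customer_email": {"required": True, "unique": False},
}


def check_fields(csv_text: str) -> list[str]:
    """Return human-readable violations of the required/unique field rules."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    violations = []
    for name, rules in FIELDS.items():
        values = [row.get(name) for row in rows]
        if rules["required"]:
            # A missing column (None) or an empty cell ("") counts as a missing value.
            missing = sum(1 for v in values if v in (None, ""))
            if missing:
                violations.append(f"{name}: {missing} missing value(s)")
        if rules["unique"]:
            # Count non-empty values and report any that appear more than once.
            counts = Counter(v for v in values if v not in (None, ""))
            dupes = sorted(v for v, n in counts.items() if n > 1)
            if dupes:
                violations.append(f"{name}: duplicate value(s) {dupes}")
    return violations


sample = "order_id,customer_email\n1,a@example.com\n2,\n2,b@example.com\n"
print(check_fields(sample))
```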
For cloud warehouses (Snowflake, BigQuery, Redshift, Databricks), use datavow dbt sync — it generates dbt-native tests that run on your existing dbt adapter. No extra connector needed.
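To make the dbt path concrete, here is a hypothetical sketch of how contract field rules could map to dbt generic tests tagged `datavow`. The mapping (`required` → `not_null`, `unique` → `unique`) is an assumption, not a description of what `datavow dbt sync` actually emits, and it covers only generic tests, not the singular tests generated from SQL rules.

```python
import json

# Field rules from the orders contract, hard-coded here instead of parsed from YAML.
FIELDS = [
    {"name": "order_id", "required": True, "unique": True},
    {"name": "customer_email", "required": True, "unique": False},
]


def dbt_column_tests(field: dict) -> list:
    """Translate one contract field into dbt generic tests, each tagged `datavow`."""
    tagged = {"config": {"tags": ["datavow"]}}
    tests = []
    if field.get("required"):
        tests.append({"not_null": tagged})
    if field.get("unique"):
        tests.append({"unique": tagged})
    return tests


def dbt_schema(model: str, fields: list) -> dict:
    """Build a dbt properties document (schema.yml structure) for one model."""
    return {
        "version": 2,
        "models": [
            {
                "name": model,
                # `tests` is accepted across dbt versions; dbt 1.8+ also accepts `data_tests`.
                "columns": [{"name": f["name"], "tests": dbt_column_tests(f)} for f in fields],
            }
        ],
    }


# JSON is valid YAML, so this output could be saved as a dbt properties file.
print(json.dumps(dbt_schema("orders", FIELDS), indent=2))
```

Tagging every generated test lets you run or exclude them as a group, for example with `dbt test --select tag:datavow`.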
| Persona | Uses | Gets |
|---|---|---|
| Data Engineer | `datavow ci` in pipeline | Automated quality gate |
| Analytics Engineer | `datavow dbt sync` | One source of truth, zero test duplication |
| Domain Data Owner | YAML contracts in git | Versioned, reviewable data agreements |
| Data Governance | HTML reports | Conformity view across domains |
| Tech Lead | CI gate + Vow Score | No pipeline in prod without a contract |
| Freelance / Consultant | `datavow report` | Quality proof attached to every delivery |
```
┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│    YAML     │    │   DataVow    │    │   Outputs   │
│  Contracts  │───▶│    Engine    │───▶│             │
│  (ODCS/DV)  │    │   (DuckDB)   │    │  ✅ Score   │
└─────────────┘    └──────┬───────┘    │  📊 Report  │
                          │            │  🚦 Exit 1  │
              ┌───────────┼──────┐     └─────────────┘
              ▼           ▼      ▼
         CSV/Parquet PostgreSQL dbt
```
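The engine step in the diagram can be sketched for the `no_negative_totals` rule from the example contract: fill the `{table}` placeholder, count offending rows, compare the count against `threshold`, and return exit code 1 if a CRITICAL rule fails. This minimal sketch uses Python's built-in `sqlite3` in place of DuckDB and assumes a rule passes when the count is at most `threshold`; the function names are hypothetical.

```python
import sqlite3

# The no_negative_totals rule from the example contract, as a plain dict.
RULE = {
    "name": "no_negative_totals",
    "query": "SELECT COUNT(*) FROM {table} WHERE total_amount < 0",
    "threshold": 0,
    "severity": "CRITICAL",
}


def run_rule(conn: sqlite3.Connection, rule: dict, table: str) -> bool:
    """Return True if the rule passes (offending-row count <= threshold)."""
    # The table name comes from trusted configuration, not from user input.
    sql = rule["query"].format(table=table)
    (count,) = conn.execute(sql).fetchone()
    return count <= rule["threshold"]


def ci_exit_code(conn: sqlite3.Connection, rules: list, table: str) -> int:
    """CI-style gate: 1 if any CRITICAL rule fails, otherwise 0."""
    critical_failed = any(
        r["severity"] == "CRITICAL" and not run_rule(conn, r, table) for r in rules
    )
    return 1 if critical_failed else 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, total_amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 19.99), (2, -5.0)])
print(ci_exit_code(conn, [RULE], "orders"))  # one order has a negative total
```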
| Package | Description | Version |
|---|---|---|
| `datavow` | CLI: define, validate, report, CI | v0.3.0 |
| `datavow-action` | GitHub Action: CI gate | v1.0.0 |
| `datavow-dbt` | dbt package: on-run-end Vow Score | v1.0.0 |
Contributions are welcome! See CONTRIBUTING.md for guidelines.
```bash
# Development setup
git clone https://github.com/ludovicschmetz-stack/datavow.git
cd datavow
python -m venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pytest  # 137 tests
```

Apache 2.0, free forever. Use it, fork it, ship it.
