DataVow

Trust Your Data. Know Why You Can't.

Open-source data contract enforcement for modern data teams.
Define contracts in YAML. Sync to dbt. Validate in CI. Block bad data before it reaches production.


Official ODCS Vendor — Listed on the Bitol registry alongside Data Contract CLI, Data Caterer, and DQC.ai.

The problem

89% of data teams report pain points with data modeling and ownership. Data contracts are the solution — but the tooling is fragmented:

  • dbt tests → SQL only, no formal contract, no pre-ingestion validation
  • Great Expectations → verbose Python, steep learning curve, no standard format
  • Soda → good YAML checks, but no CI/CD gate, no stakeholder reporting, no ODCS
  • Data Contract CLI → ODCS compatible, but no dbt sync, no scoring, no CI gate

DataVow covers the full lifecycle: define → sync dbt → validate → block → report. One tool. One standard.

Quick start

pip install datavow

# Initialize a project
datavow init my-project

# Define a contract
datavow define contracts/orders.yaml

# Validate data against contracts
datavow validate contracts/orders.yaml --source data/orders.csv

# Generate an HTML report
datavow report contracts/orders.yaml --source data/orders.csv --format html

# Run in CI mode (exit code 1 on critical violations)
datavow ci contracts/ --source data/

Key features

YAML-first contracts (ODCS v3.1 native)

Define schemas, quality rules, and SLAs in readable YAML. DataVow supports both its own format and native ODCS v3.1 contracts — auto-detected, no config needed.

apiVersion: datavow/v1
kind: DataContract
metadata:
  name: orders
  version: 1.0.0
  owner: data-team@company.com
  domain: sales

schema:
  type: table
  fields:
    - name: order_id
      type: integer
      required: true
      unique: true
    - name: customer_email
      type: string
      required: true
      pii: true

quality:
  rules:
    - name: no_negative_totals
      type: sql
      query: "SELECT COUNT(*) FROM {table} WHERE total_amount < 0"
      threshold: 0
      severity: CRITICAL
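
To make the `type: sql` rule above more concrete, here is a minimal sketch of how a count-style rule could be evaluated. It is not DataVow's implementation: it uses Python's built-in `sqlite3` as a stand-in for DuckDB, the helper name `evaluate_sql_rule` is hypothetical, and it assumes the rule passes when the returned count does not exceed `threshold`.

```python
import sqlite3


def evaluate_sql_rule(conn, table, query_template, threshold):
    """Run a count-style rule and return (passed, violations).

    Assumption: the rule passes when the count is <= threshold.
    """
    # Substitute the {table} placeholder used in the contract's query.
    query = query_template.format(table=table)
    (violations,) = conn.execute(query).fetchone()
    return violations <= threshold, violations


# Tiny in-memory dataset with one negative total.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, total_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.99), (2, -5.00), (3, 42.00)],
)

passed, violations = evaluate_sql_rule(
    conn,
    table="orders",
    query_template="SELECT COUNT(*) FROM {table} WHERE total_amount < 0",
    threshold=0,
)
print(passed, violations)  # False 1
```

With `threshold: 0`, a single offending row is enough to fail the rule, which matches the intent of `no_negative_totals`.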

datavow dbt sync — the killer feature

One command generates dbt-native tests from your contracts. Works on every dbt adapter — no connector needed.

# Generate dbt tests from contracts
datavow dbt sync contracts/ --dbt-project-dir .

# Generates generic + singular tests from your contracts
# All tagged `datavow` for easy filtering

Airflow integration

Run data contract validation as an Airflow task with DataVowOperator.

pip install "datavow[airflow]"   # requires apache-airflow>=2.7; quotes keep zsh from globbing the brackets

Standalone DAG:

from datavow.airflow.operators import DataVowOperator

validate = DataVowOperator(
    task_id="validate_orders",
    contract_path="/data/contracts/orders.yaml",
    data_path="/data/bronze/orders.parquet",
    on_failure="fail",       # fail | warn | skip
    fail_on="strained",     # strained (<95) | broken (<80) | shattered (<50)
    report_format="html",
    report_path="/data/reports/orders.html",
)

Lakecast YAML (ADR-013):

tasks:
  - name: validate_orders
    type: datavow
    contract: contracts/orders.yaml
    source: "{{ params.bronze_path }}/orders.parquet"
    fail_on: broken

XCom Key             Description
vow_score            Integer 0–100
vow_verdict          Vow Kept / Strained / Broken / Shattered
violations_critical  Count of CRITICAL failures
violations_warning   Count of WARNING failures
violations_info      Count of INFO failures
contract_name        Contract name from YAML
report_path          Path to generated report (if any)
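
A downstream task can read these keys with Airflow's standard `xcom_pull`. The sketch below is illustrative only: `summarize_vow` and `FakeTaskInstance` are hypothetical names, and the stand-in task instance exists so the example runs without Airflow installed. In a real DAG, the function could be used as a `PythonOperator` callable, where Airflow supplies `ti` from the task context.

```python
def summarize_vow(ti, upstream_task_id="validate_orders"):
    """Collect selected DataVow XCom values pushed by an upstream task."""
    keys = ["vow_score", "vow_verdict", "violations_critical"]
    return {
        key: ti.xcom_pull(task_ids=upstream_task_id, key=key)
        for key in keys
    }


class FakeTaskInstance:
    """Stand-in for an Airflow TaskInstance (hypothetical, for local runs)."""

    def __init__(self, store):
        self._store = store

    def xcom_pull(self, task_ids, key):
        return self._store.get((task_ids, key))


ti = FakeTaskInstance({
    ("validate_orders", "vow_score"): 67,
    ("validate_orders", "vow_verdict"): "Vow Broken",
    ("validate_orders", "violations_critical"): 1,
})
summary = summarize_vow(ti)
print(summary["vow_score"], summary["vow_verdict"])  # 67 Vow Broken
```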

K8s executor: Imports are lazy — the scheduler node does not need datavow installed.

Vow Score — every validation renders a verdict

Vow Score = 100 - (20 × CRITICAL + 5 × WARNING + 1 × INFO)

  95-100  ✅ Vow Kept      — fully compliant, ship it
  80-94   ⚠️ Vow Strained  — action needed
  50-79   🔧 Vow Broken    — blocking issues
   0-49   ❌ Vow Shattered  — critical violations
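
The formula and bands above can be worked through in a few lines of Python. This is a sketch rather than DataVow's own code: the function names are hypothetical, and clamping the score at 0 is an assumption based on the stated 0–100 range.

```python
def vow_score(critical: int, warning: int, info: int) -> int:
    """Vow Score = 100 - (20 x CRITICAL + 5 x WARNING + 1 x INFO)."""
    penalty = 20 * critical + 5 * warning + 1 * info
    # Assumption: the score never drops below 0.
    return max(0, 100 - penalty)


def vow_verdict(score: int) -> str:
    """Map a score to the verdict bands listed above."""
    if score >= 95:
        return "Vow Kept"
    if score >= 80:
        return "Vow Strained"
    if score >= 50:
        return "Vow Broken"
    return "Vow Shattered"


# 1 CRITICAL, 2 WARNING, 3 INFO -> 100 - (20 + 10 + 3) = 67
score = vow_score(critical=1, warning=2, info=3)
print(score, vow_verdict(score))  # 67 Vow Broken
```

Note how heavily CRITICAL findings weigh: a single one already drops a contract out of the Vow Kept band, and three of them push it into Vow Shattered.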

CI pipeline gating

Block bad data automatically. No manual intervention.

GitHub Action (Marketplace):

- uses: ludovicschmetz-stack/datavow-action@v1
  with:
    contracts: contracts/
    source: data/
    fail-on: critical
    comment-on-pr: "true"

dbt on-run-end hook (datavow-dbt):

# dbt_project.yml
on-run-end:
  - "{{ datavow_summary() }}"

vars:
  datavow_fail_on: broken  # block pipeline on Vow Broken or worse
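
The gating decision itself is a simple threshold check. The sketch below assumes the levels listed in the Airflow operator comment (strained below 95, broken below 80, shattered below 50) apply here as well. `should_block` and `exit_code` are hypothetical helpers, not part of the DataVow API.

```python
# Thresholds taken from the Airflow operator comment in this README.
FAIL_BELOW = {"strained": 95, "broken": 80, "shattered": 50}


def should_block(vow_score: int, fail_on: str = "broken") -> bool:
    """Return True when the score falls below the chosen level."""
    try:
        limit = FAIL_BELOW[fail_on]
    except KeyError:
        raise ValueError(f"unknown fail_on level: {fail_on!r}") from None
    return vow_score < limit


def exit_code(vow_score: int, fail_on: str = "broken") -> int:
    """Translate the decision into a CI-style exit code (assumed 0/1)."""
    return 1 if should_block(vow_score, fail_on) else 0


print(should_block(85, "broken"))    # False
print(should_block(85, "strained"))  # True
print(exit_code(67, "broken"))       # 1
```

A stricter level such as `strained` blocks more runs, while `shattered` only stops the pipeline on the worst scores.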

ODCS v3.1 — validate against the official standard

# Validate a contract against the ODCS v3.1 JSON Schema
datavow odcs check contracts/orders.yaml

# Convert ODCS native → DataVow format
datavow odcs convert contracts/orders-odcs.yaml -o contracts/orders.yaml

DataVow bundles the official ODCS v3.1.0 JSON Schema (2928 lines, Draft 2019-09). No other CLI tool does this.

Full command reference

Command               Description
datavow init          Initialize project with config and example contract
datavow define        Create or edit a data contract interactively
datavow validate      Validate data against contracts
datavow report        Generate HTML or Markdown reports
datavow ci            CI mode: validate + exit code 0/1
datavow dbt generate  Auto-generate contracts from dbt manifest
datavow dbt validate  Validate against dbt warehouse (via profiles.yml)
datavow dbt sync      Generate dbt tests from contracts
datavow dbt ci        Full pipeline: sync → dbt test → Vow Score
datavow odcs check    Validate contract against ODCS v3.1 JSON Schema
datavow odcs convert  Convert ODCS native → DataVow format

Data sources

DataVow validates files and databases via DuckDB:

Source                   How
CSV, Parquet, JSON, TSV  Direct file validation
PostgreSQL               datavow validate --source postgresql://...
DuckDB                   datavow validate --source path/to/db.duckdb

For cloud warehouses (Snowflake, BigQuery, Redshift, Databricks), use datavow dbt sync — it generates dbt-native tests that run on your existing dbt adapter. No extra connector needed.

Built for your whole team

Persona                 Uses                    Gets
Data Engineer           datavow ci in pipeline  Automated quality gate
Analytics Engineer      datavow dbt sync        One source of truth, zero test duplication
Domain Data Owner       YAML contracts in git   Versioned, reviewable data agreements
Data Governance         HTML reports            Conformity view across domains
Tech Lead               CI gate + Vow Score     No pipeline in prod without a contract
Freelance / Consultant  datavow report          Quality proof attached to every delivery

Architecture

┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│  YAML       │    │   DataVow    │    │   Outputs   │
│  Contracts  │───▶│   Engine     │───▶│             │
│  (ODCS/DV)  │    │   (DuckDB)   │    │  ✅ Score   │
└─────────────┘    └──────┬───────┘    │  📊 Report  │
                          │            │  🚦 Exit 1  │
              ┌───────────┼──────┐     └─────────────┘
              ▼           ▼      ▼
          CSV/Parquet  PostgreSQL  dbt

Ecosystem

Package         Description                         Version
datavow         CLI: define, validate, report, CI   v0.3.0
datavow-action  GitHub Action: CI gate              v1.0.0
datavow-dbt     dbt package: on-run-end Vow Score   v1.0.0

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

# Development setup
git clone https://github.com/ludovicschmetz-stack/datavow.git
cd datavow
python -m venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pytest  # 137 tests

License

Apache 2.0 — free forever. Use it, fork it, ship it.


Website · Documentation · PyPI · Issues
