Stop debugging OCR regressions by diffing JSON with your eyeballs.
ocrdrift is a local-first CLI and visual report generator for comparing two OCR or document-extraction runs. It highlights exactly what changed — fields, confidences, token samples, and table rows — and overlays drift directly on the source page.
Built for OCR, VLM, IDP, and document AI engineers who need to answer questions like:
- Which fields broke after a preprocessing change?
- Did my prompt/model upgrade improve extraction or just move errors around?
- Which invoice/receipt/table regions became unstable?
- Where did confidence collapse even when values still matched?
Document AI teams have parsers, OCR engines, benchmarks, and evaluation notebooks.
What they often do not have is a fast, developer-friendly way to debug extraction drift visually.
That gap gets painful when:
- a layout tweak causes one field to disappear
- a VLM “mostly works” but silently corrupts tax or totals
- a table parser changes row alignment after a model upgrade
- a preprocessing step improves one document family and breaks another
ocrdrift is built to be the missing inspection layer between raw extraction output and production confidence.
```bash
npm install
npm run demo
open out/demo-report/index.html
```

That generates a side-by-side report comparing a clean invoice extraction against a noisy/rotated variant.
- visual HTML report with side-by-side document pages
- field drift table with changed / missing / added status
- confidence deltas to catch “looks right but got weaker” regressions
- table drift summary for row-level parsing changes
- token sample diff to inspect OCR damage quickly
- Tesseract TSV adapter so existing OCR outputs can be converted immediately
- zero paid APIs and no external services required
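The field-drift and confidence-delta ideas above can be sketched in a few lines. This is an illustrative model, not ocrdrift's actual internals; the field shape (`{ value, confidence }`) mirrors the JSON format shown later in this README.

```javascript
// Sketch: compare two field maps and report changed / missing / added
// status plus the confidence delta, so "looks right but got weaker"
// regressions surface even when values still match.
function diffFields(a, b) {
  const out = {};
  for (const key of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const fa = a[key], fb = b[key];
    if (!fb) { out[key] = { status: "missing" }; continue; }
    if (!fa) { out[key] = { status: "added" }; continue; }
    out[key] = {
      status: fa.value === fb.value ? "unchanged" : "changed",
      confidenceDelta: +(fb.confidence - fa.confidence).toFixed(3),
    };
  }
  return out;
}

const baseline = { total: { value: "26.82", confidence: 0.97 } };
const candidate = { total: { value: "2G.82", confidence: 0.61 } };
console.log(diffFields(baseline, candidate));
// { total: { status: 'changed', confidenceDelta: -0.36 } }
```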
After `npm run demo`, open:

- `out/demo-report/index.html`
- `out/demo-report/report.json`
The demo intentionally shows:

- `Industrial` → `lndustrial`
- `Layout Analyzer` → `Layout AnaIyzer`
- `26.82` → `2G.82`
- confidence drops around the due date and totals region
So you can see the kind of regression report engineers actually need during pipeline iteration.
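Substitutions like the ones in the demo can be surfaced with a simple character-level alignment. A minimal sketch of the idea (not ocrdrift's diff implementation):

```javascript
// Illustrative token-sample diff: compare two tokens position by
// position and flag the character substitutions typical of OCR damage
// (I/l, 6/G, 0/O, ...).
function charDiffs(a, b) {
  const diffs = [];
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) diffs.push({ pos: i, from: a[i] ?? "", to: b[i] ?? "" });
  }
  return diffs;
}

console.log(charDiffs("Industrial", "lndustrial")); // [ { pos: 0, from: 'I', to: 'l' } ]
console.log(charDiffs("26.82", "2G.82"));           // [ { pos: 1, from: '6', to: 'G' } ]
```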
```bash
npm install
npm link
```

Then use:

```bash
ocrdrift --help
ocrdrift demo --out ./out/demo-report

ocrdrift compare \
  --a ./examples/invoice-baseline.json \
  --b ./examples/invoice-rotated.json \
  --out ./out/my-report

ocrdrift adapt:tesseract \
  --input ./examples/sample.tesseract.tsv \
  --image ./examples/invoice-baseline.svg \
  --out ./out/sample.from-tesseract.json
```

ocrdrift uses a small open JSON format so you can adapt any OCR or extraction pipeline into it:

```json
{
  "schemaVersion": "ocrdrift/v1",
  "documentId": "invoice-001",
  "engine": "my-ocr-pipeline",
  "imagePath": "./page-1.png",
  "pages": [{ "number": 1, "width": 1000, "height": 1400, "imagePath": "./page-1.png" }],
  "tokens": [
    {
      "id": "tok-1",
      "text": "Invoice",
      "confidence": 0.98,
      "page": 1,
      "bbox": { "x": 100, "y": 80, "width": 120, "height": 32 }
    }
  ],
  "fields": {
    "invoice_number": {
      "value": "INV-2026-0319",
      "confidence": 0.96,
      "page": 1,
      "bbox": { "x": 680, "y": 132, "width": 210, "height": 34 }
    }
  },
  "tables": [
    {
      "name": "line_items",
      "rows": [{ "description": "Vision SDK", "qty": "2", "unit_price": "149.00", "line_total": "298.00" }]
    }
  ]
}
```

- OCR platform teams
- Document parsing / IDP engineers
- invoice and receipt extraction teams
- VLM prompt / pipeline experimenters
- benchmark/evaluation owners who need a visual debugging layer
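Adapting an existing pipeline to the `ocrdrift/v1` JSON format above is mostly a key-mapping exercise. A minimal sketch, assuming a hypothetical pipeline that emits `{ text, conf, box }` word records (that input shape is invented for illustration; only the output follows the schema shown earlier):

```javascript
// Hypothetical adapter: map a pipeline's own word records into the
// ocrdrift/v1 document shape from this README.
function toOcrdrift(documentId, engine, page, words) {
  return {
    schemaVersion: "ocrdrift/v1",
    documentId,
    engine,
    pages: [{ number: 1, width: page.width, height: page.height, imagePath: page.imagePath }],
    tokens: words.map((w, i) => ({
      id: `tok-${i + 1}`,
      text: w.text,
      confidence: w.conf,
      page: 1,
      bbox: { x: w.box[0], y: w.box[1], width: w.box[2], height: w.box[3] },
    })),
    fields: {},
    tables: [],
  };
}

const doc = toOcrdrift("invoice-001", "my-ocr-pipeline",
  { width: 1000, height: 1400, imagePath: "./page-1.png" },
  [{ text: "Invoice", conf: 0.98, box: [100, 80, 120, 32] }]);
console.log(doc.tokens[0].text); // Invoice
```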
- tiny install surface
- no hosted backend
- works with existing OCR output after light adaptation
- demoable in under 2 minutes
- obvious value on first run
```
OCR / VLM / parser output A ─┐
                             ├─> ocrdrift compare ──> normalized diff model ──> HTML report + JSON summary
OCR / VLM / parser output B ─┘

Tesseract TSV ──> ocrdrift adapt:tesseract ──> ocrdrift JSON
```
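The `adapt:tesseract` path can be sketched as a TSV-to-token mapping. This is an illustrative sketch, not the shipped adapter; the column names follow Tesseract's TSV header, and the token shape follows the schema in this README:

```javascript
// Sketch: parse word rows (level 5) from Tesseract's TSV output and
// map them to ocrdrift tokens. Tesseract TSV columns: level, page_num,
// block_num, par_num, line_num, word_num, left, top, width, height,
// conf, text.
function tesseractTsvToTokens(tsv) {
  const [header, ...rows] = tsv.trim().split("\n").map((l) => l.split("\t"));
  const col = Object.fromEntries(header.map((h, i) => [h, i]));
  return rows
    .filter((r) => r[col.level] === "5") // level 5 = word
    .map((r, i) => ({
      id: `tok-${i + 1}`,
      text: r[col.text],
      confidence: Number(r[col.conf]) / 100, // Tesseract conf is 0-100
      page: Number(r[col.page_num]),
      bbox: {
        x: Number(r[col.left]),
        y: Number(r[col.top]),
        width: Number(r[col.width]),
        height: Number(r[col.height]),
      },
    }));
}

const tsv = [
  "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext",
  "5\t1\t1\t1\t1\t1\t100\t80\t120\t32\t98\tInvoice",
].join("\n");
console.log(tesseractTsvToTokens(tsv)[0].text); // Invoice
```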
Core modules:
- `src/compare.js` — field/table/token diff logic
- `src/render.js` — self-contained HTML report renderer
- `src/adapters/tesseract.js` — starter adapter for Tesseract TSV
- `src/demo-data.js` — built-in realistic demo fixtures
- adapters for PaddleOCR, docTR, Azure Form Recognizer, Google Document AI, and generic LLM extraction JSON
- multi-page PDF support
- field-group scoring and rule-based severity tuning
- CLI batch mode for regression suites
- CI summary output for pull requests
- overlay heatmaps for token loss / confidence collapse
Most tooling in document AI is optimized for extraction.
ocrdrift is optimized for inspection, debugging, and regression review.
That makes it useful not only during model evaluation, but during day-to-day engineering work.
```
ocrdrift/
├─ bin/
├─ src/
│  ├─ adapters/
│  ├─ cli.js
│  ├─ compare.js
│  ├─ demo-data.js
│  └─ render.js
├─ examples/
├─ tests/
├─ docs/
└─ launch/
```
MIT
