ocrdrift

Stop debugging OCR regressions by diffing JSON with your eyeballs.

ocrdrift is a local-first CLI and visual report generator for comparing two OCR or document-extraction runs. It highlights exactly what changed — fields, confidences, token samples, and table rows — and overlays drift directly on the source page.

Built for OCR, VLM, IDP, and document AI engineers who need to answer questions like:

  • Which fields broke after a preprocessing change?
  • Did my prompt/model upgrade improve extraction or just move errors around?
  • Which invoice/receipt/table regions became unstable?
  • Where did confidence collapse even when values still matched?

Why this exists

Document AI teams have parsers, OCR engines, benchmarks, and evaluation notebooks.

What they often do not have is a fast, developer-friendly way to debug extraction drift visually.

That gap gets painful when:

  • a layout tweak causes one field to disappear
  • a VLM “mostly works” but silently corrupts tax or totals
  • a table parser changes row alignment after a model upgrade
  • a preprocessing step improves one document family and breaks another

ocrdrift is built to be the missing inspection layer between raw extraction output and production confidence.

First-glance demo

npm install
npm run demo
open out/demo-report/index.html

That generates a side-by-side report comparing a clean invoice extraction against a noisy/rotated variant.

What you get

  • visual HTML report with side-by-side document pages
  • field drift table with changed / missing / added status
  • confidence deltas to catch “looks right but got weaker” regressions
  • table drift summary for row-level parsing changes
  • token sample diff to inspect OCR damage quickly
  • Tesseract TSV adapter so existing OCR outputs can be converted immediately
  • zero paid APIs and no external services required

Example report

(screenshot: ocrdrift visual report)

After npm run demo, open:

  • out/demo-report/index.html
  • out/demo-report/report.json

The demo intentionally shows:

  • "Industrial" → "lndustrial" (capital I misread as lowercase l)
  • "Layout Analyzer" → "Layout AnaIyzer" (lowercase l misread as capital I)
  • "26.82" → "2G.82" (digit 6 misread as G)
  • confidence drops around the due date and totals region

This is the kind of regression report engineers actually need during pipeline iteration.

Install

npm install
npm link

Then use:

ocrdrift --help

CLI

1) Run the built-in demo

ocrdrift demo --out ./out/demo-report

2) Compare two extraction JSON files

ocrdrift compare \
  --a ./examples/invoice-baseline.json \
  --b ./examples/invoice-rotated.json \
  --out ./out/my-report

3) Convert Tesseract TSV into ocrdrift JSON

ocrdrift adapt:tesseract \
  --input ./examples/sample.tesseract.tsv \
  --image ./examples/invoice-baseline.svg \
  --out ./out/sample.from-tesseract.json
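The adapter's job is mechanical: Tesseract TSV has word-level rows (level 5) with `left`/`top`/`width`/`height` pixel boxes and a 0–100 `conf` column. A minimal sketch of that conversion, assuming standard Tesseract column names (the function name here is illustrative, not the actual `src/adapters/tesseract.js` API):

```javascript
// Convert Tesseract TSV text into ocrdrift-style token objects.
// Assumes the stock Tesseract header: level, page_num, ..., left, top,
// width, height, conf, text.
function tesseractTsvToTokens(tsv) {
  const lines = tsv.trim().split("\n");
  const header = lines[0].split("\t");
  const col = (name) => header.indexOf(name);
  const tokens = [];
  for (const line of lines.slice(1)) {
    const cells = line.split("\t");
    // Only level-5 rows are words; lower levels are page/block/line containers.
    if (cells[col("level")] !== "5") continue;
    const text = cells[col("text")];
    if (!text || !text.trim()) continue;
    tokens.push({
      id: `tok-${tokens.length + 1}`,
      text,
      confidence: Number(cells[col("conf")]) / 100, // Tesseract conf is 0–100
      page: Number(cells[col("page_num")]),
      bbox: {
        x: Number(cells[col("left")]),
        y: Number(cells[col("top")]),
        width: Number(cells[col("width")]),
        height: Number(cells[col("height")]),
      },
    });
  }
  return tokens;
}
```

The same pattern applies to any engine that emits word boxes plus confidences.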

Input schema

ocrdrift uses a small open JSON format so you can adapt any OCR or extraction pipeline into it.

{
  "schemaVersion": "ocrdrift/v1",
  "documentId": "invoice-001",
  "engine": "my-ocr-pipeline",
  "imagePath": "./page-1.png",
  "pages": [{ "number": 1, "width": 1000, "height": 1400, "imagePath": "./page-1.png" }],
  "tokens": [
    {
      "id": "tok-1",
      "text": "Invoice",
      "confidence": 0.98,
      "page": 1,
      "bbox": { "x": 100, "y": 80, "width": 120, "height": 32 }
    }
  ],
  "fields": {
    "invoice_number": {
      "value": "INV-2026-0319",
      "confidence": 0.96,
      "page": 1,
      "bbox": { "x": 680, "y": 132, "width": 210, "height": 34 }
    }
  },
  "tables": [
    {
      "name": "line_items",
      "rows": [{ "description": "Vision SDK", "qty": "2", "unit_price": "149.00", "line_total": "298.00" }]
    }
  ]
}
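Adapting your own pipeline means wrapping its output into this shape. A hedged sketch, where `myPipelineResult` and its field names (`docId`, `words`, etc.) are hypothetical stand-ins for whatever your extractor emits; only the returned object follows the `ocrdrift/v1` schema above:

```javascript
// Wrap a hypothetical pipeline result into the ocrdrift/v1 document shape.
function toOcrdriftDoc(myPipelineResult) {
  return {
    schemaVersion: "ocrdrift/v1",
    documentId: myPipelineResult.docId,
    engine: "my-ocr-pipeline",
    imagePath: myPipelineResult.image,
    pages: [
      {
        number: 1,
        width: myPipelineResult.pageWidth,
        height: myPipelineResult.pageHeight,
        imagePath: myPipelineResult.image,
      },
    ],
    tokens: myPipelineResult.words.map((w, i) => ({
      id: `tok-${i + 1}`,
      text: w.text,
      confidence: w.conf,
      page: 1,
      bbox: { x: w.x, y: w.y, width: w.w, height: w.h },
    })),
    // Fields are assumed to already be { name: { value, confidence, page, bbox } }.
    fields: myPipelineResult.fields,
    tables: [],
  };
}
```

Write two such documents (run A and run B) and feed them to `ocrdrift compare`.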

Who should use this

  • OCR platform teams
  • Document parsing / IDP engineers
  • invoice and receipt extraction teams
  • VLM prompt / pipeline experimenters
  • benchmark/evaluation owners who need a visual debugging layer

Why developers might adopt it quickly

  • tiny install surface
  • no hosted backend
  • works with existing OCR output after light adaptation
  • demoable in under 2 minutes
  • obvious value on first run

Architecture

OCR / VLM / parser output A ─┐
                             ├─> ocrdrift compare ──> normalized diff model ──> HTML report + JSON summary
OCR / VLM / parser output B ─┘

Tesseract TSV ──> ocrdrift adapt:tesseract ──> ocrdrift JSON

Core modules:

  • src/compare.js — field/table/token diff logic
  • src/render.js — self-contained HTML report renderer
  • src/adapters/tesseract.js — starter adapter for Tesseract TSV
  • src/demo-data.js — built-in realistic demo fixtures
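The field diff at the heart of the compare step can be sketched as follows. This mirrors the changed / missing / added statuses and confidence deltas described above, but is an illustrative simplification, not the exact `src/compare.js` internals:

```javascript
// Diff two { name: { value, confidence } } field maps into report rows.
function diffFields(fieldsA, fieldsB) {
  const names = new Set([...Object.keys(fieldsA), ...Object.keys(fieldsB)]);
  const rows = [];
  for (const name of names) {
    const a = fieldsA[name];
    const b = fieldsB[name];
    if (a && !b) {
      rows.push({ field: name, status: "missing" });
    } else if (!a && b) {
      rows.push({ field: name, status: "added" });
    } else {
      rows.push({
        field: name,
        status: a.value === b.value ? "unchanged" : "changed",
        // Confidence delta catches "looks right but got weaker" regressions
        // even when the values still match.
        confidenceDelta: +(b.confidence - a.confidence).toFixed(3),
      });
    }
  }
  return rows;
}
```

Note the `unchanged`-but-negative-`confidenceDelta` case: that is exactly the regression class a value-only diff would miss.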

Roadmap

  • adapters for PaddleOCR, docTR, Azure Form Recognizer, Google Document AI, and generic LLM extraction JSON
  • multi-page PDF support
  • field-group scoring and rule-based severity tuning
  • CLI batch mode for regression suites
  • CI summary output for pull requests
  • overlay heatmaps for token loss / confidence collapse

Competitive angle

Most tooling in document AI is optimized for extraction.

ocrdrift is optimized for inspection, debugging, and regression review.

That makes it useful not only during model evaluation, but during day-to-day engineering work.

Repo structure

ocrdrift/
├─ bin/
├─ src/
│  ├─ adapters/
│  ├─ cli.js
│  ├─ compare.js
│  ├─ demo-data.js
│  └─ render.js
├─ examples/
├─ tests/
├─ docs/
└─ launch/

License

MIT
