BigOcrPDF

The complete OCR toolkit for Linux — turn scanned PDFs and images into searchable, editable documents.

License: GPL-3.0 · Version: 3.0.0 · Python 3.10+ · GTK4 + Libadwaita · Tests: 311


BigOcrPDF is a powerful, all-in-one OCR application that adds searchable text layers to scanned PDFs, extracts text from images, and provides a full-featured PDF editor — all from a modern, native Linux interface.

Why BigOcrPDF?

  • AI-Powered OCR — Uses RapidOCR PP-OCRv5 with OpenVINO hardware acceleration for fast, accurate text recognition across 130+ languages
  • Edit, Merge & Organize PDFs — Reorder pages, rotate, delete, and combine multiple PDFs and images into a single document
  • Smart Preprocessing — Automatic perspective correction, deskew, dewarping, and illumination normalization — even photos of documents come out clean
  • Multiple Export Formats — Searchable PDF, PDF/A-2b archival, plain text, and ODF/ODT with layout-aware formatting
  • Screen Capture OCR — Select any region on screen and instantly extract text
  • Batch Processing — Process dozens of files at once with checkpoint/resume support
  • File Manager Integration — Right-click any PDF or image to OCR it directly

Key Features

PDF Editor

Manage your documents before and after OCR — no need for a separate tool.

  • Drag-and-drop page reordering with thumbnail previews
  • Rotate & flip pages — left, right, horizontal, and vertical
  • Delete pages you don't need
  • Merge files — combine pages from multiple PDFs and images into one document
  • Create PDFs from images — import JPEG, PNG, TIFF, WebP, RAW photos, and more
  • EXIF-aware import — automatically applies correct orientation from camera metadata
  • Zoom control — 50% to 200% thumbnail scaling with keyboard shortcuts
  • Select pages for OCR — choose exactly which pages to process
  • Context menu — right-click any page to save as image or PDF
  • Compress PDF — reduce file size with configurable quality and DPI
  • Split PDF — by page count or target file size
  • Undo support — revert page operations with Ctrl+Z
  • Window size persistence — remembers your preferred dimensions
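The two split modes listed above (by page count or by target file size) can be illustrated with a small planning sketch. This is not the project's implementation: the function names and the per-page byte sizes are assumptions, and real PDFs share resources between pages, so actual output sizes would only approximate the plan.

```python
from typing import List, Sequence


def split_by_page_count(num_pages: int, pages_per_file: int) -> List[range]:
    """Return consecutive page ranges holding at most pages_per_file pages."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        range(start, min(start + pages_per_file, num_pages))
        for start in range(0, num_pages, pages_per_file)
    ]


def split_by_target_size(page_sizes: Sequence[int], max_bytes: int) -> List[List[int]]:
    """Greedily group page indices so each group stays within max_bytes.

    A single page larger than max_bytes still gets a group of its own.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(page_sizes):
        # Close the current group when adding this page would exceed the limit.
        if current and current_size + size > max_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

The greedy strategy keeps pages in their original order, which matters for documents; it does not try to minimize the number of output files.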

OCR Engine

State-of-the-art text recognition powered by deep learning.

  • RapidOCR PP-OCRv5 models with OpenVINO inference (ONNX fallback)
  • 130+ languages across 12 script families: Latin, Chinese, Japanese, Korean, Arabic, Cyrillic, Greek, Devanagari, Tamil, Telugu, Thai, and more
  • 4 precision levels — tune the trade-off between capturing hard-to-read text (tolerates more false positives) and strict recognition (avoids false positives but may miss low-legibility text)
  • Parallel processing — multi-core batch OCR with automatic worker scaling
  • Invisible text layer — preserves original page appearance while adding searchable text
  • Smart detection — auto-identifies image-only vs. mixed-content PDFs
  • Re-OCR support — replace existing text layers with improved recognition
  • Right-to-left text — full BiDi support for Arabic and Hebrew via fribidi
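One way to picture the precision levels is as a confidence cutoff applied to recognized text. The sketch below assumes that model; the level names, threshold values, and the `Detection` type are hypothetical and are not taken from BigOcrPDF or RapidOCR.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical level names and thresholds, chosen only for illustration.
PRECISION_THRESHOLDS: Dict[str, float] = {
    "permissive": 0.30,   # keeps hard-to-read text, tolerates false positives
    "balanced": 0.50,
    "strict": 0.70,
    "very_strict": 0.85,  # avoids false positives, may miss faint text
}


@dataclass(frozen=True)
class Detection:
    text: str
    confidence: float  # recognizer score in the range [0, 1]


def filter_detections(detections: Iterable[Detection], level: str) -> List[Detection]:
    """Keep only detections whose confidence meets the level's threshold."""
    threshold = PRECISION_THRESHOLDS[level]
    return [d for d in detections if d.confidence >= threshold]
```

A lower threshold keeps more low-legibility text at the cost of spurious matches; a higher one does the opposite, which is the trade-off the feature list describes.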

Image Preprocessing

Automatically clean up scans and photos before OCR for maximum accuracy.

  • Perspective correction — 6-mode cascade that straightens photographed documents
  • Auto deskew — fixes tilted scans using morphological analysis + Hough transform
  • Baseline dewarp — per-line polynomial fitting to flatten curved text
  • Orientation detection — auto-correct 90°/180°/270° rotations
  • Illumination normalization — even out uneven lighting
  • Scanner effect — LAB-space background normalization
  • Denoising — bilateral filter and Non-Local Means
  • Enhance embedded images — apply corrections to images inside mixed-content pages
  • Each correction can be toggled individually from educational settings dialogs with visual illustrations
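To give a feel for the deskew step, here is a minimal sketch of how a skew angle might be derived once line segments have been detected (for example by a Hough transform). It covers only the final aggregation step, uses the standard library alone, and the function name and the 45° cutoff are assumptions rather than the project's actual logic.

```python
import math
from statistics import median
from typing import Iterable, Tuple

Segment = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def estimate_skew_degrees(segments: Iterable[Segment], max_abs_angle: float = 45.0) -> float:
    """Estimate page skew as the median angle of near-horizontal segments."""
    angles = []
    for x1, y1, x2, y2 in segments:
        if x1 == x2 and y1 == y2:
            continue  # zero-length segment carries no direction
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # Fold into (-90, 90] so the drawing direction of a segment is irrelevant.
        if angle > 90:
            angle -= 180
        elif angle <= -90:
            angle += 180
        # Ignore steep segments such as table borders or page edges.
        if abs(angle) <= max_abs_angle:
            angles.append(angle)
    return median(angles) if angles else 0.0
```

The median is used instead of the mean so that a few stray segments do not pull the estimate away from the dominant text direction.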

Export Options

Get your text out in the format you need.

| Format | Description |
| --- | --- |
| Searchable PDF | Original pages with invisible OCR text layer |
| PDF/A-2b | ISO archival standard with metadata injection (preserves original images) |
| Custom Quality PDF | Choose JPEG quality: 30%, 50%, 70%, 85%, or 95% |
| Black & White (JBIG2) | Pure black-and-white output using JBIG2, the most compact format for text-only documents |
| Plain Text (.txt) | Extracted text from all pages |
| ODF/ODT ⚠️ | 4 modes: formatted + images, images + simple text, formatted text only, or plain text (experimental; formatting quality may vary) |

ODF export includes layout analysis: automatic paragraph/heading detection, table detection, image embedding, and proper page breaks. Note: ODF/ODT export is experimental and formatting results may not always be accurate.
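Since Ghostscript is listed as the tool behind PDF/A-2b conversion, the sketch below shows what such an invocation could look like when assembled from Python. The flags are standard Ghostscript options, but the exact command BigOcrPDF runs is not documented here, so treat the function name and flag selection as assumptions. A fully conformant conversion usually also needs a PDF/A definition file with an ICC profile, which this sketch omits.

```python
from pathlib import Path
from typing import List


def build_pdfa2_command(source: Path, target: Path) -> List[str]:
    """Build a Ghostscript argument list for a PDF/A-2 conversion (not executed here)."""
    return [
        "gs",
        "-dPDFA=2",                       # request PDF/A-2 output
        "-dBATCH",                        # exit after processing the input
        "-dNOPAUSE",                      # do not prompt between pages
        "-sDEVICE=pdfwrite",              # write a PDF file
        "-sColorConversionStrategy=RGB",  # convert colors to one device space
        "-dPDFACompatibilityPolicy=1",    # drop non-conforming features instead of aborting
        f"-sOutputFile={target}",
        str(source),
    ]
```

The list can be passed to `subprocess.run(command, check=True)` on a system where `gs` is installed.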

Screen Capture & Image OCR

Extract text from anything on your screen.

  • Region capture — select an area and get the text instantly
  • Works with: Spectacle (KDE), GNOME Screenshot, Flameshot
  • Open any image — JPEG, PNG, WebP, TIFF, RAW formats (CR2, DNG, NEF, ARW, and more)
  • Copy to clipboard with one click
  • Standalone mode — run bigocrimage for a dedicated image OCR window

Batch Processing & Session Management

Handle large workloads efficiently.

  • Multi-file queue — add files via drag-and-drop or file chooser, with grid and list views
  • File information — right-click any file to view PDF metadata, fonts, images, and attachments
  • Checkpoint/resume — interrupted sessions automatically resume on next launch
  • Processing history — tracks file sizes, page counts, processing time, and success/failure
  • Cancel at any time with automatic cleanup
  • Auto-split output — configurable maximum file size (10MB–100MB)
  • Results page with per-file statistics, text viewer, and export actions
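Checkpoint/resume can be pictured as a small file that records which inputs are already finished, so that a restarted session skips them. The following is a minimal sketch under that assumption; the JSON layout and the function names are hypothetical and do not describe the format used by `checkpoint_manager.py`.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Iterable, List, Set


def load_done(checkpoint: Path) -> Set[str]:
    """Return the set of file names recorded as finished, or an empty set."""
    if not checkpoint.exists():
        return set()
    return set(json.loads(checkpoint.read_text(encoding="utf-8")))


def save_done(checkpoint: Path, done: Set[str]) -> None:
    """Write the checkpoint via a temporary file, then replace it in one step."""
    fd, tmp_name = tempfile.mkstemp(dir=checkpoint.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(sorted(done), handle)
    # os.replace swaps the file atomically, so a crash never leaves it half-written.
    os.replace(tmp_name, checkpoint)


def pending_files(queue: Iterable[str], checkpoint: Path) -> List[str]:
    """Return queued files that are not yet marked as finished, in queue order."""
    done = load_done(checkpoint)
    return [name for name in queue if name not in done]
```

Calling `save_done` after each completed file means an interruption loses at most the file that was in progress.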

Installation

From Source

git clone https://github.com/biglinux/bigocrpdf.git
cd bigocrpdf
pip install -e .

Dependencies

| Package | Purpose |
| --- | --- |
| python >= 3.10 | Runtime |
| gtk4, libadwaita | User interface |
| python-rapidocr-pp-ocrv5 | OCR engine |
| python-rapidocr-openvino | Hardware-accelerated inference |
| poppler-utils | PDF image extraction (pdfimages, pdfinfo) |
| ghostscript | PDF/A-2b conversion |
| python-opencv | Image preprocessing |
| python-numpy | Array operations |
| python-pillow | Image format support |
| python-odfpy | ODF/ODT export |
| fribidi | BiDi text reordering (Arabic, Hebrew) |
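Before running from source, it can help to check which of these dependencies are visible on the current system. The sketch below is an optional helper, not part of the project. It assumes the usual executable names (`pdfimages`, `pdfinfo`, `gs`) and Python import names (`cv2`, `numpy`, `PIL`, `odf`); the RapidOCR packages are left out because their import names are not stated here.

```python
import importlib.util
import shutil
from typing import Dict

# Executables: pdfimages and pdfinfo come from poppler-utils, gs from ghostscript.
REQUIRED_TOOLS = ("pdfimages", "pdfinfo", "gs")
# Import names: cv2 (OpenCV), numpy, PIL (Pillow), odf (odfpy).
REQUIRED_MODULES = ("cv2", "numpy", "PIL", "odf")


def check_dependencies() -> Dict[str, bool]:
    """Map each tool or module name to True if it can be found, else False."""
    status = {tool: shutil.which(tool) is not None for tool in REQUIRED_TOOLS}
    status.update(
        {name: importlib.util.find_spec(name) is not None for name in REQUIRED_MODULES}
    )
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:10s} {'ok' if found else 'MISSING'}")
```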

Usage

GUI

bigocrpdf                     # PDF OCR interface
bigocrimage                   # Image OCR window

Command Line

bigocrpdf [OPTIONS] [FILES...]

Options:
  -v, --version     Show version and exit
  -d, --debug       Enable debug logging
  --verbose         Verbose output
  --image-mode      Launch in image OCR mode
  FILES             PDF or image files to open
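
For readers who want to script around the CLI, the documented options map naturally onto `argparse`. This is a sketch that mirrors the option list above; it is not the project's actual parser, and the help strings and the `build_parser` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options documented for bigocrpdf."""
    parser = argparse.ArgumentParser(prog="bigocrpdf")
    parser.add_argument("-v", "--version", action="version", version="%(prog)s 3.0.0")
    parser.add_argument("-d", "--debug", action="store_true", help="enable debug logging")
    parser.add_argument("--verbose", action="store_true", help="verbose output")
    parser.add_argument("--image-mode", action="store_true", help="launch in image OCR mode")
    parser.add_argument("files", nargs="*", metavar="FILES", help="PDF or image files to open")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```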

File Manager Integration

  • Right-click a PDF → Recognize text in scanned PDF (OCR)
  • Right-click an image → Extract text from image (OCR)
  • KDE Dolphin context menu integration included

Screen Capture

Press Print Screen → select a region → export to "Extract text from image (OCR)".


Interface

UI Highlights

  • GTK4 + Libadwaita — clean, modern design following GNOME Human Interface Guidelines
  • Multi-page wizard — Settings → Processing → Results
  • Educational dialogs — image corrections, output, and advanced settings with SVG illustrations explaining each option
  • Grid / List view toggle — switch between compact grid and detailed list in the file queue
  • Context menus — right-click files in the queue or pages in the editor for quick actions
  • Toast notifications — non-intrusive status feedback
  • Before/After comparison — track file size changes after OCR
  • Window size persistence — remembers your preferred dimensions for all windows
  • Keyboard shortcuts — comprehensive shortcuts for all major actions
  • 28 UI languages — Bulgarian, Chinese, Czech, Croatian, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hungarian, Icelandic, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian

Architecture

graph TD
    A[bigocrpdf] --> B[Application Layer]
    A --> C[Services Layer]
    A --> D[UI Layer]
    A --> E[Utils Layer]

    B --> B1[application.py<br/>Adw.Application entry point]
    B --> B2[window.py<br/>Main PDF OCR window]
    B --> B3[config.py<br/>Constants & configuration]

    C --> C1[processor.py<br/>OCR engine interface]
    C --> C2[screen_capture.py<br/>Screen capture & image OCR]
    C --> C3[export_service.py<br/>PDF / Text / ODF export]
    C --> C4[contour_analysis.py<br/>Document contour detection]
    C --> C5[perspective_correction.py<br/>Geometric correction]
    C --> C6[rapidocr_service/]

    C6 --> C6a[engine.py — Singleton OCR engine]
    C6 --> C6b[ocr_worker.py — Subprocess worker]
    C6 --> C6c[preprocessor.py — Image pipeline]
    C6 --> C6d[rotation.py — Orientation detection]

    D --> D1[image_ocr_window.py<br/>Standalone image OCR]
    D --> D2[settings_page.py<br/>OCR settings]
    D --> D3[conclusion_page.py<br/>Results & export]
    D --> D4[pdf_editor/<br/>PDF page editor]

    E --> E1[odf_exporter.py<br/>ODF document generation]
    E --> E2[layout_analyzer.py<br/>Document structure detection]
    E --> E3[checkpoint_manager.py<br/>Session resume support]

    style A fill:#4A86CF,color:#fff
    style C6 fill:#3776AB,color:#fff
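
The diagram labels `engine.py` as a singleton OCR engine. A common way to get that behavior is a lazily created, lock-protected shared instance, so that expensive model loading happens only once. The sketch below shows the pattern in general terms; the `EngineHolder` class and the `loader` callable are hypothetical and do not reflect the actual contents of `engine.py`.

```python
import threading
from typing import Any, Callable, Optional


class EngineHolder:
    """Hold one shared engine instance, created on first use."""

    _instance: Optional[Any] = None
    _lock = threading.Lock()

    @classmethod
    def get(cls, loader: Callable[[], Any]) -> Any:
        """Return the shared engine, calling loader() only the first time."""
        if cls._instance is None:
            with cls._lock:
                # Re-check inside the lock in case another thread won the race.
                if cls._instance is None:
                    cls._instance = loader()
        return cls._instance
```

Every caller that goes through `EngineHolder.get` receives the same object, so model weights are loaded once per process.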

Quality & Testing

  • 311 automated tests covering OCR pipeline, PDF operations, export, preprocessing, editor logic, and utilities
  • Tested with Python 3.10 through 3.14 — supports the latest Python release
  • 100% i18n coverage — all 28 languages fully translated (604 strings each)
  • Ruff-enforced code style and linting
  • WCAG 2.1 Level AA accessibility considerations

License

GPL-3.0-or-later
