BigOcrPDF

The complete OCR toolkit for Linux — turn scanned PDFs and images into searchable, editable documents.

BigOcrPDF is a powerful, all-in-one OCR application that adds searchable text layers to scanned PDFs, extracts text from images, and provides a full-featured PDF editor — all from a modern, native Linux interface.

Why BigOcrPDF?

AI-Powered OCR — Uses RapidOCR PP-OCRv5 with OpenVINO hardware acceleration for fast, accurate text recognition across 130+ languages
Edit, Merge & Organize PDFs — Reorder pages, rotate, delete, and combine multiple PDFs and images into a single document
Smart Preprocessing — Automatic perspective correction, deskew, dewarping, and illumination normalization — even photos of documents come out clean
Multiple Export Formats — Searchable PDF, PDF/A-2b archival, plain text, and ODF/ODT with layout-aware formatting
Screen Capture OCR — Select any region on screen and instantly extract text
Batch Processing — Process dozens of files at once with checkpoint/resume support
File Manager Integration — Right-click any PDF or image to OCR it directly

Key Features

PDF Editor

Manage your documents before and after OCR — no need for a separate tool.

Drag-and-drop page reordering with thumbnail previews
Rotate & flip pages — left, right, horizontal, and vertical
Delete pages you don't need
Merge files — combine pages from multiple PDFs and images into one document
Create PDFs from images — import JPEG, PNG, TIFF, WebP, RAW photos, and more
EXIF-aware import — automatically applies correct orientation from camera metadata
Zoom control — 50% to 200% thumbnail scaling with keyboard shortcuts
Select pages for OCR — choose exactly which pages to process
Context menu — right-click any page to save as image or PDF
Compress PDF — reduce file size with configurable quality and DPI
Split PDF — by page count or target file size
Undo support — revert page operations with Ctrl+Z
Window size persistence — remembers your preferred dimensions
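
The two split modes listed above (by page count or by target file size) can be pictured with a small planning sketch. This is only an illustration of the idea, not BigOcrPDF's actual code: the function names `split_by_page_count` and `split_by_size` are hypothetical, and the size-based variant assumes a per-page size estimate in bytes is already available.

```python
def split_by_page_count(total_pages: int, pages_per_file: int) -> list[list[int]]:
    """Group page indices into consecutive chunks of at most pages_per_file pages."""
    if total_pages < 0 or pages_per_file < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_file >= 1")
    return [
        list(range(start, min(start + pages_per_file, total_pages)))
        for start in range(0, total_pages, pages_per_file)
    ]


def split_by_size(page_sizes: list[int], max_bytes: int) -> list[list[int]]:
    """Greedily group page indices so each group's estimated size stays within max_bytes.

    A single page that is larger than max_bytes still gets a group of its own.
    """
    groups: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for index, size in enumerate(page_sizes):
        # Close the current group when adding this page would exceed the limit.
        if current and current_size + size > max_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


print(split_by_page_count(10, 4))                   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(split_by_size([40, 40, 40, 90, 10], 100))     # [[0, 1], [2], [3, 4]]
```

The greedy strategy keeps pages in their original order, which is usually what a reader expects from a split document.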

OCR Engine

State-of-the-art text recognition powered by deep learning.

RapidOCR PP-OCRv5 models with OpenVINO inference (ONNX fallback)
130+ languages across 12 script families: Latin, Chinese, Japanese, Korean, Arabic, Cyrillic, Greek, Devanagari, Tamil, Telugu, Thai, and more
4 precision levels — tune the trade-off between capturing hard-to-read text (tolerates more false positives) and strict recognition (avoids false positives but may miss low-legibility text)
Parallel processing — multi-core batch OCR with automatic worker scaling
Invisible text layer — preserves original page appearance while adding searchable text
Smart detection — auto-identifies image-only vs. mixed-content PDFs
Re-OCR support — replace existing text layers with improved recognition
Right-to-left text — full BiDi support for Arabic and Hebrew via fribidi
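
One way to picture the precision levels is as a confidence cut-off applied to each recognized text line. The sketch below is an assumption-laden illustration: the level names, the threshold values, and the `Recognition` type are all hypothetical, since this README does not state how the four levels are implemented.

```python
from dataclasses import dataclass

# Hypothetical values: the README names four precision levels but not their thresholds.
PRECISION_THRESHOLDS = {
    "low": 0.30,           # keeps hard-to-read text, tolerates more false positives
    "standard": 0.50,
    "precise": 0.70,
    "very_precise": 0.85,  # strict: may drop low-legibility text
}


@dataclass(frozen=True)
class Recognition:
    text: str
    confidence: float  # recognizer score in the range [0, 1]


def filter_by_precision(results: list[Recognition], level: str) -> list[Recognition]:
    """Keep only the results whose confidence reaches the level's threshold."""
    threshold = PRECISION_THRESHOLDS[level]
    return [r for r in results if r.confidence >= threshold]


sample = [Recognition("Invoice 2024", 0.95), Recognition("l1|", 0.40)]
print([r.text for r in filter_by_precision(sample, "low")])           # ['Invoice 2024', 'l1|']
print([r.text for r in filter_by_precision(sample, "very_precise")])  # ['Invoice 2024']
```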

Image Preprocessing

Automatically clean up scans and photos before OCR for maximum accuracy.

Perspective correction — 6-mode cascade that straightens photographed documents
Auto deskew — fixes tilted scans using morphological analysis + Hough transform
Baseline dewarp — per-line polynomial fitting to flatten curved text
Orientation detection — auto-correct 90°/180°/270° rotations
Illumination normalization — even out uneven lighting
Scanner effect — LAB-space background normalization
Denoising — bilateral filter and Non-Local Means
Enhance embedded images — apply corrections to images inside mixed-content pages
Every correction can be toggled individually in the settings dialogs, which explain each option with visual illustrations
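
To give a feel for the last stage of a Hough-based deskew, here is a minimal sketch that turns detected line segments into a single skew estimate. It assumes the segments have already been found (for example by a Hough transform) and uses the median angle, which is a common robust choice and not necessarily what BigOcrPDF does. The function name `estimate_skew_degrees` and the 45° cut-off are hypothetical.

```python
import math
from statistics import median


def estimate_skew_degrees(segments, max_abs_angle=45.0):
    """Estimate page skew from line segments given as (x1, y1, x2, y2) tuples.

    Angles are measured in image coordinates (y grows downward), so a positive
    result means text lines descend toward the right.
    """
    angles = []
    for x1, y1, x2, y2 in segments:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # Fold into (-90, 90] so the direction a segment was drawn in does not matter.
        if angle > 90:
            angle -= 180
        elif angle <= -90:
            angle += 180
        # Ignore steep segments (table borders, page edges); keep text-like lines.
        if abs(angle) <= max_abs_angle:
            angles.append(angle)
    # The median resists outliers better than the mean.
    return median(angles) if angles else 0.0


rise = 100 * math.tan(math.radians(2))  # a line tilted by 2 degrees
lines = [(0, 0, 100, rise), (100, rise, 0, 0), (0, 0, 0, 50)]
print(round(estimate_skew_degrees(lines), 3))  # 2.0
```

Rotating the page by the negative of this angle would then straighten the text lines.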

Export Options

Get your text out in the format you need.

Format	Description
Searchable PDF	Original pages with invisible OCR text layer
PDF/A-2b	ISO archival standard with metadata injection (preserves original images)
Custom Quality PDF	Choose JPEG quality: 30%, 50%, 70%, 85%, or 95%
Black & White (JBIG2)	Pure black-and-white output using JBIG2 — the most compact format for text-only documents
Plain Text (.txt)	Extracted text from all pages
ODF/ODT ⚠️	4 modes: formatted + images, images + simple text, formatted text only, or plain text (experimental — formatting quality may vary)

ODF export includes layout analysis: automatic paragraph/heading detection, table detection, image embedding, and proper page breaks. Note: ODF/ODT export is experimental and formatting results may not always be accurate.

Screen Capture & Image OCR

Extract text from anything on your screen.

Region capture — select an area and get the text instantly
Works with: Spectacle (KDE), GNOME Screenshot, Flameshot
Open any image — JPEG, PNG, WebP, TIFF, RAW formats (CR2, DNG, NEF, ARW, and more)
Copy to clipboard with one click
Standalone mode — run bigocrimage for a dedicated image OCR window

Batch Processing & Session Management

Handle large workloads efficiently.

Multi-file queue — add files via drag-and-drop or file chooser, with grid and list views
File information — right-click any file to view PDF metadata, fonts, images, and attachments
Checkpoint/resume — interrupted sessions automatically resume on next launch
Processing history — tracks file sizes, page counts, processing time, and success/failure
Cancel at any time, with proper cleanup
Auto-split output — configurable maximum file size (10MB–100MB)
Results page with per-file statistics, text viewer, and export actions
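
Checkpoint/resume can be understood as persisting which files are finished and skipping them on the next launch. The sketch below illustrates that idea only: the `Checkpoint` class and its JSON layout are hypothetical and are not taken from the project's `checkpoint_manager.py`.

```python
import json
import os
import tempfile
from pathlib import Path


class Checkpoint:
    """Tracks finished files in a small JSON file (hypothetical format)."""

    def __init__(self, path):
        self.path = Path(path)
        self.done = set()
        if self.path.exists():
            data = json.loads(self.path.read_text(encoding="utf-8"))
            self.done = set(data["done"])

    def mark_done(self, filename):
        """Record a finished file and persist the state."""
        self.done.add(filename)
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps({"done": sorted(self.done)}), encoding="utf-8")
        # Write to a temporary file first, then swap it into place in one step.
        os.replace(tmp, self.path)

    def pending(self, queue):
        """Return the files from queue that still need processing, in order."""
        return [name for name in queue if name not in self.done]


# Simulate an interrupted session followed by a restart.
with tempfile.TemporaryDirectory() as workdir:
    state_file = Path(workdir) / "session.json"
    queue = ["a.pdf", "b.pdf", "c.pdf"]

    first_run = Checkpoint(state_file)
    first_run.mark_done("a.pdf")         # the session stops after the first file

    second_run = Checkpoint(state_file)  # the next launch reloads the saved state
    print(second_run.pending(queue))     # ['b.pdf', 'c.pdf']
```

Writing to a temporary file and renaming it means an interruption during the save cannot leave a half-written checkpoint behind.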

Installation

From Source

git clone https://github.com/biglinux/bigocrpdf.git
cd bigocrpdf
pip install -e .

Dependencies

Package	Purpose
`python >= 3.10`	Runtime
`gtk4`, `libadwaita`	User interface
`python-rapidocr-pp-ocrv5`	OCR engine
`python-rapidocr-openvino`	Hardware-accelerated inference
`poppler-utils`	PDF image extraction (`pdfimages`, `pdfinfo`)
`ghostscript`	PDF/A-2b conversion
`python-opencv`	Image preprocessing
`python-numpy`	Array operations
`python-pillow`	Image format support
`python-odfpy`	ODF/ODT export
`fribidi`	BiDi text reordering (Arabic, Hebrew)
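
Before installing from source, it can help to check which dependencies are already present. The script below is a hedged convenience sketch and is not shipped with BigOcrPDF. It checks the well-known import names for OpenCV (`cv2`), NumPy, Pillow (`PIL`), and odfpy (`odf`), plus the `pdfimages`, `pdfinfo`, and Ghostscript (`gs`) executables. The RapidOCR and GTK bindings are left out because their import names are not stated in this README.

```python
import importlib.util
import shutil

# Import names for the Python packages in the table above.
PYTHON_MODULES = ["cv2", "numpy", "PIL", "odf"]
# Command-line tools provided by poppler-utils and ghostscript.
EXECUTABLES = ["pdfimages", "pdfinfo", "gs"]


def check_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return a list of human-readable descriptions of missing dependencies."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(f"python module: {name}")
    for name in executables:
        # shutil.which returns None when the command is not on PATH.
        if shutil.which(name) is None:
            missing.append(f"executable: {name}")
    return missing


problems = check_dependencies()
if problems:
    print("Missing dependencies:")
    for item in problems:
        print(f"  - {item}")
else:
    print("All checked dependencies were found.")
```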

Usage

GUI

bigocrpdf                     # PDF OCR interface
bigocrimage                   # Image OCR window

Command Line

bigocrpdf [OPTIONS] [FILES...]

Options:
  -v, --version     Show version and exit
  -d, --debug       Enable debug logging
  --verbose         Verbose output
  --image-mode      Launch in image OCR mode
  FILES             PDF or image files to open
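
For readers who want to script around the command line, the options above map naturally onto an `argparse` definition. The following is a sketch that mirrors the documented flags. It is not the project's actual parser, and the version string is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the options documented above."""
    parser = argparse.ArgumentParser(
        prog="bigocrpdf",
        description="OCR for scanned PDFs and images",
    )
    # "0.0.0" is a placeholder; the real version is not stated in this README.
    parser.add_argument("-v", "--version", action="version", version="%(prog)s 0.0.0")
    parser.add_argument("-d", "--debug", action="store_true", help="Enable debug logging")
    parser.add_argument("--verbose", action="store_true", help="Verbose output")
    parser.add_argument("--image-mode", action="store_true", help="Launch in image OCR mode")
    parser.add_argument("files", nargs="*", metavar="FILES", help="PDF or image files to open")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["--debug", "scan1.pdf", "scan2.pdf"])
print(args.debug, args.image_mode, args.files)  # True False ['scan1.pdf', 'scan2.pdf']
```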

File Manager Integration

Right-click a PDF → Recognize text in scanned PDF (OCR)
Right-click an image → Extract text from image (OCR)
KDE Dolphin context menu integration included

Screen Capture

Press Print Screen → select a region → export to "Extract text from image (OCR)".

Interface

UI Highlights

GTK4 + Libadwaita — clean, modern design following GNOME Human Interface Guidelines
Multi-page wizard — Settings → Processing → Results
Educational dialogs — image corrections, output, and advanced settings with SVG illustrations explaining each option
Grid / List view toggle — switch between compact grid and detailed list in the file queue
Context menus — right-click files in the queue or pages in the editor for quick actions
Toast notifications — non-intrusive status feedback
Before/After comparison — track file size changes after OCR
Window size persistence — remembers your preferred dimensions for all windows
Keyboard shortcuts — comprehensive shortcuts for all major actions
28 UI languages — Bulgarian, Chinese, Czech, Croatian, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hungarian, Icelandic, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian

Architecture

graph TD
    A[bigocrpdf] --> B[Application Layer]
    A --> C[Services Layer]
    A --> D[UI Layer]
    A --> E[Utils Layer]

    B --> B1[application.py<br/>Adw.Application entry point]
    B --> B2[window.py<br/>Main PDF OCR window]
    B --> B3[config.py<br/>Constants & configuration]

    C --> C1[processor.py<br/>OCR engine interface]
    C --> C2[screen_capture.py<br/>Screen capture & image OCR]
    C --> C3[export_service.py<br/>PDF / Text / ODF export]
    C --> C4[contour_analysis.py<br/>Document contour detection]
    C --> C5[perspective_correction.py<br/>Geometric correction]
    C --> C6[rapidocr_service/]

    C6 --> C6a[engine.py — Singleton OCR engine]
    C6 --> C6b[ocr_worker.py — Subprocess worker]
    C6 --> C6c[preprocessor.py — Image pipeline]
    C6 --> C6d[rotation.py — Orientation detection]

    D --> D1[image_ocr_window.py<br/>Standalone image OCR]
    D --> D2[settings_page.py<br/>OCR settings]
    D --> D3[conclusion_page.py<br/>Results & export]
    D --> D4[pdf_editor/<br/>PDF page editor]

    E --> E1[odf_exporter.py<br/>ODF document generation]
    E --> E2[layout_analyzer.py<br/>Document structure detection]
    E --> E3[checkpoint_manager.py<br/>Session resume support]

    style A fill:#4A86CF,color:#fff
    style C6 fill:#3776AB,color:#fff

Quality & Testing

311 automated tests covering OCR pipeline, PDF operations, export, preprocessing, editor logic, and utilities
Tested with Python 3.10 through 3.14 — supports the latest Python release
100% i18n coverage — all 28 languages fully translated (604 strings each)
Ruff-enforced code style and linting
WCAG 2.1 Level AA accessibility considerations

License

GPL-3.0-or-later
