Byte-Biscuit/simple-smart-ocr

Simple Smart OCR

简体中文 | English

A powerful and easy-to-use OCR and QR code recognition desktop application built with Python and PySide6.

License: MIT | Python | PySide6 | uv

📸 Application Interface

Application Main Interface


💡 Project Background

Extracting text from screenshots is a frequent need in daily work. The following local solutions were tried first:

  • PaddleOCR Local Deployment: Baidu PaddlePaddle's OCR model
  • Hugging Face Models: open-source OCR models downloaded and run locally

In practice, these approaches had drawbacks:

  • ⚠️ High Resource Consumption: Model execution requires significant memory and computational resources
  • ⚠️ Large Package Size: Applications reach hundreds of MB or even GB when including model files
  • ⚠️ High Usage Barrier: Regular users need to configure CUDA, download models, and perform other complex operations

To make the tool more accessible and lightweight, an API-driven approach was ultimately chosen:

  • Lightweight: Small application size, no need to download large model files
  • Zero Configuration: Regular users only need to provide an API Key to get started
  • High Performance: Leveraging cloud computing power for fast and accurate recognition
  • Free Quota: Baidu Cloud offers 1,000 free calls per month, and Google Gemini offers 60 free calls per minute

This project focuses on providing an easy-to-use OCR tool, not a local model inference solution. If you need completely offline OCR capabilities, PaddleOCR is recommended.


🎯 Quick Overview

graph LR
    A[Image Input] --> B{Select Recognition Method}
    B -->|OCR Text Recognition| C[Choose Engine]
    B -->|QR Code Recognition| D[Offline Decoder]

    C --> E[Baidu Cloud OCR API]
    C --> F[Google Gemini LLM]
    C --> G[Other LLM]

    E --> H[Recognition Result]
    F --> H
    G --> H
    D --> H

    H --> I[Auto Copy to Clipboard]

    style A fill:#e1f5ff
    style H fill:#c8e6c9
    style I fill:#fff9c4
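
The flow in the diagram above can be read as a small dispatch step: pick a recognition method, pick an engine if the method is OCR, then copy the result to the clipboard. The following is a minimal sketch of that idea, not the project's actual code. The function name `recognize`, the `Engine` type, and the string keys `"ocr"` and `"qr"` are all hypothetical, and the engines are passed in as plain callables so that no real API is assumed.

```python
from typing import Callable, Dict

# Hypothetical type: an engine maps raw image bytes to recognized text.
Engine = Callable[[bytes], str]


def recognize(
    image: bytes,
    method: str,
    engines: Dict[str, Engine],
    qr_decoder: Engine,
    copy_to_clipboard: Callable[[str], None],
    engine_name: str = "",
) -> str:
    """Route an image to an OCR engine or the QR decoder, then copy the result."""
    if method == "qr":
        # QR codes are decoded offline, so no engine or API key is involved.
        result = qr_decoder(image)
    elif method == "ocr":
        if engine_name not in engines:
            raise ValueError(f"unknown OCR engine: {engine_name!r}")
        result = engines[engine_name](image)
    else:
        raise ValueError(f"unknown recognition method: {method!r}")
    # Every path ends the same way: the result goes to the clipboard.
    copy_to_clipboard(result)
    return result
```

Passing the clipboard function and the engines as arguments keeps the routing logic independent of any GUI toolkit or cloud client, so it can be exercised with stubs such as `recognize(b"...", "qr", {}, lambda img: "decoded", print)`.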

✨ How to Use

Download and Run

  1. Visit the Releases page
  2. Download the latest version package (simple-smart-ocr-vX.X.X.zip)
  3. Extract and run SimpleSmartOCR.exe

💡 Tip: QR code recognition works out of the box, no configuration needed!


Configure API Key (Required for OCR)

flowchart LR
    A[Start] --> B{Choose Engine}

    B -->|Baidu Cloud OCR| C[fa:fa-cloud Apply for Baidu API]
    B -->|Google Gemini| D[fa:fa-brain Apply for Gemini API]

    C --> E[1000 free calls/month]
    D --> F[60 free calls/minute]

    E --> G[Enter in Settings]
    F --> G

    G --> H[fa:fa-check Start Recognition]

    style A fill:#e3f2fd
    style E fill:#c8e6c9
    style F fill:#c8e6c9
    style H fill:#fff9c4
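
The per-minute quota in the diagram above suggests that a client may want to count its own calls before sending a request. The sketch below shows one way to do that with a sliding-window counter. It is an illustration only: the class name `CallBudget` and its methods are hypothetical and not part of this project, and the limit of 60 is simply the figure quoted above, which may differ from the provider's current terms.

```python
import time
from collections import deque
from typing import Callable, Deque


class CallBudget:
    """Allow at most `limit` calls within any `window_seconds` span."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True, or return False if the budget is spent."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.limit:
            return False
        self._calls.append(now)
        return True


# Assumed figure from the text above: 60 calls per minute.
gemini_budget = CallBudget(limit=60, window_seconds=60)
```

A monthly quota such as the 1,000 calls quoted for Baidu Cloud would need a counter that survives restarts, so an in-memory window like this one is only suitable for the per-minute case.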

📂 Directory Monitoring

Features

After clicking Select Image Directory, the application automatically monitors file changes in that directory:

  • File List: Image files are sorted by creation date in descending order (newest first)
  • Image Preview: Single-click or double-click an image to view it in the preview area; click the preview to enlarge it
  • Execute Recognition: After double-clicking to select an image, click the corresponding recognition button
    • Text Recognition: Extract text content from images
    • QR Code Recognition: Decode QR code information from images
  • Auto Monitoring: New images added to the directory automatically appear in the file list
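
The file list and auto-monitoring behavior described above can be approximated with the standard library alone. The sketch below is an assumption about how such a listing might work, not the application's implementation, which may rely on a GUI file watcher instead. The function names, the set of image suffixes, and the use of modification time (`st_mtime`) as a portable stand-in for creation date are all choices made for this example.

```python
from pathlib import Path
from typing import List, Set

# Assumed set of image extensions; the application's real list may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".webp"}


def list_images(directory: Path) -> List[Path]:
    """Return image files in `directory`, newest first."""
    images = [
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    # st_mtime is used here because creation time is not exposed
    # consistently across operating systems.
    return sorted(images, key=lambda p: p.stat().st_mtime, reverse=True)


def new_images(directory: Path, seen: Set[Path]) -> List[Path]:
    """Return images not seen on earlier calls and record them in `seen`."""
    fresh = [p for p in list_images(directory) if p not in seen]
    seen.update(fresh)
    return fresh
```

Calling `new_images` on a timer with the same `seen` set gives a simple polling monitor: the first call returns every image in the directory, and later calls return only files added since.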

🤝 Contributing

Community contributions are welcome! You can participate by:

  • 🐛 Submitting bug reports
  • 💡 Proposing new features
  • 🔧 Submitting code improvements
  • 📖 Improving documentation
  • 🌍 Adding new language translations
  • ⭐ Starring the project to show support

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
