A powerful and easy-to-use OCR and QR code recognition desktop application built with Python and PySide6.
In daily work, text recognition from screenshots is frequently needed. Initially, the following local solutions were attempted:
- PaddleOCR Local Deployment: Baidu PaddlePaddle's OCR model
- Hugging Face Models: Download excellent open-source OCR models for local execution
However, practical application revealed:
- ⚠️ High Resource Consumption: Model execution requires significant memory and computational resources
- ⚠️ Large Package Size: Applications reach hundreds of MB or even GB when including model files
- ⚠️ High Usage Barrier: Regular users need to configure CUDA, download models, and perform other complex operations
To make the tool more accessible and lightweight, an API-driven approach was ultimately chosen:
- ✅ Lightweight: Small application size, no need to download large model files
- ✅ Zero Configuration: Regular users only need to provide an API key to get started
- ✅ High Performance: Leveraging cloud computing power for fast and accurate recognition
- ✅ Free Quota: Baidu Cloud offers 1000 free calls per month; Google Gemini offers 60 free calls per minute
This project focuses on providing an easy-to-use OCR tool, not a local model inference solution. If you need completely offline OCR capabilities, PaddleOCR is recommended.
graph LR
A[Image Input] --> B{Select Recognition Method}
B -->|OCR Text Recognition| C[Choose Engine]
B -->|QR Code Recognition| D[Offline Decoder]
C --> E[Baidu Cloud OCR API]
C --> F[Google Gemini LLM]
C --> G[Other LLM]
E --> H[Recognition Result]
F --> H
G --> H
D --> H
H --> I[Auto Copy to Clipboard]
style A fill:#e1f5ff
style H fill:#c8e6c9
style I fill:#fff9c4
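The flow in the diagram above can be sketched in a few lines of Python. This is only an illustration of the routing idea, not the project's actual code: the `Recognizer` class, the `Engine` type, the mode strings `"ocr"` and `"qr"`, and the engine names are all hypothetical, and the engines are plain callables so that real API clients or stubs can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: an engine takes raw image bytes and returns recognized text.
Engine = Callable[[bytes], str]


@dataclass
class Recognizer:
    """Routes an image to an OCR engine or to the offline QR decoder."""

    ocr_engines: Dict[str, Engine]            # keyed by a placeholder engine name
    qr_decoder: Engine                        # offline path, needs no API key
    copy_to_clipboard: Callable[[str], None]  # injected so it can be stubbed

    def recognize(self, image: bytes, mode: str, engine: str = "") -> str:
        if mode == "qr":
            result = self.qr_decoder(image)
        elif mode == "ocr":
            if engine not in self.ocr_engines:
                raise ValueError(f"unknown OCR engine: {engine!r}")
            result = self.ocr_engines[engine](image)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        # Every path ends the same way: the result is copied automatically.
        self.copy_to_clipboard(result)
        return result
```

Passing the clipboard function in as a parameter keeps the routing logic independent of any GUI toolkit, which makes it easy to exercise with stub engines.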
- Visit the Releases page
- Download the latest version package (simple-smart-ocr-vX.X.X.zip)
- Extract and run SimpleSmartOCR.exe
💡 Tip: QR code recognition works out of the box, no configuration needed!
flowchart LR
A[Start] --> B{Choose Engine}
B -->|Baidu Cloud OCR| C[fa:fa-cloud Apply for Baidu API]
B -->|Google Gemini| D[fa:fa-brain Apply for Gemini API]
C --> E[1000 free calls/month]
D --> F[60 free calls/minute]
E --> G[Enter in Settings]
F --> G
G --> H[fa:fa-check Start Recognition]
style A fill:#e3f2fd
style E fill:#c8e6c9
style F fill:#c8e6c9
style H fill:#fff9c4
After clicking Select Image Directory, the application automatically monitors file changes in that directory:
- File List: Image files are sorted by creation date in descending order (newest first)
- Image Preview: Single or double-click an image to view it in the preview area; click the preview to enlarge
- Execute Recognition: After double-clicking to select an image, click the corresponding recognition button:
  - Text Recognition: Extract text content from images
  - QR Code Recognition: Decode QR code information from images
- Auto Monitoring: New images added to the directory automatically appear in the file list
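
The file list and auto-monitoring behavior described above could be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the application's implementation: the function names `list_images` and `find_new_images` and the set of image suffixes are made up for the example, and it sorts by modification time (`st_mtime`) as a portable stand-in for creation time, which is not available consistently across platforms.

```python
from pathlib import Path
from typing import List, Set

# Assumed set of image extensions; the real application may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".webp"}


def list_images(directory: Path) -> List[Path]:
    """Return image files in `directory`, newest first."""
    images = [
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    # st_mtime is used here as a stand-in for the creation date.
    return sorted(images, key=lambda p: p.stat().st_mtime, reverse=True)


def find_new_images(known: Set[Path], directory: Path) -> List[Path]:
    """Return images that are not in `known`, newest first."""
    return [p for p in list_images(directory) if p not in known]
```

Calling `find_new_images` on a timer with the set of already listed paths gives a simple polling form of monitoring; an event-based file watcher would avoid the repeated directory scans.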
Community contributions are welcome! You can participate by:
- 🐛 Submitting bug reports
- 💡 Proposing new features
- 🔧 Submitting code improvements
- 📖 Improving documentation
- 🌍 Adding new language translations
- ⭐ Starring the project to show support
This project is licensed under the MIT License - see the LICENSE file for details.
