Next-Gen Cognitive OCR for Rise of Kingdoms
Key Features • Architecture • Getting Started • API Usage • Roadmap • Contributing
RoK Vision is a high-performance Cognitive OCR API designed to transform Rise of Kingdoms screenshots into structured data. By combining Deep Learning (PaddleOCR) with a Topological C# Orchestrator, RoK Vision understands the context of each screen, which makes it resolution-independent and highly resilient to UI variations.
- 👤 Governor Profiles: Extracts ID, Name, Power, Kill Points, and Civilization from the profile screen with sub-second latency.
- ⚔️ Battle Intelligence: Full analysis of PvP and PvE reports, including troop metrics, casualty rates, and boss identification.
- 🎒 Inventory Intelligence: Reads complex inventory screens (Action Points & XP Books). Supports Multi-Screenshot Merging and uses Color Detection to distinguish items.
- 🗺️ Kingdom Map Intelligence (Beta): Extracts all visible cities from a map screenshot using a Hybrid AI Engine (YOLO + OCR), resilient to screen resolution and UI variations.
- 🛡️ Alliance Rally Intelligence: Analyzes war screens to extract Rally Leader, Target (Forts/Passes), and a detailed list of participants. Includes a Logical Inference Engine to deduce troop types based on global rally statistics.
- ✅ Standardized Output: All endpoints now return a unified `RokResponse` structure with a complete Audit Log and detailed Extraction Evidence for every field.
- 🔍 The Magnifier (Auto-Healing): Automatic regional re-scanning with specialized digital filters (White Isolation, Inverted Binary) for low-confidence areas.
- 🩺 Debug Mode: Add `Debug: true` to any request to receive granular Timings per step, Raw OCR Text, and Magnifier Attempt Logs in the response.
- 🌐 Multicultural Core: Optimized for Latin alphabets (EN, PT, ES, FR, DE) with smart detection of unsupported characters.
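The README names the Magnifier's filters but does not define them. The sketch below is a minimal, language-neutral illustration in Python (the real Magnifier lives in the C# Brain) that assumes "White Isolation" keeps only near-white pixels and "Inverted Binary" behaves like a conventional inverted threshold. The functions `white_isolation`, `inverted_binary` and `magnify`, the thresholds (200, 127, 0.85) and the injected `ocr` callable are all hypothetical.

```python
from typing import Callable, List, Tuple

Gray = List[List[int]]  # 2-D grayscale crop, pixel values 0-255
OcrFn = Callable[[Gray], Tuple[str, float]]  # hypothetical OCR call returning (text, confidence)


def white_isolation(img: Gray, threshold: int = 200) -> Gray:
    # Hypothetical filter: near-white pixels (e.g. white numerals) become black,
    # everything else becomes white, leaving dark text on a plain background.
    return [[0 if px >= threshold else 255 for px in row] for row in img]


def inverted_binary(img: Gray, threshold: int = 127) -> Gray:
    # Conventional inverted threshold: pixels above the threshold become 0,
    # all other pixels become 255.
    return [[0 if px > threshold else 255 for px in row] for row in img]


FILTERS = [("white_isolation", white_isolation), ("inverted_binary", inverted_binary)]


def magnify(crop: Gray, ocr: OcrFn, min_confidence: float = 0.85):
    """Re-scan a low-confidence crop with each filter until one is confident.

    Returns (best_text, best_confidence, attempts), where attempts records every
    (filter_name, text, confidence) that was tried.
    """
    text, conf = ocr(crop)
    attempts = [("original", text, conf)]
    best_text, best_conf = text, conf
    if conf >= min_confidence:
        return best_text, best_conf, attempts  # first read was already good enough
    for name, apply_filter in FILTERS:
        text, conf = ocr(apply_filter(crop))
        attempts.append((name, text, conf))
        if conf > best_conf:
            best_text, best_conf = text, conf
        if conf >= min_confidence:
            break  # stop re-scanning once a filter produces a confident read
    return best_text, best_conf, attempts
```

Returning every attempt rather than only the winner mirrors the idea behind the Magnifier Attempt Logs that Debug Mode exposes, although the real log format is not documented here.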
The easiest way to run RoK Vision is using Docker. It sets up the Neural Network environment and the API Gateway automatically.
👉 Read the Installation Guide to get up and running in 5 minutes.
The solution follows a distributed architecture: the Eye (Python) handles the heavy AI computer vision, while the Brain (C#) manages the logical orchestration.
```mermaid
graph LR
    User["Client / Bot"] -->|"POST"| API["API Gateway (.NET 9)"]
    subgraph "The Brain (.NET 9)"
        API --> Orchestrator[Cognitive Orchestrator]
        Orchestrator --> Neurons[Specialized Neurons]
        Neurons --> Magnifier[The Magnifier]
    end
    subgraph "The Eye (Python)"
        Orchestrator -->|"gRPC/HTTP"| OCR[PaddleOCR Engine]
    end
```
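The Cognitive Orchestrator is described as topological and resolution-independent. One plausible reading, sketched here purely as an assumption, is that fields are located relative to their label text rather than at fixed pixel offsets, with coordinates normalized by image size. `Box`, `value_right_of` and the 2% row tolerance are hypothetical, and the actual C# orchestrator may work quite differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical OCR text box in pixel coordinates."""
    text: str
    x: float  # left edge
    y: float  # vertical centre
    w: float  # width


def value_right_of(label: str, boxes: List[Box], img_w: int, img_h: int,
                   row_tolerance: float = 0.02) -> Optional[str]:
    """Return the nearest box to the right of `label` on the same row.

    Positions are divided by the image size, so the same tolerance applies
    to a 1280x720 and a 2560x1440 screenshot.
    """
    anchor = next((b for b in boxes if b.text.strip().lower() == label.lower()), None)
    if anchor is None:
        return None
    anchor_row = anchor.y / img_h
    anchor_right = (anchor.x + anchor.w) / img_w
    same_row = [
        b for b in boxes
        if b is not anchor
        and abs(b.y / img_h - anchor_row) <= row_tolerance  # same visual row
        and b.x / img_w >= anchor_right                      # starts right of the label
    ]
    if not same_row:
        return None
    return min(same_row, key=lambda b: b.x).text  # closest candidate wins
```

Because every comparison uses normalized coordinates, scaling a screenshot up or down leaves the result unchanged, which is the property the README calls resolution independence.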
RoK Vision exposes a set of RESTful endpoints to analyze different game screens. Every response is wrapped in a standardized `RokResponse<T>` envelope that includes a `summary` with clean data, `fields` with extraction evidence, and an `auditLog`.
👉 View the Full API Reference for detailed request/response models and JSON examples.
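A client can check for the three envelope members before reading results. This minimal sketch assumes `summary`, `fields` and `auditLog` are top-level JSON keys spelled as in the description above; the shapes of `fields` and `auditLog` entries are not specified here, so they are kept opaque. `RokEnvelope` and `parse_envelope` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed top-level keys, taken from the envelope description.
REQUIRED_KEYS = ("summary", "fields", "auditLog")


@dataclass
class RokEnvelope:
    summary: Dict[str, Any]  # clean extracted values
    fields: Any              # per-field extraction evidence (shape not documented here)
    audit_log: List[Any]     # entries of "auditLog"


def parse_envelope(doc: Dict[str, Any]) -> RokEnvelope:
    """Validate a decoded JSON response and wrap it in a RokEnvelope."""
    missing = [k for k in REQUIRED_KEYS if k not in doc]
    if missing:
        raise ValueError(f"not a RokResponse envelope, missing: {', '.join(missing)}")
    return RokEnvelope(
        summary=doc["summary"],
        fields=doc["fields"],
        audit_log=list(doc["auditLog"]),
    )
```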
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/governor/analyze` | Extracts all stats from a governor profile screen. |
| POST | `/api/reports/analyze` | Analyzes a PvP or PvE battle report. |
| POST | `/api/ap/analyze` | Reads Action Point items from the inventory. Supports multi-image. |
| POST | `/api/xp/analyze` | Reads Tomes of Knowledge from the inventory. Supports multi-image. |
| POST | `/api/map/analyze` | (Beta) Extracts all visible cities from a kingdom map view. |
| POST | `/api/rally/analyze` | Extracts details from Alliance Rally screens (Header, Target, Participants). |
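As a hedged client sketch using only the Python standard library: the endpoint paths come from the table and the `Debug: true` flag comes from the Debug Mode feature, but the request body shape (base64 strings under a hypothetical `Images` key, sent as JSON) is an assumption; check the API reference for the real request model. `build_analyze_request`, `ENDPOINTS` and the example base URL are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Iterable

# Endpoint paths from the table above.
ENDPOINTS = {
    "governor": "/api/governor/analyze",
    "reports": "/api/reports/analyze",
    "ap": "/api/ap/analyze",
    "xp": "/api/xp/analyze",
    "map": "/api/map/analyze",
    "rally": "/api/rally/analyze",
}


def build_analyze_request(base_url: str, screen: str, screenshots: Iterable[bytes],
                          debug: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for one analysis endpoint."""
    images = [base64.b64encode(s).decode("ascii") for s in screenshots]
    if not images:
        raise ValueError("at least one screenshot is required")
    payload = {"Images": images}  # HYPOTHETICAL field name for the screenshots
    if debug:
        payload["Debug"] = True  # flag described under Debug Mode
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINTS[screen],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Usage (not executed here; base URL and file name are placeholders):
#   req = build_analyze_request("http://localhost:8080", "governor",
#                               [open("profile.png", "rb").read()], debug=True)
#   with urllib.request.urlopen(req) as resp:
#       envelope = json.load(resp)
```

Keeping request construction separate from sending makes it easy to swap in another HTTP client, or to pass several screenshots to the multi-image `ap` and `xp` endpoints.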
To ensure >95% accuracy, follow the "Golden Screenshot" rules:
- Full Screen: Send original screenshots. Do not crop the image manually.
- No Overlays: Close the chat, notification bubbles, and side menus before capturing.
- Brightness: Use standard in-game brightness for optimal contrast.
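The "Full Screen" rule can be partially checked before uploading. The sketch below reads a PNG's dimensions straight from its IHDR header (per the PNG format) and flags images that look cropped or rotated. The 720-pixel minimum and the landscape expectation are hypothetical heuristics, not limits published by RoK Vision, and JPEG screenshots would need a different reader.

```python
import struct
from typing import List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) from a PNG without decoding any pixels."""
    # IHDR is the first chunk: 4-byte length, b"IHDR", then big-endian
    # 32-bit width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def screenshot_warnings(data: bytes, min_short_side: int = 720) -> List[str]:
    """Hypothetical pre-upload heuristics for the "Full Screen" rule."""
    width, height = png_size(data)
    warnings = []
    short_side = min(width, height)
    if short_side < min_short_side:
        warnings.append(
            f"short side is {short_side}px (< {min_short_side}px); image may be cropped or downscaled"
        )
    if width < height:
        warnings.append("portrait image; expected a landscape in-game capture")
    return warnings
```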
If RoK Vision helps your alliance, consider buying me a coffee! ☕
- Pix: 031c9e65-66a3-4611-822b-796e227e200a
- Ko-fi: [link]
See our CONTRIBUTING.md for details on how to help the project.
Pull requests are welcome! For major changes, please open an issue first.
Distributed under the MIT License. See LICENSE for more information.