feat: Ink CLI with rich UX and OCR preprocessing by prosdev · Pull Request #11 · prosdevlab/doc-agent

prosdev · 2025-12-08T06:17:03Z

Summary

This PR adds a rich, interactive CLI experience using Ink (React for terminals) and OCR preprocessing for accurate document extraction.

Closes

Closes #5 (UX: Ollama Readiness & Feedback)
Closes #10 (Accuracy: Add OCR preprocessing for reliable price extraction)

Changes

CLI UX (Issue #5)

Ink-based CLI: React components for interactive terminal UI
Ollama auto-setup: Detects if Ollama is installed/running, prompts to install/start
Model pulling progress bar: Native Ink progress bar using Ollama HTTP API
Multiple providers: Support for Ollama and Gemini, selected with the --model flag
Testable architecture: Services, hooks, contexts with dependency injection
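A minimal sketch of how the model-pull progress bar could be driven, not the PR's actual code. It assumes Ollama's `POST /api/pull` endpoint streams newline-delimited JSON objects carrying `status` and, while a layer downloads, `total` and `completed` byte counts. The names `parsePullLine`, `renderBar`, and `createLineBuffer` are hypothetical, and the functions are kept pure so they can be unit-tested without a running Ollama instance.

```typescript
// Shape of one progress line from the pull stream (assumed field names).
interface PullProgress {
  status: string;
  total?: number;
  completed?: number;
}

// Parse one NDJSON line; returns null for blank or malformed lines.
function parsePullLine(line: string): PullProgress | null {
  const trimmed = line.trim();
  if (trimmed === "") return null;
  try {
    const parsed = JSON.parse(trimmed) as Partial<PullProgress>;
    return typeof parsed.status === "string" ? (parsed as PullProgress) : null;
  } catch {
    return null;
  }
}

// Render a fixed-width text bar such as "[#####-----] 50%".
function renderBar(completed: number, total: number, width = 10): string {
  const ratio = total > 0 ? Math.min(Math.max(completed / total, 0), 1) : 0;
  const filled = Math.round(ratio * width);
  const percent = Math.round(ratio * 100);
  return `[${"#".repeat(filled)}${"-".repeat(width - filled)}] ${percent}%`;
}

// Hypothetical helper: network chunks can split a JSON line in half, so keep
// the trailing partial line and emit only the complete ones.
function createLineBuffer(): (chunk: string) => string[] {
  let pending = "";
  return (chunk: string): string[] => {
    const parts = (pending + chunk).split("\n");
    pending = parts.pop() ?? "";
    return parts.filter((line) => line.trim() !== "");
  };
}
```

In a service, decoded chunks from the response body would be fed through the line buffer, each line parsed with `parsePullLine`, and the resulting bar string handed to an Ink component via state. Injecting the HTTP client into that service (for example through React Context) lets the hook tests substitute a stub.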

OCR Preprocessing (Issue #10)

tesseract.js integration: Extract text from page images before the vision model runs
Multi-page support: OCR all PDF pages in parallel
Improved accuracy: Botella Agua price now correctly extracts as $3.49 (was $1.90)
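The OCR step can be sketched as follows, under stated assumptions. The recognizer is passed in as a function so the example has no dependencies; in the PR this role is filled by tesseract.js. The names `Recognize`, `ocrAllPages`, and `buildPrompt`, along with the prompt wording, are hypothetical and only illustrate the idea of treating OCR text as the primary reference for text and numbers while the image supplies layout context.

```typescript
// A recognizer turns one page image into plain text (injected dependency).
type Recognize = (pageImage: Uint8Array) => Promise<string>;

// OCR every page concurrently. Promise.all keeps results in input order,
// so texts[i] always belongs to pages[i] regardless of which finishes first.
async function ocrAllPages(
  pages: Uint8Array[],
  recognize: Recognize,
): Promise<string[]> {
  return Promise.all(pages.map((page) => recognize(page)));
}

// Append OCR text to the base prompt, skipping pages with no recognized text.
function buildPrompt(basePrompt: string, ocrTexts: string[]): string {
  const pages = ocrTexts
    .map((text, index) => ({ page: index + 1, text: text.trim() }))
    .filter((entry) => entry.text !== "");
  if (pages.length === 0) return basePrompt;

  const ocrSection = pages
    .map((entry) => `--- Page ${entry.page} ---\n${entry.text}`)
    .join("\n\n");

  return [
    basePrompt,
    "",
    "OCR text (primary reference for text and numbers; use the image for layout only):",
    ocrSection,
  ].join("\n");
}
```

Keeping the recognizer injectable also matches the testing approach in this PR, where tesseract.js is mocked to avoid worker cleanup issues.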

Testing

72 tests passing
75% coverage
Tests for hooks, services, components

Trade-offs

Feature	Impact
Ink	+Rich UX, +Testable, ~Larger bundle
OCR	+15MB (WASM), +2-5s/page, +Accuracy

Test

pnpm dev extract examples/tacqueria-receipt.pdf --dry-run

Replace ora/chalk with Ink (React for CLIs) for a richer terminal experience:
Components:
- OllamaStatus: Health check with clear setup instructions
- ExtractionProgress: Live elapsed time for slow extractions
- ErrorDisplay: Friendly errors with suggestions
- Result: JSON output display
UX improvements:
- When Ollama isn't running: show install/setup instructions
- When extraction is slow: show elapsed time + reassurance
- Clear visual hierarchy with colors and symbols
Dependencies:
- ink ^6.5.1 (React for CLIs)
- react ^19.2.1
- ink-spinner ^5.0.0
Addresses #5

Features:
- Add Ink React components for interactive CLI experience
- Implement Ollama auto-install/auto-start flow with prompts
- Add native Ink progress bar for model pulling via HTTP API
- Support multiple providers (Ollama, Gemini) with --model flag
- Extract services/hooks/contexts for testable architecture
- Add dependency injection via React Context
Tests (67 total, 75% coverage):
- Add useExtraction hook tests (extraction flow, dry-run, errors)
- Add useOllama hook tests (install/start flows, auto-confirm)
- Add OllamaStatus component tests using ink-testing-library
- Add Ollama service tests (API checks, model pulling)
- Exclude barrel files from coverage
Other:
- Make Zod schema lenient to handle model output variations
- Update default Gemini model to gemini-2.5-flash
- Add example receipt PDF for testing

- Add tesseract.js for OCR text extraction
- OCR all PDF pages in parallel before vision model processing
- Include OCR text in prompt as primary reference for text/numbers
- Vision model uses image for layout context only
Fixes #10: Botella Agua price now correctly extracts as $3.49 (was $1.90)
Trade-offs:
- +15MB install size (WASM files)
- +2-5s processing time per page
- Significantly improved accuracy for financial documents

- Add OCR processing tests (5 tests)
- Mock tesseract.js to avoid worker cleanup issues in tests
- Update existing test to expect OCR text in prompts
72 tests passing

pros-cs added 4 commits December 7, 2025 13:04

test(extract): add OCR tests and mock tesseract.js in test env

ae3d2be


prosdev merged commit f7f1317 into main Dec 8, 2025
1 check passed

prosdev merged 4 commits into main from feat/ink-cli-ux
