From b12bb8ff0b8accecb621fd58d97c60aaf363246a Mon Sep 17 00:00:00 2001
From: Manuel Moreno Delgado
Date: Tue, 20 Jan 2026 23:31:52 +0100
Subject: [PATCH 1/2] Add pdf2md-ai: AI-powered PDF to Markdown converter

---
 src/pdf2md-ai/README.md | 156 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 src/pdf2md-ai/README.md

diff --git a/src/pdf2md-ai/README.md b/src/pdf2md-ai/README.md
new file mode 100644
index 0000000000..5a988f4c9f
--- /dev/null
+++ b/src/pdf2md-ai/README.md
@@ -0,0 +1,156 @@
+# PDF to Markdown (pdf2md-ai)
+
+AI-powered PDF to Markdown converter using advanced AI. Preserves document structure, tables, and formatting with intelligent content extraction.
+
+## Features
+
+- **Intelligent Extraction**: Uses advanced AI (Gemini) for accurate content extraction
+- **Structure Preservation**: Maintains headings, tables, lists, and formatting
+- **Multi-language Support**: Processes documents in any language
+- **Credit-based System**: Transparent usage tracking
+- **Fast Processing**: Typical 1-page PDF converted in seconds
+
+## Installation
+
+### Via NPX (Recommended)
+
+Add to your MCP settings file:
+
+#### Claude Desktop
+
+On MacOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
+On Windows: `%APPDATA%\Claude\claude_desktop_config.json`
+
+```json
+{
+  "mcpServers": {
+    "pdf2md-ai": {
+      "command": "npx",
+      "args": ["-y", "pdf2md-ai"],
+      "env": {
+        "PDF_TO_MARKDOWN_API_KEY": "your-api-key-here"
+      }
+    }
+  }
+}
+```
+
+#### Cursor
+
+Add to your Cursor MCP settings:
+
+```json
+{
+  "mcpServers": {
+    "pdf2md-ai": {
+      "command": "npx",
+      "args": ["-y", "pdf2md-ai"],
+      "env": {
+        "PDF_TO_MARKDOWN_API_KEY": "your-api-key-here"
+      }
+    }
+  }
+}
+```
+
+### Getting an API Key
+
+1. Visit [pdf-to-markdown-pro.onrender.com](https://pdf-to-markdown-pro.onrender.com)
+2. Sign up for a free account
+3. Copy your API key from the dashboard
+4. Add it to your MCP configuration as shown above
+
+## Usage
+
+Once configured, simply ask your AI assistant:
+
+```
+Convert this PDF to markdown: /path/to/your/document.pdf
+```
+
+The server will:
+1. Read the PDF file from your local system
+2. Process it using advanced AI
+3. Return formatted Markdown with statistics
+
+### Example
+
+**Request:**
+```
+Convert this contract: C:\Documents\agreement.pdf
+```
+
+**Response:**
+```
+✅ Conversion Completed Successfully
+
+📊 Statistics:
+- Pages processed: 8
+- Credits used: 8
+- Credits remaining: 492
+
+## Contract Content:
+
+[Full markdown content here with preserved structure, tables, and formatting...]
+```
+
+## Tools
+
+### convert_pdf_to_markdown
+
+Converts a PDF file to Markdown format.
+
+**Arguments:**
+- `filePath` (string, required): Absolute path to the PDF file on your local system
+
+**Returns:**
+- Markdown-formatted content
+- Document statistics (pages, file size)
+- Credit usage information
+
+## Configuration
+
+### Environment Variables
+
+- `PDF_TO_MARKDOWN_API_KEY` (required): Your API key from the service
+- `PDF_API_URL` (optional): Custom API endpoint (defaults to production)
+
+## Use Cases
+
+- **Document Analysis**: Extract text from contracts, reports, invoices
+- **RAG Pipelines**: Convert PDFs to Markdown for vector databases and embeddings
+- **Content Migration**: Batch convert PDF documentation to Markdown format
+- **Research**: Extract academic papers and technical documents
+- **Data Extraction**: Pull structured data from forms and tables
+- **Archiving**: Create searchable text versions of PDF archives
+
+## Requirements
+
+- Node.js 18 or higher
+- Internet connection for API access
+- Valid API key with available credits
+
+## Limitations
+
+- Maximum file size: 50 MB recommended
+- Request timeout: 5 minutes per file
+- Credit-based: Each page consumes 1 credit
+- Requires network access to processing API
+
+## Pricing
+
+- Free tier available with limited credits
+- Pay-as-you-go model: 1 credit per page
+- Enterprise plans available for high-volume usage
+
+Visit [pdf-to-markdown-pro.onrender.com](https://pdf-to-markdown-pro.onrender.com) for current pricing.
+
+## Links
+
+- [NPM Package](https://www.npmjs.com/package/pdf2md-ai)
+- [Get API Key](https://pdf-to-markdown-pro.onrender.com)
+- [GitHub Issues](https://github.com/MANUJ243/pdf2md-ai/issues)
+
+## License
+
+MIT

From ca3a7fc4905478cacfe6a9545b95dec95ec8df59 Mon Sep 17 00:00:00 2001
From: Manuel Moreno Delgado
Date: Wed, 21 Jan 2026 00:01:28 +0100
Subject: [PATCH 2/2] Update README: emphasize context preservation (images, tables, code)

---
 src/pdf2md-ai/README.md | 55 +++++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/src/pdf2md-ai/README.md b/src/pdf2md-ai/README.md
index 5a988f4c9f..d50ac1f151 100644
--- a/src/pdf2md-ai/README.md
+++ b/src/pdf2md-ai/README.md
@@ -1,14 +1,20 @@
-# PDF to Markdown (pdf2md-ai)
+# PDF to Markdown (pdf2md-ai)
 
-AI-powered PDF to Markdown converter using advanced AI. Preserves document structure, tables, and formatting with intelligent content extraction.
+AI-powered PDF to Markdown converter that **preserves complete context**: images (analyzed and described with AI), complex tables (including merged cells), code blocks (with original formatting), and document structure. Uses Gemini and LlamaParse for intelligent processing.
 
-## Features
+## Key Features
 
-- **Intelligent Extraction**: Uses advanced AI (Gemini) for accurate content extraction
-- **Structure Preservation**: Maintains headings, tables, lists, and formatting
-- **Multi-language Support**: Processes documents in any language
-- **Credit-based System**: Transparent usage tracking
-- **Fast Processing**: Typical 1-page PDF converted in seconds
+This is not just a simple PDF text extractor. pdf2md-ai **preserves complete visual and structural context**:
+
+- 📸 **Images with Context**: Each image is analyzed with AI (Gemini) and described in detail, maintaining its context within the document
+- 📊 **Complex Tables**: Preserves complete table structure including merged cells, alignment, and formatting
+- 💻 **Source Code**: Maintains code blocks with original syntax and formatting intact
+- 📝 **Document Structure**: Hierarchies, lists, quotes, and special formatting preserved
+- 🌍 **Multi-language Support**: Processes documents in any language
+- ⚡ **Fast Processing**: Typical 1-page PDF converted in seconds
+- 💳 **Credit-based System**: Transparent usage tracking (1 credit per page)
+
+This means when you convert a technical PDF, a report with graphics, or documentation with code examples, **you don't lose any visual or structural information**.
 
 ## Installation
 
@@ -70,14 +76,16 @@ Convert this PDF to markdown: /path/to/your/document.pdf
 
 The server will:
 1. Read the PDF file from your local system
-2. Process it using advanced AI
-3. Return formatted Markdown with statistics
+2. Analyze images with AI and extract descriptions
+3. Preserve complete table structures
+4. Maintain code blocks with original formatting
+5. Return formatted Markdown with full context preserved
 
 ### Example
 
 **Request:**
 ```
-Convert this contract: C:\Documents\agreement.pdf
+Convert this technical document: C:\Documents\api-guide.pdf
 ```
 
 **Response:**
@@ -89,22 +97,26 @@ Convert this contract: C:\Documents\agreement.pdf
 - Credits used: 8
 - Credits remaining: 492
 
-## Contract Content:
+## API Guide Content:
 
-[Full markdown content here with preserved structure, tables, and formatting...]
+[Full markdown content here with:
+  - Image descriptions in context
+  - Complex tables fully preserved
+  - Code examples with syntax highlighting
+  - Complete document structure maintained...]
 ```
 
 ## Tools
 
 ### convert_pdf_to_markdown
 
-Converts a PDF file to Markdown format.
+Converts a PDF file to Markdown format preserving complete context: images, tables, code blocks, and structure.
 
 **Arguments:**
 - `filePath` (string, required): Absolute path to the PDF file on your local system
 
 **Returns:**
-- Markdown-formatted content
+- Markdown-formatted content with complete context preservation
 - Document statistics (pages, file size)
 - Credit usage information
 
@@ -117,12 +129,13 @@ Converts a PDF file to Markdown format.
 
 ## Use Cases
 
-- **Document Analysis**: Extract text from contracts, reports, invoices
-- **RAG Pipelines**: Convert PDFs to Markdown for vector databases and embeddings
-- **Content Migration**: Batch convert PDF documentation to Markdown format
-- **Research**: Extract academic papers and technical documents
-- **Data Extraction**: Pull structured data from forms and tables
-- **Archiving**: Create searchable text versions of PDF archives
+- **Technical Documentation**: Convert docs with diagrams, tables, and code while preserving all context
+- **Research Papers**: Extract academic papers with figures, complex tables, and references
+- **RAG Pipelines**: Create context-rich markdown for vector databases and embeddings
+- **Contract Analysis**: Process legal documents with tables and structured information
+- **Data Extraction**: Pull structured data from forms and complex tables
+- **Code Documentation**: Extract programming guides with code examples intact
+- **Report Processing**: Convert business reports maintaining charts and table context
 
 ## Requirements