PDF Annotation Extractor

A Flask web app that extracts annotations (highlights, underlines, squiggly marks, strikeouts, sticky-note comments, and free-text notes) from uploaded PDF files and exports them as CSV and JSON.

Folder Structure

project/
├── app.py
├── templates/
│   └── index.html
├── uploads/          ← auto-created
├── outputs/          ← auto-created
└── README.md
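The uploads/ and outputs/ folders are created automatically. Below is a minimal sketch of one way an app can do this at startup. `ensure_dirs` is a hypothetical helper and is not necessarily what app.py contains.

```python
from pathlib import Path


def ensure_dirs(base_dir, names=("uploads", "outputs")):
    """Create the working folders under base_dir if missing; return their paths."""
    base = Path(base_dir)
    paths = []
    for name in names:
        path = base / name
        # exist_ok=True makes repeated startups safe: no error if the folder exists
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `ensure_dirs(Path(__file__).parent)` near the top of a script would create both folders next to that script.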

Requirements

Python 3.9+
pip

Setup & Run

1. Create and activate a virtual environment (recommended)

python -m venv venv

# macOS / Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

2. Install dependencies

pip install flask pymupdf werkzeug

3. Run the app

python app.py

4. Open in your browser

http://127.0.0.1:5000

Usage

1. Upload an annotated PDF via the drag-and-drop zone or the file browser.
2. Click Extract Annotations.
3. View the results in the table on the page.
4. Download them as CSV or JSON using the buttons.

Output Format

CSV (outputs/annotations.csv):

page,label,text
1,Highlight,"example highlighted text"
2,Comment,"This is a sticky note"
3,Underline,"underlined sentence"

JSON (outputs/annotations.json):

[
  { "page": 1, "label": "Highlight", "text": "example highlighted text" },
  { "page": 2, "label": "Comment",   "text": "This is a sticky note" },
  { "page": 3, "label": "Underline", "text": "underlined sentence" }
]
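Both files contain the same rows. Here is a minimal stdlib sketch of writing them, assuming each row is a dict with the keys `page`, `label`, and `text`. `export_annotations` is a hypothetical name, not taken from app.py.

```python
import csv
import json
from pathlib import Path

FIELDS = ["page", "label", "text"]  # column order used in annotations.csv


def export_annotations(rows, out_dir="outputs"):
    """Write rows to annotations.csv and annotations.json; return both paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path = out / "annotations.csv"
    json_path = out / "annotations.json"

    # newline="" lets the csv module control line endings (no blank rows on Windows)
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

    # ensure_ascii=False keeps non-ASCII annotation text readable in the file
    with json_path.open("w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2, ensure_ascii=False)

    return csv_path, json_path
```

The csv module's default `QUOTE_MINIMAL` only quotes a field when it needs to, such as when it contains a comma or a quote character, so plain text may appear unquoted. Passing `quoting=csv.QUOTE_NONNUMERIC` to `DictWriter` quotes every string field instead.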

Supported Annotation Types

Type	PDF subtype	PyMuPDF constant
Highlight	/Highlight	PDF_ANNOT_HIGHLIGHT
Underline	/Underline	PDF_ANNOT_UNDERLINE
Squiggly	/Squiggly	PDF_ANNOT_SQUIGGLY
Strikeout	/StrikeOut	PDF_ANNOT_STRIKE_OUT
Comment	/Text (sticky note)	PDF_ANNOT_TEXT
Free Text	/FreeText	PDF_ANNOT_FREE_TEXT
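Here is a minimal sketch of the extraction step. It assumes the app uses PyMuPDF (imported as `fitz`) and that the subtype name from `annot.type[1]` decides the label. `LABELS`, `MARKUP`, and `extract_annotations` are hypothetical names, not taken from app.py. For text-markup types, the sketch reads the page text clipped to the annotation's bounding rectangle. This is an approximation and can pick up neighbouring words on multi-line highlights. For comments and free text, it returns the note's content.

```python
# Hypothetical sketch; not the project's actual app.py.
# Maps PyMuPDF subtype names (annot.type[1]) to the labels used in the exports.
LABELS = {
    "Highlight": "Highlight",
    "Underline": "Underline",
    "Squiggly": "Squiggly",
    "StrikeOut": "Strikeout",
    "Text": "Comment",  # sticky note
    "FreeText": "Free Text",
}
# Text-markup subtypes: their "text" is the page text they cover.
MARKUP = {"Highlight", "Underline", "Squiggly", "StrikeOut"}


def extract_annotations(pdf_path):
    """Return a list of {"page", "label", "text"} dicts, pages numbered from 1."""
    import fitz  # PyMuPDF; imported here so LABELS/MARKUP are usable without it

    rows = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for annot in page.annots():
                subtype = annot.type[1]  # e.g. (8, "Highlight") -> "Highlight"
                label = LABELS.get(subtype)
                if label is None:
                    continue  # skip any other subtype (shapes, ink, etc.)
                if subtype in MARKUP:
                    # Approximation: page text clipped to the annotation's rectangle
                    text = page.get_text("text", clip=annot.rect)
                else:
                    text = annot.info.get("content", "")
                rows.append({"page": page.number + 1, "label": label, "text": text.strip()})
    return rows
```

For a hypothetical file, `extract_annotations("uploads/sample.pdf")` returns rows with the same shape as the JSON export.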

API Endpoints

Method	Route	Description
GET	`/`	Render upload page
POST	`/`	Upload PDF and process annotations
GET	`/download/csv`	Download annotations.csv
GET	`/download/json`	Download annotations.json
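The download routes can also be used from a script once the app is running and a PDF has been processed. Below is a minimal stdlib sketch. It assumes the default development address http://127.0.0.1:5000 and assumes that `/download/json` returns the JSON array shown under Output Format. `fetch_annotations` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # default address from the setup steps


def fetch_annotations(base_url=BASE_URL, timeout=10.0):
    """Download the JSON export from a running instance and return it as a list."""
    url = base_url.rstrip("/") + "/download/json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If nothing has been extracted yet, the route may return an error instead of JSON. In that case, `urlopen` raises `urllib.error.HTTPError`.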
