jmozzi/pdf-tools

PDF tools

A modular PDF toolkit. It currently implements PDF merging and page rotation.

Designed to evolve into a toolkit for merging, splitting, rotating, and compressing PDF files.
Built as an ongoing, hands-on practice project to deepen my Python skills.
Inspired by https://dailypythonprojects.substack.com/p/build-a-pdf-toolkit-with-python-day

Architecture

The project follows this structure:

Layers: CLI → operations → core, with utils as shared helpers. See docs/architecture.md for details.

pdf_tools/
│
├── main.py
│   # Application entry point
│   # CLI dispatch, logging configuration, exit codes
│
├── cli/
│   └── parser.py
│       # Defines CLI subcommands and argument parsing using argparse
│
├── core/
│   ├── merger.py
│   │   # Pure PDF logic
│   │   # Implements `merge_pdfs()`
│   └── transformer.py
│       # Pure PDF logic
│       # Implements `rotate_pages()`
│
├── operations/
│   ├── merge.py
│   │   # Operation orchestration layer for merge
│   │   # Performs validation, file discovery, and calls core logic
│   │   # Uses structured logging
│   │   # Raises `OperationError` on failure
│   └── rotate.py
│       # Operation orchestration layer for rotate
│       # Validates input file, parses page specs, and calls core logic
│       # Raises `OperationError` on failure
│
└── utils/
    ├── file_utils.py
    │   # File system helpers
    │   # e.g. `find_pdf_files()`
    │
    └── validation.py
        # Generic validation helpers
        # `validate_folder()`, `ensure_pdf_extension()`, `is_writable_directory()`
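
As an illustration of the utils layer, here is a minimal sketch of the four helpers named in the tree above. Only the function names come from this README; the signatures, return types, and error behaviour are assumptions, and the real modules may differ.

```python
import os


def find_pdf_files(folder: str) -> list[str]:
    """Return the sorted paths of PDF files directly inside `folder` (assumed behaviour)."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".pdf")
        and os.path.isfile(os.path.join(folder, name))
    )


def ensure_pdf_extension(filename: str) -> str:
    """Append `.pdf` unless the name already ends with it (case-insensitive)."""
    return filename if filename.lower().endswith(".pdf") else filename + ".pdf"


def validate_folder(path: str) -> str:
    """Return the absolute path of `path`, or raise if it is not a directory."""
    if not os.path.isdir(path):
        raise NotADirectoryError(f"Not a folder: {path}")
    return os.path.abspath(path)


def is_writable_directory(path: str) -> bool:
    """True if `path` is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK)
```

Keeping these helpers free of PDF logic means both the merge and rotate operations can share them.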

Design goals

  • separation of responsibilities (CLI adapter → operations → core)
  • core: pure functions, no logging/CLI/HTTP
  • operations: orchestration, validation, logging, user-friendly error messages
  • make PDF operations reusable as importable functions
  • maintain a clean and readable project structure
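
The split between operations and core can be sketched roughly as follows. This is not the code in operations/merge.py: `MergeConfig`, `MergeResult`, and `run_merge` are hypothetical names, and the core function is passed in as a plain callable so the sketch runs without pypdf. Only `OperationError` and the idea of config/result dataclasses come from this README.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger(__name__)


class OperationError(Exception):
    """User-facing failure raised by the operations layer."""


@dataclass
class MergeConfig:  # hypothetical name
    inputs: list[str]
    output: str


@dataclass
class MergeResult:  # hypothetical name
    output: str
    files_merged: int


def run_merge(
    config: MergeConfig,
    merge_pdfs: Callable[[list[str], str], None],
) -> MergeResult:
    """Validate the config, call the pure core function, and report the result."""
    if not config.inputs:
        raise OperationError("No PDF files to merge.")
    try:
        # The core function knows nothing about logging or the CLI.
        merge_pdfs(config.inputs, config.output)
    except OSError as exc:
        # Exception chaining keeps the low-level cause available for debugging.
        raise OperationError(f"Could not write {config.output}") from exc
    logger.info("Merged %d files into %s", len(config.inputs), config.output)
    return MergeResult(output=config.output, files_merged=len(config.inputs))
```

Because the operation only raises `OperationError`, a CLI or a future web layer can catch one exception type and turn it into an exit code or an HTTP response.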

What I learnt

  • structuring small Python projects into modular components
  • applying separation of concerns (CLI layer vs operations vs core logic)
  • designing operation orchestration layers (config, result dataclasses, OperationError)
  • designing CLI tools with argparse (subcommands, dest, required)
  • instantiating classes vs calling functions; modules vs classes
  • controlled exception handling: raising custom errors, exception chaining
  • safe file handling using context managers
  • working with pypdf to process structured documents
  • validation and normalisation: input, directories, write permissions, output paths
  • type hints
  • dataclasses
  • managing virtual environments and project dependencies with uv and git

Technologies used

  • Runtime: Python 3.12
  • Third-party: pypdf
  • Standard library: argparse, dataclasses, logging, os, sys
  • Tooling: uv, git

Installation

Install uv (if you don’t have it yet):
Follow the official instructions for your platform: https://docs.astral.sh/uv/getting-started/installation/.

Clone and set up:

# clone the repository:
git clone https://github.com/jmozzi/pdf-tools
cd pdf-tools
# create a virtual environment with uv (`uv run` picks it up, so manual activation is optional):
uv venv
# install dependencies with uv (using `pyproject.toml`)
uv sync

Usage

From the project root (`uv run` uses the project's virtual environment automatically):

Run without arguments: shows the help menu (subcommand required)
uv run main.py

Merge command:
uv run main.py merge -p path/to/folder -o merged_file.pdf

Rotate command:
uv run main.py rotate -i "path/to/input.pdf" -pg "1,3,5-7" -o rotated_output.pdf

Options for merge:

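| Option   | Short | Default           | Description                 |
|----------|-------|-------------------|-----------------------------|
| --path   | -p    | current directory | Folder containing PDF files |
| --output | -o    | merged_output.pdf | Output filename             |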

Options for rotate:

| Option   | Short | Default            | Description |
|----------|-------|--------------------|-------------|
| --input  | -i    | (required)         | Input PDF file to rotate |
| --output | -o    | rotated_output.pdf | Output PDF filename |
| --pages  | -pg   | (required)         | Pages to rotate (1-based), can include comma-separated pages and ranges, e.g. 1,3,5-7 |
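
To make the `--pages` format concrete, here is a minimal sketch of how a spec such as `1,3,5-7` could be expanded into page numbers. `parse_page_spec` is a hypothetical name and the error handling is an assumption; the project's actual parser lives in operations/rotate.py and may behave differently.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec like "1,3,5-7" into sorted, de-duplicated 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "5-7" is inclusive on both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"Invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_spec("1,3,5-7"))  # [1, 3, 5, 6, 7]
```

Malformed parts such as an empty entry or non-numeric text raise `ValueError` through `int()`, which an operation layer could convert into an `OperationError`.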

Help:
uv run main.py -h
uv run main.py merge -h
uv run main.py rotate -h
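
For reference, the options listed above could be declared with argparse roughly as follows. This is a sketch built from the option tables, not a copy of cli/parser.py: `build_parser` is a hypothetical name, the help texts are paraphrased, and using `"."` for the "current directory" default is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py", description="PDF tools")
    # dest stores the chosen subcommand name; required=True rejects a bare call.
    subparsers = parser.add_subparsers(dest="command", required=True)

    merge = subparsers.add_parser("merge", help="Merge the PDF files in a folder")
    merge.add_argument("-p", "--path", default=".",
                       help="Folder containing PDF files")
    merge.add_argument("-o", "--output", default="merged_output.pdf",
                       help="Output filename")

    rotate = subparsers.add_parser("rotate", help="Rotate pages of a PDF file")
    rotate.add_argument("-i", "--input", required=True,
                        help="Input PDF file to rotate")
    rotate.add_argument("-pg", "--pages", required=True,
                        help="Pages to rotate (1-based), e.g. 1,3,5-7")
    rotate.add_argument("-o", "--output", default="rotated_output.pdf",
                        help="Output PDF filename")
    return parser


args = build_parser().parse_args(["rotate", "-i", "input.pdf", "-pg", "1,3,5-7"])
print(args.command, args.pages, args.output)
```

The entry point can then branch on `args.command` to pick the matching operation.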

Example output

  • using the -p option to provide a folder path and relying on the default output filename:
    [Image: Example Output]

Docs

The docs/ folder holds detailed explanations:

| File | Content |
|------|---------|
| docs/architecture.md | layers, responsibilities, error handling |
| docs/flow-merge-cli-to-core.md | how one merge command flows through the system |
| docs/cli-parser.md | how the CLI parser and subcommands work |
| docs/operations-merge.md | how MergeOperation works (config, result, validation, exceptions) |
| docs/operations-merge-design-choices.md | why I designed operations/merge.py this way |
| docs/operations-rotate.md | how RotateOperation works (config, result, validation, exceptions) |

Next steps

  • add unit tests for merger
  • add split, compress commands and core functions
  • add web layer (Flask or FastAPI), reusing operations
