A Python-based utility to identify and report duplicate media files across subdirectories relative to a root folder.
- Root-to-Subdirectory Comparison: Compares media files in subdirectories against the files found at the root level.
- Two Comparison Modes:
  - Fast (Default): Compares files by size and filename. Ideal for large drives.
  - Deep (-deep): Uses SHA-256 content hashing for guaranteed accuracy. Optimized to only hash files with matching sizes.
- Supported Formats: Targets specific media extensions: .jpg, .jpeg, .cr3, .png, and .mp4.
- Safe Operation: By default, this tool does not delete any files. It only generates a detailed report. Files are removed only when the -clean option is passed explicitly.
- Detailed Reporting: Summarizes the number of files checked and lists the paths for both duplicates and their originals.
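The two comparison modes described above can be sketched roughly as follows. This is a minimal illustration, not the code in dedupe.py: the function names (`sha256_of`, `is_media`, `find_duplicates`) are hypothetical, and it assumes that Fast mode requires an exact filename match in addition to an equal size.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

# Extensions named in the feature list; the real script may store them differently.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".cr3", ".png", ".mp4"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_media(path: Path) -> bool:
    return path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS


def find_duplicates(root: Path, deep: bool = False) -> List[Tuple[Path, Path]]:
    """Return (duplicate_in_subdirectory, original_in_root) pairs."""
    # Index root-level media files by size so only same-size files are compared.
    by_size: Dict[int, List[Path]] = {}
    for path in root.iterdir():
        if is_media(path):
            by_size.setdefault(path.stat().st_size, []).append(path)

    root_hashes: Dict[Path, str] = {}  # cache so each root file is hashed at most once
    pairs: List[Tuple[Path, Path]] = []
    for candidate in sorted(root.rglob("*")):
        if candidate.parent == root or not is_media(candidate):
            continue  # skip root-level files, directories, and non-media files
        same_size = by_size.get(candidate.stat().st_size, [])
        if not same_size:
            continue  # no size match, so nothing needs to be hashed
        candidate_hash = sha256_of(candidate) if deep else None
        for original in same_size:
            if deep:
                if original not in root_hashes:
                    root_hashes[original] = sha256_of(original)
                matched = root_hashes[original] == candidate_hash
            else:
                # Assumption: Fast mode treats equal size plus equal name as a match.
                matched = original.name == candidate.name
            if matched:
                pairs.append((candidate, original))
                break
    return pairs
```

Under these assumptions, a renamed copy such as image1_copy.jpg is only detected in Deep mode, while Fast mode can report a same-named, same-sized file even if its content differs.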
- Python 3.x installed on your system.
Before running the script or tests, set up a virtual environment and install dependencies:
# Create a virtual environment
python -m venv .venv
# Activate the environment (optional for the commands below, but good practice)
# .venv\Scripts\Activate.ps1
# Install dependencies
.venv\Scripts\python.exe -m pip install -r requirements.txt
- Open a terminal or command prompt.
- Navigate to the directory containing dedupe.py.
- Run the script using the virtual environment's Python executable:
  .venv\Scripts\python.exe dedupe.py "C:\Path\To\Your\MediaFolder"
- --log: Specify a custom name for the report file.
  .venv\Scripts\python.exe dedupe.py "C:\Media" --log "my_dedupe_report.txt"
- -deep: Enable full content comparison using SHA-256 hashes.
  .venv\Scripts\python.exe dedupe.py "C:\Media" -deep
- -clean: Delete root files that have verified duplicates in subdirectories. Note: This requires the -deep flag for safety.
  .venv\Scripts\python.exe dedupe.py "C:\Media" -deep -clean
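For readers curious how these options might be wired together, here is a hedged sketch using the standard-library argparse module. It is not taken from dedupe.py: the helper names `build_parser` and `parse_args` are hypothetical, and because the default report name is not stated here, the sketch leaves it as None.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="dedupe.py",
        description="Report duplicate media files found in subdirectories.",
    )
    parser.add_argument("root", help="folder whose root-level files are the originals")
    # The real default report name is not documented here, so None is a placeholder.
    parser.add_argument("--log", default=None, help="custom name for the report file")
    parser.add_argument("-deep", action="store_true",
                        help="compare file content using SHA-256 hashes")
    parser.add_argument("-clean", action="store_true",
                        help="delete root files with verified duplicates (needs -deep)")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Enforce the documented safety rule: deletion is only allowed after a deep comparison.
    if args.clean and not args.deep:
        parser.error("-clean requires -deep")
    return args
```

Rejecting -clean without -deep at parse time means the script exits with a usage error before any file is touched.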
To run the automated test suite, use the virtual environment's Python:
.venv\Scripts\python.exe -m pytest test_dedupe.py

Photo and Video Deduplication Report
=====================================
Target Directory: E:\work\personal\photodedupe\test_data
Files found in root: 2
Duplicates found in subdirectories: 3
Duplicate Files Found:
----------------------
Duplicate: E:\work\personal\photodedupe\test_data\subdir1\image1_copy.jpg
Original in root: E:\work\personal\photodedupe\test_data\image1.jpg
--------------------
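A report in the layout shown above could be assembled along these lines. This is only a sketch: `format_report` is a hypothetical helper, the rule lengths are approximations of the sample, and the actual script may build its report differently.

```python
from typing import Iterable, Tuple


def format_report(target: str, root_file_count: int,
                  pairs: Iterable[Tuple[str, str]]) -> str:
    """Build the report text from (duplicate, original) path pairs."""
    pairs = list(pairs)
    title = "Photo and Video Deduplication Report"
    section = "Duplicate Files Found:"
    lines = [
        title,
        "=" * len(title),  # heading rule; exact length in the real report may differ
        f"Target Directory: {target}",
        f"Files found in root: {root_file_count}",
        f"Duplicates found in subdirectories: {len(pairs)}",
        section,
        "-" * len(section),
    ]
    for duplicate, original in pairs:
        lines.append(f"Duplicate: {duplicate}")
        lines.append(f"Original in root: {original}")
        lines.append("-" * 20)  # separator between entries
    return "\n".join(lines) + "\n"
```

Returning a string rather than writing the file directly keeps the formatting easy to test and lets the caller decide where the report goes, for example the name given with --log.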