
ashmathesondev/dedupe

Photo and Video Deduplication Tool

A Python utility that finds media files in subdirectories that duplicate files in a root folder, and reports them.

Features

  • Root-to-Subdirectory Comparison: Compares media files in subdirectories against the files found at the root level.
  • Two Comparison Modes:
    • Fast (Default): Compares files by size and filename. Ideal for large drives.
    • Deep (-deep): Compares files by SHA-256 content hash, so files match only when their contents are identical. Only files whose sizes match are hashed, which keeps the extra cost low.
  • Supported Formats: Targets specific media extensions: .jpg, .jpeg, .cr3, .png, and .mp4.
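  • Safe Operation: By default, the tool does not delete any files; it only generates a detailed report. Files are deleted only when the -clean flag is passed explicitly (see Optional Arguments).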
  • Detailed Reporting: Summarizes the number of files checked and lists the paths for both duplicates and their originals.
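The two comparison modes can be sketched roughly as follows. This is an illustrative sketch only, not the code in dedupe.py: the names find_duplicates, sha256_of, and MEDIA_EXTENSIONS are hypothetical, and the exact-filename match used for fast mode is an assumption about how "size and filename" is applied.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

# Extensions listed under "Supported Formats"; the constant name is hypothetical.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".cr3", ".png", ".mp4"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_media(path: Path) -> bool:
    return path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS


def find_duplicates(root: Path, deep: bool = False) -> List[Tuple[Path, Path]]:
    """Return (duplicate_in_subdirectory, original_in_root) pairs."""
    # Index root-level files by size so only same-size files are compared.
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for original in root.iterdir():
        if is_media(original):
            by_size[original.stat().st_size].append(original)

    root_hashes: Dict[Path, str] = {}  # filled lazily, only for size matches
    pairs: List[Tuple[Path, Path]] = []
    for candidate in sorted(root.rglob("*")):
        if candidate.parent == root or not is_media(candidate):
            continue  # skip root-level files and non-media files
        same_size = by_size.get(candidate.stat().st_size, [])
        if not same_size:
            continue  # no root file has this size, so nothing to hash
        candidate_hash = sha256_of(candidate) if deep else None
        for original in same_size:
            if deep:
                if original not in root_hashes:
                    root_hashes[original] = sha256_of(original)
                matched = root_hashes[original] == candidate_hash
            else:
                # Assumption: fast mode requires an exact filename match.
                matched = original.name == candidate.name
            if matched:
                pairs.append((candidate, original))
                break
    return pairs
```

Calling find_duplicates(Path("C:/Media"), deep=True) would return the pairs that a report lists as "Duplicate" and "Original in root". Grouping by size first means that deep mode hashes only files that could possibly match.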

Usage

Prerequisites

  • Python 3.x installed on your system.

Setup

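Before running the script or tests, set up a virtual environment and install dependencies. The commands below use Windows paths; on macOS or Linux the environment's interpreter is .venv/bin/python instead of .venv\Scripts\python.exe.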

# Create a virtual environment
python -m venv .venv

# Activate the environment (optional for the commands below, but good practice)
# .venv\Scripts\Activate.ps1

# Install dependencies
.venv\Scripts\python.exe -m pip install -r requirements.txt

Running the Script

  1. Open a terminal or command prompt.

  2. Navigate to the directory containing dedupe.py.

  3. Run the script using the virtual environment's Python executable:

    .venv\Scripts\python.exe dedupe.py "C:\Path\To\Your\MediaFolder"

Optional Arguments

  • --log: Specify a custom name for the report file.

    .venv\Scripts\python.exe dedupe.py "C:\Media" --log "my_dedupe_report.txt"
  • -deep: Enable full content comparison using SHA-256 hashes.

    .venv\Scripts\python.exe dedupe.py "C:\Media" -deep
  • -clean: Delete files in the root folder that have verified duplicates in subdirectories. For safety, this flag requires -deep, so files are deleted only after a content-hash match.

    .venv\Scripts\python.exe dedupe.py "C:\Media" -deep -clean
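For illustration, the options above could be parsed and validated with argparse along these lines. This is a hedged sketch, not the parser in dedupe.py: build_parser and parse_args are hypothetical names, the help strings are paraphrased from this README, and the default report name is left as None because the README does not state it.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="dedupe.py",
        description="Report media files in subdirectories that duplicate root files.",
    )
    parser.add_argument("root", help="folder whose root-level files are the originals")
    # The real default report name is not documented here, so None is a placeholder.
    parser.add_argument("--log", default=None, help="custom name for the report file")
    parser.add_argument("-deep", action="store_true",
                        help="compare file contents using SHA-256 hashes")
    parser.add_argument("-clean", action="store_true",
                        help="delete root files with verified duplicates (needs -deep)")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Refuse to delete anything unless matches are verified by content hash.
    if args.clean and not args.deep:
        parser.error("-clean requires -deep")
    return args
```

With this sketch, passing -clean without -deep exits with a usage error before any file is touched, which mirrors the safety rule described above.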

Running Tests

To run the automated test suite, use the virtual environment's Python:

.venv\Scripts\python.exe -m pytest test_dedupe.py

Example Report

Photo and Video Deduplication Report
=====================================
Target Directory: E:\work\personal\photodedupe\test_data
Files found in root: 2
Duplicates found in subdirectories: 3

Duplicate Files Found:
----------------------
Duplicate: E:\work\personal\photodedupe\test_data\subdir1\image1_copy.jpg
Original in root: E:\work\personal\photodedupe\test_data\image1.jpg
--------------------

Contact

ash.matheson.dev@gmail.com
