Demo website: https://huggingface.co/spaces/mysertkaya/AI-Generated-Image-Detection
This repository contains a deep learning project designed to detect whether an image is Real or AI-Generated (Fake). It utilizes the Qwen2.5-VL-7B-Instruct Vision-Language Model (VLM), fine-tuned using LoRA (Low-Rank Adaptation) via the Unsloth library for efficiency.
The project explores two different training methodologies to improve robustness against social media compression and hard-to-classify examples.
- State-of-the-Art VLM: Uses Qwen2.5-VL-7B-Instruct, a 7B-parameter multimodal (vision-language) model.
- Efficient Fine-Tuning: Implements 4-bit quantization and LoRA adapters using Unsloth.
- Robust Training Pipeline (Method 2):
  - Data Augmentation: Random resizing, cropping, and color jittering.
  - Compression Simulation: Applies random JPEG compression (Quality 50-95) during training to mimic real-world social media artifacts.
  - Focal Loss: Used to focus the model on hard-to-classify examples (Alpha: 0.75, Gamma: 2.0).
- Interactive Inference: Includes a Gradio web interface for easy testing.
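
To illustrate how the Focal Loss settings listed above (Alpha: 0.75, Gamma: 2.0) shift the emphasis toward hard examples, here is a minimal scalar sketch in plain Python. It is not the training notebook's implementation: it assumes alpha is applied as a uniform weight to every example, and the function names `focal_loss` and `cross_entropy` are hypothetical.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy for one example, given the probability of its true class."""
    return -math.log(p_true)


def focal_loss(p_true, alpha=0.75, gamma=2.0):
    """Focal loss for one example, given the probability of its true class.

    Assumption: alpha is a uniform weight (not a per-class weight).
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    # (1 - p_true) ** gamma shrinks the loss of confident, correct predictions.
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


if __name__ == "__main__":
    # Compare an easy example (p_true = 0.9) with a hard one (p_true = 0.1).
    for p in (0.9, 0.5, 0.1):
        print(f"p_true={p:.1f}  CE={cross_entropy(p):.4f}  focal={focal_loss(p):.4f}")
```

Under these assumptions, an example predicted with probability 0.9 for its true class keeps only 0.75 × 0.1² = 0.0075 of its cross-entropy, while one predicted at 0.1 keeps 0.75 × 0.9² = 0.6075, so hard examples dominate the total loss.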
- `first_method_training_code.ipynb`: First Method training pipeline using standard SFT (Supervised Fine-Tuning) with Cross-Entropy Loss.
- `second_method_training_code.ipynb`: Second Method training pipeline incorporating JPEG compression simulation, heavy augmentation, and Focal Loss.
- `inference_code_to_run.ipynb`: Inference script that loads the trained adapters and launches a Gradio web UI.
To run the notebooks, you need to install the required dependencies. It is recommended to use a GPU environment (e.g., Google Colab or a local machine with CUDA).
pip install unsloth
pip install torch torchvision pillow gradio
pip install transformers --upgrade
pip install --no-deps trl

The model is trained on the SID_Set (Synthetic Image Detection) dataset sourced from Hugging Face (saberzl/SID_Set).
- Images with `label = 2` are filtered out.
- Images are filtered to ensure high quality (e.g., minimum width of 1024 pixels).
- Class Balancing: The dataset is undersampled to ensure an equal number of Real and Fake examples.
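
The filtering and class-balancing steps above can be sketched in plain Python. This is an illustrative sketch, not the notebook's code: the record layout (dictionaries with `label` and `width` keys) and the helper names `filter_records` and `undersample` are assumptions made for the example.

```python
import random
from collections import defaultdict


def filter_records(records, min_width=1024, drop_label=2):
    """Drop records with the excluded label or a width below the minimum."""
    return [
        r for r in records
        if r["label"] != drop_label and r["width"] >= min_width
    ]


def undersample(records, seed=0):
    """Randomly keep the same number of records for every remaining label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for r in records:
        by_label[r["label"]].append(r)
    # The smallest class determines how many examples each class keeps.
    n = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical toy records standing in for dataset rows.
    rows = [
        {"label": 0, "width": 1024},
        {"label": 0, "width": 2048},
        {"label": 1, "width": 1024},
        {"label": 1, "width": 512},
        {"label": 2, "width": 4096},
    ]
    print(undersample(filter_records(rows)))
```

Fixing the seed keeps the undersampled subset reproducible between runs, which makes the two training methods easier to compare on the same data.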
Method 1 (standard SFT):
- Model: Qwen2.5-VL-7B-Instruct (4-bit)
- Loss: Standard Cross-Entropy
- Scheduler: Linear
- Epochs: 1
- Focus: Establishing a baseline on class-balanced data
Method 2 (robust training pipeline):
- Data Augmentation:
  - `RandomHorizontalFlip`
  - `ColorJitter`
  - `RandomResizedCrop` (target size: 512×512)
- Compression Simulation:
  - Images are compressed and decompressed in memory using JPEG quality 50–95 to simulate real-world quality degradation.
- Loss: Focal Loss (down-weights easy, well-classified examples so that hard ones contribute more to the loss)
- Scheduler: Cosine Annealing
- Epochs: 2
- Result: Improved robustness against artifacts commonly found in compressed images (e.g., images shared via Twitter or WhatsApp)
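
The in-memory compression step listed above can be sketched with Pillow as follows. This is a minimal sketch rather than the notebook's code: the function name `random_jpeg_roundtrip` and its parameters are hypothetical, and it assumes the quality is drawn uniformly from the 50–95 range for each image.

```python
import io
import random

from PIL import Image


def random_jpeg_roundtrip(img, min_quality=50, max_quality=95, rng=random):
    """Encode an image as JPEG at a random quality and decode it again in memory."""
    quality = rng.randint(min_quality, max_quality)  # inclusive on both ends
    buffer = io.BytesIO()
    # JPEG has no alpha channel, so convert to RGB before saving.
    img.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    degraded = Image.open(buffer)
    degraded.load()  # force decoding while the buffer is still alive
    return degraded


if __name__ == "__main__":
    original = Image.new("RGB", (512, 512), color=(120, 30, 200))
    degraded = random_jpeg_roundtrip(original, rng=random.Random(0))
    print(degraded.size, degraded.mode)
```

Applying the round trip after the geometric and color augmentations means the compression artifacts land on the final 512×512 crop. That ordering is a design choice of this sketch, not something the notebook is confirmed to do.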
- Validation Accuracy: Method 1 reached ~99% on the validation set.
- Evaluation: Tested on external datasets (e.g., DALL-E 3 generated images) and standard validation splits.
- Unsloth AI for the optimized fine-tuning library.
- Qwen Team for the Qwen2.5-VL model.
- saberzl for the SID Dataset.