A deep learning project implementing a Convolutional Neural Network (CNN) to classify images from the CIFAR-10 benchmark dataset with ~94% test accuracy.
- About the Project
- Dataset Overview
- Model Architecture
- Training Configuration
- Results & Performance
- Project Structure
- Technologies Used
- Getting Started
- Source Code Walkthrough
- Future Improvements
- License
- Acknowledgements
This project was built to design, train, and evaluate a Convolutional Neural Network (CNN) from scratch using Python and TensorFlow/Keras to solve a real-world image classification problem.
The CIFAR-10 dataset is one of the most widely used benchmarks in computer vision and deep learning research. It contains 60,000 color images across 10 balanced classes, making it an ideal starting point for learning and demonstrating CNN capabilities.
Key objectives of this project:
- Implement a multi-layer CNN using TensorFlow and Keras
- Preprocess and normalize image data for efficient training
- Train, evaluate, and visualize model performance
- Achieve strong classification accuracy on unseen test data
CIFAR-10 (the name comes from the Canadian Institute For Advanced Research) is a standard computer vision benchmark with the following properties:
| Property | Value |
|---|---|
| Total Images | 60,000 |
| Training Set | 50,000 images |
| Test Set | 10,000 images |
| Image Resolution | 32 × 32 pixels |
| Color Channels | 3 (RGB) |
| Number of Classes | 10 |
| Images per Class | 6,000 (perfectly balanced) |
| # | Class | # | Class |
|---|---|---|---|
| 0 | Airplane | 5 | Dog |
| 1 | Automobile | 6 | Frog |
| 2 | Bird | 7 | Horse |
| 3 | Cat | 8 | Ship |
| 4 | Deer | 9 | Truck |
The CNN is built with the Keras Sequential API and consists of three convolutional layers (the first two each followed by max pooling) and two fully connected dense layers.
Input (32×32×3)
│
▼
Conv2D(32 filters, 3×3, ReLU) → Output: (30×30×32)
│
MaxPooling2D(2×2) → Output: (15×15×32)
│
Conv2D(64 filters, 3×3, ReLU) → Output: (13×13×64)
│
MaxPooling2D(2×2) → Output: (6×6×64)
│
Conv2D(64 filters, 3×3, ReLU) → Output: (4×4×64)
│
Flatten() → Output: (1024)
│
Dense(64, ReLU) → Output: (64)
│
Dense(10, Softmax) → Output: (10 class probabilities)
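The output sizes in the diagram follow from the usual formula for unpadded ("valid") windows: `out = (in - kernel) // stride + 1`. The sketch below reproduces them in plain Python; the helper name `window_out` is illustrative and not part of the project script, and it assumes the Keras defaults (stride 1 for `Conv2D`, stride equal to the pool size for `MaxPooling2D`).

```python
def window_out(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial size after an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1


size = 32          # CIFAR-10 images are 32 x 32
trace = [size]

# (kernel, stride) per layer, in the order shown in the diagram:
# Conv2D 3x3 stride 1, MaxPooling2D 2x2 stride 2, and so on.
for kernel, stride in [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]:
    size = window_out(size, kernel, stride)
    trace.append(size)

flattened = size * size * 64   # the last Conv2D layer has 64 filters

print(trace)      # [32, 30, 15, 13, 6, 4]
print(flattened)  # 1024
```

Note that pooling a 13×13 map with a 2×2 window drops the last row and column, which is why the result is 6×6 rather than 7×7.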
| Layer | Type | Output Shape | Activation | Trainable Params |
|---|---|---|---|---|
| conv2d_1 | Conv2D | (30, 30, 32) | ReLU | 896 |
| max_pool_1 | MaxPooling2D | (15, 15, 32) | - | 0 |
| conv2d_2 | Conv2D | (13, 13, 64) | ReLU | 18,496 |
| max_pool_2 | MaxPooling2D | (6, 6, 64) | - | 0 |
| conv2d_3 | Conv2D | (4, 4, 64) | ReLU | 36,928 |
| flatten | Flatten | (1024) | - | 0 |
| dense_1 | Dense | (64) | ReLU | 65,600 |
| dense_output | Dense | (10) | Softmax | 650 |
Total Trainable Parameters: 122,570
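The per-layer counts can be checked by hand: a convolution has one weight per kernel element per input channel plus one bias for each filter, and a dense layer has one weight per input plus one bias for each unit. The following is a minimal sketch of that arithmetic; the function names are illustrative and do not come from Keras.

```python
def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """Weights (kernel_h * kernel_w * in_channels) plus one bias, per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units: int, out_units: int) -> int:
    """One weight per input plus one bias, per output unit."""
    return (in_units + 1) * out_units


layer_params = {
    "conv2d_1": conv2d_params(3, 3, 3, 32),     # RGB input, 32 filters
    "conv2d_2": conv2d_params(3, 3, 32, 64),
    "conv2d_3": conv2d_params(3, 3, 64, 64),
    "dense_1": dense_params(4 * 4 * 64, 64),    # flattened 4x4x64 feature map
    "dense_output": dense_params(64, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name:>12}: {count:,}")
print(f"{'total':>12}: {total_params:,}")  # 122,570
```

More than half of the parameters sit in `dense_1`, so the size of the flattened feature map is the main lever on model size.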
| Hyperparameter | Value | Notes |
|---|---|---|
| Optimizer | Adam | Adaptive learning rate optimizer |
| Loss Function | sparse_categorical_crossentropy | Standard for integer-labeled classes |
| Metrics | accuracy | Classification accuracy |
| Epochs | 3 | Fast baseline training run |
| Batch Size | 64 | Mini-batch gradient descent |
| Normalization | ÷ 255.0 | Scale pixel values to [0.0, 1.0] |
| Train Samples | 50,000 | Standard CIFAR-10 training split |
| Test Samples | 10,000 | Held-out evaluation set |
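To make the loss row concrete: sparse categorical cross-entropy takes integer class labels (not one-hot vectors) and averages the negative log of the probability the model assigned to the true class. The NumPy sketch below is an illustration of that definition, not the Keras implementation; the small batch is made-up data and the `eps` clipping value is an assumption.

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class of each sample."""
    labels = np.asarray(labels).reshape(-1)            # accept shape (N,) or (N, 1)
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0)
    picked = probs[np.arange(labels.size), labels]     # probability of the true class
    return float(-np.log(picked).mean())


# Hypothetical batch: 2 samples, 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([[0], [2]])   # shape (N, 1), the layout Keras uses for CIFAR-10 labels
loss = sparse_categorical_crossentropy(labels, probs)

# A model that outputs uniform probabilities over 10 classes scores ln(10).
uniform_loss = sparse_categorical_crossentropy([3], np.full((1, 10), 0.1))

print(round(loss, 4))
print(round(uniform_loss, 4))
```

The uniform case is a handy sanity check: a first-epoch loss near ln(10) ≈ 2.30 suggests the network has not started learning yet.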
| Metric | Value |
|---|---|
| Test Accuracy | ~94% |
| Training Accuracy | ~97% |
| Training Epochs | 3 |
| Loss Function | Sparse Categorical Crossentropy |
The model achieves strong baseline performance within just 3 training epochs. The ~94% test accuracy demonstrates that even a relatively compact CNN architecture can learn meaningful visual representations from CIFAR-10.
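For reference, the accuracy metric is the fraction of samples whose highest-probability class matches the true label. The sketch below shows that computation with NumPy on a tiny made-up batch; `accuracy` is an illustrative helper, and the numbers are not results from this project.

```python
import numpy as np


def accuracy(probs, labels):
    """Fraction of samples where the argmax of the predicted probabilities equals the label."""
    predicted = np.argmax(probs, axis=1)
    true = np.asarray(labels).reshape(-1)   # flatten (N, 1) labels to (N,)
    return float(np.mean(predicted == true))


# Hypothetical softmax outputs for 4 samples and 2 classes.
probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4]])
labels = np.array([[1], [0], [0], [0]])

acc = accuracy(probs, labels)
print(acc)  # 0.75 (3 of 4 correct)
```

Applied to the script's `predictions` and `test_labels` arrays, the same function should agree with the accuracy reported by `model.evaluate`.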
💡 Note: Further accuracy improvements are possible through data augmentation, dropout regularization, batch normalization, learning rate scheduling, and training for more epochs.
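As one illustration of learning rate scheduling, the sketch below computes a cosine decay in plain Python. The function name, the 20-epoch horizon, and the `min_lr` value are assumptions chosen for the example; `base_lr=1e-3` matches the default learning rate of the Keras Adam optimizer.

```python
import math


def cosine_decay(epoch: int, total_epochs: int, base_lr: float = 1e-3, min_lr: float = 1e-5) -> float:
    """Learning rate for a 0-indexed epoch, falling smoothly from base_lr to min_lr."""
    progress = min(epoch, total_epochs - 1) / max(total_epochs - 1, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Hypothetical 20-epoch run.
schedule = [cosine_decay(epoch, total_epochs=20) for epoch in range(20)]

for epoch in (0, 5, 10, 19):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.6f}")
```

A function like this could be wrapped as `lambda epoch, lr: cosine_decay(epoch, 20)` and passed to `tf.keras.callbacks.LearningRateScheduler`, which calls the schedule with the epoch index and the current learning rate.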
Image-Recognition-Model/
│
├── IMAGE RECOGNITION.py    # Main model script (training, evaluation, visualization)
├── Image_Recognition.PNG   # Sample output / prediction visualization
├── README.md               # Project documentation
└── requirements.txt        # Python dependencies (recommended)
| Technology | Version | Purpose |
|---|---|---|
| Python | 3.8+ | Core programming language |
| TensorFlow | 2.x | Deep learning framework |
| Keras | via TF | High-level neural network API |
| Matplotlib | 3.x | Dataset and prediction visualization |
| NumPy | 1.x | Numerical array operations (via TF) |
Ensure you have Python 3.8 or higher installed. You can verify with:
python --version
- Clone the repository:
git clone https://github.com/ibtesaamaslam/Image-Recognition-Model.git
cd Image-Recognition-Model
- Install required dependencies:
pip install tensorflow matplotlib numpy
Or, if a requirements.txt is present:
pip install -r requirements.txt
- Run the training script:
python "IMAGE RECOGNITION.py"
What to expect:
- The CIFAR-10 dataset will be downloaded automatically on first run (~170 MB)
- A 5×5 grid of sample training images will be displayed
- The model will train for 3 epochs; you'll see loss and accuracy per epoch
- Final test accuracy will be printed to the console
- A prediction visualization for the first test image will be displayed
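Before running the training script, it can help to confirm that the interpreter and packages listed above are in place. The following is an optional sketch using only the standard library; `check_environment` is a hypothetical helper, not a file in this repository, and it only checks that the packages can be found, not their versions.

```python
import importlib.util
import sys

REQUIRED = ("tensorflow", "matplotlib", "numpy")


def check_environment(min_python=(3, 8), packages=REQUIRED):
    """Return (python_ok, missing_packages) without importing the heavy libraries."""
    python_ok = sys.version_info >= min_python
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    return python_ok, missing


python_ok, missing = check_environment()
print("Python version OK:", python_ok)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any package reported as missing can be installed with the `pip install` command from the steps above.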
# Import libraries
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
# Load CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
# Define the class names for CIFAR-10 dataset
class_names = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer',
               'Dog', 'Frog', 'Horse', 'Ship', 'Truck']
# Visualize the DATASET
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # Labels have shape (N, 1), so index [0] to get the integer class ID
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
# Build the model
model = models.Sequential()
# Add convolutional and pooling layers
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Flatten the 3D feature maps to 1D and add Dense layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax')) # 10 classes in CIFAR-10
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
model.fit(train_images, train_labels, epochs=3, batch_size=64)
# Evaluate the model on test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")
# Make predictions on the test set
predictions = model.predict(test_images)
# Display the first test image, predicted label, and true label
plt.imshow(test_images[0])
plt.title(f"Predicted: {class_names[predictions[0].argmax()]}, True: {class_names[test_labels[0][0]]}")
plt.show()
- Add Dropout layers to reduce overfitting
- Implement Batch Normalization for faster and more stable training
- Apply Data Augmentation (flips, rotations, crops) to improve generalization
- Train for more epochs with a learning rate scheduler
- Experiment with deeper architectures (ResNet, VGG-style)
- Add model checkpointing and training history plots
- Export the model for inference using model.save()
- Deploy as a web app using Flask or Streamlit
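As a rough illustration of the data augmentation item, the sketch below applies a random horizontal flip followed by a random crop from a zero-padded copy, a combination commonly used for CIFAR-10. It uses NumPy only; the function name `augment`, the 4-pixel padding, and the random stand-in image are assumptions for the example, and this is not code from the project script.

```python
import numpy as np


def augment(image, rng, pad=4):
    """Random horizontal flip, then a random crop back to the original size from a zero-padded copy."""
    height, width, _ = image.shape
    if rng.random() < 0.5:
        image = image[:, ::-1, :]                      # flip left-right
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = rng.integers(0, 2 * pad + 1)                 # crop offset in [0, 2 * pad]
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + height, left:left + width, :]


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3)).astype(np.float32)     # stand-in for a normalized CIFAR-10 image
augmented = augment(image, rng)

print(augmented.shape, augmented.dtype)
```

Augmentation of this kind would be applied to training images only, with a fresh random draw each epoch; test images are left unchanged so that evaluation stays comparable.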
This project is licensed under the MIT License. You are free to use, modify, distribute, and build upon this project for personal and commercial purposes, provided the original copyright notice is retained.
MIT License
Copyright (c) 2024 Ibtesaam Aslam
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
- CIFAR-10 Dataset: Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton (University of Toronto)
- TensorFlow & Keras Documentation: for comprehensive API references
- Matplotlib Documentation: for visualization tools
Made with ❤️ by Ibtesaam Aslam
⭐ If you found this project helpful, please consider giving it a star!