
Convolutional Neural Networks (CNN) - Exam Prep Notes

From Pixels to Advanced Classification


📚 Table of Contents

  1. Understanding Image Representation
  2. CNN Architecture Overview
  3. Convolution Layer - The Feature Extractor
  4. Activation Function - ReLU
  5. Pooling Layer - The Downsampler
  6. Flattening Layer - Bridge to Classification
  7. Fully Connected Layer - The Decision Maker
  8. Regularization Techniques
  9. Output Layer - Softmax
  10. Mathematical Formulas - Quick Reference
  11. Solved Examples
  12. CNN Summary - The Complete Flow
  13. Key Differences - Quick Comparison
  14. Exam Tips and Common Mistakes
  15. Practice Problems for Exams
  16. Memory Aid - Formulas to Memorize
  17. Final Checklist Before Exam

1. Understanding Image Representation

1.1 What is a Pixel?

Analogy: Think of a digital image like a mosaic artwork. Each tiny colored tile is a pixel, and together they create the complete picture.

Key Concepts:

  • A digital image = Matrix of tiny units called pixels
  • Each pixel stores intensity or color information
  • Computers don't see objects — they see numbers!

1.2 Image as a Matrix

Grayscale vs. RGB Images:

Image Type   Pixel Storage                      Example Value
Grayscale    1 number per pixel (brightness)    [128]
RGB Color    3 numbers per pixel (R, G, B)      [128, 64, 32]

Example Calculation:

  • A tree image: 148 × 148 pixels = 21,904 individual values
  • Grayscale: 21,904 values
  • RGB Color: 21,904 × 3 = 65,712 values

Exam Tip: For RGB image of size M × N:

  • Total values = M × N × 3
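The pixel-count arithmetic above can be checked with a couple of lines of Python (a minimal sketch, using the tree-image numbers from this section):

```python
# Total raw values stored for an image of height M, width N
def total_values(m, n, channels=1):
    return m * n * channels

print(total_values(148, 148))     # grayscale tree image: 21904
print(total_values(148, 148, 3))  # RGB: 65712
```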

2. CNN Architecture Overview

Analogy: CNN is like an assembly line in a factory:

  1. Raw material (Input image) enters
  2. Quality inspection stations (Convolution + ReLU) extract features
  3. Compression units (Pooling) reduce size
  4. Assembly workers (Fully connected layers) make final decisions
  5. Quality control (Softmax) provides confidence scores

2.1 Complete Pipeline

Input Image → [Convolution → Batch Norm → ReLU → Pooling] × n
            → Flatten → Fully Connected → Dropout → Softmax → Output

Key Properties:

  • Input layer: Original image (pixel values)
  • Hidden layers: Convolution, ReLU, Pooling (partially connected)
  • Output layer: Fully connected + Softmax (classification)

3. Convolution Layer - The Feature Extractor

Analogy: Imagine using a magnifying glass to scan a document. You move it systematically across the page, examining small sections at a time. That's how convolution filters work!

3.1 What is Convolution?

Definition: A mathematical operation combining two functions to produce a third function.

In CNN:

Feature Map = Convolution(Input Image, Kernel/Filter)

3.2 Filters/Kernels

Key Points:

  • Filters are learnable matrices with weights and bias
  • Weights are randomly initialized, updated during training
  • Multiple filters learn different features (edges, textures, shapes)
  • Same filter is shared across the entire image (parameter sharing)

Analogy: Each filter is like a detective with a specific specialty:

  • Filter 1: Edge detective (finds boundaries)
  • Filter 2: Texture detective (finds patterns)
  • Filter 3: Shape detective (finds curves)

3.3 Local Connectivity

Traditional Neural Network: Every neuron connected to ALL inputs (fully connected)

  • Problem: Too many parameters for images!

CNN Approach: Each neuron connected to small local region only

  • Advantage: Fewer parameters, learns spatial hierarchies

3.4 Parameter Sharing

Concept: The same kernel weights are used across all spatial locations.

Benefit:

  • Drastically reduces parameters
  • If the image has 1000×1000 pixels and the filter is 3×3:
    • Without sharing (a separate 3×3 filter at each of the ~1,000,000 positions): 1,000,000 × 9 = 9 million parameters
    • With sharing: Only 9 parameters!

3.5 Convolution Operation - Step by Step

Sliding Window Protocol:

  1. Kernel starts at top-left corner
  2. Moves left to right, computing dot product
  3. Reaches last column, resets to first column
  4. Moves one row down
  5. Repeats until entire image processed

Mathematical Operation (Element-wise multiplication + sum):

Example:

Input Patch:        Kernel:           Calculation:
[1  2  0]          [1   2   2]       (1×1)+(2×2)+(0×2)+
[2  1  1]    ×     [0   0   0]   =   (2×0)+(1×0)+(1×0)+
[0  5  0]          [-1  -2  -1]      (0×-1)+(5×-2)+(0×-1)

Result = 1 + 4 + 0 + 0 + 0 + 0 + 0 - 10 + 0 = -5
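The single-patch calculation above is just an element-wise multiply followed by a sum; a minimal Python sketch of it:

```python
patch = [[1, 2, 0],
         [2, 1, 1],
         [0, 5, 0]]
kernel = [[1, 2, 2],
          [0, 0, 0],
          [-1, -2, -1]]

# Element-wise multiply corresponding entries, then sum over the 3x3 window
result = sum(patch[i][j] * kernel[i][j] for i in range(3) for j in range(3))
print(result)  # -5
```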

3.6 Batch Normalization

When: Applied between convolution and activation (ReLU)

Purpose:

  • Normalizes inputs of each layer
  • Reduces internal covariate shift (changes in activation distributions)
  • Acts as regularization

Benefits:

  • Enables higher learning rates → faster training
  • Stabilizes learning
  • Improves overall performance

Exam Tip: Batch Normalization is MORE effective in convolutional layers than Dropout!

3.7 Padding and Stride

Padding

Analogy: Like adding a picture frame around your photo to preserve its size.

Purpose:

  • Preserve spatial dimensions
  • Treat edge pixels similar to center pixels
  • Control output size

Visual:

Original 5×5:          With Padding (P=1):
[a b c d e]           [0 0 0 0 0 0 0]
[f g h i j]           [0 a b c d e 0]
[k l m n o]    →      [0 f g h i j 0]
[p q r s t]           [0 k l m n o 0]
[u v w x y]           [0 p q r s t 0]
                      [0 u v w x y 0]
                      [0 0 0 0 0 0 0]

Stride

Definition: Number of pixels the filter shifts at each step

Effect:

  • Stride = 1: Filter moves 1 pixel at a time → Larger output
  • Stride = 2: Filter moves 2 pixels at a time → Smaller output

4. Activation Function - ReLU

Analogy: ReLU is like a security gate that only lets positive values through and blocks negative ones.

4.1 ReLU Definition

Mathematical Formula: $$ y = \max(0, x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases} $$

4.2 Example

Input Matrix:

M = [-3  19   5]
    [ 7  -6  12]
    [ 4  -8  17]

After ReLU:

ReLU(M) = [0  19   5]
          [7   0  12]
          [4   0  17]
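In code, ReLU is a single element-wise `max(0, x)`; this minimal sketch reproduces the matrix above:

```python
M = [[-3, 19, 5],
     [7, -6, 12],
     [4, -8, 17]]

# Apply max(0, x) to every element
relu = [[max(0, x) for x in row] for row in M]
print(relu)  # [[0, 19, 5], [7, 0, 12], [4, 0, 17]]
```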

Purpose:

  • Introduces non-linearity (enables learning complex patterns)
  • Suppresses negative activations
  • Improves learning efficiency

5. Pooling Layer - The Downsampler

Analogy: Like creating a thumbnail image — you keep the important parts but reduce the size.

5.1 Types of Pooling

Max Pooling (Most Common) ⭐

  • Selects maximum element from each region
  • Keeps the strongest features
  • Better results in practice

Average Pooling

  • Calculates average of each region
  • Smooths the feature map

5.2 Example - Max Pooling

2×2 Max Pooling:

Input (4×4):              Output (2×2):
[1  3 | 2  4]
[5  6 | 7  8]             [6  8]
-----------        →      [9  11]
[9  2 | 1  3]
[4  5 | 11 7]

Top-left:     max(1, 3, 5, 6)  = 6
Top-right:    max(2, 4, 7, 8)  = 8
Bottom-left:  max(9, 2, 4, 5)  = 9
Bottom-right: max(1, 3, 11, 7) = 11
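The same 2×2 max pooling (non-overlapping windows, i.e. stride 2) can be sketched in plain Python:

```python
def max_pool_2x2(m):
    # Non-overlapping 2x2 max pooling (stride 2) on a 2D list
    rows, cols = len(m), len(m[0])
    return [[max(m[i][j], m[i][j + 1], m[i + 1][j], m[i + 1][j + 1])
             for j in range(0, cols, 2)]
            for i in range(0, rows, 2)]

feature_map = [[1, 3, 2, 4],
               [5, 6, 7, 8],
               [9, 2, 1, 3],
               [4, 5, 11, 7]]
print(max_pool_2x2(feature_map))  # [[6, 8], [9, 11]]
```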

5.3 Importance of Pooling

Benefits:

  1. Reduces spatial dimensions (height × width)
  2. Decreases parameters → Less computation
  3. Controls overfitting
  4. Translation invariance (small shifts don't affect output)
  5. Retains important features

What if NO pooling?

  • Feature maps retain same resolution
  • Increased computational complexity
  • Higher risk of overfitting

6. Flattening Layer - Bridge to Classification

Analogy: Like converting a 2D chessboard into a single line of pieces.

6.1 Process

Input: 2D feature map (matrix)
Output: 1D vector

Example:

Feature Map (2×3):        Flattened Vector:
[2  5  1]                 [2, 5, 1, 4, 0, 3]
[4  0  3]          →

Purpose: Prepare data for fully connected layers (which require 1D input)
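Flattening is row-major concatenation; a one-line sketch matching the example above:

```python
feature_map = [[2, 5, 1],
               [4, 0, 3]]

# Row-major flattening: concatenate the rows into one vector
flat = [x for row in feature_map for x in row]
print(flat)  # [2, 5, 1, 4, 0, 3]
```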


7. Fully Connected Layer - The Decision Maker

Analogy: Like a committee where every member (neuron) considers ALL information before voting.

7.1 Characteristics

  • Every neuron connected to all neurons in previous layer
  • Similar to traditional neural networks
  • Usually forms the final layers of CNN
  • Contains large number of parameters

7.2 Role

  • Takes feature vector from flattening layer
  • Learns complex decision boundaries
  • Classifies into different categories

Problem: Prone to overfitting (too many parameters)
Solution: Use Dropout regularization


8. Regularization Techniques

8.1 Dropout

Analogy: Like training a sports team where random players sit out during practice. This forces all players to be versatile, not relying on specific teammates.

How it Works:

  • Randomly disable neurons during training
  • Typical dropout rate: 0.5 (50% neurons dropped)
  • At test time: All neurons active

Benefits:

  • Reduces node-to-node dependencies
  • Forces network to learn robust features
  • Better generalization to new data
  • Improves training speed
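The behavior above can be sketched as "inverted" dropout, the common implementation: surviving activations are scaled by 1/(1 − rate) during training so their expected value is unchanged, and at test time the layer is a pass-through. A minimal sketch (the function name is illustrative):

```python
import random

def dropout(vec, rate=0.5, training=True):
    # Inverted dropout: zero each activation with probability `rate`,
    # scale survivors by 1/(1 - rate); at test time, pass through unchanged.
    if not training:
        return list(vec)
    return [0.0 if random.random() < rate else x / (1 - rate) for x in vec]

random.seed(0)
print(dropout([1.0, 2.0, 3.0, 4.0]))        # survivors doubled, rest zeroed
print(dropout([1.0, 2.0], training=False))  # [1.0, 2.0]
```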

Exam Tip:

  • Dropout is more effective in fully connected layers
  • Batch Normalization is more effective in convolutional layers

9. Output Layer - Softmax

Analogy: Like a weather forecaster giving probabilities: 70% rain, 20% cloudy, 10% sunny. All probabilities sum to 100%.

9.1 Softmax Function

Purpose: Convert raw scores (logits) into probability distribution

Properties:

  • All outputs between 0 and 1
  • Sum of all outputs = 1.0
  • Used for multi-class classification

Example:

Logits (Raw Outputs):     Softmax Probabilities:
[1.3]                     [0.08]  → 8% Class 1
[3.1]              →      [0.51]  → 51% Class 2  ✓ (Predicted)
[2.2]                     [0.21]  → 21% Class 3
[0.7]                     [0.05]  → 5% Class 4
[1.9]                     [0.15]  → 15% Class 5
                          -----
                          Sum = 1.00
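A minimal softmax sketch in plain Python; evaluating it on the logits above shows how raw scores become a distribution that sums to 1:

```python
from math import exp

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.3, 3.1, 2.2, 0.7, 1.9])
print([round(p, 2) for p in probs])  # [0.08, 0.51, 0.21, 0.05, 0.15]
print(f"sum = {sum(probs):.2f}")     # sum = 1.00
```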

Exam Tip:

  • Binary classification: Use Sigmoid/Logistic function
  • Multi-class classification: Use Softmax

10. Mathematical Formulas - Quick Reference

10.1 Output Dimension Formula ⭐⭐⭐

Most Important Formula for Exams: $$ \text{Output Size} = \left\lfloor \frac{N - F + 2P}{S} \right\rfloor + 1 $$

Where:

  • N = Input dimension (height or width)
  • F = Kernel/Filter size
  • P = Padding
  • S = Stride
  • ⌊ ⌋ = Floor function (round down)
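The formula translates directly into code; note that Python's integer division `//` already performs the floor:

```python
def output_size(n, f, p, s):
    # floor((N - F + 2P) / S) + 1
    return (n - f + 2 * p) // s + 1

print(output_size(10, 3, 1, 1))  # 10  (3x3 conv, pad 1, stride 1 preserves size)
print(output_size(32, 5, 2, 2))  # 16
```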

10.2 Parameter Count Formula ⭐⭐⭐

For Convolutional Layers: $$ \text{Parameters} = (\text{Kernel Height} \times \text{Kernel Width} \times \text{Input Channels} + 1) \times \text{Output Channels} $$

The +1 accounts for the bias term.
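As code, with the bias made explicit (the numbers checked here are the two convolutions from Example 2 below):

```python
def conv_params(kh, kw, c_in, c_out):
    # (Kh * Kw * Cin + 1) * Cout; the +1 is one bias per output channel
    return (kh * kw * c_in + 1) * c_out

print(conv_params(3, 3, 10, 40))  # 3640
print(conv_params(3, 3, 40, 20))  # 7220
```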

10.3 ReLU Formula

$$ \text{ReLU}(x) = \max(0, x) $$

10.4 Quick Reference Table

Operation             Changes Dimensions?      Adds Parameters?
Convolution           Yes (depends on P, S)    Yes (weights + bias)
Batch Normalization   No                       Yes (scale + shift)
ReLU                  No                       No
Pooling               Yes (reduces size)       No
Flatten               Yes (2D → 1D)            No
Dropout               No                       No
Softmax               No                       No

11. Solved Examples

Example 1: Calculate Output Dimensions ⭐⭐⭐

Given:

  • Input: 10 × 10 × 10 (Width × Height × Channels)
  • Operations:
    1. 3×3 Conv (40 channels), stride=1, padding=1
    2. ReLU
    3. 3×3 Max Pooling, stride=1, padding=1
    4. 3×3 Conv (20 channels), stride=1, padding=1
    5. ReLU
    6. 2×2 Max Pooling, stride=2, padding=1

Solution:

Step 1: 3×3 Convolution (40 channels) $$ \text{Width} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = \left\lfloor \frac{9}{1} \right\rfloor + 1 = 10 $$ $$ \text{Height} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = 10 $$ Output: 10 × 10 × 40

Step 2: ReLU

  • No dimension change Output: 10 × 10 × 40

Step 3: 3×3 Max Pooling $$ \text{Width} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = 10 $$ $$ \text{Height} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = 10 $$ Output: 10 × 10 × 40

Step 4: 3×3 Convolution (20 channels) $$ \text{Width} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = 10 $$ $$ \text{Height} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = 10 $$ Output: 10 × 10 × 20

Step 5: ReLU

  • No dimension change Output: 10 × 10 × 20

Step 6: 2×2 Max Pooling (stride=2) $$ \text{Width} = \left\lfloor \frac{10 - 2 + 2(1)}{2} \right\rfloor + 1 = \left\lfloor \frac{10}{2} \right\rfloor + 1 = 6 $$ $$ \text{Height} = \left\lfloor \frac{10 - 2 + 2(1)}{2} \right\rfloor + 1 = 6 $$ Output: 6 × 6 × 20

Final Answer: 6 × 6 × 20
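The six steps can be verified in a few lines of Python using the output-size formula (ReLU steps omitted since they never change dimensions):

```python
def output_size(n, f, p, s):
    # floor((N - F + 2P) / S) + 1
    return (n - f + 2 * p) // s + 1

size, channels = 10, 10
size, channels = output_size(size, 3, 1, 1), 40  # Step 1: 3x3 conv, P=1, S=1
size = output_size(size, 3, 1, 1)                # Step 3: 3x3 max pool, P=1, S=1
size, channels = output_size(size, 3, 1, 1), 20  # Step 4: 3x3 conv, P=1, S=1
size = output_size(size, 2, 1, 2)                # Step 6: 2x2 max pool, P=1, S=2
print(size, size, channels)  # 6 6 20
```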


Example 2: Count Parameters ⭐⭐⭐

Same network as Example 1. Calculate total parameters.

Solution:

First Convolution (3×3, 40 channels):

  • Kernel: 3 × 3
  • Input Channels: 10
  • Output Channels: 40

$$ \text{Parameters} = (3 \times 3 \times 10 + 1) \times 40 $$ $$ = (90 + 1) \times 40 = 91 \times 40 = 3,640 $$

Second Convolution (3×3, 20 channels):

  • Kernel: 3 × 3
  • Input Channels: 40 (from previous layer)
  • Output Channels: 20

$$ \text{Parameters} = (3 \times 3 \times 40 + 1) \times 20 $$ $$ = (360 + 1) \times 20 = 361 \times 20 = 7,220 $$

ReLU and Pooling: 0 parameters (no learnable weights)

Total Parameters: $$ 3,640 + 7,220 = \boxed{10,860} $$


Example 3: Convolution with Manual Calculation

Given:

Input (5×5):              Kernel (3×3):
[1  2  0  3  2]          [1   2   2]
[2  1  1  1  1]          [0   0   0]
[0  5  0  0  0]          [-1  -2  -1]
[3  7  0  6  0]
[1  1  3  2  0]

Stride = 1, Padding = 0

Calculate output dimensions: $$ \text{Output} = \left\lfloor \frac{5 - 3 + 0}{1} \right\rfloor + 1 = 3 $$ Output will be 3 × 3

Calculate O₁,₁ (top-left output):

Patch:                Kernel:
[1  2  0]            [1   2   2]
[2  1  1]     ×      [0   0   0]
[0  5  0]            [-1  -2  -1]

Calculation:
(1×1) + (2×2) + (0×2) +
(2×0) + (1×0) + (1×0) +
(0×-1) + (5×-2) + (0×-1)

= 1 + 4 + 0 + 0 + 0 + 0 + 0 - 10 + 0 = -5

Complete Feature Map:

[-5   3   10]
[-11  -8  -7]
[4    -4  -7]
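The full feature map above can be reproduced with a small sliding-window convolution in plain Python (valid convolution, i.e. no padding; a minimal sketch, not an optimized implementation):

```python
def conv2d(image, kernel, stride=1):
    # Valid (no padding) convolution: slide the kernel over the image
    # and take the element-wise product-sum at each position
    k = len(kernel)
    out = (len(image) - k) // stride + 1
    return [[sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out)]
            for i in range(out)]

image = [[1, 2, 0, 3, 2],
         [2, 1, 1, 1, 1],
         [0, 5, 0, 0, 0],
         [3, 7, 0, 6, 0],
         [1, 1, 3, 2, 0]]
kernel = [[1, 2, 2],
          [0, 0, 0],
          [-1, -2, -1]]

for row in conv2d(image, kernel):
    print(row)
# [-5, 3, 10]
# [-11, -8, -7]
# [4, -4, -7]
```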

12. CNN Summary - The Complete Flow

Step-by-Step Process:

  1. Input: Image (pixel matrix) enters the network

  2. Convolution: Filters slide over image, extract features

    • Creates feature maps
    • Parameter sharing reduces complexity
  3. Batch Normalization: Normalize activations

    • Stabilizes learning
    • Reduces covariate shift
  4. ReLU: Non-linear activation

    • Keeps positive values
    • Sets negatives to zero
  5. Pooling: Downsample feature maps

    • Reduces dimensions
    • Retains important features
    • Provides translation invariance
  6. Repeat 2-5: Multiple times for deeper features

    • Early layers: Low-level features (edges, textures)
    • Deep layers: High-level features (shapes, objects)
  7. Flatten: Convert 2D feature maps to 1D vector

  8. Fully Connected: Learn decision boundaries

    • Apply dropout to prevent overfitting
  9. Softmax: Convert to probabilities

    • Final classification output

13. Key Differences - Quick Comparison

CNN vs Traditional Neural Network

Aspect         Traditional NN     CNN
Connectivity   Fully connected    Locally connected
Parameters     Very high          Reduced (sharing)
Input Type     1D vectors         2D/3D images
Spatial Info   Lost               Preserved
Best For       Tabular data       Images, spatial data

Batch Norm vs Dropout

Aspect        Batch Normalization           Dropout
Best in       Convolutional layers          Fully connected layers
Purpose       Reduce covariate shift        Prevent overfitting
During Test   Active (with learned stats)   Inactive
Effect        Normalizes activations        Randomly drops neurons

14. Exam Tips and Common Mistakes ⚠️

✅ Do's:

  1. Always use floor function in dimension calculations
  2. Remember the +1 in bias for parameter counting
  3. Check if padding is specified (default is usually 0)
  4. ReLU and Pooling don't add parameters
  5. Channels dimension doesn't change in pooling

❌ Don'ts:

  1. Don't forget the floor operation ⌊ ⌋
  2. Don't confuse stride with kernel size
  3. Don't count activation functions as having parameters
  4. Don't mix up input channels vs output channels
  5. Don't forget to add bias (+1) in parameter formula

15. Practice Problems for Exams

Problem 1:

Input: 32 × 32 × 3 Conv: 5×5 kernel, 64 filters, stride=2, padding=2 What is output dimension?

Answer: $$ \left\lfloor \frac{32 - 5 + 2(2)}{2} \right\rfloor + 1 = \left\lfloor \frac{31}{2} \right\rfloor + 1 = 15 + 1 = 16 $$ Output: 16 × 16 × 64

Problem 2:

How many parameters in above convolution? $$ (5 \times 5 \times 3 + 1) \times 64 = 76 \times 64 = 4,864 $$

Problem 3:

If RGB image is 256 × 256, how many pixel values total? $$ 256 \times 256 \times 3 = 196,608 $$


16. Memory Aid - Formulas to Memorize 📝

The Big Three:

  1. Output Dimension: $\left\lfloor \frac{N - F + 2P}{S} \right\rfloor + 1$

  2. Parameters Count: $(F_h \times F_w \times C_{in} + 1) \times C_{out}$

  3. ReLU: $\text{max}(0, x)$

Remember CBRP:

  • Convolution (extract features)
  • Batch Normalization (stabilize)
  • ReLU (activate)
  • Pooling (downsample)

17. Final Checklist Before Exam ✓

  • Can calculate output dimensions with any P, S, F, N?
  • Can count parameters for convolutional layers?
  • Understand difference between local vs full connectivity?
  • Know when to use Batch Norm vs Dropout?
  • Can perform manual convolution calculation?
  • Understand purpose of each layer type?
  • Know which operations add parameters?
  • Can explain parameter sharing benefit?
  • Understand flattening process?
  • Know Softmax vs Sigmoid usage?

Good Luck! 🎯

Remember: CNNs are just systematic pattern extractors. Each layer has a specific job, and together they transform raw pixels into meaningful predictions!