- Understanding Image Representation
- CNN Architecture Overview
- Convolution Layer - The Feature Extractor
- Activation Function - ReLU
- Pooling Layer - The Downsampler
- Flattening Layer - Bridge to Classification
- Fully Connected Layer - The Decision Maker
- Regularization Techniques
- Output Layer - Softmax
- Mathematical Formulas - Quick Reference
- Solved Examples
Analogy: Think of a digital image like a mosaic artwork. Each tiny colored tile is a pixel, and together they create the complete picture.
Key Concepts:
- A digital image = Matrix of tiny units called pixels
- Each pixel stores intensity or color information
- Computers don't see objects — they see numbers!
Grayscale vs. RGB Images:
| Image Type | Pixel Storage | Example Value |
|---|---|---|
| Grayscale | 1 number per pixel (brightness) | [128] |
| RGB Color | 3 numbers per pixel (R, G, B) | [128, 64, 32] |
Example Calculation:
- A tree image: 148 × 148 pixels = 21,904 individual values
- Grayscale: 21,904 values
- RGB Color: 21,904 × 3 = 65,712 values
Exam Tip: For RGB image of size M × N:
- Total values = M × N × 3
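As a quick sanity check, the pixel-count rule can be verified in a few lines of Python (a minimal sketch using the image sizes from the example above):

```python
def total_values(height, width, channels=1):
    """Total number of stored values for an image of the given size."""
    return height * width * channels

print(total_values(148, 148))      # grayscale tree image -> 21904
print(total_values(148, 148, 3))   # same image in RGB    -> 65712
```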
Analogy: CNN is like an assembly line in a factory:
- Raw material (Input image) enters
- Quality inspection stations (Convolution + ReLU) extract features
- Compression units (Pooling) reduce size
- Assembly workers (Fully connected layers) make final decisions
- Quality control (Softmax) provides confidence scores
Input Image → [Convolution → Batch Norm → ReLU → Pooling] × n
→ Flatten → Fully Connected → Dropout → Softmax → Output
Key Properties:
- Input layer: Original image (pixel values)
- Hidden layers: Convolution, ReLU, Pooling (partially connected)
- Output layer: Fully connected + Softmax (classification)
Analogy: Imagine using a magnifying glass to scan a document. You move it systematically across the page, examining small sections at a time. That's how convolution filters work!
Definition: A mathematical operation combining two functions to produce a third function.
In CNN:
Feature Map = Convolution(Input Image, Kernel/Filter)
Key Points:
- Filters are learnable matrices with weights and bias
- Weights are randomly initialized, updated during training
- Multiple filters learn different features (edges, textures, shapes)
- Same filter is shared across the entire image (parameter sharing)
Analogy: Each filter is like a detective with a specific specialty:
- Filter 1: Edge detective (finds boundaries)
- Filter 2: Texture detective (finds patterns)
- Filter 3: Shape detective (finds curves)
Traditional Neural Network: Every neuron connected to ALL inputs (fully connected)
- Problem: Too many parameters for images!
CNN Approach: Each neuron connected to small local region only
- Advantage: Fewer parameters, learns spatial hierarchies
Concept: The same kernel weights are used across all spatial locations.
Benefit:
- Drastically reduces parameters
- If image has 1000×1000 pixels and filter is 3×3:
- Without sharing: 1,000,000 × 9 = 9 million parameters
- With sharing: Only 9 parameters!
Sliding Window Protocol:
- Kernel starts at top-left corner
- Moves left to right, computing dot product
- Reaches last column, resets to first column
- Moves one row down
- Repeats until entire image processed
Mathematical Operation (Element-wise multiplication + sum):
Example:
Input Patch: Kernel: Calculation:
[1 2 0] [1 2 2] (1×1)+(2×2)+(0×2)+
[2 1 1] × [0 0 0] = (2×0)+(1×0)+(1×0)+
[0 5 0] [-1 -2 -1] (0×-1)+(5×-2)+(0×-1)
Result = 1 + 4 + 0 + 0 + 0 + 0 + 0 - 10 + 0 = -5
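The element-wise multiply-and-sum above can be reproduced with a short plain-Python sketch (no libraries assumed):

```python
def conv_step(patch, kernel):
    """Dot product of one image patch with the kernel: element-wise multiply, then sum."""
    return sum(p * k for prow, krow in zip(patch, kernel)
                     for p, k in zip(prow, krow))

patch  = [[1, 2, 0], [2, 1, 1], [0, 5, 0]]
kernel = [[1, 2, 2], [0, 0, 0], [-1, -2, -1]]
print(conv_step(patch, kernel))  # -5, matching the worked example
```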
When: Applied between convolution and activation (ReLU)
Purpose:
- Normalizes inputs of each layer
- Reduces internal covariate shift (changes in activation distributions)
- Acts as regularization
Benefits:
- Enables higher learning rates → faster training
- Stabilizes learning
- Improves overall performance
Exam Tip: Batch Normalization is MORE effective in convolutional layers than Dropout!
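The normalization at the heart of batch norm — subtract the batch mean, divide by the batch standard deviation, then scale and shift — can be sketched in plain Python (training-time statistics only; the learnable scale `gamma` and shift `beta` are shown at their usual defaults, an assumption for illustration):

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of activations to zero mean / unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

normed = batch_norm([2.0, 4.0, 6.0, 8.0])
print(normed)  # roughly [-1.34, -0.45, 0.45, 1.34]
```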
Analogy: Like adding a picture frame around your photo to preserve its size.
Purpose:
- Preserve spatial dimensions
- Treat edge pixels similar to center pixels
- Control output size
Visual:
Original 5×5: With Padding (P=1):
[a b c d e] [0 0 0 0 0 0 0]
[f g h i j] [0 a b c d e 0]
[k l m n o] → [0 f g h i j 0]
[p q r s t] [0 k l m n o 0]
[u v w x y] [0 p q r s t 0]
[0 u v w x y 0]
[0 0 0 0 0 0 0]
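Zero padding can be sketched as a small helper that wraps a matrix in a border of zeros (plain Python, P=1 by default):

```python
def zero_pad(matrix, p=1):
    """Surround a 2D matrix with a border of p rows/columns of zeros."""
    width = len(matrix[0]) + 2 * p
    zero_row = [0] * width
    padded = [zero_row[:] for _ in range(p)]      # top border
    for row in matrix:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend(zero_row[:] for _ in range(p))  # bottom border
    return padded

for row in zero_pad([[1, 2], [3, 4]]):
    print(row)  # a 2×2 matrix becomes 4×4 with a ring of zeros
```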
Definition: Number of pixels the filter shifts at each step
Effect:
- Stride = 1: Filter moves 1 pixel at a time → Larger output
- Stride = 2: Filter moves 2 pixels at a time → Smaller output
Analogy: ReLU is like a security gate that only lets positive values through and blocks negative ones.
Mathematical Formula: $$ y = \max(0, x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases} $$
Input Matrix:
M = [-3 19 5]
[ 7 -6 12]
[ 4 -8 17]
After ReLU:
ReLU(M) = [0 19 5]
[7 0 12]
[4 0 17]
Purpose:
- Introduces non-linearity (enables learning complex patterns)
- Suppresses negative activations
- Improves learning efficiency
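Applying ReLU element-wise to the matrix above takes one expression per element (plain Python sketch):

```python
def relu(matrix):
    """Element-wise max(0, x) over a 2D matrix."""
    return [[max(0, x) for x in row] for row in matrix]

M = [[-3, 19, 5], [7, -6, 12], [4, -8, 17]]
print(relu(M))  # [[0, 19, 5], [7, 0, 12], [4, 0, 17]]
```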
Analogy: Like creating a thumbnail image — you keep the important parts but reduce the size.
- Selects maximum element from each region
- Keeps the strongest features
- Better results in practice
- Calculates average of each region
- Smooths the feature map
2×2 Max Pooling (stride = 2):
Input (4×4):        Output (2×2):
[1 3 | 2 4]         [6  8]
[5 6 | 7 8]    →    [9 11]
-----------
[9 2 | 1 3]
[4 5 | 11 7]
Region maxima: max(1,3,5,6) = 6, max(2,4,7,8) = 8, max(9,2,4,5) = 9, max(1,3,11,7) = 11
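A 2×2 max pooling pass with stride 2 can be sketched as (plain Python, even input dimensions assumed):

```python
def max_pool_2x2(matrix):
    """2×2 max pooling with stride 2 over a 2D matrix (even dimensions assumed)."""
    out = []
    for i in range(0, len(matrix), 2):
        row = []
        for j in range(0, len(matrix[0]), 2):
            row.append(max(matrix[i][j], matrix[i][j + 1],
                           matrix[i + 1][j], matrix[i + 1][j + 1]))
        out.append(row)
    return out

fmap = [[1, 3, 2, 4], [5, 6, 7, 8], [9, 2, 1, 3], [4, 5, 11, 7]]
print(max_pool_2x2(fmap))  # [[6, 8], [9, 11]]
```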
Benefits:
- Reduces spatial dimensions (height × width)
- Decreases parameters → Less computation
- Controls overfitting
- Translation invariance (small shifts don't affect output)
- Retains important features
What if NO pooling?
- Feature maps retain same resolution
- Increased computational complexity
- Higher risk of overfitting
Analogy: Like converting a 2D chessboard into a single line of pieces.
Input: 2D Feature Map (Matrix) Output: 1D Vector
Example:
Feature Map (2×3): Flattened Vector:
[2 5 1] [2, 5, 1, 4, 0, 3]
[4 0 3] →
Purpose: Prepare data for fully connected layers (which require 1D input)
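Flattening is just row-major concatenation of the feature map, as a one-line sketch shows:

```python
def flatten(feature_map):
    """Row-major flatten of a 2D feature map into a 1D list."""
    return [x for row in feature_map for x in row]

print(flatten([[2, 5, 1], [4, 0, 3]]))  # [2, 5, 1, 4, 0, 3]
```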
Analogy: Like a committee where every member (neuron) considers ALL information before voting.
- Every neuron connected to all neurons in previous layer
- Similar to traditional neural networks
- Usually forms the final layers of CNN
- Contains large number of parameters
- Takes feature vector from flattening layer
- Learns complex decision boundaries
- Classifies into different categories
Problem: Prone to overfitting (too many parameters)
Solution: Use Dropout regularization
Analogy: Like training a sports team where random players sit out during practice. This forces all players to be versatile, not relying on specific teammates.
How it Works:
- Randomly disable neurons during training
- Typical dropout rate: 0.5 (50% neurons dropped)
- At test time: All neurons active
Benefits:
- Reduces node-to-node dependencies
- Forces network to learn robust features
- Better generalization to new data
- Improves training speed
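This can be sketched as inverted dropout (the common modern variant — an assumption here, since the text does not name one): during training each activation is zeroed with probability p and survivors are scaled by 1/(1−p), so no change is needed at test time.

```python
import random

def dropout(activations, p=0.5, training=True, seed=None):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not training:
        return list(activations)  # all neurons active at test time
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

acts = [0.3, 1.2, 0.8, 2.0]
print(dropout(acts, p=0.5, seed=0))   # roughly half zeroed, survivors doubled
print(dropout(acts, training=False))  # unchanged at test time
```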
Exam Tip:
- Dropout is more effective in fully connected layers
- Batch Normalization is more effective in convolutional layers
Analogy: Like a weather forecaster giving probabilities: 70% rain, 20% cloudy, 10% sunny. All probabilities sum to 100%.
Purpose: Convert raw scores (logits) into probability distribution
Properties:
- All outputs between 0 and 1
- Sum of all outputs = 1.0
- Used for multi-class classification
Example:
Logits (Raw Outputs):    Softmax Probabilities:
[1.3]                    [0.08] → 8% Class 1
[3.1]        →           [0.51] → 51% Class 2 ✓ (Predicted)
[2.2]                    [0.21] → 21% Class 3
[0.7]                    [0.05] → 5% Class 4
[1.9]                    [0.15] → 15% Class 5
                         -----
                         Sum = 1.00
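Softmax itself is just exponentiation followed by normalization. A sketch (subtracting the max logit first is a standard numerical-stability trick, not something the text specifies):

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.3, 3.1, 2.2, 0.7, 1.9])
print([round(p, 2) for p in probs])  # class 2 dominates; probabilities sum to 1
```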
Exam Tip:
- Binary classification: Use Sigmoid/Logistic function
- Multi-class classification: Use Softmax
Most Important Formula for Exams: $$ \text{Output Size} = \left\lfloor \frac{N - F + 2P}{S} \right\rfloor + 1 $$
Where:
- N = Input dimension (height or width)
- F = Kernel/Filter size
- P = Padding
- S = Stride
- ⌊ ⌋ = Floor function (round down)
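The formula translates directly into a small helper function (sketch; Python's `//` already floors the division):

```python
def output_size(n, f, p=0, s=1):
    """Spatial output dimension: floor((N - F + 2P) / S) + 1."""
    return (n - f + 2 * p) // s + 1

print(output_size(10, 3, p=1, s=1))  # 10: dimensions preserved ("same" padding)
print(output_size(5, 3, p=0, s=1))   # 3
print(output_size(32, 5, p=2, s=2))  # 16
```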
For Convolutional Layers: $$ \text{Parameters} = (\text{Kernel Height} \times \text{Kernel Width} \times \text{Input Channels} + 1) \times \text{Output Channels} $$
The +1 accounts for the bias term.
| Operation | Changes Dimensions? | Adds Parameters? |
|---|---|---|
| Convolution | Yes (depends on P, S) | Yes (weights + bias) |
| Batch Normalization | No | Yes (scale + shift) |
| ReLU | No | No |
| Pooling | Yes (reduces size) | No |
| Flatten | Yes (2D → 1D) | No |
| Dropout | No | No |
| Softmax | No | No |
Given:
- Input: 10 × 10 × 10 (Width × Height × Channels)
- Operations:
- 3×3 Conv (40 channels), stride=1, padding=1
- ReLU
- 3×3 Max Pooling, stride=1, padding=1
- 3×3 Conv (20 channels), stride=1, padding=1
- ReLU
- 2×2 Max Pooling, stride=2, padding=1
Solution:
Step 1: 3×3 Convolution (40 channels) $$ \text{Width} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = \left\lfloor \frac{9}{1} \right\rfloor + 1 = 10 $$ $$ \text{Height} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = 10 $$ Output: 10 × 10 × 40 ✓
Step 2: ReLU
- No dimension change
Output: 10 × 10 × 40 ✓
Step 3: 3×3 Max Pooling $$ \text{Width} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = 10 $$ $$ \text{Height} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = 10 $$ Output: 10 × 10 × 40 ✓
Step 4: 3×3 Convolution (20 channels) $$ \text{Width} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = 10 $$ $$ \text{Height} = \left\lfloor \frac{10 - 3 + 2(1)}{1} \right\rfloor + 1 = 10 $$ Output: 10 × 10 × 20 ✓
Step 5: ReLU
- No dimension change
Output: 10 × 10 × 20 ✓
Step 6: 2×2 Max Pooling (stride=2) $$ \text{Width} = \left\lfloor \frac{10 - 2 + 2(1)}{2} \right\rfloor + 1 = \left\lfloor \frac{10}{2} \right\rfloor + 1 = 6 $$ $$ \text{Height} = \left\lfloor \frac{10 - 2 + 2(1)}{2} \right\rfloor + 1 = 6 $$ Output: 6 × 6 × 20 ✓
Final Answer: 6 × 6 × 20
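The six steps can be chained programmatically (a sketch reusing the dimension formula; ReLU is skipped since it never changes dimensions, and the channel count changes only at convolutions):

```python
def output_size(n, f, p=0, s=1):
    """floor((N - F + 2P) / S) + 1"""
    return (n - f + 2 * p) // s + 1

w = h = 10; c = 10                                               # input 10 × 10 × 10
w, h, c = output_size(w, 3, 1, 1), output_size(h, 3, 1, 1), 40   # conv 3×3, 40 ch
w, h = output_size(w, 3, 1, 1), output_size(h, 3, 1, 1)          # 3×3 max pool, s=1, p=1
w, h, c = output_size(w, 3, 1, 1), output_size(h, 3, 1, 1), 20   # conv 3×3, 20 ch
w, h = output_size(w, 2, 1, 2), output_size(h, 2, 1, 2)          # 2×2 max pool, s=2, p=1
print(w, h, c)  # 6 6 20
```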
Same network as Example 1. Calculate total parameters.
Solution:
First Convolution (3×3, 40 channels):
- Kernel: 3 × 3
- Input Channels: 10
- Output Channels: 40
- Parameters: (3 × 3 × 10 + 1) × 40 = 91 × 40 = 3,640
Second Convolution (3×3, 20 channels):
- Kernel: 3 × 3
- Input Channels: 40 (from previous layer)
- Output Channels: 20
- Parameters: (3 × 3 × 40 + 1) × 20 = 361 × 20 = 7,220
ReLU and Pooling: 0 parameters (no learnable weights)
Total Parameters: $$ 3,640 + 7,220 = \boxed{10,860} $$
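The parameter formula as a helper function, checked against this example (sketch):

```python
def conv_params(kh, kw, c_in, c_out):
    """(kernel_h × kernel_w × in_channels + 1) × out_channels; the +1 is the bias."""
    return (kh * kw * c_in + 1) * c_out

first  = conv_params(3, 3, 10, 40)   # 3,640
second = conv_params(3, 3, 40, 20)   # 7,220
print(first + second)                # 10,860
```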
Given:
Input (5×5): Kernel (3×3):
[1 2 0 3 2] [1 2 2]
[2 1 1 1 1] [0 0 0]
[0 5 0 0 0] [-1 -2 -1]
[3 7 0 6 0]
[1 1 3 2 0]
Stride = 1, Padding = 0
Calculate output dimensions: $$ \text{Output} = \left\lfloor \frac{5 - 3 + 0}{1} \right\rfloor + 1 = 3 $$ Output will be 3 × 3
Calculate O₁,₁ (top-left output):
Patch: Kernel:
[1 2 0] [1 2 2]
[2 1 1] × [0 0 0]
[0 5 0] [-1 -2 -1]
Calculation:
(1×1) + (2×2) + (0×2) +
(2×0) + (1×0) + (1×0) +
(0×-1) + (5×-2) + (0×-1)
= 1 + 4 + 0 + 0 + 0 + 0 + 0 - 10 + 0 = -5
Complete Feature Map:
[-5 3 10]
[-11 -8 -7]
[4 -4 -7]
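The full 3×3 convolution over the 5×5 input (stride 1, no padding) can be reproduced with a plain-Python sketch:

```python
def conv2d(image, kernel):
    """Valid 2D convolution (stride 1, no padding), as in the worked example."""
    k = len(kernel)
    out_dim = len(image) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_dim)]
            for i in range(out_dim)]

image = [[1, 2, 0, 3, 2],
         [2, 1, 1, 1, 1],
         [0, 5, 0, 0, 0],
         [3, 7, 0, 6, 0],
         [1, 1, 3, 2, 0]]
kernel = [[1, 2, 2], [0, 0, 0], [-1, -2, -1]]
print(conv2d(image, kernel))  # [[-5, 3, 10], [-11, -8, -7], [4, -4, -7]]
```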
Step-by-Step Process:
1. Input: Image (pixel matrix) enters the network
2. Convolution: Filters slide over image, extract features
   - Creates feature maps
   - Parameter sharing reduces complexity
3. Batch Normalization: Normalize activations
   - Stabilizes learning
   - Reduces covariate shift
4. ReLU: Non-linear activation
   - Keeps positive values
   - Sets negatives to zero
5. Pooling: Downsample feature maps
   - Reduces dimensions
   - Retains important features
   - Provides translation invariance
6. Repeat 2-5: Multiple times for deeper features
   - Early layers: Low-level features (edges, textures)
   - Deep layers: High-level features (shapes, objects)
7. Flatten: Convert 2D feature maps to 1D vector
8. Fully Connected: Learn decision boundaries
   - Apply dropout to prevent overfitting
9. Softmax: Convert to probabilities
   - Final classification output
| Aspect | Traditional NN | CNN |
|---|---|---|
| Connectivity | Fully connected | Locally connected |
| Parameters | Very high | Reduced (sharing) |
| Input Type | 1D vectors | 2D/3D images |
| Spatial Info | Lost | Preserved |
| Best For | Tabular data | Images, spatial data |
| Aspect | Batch Normalization | Dropout |
|---|---|---|
| Best in | Convolutional layers | Fully connected layers |
| Purpose | Reduce covariate shift | Prevent overfitting |
| During Test | Active (with learned stats) | Inactive |
| Effect | Normalizes activations | Randomly drops neurons |
- Always use floor function in dimension calculations
- Remember the +1 in bias for parameter counting
- Check if padding is specified (default is usually 0)
- ReLU and Pooling don't add parameters
- Channels dimension doesn't change in pooling
- Don't forget the floor operation ⌊ ⌋
- Don't confuse stride with kernel size
- Don't count activation functions as having parameters
- Don't mix up input channels vs output channels
- Don't forget to add bias (+1) in parameter formula
Input: 32 × 32 × 3
Conv: 5×5 kernel, 64 filters, stride=2, padding=2
What is the output dimension?
Answer: $$ \left\lfloor \frac{32 - 5 + 2(2)}{2} \right\rfloor + 1 = \left\lfloor \frac{31}{2} \right\rfloor + 1 = 15 + 1 = 16 $$ Output: 16 × 16 × 64
How many parameters in above convolution? $$ (5 \times 5 \times 3 + 1) \times 64 = 76 \times 64 = 4,864 $$
If RGB image is 256 × 256, how many pixel values total? $$ 256 \times 256 \times 3 = 196,608 $$
- Output Dimension: $\left\lfloor \frac{N - F + 2P}{S} \right\rfloor + 1$
- Parameter Count: $(F_h \times F_w \times C_{in} + 1) \times C_{out}$
- ReLU: $\max(0, x)$
- Convolution (extract features)
- Batch Normalization (stabilize)
- ReLU (activate)
- Pooling (downsample)
- Can calculate output dimensions for any P, S, F, N?
- Can count parameters for convolutional layers?
- Understand difference between local vs full connectivity?
- Know when to use Batch Norm vs Dropout?
- Can perform manual convolution calculation?
- Understand purpose of each layer type?
- Know which operations add parameters?
- Can explain parameter sharing benefit?
- Understand flattening process?
- Know Softmax vs Sigmoid usage?
Remember: CNNs are just systematic pattern extractors. Each layer has a specific job, and together they transform raw pixels into meaningful predictions!