A simplistic implementation of a perceptron, trainable with the Perceptron Learning Algorithm (PLA). This is a very basic implementation that uses the primitive (Heaviside) step activation function and a very simple loss function.
The perceptron was originally introduced by Frank Rosenblatt in his research paper "The perceptron: A probabilistic model for information storage and organization in the brain," and was later analyzed by Marvin Minsky and Seymour Papert in "Perceptrons: An Introduction to Computational Geometry".
The perceptron implementation follows the explanation from the CS6910/CS7015 - Deep Learning course lectures (L1, L2) by Prof. Mitesh M. Khapra.
I recommend checking out his lectures!
Since a single perceptron can only represent linearly separable functions, we use examples from the boolean world, which are linearly separable, except for the XOR and XNOR datasets, which are not!
Above is the (theoretical) mathematical representation of the algorithm. Reference
In essence, we want to represent / learn a linearly separable function with 2 boolean input features and 1 boolean output, using the weighted sum $\sum_{i} w_i x_i$. We define the set P as all inputs in the dataset for which the expected output is 1, and the set N as those for which the expected output is 0.
Thus,

- **Input Vector** `x`: To account for the bias, we augment our input vector. Instead of $[x_1, x_2]$, it becomes $x = [x_0, x_1, x_2]$, where $x_0$ is always 1.
- **Weights Vector** `w`: This vector will therefore have 3 corresponding weights: $w = [w_0, w_1, w_2]$.
- **Initialization**: We initialize `w`, for example to `[0, 0, 0]` or to small random values.
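The setup above can be sketched in a few lines of Python. This is a minimal illustration (assuming NumPy), using the boolean AND function as the example dataset; the variable names are illustrative and not taken from the repository's code:

```python
import numpy as np

# Boolean AND dataset: inputs (x1, x2) -> expected output y.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]

# Augment each input with x0 = 1 so the bias w0 is learned like any other weight.
augmented = [np.array([1, x1, x2]) for (x1, x2) in X]

# P: inputs whose expected output is 1; N: inputs whose expected output is 0.
P = [x for x, label in zip(augmented, y) if label == 1]
N = [x for x, label in zip(augmented, y) if label == 0]

# Initialize the weight vector w = [w0, w1, w2], here to zeros.
w = np.zeros(3)
```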
The term $w_0 x_0 = w_0$ acts as the bias, since $x_0$ is always 1.
The algorithm now enters the `while !convergence` loop. In each iteration, it does the following:
It picks one random input vector `x` from the entire dataset (P ∪ N).
It calculates the weighted sum, also called the activation or net input. Let's call it $a$. In our 2-input case, this is:

$$a = \mathbf{w} \cdot \mathbf{x} = w_0 x_0 + w_1 x_1 + w_2 x_2 = w_0 + w_1 x_1 + w_2 x_2$$
The perceptron's prediction rule is simple:

- If $a \ge 0$, it predicts 1 (positive class).
- If $a < 0$, it predicts 0 (negative class).
This rule is essentially the Heaviside step activation function, which has a hard separating threshold.
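The prediction rule can be sketched as a small function (a minimal illustration assuming NumPy; `predict` and the hand-picked weights below are my own names and values, not the repository's):

```python
import numpy as np

def predict(w, x):
    """Heaviside step activation: 1 if the weighted sum a = w . x >= 0, else 0."""
    a = np.dot(w, x)  # weighted sum / net input
    return 1 if a >= 0 else 0

# With hand-picked weights w = [-1.5, 1, 1] and augmented inputs [1, x1, x2],
# the rule predicts 1 only when x1 + x2 >= 1.5, i.e. only for (1, 1) -- boolean AND.
w = np.array([-1.5, 1, 1])
```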
This is the core of the algorithm. It checks if the prediction from Step 2 was wrong.
Case 1: A False Negative (Type II Error)
- **Check**: `if x ∈ P and a < 0 then`
- **Meaning**: The true label is 1 (it's in P), but the algorithm predicted 0 (because the sum $a$ was negative). This is an error.
- **Update Rule**: `w = w + x`
- **Why?** By adding the input vector `x` to the weights `w`, we are "nudging" the weight vector to be more similar to `x`. This makes it more likely that the dot product $\mathbf{w} \cdot \mathbf{x}$ will be positive the next time this input is seen.
Case 2: A False Positive (Type I Error)
- **Check**: `if x ∈ N and a ≥ 0 then`
- **Meaning**: The true label is 0 (it's in N), but the algorithm predicted 1 (because the sum $a$ was zero or positive). This is also an error.
- **Update Rule**: `w = w - x`
- **Why?** By subtracting the input vector `x` from the weights `w`, we are "pushing" the weight vector away from `x`. This makes it more likely that the dot product $\mathbf{w} \cdot \mathbf{x}$ will be negative the next time.
If the picked input `x` is classified correctly (i.e., it's in P and $a \ge 0$, or in N and $a < 0$), no update is made.
The loop continues this process of picking a random point, checking for an error, and updating the weights if an error is made.
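The whole loop can be sketched as follows. This is a minimal sketch assuming NumPy, not the repository's `train.py`; `train_pla`, `max_epochs`, and the epoch-based convergence check are my own choices:

```python
import random
import numpy as np

def train_pla(P, N, max_epochs=1000, seed=0):
    """Perceptron Learning Algorithm on augmented inputs (x0 = 1).

    P: list of input vectors with expected output 1; N: expected output 0.
    Returns the learned weight vector w, or raises if it fails to converge
    (e.g. when the data is not linearly separable, as for XOR).
    """
    rng = random.Random(seed)
    data = [(x, 1) for x in P] + [(x, 0) for x in N]
    w = np.zeros(len(data[0][0]))
    for _ in range(max_epochs):
        rng.shuffle(data)  # pick inputs in random order
        errors = 0
        for x, label in data:
            a = np.dot(w, x)
            if label == 1 and a < 0:    # false negative: nudge w toward x
                w = w + x
                errors += 1
            elif label == 0 and a >= 0:  # false positive: push w away from x
                w = w - x
                errors += 1
        if errors == 0:  # a full error-free pass over the dataset = convergence
            return w
    raise RuntimeError("did not converge; data may not be linearly separable")
```

For example, training on the augmented boolean AND dataset (P = {[1, 1, 1]}, N = the other three inputs) yields a weight vector that scores all of P non-negative and all of N negative.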
Convergence is achieved when the algorithm can pass through the entire dataset, one input at a time, and make zero corrections. This means:
- Every input `x` in P results in a weighted sum $a \ge 0$ (preferably $a > 0$, which indicates a clear separation).
- Every input `x` in N results in a weighted sum $a < 0$.
Once this happens, the loop condition `!convergence` becomes false, and the algorithm terminates. The final vector `w` now defines the parameters of a line ($w_0 + w_1x_1 + w_2x_2 = 0$) that perfectly separates the positive and negative data points.
This convergence is only guaranteed if the dataset is linearly separable (meaning such a line actually exists).
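The convergence condition itself is easy to express in code. A minimal sketch assuming NumPy (`has_converged` is an illustrative name, not part of the repository):

```python
import numpy as np

def has_converged(w, P, N):
    """True when w makes zero mistakes over the whole dataset:
    every x in P gives w . x >= 0 and every x in N gives w . x < 0."""
    return (all(np.dot(w, x) >= 0 for x in P) and
            all(np.dot(w, x) < 0 for x in N))

# For boolean AND, the hand-picked weights w = [-1.5, 1, 1] define the line
# -1.5 + x1 + x2 = 0, which separates (1, 1) from the other three inputs.
```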
```
pip install -r requirements.txt
python train.py --data_path data/*.csv --epochs 15 --model_path saved_models/* --verbose
python test.py --data_path data/*.csv --model_path saved_models/*
```