Deep Learning 101: The Perceptron

Christopher Ebenezer
5 min read · Jan 14, 2021


A Brief Intro:

The perceptron is a supervised, binary linear classifier. It is a single-layer neural network.

History:

The perceptron algorithm was the brainchild of Frank Rosenblatt, developed at the Cornell Aeronautical Laboratory and funded by the United States Office of Naval Research. Its first implementation was software running on an IBM 704. It was subsequently reimplemented as custom-built hardware called the “Mark I Perceptron”, designed with the intention of performing image classification tasks.

Although the perceptron seemed promising, it can classify only linearly separable patterns. For example, it cannot learn the XOR function.

Why is it important?

The perceptron was one of the early implementations of neural networks. It was inspired by a basic understanding of the human brain, and it laid one of the foundations for the modern neural network.

Understanding the perceptron is essential to understanding all other neural networks. The perceptron helps us understand the roles of the weights, the NET value, and the activation function.

Structure:

The structure of the perceptron is simple. It has one input layer and one output layer.

Note: The total number of layers is counted as one, since the input layer just passes the inputs to the output layer (no math operations happen at the input layer).

X1, X2, …, Xn are the inputs; W1, W2, …, Wn are the weights; Y is the output; n is the number of inputs. In the output layer, the NET sum (Z) is calculated, and the sum is passed to an activation function (such as Sigmoid) to produce the output.

The structure overall contains four parts.

1. Inputs (X1, X2, …, Xn)

2. Weights (W1, W2, …, Wn) & Bias (b)

3. NET Sum (Z)

4. Activation function (σ)
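
As a rough sketch of how these four parts fit together (the code and the illustrative values are our own, assuming NumPy; none of it is from the original post):

```python
import numpy as np

X = np.array([1.0, 0.0, 1.0])      # 1. Inputs (X1, X2, ..., Xn)
W = np.array([0.4, 0.6, 0.2])      # 2. Weights (W1, W2, ..., Wn)
b = 0.1                            #    ...plus the bias (b)
Z = np.dot(W, X) + b               # 3. NET sum (Z)
Y_hat = 1.0 / (1.0 + np.exp(-Z))   # 4. Activation function (σ), here Sigmoid
```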

Working:

The perceptron is a binary classifier. It maps the inputs (X1, X2, …, Xn) to an output Ŷ. Ŷ takes a value of 0 or 1, where 1 means one class/label and 0 means the other.

Note: Before it can perform this mapping, the neural network is first trained so that it can learn the weight values.

This kind of network can be used for a number of classification applications, for example finding the binary output of 3-input logic gates. Check out this example implemented in Python in Part 2 of the post.

Training:

Forward Pass & Backward Pass.

The algorithm involves a forward pass and a subsequent backward pass during each iteration of the training process. The backward pass adjusts the weights and bias so that the network gives more accurate predictions.

Forward Pass

1) First, the inputs (X1, X2, …, Xn) are passed to the input layer.

2) The weights (W1, W2, …, Wn) are randomly initialised (with values between 0 and 1).

3) For each input node, the input (Xi) and the weight (Wi) are multiplied; let that value be K.

4) The NET value (Z) is calculated by summing K over all input nodes and adding the bias: Z = W1X1 + W2X2 + … + WnXn + b.

5) The NET value is passed to an activation function (such as Sigmoid), which outputs a value between 0 and 1.
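
A minimal sketch of steps 1 to 5, assuming NumPy and the Sigmoid activation (the function names here are our own, not from the post):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def init_weights(n_inputs):
    # Step 2: weights and bias randomly initialised with values between 0 and 1
    return rng.random(n_inputs), rng.random()

def forward(X, W, b):
    # Steps 3-4: multiply each input by its weight and sum, plus the bias, to get Z
    Z = np.dot(W, X) + b
    # Step 5: the Sigmoid activation squashes Z to a value between 0 and 1
    return 1.0 / (1.0 + np.exp(-Z))
```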

Now, after the first iteration, the predicted value will most probably not be close to the actual value, so the weights need to be updated.

Note: The weights act like the memory unit of the neural network.

Backward Pass.

The error is calculated using the formula

e = Y − Ŷ

where Y is the actual value and Ŷ is the predicted value.

Note: Now we need to reduce this error and bring the predicted value close to the actual value. The perceptron uses a rule called the “Perceptron Learning Rule” to update the weights.

1) First, the error is calculated: e = Y − Ŷ.

2) Next, each weight is updated using the formula: Wi = Wi + μ · e · Xi.

3) The bias is updated using the formula: b = b + μ · e.

Where μ is the learning rate, e is the error, Xi are the inputs, and i is the node number.

4) Steps 1 to 3 are repeated until convergence is reached.

Convergence.

When Y − Ŷ = 0 (or close to zero), convergence is reached. The weights and bias are properly set, and thus the neural network has been trained.
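
Putting the backward pass together with the forward pass and init functions sketched above, a training loop might look like this (the convergence tolerance and epoch limit are our assumptions):

```python
def train(X_data, Y_data, n_inputs, mu=0.1, epochs=1000):
    W, b = init_weights(n_inputs)
    for _ in range(epochs):
        total_error = 0.0
        for X, Y in zip(X_data, Y_data):
            Y_hat = forward(X, W, b)   # forward pass
            e = Y - Y_hat              # step 1: error
            W = W + mu * e * X         # step 2: weight update (Perceptron Learning Rule)
            b = b + mu * e             # step 3: bias update
            total_error += abs(e)
        if total_error < 1e-3:         # convergence: Y - Ŷ close to zero for all samples
            break
    return W, b
```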

So how is the model now used?

Now, using the trained model, the input values are given to the input layer, the neural network performs a single forward pass with these values (as described in the Forward Pass steps 1 to 5 of the training process), and the output Ŷ is predicted.

For example, to use the perceptron as an AND gate, we do the following.

  • First, train the model using the inputs and outputs of an AND gate.
  • Here X1, X2, X3 are the inputs and Y is the output (the output label). This is the training data with which we train the neural network.

For predicting values:

  • We pass X1, X2, X3 (the inputs) to the trained neural network, and the value Y is predicted, which is the output of the AND gate (see the sketch below).
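
A worked sketch of this AND-gate example, reusing the train and forward functions above (the 0.5 threshold that turns the Sigmoid output into a 0/1 label is our choice):

```python
import itertools
import numpy as np

# Training data: all 8 rows of the 3-input AND truth table
X_data = [np.array(bits, dtype=float) for bits in itertools.product([0, 1], repeat=3)]
Y_data = [1.0 if x.all() else 0.0 for x in X_data]  # AND outputs 1 only when every input is 1

W, b = train(X_data, Y_data, n_inputs=3, mu=0.5, epochs=5000)

# Prediction: a single forward pass, thresholded at 0.5
for X in X_data:
    print(X, int(forward(X, W, b) > 0.5))
```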

