Deep Neural Networks: The complete picture

Deep Neural Networks, also called Artificial Neural Networks (ANNs), are a popular choice for solving problems in machine learning, computer vision, natural language processing, and more. The name is inspired by the human brain, and the structure is designed to mimic biological data processing through neurons.

During my learning phase, I found the individual steps involved in building a deep neural network understandable, but I found it elusive to grasp the complete picture. Visualizing the relationship between the various steps involved in training the network was really helpful, as it provides a high-level view of the network's operation. Hence I am writing this article to share my understanding of deep neural networks, from scratch to a bird's-eye view of the complete process. If you are a complete beginner to this topic, I suggest you read this article first, as it can provide a basic intuition of why the network is designed this way. With that being said, let's dive in.

Artificial Neuron

An artificial neuron is the fundamental unit of a deep neural network. It consists of a linear transformation step and an activation step. Each neuron has weights and a bias (threshold) associated with it. It can be represented as follows.

A single neuron

During training, the neurons in the network are automatically tuned to selectively derive the required features from the input, based on the supplied output data.
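As a minimal sketch in NumPy (the sigmoid activation and the specific numbers below are illustrative assumptions, not fixed by the article), a single neuron's computation looks like this:

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # Linear transformation step: weighted sum of the inputs plus the bias
    z = np.dot(w, x) + b
    # Activation step: apply a non-linearity (sigmoid chosen here)
    return sigmoid(z)

# Example: a neuron with three inputs
x = np.array([0.5, -1.2, 3.0])   # input features
w = np.array([0.1, 0.4, -0.2])   # weights, tuned during training
b = 0.05                         # bias (threshold)
print(neuron(x, w, b))
```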

Structure of ANN

A typical deep neural network has the following layers:

  • An input layer- Input features that are fed into the network
  • Hidden layers- Layers that transform the input into the output with the help of multiple neurons
  • An output layer- Produces an output in the desired format (say, a probability)

Image credits: IBM
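One convenient way to describe such a structure in code is a list of layer sizes; the numbers below are arbitrary examples, not taken from the figure:

```python
# Layer sizes for one possible network: 4 input features,
# two hidden layers with 5 and 3 neurons, and 1 output neuron.
layer_dims = [4, 5, 3, 1]
```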

Steps involved in training a deep neural network

  • Initialize the parameters that we would like to learn in the network (the weights and biases of all neurons across the layers)
  • Forward propagation- Run the network for a single forward pass and produce an output
  • Backward propagation- Calculate the gradient of a chosen error metric (the cost function) with respect to the network parameters
  • Update the parameters using the gradients, according to a chosen optimization algorithm (e.g. gradient descent)

Notations used

X- Input | Y- Output | m- Number of examples in the training set

L- Number of layers in the network | l → 1, 2, ..., L

W- Weight | b- Bias

Z- Output after linear transformation | A- Output after activation | g- Activation function

1. Initializing parameters

Weights are initialized randomly in deep neural networks, whereas the biases for the layers can be initialized to zero. To read more about the reasoning behind this, refer to this article.
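A minimal sketch of this step in NumPy, assuming the layer_dims list from above; the 0.01 scaling factor for the random weights is a common illustrative choice, not the only option:

```python
import numpy as np

def initialize_parameters(layer_dims):
    # layer_dims[l] is the number of neurons in layer l (layer 0 is the input)
    params = {}
    L = len(layer_dims) - 1  # number of layers with parameters
    for l in range(1, L + 1):
        # Weights: small random values to break symmetry between neurons
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        # Biases: zeros are fine, since the random weights already break symmetry
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

params = initialize_parameters([4, 5, 3, 1])
```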

2. Forward propagation

Forward propagation applies a series of transformations to the input data to produce an output. A superscript in square brackets denotes the layer number. Since the parameters are vectorized, the computations involved in a hidden layer can be expressed as follows.

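In the notation defined above, these are the standard vectorized forward-prop equations for a layer \(l\), with \(A^{[0]} = X\) so that the first hidden layer consumes the input directly:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$$

$$A^{[l]} = g^{[l]}(Z^{[l]})$$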

3. Back propagation

Backprop involves calculating the gradients of the network's loss function w.r.t. each parameter under consideration. A single unit of backprop applies the chain rule: the gradient of the chosen cost function is propagated backwards through the forward-prop equations, as follows.

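Written out in the notation above, the standard vectorized backprop equations for a layer \(l\) are (here \(*\) denotes element-wise multiplication, \(g^{[l]\prime}\) is the derivative of the activation, and \(dZ^{[l](i)}\) is the \(i\)-th column of \(dZ^{[l]}\)):

$$dZ^{[l]} = dA^{[l]} * g^{[l]\prime}(Z^{[l]})$$

$$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]T}$$

$$db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$$

$$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$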

4. Update parameters

The gradients calculated during the backpropagation step are used to update the parameters so as to reduce the error, or cost, of the network. The following equations describe the parameter updates during the training phase of the network, where \(\alpha\) is the learning rate, a hyperparameter that determines the rate at which the parameters are updated, and the prefix d represents the differential or gradient of the corresponding term.

$$W = W - \alpha\, dW$$

$$b = b - \alpha\, db$$

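A minimal NumPy sketch of this update step, assuming the parameters and gradients are stored in dictionaries keyed "W1", "b1", ... as in the initialization sketch above:

```python
def update_parameters(params, grads, alpha):
    # Move each parameter a small step against its gradient
    L = len(params) // 2  # each layer contributes one W and one b
    for l in range(1, L + 1):
        params["W" + str(l)] -= alpha * grads["dW" + str(l)]
        params["b" + str(l)] -= alpha * grads["db" + str(l)]
    return params
```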

The complete picture

Now that we have performed one complete loop of the steps involved in training our network, we can take a look at the complete network with all the necessary pieces put together.


The output A[L] produced by forward prop is compared with the actual output Y, and the error is calculated using a cost function (e.g. binary cross entropy). The gradients are subsequently calculated to update the parameters and reduce the error. These steps are repeated until the error falls within the required tolerance.
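For reference, binary cross entropy over \(m\) examples is

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y^{(i)}\log A^{[L](i)} + (1-y^{(i)})\log\!\left(1-A^{[L](i)}\right)\right]$$

and the whole training procedure can be sketched as the loop below. Here forward_propagation, compute_cost, and backward_propagation are hypothetical helper names standing in for steps 2 and 3; they are not from the original article.

```python
def train(X, Y, layer_dims, alpha=0.01, num_iterations=1000):
    # Step 1: initialize weights and biases
    params = initialize_parameters(layer_dims)
    for i in range(num_iterations):
        # Step 2: forward propagation (hypothetical helper)
        AL, cache = forward_propagation(X, params)
        # Error of the current predictions, e.g. binary cross entropy (hypothetical helper)
        cost = compute_cost(AL, Y)
        # Step 3: backward propagation (hypothetical helper)
        grads = backward_propagation(AL, Y, cache)
        # Step 4: update parameters with gradient descent
        params = update_parameters(params, grads, alpha)
    return params
```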

We have learnt about the various steps involved in building deep neural networks. I hope you found the article useful. Thank you for reading 🙂