Updated On : Nov-14,2021 Time Investment : ~45 mins

Guide to Create Simple Neural Networks using PyTorch

PyTorch is a Python library that provides a framework for developing deep neural networks. It has a numpy-like API for working with N-dimensional arrays (tensors), and operations on them can also run on a GPU, which is usually much faster than running them on a CPU. Apart from linear algebra on GPU, it provides autograd functionality, which automatically calculates the gradients of a function with respect to specified variables. This has sped up deep learning research considerably, as scientists do not need to write code by hand to derive the gradients of the loss of complicated neural networks. It also provides modules for creating neural network layers, loss functions, optimizers, etc. Overall, PyTorch is designed to speed up deep learning research and development.

As a part of this tutorial, we'll explain with simple examples how we can create a simple neural network to solve regression and classification tasks using PyTorch. We'll be using toy datasets available from scikit-learn for our problem. We assume that the reader of this tutorial has a little bit of background on neural network terms (like hidden layers, loss function, optimizer, SGD, etc) as we won't be explaining their inner working in detail. The main aim of the tutorial is to get individuals started developing neural networks using PyTorch.

If you want to learn the basics of PyTorch, please feel free to check our short tutorial that covers its basic API.

Below we have highlighted important sections of the tutorial to give an overview of the material covered.

Important Sections of Tutorial

  • Regression
    • Load Dataset
    • Normalize Data
    • Initialize Model Weights
    • Activation for Hidden Layers
    • Single Layer of Neural Network
    • Single Forward Pass through Data to Make Predictions
    • Define Loss Function
    • Train Neural Network
    • Make Predictions
    • Evaluate Performance of Neural Network
    • Train Data in Batches
    • Make Predictions in Batches
    • Evaluate Performance
  • Classification

Below we have imported PyTorch and printed the version that we'll be using in this tutorial. We have also recorded whether a CUDA-capable GPU is available in a variable named device.

import torch

print("PyTorch Version : {}".format(torch.__version__))
PyTorch Version : 1.9.1+cpu
device = "cuda" if torch.cuda.is_available() else "cpu"

print("Device : {}".format(device))
Device : cpu

Regression

In this section, we'll explain how we can create a simple neural network using PyTorch's numpy-like API to solve simple regression tasks. We'll be using the Boston housing dataset from scikit-learn for our example. We'll create the individual parts of the neural network, test them, and then connect all of them together.

Load Dataset

Below we have loaded the Boston housing dataset available from scikit-learn. The features of the dataset are stored in a variable X, and the target values, which are median house prices in thousands of dollars, are stored in a variable Y. Please note that load_boston() was deprecated in scikit-learn 1.0 and removed in version 1.2, so the code below requires an older scikit-learn release.

After loading the dataset, we have divided it into train (80%) and test (20%) sets. We have then converted all numpy arrays to PyTorch tensors using the torch.tensor() function. The PyTorch operations we use below expect torch tensors as input, hence this step is necessary.

We have also recorded the number of training samples and the number of features in separate variables as we'll be using them in our code.

from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

X_train, X_test, Y_train, Y_test = torch.tensor(X_train, dtype=torch.float32),\
                                   torch.tensor(X_test, dtype=torch.float32),\
                                   torch.tensor(Y_train, dtype=torch.float32),\
                                   torch.tensor(Y_test, dtype=torch.float32)

samples, features = X_train.shape

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
(torch.Size([404, 13]),
 torch.Size([102, 13]),
 torch.Size([404]),
 torch.Size([102]))
samples, features
(404, 13)

Normalize Data

In this section, we have normalized our data. The main reason behind normalizing data is to bring all features of data on the same scale so that all of them are in the almost same range. This helps the gradient descent algorithm to converge faster.

In order to normalize data, we have first found out the mean and standard deviation of our train data. Then we have subtracted the mean from both train and test sets. Then, we have divided the difference by standard deviation.

mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train = (X_train - mean)/ std
X_test = (X_test - mean)/ std
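
As a quick sanity check of the normalization idea described above, the following sketch standardizes a small made-up dataset and confirms that each training column ends up with a mean close to 0 and a standard deviation close to 1. The names `toy_train` and `toy_test` and their values are hypothetical and not part of the tutorial data.

```python
import torch

# Hypothetical toy data: 4 training rows and 2 test rows,
# with 2 features on very different scales.
toy_train = torch.tensor([[1.0, 100.0],
                          [2.0, 200.0],
                          [3.0, 300.0],
                          [4.0, 400.0]])
toy_test = torch.tensor([[2.5, 250.0],
                         [5.0, 500.0]])

# Statistics come from the training rows only and are reused for the test rows.
toy_mean = toy_train.mean(dim=0)
toy_std = toy_train.std(dim=0)  # sample standard deviation by default

toy_train_norm = (toy_train - toy_mean) / toy_std
toy_test_norm = (toy_test - toy_mean) / toy_std

print(toy_train_norm.mean(dim=0))  # close to 0 for every feature
print(toy_train_norm.std(dim=0))   # close to 1 for every feature
print(toy_test_norm)               # test rows scaled with the training statistics
```

Note that the test rows are not guaranteed to have zero mean or unit standard deviation after this step, because they are scaled with the training statistics.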

Initialize Model Weights

In this section, we have designed a small function that takes as input size of layers of our neural networks and then returns weights and biases for all layers in a list initialized with random values.

The function takes layer sizes as input, which include the last layer as well. It then loops through the layer sizes and creates weights and biases for each layer. The weights and biases of each layer are kept together in a list. So in the final returned list, the first entry will be [weights, biases] for the first layer, the second entry will be [weights, biases] for the second layer, and so on.

The shape of weights of each layer will be #units x #units_prev_layer except for the first layer. The first layer weights will have shape #units x #features. All the biases will have shape (#units,).

When we initialized weights and biases with random values using the torch.rand() function (which samples uniformly from the interval [0, 1)), we provided one extra parameter named requires_grad with value True. This parameter tells PyTorch to track operations on these tensors, so that when we call backward() on a function that used them in its calculation (e.g., our neural network loss function), the gradient of that function with respect to each tensor is computed and stored in the grad attribute of that tensor.

def InitializeWeights(layer_sizes):
    weights = []
    for i, units in enumerate(layer_sizes):
        if i==0:
            w = torch.rand(units,features, dtype=torch.float32, requires_grad=True) ## First Layer
        else:
            w = torch.rand(units,layer_sizes[i-1], dtype=torch.float32, requires_grad=True) ## All other layers

        b = torch.rand(units, dtype=torch.float32, requires_grad=True) ## Bias

        weights.append([w,b])

    return weights
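
To make the role of requires_grad concrete, here is a minimal sketch on a single made-up scalar (the names `x` and `y` are illustrative only and not used elsewhere in the tutorial). After calling backward() on y = x^2 + 3x, the grad attribute of x holds dy/dx = 2x + 3.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # ask PyTorch to track operations on x
y = x ** 2 + 3 * x                         # y = x^2 + 3x

y.backward()                               # fills x.grad with dy/dx

print(x.grad)  # dy/dx = 2x + 3, which is 7 at x = 2
```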

Below we have tested our function by giving layer sizes [15,10,1]. After initializing weights and biases using our function, we have also printed the shape of them for verifying that the function works as expected.

We can notice from the results that the first layer has weights of shape 15x13, which matches #units x #features, the second layer has shape 10x15, which matches #units x #units_prev_layer, and so on.

torch.manual_seed(123)

weights = InitializeWeights([15,10,1])

for i, (w,b) in enumerate(weights):
    print("Layer : {}, Weights : {}, Biases : {}".format(i+1, w.shape, b.shape))
Layer : 1, Weights : torch.Size([15, 13]), Biases : torch.Size([15])
Layer : 2, Weights : torch.Size([10, 15]), Biases : torch.Size([10])
Layer : 3, Weights : torch.Size([1, 10]), Biases : torch.Size([1])

Activation for Hidden Layers

In this section, we have designed an activation function that we'll use for our hidden layers (the layers between the input and the output layer). The activation function that we'll use is ReLU (Rectified Linear Unit). It takes a tensor as input and returns a tensor of the same shape where all values less than 0 are replaced with 0, hence the output only has values greater than or equal to 0.

def Relu(tensor):
    return torch.maximum(tensor, torch.zeros_like(tensor)) # max(0,x)

Below we have tested our activation function by giving a sample tensor as input.

tensor = torch.tensor([-1,0,1,-2,4,-6,5])

Relu(tensor)
tensor([0, 0, 1, 0, 4, 0, 5])

Single Layer of Neural Network

In this section, we have defined a simple function that performs the work of one layer of our neural network. The function takes as input weights, input data, and activation function. It then performs the matrix dot product of input data and weights of the neural network. Then it adds biases to the output of the dot product. At last, we apply the activation function to the result and return it.

When performing the matrix product of input data and weights, we pass the transpose of the weights. The reason is that the input data has shape #batch_size x #features and the first-layer weights have shape #units x #features, hence we need to transpose the weights to make the dimensions match. The same happens for inner layers, where the input has shape #batch_size x #prev_layer_units and the weights have shape #units x #prev_layer_units, hence we need to take the transpose of the weights there as well.

The output of this function will be of shape #batch_size x #units.

def LinearLayer(weights, input_data, activation=lambda x: x):
    w, b = weights
    out = torch.matmul(input_data, w.T) + b ## Multiply input by weights and add bias to it.
    return activation(out) ## Apply activation at last

Below we have tested our function on random data and printed shape of input data and output data for verification purposes. We have used weights that we initialized when we defined our weight initialization function. We have used weights of the first layer as input to function. We can notice from the output that it has shape 5x15 which matches #batch_size x #units of the first layer.

rand_data = torch.rand(5, features)

out = LinearLayer(weights[0], rand_data, Relu)

print("Data Shape : {}".format(rand_data.shape))
print("Output Shape : {}".format(out.shape))
Data Shape : torch.Size([5, 13])
Output Shape : torch.Size([5, 15])
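
As a cross-check of the layer computation, the sketch below compares the manual `input @ w.T + b` calculation with `torch.nn.functional.linear`, which stores weights in the same #units x #features layout. The tensors here are freshly generated random values (an assumption for illustration), not the tutorial's trained weights.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

batch = torch.rand(5, 13)   # 5 samples, 13 features
w = torch.rand(15, 13)      # 15 units, stored as #units x #features
b = torch.rand(15)          # one bias per unit

manual = torch.matmul(batch, w.T) + b  # same computation as in LinearLayer
builtin = F.linear(batch, w, b)        # computes batch @ w.T + b

print(manual.shape)                                # torch.Size([5, 15])
print(torch.allclose(manual, builtin, atol=1e-5))  # both paths agree
```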

Single Forward Pass through Data to Make Predictions

In this section, we have defined a function that performs one full forward pass of data through a neural network. The function takes weights and input data as input. It then loops through weights taking weights and biases of single layer and performs calculation of single layer by calling the function we designed in the previous cell. We loop through weights and biases of all layers except the last layer. For all inner layers, we have given Relu function as an activation function. For the last layer, we have not given any activation function because we want the output of the last layer as it is.

def ForwardPass(weights, input_data):
    layer_out = input_data

    for i in range(len(weights[:-1])):
        layer_out = LinearLayer(weights[i], layer_out, Relu) ## Hidden Layer

    preds = LinearLayer(weights[-1], layer_out) ## Final Layer

    return preds.ravel()

Below we have tested the forward pass function of our neural network by giving our train data as input.

preds = ForwardPass(weights, X_train)

print("Input Shape : {}, Output Shape : {}".format(X_train.shape, preds.shape))
Input Shape : torch.Size([404, 13]), Output Shape : torch.Size([404])

Define Loss Function

In this section, we have defined a loss function of our neural network. We'll be using the mean squared error loss function for our regression task.

MSE(actual, predictions) = 1/n * sum_i (actual_i - predictions_i)^2

The function takes actual target values and predicted target values as input. It then subtracts the predicted values from the actual target values, squares the differences, and averages them to return a single MSE value.

def MeanSquaredErrorLoss(actual, preds):
    return torch.pow(actual - preds, 2).mean()

Below we have tested our loss function with a simple example.

y1 = torch.tensor([1,2,3], dtype=torch.float32)
y2 = torch.tensor([4,5,6],dtype=torch.float32)

MeanSquaredErrorLoss(y1, y2)
tensor(9.)
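
For comparison, the same value can be obtained from PyTorch's built-in `torch.nn.functional.mse_loss`, which averages the squared differences by default. This is only a cross-check sketch; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

actual = torch.tensor([1.0, 2.0, 3.0])
preds = torch.tensor([4.0, 5.0, 6.0])

manual_mse = torch.pow(actual - preds, 2).mean()  # hand-written version
builtin_mse = F.mse_loss(preds, actual)           # built-in, mean reduction by default

print(manual_mse, builtin_mse)  # every difference is 3, so both are 9
```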

Train Neural Network

In this section, we have defined a function that performs the training of our neural network. Our training function takes as input train data features (X), target values (Y), learning rate, and the number of epochs. It then executes the training loop once per epoch. Each time, it performs a forward pass of the train data through the neural network, calculates the loss, calculates gradients, and finally updates the weights. The forward pass is performed using the function we designed earlier by giving weights and features data as input. It returns predictions for the input data. The predictions and actual target values are used to calculate the loss value using the loss function. The gradients are calculated by simply calling the backward() method on the loss value. The backward() method uses the chain rule to calculate the gradients of the loss with respect to all tensors for which we had specified requires_grad as True. After the call, each of those tensors has its grad attribute set to the gradient values.

Finally, we update the weights using a loop. We subtract the learning rate times the gradient from every weight and bias tensor. This process of repeatedly updating weights by a small amount in the direction opposite to the gradient is commonly referred to as the gradient descent algorithm. After the weights are updated, we set the grad attribute of all weight tensors to None. This is needed because backward() adds new gradients to whatever is already stored in the grad attribute, so gradients from earlier epochs would accumulate if we did not clear them.

We have kept the code that updates weights inside of the torch.no_grad() context manager. The reason behind this is that PyTorch records every operation involving a tensor with requires_grad set to True in order to calculate gradients later, and it raises an error if such a leaf tensor is modified in place while recording is on. The context manager turns recording off, so the update itself is not treated as part of the network's computation.

We are also printing the loss value every 100 epochs.

def TrainModel(X, Y, learning_rate, epochs):
    for i in range(1, epochs+1):
        preds = ForwardPass(weights, X) ## Make Predictions by forward pass through network

        loss = MeanSquaredErrorLoss(Y, preds) ## Calculate Loss

        loss.backward() ## Calculate Gradients

        with torch.no_grad():
            for j in range(len(weights)): ## Update Weights
                weights[j][0] -= learning_rate * weights[j][0].grad ## Update Weights
                weights[j][1] -= learning_rate * weights[j][1].grad ## Update Biases

                weights[j][0].grad = None
                weights[j][1].grad = None

        if i % 100 == 0: ## Print MSE every 100 epochs
            print("MSE : {:.2f}".format(loss))
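
The two details of the update step, clearing gradients and using torch.no_grad(), can be seen in isolation in the following sketch. It uses a single made-up scalar weight `w` (a hypothetical name) instead of the network's weights.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.clone()    # d(3w)/dw = 3

(3 * w).backward()        # grad was not cleared, so the new gradient is added
second = w.grad.clone()   # 3 + 3 = 6

w.grad = None             # clear it, as the training loop does
(3 * w).backward()
third = w.grad.clone()    # back to 3

with torch.no_grad():     # recording is off, so the in-place update is allowed
    w -= 0.1 * w.grad     # 1.0 - 0.1 * 3 = 0.7

print(first, second, third, w)
```

After the update, `w` still has requires_grad set to True, so the next forward pass is tracked as usual.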

Here, we have trained our neural network by calling the function we designed in the previous cell. We have first initialized number of epochs (2500), learning rate (0.0001) and layer sizes ([5,10,15,1]). We have initialized weights using the weight initialization function which we created earlier by giving layer sizes as input.

We have then called our training function with train features, train target values, learning rate, and epochs. We can notice from the loss getting printed at every 100 epochs that our neural network is going in the right direction.

torch.manual_seed(42) ##For reproducibility.This will make sure that same random weights are initialized each time.

epochs = 2500
learning_rate = torch.tensor(1/1e4) # 0.0001
layer_sizes = [5,10,15,1] ## Layer sizes including last layer
weights = InitializeWeights(layer_sizes) ## Initialize Weights

TrainModel(X_train, Y_train, learning_rate, epochs)
MSE : 170.44
MSE : 66.03
MSE : 31.75
MSE : 25.20
MSE : 23.12
MSE : 21.74
MSE : 20.66
MSE : 19.90
MSE : 19.23
MSE : 18.60
MSE : 18.05
MSE : 17.57
MSE : 17.13
MSE : 16.72
MSE : 16.28
MSE : 15.81
MSE : 15.39
MSE : 15.00
MSE : 14.67
MSE : 14.36
MSE : 14.09
MSE : 13.85
MSE : 13.63
MSE : 13.42
MSE : 13.24

Make Predictions

In this section, we are actually making predictions using our trained neural network weights.

We have made predictions for train data and test data both using our updated weights. We have used a function which we had designed earlier to perform one forward pass through a neural network.

train_preds = ForwardPass(weights, X_train)

train_preds[:5]
tensor([43.3787, 14.4095, 17.4788, 27.6152, 12.9184], grad_fn=<SliceBackward>)
test_preds = ForwardPass(weights, X_test)

test_preds[:5]
tensor([16.6272, 27.8568, 42.6880, 15.6851, 31.4551], grad_fn=<SliceBackward>)

Evaluate Performance of Neural Network

In this section, we have evaluated the performance of our neural network by calculating the R^2 score on both train and test predictions. We have used the r2_score() function available from scikit-learn to calculate the score. The function takes as input actual target values and predicted values, in that order. The best possible score is 1.0, and a value close to 1 is considered a good score. The score is not bounded below: a model that predicts worse than simply returning the mean of the targets gets a negative score.

If you are interested in learning about how R^2 score works then please feel free to check our tutorial on scikit-learn metrics which covers it in detail.

from sklearn.metrics import r2_score

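## r2_score expects (y_true, y_pred). The scores printed below came from a run that passed
## the predictions first, so rerunning with this order may print slightly different values.
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.detach().numpy(), train_preds.detach().numpy())))
print("Test  R^2 Score : {:.2f}".format(r2_score(Y_test.detach().numpy(), test_preds.detach().numpy())))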
Train R^2 Score : 0.81
Test  R^2 Score : 0.62
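
To show what the R^2 score measures, here is a small pure-Python sketch of the usual definition, R^2 = 1 - SS_res / SS_tot. The helper `r2` and the sample lists are hypothetical and written only for illustration; scikit-learn's r2_score() is what the tutorial actually uses.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0]

print(r2(y_true, [1.0, 2.0, 3.0]))  # perfect predictions give 1.0
print(r2(y_true, [2.0, 2.0, 2.0]))  # always predicting the mean gives 0.0
print(r2(y_true, [3.0, 2.0, 1.0]))  # worse than the mean gives a negative score (-3.0)

# The score is not symmetric, so the order of the arguments matters.
off = [1.0, 2.0, 4.0]
print(r2(y_true, off), r2(off, y_true))
```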

Train Data in Batches

In many real-life situations, the data is quite large and does not fit into the main memory of the computer. To solve this issue, we take a small batch of data from the whole dataset at a time, make predictions on it, calculate the loss, calculate gradients, and then update the weights using those gradients. We divide the dataset into batches and perform the same steps for every batch. This approach of updating weights using a small batch of data consisting of a few samples at a time is generally referred to as mini-batch stochastic gradient descent, a variant of gradient descent.

Our current dataset is small and it fits into the main memory of the computer, but we'll treat it as a big dataset that does not fit into the main memory to explain training data in batches.

We have designed a different function to perform training in batches. The function takes training data, training labels, learning rate, number of epochs, and batch size as inputs. It then runs the training loop once per epoch. In each epoch, we first calculate the number of batches in our data. We then loop through the batch indices, calculating the start and end index of each batch to slice a single batch out of the original data. We then perform a forward pass through that batch, calculate the loss, calculate gradients by calling backward() on the loss, and finally update the weights of the neural network. We repeat this for all batches of data, and the whole pass over the batches is repeated for the given number of epochs. In this case, we have moved the logic for updating weights into a separate function to prevent the training function from getting large.

We are also printing the average batch loss every 100 epochs to track it.

def UpdateWeights(weights, learning_rate):
    with torch.no_grad():
        for j in range(len(weights)): ## Update Weights
            weights[j][0] -= learning_rate * weights[j][0].grad ## Update Weights
            weights[j][1] -= learning_rate * weights[j][1].grad ## Update Biases

            weights[j][0].grad = None
            weights[j][1].grad = None

def TrainModelInBatches(X, Y, learning_rate, epochs, batch_size=32):
    for i in range(1, epochs+1):
        batches = torch.arange((X.shape[0]+batch_size-1)//batch_size) ### Batch Indices (ceiling division avoids an empty last batch)

        losses = [] ## Record loss of each batch
        for batch in batches:
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            preds = ForwardPass(weights, X_batch) ## Make Predictions by forward pass through network
            loss = MeanSquaredErrorLoss(Y_batch, preds) ## Calculate Loss
            losses.append(loss.item()) ## Record Loss as a plain float
            loss.backward() ## Calculate Gradients

            UpdateWeights(weights, learning_rate) ## Update Weights

        if i % 100 == 0: ## Print MSE every 100 epochs
            print("MSE : {:.2f}".format(torch.tensor(losses).mean()))
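
The batch boundaries used during training can be worked out in isolation. The sketch below is a pure-Python helper (`batch_bounds` is a hypothetical name, not part of the tutorial code) that returns the start and end index of every batch, using ceiling division so that no empty batch is produced when the number of samples is an exact multiple of the batch size.

```python
def batch_bounds(n_samples, batch_size=32):
    """Return (start, end) index pairs that cover n_samples without empty batches."""
    n_batches = (n_samples + batch_size - 1) // batch_size  # ceiling division
    return [(i * batch_size, min((i + 1) * batch_size, n_samples))
            for i in range(n_batches)]

bounds = batch_bounds(404, batch_size=32)

print(len(bounds))   # 13 batches for 404 training samples
print(bounds[0])     # (0, 32)
print(bounds[-1])    # (384, 404): the last batch holds the remaining 20 samples

print(batch_bounds(64, batch_size=32))  # [(0, 32), (32, 64)]
```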

Below we are actually performing training of our neural network by calling the training routine we designed in the previous cell. We have initialized number of epochs (2500), learning rate (0.0001) and layer sizes ([5,10,15,1]). We have then initialized the weights of the neural network by calling the weight initialization function giving it layer sizes. At last, we have called our training function to actually perform training by giving training data to it.

We can notice from the MSE getting printed every 100 epochs that our neural network is doing a good job.

torch.manual_seed(42) ##For reproducibility.This will make sure that same random weights are initialized each time.

epochs = 2500
learning_rate = torch.tensor(1/1e4) # 0.0001
layer_sizes = [5,10,15,1] ## Layer sizes including last layer
weights = InitializeWeights(layer_sizes) ## Initialize Weights

TrainModelInBatches(X_train, Y_train, learning_rate, epochs)
MSE : 16.29
MSE : 13.36
MSE : 11.64
MSE : 10.51
MSE : 9.86
MSE : 9.38
MSE : 8.92
MSE : 8.49
MSE : 8.13
MSE : 7.86
MSE : 7.66
MSE : 7.50
MSE : 7.40
MSE : 7.31
MSE : 7.20
MSE : 7.03
MSE : 6.87
MSE : 6.76
MSE : 6.65
MSE : 6.56
MSE : 6.47
MSE : 6.38
MSE : 6.27
MSE : 6.17
MSE : 6.08

Make Predictions in Batches

As we have assumed that only a limited number of samples fit into the main memory of the computer at a time, we need a function that makes predictions on one batch of data at a time.

Below we have created a function that performs prediction on batches of data taking one batch at a time and at last, it combines all predictions.

The function generates a number of batches just like our training function at the beginning. It then loops through data in batches, makes predictions, and combines them before returning all predictions.

def MakePredictions(input_data, batch_size=32):
    batches = torch.arange((input_data.shape[0]+batch_size-1)//batch_size) ### Batch Indices (ceiling division avoids an empty last batch)

    with torch.no_grad(): ## Disables automatic gradients calculations
        preds = []
        for batch in batches:
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch = input_data[start:end]

            preds.append(ForwardPass(weights, X_batch))

    return preds

Below we have made predictions on the test and train dataset in batches using the function we designed earlier. We have then combined the predictions of all batches as well.

test_preds = MakePredictions(X_test)

test_preds = torch.cat(test_preds)

train_preds = MakePredictions(X_train)

train_preds = torch.cat(train_preds)

Evaluate Performance

In this section, we have evaluated the R^2 score on the train and test predictions. We can notice that the results are a little better than when we worked on the whole data at once. This is likely because the weights are updated once per batch, so the network receives many more updates in the same number of epochs.

from sklearn.metrics import r2_score

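## r2_score expects (y_true, y_pred). The scores printed below came from a run that passed
## the predictions first, so rerunning with this order may print slightly different values.
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.detach().numpy(), train_preds.detach().numpy())))
print("Test  R^2 Score : {:.2f}".format(r2_score(Y_test.detach().numpy(), test_preds.detach().numpy())))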
Train R^2 Score : 0.92
Test  R^2 Score : 0.70

Classification

In this section, we'll explain how we can design a neural network to solve classification tasks. We'll be creating a small neural network to solve a simple binary classification task. We'll be using the breast cancer dataset available from scikit-learn for our purpose.

The majority of the code in this section repeats what we had already coded in the regression section, hence we won't be including detailed descriptions of it again. We have included it here so that someone who starts directly from this section can follow along from top to bottom without copying code from the regression section.

Load Dataset

In this section, we have loaded the breast cancer dataset available from scikit-learn. The dataset has features related to measurements of tumors in breast cancer and the target variable is binary indicating whether the tumor is malignant or benign. The features of the dataset are loaded in variable X and target values are loaded in variable Y.

After loading the dataset, we have divided it into train (80%) and test (20%) sets. We have then converted all numpy arrays holding the data to PyTorch tensors. We have also recorded the number of training samples, the number of data features, and the unique classes of the target in separate variables as we'll be using them in our code later on.

from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_breast_cancer(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)

X_train, X_test, Y_train, Y_test = torch.tensor(X_train, dtype=torch.float32),\
                                   torch.tensor(X_test, dtype=torch.float32),\
                                   torch.tensor(Y_train, dtype=torch.float32),\
                                   torch.tensor(Y_test, dtype=torch.float32)

samples, features = X_train.shape
classes = Y_test.unique()

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
(torch.Size([455, 30]),
 torch.Size([114, 30]),
 torch.Size([455]),
 torch.Size([114]))
samples, features, classes
(455, 30, tensor([0., 1.]))

Normalize Data

In this section, we have normalized our data as usual by subtracting the mean and dividing the difference by standard deviation. The code is exactly the same as that from the regression section.

mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train = (X_train - mean)/ std
X_test = (X_test - mean)/ std

Initialize Model Weights

In this section, we have included a function to initialize the weights of the neural network. This function is almost exactly the same as the one we used in the regression section with a few minor modifications. We have introduced a new parameter named scale in the function signature. We are not passing requires_grad when creating weights using the torch.rand() function. Instead, we are setting requires_grad after we have applied scale to our weights, so that the scaled tensors themselves are the leaf tensors that receive gradients.

The reason behind applying scale to our weights to decrease their values is that, during our training process, we found that the unscaled weights were producing large inputs to our sigmoid function, which evaluates to 1 (within floating-point precision) for large values. Because our loss function takes the log of the predictions and of one minus the predictions, saturated predictions produce loss terms of log(1) = 0 for samples of class 1 and log(0), which is infinite, for samples of class 0. The sigmoid is also flat in that region, so the gradients flowing back through it are 0. As a result, our model was not training. To get it moving, we needed to reduce the weights so that the values going into the sigmoid are not that big and its output stays below 1. These kinds of adjustments are commonly needed when losses or gradients turn into 0 or NaN.

def InitializeWeights(layer_sizes, scale=0.1):
    weights = []
    for i, units in enumerate(layer_sizes):
        if i==0:
            w = torch.rand(units,features, dtype=torch.float32)
        else:
            w = torch.rand(units,layer_sizes[i-1], dtype=torch.float32)

        b = torch.rand(units, dtype=torch.float32)

        if scale: ## Scale weights
            w = w*scale
            b = b*scale

        w.requires_grad=True ## Set requires grad after weights are updated with scale
        b.requires_grad=True

        weights.append([w,b])

    return weights
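
The saturation problem that motivates the scale parameter can be reproduced on two made-up inputs. In the sketch below, `big_logit` and `small_logit` are hypothetical values standing in for what unscaled and scaled weights might feed into the sigmoid; they are not taken from the tutorial's network.

```python
import torch

big_logit = torch.tensor(100.0)   # stands in for the output of large, unscaled weights
small_logit = torch.tensor(1.0)   # stands in for the output of scaled-down weights

p_big = torch.sigmoid(big_logit)      # rounds to exactly 1.0 in float32
p_small = torch.sigmoid(small_logit)  # about 0.73

# Loss term for a sample whose actual class is 0: -log(1 - p)
loss_big = -torch.log(1 - p_big)      # -log(0) is infinite
loss_small = -torch.log(1 - p_small)  # finite

print(p_big, loss_big)
print(p_small, loss_small)
```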

Activation for Hidden Layers

In this section, we have included activation function Relu for the inner layers of our neural network. It takes as input PyTorch tensor and returns tensor where all values less than 0 will be replaced by 0.

def Relu(tensor):
    return torch.maximum(tensor, torch.zeros_like(tensor)) # max(0,x)

Activation for Last Layer

In this section, we have included code for the activation function of our last layer. As this is a binary classification task, we'll be using the sigmoid function as the activation function for the last layer of our neural network. The sigmoid function takes a PyTorch tensor as input and maps its values into the range (0,1), so the output can be read as the probability of class 1.

sigmoid(x) = 1 / (1 + e^-x)

After defining the function, we have also tested the function on random data. We have compared the results with the ready function available from torch.nn module.

def Sigmoid(tensor):
    return 1 / (1 + torch.exp(-tensor))
tensor = torch.tensor([1,2,3,4,5])

Sigmoid(tensor), torch.nn.Sigmoid()(tensor)
(tensor([0.7311, 0.8808, 0.9526, 0.9820, 0.9933]),
 tensor([0.7311, 0.8808, 0.9526, 0.9820, 0.9933]))

Single Layer of Neural Network

In this section, we have included a function that applies one layer of a neural network to input data. The code for this function is an exact copy of what we have in the regression section.

def LinearLayer(weights, input_data, activation=lambda x: x):
    w, b = weights
    out = torch.matmul(input_data, w.T) + b ## Multiply input by weights and add bias to it.
    return activation(out) ## Apply activation at last

Single Forward Pass through Data to Make Predictions

In this section, we have included a function that performs one forward pass through the whole neural network. It uses the function we defined in the previous cell to apply one layer at a time. This function has exactly the same code as the one from the regression section with only two changes. In the regression section, we did not apply any activation function to the last layer, whereas here we have provided the sigmoid function as the activation function for the last layer. The other change is that we have clipped the values coming from the last layer to the range [0.01,0.99]. This was done to keep the loss finite: the loss function takes log(predictions) and log(1 - predictions), and a prediction of exactly 0 or 1 would make one of these log(0), which is infinite. Such values produce zero or NaN gradients and can mess up the whole training process.

def ForwardPass(weights, input_data):
    layer_out = input_data

    for i in range(len(weights[:-1])):
        layer_out = LinearLayer(weights[i], layer_out, Relu) ## Hidden Layer

    preds = LinearLayer(weights[-1], layer_out, Sigmoid) ## Final Layer

    return torch.clamp(preds.squeeze(), 0.01, 0.99)
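
The effect of the clipping step can be checked on a few made-up sigmoid outputs. In this sketch, `raw` is a hypothetical tensor that includes the two extreme values 0 and 1.

```python
import torch

raw = torch.tensor([0.0, 0.3, 1.0])      # two of the three values are saturated
clipped = torch.clamp(raw, 0.01, 0.99)   # same call as in ForwardPass

print(clipped)                 # the extremes are moved to 0.01 and 0.99
print(torch.log(raw))          # contains -inf for the 0.0 entry
print(torch.log(clipped))      # every entry is finite
print(torch.log(1 - clipped))  # finite as well, so both loss terms are safe
```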

Define Loss Function

In this section, we have defined a loss function for our binary classification task. We'll be using log loss function for our purpose. We have also included the formula of the log loss function below.

log_loss(actual, predictions) = 1/n * sum_i [ -actual_i * log(predictions_i) - (1 - actual_i) * log(1 - predictions_i) ]

After defining the loss function, we have also tested the function with two arrays as input. We have also verified the function output with the ready log loss function available from scikit-learn to check whether our implementation is right.

def NegLogLoss(actual, preds):
    loss = - actual * torch.log(preds) - (1 - actual) * torch.log(1 - preds)
    return loss.mean()
y1 = torch.tensor([1,1,0, 0,1])
y2 = torch.tensor([0.7,0.1,0.69, 0.1,0.23])

NegLogLoss(y1, y2)
tensor(1.0811)
from sklearn.metrics import log_loss

log_loss(y1.detach().numpy(), y2.detach().numpy())
1.0810959234833717
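
The same check can be done without leaving PyTorch by using `torch.nn.functional.binary_cross_entropy`, which takes the predictions first and the targets second and averages by default. This is a cross-check sketch with illustrative variable names; note that the targets are given as floats here.

```python
import torch
import torch.nn.functional as F

actual = torch.tensor([1.0, 1.0, 0.0, 0.0, 1.0])   # float targets
preds = torch.tensor([0.7, 0.1, 0.69, 0.1, 0.23])

manual = (-actual * torch.log(preds) - (1 - actual) * torch.log(1 - preds)).mean()
builtin = F.binary_cross_entropy(preds, actual)    # mean reduction by default

print(manual, builtin)  # both approximately 1.0811
```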

Train Neural Network

In this section, we have defined a function that will actually train our neural network. We just need to call this function and it'll perform the training process. The function has exactly the same code as the one we used in the regression section with only one minor difference which is that we have used the log loss function here.

from torch import autograd

def TrainModel(X, Y, learning_rate, epochs):

    for i in range(1, epochs+1):
        preds = ForwardPass(weights, X) ## Make Predictions by forward pass through network

        loss = NegLogLoss(Y, preds) ## Calculate Loss

        loss.backward() ## Calculate Gradients

        with torch.no_grad():
            for j in range(len(weights)): ## Update Weights
                weights[j][0] -= learning_rate * weights[j][0].grad ## Update Weights
                weights[j][1] -= learning_rate * weights[j][1].grad ## Update Biases

                weights[j][0].grad = None
                weights[j][1].grad = None

        if i % 100 == 0: ## Print NegLogLoss every 100 epochs
            print("NegLogLoss : {:.2f}".format(loss))
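
The training loop sets each `.grad` to `None` after every update. A minimal sketch of why that reset is needed: PyTorch adds new gradients to whatever is already stored in `.grad`, so skipping the reset would mix gradients from earlier steps into later ones. The scalar `w` below is a made-up stand-in for a weight tensor.

```python
import torch

# Hypothetical single "weight" tracked by autograd.
w = torch.tensor(2.0, requires_grad=True)

(w ** 2).backward()
first_grad = w.grad.item()        # d(w^2)/dw at w=2 is 4.0

(w ** 2).backward()               # second backward pass without a reset
accumulated_grad = w.grad.item()  # 8.0, the two gradients were summed

w.grad = None                     # reset, as the training loop does
(w ** 2).backward()
fresh_grad = w.grad.item()        # back to 4.0

print(first_grad, accumulated_grad, fresh_grad)
```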

Below we have actually performed training of our neural network by calling the training function we designed in the previous cell. We have first initialized number of epochs (2500), learning rate (0.01) and layer sizes ([5,10,15,1]). We have then initialized our layer weights and biases using the weight initialization function we had designed earlier.

At last, we have called our training function with train features data, train target values, learning rate, and epochs as input. The log loss printed every 100 epochs falls steadily from 0.69 to 0.13, which indicates that training is moving in the right direction.

torch.manual_seed(42) ## For reproducibility. This makes sure that the same random weights are initialized each time.

epochs = 2500
learning_rate = torch.tensor(1/1e2) # 0.01
layer_sizes = [5,10,15,1] ## Layer sizes including last layer

weights = InitializeWeights(layer_sizes) ## Initialize Weights

TrainModel(X_train, Y_train, learning_rate, epochs)
NegLogLoss : 0.69
NegLogLoss : 0.68
NegLogLoss : 0.67
NegLogLoss : 0.67
NegLogLoss : 0.66
NegLogLoss : 0.66
NegLogLoss : 0.65
NegLogLoss : 0.64
NegLogLoss : 0.62
NegLogLoss : 0.58
NegLogLoss : 0.50
NegLogLoss : 0.42
NegLogLoss : 0.35
NegLogLoss : 0.31
NegLogLoss : 0.27
NegLogLoss : 0.24
NegLogLoss : 0.22
NegLogLoss : 0.20
NegLogLoss : 0.18
NegLogLoss : 0.17
NegLogLoss : 0.16
NegLogLoss : 0.15
NegLogLoss : 0.14
NegLogLoss : 0.13
NegLogLoss : 0.13

Make Predictions

In this section, we make predictions on our train and test datasets using the forward pass function we designed earlier. The output of our neural network is a sigmoid output, i.e. a probability (clipped to the range [0.01,0.99] by the forward pass). We need to convert these probabilities to the actual class of our classification problem. We have set the threshold at 0.5, classifying all values greater than it as class 1 and all other values as class 0. Note that scikit-learn's breast cancer dataset encodes malignant tumors as class 0 and benign tumors as class 1.

train_preds = ForwardPass(weights, X_train)

train_preds = torch.as_tensor(train_preds > 0.5, dtype=torch.float32)

train_preds[:5], Y_train[:5]
(tensor([1., 1., 0., 1., 1.]), tensor([1., 1., 0., 0., 1.]))
test_preds = ForwardPass(weights, X_test)

test_preds = torch.as_tensor(test_preds > 0.5, dtype=torch.float32)

test_preds[:5], Y_test[:5]
(tensor([0., 0., 1., 1., 1.]), tensor([0., 0., 1., 1., 1.]))
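
The thresholding step can be sketched in plain Python as follows. The probabilities are hypothetical values chosen for illustration, and `to_class` is a made-up helper that mirrors the `> 0.5` comparison used above.

```python
def to_class(probability, threshold=0.5):
    # Strictly greater than the threshold maps to class 1,
    # everything else (including exactly 0.5) maps to class 0.
    return 1.0 if probability > threshold else 0.0

probabilities = [0.91, 0.50, 0.49, 0.01]  # hypothetical network outputs
classes = [to_class(p) for p in probabilities]

print(classes)  # [1.0, 0.0, 0.0, 0.0]
```

Because the comparison is strict, a probability of exactly 0.5 is assigned to class 0.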

Evaluate Performance of Neural Network

In this section, we have evaluated the performance of our classification neural network by calculating accuracy on train and test predictions. The model reaches an accuracy of 0.98 on both the train and test sets.

from sklearn.metrics import accuracy_score

print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds)))
print("Test  Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds)))
Train Accuracy : 0.98
Test  Accuracy : 0.98
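
Accuracy is simply the fraction of predictions that match the true labels. As a rough sketch of that calculation, the plain-Python snippet below applies it to the first five train labels and predictions printed in the previous section. The helper `accuracy` is a hypothetical name used only here; the tutorial itself relies on scikit-learn's `accuracy_score`.

```python
def accuracy(actual, predicted):
    # Count positions where the predicted class equals the true class.
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)

y_true = [1.0, 1.0, 0.0, 0.0, 1.0]  # first five train labels shown earlier
y_pred = [1.0, 1.0, 0.0, 1.0, 1.0]  # first five train predictions shown earlier

print(accuracy(y_true, y_pred))  # 0.8
```

Four of these five samples are classified correctly, so this small slice scores 0.8 even though the accuracy over the full train set is 0.98.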

Train Data in Batches

In this section, we have explained how we can perform training in batches on datasets that do not fit into the main memory of the computer. We have included code for the function which actually performs training in batches. The code for this function is exactly the same as the one we used in the regression section with one minor change. We are using the log loss function this time for our classification problem.

def UpdateWeights(weights, learning_rate):
    with torch.no_grad():
        for j in range(len(weights)): ## Update Weights
            weights[j][0] -= learning_rate * weights[j][0].grad ## Update Weights
            weights[j][1] -= learning_rate * weights[j][1].grad ## Update Biases

            weights[j][0].grad = None
            weights[j][1].grad = None

def TrainModelInBatches(X, Y, learning_rate, epochs, batch_size=32):
    for i in range(1, epochs+1):
        losses = [] ## Record loss of each batch

        ## Step through the data one batch at a time. Slicing past the end is safe,
        ## so the last batch is simply smaller, and no empty batch is created when
        ## the number of samples is an exact multiple of batch_size.
        for start in range(0, X.shape[0], batch_size):
            end = start + batch_size

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            preds = ForwardPass(weights, X_batch) ## Make Predictions by forward pass through network

            loss = NegLogLoss(Y_batch, preds) ## Calculate Loss
            losses.append(loss.item()) ## Record Loss as a plain Python float
            loss.backward() ## Calculate Gradients

            UpdateWeights(weights, learning_rate) ## Update Weights

        if i % 100 == 0: ## Print average NegLogLoss of batches every 100 epochs
            print("NegLogLoss : {:.2f}".format(torch.tensor(losses).mean()))
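
To make the batch arithmetic concrete, here is a small plain-Python sketch of one way to compute batch boundaries for a dataset. The helper `batch_slices` and the sample counts 70 and 64 are made up for this illustration and are not part of the tutorial's code.

```python
def batch_slices(n_samples, batch_size=32):
    # Step through the data in strides of batch_size. The final slice is
    # capped at n_samples, so it may be smaller than the others.
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]

print(batch_slices(70))  # [(0, 32), (32, 64), (64, 70)]
print(batch_slices(64))  # [(0, 32), (32, 64)]
```

With 70 samples the last batch holds only the remaining 6 rows, while 64 samples split into exactly two full batches.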

Below we have actually trained our neural network by calling the training function we designed in the previous cell. We have initialized number of epochs (2500), learning rate (0.001) and layer sizes ([5,10,15,1]). We have first initialized our model's weights and biases of each layer using the weights initialization function we designed earlier. We have then trained our neural network using the function we designed in the previous cell by giving it training data features, training target values, learning rate, and epochs as input.

The log loss printed every 100 epochs falls from 0.68 to 0.08, ending lower than the 0.13 reached by the full-batch training earlier.

torch.manual_seed(42) ## For reproducibility. This makes sure that the same random weights are initialized each time.

epochs = 2500
learning_rate = torch.tensor(1/1e3) # 0.001
layer_sizes = [5,10,15, 1] ## Layer sizes including last layer
weights = InitializeWeights(layer_sizes) ## Initialize Weights

TrainModelInBatches(X_train, Y_train, learning_rate, epochs)
NegLogLoss : 0.68
NegLogLoss : 0.67
NegLogLoss : 0.66
NegLogLoss : 0.65
NegLogLoss : 0.65
NegLogLoss : 0.63
NegLogLoss : 0.58
NegLogLoss : 0.46
NegLogLoss : 0.35
NegLogLoss : 0.28
NegLogLoss : 0.24
NegLogLoss : 0.20
NegLogLoss : 0.18
NegLogLoss : 0.16
NegLogLoss : 0.15
NegLogLoss : 0.13
NegLogLoss : 0.12
NegLogLoss : 0.12
NegLogLoss : 0.11
NegLogLoss : 0.10
NegLogLoss : 0.10
NegLogLoss : 0.09
NegLogLoss : 0.09
NegLogLoss : 0.09
NegLogLoss : 0.08

Make Predictions in Batches

In this section, we have included a function to make predictions on a dataset in batches. As we are assuming that our dataset does not fit into the main memory of a computer and we can bring only a batch of data into the main memory, we need to design a function that does prediction on batches of data and then combine them. The below function has almost exactly the same code as the one we had used in the regression section.

def MakePredictions(input_data, batch_size=32):
    preds = []

    with torch.no_grad(): ## Disables automatic gradients calculations
        ## Step through the data one batch at a time; the last batch may be smaller
        for start in range(0, input_data.shape[0], batch_size):
            X_batch = input_data[start:start+batch_size]

            ## reshape(-1) keeps a single-sample batch 1-dimensional so that
            ## the batch predictions can later be combined with torch.cat
            preds.append(ForwardPass(weights, X_batch).reshape(-1))

    return preds

Below we have made predictions on train and test datasets using a function from the previous cell. We have also converted probabilities to class type for evaluation purposes.

test_preds = MakePredictions(X_test) ## Make Predictions on test dataset

test_preds = torch.cat(test_preds) ## Combine all batch predictions

test_preds = torch.as_tensor(test_preds > 0.5, dtype=torch.float32) ## Convert Probabilities to class type

train_preds = MakePredictions(X_train) ## Make Predictions on train dataset

train_preds = torch.cat(train_preds) ## Combine all batch predictions

train_preds = torch.as_tensor(train_preds > 0.5, dtype=torch.float32) ## Convert Probabilities to class type

Evaluate Performance

At last, we have evaluated the performance of our model by calculating the accuracy of the train and test predictions below. The model trained in batches reaches an accuracy of 0.98 on both the train and test sets, the same as the model trained on the full dataset at once.

from sklearn.metrics import accuracy_score

print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds)))
print("Test  Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds)))
Train Accuracy : 0.98
Test  Accuracy : 0.98

This ends our small tutorial explaining how we can use PyTorch's low-level numpy-like API to create neural networks. For those interested, we have also covered creating a neural network with PyTorch's high-level API, available through the torch.nn module, in a separate tutorial (link in the Reference section below). Please feel free to let us know your views in the comments section.

Reference

Sunny Solanki
