Neural network types like fully connected networks and convolutional neural networks are good at identifying patterns in data, but they do not have memory. They treat each data example, and each part of an example, as independent of the others and cannot maintain any state/memory about previously seen examples. This behavior is fine as long as the examples (like images) really are independent of each other, but there are situations where remembering state information about previously seen data helps produce better results. For the natural language processing task of text generation, for instance, if our network can remember some state information about the words it has already seen, it can use the context of the sentence to generate better new words. The same idea helps with time-series data, where a new prediction generally depends on the last few observations.
To solve this problem of maintaining memory, recurrent neural networks (RNNs) were introduced. Recurrent neural networks maintain state information about the data examples and use it to improve results. If you are interested in learning about the inner workings of RNNs, we recommend this blog, which covers them in detail.
As a part of this tutorial, we are going to design simple RNNs using PyTorch to solve text classification tasks. We'll try different approaches to using RNNs to classify text documents. We'll be using the word embedding approach to vectorize words to real-valued vectors before giving them to RNNs. The main aim of the tutorial is to get individuals started using RNNs for text classification tasks. Please check the link below if you are looking for guidance on LSTM Networks (Long Short-Term Memory - a variant of RNNs). It has almost the same structure as this tutorial but explains how to use LSTM networks.
Below, we have listed the important sections of the tutorial to give an overview of the material covered.
First, we have imported the necessary libraries and printed the versions that we'll use in our tutorial.
import torch
print("PyTorch Version : {}".format(torch.__version__))
import torchtext
print("TorchText Version : {}".format(torchtext.__version__))
In this section, we have loaded the AG NEWS dataset and populated a vocabulary using tokens generated from the text examples of the dataset. The vocabulary will later be used to map tokens to the indexes that identify them. These token indexes will be given as input to the neural networks for classifying text documents.
In this section, we have simply loaded AG NEWS dataset available from datasets sub-module of torchtext library. The dataset is already divided into the train (120000 text examples) and test (7600 text examples) sets.
from torch.utils.data import DataLoader
train_dataset, test_dataset = torchtext.datasets.AG_NEWS()
In this section, we have populated a vocabulary for vectorizing text data using the datasets loaded in the previous cell. We have first defined a simple tokenizer using the get_tokenizer() function available from the data sub-module of the torchtext library. The tokenizer is a function that takes a text example as input and returns a list of its tokens. The tokens are generally the words of the text, but they can also be punctuation marks and special symbols.
After defining the tokenizer, we have created a vocabulary using the build_vocab_from_iterator() function available from the vocab sub-module of the torchtext library. The function takes an iterator as input that returns a list of tokens on each call. We have created a simple iterator function named build_vocabulary() which takes the datasets as input, loops through each dataset and its text examples, and yields the list of tokens for each text example produced by the tokenizer. The special token <UNK> will be kept at the 0th index, and all tokens not present in the vocabulary will be mapped to it.
In the cells following the one below, we have printed the length of the vocabulary and shown how we can convert a text example to a list of indexes using the tokenizer and vocabulary. This list of integers (indexes of tokens/words as per the vocabulary) will be given as input to the neural network, which will generate embeddings for them.
from torchtext.data import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
tokenizer = get_tokenizer("basic_english")
def build_vocabulary(datasets):
    for dataset in datasets:
        for _, text in dataset:
            yield tokenizer(text)
vocab = build_vocab_from_iterator(build_vocabulary([train_dataset, test_dataset]), min_freq=1, specials=["<UNK>"])
vocab.set_default_index(vocab["<UNK>"])
len(vocab)
tokens = tokenizer("Hello how are you?, Welcome to CoderzColumn!!")
indexes = vocab(tokens)
tokens, indexes
vocab["<UNK>"] ## Coderzcolumn word is mapped to unknown as it's new and not present in vocabulary
This is our first approach, which uses a single recurrent layer in the network to classify text documents. We'll be using a combination of embedding, recurrent, and dense layers to create the neural network.
Below, we have loaded the AG NEWS dataset again and created data loaders from it that will be used during training to loop through the data in batches. The train and test data loaders return a batch of 1024 examples and their respective target labels on each call. We have created a simple vectorization function (vectorize_batch()) that will be used to vectorize the text examples of a batch of data. For each batch, it tokenizes the text examples using the tokenizer, generates indexes using the vocabulary, and returns the indexes as torch tensors along with the target labels. We have set the maximum token count to 25, which keeps 25 tokens per text example. Examples with more than 25 tokens are truncated and examples with fewer than 25 tokens are padded with 0s (the <UNK> token). The function also subtracts 1 from the target labels because the labels are in the range 1-4 in the data and we want them in the range 0-3 for our convenience. The vectorization function is given to the collate_fn parameter of both data loaders.
from torch.utils.data import DataLoader
from torchtext.data.functional import to_map_style_dataset
train_dataset, test_dataset = torchtext.datasets.AG_NEWS()
train_dataset, test_dataset = to_map_style_dataset(train_dataset), to_map_style_dataset(test_dataset)
target_classes = ["World", "Sports", "Business", "Sci/Tech"]
max_words = 25
def vectorize_batch(batch):
    Y, X = list(zip(*batch))
    X = [vocab(tokenizer(text)) for text in X]
    X = [tokens+([0]* (max_words-len(tokens))) if len(tokens)<max_words else tokens[:max_words] for tokens in X] ## Bringing all samples to max_words length.
    return torch.tensor(X, dtype=torch.int32), torch.tensor(Y) - 1 ## We have deducted 1 from target labels to get them in range [0,1,2,3] from [1,2,3,4]
train_loader = DataLoader(train_dataset, batch_size=1024, collate_fn=vectorize_batch, shuffle=True)
test_loader = DataLoader(test_dataset , batch_size=1024, collate_fn=vectorize_batch)
for X, Y in train_loader:
    print(X.shape, Y.shape)
    break
In this section, we have created a neural network that we'll be using for the text classification task. The network consists of 3 layers.
The embedding layer holds a word embedding for each token/word of our vocabulary. We have set an embedding length of 50 for our example, which means the embedding layer has a weight of shape (len_vocab, 50): one embedding vector of length 50 for each token of our vocabulary. The embedding layer simply maps input indexes to their embeddings. Our data loaders return indexes for the tokens of the text examples; these indexes are given to the embedding layer as input, and it converts them to embeddings. These embeddings are updated during training to better classify documents. The input of the embedding layer is of shape (batch_size, 25) and the output is (batch_size, 25, 50); the batch size in our case is 1024. If you do not have a background in word embeddings, we recommend reading the below article, which covers them in detail.
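To make these shapes concrete, here is a minimal sketch (our own illustration, not part of the tutorial's network) that uses a small hypothetical vocabulary of 100 tokens in place of our real vocabulary and checks the weight and output shapes of an embedding layer.

import torch
from torch import nn

emb = nn.Embedding(num_embeddings=100, embedding_dim=50)  ## hypothetical vocabulary of 100 tokens
token_indexes = torch.randint(0, 100, (1024, 25))         ## (batch_size, max_tokens) integer indexes
print(emb.weight.shape)                                   ## torch.Size([100, 50])
print(emb(token_indexes).shape)                           ## torch.Size([1024, 25, 50])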
The RNN layer takes input from the embedding layer of shape (batch_size, max_tokens, embedding_length) = (batch_size, 25, 50), performs its operations, and returns an output of shape (batch_size, max_tokens, hidden_size) = (batch_size, 25, 50). In our case, the hidden size of the RNN layer is 50; if, for example, the hidden size were set to 75, the output of the recurrent layer would be of shape (batch_size, 25, 75). The recurrent layer loops through the embeddings of the tokens of each text example and generates output that carries some knowledge about the context of the text document. When calling the recurrent layer in the forward pass, we need to provide an initial state, which we have provided as random numbers. An initial state is needed for each text example; if we don't provide it, PyTorch internally uses a tensor of zeros. In our case, we have provided random numbers of shape (1, batch_size, 50), which means that for each text example we have provided a real-valued vector of length 50 as the initial state. We recommend going through this link if you want to know how the RNN layer works internally. Basically, it takes the embedding of a single token and the current state as input and returns an output; that output becomes the state used with the next token of the text example. For the next token, it again takes the embedding and the state (the previous output) and returns a new output (state). This process continues until the last token of the text example, and the loop over the 25 tokens is repeated for each text example.
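The sketch below (our own rough illustration, not the tutorial's code) mimics this internal loop using nn.RNNCell, which performs a single step of an RNN layer: the output produced for token t is fed back in as the state for token t+1. The tensor sizes are only placeholders.

import torch
from torch import nn

batch_size, max_tokens, embed_len, hidden_dim = 4, 25, 50, 50
cell = nn.RNNCell(input_size=embed_len, hidden_size=hidden_dim)  ## one step of an RNN layer

embeddings = torch.randn(batch_size, max_tokens, embed_len)  ## stand-in for embedding layer output
state = torch.zeros(batch_size, hidden_dim)                  ## initial state (zeros, like PyTorch's default)

outputs = []
for t in range(max_tokens):                    ## loop through the tokens of each text example
    state = cell(embeddings[:, t], state)      ## new output, reused as the state for the next token
    outputs.append(state)

output = torch.stack(outputs, dim=1)
print(output.shape)  ## torch.Size([4, 25, 50]) = (batch_size, max_tokens, hidden_size)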
We can create a recurrent layer using the RNN() constructor available from the nn sub-module of the torch library. We need to provide the input size and hidden dimension size to the constructor. We can stack multiple RNN layers by providing an integer value greater than 1 to the num_layers parameter. By default, the output shape of the RNN layer is (25, batch_size, 50), but we have changed it to (batch_size, 25, 50) by setting the batch_first parameter to True.
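Below is a small shape comparison (our own sketch, not part of the tutorial's network) showing the effect of batch_first: the same kind of layer returns (max_tokens, batch_size, hidden_size) by default and (batch_size, max_tokens, hidden_size) when batch_first=True.

import torch
from torch import nn

rnn_default    = nn.RNN(input_size=50, hidden_size=50)                    ## batch_first=False (default)
rnn_batchfirst = nn.RNN(input_size=50, hidden_size=50, batch_first=True)

out1, _ = rnn_default(torch.randn(25, 1024, 50))     ## input as (max_tokens, batch_size, embed_len)
out2, _ = rnn_batchfirst(torch.randn(1024, 25, 50))  ## input as (batch_size, max_tokens, embed_len)
print(out1.shape)  ## torch.Size([25, 1024, 50])
print(out2.shape)  ## torch.Size([1024, 25, 50])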
The output of the RNN is given to the linear layer, which has 4 output units (the same as the number of target classes). We give the last output of each example (output[:,-1]) to the linear layer because the RNN layer generates an output for each of the 25 tokens of the text example, and according to the idea behind RNNs, the last output already carries information about all previous tokens. So, even though the output of the RNN layer is of shape (batch_size, 25, 50), the input to the linear layer is (batch_size, 50) because we take only the last of the 25 entries for each example.
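As a quick sanity check (our own sketch, not part of the tutorial's code), for a unidirectional RNN with batch_first=True, the last time step of output, i.e. output[:, -1], is exactly the final hidden state of the last layer returned by the RNN:

import torch
from torch import nn

rnn = nn.RNN(input_size=50, hidden_size=50, num_layers=1, batch_first=True)
output, hidden = rnn(torch.randn(1024, 25, 50))   ## default zero initial state
print(output.shape)                               ## torch.Size([1024, 25, 50])
print(hidden.shape)                               ## torch.Size([1, 1024, 50])
print(torch.allclose(output[:, -1], hidden[-1]))  ## True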
Please feel free to check the below tutorial if you are new to PyTorch and want to learn how to design networks using it. It's a simple guide for getting started with PyTorch.
After defining the network, we initialized it, printed the shape of weights/biases of layers, and performed a forward pass to make predictions as well. These steps were done for verification purposes that the network works as expected.
from torch import nn
from torch.nn import functional as F
embed_len = 50
hidden_dim = 50
n_layers=1
class RNNClassifier(nn.Module):
    def __init__(self):
        super(RNNClassifier, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_len)
        self.rnn = nn.RNN(input_size=embed_len, hidden_size=hidden_dim, num_layers=n_layers, batch_first=True)
        self.linear = nn.Linear(hidden_dim, len(target_classes))

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings, torch.randn(n_layers, len(X_batch), hidden_dim))
        return self.linear(output[:,-1])
rnn_classifier = RNNClassifier()
rnn_classifier
for layer in rnn_classifier.children():
    print("Layer : {}".format(layer))
    print("Parameters : ")
    for param in layer.parameters():
        print(param.shape)
    print()
out = rnn_classifier(torch.randint(0, len(vocab), (1024, max_words)))
out.shape
In this section, we are training our network using the data loaders. We have created a helper function for training the network. The function takes the model, loss function, optimizer, train data loader, validation data loader, and the number of epochs as input. It then runs the training loop for the given number of epochs. For each epoch, it loops through the training data in batches using the train data loader. For each batch of data, it performs a forward pass to make predictions, calculates the loss value (using the predictions and actual target labels), calculates gradients, and updates the network parameters using those gradients. It also records the loss value for each batch and prints the average loss at the end of each epoch. We have also created another helper function that loops through the validation data loader and calculates the validation accuracy and loss.
from tqdm import tqdm
from sklearn.metrics import accuracy_score
import gc
def CalcValLossAndAccuracy(model, loss_fn, val_loader):
    with torch.no_grad():
        Y_shuffled, Y_preds, losses = [],[],[]
        for X, Y in val_loader:
            preds = model(X)
            loss = loss_fn(preds, Y)
            losses.append(loss.item())

            Y_shuffled.append(Y)
            Y_preds.append(preds.argmax(dim=-1))

        Y_shuffled = torch.cat(Y_shuffled)
        Y_preds = torch.cat(Y_preds)

        print("Valid Loss : {:.3f}".format(torch.tensor(losses).mean()))
        print("Valid Acc  : {:.3f}".format(accuracy_score(Y_shuffled.detach().numpy(), Y_preds.detach().numpy())))
def TrainModel(model, loss_fn, optimizer, train_loader, val_loader, epochs=10):
    for i in range(1, epochs+1):
        losses = []
        for X, Y in tqdm(train_loader):
            Y_preds = model(X)

            loss = loss_fn(Y_preds, Y)
            losses.append(loss.item())

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        print("Train Loss : {:.3f}".format(torch.tensor(losses).mean()))
        CalcValLossAndAccuracy(model, loss_fn, val_loader)
Below, we have initialized the necessary parameters and trained our network using a function defined in the previous cell. We have initialized a number of epochs to 15 and the learning rate to 0.001. Then, we have initialized the loss function, our classification network, and Adam optimizer. At last, we have called our training routine with the necessary parameters to perform training. We can notice from the loss and accuracy value getting printed after each epoch that our model is doing a good job at the text classification task.
from torch.optim import Adam
epochs = 15
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
rnn_classifier = RNNClassifier()
optimizer = Adam(rnn_classifier.parameters(), lr=learning_rate)
TrainModel(rnn_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
In this section, we have evaluated the performance of our network by calculating accuracy, classification report (precision, recall, and f1-score per target class) and confusion matrix metrics on test predictions. We can notice from the accuracy that our model has done a decent job at classifying text documents of the test dataset.
We have used metrics available from scikit-learn to calculate our metrics. Please feel free to check the below link if you want to learn about various metrics available from sklearn.
In the next cell after metrics calculation, we have plotted the confusion matrix using scikit-plot python library. We can notice from the visualization that our model is doing a good job at identifying Sports category documents compared to other categories.
If you are interested in learning about scikit-plot which provides visualizations for many ML metrics then please check the below link.
def MakePredictions(model, loader):
    Y_shuffled, Y_preds = [], []
    for X, Y in loader:
        preds = model(X)
        Y_preds.append(preds)
        Y_shuffled.append(Y)
    gc.collect()
    Y_preds, Y_shuffled = torch.cat(Y_preds), torch.cat(Y_shuffled)

    return Y_shuffled.detach().numpy(), F.softmax(Y_preds, dim=-1).argmax(dim=-1).detach().numpy()
Y_actual, Y_preds = MakePredictions(rnn_classifier, test_loader)
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
normalize=True,
title="Confusion Matrix",
cmap="Purples",
hide_zeros=True,
figsize=(5,5)
);
plt.xticks(rotation=90);
In this section, we have explained the predictions made by our network using the LIME algorithm. The implementation of the LIME algorithm is available through the lime library. It lets us generate a visualization that highlights the words that contributed to a prediction.
In order to explain prediction using lime, we first need to create an instance of LimeTextExplainer using the constructor available from lime_text sub-module of lime. Then, we need to call explain_instance() method on LimeTextExplainer instance to generate Explanation instance. At last, we need to call show_in_notebook() method on Explanation instance to generate a visualization that shows words from the text which contributed to predicting a particular target label.
Below, we have first retrieved all text examples from the test dataset. Then, we have created an instance of LimeTextExplainer with our target labels. Then, we have defined a function that will be required by the explain_instance() method. The function takes a list of text examples as input and returns prediction probabilities for them. It tokenizes and vectorizes the data before giving it to the network for making predictions. The output of the network is converted to probabilities using the softmax activation function and returned from the function.
After defining the function, we randomly selected one text example from the test dataset and made predictions on it using our trained network. We have then printed the actual and predicted labels for the selected sample. We can notice that our model correctly predicts the category as 'Business'.
If you are new to LIME and want to learn about it in-depth then please feel free to check the below link where we have covered the algorithm and its uses in detail.
X_test_text, Y_test = [], []
for Y, X in test_dataset:
    X_test_text.append(X)
    Y_test.append(Y-1)
len(X_test_text)
from lime import lime_text
import numpy as np
explainer = lime_text.LimeTextExplainer(class_names=target_classes, verbose=True)
def make_predictions(X_batch_text):
    X = [vocab(tokenizer(text)) for text in X_batch_text]
    X = [tokens+([0]* (max_words-len(tokens))) if len(tokens)<max_words else tokens[:max_words] for tokens in X] ## Bringing all samples to max_words length.
    logits = rnn_classifier(torch.tensor(X, dtype=torch.int32))
    preds = F.softmax(logits, dim=-1)
    return preds.detach().numpy()
rng = np.random.RandomState(1)
idx = rng.randint(1, len(X_test_text))
X = [vocab(tokenizer(text)) for text in X_test_text[idx:idx+1]]
X = [tokens+([0]* (max_words-len(tokens))) if len(tokens)<max_words else tokens[:max_words] for tokens in X] ## Bringing all samples to max_words length.
preds = rnn_classifier(torch.tensor(X, dtype=torch.int32))
preds = F.softmax(preds, dim=-1)
print("Prediction : ", target_classes[preds.argmax()])
print("Actual : ", target_classes[Y_test[idx]])
Below, we have called explain_instance() method with selected text example, prediction function, and target label for selected example to generate Explanation instance. Then, we have called show_in_notebook() method to generate a visualization showing an explanation. We can notice from the visualization that words like 'pensions', 'bankruptcy', 'airlines', 'employees', 'financing', etc are contributing to predicting category 'Business'.
explanation = explainer.explain_instance(X_test_text[idx], classifier_fn=make_predictions,
labels=Y_test[idx:idx+1])
explanation.show_in_notebook()
Our approach in this section is exactly the same as the approach in the previous section, the only difference being the number of tokens used per text example: our previous approach kept 25 tokens per text example, whereas in this section we keep 50. The code is otherwise almost exactly the same as in the previous section.
Below, we have loaded the dataset again and created data loaders from it. We have set max_words to 50 this time to keep 50 tokens per text example.
from torch.utils.data import DataLoader
from torchtext.data.functional import to_map_style_dataset
train_dataset, test_dataset = torchtext.datasets.AG_NEWS()
train_dataset, test_dataset = to_map_style_dataset(train_dataset), to_map_style_dataset(test_dataset)
target_classes = ["World", "Sports", "Business", "Sci/Tech"]
max_words = 50
train_loader = DataLoader(train_dataset, batch_size=1024, collate_fn=vectorize_batch, shuffle=True)
test_loader = DataLoader(test_dataset , batch_size=1024, collate_fn=vectorize_batch)
for X, Y in train_loader:
    print(X.shape, Y.shape)
    break
Here, we have defined the network that we'll use for the text classification task in this section. The definition is almost exactly the same as in the previous section, with minor changes to the recurrent layer (we have used the 'relu' non-linearity this time).
from torch import nn
from torch.nn import functional as F
embed_len = 50
hidden_dim = 50
n_layers=1
class RNNClassifier(nn.Module):
    def __init__(self):
        super(RNNClassifier, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_len)
        self.rnn = nn.RNN(input_size=embed_len, hidden_size=hidden_dim, num_layers=n_layers,
                          batch_first=True, nonlinearity="relu", dropout=0.2)
        self.linear = nn.Linear(hidden_dim, len(target_classes))

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings, torch.randn(n_layers, len(X_batch), hidden_dim))
        return self.linear(output[:,-1])
Now, we have trained our new network using exactly the same settings (15 epochs and a learning rate of 0.001) that we used in the previous approach. We'll keep these settings for the next two approaches as well; the final bidirectional approach uses slightly different settings. We can notice from the loss and accuracy getting printed after each epoch that our model is doing a good job.
from torch.optim import Adam
epochs = 15
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
rnn_classifier = RNNClassifier()
optimizer = Adam(rnn_classifier.parameters(), lr=learning_rate)
TrainModel(rnn_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
In this section, we have evaluated the performance of our network by calculating accuracy, classification report and confusion matrix on test predictions as usual. We can notice from the test accuracy that it has improved a bit from our previous approach.
In the next cell after the below cell, we have plotted the confusion matrix which indicates that our model is doing a good job at classifying text documents of categories Sports and World compared to categories Business and Sci/Tech.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
Y_actual, Y_preds = MakePredictions(rnn_classifier, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
normalize=True,
title="Confusion Matrix",
cmap="Purples",
hide_zeros=True,
figsize=(5,5)
);
plt.xticks(rotation=90);
In this section, we have again tried to explain the prediction made by our network using LIME algorithm. We have randomly selected a sample and our model correctly predicts the target label as 'Business' for it. We can notice from the visualization that words like 'concessions', 'bankruptcy', 'pensions', 'labor', 'employees', 'financing', etc are contributing to predicting 'Business' category.
from lime import lime_text
import numpy as np
explainer = lime_text.LimeTextExplainer(class_names=target_classes, verbose=True)
rng = np.random.RandomState(1)
idx = rng.randint(1, len(X_test_text))
X = [vocab(tokenizer(text)) for text in X_test_text[idx:idx+1]]
X = [tokens+([0]* (max_words-len(tokens))) if len(tokens)<max_words else tokens[:max_words] for tokens in X] ## Bringing all samples to max_words length.
preds = rnn_classifier(torch.tensor(X, dtype=torch.int32))
preds = F.softmax(preds, dim=-1)
print("Prediction : ", target_classes[preds.argmax()])
print("Actual : ", target_classes[Y_test[idx]])
explanation = explainer.explain_instance(X_test_text[idx], classifier_fn=make_predictions,
labels=Y_test[idx:idx+1])
explanation.show_in_notebook()
All our previous approaches used a single recurrent layer in the network. As a part of our approach in this section, we have stacked 3 recurrent layers together, which should hopefully help the network better classify text documents. The code in this section is almost the same as our code from previous approaches, with very minor changes.
Below, we have defined our network which has exactly the same code as our previous networks with only a change in value given to num_layers parameter of RNN() constructor. We have set the value of the parameter to 3 to ask it to stack 3 recurrent layers. The rest of the code is the same as our previous networks.
After defining the network, we have also initialized it and printed the shape of weights/biases of layers of the network.
from torch import nn
from torch.nn import functional as F
embed_len = 50
hidden_dim = 50
n_layers=3
class RNNClassifier(nn.Module):
    def __init__(self):
        super(RNNClassifier, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_len)
        self.rnn = nn.RNN(input_size=embed_len, hidden_size=hidden_dim, num_layers=n_layers, batch_first=True)
        self.linear = nn.Linear(hidden_dim, len(target_classes))

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings, torch.randn(n_layers, len(X_batch), hidden_dim))
        return self.linear(output[:,-1])
rnn_classifier = RNNClassifier()
rnn_classifier
for layer in rnn_classifier.children():
    print("Layer : {}".format(layer))
    print("Parameters : ")
    for param in layer.parameters():
        print(param.shape)
    print()
In this section, we have trained our 3 recurrent layers network with the same settings that we have been using for all our approaches. We can notice from the loss and accuracy value getting printed after each epoch that our model seems to be doing a decent job at the task.
from torch.optim import Adam
epochs = 15
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
rnn_classifier = RNNClassifier()
optimizer = Adam(rnn_classifier.parameters(), lr=learning_rate)
TrainModel(rnn_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
In this section, we have again evaluated various ML metrics on the test predictions. We can notice from the accuracy that it is a little lower compared to our previous approach. It seems from this experiment that multiple recurrent layers do not give better accuracy than a single recurrent layer, though trying different hyperparameter combinations might improve results.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
Y_actual, Y_preds = MakePredictions(rnn_classifier, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
normalize=True,
title="Confusion Matrix",
cmap="Purples",
hide_zeros=True,
figsize=(5,5)
);
plt.xticks(rotation=90);
In this section, we have explained a prediction using the LIME algorithm. We have randomly selected a test example and our model correctly predicts the target label as 'Business' for it. Words like 'bankruptcy', 'airlines', 'cuts', 'concessions', 'pensions', 'labor', 'employees', etc. seem to be contributing to predicting the category 'Business'.
from lime import lime_text
import numpy as np
explainer = lime_text.LimeTextExplainer(class_names=target_classes, verbose=True)
rng = np.random.RandomState(1)
idx = rng.randint(1, len(X_test_text))
X = [vocab(tokenizer(text)) for text in X_test_text[idx:idx+1]]
X = [tokens+([0]* (max_words-len(tokens))) if len(tokens)<max_words else tokens[:max_words] for tokens in X] ## Bringing all samples to max_words length.
preds = rnn_classifier(torch.tensor(X, dtype=torch.int32))
preds = F.softmax(preds, dim=-1)
print("Prediction : ", target_classes[preds.argmax()])
print("Actual : ", target_classes[Y_test[idx]])
explanation = explainer.explain_instance(X_test_text[idx], classifier_fn=make_predictions,
labels=Y_test[idx:idx+1])
explanation.show_in_notebook()
Our approach in this section is the same as the previous section's approach of stacking multiple recurrent layers, but here the output sizes of the recurrent layers differ from one another, whereas in the previous approach all recurrent layers had the same output size. The majority of the code in this section is a repeat of previous sections.
Below, we have defined the network that we'll use in this section. We have defined 3 recurrent layers in this section with output sizes of 50, 60, and 75. The output of the embedding layer is given to the first recurrent layer, the output of the first recurrent layer is given to the second, and the output of the second recurrent layer is given to the third one. The output of the last recurrent layer is given to the linear layer whose output will be a prediction of the network.
After defining the network, we initialized it and printed the shapes of weights/biases of different layers of the network.
from torch import nn
from torch.nn import functional as F
embed_len = 50
hidden_dim1 = 50
hidden_dim2 = 60
hidden_dim3 = 75
n_layers=1
class RNNClassifier(nn.Module):
    def __init__(self):
        super(RNNClassifier, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_len)
        self.rnn1 = nn.RNN(input_size=embed_len, hidden_size=hidden_dim1, num_layers=1, batch_first=True)
        self.rnn2 = nn.RNN(input_size=hidden_dim1, hidden_size=hidden_dim2, num_layers=1, batch_first=True)
        self.rnn3 = nn.RNN(input_size=hidden_dim2, hidden_size=hidden_dim3, num_layers=1, batch_first=True)
        self.linear = nn.Linear(hidden_dim3, len(target_classes))

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn1(embeddings, torch.randn(n_layers, len(X_batch), hidden_dim1))
        output, hidden = self.rnn2(output, torch.randn(n_layers, len(X_batch), hidden_dim2))
        output, hidden = self.rnn3(output, torch.randn(n_layers, len(X_batch), hidden_dim3))
        return self.linear(output[:,-1])
rnn_classifier = RNNClassifier()
rnn_classifier
for layer in rnn_classifier.children():
    print("Layer : {}".format(layer))
    print("Parameters : ")
    for param in layer.parameters():
        print(param.shape)
    print()
In this section, we have trained our new network using the same settings that we have been using for all our approaches. The loss and accuracy getting printed after completion of each epoch hints that the model is doing a good job at the text classification task.
from torch.optim import Adam
epochs = 15
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
rnn_classifier = RNNClassifier()
optimizer = Adam(rnn_classifier.parameters(), lr=learning_rate)
TrainModel(rnn_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
Here, we have evaluated the performance of our network from this approach by calculating various ML metrics. We can notice from the accuracy that it is a little higher compared to our previous approach of stacking multiple recurrent layers of the same size.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
Y_actual, Y_preds = MakePredictions(rnn_classifier, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
normalize=True,
title="Confusion Matrix",
cmap="Purples",
hide_zeros=True,
figsize=(5,5)
);
plt.xticks(rotation=90);
In this section, we have explained the prediction made by our network from this section using LIME algorithm. The model correctly predicts the target label as 'Business' for the randomly selected test sample. The words like 'concessions', 'bankruptcy', 'pensions', 'cuts', 'labor', etc are contributing to prediction.
from lime import lime_text
import numpy as np
explainer = lime_text.LimeTextExplainer(class_names=target_classes, verbose=True)
rng = np.random.RandomState(1)
idx = rng.randint(1, len(X_test_text))
X = [vocab(tokenizer(text)) for text in X_test_text[idx:idx+1]]
X = [tokens+([0]* (max_words-len(tokens))) if len(tokens)<max_words else tokens[:max_words] for tokens in X] ## Bringing all samples to max_words length.
preds = rnn_classifier(torch.tensor(X, dtype=torch.int32))
preds = F.softmax(preds, dim=-1)
print("Prediction : ", target_classes[preds.argmax()])
print("Actual : ", target_classes[Y_test[idx]])
explanation = explainer.explain_instance(X_test_text[idx], classifier_fn=make_predictions,
labels=Y_test[idx:idx+1])
explanation.show_in_notebook()
Our approach in this section is the same as our third approach, where we used 3 recurrent layers of the same output size. The only difference is that here we have used bidirectional recurrent layers. A bidirectional recurrent layer works exactly like a unidirectional one but processes the sequence in both directions: when we give the recurrent layer 50 tokens (the 50 words of a text example), it loops through those 50 tokens both from start to end and from end to start. Because the bidirectional recurrent layer works in both directions, its output size is 2 times that of a unidirectional recurrent layer, as we get an output for each direction. Our code in this section is the same as our code from the third approach with a minor parameter value change.
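The small sketch below (our own shape check, not the tutorial's network) shows this doubling for a bidirectional RNN with 3 layers and a hidden size of 50: the last dimension of output becomes 2 * hidden_size, and the hidden state has 2 * num_layers entries.

import torch
from torch import nn

rnn = nn.RNN(input_size=50, hidden_size=50, num_layers=3,
             batch_first=True, bidirectional=True)
output, hidden = rnn(torch.randn(1024, 50, 50))  ## (batch_size, max_tokens, embed_len)
print(output.shape)  ## torch.Size([1024, 50, 100]) -> forward & backward outputs concatenated
print(hidden.shape)  ## torch.Size([6, 1024, 50])   -> 2 * num_layers hidden states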
Below, we have defined the network that we'll be using for our task. Our network is the same as the one from the third approach with minor changes. We have set bidirectional parameter of RNN() constructor to True. The input units of the linear layer are 2 times the output units of the recurrent layer as it is a bidirectional layer.
After defining the network, we initialized it and printed the shapes of weights/biases of layers for information purposes.
from torch import nn
from torch.nn import functional as F
embed_len = 50
hidden_dim = 50
n_layers=3
class RNNClassifier(nn.Module):
    def __init__(self):
        super(RNNClassifier, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_len)
        self.rnn = nn.RNN(input_size=embed_len, hidden_size=hidden_dim, num_layers=n_layers,
                          batch_first=True, bidirectional=True) ## Bidirectional RNN
        self.linear = nn.Linear(2*hidden_dim, len(target_classes)) ## Input dimension are 2 times hidden dimensions due to bidirectional results

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings, torch.randn(2*n_layers, len(X_batch), hidden_dim))
        return self.linear(output[:,-1])
rnn_classifier = RNNClassifier()
rnn_classifier
for layer in rnn_classifier.children():
    print("Layer : {}".format(layer))
    print("Parameters : ")
    for param in layer.parameters():
        print(param.shape)
    print()
In this section, we have trained our bidirectional network. Note that this time we have used 25 epochs and a learning rate of 0.0001, unlike the settings used in the previous approaches. We can notice from the loss and accuracy values getting printed after each epoch that the model is learning, though not as well as in our earlier approaches.
from torch.optim import Adam
epochs = 25
learning_rate = 1e-4
loss_fn = nn.CrossEntropyLoss()
rnn_classifier = RNNClassifier()
optimizer = Adam(rnn_classifier.parameters(), lr=learning_rate)
TrainModel(rnn_classifier, loss_fn, optimizer, train_loader, test_loader, epochs)
Here, we have evaluated various ML metrics for our network on the test predictions. We can notice that the model from this approach has the lowest accuracy of all our approaches.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
Y_actual, Y_preds = MakePredictions(rnn_classifier, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))
from sklearn.metrics import confusion_matrix
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
skplt.metrics.plot_confusion_matrix([target_classes[i] for i in Y_actual], [target_classes[i] for i in Y_preds],
normalize=True,
title="Confusion Matrix",
cmap="Purples",
hide_zeros=True,
figsize=(5,5)
);
plt.xticks(rotation=90);
Here, we have again tried to explain the prediction of our network using the LIME algorithm. The network correctly predicts the target category as 'Business' for the selected test example, though the predicted probability is a little lower. Words like 'bankruptcy', 'labor', 'financing', 'employees', etc. are contributing to predicting the 'Business' category.
from lime import lime_text
import numpy as np
explainer = lime_text.LimeTextExplainer(class_names=target_classes, verbose=True)
rng = np.random.RandomState(1)
idx = rng.randint(1, len(X_test_text))
X = [vocab(tokenizer(text)) for text in X_test_text[idx:idx+1]]
X = [tokens+([0]* (max_words-len(tokens))) if len(tokens)<max_words else tokens[:max_words] for tokens in X] ## Bringing all samples to max_words length.
preds = rnn_classifier(torch.tensor(X, dtype=torch.int32))
preds = F.softmax(preds, dim=-1)
print("Prediction : ", target_classes[preds.argmax()])
print("Actual : ", target_classes[Y_test[idx]])
explanation = explainer.explain_instance(X_test_text[idx], classifier_fn=make_predictions,
labels=Y_test[idx:idx+1])
explanation.show_in_notebook()
In this section, we have simply summarized the results of various approaches for easier comparisons. We have also suggested a few more things to try which might improve performance further.
Approach | Test Accuracy |
---|---|
Approach 1: Single RNN Layer (Tokens Per Text Example=25, Embeddings Length=50) | 87.28 % |
Approach 2: Single RNN Layer (Tokens Per Text Example=50, Embeddings Length=50) | 87.98 % |
Approach 3: Multiple RNN Layers (Tokens Per Text Example=50, Embeddings Length=50) | 86.77 % |
Approach 4: Stacking Multiple RNN Layers (Tokens Per Text Example=50, Embeddings Length=50) | 87.86 % |
Approach 5: Multiple Bidirectional RNN Layers (Tokens Per Text Example=50, Embeddings Length=50) | 74.84 % |
Below is a list of things that can be tried to further improve the performance of the network.
Apart from the above-mentioned suggestions, there can be other things that can help improve the performance further but they need to be tried.
This ends our small tutorial explaining how we can design RNN for text classification tasks using PyTorch. Please feel free to let us know your views in the comments section.
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.
If you want to