When we work with text data in machine learning tasks, we first need to convert the text to numeric data, because neural networks (like most machine learning models) operate only on numeric input (floats/ints). There are various ways to do this conversion (frequency count, Tf-Idf, one-hot encoding, word embeddings, etc.), and the process is generally referred to as vectorization. Word embeddings are one such vectorization approach.
Generally, we tokenize the data first, splitting text into tokens (words, punctuation marks, special characters, etc.). We keep track of all tokens from the whole dataset (all text examples) by creating a vocabulary of tokens. Then we assign a real-valued vector to each token. These vectors are generally referred to as word embeddings. The length of the vector is a setting that we choose. Initially, the vectors hold random numbers; they are updated during training so that they capture the meaning of the tokens in the context of the task.
Other vectorization approaches like frequency count and Tf-Idf use a single number to represent a token, whereas word embeddings use a real-valued vector (a list of floats). Hence word embeddings have more representational capacity and can generally capture more information about words than the other approaches.
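To make the contrast concrete, here is a minimal pure-Python sketch (not MXNet code) that represents the same tokens once as frequency counts and once as embedding vectors. The helper names `frequency_vector` and `make_embedding_table`, the toy vocabulary, and the embedding length of 4 are all assumptions made for illustration.

```python
import random
from collections import Counter

def frequency_vector(tokens, vocabulary):
    """Frequency count: one number per vocabulary entry."""
    counts = Counter(tokens)
    return [counts[token] for token in vocabulary]

def make_embedding_table(vocabulary, dim, seed=0):
    """Word embeddings: one random vector of length `dim` per token.
    These are the untrained starting values; training would update them."""
    rng = random.Random(seed)
    return {token: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for token in vocabulary}

vocabulary = ["the", "cat", "sat"]          # hypothetical toy vocabulary
tokens = ["the", "cat", "sat", "the"]

freq = frequency_vector(tokens, vocabulary)
table = make_embedding_table(vocabulary, dim=4)
embedded = [table[token] for token in tokens]

print(freq)                             # [2, 1, 1]
print(len(embedded), len(embedded[0]))  # 4 4
```

Note that both occurrences of "the" receive the same vector: an embedding belongs to the token, not to its position in the text.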
As a part of this tutorial, we'll explain how we can design MXNet networks that use word embeddings for text classification tasks. We have tried several different ways of handling word embeddings. We'll be using the gluonnlp library to tokenize and vectorize text data.
Below, we have imported the necessary libraries and printed the versions that we have used in this tutorial.
import mxnet
print("MXNet Version : {}".format(mxnet.__version__))
import gluonnlp
print("GluonNLP Version : {}".format(gluonnlp.__version__))
import torchtext
print("TorchText Version : {}".format(torchtext.__version__))
In this section, we have prepared our data for the text classification task. We have loaded AG NEWS dataset from torchtext library, tokenized text examples from the dataset, populated vocabulary with tokens from text examples, and created data loaders that will be used during training. The data loader will generate indexes of tokens based on populated vocabulary for neural network input.
In this section, we have loaded the AG NEWS dataset available from the torchtext library. The dataset has text documents for 4 different categories (["World", "Sports", "Business", "Sci/Tech"]). After loading the train and test datasets, we have converted them to gluon ArrayDataset objects. ArrayDataset is a data structure used by mxnet to hold datasets. We'll use it later to create the data loaders that are used during the training of the network.
Category | Target Label |
---|---|
World | 1 |
Sports | 2 |
Business | 3 |
Sci/Tech | 4 |
from mxnet.gluon.data import ArrayDataset
train_dataset, test_dataset = torchtext.datasets.AG_NEWS()
Y_train, X_train = zip(*list(train_dataset))
Y_test, X_test = zip(*list(test_dataset))
train_dataset = ArrayDataset(X_train, Y_train)
test_dataset = ArrayDataset(X_test, Y_test)
In this section, we have created a tokenizer that will be used to tokenize text data and then populated a vocabulary. The tokenizer takes a text document as input and returns a list of tokens (the words and punctuation marks of the document). The vocabulary is a simple mapping from tokens to integer indexes. Each token is assigned a unique index, with index 0 reserved for the unknown token (<unk>). Later on, we'll vectorize text data to a list of indexes using the tokenizer and vocabulary.
The gluonnlp library provides many tokenizers. Below, we have shown how to load the SpacyTokenizer available from it and use it to tokenize text data. We won't be using it in this tutorial; it is included only to show the reader that gluonnlp offers several tokenizers.
import spacy
spacy.load('en_core_web_sm')
spacy_tokenizer = gluonnlp.data.SpacyTokenizer(lang="en_core_web_sm")
spacy_tokenizer("Hello, How are you?")
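Below, we have created a simple tokenizer function that we'll use for our purpose. It takes a text document as input, lowercases it, and returns a list of words extracted with a regular expression. Lowercasing here keeps token lookups consistent with the vocabulary, which is populated from lowercased tokens (to_lower=True) in the next step.
import re
def tokenizer(text):
    ## Lowercase first so that tokens match the lowercased vocabulary entries.
    return re.findall(r"\w+", text.lower())
tokenizer("Hello, How are you?")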
Below, we have created a vocabulary using the Vocab() constructor available from the gluonnlp library. It requires a Counter object, which is simply a mapping from each token to its frequency (the number of times it appears across all text documents). The Counter object holds all words that will be included in the vocabulary. We have assigned the string <unk> for unknown tokens. Later on, when vectorizing, words that are not part of the vocabulary will be mapped to this token.
To populate the Counter object, we have used the count_tokens() function available from the data module of gluonnlp. We have looped through each text document of both datasets, tokenized it, and called count_tokens() on the list of tokens. We start with an empty Counter object and pass it to each call of count_tokens(), so it gradually gets filled with tokens and their frequencies.
After populating the Counter object, we have created the Vocab object from it. We have also printed the vocabulary size (the number of tokens in the vocabulary) at the end.
from collections import Counter
counter = Counter()
for dataset in [train_dataset, test_dataset]:
    for X, Y in dataset:
        gluonnlp.data.count_tokens(tokenizer(X), to_lower=True, counter=counter)
vocab = gluonnlp.Vocab(counter=counter, special_token="<unk>", min_freq=1)
print("Vocabulary Size : {}".format(len(vocab)))
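As a rough mental model of what the counting and vocabulary steps do, the following pure-Python sketch builds a token-to-index mapping from a Counter, with index 0 kept for the unknown token. It is an illustration only: `build_vocab` and `lookup` are hypothetical helpers, the ordering of tokens is an arbitrary choice, and the real gluonnlp Vocab class may reserve additional special tokens that this sketch does not reproduce.

```python
from collections import Counter

def build_vocab(token_lists, unknown_token="<unk>", min_freq=1):
    """Count lowercased tokens, then assign indexes; index 0 is the unknown token."""
    counter = Counter()
    for tokens in token_lists:
        counter.update(token.lower() for token in tokens)
    idx_to_token = [unknown_token]
    # Most frequent first, ties broken alphabetically (an illustrative choice).
    for token, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and token != unknown_token:
            idx_to_token.append(token)
    return {token: index for index, token in enumerate(idx_to_token)}

def lookup(token_to_idx, token):
    """Tokens missing from the vocabulary fall back to index 0 (<unk>)."""
    return token_to_idx.get(token, 0)

toy_vocab = build_vocab([["Hello", "how", "are", "you"], ["how", "are", "things"]])
print(toy_vocab)
# {'<unk>': 0, 'are': 1, 'how': 2, 'hello': 3, 'things': 4, 'you': 5}
print(lookup(toy_vocab, "how"), lookup(toy_vocab, "goodbye"))  # 2 0
```

Because tokens are lowercased while counting, a lookup for a capitalized word such as "Hello" would miss in this sketch unless it is lowercased first.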
In this section, we have created a simple function that the data loaders will use later on to vectorize text documents into lists of vocabulary indexes. The function takes a batch of data as input. It first separates text documents (X) and their target labels (Y) into separate variables. Then it loops through each text document, tokenizes it, and retrieves the index of each token from the vocabulary.
We have decided to keep 50 tokens per text example. The first 50 words of each example are kept and the rest, if present, are ignored. If an example has fewer than 50 words, we pad it with 0s (the index of the <unk> token). This way all examples have exactly 50 tokens.
At last, we have converted the lists to mxnet nd arrays and returned them. Please note that we have subtracted 1 from the target labels, as they are in the range 1-4 and we want labels in the range 0-3.
import gluonnlp.data.batchify as bf
from mxnet import nd
import numpy as np
def vectorize(batch):
    X, Y = list(zip(*batch))
    X = [[vocab(word) for word in tokenizer(sample)] for sample in X]
    X = [sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X] ## Bringing all samples to 50 length.
    return nd.array(X, dtype=np.int32), nd.array(Y, dtype=np.int32) - 1 # Subtracting 1 from labels to bring them in range 0-3 from 1-4
vectorize([["how are you", 1]])
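The padding and truncation rule is easier to check on a tiny example. The sketch below isolates that one step in plain Python; `pad_or_truncate` is a hypothetical helper, and a length of 5 is used instead of 50 only to keep the output readable.

```python
def pad_or_truncate(indexes, max_tokens=50, pad_index=0):
    """Return exactly `max_tokens` indexes: pad short lists, cut long ones."""
    if len(indexes) < max_tokens:
        return indexes + [pad_index] * (max_tokens - len(indexes))
    return indexes[:max_tokens]

short = pad_or_truncate([7, 3, 9], max_tokens=5)
long = pad_or_truncate(list(range(1, 9)), max_tokens=5)
labels = [label - 1 for label in [1, 4]]   # shift labels from 1-4 to 0-3

print(short)   # [7, 3, 9, 0, 0]
print(long)    # [1, 2, 3, 4, 5]
print(labels)  # [0, 3]
```

Since every row ends up with the same length, the rows can be stacked into a single rectangular array, which is what the network expects as input.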
In this section, we have created train and test data loaders from the datasets we created earlier. They are used during training to loop through the data in batches. We have created the data loaders using the DataLoader() constructor available from mxnet. We have used a batch size of 1024, so each batch holds up to 1024 examples. We have also passed the vectorization function created in the previous section to the batchify_fn parameter of the DataLoader() constructor.
from mxnet.gluon.data import DataLoader
train_loader = DataLoader(train_dataset, batch_size=1024, batchify_fn=vectorize)
test_loader = DataLoader(test_dataset, batch_size=1024, batchify_fn=vectorize)
target_classes = ["World", "Sports", "Business", "Sci/Tech"]
for X, Y in train_loader:
    print(X.shape, Y.shape)
    break
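Conceptually, a data loader with a batchify function slices the dataset into chunks and hands each chunk to that function. The following plain-Python sketch shows the idea under simplifying assumptions (no shuffling, no parallel workers); `iterate_batches` and `toy_batchify` are hypothetical names, and the toy batchify function returns word counts instead of token indexes.

```python
def iterate_batches(dataset, batch_size, batchify_fn):
    """Yield batchify_fn(chunk) for consecutive chunks of the dataset."""
    for start in range(0, len(dataset), batch_size):
        yield batchify_fn(dataset[start:start + batch_size])

def toy_batchify(batch):
    """Stand-in for a vectorize function: word counts and labels shifted to start at 0."""
    texts, labels = zip(*batch)
    return [len(text.split()) for text in texts], [label - 1 for label in labels]

toy_dataset = [("a b c", 1), ("d e", 2), ("f", 3), ("g h i j", 4), ("k l", 1)]
batches = list(iterate_batches(toy_dataset, batch_size=2, batchify_fn=toy_batchify))

print(len(batches))  # 3
print(batches[0])    # ([3, 2], [0, 1])
print(batches[-1])   # ([2], [0])
```

The last batch is smaller than the others because 5 examples do not divide evenly by 2; the same happens with a batch size of 1024 whenever the dataset size is not a multiple of it.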
Our first approach uses an embedding size of 15 which means that it'll assign a real-valued vector of length 15 to each token of our data. We have designed a simple network to classify text documents.
In this section, we have designed a simple neural network that consists of one embedding layer and 2 dense layers for our text classification task. The embedding layer holds an embedding (real-valued vector) of length 15 for each token of our vocabulary. We have created it using the Embedding() constructor available from the nn module of mxnet by giving it the vocabulary length and an embedding size of 15. This internally creates a weight matrix of shape (vocab_len, 15). The embedding layer simply maps the index of a token to the corresponding row of that weight matrix. It takes a list of indexes as input and returns an embedding of length 15 for each index. As a single example consists of 50 tokens, the embedding layer returns 50 embeddings of length 15 (50 x 15) for it. So if we give an input of shape (1024, 50) to the embedding layer, it returns an output of shape (1024, 50, 15), where 1024 is our batch size and 50 is the number of token indexes kept per example by the vectorize function.
The output of the embedding layer is flattened and given to the first dense layer. Flattening changes the shape from (1024, 50, 15) to (1024, 50x15) = (1024, 750). The first dense layer has 128 output units and applies relu activation to its output. Its output is given to a second dense layer that has 4 output units (the same as the number of unique target labels). The output of the second dense layer is our prediction.
After defining the network, we have also initialized it and performed a forward pass using random data for verification purposes.
We have designed the whole network using the Sequential API of mxnet. It applies layers to the input data in the order in which they were added.
from mxnet.gluon import nn
class EmbeddingClassifier(nn.Block):
    def __init__(self, **kwargs):
        super(EmbeddingClassifier, self).__init__(**kwargs)
        self.seq = nn.Sequential()
        self.seq.add(nn.Embedding(len(vocab), 15)) ## word embeddings length=15
        self.seq.add(nn.Flatten()) ## Embeddings flattened
        self.seq.add(nn.Dense(128, activation="relu"))
        self.seq.add(nn.Dense(len(target_classes)))
    def forward(self, x):
        logits = self.seq(x)
        return logits #nd.softmax(logits)
model = EmbeddingClassifier()
model
from mxnet import init, initializer
model.initialize(initializer.Xavier())
preds = model(nd.random.randint(1,10000, shape=(10,50)))
preds.shape
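The shape changes described for the embedding and flatten steps can be traced with nested Python lists. This is only a sketch of the lookup-then-flatten idea, not how MXNet implements it; the vocabulary length of 100, the batch size of 4, and the helper names are assumptions chosen for illustration.

```python
import random

def embedding_lookup(weights, batch_indexes):
    """Replace every token index with its row of the weight matrix."""
    return [[weights[index] for index in sample] for sample in batch_indexes]

def flatten(batch):
    """Concatenate the token vectors of each sample into one long vector."""
    return [[value for vector in sample for value in vector] for sample in batch]

rng = random.Random(0)
vocab_len, emb_dim, max_tokens, batch_size = 100, 15, 50, 4   # toy sizes
weights = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)] for _ in range(vocab_len)]
batch = [[rng.randrange(vocab_len) for _ in range(max_tokens)] for _ in range(batch_size)]

embedded = embedding_lookup(weights, batch)
flat = flatten(embedded)

print((len(embedded), len(embedded[0]), len(embedded[0][0])))  # (4, 50, 15)
print((len(flat), len(flat[0])))                               # (4, 750)
```

After flattening, the first 15 values of a sample are the embedding of its first token, the next 15 belong to the second token, and so on, so the dense layer sees each token position separately.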
In this section, we have trained our network. To do so, we have designed a helper function that takes a Trainer object, a train data loader, a validation data loader, and the number of epochs as input.
The function runs the training loop once per epoch. For each epoch, it loops through the training data in batches using the train loader. For each batch, it makes predictions with the model, calculates the loss, calculates gradients, and updates the network parameters by calling the step() function on the Trainer object. The function records the loss of each batch and prints the average loss at the end of the epoch. It also calculates validation loss and validation accuracy at the end of each epoch, using two further helper functions that we have created for this purpose.
from mxnet import autograd
from tqdm import tqdm
from sklearn.metrics import accuracy_score
def MakePredictions(model, val_loader):
    Y_actuals, Y_preds = [], []
    for X_batch, Y_batch in val_loader:
        preds = model(X_batch)
        preds = nd.softmax(preds)
        Y_actuals.append(Y_batch)
        Y_preds.append(preds.argmax(axis=-1))
    Y_actuals, Y_preds = nd.concatenate(Y_actuals), nd.concatenate(Y_preds)
    return Y_actuals, Y_preds
def CalcValLoss(model, val_loader):
    losses = []
    for X_batch, Y_batch in val_loader:
        val_loss = loss_func(model(X_batch), Y_batch)
        val_loss = val_loss.mean().asscalar()
        losses.append(val_loss)
    print("Valid CrossEntropyLoss : {:.3f}".format(np.array(losses).mean()))
def TrainModelInBatches(trainer, train_loader, val_loader, epochs):
    for i in range(1, epochs+1):
        losses = [] ## Record loss of each batch
        for X_batch, Y_batch in tqdm(train_loader):
            with autograd.record():
                preds = model(X_batch) ## Forward pass to make predictions
                train_loss = loss_func(preds.squeeze(), Y_batch) ## Calculate Loss
            train_loss.backward() ## Calculate Gradients
            train_loss = train_loss.mean().asscalar()
            losses.append(train_loss)
            trainer.step(len(X_batch)) ## Update weights
        print("Train CrossEntropyLoss : {:.3f}".format(np.array(losses).mean()))
        CalcValLoss(model, val_loader)
        Y_actuals, Y_preds = MakePredictions(model, val_loader)
        print("Valid Accuracy : {:.3f}".format(accuracy_score(Y_actuals.asnumpy(), Y_preds.asnumpy())))
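For readers who want to see what a softmax cross-entropy loss computes for one batch, here is a small pure-Python sketch. It is written from the textbook definition and is not taken from MXNet's source; the logits and labels are made-up values.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[label])

batch_logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 0.0, 0.0, 0.0]]   # made-up scores for 4 classes
batch_labels = [0, 3]

losses = [cross_entropy(logits, label) for logits, label in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)
predicted = [max(range(len(logits)), key=logits.__getitem__) for logits in batch_logits]

print(round(losses[1], 3))  # 1.386
print(predicted[0])         # 0
```

The second example has equal scores for all 4 classes, so its loss is log(4), which is roughly what an untrained 4-class model should report. Also, softmax preserves the ordering of the scores, so taking the argmax of the logits gives the same class as taking the argmax of the probabilities.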
Below, we have initiated the necessary parameters and performed training by calling the training routine we designed in the previous cell.
We have initialized the number of epochs to 15 and the learning rate to 0.001. Then, we have initialized our network, cross entropy loss, RMSProp optimizer, and Trainer object. At last, we have called our training function to perform training.
We can notice from the loss and accuracy getting printed after each epoch that our model seems to be doing a good job at the task.
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
from mxnet import optimizer
epochs=15
learning_rate = 0.001
model = EmbeddingClassifier()
model.initialize()
loss_func = loss.SoftmaxCrossEntropyLoss()
optimizer = optimizer.RMSProp(learning_rate=learning_rate)
trainer = gluon.Trainer(model.collect_params(), optimizer)
TrainModelInBatches(trainer, train_loader, test_loader, epochs)
In this section, we have evaluated the performance of our network by calculating accuracy, classification report (precision, recall, and f1-score) and confusion matrix on test predictions. We have calculated all metrics using functions available from scikit-learn. We can notice from the classification report that our model is doing a good job at classifying text documents of categories World and Sports compared to Business and Sci/Tech.
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
Y_actuals, Y_preds = MakePredictions(model, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actuals.asnumpy(), Y_preds.asnumpy())))
print("Classification Report : ")
print(classification_report(Y_actuals.asnumpy(), Y_preds.asnumpy(), target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actuals.asnumpy(), Y_preds.asnumpy()))
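To clarify how these metrics relate to each other, the sketch below derives accuracy, precision, and recall from a confusion matrix in plain Python. Rows stand for actual labels and columns for predicted labels. The helper names and the toy labels are assumptions for illustration and have nothing to do with the real test results.

```python
def count_confusions(actuals, preds, n_classes):
    """matrix[a][p] = number of examples with actual label a predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, pred in zip(actuals, preds):
        matrix[actual][pred] += 1
    return matrix

def accuracy(matrix):
    """Share of examples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def precision_recall(matrix, cls):
    """Precision looks down a column, recall looks along a row."""
    true_positive = matrix[cls][cls]
    predicted_as_cls = sum(row[cls] for row in matrix)
    actually_cls = sum(matrix[cls])
    precision = true_positive / predicted_as_cls if predicted_as_cls else 0.0
    recall = true_positive / actually_cls if actually_cls else 0.0
    return precision, recall

toy_actuals = [0, 0, 1, 1, 2, 2, 3, 3]   # made-up labels
toy_preds = [0, 0, 1, 2, 2, 2, 3, 2]

cm = count_confusions(toy_actuals, toy_preds, n_classes=4)
print(cm)                        # [[2, 0, 0, 0], [0, 1, 1, 0], [0, 0, 2, 0], [0, 0, 1, 1]]
print(accuracy(cm))              # 0.75
print(precision_recall(cm, 2))   # (0.5, 1.0)
```

In this toy case class 2 has perfect recall but only 50% precision, because examples of classes 1 and 3 were also predicted as class 2. Reading the real confusion matrix the same way shows which categories get mixed up with each other.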
In this section, we have tried to explain the prediction made by our network using LIME algorithm implementation available from lime library.
In order to explain prediction using LIME, we first need to create an instance of LimeTextExplainer from lime_text module of lime. Then, we need to call explain_instance() method on it to create an instance of Explanation. At last, we need to call show_in_notebook() method on Explanation instance to create a visualization that highlights words contributing to prediction.
Below, we have simply retrieved test examples from the test dataset.
X_test, Y_test = [], []
for X, Y in test_dataset:
    X_test.append(X)
    Y_test.append(Y-1)
Below, we have first initialized LimeTextExplainer instance using target labels. Then, we have created a function that takes a list of text documents as input and returns their predicted probabilities by model. The function first tokenizes data, then vectorizes it using vocabulary, and then makes predictions using the model. It returns probabilities by applying softmax activation to the output of the model.
After defining a function, we randomly selected one sample from test examples and made predictions on it using our trained model. Our model correctly predicts the target label as 'Business' for the selected sample.
from lime import lime_text
explainer = lime_text.LimeTextExplainer(class_names=target_classes, verbose=True)
def make_predictions(X_batch_text):
    X_batch = [[vocab(word) for word in tokenizer(sample)] for sample in X_batch_text]
    X_batch = [sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch] ## Bringing all samples to 50 length.
    logits = model(nd.array(X_batch, dtype=np.int32))
    preds = nd.softmax(logits)
    return preds.asnumpy()
rng = np.random.RandomState(123)
idx = rng.randint(1, len(X_test))
X_batch = [[vocab(word) for word in tokenizer(sample)] for sample in X_test[idx:idx+1]]
X_batch = [sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch] ## Bringing all samples to 50 length.
preds = model(nd.array(X_batch)).argmax(axis=-1)
print("Prediction : ", target_classes[int(preds.asnumpy()[0])])
print("Actual : ", target_classes[Y_test[idx]])
Below, we have first called explain_instance() method with selected text example, classification function, and target label. It'll return Explanation object. Then, we have called show_in_notebook() method on Explanation object to create visualization.
We can notice from the visualization that words like 'investor', 'forecasts', 'ticker', 'fullquote', etc are contributing to predicting the target category as 'Business' which makes sense as they are commonly used words in the business world.
explanation = explainer.explain_instance(X_test[idx], classifier_fn=make_predictions, labels=Y_test[idx:idx+1])
explanation.show_in_notebook()
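The intuition behind such word-level explanations can be illustrated with a much simpler technique than LIME: remove one word at a time and measure how much the predicted probability changes. The sketch below is explicitly not the LIME algorithm (LIME samples many perturbed texts and fits a weighted linear model to them), and `toy_business_probability` is a hypothetical keyword-based stand-in for a trained classifier.

```python
def toy_business_probability(text):
    """Hypothetical stand-in classifier: share of words found in a tiny keyword set."""
    keywords = {"stocks", "investor", "earnings"}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in keywords for word in words) / len(words)

def leave_one_out_importance(text, predict_fn):
    """Score each word by how much the prediction drops when that word is removed."""
    words = text.split()
    base = predict_fn(text)
    scores = []
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        scores.append((word, base - predict_fn(perturbed)))
    return scores

scores = leave_one_out_importance("investor sells stocks today", toy_business_probability)
for word, score in scores:
    print(word, round(score, 3))
```

Words with a positive score push the prediction towards the class, and words with a negative score pull it away. The LIME visualization can be read in the same spirit, although its weights come from a fitted local model rather than from single-word removals.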
Our approach in this section is almost exactly the same as our approach in the previous section with only a difference in the length of the embeddings. We have kept the embeddings length of 50 per token in this section. The majority of the code in this section is the same as the code from the previous section.
Below, we have defined a network that we'll use for our text classification task in this section. The network is exactly the same as our network from the previous section with the only difference in embedding length provided to Embedding layer which is 50 in this case. The rest of the network is the same as earlier.
from mxnet.gluon import nn
class EmbeddingClassifier(nn.Block):
    def __init__(self, **kwargs):
        super(EmbeddingClassifier, self).__init__(**kwargs)
        self.seq = nn.Sequential()
        self.seq.add(nn.Embedding(len(vocab), 50)) ## word embeddings length = 50
        self.seq.add(nn.Flatten()) ## Embeddings flattened
        self.seq.add(nn.Dense(128, activation="relu"))
        self.seq.add(nn.Dense(len(target_classes)))
    def forward(self, x):
        logits = self.seq(x)
        return logits #nd.softmax(logits)
model = EmbeddingClassifier()
model
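Increasing the embedding length also increases the number of trainable parameters, which is worth keeping in mind when comparing the two flattened networks. The arithmetic below is a back-of-the-envelope sketch; the vocabulary length of 10000 is a made-up value (the real one is printed when the vocabulary is created), and the dense layers are assumed to include a bias term.

```python
def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units

def flatten_network_params(vocab_len, emb_dim, max_tokens=50, hidden=128, n_classes=4):
    """Parameter counts for embedding -> flatten -> dense(128) -> dense(4)."""
    return {
        "embedding": vocab_len * emb_dim,
        "dense1": dense_params(max_tokens * emb_dim, hidden),
        "dense2": dense_params(hidden, n_classes),
    }

small = flatten_network_params(vocab_len=10000, emb_dim=15)   # hypothetical vocab size
large = flatten_network_params(vocab_len=10000, emb_dim=50)

print(small)  # {'embedding': 150000, 'dense1': 96128, 'dense2': 516}
print(large)  # {'embedding': 500000, 'dense1': 320128, 'dense2': 516}
```

With the embedding length raised from 15 to 50, both the embedding table and the first dense layer grow by a factor of about 3.3, so the larger network has more capacity but also more parameters to fit from the same data.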
Below, we have trained our network using exactly the same settings we had used in the previous section. We can notice from the loss and accuracy values getting printed after each epoch that our network is doing a good job.
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
from mxnet import optimizer
epochs=15
learning_rate = 0.001
model = EmbeddingClassifier()
model.initialize()
loss_func = loss.SoftmaxCrossEntropyLoss()
optimizer = optimizer.RMSProp(learning_rate=learning_rate)
trainer = gluon.Trainer(model.collect_params(), optimizer)
TrainModelInBatches(trainer, train_loader, test_loader, epochs)
In this section, we have evaluated the performance of the network by calculating accuracy, confusion matrix and classification report metrics on test predictions. We can notice that the test accuracy (84.78%) is actually a little lower than that of our previous approach (85.85%). The classification report shows that our model is doing a better job at classifying the World and Sports categories than the Business and Sci/Tech categories.
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
Y_actuals, Y_preds = MakePredictions(model, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actuals.asnumpy(), Y_preds.asnumpy())))
print("Classification Report : ")
print(classification_report(Y_actuals.asnumpy(), Y_preds.asnumpy(), target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actuals.asnumpy(), Y_preds.asnumpy()))
In this section, we have tried to explain a prediction made by the network using LIME algorithm. We have randomly selected a test example and our model correctly predicts the target label as 'Business' for it. Then, we have created a visualization explaining the network prediction. We can notice from the visualization that words like 'investor', 'fullquote', 'ticker', 'routers', 'forecasts', etc are contributing to the prediction.
from lime import lime_text
explainer = lime_text.LimeTextExplainer(class_names=target_classes, verbose=True)
rng = np.random.RandomState(123)
idx = rng.randint(1, len(X_test))
X_batch = [[vocab(word) for word in tokenizer(sample)] for sample in X_test[idx:idx+1]]
X_batch = [sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch] ## Bringing all samples to 50 length.
preds = model(nd.array(X_batch)).argmax(axis=-1)
print("Prediction : ", target_classes[int(preds.asnumpy()[0])])
print("Actual : ", target_classes[Y_test[idx]])
explanation = explainer.explain_instance(X_test[idx], classifier_fn=make_predictions, labels=Y_test[idx:idx+1])
explanation.show_in_notebook()
Our approach in this section is almost the same as our approach in the previous section, with a minor change in the way the output of the embedding layer is handled. Both approaches that we have tried so far flattened the embeddings of the tokens of a single text example. In this approach, we instead take the average of the embeddings of the tokens of a single text example. The only change needed to implement this is in the forward pass of the network. The rest of the code is almost the same as in the previous section.
Below, we have defined the network that we'll use for our task in this section. We have defined the layers in the __init__() method of the network class. During the forward pass, the output of the embedding layer is averaged across tokens instead of being flattened as in our previous sections. Then, we have applied both dense layers to the averaged embeddings.
from mxnet.gluon import nn
class EmbeddingClassifier(nn.Block):
    def __init__(self, **kwargs):
        super(EmbeddingClassifier, self).__init__(**kwargs)
        self.word_embeddings = nn.Embedding(len(vocab), 50)
        self.linear1 = nn.Dense(128, activation="relu")
        self.linear2 = nn.Dense(len(target_classes))
    def forward(self, x):
        x = self.word_embeddings(x)
        x = x.mean(axis=1) ## Averaged Embeddings
        x = self.linear1(x)
        logits = self.linear2(x)
        return logits #nd.softmax(logits)
model = EmbeddingClassifier()
model
from mxnet import init, initializer
model.initialize(initializer.Xavier())
preds = model(nd.random.randint(1,10000, shape=(10,50)))
preds.shape
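Averaging over the token axis collapses the 50 token vectors of an example into a single vector of the embedding length, so a batch of shape (1024, 50, 50) becomes (1024, 50). The plain-Python sketch below shows the operation for one example with 3 tokens and an embedding length of 2; the values are made up and `mean_over_tokens` is a hypothetical helper.

```python
def mean_over_tokens(sample):
    """Average the token vectors of one example, dimension by dimension."""
    n_tokens = len(sample)
    dim = len(sample[0])
    return [sum(vector[d] for vector in sample) / n_tokens for d in range(dim)]

sample = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 tokens, embedding length 2

print(mean_over_tokens(sample))         # [3.0, 4.0]
print(mean_over_tokens(sample[::-1]))   # [3.0, 4.0]
```

Reversing the token order gives the same result, which shows one consequence of this design: the averaged representation ignores word order, whereas the flattened one keeps a separate slot for every position. In exchange, the first dense layer receives 50 inputs instead of 2500.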
In this section, we have trained our network using the same settings that we have been using for all our previous approaches. From the loss and accuracy value getting printed after each epoch, we can notice that our model is doing a good job.
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
from mxnet import optimizer
epochs=15
learning_rate = 0.001
model = EmbeddingClassifier()
model.initialize()
loss_func = loss.SoftmaxCrossEntropyLoss()
optimizer = optimizer.RMSProp(learning_rate=learning_rate)
trainer = gluon.Trainer(model.collect_params(), optimizer)
TrainModelInBatches(trainer, train_loader, test_loader, epochs)
In this section, we have evaluated the performance of the network as usual by calculating accuracy, confusion matrix and classification report metrics on test predictions. We can notice that the test accuracy (86.23%) is higher than that of both previous approaches. The classification report indicates that the network is better at classifying text documents of the categories 'World', 'Sports' and 'Sci/Tech' than those of the category 'Business'.
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
Y_actuals, Y_preds = MakePredictions(model, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actuals.asnumpy(), Y_preds.asnumpy())))
print("Classification Report : ")
print(classification_report(Y_actuals.asnumpy(), Y_preds.asnumpy(), target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actuals.asnumpy(), Y_preds.asnumpy()))
Here, we have tried to explain the prediction made by our network using LIME algorithm. We randomly selected a sample and predicted its label using our trained network. Our network correctly predicts the target label as 'Business' for it. Then, we generated a visualization explaining the prediction. We can notice from the visualization that words like 'stocks', 'investor', 'fullquote', 'earnings', etc are contributing to the prediction which makes sense as they are commonly used words in business.
from lime import lime_text
explainer = lime_text.LimeTextExplainer(class_names=target_classes, verbose=True)
rng = np.random.RandomState(123)
idx = rng.randint(1, len(X_test))
X_batch = [[vocab(word) for word in tokenizer(sample)] for sample in X_test[idx:idx+1]]
X_batch = [sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch] ## Bringing all samples to 50 length.
preds = model(nd.array(X_batch)).argmax(axis=-1)
print("Prediction : ", target_classes[int(preds.asnumpy()[0])])
print("Actual : ", target_classes[Y_test[idx]])
explanation = explainer.explain_instance(X_test[idx], classifier_fn=make_predictions, labels=Y_test[idx:idx+1])
explanation.show_in_notebook()
Our approach in this section is almost exactly the same as our approach in the previous section with one minor change. The embeddings from the embedding layer were averaged in the previous approach whereas here, we have summed the embeddings from the embedding layer. We have summed embeddings of all tokens of a single text example before giving it to a dense layer. The rest of the code is almost the same as our previous approaches.
Below, we have defined a network that we'll use for our text classification task in this section. The network design is exactly the same as our previous section with only a change in the forward pass where we have summed embeddings instead of averaging them like in the previous section. The rest of the code is the same as earlier.
from mxnet.gluon import nn
class EmbeddingClassifier(nn.Block):
    def __init__(self, **kwargs):
        super(EmbeddingClassifier, self).__init__(**kwargs)
        self.word_embeddings = nn.Embedding(len(vocab), 50)
        self.linear1 = nn.Dense(128, activation="relu")
        self.linear2 = nn.Dense(len(target_classes))
    def forward(self, x):
        x = self.word_embeddings(x)
        x = x.sum(axis=1) ## Embeddings summed
        x = self.linear1(x)
        logits = self.linear2(x)
        return logits #nd.softmax(logits)
model = EmbeddingClassifier()
model
from mxnet import init, initializer
model.initialize(initializer.Xavier())
preds = model(nd.random.randint(1,10000, shape=(10,50)))
preds.shape
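Because every example is padded or truncated to the same number of tokens, the summed vector is simply the averaged vector multiplied by that number. The plain-Python sketch below checks this on a made-up example with 4 tokens; the helper names are hypothetical.

```python
def sum_over_tokens(sample):
    """Add up the token vectors of one example, dimension by dimension."""
    dim = len(sample[0])
    return [sum(vector[d] for vector in sample) for d in range(dim)]

def mean_over_tokens(sample):
    """Same as the sum, divided by the number of tokens."""
    return [value / len(sample) for value in sum_over_tokens(sample)]

# 2 real tokens followed by 2 padding positions (all-zero vectors, purely for illustration)
sample = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]

summed = sum_over_tokens(sample)
averaged = mean_over_tokens(sample)

print(summed)    # [4.0, 6.0]
print(averaged)  # [1.0, 1.5]
```

With a fixed length of 50 tokens, the two networks therefore differ only in the scale of the values entering the first dense layer (a factor of 50). That scale interacts with weight initialization and the learning rate, which is one plausible reason why the two approaches can end up with slightly different accuracy.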
Here, we have trained our network using exactly the same parameter settings that we have used for all our previous approaches. The loss and accuracy values getting printed after each epoch hint that our model is doing a good job at the given task.
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
from mxnet import optimizer
epochs=15
learning_rate = 0.001
model = EmbeddingClassifier()
model.initialize()
loss_func = loss.SoftmaxCrossEntropyLoss()
optimizer = optimizer.RMSProp(learning_rate=learning_rate)
trainer = gluon.Trainer(model.collect_params(), optimizer)
TrainModelInBatches(trainer, train_loader, test_loader, epochs)
In this section, we have evaluated the performance of the network as usual by calculating accuracy, classification report and confusion matrix metrics on test predictions. We can notice that the test accuracy (86.60%) is the highest of all the approaches we have tried so far. The classification report indicates that the network is doing a better job on the categories 'World', 'Sports' and 'Sci/Tech' than on the category 'Business'.
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
Y_actuals, Y_preds = MakePredictions(model, test_loader)
print("Test Accuracy : {}".format(accuracy_score(Y_actuals.asnumpy(), Y_preds.asnumpy())))
print("Classification Report : ")
print(classification_report(Y_actuals.asnumpy(), Y_preds.asnumpy(), target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actuals.asnumpy(), Y_preds.asnumpy()))
In this section, we have tried to explain the prediction made by the network using LIME algorithm. We randomly selected a test example and made predictions on it using our trained network. The network correctly predicts the target label as 'Business' for the selected test example. Then, we have created a visualization to explain the prediction made by the network. We can notice from the visualization that words like 'fullquote', 'investor', 'stocks', 'ticker', 'forecasts', etc are contributing to the prediction.
from lime import lime_text
explainer = lime_text.LimeTextExplainer(class_names=target_classes, verbose=True)
rng = np.random.RandomState(123)
idx = rng.randint(1, len(X_test))
X_batch = [[vocab(word) for word in tokenizer(sample)] for sample in X_test[idx:idx+1]]
X_batch = [sample+([0]* (50-len(sample))) if len(sample)<50 else sample[:50] for sample in X_batch] ## Bringing all samples to 50 length.
preds = model(nd.array(X_batch)).argmax(axis=-1)
print("Prediction : ", target_classes[int(preds.asnumpy()[0])])
print("Actual : ", target_classes[Y_test[idx]])
explanation = explainer.explain_instance(X_test[idx], classifier_fn=make_predictions, labels=Y_test[idx:idx+1])
explanation.show_in_notebook()
The below table highlights the approaches tried and their accuracy on the test set.
Approach | Test Accuracy |
---|---|
Approach 1 - Word Embeddings (Max Tokens=50, Embeddings Length=15) | 85.85 % |
Approach 2 - Word Embeddings (Max Tokens=50, Embeddings Length=50) | 84.78 % |
Approach 3 - Word Embeddings Averaged (Max Tokens=50, Embeddings Length=50) | 86.23 % |
Approach 4 - Word Embeddings Summed (Max Tokens=50, Embeddings Length=50) | 86.60 % |
There are many more things that can be tried to improve network performance, but it'll require further experimentation to check which of them actually help.
This ends our small tutorial explaining how we can use the word embeddings approach for text classification tasks by designing networks using MXNet. We also explained various functionalities available from the gluonnlp library. Please feel free to let us know your views in the comments section.