Updated On : Jun-03,2022 Time Investment : ~30 mins

Text Generation using PyTorch LSTM Networks (Character Embeddings)

Text generation is one of the most important and challenging tasks of Natural Language Processing (NLP). It requires us to understand the underlying structure of a language in order to form sentences that are meaningful. Nowadays, language models are built using deep neural networks, which are good at text generation tasks. Some researchers also refer to text generation as a language modeling task, as it requires us to create a language model that understands a language and then uses that knowledge to generate meaningful content. Language models have applications like machine translation, conversational systems (chatbots), text summarization, speech-to-text, etc. Deep learning models that involve RNN layers (vanilla RNN, LSTM, GRU, etc.) are generally preferred for language modeling tasks. The reason is that these layers are better than dense layers at capturing the sequential structure present in data, which is also why they are commonly used for solving tasks involving time-series data.

As a part of our tutorial, we'll create a language model for a text generation task by building a Recurrent Neural Network consisting of LSTM layers using PyTorch. A text generation model generally takes a list of tokens (characters/n-grams/words) as input and predicts the next token (character/n-gram/word) in the sequence as output. We have used a character-based approach for our case, which means that our network takes a list of characters as input and returns the character that it thinks should come next. We could also design models that take a list of words as input and predict the next word. For encoding text data, we have used the character embeddings approach, which assigns a real-valued vector to each token (character). We have used the Wikipedia dataset available from the Python library torchtext for training our network. We have another tutorial on text generation using PyTorch which does not use character embeddings and is based only on a bag of words. Please feel free to check it from the below link.

Please make a NOTE that language models are generally big and require training for many epochs, hence we recommend using a GPU for training them. It will be hard to train the language model on a CPU.

Below, we have listed the important sections of the tutorial to give an overview of the material covered.

Important Sections Of Tutorial

  1. Prepare Data
    • 1.1 Load Data
    • 1.2 Populate Vocabulary
    • 1.3 Reshape Examples to Create Sequence Of Data
    • 1.4 Create Data Loaders
  2. Define LSTM Network
  3. Train Network
  4. Generate Text
  5. Train Network More
  6. Generate Text
  7. Let's Train Even More
  8. Generate Text
  9. Further Suggestions

Below, we have imported the necessary Python libraries that we have used in our tutorial and printed their versions.

import torch

print("PyTorch Version : {}".format(torch.__version__))
PyTorch Version : 1.9.1
import torchtext

print("TorchText Version : {}".format(torchtext.__version__))
TorchText Version : 0.10.1
device = "cuda" if torch.cuda.is_available() else "cpu"

device
'cuda'
import gc

1. Prepare Data

In this section, we are preparing our dataset to feed to the neural network for training. As we said earlier, we'll use a character-based approach, which means that we'll give a pre-decided number of characters to the network and ask it to predict the next character after that sequence. We have decided to use a sequence length of 100 characters, i.e., the network is given 100 characters and trained to predict the character that follows them. We have used the embeddings approach for encoding text data, where we assign a real-valued vector of a specified length to each unique character.

In order to prepare data for the network, we have followed the below steps.

  1. Load Text Examples
  2. Loop through all text examples and create a vocabulary of all unique characters. A vocabulary is simply a mapping from character to integer index; each character is assigned a unique integer starting from 0.
  3. Organize data by moving a window of 100 characters over the text. For each text example, we'll follow the below process.
    • Take characters 1-100 as data features (X) and character 101 as the target value (Y).
    • Move the window by one character.
    • Take characters 2-101 as data features (X) and character 102 as the target value (Y).
    • Move the window by one character.
    • Take characters 3-102 as data features (X) and character 103 as the target value (Y).
    • Move the window by one character.
    • ... repeat the process till the end of the text example.
  4. Retrieve integer index for characters in data features and target values from our vocabulary.
  5. Assign a real-valued vector (embeddings) to each unique integer index representing a particular character in data features.

Steps 1-4 mentioned above will be completed in this section. Step 5 will be implemented in the neural network using an embedding layer which will assign unique embeddings to each character index.

Below, we have included an image of word embeddings. Character embeddings work exactly the same way as word embeddings, with the only difference being that we assign a real-valued vector (embedding) to each character instead of each word.

PyTorch LSTM Networks for Text Generation (Character Embeddings)
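To make step 5 more concrete, below is a minimal, self-contained sketch (the vocabulary size, embedding length, and character indexes here are made up for illustration) of how PyTorch's nn.Embedding layer maps integer character indexes to real-valued vectors.

import torch
from torch import nn

## Hypothetical example: a vocabulary of 5 characters, each mapped to a 4-dimensional vector.
char_embedding = nn.Embedding(num_embeddings=5, embedding_dim=4)

## Integer indexes of a short character sequence, shape (batch_size=1, seq_length=5).
char_indexes = torch.tensor([[0, 1, 2, 2, 3]])

embeddings = char_embedding(char_indexes)

print(embeddings.shape) ## torch.Size([1, 5, 4]) -> one 4-dim vector per character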

1.1 Load Data

In this section, we have loaded the Wikipedia dataset (WikiText-2) that we are going to use for our task. The dataset consists of well-curated Wikipedia articles and is already divided into train, validation, and test sets. We'll be using only the train set for our purpose; it has roughly 36k lines of text.

train_dataset, valid_dataset, test_dataset = torchtext.datasets.WikiText2()
wikitext-2-v1.zip: 100%|██████████| 4.48M/4.48M [00:00<00:00, 10.2MB/s]

1.2 Populate Vocabulary

In this section, we have populated a vocabulary of all unique characters present in our dataset. In order to populate the vocabulary, we have used the build_vocab_from_iterator() function. This function takes as input a Python iterator that yields a list of tokens on each call. We have created a function named build_vocabulary() which acts as this iterator. It takes datasets as input and loops through each example of the dataset, yielding the list of characters of each example. We have handled the <unk> token specially so that it does not get broken into individual characters.

After populating the vocabulary, we printed its length. We have also printed the vocabulary to show the unique characters present in it. Later on, we'll use this vocabulary to map each character to its index (e.g., <unk> will be mapped to index 0, ' ' will be mapped to index 1, character 'e' will be mapped to index 2, character 't' will be mapped to index 3, and so on).

from torchtext.data import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

def build_vocabulary(datasets):
    for dataset in datasets:
        for text in dataset:
            if "<unk>" in text:
                texts = text.split("<unk>")
                total = list(texts[0].lower())
                for t in texts[1:]:
                    total.extend(["<unk>", ] + list(t.lower()))
                yield total
            else:
                yield list(text.lower())

vocab = build_vocab_from_iterator(build_vocabulary([train_dataset, ]), min_freq=1, specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])
len(vocab)
244
print(vocab.get_itos())
['<unk>', ' ', 'e', 't', 'a', 'n', 'i', 'o', 'r', 's', 'h', 'd', 'l', 'c', 'm', 'u', 'f', 'g', 'p', 'w', 'b', 'y', ',', '.', 'v', 'k', '@', '\n', '1', '0', '=', '"', '2', "'", '9', '-', 'j', 'x', ')', '(', '3', '5', '8', '4', '6', '7', 'z', 'q', ';', '–', ':', '/', '—', '%', 'é', '$', '[', ']', '&', '!', 'í', '’', 'á', 'ā', '£', '°', '?', 'ó', '+', '#', 'š', '−', 'ō', 'ö', 'è', '×', 'ü', 'ä', 'ʻ', 'ś', 'ć', 'ø', '“', 'ł', 'ç', '”', '₹', 'ã', 'µ', 'ì', 'ư', '\ufeff', 'æ', '…', '→', 'ơ', 'ñ', 'å', '☉', '‘', '*', '~', '⁄', 'î', '²', 'ë', 'ệ', 'ī', 'ú', 'ễ', 'à', 'ô', 'ă', 'ū', '<', '^', 'ê', '♯', 'ỳ', '‑', 'đ', 'μ', '≤', '>', 'ل', 'ṃ', '~', '्', '†', '€', '±', 'ė', 'ž', '〈', '〉', '・', 'û', 'č', 'α', 'β', '½', 'γ', 'с', 'ṭ', 'ị', '„', '♭', 'â', '̃', 'ا', 'ه', '჻', 'ṅ', 'ầ', 'ớ', '′', '⅓', '大', '空', '¡', '¥', '³', '·', 'ş', 'ح', 'ص', 'ن', 'ვ', 'ი', 'კ', 'ო', 'ხ', 'ჯ', 'ḥ', 'ṯ', 'ả', 'ấ', '″', '火', '礮', '\\', '`', '|', '§', 'ò', 'þ', 'ń', 'ų', 'ż', 'ʿ', 'κ', 'а', 'в', 'е', 'к', 'о', 'т', 'я', 'ก', 'ง', 'ณ', 'ต', 'ม', 'ย', 'ร', 'ล', 'ั', 'า', 'ิ', '่', '์', 'გ', 'დ', 'ზ', 'რ', 'ს', 'უ', 'ც', 'ძ', 'წ', 'ṣ', 'ắ', 'ử', '₤', '⅔', 'の', 'ァ', 'ア', 'キ', 'ス', 'ッ', 'ト', 'プ', 'ュ', 'リ', 'ル', 'ヴ', '動', '場', '戦', '攻', '機', '殻', '隊']
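As a quick check of this mapping (a small illustrative sketch; the exact index values depend on character frequencies in the corpus, so no output is shown), we can convert characters to indexes by calling the vocabulary and map indexes back to characters with lookup_tokens().

sample_chars = list("hello world")

sample_indexes = vocab(sample_chars) ## Characters -> integer indexes
print(sample_indexes)

print(vocab.lookup_tokens(sample_indexes)) ## Integer indexes -> characters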

1.3 Reshape Examples to Create Sequence Of Data

In this section, we have organized our dataset into the proper shape for the neural network. We loop through each text example, moving a window of 100 characters as explained at the beginning of this section. We have used a limited number of text examples to complete training faster, otherwise it could take a lot of time to train the network. While taking 100 characters as data features (X_train) and the next character as the target value (Y_train), we also retrieve their indexes from the vocabulary.

After we have looped through all text examples, moving a window of 100 characters through them, we converted the final data to torch tensors. PyTorch networks work on torch tensors, hence we need to transform the data from lists to tensors.

Below, we have tried to explain the process with a simple example.

vocab = {
'h':1,
'e':2,
'l':3,
'o':4,
' ':5,
',':6,
'w':7,
'a':8,
'r':9,
'y':10,
'u':11,
'?':12,
'c':13,
'm':14,
't':15,
'd':16,
'z':17,
'n':18
}

text_example = "Hello, How are you? Welcome to coderzcolumn?"
seq_length = 10

X_train = [
            ['h','e','l','l','o',',',' ', 'h','o','w'],
            ['e','l','l','o',',',' ', 'h','o','w',' '],
            ['l','l','o',',',' ', 'h','o','w', ' ', 'a'],
            ['l','o',',',' ', 'h','o','w',' ', 'a', 'r'],
            ...
            ['d','e','r','z','c','o','l', 'u','m','n']
            ]
Y_train = [' ', 'a', 'r', 'e', ..., '?']

X_train_vectorized = [
                        [1,2,3,3,4,6,5,1,4,7],
                        [2,3,3,4,6,5,1,4,7,5],
                        [3,3,4,6,5,1,4,7,5,8],
                        ...
                        [16,2,9,17,13,4,3,11,14,18]
                     ]
Y_train_vectorized = [5, 8, 9, 2, ...., 12]
%%time

train_dataset, valid_dataset, test_dataset = torchtext.datasets.WikiText2()

seq_length = 100 ## Network Hyperparameter to tune
X_train, Y_train = [], []

for text in list(train_dataset)[:7500]:
    for i in range(0, len(text)-seq_length):
        inp_seq = list(text[i:i+seq_length].lower())
        out_seq = text[i+seq_length].lower()
        X_train.append(vocab(inp_seq)) ## Retrieve index for characters from vocab
        Y_train.append(vocab[out_seq]) ## Retrieve index for character from vocab

X_train, Y_train = torch.tensor(X_train, dtype=torch.int32), torch.tensor(Y_train)

X_train.shape, Y_train.shape
CPU times: user 25.2 s, sys: 1.09 s, total: 26.3 s
Wall time: 26.4 s
(torch.Size([1781323, 100]), torch.Size([1781323]))
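As a quick sanity check (a small sketch added here; the decoded text depends on which article comes first in the dataset, hence no output is shown), we can map the first training example back to characters to verify the windowing.

## Decode the first 100-character window and its target character back to text.
print("".join(vocab.lookup_tokens(X_train[0].tolist())))
print(vocab.lookup_tokens([Y_train[0].item()]))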

1.4 Create Data Loaders

In this section, we have simply wrapped our data features (X_train) and target values (Y_train) in a tensor dataset and created a data loader from it. The data loader will help us loop through the data in batches during the training process. We have set the batch size to 1024, so each call returns 1024 examples and their target values.

from torch.utils.data import DataLoader, TensorDataset

vectorized_train_dataset = TensorDataset(X_train, Y_train)

train_loader = DataLoader(vectorized_train_dataset, batch_size=1024, shuffle=False)
for X, Y in train_loader:
    print(X.shape, Y.shape)
    break
torch.Size([1024, 100]) torch.Size([1024])
gc.collect()
21

2. Define LSTM Network

In this section, we have defined the network that we'll use for our text generation task. The task is treated as a classification task because we predict one character out of the vocabulary. The network consists of four layers.

  1. Embedding Layer (100 Embed Length)
  2. LSTM Layer (256 Hidden dimension size)
  3. LSTM Layer (256 Hidden dimension size)
  4. Linear Layer (Output Units same as Vocabulary Length)

The first layer of our network is the embedding layer. We have created it using the Embedding() constructor, providing the vocabulary length as the number of embeddings and an embedding length of 100. This creates a matrix of shape (vocab_len, 100) which is set as the weight matrix of the layer. The layer takes a list of indexes as input and retrieves embeddings for those indexes by indexing into its weight matrix. The input shape of the layer is (batch_size, seq_length) = (batch_size, 100) and the output shape is (batch_size, seq_length, embed_len) = (batch_size, 100, 100).

The output of the embedding layer is given to the first LSTM layer, which has a hidden dimension size of 256 and processes the sequence. The output shape of the first LSTM layer is (batch_size, seq_length, hidden_size) = (batch_size, 100, 256).

The output of the first LSTM layer is given to the second LSTM layer, which also has a hidden dimension size of 256 and processes the sequence in the same way. The output shape of the second LSTM layer is (batch_size, seq_length, hidden_size) = (batch_size, 100, 256).

The output of the second LSTM layer at the last time step is given to the linear layer, which has vocab_len output units. It transforms the data to shape (batch_size, vocab_len). The output of the linear layer is the prediction of our network.
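The shape transformations described above can be verified with a small standalone sketch (only a shape check on dummy indexes with an arbitrary batch size of 32, not the network we actually train; the layer sizes mirror the ones described above).

import torch
from torch import nn

embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=100)
lstm = nn.LSTM(input_size=100, hidden_size=256, num_layers=2, batch_first=True)
linear = nn.Linear(256, len(vocab))

dummy_batch = torch.randint(0, len(vocab), (32, 100)) ## (batch_size, seq_length)

x = embedding(dummy_batch) ## (32, 100, 100) = (batch_size, seq_length, embed_len)
x, (h, c) = lstm(x)        ## (32, 100, 256) = (batch_size, seq_length, hidden_size)
out = linear(x[:, -1])     ## (32, 244)      = (batch_size, vocab_len)

print(out.shape)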

After defining the network, we initialized it and printed the shapes of the weights/biases of its layers. We have also performed a forward pass through the network using a few data examples for verification purposes.

Please make a NOTE that we have not explained in detail how LSTM internally processes a sequence of data or how to design networks with PyTorch. If you are new to PyTorch and LSTM then we recommend that you go through the below links in your free time to understand them better.

from torch import nn
from torch.nn import functional as F

embed_len = 100
hidden_dim = 256
n_layers=2

class LSTMTextGenerator(nn.Module):
    def __init__(self):
        super(LSTMTextGenerator, self).__init__()
        self.word_embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_len)
        self.lstm = nn.LSTM(input_size=embed_len, hidden_size=hidden_dim, num_layers=n_layers, batch_first=True)
        self.linear = nn.Linear(hidden_dim, len(vocab))

    def forward(self, X_batch):
        embeddings = self.word_embedding(X_batch)

        ## Random initial hidden and cell (carry) states for the stacked LSTM layers.
        hidden, carry = torch.randn(n_layers, len(X_batch), hidden_dim).to(device), torch.randn(n_layers, len(X_batch), hidden_dim).to(device)
        output, (hidden, carry) = self.lstm(embeddings, (hidden, carry))
        return self.linear(output[:,-1]) ## Use only the output of the last time step
text_generator = LSTMTextGenerator().to(device)

text_generator
LSTMTextGenerator(
  (word_embedding): Embedding(244, 100)
  (lstm): LSTM(100, 256, num_layers=2, batch_first=True)
  (linear): Linear(in_features=256, out_features=244, bias=True)
)
for layer in text_generator.children():
    print("Layer : {}".format(layer))
    print("Parameters : ")
    for param in layer.parameters():
        print(param.shape)
    print()
Layer : Embedding(244, 100)
Parameters :
torch.Size([244, 100])

Layer : LSTM(100, 256, num_layers=2, batch_first=True)
Parameters :
torch.Size([1024, 100])
torch.Size([1024, 256])
torch.Size([1024])
torch.Size([1024])
torch.Size([1024, 256])
torch.Size([1024, 256])
torch.Size([1024])
torch.Size([1024])

Layer : Linear(in_features=256, out_features=244, bias=True)
Parameters :
torch.Size([244, 256])
torch.Size([244])

out = text_generator(torch.randint(0, len(vocab), (1024, seq_length)).to(device))

out.shape
torch.Size([1024, 244])

3. Train Network

In this section, we are training our network to generate text. We have designed a simple function which we'll use for training. The function takes the network, loss function, optimizer, train data loader, and number of epochs as input. It then executes the training loop for the given number of epochs. For each epoch, it loops through the training data in batches using the train data loader. For each batch, it performs a forward pass to make predictions, calculates the loss, calculates gradients, and updates the network parameters using those gradients. It records the loss of each batch and prints the average loss across batches every fifth epoch.

from tqdm import tqdm
from sklearn.metrics import accuracy_score
import gc

def TrainModel(model, loss_fn, optimizer, train_loader, epochs=10):
    for i in range(1, epochs+1):
        losses = []
        for X, Y in tqdm(train_loader):
            Y_preds = model(X.to(device))

            loss = loss_fn(Y_preds, Y.to(device))
            losses.append(loss.item())

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if (i%5) == 0:
            print("Train Loss : {:.3f}".format(torch.tensor(losses).mean()))

Below, we actually train our network using the function defined in the previous cell. We have set the number of epochs to 25 and the learning rate to 0.001. Then, we have initialized the cross entropy loss (classification task), the LSTM network, and the Adam optimizer. At last, we have called our training routine with the necessary parameters to perform training. We can notice from the loss values printed every 5 epochs that our network seems to be improving over time.

%%time

from torch.optim import Adam

epochs = 25
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss().to(device)
text_generator = LSTMTextGenerator().to(device)
optimizer = Adam(text_generator.parameters(), lr=learning_rate)

TrainModel(text_generator, loss_fn, optimizer, train_loader, epochs)
100%|██████████| 1740/1740 [03:22<00:00,  8.58it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.44it/s]
100%|██████████| 1740/1740 [03:25<00:00,  8.47it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.38it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.39it/s]
Train Loss : 1.437
100%|██████████| 1740/1740 [03:26<00:00,  8.43it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.35it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.42it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.41it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.43it/s]
Train Loss : 1.307
100%|██████████| 1740/1740 [03:27<00:00,  8.40it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.37it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.38it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.36it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.41it/s]
Train Loss : 1.240
100%|██████████| 1740/1740 [03:27<00:00,  8.38it/s]
100%|██████████| 1740/1740 [03:25<00:00,  8.46it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.40it/s]
100%|██████████| 1740/1740 [03:25<00:00,  8.46it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.35it/s]
Train Loss : 1.197
100%|██████████| 1740/1740 [03:26<00:00,  8.45it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.35it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.42it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.35it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.34it/s]
Train Loss : 1.166
CPU times: user 1h 22min 53s, sys: 29.1 s, total: 1h 23min 22s
Wall time: 1h 26min 14s

4. Generate Text

In this section, we are generating text using our trained network to see how it is performing. We first randomly selected a sequence of characters from our train data and printed it. Then we loop 100 times, generating a new character on each iteration using our trained network. For the first iteration of the loop, the originally selected sequence is given as input to the network and it predicts a new character. We append this character to the end of the sequence and drop the first character before the next iteration. This process is repeated for 100 iterations, each time appending the newly predicted character and removing the existing first character. After generating 100 new characters, we printed them as well.

We can notice from the results that they look like English-language sentences. The network is spelling words correctly; there are no spelling errors, which is good. The sentences do not make much sense, but the network has learned to generate correctly spelled words. The network also seems to have become a little deterministic, as some words are repeated. This can be avoided by adding a little randomness to the output of the network so that it generates different words (see the sampling sketch in the Further Suggestions section). Overall, the results look promising for a network trained for only 25 epochs.

import random

random.seed(123)
idx = random.randint(0, len(X_train))
pattern = X_train[idx].numpy().astype(int).flatten().tolist()

print("Initial Pattern : {}".format("".join(vocab.lookup_tokens(pattern))))

generated_text = []
for i in range(100):
    X_batch = torch.tensor(pattern, dtype=torch.int32).reshape(1, seq_length) ## Design Batch
    preds = text_generator(X_batch.to(device)) ## Make Prediction
    predicted_index = preds.argmax(dim=-1).cpu().numpy()[0] ## Retrieve token index
    generated_text.append(predicted_index) ## Add token index to result
    pattern.append(predicted_index) ## Add token index to original pattern
    pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join(vocab.lookup_tokens(generated_text))))
Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the south and the south and the south and the south and the south and the south as a province of

5. Train Network More

In this section, we have trained our network for another 50 epochs to check whether it helps improve results further. We have also reduced the learning rate from 0.001 to 0.0003. We can notice from the loss values printed during training that the network is improving further. Next, we'll check its performance.

epochs = 50

learning_rate = 3e-4
optimizer = Adam(text_generator.parameters(), lr=learning_rate)

TrainModel(text_generator, loss_fn, optimizer, train_loader, epochs)
100%|██████████| 1740/1740 [03:25<00:00,  8.48it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.31it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.30it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.35it/s]
100%|██████████| 1740/1740 [03:31<00:00,  8.21it/s]
Train Loss : 1.141
100%|██████████| 1740/1740 [03:28<00:00,  8.35it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.32it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.43it/s]
100%|██████████| 1740/1740 [03:30<00:00,  8.29it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.32it/s]
Train Loss : 1.125
100%|██████████| 1740/1740 [03:29<00:00,  8.30it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.42it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.30it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.29it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.30it/s]
Train Loss : 1.112
100%|██████████| 1740/1740 [03:26<00:00,  8.45it/s]
100%|██████████| 1740/1740 [03:31<00:00,  8.23it/s]
100%|██████████| 1740/1740 [03:30<00:00,  8.27it/s]
100%|██████████| 1740/1740 [03:30<00:00,  8.26it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.42it/s]
Train Loss : 1.101
100%|██████████| 1740/1740 [03:31<00:00,  8.24it/s]
100%|██████████| 1740/1740 [03:30<00:00,  8.25it/s]
100%|██████████| 1740/1740 [03:32<00:00,  8.21it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.42it/s]
100%|██████████| 1740/1740 [03:31<00:00,  8.22it/s]
Train Loss : 1.090
100%|██████████| 1740/1740 [03:31<00:00,  8.24it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.37it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.30it/s]
100%|██████████| 1740/1740 [03:31<00:00,  8.23it/s]
100%|██████████| 1740/1740 [03:31<00:00,  8.24it/s]
Train Loss : 1.080
100%|██████████| 1740/1740 [03:26<00:00,  8.44it/s]
100%|██████████| 1740/1740 [03:32<00:00,  8.19it/s]
100%|██████████| 1740/1740 [03:31<00:00,  8.23it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.36it/s]
100%|██████████| 1740/1740 [03:30<00:00,  8.27it/s]
Train Loss : 1.070
100%|██████████| 1740/1740 [03:31<00:00,  8.22it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.44it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.35it/s]
100%|██████████| 1740/1740 [03:30<00:00,  8.26it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.43it/s]
Train Loss : 1.061
100%|██████████| 1740/1740 [03:32<00:00,  8.18it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.43it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.41it/s]
100%|██████████| 1740/1740 [03:34<00:00,  8.12it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.42it/s]
Train Loss : 1.052
100%|██████████| 1740/1740 [03:33<00:00,  8.15it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.43it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.40it/s]
100%|██████████| 1740/1740 [03:32<00:00,  8.17it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.43it/s]
Train Loss : 1.044

6. Generate Text

In this section, we have again generated 100 characters using our trained model, starting from the same example we used earlier. We can notice that this time the network generates a greater variety of words. It even generates punctuation marks and a newline character ('\n'). The model still seems a little deterministic due to repeated words, but the results are a little better compared to earlier. We'll train the model even further to check whether it helps.

import random

random.seed(123)
idx = random.randint(0, len(X_train))
pattern = X_train[idx].numpy().astype(int).flatten().tolist()

print("Initial Pattern : {}".format("".join(vocab.lookup_tokens(pattern))))

generated_text = []
for i in range(100):
    X_batch = torch.tensor(pattern, dtype=torch.int32).reshape(1, seq_length) ## Design Batch
    preds = text_generator(X_batch.to(device)) ## Make Prediction
    predicted_index = preds.argmax(dim=-1).cpu().numpy()[0] ## Retrieve token index
    generated_text.append(predicted_index) ## Add token index to result
    pattern.append(predicted_index) ## Add token index to original pattern
    pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join(vocab.lookup_tokens(generated_text))))
Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the south and the south american country .
h , and the south american county stations , includi

7. Let's Train Even More

In this section, we have trained our network for another 50 epochs, reducing the learning rate from 0.0003 to 0.0001. We can notice from the loss values printed every 5 epochs that the network still seems to be improving. Next, we'll test the model again.

epochs = 50

learning_rate = 1e-4
optimizer = Adam(text_generator.parameters(), lr=learning_rate)

TrainModel(text_generator, loss_fn, optimizer, train_loader, epochs)
100%|██████████| 1740/1740 [03:23<00:00,  8.54it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.44it/s]
100%|██████████| 1740/1740 [03:34<00:00,  8.12it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.45it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.43it/s]
Train Loss : 1.045
100%|██████████| 1740/1740 [03:34<00:00,  8.10it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.35it/s]
100%|██████████| 1740/1740 [03:35<00:00,  8.09it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.31it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.37it/s]
Train Loss : 1.040
100%|██████████| 1740/1740 [03:35<00:00,  8.06it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.36it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.35it/s]
100%|██████████| 1740/1740 [03:36<00:00,  8.03it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.35it/s]
Train Loss : 1.036
100%|██████████| 1740/1740 [03:34<00:00,  8.12it/s]
100%|██████████| 1740/1740 [03:31<00:00,  8.24it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.36it/s]
100%|██████████| 1740/1740 [03:37<00:00,  8.02it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.37it/s]
Train Loss : 1.032
100%|██████████| 1740/1740 [03:28<00:00,  8.36it/s]
100%|██████████| 1740/1740 [03:37<00:00,  8.00it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.37it/s]
100%|██████████| 1740/1740 [03:30<00:00,  8.25it/s]
100%|██████████| 1740/1740 [03:32<00:00,  8.17it/s]
Train Loss : 1.028
100%|██████████| 1740/1740 [03:26<00:00,  8.43it/s]
100%|██████████| 1740/1740 [03:33<00:00,  8.16it/s]
100%|██████████| 1740/1740 [03:28<00:00,  8.36it/s]
100%|██████████| 1740/1740 [03:25<00:00,  8.45it/s]
100%|██████████| 1740/1740 [03:35<00:00,  8.08it/s]
Train Loss : 1.025
100%|██████████| 1740/1740 [03:27<00:00,  8.40it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.45it/s]
100%|██████████| 1740/1740 [03:36<00:00,  8.03it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.44it/s]
100%|██████████| 1740/1740 [03:25<00:00,  8.45it/s]
Train Loss : 1.021
100%|██████████| 1740/1740 [03:38<00:00,  7.98it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.44it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.30it/s]
100%|██████████| 1740/1740 [03:32<00:00,  8.17it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.40it/s]
Train Loss : 1.018
100%|██████████| 1740/1740 [03:34<00:00,  8.10it/s]
100%|██████████| 1740/1740 [03:29<00:00,  8.31it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.39it/s]
100%|██████████| 1740/1740 [03:35<00:00,  8.07it/s]
100%|██████████| 1740/1740 [03:26<00:00,  8.43it/s]
Train Loss : 1.014
100%|██████████| 1740/1740 [03:26<00:00,  8.44it/s]
100%|██████████| 1740/1740 [03:25<00:00,  8.45it/s]
100%|██████████| 1740/1740 [03:27<00:00,  8.39it/s]
100%|██████████| 1740/1740 [03:32<00:00,  8.19it/s]
100%|██████████| 1740/1740 [03:31<00:00,  8.25it/s]
Train Loss : 1.011

8. Generate Text

In this section, we have generated 100 new characters using our trained model, again using the same example as a starting point. We can notice from the generated text that the model produces decent text. A few words are repeated, but the overall output looks like English-language text. There are no spelling errors and the model generates punctuation marks as well. Next, we have suggested a few tips to further improve the model's performance.

import random

random.seed(123)
idx = random.randint(0, len(X_train))
pattern = X_train[idx].numpy().astype(int).flatten().tolist()

print("Initial Pattern : {}".format("".join(vocab.lookup_tokens(pattern))))

generated_text = []
for i in range(100):
    X_batch = torch.tensor(pattern, dtype=torch.int32).reshape(1, seq_length) ## Design Batch
    preds = text_generator(X_batch.to(device)) ## Make Prediction
    predicted_index = preds.argmax(dim=-1).cpu().numpy()[0] ## Retrieve token index
    generated_text.append(predicted_index) ## Add token index to result
    pattern.append(predicted_index) ## Add token index to original pattern
    pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join(vocab.lookup_tokens(generated_text))))
Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the series . the control of the series were recorded in the series of the series .
h , the cont

9. Further Suggestions

  1. Try training the model for more epochs.
  2. Try different embedding lengths.
  3. Try different sequence lengths. We tried a sequence length of 100.
  4. Try a different number of LSTM layers. (Please make a NOTE that training LSTM layers is a lengthy task hence adding more LSTM layers can increase training time a lot.)
  5. Try adding more dense layers after LSTM layers.
  6. Try adding dropout in LSTM layers.
  7. Try learning rate schedulers.
  8. Try using an n-gram/word-based model instead of a character-based one.
  9. Add a little randomness to the predicted character to make the generated text look more natural and less deterministic (a sketch is included below).
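For the last suggestion, below is a minimal sketch of one common way to add such randomness (the helper name sample_next_index is our own, and this is an assumption on our part rather than a method prescribed by the tutorial): sample the next character from the softmax distribution of the network's logits, scaled by a temperature, instead of always taking the argmax.

import torch
from torch.nn import functional as F

def sample_next_index(preds, temperature=0.8):
    ## 'preds' is assumed to be the (1, vocab_len) logits returned by text_generator.
    ## Lower temperature -> closer to argmax; higher temperature -> more varied output.
    probs = F.softmax(preds.squeeze(0) / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item() ## Sample an index instead of argmax

## Usage inside the generation loop (replacing the argmax line):
## predicted_index = sample_next_index(preds.detach().cpu(), temperature=0.8)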

This ends our small tutorial explaining how to create LSTM networks using PyTorch that use the character embeddings text encoding approach for text generation tasks. Please feel free to contact us if you have questions.
