Updated On : Jun-01,2022 Time Investment : ~30 mins

PyTorch: Text Generation using LSTM Networks (Character-based RNN)

Text Generation, also referred to as Natural Language Generation, is a kind of Language Modeling problem where we build a model that tries to understand the structure of a text and produce another text. Tasks like machine translation, conversational systems (chatbots), speech-to-text, and text summarization build language models at their core. Nowadays, deep learning models are commonly developed for language modeling tasks. In text generation, the language model tries to predict the next token (character/word/n-gram) in a text based on the previously seen tokens. To predict the next token, the language model needs to understand the order in which the tokens are laid out. Recurrent Neural Networks (RNNs) and their variants (LSTM, GRU, etc.) are well suited to modeling sequential input data, hence they can be used for language modeling tasks.

As a part of this tutorial, we explain how to create a Recurrent Neural Network consisting of LSTM layers using the Python deep learning library PyTorch for a text generation task. We use a character-based approach: the model takes a specified number of characters as input and predicts the next character in the sequence. In the same way, we could create networks that take a sequence of words as input and predict the next word. We encode each character as its integer index in a character vocabulary. We use the WikiText-2 corpus (text from Wikipedia articles) available from the torchtext library (PyTorch's helper library for NLP tasks). We have another tutorial on text generation using PyTorch that uses character embeddings to encode text data. Please feel free to check it out using the link below.

Text Generation using PyTorch LSTM Networks (Character Embeddings)

Please make a NOTE that language models are generally big and take a long time to train before they can produce meaningful text. Training them on a CPU is slow, and a GPU speeds up training considerably, hence we recommend training language models on a GPU.

Below, we have listed the important sections of the tutorial to give an overview of the material covered.

Important Sections of the Tutorial

1. Prepare Data
- 1.1 Load Data
- 1.2 Populate Vocabulary
- 1.3 Reshape Examples to Create Sequence Of Data
- 1.4 Create Data Loaders
2. Define LSTM Network
3. Train Network
4. Generate Text
5. Train Network More
6. Generate Text
7. Train Even More
8. Generate Text
9. Further Suggestions

Below, we have imported the necessary Python libraries and printed the versions that we have used in our tutorial.

import torch

print("PyTorch Version : {}".format(torch.__version__))

PyTorch Version : 1.9.1

import torchtext

print("TorchText Version : {}".format(torchtext.__version__))

TorchText Version : 0.10.1

device = "cuda" if torch.cuda.is_available() else "cpu"

device

'cuda'

import gc

1. Prepare Data

In this section, we prepare the data for training our network. As we said earlier, we use a character-based approach to text generation: we feed a few characters to the network and make it predict the next character in the sequence. We have decided to feed sequences of 100 characters to the network and make it predict the character that follows them.

We'll encode the data by mapping each character to an integer index. We'll follow the steps below to encode and prepare the data.

Create a vocabulary of all unique characters in the data. A vocabulary is a simple mapping from characters to integer indices. Each unique character is assigned a unique index starting from 0.
Loop through the data sequentially, one character at a time. Take the first 100 characters as data features and the character right after them as the target value. E.g., characters 1-100 are the data features and character 101 is the target value, characters 2-101 are the data features and character 102 is the target value, characters 3-102 are the data features and character 103 is the target value, and so on.
Replace each character with its unique index from the vocabulary.

The data generated by these steps is given to the LSTM network, which processes a sequence of 100 characters at a time and tries to predict the next character. We explain the steps in more detail below to make them easier to grasp.

1.1 Load Data

In this section, we simply load the WikiText-2 dataset. The dataset is already divided into train, validation, and test sets, and we use only the train set for our task. The train set has ~36k text examples. Each example is one line of text from a Wikipedia article (e.g., a paragraph, a section heading, or an empty line).

train_dataset, valid_dataset, test_dataset = torchtext.datasets.WikiText2()

wikitext-2-v1.zip: 100%|██████████| 4.48M/4.48M [00:02<00:00, 1.60MB/s]

1.2 Populate Vocabulary

In this section, we build a vocabulary of all unique characters present in our dataset. To create it, we use the build_vocab_from_iterator() function available from the 'vocab' sub-module of the torchtext library. The function accepts an iterator that yields a list of tokens (here, characters) for each example. We have created a small generator function named build_vocabulary() for this purpose: it takes a list of datasets as input, loops through each dataset and its examples one at a time, and yields the list of lower-cased characters of each example. WikiText replaces rare words with the special token <unk>, and we handle it specially so that it is counted as one token instead of being broken into characters.

After building the vocabulary, we print the number of characters present in it and the characters themselves.

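from torchtext.vocab import build_vocab_from_iterator

def build_vocabulary(datasets):
    ## Generator: yields the list of lower-cased characters of each example
    for dataset in datasets:
        for text in dataset:
            if "<unk>" in text:
                ## Split around "<unk>" so that it stays one token instead of '<', 'u', 'n', 'k', '>'
                texts = text.split("<unk>")
                total = list(texts[0].lower())
                for t in texts[1:]:
                    total.extend(["<unk>", ] + list(t.lower()))
                yield total
            else:
                yield list(text.lower())

## min_freq=1 keeps every character; the "<unk>" special token is placed first (index 0)
vocab = build_vocab_from_iterator(build_vocabulary([train_dataset, ]), min_freq=1, specials=["<unk>"])
## Characters missing from the vocabulary are mapped to the "<unk>" index
vocab.set_default_index(vocab["<unk>"])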

len(vocab)

print(vocab.get_itos())

['<unk>', ' ', 'e', 't', 'a', 'n', 'i', 'o', 'r', 's', 'h', 'd', 'l', 'c', 'm', 'u', 'f', 'g', 'p', 'w', 'b', 'y', ',', '.', 'v', 'k', '@', '\n', '1', '0', '=', '"', '2', "'", '9', '-', 'j', 'x', ')', '(', '3', '5', '8', '4', '6', '7', 'z', 'q', ';', '–', ':', '/', '—', '%', 'é', '$', '[', ']', '&', '!', 'í', '’', 'á', 'ā', '£', '°', '?', 'ó', '+', '#', 'š', '−', 'ō', 'ö', 'è', '×', 'ü', 'ä', 'ʻ', 'ś', 'ć', 'ø', '“', 'ł', 'ç', '”', '₹', 'ã', 'µ', 'ì', 'ư', '\ufeff', 'æ', '…', '→', 'ơ', 'ñ', 'å', '☉', '‘', '*', '~', '⁄', 'î', '²', 'ë', 'ệ', 'ī', 'ú', 'ễ', 'à', 'ô', 'ă', 'ū', '<', '^', 'ê', '♯', 'ỳ', '‑', 'đ', 'μ', '≤', '>', 'ل', 'ṃ', '～', '्', '†', '€', '±', 'ė', 'ž', '〈', '〉', '・', 'û', 'č', 'α', 'β', '½', 'γ', 'с', 'ṭ', 'ị', '„', '♭', 'â', '̃', 'ا', 'ه', '჻', 'ṅ', 'ầ', 'ớ', '′', '⅓', '大', '空', '¡', '¥', '³', '·', 'ş', 'ح', 'ص', 'ن', 'ვ', 'ი', 'კ', 'ო', 'ხ', 'ჯ', 'ḥ', 'ṯ', 'ả', 'ấ', '″', '火', '礮', '\\', '`', '|', '§', 'ò', 'þ', 'ń', 'ų', 'ż', 'ʿ', 'κ', 'а', 'в', 'е', 'к', 'о', 'т', 'я', 'ก', 'ง', 'ณ', 'ต', 'ม', 'ย', 'ร', 'ล', 'ั', 'า', 'ิ', '่', '์', 'გ', 'დ', 'ზ', 'რ', 'ს', 'უ', 'ც', 'ძ', 'წ', 'ṣ', 'ắ', 'ử', '₤', '⅔', 'の', 'ァ', 'ア', 'キ', 'ス', 'ッ', 'ト', 'プ', 'ュ', 'リ', 'ル', 'ヴ', '動', '場', '戦', '攻', '機', '殻', '隊']
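To see what the special handling of <unk> in build_vocabulary() does, the sketch below applies the same splitting logic to a single string. The helper name split_keep_unk() and the sample sentence are hypothetical, introduced only for illustration, and no torchtext is needed to run it.

```python
def split_keep_unk(text, unk="<unk>"):
    """Lower-case `text` and split it into characters, keeping `unk` as one token.

    Same logic as build_vocabulary() above, applied to a single string.
    """
    if unk not in text:
        return list(text.lower())
    parts = text.split(unk)
    tokens = list(parts[0].lower())
    for part in parts[1:]:
        # Re-insert the special token between the character runs.
        tokens.extend([unk] + list(part.lower()))
    return tokens


print(split_keep_unk(" The <unk> was Rare ."))
```

Note that the window-building loop in section 1.3 calls list(text.lower()) directly, so there "<unk>" is split into '<', 'u', 'n', 'k' and '>', which all appear in the vocabulary printed above.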

1.3 Reshape Examples to Create Sequence Of Data

In this section, we reorganize our dataset examples so that they can be used to train our LSTM network. We loop through each text example of the train dataset and slide a window of 100 characters over it: the 100 characters inside the window are the data features and the character right after the window is the target value. We then move the window by 1 character and repeat the process until we reach the end of the text. Each character is replaced with its integer index from the vocabulary. Please make a NOTE that we use only the first 7,500 examples of the train dataset, as using all of them would make training take much longer.

After organizing the dataset, we convert the lists to torch tensors. We also add one extra dimension at the end: with batch_first=True, the LSTM layer expects input of shape (batch size, sequence length, features), and each time step here carries a single feature (the character index).

Below, we explain the process with a simple example.

## Illustration only: named toy_vocab so that it does not overwrite the torchtext vocab built above.
toy_vocab = {
'h':1,
'e':2,
'l':3,
'o':4,
' ':5,
',':6,
'w':7,
'a':8,
'r':9,
'y':10,
'u':11,
'?':12,
'c':13,
'm':14,
't':15,
'd':16,
'z':17,
'n':18
}

text_example = "Hello, How are you? Welcome to coderzcolumn?" ## Lower-cased before splitting
seq_length = 10

X_train = [
            ['h','e','l','l','o',',',' ', 'h','o','w'],
            ['e','l','l','o',',',' ', 'h','o','w',' '],
            ['l','l','o',',',' ', 'h','o','w',' ', 'a'],
            ['l','o',',',' ', 'h','o','w',' ', 'a', 'r'],
            ...,
            ['d','e','r','z','c','o','l', 'u','m','n']
            ]
Y_train = [' ', 'a', 'r', 'e', ..., '?']

X_train_vectorized = [
                        [1,2,3,3,4,6,5,1,4,7],
                        [2,3,3,4,6,5,1,4,7,5],
                        [3,3,4,6,5,1,4,7,5,8],
                        [3,4,6,5,1,4,7,5,8,9],
                        ...,
                        [16,2,9,17,13,4,3,11,14,18]
                     ]
Y_train_vectorized = [5, 8, 9, 2, ..., 12]
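The toy example can also be reproduced programmatically. The sketch below assumes the same toy mapping (named toy_vocab here so it does not overwrite the torchtext vocab) and a hypothetical helper make_windows() that mirrors the loop in the next cell, but works on plain Python lists.

```python
# Toy character-to-index mapping from the example above (index 0 is unused here).
toy_vocab = {'h': 1, 'e': 2, 'l': 3, 'o': 4, ' ': 5, ',': 6, 'w': 7, 'a': 8, 'r': 9,
             'y': 10, 'u': 11, '?': 12, 'c': 13, 'm': 14, 't': 15, 'd': 16, 'z': 17, 'n': 18}


def make_windows(text, seq_length):
    """Slide a window of `seq_length` characters over `text`, one step at a time."""
    text = text.lower()
    X, Y = [], []
    for i in range(len(text) - seq_length):
        X.append(list(text[i:i + seq_length]))  # features: characters i .. i+seq_length-1
        Y.append(text[i + seq_length])          # target: the character right after them
    return X, Y


X_toy, Y_toy = make_windows("Hello, How are you? Welcome to coderzcolumn?", seq_length=10)
X_toy_vec = [[toy_vocab[ch] for ch in row] for row in X_toy]
Y_toy_vec = [toy_vocab[ch] for ch in Y_toy]

print(len(X_toy))                      # 34
print(X_toy_vec[0], Y_toy_vec[0])      # [1, 2, 3, 3, 4, 6, 5, 1, 4, 7] 5
print(X_toy_vec[-1], Y_toy_vec[-1])    # [16, 2, 9, 17, 13, 4, 3, 11, 14, 18] 12
```

A text of length L yields L - seq_length (features, target) pairs, which is why the 44-character toy sentence gives 34 pairs.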

%%time

train_dataset, valid_dataset, test_dataset = torchtext.datasets.WikiText2()

seq_length = 100 ## Network Hyperparameter to tune
X_train, Y_train = [], []

for text in list(train_dataset)[:7500]:
    for i in range(0, len(text)-seq_length):
        inp_seq = list(text[i:i+seq_length].lower())
        out_seq = text[i+seq_length].lower()
        X_train.append(vocab(inp_seq))
        Y_train.append(vocab[out_seq])

X_train, Y_train = torch.tensor(X_train, dtype=torch.float32), torch.tensor(Y_train)

X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1) ## Extra dimension is added for LSTM layer

X_train.shape, Y_train.shape

CPU times: user 31.7 s, sys: 1.08 s, total: 32.8 s
Wall time: 32.9 s

(torch.Size([1781323, 100, 1]), torch.Size([1781323]))

1.4 Create Data Loaders

In this section, we simply wrap our torch tensors in a TensorDataset and create a data loader from it. The data loader lets us process the data in batches during the training process. We have set the batch size to 1024.
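As a quick sanity check on the progress bars in the training sections, the number of batches per epoch follows from the dataset size printed in section 1.3 and the batch size. DataLoader keeps the final partial batch by default (drop_last=False), so the count rounds up. The sketch below just does this arithmetic.

```python
import math

num_examples = 1781323  # X_train.shape[0] printed in section 1.3
batch_size = 1024       # batch_size passed to DataLoader

num_batches = math.ceil(num_examples / batch_size)              # partial last batch is kept
last_batch_size = num_examples - (num_batches - 1) * batch_size

print(num_batches, last_batch_size)  # 1740 587
```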

from torch.utils.data import DataLoader, TensorDataset

vectorized_train_dataset = TensorDataset(X_train, Y_train)

train_loader = DataLoader(vectorized_train_dataset, batch_size=1024, shuffle=False)

for X, Y in train_loader:
    print(X.shape, Y.shape)
    break

torch.Size([1024, 100, 1]) torch.Size([1024])

gc.collect()

2. Define LSTM Network

In this section, we define the neural network that we'll use for our task. We treat the task as a classification task, as the network predicts one of the characters from the vocabulary for each input sequence.

The network consists of 2 LSTM layers and one linear layer. The output size (hidden size) of each LSTM layer is set to 256. Stacking two LSTM layers is intended to help the network better capture the sequence of characters found in the data. We define the LSTM layers using the LSTM() constructor, setting its num_layers parameter to 2, which stacks two LSTM layers. The output of the second LSTM layer at the last time step is given to a Linear layer whose number of output units equals the size of the vocabulary.
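The parameter shapes printed further below follow from how nn.LSTM stores its weights: the four gates (input, forget, cell, output) are stacked, so each weight matrix has 4 * hidden_size rows, and only the first layer sees input_size features. The pure-Python sketch below reproduces those shapes and counts the parameters. It assumes the nn.LSTM defaults bias=True, bidirectional=False and proj_size=0, takes len(vocab) = 244 from the printed Linear layer, and uses hypothetical function names.

```python
import math


def lstm_param_shapes(input_size, hidden_size, num_layers):
    """Shapes of a unidirectional nn.LSTM's parameters, in the order PyTorch lists them."""
    shapes = []
    for layer in range(num_layers):
        # Layer 0 reads the raw input; deeper layers read the previous layer's hidden state.
        in_features = input_size if layer == 0 else hidden_size
        shapes.append((f"weight_ih_l{layer}", (4 * hidden_size, in_features)))
        shapes.append((f"weight_hh_l{layer}", (4 * hidden_size, hidden_size)))
        shapes.append((f"bias_ih_l{layer}", (4 * hidden_size,)))
        shapes.append((f"bias_hh_l{layer}", (4 * hidden_size,)))
    return shapes


def count_params(shapes):
    """Total number of scalar parameters across (name, shape) pairs."""
    return sum(math.prod(shape) for _, shape in shapes)


lstm_shapes = lstm_param_shapes(input_size=1, hidden_size=256, num_layers=2)
linear_shapes = [("weight", (244, 256)), ("bias", (244,))]  # nn.Linear(256, len(vocab))

for name, shape in lstm_shapes + linear_shapes:
    print(name, shape)
print("LSTM parameters  :", count_params(lstm_shapes))
print("Linear parameters:", count_params(linear_shapes))
```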

After defining the network, we initialized it, printed the shape of weights/biases of layers, and performed a forward pass for verification purposes.

If you are new to PyTorch or don't have a background in LSTM networks, we recommend that you go through the links below, as they will help you with the background. We have not covered the inner workings of LSTMs in depth here, as they are covered there.

from torch import nn
from torch.nn import functional as F

hidden_dim = 256
n_layers=2

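class LSTMTextGenerator(nn.Module):
    def __init__(self):
        super(LSTMTextGenerator, self).__init__()
        ## Two stacked LSTM layers; each time step receives one number (the character index)
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_dim, num_layers=n_layers, batch_first=True)
        ## Maps the last hidden state to one score (logit) per vocabulary character
        self.linear = nn.Linear(hidden_dim, len(vocab))

    def forward(self, X_batch):
        ## Initial hidden and cell ("carry") states of shape (num_layers, batch, hidden_dim).
        ## They are drawn randomly here; nn.LSTM would use zeros if no state were passed.
        hidden, carry = torch.randn(n_layers, len(X_batch), hidden_dim).to(device), torch.randn(n_layers, len(X_batch), hidden_dim).to(device)
        output, (hidden, carry) = self.lstm(X_batch, (hidden, carry))
        ## output has shape (batch, seq_length, hidden_dim); keep only the last time step
        return self.linear(output[:,-1])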

text_generator = LSTMTextGenerator().to(device)

text_generator

LSTMTextGenerator(
  (lstm): LSTM(1, 256, num_layers=2, batch_first=True)
  (linear): Linear(in_features=256, out_features=244, bias=True)
)

for layer in text_generator.children():
    print("Layer : {}".format(layer))
    print("Parameters : ")
    for param in layer.parameters():
        print(param.shape)
    print()

Layer : LSTM(1, 256, num_layers=2, batch_first=True)
Parameters :
torch.Size([1024, 1])
torch.Size([1024, 256])
torch.Size([1024])
torch.Size([1024])
torch.Size([1024, 256])
torch.Size([1024, 256])
torch.Size([1024])
torch.Size([1024])

Layer : Linear(in_features=256, out_features=244, bias=True)
Parameters :
torch.Size([244, 256])
torch.Size([244])

out = text_generator(torch.randn(1024, seq_length, 1).to(device))

out.shape

torch.Size([1024, 244])

3. Train Network

Here, we train our network. To simplify the training process, we have created a helper training function. The function takes the model, loss function, optimizer, train data loader, and number of epochs as input. It then runs the training loop for the given number of epochs, each time looping through the whole training data in batches. For each batch of data, it performs a forward pass to make predictions, calculates the loss, calculates gradients, and updates the network parameters using the gradients. It records the loss value of each batch and prints the average loss of all batches at the end of each epoch.

from tqdm import tqdm

def TrainModel(model, loss_fn, optimizer, train_loader, epochs=10):
    for i in range(1, epochs+1):
        losses = []
        for X, Y in tqdm(train_loader):
            Y_preds = model(X.to(device)) ## Forward pass: logits of shape (batch, len(vocab))

            loss = loss_fn(Y_preds, Y.to(device)) ## Cross-entropy against target character indices
            losses.append(loss.item())

            optimizer.zero_grad() ## Clear gradients from the previous batch
            loss.backward() ## Compute gradients
            optimizer.step() ## Update network parameters

        print("Train Loss : {:.3f}".format(torch.tensor(losses).mean()))

Below, we train our network using the training routine from the previous cell. We set the number of epochs to 25 and the learning rate to 0.001. Then, we initialize the cross-entropy loss, our LSTM model, and the Adam optimizer. Finally, we call our training routine with these parameters. We train the network for 25 epochs to see what kind of results it produces. The training loss printed after each epoch falls steadily from 2.532 to 1.446, which suggests that the network is learning the sequences of characters (note that we track only the training loss, not a validation loss).
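To put these loss values in context: nn.CrossEntropyLoss computes, for each example, the negative natural log of the softmax probability assigned to the target class, and averages it over the batch by default. The pure-Python sketch below computes that quantity for a single example. A model that assigns equal scores to all 244 characters would score ln(244) ≈ 5.50, and exp(loss) gives the per-character perplexity. The function name cross_entropy() is our own, and the figures reused below are training losses only.

```python
import math


def cross_entropy(logits, target):
    """-log(softmax(logits)[target]) for one example, via the log-sum-exp trick."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


vocab_size = 244
uniform_loss = cross_entropy([0.0] * vocab_size, target=0)
print("Uniform guess loss:", round(uniform_loss, 3))  # 5.497

# Per-character perplexity for the first and last epoch losses of this run.
for loss in (2.532, 1.446):
    print(loss, "-> perplexity", round(math.exp(loss), 2))
```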

%%time

from torch.optim import Adam

epochs = 25
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss().to(device)
text_generator = LSTMTextGenerator().to(device)
optimizer = Adam(text_generator.parameters(), lr=learning_rate)

TrainModel(text_generator, loss_fn, optimizer, train_loader, epochs)

100%|██████████| 1740/1740 [02:47<00:00, 10.36it/s]

Train Loss : 2.532

100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]

Train Loss : 2.166

100%|██████████| 1740/1740 [02:49<00:00, 10.24it/s]

Train Loss : 2.040

100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]

Train Loss : 1.952

100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]

Train Loss : 1.885

100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]

Train Loss : 1.828

100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]

Train Loss : 1.781

100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]

Train Loss : 1.741

100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]

Train Loss : 1.706

100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]

Train Loss : 1.678

100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]

Train Loss : 1.652

100%|██████████| 1740/1740 [02:50<00:00, 10.21it/s]

Train Loss : 1.626

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.604

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.584

100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]

Train Loss : 1.565

100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]

Train Loss : 1.550

100%|██████████| 1740/1740 [02:48<00:00, 10.30it/s]

Train Loss : 1.535

100%|██████████| 1740/1740 [02:48<00:00, 10.34it/s]

Train Loss : 1.521

100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]

Train Loss : 1.508

100%|██████████| 1740/1740 [02:48<00:00, 10.33it/s]

Train Loss : 1.496

100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]

Train Loss : 1.484

100%|██████████| 1740/1740 [02:49<00:00, 10.24it/s]

Train Loss : 1.474

100%|██████████| 1740/1740 [02:48<00:00, 10.31it/s]

Train Loss : 1.464

100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]

Train Loss : 1.455

100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]

Train Loss : 1.446
CPU times: user 1h 9min 6s, sys: 17.3 s, total: 1h 9min 23s
Wall time: 1h 10min 36s

4. Generate Text

In this section, we generate text using our trained network. We first retrieve a random example (a sequence of 100 character indices) from our organized train dataset and print its characters. Then, a loop generates 100 new characters. It starts with the randomly selected sequence and predicts the next character. It then removes the first character from the sequence and appends the newly predicted character at the end. Then, it makes another prediction with the updated sequence, and the process repeats for 100 characters.

We can notice from the results that the generated words are spelled correctly even though the model predicts one character at a time. The generated sequence of characters does not make much sense, but it resembles English text and includes punctuation marks. Because the model always picks the most likely character (argmax, i.e., greedy decoding), its output is largely deterministic and starts repeating the same sequence of characters after a while. This can often be reduced by introducing some randomness, e.g., by sampling the next character from the predicted probability distribution instead of always taking the most likely one.

Given that we have trained the network for just 25 epochs, the results look reasonable. Next, we'll train the network for more epochs, which should hopefully improve the results further.
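One way to add the randomness discussed in this section is temperature sampling: divide the logits by a temperature T, apply softmax, and draw the next index from the resulting distribution. As T approaches 0 this approaches the argmax used in this tutorial, and a larger T gives more varied text. The sketch below is pure Python and assumes a hypothetical predict_logits(pattern) callable that returns one score per vocabulary entry; the tiny stand-in model toy_logits() exists only to make it runnable.

```python
import math
import random


def sample_next(logits, temperature=1.0, rng=random):
    """Pick the next index: argmax if temperature <= 0, else sample softmax(logits / T)."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])  # greedy, as in the tutorial
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    weights = [math.exp(z - m) for z in scaled]  # unnormalized softmax; choices() normalizes
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(predict_logits, pattern, n_chars, temperature=1.0, rng=random):
    """Sliding-window generation loop: predict, append, drop the oldest index."""
    pattern = list(pattern)  # copy so the caller's list is not modified
    generated = []
    for _ in range(n_chars):
        next_index = sample_next(predict_logits(pattern), temperature, rng)
        generated.append(next_index)
        pattern = pattern[1:] + [next_index]
    return generated


def toy_logits(pattern):
    """Hypothetical stand-in model over 5 'characters': favors (last index + 1) % 5."""
    favored = (pattern[-1] + 1) % 5
    return [5.0 if i == favored else 0.0 for i in range(5)]


print(generate(toy_logits, [0, 1, 2], n_chars=4, temperature=0))  # [3, 4, 0, 1]
print(generate(toy_logits, [0, 1, 2], n_chars=4, temperature=1.5, rng=random.Random(0)))
```

With the tutorial's model, predict_logits could wrap text_generator the same way the generation cell does (build the (1, seq_length, 1) float tensor, call the model, and return the logits as a list). In PyTorch the sampling step itself can also be written with torch.softmax(logits / T, dim=-1) followed by torch.multinomial(probs, 1).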

import random

random.seed(123)
idx = random.randint(0, len(X_train) - 1) ## randint() includes both endpoints, so subtract 1
pattern = X_train[idx].numpy().astype(int).flatten().tolist()

print("Initial Pattern : {}".format("".join(vocab.lookup_tokens(pattern))))

generated_text = []
with torch.no_grad(): ## Gradients are not needed for generation
    for i in range(100):
        X_batch = torch.tensor(pattern, dtype=torch.float32).reshape(1, seq_length, 1) ## Design Batch
        preds = text_generator(X_batch.to(device)) ## Make Prediction
        predicted_index = preds.argmax(dim=-1).item() ## Retrieve index of the most likely character
        generated_text.append(predicted_index) ## Add token index to result
        pattern.append(predicted_index) ## Add token index to original pattern
        pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join(vocab.lookup_tokens(generated_text))))

Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the country , and the country , and the country , and the country , and the country , and the co

5. Train Network More

In this section, we train our network for another 50 epochs. We also reduce the learning rate from 0.001 to 0.0003. The training loss keeps decreasing at every epoch (from 1.429 to 1.316), which suggests that the network keeps getting better at predicting the next character on the training data.

epochs = 50
learning_rate = 3e-4
optimizer = Adam(text_generator.parameters(), lr=learning_rate)

TrainModel(text_generator, loss_fn, optimizer, train_loader, epochs)

100%|██████████| 1740/1740 [02:48<00:00, 10.32it/s]

Train Loss : 1.429

100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]

Train Loss : 1.422

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.418

100%|██████████| 1740/1740 [02:48<00:00, 10.33it/s]

Train Loss : 1.414

100%|██████████| 1740/1740 [02:48<00:00, 10.32it/s]

Train Loss : 1.410

100%|██████████| 1740/1740 [02:48<00:00, 10.31it/s]

Train Loss : 1.407

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.404

100%|██████████| 1740/1740 [02:48<00:00, 10.30it/s]

Train Loss : 1.401

100%|██████████| 1740/1740 [02:48<00:00, 10.31it/s]

Train Loss : 1.398

100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]

Train Loss : 1.395

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.393

100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]

Train Loss : 1.390

100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]

Train Loss : 1.387

100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]

Train Loss : 1.385

100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]

Train Loss : 1.382

100%|██████████| 1740/1740 [02:50<00:00, 10.18it/s]

Train Loss : 1.380

100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]

Train Loss : 1.377

100%|██████████| 1740/1740 [02:48<00:00, 10.31it/s]

Train Loss : 1.375

100%|██████████| 1740/1740 [02:48<00:00, 10.31it/s]

Train Loss : 1.373

100%|██████████| 1740/1740 [02:48<00:00, 10.32it/s]

Train Loss : 1.370

100%|██████████| 1740/1740 [02:50<00:00, 10.21it/s]

Train Loss : 1.368

100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]

Train Loss : 1.366

100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]

Train Loss : 1.364

100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]

Train Loss : 1.362

100%|██████████| 1740/1740 [02:48<00:00, 10.30it/s]

Train Loss : 1.360

100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]

Train Loss : 1.358

100%|██████████| 1740/1740 [02:51<00:00, 10.17it/s]

Train Loss : 1.356

100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]

Train Loss : 1.354

100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]

Train Loss : 1.352

100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]

Train Loss : 1.350

100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]

Train Loss : 1.348

100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]

Train Loss : 1.346

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.344

100%|██████████| 1740/1740 [02:51<00:00, 10.13it/s]

Train Loss : 1.342

100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]

Train Loss : 1.341

100%|██████████| 1740/1740 [02:47<00:00, 10.41it/s]

Train Loss : 1.339

100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]

Train Loss : 1.337

100%|██████████| 1740/1740 [02:48<00:00, 10.32it/s]

Train Loss : 1.335

100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]

Train Loss : 1.334

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.332

100%|██████████| 1740/1740 [02:47<00:00, 10.39it/s]

Train Loss : 1.330

100%|██████████| 1740/1740 [02:50<00:00, 10.21it/s]

Train Loss : 1.329

100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]

Train Loss : 1.327

100%|██████████| 1740/1740 [02:47<00:00, 10.38it/s]

Train Loss : 1.326

100%|██████████| 1740/1740 [02:50<00:00, 10.21it/s]

Train Loss : 1.324

100%|██████████| 1740/1740 [02:47<00:00, 10.37it/s]

Train Loss : 1.322

100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]

Train Loss : 1.321

100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]

Train Loss : 1.319

100%|██████████| 1740/1740 [02:46<00:00, 10.43it/s]

Train Loss : 1.318

100%|██████████| 1740/1740 [02:50<00:00, 10.24it/s]

Train Loss : 1.316

6. Generate Text

Here, we again generate new characters using our further-trained network, starting from the same example that we used earlier. The results seem to have improved a little: the words are still spelled correctly, and different words are generated for the same example. However, the network still appears deterministic and produces the same sequence of characters again and again. We can train the network further to see whether it helps.

import random

random.seed(123)
idx = random.randint(0, len(X_train) - 1) ## randint() includes both endpoints, so subtract 1
pattern = X_train[idx].numpy().astype(int).flatten().tolist()

print("Initial Pattern : {}".format("".join(vocab.lookup_tokens(pattern))))

generated_text = []
with torch.no_grad(): ## Gradients are not needed for generation
    for i in range(100):
        X_batch = torch.tensor(pattern, dtype=torch.float32).reshape(1, seq_length, 1) ## Design Batch
        preds = text_generator(X_batch.to(device)) ## Make Prediction
        predicted_index = preds.argmax(dim=-1).item() ## Retrieve index of the most likely character
        generated_text.append(predicted_index) ## Add token index to result
        pattern.append(predicted_index) ## Add token index to original pattern
        pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join(vocab.lookup_tokens(generated_text))))

Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the south of the south of the south of the south of the south of the south of the south of the s

7. Train Even More

In this section, we train our network for another 50 epochs and reduce the learning rate from 0.0003 to 0.0001. The training loss printed at the end of each epoch shows that the network keeps improving, though more slowly (from 1.314 to 1.285).
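The three training stages (25, 50 and 50 epochs) together amount to a hand-made, piecewise-constant learning-rate schedule over 125 epochs. The pure-Python sketch below writes that schedule down explicitly; the function name tutorial_lr() is hypothetical.

```python
def tutorial_lr(epoch):
    """Learning rate at a 1-indexed epoch, following the three training stages."""
    if epoch <= 25:
        return 1e-3   # section 3: epochs 1-25
    if epoch <= 75:
        return 3e-4   # section 5: epochs 26-75
    return 1e-4       # section 7: epochs 76-125


for epoch in (1, 25, 26, 75, 76, 125):
    print(epoch, tutorial_lr(epoch))
```

In PyTorch, a similar schedule could be applied to a single optimizer created with lr=1e-3 via torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda e: tutorial_lr(e + 1) / 1e-3), calling scheduler.step() once per epoch. This is not identical to the tutorial, which creates a new Adam optimizer for each stage and therefore also resets Adam's running moment estimates.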

epochs = 50
learning_rate = 1e-4
optimizer = Adam(text_generator.parameters(), lr=learning_rate)

TrainModel(text_generator, loss_fn, optimizer, train_loader, epochs)

100%|██████████| 1740/1740 [02:46<00:00, 10.44it/s]

Train Loss : 1.314

100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]

Train Loss : 1.312

100%|██████████| 1740/1740 [02:46<00:00, 10.45it/s]

Train Loss : 1.311

100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]

Train Loss : 1.310

100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]

Train Loss : 1.309

100%|██████████| 1740/1740 [02:46<00:00, 10.45it/s]

Train Loss : 1.309

100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]

Train Loss : 1.308

100%|██████████| 1740/1740 [02:46<00:00, 10.43it/s]

Train Loss : 1.307

100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]

Train Loss : 1.307

100%|██████████| 1740/1740 [02:49<00:00, 10.28it/s]

Train Loss : 1.306

100%|██████████| 1740/1740 [02:46<00:00, 10.46it/s]

Train Loss : 1.305

100%|██████████| 1740/1740 [02:49<00:00, 10.29it/s]

Train Loss : 1.305

100%|██████████| 1740/1740 [02:46<00:00, 10.46it/s]

Train Loss : 1.304

100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]

Train Loss : 1.304

100%|██████████| 1740/1740 [02:49<00:00, 10.24it/s]

Train Loss : 1.303

100%|██████████| 1740/1740 [02:46<00:00, 10.43it/s]

Train Loss : 1.302

100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]

Train Loss : 1.302

100%|██████████| 1740/1740 [02:46<00:00, 10.48it/s]

Train Loss : 1.301

100%|██████████| 1740/1740 [02:49<00:00, 10.26it/s]

Train Loss : 1.301

100%|██████████| 1740/1740 [02:47<00:00, 10.37it/s]

Train Loss : 1.300

100%|██████████| 1740/1740 [02:48<00:00, 10.36it/s]

Train Loss : 1.300

100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]

Train Loss : 1.299

100%|██████████| 1740/1740 [02:46<00:00, 10.45it/s]

Train Loss : 1.299

100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]

Train Loss : 1.298

100%|██████████| 1740/1740 [02:46<00:00, 10.47it/s]

Train Loss : 1.297

100%|██████████| 1740/1740 [02:49<00:00, 10.24it/s]

Train Loss : 1.297

100%|██████████| 1740/1740 [02:49<00:00, 10.25it/s]

Train Loss : 1.296

100%|██████████| 1740/1740 [02:47<00:00, 10.41it/s]

Train Loss : 1.296

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.295

100%|██████████| 1740/1740 [02:46<00:00, 10.46it/s]

Train Loss : 1.295

100%|██████████| 1740/1740 [02:50<00:00, 10.19it/s]

Train Loss : 1.294

100%|██████████| 1740/1740 [02:47<00:00, 10.40it/s]

Train Loss : 1.294

100%|██████████| 1740/1740 [02:49<00:00, 10.27it/s]

Train Loss : 1.293

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.293

100%|██████████| 1740/1740 [02:46<00:00, 10.46it/s]

Train Loss : 1.292

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.292

100%|██████████| 1740/1740 [02:46<00:00, 10.46it/s]

Train Loss : 1.291

100%|██████████| 1740/1740 [02:50<00:00, 10.22it/s]

Train Loss : 1.291

100%|██████████| 1740/1740 [02:48<00:00, 10.36it/s]

Train Loss : 1.290

100%|██████████| 1740/1740 [02:49<00:00, 10.24it/s]

Train Loss : 1.290

100%|██████████| 1740/1740 [02:50<00:00, 10.19it/s]

Train Loss : 1.289

100%|██████████| 1740/1740 [02:46<00:00, 10.47it/s]

Train Loss : 1.289

100%|██████████| 1740/1740 [02:50<00:00, 10.20it/s]

Train Loss : 1.288

100%|██████████| 1740/1740 [02:46<00:00, 10.43it/s]

Train Loss : 1.288

100%|██████████| 1740/1740 [02:51<00:00, 10.17it/s]

Train Loss : 1.287

100%|██████████| 1740/1740 [02:47<00:00, 10.42it/s]

Train Loss : 1.287

100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]

Train Loss : 1.286

100%|██████████| 1740/1740 [02:50<00:00, 10.23it/s]

Train Loss : 1.286

100%|██████████| 1740/1740 [02:47<00:00, 10.39it/s]

Train Loss : 1.285

100%|██████████| 1740/1740 [02:46<00:00, 10.45it/s]

Train Loss : 1.285

8. Generate Text

Here, we again generate text from the same example using our trained network. The results this time look a little better than earlier, though they are still deterministic.

import random

random.seed(123)
idx = random.randint(0, len(X_train) - 1) ## randint() includes both endpoints, so subtract 1
pattern = X_train[idx].numpy().astype(int).flatten().tolist()

print("Initial Pattern : {}".format("".join(vocab.lookup_tokens(pattern))))

generated_text = []
with torch.no_grad(): ## Gradients are not needed for generation
    for i in range(100):
        X_batch = torch.tensor(pattern, dtype=torch.float32).reshape(1, seq_length, 1) ## Design Batch
        preds = text_generator(X_batch.to(device)) ## Make Prediction
        predicted_index = preds.argmax(dim=-1).item() ## Retrieve index of the most likely character
        generated_text.append(predicted_index) ## Add token index to result
        pattern.append(predicted_index) ## Add token index to original pattern
        pattern = pattern[1:] ## Resize pattern to bring again to seq_length length.

print("Generated Text : {}".format("".join(vocab.lookup_tokens(generated_text))))

Initial Pattern : 1987 – 88 season where he was named the ihl 's co @-@ rookie of the year and most valuable player af
Generated Text : ter the second construction of the second construction of the second construction of the second cons

9. Further Suggestions

Below we have suggested a few more things that can be tried to improve network performance further.

Train the network for more epochs.
Try different combinations of LSTM layers. Maybe add more LSTM layers.
Try different hidden sizes for LSTM layers.
Try different sequence lengths. In our case, we tried a sequence length of 100 characters.
Try using an n-gram/word-based model instead of a character-based one.
Try adding more linear (dense) layers after the LSTM layers.
Try learning rate schedulers.
Try different character encodings like character embeddings, etc.
Try adding randomness to the prediction of the next character to make the generated text look more natural.
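For the word-based variant suggested in the list above, the data preparation changes only in what a token is. Below is a minimal sketch, assuming whitespace tokenization (WikiText-2 text is already separated by spaces, as the printed patterns show) and using a hypothetical helper word_windows(); the vocabulary would then contain words instead of characters and would be much larger.

```python
def word_windows(text, seq_length):
    """Slide a window of `seq_length` words over `text`; the next word is the target."""
    words = text.lower().split()  # whitespace tokenization (assumption)
    X, Y = [], []
    for i in range(len(words) - seq_length):
        X.append(words[i:i + seq_length])
        Y.append(words[i + seq_length])
    return X, Y


X_words, Y_words = word_windows("The cat sat on the mat", seq_length=3)
print(X_words)  # [['the', 'cat', 'sat'], ['cat', 'sat', 'on'], ['sat', 'on', 'the']]
print(Y_words)  # ['on', 'the', 'mat']
```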

This ends our small tutorial explaining how to design LSTM networks using PyTorch for text generation tasks. Please feel free to contact us if you have any questions.

Sunny Solanki

Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

When going through coding examples, it's quite common to have doubts and errors.

If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.

You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.

Want to Share Your Views? Have Any Suggestions?

If you want to

provide some suggestions on topic
share your views
include some details in tutorial
suggest some new topics on which we should create tutorials/blogs

Please feel free to contact us at coderzcolumn07@gmail.com. We appreciate and value your feedback.

Sunny Solanki

Software Developer | Youtuber | Bonsai Enthusiast
