Keras is one of the most commonly used Python deep learning libraries for designing neural networks. The reason behind this popularity is that Keras has one of the easiest APIs for working with neural networks, automating many tasks that developers would otherwise have to code themselves. One such task is training the neural network. In other Python deep learning libraries (like low-level TensorFlow, PyTorch, mxnet, Flax (JAX), etc.), developers need to write the training loop themselves. In Keras, developers just need to call the fit() method to train the neural network, and it can even calculate loss and metrics on validation data. This frees developers from writing training loops, which involve loops that can get messy and sometimes introduce bugs. Though this makes the developer's task quite easy, there are situations when we need to perform tasks before and after epochs/steps (like logging results/metrics, saving model weights, modifying the learning rate, stopping training, etc.). With other deep learning libraries, we design the training loop ourselves, so we can add these kinds of functionalities directly.
But how do we perform these kinds of tasks in Keras, where a single line of code trains the network?
The answer to this question is Keras callbacks. Keras lets us execute functions at various stages of training: before training starts, after training completes, before and after each epoch, and before and after each batch. We can use these hooks to perform various tasks at the points mentioned above. Keras provides ready-made callbacks for commonly performed tasks like logging results/metrics, changing the learning rate, and saving the model/weights. It also lets us create custom callbacks if the existing ones do not satisfy our requirements.
As a part of this tutorial, we'll discuss how we can use existing Keras callbacks and also how we can create our own custom callback if the existing ones are not enough. We have used the Fashion MNIST dataset for our tutorial and have trained a simple CNN on it to explain callbacks.
Below, we have highlighted important sections of our tutorial to give an overview of the material covered.
Below, we have imported keras and printed the version of it that we have used in our tutorial.
import tensorflow
from tensorflow import keras
print("Keras Version : {}".format(keras.__version__))
In this section, we have loaded the Fashion MNIST dataset which is available as a part of the keras package. The dataset has images of 10 different fashion items. It is already divided into train (60k images) and test (10k images) sets. The images are 28 x 28 pixels. The table below shows the mapping from label index to item name. We have also displayed a few sample images in the cell after the data-loading cell.
Label | Description |
---|---|
0 | T-shirt/top |
1 | Trouser |
2 | Pullover |
3 | Dress |
4 | Coat |
5 | Sandal |
6 | Shirt |
7 | Sneaker |
8 | Bag |
9 | Ankle boot |
from tensorflow.keras import datasets

# Load Fashion MNIST and add a channel dimension so images match Conv2D's expected input shape.
(X_train, Y_train), (X_test, Y_test) = datasets.fashion_mnist.load_data()
X_train, X_test = X_train.reshape(-1,28,28,1), X_test.reshape(-1,28,28,1)

X_train.shape, Y_train.shape, X_test.shape, Y_test.shape
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(10,5))
# squeeze() drops the channel dimension since imshow() expects a (H, W) array for grayscale images.
plt.imshow(np.hstack(X_train[:5]).squeeze(), cmap="gray");
In this section, we'll introduce a callback that will help us reduce the learning rate if the metric/loss that it is monitoring is not improving.
Below, we have created a simple neural network with two convolution layers and one dense layer. The convolution layers have 32 and 16 filters respectively, and both apply kernels of size (3,3) to their input. We have applied relu (rectified linear unit) activation to the output of both convolution layers. We have wrapped the model creation in a function that returns a new model each time it is called. We'll be reusing this function in each of our upcoming sections.
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
def create_model():
    return Sequential([
        layers.Conv2D(filters=32, kernel_size=(3,3), padding="same", activation="relu",
                      input_shape=(28,28,1)),
        layers.Conv2D(filters=16, kernel_size=(3,3), padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(10, activation="softmax")
    ])
model = create_model()
model.summary()
In this section, we have compiled our model. We have set SGD (stochastic gradient descent) as our optimizer with a learning rate of 0.001, 'sparse categorical crossentropy' as our loss, and accuracy as our metric.
from tensorflow.keras.optimizers import SGD
model.compile(optimizer=SGD(learning_rate=0.001), loss="sparse_categorical_crossentropy", metrics=["accuracy"])
In this section, we are training our network for 15 epochs. We have used a callback named ReduceLROnPlateau available from Keras. We can initialize callbacks and then pass them as a list to the callbacks argument of the fit(), evaluate(), and predict() methods.
The ReduceLROnPlateau callback monitors a metric/loss and reduces the learning rate when that quantity stops improving. In our case, we have asked it to monitor validation accuracy. We have set the patience parameter to 3, which tells it that if there is no improvement in validation accuracy for 3 consecutive epochs, the learning rate should be reduced. It multiplies the learning rate by the number specified via the factor parameter, which is 0.5 in our case, hence it halves the learning rate. The value set with min_delta is the minimum amount by which validation accuracy must increase to count as an improvement; smaller changes are treated as no improvement. The mode parameter is 'auto' by default and can usually determine from the metric given to the monitor parameter whether it should be maximized or minimized. The other two values are 'min' and 'max', where we explicitly specify whether the metric should be minimized or maximized. We can specify the minimum value the learning rate is allowed to reach using the min_lr parameter. We have also set verbose to 1, which logs a message whenever the callback changes the learning rate.
We can notice from the results below that the learning rate is changed 2 times.
from tensorflow.keras import callbacks
lr_reduce_max = callbacks.ReduceLROnPlateau(monitor="val_accuracy", factor=0.5,
patience=3, verbose=1, mode="max",
                                         min_delta=0.01, min_lr=0.0001)
model.fit(X_train, Y_train, batch_size=256, epochs=15, validation_data=(X_test,Y_test), callbacks=[lr_reduce_max])
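As a small illustrative check we are adding here (not part of the original outputs), we can read the optimizer's learning rate after training; it should be lower than the initial 0.001 if the callback fired during training.

from tensorflow.keras import backend

# Read the current learning rate from the optimizer after training completes.
print("Final Learning Rate : {}".format(backend.get_value(model.optimizer.learning_rate)))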
Below, we have shown another example of using ReduceLROnPlateau. This time, we are monitoring validation loss for minimization.
lr_reduce_min = callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3,
                                         verbose=1, mode="min", min_delta=0.01, min_lr=0.0001)
model.fit(X_train, Y_train, batch_size=256, epochs=10, validation_data=(X_test,Y_test), callbacks=[lr_reduce_min])
In this section, we have introduced another callback that will stop training if there is no improvement in a metric that it is monitoring.
Below, we have created a model using the function we designed earlier and compiled it.
from tensorflow.keras.optimizers import SGD
model = create_model()
model.compile(optimizer=SGD(learning_rate=0.001), loss="sparse_categorical_crossentropy", metrics=["accuracy"])
In this section, we are training our neural network for 10 epochs with an early-stopping callback. We can create the callback using the EarlyStopping constructor available from Keras. We need to provide the metric to monitor. We can also specify how much the metric must improve to count as an improvement (min_delta), the number of epochs without improvement to wait before stopping training (patience), and whether to maximize or minimize the metric (mode).
In our case, we are monitoring validation accuracy: if it does not improve by at least 0.05 for 3 consecutive epochs, training stops. We can notice that training stopped after 7 epochs even though we asked it to run for 10 epochs.
early_stop = callbacks.EarlyStopping(monitor="val_accuracy", min_delta=0.05, patience=3, verbose=1, mode="max")
model.fit(X_train, Y_train, batch_size=256, epochs=10, validation_data=(X_test,Y_test), callbacks=[early_stop])
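As a small illustrative addition (assuming the callback actually triggered), the EarlyStopping object records the epoch at which training was stopped in its stopped_epoch attribute.

# 'stopped_epoch' is 0 if training ran to completion, else the index of the epoch at which it stopped.
print("Stopped At Epoch : {}".format(early_stop.stopped_epoch))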
Below, we have created the EarlyStopping callback again. This time, we are monitoring validation loss: if it does not improve by at least 0.01 for 3 consecutive epochs, training will be stopped. We have also set baseline=0.4, which requires the monitored loss to improve beyond that baseline value, and restore_best_weights=True, which restores the weights from the best epoch once training stops. We can notice that training stopped after 8 epochs even though we asked it to run for 10 epochs.
early_stop = callbacks.EarlyStopping(monitor="val_loss", min_delta=0.01, patience=3, verbose=1, mode="min", baseline=0.4, restore_best_weights=True)
model.fit(X_train, Y_train, batch_size=256, epochs=10, validation_data=(X_test,Y_test), callbacks=[early_stop])
In this section, we are explaining a callback that saves the model and its weights when it is giving the best results. We can then later load any of the best-performing models.
In this section, we have initialized our network and compiled it as usual.
from tensorflow.keras.optimizers import SGD
model = create_model()
model.compile(optimizer=SGD(learning_rate=0.001), loss="sparse_categorical_crossentropy", metrics=["accuracy"])
In this section, we are training our neural network for 10 epochs with a model-saving callback. We can create a model checkpoint callback using the ModelCheckpoint constructor. It accepts a file path where the model will be saved during training. It lets us save the model at the end of every epoch, only when the monitored result improves (best model), or after a specified number of batches. We can also use string formatting in the filename, as we have done below with the epoch number and validation accuracy.
In our case, we have set save_best_only=True so the model is saved at the end of an epoch only if validation accuracy has improved over the best value seen so far; it won't save the model if the validation accuracy of the current epoch is worse. The save_freq parameter can also accept an integer value, specifying that the model should be saved after that many batches (we have included a small sketch of this after the training run below). By default, the whole model state along with its weights is saved. Later on, we have explained how we can save only the weights.
from tensorflow.keras import callbacks
checkpoint = callbacks.ModelCheckpoint(filepath="/home/sunny/fashion_mnist_conv/model-{epoch:02d}-{val_accuracy:.2f}.hdf5",
                                       monitor="val_accuracy", save_best_only=True,
                                       verbose=1, mode="max", save_freq="epoch")
lr_reduce_max = callbacks.ReduceLROnPlateau(monitor="val_accuracy",
factor=0.5, patience=3, verbose=1, mode="max",
                                         min_delta=0.05, min_lr=0.0001)
model.fit(X_train, Y_train, batch_size=256, epochs=10, validation_data=(X_test,Y_test),
callbacks=[lr_reduce_max, checkpoint])
%ls /home/sunny/fashion_mnist_conv/
model.evaluate(X_test, Y_test)
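As mentioned above, save_freq can also be an integer number of batches. Below is a minimal sketch of this (not executed in this tutorial); the directory name is hypothetical.

# Hypothetical example: save a checkpoint every 100 training batches instead of every epoch.
checkpoint_batches = callbacks.ModelCheckpoint(filepath="/home/sunny/fashion_mnist_conv_batches/model-{epoch:02d}.hdf5",
                                               save_freq=100, verbose=1)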
In this section, we are loading one of the models that was saved during training. After loading the model, we are also checking its accuracy. We can load a model using the load_model() function available from the tensorflow.keras.models module.
from tensorflow.keras.models import load_model
import os
model_files = os.listdir("/home/sunny/fashion_mnist_conv/")
model_files = [f for f in model_files if "model" in f]
print("Loading Model : {}".format(model_files[-1]))
loaded_model1 = load_model(os.path.join("/home/sunny/fashion_mnist_conv/", model_files[-1]))
loaded_model1.evaluate(X_test, Y_test)
In this section, we have designed a callback asking it to save only the weights of the model instead of the whole model.
checkpoint = callbacks.ModelCheckpoint(filepath="/home/sunny/fashion_mnist_conv/weights-{epoch:02d}-{val_accuracy:.2f}.hdf5",
monitor="val_accuracy",
save_best_only=True, save_weights_only=True,
verbose=1, mode="max", save_freq="epoch")
model.fit(X_train, Y_train, batch_size=256, epochs=5, validation_data=(X_test,Y_test), callbacks=[checkpoint])
%ls /home/sunny/fashion_mnist_conv/
In this section, we are loading the model from its weights only. In order to do that, we first need to create the model architecture and compile it. Then, we can call the load_weights() method on it to load the weights from the file.
After loading the model, we have evaluated it on test data to check accuracy.
import os
weights_files = os.listdir("/home/sunny/fashion_mnist_conv/")
weights_files = [f for f in weights_files if "weights" in f]
print("Loading Weights : {}".format(weights_files[-1]))
loaded_model2 = create_model()
loaded_model2.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
loaded_model2.load_weights(os.path.join("/home/sunny/fashion_mnist_conv/",weights_files[-1]))
loaded_model2.evaluate(X_test, Y_test)
In this section, we have explained a callback that lets us log loss and metrics to a CSV file.
We have initialized our network and compiled it as usual.
from tensorflow.keras.optimizers import SGD
model = create_model()
model.compile(optimizer=SGD(learning_rate=0.001), loss="sparse_categorical_crossentropy", metrics=["accuracy"])
In this section, we are training our network with a callback that saves loss and metric values to a CSV file. We can create the callback using the CSVLogger() constructor. We just need to give it a filename where the details will be stored. We have also set append=True so that repeated runs append to the file instead of overwriting it.
from tensorflow.keras import callbacks
csv_logger = callbacks.CSVLogger("/home/sunny/model.csv", append=True)
lr_reduce_max = callbacks.ReduceLROnPlateau(monitor="val_accuracy",
factor=0.5, patience=3, verbose=1, mode="max",
                                         min_delta=0.05, min_lr=0.0001)
model.fit(X_train, Y_train, batch_size=256, epochs=10, validation_data=(X_test,Y_test),
callbacks=[lr_reduce_max, csv_logger])
Below, we have loaded the CSV file and printed the results logged by the callback.
import pandas as pd
pd.read_csv("/home/sunny/model.csv")
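Since the log is a regular CSV file, we can also plot the curves from it. Below is a small illustrative snippet we are adding (assuming the file was created by the run above); the 'loss' and 'val_loss' columns come from training with validation data.

import pandas as pd
import matplotlib.pyplot as plt

# Plot the train/validation loss recorded by CSVLogger for each epoch.
history_df = pd.read_csv("/home/sunny/model.csv")
history_df[["loss", "val_loss"]].plot(figsize=(8,4))
plt.xlabel("Epoch")
plt.ylabel("Loss");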
In this section, we have explained another callback that logs training details which can later be used by TensorBoard. TensorBoard is a tool created by the TensorFlow team that lets us visualize and analyze various training metrics, which can give us meaningful insights.
We have initialized our neural network and compiled it as usual.
from tensorflow.keras.optimizers import SGD
model = create_model()
model.compile(optimizer=SGD(learning_rate=0.001), loss="sparse_categorical_crossentropy", metrics=["accuracy"])
Below, we are training our neural network for 10 epochs with a callback that logs details for TensorBoard. We can create the callback using the TensorBoard constructor. We need to provide a path where it'll log information. The histogram_freq parameter, if set, computes activation and weight histograms every that many epochs. The update_freq parameter accepts either a string ('epoch' or 'batch') or an integer value and logs losses/metrics after each epoch/batch accordingly; if we provide an integer value, it logs losses/metrics after that many batches (see the small sketch after the TensorBoard cells below). There is also a parameter named write_images which, if set to True, logs model weights that can be visualized as images.
from tensorflow.keras import callbacks
tensorboard_logs = callbacks.TensorBoard("/home/sunny/logs", histogram_freq=1, write_graph=True,
update_freq="epoch")
model.fit(X_train, Y_train, batch_size=256, epochs=10, validation_data=(X_test,Y_test),
callbacks=[tensorboard_logs])
%ls /home/sunny/logs
Below, we have first loaded tensorboard as an external extension in the jupyter notebook. Then, we have started it by pointing it to the directory where the logs are stored.
The %ls, %load_ext and %tensorboard commands are jupyter notebook magic commands. Jupyter notebook has many other magic commands that can help developers.
%load_ext tensorboard
%tensorboard --logdir=/home/sunny/logs
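For illustration, update_freq can also be an integer, as mentioned above. The sketch below (hypothetical log directory, not executed in this tutorial) logs losses/metrics every 100 batches and additionally logs weight images.

# Hypothetical example: log losses/metrics every 100 batches and write model weights as images.
tensorboard_batches = callbacks.TensorBoard("/home/sunny/logs_batches", update_freq=100, write_images=True)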
In this section, we have explained how we can create our own custom callback if none of the existing callbacks available from Keras satisfies our requirements.
Below, we have initialized our model and compiled it as usual.
from tensorflow.keras.optimizers import SGD
model = create_model()
model.compile(optimizer=SGD(learning_rate=0.001), loss="sparse_categorical_crossentropy", metrics=["accuracy"])
We can create a callback by extending the Callback class available from Keras, implementing methods according to our needs. Below, we have listed all the methods that can be overridden. The methods with 'train' in their name are executed during training (fit() call), those with 'test' during evaluation (evaluate() call), and those with 'predict' during prediction (predict() call).
We can perform operations before training starts, after training ends, before and after each epoch, and before and after each batch. The methods on_epoch_begin() and on_epoch_end() let us execute particular steps before and after each epoch during training, and there are separate methods for batches. Each method has a parameter named logs that holds a dictionary of the loss and metric values computed so far.
In our case, we have implemented the on_train_end() method to save the model after training completes. The implementation of the on_epoch_begin() method simply prints the learning rate that will be used for that epoch. The on_epoch_end() method halves the learning rate after the epoch completes and saves the model as well.
Please make a NOTE that we can access the model object through self inside the callback, as we have done below. This lets us perform many tasks that require access to the model; for example, we can update/normalize weights from a callback (we have included a small sketch of this idea at the end of this section).
from tensorflow.keras.callbacks import Callback

class CustomCallback(Callback):
    def on_train_begin(self, logs=None):
        print("Training Started")

    def on_train_end(self, logs=None):
        # Save the final model once training completes.
        self.model.save("/home/sunny/convnet.hdf5")

    def on_test_begin(self, logs=None):
        pass

    def on_test_end(self, logs=None):
        pass

    def on_predict_begin(self, logs=None):
        pass

    def on_predict_end(self, logs=None):
        pass

    def on_epoch_begin(self, epoch, logs=None):
        # The model (and hence its optimizer) is accessible through 'self.model'.
        current_lr = tensorflow.keras.backend.get_value(self.model.optimizer.learning_rate)
        print("Epoch Learning Rate : {}".format(current_lr))

    def on_epoch_end(self, epoch, logs=None):
        # 'logs' holds the loss/metric values so far, e.g. logs.get("val_accuracy").
        # Halve the learning rate and save a checkpoint for this epoch.
        current_lr = tensorflow.keras.backend.get_value(self.model.optimizer.learning_rate)
        tensorflow.keras.backend.set_value(self.model.optimizer.learning_rate, current_lr / 2)
        self.model.save("/home/sunny/convnet/model-{}.hdf5".format(epoch+1))

    def on_train_batch_begin(self, batch, logs=None):
        pass

    def on_train_batch_end(self, batch, logs=None):
        pass

    def on_test_batch_begin(self, batch, logs=None):
        pass

    def on_test_batch_end(self, batch, logs=None):
        pass

    def on_predict_batch_begin(self, batch, logs=None):
        pass

    def on_predict_batch_end(self, batch, logs=None):
        pass
Below, we are training our model for 10 epochs with our custom callback. We can notice from the results that the learning rate is printed at the beginning of each epoch. Later on, we have also listed the directory contents, which show that a model is saved after each epoch as well as after training completes.
from tensorflow.keras import callbacks
custom_callback = CustomCallback()
model.fit(X_train, Y_train, batch_size=256, epochs=10, validation_data=(X_test,Y_test), callbacks=[custom_callback])
%ls /home/sunny/convnet
model.evaluate(X_test,Y_test)
Below, we have loaded our model saved after training completion and evaluated the test set using it.
from tensorflow.keras.models import load_model
print("Loading Model : convnet.hdf5")
loaded_model = load_model("/home/sunny/convnet.hdf5")
loaded_model.evaluate(X_test, Y_test)
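To illustrate the earlier note about accessing the model from inside a callback, below is a minimal hypothetical sketch of a callback that normalizes (here, clips) the model weights after every training batch. The class name and clipping range are our own choices for illustration, not part of the tutorial's trained models.

import tensorflow as tf
from tensorflow.keras.callbacks import Callback

class WeightClipper(Callback):
    # Hypothetical callback: clip every trainable weight into [-1, 1] after each training batch.
    def on_train_batch_end(self, batch, logs=None):
        for var in self.model.trainable_weights:
            var.assign(tf.clip_by_value(var, -1.0, 1.0))

# Usage: model.fit(..., callbacks=[WeightClipper()])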
In this section, we have listed other callbacks available from Keras that we have not covered in this tutorial as they are mostly self-explanatory. We have included a small explanation of each below.
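- LearningRateScheduler - accepts a scheduling function that takes the epoch index and returns the learning rate to use for that epoch.
- LambdaCallback - lets us build a simple callback from plain functions (for on_epoch_begin, on_batch_end, etc.) without subclassing Callback.
- TerminateOnNaN - stops training as soon as the loss becomes NaN.
- ProgbarLogger - prints losses/metrics to standard output as a progress bar.
- RemoteMonitor - streams training events to a remote server.
- History - records losses/metrics into the History object returned by fit(); it is applied to every model automatically.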
This ends our small tutorial explaining how we can use callbacks available from keras and create custom callbacks if needed. Please feel free to let us know your views in the comments section.
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.