When designing a deep learning model, we need to make many decisions, and we don't know the answer to many of them upfront.
Some common defaults, like the relu activation and the Adam optimizer, often give good results, but this is not always true. No choice is 100% right for every problem. These choices are commonly referred to as hyperparameters. One solution is to try all possible combinations of hyperparameters and see which one works best (grid search). Though this seems viable, in reality deep learning models use a lot of data and can take a long time to train, so grid searching through all possible combinations might not be feasible. Other algorithms, like random search, Hyperband, and Bayesian optimization, reduce this cost, and we cover them in this tutorial.
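To get a feel for why trying every combination becomes expensive, the sketch below counts the combinations in a small search space. The search space and the two-minutes-per-trial cost are assumptions chosen only for illustration, not measurements.

```python
from itertools import product

# Hypothetical search space for a small network with two hidden layers.
search_space = {
    "units_l1": [16, 32, 48],
    "bias_l1": [True, False],
    "act_l1": ["relu", "tanh"],
    "units_l2": [16, 32, 48],
    "bias_l2": [True, False],
    "act_l2": ["relu", "tanh"],
    "optimizer": ["sgd", "rmsprop", "adam"],
}

# Each grid-search trial is one element of the Cartesian product of all value lists.
all_combinations = list(product(*search_space.values()))
num_combinations = len(all_combinations)

# Assumed cost of a single trial, purely for illustration.
minutes_per_trial = 2
total_hours = num_combinations * minutes_per_trial / 60

print(num_combinations, "combinations, about", total_hours, "hours")
```

Even this tiny space needs hundreds of trials, and every additional hyperparameter multiplies the count.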
As a part of this tutorial, we'll explain how to use the Keras Tuner library to optimize the hyperparameters of networks designed with the Python deep learning library Keras. Keras Tuner provides implementations of algorithms like random search, Hyperband, and Bayesian optimization for hyperparameter tuning. These algorithms find good hyperparameter settings in fewer trials, without trying all possible combinations. Hyperband and Bayesian optimization also use the results of earlier trials to decide where to spend the remaining budget. We give a step-by-step guide to hyperparameter optimization with simple examples using Keras Tuner.
pip install -U keras_tuner
Below, we have imported the necessary libraries and printed the versions that we have used in our tutorial.
import keras_tuner
print("Keras Tuner Version : {}".format(keras_tuner.__version__))
from tensorflow import keras
print("Keras Version : {}".format(keras.__version__))
In our first example, we'll explain how we can use a keras tuner for regression tasks. We have loaded the Boston housing dataset available from datasets module of keras below.
from tensorflow.keras import datasets
(X_train_reg, Y_train_reg), (X_test_reg, Y_test_reg) = datasets.boston_housing.load_data()
X_train_reg.shape, X_test_reg.shape, Y_train_reg.shape, Y_test_reg.shape
In order to use Keras Tuner, we need to design a function that takes a single parameter as input and returns a compiled Keras model. The single input parameter is an instance of HyperParameters that holds the values of the various hyperparameters that we want to tune. The HyperParameters instance has methods for trying different values of a particular type of hyperparameter, such as a boolean, an integer, or one of a list of strings.
In our case below, we have created a neural network of 3 dense layers. For the first two dense layers, we want to try different values of the units, use_bias, and activation hyperparameters. The last dense layer has one output unit, which is the prediction of our network. We also want to try different optimizers to select the one that gives the best results. To try different values of hyperparameters, we have used the Int(), Boolean(), and Choice() methods of the HyperParameters class.
We have asked the tuner to try different numbers of units for the dense layers using the Int() method. We have asked it to try values in the range [16,50] with an increment of 16, hence it'll try the values 16, 32, and 48. We have given different names to the hyperparameters of the two layers. For the use_bias hyperparameter, we have used the Boolean() method to try both boolean values. For the activation hyperparameter, we have used the Choice() method to select the 'relu' or 'tanh' activation function. We have also used the Choice() method to try different optimizers ('sgd', 'rmsprop', and 'adam').
After creating a model with hyperparameters, we have compiled it and returned it from the function. This function will be used by the hyperparameter optimization algorithm. The algorithm provides one set of hyperparameter values to this function to create a model, then trains the model and records its performance. The algorithm also keeps track of the performance of the different hyperparameter settings so that it can select the best one.
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import SGD
def build_model(hyperparams):
    model = Sequential()
    model.add(layers.Input(shape=(X_train_reg.shape[1],)))
    model.add(layers.Dense(units=hyperparams.Int("units_l1", 16, 50, step=16),
                           use_bias=hyperparams.Boolean("bias_l1"),
                           activation=hyperparams.Choice("act_l1", ["relu", "tanh"])
                          ))
    model.add(layers.Dense(units=hyperparams.Int("units_l2", 16, 50, step=16),
                           use_bias=hyperparams.Boolean("bias_l2"),
                           activation=hyperparams.Choice("act_l2", ["relu", "tanh"])
                          ))
    model.add(layers.Dense(1))
    optim=hyperparams.Choice("optimizer",["sgd","rmsprop","adam"])
    model.compile(optim, loss="mean_squared_error", metrics=["mean_squared_error"])
    return model
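As a quick check on the integer ranges used above, the following plain-Python sketch lists the candidate values implied by a minimum, a maximum, and a step. The helper int_candidates() is hypothetical and is not part of Keras Tuner; it only assumes, as described above, that the maximum is included when a step lands exactly on it.

```python
def int_candidates(min_value, max_value, step):
    """Return min_value, min_value + step, ... for all values not exceeding max_value."""
    return list(range(min_value, max_value + 1, step))

# Same bounds as the "units_l1" and "units_l2" hyperparameters.
units_candidates = int_candidates(16, 50, 16)
print(units_candidates)
```

With these bounds, 64 would be the next value, but it exceeds the maximum of 50, so only three candidates remain.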
Below, we have performed hyperparameter tuning using the random search algorithm. We can create an instance of the random search algorithm using the RandomSearch() constructor available from Keras Tuner. The constructor takes the model-building function (hypermodel), the metric to optimize (objective), and the number of hyperparameter combinations to try (max_trials), among other parameters.
In our case below, we have created a RandomSearch instance by giving it the function we designed earlier. We have asked it to minimize the validation mean squared error and to try 5 different combinations of hyperparameters. It'll call the function 5 times with different hyperparameter settings and create 5 different models to try.
After creating the RandomSearch tuner, we need to call the search() method on it to actually try the different hyperparameter settings. The search() method accepts the same parameters as the fit() method of a model instance. We have given it the train data and the validation data. The call to search() tries 5 different hyperparameter settings by creating 5 different models from them. It trains each model for 10 epochs and records all metrics. It then ranks the models from lowest to highest validation mean squared error.
The tuner prints the best validation mean squared error of 63.61 at the end of the tuning process.
from keras_tuner import RandomSearch
from keras_tuner import Objective
tuner1 = RandomSearch(hypermodel=build_model,
objective="val_mean_squared_error",
#objective=Objective(name="val_mean_squared_error",direction="min"),
max_trials=5,
#seed=123,
project_name="Regression",
overwrite=True
)
tuner1.search(X_train_reg, Y_train_reg, batch_size=32, epochs=10, validation_data=(X_test_reg, Y_test_reg))
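Conceptually, the random search above samples a few configurations, scores each one, and ranks them. The sketch below imitates that loop in plain Python. It is a simplified illustration and not Keras Tuner's implementation: fake_validation_mse() is a made-up stand-in for training a model, and this version does not avoid sampling the same configuration twice.

```python
import random

# Same hyperparameter names as the regression example; values are sampled uniformly.
search_space = {
    "units_l1": [16, 32, 48],
    "bias_l1": [True, False],
    "act_l1": ["relu", "tanh"],
    "units_l2": [16, 32, 48],
    "bias_l2": [True, False],
    "act_l2": ["relu", "tanh"],
    "optimizer": ["sgd", "rmsprop", "adam"],
}

def sample_config(rng):
    """Pick one value at random for every hyperparameter."""
    return {name: rng.choice(values) for name, values in search_space.items()}

def fake_validation_mse(config, rng):
    """Stand-in for training and evaluating a model (returns a random score)."""
    return rng.uniform(50.0, 150.0)

def random_search(max_trials, seed=123):
    rng = random.Random(seed)
    trials = []
    for _ in range(max_trials):
        config = sample_config(rng)
        trials.append((fake_validation_mse(config, rng), config))
    # Lower validation error is better, so sort in ascending order.
    trials.sort(key=lambda trial: trial[0])
    return trials

trials = random_search(max_trials=5)
best_score, best_config = trials[0]
print(best_score, best_config)
```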
After the hyperparameter tuning process has completed, we can call the get_best_hyperparameters() method on the RandomSearch tuner instance. Below, we have printed the best hyperparameter combination.
best_params = tuner1.get_best_hyperparameters()
best_params[0].values
We can also retrieve a model instance that gave the best results. We can use the same model for making predictions as well. It'll be loaded with trained parameters. We can save the best model for later use as well.
best_model = tuner1.get_best_models()[0]
best_model.summary()
Y_test_reg_preds = best_model.predict(X_test_reg)
Y_test_reg[:5], Y_test_reg_preds[:5]
We can print the results of trials by calling results_summary() function. It'll print results in order of best to worst performing models. We can provide num_trials parameter to it specifying to print only that many best entries.
In our case, we have asked to print 3 best-performing models. We can notice that the first best performing network has a validation mean squared error of around 63.6.
tuner1.results_summary(num_trials=3)
As a part of our second example, we have explained how we can use a random search tuner for classification tasks. We have loaded the Fashion MNIST dataset below for our task. The dataset has grayscale images of shape (28,28) pixels for 10 different fashion items. The dataset is already divided into the train (60k images) and test (10k images) sets. We'll be trying various convolutional neural networks on this dataset to check which one is giving the best results.
import numpy as np
from tensorflow.keras import datasets
(X_train_classif, Y_train_classif), (X_test_classif, Y_test_classif) = datasets.fashion_mnist.load_data()
X_train_classif, X_test_classif = X_train_classif.reshape(-1,28,28,1), X_test_classif.reshape(-1,28,28,1)
classes = np.unique(Y_train_classif)
X_train_classif.shape, X_test_classif.shape, Y_train_classif.shape, Y_test_classif.shape
In the below cell, we have created a new class that extends the HyperModel class. The class has a build() method that takes a HyperParameters instance as input and returns a compiled Keras model. It sets various hyperparameters using methods of the HyperParameters instance. We'll be giving an instance of this class to the RandomSearch() constructor later.
The build() method first defines a hyperparameter ('ConvNetType') that is a choice between two values ('Conv1' and 'Conv2'). Based on the value of this hyperparameter, we add two ('Conv1') or three ('Conv2') convolution layers to the network. For the convolution layers, we try different numbers of output channels using the Int() method. For the 2-layer option ('Conv1'), both layers try 16 and 32 output channels. For the 3-layer option ('Conv2'), the first two convolution layers try 16 and 32 output channels and the third convolution layer tries 8 and 16 output channels.
Apart from this, we have also asked the tuner to try different values of activation ('relu' and 'tanh') and kernel initialization ('random_normal', 'lecun_normal' and 'he_normal') using the Choice() method.
Once the convolution layers are added to the network based on the 'ConvNetType' hyperparameter value, we add a dense layer that has 10 output units (the same as the number of target classes) and a softmax activation function.
Then, we have compiled the network and returned it.
This time, we have used another method of the HyperParameters class named conditional_scope(). This method creates a scope that is active only when a particular hyperparameter takes one of the specified values. In our case, we have used it for the 'ConvNetType' values 'Conv1' and 'Conv2'. This is useful when a hyperparameter has many values and we want a scope that covers only some of them. As a simple example, let's say that 'ConvNetType' had 5 values ('Conv1', 'Conv2', 'Conv3', 'Conv4' and 'Conv5'). We could then create one scope for 3 of the values ('Conv1', 'Conv3' and 'Conv5') and a separate scope for the other 2 values ('Conv2' and 'Conv4').
from keras_tuner import HyperModel
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
class ConvNetwork(HyperModel):
    def build(self, hp):
        model = Sequential()
        model.add(layers.Input(shape=X_train_classif.shape[1:]))
        model_type = hp.Choice("ConvNetType", ["Conv1","Conv2"])
        if model_type == "Conv1":
            with hp.conditional_scope("ConvNetType", ["Conv1"]):
                activation = hp.Choice("activation", ["relu", "tanh"])
                kern_init = hp.Choice("kernel_initializer", ["random_normal", "lecun_normal","he_normal"])
                model.add(layers.Conv2D(filters=hp.Int("Conv1_1", 16, 33, step=16), kernel_size=(3,3), padding="same", kernel_initializer=kern_init, activation=activation))
                model.add(layers.Conv2D(filters=hp.Int("Conv1_2", 16, 33, step=16), kernel_size=(3,3), padding="same", kernel_initializer=kern_init, activation=activation))
        elif model_type == "Conv2":
            with hp.conditional_scope("ConvNetType", ["Conv2"]):
                activation = hp.Choice("activation", ["relu", "tanh"])
                kern_init = hp.Choice("kernel_initializer", ["random_normal", "lecun_normal","he_normal"])
                model.add(layers.Conv2D(filters=hp.Int("Conv2_1", 16, 33, step=16), kernel_size=(3,3), padding="same", kernel_initializer=kern_init, activation=activation))
                model.add(layers.Conv2D(filters=hp.Int("Conv2_2", 16, 33, step=16), kernel_size=(3,3), padding="same", kernel_initializer=kern_init, activation=activation))
                model.add(layers.Conv2D(filters=hp.Int("Conv2_3", 8, 17, step=8), kernel_size=(3,3), padding="same", kernel_initializer=kern_init, activation=activation))
        model.add(layers.Flatten())
        model.add(layers.Dense(units=len(classes), activation="softmax"))
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
        return model
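The idea behind the conditional scopes above is that some hyperparameters only matter for one value of 'ConvNetType'. The plain-Python sketch below lists which hyperparameter names are relevant for each branch of the build() method. It is a hypothetical stand-in for illustration and does not call the Keras Tuner API.

```python
# Hyperparameters defined in both branches of build().
SHARED = ["activation", "kernel_initializer"]

# Hyperparameters that exist only when "ConvNetType" has the given value.
BRANCH_SPECIFIC = {
    "Conv1": ["Conv1_1", "Conv1_2"],
    "Conv2": ["Conv2_1", "Conv2_2", "Conv2_3"],
}

def active_hyperparameters(conv_net_type):
    """Names of the hyperparameters that influence the model for this branch."""
    return ["ConvNetType"] + SHARED + BRANCH_SPECIFIC[conv_net_type]

for conv_net_type in BRANCH_SPECIFIC:
    print(conv_net_type, active_hyperparameters(conv_net_type))
```

Knowing that 'Conv2_3' is irrelevant when 'ConvNetType' is 'Conv1' lets a tuner avoid treating two settings as different when they build the same network.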
In the below cell, we have created a random search tuner and executed it for 5 trials. We have given our HyperModel instance to it and have asked it to maximize validation accuracy using an Objective instance.
We have executed the tuning process by calling the search() method, giving it the train data, validation data, batch size (512), and number of epochs (10).
The tuner prints the best validation accuracy of 0.903 at the end of the tuning process.
from keras_tuner import RandomSearch
from keras_tuner import Objective
conv2 = ConvNetwork()
tuner2 = RandomSearch(hypermodel=conv2,
objective=Objective(name="val_accuracy",direction="max"),
max_trials=5,
#seed=123,
project_name="Classification",
overwrite=True
)
tuner2.search(X_train_classif, Y_train_classif, batch_size=512, epochs=10, validation_data=(X_test_classif, Y_test_classif))
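The direction given to the objective decides whether a lower or a higher metric value counts as better. Here is a minimal plain-Python sketch of that selection logic; pick_best() and the trial scores are hypothetical and only illustrate the idea.

```python
def pick_best(trial_scores, direction):
    """Return the trial id with the lowest ("min") or highest ("max") score."""
    if direction not in ("min", "max"):
        raise ValueError("direction must be 'min' or 'max'")
    chooser = min if direction == "min" else max
    return chooser(trial_scores, key=trial_scores.get)

# Made-up validation accuracies for three trials.
accuracy_by_trial = {"trial_0": 0.87, "trial_1": 0.90, "trial_2": 0.85}

print(pick_best(accuracy_by_trial, "max"))  # accuracy: higher is better
print(pick_best(accuracy_by_trial, "min"))  # what an error metric would select
```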
In the below cell, we have printed the best hyperparameters settings that gave 0.903 accuracy.
In the next cells, we have retrieved the best model and used it to evaluate performance on the test dataset which we had used as a validation dataset. Then, we have printed the tuning summary as well.
best_params = tuner2.get_best_hyperparameters()
best_params[0].values
best_model = tuner2.get_best_models()[0]
best_model.summary()
best_model.evaluate(X_test_classif, Y_test_classif)
tuner2.results_summary(num_trials=3)
In this section, we have explained how we can override the compile arguments (optimizer, loss function, and metrics) that we set when compiling the model inside the model creation function.
The RandomSearch tuner accepts optimizer, loss, and metrics arguments that override whatever we provided when compiling the model. Below, we have overridden the optimizer from Adam to RMSProp. We have passed loss and metrics as well, but with the same values as before.
from keras_tuner import RandomSearch
from keras_tuner import Objective
from tensorflow.keras import metrics
conv3 = ConvNetwork()
tuner3 = RandomSearch(hypermodel=conv3,
objective=Objective(name="val_accuracy",direction="max"),
max_trials=5,
optimizer="rmsprop",
loss="sparse_categorical_crossentropy",
#metrics=["accuracy", metrics.AUC(name="area_under_curve")],
metrics=["accuracy"],
#seed=123,
project_name="OverrideCompileArgs",
overwrite=True
)
tuner3.search(X_train_classif, Y_train_classif, batch_size=512, epochs=10, validation_data=(X_test_classif, Y_test_classif))
best_params = tuner3.get_best_hyperparameters()
best_params[0].values
best_model = tuner3.get_best_models()[0]
best_model.summary()
best_model.evaluate(X_test_classif, Y_test_classif)
tuner3.results_summary(num_trials=3)
As a part of this example, we have explained how we can override the existing setting of any hyperparameter by providing our own HyperParameters instance to the RandomSearch() constructor.
There can be situations when we want to override just a few hyperparameters of the model without modifying the settings inside the model-building function. In those situations, we can define our own HyperParameters instance and set the hyperparameters that we want to modify on it. Then, we provide this instance to the hyperparameters argument of the RandomSearch() constructor. The hyperparameters defined inside the function are then overridden by those we provided through the HyperParameters instance.
In our example below, we have again used the function from the regression section. We have overridden the values tried for the activation functions of both dense layers. By default, the function tries relu and tanh. We have defined a HyperParameters instance that replaces those values with the selu and elu activations and provided it to the RandomSearch tuner. We have then performed hyperparameter tuning by calling the search() method on the tuner.
In the next few cells, we have printed the best hyperparameters, best model, and tuning results for verification purposes. We can notice from the results that the tuner now tries the activation functions selu and elu instead of relu and tanh. This confirms that our settings are working as expected.
from keras_tuner import RandomSearch
from keras_tuner import Objective
from keras_tuner import HyperParameters
hp = HyperParameters()
hp.Choice("act_l1",["selu","elu"])
hp.Choice("act_l2",["selu","elu"])
#conv4 = ConvNetwork()
tuner4 = RandomSearch(hypermodel=build_model,
objective=Objective(name="val_mean_squared_error",direction="min"),
max_trials=5,
hyperparameters=hp,
#seed=123
project_name="OverrideExistingHyperparameters",
overwrite=True
)
tuner4.search(X_train_reg, Y_train_reg, batch_size=512, epochs=10, validation_data=(X_test_reg, Y_test_reg))
best_params = tuner4.get_best_hyperparameters()
best_params[0].values
best_model = tuner4.get_best_models()[0]
best_model.summary()
best_model.evaluate(X_train_reg, Y_train_reg)
tuner4.results_summary(num_trials=3)
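One way to picture this override is as a dictionary merge in which user-supplied entries win over the defaults with the same name. The sketch below uses plain Python dictionaries as an assumed mental model; it is not how Keras Tuner stores hyperparameters internally.

```python
# Candidate values as defined inside the model-building function.
default_space = {
    "units_l1": [16, 32, 48],
    "act_l1": ["relu", "tanh"],
    "units_l2": [16, 32, 48],
    "act_l2": ["relu", "tanh"],
    "optimizer": ["sgd", "rmsprop", "adam"],
}

# Entries supplied from outside, keyed by the same hyperparameter names.
overrides = {
    "act_l1": ["selu", "elu"],
    "act_l2": ["selu", "elu"],
}

# Later entries win, so the overrides replace the defaults with matching names.
effective_space = {**default_space, **overrides}
print(effective_space)
```

The names must match exactly: an override registered under a different name would simply add a new, unused entry.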
In this example, we have explained how we can fix the values of some of the hyperparameters that we are tuning. We can do this by defining our own HyperParameters instance and calling the Fixed() method to fix the values of hyperparameters. Then, we need to give this HyperParameters instance to the RandomSearch() constructor.
Below, we are again using the model-building function from the regression section. We have fixed a few hyperparameters that we don't want to tune. We have set the activation of the first dense layer to relu, the number of units of the first dense layer to 32, and the optimizer to adam. These 3 hyperparameters inside the build function won't be tuned, and the fixed values will be used instead. All other hyperparameters defined inside the function will still be tuned.
After fixing the hyperparameters, we have created a RandomSearch tuner with the HyperParameters instance and called the search() method on it to perform hyperparameter tuning.
In the next few cells after tuning, we have also printed the best hyperparameters found by the tuner, the best model, and the tuning summary.
We can notice from the results that the first dense layer uses 32 units and the relu activation, and that the adam optimizer is used for optimization. This confirms that our settings are working as expected.
from keras_tuner import RandomSearch
from keras_tuner import Objective
from keras_tuner import HyperParameters
hp = HyperParameters()
hp.Fixed("act_l1","relu")
hp.Fixed("units_l1", 32)
hp.Fixed("optimizer", "adam")
#hp.Fixed("kernel_initializer", "he_normal")
#conv5 = ConvNetwork()
tuner5 = RandomSearch(hypermodel=build_model,
objective=Objective(name="val_mean_squared_error",direction="min"),
max_trials=5,
hyperparameters=hp,
#seed=123
project_name="FixHyperparameters",
overwrite=True
)
tuner5.search(X_train_reg, Y_train_reg, batch_size=512, epochs=10, validation_data=(X_test_reg, Y_test_reg))
best_params = tuner5.get_best_hyperparameters()
best_params[0].values
best_model = tuner5.get_best_models()[0]
best_model.summary()
best_model.evaluate(X_train_reg, Y_train_reg)
tuner5.results_summary(num_trials=3)
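Fixing hyperparameters also shrinks the search space. The sketch below counts the combinations before and after fixing the three values used above. It is plain Python for illustration, and count_combinations() is a hypothetical helper rather than a Keras Tuner function.

```python
from math import prod

# Candidate values from the regression model-building function.
search_space = {
    "units_l1": [16, 32, 48],
    "bias_l1": [True, False],
    "act_l1": ["relu", "tanh"],
    "units_l2": [16, 32, 48],
    "bias_l2": [True, False],
    "act_l2": ["relu", "tanh"],
    "optimizer": ["sgd", "rmsprop", "adam"],
}

# Hyperparameters pinned to a single value.
fixed = {"act_l1": "relu", "units_l1": 32, "optimizer": "adam"}

def count_combinations(space, fixed_values):
    """A fixed hyperparameter contributes one choice; others contribute all their values."""
    return prod(1 if name in fixed_values else len(values)
                for name, values in space.items())

full_count = count_combinations(search_space, {})
reduced_count = count_combinations(search_space, fixed)
print(full_count, reduced_count)
```

With fewer combinations left, a small number of random trials covers a much larger share of the remaining space.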
In this section, we have performed hyperparameter optimization using the Hyperband algorithm. It is a variation of random search that uses an explore-exploit strategy to find good hyperparameter settings. It speeds up random search through adaptive resource allocation and early stopping: it allocates resources like iterations, data samples, and features to randomly chosen hyperparameter settings and treats the search as a stochastic bandit problem in which it keeps eliminating underperforming settings. Keras Tuner provides an implementation of the Hyperband tuner through the Hyperband() constructor. Most of its parameters are the same as those of random search, with a few additional ones such as max_epochs, factor, and hyperband_iterations. One iteration of the Hyperband algorithm runs approximately max_epochs * (math.log(max_epochs, factor) ** 2) cumulative epochs across all trials.
In our case below, we have used the Hyperband tuner for our classification task involving a CNN. We have initialized it with the convolutional neural network and with hyperband_iterations set to 1. We have asked it to maximize validation accuracy.
After initializing the tuner, we have called search() method as usual to perform the tuning process. We have printed the best hyperparameters settings as well as the best model after completion of the tuning process. We have also printed the tuning process summary. We have got the best accuracy of 0.896. We can set hyperband_iterations to greater than 1 and it might improve results further.
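To see what the epoch budget formula means in practice, the sketch below evaluates max_epochs * (math.log(max_epochs, factor) ** 2) for one setting. The values max_epochs=100 and factor=3 are illustrative assumptions; plug in whatever values your tuner uses.

```python
import math

def approx_epochs_per_iteration(max_epochs, factor):
    """Approximate cumulative epochs across all trials for one Hyperband iteration."""
    return max_epochs * (math.log(max_epochs, factor) ** 2)

# Illustrative settings, assumed for this example.
budget = approx_epochs_per_iteration(max_epochs=100, factor=3)

# Running several iterations multiplies the budget accordingly.
hyperband_iterations = 2
total_budget = hyperband_iterations * budget

print(round(budget), round(total_budget))
```

This is why raising hyperband_iterations can improve results but also costs proportionally more training time.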
from keras_tuner import Hyperband
from keras_tuner import Objective
conv6 = ConvNetwork()
tuner6 = Hyperband(hypermodel=conv6,
objective=Objective(name="val_accuracy",direction="max"),
hyperband_iterations=1,
#seed=123
project_name="Hyperband",
overwrite=True
)
tuner6.search(X_train_classif, Y_train_classif, batch_size=512, epochs=10, validation_data=(X_test_classif, Y_test_classif))
best_params = tuner6.get_best_hyperparameters()
best_params[0].values
best_model = tuner6.get_best_models()[0]
best_model.summary()
best_model.evaluate(X_test_classif, Y_test_classif)
tuner6.results_summary(num_trials=3)
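The elimination of underperforming settings at the heart of Hyperband is often described as successive halving. The sketch below is a simplified plain-Python illustration of that idea under stated assumptions: each configuration has a made-up hidden quality instead of a real model, and the function is not Keras Tuner's implementation.

```python
import random

def successive_halving(num_configs, min_epochs, factor, seed=0):
    """Repeatedly keep the top 1/factor of the configurations and train survivors longer."""
    rng = random.Random(seed)
    # Hidden "true quality" per configuration; a real tuner would train a model instead.
    quality = {f"config_{i}": rng.random() for i in range(num_configs)}
    survivors = list(quality)
    epochs = min_epochs
    history = []  # (epochs per configuration, number of configurations) for each round
    while True:
        # Simulated validation score: true quality plus noise that shrinks with more epochs.
        scores = {name: quality[name] + rng.gauss(0, 0.1 / epochs) for name in survivors}
        history.append((epochs, len(survivors)))
        if len(survivors) == 1:
            break
        keep = max(1, len(survivors) // factor)
        survivors = sorted(survivors, key=scores.get, reverse=True)[:keep]
        epochs *= factor
    return survivors[0], history

winner, history = successive_halving(num_configs=9, min_epochs=1, factor=3)
print(winner, history)
```

Most of the budget goes to the few configurations that survive the early rounds, which is where the speed-up over plain random search comes from.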
In this example, we have explained the Bayesian optimization tuner available from Keras Tuner. Bayesian optimization builds a probabilistic model of how the objective depends on the hyperparameters and uses that model to decide which settings to try next. We can use it through the BayesianOptimization() constructor of Keras Tuner. It has almost the same parameters as the random search tuner, with a few additional ones such as num_initial_points and alpha, the latter defaulting to 1e-4.
Below, we have initialized the Bayesian optimization tuner and tried to find good hyperparameter settings for our classification network (CNN). As usual, we have performed the search by calling the search() method on the tuner object.
After the process completes, we have printed the best hyperparameter settings, the best model, and a summary of the settings that were tried.
from keras_tuner import BayesianOptimization
from keras_tuner import Objective
conv7 = ConvNetwork()
tuner7 = BayesianOptimization(hypermodel=conv7,
objective=Objective(name="val_accuracy",direction="max"),
max_trials=10,
num_initial_points=2,
#seed=123
project_name="BayesianOptimization",
overwrite=True
)
tuner7.search(X_train_classif, Y_train_classif, batch_size=512, epochs=10, validation_data=(X_test_classif, Y_test_classif))
best_params = tuner7.get_best_hyperparameters()
best_params[0].values
best_model = tuner7.get_best_models()[0]
best_model.summary()
best_model.evaluate(X_test_classif, Y_test_classif)
tuner7.results_summary(num_trials=3)
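Bayesian optimization typically scores candidate settings by combining the model's predicted value with its uncertainty. The sketch below shows one generic scoring rule, the upper confidence bound, in plain Python. The predicted means, standard deviations, and the beta value are made up for illustration, and this is not claimed to match Keras Tuner's exact implementation.

```python
def upper_confidence_bound(mean, std, beta):
    """Optimistic estimate for a metric we want to maximize."""
    return mean + beta * std

# Hypothetical predictions (mean accuracy, uncertainty) for three candidate settings.
candidates = {
    "A": (0.90, 0.01),  # well explored, high predicted accuracy
    "B": (0.88, 0.05),  # less explored, so more uncertain
    "C": (0.85, 0.02),
}

def choose_next(beta):
    scores = {name: upper_confidence_bound(mean, std, beta)
              for name, (mean, std) in candidates.items()}
    return max(scores, key=scores.get)

print(choose_next(beta=2.0))  # uncertainty is rewarded (exploration)
print(choose_next(beta=0.0))  # only the predicted mean counts (exploitation)
```

A larger beta pushes the search toward uncertain regions, while a smaller beta keeps it close to the best settings seen so far.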
This ends our small tutorial explaining how we can use various tuners available from keras tuner to find the best hyperparameters for the given model. We have explained all hyperparameters tuning algorithms available from keras tuner. Please feel free to let us know your views in the comments section.