Scikit-Learn is one of the most famous ML libraries out there, and it has been the preferred choice for classical machine learning for a long time. It provides implementations of the majority of ML algorithms for all kinds of problems (regression, classification, clustering, anomaly detection, dimensionality reduction, etc.). One of the main reasons it is so widely preferred is its easy-to-use API, which lets us train on our data and evaluate on test sets with just a few function calls. Despite its good API and broad algorithm coverage, it does not support deep neural networks (convolutional neural networks, recurrent neural networks, etc.), which are commonly used nowadays to solve complicated problems like image classification and speech recognition. Apart from this, Scikit-Learn also has no support for running code on GPUs.
The commonly preferred library for creating deep neural networks is PyTorch. It lets us create complicated networks like convolutional and recurrent neural networks, and it lets us run the code on a GPU. One drawback of PyTorch, however, is that it is a lower-level library that requires us to write our own training and evaluation code. Getting this code to work correctly takes time and can introduce subtle errors, which can make things hard for someone with a Scikit-Learn background who wants to use PyTorch to solve their problem.
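To give a sense of the boilerplate involved, below is a minimal sketch of a hand-written PyTorch training loop. The model, data, and hyperparameter choices here are purely illustrative stand-ins, not part of this tutorial's examples.
import torch
from torch import nn, optim
## Illustrative stand-ins for a real network and dataset
model = nn.Linear(13, 1)
X_train, Y_train = torch.randn(100, 13), torch.randn(100, 1)
loss_fn = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
for epoch in range(100):
    optimizer.zero_grad()                     ## reset gradients of previous step
    loss = loss_fn(model(X_train), Y_train)   ## forward pass + loss computation
    loss.backward()                           ## backpropagate gradients
    optimizer.step()                          ## update model weights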
To eliminate the drawbacks of both Scikit-Learn and PyTorch, a library named Skorch was created. It lets us use PyTorch to create complicated neural network models and then train and evaluate them through a Scikit-Learn-like API. This frees developers from the burden of writing training and evaluation code themselves, and it makes it easy for developers with good Scikit-Learn experience to use PyTorch to solve complicated problems with neural networks without worrying too much about that code.
As a part of this tutorial, we'll explain with simple examples how we can use Skorch to train and evaluate PyTorch models. We'll create simple neural networks and try them on small toy datasets to keep things easy to understand. Below we have listed the important sections of the tutorial to give an overview of what we'll cover.
Below we have imported the necessary libraries that we'll use in our tutorial and printed the version of each.
import skorch
print("Skorch Version : {}".format(skorch.__version__))
import torch
print("Pytorch Version : {}".format(torch.__version__))
import sklearn
print("Scikit-Learn Version : {}".format(sklearn.__version__))
In this section, we'll explain how we can use skorch for a regression problem. We'll design a simple PyTorch neural network and use it on the Boston housing dataset available from scikit-learn.
Below we have loaded the dataset. It has information about houses in Boston, like the number of rooms, the crime rate in the area, the tax rate, etc. The target variable is the median home value in thousands of dollars; as the target is continuous, this is a regression problem. (Please note that load_boston has been removed from recent versions of scikit-learn, so an older version is needed to run this example as-is.)
We have divided the dataset into train (80%) and test (20%) sets as well. After that, we converted the train and test sets into PyTorch tensors, as PyTorch models require tensor inputs.
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_boston(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)
## Convert dataset to torch tensors
from torch import tensor
X_train = tensor(X_train, dtype=torch.float32)
X_test = tensor(X_test, dtype=torch.float32)
Y_train = tensor(Y_train.reshape(-1,1), dtype=torch.float32)
Y_test = tensor(Y_test.reshape(-1,1), dtype=torch.float32)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
In this section, we have created a simple neural network that takes as input our dataset features and predicts the median house price.
The design of the neural network is simple. The first layer takes the 13 inputs (one for each feature of the data), the second layer has 26 neurons, the third layer has 52 neurons, and the final layer has just one neuron for predicting the house price. We have initialized all layers in the __init__() method of the class. The forward() method has the actual logic of a pass through the network: it applies the relu activation function after each intermediate layer and returns the output of the final layer.
Please make a NOTE that we have not explained neural network creation using PyTorch in detail, as we expect that the reader has a background in creating simple models with it.
from torch import nn
import torch.nn.functional as F
class Regressor(nn.Module):
    def __init__(self):
        super(Regressor, self).__init__()
        self.first_layer = nn.Linear(13, 26)
        self.second_layer = nn.Linear(26, 52)
        self.final_layer = nn.Linear(52, 1)

    def forward(self, x_batch):
        X = self.first_layer(x_batch)
        X = F.relu(X)
        X = self.second_layer(X)
        X = F.relu(X)
        return self.final_layer(X)
In this section, we wrap our PyTorch neural network into an ML model that behaves like a scikit-learn model and has a scikit-learn-like API (methods like fit(), predict(), etc.).
In order to make our PyTorch neural network behave like a scikit-learn model, we need to wrap it into skorch's NeuralNetRegressor. Below we have given an abridged view of its constructor for explanation purposes.
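The parameter names and defaults shown here are a sketch based on skorch's documented API; exact parameters and defaults may vary slightly between skorch versions.
## Abridged constructor of skorch's NeuralNetRegressor:
## NeuralNetRegressor(
##     module,                      ## PyTorch nn.Module subclass (or instance)
##     criterion=torch.nn.MSELoss,  ## loss function used for training
##     optimizer=torch.optim.SGD,   ## optimizer class
##     lr=0.01,                     ## learning rate
##     max_epochs=10,               ## epochs performed per fit() call
##     batch_size=128,              ## mini-batch size
##     train_split=...,             ## internal train/validation split (~80/20 by default)
##     callbacks=None,              ## extra callbacks (scoring, checkpoints, etc.)
##     device='cpu',                ## set to 'cuda' to run on GPU
##     verbose=1,                   ## per-epoch progress printing
## )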
Below we have created an instance of NeuralNetRegressor, giving it the neural network class we created in the earlier cell. We have specified the Adam optimizer and asked for the model to be trained for 500 epochs on each call to the fit() method. We have set the verbose parameter to 0 to silence the per-epoch updates; by default, the result of each epoch is printed, and with 500 epochs that would flood the output.
from skorch import NeuralNetRegressor
from torch import optim
skorch_regressor = NeuralNetRegressor(module=Regressor, optimizer=optim.Adam, max_epochs=500, verbose=0)
skorch_regressor
In this section, we have simply trained the model by calling the fit() method, giving it the train dataset features and the target variable.
skorch_regressor.fit(X_train, Y_train);
We can make predictions using the predict() method by giving it the dataset features.
Y_preds = skorch_regressor.predict(X_test)
Y_preds[:5]
In this section, we are evaluating model performance by calculating the mean squared error and R^2 score on both the train and test datasets. Based on these evaluation results, our model seems to perform reasonably well.
The score() method calculates the R^2 score for regression problems.
If you are interested in learning about model evaluation metrics using scikit-learn then please feel free to check our tutorial on the same which explains the topic with simple and easy-to-understand examples.
from sklearn.metrics import mean_squared_error
print("Train MSE : {}".format(mean_squared_error(Y_train, skorch_regressor.predict(X_train).reshape(-1))))
print("Test MSE : {}".format(mean_squared_error(Y_test, skorch_regressor.predict(X_test).reshape(-1))))
print("\nTrain R^2 : {}".format(skorch_regressor.score(X_train, Y_train)))
print("Test R^2 : {}".format(skorch_regressor.score(X_test, Y_test)))
We can access the training history using the history attribute of the NeuralNetRegressor instance. It has information about train loss, validation loss, epoch number, etc.
Below we have retrieved a few details from the training history and printed them.
skorch_regressor.history[-2:]
skorch_regressor.history[:, ("train_loss", "valid_loss")][-5:]
skorch_regressor.history[-1:, ("train_loss", "valid_loss")]
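The history also makes it easy to visualize how the losses evolved over training. Below is a small sketch that plots the loss curves, assuming matplotlib is installed:
import matplotlib.pyplot as plt

history = skorch_regressor.history
plt.plot(history[:, "epoch"], history[:, "train_loss"], label="train loss")
plt.plot(history[:, "epoch"], history[:, "valid_loss"], label="valid loss")
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.legend()
plt.show()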
In this section, we have explained how we can use PyTorch classification models like scikit-learn models. We’ll be designing a simple PyTorch classification neural network as a part of this example. We'll be using the wine dataset available from scikit-learn for our purpose.
In this section, we have loaded the wine dataset available from scikit-learn. The wine dataset has measurements of the ingredients used in the creation of three different types of wine. The ingredient measurements are the features of our dataset and the wine type is the target variable.
After loading, we have divided the dataset into train (80%) and test (20%) sets. We have then converted the train and test arrays into tensors, as PyTorch models require tensor inputs.
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_wine(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)
## Convert dataset to torch tensors
from torch import tensor
X_train = tensor(X_train, dtype=torch.float32)
X_test = tensor(X_test, dtype=torch.float32)
Y_train = tensor(Y_train, dtype=torch.long)
Y_test = tensor(Y_test, dtype=torch.long)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
In this section, we have designed a simple PyTorch model of a few layers. The first layer takes the 13 inputs (13 features), the second layer has 5 neurons, the third layer has 13 neurons, and the final layer has 3 neurons (one per wine type). We have initialized each layer inside the __init__() method of the model class. We have then included the logic that takes a dataset and makes predictions inside the forward() method. We have applied relu activation after each intermediate layer, followed by a small dropout (rate 0.15) before the final layer, and applied softmax activation at the final layer.
from torch import nn
import torch.nn.functional as F
class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.first_layer = nn.Linear(13, 5)
        self.second_layer = nn.Linear(5, 13)
        self.final_layer = nn.Linear(13, 3)

    def forward(self, x_batch):
        X = self.first_layer(x_batch)
        X = F.relu(X)
        X = self.second_layer(X)
        X = F.relu(X)
        ## apply dropout only while training, not during evaluation
        X = F.dropout(X, 0.15, training=self.training)
        X = self.final_layer(X)
        X = F.relu(X)
        return F.softmax(X, dim=1)
In order to make our PyTorch classification neural net behave like a scikit-learn model, we need to wrap it inside the NeuralNetClassifier class. It has almost the same signature as NeuralNetRegressor; below we have highlighted an abridged view of it.
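As with NeuralNetRegressor above, this is a sketch based on skorch's documented API, and defaults may vary slightly between versions.
## Abridged constructor of skorch's NeuralNetClassifier:
## NeuralNetClassifier(
##     module,                       ## PyTorch nn.Module subclass
##     criterion=torch.nn.NLLLoss,   ## default loss (expects log-probabilities)
##     optimizer=torch.optim.SGD,
##     lr=0.01,
##     max_epochs=10,
##     batch_size=128,
##     train_split=...,              ## internal train/validation split
##     verbose=1,
## )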
Below we have wrapped our PyTorch classifier inside NeuralNetClassifier. We have instructed it to use nn.CrossEntropyLoss as the loss function and optim.Adam as the optimizer. Through the train_split parameter, we have requested a stratified split of the dataset, since this is a classification dataset and we want the proportion of classes maintained across the splits.
from skorch import NeuralNetClassifier
from torch import optim
skorch_classifier = NeuralNetClassifier(module=Classifier,
                                        criterion=nn.CrossEntropyLoss,
                                        optimizer=optim.Adam,
                                        max_epochs=750,
                                        train_split=skorch.dataset.CVSplit(cv=5, stratified=True),
                                        verbose=0,
                                        )
skorch_classifier
Now, we have trained the model by calling the fit() method, giving it the train dataset and the target variable.
skorch_classifier.fit(X_train, Y_train);
Below we have made predictions on the test dataset using the predict() and predict_proba() methods. The predict() method returns the predicted class and predict_proba() returns the probability of each class.
Y_preds = skorch_classifier.predict(X_test)
Y_probs = skorch_classifier.predict_proba(X_test)
Y_preds[:5], Y_probs[:5]
In this section, we have printed the accuracy of the model on train and test datasets. The score() method will calculate accuracy by default.
print("Test Accuracy : {:.2f}".format(skorch_classifier.score(X_test, Y_test)))
print("Train Accuracy : {:.2f}".format(skorch_classifier.score(X_train, Y_train)))
Below we have printed a few entries of the history of the training process for analysis purposes.
skorch_classifier.history[-2:]
skorch_classifier.history[:, ("train_loss", "valid_loss")][-5:]
In this section, we'll explain how we can create a machine learning pipeline using scikit-learn by treating our PyTorch model as a sklearn estimator using skorch. We'll be creating a simple ML pipeline with only two steps. The first step of the pipeline will scale the data and the second step will apply our PyTorch model wrapped inside of skorch class. We'll be using the Boston housing dataset for this example hence our example will be solving the regression task.
If you are interested in learning about how to create a machine learning pipeline using scikit-learn then please feel free to check our tutorial on the same which tries to explain the topic with simple and easy-to-understand examples.
In this section, we have loaded the Boston housing dataset from sklearn, divided it into train/test sets, and cast the arrays to 32-bit floats. Note that this time we keep the data as NumPy arrays; skorch also accepts NumPy arrays directly and converts them to tensors internally. Apart from that, the code is almost exactly the same as the data-loading code of the first example.
### Load Dataset
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np
X, Y = datasets.load_boston(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)
X_train = X_train.astype(np.float32)
X_test = X_test.astype(np.float32)
Y_train = Y_train.reshape(-1,1).astype(np.float32)
Y_test = Y_test.reshape(-1,1).astype(np.float32)
In this section, we have created a simple PyTorch model and wrapped it into the NeuralNetRegressor class of skorch so that it can be used like a sklearn estimator. The code in this part is almost the same as the code from the regression section.
## Model Definition
from torch import nn
import torch.nn.functional as F
class Regressor(nn.Module):
    def __init__(self):
        super(Regressor, self).__init__()
        self.first_layer = nn.Linear(13, 26)
        self.second_layer = nn.Linear(26, 52)
        self.final_layer = nn.Linear(52, 1)

    def forward(self, x_batch):
        X = self.first_layer(x_batch)
        X = F.relu(X)
        X = self.second_layer(X)
        X = F.relu(X)
        return self.final_layer(X)
## Declare Model
from skorch import NeuralNetRegressor
from torch import optim
skorch_regressor = NeuralNetRegressor(module=Regressor, optimizer=optim.Adam, max_epochs=500, verbose=0)
skorch_regressor
In this section, we have created a machine learning pipeline using the Pipeline class of sklearn. Our ML pipeline consists of two steps: the first scales the data using RobustScaler and the second applies our skorch-wrapped PyTorch model.
After creating the pipeline, we have called the fit() method on it to train the pipeline using the train data.
If you want to learn about scaling the data for machine learning tasks then please feel free to check our tutorial on the same which covers the topic with simple and easy-to-understand examples.
## Create Pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
ml_pipeline = Pipeline([("Normalize", RobustScaler()), ("Model", skorch_regressor)])
ml_pipeline.fit(X_train, Y_train)
In this section, we have evaluated the performance of the pipeline by calculating MSE and R^2 scores on the train and test datasets. We can compare these results with the output of the regression section and notice that the metrics have improved significantly just by applying simple scaling to the data.
### Evaluate Model
from sklearn.metrics import mean_squared_error
print("Train MSE : {}".format(mean_squared_error(Y_train, ml_pipeline.predict(X_train).reshape(-1))))
print("Test MSE : {}".format(mean_squared_error(Y_test, ml_pipeline.predict(X_test).reshape(-1))))
print("\nTrain R^2 : {}".format(ml_pipeline.score(X_train, Y_train)))
print("Test R^2 : {}".format(ml_pipeline.score(X_test, Y_test)))
In this section, we'll explain how we can perform hyperparameters tuning by grid searching over different values of hyperparameters. We'll design a simple PyTorch neural network, wrap it inside a skorch class, and grid search over different hyperparameters of the model to find the settings that give the best results. We'll be using the Boston housing dataset from the previous section.
If you are interested in learning about hyperparameters grid search using scikit-learn then please feel free to check our tutorial on the same which covers the topic with simple and easy-to-understand examples.
In this section, we have created a simple PyTorch neural network for the regression task and wrapped it inside the NeuralNetRegressor class of skorch to make it behave like a sklearn estimator. The code for this part is almost the same as the code from the regression section, except that we have not fixed max_epochs here, since the grid search will try different values for it.
## Model Definition
from torch import nn
import torch.nn.functional as F
class Regressor(nn.Module):
    def __init__(self):
        super(Regressor, self).__init__()
        self.first_layer = nn.Linear(13, 26)
        self.second_layer = nn.Linear(26, 52)
        self.final_layer = nn.Linear(52, 1)

    def forward(self, x_batch):
        X = self.first_layer(x_batch)
        X = F.relu(X)
        X = self.second_layer(X)
        X = F.relu(X)
        return self.final_layer(X)
## Declare Model
from skorch import NeuralNetRegressor
from torch import optim
skorch_regressor = NeuralNetRegressor(module=Regressor, optimizer=optim.Adam, verbose=0)
skorch_regressor
In this section, we have first declared a hyperparameters dictionary with a list of values to try for each hyperparameter. GridSearchCV from sklearn will try all combinations of these hyperparameter values on our data and keep track of the results. We'll be trying different values of the below 3 hyperparameters: the learning rate (lr), the number of training epochs (max_epochs), and the weight decay of the Adam optimizer (optimizer__weight_decay).
After creating an instance of GridSearchCV, giving it the skorch regressor and the hyperparameters dictionary, we have performed the hyperparameters search by calling the fit() method on the grid search instance. The call to fit() will try each combination of hyperparameters on the model with the given data.
from sklearn.model_selection import GridSearchCV
params = {
    "lr": [0.01, 0.02],
    "max_epochs": [100, 250, 500],
    "optimizer__weight_decay": [0, 0.1]
}
grid = GridSearchCV(skorch_regressor, params)
grid.fit(X_train, Y_train)
In this section, we have printed the hyperparameter settings that gave the best score.
print("Best Score : {}".format(grid.best_score_))
print("Best Params : {}".format(grid.best_params_))
At last, we have printed MSE and R^2 scores for the train/test datasets using the grid instance, which uses the model refit with the best hyperparameter settings.
### Evaluate Model
from sklearn.metrics import mean_squared_error
print("Train MSE : {}".format(mean_squared_error(Y_train, grid.predict(X_train).reshape(-1))))
print("Test MSE : {}".format(mean_squared_error(Y_test, grid.predict(X_test).reshape(-1))))
print("\nTrain R^2 : {}".format(grid.score(X_train, Y_train)))
print("Test R^2 : {}".format(grid.score(X_test, Y_test)))
In this section, we have explained how we can perform a grid search for hyperparameters tuning on a machine learning pipeline. We can tune various parameters of the individual parts of the pipeline. We'll be creating a pipeline using scikit-learn and performing a grid search on it. We'll be using the Boston housing dataset which we had loaded earlier, and we'll also reuse the skorch-wrapped PyTorch model which we had created in the previous section.
Below we have first declared a hyperparameters search dictionary. As we'll be tuning hyperparameters of our skorch model, we have prefixed each hyperparameter name with the string 'Model__'. This prefix is needed because we have given the name 'Model' to our skorch model inside the ML pipeline, which we declare next.
We have then created an instance of GridSearchCV, giving it the ML pipeline and the hyperparameters dictionary. At last, we have performed the grid search on the ML pipeline by calling the fit() method on the GridSearchCV instance with the training data.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
params = {
    "Model__lr": [0.01, 0.02],
    "Model__max_epochs": [100, 250, 500],
    "Model__optimizer__weight_decay": [0, 0.1]
}
ml_pipeline = Pipeline([("Normalize", RobustScaler()), ("Model", skorch_regressor)])
grid = GridSearchCV(ml_pipeline, params)
grid.fit(X_train, Y_train)
In this section, we have printed the hyperparameter settings that gave the best score.
print("Best Score : {}".format(grid.best_score_))
print("Best Params : {}".format(grid.best_params_))
At last, we have printed MSE and R^2 scores for the train/test datasets using the grid instance, which uses the model refit with the best hyperparameter settings.
### Evaluate Model
from sklearn.metrics import mean_squared_error
print("Train MSE : {}".format(mean_squared_error(Y_train, grid.predict(X_train).reshape(-1))))
print("Test MSE : {}".format(mean_squared_error(Y_test, grid.predict(X_test).reshape(-1))))
print("\nTrain R^2 : {}".format(grid.score(X_train, Y_train)))
print("Test R^2 : {}".format(grid.score(X_test, Y_test)))
In this section, we have explained how we can save the trained skorch model and then load it again from saved files.
The skorch model provides a method named save_params() which lets us save model weights, optimizer, loss function, and training history to different files. We can then load the model using these files and resume training or make direct predictions.
Below we have called the save_params() method on the skorch model from the classification section, providing it with four different file names for saving the different components of the model.
skorch_classifier.save_params(f_params="params.pkl",
                              f_optimizer="opt.pkl",
                              f_criterion="criterion.pkl",
                              f_history="hist.json"
                              )
Below we have created a new instance of NeuralNetClassifier using our PyTorch model, with the same parameter values that we used in the classification section.
After creating the model, we need to call the initialize() method on it before we can make any predictions with it. This is needed because we haven't called the fit() method on it even once; if fit() had been called, the weights and other components would already be initialized and we would not need to call initialize().
We have also evaluated model performance right after initializing it, and we can notice that, as the model starts with random weights, the performance is not good.
from skorch import NeuralNetClassifier
from torch import optim
skorch_classifier2 = NeuralNetClassifier(module=Classifier,
                                         criterion=nn.CrossEntropyLoss,
                                         optimizer=optim.Adam,
                                         max_epochs=750,
                                         train_split=skorch.dataset.CVSplit(cv=5, stratified=True),
                                         verbose=0,
                                         )
skorch_classifier2
skorch_classifier2.initialize()
print("Test Accuracy : {:.2f}".format(skorch_classifier2.score(X_test, Y_test)))
print("Train Accuracy : {:.2f}".format(skorch_classifier2.score(X_train, Y_train)))
We can load the previously saved weights using the load_params() method of the skorch model.
Below we have called load_params() on the newly created skorch model, giving it the file names to which we had saved the model weights and other details.
skorch_classifier2.load_params(
    f_params="params.pkl",
    f_optimizer="opt.pkl",
    f_criterion="criterion.pkl",
    f_history="hist.json"
)
After loading the model from the files, below we have evaluated its performance on the train and test datasets. We can see that it gives the same results as the model from the classification section.
print("Test Accuracy : {:.2f}".format(skorch_classifier2.score(X_test, Y_test)))
print("Train Accuracy : {:.2f}".format(skorch_classifier2.score(X_train, Y_train)))
This ends our small tutorial explaining how we can wrap a PyTorch model inside a skorch model so that the resulting model can be used like a scikit-learn estimator. Please feel free to let us know your views in the comments section.
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.