Training a neural network to classify images is a challenge. You need a large amount of data and compute power to train models from scratch. The customers I've worked for typically have neither. Luckily, they don't need them, because there's a nice trick that speeds up the process quite a bit.

In this post I will show you how you can use pre-trained models and transfer-learning to build image classification models in CNTK with ease.

You can get the code for this post over in my Azure notebooks collection: https://notebooks.azure.com/FizzyHacks/projects/cntk/tree/transfer-learning

If you're looking to train an image classifier based on an existing model, you'll need to follow a two-step process:

  1. Load and modify a pre-trained model
  2. Refine the modified model with more data

To show you how easy it is to use transfer-learning, we're going to use a pre-trained model to build a hotdog or dog model. This model can distinguish between a hotdog and a hot dog (a teckel, better known as a dachshund). Not very useful, I know, but I had to think of something to demonstrate transfer-learning.

Loading and modifying a pre-trained model

To start out, we're going to take a look at how to load and modify pre-trained models.

CNTK comes with a number of pre-trained models specifically trained on image data. You can find all of the trained models on GitHub: https://github.com/Microsoft/CNTK/tree/master/PretrainedModels

We're going to load the pre-trained Resnet18 model. The Resnet18 model is a relatively small state-of-the-art image recognition network. Because it's small, we can train it relatively fast on a CPU, especially since we're only refining a pre-trained model.

First, we're going to download the pre-trained model and load it into memory.

Loading the pre-trained model

To load the pre-trained Resnet18 model we need the following piece of code:

import os
from urllib.request import urlretrieve
import cntk as C

url = 'https://www.cntk.ai/Models/CNTK_Pretrained/ResNet18_ImageNet_CNTK.model'
model_file = os.path.join('models', 'resnet18.model')

if not os.path.exists(model_file):
    print('Model not found on disk.')
    print('Downloading %s to %s' % (url, model_file))

    os.makedirs('models', exist_ok=True)
    urlretrieve(url, model_file)

It performs the following steps:

  1. First, it checks if the model is available on disk so we don't download it twice
  2. Next, it creates a new model folder if the path doesn't exist yet
  3. Then, it uses the urlretrieve function to download the model
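
If you want to reuse these steps, you could wrap them in a small helper. The following is a minimal sketch and not part of CNTK: the name ensure_downloaded, the fetch parameter and the '.part' suffix are my own assumptions. It also downloads to a temporary file name first, so an interrupted download doesn't leave a half-written file that passes the "already on disk" check.

```python
import os
from urllib.request import urlretrieve


def ensure_downloaded(url, target_path, fetch=urlretrieve):
    """Download url to target_path unless the file is already there.

    fetch is any callable that takes (url, filename); it defaults to
    urlretrieve. Returns True when a download happened, False otherwise.
    """
    if os.path.exists(target_path):
        return False

    folder = os.path.dirname(target_path)
    if folder:
        os.makedirs(folder, exist_ok=True)

    # Download to a temporary name first, then rename it, so the final
    # file name only ever points to a complete download.
    partial_path = target_path + '.part'
    fetch(url, partial_path)
    os.replace(partial_path, target_path)
    return True
```

With this helper in place, the download boils down to calling ensure_downloaded(url, model_file).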

After the model is downloaded we can load the model using the following code:

base_model = C.load_model(model_file)

This code uses the load_model function to load the model graph from disk.

Once we have the model loaded, we can start to modify it.

Modifying the model

The current model doesn't fit the use-case that we're trying to implement. The pre-trained Resnet18 model requires normalized image data as input. The output of the model has 1000 neurons corresponding to the 1000 classes it can predict.

We don't have normalized data. We're using some images pulled from Google image search, so we need to account for that. Also, we're only going to predict two classes: hotdog or dog. So we need to modify a few things.

We're going to perform a three-step modification:

  1. First, we clone the layers that we need to reuse and freeze them to keep the trained parameters from the original model.
  2. Next, we're going to replace the input variable with a setup that normalizes the input data.
  3. Finally, we're going to attach a new output layer to the model so we only have two output neurons corresponding to the two classes that we have.

Let's start by cloning the layers that we're going to reuse.

Cloning the layers

Every CNTK graph has a method called clone that you can use to copy layers. You can choose to copy the layers with changeable weights or with frozen weights.

We're going to clone all layers except for the original input variable and the output layer of the network. For this we need to locate the input variable and output layer in the graph using the following code:

features_node = C.logging.graph.find_by_name(base_model, 'features')
last_node = C.logging.graph.find_by_name(base_model, 'z.x')

The code performs the following steps:

  1. First, we find the features node in the base model. This is the input variable.
  2. Next, we find the z.x node, which is the input node for the final layer in the base model.

Now that we have the pointers we need in the base model, we can clone the layers that we want to reuse using the following code:

cloned_layers = C.combine([last_node.owner]).clone(
        C.CloneMethod.freeze, 
        { features_node: C.placeholder(name='features') })

This code performs the following steps:

  1. First, it creates a new model function using the combine operator. This takes the owner of the last layer, so we skip the output layer.
  2. Next, it clones the function up to the second-to-last layer. We specify that we want to freeze the parameters in the model and use a placeholder for the features node in the model.

The output of the code is a cloned set of layers that we can't optimize further. Now we can start to connect a new input and output layer to this set of cloned layers.

Attaching a new input layer to the model

In the previous section we've cloned the pre-trained model and removed the input and output layer of the model. In our new model we want to be able to feed in raw pixel data that isn't normalized. The model should normalize the raw pixel data before it is passed through the rest of the model.

The following code demonstrates how to create a new input variable and input layer for the model:

features_input = C.input_variable((3,224,224), name='features')
normalized_features = features_input - C.Constant(114)

The code performs the following steps:

  1. First, we create a new input variable with three channels and a size of 224x224 pixels so we can feed images into the model.
  2. Then, we take the features from the input variable and subtract a constant value of 114 from all three channels. This normalizes the data to a format that the model understands.
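
To get a feeling for what the subtraction does to the data, here's a small plain-Python sketch. The helper name subtract_mean and the three sample values are made up for illustration. CNTK applies the same subtraction to every value in the 3x224x224 input.

```python
def subtract_mean(pixels, mean=114):
    """Subtract a constant mean from every pixel value in a flat list."""
    return [value - mean for value in pixels]


# Raw pixel values range from 0 to 255.
raw_pixels = [0, 114, 255]
centered = subtract_mean(raw_pixels)

print(centered)  # [-114, 0, 141]
```

After the subtraction the values are roughly centered around zero instead of running from 0 to 255.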

Once we have the input layer, we can attach it to our cloned layers using the following code:

z = cloned_layers(normalized_features)

When we invoke the cloned_layers variable as a function and feed it the normalized_features variable, we get back a new model function that connects the input to the model.

With the input connected, let's take a look at connecting the new output layer next.

Connecting a new output layer

To connect a new output layer we need to define a new Dense layer with a softmax activation function. We can do this using the following code:

output_layer = C.layers.Dense(2, activation=C.ops.softmax, name='output')

The new Dense layer has two neurons, one for hotdog, and another one for dog. We've given it a name output so it's easier to find when we want to use the model.
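
If you're wondering what the softmax activation does with the two neurons, the following plain-Python sketch shows the calculation. It's an illustration, not the CNTK implementation, and the scores are made-up numbers. Which index stands for hotdog and which for dog depends on the label indices in your own mapping file.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Two made-up scores, one per output neuron.
probabilities = softmax([2.0, 0.5])

# The predicted class is the neuron with the highest probability.
predicted = probabilities.index(max(probabilities))
```

In this example the first neuron gets the higher probability, so predicted is 0.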

To connect the model to the new output layer we need to write one more line of code:

z = output_layer(z)

This code takes the z variable that we created in the previous section and connects it to the output layer by invoking the output layer function with the z variable as input.

We now have a fully connected model that we can start to fine-tune with new samples.

Refining the modified model

In the previous section we've loaded a pre-trained model and modified it so it fits our use-case. Most of the layers in this modified model are fully trained. We only need to optimize the parameters in the output layer.
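
To give you an idea of how little is left to train, here's a quick back-of-the-envelope calculation. I'm assuming that the cloned layers produce 512 features per image, as in the standard ResNet18 design. Check the shape of the z.x node in your own model to confirm this. The helper name dense_parameter_count is made up for this sketch.

```python
def dense_parameter_count(inputs, outputs):
    """One weight per input/output pair, plus one bias per output neuron."""
    return inputs * outputs + outputs


# Assumption: 512 input features coming out of the cloned layers.
trainable = dense_parameter_count(512, 2)

print(trainable)  # 1026
```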

To optimize the output layer, we need to perform a couple of steps:

  1. First, we need to create a mini-batch source for our training and test data
  2. Next, we need to define a criterion and learner to optimize the parameters
  3. Finally, we need to train the model

Let's start with loading the data to train the model.

Creating a new minibatch source for training and testing

We don't have a huge dataset available. There are 30 images in total: 20 for training and 10 for validation. Both classes are equally represented, with 15 dogs and 15 hotdogs.

To load the data we're going to use a mini-batch source. The following code defines a utility function to make the process of creating a mini-batch source easier.

def create_datasource(filename, sweeps=C.io.INFINITELY_REPEAT):
    image_transforms = [
        C.io.transforms.scale(224, 224, 3),
    ]

    streams = C.io.StreamDefs(
        image=C.io.StreamDef('image', transforms=image_transforms),
        label=C.io.StreamDef('label', shape=2)
    )

    serializer = C.io.ImageDeserializer(filename, streams)

    return C.io.MinibatchSource(serializer, max_sweeps=sweeps)

This code performs the following steps:

  1. First, we create a set of image transformations that take the input image and scale it to 224 by 224 pixels with 3 channels. This is required so the data fits the model that we're working with.
  2. Next, we define a set of streams to read from the input file.
  3. In the set of streams we define a stream for the image file and attach the transforms to this stream.
  4. Then, we define another stream to load the labels which has a shape of 2 since we have two labels that we can predict.
  5. After defining the streams, we define a new image deserializer to read the streams from the input file.
  6. Finally, we create the mini-batch source with the deserializer and the sweeps setting.

We can use this utility function to create a test datasource and a training datasource:

train_datasource = create_datasource('data/train/mapping.txt')
test_datasource = create_datasource('data/test/mapping.txt', sweeps=1)

The first datasource reads the mapping.txt file from the data/train folder. The second reads the mapping.txt file from the data/test folder and has a sweep setting of 1, so the test data is read exactly once.
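
The mapping file that the image deserializer reads is a plain text file with one image per line: the path to the image, a tab, and the index of the label. If you don't have one yet, you could generate it with a script like the one below. This is a sketch under assumptions: the function name write_mapping_file and the folder layout with one subfolder per class (for example data/train/hotdog and data/train/dog) are hypothetical, so adjust them to how your images are stored.

```python
import os


def write_mapping_file(root_folder, class_names, output_name='mapping.txt'):
    """Write one '<image path><TAB><label index>' line per image.

    Assumes a layout of root_folder/<class name>/<image files>.
    Returns the number of lines written.
    """
    lines = []

    for label_index, class_name in enumerate(class_names):
        class_folder = os.path.join(root_folder, class_name)

        for file_name in sorted(os.listdir(class_folder)):
            # Skip anything that doesn't look like an image file.
            if file_name.lower().endswith(('.jpg', '.jpeg', '.png')):
                image_path = os.path.abspath(
                    os.path.join(class_folder, file_name))
                lines.append('%s\t%d' % (image_path, label_index))

    with open(os.path.join(root_folder, output_name), 'w') as mapping_file:
        mapping_file.write('\n'.join(lines) + '\n')

    return len(lines)
```

Calling write_mapping_file('data/train', ['hotdog', 'dog']) would give hotdogs the label index 0 and dogs the label index 1.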

We've just configured the data sources for training and validation. Now we can go ahead and set up the criterion and learner for the model.

Defining the criterion and learner for the model

The criterion for a deep learning model in CNTK is defined as a combination of a loss and metric. The loss is used to determine how to optimize the weights. The metric is used to measure the performance during training and testing.

A criterion can be created using a criterion factory function which is demonstrated below:

@C.Function
def create_criterion(z, targets):
    loss = C.losses.cross_entropy_with_softmax(z, targets)
    metric = C.metrics.classification_error(z, targets)
    
    return loss, metric

This code performs the following steps:

  1. First, we define a new function marked with the C.Function decorator.
  2. Next, we create a new loss function, cross_entropy_with_softmax.
  3. Then, we create a new metric function, classification_error.
  4. Finally, the function returns the loss and metric in a tuple.
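
To make the loss and metric a bit more tangible, here's a plain-Python sketch of what they calculate. The function names mirror the CNTK ones, but these are my own simplified re-implementations. They take the index of the correct class instead of a one-hot encoded target, and the scores are made-up numbers.

```python
import math


def cross_entropy_with_softmax(scores, target_index):
    """Negative log of the softmax probability of the correct class."""
    highest = max(scores)
    log_sum = highest + math.log(
        sum(math.exp(score - highest) for score in scores))
    return log_sum - scores[target_index]


def classification_error(batch_scores, batch_targets):
    """Fraction of samples where the highest score is the wrong class."""
    wrong = sum(
        1 for scores, target in zip(batch_scores, batch_targets)
        if scores.index(max(scores)) != target
    )
    return wrong / len(batch_targets)


# Four made-up samples with two scores each, and their correct classes.
batch_scores = [[2.0, 0.5], [0.1, 1.2], [0.3, 0.9], [1.5, 1.0]]
batch_targets = [0, 1, 0, 0]

error = classification_error(batch_scores, batch_targets)
print(error)  # 0.25
```

Only the third sample is misclassified, which gives an error of one in four. The loss gets smaller as the score for the correct class rises above the other score.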

You can use the criterion factory function to create the criterion for the model using the following code:

targets_input = C.input_variable(2)
criterion = create_criterion(z, targets_input)

The code performs the following steps:

  1. First, it creates a new input variable for the target labels
  2. Then, it invokes the create_criterion function with the model and targets input

Now that we have the criterion, we can create the learner to optimize the parameters in the model using the following code:

learner = C.learners.sgd(z.parameters, lr=0.01)

This initializes the SGD learner with the parameters of the model and a learning rate of 0.01.
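
In case you're curious what the learner does with that learning rate: plain SGD moves every parameter a small step against its gradient. The sketch below shows a single update in plain Python. The function name sgd_step and the numbers are made up for illustration; CNTK calculates the gradients for you during training.

```python
def sgd_step(parameters, gradients, lr=0.01):
    """Return the parameter values after one plain SGD update."""
    return [value - lr * gradient
            for value, gradient in zip(parameters, gradients)]


# Two made-up weights and their gradients.
weights = [0.5, -0.2]
gradients = [2.0, -1.0]

# Each weight moves by lr * gradient in the opposite direction.
updated = sgd_step(weights, gradients)
```

With a learning rate of 0.01 the weights end up at roughly 0.48 and -0.19.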

We're ready to start training the model. In the next section we'll use the criterion with the learner to train the model using the data from the train_datasource.

Training the model

In the previous sections we've created everything that we need to train the classifier. We can use the following code to set up the training process:

progress_writer = C.logging.ProgressPrinter(0)
test_config = C.train.TestConfig(test_datasource)

input_map = {
    features_input: train_datasource.streams.image,
    targets_input: train_datasource.streams.label
}

criterion.train(
    train_datasource,
    parameter_learners=[learner],
    callbacks=[progress_writer, test_config],
    model_inputs_to_streams=input_map,
    epoch_size=20,
    max_epochs=20
)

This code performs the following steps:

  1. First, we create a ProgressPrinter to log the output of the training process
  2. Then, we create a TestConfig to validate the model after it's trained
  3. Next, we set up a mapping between the input variables and the streams from the training datasource
  4. After that, we start the training process by invoking the train method on the criterion function object that we created earlier.

When we run the code, it will produce output similar to the following:

 average      since    average      since      examples
    loss       last     metric       last              
 ------------------------------------------------------
Learning rate per minibatch: 0.01
    0.589      0.589      0.281      0.281            32
    0.544      0.544      0.156      0.156            32
     0.54       0.54      0.125      0.125            32
    0.457      0.457     0.0625     0.0625            32
     0.51       0.51      0.125      0.125            32
    0.464      0.464     0.0938     0.0938            32
    0.462      0.462     0.0938     0.0938            32
    0.487      0.487      0.125      0.125            32
    0.419      0.419     0.0625     0.0625            32
    0.465      0.465      0.125      0.125            32
    0.439      0.439     0.0938     0.0938            32
    0.469      0.469      0.125      0.125            32
    0.399      0.399     0.0625     0.0625            16
Finished Evaluation [1]: Minibatch[1-1]: metric = 20.00% * 10;

The classification error during training goes down to 6.25%, while the classification error on the test set is 20%. The model does overfit a little bit, but it's not bad given that we only have 20 samples to train on.
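
If you're wondering where the rows in the log come from, you can work it out from the training settings. With an epoch_size of 20 and max_epochs of 20, the model sees 400 samples in total. The sketch below is a simplified model of the schedule, not CNTK code. It assumes a mini-batch size of 32, which is what the examples column in the log shows, and the function name minibatch_sizes is made up.

```python
def minibatch_sizes(epoch_size, max_epochs, minibatch_size=32):
    """List the sample count of each mini-batch until the budget is spent."""
    remaining = epoch_size * max_epochs
    sizes = []

    while remaining > 0:
        size = min(minibatch_size, remaining)
        sizes.append(size)
        remaining -= size

    return sizes


sizes = minibatch_sizes(epoch_size=20, max_epochs=20)

# Number of rows, size of the last row, total number of samples.
print(len(sizes), sizes[-1], sum(sizes))  # 13 16 400
```

That matches the log: twelve rows with 32 samples each, followed by a final row with the remaining 16.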

Now that we have a trained model, let's take a look at how to export it so we can use it from other applications.

Final steps

There's one more thing left to do. If we want to use the model in a different application, we need to store the model on disk and load it from that application. CNTK supports storing models in ONNX and CNTK format. ONNX is an interoperable open format that allows you to load trained models in various languages such as Java or C#. So if you're planning on building an image classifier that you want to use in a mobile application or website, then ONNX is your best bet.

Here's how to store the model in the ONNX format:

z.save('model.onnx', C.ModelFormat.ONNX)

Now it's time to have some fun! Download the ONNX runtime for C# or use DeepLearning4J to load the model. I'll leave it up to you to make something fun with the sample code in this post.

Ready to learn more?

And on that note, I want to take a minute to promote my book. Deep Learning with Microsoft Cognitive Toolkit Quick Start Guide is a short book that helps you get started with deep learning. I've worked together with Packt to create this book to maximize your profits and minimize the effort to learn about deep learning.

You can find out more about my book on Amazon.

Hope you enjoyed this post, and see you soon!