Transfer learning for TensorFlow image classification models in Amazon SageMaker

Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including tabular, image, and text.

Starting today, SageMaker provides a new built-in algorithm for image classification: Image Classification – TensorFlow. It is a supervised learning algorithm that supports transfer learning for many pre-trained models available in TensorFlow Hub. It takes an image as input and outputs probability for each of the class labels. You can fine-tune these pre-trained models using transfer learning even when a large number of training images aren’t available. It’s available through the SageMaker built-in algorithms as well as through the SageMaker JumpStart UI inside Amazon SageMaker Studio. For more information, refer to its documentation Image Classification – TensorFlow and the example notebook Introduction to SageMaker TensorFlow – Image Classification.

Image classification with TensorFlow in SageMaker provides transfer learning on many pre-trained models available in TensorFlow Hub. According to the number of class labels in the training data, a classification layer is attached to the pre-trained TensorFlow Hub model. The classification layer consists of a dropout layer and a dense layer, which is a fully connected layer with 2-norm regularizer that is initialized with random weights. The model training has hyperparameters for the dropout rate of the dropout layer and the L2 regularization factor for the dense layer. Then either the whole network, including the pre-trained model, or only the top classification layer can be fine-tuned on the new training data. In this transfer learning mode, you can achieve training even with a smaller dataset.

How to use the new TensorFlow image classification algorithm

This section describes how to use the TensorFlow image classification algorithm with the SageMaker Python SDK. For information on how to use it from the Studio UI, see SageMaker JumpStart.

The algorithm supports transfer learning for the pre-trained models listed in TensorFlow Hub Models. Each model is identified by a unique model_id. The following code shows how to fine-tune MobileNet V2 1.00 224 identified by model_id tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4 on a custom training dataset. For each model_id, in order to launch a SageMaker training job through the Estimator class of the SageMaker Python SDK, you need to fetch the Docker image URI, training script URI, and pre-trained model URI through the utility functions provided in SageMaker. The training script URI contains all the necessary code for data processing, loading the pre-trained model, model training, and saving the trained model for inference. The pre-trained model URI contains the pre-trained model architecture definition and the model parameters. Note that the Docker image URI and the training script URI are the same for all the TensorFlow image classification models. The pre-trained model URI is specific to the particular model. The pre-trained model tarballs have been pre-downloaded from TensorFlow Hub and saved with the appropriate model signature in Amazon Simple Storage Service (Amazon S3) buckets, such that the training job runs in network isolation. See the following code:

from sagemaker import image_uris, model_uris, script_uris
from sagemaker.estimator import Estimator

model_id, model_version = "tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4", "*"
training_instance_type = "ml.p3.2xlarge"

# Retrieve the docker image
train_image_uri = image_uris.retrieve(model_id=model_id,model_version=model_version,image_scope="training",instance_type=training_instance_type,region=None,framework=None)

# Retrieve the training script
train_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="training")

# Retrieve the pre-trained model tarball for transfer learning
train_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="training")

output_bucket = sess.default_bucket()
output_prefix = "jumpstart-example-ic-training"
s3_output_location = f"s3://{output_bucket}/{output_prefix}/output"

With these model-specific training artifacts, you can construct an object of the Estimator class:

# Create SageMaker Estimator instance
tf_ic_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
)

Next, for transfer learning on your custom dataset, you might need to change the default values of the training hyperparameters, which are listed in Hyperparameters. You can fetch a Python dictionary of these hyperparameters with their default values by calling hyperparameters.retrieve_default, update them as needed, and then pass them to the Estimator class. Note that the default values of some of the hyperparameters are different for different models. For large models, the default batch size is smaller and the train_only_top_layer hyperparameter is set to True. The hyperparameter Train_only_top_layer defines which model parameters change during the fine-tuning process. If train_only_top_layer is True, parameters of the classification layers change and the rest of the parameters remain constant during the fine-tuning process. On the other hand, if train_only_top_layer is False, all parameters of the model are fine-tuned. See the following code:

from sagemaker import hyperparameters
# Retrieve the default hyper-parameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# [Optional] Override default hyperparameters with custom values
hyperparameters["epochs"] = "5"

The following code provides a default training dataset hosted in S3 buckets. We provide the tf_flowers dataset as a default dataset for fine-tuning the models. The dataset comprises images of five types of flowers. The dataset has been downloaded from TensorFlow under the Apache 2.0 License.

# Sample training data is available in this bucket
training_data_bucket = f"jumpstart-cache-prod-{aws_region}"
training_data_prefix = "training-datasets/tf_flowers/"

training_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}"

Finally, to launch the SageMaker training job for fine-tuning the model, call .fit on the object of the Estimator class, while passing the S3 location of the training dataset:

# Launch a SageMaker Training job by passing s3 path of the training data
tf_ic_estimator.fit({"training": training_dataset_s3_path}, logs=True)

For more information about how to use the new SageMaker TensorFlow image classification algorithm for transfer learning on a custom dataset, deploy the fine-tuned model, run inference on the deployed model, and deploy the pre-trained model as is without first fine-tuning on a custom dataset, see the following example notebook: Introduction to SageMaker TensorFlow – Image Classification.

Input/output interface for the TensorFlow image classification algorithm

You can fine-tune each of the pre-trained models listed in TensorFlow Hub Models to any given dataset comprising images belonging to any number of classes. The objective is to minimize prediction error on the input data. The model returned by fine-tuning can be further deployed for inference. The following are the instructions for how the training data should be formatted for input to the model:

Input – A directory with as many sub-directories as the number of classes. Each sub-directory should have images belonging to that class in .jpg, .jpeg, or .png format.
Output – A fine-tuned model that can be deployed for inference or can be further trained using incremental training. A preprocessing and postprocessing signature is added to the fine-tuned model such that it takes raw .jpg image as input and returns class probabilities. A file mapping class indexes to class labels is saved along with the models.

The input directory should look like the following example if the training data contains images from two classes: roses and dandelion. The S3 path should look like s3://bucket_name/input_directory/. Note the trailing / is required. The names of the folders and roses, dandelion, and the .jpg filenames can be anything. The label mapping file that is saved along with the trained model on the S3 bucket maps the folder names roses and dandelion to the indexes in the list of class probabilities the model outputs. The mapping follows alphabetical ordering of the folder names. In the following example, index 0 in the model output list corresponds to dandelion, and index 1 corresponds to roses.

input_directory
    |--roses
        |--abc.jpg
        |--def.jpg
    |--dandelion
        |--ghi.jpg
        |--jkl.jpg

Inference with the TensorFlow image classification algorithm

The generated models can be hosted for inference and support encoded .jpg, .jpeg, and .png image formats as the application/x-image content type. The input image is resized automatically. The output contains the probability values, the class labels for all classes, and the predicted label corresponding to the class index with the highest probability, encoded in JSON format. The TensorFlow image classification model processes a single image per request and outputs only one line in the JSON. The following is an example of a response in JSON:

accept: application/json;verbose

 {"probabilities": [prob_0, prob_1, prob_2, ...],
  "labels":        [label_0, label_1, label_2, ...],
  "predicted_label": predicted_label}

If accept is set to application/json, then the model only outputs probabilities. For more details on training and inference, see the sample notebook Introduction to SageMaker TensorFlow – Image Classification.

Use SageMaker built-in algorithms through the JumpStart UI

You can also use SageMaker TensorFlow image classification and any of the other built-in algorithms with a few clicks via the JumpStart UI. JumpStart is a SageMaker feature that allows you to train and deploy built-in algorithms and pre-trained models from various ML frameworks and model hubs through a graphical interface. It also allows you to deploy fully fledged ML solutions that string together ML models and various other AWS services to solve a targeted use case. Check out Run text classification with Amazon SageMaker JumpStart using TensorFlow Hub and Hugging Face models to find out how to use JumpStart to train an algorithm or pre-trained model in a few clicks.

Conclusion

In this post, we announced the launch of the SageMaker TensorFlow image classification built-in algorithm. We provided example code on how to do transfer learning on a custom dataset using a pre-trained model from TensorFlow Hub using this algorithm. For more information, check out documentation and the example notebook.

About the authors

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design, and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

João Moura is an AI/ML Specialist Solutions Architect at Amazon Web Services. He is mostly focused on NLP use cases and helping customers optimize deep learning model training and deployment. He is also an active proponent of low-code ML solutions and ML-specialized hardware.

Raju Penmatcha is a Senior AI/ML Specialist Solutions Architect at AWS. He works with education, government, and nonprofit customers on machine learning and artificial intelligence related projects, helping them build solutions using AWS. When not helping customers, he likes traveling to new places.