Build, test, and deploy your Amazon SageMaker inference models to AWS Lambda

Amazon SageMaker is a fully managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at any scale. When you deploy an ML model, Amazon SageMaker typically hosts it on ML hosting instances and provides an API endpoint for serving inferences. You can also deploy models to edge devices with AWS IoT Greengrass.

However, because Amazon SageMaker is flexible enough to deploy to different targets, there are situations when hosting the model on AWS Lambda provides advantages. Not every model can be hosted on AWS Lambda; for instance, a model that needs a GPU can't be. Other limits, such as the maximum size of an AWS Lambda deployment package, can also rule out this approach. When AWS Lambda is an option, though, this architecture offers lower cost, event-driven invocation, and seamless scaling for request spikes. For example, when a model is small and invoked infrequently, it may be cheaper to serve it from AWS Lambda.

In this post, I create a pipeline to build, test, and deploy a Lambda function that serves inferences.

Prerequisites

I assume that the reader has experience with Amazon SageMaker, AWS CloudFormation, AWS Lambda, and the AWS Code* suite.

Architecture description

To create the CI/CD pipeline, I use the AWS Developer Tools suite: AWS CodeDeploy, AWS CodeBuild, and AWS CodePipeline. The following diagram shows the architecture:

When I train the model with Amazon SageMaker, the output model is saved into an Amazon S3 bucket. Each time a file is put into the bucket, AWS CloudTrail triggers an Amazon CloudWatch event. This event invokes a Lambda function that checks whether the uploaded file is a new model file and, if so, moves it to a different S3 bucket. This step is necessary because Amazon SageMaker saves other files, such as checkpoints, in different folders alongside the model file, but AWS CodePipeline can only be triggered by a specific file in a specific folder of an S3 bucket.
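
The following is a minimal sketch, under assumed names, of what that copying function can look like. It reacts to the CloudWatch event, keeps only the model archive, and copies it to the fixed key that the pipeline watches; the DESTINATION_BUCKET environment variable and the destination key are illustrative assumptions:

import os
import boto3

s3 = boto3.client("s3")

# Assumed environment variable: the bucket that triggers AWS CodePipeline.
DESTINATION_BUCKET = os.environ["DESTINATION_BUCKET"]

def lambda_handler(event, context):
    # CloudTrail-backed CloudWatch events carry the S3 call details here.
    params = event["detail"]["requestParameters"]
    bucket = params["bucketName"]
    key = params["key"]

    # Amazon SageMaker also writes checkpoints and other artifacts, so only
    # copy the actual model archive.
    if not key.endswith("model.tar.gz"):
        return {"copied": False, "key": key}

    # Copy the model to the fixed key that AWS CodePipeline watches.
    s3.copy_object(
        Bucket=DESTINATION_BUCKET,
        Key="model.tar.gz",
        CopySource={"Bucket": bucket, "Key": key},
    )
    return {"copied": True, "key": key}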

Therefore, after the model file is moved from the Amazon SageMaker bucket to the destination bucket, AWS CodePipeline is triggered. First, AWS CodePipeline invokes AWS CodeBuild to create three items:

  • The deployment package of the Lambda function.
  • The AWS Serverless Application Model (AWS SAM) template to create the API.
  • The Lambda function to serve the inference.

After this is done, AWS CodePipeline executes the change set that transforms the AWS SAM template into an AWS CloudFormation template. When the template executes, AWS CodeDeploy is triggered. AWS CodeDeploy invokes a Lambda function to test whether the newly created Lambda function, which serves the latest version of your model, works as expected. If so, AWS CodeDeploy shifts traffic from the old version to the new version of the Lambda function, and the deployment is complete.
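
Here is a hedged sketch of how such a test hook can work. The NEW_VERSION_ARN environment variable, the test payload, and the spam_probability field are assumptions for illustration; the calls to AWS CodeDeploy are the standard lifecycle hook API:

import json
import os
import boto3

codedeploy = boto3.client("codedeploy")
lambda_client = boto3.client("lambda")

def lambda_handler(event, context):
    # CodeDeploy passes these identifiers to every lifecycle hook.
    deployment_id = event["DeploymentId"]
    hook_execution_id = event["LifecycleEventHookExecutionId"]

    status = "Succeeded"
    try:
        # Invoke the new version of the inference function with a known input.
        response = lambda_client.invoke(
            FunctionName=os.environ["NEW_VERSION_ARN"],  # assumed env var
            Payload=json.dumps({"message": "Free entry! Text WIN to 80082"}),
        )
        result = json.loads(response["Payload"].read())
        # Fail the deployment if the model does not flag obvious spam.
        if result.get("spam_probability", 0) < 0.5:
            status = "Failed"
    except Exception:
        status = "Failed"

    # Report the outcome so CodeDeploy can shift traffic or roll back.
    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=deployment_id,
        lifecycleEventHookExecutionId=hook_execution_id,
        status=status,
    )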

How the Lambda function deployment package is created

In the AWS CloudFormation template that I created to generate the pipeline, I included a section where I indicate how AWS CodeBuild should create this package. I also outlined how to create the AWS SAM template to generate the API and the Lambda function itself.

Here’s the code example:

- "git clone ${GitRepository}"
- "cd ${GitRepositoryName}"
- "rm -rf .git "
- "ls -al "
- "aws s3 cp s3://${SourceBucket}/${SourceS3ObjectKey} ."
- "tar zxf ${SourceS3ObjectKey}"
- "ls -al"
- "pwd"
- "rm -f ${SourceS3ObjectKey}"
- "aws cloudformation package --template-file samTemplateLambdaChecker.yaml --s3-bucket ${SourceBucket} --output-template-file ../outputSamTemplate.yaml"
- "cp samTemplateLambdaChecker.yaml ../"

In the BuildSpec, I use a GitHub repository to download the necessary files. These files are the Lambda function code, the Lambda function checker (which AWS CodeDeploy uses to check whether the new model works as expected), and the AWS SAM template. In addition, AWS CodeBuild copies the latest model.tar.gz file from S3.

To run, the Lambda function also needs the Apache MXNet dependencies. The AWS CloudFormation template that you use creates a Lambda layer containing the MXNet libraries necessary to run inferences in Lambda. I have not created a pipeline to build the layer, as that isn't the focus of this post. You can find the steps I used to compile MXNet for Lambda in the following section.
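
To make the flow concrete, here is a minimal sketch of what the inference Lambda function might look like. The model file names, the event shape, the spam_probability output field, and the encode_message featurizer are illustrative assumptions, not the code from the repository; the MXNet libraries themselves come from the Lambda layer:

import json
import mxnet as mx
from mxnet import gluon, nd

# Load the exported Gluon model once per container so that warm invocations
# skip deserialization. The file names are illustrative.
net = gluon.nn.SymbolBlock.imports(
    "model-symbol.json", ["data"], "model-0000.params", ctx=mx.cpu()
)

def encode_message(text, length=200):
    # Placeholder featurizer: the real preprocessing must match the
    # vocabulary used when training on the SMS Spam Collection dataset.
    codes = [min(ord(c), 255) for c in text[:length]]
    return codes + [0] * (length - len(codes))

def lambda_handler(event, context):
    # API Gateway proxy integrations deliver the request body as a string.
    message = json.loads(event["body"])["message"]
    probability = net(nd.array([encode_message(message)]))
    return {
        "statusCode": 200,
        "body": json.dumps(
            {"spam_probability": float(probability[0][0].asscalar())}
        ),
    }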

Testing the pipeline

Before proceeding, create a new S3 bucket into which to move the model file:

  1. In the S3 console, choose Create bucket.
  2. For Bucket Name, enter a custom name.
  3. For Region, choose the Region in which to create the pipeline and choose Next.
  4. Enable versioning by selecting Keep all versions of an object in the same bucket and choose Next.
  5. Choose Create bucket.

In this bucket, add three files:

  • An empty file inside a zip archive called empty.zip. This file is necessary because AWS CodeBuild must receive a file when it is invoked, although the file itself isn't used in this case.
  • The file mxnet-layer.zip.
  • The zipped Lambda function that copies the model file from the Amazon SageMaker bucket to the bucket that triggers AWS CodePipeline.

To upload these files:

  1. Open the Amazon S3 console.
  2. Choose your bucket.
  3. On the Upload page, choose Add files and select the three files.
  4. Choose Next until you can choose Upload.

Now that you have created this new bucket, download the AWS CloudFormation template and launch it:

  1. Open the AWS CloudFormation console.
  2. Choose Create Stack.
  3. For Choose a template, select Upload a template to Amazon S3 and select the file.
  4. Choose Next.
  5. Add a Stack name.
  6. Change SourceS3Bucket to the bucket name you have previously created.
  7. Choose Next, then Next again.
  8. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  9. Choose Create.

This creates the pipeline on your behalf and deploys everything necessary. When you train the model in Amazon SageMaker, you must specify the S3 bucket created by the AWS CloudFormation template as the output location for the model (a training sketch follows the steps below). To find the name of this S3 bucket:

  1. Open the AWS CloudFormation console.
  2. Select your Stack Name.
  3. Choose Resources and find ModelS3Location.
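
For example, here is a hedged sketch of how a training job can be pointed at that bucket. The bucket names, the training script, and the framework version are placeholders, and the estimator settings are assumptions based on the current SageMaker Python SDK:

import sagemaker
from sagemaker.mxnet import MXNet

estimator = MXNet(
    entry_point="train.py",              # your Gluon training script
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="1.4.1",
    py_version="py3",
    # The bucket created by the CloudFormation template (ModelS3Location):
    output_path="s3://<your-model-s3-location-bucket>/",
)
estimator.fit("s3://<your-data-bucket>/sms-spam/")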

To simulate a new model being trained by Amazon SageMaker and uploaded to S3, download a model that I previously trained and uploaded to GitHub.

After it's downloaded, upload the file to the S3 bucket that you created. The model was trained on the SMS Spam Collection dataset provided by the University of California, using a simple neural network built with Gluon, which is based on Apache MXNet. You can also view the workshop from re:Invent 2018 that covers how to train this model.

  1. Open the Amazon S3 console.
  2. Choose your ModelS3Location bucket.
  3. Choose Upload, choose Add files, and select the model file.
  4. Choose Next, and choose Upload.

From the AWS CodeDeploy console, you should be able to see that the process has been initiated, as shown in the following image.

After the process has completed, you can see that a new AWS CloudFormation stack called AntiSpamAPI has been created. As previously explained, this stack creates the Lambda function and the API that serve the inference. You can invoke the endpoint directly. First, find the endpoint URL:

  1. In the AWS CloudFormation console, choose your AntiSpamAPI.
  2. Choose Resources and find ServerlessRestApi.
  3. Choose the ServerlessRestApi resource, which opens the API Gateway console.
  4. From the API Gateway console, select AntiSpamAPI.
  5. Choose Stages, Prod.
  6. Copy the Invoke URL.

After you have the endpoint URL, you can test it using a simple page that I've created.

For example, you can determine that a sample sentence has a 99% probability of being spam, as you can see from the raw output.
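
You can also call the API without the test page. Here is a hedged example; the request path and JSON shape are assumptions, so substitute your own Invoke URL and payload:

import json
import urllib.request

# Replace with the Invoke URL copied from the Prod stage in API Gateway.
invoke_url = "https://<api-id>.execute-api.<region>.amazonaws.com/Prod"

request = urllib.request.Request(
    invoke_url,
    data=json.dumps({"message": "WINNER!! Claim your free prize now"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))  # e.g. {"spam_probability": 0.99}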

Conclusion

I hope this post proves useful for understanding how you can automatically deploy your model to a Lambda function using AWS developer tools. Having a pipeline can reduce the overhead of serving a model with a serverless architecture. With minor changes, you can use this pipeline to deploy a model trained anywhere, such as on AWS Deep Learning AMIs, AWS Deep Learning Containers, or on premises.

If you have questions or suggestions, please share them on GitHub or in the comments.


About the Author

Diego Natali is a solutions architect for Amazon Web Services in Italy. With an engineering background spanning several years, he helps ISV and startup customers design flexible and resilient architectures using AWS services. In his spare time he enjoys watching movies and riding his dirt bike.
