Announcing provisioned concurrency for Amazon SageMaker Serverless Inference
Amazon SageMaker Serverless Inference lets you serve model inference requests in real time without explicitly provisioning compute instances or configuring scaling policies to handle traffic variations. AWS handles the undifferentiated heavy lifting of managing the underlying infrastructure, saving you costs in the process.
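For context, a serverless endpoint is defined through its endpoint configuration, where provisioned concurrency reserves pre-warmed capacity up to the concurrency cap. A minimal sketch of such a configuration using the boto3 `create_endpoint_config` shape (the endpoint, model, and config names here are placeholder assumptions, not from the announcement):

```python
import json

# Sketch of a serverless endpoint configuration. ProvisionedConcurrency
# keeps that many pre-warmed instances ready, reducing cold starts up to
# the reserved level; it must not exceed MaxConcurrency.
serverless_config = {
    "MemorySizeInMB": 2048,       # memory allocated per concurrent invocation
    "MaxConcurrency": 20,         # hard cap on concurrent invocations
    "ProvisionedConcurrency": 5,  # pre-warmed capacity (<= MaxConcurrency)
}

endpoint_config = {
    "EndpointConfigName": "my-serverless-config",   # placeholder name
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "my-serverless-model",         # placeholder model
        "ServerlessConfig": serverless_config,
    }],
}

# With AWS credentials configured, this dict would be passed to:
#   boto3.client("sagemaker").create_endpoint_config(**endpoint_config)
print(json.dumps(endpoint_config, indent=2))
```

Requests beyond the provisioned level still run serverlessly up to `MaxConcurrency`, but may incur cold starts.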
Shared by AWS Machine Learning May 9, 2023