Amazon SageMaker launches the updated inference optimization toolkit for generative AI

Today, Amazon SageMaker is excited to announce updates to the inference optimization toolkit, providing new functionality and enhancements to help you optimize generative AI models even faster. These updates build on the capabilities introduced in the original launch of the inference optimization toolkit (to learn more, see Achieve up…

Shared by AWS Machine Learning December 4, 2024

Elevate customer experience by using the Amazon Q Business custom plugin for New Relic AI

Digital experience interruptions can harm customer satisfaction and business performance across industries. Application failures, slow load times, and service unavailability can lead to user frustration, decreased engagement, and revenue loss. The risk and impact of outages increase during peak usage periods, which vary by industry, from ecommerce sales events to…

Shared by AWS Machine Learning December 4, 2024

Query structured data from Amazon Q Business using Amazon QuickSight integration

Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Although generative AI is fueling transformative innovations, enterprises may still experience sharply divided data silos when it comes to enterprise…

Shared by AWS Machine Learning December 4, 2024

How Amazon Finance Automation built a generative AI Q&A chat assistant using Amazon Bedrock

Today, the Accounts Payable (AP) and Accounts Receivable (AR) analysts in Amazon Finance operations receive queries from customers through email, cases, internal tools, or phone. When a query arises, analysts must engage in a time-consuming process of reaching out to subject matter experts (SMEs) and go through multiple policy…

Shared by AWS Machine Learning December 3, 2024

Fast and accurate zero-shot forecasting with Chronos-Bolt and AutoGluon

Chronos-Bolt is the newest addition to AutoGluon-TimeSeries, delivering accurate zero-shot forecasting up to 250 times faster than the original Chronos models [1]. Time series forecasting plays a vital role in guiding key business decisions across industries such as retail, energy, finance, and healthcare. Traditionally, forecasting has relied on statistical…

Shared by AWS Machine Learning December 3, 2024

Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – Part 2

In Part 1 of this series, we introduced Amazon SageMaker Fast Model Loader, a new capability in Amazon SageMaker that significantly reduces the time required to deploy and scale large language models (LLMs) for inference. We discussed how this innovation addresses one of the major bottlenecks in LLM deployment: the…

Shared by AWS Machine Learning December 3, 2024

Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – part 1

The generative AI landscape has been rapidly evolving, with large language models (LLMs) at the forefront of this transformation. These models have grown exponentially in size and complexity, with some now containing hundreds of billions of parameters and requiring hundreds of gigabytes of memory. As LLMs continue to expand,…

Shared by AWS Machine Learning December 3, 2024

Supercharge your auto scaling for generative AI inference – Introducing Container Caching in SageMaker Inference

Today at AWS re:Invent 2024, we are excited to announce the new Container Caching capability in Amazon SageMaker, which significantly reduces the time required to scale generative AI models for inference. This innovation allows you to scale your models faster, observing up to 56% reduction in latency when scaling…

Shared by AWS Machine Learning December 3, 2024

Unlock cost savings with the new scale down to zero feature in SageMaker Inference

Today at AWS re:Invent 2024, we are excited to announce a new feature for Amazon SageMaker inference endpoints: the ability to scale SageMaker inference endpoints to zero instances. This long-awaited capability is a game changer for our customers using the power of AI and machine learning (ML) inference in…

Shared by AWS Machine Learning December 3, 2024

Speed up your AI inference workloads with new NVIDIA-powered capabilities in Amazon SageMaker

This post is co-written with Abhishek Sawarkar, Eliuth Triana, Jiahong Liu and Kshitiz Gupta from NVIDIA. At re:Invent 2024, we are excited to announce new capabilities to speed up your AI inference workloads with NVIDIA accelerated computing and software offerings on Amazon SageMaker. These advancements build upon our collaboration…

Shared by AWS Machine Learning December 3, 2024
