Managed Tiered KV Cache and Intelligent Routing for Amazon SageMaker HyperPod
Modern AI applications demand fast, cost-effective responses from large language models, especially when handling long documents or extended conversations. However, LLM inference can become prohibitively slow and expensive as context length increases: attention computation scales quadratically with the number of tokens, and costs mount with each interaction. Without caching, LLM inference must recalculate attention over all previously processed tokens every time it generates a new one, which is exactly the redundancy a KV cache eliminates (see the sketch below).
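To make that redundancy concrete, here is a minimal sketch in Python with NumPy (illustrative only, not the HyperPod API or any SageMaker interface) of single-head attention where each token's key and value are cached as they are produced. With the cache, each generation step attends over the stored prefix in O(t) work rather than recomputing attention for the entire sequence from scratch. The head dimension and random projections are stand-ins for a real model's outputs.

```python
# Illustrative sketch: why caching keys/values avoids recomputing
# attention over the full prefix for every generated token.
import numpy as np

D = 64  # toy head dimension (assumption for illustration)

def attention(q, K, V):
    """Single-query scaled dot-product attention over cached keys/values."""
    scores = K @ q / np.sqrt(D)             # (t,) similarity of query to each cached key
    weights = np.exp(scores - scores.max()) # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                      # (D,) attended value

rng = np.random.default_rng(0)
K_cache, V_cache = [], []                   # grows by one entry per generated token

for step in range(5):
    # New token's projections (random stand-ins for real model projections).
    q, k, v = rng.normal(size=(3, D))
    K_cache.append(k)
    V_cache.append(v)
    # With the KV cache, this step costs O(t); recomputing attention for
    # the whole prefix from scratch would cost O(t^2) per step instead.
    out = attention(q, np.stack(K_cache), np.stack(V_cache))
    print(f"step {step}: attended over {len(K_cache)} cached tokens, output shape {out.shape}")
```

Tiering this cache across storage layers (e.g., GPU memory, host memory, and slower shared storage) and routing requests to workers that already hold a matching prefix is the idea the full post develops for SageMaker HyperPod.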
AWS Machine Learning, November 27, 2025