
Accelerating LLM inference with post-training weight and activation quantization using AWQ and GPTQ on Amazon SageMaker AI

Foundation models (FMs) and large language models (LLMs) have been rapidly scaling, often doubling in parameter count within months, leading to significant improvements in language understanding and generative capabilities. This rapid growth comes with steep costs: inference now requires enormous memory capacity, high-performance GPUs, and substantial energy consumption. This…
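To make the idea concrete, here is a minimal sketch of symmetric group-wise low-bit weight quantization, the core mechanism that post-training schemes such as AWQ and GPTQ build on (both add more machinery: AWQ rescales salient channels using activation statistics, and GPTQ minimizes layer-wise reconstruction error). This is a hypothetical standalone illustration, not the SageMaker or library implementation.

```python
def quantize_group(weights, n_bits=4):
    """Quantize one group of weights to signed n-bit integer codes plus one scale."""
    qmax = 2 ** (n_bits - 1) - 1                    # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Round to the nearest code and clamp to the representable range
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_group(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.04, 0.61]
codes, scale = quantize_group(weights)
recovered = dequantize_group(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
print(codes)     # integer codes in [-8, 7]; each stores 4 bits instead of 16/32
print(max_err)   # bounded by scale / 2
```

Storing 4-bit codes plus one scale per group is what shrinks memory footprint roughly 4x versus 16-bit weights; the rounding error (at most half a quantization step per weight) is the accuracy cost the post's techniques work to minimize.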

Shared by AWS Machine Learning January 10, 2026

Crossmodal search with Amazon Nova Multimodal Embeddings

Amazon Nova Multimodal Embeddings processes text, documents, images, video, and audio through a single model architecture. Available through Amazon Bedrock, the model converts different input modalities into numerical embeddings within the same vector space, supporting direct similarity calculations regardless of content type. We developed this unified model to reduce…
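Because a unified model places every modality in one vector space, cross-modal retrieval reduces to a single similarity metric. The sketch below ranks mixed-modality candidates against a text query using cosine similarity; the three-dimensional vectors and file names are made up for illustration (a real system would obtain high-dimensional embeddings from the model via an Amazon Bedrock API call).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_text_vec = [0.9, 0.1, 0.2]                  # embedding of a text query
candidates = {                                    # embeddings of non-text items
    "photo_of_cat.jpg": [0.85, 0.15, 0.25],       # image
    "podcast_clip.mp3": [0.10, 0.90, 0.30],       # audio
    "manual_page.pdf":  [0.20, 0.20, 0.95],       # document
}
ranked = sorted(candidates,
                key=lambda k: cosine(query_text_vec, candidates[k]),
                reverse=True)
print(ranked[0])  # the image, whose vector lies closest to the text query
```

The point is that no per-modality comparison logic is needed: one metric over one space handles text-to-image, text-to-audio, and every other pairing.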

Shared by AWS Machine Learning January 10, 2026

Detect and redact personally identifiable information using Amazon Bedrock Data Automation and Guardrails

Organizations handle vast amounts of sensitive customer information through various communication channels. Protecting Personally Identifiable Information (PII), such as social security numbers (SSNs), driver’s license numbers, and phone numbers, has become increasingly critical for maintaining compliance with data privacy regulations and building customer trust. However, manually reviewing and redacting…
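For illustration only, here is the detect-and-redact pattern in its simplest form, using regular expressions. Amazon Bedrock Guardrails provides managed PII entity detection rather than hand-written patterns; the patterns and placeholder tags below are hypothetical and far less robust than a managed filter.

```python
import re

# Hypothetical patterns for two common PII entity types (US formats only)
PII_PATTERNS = {
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text):
    """Replace each detected PII match with a labeled placeholder tag."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

message = "Customer SSN 123-45-6789, callback at 555-867-5309."
print(redact(message))  # sensitive values replaced by [SSN] and [PHONE] tags
```

Replacing matches with typed placeholders, rather than deleting them, preserves document structure for downstream review, which is the same output convention managed redaction services use.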

Shared by AWS Machine Learning January 8, 2026

Build an AI-powered website assistant with Amazon Bedrock

Businesses face a growing challenge: customers need answers fast, but support teams are overwhelmed. Support documentation like product manuals and knowledge base articles typically requires users to search through hundreds of pages, and support agents often run 20–30 customer queries per day to locate specific information. This post demonstrates…

Shared by AWS Machine Learning December 29, 2025

Migrate MLflow tracking servers to Amazon SageMaker AI with serverless MLflow

Operating a self-managed MLflow tracking server comes with administrative overhead, including server maintenance and resource scaling. As teams scale their ML experimentation, efficiently managing resources during peak usage and idle periods is a challenge. Organizations running MLflow on Amazon EC2 or on-premises can optimize costs and engineering resources by…

Shared by AWS Machine Learning December 29, 2025

Optimizing LLM inference on Amazon SageMaker AI with BentoML’s LLM-Optimizer

The rise of powerful large language models (LLMs) that can be consumed via API calls has made it remarkably straightforward to integrate artificial intelligence (AI) capabilities into applications. Yet despite this convenience, a significant number of enterprises are choosing to self-host their own models, accepting the complexity of infrastructure management…

Shared by AWS Machine Learning December 24, 2025