A year ago, Simon Willison wrote one of the cleanest definitions of an agent that has stuck around: An LLM agent runs tools in a loop to achieve a goal. That definition stuck because it describes what every production agent actually does. Kiro, Amazon Q Developer, Quick Agents, Codex,
Shared by AWS Machine Learning June 18, 2026
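The definition quoted in the excerpt above ("runs tools in a loop to achieve a goal") can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: `run_agent`, `scripted_model`, and the `add` tool are made-up names, and the scripted function stands in for a real LLM call. It is not how any of the named products are implemented.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: a model step is either
# ("call", tool_name, argument) or ("final", answer).
Step = Tuple[str, ...]


def run_agent(model: Callable[[List[str]], Step],
              tools: Dict[str, Callable[[str], str]],
              goal: str,
              max_steps: int = 10) -> str:
    """Run tools in a loop until the model reports a final answer."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        step = model(history)
        if step[0] == "final":
            return step[1]
        _, name, arg = step
        result = tools[name](arg)  # execute the requested tool
        history.append(f"{name}({arg}) -> {result}")  # feed the result back
    raise RuntimeError("step budget exhausted before reaching the goal")


def scripted_model(history: List[str]) -> Step:
    """Stand-in for an LLM: call the tool once, then report its result."""
    if len(history) == 1:
        return ("call", "add", "2,3")
    return ("final", history[-1].split(" -> ")[1])


def add(arg: str) -> str:
    """Toy tool: sum a comma-separated list of integers."""
    return str(sum(int(x) for x in arg.split(",")))


answer = run_agent(scripted_model, {"add": add}, "add 2 and 3")
print(answer)
```

The `max_steps` budget is a design choice of this sketch: it keeps a loop whose model never returns a final answer from running forever.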
Monitoring and troubleshooting generative AI inference endpoints operating at scale is challenging. When your large language model (LLM) endpoint’s P99 latency spikes, you must determine in minutes whether the root cause is GPU memory pressure, a saturated KV cache, unbalanced traffic across Availability Zones, or an auto scaling policy
Shared by AWS Machine Learning June 18, 2026
The models powering today’s agents are remarkably capable. They can reason across complex problems, plan multi-step workflows, and generate nuanced responses. But most agents are operating well below that potential. The gap isn’t intelligence. It’s access to the right context and feedback. A customer service agent tasked with answering
Shared by AWS Machine Learning June 17, 2026
Agents are only as intelligent as the context they can reason over. Today, that context is scattered across data lakes, data warehouses, lakehouses, databases, and streams, and in institutional knowledge that has never been written down. You want to trust the decisions made by your AI agents, but that
Shared by AWS Machine Learning June 17, 2026
What if you came back from a full day of meetings and the busywork was already done? Stalled deals followed up on. Compliance changes summarized. Meeting prep written. Not because you multi-tasked, but because something was working in the background while you focused on other urgent priorities. Teams are already using Amazon Quick, an AI assistant
Shared by AWS Machine Learning June 17, 2026
Today, we’re announcing inline payload support for Amazon SageMaker AI Async Inference. Customers can now send inference payloads directly in the request body of the InvokeEndpointAsync API, removing the need to upload input data to Amazon Simple Storage Service (Amazon S3) before each invocation. For payloads up to 128,000
Shared by AWS Machine Learning June 17, 2026
Research in “Nature” shows our conversational AI system matches primary care physicians in complex disease management. Original source: blog.google/technology/ai/
The Open Source Initiative’s 2025 Annual Report documents a year in which Open Source found itself at the center of major debates around AI, cybersecurity, sustainability, and public policy. In 2025, OSI continued its work to protect and advance the Open Source ecosystem through licensing stewardship, policy engagement, research,
Shared by voicesofopensource June 17, 2026
As large language models (LLMs) grow in size and complexity, maximizing inference throughput while minimizing latency remains a critical challenge for enterprise production deployments. Speculative decoding is one effective strategy to address this: a lightweight draft model guesses future tokens, which the target LLM then verifies in a single forward pass.
Shared by AWS Machine Learning June 16, 2026
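The draft-then-verify idea described in the speculative decoding excerpt above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method from the linked post: it uses greedy decoding only, the "models" (`toy_target`, `toy_draft`) are hypothetical functions over integer tokens, and verification is written as a per-position loop where a real system would score all drafted positions in one batched forward pass of the target model.

```python
from typing import Callable, List

# Hypothetical model interface: given a context, return the greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain autoregressive decoding, used as the reference output."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep the verified prefix."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each drafted position (one batched pass in practice).
        accepted: List[int] = []
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected == tok:
                accepted.append(tok)
            else:
                accepted.append(expected)  # first mismatch: take the target's token
                break
        else:
            # Every draft token was accepted, so the target adds one more token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[:len(prompt) + max_new]


def toy_target(context: List[int]) -> int:
    """Toy target model: count upward modulo 10."""
    return (context[-1] + 1) % 10


def toy_draft(context: List[int]) -> int:
    """Toy draft model: agrees with the target except after token 2."""
    return 0 if context[-1] == 2 else (context[-1] + 1) % 10


reference = greedy_decode(toy_target, [0], 12)
speculative = speculative_decode(toy_target, toy_draft, [0], 12)
print(speculative == reference)
```

With greedy verification, the speculative output matches what the target model would have produced on its own; the draft model only affects how many target steps are needed, not the result.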
Today, we’re excited to announce container image caching for Amazon SageMaker AI inference, the next major advancement in our faster scaling optimization journey. This speeds up end-to-end latency by up to 2x for generative AI models during scale-out events. Over the years, Amazon SageMaker AI has continued to reduce
Shared by AWS Machine Learning June 16, 2026