Advancing AI trust with new responsible AI tools, capabilities, and resources

As generative AI continues to drive innovation across industries and our daily lives, the need for responsible AI has become increasingly important. At AWS, we believe the long-term success of AI depends on the ability to inspire trust among users, customers, and society. This belief is at the heart of our long-standing commitment to building and using AI responsibly. Responsible AI goes beyond mitigating risks and aligning to relevant standards and regulations. It’s about proactively building trust and unlocking AI’s potential to drive business value. A comprehensive approach to responsible AI empowers organizations to innovate boldly and achieve transformative business outcomes. New joint research conducted by Accenture and AWS underscores this, highlighting responsible AI as a key driver of business value — boosting product quality, operational efficiency, customer loyalty, brand perception, and more. Nearly half of the surveyed companies acknowledge responsible AI as pivotal in driving AI-related revenue growth. Why? Responsible AI builds trust, and trust accelerates adoption and innovation.

With trust as a cornerstone of AI adoption, we are excited to announce at AWS re:Invent 2024 new responsible AI tools, capabilities, and resources that enhance the safety, security, and transparency of our AI services and models and help support customers’ own responsible AI journeys.

Taking proactive steps to manage AI risks and foster trust and interoperability

AWS is the first major cloud service provider to announce ISO/IEC 42001 accredited certification for AI services, covering Amazon Bedrock, Amazon Q Business, Amazon Textract, and Amazon Transcribe. ISO/IEC 42001 is an international management system standard that outlines the requirements for organizations to manage AI systems responsibly throughout their lifecycle. Technical standards, such as ISO/IEC 42001, are significant because they provide a common framework for responsible AI development and deployment, fostering trust and interoperability in an increasingly global and AI-driven technological landscape. Achieving ISO/IEC 42001 certification means that an independent third party has validated that AWS is taking proactive steps to manage risks and opportunities associated with AI development, deployment, and operation. With this certification, we reinforce our commitments to providing AI services that help you innovate responsibly with AI.

Expanding safeguards in Amazon Bedrock Guardrails to improve transparency and safety

In April 2024, we announced the general availability of Amazon Bedrock Guardrails, which makes it easier to apply safety and responsible AI checks for your gen AI applications. Amazon Bedrock Guardrails delivers industry-leading safety protections by blocking up to 85% more harmful content on top of native protections provided by foundation models (FMs) and filtering over 75% of hallucinated responses from models using contextual grounding checks for Retrieval Augmented Generation (RAG) and summarization use cases. The ability to implement these safeguards was a big step forward in building trust in AI systems. Despite the advancements in FMs, models can still produce hallucinations—a challenge many of our customers face. For use cases where accuracy is critical, customers need the use of mathematically sound techniques and explainable reasoning to help generate accurate FM responses.

To address this need, we are adding new safeguards to Amazon Bedrock Guardrails to help prevent factual errors due to FM hallucinations and offer verifiable proofs. With the launch of the Automated Reasoning checks in Amazon Bedrock Guardrails (preview), AWS becomes the first and only major cloud provider to integrate automated reasoning in our generative AI offerings. Automated Reasoning checks help prevent factual errors from hallucinations using sound mathematical, logic-based algorithmic verification and reasoning processes to verify the information generated by a model, so outputs align with provided facts and aren’t based on hallucinated or inconsistent data. Used alongside other techniques such as prompt engineering, RAG, and contextual grounding checks, Automated Reasoning checks add a more rigorous and verifiable approach to enhancing the accuracy of LLM-generated outputs. Encoding your domain knowledge into structured policies helps your conversational AI applications provide reliable and trustworthy information to your users.

Click on the image below to see a demo of Automated Reasoning checks in Amazon Bedrock Guardrails.

As organizations increasingly use applications with multimodal data to drive business value, improve decision-making, and enhance customer experiences, the need for content filters extends beyond text. Amazon Bedrock Guardrails now supports multimodal toxicity detection (in preview) with support for image content, helping organizations to detect and filter undesirable and potentially harmful image content while retaining safe and relevant visuals. Multimodal toxicity detection helps remove the heavy lifting required to build your own safeguards for image data or invest time in manual evaluation that can be error-prone and tedious. Amazon Bedrock Guardrails helps you to responsibly create AI applications, helping build trust with your users.

Improving generative AI application responses and quality with new Amazon Bedrock evaluation capabilities

With more general-purpose FMs to choose from, organizations now have a wide range of options to power their generative AI applications. However, selecting the optimal model for a specific use case requires efficiently comparing models based on an organization’s preferred quality and responsible AI metrics. While evaluation is an important part of building trust and transparency, it demands substantial time, expertise, and resources for every new use case, making it challenging to choose the model that delivers the most accurate and safe customer experience. Amazon Bedrock Evaluations addresses this by helping you evaluate, compare, and select the best FMs for your use case. You can now use an LLM-as-a-judge (in preview) for model evaluations to perform tests and evaluate other models with human-like quality on your dataset. You can choose from LLMs hosted on Amazon Bedrock to be the judge, with a variety of quality and responsible AI metrics such as correctness, completeness, and harmfulness. You can also bring your own prompt dataset to customize the evaluation with your data, and compare results across evaluation jobs to make decisions faster. Previously, you had a choice between human-based model evaluation and automatic evaluation with exact string matching and other traditional natural language processing (NLP) metrics. These methods, though fast, didn’t provide a strong correlation with human evaluators. Now, with LLM-as-a-judge, you can get human-like evaluation quality at a much lower cost than full human-based evaluations while saving up to weeks of time. Many organizations still want the final assessment to be from expert human annotators. For this, Amazon Bedrock still offers full human-based evaluations with an option to bring your own workforce or have AWS manage your custom evaluation.

To equip FMs with up-to-date and proprietary information, organizations use RAG, a technique that fetches data from company data sources and enriches the prompt to provide more relevant and accurate responses. However, evaluating and optimizing RAG applications can be challenging due to the complexity of optimizing retrieval and generation components. To address this, we’ve introduced RAG evaluation support in Amazon Bedrock Knowledge Bases (in preview). This new evaluation capability now allows you to assess and optimize RAG applications conveniently and quickly, right where your data and LLMs already reside. Powered by LLM-as-a-judge technology, RAG evaluations offer a choice of several judge models and metrics, such as context relevance, context coverage, correctness, and faithfulness (hallucination detection). This seamless integration promotes regular assessments, fostering a culture of continuous improvement and transparency in AI application development. By saving both cost and time compared to human-based evaluations, these tools empower organizations to enhance their AI applications, building trust through consistent improvement.

The model and RAG evaluation capabilities both provide natural language explanations for each score in the output file and on the AWS Management Console. The scores are normalized from 0 to 1 for ease of interpretability. Rubrics are published in full with the judge prompts in the documentation so non-scientists can understand how scores are derived. To learn more about model and RAG evaluation capabilities, see News blog.

Introducing Amazon Nova, built with responsible AI at the core

Amazon Nova is a new generation of state-of-the-art FMs that deliver frontier intelligence and industry leading price-performance. Amazon Nova FMs incorporate built-in safeguards to detect and remove harmful content from data, rejecting inappropriate user inputs, and filtering model outputs. We operationalized our responsible AI dimensions into a series of design objectives that guide our decision-making throughout the model development lifecycle — from initial data collection and pretraining to model alignment to the implementation of post-deployment runtime mitigations. Amazon Nova Canvas and Amazon Nova Reel come with controls to support safety, security, and IP needs with responsible AI. This includes watermarking, content moderation, and C2PA support (available in Amazon Nova Canvas) to add metadata by default to generated images. Amazon’s safety measures to combat the spread of misinformation, child sexual abuse material (CSAM), and chemical, biological, radiological, or nuclear (CBRN) risks also extend to Amazon Nova models. For more information on how Amazon Nova was built responsibly, read the Amazon Science blog.

Enhancing transparency with new resources to advance responsible generative AI

At re:Invent 2024, we announced the availability of new AWS AI Service Cards for Amazon Nova Reel, Amazon Canvas, Amazon Nova Micro, Lite, and Pro, Amazon Titan Image Generator, and Amazon Titan Text Embeddings to increase transparency of Amazon FMs. These cards provide comprehensive information on the intended use cases, limitations, responsible AI design choices, and best practices for deployment and performance optimization. A key component of Amazon’s responsible AI documentation, AI Service Cards offer customers and the broader AI community a centralized resource to understand the development process we undertake to build our services in a responsible way that addresses fairness, explainability, privacy and security, safety, controllability, veracity and robustness, governance, and transparency. As generative AI continues to grow and evolve, transparency on how technology is developed, tested, and used will be a vital component to earn the trust of organizations and their customers alike. You can explore all 16 AI Service Cards on Responsible AI Tools and Resources.

We also updated the AWS Responsible Use of AI Guide. This document offers considerations for designing, developing, deploying, and operating AI systems responsibly, based on our extensive learnings and experience in AI. It was written with a set of diverse AI stakeholders and perspectives in mind—including, but not limited to, builders, decision-makers, and end-users. At AWS, we are committed to continuing to bring transparency resources like these to the broader community—and to iterate and gather feedback on the best ways forward.

Delivering breakthrough innovation with trust at the forefront

At AWS, we’re dedicated to fostering trust in AI, empowering organizations of all sizes to build and use AI effectively and responsibly. We are excited about the responsible AI innovations announced at re:Invent this week. From new safeguards and evaluation techniques in Amazon Bedrock to state-of-the-art Amazon Nova FMs to fostering trust and transparency with ISO/IEC 42001 certification and new AWS AI Service Cards, you have more tools, resources and built-in protections to help you innovate responsibly and unlock value with generative AI.

We encourage you to explore these new tools and resources:

About the author

Dr. Baskar Sridharan is the Vice President for AI/ML and Data Services & Infrastructure, where he oversees the strategic direction and development of key services, including Bedrock, SageMaker, and essential data platforms like EMR, Athena, and Glue.

View Original Source (aws.amazon.com) Here.

← previous - next →