How Harmonic Security improved their data-leakage detection system with low-latency fine-tuned models using Amazon SageMaker, Amazon Bedrock, and Amazon Nova Pro
This post was written with Bryan Woolgar-O’Neil, Jamie Cockrill, and Adrian Cunliffe from Harmonic Security.

Organizations face increasing challenges protecting sensitive data while supporting third-party generative AI tools. Harmonic Security, a cybersecurity company, developed an AI governance and control layer that spots sensitive data inline as employees use AI, giving security teams the power to keep PII, source code, and payroll information safe while the business accelerates. The following screenshot demonstrates Harmonic Security’s software tool, highlighting the different data leakage detection types, including Employee PII, Employee Financial Information, and Source Code.

Harmonic Security’s solution is also now available on AWS Marketplace, enabling organizations to deploy enterprise-grade data leakage protection with seamless AWS integration. The platform provides prompt-level visibility into generative AI usage, real-time coaching at the point of risk, and detection of high-risk AI applications, all powered by the optimized models described in this post.

The initial version of their system was effective, but with a detection latency of 1–2 seconds, there was an opportunity to further enhance its capabilities and improve the overall user experience. To achieve this, Harmonic Security partnered with the AWS Generative AI Innovation Center to optimize their system with four key objectives:

- Reduce detection latency to under 500 milliseconds at the 95th percentile
- Maintain detection accuracy across monitored data types
- Continue to support EU data residency compliance
- Enable a scalable architecture for production loads

This post walks through how Harmonic Security used Amazon SageMaker AI, Amazon Bedrock, and Amazon Nova Pro to fine-tune a ModernBERT model, achieving low-latency, accurate, and scalable data leakage detection.
Solution overview

Harmonic Security’s initial data leakage detection system relied on an 8-billion-parameter (8B) model, which effectively identified sensitive data but incurred 1–2 second latency, close to the threshold of impacting user experience. To achieve sub-500 millisecond latency while maintaining accuracy, we developed two classification approaches using a fine-tuned ModernBERT model.

First, a binary classification model was prioritized to detect mergers and acquisitions (M&A) content, a critical category for helping prevent sensitive data leaks. We focused on binary classification initially because it was the simplest approach that would integrate seamlessly with the current system, which invokes multiple binary classification models in parallel. Second, as an extension, we explored a multi-label classification model to detect multiple sensitive data types (such as billing information, financial projections, and employment records) in a single pass, aiming to reduce the computational overhead of running multiple parallel binary classifiers. Although the multi-label approach showed promise for future scalability, Harmonic Security decided to keep the binary classification model for the initial version.

The solution uses the following key services:

- Amazon SageMaker AI – For fine-tuning and deploying the model
- Amazon Bedrock – For accessing industry-leading large language models (LLMs)
- Amazon Nova Pro – A highly capable multimodal model that balances accuracy, speed, and cost

The following diagram illustrates the solution architecture for low-latency inference and scalability.
The architecture consists of the following components:

- Model artifacts are stored in Amazon Simple Storage Service (Amazon S3)
- A custom container with inference code is hosted in Amazon Elastic Container Registry (Amazon ECR)
- A SageMaker endpoint uses ml.g5.4xlarge instances for GPU-accelerated inference
- Amazon CloudWatch monitors invocations, triggering auto scaling to adjust instances (1–5) based on an 830 requests per minute (RPM) threshold

The solution supports the following features:

- Sub-500 millisecond inference latency
- EU AWS Region deployment support
- Automatic scaling between 1–5 instances based on demand
- Cost optimization during low-usage periods

Synthetic data generation

High-quality training data for sensitive information (such as M&A documents and financial data) is scarce. We used Meta Llama 3.3 70B Instruct and Amazon Nova Pro to generate synthetic data, expanding upon Harmonic’s existing dataset, which included examples in the following categories: M&A, billing information, financial projections, employment records, sales pipeline, and investment portfolio. The following diagram provides a high-level overview of the synthetic data generation process.
Data generation framework

The synthetic data generation framework comprises the following steps:

- Smart example selection – K-means clustering on sentence embeddings supports diverse example selection
- Adaptive prompts – Prompts incorporate domain knowledge, with temperature (0.7–0.85) and top-p sampling adjusted per category
- Near-miss augmentation – Negative examples resembling positive cases improve precision
- Validation – An LLM-as-a-judge approach using Amazon Nova Pro and Meta Llama 3 validates examples for relevance and quality

Binary classification

For the binary M&A classification task, we generated three distinct types of examples:

- Positive examples – These contained explicit M&A information while maintaining realistic document structures and finance-specific language patterns. They included key indicators like “merger,” “acquisition,” “deal terms,” and “synergy estimates.”
- Negative examples – We created domain-relevant content that deliberately avoided M&A characteristics while remaining contextually appropriate for business communications.
- Near-miss examples – These resembled positive examples but fell just outside the classification boundary; for instance, documents discussing strategic partnerships or joint ventures that didn’t constitute actual M&A activity.

The generation process maintained careful proportions between these example types, with particular emphasis on near-miss examples to address precision requirements.
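As an illustration of the adaptive-prompt step, the sketch below assembles a generation request with per-category sampling settings and separate guidance for positive, negative, and near-miss examples. All names, settings, and prompt wording here are illustrative assumptions, not Harmonic Security’s production prompts:

```python
import random

# Hypothetical per-category sampling settings, within the 0.7-0.85
# temperature range described above (exact values are illustrative).
SAMPLING = {
    "mna": {"temperature": 0.85, "top_p": 0.9},
    "billing_information": {"temperature": 0.7, "top_p": 0.95},
}

def build_generation_request(category, example_type, few_shot, seed=0):
    """Assemble a synthetic-data generation request for one category.

    example_type is 'positive', 'negative', or 'near_miss'; few_shot is a
    list of seed documents (in practice picked via k-means clustering on
    sentence embeddings to keep the style examples diverse).
    """
    rng = random.Random(seed)
    shots = rng.sample(few_shot, k=min(3, len(few_shot)))
    guidance = {
        "positive": f"Write a realistic business document that clearly contains {category} content.",
        "negative": f"Write a realistic business document with no {category} content.",
        "near_miss": (
            f"Write a document that superficially resembles {category} content "
            f"but falls just outside the category boundary."
        ),
    }[example_type]
    prompt = guidance + "\n\nStyle examples:\n" + "\n---\n".join(shots)
    return {"prompt": prompt, **SAMPLING[category]}

req = build_generation_request(
    "mna", "near_miss",
    ["Acme and Beta announce a joint go-to-market partnership.",
     "Q3 revenue projections for the EMEA region."],
)
```

The near-miss branch is what pushes the generator toward boundary cases such as joint ventures that are not actual M&A activity.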
Multi-label classification

For the more complex multi-label classification task across four sensitive information categories, we developed a more sophisticated generation strategy:

- Single-label examples – We generated examples containing information relevant to exactly one category to establish clear category-specific features
- Multi-label examples – We created examples spanning multiple categories with controlled distributions, covering various combinations (2–4 labels)
- Category-specific requirements – For each category, we defined mandatory elements to maintain explicit rather than implied associations:
  - Financial projections – Forward-looking revenue and growth data
  - Investment portfolio – Details about holdings and performance metrics
  - Billing and payment information – Invoices and supplier accounts
  - Sales pipeline – Opportunities and projected revenue

Our multi-label generation prioritized realistic co-occurrence patterns between categories while maintaining sufficient representation of individual categories and their combinations. As a result, synthetic data increased the number of training examples by 10 times (binary) and 15 times (multi-label). It also improved class balance, because we generated the data with a more balanced label distribution.

Model fine-tuning

We fine-tuned ModernBERT models on SageMaker to achieve low latency and high accuracy. Compared with decoder-only models such as Meta Llama 3.2 3B and Google Gemma 2 2B, ModernBERT’s compact size (149M and 395M parameters) translated into lower latency while still delivering higher accuracy, so we selected ModernBERT over fine-tuning those alternatives. In addition, ModernBERT is one of the few BERT-based models that supports context lengths of up to 8,192 tokens, which was a key requirement for our project.
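The multi-label examples above become training targets via multi-hot label encoding (used by the multi-label model described below). A minimal sketch, with illustrative category names rather than the exact production labels:

```python
# Fixed category order for the four sensitive-data categories.
CATEGORIES = [
    "financial_projections",
    "investment_portfolio",
    "billing_and_payment",
    "sales_pipeline",
]

def to_multi_hot(labels):
    """Convert a set of category names into a fixed-order 0/1 vector."""
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    return [1 if c in labels else 0 for c in CATEGORIES]

# A document mentioning both a sales pipeline and revenue forecasts:
vec = to_multi_hot({"sales_pipeline", "financial_projections"})
# vec == [1, 0, 0, 1]
```

Each position in the vector is predicted independently, which is what lets one forward pass detect several sensitive data types at once.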
Binary classification model

Our first fine-tuned model used ModernBERT-base, focused on binary classification of M&A content. We approached this task methodically:

- Data preparation – We enriched our M&A dataset with the synthetically generated data
- Framework selection – We used the Hugging Face Transformers library with the Trainer API in a PyTorch environment, running on SageMaker
- Training process – Our process included:
  - Stratified sampling to maintain label distribution across training and evaluation sets
  - Tokenization with sequence lengths up to 3,000 tokens to match what the client had in production
  - Binary cross-entropy loss optimization
  - Early stopping based on F1 score to prevent overfitting

The result was a fine-tuned model that could distinguish M&A content from non-sensitive information with a higher F1 score than the 8B parameter model.

Multi-label classification model

For our second model, we tackled the more complex challenge of multi-label classification: detecting multiple sensitive data types simultaneously within single text passages. We fine-tuned a ModernBERT-large model to identify various sensitive data types like billing information, employment records, and financial projections in a single pass. This required:

- Multi-hot label encoding – We converted our categories into vector format for simultaneous prediction.
- Focal loss implementation – Instead of standard cross-entropy loss, we implemented a custom FocalLossTrainer class. Unlike static weighted loss functions, focal loss adaptively down-weights easy examples during training, helping the model concentrate on challenging cases and significantly improving performance for less frequent or harder-to-detect classes.
- Specialized configuration – We added configurable per-class thresholds (for example, 0.1 to 0.8) on each class probability to determine label assignment, because we observed varying performance at different decision boundaries.
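To make the focal-loss intuition concrete, here is a dependency-free sketch of the per-example loss and of per-class threshold assignment. This is a simplified stand-in for the custom FocalLossTrainer (which operates on PyTorch tensors inside the Trainer API); the probabilities and thresholds are illustrative:

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Focal loss for one example: p is the predicted probability of the
    positive class, y is the true label (0 or 1). gamma down-weights easy
    examples; gamma=0 recovers plain cross-entropy."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def assign_labels(probs, thresholds):
    """Per-class thresholding for multi-label assignment."""
    return [1 if p >= t else 0 for p, t in zip(probs, thresholds)]

# An easy, correctly classified example contributes almost nothing...
easy = focal_loss(0.95, 1)
# ...while a hard example keeps a substantial loss signal.
hard = focal_loss(0.30, 1)

# Per-class thresholds in the 0.1-0.8 range tune each decision boundary.
labels = assign_labels([0.85, 0.15, 0.42], [0.8, 0.5, 0.3])
# labels == [1, 0, 1]
```

Because the (1 - p_t)^gamma factor shrinks toward zero for confident predictions, the rarer or harder classes dominate the gradient, which is the behavior the team relied on for less frequent sensitive-data types.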
This approach enabled our system to identify multiple sensitive data types in a single inference pass.

Hyperparameter optimization

To find the optimal configuration for our models, we used Optuna to optimize key parameters. Optuna is an open-source hyperparameter optimization (HPO) framework that finds strong hyperparameters for a given machine learning (ML) model by running many experiments (called trials). It uses a Bayesian algorithm called the Tree-structured Parzen Estimator (TPE) to choose promising hyperparameter combinations based on past results. The search space explored numerous combinations of key hyperparameters, as listed in the following table.

Hyperparameter | Range
Learning rate | 5e-6–5e-5
Weight decay | 0.01–0.5
Warmup ratio | 0.0–0.2
Dropout rates | 0.1–0.5
Batch size | 16, 24, 32
Gradient accumulation steps | 1, 4
Focal loss gamma (multi-label only) | 1.0–3.0
Class threshold (multi-label only) | 0.1–0.8

To optimize computational resources, we implemented pruning logic to stop under-performing trials early, so we could discard less promising configurations. As seen in the following Optuna HPO history plot, trial 42 had the best parameters (highest F1 score) for the binary classification, whereas trial 32 was best for the multi-label model. Moreover, our analysis showed that dropout and learning rate were the most important hyperparameters, accounting for 48% and 21% of the variance in F1 score for the binary classification model. This explains why the model overfit quickly during earlier runs and underscores the importance of regularization.
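In Optuna, this search space would be declared inside an objective function via `trial.suggest_float` and `trial.suggest_categorical` calls. As a dependency-free sketch, the sampler below draws random configurations from the same ranges as the table; Optuna’s TPE would instead bias draws toward regions that scored well in past trials:

```python
import math
import random

# Ranges copied from the HPO table above (binary-model parameters only).
SEARCH_SPACE = {
    "learning_rate": ("loguniform", 5e-6, 5e-5),
    "weight_decay": ("uniform", 0.01, 0.5),
    "warmup_ratio": ("uniform", 0.0, 0.2),
    "dropout": ("uniform", 0.1, 0.5),
    "batch_size": ("choice", [16, 24, 32]),
    "grad_accum_steps": ("choice", [1, 4]),
}

def sample_config(rng):
    """Draw one trial configuration from the search space."""
    cfg = {}
    for name, spec in SEARCH_SPACE.items():
        kind = spec[0]
        if kind == "uniform":
            cfg[name] = rng.uniform(spec[1], spec[2])
        elif kind == "loguniform":
            # Sample uniformly in log space, as Optuna does for log ranges.
            lo, hi = math.log(spec[1]), math.log(spec[2])
            cfg[name] = math.exp(rng.uniform(lo, hi))
        else:  # choice
            cfg[name] = rng.choice(spec[1])
    return cfg

cfg = sample_config(random.Random(0))
```

In a real run, each sampled configuration would be trained and scored (here, by F1), with pruning terminating trials whose intermediate scores lag behind.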
After the optimization experiments:

- We identified the optimal hyperparameters for each task
- The models converged faster during training
- The final performance metrics showed measurable improvements over configurations we had tested manually

Running hyperparameter tuning in an automated fashion allowed our models to reach a high F1 score efficiently, which is crucial for production deployment.

Load testing and auto scaling policy

After fine-tuning and deploying the optimized model to a SageMaker real-time endpoint, we performed load testing to validate performance and auto scaling under pressure, to meet Harmonic Security’s latency, throughput, and elasticity needs. The objectives of the load testing were to:

- Validate the latency SLA (average under 500 milliseconds and P95 of approximately 1 second) under varying loads
- Determine throughput capacity (maximum RPM on ml.g5.4xlarge instances within the latency SLA)
- Inform the auto scaling policy design

The methodology involved the following:

- Traffic simulation – Locust simulated concurrent user traffic with varying text lengths (50–9,999 characters)
- Load pattern – Stepped ramp-up tests (60–2,000 RPM, 60 seconds each) identified bottlenecks and stress-tested limits

As shown in the following graph, the maximum throughput under a latency of 1 second was 1,185 RPM, so we set the auto scaling threshold to 70% of that, at 830 RPM. Based on the performance observed during load testing, we configured a target-tracking auto scaling policy for the SageMaker endpoint using Application Auto Scaling. The following figure illustrates this policy workflow. The key parameters were:

- Metric – SageMakerVariantInvocationsPerInstance (830 invocations/instance/minute)
- Min/max instances – 1–5
- Cooldown – Scale-out 300 seconds, scale-in 600 seconds

This target-tracking policy adjusts instance count based on traffic, maintaining performance and cost-efficiency.
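These parameters map directly onto an Application Auto Scaling target-tracking policy. The sketch below builds the request parameters; the endpoint and variant names are placeholders, and actually applying the policy would use boto3’s application-autoscaling client as noted in the comments:

```python
# Placeholder names -- not the real endpoint or variant.
ENDPOINT = "harmonic-modernbert-endpoint"
VARIANT = "AllTraffic"

def scaling_policy_params(endpoint, variant):
    """Build the target-tracking policy parameters described in the post."""
    resource_id = f"endpoint/{endpoint}/variant/{variant}"
    return {
        "PolicyName": "target-tracking-830-rpm",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # 70% of the measured 1,185 RPM single-instance ceiling.
            "TargetValue": 830.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 300,
            "ScaleInCooldown": 600,
        },
    }

params = scaling_policy_params(ENDPOINT, VARIANT)
# In production you would first register the scalable target with
# MinCapacity=1, MaxCapacity=5, then:
#   boto3.client("application-autoscaling").put_scaling_policy(**params)
```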
The following table summarizes our findings.

Model | Requests per minute
8B model | 800
ModernBERT with auto scaling (5 instances) | 1,185–5,925
Additional capacity (ModernBERT vs. 8B model) | 48%–640%

Results

This section showcases the impact of the fine-tuning and optimization efforts on Harmonic Security’s data leakage detection system, with a primary focus on latency reduction. Absolute latency improvements are detailed first, underscoring the success in meeting the sub-500 millisecond target, followed by an overview of accuracy improvements. The following subsections provide detailed results for binary M&A classification and multi-label classification across multiple sensitive data types.

Binary classification

We evaluated the fine-tuned ModernBERT-base model for binary M&A classification against the baseline 8B model introduced in the solution overview. The most striking achievement was the reduction in latency, addressing the initial 1–2 second delay that risked disrupting user experience. The leap to sub-500 millisecond latency is detailed in the following table.

Model | median_ms | p95_ms | p99_ms | p100_ms
ModernBERT-base-v2 | 46.03 | 81.19 | 102.37 | 183.11
8B model | 189.15 | 259.99 | 286.63 | 346.36
Difference | -75.66% | -68.77% | -64.28% | -47.13%

Building on this latency improvement, the following table shows percentage improvements in accuracy and F1 score relative to the 8B baseline.

Model | Accuracy improvement | F1 improvement
ModernBERT-base-v2 | +1.56% | +2.26%
8B model | – | –

These results show that ModernBERT-base-v2 delivers a substantial latency reduction, complemented by modest accuracy and F1 improvements of 1.56% and 2.26%, respectively, aligning with Harmonic Security’s objectives to enhance data leakage detection without impacting user experience.
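The additional-capacity row follows directly from the measured single-instance ceiling, assuming roughly linear scaling across the 1–5 instances:

```python
baseline_rpm = 800          # 8B model throughput
single_instance_rpm = 1185  # ModernBERT ceiling under 1 s latency
max_instances = 5

min_gain = single_instance_rpm / baseline_rpm - 1
max_gain = (single_instance_rpm * max_instances) / baseline_rpm - 1
# min_gain is about 0.48 (48%); max_gain is about 6.41, i.e. the
# 48%-640% range quoted in the table (which rounds the upper bound down).
```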
Multi-label classification

We evaluated the fine-tuned ModernBERT-large model for multi-label classification against the baseline 8B model, with latency reduction as the cornerstone of this approach. The most significant advancement was a substantial decrease in latency across all evaluated categories, achieving sub-500 millisecond responsiveness and addressing the previous 1–2 second bottleneck. The latency results in the following table underscore this improvement.

Dataset | Model | median_ms | p95_ms | p99_ms
Billing and payment | 8B model | 198 | 238 | 321
Billing and payment | ModernBERT-large | 158 | 199 | 246
Billing and payment | Difference | -20.13% | -16.62% | -23.60%
Sales pipeline | 8B model | 194 | 265 | 341
Sales pipeline | ModernBERT-large | 162 | 243 | 293
Sales pipeline | Difference | -16.63% | -8.31% | -13.97%
Financial projections | 8B model | 384 | 510 | 556
Financial projections | ModernBERT-large | 160 | 275 | 310
Financial projections | Difference | -58.24% | -46.04% | -44.19%
Investment portfolio | 8B model | 397 | 498 | 703
Investment portfolio | ModernBERT-large | 160 | 259 | 292
Investment portfolio | Difference | -59.69% | -47.86% | -58.46%

This approach also delivered a second key benefit: a reduction in computational parallelism by consolidating multiple classifications into a single pass. However, the multi-label model encountered challenges in maintaining consistent accuracy across all classes. Although categories like financial projections and investment portfolio showed promising accuracy gains, others, such as billing and payment and sales pipeline, experienced significant accuracy declines. This indicates that, despite its latency and parallelism advantages, the approach requires further development to maintain reliable accuracy across data types.
Conclusion

In this post, we explored how Harmonic Security collaborated with the AWS Generative AI Innovation Center to optimize their data leakage detection system, achieving the following key performance improvements:

- Latency reduction – From 1–2 seconds to under 500 milliseconds (76% reduction at the median)
- Throughput increase – 48%–640% additional capacity with auto scaling
- Accuracy gains – +1.56% for binary classification, with maintained precision across categories

By using SageMaker, Amazon Bedrock, and Amazon Nova Pro, Harmonic Security fine-tuned ModernBERT models that deliver sub-500 millisecond inference in production, meeting stringent performance goals while supporting EU compliance and establishing a scalable architecture. This partnership showcases how tailored AI solutions can tackle critical cybersecurity challenges without hindering productivity. Harmonic Security’s solution is now available on AWS Marketplace, enabling organizations to adopt AI tools safely while protecting sensitive data in real time. Looking ahead, these high-speed models have the potential to add further controls for additional AI workflows.

To learn more, consider the following next steps:

- Try Harmonic Security – Deploy the solution directly from AWS Marketplace to protect your organization’s generative AI usage
- Explore AWS services – Dive into SageMaker, Amazon Bedrock, and Amazon Nova Pro to build advanced AI-driven security solutions, and visit the AWS generative AI page for resources and tutorials
- Deep dive into fine-tuning – Explore the AWS Machine Learning Blog for in-depth guides on fine-tuning LLMs for specialized use cases
- Stay updated – Subscribe to the AWS Podcast for weekly insights on AI innovations and practical applications
- Connect with experts – Join the AWS Partner Network to collaborate with experts and scale your AI initiatives
- Attend AWS events – Register for AWS re:Invent to explore cutting-edge AI advancements and network with industry leaders
By adopting these steps, organizations can harness AI-driven cybersecurity to maintain robust data protection and seamless user experiences across diverse workflows.

About the authors

Babs Khalidson is a Deep Learning Architect at the AWS Generative AI Innovation Centre in London, where he specializes in fine-tuning large language models, building AI agents, and model deployment solutions. He has over 6 years of experience in artificial intelligence and machine learning across finance and cloud computing, with expertise spanning research to production deployment.

Vushesh Babu Adhikari is a Data Scientist at the AWS Generative AI Innovation Center in London with extensive expertise in developing generative AI solutions across diverse industries. He has over 7 years of experience spanning industries including finance, telecom, and information technology, with specialized expertise in machine learning and artificial intelligence.

Zainab Afolabi is a Senior Data Scientist at the AWS Generative AI Innovation Centre in London, where she leverages her extensive expertise to develop transformative AI solutions across diverse industries. She has over nine years of specialized experience in artificial intelligence and machine learning, as well as a passion for translating complex technical concepts into practical business applications.

Nuno Castro is a Sr. Applied Science Manager at the AWS Generative AI Innovation Center. He leads generative AI customer engagements, helping AWS customers find the most impactful use case from ideation and prototype through to production. He has 19 years of experience in the field across industries such as finance, manufacturing, and travel, and has led ML teams for 11 years.
Christelle Xu is a Senior Generative AI Strategist who leads model customization and optimization strategy across EMEA within the AWS Generative AI Innovation Center, working with customers to deliver scalable generative AI solutions focused on continued pre-training, fine-tuning, reinforcement learning, and training and inference optimization. She holds a Master’s degree in Statistics from the University of Geneva and a Bachelor’s degree from Brigham Young University.

Manuel Gomez is a Solutions Architect at AWS supporting generative AI startups across the UK and Ireland. He works with model producers, fine-tuning platforms, and agentic AI applications to design secure and scalable architectures. Before AWS, he worked in startups and consulting, and he has a background in industrial technologies and IoT. He is particularly interested in how multimodal AI can be applied to real industry problems.

Bryan Woolgar-O’Neil is the co-founder and CTO at Harmonic Security. With over 20 years of software development experience, he dedicated the last 10 to building the threat intelligence company Digital Shadows, which was acquired by ReliaQuest in 2022. His expertise lies in developing products based on cutting-edge software, focusing on making sense of large volumes of data.

Jamie Cockrill is the Director of Machine Learning at Harmonic Security, where he leads a team focused on building, training, and refining Harmonic’s small language models.

Adrian Cunliffe is a Senior Machine Learning Engineer at Harmonic Security, where he focuses on scaling Harmonic’s machine learning engine that powers Harmonic’s propri […]
How Myriad Genetics achieved fast, accurate, and cost-efficient document processing using the AWS open-source Generative AI Intelligent Document Processing Accelerator
This post was written with Martyna Shallenberg and Brode Mccrady from Myriad Genetics.

Healthcare organizations face challenges in processing and managing high volumes of complex medical documentation while maintaining quality in patient care. These organizations need solutions to process documents effectively to meet growing demands. Myriad Genetics, a provider of genetic testing and precision medicine solutions serving healthcare providers and patients worldwide, addresses this challenge.

Myriad’s Revenue Engineering Department processes thousands of healthcare documents daily across the Women’s Health, Oncology, and Mental Health divisions. The company classifies incoming documents into classes such as Test Request Forms, Lab Results, Clinical Notes, and Insurance to automate prior authorization workflows, and routes documents to appropriate external vendors for processing based on their identified document class. Staff manually perform key information extraction (KIE), including insurance details, patient information, and test results, to determine Medicare eligibility and support downstream processes.

As document volumes increased, Myriad faced challenges with its existing system. The automated document classification solution worked but was costly and time-consuming, and information extraction remained manual due to its complexity. To address high costs and slow processing, Myriad needed a better solution.

This post explores how Myriad Genetics partnered with the AWS Generative AI Innovation Center (GenAIIC) to transform their healthcare document processing pipeline using Amazon Bedrock and Amazon Nova foundation models. We detail the challenges with their existing solution and how generative AI reduced costs and improved processing speed.
We examine the technical implementation using AWS’s open source GenAI Intelligent Document Processing (GenAI IDP) Accelerator, the optimization strategies used for document classification and key information extraction, and the measurable business impact on Myriad’s prior authorization workflows. We cover how we used prompt engineering techniques, model selection strategies, and architectural decisions to build a scalable solution that processes complex medical documents with high accuracy while reducing operational costs.

Document processing bottlenecks limiting healthcare operations

Myriad Genetics’ daily operations depend on efficiently processing complex medical documents containing critical information for patient care workflows and regulatory compliance. Their existing solution combined Amazon Textract for optical character recognition (OCR) with Amazon Comprehend for document classification. Despite 94% classification accuracy, this solution had operational challenges:

- Operational costs – 3 cents per page, resulting in $15,000 in monthly expenses per business unit
- Classification latency – 8.5 minutes per document, delaying downstream prior authorization workflows
- Manual extraction – Information extraction was entirely manual, requiring contextual understanding to differentiate critical clinical distinctions (like “is metastatic” versus “is not metastatic”) and to locate information like insurance numbers and patient details across varying document formats

This processing burden was substantial: in the Women’s Health business unit alone, customer service required up to 10 full-time employees contributing 78 hours daily.
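For a sense of scale, the cost figures above imply the monthly page volume directly:

```python
# Implied monthly page volume per business unit from the figures above:
cost_per_page_cents = 3
monthly_cost_dollars = 15_000
pages_per_month = monthly_cost_dollars * 100 // cost_per_page_cents
# 500,000 pages per business unit per month
```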
Myriad needed a solution to:

- Reduce document classification costs while maintaining or improving accuracy
- Accelerate document processing to eliminate workflow bottlenecks
- Automate information extraction for medical documents
- Scale across multiple business units and document types

Amazon Bedrock and generative AI

Modern large language models (LLMs) process complex healthcare documents with high accuracy due to pre-training on massive text corpora. This pre-training enables LLMs to understand language patterns and document structures without feature engineering or large labeled datasets. Amazon Bedrock is a fully managed service that offers a broad range of high-performing LLMs from leading AI companies. It provides the security, privacy, and responsible AI capabilities that healthcare organizations require when processing sensitive medical information. For this solution, we used Amazon’s newest foundation models:

- Amazon Nova Pro – A cost-effective, low-latency model well suited to document classification
- Amazon Nova Premier – An advanced model with reasoning capabilities for information extraction

Solution overview

We implemented a solution with Myriad using AWS’s open source GenAI IDP Accelerator. The accelerator provides a scalable, serverless architecture that converts unstructured documents into structured data. It processes multiple documents in parallel through configurable concurrency limits without overwhelming downstream services. Its built-in evaluation framework lets users provide expected output through the user interface (UI) and evaluate generated results, so they can iteratively customize the configuration and improve accuracy.
The accelerator offers 1-click deployment with a choice of pre-built patterns optimized for workloads with different configurability, cost, and accuracy requirements:

- Pattern 1 – Uses Amazon Bedrock Data Automation, a fully managed service that offers rich out-of-the-box features, ease of use, and straightforward per-page pricing. This pattern is recommended for most use cases.
- Pattern 2 – Uses Amazon Textract and Amazon Bedrock with Amazon Nova, Anthropic’s Claude, or custom fine-tuned Amazon Nova models. This pattern is ideal for complex documents requiring custom logic.
- Pattern 3 – Uses Amazon Textract, Amazon SageMaker with a fine-tuned model for classification, and Amazon Bedrock for extraction. This pattern is ideal for documents requiring specialized classification.

Pattern 2 proved most suitable for this project, meeting the critical requirement of low cost while offering the flexibility to optimize accuracy through prompt engineering and LLM selection. This pattern offers no-code configuration: document types, extraction fields, and processing logic are customized through configuration, editable in the web UI. Via Pattern 2’s config file, we customized the document class definitions, the key attributes and their definitions per document class, the LLM choice, LLM hyperparameters, and the classification and extraction prompts.

In production, Myriad integrated this solution into their existing event-driven architecture. The following diagram illustrates the production pipeline:

- Document ingestion – Incoming order events trigger document retrieval from source document management systems, with cache optimization for previously processed documents
- Concurrency management – Amazon DynamoDB tracks concurrent AWS Step Functions jobs, while Amazon Simple Queue Service (Amazon SQS) queues files that exceed concurrency limits for document processing
- Text extraction – Amazon Textract extracts text, layout information, tables, and forms from the normalized documents
4. Classification: The configured LLM analyzed the extracted content based on the customized classification prompt provided in the config file and classified documents into the appropriate categories.
5. Key information extraction: The configured LLM extracted medical information using the extraction prompt provided in the config file.
6. Structured output: The pipeline formatted the results in a structured manner and delivered them to Myriad's authorization system via RESTful operations.

Document classification with generative AI

While Myriad's existing solution achieved 94% accuracy, misclassifications occurred due to structural similarities, overlapping content, and shared formatting patterns across document types. This semantic ambiguity made it difficult to distinguish between similar documents. We guided Myriad on prompt optimization techniques that used the LLM's contextual understanding. This approach moved beyond pattern matching to enable semantic analysis of document context and purpose, identifying distinguishing features that human experts recognize but previous automated systems missed.

AI-driven prompt engineering for document classification

We developed class definitions with distinguishing characteristics between similar document types. To identify these differentiators, we provided document samples from each class to Anthropic's Claude 3.7 Sonnet on Amazon Bedrock with model reasoning enabled (a feature that lets the model show its step-by-step analysis). The model identified distinguishing features between similar document classes, which Myriad's subject matter experts refined and incorporated into the classification prompts in the GenAI IDP Accelerator's Pattern 2 config file.

Format-based classification strategies

We used document structure and formatting as key differentiators to distinguish between similar document types that shared comparable content but differed in structure.
We enabled the classification models to recognize format-specific characteristics such as layout structures, field arrangements, and visual elements, allowing the system to differentiate between documents that textual content alone cannot distinguish. For example, lab reports and test results both contain patient information and medical data, but lab reports display numerical values in tabular format while test results follow a narrative format. We instructed the LLM: "Lab reports contain numerical results organized in tables with reference ranges and units. Test results present findings in paragraph format with clinical interpretations."

Implementing negative prompting for enhanced accuracy

We implemented negative prompting to resolve confusion between similar documents by explicitly instructing the model which classifications to avoid. This approach added exclusionary language to the classification prompts, specifying characteristics that should not be associated with each document type. Initially, the system frequently misclassified Test Request Forms (TRFs) as Test Results because it confused patient medical history with lab measurements. Adding a negative prompt such as "These forms contain patient medical history. DO NOT confuse them with test results which contain current/recent lab measurements" to the TRF definition improved classification accuracy by 4%. With explicit guidance on common misclassification patterns, the system avoided typical errors and confusion between similar document types.

Model selection for cost and performance optimization

Model selection drives cost-performance at scale, so we conducted comprehensive benchmarking using the GenAI IDP Accelerator's evaluation framework. We tested four foundation models (Amazon Nova Lite, Amazon Nova Pro, Amazon Nova Premier, and Anthropic's Claude 3.7 Sonnet) on 1,200 healthcare documents across three document classes: Test Request Forms, Lab Results, and Insurance.
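The negative prompting technique described earlier can be sketched as a small piece of prompt assembly. The "DO NOT" wording for TRFs is taken from this post; the second class definition and the overall template are illustrative assumptions, not the accelerator's actual config format.

```python
# Sketch: per-class definitions that pair a positive description with
# exclusionary ("DO NOT") guidance, then render them into the section
# of a classification prompt that lists the candidate classes.

CLASS_DEFINITIONS = {
    "Test Request Form": {
        "description": "Forms ordering a test; contain patient medical history.",
        "negative": (
            "DO NOT confuse them with test results which contain "
            "current/recent lab measurements."
        ),
    },
    # Hypothetical counterpart definition for illustration.
    "Test Result": {
        "description": "Reports of completed tests with current lab measurements.",
        "negative": (
            "DO NOT confuse them with request forms, which collect "
            "history before testing."
        ),
    },
}

def render_class_guidance(definitions: dict) -> str:
    """Render each class as one line of positive plus negative guidance."""
    lines = [
        f"- {name}: {d['description']} {d['negative']}"
        for name, d in definitions.items()
    ]
    return "\n".join(lines)

guidance = render_class_guidance(CLASS_DEFINITIONS)
```

Keeping the exclusionary sentence attached to its class definition means every classification call carries the disambiguation rule, which is what closed the TRF-versus-result confusion here.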
We assessed each model using three critical metrics: classification accuracy, processing latency, and cost per document. The accelerator's cost tracking enabled direct comparison of operational expenses across model configurations, ensuring that performance improvements translate into measurable business value at scale. The evaluation showed that Amazon Nova Pro achieved the best balance for Myriad's use case. We transitioned from Myriad's Amazon Comprehend implementation to Amazon Nova Pro with optimized prompts for document classification, achieving significant improvements: classification accuracy increased from 94% to 98%, processing costs decreased by 77%, and processing speed improved by 80%, reducing classification time from 8.5 minutes to 1.5 minutes per document.

Automating key information extraction with generative AI

Myriad's information extraction was manual, requiring up to 10 full-time employees contributing 78 hours daily in the Women's Health unit alone, which created operational bottlenecks and scalability constraints. Automating healthcare key information extraction (KIE) presented several challenges: checkbox fields required distinguishing between marking styles (checkmarks, X's, handwritten marks); documents contained ambiguous visual elements such as overlapping marks or content spanning multiple fields; and extraction needed contextual understanding to differentiate clinical distinctions and locate information across varying document formats. We worked with Myriad to develop an automated KIE solution, applying the following optimization techniques to address extraction complexity.

Enhanced OCR configuration for checkbox recognition

To address checkbox identification challenges, we enabled Amazon Textract's specialized TABLES and FORMS features on the GenAI IDP Accelerator portal, as shown in the following image, to improve OCR discrimination between selected and unselected checkbox elements.
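In the accelerator these features are toggled through the portal, but the underlying Textract request can be sketched as follows. The bucket and object names are hypothetical, and the call itself is shown only as a comment.

```python
# Sketch: request parameters enabling Amazon Textract's TABLES, FORMS,
# and LAYOUT analysis features for a document stored in S3.
# (Bucket and key are made-up example values.)

def build_analyze_request(bucket: str, key: str) -> dict:
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["TABLES", "FORMS", "LAYOUT"],
    }

params = build_analyze_request("example-idp-bucket", "forms/trf-001.png")
# In production this would be passed to a boto3 Textract client:
#   textract.analyze_document(**params)
```

With TABLES and FORMS enabled, Textract returns key-value pairs and selection elements (checked/unchecked checkboxes) rather than raw text alone, which is what gives the downstream LLM usable checkbox signals.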
These features enhanced the system's ability to detect and interpret the marking styles found in medical forms. We further improved accuracy by incorporating visual cues into the extraction prompts, adding instructions such as "look for visible marks in or around the small square boxes (✓, x, or handwritten marks)" to guide the language model in identifying checkbox selections. This combination of enhanced OCR capabilities and targeted prompting improved checkbox extraction in medical forms.

Visual context learning through few-shot examples

Configuring Amazon Textract and improving prompts alone could not handle complex visual elements effectively. We implemented a multimodal approach that sent both document images and the text extracted by Textract to the foundation model, enabling simultaneous analysis of visual layout and textual content for accurate extraction decisions. We implemented few-shot learning by providing example document images paired with their expected extraction outputs to guide the model's understanding of various form layouts and marking styles. Because multiple document image examples with their correct extraction patterns create lengthy prompts, we used the GenAI IDP Accelerator's built-in integration with Amazon Bedrock's prompt caching feature to reduce costs and latency. Prompt caching stores the lengthy few-shot examples in memory for 5 minutes; when multiple similar documents are processed within that timeframe, Amazon Bedrock reuses the cached examples instead of reprocessing them, reducing both cost and processing time.

Chain-of-thought reasoning for complex extraction

While this multimodal approach improved extraction accuracy, we still faced challenges with overlapping and ambiguous tick marks in complex form layouts. To perform well in these ambiguous situations, we used Amazon Nova Premier and implemented chain-of-thought reasoning, having the model think through extraction decisions step by step using thinking tags.
For example:

Analyze the checkbox marks in this form:
1. What checkboxes are present? [List all visible options]
2. Where are the marks positioned? [Describe mark locations]
3. Which marks are clear vs. ambiguous? [Assess mark quality]
4. For overlapping marks: which checkbox contains most of the mark?
5. Are marks positioned in the center or touching edges? [Prioritize center positioning]

Additionally, we included reasoning explanations in the few-shot examples, demonstrating how we reached conclusions in ambiguous cases. This enabled the model to work through complex visual evidence and contextual clues before making final determinations, improving performance on ambiguous tick marks. Testing across 32 document samples of varying complexity via the GenAI IDP Accelerator revealed that Amazon Textract with the LAYOUT, TABLES, and FORMS features enabled, paired with Amazon Nova Premier's advanced reasoning capabilities and the inclusion of few-shot examples, delivered the best results. The solution achieved 90% accuracy (matching the human evaluator baseline) while processing documents in approximately 1.3 minutes each.

Results and business impact

Through the new solution, we delivered measurable improvements that met the business goals established at the project outset. Document classification performance:

- We increased accuracy from 94% to 98% through prompt optimization techniques for Amazon Nova Pro, including AI-driven prompt engineering, format-based classification strategies, and negative prompting.
- We reduced classification costs by 77% (from 3.1 to 0.7 cents per page) by migrating from Amazon Comprehend to Amazon Nova Pro with optimized prompts.
- We reduced classification time by 80% (from 8.5 to 1.5 minutes per document) by choosing Amazon Nova Pro as a low-latency, cost-effective model.
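The per-page and per-document figures above can be cross-checked with a quick calculation. All input numbers are those reported in this post; note that the time figure works out to about 82%, which the post rounds to 80%.

```python
# Sanity check on the reported classification improvements,
# using the per-page cost and per-document time figures from the post.
cost_before, cost_after = 3.1, 0.7    # cents per page
time_before, time_after = 8.5, 1.5    # minutes per document

cost_reduction = (cost_before - cost_after) / cost_before
time_reduction = (time_before - time_after) / time_before

print(f"cost reduction: {cost_reduction:.0%}")  # ~77%
print(f"time reduction: {time_reduction:.0%}")  # ~82%, reported as ~80%
```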
Automated key information extraction performance:

- We achieved 90% extraction accuracy, matching the baseline manual process, through a combination of Amazon Textract's document analysis capabilities, visual context learning through few-shot examples, and Amazon Nova Premier's reasoning for complex data interpretation.
- We achieved processing costs of 9 cents per page and a processing time of 1.3 minutes per document, compared to a manual baseline requiring up to 10 full-time employees working 78 hours daily per business unit.

Business impact and rollout

Myriad has planned a phased rollout beginning with document classification. They plan to launch the new classification solution in the Women's Health business unit, followed by the Oncology and Mental Health divisions. As a result of this work, Myriad will realize up to $132K in annual savings on document classification costs. The solution also reduces each prior authorization submission time by 2 minutes: specialists now complete orders in four minutes instead of six thanks to faster access to tagged documents. This improvement saves 300 hours monthly across 9,000 prior authorizations in Women's Health alone, equivalent to 50 hours per prior authorization specialist.

These measurable improvements have transformed Myriad's operations, as their engineering leadership confirms: "Partnering with the GenAIIC to migrate our Intelligent Document Processing solution from AWS Comprehend to Bedrock has been a transformative step forward. By improving both performance and accuracy, the solution is projected to deliver savings of more than $10,000 per month. The team's close collaboration with Myriad's internal engineering team delivered a high-quality, scalable solution, while their deep expertise in advanced language models has elevated our capabilities.
This has been an excellent example of how innovation and partnership can drive measurable business impact." – Martyna Shallenberg, Senior Director of Software Engineering, Myriad Genetics

Conclusion

The AWS GenAI IDP Accelerator enabled Myriad's rapid implementation, providing a flexible framework that reduced development time. Healthcare organizations need tailored solutions, and the accelerator delivers extensive customization capabilities that let users adapt it to specific document types and workflows without extensive code changes or frequent redeployment during development. Our approach demonstrates the power of strategic prompt engineering and model selection. We achieved high accuracy in a specialized domain by focusing on prompt design, including negative prompting and visual cues. We optimized both cost and performance by selecting Amazon Nova Pro for classification and Amazon Nova Premier for complex extraction, matching the right model to each task.

Explore the solution for yourself

Organizations looking to improve their document processing workflows can experience these benefits firsthand. The open source GenAI IDP Accelerator that powered Myriad's transformation is available to deploy and test in your environment. The accelerator's straightforward setup process lets users quickly evaluate how generative AI can transform document processing challenges. Once you've explored the accelerator and seen its potential impact on your workflows, reach out to the AWS GenAIIC team to learn how it can be customized and optimized for your specific use case. This hands-on approach ensures you can make informed decisions about implementing intelligent document processing in your organization.

About the authors

Priyashree Roy is a Data Scientist II at the AWS Generative AI Innovation Center, where she applies her expertise in machine learning and generative AI to develop innovative solutions for strategic AWS customers.
She brings a rigorous scientific approach to complex business challenges, informed by her PhD in experimental particle physics from Florida State University and postdoctoral research at the University of Michigan.

Mofijul Islam is an Applied Scientist II and Tech Lead at the AWS Generative AI Innovation Center, where he helps customers tackle customer-centric research and business challenges using generative AI, large language models (LLMs), multi-agent learning, code generation, and multimodal learning. He holds a PhD in machine learning from the University of Virginia, where his work focused on multimodal machine learning, multilingual natural language processing (NLP), and multitask learning. His research has been published in top-tier conferences such as NeurIPS, the International Conference on Learning Representations (ICLR), Empirical Methods in Natural Language Processing (EMNLP), the International Conference on Artificial Intelligence and Statistics (AISTATS), and the Association for the Advancement of Artificial Intelligence (AAAI), as well as in Institute of Electrical and Electronics Engineers (IEEE) and Association for Computing Machinery (ACM) Transactions.

Nivedha Balakrishnan is a Deep Learning Architect II at the AWS Generative AI Innovation Center, where she helps customers design and deploy generative AI applications to solve complex business challenges. Her expertise spans large language models (LLMs), multimodal learning, and AI-driven automation. She holds a Master's in Applied Data Science from San Jose State University and a Master's in Biomedical Engineering from Linköping University, Sweden. Her previous research focused on AI for drug discovery and healthcare applications, bridging life sciences with machine learning.

Martyna Shallenberg is a Senior Director of Software Engineering at Myriad Genetics, where she leads cross-functional teams in building AI-driven enterprise solutions that transform revenue cycle operations and healthcare delivery.
With a unique background spanning genomics, molecular diagnostics, and software engineering, she has scaled innovative platforms ranging from Intelligent Document Processing (IDP) to modular LIMS solutions. Martyna is also the Founder and President of BioHive's HealthTech Hub, fostering cross-domain collaboration to accelerate precision medicine and healthcare innovation.

Brode Mccrady is a Software Engineering Manager at Myriad Genetics, where he leads initiatives in AI, revenue systems, and intelligent document processing. With over a decade of experience in business intelligence and strategic analytics, Brode brings deep expertise in translating complex business needs into scalable technical solutions. He holds a degree in Economics, which informs his data-driven approach to problem-solving and business strategy.

Randheer Gehlot is a Principal Customer Solutions Manager at AWS who specializes in healthcare and life sciences transformation. With a deep focus on AI/ML applications in healthcare, he helps enterprises design and implement efficient cloud solutions that address real business challenges. His work involves partnering with organizations to modernize their infrastructure, enable innovation, and accelerate their cloud adoption journey while ensuring practical, sustainable outcomes.

Acknowledgements

We would like to thank Bob Strahan, Kurt Mason, Akhil Nooney and Taylor Jensen for their significant contributions, strategic dec […]