Fine-tune and deploy a summarizer model using the Hugging Face Amazon SageMaker containers bringing your own script
There have been many recent advancements in the NLP domain. Pre-trained models and fully managed NLP services have democratised access to and adoption of NLP. Amazon Comprehend is a fully managed service that can perform NLP tasks like custom entity recognition, topic modelling, sentiment analysis, and more to extract insights from data without requiring any prior ML experience.
Last year, AWS announced a partnership with Hugging Face to help bring natural language processing (NLP) models to production faster. Hugging Face is an open-source AI community, focused on NLP. Their Python-based library (Transformers) provides tools to easily use popular state-of-the-art Transformer architectures like BERT, RoBERTa, and GPT. You can apply these models to a variety of NLP tasks, such as text classification, information extraction, and question answering, among others.
Amazon SageMaker is a fully managed service that provides developers and data scientists the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the ML process, making it easier to develop high-quality models. The SageMaker Python SDK provides open-source APIs and containers to train and deploy models on SageMaker, using several different ML and deep learning frameworks.
The Hugging Face integration with SageMaker allows you to build Hugging Face models at scale on your own domain-specific use cases.
In this post, we walk you through an example of how to build and deploy a custom Hugging Face text summarizer on SageMaker. We use Pegasus [1] for this purpose, the first Transformer-based model specifically pre-trained on an objective tailored for abstractive text summarization. BERT is pre-trained on masking random words in a sentence; in contrast, during Pegasus’s pre-training, sentences are masked from an input document. The model then generates the missing sentences as a single output sequence using all the unmasked sentences as context, creating an executive summary of the document as a result.
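To make the pre-training objective described above concrete, the following is a toy sketch of how one (input, target) pair could be built by masking whole sentences. It is not the Pegasus implementation: the function name, the `<mask_1>` placeholder, and the sample sentences are illustrative assumptions, and the indices of the sentences to mask are passed in by hand, whereas the paper selects them with an importance heuristic.

```python
def gap_sentence_example(sentences, masked_indices, mask_token="<mask_1>"):
    """Build one toy (input, target) pair by masking whole sentences."""
    masked = set(masked_indices)
    # In the input, every selected sentence is replaced by a mask placeholder.
    source = " ".join(
        mask_token if i in masked else sentence
        for i, sentence in enumerate(sentences)
    )
    # The target is the selected sentences, concatenated in document order.
    target = " ".join(
        sentence for i, sentence in enumerate(sentences) if i in masked
    )
    return source, target


# Invented three-sentence document, for illustration only.
document = [
    "The report covers three quarters.",
    "Revenue grew in every region.",
    "Costs stayed flat.",
]
source, target = gap_sentence_example(document, masked_indices=[1])
print(source)
print(target)
```

The model sees `source` and is trained to generate `target`, which is why the pre-training task resembles writing a summary from the remaining context.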
Thanks to the flexibility of the Hugging Face library, you can easily adapt the code shown in this post for other types of transformer models, such as T5, BART, and more.
Load your own dataset to fine-tune a Hugging Face model
To load a custom dataset from a CSV file, we use the load_dataset method from the Hugging Face Datasets package. We can apply tokenization to the loaded dataset using the datasets.Dataset.map function. The map function iterates over the loaded dataset and applies the tokenize function to each example. The tokenized dataset can then be passed to the trainer for fine-tuning the model. See the following code:
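The following is a minimal sketch of this loading and tokenization step, not the authors' original code. The column names `text` and `summary`, the maximum lengths, and the helper names are assumptions. The `datasets` import is kept inside the loading function so that the tokenization helper can be read and tried on its own.

```python
def tokenize_batch(batch, tokenizer, text_column="text", summary_column="summary",
                   max_input_length=512, max_target_length=64):
    """Tokenize a batch of examples; column names and lengths are assumptions."""
    model_inputs = tokenizer(
        batch[text_column], max_length=max_input_length, truncation=True
    )
    # Pegasus uses the same vocabulary for inputs and targets, so the same
    # tokenizer call is applied to the summaries.
    targets = tokenizer(
        batch[summary_column], max_length=max_target_length, truncation=True
    )
    model_inputs["labels"] = targets["input_ids"]
    return model_inputs


def load_and_tokenize(csv_path, tokenizer):
    """Load a CSV file as a dataset and tokenize it with Dataset.map."""
    from datasets import load_dataset  # Hugging Face Datasets library

    dataset = load_dataset("csv", data_files={"train": csv_path})["train"]
    return dataset.map(
        lambda batch: tokenize_batch(batch, tokenizer),
        batched=True,
        remove_columns=dataset.column_names,  # keep only the tokenized fields
    )
```

Padding is deliberately left out here; a data collator such as `DataCollatorForSeq2Seq` can pad each batch dynamically at training time.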
Build your training script for the Hugging Face SageMaker estimator
As explained in the post AWS and Hugging Face collaborate to simplify and accelerate adoption of Natural Language Processing models, training a Hugging Face model on SageMaker has never been easier. We can do so by using the Hugging Face estimator from the SageMaker SDK.
The following code snippet fine-tunes Pegasus on our dataset. You can also find many sample notebooks that guide you through fine-tuning different types of models, available directly in the transformers GitHub repository. To enable distributed training, we can use the Data Parallelism Library in SageMaker, which has been built into the Hugging Face Trainer API. To enable data parallelism, we need to define the distribution parameter in our Hugging Face estimator.
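As a hedged sketch of what such an estimator definition can look like, the helper below assembles the arguments, including the `distribution` setting that turns on SageMaker data parallelism. The script location, framework versions, instance type, base checkpoint, and hyperparameter names are all assumptions to adapt to your own setup; only the `HuggingFace` estimator class and the shape of the `distribution` dictionary come from the SageMaker Python SDK.

```python
def build_estimator_kwargs(role, instance_type="ml.p3.16xlarge", instance_count=1):
    """Assemble arguments for the Hugging Face estimator (values are assumptions)."""
    # Turns on the SageMaker distributed data parallel library.
    distribution = {"smdistributed": {"dataparallel": {"enabled": True}}}
    return {
        "entry_point": "train.py",       # assumed script name
        "source_dir": "./scripts",       # assumed local folder holding the script
        "role": role,
        "instance_type": instance_type,
        "instance_count": instance_count,
        "transformers_version": "4.6.1",  # assumed container versions
        "pytorch_version": "1.7.1",
        "py_version": "py36",
        "hyperparameters": {
            "model_name": "google/pegasus-xsum",  # assumed base checkpoint
            "epochs": 5,
            "train_batch_size": 8,
        },
        "distribution": distribution,
    }


def create_estimator(role):
    """Create the estimator; requires the SageMaker Python SDK."""
    from sagemaker.huggingface import HuggingFace

    return HuggingFace(**build_estimator_kwargs(role))
```

Training would then be started with something like `create_estimator(role).fit({"train": "s3://<your-bucket>/<prefix>"})`, where the S3 location is a placeholder for your own data.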
The maximum training batch size you can configure depends on the model size and the GPU memory of the instance used. If SageMaker distributed training is enabled, the total batch size is the sum of the batches distributed across all devices (GPUs). For example, an ml.p3.16xlarge instance (8 GPUs) used with distributed training provides eight times as much GPU memory as an ml.p3.2xlarge instance (1 GPU). The batch size per device remains the same, but eight devices train in parallel.
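The batch size arithmetic can be written out as a short calculation. The per-device batch size of 8 is an arbitrary example value, and the function name is illustrative.

```python
def effective_batch_size(per_device_batch_size, gpus_per_instance, instance_count=1):
    """Total number of examples processed per optimization step across all GPUs."""
    return per_device_batch_size * gpus_per_instance * instance_count


single_gpu = effective_batch_size(per_device_batch_size=8, gpus_per_instance=1)
eight_gpus = effective_batch_size(per_device_batch_size=8, gpus_per_instance=8)
print(single_gpu, eight_gpus)  # 8 64
```

Because the total grows with the number of devices, you may want to revisit the learning rate when moving from one GPU to several.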
As usual with SageMaker, we create a train.py script to use with Script Mode and pass hyperparameters for training. The following code snippet for Pegasus loads the model and trains it using the Transformers Trainer class:
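A minimal sketch of such a training script is shown below; it is not the authors' script, which is available in their repository. It assumes the training channel contains a dataset that was already tokenized and saved with `save_to_disk`, and the hyperparameter names, default values, and base checkpoint are assumptions. `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` are environment variables set by SageMaker inside the training container.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters from the estimator and paths from SageMaker."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", type=str, default="google/pegasus-xsum")
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--train_batch_size", type=int, default=8)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train_dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    args, _ = parser.parse_known_args(argv)
    return args


def train(args):
    """Fine-tune the model with the Transformers Trainer and save the result."""
    from datasets import load_from_disk
    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                              DataCollatorForSeq2Seq, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(args.model_name)
    # Assumption: the channel holds a tokenized dataset written by save_to_disk.
    train_dataset = load_from_disk(args.train_dir)

    training_args = TrainingArguments(
        output_dir=args.model_dir,
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.train_batch_size,
        learning_rate=args.learning_rate,
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        # Pads inputs and labels dynamically for each batch.
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    # Everything written to model_dir is packaged by SageMaker as model.tar.gz.
    trainer.save_model(args.model_dir)
    tokenizer.save_pretrained(args.model_dir)
```

In the actual script, `train(parse_args())` would be called under the usual `if __name__ == "__main__":` guard.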
The full code is available on GitHub.
Deploy the trained Hugging Face model to SageMaker
Our friends at Hugging Face have made inference on SageMaker for Transformers models simpler than ever thanks to the SageMaker Hugging Face Inference Toolkit. You can directly deploy the previously trained model by simply setting up the environment variable "HF_TASK":"summarization" (for instructions, see Pegasus Models), choosing Deploy, and then choosing Amazon SageMaker, without needing to write an inference script.
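For a model trained in your own account, the same zero-code path can be sketched with the SageMaker Python SDK as follows. This is an illustration under assumptions: the framework versions and instance type are placeholders, and `model_data` is expected to point to the model.tar.gz archive produced by your training job.

```python
def build_zero_code_model_kwargs(model_data, role):
    """Arguments for deploying without an inference script (versions are assumptions)."""
    return {
        "model_data": model_data,  # S3 URI of the trained model archive
        "role": role,
        "transformers_version": "4.6.1",
        "pytorch_version": "1.7.1",
        "py_version": "py36",
        # Tells the inference toolkit which pipeline to build for the model.
        "env": {"HF_TASK": "summarization"},
    }


def deploy_without_script(model_data, role, instance_type="ml.m5.xlarge"):
    """Create the model and deploy it; requires the SageMaker Python SDK."""
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(**build_zero_code_model_kwargs(model_data, role))
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```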
However, if you need some specific way to generate or postprocess predictions, for example generating several summary suggestions based on a list of different text generation parameters, writing your own inference script can be useful and relatively straightforward:
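The following sketch illustrates what such an inference script can look like; it is not the authors' script. The request schema, a JSON body with a `text` field and a `parameters_list` of generation settings, is a hypothetical choice made for this example. The function names and signatures follow the handler convention of the inference toolkit.

```python
import json


def model_fn(model_dir):
    """Load the fine-tuned model and tokenizer from the unpacked model archive."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer


def input_fn(request_body, request_content_type):
    """Parse the request; the 'text' and 'parameters_list' fields are assumptions."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    payload = json.loads(request_body)
    return {
        "text": payload["text"],
        # One dict of generation parameters per requested summary.
        "parameters_list": payload.get("parameters_list", [{}]),
    }


def predict_fn(data, model_and_tokenizer):
    """Generate one summary per set of text generation parameters."""
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["text"], return_tensors="pt", truncation=True)
    summaries = []
    for params in data["parameters_list"]:
        output_ids = model.generate(**inputs, **params)
        summaries.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    return summaries
```

Passing the generation parameters through the request keeps the endpoint generic: the caller decides how many summaries to ask for and how each one is generated.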
As shown in the preceding code, such an inference script for Hugging Face on SageMaker only needs the following template functions:
- model_fn() – Reads the content of what was saved at the end of the training job inside SM_MODEL_DIR, or from an existing model weights directory saved as a tar.gz file in Amazon Simple Storage Service (Amazon S3). It’s used to load the trained model and associated tokenizer.
- input_fn() – Formats the data received from a request made to the endpoint.
- predict_fn() – Uses the output of model_fn() (the model and tokenizer) to run inference on the output of input_fn() (the formatted data).
Optionally, you can create an output_fn() function for inference formatting, using the output of predict_fn(), which we didn’t demonstrate in this post.
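For completeness, a possible `output_fn()` could look like the sketch below. It assumes that `predict_fn()` returns a list of summary strings and wraps them in a JSON object under a hypothetical `summaries` key.

```python
import json


def output_fn(prediction, accept):
    """Serialize the predictions; the 'summaries' key is an assumption."""
    if accept != "application/json":
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps({"summaries": prediction})
```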
We can then deploy the trained Hugging Face model with its associated inference script to SageMaker using the Hugging Face SageMaker Model class:
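A hedged sketch of this deployment step follows. The script name `inference.py`, the source folder, the framework versions, and the instance type are assumptions; `HuggingFaceModel` and its `deploy` method come from the SageMaker Python SDK.

```python
def build_custom_model_kwargs(model_data, role):
    """Arguments for a model served with a custom inference script (assumed values)."""
    return {
        "model_data": model_data,        # S3 URI of the trained model archive
        "role": role,
        "entry_point": "inference.py",   # assumed name of the inference script
        "source_dir": "./inference_code",  # assumed local folder holding the script
        "transformers_version": "4.6.1",
        "pytorch_version": "1.7.1",
        "py_version": "py36",
    }


def deploy_with_script(model_data, role, instance_type="ml.g4dn.xlarge"):
    """Create the model and deploy it to a real-time endpoint."""
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(**build_custom_model_kwargs(model_data, role))
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

The returned predictor object can then be used to send requests to the endpoint, and `predictor.delete_endpoint()` removes it when you are done.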
Test the deployed model
For this demo, we trained the model on the Women’s E-Commerce Clothing Reviews dataset, which contains reviews of clothing articles (which we consider as the input text) and their associated titles (which we consider as summaries). After we remove reviews with missing titles, the dataset contains 19,675 reviews. Fine-tuning the Pegasus model on a training set containing 70% of those reviews for five epochs took approximately 3.5 hours on an ml.p3.16xlarge instance.
We can then deploy the model and test it with some example data from the test set. The following is an example review describing a sweater:
Thanks to our custom inference script hosted in a SageMaker endpoint, we can generate several summaries for this review with different text generation parameters. For example, we can ask the endpoint to generate a range of very short to moderately long summaries specifying different length penalties (the smaller the length penalty, the shorter the generated summary). The following are some parameter input examples, and the subsequent machine-generated summaries:
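As an illustration of how such a request could be assembled, the helper below builds one set of generation parameters per length penalty. The field names `text` and `parameters_list`, the penalty values, and the number of beams are hypothetical and must match whatever your own inference script expects. Beam search is enabled because the length penalty only affects beam-based generation.

```python
def build_request(review_text, length_penalties, num_beams=4):
    """Build a request asking for one summary per length penalty (assumed schema)."""
    return {
        "text": review_text,
        "parameters_list": [
            {"length_penalty": penalty, "num_beams": num_beams}
            for penalty in length_penalties
        ],
    }


request = build_request("<review text>", length_penalties=[0.6, 1.0, 1.4])
print(len(request["parameters_list"]))  # 3
```

With a deployed `predictor`, the request would be sent with `predictor.predict(request)`.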
Which summary do you prefer? The first generated title captures all the important facts about the review, with a quarter the number of words. In contrast, the last one only uses three words (less than 1/10th the length of the original review) to focus on the most important feature of the sweater.
Conclusion
You can fine-tune a text summarizer on your custom dataset and deploy it to production on SageMaker with this simple example available on GitHub. Additional sample notebooks to train and deploy Hugging Face models on SageMaker are also available.
As always, AWS welcomes feedback. Please submit any comments or questions.
References
[1] PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
About the authors
Viktor Malesevic is a Machine Learning Engineer with AWS Professional Services, passionate about Natural Language Processing and MLOps. He works with customers to develop and put challenging deep learning models to production on AWS. In his spare time, he enjoys sharing a glass of red wine and some cheese with friends.
Aamna Najmi is a Data Scientist with AWS Professional Services. She is passionate about helping customers innovate with Big Data and Artificial Intelligence technologies to tap business value and insights from data. In her spare time, she enjoys gardening and traveling to new places.