{"id":2010,"date":"2022-03-23T21:39:40","date_gmt":"2022-03-23T21:39:40","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/03\/23\/set-up-a-text-summarization-project-with-hugging-face-transformers-part-2\/"},"modified":"2022-03-23T21:39:40","modified_gmt":"2022-03-23T21:39:40","slug":"set-up-a-text-summarization-project-with-hugging-face-transformers-part-2","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/03\/23\/set-up-a-text-summarization-project-with-hugging-face-transformers-part-2\/","title":{"rendered":"Set up a text summarization project with Hugging Face Transformers: Part 2"},"content":{"rendered":"<div id=\"\">\n<p>This is the second post in a two-part series in which I propose a practical guide for organizations so you can assess the quality of text summarization models for your domain.<\/p>\n<p>For an introduction to text summarization, an overview of this tutorial, and the steps to create a baseline for our project (also referred to as section 1), refer back to the <a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/part-1-set-up-a-text-summarization-project-with-hugging-face-transformers\/\" target=\"_blank\" rel=\"noopener noreferrer\">first post<\/a>.<\/p>\n<p>This post is divided into three sections:<\/p>\n<ul>\n<li>Section 2: Generate summaries with a zero-shot model<\/li>\n<li>Section 3: Train a summarization model<\/li>\n<li>Section 4: Evaluate the trained model<\/li>\n<\/ul>\n<h2>Section 2: Generate summaries with a zero-shot model<\/h2>\n<p>In this post, we use the concept of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Zero-shot_learning\" target=\"_blank\" rel=\"noopener noreferrer\">zero-shot learning<\/a> (ZSL), which means we use a model that has been trained to summarize text but hasn\u2019t seen any examples of the <a href=\"https:\/\/www.kaggle.com\/Cornell-University\/arxiv\" target=\"_blank\" rel=\"noopener noreferrer\">arXiv dataset<\/a>. It\u2019s a bit like trying to paint a portrait when all you have been doing in your life is landscape painting. You know how to paint, but you might not be too familiar with the intricacies of portrait painting.<\/p>\n<p>For this section, we use the following <a href=\"https:\/\/github.com\/marshmellow77\/text-summarisation-project\/blob\/main\/2_zero_shot.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">notebook<\/a>.<\/p>\n<h3>Why zero-shot learning?<\/h3>\n<p>ZSL has become popular over the past years because it allows you to use state-of-the-art NLP models with no training. And their performance is sometimes quite astonishing: the <a href=\"https:\/\/bigscience.huggingface.co\/\" target=\"_blank\" rel=\"noopener noreferrer\">Big Science Research Workgroup<\/a> has recently released their T0pp (pronounced \u201cT Zero Plus Plus\u201d) model, which has been trained specifically for researching zero-shot multitask learning. It can often outperform models six times larger on the <a href=\"https:\/\/github.com\/google\/BIG-bench\" target=\"_blank\" rel=\"noopener noreferrer\">BIG-bench<\/a> benchmark, and can outperform the <a href=\"https:\/\/github.com\/openai\/gpt-3\" target=\"_blank\" rel=\"noopener noreferrer\">GPT-3<\/a> (16 times larger) on several other NLP benchmarks.<\/p>\n<p>Another benefit of ZSL is that it takes just two lines of code to use it. 
### Set up a zero-shot learning pipeline

To use ZSL models, we can use Hugging Face's [Pipeline API](https://huggingface.co/docs/transformers/main_classes/pipelines). This API enables us to use a text summarization model with just two lines of code. It takes care of the main processing steps in an NLP model:

1. Preprocess the text into a format the model can understand.
2. Pass the preprocessed inputs to the model.
3. Postprocess the predictions of the model so you can make sense of them.

It uses the summarization models that are already available on the [Hugging Face model hub](https://huggingface.co/models?pipeline_tag=summarization&sort=downloads).

To use it, run the following code:

```python
from transformers import pipeline

summarizer = pipeline("summarization")
print(summarizer(text))
```

That's it! The code downloads a summarization model and creates summaries locally on your machine. If you're wondering which model it uses, you can either look it up in the [source code](https://github.com/huggingface/transformers/blob/master/src/transformers/pipelines/__init__.py) or use the following command:

```python
print(summarizer.model.config.__getattribute__('_name_or_path'))
```

When we run this command, we see that the default model for text summarization is called `sshleifer/distilbart-cnn-12-6`:

![Output showing the default summarization model, sshleifer/distilbart-cnn-12-6](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/03/09/ML-7391-image005.png)

We can find the [model card](https://huggingface.co/sshleifer/distilbart-cnn-12-6) for this model on the Hugging Face website, where we can also see that the model has been trained on two datasets: the [CNN Dailymail dataset](https://huggingface.co/datasets/cnn_dailymail) and the [Extreme Summarization (XSum) dataset](https://huggingface.co/datasets/xsum). It's worth noting that this model is not familiar with the arXiv dataset and was only trained to summarize texts that are similar to the ones it has seen (mostly news articles). The numbers 12 and 6 in the model name refer to the number of encoder layers and decoder layers, respectively.
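As a quick sanity check, we can read these layer counts directly off the model config (BART-style configs expose `encoder_layers` and `decoder_layers` attributes):

```python
# Confirm the layer counts encoded in the model name (12 encoder, 6 decoder)
print(summarizer.model.config.encoder_layers)
print(summarizer.model.config.decoder_layers)
```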
Explaining what these are is outside the scope of this tutorial, but you can read more about them in the post [Introducing BART](https://sshleifer.github.io/blog_v2/jupyter/2020/03/12/bart.html) by Sam Shleifer, who created the model.

We use the default model going forward, but I encourage you to try out different pre-trained models. All the models that are suitable for summarization can be found on the [Hugging Face website](https://huggingface.co/models?pipeline_tag=summarization&sort=downloads). To use a different model, you can specify its name when calling the Pipeline API:

```python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
```

### Extractive vs. abstractive summarization

We haven't yet spoken about two possible but different approaches to text summarization: *extractive* vs. *abstractive*. Extractive summarization is the strategy of concatenating extracts taken from a text into a summary, whereas abstractive summarization involves paraphrasing the corpus using novel sentences. Most summarization models are based on models that generate novel text (they're natural language generation models, like, for example, [GPT-3](https://github.com/openai/gpt-3)). This means that the summarization models also generate novel text, which makes them abstractive summarization models.

### Generate zero-shot summaries

Now that we know how to use the pipeline, we want to use it on our test dataset, the same dataset we used in [section 1](https://aws.amazon.com/blogs/machine-learning/part-1-set-up-a-text-summarization-project-with-hugging-face-transformers/) to create the baseline. We can do that with the following loop:

```python
candidate_summaries = []

for i, text in enumerate(texts):
    if i % 100 == 0:
        print(i)
    candidate = summarizer(text, min_length=5, max_length=20)
    candidate_summaries.append(candidate[0]['summary_text'])
```

We use the `min_length` and `max_length` parameters to control the summary the model generates. In this example, we set `min_length` to 5 because we want the title to be at least five words long. And by inspecting the reference summaries (the actual titles of the research papers), we determine that 20 could be a reasonable value for `max_length`. But again, this is just a first attempt. When the project is in the experimentation phase, these two parameters can and should be changed to see if the model performance changes.
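To sanity-check these values rather than guess, we can look at the length distribution of the reference titles. The following is a minimal sketch; `ref_summaries` is a placeholder name for the list of reference titles prepared in section 1:

```python
import numpy as np

# Word counts of the reference titles (ref_summaries is assumed to hold
# the paper titles we prepared in section 1)
title_lengths = [len(title.split()) for title in ref_summaries]
print(f"mean: {np.mean(title_lengths):.1f} words, "
      f"95th percentile: {np.percentile(title_lengths, 95):.0f} words")
```

Keep in mind that `min_length` and `max_length` are measured in tokens, which don't map one-to-one to words, so numbers like these are a starting point rather than an exact setting.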
### Additional parameters

If you're already familiar with text generation, you might know there are many more parameters to influence the text a model generates, such as beam search, sampling, and temperature. These parameters give you more control over the generated text, for example to make it more fluent and less repetitive. These techniques are not available in the Pipeline API; you can see in the [source code](https://github.com/huggingface/transformers/blob/master/src/transformers/pipelines/text2text_generation.py#L151) that `min_length` and `max_length` are the only parameters that are considered. After we train and deploy our own model, however, we do have access to those parameters. More on that in section 4 of this post.

### Model evaluation

After we have generated the zero-shot summaries, we can use our ROUGE function again to compare the candidate summaries with the reference summaries:

```python
from datasets import load_metric
metric = load_metric("rouge")

def calc_rouge_scores(candidates, references):
    result = metric.compute(predictions=candidates, references=references, use_stemmer=True)
    result = {key: round(value.mid.fmeasure * 100, 1) for key, value in result.items()}
    return result
```
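With this function in place, scoring a set of candidates is a one-liner. Here, `ref_summaries` again stands in for the list of reference titles:

```python
# Compare the zero-shot candidate summaries against the reference titles
print(calc_rouge_scores(candidate_summaries, ref_summaries))
```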
Running this calculation on the summaries that were generated with the ZSL model gives us the following results:

![ROUGE scores for the zero-shot summaries](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/03/09/ML-7391-image006.png)

When we compare those with our baseline, we see that this ZSL model is actually performing worse than our simple heuristic of just taking the first sentence. Again, this is not unexpected: although this model knows how to summarize news articles, it has never seen an example of summarizing the abstract of an academic research paper.

### Baseline comparison

We have now created two baselines: one using a simple heuristic and one with a ZSL model. By comparing the ROUGE scores, we see that the simple heuristic currently outperforms the deep learning model.

![Comparison of ROUGE scores for the simple heuristic and the zero-shot model](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/03/09/ML-7391-image007.png)

In the next section, we take this same deep learning model and try to improve its performance. We do so by training it on the arXiv dataset (this step is also called *fine-tuning*). We take advantage of the fact that it already knows how to summarize text in general, and then show it lots of examples from our arXiv dataset. Deep learning models are exceptionally good at identifying patterns in a dataset after they're trained on it, so we expect the model to get better at this particular task.

## Section 3: Train a summarization model

In this section, we train the model we used for zero-shot summaries in section 2 (`sshleifer/distilbart-cnn-12-6`) on our dataset. The idea is to teach the model what summaries for abstracts of research papers look like by showing it many examples. Over time the model should recognize the patterns in this dataset, which will allow it to create better summaries.

It's worth noting once more that if you have labeled data, namely texts and corresponding summaries, you should use those to train a model. Only by doing so can the model learn the patterns of your specific dataset.

The complete code for the model training is in the following [notebook](https://github.com/marshmellow77/text-summarisation-project/blob/main/3_model_training.ipynb).

### Set up a training job

Because training a deep learning model would take a few weeks on a laptop, we use [Amazon SageMaker](https://aws.amazon.com/sagemaker/) training jobs instead. For more details, refer to [Train a Model with Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html). In this post, I briefly highlight the advantages of these training jobs, beyond the fact that they allow us to use GPU compute instances.

Let's assume we have a cluster of GPU instances we can use. In that case, we likely want to create a Docker image to run the training so that we can easily replicate the training environment on other machines. We then install the required packages, and because we want to use several instances, we need to set up distributed training as well. When the training is complete, we want to quickly shut down these computers because they are costly.

All these steps are abstracted away from us when using training jobs. In fact, we can train a model by simply specifying the training parameters and then calling one method. SageMaker takes care of the rest, including stopping the GPU instances when the training is complete, so as not to incur any further costs.

In addition, Hugging Face and AWS announced a partnership in 2021 that makes it even easier to train Hugging Face models on SageMaker. This functionality is available through Hugging Face [AWS Deep Learning Containers](https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/what-is-dlc.html) (DLCs). These containers include Hugging Face Transformers, Tokenizers, and the Datasets library, which allows us to use these resources for training and inference jobs. For a list of the available DLC images, see [Hugging Face Deep Learning Containers Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-training-containers). They are maintained and regularly updated with security patches.
We can find many examples of how to train Hugging Face models with these DLCs and the [Hugging Face Python SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html) in the following [GitHub repo](https://github.com/huggingface/notebooks/tree/master/sagemaker).

We use one of those examples as a template because it does almost everything we need for our purpose: [train a summarization model](https://github.com/huggingface/notebooks/blob/master/sagemaker/08_distributed_summarization_bart_t5/sagemaker-notebook.ipynb) on a specific dataset in a distributed manner (using more than one GPU instance).

One thing we have to account for, however, is that this example uses a dataset directly from the Hugging Face dataset hub. Because we want to provide our own custom data, we need to amend the notebook slightly.

### Pass data to the training job

To account for the fact that we bring our own dataset, we need to use *channels*. For more information, refer to [How Amazon SageMaker Provides Training Information](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-running-container.html).

I personally find this term a bit confusing, so in my mind I always think *mapping* when I hear *channels*, because it helps me better visualize what happens. Let me explain: as we already learned, the training job spins up a cluster of [Amazon Elastic Compute Cloud](http://aws.amazon.com/ec2) (Amazon EC2) instances and copies a Docker image onto it. However, our datasets are stored in [Amazon Simple Storage Service](http://aws.amazon.com/s3) (Amazon S3) and can't be accessed by that Docker image. Instead, the training job needs to copy the data from Amazon S3 to a predefined local path in that Docker image. We do that by telling the training job where the data resides in Amazon S3 and to which local path on the Docker image it should be copied so that the training job can access it. We *map* the Amazon S3 location to the local path.
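To illustrate how the pieces fit together, here is a rough sketch of such a training job using the SageMaker Hugging Face estimator. The bucket name, file names, script name, and version numbers are placeholders, not the exact values from the notebook:

```python
from sagemaker.huggingface import HuggingFace

# Local paths inside the training container; the folder name after
# /opt/ml/input/data must match the channel name used in fit() below
hyperparameters = {
    "train_file": "/opt/ml/input/data/datasets/train.csv",
    "validation_file": "/opt/ml/input/data/datasets/val.csv",
}

huggingface_estimator = HuggingFace(
    entry_point="train.py",          # training script (placeholder name)
    instance_type="ml.p3.16xlarge",  # GPU instance type
    instance_count=2,                # train on two instances
    role=role,                       # SageMaker execution role
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
    hyperparameters=hyperparameters,
    # Enable SageMaker's data parallelism library for distributed training
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

# The channel name 'datasets' maps the S3 location to
# /opt/ml/input/data/datasets inside the container
huggingface_estimator.fit({"datasets": "s3://my-bucket/summarization-data"})
```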
In the notebook, we set the local path in the hyperparameters section of the training job:

![The local data paths in the hyperparameters section of the training job](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/03/09/ML-7391-image008.png)

Then we tell the training job where the data resides in Amazon S3 when calling the `fit()` method, which starts the training:

![The fit() call with the Amazon S3 location of the dataset](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/03/09/ML-7391-image009.png)

Note that the folder name after `/opt/ml/input/data` matches the channel name (`datasets`). This enables the training job to copy the data from Amazon S3 to the local path.

### Start the training

We're now ready to start the training job. As mentioned before, we do so by calling the `fit()` method. The training job runs for about 40 minutes. You can follow the progress and see additional information on the SageMaker console.

![The training job on the SageMaker console](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/03/09/ML-7391-image010.png)

When the training job is complete, it's time to evaluate our newly trained model.

## Section 4: Evaluate the trained model

Evaluating our trained model is very similar to what we did in section 2, where we evaluated the ZSL model: we call the model, generate candidate summaries, and compare them to the reference summaries by calculating the ROUGE scores. But now the model sits in Amazon S3 in a file called `model.tar.gz` (to find the exact location, you can check the training job on the console). So how do we access the model to generate summaries?

We have two options: deploy the model to a SageMaker endpoint or download it locally, similar to what we did in section 2 with the ZSL model. In this tutorial, I [deploy the model to a SageMaker endpoint](https://github.com/marshmellow77/text-summarisation-project/blob/main/4a_model_testing_deployed.ipynb) because it's more convenient, and by choosing a more powerful instance for the endpoint, we can shorten the inference time significantly. The GitHub repo also contains a [notebook](https://github.com/marshmellow77/text-summarisation-project/blob/main/4b_model_testing_local.ipynb) that shows how to evaluate the model locally.
### Deploy a model

It's usually very easy to deploy a trained model on SageMaker (see again the following [example on GitHub](https://github.com/huggingface/notebooks/blob/master/sagemaker/08_distributed_summarization_bart_t5/sagemaker-notebook.ipynb) from Hugging Face). After the model has been trained, we can call `estimator.deploy()` and SageMaker does the rest for us in the background. Because in our tutorial we switch from one notebook to the next, we have to locate the training job and the associated model first before we can deploy it:

![Retrieving the model location from the training job](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/03/09/ML-7391-image011.png)

After we retrieve the model location, we can deploy it to a SageMaker endpoint:

```python
from sagemaker.huggingface import HuggingFaceModel

model_for_deployment = HuggingFaceModel(entry_point='inference.py',
                                        source_dir='inference_code',
                                        model_data=model_data,
                                        role=role,
                                        pytorch_version='1.7.1',
                                        py_version='py36',
                                        transformers_version='4.6.1',
                                        )

predictor = model_for_deployment.deploy(initial_instance_count=1,
                                        instance_type='ml.g4dn.xlarge',
                                        serializer=sagemaker.serializers.JSONSerializer(),
                                        deserializer=sagemaker.deserializers.JSONDeserializer()
                                        )
```

Deployment on SageMaker is straightforward because it uses the [SageMaker Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit), an open-source library for serving Transformers models on SageMaker. We normally don't even have to provide an inference script; the toolkit takes care of that. In that case, however, the toolkit utilizes the Pipeline API again, and as we discussed in section 2, the Pipeline API doesn't allow us to use advanced text generation techniques such as beam search and sampling. To avoid this limitation, we provide our [custom inference script](https://github.com/marshmellow77/text-summarisation-project/blob/main/inference_code/inference.py).
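The actual script lives in the repo; as a rough sketch of the idea, the Inference Toolkit lets us override handler functions such as `model_fn` and `transform_fn` in `inference.py`. The handler below is a simplified reconstruction, not the exact script: it reads a `parameters_list` from the request payload and calls `model.generate()` once per parameter set, which is what makes beam search and sampling available:

```python
# Simplified sketch of a custom inference handler; see the repo for the real script
import json

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def model_fn(model_dir):
    # Load the fine-tuned model and tokenizer from the unpacked model.tar.gz
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    return model, tokenizer

def transform_fn(model_and_tokenizer, input_data, content_type, accept):
    model, tokenizer = model_and_tokenizer
    payload = json.loads(input_data)
    inputs = tokenizer(payload["inputs"], return_tensors="pt", truncation=True)
    # Generate one set of summaries per parameter dict so that different
    # generation settings can be compared in a single request
    results = []
    for params in payload["parameters_list"]:
        output_ids = model.generate(**inputs, **params)
        results.append([tokenizer.decode(ids, skip_special_tokens=True)
                        for ids in output_ids])
    return json.dumps(results)
```

This payload shape matches the requests we send next, where the response is indexed as `candidate[0][0]`: the first summary for the first parameter set.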
### First evaluation

For the first evaluation of our newly trained model, we use the same parameters as in section 2 with the zero-shot model to generate the candidate summaries. This allows us to make an apples-to-apples comparison:

```python
candidate_summaries = []

for i, text in enumerate(texts):
    data = {"inputs":text, "parameters_list":[{"min_length": 5, "max_length": 20}]}
    candidate = predictor.predict(data)
    candidate_summaries.append(candidate[0][0])
```

We compare the summaries generated by the model with the reference summaries:

![ROUGE scores for the fine-tuned model](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/03/09/ML-7391-image012.png)

This is encouraging! Our first attempt to train the model, without any hyperparameter tuning, has improved the ROUGE scores significantly.

![Comparison of ROUGE scores for both baselines and the trained model](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/03/09/ML-7391-image013.png)

### Second evaluation

Now it's time to use some more advanced techniques, such as beam search and sampling, to play around with the model. For a detailed explanation of what each of these parameters does, refer to [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate). Let's try it with a semi-random set of values for some of these parameters:

```python
candidate_summaries = []

for i, text in enumerate(texts):
    data = {"inputs":text,
            "parameters_list":[{"min_length": 5, "max_length": 20, "num_beams": 50, "top_p": 0.9, "do_sample": True}]}
    candidate = predictor.predict(data)
    candidate_summaries.append(candidate[0][0])
```

When running our model with these parameters, we get the following scores:

![ROUGE scores with beam search and sampling](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/03/09/ML-7391-image014.png)

That didn't work out quite as we hoped; the ROUGE scores have actually gone down slightly. However, don't let this discourage you from trying out different values for these parameters. In fact, this is the point where we finish with the setup phase and transition into the experimentation phase of the project.
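One housekeeping note before we wrap up: the endpoint runs on a GPU instance and incurs costs for as long as it's up, so once you're done experimenting, remember to tear it down:

```python
# Delete the SageMaker endpoint to stop incurring costs
predictor.delete_endpoint()
```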
## Conclusion and next steps

We have concluded the setup for the experimentation phase. In this two-part series, we downloaded and prepared our data, created a baseline with a simple heuristic, created another baseline using zero-shot learning, and then trained our own model and saw a significant increase in performance. Now it's time to play around with every part we created in order to create even better summaries. Consider the following:

- **Preprocess the data properly** – For example, remove stopwords and punctuation (see the sketch after this list). Don't underestimate this part: in many data science projects, data preprocessing is one of the most important aspects (if not the most important), and data scientists typically spend most of their time on this task.
- **Try out different models** – In our tutorial, we used the standard model for summarization (`sshleifer/distilbart-cnn-12-6`), but [many more models](https://huggingface.co/models?pipeline_tag=summarization&sort=downloads) are available that you can use for this task. One of them might better fit your use case.
- **Perform hyperparameter tuning** – When training the model, we used a certain set of hyperparameters (learning rate, number of epochs, and so on). These parameters aren't set in stone, quite the opposite. You should change them to understand how they affect your model performance.
- **Use different parameters for text generation** – We already did one round of creating summaries with different parameters to utilize beam search and sampling. Try out different values and parameters. For more information, refer to [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate).
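To make the first bullet concrete, here is a minimal preprocessing sketch using NLTK. The library choice is mine, and whether stopword removal actually helps an abstractive model is an empirical question you should verify against your ROUGE scores:

```python
import string

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
stop_words = set(stopwords.words("english"))

def preprocess(text):
    # Lowercase, strip punctuation, and drop English stopwords
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in text.split() if word not in stop_words)

texts = [preprocess(text) for text in texts]
```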
I hope you made it to the end and found this tutorial useful.

---

### About the Author

**Heiko Hotz** is a Senior Solutions Architect for AI & Machine Learning and leads the Natural Language Processing (NLP) community within AWS. Prior to this role, he was the Head of Data Science for Amazon's EU Customer Service. Heiko helps customers succeed in their AI/ML journey on AWS and has worked with organizations in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. In his spare time, Heiko travels as much as possible.