{"id":246,"date":"2020-09-17T23:53:11","date_gmt":"2020-09-17T23:53:11","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/09\/17\/serving-pytorch-models-in-production-with-the-amazon-sagemaker-native-torchserve-integration\/"},"modified":"2020-09-17T23:53:11","modified_gmt":"2020-09-17T23:53:11","slug":"serving-pytorch-models-in-production-with-the-amazon-sagemaker-native-torchserve-integration","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/09\/17\/serving-pytorch-models-in-production-with-the-amazon-sagemaker-native-torchserve-integration\/","title":{"rendered":"Serving PyTorch models in production with the Amazon SageMaker native TorchServe integration"},"content":{"rendered":"<div id=\"\">\n<p>In April 2020, AWS and Facebook announced the launch of <a href=\"https:\/\/github.com\/pytorch\/serve\" target=\"_blank\" rel=\"noopener noreferrer\">TorchServe<\/a> to allow researchers and machine learning (ML) developers from the <a href=\"https:\/\/pytorch.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">PyTorch<\/a> community to bring their models to production more quickly and without needing to write custom code. TorchServe is an open-source project that answers the industry question of how to go from a notebook to production using PyTorch, and customers around the world, such as <a href=\"https:\/\/aws.amazon.com\/pytorch\/customers\/\" target=\"_blank\" rel=\"noopener noreferrer\">Matroid<\/a>, are experiencing the benefits firsthand. Similarly, over 10,000 customers have adopted <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a> to quickly build, train, and deploy ML models at scale, and many of them have made it their standard platform for ML. 
From a model serving perspective, Amazon SageMaker abstracts all the infrastructure-centric heavy lifting and allows you to deliver low-latency predictions securely and reliably to millions of concurrent users around the world.<\/p>\n<h2>TorchServe\u2019s native integration with Amazon SageMaker<\/h2>\n<p>AWS is excited to announce that TorchServe is now natively supported in Amazon SageMaker as the default model server for PyTorch inference. Previously, you could use TorchServe with Amazon SageMaker by installing it on a notebook instance and starting a server to perform local inference or by building a TorchServe container and referencing its image to create a hosted endpoint. However, full notebook installations can be time-intensive, and some data scientists and ML developers may prefer not to manage all the steps and <a href=\"http:\/\/aws.amazon.com\/iam\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management<\/a> (IAM) permissions involved with building the Docker container and storing the image on <a href=\"https:\/\/aws.amazon.com\/ecr\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Registry<\/a> (Amazon ECR) before ultimately uploading the model to <a href=\"https:\/\/aws.amazon.com\/s3\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service<\/a> (Amazon S3) and deploying the model endpoint. With this release, you can use the native Amazon SageMaker SDK to serve PyTorch models with TorchServe.<\/p>\n<p>To support TorchServe natively in Amazon SageMaker, the AWS engineering teams submitted pull requests to the <a href=\"https:\/\/github.com\/aws\/sagemaker-pytorch-inference-toolkit\/pull\/79\" target=\"_blank\" rel=\"noopener noreferrer\">aws\/sagemaker-pytorch-inference-toolkit<\/a> and the <a href=\"https:\/\/github.com\/aws\/deep-learning-containers\/pull\/347\" target=\"_blank\" rel=\"noopener noreferrer\">aws\/deep-learning-containers<\/a> repositories. 
After these were merged, we could use TorchServe via the Amazon SageMaker APIs for PyTorch inference. This change introduces a tighter integration with the PyTorch community. As more features related to the TorchServe serving framework are released in the future, they are tested, ported over, and made available as an <a href=\"https:\/\/github.com\/aws\/deep-learning-containers\/blob\/master\/available_images.md\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Deep Learning Container image<\/a>. It\u2019s important to note that our implementation hides the <a href=\"https:\/\/github.com\/pytorch\/serve\/blob\/master\/model-archiver\/README.md\" target=\"_blank\" rel=\"noopener noreferrer\">.mar<\/a> from the user while still using the <a href=\"https:\/\/sagemaker.readthedocs.io\/en\/stable\/frameworks\/pytorch\/using_pytorch.html#get-predictions-from-a-pytorch-model\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker PyTorch API<\/a> everyone is used to.<\/p>\n<h2>The TorchServe architecture in Amazon SageMaker<\/h2>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-15280\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/serving-pytorch-models-1.jpg\" alt=\"\" width=\"1000\" height=\"448\"><\/p>\n<p>You can use TorchServe natively with Amazon SageMaker through the following steps:<\/p>\n<ol>\n<li>\n<strong>Create a model in Amazon SageMaker<\/strong><strong>.<\/strong> By creating a model, you tell Amazon SageMaker where it can find the model components. This includes the Amazon S3 path where the model artifacts are stored and the Docker registry path for the Amazon SageMaker TorchServe image. In subsequent deployment steps, you specify the model by name. 
For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_CreateModel.html\" target=\"_blank\" rel=\"noopener noreferrer\">CreateModel<\/a>.<\/li>\n<li>\n<strong>Create an endpoint configuration for an HTTPS endpoint<\/strong><strong>.<\/strong> You specify the name of one or more models in production variants and the ML compute instances that you want Amazon SageMaker to launch to host each production variant. When hosting models in production, you can configure the endpoint to elastically scale the deployed ML compute instances. For each production variant, you specify the number of ML compute instances that you want to deploy. When you specify two or more instances, Amazon SageMaker launches them in multiple Availability Zones. This provides continuous availability. Amazon SageMaker manages deploying the instances. For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_CreateEndpointConfig.html\" target=\"_blank\" rel=\"noopener noreferrer\">CreateEndpointConfig<\/a>.<\/li>\n<li>\n<strong>Create an HTTPS endpoint<\/strong><strong>.<\/strong> Provide the endpoint configuration to Amazon SageMaker. The service launches the ML compute instances and deploys the model or models as specified in the configuration. For more information, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_CreateEndpoint.html\" target=\"_blank\" rel=\"noopener noreferrer\">CreateEndpoint<\/a>. To get inferences from the model, client applications send requests to the Amazon SageMaker Runtime HTTPS endpoint. 
For more information about the API, see <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/APIReference\/API_InvokeEndpoint.html\" target=\"_blank\" rel=\"noopener noreferrer\">InvokeEndpoint<\/a>.<\/li>\n<\/ol>\n<p>The Amazon SageMaker Python SDK simplifies these steps, as we demonstrate in the following example notebook.<\/p>\n<h3>Using a fine-tuned HuggingFace base transformer (RoBERTa)<\/h3>\n<p>For this post, we use a <a href=\"https:\/\/huggingface.co\/transformers\/\" target=\"_blank\" rel=\"noopener noreferrer\">HuggingFace transformer<\/a>, which provides us with a general-purpose architecture for Natural Language Understanding (NLU). Specifically, we present you with a <a href=\"https:\/\/huggingface.co\/roberta-base\" target=\"_blank\" rel=\"noopener noreferrer\">RoBERTa base<\/a> transformer that was fine-tuned to perform sentiment analysis. The pre-trained checkpoint loads the additional head layers, and the model outputs positive, neutral, and negative sentiment of text.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-15287\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/serving-pytorch-models-8.jpg\" alt=\"\" width=\"90\" height=\"63\"><\/p>\n<h3><strong>Deploying a CloudFormation Stack and verifying notebook creation<\/strong><\/h3>\n<p>You will deploy an ml.m5.xlarge Amazon SageMaker notebook instance. 
For more information about pricing, see <a href=\"https:\/\/aws.amazon.com\/sagemaker\/pricing\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker Pricing<\/a>.<\/p>\n<ol>\n<li>Sign in to the <a href=\"http:\/\/aws.amazon.com\/console\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Management Console<\/a>.<\/li>\n<li>Choose from the following table to launch your template.<\/li>\n<\/ol>\n<table cellpadding=\"5px\">\n<thead>\n<tr>\n<td width=\"137\"><strong>Launch Template<\/strong><\/td>\n<td width=\"198\"><strong>Region<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td width=\"137\"><a href=\"https:\/\/console.aws.amazon.com\/cloudformation\/home?region=us-east-1#\/stacks\/create\/review?stackName=torchserve-on-aws&amp;templateURL=https:\/\/torchserve-workshop.s3.amazonaws.com\/torchserve-workshop-template.yaml\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-14105 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/07\/24\/launchstack.png\" alt=\"\" width=\"107\" height=\"20\"><\/a><\/td>\n<td width=\"198\">\n<strong>N.Virginia<\/strong> (us-east-1)<\/td>\n<\/tr>\n<tr>\n<td width=\"137\"><a href=\"https:\/\/console.aws.amazon.com\/cloudformation\/home?region=eu-west-1#\/stacks\/create\/review?stackName=torchserve-on-aws&amp;templateURL=https:\/\/torchserve-workshop-eu-west-1.s3-eu-west-1.amazonaws.com\/torchserve-workshop-template.yaml\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-14105 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/07\/24\/launchstack.png\" alt=\"\" width=\"107\" height=\"20\"><\/a><\/td>\n<td width=\"198\">\n<strong>Ireland<\/strong> (eu-west-1)<\/td>\n<\/tr>\n<tr>\n<td width=\"137\"><a 
href=\"https:\/\/console.aws.amazon.com\/cloudformation\/home?region=ap-southeast-1#\/stacks\/create\/review?stackName=torchserve-on-aws&amp;templateURL=https:\/\/torchserve-workshop-ap-southeast-1.s3-ap-southeast-1.amazonaws.com\/torchserve-workshop-template.yaml\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-14105 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/07\/24\/launchstack.png\" alt=\"\" width=\"107\" height=\"20\"><\/a><\/td>\n<td width=\"198\">\n<strong>Singapore<\/strong> (ap-southeast-1)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>You can launch this stack for any Region by updating the hyperlink\u2019s Region value.<\/p>\n<ol start=\"3\">\n<li>In the <strong>Capabilities and transforms<\/strong> section, select the three acknowledgement boxes.<\/li>\n<li>Choose <strong>Create stack<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-15281\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/serving-pytorch-models-2.jpg\" alt=\"\" width=\"1000\" height=\"1035\"><\/p>\n<p>Your CloudFormation stack takes about 5 minutes to complete creating the Amazon SageMaker notebook instance and its IAM role.<\/p>\n<ol start=\"5\">\n<li>When the stack creation is complete, check the output on the <strong>Resources<\/strong> tab.<br \/><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-15282\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/serving-pytorch-models-3.jpg\" alt=\"\" width=\"1000\" height=\"331\">\n<\/li>\n<li>On the Amazon SageMaker console, under <strong>Notebook<\/strong>, choose <strong>Notebook instances<\/strong>.<\/li>\n<li>Locate your newly created notebook and choose <strong>Open Jupyter<\/strong>.<br \/><img decoding=\"async\" 
loading=\"lazy\" class=\"alignnone wp-image-15283 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/serving-pytorch-models-4.jpg\" alt=\"\" width=\"1000\" height=\"327\">\n<\/li>\n<\/ol>\n<h3>Accessing the Lab<\/h3>\n<p>From within the notebook instance, navigate to the <code>serving_natively_with_amazon_sagemaker<\/code> directory and open <code>deploy.ipynb<\/code>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-15284\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/serving-pytorch-models-5.jpg\" alt=\"\" width=\"1000\" height=\"368\"><\/p>\n<p>You can now run through the steps within the Jupyter notebook:<\/p>\n<ol>\n<li>Set up your hosting environment.<\/li>\n<li>Create your endpoint.<\/li>\n<li>Perform predictions with a TorchServe-backed Amazon SageMaker endpoint.<\/li>\n<\/ol>\n<p>After setting up your hosting environment, creating an Amazon SageMaker endpoint using the native TorchServe estimator is as easy as:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">from sagemaker.pytorch import PyTorchModel\r\nfrom sagemaker.utils import name_from_base\r\n\r\n# model_artifact, role, and SentimentAnalysis are assumed to be defined\r\n# earlier in the notebook (hosting environment setup).\r\nmodel = PyTorchModel(model_data=model_artifact,\r\n                   name=name_from_base('roberta-model'),\r\n                   role=role,\r\n                   entry_point='torchserve-predictor.py',\r\n                   source_dir='source_dir',\r\n                   framework_version='1.6.0',\r\n                   predictor_cls=SentimentAnalysis)\r\n\r\n# Deploy the model to a hosted endpoint on a single ml.m5.xlarge instance.\r\nendpoint_name = name_from_base('roberta-model')\r\npredictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge', endpoint_name=endpoint_name)\r\n<\/code><\/pre>\n<\/div>\n<h3><strong>Cleaning Up<\/strong><\/h3>\n<p>When you\u2019re finished with this lab, your Amazon SageMaker endpoint should have already been deleted. 
If not, complete the following steps to delete it:<\/p>\n<ol>\n<li>On the Amazon SageMaker console, under <strong>Inference<\/strong>, choose <strong>Endpoints<\/strong>.<\/li>\n<li>Select the endpoint (it should begin with <code>roberta-model<\/code>).<\/li>\n<li>From the <strong>Actions<\/strong> drop-down menu, choose <strong>Delete<\/strong>.<\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-15285\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/serving-pytorch-models-6.jpg\" alt=\"\" width=\"1000\" height=\"381\"><\/p>\n<p>On the AWS CloudFormation console, delete the rest of your environment by selecting the <code>torchserve-on-aws<\/code> stack and choosing <strong>Delete<\/strong>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-15286\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/08\/27\/serving-pytorch-models-7.jpg\" alt=\"\" width=\"1000\" height=\"343\"><\/p>\n<p>You can see two other stack names that were built from the original CloudFormation template. These are nested stacks and are automatically deleted with the main stack. The cleanup process takes just over 3 minutes and deletes your notebook instance and the associated IAM role.<\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p>As TorchServe continues to evolve around the very specific needs of the PyTorch community, AWS is focused on ensuring that you have a common and performant way to serve models with PyTorch. 
Whether you\u2019re using Amazon SageMaker, <a href=\"https:\/\/aws.amazon.com\/ec2\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Compute Cloud<\/a> (Amazon EC2), or <a href=\"https:\/\/aws.amazon.com\/eks\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Kubernetes Service<\/a> (Amazon EKS), you can expect AWS to continue to optimize the backend infrastructure in support of our open-source community. We encourage all of you to submit pull requests and\/or create issues in our repositories (TorchServe, AWS Deep Learning Containers, PyTorch inference toolkit, etc.) as needed.<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignleft wp-image-3180 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2018\/01\/25\/todd-escalona-100-1.jpg\" alt=\"\" width=\"100\" height=\"119\">As a Principal Solutions Architect, Todd spends his time working with strategic and global customers to define business requirements, provide architectural guidance around specific use cases, and design applications and services that are scalable, reliable, and performant. 
He has helped launch and scale the reinforcement learning powered AWS DeepRacer service, is a host for the AWS video series \u201cThis is My Architecture\u201d, and speaks regularly at AWS re:Invent, AWS Summits, and technology conferences around the world.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/serving-pytorch-models-in-production-with-the-amazon-sagemaker-native-torchserve-integration\/<\/p>\n","protected":false},"author":0,"featured_media":247,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/246"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=246"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/246\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/247"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=246"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=246"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=246"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}