{"id":300,"date":"2020-09-29T13:04:23","date_gmt":"2020-09-29T13:04:23","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/09\/29\/aws-inferentia-is-now-available-in-11-aws-regions-with-best-in-class-performance-for-running-object-detection-models-at-scale\/"},"modified":"2020-09-29T13:04:23","modified_gmt":"2020-09-29T13:04:23","slug":"aws-inferentia-is-now-available-in-11-aws-regions-with-best-in-class-performance-for-running-object-detection-models-at-scale","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/09\/29\/aws-inferentia-is-now-available-in-11-aws-regions-with-best-in-class-performance-for-running-object-detection-models-at-scale\/","title":{"rendered":"AWS Inferentia is now available in 11 AWS Regions, with best-in-class performance for running object detection models at scale"},"content":{"rendered":"<div id=\"\">\n<p>AWS has expanded the availability of <a href=\"https:\/\/aws.amazon.com\/ec2\/instance-types\/inf1\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon EC2 Inf1 instances<\/a> to four new AWS Regions, bringing the total number of supported Regions to 11: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), Europe (Frankfurt, Ireland, Paris), and South America (S\u00e3o Paulo).<\/p>\n<p>Amazon EC2 Inf1 instances are powered by <a href=\"https:\/\/aws.amazon.com\/machine-learning\/inferentia\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Inferentia<\/a> chips, which are custom-designed to provide you with the lowest cost per inference in the cloud and lower the barriers for everyday developers to use machine learning (ML) at scale. 
Customers using models such as YOLO v3 and YOLO v4 can get up to 1.85 times higher throughput and up to 40% lower cost per inference compared to the EC2 G4 GPU-based instances.<\/p>\n<p>As you scale your use of deep learning across new applications, you may be bound by the high cost of running trained ML models in production. In many cases, up to 90% of the infrastructure cost spent on developing and running an ML application is on inference, making the need for high-performance, cost-effective ML inference infrastructure critical. Inf1 instances are built from the ground up to deliver faster performance and more cost-effective ML inference than comparable GPU-based instances. This gives you the performance and cost structure you need to confidently deploy your deep learning models across a broad set of applications.<\/p>\n<h2>AWS Neuron SDK performance and support for new ML models<\/h2>\n<p>You can deploy your ML models to Inf1 instances natively with popular ML frameworks such as TensorFlow, PyTorch, and MXNet. You can deploy your existing models to Amazon EC2 Inf1 instances with minimal code changes by using the <a href=\"https:\/\/aws.amazon.com\/machine-learning\/neuron\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Neuron<\/a> SDK, which is integrated with these popular ML frameworks. This gives you the freedom to maintain hardware portability and take advantage of the latest technologies without being tied to vendor-specific software libraries.<\/p>\n<p>Since its launch, the Neuron SDK has seen a dramatic improvement in the breadth of models that deliver best-in-class performance at a fraction of the cost. This includes natural language processing models like the popular BERT, image classification models (ResNet and VGG), and object detection models (OpenPose and SSD). The latest Neuron release (1.8.0) provides optimizations that improve performance of YOLO v3 and v4, VGG16, SSD300, and BERT. 
It also improves operational deployments of large-scale inference applications, with a session management agent incorporated into all supported ML frameworks and a new Neuron tool that allows you to easily scale monitoring of large fleets of inference applications.<\/p>\n<h2>Customer success stories<\/h2>\n<p>Since the launch of Inf1 instances, a broad spectrum of customers, from large enterprises to startups, as well as Amazon services, have begun using them to run production workloads.<\/p>\n<p>Anthem is one of the nation\u2019s leading health benefits companies, serving the healthcare needs of over 40 million members across dozens of states. They use deep learning to automate the generation of actionable insights from customer opinions via natural language models.<\/p>\n<p>\u201cOur application is computationally intensive and needs to be deployed in a highly performant manner,\u201d says Numan Laanait, PhD, Principal AI\/Data Scientist at Anthem. \u201cWe seamlessly deployed our deep learning inferencing workload onto Amazon EC2 Inf1 instances powered by the AWS Inferentia processor. The new Inf1 instances provide two times higher throughput than GPU-based instances and allowed us to streamline our inference workloads.\u201d<\/p>\n<p>Cond\u00e9 Nast, another AWS customer, has a global portfolio that encompasses over 20 leading media brands, including Wired, Vogue, and Vanity Fair.<\/p>\n<p>\u201cWithin a few weeks, our team was able to integrate our recommendation engine with AWS Inferentia chips,\u201d says Paul Fryzel, Principal Engineer in AI Infrastructure at Cond\u00e9 Nast. \u201cThis union enables multiple runtime optimizations for state-of-the-art natural language models on SageMaker\u2019s Inf1 instances. 
As a result, we observed a 72% reduction in cost compared to the previously deployed GPU instances.\u201d<\/p>\n<h2>Getting started<\/h2>\n<p>The easiest and quickest way to get started with Inf1 instances is via <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker<\/a>, a fully managed service for building, training, and deploying ML models. If you prefer to manage your own ML application development platforms, you can get started either by launching Inf1 instances with <a href=\"https:\/\/aws.amazon.com\/machine-learning\/amis\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Deep Learning AMIs<\/a>, which include the Neuron SDK, or by using Inf1 instances via <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/amazon-eks-now-supports-ec2-inf1-instances\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Kubernetes Service<\/a> (Amazon EKS) or <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/amazon-ecs-now-supports-ec2-inf1-instances\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Service<\/a> (Amazon ECS) for containerized ML applications.<\/p>\n<p>For more information, see <a href=\"https:\/\/aws.amazon.com\/ec2\/instance-types\/inf1\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon EC2 Inf1 Instances<\/a>.<\/p>\n<hr>\n<h3>About the Author<\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignleft size-full wp-image-16387\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2020\/09\/28\/gadi-hutt-100.jpg\" alt=\"\" width=\"100\" height=\"138\">Gadi Hutt is a Sr. Director, Business Development at AWS. Gadi has over 20 years\u2019 experience in engineering and business disciplines. He started his career as an embedded software engineer, and later moved into product lead positions. 
Since 2013, Gadi has led Annapurna Labs\u2019 technical business development and product management, focused on hardware acceleration software and hardware products such as the EC2 FPGA F1 instances and AWS Inferentia alongside its Neuron SDK, accelerating machine learning in the cloud.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/aws.amazon.com\/blogs\/machine-learning\/aws-inferentia-is-now-available-in-11-aws-regions-with-best-in-class-performance-for-running-object-detection-models-at-scale\/<\/p>\n","protected":false},"author":0,"featured_media":301,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/300"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=300"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/300\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/301"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=300"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=300"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=300"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}