{"id":2183,"date":"2022-06-29T17:39:51","date_gmt":"2022-06-29T17:39:51","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/06\/29\/nvidia-partners-show-leading-ai-performance-and-versatility-in-mlperf\/"},"modified":"2022-06-29T17:39:51","modified_gmt":"2022-06-29T17:39:51","slug":"nvidia-partners-show-leading-ai-performance-and-versatility-in-mlperf","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/06\/29\/nvidia-partners-show-leading-ai-performance-and-versatility-in-mlperf\/","title":{"rendered":"NVIDIA, Partners Show Leading AI Performance and Versatility in MLPerf"},"content":{"rendered":"<div data-url=\"https:\/\/blogs.nvidia.com\/blog\/2022\/06\/29\/nvidia-partners-ai-mlperf\/\" data-title=\"NVIDIA, Partners Show Leading AI Performance and Versatility in MLPerf\" data-hashtags=\"\">\n<p>NVIDIA and its partners continued to provide the best overall AI training performance and the most submissions across all benchmarks with 90% of all entries coming from the ecosystem, according to MLPerf benchmarks released today.<\/p>\n<p>The NVIDIA AI platform covered all eight benchmarks in the MLPerf Training 2.0 round, highlighting its leading versatility.<\/p>\n<p>No other accelerator ran all benchmarks, which represent popular AI use cases including speech recognition, natural language processing, recommender systems, object detection, image classification and more. 
NVIDIA has done so consistently since its December 2018 submission to the first round of MLPerf, an industry-standard suite of AI benchmarks.<\/p>\n<h2><b>Leading Benchmark Results, Availability<\/b><\/h2>\n<p>In its fourth consecutive MLPerf Training submission, the <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/\">NVIDIA A100 Tensor Core GPU<\/a> based on the NVIDIA Ampere architecture continued to excel.<\/p>\n<figure id=\"attachment_57970\" aria-describedby=\"caption-attachment-57970\" class=\"wp-caption alignnone\">\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/06\/Slide9-400x225.jpg\" alt=\"\" width=\"663\" height=\"373\"><figcaption id=\"caption-attachment-57970\" class=\"wp-caption-text\">Fastest time to train on each network by each submitter\u2019s platform<\/figcaption><\/figure>\n<p>Selene \u2014 our in-house AI supercomputer based on the modular NVIDIA DGX SuperPOD and powered by NVIDIA A100 GPUs, our software stack and NVIDIA InfiniBand networking \u2014 turned in the fastest time to train on four out of eight tests.<\/p>\n<figure id=\"attachment_57973\" aria-describedby=\"caption-attachment-57973\" class=\"wp-caption alignnone\">\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/06\/Slide10-400x225.jpg\" alt=\"\" width=\"663\" height=\"373\"><figcaption id=\"caption-attachment-57973\" class=\"wp-caption-text\">To calculate per-chip performance, this chart normalizes every submission to the most common scale across submitters, with scores normalized to the fastest competitor, which is shown as 1x.<\/figcaption><\/figure>\n<p>NVIDIA A100 also continued its per-chip leadership, proving the fastest on six of the eight tests.<\/p>\n<p>A total of 16 partners submitted results this round using the NVIDIA AI platform. 
They include ASUS, Baidu, CASIA (Institute of Automation, Chinese Academy of Sciences), Dell Technologies, Fujitsu, GIGABYTE, H3C, Hewlett Packard Enterprise, Inspur, KRAI, Lenovo, MosaicML, Nettrix and Supermicro.<\/p>\n<p>Most of our OEM partners submitted results using <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/products\/certified-systems\/\">NVIDIA-Certified Systems<\/a>, servers validated by NVIDIA to provide great performance, manageability, security and scalability for enterprise deployments.<\/p>\n<h2><b>Many Models Power Real AI Applications<\/b><\/h2>\n<p>An AI application may need to understand a user\u2019s spoken request, classify an image, make a recommendation and deliver a response as a spoken message.<\/p>\n<figure id=\"attachment_57976\" aria-describedby=\"caption-attachment-57976\" class=\"wp-caption alignnone\">\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/06\/Slide13-400x225.jpg\" alt=\"\" width=\"647\" height=\"364\"><figcaption id=\"caption-attachment-57976\" class=\"wp-caption-text\">Even the simple use case above requires nearly 10 models, highlighting the importance of running every benchmark<\/figcaption><\/figure>\n<p>These tasks require multiple kinds of AI models to work in sequence, also known as a pipeline. Users need to design, train, deploy and optimize these models quickly and flexibly.<\/p>\n<p>That\u2019s why both versatility \u2013 the ability to run every model in MLPerf and beyond \u2013 and leading performance are vital for bringing real-world AI into production.<\/p>\n<h2><b>Delivering ROI With AI<\/b><\/h2>\n<p>For customers, their data science and engineering teams are their most precious resources, and their productivity determines the return on investment for AI infrastructure. 
Customers must consider the cost of expensive data science teams, which often plays a significant part in the total cost of deploying AI, as well as the relatively small cost of deploying the AI infrastructure itself.<\/p>\n<p>AI researcher productivity depends on the ability to quickly test new ideas, requiring both the versatility to train any model and the speed afforded by training those models at the largest scale. That\u2019s why organizations focus on overall productivity per dollar to determine the best AI platforms \u2014 a more comprehensive view that more accurately represents the true cost of deploying AI.<\/p>\n<p>In addition, the utilization of their AI infrastructure relies on its fungibility, or the ability to accelerate the entire AI workflow \u2014 from data prep to training to inference \u2014 on a single platform.<\/p>\n<p>With NVIDIA AI, customers can use the same infrastructure for the entire AI pipeline, repurposing it to match the varying demands between data preparation, training and inference, which dramatically boosts utilization, leading to very high ROI.<\/p>\n<p>And, as researchers discover new AI breakthroughs, supporting the latest model innovations is key to maximizing the useful life of AI infrastructure.<\/p>\n<p>NVIDIA AI delivers the highest productivity per dollar as it is universal and performant for every model, scales to any size and accelerates AI from end to end \u2014 from data prep to training to inference.<\/p>\n<p>Today\u2019s results provide the latest demonstration of NVIDIA\u2019s broad and deep AI expertise shown in every MLPerf training, inference and HPC round to date.<\/p>\n<h2><b>23x More Performance in 3.5 Years<\/b><\/h2>\n<p>In the two years since our first MLPerf submission with A100, our platform has delivered 6x more performance. 
Continuous optimizations to our software stack helped fuel those gains.<\/p>\n<p>Since the advent of MLPerf, the NVIDIA AI platform has delivered 23x more performance in 3.5 years on the benchmark \u2014 the result of full-stack innovation spanning GPUs, software and at-scale improvements. It\u2019s this continuous commitment to innovation that assures customers that the AI platform they invest in today and keep in service for 3 to 5 years will continue to advance to support the state of the art.<\/p>\n<p>In addition, the <a href=\"https:\/\/www.nvidia.com\/en-us\/technologies\/hopper-architecture\/\">NVIDIA Hopper architecture<\/a>, announced in March, promises another giant leap in performance in future MLPerf rounds.<\/p>\n<h2><b>How We Did It<\/b><\/h2>\n<p>Software innovation continues to unlock more performance on the <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/ampere-architecture\/\">NVIDIA Ampere architecture<\/a>.<\/p>\n<p>For example, <a href=\"https:\/\/developer.nvidia.com\/blog\/cuda-graphs\/\">CUDA Graphs<\/a> \u2014 software that helps minimize launch overhead on jobs that run across many accelerators \u2014 is used extensively across our submissions. Optimized kernels in our libraries like cuDNN and pre-processing in DALI unlocked additional speedups. We also implemented full-stack improvements across hardware, software and networking, such as NVIDIA Magnum IO and SHARP, which offloads some AI functions into the network to drive even greater performance, especially at scale.<\/p>\n<p>All the software we use is available from the MLPerf repository, so everyone can reproduce our world-class results. 
We continuously fold these optimizations into containers available on <a href=\"https:\/\/ngc.nvidia.com\/catalog\">NGC<\/a>, our software hub for GPU applications, and offer NVIDIA AI Enterprise to deliver optimized software, fully supported by NVIDIA.<\/p>\n<p>Two years after the debut of A100, the NVIDIA AI platform continues to deliver the highest performance in MLPerf 2.0, and is the only platform to submit on every single benchmark. Our next-generation Hopper architecture promises another giant leap in future MLPerf rounds.<\/p>\n<p>Our platform is universal for every model and framework at any scale, and provides the fungibility to handle every part of the AI workload. It\u2019s available from every major cloud and server maker.<\/p>\n<p>\u00a0<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/2022\/06\/29\/nvidia-partners-ai-mlperf\/<\/p>\n","protected":false},"author":0,"featured_media":2184,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2183"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=2183"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2183\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/2184"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=2183"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistributi
on.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=2183"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=2183"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}