{"id":853,"date":"2021-09-17T06:56:38","date_gmt":"2021-09-17T06:56:38","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2021\/09\/17\/pushing-forward-the-frontiers-of-natural-language-processing\/"},"modified":"2021-09-17T06:56:38","modified_gmt":"2021-09-17T06:56:38","slug":"pushing-forward-the-frontiers-of-natural-language-processing","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2021\/09\/17\/pushing-forward-the-frontiers-of-natural-language-processing\/","title":{"rendered":"Pushing Forward the Frontiers of Natural Language Processing\u00a0"},"content":{"rendered":"<div data-url=\"https:\/\/blogs.nvidia.com\/blog\/2021\/09\/16\/nlp-frontiers-ai-hardware-summit\/\" data-title=\"Pushing Forward the Frontiers of Natural Language Processing\u00a0\">\n<p>Idea generation, not hardware or software, needs to be the bottleneck to the advancement of AI, Bryan Catanzaro, vice president of applied deep learning research at NVIDIA, said this week at the AI Hardware Summit.<\/p>\n<p>\u201cWe want the inventors, the researchers and the engineers that are coming up with future AI to be limited only by their own thoughts,\u201d Catanzaro told the audience.<\/p>\n<p>Catanzaro leads a team of researchers working to apply the power of deep learning to everything from video games to chip design. 
At the annual event held in Silicon Valley, he described the work that NVIDIA is doing to enable advancements in AI, with a focus on large language modeling.<\/p>\n<h2><b>CUDA Is for the Dreamers<\/b><\/h2>\n<p>Training and deploying large neural networks is a tough computational problem, so hardware that\u2019s both incredibly fast and highly efficient is a necessity, according to Catanzaro.<\/p>\n<p>But, he explained, the software that accompanies that hardware might be even more important to unlocking further advancements in AI.<\/p>\n<p>\u201cThe core of the work that we do involves optimizing hardware and software together, all the way from chips, to systems, to software, frameworks, libraries, compilers, algorithms and applications,\u201d he said. \u201cWe optimize all of these things to give transformational capabilities to scientists, researchers and engineers around the world.\u201d<\/p>\n<p>This end-to-end approach yields chart-topping performance in industry-standard benchmarks, such as <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/mlperf\/\">MLPerf<\/a>. It also ensures that developers aren\u2019t constrained by the platform as they aim to advance AI.<\/p>\n<p>\u201cCUDA is for the dreamers, CUDA is for the people who are thinking new thoughts,\u201d said Catanzaro. \u201cHow do they think those thoughts and test them efficiently? 
They need something general and flexible, and that\u2019s why we build what we build.\u201d<\/p>\n<h2><b>Large Language Models Are Changing the World<\/b><\/h2>\n<p>One of the most exciting areas of AI is language modeling, which is enabling groundbreaking applications in natural language understanding and <a href=\"https:\/\/blogs.nvidia.com\/blog\/2021\/02\/25\/what-is-conversational-ai\/\">conversational AI<\/a>.<\/p>\n<p>The complexity of large language models is growing at an incredible rate, with parameter counts <i>doubling<\/i> every two months.<\/p>\n<p>A well-known example of a large and powerful language model is GPT-3, developed by OpenAI. Packing 175 billion parameters, it required 314 zettaflops (a zettaflop is 10<sup>21<\/sup> floating point operations) to train.<\/p>\n<p>\u201cIt\u2019s a staggering amount of compute,\u201d Catanzaro said. \u201cAnd that means language modeling is now becoming constrained by economics.\u201d<\/p>\n<p>Estimates suggest that GPT-3 would cost about $12 million to train and, Catanzaro observed, the rapid growth in model complexity means that, despite NVIDIA\u2019s tireless work to advance the performance and efficiency of its hardware and software, the cost to train these models is set to grow.<\/p>\n<p>And, according to Catanzaro, this trend suggests that it might not be too long before a single model requires more than a billion dollars\u2019 worth of compute time to train.<\/p>\n<p>\u201cWhat would it look like to build a single model that took a billion dollars to train? Well, it would need to reinvent an entire company, and you\u2019d need to be able to use it in a lot of different contexts,\u201d Catanzaro explained.<\/p>\n<p>Catanzaro expects that these models will unlock an incredible amount of value, inspiring continued innovation. 
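As a rough sanity check on the cost figures above, the training budget can be estimated from the compute requirement alone. The GPU throughput, utilization, and hourly price below are illustrative assumptions for a V100-class cloud setup, not figures from NVIDIA or from the talk:

```python
# Back-of-the-envelope GPT-3 training cost, starting from the
# 314-zettaflop figure quoted above. The GPU spec, utilization,
# and price are illustrative assumptions, not NVIDIA numbers.
total_flops = 314e21            # 314 zettaflops = 3.14e23 floating point ops

peak_flops_per_gpu = 125e12     # assumed: V100-class GPU, ~125 TFLOPS tensor peak
utilization = 0.30              # assumed: fraction of peak sustained during training
price_per_gpu_hour = 5.00       # assumed: on-demand cloud price in USD

effective_flops = peak_flops_per_gpu * utilization  # sustained FLOP/s per GPU
gpu_seconds = total_flops / effective_flops
gpu_hours = gpu_seconds / 3600
cost_usd = gpu_hours * price_per_gpu_hour

print(f"{gpu_hours:.2e} GPU-hours, ~${cost_usd / 1e6:.1f}M")
# → 2.33e+06 GPU-hours, ~$11.6M
```

Under these assumptions the estimate lands near the roughly $12 million figure cited above; swapping in faster hardware or cheaper pricing shifts the result by a modest constant factor, while the 314-zettaflop requirement is fixed by the model itself.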
During his talk, Catanzaro showed an example of the surprising capability of large language models to solve new tasks without being explicitly trained to do so.<\/p>\n<p>After inputting just a few examples into a large language model \u2014 four sentences, two written in English along with their corresponding Spanish translations \u2014 he then entered a new English sentence, which the model correctly translated into Spanish.<\/p>\n<p>The model was able to do this despite never being trained to do translation. Instead, it was trained \u2014 using, as Catanzaro described, \u201can enormous amount of data from the internet\u201d \u2014 to predict the next word that should follow a given sequence of text.<\/p>\n<p>To perform that very generic task, the model needed to come up with higher-level representations of concepts, such as the existence of languages in general, English and Spanish vocabularies and grammar, and the concept of a translation task, in order to understand the query and properly respond.<\/p>\n<p>\u201cThese language models are first steps towards generalized artificial intelligence with few-shot learning, and that is enormously valuable and very exciting,\u201d explained Catanzaro.<\/p>\n<h2><b>A Full-Stack Approach to Language Modeling\u00a0<\/b><\/h2>\n<p>Catanzaro then went on to describe <a href=\"https:\/\/developer.nvidia.com\/blog\/language-modeling-using-megatron-a100-gpu\/\">NVIDIA Megatron<\/a>, a framework created by NVIDIA using PyTorch \u201cfor efficiently training the world\u2019s largest transformer-based language models.\u201d<\/p>\n<p>A key feature of NVIDIA Megatron, which Catanzaro noted has already been used by various companies and organizations to train large transformer-based models, is model parallelism.<\/p>\n<p>Megatron supports both inter-layer (pipeline) parallelism, which allows different layers of a model to be processed on different devices, and intra-layer (tensor) parallelism, which allows a single 
layer to be processed by multiple devices.<\/p>\n<p>Catanzaro further described some of the optimizations that NVIDIA applies to maximize the efficiency of pipeline parallelism and minimize so-called \u201cpipeline bubbles,\u201d during which a GPU is not performing useful work.<\/p>\n<p>A batch is split into microbatches, whose execution is pipelined. This boosts the utilization of the GPU resources in a system during training. With further optimizations, pipeline bubbles can be reduced even more.<\/p>\n<p>Catanzaro described one such optimization, <a href=\"https:\/\/arxiv.org\/abs\/2104.04473\">recently published<\/a>, that entails \u201cround-robining each (pipeline) stage among multiple GPUs so that we can further reduce the amount of pipeline bubble overhead in this schedule.\u201d<\/p>\n<p>Although this optimization puts additional stress on the communication fabric within the system, Catanzaro showed that, by leveraging the full suite of NVIDIA\u2019s high-bandwidth, low-latency interconnect technologies, it is able to deliver sizable speedups when training GPT-3-style models.<\/p>\n<p><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2021\/09\/ai-hardware-summit-slide.png\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2021\/09\/ai-hardware-summit-slide-672x378.png\" alt=\"Slide showing Megatron performance scaling on NVIDIA DGX SuperPOD\" width=\"672\" height=\"378\"><\/a><\/p>\n<p>Catanzaro then highlighted the impressive performance scaling of Megatron on <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-superpod\/\">NVIDIA DGX SuperPOD<\/a>, achieving 502 petaflops sustained across 3,072 GPUs, representing an astonishing 52 percent of Tensor Core peak at scale.<\/p>\n<p>\u201cThis represents an achievement by all of NVIDIA and our partners in the industry: to be able to deliver that level of end-to-end performance requires optimizing the entire computing stack, from algorithms to interconnects, from 
frameworks to processors,\u201d said Catanzaro.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>http:\/\/feedproxy.google.com\/~r\/nvidiablog\/~3\/1aTPPXV3J4s\/<\/p>\n","protected":false},"author":0,"featured_media":854,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/853"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=853"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/853\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/854"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=853"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=853"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=853"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}