{"id":3195,"date":"2023-09-29T16:20:33","date_gmt":"2023-09-29T16:20:33","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2023\/09\/29\/heeding-huangs-law-video-shows-how-engineers-keep-the-speedups-coming\/"},"modified":"2023-09-29T16:20:33","modified_gmt":"2023-09-29T16:20:33","slug":"heeding-huangs-law-video-shows-how-engineers-keep-the-speedups-coming","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2023\/09\/29\/heeding-huangs-law-video-shows-how-engineers-keep-the-speedups-coming\/","title":{"rendered":"Heeding Huang\u2019s Law: Video Shows How Engineers Keep the Speedups Coming"},"content":{"rendered":"<div id=\"bsf_rt_marker\">\n<p>In a talk, now available <a href=\"https:\/\/youtu.be\/rsxCZAE8QNA\">online<\/a>, NVIDIA Chief Scientist Bill Dally describes a tectonic shift in how computer performance gets delivered in a post-Moore\u2019s law era.<\/p>\n<p>Each new processor requires ingenuity and effort to invent and validate fresh ingredients, he said in a recent keynote address at Hot Chips, an annual gathering of chip and systems engineers. That\u2019s radically different from a generation ago, when engineers essentially relied on the physics of ever smaller, faster chips.<\/p>\n<p>The team of more than 300 that Dally leads at NVIDIA Research helped deliver a whopping 1,000x improvement in single-GPU performance on AI inference over the past decade (see chart below).<\/p>\n<p>It\u2019s an astounding increase that <a href=\"https:\/\/spectrum.ieee.org\/move-over-moores-law-make-way-for-huangs-law\">IEEE Spectrum<\/a> was the first to dub \u201cHuang\u2019s Law\u201d after NVIDIA founder and CEO Jensen Huang. 
The label was later popularized by <a href=\"https:\/\/www.wsj.com\/articles\/huangs-law-is-the-new-moores-law-and-explains-why-nvidia-wants-arm-11600488001\">a column<\/a> in the Wall Street Journal.<\/p>\n<p><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/09\/1000x-GPU-Gains-in-10-Years.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/09\/1000x-GPU-Gains-in-10-Years-672x372.jpg\" alt=\"1000x leap in GPU performance in a decade\" width=\"672\" height=\"372\"><\/a><\/p>\n<p>The advance was a response to the equally phenomenal rise of <a href=\"https:\/\/blogs.nvidia.com\/blog\/2023\/01\/26\/what-are-large-language-models-used-for\/\">large language models<\/a> used for <a href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/data-science\/generative-ai\/\">generative AI<\/a>, which are growing by an order of magnitude every year.<\/p>\n<p>\u201cThat\u2019s been setting the pace for us in the hardware industry because we feel we have to provide for this demand,\u201d Dally said.<\/p>\n<p>In his talk, Dally detailed the elements that drove the 1,000x gain.<\/p>\n<p>The largest of all, a 16x gain, came from finding simpler ways to represent the numbers computers use to make their calculations.<\/p>\n<h2><b>The New Math<\/b><\/h2>\n<p>The latest <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/technologies\/hopper-architecture\/\">NVIDIA Hopper architecture<\/a> with its <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/03\/22\/h100-transformer-engine\/\">Transformer Engine<\/a> uses a dynamic mix of eight- and 16-bit floating-point and integer math, tailored to the needs of today\u2019s generative AI models. Dally detailed both the performance gains and the energy savings the new math delivers.<\/p>\n<p>Separately, his team helped achieve a 12.5x leap by crafting advanced instructions that tell the GPU how to organize its work. 
These complex commands help the GPU execute more work with less energy.<\/p>\n<p>As a result, computers can be \u201cas efficient as dedicated accelerators, but retain all the programmability of GPUs,\u201d he said.<\/p>\n<p>In addition, the <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/ampere-architecture\/\">NVIDIA Ampere architecture<\/a> added <a href=\"https:\/\/blogs.nvidia.com\/blog\/2020\/05\/14\/sparsity-ai-inference\/\">structural sparsity<\/a>, an innovative way to simplify the weights in AI models without compromising their accuracy. The technique brought another 2x performance increase and promises future advances, too, he said.<\/p>\n<p>Dally described how <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/nvlink\/\">NVLink<\/a> interconnects between GPUs in a system and <a href=\"https:\/\/www.nvidia.com\/en-us\/networking\/\">NVIDIA networking<\/a> among systems compound the 1,000x gains in single-GPU performance.<\/p>\n<h2><b>No Free Lunch<\/b><\/h2>\n<p>Though NVIDIA migrated GPUs from 28nm to 5nm semiconductor nodes over the decade, that process technology accounted for only 2.5x of the total gains, Dally noted.<\/p>\n<p>That\u2019s a huge change from computer design a generation ago under Moore\u2019s law, the observation that transistor counts, and with them performance, should double roughly every two years as chips become ever smaller and faster.<\/p>\n<p>Those gains were described in part by Dennard scaling, essentially a physics formula defined in <a href=\"https:\/\/ieeexplore.ieee.org\/document\/1050511\">a 1974 paper<\/a> co-authored by IBM scientist Robert Dennard. 
Unfortunately, the physics of shrinking hit natural limits such as the amount of heat the ever smaller and faster devices could tolerate.<\/p>\n<h2><b>An Upbeat Outlook<\/b><\/h2>\n<p>Dally expressed confidence that Huang\u2019s law will continue despite diminishing gains from Moore\u2019s law.<\/p>\n<p>For example, he outlined several opportunities for future advances in further simplifying how numbers are represented, creating more sparsity in AI models and designing better memory and communications circuits.<\/p>\n<p>Because each new chip and system generation demands new innovations, \u201cit\u2019s a fun time to be a computer engineer,\u201d he said.<\/p>\n<p>Dally believes the new dynamic in computer design is giving NVIDIA\u2019s engineers the three opportunities they desire most: to be part of a winning team, to work with smart people and to work on designs that have impact.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/2023\/09\/29\/huangs-law-dally-hot-chips\/<\/p>\n","protected":false},"author":0,"featured_media":3196,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/3195"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=3195"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/3195\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/3196"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine
-learning\/wp-json\/wp\/v2\/media?parent=3195"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=3195"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=3195"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}