{"id":2223,"date":"2022-07-26T16:41:58","date_gmt":"2022-07-26T16:41:58","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/07\/26\/what-is-an-exaflop\/"},"modified":"2022-07-26T16:41:58","modified_gmt":"2022-07-26T16:41:58","slug":"what-is-an-exaflop","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/07\/26\/what-is-an-exaflop\/","title":{"rendered":"What Is an Exaflop?"},"content":{"rendered":"<div data-url=\"https:\/\/blogs.nvidia.com\/blog\/2022\/07\/26\/what-is-an-exaflop\/\" data-title=\"What Is an Exaflop?\" data-hashtags=\"\">\n<p>Computers are crunching more numbers than ever to crack the most complex problems of our time \u2014 how to cure diseases like COVID and cancer, mitigate climate change and more.<\/p>\n<p>These and other grand challenges ushered computing into today\u2019s exascale era when top performance is often measured in exaflops.<\/p>\n<h2><b>So, What\u2019s an Exaflop?<\/b><\/h2>\n<p>An exaflop is a measure of performance for a supercomputer that can calculate at least 10<sup>18<\/sup> or one quintillion floating point operations per second.<\/p>\n<p>In exaflop, the exa- prefix means a quintillion, that\u2019s a billion billion, or one followed by 18 zeros. Similarly, an exabyte is a memory subsystem packing a quintillion bytes of data.<\/p>\n<p>The \u201cflop\u201d in exaflop is an abbreviation for floating point operations. The rate at which a system executes a flop in seconds is measured in exaflop\/s.<\/p>\n<p>Floating point refers to calculations made where all the numbers are expressed with decimal points.<\/p>\n<h2><b>1,000 Petaflops = an Exaflop<\/b><\/h2>\n<p>The prefix peta- means 10<sup>15<\/sup>, or one with 15 zeros behind it. 
So, an exaflop is a thousand petaflops.<\/p>\n<p><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/07\/Prefixes.jpg\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/07\/Prefixes-672x390.jpg\" alt=\"The exaflop in historical context\" width=\"672\" height=\"390\"><\/p>\n<p><\/a><\/p>\n<p>To get a sense of what a heady calculation an exaflop is, imagine a billion people, each holding a billion calculators. (Clearly, they\u2019ve got big hands!)<\/p>\n<p>If they all hit the equal sign at the same time, they\u2019d execute one exaflop.<\/p>\n<p>Indiana University, home to the <a href=\"https:\/\/blogs.nvidia.com\/blog\/2021\/08\/17\/ai-supercomputers-universities\/\">Big Red 200<\/a> and several other supercomputers, puts it this way: To match what an exaflop computer can do in just one second, you\u2019d have to perform one calculation every second for 31,688,765,000 years.<\/p>\n<h2><b>A Brief History of the Exaflop<\/b><\/h2>\n<p>For most of supercomputing\u2019s history, a flop was a flop, a reality that\u2019s morphing as workloads embrace AI.<\/p>\n<p>People used numbers expressed in the highest of several <a href=\"https:\/\/blogs.nvidia.com\/blog\/2019\/11\/15\/whats-the-difference-between-single-double-multi-and-mixed-precision-computing\/\">precision formats<\/a>, called double precision, as defined by the IEEE Standard for Floating Point Arithmetic. It\u2019s dubbed double precision, or FP64, because each number in a calculation requires 64 bits, data nuggets expressed as a zero or one. By contrast, single precision uses 32 bits.<\/p>\n<p>Double precision uses those 64 bits to ensure each number is accurate to a tiny fraction. 
It\u2019s like saying 1.0001 + 1.0001 = 2.0002, instead of 1 + 1 = 2.<\/p>\n<p>The format is a great fit for what made up the bulk of the workloads at the time \u2014 simulations of everything, from atoms to airplanes, that need to ensure their results come close to what they represent in the real world.<\/p>\n<p>So, it was natural that the LINPACK benchmark, aka HPL, which measures performance on FP64 math, became the default measurement in 1993, when the TOP500 list of the world\u2019s most powerful supercomputers debuted.<\/p>\n<h2><b>The Big Bang of AI<\/b><\/h2>\n<p>A decade ago, the computing industry heard what NVIDIA CEO Jensen Huang describes as the <a href=\"https:\/\/blogs.nvidia.com\/blog\/2016\/01\/12\/accelerating-ai-artificial-intelligence-gpus\/\">big bang of AI<\/a>.<\/p>\n<p>This powerful new form of computing started showing significant results on scientific and business applications. And it takes advantage of some very different mathematical methods.<\/p>\n<p>Deep learning is not about simulating real-world objects; it\u2019s about sifting through mountains of data to find patterns that enable fresh insights.<\/p>\n<p>Its math demands high throughput, so doing many, many calculations with simplified numbers (like 1.01 instead of 1.0001) is much better than doing fewer calculations with more complex ones.<\/p>\n<p>That\u2019s why AI uses lower precision formats like FP32, FP16 and FP8. Their 32-, 16- and 8-bit numbers let users do more calculations faster.<\/p>\n<h2><b>Mixed Precision Evolves<\/b><\/h2>\n<p>For AI, using 64-bit numbers would be like taking your whole closet when going away for the weekend.<\/p>\n<p>Finding the ideal lower-precision technique for AI is an active area of research.<\/p>\n<p>For example, the first <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/tensor-cores\/\">NVIDIA Tensor Core GPU<\/a>, Volta, used mixed precision. 
It executed matrix multiplication in FP16, then accumulated the results in FP32 for higher accuracy.<\/p>\n<h2><b>Hopper Accelerates With FP8<\/b><\/h2>\n<p>More recently, the <a href=\"https:\/\/www.nvidia.com\/en-us\/technologies\/hopper-architecture\/\">NVIDIA Hopper architecture<\/a> debuted with a lower-precision method for training AI that\u2019s even faster. The <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/03\/22\/h100-transformer-engine\/\">Hopper Transformer Engine<\/a> automatically analyzes a workload, adopts FP8 whenever possible and accumulates results in FP32.<\/p>\n<p>When it comes to the less compute-intensive job of inference \u2014 running AI models in production \u2014 major frameworks such as TensorFlow and PyTorch support 8-bit integer numbers for fast performance. That\u2019s because models running in production often don\u2019t need decimal points to do their work.<\/p>\n<p>The good news is NVIDIA GPUs support all the precision formats described above, so users can accelerate every workload optimally.<\/p>\n<p>Last year, the <a href=\"https:\/\/standards.ieee.org\/ieee\/3109\/10698\/\">IEEE P3109 committee<\/a> started work on an industry standard for precision formats used in machine learning. 
This work could take another year or two.<\/p>\n<h2><b>Some Sims Shine at Lower Precision<\/b><\/h2>\n<p>While FP64 remains popular for simulations, many researchers use lower-precision math when it delivers useful results faster.<\/p>\n<figure id=\"attachment_58432\" aria-describedby=\"caption-attachment-58432\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/07\/HPC-Perf-for-exaflop-explainer-source-2015-IBM-report.jpg\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/07\/HPC-Perf-for-exaflop-explainer-source-2015-IBM-report-672x495.jpg\" alt=\"Factors for HPC app performance vary\" width=\"672\" height=\"495\"><\/p>\n<p><\/a><figcaption id=\"caption-attachment-58432\" class=\"wp-caption-text\">HPC apps vary in the factors that impact their performance.<\/figcaption><\/figure>\n<p>For example, researchers run LS-Dyna from Ansys, a popular simulator for car crashes, in FP32. Genomics is another field that tends to prefer lower-precision math.<\/p>\n<p>In addition, many traditional simulations are starting to adopt AI for at least part of their workflows. As workloads shift toward AI, supercomputers need to support lower precision to run these emerging applications well.<\/p>\n<h2><b>Benchmarks Evolve With Workloads<\/b><\/h2>\n<p>Recognizing these changes, researchers including Jack Dongarra \u2014 the 2021 Turing Award winner and a contributor to HPL \u2014 debuted <a href=\"https:\/\/hpl-ai.org\/doc\/index\">HPL-AI<\/a> in 2019. 
It\u2019s a new benchmark that\u2019s better for measuring these new workloads.<\/p>\n<p>\u201cMixed-precision techniques have become increasingly important to improve the computing efficiency of supercomputers, both for traditional simulations with iterative refinement techniques as well as for AI applications,\u201d Dongarra said in a <a href=\"https:\/\/blogs.nvidia.com\/blog\/2019\/06\/17\/hpc-ai-performance-record-summit\/\">2019 blog<\/a>. \u201cJust as HPL allows benchmarking of double-precision capabilities, this new approach based on HPL allows benchmarking of mixed-precision capabilities of supercomputers at scale.\u201d<\/p>\n<p>Thomas Lippert, director of the J\u00fclich Supercomputing Center, agreed.<\/p>\n<p>\u201cWe\u2019re using the HPL-AI benchmark because it\u2019s a good measure of the mixed-precision work in a growing number of our AI and scientific workloads \u2014 and it reflects accurate 64-bit floating point results, too,\u201d he said in <a href=\"https:\/\/blogs.nvidia.com\/blog\/2021\/06\/28\/top500-ai-cloud-native\/\">a blog<\/a> posted last year.<\/p>\n<h2><b>Today\u2019s Exaflop Systems<\/b><\/h2>\n<p>In a June report, 20 supercomputer centers around the world reported their <a href=\"https:\/\/hpl-ai.org\/doc\/results\">HPL-AI results<\/a>, three of them delivering more than an exaflop.<\/p>\n<p>One of those systems, a supercomputer at Oak Ridge National Laboratory, also exceeded an exaflop in FP64 performance on HPL.<\/p>\n<figure id=\"attachment_58429\" aria-describedby=\"caption-attachment-58429\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/07\/HPLAI-June-2022.jpg\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/07\/HPLAI-June-2022-672x329.jpg\" alt=\"Exaflop results on HPL-AI\" width=\"672\" height=\"329\"><\/p>\n<p><\/a><figcaption id=\"caption-attachment-58429\" class=\"wp-caption-text\">A sampler of the June 
2022 HPL-AI results.<\/figcaption><\/figure>\n<p>Two years ago, a very unconventional system was the <a href=\"https:\/\/blogs.nvidia.com\/blog\/2020\/04\/01\/foldingathome-exaflop-coronavirus\/\">first to hit an exaflop<\/a>. The crowd-sourced supercomputer assembled by the Folding@home consortium passed the milestone after it put out a call for help fighting the COVID-19 pandemic and was deluged with donated time on more than a million computers.<\/p>\n<h2><b>Exaflop in Theory and Practice<\/b><\/h2>\n<p>Since then, many organizations have installed supercomputers that deliver more than an exaflop in theoretical peak performance. It\u2019s worth noting that the TOP500 list reports both Rmax (actual) and Rpeak (theoretical) scores.<\/p>\n<p>Rmax is simply the best performance a computer actually demonstrated.<\/p>\n<p>Rpeak is a system\u2019s top theoretical performance if everything could run at its highest possible level, something that almost never really happens. It\u2019s typically calculated by multiplying the number of processors in a system by their clock speed, then multiplying the result by the number of floating point operations the processors can perform in one clock cycle. For example, a hypothetical system with 1,000 processors, each running at 2 GHz and executing 16 floating point operations per cycle, would have an Rpeak of 1,000 \u00d7 2 billion \u00d7 16 = 32 teraflop\/s.<\/p>\n<p>So, if someone says their system can do an exaflop, consider asking if that\u2019s using Rmax (actual) or Rpeak (theoretical).<\/p>\n<h2><b>Many Metrics in the Exaflop Age<\/b><\/h2>\n<p>It\u2019s another one of the many nuances in this new exascale era.<\/p>\n<p>And it\u2019s worth noting that HPL and HPL-AI are synthetic benchmarks, meaning they measure performance on math routines, not real-world applications. Other benchmarks, like <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/resources\/mlperf-benchmarks\/\">MLPerf<\/a>, are based on real-world workloads.<\/p>\n<p>In the end, the best measure of a system\u2019s performance, of course, is how well it runs a user\u2019s applications. 
That\u2019s a measure not based on exaflops, but on ROI.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/2022\/07\/26\/what-is-an-exaflop\/<\/p>\n","protected":false},"author":0,"featured_media":2224,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2223"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=2223"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2223\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/2224"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=2223"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=2223"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=2223"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}