{"id":3453,"date":"2024-05-13T06:40:50","date_gmt":"2024-05-13T06:40:50","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2024\/05\/13\/dial-it-in-data-centers-need-new-metric-for-energy-efficiency\/"},"modified":"2024-05-13T06:40:50","modified_gmt":"2024-05-13T06:40:50","slug":"dial-it-in-data-centers-need-new-metric-for-energy-efficiency","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2024\/05\/13\/dial-it-in-data-centers-need-new-metric-for-energy-efficiency\/","title":{"rendered":"Dial It In: Data Centers Need New Metric for Energy Efficiency"},"content":{"rendered":"<div>\n<p>Data centers need an upgraded dashboard to guide their journey to greater <a href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/energy-efficiency\/\">energy efficiency<\/a>, one that shows progress running real-world applications.<\/p>\n<p>The formula for energy efficiency is simple: work done divided by energy used. Applying it to data centers calls for unpacking some details.<\/p>\n<p>Today\u2019s most widely used gauge, power usage effectiveness (<a href=\"https:\/\/leonardo-energy.pl\/wp-content\/uploads\/2018\/03\/Green_Grid_Metrics.pdf\">PUE<\/a>), compares the total energy a facility consumes to the amount its computing infrastructure uses. Over the last 17 years, PUE has driven the most efficient operators closer to an ideal where almost no energy is wasted on processes like power conversion and cooling.<\/p>\n<h2><b>Finding the Next Metrics<\/b><\/h2>\n<p>PUE served data centers well during the rise of cloud computing, and it will continue to be useful. 
But it\u2019s insufficient in today\u2019s <a href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/generative-ai\/\">generative AI<\/a> era, when workloads and the systems running them have changed dramatically.<\/p>\n<p>That\u2019s because PUE doesn\u2019t measure the useful output of a data center, only the energy that it consumes. That would be like measuring the amount of gas an engine uses without noticing how far the car has gone.<\/p>\n<p>Many standards exist for data center efficiency. A <a href=\"https:\/\/drive.google.com\/file\/d\/1k2QE_Jk8p5A9azpw4YrfWlZj5Sc0qWAQ\/view\">2017 paper<\/a> lists nearly three dozen of them, several focused on specific targets such as cooling, water use, security and cost.<\/p>\n<h2><b>Understanding What\u2019s Watts<\/b><\/h2>\n<p>When it comes to energy efficiency, the computer industry has a long and somewhat unfortunate history of describing systems and the processors they use in terms of power, typically in watts. It\u2019s a worthwhile metric, but many fail to realize that watts only measure input power at a point in time, not the actual energy computers use or how efficiently they use it.<\/p>\n<p>So, when modern systems and processors report rising input power levels in watts, that doesn\u2019t mean they\u2019re less energy efficient. In fact, they\u2019re often much more efficient in the amount of work they do with the amount of energy they use.<\/p>\n<p>Modern data center metrics should focus on energy, what the engineering community knows as kilowatt-hours or joules. The key is how much useful work they do with this energy.<\/p>\n<h2><b>Reworking What We Call Work<\/b><\/h2>\n<p>Here again, the industry has a practice of measuring in abstract terms, like processor instructions or math calculations. So, MIPS (millions of instructions per second) and FLOPS (floating point operations per second) are widely quoted.<\/p>\n<p>Only computer scientists care how many of these low-level jobs their system can handle. 
Users would prefer to know how much real work their systems put out, but defining useful work is somewhat subjective.<\/p>\n<p>Data centers focused on AI may rely on the <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/resources\/mlperf-benchmarks\/\">MLPerf benchmarks<\/a>. Supercomputing centers tackling scientific research typically use <a href=\"https:\/\/developer.nvidia.com\/hpc-application-performance\">additional measures<\/a> of work. Commercial data centers focused on streaming media may want others.<\/p>\n<p>The resulting suite of applications must be allowed to evolve over time to reflect the state of the art and the most relevant use cases. For example, the <a href=\"https:\/\/blogs.nvidia.com\/blog\/tensorrt-llm-inference-mlperf\/\">last MLPerf round<\/a> added tests using two generative AI models that didn\u2019t even exist five years ago.<\/p>\n<h2><b>A Gauge for Accelerated Computing<\/b><\/h2>\n<p>Ideally, any new benchmarks should measure advances in <a href=\"https:\/\/blogs.nvidia.com\/blog\/what-is-accelerated-computing\/\">accelerated computing<\/a>. This combination of parallel processing hardware, software and methods is running applications dramatically faster and more efficiently than CPUs across many modern workloads.<\/p>\n<p>For example, on scientific applications, the Perlmutter supercomputer at the National Energy Research Scientific Computing Center <a href=\"https:\/\/blogs.nvidia.com\/blog\/gpu-energy-efficiency-nersc\/\">demonstrated<\/a> an average of 5x gains in energy efficiency using accelerated computing. That\u2019s why it\u2019s among the 39 of the top 50 supercomputers on <a href=\"https:\/\/www.top500.org\/lists\/green500\/2023\/11\/\">the Green500 list<\/a>, including the No. 
1 system, that use NVIDIA GPUs.<\/p>\n<figure id=\"attachment_71555\" aria-describedby=\"caption-attachment-71555\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2024\/05\/Power-over-time.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-71555\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2024\/05\/Power-over-time-672x297.jpg\" alt=\"Chart of GPU vs CPU energy efficiency\" width=\"672\" height=\"297\"><\/a><figcaption id=\"caption-attachment-71555\" class=\"wp-caption-text\">Because they execute lots of tasks in parallel, GPUs execute more work in less time than CPUs, saving energy.<\/figcaption><\/figure>\n<p>Companies across many industries share similar results. For example, PayPal improved real-time fraud detection by 10% and <a href=\"https:\/\/developer.nvidia.com\/blog\/gpu-inference-momentum-continues-to-build\/\">lowered server energy consumption<\/a> nearly 8x with accelerated computing.<\/p>\n<p>The gains are growing with each new generation of GPU hardware and software.<\/p>\n<p>In a <a href=\"https:\/\/aiindex.stanford.edu\/wp-content\/uploads\/2023\/04\/HAI_AI-Index-Report_2023.pdf\">recent report<\/a>, Stanford University\u2019s Human-Centered AI group estimated GPU performance \u201chas increased roughly 7,000 times\u201d since 2003, and price per performance is \u201c5,600 times greater.\u201d<\/p>\n<figure id=\"attachment_71558\" aria-describedby=\"caption-attachment-71558\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2024\/05\/Graph-of-data-center-benchmarks-scaled.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-71558\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2024\/05\/Graph-of-data-center-benchmarks-672x377.jpg\" alt=\"Chart depicts relationships among various data center energy 
efficiency graphics\" width=\"672\" height=\"377\"><\/a><figcaption id=\"caption-attachment-71558\" class=\"wp-caption-text\">Data centers need a suite of benchmarks to track energy efficiency across their major workloads.<\/figcaption><\/figure>\n<h2><b>Two Experts Weigh In<\/b><\/h2>\n<p>Experts see the need for a new energy-efficiency metric, too.<\/p>\n<p>With today\u2019s data centers achieving scores around 1.2 PUE, the metric \u201chas run its course,\u201d said Christian Belady, a data center engineer who had the original idea for PUE. \u201cIt improved data center efficiency when things were bad, but two decades later, they\u2019re better, and we need to focus on other metrics more relevant to today\u2019s problems.\u201d<\/p>\n<p>Looking forward, \u201cthe holy grail is a performance metric. You can\u2019t compare different workloads directly, but if you segment by workloads, I think there is a better likelihood for success,\u201d said Belady, who continues to work on initiatives driving data center sustainability.<\/p>\n<p>Jonathan Koomey, a researcher and author on computer efficiency and sustainability, agreed.<\/p>\n<p>\u201cTo make good decisions about efficiency, data center operators need a suite of benchmarks that measure the energy implications of today\u2019s most widely used AI workloads,\u201d said Koomey.<\/p>\n<p>\u201cTokens per joule is a great example of what one element of such a suite might be,\u201d Koomey added. 
\u201cCompanies will need to engage in open discussions, share information on the nuances of their own workloads and experiments, and agree to realistic test procedures to ensure these metrics accurately characterize energy use for hardware running real-world applications.\u201d<\/p>\n<p>\u201cFinally, we need an open public forum to conduct this important work,\u201d he said.<\/p>\n<h2><b>It Takes a Village<\/b><\/h2>\n<p>Thanks to metrics like PUE and rankings like the Green500, data centers and supercomputing centers have made enormous progress in energy efficiency.<\/p>\n<p>More can and must be done to extend efficiency advances in the age of generative AI. Metrics of energy consumed doing useful work on today\u2019s top applications can take supercomputing and data centers to a new level of energy efficiency.<\/p>\n<p><i>To learn more about available energy-efficiency solutions, explore <\/i><a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/sustainable-computing\/\"><i>NVIDIA sustainable computing<\/i><\/a><i>.<\/i><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/datacenter-efficiency-metrics-isc\/<\/p>\n","protected":false},"author":0,"featured_media":3454,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/3453"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=3453"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/3453\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/3454"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=3453"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=3453"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=3453"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}