{"id":1497,"date":"2022-01-24T17:42:05","date_gmt":"2022-01-24T17:42:05","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/01\/24\/meta-works-with-nvidia-to-build-massive-ai-research-supercomputer\/"},"modified":"2022-01-24T17:42:05","modified_gmt":"2022-01-24T17:42:05","slug":"meta-works-with-nvidia-to-build-massive-ai-research-supercomputer","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/01\/24\/meta-works-with-nvidia-to-build-massive-ai-research-supercomputer\/","title":{"rendered":"Meta Works with NVIDIA to Build Massive AI Research Supercomputer"},"content":{"rendered":"<div data-url=\"https:\/\/blogs.nvidia.com\/blog\/2022\/01\/24\/meta-ai-supercomputer-dgx\/\" data-title=\"Meta Works with NVIDIA to Build Massive AI Research Supercomputer\" data-hashtags=\"\">\n<p>Meta Platforms gave a big thumbs up to NVIDIA, choosing our technologies for what it believes will be its most powerful research system to date.<\/p>\n<p>The AI Research SuperCluster (RSC), announced today, is already training new models to advance AI.<\/p>\n<p>Once fully deployed, Meta\u2019s RSC is expected to be the largest customer installation of NVIDIA DGX A100 systems.<\/p>\n<p>\u201cWe hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they could seamlessly collaborate on a research project or play an AR game together,\u201d the company said in a <a href=\"https:\/\/ai.facebook.com\/blog\/ai-rsc\">blog<\/a>.<\/p>\n<h2><b>Training AI\u2019s Largest Models<\/b><\/h2>\n<p>When RSC is fully built out, later this year, Meta aims to use it to train AI models with more than a trillion parameters. 
That could advance fields such as natural-language processing for jobs like identifying harmful content in real time.<\/p>\n<p>In addition to performance at scale, Meta cited extreme reliability, security, privacy and the flexibility to handle \u201ca wide range of AI models\u201d as its key criteria for RSC.<\/p>\n<figure id=\"attachment_55167\" aria-describedby=\"caption-attachment-55167\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/01\/meta-rsc-supercomputer.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/01\/meta-rsc-supercomputer.jpg\" alt=\"Meta RSC system\" width=\"600\" height=\"318\"><\/a><figcaption id=\"caption-attachment-55167\" class=\"wp-caption-text\">Meta\u2019s AI Research SuperCluster features hundreds of NVIDIA DGX systems linked on an NVIDIA Quantum InfiniBand network to accelerate the work of its AI research teams.<\/figcaption><\/figure>\n<h2><b>Under the Hood<\/b><\/h2>\n<p>The new AI supercomputer currently uses 760 <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-a100\/\">NVIDIA DGX A100 systems<\/a> as its compute nodes. They pack a total of 6,080 <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/\">NVIDIA A100 GPUs<\/a> linked on an <a href=\"https:\/\/www.nvidia.com\/en-us\/networking\/products\/infiniband\/\">NVIDIA Quantum 200Gb\/s InfiniBand<\/a> network to deliver 1,895 petaflops of TF32 performance.<\/p>\n<p>Despite challenges from COVID-19, RSC took just 18 months to go from an idea on paper to a working AI supercomputer, thanks in part to the NVIDIA DGX A100 technology at the foundation of Meta RSC.<\/p>\n<h2><b>20x Performance Gains<\/b><\/h2>\n<p>It\u2019s the second time Meta has picked NVIDIA technologies as the base for its research infrastructure. 
In 2017, Meta built the first generation of this infrastructure for AI research with 22,000 NVIDIA V100 Tensor Core GPUs that handle 35,000 AI training jobs a day.<\/p>\n<p>Meta\u2019s early benchmarks showed RSC can train large NLP models 3x faster and run computer vision jobs 20x faster than the prior system.<\/p>\n<p>In a second phase later this year, RSC will expand to 16,000 GPUs that Meta believes will deliver a whopping 5 exaflops of mixed-precision AI performance. And Meta aims to expand RSC\u2019s storage system to deliver up to an exabyte of data at 16 terabytes per second.<\/p>\n<h2><b>A Scalable Architecture<\/b><\/h2>\n<p>NVIDIA AI technologies are available to enterprises of any size.<\/p>\n<p>NVIDIA DGX, which includes a full stack of <a href=\"https:\/\/www.nvidia.com\/en-us\/gpu-cloud\/\">NVIDIA AI software<\/a>, scales easily from a single system to a DGX SuperPOD running on-premises or at a <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/colocation-partners\/\">colocation provider<\/a>. 
Customers can also rent DGX systems through <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-foundry\/\">NVIDIA DGX Foundry<\/a>.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/2022\/01\/24\/meta-ai-supercomputer-dgx\/<\/p>\n","protected":false},"author":0,"featured_media":1498,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1497"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=1497"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/1497\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/1498"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=1497"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=1497"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=1497"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}