{"id":2511,"date":"2022-08-19T16:51:35","date_gmt":"2022-08-19T16:51:35","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/08\/19\/nvidia-to-share-new-details-on-grace-cpu-hopper-gpu-nvlink-switch-jetson-orin-module-at-hot-chips\/"},"modified":"2022-08-19T16:51:35","modified_gmt":"2022-08-19T16:51:35","slug":"nvidia-to-share-new-details-on-grace-cpu-hopper-gpu-nvlink-switch-jetson-orin-module-at-hot-chips","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/08\/19\/nvidia-to-share-new-details-on-grace-cpu-hopper-gpu-nvlink-switch-jetson-orin-module-at-hot-chips\/","title":{"rendered":"NVIDIA to Share New Details on Grace CPU, Hopper GPU, NVLink Switch, Jetson Orin Module at Hot Chips"},"content":{"rendered":"<div data-url=\"https:\/\/blogs.nvidia.com\/blog\/2022\/08\/19\/grace-hopper-nvswitch-hot-chips\/\" data-title=\"NVIDIA to Share New Details on Grace CPU, Hopper GPU, NVLink Switch, Jetson Orin Module at Hot Chips\" data-hashtags=\"\">\n<p>In four talks over two days, senior NVIDIA engineers will describe innovations in <a href=\"https:\/\/blogs.nvidia.com\/blog\/2021\/09\/01\/what-is-accelerated-computing\/\">accelerated computing<\/a> for modern data centers and systems <a href=\"https:\/\/blogs.nvidia.com\/blog\/2019\/10\/22\/what-is-edge-computing\/\">at the edge<\/a> of the network.<\/p>\n<p>Speaking at a virtual <a href=\"https:\/\/hotchips.org\/\">Hot Chips<\/a> event, an annual gathering of processor and system architects, they\u2019ll disclose performance numbers and other technical details for NVIDIA\u2019s first server CPU, the Hopper GPU, the latest version of the NVSwitch interconnect chip and the NVIDIA Jetson Orin system on module (SoM).<\/p>\n<p>The presentations provide fresh insights on how the NVIDIA platform will hit new levels of performance, efficiency, scale and security.<\/p>\n<p>Specifically, the talks demonstrate a design philosophy of innovating across the full 
stack of chips, systems and software where GPUs, CPUs and DPUs act as peer processors. Together they create a platform that\u2019s already running AI, data analytics and high performance computing jobs inside cloud service providers, supercomputing centers, corporate data centers and autonomous systems.<\/p>\n<h2><b>Inside NVIDIA\u2019s First Server CPU<\/b><\/h2>\n<p>Data centers require flexible clusters of CPUs, GPUs and other accelerators sharing massive pools of memory to deliver the energy-efficient performance today\u2019s workloads demand.<\/p>\n<p>To meet that need, Jonathon Evans, a distinguished engineer and 15-year veteran at NVIDIA, will describe the <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/nvlink-c2c\/\">NVIDIA NVLink-C2C<\/a>. It connects CPUs and GPUs at 900 gigabytes per second with 5x the energy efficiency of the existing PCIe Gen 5 standard, thanks to data transfers that consume just 1.3 picojoules per bit.<\/p>\n<p>NVLink-C2C connects two CPU chips to create the <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/grace-cpu\/\">NVIDIA Grace CPU<\/a> with 144 Arm Neoverse cores. It\u2019s a processor built to solve the world\u2019s largest computing problems.<\/p>\n<p>For maximum efficiency, the Grace CPU uses LPDDR5X memory. It enables a terabyte per second of memory bandwidth while keeping power consumption for the entire complex to 500 watts.<\/p>\n<h2><b>One Link, Many Uses<\/b><\/h2>\n<p>NVLink-C2C also links Grace CPU and Hopper GPU chips as memory-sharing peers in the <a href=\"https:\/\/nvidianews.nvidia.com\/news\/nvidia-introduces-grace-cpu-superchip\">NVIDIA Grace Hopper Superchip<\/a>, delivering maximum acceleration for performance-hungry jobs such as AI training.<\/p>\n<p>Anyone can build custom chiplets using NVLink-C2C to coherently connect to NVIDIA GPUs, CPUs, DPUs and SoCs, expanding this new class of integrated products. 
The interconnect will support AMBA CHI and CXL protocols used by Arm and x86 processors, respectively.<\/p>\n<figure id=\"attachment_59078\" aria-describedby=\"caption-attachment-59078\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/08\/Grace-benchmarks-draft-scaled.jpg\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/08\/Grace-benchmarks-draft-672x221.jpg\" alt=\"Memory benchmarks for Grace and Grace Hopper\" width=\"672\" height=\"221\"><\/p>\n<p><\/a><figcaption id=\"caption-attachment-59078\" class=\"wp-caption-text\">First memory benchmarks for Grace and Grace Hopper.<\/figcaption><\/figure>\n<p>To scale at the system level, the new <a href=\"https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/tesla-product-literature\/nvswitch-technical-overview.pdf\">NVIDIA NVSwitch<\/a> connects multiple servers into one AI supercomputer. It uses NVLink, interconnects running at 900 gigabytes per second, more than 7x the bandwidth of PCIe Gen 5.<\/p>\n<p>NVSwitch lets users link 32 <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-h100\/\">NVIDIA DGX H100<\/a> systems into an AI supercomputer that delivers an <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/07\/26\/what-is-an-exaflop\/\">exaflop<\/a> of peak AI performance.<\/p>\n<p>Alexander Ishii and Ryan Wells, both veteran NVIDIA engineers, will describe how the switch lets users build systems with up to 256 GPUs to tackle demanding workloads like training AI models that have more than 1 trillion parameters.<\/p>\n<p>The switch includes engines that speed data transfers using the NVIDIA Scalable Hierarchical Aggregation Reduction Protocol. <a href=\"https:\/\/docs.nvidia.com\/networking\/display\/sharpv214\">SHARP<\/a> is an in-network computing capability that debuted on NVIDIA Quantum InfiniBand networks. 
It can double data throughput on communications-intensive AI applications.<\/p>\n<figure id=\"attachment_59081\" aria-describedby=\"caption-attachment-59081\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/08\/NVSwitch-system-scaled.jpg\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/08\/NVSwitch-system-672x339.jpg\" alt=\"NVSwitch systems enable exaflop-class AI\" width=\"672\" height=\"339\"><\/p>\n<p><\/a><figcaption id=\"caption-attachment-59081\" class=\"wp-caption-text\">NVSwitch systems enable exaflop-class AI supercomputers.<\/figcaption><\/figure>\n<p>Jack Choquette, a senior distinguished engineer with 14 years at the company, will provide a detailed tour of the <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/h100\/\">NVIDIA H100 Tensor Core GPU<\/a>, aka Hopper.<\/p>\n<p>In addition to using the new interconnects to scale to unprecedented heights, it packs many advanced features that boost the accelerator\u2019s performance, efficiency and security.<\/p>\n<p>Hopper\u2019s new <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/03\/22\/h100-transformer-engine\/\">Transformer Engine<\/a> and upgraded Tensor Cores deliver a 30x speedup compared to the prior generation on AI inference with the world\u2019s largest neural network models. 
And it employs the world\u2019s first HBM3 memory system to deliver a whopping 3 terabytes per second of memory bandwidth, NVIDIA\u2019s biggest generational increase ever.<\/p>\n<p>Among other new features:<\/p>\n<p>Choquette, one of the lead chip designers on the Nintendo 64 console early in his career, will also describe parallel computing techniques underlying some of Hopper\u2019s advances.<\/p>\n<p>Michael Ditty, an architecture manager with a 17-year tenure at the company, will provide new performance specs for <a href=\"https:\/\/www.nvidia.com\/en-us\/autonomous-machines\/embedded-systems\/jetson-orin\/\">NVIDIA Jetson AGX Orin<\/a>, an engine for edge AI, robotics and advanced autonomous machines.<\/p>\n<p>It integrates 12 Arm Cortex-A78 cores and an NVIDIA Ampere architecture GPU to deliver up to 275 trillion operations per second on AI inference jobs. That\u2019s up to 8x greater performance at <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/04\/06\/mlperf-edge-ai-inference-orin\/\">2.3x higher energy efficiency<\/a> than the prior generation.<\/p>\n<p>The <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/08\/03\/nvidia-jetson-agx-orin-32gb-production-modules\/\">latest production module<\/a> packs up to 32 gigabytes of memory and is part of a compatible family that scales down to pocket-sized 5W Jetson Nano developer kits.<\/p>\n<figure id=\"attachment_59084\" aria-describedby=\"caption-attachment-59084\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/08\/Orin-perf.jpg\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/08\/Orin-perf-672x445.jpg\" alt=\"Performance benchmarks for NVIDIA Orin\" width=\"672\" height=\"445\"><\/p>\n<p><\/a><figcaption id=\"caption-attachment-59084\" class=\"wp-caption-text\">Performance benchmarks for NVIDIA Orin.<\/figcaption><\/figure>\n<p>All the new chips support the NVIDIA software stack that accelerates more than 
700 applications and is used by 2.5 million developers.<\/p>\n<p>Based on the CUDA programming model, it includes dozens of NVIDIA SDKs for vertical markets like automotive (<a href=\"https:\/\/developer.nvidia.com\/drive\">DRIVE<\/a>) and healthcare (<a href=\"https:\/\/developer.nvidia.com\/clara\">Clara<\/a>), as well as technologies such as recommendation systems (<a href=\"https:\/\/developer.nvidia.com\/nvidia-merlin\">Merlin<\/a>) and conversational AI (<a href=\"https:\/\/developer.nvidia.com\/riva\">Riva<\/a>).<\/p>\n<p>The NVIDIA AI platform is available from every major cloud service and system maker.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/2022\/08\/19\/grace-hopper-nvswitch-hot-chips\/<\/p>\n","protected":false},"author":0,"featured_media":2512,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2511"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=2511"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2511\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/2512"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=2511"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=2511"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistr
ibution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=2511"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}