{"id":3401,"date":"2024-03-21T14:41:35","date_gmt":"2024-03-21T14:41:35","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2024\/03\/21\/you-transformed-the-world-nvidia-ceo-tells-researchers-behind-landmark-ai-paper\/"},"modified":"2024-03-21T14:41:35","modified_gmt":"2024-03-21T14:41:35","slug":"you-transformed-the-world-nvidia-ceo-tells-researchers-behind-landmark-ai-paper","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2024\/03\/21\/you-transformed-the-world-nvidia-ceo-tells-researchers-behind-landmark-ai-paper\/","title":{"rendered":"\u2018You Transformed the World,\u2019 NVIDIA CEO Tells Researchers Behind Landmark AI Paper"},"content":{"rendered":"<div id=\"bsf_rt_marker\">\n<p>Of <a href=\"https:\/\/www.nvidia.com\/gtc\/\">GTC<\/a>\u2019s 900+ sessions, the most wildly popular was a conversation hosted by NVIDIA founder and CEO Jensen Huang with seven of the authors of the legendary research paper that introduced the aptly named <a href=\"https:\/\/blogs.nvidia.com\/blog\/what-is-a-transformer-model\/\">transformer<\/a> \u2014 a neural network architecture that went on to change the deep learning landscape and enable today\u2019s era of generative AI.<\/p>\n<p>\u201cEverything that we\u2019re enjoying today can be traced back to that moment,\u201d Huang said to a packed room with hundreds of attendees, who heard him speak with the authors of \u201c<a href=\"https:\/\/arxiv.org\/abs\/1706.03762\" target=\"_blank\" rel=\"noopener\">Attention Is All You Need<\/a>.\u201d<\/p>\n<p>Sharing the stage for the first time, the research luminaries reflected on the factors that led to their original paper, which has been cited more than 100,000 times since it was first published and presented at the NeurIPS AI conference. 
They also discussed their latest projects and offered insights into future directions for the field of generative AI.<\/p>\n<p>While they started as Google researchers, the collaborators are now spread across the industry, most as founders of their own AI companies.<\/p>\n<p>\u201cWe have a whole industry that is grateful for the work that you guys did,\u201d Huang said.<\/p>\n<figure id=\"attachment_70689\" aria-describedby=\"caption-attachment-70689\" class=\"wp-caption aligncenter\">\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2024\/03\/JHH-Ai-Panel-5071-scaled.jpg\" alt=\"\" width=\"2048\" height=\"1365\"><figcaption id=\"caption-attachment-70689\" class=\"wp-caption-text\">From L to R: Lukasz Kaiser, Noam Shazeer, Aidan Gomez, Jensen Huang, Llion Jones, Jakob Uszkoreit, Ashish Vaswani and Illia Polosukhin.<\/figcaption><\/figure>\n<h2><b>Origins of the Transformer Model<\/b><\/h2>\n<p>The research team initially sought to overcome the limitations of <a href=\"https:\/\/blogs.nvidia.com\/blog\/whats-the-difference-between-a-cnn-and-an-rnn\/\">recurrent neural networks<\/a>, or RNNs, which were then the state of the art for processing language data.<\/p>\n<p>Noam Shazeer, cofounder and CEO of Character.AI, compared RNNs to the steam engine and transformers to the improved efficiency of internal combustion.<\/p>\n<p>\u201cWe could have done the industrial revolution on the steam engine, but it would just have been a pain,\u201d he said. 
\u201cThings went way, way better with internal combustion.\u201d<\/p>\n<p>\u201cNow we\u2019re just waiting for the fusion,\u201d quipped Illia Polosukhin, cofounder of blockchain company NEAR Protocol.<\/p>\n<p>The paper\u2019s title came from a realization that attention mechanisms \u2014 an element of neural networks that enable them to determine the relationship between different parts of input data \u2014 were the most critical component of their model\u2019s performance.<\/p>\n<p>\u201cWe had very recently started throwing bits of the model away, just to see how much worse it would get. And to our surprise it started getting better,\u201d said Llion Jones, cofounder and chief technology officer at Sakana AI.<\/p>\n<p>Having a name as general as \u201ctransformers\u201d spoke to the team\u2019s ambitions to build AI models that could process and transform every data type \u2014 including text, images, audio, tensors and biological data.<\/p>\n<p>\u201cThat North Star, it was there on day zero, and so it\u2019s been really exciting and gratifying to watch that come to fruition,\u201d said Aidan Gomez, cofounder and CEO of Cohere. 
\u201cWe\u2019re actually seeing it happen now.\u201d<\/p>\n<figure id=\"attachment_70692\" aria-describedby=\"caption-attachment-70692\" class=\"wp-caption aligncenter\">\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2024\/03\/JHH-Ai-Panel-5282-scaled.jpg\" alt=\"\" width=\"2048\" height=\"1365\"><figcaption id=\"caption-attachment-70692\" class=\"wp-caption-text\">Packed house at the San Jose Convention Center.<\/figcaption><\/figure>\n<h2><b>Envisioning the Road Ahead\u00a0<\/b><\/h2>\n<p>Adaptive computation, where a model adjusts how much computing power is used based on the complexity of a given problem, is a key factor the researchers see improving in future AI models.<\/p>\n<p>\u201cIt\u2019s really about spending the right amount of effort and ultimately energy on a given problem,\u201d said Jakob Uszkoreit, cofounder and CEO of biological software company Inceptive. \u201cYou don\u2019t want to spend too much on a problem that\u2019s easy or too little on a problem that\u2019s hard.\u201d<\/p>\n<p>A math problem like two plus two, for example, shouldn\u2019t be run through a trillion-parameter transformer model \u2014 it should run on a basic calculator, the group agreed.<\/p>\n<p>They\u2019re also looking forward to the next generation of AI models.<\/p>\n<p>\u201cI think the world needs something better than the transformer,\u201d said Gomez. \u201cI think all of us here hope it gets succeeded by something that will carry us to a new plateau of performance.\u201d<\/p>\n<p>\u201cYou don\u2019t want to miss these next 10 years,\u201d Huang said. 
\u201cUnbelievable new capabilities will be invented.\u201d<\/p>\n<p>The conversation concluded with Huang presenting each researcher with a framed cover plate of the <a href=\"https:\/\/www.nvidia.com\/en-gb\/data-center\/dgx-systems\/dgx-1\/\">NVIDIA DGX-1<\/a> AI supercomputer, signed with the message, \u201cYou transformed the world.\u201d<\/p>\n<figure id=\"attachment_70695\" aria-describedby=\"caption-attachment-70695\" class=\"wp-caption aligncenter\">\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2024\/03\/JHH-Ai-Panel-6840-scaled.jpg\" alt=\"\" width=\"2048\" height=\"1365\"><figcaption id=\"caption-attachment-70695\" class=\"wp-caption-text\">Jensen presents lead author Ashish Vaswani with a signed DGX-1 cover.<\/figcaption><\/figure>\n<p>There\u2019s still time to catch the <a href=\"https:\/\/www.nvidia.com\/gtc\/session-catalog\/?search=S63046&amp;tab.allsessions=1700692987788001F1cG\">session replay<\/a> by registering for a <a href=\"https:\/\/www.nvidia.com\/gtc\/pricing\/\">virtual GTC pass<\/a> \u2014 it\u2019s free.<\/p>\n<p>To discover the latest in generative AI, watch Huang\u2019s GTC keynote 
address.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/gtc-2024-transformer-ai-research-panel-jensen\/<\/p>\n","protected":false},"author":0,"featured_media":3402,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/3401"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=3401"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/3401\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/3402"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=3401"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=3401"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=3401"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}