{"id":2573,"date":"2022-10-10T15:40:16","date_gmt":"2022-10-10T15:40:16","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2022\/10\/10\/beyond-words-large-language-models-expand-ais-horizon\/"},"modified":"2022-10-10T15:40:16","modified_gmt":"2022-10-10T15:40:16","slug":"beyond-words-large-language-models-expand-ais-horizon","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2022\/10\/10\/beyond-words-large-language-models-expand-ais-horizon\/","title":{"rendered":"Beyond Words: Large Language Models Expand AI\u2019s Horizon"},"content":{"rendered":"<div data-url=\"https:\/\/blogs.nvidia.com\/blog\/2022\/10\/10\/llms-ai-horizon\/\" data-title=\"Beyond Words: Large Language Models Expand AI\u2019s Horizon\" data-hashtags=\"\">\n<p>Back in 2018, <a href=\"https:\/\/arxiv.org\/abs\/1810.04805\">BERT<\/a> got people talking about how <a href=\"https:\/\/blogs.nvidia.com\/blog\/2021\/08\/16\/what-is-a-machine-learning-model\/\">machine learning models<\/a> were learning to read and speak. Today, large language models, or LLMs, are growing up fast, showing dexterity in all sorts of applications.<\/p>\n<p>They\u2019re, for one, speeding drug discovery, thanks to <a href=\"https:\/\/blogs.nvidia.com\/blog\/2020\/07\/16\/ai-reads-proteins-covid\/\">research<\/a> from <a href=\"https:\/\/www.rostlab.org\/\">the Rostlab<\/a> at Technical University of Munich, as well as <a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/622803v4.full.pdf\">work<\/a> by a team from Harvard, Yale and New York University and <a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2022.07.20.500902v1\">others<\/a>. 
In separate efforts, they applied LLMs to interpret the strings of amino acids that make up proteins, advancing our understanding of these building blocks of biology.<\/p>\n<p>It\u2019s one of many inroads LLMs are making in healthcare, robotics and other fields.<\/p>\n<h2><b>A Brief History of LLMs<\/b><\/h2>\n<p><a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/03\/25\/what-is-a-transformer-model\/\">Transformer models<\/a> \u2014 neural networks, defined in 2017, that can learn context in sequential data \u2014 got LLMs started.<\/p>\n<p>Researchers behind BERT and other transformer models made 2018 \u201ca watershed moment\u201d for natural language processing, a <a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2018\/12\/key-breakthroughs-ai-ml-2018-trends-2019\/\">report on AI<\/a> said at the end of that year. \u201cQuite a few experts have claimed that the release of BERT marks a new era in NLP,\u201d it added.<\/p>\n<p>Developed by Google, BERT (aka Bidirectional Encoder Representations from Transformers) delivered state-of-the-art scores on benchmarks for NLP. In 2019, Google <a href=\"https:\/\/blog.google\/products\/search\/search-language-understanding-bert\/\">announced<\/a> that BERT powers the company\u2019s search engine.<\/p>\n<p>Google released BERT as <a href=\"https:\/\/ai.googleblog.com\/2018\/11\/open-sourcing-bert-state-of-art-pre.html\">open-source software<\/a>, spawning a family of follow-ons and setting off a race to build ever larger, more powerful LLMs.<\/p>\n<p>For instance, Meta created an enhanced version called <a href=\"https:\/\/ai.facebook.com\/blog\/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems\/\">RoBERTa<\/a>, released as open-source code in July 2019. For training, it used \u201can order of magnitude more data than BERT,\u201d the paper said, and leapt ahead on NLP leaderboards. 
A scrum followed.<\/p>\n<h2><b>Scaling Parameters and Markets<\/b><\/h2>\n<p>For convenience, score is often kept by the number of an LLM\u2019s parameters, or weights, which measure the strength of a connection between two nodes in a neural network. BERT had 110 million, RoBERTa had 123 million, then BERT-Large weighed in at 354 million, setting a new record, but not for long.<\/p>\n<figure id=\"attachment_60091\" aria-describedby=\"caption-attachment-60091\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/10\/Compute-for-Training-LLMs-GPT3-paper.jpg\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/10\/Compute-for-Training-LLMs-GPT3-paper-672x385.jpg\" alt=\"Compute required for training LLMs\" width=\"672\" height=\"385\"><\/p>\n<p><\/a><figcaption id=\"caption-attachment-60091\" class=\"wp-caption-text\">As LLMs expanded into new applications, their size and computing requirements grew.<\/figcaption><\/figure>\n<p>In 2020, researchers at OpenAI and Johns Hopkins University announced <a href=\"https:\/\/arxiv.org\/abs\/2005.14165\">GPT-3<\/a>, with a whopping 175 billion parameters, trained on a dataset with nearly a trillion words. It scored well on a slew of language tasks and even handled three-digit arithmetic.<\/p>\n<p>\u201cLanguage models have a wide range of beneficial applications for society,\u201d the researchers wrote.<\/p>\n<h2><b>Experts Feel \u2018Blown Away\u2019<\/b><\/h2>\n<p>Within weeks, people were using GPT-3 to create poems, programs, songs, websites and more. 
Recently, GPT-3 even wrote <a href=\"https:\/\/www.scientificamerican.com\/article\/we-asked-gpt-3-to-write-an-academic-paper-about-itself-mdash-then-we-tried-to-get-it-published\/\">an academic paper about itself<\/a>.<\/p>\n<p>\u201cI just remember being kind of blown away by the things that it could do, for being just a language model,\u201d said Percy Liang, a Stanford associate professor of computer science, speaking in <a href=\"https:\/\/web.stanford.edu\/class\/cs224u\/podcast\/liang\/\">a podcast<\/a>.<\/p>\n<p>GPT-3 helped motivate Stanford to create <a href=\"https:\/\/crfm.stanford.edu\/report.html\">a center<\/a> Liang now leads, exploring the implications of what it calls foundation models that can handle a wide variety of tasks well.<\/p>\n<h2><b>Toward Trillions of Parameters<\/b><\/h2>\n<p>Last year, NVIDIA <a href=\"https:\/\/nvidianews.nvidia.com\/news\/nvidia-brings-large-language-ai-models-to-enterprises-worldwide\">announced<\/a> the <a href=\"https:\/\/developer.nvidia.com\/blog\/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model\/\">Megatron 530B<\/a> LLM that can be trained for new domains and languages. It debuted with tools and services for training language models with trillions of parameters.<\/p>\n<p>\u201cLarge language models have proven to be flexible and capable \u2026 able to answer deep domain questions without specialized training or supervision,\u201d Bryan Catanzaro, vice president of applied deep learning research at NVIDIA, said at that time.<\/p>\n<p>Making it even easier for users to adopt the powerful models, the <a href=\"https:\/\/nvidianews.nvidia.com\/news\/nvidia-launches-large-language-model-cloud-services-to-advance-ai-and-digital-biology\">NVIDIA NeMo LLM service<\/a> debuted in September at GTC. 
It\u2019s an NVIDIA-managed cloud service to adapt pretrained LLMs to perform specific tasks.<\/p>\n<h2><b>Transformers Transform Drug Discovery<\/b><\/h2>\n<p>The advances LLMs are making with proteins and chemical structures are also being applied to DNA.<\/p>\n<p>Researchers aim to scale their work with <a href=\"https:\/\/www.nvidia.com\/en-us\/gpu-cloud\/bionemo\/\">NVIDIA BioNeMo<\/a>, a software framework and cloud service to generate, predict and understand biomolecular data. Part of the <a href=\"https:\/\/www.nvidia.com\/en-us\/clara\/drug-discovery\/\">NVIDIA Clara Discovery<\/a> collection of frameworks, applications and AI models for drug discovery, it supports work in widely used protein, DNA and chemistry data formats.<\/p>\n<p>NVIDIA BioNeMo features multiple pretrained AI models, including the <a href=\"https:\/\/blogs.nvidia.com\/blog\/2021\/04\/12\/ai-drug-discovery-astrazeneca-university-florida-health\/\">MegaMolBART<\/a> model, developed by NVIDIA and AstraZeneca.<\/p>\n<figure id=\"attachment_60094\" aria-describedby=\"caption-attachment-60094\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/10\/LLM-heathcare-Stanford.jpg\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/10\/LLM-heathcare-Stanford-672x340.jpg\" alt=\"LLM use cases in healthcare\" width=\"672\" height=\"340\"><\/p>\n<p><\/a><figcaption id=\"caption-attachment-60094\" class=\"wp-caption-text\">In their paper on foundation models, Stanford researchers projected many uses for LLMs in healthcare.<\/figcaption><\/figure>\n<h2><b>LLMs Enhance Computer Vision<\/b><\/h2>\n<p>Transformers are also reshaping computer vision as powerful LLMs replace traditional convolutional AI models. 
For example, researchers at Meta AI and Dartmouth designed <a href=\"https:\/\/arxiv.org\/abs\/2102.05095\">TimeSformer<\/a>, an AI model that uses transformers to analyze video with state-of-the-art results.<\/p>\n<p>Experts predict such models could spawn all sorts of new applications in computational photography, education and interactive experiences for mobile users.<\/p>\n<p>In related work earlier this year, two companies released powerful AI models to generate images from text.<\/p>\n<p>OpenAI announced <a href=\"https:\/\/openai.com\/dall-e-2\/\">DALL-E 2<\/a>, a transformer model with 3.5 billion parameters designed to create realistic images from text descriptions. And recently, Stability AI, based in London, launched <a href=\"https:\/\/stability.ai\/blog\/stable-diffusion-announcement\">Stable Diffusion<\/a>, a text-to-image model released as open-source software.<\/p>\n<h2><b>Writing Code, Controlling Robots<\/b><\/h2>\n<p>LLMs also help developers write software. <a href=\"https:\/\/www.tabnine.com\/\">Tabnine<\/a> \u2014 a member of <a href=\"https:\/\/www.nvidia.com\/en-us\/startups\/\">NVIDIA Inception<\/a>, a program that nurtures cutting-edge startups \u2014 claims it\u2019s automating up to 30% of the code generated by a million developers.<\/p>\n<p>Taking the next step, researchers are using transformer-based models to teach robots used in manufacturing, construction, autonomous driving and personal assistants.<\/p>\n<p>For example, DeepMind developed <a href=\"https:\/\/www.deepmind.com\/publications\/a-generalist-agent\">Gato<\/a>, an LLM that taught a robotic arm how to stack blocks. 
The 1.2-billion-parameter model was trained on more than 600 distinct tasks so it could be useful in a variety of modes and environments, whether playing games or animating chatbots.<\/p>\n<figure id=\"attachment_60097\" aria-describedby=\"caption-attachment-60097\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/10\/Gato-LLM-has-many-apps.jpg\"><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2022\/10\/Gato-LLM-has-many-apps-672x439.jpg\" alt=\"Gato LLM has many applications\" width=\"672\" height=\"439\"><\/p>\n<p><\/a><figcaption id=\"caption-attachment-60097\" class=\"wp-caption-text\">The Gato LLM can analyze robot actions and images as well as text.<\/figcaption><\/figure>\n<p>\u201cBy scaling up and iterating on this same basic approach, we can build a useful general-purpose agent,\u201d researchers said in <a href=\"https:\/\/arxiv.org\/pdf\/2205.06175.pdf\">a paper<\/a> posted in May.<\/p>\n<p>It\u2019s another example of what the Stanford center in <a href=\"https:\/\/arxiv.org\/pdf\/2108.07258.pdf\">a July paper<\/a> called a paradigm shift in AI. 
\u201cFoundation models have only just begun to transform the way AI systems are built and deployed in the world,\u201d it said.<\/p>\n<p>Learn how companies around the world are <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/10\/05\/ai-large-language-models-triton\/\">implementing LLMs<\/a> with <a href=\"https:\/\/developer.nvidia.com\/nvidia-triton-inference-server\">NVIDIA Triton<\/a> for many use cases.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/2022\/10\/10\/llms-ai-horizon\/<\/p>\n","protected":false},"author":0,"featured_media":2574,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2573"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=2573"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2573\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/2574"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=2573"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=2573"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=2573"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}