{"id":2907,"date":"2023-03-13T15:51:45","date_gmt":"2023-03-13T15:51:45","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2023\/03\/13\/what-are-foundation-models\/"},"modified":"2023-03-13T15:51:45","modified_gmt":"2023-03-13T15:51:45","slug":"what-are-foundation-models","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2023\/03\/13\/what-are-foundation-models\/","title":{"rendered":"What Are Foundation Models?"},"content":{"rendered":"<div data-url=\"https:\/\/blogs.nvidia.com\/blog\/2023\/03\/13\/what-are-foundation-models\/\" data-title=\"What Are Foundation Models?\" data-hashtags=\"\">\n<p>The mics were live and tape was rolling in the studio where the Miles Davis Quintet was recording dozens of tunes in 1956 for Prestige Records.<\/p>\n<p>When an engineer asked for the next song\u2019s title, Davis <a href=\"https:\/\/www.youtube.com\/watch?v=36wafFjFdYs\">shot back<\/a>, \u201cI\u2019ll play it, and tell you what it is later.\u201d<\/p>\n<p>Like the prolific jazz trumpeter and composer, researchers have been generating AI models at a feverish pace, exploring new architectures and use cases. 
Focused on plowing new ground, they sometimes leave to others the job of categorizing their work.<\/p>\n<p>A team of more than a hundred Stanford researchers collaborated to do just that in a 214-page <a href=\"https:\/\/arxiv.org\/abs\/2108.07258\">paper<\/a> released in the summer of 2021.<\/p>\n<figure id=\"attachment_62885\" aria-describedby=\"caption-attachment-62885\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Transformer-apps.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-62885\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Transformer-apps-672x459.jpg\" alt=\"2021 paper reports on applications of foundation models\" width=\"672\" height=\"459\"><\/a><figcaption id=\"caption-attachment-62885\" class=\"wp-caption-text\">In a 2021 paper, researchers reported that foundation models are finding a wide array of uses.<\/figcaption><\/figure>\n<p>They said <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/03\/25\/what-is-a-transformer-model\/\">transformer models<\/a>, <a href=\"https:\/\/blogs.nvidia.com\/blog\/2023\/01\/26\/what-are-large-language-models-used-for\/\">large language models<\/a> (LLMs) and other neural networks still being built are part of an important new category they dubbed foundation models.<\/p>\n<h2><b>Foundation Models Defined<\/b><\/h2>\n<p>A foundation model is an AI neural network \u2014 trained on mountains of raw data, generally with <a href=\"https:\/\/blogs.nvidia.com\/blog\/2018\/08\/02\/supervised-unsupervised-learning\/\">unsupervised learning<\/a> \u2014 that can be adapted to accomplish a broad range of tasks, the paper said.<\/p>\n<p>\u201cThe sheer scale and scope of foundation models from the last few years have stretched our imagination of what\u2019s possible,\u201d they wrote.<\/p>\n<p>Two important concepts help define this umbrella category: Data gathering is easier, and opportunities are as wide as the 
horizon.<\/p>\n<h2><b>No Labels, Lots of Opportunity<\/b><\/h2>\n<p>Foundation models generally learn from unlabeled datasets, saving the time and expense of manually describing each item in massive collections.<\/p>\n<p>Earlier neural networks were narrowly tuned for specific tasks. With a little fine-tuning, foundation models can handle jobs from translating text to analyzing medical images.<\/p>\n<p>Foundation models are demonstrating \u201cimpressive behavior,\u201d and they\u2019re being deployed at scale, the group said on the website of its <a href=\"https:\/\/crfm.stanford.edu\/\">research center<\/a> formed to study them. So far, they\u2019ve posted <a href=\"https:\/\/crfm.stanford.edu\/research.html\">more than 50 papers<\/a> on foundation models from in-house researchers alone.<\/p>\n<p>\u201cI think we\u2019ve uncovered a very small fraction of the capabilities of existing foundation models, let alone future ones,\u201d said Percy Liang, the center\u2019s director, in the opening talk of the <a href=\"https:\/\/www.youtube.com\/watch?v=dG628PEN1fY\">first workshop<\/a> on foundation models.<\/p>\n<h2><b>AI\u2019s Emergence and Homogenization<\/b><\/h2>\n<p>In that talk, Liang coined two terms to describe foundation models:<\/p>\n<p><i>Emergence<\/i> refers to AI features still being discovered, such as the many nascent skills in foundation models. He calls the blending of AI algorithms and model architectures <i>homogenization<\/i>, a trend that helped form foundation models. 
(See chart below.)<\/p>\n<p><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Transformer-timeline.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-large wp-image-62888\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Transformer-timeline-672x143.jpg\" alt=\"Timeline for AI and foundation models\" width=\"672\" height=\"143\"><\/a>The field continues to move fast.<\/p>\n<p>A year after the group defined foundation models, other tech watchers coined a related term \u2014 <a href=\"https:\/\/developer.nvidia.com\/blog\/category\/generative-ai\/\">generative AI<\/a>. It\u2019s an umbrella term for transformers, large language models, diffusion models and other neural networks capturing people\u2019s imaginations because they can create text, images, music, software and more.<\/p>\n<p>Generative AI has the potential to yield trillions of dollars of economic value, said executives from the venture firm Sequoia Capital who shared their views in a recent <a href=\"https:\/\/soundcloud.com\/theaipodcast\/sequoia-capitals-pat-grady-and-sonya-huang-on-generative-ai-ep-187\">AI Podcast<\/a>.<\/p>\n<h2><b>A Brief History of Foundation Models<\/b><\/h2>\n<p>\u201cWe are in a time where simple methods like neural networks are giving us an explosion of new capabilities,\u201d said Ashish Vaswani, an entrepreneur and former senior staff research scientist at Google Brain who led work on the seminal 2017 <a href=\"https:\/\/arxiv.org\/abs\/1706.03762\">paper<\/a> on transformers.<\/p>\n<p>That work inspired researchers who created BERT and other <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/10\/10\/llms-ai-horizon\/\">large language models<\/a>, making 2018 \u201ca watershed moment\u201d for natural language processing, a <a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2018\/12\/key-breakthroughs-ai-ml-2018-trends-2019\/\">report on AI<\/a> said at the end of that year.<\/p>\n<p>Google released BERT as <a 
href=\"https:\/\/ai.googleblog.com\/2018\/11\/open-sourcing-bert-state-of-art-pre.html\">open-source software<\/a>, spawning a family of follow-ons and setting off a race to build ever larger, more powerful LLMs. Then it applied the technology to its search engine so users could ask questions in simple sentences.<\/p>\n<p>In 2020, researchers at OpenAI announced another landmark transformer, <a href=\"https:\/\/arxiv.org\/abs\/2005.14165\">GPT-3<\/a>. Within weeks, people were using it to create poems, programs, songs, websites and more.<\/p>\n<p>\u201cLanguage models have a wide range of beneficial applications for society,\u201d the researchers wrote.<\/p>\n<p>Their work also showed how large and compute-intensive these models can be. GPT-3 was trained on a dataset with nearly a trillion words, and it sports a whopping 175 billion parameters, a key measure of the power and complexity of neural networks.<\/p>\n<figure id=\"attachment_62891\" aria-describedby=\"caption-attachment-62891\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Compute-for-Training-LLMs-GPT3-paper.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-62891\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Compute-for-Training-LLMs-GPT3-paper-672x385.jpg\" alt=\"Compute needs for foundation models like large language models\" width=\"672\" height=\"385\"><\/a><figcaption id=\"caption-attachment-62891\" class=\"wp-caption-text\">The growth in compute demands for foundation models. 
(Source: <a href=\"https:\/\/arxiv.org\/abs\/2005.14165\">GPT-3 paper<\/a>)<\/figcaption><\/figure>\n<p>\u201cI just remember being kind of blown away by the things that it could do,\u201d said Liang, speaking of GPT-3 in <a href=\"https:\/\/web.stanford.edu\/class\/cs224u\/podcast\/liang\/\">a podcast<\/a>.<\/p>\n<p>The latest iteration, ChatGPT \u2014 trained on 10,000 NVIDIA GPUs \u2014 is even more engaging, attracting over 100 million users in just two months. Its release has been called the iPhone moment for AI because it helped so many people see how they could use the technology.<\/p>\n<figure id=\"attachment_62894\" aria-describedby=\"caption-attachment-62894\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Road-to-ChatGPT-crop-scaled.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-62894\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Road-to-ChatGPT-crop-504x500.jpg\" alt=\"Timeline from early AI to ChatGPT\" width=\"504\" height=\"500\"><\/a><figcaption id=\"caption-attachment-62894\" class=\"wp-caption-text\">One timeline describes the path from early AI research to ChatGPT. (Source: blog.bytebytego.com)<\/figcaption><\/figure>\n<h2><b>From Text to Images<\/b><\/h2>\n<p>About the same time ChatGPT debuted, another class of neural networks, called diffusion models, made a splash. Their ability to turn text descriptions into artistic images attracted casual users, whose creations went viral on social media.<\/p>\n<p>The first <a href=\"https:\/\/arxiv.org\/pdf\/1503.03585.pdf\">paper<\/a> to describe a diffusion model arrived with little fanfare in 2015. 
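<\/p>
<p>The core idea behind these models can be sketched in a few lines. A forward \u201cnoising\u201d process gradually corrupts data with Gaussian noise; a network is then trained to reverse it, and text conditioning steers the reversal. Below is a conceptual toy of the forward process in plain NumPy; the linear schedule and step count are illustrative assumptions, not values from the paper.<\/p>

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t from clean data x0 after t+1 noising steps.

    Uses the closed form: x_t = sqrt(alpha_bar_t) * x0
                              + sqrt(1 - alpha_bar_t) * noise,
    where alpha_bar_t is the cumulative product of (1 - beta).
    """
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
x0 = np.ones(4)                        # stand-in for an image's pixel values
betas = np.linspace(1e-4, 0.02, 1000)  # illustrative linear noise schedule
x_T = forward_diffuse(x0, 999, betas, rng)  # by the last step, mostly noise
```

<p>Generation runs this in reverse: starting from pure noise, the trained network denoises step by step until an image emerges.<\/p>
<p>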
But like transformers, the new technique soon caught fire.<\/p>\n<p>Researchers posted more than 200 papers on diffusion models last year, according to <a href=\"https:\/\/scorebasedgenerativemodeling.github.io\/\">a list<\/a> maintained by James Thornton, an AI researcher at the University of Oxford.<\/p>\n<p>In <a href=\"https:\/\/twitter.com\/DavidSHolz\/status\/1595253685529251840\/photo\/1\">a tweet<\/a>, Midjourney CEO David Holz revealed that his diffusion-based, text-to-image service has more than 4.4 million users. Serving them requires more than 10,000 NVIDIA GPUs mainly for AI inference, he said in <a href=\"https:\/\/stratechery.com\/2022\/an-interview-with-midjourney-founder-david-holz-about-generative-ai-vr-and-silicon-valley\/\">an interview<\/a> (subscription required).<\/p>\n<h2><b>Dozens of Models in Use<\/b><\/h2>\n<p>Hundreds of foundation models are now available. One <a href=\"https:\/\/arxiv.org\/pdf\/2302.07730.pdf\">paper<\/a> catalogs and classifies more than 50 major transformer models alone (see chart below).<\/p>\n<p>The Stanford group benchmarked 30 foundation models, noting the field is moving so fast they did not review some new and prominent ones.<\/p>\n<p>Startup <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/10\/05\/ai-large-language-models-triton\/\">NLP Cloud<\/a>, a member of the <a href=\"https:\/\/www.nvidia.com\/en-us\/startups\/\">NVIDIA Inception<\/a> program that nurtures cutting-edge startups, says it uses about 25 large language models in a commercial offering that serves airlines, pharmacies and other users. 
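<\/p>
<p>One reason a couple dozen pretrained models can serve so many different customers is that adapting them is cheap relative to training them: the general-purpose representation is reused, and only a small task-specific piece is tuned. The toy sketch below illustrates that split with a \u201clinear probe\u201d, a frozen stand-in for a pretrained feature extractor plus a tiny trainable head. It is a conceptual illustration in plain NumPy, not any vendor's API.<\/p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained feature extractor:
# a fixed nonlinear projection whose weights are never updated.
W_frozen = rng.standard_normal((8, 3))

def features(x):
    return np.tanh(x @ W_frozen)

# A small, synthetic downstream dataset.
X = rng.standard_normal((32, 8))
y = rng.standard_normal(32)

# "Fine-tune" only the lightweight head: here, a linear layer
# fit in closed form by least squares on the frozen features.
F = features(X)
head, *_ = np.linalg.lstsq(F, y, rcond=None)
preds = F @ head  # task predictions from frozen features + tiny head
```

<p>Real fine-tuning updates some or all of a large network's weights with gradient descent, but the economics are similar: the costly pretraining happens once, and each new task pays only for a small adaptation.<\/p>
<p>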
Experts expect that a growing share of the models will be made open source on sites like Hugging Face\u2019s <a href=\"https:\/\/huggingface.co\/docs\/hub\/models-the-hub\">model hub<\/a>.<\/p>\n<figure id=\"attachment_62897\" aria-describedby=\"caption-attachment-62897\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Open-source-lang-models.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-62897\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Open-source-lang-models-672x385.jpg\" alt=\"A list of foundation models released as open source\" width=\"672\" height=\"385\"><\/a><figcaption id=\"caption-attachment-62897\" class=\"wp-caption-text\">Experts note a rising trend toward releasing foundation models as open source.<\/figcaption><\/figure>\n<p>Foundation models keep getting larger and more complex, too.<\/p>\n<p>That\u2019s why \u2014 rather than building new models from scratch \u2014 many businesses are already customizing pretrained foundation models to turbocharge their journeys into AI.<\/p>\n<h2><b>Foundations in the Cloud<\/b><\/h2>\n<p>One venture capital firm lists <a href=\"https:\/\/www.scalevp.com\/blog\/generative-ai-index-use-case-glossary\">33 use cases<\/a> for generative AI, from ad generation to semantic search.<\/p>\n<p>Major cloud services have been using foundation models for some time. For example, Microsoft Azure worked with NVIDIA to implement a transformer for its <a href=\"https:\/\/translator.microsoft.com\/\">Translator<\/a> service. 
It <a href=\"https:\/\/blogs.nvidia.com\/blog\/2022\/03\/22\/microsoft-translator-triton-inference\/\">helped disaster workers<\/a> understand Haitian Creole while they were responding to a magnitude 7.0 earthquake.<\/p>\n<p>In February, Microsoft <a href=\"https:\/\/blogs.microsoft.com\/blog\/2023\/02\/07\/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web\/\">announced plans<\/a> to enhance its browser and search engine with ChatGPT and related innovations. \u201cWe think of these tools as an AI copilot for the web,\u201d the announcement said.<\/p>\n<p>Google announced <a href=\"https:\/\/blog.google\/technology\/ai\/bard-google-ai-search-updates\/\">Bard<\/a>, an experimental conversational AI service. It plans to plug many of its products into the power of its foundation models like LaMDA, PaLM, Imagen and MusicLM.<\/p>\n<p>\u201cAI is the most profound technology we are working on today,\u201d the company said in its blog post.<\/p>\n<h2><b>Startups Get Traction, Too<\/b><\/h2>\n<p>Startup Jasper expects to log $75 million in annual revenue from products that write copy for companies like VMware. It\u2019s leading a field of more than a dozen companies that generate text, including Writer, an NVIDIA Inception member.<\/p>\n<p>Other Inception members in the field include Tokyo-based <a href=\"https:\/\/rinna.id\/\">rinna<\/a>, which has created chatbots used by millions in Japan. In Tel Aviv, <a href=\"https:\/\/www.tabnine.com\/\">Tabnine<\/a> runs a generative AI service that\u2019s automated up to 30% of the code written by a million developers globally.<\/p>\n<h2><b>A Platform for Healthcare<\/b><\/h2>\n<p>Researchers at startup Evozyne used foundation models in <a href=\"https:\/\/www.nvidia.com\/en-us\/gpu-cloud\/bionemo\/\">NVIDIA BioNeMo<\/a> to <a href=\"https:\/\/blogs.nvidia.com\/blog\/2023\/01\/12\/generative-ai-proteins-evozyne\/\">generate two new proteins<\/a>. 
One could treat a rare disease and another could help capture carbon in the atmosphere.<\/p>\n<figure id=\"attachment_62903\" aria-describedby=\"caption-attachment-62903\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Evozyne-diagram-NEW-scaled.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-62903\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2023\/03\/Evozyne-diagram-NEW-672x200.jpg\" alt=\"Diagram of foundation models that generate proteins\" width=\"672\" height=\"200\"><\/a><figcaption id=\"caption-attachment-62903\" class=\"wp-caption-text\">Evozyne and NVIDIA described a hybrid foundation model for creating proteins in a <a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2023.01.23.525232v1\">joint paper<\/a>.<\/figcaption><\/figure>\n<p>BioNeMo, a software platform and cloud service for generative AI in drug discovery, offers tools to train, run inference and deploy custom biomolecular AI models. 
It includes <a href=\"https:\/\/blogs.nvidia.com\/blog\/2021\/04\/12\/ai-drug-discovery-astrazeneca-university-florida-health\/\">MegaMolBART<\/a>, a generative AI model for chemistry developed by NVIDIA and AstraZeneca.<\/p>\n<p>\u201cJust as AI language models can learn the relationships between words in a sentence, our aim is that neural networks trained on molecular structure data will be able to learn the relationships between atoms in real-world molecules,\u201d said Ola Engkvist, head of molecular AI, discovery sciences and R&amp;D at AstraZeneca, when the work <a href=\"https:\/\/blogs.nvidia.com\/blog\/2021\/04\/12\/ai-drug-discovery-astrazeneca-university-florida-health\/\">was announced<\/a>.<\/p>\n<p>Separately, the University of Florida\u2019s academic health center collaborated with NVIDIA researchers to create <a href=\"https:\/\/ufhealth.org\/news\/2021\/university-florida-health-nvidia-develop-artificial-intelligence-model-hasten-clinical\">GatorTron<\/a>. The large language model aims to extract insights from massive volumes of clinical data to accelerate medical research.<\/p>\n<p>A Stanford center is applying the latest <a href=\"https:\/\/hai.stanford.edu\/news\/could-stable-diffusion-solve-gap-medical-imaging-data\">diffusion models<\/a> to advance medical imaging. 
NVIDIA also helps healthcare companies and hospitals use <a href=\"https:\/\/www.nvidia.com\/en-us\/clara\/medical-devices\/\">AI in medical imaging<\/a>, speeding diagnosis of deadly diseases.<\/p>\n<h2><b>AI Foundations for Business<\/b><\/h2>\n<p>Another new framework, <a href=\"https:\/\/developer.nvidia.com\/nvidia-nemo\">NVIDIA NeMo Megatron<\/a>, aims to let any business create its own billion- or trillion-parameter transformers to power custom chatbots, personal assistants and other AI applications.<\/p>\n<p>It created the 530-billion-parameter Megatron-Turing Natural Language Generation model (<a href=\"https:\/\/developer.nvidia.com\/blog\/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model\/\">MT-NLG<\/a>) that powers TJ, the Toy Jensen avatar that gave <a href=\"https:\/\/www.youtube.com\/watch?v=39ubNuxnrK8&amp;t=3600s\">part of the keynote<\/a> at NVIDIA GTC last year.<\/p>\n<p>Foundation models \u2014 connected to 3D platforms like <a href=\"https:\/\/www.nvidia.com\/en-us\/omniverse\/\">NVIDIA Omniverse<\/a> \u2014 will be key to simplifying development of the <a href=\"https:\/\/blogs.nvidia.com\/blog\/2021\/08\/10\/what-is-the-metaverse\/\">metaverse<\/a>, the 3D evolution of the internet. 
These models will power applications and assets for entertainment and industrial users.<\/p>\n<p>Factories and warehouses are already applying foundation models inside digital twins, realistic simulations that help find more efficient ways to work.<\/p>\n<p>Foundation models can ease the job of training autonomous vehicles and robots that assist humans on factory floors and in logistics centers.<\/p>\n<p>New uses for foundation models are emerging daily, as are challenges in applying them.<\/p>\n<p>Several papers on foundation and generative AI models describe risks such as:<\/p>\n<ul>\n<li>amplifying bias implicit in the massive datasets used to train models,<\/li>\n<li>introducing inaccurate or misleading information in images or videos, and<\/li>\n<li>violating intellectual property rights of existing works.<\/li>\n<\/ul>\n<p>\u201cGiven that future AI systems will likely rely heavily on foundation models, it is imperative that we, as a community, come together to develop more rigorous principles for foundation models and guidance for their responsible development and deployment,\u201d said the Stanford paper on foundation models.<\/p>\n<p>Current ideas for safeguards include filtering prompts and their outputs, recalibrating models on the fly and scrubbing massive datasets.<\/p>\n<p>\u201cThese are issues we\u2019re working on as a research community,\u201d said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA. 
\u201cFor these models to be truly widely deployed, we have to invest a lot in safety.\u201d<\/p>\n<p>It\u2019s one more field AI researchers and developers are plowing as they create the future.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/2023\/03\/13\/what-are-foundation-models\/<\/p>\n","protected":false},"author":0,"featured_media":2908,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2907"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=2907"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/2907\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/2908"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=2907"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=2907"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=2907"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}