{"id":3417,"date":"2024-04-10T15:20:40","date_gmt":"2024-04-10T15:20:40","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2024\/04\/10\/the-building-blocks-of-ai-decoding-the-role-and-significance-of-foundation-models\/"},"modified":"2024-04-10T15:20:40","modified_gmt":"2024-04-10T15:20:40","slug":"the-building-blocks-of-ai-decoding-the-role-and-significance-of-foundation-models","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2024\/04\/10\/the-building-blocks-of-ai-decoding-the-role-and-significance-of-foundation-models\/","title":{"rendered":"The Building Blocks of AI: Decoding the Role and Significance of Foundation Models"},"content":{"rendered":"<div id=\"bsf_rt_marker\">\n<p><i>Editor\u2019s note: This post is part of the <\/i><a href=\"https:\/\/blogs.nvidia.com\/blog\/tag\/ai-decoded\/\"><i>AI Decoded series<\/i><\/a><i>, which demystifies AI by making the technology more accessible, and which showcases new hardware, software, tools and accelerations for RTX PC users.<\/i><\/p>\n<p>Skyscrapers start with strong foundations. The same goes for apps powered by AI.<\/p>\n<p>A <a href=\"https:\/\/blogs.nvidia.com\/blog\/what-are-foundation-models\/\">foundation model<\/a> is an AI neural network trained on immense amounts of raw data, generally with <a href=\"https:\/\/blogs.nvidia.com\/blog\/supervised-unsupervised-learning\/\">unsupervised learning<\/a>.<\/p>\n<p>It\u2019s a type of artificial intelligence model trained to understand and generate human-like language. 
Imagine giving a computer a huge library of books to read and learn from, so it can understand the context and meaning behind words and sentences, just like a human does.</p>
<figure id="attachment_71041" aria-describedby="caption-attachment-71041" class="wp-caption aligncenter"><a href="https://blogs.nvidia.com/wp-content/uploads/2024/04/foundation-model.jpg"><img decoding="async" loading="lazy" class="wp-image-71041 size-large" src="https://blogs.nvidia.com/wp-content/uploads/2024/04/foundation-model-672x457.jpg" alt="" width="672" height="457"></a><figcaption id="caption-attachment-71041" class="wp-caption-text">Foundation models.</figcaption></figure>
<p>A foundation model’s deep knowledge base and ability to communicate in natural language make it useful for a broad range of applications, including text generation and summarization, copilot production and computer code analysis, image and video creation, and audio transcription and speech synthesis.</p>
<p>ChatGPT, one of the most notable generative AI applications, is a chatbot built with OpenAI’s GPT foundation model. Now in its fourth version, GPT-4 is a large multimodal model that can ingest text or images and generate text or image responses.</p>
<p>Online apps built on foundation models typically access the models from a data center.
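</p>
<p>As a rough illustration of what that data-center round trip looks like from the client side, many hosted foundation models accept an OpenAI-style chat-completions payload. This is a hedged sketch: the endpoint URL and model name below are placeholders, not a real service, so check your provider’s documentation for the actual values.</p>

```python
import json

# Placeholder endpoint -- substitute the URL and API key from your
# provider's documentation before actually sending a request.
API_URL = "https://example.com/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a chat-completions payload in the de facto OpenAI-style
    schema that many hosted foundation models accept."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

payload = build_chat_request("example/llm-7b-instruct", "What is a foundation model?")
body = json.dumps(payload)  # this JSON string would be POSTed with an auth header
print(body)
```

<p>The same payload shape works for most hosted LLMs; only the endpoint, model identifier and credentials change.</p>
<p>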
But many of these models, and the applications they power, can now run locally on PCs and workstations with <a href="https://www.nvidia.com/en-us/geforce/">NVIDIA GeForce</a> and <a href="https://www.nvidia.com/en-us/design-visualization/technologies/rtx/">NVIDIA RTX</a> GPUs.</p>
<h2><strong>Foundation Model Uses</strong></h2>
<p>Foundation models can perform a variety of functions, including:</p>
<ul>
<li>Language processing: understanding and generating text</li>
<li>Code generation: analyzing and debugging computer code in many programming languages</li>
<li>Visual processing: analyzing and generating images</li>
<li>Speech: generating text to speech and transcribing speech to text</li>
</ul>
<p>They can be used as is or with further refinement. Rather than training an entirely new AI model for each generative AI application — a costly and time-consuming endeavor — users commonly fine-tune foundation models for specialized use cases.</p>
<p>Pretrained foundation models are remarkably capable, thanks to prompts and data-retrieval techniques like <a href="https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/">retrieval-augmented generation</a>, or RAG. Foundation models also excel at <a href="https://blogs.nvidia.com/blog/what-is-transfer-learning/">transfer learning</a>, which means they can be trained to perform a second task related to their original purpose.</p>
<p>For example, a general-purpose large language model (LLM) designed to converse with humans can be further trained to act as a customer service chatbot capable of answering inquiries using a corporate knowledge base.</p>
<p>Enterprises across industries are fine-tuning foundation models to get the best performance from their AI applications.</p>
<h2><strong>Types of Foundation Models</strong></h2>
<p>More than 100 foundation models are in use — a number that continues to grow.
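</p>
<p>Before surveying individual models, the RAG pattern mentioned above can be boiled down to a tiny, self-contained sketch. It is purely illustrative: naive keyword overlap stands in for embedding similarity, the three support snippets are made up, and “generation” is just assembling the augmented prompt that would be sent to an LLM.</p>

```python
# Toy retrieval-augmented generation (RAG) sketch. Real systems use vector
# embeddings and an actual LLM; this only shows the retrieve-then-augment flow.
knowledge_base = [
    "Refunds are processed within 5 business days of receiving the returned item.",
    "The warranty covers manufacturing defects for 24 months from purchase.",
    "Support is available by chat from 9 a.m. to 5 p.m., Monday through Friday.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by shared query words (a stand-in for embedding
    similarity) and return the top k."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def build_augmented_prompt(query: str, docs: list[str]) -> str:
    """Prepend retrieved context so the model answers from the knowledge base."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

top = retrieve("how long is the warranty period", knowledge_base)[0]
print(top)
```

<p>Swapping the keyword scorer for a real embedding model and sending the augmented prompt to an LLM turns this skeleton into the pattern apps like ChatRTX use.</p>
<p>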
LLMs and image generators are the two most popular types of foundation models. And many of them are free for anyone to try — on any hardware — in the <a href="https://build.nvidia.com/explore/discover">NVIDIA API Catalog</a>.</p>
<p>LLMs are models that understand natural language and can respond to queries. Google’s <a href="https://build.nvidia.com/google/gemma-7b">Gemma</a> is one example; it excels at text comprehension, transformation and code generation. When asked about the astronomer Cornelius Gemma, it shared that his “contributions to celestial navigation and astronomy significantly impacted scientific progress.” It also provided information on his key achievements, legacy and other facts.</p>
<p>Extending the collaboration around <a href="https://blogs.nvidia.com/blog/google-gemma-llm-rtx-ai-pc/">the Gemma models</a>, which are accelerated with NVIDIA TensorRT-LLM on RTX GPUs, <a href="https://developers.googleblog.com/2024/04/gemma-family-expands.html">Google’s CodeGemma</a> brings powerful yet lightweight coding capabilities to the community. CodeGemma models are available as 7B and 2B pretrained variants that specialize in code completion and code generation tasks.</p>
<p>Mistral AI’s <a href="https://build.nvidia.com/mistralai/mistral-7b-instruct-v2">Mistral</a> LLM can follow instructions, complete requests and generate creative text.
In fact, it helped brainstorm the headline for this blog, including the requirement that it use a variation of the series’ name, “AI Decoded,” and it assisted in writing the definition of a foundation model.</p>
<figure id="attachment_71034" aria-describedby="caption-attachment-71034" class="wp-caption aligncenter"><a href="https://blogs.nvidia.com/wp-content/uploads/2024/04/Hello-world-indeed.png"><img decoding="async" loading="lazy" class="wp-image-71034 size-full" src="https://blogs.nvidia.com/wp-content/uploads/2024/04/Hello-world-indeed.png" alt="" width="589" height="275"></a><figcaption id="caption-attachment-71034" class="wp-caption-text">Hello, world, indeed.</figcaption></figure>
<p>Meta’s <a href="https://build.nvidia.com/meta/llama2-70b">Llama 2</a> is a cutting-edge LLM that generates text and code in response to prompts.</p>
<p>Mistral and Llama 2 are available in the <a href="https://www.nvidia.com/en-ph/ai-on-rtx/chat-with-rtx-generative-ai/">NVIDIA ChatRTX</a> tech demo, running on RTX PCs and workstations. ChatRTX lets users personalize these foundation models by connecting them to personal content — such as documents, doctors’ notes and other data — through RAG. It’s accelerated by <a href="https://blogs.nvidia.com/blog/ai-decoded-tensorrt-stable-diffusion-automatic1111">TensorRT-LLM</a> for quick, contextually relevant answers. And because it runs locally, results are fast and secure.</p>
<p>Image generators like StabilityAI’s <a href="https://build.nvidia.com/stabilityai/stable-diffusion-xl">Stable Diffusion XL</a> and <a href="https://build.nvidia.com/stabilityai/sdxl-turbo">SDXL Turbo</a> let users generate stunning, realistic visuals.
StabilityAI’s video generator, <a href="https://build.nvidia.com/stabilityai/stable-video-diffusion">Stable Video Diffusion</a>, uses a generative diffusion model to synthesize video sequences with a single image as a conditioning frame.</p>
<p>Multimodal foundation models can simultaneously process more than one type of data — such as text and images — to generate more sophisticated outputs.</p>
<p>A multimodal model that works with both text and images could let users upload an image and ask questions about it. These types of models are quickly working their way into real-world applications like customer service, where they can serve as faster, more user-friendly versions of traditional manuals.</p>
<figure id="attachment_71044" aria-describedby="caption-attachment-71044" class="wp-caption aligncenter"><a href="https://blogs.nvidia.com/wp-content/uploads/2024/04/image.png"><img decoding="async" loading="lazy" class="size-large wp-image-71044" src="https://blogs.nvidia.com/wp-content/uploads/2024/04/image-672x334.png" alt="" width="672" height="334"></a><figcaption id="caption-attachment-71044" class="wp-caption-text">Many foundation models are free to try — on any hardware — in the NVIDIA API Catalog.</figcaption></figure>
<p><a href="https://build.nvidia.com/microsoft/microsoft-kosmos-2">Kosmos-2</a> is Microsoft’s groundbreaking multimodal model designed to understand and reason about visual elements in images.</p>
<h2><strong>Think Globally, Run AI Models Locally</strong></h2>
<p>GeForce RTX and NVIDIA RTX GPUs can run foundation models locally.</p>
<p>The results are fast and secure.
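</p>
<p>Part of that speed is perceived responsiveness: local inference apps typically stream tokens to the screen as the model generates them rather than waiting for the full reply. Below is a library-agnostic sketch of that consumption loop; the generator is a stand-in for a real local model, not any particular runtime’s API.</p>

```python
from typing import Iterable, Iterator

def fake_stream() -> Iterator[str]:
    """Stand-in for a locally running model that yields text chunks as
    they are generated."""
    for chunk in ["Foundation ", "models ", "run ", "locally."]:
        yield chunk

def consume_stream(chunks: Iterable[str]) -> str:
    """Display each chunk the moment it arrives, then return the full reply."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)  # user sees partial text immediately
        parts.append(chunk)
    print()
    return "".join(parts)

reply = consume_stream(fake_stream())
```

<p>The same loop works whether the chunks come from a local runtime or a network stream; only the source of the iterator changes.</p>
<p>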
Rather than relying on cloud-based services, users can harness apps like ChatRTX to process sensitive data on their local PC without sharing the data with a third party or needing an internet connection.</p>
<p>Users can choose from a rapidly growing catalog of open foundation models to download and run on their own hardware. This lowers costs compared with using cloud-based apps and APIs, and it eliminates latency and network connectivity issues.</p>
<p><i>Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the <a href="https://www.nvidia.com/en-us/ai-on-rtx/?modal=subscribe-ai">AI Decoded newsletter</a>.</i></p>
<p><i>Originally published at <a href="https://blogs.nvidia.com/blog/ai-decoded-foundation-models/">blogs.nvidia.com</a>.</i></p>
</div>