{"id":4079,"date":"2025-08-05T17:50:31","date_gmt":"2025-08-05T17:50:31","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2025\/08\/05\/openais-new-open-models-accelerated-locally-on-nvidia-geforce-rtx-and-rtx-pro-gpus\/"},"modified":"2025-08-05T17:50:31","modified_gmt":"2025-08-05T17:50:31","slug":"openais-new-open-models-accelerated-locally-on-nvidia-geforce-rtx-and-rtx-pro-gpus","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2025\/08\/05\/openais-new-open-models-accelerated-locally-on-nvidia-geforce-rtx-and-rtx-pro-gpus\/","title":{"rendered":"OpenAI\u2019s New Open Models Accelerated Locally on NVIDIA GeForce RTX and RTX PRO GPUs"},"content":{"rendered":"<div>\n\t\t<span class=\"bsf-rt-reading-time\"><span class=\"bsf-rt-display-label\"><\/span> <span class=\"bsf-rt-display-time\"><\/span> <span class=\"bsf-rt-display-postfix\"><\/span><\/span><\/p>\n<p>In collaboration with OpenAI, NVIDIA has optimized the company\u2019s new open-source gpt-oss models for NVIDIA GPUs, delivering smart, fast inference from the cloud to the PC. These new <a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/ai-reasoning\/\" rel=\"noopener\">reasoning<\/a> models enable <a href=\"https:\/\/blogs.nvidia.com\/blog\/what-is-agentic-ai\/\">agentic AI<\/a> applications such as web search, in-depth research and many more.<\/p>\n<p>With the launch of gpt-oss-20b and gpt-oss-120b, OpenAI has opened cutting-edge models to millions of users. 
AI enthusiasts and developers can use the optimized models on NVIDIA RTX AI PCs and workstations through popular tools and frameworks like Ollama, llama.cpp and Microsoft AI Foundry Local, and expect performance of up to 256 tokens per second on the NVIDIA GeForce RTX 5090 GPU.<\/p>\n<p>\u201cOpenAI showed the world what could be built on NVIDIA AI \u2014 and now they\u2019re advancing innovation in open-source software,\u201d said Jensen Huang, founder and CEO of NVIDIA. \u201cThe gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI \u2014 all on the world\u2019s largest AI compute infrastructure.\u201d<\/p>\n<p>The models\u2019 release highlights NVIDIA\u2019s AI leadership from training to inference and from cloud to AI PC.<\/p>\n<h2><strong>Open for All <\/strong><\/h2>\n<p>Both gpt-oss-20b and gpt-oss-120b are flexible, open-weight reasoning models with chain-of-thought capabilities and adjustable reasoning-effort levels, built on the popular mixture-of-experts architecture. The models are designed to support features like instruction-following and tool use, and were trained on <a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/h100\/\" rel=\"noopener\">NVIDIA H100 GPUs<\/a>. 
<span class=\"TextRun SCXW24568032 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"none\"><\/span><a class=\"Hyperlink SCXW24568032 BCX0\" href=\"https:\/\/developer.nvidia.com\/blog\/delivering-1-5-m-tps-inference-on-nvidia-gb200-nvl72-nvidia-accelerates-openai-gpt-oss-models-from-cloud-to-edge\/\" target=\"_blank\" rel=\"noreferrer noopener\"><span class=\"TextRun Underlined SCXW24568032 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"none\"><\/span><\/a><span class=\"TextRun SCXW24568032 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"none\"><\/span><\/p>\n<p>These models can support up to 131,072 context lengths, among the longest available in local inference. This means the models can reason through context problems, ideal for tasks such as web search, coding assistance, document comprehension and in-depth research.<\/p>\n<p>The OpenAI open models are the first MXFP4 models supported on NVIDIA RTX. MXFP4 allows for high model quality, offering fast, efficient performance while requiring fewer resources compared with other precision types.<\/p>\n<h2><strong>Run the OpenAI Models on NVIDIA RTX With Ollama<\/strong><\/h2>\n<p>The easiest way to test these models on RTX AI PCs, on GPUs with at least 24GB of VRAM, is using the new Ollama app. Ollama is popular with AI enthusiasts and developers for its ease of integration, and the new user interface (UI) includes out-of-the-box support for OpenAI\u2019s open-weight models. Ollama is fully optimized for RTX, making it ideal for consumers looking to experience the power of personal AI on their PC or workstation.<\/p>\n<p>Once installed, Ollama enables quick, easy chatting with the models. Simply select the model from the dropdown menu and send a message. 
Because Ollama is optimized for RTX, no additional configuration or commands are required to ensure top performance on supported GPUs.<\/p>\n<figure id=\"attachment_83414\" aria-describedby=\"caption-attachment-83414\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2025\/08\/rtx-ai-garage-3-steps-20b_2-scaled.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-83414\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2025\/08\/rtx-ai-garage-3-steps-20b_2-1680x893.jpg\" alt=\"\" width=\"1680\" height=\"893\"><\/a><figcaption id=\"caption-attachment-83414\" class=\"wp-caption-text\">Testing OpenAI\u2019s open models in Ollama is easy.<\/figcaption><\/figure>\n<p>Ollama\u2019s new app includes other new features, like easy support for PDF or text files within chats, multimodal support on applicable models so users can include images in their prompts, and easily customizable context lengths when working with large documents or chats.<\/p>\n<p>Developers can also use Ollama via its command line interface or the app\u2019s software development kit (SDK) to power their applications and workflows.<\/p>\n<h2><strong>Other Ways to Use the New OpenAI Models on RTX<\/strong><\/h2>\n<p>Enthusiasts and developers can also try the gpt-oss models on RTX AI PCs through various other applications and frameworks, all powered by RTX, on GPUs that have at least 16GB of VRAM.<\/p>\n<p>NVIDIA continues to collaborate with the open-source community on both llama.cpp and the GGML tensor library to optimize performance on RTX GPUs. Recent contributions include implementing <a target=\"_blank\" href=\"https:\/\/developer.nvidia.com\/blog\/cuda-graphs\/\" rel=\"noopener\">CUDA Graphs<\/a> to reduce kernel launch overhead and adding algorithms that lower CPU-side costs. 
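With llama.cpp, one common pattern is to serve a model with the bundled llama-server, which exposes an OpenAI-compatible /v1/chat/completions endpoint, and query it over HTTP. A minimal sketch, where the port and model name are assumptions for a local setup:

```python
import json
from urllib import request

# Assumed address of a locally running llama-server instance.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_completion_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion body for llama-server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def complete(model: str, prompt: str) -> str:
    """Query a local llama-server and return the first choice's message text."""
    body = json.dumps(build_completion_request(model, prompt)).encode("utf-8")
    req = request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Requires llama-server running with a local gpt-oss GGUF build, e.g.:
# print(complete("gpt-oss-20b", "Explain CUDA Graphs in one sentence."))
```

Because the endpoint follows the OpenAI chat-completions schema, any OpenAI-compatible client library can be pointed at it instead of hand-rolling requests.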
Check out the <a target=\"_blank\" href=\"https:\/\/github.com\/ggml-org\/llama.cpp\" rel=\"noopener\">llama.cpp GitHub repository<\/a> to get started.<\/p>\n<figure id=\"attachment_83402\" aria-describedby=\"caption-attachment-83402\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2025\/08\/oai-perf.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-83402\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2025\/08\/oai-perf.png\" alt=\"RTX performance for OpenAI's new open models.\" width=\"624\" height=\"351\"><\/a><figcaption id=\"caption-attachment-83402\" class=\"wp-caption-text\">Overall performance of the gpt-oss-20b model on various RTX AI PCs.<\/figcaption><\/figure>\n<p>Windows developers can also access OpenAI\u2019s new models via <a target=\"_blank\" href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-foundry\/foundry-local\/get-started\" rel=\"noopener\">Microsoft AI Foundry Local<\/a>, currently in public preview. Foundry Local is an on-device AI inferencing solution that integrates into workflows via the command line, SDK or application programming interfaces. Foundry Local uses ONNX Runtime, optimized through CUDA, with support for <a target=\"_blank\" href=\"https:\/\/developer.nvidia.com\/blog\/nvidia-tensorrt-for-rtx-introduces-an-optimized-inference-ai-library-on-windows\/\" rel=\"noopener\">NVIDIA TensorRT for RTX<\/a> coming soon. 
Getting started is easy: install Foundry Local and invoke \u201cfoundry model run gpt-oss-20b\u201d in a terminal.<\/p>\n<p>The release of these open-source models kicks off the next wave of AI innovation from enthusiasts and developers looking to add reasoning to their AI-accelerated Windows applications.<\/p>\n<p><em>Each week, the <\/em><a href=\"https:\/\/blogs.nvidia.com\/blog\/tag\/rtx-ai-garage\/\"><em>RTX AI Garage<\/em><\/a> <em>blog series features community-driven AI innovations and content for those looking to learn more about NVIDIA NIM microservices and AI Blueprints, as well as building <\/em><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/ai-agents\/\" rel=\"noopener\"><em>AI agents<\/em><\/a><em>, creative workflows, productivity apps and more on AI PCs and workstations. <\/em><\/p>\n<p><em>Plug in to NVIDIA AI PC on <\/em><a target=\"_blank\" href=\"https:\/\/www.facebook.com\/NVIDIA.AI.PC\/\" rel=\"noopener\"><em>Facebook<\/em><\/a><em>, <\/em><a target=\"_blank\" href=\"https:\/\/www.instagram.com\/nvidia.ai.pc\/\" rel=\"noopener\"><em>Instagram<\/em><\/a><em>, <\/em><a target=\"_blank\" href=\"https:\/\/www.tiktok.com\/@nvidia_ai_pc\" rel=\"noopener\"><em>TikTok<\/em><\/a><em> and <\/em><a target=\"_blank\" href=\"https:\/\/x.com\/NVIDIA_AI_PC\" rel=\"noopener\"><em>X<\/em><\/a><em> \u2014 and stay informed by subscribing to the <\/em><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/ai-on-rtx\/?modal=subscribe-ai\" rel=\"noopener\"><em>RTX AI PC newsletter<\/em><\/a><em>. 
Join NVIDIA\u2019s <\/em><a target=\"_blank\" href=\"https:\/\/discord.gg\/taH4gkMt\" rel=\"noopener\"><em>Discord server<\/em><\/a><em> to connect with community developers and AI enthusiasts for discussions on what\u2019s possible with RTX AI.<\/em><\/p>\n<p><em>Follow NVIDIA Workstation on <\/em><a target=\"_blank\" href=\"https:\/\/www.linkedin.com\/showcase\/3761136\/\" rel=\"noopener\"><em>LinkedIn<\/em><\/a><em> and <\/em><a target=\"_blank\" href=\"https:\/\/x.com\/NVIDIAworkstatn\" rel=\"noopener\"><em>X<\/em><\/a><em>. <\/em><\/p>\n<p><em>See <\/em><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-eu\/about-nvidia\/terms-of-service\/\" rel=\"noopener\"><em>notice<\/em><\/a><em> regarding software product information.<\/em><\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/rtx-ai-garage-openai-oss\/<\/p>\n","protected":false},"author":0,"featured_media":4080,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/4079"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=4079"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/4079\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/4080"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=4079"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learni
ng\/wp-json\/wp\/v2\/categories?post=4079"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=4079"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}