{"id":4521,"date":"2026-04-02T17:39:57","date_gmt":"2026-04-02T17:39:57","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2026\/04\/02\/from-rtx-to-spark-nvidia-accelerates-gemma-4-for-local-agentic-ai\/"},"modified":"2026-04-02T17:39:57","modified_gmt":"2026-04-02T17:39:57","slug":"from-rtx-to-spark-nvidia-accelerates-gemma-4-for-local-agentic-ai","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2026\/04\/02\/from-rtx-to-spark-nvidia-accelerates-gemma-4-for-local-agentic-ai\/","title":{"rendered":"From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI"},"content":{"rendered":"<div>\n<p><span data-contrast=\"none\">Open models are driving a new wave of on-device AI, extending innovation beyond the cloud to\u00a0everyday devices.\u00a0As\u00a0these\u00a0models advance, their value increasingly depends on access to local, real-time context\u00a0that can\u00a0turn meaningful insights into action.<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Designed for this shift,\u00a0Google\u2019s\u00a0latest\u00a0additions to the\u00a0<\/span>Gemma 4 family\u00a0<span data-contrast=\"none\">introduce\u00a0a class of small,\u00a0fast\u00a0and omni-capable models built for efficient local execution across a wide range of devices.\u00a0<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<p>Google\u00a0<span data-contrast=\"none\">and NVIDIA\u00a0have\u00a0collaborated to\u00a0optimize\u00a0Gemma 4<\/span><b><span data-contrast=\"none\">\u00a0<\/span><\/b><span data-contrast=\"none\">for NVIDIA GPUs,\u00a0enabling efficient performance across a range of systems \u2014 from data center deployments to NVIDIA RTX-powered\u00a0PCs\u00a0and workstations,\u00a0the <a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/products\/workstations\/dgx-spark\/\" rel=\"noopener\">NVIDIA\u00a0DGX Spark<\/a>\u00a0personal\u00a0AI\u00a0supercomputer\u00a0and\u00a0<a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/autonomous-machines\/embedded-systems\/jetson-nano\/product-development\/\" rel=\"noopener\">NVIDIA Jetson Orin Nano<\/a> edge AI modules.<\/span><\/p>\n<h2><b><span data-contrast=\"none\">Gemma 4:\u00a0Compact\u00a0Models Optimized for NVIDIA GPUs<\/span><\/b><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/h2>\n<p><span data-contrast=\"none\">The\u00a0latest additions to the\u00a0<\/span>Gemma 4\u00a0family\u00a0of open models<span data-contrast=\"none\">\u2014<\/span><span data-contrast=\"none\">\u00a0spanning E2B, E4B, 26B and 31B variants\u00a0<\/span><span data-contrast=\"none\">\u2014<\/span><span data-contrast=\"none\">\u00a0are\u00a0designed for efficient deployment from edge devices to high-performance GPUs.\u00a0<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<figure id=\"attachment_92036\" aria-describedby=\"caption-attachment-92036\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2026\/04\/gemma-4-perf-chart-desktop-light-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-92036\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2026\/04\/gemma-4-perf-chart-desktop-light-1.png\" alt=\"\" width=\"1149\" height=\"489\"><\/a><figcaption id=\"caption-attachment-92036\" class=\"wp-caption-text\">All configurations measured using Q4_K_M quantizations BS = 1, ISL = 4096 and OSL = 128 on NVIDIA GeForce RTX 5090 and Mac M3 Ultra desktops. 
This new generation of compact models supports a range of tasks, including:

- **Reasoning:** Strong performance on complex problem-solving tasks.
- **Coding:** Code generation and debugging for developer workflows.
- **Agents:** Native support for structured tool use (function calling); see the sketch after this list.
- **Vision, Video and Audio Capabilities:** Enables rich multimodal interactions for object recognition, automated speech recognition, and document or video intelligence.
- **Interleaved Multimodal Input:** Mix text and images in any order within a single prompt.
- **Multilingual:** Out-of-the-box support for 35+ languages, pretrained on 140+ languages.
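To make the function-calling support concrete, the sketch below requests a structured tool call from a locally served model through Ollama's chat API. The `gemma4` model tag and the `get_weather` tool are illustrative placeholders, not confirmed names.

```python
"""Minimal sketch of structured tool use against a local Ollama server."""
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post("http://localhost:11434/api/chat", json={
    "model": "gemma4",  # placeholder model tag
    "messages": [{"role": "user", "content": "What's the weather in Santa Clara?"}],
    "tools": tools,
    "stream": False,
})
resp.raise_for_status()

# Instead of free text, the model answers with a structured tool call
# that the host application can execute and feed back.
for call in resp.json()["message"].get("tool_calls", []):
    print(call["function"]["name"], call["function"]["arguments"])
```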
target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/geforce\/news\/open-claw-rtx-gpu-dgx-spark-guide\/\" rel=\"noopener\"><span data-contrast=\"none\">OpenClaw for free on RTX GPUs and DGX Spark<\/span><\/a><span data-contrast=\"none\">\u00a0or\u00a0using\u00a0the\u00a0<\/span><a target=\"_blank\" href=\"https:\/\/build.nvidia.com\/spark\/openclaw\" rel=\"noopener\"><span data-contrast=\"none\">DGX Spark OpenClaw playbook<\/span><\/a><span data-contrast=\"auto\">.<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<h2><b><span data-contrast=\"none\">Getting\u00a0Started:\u00a0Gemma 4\u00a0on\u00a0RTX\u00a0GPUs\u00a0and DGX Spark<\/span><\/b><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/h2>\n<p><span data-contrast=\"none\">NVIDIA\u00a0has collaborated with\u00a0Ollama\u00a0and\u00a0llama.cpp\u00a0to provide the best local deployment experience for each of the Gemma 4 models. \u00a0\u00a0<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">To\u00a0use\u00a0Gemma 4 locally, users can\u00a0<\/span><span data-contrast=\"none\">download Ollama<\/span><span data-contrast=\"none\">\u00a0to run Gemma 4 models\u00a0<\/span><span data-contrast=\"none\">or<\/span><span data-contrast=\"none\">\u00a0install\u00a0<\/span><span data-contrast=\"none\">llama.cpp<\/span><span data-contrast=\"none\">\u00a0and pair it with the\u00a0Gemma 4 GGUF Hugging\u00a0Face checkpoint.\u00a0<\/span><span data-contrast=\"auto\">Additionally,\u00a0<\/span><span data-contrast=\"none\">Unsloth\u00a0provides day-one support with\u00a0optimized and quantized models\u00a0for\u00a0efficient local fine-tuning and deployment\u00a0via\u00a0Unsloth\u00a0Studio.\u00a0Start\u00a0<\/span><span data-contrast=\"auto\">running\u00a0and\u00a0<\/span><span data-contrast=\"none\">fine-tuning<\/span><span data-contrast=\"auto\">\u00a0Gemma 4\u00a0in\u00a0Unsloth\u00a0Studio today.<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Running open models like the Gemma 4 family on NVIDIA GPUs\u00a0achieves\u00a0optimal\u00a0performance\u00a0because\u00a0NVIDIA Tensor Cores accelerate AI inference workloads\u00a0to deliver\u00a0higher throughput and lower latency for local execution.\u00a0Plus, the CUDA software stack ensures broad compatibility across leading frameworks and tools, enabling new models to run efficiently from day one.\u00a0<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">This combination allows open models like Gemma 4 to scale across a wide range of systems \u2014 from Jetson Orin Nano at the edge to RTX PCs, workstations and DGX Spark \u2014 without requiring extensive optimization.<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Check out\u00a0<\/span><span data-contrast=\"none\">the\u00a0<\/span><a target=\"_blank\" href=\"https:\/\/developer.nvidia.com\/blog\/bringing-ai-closer-to-the-edge-and-on-device-with-gemma-4\/\" rel=\"noopener\"><span data-contrast=\"none\">NVIDIA technical blog<\/span><\/a><span data-contrast=\"none\">\u00a0<\/span><span data-contrast=\"none\">for more details on how to get started with Gemma 4 on NVIDIA GPUs\u00a0and\u00a0learn more about<\/span><span data-contrast=\"none\">\u00a0NVIDIA\u2019s\u00a0work\u00a0on\u00a0<\/span><a href=\"https:\/\/blogs.nvidia.com\/blog\/ai-future-open-and-proprietary\/\"><span data-contrast=\"none\">open\u00a0models<\/span><\/a><span 
data-contrast=\"none\">.<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<h2><b><span data-contrast=\"none\">#ICYMI: The Latest Updates\u00a0for RTX AI PCs<\/span><\/b><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/h2>\n<p><span data-contrast=\"none\">\u2728\u00a0Catch up on\u00a0<\/span><a href=\"https:\/\/blogs.nvidia.com\/blog\/rtx-ai-garage-gtc-2026-nemoclaw\"><span data-contrast=\"none\">RTX AI Garage<\/span><\/a><span data-contrast=\"none\">\u00a0blogs\u00a0for a host of agentic AI announcements\u00a0from NVIDIA GTC,\u00a0such as\u00a0new open models for local agents.\u00a0These models include\u00a0NVIDIA\u00a0Nemotron\u00a03 Nano 4B and\u00a0Nemotron\u00a03 Super 120B, and optimizations for Qwen 3.5 and Mistral Small 4.<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">\u00a0NVIDIA\u00a0recently\u00a0introduced\u00a0<\/span><a target=\"_blank\" href=\"https:\/\/nvidianews.nvidia.com\/news\/nvidia-announces-nemoclaw\" rel=\"noopener\"><span data-contrast=\"none\">NVIDIA\u00a0NemoClaw,<\/span><\/a><span data-contrast=\"none\">\u00a0an\u00a0open\u00a0source\u00a0stack\u00a0that\u00a0optimizes\u00a0OpenClaw\u00a0experiences on NVIDIA devices by increasing security and supporting local models.\u00a0<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"none\">????<\/span><\/b><b><span data-contrast=\"none\">\u00a0<\/span><\/b><a target=\"_blank\" href=\"https:\/\/accomplish.ai\/\" rel=\"noopener\"><span data-contrast=\"none\">Accomplish.ai<\/span><\/a><span data-contrast=\"none\">\u00a0announced Accomplish FREE, a no-cost version of its\u00a0open\u00a0source\u00a0desktop AI agent with built-in models. It\u00a0harnesses NVIDIA GPUs to run open\u00a0weight models locally, while a hybrid router dynamically balances workloads between local RTX hardware and the cloud\u00a0\u2014\u00a0enabling fast, private, zero-configuration execution without requiring an\u00a0application programming interface\u00a0key.<\/span><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<p><i><span data-contrast=\"none\">Plug in to NVIDIA AI PC on\u00a0<\/span><\/i><a target=\"_blank\" href=\"https:\/\/www.facebook.com\/NVIDIA.AI.PC\/\" rel=\"noopener\"><i><span data-contrast=\"none\">Facebook<\/span><\/i><\/a><i><span data-contrast=\"none\">,\u00a0<\/span><\/i><a target=\"_blank\" href=\"https:\/\/www.instagram.com\/nvidia.ai.pc\/\" rel=\"noopener\"><i><span data-contrast=\"none\">Instagram<\/span><\/i><\/a><i><span data-contrast=\"none\">,\u00a0<\/span><\/i><a target=\"_blank\" href=\"https:\/\/www.tiktok.com\/@nvidia_ai_pc\" rel=\"noopener\"><i><span data-contrast=\"none\">TikTok<\/span><\/i><\/a><i><span data-contrast=\"none\">\u00a0and\u00a0<\/span><\/i><a target=\"_blank\" href=\"https:\/\/x.com\/NVIDIA_AI_PC\" rel=\"noopener\"><i><span data-contrast=\"none\">X<\/span><\/i><\/a><i><span data-contrast=\"none\">\u00a0\u2014 and stay informed by subscribing to the\u00a0<\/span><\/i><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/ai-on-rtx\/?modal=subscribe-ai\" rel=\"noopener\"><i><span data-contrast=\"none\">RTX AI PC newsletter<\/span><\/i><\/a><i><span data-contrast=\"none\">.<\/span><\/i><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<p><i><span data-contrast=\"none\">Follow NVIDIA Workstation on\u00a0<\/span><\/i><a target=\"_blank\" href=\"https:\/\/www.linkedin.com\/showcase\/3761136\/\" rel=\"noopener\"><i><span 
data-contrast=\"none\">LinkedIn<\/span><\/i><\/a><i><span data-contrast=\"none\">\u00a0and\u00a0<\/span><\/i><a target=\"_blank\" href=\"https:\/\/x.com\/NVIDIAworkstatn\" rel=\"noopener\"><i><span data-contrast=\"none\">X<\/span><\/i><\/a><i><span data-contrast=\"none\">.\u00a0<\/span><\/i><span data-ccp-props='{\"335559739\":0}'>\u00a0<\/span><\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/rtx-ai-garage-open-models-google-gemma-4\/<\/p>\n","protected":false},"author":0,"featured_media":4522,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/4521"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=4521"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/4521\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/4522"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=4521"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=4521"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=4521"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}