{"id":640,"date":"2020-12-02T17:45:46","date_gmt":"2020-12-02T17:45:46","guid":{"rendered":"https:\/\/machine-learning.webcloning.com\/2020\/12\/02\/how-to-avoid-speed-bumps-and-stay-in-the-ai-fast-lane-with-hybrid-cloud-infrastructure\/"},"modified":"2020-12-02T17:45:46","modified_gmt":"2020-12-02T17:45:46","slug":"how-to-avoid-speed-bumps-and-stay-in-the-ai-fast-lane-with-hybrid-cloud-infrastructure","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2020\/12\/02\/how-to-avoid-speed-bumps-and-stay-in-the-ai-fast-lane-with-hybrid-cloud-infrastructure\/","title":{"rendered":"How to Avoid Speed Bumps and Stay in the AI Fast Lane with Hybrid Cloud Infrastructure"},"content":{"rendered":"<div data-url=\"https:\/\/blogs.nvidia.com\/blog\/2020\/11\/30\/dgx-hybrid-cloud-infrastructure-google-anthos\/\" data-title=\"How to Avoid Speed Bumps and Stay in the AI Fast Lane with Hybrid Cloud Infrastructure\">\n<p>Cloud or on premises? That\u2019s the question many organizations ask when building AI infrastructure.<\/p>\n<p>Cloud computing can help developers get a fast start with minimal cost. It\u2019s great for early experimentation and supporting temporary needs.<\/p>\n<p>As businesses iterate on their AI models, however, those models can become increasingly complex, consume more compute cycles and involve exponentially larger datasets. The costs of <a href=\"https:\/\/blogs.nvidia.com\/blog\/2020\/01\/16\/5-predictions-data-center-ai-infrastructure\/\">data gravity<\/a> can escalate, with more time and money spent pushing large datasets from where they\u2019re generated to where compute resources reside.<\/p>\n<p>This AI development \u201cspeed bump\u201d is often an inflection point where organizations realize there are opex benefits with on-premises or <a href=\"https:\/\/blogs.nvidia.com\/blog\/2019\/07\/11\/dgx-ready-program-global-doubles-colocation-partners\/\">colocated infrastructure<\/a>. 
Its fixed costs can support rapid iteration at the lowest \u201ccost per training run,\u201d complementing their cloud usage.<\/p>\n<p>Conversely, for organizations whose datasets are created in the cloud and live there, procuring compute resources adjacent to that data makes sense. Whether on-prem or in the cloud, minimizing data travel \u2014 by keeping large volumes as close to compute resources as possible \u2014 helps minimize the impact of data gravity on operating costs.<\/p>\n<h2><strong>\u2018Own the Base, Rent the Spike\u2019\u00a0<\/strong><\/h2>\n<p>Businesses that ultimately embrace hybrid cloud infrastructure trace a familiar trajectory.<\/p>\n<p>One customer developing an image recognition application immediately benefited from a fast, effortless start in the cloud.<\/p>\n<p>As their database grew to millions of images, costs rose and processing slowed, causing their data scientists to become more cautious in refining their models.<\/p>\n<p>At this tipping point \u2014 when a fixed cost infrastructure was justified \u2014 they shifted training workloads to an on-prem <a href=\"http:\/\/www.nvidia.com\/dgx\">NVIDIA DGX system<\/a>. This enabled an immediate return to rapid, creative experimentation, allowing the business to build on the great start enabled by the cloud.<\/p>\n<p>The saying \u201cown the base, rent the spike\u201d captures this situation. 
Enterprise IT provisions on-prem DGX infrastructure to support the steady-state volume of AI workloads and retains the ability to burst to the cloud whenever extra capacity is needed.<\/p>\n<p>It\u2019s this hybrid cloud approach that can secure the continuous availability of compute resources for developers while ensuring the lowest cost per training run.<\/p>\n<p><a href=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2020\/11\/dgx-hybrid-cloud-scaled.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2020\/11\/dgx-hybrid-cloud-672x358.jpg\" alt=\"NVIDIA DGX hybrid cloud AI infrastructure\" width=\"672\" height=\"358\"><\/a><\/p>\n<h2><strong>Delivering the AI Hybrid Cloud with DGX and Google Cloud\u2019s Anthos on Bare Metal<\/strong><\/h2>\n<p>To help businesses embrace hybrid cloud infrastructure, NVIDIA has introduced support for Google Cloud\u2019s Anthos on bare metal for its <a href=\"http:\/\/www.nvidia.com\/dgxa100\">DGX A100 systems<\/a>.<\/p>\n<p>For customers using Kubernetes to straddle cloud GPU compute instances and on-prem DGX infrastructure, Anthos on bare metal enables a consistent development and operational experience across deployments, while reducing expensive overhead and improving developer productivity.<\/p>\n<p>This presents several benefits to enterprises. While many have implemented GPU-accelerated AI in their data centers, much of the world retains some legacy x86 compute infrastructure. 
With Anthos on bare metal, IT can easily add on-prem DGX systems to their infrastructure to tackle AI workloads and manage them in the same familiar way, all without the need for a hypervisor layer.<\/p>\n<p>Because no virtual machine layer is required, Anthos on bare metal \u2014 now <a href=\"https:\/\/cloud.google.com\/blog\/topics\/hybrid-cloud\/anthos-on-bare-metal-is-now-ga\">generally available<\/a> \u2014 manages application deployment and health across existing environments for more efficient operations. Anthos on bare metal can also manage application containers on a wide variety of performant, GPU-optimized hardware types and allows for direct application access to hardware.<\/p>\n<p>\u201cAnthos on bare metal provides customers with more choice over how and where they run applications and workloads,\u201d said Rayn Veerubhotla, Director of Partner Engineering at Google Cloud. \u201cNVIDIA\u2019s support for Anthos on bare metal means customers can seamlessly deploy NVIDIA\u2019s GPU Device Plugin directly on their hardware, enabling increased performance and flexibility to balance ML workloads across hybrid environments.\u201d<\/p>\n<p>Additionally, teams can access their favorite NVIDIA <a href=\"https:\/\/ngc.nvidia.com\/catalog\/collections\">NGC<\/a> containers, Helm charts and AI models from anywhere.<\/p>\n<p>With this combination, enterprises can enjoy the rapid start and elasticity of resources offered on Google Cloud, as well as the secure performance of dedicated on-prem DGX infrastructure.<\/p>\n<p>Learn more about <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/gpu-cloud-computing\/google-cloud-platform\/\">Google Cloud\u2019s Anthos<\/a>.<\/p>\n<p>Learn more about <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-a100\/\">NVIDIA DGX 
A100<\/a>.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>http:\/\/feedproxy.google.com\/~r\/nvidiablog\/~3\/CDd9pD0lc3A\/<\/p>\n","protected":false},"author":0,"featured_media":641,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/640"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=640"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/640\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/641"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=640"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=640"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=640"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}