
Enabling GPU as a Service

A cloud-like experience for GPU infrastructure using containers




The ability to leverage data in today’s computationally intensive environment is essential for business success. As enterprise AI adoption grows, XR Cloud delivers the compute and storage power needed to meet the challenges posed by machine learning (ML), deep learning (DL), and advanced data analytics.


GPU-ACCELERATED WORKLOADS FOR ENTERPRISE AI DEPLOYMENTS


Developing ML and DL predictive models is compute intensive. Accelerators such as graphics processing units (GPUs) deliver a significant performance boost over CPU-only systems, making GPUs a common infrastructure choice for ML and DL. However, in most enterprises today, IT teams struggle to meet the growing demand for GPUs from multiple data science teams running different ML/DL applications and use cases. Standing up the right software components together with the underlying infrastructure is time consuming, and the process must be repeated each time a new ML/DL application is requested. Once the infrastructure is provisioned, IT has little visibility into utilization, which makes it difficult to reassign infrastructure to a different application or to implement robust cost chargeback models.

Public cloud services offer virtualized GPU resources on demand (that is, GPU as a service), but public cloud is not the only solution and, in some cases, may not be an option. Many organizations have workloads that require on-premises deployment for security, performance, or data gravity reasons.
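To make the visibility gap concrete, per-GPU utilization and memory figures of the kind a chargeback report would need can be sampled with NVIDIA's NVML bindings. The sketch below uses the pynvml package as an illustration of general-purpose tooling; it is not part of the XR Cloud platform.

```python
# Minimal sketch: sample per-GPU utilization and memory for
# chargeback-style reporting via NVML (pip install nvidia-ml-py).
# Illustrative only; not XR Cloud or FusionFlow functionality.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent busy
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
        print(f"GPU {i}: {util.gpu}% busy, "
              f"{mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB memory")
finally:
    pynvml.nvmlShutdown()
```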


ON-DEMAND AND ELASTIC PROVISIONING OF GPU RESOURCES


Now, XR Cloud enables enterprise IT organizations to deliver GPU as a Service (GPUaaS) in multi-cloud deployments to increase business agility, optimize GPU utilization, and reduce the overall TCO of GPUs. Using the container-based FusionFlow Platform, GPUs from multiple heterogeneous environments can be consolidated and shared across applications, enabling on-demand, elastic provisioning of containerized GPU resources with just a few mouse clicks. Furthermore, using the unique ability to pause containers (releasing GPU, CPU, and memory resources while the overall application state is persisted), data science teams can run different ML/DL applications on shared GPU infrastructure without recreating or reinstalling their applications and libraries.
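For context, the sketch below shows how a containerized GPU workload is commonly provisioned using the Docker SDK for Python. The image name and command are hypothetical placeholders, and the pause capability described above goes beyond Docker's own pause, which freezes a container's processes without releasing its GPU memory.

```python
# Minimal sketch: on-demand provisioning of a containerized GPU workload
# with the Docker SDK for Python (pip install docker). The image and
# command are hypothetical; this is not the FusionFlow Platform API.
import docker

client = docker.from_env()

# Request one GPU for the container via the NVIDIA container runtime.
container = client.containers.run(
    "nvcr.io/nvidia/pytorch:24.01-py3",  # hypothetical DL image
    command="python train.py",           # hypothetical workload
    device_requests=[
        docker.types.DeviceRequest(count=1, capabilities=[["gpu"]])
    ],
    detach=True,
)
print(f"Started container {container.short_id} with 1 GPU attached")
```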
