Securing AI workloads in multi-tenant K8s clusters

Discover how F5 BIG-IP Next for Kubernetes offloads critical AI tasks to BlueField-3 DPUs

Partner Content

Organizations are increasingly deploying AI workloads on Kubernetes (K8s) clusters to harness the benefits of containerization and orchestration.

K8s is a portable, extensible, open-source platform for managing containerized workloads and services. It provides the building blocks for constructing developer platforms.

With automatic bin packing, K8s uses a cluster of nodes to run containerized tasks. By declaring how much CPU and memory each container needs, K8s can fit containers onto nodes to make the best use of resources. Service discovery and load balancing of traffic across containers help keep deployments stable.
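The resource declarations behind bin packing can be sketched as a minimal Pod manifest. The function, names, and values below are illustrative, not from the article:

```python
def make_pod(name: str, image: str, cpu: str, memory: str) -> dict:
    """Build a minimal Pod manifest with CPU/memory requests and limits.

    The Kubernetes scheduler uses the requests to bin-pack the Pod onto a
    node with enough unreserved capacity; the kubelet enforces the limits.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        # Scheduler guarantees at least this much capacity...
                        "requests": {"cpu": cpu, "memory": memory},
                        # ...and this is the enforced ceiling.
                        "limits": {"cpu": cpu, "memory": memory},
                    },
                }
            ]
        },
    }

# Hypothetical AI inference worker asking for 2 CPUs and 4 GiB of memory.
pod = make_pod("inference-worker", "example/inference:latest", "2", "4Gi")
```

Setting requests equal to limits, as here, gives the Pod the Guaranteed quality-of-service class, a common choice for latency-sensitive inference workloads.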

The term "multi-tenancy" describes the sharing of clusters where different applications, or multiple instances of one application, run in the same cluster. Multi-tenancy in K8s clusters generally falls into two broad categories – multi-team and multi-customer – along with variations and hybrids of the two.

A cluster is commonly shared between multiple teams within an organization, each of which may operate one or more workloads. While team members have access to K8s resources, K8s policies such as role-based access control (RBAC), quotas, and network policies are essential to share clusters safely and fairly.
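One of the policies named above, a per-namespace ResourceQuota, caps how much of the shared cluster each team can claim. A minimal sketch, with illustrative names and values:

```python
def team_quota(namespace: str, cpu: str, memory: str, pods: int) -> dict:
    """Build a ResourceQuota manifest capping a team namespace's footprint.

    Once applied, the API server rejects new Pods in the namespace whose
    aggregate requests would exceed these hard limits.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Total CPU and memory all Pods in the namespace may request.
                "requests.cpu": cpu,
                "requests.memory": memory,
                # Cap on the number of Pods, regardless of size.
                "pods": str(pods),
            }
        },
    }

# Hypothetical quota for a data-science team sharing the cluster.
quota = team_quota("team-ds", "32", "128Gi", 50)
```

Quotas enforce fair sharing of capacity; RBAC Roles and RoleBindings (not shown) restrict which API objects each team may touch, and network policies govern traffic between them.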

In a multi-customer tenancy scenario, a software-as-a-service (SaaS) vendor runs multiple instances of a workload for customers with K8s policies, isolating the workloads. Here, customers do not have access to the cluster. The vendor uses K8s to manage the workloads and optimize costs. Many GPU-as-a-Service (GPUaaS) or AI factory operators deploy multiple K8s clusters for multi-customer tenancy.
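For the multi-customer case, the isolation the vendor relies on typically starts with a default-deny NetworkPolicy in each tenant namespace, so no Pod can talk to another tenant's workloads unless a policy explicitly allows it. A minimal sketch, with an assumed namespace-per-tenant layout:

```python
def default_deny(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress and egress traffic
    for every Pod in the given tenant namespace.

    Tenant-specific allow rules can then be layered on top, since
    NetworkPolicies are additive.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches every Pod in the namespace.
            "podSelector": {},
            # Listing both types with no rules denies all traffic.
            "policyTypes": ["Ingress", "Egress"],
        },
    }

# Hypothetical policy for one customer's namespace.
policy = default_deny("tenant-acme")
```

Note that NetworkPolicy only takes effect if the cluster's CNI plugin enforces it; this is the kind of data-plane isolation work that DPU-based offload, as described later, can accelerate.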

Security, TCO factors in multi-tenant K8s clusters

Key considerations in designing multi-tenant solutions with K8s include the isolation level, implementation effort, operational complexity, and service cost. Isolating the workloads of different tenants from each other to prevent unauthorized access is critical for both the control plane and the data plane in the K8s cluster. However, the strength of tenant isolation must be evaluated against the cost and complexity of managing multiple clusters.

Total cost of ownership (TCO) considerations include increased requirements for servers, graphics processing units (GPUs), and networking equipment; higher power and cooling costs; and greater administrative effort, including monitoring, maintenance, and updates. Organizations are also challenged to protect sensitive data and comply with regulatory requirements.

Crucially, industries were not prepared for the massive data processing capabilities and infrastructure buildouts required to realize the potential of AI. Efficient and secure operations are more critical than ever, especially for large-scale AI infrastructure.

As multi-tenancy grows within these K8s environments, F5 acceleration technology on data processing units (DPUs) enables providers to secure and optimize AI K8s clusters. The DPUs offload and accelerate security, network and storage tasks, freeing up the CPU to handle more AI workloads.

Specifically, F5 has combined its intelligent proxy with NVIDIA's BlueField-3 DPUs to create BIG-IP Next for Kubernetes, a solution that enhances security and improves application delivery for AI workloads. The integrated BlueField-3 DPUs facilitate granular multi-tenancy capabilities while optimizing energy consumption and providing an integrated view of networking, security and traffic management.

The F5 BIG-IP Next for Kubernetes is tailored for AI use cases such as inference, retrieval-augmented generation (RAG), and seamless data management and storage. It offloads data-heavy tasks to the NVIDIA BlueField-3 DPUs, using their acceleration framework to free CPU resources for revenue-generating applications. Outside of AI infrastructure, the solution is also beneficial at the network edge for virtualized RAN (vRAN) and other 5G network functions, particularly in low-power, high-performance edge scenarios.

By enabling effective multi-tenancy within a single K8s cluster, F5 acceleration technology on DPUs helps lower hardware requirements, reduce power and cooling costs, and streamline operational management. Through a centralized control point, service providers and large enterprises can accelerate, secure and streamline data traffic that flows into and out of large-scale AI infrastructures while using the rich observability and granular control provided to optimize AI workloads.

BIG-IP and DPUs bolster security

Furthermore, deploying F5's acceleration technology on DPUs enables advanced security features such as micro-segmentation, network separation, encryption and intrusion detection/prevention to operate at close to wire speed, ensuring minimal latency and high throughput. The DPUs handle critical security features and zero trust architecture, including edge firewall, distributed denial-of-service (DDoS) mitigation, API protection, intrusion prevention, encryption, and certificate management.

For service providers and large-scale infrastructures, F5 BIG-IP Next for Kubernetes offers a centralized integration point into AI networks. Support for multiple L7 protocols beyond HTTP enhances ingress and egress control at high performance. Customers can automate the discovery and security of AI training and inference endpoints while isolating AI applications from targeted threats. These capabilities strengthen data integrity and sovereignty and provide the encryption AI environments require.

As organizations look for ways to harness the power of AI, the F5 BIG-IP Next for Kubernetes is aimed at transforming application delivery for service providers and enterprises as well as GPUaaS providers and AI factories. Optimization efforts for AI inference, embedding or training boost overall performance and accelerate GPUs' access to data to meet the demands of generative AI.

The F5 BIG-IP Next for Kubernetes, a Kubernetes-native implementation of F5's BIG-IP platform, provides multi-tenancy support that enables service providers to securely host multiple users on the same AI infrastructure while keeping their AI workloads and data separate. AI workload delivery shares similarities with 5G workload delivery but involves far greater traffic volumes.

This year, the training of Meta's open-source large language model (LLM) Llama 3 was hindered by network latency. By tuning hardware-software interactions, overall performance was increased by 10 percent, translating into weeks of saved training time and reduced cost. F5 BIG-IP Next for Kubernetes efficiently routes this traffic across accelerated pathways, enabling similar efficiency gains and reducing time, cost and energy use. This delivers scalable AI performance while optimizing GPU resource utilization.

Robust, secure infrastructure drives evolving AI

This is where the AI factory or GPUaaS concept provides a framework for organizations to operationalize their AI initiatives, making them more adaptable and secure in the face of changing business and market demands. By offering on-demand access to powerful GPUs, GPUaaS providers enable seamless scaling and integration while supplying the data that fuels AI. An AI factory becomes a massive storage, networking and computing resource to meet high-volume, high-performance training and inference requirements.

Deploying AI at scale is becoming increasingly essential as it serves as a crucial market differentiator and driver of operational efficiency. A multi-tenant architecture future-proofs AI factories for ever-growing AI workloads: the solution connects AI models with data in disparate locations, significantly enhances visibility into application performance, and uses advanced Kubernetes capabilities for AI workload automation and centralized policy control.

The orchestration of hardware and software components allows AI factories to produce and continuously refine AI models while adapting to new data and evolving requirements. Ultimately, F5 builds resiliency, performance, and multi-tenancy into AI systems to develop a secure and robust infrastructure for training or inferencing AI models that future intelligent applications will require.

Contributed by F5.
