Building high performance fabrics with Ethernet
How Dell is helping its customers to be GenAI ready
Sponsored Feature The next time you ask generative AI (GenAI) to write a happy birthday email to your client or craft a bash script, spare a thought for what's happening behind the scenes.
Because the bits are flying at unprecedented speed and volume in the datacenters serving all this up. Training the large language models (LLMs) that underpin GenAI involves large-scale parallel processing to chew through mountains of data. That in turn needs high-bandwidth, low-latency networking to support it all. How's a datacenter to cope?
Dell has been busy developing its AI factory concept to support these high-demand workloads. It's an end-to-end approach to building GenAI infrastructure, spanning compute, storage, and networking. The goal is to create an ecosystem of products - including both Dell's own and those of its third-party partners - that customers can assemble into powerful infrastructure to handle LLM training and inference.
It's all part of the company's plan to situate itself at the heart of GenAI infrastructure for two categories of customer, explains Saurabh Kapoor, director of product management and strategy at Dell. The first is large GPU farms or AI factories, including tier-two cloud service providers that are extending themselves to become AI-as-a-service providers.
"They are building large language models and running workloads which are training and fine tuning AI," he says. The second category is the large enterprise customer base "which leverage these trained models to power real-world business applications and run inferencing."
Dell's activity in this area is generating positive feedback. In Q1 this year, Forrester recognized the company in its leadership category for innovation.
"They appreciated a few things," says Kapoor. "First was Dell's ability to do custom solutions, bringing a choice of GPU solutions in partnership with Nvidia, AMD, and Intel." This lets customers build AI architectures with the technology of their choice.
Secondly, Forrester liked Dell's approach of validating reference architectures so that customers can piece together these component choices while knowing that they will work well together. That helps them to build out their AI infrastructures more quickly, which is crucial in a fast-moving market like this.
"The other thing they appreciated was our supply chain and delivery excellence, which drives our ability to deliver technology in time, since speed and reliability in delivery are just as important as the technology itself," he says.
Switching up high-speed networks in the datacenter
Dell just demonstrated this with the launch of a new product that delivers on a recent major development in high-speed Ethernet. In February, the IEEE approved 802.3df, which defines 800Gb Ethernet. Dell has delivered this capability in a new piece of switching hardware - the PowerSwitch Z9864F-ON. It has also unveiled a new version of its open-source, SONiC-based network operating system, Enterprise SONiC Distribution by Dell Technologies 4.4, to go with it.
The new PowerSwitch model is important for the compute-intensive foundational training phase in GenAI, where huge streams of data travel between GPUs processing the raw statistical probability models that support the LLMs. It's also important for the fine-tuning phase, where enterprises refine those general-purpose models to support specific use cases.
High-speed networking is also important in inference, where organizations use the LLMs to tackle business tasks. This is where the models must access and process vast amounts of data in real time to serve time-sensitive, low-latency queries from applications like chatbots or image generation systems.
The PowerSwitch Z9864F-ON comes with a range of features that will please datacenter operators trying to cope with the flood of GenAI data, says Kapoor. These include extending that 800GE connectivity over 64 ports, which increases the number of data flows that the switch can support between GPUs, servers, and storage systems.
This high port density enables large-scale AI clusters of up to 8,000 GPUs in a two-tier Clos rail-optimized topology. Two-tier Clos uses two layers of switches: a spine layer, which uses high-capacity switches to form the network backbone, and a leaf layer, in which each of the switches connects to each of the spine switches in a mesh configuration. These leaf switches then talk directly to the servers and storage devices. For larger scales, users can go with a three-tier, rail-optimized topology.
This densely connected arrangement is good for east-west (server-to-server) communications, says Dell, especially with AI workloads that need both high-bandwidth and low-latency communication between lots of nodes. It makes it easy to expand the leaf layer with more switches without changing the core backbone layer, and makes the switch configuration fault tolerant.
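For a rough sense of where a number like 8,000 can come from, the sketch below sizes a two-tier leaf-spine fabric built from identical switches. It rests on assumptions the article does not state: a non-blocking design in which each leaf splits its ports evenly between servers and spines, and each 800GbE port broken out into two 400GbE links to give 128 logical ports per switch. The function name is hypothetical, and this is not a description of how Dell arrives at its own figure.

```python
def two_tier_clos_endpoints(radix: int) -> int:
    """Maximum endpoints in a non-blocking two-tier leaf-spine fabric
    built from identical switches with `radix` ports each.

    Assumption: every leaf uses half its ports as downlinks and half as
    uplinks (one uplink per spine), so there are at most radix/2 spines,
    and each spine port feeds a different leaf, so at most `radix` leaves.
    """
    if radix <= 0 or radix % 2:
        raise ValueError("radix must be a positive even number")
    downlinks_per_leaf = radix // 2
    max_leaves = radix
    return max_leaves * downlinks_per_leaf


# 64 ports at 800GbE, used as-is
print(two_tier_clos_endpoints(64))   # 2048
# Assumed breakout: 64 x 800GbE split into 128 x 400GbE
print(two_tier_clos_endpoints(128))  # 8192
```

Under those assumptions the 400GbE breakout case lands at 8,192 endpoints, which is in line with the "up to 8,000 GPUs" quoted for the two-tier design.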
The PowerSwitch Z9864F-ON integrates with Dell's latest Enterprise SONiC Distribution to further support AI workloads. One of SONiC's key features is priority flow control. It allows for lossless Ethernet by putting pause controls on specific traffic classes, preventing packet loss during congestion. This allows robust communications for AI applications that shuttle data between GPUs.
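The idea behind priority flow control can be illustrated with a toy queue model. The sketch below is a deliberately simplified simulation, not SONiC configuration or switch firmware: one traffic class, a fixed-size buffer, and two hypothetical thresholds (`xoff` to pause the sender, `xon` to resume it). All names and numbers are assumptions chosen for illustration.

```python
def simulate(ticks, capacity, arrive, drain, xoff=None, xon=None):
    """Return (delivered, dropped) for a single traffic-class queue.

    Each tick the sender offers `arrive` packets and the queue drains
    up to `drain` packets. If xoff/xon are given, the receiver pauses
    the sender once the queue reaches xoff and resumes it once the
    queue has fallen to xon, in the spirit of PFC.
    """
    queue = delivered = dropped = 0
    paused = False
    for _ in range(ticks):
        if xoff is not None:
            if queue >= xoff:
                paused = True
            elif queue <= xon:
                paused = False
        if not paused:
            accepted = min(arrive, capacity - queue)
            queue += accepted
            dropped += arrive - accepted  # overflow is lost
        sent = min(drain, queue)
        queue -= sent
        delivered += sent
    return delivered, dropped


# Sender is twice as fast as the drain rate; buffer holds 10 packets.
print(simulate(100, capacity=10, arrive=2, drain=1))                 # (100, 91)
print(simulate(100, capacity=10, arrive=2, drain=1, xoff=8, xon=4))  # (100, 0)
```

In this toy run both cases deliver the same number of packets, but the unpaused sender loses 91 along the way, while the paused one loses none. That is the trade PFC makes: the sender waits rather than the network dropping.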
SONiC and the PowerSwitch Z9864F-ON also support RoCEv2 (Remote Direct Memory Access over Converged Ethernet version 2), which enables GPUs and storage systems to send data to each other over high-speed Ethernet while bypassing the CPU and operating system to reduce processing overhead.
Balancing and distributing traffic critical for AI
Optimizing network performance by balancing and distributing traffic loads is arguably more critical than ever in a high-intensity GenAI datacenter infrastructure. To meet that requirement, Dell's products now support several features designed to increase efficiency by carving up traffic and moving it around the network while responding to real-time conditions.
One such feature is dynamic load balancing, which distributes traffic across multiple paths by monitoring network conditions and adjusting traffic routes in real time. The PowerSwitch Z9864F-ON supports cell spray-based load balancing, which fragments packets into smaller cells at the ingress point, distributes these cells across links, and then reassembles them at the egress point. This approach, which happens at the silicon level in the ASIC rather than at the network interface card level, offers finer-grained load balancing than the packet spray approach, which distributes individual packets.
That ASIC is a key part of the PowerSwitch Z9864F-ON's capabilities. It is Broadcom's Tomahawk 5 chip, which comes with some beefy capabilities. It has a 51.2 Tb/s shared-buffer architecture to keep up with the faster RoCEv2 communication, for example, and supports 256 ports of 200GbE on a single chip, enabling the creation of flat, low-latency AI/ML clusters.
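To see why spraying cells can balance links more finely than spraying whole packets, consider the toy comparison below. It is a minimal sketch under stated assumptions: simple round-robin distribution, a hypothetical 256-byte cell size, and no modelling of reordering or reassembly. It is not the algorithm the ASIC actually runs.

```python
def packet_spray(packet_sizes, n_links):
    """Assign whole packets to links round-robin; return bytes per link."""
    load = [0] * n_links
    for i, size in enumerate(packet_sizes):
        load[i % n_links] += size
    return load


def cell_spray(packet_sizes, n_links, cell_size):
    """Cut each packet into cells of at most cell_size bytes and assign
    the cells to links round-robin; return bytes per link."""
    load = [0] * n_links
    next_link = 0
    for size in packet_sizes:
        remaining = size
        while remaining > 0:
            cell = min(cell_size, remaining)
            load[next_link] += cell
            next_link = (next_link + 1) % n_links
            remaining -= cell
    return load


# Alternating jumbo and tiny packets over two links
packets = [9000, 64, 9000, 64]
print(packet_spray(packets, 2))     # [18000, 128]
print(cell_spray(packets, 2, 256))  # [9064, 9064]
```

With this traffic mix, per-packet round-robin happens to put both jumbo packets on the same link, while cell-level distribution splits the bytes evenly. Real traffic is less contrived, but the granularity argument is the same.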
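The port counts follow directly from the chip's aggregate bandwidth. As a quick back-of-the-envelope check (the function name is hypothetical, and the 400GbE row is an assumed intermediate breakout that the article does not mention):

```python
ASIC_CAPACITY_GBPS = 51_200  # 51.2 Tb/s, as quoted above


def max_ports(port_speed_gbps: int, capacity_gbps: int = ASIC_CAPACITY_GBPS) -> int:
    """Number of ports of a given speed that fit in the aggregate bandwidth."""
    return capacity_gbps // port_speed_gbps


for speed in (800, 400, 200):
    print(f"{speed}GbE: {max_ports(speed)} ports")
# 800GbE: 64 ports
# 400GbE: 128 ports
# 200GbE: 256 ports
```

So the 64 ports of 800GbE on the switch and the 256 ports of 200GbE quoted for the chip are two ways of slicing the same 51.2 Tb/s.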
Tomahawk 5 was designed with AI workloads in mind. These feature predictable, repetitive flows of similar data in extremely high volume, between storage devices and GPUs and between the GPUs themselves. These flows can create problems in datacenter networks, because traditional load balancing algorithms can't always distribute them effectively, says Dell. They can also saturate the buffers in less capable network switches, potentially causing packet loss. These problems can spike network latency, which is anathema to the demanding AI training and inference algorithms that these datacenters serve.
The latest release of SONiC also allows for more granular control over traffic distribution within AI fabrics using enhanced user-defined hashing. This feature, which can help to improve load balancing and network efficiency, enables network administrators to define custom hashing algorithms that consider specific attributes of AI traffic patterns, such as GPU-to-GPU communication flows or large data transfers between compute and storage nodes. By tweaking the technology to reflect their GenAI training and inference loads, admins can distribute traffic more evenly across available network paths, maximizing throughput and enhancing overall system performance.
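Why the choice of hash fields matters can be shown with a small sketch. The code below is an illustration only: the field names, the CRC32 hash, and the four-path fabric are assumptions, and none of it reflects SONiC's actual configuration syntax or the hash the hardware uses. The one real-world detail is that RoCEv2 traffic uses UDP destination port 4791, so flows between the same pair of hosts can look identical unless the hash key includes something that varies, such as the UDP source port.

```python
import zlib


def pick_path(flow: dict, fields: tuple, n_paths: int) -> int:
    """Hash the selected header fields and map the flow to one path."""
    key = "|".join(str(flow[f]) for f in fields).encode()
    return zlib.crc32(key) % n_paths


# 64 hypothetical RoCEv2 flows between the same two hosts; only the
# UDP source port differs from flow to flow.
flows = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.1.1",
     "src_port": 49152 + i, "dst_port": 4791}
    for i in range(64)
]

coarse_key = ("src_ip", "dst_ip", "dst_port")
fine_key = ("src_ip", "dst_ip", "dst_port", "src_port")

coarse = {pick_path(f, coarse_key, 4) for f in flows}
fine = {pick_path(f, fine_key, 4) for f in flows}

print("paths used with coarse key:", len(coarse))
print("paths used with fine key:", len(fine))
```

With the coarse key every flow hashes to the same value, so all 64 land on a single path. Adding the source port to the key gives the hash something to work with and spreads the flows over more than one path.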
Orchestration and integration with open source
These device- and operating system-level capabilities help to keep traffic flowing in demanding GenAI environments, but network management and orchestration are also a key part of the equation. Dell has a third product that joins SONiC and its PowerSwitch solutions to fill this gap: SmartFabric Manager for SONiC.
This product enables admins to manage their AI network fabric from a single screen, offering several features to minimize their workload. These include templates to set up consistent network configurations, which will be especially useful for organizations that need to frequently set up and tear down training infrastructure.
The ethos of choice in Dell's AI factory concept extends to individual software components such as SONiC, which Kapoor points out is heavily API-focused for interoperability. It's a particularly important feature for network admins, many of whom will be bringing their own monitoring tools to the party. Dell integrates with technologies like Telegraf, Grafana, and Prometheus, which consume SONiC APIs to provide monitoring capabilities, along with third-party management and monitoring solutions.
"Think of niche organizations who have built advanced automation and monitoring solutions that are multi-vendor," he says. "SONiC gives them an open path where they're able to work with a lot of vendors in that space."
This focus on openness has won SONiC a certification under the U.S. government's USGv6 initiative for IPv6 readiness and interoperability. It is also compliant with the FIPS federal security certification. This has cleared the way for SONiC deployment in government environments, Kapoor adds.
While the PowerSwitch Z9864F-ON, SONiC operating system, and SmartFabric Manager handle the network fabric, Dell's PowerEdge XE9680 server is a solid contender for the compute component. This server supports multiple GPU options, including Nvidia's H100 Tensor Core GPUs. Dell has also announced a liquid-cooled version of this server, the XE9680L, which addresses the significant power and cooling requirements of AI workloads.
So what's next? Things never stand still in this fast-moving sector. The Ultra Ethernet Consortium (UEC), hosted by the Linux Foundation, is constantly pushing Ethernet further in collaboration with companies including Dell. "In the AI race, Ethernet has always stood the test of time," Kapoor says. "We're looking at 800Gb Ethernet rolling out now and Dell is already looking at the next higher bandwidth Ethernet silicon, with the UEC rolling out specifications in the second half."
As foundational model developers rush to release the best LLMs, datacenter operators may struggle to support their hunger for data processing. Kapoor plans to be there with new equipment and software to ride a rising tide of data that doesn't look set to slow any time soon.
Sponsored by Dell.