Meet the ‘DPU’ – accelerated network cards designed to go where CPUs and GPUs are too valuable to waste
You may know them as SmartNICs, and they’re touted as the trick that will smarten up AI and help clouds and 5G scale
Analysis
Crack open a firewall or storage array, and on the motherboard you may well find a chip named “Octeon” from component-maker Marvell.
Octeon's job is giving appliance-builders a chip that handles networking and security chores so they can focus on building brilliant firewalls or storage arrays. The chips can scale to 16 or more cores and are programmable: firewall vendors can set them up to inspect network traffic, and array-builders can tweak them to handle the way disks deliver data to CPUs and networks. Using Octeon for these low-level tasks is useful because it means appliances' CPUs can focus on the job of running a firewall or storage array.
The Register offers that short history lesson because it’s about to be re-told quite a lot thanks to another piece of hardware in which Octeon is often found: a network interface controller (NIC).
A few years back NICs got smart: vendors started adding some modestly powerful compute cores and a little storage so they could be loaded with the same kind of firmware that makes an Octeon so useful. Again, the aim is to let a device's main processor concentrate on its main job, but in the case of these beefed-up NICs – labelled "SmartNICs" – the device in question is a server.
Hyperscale clouds liked the look of SmartNICs because they make money renting a server's CPU cores. But some of those cores were busy running networking and security chores. Many servers in clouds host virtual machines for multiple customers, making cloud networking and security rather complex. Hyperscalers therefore saw the potential to offload some work into SmartNICs, both to free up CPU cores and to further isolate customer workloads from other tenants and the perils of the wider internet.
Oracle reckons it was first to use SmartNICs, in its second-generation cloud. That may be true, although Amazon Web Services’ “Nitro” was announced a few months before Big Red’s efforts became public.
Cloud-measuring contests aside, SmartNICs have now become standard for hyperscalers. Alibaba and Baidu are known users, Google is suspected of having them under the hood, and Microsoft does similar things with field-programmable gate arrays.
Because they work to move data, SmartNICs are now being described as data-processing units – “DPUs” – and being advanced as essential for demanding workloads like AI.
“The DPU is really good at looking inside data and running storage and compression and security,” NVIDIA’s vice president of marketing Kevin Deierling told The Register. Being good at that matters because users who find that cloud latency rules clouds out for jobs like real-time analytics make big investments in on-prem servers rich in RAM and GPUs. Anything that leaves those GPUs free to do AI, instead of I/O, is therefore welcome.
5G is another likely use case, as the new standard assumes that network functions will be pushed into software. Running such code on the hardware that handles networking is useful because work gets done where traffic is moving, rather than having to go all the way to the CPU. By adding co-processors to servers, SmartNICs also improve the compute density of devices that run in space-constrained places, such as base stations and rooftops, where 5G network hardware operates.
NVIDIA's Deierling also points out that SmartNICs are handy because firewalls on servers are becoming important as applications like AI push more data East/West (inside a data centre) rather than North/South (from a data centre to the world, and back).
“Traditional firewalls on the perimeter no longer are adequate,” he argues. “You need security to match what the traffic generated by accelerated distributed apps look like.”
Note his use of the term “accelerated apps”: Deierling assumes the presence of a GPU, and suggests its cores and memory also deserve the protection of DPU offload.
NVIDIA even thinks GPUs can benefit from being melded with a DPU. The company’s forthcoming EGX platform does just that.
Sadly, putting DPUs to work is not easy. NVIDIA has an SDK and provides its own software to drive DPUs. Marvell makes sure its kit is ready to work with the Data Plane Development Kit (DPDK), an effort founded by Intel, overseen by the Linux Foundation and operating with an aim “to accelerate packet processing workloads running on a wide variety of CPU architectures.”
John Sakamoto, veep of Marvell’s infrastructure business unit, told The Register that he sees those who need to create custom code turn to DPDK when they need the functionality of a DPU.
But while hyperscalers, appliance-builders and serious AI adopters are happy cutting networking code, most users are quite rightly far happier with off-the-shelf product.
If you fancy running a firewall in a SmartNIC/DPU, instead of using appliances or running software firewalls, your preferred vendor almost certainly has nothing that will run on the accelerators.
Analyst firm Gartner’s 2020 Hype Cycle for Enterprise Networking therefore rates SmartNICs – which it calls “function accelerator cards” – as currently applicable to “Less than one per cent of target audience”.
But The Register has written this story because change is in the wind. VMware has demonstrated its ESXi hypervisor on SmartNICs. As mentioned above, NVIDIA is bolting one to a forthcoming GPU. Marvell has news up its sleeve. And Arm has just created a DPU design dedicated to accelerating storage.
And in not too many years, some of the news in the paragraph above will create new ways to consider running almost any data centre - be it a hyperscale cloud or your own server fleet. ®