This article is more than 1 year old

SmartNICs, IPUs, DPUs de-hyped: Why and how cloud giants are offloading work from server CPUs

Where this technology grew from, and what it offers you

Systems Approach The recent announcements from Intel about Infrastructure Processing Units (IPUs) have prompted us to revisit the topic of how functionality is partitioned in a computing system.

As we noted in an earlier piece, The Accidental SmartNIC, there is at least thirty years’ history of trying to decide how much one should offload from a general purpose CPU to a more specialized NIC, and an equally long tussle between more highly specialized offload engines versus more general-purpose ones.

The IPU represents just the latest entry in a long series of general-purpose offload engines, and we’re now seeing quite a diverse set of options, not just from Intel but from others such as Nvidia and Pensando. These latter firms use the term DPU (data processing unit) but the consensus seems to be that these devices tackle the same class of problems.

There are several interesting things going on here. The first is that there is an emerging consensus that the general purpose x86 (or Arm) server is no longer the best place to run the infrastructure functions of a cloud. By “infrastructure functions” we mean all the things that it takes to run a multi-tenant cloud that are not actually guest workloads: the hypervisor, network virtualization, storage services and so on.

Whereas the server used to be the home of both guest workloads and infrastructure services, these functions are increasingly viewed as “overhead” that is only taking cycles away from guests. One oft-cited paper is Facebook’s Accelerometer study, which measures overhead within Facebook’s data centers as high as 80 per cent, although this may not be generalizable to cloud providers. More plausibly, Google reported [PDF] in 2015:

'Datacenter tax' can comprise nearly 30 per cent of cycles [...], which makes its constituents prime candidates for hardware specialization in future server systems-on-chips.

Amazon Web Services presumably saw the same issue of overheads cutting into the revenue-generating workloads, and started to use specialized hardware for infrastructure services when it acquired Annapurna Labs in 2015, laying the groundwork for its Nitro architecture. The impact was to move almost all infrastructure services out of the servers, leaving them free to run guest workloads and little else.

Once you decide to move a function out of the general-purpose CPU complex into some sort of offload engine, the question is how to retain the appropriate level of flexibility. These offloaded functions are not static, so putting them into fixed-function hardware would be a short-sighted move.

This is why we have seen NICs move in recent years from fixed-function offloads such as TCP segmentation to the more flexible architecture of SmartNICs. So the goal is to build an offload system that is more optimized for the offloaded services than a general-purpose CPU, yet still programmable enough to support innovation and evolution of offloaded services.

The goal is to build an offload system that is more optimized for the offloaded services than a general-purpose CPU, yet still programmable enough to support innovation and evolution of offloaded services

Intel’s IPU family contains several entrants, which take different approaches to delivering that flexibility, including both FPGA- and ASIC-based versions. The Mount Evans ASIC is particularly interesting as it includes both Arm CPU cores and programmable networking hardware (from the Barefoot Networks team) that is P4-programmable.

This is a subject dear to our hearts here at Systems Approach, as the P4 toolchain is central to much of the technology that we wrote about in our SDN book.

Putting a P4-programmable switch in an IPU/DPU makes lots of sense, since the networking functions that are likely to be offloaded include those of a virtual switch. And one thing we learned at Nicira and later in the NSX team at VMware was that if you want to move the vswitch to an offload engine, that engine needs to be fully programmable.

If a NIC is insufficiently general to implement the whole vswitch, you can only move some subset of the vswitch functionality to the offload engine. Even if you could move 90 per cent of the functionality, that remaining 10 per cent that you have to keep doing in the CPU is likely to be a bottleneck.

So a P4-programmable offload engine based on the Protocol Independent Switching Architecture (PISA) provides the required level of flexibility and programmability to make offloading of the whole vswitch possible. Combine this with some other programmable hardware (such as Arm cores) and you can see how the entire set of infrastructure functions, including the hypervisor, storage virtualization, etc., can be offloaded to the IPU.

A network virtualization system overlaid on physical servers and switches

The virtual switch of a network virtualization system is a candidate for offloading

One way to view the latest generation of DPUs/IPUs is that the efforts of the SDN movement to create more programmable switches has enabled innovation in a new space.

SDN initially promised to drive control plane innovation by decoupling the switching hardware from the software that controlled it. Network virtualization was one of the first applications of SDN to take off, with the separation of control and data planes and highly flexible software switches enabling networks to be created entirely in software (on top of a hardware-based underlay).

PISA and P4 led to a more flexible form of switching hardware and a new way to define the hardware-software interface (improving on the earlier efforts of OpenFlow). All of these threads–control plane innovation, network virtualization, and flexible, programmable switch hardware–are now being brought together in the creation of IPUs and DPUs.

We can also view the development of IPUs/DPUs as a continuation of the trend in which processors are both highly flexible and yet specialized for certain tasks. GPUs and TPUs are really flexible, being used for everything from crypto-mining to machine learning to graphics processing, but are nevertheless quite specialized compared to CPUs. (GPUs were even used for packet processing in an era before we had PISA and P4.)

DPUs and IPUs now seem well established as a new category of highly programmable devices that are optimized for a specific set of tasks that need to be performed in a modern cloud data center. With that greater specialization comes greater efficiency, while flexibility remains high enough to support future innovation. ®

Larry Peterson and Bruce Davie are the authors of Computer Networks: A Systems Approach and the related Systems Approach series of books. All their content is open source and available on GitHub. You can find them on Twitter, their writings on Substack, and past The Register columns here.

More about


Send us news

Other stories you might like