SmartNICs, IPUs, DPUs de-hyped: Why and how cloud giants are offloading work from server CPUs

Where this technology grew from, and what it offers you


Systems Approach The recent announcements from Intel about Infrastructure Processing Units (IPUs) have prompted us to revisit the topic of how functionality is partitioned in a computing system.

As we noted in an earlier piece, The Accidental SmartNIC, there is at least thirty years’ history of trying to decide how much one should offload from a general purpose CPU to a more specialized NIC, and an equally long tussle between more highly specialized offload engines versus more general-purpose ones.

The IPU represents just the latest entry in a long series of general-purpose offload engines, and we’re now seeing quite a diverse set of options, not just from Intel but from others such as Nvidia and Pensando. These latter firms use the term DPU (data processing unit) but the consensus seems to be that these devices tackle the same class of problems.

There are several interesting things going on here. The first is that there is an emerging consensus that the general purpose x86 (or Arm) server is no longer the best place to run the infrastructure functions of a cloud. By “infrastructure functions” we mean all the things that it takes to run a multi-tenant cloud that are not actually guest workloads: the hypervisor, network virtualization, storage services and so on.

Whereas the server used to be the home of both guest workloads and infrastructure services, these functions are increasingly viewed as “overhead” that is only taking cycles away from guests. One oft-cited paper is Facebook’s Accelerometer study, which measures overhead within Facebook’s data centers as high as 80 per cent, although this may not be generalizable to cloud providers. More plausibly, Google reported [PDF] in 2015:

'Datacenter tax' can comprise nearly 30 per cent of cycles [...], which makes its constituents prime candidates for hardware specialization in future server systems-on-chips.

Amazon Web Services presumably saw the same issue of overheads cutting into the revenue-generating workloads, and started to use specialized hardware for infrastructure services when it acquired Annapurna Labs in 2015, laying the groundwork for its Nitro architecture. The impact was to move almost all infrastructure services out of the servers, leaving them free to run guest workloads and little else.

Once you decide to move a function out of the general-purpose CPU complex into some sort of offload engine, the question is how to retain the appropriate level of flexibility. These offloaded functions are not static, so putting them into fixed-function hardware would be a short-sighted move.

This is why we have seen NICs move in recent years from fixed-function offloads such as TCP segmentation to the more flexible architecture of SmartNICs. So the goal is to build an offload system that is more optimized for the offloaded services than a general-purpose CPU, yet still programmable enough to support innovation and evolution of offloaded services.

The goal is to build an offload system that is more optimized for the offloaded services than a general-purpose CPU, yet still programmable enough to support innovation and evolution of offloaded services

Intel’s IPU family contains several entrants, which take different approaches to delivering that flexibility, including both FPGA- and ASIC-based versions. The Mount Evans ASIC is particularly interesting as it includes both Arm CPU cores and programmable networking hardware (from the Barefoot Networks team) that is P4-programmable.

This is a subject dear to our hearts here at Systems Approach, as the P4 toolchain is central to much of the technology that we wrote about in our SDN book.

Putting a P4-programmable switch in an IPU/DPU makes lots of sense, since the networking functions that are likely to be offloaded include those of a virtual switch. And one thing we learned at Nicira and later in the NSX team at VMware was that if you want to move the vswitch to an offload engine, that engine needs to be fully programmable.

If a NIC is insufficiently general to implement the whole vswitch, you can only move some subset of the vswitch functionality to the offload engine. Even if you could move 90 per cent of the functionality, that remaining 10 per cent that you have to keep doing in the CPU is likely to be a bottleneck.

So a P4-programmable offload engine based on the Protocol Independent Switching Architecture (PISA) provides the required level of flexibility and programmability to make offloading of the whole vswitch possible. Combine this with some other programmable hardware (such as Arm cores) and you can see how the entire set of infrastructure functions, including the hypervisor, storage virtualization, etc., can be offloaded to the IPU.

A network virtualization system overlaid on physical servers and switches

The virtual switch of a network virtualization system is a candidate for offloading

One way to view the latest generation of DPUs/IPUs is that the efforts of the SDN movement to create more programmable switches has enabled innovation in a new space.

SDN initially promised to drive control plane innovation by decoupling the switching hardware from the software that controlled it. Network virtualization was one of the first applications of SDN to take off, with the separation of control and data planes and highly flexible software switches enabling networks to be created entirely in software (on top of a hardware-based underlay).

PISA and P4 led to a more flexible form of switching hardware and a new way to define the hardware-software interface (improving on the earlier efforts of OpenFlow). All of these threads–control plane innovation, network virtualization, and flexible, programmable switch hardware–are now being brought together in the creation of IPUs and DPUs.

We can also view the development of IPUs/DPUs as a continuation of the trend in which processors are both highly flexible and yet specialized for certain tasks. GPUs and TPUs are really flexible, being used for everything from crypto-mining to machine learning to graphics processing, but are nevertheless quite specialized compared to CPUs. (GPUs were even used for packet processing in an era before we had PISA and P4.)

DPUs and IPUs now seem well established as a new category of highly programmable devices that are optimized for a specific set of tasks that need to be performed in a modern cloud data center. With that greater specialization comes greater efficiency, while flexibility remains high enough to support future innovation. ®

Larry Peterson and Bruce Davie are the authors of Computer Networks: A Systems Approach and the related Systems Approach series of books. All their content is open source and available on GitHub. You can find them on Twitter, their writings on Substack, and past The Register columns here.

Similar topics

Broader topics


Other stories you might like

  • India reveals home-grown server that won't worry the leading edge

    And a National Blockchain Strategy that calls for gov to host BaaS

    India's government has revealed a home-grown server design that is unlikely to threaten the pacesetters of high tech, but (it hopes) will attract domestic buyers and manufacturers and help to kickstart the nation's hardware industry.

    The "Rudra" design is a two-socket server that can run Intel's Cascade Lake Xeons. The machines are offered in 1U or 2U form factors, each at half-width. A pair of GPUs can be equipped, as can DDR4 RAM.

    Cascade Lake emerged in 2019 and has since been superseded by the Ice Lake architecture launched in April 2021. Indian authorities know Rudra is off the pace, and said a new design capable of supporting four GPUs is already in the works with a reveal planned for June 2022.

    Continue reading
  • Prisons transcribe private phone calls with inmates using speech-to-text AI

    Plus: A drug designed by machine learning algorithms to treat liver disease reaches human clinical trials and more

    In brief Prisons around the US are installing AI speech-to-text models to automatically transcribe conversations with inmates during their phone calls.

    A series of contracts and emails from eight different states revealed how Verus, an AI application developed by LEO Technologies and based on a speech-to-text system offered by Amazon, was used to eavesdrop on prisoners’ phone calls.

    In a sales pitch, LEO’s CEO James Sexton told officials working for a jail in Cook County, Illinois, that one of its customers in Calhoun County, Alabama, uses the software to protect prisons from getting sued, according to an investigation by the Thomson Reuters Foundation.

    Continue reading
  • Battlefield 2042: Please don't be the death knell of the franchise, please don't be the death knell of the franchise

    Another terrible launch, but DICE is already working on improvements

    The RPG Greetings, traveller, and welcome back to The Register Plays Games, our monthly gaming column. Since the last edition on New World, we hit level cap and the "endgame". Around this time, item duping exploits became rife and every attempt Amazon Games made to fix it just broke something else. The post-level 60 "watermark" system for gear drops is also infuriating and tedious, but not something we were able to address in the column. So bear these things in mind if you were ever tempted. On that note, it's time to look at another newly released shit show – Battlefield 2042.

    I wanted to love Battlefield 2042, I really did. After the bum note of the first-person shooter (FPS) franchise's return to Second World War theatres with Battlefield V (2018), I stupidly assumed the next entry from EA-owned Swedish developer DICE would be a return to form. I was wrong.

    The multiplayer military FPS market is dominated by two forces: Activision's Call of Duty (COD) series and EA's Battlefield. Fans of each franchise are loyal to the point of zealotry with little crossover between player bases. Here's where I stand: COD jumped the shark with Modern Warfare 2 in 2009. It's flip-flopped from WW2 to present-day combat and back again, tried sci-fi, and even the Battle Royale trend with the free-to-play Call of Duty: Warzone (2020), which has been thoroughly ruined by hackers and developer inaction.

    Continue reading

Biting the hand that feeds IT © 1998–2021