SmartNICs, IPUs, DPUs de-hyped: Why and how cloud giants are offloading work from server CPUs

Where this technology grew from, and what it offers you


Systems Approach The recent announcements from Intel about Infrastructure Processing Units (IPUs) have prompted us to revisit the topic of how functionality is partitioned in a computing system.

As we noted in an earlier piece, The Accidental SmartNIC, there is at least thirty years’ history of trying to decide how much one should offload from a general purpose CPU to a more specialized NIC, and an equally long tussle between more highly specialized offload engines versus more general-purpose ones.

The IPU represents just the latest entry in a long series of general-purpose offload engines, and we’re now seeing quite a diverse set of options, not just from Intel but from others such as Nvidia and Pensando. These latter firms use the term DPU (data processing unit) but the consensus seems to be that these devices tackle the same class of problems.

There are several interesting things going on here. The first is that there is an emerging consensus that the general purpose x86 (or Arm) server is no longer the best place to run the infrastructure functions of a cloud. By “infrastructure functions” we mean all the things that it takes to run a multi-tenant cloud that are not actually guest workloads: the hypervisor, network virtualization, storage services and so on.

Whereas the server used to be the home of both guest workloads and infrastructure services, these functions are increasingly viewed as “overhead” that is only taking cycles away from guests. One oft-cited paper is Facebook’s Accelerometer study, which measures overhead within Facebook’s data centers as high as 80 per cent, although this may not be generalizable to cloud providers. More plausibly, Google reported [PDF] in 2015:

'Datacenter tax' can comprise nearly 30 per cent of cycles [...], which makes its constituents prime candidates for hardware specialization in future server systems-on-chips.

Amazon Web Services presumably saw the same issue of overheads cutting into the revenue-generating workloads, and started to use specialized hardware for infrastructure services when it acquired Annapurna Labs in 2015, laying the groundwork for its Nitro architecture. The impact was to move almost all infrastructure services out of the servers, leaving them free to run guest workloads and little else.

Once you decide to move a function out of the general-purpose CPU complex into some sort of offload engine, the question is how to retain the appropriate level of flexibility. These offloaded functions are not static, so putting them into fixed-function hardware would be a short-sighted move.

This is why we have seen NICs move in recent years from fixed-function offloads such as TCP segmentation to the more flexible architecture of SmartNICs. So the goal is to build an offload system that is more optimized for the offloaded services than a general-purpose CPU, yet still programmable enough to support innovation and evolution of offloaded services.

The goal is to build an offload system that is more optimized for the offloaded services than a general-purpose CPU, yet still programmable enough to support innovation and evolution of offloaded services

Intel’s IPU family contains several entrants, which take different approaches to delivering that flexibility, including both FPGA- and ASIC-based versions. The Mount Evans ASIC is particularly interesting as it includes both Arm CPU cores and programmable networking hardware (from the Barefoot Networks team) that is P4-programmable.

This is a subject dear to our hearts here at Systems Approach, as the P4 toolchain is central to much of the technology that we wrote about in our SDN book.

Putting a P4-programmable switch in an IPU/DPU makes lots of sense, since the networking functions that are likely to be offloaded include those of a virtual switch. And one thing we learned at Nicira and later in the NSX team at VMware was that if you want to move the vswitch to an offload engine, that engine needs to be fully programmable.

If a NIC is insufficiently general to implement the whole vswitch, you can only move some subset of the vswitch functionality to the offload engine. Even if you could move 90 per cent of the functionality, that remaining 10 per cent that you have to keep doing in the CPU is likely to be a bottleneck.

So a P4-programmable offload engine based on the Protocol Independent Switching Architecture (PISA) provides the required level of flexibility and programmability to make offloading of the whole vswitch possible. Combine this with some other programmable hardware (such as Arm cores) and you can see how the entire set of infrastructure functions, including the hypervisor, storage virtualization, etc., can be offloaded to the IPU.

A network virtualization system overlaid on physical servers and switches

The virtual switch of a network virtualization system is a candidate for offloading

One way to view the latest generation of DPUs/IPUs is that the efforts of the SDN movement to create more programmable switches has enabled innovation in a new space.

SDN initially promised to drive control plane innovation by decoupling the switching hardware from the software that controlled it. Network virtualization was one of the first applications of SDN to take off, with the separation of control and data planes and highly flexible software switches enabling networks to be created entirely in software (on top of a hardware-based underlay).

PISA and P4 led to a more flexible form of switching hardware and a new way to define the hardware-software interface (improving on the earlier efforts of OpenFlow). All of these threads–control plane innovation, network virtualization, and flexible, programmable switch hardware–are now being brought together in the creation of IPUs and DPUs.

We can also view the development of IPUs/DPUs as a continuation of the trend in which processors are both highly flexible and yet specialized for certain tasks. GPUs and TPUs are really flexible, being used for everything from crypto-mining to machine learning to graphics processing, but are nevertheless quite specialized compared to CPUs. (GPUs were even used for packet processing in an era before we had PISA and P4.)

DPUs and IPUs now seem well established as a new category of highly programmable devices that are optimized for a specific set of tasks that need to be performed in a modern cloud data center. With that greater specialization comes greater efficiency, while flexibility remains high enough to support future innovation. ®

Larry Peterson and Bruce Davie are the authors of Computer Networks: A Systems Approach and the related Systems Approach series of books. All their content is open source and available on GitHub. You can find them on Twitter, their writings on Substack, and past The Register columns here.

Broader topics


Other stories you might like

  • Lonestar plans to put datacenters in the Moon's lava tubes
    How? Founder tells The Register 'Robots… lots of robots'

    Imagine a future where racks of computer servers hum quietly in darkness below the surface of the Moon.

    Here is where some of the most important data is stored, to be left untouched for as long as can be. The idea sounds like something from science-fiction, but one startup that recently emerged from stealth is trying to turn it into a reality. Lonestar Data Holdings has a unique mission unlike any other cloud provider: to build datacenters on the Moon backing up the world's data.

    "It's inconceivable to me that we are keeping our most precious assets, our knowledge and our data, on Earth, where we're setting off bombs and burning things," Christopher Stott, founder and CEO of Lonestar, told The Register. "We need to put our assets in place off our planet, where we can keep it safe."

    Continue reading
  • Conti: Russian-backed rulers of Costa Rican hacktocracy?
    Also, Chinese IT admin jailed for deleting database, and the NSA promises no more backdoors

    In brief The notorious Russian-aligned Conti ransomware gang has upped the ante in its attack against Costa Rica, threatening to overthrow the government if it doesn't pay a $20 million ransom. 

    Costa Rican president Rodrigo Chaves said that the country is effectively at war with the gang, who in April infiltrated the government's computer systems, gaining a foothold in 27 agencies at various government levels. The US State Department has offered a $15 million reward leading to the capture of Conti's leaders, who it said have made more than $150 million from 1,000+ victims.

    Conti claimed this week that it has insiders in the Costa Rican government, the AP reported, warning that "We are determined to overthrow the government by means of a cyber attack, we have already shown you all the strength and power, you have introduced an emergency." 

    Continue reading
  • China-linked Twisted Panda caught spying on Russian defense R&D
    Because Beijing isn't above covert ops to accomplish its five-year goals

    Chinese cyberspies targeted two Russian defense institutes and possibly another research facility in Belarus, according to Check Point Research.

    The new campaign, dubbed Twisted Panda, is part of a larger, state-sponsored espionage operation that has been ongoing for several months, if not nearly a year, according to the security shop.

    In a technical analysis, the researchers detail the various malicious stages and payloads of the campaign that used sanctions-related phishing emails to attack Russian entities, which are part of the state-owned defense conglomerate Rostec Corporation.

    Continue reading
  • FTC signals crackdown on ed-tech harvesting kid's data
    Trade watchdog, and President, reminds that COPPA can ban ya

    The US Federal Trade Commission on Thursday said it intends to take action against educational technology companies that unlawfully collect data from children using online educational services.

    In a policy statement, the agency said, "Children should not have to needlessly hand over their data and forfeit their privacy in order to do their schoolwork or participate in remote learning, especially given the wide and increasing adoption of ed tech tools."

    The agency says it will scrutinize educational service providers to ensure that they are meeting their legal obligations under COPPA, the Children's Online Privacy Protection Act.

    Continue reading
  • Mysterious firm seeks to buy majority stake in Arm China
    Chinese joint venture's ousted CEO tries to hang on - who will get control?

    The saga surrounding Arm's joint venture in China just took another intriguing turn: a mysterious firm named Lotcap Group claims it has signed a letter of intent to buy a 51 percent stake in Arm China from existing investors in the country.

    In a Chinese-language press release posted Wednesday, Lotcap said it has formed a subsidiary, Lotcap Fund, to buy a majority stake in the joint venture. However, reporting by one newspaper suggested that the investment firm still needs the approval of one significant investor to gain 51 percent control of Arm China.

    The development comes a couple of weeks after Arm China said that its former CEO, Allen Wu, was refusing once again to step down from his position, despite the company's board voting in late April to replace Wu with two co-chief executives. SoftBank Group, which owns 49 percent of the Chinese venture, has been trying to unentangle Arm China from Wu as the Japanese tech investment giant plans for an initial public offering of the British parent company.

    Continue reading
  • SmartNICs power the cloud, are enterprise datacenters next?
    High pricing, lack of software make smartNICs a tough sell, despite offload potential

    SmartNICs have the potential to accelerate enterprise workloads, but don't expect to see them bring hyperscale-class efficiency to most datacenters anytime soon, ZK Research's Zeus Kerravala told The Register.

    SmartNICs are widely deployed in cloud and hyperscale datacenters as a means to offload input/output (I/O) intensive network, security, and storage operations from the CPU, freeing it up to run revenue generating tenant workloads. Some more advanced chips even offload the hypervisor to further separate the infrastructure management layer from the rest of the server.

    Despite relative success in the cloud and a flurry of innovation from the still-limited vendor SmartNIC ecosystem, including Mellanox (Nvidia), Intel, Marvell, and Xilinx (AMD), Kerravala argues that the use cases for enterprise datacenters are unlikely to resemble those of the major hyperscalers, at least in the near term.

    Continue reading

Biting the hand that feeds IT © 1998–2022