Nvidia brings liquid cooling to A100 PCIe GPU cards for ‘greener’ datacenters

For those who want to give their racks an air cut


Computex Nvidia's GPUs are becoming increasingly power-hungry, so the US giant is hoping to make datacenters using them "greener" with liquid-cooled PCIe cards containing its highest-performing chips.

At this year's Computex event in Taiwan, the computer graphics goliath revealed it will sell a liquid-cooled PCIe card for its flagship server GPU, the A100, in the third quarter of this year. Then in early 2023, the company plans to release a liquid-cooled PCIe card for the A100's recently announced successor, the Hopper-powered H100.

Nvidia's A100 has been available for liquid-cooled servers, but to date only in the GPU's SXM form factor that goes into the company's HGX server board.

With the new liquid-cooled PCIe form factor, Nvidia is making fluid-cooled GPU servers more widely available. Over a dozen server makers are expected to support the liquid-cooled A100 PCIe card later this year, including ASUS, Gigabyte, Inspur, and Supermicro.

The upcoming PCIe cards will use direct-to-chip liquid cooling, and, because of that, they will only take up one PCIe slot in a server versus the two slots required by the air-cooled versions.

In a briefing with journalists, Paresh Kharya, Nvidia's director of datacenter computing, claimed that these factors will allow datacenters with liquid-cooled A100 PCIe cards to provide the same level of performance as datacenters with air-cooled A100 cards while consuming up to 30 percent less power and using 66 percent fewer racks, based on recent tests Nvidia conducted with datacenter giant Equinix.

He said the liquid-cooled PCIe cards will also help datacenters improve their power usage effectiveness (PUE). PUE is a key industry metric that gauges a datacenter's efficiency by dividing the total energy entering the facility – including cooling and other overheads – by the energy consumed by the IT equipment itself.

In the case of the liquid-cooled A100 PCIe card, Nvidia estimated it would lower a datacenter's PUE ratio to 1.15, "far below" the 1.6 ratio made possible by the air-cooled version. Kharya said the efficiency gains come from a combination of liquid cooling, which requires less energy than air cooling, and the sheer scale of the datacenter.
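Taken at face value, the metric itself is simple division. A minimal sketch in Python – note the 1.6 and 1.15 ratios below are Nvidia's own estimates from the Equinix tests, not independent measurements:

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy divided by IT energy.

    A PUE of 1.0 would mean every watt entering the building reaches
    the IT gear, with nothing spent on cooling or other overheads.
    """
    return total_facility_kwh / it_equipment_kwh

# Per Nvidia's estimates: for every 1 MWh of IT load, an air-cooled hall
# draws about 1.6 MWh in total, while the liquid-cooled setup draws ~1.15 MWh.
print(pue(1.6, 1.0))   # 1.6
print(pue(1.15, 1.0))  # 1.15
```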

The GPUs inside these liquid-cooled cards will have the same specs as those in the air-cooled cards. This means they will not run faster, and, for instance, the liquid-cooled A100 PCIe card will still have a maximum thermal design power of 300 watts. At the moment, Nvidia is only putting out a liquid-cooled A100 PCIe card with an 80GB memory capacity, with no plans for a 40GB version.

Kharya said Nvidia decided to release these products in recognition of the fact that mainstream datacenter operators are looking at liquid cooling as a way to cut down on energy consumption.

It's not a surprise, given how power-hungry GPUs and other components have become over the past several years, which is also why Intel recently announced plans to build a $700 million "mega lab" to try out new liquid and immersion cooling technologies. ®


Other stories you might like

  • PCIe 7.0 pegged to arrive in 2025 with speeds of 512 GBps
    Although PCIe 5.0 is just coming to market, here's what we can expect in the years ahead

    Early details of the specifications for PCIe 7.0 are out, and it's expected to deliver data rates of up to 512 GBps bi-directionally for data-intensive applications such as 800G Ethernet.

    The announcement from the Peripheral Component Interconnect Special Interest Group (PCI-SIG) was made to coincide with its Developers Conference 2022, held at the Santa Clara Convention Center in California this week. It also marks the 30th anniversary of the PCI-SIG itself.

    While the completed specifications for PCIe 6.0 were only released this January, PCIe 7.0 looks to double the bandwidth of the high-speed interconnect yet again from a raw bit rate of 64 GTps to 128 GTps, and bi-directional speeds of up to 512 GBps in a x16 configuration.

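    The doubling works out as straightforward arithmetic. A back-of-the-envelope sketch in Python, using raw rates only and ignoring FLIT framing and FEC overhead:

```python
def pcie_gbps_per_direction(transfer_rate_gtps: float, lanes: int = 16) -> float:
    """Approximate raw PCIe bandwidth in GBps for one direction.

    For PCIe 6.0 and 7.0 the quoted GTps figure is the per-lane raw bit
    rate, so GBps = GTps * lanes / 8. Protocol overheads are ignored.
    """
    return transfer_rate_gtps * lanes / 8

gen6 = pcie_gbps_per_direction(64)    # PCIe 6.0 x16: 128 GBps each way
gen7 = pcie_gbps_per_direction(128)   # PCIe 7.0 x16: 256 GBps each way
print(gen6 * 2, gen7 * 2)             # bi-directional: 256.0 512.0
```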
  • Nvidia wants to lure you to the Arm side with fresh server bait
    GPU giant promises big advancements with Arm-based Grace CPU, says the software is ready

    Interview 2023 is shaping up to become a big year for Arm-based server chips, and a significant part of this drive will come from Nvidia, which appears steadfast in its belief in the future of Arm, even if it can't own the company.

    Several system vendors are expected to push out servers next year that will use Nvidia's new Arm-based chips. These consist of the Grace Superchip, which combines two of Nvidia's Grace CPUs, and the Grace-Hopper Superchip, which brings together one Grace CPU with one Hopper GPU.

    The vendors lining up servers include American companies Dell Technologies, HPE, and Supermicro; Lenovo in Hong Kong; Inspur in China; and ASUS, Foxconn, Gigabyte, and Wiwynn in Taiwan. The servers will target application areas where high performance is key: AI training and inference, high-performance computing, digital twins, and cloud gaming and graphics.

  • AMD touts big datacenter, AI ambitions in CPU-GPU roadmap
    Epyc future ahead, along with Instinct, Ryzen, Radeon and custom chip push

    After taking serious CPU market share from Intel over the last few years, AMD has revealed larger ambitions in AI, datacenters and other areas with an expanded roadmap of CPUs, GPUs and other kinds of chips for the near future.

    These ambitions were laid out at AMD's Financial Analyst Day 2022 event on Thursday, where it signaled intentions to become a tougher competitor for Intel, Nvidia and other chip companies with a renewed focus on building better and faster chips for servers and other devices, becoming a bigger player in AI, enabling applications with improved software, and making more custom silicon.  

    "These are where we think we can win in terms of differentiation," AMD CEO Lisa Su said in opening remarks at the event. "It's about compute technology leadership. It's about expanding datacenter leadership. It's about expanding our AI footprint. It's expanding our software capability. And then it's really bringing together a broader custom solutions effort because we think this is a growth area going forward."

  • Arm jumps on ray tracing bandwagon with beefy GPU design
    British chip designer’s reveal comes months after mobile RT moves by AMD, Imagination

    Arm is beefing up its role in the rapidly evolving (yet long-standing) hardware-based real-time ray tracing arena.

    The company revealed on Tuesday that it will introduce the feature in its new flagship Immortalis-G715 GPU design for smartphones, promising to deliver graphics in mobile games that realistically recreate the way light interacts with objects.

    Arm is promoting the Immortalis-G715 as its best mobile GPU design yet, claiming that it will provide 15 percent faster performance and 15 percent better energy efficiency compared to the currently available Mali-G710.

  • Having trouble finding power supplies or server racks? You're not the only one
    Hyperscalers hog the good stuff

    Power and thermal management equipment essential to building datacenters is in short supply, with delays of months on shipments – a situation that's likely to persist well into 2023, Dell'Oro Group reports.

    The analyst firm's latest datacenter physical infrastructure report – which tracks an array of basic but essential components such as uninterruptible power supplies (UPS), thermal management systems, IT racks, and power distribution units – found that manufacturers' shipments accounted for just one to two percent of datacenter physical infrastructure revenue growth during the first quarter.

    "Unit shipments, for the most part, were flat to low single-digit growth," Dell'Oro analyst Lucas Beran told The Register.

  • Who's growing faster than Nvidia and AMD? Rising datacenter star Marvell
    Of the top 10 fabless chip designers, the Big M soared in Q1 thanks to switch ASICs

    In the world of fabless chip designers, AMD, Nvidia and Qualcomm usually soak up the most attention since their chips are fueling everything from top-end supercomputers to mobile devices.

    This hunger for compute is what has allowed all three companies to grow revenue in the high double digits recently. But one of the world's largest fabless chip designers is growing faster still, and it's far from a household name: Marvell Technology.

    Silicon Valley-based Marvell grew semiconductor revenue by 72 percent to $1.4 billion in the first quarter, which made it the fastest growing out of the top 10 largest fabless chip designers during that period, according to financials compiled by Taiwanese research firm TrendForce.

  • Nvidia taps Intel’s Sapphire Rapids CPU for Hopper-powered DGX H100
    A win against AMD as a much bigger war over AI compute plays out

    Nvidia has chosen Intel's next-generation Xeon Scalable processor, known as Sapphire Rapids, to go inside its upcoming DGX H100 AI system to showcase its flagship H100 GPU.

    Jensen Huang, co-founder and CEO of Nvidia, confirmed the CPU choice during a fireside chat Tuesday at the BofA Securities 2022 Global Technology Conference. Nvidia positions the DGX family as the premier vehicle for its datacenter GPUs, pre-loading the machines with its software and optimizing them to provide the fastest AI performance as individual systems or in large supercomputer clusters.

    Huang's confirmation answers a question we and other observers have had about which next-generation x86 server CPU the new DGX system would use since it was announced in March.


Biting the hand that feeds IT © 1998–2022