Nvidia brings liquid cooling to A100 PCIe GPU cards for ‘greener’ datacenters
For those who want to give their racks an air cut
Computex Nvidia's GPUs are becoming increasingly power hungry, so the US giant is hoping to make datacenters using them "greener" with liquid-cooled PCIe cards containing its highest-performing chips.
At this year's Computex event in Taiwan, the computer graphics goliath revealed it will sell a liquid-cooled PCIe card for its flagship server GPU, the A100, in the third quarter of this year. Then in early 2023, the company plans to release a liquid-cooled PCIe card for the A100's recently announced successor, the Hopper-powered H100.
Nvidia's A100 has been available for liquid-cooled servers before now, but only in the GPU's SXM form factor, which slots into the company's HGX server board.
With the new liquid-cooled PCIe form factor, Nvidia is making fluid-cooled GPU servers more widely available. Over a dozen server makers are expected to support the liquid-cooled A100 PCIe card later this year, including ASUS, Gigabyte, Inspur, and Supermicro.
- Nvidia reveals specs of latest GPU: The Hopper-based H100
- Nvidia open-sources Linux kernel GPU modules. Repeat, open-source GPU modules
- How Nvidia is overcoming slowdown issues in GPU clusters
- Intel plans immersion lab to chill its power-hungry chips
The upcoming PCIe cards will use direct-to-chip liquid cooling and, as a result, will take up only one PCIe slot in a server, versus the two slots required by the air-cooled versions.
In a briefing with journalists, Paresh Kharya, Nvidia's director of datacenter computing, claimed that these factors will allow datacenters with liquid-cooled A100 PCIe cards to provide the same level of performance as datacenters with air-cooled A100 cards while consuming up to 30 percent less power and using 66 percent fewer racks, based on recent tests Nvidia conducted with datacenter giant Equinix.
He said the liquid-cooled PCIe cards will also help datacenters improve their power usage effectiveness (PUE). PUE is a key industry metric that gauges a datacenter's efficiency by dividing the total energy entering the facility by the energy consumed by the IT equipment alone; the closer the ratio is to 1.0, the less energy is lost to overheads such as cooling.
In the case of the liquid-cooled A100 PCIe card, Nvidia estimated it would lower a datacenter's PUE to 1.15, "far below" the 1.6 made possible by the air-cooled version. Kharya said the efficiency gains come from a combination of the liquid cooling itself, which requires less energy than air cooling, and effects that play out at the scale of the whole datacenter.
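Those PUE figures can be sanity-checked with a little arithmetic. The sketch below uses the standard PUE definition; the helper names are illustrative, not anything from Nvidia or Equinix:

```python
# Sanity-checking the quoted PUE figures, assuming the standard definition:
# PUE = total facility energy / IT equipment energy.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: all energy drawn by the facility
    divided by the energy consumed by the IT equipment alone."""
    return total_facility_kwh / it_equipment_kwh

def overhead_fraction(pue_ratio: float) -> float:
    """Fraction of facility energy spent on non-IT loads (cooling, power
    distribution losses, and so on)."""
    return (pue_ratio - 1) / pue_ratio

# A facility drawing 1,150 kWh to run 1,000 kWh of IT load has a PUE of 1.15.
print(pue(1150, 1000))                   # 1.15

# At PUE 1.6, 37.5% of facility energy goes to overhead; at 1.15, about 13%.
print(f"{overhead_fraction(1.6):.3f}")   # 0.375
print(f"{overhead_fraction(1.15):.3f}")  # 0.130
```

In other words, dropping from 1.6 to 1.15 roughly cuts the non-IT share of a facility's energy bill from over a third to around an eighth, which is where the claimed savings come from.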
The GPUs inside these liquid-cooled cards will have the same specs as those in the air-cooled cards. This means they will not run faster, and, for instance, the liquid-cooled A100 PCIe card will still have a maximum thermal design power of 300 watts. At the moment, Nvidia is only putting out a liquid-cooled A100 PCIe card with an 80GB memory capacity, with no plans for a 40GB version.
Kharya said Nvidia decided to release these products in recognition of the fact that mainstream datacenter operators are looking at liquid cooling as a way to cut down on energy consumption.
It's not a surprise, given how power-hungry GPUs and other components have become over the past several years, which is also why Intel recently announced plans to build a $700 million "mega lab" to try out new liquid and immersion cooling technologies. ®