LiquidStack CEO on why you shouldn't ignore immersion cooling
Depending on use case, the efficiency gains can be significant
Interview As chips demand more power than ever – with some parts pushing 700 or even 1,000 watts in the case of some of Nvidia's upcoming silicon – datacenter operators are having to get creative about the way they cool these chips.
One of the technologies gaining popularity in the wake of these trends is immersion cooling, a process by which systems are physically submerged in a bath of specialized coolants.
The Register sat down with LiquidStack CEO Joe Capes to discuss the merits of the technology, where it has seen success, and why a datacenter might opt for immersion cooling over alternatives like direct liquid cooling. We also discussed some of the biggest challenges standing in the way of broader adoption.
The following has been edited for both clarity and brevity.
What factors are driving adoption of immersion cooling tech?
The biggest driver, going back to maybe 2018, is that chip power packages – TDPs – are on a drastic increase. Prior to 2018, we had seen a decade or more of very static TDPs – typically CPUs were kind of hovering around 150 watts per chip.
We're seeing a dramatic increase in TDP driven by machine learning, AI, and other new workloads and applications. And the TDPs that we're now seeing in the market, particularly for GPUs, are already reaching 500-700 watts.
Even at 270 watts TDP, it becomes really, really hard to air-cool. You have to use much larger heat sinks, and the larger fans consume more power; it becomes not only impractical but also inefficient and unsustainable.
Why would someone consider immersion cooling over alternatives like direct liquid cooling or opting for larger air cooled systems?
I've spent two decades of my career in the air cooling space, and what we found is that over the last decade the power usage effectiveness (PUE) we were seeing in typical datacenters was flatlining.
What we're finding right now is single-phase immersion is finding some sweet spots in things like cryptomining and in some high-performance computing (HPC) applications for academia and government, and even some edge applications. However, it isn't finding a home in the traditional white space – especially with hyperscalers – because most of the fluids used for single phase are petrochemical-based, quite messy and greasy, and also flammable.
We see direct-to-chip as being a transitionary technology. It's got a lot of potential really here for the next 10 to 15 years, particularly in existing datacenters, like brownfield sites where you have an existing base load of cooling. You can use direct-to-chip to beef up or top up your base load of air cooling, and you can do so with fairly minimal disruption.
We think two-phase immersion cooling with data tanks is a nice approach because it supports traditional 19-inch and 21-inch OCP v3 form factors.
We see a lot of different pros and cons to direct, single phase, and two phase. They all have their unique advantages, and the industry is growing very quickly in all categories.
How does immersion cooling compare to alternative thermal management systems in terms of PUE?
The last Uptime report that I read showed that PUE from 2022 was hovering at around 1.58 for a traditional datacenter. That compares to a PUE nominally of about 1.02 to 1.05 for a two-phase immersion cooled datacenter. For single phase, we're typically seeing a PUE of around 1.05 to 1.1. For direct-to-chip cooling, we're seeing similar PUEs, if not just a bit higher.
How substantial is a drop from 1.5 to 1.05 PUE?
What that means is for every watt of IT power you're only consuming 0.05 additional watts for cooling, versus half a watt for traditional air cooling systems.
We think that's a huge advantage for liquid cooling systems in general. When you get into the sub-1.1 range you're really splitting hairs, and your focus should shift to your water usage effectiveness.
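The overhead comparison above follows directly from the definition of PUE (total facility power divided by IT power). A minimal sketch of that arithmetic, not from the interview itself:

```python
def cooling_overhead_watts(it_watts: float, pue: float) -> float:
    """Facility overhead implied by a PUE figure.

    PUE = total facility power / IT power,
    so overhead = IT power * (PUE - 1).
    """
    return it_watts * (pue - 1.0)

# Per watt of IT load:
air_cooled = cooling_overhead_watts(1.0, 1.58)  # roughly 0.58 W of overhead
two_phase = cooling_overhead_watts(1.0, 1.05)   # roughly 0.05 W of overhead
```

This is why the jump from ~1.5 to ~1.05 matters: the non-IT overhead per watt of compute drops by more than a factor of ten.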
So can you just plug a standard 19-inch or OCP-compliant system into an immersion tank, or does it require something a bit more exotic?
What we're seeing is kind of a two-pronged approach to immersion cooling in the hardware industry.
One is where you're taking an existing air-cooled server, and essentially hybridizing it to accommodate immersion. You're removing the fans, you're removing the heat sinks. You're basically telling the hardware BIOS that you're no longer air cooled.
The big advantage when you're designing for immersion on day one is that you can lay out the board with the mindset that you don't need a huge bank of fans; you don't need a skyscraper-style heatsink if you're using a high-TDP chip. That allows us to really shrink the form factor of the server.
In some cases, you can reduce the depth of the server by 300-400 millimeters and take a 4U server and reduce it down to a 1U form factor.
From a power perspective, how densely can you pack these systems when using immersion cooling?
We've had data tanks running up to 250 kilowatts in a 48U form factor for almost seven years now. I don't think that the IT hardware has caught up to the cooling technology — usually it's the inverse in our industry.
In the hyperscale space, we're seeing 125-150 kilowatt loads for a 48U form factor. So we have quite a bit of headroom to potentially increase compaction.
When you're doing a brownfield installation, you may be limited by the breaker sizing upstream. That's a factor that has to be considered whenever you're looking to densify your IT hardware and/or increase the TDP of your chips.
What's the return on investment for immersion cooling, and is this something best suited to a greenfield deployment?
The answer will always be site specific. We did a third-party case study with Page Southerland Page out of Austin, Texas.
Page looked at a 36 megawatt example layout based on an air cooled versus a two-phase immersion cooled datacenter, and calculated that the day-one capex savings can be up to $3.5 million per megawatt. We're talking in the range of a 20-30 percent day-one cost reduction.
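Taking those figures at face value, the headline numbers are simple multiplication; the per-megawatt saving is from the study, and the total below is just that figure scaled to the 36 MW example:

```python
site_mw = 36
savings_per_mw = 3.5e6  # up to $3.5M per megawatt day-one capex saving, per the study

# Implied upper bound on day-one savings for the 36 MW example layout
total_savings = site_mw * savings_per_mw  # $126M
```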
What are the biggest challenges facing immersion cooling adoption and how can the industry as a whole come together to address them?
The common thread that we hear is that the industry is really seeking standardization. Certain groups and bodies are doing amazing work right now to move in that direction.
Some good examples would be groups like the Green Grid, OCP, and even the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE). ®