To quench AI's thirst, the way we build and operate datacenters needs to change

AI infrastructure isn't just hot and power hungry, it uses a prodigious amount of water, researchers tell El Reg

Generative models like GPT-4, Midjourney, and others have many understandably concerned about the technology's potential to disrupt society, but it's quenching AI's seemingly insatiable thirst that has the attention of researchers at the University of California, Riverside, and the University of Texas at Arlington.

In a paper released [PDF] to the public last month, researchers estimated that training a GPT-3-scale large language model would use roughly 700,000 liters of water, the equivalent of producing 320 Tesla EVs.

"ChatGPT needs to 'drink' a 500ml bottle of water for a simple conversation of roughly 20-50 questions and answers, depending on when and where ChatGPT is deployed," the researchers wrote.

And that's with today's models and technology. Researchers anticipate that as AI becomes more prevalent, water consumption will skyrocket, unless steps are taken to address datacenter water footprints.

That's bad news, particularly in the US. Climate models developed at the Department of Energy's Argonne National Laboratory project that by the middle of the century, just 27 years from now, large portions of the country will be in a state of persistent drought.

Dissecting AI water consumption

As you might expect, determining the water consumption of something as complex as an AI datacenter isn't exactly a straightforward process. To start, researchers need to know the facility's water usage effectiveness and power usage effectiveness (WUE and PUE), where it gets its power, how it is cooled, the weather conditions, the time of day, the amount of power required to train the model, and a host of other metrics. The problem: this isn't exactly the kind of information hyperscale and cloud providers like to reveal.

"These companies, for some reason, release very rough information about their water usage — either the average water usage effectiveness or the actual total water consumption, but not both, and those numbers are usually annualized," Shaolei Ren, associate professor of electrical and computer engineering at UC Riverside and the primary investigator on the paper told The Register.

As a result, researchers were forced to make a fair number of assumptions about the nature of the facilities used to train these models.

Even with those assumptions in play, Ren claims his team was able to adapt a model developed by SPX Cooling Technologies to estimate the amount of water used by different facilities at different times, taking weather conditions into account, using the best available data. From this, the researchers say they were able to ballpark the water requirements of various AI models, including Google's LaMDA and OpenAI's GPT-3, with what they believe is a reasonable degree of accuracy.
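To give a sense of the arithmetic involved, estimates of this sort generally combine on-site water evaporated by cooling towers with off-site water consumed in generating the electricity, scaled by WUE, PUE, and the grid's water intensity. The minimal sketch below follows that general structure; the function name and every number plugged in are illustrative placeholders, not figures from the paper.

```python
# A rough sketch of the estimate's structure: on-site water evaporated in
# cooling towers plus off-site water consumed generating the electricity.
# All numbers below are illustrative placeholders, not the paper's figures.

def training_water_litres(it_energy_kwh: float,
                          pue: float,
                          onsite_wue: float,
                          grid_ewif: float) -> tuple[float, float]:
    """Return (on-site, off-site) water estimates in litres.

    it_energy_kwh : energy drawn by the servers themselves
    pue           : power usage effectiveness (facility kWh per IT kWh)
    onsite_wue    : on-site water usage effectiveness (litres per IT kWh)
    grid_ewif     : litres of water embedded in each kWh of grid electricity
    """
    onsite = it_energy_kwh * onsite_wue           # evaporated by the cooling towers
    offsite = it_energy_kwh * pue * grid_ewif     # consumed at the power plants
    return onsite, offsite

# Hypothetical inputs: 1.3 GWh of IT energy, PUE 1.1, WUE 0.55 L/kWh, EWIF 3.1 L/kWh
onsite, offsite = training_water_litres(1_300_000, 1.1, 0.55, 3.1)
print(f"on-site: {onsite:,.0f} L, off-site: {offsite:,.0f} L")
```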

Does AI have to be so thirsty?

The paper paints a stark picture of AI's thirst for fresh water. However, Dell'Oro Group analyst Lucas Beran emphasizes that it's not the models themselves that use the water, it's the thermal management systems used to keep them cool. "This isn't an AI problem; this is a thermal management problem," he said.

Depending on how and where the datacenter is built and whether the servers are air or liquid cooled, the amount of water used can vary wildly.

In the paper, the researchers focused on cooling towers, a form of evaporative cooler, because they are "the most common cooling solution for warehouse-style datacenters." As water evaporates, it pulls heat out of the air. This makes evaporative cooling popular among datacenter operators: it tends to use less power than alternative technologies, and in many climates the towers only need to run during the hottest months of the year.

That's not to say evaporative cooling is the only option. Last month, Microsoft committed to using "zero water" cooling infrastructure at two planned datacenters on its campus in Goodyear, Arizona, near Phoenix. However, as we reported, while these facilities may use less water on site, they may end up drawing more power from the grid as a result.

Another option for reducing on-site water consumption is to adopt more efficient cooling tech at a system level. Today, many GPU nodes used in AI training can pull in excess of 10kW from the wall, pushing the limits of an air-cooled system.

"Investing in AI just naturally pushes you toward next-generation thermal management technologies — largely referencing direct liquid cooling and immersion cooling — to get the most performance out of your hardware, as well as to do so in the most economical fashion," Beran said.

While it may sound counterintuitive, liquid and immersion cooling tech can actually reduce the amount of water used by datacenters by removing heat more efficiently, he explained.

For reference, Submer and LiquidStack, two immersion-cooling vendors, often tout PUE ratings of less than 1.05, making them far more efficient than typical air-cooled datacenters, which usually come in at 1.4-1.5.
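To put that gap in concrete terms: PUE is the ratio of total facility energy to IT energy, so everything above 1.0 is cooling and other overhead. A quick worked comparison, using the ratings quoted above and a hypothetical 1 GWh annual IT load:

```python
# What a PUE difference means in practice: PUE is total facility energy
# divided by IT energy, so anything above 1.0 is cooling and other overhead.
# The PUE figures are the ratings quoted above; the IT load is hypothetical.

it_energy_kwh = 1_000_000  # hypothetical annual IT load: 1 GWh

for label, pue in [("immersion-cooled", 1.05), ("typical air-cooled", 1.45)]:
    facility = it_energy_kwh * pue
    overhead = facility - it_energy_kwh
    print(f"{label}: PUE {pue} -> {overhead:,.0f} kWh of non-IT overhead per year")

# immersion-cooled: PUE 1.05 -> 50,000 kWh of non-IT overhead per year
# typical air-cooled: PUE 1.45 -> 450,000 kWh of non-IT overhead per year
```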

When and where matter

In addition to architecting datacenters in a way that minimizes water use, the paper offers some recommendations for cutting AI's water footprint.

The first is rather simple: don't train AI models in hot climates. According to the researchers, training GPT-3 in Asian datacenters would require three times as much water as in the US. Instead, they suggest scheduling AI training workloads at facilities in cooler climates, particularly those that can take advantage of free cooling. For smaller workloads, they suggest scheduling tasks to run in the evening, when temperatures tend to be lower and less water is lost to evaporation.

The researchers point to efforts by Apple and Microsoft to use scheduling to reduce the carbon footprint of device charging and updates as just one example.
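In scheduling terms the idea is straightforward: given a forecast of each candidate site's water usage effectiveness over time, run the job where and when the least water would evaporate. The sketch below illustrates that selection rule; the site names, hours, and WUE figures are invented for the example rather than taken from the paper.

```python
# Water-aware scheduling, a minimal sketch: given a forecast of each
# candidate site's water usage effectiveness (WUE) per hour, run the job
# where and when the least water would evaporate. Sites and numbers are
# made up for illustration.

hourly_wue_forecast = {
    # site: {hour of day: litres of water per kWh of IT energy}
    "cool-climate-dc": {12: 0.2, 22: 0.1},
    "hot-climate-dc":  {12: 1.8, 22: 0.9},
}

def pick_slot(forecast: dict) -> tuple[str, int]:
    """Return the (site, hour) pair with the lowest forecast WUE."""
    return min(
        ((site, hour) for site, hours in forecast.items() for hour in hours),
        key=lambda slot: forecast[slot[0]][slot[1]],
    )

site, hour = pick_slot(hourly_wue_forecast)
print(f"Schedule the training job at {site}, starting at {hour}:00")
# -> Schedule the training job at cool-climate-dc, starting at 22:00
```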

Evening scheduling is a bit of a paradox, though, as the times of day that are most efficient for water consumption also tend to be worse in terms of carbon emissions: you can't use solar if the sun isn't shining. To get around this, Ren and his team suggest taking advantage of datacenters' battery backup systems to effectively time-shift renewable energy from sources like solar into the evening hours.

This also assumes the datacenter has a battery backup. Many existing datacenters rely instead on diesel generators or natural gas fuel cells.

A lack of transparency is holding back research

To better understand AI's impact on datacenter water consumption, more transparency is needed, the researchers note.

"It is crucial to have better visibility into the runtime WUE and increase transparency by keeping the AI model developers as well as end users informed," they wrote.

In this regard, Ren and his team could be in luck. As we reported last month, the European Commission is currently in the process of reviewing revisions to the Energy Efficiency Directive, which among other things would institute reporting requirements for all but the smallest datacenters.

While Beran remains cautious about the potential for regulation to inhibit datacenter operators, he agrees that better transparency is essential to optimizing these facilities, and a reporting requirement would at least level the playing field.

"The data center industry is just so secretive that sometimes it's hard to get the appropriate data to build models like this," he said. "I'd like to see the large cloud and colocation hyperscalers lead the way without needing to be regulated. But, I do feel that they are starting to run out of time to do that themselves before regulation has to happen." ®
