Digital Realty: We hear you like your racks dense, how does 70kW sound?
What else do you do when a single GPU node needs 10kW?
Digital Realty has announced support for high-density deployments of up to 70 kilowatts per rack in a bid to capitalize on growing demand for AI and high-performance computing (HPC) workloads.
While high-density deployments are not new among colo providers, they're far from the norm. A typical datacenter rack might support deployments of between 6 and 10kW. However, for customers that need more power, bit barns like Digital Realty have been able to support densities of up to 90kW, CTO Chris Sharp told our sister site The Next Platform last month.
What Digital Realty aims to do, judging from Monday's announcement, is make dense rack deployments more accessible by standardizing the configurations around specific workloads, including power-hungry generative AI training and inference.
These high-density services support configurations of up to 70kW and are now available in 28 markets across most global regions. While that might sound like a lot of wattage to go around, it works out to between five and six top-spec GPU nodes per rack, with room to spare for storage and high-speed networking.
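For those keeping score at home, the back-of-the-envelope math looks something like this. The 10.2kW node figure matches the DGX H100 rating mentioned further down; the allowance for switches and storage is purely our assumption.

```python
# Rough rack-packing math: how many ~10kW GPU nodes fit in a 70kW budget?
# The 10.2kW node draw matches Nvidia's DGX H100 rating; the overhead
# allowance for networking and storage is an assumption, not a vendor figure.

RACK_BUDGET_KW = 70.0
NODE_DRAW_KW = 10.2      # top-spec GPU node, per Nvidia's DGX H100 rating
OVERHEAD_KW = 8.0        # assumed allowance for switches, storage, and headroom

usable_kw = RACK_BUDGET_KW - OVERHEAD_KW
nodes = int(usable_kw // NODE_DRAW_KW)
leftover_kw = usable_kw - nodes * NODE_DRAW_KW
print(f"{nodes} GPU nodes per rack, {leftover_kw:.1f} kW to spare")
# -> 6 nodes on these assumptions; budget more generously for storage and
#    networking and you land at five.
```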
Of course, all of that power ultimately gets converted into heat. To contend with this, the colo is using "air-assisted liquid cooling." We've approached Digital Realty for more detail on what technologies this encompasses, but based on previous chats with the company, we can guess they're talking about rear-door heat exchangers (RDHx).
The kind of RDHx tech Digital Realty deploys can be thought of as a big radiator bolted onto the back of a rack. As coolant flows through the radiator, fed either from the facility water system or a closed loop housed in the aisle, it pulls heat out of the air exhausted from the servers. RDHx units have proven popular among colocation providers because they allow for greater rack power and thermal densities while avoiding the litany of conflicting standards that come with adopting direct liquid or immersion cooling tech.
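For a rough sense of what a rear-door loop has to shift, the textbook Q = m·c·ΔT sizing works out like this. The heat load, coolant, and temperature rise here are our assumptions, not Digital Realty's specs.

```python
# Ballpark coolant flow needed for a rear-door heat exchanger to carry away
# a 70kW rack, using Q = m_dot * c_p * delta_T. All inputs are assumptions
# for illustration, not figures from Digital Realty.

HEAT_LOAD_KW = 70.0   # rack heat to reject
CP_WATER = 4.186      # specific heat of water, kJ/(kg*K)
DELTA_T_K = 10.0      # assumed temperature rise across the door coil

m_dot_kg_s = HEAT_LOAD_KW / (CP_WATER * DELTA_T_K)
litres_per_min = m_dot_kg_s * 60        # roughly 1 litre per kg for water
print(f"{m_dot_kg_s:.2f} kg/s, roughly {litres_per_min:.0f} L/min of water")
# -> about 1.7 kg/s, or 100-odd litres per minute through the rear door
```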
More powerful racks for hotter, denser chips
As we know, chips in general are getting hotter in a hurry — a problem that was highlighted by Uptime Institute earlier this year.
With the launch of Intel and AMD's fourth-gen processors, the chipmakers' most powerful CPUs are now capable of sucking down 350W and 400W respectively. Next-gen parts from both chipmakers are expected to use even more power. AMD's Instinct MI300A APU, for example, will be rated at 850W when it launches later this year.
"If you look at the silicon roadmaps from Intel, Nvidia, and AMD, these things are space heaters," Zachary Smith, Equinix's global head of edge infrastructure services quipped in an interview with The Register last fall.
- The all liquid-cooled colo facility rush has begun
- Equinix would offer more liquid cooling but struggles without standards
- Intel, AMD just created a headache for datacenters
- Another thing you can blame AI for: Cloud grows but server shipments are down
In exchange for this higher power consumption, chipmakers have been able to cram in more, faster cores. Ideally, this means customers can get by with fewer servers. As AMD claimed during the Genoa launch last fall, the higher core counts afforded by its chips meant customers could condense 15 top-spec Intel Ice Lake systems into just five of its own. But as Uptime noted, few legacy datacenters are really equipped to contend with systems this dense. And unless rack capacities are increased, valuable cabinet space could go to waste.
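To see how that consolidation arithmetic shakes out, here's a crude sketch built around the flagship parts' core counts. The per-core speedup figure is our own placeholder rather than AMD's benchmark methodology, so treat the output as illustrative.

```python
import math

# Illustrative take on AMD's 15-to-5 Genoa consolidation claim. Core counts
# match the flagship SKUs; the per-core uplift is an assumed placeholder.

ICE_LAKE_CORES = 2 * 40     # dual-socket 40-core Xeon Platinum 8380
GENOA_CORES = 2 * 96        # dual-socket 96-core EPYC 9654
PER_CORE_UPLIFT = 1.3       # assumed Zen 4 vs Ice Lake per-core gain

old_fleet = 15
work = old_fleet * ICE_LAKE_CORES                   # work in Ice Lake core-units
new_fleet = math.ceil(work / (GENOA_CORES * PER_CORE_UPLIFT))
print(f"{old_fleet} Ice Lake servers -> {new_fleet} Genoa servers")
# -> 5 servers: fewer boxes, but each one draws far more power, which is
#    why rack budgets have to climb before anyone can bank the savings.
```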
High density is hot
Digital Realty is not the only colocation provider growing wise to changing power and thermal requirements. In a blog post Monday, Equinix highlighted many of the challenges associated with supporting large-scale AI adoption in its datacenters.
"Generative AI training workloads can consume multiple megawatts of power, thus datacenters have to provide power circuits that can carry more power to the racks," the blog post reads. "Since generative AI GPU racks can consume more than 30kW of power per rack, traditional air cooling isn't efficient enough."
Nvidia's DGX H100 systems, for example, are rated for 10.2kW apiece and are designed to be deployed four to a rack to take advantage of the high-bandwidth fabric used to mesh the GPUs together.
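Do the multiplication and it's clear why Equinix's 30kW figure gets blown past in a hurry. The per-system rating is Nvidia's; everything else here is simple arithmetic.

```python
# Why four DGX H100s overrun a ~30kW air-cooled rack: straight multiplication.

DGX_H100_KW = 10.2           # Nvidia's rating per system
SYSTEMS_PER_RACK = 4         # layout designed to keep the GPU fabric together
AIR_COOLING_CEILING_KW = 30  # the rough limit Equinix cites for air cooling

rack_kw = DGX_H100_KW * SYSTEMS_PER_RACK
print(f"{rack_kw:.1f} kW per rack vs a ~{AIR_COOLING_CEILING_KW} kW air-cooled ceiling")
# -> 40.8 kW before you add switches, storage, or any headroom at all.
```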
In an interview with The Register this spring, Cyxtera Field CTO Holland Barry said the colocation provider was seeing increased demand for 30-34 kilowatt racks to accommodate customers' HPC workloads. In extreme cases, Barry said, the company has seen requests for as much as 100kW to a single rack.
Other colocation providers, meanwhile, have gone all in on HPC and AI, building facilities specifically tailored to this use case. One example is Colovore, which announced a 9 megawatt datacenter in Santa Clara, California, that would support rack densities of up to 250kW using direct liquid cooling when it comes online next year. ®