What’s behind Lenovo’s future-proof HPC strategy?

Exascale to Every Scale

Paid Feature

2014 marked the beginning of a meteoric rise for Lenovo in high performance computing (HPC). From zero systems on the Top 500 ranking of the world’s most powerful supercomputers to leading the system share count in 2018, to maintaining that #1 position today, the company’s ascent is nothing short of miraculous.

What is the secret to Lenovo’s fast climb to HPC leadership? “When Lenovo purchased the IBM x86 computing group in 2014, we saw multiple unique elements come together. IBM’s deep technical experiences along with commitments to partnerships,” says Scott Tease, VP of HPC and AI at Lenovo.

“All of this combined with the DNA Lenovo is known for: owning its own supply chain, our advantages of scale, and speed of execution.” He says this came together not only for large supercomputing sites but across the company’s overall data centre business as well, and it is not just about servers: the company has also been on a tear in storage, software, and services.

On the HPC side, Tease understands well what it takes to grow a complex business in commercial, academic, and public sector segments. The IBM x86 business Lenovo acquired was his own team; he had managed the unit at IBM for well over a decade. Just as important, he sees the writing on the wall when it comes to what HPC will need now - and in the coming ten years. It all starts with bringing exascale-driven technologies to every HPC shop.

This “From Exascale to Every Scale” concept means that the high-end technology building blocks the largest banks and national labs use are the exact same ones that single-node or small-cluster customers can deploy. “It is an engineering commitment that allows any customer to easily integrate the same HPC technologies used at the grandest scale. Our goal is to make sure these building blocks are engineered so that no matter how large or small your centre, even if you’re buying a single rack, you have access to the same technology at large installations.”

In addition to focusing on building blocks with the top end technology ready for deployment at any scale, Tease says the other bit of writing on the wall is making all of this capability efficient. This goes beyond mere energy consumption figures. As HPC systems require ever-greater density, they will generate even more heat, so much that traditional air-cooled methods of heat removal will no longer be up to the job.

Part of the future-proofing process for Lenovo to keep its placement as the top provider for HPC systems on the Top 500 is to continue its leadership in novel cooling technologies. At the heart of that effort is the Lenovo Neptune cooling system, which features direct warm water cooling in an ultra-dense enclosure that can satisfy the density demands of HPC without overheating components, which would lead to performance degradation and shutdowns.

High-end building blocks

Some of Lenovo’s forward-looking customers also saw the writing on the wall - in all capital letters, as it turns out. Simon Thompson is one of the hardware architecture masterminds behind compute resources at the University of Birmingham in the UK. He says he anticipated, as did Tease, a need for high-end building blocks as well as an increased need to focus on efficiency. In 2014, however, this meant taking some risks, both with evaluating bringing water into a data centre and later, with having Top 500-class resources that could be cobbled together seamlessly over time.

Like many university HPC centres that must serve a broad, diverse set of research on the same hardware, the University of Birmingham takes a build-as-we-go approach to new hardware acquisitions. Thompson says the advantages of this include being able to adopt new technologies (accelerators, new processors, etc.) as they arrive, avoiding the “ouch” factor financially with a single cluster purchase, and having the flexibility to meet user needs with the right hardware rather than a generalised resource.

With EasyBuild and other tools, he adds, managing these additions is no longer a time-consuming hassle; most of the process is automated. With all of this flexibility in place, the biggest issue becomes keeping performance at its peak and managing power and cooling cost-effectively.

Thompson and team were among the first to look at novel cooling technologies. In 2014 they looked at immersion cooling among other alternatives to air. At the same time, Lenovo began building on the liquid-cooled offerings it inherited from IBM, going beyond the direct water cooled NeXtScale chassis.

In 2018, with the introduction of the Lenovo Neptune branding for all things liquid cooled, Lenovo established itself as an industry leader in liquid cooling, with liquid cooled systems that were as easy to install, operate and service as their air cooled counterparts - something immersion competitors cannot guarantee with their requirements for server draining, anti-slip mats, and specialized handling.

By 2021 Lenovo had released its fifth generation of Lenovo Neptune liquid cooling technology, and Thompson explains this capability is right on point with where HPC is heading: ultra-high density compute with power-hungry CPUs and GPUs, larger memory and advanced networking, all hooking into scalable storage systems.

The university’s latest cluster is based on Intel Xeon “Ice Lake” processors and Nvidia A100 GPUs. All of this compute, enough to rank in the top 200 supercomputers in terms of peak performance, is handled in a mere two racks.

“We are at a point where HPC can no longer deploy systems like this without liquid cooling,” Thompson says.

“What used to be unique is now the norm,” Tease agrees. “It used to be the exception to have 250 or 350W GPUs and 240W CPUs - now they are the norm and on the continual rise. Everyone is going to have to plan how to mitigate that heat and handle power delivery. What we need to be thinking about is how a data centre should work, how we deliver power and cooling, and what that means for people’s ability to keep on-roadmap with new technologies.”

While Thompson’s group in the UK is nowhere near the size and scale of some of Lenovo’s largest supercomputing customers (LRZ in Germany, KMA in South Korea) with its Lenovo Neptune liquid cooled systems, their story proves Tease’s concept of “From Exascale to Every Scale.” The same high-end node- and rack-level technologies that power some of Lenovo’s largest installations are now available to University of Birmingham researchers.

Thompson says they have been able to take on far more applications from users in the last year and, from a data centre perspective, they have saved 40 per cent of the cooling cost an air-cooled facility would have required. Further, the ultra-dense Lenovo Neptune nodes in their n1200 enclosures need far less floor space in the space-constrained centre.

“The higher the power, the higher the heat output, the more interesting liquid cooling becomes,” Tease says. He adds that what began as an experiment has, over the years, turned into a liquid cooling revolution, moving from sites in Europe where power costs are often higher to China, Asia, Australia, and North America in addition to even more European sites.

In addition to the performance advantages of cool systems (they can run in turbo mode more frequently and for longer, avoid failures due to heat, etc.), Tease adds another benefit - one that is less obvious. “That 40 per cent reduction in costs for keeping systems like that cool doesn’t need to go directly to the bottom line. It can be recycled back into other green and environmentally aware data centre projects.”

The University of Birmingham is living proof of Lenovo’s HPC strategy to date: a focus on building technologies that scale no matter the size of the cluster on which they are deployed, that emphasize peak performance and high density, and that enable that density and performance with state-of-the-art cooling.

“At Lenovo, we are proud to have risen from zero in 2014 to being the leading Top500 supercomputer provider in the world,” Tease says. “More important than the number of systems is that we’re putting supercomputers in more countries and in the hands of more researchers than any other provider. We believe putting smarter technology in the hands of researchers around the world will help solve some of humanity’s greatest challenges and will make the future a better place to live.”

Sponsored by Lenovo.

Biting the hand that feeds IT © 1998–2021