Red Hat helps US Department of Energy containerize supercomputing

You might say the US agency needed an OpenShift in mindset


Cloud-native architectures have changed the way applications are deployed, but they remain relatively uncharted territory for high-performance computing (HPC). This week, however, Red Hat and the US Department of Energy are making moves in the area.

The IBM subsidiary – working closely with the Lawrence Berkeley, Lawrence Livermore, and Sandia National Laboratories – aims to develop a new generation of HPC applications designed to run in containers, orchestrated using Kubernetes, and optimized for distributed filesystems.

The work might also make AI/ML workloads easier for enterprises to deploy in the process.

While Kubernetes, containerization, and block storage are all old hat in hyperscale and cloud datacenters, the technologies haven't been deployed on a wide scale in HPC environments. And where they have, they've been highly specialized to suit the workload's unique requirements.

"Our workloads are very different than the cloud. We need to run one very large job, and that gets split into many tens, hundreds, thousands of individual CPUs. It's a one-to-many mapping," Andrew Younge, research and development manager at Sandia National Laboratories, told The Register.

By comparison, cloud providers are primarily concerned with availability and capacity. In other words, they focus on making an application scale to meet rapidly changing usage and traffic patterns.

"With that in mind, we're trying… to use cloud-native technologies in the context of HPC, and that takes some customization," Younge explained.

Containerization isn't exactly new to HPC, but it has often been deployed in specialized runtimes, he added.

"By starting to adopt more standard technologies, that means that we can start to leverage other parts of the ecosystem," Shane Canon, senior engineer at Lawrence Berkeley National Laboratory, told The Register.

"What we want is to be able to run our HPC workloads, but we also want to start to marry that with Kubernetes-style deployments and configurations and execution."

Red Hat tips its hat to containerized HPC

"If you look at containerization in general, we have historically been focused on the application value of containers," Yan Fisher, global evangelist for emerging technologies at Red Hat, told The Register. "This really speaks to more of an infrastructure application."

To address these challenges, the IBM subsidiary is working with each of the labs to integrate cloud-native technologies into, and in support of, HPC workflows.

At Berkeley, Red Hat is working with Canon to enhance Podman, a daemonless container engine similar to Docker, so that it can replace Shifter, the custom container runtime used by the National Energy Research Scientific Computing Center (NERSC).

Similarly, at Sandia, Red Hat is working with Younge's team to explore the deployment of workloads on Kubernetes at scale using its OpenShift platform.

"In terms of Kubernetes, there's a lot of value to having that flexibility. We're traditionally used to HPC representing everything as a job, and that can sometimes be limiting," Younge said. "Representing services as well as jobs in some amalgamation of the two really provides a comprehensive scientific ecosystem."

Meanwhile, at Lawrence Livermore National Laboratory, the software vendor aims to help researchers deploy and manage containerized workloads alongside traditional HPC applications.

All three labs are investigating ways to run these workloads on distributed filesystems rather than the specialized parallel filesystems used today.

The ultimate goal of these endeavors is to make HPC workloads deployable on Kubernetes at "extreme scale" while providing users with well-understood ways of deploying them.

"A lot of this, especially with Podman, is about ensuring that the lessons we've learned in HPC can make it to a wider community," Younge said.

The benefits of this work extend well beyond the realm of science. The ability to easily deploy HPC workloads in containers or on Kubernetes has implications for the wave of enterprises scrambling to deploy large parallel workloads like AI/ML, he added. ®


Biting the hand that feeds IT © 1998–2022