Ready for testing: First-ever supercomputer powered by Intel's wildcard AI chips

At the Haba, go, go Habana. The hottest research north of Havana

The San Diego Supercomputer Center (SDSC) says it's ready to run test workloads on its experimental Voyager AI system, which looks to be the first-ever Intel Habana-based supercomputer.

The supercomputer was built in collaboration with Intel's Habana Labs and Supermicro as part of a five-year $11.25 million grant from America's National Science Foundation. And while powerful, Voyager isn't trying to win any benchmark records — it's not supposed to.

Voyager is intended to be a proving ground for AI/ML computing research and development on specialized hardware — in this case, Habana's Goya and Gaudi processors — Voyager Principal Investigator Amit Majumdar told The Register.

Introduced in 2019, Habana Labs' Goya was designed to accelerate AI inference workloads using eight tensor processor cores with support for mixed precision from FP32 down to UINT8. Gaudi, introduced a few months later, was a 350W chip designed with ML training in mind, featuring 32GB of onboard memory operating at a bandwidth of 1TB/s.
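To see why that mixed-precision support matters for inference, here's a back-of-the-envelope sketch (ours, not Habana's) of how much memory a model's weights occupy at each precision — quantizing from FP32 to UINT8 shrinks the footprint fourfold:

```python
# Rough weight-storage arithmetic for a 1-billion-parameter model
# at the precisions Goya supports, from FP32 down to UINT8.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "UINT8": 1}

def weight_memory_gb(n_params: int, precision: str) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

for prec in ("FP32", "FP16", "UINT8"):
    print(f"{prec}: {weight_memory_gb(1_000_000_000, prec):.1f} GB")
# FP32: 4.0 GB, FP16: 2.0 GB, UINT8: 1.0 GB
```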

Intel acquired the chip designer in late 2019 after abandoning its ill-fated Nervana collab with Meta (then Facebook). Sort of a third-time lucky thing for Intel on AI systems.

The Habana AI accelerators are deployed across the 42 Supermicro X12 nodes that make up Voyager. Each X12 system is equipped with a pair of Intel's third-gen Xeon Scalable processors and eight Habana Gaudi AI processors. The cluster also employs a pair of the OEM's SuperServer 4029GP-T systems with eight Goya HL-100 PCIe cards for AI inferencing.

Because the system is designed to support very large AI models, each server is networked via six 400 Gbit/sec ports running RDMA over Converged Ethernet (RoCE) to a large Arista non-blocking switch.
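Simple arithmetic — ours, not the SDSC's — puts that networking in context: six 400 Gbit/sec ports give each node an aggregate of 2.4 Tbit/sec, or roughly 300 GB/sec of scale-out bandwidth:

```python
# Aggregate RoCE bandwidth per Voyager node: six 400 Gbit/sec ports.
PORTS_PER_NODE = 6
GBIT_PER_PORT = 400

total_gbit = PORTS_PER_NODE * GBIT_PER_PORT  # 2400 Gbit/s
total_gbyte = total_gbit / 8                 # 300 GB/s
print(f"{total_gbit} Gbit/s = {total_gbyte:.0f} GB/s per node")
```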

Ready, set, test

With the Voyager system operational, SDSC has transitioned to the test-bed phase of the project.

During this period, the supercomputing center has three years to work directly with researchers to suss out the system's performance, hardware quirks, and software compatibility requirements, Majumdar explained.

The research will also explore use cases for Habana's chips, which have traditionally targeted computer vision, natural-language processing, and deep-learning workloads, Sree Ganeson, head of software product management at Habana Labs, told The Register.

"This community of scientists and researchers are going to bring a different class of problems and try to apply them too deep learning," she said. "The kinds of patterns they may bring might be different, so, it's going to be a learning [process]."

The results of this testing will be shared over the next few years during semiannual workshops and user forums.

However, not everyone will get to work on the system. Research groups will be selected with the help of an external advisory board, and the information collected will be used to develop best practices and allocation policies. This is different from category-one systems, which are opened to peer-reviewed research projects shortly after coming online, Majumdar said.

After the three years are up, the project will transition to a two-year allocation phase during which the SDSC team will step back and allow independent scientists to conduct research on the system.

While Voyager has only just come online, Majumdar claims early testing has been promising, with performance being "better than projected" and workloads porting relatively painlessly to run on Gaudi and Goya. "The software stack, porting, and running on the machine has been really smooth," he said.

What about Gaudi2 and Greco?

Voyager comes online just weeks after Intel's Habana Labs unveiled its second-gen AI training and inference processors: Gaudi2 and Greco.

Intel claims the chips offer a substantial performance boost over the previous generation, and that they outperform Nvidia's A100 GPUs in the company's internal benchmarks.

The 600W Gaudi2, built on a 7nm manufacturing process, offers 24 tensor processor cores and 96GB of HBM2e high-bandwidth memory operating at 2.45TB/s. Greco, meanwhile, offers 16GB — the same capacity as Goya — of newer LPDDR5 in a smaller single-slot, half-height, half-length PCIe card that consumes less than half the power.
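Putting those figures next to the first-gen Gaudi specs quoted earlier in this piece, the generational jump works out as follows (straight arithmetic on the published numbers):

```python
# Gen-on-gen comparison using the specs quoted in this article.
gaudi  = {"hbm_gb": 32, "bandwidth_tbs": 1.0,  "tdp_w": 350}
gaudi2 = {"hbm_gb": 96, "bandwidth_tbs": 2.45, "tdp_w": 600}

for key in gaudi:
    ratio = gaudi2[key] / gaudi[key]
    print(f"{key}: {gaudi[key]} -> {gaudi2[key]} ({ratio:.2f}x)")
# Memory triples and bandwidth grows 2.45x for a 1.71x increase in power.
```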

"Gaudi2 is bigger in many ways with more tensor processor cores, more HBM2e, more scale-out ports, so whatever we learn from [Voyager] should scale even better on Gaudi2," Ganeson said. "The cutting edge work is getting done by this community. So, we get to learn and develop for what's going to be in production in the future." ®
