New Chinese exascale supercomputer runs 'brain-scale AI'

Massive model gobbles 37 million homegrown 'Sunway' cores

Back in October, reports surfaced that China had achieved exascale-level supercomputing capabilities on two separate machines, one of which is its Sunway "Oceanlite" system, built entirely from Chinese components, from CPU to network.

While there have been few architectural details to date, a paper [PDF], published today, outlines the compute, memory, and other aspects of the system, in addition to showing off its capabilities via a system-spanning AI workload: a pre-trained language model with 14.5 trillion parameters, delivering mixed-precision performance of over one exaflop.

The system has "as many as 96,000 nodes," the paper reveals, based on Sunway SW26010-PRO compute units (manycore, with built-in custom accelerators), a custom memory configuration, and a homegrown network fabric.

Although the system's exascale results on the supercomputing-standard Top 500 benchmark were verified, though not published, it is important to note that this "brain-scale" workload is not itself running at full exascale capability. Supercomputing performance is generally measured in 64-bit floating point (FP64), whereas this work was based on mixed precision. The new Sunway system can handle FP64, FP16, and BF16, and can switch between them during training for maximum efficiency.
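For readers unfamiliar with the technique, mixed-precision training keeps most of the arithmetic in FP16 or BF16 while an FP32 copy of the weights absorbs the updates, typically with loss scaling so that small FP16 gradients don't underflow to zero. The paper does not publish the Sunway code, so the snippet below is only a minimal sketch of that general pattern using PyTorch on a commodity GPU; `torch.autocast` and `GradScaler` here stand in for whatever the team's own software stack does.

```python
# Minimal mixed-precision training sketch (PyTorch, requires a CUDA GPU).
# Illustrative only: the Sunway system runs its own software stack; this just
# shows the FP16-compute / FP32-master-weights / loss-scaling pattern.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # FP32 master weights
scaler = torch.cuda.amp.GradScaler()  # loss scaling guards against FP16 underflow

x = torch.randn(8, 1024, device="cuda")
target = torch.randn(8, 1024, device="cuda")

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Forward and loss in reduced precision; BF16 could be used instead of FP16.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscale gradients, then FP32 optimizer update
    scaler.update()                 # adjust the scale factor for the next step
```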

Even though the use of mixed precision means this is not a true sustained exascale workload in traditional terms, it does show some impressive hardware/software co-design thinking, especially as the supercomputing world wraps its collective head around how AI/ML is supposed to integrate with "old school" modeling and simulation.

The Chinese team provides chip- and node-level detail on tuning HPC systems for AI, including scheduling, memory, and I/O optimizations, as well as a parallelization strategy that mixes multiple parallel modes to cut down on compute time and memory use. They also developed a distinct load balancer and a strategy for using mixed precision efficiently.
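To give a rough flavor of why mixing parallel modes saves memory, here is a toy sketch that shards one layer's weight matrix across "model-parallel" workers while splitting the batch across "data-parallel" groups, then checks the result against the unsharded computation. Every name and number below is a made-up illustration in NumPy; BaGuaLu's actual implementation targets the Sunway runtime and is far more elaborate.

```python
# Toy hybrid-parallelism sketch (NumPy): shard a layer's weights column-wise
# across model-parallel workers and split the batch across data-parallel groups.
# Purely illustrative; not BaGuaLu's real code or runtime.
import numpy as np

np.random.seed(0)
d_in, d_out = 512, 2048
batch = 16
mp_workers = 4   # model-parallel degree: each worker holds 1/4 of the weight columns
dp_groups = 2    # data-parallel degree: each group sees half the batch

W = np.random.randn(d_in, d_out).astype(np.float32)
X = np.random.randn(batch, d_in).astype(np.float32)

# Column-wise shards: per-worker weight memory drops from d_in*d_out to d_in*d_out/mp_workers.
W_shards = np.split(W, mp_workers, axis=1)
X_splits = np.split(X, dp_groups, axis=0)

outputs = []
for x_local in X_splits:                            # each data-parallel group
    partials = [x_local @ w for w in W_shards]      # each model-parallel worker computes its slice
    outputs.append(np.concatenate(partials, axis=1))  # "all-gather" of the output columns
Y_parallel = np.concatenate(outputs, axis=0)

assert np.allclose(Y_parallel, X @ W, atol=1e-4)    # matches the single-node result
```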

"This is an unprecedented demonstration of algorithm and system co-design on the convergence of AI and HPC," the paper's authors say.

The model and optimization set, called BaGuaLu, "enables decent performance and scalability on extremely large models by combining hardware-specific optimizations, hybrid parallel strategies, and mixed precision training," the team adds.

The authors, who include Alibaba employees alongside academics from major Chinese universities, add that with current capabilities, training a 174-trillion-parameter model is within the realm of possibility.

Avid readers of the architecture-centric The Next Platform can expect a deep dive into the chewy bits of the architecture there later today. Information about the machine has been light, but there is finally some detail to sink our teeth into. ®
