New Chinese exascale supercomputer runs 'brain-scale AI'

Massive model gobbles 37 million homegrown 'Sunway' cores

Back in October, reports surfaced that China had achieved exascale-level supercomputing capabilities on two separate machines. One of them is the Sunway "Oceanlite" system, which is built entirely with Chinese components, from CPU to network.

While few architectural details have emerged to date, a paper [PDF] published today outlines the machine's compute, memory, and other aspects. It also shows off the system's capabilities via a system-spanning AI workload: a pre-trained language model with 14.5 trillion parameters, running at a mixed-precision performance of over one exaflop.

The system has "as many as 96,000 nodes," the paper reveals, based on Sunway SW26010-PRO compute units (manycore chips with built-in custom accelerators), a custom memory configuration, and a homegrown network fabric.
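The node count and the headline core count can be reconciled with some quick arithmetic. The sketch below assumes a per-chip layout that the article itself does not state: six core groups per SW26010-PRO, each with one management core and 64 compute cores, and one chip per node. Treat those figures as assumptions rather than numbers confirmed here.

```python
# Back-of-the-envelope core count for the machine described above.
# ASSUMPTIONS (not stated in the article): one SW26010-PRO per node,
# six core groups per chip, each with 1 management core + 64 compute cores.
NODES = 96_000                 # "as many as 96,000 nodes" per the paper
CORE_GROUPS_PER_CHIP = 6       # assumed
CORES_PER_GROUP = 1 + 64       # assumed: 1 management + 64 compute cores


def total_cores(nodes: int = NODES) -> int:
    """Return the total core count under the assumptions above."""
    return nodes * CORE_GROUPS_PER_CHIP * CORES_PER_GROUP


print(f"{total_cores():,} cores")  # 37,440,000 cores
```

Under those assumptions the total lands at roughly 37 million, in line with the figure in the subheading.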

The exascale results on the supercomputing-standard "Top 500" benchmark were verified, though not published. It is important to note, however, that this "brain-scale" workload is not itself running at full exascale capability. Supercomputing performance is generally measured in 64-bit floating point (FP64), but this work was based on mixed precision. The new Sunway system can handle FP64, FP16, and BF16 and can trade those around during training for maximum efficiency.
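To see why swapping between formats matters, here is a minimal Python sketch of how the three number formats behave. It is not code from the paper: the helper names are hypothetical, BF16 is emulated by plain truncation of the FP32 bit pattern (real hardware typically rounds), and Python's native float stands in for FP64.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Emulate BF16 by keeping only the top 16 bits of the FP32 encoding.

    This truncates rather than rounds, which is a simplification.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def sum_in_fp16(value: float, count: int) -> float:
    """Accumulate entirely in FP16: every partial sum is rounded to FP16."""
    total = 0.0
    for _ in range(count):
        total = to_fp16(total + to_fp16(value))
    return total


def sum_mixed(value: float, count: int) -> float:
    """Mixed precision: FP16 inputs, but the running total stays in FP64."""
    total = 0.0
    for _ in range(count):
        total += to_fp16(value)
    return total


# Precision: FP16 keeps 10 fraction bits, BF16 only 7.
print(to_fp16(1.0 + 2**-12))   # 1.0 (the small increment is lost)
print(to_bf16(1.0 + 2**-8))    # 1.0 (lost even sooner)

# Range: BF16 shares FP32's exponent, FP16 tops out at 65504.
print(to_bf16(70000.0))        # 69632.0 (coarse, but representable)
try:
    to_fp16(70000.0)
except OverflowError:
    print("70000.0 does not fit in FP16")

# Accumulation: a pure-FP16 running sum stalls once the total outgrows
# the format's precision, while a wider accumulator stays exact.
print(sum_in_fp16(1.0, 4096))  # 2048.0
print(sum_mixed(1.0, 4096))    # 4096.0
```

The last two lines show the usual motivation for mixed precision: cheap low-precision values for the bulk of the arithmetic, with a wider format reserved for the quantities that would otherwise lose accuracy.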

Even though mixed precision fails to make this a true sustained exascale workload in traditional terms, it does show evidence of some impressive hardware/software co-design thinking, especially as the supercomputing world wraps its collective head around how AI/ML is supposed to integrate with "old school" modeling and simulation.

The Chinese team provides chip- and node-level details for tuning HPC systems for AI, including scheduling, memory, and I/O optimizations, along with a unique parallelization strategy that mixes parallel models to cut down on compute time and memory use. They also developed a distinct load balancer and a strategy for using mixed precision efficiently.

"This is an unprecedented demonstration of algorithm and system co-design on the convergence of AI and HPC," the paper's authors say.

The model and optimization set, called BaGuaLu, "enables decent performance and scalability on extremely large models by combining hardware-specific optimizations, hybrid parallel strategies, and mixed precision training," the team adds.

The authors, who include Alibaba employees in addition to academics from major Chinese universities, add that with current capabilities, training a 174-trillion-parameter model is within the realm of possibility.
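For a sense of scale, the rough sketch below estimates how much memory the model weights alone would occupy at each precision the system supports. It is an illustration rather than a figure from the paper: the function name is hypothetical, terabytes are decimal (10^12 bytes), and gradients, optimizer state, and activations are deliberately left out.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp64": 8, "fp16": 2, "bf16": 2}


def weight_terabytes(params: int, fmt: str) -> float:
    """Return the size of the raw weights in decimal terabytes.

    Weights only: training also needs gradients, optimizer state,
    and activations, none of which are counted here.
    """
    return params * BYTES_PER_PARAM[fmt] / 1e12


for label, params in [("14.5T", 14_500_000_000_000),
                      ("174T", 174_000_000_000_000)]:
    for fmt in ("fp64", "fp16"):
        print(f"{label} parameters in {fmt}: "
              f"{weight_terabytes(params, fmt):.0f} TB")
# 14.5T parameters in fp64: 116 TB
# 14.5T parameters in fp16: 29 TB
# 174T parameters in fp64: 1392 TB
# 174T parameters in fp16: 348 TB
```

Even in 16-bit form, the weights of either model run to tens or hundreds of terabytes, which is why they have to be spread across a very large number of nodes.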

Avid readers of the architecture-centric The Next Platform can be sure there is a deep dive into the chewy bits of the architecture later today. Information about the machine's architecture has been light, but there is finally some detail to sink our teeth into. ®

Biting the hand that feeds IT © 1998–2022