Russia scrambles to bootstrap HPC clusters with native tech
Don't send a mobile chip to do a high-end CPU, GPU job. Unless you have no choice.
With the largest data center chipmakers locking Russia out of next-generation devices, not to mention the withdrawal of mobile and software makers from that market, it is no surprise Russian researchers are on the fast track to develop ways around the new technologies that will drive the rest of the world.
This is important in the Russian context now, but these efforts are likely to spur similar efforts in China, which is also no stranger to sanctions of the tech variety – as we've seen in cases like Huawei.
The US government last week blocked key technology exports, including semiconductors, to Russia after the invasion of Ukraine. Chipmakers complying with the US export controls include AMD, Intel, TSMC, and GlobalFoundries, at least, with all suspending shipments of products to Russia. Dell, HP, and Lenovo have also stopped shipping products to the country, and Oracle and SAP suspended their business last night.
The effects of these sanctions cannot be overstated as the impact hits everything from Russia's most powerful supercomputers to enterprise systems and the much wider world of mobile for both businesses and consumers.
"In the meantime," says Andrei Sukhov, professor and head of the CAD lab at HSE University in Moscow, "simply stating the problem is not enough; it is necessary to look for a quick way out of the situation, relying on available resources."
In a timely piece via the Association for Computing Machinery (ACM), Sukhov explains how Russian computer science teams are looking at building the next generation of clusters using older clustering technologies and a slew of open-source software for managing everything from code portability to parallelization. The plan also leans on standards including PCIe 3.0 and USB 4, and even existing Russian knock-off buses inspired by InfiniBand (Angara ES8430).
Such systems would have to be built from whatever processing and networking parts are readily available, and for Russia (and China, should it come down to it) the more robust native options are mobile ones.
For some context at the highest end of computing, Russia has seven supercomputers on the Top500 rankings, with the 199-node Chervonenkis super its highest ranked, in 19th spot. Chervonenkis is based on AMD Epyc processors with Nvidia A100 accelerators. There isn't much an integrator can do without vital components, including the InfiniBand interconnect.
As it turns out, the next three most powerful machines (numbers 36, 40, and 43) are similar configurations with AMD processors and Nvidia accelerators.
"Although these devices are not the most powerful, their production is completely independent and does not affect the patent rights of American or European companies," Sukhov explained.
While all the pieces might be in place, there is still the need to manufacture new boards, a problem Sukhov said can be routed around by using wireless protocols as the switching mechanism between processors. The catch is network latency, which will be subpar, making it difficult to do any true tightly coupled, low-latency HPC simulations (which come in handy in areas like nuclear weapons simulations, as just one example).
"Given that the available mobile systems-on-chip are on the order of 100 Gflops, performance of several teraflops for small clusters of high-performance systems-on-chip is quite achievable," Sukhov added.
"The use of standard open operating systems, such as Linux, will greatly facilitate the use of custom applications and allow such systems to run in the near future. It is possible that such clusters can be heterogeneous, including different systems-on-chip for different tasks (or, for example, FPGAs to create specialized on-the-fly configurable accelerators for specific tasks)."
As he told The Register in a short exchange following the article, "As for the existing supercomputers that have already been put into operation, no special problems are foreseen. These supercomputers are based on Linux and can continue to operate without the support of the companies that supplied the hardware and software. According to my information, so far all scientific supercomputers, including those older than five years old, are operated today in a normal mode. Only forced control commands or hacker attacks can stop them. But such actions in relation to scientific projects, including supercomputers, are not yet known to me."
"Naturally, it will be impossible to make a new supercomputer in Russia in the coming years. Nevertheless, it is quite possible to close all the current needs in computing and data processing using the approach we have proposed. Especially if we apply hardware acceleration to tasks, depending on their type," he adds.
"It should be noted that our proposed approach is intended for rapid implementation as a pilot project. During this implementation, software solutions and new protocols for data exchange, as well as computing technologies, will be worked out.
"In the future, it will be possible to refine the cluster device (for example, to try to launch the release of a new motherboard, which will host several chips connected by a common bus)." ®