Aleph Alpha enlists Cerebras waferscale supers to train AI for German military

Also demonstrates success accelerating molecular dynamics

Even as world leaders sound the alarm about the impact of AI in war, waferscale startup Cerebras is joining forces with Aleph Alpha to develop sovereign models for the German armed forces.

Under the multi-year agreement, Cerebras has announced Aleph Alpha as the first company in Europe to deploy its CS-3 AI supercomputers. These systems will apparently be housed at alpha ONE – Aleph's AI datacenter at the GovTech Campus in Berlin. Up to this point, Cerebras's systems have largely been deployed in the US.

"We chose Cerebras because of their world-class AI expertise and peerless waferscale technology that enables us to train state of the art AI models with high efficiency," Aleph Alpha CEO Jonas Andrulis declared in a statement.

While Cerebras competes directly with Nvidia in the AI training space – it isn't interested in inferencing and has hooked up with Qualcomm for that reason – its chips bear little resemblance to team green's GPUs.

Cerebras's third-gen parts, announced in March, measure 46,225 mm² (about 71.6 square inches) and pack 4 trillion transistors powering 900,000 cores. Instead of the costly high-bandwidth memory found on most AI accelerators, the chip relies on a massive 44 GB pool of SRAM etched right into the dinner plate-sized component.

All told, each chip is capable of outputting 125 petaFLOPS of incredibly sparse FP16 performance. Cerebras claims an 8x improvement in this department.

Apart from the fact Cerebras's CS-3 systems will provide the computational grunt necessary for Aleph Alpha to train sovereign models for the German military, the announcement was rather short on details as to how they might be employed.

In any case, the agreement comes just weeks after multiple world leaders and market mogul Warren Buffett drew comparisons between artificial intelligence and the atomic bomb.

Cerebras, Neural Magic push the limits of sparse models

On a brighter note, Cerebras on Wednesday revealed its recent successes developing sparse models that can take advantage of its unique compute architecture.

Working with AI startup Neural Magic, Cerebras claims to have developed a novel approach to shrinking models by as much as 70 percent – which it calls sparse fine-tuning.

This reduction in model size has a couple of benefits, including reduced compute requirements and memory footprints. Because Cerebras's waferscale accelerators are optimized for half-precision (FP16/BF16), each parameter occupies two bytes, so every billion parameters consume two gigabytes of memory. As such, larger models must be spread across multiple accelerators – just like you see in GPU systems.
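The two-gigabytes-per-billion-parameters rule is easy to check by hand. The Python sketch below is illustrative only: the function names are made up for this example, and the 44 GB default simply reuses the SRAM figure quoted earlier as a stand-in for per-device capacity. It counts weights alone and ignores activations, optimizer state, and any memory held off-chip.

```python
import math

BYTES_PER_PARAM_FP16 = 2   # half precision: 16 bits per weight
CAPACITY_GB = 44           # assumption: stand-in per-device capacity

def weights_gb(params_billions, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Memory for the weights alone, in decimal gigabytes.

    One billion parameters at N bytes each is N gigabytes.
    """
    return params_billions * bytes_per_param

def min_devices(params_billions, capacity_gb=CAPACITY_GB):
    """Fewest devices whose combined capacity holds the weights."""
    return math.ceil(weights_gb(params_billions) / capacity_gb)

for size in (7, 13, 70):
    print(f"{size}B parameters: {weights_gb(size)} GB of weights, "
          f"at least {min_devices(size)} device(s)")
```

Under those assumptions a 70-billion-parameter model needs 140 GB for its weights, which is why it cannot sit on a single 44 GB device.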

By pruning unneeded model weights, you can reduce the size of the model – which often has the benefit of improving throughput, since these smaller models put less pressure on the memory subsystems.

To be clear, the concept of weight pruning isn't new and isn't without faults. While pruning can help reduce the size of the model, it can also degrade accuracy.

Cerebras and Neural Magic's approach to sparse fine-tuning adds two further steps to restore accuracy to pre-pruning levels, and works a bit like this.

First, a one-shot pruning pass is made on a dense model, like Llama. This removes about 50 percent of the model's weights, shrinking it considerably. Second, that pruned model is pre-trained using Cerebras's SlimPajama dataset to recover lost accuracy.

Finally, the model is fine-tuned on application-specific datasets for common tasks like chatbots or code generators. According to Cerebras, this three-stage approach yields LLMs that retain the same level of accuracy while being up to 70 percent smaller.
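To give a feel for the first stage, here is a minimal Python sketch of a one-shot pruning pass. The announcement does not say which pruning criterion Cerebras and Neural Magic use, so this example assumes plain magnitude pruning (zeroing the weights closest to zero) as a stand-in. The function names and the toy weight list are hypothetical, and the second and third stages, which retrain the pruned model, are not shown.

```python
def one_shot_magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of weights in a single pass."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

# Toy "layer" of eight weights; real models have billions.
layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.1]
pruned = one_shot_magnitude_prune(layer, sparsity=0.5)
print(pruned)               # half of the entries are now 0.0
print(sparsity_of(pruned))
```

The zeros only save memory and compute when the hardware and storage format can skip them, which is the part Cerebras says its architecture is built for.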

Cerebras claims molecular dynamics superiority over Frontier

While Cerebras's focus is understandably on artificial intelligence, that's not to say its waferscale chips aren't useful for more traditional HPC workloads.

Working in collaboration with the Department of Energy's Sandia, Lawrence Livermore, and Los Alamos National Labs, the chipmaker claims it was able to perform atomic-scale simulations of molecules in the millisecond regime. That would make its system 179x faster than Frontier, the fastest publicly known supercomputer on the Top500.

"This work changes the landscape of what is possible with molecular dynamics simulations," Michael James, Cerebras chief architect, bragged in a statement. "Simulations that would have taken a year on a traditional supercomputer can now be completed in just two days."

According to Cerebras, these results were achieved by mapping individual atoms to its older WSE-2's more than 800,000 cores. Because all of these cores are contained on a single wafer, they're able to communicate with each other much more efficiently. The company says this allowed the system to simulate 270,000 time steps a second for each of those atoms.
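The headline numbers can be cross-checked with a few lines of Python. This is a back-of-the-envelope sketch, not anything Cerebras published: the function names are invented here, and the implied Frontier rate is derived on the assumption that the 179x figure compares time steps per second on the same problem.

```python
SPEEDUP_VS_FRONTIER = 179     # claimed speedup over Frontier
STEPS_PER_SECOND = 270_000    # claimed WSE-2 simulation rate

def days_after_speedup(baseline_days, speedup=SPEEDUP_VS_FRONTIER):
    """Wall-clock days for a job once the claimed speedup is applied."""
    return baseline_days / speedup

def implied_baseline_rate(steps_per_second=STEPS_PER_SECOND,
                          speedup=SPEEDUP_VS_FRONTIER):
    """Time steps per second the slower system would manage (derived)."""
    return steps_per_second / speedup

def steps_per_day(steps_per_second=STEPS_PER_SECOND):
    """Time steps completed in 24 hours at the claimed rate."""
    return steps_per_second * 86_400

print(f"A one-year job shrinks to {days_after_speedup(365):.1f} days")
print(f"Implied baseline: {implied_baseline_rate():.0f} steps per second")
print(f"Steps per day at the claimed rate: {steps_per_day():,}")
```

A year divided by 179 comes out at roughly two days, which lines up with the "year to two days" claim in the statement.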

For Sandia researcher Siva Rajamanickam, the results represent a major milestone in the National Nuclear Security Administration's mission to boost the performance of its critical systems by 40x. "These results open up new opportunities for materials research and science discoveries beyond what we envisioned," he enthused in a statement. ®

Need more analysis? Don't forget to check out Timothy Prickett Morgan's commentary on Cerebras's latest HPC advances right here on The Next Platform.
