What do you mean your exaflop is better than mine?

Gaming the system was fine for a while. Now it's time to get precise about precision

Comment A multi-exaflop supercomputer the size of your mini-fridge? Sure, but read the fine print and you may discover those performance figures have been a bit … stretched.

As more chipmakers bake support for 8-bit floating point (FP8) math into next-gen silicon, we can expect an era of increasingly wild AI performance claims that depart dramatically from the standard yardstick for large systems: double-precision 64-bit floating point, or FP64.

When vendors shout about exascale performance, be aware that some will use FP8 and some FP64, and it's important to know which is being used as a metric. A computer system that can achieve (say) 200 peta-FLOPS of FP64 is a much more powerful beast than a system capable of 200 peta-FLOPS at just FP8.

These performance numbers are not technically fabrications, but there is certainly a new kind of numbers game afoot – especially among the supercomputing set. Some appear to be using the rise of machine-learning workloads as an opportunity to break away from FP64 and quote lower-precision FP8 figures instead, which makes their performance numbers seem immense. There is some logic to it: AI models don't need full FP64 precision and in fact do pretty much just as well at a lower precision, such as FP8.

AI startup Graphcore contends that standardizing on FP8 as an industry will allow for better machine-learning performance and efficiency while enabling "seamless interoperability" of workloads across systems for training and inference. More likely, Graphcore sees FP8 as an opportunity to level the playing field and win customers away from Nvidia, AMD, and others.

Nvidia, for its part, added support for sparse FP8 math with the launch of its Hopper-based GH100 GPU in March. And at ISC this spring, Nvidia shared that its upcoming Venado supercomputer – one of the first based on an all-Nvidia architecture – would deliver "10 exaflops of peak AI performance."

For those who follow supercomputing milestones, reaching exascale performance is a massive, expensive achievement – but of course, that's using the traditional FP64 measuring point. Now, all exaflop performance claims have to be thoroughly investigated to determine precision.

In reality, Venado relies on FP8 with sparsity to achieve its performance claims. It doesn't hold a candle to the AMD-based Frontier supercomputer, which holds the top spot at 1.1 exaflops of actual FP64 performance.
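One way to keep such claims honest is to never let a flops figure travel without its precision attached. What follows is a minimal Python sketch of that idea, using only the two headline figures quoted above. The `PerfClaim` class and the `comparable` and `headline_ratio` functions are hypothetical names made up for illustration, not part of any benchmark or vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfClaim:
    """A headline performance figure plus the fine print behind it."""
    system: str
    exaflops: float
    precision: str        # number format behind the figure, e.g. "FP64" or "FP8"
    sparse: bool = False  # True if the figure assumes sparsity


def comparable(a: PerfClaim, b: PerfClaim) -> bool:
    """Two claims are only comparable if they were measured the same way."""
    return a.precision == b.precision and a.sparse == b.sparse


def headline_ratio(a: PerfClaim, b: PerfClaim) -> float:
    """Return a.exaflops / b.exaflops, refusing apples-to-oranges pairs."""
    if not comparable(a, b):
        raise ValueError(
            f"{a.system} ({a.precision}, sparse={a.sparse}) and "
            f"{b.system} ({b.precision}, sparse={b.sparse}) "
            "are not measured at the same precision"
        )
    return a.exaflops / b.exaflops


# Figures as quoted in the article.
venado = PerfClaim("Venado", 10.0, "FP8", sparse=True)
frontier = PerfClaim("Frontier", 1.1, "FP64")

try:
    print(headline_ratio(venado, frontier))
except ValueError as err:
    print("Refusing to compare:", err)
```

The sketch deliberately offers no conversion factor between FP8 and FP64 figures. Any such factor depends on the specific hardware and workload, so the safest default is to reject the comparison outright.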

If low-precision reporting like this counted toward the FP64-based Top500 list, the ranking would be an absolute bloodbath led by Nvidia's H100 accelerators and Grace CPUs.

Rest assured, there hasn't been some colossal breakthrough that's allowed chipmakers to leapfrog Moore's law several times over. We've just become more comfortable with less precision – at least when it comes to AI.

There's nothing inherently wrong with FP8. If you're trying to achieve the absolute highest number of flops possible, trading precision for speed is a pretty economical way to do it – especially if you've got hardware optimized for the task.

The fewer bits you use, the less accurate your results will be – but the easier and faster they'll be to calculate. If your workload favors speed – and giving up a little accuracy in the process won't make a meaningful difference – why wouldn't you?
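To get a feel for how much accuracy goes missing, here is a rough Python sketch that rounds a value to a given number of mantissa bits. It rests on some loud assumptions: the function name `round_to_mantissa_bits` is made up for this example, the 3-bit case stands in for one common FP8 layout (4 exponent bits, 3 mantissa bits), and the sketch ignores exponent range entirely, so it says nothing about overflow or subnormal values. It is an illustration of rounding error, not an implementation of any vendor's FP8 format.

```python
import math


def round_to_mantissa_bits(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with the given number of explicit
    mantissa bits. Exponent range is NOT modeled (assumption)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    # frexp splits x into m * 2**e with 0.5 <= |m| < 1.
    m, e = math.frexp(x)
    # Explicit mantissa bits plus the implicit leading 1 bit.
    scale = 1 << (mantissa_bits + 1)
    # Python's round() rounds halves to even, like IEEE 754's default mode.
    return math.ldexp(round(m * scale) / scale, e)


# Explicit mantissa widths; the FP8 entry assumes a 4-exponent, 3-mantissa layout.
formats = {"FP64": 52, "FP32": 23, "FP16": 10, "FP8": 3}

value = 0.1
for name, bits in formats.items():
    approx = round_to_mantissa_bits(value, bits)
    rel_err = abs(approx - value) / value
    print(f"{name}: {approx!r} (relative error {rel_err:.2e})")
```

Under these assumptions, 0.1 survives the trip through 52 bits unchanged, comes out as 0.0999755859375 at 10 bits, and lands on 0.1015625 at 3 bits, an error of roughly 1.6 percent. Whether that matters depends entirely on the workload.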

For the past few years, the industry settled on FP16 for AI workloads and FP64 for traditional computational workloads. With the latest generation of AI accelerators, FP8 is clearly in vogue.

This week, Graphcore proposed to the IEEE that FP8 be adopted as an industry standard for AI/ML workloads. But it's not just Graphcore that would like to see the industry come together around FP8.

Mike Mantor, AMD's chief GPU architect, and John Kehrli, senior director of product management at Qualcomm, also expressed support for an FP8 AI compute standard, in a statement provided by Graphcore.

Based on their enthusiasm, there's little doubt the next generation of AMD and Qualcomm accelerators will also feature FP8 support to match Nvidia's H100.

Whether or not standardizing on FP8 will make AI performance any less confusing remains to be seen. Until then, check the fine print and take any lofty performance claim with a healthy grain of salt. ®

Biting the hand that feeds IT © 1998–2022