Nvidia: An unintended exascale-super innovator

CEO just wanted to play 3D video games

SC11 Jen-Hsun Huang, one of the cofounders of graphics-chip maker Nvidia, never intended to be a player in the supercomputing racket. But his company is now at the forefront of the CPU-GPU hybrid computing revolution that is taking the HPC arena by storm as supercomputing centers try to cram as much math into as small a power budget as possible.

Back when Nvidia was founded in 1993, there were a staggering 80 companies making graphics chips, Huang explained in his keynote at the SC11 supercomputing conference in Seattle on Tuesday. "Our idea was: 'Wouldn't it be fun to build graphics chips so we could play video games in 3D?'," he said. "That was it. The entire business plan."

In fact, Huang admitted, he never did get around to finishing writing up the business plan. And worse still, there were no 3D games.

But five years later, along came Quake, the first OpenGL application – and suddenly millions of kids were out there buying graphics cards for $200 to $300 to play it. Then other game makers were doing 3D, and suddenly Nvidia was in business.

While Huang was proud of those early graphics cards, they underserved the corporate workstation market dominated by the Unix vendors of the time, and Nvidia didn't have much success breaking into the visualization field.

Still, the company continued improving its graphics chips, and eventually started adding more cores and support for more algorithms for the special effects that game writers wanted. And then the company did a funny thing: It added support for 32-bit IEEE floating-point math to its chips, which had evolved into much more sophisticated graphics coprocessors.

"As we made these GPUs more programmable, we semi-tripped into the next market," said Huang, stressing the "semi." The problem was that it was still too hard to move the code running on parallel supercomputers to these GPUs. "If I could just express all of my problems as a triangle," Huang said to big laughs from the assembled boffinry.

It wasn't long before Tsuyoshi Hamada of Nagasaki University in Japan built a homemade cluster with 256 GeForce GPUs that cost only $230,000 and proved that this could be done, however inelegant the machine might look.

It was, however, a fire hazard – and as Huang pointed out, if it ever caught fire, all you could do was run.

But it proved that the concept could work. And here we are, only two years later, and the Cray XK6 that is at the heart of the "Titan" 20 petaflops supercomputer that will be installed at Oak Ridge National Laboratory next year is really just a grownup version of Hamada-san's firetrap.

And guess what? It is still not quite good enough. The challenge to get to exascale performance levels in supercomputing – and therefore teraflops levels on our smartphones and tablets and tens to hundreds of teraflops on our desktops – is going to require some innovative leaps. Perhaps like the unintended kind that Nvidia itself made to become the world's largest graphics chip company and a player in ARM processors. And perhaps not by Nvidia itself – but not if Huang can help it.

The problem is that Dennard scaling, named after Robert Dennard, the IBM researcher who invented the DRAM chip, has run out of gas. And right about now, in fact. Dennard scaling is the ability to scale the performance of semiconductors, and it is what actually explains why Moore's Law works.

If you plot a line from the Cray Y-MP8, an eight-processor vector machine from 1988, through an Alpha-based Cray T3E-1200 in 1998, to the Cray XT5 "Jaguar" machine at Oak Ridge in 2008, you get this beautiful straight line that goes from gigaflops to teraflops to petaflops, and you would hit exaflops somewhere around 2019 and zettaflops around 2031. But the current curve shows that getting to exaflops in a 20-megawatt thermal envelope, which is the practical upper limit for a system, is only attainable by 2035 with current CPU technology.
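That straight line is easy enough to redraw on the back of an envelope. The sketch below works in powers of ten and extends a line through two points to a target. The function name and the round inputs are ours for illustration (gigaflops taken as 10^9 in 1988, petaflops as 10^15 in 2008); they are not the machines' actual ratings, nor the data behind Huang's slide.

```c
/*
 * Hypothetical helper: extend a straight line drawn on a log scale.
 * Performance is given as a power of ten (9 = gigaflops, 15 = petaflops,
 * 18 = exaflops). Inputs are illustrative round numbers only.
 */
double year_reaching(double year0, double exp0,
                     double year1, double exp1,
                     double target_exp)
{
    /* Orders of magnitude gained per year between the two known points. */
    double slope = (exp1 - exp0) / (year1 - year0);

    /* Walk the same slope forward from the first point to the target. */
    return year0 + (target_exp - exp0) / slope;
}
```

Calling `year_reaching(1988, 9, 2008, 15, 18)` lands on 2018, in the same neighbourhood as the 2019 quoted above. The round inputs are presumably why it does not match exactly.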

"The beautiful thing about a project that we won't get to until 2035 is that we don't have to start building it until 2030," Huang said, and got some more laughs. But the governments of the world want exascale computers by 2018 to 2020.


The problem, of course, is that CPUs are designed to run single threads as fast as possible, and they are not particularly good at running things in parallel. Moving data into and out of an x86 chip takes 20 times the energy of the calculation itself, and scheduling the instruction takes 50 times the energy of processing it. This is great for low-threaded PC applications, but a disaster for CPU-based supercomputer clusters.
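To see what those ratios imply, here is a rough sketch of how little of the energy ends up doing math. It assumes the data-movement and scheduling overheads simply add on top of the calculation's own energy. That is our simplification, not something Huang claimed, and the function name is hypothetical.

```c
/*
 * Fraction of per-operation energy spent on the arithmetic itself.
 * Both ratios are multiples of the energy of one calculation.
 * Assumption (ours): the overheads are independent and simply add up.
 */
double math_energy_fraction(double data_move_ratio, double schedule_ratio)
{
    return 1.0 / (1.0 + data_move_ratio + schedule_ratio);
}
```

With the figures of 20 and 50, `math_energy_fraction(20.0, 50.0)` returns 1/71, or about 1.4 per cent. Under that assumption, nearly all of the power budget goes on getting ready to calculate.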

The problem with GPU coprocessors is that they are not as easy to program, and that is why Nvidia has started an effort called OpenACC, which seeks to set a standard for parallel programming for CPUs and GPUs.

Portland Group, a popular HPC compiler maker, and CAPS, which has created compilers specifically for GPUs, are backing the standard, which provides a means of putting "directive" hints into Fortran, C, and C++ code so the compilers have an easier time expressing the parallelism to the CPUs and GPUs. Neither Intel nor Advanced Micro Devices has been invited to the OpenACC party as of its launch, but Ian Buck, general manager for the CUDA compiler stack at Nvidia, tells El Reg that they can both adopt OpenACC. Cray has signed up to support the standard, too, which makes sense because it is selling Opteron-Fermi hybrids.
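For a flavour of what a directive hint looks like, here is a minimal C sketch of a SAXPY loop annotated with an OpenACC pragma. The function and its arguments are our own illustration rather than anything shown at SC11, and the clauses are one reasonable choice among several.

```c
/*
 * Illustrative only: y[i] = a * x[i] + y[i] for i in [0, n).
 * The pragma asks an OpenACC-aware compiler to offload and parallelize
 * the loop. Compilers that do not know OpenACC skip the pragma and
 * simply run the loop serially on the CPU.
 */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* copyin: x is only read on the accelerator; copy: y goes in and comes back. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The appeal is that the loop itself is untouched: take the pragma away and it is still ordinary C, which is rather the point of a hint-based approach.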

Even with all this, by current estimates it is going to take three years longer to get to exascale than anyone thinks is possible with current technology, says Huang. He didn't elaborate on how this problem would be solved, but seemed optimistic that the industry would figure it out.

What Huang did not talk about was the status of the impending "Kepler" and future "Maxwell" GPUs, the latter expected in 2013, or the Project Denver ARMv8 processors that the company announced it was working on early this year. ®
