Nvidia Tesla bigwig: Why you REALLY won't need x86 chips soon
Find out where Intel, AMD, ARM stand in GPU giant's roadmap
Interview
Life is what happens when you are trying to do other things, as the old saying goes.
Jen-Hsun Huang, co-founder and CEO of Nvidia, has been perfectly honest about the fact that the graphics chip maker didn't intend to get into the supercomputing business. Rather, it was founded by a bunch of gamers who wanted better graphics cards to play 3D games. Fast forward two decades, though, and the Nvidia Tesla GPU coprocessor and the CUDA programming environment have taken the supercomputer world by storm.
Nearly two years ago, Steve Scott, then CTO at supercomputer maker Cray and the designer of several generations of Cray supercomputers and interconnects, joined Nvidia as CTO of its Tesla GPU coprocessor unit.
Scott sat down with El Reg to talk in more detail about the roadmaps for the Tesla GPU coprocessors and Tegra processors, which were announced at the GPU Technology Conference a month ago:
TPM (Vulture Central's interviewer): You know the kinds of things that El Reg cares about. We care about the GPU computing roadmap, we care about Project Denver ARM cores, and we care about servers and connectivity and the kind of role that you might play in both.
Steve Scott: Well, let me start with this: Jen-Hsun talked about it already, the fact that with "Logan" Tegra is going to become GPU-capable. What we are seeing is a convergence between what Tegra is doing and what Tesla is doing.
Tesla could never do what we do without GeForce. The HPC community is just far too small to support the kind of development it would take to build a competitive processor. This is why Cray got out of building processors, and recently Cray even got out of building interconnects, because the HPC market just isn't big enough. So Tesla has been completely enabled by GeForce. The cool thing that is going on right now is that we are going to see that extend from GeForce to Tesla to Tegra. All of the GPU computing, all of the software – CUDA, OpenACC, and anything else in the software stack – is now going to work on phones, tablets, laptops, whatever. On the other end of the spectrum, Tesla is becoming more Tegra-like because we are going to start integrating CPUs into Tesla.
TPM: Will they be literally the same processors used in Tegra and Tesla, or will there be variants of the Denver core aimed at mobile and server devices?
Scott: Our Denver project is really aimed at putting out a high-performance ARMv8 processor. Our Denver 64-bit ARM core will be higher performance than anything you can buy from ARM Holdings. That core is going to show up in Tegra, but it won't show up in all of the Tegra processors. We will still have Tegra processors that use stock ARM cores as well, like we use Cortex-A9 cores today, but Denver will show up in the high end.
The thing to remember as an architecture licensee is that you can tweak an ARM core to change its performance, but you can't change the architecture one lick. You have to conform to the ISA, and ARM is quite disciplined about that.
TPM: Can a full licensee, like you are, add to the ISA?
Scott: No. You can add system-on-chip features: you can put on a video transcoder, for instance, or a different network interface. But the instruction set is the ARM-approved ISA and that is it. What that means is that software compiled for the ISA will run on any ARM core, and that's a good thing.
TPM: So how do we think about Tesla and Tegra going forward? Will there always be a Tesla, or does Tegra just get fatter and faster?
Scott: As the CTO of Tesla, I sure hope so. (Laughs)
TPM: I would think you would know ... er ... (More laughs)
Scott: There are no current plans that I am aware of to do away with Tesla. But seriously ... the products don't have to converge, but we do end up with a converged architecture. This is really unprecedented. The right way to build a consumer processor has, historically, not been the right way to build a supercomputer processor. Go back in time and look at a Cray-1 and a Motorola 6800 or MOS 6502. They were just night-and-day different. Go back even five years and they were quite different.
But now everybody is constrained by power – the most important thing in your phone is power efficiency because you have got one watt, and the most important thing in a $100m supercomputer is power efficiency because it is getting expensive to plug the things in. So the right way to build supercomputers in the future is going to be with lots of little power-efficient cores. You are going to have to do that to get the efficiency. You don't really want to build a very complex processor for a supercomputer, even though that would give you faster single-thread performance, because it is going to be power-inefficient.
So the question becomes, do you have lots of little cores, or lots and lots of little cores? Do you have 100 cores, or do you have 1,000 cores?
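Scott's "100 cores, or 1,000 cores" question can be made concrete with some back-of-the-envelope arithmetic. The sketch below is an illustration, not Nvidia's methodology. It assumes a fixed 100 W chip budget split evenly across identical cores, and it borrows the rule of thumb often called Pollack's rule (a core's performance grows roughly with the square root of the resources it consumes), applied here to power as an assumption. The budget figure and the function names are hypothetical.

```python
import math

# Hypothetical fixed chip power budget, in watts (illustrative only).
POWER_BUDGET_W = 100.0


def per_core_perf(n_cores: int, budget_w: float = POWER_BUDGET_W) -> float:
    """Single-thread performance (arbitrary units) of one of n identical cores.

    Assumption: the budget is split evenly, and a core's performance scales
    with the square root of the power it draws (Pollack's-rule-style).
    """
    return math.sqrt(budget_w / n_cores)


def chip_throughput(n_cores: int, budget_w: float = POWER_BUDGET_W) -> float:
    """Aggregate throughput for a perfectly parallel workload."""
    return n_cores * per_core_perf(n_cores, budget_w)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:>5} cores: single-thread {per_core_perf(n):.2f}, "
              f"throughput {chip_throughput(n):.1f}")
```

Under these assumptions aggregate throughput grows as the square root of the core count while each individual core gets slower, which is the trade Scott describes; the square-root exponent is the assumption doing all the work.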
TPM: You already do a lot of differentiation with the GPUs today. You scale up and down the number of CUDA cores, the number of SMs, and the memory, and so forth – and decide what features to turn on and off in each product, whether it is visualization or dynamic parallelism or Hyper-Q, how much single-precision or double-precision math, or whatever. So there will be similar differentiation between Tegra and Tesla?
Scott: It's going to be stuff around the edges – what kind of network interface, how much memory bandwidth you have, do you put in ECC, do you put in high-throughput double-precision floating point. The answer to all of those questions is yes for HPC, and no for the mobile space. But the architecture – which means taking some heavyweight cores that are designed for single-thread performance and coupling those with lots of cores that are designed for power efficiency – is the same between the two.
So we can now develop the "Maxwell" family of GPUs, and that will go into the Tesla line and into the "Parker" family of Tegra processors, just like today we make GK104, GK107, and GK110 SKUs from the same basic architectural family. The interesting bit about dynamic parallelism in particular is that we won't regress. Any future Tesla GPU will have it, even though it was first introduced only in the GK110 used in the Tesla K20.
(Scott did not know when dynamic parallelism might be added to future Tegra CPU-GPU hybrids.)
The point is, once you make Tegra compute-capable and once you integrate GPU cores and have the same basic architecture, we now actually have what the cute little phrase – "from cell phones to supercomputers" – promised. It means we have higher volume, which provides the foundation for Tesla.
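The architecture Scott describes, a few heavyweight cores for single-thread speed coupled with many power-efficient cores, resembles the "asymmetric multicore" case that Mark Hill and Michael Marty analysed by extending Amdahl's law. The sketch below uses that published model, not anything from Nvidia: a chip budget of n base-core equivalents (BCEs), one big core built from r of them whose performance is assumed to be sqrt(r), and the remaining n - r spent on simple one-BCE cores. The values f = 0.975 (parallel fraction) and n = 256 are arbitrary illustrations, and the function names are hypothetical.

```python
import math


def core_perf(r: int) -> float:
    """Performance of one core built from r base-core equivalents (BCEs).

    Assumption (as in the Hill-Marty model): performance grows as sqrt(r).
    """
    return math.sqrt(r)


def asymmetric_speedup(f: float, n: int, r: int) -> float:
    """Amdahl's-law speedup for one big core of r BCEs plus (n - r) 1-BCE cores.

    f is the fraction of the work that can run in parallel. The serial part
    runs on the big core alone; the parallel part runs on the big core and
    all the small cores together. r = 1 gives a chip of n small cores and
    r = n gives a single big core.
    """
    if not (0.0 <= f <= 1.0 and 1 <= r <= n):
        raise ValueError("need 0 <= f <= 1 and 1 <= r <= n")
    big = core_perf(r)
    serial_time = (1.0 - f) / big
    parallel_time = f / (big + (n - r))
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    f, n = 0.975, 256  # illustrative workload and chip budget
    for r, label in ((1, "all small cores"),
                     (64, "one big core + small cores"),
                     (n, "one giant core")):
        print(f"{label:>28}: speedup {asymmetric_speedup(f, n, r):.1f}")
```

With these inputs the hybrid chip (r = 64) reaches a speedup of about 125, against roughly 35 for a chip of 256 small cores and 16 for one giant core: the big core speeds up the serial fraction while the small cores carry the parallel work. Different values of f move the optimum, so treat the numbers as an illustration of the shape of the trade-off rather than a forecast.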
TPM: At the moment, Tesla is basically a slightly modified GeForce card aimed at servers and workstations. But going forward, it is still going to be a little different in that with both Tesla and Tegra, you are going to have both CPUs and GPUs on a single die...
Scott: They don't necessarily have to be on a single die. You could get the same effect by having a custom interface between the two.
TPM: As Intel does with Xeon E3s and the HD graphics in the same package, yup. Although in your case, you would be deciding whether to drop in or turn on the CPU, where Intel enables or disables the graphics, depending on whether the part is aimed at a workstation or a server.
Scott: And eventually, depending on how successful ARM is in attracting games, you could imagine game machines in the future – a gaming PC – with an ARM plus GPU instead of x86 plus GPU.
TPM: I have been thinking for some time that you would get into the console business, and then servers proper, and then maybe PCs – whatever that term might mean in the future – in addition to smartphones and tablets.
Scott: The point is, you can serve all of our current businesses with the same architecture. But we will continue to interoperate with Intel and AMD processors – that's important to us and we will continue to do that – and there will be overlap. But you can imagine a future where you really don't need an x86 processor anymore because we have an integrated ARM processor.