Interview The Intel Software Development Conference was on in London last week, and we took the opportunity to catch up with James Reinders, director and evangelist for parallel programming and HPC tools.
Reinders is talking up Knights Landing, the next generation of Xeon Phi, Intel's MIC (Many Integrated Core) processors, which are designed for high-performance concurrent programming.
The first Xeon Phi, Knights Corner, was released in 2012 and had up to 61 cores. 48,000 of the chips are installed in the world's most powerful supercomputer, China's Tianhe-2.
Knights Landing has up to 72 cores, but the more significant difference is that the new Xeon Phi is a processor rather than a co-processor. Co-processors use a host/device programming model, where an application running on the host (the CPU) offloads compute-intensive tasks to the device (the co-processor), with huge potential speed-ups. Nvidia's Tesla range of GPU accelerator boards (installed in the Titan, the world's second most powerful supercomputer) also use this model.
Intel's James Reinders
Processor versus co-processor
Why did Intel go the co-processor route with Knights Corner, and why is it now changing tack? "One issue was software," says Reinders. "[Knights Corner] being a co-processor fitted with a mould that people seemed to be more ready for. The other thing was a bit of legacy. The cluster on a chip design came from Larrabee, a project for something else that we didn't bring to market. We could introduce a co-processor faster. It was an engineering trade-off.
"In a co-processor you can control your ecosystem more: everything that runs on it we had control of. The host was standard. We weren't quite ready to understand how 512-bit vectors should be done on a processor.
"Personally I was, I'll deal with this co-processor and where it is taking us, but I can't wait for Knights Landing."
From the programmer's perspective, a processor is easier to code for since you no longer have to worry about the host/device boundary. "Co-processors have a big issue, a controlling program that already has the data, but has to ship the data over to the co-processor. You buy the memory twice. You have memory on the host that stores the data, then you transfer it to the memory on the card," says Reinders.
"The other thing is integrating the fabric onto the package. Knights Landing will be our first processor that does that. Then driving the latency down on that fabric will allow scaling out," he adds.
Supercomputers spend much of their time analysing huge datasets, a trend that will continue as IoT (Internet of Things) sensors supply more and more data. "By turning [Xeon Phi] into a processor rather than a co-processor, it unleashes our ability to handle huge amounts of data. The processor nature will enable machines to be built with arbitrarily large amounts of memory," Reinders explains.
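The "buy the memory twice" cost that Reinders describes for co-processors can be sketched in a few lines of Python. This is a toy model only: plain lists stand in for host RAM and card memory, and `HostDeviceSim`, `offload_square` and `native_square` are hypothetical names made up for illustration, not any real offload API.

```python
# Toy model: Python lists stand in for host RAM and co-processor card memory.
# All names here are hypothetical and exist only for this illustration.

class HostDeviceSim:
    def __init__(self):
        self.host_mem = {}      # data owned by the controlling program
        self.device_mem = {}    # separate memory on the co-processor card
        self.values_copied = 0  # running count of values moved between the two

    def to_device(self, name):
        # Shipping data to the card duplicates it: one copy on each side.
        self.device_mem[name] = list(self.host_mem[name])
        self.values_copied += len(self.host_mem[name])

    def run_on_device(self, name, fn):
        # The device can only work on data in its own memory.
        self.device_mem[name] = [fn(x) for x in self.device_mem[name]]

    def to_host(self, name):
        # Results have to be shipped back before the host can use them.
        self.host_mem[name] = list(self.device_mem[name])
        self.values_copied += len(self.device_mem[name])


def offload_square(data):
    """Co-processor style: copy in, compute on the card, copy out."""
    sim = HostDeviceSim()
    sim.host_mem["a"] = list(data)
    sim.to_device("a")
    # Values resident at the same time across host and card memory.
    resident = len(sim.host_mem["a"]) + len(sim.device_mem["a"])
    sim.run_on_device("a", lambda x: x * x)
    sim.to_host("a")
    return sim.host_mem["a"], sim.values_copied, resident


def native_square(data):
    """Processor style: one copy of the data, no transfers."""
    return [x * x for x in data], 0, len(data)
```

Both functions return the same squared values, but the offload version also reports twice the resident data and a transfer in each direction, which is the overhead a self-hosted processor avoids.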
Intel's Xeon Phi has far fewer cores than its GPU-based competition, but each core is more capable. "It's a classic computer architecture question. Are you better off with a few fat cores that do everything well, or a bunch of smaller cores? We're doing it in a way that's compatible," says Reinders.
Going from 61 to just 72 cores over three years of development may seem disappointing, but Reinders says the core count is not the only important thing. "Can people figure out how to get three times as much parallelism? Or will they be better off if we became a processor, ran at a higher clock rate, gave high-bandwidth memory, and did out-of-order execution to accelerate the per-thread experience? That's the design trade-off we've made."
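One hedged way to picture this trade-off is Amdahl's law, which the interview does not mention and which is used here purely as an illustration. The sketch below compares a hypothetical design with three times Knights Corner's core count (183 cores) against 72 cores that each run twice as fast. Both the 183 and the 2.0 per-thread figure are invented for the example and are not Intel numbers.

```python
def relative_runtime(serial_fraction, cores, per_thread_speed):
    """Runtime relative to one baseline core, using Amdahl's law.

    serial_fraction  -- share of the work that cannot be parallelised (0..1)
    cores            -- number of cores sharing the parallel part
    per_thread_speed -- speed of one core relative to the baseline core
    """
    parallel_fraction = 1.0 - serial_fraction
    return (serial_fraction + parallel_fraction / cores) / per_thread_speed


def faster_design(serial_fraction, many=(183, 1.0), fat=(72, 2.0)):
    """Return "many" or "fat" for whichever assumed design finishes sooner.

    The default (cores, per_thread_speed) pairs are illustrative assumptions.
    """
    t_many = relative_runtime(serial_fraction, *many)
    t_fat = relative_runtime(serial_fraction, *fat)
    return "many" if t_many < t_fat else "fat"
```

With these made-up numbers, a perfectly parallel job (`faster_design(0.0)`) favours the many-core design, while a job with 5 per cent serial work (`faster_design(0.05)`) favours the fewer, faster cores.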
When do we get Knights Landing, which Intel originally promised for 2015? "We have three systems outside of Intel now," Reinders told the Reg. "Cray has one, Sandia National Laboratories, and CEA in France. They are on A0 (first stepping level) silicon. You'll see a gradual ramp-up as the new year starts. We haven't said when general availability comes."
Why Fortran is great for supercomputing
Supercomputing and concurrent programming are not just about the hardware. Getting results means writing well-optimised code, and that has proved to be the harder problem. Enter Fortran. "While computer science may have abandoned Fortran, it still drives the scientific world," says Reinders. "Fortran is a very good language for scientific programming. Fortran has grown up and some of the arguments against it have been rectified."
He is particularly enthusiastic about Coarray Fortran, which is designed for parallelism. "I consider it one of a few PGAS [partitioned global address space] technologies and they are pretty popular with a certain crowd. In particular, they are very popular on Cray Aries fabric, which is very low latency. When a programmer has to deal with moving data around it's a pain, but you need some mechanism to make sure that it doesn't move around too much. I've seen some beautiful programs written with Coarray Fortran. It's not for everybody or every algorithm, but when it fits, it exposes when you are talking to remote memory and you can make sure you don't do that too often."
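The idea that a PGAS program "exposes when you are talking to remote memory" can be sketched in Python. This is not Coarray Fortran and not a real PGAS library: `PartitionedArray` and `neighbour_sums` are hypothetical names, and the "images" are simulated in a single process. The point is only that every access names the owning image, so remote reads are visible in the code and easy to count.

```python
class PartitionedArray:
    """Toy PGAS-style array: one global index space split into per-image blocks."""

    def __init__(self, n_images, block):
        self.block = block
        self.parts = [[0] * block for _ in range(n_images)]
        self.remote_reads = 0  # how often an image read another image's block

    def set_local(self, me, i, value):
        # An image writes only to the block it owns.
        self.parts[me][i] = value

    def get(self, me, image, i):
        # The caller must name the owning image, so a remote access
        # is explicit at the call site rather than hidden.
        if image != me:
            self.remote_reads += 1
        return self.parts[image][i]


def neighbour_sums(n_images=4, block=8):
    """Each image sums its own block plus one element from its right neighbour."""
    arr = PartitionedArray(n_images, block)
    for me in range(n_images):
        for i in range(block):
            arr.set_local(me, i, me * block + i)

    sums = []
    for me in range(n_images):
        local = sum(arr.get(me, me, i) for i in range(block))  # local reads only
        right = (me + 1) % n_images
        halo = arr.get(me, right, 0)  # the single explicit remote read
        sums.append(local + halo)
    return sums, arr.remote_reads
```

In this sketch each image does eight local reads and exactly one remote read, so four images produce four remote reads in total; keeping that count low is the discipline Reinders is describing.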
One consequence of taking the processor route with Xeon Phi is that it is less distinct from Xeon, Intel's mainstream high-end processor brand. They may eventually become one, says Reinders. "Our biggest competition for Xeon Phi is Xeon. We named them on purpose to reflect that blurring. Two or three decades from now, I can't promise that they'll still be separate products." ®