ISC'11 Intel doesn't seem to be in a hurry to get its own line of Knights coprocessors for HPC applications into the field, and maybe it doesn't have to be.
To be sure, Nvidia is stealing most of the oxygen in the conversation about coprocessors for accelerating supercomputer applications with its Tesla family of GPU accelerators, most recently with the launch of the Tesla M2090 fanless coprocessors, which are based on its "Fermi" GPUs.
Advanced Micro Devices is getting some, but not very much, traction with its FireStream coprocessors, built from its "Cypress" GPU chips. It is even retrofitting its FirePro workstation graphics cards for servers to support GPU acceleration: the fanless FirePro V7800P card, announced last month, is an attempt to blunt Nvidia's attack on the HPC centers of the world.
Thus far, Intel's Many Integrated Core (MIC) is little more than a research project. Intel picked up the remnants of the failed "Larrabee" graphics card project and rechristened it Knights and put it solely in the service of the king of computing, the CPU.
The Larrabee graphics coprocessor made a brief debut at the SC09 supercomputing trade show in November 2009, hitting one teraflops running the SGEMM single precision, dense matrix multiply benchmark when Intel overclocked it.
A few weeks later, Intel said that Larrabee was being relegated to research status, and last May, ahead of the International Supercomputing Conference 2010 (ISC) event in Hamburg, the chip maker said it was not going to enter the discrete graphics card business to take on Nvidia and AMD after all. And thus was born the "Knights" family of Many Integrated Core (MIC) discrete coprocessors for HPC applications.
A year ago, at ISC, Intel was talking up the MIC architecture, and at this year's ISC, which takes place this week, Intel has taken a few baby steps toward getting the first MIC coprocessors to market.
In a briefing with the press ahead of ISC, Anthony Neal-Graves, general manager of workstations and MIC computing at Intel, confirmed that the first MIC device, the "Knights Ferry" development platform, is being ramped as planned this year and that the first MIC commercial coprocessor, called "Knights Corner", will be launched using Intel's 22 nanometer Tri-Gate process technology.
That same process is being used for future Xeon and Itanium processors, including the "Poulson" Itaniums and the "Ivy Bridge" Xeons, both due in 2012. It is not clear if Intel will roll out the MIC coprocessors ahead of the Poulson or Ivy Bridge chips, or get them out behind them, and the chip giant had no intention of clearing this up at ISC this week.
As the Top 500 list of supercomputers has shown for the past two years, advances in hybrid computing tools and the desire to do more computing for less money, using less electricity and generating less heat, are calling into question the use of CPU-only parallel machines to run giant simulations.
But this market is far from mature yet, so Intel probably has enough time to get MIC coprocessors into the field before Nvidia gets a nearly insurmountable lead in coprocessors. And that is for two reasons: Intel has an economic advantage with its vast chip-making operations, and unlike GPU coprocessors the Knights coprocessors run plain old x64 code.
"At the end of the day, folks will go to wherever they can get maximum performance for the least amount of effort," Neal-Graves explained on the call.
You don't have to have a supercomputer and run a big simulation to see that things tend toward the lowest energy state in the universe. Intel is counting on the combination of its Fortran, C, and C++ compilers, which are popular among HPC shops, the MIC coprocessors, with their x64 instruction set, and its manufacturing prowess to allow it to at least catch up to Nvidia with its Tesla coprocessors and CUDA programming environment.
Given all of this, maybe Intel doesn't have to be in a hurry. Or, to say it another way, to get the kind of performance Intel needs to demonstrate with MIC coprocessors to pull even with Nvidia next year, maybe it cannot go any faster because of the work it needs to do in its compilers and in perfecting the 22 nanometer processes.
Conceptually, here's what a MIC coprocessor looks like:
The Many Integrated Core architecture block diagram
The MIC chips put multiple processing units consisting of an x64 core, a vector processor, and some cache memory into a module, and then cookie-cutter them onto the chip with a fast ring interconnect that keeps the caches for each core coherent (so they can share data quickly and function more or less like a baby parallel supercomputer).
The MIC chip has a superscalar x64 core (without the out-of-order execution of Xeons, so akin to the Atom chip in some respects) and a 512-bit vector math unit that can do 16 floating point operations per clock with single precision math.
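The figure of 16 operations per clock follows directly from the vector width: a 512-bit register holds sixteen 32-bit single precision values. Here is a minimal Python sketch of that arithmetic. The function names are made up for illustration, the double precision lane count is simple division rather than anything Intel has stated, and the core count and clock speed in the last line are placeholder numbers, not Intel specifications.

```python
def vector_lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector register."""
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the vector width")
    return vector_bits // element_bits


def peak_gflops(cores: int, clock_ghz: float, lanes: int, flops_per_lane: int = 1) -> float:
    """Theoretical peak: every lane on every core retires flops_per_lane each clock."""
    return cores * clock_ghz * lanes * flops_per_lane


# From the article: 512-bit vector unit, single precision (32-bit) math.
sp_lanes = vector_lanes(512, 32)

# Assumption, by arithmetic only: 64-bit doubles would halve the lane count.
dp_lanes = vector_lanes(512, 64)

# Placeholder core count and clock, chosen only to show how the numbers scale.
example_peak = peak_gflops(cores=32, clock_ghz=1.0, lanes=sp_lanes)

print(sp_lanes, dp_lanes, example_peak)
```

The peak figure is a ceiling, not a benchmark result: it assumes every vector lane on every core is busy on every cycle, which real applications rarely manage.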