Memory muddle muddies Intel's Exascale ambitions
To DRAM or not to DRAM? That is one of (many) HPC questions
IDF13 Intel's lofty attempt to make a supercomputer capable of an exaflop by 2020 while consuming a mere 20 megawatts of power is running into major problems due to the pesky laws of physics.
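For a sense of scale, the two headline figures (one exaflop, 20 megawatts) pin down the efficiency Intel has to hit. The sketch below is a back-of-the-envelope calculation using only those two numbers; the variable and function names are ours, not Intel's.

```python
# Efficiency implied by "an exaflop in 20 megawatts".
EXAFLOP = 1e18          # floating-point operations per second
POWER_BUDGET_W = 20e6   # 20 megawatts, the target quoted in the article


def flops_per_watt(flops: float, watts: float) -> float:
    """Sustained operations per second delivered for each watt drawn."""
    return flops / watts


def joules_per_flop(flops: float, watts: float) -> float:
    """Energy available for each floating-point operation."""
    return watts / flops


gflops_per_watt = flops_per_watt(EXAFLOP, POWER_BUDGET_W) / 1e9
picojoules_per_flop = joules_per_flop(EXAFLOP, POWER_BUDGET_W) * 1e12

print(f"{gflops_per_watt:.0f} GFLOPS per watt")
print(f"{picojoules_per_flop:.0f} pJ per operation")
```

That works out to 50 gigaflops per watt, or about 20 picojoules for every operation, with memory traffic and interconnect paid for out of the same budget.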
When Intel announced its exaflop goal back in 2011, the chipmaking giant talked up a variety of technologies it was bringing to bear on the stupendously tough problem of making a computer orders of magnitude more powerful than present-day bit-fiddlers while also making it much more efficient.
Some of the technologies Daddy Silicon has been toying with include ultra-low-power chips via Near-Threshold Voltage (NTV) processor technology, memory stacking with systems such as the Hybrid Memory Cube, and heterogeneous-processing systems using tech like Intel's massively multi-core Xeon Phi platform.
But as Intel has delved further into these technologies it has come up against numerous problems that, while not showstoppers, are going to be tough for the company to surmount.
"There's many new technologies in flight. These are going to have a profound impact on how we build systems," said Intel's chief architect for exascale systems Al Gara in a speech at the Intel Developer Forum.
The greatest wildcard is the type of memory Intel can use, he said. And the jury is very much out.
"When I look at directions we could go with HPC and how memory plays, I see it splitting into two directions: one is where we're stuck with DRAM and have to live with DRAM a long time," he said, the other is "if one of these [new] memory technologies really does evolve, then things change dramatically".
These newer technologies include things like spin-transfer torque memory, nanomechanical RAM, phase-change memory, and other emerging non-volatile memory technologies. All of them hold the promise of a 5 to 10X improvement on DRAM performance, and some have new compute possibilities as well.
The rub is that Intel isn't sure whether these technologies will mature in time for it to pick a new memory standard, work to understand it and program for it, and then design new logic to get the best out of it.
We asked Gara how long he thought it might be until Intel could make a bet on either DRAM or one of the up-and-comers, and he said they might know "over the next year and a half to two years. You're going to see them become real in that timeframe. They won't be what we want for a DRAM replacement at that point, [but] that's when you have to check."
"Until those technologies get into the simpler or easier markets to enter we won't really know".
This means Intel's supercomputer dream is currently defined by two very different possibilities: one is that DRAM remains the best way to build systems. This will be tricky, as "if we're stuck with DRAM then the problem is because of the increase in performance of compute we're going to continue to drop memory capacity for performance," Gara says. "We're going to be driven to a very aggressive threading scenario."
This will necessitate the creation of various new programming methods that are implicitly parallel, and supported by high-speed interconnect and on-chip data shuttling systems such as photonic interconnects, to make the most of this low-memory high-compute environment.
An alternate world is one where one of these new memory technologies comes through, and at that point things get radically different. If spin-torque memory were to arrive, for example, then computation could be done in a very different way.
"We can use the magnetic properties of the material," Gara says. This allows you to use the physical properties of the new memory technology to stand in for typical logic gates, and thereby be able to design circuits that are about 25 percent smaller, he said.
However, if this form of memory comes through, then Intel will have much work to do to get the most out of it. "Those [filing systems] are all optimized for when access times are in the tens of milliseconds, but [with non-volatile] now they're in the tens of nanoseconds," Intel's recently departed Lab chief Justin Rattner told us when we asked him about this at IDF a year ago.
Though memory poses some difficult problems for Intel, the chip giant is more hopeful in other areas such as photonics, which are coming on strong.
At the moment the company is using four distinct wavelengths of light to generate 50Gbps in interconnect capacity, and is looking at moving to eight to get to 100Gbps. Ultimately, Intel thinks if it can push the number of wavelengths and efficiency up it could get to a terabit.
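The wavelength arithmetic is easy to check. The sketch below assumes, purely for illustration, that every wavelength keeps carrying the same data rate as today (50Gbps over four wavelengths); Intel also hopes to raise efficiency, so the real wavelength count for a terabit could well be lower. Function names are hypothetical.

```python
import math

LINK_GBPS = 50.0   # from the article: four wavelengths give 50Gbps
WAVELENGTHS = 4

# Assumption: each wavelength carries an equal share of the link.
PER_WAVELENGTH_GBPS = LINK_GBPS / WAVELENGTHS


def link_capacity_gbps(wavelengths: int,
                       rate_gbps: float = PER_WAVELENGTH_GBPS) -> float:
    """Aggregate capacity if every wavelength runs at rate_gbps."""
    return wavelengths * rate_gbps


def wavelengths_needed(target_gbps: float,
                       rate_gbps: float = PER_WAVELENGTH_GBPS) -> int:
    """Smallest number of wavelengths that reaches target_gbps."""
    return math.ceil(target_gbps / rate_gbps)


print(PER_WAVELENGTH_GBPS)          # Gbps per wavelength today
print(link_capacity_gbps(8))        # eight wavelengths
print(wavelengths_needed(1000.0))   # wavelengths for one terabit per second
```

At a constant 12.5Gbps per wavelength, eight wavelengths give the 100Gbps Intel mentions, and a terabit would take 80 of them, which is why the per-wavelength rate has to improve as well.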
Unfortunately "there's no free lunch here" because photonics cost more energy than copper. "While it has enormous advantages for a lot of system regions, power is one of the key things we have to keep an eye on," Gara said.
But all of this bandwidth, combined with faster memory mediums (or slightly better DRAM, depending), means that Intel needs to create better CPUs as well. In this area, it is focusing on thread scaling, and is "maniacally chasing" improvements in areas such as false cache sharing, start-up overheads, synchronization overheads, and load/execution imbalances.
Improvements can't come from upping the clock rate. "We've topped out at frequency," he says. Even if Intel can increase it a bit, that wouldn't help: "If I suddenly gave you a terahertz processor and the same memory system you wouldn't get dramatic speedups."
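Gara's terahertz point can be illustrated with an Amdahl's-law-style estimate. This is our simplification, not a model Intel presented: it assumes a fixed share of runtime is spent waiting on memory and that only the remainder speeds up with the clock. The 3GHz baseline and the memory fractions are hypothetical.

```python
def overall_speedup(memory_fraction: float, compute_speedup: float) -> float:
    """Amdahl-style estimate: only the non-memory share of runtime gets faster."""
    if not 0.0 <= memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be between 0 and 1")
    if compute_speedup <= 0.0:
        raise ValueError("compute_speedup must be positive")
    return 1.0 / (memory_fraction + (1.0 - memory_fraction) / compute_speedup)


# Hypothetical jump from a 3GHz core to a 1THz core, same memory system.
CLOCK_BOOST = 1e12 / 3e9

for memory_fraction in (0.1, 0.5, 0.9):
    speedup = overall_speedup(memory_fraction, CLOCK_BOOST)
    print(f"{memory_fraction:.0%} of time on memory -> {speedup:.2f}x faster")
```

Under these assumptions a code that spends half its time waiting on memory gets less than a 2x speedup from a clock that is more than 300 times faster.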
One alternative would be a constrained programming model that could allow for simpler cores with higher frequencies, he said. This, combined with voltage scaling – last year at IDF Intel demonstrated a near-threshold voltage processor which sipped power at a very low rate – would let Intel do this without seeing the power climb. "The difficulty is as you lower voltage you also drop frequency".
To this hack's mind, Intel's big problem is that as it runs to meet its goal, it is perpetually being thwacked in the face by the fundamental laws of physics, which rear up in the materials it is using or the ways it wants to shuttle information. THWACK, goes voltage dispersal as you step down through finer chip processes. BANG, goes the speed of light as you try to use photonics. And so on.
But even here Intel is thinking of workarounds. One way to stuff more smarts into a circuit could be to use the time intervals at which information is squirted around the logic to carry computation, Gara said.
"Energy efficiency is just a function of how long the wires are, and how many you have. In reality we're not using time – you can encode information in time!" he points out. "If I sent a signal across a single wire, but time when I am transitioning it – that's how I'm encoding information. Now it costs me frequency, but this approach allows me to get to energy efficiency numbers I couldn't have gotten. The question is whether this works for logic?"
Intel does not know, but this is one of the many areas it is exploring as it tries to sidestep certain apparently impassable limits in its quest for the exascale system.
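One way to picture encoding information in time is a pulse-position-style scheme, where the moment at which a single wire toggles carries several bits. The sketch below is our illustration of that general idea, not Intel's design; the slot model and function names are assumptions.

```python
from typing import List, Tuple


def encode(bits: str, bits_per_symbol: int) -> List[int]:
    """Map each group of bits to the time slot in which the wire toggles."""
    if len(bits) % bits_per_symbol:
        raise ValueError("bit string length must be a multiple of bits_per_symbol")
    return [int(bits[i:i + bits_per_symbol], 2)
            for i in range(0, len(bits), bits_per_symbol)]


def decode(slots: List[int], bits_per_symbol: int) -> str:
    """Recover the bit string from the observed toggle slots."""
    return "".join(format(slot, f"0{bits_per_symbol}b") for slot in slots)


def cost(bits: str, bits_per_symbol: int) -> Tuple[int, int]:
    """Return (wire transitions, time slots consumed) for the message."""
    symbols = len(bits) // bits_per_symbol
    transitions = symbols                        # one toggle per symbol
    time_slots = symbols * 2 ** bits_per_symbol  # each symbol waits out 2^k slots
    return transitions, time_slots


message = "10110100"
slots = encode(message, 4)
print(slots, decode(slots, 4), cost(message, 4))
```

In this toy model the eight-bit message needs only two transitions but occupies 32 time slots, where plain one-bit-per-slot signalling would need eight slots and up to eight transitions. That is the trade Gara describes: fewer wire toggles at the cost of frequency.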
But if Intel makes it to an exaflop within its timeframe, then the work is going to start all over again, Gara says, because according to Landauer's principle – a theory that puts the lower bound on the cost of computation – Intel has a long way to go.
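Landauer's bound is k·T·ln 2 joules per erased bit. The sketch below assumes room temperature (300K, our assumption) and works backwards from the 16W-per-exaflop figure Gara cites to see how many bit erasures per operation it implies. That per-flop count is our inference, not a number he gave.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23   # exact value in the SI definition


def landauer_limit_joules(temperature_k: float = 300.0) -> float:
    """Minimum energy to erase one bit at the given temperature."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


EXAFLOP = 1e18
YOTTAFLOP = 1e24
QUOTED_WATTS_PER_EXAFLOP = 16.0    # Gara's figure
TARGET_WATTS = 20e6                # Intel's 20 megawatt goal

joules_per_flop = QUOTED_WATTS_PER_EXAFLOP / EXAFLOP
implied_erasures_per_flop = joules_per_flop / landauer_limit_joules()


def power_at_limit_watts(flops: float) -> float:
    """Power needed at the quoted per-operation energy."""
    return flops * joules_per_flop


print(f"{landauer_limit_joules():.3e} J per bit at 300K")
print(f"{implied_erasures_per_flop:.0f} bit erasures per flop implied")
print(f"{power_at_limit_watts(YOTTAFLOP) / 1e6:.0f} MW for a yottaflop")
print(f"{TARGET_WATTS / QUOTED_WATTS_PER_EXAFLOP:.2e}x headroom over the 20MW goal")
```

On these assumptions the 2020 target would still sit more than a million times above the theoretical floor.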
"It turns out for an exaflop you need 16W, which is interesting because that's what the brain is, for 16MW you should be able to do one yottaflop from an information-theory perspective," he said with a wry grin. Just don't tell upper management. ®