Having swallowed its pride and started again with 10nm chips, Intel teases features in these 2019-ish processors
3D stacks of Arm-like core clusters, APIs, and more coming some time soon
"We have humble pie to eat right now, and we're eating it," Murthy Renduchintala, Intel's chief engineering officer, said yesterday. "My view on [Intel's] 10nm is that brilliant engineers took a risk, and now they're retracing their steps and getting it right."
Record scratch. Freeze frame. You're probably wondering how Chipzilla got into this situation. On Tuesday, during a briefing on its future chip architecture plans in a $22m Silicon Valley mansion, Intel execs played down the holdup with 10nm, and tried emphasizing that there's more to processors than transistor sizes.
First, let's catch you up on the history. In 2013, Intel claimed it could produce chips with a 10nm lithography process by 2015, then revised that timetable to 2016, and then late 2017. Then late 2019 at the earliest, or perhaps early 2020 for most system buyers.
In mid-2018, Intel limped out a wimpy dual-core 10nm Core i3 processor, aimed at low-power Chinese laptops, and codenamed Cannon Lake, mainly so it could say it was shipping some silicon at that node. However, the Cannon Lake family is now effectively dead, and the process engineers have gone back to the drawing board to make their fabrication technology work properly for the mass production of future half-decent 10nm processors.
As a sign they pretty much had to start over, the Cannon Lake Core i3's integrated GPU was disabled because it didn't work at 10nm.
You see, Intel managers made some brave decisions on how they were going to get down to 10nm from 14nm, and ended up in a dead end, resulting in low yields of working chip dies. The cause is likely multi-patterning and choices in metal layer layout. Fixing those issues involves redoing substantial amounts of design work and tooling, which is why people won't get their hands on proper 10nm Intel processors until late 2019 or early 2020, some five years later than expected.
While the world waited for Chipzilla to swallow its pride, and change course, a couple of things happened. One, Intel squeezed every last drop of performance out of its 14nm process technology, comfortably tiding the biz over. Don't forget, Chipzilla is a money-printing machine, still. And two, other chip factories, such as TSMC and Samsung, caught up.
Intel used to have a commanding lead over rival silicon fabricators, but that lead is in danger of being squandered. What TSMC and pals market as 10nm chips pretty much match Chipzilla's 14nm in terms of transistor density and performance. Meanwhile, TSMC and Samsung started producing 7nm mobile parts this year, and both foundries promise to ramp up production of high-performance 7nm server and desktop components in 2019 and 2020, just as Intel hopes to get a handle on 10nm.
If the non-Intel 7nm is as good as Intel's 10nm, then they'll all be on a level pegging. That means AMD, Qualcomm, Nvidia, and other TSMC and Samsung foundry customers, will match Intel in terms of transistor density. And that's not great news for Intel.
This is a case of marketing coming back to bite Chipzilla in the ass. With TSMC, for one, touting 10nm that is on a par with Intel's 14nm, while Intel struggled with 10nm, Chipzilla had a PR problem. Which is why in early 2017, Intel appealed for another way to measure up fabrication technologies. If only, it said, semiconductor makers could express transistor densities in terms of the number of NAND gate cells and standard flip-flop blocks per square millimeter, Intel could reset the industry's high-score table, and show its current technology matched or beat TSMC and friends in terms of the number of transistors crammed on (presumably working) dies.
After all, 14, 10 and 7nm are just marketing terms. The gate dimensions aren't actually 14, 10, or 7nm. For example, an Intel 14nm FinFET gate has a length of 20nm. Funnily enough, rather than adopt the aforementioned logic cell equation, Intel's competitors stuck to the nanometre numbering, using 10nm and 7nm as a stick to beat the dominant player, a virtual monopoly in some markets.
For what it's worth, Intel is toiling away on 7nm and 5nm in its labs, separate from its 10nm work. Different teams worked on the 7nm and 10nm lithography, so the 10nm debacle hasn't necessarily dragged 7nm down with it.
"Seven nanometers, for us, is a separate team and a largely separate effort," Renduchintala told investors earlier this month on a conference call.
"And we are quite pleased with our progress on 7, in fact very pleased with our progress on 7, and I think that we have taken a lot of lessons out of the 10-nanometer experience as we defined that and defined a different optimization point between transistor density, power and performance and schedule predictability.
"As you look at 7-nanometer, for us this is really now a point in time where we will get EUV back into the manufacturing matrix, and therefore, I think, that will give us a degree of back to the traditional Moore’s Law cadence that we were really talking about. 14 and 10 were really about double patterning and quad patterning in the absence of EUV."
Fast-forward to this week, and Renduchintala admitted Chipzilla had been too "stubborn" to start over with 10nm, and had persevered down a rabbit hole rather than admit defeat and have a fresh stab at what is admittedly a non-trivial problem. The tech titan was too tied to its process nodes, he said, building its technology around transistor sizes even when it didn't make sense.
Smaller gate sizes mean more gates per square millimeter, which means more CPU cores and performance, or smaller dies for the same performance. There are other benefits, too, such as reduced power consumption and heat dissipation. It's just generally a good idea. But on systems where electrical power isn't constrained, and cooling isn't a problem, such as desktop PCs, it doesn't make sense to rush into a smaller transistor size when there are other ways to increase performance. Speaking of which...
New old ideas
And so, just as in 2017, when Intel cried foul over other factories not, in its view, playing fair with nanometer marketing in order to play down its own 10nm woes, in late 2018 Intel is again playing down those woes by insisting there's more to life than gate size, a characteristic it used to bang on about when it was the clear leader.
It's like watching the kid who usually wins every track circuit race realize they may lose their next race, and then argue it's not crossing the line first that matters: it's how comfortable your shoes are, and whether or not you're wearing a fitness smartwatch, or have lots of fans cheering you on, that decides the winner.
To be clear, Intel is still emphasizing the importance of gate size: "No transistor left behind," quipped Raja Koduri, Intel's chief architect. It insists lithography is crucial, and it hasn't given up, it has just identified five other pillars that it cares about, and thinks you should, too. And as if you have a choice: those 10nm Intel CPUs are at least a year away.
The six pillars are, in no particular order: architecture, lithography process, software, memory, interconnects, and security. These should help Intel target an addressable market of more than $300bn by 2020, it is hoped.
Here's a summary of the main upcoming promised technologies in these areas – we'll go over them in more detail later this month once we've had a chance to digest the changes, and before everyone legs it for Christmas.
Arm-like big.LITTLE architecture
In the second half of 2019 or early 2020, at the request of an unnamed customer, Intel expects to launch a multi-core processor that pairs large and small x86 CPU cores, much as modern smartphone chips pair beefy Arm-compatible cores with smaller power-efficient ones, an approach Arm calls big.LITTLE. Intel hopes to produce a 10nm fanless system-on-chip with a set of high-performance cores to take on intensive workloads, and a set of low-power Atom cores to run all other code. Open the Start menu on Windows, and one of the performance cores will fire up in anticipation of you starting an application, like Photoshop. The chip should draw 2mW in standby mode, and will be generally available, we're told.
One catch is that the performance and Atom cores don't implement exactly the same architecture beyond Intel's base 32-bit and 64-bit x86 ISA. For one thing, Atoms lack the vector math extensions of the larger cores, so the underlying operating system will have to juggle applications carefully, and avoid crashing software by migrating it mid-run to a core that lacks instructions the code is using.
This chip will be built using a 3D packaging technology, dubbed Foveros, that stacks dies on top of each other to form a system-on-chip. The base 22nm die houses all the I/O, SRAM, and power-control circuits, while dies of 10nm compute cores sit on top, along with memory, storage, and any GPUs and other accelerators. Foveros thus lets Intel mix and match components as needed.
Crucially, the layers are connected using a high-speed interconnect, we're told, allowing data to move up and down the stack with ease.
Another CPU, another codename
Intel teased another upcoming 10nm processor family, this one codenamed Sunny Cove, which, we're told, will have: larger buffers and caches to handle data-heavy workloads; new vector instructions to run operations such as AES encryption, bit manipulation, and SHA hashing in parallel; and similar tweaks and enhancements to reduce information-processing latency. Sunny Cove CPUs will form the basis of next-gen Xeon and Core products. Being a 10nm component, this will land some time in the future.
Chipzilla wants to introduce an abstraction layer called OneAPI, which means software developers can craft code that makes the best use of available hardware acceleration in the host machine's CPUs, GPUs, FPGAs, and AI accelerators. Rather than tailor an app for, say, a graphics processor, you write it to the OneAPI specification, so that when it runs on a given box, the abstraction layer directs work to the best available compute resource. This is supposed to reduce development friction, while neatly keeping programmers tied to Intel's platforms. It is due in 2019.
Speaking of interfaces, Intel hopes to release what it calls its Deep Learning Reference Stack, for running artificial intelligence algorithms and models on Xeon server-grade systems.
Integrated GPUs no longer second-class citizens
Intel is still committed to producing a discrete graphics processor by 2020, but also wants to make its integrated GPUs not suck in the meantime. These built-in graphics engines should be just as good as discrete GPUs, Koduri said, and so Intel is working on 11th-generation iGPUs for 10nm processors in 2019. These should top 1 TFLOPS of performance, we're told.
And keep an eye out on our high-performance computing sister site, The Next Platform, for Timothy Prickett Morgan's take on Intel's latest goals and promises. ®