Hot Chips At the Hot Chips symposium in Silicon Valley on Monday, IBM and Intel each revealed a few more details about some upcoming processors of theirs.
From Big Blue, an addition to the Power9 family, dubbed a "bandwidth beast," and from Chipzilla, a Nervana neural-network processor code-named Spring Crest.
I got the Power
Let’s start with IBM. It revealed the latest and final design in its Power9 family: the Advanced IO, or AIO, which is due out in 2020. This 14nm part was supposed to ship this year, and when it eventually shows up, it will sport up to 24 SMT4 cores and a maximum sustained system memory bandwidth of 650GB/s, according to Big Blue. We're also told it supports OpenCAPI 4.0. You can see how it compares to its Power9 siblings below, in a slide from IBM's Hot Chips presentation. Note also the upcoming Power10, now slated for 2021, which will use PCIe 5.0 and, well, not much else has been revealed about it yet.
Below are the headline specs of the as-yet-unlaunched Power9 AIO. It will come in 12 and 24-CPU-core variants, we're told, with up to 120MB of L3 cache, 48 lanes of PCIe 4.0, up to 16 lanes of CAPI 2.0 connectivity, the usual on-die compression and encryption acceleration, and NVLink support for interfacing with Nvidia GPUs to speed up parallel number crunching. You can use up to 16 P9 AIO chips in one SMP system, and each crams eight billion transistors into a 728mm2 die, IBM says.
Here's where it gets more interesting. The Power9 AIO uses direct-attached RAM that talks the Open Memory Interface, or OMI, which is based on OpenCAPI. This protocol uses 25.6GHz signalling, and tops out at 650GB/s. You can plug OMI RAM straight into the P9 AIO, or use Microchip's just-announced controller to interface traditional DDR DRAM DIMMs with OMI. The load-to-use latency is 5 to 10ns with RDIMMs and about 4ns with LRDIMMs when using an OMI-to-DDR4 controller, Big Blue reckons. OMI appears to be a follow-on from IBM's Centaur memory buffer technology.
The benefit of accessing system RAM over OMI, according to IBM, is that bandwidth is significantly higher versus plain-old DDR DIMMs, and you can pack more RAM capacity into your box, which is ideal for in-memory databases and analytics, AI processing, and that sort of thing. However, it appears you will need to buy OMI RAM, or use DDR DIMMs with an OMI-compatible controller.
Finally, here's how OpenCAPI 4.0, supported by the P9 AIO, shapes up. It is pretty close to the OpenCAPI 3.0 used by its siblings. OpenCAPI allows processor cores to coherently interface with accelerators and IO devices.
As we said, this design isn't going to hit general availability until next year, and we won't know more practical details – such as pricing, power draw, and clock speeds – until then. Consider this a little tease from Big Blue on what's coming up in Power land. IBM's Power9 chips power America's Summit supercomputer, among other big beasts.
Chipzilla's Spring Crest aka NNP-T, previously known as the NNP-L
Intel, meanwhile, showed off at Hot Chips its processor code-named Spring Crest, also known as the Neural Network Processor for Training, or NNP-T, developed by its Nervana AI hardware team. In typical Chipzilla fashion, the part has a mildly confusing history: Spring Crest was first known as the NNP-L, was due to land in 2019, got turned into a development platform, and has since been renamed the NNP-T with a 2020 general shipping date, though tier-one cloud giants may get hold of units by the end of this year. The Nervana hype train continues.
The NNP-T is designed to train machine-learning models, the intensive part of developing an artificially intelligent system: this is when most of the heavy number crunching and vector math operations take place, as the software pores over batches of information to learn patterns in the data.
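To make that concrete, here's a deliberately tiny, generic sketch of what "training" means – nothing to do with Intel's actual software, just a toy model learning one parameter by streaming over mini-batches and nudging it down the loss gradient, the pattern an NNP-T would run at vastly larger scale:

```python
import random

# Toy "training": learn w in y = w * x from noisy samples.
# (A generic stand-in for the tensor math a training chip performs.)
random.seed(0)
data = [(x, 2.0 * x + random.gauss(0, 0.01)) for x in range(100)]

w = 0.0        # model parameter, initially wrong
lr = 0.0001    # learning rate
for epoch in range(20):
    random.shuffle(data)
    for i in range(0, len(data), 10):       # mini-batches of 10 samples
        batch = data[i:i + 10]
        # gradient of mean squared error wrt w: 2 * mean(x * (w*x - y))
        grad = sum(2 * x * (w * x - y) for x, y in batch) / len(batch)
        w -= lr * grad                      # gradient-descent step

print(round(w, 3))  # w should end up close to the true value, 2.0
```

The per-batch gradient step is the workload that dominates training, which is why the NNP-T's tensor processors and fat memory bandwidth matter.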
Spring Crest, as it stands today according to Intel, will be fabricated by Intel's rival manufacturer, TSMC. The component can hit 119 trillion operations per second (TOPS) using a mixture of BFloat16 with FP32 accumulation, it is claimed. It has 27 billion 16nm transistors on 680mm2 of silicon, 24 tensor processors arranged in a grid, a core frequency of up to 1.1GHz, 60MB of on-die memory, 4 x 8GB of HBM2-2000 RAM stacked on top, on-die management CPU and serial communications, a x16 PCIe 4 interface, and more, all drawing 150 to 250W total. This all sits on a 1,200mm2 interposer in a 60mm x 60mm 2.5D package.
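For the curious: bfloat16 keeps float32's eight-bit exponent but only seven stored mantissa bits, which is why accumulating long sums in FP32 matters. Here's a rough Python sketch – plain IEEE-754 bit twiddling with simple truncation rounding, a simplification of real hardware, and not Intel's implementation – showing why a bfloat16 accumulator loses small contributions that a wider accumulator keeps:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round a float32 value to bfloat16 by keeping only the top 16 bits
    (sign, 8-bit exponent, 7-bit mantissa) of its IEEE-754 encoding.
    Uses truncation for simplicity; real hardware rounds to nearest."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# Sum 10,000 tiny products, as a dot product in a neural network would.
products = [to_bfloat16(0.001) * to_bfloat16(0.001) for _ in range(10_000)]

acc_bf16 = 0.0
for p in products:
    # bfloat16 accumulator: once the running total is big enough,
    # each new tiny product falls below its precision and is dropped
    acc_bf16 = to_bfloat16(acc_bf16 + to_bfloat16(p))

acc_fp32 = sum(products)  # wide accumulation keeps every contribution

print(acc_bf16, acc_fp32)
```

The bfloat16-only total stalls orders of magnitude short of the true sum of roughly 0.01, which is the motivation for doing multiplies in BFloat16 but accumulating in FP32.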
Below are Intel's key slides from its Hot Chips NNP-T talk on Monday:
Each tensor processing unit features a microcontroller for directing what is essentially a math coprocessor; each TPU has a limited instruction set, though this can be extended with custom microcontroller instructions, apparently. There are numerous frameworks for writing machine-learning software, and Spring Crest supports some of the most popular, including Google’s TensorFlow as well as PyTorch and Baidu’s PaddlePaddle. Intel will provide a software stack for talking to and controlling the devices.
We'll write more about NNP-T, or whatever it's called next, once it actually starts shipping for us mere mortals to buy. ®
And finally... Upstart Cerebras, with $200m of funding under its belt, has been on a publicity tour, briefing selected mainstream journos about its bonkers iPad-sized TSMC-fabricated 16nm 46,000mm2 single-die processor. It supposedly features up to 400,000 cores focused on machine-learning math processing, 1.2 trillion transistors, 100Pbps of fabric bandwidth, and 18GB of on-chip RAM moving data at 9PB/s.
Dubbed the world's largest AI chip, it is not due to arrive until, well, whenever it eventually does. There are no prices nor many other details. You also need to put the thing into its own special box, as it requires substantial custom cooling gear, and it won't start shipping to selected customers until later this year, though some prototypes are out there already, apparently.
Meanwhile... Our sister site The Next Platform has summarized AMD CEO Lisa Su's Hot Chips keynote speech.