IBM, Intel tease 2020's specialist chips: Power9 'bandwidth beast' – and Spring Crest Nervana neural-net processor

Plus, Cerebras hypes up AI-focused '400,000-core die the size of an iPad'


Hot Chips At the Hot Chips symposium in Silicon Valley on Monday, IBM and Intel each revealed a few more details about some upcoming processors of theirs.

From Big Blue, an addition to the Power9 family, dubbed a "bandwidth beast," and from Chipzilla, a Nervana neural-network processor code-named Spring Crest.

I got the Power

Let’s start with IBM. It revealed the latest and final design in its Power9 family: the Advanced IO, or AIO, due out in 2020. This 14nm part was supposed to ship this year; when it eventually arrives, it will sport up to 24 SMT4 cores and a maximum sustained system memory bandwidth of 650GB/s, according to Big Blue. We're also told it supports OpenCAPI 4.0. You can see how it compares to its Power9 siblings below, in a slide from IBM's Hot Chips presentation – click to enlarge any picture. Note also the upcoming Power10, now slated for 2021, which will use PCIe 5.0; beyond that, there are few other details yet.

Below are the headline specs of the as-yet-unlaunched Power9 AIO. It will come in 12 and 24-CPU-core variants, we're told, with up to 120MB of L3 cache, 48 lanes of PCIe 4.0, up to 16 lanes of CAPI 2.0 connectivity, the usual on-die compression and encryption acceleration, and NVLink support for interfacing with Nvidia GPUs to speed up parallel number crunching. You can use up to 16 P9 AIO chips in one SMP system, and each crams eight billion transistors into a 728mm2 die, IBM says.

Here's where it gets more interesting. The Power9 AIO uses direct-attached RAM that speaks the Open Memory Interface, or OMI, which is based on OpenCAPI. The protocol uses 25.6GHz signalling, and tops out at 650GB/s. You can plug OMI RAM straight into the P9 AIO, or use Microchip's just-announced controller to interface traditional DDR DRAM DIMMs with OMI. The load-to-use latency is 5 to 10ns with RDIMMs and about 4ns with LRDIMMs when using an OMI-to-DDR4 controller, Big Blue reckons. OMI appears to be a follow-on from IBM's Centaur memory buffer technology.

The benefit of accessing system RAM over OMI, according to IBM, is that bandwidth is significantly higher versus plain-old DDR DIMMs, and you can pack more RAM capacity into your box, which is ideal for in-memory databases and analytics, AI processing, and that sort of thing. However, it appears you will need to buy OMI RAM, or use DDR DIMMs with an OMI-compatible controller.

Finally, here's how OpenCAPI 4.0, supported by the P9 AIO, shapes up. It is pretty close to the OpenCAPI 3.x used by its siblings. OpenCAPI allows processor cores to coherently interface with accelerators and IO devices.

As we said, this design isn't going to hit general availability until next year, and we won't know more practical details – like pricing, power draw, and clock speeds – until then. Consider this a little tease from Big Blue on what's coming up in Power land. IBM's Power9 chips power America's Summit supercomputer among other big beasts.

Chipzilla's Spring Crest aka NNP-T, previously known as the NNP-L

Intel showed off at Hot Chips its processor code-named Spring Crest, also known as the Neural Network Processor for Training or NNP-T, developed by its Nervana AI hardware team. In typical Chipzilla fashion, the part has a mildly confusing history: Spring Crest was first known as NNP-L, was due to land in 2019, got turned into a development platform, and has since been renamed to the NNP-T with a 2020 general shipping date, though tier-one cloud giants may get hold of them by the end of this year. The Nervana hype train continues.

The NNP-T is designed to train machine-learning models, which is the intensive part of developing an artificially intelligent system: this is when most of the heavy number crunching and vector math operations take place, as the software pores over batches of information to learn patterns in the data.

Spring Crest, as it stands today according to Intel, will be fabricated by Intel's rival manufacturer, TSMC. The component can hit 119 trillion operations per second (TOPS) using BFloat16 multiplication with FP32 accumulation, it is claimed. It has 27 billion 16nm transistors on 680mm2 of silicon, 24 tensor processors arranged in a grid, a core frequency of up to 1.1GHz, 60MB of on-die memory, 4 x 8GB of HBM2-2000 RAM stacked on top, on-die management CPU and serial communications, a x16 PCIe 4.0 interface, and more, all drawing 150 to 250W total. This all sits on a 1,200mm2 interposer in a 60mm x 60mm 2.5D package.
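For the curious, the BFloat16-multiply, FP32-accumulate arithmetic Intel describes can be sketched in a few lines of Python. This is purely illustrative: it simulates bfloat16 by truncating float32 values (real hardware typically rounds to nearest), and the function names are ours, not Intel's.

```python
import numpy as np

def to_bfloat16(x):
    # bfloat16 keeps float32's 8-bit exponent but only 7 mantissa bits.
    # Simulate it by zeroing the low 16 bits of each float32 value
    # (simple truncation; actual hardware usually rounds to nearest).
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & 0xFFFF0000).view(np.float32)

def dot_bf16_fp32(a, b):
    # Multiply bfloat16-rounded inputs, but keep the running sum in
    # float32 so precision is not lost across many accumulations.
    a16, b16 = to_bfloat16(a), to_bfloat16(b)
    acc = np.float32(0.0)
    for x, y in zip(a16, b16):
        acc = np.float32(acc + np.float32(x) * np.float32(y))
    return acc
```

The appeal of this scheme is that bfloat16 halves memory traffic versus float32 while keeping the same dynamic range, and the float32 accumulator avoids the rounding drift a pure 16-bit sum would suffer over long dot products.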

Below are Intel's key slides from its Hot Chips NNP-T talk on Monday:

Each tensor processing unit features a microcontroller for directing what is essentially a math coprocessor; each TPU has a limited instruction set, though this can be extended with custom microcontroller instructions, apparently. There are numerous frameworks for writing machine-learning software, and Spring Crest supports some of the most popular, including Google’s TensorFlow as well as PyTorch and Baidu’s PaddlePaddle. Intel will provide a software stack for talking to and controlling the devices.

We'll write more about NNP-T, or whatever it's called next, once it actually starts shipping for us mere mortals to buy. ®

And finally... Upstart Cerebras, with $200m of funding under its belt, has been on a publicity tour, briefing selected mainstream journos about its bonkers iPad-sized, TSMC-fabricated, 16nm, 46,000mm2 single-die processor. It supposedly features up to 400,000 cores focused on machine-learning math processing, 1.2 trillion transistors, 100Pbps of fabric bandwidth, and 18GB of on-chip RAM moving at 9PB/s.

Dubbed the world's largest AI chip, it is not due to arrive until, well, when it eventually does. There are no prices nor any other details. You also need to put the thing into its own special box as it requires substantial custom cooling gear, and the whole thing won't start shipping to selected customers until later this year, though some prototypes are out there already, apparently.

Meanwhile... Our sister site The Next Platform has summarized AMD CEO Lisa Su's Hot Chips keynote speech.
