All aboard the PCIe bus for Nvidia's Tesla P100 supercomputer grunt

No NVLink CPU? No problem

ISC Nvidia has popped its Tesla P100 accelerator chip onto PCIe cards for bog-standard server nodes tasked with artificial intelligence and supercomputer-grade workloads.

The P100 was unveiled in April at Nvidia's GPU Tech Conference in California: it's a 16nm FinFET graphics processor with 15 billion transistors on a 600mm² die. It's designed to crunch trillions of calculations a second for specialized software including neural-network training and weather and particle simulations. The GPU uses Nvidia's Pascal architecture, which has fancy tricks like CPU-GPU page migration; we've detailed the design previously.

Each P100 features four 40GB/s Nvidia NVLink ports to connect together clusters of GPUs, NVLink being Nvidia's high-speed interconnect. IBM's Power8+ and Power9 processors will support NVLink, allowing the host's Power CPU cores to interface directly with the GPUs.

Those Big Blue chips are destined for American government-owned supercomputers and other heavy-duty machines. The rest of us, in the real world, are using x86 processors for backend workloads.

Nearly 100 per cent of compute processors in data centers today are built by Intel; Intel does not support Nvidia's NVLink, and Intel does not appear to be in any hurry to do so. Thus, Nvidia has emitted – as expected and as planned – a PCIe version of its Tesla P100 card so server builders can bundle the accelerators with their x86 boxes. That means the GPUs can talk to each other at high speed via NVLink, and to the host CPUs via a PCIe bus.

The PCIe P100 comes in two flavors: one with 16GB of HBM2 stacked RAM that has an internal memory bandwidth of 720GB/s, and the other – cheaper – option with 12GB of HBM2 RAM and an internal memory bandwidth of 540GB/s. Both move up to 32GB/s, counting both directions, over the PCIe gen-3 x16 interface.
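For the curious, here's a back-of-envelope check of those bandwidth figures. It assumes PCIe gen-3's standard signalling of 8GT/s per lane with 128b/130b encoding, which isn't from Nvidia's announcement; on that basis a x16 slot carries roughly 15.75GB/s each way, so the 32GB/s figure is a two-way total. The 12GB card's capacity and bandwidth also shrink by the same factor of three-quarters, which would fit one of four HBM2 stacks being switched off, though that's our inference rather than anything Nvidia told us.

```python
# Back-of-envelope check of the PCIe P100 bandwidth figures.
# Assumption (not from Nvidia's announcement): PCIe gen-3 runs at 8 GT/s per
# lane and uses 128b/130b encoding, so 128 of every 130 bits carry data.
PCIE3_GT_PER_S = 8.0          # gigatransfers per second, per lane
PCIE3_EFFICIENCY = 128 / 130  # fraction of line bits that are payload
LANES = 16

def pcie3_gb_per_s(lanes: int) -> float:
    """Usable one-direction bandwidth in GB/s (one bit per transfer per lane)."""
    return PCIE3_GT_PER_S * PCIE3_EFFICIENCY * lanes / 8  # bits -> bytes

one_way = pcie3_gb_per_s(LANES)  # roughly 15.75 GB/s
both_ways = 2 * one_way          # roughly 31.5 GB/s, quoted as 32GB/s

# The two HBM2 options from the article: capacity (GB) -> bandwidth (GB/s).
hbm2 = {16: 720.0, 12: 540.0}
capacity_ratio = 12 / 16
bandwidth_ratio = hbm2[12] / hbm2[16]

print(f"PCIe gen-3 x16: {one_way:.2f} GB/s each way, {both_ways:.1f} GB/s total")
print(f"12GB vs 16GB card: capacity x{capacity_ratio:.2f}, bandwidth x{bandwidth_ratio:.2f}")
print(f"On-card HBM2 vs host link (16GB card): about {hbm2[16] / both_ways:.0f}x")
```

The last line makes the point that matters for workloads: data already sitting in the card's HBM2 can be read more than twenty times faster than it can be shipped across the PCIe bus from the host.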

They each peak at 4.7TFLOPS when crunching 64-bit double-precision math; 9.3TFLOPS for 32-bit single-precision; and 18.7TFLOPS for 16-bit half-precision. That's a little under the raw P100 specs of 5.3TFLOPS, 10.6TFLOPS and 21TFLOPS for double, single and half precision, respectively. The reason: the PCIe card's performance is dialed down so it doesn't kick out too much heat – you don't want racks and racks of GPU-accelerated nodes to melt.
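Where do the lower PCIe numbers come from? A minimal sketch, assuming figures from Nvidia's published spec sheets rather than this article: 3,584 single-precision CUDA cores per P100, boost clocks of 1,480MHz for the NVLink part and 1,303MHz for the PCIe card, two floating-point operations per core per cycle thanks to fused multiply-add, half that rate for double precision and twice it for half precision.

```python
# Peak throughput estimate: cores x 2 FLOPs per fused multiply-add x clock.
# Assumed figures (from Nvidia's published P100 specs, not this article):
# 3,584 FP32 CUDA cores; boost clock 1,480MHz (NVLink) vs 1,303MHz (PCIe).
# On GP100, FP64 runs at half the FP32 rate and packed FP16 at twice it.
FP32_CORES = 3584

def peak_tflops(boost_mhz: float) -> dict[str, float]:
    """Return peak TFLOPS per precision for a given boost clock in MHz."""
    fp32 = FP32_CORES * 2 * boost_mhz * 1e6 / 1e12
    return {"fp64": fp32 / 2, "fp32": fp32, "fp16": fp32 * 2}

for name, mhz in [("NVLink P100", 1480), ("PCIe P100", 1303)]:
    t = peak_tflops(mhz)
    print(f"{name}: " + ", ".join(f"{k} {v:.1f} TFLOPS" for k, v in t.items()))
```

With those assumed clocks the sketch lands on the article's figures to one decimal place (21.2TFLOPS rather than a round 21 for half precision on the NVLink part), which is consistent with the PCIe card running the same silicon at a lower clock rather than with fewer cores enabled.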

While the NVLink P100 will consume 300W, its 16GB PCIe cousin will use 250W, and the 12GB option just below that.
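Dividing the single-precision peaks quoted above by the power ratings gives a crude efficiency comparison. This uses only the article's own numbers; the 12GB card is left out because its exact wattage isn't stated, and rated board power is not the same thing as measured draw.

```python
# Peak single-precision TFLOPS and rated board power, as quoted in this article.
cards = {
    "NVLink P100": (10.6, 300),
    "PCIe P100 16GB": (9.3, 250),
}

# GFLOPS per watt = (TFLOPS * 1000) / watts
efficiency = {name: tflops * 1000 / watts for name, (tflops, watts) in cards.items()}

for name, gflops_per_watt in efficiency.items():
    print(f"{name}: {gflops_per_watt:.1f} GFLOPS/W")
```

On paper, the PCIe card gives up about 12 per cent of peak throughput for a roughly 17 per cent cut in power, so it comes out slightly ahead per watt.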

By the way, if you want full-speed, full-power Tesla P100 cards for non-NVLink servers, you will be able to get hold of them: system makers can add a PCIe gen-3 interface to the board for machines that can stand the extra thermal output. But if you want PCIe only, and are conscious of power consumption, the lower performance, lower wattage PCIe options are there for you.

"The PCIe P100 will be for workhorse systems – the bulk of machines," Roy Kim, a senior product manager at Nvidia, told The Register. He suggested four or eight of the cards could be fitted in each server node.
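To put Kim's four-or-eight-card suggestion in perspective, here's a rough tally of peak throughput and GPU power per node, using the 16GB PCIe card's figures from this article. It counts the accelerators only: host CPUs, memory, fans and power-supply losses are ignored, so a real node would draw more.

```python
# Per-card figures for the 16GB PCIe P100, as quoted in this article.
CARD_TFLOPS_FP64 = 4.7
CARD_TFLOPS_FP32 = 9.3
CARD_WATTS = 250

def node_totals(cards: int) -> tuple[float, float, int]:
    """Peak FP64 TFLOPS, peak FP32 TFLOPS and GPU board power (W) for one node."""
    return (cards * CARD_TFLOPS_FP64, cards * CARD_TFLOPS_FP32, cards * CARD_WATTS)

for n in (4, 8):
    fp64, fp32, watts = node_totals(n)
    print(f"{n} cards: {fp64:.1f} TFLOPS FP64, {fp32:.1f} TFLOPS FP32, {watts} W of GPUs")
```

An eight-card box therefore needs around 2kW for its accelerators alone, which is why the lower-wattage PCIe parts matter for dense racks.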

These PCIe devices won't appear until the final quarter of 2016, and will be available from Cray, Dell, HP, IBM, and other Nvidia partners. The final pricing will be up to the reseller, but we're told the cheaper option will set you back about as much as an Nvidia K80 – which today is about $4,000.

For what it's worth, Nvidia told us the P100 PCIe cards will later this year feature in an upgraded build of Europe's fastest supercomputer: the Piz Daint machine at the Swiss National Supercomputing Center in Lugano, Switzerland. ®

PS: Look out for updates to Nvidia's AI training software Digits – version four will include new object detection technology – and its cuDNN library, version 5.1 of which will include various performance enhancements. Meanwhile, Nv's GPU Inference Engine (GIE) will finally make a public appearance this week: this is code designed to run on hardware from data center-grade accelerators down to system-on-chips in cars and drones, giving applications the ability to perform inference using trained models.


Biting the hand that feeds IT © 1998–2022