Revealed: Blueprints to Google's AI FPU aka the Tensor Processing Unit

PCIe-connected super-calculator trounces outdated competition


Analysis In 2013, Google realized that its growing dependence on machine learning would force it to double the number of data centers it operates to handle projected workloads.

Based on the scant details Google provides about its data center operations – which include 15 major sites – the search-and-ad giant was looking at additional capital expenditures of perhaps $15bn, assuming that a large Google data center costs about $1bn.

The internet king assembled a team to produce a custom chip capable of handling part of its neural network workflow known as inference, which is where the software makes predictions based on data developed through the time-consuming and computationally intensive training phase. The processor sits on the PCIe bus and accepts commands from the host CPU: it is akin to a yesteryear discrete FPU or math coprocessor, but obviously souped up to today's standards.
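In code, the division of labour looks something like the sketch below. The class and method names are invented for illustration – Google has not published its driver interface – but the shape of the interaction is the point: the CPU keeps running the application and hands batches of work to the PCIe card, which returns the results of the heavy matrix arithmetic.

    import numpy as np

    class TPUDevice:
        """Stand-in for a PCIe-attached inference accelerator (hypothetical)."""

        def __init__(self, weights):
            # Model weights are loaded onto the card once, before serving traffic.
            self.weights = weights.astype(np.int8)

        def infer(self, activations):
            # The host hands over a batch of activations and gets scores back;
            # a plain matrix multiply stands in for the chip's matrix unit here.
            return activations.astype(np.int32) @ self.weights.astype(np.int32)

    # Host-side code remains ordinary application logic: prepare a batch,
    # offload the arithmetic, use the result -- much like calling into a
    # discrete math coprocessor of old.
    device = TPUDevice(np.random.randint(-128, 128, size=(256, 10)))
    batch = np.random.randint(-128, 128, size=(32, 256))
    scores = device.infer(batch)
    print(scores.shape)  # (32, 10)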

The goal was to improve cost-performance over GPUs tenfold. By Google's own estimation, it succeeded, though its in-house hardware was competing against chips that have since been surpassed.

In a paper published in conjunction with a technical presentation at the National Academy of Engineering meeting at the Computer History Museum in Silicon Valley, Google engineers – more than 70 of them – have revealed how the web giant's Tensor Processing Unit (TPU), a custom ASIC designed to process TensorFlow machine-learning jobs, performs in its data centers.

Google introduced its TPU at Google I/O 2016. Norm Jouppi, a distinguished hardware engineer at the company and a top MIPS CPU architect, said in a blog post that Google had been running TPUs in its data centers since 2015 and that the specialized silicon delivered "an order of magnitude better-optimized performance per watt for machine learning."

Jouppi went so far as to suggest that the improvements amounted to traveling forward in time seven years – about three chip generations under Moore's Law.

Google executives had previously declared that artificial intelligence, in the form of machine learning and related technologies, was critical to the company's future. The existence of hardware custom-built for that purpose reinforced those statements.

Now, performance tests the company has run against Intel's Haswell CPU and Nvidia's Tesla K80 GPU appear to validate its approach.

Based on workloads involving neural network inference, "the TPU is 15x to 30x faster than contemporary GPUs and CPUs," said Jouppi in a blog post on Wednesday, and achieves "30x to 80x improvement" as measured in TOPS/Watt (trillions of operations per second per watt).

Not so fast

In a post to Reddit, Justin Johnson, a PhD student in the Stanford Vision Lab, points out that Google's researchers conducted their comparison against a Tesla K80 GPU, which is two generations old and lacks the hardware support for low-precision 8-bit arithmetic that the TPU relies on.

"The comparison doesn't look quite so rosy next to the current-gen Tesla P40 GPU, which advertises 47 INT8 TOP/s at 250W TDP; compared to the P40, the TPU is about 1.9x faster and 6.5x more energy-efficient," Johnson wrote.

Still, Google's results suggest that the premise laid out in its paper – that "major improvements in cost-energy-performance must now come from domain-specific hardware" – has merit. In other words, semiconductor makers may become more inclined to match the hardware they design with anticipated applications.

In an email to The Register, Johnson explained that the TPU is special-purpose hardware designed to accelerate the inference phase in a neural network, in part through quantizing 32-bit floating point computations into lower-precision 8-bit arithmetic.

"This allows it to achieve significantly faster speeds and better energy efficiency than general-purpose GPUs," said Johnson. "Energy efficiency is particularly important in a large-scale datacenter scenario, where improving energy efficiency can significantly reduce cost when running at scale."

Johnson said he wasn't sure about the broad significance of the TPU. "Since it is not intended for training, I think that researchers will likely stick with Nvidia hardware for the near future," he said. "Designing your own custom hardware is a huge engineering effort that is likely beyond the capabilities of most companies, so I don't expect each and every company to have its own bespoke TPU-esque chips any time soon."

Nonetheless, he speculates TPUs could help Google Cloud Platform undercut competing services from Amazon Web Services, at least among customers running trained neural network models in production. ®

