Google IO The latest iteration of Google’s custom-designed number-crunching chip, version three of its Tensor Processing Unit (TPU), will dramatically cut the time needed to train machine learning systems, the Chocolate Factory has claimed.
Google CEO Sundar Pichai revealed the third version of the Google-crafted matrix math processor during his Google IO developer conference keynote, saying a pod of TPU 3.0s is eight times faster than a pod of TPU 2.0s. In a separate session, Zak Stone, product manager for TensorFlow and Cloud TPUs, gave a slightly deeper dive into the details.
For background, see our earlier coverage of the first TPU and of TPU 2.0. The TPU 1.0 was, relatively speaking, a primitive affair: it had no support for branch instructions and, in fact, supported only about eight software instructions. It was less a processor in its own right than a handy math accelerator, hooked up to host CPUs, for running already-trained models (inference) rather than training them. The TPU 2.0 was more complex, able to operate more like a standalone chip and to train models as well as run them, and it was made available to developers via Google Cloud. The TPU 3.0, presumably, takes it further.
Basically, the web giant needed more dedicated computing power to keep up with its latest neural networks. That's why it keeps updating its custom math unit silicon.
“Initially, machine learning systems for things like imaging or speech recognition used their own coding techniques,” Stone intoned. “But we’ve seen a coming together of neural networks across all these different tasks. That has come at a cost – they tend to be larger and you need more computation to run them – so we need specialized hardware for machine learning.”
The first TPU was pressed into service in 2015, and the new chipset has dramatically increased performance over earlier designs, Stone claimed. So much so that the hardware now has to be liquid-cooled to cope with silicon you could fry bacon on.
[Image caption: Feelin' hot hot hot ... a liquid-cooled TPU 3.0]
Exactly what's inside the new TPU 3.0 devices wasn't revealed, except to say it's faster than previous generations. Generally, Google doesn't say anything about one generation of TPU until the next is heavily deployed. Presumably, the TPU 3.0 will follow TPU 2.0 into the cloud, and be used not just for running Google's own internal code for its services but also customer workloads.
These TPUs are racked together into pods, and they are all network-attached, making them ideal for cloud use. There are typically 64 devices per pod, four ASIC chips per device, and two cores per chip. They are, as you'd expect, optimized for Google's TensorFlow-based software, TensorFlow being a popular machine-learning toolset and one of the most active projects on GitHub.
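To make those pod figures concrete, here is a minimal sketch that multiplies them out. It assumes the "typical" counts quoted above (64 devices, four chips per device, two cores per chip); real pod configurations may differ, and the helper name pod_totals is hypothetical, not part of any Google API.

```python
# Typical per-pod topology as quoted in the article (an assumption: actual
# pods may be configured differently).
DEVICES_PER_POD = 64
CHIPS_PER_DEVICE = 4
CORES_PER_CHIP = 2


def pod_totals(devices: int = DEVICES_PER_POD,
               chips_per_device: int = CHIPS_PER_DEVICE,
               cores_per_chip: int = CORES_PER_CHIP) -> dict:
    """Hypothetical helper: total chips and cores in one pod."""
    chips = devices * chips_per_device
    cores = chips * cores_per_chip
    return {"devices": devices, "chips": chips, "cores": cores}


if __name__ == "__main__":
    print(pod_totals())  # {'devices': 64, 'chips': 256, 'cores': 512}
```

On those numbers a full pod works out to 256 ASICs and 512 cores.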
A pod of TPU 3.0s can crunch numbers at 100 petaFLOPS or more, it is claimed. A pod of TPU 2.0s topped out at about 11.5 petaFLOPS. Eight times 11.5 is 92, which is roughly 100, so the two claims hang together. It's all rather nebulous, though, because Google shies away from stating the precision of the math in these benchmarks, and the precision will make a difference.
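The back-of-envelope sums can be sketched as follows. This assumes the 11.5 petaFLOPS TPU 2.0 pod figure and the "eight times" claim as reported, and, for the per-device number, that a TPU 2.0 pod holds the typical 64 devices. Function names are hypothetical, and none of these figures says anything about numeric precision.

```python
# Reported figures (no precision stated by Google, so treat them as
# peak numbers of unknown precision).
TPU2_POD_PFLOPS = 11.5
CLAIMED_SPEEDUP = 8
DEVICES_PER_POD = 64  # assumption: the "typical" pod size applies to TPU 2.0


def implied_tpu3_pod_pflops(tpu2_pod: float = TPU2_POD_PFLOPS,
                            speedup: float = CLAIMED_SPEEDUP) -> float:
    """Pod throughput implied by scaling the TPU 2.0 figure by the claim."""
    return tpu2_pod * speedup


def per_device_tflops(pod_pflops: float,
                      devices: int = DEVICES_PER_POD) -> float:
    """Split a pod's petaFLOPS evenly across its devices, in teraFLOPS."""
    return pod_pflops * 1000 / devices  # 1 petaFLOPS = 1,000 teraFLOPS


if __name__ == "__main__":
    print(f"{implied_tpu3_pod_pflops():.1f}")            # 92.0
    print(f"{per_device_tflops(TPU2_POD_PFLOPS):.1f}")  # 179.7
```

Note that "100 petaFLOPS or more" would be nearly 8.7 times the TPU 2.0 figure rather than exactly eight, which is within the fuzz of both claims.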
“The days of single systems are irrelevant, are over,” Stone added. “These cloud TPUs give you a dial you can turn. Set [your] model up on a small system, get it running and then dial it up to take training time from hours to minutes.”
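Stone's "dial" can be illustrated with a simple linear-scaling model. This is an assumption, not Google's stated method: wall-clock time is divided by the number of devices times a scaling efficiency, where 1.0 means perfect scaling. Real jobs lose time to communication and input pipelines, so actual speedups are usually lower. The function name and the example figures are hypothetical.

```python
def ideal_training_hours(single_device_hours: float, devices: int,
                         efficiency: float = 1.0) -> float:
    """Hypothetical estimate of wall-clock training time on `devices`.

    efficiency=1.0 models perfect linear scaling; values below 1.0
    model the overheads that real distributed training incurs.
    """
    if devices < 1:
        raise ValueError("devices must be >= 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return single_device_hours / (devices * efficiency)


if __name__ == "__main__":
    # Hypothetical job: 16 hours on one device, dialled up to 64 devices.
    print(ideal_training_hours(16, 64) * 60)                  # 15.0 (minutes)
    print(ideal_training_hours(16, 64, efficiency=0.5) * 60)  # 30.0 (minutes)
```

Even at 50 per cent scaling efficiency, the hypothetical 16-hour job drops to half an hour, which is the hours-to-minutes effect Stone describes.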
Google will begin large-scale deployment of the new TPUs in a few months. ®