A team from MIT has demonstrated a new type of deep-learning chip that dramatically speeds up how neural networks process and identify data.
In a presentation at the International Solid-State Circuits Conference in San Francisco, the researchers showed off Eyeriss, a chip designed specifically for deep learning. They claimed that it increased the speed of neural networks by a factor of ten over some GPUs, and could conceivably bring the technology to your phone.
"Deep learning is useful for many applications, such as object recognition, speech, face detection," said Vivienne Sze, assistant professor of electrical engineering at MIT.
"Right now, the networks are pretty complex and are mostly run on high-power GPUs. You can imagine that if you can bring that functionality to your cell phone or embedded devices, you could still operate even if you don't have a Wi-Fi connection. You might also want to process locally for privacy reasons. Processing it on your phone also avoids any transmission latency, so that you can react much faster for certain applications."
The speed increase comes down to some canny processor designs. Firstly, rather than sharing memory, each of the 168 cores on the Eyeriss chip has its own discrete memory cache to avoid having to port data around the system.
To cut the time spent shipping data, the chip also contains a dedicated circuit that compresses the information before sending it on, and decompresses it when needed at its destination.
But most importantly, the chip uses specialized circuitry that can be configured to shunt data between cores in the most efficient way possible. By spreading the load, the chip can crunch through more data faster before fetching more from the main memory store.
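The compress-before-sending idea is easy to picture in software. The sketch below is an illustration only: the presentation as reported here does not say which scheme the chip's circuit uses, so it assumes a simple run-length encoding of zeros, on the further assumption that the data being moved contains many zero values. The function names `rle_compress` and `rle_decompress` are hypothetical.

```python
def rle_compress(values):
    """Encode a list of ints as (zero_run, value) pairs.

    Each pair means: emit `zero_run` zeros, then `value`.
    Trailing zeros are stored as a final (zero_run, None) pair.
    This is an assumed scheme for illustration, not the chip's documented one.
    """
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, None))
    return pairs


def rle_decompress(pairs):
    """Rebuild the original list from (zero_run, value) pairs."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v is not None:
            out.append(v)
    return out


# Sparse data shrinks: 11 values become 3 pairs.
activations = [0, 0, 0, 5, 0, 0, 7, 0, 0, 0, 0]
packed = rle_compress(activations)
assert rle_decompress(packed) == activations
```

The fewer non-zero values there are, the fewer pairs have to travel, which is the kind of saving a hardware compressor would be after.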
"This work is very important, showing how embedded processors for deep learning can provide power and performance optimizations that will bring these complex computations from the cloud to mobile devices," said Mike Polley, a senior vice president at Samsung's Mobile Processor Innovations Lab.
"In addition to hardware considerations, the MIT paper also carefully considers how to make the embedded core useful to application developers by supporting industry-standard [network architectures] AlexNet and Caffe."
The team members aren't the only ones designing custom hardware for neural networks. Amazon gets Intel to customize some of the chips for its EC2 service, and Google has a deal with Qualcomm to produce custom ARM server chips.
In addition, Chinese search king Baidu uses custom hardware from Nervana Systems. While GPUs have found a niche in neural networks, Nervana's co-founder Amir Khosrowshahi told El Reg's sister site The Next Platform that customization was the way to go.
"GPUs and CPUs emphasize floating point performance, which is something deep learning doesn't need. Further, it's computationally expensive," he said.
"If you do integer math instead, there are area and power costs that go up too and as a cloud service, that is something we need to avoid, since most of our operational costs are in power. What we chose then is a processor that doesn't use floating point, which gives us the option to jam more compute in and save on power." ®