What sort of silicon brain do you need for artificial intelligence?
Using CPUs, GPUs, FPGAs and ASICS to make sense of AI
The Raspberry Pi is one of the most exciting developments in hobbyist computing today. Across the world, people are using it to automate beer making, open up the world of robotics and revolutionise STEM education in a world overrun by film students. These are all laudable pursuits. Meanwhile, what is Microsoft doing with it? Creating squirrel-hunting water robots.
Over at the firm’s Machine Learning and Optimization group, a researcher saw squirrels stealing flower bulbs and seeds from his bird feeder. The research team trained a computer vision model to detect squirrels, and then put it onto a Raspberry Pi 3 board. Whenever an adventurous rodent happened by, it would turn on the sprinkler system.
Microsoft’s sciurine aversions aren’t the point of that story – its shoehorning of a convolutional neural network onto an ARM CPU is. It shows how organizations are pushing hardware further to support AI algorithms. As AI continues to make the headlines, researchers are pushing its capabilities to make it increasingly competent at basic tasks such as recognizing vision and speech.
As people expect more of the technology, cramming it into self-flying drones and self-driving cars, the hardware challenges are increasing. Companies are producing custom silicon and computing nodes capable of handling them.
Jeff Orr, research director at analyst firm ABI Research, divides advances in AI hardware into three broad areas: cloud services, on‑device, and hybrid. The first focuses on AI processing done online in hyperscale data centre environments like Microsoft’s, Amazon’s and Google’s.
At the other end of the spectrum, he sees more processing happening on devices in the field, where connectivity or latency prohibit sending data back to the cloud.
“It’s using maybe a voice input to allow for hands-free operation of a smartphone or a wearable product like smart glasses,” he says. “That will continue to grow. There’s just not a large number of real-world examples on‑device today.” He views augmented reality as a key driver here. Or there’s always this app, we suppose.
Finally, hybrid efforts marry both platforms to complete AI computations. This is where your phone recognizes what you’re asking it but asks cloud-based AI to answer it, for example.
The cloud: rAIning algorithms
The cloud’s importance stems from the way that AI learns. AI models are increasingly moving to deep learning, which uses complex neural networks with many layers to create more accurate AI routines.
There are two aspects to using neural networks. The first is training, where the network analyses lots of data to produce a statistical model. This is effectively the “learning” phase. The second is inference, where the neural network then interprets new data to generate accurate results. Training these networks chews up vast amounts of computing power, but the training load can be split into many tasks that run concurrently. This is why GPUs, with their double floating point precision and huge core counts, are so good at it.
Nevertheless, neural networks are getting bigger and the challenges are getting greater. Ian Buck, vice president of the Accelerate Computing Group at dominant GPU vendor Nvidia, says that they’re doubling in size each year. The company is creating more computationally intense GPU architectures to cope, but it is also changing the way it handles its maths.
“It can be done with some reduced precision,” he says. Originally, neural network training all happened in 32‑bit floating point, but it has optimized its newer Volta architecture, announced in May, for 16‑bit inputs with 32‑bit internal mathematics.
Reducing the precision of the calculation to 16 bits has two benefits, according to Buck.
“One is that you can take advantage of faster compute, because processors tend to have more throughput at lower resolution,” he says. Cutting the precision also increases the amount of available bandwidth, because you’re fetching smaller amounts of data for each computation.
“The question is, how low can you go?” asks Buck. “If you go too low, it won’t train. You’ll never achieve the accuracy you need for production, or it will become unstable.”
While Nvidia refines its architecture, some cloud vendors have been creating their own chips using alternative architectures to GPUs. The first generation of Google’s Tensor Processing Unit (TPU) originally focused on 8‑bit integers for inference workloads. The newer generation, announced in May, offers floating point precision and can be used for training, too. These chips are application-specific integrated circuits (ASICs). Unlike CPUs and GPUs, they are designed for a specific purpose (you’ll often see them used for mining bitcoins these days) and cannot be reprogrammed. Their lack of extraneous logic makes them extremely high in performance and economic in their power usage – but very expensive.
Google's scale is large enough that it can swallow the high non-recurring expenditures (NREs) associated with designing the ASIC in the first place because of the cost savings it achieves in AI‑based data centre operations. It uses them across many operations, ranging from recognizing Street View text to performing Rankbrain search queries, and every time a TPU does something instead of a GPU, Google saves power.
“It’s going to save them a lot of money,” said Karl Freund, senior analyst for high performance computing and deep learning at Moor Insights and Strategy.
He doesn’t think that’s entirely why Google did it, though. “I think they did it so they would have complete control of the hardware and software stack.” If Google is betting the farm on AI, then it makes sense to control it from endpoint applications such as self-driving cars through to software frameworks and the cloud.