Google uses deep learning to design faster, smaller AI chips
Silicon engineers, you are now in the PRIME of your life
Googlers and UC Berkeley academics say they have devised a way to use artificial intelligence to design faster and smaller chips that accelerate artificial intelligence.
In a note shared on Thursday, the researchers said they have developed a deep-learning approach called PRIME that generates AI chip architectures by drawing from existing blueprints and performance figures. They claim their approach can produce designs that have lower latency and require less space than Google's in-production EdgeTPU accelerator and other designs made using traditional tools.
Google has quite an interest in this area. Last year, it said it had used machine learning to optimize the layout of one of its TPU designs. Meanwhile, traditional chip design toolmakers, such as Synopsys and Cadence, say they've added machine learning to their software suites.
These latest findings from Google could prove a game-changer for the web giant's custom chip design efforts. They were detailed in a paper titled "Data-Driven Offline Optimization for Architecting Hardware Accelerators," which was accepted for this year's International Conference on Learning Representations.
Beyond enabling faster and more efficient designs, the PRIME approach is significant because traditional simulation-based chip design can be time-consuming and computationally expensive, according to the researchers. Designing chips with simulation software, they added, can also lead to "infeasible" blueprints when optimizing for goals such as low power usage or low latency.
The team said chip designs made the PRIME way had up to 50 percent less latency than those created using simulation-driven methods, and the deep-learning approach also cut the time needed to generate said blueprints by up to 99 percent.
The researchers compared PRIME-generated chip designs against the simulation-produced EdgeTPU across nine AI applications, which included image classification models MobileNetV2 and MobileNetEdge. Crucially, the PRIME designs were optimized for each application.
They found that the PRIME chip designs overall improved latency by 2.7x and reduced die area usage by 1.5x. This last part surprised the boffins because they had not trained PRIME to reduce die size, which makes chips cheaper and cuts power consumption. For certain models, the latency and die area improvements were even greater.
The researchers also used PRIME to design chips that were optimized to work well across multiple applications. They found that the PRIME designs still had less latency than simulation-driven designs. Perhaps more surprising, this was even the case when the PRIME designs ran on applications for which there was no training data. What's more, the performance improved with more applications.
Finally, the researchers used PRIME to design a chip that could provide the best performance across the nine aforementioned applications. There were only three applications where the PRIME design had higher latency than a simulation-driven design, and the researchers found this was because PRIME favors designs that have larger on-chip memory and, as a result, less processing power.
Drilling down into how PRIME actually works, the researchers built what they call a robust prediction model, which learns to generate optimized chip designs from offline data of AI chip blueprints and their performance figures, including designs that don't work. To sidestep a typical pitfall of supervised machine learning, PRIME is built so it isn't misled by so-called adversarial examples: candidate designs its prediction model is over-optimistic about.
The researchers said this approach lets the model optimize designs for targeted applications. PRIME can even optimize for applications with no training data at all, which it does by training a single large model on design data from the applications where data is available.
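The paper frames this as offline, data-driven optimization: learn a surrogate that predicts an accelerator's latency from its design parameters using only logged data, penalize designs the model is over-optimistic about, then search the learned model for good designs. As a rough, hypothetical illustration of that general recipe, rather than Google's actual code, here is a minimal PyTorch sketch; the synthetic dataset, toy latency function, network sizes, and penalty weight are all made-up stand-ins.

```python
# Hypothetical sketch of offline surrogate-based accelerator design optimization.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in offline dataset: 512 logged accelerator configs (8 normalized design
# parameters each, e.g. PE count, buffer sizes) with a toy "measured" latency.
num_designs, num_params = 512, 8
designs = torch.rand(num_designs, num_params)
latency = (designs ** 2).sum(dim=1, keepdim=True) + 0.05 * torch.randn(num_designs, 1)

# Surrogate model: predicts latency from a design vector.
surrogate = nn.Sequential(
    nn.Linear(num_params, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)


def optimistic_designs(model, batch, steps=5, step_size=0.1):
    """Gradient-descend on the inputs to find designs the surrogate thinks are
    unrealistically fast -- the over-optimistic points the paper guards against."""
    x = batch.clone().requires_grad_(True)
    for _ in range(steps):
        grad, = torch.autograd.grad(model(x).sum(), x)
        x = (x - step_size * grad).clamp(0.0, 1.0).detach().requires_grad_(True)
    return x.detach()


# Train: fit the logged data, and push predicted latency UP on the over-optimistic
# designs so the model stays conservative outside the data it has actually seen.
for step in range(2000):
    idx = torch.randint(0, num_designs, (64,))
    x, y = designs[idx], latency[idx]
    x_opt = optimistic_designs(surrogate, x)
    loss = nn.functional.mse_loss(surrogate(x), y) - 0.1 * surrogate(x_opt).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Finally, search the frozen surrogate for a low-latency design.
for p in surrogate.parameters():
    p.requires_grad_(False)
candidate = torch.rand(1, num_params, requires_grad=True)
search = torch.optim.Adam([candidate], lr=0.05)
for _ in range(200):
    search.zero_grad()
    surrogate(candidate).sum().backward()
    search.step()
    with torch.no_grad():
        candidate.clamp_(0.0, 1.0)  # keep design parameters in their valid range

print("proposed design:", candidate.detach())
print("predicted latency:", surrogate(candidate).item())
```

The conservative penalty is the part doing the heavy lifting here: without it, the final search loop would happily wander into regions where the surrogate's predictions are unreliable and return blueprints that only look good on paper.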
While this won't change Google's chip engineering ways overnight, the researchers said the approach shows promise on several fronts. These include designing chips for applications that involve solving complex optimization problems, and using low-performing chip blueprints as training data to kick-start hardware design.
They also hope to use PRIME for hardware-software co-design thanks to its general-purpose nature. ®