How Google hopes to build more efficient, multi-capability AI systems

Architecture may make it possible to train one machine-learning model that performs all sorts of tasks

Google says it is developing an AI architecture that can be used to train one giant system capable of performing multiple different tasks more efficiently than what's possible with today's models.

Machine-learning models are typically built to tackle a particular challenge, such as object detection or facial recognition, and usually have to be trained from scratch when the scope or nature of the problem changes. Developers find themselves training separate models for each type of task that needs to be performed, each requiring different datasets.

Training these models can be expensive – especially as they grow in complexity and size. Google wants to develop a type of computational architecture that can train a single giant system capable of performing multiple types of task, and can be continuously updated to learn new capabilities.

Jeff Dean, senior fellow and SVP of Google Research and Google Health, introduced the idea of Pathways last year to achieve this.

"We'd like to train one model that can not only handle many separate tasks, but also draw upon and combine its existing skills to learn new tasks faster and more effectively. That way, what a model learns by training on one task – say, learning how aerial images can predict the elevation of a landscape – could help it learn another task – say, predicting how flood waters will flow through that terrain," he wrote in a blog post in October.

"We want a model to have different capabilities that can be called upon as needed, and stitched together to perform new, more complex tasks – a bit closer to the way the mammalian brain generalizes across tasks."

Dean and his colleagues haven't quite managed that yet, according to a paper [PDF] describing the Pathways architecture in more detail, released this week. But they have demonstrated how such a system might work in the future.

Pathways allows developers to train their models more efficiently across thousands of tensor processing unit (TPU) chips. It coordinates the data transfer between chips and schedules the computations that need to run in parallel.

Typically, a single machine-learning model is trained in a distributed manner, where all the chips crunching the data communicate via high-bandwidth interconnects – like Nvidia's NVLink – to run the same computations in parallel. The speed at which the model can be trained is limited by how many chips can be connected in a single system, and how fast they can communicate with each other.
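
To make the idea of chips running the same computation in parallel concrete, here is a toy sketch in plain Python. It is not Pathways or any Google code: the "chips" are just list entries, `all_reduce_mean` is a hypothetical stand-in for the interconnect, and the model is a single weight fitted to made-up data.

```python
# Toy data-parallel training: each simulated "chip" holds one shard of the
# batch, computes a local gradient, and the gradients are then averaged.
# All names here are illustrative assumptions, not part of any real system.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for the interconnect: every chip ends up with the mean."""
    return sum(values) / len(values)

def data_parallel_step(w, shards, lr=0.01):
    """One training step: same computation per shard, then a shared update."""
    grads = [local_gradient(w, s) for s in shards]  # parallel on real hardware
    return w - lr * all_reduce_mean(grads)

# Made-up dataset whose true weight is 3.0, split evenly across 4 "chips".
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)

print(round(w, 6))
```

Because the shards are equal in size, averaging the per-chip gradients gives the same result as computing the gradient over the whole batch on one chip. On real hardware, the averaging step is where interconnect bandwidth becomes the bottleneck.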

Pathways, however, allows models to be trained over multiple networks of chips. Google researchers used the architecture to run programs written in JAX across multiple TPU pods for the first time, scaling to over 2,048 TPUs.

"Pathways uses a client-server architecture that enables Pathways' runtime to execute programs on system-managed islands of compute on behalf of many clients," the paper explains. "Pathways is the first system designed to transparently and efficiently execute programs spanning multiple 'pods' of TPUs and it scales to thousands of accelerators by adopting a new dataflow execution model."

Google hopes that the architecture can be expanded further to improve the way a model handles sparsity. Traditional neural networks typically activate the whole network for every computation during training; it's more efficient, however, to activate only the small portion of neurons relevant to the task at hand. This sparsity can be used by Pathways to enable a single model to adapt better to new tasks over time.
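
The article does not say how Pathways would implement sparsity. One common way to activate only part of a network is mixture-of-experts-style routing, sketched below in plain Python purely as an illustration. The `EXPERTS` list, the gate scores, and `sparse_forward` are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical "experts": tiny functions standing in for sub-networks.
EXPERTS = [
    lambda x: x + 1.0,
    lambda x: x * 2.0,
    lambda x: x * x,
    lambda x: -x,
]

def sparse_forward(x, gate_scores, k=2):
    """Run only the k highest-scoring experts and mix their outputs."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    top = ranked[:k]                                  # experts that fire
    weights = softmax([gate_scores[i] for i in top])  # renormalise over top-k
    output = sum(w * EXPERTS[i](x) for w, i in zip(weights, top))
    return output, top

# Assumed gate scores; in a real model a learned router would produce them.
out, used = sparse_forward(3.0, gate_scores=[0.1, 2.0, 2.0, -1.0], k=2)
print(out, used)
```

Only two of the four experts run for this input, so roughly half the work is skipped. A different input could be routed to a different pair.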

One day it may be possible to train new models on different modalities of data, too – to create one giant overarching system instead of smaller, specialized ones.

According to Dean, "Pathways will enable a single AI system to generalize across thousands or millions of tasks, to understand different types of data, and to do so with remarkable efficiency – advancing us from the era of single-purpose models that merely recognize patterns to one in which more general-purpose intelligent systems reflect a deeper understanding of our world and can adapt to new needs." ®

Biting the hand that feeds IT © 1998–2022