Hot Chips Microsoft today teased chip designers with Brainwave, its cloud-hosted pool of FPGAs designed to perform AI stuff in real time.
The Windows giant has previously spoken of bunging FPGAs in its Azure cloud. It's been using the programmable logic gate arrays in its data centers for a few years now.
The chips are typically used in Redmond's servers as accelerators attached to CPUs via PCIe 3, with a 40Gb/s QSFP channel to the network controller, so they can access packets, and another channel to a special network bus of FPGAs. These accelerators can be programmed to tackle tasks from calculating webpages' search rankings to machine-learning workloads, in dedicated silicon at high speed.
Brainwave seeks to enhance the performance of FPGAs used in Microsoft's cloud by turning each of the arrays into hardware microservices. Yes, the dreaded M word. Microsoft has taken a few key steps to maximize efficiency and achieve on-the-fly processing for AI applications. These techniques were presented at the Hot Chips conference in Silicon Valley earlier today.
One step is that, by using the latest Intel Stratix chips, Redmond's machine-learning models are stored entirely in memory within the gate array and not in RAM. That allows the model to persist within the chips, with the attached DRAM used only for buffering incoming and outgoing data.
Another step is to optimize the FPGA design so that every single array resource and memory is used to process an incoming query. That increases throughput and avoids having to crunch queries in batches, which ruins latency and hampers real-time analysis. In other words, it's possible to maintain a stream of data processing.
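To see why batching hurts latency, here is a back-of-the-envelope sketch in Python. It is not Microsoft's scheduler: the function names, the 100-microsecond arrival interval, and the batch size of eight are all made-up assumptions for illustration. It only counts how long each query sits waiting for its batch to fill before any processing starts.

```python
def batch_wait_times(arrival_interval_us, batch_size):
    """Time (in microseconds) each query in one batch waits for the batch to fill.

    Hypothetical model: queries arrive at a fixed interval, and processing
    only starts once the last query of the batch has arrived.
    """
    launch = (batch_size - 1) * arrival_interval_us
    return [launch - i * arrival_interval_us for i in range(batch_size)]


def mean(values):
    return sum(values) / len(values)


# Assumed numbers, not figures from Microsoft.
batched = batch_wait_times(arrival_interval_us=100, batch_size=8)
streamed = batch_wait_times(arrival_interval_us=100, batch_size=1)

print(batched[0], mean(batched))    # wait of the first query, and the average
print(streamed[0], mean(streamed))  # a batch of one never waits
```

Under these assumptions, the first query in a batch of eight idles for 700 microseconds before work begins, while a query handled on its own starts immediately.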
The next step is to pool the FPGAs as a collection of microservices to tackle a task. For example, let's say a process requires eight stages, or eight matrix-based equations performed on some data. Brainwave allocates eight FPGAs to form an eight-stage pipeline, flowing data from one chip to the next via the array network. Any two FPGAs on this network are two microseconds apart, in terms of latency. Thus, the data is scheduled through the eight stages; as each stage finishes its work, its FPGA can be allocated to another pipeline.
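The eight-stage pipeline can be sketched as a toy timing model. Only the two-microsecond hop comes from the presentation; the 10 microseconds of compute per stage, the function name, and the assumption that every query is ready at time zero are invented here for illustration, and this is not how Brainwave itself schedules work.

```python
HOP_LATENCY_US = 2  # per-hop latency quoted for the FPGA network


def pipeline_finish_times(num_items, stage_times_us, hop_us=HOP_LATENCY_US):
    """Return the time (in microseconds) at which each item leaves the last stage.

    Hypothetical model: one FPGA per stage, each FPGA handles one item at a
    time, and all items are ready at time 0.
    """
    stage_free_at = [0] * len(stage_times_us)  # when each FPGA is next idle
    finished = []
    for _ in range(num_items):
        t = 0
        for s, stage_time in enumerate(stage_times_us):
            if s > 0:
                t += hop_us  # transfer to the next FPGA in the chain
            start = max(t, stage_free_at[s])  # wait if that FPGA is still busy
            t = start + stage_time
            stage_free_at[s] = t
        finished.append(t)
    return finished


# Eight stages at an assumed 10 microseconds of compute each.
times = pipeline_finish_times(num_items=3, stage_times_us=[10] * 8)
print(times)  # → [94, 104, 114]
```

In this model the first query takes 94 microseconds end to end (eight stages plus seven hops), but once the pipeline is full a result emerges every 10 microseconds, which is the point of streaming queries through rather than batching them.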
This approach can also be used to perform matrix math in parallel, running, say, a single dense matrix through eight FPGAs at the same time.
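One plausible way to spread a single dense matrix over eight workers is to hand each a block of rows and stitch the partial results back together. The Python sketch below uses threads as stand-ins for FPGAs; the row-block partitioning and every function name are assumptions for illustration, as Microsoft did not say how it splits the work.

```python
from concurrent.futures import ThreadPoolExecutor


def matvec(rows, vector):
    """Multiply a block of matrix rows by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in rows]


def split_rows(matrix, parts):
    """Split a matrix into `parts` contiguous row blocks of near-equal size."""
    base, extra = divmod(len(matrix), parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        blocks.append(matrix[start:start + size])
        start += size
    return blocks


def parallel_matvec(matrix, vector, workers=8):
    """Compute matrix * vector with each worker handling one row block."""
    blocks = split_rows(matrix, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(matvec, blocks, [vector] * workers)
        # map() preserves block order, so concatenating restores row order.
        return [y for part in partials for y in part]


matrix = [[r + c for c in range(4)] for r in range(16)]
vector = [1, 0, 2, 1]
result = parallel_matvec(matrix, vector)
```

Each worker only ever touches its own rows, so no coordination is needed until the final concatenation.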
Microsoft appears to be using this technology internally for now, and has deployed it, or will shortly deploy it, in production for the usual things: Bing searches, computer vision, speech processing, and so on. Any external availability has yet to be announced.
"We are working to bring this powerful, real-time AI system to users in Azure, so that our customers can benefit from Project Brainwave directly, complementing the indirect access through our services such as Bing," said Microsoft engineer Doug Burger.
Each of these Brainwave-managed FPGAs has a Redmond-designed microarchitecture that has instructions specialized for machine learning, such as vector operations and non-linear activations.
Microsoft has been mentioning Brainwave here and there for a while, although it is only now revealing some of its technical details. Derek Chiou of Azure's silicon team gave a presentation earlier this year about it. Today, Redmond published a blog post about the technology, attaching its slides from Hot Chips for anyone who wants to peer deeper. ®