Bing Chat so hungry for GPUs, Microsoft will rent them from Oracle
Frenemies in multi-year deal to offload AI inference to Big Red super-cluster
Demand for Microsoft's AI services is apparently so great – or Redmond's resources so tight – that the software giant plans to offload some of the machine-learning models used by Bing Search to Oracle's GPU supercluster as part of a multi-year agreement announced Tuesday.
"Our collaboration with Oracle and use of Oracle Cloud infrastructure along with our Microsoft Azure AI infrastructure, will expand access to customers and improve the speed of many of our search results," Divya Kumar, who heads up Microsoft's Search and AI marketing team, explained in a statement.
The partnership essentially boils down to: Microsoft needs more compute resources to keep up with the alleged "explosive growth" of its AI services, and Oracle just happens to have tens of thousands of Nvidia A100 and H100 GPUs available for rent. Far be it from us to suggest the Larry-Ellison-founded database giant doesn't have enough cloud customers to consume its stocks of silicon.
Microsoft was among the first to integrate a generative AI chatbot into its search engine with the launch of Bing Chat back in February. You all know the drill by now: you can feed prompts, requests, or queries into Bing Chat, and it will try to look up information, write bad poetry, generate pictures and other content, and so on.
The large language models that underpin the service require massive clusters of GPUs not only to train, but also to run at scale for inferencing – the process of putting a trained model to work. It's Oracle's stack of GPUs that will help with this inference work.
The two cloud providers' latest collaboration takes advantage of the Oracle Interconnect for Microsoft Azure, which allows services running in Azure to interact with resources in Oracle Cloud Infrastructure (OCI). The two super-corps have previously used the service to allow customers to connect workloads running in Azure back to OCI databases.
In this case, Microsoft is using the system alongside its Azure Kubernetes Service to orchestrate Oracle's GPU nodes to keep up with the claimed demand for Bing's AI features.
According to StatCounter, for October 2023, Bing had a 3.1 percent global web search market share for all platforms – that's compared to Google's 91.6 percent, but up from 3 percent the month before. On desktop, Bing climbed to 9.1 percent, and 4.6 percent for tablets.
Maybe StatCounter is wrong; maybe Microsoft's chatty search engine isn't as staggeringly popular as we're led to believe. Maybe Microsoft just wants to make Bing look like it's in high demand; maybe Redmond really does need the extra compute.
Oracle claims its cloud super-clusters, which presumably Bing will use, can each scale to 32,768 Nvidia A100 GPUs or 16,384 H100 GPUs using an ultra-low-latency Remote Direct Memory Access (RDMA) network. This is supported by petabytes of high-performance cluster file storage designed for highly parallel applications.
Microsoft hasn't said just how many of Oracle's GPU nodes it needs for its AI services and apps, and won't. A spokesperson told us: "Those aren't details we are sharing as part of this announcement." We've asked Oracle for more information too, and we'll let you know if we hear anything back.
This isn't the first time the frenemies have leaned on each other for help. Back in September Oracle announced it would colocate its database systems in Microsoft Azure datacenters. In that case, the collaboration was intended to reduce the latency associated with connecting Oracle databases running in OCI to workloads in Azure. ®