With OpenAI, there are no allegiances - just compute at all costs

Google's TPUs might not be on Altman's menu just yet, but he's never been all that picky about hardware

Analysis No longer bound to Microsoft's infrastructure, OpenAI is looking to expand its network of compute providers to the likes of Oracle, CoreWeave, and apparently even rival model builder Google.

But while OpenAI may have set up shop at the Chocolate Factory, it won't be using Google's home-grown tensor processing units (TPUs) to run or train its models anytime soon, the AI darling tells Reuters.

In a statement made to the publication over the weekend, OpenAI admitted it was playing with Google's TPUs, but didn't have any plans to deploy them at scale right now.

The denial comes days after The Information reported that Google had managed to convince the model builder to transition its workloads over to the home-grown accelerators.

OpenAI's alleged embrace of Google's TPU tech was seen by many as a sign that the Sam Altman-led model builder wasn't just looking to end its reliance on Microsoft, but also to curb its dependence on Nvidia hardware.

However, if you've been paying attention, you'll know that OpenAI has been diversifying its hardware stack for a while now. The company may have gotten its start using Nvidia's DGX systems, but it's never been an exclusive relationship.

Over the years, the model builder's fleet of GPTs has run on a wide variety of hardware. You may recall that Microsoft had GPT-3.5 running on its home-grown Maia accelerators. 

Microsoft, OpenAI's chief infrastructure provider until just recently, was also among the first to adopt AMD's Instinct MI300-series accelerators, with serving models like GPT-4 cited as one of the key use cases for them.

AMD's accelerators have historically offered higher memory capacities and bandwidth, likely making them more economical than Nvidia's GPUs for model serving.

And even as OpenAI's ties to Microsoft soften, AMD remains a key hardware partner for the budding AI behemoth. Last month, Altman took the stage at AMD's Advancing AI event in San Jose to highlight their ongoing collaboration.

If that weren't enough, OpenAI is reportedly developing an AI chip of its own to further optimize the ratio of compute, memory, bandwidth, and networking for its training and inference pipelines.

Considering all this, the idea that OpenAI is playing with Google's home-grown silicon isn't that surprising. The search giant's Gemini models have already shown the architecture is more than capable of large-scale training.

Google also offers these accelerators in a number of configurations, each with different ratios of compute, memory, and scalability, which would give OpenAI a degree of flexibility depending on whether it was short on compute for training jobs or memory bandwidth for inference workloads.

The Chocolate Factory's seventh-generation Ironwood TPUs boast up to 4.6 petaFLOPS of dense FP8 performance, 192 GB of high-bandwidth memory (HBM) good for 7.4 TB/s of bandwidth, and 1.2 TB/s of inter-chip bandwidth, putting them in the same ballpark as Nvidia's Blackwell accelerators.

TPUv7 can be had in two configurations: pods of either 256 or 9,216 chips. We're told multiple pods can be tied together to further extend compute capacity to more than 400,000 accelerators. And if there's anything that gets Sam Altman excited, it's massive quantities of compute.
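For a sense of scale, some back-of-the-envelope math on those figures (the per-chip FLOPS and pod sizes are from Google's published specs; the rest is simple multiplication, and doesn't account for interconnect or utilization overheads):

```python
# Rough pod-level FP8 compute for Ironwood (TPUv7), using the
# per-chip figure cited above: 4.6 dense FP8 petaFLOPS per chip.
PER_CHIP_FP8_PFLOPS = 4.6

# The two advertised pod sizes, plus the ~400,000-chip multi-pod figure.
for chips in (256, 9_216, 400_000):
    exaflops = chips * PER_CHIP_FP8_PFLOPS / 1_000  # petaFLOPS -> exaFLOPS
    print(f"{chips:>7,} chips: ~{exaflops:,.1f} exaFLOPS dense FP8")
```

By that napkin math, a full 9,216-chip pod lands around 42 exaFLOPS of dense FP8, and a 400,000-chip multi-pod deployment would be measured in the thousands of exaFLOPS — peak figures, not what any real training job would sustain.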

So why did OpenAI decide against deploying Google's TPUs at scale? There could be a few factors at play here. Performance may not have been as good as expected, Google may not have had enough TPUs to spare to meet OpenAI's needs, or the cost per token may simply have been too high.

However, the most obvious answer is that OpenAI's software stack has, for the most part, been optimized to run on GPUs. Adapting this software to take full advantage of Google's TPU architecture would take time and additional resources, and ultimately may not offer much, if any, tangible benefit over sticking with GPUs.

As they say, the grass is always greener on the other side. You'll never know for sure unless you check. ®
