TensorBlow? Data boffins struggle with GPU shortage in Google Cloud, opposition offers to help out coders

Scale with the king of hyperscalers... until the silicon runs out, that is

Updated The cloud continued to totter this month, and it was Google's turn to run into resource droughts – this time, virtual machine instances with GPUs.

Users have spent the past week struggling to spin up machines with graphics chip accelerators across all Google Cloud Platform zones, with gripes cropping up in the Chocolate Factory's own issue tracker as well as Google Groups threads devoted to Mountain View's cloud services.

Like other cloud vendors, Google offers a range of GPUs to accelerate machine learning and data processing, as well as chips aimed at graphics workloads, such as 3D rendering. The internet giant suffered an embarrassing glitch last year with its wonder-chips that left some researchers thinking they had stumbled on a method of getting freebie acceleration before the grant-munching invoice arrived later.

The issue appears to be intermittent, with some users having success, presumably as others complete their jobs and free up the constrained resources. While some had GPU joy by upping their quota, one user noted they had "never seen the resource shortage be so widespread and long-lasting before."

Another user reported having a US VM equipped with an Nvidia Tesla K80 that threw a "does not have enough resources available to fulfill the request" error when the thing was fired up over the weekend for a bit of deep learning action. Others fell back to a desktop GTX 1070 – hardly a scalable solution.

There's the cloud for that. Oh, wait...

It could be a coincidence, but the deadline for paper abstract submissions for this year's AI super-conference, NeurIPS, is May 27, so perhaps boffins globally are snapping up all available GPUs to run and test their models for their proposals.

Oracle's Karan Batta was as helpful as one might expect a cloud rival to be, replying to a Twitter thread about the issue to say Big Red was "happy to help out... we've got plenty" of GPU capacity. Other users suggested a jump to AWS for urgent workloads.

Since Larry Ellison's crew trails behind Amazon, Microsoft, and Google in the market-share stakes, the news that it has some unused capacity is not altogether surprising.

The Register asked Google for its take on things, and we have yet to hear back. Normal compute processing on the platform appears unaffected; only the tasks that need the more exotic Nvidia hardware seem to be hit. ®

Updated to add

"We are aware of an issue with a small number of customers being able to access GPU infrastructure and have made changes to improve their ability to obtain capacity," a Google Cloud spokesperson told The Reg. "Impacted customers should start seeing improvements today, and our team will continue to monitor the situation."


Biting the hand that feeds IT © 1998–2020