Brace yourselves: Google Cloud preps server firmware upgrade to fix GPU glitches

If you run a physical graphics accelerator and SSD in the G-cloud, pay attention


Google Cloud has warned that some of its servers will soon have their firmware upgraded, an event that will likely disrupt workloads.

An email sent to Google Compute Engine (GCE) customers on Tuesday, and an accompanying incident report, state: “We are experiencing an issue with Google Compute Engine beginning in 2020-08. A firmware rollout is being created that should address the issue.”

“The rollout is currently expected to complete next week, but mitigation efforts are still ongoing,” Google’s advisory added. “Affected customers will experience elevated frequency of Host Maintenance events.”

A what? In a support document, Google explains that Host Maintenance events on GPU instances translate into downtime while the underlying cloud platform is updated and tweaked.

“GPU instances cannot be live migrated,” the document reads. “You must set your GPU instances to stop for host maintenance events. If needed, you can set your stopped instances to automatically restart after the maintenance event completes.”
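For the curious, here is a minimal sketch of what that setting looks like in the Compute Engine REST API's instance body. The `scheduling.onHostMaintenance`, `scheduling.automaticRestart` and `guestAccelerators` fields are real API fields. The helper functions, the VM name, the zone and the machine and accelerator types are illustrative assumptions. Disks and networking are deliberately left out.

```python
import json


def gpu_scheduling(auto_restart: bool = True) -> dict:
    """Hypothetical helper: the 'scheduling' block a GPU VM needs.

    GPU instances cannot live migrate, so onHostMaintenance must be
    "TERMINATE" rather than the default "MIGRATE".
    """
    return {
        "onHostMaintenance": "TERMINATE",  # stop the VM for host maintenance
        "automaticRestart": auto_restart,  # restart it once maintenance completes
    }


def instance_body(name: str, machine_type: str, accelerator_type: str, count: int) -> dict:
    """Hypothetical helper: a partial instance body (no disks or networking)."""
    return {
        "name": name,
        "machineType": machine_type,
        "guestAccelerators": [
            {"acceleratorType": accelerator_type, "acceleratorCount": count}
        ],
        "scheduling": gpu_scheduling(),
    }


if __name__ == "__main__":
    body = instance_body(
        "example-gpu-vm",  # illustrative name
        "zones/us-central1-a/machineTypes/n1-standard-8",  # illustrative type
        "zones/us-central1-a/acceleratorTypes/nvidia-tesla-t4",  # illustrative GPU
        1,
    )
    print(json.dumps(body, indent=2))
```

From the command line, the rough equivalent is passing `--maintenance-policy=TERMINATE` to `gcloud compute instances create`.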

Host Maintenance events are not uncommon. Google warns its cloud subscribers that they should expect one every two weeks, and sometimes more frequently. The idea is that you use the gear for batch processing jobs such as AI training, or plan around the restarts if you need the systems available all the time.
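One way to plan around those restarts is to watch the instance metadata server for a pending maintenance event and checkpoint work before the VM stops. The sketch below assumes it runs inside a Compute Engine VM. It relies on the metadata server's `instance/maintenance-event` key, the `Metadata-Flavor: Google` header and the `wait_for_change` query parameter. The `checkpoint` callback is a hypothetical stand-in for whatever state-saving your job does.

```python
import urllib.request
from typing import Callable

# Metadata server key reporting upcoming maintenance for this VM.
# It is only reachable from inside a Compute Engine instance.
MAINTENANCE_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"
)


def should_checkpoint(event: str) -> bool:
    """Return True when the reported value signals pending maintenance."""
    # "NONE" means nothing is scheduled; GPU VMs cannot live migrate,
    # so any other value means the instance is about to stop.
    return event.strip() != "NONE"


def wait_for_maintenance_event() -> str:
    """Block until the maintenance-event value changes, then return it."""
    req = urllib.request.Request(
        MAINTENANCE_URL + "?wait_for_change=true",
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def watch(checkpoint: Callable[[], None]) -> None:
    """Loop forever, calling the (hypothetical) checkpoint hook before downtime."""
    while True:
        if should_checkpoint(wait_for_maintenance_event()):
            checkpoint()


if __name__ == "__main__":
    watch(lambda: print("maintenance pending: saving training state"))
```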

Google says its cloud servers powered by Nvidia's V100 GPU are unaffected. That suggests this specific problem hits servers in the G-fleet that feature other GPU accelerators, such as Nvidia's Tesla P4, T4, K80 and P100.

While GCE customers have some work to do ahead of this event, The Register cannot find evidence that whatever issue the firmware upgrade addresses has created noticeable problems. If the upgrade delivers significant performance improvements, it will be a little embarrassing, given that GPUs and SSDs attract premium prices on the basis of their superior specs. ®


Biting the hand that feeds IT © 1998–2020