Google broke its own cloud again

Software bug confused SSDs into thinking they were full during maintenance


Google has 'fessed up to breaking its own cloud. Again.

The most recent mess occurred on June 28 when Google Compute Engine SSD Persistent Disks in us-central1-a “experienced elevated write latency and errors in one zone for a duration of 211 minutes.” The mess meant that disks probably stopped accepting writes and instances that used SSDs as their root partition probably hung.

Google's very good about revealing just why things go bad in its cloud. This time around it says: “Two concurrent routine maintenance events triggered a rebalancing of data by the distributed storage system underlying Persistent Disk.”

Nothing to worry about there, because “this rebalancing is designed to make maintenance events invisible to the user, by redistributing data evenly around unavailable storage devices and machines.”

Which is just how a cloud should behave: lots of moving parts at the back end, invisible to you, while you just keep getting well-behaved servers.

But on this occasion, “a previously unseen software bug, triggered by the two concurrent maintenance events, meant that disk blocks which became unused as a result of the rebalance were not freed up for subsequent reuse, depleting the available SSD space in the zone until writes were rejected.”

Oops.

And once the disks thought they'd run out of space, no amount of clever back-endery could help for the 211 minutes it took Google to figure it out and set things to rights.
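
For the curious, the failure mode Google describes can be imitated with a toy model. The sketch below is purely illustrative and is not Google's storage system: the `ToyPool` class, its `free_on_move` flag and the block counts are all invented for this example. It assumes a fixed pool of blocks in which a rebalance copies live data to fresh blocks and is then supposed to hand the old blocks back.

```python
class ToyPool:
    """Hypothetical fixed-size pool of SSD blocks (illustration only)."""

    def __init__(self, capacity, free_on_move=True):
        self.capacity = capacity
        self.used = 0        # blocks holding live data
        self.leaked = 0      # unused blocks that were never handed back
        self.free_on_move = free_on_move

    def free_blocks(self):
        return self.capacity - self.used - self.leaked

    def write(self, blocks):
        """Accept the write only if enough free blocks remain."""
        if blocks > self.free_blocks():
            return False
        self.used += blocks
        return True

    def delete(self, blocks):
        self.used -= min(blocks, self.used)

    def rebalance(self, blocks):
        """Move up to `blocks` of live data to fresh blocks."""
        moved = min(blocks, self.used, self.free_blocks())
        # The new copies take fresh blocks and the old copies become unused.
        # A healthy pool returns the old blocks, so free space is unchanged.
        if not self.free_on_move:
            # Assumed bug: the old blocks are never returned to the free list.
            self.leaked += moved
        return moved


def rounds_until_rejected(pool, move=10, max_rounds=100):
    """Rebalance repeatedly; return the round at which a 1-block write fails."""
    for n in range(1, max_rounds + 1):
        pool.rebalance(move)
        if not pool.write(1):
            return n
        pool.delete(1)  # steady workload, so only a leak can eat free space
    return None


healthy = ToyPool(100)
healthy.write(50)
buggy = ToyPool(100, free_on_move=False)
buggy.write(50)
print(rounds_until_rejected(healthy), rounds_until_rejected(buggy))
```

With these made-up numbers, the healthy pool never rejects a write, while the leaky one runs out of free blocks after a handful of rebalances even though the amount of live data never grows.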

As ever, Google's pledged to do better in future and says its “engineers are refining automated monitoring such that, if this issue were to recur, engineers would be alerted before users saw impact. We are also improving our automation to better coordinate different maintenance operations on the same zone to reduce the time it takes to revert such operations if necessary.”
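
Google has not said how that monitoring will work. One plausible shape for an "alert before users see impact" check, offered only as a sketch, is to project when free space will hit zero from recent samples and page someone if that moment is too close. The function names, the sample format and the 60-minute lead time below are all assumptions made for this example.

```python
def minutes_until_full(samples):
    """Project minutes until free space reaches zero.

    `samples` is a time-ordered list of (minute, free_blocks) pairs.
    Returns None if free space is not shrinking over the window.
    """
    (t0, f0), (t1, f1) = samples[0], samples[-1]
    if t1 <= t0 or f1 >= f0:
        return None
    rate = (f0 - f1) / (t1 - t0)  # blocks consumed per minute
    return f1 / rate


def should_alert(samples, lead_minutes=60):
    """Alert if projected exhaustion falls within the assumed lead time."""
    if len(samples) < 2:
        return False
    eta = minutes_until_full(samples)
    return eta is not None and eta <= lead_minutes


# Free space dropping from 1000 to 500 blocks in ten minutes.
print(should_alert([(0, 1000), (10, 500)]))
```

A straight-line projection like this is crude, but it illustrates the idea: a zone leaking space at a steady clip would trip the alarm while writes were still being accepted.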

As we've previously noted, Google is more candid than its rivals when it discloses outages and their causes. But it also appears to have more outages to disclose: The Register monitors the big three clouds' outage notifications and Google announces problems more often than either AWS or Microsoft, both of which have larger clouds with more products.

The Alphabet subsidiary's new cloud chief Diane Greene has quite a job ahead of her. ®

Biting the hand that feeds IT © 1998–2022