Amazon: Some data won't be recovered after cloud outage

Post mortem wait


Amazon says that about 0.07 per cent of the EBS storage volumes in the East Region of its infrastructure cloud are not "fully recoverable" following the extended outage that hit the service last Thursday.

The company has yet to fully explain the cause of the outage, but it still plans to publish a "post mortem" on the incident. "We are digging deeply into the root causes of this event," the company says in a post to its Amazon Web Services status dashboard.

In the early hours Pacific time on Thursday, Amazon said on its status page that it was investigating connectivity issues with its EC2 (Elastic Compute Cloud) service, which provides on-demand access to processing power via the web. The outage brought down many websites that run atop the service, including Quora, Sencha, Reddit, and FourSquare. According to one of the brief status messages from Amazon, the problem began with a "network event" that caused the service to re-mirror a large number of Elastic Block Store volumes in its East Region.

Amazon divides its "infrastructure cloud" into multiple geographic regions, and it guarantees 99.95 per cent availability within each region if you're using multiple "availability zones". Some regions – including the East Region, served up from Northern Virginia – are divided into these ostensibly separate zones, and Amazon has always said that these zones are "insulated" from each other's failures. But the East Region outage spread across multiple zones.
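
For a sense of scale, a 99.95 per cent guarantee leaves room for a little under four and a half hours of downtime across a year. Here is a minimal sketch of that arithmetic, assuming the figure is measured over a 365-day year (the measurement window is an assumption here, and the function name is purely illustrative):

```python
# Assumption: availability is measured over a 365-day year.
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(availability_pct: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime permitted over the period
    at the given availability percentage."""
    return (1 - availability_pct / 100) * period_hours


if __name__ == "__main__":
    budget = downtime_budget_hours(99.95)
    print(f"{budget:.2f} hours of downtime allowed per year")
```

By that yardstick, an outage stretching from Thursday into the weekend runs well past the yearly allowance for any customer whose volumes were stuck throughout.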

On Sunday, the company said that a "majority" of affected EBS volumes had been restored, but that it needed more time to restore data for some customers. But on Monday, it announced that some volumes would not be restored. "We have completed our remaining recovery efforts and though we've recovered nearly all of the stuck volumes, we've determined that a small number of volumes (0.07% of the volumes in our US-East Region) will not be fully recoverable," the company said.

It is in the process of contacting these customers.

For many – including Thorsten von Eicken, CTO of RightScale, an EC2 management service, and the employees of Scalr, an open source platform similar to RightScale – one of the chief problems is that Amazon has so far provided so little information about the outage. We await the post mortem with bated breath. Amazon has never said how its "availability zones" are designed. ®

Update: This story has been updated to provide more detail on Amazon's uptime guarantee for EC2.


Biting the hand that feeds IT © 1998–2022