AWS outage killed some cloudy servers, recovery time is uncertain
‘Power event’ blamed, hit subset of kit in US-EAST-1
Updated: Parts of Amazon Web Services' US-EAST-1 region have suffered about half an hour of downtime, but some customers' instances and data can't be restored because the hardware running them appears to have failed completely.
The cloud colossus’ status page reports an investigation of “connectivity issues affecting some instances in a single Availability Zone in the US-EAST-1 Region” as of 3:13 PM PDT on Thursday, May 31.
A 3:42 PM update confirmed “an issue in one of the datacenters that makes up one of US-EAST-1 Availability Zones. This was a result of a power event impacting a small percentage of the physical servers in that datacenter as well as some of the networking devices.”
“Customers with EC2 instances in this availability zone may see issues with connectivity to the affected instances. We are seeing recovery and continue to work toward full resolution.”
By 4:29 PM the company said it had “restored power to the vast majority of the affected instances and continue to work towards full recovery.”
AWS’ EC2, Relational Database Service, Workspaces and Redshift were all impacted.
Plenty of AWS users took to Twitter to apologise for the outage’s effect on their services. One of them was Open Whisper Systems, maker of the Signal secure messaging service. Signal and AWS recently clashed over “domain fronting”, a technique Signal uses to enhance privacy but which AWS doesn’t like because it exposes the cloud provider to risk.
Amazon threatens to suspend Signal's AWS account over censorship circumvention: https://t.co/8llgFKoCGY— Signal (@signalapp) May 1, 2018
Some saw Signal going dark as evidence AWS had booted the company off its platform. Happily, Signal was just a victim of the outage like plenty of others.
The Signal service has recovered from an outage that was caused by data center power issues. We appreciate your patience as we worked towards resolution, and we'll continue to make additional improvements. Want to help? We are hiring Server Developers: https://t.co/QcCbR3Ia79— Signal (@signalapp) May 31, 2018
All other AWS regions appear to be working just fine at the time of writing. If AWS offers new information, we’ll update this story. ®
7:30 PM PDT June 1st: AWS has now revealed more detail about the outage.
"Beginning at 2:52 PM PDT a small percentage of EC2 servers lost power in a single Availability Zone in the US-EAST-1 Region," the company's guidance now says.
"This resulted in some impaired EC2 instances and degraded performance for some EBS volumes in the affected Availability Zone."
"Power was restored at 3:22 PM PDT" and that fixed most problems. But the company says some instances haven't come back yet because they were "hosted on hardware which was adversely affected by the loss of power."
"While we will continue to work to recover all affected instances and volumes, for immediate recovery, we recommend replacing any remaining affected instances or volumes if possible," it added.