AWS US East region endures eight-hour wobble thanks to 'Stuck IO' in Elastic Block Store

EC2 instances were impaired, Redshift hurt, and some of you may still struggle to access your data

Tue 28 Sep 2021 // 07:57 UTC

Amazon Web Services' largest region yesterday experienced an eight-hour disruption with the Elastic Block Store (EBS) service that impacted several notable web sites and services.

The lack of fun started at 8:11pm PDT on Sunday, when EBS experienced "degraded performance" in one availability zone (USE1-AZ2) in the US-EAST-1 Region. A subsequent update described the issue as "Stuck IO" and warned that existing EC2 instances may "experience impairment" while new EC2 instances could fail.

US East is the only AWS Region to offer six availability zones – a reflection of its status as the company's first location.

Other AWS services – among them Redshift, OpenSearch, Elasticache, and RDS databases – experienced "connectivity issues" as well.

By 9:17pm, AWS felt that the number of EC2 instances impacted by the issue had plateaued, but users continued to experience difficulties.

By 9:47pm the beginnings of an explanation emerged, as AWS revealed "A subsystem within the larger EBS service that is responsible for coordinating storage hosts is currently degraded due to increased resource contention."

Among the organisations impacted were secure messaging app Signal …

Hold tight, folks! Signal is currently down, due to a hosting outage affecting parts of our service. We’re working on bringing it back up.
— Signal (@signalapp) September 27, 2021

… and The New York Times games site (yes, your correspondent has a Spelling Bee problem).

Hi folks, our Games page is now up and running. Apologies for disturbing your morning routines, but back to your Monday solving! 🧩
— NYTimes Wordplay (@NYTimesWordplay) September 27, 2021

At 10:23pm AWS explained it had made "several changes to address the increased resource contention within the subsystem responsible for coordinating storage hosts with the EBS service". Although those changes "led to some improvement", an 11:19pm update still reported only "some improvements" and admitted "we have not yet seen performance for affected volumes return to normal levels".

A minute later, AWS rolled a change. By 11:43pm, AWS was confident enough to report the mitigations had worked, and predicted EBS volume performance would return to normal levels within an hour.

But at 1:15am the next day, a glitch struck. Some restored services slowed down again, and some new volumes also experienced "degraded performance".

By 3:36am new EC2 instances were again booting without incident, and at 4:21am the cloud concern updated its status feed with news that full operations had been restored at 3:45am.

But the company also admitted "While almost all of EBS volumes have fully recovered, we continue to work on recovering a remaining small set of EBS volumes.

"While the majority of affected services have fully recovered, we continue to recover some services, including RDS databases and Elasticache clusters," the final update added.

Clouds. Sometimes it's hard to find the silver lining. ®
