Rackspace datacenter infrastructure took 12-hour nap in London, Sydney, Hong Kong
Borked SANs, not a security SNAFU, identified as the cause. Services are back, but Linux VMs must reboot
Updated Rackspace is in a mess again.
The cloudy concern's status page reports outages in its SYD2, LON5, LON3, and HKG5 datacenter infrastructure across May 29 and 30.
Rackspace's first incident report is timestamped 29 May 22:24 CDT.
A subsequent update identified the issue as related to Dense Wavelength-Division Multiplexing (DWDM) in London, as that facility connects to a fiber transport network that Rackspace uses to carry traffic between its datacenters and internet service providers.
But an hour later Rackspace ruled out DWDM as a cause of the incident. The company has not updated its status page since.
The Register has obtained an email that a SaaS company hosted on Rackspace sent to its customers.
"Our hosting provider Rackspace have confirmed they are experiencing connectivity issues," the email opens. "All available engineers have been engaged and are working to resolve the issue with the highest priority."
It gets worse: Rackspace has warned customers of its London datacenters that whatever's causing the issue may disrupt their backups, and offered instructions on how to detect any failures.
At the time of writing – 02:45 CDT on May 30 – Rackspace had not updated its status page for over an hour. The Register has sought comment and will update this story if we receive useful information.
- Rackspace racks up job cuts amid market downturn and talk of offshoring
- Rackspace confirms ransomware attack behind days-long email meltdown
- On the 12th day of the Rackspace email disaster, it did not give to me …
- Microsoft ditches plans for 500,000 sq ft London office
This outage comes at a terrible time for Rackspace as its US and UK customers emerge from a holiday weekend.
The company is also far from out of the woods after the December 2022 ransomware attack on its Hosted Exchange environment, which caused weeks of disruption and saw the service abandoned.
That incident led to a protracted inability to access data, again with terrible timing as customers prepared for the festive season. Class actions are under way to give aggrieved customers a chance at compensation.
And now Rackspace customers on three continents have a new set of worries. ®
Updated at 23:00 UTC, May 30
Rackspace has identified the cause of the problem: "I/O limits in the multi-tenant Shared SAN environment had reset incorrectly."
Rackspace ran a script to reset the value and as of 12:10 CDT services were restored – with some exceptions.
"It has been identified that any impacted Linux VMs (virtual machines) will not automatically recover if storage has been adjusted and will need to be manually rebooted. Rackspace engineers can reboot impacted VMs from the portal where necessary," states Rackspace's status update.
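For admins wondering whether their VMs were among those hit, a common symptom when SAN-backed storage stalls is that the Linux kernel remounts filesystems read-only after I/O errors. The commands below are a generic sketch of how one might check for that; they are standard Linux diagnostics, not a Rackspace-documented procedure.

```shell
# List any mounted filesystems the kernel has flagged read-only
# (field 4 of /proc/mounts holds the mount options).
awk '$4 ~ /(^|,)ro(,|$)/ {print $1, $2}' /proc/mounts

# Scan the kernel ring buffer for storage I/O errors; dmesg may be
# restricted on some systems, so tolerate failure in this sketch.
dmesg 2>/dev/null | grep -iE 'i/o error|blk_update_request|remount.*read-only' || true
```

A VM showing its root or data volume among the read-only mounts would need the manual reboot Rackspace describes.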
A Rackspace spokesperson told us the incident is not considered a security matter.