Microsoft admits 'power issue' downed Azure services in West Europe
Work ongoing to manually recover some storage nodes
Updated Microsoft techies are trying to recover storage nodes for a "small" number of customers following a "power issue" on October 20 that triggered Azure service disruptions and ruined breakfast for those wanting to use hosted virtual machines or SQL DB.
The degradation began at 0731 UTC on Friday when Microsoft spotted the unspecified power problem, which affected infrastructure in one Availability Zone in the West Europe region. As a result, businesses using VMs, Storage, App Service, or Cosmos and SQL DB suffered interruptions.
So what caused this unplanned downtime session? Microsoft says in an incident report on its Azure status history page: "Due to an upstream utility disturbance, we moved to generator power for a section of one datacenter at approximately 0731 UTC. A subset of those generators supporting that section failed to take over as expected during the switch over from utility power, resulting in the impact."
Engineers managed to restore power at around 0800 UTC and the impacted infrastructure began to clamber back online. Once the networking and storage plumbing recovered, compute scale units were brought back into service, and for the "vast majority" of customers Azure services were accessible again from 0915 UTC.
Yet not everyone was up and running smoothly, Microsoft admitted.
"A small amount of storage nodes needs to be recovered manually, leading to delays in recovery for some services and customers. We are working to recover these nodes and will continue to communicate to these impacted customers directly via the Service Health blade in the Azure Portal."
We've asked Microsoft for an update on when those punters can expect normal service to resume.
Microsoft last reported unscheduled downtime for Azure SQL in mid-September. It was out for the count on the US east coast after a network power failure. The problem wasn't mitigated for more than half a day. Luckily it was a Saturday, so only die-hard workers were impacted.
A far worse biz interruption came in late August when the entire Australia East cloud region went under, with Microsoft admitting that insufficient staff numbers on site were, in part, to blame and that borked automation didn't help.
A report by the Uptime Institute in March found the rate of infrastructure outages had slowed in recent years, but they can still be pretty pricey when they happen. It said: "Decades of innovation, investment and better management mean that, overall, critical IT systems, networks and datacenters are far more reliable than they were."
It found that two-thirds of blackouts now cost more than $100,000. ®
Updated at 15.20 UTC on October 24 2023, to add:
A spokesperson at Microsoft sent us a statement:
"A small number of customers in West Europe may have experienced a longer duration of impact as we worked to recover some servers that required additional diagnostics. All services have now been fully restored and there is no indication of data loss for customers."