Updated: Microsoft Azure tumbled over in northern Europe – and services have effectively stayed down for unlucky customers for around five hours.
The disruption was caused by problems within the cloud platform's storage and networking systems, we're told. According to the Windows giant, from 1744 UTC today until the time of writing, 2115 UTC:
A subset of customers using Virtual Machines, Storage, SQL Database, Key Vault, App Service, Site Recovery, Automation, Service Bus, Event Hubs, Data Factory, Backup, API Management, Log Analytics, Application Insights, Azure Batch, Azure Search, Redis Cache, Media Services, IoT Hub, Stream Analytics, Power BI, Azure Monitor, Azure Cosmos DB or Logic Apps in North Europe may experience connection failures when trying to access resources hosted in the region.
It added on its status page that "engineers have identified the root cause, and [are] actively working to mitigate the issue."
Amusingly, amid its tweets trying to soothe frustrated customers, the Azure Support Twitter account tweeted the following now-deleted and probably automatically scheduled suggestion:
Need help choosing which #Azure services to use for your own projects or for customer solutions? Check out this blog post from Rob Caron and @AzureBarry for the process they use to decide which services are best for them...
Cue wry replies suggesting one should choose to avoid services in northern Europe right now. Meanwhile, the downtime was knocking websites and production systems offline for affected punters...
Our production VM's are down ... customer complaints pouring in! #Azure – Tanny (@TannysDream) June 19, 2018
Sorry for the delay, but we currently have a Storage outage in North Europe. Engineers are actively investigating and working to mitigate the issue. Check out https://t.co/Dw19fIoS5H and the portal at https://t.co/cFMfZQMdWd for updates. ^DO – Azure Support (@AzureSupport) June 19, 2018
We'll let you know when services have been restored. ®
Updated to add at 2315 UTC
Five and a half hours in, services have still not been fully restored. According to Microsoft:
Engineers are investigating a control system failure that affected a limited number of Storage scale units and network infrastructure in one of the data centers supporting the region. Work is in progress to recover impacted scale units and restore service to downstream impacted services.
Updated to add at 0010 UTC
We're assured that Azure is starting to pull itself together in northern Europe after Redmond's techies pulled out all the stops to fix their equipment – although it seems the kit may, to some degree, have cooked itself. According to the US IT giant:
Engineers are seeing recovery. As we work to recover and validate improvements, engineers continue to investigate an underlying temperature issue in one of the data centers which caused Storage, Network device equipment to fail.
We've asked Microsoft what exactly an "underlying temperature issue" is.
Updated to add at 0700 UTC
Microsoft's status page now states: "Engineers identified that an underlying temperature issue in one of the data centers in the region triggered an infrastructure alert, which in turn caused a structured shutdown of a subset of Storage and Network devices in this location to ensure hardware and data integrity."
A full explanation should land inside 72 hours.
For what it's worth, the North Europe region is located in Ireland, where the weather has been typically temperate this week. Next week, however, temperatures are expected to graze 30°C. If Azure's North Europe operations have problems handling heat, this may not be the last outage it experiences.