Microsoft is recovering somewhat from a bad case of the Mondays that left some of its subscribers unable to use multi-factor authentication to log into their cloud services.
The Redmond giant said that around 2130 UTC today it had managed to get its Azure Cloud back up and running as per normal. Meanwhile, Office 364 is still being knocked into shape by the Windows giant's techies.
Azure and the cloud-based Office suite started playing up at 0439 UTC, meaning multi-factor authentication has been knackered for about 17 hours, preventing unlucky users from logging in.
Over the weekend, Azure's DevOps services were also a little wobbly, we note.
In the case of Azure's login woes, the fix seemed relatively straightforward, if a little clichéd. Microsoft turned its computers off and on again, according to the cloud platform's status page at time of writing:
Engineers cycling of impacted servers is complete and initial telemetry and customer reports indicates issue is majority mitigated. telemetry will continue until mitigation confirmed. Engineers will continue to monitor any updates or changes made from the work-streams currently being explored.
Just as we were publishing this story, we noticed Microsoft had quietly revised its explanation of the Azure cock-up: it now blames an overloaded Redis database in Europe that ultimately took out other systems when it tried to fail over to equipment in North America. A hotfix was applied, and machines were rebooted, to revive the platform's multi-factor authentication systems, as the status page now explains:
Preliminary root cause: Requests from MFA servers to Redis Cache in Europe reached operational threshold causing latency and timeouts. After attempting to fail over traffic to North America this caused a secondary issue where servers became unhealthy and traffic was throttled to handle increased demand.
Mitigation: Engineers deployed a hotfix which eliminated the connection between Azure Identity Multi-Factor Authentication Service and a backend service. Secondly engineers cycled impacted servers which allowed authentication requests to succeed.
In a way, Microsoft is saying its cloud couldn't handle the weight of multi-factor login requests. Given the multi-factor tokens are only valid for a short time – there's typically a 60-second window to enter the correct code – high latency can be a real problem.
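To see why latency bites, here is a minimal sketch of a time-based one-time code check in the style of RFC 6238. It is an illustration only, not Microsoft's implementation: the 60-second step comes from the "typically 60-second" figure above, the shared secret is the RFC 4226 test key, the server is assumed to accept only the current time step, and `survives_latency` is a hypothetical helper made up for this example.

```python
import hashlib
import hmac
import struct

STEP_SECONDS = 60  # assumed validity window; real deployments vary


def totp(secret: bytes, at_time: float, step: int = STEP_SECONDS, digits: int = 6) -> str:
    """Return the RFC 6238-style code for the time step containing at_time."""
    counter = int(at_time // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation as in RFC 4226: the low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify(secret: bytes, code: str, server_time: float, step: int = STEP_SECONDS) -> bool:
    """Accept the code only if it matches the step the server is in when it checks."""
    return hmac.compare_digest(code, totp(secret, server_time, step))


def survives_latency(secret: bytes, entered_at: float, latency: float,
                     step: int = STEP_SECONDS) -> bool:
    """Hypothetical helper: does a code entered at entered_at still verify
    if the backend only gets round to checking it latency seconds later?"""
    code = totp(secret, entered_at, step)
    return verify(secret, code, entered_at + latency, step)


if __name__ == "__main__":
    secret = b"12345678901234567890"  # RFC 4226 test key, not a real credential
    for delay in (5, 70):
        ok = survives_latency(secret, entered_at=10, latency=delay)
        print(f"checked {delay}s after entry: {'accepted' if ok else 'rejected'}")
```

In this toy model, a code typed 10 seconds into a step survives a 5-second delay, but once a slow backend pushes the check into the next step the same code is rejected. Real services often tolerate a step or two of drift, which softens the effect but does not remove it.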
As for Office 363-and-a-half, the fix does not appear to be so straightforward. Redmond said that it is still working on the problem, and as of right now engineers aren't even sure just what went wrong.
"Due to the complex nature of this problem, our investigation into the root cause of this issue may take an extended period of time," Microsoft told customers. "This incident is and will remain our highest priority until the underlying source of the problem has been identified."
This is after many, but not all, punters have spent most of the day unable to log in to their Microsoft-hosted services via multi-factor authentication. The outages were felt worldwide, beginning on Monday afternoon in Asia, carrying on into Europe's start-of-the-week, and then into the Americas, as unlucky users found their two-factor gizmos no help in getting past the login screen.
For security reasons, multi-factor authentication is highly recommended in order to prevent password-stealing hackers from hijacking accounts.
Now, with Asia nearly ready to get up and start its Tuesday workday, Microsoft has yet to fully resolve its Office 359 login headaches. Users also cannot reset their passwords.
If there is one saving grace for Redmond, it is that thanks to the impending Thanksgiving holiday combined with severe weather on the US East Coast and wildfire-induced pollution in California, a good number of Americans are working from home on Monday, potentially limiting frustrations stemming from the outages: it's quite easy to go back to bed if you can't even log in.
The Register will continue to monitor the situation and update as needed. ®