Updated: In a touching show of solidarity with its Exchange Online cousin, Microsoft’s Azure Multi-Factor Authentication (MFA) service has fallen over and is struggling to get back up. Again.
If Microsoft hasn’t developed an AI bot capable of filling its social media orifices with apologies yet, then it is surely only a matter of time before it does so.
A Microsoft engineer, fingers doubtless weary from writing up last week’s fiasco, took to the Azure status page to admit that, yes, as of 14:25 UTC today, MFA was having problems. But it's OK – it's only a “subset” of customers. The Windows giant went on to warn that those who had MFA required by policy might experience intermittent issues signing in to Azure resources.
These resources include Azure Active Directory. Can you hear the admins wailing?
MFA is undoubtedly a good thing, since it forces users to prove who they are in two or more ways rather than with just a password. A phone, dongle or biometrics can come into play as well. Assuming the MFA service is actually running, of course.
The issue, which is worldwide, comes hot on the heels of the publication of a root cause analysis into last week's incident, in which a trio of failures left users unable to access their beloved Office 365 services.
"is it down again? Same issues as the other day. not acceptable" – Olie Denyer (@odee30), November 27, 2018
At the time, Microsoft said it would endeavour to prevent a recurrence of the problem by looking at how it handled testing and updates, and by reviewing ways of containing failures before they kick off.
Hopefully that review didn’t take long, because there is a failure happening right now that sure needs some containment.
In the meantime, some unkind customers have suggested applying the solution that worked last time. You know: turn it off and turn it on again.
"MFA down again. Have you tried to restart? xD pic.twitter.com/dKgYTwqDjr" – Mariano de Pedro (@Crower), November 27, 2018
We contacted Microsoft to find out what had become of the service and the lessons learned from last week, but have yet to receive a response. ®
Updated to add
According to Microsoft, "After a preliminary investigation, engineers found that an earlier DNS issue triggered a large number of sign-in requests to fail, which resulted in backend infrastructure becoming unhealthy."
And yes, the outage was tackled, and systems restored, after switching equipment off and on again: "After the DNS issue was resolved, engineers then focused on cycling the relevant backend services to resolve the congestion issue. They observed a decrease in the failure rate after the reboot cycles."
A full postmortem will be released in the next few days.