Microsoft extends 'outage mode' for Azure Active Directory to bake more resilience into cloudy services

But Redmond has bigger questions to answer regarding Azure architecture


Microsoft hopes to improve the resilience of its cloud services by extending an "outage mode" for Azure Active Directory to cover web as well as desktop applications.

Azure Active Directory (AAD) is Microsoft's cloud directory which handles authentication for Office 365 and can be linked to on-premises Active Directory. Further, developers can write applications that use the service. However, if it goes wrong, customers experience multiple failures, including the inability to access the Azure Portal in order to manage other cloud services.

In December last year Microsoft updated its SLA (Service Level Agreement) for AAD to 99.99 per cent uptime, increased from 99.9 per cent, though with some sleight of hand as it also removed "administrative features" from its definition of availability.
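To put the two SLA figures in perspective, here is a rough sketch of the downtime each one permits. The 30-day and 365-day periods and the function name `allowed_downtime_minutes` are illustrative assumptions, not taken from Microsoft's SLA text, which defines its own measurement period and its own notion of availability.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float) -> float:
    """Maximum downtime, in minutes, that still meets the SLA over the period.

    Illustrative arithmetic only: the period length is an assumption.
    """
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    for sla in (99.9, 99.99):
        monthly = allowed_downtime_minutes(sla, 30)
        yearly = allowed_downtime_minutes(sla, 365)
        print(f"{sla}%: {monthly:.2f} min per 30 days, {yearly:.1f} min per year")
```

On these assumptions the move from 99.9 to 99.99 per cent cuts the permitted downtime tenfold, from roughly 43 minutes to a little over 4 minutes in a 30-day period.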

Now the company has given more details about its efforts, focusing on a backup authentication service. This replicates authentication data during normal operations and, if the primary service fails, transitions to "outage mode", in which it is able to check requests and provide tokens to clients.

Microsoft's diagram showing how backup AAD works

According to Microsoft this has been in operation for Outlook Web Access and SharePoint Online since 2019, though we note that during the September 2020 outage both Outlook and SharePoint were impacted. The reason given at the time was that "a recent configuration change impacted a backend storage layer", a problem that was compounded by a further issue caused by "a change put in place to mitigate impact." It seems therefore that the backup service was not sufficient in that instance.

There is also a limitation in that authentications are only processed by the backup service if the user has already accessed an "app or resource" within the last three days, described as the "storage window." The company felt this was OK for most users who "access their most important applications daily from a consistent device," but it is easy to think of cases where users will be locked out, for example if they purchase a new device.
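The storage-window limitation can be sketched as follows. This is a hypothetical model built only from the description above, not Microsoft's implementation: the class `BackupAuthSketch`, the function `authenticate`, and the choice to key sessions on user, device and app are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# The three-day "storage window" described in the article.
STORAGE_WINDOW = timedelta(days=3)


class BackupAuthSketch:
    """Hypothetical backup service that remembers recent primary sign-ins."""

    def __init__(self):
        # (user, device, app) -> time of the last sign-in via the primary service.
        # Keying on the device is an assumption, based on the "consistent device" remark.
        self._last_seen = {}

    def record_primary_sign_in(self, user, device, app, when):
        # Stands in for replication of authentication data during normal operations.
        self._last_seen[(user, device, app)] = when

    def can_issue_token(self, user, device, app, now):
        # Only sessions seen within the storage window are covered.
        last = self._last_seen.get((user, device, app))
        return last is not None and now - last <= STORAGE_WINDOW


def authenticate(primary_up, backup, user, device, app, now):
    """Return which path served the request: 'primary', 'backup' or 'denied'."""
    if primary_up:
        backup.record_primary_sign_in(user, device, app, now)
        return "primary"
    if backup.can_issue_token(user, device, app, now):
        return "backup"  # outage mode
    return "denied"
```

In this model a user who signed in on Monday is still served by the backup path on Wednesday, while the same user on a newly purchased device, or after four days away, is denied until the primary service returns.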

It is better than nothing though, and Microsoft has been busy extending its applicability. Support for desktop and mobile applications was added earlier this year, and more web applications, including Teams Online and the rest of Office 365, will be covered next year. Customer applications using OpenID Connect will follow shortly.

More questions than answers

In some respects Microsoft's latest post raises more questions than it answers. A quick look at the Azure status page shows "Azure Active Directory - Issues when attempting to authenticate", possibly restricted to customers using Azure Active Directory External Identities. The root cause is attributed to "outbound port exhaustion", though where that sits on the company's architecture diagram is not clear.

In March this year there was an extended AAD outage caused by mistaken removal of a key used for cryptographic signing. Microsoft referenced the backup service at the time and said that "Unfortunately, it did not help in this case as it provided coverage for token issuance but did not provide coverage for token validation as that was dependent on the impacted metadata endpoint."

It is apparent therefore that the extension of the backup service will not solve all the issues that might impact AAD even though it is beneficial.

In August this year Gartner analysts reported that customers "remain concerned about real-world impacts" from Azure reliability even though its performance is not bad in an absolute sense. Gartner considers some Azure regions less resilient than they should be, perhaps thanks to capacity issues - though note that the pandemic caused a spike in demand for all cloud providers.

Microsoft also has questions to answer regarding the Cosmos DB vulnerability described by security researchers at Wiz earlier this month. The vulnerability has been fixed, but the researchers identified what look like some extraordinary architectural mistakes. Firewall rules were in place to prevent escalation of a breach, but according to the researchers "these firewall rules were configured locally on the container where we were currently running as root. So, we simply deleted the rules (by issuing iptables -F), clearing the way to these forbidden IP addresses and to some even more interesting findings."

It is a good thing when Azure CTO Mark Russinovich pops up to tell us, along with colleagues, about improvements in Azure reliability, and the extended AAD backup service is welcome even if not always effective, but we would like to know more about these other pressing matters. ®


Biting the hand that feeds IT © 1998–2021