Microsoft is "taking the next step in our commitment to the resiliency and availability" of Azure Active Directory by promising 99.99 per cent uptime from 1 April 2021, rising from 99.9 per cent for the current Service Level Agreement (SLA).
Azure Active Directory (AD) – a critical component in the company's Azure cloud infrastructure – is the directory used by Microsoft 365 (formerly Office 365). Organisations can also set up Azure AD Connect to replicate an on-premises Active Directory to this cloudy equivalent. Windows 10 PCs can be set up to log in using Azure AD.
When Azure AD fails – as it did in September – the impact is huge, preventing access not only to Microsoft 365 but also to key services like the Azure management portal and even impacting some desktop applications like Microsoft Office and Visual Studio, which can be configured to rely on the service for checking for a valid subscription.
In December, Google suffered similar issues when its "central identity management system" fell over, impacting services including Google Kubernetes Engine, Gmail, Docs, and Meet.
In other words, keeping the directory up and running is essential for reliable cloud services, and Azure's record, while not bad in absolute terms, is worse than its rivals'. Azure AD serves more than 400 million monthly active users, according to Microsoft, and processes tens of billions of authentications daily.
Late last year Microsoft's VP of Engineering, Nadim Abdo, who leads core authentication and identity services across the company, said the company was "raising the bar for resilience of the Azure AD Service".
Abdo said a roadmap for resilience work will be published soon, but has already promised to raise the SLA from 99.9 per cent to 99.99 per cent, also known as four nines, from the April date. There is a caveat, however – the updated SLA will only cover user authentication and federation, not administrative features.
The current SLA includes the ability to create and amend entries. This has a security implication: in the event of stolen credentials or a disaffected employee leaving, the ability to deprovision a user quickly is important, and it will not be covered by the new SLA.
Customers, to nobody's surprise, apparently told Microsoft that "the most critical promise of our service is ensuring that every user can sign into the apps and services."
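To put the jump from three nines to four in perspective, the permitted downtime can be worked out with simple arithmetic. The sketch below is illustrative only: it assumes a 30-day measurement window, which may not match how Microsoft's SLA defines a month, and the function name is our own invention.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a window at a given uptime percentage.

    The 30-day default window is an assumption for illustration,
    not the measurement period defined in any actual SLA.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for target in (99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target} per cent uptime allows about {minutes:.2f} minutes down per 30 days")
```

On those assumptions, three nines permits roughly 43 minutes of downtime in a 30-day window, while four nines permits a little over four.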
Despite the absence of a detailed roadmap, Abdo explained some of the steps Microsoft has taken and is planning. The company uses a fault domain isolation model to limit failures to a subset of users, so the more fault domains there are, the smaller these subsets become. Microsoft increased their number fivefold in 2020 and plans to evolve this further.
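The reasoning behind fault domains can be shown with a toy model. The sketch below is not Microsoft's implementation, whose details have not been published; it simply assumes users are spread across domains by hashing a hypothetical user ID, then measures what share of users is hit when a single domain fails.

```python
import hashlib


def fault_domain_for(user_id: str, num_domains: int) -> int:
    """Map a user to a fault domain by hashing the ID (an assumed scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_domains


def affected_fraction(user_ids, num_domains: int, failed_domain: int = 0) -> float:
    """Share of users who land in the one domain that has failed."""
    hit = sum(1 for u in user_ids if fault_domain_for(u, num_domains) == failed_domain)
    return hit / len(user_ids)


if __name__ == "__main__":
    users = [f"user{i}" for i in range(10_000)]  # made-up IDs for illustration
    for domains in (10, 50):
        share = affected_fraction(users, domains)
        print(f"{domains} fault domains: about {share:.1%} of users hit by one failure")
```

In this model a fivefold increase in domains, from 10 to 50, cuts the share of users caught by any one failure from around a tenth to around a fiftieth.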
A backup authentication service is in development and already in use by Outlook Web Access and SharePoint Online. This will be integrated with further services during the course of this year.
Other changes include more integration with regional endpoints for better resilience in the event that the primary system breaks, and improved scalability.
The stakes are high for Microsoft. Gartner analysts recently noted that AWS "has a better track record for availability and reliability than the other hyperscale providers".
Improving both its reputation and actual performance in this respect must be a high priority for Microsoft since few things matter more to enterprises mulling cloud choices, and rock-solid Azure AD is a good place to start. ®