
Exchange Online and Microsoft Teams went down in APAC because Microsoft broke itself

Legacy process overwhelmed infrastructure, brought ten hours of trouble

Updated Microsoft's flagship cloudy productivity services are down across the Asia-Pacific region.

"Our initial investigation indicates that there our service infrastructure is performing at a sub-optimal level, resulting in impact to general service functionality," states an advisory time-stamped 12:41PM on December 2.

The incident means customers of Exchange Online may not be able to access the service, send email and/or files, or use what Microsoft described as "General functionality".

The impact on Teams means:

  • Users may experience issues scheduling/editing meetings and/or live meetings;
  • People Picker/Search function may not work as expected;
  • Users may be unable to search Microsoft Teams;
  • Users may be unable to load the Assignments tab in Microsoft Teams.

Messaging, chat, channels, and other core Teams services appear to be available.

Microsoft appears not to know what's wrong.

"While we continuing to analyze any relevant diagnostic data, we are restarting a subset of the affected infrastructure to determine if that will provide relief to the service," stated an update posted 17 minutes after the first status notice.

Another update, time-stamped 1:14PM, offers the following information:

We have successfully restarted a small portion of the affected systems and are monitoring the service to determine if there is a positive effect on our service. While we continue to monitor, we will work to understand the root cause and develop other potential mitigation pathways.

Microsoft's advisory states that the issue "may impact any user within the Asia-Pacific region." The Register was alerted to the situation by a user in Australia, and our quick scan of social media yielded mentions of the outage only in Japanese.

Suffice to say that machine translations of those Japanese-language posts yield expressions of frustration at the outage, and at the fact that Microsoft's Twitter status feed has nothing to say about the incident.

The Register will keep an eye on this incident and update this story as the situation changes. ®

Updated at 04:30 UTC Microsoft's latest status update suggests this incident is getting worse.

A status update time-stamped 3:14PM - it's unclear in which time zone - states the incident now also impacts SharePoint Online, Microsoft PowerApps, and Microsoft PowerAutomate.

It also details a wider impact on Teams, including contacts becoming unavailable and problems initiating new chats.

"Our service availability remains at a stable level and we are still focused on understanding the underlying source of the issue and ensuring the problem does not reoccur," the update states.

Updated at 22:00 UTC, December 2 The incident has ended! An update to Microsoft's incident report time-stamped 2314 on December 2 offers this description of the preliminary root cause:

Processing components were not performing within optimal performance thresholds because of a legacy process that required tokens to be processed on specific components. In isolation this process wasn't problematic, but combined with the large number of requests, this resulted in resource saturation, causing impact across multiple Microsoft 365 apps.
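
For readers wondering how a process that is harmless "in isolation" can tip into saturation, here is a toy sketch. It is not Microsoft's architecture: the component counts, the per-component capacity, and the `peak_load` helper are all assumptions made up for illustration. It only shows that pinning work to a few components is fine at low volume and saturates them at high volume, while the same volume spread across every component stays under the limit.

```python
from collections import Counter

def peak_load(num_requests, num_components, pinned_to=None):
    """Return the highest request count landing on any one component.

    pinned_to is a hypothetical count of components allowed to process
    tokens; None means any component may handle any request.
    """
    eligible = pinned_to if pinned_to is not None else num_components
    # Distribute requests round-robin over the eligible components only.
    load = Counter(i % eligible for i in range(num_requests))
    return max(load.values())

CAPACITY = 1_000  # assumed per-component limit, illustrative only

for requests in (1_000, 10_000):
    pinned = peak_load(requests, num_components=20, pinned_to=2)
    spread = peak_load(requests, num_components=20)
    print(requests,
          "pinned saturated:", pinned > CAPACITY,
          "spread saturated:", spread > CAPACITY)
```

With these made-up numbers, only the pinned case at the higher request volume exceeds the assumed capacity, which is the shape of the failure the root cause statement describes.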

Microsoft tested transitioning away from the problematic legacy process and restarting affected infrastructure.

Which worked, so the company did the same thing in its live environment.

The incident ran for nine hours and 59 minutes, from 2355 UTC on December 1 to 0954 UTC on December 2.

Microsoft has promised to publish a proper incident report within five days. We'll look it up and report on its content, either as a further update to this story or a fresh piece of prose.
