T-Mobile US is attempting to pin the blame for a massive network outage on Monday on a third-party leased fiber network, though the head of America's communications watchdog has demanded a full investigation into the "unacceptable" blunder.
The mobile telco, now one of just three giants in the US mobile market after its merger with Sprint was approved, suffered a multi-hour breakdown that caused so much disruption that some feared the US had been hit by a massive distributed denial-of-service attack. Incoming calls and texts to T-Mobile US subscribers were dropped, and data services degraded.
In fact, according to the network's president of technology Neville Ray, “a leased fiber circuit failure from a third party provider in the southeast” was to blame. In a blog post published late Tuesday, Ray said that the mobile carrier did have a redundancy system in place but it failed, causing an overload that reverberated across the whole network.
“We’ve worked with our vendors to build redundancy and resiliency to make sure that these types of circuit failures don’t affect customers. This redundancy failed us and resulted in an overload situation that was then compounded by other factors,” he wrote. “This overload resulted in an IP traffic storm that spread from the Southeast to create significant capacity issues.”
That explanation may not be enough for Ajit Pai, chairman of the Federal Communications Commission (FCC), however. On Monday night, Pai tweeted: “The T-Mobile network outage is unacceptable. The FCC is launching an investigation. We're demanding answers – and so are American consumers.”
That same evening, T-Mobile US CEO Mike Sievert posted a brief explanation online, saying the communications breakdown was “an IP traffic related issue that has created significant capacity issues in the network core throughout the day.” Both Sievert and Ray stressed that many services were still working fine, and that it wasn’t their fault.
It’s not clear if that will stick if the FCC does carry out a full probe as Pai has demanded; we have asked the FCC what its plans are. National mobile operators do use a multitude of networks, not just their own, but there is assumed to be a massive amount of resiliency and redundancy built into those systems given their critical role in everyday communications for millions of people.
One telecoms policy expert, Harold Feld of Public Knowledge, was skeptical of T-Mob's explanation, tweeting: "How the Hell is it possible that a huge chunk of our telecom infrastructure went down because of a single circuit (and back up) failure?"
Any investigation will dig into why T-Mobile US's back-up systems failed. It will also look at whether the claims are true. There have been other FCC probes into previous outages, with some fines handed out.
In December 2018, a network meltdown within CenturyLink broke broadband and VoIP connectivity for more than a day, affecting 22 million subscribers in 39 states, and caused some 12 million calls to be black-holed or degraded. 911 emergency calls were also affected. The ISP was not fined.
Having said that, the FCC scrutinized separate outages within the 911 call system, and fined AT&T $5.25m in 2018 and CenturyLink $400,000 in 2019 for dropping some emergency calls.
If T-Mobile US is found to have been responsible for Monday's screw-up, it may face a fine, whereas if its backup systems failed, it may face a requirement to improve them, albeit with no fine.
According to informed speculation on the part of Cloudflare CEO Matthew Prince, the issue may in fact have been caused by T-Mobile US engineers who were “making some changes to their network configurations” that “went badly” and resulted in a “series of cascading failures for their users, impacting both their voice and data networks.” ®