Updated Monday's prolonged Google cloud and websites outage was triggered by a botched network update by a West Africa telco, it is claimed.
Main One, a biz ISP based in Lagos, Nigeria, that operates a submarine cable between Portugal and South Africa, said a misconfiguration at its end caused Google-bound traffic to be redirected to China Telecom for 74 minutes.
During that time, web browsers and apps that tried to connect to Google, YouTube, etc, or sites and platforms on Google Cloud, such as Spotify and Nest, were routed to the Chinese telco via Russian ISP TransTelekom, and dropped into a black hole.
The blunder was possible because Main One leaked details of one part of the internet's layout into the configuration of another, temporarily rewiring the spinal cord of the 'net. Packets working their way toward Google were sent down routes that took them around the globe.
Main One peers with Google, in that they agree to exchange traffic with each other through a peering point. Simply put, the ISP accidentally let slip details of its routes to Google's network in a way that caused the rest of the 'net to adjust its pathways so Google-bound traffic headed toward China Telecom.
"MainOne has a peering relationship with Google via IXPN in Lagos and has direct routes to Google, which leaked into China Telecom," explained Ameet Naik of cloud-monitoring biz ThousandEyes today. "These leaked routes propagated from China Telecom, via TransTelecom to NTT and other transit ISPs. We also noticed that this leak was primarily propagated by business-grade transit providers and did not impact consumer ISP networks as much."
This eventually caused vast bucketfuls of internet traffic bound for Google in the US and possibly elsewhere to pour into a bottomless pit in China Telecom's network, effectively knocking the ad giant offline in the eyes of many netizens. It's understood no data was intercepted or handed over during the spill. Here's Main One 'fessing up:
"We have investigated the advertisement of @Google prefixes through one of our upstream partners. This was an error during a planned network upgrade due to a misconfiguration on our BGP filters. The error was corrected within 74mins & processes put in place to avoid reoccurrence"
MainOne (@Mainoneservice), November 13, 2018
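For the curious, the rule that BGP export filters are commonly meant to enforce can be sketched in a few lines of Python. This is a simplified illustration of the widely used "valley-free" export convention, not Main One's actual configuration: the `Rel` enum and `may_export` function are hypothetical names, and the sketch assumes every neighbor is labeled as a customer, a peer, or a provider (upstream).

```python
from enum import Enum


class Rel(Enum):
    """Business relationship with a BGP neighbor (hypothetical labels)."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"  # an upstream transit provider


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Return True if a route may be re-advertised under the valley-free convention.

    Routes learned from customers may be passed to any neighbor.
    Routes learned from peers or providers may be passed only to customers.
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return export_to is Rel.CUSTOMER


# A route learned from a peer and offered to an upstream provider:
# the convention says it should be filtered out.
print(may_export(Rel.PEER, Rel.PROVIDER))      # False
# A customer's route offered to an upstream provider is fine.
print(may_export(Rel.CUSTOMER, Rel.PROVIDER))  # True
```

Under this convention, routes learned from a peer such as Google would not be passed on to an upstream partner, which is the kind of advertisement the tweet describes. Real routers express this with route maps, communities, and prefix lists rather than a single function.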
Google also said that none of its servers or data was affected by the incident.
"We’re aware that a portion of internet traffic was affected by incorrect routing of IP addresses, and access to some Google services was impacted," a Chocolate Factory spokesperson told El Reg. "The root cause of the issue was external to Google and there was no compromise of Google services."
If nothing else, the disclosure will allay fears that the outage was the result of some sort of attack or other nefarious activity. At the same time, the realization that something as simple as a regional ISP misconfiguring its BGP filters could trigger a global outage does not sit particularly well either.
"This incident further underscores one of the fundamental weaknesses in the fabric of the internet," said Naik. "BGP was designed to be a chain of trust between well-meaning ISPs and universities that blindly believe the information they receive. It hasn’t evolved to reflect the complex commercial and geopolitical relationships that exist between ISPs and nations today. While verification methods like ROA [Route Origin Authorization] exist, few ISPs use them. Even corporations like Google with massive resources at their disposal are not immune from this sort of BGP leak or malicious hijacks."
As NSA advisor and former White House cyber security boss Rob Joyce noted, the incident should serve as a call to reassess the state of the BGP system.
"I hope this latest fiasco of traffic rerouting through China is the wakeup call for all of us to get serious about addressing the massive and unacceptable vulnerability inherent in today’s BGP routing architecture," Joyce said on Tuesday. ®
Updated to add
Cloudflare has some more technical info on the BGP blunder, and suggests it is possible to perform route filtering to cut out dodgy updates. Unfortunately, China Telecom's CN2 carrier did not perform any sanity checks on Main One's changes to the internet's pathways, leading to this week's cock-up.
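To give a flavor of what such a sanity check might look like, here is a minimal Python sketch of inbound prefix filtering. It is an assumption-laden illustration, not Cloudflare's or any carrier's implementation: `build_allow_list`, `is_accepted`, and `filter_updates` are hypothetical helpers, the allow-list is assumed to hold the prefixes a neighbor is registered to announce, the sketch handles IPv4 only, and the addresses are reserved documentation ranges.

```python
import ipaddress


def build_allow_list(prefixes):
    """Parse the prefixes a neighbor is expected to announce (assumed known in advance)."""
    return [ipaddress.ip_network(p) for p in prefixes]


def is_accepted(announced, allowed, max_len=24):
    """Accept an IPv4 announcement only if it falls inside the allow-list.

    Announcements more specific than max_len are rejected outright.
    """
    net = ipaddress.ip_network(announced)
    if net.version != 4 or net.prefixlen > max_len:
        return False
    return any(a.version == 4 and net.subnet_of(a) for a in allowed)


def filter_updates(updates, allowed):
    """Split announced prefixes into (accepted, rejected) lists, preserving order."""
    accepted, rejected = [], []
    for prefix in updates:
        (accepted if is_accepted(prefix, allowed) else rejected).append(prefix)
    return accepted, rejected


# Hypothetical neighbor that is only expected to announce 192.0.2.0/24.
allowed = build_allow_list(["192.0.2.0/24"])
accepted, rejected = filter_updates(
    ["192.0.2.0/24", "203.0.113.0/24", "192.0.2.128/25"], allowed
)
print(accepted)  # ['192.0.2.0/24']
print(rejected)  # ['203.0.113.0/24', '192.0.2.128/25']
```

A filter along these lines drops any announcement for address space the neighbor was never expected to carry, which is the general idea behind filtering out dodgy updates at the point where they enter a network.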
Also, had the intermediate systems been able to withstand the volume of Google-bound traffic, the sites and cloud connections wouldn't have dropped. In the end, the machines that suddenly had Google traffic routed through them fell over, taking Google with them for a ton of netizens.