How four rotten packets broke CenturyLink's network for 37 hours, knackering 911 calls, VoIP, broadband

FCC delivers postmortem after blunder cripples US fiber links


A handful of bad network packets triggered a massive chain reaction that crippled the entire network of US telco CenturyLink for roughly a day and a half.

This is according to the FCC's official probe [PDF] into the December 2018 super-outage, during which CenturyLink's broadband internet and VoIP services fell over and stayed down for a total of 37 hours. This meant subscribers couldn't, among other things, call 911 over VoIP at the time – which is a violation of FCC rules, and triggered a formal investigation.

"This outage was caused by an equipment failure catastrophically exacerbated by a network configuration error," America's communications regulator said in its summary of its inquiry, published yesterday.

"It affected communications service providers, business customers, and consumers who directly or indirectly relied upon CenturyLink’s transport services, which route communications traffic from various providers to locations across the country, resulting in extensive disruptions to phone service, including 911 calling."

CenturyLink has six long-haul networks that make up the backbone of its digital empire, interconnecting regions of America. These networks use Infinera-built nodes to switch packets over high-speed optic fiber: data flowing into each node is directed to other nodes, ultimately pumping VoIP, regular internet traffic, and more, across the nation as needed.

We're told four malformed network packets were the root cause of the outage: they were generated by a switching module in a node in Denver, Colorado, for reasons still unknown, and sent on to other nodes. The broken packets all had the following qualities:

1. a broadcast destination address, meaning that the packet was directed to be sent to all connected devices;

2. a valid header and valid checksum;

3. no expiration time, meaning that the packet would not be dropped for being created too long ago; and

4. a size larger than 64 bytes.
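
To make those four qualities concrete, here is a minimal Python sketch of a check for them. The Packet fields, the BROADCAST constant, and the function name are all illustrative assumptions: the FCC's summary does not describe the packet format used on the proprietary management channel.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the conventional Ethernet broadcast address stands in for
# whatever broadcast address the proprietary channel actually uses.
BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of a management packet."""
    destination: str
    header_ok: bool
    checksum_ok: bool
    expires_at: Optional[float]  # None models "no expiration time"
    size_bytes: int


def matches_outage_profile(packet: Packet) -> bool:
    """True if the packet has all four qualities listed above."""
    return (
        packet.destination == BROADCAST   # 1. broadcast destination
        and packet.header_ok              # 2. valid header...
        and packet.checksum_ok            #    ...and valid checksum
        and packet.expires_at is None     # 3. no expiration time
        and packet.size_bytes > 64        # 4. larger than 64 bytes
    )


bad = Packet(BROADCAST, True, True, None, 128)
benign = Packet(BROADCAST, True, True, 1546000000.0, 128)
print(matches_outage_profile(bad), matches_outage_profile(benign))
```

The point of the sketch is that no single quality is unusual on its own; it is the combination that lets a packet look valid, go everywhere, and never age out.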

Each dodgy packet would arrive at a node, get rejected, and be passed along a chain of filters until it was injected into a management channel and handed to all connected nodes. Here's a flow diagram, courtesy of the FCC, showing how the corrupted packets ended up being forwarded on to all neighboring nodes, and so on and so on, producing a growing chain reaction of corrupted packets...

"Due to the packets’ broadcast destination address, the malformed network management packets were delivered to all connected nodes. Consequently, each subsequent node receiving the packet retransmitted the packet to all its connected nodes, including the node where the malformed packets originated," the FCC said in its report.

"Each connected node continued to retransmit the malformed packets across the proprietary management channel to each node with which it connected because the packets appeared valid and did not have an expiration time. This process repeated indefinitely."

As you might imagine, the exponentially growing storm of packets soon overwhelmed CenturyLink's optic-fiber backbone, causing regular traffic to stop flowing: VoIP phones stopped working, internet access slowed to a halt, and so on. Folks in New Orleans were first to spot their connections stalling, at roughly 0356 EST on December 27.
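
A toy simulation gives a feel for how quickly this kind of flood grows. The sketch below is not a model of CenturyLink's network: it assumes a hypothetical four-node full mesh, made-up node names apart from the Denver origin, and a hypothetical hop_limit parameter standing in for the expiration the bad packets lacked.

```python
from typing import Dict, List, Optional

# Hypothetical topology: four nodes, each connected to the other three.
MESH: Dict[str, List[str]] = {
    "denver": ["node_b", "node_c", "node_d"],
    "node_b": ["denver", "node_c", "node_d"],
    "node_c": ["denver", "node_b", "node_d"],
    "node_d": ["denver", "node_b", "node_c"],
}


def flood(topology: Dict[str, List[str]], origin: str, rounds: int,
          hop_limit: Optional[int] = None) -> List[int]:
    """Return how many packet copies are in flight after each round.

    Every copy a node holds is retransmitted to all of its neighbors,
    including the one it came from. hop_limit=None means copies never
    expire; otherwise a copy with no hops left is dropped.
    """
    # Map each node to the remaining-hop value of every copy it holds.
    held: Dict[str, List[Optional[int]]] = {origin: [hop_limit]}
    counts: List[int] = []
    for _ in range(rounds):
        nxt: Dict[str, List[Optional[int]]] = {}
        for node, copies in held.items():
            for hops in copies:
                if hops is not None and hops <= 0:
                    continue  # expired copy: dropped instead of forwarded
                remaining = None if hops is None else hops - 1
                for neighbor in topology[node]:
                    nxt.setdefault(neighbor, []).append(remaining)
        held = nxt
        counts.append(sum(len(copies) for copies in held.values()))
    return counts


print(flood(MESH, "denver", rounds=3))               # [3, 9, 27]
print(flood(MESH, "denver", rounds=4, hop_limit=2))  # [3, 9, 0, 0]
```

With no expiry the copy count triples every round in this mesh, while even a small hop limit makes the flood die out by itself, which is why the missing expiration time mattered so much.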

Here is where things went from really, really bad to terrible: the nodes along the fiber network were so flooded, they could not be reached by their administrators to troubleshoot the issue. It wasn't until some 15 hours later that the techies could finally track down the single errant node in Colorado responsible for sparking the deluge, not that replacing it helped. The packet tsunami was still washing back and forth, knocking nodes over.

"At 2102 on December 27, CenturyLink network engineers identified and removed the module that had generated the malformed packets," the report noted. "The outage, however, did not immediately end; the malformed packets continued to replicate and transit the network, generating more packets as they echoed from node to node."

It would be another three hours before CenturyLink's network admins could begin to get through to the other nodes, and get them to kill off the spread of bad packets. It took until 1130 on December 28 to get visibility of the network back, and it wasn't until 2336 that all nodes had been restored. On December 29, just after midday, CenturyLink finally declared the crisis over.

"The event caused a nationwide voice, IP, and transport outage on CenturyLink’s fiber network. CenturyLink estimates that 12,100,108 calls were blocked or degraded due to the incident," the FCC said.

"Where long-distance voice callers experienced call quality issues, some customers received a fast-busy signal, some received an error message, and some just had a terrible connection with garbled words."

The outage also knackered local governments and telcos that relied on the CenturyLink network for portions of their services. State governments in Illinois, Kansas, Minnesota, and Missouri all had portions of their networks down for roughly 36 hours thanks to CenturyLink, and phone services sold by Comcast, Verizon, TeleCommunication Systems, General Dynamics IT, and West Safety Services – including 911 call centers – saw connectivity interrupted for some or all of the outage period.

As to what can be done to prevent similar failures, the FCC is recommending CenturyLink and other backbone providers take some basic steps, such as disabling unused features on network equipment, installing and maintaining alarms that warn admins when memory or processor use is reaching its peak, and having backup procedures in the event networking gear becomes unreachable.
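
The alarm recommendation can be sketched as a simple threshold check. Everything named here is an assumption for illustration: the NodeSample type, the threshold values, and the node names are hypothetical, and the FCC does not prescribe specific numbers or tooling.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NodeSample:
    """Hypothetical utilization reading from one network node."""
    node: str
    cpu_pct: float
    mem_pct: float


def utilization_alarms(samples: Iterable[NodeSample],
                       cpu_threshold: float = 85.0,
                       mem_threshold: float = 90.0) -> List[str]:
    """Return one alarm message per reading at or above its threshold."""
    alarms: List[str] = []
    for s in samples:
        if s.cpu_pct >= cpu_threshold:
            alarms.append(f"{s.node}: CPU at {s.cpu_pct:.0f}%")
        if s.mem_pct >= mem_threshold:
            alarms.append(f"{s.node}: memory at {s.mem_pct:.0f}%")
    return alarms


readings = [
    NodeSample("node_a", cpu_pct=42.0, mem_pct=55.0),
    NodeSample("node_b", cpu_pct=97.0, mem_pct=93.0),
]
for message in utilization_alarms(readings):
    print(message)
```

In practice such alarms would need to reach admins over a path that does not depend on the affected network, which ties in with the FCC's other suggestion of backup procedures for unreachable gear.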

"Currently, CenturyLink is in the process of updating its nodes’ Ethernet policer to reduce the chance of the transmission of a malformed packet in the future," the report notes. "The improved ethernet policer quickly identifies and terminates invalid packets, preventing propagation into the network. This work is expected to be complete in Fall, 2019."

The report did not mention any possible fines or penalties against CenturyLink. ®
