Finally, made it to the weekend, time to breathe, relax, and... Cloudflare's taken down a chunk of the web

DNS provider goes dark amid bad routing, world+dog goes through nine minutes of terror

Updated Global internet glue Cloudflare experienced a brief network outage on Friday that broke multiple apps and websites, including your humble Register.

On its status page, as of Jul 17, 21:37 UTC, the DNS-and-everything-else provider said it was "investigating issues with Cloudflare Resolver and our edge network in certain locations," and warned that customers in certain regions may experience failures or errors.

Affected services included the Cloudflare API and Cloudflare Recursive DNS, both of which were listed as having degraded performance. And in various regions around the world where Cloudflare handles network traffic, the status page said data was being rerouted.

Nine minutes later, at Jul 17, 21:46 UTC, the biz announced a fix had been implemented without immediately saying what happened. Within the past few minutes, a Cloudflare spokesperson told The Register the blip was due to a blunder involving one of its routers:

This afternoon we saw an outage across some parts of our network. It was not as a result of an attack. It appears a router on our global backbone announced bad routes and caused some portions of the network to not be available. We believe we have addressed the root cause and monitoring systems for stability now. We will share more shortly – we have a team writing an update as we speak.

Cloudflare CEO Matt Prince then pointed to a single piece of equipment in Atlanta, USA, as the culprit.

He added the glitch "appears to have impacted about 50 per cent of our traffic for a bit over 20 minutes."

"The issue was caused by a mistaken configuration we were applying to a router during a routine update," said Prince. "There was no attack. It was not a failure of the router software. Blog post with details coming soon as are protections to our backbone to prevent going forward."

Because Cloudflare handles DNS services and edge computing services for many, many non-commercial and commercial websites, including The Register and our backend systems, the brief service interruption drew immediate notice.

Aside from El Reg, services said to have been affected include Authy, DigitalOcean, Discord, Downdetector, GitLab, Medium, Patreon, and Riot, among others.

The incident, while short-lived, is yet another reminder of the fragility of critical online services. We'll let you know when we know more. ®

Updated to add

Cloudflare has published a fairly detailed postmortem of the downtime.

"We are sorry for this outage and have already made a global change to the backbone configuration that will prevent it from being able to occur again," said CTO John Graham-Cumming.

Biting the hand that feeds IT © 1998–2020