Cloudflare engineer broke rules – and a customer's website – with traffic throttle
Those who think Big Tech has its thumb on the scales are going to love this
Updated: Cloudflare has admitted that one of its engineers stepped beyond the bounds of its policies and throttled traffic to a customer's website.
The internet-grooming outfit has 'fessed up to the incident and explained it started on February 2 when a network engineer "received an alert for a congesting interface" between an Equinix datacenter and a Cloudflare facility.
Cloudflare's post about the matter states that such alerts aren't unusual, but this one was due to a sudden and extreme spike in traffic and had occurred on two successive days.
"The engineer in charge identified the customer's domain … as being responsible for this sudden spike of traffic between Cloudflare and their origin network, a storage provider," the post states. "Traffic from this customer went suddenly from an average of 1,500 requests per second, and a 0.5MB payload per request, to 3,000 requests per second (2x) and more than 12MB payload per request (25x)."
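For a sense of scale, the post's figures can be turned into bandwidth with some back-of-the-envelope arithmetic. This sketch is ours, not Cloudflare's. It assumes 1MB means 10^6 bytes and treats "more than 12MB" as exactly 12MB, so the post-spike figure is a lower bound.

```python
# Back-of-the-envelope bandwidth estimate from the figures in Cloudflare's post.
# Assumptions (ours): 1MB = 10**6 bytes, and "more than 12MB" is taken as
# exactly 12MB, so the "after" number is a lower bound.

BITS_PER_BYTE = 8


def throughput_gbps(requests_per_second: float, payload_mb: float) -> float:
    """Approximate throughput in gigabits per second."""
    megabytes_per_second = requests_per_second * payload_mb
    # MB/s -> megabits/s -> gigabits/s
    return megabytes_per_second * BITS_PER_BYTE / 1000


before = throughput_gbps(1_500, 0.5)  # average traffic before the spike
after = throughput_gbps(3_000, 12)    # traffic during the spike (lower bound)

print(f"before: {before:.0f} Gbps")
print(f"after:  at least {after:.0f} Gbps")
print(f"increase: at least {after / before:.0f}x")
```

On those assumptions, traffic to the origin went from roughly 6Gbps to at least 288Gbps. Because the request rate doubled while each payload grew about 24-fold, total bandwidth rose around 48 times, far more than either headline multiple suggests on its own.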
As the spike created congestion on a physical interface, it impacted many Cloudflare customers and peers.
Cloudflare's automated remedies swung into action, but weren't sufficient to completely fix the problem.
An unidentified engineer "decided to apply a throttling mechanism to prevent the zone from pulling so much traffic from their origin."
A post to Hacker News that Cloudflare's post links to, and which The Register therefore assumes was written by the throttled customer, states that the throttle was applied without warning and made the customer's site and API effectively unavailable, as slow responses led to timeouts.
Cloudflare has issued a mea culpa for its decision to impose the throttle.
"Let's be very clear on this action: Cloudflare does not have an established process to throttle customers that consume large amounts of bandwidth, and does not intend to have one," wrote Cloudflare senior veep for production engineering Jeremy Hartman and veep for networking engineering Jérôme Fleury.
"This remediation was a mistake, it was not sanctioned, and we deeply regret it."
Cloudflare has promised to change its policies and procedures so this can't happen again – at least not without multiple execs signing off on it.
"To make sure a similar incident does not happen, we are establishing clear rules to mitigate issues like this one. Any action taken against a customer domain, paying or not, will require multiple levels of approval and clear communication to the customer," Hartman and Fleury state. "Our tooling will be improved to reflect this. We have many ways of traffic shaping in situations where a huge spike of traffic affects a link and could have applied a different mitigation in this instance."
The Hacker News post referenced above sparked a 300-plus comment conversation in which few commenters have kind things to say about Cloudflare. Nor do various folks in some of the darker reaches of the web, where Cloudflare has often been accused of throttling traffic as a political act, given its track record of declining to serve sites that host hate speech.
Actually throttling a customer without warning will likely fuel theories that Cloudflare, like its Big Tech peers, is an activist organization that does not treat all types of speech fairly.
Hartman and Fleury promised that Cloudflare is rewriting its legalese to better explain what customers can expect. "We will follow up with a blog post dedicated to these changes later," the pair wrote.
The post does not mention what, if anything, happened to the engineer who applied the throttle. ®
Updated to add at 2350 UTC, February 9
Cloudflare contacted The Register with the following statement: "There were no punitive measures taken against anyone involved in this unfortunate incident. We have a blame-free culture at Cloudflare. People make mistakes. It's the responsibility of the organization to make sure that the damage from those mistakes is limited."