On Sunday, US time, prominent Web services outfit CloudFlare sent an instruction to its routers in response to an attempted DoS, and instead took down its own network.
In a rare example of detailed disclosure, the company has published a blog post explaining what happened.
The network collapse occurred, the company explains, after it detected an attempted denial-of-service attack against a customer’s DNS servers using packets that were between 99,971 and 99,985 bytes long. That is an oddity, CloudFlare notes, because it is so much larger than the Internet’s typical packet length (500 to 600 bytes, according to the company) and larger than the 4,470-byte maximum packet it allows on its internal network.
So it wrote a JunOS rule (CloudFlare is a Juniper shop) to drop the packets, propagated the rule to its routers – and for reasons unknown, that rule crashed all the routers at which the instruction arrived.
“Flowspec accepted the rule and relayed it to our edge network. What should have happened is that no packet should have matched that rule because no packet was actually that large. What happened instead is that the routers encountered the rule and then proceeded to consume all their RAM until they crashed,” the blog post notes.
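To see why the rule should have been harmless, here is a minimal Python sketch of a length-based drop filter. It is not Junos or Flowspec syntax, and `LengthDropRule` and its methods are hypothetical names invented for illustration. The 4,470-byte figure comes from CloudFlare's account; the 65,535-byte ceiling is the standard limit imposed by IPv4's 16-bit total-length field and does not appear in the post.

```python
from dataclasses import dataclass

# Not from CloudFlare's post: IPv4's 16-bit total-length field caps a packet here.
IPV4_MAX_PACKET_BYTES = 65_535
# From the post: the largest packet CloudFlare says it allows internally.
INTERNAL_MAX_PACKET_BYTES = 4_470


@dataclass(frozen=True)
class LengthDropRule:
    """Hypothetical stand-in for a 'drop if packet length is in range' filter."""
    min_len: int
    max_len: int

    def matches(self, packet_len: int) -> bool:
        """True if a packet of this length would be dropped."""
        return self.min_len <= packet_len <= self.max_len

    def can_ever_match(self, largest_possible: int) -> bool:
        """True if any packet no longer than largest_possible could match."""
        return self.min_len <= self.max_len and self.min_len <= largest_possible


# The range reported in the post.
rule = LengthDropRule(min_len=99_971, max_len=99_985)

print(rule.matches(INTERNAL_MAX_PACKET_BYTES))     # False
print(rule.can_ever_match(IPV4_MAX_PACKET_BYTES))  # False
```

Under those assumptions the rule is a no-op: nothing that can cross the network is long enough to trigger it, which is why the expected outcome was "no packet should have matched" rather than a crash.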
The crashes happened in such a way, CloudFlare says, that the routers didn’t reboot automatically, which meant that they couldn’t be accessed remotely; and worse, those routers that did wake back up copped the entire traffic load, couldn’t cope, and crashed again.
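The crash-again pattern can be illustrated with a toy model. The Python sketch below is purely illustrative: the load and capacity figures are made up, the function name `survivors` is hypothetical, and the even load split is a simplifying assumption, not a description of how CloudFlare's network balances traffic.

```python
def survivors(total_load: float, capacities: list[float]) -> list[float]:
    """Return the capacities of routers still up once the load settles.

    Assumes load is split evenly across live routers. Any router handed
    more than its capacity crashes, and the load is re-split among the rest.
    """
    live = list(capacities)
    while live:
        share = total_load / len(live)
        still_up = [c for c in live if c >= share]
        if len(still_up) == len(live):
            return live  # everyone can carry their share
        live = still_up  # overloaded routers drop out; try again
    return live          # empty list: every router went down


# Made-up numbers: 100 units of traffic, routers that can each carry 30.
print(len(survivors(100, [30])))      # 0: a lone router takes everything and falls over
print(len(survivors(100, [30] * 3)))  # 0: three are still not enough
print(len(survivors(100, [30] * 4)))  # 4: enough capacity back at once to hold
```

In this model a router that returns on its own is doomed, and recovery only sticks once enough capacity comes back at the same time to carry the full load.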
Accounts covered by SLAs will get credits, the company says, and it is investigating the problem with Juniper. ®