An international incident or just some finger trouble at the console?
All routers are equal, but some are more equal than others
Who, Me? Welcome to an edition of Who, Me? where some configuration confusion left an entire nation cast adrift.
Today's story is set in the early 2000s and comes from a reader Regomized as "Mikael" who was gainfully employed at a European ISP. The company had customers in multiple countries and Mikael's team was responsible for the international backbone.
"Us senior network engineers were widely regarded as consummate professionals," he told us, before adding, "at least amongst ourselves."
This was the era of the mighty Cisco GSR 12000 routers, "which, going off on a slight tangent," said Mikael, "were supposedly originally going to be called the BFR series – until marketing people figured out what THAT meant and settled for the more appropriate Gigabit Switch Router. Boo."
We'll take your word for it, Mikael.
The core routing protocol of choice was Open Shortest Path First (OSPF) and an update to the router OS allowed for link-by-link authentication in OSPF. "Basically," explained Mikael, "you configure a shared secret on both ends of a link to verify that the guy on the other end is who he says he is, and that you can trust the routing information that he sends."
So far so good.
"My colleague and I were adding this authentication feature to our links going into France. We had two core routers and two distribution routers in each country, and we were adding authentication between these two layers in the network," he said.
"Given the redundant setup, we were doing this during working hours."
There was a slight wrinkle in the procedure to apply the configuration. It had to be applied to the routers at both ends of a link within 30 seconds of each other. If it wasn't, the link would drop, since the routers would realize they didn't agree on the shared secret.
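For readers who like their wrinkles spelled out, here is a minimal Python sketch of the idea, not of anything Cisco actually shipped. The function names are made up for illustration, and HMAC-SHA256 is merely a stand-in for whatever digest the routers really used. It shows only why a link survives when both ends hold the same secret (or neither holds one) and drops when they disagree.

```python
import hashlib
import hmac
from typing import Optional


def sign(payload: bytes, secret: Optional[bytes]) -> bytes:
    """Return a keyed digest for payload, or b"" when no secret is configured.

    Hypothetical helper: HMAC-SHA256 is an assumption made for illustration.
    """
    if secret is None:
        return b""
    return hmac.new(secret, payload, hashlib.sha256).digest()


def accepts(payload: bytes, digest: bytes, secret: Optional[bytes]) -> bool:
    """The receiver recomputes the digest with its own secret and compares."""
    return hmac.compare_digest(sign(payload, secret), digest)


def link_stays_up(secret_a: Optional[bytes], secret_b: Optional[bytes]) -> bool:
    """A link survives only if each end accepts the other's hello packets."""
    hello = b"hello"
    a_to_b = accepts(hello, sign(hello, secret_a), secret_b)
    b_to_a = accepts(hello, sign(hello, secret_b), secret_a)
    return a_to_b and b_to_a
```

Calling `link_stays_up(b"s3cret", None)` models the awkward half-minute in which only one end has been configured: the two sides no longer agree, so the link would go down unless the other end catches up in time.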
The inevitable mistake was, however, not a timing problem. Indeed, Mikael told us: "My colleague and I sat at either side of a pod of desks, each with a terminal window open – me on the core router, him on distribution; doing the ceremonial countdown and hitting Enter at the same time on the first link."
The buttons were pressed. The duo sat at their respective terminals, both running and re-running the command that would show that all was well.
However, as the seconds ticked by, both grew increasingly concerned. All did not appear to be well at all.
"Maybe one of us had mistyped the password," wondered Mikael. The pair pulled up the change request and began copying and pasting it again, when Mikael's chum announced: "I think I lost my router."
Now this was really strange. Losing a link shouldn't matter. Management traffic would be redirected over the remaining link without issue. It was only as Mikael continued tapping in the password that an icy chill of realization began creeping up his spine.
He asked his colleague: "Hey, umm, which distribution router were you making your change on – RD-01 or RD-02?"
The response was: "RD-01."
"Right... See, I added it on RC-02, as we always start with the nominal backup device to see that things work fine so as not to break too much."
There was a beat before the same stomach-dropping realization hit his coworker: "Wait, doesn't that mean that..."
"Yep, we just disconnected France."
The next few seconds were frantic ones as a swift change was made to RC-01 and connectivity was restored. Cue La Marseillaise and all that.
"It was probably down for no more than a minute or two in total," said Mikael, "and we got away with it more or less scot-free."
"As a bonus we could tell the NOC that we'd verified that the feature works as expected."
"And for the record," he added, "I still blame the other guy for what happened."
Fair.
It's one thing to lose the odd device from a network, but quite another to lose an entire country. Have you accidentally done similar? Or maybe you're Mikael's colleague and want to tell your side of the story? Tell all, with an email to Who, Me? ®