Beware the big bang in the network room
Trust me, we're going to need a bigger maintenance window
Who, Me? Cables can be unkind, especially when one has the confidence of youth but not the dark cynicism of experience. Welcome to an edition of Who, Me? to tug at, if not the heartstrings, then certainly the RJ45s.
Today's hero is a Register reader Regomised as "Colin." Colin had accepted his first proper role as a network engineer for a small UK consultancy. Up to that point, his networking experience had consisted of swapping out a previous employer's hub for a switch and getting the company's self-hosted web site and email server up and running.
He was confident. Gosh, he was confident.
Our story takes place in the early part of the last decade. The client had a London data centre and he was tasked with swapping out some of the switches within for modern units, replete with 10 gigabit uplinks.
"Regarding myself as 'not a total cowboy'," he told us, "I had surveyed the racks in advance and recoiled at the sprawling mass of Cat 5 cables that had 'grown organically' around this 48-port switch and confirmed that yes, of course there wasn't an empty slot in the rack..."
"I was quite new to the whole networking lark," he went on, "and had the bullishness to believe that with no assistance, and in spite of the Gordian-knot-like cable nest, since this was a Layer 2 Switch with identical VLAN config to its replacement, I should be able to replace it along with a sister switch in another rack within the same maintenance window."
"Mostly because I didn't want to run the hassle of agreeing multiple windows with the client (and their clients)."
Colin was careful. He labelled all the cables. Undid all the bolts. And ever so gently, he began sliding the old switch out of the rack.
"I felt some resistance and pulled gently since the RJ45 jacks on Cat 5 cables can take some strain without damage, and all the cables were out anyway..."
Out of the switch, perhaps. Alas, by the time Colin realised what was causing the resistance, the second power lead had popped out of the server that happened to be running BGP on an ancient version of Linux for the customer's entire network. And, of course, there was no backup server waiting to helpfully step in.
Maintenance windows can be used to cover many sins, and Colin was burning through his at a terrifying rate. Yes, the Penguin Gods smiled and the Linux box restarted without issue, but there was work to do and time was marching on.
Tired, Colin came to the final switch. It was at the top of a full-height rack. Of course it was. The data centre helpdesk team had not shared their step ladder due to "health and safety," necessitating some precarious balancing, but at least the cables were a bit more accessible. They needed to be: Colin hadn't been given the key to get into the rear of the rack.
"I swapped the switch and attached the power, fibre uplink, and customer Cat 5 cables one-by-one," said Colin. "Since I had been meticulously checking for traffic on each customer port as I connected them, I noticed that one cable was missing."
He hunted and hunted. But cable there was none. "I think it was somewhere around the side but falling towards the back under its own weight," he told us.
Midnight came and went and Colin was exhausted. There was a real chance that he might make things oh-so-much worse with tired hands. The maintenance window was also closing, "so I figured that the customer would understand."
He messaged a senior colleague to flag up the port missing its cable. Someone in the morning would have to deal with it. It was, after all, just one cable. How bad could it be?
"My colleague impressed upon me the following day that the only missing cable belonged to the client's biggest customer, and he did not see my message until after a while had gone by since the customer's business hours began and they were fully affected by the outage."
Colin was not allowed to forget his mistake. It was used, he said, "as a prime example of why you should never go 'big bang' with a swap-out like this, not to work onsite alone, and to always declare a much longer window than you think you need."
Lessons learned after the event tend to be the toughest ones. Ever found yourself fighting through a forest of cables and finding a customer less than understanding and forgiving of an unforced error? Confess all with an email to Who, Me? ®