Server broke because it was invisibly designed to break
Cause of weeklong outage was under tech support's nose – but not on their mind
On Call The week, and indeed the year, may be ebbing away to their respective conclusions, but The Register continues to toil away at On Call, our weekly reader-contributed tale of techies triumphing under trying circumstances.
This week, meet a chap we'll Regomize as "Kris" who arrived at his desk one Monday morning to find the phone ringing off the hook, a full voicemail inbox, and his pager pinging with urgent messages from his manager.
The cause of all that urgency was a dead server that powered an important application.
Kris quickly inspected the machine, which was connected to power and had working UPSes. No tell-tale smell suggested something inside had expired. Turning everything and anything off and on again had no effect.
"The unit was dead as a doornail, as they say," Kris wrote.
The only thing to do was therefore to call the company's service provider for help.
"This was one of those situations where I really didn't want to deal with them, because in most instances by the time they sent over a tech, the problem was already resolved either by the user or by myself," Kris told us. But at this stage a second opinion was the only option.
"About five hours later the tech shows up smelling as if he lived in his car and slept in his ashtray," Kris told On Call. Within five minutes he'd diagnosed the server as broken and prescribed new power supplies as the fix.
That hardware arrived two days later and – after Kris and the techie grunted and hefted the server to get it installed – failed to fix the dead box.
Kris confessed to The Register that part of him didn't mind this at all, because seeing the smug techie brought down a peg was quite fun.
However, by this time – three days into an incident that deprived the company of an important application – Kris was under more than a little pressure.
So he concurred with the visiting tech's blankly amused observation that the server remained broken and that the motherboard must be the real problem.
Three days later a new motherboard arrived and – after more lifting and sweat – did nothing to restore the server to working order.
But all that work did put Kris on the way to finding a fix, because handling the case set in train another thought: were the interlock switches working?
Interlock switches, for the uninitiated, are safety mechanisms that stop current flowing when cases are open. Which is a fine idea because nobody should be electrocuted while working on a server.
It transpired that one of the switches had broken on this server, but the fault was invisible and undetectable.
"I'd never opened up the server in my time at this job," Kris said. "I made a quick trip over to R&D and one of the engineers pulled a similar switch out of their parts and gave one to me. The tech wired in a new switch and we were good to go."
This incident is not a typical On Call triumph.
That came – or rather, didn't come – in the following weeks and months, when Kris's inbox never contained an invoice for the attempted fix by his external service provider. So while the incident was unpleasant, at least it didn't cost Kris's employer a cent!
On Call will run in its usual Friday timeslot through the holiday season. Use your downtime to recall any stories of yours like this one from Kris, or tales of holiday-time tech support, then send On Call an email. ®