On Call Friday is upon us, and with it another On Call story from those poor souls who have to answer the phone when everything goes wrong. Not all heroes wear capes, nor, as we'll see, do they all remember to ward their Linux servers against an enthusiastic boss.
"Hans" is the contributor of today's tale, and his story takes place some 15 years ago, the era of the Friends finale, Shrek 2 and, of course, Shaun of the Dead.
In what will be a familiar story for many readers, Hans was the one-man-band IT department for a rapidly growing business. "When I started," he told us, "it was like 400 employees. Five years later it was 1,200."
Naturally, IT resourcing didn't keep pace with the business's growth.
Hans was on the road, heading to a new office location to set up some IT gear, when The Call came in: "I got a call from the company owner that something was wrong."
As it transpired, something was very, very wrong. There was: "No internet access, all subsidiaries are offline, no email and all 'production' stopped."
Whipping out his notebook and plugging in his trusty PC card modem (remember those?), Hans connected to the internet and logged into the company's router.
"Back then," he recalled, "small companies like us did not have any corporate network from a big telco but managed everything on their own. This means that it had to be cheap."
"Cheap" meant a DIY router made from a home-built server running Linux. The server had been set-up to be a jack-of-all trades, acting as "Firewall, Router, VPN Gateway, Groupware Server, Timeserver and some other things."
What could possibly go wrong with such a set-up?
Hans connected, checked the logs and found the problem was "an unplanned restart about an hour ago."
He discovered that none of the services were running on the Linux server because whatever fool had set the thing up had configured the running state to be different to what was the default following a restart. "Changing the run state was therefore a quick fix to get everything to work again," he recalled happily.
The cause of the failure was a mystery. Hans checked in with The Boss and discovered that a user in a remote location had complained that he was unable to log in to the terminal server. Since Hans was out, The Boss (in a most unboss-like fit of business ownery helpfulness) decided to check out the machine in question. It all looked fine, but right after he checked, the "Internet and Email had a problem as well..."
In fact, everything had had a problem straight after that innocent check.
Like a sleuth in a paperback of the finest pulp fiction, Hans pondered the problem and, after asking a few more pointed questions, came up with the sequence of events that led to the world dropping out of the bottom of the data centre.
The Boss had trotted to the server room, opened the rack, turned on the CRT (a CRT!) and hit Ctrl-Alt-Delete to bring up the Windows Server login dialog. Only after that did he hit the button to bring up the Terminal Server on the KVM switch.
What The Boss didn't fully comprehend was that the switch was pointing at the Linux box, and that particular key combination could do terrible things to a console. Unbeknownst to The Boss, he'd "triggered the reboot of the machine sending it to a maintenance mode."
"All could have been prevented," sighed Hans, without "a boss who had enough access to be dangerous" and a user with a locked account.
"The very next day… The Boss handed over his key to the server room, understanding that most things he can do in there could cause more problems than solving any."
The story, however, does not end there, as Hans confessed to us that he "had kept quiet that I might have forgotten to change the default behaviour of a Linux console to NOT reboot when confronted with a Windows User, and having a production system start into maintenance mode by default..."
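For readers wondering what Hans's two omissions might look like in practice, here is a minimal sketch. It assumes a SysV-init box of that era configured through /etc/inittab, which Hans did not actually confirm, and it treats runlevel 1 as the single-user "maintenance mode", which is the convention on many distributions but not all. The function name audit_inittab and the sample file contents are hypothetical, made up for illustration.

```python
# Minimal sketch: read inittab-style text and report the default runlevel
# and the command bound to Ctrl-Alt-Delete. Assumes the classic SysV init
# line format  id:runlevels:action:process  (an assumption, not Hans's setup).

def audit_inittab(text):
    """Return (default_runlevel, ctrlaltdel_command) from inittab-style text.

    Either value is None when no matching active line is found.
    """
    default_runlevel = None
    ctrlaltdel_command = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(":", 3)
        if len(parts) < 3:
            continue  # not a well-formed entry
        action = parts[2]
        if action == "initdefault":
            default_runlevel = parts[1]
        elif action == "ctrlaltdel":
            ctrlaltdel_command = parts[3] if len(parts) > 3 else ""
    return default_runlevel, ctrlaltdel_command


# Hypothetical file contents resembling the situation in the story:
# boots into runlevel 1 and reboots when Ctrl-Alt-Delete is pressed.
SAMPLE = """\
# hypothetical inittab, for illustration only
id:1:initdefault:
ca::ctrlaltdel:/sbin/shutdown -t3 -r now
"""

if __name__ == "__main__":
    runlevel, command = audit_inittab(SAMPLE)
    print("default runlevel:", runlevel)
    print("Ctrl-Alt-Delete runs:", command)
```

On a machine set up like the sample, commenting out the ctrlaltdel line and pointing initdefault at the normal multi-user runlevel would be the usual remedy, though the exact runlevel numbers vary by distribution.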
Ever had to defrock The Boss after a call-out revealed power gone to the head? Or had someone blunder into a cock-up of your own making? Send an email to On Call to share the pain. ®