On Call Welcome to On Call, The Register's weekly cautionary tale for those who believe a good deed can ever go unpunished.
Today's adventure from 1990s Chicago is a reminder of what can happen if you are feeling just a little too helpful.
A reader, who we shall call "Alf" (for that is not his name), was handling second-line support duties for a well-known bank in the city. "The three main UNIX file servers," he explained, "were a couple of Auspexen."
"Two were for Production, and one was for Development."
Auspex Systems, for those spared the storage solutions of the previous century, was founded back in 1987 and rolled out the first network-attached storage (NAS) devices. It went on to crop up in many data centres before falling behind the likes of NetApp. It was eventually liquidated in 2003.
But back then the company was very much alive and kicking and the developers using the boxes were doing what they do best: releasing software without too much thought for where it was running.
"They released beta versions of new software from the Development box," explained Alf, "but those programs frequently ended up going live without being properly migrated to the Production boxen."
Of course, the inevitable happened. A drive on the development box died, which "brought a few production services to a screeching halt since the custom version of SunOS 4.1.3 blocked on I/O wait".
"After some investigation I found which drive was bad, then tried all the usual admin tricks to get it to behave, unmounting, fscking, et al. None of which worked because of the I/O block."
The bank sensibly had a service contract with Auspex, but it would be hours before an engineer could get to the site, "and that," said Alf, "just wasn't going to do". This being quite some time before the days of Stack Overflow and Google, Alf was reduced to desperate man -king for something – anything – that could help.
Every minute of downtime was losing trades and burning money.
"I eventually found
apx_umount. These finally let me get the faulty drive unmounted and the server working again."
The financial losses were substantial, but could have been a lot worse. Alf and a PFY managed to coax the dead drive back to a semblance of life on a Sun workstation and extract around 80 per cent of the data.
"The developers," he remarked, "got a very stern talking to about running production apps from the development server."
And there the story should have ended, were it not for the phone calls.
"I would get calls during the day about random Auspex problems ", Alf explained. "Then I started getting off-hours calls... even when I was not on-call."
"After a few midnight calls and weekend calls direct to me instead of the actual on-call personnel," Alf had had enough, and asked the night-ops person why he and his wife were being woken rather than whoever was being paid to sit by the phone.
"Turns out, after my recovery of the Auspex box, someone decided I was now the expert on those machines and posted my name and direct phone number on the wall right next to them."
The moral of the story? "Developers need a firm hand about what is Production and what is Development," said Alf...
And, of course: "No good deed goes unpunished."
Ever tried to be helpful and come to rue your non-BOFH behaviour later? Or forgotten where that critical production application was running until the staging box got unplugged? Us too. Tell us all about it in an email to On Call.
Where were you 20 years ago? Were you frantically cutting COBOL or adding a crucial extra byte or two to a date field? Or a bodge that might last to, oh, 2050 before it explodes? Who, Me? and On Call would also like to hear your sordid Y2K tales for a festive feast of near-failures and dodged bullets. ®