Fix five days of server failure with this one weird trick
When you have eliminated the impossible...
On Call: Welcome to another in The Register's series of confessions from readers who were either possessed by the pager or all too happy to fire off a demand for On Call support.
Our story, from a reader Regomised as "Will", takes us back a quarter of a century to the back office of a well-known UK bank. You know the one – it had a strapline that Will noted was "an absolute nightmare of a slogan if we ever refused anyone a loan."
It had a substantial amount of Unisys B-Series gear, in this case one server per 20 connected clients. "We had nine of these server/client configurations," Will recalled. And he was responsible for supporting them.
The servers were of the B39 variety. Veritable balls of fire, with a mighty 20MB of RAM serving the demanding 80386-based processor. It had a floppy drive and an exposed SCSI bus for attaching peripherals. As was the custom, the grey plastic box housing the processor and RAM was called the "CPU", and Will's usual configuration added one or two hard disks, a tape drive, and some communications modules hanging off the SCSI bus.
All was good until a fateful Monday, when the server crashed overnight with a SCSI tape drive error. Not unheard of, and after a reboot all seemed OK.
Tuesday, the same thing happened again. This time Will escalated the problem. A call was made to Unisys, his word was accepted ("I had a bit of a reputation at the time," he modestly told us) and a new drive shipped. Will fitted it, tested it, and declared the job good.
Wednesday rolled around and once again the server tripped up with a tape drive error. Faulty replacement parts are not unheard of, but Unisys opted to play it safe and sent out an engineer to do the switcheroo. All went well until the engineer was packing up and the new tape drive errored. Again.
"Long story short," said Will, "the rest of Wednesday was spent trying to get this tape drive to work, and it didn't."
The Unisys engineer arranged for a complete, known working server to be shipped to the office.
On Thursday the server arrived, and the hard disk from the suspect box was swapped into it.
The new server crashed with the same error.
Will had a bright idea – there was a decommissioned server still running in the office. How about using that?
Nope – still crashed.
Will and the Unisys engineer then tried building a single server from bits of other known working computers. The only constant was that hard drive.
And still it crashed.
Will's tale predates Craig David's "7 Days", but the experience certainly sounds like one to which the singer could relate.
The week was coming to an end at the bank, and still the server was fighting Will and the called-out Unisys engineer. The hard drive couldn't be changed (for obvious reasons), but everything else had.
Except one thing. The box received power from a brick on the floor. "This is going to sound weird," Will hesitantly said to the Unisys engineer. "I know the power supply shouldn't have anything to do with it because everything else is working, but we've changed everything else, so…"
"Yes, you're right," replied the engineer, "it is weird."
"But why not?"
As Will bent to inspect the black power brick, the system responded by immediately crashing.
"It turned out that the power adapter was faulty and sending bad voltage to the server and then to all the peripherals via the SCSI bus."
It transpired that the CPU and hard drives could handle it, but the tape drive was a bit fussier and responded in time-honoured fashion: generating an unhelpful error and crashing the server.
"The power brick and cable were taken from our spare," said Will, "and it worked faultlessly thereafter."
Sadly, the line "and then I changed the power supply" does not feature in the climax of Mr David's opus. It might have improved it immeasurably, perhaps creating an anthem for all those cursed to be called out to solve more mysterious IT problems. Share your experience with an email to On Call. ®