On Call Though the week is over, for some the weekend does not involve frolics and adult beverages but a nerve-wracking 48 hours of watching the company phone. Welcome to On Call.
Today's story comes from "Guy" (not his name) who was enjoying his second weekend of being on call.
He'd been told that Saturday was the only day when the phone might ring, and so had planned a relaxed Sunday in central London with a new significant other. Not too far from the office should the unexpected happen.
And happen it did. Just as he disembarked the train, ready to impress his date with the sights and sounds of the big city, his phone went off. It was the on-call DBA and one of the most important database servers was very, very unhappy. Rather than its usual eight CPUs, it was struggling along on only two. Could he check it out, please?
The office was located in London's Docklands, so Guy's date was treated to an impromptu tour of the Docklands Light Railway and a demonstration of Guy's skills of persuasion in gaining access to the server room.
"Opening the door to the machine room I was hit by a wall of hot air," Guy told us. "The thermometer just inside the door was reading 40°C, it should have been at 5°C."
The aircon had failed, although strangely none of the supposed hourly checks by building security had noted it. Almost as if they opted for a peruse of the Sunday papers rather than doing their rounds. Surely not.
Guy reported the failure and proceeded to have a look around the stifling room. Almost every server was showing the yellow service light of distress but amazingly none seemed to have actually died.
There was, however, that problem reported by the DBA. So Guy dug a little deeper.
"The majority of the machines used were Sun E4500's," he told us, "and unusually E4500's vented cooling air from left to right, rather than front to back."
Of course, the database server was located at the right-hand end of a rack. Guy couldn't get too close – the air coming off the server was hot. A thermometer registered it as 70°C.
"What had happened was the DB server was one of three E4500's all in a line. The left-hand server was the dev DB server and the hot air it was sending out was being sucked into the UAT DB right next to it (literally a 2" gap). It further heated the air before it was sucked into the Prod DB server, again only inches away."
Guy got the go-ahead to turn off the poor thing in order to allow it to cool down. He also shifted it to the shelf above (sadly his new girlfriend was not present to witness him hefting the weighty chassis of one of Sun's finest above his head) while he waited for the aircon engineers to deal with the issue.
With his plans for a lovely day in London now in tatters – it would take hours for the server room to cool down – Guy pondered what to do. The answer was obvious – rope his date into helping him identify all the unhappy servers so he could diagnose their issues.
The old romantic.
As it transpired, only a few power supplies and disks had been properly fried, and those were just on some of the dev servers and so could wait until Monday for repair.
In the end, that surprise date in the data centre was a smart move: Guy and his then-girlfriend are now just a few years shy of their 20th wedding anniversary. "She doesn't ask much about my work, though," he added wryly.
And the security guards? Their fate is shrouded in the mists of time, although Guy did note that their discussion with the building manager was "somewhat interesting."
One might almost say "heated."
We've all had a candlelit moment spoiled by the trilling of the On Call phone. Then again, the same persistent ping can also rescue one from the most awkward of rendezvous. Tell us about the time the pager went off at an inopportune moment with an email to On Call. ®