Real-time software? How about real-time patching?
For those times when being On Call means more than just cycling the power
On Call Some systems just have to work, and there is always a Register reader to save the day. Welcome to On Call.
Our story today takes us back to the 1990s and involves "Paul", who had just joined the industry after university.
His employer produced command-and-control systems for public services (think fire brigades and police). He had a one-year contract developing drivers for alarm hardware such as pagers and sirens to let all concerned know that something bad was happening and needed dealing with.
"The technical setup was mostly HP, Sun and Motorola Unix workstations with QNX servers doing the dirty work, my products being located on the QNX realtime servers doing all the alarming whenever something alarming happened," said Paul.
He also did some work connecting to various shouty devices using C and C++.
This all took place on the European mainland, and our hero's office was located in a country famed for sausage, beer, and industrial metal. Every few weeks Paul would visit a client site to make sure the system worked correctly with whatever was installed at the local fire station, police HQ and so on.
And so it was that he headed off, just before Christmas, to a neighbouring country to check things were ticking over at the new "centre for catastrophe protection and emergency services."
His task was, as Paul put it, to ensure the code "would finally pass… the really final attempt at passing the acceptance tests at the end of the year." No pressure, and when things are getting a bit squeaky, who would you call? The youth fresh out of university, of course.
He packed up his "portable computer" (a sewing-machine sized contraption with twin 5 1/4" floppy drives and a 7" CRT) and headed out to the site, nestled in the mountains.
The new system was already up and running but was not working well. Paul was stationed in a conference room, looking at the logs in real time and patching things as they went wrong. Every time the system crashed, the emergency services crew had to switch to the old backup system and send out alarm calls manually. There was a lot of manual intervention, and the customer was not happy.
"You had phenomena such as 'snow', 'cold', 'avalanches', 'parties with very drunk people in houses in the mountains where you had a precipice behind the backyard where you might erroneously try to take the last open-air piss of your life'..." explained Paul, with a little too much detail.
"I saw all those funny test notifications they sent for my personal amusement," he went on. "General alarm – explosion in an iron foundry – send EVERYONE – SIGSEGV."
For the uninitiated (and bless you if you are – we envy you), SIGSEGV is the signal a Unix-like system sends to a process on a segmentation fault, or segmentation violation: the code has gone stomping around in memory where it shouldn't.
"Only those weren't test messages – that was the live emergency system for the [region] crashing completely when it tried to send out the rescue teams (including helicopters and all) after a major event."
Desperately, as the gap between Christmas and New Year wore on, Paul tried to solve the problem. Emergency calls were flooding in around him, and his employer's system was disgracing itself.
He, meanwhile, "was debugging through a rather complex Medusa's head of recursively stacked C functions with not very self-explanatory names to find out what exactly the whole rat's spaghetti nest of code did..."
As he peered blearily at the source, the system SIGSEGVed once again.
It transpired that a predecessor had needed to convert (in C) an integer value into a binary value and opted to temporarily convert to a hexadecimal value on the way through some decidedly iffy pointer-based bit juggling operations. Yet such a conversion wasn't really needed – the value was already stored in binary format.
However, the original coder hadn't been fully aware of that (and had likely been parachuted into the job with precious little C experience) so had tried his hardest to make the conversion work... but it didn't. Not all the time. And C, as programmers know all too well, can be cruel to the unwary. Hence SIGSEGV.
Paul simply removed the whole complicated chunk of code and the SIGSEGV just went away.
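Paul's account doesn't include the offending code, so what follows is only a minimal sketch in C of why no conversion was needed. An integer is already a row of bits, and a shift plus a mask will read any one of them without a detour through hexadecimal. The helper names `bit_is_set` and `to_bit_string` are made up for illustration and are not taken from the original system.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if bit `pos` (0 = least significant)
 * of `value` is set, otherwise 0. */
static int bit_is_set(unsigned int value, unsigned int pos)
{
    /* Shifting by the full width of the type or more is undefined in C,
     * so treat out-of-range positions as "not set". */
    if (pos >= sizeof value * CHAR_BIT)
        return 0;
    return (int)((value >> pos) & 1u);
}

/* Hypothetical helper: writes the low `nbits` bits of `value` into `out`
 * as '0'/'1' characters, most significant bit first, NUL-terminated.
 * Returns 0 on success, or -1 if the buffer cannot hold nbits + 1 bytes. */
static int to_bit_string(unsigned int value, unsigned int nbits,
                         char *out, size_t out_size)
{
    unsigned int i;

    assert(out != NULL);

    /* Check the buffer size up front rather than trusting the caller:
     * writing past the end is exactly how code earns a SIGSEGV. */
    if (out_size < (size_t)nbits + 1u)
        return -1;

    for (i = 0; i < nbits; i++)
        out[i] = bit_is_set(value, nbits - 1u - i) ? '1' : '0';
    out[nbits] = '\0';
    return 0;
}
```

Given a nine-byte buffer `buf`, calling `to_bit_string(0x2Du, 8, buf, sizeof buf)` fills it with "00101101". No pointer juggling is involved, and a buffer that is too small gets a polite -1 rather than a crash.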
"I think I whittled down the remaining problems quite nicely during that week," he said, "but whether they got the final approval or not, nobody ever bothered to tell me."
His efforts were rewarded with a visit to the recruitment agencies. "A month later or so the company was sold to their competition," he went on, "and as a recent university graduate with neither seniority nor family I was one of the first to get fired in the takeover cleanup."
Ever been called upon to support those who really are On Call? Been on the receiving end of messages that you thought might be gallows humour but turned out to be something quite different? Tell your tale with an email to On Call. ®