On Call What's that coming over the hill, could it be Friday? Hurrah! Time to settle down for a reminder about the dangers of over-modifying a system past its prime in today's instalment of On Call.
"Ed" contributed today's story, which takes us back nearly a decade to a trading system running on an iSeries.
The software itself was an elderly bit of code, customised past recognition and mostly run in a classic green-screen via a terminal emulator.
Yes, one of those applications: modified way beyond its original purpose and featuring new, exciting, and unexpected interactions between original and new functionality.
Ed recalled the user interface, which is likely seared into his memory: "For record list screens, it was standard that Option 1 and Enter would select a record and either perform an action, or more commonly, load a second-level screen (like a detail record, or a Y/N prompt for an action)."
So far, so good. However, on those second-level screens, a jab of F4 would perform the equivalent of entering Option 1 against every record at once.
What, a curious reader might ask, could possibly go wrong?
"One such list screen," Ed explained, "was a list of pending transactions, where Option 1 would cancel the transaction.
"Whoever wrote that screen in the original application 20 years earlier had decided to include the F4 auto-select feature."
This was A Bad Idea™ because there was no confirmation prompt. Option 1 simply cancelled the transaction. Hitting F4 would therefore cancel all pending transactions.
"To make matters worse," he added, "Option F4 was only labelled as 'auto select', not as 'cancel all transactions'."
A little like relabelling the nuclear launch button to one entitled "bring forth the fluffy bunnies".
To be fair to the original coders, back in the day transactions had usually been processed immediately, so the potential for destruction likely never occurred to anyone. However, things had changed in the intervening years and a whizzy custom product now generated thousands of transactions overnight which sat in Pending status until 11 o'clock the following morning ahead of processing.
Those of a nervous disposition should look away now.
The inevitable call from a user came in at 9 o'clock: "I had a stuck transaction, I tried pressing F4 on the pending transactions screen and now everything's gone."
Ed enjoyed a few seconds of blissful ignorance as he hunted down the screen to find out what it did.
"Then the bottom dropped out of my stomach," he said. "Oh ****, it's before 11am... If these transactions couldn't be restored, it would take the business months to manually correct things."
The team had less than two hours to prevent catastrophe.
Ed told his boss, took his phone off the hook, and got stuck in.
"Rolling back the journals would not work. The updates were performed not only by the user's job, but were cascaded to many other tables by a large pool of background jobs that were processing many other transactions at the same time."
Ah, the joy of cascading transactions. So convenient. So handy for enforcing business rules. So horrifically destructive.
Ed would have to deal with the issue manually.
The sheer age of the system meant that the end-to-end process wasn't terribly well understood, but with perhaps a premonition of the disaster that might unfold, Ed had dutifully begun documenting it just the previous week.
He wasn't done, but at least some of those two hours wouldn't need to be spent pondering what programmers of decades past had been thinking.
"What followed," he said, "was about 90 minutes of furious code analysis and querying. Identifying the affected list of transactions and all the updates performed. Writing SQL to revert the updates. Reproducing the issue in test and confirming the SQL worked. Requesting emergency update access to live, transferring the transaction list onto it, and running the SQL."
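For readers wondering what the last step of that list might look like in miniature, here is a rough Python sketch. It is emphatically not Ed's SQL: the `transactions` table, its `id` and `status` columns, the status values and the `restore_cancelled` function are all hypothetical, and SQLite stands in for the real database purely so the example runs anywhere. It also touches a single table, whereas Ed had to unpick updates cascaded across many. The idea it illustrates is simply to apply the revert to a known list of affected IDs inside one database transaction, and to back out if the number of rows changed does not match the list.

```python
import sqlite3


def restore_cancelled(conn, affected_ids):
    """Set each listed transaction back to PENDING, all or nothing.

    Hypothetical schema: transactions(id, status). Only rows that are
    currently CANCELLED are touched, so anything that has moved on in
    the meantime is left alone and shows up as a count mismatch.
    """
    restored = 0
    try:
        for txn_id in affected_ids:
            cur = conn.execute(
                "UPDATE transactions SET status = 'PENDING' "
                "WHERE id = ? AND status = 'CANCELLED'",
                (txn_id,),
            )
            restored += cur.rowcount
        # Sanity check: every ID on the list should have been reverted.
        if restored != len(affected_ids):
            raise RuntimeError(
                f"expected {len(affected_ids)} rows, restored {restored}"
            )
        conn.commit()
    except Exception:
        # Undo any partial work before reporting the problem.
        conn.rollback()
        raise
    return restored
```

The count check is the cheap insurance here: if the affected list and the live data disagree, nothing is changed and someone gets to look at why before trying again.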
A paragraph to induce nausea in even the strongest of admin stomachs.
"I did it, with about five minutes to spare."
He also, while he had emergency access, updated the screen to show a warning prompt on what he delicately described as the "death button" – aka F4.
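We don't know what Ed's prompt actually looked like, but the principle is easy enough to sketch. The Python below is purely illustrative: `Transaction`, `cancel_selected` and the `confirm` callback are names we have made up, and nothing here reflects the real green-screen code. It shows a bulk cancel that first tells the user exactly how many records are about to be affected, then does nothing at all unless they say yes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transaction:
    txn_id: int
    status: str = "PENDING"


def cancel_selected(
    transactions: List[Transaction],
    selected_ids: List[int],
    confirm: Callable[[str], bool],
) -> int:
    """Cancel the selected pending transactions, but only after confirmation.

    Returns the number of transactions actually cancelled.
    """
    wanted = set(selected_ids)
    targets = [
        t for t in transactions
        if t.txn_id in wanted and t.status == "PENDING"
    ]
    if not targets:
        return 0
    # Spell out the consequence rather than hiding it behind a vague label.
    prompt = f"Cancel {len(targets)} pending transaction(s)? (Y/N)"
    if not confirm(prompt):
        return 0
    for t in targets:
        t.status = "CANCELLED"
    return len(targets)
```

In an interactive program, `confirm` might be something like `lambda msg: input(msg + " ").strip().upper() == "Y"`. Passing it in as a parameter keeps the destructive logic testable without anyone having to sit at a keyboard.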
Knowing users all too well, he sighed: "Otherwise it was bound to happen again sometime..."
Unusually for those normally unsung heroes of IT, his reward was profuse thanks from his boss and the unfortunate soul who had pressed the button of certain doom. "For the longest time after," he said, "the business users would do almost anything for me."
Heck, he even got a £50 gift voucher.
However, he also got a reputation as being the chap to go to when everything went wrong and, quoting a bit of 1992's Under Siege at us, "coming up with last-minute desperate solutions to impossible problems created by other [expletive deleted] people."
Ever found yourself plunged into decades-old code with mere minutes before the business explodes? Or saved the day and found yourself saddled with an unfortunately helpful reputation? Of course you have, and you should share That Time The Phone Rang with the vultures of On Call. ®