New boss took charge of project code and sent two billion unwanted emails
Techie summoned at 02:00 AM to sort things out sent another 2 billion trying to fix it
On Call
Welcome to another instalment of On Call, The Register's weekly wander through your tales of tech support.
This week, meet a reader we'll Regomize as "Nick" who in the late 2000s found himself working as a contractor at a London investment bank.
"I had technical oversight of a project related to overnight valuations of the bank's credit derivative products," Nick told On Call.
Some of you may remember those products were a big reason the global economy tanked in 2008. Nick wasn't responsible for that mess, but his job did involve systems that valued the bank's products overnight, so that when trading started in the morning everything was shipshape.
"We needed to know quickly if something did go awry with the process, so we implemented a Log4j plug-in that sent out an email whenever an error was detected, together with all the error details," Nick explained. That system was rate-limited to send error message emails no more than once every ten seconds.
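Nick doesn't say how the plug-in enforced its ten-second limit. As a rough illustration only, here is a hypothetical Java sketch of that kind of throttle. The class and method names are invented for this example and are not Log4j's API, and counting the held-back errors into the next alert is an assumption, not something Nick described. The sketch forwards at most one alert per interval and counts the errors it holds back in the meantime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical throttle: at most one alert per interval. Errors arriving during
// the quiet period are counted (an assumption) and mentioned in the next alert.
public class RateLimitedAlerter {
    private final long intervalMillis;
    private boolean hasSent = false;
    private long lastSentAt = 0;
    private int suppressed = 0;
    // Stand-in for a real email transport: just records each alert body.
    private final List<String> sent = new ArrayList<>();

    public RateLimitedAlerter(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // The current time is passed in so the behaviour can be simulated and tested.
    // Returns true if this error triggered an alert, false if it was held back.
    public boolean onError(String message, long nowMillis) {
        if (!hasSent || nowMillis - lastSentAt >= intervalMillis) {
            String body = (suppressed == 0)
                    ? message
                    : message + " (+" + suppressed + " suppressed)";
            sent.add(body);
            hasSent = true;
            lastSentAt = nowMillis;
            suppressed = 0;
            return true;
        }
        suppressed++;
        return false;
    }

    public List<String> sentAlerts() {
        return sent;
    }

    public static void main(String[] args) {
        RateLimitedAlerter alerter = new RateLimitedAlerter(10_000); // ten seconds
        // Simulate one SQL error per millisecond for 1,000 seconds.
        for (long t = 0; t < 1_000_000; t++) {
            alerter.onError("SQL error", t);
        }
        System.out.println(alerter.sentAlerts().size()); // prints 100
    }
}
```

With the throttle in place, a million errors spread over 1,000 seconds produce 100 emails. Take it away and the same run produces a million, which is the failure mode Nick was about to meet at a much larger scale.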
The system worked … until the bank hired a new project manager who decided the best way to understand the system was to take charge of the next release.
"He duly went through the build and release scripts, decided that the Log4j plug-in had no place in a release system and pulled it, without mentioning this to the rest of the team."
Of course the new project manager made mistakes – especially the omission of a changed SQL script – which became evident when the new release went live on Saturday.
"At 2:00 AM on Sunday morning I received a call saying the system had crashed," Nick told On Call. "I logged in but could not see any error messages – because the system had generated two billion SQL error messages."
The new project manager's changes had removed the limit of one email every ten seconds, so every one of the two billion errors produced an email.
That flood of messages swamped the bank's email servers, giving Nick no evidence with which to diagnose the issue.
"I had no idea about the problem, so told the support group to restart the calculations. The result was another two billion email error messages."
"It took the best part of two days to get the email servers operational again," Nick told On Call.
This story has a happy ending, because the project manager saw the error of his ways, stopped working solo, and became a proper team player.
"We did all work well together in the end," Nick told On Call.
Have you been summoned in the small hours to fix a mess made by your boss? If so, send On Call an email so we can immortalize your tale on a future Friday. ®