On Call Welcome to another in The Register's series of stories from those receiving calls for help and slightly passive-aggressive helpdesk tickets. Start your Friday with a helping of On Call.
Today's tale comes from "Jon", who found himself having to deal with the fallout from an ill-considered corporate emission.
In the middle of the last decade, Jon was working for a large US firm specialising in the software and hardware used in the media world. Customers included filmmakers, television broadcasters, and 24x7 rolling news outfits, as well as recording artists and live performers.
Jon was the IT manager of the whole show and responsible for all the gizmos from the CEO's mobile phone to the receptionist's desktop and everything in between. "My primary concern," he told us, "was what sat in the data centre."
The day of what Jon delicately called "The Event" began with a P1-level helpdesk ticket arriving: the company website was down and an immediate response was needed. A conference bridge was set up, and "the usual suspects assembled from different corners of the world."
It was an odd one, looking for all the world like a Distributed Denial of Service (DDoS) attack: "The web servers themselves were up but the web server engine was getting mullered by thousands of requests per second. Initial investigations showed that the requests were coming from all over the world; there was no single IP, or range, or country or entity. It was coming in from everywhere."
But it couldn't be a DDoS attack. The company paid handsomely for a protection service, but it seemed to be standing idly by as the web server engine enjoyed an impromptu Hammer Time.
"It was as though the requests were being whitelisted by the DDoS protection service and it wasn't doing anything at all."
The question was how all those requests from all over the world could have been whitelisted. The gang looked deeper and discovered the awful truth in the form of a particular User Agent string.
Jon revealed a little about the inner workings of that company's products: "When the firm's software is sold and deployed by the client, it comes with an application manager, a bastardised version of Chromium."
This evil creation squatted on the user's computer and made regular calls back to the mothership to check the licence, look for updates, the usual sort of thing.
"The User Agent string was from the application manager," admitted Jon, "which was perhaps inconveniently whitelisted by the anti-DDoS service."
But why would the company's website be hammered by the application manager now?
It was, of course, the fault of the development team, who had pushed out an update, "although the developers refused to admit it at the time," Jon added. This was normal practice, but this particular emission included a tweak to the application manager that changed the call-home frequency from once every four hours to once every four minutes.
To make matters worse, if the initial request failed the software would immediately retry. And keep retrying.
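Immediate, unbounded retries are exactly how a fleet of well-meaning clients turns one failed request into a self-inflicted flood. The standard defence is exponential backoff with jitter, so retries spread out rather than arriving in lockstep. A minimal sketch of that idea (the function name, base delay, and cap are illustrative, not anything from Jon's firm's code):

```python
import random

def backoff_delays(base=4.0, cap=300.0, attempts=5, rng=random.random):
    """Return jittered, exponentially growing delays (in seconds) between retries.

    base: wait ceiling for the first retry; cap: maximum wait;
    rng: source of jitter in [0, 1) -- injectable for testing.
    """
    delays = []
    for attempt in range(attempts):
        # Double the ceiling each attempt, but never exceed the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": wait a random fraction of the ceiling, so thousands
        # of clients that failed at the same moment don't all retry together.
        delays.append(rng() * ceiling)
    return delays
```

With jitter removed (rng pinned to 1.0) the schedule is simply 4, 8, 16, 32, 64 seconds; the real randomised schedule smears those retries across the window instead of stacking them.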
"As the update went out worldwide," Jon explained, "the new version started calling home waaaay too often, the number of requests mushroomed and took down the company website.
"The firm had DDoS'd itself. :facepalm:"
Fixing the problem was not straightforward. An update to dial back the frequency could not be pushed out because, er, the existing application manager was too busy performing a highly effective DDoS to spot and download the new code.
"Rock, meet hard place," sighed Jon.
In the end, "I spent a good few hours spinning up new VMs to have all traffic route through a layer of Linux servers running HAProxy." He was able to carve out the application manager traffic and throttle it, allowing a small percentage of requests to succeed so the fixed code could gradually roll out.
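Jon didn't share his configuration, but the carve-out he describes can be sketched in HAProxy: match the offending User-Agent, rate-limit it per source, and let the remainder trickle through to a dedicated backend. All names, addresses, thresholds, and the User-Agent substring below are invented for illustration:

```
# haproxy.cfg sketch -- hostnames, addresses and thresholds are hypothetical
frontend web_in
    bind *:80
    # Identify the application manager by its User-Agent string
    acl is_app_manager hdr_sub(User-Agent) -i appmanager
    # Track per-source request rate over a 10-second window
    stick-table type ip size 1m expire 60s store http_req_rate(10s)
    http-request track-sc0 src if is_app_manager
    # Shed most call-home traffic, so a trickle gets through to fetch the fix
    http-request deny deny_status 429 if is_app_manager { sc_http_req_rate(0) gt 1 }
    use_backend app_manager_pool if is_app_manager
    default_backend web_pool

backend app_manager_pool
    # Cap concurrency so licence/update checks can't starve the site
    server lic1 10.0.0.10:8080 maxconn 50

backend web_pool
    server web1 10.0.0.20:80
```

The key design point is separation: once the noisy clients land in their own backend with their own limits, the main website stops paying for their retries.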
"This is one of the reasons I now deploy HAProxy in front of all web traffic regardless of the application, company or volume..."
On Call now includes those oh-so-urgent helpdesk tickets. Ever been on the receiving end of a particularly grim example of the breed, or lost hours dealing with somebody else's self-inflicted catastrophe? You have? Share it with an email to On Call. ®