RBS Mainframe Meltdown: A year on, the fallout is still coming
When the totally brand new kit comes on ... what do you think will happen?
Bad-tasting layered cake
Banks have layered on more apps, different code and data flows to these systems as the business has changed. The RBS mainframe has been stretched with the acquisition of NatWest in 2000 and Ulster Bank to the point where it’s audibly creaking.
The Edinburgh mainframe system was so old that parts of its code had been written in assembler for hardware going back to the 1970s. The mainframe talks to RBS’s network of ATMs, which were once CICS terminals but are today PCs offering modern amenities such as mobile phone top-ups. The back-end systems are strung together using a clone of MQ (Message Queuing) middleware from IBM. One of our ex-insiders points out:
“Heaven knows how much extra cruft you have to know and understand.”
This has made the Edinburgh mainframe hard to run and to maintain. Another former RBSer explained how complicated it had become:
“I recall getting into discussions regarding adding an itemized ATM withdrawal fee to statements if such a thing was incurred, and waving around a bit of listing with the relevant code on it. The meeting (involving about 12 people) happily discussed all sorts of peripheral things, with the only question to me being 'can you actually read that stuff'. They gave up on that idea after a while on the grounds that it was "too complicated."”
Spending for a fresh start was perhaps inevitable given the levels of flak RBS and its chairman took. But such systems don’t run on their own, and there’s a need to make sure those running them understand the systems and the jobs. Is RBS really making the changes that might help stave off a repeat of the crisis of June 2012?
Treasury Select Committee chairman Andrew Tyrie MP wrote to RBS CEO Stephen Hester at the time of the outage saying his committee was “extremely concerned about the current crisis at RBS.”
In his letter, Tyrie demanded Hester explain what caused the meltdown, what it revealed about RBS’s ability to spot potential risks and develop contingency plans, and whether outsourcing had contributed to the crisis. The last question is pertinent because among the 36,000 jobs cut by RBS since 2008 as part of cost cutting are 500 IT jobs that have been outsourced to suppliers in India.
Hester and RBS have said their investigations found the error occurred during operations “managed and operated by our team in Edinburgh.”
It’s a carefully constructed defense because, as The Reg found out at the time, among the 500 IT staff cut by RBS were those running the CA-7 process that went wrong. The job of running CA-7 went, at least in part, to staffers working in Hyderabad who were paid 8-10 lakhs of rupees, a salary of roughly £9,000 to £11,000, according to an “urgent” job ad in February 2012.
RBS uses CA-7 to perform routine batch scheduling of jobs on the mainframe:
Batch scheduling software is used to process routine jobs and avoid the need for manual input: jobs are prioritized, scheduled and performed automatically. RBS runs updates on accounts on the mainframe concerned overnight, with thousands of batch jobs scheduled by CA-7 [from CA].
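To make the idea of batch scheduling concrete, here is a minimal sketch in Python. It is not how CA-7 works internally and uses none of its interfaces; the job names, priorities and the run_batch function are all hypothetical, chosen only to show jobs being prioritized and run automatically in an order that respects their dependencies.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Hypothetical job record: a lower priority number runs first.
    priority: int
    name: str
    depends_on: tuple = field(default=(), compare=False)


def run_batch(jobs):
    """Return the order in which jobs run: highest priority first,
    but never before the jobs they depend on have finished."""
    waiting = {job.name: job for job in jobs}
    finished = set()
    ready = []   # heap of jobs whose prerequisites are all done
    order = []
    while waiting or ready:
        # Release every waiting job whose prerequisites have finished.
        releasable = [n for n, j in waiting.items()
                      if set(j.depends_on) <= finished]
        for name in releasable:
            heapq.heappush(ready, waiting.pop(name))
        if not ready:
            # Nothing can run: a dependency is missing or circular.
            raise RuntimeError(f"stuck, unmet dependencies: {sorted(waiting)}")
        job = heapq.heappop(ready)
        order.append(job.name)   # a real scheduler would launch the job here
        finished.add(job.name)
    return order


# Illustrative overnight run (invented job names).
EXAMPLE_JOBS = [
    Job(2, "print_statements", ("post_transactions",)),
    Job(1, "post_transactions", ("load_payments",)),
    Job(3, "archive_logs"),
    Job(1, "load_payments"),
]

if __name__ == "__main__":
    print(run_batch(EXAMPLE_JOBS))
```

The point of the sketch is the failure mode: if one job in the chain cannot complete, everything that depends on it stays stuck in the queue, which is why a fault in the scheduler itself can hold up a whole night's account updates.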
The important difference is the geographical gap between those running the CA-7 process, in Hyderabad, and those managing the CA-7 team – in Edinburgh.
One ex-RBS IT staffer pointed to the existence of a communication gap between teams in India managed by staff in the UK and how this could have helped slow down RBS’s response to the unfolding crisis last June.
“A lot of people miss the fact that there are very different cultural references and behaviors, lots of people in UK presume that they speak English so they must understand everything said in the same way that someone from the UK does and it's just not the case,” our source said.
Another consequence of outsourcing has been a loss of those skilled in running the mainframe and knowledgeable as to how the mainframe’s owner, RBS, operates in what is a sensitive and demanding sector. One reason mainframes are so popular among banks is the fact they are reliable, so an RBS-style meltdown should be relatively rare.
Robin Bloomfield, professor of software and system dependability at City University, London, told The Reg that skilled IT staff are as important as the hardware because they get to know the individual systems and learn to spot early warning signs and apply the appropriate remediation before things escalate.
“Sometimes people see legacy equipment as a legacy issue and all you need do is plug in something more modern,” Bloomfield says. “But they are reliable because of the culture around them – the people around them, the safeguards. That can be ignored in an organization if it’s seen as an IT issue and ‘all we need is a black box’.”
Bloomfield, who specializes in dependability and safety of software-based systems and in operational risk, says he’s seen many cases in financial IT where the technology is treated as a black box, meaning it can be installed and operated without much thought as to who runs it.
It is unclear whether RBS is reversing its policy of outsourcing or whether the team running the new mainframe will be brought back on-shore. We asked RBS what had happened to the old outsourced teams that ran the CA software last June but RBS did not respond. We asked RBS whether it had updated or changed the policies used to manage the risks associated with the mainframe to improve recovery as a result of last year’s outage, but – again – RBS did not respond.
As noted, the matter of outsourcing and running critical banking functions on legacy mainframes loaded with a spaghetti of dated code is not restricted to RBS.
It could be that one consequence of the June 2012 RBS meltdown is that other banks are forced to update or change their ways, too, especially if regulators act.