A year ago, RBS experienced its Chernobyl moment – an incident in which simple human error by those running critical systems resulted in a crisis.
IT staff badly botched routine maintenance of the IBM mainframe handling millions of customers' accounts – a system processing 20 million transactions a day. The mistake was compounded by their inability to recover the situation quickly enough.
The fallout saw up to 16.7 million customers at three banks in the group – RBS, NatWest and Ulster Bank – unable to access their money for four days.
RBS couldn’t hide and MPs monitoring the City pounced, demanding immediate answers from senior management on what went wrong with the bank's computers.
The press of middle England weighed in, too, pillorying the bank’s already unpopular chairman as he gave a grovelling apology to MPs for the whole episode.
A year on, the fallout is still landing as the Financial Conduct Authority (FCA) decides whether action is needed against RBS.
The bank is splashing out £450m on top of its £2bn annual IT spend to replace the mainframe that failed and on new backup systems. RBS told The Reg that it's instituting a “complete refresh of the mainframe” system in Edinburgh.
It’s an unprecedented move: RBS has had computer problems in the past but nothing has warranted a complete rip and replace of entire systems on this scale.
A year in, though, has RBS learned its lessons? Can throwing money at new hardware save customers from future problems? It’s still unclear whether RBS has truly reversed course on its policy of wholesale outsourcing of IT jobs - a policy that helped ignite last year's crisis.
And it's quite possible that what happened at RBS could be replicated elsewhere: old and overloaded mainframes like the one at RBS hold millions of accounts at other banks that have also sent their IT jobs overseas.
RBS is spending nearly half a billion to replace the system that failed in June 2012. The group is buying a new IBM mainframe and making “significant change to many of the systems to improve disaster recovery and automated error recovery”, a bank spokesperson tells The Reg. “This is a result of the IT problems [last] June.”
One former RBS IT insider outlined the bank's usual procedure for outages:
“I've dealt with a few outages at RBS in the past. There will have been a standard 'lessons learned' or 'drains up' type of investigation. The exact reason for the problem will have been found and pored over in a tedious level of detail, then processes will have been put in place to prevent that set of circumstances repeating. Often this is to tighten security and/or process, so for example you may have found individual user groups being tied down more, or process documentation required to be more granular. Potentially more post-change reviewing to make sure that people did what they said they would.”
RBS faces a Herculean job in bringing online a new mainframe at the core of its day-to-day business. It must plan and execute the job without interrupting the existing service – it cannot simply take the old mainframe offline during the transition.
RBS did not say when it plans to bring the new mainframe online.
But hardware is only one thing: RBS must also decide what to do with the existing apps running on the system. Either it ports the existing apps to the new system - the likely route - or it writes or buys new ones. If the former, RBS must design, write, test and then shift. If the latter, RBS must make sure the new apps work on the new mainframe and interoperate with RBS's other connected systems.
RBS did not say whether the old CA-7 software would be ported to run on the new mainframe.
Our ex-RBS techie outlines the complexity of the challenge facing the RBS team making the switchover:
“Being a bank, everything is by nature hung together with schedules and batches – this is really the correct way of doing things. You make a product, make it stable, then if it needs to do something else you add another system and a dataflow … I once saw a diagram of all the dataflows in RBS; just the ones between major systems fairly impressively filled a projection screen, then all the minor systems were added and it was just a black screen!”
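The batch-and-dataflow architecture the insider describes – stable systems chained together by overnight schedules – can be sketched as a job dependency graph. The sketch below is an illustrative toy (the job names are invented, not RBS's actual batch schedule), but it shows what a scheduler such as CA-7 fundamentally does: work out an order in which each job runs only after everything it depends on has completed.

```python
# Toy sketch of a bank's overnight batch run as a dependency graph.
# Job names are invented for illustration; a real schedule driven by
# a tool like CA-7 contains thousands of interdependent jobs.
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it starts.
schedule = {
    "post_transactions": set(),                       # no prerequisites
    "update_balances":   {"post_transactions"},
    "interest_accrual":  {"update_balances"},
    "statement_feed":    {"update_balances"},
    "regulatory_report": {"interest_accrual", "statement_feed"},
}

# TopologicalSorter yields a valid run order: every job appears
# only after all of its prerequisites have appeared.
order = list(TopologicalSorter(schedule).static_order())
print(order)
```

The flip side of this design is that when one job fails or its schedule is corrupted, everything downstream of it in the graph is blocked – which is how a single error in a batch run can cascade into millions of unposted account balances.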
It’s a measure of just how bad things were for RBS that it’s spending £450m to be more or less back where it started - on a mainframe, just a newer and, fingers crossed, more reliable mainframe. There have been reports of companies dumping mainframes, but the mainframe remains a standard for banks: 25 of the world’s top banks use mainframes from IBM, according to Gartner.
Another ex-RBS IT staffer told us:
“A lot of talk has been had in the news about how these systems are too complex and bound to fail, but I guarantee that re-writing the systems and making them monolithic programs would result in some serious pain and cost in the short to medium term and in the long term you'd just end up with the same satellite batch controlled systems when the appetite to change the central system runs out.”
Mainframes are embedded thanks to their history: the S/360, launched in the mid-1960s, gave companies access to the kind of fast computing power that had previously been available only to governments and academics building machines on a project-by-project basis. Over the years, IBM has extended and upgraded the family, through the S/390 to the z-series.