Updated A report into the IT meltdown at TSB has suggested the British bank did not carry out rigorous enough testing and that the problems went beyond previously reported middleware issues.
The chaos at the bank, a subsidiary of the Spanish Sabadell Group, saw many customers unable to access services for a week at the end of April after the bank bodged a long-planned migration off its former parent firm Lloyds Banking Group's systems. The IT issues were compounded by a wave of scams and an underwhelming response from execs.
Amid the crisis, the bank hired IBM in a systems integration role to identify and resolve the problems – which the bank's CEO Paul Pester told MPs were due to issues with its middleware systems.
Big Blue produced a short presentation for the bank four days after being appointed, offering a "preliminary work plan with very early hypotheses" – and the Treasury Committee has today published the slide deck.
In it, IBM suggested that the bank's testing was not up to scratch, saying it "has not seen evidence of the application of a rigorous set of go-live criteria to prove production readiness".
Emphasising the scale and complexity of the project, IBM said that a firm would need "longer than normal to prove the platform through incremental customer take-on to observe and mitigate any operational risks" – and warned that such projects bring a broad range of hard-to-diagnose technical and functional problems.
"To address this risk profile, IBM would expect world class design rigour, test discipline, comprehensive operational proving, cut-over trial runs and operational support set-up," it said.
However, a set of bullet points suggests that this was not the case – or that TSB was not able to demonstrate this to IBM.
"Performance testing did not provide the required evidence of capacity and the lack of active-active test environments have materialised risk due to issues with global load balancing (GLB) across data centres," IBM stated.
It said that a "limited number of services" – including mortgage origination and ATM and head office functions – had been launched on the new platform, with a broader set of services rolled out to about 2,000 TSB partners.
The integrator added that it "has not seen evidence of technical information available to TSB", such as architectures, configuration and design documents, test outcomes or monitoring information.
A statement in italics at the bottom of this slide appears to be an attempt to contrast the situation at TSB with IBM's own work:
In a similar situation when IBM partnered with a financial organisation to migrate to a new core banking platform, multiple trial migrations were conducted, rolled back and then remediated prior to launch.
The production launch was done over a longer period, initially open to programme members only, then staff, then targeted customer groups, before full launch to new customers and subsequent migration.
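For the technically curious, the phased launch IBM describes can be pictured as a simple cohort gate. The following is a minimal Python sketch only, not anything taken from the IBM deck: the stage names and the functions `cohorts_live` and `has_access` are hypothetical labels chosen to mirror the order given in the quote.

```python
# Hypothetical stage names, ordered as in IBM's description:
# programme members first, then staff, then targeted customer groups,
# then full launch to new customers.
ROLLOUT_STAGES = [
    "programme_members",
    "staff",
    "targeted_customers",
    "new_customers",
]


def cohorts_live(current_stage):
    """Return every cohort admitted up to and including current_stage."""
    idx = ROLLOUT_STAGES.index(current_stage)  # ValueError if stage is unknown
    return ROLLOUT_STAGES[: idx + 1]


def has_access(cohort, current_stage):
    """True if the cohort has been taken on at the current rollout stage."""
    return cohort in cohorts_live(current_stage)
```

With the rollout at the `staff` stage, `has_access("programme_members", "staff")` is true while `has_access("targeted_customers", "staff")` is false, so each stage can be observed before the next cohort is taken on.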
The assessment – or, rather, its release into the public domain – adds to the pressure Pester is facing to stand down as CEO, as he told MPs that testing had been "extensive" three days after TSB had been handed the IBM report.
Pester also told the MPs that the issues at the bank were with middleware, despite the fact the IBM report outlined problems with custom and package applications and the network. It did, however, back up TSB's statements that the underlying infrastructure was functional.
The report – compiled about a week after the failed go-live following two days of planned downtime for the migration – also set out recommendations for initial actions, advising the bank to focus on stability in the short term and ease pressure on some systems.
This includes a recommendation to throttle inbound connections in order to "serve a proportion of customers more consistently", although it acknowledged this would have a bigger impact on internet and mobile users.
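The idea behind that sort of throttling is to cap how many sessions are admitted at once, so that those who do get in are served reliably while the rest are turned away. Here is a minimal Python sketch under that assumption; the class `AdmissionThrottle` and its methods are hypothetical illustrations, and nothing in the IBM document says how TSB's systems would have implemented the limit.

```python
import threading


class AdmissionThrottle:
    """Hypothetical gate: admit at most max_sessions concurrent sessions."""

    def __init__(self, max_sessions):
        if max_sessions < 1:
            raise ValueError("max_sessions must be at least 1")
        self._max = max_sessions
        self._active = 0
        self._lock = threading.Lock()  # protects the active-session counter

    def try_admit(self):
        """Admit a session if there is capacity; otherwise turn it away."""
        with self._lock:
            if self._active >= self._max:
                return False
            self._active += 1
            return True

    def release(self):
        """Free a slot when an admitted session ends."""
        with self._lock:
            if self._active > 0:
                self._active -= 1
```

A front end would call `try_admit()` when a customer connects, show a "try again later" page on `False`, and call `release()` when the session closes. The trade-off is the one IBM flagged: some internet and mobile users are shut out so that the rest get a consistent service.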
IBM also told TSB to prioritise telephony and branch channels, noting that large volumes of internet and mobile banking were "causing additional stress on systems". By getting the other two channels working properly, the bank could help customers and reduce complaints, IBM said.
It's not clear from the document how many of these recommendations were acted upon – for instance, the bank did say it was "limiting access" to the app and online banking, but this happened before IBM made the presentation.
However, TSB confirmed to The Register that it had not implemented one eyebrow-raising suggestion: to reduce the number of times Actimize – anti-fraud and anti-money laundering software – was invoked in user journeys. IBM's report said this would "create more capacity", but might create "additional risk of fraud".
In a canned statement, TSB said: "The IBM document contained a preliminary work plan with very early hypotheses based on observations to date, that were produced after only three days of engagement with TSB.
"To present this document as a clear view on what went wrong wouldn't be a fair reflection. Similarly it isn't a fair reflection of what actions may or may not subsequently have been taken." ®
Updated to add
Three hours after issuing the above statement, the PR machine at TSB – possibly having realised just what sort of light the IBM report cast on the bank – cranked it up a notch, and got back in touch with an updated missive that aimed to downplay the report it commissioned from the firm it hired to help it deal with the meltdown:
The IBM document contained a preliminary work plan with very early hypotheses based on observations to date, that were produced after only three days of engagement with TSB, almost eight weeks ago. The content is therefore now very much out of date.
The hypotheses were not final, nor were they a validated view of what went wrong or of the actions that have subsequently been taken. Without this context, this document could be misinterpreted to the detriment of TSB's customers.