The catastrophic systems failure that grounded British Airways flights for a day appears to have been caused by networking hardware failing to cope with a power surge and messaging systems failing as a result.
The Register has asked BA's press office to detail what went wrong, what equipment failed, what disaster recovery arrangements were in place and why they appear not to have worked. BA has not responded to our requests for information, so we've instead reviewed the interviews that BA CEO Alex Cruz gave to British television outlets and pieced together an account of the outage.
Cruz told several outlets that a power surge took some systems offline and that backup power systems then failed.
Speaking to Sky News, he added a little more detail about what went wrong, as follows:
On Saturday morning around 9:30 there was indeed a power surge that had a catastrophic effect over some communications hardware which eventually affected the messaging across our systems.
Tens of millions of messages every day that are shared across 200 systems across the BA network and it actually affected all of those systems across the network.
Speaking to Channel 4, Cruz said: "We were unable to restore and use some of those backup systems because they themselves could not trust the messaging that had to take place amongst them."
Cruz has insisted, in all interviews, that outsourcing was not the source of the problem as the affected infrastructure was maintained by "local" people.
Speaking to the BBC, he said: "There are no redundancies or outsourcing taking place around this particular hardware, live operational systems resilience set of infrastructure in this particular case.
"It is all locally hired, etc, resources that have been attending to the maintenance and the running of this particular infrastructure."
He went on to say that the incident is completely unrelated to redundancies among the carrier's IT staff.
But Cruz's remarks to Sky News left open the possibility that the staff responsible were not BA employees.
"All the parties involved around this particular event have not been involved with any type of outsourcing in any foreign country," he said [Reg emphasis]. "They have all been local issues around a local data centre who [sic] has been managed and fixed by local resources."
Another point Cruz has made repeatedly is that BA does not believe the incident was caused by an attack, and that it has no evidence its systems were compromised or accessed by unauthorised third parties.
Cruz's watchword in all interviews was "profusely" – that's the adverb he chose to describe just how sorry he is to have inconvenienced so many travellers.
The CEO is also sending a message that the airline is just about back in the skies, with 95 per cent of flights expected to proceed as normal and two-thirds of stranded passengers now having been moved on to their destinations. Cruz has also committed to paying all required compensation.
But BA still appears to be missing a trick or two. In the airline's own YouTube interview, Cruz says the airline's site features prominent links to information on how to claim compensation. All The Register could find was a generic compensation claim page for lost luggage. The version of the front page that Vulture South can view, even after setting our location to the UK, offered no information on compensation.
Cruz has also promised that passengers will never again have such an experience with BA, in part because the carrier will review the incident and figure out how to avoid a repeat.
That review promises to be fascinating, but Cruz hasn't said when it will arrive. For now, he's repeatedly said, the carrier is focused more on sorting things out for passengers it's left in the lurch. ®