Airline 'in talks' with Kyndryl after failed network card grounds flights
Delays and cancellations thought to have cost Aer Lingus millions
Aer Lingus says it is in talks with its IT services supplier, former IBM arm Kyndryl, after the disastrous combo of a sliced fiber optic cable and a faulty network card on the backup line caused an IT systems outage that forced the airline to cancel more than 50 flights.
The outage on September 10 disrupted plans for tens of thousands of travelers between Dublin airport and other European destinations.
Speaking about the systems outage to Ireland's Joint Oireachtas Committee on Transport, Aer Lingus CEO Lynne Embleton said damage to a fiber optic cable combined with a hardware failure on its IT contractor's backup knocked out the carrier's passenger processing system, stranding or delaying 32,000 people.
"For almost 10 hours we had no access to our core operational and customer systems. Now, what this meant was we couldn't check in. We couldn't board customers. We couldn't get access to flight information. We couldn't get access to customer bookings data. We couldn't get access to customer contact information. So this meant we had very limited ability to communicate with our customers."
After hours in the dark, the airline decided to axe 51 flights, leaving 21,000 customers on delayed flights and 11,000 on cancelled ones. It has since received 7,500 applications for compensation, each of which may cover one or more passengers. While the airline didn't give a figure for the compensation, it is understood to be in the millions of euros.
Embleton told the committee: "We have a contract with a leading cloud services provider to host the network and the infrastructure behind our core operational and customer system. Their ISP in the UK had a major failure on that day, early Saturday morning, where there was unrelated construction work which damaged a fiber optic cable – and that cable would provide connection to our systems. The cable wasn't repaired until 5.30 in the afternoon. It was a damage that occurred just after eight o'clock in the morning."
The Aer Lingus exec pointed out that their IT services provider had mirrored the data to two separate sites, one datacenter in Manchester and a second in Birmingham, and that the lines had been replicated into both. "So it should have been more resilient than it proved to be on the day."
It is not known where the fiber optic cable was located, but the CEO told the committee the line was cut during rail tunnel work, before a separate, unrelated hardware error occurred on the backup line.
Apologia for the world's construction workers
The construction foreperson and their crew should not be assumed to be careless meatheads who are unbothered about danger to themselves and their workmates, a potential huge increase in build/work time, and massive delays to a construction project if they hit something they shouldn't.
Without having an opinion on what happened with Kyndryl and the rail construction (it could be that everything was clearly marked in the plans, but the steering got stuck on the forklift, or there was bad weather, or a number of other reasons), construction teams usually check the drawings for the underground utilities in the area. They then recheck the plans to see if any modifications have been filed with the city or administration. You don't normally just shove your way in with an excavator or a digger, or just scrape away with a backhoe or spade with careless abandon and then slap your forehead when you find you've started tearing into another supplier's infrastructure.
No, what happens is you gingerly dig a little trench in an area that is marked as free of cabling.
Ideally, all contractors, both public and private, would keep their paperwork up to date. This vulture has heard horror stories (again, unrelated to this story) about private companies just dropping in a line without telling anyone, going bankrupt years later, and then after the line had been let or sublet by several shell companies, no one even knows where to dig when services go dark. Food for thought.
The fault in the primary line was a severed cable, and the second failure, in the backup line leading to the other DC, was due to a failed network card. The "precise reason" for the network card failure isn't clear, the Aer Lingus execs told the committee, although they said the provider had confirmed the issue had never before occurred on any of the 4,000 network cards they had in service.
"The fact that the site (of the damage) was a railway line meant it took them some time to get on site after the construction machinery broke through the casing," she said.
When asked how often the backup line was tested, Embleton responded: "What is common practice is to allow traffic over both, really to avoid a situation where one has been lying dormant and then, at the point it's needed, it fails, so we flip between the two to enable us to ensure both are working, which is why... to have a main failure and then a backup failure really shouldn't have happened."
She added: "Had the network card been working properly, that shouldn't have happened.
"And indeed the contract we have with our supplier was for this not to happen, for there to be a robust backup solution."
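For readers who want the "flip between the two" idea in concrete terms, here is a minimal Python sketch of a router that rotates traffic across two links so neither sits dormant, and fails over when one goes down. It is an illustration only: the `Link` and `AlternatingRouter` classes and the `site-a`/`site-b` names are hypothetical, and nothing here is drawn from how Aer Lingus or its provider actually built their network.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical network path to one datacenter."""
    name: str
    healthy: bool = True
    sent: int = 0  # number of requests this link has carried


@dataclass
class AlternatingRouter:
    """Hypothetical router that rotates traffic so no link lies dormant."""
    links: list
    _next: int = 0  # index of the link whose turn it is

    def pick(self) -> Link:
        # Try each link once, starting with the one whose turn it is.
        for offset in range(len(self.links)):
            link = self.links[(self._next + offset) % len(self.links)]
            if link.healthy:
                # Advance the rotation past the link we just used.
                self._next = (self._next + offset + 1) % len(self.links)
                link.sent += 1
                return link
        # Both paths down: the double-failure case.
        raise RuntimeError("no healthy link available")


primary = Link("site-a")
backup = Link("site-b")
router = AlternatingRouter([primary, backup])

# With both links healthy, traffic alternates between them.
order = [router.pick().name for _ in range(4)]

# Simulate a cable cut on one link: all traffic moves to the other.
primary.healthy = False
after_cut = [router.pick().name for _ in range(3)]
```

The point of the rotation is that a fault on the second path shows up during normal running rather than at the moment of failover. It does nothing for the case where both paths fail on the same day, which is where `pick` raises.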
When asked if the provider was in breach of its contract with Aer Lingus, the airline's corporate affairs officer Donal Moriarty told the committee the wholly owned subsidiary of International Airlines Group was in talks with its IT provider, which he identified as Kyndryl, "regarding the consequences and the impact of the outage on both us as an airline and on our customers," adding: "You'll appreciate there are constraints on what I can say."
Moriarty added that there "were other customers who were impacted – I don't have their names here – but their systems weren't impacted to the same degree because their usage and the reliance on the systems over the course of a Saturday" wasn't as extensive as that of Aer Lingus, "as aviation is 24 hours, seven days a week, so we are operating all through the weekend, which isn't the case for many other businesses."
We have asked Kyndryl for comment.
Michael O'Leary's Ryanair offered €100 "rescue flights" on the day as some passengers opted to buy flights on other airlines when it became clear they would not be able to board.
Moriarty said Aer Lingus had implemented a new system to monitor the network and replaced the failed backup card, and planned to add a "tertiary system" for further resilience. ®