NATS ignored previous recommendations – IT cock-up report

Since privatisation, investment ‘somewhat less than had been planned overall’

The National Air Traffic Services failed to implement recommendations to mitigate IT risks, according to an independent report into the mega systems failure in December which left thousands of passengers stranded in Blighty.

In December 2014, 120 flights were cancelled and 500 delayed for 45 minutes, affecting 10,000 passengers in total. An interim report in February pointed to a failure in both System Flight Server (SFS) channels as the cause.

According to the NATS System Failure 12 December 2014: Final Report (PDF), previous recommendations from a major outage only a year earlier had not been addressed by the body.

These included a review of the industry’s ability to respond to service failures and identify required changes to NATS’ crisis management capabilities, resilience of systems, procedures and service continuity plans.

The earlier recommendations had also called for better interaction with the aviation safety body, the Eurocontrol Network Manager, during a crisis.

"Despite being assessed by NATS as complete before 12 December, it is evident that neither of these recommendations had been addressed fully," said the report.

In an apparently unrelated move, last week NATS chief executive Richard Deakin announced he was stepping down. Deakin had been at the helm for five years.

Former business secretary Vince Cable had accused NATS of skimping on IT investment and leaving itself vulnerable due to its “ancient” technology. However, Deakin denied the body had under-invested in its tech.

The report nonetheless acknowledged that in the 12 years since the body was privatised, the company "has invested somewhat less than had been planned overall".

The report made a number of recommendations, including a suggestion that NATS should consider introducing a formal Error Management System (EMS) to capture anomalous occurrences that fall below the safety event threshold.

Responding to the report, NATS said: "We agree with the panel that it is unrealistic to expect that complex systems such as ours will never fail."

"To mitigate this we will continue to invest in making sure that failures are extremely rare and the impact of such failures on the travelling public are minimised as far as reasonably practical."

"And we are pleased that the panel recognised the continued programme of investment to accelerate the deployment of our next generation of systems." ®

Biting the hand that feeds IT © 1998–2022