A previously unknown software flaw in a widely deployed General Electric energy management system contributed to the devastating scope of the August 14th northeastern U.S. blackout, industry officials revealed this week.
The bug in GE Energy's XA/21 system was discovered in an intensive code audit conducted by GE and a contractor in the weeks following the blackout, according to FirstEnergy Corp., the Ohio utility where investigators say the blackout began. "It had never evidenced itself until that day," said spokesman Ralph DiNicola. "This fault was so deeply embedded, it took them weeks of poring through millions of lines of code and data to find it."
The flaw was responsible for the alarm system failure at FirstEnergy's Akron, Ohio control center that was noted in a November report from the U.S.-Canadian task force investigating the blackout. The report blamed the then-unexplained computer failure for retarding FirstEnergy's ability to respond to events that led to the outage, when quick action might have limited the blackout's spread.
"Power system operators rely heavily on audible and on-screen alarms, plus alarm logs, to reveal any significant changes in their system's conditions," the report noted. FirstEnergy's operators "were working under a significant handicap without these tools. However, they were in further jeopardy because they did not know that they were operating without alarms, so that they did not realize that system conditions were changing."
The cascading blackout eventually cut off electricity to 50 million people in eight states and Canada.
The blackout occurred at a time when the Blaster computer worm was wreaking havoc across the Internet. The timing triggered some speculation that the worm may have played a role in the outage -- a theory that gained credence after SecurityFocus reported that two systems at a nuclear power plant operated by FirstEnergy had been impacted by the Slammer worm earlier in the year.
Instead, the XA/21 bug was triggered by a unique combination of events and alarm conditions on the equipment it was monitoring, DiNicola said. When a backup server kicked in, it also failed, unable to handle the accumulation of unprocessed events that had queued up since the main system's failure. Because the system failed silently, FirstEnergy's operators were unaware for over an hour that they were looking at outdated information on the status of their portion of the power grid, according to the November report.
The root cause of the outage was linked to a variety of factors, including FirstEnergy's failure to trim back trees encroaching on high-voltage power lines. FirstEnergy says its problems were some of many issues destabilizing power flow in the northeast that day, and that its role in the outage is overstated in the interim report.
On Tuesday, the North American Electric Reliability Council (NERC), the industry group responsible for preventing blackouts in the U.S. and Canada, approved a raft of directives to utility companies aimed at preventing a recurrence of the outage. One of them gives FirstEnergy a June 30th deadline to install any known patches for its XA/21 system.
FirstEnergy says it already patched the blackout bug last fall, when GE made a fix available, and is in the process of replacing the XA/21 with a competing system -- a changeover that was planned before the blackout.
NERC spokesperson Ellen Vancko said the organization would release a more comprehensive list of recommendations next month that would likely instruct all U.S. and Canadian electric companies using GE's XA/21 system to install the patch.
"That blackout report will go into much greater detail and will more broadly address the entire industry, whereas this particular report addressed the specific actors involved in the blackout, as well as some specific actions NERC had to take," Vancko said.
GE Energy declined repeated requests for comment on the bug.