NASA has explained in detail how its New Horizons deep-space probe suffered a computer meltdown just as it was moving in towards its destination: the tiny frost-world Pluto on the fringes of the Solar System.
At 2pm Eastern Time on Saturday July 4 – Independence Day in the US – contact with the spacecraft was lost. The ground team immediately called out an alert and fortunately was able to reestablish contact with the backup computer within a few hours.
So what went wrong? According to New Horizons project manager Glen Fountain, the problem was simple: processor overload. The probe was preparing for the most intensive session of its nearly 10-year mission and the main computer just couldn't take it.
The probe is now around eight million miles from Pluto. Over the next two weeks it will operate at full power to ensure its instruments can harvest all information possible as it whizzes by at 32,500 miles per hour.
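As a rough sanity check on those two figures, the sketch below works out how long the remaining distance takes at that speed. It assumes a constant speed and a straight-line approach, and it uses the rounded numbers quoted above, so treat the result as a ballpark rather than a mission timeline.

```python
# Rounded figures quoted in the article; real values vary along the trajectory.
DISTANCE_MILES = 8_000_000
SPEED_MPH = 32_500


def travel_days(distance_miles, speed_mph):
    """Days needed to cover distance_miles at a constant speed_mph."""
    hours = distance_miles / speed_mph
    return hours / 24


days = travel_days(DISTANCE_MILES, SPEED_MPH)
print(f"{days:.1f} days")  # prints "10.3 days"
```

That is in line with a July 14 flyby for a probe that was about eight million miles out at the time of the July 4 glitch.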
To prepare for the final days of its mission, the probe was doing two things at once. First, it was taking the scientific data it had already harvested, compressing it, and writing it to a portion of its 128Gbit of storage (two 8GB solid-state recorders). At the same time, the instrument command sequence for the flyby was being uploaded.
The combined workload slightly exceeded the processor's capabilities and triggered a watchdog feature designed to prevent the spacecraft's software from crashing. The watchdog handed control from the main computer to the backup computer and put the main system into sleep mode as a safety measure.
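The general pattern can be sketched in a few lines. This is a toy model, not the New Horizons flight software: the `Computer` class, the `watchdog_step` function, the task names and the millisecond figures are all hypothetical, chosen only to show two jobs that each fit in a processing cycle on their own but overrun it together.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    name: str
    asleep: bool = False


def watchdog_step(active, standby, task_ms, deadline_ms):
    """Return whichever computer should be in control after one cycle.

    task_ms maps task names to the CPU time each needs this cycle.
    If the total overruns deadline_ms, the watchdog is assumed to fire.
    """
    busy_ms = sum(task_ms.values())
    if busy_ms <= deadline_ms:
        return active          # work finished in time; nothing to do
    active.asleep = True       # park the overloaded computer
    return standby             # hand control to the standby unit


main = Computer("main")
backup = Computer("backup")

# Hypothetical loads: either job fits in a 100 ms cycle alone, both do not.
solo = watchdog_step(main, backup, {"compress": 60}, deadline_ms=100)
both = watchdog_step(main, backup, {"compress": 60, "upload": 45}, deadline_ms=100)
print(solo.name, both.name, main.asleep)  # prints "main backup True"
```

The point of the design is that the overrun is handled by a simple, predictable rule rather than by letting the overloaded software fail in some unknown state.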
The processor is a Synova Mongoose-V: a 12MHz MIPS R3000 CPU hardened against radiation. The R3000 is a 32-bit chip that's pretty similar to the one used in the original 1994-era Sony PlayStation among many other devices.
"I got a call at 2pm on Saturday and was back at lab within 20 minutes," Fountain said. "Engineers were already on the job and suggested looking for the backup signal from the probe, which we picked up. It took an hour for the spacecraft to handle the transition [back to the main computer], it was right on the mark and the recovery plan was executed."
Science readings lost
There then came the job of working out exactly what was going on and testing it against the duplicate probe computer system housed in the testbed here on Earth. The team worked through the night on Saturday and all day Sunday until they were confident they knew exactly what happened.
This was helped enormously by the preparations NASA had made for just such an eventuality. The team had already gamed out 249 contingency plans for things that could go wrong with the probe and, while this scenario wasn't precisely prepared for, there was enough overlap with what had been tested that the fault was relatively easy to nail down and correct.
"We lost some of Saturday's science readings, then all of Sunday and Monday's science," said Alan Stern, New Horizons principal investigator. "The command decision I made, and the team agreed with, was that it is more important to focus on getting ready for flyby than to collect science from eight or nine million miles out."
Stern estimates that less than 1 per cent of the science returns from the probe were lost, and none of it was critical data compared to what will be collected as the probe gets closer to Pluto.
New Horizons is now programmed to make sure the next nine days go as planned. If the main computer fails again, the probe will reboot (a process that takes around seven minutes) and the computer will carry on taking and recording measurements on automatic pilot.
In a more serious scenario the team also has a backup plan. If the signal between Earth and the probe causes any concern, a "slam" code is pushed out, causing the probe to move directly to collecting instrument readings and sending the data back to Earth.
As the probe flies past Pluto and its moons on July 14, the data connection will be boosted as much as possible beyond its standard 1Kbps transfer rate to feed back the enormous amount of information harvested. It will take over a year to send it all back to Earth, but as Stern put it:
"You've got to be into delayed gratification if you are on this mission." ®