Updated HPE's crack repair squad has laboured for over four days to replace kit at Australia's Taxation Office, with no guarantee that the Office's online services would be back online come Monday.
HPE and the Australian Taxation Office's (ATO's) troubles started in mid-December 2016, when some ATO services went offline. The ATO named HPE 3PAR storage units as the source of the problem, which resulted in data loss, thankfully not of taxpayer data. But taxpayers were inconvenienced because a range of online services went down with the HPE kit.
HPE eventually came out of the woodwork to say the problem probably wouldn't manifest with other 3PAR rigs, and the services impacted by the outage limped back into service just in time for Australia to disappear to the beach for most of January.
Which is what The Register did, missing news that the outages meant not all ATO services operated at full-throttle between Christmas and last week. We also missed that a couple of weekends were dedicated to planned outages that gave the ATO time to conduct more repairs.
Things seem to have taken a turn for the worse, however, as on Thursday February 2nd the ATO's Systems Update page was forced to post anew, revealing that “The ATO is experiencing issues relating to the hardware faults that occurred in December. We are replacing the affected hardware, but this process will take some time.” Again, a swag of online services used by tax agents and businesses went down.
On the morning of the 3rd, the ATO told Australia “Specialist ATO and HPE technicians have worked through the night to restore our systems and online services. While there has been significant progress on their restoration plan, the process is highly complex.”
By the afternoon of the 3rd, Australia learned that “Due to an unforseen [sic] complexity in the system restoration process, our services are unlikely to be available before close of business today.”
Come Saturday the fourth, we learned that “HPE, with the support of our ATO technicians, are continuing work over the weekend to restore our systems.”
“At this stage we do not expect services to be available tomorrow, Sunday 5 February. However we will update you tomorrow if these circumstances change.”
We also learned that the ATO has “commissioned a new Storage Area Network (SAN) which has arrived on ATO premises. This new system will contain vast amounts of ATO data and it is currently being configured to provide more reliability and stability. The nature and scope of the ATO’s SAN means that this process of replacing the affected hardware will take some time.”
How much time? Enough time that HPE's prediction that a Sunday restoration would be optimistic proved correct. We know this because on Sunday afternoon we learned that “Good progress has been made as we work towards having services available for clients on Monday. This will however be subject to ongoing testing of the integrity and stability of the system.”
The ATO has hired PwC to review the incident, “because of their specific expertise with the ICT storage that is at the centre of the incident.”
That report is due in March and The Register will swoop on it once released.
For now the ATO has given itself three priorities.
- First, making our services available to our key impacted stakeholders – tax practitioners, the superannuation industry and software developers – and the community more broadly;
- Second, building system resilience to best ensure the stability of our services to the community; and
- Third, increasing the capacity of our systems to deliver the services the community expects.
The Register imagines the Office's fourth priority could be “Activate every penalty clause in our contract with HPE and instruct our lawyers to exercise extreme prejudice.” ®
Updated to add
At 6:30AM on Monday February 6th, the ATO announced "most of our systems are back up and running, with core services used by our clients, including the Tax Agent, Business and BAS Agent Portals, ATO Online services, and Standard Business Reporting (SBR) services now available."
Not all services have been restored and the Office warned that "Our clients may experience some slowness as further work is undertaken to improve the overall performance of our systems."
"Our focus will now turn to building system resilience to best ensure the stability of our services to the community," the Office says.