Updated Brit hosting provider Claranet found itself resetting the "29 days without incident" sign this morning as "connectivity issues" felled customer emails and websites all over again.
Problems were first acknowledged at 10:23 BST, when the company's Twitter presence winced: "We are experiencing connectivity issues in one of our data centre suites and our engineers are investigating."
Not even a month has passed since Claranet last had to apologise for an outage. On 21 August, "an issue with services hosted in our Hoddesdon data centre" prompted that special spittle-flecked rage customers reserve for social media.
Suffice to say they aren't happy today either.
"how does a major Data centre not have redundancy?! this is the 2nd time in 3 months we have suffered total system lost due to the incompetence at Claranet." - Laurence pitts (@LaurencePitts), September 19, 2019
Others commented that their business is now "at a standstill" and pointed out that perhaps two major blackouts in two months isn't a good look.
And like last time, the issue has knocked out not just Claranet's status page, which will be reassuring for clients, but the entirety of its SOHO subsidiary's website too, as shown in the screenshots below taken around 15:20 BST.
Moments later, Claranet's SOHO page spluttered back to life, claiming an email server issue to be "fixed" – which is great, but also means it has been down for at least five hours. Last time was slightly less painful at two hours of darkness. Claranet's overall status page eventually started redirecting to the "Contact us" site.
As the hours ticked past, the company attempted to keep users in the loop. At 11:30 BST, "cooling" had been "restored", which sounds ominously like a major failure at a data centre, and Claranet said it was "now working to bring services back online in the safest and most effective way".
Customers shrugged, saying they didn't know what that meant and would just like their services to come back online now, please, thanks very much.
At 12:23 BST, cooling "remained stable". Phew. And at 14:29 on the UK clock, engineers were said to be "completing further health checks on the storage platform with storage vendors. Once the storage platform has passed health checks, services will be brought back online."
Meanwhile, customers pleaded for some sort of time frame as to when that might be.
El Reg had a whack at Claranet's Service Desk – and a recorded message revealed that it's the very same data centre that fell over in August. So whatever the company did last time, um, it hasn't worked particularly well.
The Register contacted Claranet to find out just what in tarnation is going on and boy, oh boy.
"We experienced a sudden and prolonged loss of cooling at our Hoddesdon data centre and had to immediately remove the power to large sections of our equipment to protect overheating and damage.
"Cooling and storage arrays are now restored and the arrays are undergoing automated but important consistency checks.
"We anticipate these checks to be completed within a six-hour period from now, due to the volume of data that we hold."
The spokesperson added that more information would be available tomorrow.
So, Claranet customers, if you haven't already, go home. You won't be getting much work done today. ®
Updated 16:13 UTC 19/09/19
Claranet's rep updated us shortly after the piece was published. Here's how things are looking now:
- All arrays are now online and health checks completed
- We have an HP engineer to assist with final checks due onsite, ETA 18:30
- We will now commence power on of servers for admin clusters
- Then will start on customer clusters
In terms of time frames:
- We would hope admins cluster & VMs up within the hour and start on customers as soon as that is done
- So would hope customers are up within 2 hours, could be less
Updated 08:31 UTC 20/09/19
"Services have remained stable overnight and Claranet's support teams have been working with a small subset of customers to resolve their isolated issues. We would request that if you are still experiencing a service impact that you call the Service Desk [0330 390 0500] for assistance."