Talk about a Blue Monday: OVH outlines recovery plan as French data centres smoulder
Servers affected include those used by ESA, Villarreal football club, and some misused by malware miscreants
Updated Customers of European cloud hosting provider OVH have been told it plans to restart three data centres on its French campus in Strasbourg next week, following a massive fire on site this morning that destroyed one bit barn.
The SBG1 and SBG4 data centres are scheduled to reopen by Monday 15 March and the SBG3 DC by Friday next week. SBG2 was wiped out by the blaze but fortunately no one was hurt in the incident.
The fire caused serious disruption across European websites, with, according to Netcraft, "3.6 million websites across 464,000 distinct domains... taken offline."
Power infrastructure supplying the DCs on the site appears to have been damaged, with OVH founder and chairman Octave Klaba outlining a recovery plan that he estimated would take at least the next seven days to implement.
Klaba said the company had stock of "new servers, pcc, pci ready to be delivered" to "all impacted customers... of course for free" and that it would "add 10,000 servers in the next three to four weeks."
The blaze destroyed SBG2, a five-storey, 500m² data centre, and damaged servers located at SBG1, also at the site, while firefighters protected SBG3 and SBG4. The web hosting provider has 15 DCs across Europe, four of them at the Strasbourg location, with a fifth under construction there.
At the time of publication, OVH was still taking inventory across its Strasbourg DCs after firefighters quelled the flames – as we reported earlier today. The firm said the fire broke out in a room in SBG2 at 00:47 CET, and the building had been destroyed completely by 04:09 CET, while servers in SBG1 – also at the Rue du Bassin de l'Industrie at the Port du Rhin, an industrial area on the River Rhine – were also affected. All four of the DCs were taken offline.
"A part of SBG1 is destroyed," said the tweet from Klaba earlier today as he recommended that customers activate disaster plans, adding: "The whole site has been isolated, which impacts all services in SBG1-4."
OVHcloud's services status page – at the time of publication – showed major grief across the board, with the firm pleading with customers not to "request a reset."
Klaba, the boss of Europe's biggest web hosting provider outside of the Big Three (AWS, Microsoft Azure, Google Cloud) said recovery would include "rebuilding 20KV for SBG3"; "rebuilding 240V in SBG1/SBG4"; verifying routers/switches in one network room; and rebuilding a second network room in another. "The network room in SBG1 is OK," he confirmed in another tweet.
How the fire broke out is unclear, although local papers say it took 115 firefighters six hours to put out. The OVH status page does not show which racks are specifically affected in the other buildings. The data in SBG2 – the building which spent six hours in flames last night – is almost certainly entirely lost.
The fire comes three years after the group embarked on a "€4m-€5m investment plan" in the wake of a major outage that left three of the Strasbourg data centres – SBG1, SBG2 and SBG4 – without power for 3.5 hours in November 2017.
Klaba himself said at the time of the 2017 outage that it was partly because "SBG's power grid inherited all the design flaws that were the result of the small ambitions initially expected for that location."
At the time of the 2017 outage, "SBG2's power grid" was built atop "SBG1's power grid instead of making them independent of each other".
The Reg has asked OVH how far along it is with that upgrade, which was said to involve "de-installation of maritime containers" (shipping containers) and major electrical work.
Gartner senior analyst Tiny Haynes said of the conflagration: "This is a very unusual event. The last event of this nature I can recollect was back in July 2012 at the Shaw Communications Data Center in Calgary.
"For a fire to destroy an entire data centre would raise questions around the operational effectiveness of the fire detection and suppression systems in place. Without knowing anything official, I would guess this is result of a UPS malfunction.
"I have personally seen the results of such explosions in another data centre in my career which was thankfully not fully operational at the time. This is the risk of the modular or campus approach to building data centres.
"The total power and cooling capacity of a campus might not be envisaged upon first commission of [the] data centre, resulting in additional challenges to power and cooling as the campus grows. We have seen the same in Harbour Exchange in London Docklands, a building that was never designed to be a data centre, resulting in design compromises that haven't always worked out."
Speaking about how customers might protect themselves in the event of such incidents generally, Haynes said: "It is therefore essential that businesses ensure that not just data centre infrastructure is audited to TIA 942 Tier III standard, but also their processes for change management, incident response and risk management are fully documented and auditable."
Among those affected by today's blaze was the European Space Agency's Data and Information Access Services ONDA project – which allows users to host geospatial data and build apps in the cloud.
ONDA is led by Serco Italia, while OVH looks after the cloud infrastructure of the project, which makes 10PB of non-pre-structured data from the Copernicus Earth observation project available to developers via a public cloud.
The project said today that all of its services had been "temporarily disabled... following a major incident this morning on OVH Cloud infrastructure in Strasbourg."
Other clients included the French government – whose data.gouv.fr was down earlier today but has since been recovered – cryptocurrency exchange Deribit, and infosec threat intelligence company Bad Packets, which tracks DDoS botnets and network abuse among other services. Troy Mursch, chief research officer at Bad Packets, earlier tweeted that "some data may be lost", although he has since clarified to The Reg that he was not referring to that of his own org*.
However, it seems those on the wrong side of infosec were affected by the blaze too. Kaspersky's director of research and the global analysis team, Costin Raiu, claimed this morning that malware-flinging miscreants abusing other people's hardware were also hit: "Out of the 140 known C2 servers we are tracking at OVH that are used by APT and sophisticated crime groups, approximately 64% are still online. The affected 36% include several APTs: Charming Kitten, APT39, Bahamut and OceanLotus."
In a statement sent to The Reg, Raiu added:
"In the top of ISPs hosting Command and control infrastructure, OVH is in the 9th position, according to our tracking data. Overall, they are hosting less than 2% of all the C2s used by APTs and sophisticated crime groups, way behind other hosts such as, CHOOPA."
He said: "I believe this unfortunate incident will have a minimal impact on these groups operations; I'm also taking into account that most sophisticated malware has several C2s configured, especially to avoid take-downs and other risks. We're happy to see nobody was hurt in the fire and hope OVH and their customers manage to recover quickly from the disaster."
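For a rough sense of scale, Raiu's percentages can be turned into approximate server counts. The Python sketch below is back-of-the-envelope arithmetic only: it assumes the "approximately 64%" split applies to exactly 140 servers, and that the "less than 2%" share refers to that same set of 140, neither of which Kaspersky has confirmed. The function and constant names are ours, for illustration.

```python
# Figures quoted by Raiu; treating them as exact is an assumption.
TRACKED_AT_OVH = 140      # known C2 servers tracked at OVH
ONLINE_PERCENT = 64       # "approximately 64% are still online"
OVH_SHARE_PERCENT = 2     # OVH hosts "less than 2%" of all tracked C2s


def estimate_counts(tracked: int, online_percent: int) -> tuple[int, int]:
    """Return (online, affected) server counts, rounded to whole machines."""
    online = round(tracked * online_percent / 100)
    return online, tracked - online


def min_total_tracked(at_provider: int, share_percent: int) -> int:
    """Total that the provider's count would imply at exactly share_percent.

    If the real share is below share_percent, the real total is above this.
    """
    return at_provider * 100 // share_percent


online, affected = estimate_counts(TRACKED_AT_OVH, ONLINE_PERCENT)
floor_total = min_total_tracked(TRACKED_AT_OVH, OVH_SHARE_PERCENT)
print(f"online: ~{online}, affected: ~{affected}, "
      f"total tracked: more than {floor_total}")
```

On those assumptions, that works out to roughly 90 servers still online, about 50 knocked out, and a tracked total north of 7,000, which squares with Raiu's view that the fire is a minor inconvenience for these groups.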
Other big OVH clients affected by the fire included European People's University, Strasbourg Airport, the city of Cherbourg, the Peugeot-Sochaux works council, the Meteor brewery in Alsace, the Clermont-Ferrand rugby club, and even Spanish professional football club Villarreal.
Marketing platform Paper.li also took a tumble, telling users: "There has been a major issue at the data center of our service provider which has resulted in a service outage" – and that it sent its "support to our hosting provider... as they endure the aftermath of a fire in one of their buildings."
Users with dead websites included Apple accessory sellers CoverStyle, while others affected included (briefly) the free chess server bods at Lichess.org – who lost 24 hours of moves but clearly had a great DR plan – telecom company AFR-IX, and encryption utility VeraCrypt, whose git and site were both affected and which referred users to Sourceforge and GitHub "for downloads and source code access while we work on service restoration." Image board pr0gramm, meanwhile, has been telling website visitors "Data centre is on fire :(" but assured its users that the last backup took place two hours before the blaze.
Not everyone was so lucky, however:
Noooo!!!! F4ck!!!— Rassegna Stampa (@StampaNews) March 10, 2021
Me like the most part of clients does not have any disaster recovery plan... My server is in Rack 70C09 - how to see if it is safe?
European cloud contender
OVH has 27 data centres across Europe, North America, and Asia.
Along with AWS, Microsoft Azure, and Google Cloud, OVH is one of the largest web hosting providers, and widely seen as Europe's great hope for native hosting, a place to store corporate data that's not safe in American companies' hands.
Just four days ago, OVH publicly announced its intention to bid for French streaming startup Shadow, also known as Blade, with founder Klaba musing about developing a European alternative to Office 365 and G Suite. On Monday this week the cloud computing provider was talking about an IPO, with a spokeswoman telling Reuters it had "started the process for a potential listing in Paris."
We have asked OVH for comment.
Gartner's Haynes advised customers of cloud giants: "Some of [hyperscalers'] architectures can cover multiple, geographically diverse DCs in the larger regions. It is difficult to get the full transparency of DC infrastructure from these providers, so for highly sensitive applications, multiple availability zones should be used." ®
Updated at 18:43 UTC on 11 March 2021 to add:
* Mursch clarified to The Reg that "no production servers used by Bad Packets were impacted by the OVH data centre fire, nor was any data lost on our side. Additionally, Bad Packets CTI capabilities are not concentrated to one service provider." He added that "no Bad Packets data or CTI coverage was lost due to the OVH data centre fire."
Updated at 16:41 on 17 March 2021 to add:
The European Space Agency told The Register: "Copernicus operations handled by ESA make use of the OVH Strasbourg Data Center as a cloud service provider, among others.
"Following the fire incident on 10 March, a few operational services that rely on OVH were impacted directly and indirectly owing to the unavailability of support service and operational cascading effects.
"Essential operations were maintained, the fire incident had no impact on Sentinel satellite operations, on the acquisition, on the production and on the archiving services that are operated nominally and infrastructure used by Copernicus services was not damaged by the fire." It conceded however, that: "Some fresh data publication for Sentinel-1 and Sentinel-3 were temporarily interrupted. However, operations for both are currently working nominally and backup infrastructure is in place thanks to the flexibility of the Copernicus Ground Segment."
ESA added: "It is expected to restore full services within the next two weeks."