Outage: Faulty UPS at data centre housing London Internet Exchange causes grief for ISPs and telcos alike
Some providers have staggered back to their feet
Updated One of the UK's larger data centres has suffered a major service outage affecting customers across the hosting, cloud, and telecommunications sectors.
The incident began with a faulty UPS system, which triggered a fire alarm (there was no fire) and led to the power-down of Equinix's LD8 data centre, a low-latency hub that was formerly the Telecity Harbour Exchange. Located on London's Isle of Dogs, the facility caters to hundreds of clients and is positioned close to the city's financial institutions.
Equinix told The Register: "Today at 4.40am, Equinix IBX LD8, in the Docklands, London, UK, experienced a power outage. This has impacted customers who are based there. The outage may have also affected customers' network services. Equinix engineers have diagnosed the root cause of the issue as a faulty UPS (uninterrupted power supply) system and we are working with our customers to minimise the impact. We regret any inconvenience this has caused."
If you're the unlucky engineer who gets the call from the customer, the supplier is being more flexible about letting people head down to the DC (contact it first), though your usual Docklands cappuccino haunt will likely be closed, so bring a thermos (and a mask).
"Due to this incident we are allowing customers more flexible access to LD8 working within our COVID-19 restrictions including mandatory temperature checks and wearing face coverings. The safety of our employees and our customers is our highest priority."
The facility is one of the DCs that houses the London Internet Exchange (LINX), one of the world's largest with hundreds of members – including ISPs such as BT, Sky, and Virgin Media, as well as smaller outfits and content providers. These internet providers were said to be potentially affected by the breakdown, so if your connectivity has died, this may be why.
"My first alert was at 7:30am and it's still ongoing," said one reader of the LD8 outage.
UK ISP M12 Solutions, which also owns premium ultra-fast business internet brand Giganet, speculated earlier today on its status page: "Due to the scale of this outage, and the carriers & suppliers affected... there could be knock-on impacts around the carrier ecosystem/networks.
"For instance, we are seeing some carriers report their racks in LD8 are being powered back up, and when this happens there could be increased routing table changes on their network devices that could cause our circuits to be affected that traverse their network.
"The London Internet Exchange (LINX), are currently reporting that 150 of their members are affected by this outage, to provide the sense of scale of this outage."
The status page of Exponential-e, another affected internet supplier, said: "This incident is being treated as a Major Incident both at Equinix and at Exponential-e. We have prepared for all hands on deck for when power is restored to carry out an extensive range of tests to ensure service has restored."
Worried customers took to Twitter to voice their concerns.
"Completely unacceptable situation ongoing in @EquinixUK #LD8 (HEX 8/9) right now. Reports of a fire alarm, but this triggered the loss of both our A+B diverse power feeds to our main rack since 04:24. Lack of communication is abysmal. @Equinix need to sort the basics out." - Matthew Skipsey (@matthewskipsey), August 18, 2020
In accordance with safety procedures, all systems within the vicinity of the fault – from levels one to four – were powered down.
A reader told us they had received an update stating that IBX had been "evacuated" and that the "fire alarm was triggered by the failure of output static switch from Galaxy UPS system supporting levels 1, 2, 3, 4 in building 8/9 at LD8".
The update added: "This has resulted in a loss of power for multiple customers and IBX [Equinix International Business Exchange] Engineers are working to resolve the issue."
Meanwhile, users on DownDetector are reporting outages at BT. The incumbent UK telco is thought to host an exchange in the DC, but so far has not responded to requests for comment.
The Register spoke to an Exponential-e staffer, who confirmed the facility wasn't on fire and said their customer support lines were being flooded by desperate customers eager to get their kit back up and running.
There is currently no known time frame for when operations will return to normal. Have you been affected? Drop us an email.
This reminds us of the time in July 2016 when the Telecity data centre suffered a "brief outage" that knocked 10 per cent of BT internet subscribers offline in the UK as well as a number of other providers. ®
Updated at 17:07 UK time on 18/08/20 to add
BT has been in touch to say: "A small number of BT’s Enterprise and Global customers experienced an interruption to their Ethernet services between 04:28 and 08:03 this morning due to a data centre provider’s power failure at their site. Services were restored to the affected BT customers by 8:03 this morning, before the start of standard business hours."
Updated at 17:30 UK time on 18/08/20 to add
Giganet and M12, whose team The Reg has to commend for giving some wonderfully descriptive updates to users during the day, updated their status page a few minutes ago, noting:
"We continue to observe good network availability since our LD8 core router has been connected to a temporary power feed from our adjacent rack (that was unaffected by today’s issue)."