Refrigeration failure at Hong Kong datacenter takes out Alibaba Cloud and others
Unlike Oracle and Google, this time there's no heatwave to blame
Alibaba Cloud lost its cool over the weekend after a refrigeration failure rendered several services unavailable at one of the cloud provider's Hong Kong availability zones.
The outage affected several large customers, including crypto exchange OKX, which showed empty customer balances, and several websites and apps run by Macau's monetary authority, the South China Morning Post reported early Monday.
Alibaba Cloud initially described the source of the trouble as an "equipment anomaly" at its Hong Kong IDC Zone C that resulted in the failure of its Elastic Compute Service (ECS), cloud database, storage, and network products. An investigation later traced the outage back to a failed refrigeration unit at a datacenter owned by PCCW, a Hong Kong-based information and communications service provider (and Lenovo collaborator).
"Alibaba Cloud engineers are working closely with engineers from PCCW to expedite the repair process, and some refrigeration equipment has already been restored," Alibaba said on its status page early Sunday morning.
Roughly three hours after Alibaba began tracking the outage, the company announced that repairs to the refrigeration equipment at the PCCW datacenter had been completed, and engineers were working to bring services back online.
"For customers who have been affected by this anomaly, we will make compensation accordingly based on the product/service agreements with customers," Alibaba said on its status page. "We sincerely apologize for the inconvenience caused."
As of Monday, Alibaba's status page reported that all systems were back online.
PCCW appears not to have posted information about the outage. The Register has approached the company for comment. We also note that Hong Kong's weather is currently mild: overnight temperatures of 12°C and daytime maximums of around 17°C should not stress a datacenter.
Alibaba Cloud is far from the only cloud provider to lose its cool this year. Amid a record heatwave in July, Google and Oracle both suffered cooling-related failures at two datacenters in the UK.
The failures allowed temperatures within the facilities to climb so high that operators were forced to shut down customer systems and workloads to prevent damage to the hardware and limit data loss.
All three outages underscore the fact that while application resiliency is a major selling point of the cloud, the onus is often on the customer to implement additional redundancy. However, as Uptime Institute recently found, the vast majority of customers incorrectly believe that application resilience is the cloud provider's responsibility. ®