Datacenter outages are costing more, $1m+ failures now common

And operators not getting any better at saving power – or watching the water their bit barns drink

Datacenter operators worldwide are largely unprepared for sustainability requirements, despite the industry anticipating new regulations in many regions. Meanwhile, outages are becoming increasingly costly, and progress on energy efficiency is stuck.

These findings come from Uptime Institute's 2022 Global Datacenter Survey, which condenses experiences of owners and operators of datacenters around the world, including some responsible for managing infrastructure at the world's largest IT organizations.

According to the report, creating a more environmentally sustainable footprint is a major challenge that many operators are now grappling with. However, it found that most operators do not currently track and report key environmental data metrics, despite an expectation among the majority (63 percent) that authorities in their region will require datacenters to publicly report environmental data within the next five years.

Sustainability is a rising concern that has rivaled resiliency among operators since 2020, according to Uptime Institute. But while 85 percent of respondents say they report their overall datacenter power use and 73 percent report efficiency metrics such as power usage effectiveness (PUE), only 37 percent say they collect data relating to carbon emissions, and only 39 percent of respondents currently report their water use.

What about water?

The number of respondents even reporting water use has actually fallen since last year, but Uptime Institute attributes this to the latest survey taking in a larger and more diverse sample than previous years, rather than water efficiency becoming less important.

Most operators that do not track water use choose not to because there is no business justification, suggesting it is a low priority for management in terms of cost, risk or environmental considerations. This is despite a growing number of authorities moving towards rejecting datacenter developments unless they are designed for minimal direct water consumption, which is likely to become an important consideration for datacenter design in future.

Uptime Institute thus recommends that all operators should put plans in place to report all carbon emissions associated with their datacenters, regardless of whether there is an immediate legal requirement, and that water consumption data is also collected.

Another finding is that operators are reporting fewer disruptive outages than before. However, the picture is mixed, as many failures are now partial or distributed.

Uptime Institute's data shows the number of outages is increasing globally year on year, but not as fast as the global datacenter footprint is expanding. In other words, the failure rate per unit of capacity is actually falling. The report states that for 2022, 60 percent of operators surveyed had an outage in the past three years, down from 69 percent in 2021 and 78 percent in 2020.

There are also signs that the impact of some outages is decreasing, with 66 percent of operators in the survey stating that the greatest impact they had experienced from an outage in the past three years was either minimal or negligible.

But while there are fewer serious outages, those that do occur are becoming more costly. A quarter of respondents indicated that their most recent outage had cost more than $1 million in combined direct and indirect costs. This is a significant increase from 2021 and continues a clear trend, according to Uptime Institute, with a further 45 percent reporting that their most recent outage cost between $100,000 and $1 million.

One reason for the increasing cost of outages is the increasing dependence on digital services and the datacenter for the day-to-day business activities of organizations. Other factors include inflation, fines, service level agreement breaches, labor costs, call-outs, and the cost of replacement equipment.

Power problems

As in previous Datacenter Survey reports, on-site power problems are identified as the biggest single cause of significant site outages. Three other common causes are cooling failures, IT system/software errors, and network issues.

Uptime Institute claims other research it has published indicates that the main cause of power-related outages is uninterruptible power supply failure, with failures of the switchover mechanism from the grid to a backup source and generator failures both less common.

Energy grid failures have not been identified as a primary cause of outages, but the report says a slight increase in power-related failures in recent years may correlate with degrading grid reliability in some regions.

The report also found that improvements in PUE have effectively stalled since 2014, following significant gains after Uptime Institute started tracking the metric in 2007. The average annual PUE reported by respondents for 2022 was 1.55, meaning that a typical datacenter expends 55 percent as much energy on cooling, power distribution, and other secondary functions as is consumed by the IT equipment.
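For readers who want to check the arithmetic behind that 55 percent figure, here is a minimal sketch. It assumes the standard definition of PUE as total facility energy divided by IT equipment energy; the function names are our own illustration and do not come from the Uptime Institute report.

```python
def overhead_ratio(pue: float) -> float:
    """Non-IT energy (cooling, power distribution, etc.) per unit of IT energy.

    Assumes PUE = total facility energy / IT equipment energy.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return pue - 1.0


def overhead_share_of_total(pue: float) -> float:
    """Fraction of the facility's total energy that never reaches IT equipment."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return (pue - 1.0) / pue


survey_average_pue = 1.55  # 2022 average reported in the survey

print(f"Overhead relative to IT load: {overhead_ratio(survey_average_pue):.0%}")
print(f"Overhead as share of total:   {overhead_share_of_total(survey_average_pue):.1%}")
```

At a PUE of 1.55, overhead equals 55 percent of the IT load, which works out to roughly 35.5 percent of everything the facility draws. The two figures are easy to confuse, so it is worth being explicit about which one is being quoted.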

However, the rapid progress between 2007 and 2014 reflects the adoption of inexpensive efficiency measures, such as hot/cold air containment, optimized cooling control, and increased air supply temperatures, the report states. Efficiency improvements beyond these may not have been economically or technically feasible for many older datacenters.

The research also reveals air cooling still dominates in datacenters, even in new facilities. However, factors are now at play that should lead to broader adoption of techniques such as direct liquid cooling. These include upcoming server processors with higher thermal power requirements, which may see the industry average PUE rise before there are any further improvements.

To back this up, the report reckons that rack power density is rising across datacenter segments, with 40 percent of organizations that operate facilities with capacities above 5MW saying their densities are increasing rapidly.

Uptime Institute says that for the largest facilities operating at 10MW and above, nearly half have cabinets above 20kW, and almost one in five run some racks at over 40kW. A small but growing number of datacenters now house some cabinets above 70kW, concentrated mostly in the largest facilities.

Server lifespans are also lengthening, often exceeding the vendor recommendation of three to five years. According to the report, only 34 percent of respondents kept servers in operation for five years or longer back in 2015, but for 2022 this has grown to 52 percent.

Semiconductor supply shortages are one reason cited for this, resulting in increased delivery times for some IT hardware. However, most of the big cloud operators have previously reported that they have extended the lifespans of at least some of their servers in order to make cost savings. ®

Biting the hand that feeds IT © 1998–2022