Wanna curb datacenter outages? Try combating burnout with shorter shifts
If hiring more people to work fewer hours isn't appealing, you could always make a robot do it
Datacenter outages remain a perennial problem, with human error among the top contributors. Analyst outfit the Uptime Institute suggests the key to curbing these disruptions could be as simple as shortening shifts.
Uptime reported that four in ten surveyed datacenter operators experienced a major outage in the past three years as a result of human error. What's more, half of the respondents to the institute's poll said these errors were caused by a failure to follow the correct procedures.
While Uptime acknowledges that regular training, practice, and experience can help to reduce the likelihood of outage-causing errors, one factor that's probably being overlooked is fatigue.
It's well understood that we're more likely to make mistakes when we're tired, which is bound to happen in a lengthy shift. While Uptime's datacenter staffing survey [paywalled] found that average shifts for datacenter workers ranged from eight to ten hours, this varied considerably by region.
For example, in the Asia-Pacific region (APAC), more than a fifth of respondents reported datacenter shifts lasting more than ten hours. By comparison, Uptime observed that in Europe shorter five- to seven-hour shifts were three times more common than in APAC. One of the reasons for this is that European labor laws often have protections against night shifts lasting longer than eight to ten hours.
In the Americas, Uptime found shorter shifts were less common, which it attributed to healthcare being tied to a 40-hour work week.
Despite evidence linking shift length to fatigue and human error – Uptime cited a pair of studies by the Chinese University of Hong Kong [PDF] and the Finnish Institute of Occupational Health, both of which found a positive correlation between the two – cutting back hours may be problematic for many datacenter operators.
As Uptime has previously found, datacenter staff shortages remain an ongoing issue – one that's only expected to get worse as a large cohort of workers ages into retirement over the next few years. Because of this, Uptime noted, many operators have little choice but to extend shifts beyond 12 hours to cover for absent employees, further contributing to employee burnout.
Ironically, Uptime found that many datacenter staff express a preference for 12-hour shifts – either because of overtime pay, or because they allow for longer weekends.
To address these challenges, Uptime offered a few recommendations to operators, the first of which is kind of obvious: Adjust staff levels to reduce or eliminate the need for shifts to last more than 12 hours.
The group also suggested that operators should monitor overtime and rest periods, to ensure that workers aren't showing up to work tired in the first place. Finally, Uptime warned that while employees may favor longer shifts, those preferences rarely take into account the potential impact on job performance and employee health.
Uptime did warn that these changes shouldn't be made too quickly. Shortening shifts suddenly could actually have a detrimental effect on morale and result in an increase in fatigue-related errors – at least until employees adjust to their new schedules.
The staff shortages facing datacenter operators have driven some to explore the use of robots to share the heavy lifting. Following an outage at a datacenter in Australia last year, Microsoft listed openings for a hardware automation team manager to oversee the use of robotic systems throughout its facilities.
Meanwhile, Oracle is already using Boston Dynamics' four-legged robodogs in its datacenters, and Meta and Jtek last year demoed a robotic server cart capable of hauling around entire server racks. ®