On-Prem

Hyperscale data centres win between their ears, not on the racks

Operating at scale is easy. Changing culture to accept and cope with failure is harder

Thu 18 May 2017 // 07:37 UTC

Organisations that hope to improve their own data centre operations by adopting the techniques used by hyperscale operators like Google or Facebook need to consider the stuff between their ears, not just the stuff on their racks, because changing data centre culture is more powerful than changing equipment.

That was the gist of a session delivered by Gartner analysts Joe Skorupa and Evan Zeng at the firm's IT Infrastructure, Operations & Data Centre Summit in Sydney on Tuesday.

“Operating at scale is not the trick,” Skorupa said. “The issue is that hyperscalers understand how to deal with risk.”

The pair argued that the culture of corporate data centres and the incentives offered to their staff militate against innovation. “In the enterprise we measure and pay people on mean time between failure,” Skorupa said. “The whole operating principle is to avoid risk at all cost.” Data centre teams therefore run a mile from anything that might risk an outage and end up incapable of innovation as a result.

Hyperscalers, by contrast, accept that there will be failures and “are better at identifying and managing risk, and recovering from failure.”

Skorupa therefore said that organisations hoping to learn from hyperscalers “can't think the way you used to think and can't measure and reward the way you used to do.” So forget about just buying some Open Compute kit and then living the good life.

He and Zeng have therefore cooked up new metrics by which to measure on-premises data centre teams, namely:

Mean time to respond for a new service
Mean time to discover a failure
Mean time to repair
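
The session as reported did not spell out how these metrics would be calculated, so the following is only a sketch under stated assumptions: the incident and service-request records are invented, "respond" is taken to mean the time from request to delivery of a new service, "discover" the time from failure to detection, and "repair" the time from detection to restoration.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident log: (failure_started, failure_discovered, service_restored)
incidents = [
    (datetime(2017, 5, 1, 9, 0), datetime(2017, 5, 1, 9, 10), datetime(2017, 5, 1, 10, 0)),
    (datetime(2017, 5, 8, 14, 0), datetime(2017, 5, 8, 14, 30), datetime(2017, 5, 8, 15, 0)),
]

# Hypothetical new-service requests: (requested, delivered)
service_requests = [
    (datetime(2017, 5, 2, 9, 0), datetime(2017, 5, 4, 9, 0)),
    (datetime(2017, 5, 9, 9, 0), datetime(2017, 5, 10, 9, 0)),
]


def mean_delta(pairs):
    """Average the elapsed time across (start, end) pairs."""
    seconds = mean((end - start).total_seconds() for start, end in pairs)
    return timedelta(seconds=seconds)


# Assumption: discovery is measured from the start of the failure.
mean_time_to_discover = mean_delta((start, found) for start, found, _ in incidents)
# Assumption: repair is measured from discovery, not from the start of the failure.
mean_time_to_repair = mean_delta((found, fixed) for _, found, fixed in incidents)
# Assumption: "respond" means request-to-delivery time for a new service.
mean_time_to_respond = mean_delta(service_requests)

print("Mean time to respond: ", mean_time_to_respond)
print("Mean time to discover:", mean_time_to_discover)
print("Mean time to repair:  ", mean_time_to_repair)
```

Unlike mean time between failure, all three numbers improve when a team gets faster at delivering and recovering, rather than when it simply avoids change.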

The pair also advocate learning how to understand the impact of failure, or the “blast radius of an outage.” Doing so means data centre teams can think differently about the kind of changes they are willing to entertain.

“They won't do 100 changes at once because the blast radius is big,” Zeng said. “But they will learn to design the data centre for resilience and for frequent changes that have small blast radii.”

Which is not to say that kit doesn't matter. The pair advocated standardisation wherever possible. Skorupa also singled out Dell's disaggregated switches, which can run any of five network operating systems, as the kind of hyperscale-driven innovation that on-premises operators would do well to consider. Chef, Puppet and Ansible were name-checked for their utility in facilitating automation, which the pair pronounced essential “if there is any chance of doing something twice.”

But the pair also declared that “the biggest opportunity is changing how people think and react.” ®
