Google’s massively global infrastructure now employs a proprietary system that automatically moves and replicates workloads between its mega data centers as traffic patterns shift and hardware problems arise.
The distributed technology was first hinted at — in classically coy Google fashion — during a conference this summer, and Google fellow Jeff Dean has now confirmed its existence in a presentation (PDF) delivered at a symposium earlier this month.
The platform is known as Spanner. Dean’s presentation calls it a “storage and computation system that spans all our data centers [and that] automatically moves and adds replicas of data and computation based on constraints and usage patterns.” This includes constraints related to bandwidth, packet loss, power, resources, and “failure modes”.
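Google hasn't published how Spanner weighs those constraints, but the idea of ranking candidate sites by usage and health can be sketched in a few lines. Everything below — the data center attributes, the scoring rule, the capacity cutoff — is an illustrative assumption, not Google's actual design.

```python
# Hypothetical sketch of constraint-driven replica placement.
# The fields and scoring here are illustrative assumptions only.

from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    free_capacity: float   # fraction of storage still available (0.0-1.0)
    bandwidth_mbps: float  # available bandwidth toward the client region
    packet_loss: float     # observed loss rate (0.0-1.0)
    healthy: bool          # False while the site is in a "failure mode"


def place_replicas(candidates, copies=3):
    """Pick sites for replicas: drop unhealthy or nearly full data
    centers, then rank the rest by a simple capacity/network score."""
    usable = [dc for dc in candidates
              if dc.healthy and dc.free_capacity > 0.05]
    # Favour sites with spare capacity, high bandwidth, and low loss.
    def score(dc):
        return dc.free_capacity * dc.bandwidth_mbps * (1.0 - dc.packet_loss)
    return sorted(usable, key=score, reverse=True)[:copies]
```

In this toy model, a site in a failure mode or one nearly out of capacity never receives a replica, and the remaining candidates compete on bandwidth and packet loss — a crude stand-in for the "constraints and usage patterns" Dean describes.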
Dean speaks of an “automated allocation of resources across [Google’s] entire fleet of machines” — and that’s quite a fleet. Google now has at least 36 data centers across the globe — though a handful may still be under construction. And as Data Center Knowledge recently noticed, the goal is to span a far larger fleet.
According to Dean’s presentation, Google is intent on scaling Spanner to between one million and 10 million servers, encompassing 10 trillion (10¹³) directories and a quintillion (10¹⁸) bytes of storage. And all this would be spread across “100s to 1000s” of locations around the world.
Imagine that. A single corporation housing an exabyte of the world’s data across thousands of custom-built data centers.
Google’s 10-million-server vision
Dean declined to discuss the presentation with The Reg. And Google’s PR arm has yet to respond to specific questions about the Spanner setup. But Google senior manager of engineering and architecture Vijay Gill alluded to the technology during an appearance at the cloud-happy Structure 09 mini-conference in San Francisco earlier this year.