How Google Spanner's easing our distributed SQL database woes
No scaling limit? Do go on
Storage Architect I've been messing about with databases for a long time. I say "messing about" because I've never been a DBA, but as a systems programmer and storage administrator, I've been on the periphery of the application layer and of course I've deployed many personal databases.
I was intrigued to read about the new global distributed SQL database Google is making available in their Cloud Platform, aka Spanner. This is an ACID-compliant, geo-distributed SQL database that allegedly has no scaling limit, with practical scalability in the thousands of nodes, or more.
IBM invented everything
The challenge of building a distributed consistent SQL database isn't trivial. Looking back at my mainframe days, IBM needed Parallel Sysplex (and GDPS – Geographically Dispersed Parallel Sysplex) to achieve the feat. The core of Parallel Sysplex was the Sysplex Timer, a hardware timer that each mainframe in a Sysplex used as a single "point of truth" for time.
The point here is not whether the Sysplex Timer was accurate in absolute time terms (in fact no clock ever is), but that it was a single point of reference that each system (and in the case of databases, DB2) could use to be 100 per cent sure of the timing of I/O updates and commits. Without this, any system in a cluster can't be 100 per cent certain of the time order of updates. Relying on local clocks isn't enough. The Sysplex Timer was also used in synchronising other functions like generic resource locking (ENQs).
Google has had to put some significant engineering into Spanner, including a huge amount of resiliency improvements to their own network. Spanner uses atomic clocks and GPS to deliver something called TrueTime, Google's single "point of truth" on time, which acts as the equivalent of the Sysplex Timer. There are more details in a paper published by Google in 2012, see section 3 [PDF].
The other issue here is adhering to CAP Theorem, which says we can't have consistency (C), availability (A) or partition tolerance (P) in a distributed system, but rather can only guarantee 2 of 3. Imagine a distributed environment with an unreliable network. If the distributed nodes become partitioned (or isolated from each other), there's no way to determine who is the "owner" of any updates; a classic "split brain" scenario. Google has had to ensure that their own network is therefore as resilient as possible.
An interesting extension of the CAP Theorem is the snappily named PACELC Theorem. This describes how trade-offs (like eventual consistency) can allow some of the CAP issues to be mitigated (although they aren't specifically solved).
Google is now offering Spanner in the Google Compute Platform alongside traditional and non-relational database offerings. No doubt the platform will prove useful for customers who need the availability of a single distributed monolithic database in order to deliver to their customers.
The Architect's view
It's interesting that there are still some challenges in computing that require massive amounts of investment. Google has reinvented the Parallel Sysplex solution for systems at scale, showing that the original solution to problems such as CAP aren't trivial to solve. Incidentally, it's worth noting that although Eric Brewer gets accreditation with CAP around 1998, this problem was well understood by IBM ten years before.
For customers, Spanner offers the ability to remove some significant application consistency headaches. The only problem, perhaps, is that it also provides the ultimate lock-in, as Spanner is never likely to be offered outside of Google's own data centres. The solution for customers not willing to go all-in with Google could be CockroachDB, an open-source platform developed by Cockroach Lab. I've yet to evaluate this technology, however it's now on my list. ®