Google Spanner in the NewSQL works?
Don't decommission that MySQL server – yet
The commercial release by Google of its Spanner database as a public beta last month came as both a pleasant surprise and a wake-up call: perhaps the end of NoSQL databases is in sight.
Spanner represents the pinnacle of the so-called NewSQL movement, which is a rejection of NoSQL values and a return to the old values of SQL, ACID transactions and good old relational tables. Goodbye to the new kids on the block with their crazy interface languages, bizarre data structures and distributed data tables.
Spanner promises a brave new alternative (whilst keeping to old standards) that allows a distributed database to seemingly break CAP theorem proposed by UC Berkeley computer scientist Eric Brewer in the late 1990’s. The theorem itself is simple to understand, but often misunderstood – it simply states that any distributed system (such as a database) can only guarantee two of the following: Consistency, Availability and Partition Tolerance.
Basically, if you have two or more computers in a system, if there is a break in communications then your system can be inconsistent (each computer giving different answers) or not available to answer any queries. NoSQL systems generally fall into one of two camps: either don’t answer if there is a partition break (MongoDB for instance) or let nodes on either side of the partition give a different answer (Cassandra for instance).
If you always want both the same answer across the system and always to be available, you can’t let a partition happen. Traditional relational database system do this by only having one master, the keeper of the truth and slaves which keep copies and may actually be wrong!
Spanner is seen as the fix for this problem – a system that is available, consistent and 100 per cent available.
Except it isn’t. Eric Brewer (of CAP theorem) is now employed by Google and while not directly involved in the Spanner project itself, a whitepaper from Brewer makes it clear that while Spanner does not break the CAP theorem, it is also not 100 per cent available. Problem? Not really, Spanner is just so available it might as well be 100 per cent available.
The reason for this is Google owns the entire infrastructure running Spanner and there is no other data on the Spanner network other than Google’s data. Spanner has availability of 99.9999 per cent, which means as a customer you can treat it as a system that will always be consistent and available; you can treat it just like your reliable relational database. But there will be the occasional partition (which will involve Google engineers running around with their hair on fire) and in that case – because of the way Spanner works – onside of the partition will be fine and carry on as usual, whilst the other side will be unavailable.
Even then, thanks to Snapshot reads, it’s possible that both sides will be able to read data, if you have access to the network of course.
So far, so good, but there are some potential issues.
One is caused by the way Spanner implements distributed transactions by use of a system called Paxos. Paxos implements transactions through the use of “group leader” and periodic elections in the system for this leader. This can cause a problem if the leader fails – you might need to wait out for a new election to happen before transactions can continue, or the leader might be restarted and you will need to wait for that.
Another is the fact Spanner is not a true relational database, it’s a key-value store in semi-relational tables. Each row must have a name and each table must have an ordered set of primary keys based on these names. This has an effect on the SQL-like language that is used to interact with Spanner: it’s very similar to SQL but different enough to cause problems for experienced SQL users.
In particular when creating tables the user must define how tables are “interleaved” to describe the locality relationships between multiple tables. If you get this wrong then there is a price to be paid in terms of performance: your system just won’t work as fast you need, especially if you have a globally distributed system. Google admits this in its original paper, saying there is room for improvement in the way Spanner handles complex SQL queries and the problem lies in the way each node handles data. Perhaps this has improved since the original paper, though.
Spanner, however, does have some useful tricks up its sleeve thanks to the use of Google’s TrueTime. This is an implementation of synchronized clocks using GPS receivers and atomic clocks in every data center. This can cause problems during a partition if a node can’t connect to a master – its clock will drift, causing the election of Paxos masters to slow down.
But TrueTime does allow schema changes to be scheduled for a later date and for both schemas to run at the same time, with a change to the new schema at a later date. This could certainly be helpful for organisations heavily invested in DevOps – schema changes of database (and roll backs in particular) are always a major problem here and in particular the roll back of bad schema changes. Running both at the same time would be a real gain.
Make no mistake, Google Spanner represents a real breakthrough in distributed database systems. It’s not a direct replacement for relational SQL databases, though as it does not appear you will be able to simply port a SQL application onto Spanner: there are changes to be made to the way data tables are defined and to the syntax of the SQL used to file and retrieve data.
The real question, though, is how many organisations actually need access to a globally scalable relational database? During the past couple of years NoSQL databases have muscled in on the data store action and shown that they can perform as reliably as their relational counterparts.
As ever, the decision will come down to cost: at $0.90 per node per hour and $0.30 per GB per month, this might seem very reasonable. But remember, if you need a global transactional database then you will need a large number of nodes and you will probably have a large amount of data, so that cost could start to rocket.
Don’t throw away that MySQL server just yet. ®