Google Spanner in the NewSQL works?

Don't decommission that MySQL server – yet


The commercial release by Google of its Spanner database as a public beta last month came as both a pleasant surprise and a wake-up call: perhaps the end of NoSQL databases is in sight.

Spanner represents the pinnacle of the so-called NewSQL movement, which is a rejection of NoSQL values and a return to the old values of SQL, ACID transactions and good old relational tables. Goodbye to the new kids on the block with their crazy interface languages, bizarre data structures and distributed data tables.

Spanner promises a brave new alternative (whilst keeping to old standards) that allows a distributed database to seemingly break CAP theorem proposed by UC Berkeley computer scientist Eric Brewer in the late 1990’s. The theorem itself is simple to understand, but often misunderstood – it simply states that any distributed system (such as a database) can only guarantee two of the following: Consistency, Availability and Partition Tolerance.

Basically, if you have two or more computers in a system, if there is a break in communications then your system can be inconsistent (each computer giving different answers) or not available to answer any queries. NoSQL systems generally fall into one of two camps: either don’t answer if there is a partition break (MongoDB for instance) or let nodes on either side of the partition give a different answer (Cassandra for instance).

If you always want both the same answer across the system and always to be available, you can’t let a partition happen. Traditional relational database system do this by only having one master, the keeper of the truth and slaves which keep copies and may actually be wrong!

Spanner is seen as the fix for this problem – a system that is available, consistent and 100 per cent available.

Except it isn’t. Eric Brewer (of CAP theorem) is now employed by Google and while not directly involved in the Spanner project itself, a whitepaper from Brewer makes it clear that while Spanner does not break the CAP theorem, it is also not 100 per cent available. Problem? Not really, Spanner is just so available it might as well be 100 per cent available.

The reason for this is Google owns the entire infrastructure running Spanner and there is no other data on the Spanner network other than Google’s data. Spanner has availability of 99.9999 per cent, which means as a customer you can treat it as a system that will always be consistent and available; you can treat it just like your reliable relational database. But there will be the occasional partition (which will involve Google engineers running around with their hair on fire) and in that case – because of the way Spanner works – onside of the partition will be fine and carry on as usual, whilst the other side will be unavailable.

Even then, thanks to Snapshot reads, it’s possible that both sides will be able to read data, if you have access to the network of course.

So far, so good, but there are some potential issues.

One is caused by the way Spanner implements distributed transactions by use of a system called Paxos. Paxos implements transactions through the use of “group leader” and periodic elections in the system for this leader. This can cause a problem if the leader fails – you might need to wait out for a new election to happen before transactions can continue, or the leader might be restarted and you will need to wait for that.

Another is the fact Spanner is not a true relational database, it’s a key-value store in semi-relational tables. Each row must have a name and each table must have an ordered set of primary keys based on these names. This has an effect on the SQL-like language that is used to interact with Spanner: it’s very similar to SQL but different enough to cause problems for experienced SQL users.

In particular when creating tables the user must define how tables are “interleaved” to describe the locality relationships between multiple tables. If you get this wrong then there is a price to be paid in terms of performance: your system just won’t work as fast you need, especially if you have a globally distributed system. Google admits this in its original paper, saying there is room for improvement in the way Spanner handles complex SQL queries and the problem lies in the way each node handles data. Perhaps this has improved since the original paper, though.

Spanner, however, does have some useful tricks up its sleeve thanks to the use of Google’s TrueTime. This is an implementation of synchronized clocks using GPS receivers and atomic clocks in every data center. This can cause problems during a partition if a node can’t connect to a master – its clock will drift, causing the election of Paxos masters to slow down.

But TrueTime does allow schema changes to be scheduled for a later date and for both schemas to run at the same time, with a change to the new schema at a later date. This could certainly be helpful for organisations heavily invested in DevOps – schema changes of database (and roll backs in particular) are always a major problem here and in particular the roll back of bad schema changes. Running both at the same time would be a real gain.

Make no mistake, Google Spanner represents a real breakthrough in distributed database systems. It’s not a direct replacement for relational SQL databases, though as it does not appear you will be able to simply port a SQL application onto Spanner: there are changes to be made to the way data tables are defined and to the syntax of the SQL used to file and retrieve data.

The real question, though, is how many organisations actually need access to a globally scalable relational database? During the past couple of years NoSQL databases have muscled in on the data store action and shown that they can perform as reliably as their relational counterparts.

As ever, the decision will come down to cost: at $0.90 per node per hour and $0.30 per GB per month, this might seem very reasonable. But remember, if you need a global transactional database then you will need a large number of nodes and you will probably have a large amount of data, so that cost could start to rocket.

Don’t throw away that MySQL server just yet. ®

Similar topics


Other stories you might like

  • Ubuntu 21.10: Plan to do yourself an Indri? Here's what's inside... including a bit of GNOME schooling

    Plus: Rounded corners make GNOME 40 look like Windows 11

    Review Canonical has released Ubuntu 21.10, or "Impish Indri" as this one is known. This is the last major version before next year's long-term support release of Ubuntu 22.04, and serves as a good preview of some of the changes coming for those who stick with LTS releases.

    If you prefer to run the latest and greatest, 21.10 is a solid release with a new kernel, a major GNOME update, and some theming changes. As a short-term support release, Ubuntu 21.10 will be supported for nine months, which covers you until July 2022, by which point 22.04 will already be out.

    Continue reading
  • Heart FM's borkfast show – a fine way to start your day

    Jamie and Amanda have a new co-presenter to contend with

    There can be few things worse than Microsoft Windows elbowing itself into a presenting partnership, as seen in this digital signage for the Heart breakfast show.

    For those unfamiliar with the station, Heart is a UK national broadcaster with Global as its parent. It currently consists of a dozen or so regional stations with a number of shows broadcast nationally. Including a perky breakfast show featuring former Live and Kicking presenter Jamie Theakston and Britain's Got Talent judge, Amanda Holden.

    Continue reading
  • Think your phone is snooping on you? Hold my beer, says basic physics

    Information wants to be free, and it's making its escape

    Opinion Forget the Singularity. That modern myth where AI learns to improve itself in an exponential feedback loop towards evil godhood ain't gonna happen. Spacetime itself sets hard limits on how fast information can be gathered and processed, no matter how clever you are.

    What we should expect in its place is the robot panopticon, a relatively dumb system with near-divine powers of perception. That's something the same laws of physics that prevent the Godbot practically guarantee. The latest foreshadowing of mankind's fate? The Ethernet cable.

    By itself, last week's story of a researcher picking up and decoding the unintended wireless emissions of an Ethernet cable is mildly interesting. It was the most labby of lab-based demos, with every possible tweak applied to maximise the chances of it working. It's not even as if it's a new discovery. The effect and its security implications have been known since the Second World War, when Bell Labs demonstrated to the US Army that a wired teleprinter encoder called SIGTOT was vulnerable. It could be monitored at a distance and the unencrypted messages extracted by the radio pulses it gave off in operation.

    Continue reading

Biting the hand that feeds IT © 1998–2021