Google Spanner in the NewSQL works?

Don't decommission that MySQL server – yet


The commercial release by Google of its Spanner database as a public beta last month came as both a pleasant surprise and a wake-up call: perhaps the end of NoSQL databases is in sight.

Spanner represents the pinnacle of the so-called NewSQL movement, which is a rejection of NoSQL values and a return to the old values of SQL, ACID transactions and good old relational tables. Goodbye to the new kids on the block with their crazy interface languages, bizarre data structures and distributed data tables.

Spanner promises a brave new alternative (whilst keeping to old standards) that allows a distributed database to seemingly break the CAP theorem, proposed by UC Berkeley computer scientist Eric Brewer in the late 1990s. The theorem itself is simple to state but often misunderstood: any distributed system (such as a database) can guarantee at most two of the following three properties: consistency, availability and partition tolerance.

Basically, if you have two or more computers in a system and communication between them breaks down, your system must either become inconsistent (each computer giving a different answer) or become unavailable to answer queries. NoSQL systems generally fall into one of two camps: refuse to answer during a partition (MongoDB, for instance) or let nodes on either side of the partition give different answers (Cassandra, for instance).
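The two camps can be sketched in a few lines of Python. The class and node names below are invented for illustration, but the behaviour mirrors the trade-off: during a partition, a CP-style system (MongoDB's camp) refuses to answer rather than risk being wrong, while an AP-style system (Cassandra's camp) answers from whatever copy each side holds:

```python
class Node:
    """One replica holding its own copy of a value."""
    def __init__(self, value):
        self.value = value

class CPStore:
    """Consistent + partition-tolerant: refuses reads it cannot verify."""
    def __init__(self, nodes):
        self.nodes = nodes
    def read(self, partitioned: bool):
        if partitioned:
            raise RuntimeError("unavailable during partition")
        return self.nodes[0].value

class APStore:
    """Available + partition-tolerant: always answers, possibly stale."""
    def __init__(self, nodes):
        self.nodes = nodes
    def read(self, partitioned: bool):
        # Each side of the partition serves its own (maybe divergent) copy.
        return [n.value for n in self.nodes]

replicas = [Node("v1"), Node("v2")]  # copies that diverged during a partition
print(APStore(replicas).read(partitioned=True))  # ['v1', 'v2'] – inconsistent answers
```

The point of the sketch: neither class is broken, each has simply picked a different corner of the CAP triangle to give up.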

If you always want the same answer across the system and always want to be available, you can't let a partition happen. Traditional relational database systems manage this by having a single master – the keeper of the truth – and slaves that hold copies which may actually be stale.

Spanner is seen as the fix for this problem – a system that is both consistent and 100 per cent available.

Except it isn't. Eric Brewer (of CAP theorem fame) is now employed by Google, and while not directly involved in the Spanner project itself, a whitepaper from Brewer makes it clear that Spanner does not break the CAP theorem and is not 100 per cent available. Problem? Not really: Spanner is just so available that it might as well be 100 per cent available.

The reason for this is that Google owns the entire infrastructure running Spanner, and there is no data on the Spanner network other than Google's. Spanner has availability of 99.9999 per cent, which means that as a customer you can treat it as a system that will always be consistent and available; you can treat it just like your reliable relational database. But there will be the occasional partition (which will involve Google engineers running around with their hair on fire), and in that case – because of the way Spanner works – one side of the partition will be fine and carry on as usual, while the other side will be unavailable.

Even then, thanks to snapshot reads, it's possible that both sides will be able to serve reads – if you have access to the network, of course.

So far, so good, but there are some potential issues.

One is caused by the way Spanner implements distributed transactions, using a consensus protocol called Paxos. Paxos coordinates transactions through a "group leader", chosen by periodic elections within the system. This can cause a problem if the leader fails: transactions may have to wait until a new election completes, or until the failed leader is restarted.
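The waiting behaviour can be illustrated with a toy leader-lease model. This is not Spanner's actual implementation – the node names, lease length and election logic below are invented – but it shows why writes stall when a leader's lease lapses:

```python
import time

ELECTION_TIMEOUT = 2.0  # seconds a leader lease stays valid (illustrative value)

class PaxosGroup:
    """Toy model of a Paxos group: writes require a live leader lease."""
    def __init__(self):
        self.leader = "node-a"
        self.lease_expiry = time.monotonic() + ELECTION_TIMEOUT

    def leader_alive(self, now=None):
        now = time.monotonic() if now is None else now
        return now < self.lease_expiry

    def elect_new_leader(self, now=None):
        # In reality this is a round of Paxos messages; here a replica
        # simply wins and takes a fresh lease.
        now = time.monotonic() if now is None else now
        self.leader = "node-b"
        self.lease_expiry = now + ELECTION_TIMEOUT

    def write(self, key, value, now=None):
        if not self.leader_alive(now):
            # No valid leader: the write stalls until an election finishes.
            self.elect_new_leader(now)
        return (self.leader, key, value)
```

In the real system the stall is the election round-trip itself; here it is collapsed into a single call, but the shape of the problem – no leader, no writes – is the same.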

Another is the fact that Spanner is not a true relational database: it is a key-value store organised into semi-relational tables. Each row must have a name, and each table must define an ordered set of primary-key columns based on these names. This has an effect on the SQL-like language used to interact with Spanner: it is very similar to SQL, but different enough to trip up experienced SQL users.

In particular, when creating tables the user must define how tables are "interleaved" to describe the locality relationships between them. Get this wrong and there is a price to be paid in performance: your system just won't run as fast as you need, especially if it is globally distributed. Google admits this in its original paper, saying there is room for improvement in the way Spanner handles complex SQL queries, and that the problem lies in the way each node handles data. Perhaps this has improved since the original paper, though.
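As a sketch of what interleaving looks like, the table and column names below are invented, but the shape follows Google's published Spanner DDL: a child table's primary key must begin with its parent's key, and INTERLEAVE IN PARENT tells Spanner to store each child row physically next to its parent row:

```sql
-- Parent table: one row per customer.
CREATE TABLE Customers (
  CustomerId INT64 NOT NULL,
  Name       STRING(MAX)
) PRIMARY KEY (CustomerId);

-- Child table: orders are stored interleaved with their customer,
-- so a customer-plus-orders read stays local to one server split.
CREATE TABLE Orders (
  CustomerId INT64 NOT NULL,   -- must repeat the parent key first
  OrderId    INT64 NOT NULL,
  Total      FLOAT64
) PRIMARY KEY (CustomerId, OrderId),
  INTERLEAVE IN PARENT Customers ON DELETE CASCADE;
```

This is the locality decision the article warns about: interleave the wrong tables and joins that should be local become cross-node (and, in a global deployment, cross-continent) operations.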

Spanner, however, does have some useful tricks up its sleeve thanks to Google's TrueTime, an implementation of synchronised clocks using GPS receivers and atomic clocks in every data centre. Partitions can still cause problems here: if a node can't connect to a time master, its clock will drift, which slows down the election of Paxos leaders.

But TrueTime does allow schema changes to be scheduled for a future timestamp, with the old and new schemas running side by side until the switchover. This could certainly be helpful for organisations heavily invested in DevOps, where database schema changes – and roll-backs of bad changes in particular – are always a major headache. Running both schemas at the same time would be a real gain.

Make no mistake, Google Spanner represents a real breakthrough in distributed database systems. It's not a direct replacement for relational SQL databases, though, as it does not appear you will be able to simply port a SQL application onto Spanner: there are changes to be made to the way tables are defined and to the syntax of the SQL used to store and retrieve data.

The real question, though, is how many organisations actually need access to a globally scalable relational database? During the past couple of years NoSQL databases have muscled in on the data store action and shown that they can perform as reliably as their relational counterparts.

As ever, the decision will come down to cost: at $0.90 per node per hour and $0.30 per GB of storage per month, Spanner might seem very reasonable. But remember: if you need a global transactional database you will need a large number of nodes, and you will probably have a large amount of data, so the cost could start to rocket.
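A back-of-envelope calculation makes the point. The rates are the ones quoted above; the node count and data volume are hypothetical, picked only to resemble a modest multi-region deployment:

```python
# Back-of-envelope Spanner bill using the article's list prices.
NODE_HOUR_USD   = 0.90   # per node per hour (from the article)
GB_MONTH_USD    = 0.30   # per GB of storage per month (from the article)
HOURS_PER_MONTH = 730    # average hours in a month

def monthly_cost(nodes: int, storage_gb: int) -> float:
    """Compute cost plus storage cost for one month."""
    compute = nodes * NODE_HOUR_USD * HOURS_PER_MONTH
    storage = storage_gb * GB_MONTH_USD
    return compute + storage

# Hypothetical deployment: 15 nodes across three regions, 2 TB of data.
print(monthly_cost(15, 2000))  # 15 * 0.90 * 730 + 2000 * 0.30 = 10455.0
```

Roughly $10,500 a month before any networking or backup charges – reasonable for some, but a long way from a single MySQL box.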

Don’t throw away that MySQL server just yet. ®
