Google goes back to the future with SQL F1 database

'Can you have a truly scalable database without going NoSQL? Yes!'


The tech world is turning back toward SQL, bringing to a close a possibly misspent half-decade in which startups courted developers with promises of infinite scalability and the finest imitation-Google tools available, and companies found themselves exposed to unstable data and poor guarantees.

The shift has been going on quietly for some time, and tech leader Google has been tussling with the drawbacks of non-relational and non ACID-compliant systems for years. That struggle has demanded the creation of a new system to handle data at scale, and on Tuesday at the Very Large Data Base (VLDB) conference, Google delivered a paper outlining its much-discussed "F1" system, which has replaced MySQL as the distributed heart of the company's hugely lucrative AdWords platform.

The AdWords system includes "100s of applications and 1000s of users," which all share a database over 100TB serving up "hundreds of thousands of requests per second, and runs SQL queries that scan tens of trillions of data rows per day," Google said. And it's got five nines of availability.

Though Google had given a presentation on F1 at Stanford University SIGMOD in 2012, and a follow-up technical session at a conference in California earlier this year, it had not yet presented a paper outlining the technology in rigorous detail.

F1 uses some of Google's most advanced technologies, such as BigTable and the planet-spanning "Spanner" database, which F1 servers are co-located with for optimum use. Google describes it as a "a hybrid, combining the best aspects of traditional relational databases and scalable NoSQL systems".

Though built atop Spanner (which draws its own data from the Colossus File System), it adds in five new key features, including distributed SQL queries, transactionally consistent secondary indexes, asynchronous schema changes including database reorganizations, optimistic transactions, and automatic change history recording and publishing.

It has a relational scheme similar to traditional RDBMS systems, with extensions such as explicit table hierarchy and columns with Protocol Buffer data types.

The table hierarchy means that each primary key includes other keys that reference its ancestors, so for instance the key for an AdGroup entity could reference the Campaign entity of which it is a sub-group and the Customer entity of which the Campaign is a subgroup. This, Google says, allows for fast common-case join processing.

By combining these technologies, F1 brings the inherent scale-out benefits of NoSQL with the SQL features that enterprises are used to – ACID compliance, SQL queries, schema management, and indexes. It can also handle analysis as well as transaction processing, as F1 supports Google's MapReduce framework, allowing for Hadoop-like jobs.

The technology comes with a cost, as Google said due to its design choices it resulted in "higher latency for typical reads and writes," though the company has developed workarounds for this. It has "relatively high" commit latencies of 50-150 ms, Google writes.

F1 "started as an experiment and it wasn't clear it was possible or would work," Jeff Shute, Google's lead engineer on F1, said in a presentation at the O'Reilly Strata conference earlier this year. By putting F1 at the heart of Google's Ad network, the company feels it has proved "you can make a database that works like a database, but scales like NoSQL systems like BigTable."

"With F1 and Spanner we really decided to start over," Shute says. We've moved this huge Adwords system onto this new DB and proved it actually works. The system is way more scalable than what it was before - better availability than MySQL, better consistency than MySQL, we've got SQL-query that's just as good as what we started with."

Fire up the SQL-Delorean, Google's going back to the future

The reason Google built F1 was partly because of the frustration its engineers had found when dealing with non-relational systems with poor consistency problems.

"We also have a lot of experience with eventual consistency systems at Google," they write in the paper. "In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date."

So severe a penalty do these systems impose on developers that Google called it an "unacceptable burden". For this reason, F1 offers full transactional consistency across snapshot transactions, pessimistic transactions for holding locks in place across distributed checks, and optimistic transactions for writing globally without needing to check certain states locally.

The company had also found frustration with a lack of SQL among its developers. "Very clean semantics we find is something you cannot live without," Google software engineer Stephan Ellner, says.

Along with the SQL interface, F1 also gives developers a NoSQL key-value based interface as well, with Google encouraging developers to switch to SQL "for low-latency OLTP queries, large OLAP queries, and everything in between," according to the Google paper.

"It's really one query engine with a very diverse workload," Shute said "Small queries for user-facing applications... go all the way up to huge analysis across the whole database. We do have a lot of other data in external systems - stats, logs - and these can be orders of magnitude bigger than what's in the actual database."

It also allows for joins between Spanner and other data sources, such as BigTable, CSV files, and stuff from the analytical data warehouse inside Adwords.

By implementing F1, Google has been able to consolidate its systems as well. "We have defragmented our storage ecosystem," Stephan Ellner, a software engineer at Google, says. "We bring more things back from core BigTables into the core transactional database so developers have fewer external things to deal with and fewer APIs."

Google expects to roll out the underlying Spanner system across a large amount of Google's products, and where Spanner goes, F1 may follow.

Just as Google pioneered the use of scale-out analytical and data storage technologies with the Google File System and MapReduce papers, its combination of Spanner and F1 may inspire a new wave of startups working on SQL-like scale-out databases.

Right now, companies such as Cloudera, ParAccel, MongoDB, and Cascading are all trying to layer SQL-like query engine over a datastore like HDFS or MongoDB. But though these can scale well they lack the transactional capabilities of some systems. This, for enterprises, is an unpleasant pill to swallow.

With F1, Google feels it has been able to develop workarounds for some of the thornier problems to do with consistent transactions when running at global scale – and this may put the frighteners on companies already attempting to mimic Spanner's tech such as FoundationDB.

"Can you have a truly scalable database without going NoSQL? Our answer is yes," Ellner says. ®


Other stories you might like

  • Robotics and 5G to spur growth of SoC industry – report
    Big OEMs hogging production and COVID causing supply issues

    The system-on-chip (SoC) side of the semiconductor industry is poised for growth between now and 2026, when it's predicted to be worth $6.85 billion, according to an analyst's report. 

    Chances are good that there's an SoC-powered device within arm's reach of you: the tiny integrated circuits contain everything needed for a basic computer, leading to their proliferation in mobile, IoT and smart devices. 

    The report predicting the growth comes from advisory biz Technavio, which looked at a long list of companies in the SoC market. Vendors it analyzed include Apple, Broadcom, Intel, Nvidia, TSMC, Toshiba, and more. The company predicts that much of the growth between now and 2026 will stem primarily from robotics and 5G. 

    Continue reading
  • Deepfake attacks can easily trick live facial recognition systems online
    Plus: Next PyTorch release will support Apple GPUs so devs can train neural networks on their own laptops

    In brief Miscreants can easily steal someone else's identity by tricking live facial recognition software using deepfakes, according to a new report.

    Sensity AI, a startup focused on tackling identity fraud, carried out a series of pretend attacks. Engineers scanned the image of someone from an ID card, and mapped their likeness onto another person's face. Sensity then tested whether they could breach live facial recognition systems by tricking them into believing the pretend attacker is a real user.

    So-called "liveness tests" try to authenticate identities in real-time, relying on images or video streams from cameras like face recognition used to unlock mobile phones, for example. Nine out of ten vendors failed Sensity's live deepfake attacks.

    Continue reading
  • Lonestar plans to put datacenters in the Moon's lava tubes
    How? Founder tells The Register 'Robots… lots of robots'

    Imagine a future where racks of computer servers hum quietly in darkness below the surface of the Moon.

    Here is where some of the most important data is stored, to be left untouched for as long as can be. The idea sounds like something from science-fiction, but one startup that recently emerged from stealth is trying to turn it into a reality. Lonestar Data Holdings has a unique mission unlike any other cloud provider: to build datacenters on the Moon backing up the world's data.

    "It's inconceivable to me that we are keeping our most precious assets, our knowledge and our data, on Earth, where we're setting off bombs and burning things," Christopher Stott, founder and CEO of Lonestar, told The Register. "We need to put our assets in place off our planet, where we can keep it safe."

    Continue reading

Biting the hand that feeds IT © 1998–2022