Database high priest mud-wrestles Facebook
Rubbishes MySQL. Bitchslaps NoSQL
Mike Stonebraker is famous for slagging Google's backend. And now he's slagging Facebook's too.
Last week, in a piece from our friends at GigaOM, Database Grandpoobah Mike Stonebraker announced that Facebook's continued dependence on MySQL was “a fate worse than death,” insisting that the social network's only route to salvation is to “bite the bullet and rewrite everything.”
We're confident he was quoted warmly and accurately. After all, he said much the same thing to The Register. "Facebook has sharded their social network over something north of 4,000 MySQL instances, and that's nowhere near fast enough, so they've put 9,000 instances of memcached in memory in front of them. They are just dying trying to manage this," Stonebraker recently told us. "They have to do data consistency and crash recovery in user space."
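To make Stonebraker's complaint concrete, here is a minimal sketch of the cache-in-front-of-database pattern he describes. It is not Facebook's code: plain Python dictionaries stand in for a MySQL instance and a memcached instance, and the names (CacheAsideStore, read, write, db_reads) are hypothetical. What it illustrates is that the application, rather than the database, has to keep the two copies consistent.

```python
class CacheAsideStore:
    """Toy cache-aside store. Dicts are stand-ins, not real MySQL or memcached."""

    def __init__(self):
        self.database = {}  # stand-in for a MySQL instance (source of truth)
        self.cache = {}     # stand-in for a memcached instance
        self.db_reads = 0   # counts how often a read falls through to the database

    def read(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall through to the database and populate the cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Update the source of truth first.
        self.database[key] = value
        # Consistency "in user space": the application must remember to
        # invalidate the cached copy. Forget this line and readers see stale data.
        self.cache.pop(key, None)
```

If the process dies between the database update and the cache invalidation, nothing in this sketch repairs the stale entry. That gap is the kind of consistency and recovery work Stonebraker says ends up in application code.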
As a professor of computer science at the University of California, Berkeley, Stonebraker helped develop the Ingres and Postgres relational databases, but in an age where ordinary relational databases can't always keep pace with internet-sized applications, he now backs a new breed of distributed in-memory database designed to handle exponentially larger amounts of information. In addition to serving as an adjunct professor at MIT, Stonebraker is the chief technology officer at VoltDB, an outfit that sells this sort of "NewSQL" database.
Stonebraker's Facebook comments drew fire not only from a core database engineer at Mark Zuckerberg's social networking outfit, but also from the recognized kingpin of "cloud computing": Amazon chief technology officer Werner Vogels. Both argue that Stonebraker has no right to his opinion because he's never driven the sort of massive backend that drives the likes of Facebook and Amazon.
But Stonebraker was dead right several years back when he exposed the flaws of the MapReduce distributed number crunching platform that underpinned Google's backend infrastructure – even Google admitted as much – and as vehemently as Facebook defends its MySQL setup, there are other cases where the company has dropped the old school relational database in favor of distributed "NoSQL" platforms such as Cassandra, a database Facebook built itself, and HBase, the open source offering inspired by Google's BigTable.
'Go write a paper'
Twelve hours after GigaOM's article appeared, Facebook database engineer Domas Mituzas unloaded on Stonebraker from somewhere in Lithuania, implying that the longtime professor doesn't understand the demands of a major website. Facebook, he said, focuses on getting the most performance out of "mixed composition" I/O devices rather than in-memory data because it saves the company cash.
"I feel somewhat sad that I have to put this truism out here: disks are way more cost efficient, and if used properly can be used to facilitate way more long-term products, not just real time data. Think Wikipedia without history, think comments that disappear on old posts, together with old posts, think all 404s you hit on various articles you remember from the past and want to read," he wrote. "Building the web that lasts is completely different task from what academia people imagine building the web is."
And he wasn't done. He added that Stonebraker – and some other unnamed database "pioneer" – failed to realize that using disks would save the world. "I already had this issue with [another] RDBMS pioneer...he also suggested that disks are things of the past and now everything has to be in memory, because memory is cheap. And data can be whatever unordered clutter, because CPUs can sort it, because CPUs are cheap," Mituzas wrote.
"Throwing more and more hardware without fine tuning for actual operational efficiency requirements is wasteful and harms our planet. Yes, we do lots of in-memory efficiency work, so that we reduce our I/O, but at the same time we balance the workload so that I/O subsystem provides as efficient as possible delivery of the long tail.
"What happens in real world if one gets 2x efficiency gain? Twice more data can be stored, twice more data intensive products can be launched. What happens in the academia of in-memory databases, if one gets 2x efficiency gain? A paper. What happens when real world doesn’t read your papers anymore? You troll everyone via GigaOM."
That's quite a flame when you consider Stonebraker's pedigree. But Mituzas stood by his post. And he was backed by Vogels. "If you have never developed anything of that scale, you cannot be taken serious if you call for the reengineering of facebook's data store," the Amazon CTO tweeted. And then he tweeted again: "Scaling systems is like moving customers from single engine Cessna to 747 without them noticing it, with no touchdown & refueling in mid-air."
And again: "Scaling data systems in real life has humbled me. I would not dare criticize an architecture that the holds social graphs of 750M and works".
Stonebraker versus the world
But Stonebraker dares. And whether you agree with his language or not, on some level he has a point. Rather than use MySQL, Facebook built Cassandra for its inbox search tool, and it went with HBase for its new messaging platform. These distributed databases abandon the traditional SQL model in favor of distributed non-relational architectures that can readily scale. "[Facebook] has got something that works: sharding MySQL. But the problem with sharding MySQL is not that it can't be made to work, so much that it's not application transparent across systems," says Jonathan Ellis, the chair of the open source Cassandra project and the CTO of DataStax, the Texas outfit that has commercialized the platform.
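Ellis's point about application transparency can be sketched in a few lines. This is an illustration under stated assumptions, not how Facebook shards MySQL: each shard is a plain dictionary standing in for a separate MySQL instance, the shard is chosen with a CRC32 hash of the user ID, and the names (ShardedUserStore, put, get, find_by_city) are hypothetical.

```python
import zlib


class ShardedUserStore:
    """Toy hash-sharded store. Each dict stands in for one MySQL instance."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, user_id):
        # The application, not the database, decides where a row lives.
        index = zlib.crc32(user_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, user_id, profile):
        self._shard_for(user_id)[user_id] = profile

    def get(self, user_id):
        # Lookups by shard key touch exactly one shard.
        return self._shard_for(user_id).get(user_id)

    def find_by_city(self, city):
        # A query that does not use the shard key has to visit every shard,
        # and the application has to merge the results itself.
        matches = []
        for shard in self.shards:
            matches.extend(
                user_id
                for user_id, profile in shard.items()
                if profile.get("city") == city
            )
        return sorted(matches)
```

Every new product built on such a store needs its own routing and fan-out logic like this, which is roughly the repeated effort Ellis describes.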
"You saw that they went for Cassandra for inbox search and HBase for messaging. The reason they're not doing that on MySQL is that sharding MySQL is a lot of effort and you have to apply that effort to each new project."
The extra twist of the knife is that Stonebraker has little respect for Cassandra or HBase either. VoltDB provides the speed of Cassandra and HBase and other NoSQL databases such as MongoDB, he says, but it retains the relational model. It doesn't limit your transactional semantics. He calls this NewSQL, in clear response to the NoSQL movement.
"At least for new OLTP applications, giving up ACID and giving up SQL is a terrible idea. You don't have to give up either of those. You can go fast without giving up either. If you give up ACID, you end up pushing data consistency into the application logic and that's just way harder to do," Stonebraker tells us.
"We've benchmarked ourselves against Cassandra on TPC-C, and we're a factor of five faster...the difference between NoSQL and NewSQL performance is a very big number."
Asked about Stonebraker's claims, DataStax's Jonathan Ellis argues that VoltDB has its own limitations. "There's a ton of limitations that VoltDB marketing doesn't tell you about," he says. "We've had people complain that you have to do queries within a partition and that if you step outside of that, it doesn't warn you. I also think that the focus on in-memory-only is limiting. Almost all Cassandra users have datasets larger than memory, and some subset of that will be active at any time.
"Either you have to buy ten times as many servers so you can fit the whole thing in RAM or license something like [the HP realtime processing engine] Vertica to put it offline. Neither is a compelling story."
So, it's Stonebraker against the web. And the difference of opinion is severe. In May, at a MongoDB developer conference in San Francisco, Mongo creator Dwight Merriman told his audience there was "no way" to do distributed joins in a way that really scales. "I'm not smart enough to do distributed joins that scale horizontally, widely, and are super fast. You have to choose something else. We have no choice but to not be relational," he said.
"You can do distributed transactions, but if you do them with no loss of generality and you do them across a thousand machines, it's not going to be that fast."
Stonebraker says precisely the opposite, and in typical fashion, he goes right for the jugular. "I reject what Merriman says out of hand," he tells The Register. Merriman and his company, 10gen, declined to comment for this story. But Stonebraker says words don't matter. As much as he likes to wield his opinions, he insists the debate will be decided elsewhere. "Let the bake-off begin," he crows.
Of course, as Facebook points out, speed isn't everything. In the end, there's no deciding this debate. Not that we would want to. It's too much fun. ®