Database high priest mud-wrestles Facebook

Rubbishes MySQL. Bitchslaps NoSQL


Mike Stonebraker is famous for slagging Google's backend. And now he's slagging Facebook's too.

Last week, in a piece from our friends at GigaOM, Database Grandpoobah Mike Stonebraker announced that Facebook's continued dependence on MySQL was “a fate worse than death,” insisting that the social network's only route to salvation is to “bite the bullet and rewrite everything.”

We're confident he was quoted warmly and accurately. After all, he said much the same thing to The Register. "Facebook has sharded their social network over something north of 4,000 MySQL instances, and that's nowhere near fast enough, so they've put 9,000 instances of memcached in memory in front of them. They are just dying trying to manage this," Stonebraker recently told us. "They have to do data consistency and crash recovery in user space."
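
To make the complaint concrete, here is a minimal cache-aside sketch of the kind of setup Stonebraker is describing: memcached sitting in front of application-sharded MySQL. The shard count, host names, schema and keys below are illustrative assumptions, not Facebook's actual code.

```python
# Minimal cache-aside sketch: memcached in front of application-sharded MySQL.
# Shard layout, hosts and schema are hypothetical.
from pymemcache.client.base import Client as Memcache
import mysql.connector

NUM_SHARDS = 4096                        # "north of 4,000 MySQL instances"
cache = Memcache(("127.0.0.1", 11211))   # one of the ~9,000 memcached nodes

def shard_for(user_id):
    """The application, not MySQL, decides which instance owns a user."""
    shard = user_id % NUM_SHARDS
    return mysql.connector.connect(host="mysql-%d.internal" % shard,
                                   user="app", password="secret",
                                   database="social")

def get_profile(user_id):
    key = "profile:%d" % user_id
    hit = cache.get(key)
    if hit is not None:
        return hit                       # served from memory
    conn = shard_for(user_id)
    cur = conn.cursor()
    cur.execute("SELECT blob FROM profiles WHERE user_id = %s", (user_id,))
    row = cur.fetchone()
    if row is not None:
        cache.set(key, row[0], expire=300)   # repopulate the cache
        return row[0]
    return None

def update_profile(user_id, blob):
    conn = shard_for(user_id)
    cur = conn.cursor()
    cur.execute("UPDATE profiles SET blob = %s WHERE user_id = %s",
                (blob, user_id))
    conn.commit()
    # Keeping the cache and the shard in step is the application's job: the
    # "data consistency in user space" Stonebraker is complaining about.
    cache.delete("profile:%d" % user_id)
```

If the process dies between the UPDATE and the cache invalidation, readers see stale data until the entry expires, which is exactly the sort of consistency and crash-recovery work that, in Stonebraker's telling, ends up in application code rather than in the database.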

As a professor of computer science at the University of California, Berkeley, Stonebraker helped develop the Ingres and Postgres relational databases. But in an age where ordinary relational databases can't always keep pace with internet-sized applications, he now backs a new breed of distributed in-memory database designed to handle far larger workloads. In addition to serving as an adjunct professor at MIT, Stonebraker is the chief technology officer at VoltDB, an outfit that sells this sort of "NewSQL" database.

Stonebraker's Facebook comments drew fire not only from a core database engineer at Mark Zuckerberg's social networking outfit, but also from the recognized kingpin of "cloud computing": Amazon chief technology officer Werner Vogels. Both argue that Stonebraker has no right to his opinion because he's never driven the sort of massive backend that drives the likes of Facebook and Amazon.

But Stonebraker was dead right several years back when he exposed the flaws of MapReduce, the distributed number-crunching platform that underpinned Google's backend infrastructure – even Google admitted as much. And as vehemently as Facebook defends its MySQL setup, there are other cases where the company has dropped the old-school relational database in favor of distributed "NoSQL" platforms such as Cassandra, which Facebook itself built, and HBase, the open source offering inspired by Google's BigTable.

'Go write a paper'

Twelve hours after GigaOM's article appeared, Facebook database engineer Domas Mituzas unloaded on Stonebraker from somewhere in Lithuania, implying that the longtime professor doesn't understand the demands of a major website. Facebook, he said, focuses on getting the most performance out of "mixed composition" I/O devices rather than keeping everything in memory, because that saves the company cash.

"I feel somewhat sad that I have to put this truism out here: disks are way more cost efficient, and if used properly can be used to facilitate way more long-term products, not just real time data. Think Wikipedia without history, think comments that disappear on old posts, together with old posts, think all 404s you hit on various articles you remember from the past and want to read," he wrote. "Building the web that lasts is completely different task from what academia people imagine building the web is."

And he wasn't done. He added that Stonebraker – and some other unnamed database "pioneer" – failed to realize that using disks would save the world. "I already had this issue with [another] RDBMS pioneer...he also suggested that disks are things of the past and now everything has to be in memory, because memory is cheap. And data can be whatever unordered clutter, because CPUs can sort it, because CPUs are cheap," Mituzas wrote.

"Throwing more and more hardware without fine tuning for actual operational efficiency requirements is wasteful and harms our planet. Yes, we do lots of in-memory efficiency work, so that we reduce our I/O, but at the same time we balance the workload so that I/O subsystem provides as efficient as possible delivery of the long tail.

"What happens in real world if one gets 2x efficiency gain? Twice more data can be stored, twice more data intensive products can be launched. What happens in the academia of in-memory databases, if one gets 2x efficiency gain? A paper. What happens when real world doesn’t read your papers anymore? You troll everyone via GigaOM."

That's quite a flame when you consider Stonebraker's pedigree. But Mituzas stood by his post. And he was backed by Vogels. "If you have never developed anything of that scale, you cannot be taken serious if you call for the reengineering of facebook's data store," the Amazon CTO tweeted. And then he tweeted again: "Scaling systems is like moving customers from single engine Cessna to 747 without them noticing it, with no touchdown & refueling in mid-air."

And again: "Scaling data systems in real life has humbled me. I would not dare criticize an architecture that holds social graphs of 750M and works".

Stonebraker versus the world

But Stonebraker dares. And whether you agree with his language or not, on some level he has a point. Rather than use MySQL, Facebook built Cassandra for its inbox search tool, and it went with HBase for its new messaging platform. These databases abandon the traditional SQL model in favor of distributed, non-relational architectures that can readily scale. "[Facebook] has got something that works: sharding MySQL. But the problem with sharding MySQL is not that it can't be made to work, so much that it's not application transparent across systems," says Jonathan Ellis, the chair of the open source Cassandra project and the CTO of DataStax, the Texas outfit that has commercialized the platform.

"You saw that they went for Cassandra for inbox search and HBase for messaging. The reason they're not doing that on MySQL is that sharding MySQL is a lot of effort and you have to apply that effort to each new project."

The extra twist of the knife is that Stonebraker has little respect for Cassandra or HBase either. VoltDB provides the speed of Cassandra and HBase and other NoSQL databases such as MongoDB, he says, but it retains the relational model. It doesn't limit your transactional semantics. He calls this NewSQL, in clear response to the NoSQL movement.

"At least for new OLTP applications, giving up ACID and giving up SQL is a terrible idea. You don't have to give up either of those. You can go fast without giving up either. If you give up ACID, you end up pushing data consistency into the application logic and that's just way harder to do," Stonebraker tells us.

"We've benchmarked ourselves against Cassandra on TPC-C, and we're a factor of five faster...the difference between NoSQL and NewSQL performance is a very big number."

Asked about Stonebraker's claims, DataStax's Jonathan Ellis argues that VoltDB has its own limitations. "There's a ton of limitations that VoltDB marketing doesn't tell you about," he says. "We've had people complain that you have to do queries within a partition and that if you step outside of that, it doesn't warn you. I also think that the focus on in-memory-only is limiting. Almost all Cassandra users have datasets larger than memory, and some subset of that will be active at any time.

"Either you have to buy ten times as many servers so you can fit the whole thing in RAM or license something like [the HP realtime processing engine] Vertica to put it offline. Neither is a compelling story."

So, it's Stonebraker against the web. And the difference of opinion is severe. In May, at a MongoDB developer conference in San Francisco, Mongo creator Dwight Merriman told his audience there was "no way" to do distributed joins in a way that really scales. "I'm not smart enough to do distributed joins that scale horizontally, widely, and are super fast. You have to choose something else. We have no choice but to not be relational," he said.

"You can do distributed transactions, but if you do them with no loss of generality and you do them across a thousand machines, it's not going to be that fast."

Stonebraker says precisely the opposite, and in typical fashion, he goes right for the jugular. "I reject what Merriman says out of hand," he tells The Register. Merriman and his company, 10gen, declined to comment for this story. But Stonebraker says words don't matter. As much as he likes to wield his opinions, he insists the debate will be decided elsewhere. "Let the bake-off begin," he crows.

Of course, as Facebook points out, speed isn't everything. In the end, there's no deciding this debate. Not that we would want to. It's too much fun. ®
