ACID fan and language lessons
"I can remember a debate in the '70s: assembly language jockeys would say C is too slow. I need to control my own registers. Twenty-five to 30 years later, we know that's not true and compilers are as good as or better than humans at producing machine-optimized code. Just like you'd never bet on assembly language today. You should not bet on low-level database repositories by alleging they are faster than a higher-level language."
As far as Stonebraker is concerned, these NoSQL architectures were built to fix specific problems by the companies that made them. Now, they are being peddled to the wider world. And that's the real problem because they undermine the principles of ACID that have helped guarantee the performance and reliability of data and that have fundamentally underpinned both relational databases and Stonebraker's work. Even as he's gone non-relational with SciDB, Stonebraker said SciDB will comply with ACID.
Sure, Memcached, for example, is popular - used by Twitter, YouTube and Digg - and it's often used in conjunction with MySQL. But Memcached is not ACID-compliant. It might be fine at processing observations, tweets, videos and news, but customers outside the world of Web 2.0 clouds won't want to run things like financials through a Memcached system. Memcached is not alone: most NoSQLers make no bones about dumping ACID.
Stonebraker reckons the NoSQL community has ditched ACID because it's "too expensive" but installing an ACID-free database is a bet against the future. As organizations grow, they will take decisions that inevitably put more of their important data into such systems and it's then that data integrity as guaranteed by ACID will matter.
"I'm a huge fan of ACID," Stonebraker confessed. "The database transaction model has served us well for 30 years and essentially everyone who jettisons it regrets it because it gives you a systematic underpinning for your data. A lot of the NoSQL guys jettison ACID and that's a huge mistake because, by and large, the NoSQL guys are not database experts.
"You might not need ACID now, but database applications live a very long time...requirements may change over that time. If you decide not to run ACID, make sure you never need it in the future," Stonebraker said.
ACID is what businesses need for mission-critical stuff. "I have a friend at a large telco who's not interested in NoSQL because they give up ACID compliance," Stonebraker reckoned.
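To make the stakes concrete, here is a minimal sketch of the atomicity part of ACID, using Python's built-in sqlite3 module as a stand-in for any transactional database. It is not drawn from any product named in this article; the accounts table, the transfer function and the balances are hypothetical, chosen only to echo the financials example above.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single all-or-nothing transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Both updates above are undone by the rollback.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


# Hypothetical demo data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so it rolls back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer debits one account and credits the other before the overdraft is noticed, yet neither change survives. A store without transactions would leave the application to detect and undo that half-finished state itself.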
Of course, Stonebraker is more than just an MIT academic. He's part businessman. He's co-founder and chief technology officer for VoltDB, and a co-founder and board member of Vertica. That makes this more than a battle of architectures. It's a fight for customers' dollars. It was no coincidence that Stonebraker and DeWitt's attack on MapReduce was launched from their Vertica blog.
Vertica's customers include Mozilla - the open-source operation uses Stonebraker's creation with open-source BI Pentaho to analyze billions of Firefox user log records per day in an attempt to improve product R&D. Guess.com, meanwhile, uses Vertica with MicroStrategy to analyze retail and inventory data in its US and European data centers.
Vertica's also landed some Web 2.0 big-data fish: Zynga, maker of the popular FarmVille and Mafia Wars hits on Facebook, has come out as a relational fan for analytics. In a statement supporting Vertica 4.0, Zynga's vice president of analytics Ken Rudin called Vertica "no wind-up toy", running a daily load of 40 million players and 3TB of data across 230 nodes and two clusters on the database's columnar data warehouse.
This wouldn't be so awkward if it weren't for the fact that Facebook has built its own big-data offering, Cassandra, which has its own take on columns. Cassandra's columns require a mind switch for those coming from a relational background, while Vertica provocatively calls itself "the only true enterprise-ready MPP columnar database" with an emphasis on "only" and "columnar database".
Stonebraker's other recent creation, VoltDB, which started operations in 2009, doesn't yet list any customers.
Roasting relational elephants
Stonebraker's not just critical of the NoSQL new wave: he's got plenty of fire left for the relational "elephants," Microsoft and Oracle. Increasingly, their answer to high-end relational processing is to boost the software by fusing it with the underlying hardware.
Oracle's built the Exadata server, a hardware appliance running Oracle's database. It combines Smart Flash Cache, to reduce bottlenecks, and columnar compression, to reduce data warehousing table size, with solid-state multiterabyte storage arrays to offload data. Microsoft's partnered with Bull, Dell, EMC, HP and IBM on massively parallel processor appliances running SQL Server - SQL Server 2008 R2 Parallel Data Warehouse.
The concepts are similar to Stonebraker's warehousing and analytics work, but Stonebraker has not allowed himself to become married to a small set of certified hardware suppliers with specialized chips or hardware. Stonebraker's goal is to achieve scale through software working on affordable, commodity hardware - taking advantage of multi-core CPUs and greater memory. According to Stonebraker, Oracle and Microsoft can just keep adding more expensive hardware but the fundamental problems or bottlenecks won't be solved.