Ex-BigQuery exec and MotherDuck CEO: For some users, the answer is to think small
Former Google veteran talks to El Reg about trends in big data of the past decade
Interview Jordan Tigani made his name as an engineer leading the team behind BigQuery, Google’s data warehouse, which was among a group of systems to transform the market by separating storage and compute. Despite helping win global customers including Vodafone, and taking on big-name rivals Snowflake, Azure Synapse and AWS Redshift, he’s begun to think the approach has run out of “magic beans” for some users.
Now founder and CEO of MotherDuck – which built a serverless analytics system based around DuckDB – he finds himself eschewing the virtues of scale-out systems for the benefits a lightweight in-process OLAP database affords. You can catch all of this detail and more in our interview with him below.
Tigani recalled coming across DuckDB while he was chief product officer at SingleStore — his first job after Google.
“We started seeing DuckDB appearing on performance reports and giving us a run for the money. Not beating us, but it was still surprising. Then I started poking at it and I realised it could do some stuff that we couldn't do in SingleStore, that Snowflake couldn’t do either, and that was pretty interesting,” he says.
The brainchild of academics at Amsterdam's Centrum Wiskunde & Informatica mathematical and theoretical computing research center, DuckDB is embedded within a host process. There is no DBMS server software to install, update or maintain. For example, the DuckDB Python package can run queries directly on data in Python software library Pandas without importing or copying data. Written in C++, DuckDB is free and open source under the MIT License.
Tigani's pitch to DuckDB co-authors Hannes Mühleisen and Mark Raasveldt was to build a cloud-based serverless product around DuckDB.
The important thing about DuckDB is that it's a scale-up system, contrary to the received wisdom of the last 15 years of analytics, embodied in the first generation of so-called big data systems based on the Hadoop Distributed File System. The scale-out approach led to the boom in cloud-based analytics systems like Snowflake, BigQuery, Synapse and Redshift.
“At some point, there are no magic beans left. There will be a convergence around the performance of all these systems. Snowflake, Redshift, BigQuery, and Synapse are all probably within a factor of two in performance right now. For the most part, that's not what should drive people to use one of these systems versus the other,” Tigani says.
On top of that, the server machinery and infrastructure around these systems can “be really quite crufty,” he adds.
“DuckDB has been able to kind of strip all that away by being an in-process database, and that means that you basically can marshal data in and out of your application, or your data frames, with the minimum of data movements,” he says.
DuckDB is not designed to replace company-wide enterprise data warehouse systems like BigQuery but, by courting data scientists, machine learning engineers and developers, it could still have plenty of space to fly. ®