MotherDuck scores $47.5m to prove scale-up databases are not quackers
Former BigQuery tech lead tells Reg about data-warehouse-on-your-laptop DuckDB
Interview
In analytical database systems, the story of the last ten years or more has been one of scaling out. Only databases distributed across multiple nodes could cope with the scale demanded by so-called Big Data. Web and mobile data were driving demand for systems that scale out, rather than relying on ever more powerful single instances.
Hadoop (strictly speaking a distributed storage and processing framework rather than a database), AWS Redshift, Snowflake, and Google's BigQuery all followed this trend, at least for On-Line Analytical Processing (OLAP).
But one of the chief architects of BigQuery is taking a bet on a system which goes in the other direction. Jordan Tigani's new company, MotherDuck, has just taken $47.5 million in seed and Series A funding, with backers including a16z, the VC co-founded by web pioneer Marc Andreessen.
MotherDuck has built a serverless extension to the open source database DuckDB, which was featured in The Register in September.
Although it only released version 0.6.0 this week, DuckDB has already found a home at Google, Facebook and Airbnb.
DuckDB runs embedded within a host process, with no DBMS server software to install, update or maintain. For example, the DuckDB Python package can run queries directly on data held in Pandas, the Python data analysis library, without importing or copying it.
The other thing that makes it different is that DuckDB scales up, rather than scaling out.
Tigani tells The Register: "Everyone is talking about Big Data. Databricks and Snowflake have been trying to outdo each other in benchmark wars over a 100TB dataset. In reality, nobody uses that amount of data. Everybody focuses on giant datasets, but the actual workloads on the database tend to be gigabytes."
While working as chief product officer at SingleStore, the database which claims to support both analytical and transactional workloads on a single system, Tigani came across DuckDB, an open source project co-created by Dutch computer science researchers Hannes Mühleisen and Mark Raasveldt.
"Since the days when MapReduce was first introduced in 2004, scale up was a dirty word, but when you realize that most data we work on is not that huge, and at the same time, laptop and desktop hardware have got better, you don't need to scale out. Scaling up is so much simpler, and more robust. When we built Google BigQuery as a large-scale distributed system, it took an enormous amount of energy to get it to work," says Tigani.
MotherDuck provides a backend extension to DuckDB that lets the database work in a way analogous to Google Sheets, which runs partly on the client and partly on the server. It hooks the client database into a backend execution pipeline and a cost-based optimizer that applies the "standard tricks" for optimizing queries in the data warehousing world. It also helps the system decide what to execute on the client and what to send to the cloud, Tigani says.
Additionally, it allows developers and data scientists to collaborate on the same data set without replicating it or juggling separate versions, although the DuckDB literature makes clear that DuckDB is no replacement for large client/server installations used for centralized enterprise data warehousing.
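The article does not say how MotherDuck's optimizer decides where each part of a query runs. As a purely illustrative sketch of the general idea of cost-based placement, the toy model below compares running a query on the laptop against running it in the cloud, counting the time to move data over the network plus the time to scan it. The `Plan` class, the `choose_site` function and every throughput figure are hypothetical assumptions, not MotherDuck's API or measurements.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical size estimates for one query (all in bytes)."""
    local_bytes: int   # input already on the client machine
    remote_bytes: int  # input stored in the cloud
    result_bytes: int  # estimated size of the query result


def choose_site(plan: Plan,
                link_bytes_per_s: float = 50e6,       # assumed network bandwidth
                local_scan_bytes_per_s: float = 1e9,  # assumed laptop scan speed
                cloud_scan_bytes_per_s: float = 5e9,  # assumed cloud scan speed
                ) -> str:
    """Return "local" or "cloud", whichever has the lower estimated time in seconds."""
    total_input = plan.local_bytes + plan.remote_bytes
    # Run on the client: download the remote input, then scan everything locally.
    local_cost = (plan.remote_bytes / link_bytes_per_s
                  + total_input / local_scan_bytes_per_s)
    # Run in the cloud: upload the local input, scan it all in the cloud,
    # then download the result back to the client.
    cloud_cost = (plan.local_bytes / link_bytes_per_s
                  + total_input / cloud_scan_bytes_per_s
                  + plan.result_bytes / link_bytes_per_s)
    return "local" if local_cost <= cloud_cost else "cloud"


# A 100 MB file on the laptop, with nothing in the cloud, stays local.
print(choose_site(Plan(local_bytes=100_000_000, remote_bytes=0, result_bytes=1_000)))  # local
# A 50 GB cloud table with a tiny aggregate result is sent to the cloud.
print(choose_site(Plan(local_bytes=0, remote_bytes=50_000_000_000, result_bytes=1_000)))  # cloud
```

A real optimizer would estimate these sizes from table statistics and could split a single query between client and cloud; this sketch only picks one site for the whole query.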
DuckDB, which remains open source under the permissive MIT license, has attracted interest from developers wanting to build it into their data analytics and machine learning systems.
Matthew Mullins, CTO of collaborative analytics tool builder Coginiti, tells The Register: "I'm super excited about DuckDB and all the things people are going to build on it because it's very easy to use, it's incredibly fast, and once you touch it, you start thinking of all the places you could use it.
"Our product enables analysts to transform data in the leading analytic data platforms, which are all column-oriented like DuckDB. Implementing DuckDB in our product was a way to carve off some data warehouse-like compute and replicate it in the browser. It's like bringing fire down from the clouds. For us, DuckDB is enabling users to manipulate large data sets with incredible speed and accuracy while leveraging local compute to save on platform costs. We’re just at the start of our journey with DuckDB," Mullins says. ®