PostgreSQL in line for DuckDB-shaped boost in analytics arena
Microsoft-backed extension aims to address open source database's perceived weaknesses
A Microsoft-backed open source project aims to help address PostgreSQL's weaknesses as an analytics database.
The project, which is under the permissive MIT License, plans to bring popular in-process OLAP system DuckDB to PostgreSQL, which has been named the most popular database among developers since 2023.
While the popularity of PostgreSQL has grown steadily over the last 28 years, something credited to its robust and extensible design, developers have struggled to build analytical systems based on the database as effectively as they support transactions.
An open source PostgreSQL extension that embeds DuckDB's analytics engine intends to help address the problem.
A blog post from MotherDuck, which provides back-end services and extensions to DuckDB, said the IP is owned by the independent non-profit DuckDB Foundation, ensuring that it stays open source. It is hosted in the official DuckDB GitHub repository.
"PostgreSQL is great at a lot of things, but if you try to use it for analytics, you hit a wall pretty quickly," the post said. "That is, it is great at creating, finding and locating individual rows, but if you want to understand what is going on in a data set, it can be painfully slow.
"People have tried to add Band-Aids to improve PostgreSQL analytical performance but they haven't been particularly successful, because being good at analytics requires different techniques for running your queries, like being able to operate over batches of rows at once, and avoiding decompressing data until it is absolutely needed. And typically, that takes a purpose-built analytical engine, which takes a ton of effort."
DuckDB is embedded within a host process, so there is no DBMS server software to install, update, or maintain. For example, the DuckDB Python package can run queries directly on data in Python software library Pandas without importing or copying data. Written in C++, DuckDB is free and open source under the MIT License.
MotherDuck said DuckDB's vectorized query engine processes chunks of data at a time, making it valuable for answering analytical questions about what is going on in the data. "DuckDB's popularity has been soaring due to its speed, ease of use, and versatility," the blog claimed.
- DuckDB promises greater stability with 1.0 release
- DuckDB shuns VC breadcrumbs so support isn't all it's quacked up to be
- VCs lay $52.5M golden egg for MotherDuck's serverless analytics platform
- Ex-BigQuery exec and Motherduck CEO: For some users, the answer is to think small
The new DuckDB PostgreSQL extension, pg_duckdb, is supported by DuckDB Labs, the commercial support company founded by DuckDB's developers, Microsoft, which hosts PostgreSQL services and employs PostgreSQL committers, Hydra, the data and software company that kicked off pg_duckdb, and Neon, which offers a serverless PostgreSQL database platform. MotherDuck is also supporting the project.
Andy Pavlo, associate professor of databaseology at Carnegie Mellon University, said developers had been trying to make OLAP versions of PostgreSQL since the 2000s, "but these were all hard forks of the Postgres code" such as Greenplum, Vertica, Aster Data, and ParAccel.
"This approach was necessary at the time because the PostgreSQL extension or 'hooks' API started in 2006 and was originally meant to [provide] support for debugging functionality," he told us. "Then in the early 2010s, systems like Citus showed how to create a version of PostgreSQL that was based entirely on using extensions. The other big player which showed that you don't need to fork Postgres to support custom workloads is Timescale.
"DuckDB's optimizer is not as battle-hardened as PostgreSQL in terms of the number of years of service. But DuckDB's optimizer design is based on the TUM's HyPer optimizer, which is one of the best in the world. For example, it can unnest arbitrary subqueries that not even Microsoft SQL Server or Oracle can handle."
Pavlo added that the range of companies backing the project was impressive, especially MotherDuck, Neon and Microsoft. "If I was a developer that was already using Postgres + DuckDB in my stack and I wanted to be able to combine the two together, I would choose pg_duckdb over the other DuckDB Postgres extension alternatives," he said. ®