Spotlight on Databases

This article is more than 1 year old

Great Graph Database Debate: Abandoning the relational model is 'reinventing the wheel'

Let's talk about what well-architected looks like

Wed 8 Mar 2023 // 10:30 UTC

Register Debate Welcome back to the latest Register Debate in which writers discuss technology topics, and you the reader choose the winning argument.

The format is simple: we propose a motion, the arguments for the motion ran on Monday and today, and the arguments against on Tuesday and tomorrow. During the week you can cast your vote on which side you support using the poll embedded below, choosing whether you're in favor or against. The final score will be announced on Friday, revealing which argument was most popular.

It's up to our writers to convince you to vote for their side.

This week's motion is:

Graph databases – in which relationships are stored natively alongside the data elements – do not provide a significant advantage over well-architected relational databases for most of the same use cases.

Arguing FOR the motion once again, with a counterpoint to yesterday's assertions from Jim Webber, is Andy Pavlo, associate professor of databaseology at Carnegie Mellon University.

It's all fun and games until things get ... Relational

My esteemed colleague is being coy in claiming that he does not know what a "well-architected" DBMS means. Nevertheless, allow me to remind him of the key characteristics of such a system. I will focus on analytical queries over graphs, as this is what graph DBMSs claim to be better at than relational DBMSs. My list below is also inspired by this CIDR 2023 paper [PDF] by researchers at CWI.

Fast Scans on Typed Data – The NoSQL movement misguided developers in thinking that a schemaless (i.e., schema-optional) database is a good idea. It is necessary for some scenarios, and most relational DBMSs now support JSON data types. But for ripping through data quickly, if the DBMS is aware of the schema and the data's types, it removes indirection and improves scan performance.
Columnar Storage – Storing data in a column-oriented manner has several benefits, including reducing disk I/O and memory overhead due to skipping unnecessary columns for a query. Columnar data also has lower entropy than row-oriented data, making it more amenable to compression schemes that do not require the DBMS to decompress it first.
Vectorized Query Execution – Using a vectorized processing model [PDF] (as opposed to a tuple-at-a-time model) improves DBMS performance for analytical queries by 10-100x. Processing data in batches also allows the DBMS to leverage vectorized CPU instructions (SIMD) to improve performance further.
Explicit Control of Memory – The DBMS needs complete control over its memory allocations. This means not using a "managed" memory runtime (eg, JVM, Erlang) where fragmentation and garbage collection will cause performance problems, nor should it allow the OS to determine what data to evict from the cache (eg, MMAP). The DBMS must also have fine-grained data placement to ensure that related information is located close to each other to improve CPU efficiency.

Andy Pavlo: "The mistake they made was to ignore database history and try to reinvent the wheel by abandoning the relational model"

There are some additional optimizations that are specific to graphs that a relational DBMS needs to incorporate:

Graph-centric Query APIs – The SQL:2023 standard introduces Property Graph Queries (SQL/PGQ) for defining and traversing graphs in a relational DBMS. SQL/PGQ is a subset of the emerging GQL standard. Thus, SQL/PGQ further narrows the functionality difference between relational DBMSs and native graph DBMSs.
Query Execution Enhancements – A well-architected relational DBMS will want to include enhancements specifically for optimizing graph queries, including multi-way worst-case optimal joins (WCOJ), compact ephemeral data structures (e.g., compressed sparse rows), and factorized query processing. Although adding these requires non-trivial engineering, such enhancements fit nicely with existing relational DBMS execution engines and do not require a new engine to be written from scratch.

As evidence for how a well-architected relational DBMS performs against a graph DBMS, I refer to the CIDR 2023 paper from CWI [PDF]. They extended the DuckDB embedded analytical relational DBMS to add support for SQL/PGQ and the above enhancements. Then, they compared it against a leading graph DBMS on an industry-standard graph benchmark. Their results show that DuckDB, with the above extensions, outperforms the native graph DBMS by up to 10x. These are state-of-the-art results from January 2023 and not from five years ago.

And although the CWI team includes some of the best database systems researchers in the world, they have not raised hundreds of millions of dollars to achieve these results.

Regarding my colleague's reference to Stonebraker's seminal 2006 paper that refutes "one size fits all" DBMS architectures, their anecdote that Neo4j arose from their attempt to use a relational DBMS in the 2000s for graph-centric workloads is evidence of Stonebraker's argument. But the mistake they made was to ignore database history and try to reinvent the wheel by abandoning the relational model. I encourage interested readers also to read Stonebraker's 2006 treatise [PDF] on the failure of alternative data models to supplant the relational data model since its invention in 1969. Graph databases are just another category of non-relational DBMSs that will eventually decline in popularity as relational DBMSs adopt the best parts of them (as they have with others in the past).

Lastly, I stand by my 2021 public wager about the future of the graph database market. I will replace my official CMU photo with one of me wearing a shirt that says "Graph Databases Are #1". I will use that photo until I retire, get fired, or a former student stabs me. ®

Cast your vote below. We'll close the poll on Thursday night and publish the final result on Friday. You can track the debate's progress here.

Topics

Special Features

Vendor Voice

Resources

Spotlight on Databases

Great Graph Database Debate: Abandoning the relational model is 'reinventing the wheel'

Let's talk about what well-architected looks like

It's all fun and games until things get ... Relational

More about

More about

Narrower topics

Broader topics

More about

More about

More about

Narrower topics

Broader topics

TIP US OFF

Other stories you might like

Cybercriminals threaten to leak all 5 million records from stolen database of high-risk individuals

Valkey publishes release candidate and attracts new backer

North American S/4HANA migrations ramping among SAP users

Industrial systems integrating digitalisation

Linux Foundation marshals support for open source alternative to Redis

Progress outbids private equity in offer for MariaDB plc

PostgreSQL pioneer's latest brainchild promises time travel to dodge ransomware

Beijing issues list of approved CPUs – with no Intel or AMD

Whistleblower raises alarm over UK Nursing and Midwifery Council's DB

Nutanix catapults IP theft sueball at DBaaS startup Tessell

Voltron Data revs up hyper-speed analytics, leaves Snowflake in the dust

Don't be like these 900+ websites and expose millions of passwords via Firebase

About Us

Our Websites

Your Privacy