The Great DB debate: SQL extensions won't solve the graph problem
Even the ISO committee that delivered SQL thinks graphs are different enough to warrant a full query language
Welcome back to the latest Register Debate, in which writers discuss technology topics and you, the reader, choose the winning argument.
The format is simple: we propose a motion, the arguments for the motion ran on Monday and Wednesday, and the arguments against on Tuesday and today. Read over the arguments: you have until tonight to cast your vote about which side you support using the poll embedded below, choosing whether you're in favor or against. The final score will be announced on Friday, revealing which argument was most popular.
It's up to our writers to convince you to vote for their side.
This week's motion is:
Graph databases – in which relationships are stored natively alongside the data elements – do not provide a significant advantage over well-architected relational databases for most of the same use cases.
Arguing AGAINST the motion for the second time is Neo4j chief scientist and professor of computer science Jim Webber, slapping down databaseology prof Andy Pavlo's argument FOR the motion yesterday.
Papers? We've read a few
I welcome the House's return to this debate. It could be pointed out that the delay is somewhat convenient since my opponent has been able to wait for a paper at CIDR 2023 about graph analytics, a close cousin of graph databases.
The House draws extensively on that paper from CWI in its response. It's a fortunate coincidence for the House that the paper's authors share its opinions. Nonetheless, the House's claim that graph databases "focus on analytical queries over graphs" is plainly wrong. In reality, most graph database workloads consist of concurrent reads and writes in online systems, which DuckDB is not designed for. Indeed, another paper at CIDR 2023 by equally eminent researchers points out that "The workloads of these application (sic) require several storage and processing features that existing RDBMSs are generally not optimized for."
Still, the House's peculiar conflation of graph databases and static graph analytics strengthens my point that different workloads need different data structures and algorithms. In fact, Neo4j has had a similar graph analytics system (Neo4j Graph Data Science) in production use since 2020. It is unfortunate that it wasn't used for comparison in the CWI paper, as pitting a graph database against a graph compute module on a compute benchmark is not good science.
The implementation advice given by the paper is reasonable, but much of it is already standard in graph databases. For example, some graph databases are schema-first and can use the schema both to help query planning and to optimize disk use, yet being schema-optional is a huge productivity boost for systems developers. Column storage is certainly a reasonable way of storing properties, but locality benefits can accrue from other storage strategies too. Parallelism for analytical queries makes sense (Neo4j already does it), though it is not always the best choice for the most common OLTP use cases. Finally, any serious DBMS uses a mixture of memory management techniques, including allocating native memory, regardless of language or platform choice.
When it comes to APIs, SQL has been able to subsume other data models over the years, but GQL is the pending standard for graphs. GQL is overseen by the same ISO committee that delivered SQL. If SQL extensions were enough to solve the graph problem, I would trust this committee to halt its work. Instead, a learned body has decided that graphs are different enough to warrant a full query language, rather than only an expedient stop-gap in the form of SQL/PGQ.
The House rightly points out that a future well-architected DBMS will want to include graph optimizations. Unfortunately, there is little implementation support in current relational databases, and, as the House says, this is a non-trivial challenge in any case. I'm happy that our efforts have helped to provide such impetus to our academic colleagues.
As the House ends on a public wager, I'll recount my own. In late 2010, I visited former colleagues at the University of Sydney, Australia. I gave a talk on graph databases and ended it by lightheartedly saying something like, "This technology category is going to catch on. You're going to ignore it for now, but in about a decade you will become interested and start telling us that we've done it all wrong."
I lost that wager: it evidently took two years longer than I'd forecast. ®
Cast your vote below. We'll close the poll later tonight and publish the final result on Friday. You can track the debate's progress here.