How NoSQL graph databases still usurp relational dynasties

Turn the tables on tables

Analysis Despite being assaulted from all sides, the relational model for databases is still the king of the hill and it looks like it will not only survive, but thrive as well.

NoSQL databases have become increasingly popular, offering a number of data and deployment models that overcome the limitations – real or imagined – of their SQL cousins.

NoSQL databases come in a number of guises, but essentially they are designed either to make the life of the programmer easier or to overcome the problem of distributing data at scale.

Models such as key-value stores (where the value can be anything from a simple string to a complex serialized blob) can assist programmers who want to store a lot of simple data quickly, retrieve it at speed, and in many cases don't really care if it persists in the case of a power cut.

Document databases store JSON-style data whilst indexing it for easy retrieval, allowing the programmer to map RESTful interfaces to a store without the hassle of converting the data to relational format. Programmers call this hassle impedance mismatch, after the term in electronics for the thing that stops you plugging any old instrument into the microphone socket of a recording device. The model inside the programmer's head is so very different from the relational model that it's a real chore to convert between the two, so the advice is to use a store that matches the model more closely.
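To make the mismatch concrete, here is a minimal Python sketch. The order document, its field names and the two-table layout are all invented for illustration and do not come from any particular product. The same object is either split into rows for two relational tables or serialized as one JSON document.

```python
import json

# Hypothetical order as it might arrive over a RESTful interface.
ORDER = {
    "id": 42,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def to_relational(order):
    """Split one nested object into rows for two assumed tables."""
    # One row for an "orders" table, with the customer fields flattened in.
    order_row = (order["id"], order["customer"]["name"], order["customer"]["email"])
    # One row per item for an "order_lines" table, keyed back to the order.
    line_rows = [
        (order["id"], position, item["sku"], item["qty"])
        for position, item in enumerate(order["items"])
    ]
    return order_row, line_rows


def to_document(order):
    """Store the object as a single JSON document, shape unchanged."""
    return json.dumps(order, sort_keys=True)
```

Reading the order back from the relational form needs a join and some reassembly code, whereas the document form only needs `json.loads`.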

Distributing data at scale is harder for relational databases, but curiously it's not because they are relational. The problem is that ACID (atomicity, consistency, isolation, durability) transactions, which are essential if you want your database to stay consistent and give correct answers, are very hard to do at scale. Locking tables across continents is not easy, and the process is made harder still because you may have multiple write points.

There are ways to do it (two-phase commit and consensus protocols both work), but they are clunky and will slow your system down. Scalable NoSQL databases solve the problem by either throwing transactions out or by admitting that your distributed system may not give the same answer across all nodes, or may simply refuse to give the answer.
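For readers who haven't met two-phase commit, the idea can be sketched in a few lines of Python. This is a toy, in-memory illustration with a made-up `Participant` class; it ignores timeouts, crashes and network failures, which are exactly what make the real thing clunky.

```python
class Participant:
    """A hypothetical node holding one copy of the data."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # set False to simulate a "no" vote
        self.staged = None
        self.data = {}

    def prepare(self, key, value):
        # Phase 1: stage the write and vote yes or no.
        if not self.can_commit:
            return False
        self.staged = (key, value)
        return True

    def commit(self):
        # Phase 2, success path: make the staged write visible.
        key, value = self.staged
        self.data[key] = value
        self.staged = None

    def rollback(self):
        # Phase 2, failure path: discard anything that was staged.
        self.staged = None


def two_phase_commit(participants, key, value):
    """Return True if every participant committed, False if all rolled back."""
    # The coordinator must hear from every node before anyone may commit.
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

Every write waits for a round trip to every participant before it can complete, which hints at why the approach slows a widely distributed system down.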

It's not perfect, but at the moment it's what we have.

Where's the response?

Traditional database vendors, though, are fighting back. Microsoft's SQL Server (as of version 2016) offers a way to store and retrieve JSON data in a relatively painless way, although the data itself is stored in the relational engine. You should get all the benefits of a SQL engine without that nasty impedance mismatch programmers hate. Oracle's NoSQL goes further, storing JSON or name-value pairs across a sharded, shared-nothing system.

Among the newer breed of tech firm – those in the cloud – Google has tackled distributed transactions with Spanner and, with F1 built on top of it, allows a SQL-like environment to be distributed (although it makes use of atomic clocks for time synchronization).

There is, however, one problem that relational databases are simply not good at: graphs. Here's a problem: design a relational database that supports threaded message boards that allow threads of any depth that can be queried from any message to give parents, children and siblings.

I've no doubt that many readers can do it and it's obvious from forum software that it can be done, but solutions are complex and can be hard to understand. This is just a special case of a graph where any node can be joined to any other node with a direction attached to the join.
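One common relational answer, sketched here with Python's built-in `sqlite3` module, is an adjacency list (each message stores its parent's id) queried with recursive common table expressions. The table name, column names and sample rows are assumptions made up for this example; it is one possible design, not the only one.

```python
import sqlite3


def build_board():
    """Create an in-memory board with a small sample thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES message(id),"
        " body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO message (id, parent_id, body) VALUES (?, ?, ?)",
        [
            (1, None, "root"),
            (2, 1, "reply A"),
            (3, 1, "reply B"),
            (4, 2, "reply to A"),
            (5, 4, "deeper reply"),
        ],
    )
    return conn


def ancestors(conn, message_id):
    """Ids of all parents, nearest first, found by walking up the thread."""
    sql = """
        WITH RECURSIVE up(id, parent_id, depth) AS (
            SELECT id, parent_id, 0 FROM message WHERE id = ?
            UNION ALL
            SELECT m.id, m.parent_id, up.depth + 1
            FROM message m JOIN up ON m.id = up.parent_id
        )
        SELECT id FROM up WHERE depth > 0 ORDER BY depth
    """
    return [row[0] for row in conn.execute(sql, (message_id,))]


def descendants(conn, message_id):
    """Ids of all children at any depth, shallowest first."""
    sql = """
        WITH RECURSIVE down(id, depth) AS (
            SELECT id, 0 FROM message WHERE id = ?
            UNION ALL
            SELECT m.id, down.depth + 1
            FROM message m JOIN down ON m.parent_id = down.id
        )
        SELECT id FROM down WHERE depth > 0 ORDER BY depth, id
    """
    return [row[0] for row in conn.execute(sql, (message_id,))]


def siblings(conn, message_id):
    """Ids of other messages sharing the same parent."""
    sql = """
        SELECT s.id FROM message s
        JOIN message me ON me.id = ?
        WHERE s.parent_id = me.parent_id AND s.id <> me.id
        ORDER BY s.id
    """
    return [row[0] for row in conn.execute(sql, (message_id,))]
```

It works, but three different query shapes are needed for what is conceptually one question about a node's neighbours, which gives a flavour of the complexity described above.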

Actually the joins aren't joins, they are edges; if they were joins then we might think of them as relational joins, which they aren't, as they have a direction which a relational join doesn't. In any case, a network of nodes is very hard (but not impossible) to model in a relational database, and can suffer from massive performance problems. Graph databases solve it by actually linking nodes with software edges. Think of your network of friends – you're in the center (naturally), you've got close friends you really like, some of the friends of your friends are OK, and there are some you dislike even if they seem to like each other. Draw it out and you'll soon see how complicated it is; graph databases eat this sort of data up.
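As a rough picture of what "nodes linked by directed edges" means, here is a tiny in-memory Python sketch. The `Graph` class and the sample friend network are hypothetical and bear no relation to how any real graph database stores its data; the point is only that each edge has a direction and can carry its own properties.

```python
from collections import defaultdict


class Graph:
    """A toy directed graph whose edges carry properties."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # node -> [(target, properties)]
        self.in_edges = defaultdict(list)   # node -> [(source, properties)]

    def add_edge(self, source, target, **properties):
        # Record the edge under both endpoints so either direction is cheap.
        self.out_edges[source].append((target, properties))
        self.in_edges[target].append((source, properties))

    def outgoing(self, node, **match):
        """Targets of edges leaving node whose properties match."""
        return [
            target
            for target, props in self.out_edges.get(node, [])
            if all(props.get(k) == v for k, v in match.items())
        ]

    def incoming(self, node, **match):
        """Sources of edges arriving at node whose properties match."""
        return [
            source
            for source, props in self.in_edges.get(node, [])
            if all(props.get(k) == v for k, v in match.items())
        ]


def friend_network():
    """Build a small made-up network with 'you' at the center."""
    g = Graph()
    g.add_edge("you", "alice", feeling="likes")
    g.add_edge("you", "bob", feeling="dislikes")
    g.add_edge("alice", "bob", feeling="likes")
    g.add_edge("bob", "alice", feeling="likes")
    return g
```

Here `friend_network().outgoing("you", feeling="likes")` returns only alice, while `incoming("bob")` lists everyone with an opinion about bob, whatever that opinion is.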

Most modern graph databases can handle a huge number of nodes and edges, both of which can have a large number of properties, and better still, some of them even handle transactions. Neo4j, for instance, can handle tens of billions of nodes and relationships, although speed of querying will depend on the amount of RAM on your server.

Distributed graph databases such as Titan can be tuned to support ACID or eventual consistency and can be configured to run on top of popular NoSQL databases. If you look long enough down the back of your sofa, you might even get enough money to buy a Cray graph database appliance, or you could ask international security agencies if you could borrow theirs.

Although the underlying graph is easy to understand in both systems, the only downside is you will have to learn a new querying language, as SQL really isn't built to query this type of structure. Each of the databases has its own language, so choose wisely based on what it is you're trying to do. Also, you need to look at the architecture – will you need vast distribution or will a couple of high-power servers with lots of memory do the job?

Relational engines might someday find a way to optimize graph-style queries, but writing a SQL statement that will find all my friends' friends that are two nodes away should always cause a programmer a severe migraine. Graph databases and their languages, on the other hand, are specifically designed for the task. JSON, name-value pairs, and column stores could all get eaten by SQL engines and the problem of distributing transactions might be solved, but it's hard to see the relational engine eating the graph problem any time soon. ®
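For the curious, the friends-of-friends question can be sketched both ways in Python. The friendship data, the `friend` table and both function names are invented for illustration. The first function asks the question in SQL through the built-in `sqlite3` module; the second walks an in-memory adjacency map, which is closer in spirit to how a graph traversal reads.

```python
import sqlite3

# Hypothetical, undirected friendships.
FRIENDSHIPS = [
    ("me", "ann"),
    ("me", "bob"),
    ("ann", "bob"),
    ("ann", "cat"),
    ("bob", "cat"),
    ("bob", "dan"),
]


def friends_of_friends_sql(person):
    """People exactly two hops away, found with a self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friend (a TEXT, b TEXT)")
    # Store each friendship in both directions so the join is symmetric.
    both_ways = FRIENDSHIPS + [(b, a) for a, b in FRIENDSHIPS]
    conn.executemany("INSERT INTO friend VALUES (?, ?)", both_ways)
    sql = """
        SELECT DISTINCT f2.b
        FROM friend f1 JOIN friend f2 ON f2.a = f1.b
        WHERE f1.a = :p
          AND f2.b <> :p
          AND f2.b NOT IN (SELECT b FROM friend WHERE a = :p)
        ORDER BY f2.b
    """
    rows = conn.execute(sql, {"p": person}).fetchall()
    conn.close()
    return [row[0] for row in rows]


def friends_of_friends_graph(person):
    """The same question answered by hopping along edges."""
    adjacency = {}
    for a, b in FRIENDSHIPS:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    direct = adjacency.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= adjacency[friend]
    # Drop the person themselves and anyone who is already a direct friend.
    return sorted(two_hops - direct - {person})
```

At a fixed depth of two, the SQL needs one self-join plus a subquery to exclude direct friends. Each extra hop adds another join, whereas the traversal version only needs another pass of the same loop.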

Biting the hand that feeds IT © 1998–2021