How NoSQL graph databases still usurp relational dynasties

Turn the tables on tables

Analysis Despite being assaulted from all sides, the relational model for databases is still the king of the hill and it looks like it will not only survive, but thrive as well.

NoSQL databases have become increasingly popular and have been offering a number of data and deployment modes that have overcome the limitations – real or imagined – of their SQL cousins.

NoSQL databases come in a number of guises, but essentially they are designed either to make the life of the programmer easier or to overcome the problem of distributing data at scale.

Models such as key-value stores (where the value can be anything from a simple string to a complex serialized blob) can assist programmers who want to store a lot of simple data quickly, retrieve it at speed, and in many cases don't really care if it persists in the case of a power cut.

Document databases store JSON-style data whilst indexing it for easy retrieval, allowing the programmer to map RESTFUL interfaces to a store without the hassle of converting it to relational format. Programmers call this impedance mismatch, after the term in electronics which stops you plugging any old instrument into the microphone socket of a recording device. The model inside the programmer's head is so very different from the relational model that it's a real hassle to convert between the two, so use a store that matches the model more closely.

Distributing data at scale is harder for relational databases, but curiously it's not because they are relational. The problem is that ACID (atomicity, consistency, isolation, durability) transactions (which are essential if you want your database to be consistent and giving correct answers) are very hard to do at scale, locking tables across continents is not easy, and the process is made harder as you may have multiple write points.

There are ways to do it, two-phase commit or consensus-do work, but they are clunky and will slow your system down. Scalable NoSQL databases solve the problem by either throwing transactions out or by admitting that your distributed system may not give the same answer across all nodes, or may simply refuse to give the answer.

It's not perfect, but at the moment it's what we have.

Where's the response?

Traditional database vendors, though, are fighting back. Microsoft's SQL server (as of version 2016) offers a way to store and retrieve JSON data in a relatively painless way, although the data itself is stored in the relational engine. You should get all the benefits of a SQL engine without that nasty impedance mismatch programmers hate. Oracle's NoSQL goes further, storing JSON or name value pairs across a sharded, shared nothing system.

Among the newer breed of tech firm – those in the cloud – Google has tackled distributed transactions with Spanner, and with F1 built on top of it, allows a SQL-like environment to be distributed (although it makes use of atomic clocks for time synchronization).

There is, however, one problem that relational databases are simply not good at: graphs. Here's a problem: design a relational database that supports threaded message boards that allow threads of any depth that can be queried from any message to give parents, children and siblings.

I've no doubt that many readers can do it and it's obvious from forum software that it can be done, but solutions are complex and can be hard to understand. This is just a special case of a graph where any node can be joined to any other node with a direction attached to the join.

Actually the joins aren't joins, they are edges; if they were joins then we might think of them as relational joins, which they aren't, as they have a direction which a relational join doesn't. In any case, a network of nodes is very hard (but not impossible) to model in a relational database, and can suffer from massive performance problems. Graph databases solve it by actually linking nodes with software edges. Think of your network of friends – you're in the center (naturally), you've got close friends you really like, some of the friends of your friends are OK, and there are some you dislike even if they seem to like each other. Draw it out and you'll soon see how complicated it is; graph databases eat this sort of data up.

Most modern graph databases can handle a huge number of nodes and edges, both of which can have a large number of properties, and better still, some of them even handle transactions. Neo4j, for instance, can handle tens of billions of nodes and relationships, although speed of querying will depend on the amount of RAM on your server.

Distributed graph databases such as Titan can be tuned to support ACID or eventual consistency and can be configured to run on top of popular NoSQL databases. If you look long enough down the back of your sofa, you might even get enough money to buy a Cray graph database appliance, or you could ask international security agencies if you could borrow theirs.

Although the underlying graph is easy to understand in both systems, the only downside is you will have to learn a new querying language, as SQL really isn't built to query this type of structure. Each of the databases has their own language, so choose wisely based on what it is you're trying to do. Also, you need to look at the architecture – will you need vast distribution or will a couple of high-power servers with lots of memory do the job?

Relational engines might someday find a way to optimize graph-style queries, but writing a SQL statement that will find all my friends' friends that are two nodes away should always cause a program a severe migraine. Graph databases and their languages, on the other hand, are specifically designed for the task. JSON, name value pairs, and columns stores could all get eaten by SQL engines and the problem of distributing transactions might be solved, but it's hard to see the relational engine eating the graph problem any time soon. ®

Other stories you might like

  • Experts: AI should be recognized as inventors in patent law
    Plus: Police release deepfake of murdered teen in cold case, and more

    In-brief Governments around the world should pass intellectual property laws that grant rights to AI systems, two academics at the University of New South Wales in Australia argued.

    Alexandra George, and Toby Walsh, professors of law and AI, respectively, believe failing to recognize machines as inventors could have long-lasting impacts on economies and societies. 

    "If courts and governments decide that AI-made inventions cannot be patented, the implications could be huge," they wrote in a comment article published in Nature. "Funders and businesses would be less incentivized to pursue useful research using AI inventors when a return on their investment could be limited. Society could miss out on the development of worthwhile and life-saving inventions."

    Continue reading
  • Declassified and released: More secret files on US govt's emergency doomsday powers
    Nuke incoming? Quick break out the plans for rationing, censorship, property seizures, and more

    More papers describing the orders and messages the US President can issue in the event of apocalyptic crises, such as a devastating nuclear attack, have been declassified and released for all to see.

    These government files are part of a larger collection of records that discuss the nature, reach, and use of secret Presidential Emergency Action Documents: these are executive orders, announcements, and statements to Congress that are all ready to sign and send out as soon as a doomsday scenario occurs. PEADs are supposed to give America's commander-in-chief immediate extraordinary powers to overcome extraordinary events.

    PEADs have never been declassified or revealed before. They remain hush-hush, and their exact details are not publicly known.

    Continue reading
  • Stolen university credentials up for sale by Russian crooks, FBI warns
    Forget dark-web souks, thousands of these are already being traded on public bazaars

    Russian crooks are selling network credentials and virtual private network access for a "multitude" of US universities and colleges on criminal marketplaces, according to the FBI.

    According to a warning issued on Thursday, these stolen credentials sell for thousands of dollars on both dark web and public internet forums, and could lead to subsequent cyberattacks against individual employees or the schools themselves.

    "The exposure of usernames and passwords can lead to brute force credential stuffing computer network attacks, whereby attackers attempt logins across various internet sites or exploit them for subsequent cyber attacks as criminal actors take advantage of users recycling the same credentials across multiple accounts, internet sites, and services," the Feds' alert [PDF] said.

    Continue reading

Biting the hand that feeds IT © 1998–2022