Using SQL techniques in NoSQL is OK, right? WRONG

Don't try and cut corners, or you'll end up in a twisty, tangly mess

Now that we have CQL for Cassandra and N1QL for Couchbase, it seems the power of SQL is being opened for NoSQL database developers.

There is, however, a real danger that developers who see these SQL-like languages as an implementation of SQL are in for a world of pain as they try to leverage their SQL skills on a NoSQL database.

Unless the developer has a correct understanding of the underlying data model, then almost-SQL-like statements will at best be slow, and will at worst produce incorrect results. Worse still, there are things in SQL you just can't do in CQL, and a lot in CQL that makes no sense in SQL. How can you avoid these problems?

It's no surprise to find that the first thing NoSQL database programmers did in the search for speed and flexibility was throw away the SQL language. This gave them what they were looking for, but the price to be paid was a lack of a standard interface, which made it complicated and difficult to learn.

These interfaces did allow a lot of flexibility, but often the interface library was from a third-party developer, which could lead to a lack of support.

The introduction of SQL-like languages helped alleviate the problem, allowing the SQL programmer to transfer knowledge straight to NoSQL databases. Further, the language is developed by the database programmers, giving some sort of support from official sources. So how like SQL are these languages? Using Cassandra's CQL language you can select data with:

Select * from table where key="Name";

And insert data with:

insert into Sensors (name,insertion_time,reading) values (5f388896-71c4-402c-b7c6-8610e305422f,'2015-01-01',35);

Which looks fabulous: you can replace your expensive fibre-channel servers with cheap commodity clusters, transfer the data and enjoy blazingly fast query responses and data inserts. Except, of course, you can't.

Although the open source developers are doing a fantastic job of making their data look like regular relational tables, the data structure underneath will be far different to anything you'll find in SQL server or Oracle – or anything else, really.

There are good reasons NoSQL is so fast (and distributable), mainly that these engines have done away with table joins. As developers, we are used to working with normalised tables and joining them on the fly with primary and secondary keys.

The engine takes care of the joins and a correctly indexed table will be reasonably performant. However, in a NoSQL cluster, the data is likely to be distributed across a number of machines, so joining tables will take time. As a consequence, the databases won't let you do it. Sure you can add a system on top of the data (Hadoop or Spark), or you can roll your own in the application, but you will have to take that performance hit.

All of which leads to one inescapable fact: with a NoSQL system, your design team can't start off with an entity relationship diagram, and modelling data on a graph store, a column store or a key value store is always going to be very different.

Sure, the SQL-like language may make the data look like it's stored tables, but actually it's going to be very different. Besides the little matter of not being able to join tables, this can effect how we can select data.

Most developers will be accustomed to using the SQL "where" statement to select rows that match a certain criteria. CQL, for instance, does implement a where statement, but you can only use it to select rows on the primary and some indexed keys. You can add indexes to columns, but unlike a SQL index, these do not speed up queries, but rather slow them down. They do allow more where queries to be performed, but your performance will suffer.

Worse, the order of conditionals (and things such as statements) is important, and you can only effectively use them on the clustering column. You can get round this by using a keyword (Allow filtering), but this puts a huge burden on the servers – the entire column is read into memory and then filtered. Bang goes your blistering performance.

Broader topics

Other stories you might like

  • Talos names eight deadly sins in widely used industrial software
    Entire swaths of gear relies on vulnerability-laden Open Automation Software (OAS)

    A researcher at Cisco's Talos threat intelligence team found eight vulnerabilities in the Open Automation Software (OAS) platform that, if exploited, could enable a bad actor to access a device and run code on a targeted system.

    The OAS platform is widely used by a range of industrial enterprises, essentially facilitating the transfer of data within an IT environment between hardware and software and playing a central role in organizations' industrial Internet of Things (IIoT) efforts. It touches a range of devices, including PLCs and OPCs and IoT devices, as well as custom applications and APIs, databases and edge systems.

    Companies like Volvo, General Dynamics, JBT Aerotech and wind-turbine maker AES are among the users of the OAS platform.

    Continue reading
  • Despite global uncertainty, $500m hit doesn't rattle Nvidia execs
    CEO acknowledges impact of war, pandemic but says fundamentals ‘are really good’

    Nvidia is expecting a $500 million hit to its global datacenter and consumer business in the second quarter due to COVID lockdowns in China and Russia's invasion of Ukraine. Despite those and other macroeconomic concerns, executives are still optimistic about future prospects.

    "The full impact and duration of the war in Ukraine and COVID lockdowns in China is difficult to predict. However, the impact of our technology and our market opportunities remain unchanged," said Jensen Huang, Nvidia's CEO and co-founder, during the company's first-quarter earnings call.

    Those two statements might sound a little contradictory, including to some investors, particularly following the stock selloff yesterday after concerns over Russia and China prompted Nvidia to issue lower-than-expected guidance for second-quarter revenue.

    Continue reading
  • Another AI supercomputer from HPE: Champollion lands in France
    That's the second in a week following similar system in Munich also aimed at researchers

    HPE is lifting the lid on a new AI supercomputer – the second this week – aimed at building and training larger machine learning models to underpin research.

    Based at HPE's Center of Excellence in Grenoble, France, the new supercomputer is to be named Champollion after the French scholar who made advances in deciphering Egyptian hieroglyphs in the 19th century. It was built in partnership with Nvidia using AMD-based Apollo computer nodes fitted with Nvidia's A100 GPUs.

    Champollion brings together HPC and purpose-built AI technologies to train machine learning models at scale and unlock results faster, HPE said. HPE already provides HPC and AI resources from its Grenoble facilities for customers, and the broader research community to access, and said it plans to provide access to Champollion for scientists and engineers globally to accelerate testing of their AI models and research.

    Continue reading

Biting the hand that feeds IT © 1998–2022