
Using SQL techniques in NoSQL is OK, right? WRONG

Don't try and cut corners, or you'll end up in a twisty, tangly mess

Agh, ACID!

As developers, we are used to ordering the output from a SQL statement, but here again CQL has limitations: you can only order on the clustering columns of the compound primary key, and only in the order in which they were declared.

And if that sentence doesn't make sense, then you are beginning to see why you can't just let your SQL programmer loose on a NoSQL database without a good understanding of the data model.
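To make that sentence concrete, here is a minimal CQL sketch (table and column names are illustrative, not from any real schema):

```sql
-- A compound primary key: user_id is the partition key,
-- posted_at is the clustering column.
CREATE TABLE posts (
    user_id   int,
    posted_at timestamp,
    body      text,
    PRIMARY KEY (user_id, posted_at)
);

-- Valid: ordering on the clustering column.
SELECT * FROM posts WHERE user_id = 42 ORDER BY posted_at DESC;

-- Invalid: body is not a clustering column, so CQL rejects this.
-- SELECT * FROM posts WHERE user_id = 42 ORDER BY body;
```

The restriction exists because rows are physically stored in clustering-column order within a partition; Cassandra never sorts at query time the way a relational database will.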

Although not part of the relational model, SQL programmers have become reliant on ACID (atomicity, consistency, isolation, durability) transactions to keep their data durable and consistent.

Transactions mean that you don't need to worry about two users trying to pick the same user name at the same time – your transaction will make sure only one can succeed (even if you don't actually specify a transaction, the insert statement will do it for you).

That's great if the data is on one box, but what happens if the cluster is distributed around the world and your users are likewise distributed? Either your system chooses some sort of distributed locking mechanism (complicated and difficult) or you just throw ACID away and hope for the best.

This is exactly what most NoSQL systems do. Some, such as graph databases, do keep ACID transactions, but generally you can't distribute them anyway. This is where eventual consistency comes in: using our user name example, both users pick the name they want. It appears to succeed, but actually only one user gets the name they want and the other is left with a puzzle – the system said they got their user name but now they can't log on.

Your developers need to be aware of this and build such locks into the application, or they can take advantage of lightweight transactions in Cassandra, which are built on the Paxos consensus protocol.

These are designed to solve exactly the problem just outlined. Only one of the username registrations will succeed, but there is a huge performance hit while the system gets around eventual consistency. Your NoSQL developer must be aware of when to use these transactions and when not to.
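In CQL, a lightweight transaction is just a conditional insert (again, the table here is a hypothetical sketch):

```sql
-- A hypothetical users table keyed by username.
CREATE TABLE users (
    username text PRIMARY KEY,
    email    text
);

-- The IF NOT EXISTS clause makes this a lightweight transaction:
-- the insert only succeeds if no row with this username exists.
-- Cassandra runs a round of Paxos across replicas to guarantee it,
-- which is where the performance hit comes from.
INSERT INTO users (username, email)
VALUES ('alice', 'alice@example.com')
IF NOT EXISTS;
```

The result set tells the application whether the insert was `[applied]`, so the losing user can be told immediately to pick another name.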

But wait: it's not all bad news. These SQL-like languages don't just remove features. They can take advantage of the underlying data structure to add features you won't normally find in a SQL language.

Take for instance "Time To Live" in Cassandra, which makes your data disappear automatically after a set period. This lets a developer add features that would be painful to build in a traditional relational database – say, photos a user uploads that vanish after a week.
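In CQL, TTL is a single clause on the write (a sketch with made-up table and column names):

```sql
-- USING TTL sets the lifetime in seconds; this row expires
-- one day (86400 seconds) after it is written. No cleanup
-- job is needed – Cassandra removes it for you.
INSERT INTO photos (user_id, photo_id, url)
VALUES (42, uuid(), 'http://example.com/pic.jpg')
USING TTL 86400;
```

Compare that with the relational equivalent: a timestamp column, a scheduled delete job, and queries that must all remember to filter out expired rows.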

Then there are Maps, Lists, and Sets as fields in the table. These let the developer store many attributes under one field: users could, for example, keep multiple email addresses in a single column of their user record. If you want, these can map directly on to your application's data structures, making persisting state easy.
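The email-addresses example looks like this in CQL (schema is illustrative):

```sql
-- A set column holds many values in one field.
CREATE TABLE user_profiles (
    username text PRIMARY KEY,
    emails   set<text>
);

INSERT INTO user_profiles (username, emails)
VALUES ('alice', {'alice@example.com', 'alice@work.example'});

-- Adding another address updates the collection in place,
-- with no join table and no second query.
UPDATE user_profiles
SET emails = emails + {'alice@home.example'}
WHERE username = 'alice';
```

In a normalised relational schema the same feature means a separate `emails` table, a foreign key, and a join on every read.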

The bottom line here is to train your developers to understand that even if it looks like SQL and quacks like SQL, if it's running on a NoSQL database then it isn't SQL.

Official training on the product can help of course. Even if it's expensive, it's possibly the best way forward. Relying on tutorials on the web is dangerous, as these often go out of date very quickly and often don't state the software version they refer to. I have spent many an hour trying to follow a tutorial, only to find that the latest version of the software doesn't implement the feature in quite the way described.

As with everything on the internet, free is never quite what it's cracked up to be. ®
