Thank heavens for the silicon chip: A BRIEF history of data

You have a pair of bones in Africa to thank for Larry Ellison

They call it ACID

We started performing transactions in computing in the 1960s, but ACID properties were only nailed down in 1993 – by Jim Gray and Andreas Reuter. To put that achievement into perspective, Gray received a Turing Award for this work. But this time scale means that we had happily been performing transactions for about 30 years without truly understanding them.
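
If you have never seen those properties spelled out in code, here is a minimal sketch using Python's built-in sqlite3 module. It is only a toy – it shows atomicity, consistency and durability on a single connection; isolation only reveals itself when connections compete.

    import sqlite3

    # A toy ledger: the transfer must be all-or-nothing (Atomicity).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 0)])
    conn.commit()

    try:
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
        # Consistency check: no account may go negative.
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
        conn.commit()      # Durability: once committed, the change survives
    except ValueError:
        conn.rollback()    # Atomicity: the half-finished transfer vanishes

    print(dict(conn.execute("SELECT name, balance FROM accounts")))
    # {'alice': 100, 'bob': 0} - the failed transfer left no trace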

Next up, it’s computing power. An image is an example of Big Data, and scanning a picture to find the faces, then matching those faces against a known set of images, takes a great deal of processing power – even setting aside the complexity of the code needed to do it. We simply didn’t have the cycles in the early days.

Fourthly, the code is complex. It has taken a great deal of computer science and fundamental mathematics to work out the best way to analyse some of this Big Data.

So, for a long time, we collected and used both kinds of data but only analysed the tabular. Then, gradually, the four factors mentioned above began to be resolved and the world became interested in analysing Big Data. New data models were developed, we started to become very interested in concepts like late-binding schemas and Big Data analysis blossomed as tabular analytics had done in the nineties.
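
By way of illustration, a late-binding (schema-on-read) store keeps the raw records untouched and only decides what the columns are when a query arrives. A hypothetical Python sketch of the idea:

    import json

    # Schema-on-read: raw records are stored as-is; structure is imposed
    # only when a query runs, so new fields cost nothing upfront.
    raw_store = [
        '{"user": "ann", "clicks": 12}',
        '{"user": "bob", "clicks": 7, "country": "UK"}',  # extra field, no migration
        '{"user": "cat"}',                                # missing field, still fine
    ]

    def query(store, field, default=None):
        """Bind a 'schema' (the fields we care about) at read time."""
        for line in store:
            record = json.loads(line)
            yield record.get(field, default)

    print(sum(query(raw_store, "clicks", 0)))  # 19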

The big question: has data finally come of age? Certainly not. While it’s no longer a babe in arms, it is still a youngster – think of it as a terrible teenager. Where will its life take it now?

I think that we will continue to see new data models and engines emerging. Why? Well, the whole point of a new data model is that it allows particular types of query to be expressed easily and to run rapidly. For example, graph databases let us query social network data very efficiently.
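
To see why, consider a friend-of-friend query. In a relational engine it means self-join after self-join; in a graph engine the connections are stored as adjacency lists, so the query is a simple traversal. A toy Python sketch of the graph approach – no claim that any particular engine works exactly this way:

    from collections import deque

    # A social network as an adjacency list - the natural shape for a graph engine.
    follows = {
        "ann": ["bob", "cat"],
        "bob": ["dan"],
        "cat": ["dan", "eve"],
        "dan": [],
        "eve": ["ann"],
    }

    def within_hops(graph, start, max_hops):
        """Everyone reachable from `start` in at most `max_hops` steps (BFS)."""
        seen, frontier = {start}, deque([(start, 0)])
        while frontier:
            person, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for friend in graph[person]:
                if friend not in seen:
                    seen.add(friend)
                    frontier.append((friend, hops + 1))
        return seen - {start}

    print(within_hops(follows, "ann", 2))  # friends and friends-of-friends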

This proliferation of models is wonderful, but it has a downside: the systems are becoming more specialised. More than once I have had to copy my data into two different engines simply to get the benefits of both. In the most recent case I was holding the same data in a relational engine in order to run SQL, and in a multi-dimensional engine in order to run MDX. I got the benefits of both languages and the strengths of the two engines, but at the cost of data duplication.
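
A hypothetical miniature of that situation, with sqlite3 playing the relational engine and a pandas pivot table standing in for the multi-dimensional one – note that the sales data is materialised twice:

    import sqlite3
    import pandas as pd

    sales = [("2023", "UK", 10), ("2023", "US", 20), ("2024", "UK", 5)]

    # Copy 1: relational engine, queried in SQL.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (year TEXT, region TEXT, amount INTEGER)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
    print(db.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())

    # Copy 2: the same rows again, pivoted into a cube-like structure
    # (pandas standing in for a multi-dimensional engine).
    cube = pd.DataFrame(sales, columns=["year", "region", "amount"]) \
             .pivot_table(index="year", columns="region",
                          values="amount", aggfunc="sum")
    print(cube)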

What we may – and should – see emerging are engines that hold the data in some complex internal format but allow it to be viewed and queried in multiple ways. InterSystems has an engine called Caché that already does this – it can simultaneously present the data it holds as relational and/or object-oriented. Others should follow. And, in a glorious irony given the name, some NoSQL systems are now appearing with SQL interfaces.
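
I make no claims about Caché's internals, but the principle is easy to sketch: one internal store, several façades. A toy Python illustration:

    # One internal store, two facades: a toy version of the multi-model idea.
    # (Purely illustrative - no claim about how Cache actually works.)

    class Person:
        def __init__(self, name, age):
            self.name, self.age = name, age

    class MultiModelStore:
        def __init__(self):
            self._rows = []                  # the single internal format

        def insert(self, name, age):
            self._rows.append((name, age))

        def as_relational(self):
            """Relational view: rows and columns, ready for SQL-style work."""
            return [{"name": n, "age": a} for n, a in self._rows]

        def as_objects(self):
            """Object-oriented view of exactly the same data."""
            return [Person(n, a) for n, a in self._rows]

    store = MultiModelStore()
    store.insert("ada", 36)

    print(store.as_relational()[0]["age"])   # 36 - queried as a table
    print(store.as_objects()[0].age)         # 36 - the same datum as an object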

I think these kinds of developments are an important indication of the way forward, as data moves into its next decade of life as an older, more experienced and more mature entity. ®
