Comment Data warehouses have always struggled with the performance of complex analytic queries. Data warehouse appliances are the latest answer to this problem. The question is: how well will they fare?
First, let's be clear about the meaning of analytics: the processing of queries against transaction-level data. For example: "which customers bought patio furniture within three weeks of purchasing a barbecue?" Note that we are interested in individual customers, not just how many there are. That is not to say there are no performance issues with aggregated queries, but the problem is more acute when granular detail is required, in part because of the sheer volumes of data that have to be searched.
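By way of illustration, here is a minimal sketch in Python of the kind of transaction-level query described above. The row layout, product names and dates are invented for the purpose and do not come from any particular product.

```python
# A minimal sketch of the kind of transaction-level query described above:
# customers who bought patio furniture within three weeks of buying a barbecue.
# The table layout and product names are illustrative only.

from datetime import date, timedelta

# (customer_id, product, purchase_date) -- transaction-level rows
transactions = [
    (1, "barbecue",        date(2005, 5, 1)),
    (1, "patio furniture", date(2005, 5, 15)),   # within three weeks -> qualifies
    (2, "barbecue",        date(2005, 5, 1)),
    (2, "patio furniture", date(2005, 7, 1)),    # too late -> does not qualify
    (3, "patio furniture", date(2005, 5, 10)),   # no barbecue at all
]

WINDOW = timedelta(weeks=3)

def qualifying_customers(rows):
    """Return the individual customer ids, not just a count."""
    barbecues = {}
    for cust, product, when in rows:
        if product == "barbecue":
            barbecues.setdefault(cust, []).append(when)
    result = set()
    for cust, product, when in rows:
        if product == "patio furniture":
            for bbq_date in barbecues.get(cust, []):
                if timedelta(0) <= when - bbq_date <= WINDOW:
                    result.add(cust)
    return result

print(qualifying_customers(transactions))   # {1}
```

On a real warehouse this means scanning every transaction row rather than a pre-built aggregate, which is exactly where the performance pain arises.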
This has long been recognised as a problem and various "analytic servers" have been put forward as solutions. The most common technique employed by the suppliers of these products has been the column-based relational database, though exotica such as vector-based databases have also been tried. While I have been a fan of these offerings for some time, the truth is that most of the suppliers (Sybase, with Sybase IQ, is an exception) have drifted into niche or different markets. Alterian, for example, focuses on marketing campaign analysis, Kx Systems on stock ticker data, and Sand, although relatively successful in this sector, is increasingly concentrating on its archiving capabilities. Meanwhile, Aruna has gone out of business and WhiteLight was acquired by SymphonyRPM.
In other words, and for whatever reason (marketing included), these solutions have largely failed to capture the attention of the buying public. Now a new approach is available, supplied by Netezza and Datallegro. These two companies offer data warehousing appliances that promise more (typically much more) performance and scalability, at lower cost, than conventional data warehousing solutions.
The big difference between these two vendors and their column-based rivals is that they use conventional relational databases, both open source: Netezza's is based on PostgreSQL and Datallegro's on Ingres. This is likely to make them much more acceptable to the average database administrator, because there is nothing unfamiliar to explain about how the products work in software terms.
However, the hardware is another matter, because the whole point of an appliance is that it blends hardware (Linux-based processors and disks) and software into a single package. The advantage of this approach is not only that you get everything from one vendor but also that the software is specifically optimised to run on the selected hardware, which yields much better performance. In fact there is more to it than that: you can, for example, implement software directly within the disk controllers, so that for some queries you can retrieve data at close to disk access speeds. To cut a long story short, the results are impressive.
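To see why putting software in the disk controller matters, here is a toy Python sketch that contrasts shipping every row to the host for filtering with filtering rows as they are read from each block. The block sizes, row format and product names are purely illustrative and do not describe either vendor's actual implementation.

```python
# A toy model of host-side versus controller-side filtering.
# The point of interest is how many rows have to cross the interconnect
# to the host; all names and numbers are invented for illustration.

import random

random.seed(0)

# Pretend each row is (customer_id, product, spend) and rows live in blocks on disk.
BLOCK_SIZE = 1000
blocks = [[(random.randrange(100000),
            random.choice(["barbecue", "patio furniture", "other"]),
            random.uniform(5, 500)) for _ in range(BLOCK_SIZE)]
          for _ in range(200)]

def host_side_filter(blocks):
    """Conventional path: every row is shipped to the host, then filtered there."""
    shipped = 0
    hits = []
    for block in blocks:
        shipped += len(block)                 # all rows cross the interconnect
        hits.extend(r for r in block if r[1] == "barbecue")
    return hits, shipped

def controller_side_filter(blocks):
    """Appliance-style path: the predicate runs where the data is read,
    so only matching rows are shipped to the host."""
    shipped = 0
    hits = []
    for block in blocks:
        matches = [r for r in block if r[1] == "barbecue"]
        shipped += len(matches)               # only matches cross the interconnect
        hits.extend(matches)
    return hits, shipped

h1, s1 = host_side_filter(blocks)
h2, s2 = controller_side_filter(blocks)
assert len(h1) == len(h2)                     # same answer either way
print(f"rows shipped to host: conventional={s1}, in-controller={s2}")
```

The answer is identical in both cases; the difference is how much data ever leaves the disks, which is where the appliance approach claims its speed advantage.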
The nice thing is that this is a well-trodden path. Going back to the days of Britton-Lee, and then with the likes of Teradata and WhiteCross, the idea of combining hardware and software, particularly in the data warehousing arena, is well established. Netezza and Datallegro should therefore not find it hard to win customers and, indeed, a number of well-known companies have already adopted the technology. It is too early to say that data appliances will take the data warehousing space by storm, but there is plenty of opportunity: many users are unhappy with the performance of their existing systems, while others have dismissed the whole concept as too expensive. Data appliances offer a potential solution to both groups.