Analysis

At its annual analyst conference last week, IBM announced its next-generation database. The big news is that this will not be a relational database. Or, to be more accurate, it will not be only a relational database.
IBM has concluded, rightly in my view, that a purely relational approach is not adequate for processing XML. Either you store XML in relational format, in which case you take a major performance hit because you have to convert it to and from tabular format whenever you store or retrieve it, or you store it as a binary large object (BLOB), in which case the database cannot process its contents.
Either way, relational storage is inadequate, and IBM has concluded that another approach is necessary. The company's next-generation database will therefore have two storage engines: a relational store and a native XML store. And let me be quite clear about this: these engines will be completely separate, with separate tablespaces, separate indexes (B-trees and so forth on the one hand, hierarchical indexes on the other), and so on.
On the other hand, all the database management functions (autonomics, the optimiser and so forth) will be held in common and will sit above the two engines. So there is a database management layer and two database storage engines. This raises the question of whether you might have more than two storage engines, to which the answer, in principle, is yes.
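To make the layering concrete, here is a minimal sketch of a shared management layer dispatching to separate, interchangeable storage engines. It is purely illustrative: every class and method name (`StorageEngine`, `RelationalEngine`, `XmlEngine`, `DatabaseManager`) is hypothetical, IBM has not published its internal interfaces, and this is not how the product is actually built.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class StorageEngine(ABC):
    """Hypothetical interface that every storage engine implements."""

    @abstractmethod
    def store(self, key: str, value) -> None: ...

    @abstractmethod
    def fetch(self, key: str): ...


class RelationalEngine(StorageEngine):
    """Keeps rows as tuples in its own private 'tablespace' (a dict here)."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple] = {}

    def store(self, key: str, value: tuple) -> None:
        self._rows[key] = value

    def fetch(self, key: str) -> tuple:
        return self._rows[key]


class XmlEngine(StorageEngine):
    """Keeps documents as parsed trees, so nothing is converted to rows."""

    def __init__(self) -> None:
        self._docs: dict[str, ET.Element] = {}

    def store(self, key: str, value: str) -> None:
        self._docs[key] = ET.fromstring(value)

    def fetch(self, key: str) -> ET.Element:
        return self._docs[key]


class DatabaseManager:
    """Shared layer above the engines; optimiser, autonomics etc. would live here."""

    def __init__(self) -> None:
        self._engines: dict[str, StorageEngine] = {}

    def register(self, name: str, engine: StorageEngine) -> None:
        # Any number of engines can be registered; two is not a hard limit.
        self._engines[name] = engine

    def store(self, engine: str, key: str, value) -> None:
        self._engines[engine].store(key, value)

    def fetch(self, engine: str, key: str):
        return self._engines[engine].fetch(key)


if __name__ == "__main__":
    db = DatabaseManager()
    db.register("relational", RelationalEngine())
    db.register("xml", XmlEngine())
    db.store("relational", "cust-1", ("Acme", "London"))
    db.store("xml", "order-1", "<order><item qty='2'>widget</item></order>")
    print(db.fetch("relational", "cust-1"))  # ('Acme', 'London')
    print(db.fetch("xml", "order-1").find("item").get("qty"))  # 2
```

The only point the sketch makes is structural: each engine owns its storage outright, callers go through one common layer, and the XML engine hands back a tree rather than rows, so no tabular conversion happens on the way in or out.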
As far as marketing is concerned, IBM has not yet decided on a name for the new product, which, incidentally, has been in alpha since June and will be entering beta shortly. It is likely that the XML storage engine will be offered as an optional extra, though there is obviously the possibility that you might want to license the XML database without the relational engine. As and when IBM moves the DB2 content repository to the new platform (something which has not been announced but which is an obvious next move), such a standalone XML offering could become available.
So much for the hard facts; now for some opinion. First, I think this leaves Oracle and Sybase (the two vendors with the best current handle on XML) well behind the curve, with Microsoft and the others more or less out of sight. This release will let you build applications that handle both XML and relational data much more easily, without losing any of the richness of either kind of data, and without degrading performance.
To a certain extent this release will help the few remaining vendors of pure XML databases, such as Software AG, Ipedo and Xyleme, because it validates native XML storage. However, apart from specialised applications, most users want to be able to combine transactional and XML data, which is what IBM is doing and these companies are not. This may change in the case of Software AG (see forthcoming article), but in the meantime, of the three companies mentioned, Xyleme is the most likely to benefit, as it is essentially a content management database vendor, whereas the other two are, at present, mostly focused on integration.
Finally, I expect Oracle, in particular, to froth at the mouth at this announcement. It will no doubt declare that this is the wrong direction to take. In my opinion it is Oracle that will be wrong: you simply cannot get the flexibility and performance that XML demands unless you are prepared to move away from a purely relational approach. So any frothing at the mouth will be exactly that: froth and bubble.