Teradata update chews up, spits out columns
Plus: SQL-MapReduce alliance appliance
Data warehousing pioneer Teradata is turning up the dial on its eponymous parallel database to 14, adding in a slew of new features that include the ability to process columnar data as well as the more standard row-based chewing in relational databases. The company has also released an appliance running the hybrid row/column nCluster database that it got through its acquisition of Aster Data in March for $263m.
Scott Gnau, president of Teradata Labs, tells The Reg that the columnar support in Teradata 14 is not based on Aster Data nCluster, but is rather a tweak of the Teradata database and its underlying file system. The work on this feature predates the Aster deal, and has been in development for the past 18 months.
"We figured out a better mousetrap to provide columnar structures in the database," says Gnau. "This is a really big deal."
Columnar databases have been around for three decades, says Gnau, and can provide a significant speed-up for accessing and – sometimes more importantly – compressing certain kinds of data stored in a database. For example, if your iTunes library could be organized into columns, you could search the Beatles column and get to a tune very quickly in the database. Otherwise, you'd have to search each song in the database for an artist named Beatles.
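To make the iTunes analogy concrete, here is a minimal Python sketch of the same lookup against a row layout and a column layout. It is a toy under stated assumptions: the three-song library, the field names, and the helper functions are all invented for illustration, and none of it reflects how Teradata actually lays data out on disk.

```python
# Toy illustration only: not Teradata's file system or storage format.
# The song list and field names are made up for this example.
rows = [
    {"title": "Hey Jude", "artist": "Beatles", "year": 1968},
    {"title": "Paint It Black", "artist": "Rolling Stones", "year": 1966},
    {"title": "Let It Be", "artist": "Beatles", "year": 1970},
]


def find_titles_row_store(rows, artist):
    """Row layout: every whole record is touched just to test one field."""
    return [row["title"] for row in rows if row["artist"] == artist]


def to_column_store(rows):
    """Pivot the records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def find_titles_column_store(columns, artist):
    """Column layout: scan only the artist column, then fetch matching titles."""
    hits = [i for i, value in enumerate(columns["artist"]) if value == artist]
    return [columns["title"][i] for i in hits]


columns = to_column_store(rows)
print(find_titles_row_store(rows, "Beatles"))
print(find_titles_column_store(columns, "Beatles"))
```

Both calls return the same titles. The difference is that the column version never looks at the year values at all, which is where the speed-up comes from on wide tables.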
However, adding or updating data in columnar databases can be hard, Gnau says, which is why they are not prevalent in commercial computing. Yet.
Instead of building a separate columnar database and bolting it onto the side of the data warehouse, Teradata's feature makes use of changes to the Teradata file system that allow it to store a column-based table, just like it currently stores row-based tables.
The optimizers in the Teradata system are aware of columnar and row-based data, and you can mix and match columnar and row tables within the same database and even create a hybrid table that can be searched by row or column. This hybrid table is created by taking a row table and designating a couple of key columns for searches within it. The columnar tables in Teradata 14 can have normal database functions such as joins performed on them, and they can be compressed using run length, dictionary, trim, delta on mean, null, and UTF8 methods.
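For readers who want a feel for why columns squash so well, here is a rough Python sketch of two of the methods named above, run-length and dictionary encoding, applied to a single column. The function names and sample values are hypothetical, and this is a textbook rendering of the techniques rather than Teradata's implementation.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of repeated values into a (value, count) pair."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    dictionary = []  # code -> value
    index = {}       # value -> code
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Look each code up to rebuild the original column."""
    return [dictionary[code] for code in codes]


# A made-up artist column, stored contiguously as a column store would.
artist_column = ["Beatles", "Beatles", "Beatles", "Stones", "Stones", "Beatles"]

runs = run_length_encode(artist_column)
dictionary, codes = dictionary_encode(artist_column)
print(runs)
print(dictionary, codes)
```

Because a column holds values of one type, often with long runs and few distinct entries, both tricks tend to pay off far more than they would on a row of mixed fields.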
Teradata 14 will also get a bunch of new SQL commands that help with application portability from other database-management systems, and will support a number of new ARRAY and NUMBER data types, which previously had to be converted into a native format.
Teradata has also added row-level access controls to the database to enhance its security, and temporal data-analysis tools can now present data across multiple time zones as if they were in one time zone – a multinational company can, for example, look at all of its global operations on a single day or week during normal business hours across each zone.
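One way to read the time-zone feature is as a shift of every timestamp into its office's local clock before comparing. The Python sketch below does that with fixed UTC offsets; the offices, offsets, sales figures, and function names are all assumptions made up for the example, daylight saving is ignored, and it says nothing about how Teradata's temporal tools work internally.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical offices with fixed UTC offsets in hours (daylight saving ignored).
OFFICE_OFFSETS = {"New York": -5, "London": 0, "Tokyo": 9}


def local_time(utc_dt, office):
    """Convert a UTC timestamp into the office's local wall-clock time."""
    return utc_dt.astimezone(timezone(timedelta(hours=OFFICE_OFFSETS[office])))


def in_business_hours(utc_dt, office, start=9, end=17):
    """True if the event fell between start and end o'clock, local time."""
    return start <= local_time(utc_dt, office).hour < end


# Made-up sales events as (office, UTC timestamp, amount).
sales = [
    ("New York", datetime(2011, 9, 22, 15, 0, tzinfo=timezone.utc), 120.0),
    ("London", datetime(2011, 9, 22, 10, 0, tzinfo=timezone.utc), 80.0),
    ("Tokyo", datetime(2011, 9, 22, 1, 0, tzinfo=timezone.utc), 95.0),
    ("Tokyo", datetime(2011, 9, 22, 15, 0, tzinfo=timezone.utc), 40.0),
]

# Keep only events that happened during each office's own working day.
business_sales = [s for s in sales if in_business_hours(s[1], s[0])]
for office, utc_dt, amount in business_sales:
    print(office, local_time(utc_dt, office).strftime("%H:%M"), amount)
```

The first three events all land at 10:00 local time even though their UTC timestamps are hours apart, so they line up as if they had happened in one time zone. The last Tokyo sale falls at midnight local time and drops out.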
The database also now has block-level data compression, and based on the hotness or coldness of the data it will compress or unzip data. Cold data can be migrated to slower storage in the cluster, should you have any.
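The hot-and-cold idea can be sketched in a few lines of Python using the standard `zlib` module. Everything here is an assumption for illustration: the `Block` class, the read-count threshold, and the `rebalance` pass are invented, and the article does not say how Teradata measures temperature or which compression algorithm it uses at the block level.

```python
import zlib

HOT_THRESHOLD = 3  # hypothetical: reads per interval that make a block "hot"


class Block:
    """A toy storage block that tracks how often it is read."""

    def __init__(self, data: bytes):
        self.payload = data
        self.compressed = False
        self.reads = 0

    def read(self) -> bytes:
        """Return the original bytes, decompressing on the fly if needed."""
        self.reads += 1
        return zlib.decompress(self.payload) if self.compressed else self.payload


def rebalance(blocks):
    """Compress cold blocks, uncompress hot ones, then reset the counters."""
    for block in blocks:
        hot = block.reads >= HOT_THRESHOLD
        if hot and block.compressed:
            block.payload = zlib.decompress(block.payload)
            block.compressed = False
        elif not hot and not block.compressed:
            block.payload = zlib.compress(block.payload)
            block.compressed = True
        block.reads = 0


hot_block = Block(b"x" * 1000)
cold_block = Block(b"y" * 1000)
for _ in range(3):
    hot_block.read()  # three reads push this block over the threshold

rebalance([hot_block, cold_block])
print(hot_block.compressed, cold_block.compressed, len(cold_block.payload))
```

The trade-off is the usual one: cold data costs less space and can tolerate the decompression hit on its rare reads, while hot data stays uncompressed so frequent reads do not pay that cost.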
The columnar support will be available in December when Teradata 14 ships. The new database is in beta testing now at Teradata customers.
Aster Data appliance for MapReduce
Unstructured data is building up all over the data centers of the world, particularly from web-page clicks, and every company believes that they have to keep every bit of data – structured or unstructured – because it may someday be useful.
Aster Data's SQL-MapReduce and the way it handles these steaming heaps of structured and unstructured data made it a particularly interesting acquisition for Teradata.
Well, there's also the fact that Teradata had to keep it from falling into the enemy hands of Oracle, IBM, or EMC; and that the nCluster hybrid row/column database runs on parallel x86 clusters, just like the Teradata software – in fact, both companies use Dell PowerEdge hardware. But we digress...
The news today is that Teradata is putting out Aster Database 5.0, an update to that SQL-MapReduce hybrid database, as well as plunking it and MapReduce onto a preconfigured appliance called – you guessed it – the Aster MapReduce Appliance.
Teradata's Aster MapReduce Appliance
If you don't want to buy the appliance, the Aster Database 5.0 release sans appliance is certified to run on Red Hat and SUSE Linuxes on x86 servers. (Aster did RHEL, Teradata does SLES.)
The SQL-MapReduce framework that is part of the Aster Database allows for SQL programs or other business-analytics tools to make MapReduce calls to chew on data and use the results of that processing in their own SQL transactions. With the 5.0 release of the database, Teradata has tweaked the system-management software to do a better job of allocating memory in the cluster for SQL and MapReduce processes. It also includes pre-built MapReduce modules for analyzing clickstream behavior or doing marketing analysis, among other things.
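The general pattern, a MapReduce step whose output is then treated like an ordinary table, can be sketched in plain Python. This is not Aster's SQL-MapReduce syntax or API: the clickstream records, the `map_click` and `reduce_paths` functions, and the tiny `map_reduce` driver are all hypothetical stand-ins.

```python
from collections import defaultdict

# Made-up clickstream records, in the order they arrived.
clicks = [
    {"user": "alice", "page": "/home"},
    {"user": "alice", "page": "/cart"},
    {"user": "bob", "page": "/home"},
    {"user": "alice", "page": "/checkout"},
    {"user": "bob", "page": "/cart"},
]


def map_click(click):
    """Map step: emit a (key, value) pair keyed by user."""
    yield click["user"], click["page"]


def reduce_paths(user, pages):
    """Reduce step: fold one user's pages into a single path row."""
    return {"user": user, "path": " > ".join(pages), "clicks": len(pages)}


def map_reduce(records, mapper, reducer):
    """Single-process stand-in for the map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle": group values by key
    return [reducer(key, values) for key, values in groups.items()]


# The result is a list of rows, so it can be filtered like a table.
paths = map_reduce(clicks, map_click, reduce_paths)
buyers = [row["user"] for row in paths if row["path"].endswith("/checkout")]
print(paths)
print(buyers)
```

The final filter plays the part of the SQL side: it neither knows nor cares that the rows it is selecting from were produced by a MapReduce job.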
The Aster MapReduce Appliance is based on two-socket Xeon 5600 server nodes with six-core Xeon X5675 processors running at 3.06GHz. The nodes are linked together using 10 Gigabit Ethernet switches, and the software stack – which includes the Aster Database at the 4.6.2 release or higher running on SUSE Linux Enterprise Server 11 – has the Aster-Teradata adapter that allows for data to be sucked out of a Teradata data warehouse to be chewed by the Aster database. The Aster MapReduce Appliance scales up to six racks of servers, yielding more than 200TB of user space when compression is turned on.
"The appliance is all about the TCO, not the TCA," says Gnau, who adds that customers were asking Teradata to build a pre-configured system. "Because we are bundling and optimizing it, you can buy less hardware than you would in a cobbled-together solution."
The Aster MapReduce Appliance will ship in the first quarter of next year. Prices for the getup have not been set yet. ®