Sometimes the big data is bigger than you would like, and you need to hold onto it longer than you otherwise would for regulatory or business reasons. There's nothing worse than waiting to get a moldy gob of data back off tape, and it is even worse (and less likely to be successful) on a very large bucket of said musty data.
With this in mind, IBM's Netezza unit has rejiggered the TwinFin data warehousing appliance to be skinny on the blades and fat on the disks, so companies can create what is, in effect, a nearline data warehouse.
The Netezza High Capacity Appliance comes in a two-rack or four-rack configuration, and will eventually scale up to six and then eight racks. It has four times the disk capacity and about 40 per cent less processing capacity than the normal TwinFin appliances. The TwinFin is based on IBM's BladeCenter x64 blade servers and chassis, and was so even before IBM paid $1.9bn to acquire the appliance maker last September.
The current machines are based on two-socket HS22 blades using four-core Xeon 5600 processors, each mated with a field-programmable gate array (FPGA) co-processor blade that Netezza uses to speed up the heavily modified PostgreSQL database that runs on top of the iron. The combination of the HS22 and FPGA blades is called an S-Blade. There are eight FPGAs on the accelerator blade - one for each x64 core. They speed up the filtering of data as it moves off storage, before it is passed on to the database software, and also handle complex sorts and joins of database tables as part of analytical routines.
The High Capacity Appliance puts four S-Blades in a rack, with 32 cores and 32 FPGAs, and 144TB of uncompressed user data capacity in a dozen disk enclosures, each with a dozen 2TB drives. The rack also includes redundant host servers for loading data, planning queries, and distributing workload across the cluster. The C1000-8 has two racks, for a total of 64 cores, 64 FPGAs, 288TB of uncompressed user data capacity, and 1.1PB of capacity with compression turned on. The C1000-16 doubles that up to four racks, the future C1000-24 model will have six racks, and the C1000-32 will eventually offer eight racks.
That top-end model will have 256 cores, 256 FPGAs, 1.15PB of uncompressed data space, and 4.4PB of compressed capacity. Such a behemoth, by the way, draws 44 kilowatts. Eventually, IBM plans to offer C1000-40, -48, -64, and -80 designations that scale to over 10PB in 20 racks and offer data load rates of 5.5TB per hour.
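The rack arithmetic above can be checked with a short Python sketch. The per-rack constants come straight from the figures quoted here; the function name `c1000_config` is made up for illustration, and the idea that the model suffix equals the S-Blade count is an inference from the C1000-8 (two racks) and C1000-32 (eight racks) naming, not something IBM has stated. Compressed capacity is left out because the compression ratio depends on the data.

```python
# Per-rack figures as quoted in the article.
S_BLADES_PER_RACK = 4
CORES_PER_S_BLADE = 8        # two sockets x four cores on the HS22 blade
FPGAS_PER_S_BLADE = 8        # one FPGA per x64 core
USER_TB_PER_RACK = 144       # uncompressed user data capacity
ENCLOSURES_PER_RACK = 12
DRIVES_PER_ENCLOSURE = 12
DRIVE_TB = 2


def c1000_config(racks):
    """Return headline figures for a hypothetical C1000 with `racks` racks."""
    s_blades = racks * S_BLADES_PER_RACK
    return {
        # Assumption: the model suffix tracks the S-Blade count.
        "model": f"C1000-{s_blades}",
        "racks": racks,
        "cores": s_blades * CORES_PER_S_BLADE,
        "fpgas": s_blades * FPGAS_PER_S_BLADE,
        # Raw disk is simply enclosures x drives x drive size.
        "raw_disk_tb": racks * ENCLOSURES_PER_RACK * DRIVES_PER_ENCLOSURE * DRIVE_TB,
        "user_tb": racks * USER_TB_PER_RACK,
    }


if __name__ == "__main__":
    for racks in (2, 4, 6, 8):
        cfg = c1000_config(racks)
        print(cfg["model"], cfg["cores"], "cores,", cfg["fpgas"], "FPGAs,",
              cfg["user_tb"], "TB user,", cfg["raw_disk_tb"], "TB raw")
```

Note that the quoted user capacity per rack is half the raw disk capacity that the drive counts imply; the figures here do not break down where the other half goes.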
Here's what the Netezza C1000 rack looks like:
Phil Francisco, vice president of product management for the Netezza unit (which is technically part of IBM's Information Management division, not part of its System x hardware division), says that the C1000 high capacity appliances will be available in the middle of July. IBM plans to charge $2,500 per user terabyte for the high-capacity appliance, which is considerably lower than the $10,000 per user terabyte that IBM charges for the regular TwinFin appliances. Those FPGAs and server nodes don't come cheap, apparently. ®
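To put the per-terabyte pricing in perspective, here is a rough Python sketch of what the list prices work out to. It assumes that "user terabyte" means uncompressed user data capacity, so that a two-rack C1000-8 counts as 288 user terabytes; that reading, and the helper name `list_price`, are assumptions for illustration. The TwinFin figure is a like-for-like comparison at the same capacity, not the price of any actual TwinFin configuration.

```python
# List prices per user terabyte as quoted in the article.
C1000_USD_PER_USER_TB = 2_500
TWINFIN_USD_PER_USER_TB = 10_000

# Assumption: a C1000-8 is billed on its 288TB of uncompressed user capacity.
C1000_8_USER_TB = 288


def list_price(user_tb, usd_per_user_tb):
    """Hypothetical helper: capacity multiplied by the per-terabyte rate."""
    return user_tb * usd_per_user_tb


if __name__ == "__main__":
    high_capacity = list_price(C1000_8_USER_TB, C1000_USD_PER_USER_TB)
    twinfin_rate = list_price(C1000_8_USER_TB, TWINFIN_USD_PER_USER_TB)
    print(f"C1000-8 at high-capacity rate: ${high_capacity:,}")
    print(f"Same capacity at TwinFin rate: ${twinfin_rate:,}")
    print(f"Ratio: {twinfin_rate / high_capacity:.0f}x")
```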