EMC has launched its Greenplum Data Computing Appliance (DCA), promising twice the performance of Oracle's Exadata system.
The DCA is an online analytical processing (OLAP) engine for examining business transaction data, mining it, and extracting information that better describes customer behaviour. For example, it could help mobile phone suppliers reduce customer churn, that is, the loss of customers to competing suppliers.
It uses Greenplum's massively parallel processing, shared-nothing architecture. A single rack has 16 segment servers inside, each with two 6-core, 2.93GHz Intel Xeon E5670 processors, making 192 Intel cores in total. The rack also has two redundant servers for co-ordination operations; they don't do data mining work.
There can be up to 24 racks, totalling 4,608 data mining cores. A DCA rack has 36TB of usable uncompressed disk space, using 600GB drives, which EMC says becomes 144TB with compression. The actual compression ratio varies with the type of data; EMC's figure assumes a generalised 4X compression factor.
The DCA is an integrated IT stack system, including database, compute, storage and network resources in a single product. It is available in half-rack, full-rack, and multiple-rack appliance configurations, and scales up to 3.46PB with compression.
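The sizing figures above hang together arithmetically. The short sketch below reproduces them, assuming every rack is an identical full rack, counting only the segment servers' cores (not the two co-ordination servers), and applying EMC's generalised 4X compression factor uniformly; real compression ratios vary with the data, and the half-rack configuration is not modelled.

```python
# Back-of-envelope check of the DCA sizing figures quoted in the article.
# Assumptions: identical full racks, only segment-server cores counted,
# and EMC's generalised 4x compression factor applied uniformly.

SEGMENT_SERVERS_PER_RACK = 16
CPUS_PER_SERVER = 2
CORES_PER_CPU = 6           # Xeon E5670 is a 6-core part
MAX_RACKS = 24
USABLE_TB_PER_RACK = 36     # usable, uncompressed
COMPRESSION_FACTOR = 4      # EMC's generalised figure; real ratios vary by data


def cores_per_rack():
    """Data mining cores in one full rack (co-ordination servers excluded)."""
    return SEGMENT_SERVERS_PER_RACK * CPUS_PER_SERVER * CORES_PER_CPU


def total_cores(racks):
    """Data mining cores across a multi-rack system."""
    return racks * cores_per_rack()


def compressed_tb(racks):
    """Effective capacity in TB after applying the assumed compression factor."""
    return racks * USABLE_TB_PER_RACK * COMPRESSION_FACTOR


if __name__ == "__main__":
    print(cores_per_rack())                  # 192
    print(total_cores(MAX_RACKS))            # 4608
    print(compressed_tb(1))                  # 144
    print(compressed_tb(MAX_RACKS) / 1000)   # 3.456 (PB, decimal units)
```

The 24-rack result of 3,456TB is where the "3.46PB with compression" figure comes from once rounded to two decimal places.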
Describing it, and not being ironic, Greenplum founder and EMC data products division CTO Luke Lonnergan said: "We don’t need anything esoteric."
He said we are entering an era of big data, and massively parallel systems are needed to ingest it, mine (digest) it and spit out results fast.
Customers can integrate it with EMC's Data Domain deduplicated backup, recovery and replication technologies, for data protection. Replication can also be provided by EMC's RecoverPoint product for disaster recovery.
This hardware runs v4.0 of the Greenplum database, and EMC promises "the fastest data loading and best price/performance in the data warehousing industry". One DCA rack can ingest data at 10TB/hour, twice as fast, EMC says, as Oracle's Exadata system, and five times faster than Netezza and Teradata products. EMC says performance scales linearly, so a 24-rack system would theoretically ingest data at 240TB/hour.
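EMC's 240TB/hour figure is simply the per-rack load rate multiplied by the rack count. The sketch below models that claim under the stated assumption of perfect linear scaling, which real systems rarely achieve exactly; the function names are hypothetical and only whole racks from 1 to 24 are modelled.

```python
# Linear-scaling model of DCA load throughput, as EMC describes it.
# Assumption: perfect linear scaling at EMC's quoted 10TB/hour per rack.

TB_PER_HOUR_PER_RACK = 10
MAX_RACKS = 24


def ingest_rate_tb_per_hour(racks):
    """Theoretical load rate for a whole-rack configuration (hypothetical helper)."""
    if not 1 <= racks <= MAX_RACKS:
        raise ValueError("this model covers 1 to 24 full racks only")
    return racks * TB_PER_HOUR_PER_RACK


def hours_to_load(dataset_tb, racks):
    """Time to load a dataset if the linear-scaling claim holds."""
    return dataset_tb / ingest_rate_tb_per_hour(racks)


if __name__ == "__main__":
    print(ingest_rate_tb_per_hour(24))   # 240
    print(hours_to_load(100, 1))         # 10.0
```

Taken at face value, EMC's relative claims also imply roughly 5TB/hour per rack for Exadata and 2TB/hour for the Netezza and Teradata products, though those are derived from EMC's marketing comparison rather than from independent measurements.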
Lonnergan said: "The strength of the appliance model is that it lands on the floor tested and configured at the point of manufacture; the weakness has been that many of these products are infrastructure islands.
"The DCA can be deployed and operated as a stand-alone appliance (turn it on and data goes in while decisions come out), but you can connect it to an EMC array if you choose, replicate it with RecoverPoint and back it up to Data Domain.
"You’re now storing the data on your production arrays, getting long distance continuous remote replication with bookmarking and backing it up to deduplication storage with built-in integrity checking and bandwidth-optimised replication… it’s no longer an island in your data centre. It’s part of the infrastructure."
The Greenplum 4.0 database is shipping and is also available separately as software-only, to run on x86 hardware such as, EMC suggests, the Virtual Computing Environment (VCE) coalition's Vblock infrastructure packages. The DCA product is available immediately. Pricing was not revealed. ®