As big-data hook-ups go, they don't get much bigger: NoSQL and distributed computing pin-ups Cassandra and Hadoop have been united by the Apache Software Foundation.
ASF has released Apache Cassandra 0.6, adding support for its Hadoop project. Both Cassandra and Hadoop are ASF projects, with Cassandra only graduating from Apache's early-phase incubator in February.
The union will allow users to run analytics queries using the Hadoop MapReduce framework against data held inside Cassandra.
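For readers who haven't met the map and reduce steps, here is a minimal single-process sketch of the idea in Python. It is not the Cassandra or Hadoop API: the `ROWS` data, the column names, and the `map_row`, `reduce_values`, and `run_job` helpers are all hypothetical stand-ins, and a real job would read rows from the cluster and spread the work across many machines.

```python
from collections import defaultdict

# Hypothetical rows standing in for data held in a Cassandra column family:
# row key -> {column name: value}. Nothing here touches a real cluster.
ROWS = {
    "post:1": {"author": "alice", "diggs": 12},
    "post:2": {"author": "bob", "diggs": 7},
    "post:3": {"author": "alice", "diggs": 30},
}


def map_row(key, columns):
    """Map step: emit one (author, diggs) pair for a single row."""
    yield columns["author"], columns["diggs"]


def reduce_values(author, values):
    """Reduce step: sum every value emitted for one author."""
    return author, sum(values)


def run_job(rows):
    """Run map, group the output by key (the shuffle), then reduce."""
    grouped = defaultdict(list)
    for key, columns in rows.items():
        for out_key, out_value in map_row(key, columns):
            grouped[out_key].append(out_value)
    return dict(reduce_values(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(ROWS))
```

The point of the split is that each row can be mapped independently and each key reduced independently, which is what lets a framework such as Hadoop farm the work out across a cluster.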
Hadoop is an open-source project partly based on Google's MapReduce technology that found large-scale use inside Yahoo!. Cassandra is one of a family of NoSQL systems that started life as a way to store and serve frequently accessed data in massive systems spanning tens of thousands of servers and millions of users. The idea is that in these environments NoSQL is faster and architecturally easier to build than a traditional relational database system.
The Cassandra NoSQL technology started at Facebook and became an ASF incubator project in 2009. Users include Digg, Cisco WebEx, Rackspace, Reddit, and Twitter.
As more data has been put into NoSQL systems, it has inevitably followed that those running them want to query it rather than simply use NoSQL as a holding pen for things like Facebook status updates, Tweets, or Digg posts.
Earlier this week, Gear6 announced that it's adding native query capabilities to its Memcached distribution, to create what it called a "NoSQL-like store".
Other improvements in Cassandra 0.6 include an integrated row cache to eliminate the need for a separate caching layer, which ASF said would help simplify architectures, and a 30 per cent across-the-board increase in speed to handle increasing write loads from big customers. ®