Database-as-a-service company Cloudant is pouring its own code into the Apache CouchDB project: the company wants to invigorate the open-source distributed database while halting development of its own fork of the tech.
The code merge of Cloudant's tweaked BigCouch database into Apache CouchDB was announced on Monday, and brings features such as cluster management, compaction technology, and custom RPC into the distributed NoSQL database.
By injecting BigCouch's cluster features into the CouchDB community, Cloudant hopes to spur enthusiasm for the database, which is designed to handle the management and manipulation of datasets too large to sit on a single server. Alongside the code merge, the company is going to cease development of BigCouch.
"In terms of corporate strategy, a healthy CouchDB is healthy for Cloudant," the company's CTO Adam Kocoloski tells The Reg. "It's also something that is a source of leads for hires, and many other things."
BigCouch was developed to deal with some of the perceived clustering drawbacks of CouchDB, and uses a quorum system to ensure consistency when running at scale across many nodes.
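For readers unfamiliar with quorums, the idea can be sketched in a few lines of Python. This is a generic Dynamo-style illustration, not BigCouch's actual code: the function names are hypothetical, the replica counts are made-up examples, and the integer version numbers are a simplification of how a real database tracks document revisions. Each item is stored on n replicas, a write must be acknowledged by w of them, and a read must hear from r of them; when r + w is greater than n, every read is guaranteed to touch at least one replica that saw the latest write.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when any r readers and any w writers out of n replicas must share a node."""
    return r + w > n


def quorum_read(replies, r):
    """Pick the newest value once at least r replicas have answered.

    replies is a list of (version, value) tuples from the replicas that
    responded. Integer versions are an assumption made for illustration.
    """
    if len(replies) < r:
        raise RuntimeError("read quorum not met")
    # The reply carrying the highest version is treated as the current value.
    return max(replies, key=lambda reply: reply[0])[1]


if __name__ == "__main__":
    # Hypothetical settings: three replicas, two needed for reads and for writes.
    print(quorums_overlap(n=3, r=2, w=2))
    # One replica is stale, one has the latest write; the read returns the newer value.
    print(quorum_read([(1, "old"), (2, "new")], r=2))
```

The trade-off is the usual one: raising r and w tightens consistency but means more replicas must be reachable before a request can succeed.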
"We're continuing work within the Apache project to integrate the clustering technology of BigCouch, but now we've set the stage and are welcoming more project committers to get involved," Jan Lehnardt, project management committee chair of the Apache CouchDB project, said in a statement.
"Cloudant's work fine-tuning BigCouch database replication at large scale now gives Apache CouchDB a complete strategy for replicating data across distributed systems," Lehnardt said, "whether nodes are Erlang clusters in the same data center or on the other side of the world."
BigCouch is inspired by the Dynamo database design revealed by Amazon in a 2007 academic paper, as is Riak, an open source distributed datastore developed by Basho with features designed for media, gaming, and telecommunication companies. Among NoSQL systems, Riak is CouchDB/BigCouch's closest counterpart.
The two databases are becoming more similar over time. "The areas in which we differ are in some cases diminishing as we add other features the others had in the past," Kocoloski says.
Both Riak and CouchDB have had a limited impact on the market: Basho claims a few customers but is not exhibiting torrential growth, and Cloudant looks much the same. Both CouchDB and Riak do, however, have lively mailing lists in which developers and users discuss the technologies.
DB-Engines, which ranks databases according to a combination of online chatter, job listings, technical queries, and Google Trends results, puts CouchDB in 19th place in its database rankings, versus 25th for Riak.
Though both are small compared to the giant systems of Oracle, MySQL, SQL Server, and the like, they are part of a new wave of databases and datastores that run under the banner of "NoSQL" systems.
This frightful term is generally taken to mean that the systems do not use SQL as their query interface, though this has changed over the years: many NoSQL companies have introduced SQL query overlays, because having to learn a new query language can alienate a large portion of the potential database market.
"The code contributions we're announcing are focused on adding new key pieces of technology that enhance CouchDB's already strong reputation for operational stability and durability," Kocoloski says, "and it's a viable option for deployments where the dataset does not fit comfortably on a single server. This, along with ongoing efforts to enhance the server admin console and improve documentation, is part of a comprehensive strategy to ensure that CouchDB becomes a generally deployed system for a wide variety of applications."
These technologies promise much, but with few case studies and far fewer users than those found in the major systems, it will be some time before they make a clear impact on the monoliths of Oracle and Microsoft, or even NoSQL leader MongoDB. ®