DataStax, the lead vendor behind open-source DBMS Apache Cassandra, is rolling out a database-as-a-service (DBaaS) version of its wares, while the Cassandra-emulating ScyllaDB has pushed out its own upgrade to compete with DynamoDB, the AWS-native version of the same database.
The updates are part of a flurry of activity in the NoSQL database world that has also seen Aerospike, a key-value database used by PayPal and the European Central Bank, launch its fifth iteration, with features including low latency and consistency in transactional applications distributed across global locations.
Although Cassandra and its derivatives have different strengths and use cases to Aerospike, the releases show how the NoSQL market is gearing up in terms of performance, scale, and cloud-readiness.
The Multi-Site Clustering in Aerospike's Database 5 is said to eliminate the trade-off between data consistency and high performance, often a problem with legacy relational DBs. Aerospike reckons this holds true whether the database runs across multiple data centres, cloud locations or between on-premises and cloud in different regions.
Srini Srinivasan, chief product officer and founder of Aerospike, told The Register: "An entire database can be in one rack, so you could drive one rack of the cluster in a public cloud, another in your own data centre region as a single cluster and between them there will be a coordinated distributed transaction for every write."
Meanwhile, the database can be read locally within an application, because an entire copy of the cluster is held on a single rack. "You get no read latency," he claimed. "And also, the system is active so an update which is happening in one place, is immediately reflected across on the other sites."
While there was still fractional latency running the system across the globe, it was greatly reduced compared with rival NoSQL systems, he said. "It depends on the distance between sites. You know that could be anywhere between a couple of milliseconds if it's the same city, or across the continent, it could be 150 milliseconds. And that's a small price to pay to do some of these processes, which currently take hours or days to be consistent."
Aerospike is popular in the financial services sector, with customers including Barclays, but it is branching out into IoT applications in industry and logistics.
Donald Feinberg, Gartner distinguished VP and analyst, said it is a feature that global organisations will want to see. "Getting data centres running together consistently, that's always created a problem of latency. If you want consistent transactions in a multi-region application, the traditional approach is a two-phase commit and it takes time and can slow the transaction."
While there will be some lag in Aerospike's approach – "you can't change the speed of light," he said – the company was talking about a greatly reduced time to achieve consistency, which would improve overall service levels.
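For readers unfamiliar with the two-phase commit Feinberg mentions, the following is a minimal Python sketch of the general technique, not of Aerospike's implementation. The `Participant` class, the `two_phase_commit` function and the site names are hypothetical, and the sketch assumes the coordinator contacts all sites in parallel, so each phase costs one round trip to the slowest site. The 2ms and 150ms figures are borrowed from Srinivasan's same-city and cross-continent estimates purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A hypothetical site holding one copy of the data."""
    name: str
    round_trip_ms: float      # assumed network round trip to the coordinator
    can_commit: bool = True   # whether this site votes yes in phase 1
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, key, value):
        # Phase 1: stage the write and return this site's vote.
        if self.can_commit:
            self.staged[key] = value
        return self.can_commit

    def commit(self):
        # Phase 2 (success): make the staged write visible.
        self.data.update(self.staged)
        self.staged.clear()

    def abort(self):
        # Phase 2 (failure): discard anything staged.
        self.staged.clear()


def two_phase_commit(participants, key, value):
    """Run one write through both phases.

    Returns (committed, simulated_latency_ms). Latency is simulated,
    not measured: each phase is charged the slowest site's round trip.
    """
    slowest = max(p.round_trip_ms for p in participants)

    # Phase 1: ask every site to prepare and collect the votes.
    votes = [p.prepare(key, value) for p in participants]
    latency_ms = slowest

    # Phase 2: commit only if every site voted yes, otherwise abort.
    committed = all(votes)
    for p in participants:
        if committed:
            p.commit()
        else:
            p.abort()
    latency_ms += slowest

    return committed, latency_ms


if __name__ == "__main__":
    sites = [
        Participant("same-city", round_trip_ms=2.0),
        Participant("cross-continent", round_trip_ms=150.0),
    ]
    print(two_phase_commit(sites, "balance", 100))
```

Under these assumptions the write pays two round trips to the most distant site before it is acknowledged, which is the slowdown Feinberg describes. A single "no" vote aborts the write everywhere.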
Although it has built a connector to the Apache Spark analytics engine, Aerospike remains largely a transactional database.
Unlike Aerospike, Apache Cassandra was not designed primarily for transactions, but as a distributed storage system for managing structured data that can scale to a very large size across many commodity servers. The lead vendor contributing to the project is DataStax, which today announces the general availability of DataStax Astra, a DBaaS for Apache Cassandra applications. The objective is to reduce cloud application deployment time from weeks to minutes, removing the biggest obstacle to Cassandra deployments. Developers can also build applications without the operational overhead of Cassandra, DataStax said.
Astra is available on Google Cloud Platform (GCP) and AWS with support for Azure expected to follow.
Ed Anuff, DataStax chief product officer, said: "A challenge for databases like Cassandra is it's a lot more complicated to operate in the cloud [than a relational database]. Back in the day when people were using MySQL, you could just run a single instance of it, whether it was on your laptop or on an ISP account. When you're running a distributed database it's a lot more complicated to operate."
DataStax's customers include Bank of America, eBay and online learning company Coursera. Although some customers want to run Cassandra themselves, others wanted the DBaaS option, he said.
Gartner's Feinberg agreed that customers were looking to ease deployment and management headaches with Cassandra. "They tell us the number one issue is management. If you have multiple nodes it can become complex and cumbersome."
The same reasoning is behind the development of Amazon Keyspaces, a serverless rendition of a Cassandra copy-cat database, he said.
Amazon how easy it is to swap...
Another Cassandra clone, ScyllaDB, released its fourth iteration last week. It promises APIs compatible with both Cassandra and DynamoDB, Amazon's other Cassandra lookalike, to make it easier to move applications to a new database and avoid vendor lock-in, ScyllaDB said.
It can run on-premises, on a preferred cloud platform, or on Scylla's fully managed DBaaS, Scylla Cloud.
But the secret sauce is in the code, said ScyllaDB CEO Dor Laor. From the ground up, the database was rewritten in C++ rather than the Java which underpins Cassandra.
"The first advantage is performance and better performance means cost reduction because you always run the database distributed: when you drive more performance from a smaller cluster it cost less and it's simpler to manage. So, we can perform six to 10 x times better than Cassandra, both in throughput and in latency together. We don't have Java, so we don't have the JVM, which is a big layer in the middle. We've also implemented many algorithms that allow us to perform better.
"For example, we control our own memory management, whereas in Java they rely on Linux page cache which is a cache in Linux which is generic: our cache is tuned for us, and we have just one main cache and not many."
Although Comcast, Starbucks, and Samsung have adopted ScyllaDB, Gartner's Feinberg remained sceptical about the company's performance claims.
He said that with every release of ScyllaDB and Cassandra, each would claim better performance than the other. Meanwhile, AWS DynamoDB supports Amazon Prime. Although "the guy who wrote it is sitting next door," it does show the performance possibilities with the database, he said.
He was not convinced ScyllaDB's performance advantages would make customers move. "Why not use Cassandra if it is working fine for you?" he said.
As the NoSQL market grows, the names get more ridiculous and the options become more baffling – especially with numerous replicas of essentially the same database. But IT teams should be wary and pick according to the cost and use case, rather than pure performance or marketing vision. ®