The beta version of Apache Cassandra 4.0, which those involved call the most significant new release of the column-oriented NoSQL database in years, hit the downloads centre this week, promising 5x faster scaling, new observability features and less "unnecessary garbage".
Although official support for Java 11 has missed the release, it is in the pipeline.
According to DataStax, a vendor supporting the Apache open-source project, the release's robustness comes from its new Zero Copy Streaming feature. This aims to make scaling the distributed database, and recovering from node outages, faster and more efficient.
Joshua McKenzie, head of open source strategy at DataStax, said: “If you release stuff [data] through the Linux kernel, you pay a price for it. The notion behind zero copy streaming is to avoid having to pay that price: to get things directly to the application; do things in bulk; don't de-serialize and re-serialize the data. The whole idea behind it is basically just pick up a chunk of things that someone else needs, send it to them as fast as possible and get all the middlemen out of the way.”
He said the feature would increase operational efficiency and lower the skill level needed to deploy and manage database clusters. “It’s part of the play to make it easier for operators,” he said.
The "unnecessary garbage" relates to the way Cassandra streams data. Deserializing and reserializing data partition by partition creates garbage and slows the whole streaming process, even though some Sorted Strings Tables (SSTables) can be transferred as whole files rather than as individual partitions.
Cassandra 4.0 modifies the streaming path to add information to the streaming header and uses ZeroCopy APIs to transfer bytes to and from the network and disk. As a result, when Cassandra detects that a complete SSTable needs to be transferred, it can send that SSTable using this strategy.
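For readers curious what "zero copy" means in JVM terms, the following is a minimal sketch of the general technique, not Cassandra's actual streaming code. It assumes the standard `java.nio` method `FileChannel.transferTo`, which lets the operating system move bytes from a file straight to another channel where it can (for instance via `sendfile` on Linux when the target is a socket), rather than reading them into a Java buffer and writing them back out. The class name `ZeroCopySketch` and helper `transferWholeFile` are hypothetical, and the target channel is assumed to be in blocking mode.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {

    /**
     * Hypothetical helper: sends an entire file to a (blocking) target channel
     * with FileChannel.transferTo, so the OS can move the bytes without
     * copying them through a user-space buffer where it supports that.
     * Returns the number of bytes actually sent.
     */
    static long transferWholeFile(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long sent = in.transferTo(position, size - position, target);
                if (sent <= 0) {
                    // Nothing more could be sent (e.g. the file was truncated);
                    // the caller can compare the return value with the file size.
                    break;
                }
                position += sent;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a complete SSTable file on disk (hypothetical names).
        Path source = Files.createTempFile("sstable-sketch", ".db");
        Path copy = Files.createTempFile("sstable-copy", ".db");
        try {
            Files.write(source, new byte[1 << 20]); // 1 MiB of zeros
            // A socket channel would be the real-world target; a file keeps the demo local.
            try (FileChannel out = FileChannel.open(copy, StandardOpenOption.WRITE)) {
                long sent = transferWholeFile(source, out);
                System.out.println("Transferred " + sent + " bytes");
            }
        } finally {
            Files.deleteIfExists(source);
            Files.deleteIfExists(copy);
        }
    }
}
```

The design choice mirrors the idea in the article: when a whole file is going to the other side anyway, hand it to the kernel in bulk instead of deserializing it into objects (and garbage) and serializing it again.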
McKenzie said the release was user- and community-led, and described himself as a “muppet” of the democratic Cassandra open source community, which we take to mean its mouthpiece.
The second tranche of features focuses on security and observability. It includes an audit logging feature that lets operators track data manipulation language (DML) and data control language (DCL) activity with minimal impact on normal workload performance, DataStax said. The aim is to make it easier to meet data governance and compliance requirements, such as GDPR.
Other observers said the release could be the most stable yet of the NoSQL database. Ben Bromhead, CTO of Cassandra consulting and support firm Instaclustr, said the Apache Cassandra community had long struggled to find the right balance of features, stability and performance for major version releases.
“The 4.0 beta demonstrates a maturity as a community to step back and take a hard look at what our priorities are and start to change our processes and focus based on those discussions. The result will (hopefully) be one of the most stable major releases of Apache Cassandra yet.”
There had been some friction between DataStax, as the dominant player, and the rest of the community. But in April, DataStax veep of developer relations Patrick McFadin said the company had been working hard to correct that perception since CEO Chet Kapoor arrived from Google in October last year.
Among the future Cassandra features in the pipeline is support for Java 11 and its new Z Garbage Collector (ZGC), which aims to keep garbage collection pause times to no more than a few milliseconds.
DataStax’s McKenzie said the community was unsure whether to hold up the release of Cassandra 4.0 so that support for JDK 11 could be included. But it eventually decided to go ahead with the release and “address things that come up that are JDK 11 specific”.
“There was an unofficial poll and something like 25 per cent of the community is running Cassandra on JDK 11 already, so the thing definitely works,” he said.
General availability of Cassandra 4.0 would depend on the community but is expected before the end of the year, he said. ®