Cassandra redesigns indexing, storage management for 5.0 release
Users warned to get off 3.x releases as support ends
The Apache Software Foundation's Cassandra project has released the 5.0 iteration of the wide-column store database, boasting new features for vector search, a Java update and enhanced performance.
Cassandra is designed to support highly distributed systems where writes exceed reads, and so-called ACID compliance is not important. Netflix, for example, has been using Cassandra since 2013, replacing Oracle databases and using the NoSQL system to support global accounts and customer data worldwide.
New features in the 5.0 release, which became generally available on Thursday last week, include Storage Attached Indexes (SAI), which promise to boost query flexibility and performance, especially for large datasets. The approach moves indexing closer to the data, improving query performance, and replaces the original Secondary Index feature.
The database upgrade also includes Vector Search, which supports a vector data type and indexing for Approximate Nearest Neighbor (ANN).
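An ANN index returns approximately the same rows that an exhaustive similarity search would, without having to score every stored vector. The minimal Python sketch below shows that exhaustive baseline, assuming cosine similarity as the measure. It is an illustration of what ANN approximates, not Cassandra's implementation, and the function names and sample data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vectors have no direction")
    return dot / (norm_a * norm_b)


def exact_nearest(rows: Dict[str, Vector], query: Vector, k: int) -> List[Tuple[str, float]]:
    """Score every row against the query and return the k most similar.

    This full scan is the exact baseline; an ANN index trades a little
    accuracy for not having to touch every row.
    """
    scored = [(key, cosine_similarity(vec, query)) for key, vec in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings keyed by row id.
    rows = {
        "a": [1.0, 0.0, 0.0],
        "b": [0.9, 0.1, 0.0],
        "c": [0.0, 1.0, 0.0],
    }
    print(exact_nearest(rows, [1.0, 0.0, 0.0], k=2))
```

The cost of this scan grows with every row stored, which is why an approximate index matters at the data volumes Cassandra is built for.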
"Cassandra 5.0 lays the groundwork for advanced AI and machine learning applications. Developers building Generative AI applications can combine Cassandra's scale and distribution with the latest search technology," the project said in a statement.
Sarma Pydipally, a Cassandra contributor and freelance database engineer with experience of large-scale systems in the telecoms sector, said the two updates would work together to improve the database's performance in support of applications based on GenAI.
He explained that in earlier approaches to indexing, each node contained index information only for its own data, which slowed distributed queries. "The SAI [Storage Attached Indexes] model is a little bit different as it is attached to the data itself."
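To see why node-local indexes slow distributed queries, consider a lookup on an indexed column that is not the partition key: no single node knows where all matching rows live, so a coordinator has to ask every node and merge the answers. The toy Python sketch below models only that fan-out. The `Node` class, `place` function and sample data are hypothetical, and CRC32 hashing stands in for Cassandra's real partitioner; there is no replication or networking.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set


class Node:
    """Hypothetical node holding some rows plus an index over its own rows only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, Dict[str, str]] = {}
        # Local index: column value -> keys of rows stored on this node.
        self.local_index: Dict[str, Set[str]] = defaultdict(set)

    def insert(self, key: str, row: Dict[str, str], indexed_column: str) -> None:
        self.rows[key] = row
        self.local_index[row[indexed_column]].add(key)

    def lookup(self, value: str) -> List[str]:
        # Only rows stored on this node can be found through its index.
        return sorted(self.local_index.get(value, set()))


def place(nodes: List[Node], key: str) -> Node:
    """Pick the owning node from a hash of the key (a stand-in for a real partitioner)."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def query_by_value(nodes: List[Node], value: str) -> List[str]:
    """Coordinator fan-out: without the partition key, every node must be asked."""
    results: List[str] = []
    for node in nodes:  # in a real cluster, one request per node
        results.extend(node.lookup(value))
    return sorted(results)


if __name__ == "__main__":
    nodes = [Node("n1"), Node("n2"), Node("n3")]
    users = {"u1": "uk", "u2": "us", "u3": "uk", "u4": "fr"}
    for key, country in users.items():
        place(nodes, key).insert(key, {"country": country}, "country")
    print(query_by_value(nodes, "uk"))  # prints ['u1', 'u3']
```

However the keys happen to be spread, the query has to visit all three nodes, so the cost of a lookup grows with cluster size rather than with the number of matching rows.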
"Storage attached indexes are going to change the way we create indexes in Cassandra. It seems to solve that indexing problem for Cassandra, and the vector search is totally dependent on it," he said.
Patrick McFadin, developer relations veep at Cassandra vendor DataStax, said the net effect of changes to the way data is stored and the way storage is managed would mean organizations "can run less Cassandra."
"You don't have to use as many nodes to get the same amount of effect… that shows up as node density, you get much higher node density," he told us.
The Cassandra 5.0 release also upgrades Java support to JDK 17, bringing performance improvements of up to 20 percent in some cases, according to the community announcement.
With the release of 5.0, the Cassandra project announced the end-of-life (EOL) for the 3.x series, including versions 3.0 and 3.11. It said it would evaluate security patches contributed to unmaintained branches on a case-by-case basis, but that CVE fixes are not guaranteed to be applied.
As of last year, a DataStax survey showed around a third of Cassandra users were on the 3.x releases, McFadin said. ®