Free Riak database acts like depressed teenager to assure data reliability

Version 1.3 constantly interrogates itself to make sure data is ok


Basho's NoSQL Riak database has been given an upgrade that makes it question its own integrity at all times – a tedious trait in people, but a handy one for assuring data preservation in massively distributed information stores.

The "active anti-entropy" feature in Riak 1.3, which was released on Thursday, means the open source database will constantly interrogate key-value data stored in multiple locations to repair divergent, missing, or corrupted replicas.

Previously, Riak checked up on the health of data during read requests. Administrators could also issue a repair command via the Riak Console, which would fix all data in a specific partition. But that depended on knowing that data in a certain partition was outdated – "not super-viable for large deployments," Basho director of product management Shanley Kane told The Register via email.
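Checking data as it is read is usually called read repair. The sketch below shows the idea under a simplifying assumption: each replica carries a plain integer `version`, standing in for the vector clocks Riak actually uses. The names `Replica` and `read_with_repair`, and the node names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Replica:
    """One node's copy of a key. `version` is a plain counter standing in for
    Riak's vector clocks (a simplifying assumption)."""
    value: bytes
    version: int

def read_with_repair(replicas: Dict[str, Optional[Replica]]) -> Optional[bytes]:
    """Return the newest value for a key and overwrite stale or missing copies.

    `replicas` maps a (hypothetical) node name to its copy, or None if missing.
    """
    present = [r for r in replicas.values() if r is not None]
    if not present:
        return None  # nothing to read, nothing to repair from
    newest = max(present, key=lambda r: r.version)
    for node, replica in list(replicas.items()):
        if replica is None or replica.version < newest.version:
            # Repair happens only as a side effect of this read.
            replicas[node] = Replica(newest.value, newest.version)
    return newest.value
```

A key that is never read never passes through this function, so a stale or missing copy of it can persist indefinitely.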

Active Anti-Entropy, by contrast, runs in the background.

"Having things running in the background also detects problems sooner, which not only repairs problems sooner but also allows repairs to happen incrementally over time versus, say, only verifying data once a week and repairing lots of data in bulk," Kane said.

Because it runs in the background, anti-entropy is particularly useful for maintaining cold data – that is, data that is not likely to be read for long periods of time, Kane said. Such data would rarely, if ever, be checked by read-time repair.

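Anti-entropy works by comparing replicas through an exchange of hash trees (Merkle trees), in which each branch's hash summarizes the hashes beneath it. Because the two sides only need to descend into branches whose hashes differ, the amount of information exchanged during Active Anti-Entropy interrogation is proportional to the differences between two replicas rather than to the total amount of data they hold.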
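A hash tree exchange can be sketched in a few dozen lines. This is a minimal illustration, not Riak's implementation: it assumes a small binary Merkle tree over a fixed number of key segments (`NUM_SEGMENTS`), and the helpers `build_tree`, `diff_segments` and `keys_to_repair` are hypothetical names. Riak's production trees are larger and are persisted in LevelDB.

```python
import hashlib
from typing import Dict, List, Tuple

NUM_SEGMENTS = 8  # hypothetical; a power of two keeps this binary tree simple

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def segment_of(key: str) -> int:
    # Deterministic key -> segment mapping (Python's hash() is randomized per run).
    return int.from_bytes(h(key.encode())[:4], "big") % NUM_SEGMENTS

def build_tree(store: Dict[str, bytes]) -> List[List[bytes]]:
    """Return tree levels: levels[0] are the segment leaves, levels[-1] the root."""
    segments: List[List[Tuple[str, bytes]]] = [[] for _ in range(NUM_SEGMENTS)]
    for key, value in store.items():
        segments[segment_of(key)].append((key, h(value)))
    level = [h(b"".join(k.encode() + b"\x00" + vh for k, vh in sorted(seg)))
             for seg in segments]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def diff_segments(a: List[List[bytes]], b: List[List[bytes]]) -> Tuple[List[int], int]:
    """Walk two same-shaped trees from the root, descending only where hashes
    differ. Returns the differing leaf segments and how many hashes were compared."""
    compared = 0
    frontier = [0]  # node indices at the current level, starting at the root
    for depth in range(len(a) - 1, -1, -1):
        next_frontier: List[int] = []
        for idx in frontier:
            compared += 1
            if a[depth][idx] != b[depth][idx]:
                if depth == 0:
                    next_frontier.append(idx)  # a differing leaf segment
                else:
                    next_frontier.extend([2 * idx, 2 * idx + 1])
        frontier = next_frontier
    return frontier, compared

def keys_to_repair(a_store: Dict[str, bytes], b_store: Dict[str, bytes],
                   segments: List[int]) -> List[str]:
    """Only keys in differing segments are examined individually."""
    wanted = set(segments)
    keys = {k for k in a_store.keys() | b_store.keys() if segment_of(k) in wanted}
    return sorted(k for k in keys if a_store.get(k) != b_store.get(k))
```

When two replicas agree, the exchange stops after comparing a single root hash; when they differ, only the branches leading to changed segments are walked.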

The hash trees are persisted in LevelDB instances kept separate from Riak's ordinary key-value data. A tree is updated whenever a write reaches the data it covers, and every tree is expired and rebuilt once a week to guard against its hashes drifting out of step with the underlying data.
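The two maintenance rules (update on every write, rebuild after a week) can be sketched as follows. Assumptions: a dict stands in for LevelDB, a flat table of per-key hashes stands in for the tree's leaves, and the clock is injectable so expiry can be tested. `HashIndex`, `on_write` and `maybe_expire` are hypothetical names, not Riak APIs.

```python
import hashlib
import time
from typing import Callable, Dict

WEEK_SECONDS = 7 * 24 * 60 * 60  # rebuild interval given in the article

class HashIndex:
    """Hypothetical stand-in for one replica's hash tree: a flat table of
    per-key hashes (the tree's leaves), kept next to the data it covers."""

    def __init__(self, store: Dict[str, bytes],
                 clock: Callable[[], float] = time.time) -> None:
        self.store = store      # the replica's key-value data
        self.clock = clock      # injectable so the weekly expiry can be tested
        self.hashes: Dict[str, str] = {}
        self.built_at = 0.0
        self.rebuild()

    def rebuild(self) -> None:
        """Throw the old hashes away and rehash every object from the data itself."""
        self.hashes = {k: hashlib.sha256(v).hexdigest() for k, v in self.store.items()}
        self.built_at = self.clock()

    def on_write(self, key: str, value: bytes) -> None:
        """Normal writes update the data and its hash together."""
        self.store[key] = value
        self.hashes[key] = hashlib.sha256(value).hexdigest()

    def maybe_expire(self) -> bool:
        """Rebuild once the hashes are a week old; return True if a rebuild ran."""
        if self.clock() - self.built_at >= WEEK_SECONDS:
            self.rebuild()
            return True
        return False
```

If stored data changes without passing through `on_write` (silent disk corruption, for instance), the hashes go stale and would hide the problem. After the weekly rebuild they reflect what is actually stored, so the next exchange with another replica can expose the damaged value.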

Along with the anti-entropy feature, IPv6 support has been extended to the Protocol Buffers interface and some parts of Riak's data transfer component. Also, the Riak Control management software has been given a polish to give developers that 'new iterative release feel'. The update to the database is available via Basho's GitHub page.

Swanky multi-data center replication for enterprises with cash

Riak Enterprise, a paid-for version of the Riak database that Basho develops for large businesses, has received an update as well, in the form of "Advanced Multi-Datacenter Replication Capabilities."

The technology makes it easier to replicate information between multiple data centers without suffering a huge latency hit.

This is one of the trendier features to implement in a database at the moment, and follows TransLattice's adoption of a similar technology for its cloud-based offerings.

The upgrade gives administrators the option of streaming data from one cluster to another over multiple TCP connections, up to around one per physical node. Previously, replication was limited to a single TCP connection per cluster, which could create performance bottlenecks.
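The difference between the two connection models can be illustrated with a toy plan. This is a hypothetical sketch, not Basho's implementation: it assumes a map from partition number to the node that owns it, and `plan_legacy` and `plan_per_node` are invented names.

```python
from collections import defaultdict
from typing import Dict, List

def plan_legacy(partition_owners: Dict[int, str]) -> Dict[str, List[int]]:
    """Old model: every partition funnels through one cluster-wide connection."""
    return {"cluster": sorted(partition_owners)}

def plan_per_node(partition_owners: Dict[int, str]) -> Dict[str, List[int]]:
    """Advanced model (sketch): one connection per source node, each carrying
    only the partitions that node owns, so streams can run in parallel."""
    plan: Dict[str, List[int]] = defaultdict(list)
    for partition, node in sorted(partition_owners.items()):
        plan[node].append(partition)
    return dict(plan)
```

With one connection per node, the number of parallel streams grows with the cluster instead of being fixed at one, which is where the bottleneck relief comes from.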

"The new replication improvements are already used in production by customers and yielding significant performance improvements," Riak said. "For now, the new replication technology is available in advanced mode: it’s optional to turn on."

However, the technology does not yet support SSL, NAT, or full-sync scheduling, so organizations with thorough security policies may have to do a bit of extra work to use the feature.

"Our expectation is that most enterprises will already have trusted transport layer security solutions in place, but adding SSL to the advanced mode is one of our top priorities," Kane said, and indicated SSL support should come along in May.

Riak Enterprise pricing starts at $6,000 per node, and discounts are available as the number of nodes increases. ®


Biting the hand that feeds IT © 1998–2022