Meet the Tesla of the backup world – Datos IO (no, it doesn't make boxes that catch fire)

Data protection for distributed software

Analysis Startup Datos IO is the Tesla of the backup industry, redesigning data protection for distributed apps and focussing on high-end customers with code to solve specific problems no one else can solve anywhere near as well.

It now has ten paying customers for its RecoverX distributed database backup product, six months after its June 2016 launch. The software uses semantic deduplication as part of its armory, and that's what we are going to look at here.

RecoverX backs up non-relational databases running on-premises or in Amazon Web Services (AWS), Google and other clouds. RecoverX v1.5 supports Google Compute Engine and Google Cloud Storage, helping Google Cloud Platform users protect next-generation apps hosted on non-relational databases (NoSQL, cloud, graph and more) deployed in IaaS and PaaS environments.

The ten customers span financial services, retail, e-commerce, media and entertainment, healthcare, and the internet of things (IoT). Ayla Networks is the IoT customer, and Datos IO backs up its AWS cloud infrastructure and non-relational databases. Ayla bought RecoverX after finding script-based backup unsuitable because of its cost and inability to scale.

The retail customer is a Fortune 100 big-box home improvement retailer that migrated its existing e-commerce applications from on-premises data centers to the public cloud on Google Cloud Platform (GCP), to meet compliance standards and improve operational efficiency.


Cassandra DBA Ishinder Singh's tweet asking if Datos IO had won Home Depot as a customer

Another customer is Barracuda Networks.

Datos IO says "RecoverX is founded upon Consistent Orchestrated Distributed Recovery (CODR), next-generation scale-out data protection architecture that is based on elastic compute services that can be auto-scaled with load, removes the dependency on media servers, and transfers data in parallel to and from file-based and object-based secondary storage. CODR allows RecoverX to provide scalable versioning so that enterprises can protect and back up their data at any interval and granularity. It enables one-click, orchestrated and repair-free recovery for both operational recovery and test/dev use cases, as well as industry-first semantic de-duplication that allows customers to save up to 70 percent on secondary storage costs."

So what is semantic deduplication? A downloadable PDF from Datos IO's webpage is a research paper by several authors, including Datos IO cofounders CEO Tarun Thakur and CTO Prasenjit Sarkar. Its abstract discusses the backup and restore problem of next-generation eventually consistent storage systems (NECST – think non-relational databases like Cassandra and MongoDB) and suggests "a deep semantic understanding of the data stored within the system of interest as a solution."

The paper describes how such "modern systems no longer store data on disks (or SSDs) within a single machine, but rather spread data across many machines in replicated fashion; the replication is implemented in an eventually consistent manner ... The core problem, as we outline, is simple: that tools and systems cannot readily obtain an efficient, consistent, and logical view of data beneath these complex, diverse, and distributed NECST systems."

And then they say:

We believe that the key to success centers upon a deep semantic understanding of the data being stored within these new storage systems. Only by monitoring and inspecting I/O traffic and reconstructing its meaning (i.e., whether a quorum has been reached, or exactly how a particular data item has been replicated) can critical NECST management functions be implemented in an efficient and scalable manner.

Thus the first characteristic of semantic deduplication is that it is app-aware, with the app being the non-relational database.

Attributes of the semantic understanding, or database operation structural understanding, include:

  • Quorum reconciliation – Unlike traditional storage, where it is relatively easy to tell when an update has taken place, the simple task of knowing when an update has been committed to the storage system is challenging. NECST systems demand that tools and systems that are interested in what is stored within them understand the basics of how quorums are formed, and exactly how and when a data item is safely replicated within the system. By having a comprehensive understanding of the NECST replication protocol, a backup tool can determine the order of updates and form a coherent view of storage.
  • Redundant-copy detection – Unlike traditional striped or mirrored systems, in which redundancy is easily observed, NECST systems may encode data copies in a non-bitwise-identical fashion. Thus, a NECST backup or archival system must be able to meticulously comb through the NECST system to determine where logically identical copies reside, so as to be able to coalesce them and thus achieve storage-efficient backup.
  • Configuration-oblivious backup and restore – Distributed systems have frequent configuration changes, scaling up to meet new demands or down when a failure occurs and a system is removed from operation. NECST tools must be able to store, and then recreate, data despite the fact that its configuration has changed.
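To make the first two attributes concrete, here is a minimal Python sketch of quorum reconciliation followed by redundant-copy coalescing. It is an illustration only, not Datos IO's algorithm: the `ReplicaRecord` type, the `reconcile` function, the replication factor of three and the "majority of replicas" commit rule are all assumptions made for this example, and it works on already-decoded logical values rather than on-disk database files.

```python
from collections import defaultdict
from typing import NamedTuple


class ReplicaRecord(NamedTuple):
    """One logical cell as seen in a single node's local snapshot (hypothetical type)."""
    node: str       # node whose snapshot contained this copy
    key: str        # logical row key
    value: str      # decoded logical value, not the on-disk bytes
    timestamp: int  # write timestamp attached by the database


def reconcile(records, replication_factor=3):
    """Return one logical copy per key, keeping only quorum-acknowledged writes.

    Assumption: a write counts as committed once a majority of replicas hold it.
    """
    quorum = replication_factor // 2 + 1

    # Group copies by logical identity, so copies that differ only in which
    # node stored them collapse into a single entry (redundant-copy detection).
    holders = defaultdict(set)
    for rec in records:
        holders[(rec.key, rec.timestamp, rec.value)].add(rec.node)

    # Among quorum-acknowledged writes, keep the newest one for each key.
    newest = {}
    for (key, ts, value), nodes in holders.items():
        if len(nodes) < quorum:
            continue  # not yet safely replicated, so leave it out of this version
        if key not in newest or ts > newest[key][0]:
            newest[key] = (ts, value)

    return {key: value for key, (_, value) in newest.items()}


snapshots = [
    ReplicaRecord("n1", "user:1", "alicia", 20),
    ReplicaRecord("n2", "user:1", "alicia", 20),
    ReplicaRecord("n3", "user:1", "alice", 10),   # stale replica
    ReplicaRecord("n1", "user:2", "bob", 5),      # seen on one node only
    ReplicaRecord("n1", "user:3", "carol", 7),
    ReplicaRecord("n2", "user:3", "carol", 7),
    ReplicaRecord("n3", "user:3", "carol", 7),
]
print(reconcile(snapshots))  # {'user:1': 'alicia', 'user:3': 'carol'}
```

Seven replica records collapse to two logical rows here. As a rule of thumb, under the same assumption of three replicas per row, storing one logical copy instead of three removes about two thirds of the raw data before any other reduction is applied; that is arithmetic on the assumed replication factor, not a statement of how Datos IO arrives at its own savings figure.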

Datos IO's CODR software "takes a full snapshot of the database of interest; after this, CODR tracks changes applied to the database and generates incremental versions for the changes." A version is "a cluster consistent snapshot of a scale-out distributed database." We're told:

Full and incremental snapshots are transferred, in parallel, to a backup storage system, which can be a single node in smaller deployments, or a cluster in larger-scale settings. At the backup store, CODR must process the collection of local snapshots to realize a version. CODR achieves this end by running an integrated quorum and semantic-deduplication algorithm, resulting in a single, space-efficient copy of the data.

This means that CODR must have algorithms inside it that are specific to individual database products and cannot simply be pointed at a new database product about which it has no semantic (operation structural) knowledge.
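The full-plus-incremental versioning described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions rather than CODR's actual format: a version is represented as a plain dictionary of logical rows, each incremental is a dictionary of changed keys, and the `TOMBSTONE` marker and `materialize` function are hypothetical names introduced for the example.

```python
TOMBSTONE = None  # assumed marker meaning "this key was deleted in this increment"


def materialize(full, incrementals, version):
    """Rebuild the logical database state at a given version number.

    Version 0 is the full snapshot; version n applies the first n incrementals.
    """
    if not 0 <= version <= len(incrementals):
        raise ValueError(f"unknown version {version}")

    state = dict(full)  # copy, so the stored full snapshot is never mutated
    for delta in incrementals[:version]:
        for key, value in delta.items():
            if value is TOMBSTONE:
                state.pop(key, None)  # deletion recorded in this increment
            else:
                state[key] = value    # insert or update
    return state


full_snapshot = {"user:1": "alice", "user:2": "bob"}
increments = [
    {"user:2": "robert", "user:3": "carol"},  # version 1: one update, one insert
    {"user:1": TOMBSTONE},                    # version 2: one delete
]
print(materialize(full_snapshot, increments, 2))  # {'user:2': 'robert', 'user:3': 'carol'}
```

Because each version is expressed as logical rows rather than per-node files, the same state could in principle be written back to a cluster with a different number of nodes, which is the idea behind the configuration-oblivious restore attribute listed earlier.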


CODR deployment schematic

Datos IO has published its own explanation of how CODR works, along with several videos.

RecoverX is sold under an annual subscription license, with capacity tiers based on physical database size ($/TB). Gaining just ten customers in six months suggests that the pricing is not cheap, and that customers may need to suffer a degree of pain from their current backup and recovery operations before switching to Datos IO.

Why did we start this article by saying Datos IO is the Tesla of the backup industry? Assume Veeam, Veritas and Dell EMC are the Ford, GM and Chrysler of backup software. We think that Datos IO, like Tesla, is redefining how you do backup by going its own clever way, and focussing on a well-designed and high-end product to build its business. (No, stop thinking about cars catching fire or crashing.)

We're sure it will expand its database coverage in the future and potentially collide with the Veeams and Veritases of the market, which will make for interesting times in the backup industry. ®


Biting the hand that feeds IT © 1998–2022