Metadata-farming, data-silo-killing startup: Go on. Bring us your unstructured stuff
Former Primary Data boss talks to El Reg about Hammerspace
+Comment Hammerspace, a newcomer on the storage software-as-a-service scene, announced the general availability of its eponymous SaaS application this week. The software is built on technology from Primary Data – yes, that Primary Data – applied to hybrid IT and cloud environments, and provides a SaaS cloud-control plane.
The Reg spoke to exec chairman, CEO and founder David Flynn about Hammerspace, which provides a way of unifying file silos into a single network-attached storage (NAS) resource so that it can provide access to unstructured data anywhere, in hybrid or public clouds, on demand.
Flynn was co-founder and chairman of Primary Data, and before that founded and ran flash storage pioneer Fusion-io. Primary Data wanted to be a metadata-driven control plane for, er, primary data – but it closed earlier this year. When The Register asked about the closure, Flynn chalked it up to the firm having been ill-suited to the hybrid IT and public cloud world: it was focused on software that ran on-premises.
Hammerspace is angel-funded, has about 25 employees, and has acquired Primary Data and Tonian technology. It has partnerships with AWS, Azure and the Google Cloud Platform, and also with Western Digital, NetApp, Cloudian and Red Hat.
What it does
A Hammerspace software instantiation on-premises finds locally stored files and objects and builds a metadata catalogue describing them. Using global deduplication and WAN optimisation, it replicates this data and metadata to a Hammerspace instantiation in the cloud from where it can be presented in file or object form, and sent to any other on-premises or cloud location where the data is needed.
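To make the catalogue-then-dedupe idea concrete, here is a minimal Python sketch. It is not Hammerspace's code: the `CatalogueEntry` fields, the function names and the choice of SHA-256 are all our own assumptions for illustration. It walks a directory, records basic metadata per file, and groups paths by content hash so that identical content would only need to cross the WAN once.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical metadata record; the field names are illustrative only."""
    path: str     # path relative to the scanned root
    size: int     # size in bytes
    mtime: float  # last-modified time reported by the file system
    digest: str   # SHA-256 of the content, used to spot duplicates


def build_catalogue(root):
    """Walk `root` and return one CatalogueEntry per regular file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            # Reads each file fully into memory: fine for a sketch,
            # a real scanner would hash in chunks.
            with open(full, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            info = os.stat(full)
            entries.append(
                CatalogueEntry(
                    path=os.path.relpath(full, root),
                    size=info.st_size,
                    mtime=info.st_mtime,
                    digest=digest,
                )
            )
    return entries


def unique_payloads(entries):
    """Group paths by digest: each distinct digest is one payload to send."""
    by_digest = {}
    for entry in entries:
        by_digest.setdefault(entry.digest, []).append(entry.path)
    return by_digest
```

Calling `unique_payloads(build_catalogue("/some/dir"))` returns a dictionary with one key per distinct piece of content, however many paths point at it. Whole-file hashing is the crudest form of deduplication; we have no detail on the granularity Hammerspace actually uses.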
For example, containerised applications in the cloud (AMIs) may need to access data through the Hammerspace control plane, an on-premises workload may need to burst to the cloud for intensive compute processing, or a workload may need to follow the sun around the globe.
It provides concurrency control to allow multiple processes to update and read from a single shared copy of the data safely.
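Hammerspace has not told us how its concurrency control works, so the following is a generic sketch of one common approach, optimistic versioning, and not a description of the product. The `SharedObject` class and `append_with_retry` helper are hypothetical names. A writer must present the version it last read; if someone else got in first, the write is rejected and the writer retries.

```python
import threading


class SharedObject:
    """One shared copy of some data, guarded by a version counter."""

    def __init__(self, data=b""):
        self._lock = threading.Lock()
        self._data = data
        self._version = 0

    def read(self):
        """Return (data, version) as a consistent pair."""
        with self._lock:
            return self._data, self._version

    def write(self, new_data, expected_version):
        """Apply the update only if nobody has written since `expected_version`."""
        with self._lock:
            if self._version != expected_version:
                return False  # stale writer: caller should re-read and retry
            self._data = new_data
            self._version += 1
            return True


def append_with_retry(obj, suffix):
    """Read-modify-write loop that retries until the update is accepted."""
    while True:
        data, version = obj.read()
        if obj.write(data + suffix, version):
            return
```

With this scheme, readers never see a half-applied update and no accepted write is silently overwritten. A distributed system would need the version check to happen at whichever service owns the metadata, which is beyond the scope of this sketch.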
In effect Hammerspace – named for the invisible pocket of air from which a cartoon character pulls an object – farms metadata and then provides data management services using that metadata, such as file and object distribution across multiple clouds and locations, global search, stored item reporting and analytics.
The software is designed for data-intensive workloads in industries such as entertainment and media, oil and gas, semiconductor chip design, and also for MSPs.
Data coverage and pillars
Hammerspace said the software spans multiple data centres and multiple protocols and is aimed at all kinds of data: primary, secondary and tertiary, with the focus on primary data.
It is built on three pillars:
- Spanning multiple data centres at file-level granularity with machine learning-driven automation
- Intelligent metadata to allow data owners granular control of their files using things like tags and keywords
- The ability to present data at extreme levels of performance
We haven't seen any numbers to back up the performance claim yet. Flynn said it's about replicating metadata very, very fast, with the data following it.
Data is managed in a deterministic manner, according to Flynn, who said desired outcomes are stated and the Hammerspace system then wrangles the data to achieve them.
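The "state the outcome, let the system work out the steps" model can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `FileRecord` type, the tag names, the site names and the `plan_actions` function are ours, not Hammerspace's. The objectives say where files carrying a given tag should live, and the planner compares that with where copies exist now.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical catalogue record for one file."""
    path: str
    tags: set = field(default_factory=set)
    locations: set = field(default_factory=set)  # sites holding a copy now


def plan_actions(files, objectives):
    """List the copies needed to reach the desired placement.

    `objectives` maps a tag to the set of sites where files carrying
    that tag should be present.
    """
    actions = []
    for record in files:
        wanted = set()
        for tag in record.tags:
            wanted |= objectives.get(tag, set())
        # Only plan copies to sites that do not already hold the file.
        for site in sorted(wanted - record.locations):
            actions.append(("copy", record.path, site))
    return actions
```

For instance, with an objective of `{"render": {"onprem", "aws"}}`, a file tagged `render` that exists only at `onprem` produces a single planned copy to `aws`, while untagged files and files already in both places produce nothing. The same inputs always yield the same plan, which is one reading of what Flynn means by deterministic.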
The software also features automated decisions driven by machine learning.
Data Control Plane
Hammerspace is a multi-tenant, multi-cloud data control plane where data exists abstracted from storage and is available to any app, in any container, in any cloud. The idea is that automating data management with metadata-driven machine learning lets your data move between clouds along with your applications, rather than accumulating in a data silo.
Cloud Storage Gateways, caching and competition
But isn't this what Nasuni, Panzura and Egnyte do, with cloud storage gateway software – share large files between distant locations and help with hybrid on-premises/public cloud IT setups?
Flynn told us: "Hammerspace is tangentially like Nasuni and Panzura. They don't really serve the data-intensive world... just syncing low-performance files around the globe is not our sweet spot. It's about large scale and high performance."
What about caching?
Flynn said: "Caching is a half-assed way to do it."
He told us Hammerspace technology was non-invasive, unlike other file system redesigners: "What we are doing is non-disruptive. There's no need to replace your storage with our storage."
Other storage technology, such as Excelero's products and IBM's Spectrum Scale, which is used by Pixit Media, won't work in the public cloud.
Asked to compare Hammerspace to Cohesity and Rubrik, he said: "We're not storage [and] focused on metadata, and enabling the high-performance consumption of unstructured data."
It seems to The Reg that performance will be a key attribute. We would like to see comparative performance numbers to demonstrate Hammerspace speed compared to Nasuni and Panzura.
NetApp's Data Fabric concept, with SnapMirror, also offers file location flexibility in a hybrid IT environment. We asked Flynn how Hammerspace differs.
Flynn told us: "NetApp SnapMirror moves entire volumes of data... before it can be made available at a new location. Hammerspace virtualizes data, making it appear accessible instantly in any location through its metadata.
"Machine learning then makes data available predictably and on-demand with file-level granularity, so data consumers can start their jobs right away. Additionally, Hammerspace is able to present data from multiple underlying storage vendors into the same global namespace across the hybrid cloud, giving data consumers the necessary flexibility to operate above storage silos without the need to copy data."
Overall, Flynn said the aim is for Hammerspace to become a platform, like server virtualization and software-defined networking: "I want to end all storage silos. The biggest silo is the cloud. I want to end this and ensure the cloud does not trap people's data."
Will Hammerspace be another Primary Data? Flynn said it has moved on from on-premises data and is firmly rooted in the hybrid, multi-cloud world. It seems to us that an early demonstration of its claimed data access speed advantage would be a useful way of planting its stake in the high-performance distributed data access market and separating itself from the cloud storage gateway suppliers.
It will also need to differentiate itself from Cohesity and Rubrik, whose services include data protection and management, and also Komprise with its file data lifecycle management capabilities.
If Hammerspace can do this, we might be looking at a build-out-the-business funding round in 12 to 18 months. ®