Primary Data, Fusion-io founder David Flynn's startup, is unveiling its DataSphere product: universal data access software that masks multiple hardware and multiple vendor silos, to provide a single virtualized data sphere containing different tiers of data.
We'll start with a conceptual diagram and then explain the bits as we understand them. Spend a minute looking at this:
Conceptual DataSphere diagram
Data-accessing virtual machines (VMs) can read and write data, using standard protocols, from direct-attached storage, NAS shares, block-access SANs, and the cloud. These physical tiers can be provided by a mix of vendors – EMC Isilon (NAS), NetApp Data ONTAP (SAN), Intel NVMe (DAS flash), and Amazon S3 cloud storage – which occupy a single and infinitely scalable global data space, as Primary Data's marketeers would have it.
Data can be moved dynamically between tiers as access rates change. This is controlled by an out-of-band policy engine. Primary Data claims: "By operating out of band, DataSphere ensures this limitless scalability with no performance or application impact."
The policy engine can be used by admins to set policies for service level objectives (SLOs) across different storage tiers. These can be adjusted dynamically, and "Once policies are set, DataSphere automatically places data on the resource that best meets evolving application requirements, and alerts IT when policies are at risk of non-compliance."
We are told, "VMs are created and assigned with storage policies, and automatic remediation of service level objectives maintains policies through dynamic data mobility across the various storage tiers."
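To make the idea of policy-driven remediation concrete, here is a minimal Python sketch of how a compliance check might look, assuming an SLO is expressed as read IOPS, write IOPS, and latency targets. Every name in it (`Slo`, `Observed`, `violated_metrics`, `vms_to_remediate`) is hypothetical; Primary Data has not published DataSphere's internals, and this is not its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical objective: IOPS floors and a latency ceiling."""
    min_read_iops: int
    min_write_iops: int
    max_latency_ms: float


@dataclass(frozen=True)
class Observed:
    """Hypothetical measurement taken for one VM."""
    read_iops: int
    write_iops: int
    latency_ms: float


def violated_metrics(slo, observed):
    """Return the names of the metrics that miss the objective."""
    violated = []
    if observed.read_iops < slo.min_read_iops:
        violated.append("read_iops")
    if observed.write_iops < slo.min_write_iops:
        violated.append("write_iops")
    if observed.latency_ms > slo.max_latency_ms:
        violated.append("latency_ms")
    return violated


def vms_to_remediate(policies, observations):
    """Map each out-of-policy VM to the metrics it is missing.

    VMs with no measurement yet are skipped rather than flagged.
    """
    result = {}
    for vm, slo in policies.items():
        observed = observations.get(vm)
        if observed is None:
            continue
        violated = violated_metrics(slo, observed)
        if violated:
            result[vm] = violated
    return result
```

In a scheme like this, a VM that shows up in the result would be a candidate for an alert or for moving its data to a tier that can meet the objective. How DataSphere actually makes that decision is not something the company has detailed.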
Offloaded VM snapshots and clones are supported by DataSphere, and admins have the ability to add and remove backup storage without application disruption.
Now we have some Woz words, for Steve Wozniak is Primary Data's chief scientist: "Primary Data finally makes it possible to automatically have the right data on the right storage tier at the right time, without the need to rip and replace a single storage system."
There is single-pane-of-glass management – below is a screenshot.
DataSphere multi-vendor array management screen
We asked Primary Data's Kaycee Lai, a technology executive, some questions about this...
El Reg What is a service level objective?
Kaycee Lai SLOs are business objectives for applications. They define a commitment to maintain a particular state of the service in a given period. For example, specific write IOPS, read IOPS, latency, etc, to maintain for each application. SLOs are measurable characteristics of the SLA.
El Reg How does it differ from an SLA?
Kaycee Lai SLA refers to the price, time, etc, assigned to achieve the SLO.
El Reg How many storage tiers are supported?
Kaycee Lai In the GA release, DataSphere will support DAS, NAS, and Object as storage types. Block level support for SAN will follow in the next release. Tiers are a logical concept in DataSphere. Tiers are simply a class of storage that is mapped to a particular SLO. The notion of having multiple tiers is not as important as having multiple objectives requiring the specific storage to meet those objectives. Customers can create as many objectives as their business requires.
El Reg How are they defined?
Kaycee Lai In short, objectives are defined at a logical level. Different types of storage resources are transparently added to and removed from the global dataspace, and are automatically discovered by DataSphere. It creates a logical construct using existing storage to create a logical volume or share that is designed to meet a specific SLO. The physical data can actually be spread across separate volumes or shares within and across a given storage system.
El Reg How are multi-vendor storage resources matched to tiers?
Kaycee Lai DataSphere utilizes the capabilities of third-party storage systems to match the specific requirements of an SLO. For example, an application looking for 100K IOPS for data that is transient and doesn't need to persist (e.g., temp/scratch files) can be placed on server-side flash rather than on a more expensive all-flash array that meets those performance objectives but also gives five nines of availability. This allows the customer to meet the objectives of this application at a lower cost. As another example, if an application cannot benefit from de-dupe (e.g., media files), it can be defined as not requiring de-dupe and, as a result, will not be placed on a device that has de-dupe always turned on.
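The two examples in that answer amount to filtering storage resources by what an objective requires and then picking the cheapest one that qualifies. A rough Python sketch of that logic follows. The class names, fields, cost figures, and the "cheapest qualifying resource" rule are all assumptions made for illustration, not DataSphere's actual placement algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """Hypothetical SLO: what the application needs from its storage."""
    min_iops: int
    needs_persistence: bool
    allow_always_on_dedupe: bool


@dataclass(frozen=True)
class StorageResource:
    """Hypothetical description of one storage resource's capabilities."""
    name: str
    iops: int
    persistent: bool
    dedupe_always_on: bool
    cost_per_gb: float  # illustrative relative cost, not a real price


def meets(resource, objective):
    """True if the resource satisfies every requirement of the objective."""
    if resource.iops < objective.min_iops:
        return False
    if objective.needs_persistence and not resource.persistent:
        return False
    if resource.dedupe_always_on and not objective.allow_always_on_dedupe:
        return False
    return True


def place(objective, resources):
    """Pick the cheapest qualifying resource, or None if nothing qualifies."""
    candidates = [r for r in resources if meets(r, objective)]
    return min(candidates, key=lambda r: r.cost_per_gb, default=None)


# Made-up inventory mirroring the examples in the answer above.
RESOURCES = [
    StorageResource("server_flash", 150_000, False, False, 0.5),
    StorageResource("all_flash_array", 200_000, True, True, 2.0),
    StorageResource("nas_share", 20_000, True, False, 0.2),
]

scratch = Objective(min_iops=100_000, needs_persistence=False,
                    allow_always_on_dedupe=True)
media = Objective(min_iops=10_000, needs_persistence=True,
                  allow_always_on_dedupe=False)
```

With this made-up inventory, `place(scratch, RESOURCES)` selects `server_flash`, because both flash options meet the IOPS target and the server-side one is cheaper. `place(media, RESOURCES)` selects `nas_share`, because the all-flash array is ruled out by its always-on de-dupe.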
El Reg Do VMware admins, via VVOLs, see tiers?
Kaycee Lai VMware admins will see a single VVOL data store with DataSphere. Tiers, again, are logical entities that are mapped to service level objectives, and are exposed to VMware admins as VM Storage Policies.
El Reg How are SLOs linked to tiers?
Kaycee Lai Admins create policies and objectives in DataSphere. This ensures granular objectives for individual VMs or VMDKs. VM Storage Policies for VVOL datastores can be created via VASA integration with DataSphere and assigned to VMs and VMDKs.
El Reg How does DataSphere talk to individual storage resources?
Kaycee Lai For control plane functions, DataSphere talks to individual storage resources via standard NFS protocols for NAS-based storage. For object-based storage, DataSphere has an object connector (SW or VM) that does the file-object translation. For DAS, DataSphere has a Linux-based VM inside the host that acts as an NFS server. All I/Os are direct from client to storage, without going through DataSphere.
El Reg Which multi-vendor arrays, server flash providers, and public cloud suppliers are supported? I need actual array classes please.
Kaycee Lai For NetApp, DataSphere works with ONTAP 8.1 and higher for 7-mode. For C-mode, we work with ONTAP 8.2. DataSphere supports EMC Isilon, Dell R630 for direct attached and hyper-converged implementations, and Object storage with Amazon S3 or any Swift-based object storage.
El Reg I see VVOLs are used for DataSphere<->vSphere communications. What is used for Hyper-V, KVM, and Xen, since they don't have VVOLs and VASA?
Kaycee Lai Each vendor has their own unique VM management tools and we are working to integrate with those. DataSphere can set and manage the SLOs for Virtual Disks used by all hypervisors, including Hyper-V, KVM, and Xen, using their protocols.
El Reg Primary Data says DataSphere provides the ability to:
- Adapt to continually changing business objectives with intelligent data mobility.
- Scale performance and capacity linearly and limitlessly with unique out-of-band architecture.
- Reduce costs through increased resource utilization and simplified operations.
- Simplify management through global and automated policies.
- Accelerate upgrades of new solutions such as VMware vSphere 6 with seamless migration using existing infrastructure.
- Reduce application downtime with automated non-disruptive movement of data.
- Deliver a full range of data services across all applications in the data center.
All of these are in keeping with the concepts above. It is early days and far too soon to say if this is great software or not. The concept has an attractive clarity about it, and it will be fascinating to see how Primary Data instantiates these concepts in its product and manages to develop a wide range of supported multi-vendor storage products – think more arrays, VSANs, and other SW-only storage products, and additional cloud targets. Will hyper-converged appliances be supported?
A Primary Data demo at VMworld 2015 US will host 762 million files using 226TB of storage space, with objectives mapped to virtual machine (VM) storage policies through VMware vSphere Storage APIs for Storage Awareness (VASA). VMworld is in San Francisco's Moscone Center and runs from Aug 29 to Sep 3. ®