Seagate is building hard disk drives with a direct Ethernet interface and object-style API access for scalable object stores, a plan which - if it works - would destroy much of the existing, typical storage stack.
Drives would become native key/value stores that manage their own space mapping, while applications would simply work at the object level, issuing gets and puts instead of using file abstractions.
Seagate says it has developed its Kinetic technology because the existing app-to-drive storage stack is clumsy, inefficient and delays data access. Put an Ethernet interface module on each drive for apps to talk to directly, they say.
The storage behemoth states:
"The transit path from application to storage requires multiple layers of manipulation from databases, down through POSIX interfaces, file systems, volume managers and drivers. Information passes over Ethernet, through Fibre Channel, into RAID controllers, SAS expanders and SATA host bus adapters."
Seagate view of traditional storage stack
It seems this just won't do. Seagate thunders:
The majority of today’s mass scale object applications do not need either file semantics or a file system to determine and maintain the best strategy for space management on a device. Modern applications only need object semantics [e.g. write the whole thing, read the whole thing, delete the whole thing, refer to it by a handle chosen by the client and cluster manager] not where data resides on a given device.
We're further assured that "objects (information) are written, read and deleted but never modified."
A light bulb went on in our minds here. Shingled media drives, which Seagate is developing, are very, very poor at re-writing data. Because shingling partially overlaps adjacent tracks, re-writing a single data item can mean reading and re-writing multiple tracks.
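To see why in-place updates hurt, here is a deliberately simplified model of one shingled band. It assumes (hypothetically) that writing track i overlaps part of track i + 1, so modifying track i in place means reading and re-writing every later written track in the band, while appending a new object at the band's write pointer touches no existing data. The `ShingledBand` class and the band size are illustrative only, not Seagate geometry.

```python
from dataclasses import dataclass


@dataclass
class ShingledBand:
    """Toy model of one shingled band (hypothetical, not real drive geometry).

    Writing track i partially overlaps track i + 1, so a track can only be
    changed safely if every later written track in the band is re-written too.
    """
    num_tracks: int
    write_pointer: int = 0   # next free track for sequential (append) writes
    tracks_written: int = 0  # running count of physical track writes

    def append(self) -> int:
        """Write a new object into the next free track; nothing is re-written."""
        if self.write_pointer >= self.num_tracks:
            raise RuntimeError("band full")
        track = self.write_pointer
        self.write_pointer += 1
        self.tracks_written += 1
        return track

    def modify_in_place(self, track: int) -> int:
        """Change an existing track; return how many tracks had to be re-written."""
        if not 0 <= track < self.write_pointer:
            raise IndexError("track not yet written")
        # The target track plus every later written track that shingles over it
        # (those later tracks must be read first, then re-written).
        cost = self.write_pointer - track
        self.tracks_written += cost
        return cost
```

In a full 100-track band, modifying track 0 costs 100 track writes while modifying the last track costs one; write-once objects that are only ever appended or deleted avoid that penalty entirely.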
Seagate's Kinetic drive storage scheme
In Seagate's Kinetic scheme, drives communicate in keys and values, handling operations such as gets, puts and deletes. Applications distribute objects and manage clusters, while the drive itself efficiently manages functions such as:
- Managing key (object) ordering
- Quality of service
- Policy-based drive-to-drive data migration
- Handling of partial device failures and other management
- Data-at-rest security
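As a rough illustration of that division of labour, here is an in-memory stand-in for such a drive. The `KeyValueDrive` class and its method names are hypothetical, not the Kinetic API: the application only issues whole-object put, get and delete calls by key, while the "drive" keeps its keys ordered so it can answer range queries itself.

```python
import bisect
from typing import Dict, List, Optional


class KeyValueDrive:
    """In-memory stand-in for a Kinetic-style drive (hypothetical, not Seagate's API).

    Applications see only whole-object put/get/delete by key; the "drive"
    keeps its keys sorted so it can answer ordered range queries itself.
    """

    def __init__(self) -> None:
        self._values: Dict[bytes, bytes] = {}
        self._keys: List[bytes] = []  # kept sorted for range queries

    def put(self, key: bytes, value: bytes) -> None:
        # Whole-object write: a put replaces any previous value outright.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._values.get(key)

    def delete(self, key: bytes) -> bool:
        if key not in self._values:
            return False
        del self._values[key]
        self._keys.pop(bisect.bisect_left(self._keys, key))
        return True

    def key_range(self, start: bytes, end: bytes) -> List[bytes]:
        """Return keys k with start <= k < end, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return self._keys[lo:hi]
```

Because keys are ordered on the device, a prefix such as `b"user/"` can be listed with one range call rather than a scan in application code.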
Model of Seagate Kinetic Storage stack
Seagate says that, by scrapping the storage servers and the logic connecting them to the drives, racks can hold more drives, and four other bennies follow:
- Data Sharing - one application can write a key and value to a drive, while another has the ability to read the data
- Data can be moved directly between drives: peer-to-peer copy commands in the API move ranges of keys from one drive to another
- Silent data corruption is a fact of life, Seagate says. With Kinetic Storage, data can be stored with comprehensive end-to-end integrity checks guaranteeing that data is stored correctly
- Drive technology can develop without needing upper stack software changes
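The peer-to-peer copy and end-to-end integrity points can be sketched together. This is a minimal model under stated assumptions: the application tags each value with a SHA-256 digest (the hash choice, the `Drive` class and the `copy_range_to`/`client_get` names are all hypothetical), the drive checks the tag on every put, a range copy goes drive to drive without the host in the data path, and the reader re-checks the tag so corruption at rest is caught rather than silently returned.

```python
import hashlib
from typing import Dict, List, Tuple


def tag(value: bytes) -> bytes:
    """Integrity tag computed at the application end (SHA-256, by assumption)."""
    return hashlib.sha256(value).digest()


class IntegrityError(Exception):
    pass


class Drive:
    """Hypothetical key/value drive that stores each value with its integrity tag."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._store: Dict[bytes, Tuple[bytes, bytes]] = {}

    def put(self, key: bytes, value: bytes, value_tag: bytes) -> None:
        # Drive-side check: reject data that was corrupted on the way in.
        if tag(value) != value_tag:
            raise IntegrityError(f"{self.name}: bad tag for {key!r}")
        self._store[key] = (value, value_tag)

    def get(self, key: bytes) -> Tuple[bytes, bytes]:
        return self._store[key]

    def keys_in_range(self, start: bytes, end: bytes) -> List[bytes]:
        return sorted(k for k in self._store if start <= k < end)

    def copy_range_to(self, peer: "Drive", start: bytes, end: bytes) -> int:
        """Peer-to-peer copy: push keys in [start, end) straight to another drive."""
        keys = self.keys_in_range(start, end)
        for key in keys:
            value, value_tag = self._store[key]
            peer.put(key, value, value_tag)  # the peer re-verifies on arrival
        return len(keys)


def client_get(drive: Drive, key: bytes) -> bytes:
    """Application-side read that re-checks the tag end to end."""
    value, value_tag = drive.get(key)
    if tag(value) != value_tag:
        raise IntegrityError(f"silent corruption detected for {key!r}")
    return value
```

Keeping the tag with the value all the way from writer to reader is what makes the check end to end: every hop (host to drive, drive to drive, drive back to host) can verify the same digest.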
The assertion is that, in general, this can lower the total cost of ownership (TCO) of an average cloud infrastructure by up to 50 per cent.
Seagate says further:
Specific Seagate drives are provided with a comprehensive user-space library that allows applications to access the drive directly. This library provides the complete interface to access the data and to manage the drive. It bypasses the normal operating system storage stack and lets the application talk directly to the drive as if it were talking to another service in the data centre. This process utilises a typical application remote procedure call (RPC). This Kinetic Storage API platform currently provides libraries for Java, C++, C, Python, and Erlang, and other languages will be provided over time.
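Seagate's description boils down to a client library that makes the drive look like any other networked service reached by RPC. The sketch below shows that shape under loud assumptions: the framing (a 4-byte length followed by JSON), the op names and the `DriveClient` class are hypothetical and are not the Kinetic wire protocol or any of its language libraries, and the "drive" is just a thread on the far end of a local socket pair.

```python
import base64
import json
import socket
import struct
import threading
from typing import Dict, Optional


def send_msg(sock: socket.socket, msg: dict) -> None:
    """Send one message as a 4-byte big-endian length plus a JSON body."""
    body = json.dumps(msg).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise ConnectionError if the peer hangs up."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> dict:
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length))


def drive_loop(sock: socket.socket, store: Dict[str, str]) -> None:
    """Toy 'drive' side: serve put/get/delete until the client disconnects."""
    try:
        while True:
            req = recv_msg(sock)
            op, key = req["op"], req["key"]
            if op == "put":
                store[key] = req["value"]
                send_msg(sock, {"status": "ok"})
            elif op == "get" and key in store:
                send_msg(sock, {"status": "ok", "value": store[key]})
            elif op == "delete" and key in store:
                del store[key]
                send_msg(sock, {"status": "ok"})
            else:  # missing key (or an unknown op, in this toy)
                send_msg(sock, {"status": "not_found"})
    except ConnectionError:
        pass  # client hung up; stop serving
    finally:
        sock.close()


class DriveClient:
    """Hypothetical user-space client: each method is one request/response RPC."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock

    def _call(self, req: dict) -> dict:
        send_msg(self._sock, req)
        return recv_msg(self._sock)

    def put(self, key: str, value: bytes) -> None:
        # JSON cannot carry raw bytes, so values travel base64-encoded.
        self._call({"op": "put", "key": key,
                    "value": base64.b64encode(value).decode("ascii")})

    def get(self, key: str) -> Optional[bytes]:
        resp = self._call({"op": "get", "key": key})
        return base64.b64decode(resp["value"]) if resp["status"] == "ok" else None

    def delete(self, key: str) -> bool:
        return self._call({"op": "delete", "key": key})["status"] == "ok"

    def close(self) -> None:
        self._sock.close()


def connect_to_toy_drive(store: Dict[str, str]) -> DriveClient:
    """Demo wiring: an in-process 'drive' thread on the far end of a socket pair."""
    client_sock, drive_sock = socket.socketpair()
    threading.Thread(target=drive_loop, args=(drive_sock, store), daemon=True).start()
    return DriveClient(client_sock)
```

Nothing in the client touches a file system, volume manager or block driver: the application's only dependency is a socket to the drive, which is the point Seagate is making about bypassing the OS storage stack.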
We can imagine that hybrid drives, ones with flash caches, will be well-suited to this application, with metadata stored in flash.
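One way to picture that split is a log-structured layout: object values are appended sequentially to the platters (which also suits shingled media), while the small key-to-location index lives in flash. The sketch below is purely our assumption of how such a drive might be organised, not a Seagate design; `HybridKVDrive` is hypothetical and both "tiers" are plain in-memory objects.

```python
from typing import Dict, Optional, Tuple


class HybridKVDrive:
    """Toy hybrid drive: values appended to a 'disk' log, index kept in 'flash'.

    Both tiers are in-memory objects here; the split only illustrates where
    each kind of data might live (an assumption, not a Seagate design).
    """

    def __init__(self) -> None:
        self.disk_log = bytearray()                          # sequential, append-only
        self.flash_index: Dict[bytes, Tuple[int, int]] = {}  # key -> (offset, length)

    def put(self, key: bytes, value: bytes) -> None:
        offset = len(self.disk_log)
        self.disk_log += value  # never overwrite in place: always append
        self.flash_index[key] = (offset, len(value))

    def get(self, key: bytes) -> Optional[bytes]:
        loc = self.flash_index.get(key)  # metadata lookup touches flash only
        if loc is None:
            return None
        offset, length = loc
        return bytes(self.disk_log[offset:offset + length])  # one disk read

    def delete(self, key: bytes) -> None:
        # Only the index entry goes; stale bytes stay in the log until a
        # (not shown) cleaning pass copies live data elsewhere.
        self.flash_index.pop(key, None)
```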
The key/value API will be open sourced, we're told.
Seagate's Kinetic storage drives (hard disk drives first, it says) are for cloud and hyperscale data centres. The press announcement has tinned supporting quotes from Basho Technologies, Dell, EVault, Huawei, Hyve, Rackspace, Sanmina (Newisys division), Supermicro, SwiftStack, Yahoo and Xyratex. That's a good collection of supporting companies.
Here's a sample out of the quote tins, from Xyratex CEO Ernie Sampias:
"In the last few years, we have been shipping an object-based storage solution in our ClusterStor offering, and we clearly see object-based storage as a key part of our future. What the Seagate team is doing fits together nicely with our strategy. We are pleased to partner with them and bring improved performance and scalability to our customers.”
So Xyratex will be producing storage enclosures with new Kinetic drives inside them and, presumably, adding system app-level software to access them.
Basho has today announced that it has partnered with Seagate Technology to deploy its distributed NoSQL database Riak on Seagate's Kinetic Open Storage platform. It claims that "Riak will improve the I/O operational efficiency of the platform by removing bottlenecks and optimising cluster management, whilst maximising storage density and simplifying operations to reduce customer costs."
Basho is making available an eKinetic driver enabling an Erlang-based high-performance socket connection to the drive. Basho is also providing software that maps a Riak backend to the drive library. Both the eKinetic driver and Riak backend compatibility are available as alpha version software.
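Basho hasn't said here how its backend lays Riak's bucket/key pairs onto the drive's flat key space. One plausible approach, sketched purely as an assumption (in Python rather than Basho's Erlang; `encode_key`, `decode_key` and `list_bucket` are hypothetical names), is to length-prefix the bucket name so different buckets can never collide and each bucket's keys stay contiguous in the drive's sorted order.

```python
import bisect
import struct
from typing import List, Tuple


def encode_key(bucket: bytes, key: bytes) -> bytes:
    """Map a Riak-style (bucket, key) pair onto one flat drive key.

    The 2-byte length prefix keeps bucket b"a/b" + key b"c" distinct from
    bucket b"a" + key b"b/c", and keeps each bucket's keys contiguous when
    the drive sorts keys bytewise.
    """
    return struct.pack("!H", len(bucket)) + bucket + key


def decode_key(drive_key: bytes) -> Tuple[bytes, bytes]:
    (n,) = struct.unpack("!H", drive_key[:2])
    return drive_key[2:2 + n], drive_key[2 + n:]


def list_bucket(sorted_drive_keys: List[bytes], bucket: bytes) -> List[bytes]:
    """List one bucket's keys from a bytewise-sorted key list (e.g. a range scan)."""
    prefix = struct.pack("!H", len(bucket)) + bucket
    i = bisect.bisect_left(sorted_drive_keys, prefix)
    out = []
    while i < len(sorted_drive_keys) and sorted_drive_keys[i].startswith(prefix):
        out.append(decode_key(sorted_drive_keys[i])[1])
        i += 1
    return out
```

A naive separator scheme such as `bucket + b"/" + key` would let those two example pairs map to the same drive key; the length prefix rules that out at the cost of two bytes per key.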
Logically, this means Basho has been using Kinetic drives and Seagate must be close to announcing them.
Read up on Seagate's platform here. ®