The History Boys: Object storage ... from the beginning
Taking you way back with content-addressable storage
Backgrounder This is a terrific object storage history map from Silicon Valley object storage guy Philippe Nicolas*, who has put together a spreadsheet detailing the history of content-addressable storage (CAS**) – otherwise generally known as object storage.
I have heard so many odd things about suppliers and technologies in this market segment that I decided to build my own map of players, comments and analysis. The map (shown below) summarises the genesis of these various products coming from multiple origins, with a simple timeline on the X axis and Companies, Projects or Products on the Y axis.
As with many innovations, the map illustrates perfectly that almost all systems come from small players. What is true in other IT segments is also true in storage, even if storage was and is still associated with hardware and infrastructure.
At the top of the map some key papers produced by a few Internet giant companies are mentioned. These initiated a round of commercial implementation of the object storage approach for large (hyper-) scale environments.
The Google File System, MapReduce and BigTable plus Hadoop, Amazon Dynamo, Yahoo MObStor and Facebook Cassandra were and are still key references for almost all the companies listed in the map. These internet giants didn’t find any commercial products that scaled enough for their anticipated needs. They also didn’t want to pay a huge price for limited, complex products, and preferred to design, build, develop, control and master their own technologies, based on their strong Linux DNA.
For easier reading, a legend at the bottom left of the map describes the meaning of all the coloured symbols.
Philippe Nicolas’ spreadsheet labour of love – his CAS/object storage development history map.
We can see a clear first wave with CAS players from 1998 to 2005. FilePool is clearly the pioneer in that domain, thanks to Mr Paul Carpentier, and initiated a new approach to storage dedicated to fixed content. The company designed a radical new way to efficiently store large volumes of data for a long period of time. This initiative kicked off a new round of data archiving on disk.
We can then list Bycast, Evertrust, Permabit, Archivas, Sun with its Honeycomb project, and potentially Caringo, the next story from Paul Carpentier, which acted as a real bridge between CAS and the later Object Storage phase.
To illustrate the attractiveness of this promising segment, almost all these players were acquired. For example:
- FilePool by EMC in 2001 to become the Centera offering,
- Evertrust by Nexsan in 2005 to be renamed Assureon then swallowed by Imation for $100M in 2013,
- Archivas by Hitachi Data Systems for $120M in 2007 to morph into the Hitachi Content Platform,
- Bycast by NetApp in 2010.
The second wave took place from 2004/2005 to 2009 with the real object storage pioneers. This period has a small overlap with the previously defined CAS era. The key players here are Caringo, Cleversafe, B-Virtual which became Amplidata, Compuverde, DDN with Bucket File System, and the Redcurrant project at Atos in France, which was forked almost 10 years later, in 2015, as OpenIO.
We also list the start of Ceph and Gluster, two famous open source projects that gained great traction in the market and were later acquired by Red Hat: Ceph via Inktank in 2014 for $175M, and Gluster earlier, in 2011, for $136M. Red Hat started its storage acquisition strategy in 2003 with Sistina, bought for $33M for its cluster volume manager and cluster file system.
This period also saw the first real appearance in storage of a new method of protecting data to deliver better data integrity and durability: the erasure coding approach, promoted by Cleversafe and Amplidata. 2006 is also the year of Amazon S3, a disruptive storage model introduced by the online retail giant that changed the IT world for ever. The cloud storage offering was available for remote access over the internet with a very easy subscription model. Nirvanix launched the year after, in 2007, aiming to address the enterprise portion of this market.
The third wave is from 2008/2009 to the present, with plenty of open source projects and products such as:
- OpenStack Swift,
- Joyent with Manta,
- Ambry from LinkedIn,
- Torus from CoreOS,
- Microsoft Azure,
- Google Cloud Storage.
There has been a real push for Hadoop, and several commercial offerings targeting on-premises needs for enterprises, telco operators or service providers. We see here some players wanting to build and offer the next data storage platform, with multiple access protocols or methods as the ultimate consolidation factor.
It became tough to differentiate solutions, even though data protection with erasure coding and access methods are useful features for comparison. Some players added HDFS connectivity, and Hortonworks started the Ozone project, illustrating the probable convergence of all these systems. At the end of the day, HDFS is just an access method.
This period is probably the most active of the three waves with a clear majority of open source solutions.
Globally in 2000, only four products existed: FilePool, Bycast, Evertrust and Permabit. None of them exists independently today, as three of them were acquired (development is still happening for some of them, like NetApp StorageGRID from Bycast) and Permabit stopped its product in 2011. In 2010, I counted more than 30 products including a few open source ones. Today more than 45 systems are available on the market with many open source flavours.
This open source presence started with Ceph in 2004 and the wave accelerated in 2008. Today, 16 of the 45 systems mentioned above are open source.
In terms of acquisitions, many of them happened during recent years for companies that had existed for at least five years; this is well illustrated on the top right of the map. The biggest ever was Cleversafe, acquired by IBM after some competitive bidding to try and acquire Amplidata.
Here are some take-away lessons from this foray into CAS and Object Storage history:
- HTTP has demonstrated its value in storage as a transport protocol,
- S3 has emerged as the de-facto market standard,
- Implementing storage with x86 commodity servers is real, and Software-Defined Storage’s emergence and presence have confirmed this reality for more than a decade, since FilePool in fact,
- Erasure Coding is a must, especially at scale,
- Open Source is a reality and makes adoption easier and faster for large capacity systems,
- Don’t forget file, as it’s still present, and many object storage vendors who refused to acknowledge that reality for many years have had to adapt their solution, messaging and position,
- Microsoft, except with Azure, is totally absent,
- Veritas, the long time data and storage management giant, independent since its split from Symantec, is also out of this map with no product.
And finally, we can see afresh that innovations come from small players and teams. Storage is no exception to this rule. ®
*Philippe Nicolas is an advisor at Rozo Systems, OpenIO, Infinit, Solix Technologies and Guardtime
**CAS: also known as Content Addressable Storage, then Content Aware Storage