A couple of weeks ago I published an article about high performance object storage. Reactions have been quite diverse. Some think that object stores can only be huge and slow, while others think quite the opposite. In fact, they can also be fast and small.
In the last year I’ve had a lot of interesting conversations with end users and vendors concerning this topic. Having just covered “fast object stores”, I’d like to point out once more that by fast I mean “faster and with better latency than traditional object stores, but not as fast as block storage.”
This time round I’d like to talk about smaller object stores.
Some of my colleagues (both analysts and bloggers) say that object storage makes no sense under one petabyte or so. My point of view is that they are dead wrong.
It all depends on the applications and on the strategy your organisation is adopting. Let me work with examples here.
It depends on the application
HDS was one of the first in the market to think about object storage as an enabler for cloud and data-driven applications and not just as a more affordable form of storage for cold data. It invested in building an ecosystem which is now very robust and seems quite successful with its customers.
Two pieces of this ecosystem are the remote NAS gateway and Sync&Share (HDI and HCP Anywhere, respectively, in HDS nomenclature). HDS claims that more than 1,500 customers are running HCP now and there's something like 400+ PB of on-premises storage under management. Just by doing the simple maths (400PB/1,500), this works out to roughly 267TB per customer on average… without considering that some of these customers are huge and use HCP for the traditional archiving/content management use cases.
I’m wondering how big HDS customers would be on average if I were to remove the first 10 installations in terms of capacity from the equation – and how many of those 10 customers are actually using HCP for enterprise applications like Sync&Share. I'd bet that those 10 are more in the xSP field, video content distribution, archiving, big data and so on. But this is merely speculation on my part… and I invite HDS to leave a comment if it wants to expand on this.
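To make that back-of-the-envelope reasoning concrete, here is a minimal Python sketch. The totals are the vendor-claimed figures quoted above. The share of capacity held by the ten largest installations is not known, so the 50 per cent used below is purely a hypothetical placeholder to show how sensitive the average is, not a real data point. Decimal units (1PB = 1,000TB) are also an assumption.

```python
# Vendor-claimed figures quoted in the article (not measured data).
TOTAL_CAPACITY_PB = 400   # "400+ PB" under management
CUSTOMERS = 1500          # "more than 1,500 customers"
TB_PER_PB = 1000          # assumption: decimal units


def average_tb(total_pb: float, customers: int) -> float:
    """Mean capacity per customer, in TB."""
    return total_pb * TB_PER_PB / customers


def average_without_top(total_pb: float, customers: int,
                        top_n: int, top_share: float) -> float:
    """Mean capacity per customer after removing the top_n largest
    installations, assumed to hold top_share of the total capacity.
    top_share is a hypothetical input, not a published figure."""
    remaining_pb = total_pb * (1 - top_share)
    return average_tb(remaining_pb, customers - top_n)


overall = average_tb(TOTAL_CAPACITY_PB, CUSTOMERS)

# HYPOTHETICAL: suppose the ten largest installations held half the capacity.
trimmed = average_without_top(TOTAL_CAPACITY_PB, CUSTOMERS,
                              top_n=10, top_share=0.5)

print(f"average, all customers:         {overall:.0f} TB")
print(f"average, top 10 removed (50%):  {trimmed:.0f} TB")
```

Under that made-up 50 per cent assumption the average drops from about 267TB to about 134TB. The point is only that a handful of very large installations can pull the mean well above what a typical customer runs.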
Other vendors, such as Cloudian, have a license that starts as low as 10TB. I have personally met some of their (happy) customers in the range of 100 to 300TB. These end users have embraced object storage for NAS gateways, file distribution, and, lately, backup. For each new application they add more capacity and more cluster nodes.
Caringo is another good example. It has always worked with ISVs and many of its customers are quite small. And now, thanks to FileFly, it has a compelling solution for file server consolidation/remotisation.
This kind of solution is good for small and large customers and they are doing rather well with it. I was having a talk with them a few months ago and they were thinking about bundling the whole solution (Swarm + FileFly) in a package for the smaller customers (starting at around 40/50TB) because they’ve seen a lot of interest in that capacity range.
I’m not saying that these vendors can’t scale or that they don’t have large installations. Large installations are the case histories you can find on their websites, the kind of installation that is much easier to publicize because it demonstrates the potential of your product. Need another example? Small and specialized vendors like Object Matrix have customers that start under the 200TB point but on their homepage you’ll find one of the biggest.
Nick Pearce, one of the founders of Object Matrix, told me that most of his customers start very small (in the past the average deal was 60TB and lately – because of large disks, I suppose – they start at 300TB), and they grow from there. His explanation is simple: less risk while taking advantage of scale-out architecture.
A customer of mine started working with Ceph a while ago and is now implementing it in production on a three-node cluster with 100TB usable. I’ve spoken to others in the last six to nine months who are doing the same with clusters in the order of 100 to 500TB built out of decommissioned servers. Many of them use it just as third-tier storage for log archiving, secondary backup and so on. But it’s cheap and reliable for them.
You don’t have to take my word for it. I asked someone who works with all the object storage (and NAS) vendors: Jeff Denworth, SVP Marketing at CTERA.
CTERA is a cool vendor that provides some really interesting solutions such as Sync & Share and NAS gateways, among other fancy cloud-based backup solutions. It has hundreds of customers. Some of them are ISPs with several thousands of end users, but it is doing pretty well in the enterprise market, too.
When I asked Jeff to express his opinion about object storage sizes, he told me: “I would say, of our customer pool, the large majority of them have under 200TB. But we’re also not the only use case they consider object for, so we become the first use case (gateways, sync and share, etc.) and then the customer immediately starts thinking about new use cases (backup, then DevOps, are most commonly the next to consider).”
So even though CTERA has global deduplication and compression functionality, its customers are still in the range I’m talking about.
I asked the same question of SwiftStack last week and they told me that the first installation for the majority of their customers is in the order of 300TB now. That capacity grows quickly over time but still, they usually start small…
Did I mention the majority of object storage vendors here? Well, if not it’s because the article can’t be as long as a novel, but I think I gave you enough to think about small object stores, didn’t I?
But there is more
Some startups are working on smaller object storage systems intentionally. They want to build object storage systems that are small by design or, better still, small-footprint object storage systems.
Minio is working hard on an S3-compatible object store that can run in a single virtual machine or a container. The product is open source and has been designed for developers. I think about it as the MySQL of object stores. And they are not alone: Open.IO has a similar approach to building an object storage system that can serve single applications. The right back-end for developers in the cloud era.
The idea behind this object storage system is that developers are asking for S3-compatible storage to build their applications. The small footprint is necessary to embed it within a container and distribute the application in the easiest possible way. But this also means that the S3 engine is very small and fast (yes, again, fast!), security is simplified, and multi-tenancy is no longer a problem since you have an S3 repository dedicated to your application. For better or worse, the developer takes control of the overall “micro-infrastructure”.
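As a rough illustration of what “an S3 repository dedicated to your application” could look like from the developer’s side, here is a minimal Python sketch. Everything specific in it is an assumption made for the example: the environment variable names, the `-data` bucket naming, the `http://localhost:9000` fallback endpoint and the helper names `config_for` and `make_client` are all hypothetical, and it relies on the third-party `boto3` library for the actual S3 calls. It is not how any of the vendors mentioned here ship their products.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppStoreConfig:
    """Connection details for one application's private S3 endpoint."""
    endpoint_url: str
    access_key: str
    secret_key: str
    bucket: str


def config_for(app_name: str, env: dict) -> AppStoreConfig:
    """Build the config for a single application from its environment.

    The variable names and the default endpoint are hypothetical
    conventions chosen for this sketch.
    """
    prefix = app_name.upper().replace("-", "_")
    return AppStoreConfig(
        endpoint_url=env.get(f"{prefix}_S3_ENDPOINT", "http://localhost:9000"),
        access_key=env[f"{prefix}_S3_ACCESS_KEY"],
        secret_key=env[f"{prefix}_S3_SECRET_KEY"],
        bucket=f"{app_name}-data",
    )


def make_client(cfg: AppStoreConfig):
    """Return an S3 client pointed at the application's own store.

    boto3 is imported lazily so the rest of the sketch runs without it.
    """
    import boto3  # third-party dependency, assumed to be installed
    return boto3.client(
        "s3",
        endpoint_url=cfg.endpoint_url,
        aws_access_key_id=cfg.access_key,
        aws_secret_access_key=cfg.secret_key,
    )


# Example: one app, one store, one set of credentials, no shared tenants.
cfg = config_for("photo-app", {
    "PHOTO_APP_S3_ACCESS_KEY": "example-key",
    "PHOTO_APP_S3_SECRET_KEY": "example-secret",
})
print(cfg.endpoint_url, cfg.bucket)
```

With a store actually running next to the application, `make_client(cfg)` would return a client on which the usual `create_bucket` and `put_object` calls can be made. Because the endpoint and credentials belong to one application only, there is no tenant separation to configure, which is the simplification described above.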
You might think I’m out of my mind here but in a few weeks' time we’ll also be seeing Scality, an object storage vendor usually mentioned in very large-scale installations, announcing an interesting component that can also fit this use case.
Once again, we are talking about object storage systems which are intended for small data sets and single applications, with the ability to grow if needed.
Closing the circle
Thinking about object storage as being suited only for huge multi-petabyte installations is passé. Examples supporting this are everywhere and most enterprises are choosing object storage, not for its characteristics of durability or scalability, but because they want to implement cloud storage systems with applications that take advantage of protocols like S3.
Even though I agree that for smaller end users public cloud is a good option, for many of them there are good reasons for adopting an on-premises solution as well.
Storage is no longer just about saving data safely and efficiently, which is now taken for granted, but it’s also about distributing and sharing it quickly and securely. This is a major issue if the organization is widely distributed and is leveraging mobile devices for its business activity. I realise I’m repeating myself here but, from this point of view, object storage can be considered a NAS 2.0.
And last but not least, with more and more developers adopting S3 and Swift protocols, we’ll be seeing a great deal more small (and embedded?) object stores around… ®