Sucking primary data into the cloud: The dawn of a new age
Solutions remain immature, but well worth a look
We usually think of primary storage as something close to compute resources (whether they're on-premises or in the cloud), while cloud storage is something we can access, more or less, from everywhere ... but things are becoming a little fuzzy.
Primary as we know it
What do you really look for in primary storage? Assuming availability and resiliency are a given, the first things you look for are high IOPS, low latency and predictability. Am I right?
You want it as close as possible to your compute resources. And when your data actually lives in the cloud, in an object store for example, you will probably need a large, smart front-end cache to deliver the performance you require (Avere Systems is a good example here).
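To make the front-end cache idea concrete, here is a minimal Python sketch of a read-through LRU cache sitting between compute and an object store. It is not how Avere or any other product actually works: ObjectStore and ReadThroughCache are hypothetical names, and the "remote" store is just an in-memory dict that counts how many reads reach it.

```python
from collections import OrderedDict


class ObjectStore:
    """Hypothetical stand-in for a remote object store; counts backend reads."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.backend_reads = 0

    def get(self, key):
        # In a real deployment this call would cross the WAN and be slow.
        self.backend_reads += 1
        return self._objects[key]


class ReadThroughCache:
    """Hypothetical LRU read-through cache kept close to compute."""

    def __init__(self, store, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._store = store
        self._capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        # Miss: fetch from the slow backend and keep a local copy.
        self.misses += 1
        value = self._store.get(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value


store = ObjectStore({"a": b"1", "b": b"2", "c": b"3"})
cache = ReadThroughCache(store, capacity=2)
for key in ["a", "b", "a", "c", "a", "b"]:
    cache.get(key)
print(cache.hits, cache.misses, store.backend_reads)  # 2 4 4
```

Only cache misses pay the round trip to the object store, which is why the cache has to be big (and smart about what it keeps) when the working set is large.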
For less latency-sensitive data and workloads, usually kept on secondary storage, predictability matters less and latency is not a major issue; in some cases cloud storage is fine. Object storage and the applications that use it, for example, are designed to cope with variable and high latency.
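One common way such applications cope with variable latency is to retry timed-out or throttled requests with backoff. Below is a minimal Python sketch of retry with capped exponential backoff and "full jitter". TransientError and with_backoff are hypothetical names, and the sleep and random-number functions are injectable so the example runs without actually waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error for a timed-out or throttled request."""


def with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                 sleep=None, rng=None):
    """Call operation(), retrying on TransientError with jittered backoff."""
    sleep = sleep or time.sleep
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))


# Usage: a flaky read that fails twice before succeeding.
attempts = {"n": 0}


def flaky_get():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("request timed out")
    return b"object-bytes"


delays = []  # record the waits instead of sleeping
result = with_backoff(flaky_get, sleep=delays.append, rng=random.Random(42))
print(result, attempts["n"], len(delays))  # b'object-bytes' 3 2
```

The jitter spreads retries out so that many clients hitting the same slow service do not all retry at the same moment.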
But cloud comes into the game
The catch is that infrastructure is no longer in a single location. Whatever cloud service you use, resources are now spread across your data centre(s) and the cloud(s), so your infrastructure is hybrid, and problems can arise.
The good news is that the technology is maturing quickly, and you can now move (and convert) VMs and services easily between different private and public infrastructures. Yes, of course, there are still major limits and constraints, but we are getting there.
The real problem is that you have less control than in the past, especially if you are using public clouds and your services are spread across different regions or availability zones. You can mitigate the problem by spending more money, for example on leased lines. But is that sustainable in the long term?
The problem of primary storage in the cloud
Service providers such as Amazon do not make any substantial SLA commitments. They give you no assurance about the storage performance you can carve out of one of their VMs/VPSs. In fact, it's not unusual to spin up two identical VMs and find that they perform differently.
That can be acceptable for certain types of workload, much less so for others ... especially if you have to integrate modern and legacy applications in the same infrastructure.
Predictability is just the tip of the iceberg. Shared storage is less common than you might think (Amazon AWS and Microsoft Azure now both support it, but it's still quite limited), data services are limited too, and so on ...