There is a problem with cloud storage that affects almost all of us, yet is something of which most of us remain blissfully unaware. The problem isn't the object stores underpinning cloud storage; used properly, object storage is great. Look instead to the bit shuffling data between end users and the cloud.
It's not the network that's the problem. It's the protocols we run on top of it.
Okay, the network is a problem too. Unless you have corporate megabucks or live in one of the handful of modernized neighbourhoods in the world, your connection to the intertubes is probably demonstrably crap.
The more precise definition of the problem with cloud storage is that the protocols we use are not really designed for mediocre networks. The result is an inability to use the networks we do have efficiently.
This is a minor inconvenience for the end user if they are trying to stream a single low-resolution movie or upload a few hundred megabytes to Dropbox. It is an exceptionally expensive problem when you are a company sending terabytes of data every day up to cloud storage and/or trying to distribute terabytes (or petabytes!) of data every day to customers.
The expense is large enough that it raises the cost to end users. It also serves as a significant cost and time barrier to those who want to use cloud storage for data protection or hybrid workloads.
Consider for a moment the rise of wireless and mobile devices. Depending on who you talk to, mobile usage is now (or very soon will be) the majority of internet end user content consumption. If we lump in those using their notebooks and tablets over Wi-Fi, we're looking at the lion's share of consumer internet consumption getting to the device through some form of wireless technology.
Sadly, TCP – the primary transport protocol of today's internet – is awful over wireless. To make matters worse, the crude additive-increase/multiplicative-decrease (AIMD) algorithms used in most TCP implementations to deal with congestion aren't really great at dealing with congestion either.
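For anyone who hasn't stared at AIMD before, here is a minimal Python sketch of the idea. It is a toy model rather than any real TCP stack: the function name `aimd`, the one-segment-per-round-trip increase and the halving factor are illustrative assumptions, and real implementations add slow start, timeouts and much else. It does show why a single lost packet on a flaky wireless link hurts: the sender treats the loss as congestion and cuts its window, whatever the actual cause.

```python
def aimd(loss_rounds, rounds, cwnd=1.0, increase=1.0, decrease=0.5):
    """Toy AIMD model: return the congestion window after each round trip.

    loss_rounds: set of round-trip indices in which a loss is detected
                 (hypothetical input; a real stack infers this itself).
    """
    history = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            # Multiplicative decrease: any loss is read as congestion.
            cwnd = max(1.0, cwnd * decrease)
        else:
            # Additive increase: probe for a little more bandwidth.
            cwnd += increase
        history.append(cwnd)
    return history


if __name__ == "__main__":
    # One stray loss in round 5 wipes out half the window.
    print(aimd({5}, 8))
```

The window climbs one step at a time and is halved in a single round, so a link that drops packets at random never gets the chance to fill the pipe.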
On top of this inefficient TCP we then transfer files around using protocols like HTTP(S), which offer no guaranteed means of resuming a file transfer.
So if a connection is dropped or reset you're either back to square one, or you are relying on the application to handle resume operations in a graceful manner. For that matter, you usually need the server to be cooperative and allow the client application to pick up a transfer at an arbitrary point, which not all do.
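As a rough illustration of what "application-handled resume" looks like, the Python sketch below asks the server for the rest of a file using an HTTP `Range` header. The function names `build_resume_request` and `resume_download` are made up for this example, and the logic is deliberately minimal: it assumes the partial file on disk is intact and does no integrity checking.

```python
import os
import urllib.request


def build_resume_request(url, offset):
    """Build a GET request that asks for bytes from `offset` onwards."""
    req = urllib.request.Request(url)
    if offset > 0:
        req.add_header("Range", f"bytes={offset}-")
    return req


def resume_download(url, path, chunk_size=64 * 1024):
    """Continue a partial download if the server cooperates.

    Note: if the local file is already complete, many servers answer
    416 and urlopen raises HTTPError; this sketch does not handle that.
    """
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    with urllib.request.urlopen(build_resume_request(url, offset)) as resp:
        # 206 means the range was honoured; 200 means the server ignored
        # it and is sending the whole file again from byte zero.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(path)
```

Note the fallback branch: when the server answers with a plain 200, the client really is back to square one.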
On top of this we see yet more protocols attempting to create an environment which allows clients to mount cloud storage and then use that storage for classic file systems.
Increasingly we have people using file systems that expect to run on top of ultra-low-latency local storage being mounted on top of a protocol that runs on top of HTTP, which runs on top of TCP over a thready wireless connection through congested internet service providers, potentially to overseas cloud storage that can be seeing up to 10 per cent packet loss.
No. Just no.
Sucking out loud, in real time
This is no joke. We are smack in the middle of a planetary transition away from traditional broadcast media towards media being consumed exclusively over the internet. Cisco's The Zettabyte Era is a must read on just how big the problem is.
A couple of key takeaways: by 2019 annual global IP traffic will reach 2 zettabytes, and busy-hour Internet traffic will hit 1.4 petabits per second while average Internet traffic will be only 414 terabits per second.
But internet service providers are not going to build for peak traffic. That whole "congestion" thing that TCP isn't so good at? It's about to get a lot worse. The "turtles all the way down" approach of layering protocols one on top of the other is increasingly looking like a very bad idea.
It's also somewhat inevitable.
Consider the data intensive job of video editing. This is no longer something that is done by having tapes brought into the office, video loaded onto the server and then having a team of nerds in cloth cubicles work until they're allowed out of the beige hell of their employment.
Increasingly, video goes from wherever it is shot up to the cloud. Once in the cloud various apps perform basic tasks like archiving the raw footage, automating touch-ups, colour and sound correction and so forth. A video nerd will download the working copies to their local system, do some work on them and then push a rendered master version back up to the cloud.
Once in the cloud, the various applications will then transcode the file into dozens of different resolutions and formats. They'll then make those files available to end users, usually through intermediaries such as content delivery networks.
Getting the video to end users isn't as simple as pushing all versions to all content networks. The right version needs to get to the right caches on the right networks. There's no point in handing off a 4K version to a mobile network located in a rural area, but that's probably the sort of thing you want provided to a fibre provider in Tokyo.
In getting from camera to smartphone, the same scene of footage can traverse some part of the internet hundreds or even thousands of times. When you start thinking about the amount of content a company like HBO moves around, and the size of their audience, the inefficiencies in how we shuffle that data add up quickly.
Protocols: the next generation
We're not going to get rid of TCP any time soon. We can, however, stop relying on it quite so much as we do. UDP is much better for shuffling bulk data. TCP can be left to handle control information. This is the basic concept behind several next generation data storage protocols.
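To make the split between bulk data and control traffic concrete, here is a small Python simulation. It is not Tsunami, UDT or fasp, and it opens no sockets: the function name `transfer`, the random loss model and the "report everything missing after each pass" scheme are all assumptions chosen to keep the sketch short. Numbered blocks go out over a lossy path standing in for UDP, and the receiver's list of missing blocks comes back over a path assumed to be reliable, standing in for TCP.

```python
import random


def transfer(blocks, loss_rate=0.1, seed=0):
    """Simulate bulk data over a lossy path with a reliable control path.

    Returns (reassembled_bytes, passes, datagrams_sent).
    """
    rng = random.Random(seed)       # seeded so runs are repeatable
    received = {}                   # sequence number -> payload
    pending = list(range(len(blocks)))
    passes = 0
    datagrams_sent = 0
    while pending:
        passes += 1
        # Data path (UDP stand-in): fire off every pending block, no waiting.
        for seq in pending:
            datagrams_sent += 1
            if rng.random() >= loss_rate:
                received[seq] = blocks[seq]
        # Control path (TCP stand-in): receiver reports what is still missing.
        pending = [seq for seq in range(len(blocks)) if seq not in received]
    data = b"".join(received[seq] for seq in range(len(blocks)))
    return data, passes, datagrams_sent


if __name__ == "__main__":
    payload = [bytes([i]) * 1024 for i in range(100)]
    data, passes, sent = transfer(payload, loss_rate=0.1)
    print(len(data), passes, sent)
```

The sender never stalls waiting for an acknowledgement of each block; only the blocks that were actually lost get sent again, which is roughly the property that makes this style of design attractive on lossy, high-latency links.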
In this space Tsunami and UDP-based Data Transfer Protocol (UDT) are the most commonly discussed open protocols, with a number of commercial protocols (such as IBM's fasp) rounding out the market. These protocols all offer some means of implementing congestion avoidance and other network technologies on top of UDP, and each does so in a different way.
There are many other protocols vying for inclusion here as well. For example, I'm leaving UFTP out of this conversation because in every test I've ever run UFTP's performance is awful. The hivemind would seem to agree on this, so I'm largely considering it a mid-performance protocol somewhere between old-school stuff like traditional FTP and the modern stuff.
Even UDT's inclusion here is marginal. UDT is outright wrecked by Tsunami; however, Tsunami isn't really available for easy consumption in a lot of products at the moment. In turn, Tsunami is generally shown up by the commercial offerings.
Tomorrow's cloud storage, today
I mentioned IBM's fasp above. IBM got hold of fasp in 2013 by buying a company called Aspera, whose first customers were video and film production companies, mostly. IBM has big plans for the high-speed file transfer technology in other cloudy applications, including disaster recovery as a service (DRaaS), supply chain data and media/file distribution to retail stores, for example.
As a commercial offering, there is a lot more to what it ships than a protocol. Digging around in the guts is highly instructive and allows a glimpse at the future of cloud storage for everyone, as it is being implemented today.
On the cloud side of things Aspera is a series of gateway VMs that work to de-stupid storage. Client devices talk fasp to the Aspera gateway and the Aspera gateway talks cloud storage protocols (usually HTTP) to the cloud provider.
Yes, this still means that there is a bunch of horribly inefficient storage-over-HTTP going on, but it matters a lot less when the Aspera gateway is on the same local network as the cloud storage. There is no congestion, dropped packets or massive latency spikes to worry about.
Talking fasp to the Aspera gateway requires that you have an application on your client that understands how to talk fasp, and a network that will allow this conversation to occur. This can come in the form of browser plug-ins, mobile or desktop applications and so forth.
Not all networks are going to allow fasp communication to occur. Fasp uses TCP port 22 for control and UDP port 33001+ for data. The data port increments with each parallel connection in use, and Aspera is capable of shuttling data around in parallel in order to better use every last scrap of bandwidth available.
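For whoever has to write the firewall rules, the port scheme can be sketched in a few lines of Python. The helper `fasp_ports` is hypothetical and not part of any Aspera tooling, and it assumes the data port goes up by exactly one for each additional parallel connection, starting at 33001.

```python
FASP_CONTROL_TCP_PORT = 22       # control channel, as described above
FASP_DATA_UDP_BASE_PORT = 33001  # first data port, as described above


def fasp_ports(parallel_connections):
    """List the ports a client would need opened (assumed +1 per connection)."""
    if parallel_connections < 1:
        raise ValueError("need at least one connection")
    udp_ports = [FASP_DATA_UDP_BASE_PORT + i for i in range(parallel_connections)]
    return {"tcp": [FASP_CONTROL_TCP_PORT], "udp": udp_ports}


if __name__ == "__main__":
    print(fasp_ports(4))
```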
If the network blocks the fasp communication (either because it is blocking ports or because the efficiency of the UDP data looks like a DDoS to older firewalls) then Aspera will switch to a traditional HTTP communication method for that client.
Where this gets crazy is when we start talking about sending large amounts of data very quickly. Aspera claims to be able to handle 1 gigabit per second (roughly 10 terabytes per day) per instance. Good luck achieving that kind of throughput to traditional cloud storage without a next generation protocol. Anyone who has tried using cloud services for serious backups or disaster recovery will know that sustaining 1 gigabit per second to any of the bit providers is (wait for it) a pipe dream.
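The "roughly 10 terabytes per day" figure is easy to check. The Python snippet below assumes the link is saturated around the clock, ignores protocol overhead and uses decimal terabytes; the function name `terabytes_per_day` is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def terabytes_per_day(bits_per_second):
    """Convert a sustained line rate into decimal terabytes moved per day."""
    bytes_per_day = bits_per_second / 8 * SECONDS_PER_DAY
    return bytes_per_day / 1e12


if __name__ == "__main__":
    print(terabytes_per_day(1e9))  # sustained 1 gigabit per second
```

That works out to about 10.8TB a day for a single instance running flat out.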
As a cloud-native application, the Aspera gateways simply spawn more instances as they become saturated. Consider also that the whole solution can shift data about in parallel, using multiple instances. Now you should have an idea of what the future of cloud storage looks like.
Time to fight it out
The adoption of new technologies follows a pretty consistent pattern. Necessity drives invention. Those who inhabit some extreme become rapid adopters while early companies form. Marketing dollars are spent "defining the problem space" so that those beyond the early adopters even know that there's a problem.
Once everyone knows that the problem exists and various companies have had a go at creating buzzwords, we get into differentiation. This is where the various players in the space stop talking about what they are doing and start talking about how they do things differently from everyone else in that space. Prices come down, features go up and shortly thereafter the product is mainstream.
Today, next generation cloud storage protocols are late in the "defining the problem space" segment. All the hyperscale customers are already using some form of this technology and more mainstream businesses are starting to feel the pinch and cast about for a solution.
What this means is that in a few months (less than a year) the rubber is going to meet the road in this space and the protocol wars will begin in earnest. The commercial protocols look to be quite a bit better than the open source stuff, and who wins is going to boil down to who licenses their protocol in the least burdensome fashion and manages to get it adopted widely.
Here, IBM's current (and growing) support for multiple cloud service providers makes me feel that its fasp protocol and Aspera implementation stand a good chance of emerging the victor. It could be trying to use Aspera as a club to beat everyone into its own SoftLayer cloud. It isn't.
IBM knows how the game is played and it knows that if it tries to use fasp as some lock-in tool it will be irrelevant in short order. Will it be able to turn fasp into the cloud storage equivalent of MP3 or H.264? Licensed, but in practice used by everyone such that the cost becomes invisible? Only time will tell.
One thing is for sure: begun, the cloud protocol wars have begun. ®