Hyperconverged Infrastructure 101: A short primer about HCI
It's time to drop out of over-hype space
There's a horrible, horrible thing I get asked at least three times a week: "What is hyperconvergence?" This is like an icepick into my soul, because I consult with almost all of the current hyperconverged infrastructure (HCI) vendors in one form or another, and the truth is that "hyper-convergence" is a meaningless marketing term, as wishy-washy and pointless as "cloud".
Every vendor has its own specific take on what the term is supposed to mean. Each has its own opinion on the minimum feature set required to be considered a "hyper-converged" vendor, and on what should under no circumstances be called that.
The formula for figuring it out is simple.
Small hyperconvergence players want to use whatever buzzwords the big hyperconvergence players are using, so that they can get some free marketing by living in the afterglow of the big players. Big players want to narrow the definition to "exactly how we do things", so that they can disassociate themselves from other players.
Everyone, ultimately, wants you to buy their specific flavour and loudly denounce their competitors. Preferably with about 100,000 others and a dozen bullhorns while marching up and down Sand Hill Road in California's Menlo Park, where a significant number of the most important Silicon Valley venture capital firms have their offices.
As the big vendors evolve their marketing, the little vendors change how they want to talk about and position their companies. Agreed-upon marketing terminology that was perfectly fine in September is suddenly invalid by November and something you just don't even speak of by January.
It's an absolutely insane world to work in. It also means that if I hope to ever have a meaningful discussion about hyper-convergence or adjacent technologies, it is important to carve some definitions in stone.
So, with apologies to all storage journalists, analysts and vendors seeking their own patch of limelight, I'm going to do just that. It's time to define not only hyper-convergence, but also other, adjacent terms.
At the core of hyperconvergence is the "server SAN". If I recall correctly, we get to thank Stuart Miniman for this term.
A server SAN is a bunch of commodity servers (usually x86, but I've seen ARM prototypes) that are clustered in some fashion or another to collectively act as a single storage source. The goal is to take the local disks on each system and lash them all together.
A server SAN can tolerate the loss of individual drives within a given node, or the loss of entire nodes. How many losses of each kind can be tolerated depends entirely on the technology used to lash the various bits together.
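To make that dependence concrete, here is a minimal Python sketch of two common ways of lashing nodes together: keeping full copies of every block on different nodes, and erasure coding. The class names, node counts and capacities are all made up for illustration and describe no particular vendor's product. Real systems also reserve space for metadata and rebuilds, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicatedPool:
    """Hypothetical pool keeping `copies` full copies of each block on distinct nodes."""
    nodes: int
    raw_tb_per_node: float
    copies: int

    def __post_init__(self):
        if not 1 <= self.copies <= self.nodes:
            raise ValueError("copies must be between 1 and the number of nodes")

    @property
    def usable_tb(self) -> float:
        # Every block is stored `copies` times, so usable space shrinks by that factor.
        return self.nodes * self.raw_tb_per_node / self.copies

    @property
    def node_failures_tolerated(self) -> int:
        # Data survives as long as one copy of each block is still on a live node.
        return self.copies - 1


@dataclass(frozen=True)
class ErasureCodedPool:
    """Hypothetical pool splitting data into k data + m parity fragments, one per node."""
    nodes: int
    raw_tb_per_node: float
    data_fragments: int    # k
    parity_fragments: int  # m

    def __post_init__(self):
        if self.data_fragments < 1 or self.parity_fragments < 0:
            raise ValueError("need at least one data fragment and no negative parity")
        if self.data_fragments + self.parity_fragments > self.nodes:
            raise ValueError("need at least k + m nodes to place one fragment per node")

    @property
    def usable_tb(self) -> float:
        # Only k out of every k + m fragments hold actual data.
        k, m = self.data_fragments, self.parity_fragments
        return self.nodes * self.raw_tb_per_node * k / (k + m)

    @property
    def node_failures_tolerated(self) -> int:
        # Any k of the k + m fragments are enough to rebuild the data.
        return self.parity_fragments


if __name__ == "__main__":
    for pool in (ReplicatedPool(4, 10.0, 2), ErasureCodedPool(6, 10.0, 4, 2)):
        print(type(pool).__name__, pool.usable_tb, pool.node_failures_tolerated)
```

The point of the comparison is only that the same pile of disks yields different usable capacity and different failure tolerance depending on the scheme chosen.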
What exactly are we talking about?
Here is where we encounter our first hiccough in definition: some people with voice and audience specifically exclude object storage from being included in the server SAN definition. To understand why, a longish discussion about block storage versus filesystems versus object storage is required. Fortunately, I have a whole article on that here.
The short version of why some people exclude object storage systems from the server SAN definition is twofold. First, most operating systems and hypervisors don't natively talk object storage. It's something that is (currently) strictly application level and largely designed for developers, not infrastructure teams.
The second reason for exclusion is that object storage is quite crap at running VMs. The sort of people who usually talk about server SANs and hyper-convergence are infrastructure nerds, so anything that is "outside their wheelhouse" isn't something they want to have to constantly make exceptions for when talking about what they do.
I personally don't hold to this. My definition of a server SAN is "commodity servers clustered to collectively act as a single storage source". What that storage source is and how you use it is for other buzzwords. Let's not put too much on a single term, shall we?
Legacy convergence, hyperconvergence and data centre convergence, oh my!
Let’s break convergence into simple chunks – I’ll start with legacy convergence. Legacy convergence is best thought of as proprietary switches married to traditional disk arrays alongside some high-end, tier-1 nameplate servers providing compute capacity.
It is generally a "buy-by-the-rack" set-up, where a bunch of old-school hardware is sold together as a single SKU, typically for a king's ransom, and supported by the sorts of enterprise support teams that have their own helicopters.
Hyperconvergence does away with the expensive disk arrays (which is why many hyper-convergence vendors have "no SAN" stickers) and replaces the expensive tier-1 nameplate servers with more modest commodity servers from lower-margin vendors.
Storage in a hyperconverged environment is provided by filling up the compute nodes with disks and creating a server SAN. This uses a part of the compute servers' RAM and CPU resources. Hyperconverged solutions still rely on proprietary switches for networking.
The argument for this trade-off is that the overall costs of the set-up are so much lower than the traditional legacy convergence stack that adding an extra node or two per cluster to make up for the lost compute power per node is still cheaper.
Hyperconvergence might require a few extra compute nodes to fit the same number of VMs, but the overall footprint is generally smaller. This is because legacy convergence storage is in separate units, which themselves tend to be configured to form a distributed cluster of storage.
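The "extra node or two" arithmetic can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function name, the VM counts and the 20 per cent overhead figure are hypothetical, not measurements from any real platform.

```python
def nodes_needed(vms: int, vms_per_node: int, storage_overhead_pct: int = 0) -> int:
    """Nodes required to host `vms` VMs when the server SAN eats a share of each node.

    `storage_overhead_pct` is the (hypothetical) percentage of a node's compute
    capacity consumed by the storage software. Integer maths avoids float rounding.
    """
    if vms < 0 or vms_per_node <= 0 or not 0 <= storage_overhead_pct < 100:
        raise ValueError("invalid sizing inputs")
    # Effective VMs per node, scaled by 100 to stay in integers.
    effective_x100 = vms_per_node * (100 - storage_overhead_pct)
    # Ceiling division: a partly filled node is still a whole node to buy.
    return -(-(vms * 100) // effective_x100)


if __name__ == "__main__":
    dedicated = nodes_needed(200, 25)            # compute-only nodes, storage elsewhere
    hyperconverged = nodes_needed(200, 25, 20)   # same nodes, 20% lost to the server SAN
    print(dedicated, hyperconverged, hyperconverged - dedicated)  # 8 10 2
```

With these made-up inputs the hyperconverged cluster needs two more nodes than the compute-only one. Whether that still comes out cheaper depends on what the separate disk arrays would have cost, which is the vendors' argument.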
Data centre convergence is an emerging term used by various hyperconverged companies that have integrated software-defined networking into their platform. As far as I am concerned, it should only properly be used by those offerings that replace proprietary switches with "open" merchant silicon switches that use open standards.
Pulling out the stopping bits
Convergence is a continuing trend of commoditisation. Legacy convergence did away with expensive and time-consuming integration projects. No longer did you have to do a massive needs assessment followed by hiring network architects, implementation consultants and so on and so forth. You dialled up the convergence vendor and ordered an SKU that provided X number of VMs in racks that could handle Y amperage and had Z interconnect to the rest of your network.
Unfortunately, the legacy convergence vendors charge a hefty sum for all that integration work. They also tend to be composed of the vendors of all the old-school gear that makes up the legacy convergence solution, so everyone has big, fat margins to meet and costs are eye-watering. They're just ever-so-slightly cheaper than doing it yourself – if you're an inherently inefficient Fortune 2000 company, or a government.
Hyperconverged solutions took the margins away from the storage and server vendors. This let the new generation of convergence players rake in fat margins while bringing the price (and the entry size) down into reach of the commercial mid-market.
Data centre converged vendors are now taking the margins away from the switch vendors. Some are simply absorbing this into their own margins, others are using it to drive the cost to the customer even lower, in an attempt to address the SMB market.
This will all last exactly as long as it takes for the various open-source convergence projects to become generally available, about 10 years from now. At that point, the vendor integration margins go away and convergence simply becomes the default for everyone running their own infrastructure, no matter how small.
Software-defined table stakes
A critical part of this rush to commoditisation is the software. Let's put hypervisors to one side for a moment and talk about enterprise-storage features. Ten years ago, you could become a major player in the storage industry by introducing a single new feature, such as deduplication.
Today, snapshots, cloning, compression, deduplication, replication, continuous data protection and hybrid/tiered storage at a minimum are the table stakes. If you don't have these features as part of the base storage offering, you're already dead and you just don't know it yet.
Think about this for a moment. Multi-billion dollar companies were founded on almost every one of these features. Hundreds of millions, perhaps billions, of dollars have been spent across the industry by dozens of companies, each making its own version of these features.
Today, they are tick-box items in hyper-converged and data centre converged offerings. You cannot enter the market without them.
As data centre convergence becomes more and more prevalent, full SDN stacks will be added to that list. Hyper-convergence as a distinct category will go away. That's probably five years and a dozen pieces of buzzword bingo from now, but it's quite clear on the horizon.
Where it all gets fuzzy is the fighting over which software features are table stakes.
Everyone except the high-margin hardware vendors being butchered for fun and profit agrees on the hardware part of the convergence definitions. But when we start talking about the software stack, well, there will be blood in the water.
Customer expectations are set by the new entrants to the market who are continually commoditising features. Customers look at the start-ups and then ask the big guys "why don't you have this feature?" or "why are you charging me extra for this?"
Thus hyperconvergence is a special class of server SAN where VM workloads run alongside the storage workloads. It was conceived to be cheaper, denser and more appealing than legacy convergence. Data centre convergence is, in turn, a special class of hyper-convergence.
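Because each term is a narrower case of the one before it, the definitions nest. A minimal Python sketch of that nesting follows. The `Stack` fields and the `classify` function are hypothetical names invented here, and the three yes/no questions are a simplification of the definitions used in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stack:
    """Hypothetical description of an infrastructure offering."""
    pooled_local_disks: bool    # commodity servers clustered into a single storage source
    vms_on_storage_nodes: bool  # VM workloads run alongside the storage workloads
    integrated_sdn: bool        # software-defined networking is part of the platform


def classify(stack: Stack) -> List[str]:
    """Return every term that applies, from broadest to narrowest."""
    labels: List[str] = []
    if stack.pooled_local_disks:
        labels.append("server SAN")
        if stack.vms_on_storage_nodes:
            labels.append("hyperconvergence")
            if stack.integrated_sdn:
                labels.append("data centre convergence")
    return labels


if __name__ == "__main__":
    print(classify(Stack(True, False, False)))
    print(classify(Stack(True, True, True)))
```

Note that an offering with no server SAN underneath gets no label at all here, whatever else it bundles. That is the point of limiting the definition.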
This is the definition you must limit yourself to if you want only to talk about what hyper-convergence is.
Outside of that, however, is an ever-evolving metagame of promises, expectations, lies, damned lies and marketing. This is where everyone makes the mistake of conflating "ought to be" with "is".
We're still a long way from what hyperconvergence ought to be. I wonder how many buzzwords we'll go through before we get there. ®