The need to disaggregate compute and storage for HCI

HPE: HCI 2.0 is the answer

Let’s begin by reminding ourselves why hyperconverged infrastructure, or HCI for short, was so revolutionary for IT customers.

With the early HCI platforms, mostly based on Nutanix or VMware software, virtual block storage was converged onto virtual compute servers to create a different abstraction and a unified set of infrastructure that still provided that appliance feel. This enabled customers to rationalise a fragmented server virtualisation, storage virtualisation, and network virtualisation stack and merge them under a single management framework and application runtime environment.

What the early HCI platforms delivered, for the most part, was the same level of ease of use to virtual compute and virtual storage as the cloud builders had created, to make it easy for applications to be deployed on their massive infrastructures.

But there was a scaling problem. The early HCI platforms were originally thought of as a smart architectural design and deployment methodology, but it gradually became clear that they were inherently flawed.

With these early platforms, compute and storage ratios were specific to a node. To be sure, HCI platform sellers had different node configurations – some with a lot of compute, some with a lot of storage, and some that were balanced in the middle – but customers always had to do capacity planning for the peaks of either, and that meant there was often a lot of stranded capacity in some of the nodes.
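To make the stranded capacity point concrete, here is a minimal sketch of the arithmetic, assuming a cluster built from one fixed node type. The function names, node sizes, and workload figures are hypothetical and chosen only for illustration; they do not describe any vendor's actual configurations.

```python
import math


def converged_nodes(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Nodes required when every node bundles a fixed amount of CPU and storage.

    The cluster must be sized for whichever resource hits its peak first.
    """
    return max(
        math.ceil(cores_needed / cores_per_node),
        math.ceil(tb_needed / tb_per_node),
    )


def stranded_capacity(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Report how much purchased capacity sits unused after sizing for the peak."""
    nodes = converged_nodes(cores_needed, tb_needed, cores_per_node, tb_per_node)
    return {
        "nodes": nodes,
        "stranded_cores": nodes * cores_per_node - cores_needed,
        "stranded_tb": nodes * tb_per_node - tb_needed,
    }


# Hypothetical storage-heavy workload: 64 cores and 200 TB,
# served by nodes that each carry 32 cores and 20 TB.
result = stranded_capacity(64, 200, 32, 20)
print(result)
```

In this made-up case the storage requirement forces the purchase of ten nodes, even though two would have covered the compute, so most of the cores bought are never needed.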

This added complexity in designing the solution, and brought new performance challenges, because nodes that are optimised for storage often introduce latency and bottlenecks to the environment.

As HPE tells us, the result was needless overprovisioning because customers rarely have a balanced IT estate when it comes to storage, CPU, or memory. “We often see storage as causing the imbalance here, causing customers to buy more HCI nodes (with storage, memory and CPU) to achieve their desired storage needs. For example, dedupe/compression often came with performance impacts that needed more CPU and memory. This compounded the challenge that customers were seeing having to buy more powerful CPUs and more memory to deal with this unforeseen overhead.”

This impedance mismatch between compute and storage in early HCI architectures has been a problem since the beginning, and moving away from appliances and towards software-only HCI doesn’t really solve the problem. Companies will still end up installing a mix of nodes with different compute and storage capacities, and that means their HCI clusters cannot be as fluid in meeting application needs as they would like. In fact, this impedance mismatch between compute and storage in legacy HCI platforms is getting worse.

Matt Shore, HCI business development manager for data and storage solutions for EMEA at HPE, notes there are heavy workloads deployed on HCI platforms that create “mega-VMs” or “monster VMs” that are not able to get the right allocation of compute and storage. And mega-VMs are greedy, as they can hog a cluster, causing it to run only one VM or application. If the workload is storage heavy rather than CPU or memory heavy, that cluster’s CPU and memory resources will sit unused while it delivers just the storage element.

For instance, mission-critical applications that are running atop Oracle relational databases or SAP HANA in-memory databases are very heavy on I/O, and consistently so, and that often means the VMs supporting them are overprovisioned on compute and storage capacity. Then there are applications, such as end-of-day, end-of-week, or end-of-month batch statement or reporting jobs, that require peak I/O at regular – and thankfully predictable – times. And then there are application development and test environments, increasingly with high and often unpredictable storage I/O demands. The heavy I/O demands of all three of these mean that the nodes in an HCI 1.0-style cluster have to be overprovisioned.

Disaggregating Compute and Storage

The answer to this problem, which HPE calls HCI 2.0, is to do what the hyperscalers and cloud builders do, and that is to disaggregate the virtual compute and the virtual storage from each other, but to present the combined pieces as a single, converged platform that can scale – but without the premium monthly price point that the cloud suppliers charge.
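As a rough illustration of why disaggregation helps with sizing, the sketch below counts compute servers and storage arrays separately, so each resource is rounded up on its own. The function name, unit sizes, and workload figures are hypothetical assumptions for illustration, not HPE sizing guidance.

```python
import math


def disaggregated_units(cores_needed, tb_needed, cores_per_server, tb_per_array):
    """Size compute servers and storage arrays independently of each other."""
    servers = math.ceil(cores_needed / cores_per_server)
    arrays = math.ceil(tb_needed / tb_per_array)
    return {
        "servers": servers,
        "arrays": arrays,
        "spare_cores": servers * cores_per_server - cores_needed,
        "spare_tb": arrays * tb_per_array - tb_needed,
    }


# Hypothetical storage-heavy workload: 64 cores and 200 TB,
# with 32-core servers and 100 TB arrays bought separately.
plan = disaggregated_units(64, 200, 32, 100)
print(plan)
```

Because neither resource drags the other along, the leftover capacity on each side is at most one unit's worth, whatever the ratio of compute to storage in the workload.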

The HCI 2.0 stack from HPE includes its Nimble Storage all-flash arrays or their follow-on Alletra 6000 all-flash arrays for block storage underpinning the server virtualisation hypervisors on the compute nodes, in this case various ProLiant servers running the ESXi hypervisor and the vCenter management stack from VMware. (Specifically, vCenter Standard Edition, either the 6.7 or 7.0 releases.)

The stack includes a storage software abstraction layer out of the Nimble Storage organisation, called dHCI, and also requires HPE’s InfoSight management console, which is common across HPE servers and storage. This has been infused with various kinds of artificial intelligence, and comes out of the Nimble Storage acquisition as well.

Every customer has its own reason for deploying dHCI, and the benefits each derives from the move also vary. In some cases, customers moving to the dHCI stack were replacing ageing equipment because of an increased need to support virtual workers during the coronavirus pandemic, while at the same time lowering the cost of supporting the infrastructure and reducing downtime.

This was the case for Highmark Credit Union in the United States, for instance. PetSure, which provides pet insurance, deployed remote veterinary application software on a dHCI stack and was able to cut operating costs by half, double the performance of virtual desktop infrastructure (VDI) middleware, and provision applications in half the time.

After the coronavirus pandemic hit, National Tree Company had to move its business online and needed to speed up processes to meet surging demand; it can now process orders in 20 minutes instead of overnight and boost production by 70 per cent while at the same time eliminating back orders.

The Institution of Engineering and Technology, which is a charity that supports the education of engineers, replaced its ageing legacy infrastructure, supporting 160 VMs on six servers, with a dHCI stack, and immediately cut the 27 per cent of the storage requirements for those applications that was due to overprovisioning.

The case studies above demonstrate some of the ways that HPE has removed the limitations of traditional HCI. By enabling customers to scale compute, storage and memory independently, HPE’s dHCI storage hardware and management software delivers significant performance and cost benefits – and no more overprovisioning. Or as HPE puts it, “HCI 2.0 delivers a better HCI experience without any of the trade-offs”.

Sponsored by HPE.

HPE has produced a short video that introduces HPE Data Services Console.
