Sponsored

So, you’ve forklifted a bunch of applications into the cloud, and you’re looking forward to happy years of low-cost, worry-free operation, where you can provision oodles of computing power at the flick of a switch. Congratulations. But you didn’t really think it would be that easy, did you?
Cloud computing’s dark secret is that one size does not fit all: the right approach depends on your applications and the trade-offs you are prepared to make. Dreams of fast, smooth access will fail to materialise unless this reality is understood and embraced. You don’t want to roll out slow, clunky apps that leave users fuming. So, what are you going to do about it?
A survey of 390 IT pros last year by SIOS Technology and ActualTech Media Research found that 98 per cent of cloud deployments experience some kind of performance problem each year. Almost half of respondents (45 per cent) found the hamster falling out of the wheel once a week, and 18 per cent reported issues daily.
Why do cloud applications continue to offer sluggish levels of performance? The problem is people “using the application or the database,” according to 64 per cent of irritated cloud customers. The simplest answer, presumably, is not to use the application or the database, though that’s a little self-defeating, so we’d better find some other options.
You can tune your newly cloud-hosted software to solve some issues. The DBA should tinker with cache and bucket sizes, as well as I/O parameters, to see how much extra juice they can squeeze out of the system, while you increase the database's resources, such as storage capacity, so it’s not maxing out. You also have to make sure you have the right application or database for the job. Lifting and shifting some legacy code into a cloud environment may cause a performance hit in the new infrastructure, so it could need some reworking.
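To make that tinkering concrete, here’s a minimal sketch using SQLite, chosen only because it ships with Python. The PRAGMA names are SQLite-specific; PostgreSQL, MySQL and friends expose analogous cache and durability knobs under different names (for example, `shared_buffers` in PostgreSQL).

```python
import sqlite3

# Minimal sketch of the kind of knobs a DBA can tinker with, using SQLite
# purely because it ships with Python. Real engines expose analogous
# settings under different names.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Enlarge the page cache: a negative value is interpreted as KiB, so this
# asks for roughly 64 MiB of cache instead of the small default.
cur.execute("PRAGMA cache_size = -65536")
# Relax fsync behaviour to trade durability for I/O throughput --
# acceptable for scratch or easily rebuilt data, dangerous otherwise.
cur.execute("PRAGMA synchronous = OFF")

cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
cur.executemany("INSERT INTO orders (total) VALUES (?)",
                [(i * 1.5,) for i in range(10_000)])
conn.commit()

rows = cur.execute("SELECT COUNT(*) FROM orders").fetchone()
print(rows[0])  # 10000
```

The point is not these particular settings but the method: change one parameter at a time, re-run a representative workload, and measure.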
After all that, though, you may have to look at the underlying hardware infrastructure itself. Do you have the right hardware — or even the right cloud architecture — for the job?
Matching the workload to the processor shouldn’t be too hard because vendors build server CPUs with multi-threading and virtualisation in mind. Intel® even optimises its Xeon® Scalable chips for machine-learning algorithms now, eliminating the need for GPUs for certain AI workloads, such as inference.
There are other parts of your cloud infrastructure that you can optimise, storage being one. Workload analysis has an important impact on your storage choice, whether you’re using a public, hybrid, or private cloud architecture. Archiving and disaster recovery are well-suited for public cloud environments because the emphasis is on location independence and reliable replication rather than latency.
If you’re using spinning disks for high-throughput transactional or analytics workloads, you could supercharge performance with some judiciously placed solid-state storage. NAND flash-based SSDs should support the PCIe-based NVMe interface, maximising their IOPS capabilities. Not all SSD architectures are equal. Multi-level cell (MLC, storing two bits per flash cell) is rapidly tapering off, ceding top place to triple-level cell (TLC) architectures, which are becoming more mainstream in enterprise markets. These store three bits per NAND cell, and while they offer lower data speeds than SLC or MLC parts, they’re cheaper and denser, helping them make faster inroads into hard disk markets. Intel® has since developed QLC, a quad-level cell, which adds a fourth bit, increasing density by another third.
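The workload pattern where NVMe SSDs leave spinning disks furthest behind is small random reads. A rough, hedged sketch of measuring that is below; without `O_DIRECT` the OS page cache inflates the figure enormously, so treat the number as illustrative only. A purpose-built tool such as fio is what you’d actually use (the file size and read count here are arbitrary choices).

```python
import os
import random
import tempfile
import time

# Rough sketch of a 4 KiB random-read test. Reads here are served largely
# from the page cache, so this measures the access pattern, not the device.
# Uses os.pread, so this sketch assumes a POSIX system.
BLOCK = 4096
BLOCKS = 2048  # an 8 MiB test file

fd, path = tempfile.mkstemp()
os.write(fd, os.urandom(BLOCK * BLOCKS))
os.fsync(fd)

reads = 5_000
start = time.perf_counter()
for _ in range(reads):
    offset = random.randrange(BLOCKS) * BLOCK
    os.pread(fd, BLOCK, offset)
elapsed = time.perf_counter() - start

iops = reads / elapsed
print(f"{iops:,.0f} IOPS (4 KiB random reads, cache-warm)")

os.close(fd)
os.remove(path)
```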
It’s important to put your storage as close to the application as possible, especially if it’s latency-sensitive enough to move you to solid-state media: you have to locate your SSD capacity near the workloads you’re running. If that’s in the public cloud, pay-as-you-go contracts help you avoid the initial capital investment. However, be sure to read the fine print. Service providers may throttle your monthly IOPS.
These performance enhancers can deliver tangible benefits, but for even greater performance in the cloud, you may need even faster components. In-memory data processing is one answer here.
Conventional DRAM gets you closer to the CPU, reducing latency. However, it’s expensive, offers relatively low capacity, and it’s volatile, so latency climbs whenever you have to reload data into it.
Rather than dismissing memory as a small, expensive, and volatile resource, consider persistent memory, which marries capacity with low latency. It offers speeds closer to DRAM, with the non-volatility of SSDs.
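A hedged sketch of what that means for a programmer: real persistent memory is mapped directly into the application’s address space (typically via a library such as PMDK), but a memory-mapped file is a rough stand-in that any machine can run, since the programming model of load/store plus an explicit flush looks similar.

```python
import mmap
import os
import struct
import tempfile

# Persistent memory is byte-addressable like DRAM, yet keeps its contents
# across restarts. A memory-mapped file is only an analogue (it sits behind
# the page cache, not on the memory bus), but the flow is comparable.
path = os.path.join(tempfile.mkdtemp(), "pmem_region")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "r+b") as f:
    region = mmap.mmap(f.fileno(), 4096)
    struct.pack_into("<Q", region, 0, 42)  # store a counter at offset 0
    region.flush()  # on real pmem: cache-line flush plus store fence
    region.close()

# "Reboot": reopen the region and the value is still there, unlike DRAM.
with open(path, "r+b") as f:
    region = mmap.mmap(f.fileno(), 4096)
    (value,) = struct.unpack_from("<Q", region, 0)
    region.close()

print(value)  # 42
```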
Persistent memory needs a new architecture. Along with Micron, Intel® developed 3D XPoint™, a memory architecture that’s faster than the NAND flash used in SSDs, and which offers persistent memory, unlike DRAM. So it bridges the gap between the two.
Intel® found that replacing straight SSD capacity with a combination of SSD and Optane™ increased I/O operations per second (IOPS) by a factor of 1.9, while lowering latency by around a third. It worked some extra magic with cache acceleration software to increase storage application performance.
These persistent memory systems come into their own in areas demanding both memory- and I/O-intensive infrastructure, such as simulations, AI training, financial trading, and network packet processing. There are many other applications that can benefit too, but you must analyse your workload to figure out the right combination of persistent memory and SSDs.
Intel® Optane™ technology has found a place both in on-premises environments and also in the public cloud. For example, Google announced a partnership with Intel® and SAP to offer virtual machines with Optane™ DC persistent memory for SAP workloads in June 2018. A few months later, it expanded that offering to include virtual machines with 7TB of total memory using Optane™.
Hybrid cloud performance
Not all workloads are fit for the public cloud, though. Data volume, data portability requirements, security and compliance, and total cost of ownership (TCO) all play a part in those discussions, as does proximity to legacy resources that have rigid latency requirements and which will never leave the data centre.
It’s also more difficult to measure performance and conduct root cause analysis in the cloud because you don’t have full control over that environment. That could be because various connection or configuration problems are opaque, especially in a platform-as-a-service environment where you don’t have as much visibility as in an infrastructure-as-a-service scenario.
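When the platform is opaque, measuring from the outside is often all that’s left. Below is a sketch of an end-to-end latency probe; the `probe` function and its interface are this article’s invention, and `request` stands in for whatever round trip your app actually exposes, such as an HTTP GET against a health endpoint.

```python
import statistics
import time

# Time an arbitrary round trip from the client side and report percentiles.
# 'request' is any zero-argument callable performing one round trip.
def probe(request, samples: int = 20) -> dict:
    latencies_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        request()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    latencies_ms.sort()
    return {
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[int(0.95 * (samples - 1))],
        "max_ms": latencies_ms[-1],
    }

# Stand-in round trip: a 2 ms sleep instead of a real network call.
stats = probe(lambda: time.sleep(0.002), samples=10)
print(sorted(stats))  # ['max_ms', 'p50_ms', 'p95_ms']
```

Tracking percentiles rather than averages matters here: cloud performance problems usually show up as a fat tail, not a shifted mean.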
I/O-intensive applications, or those with custom, specialised workloads that need a lot of tuning, are often best suited to a local cloud environment where you have more control over the hardware. However, that doesn’t mean that you have to trade away the flexibility of a cloud infrastructure for tighter control over performance.
Hyper-converged systems offer an opportunity to create a tightly integrated combination of virtualised network, storage, and computing resources on your own premises. In their early days, companies used them mostly as discrete solutions for specific projects. They’re approaching a breakout point, though, evolving into more mainstream options that you can use more systemically in your computing environment. Market advisory firm Storage Switzerland points to improvements in the underlying software that manages the storage, networking, and computing resources in hyper-converged nodes. It also mentions the increasing traction they’re gaining with tier-one application vendors, who are certifying their software to work with these environments.
We are seeing hardware and software vendors work more tightly together on hyper-converged infrastructure (HCI) certifications. For example, VMware now certifies its vSAN software-defined storage product to work on Intel® HCI boxes. Meanwhile, you can also buy Intel® Xeon® Scalable-based HCI boxes running Microsoft’s Azure Stack that connect to the Windows giant's back-end cloud infrastructure and services.
Hybrid cloud environments highlight another component to consider in cloud performance: networking. The network is more important than ever in a cloud environment, because it plays a big part in latency issues. These concerns drive companies to take colocation space in facilities that have a direct connection, and therefore geographical proximity, to major cloud hubs. For example, geospatial software and services company ESRI took space in a colo facility close to Seattle so it could get a direct, low-latency connection to its hyperscale cloud service provider just up the road.
You can use a combination of techniques under the banner of WAN optimisation to help improve performance in a hybrid cloud setup. Data caching can preserve information needed by the on-premises component of your hybrid cloud solution, avoiding both the performance hit and potential cost of retransmitting it from a public cloud service provider. Deduplication and compression can shrink the traffic between your cloud and on-premises sites, and therefore speed it up.
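A toy sketch of those last two techniques together: carve the byte stream into chunks, transmit each unique chunk only once (deduplication), and compress whatever actually goes on the wire. The fixed-size chunking is a simplifying assumption to keep the sketch short; real WAN optimisation appliances use content-defined chunking so that an insertion doesn’t shift every boundary.

```python
import hashlib
import zlib

CHUNK = 1024  # arbitrary fixed chunk size for the sketch

def optimise(stream: bytes, sent_cache: set) -> list:
    """Return the frames that would actually cross the WAN link."""
    wire = []
    for i in range(0, len(stream), CHUNK):
        chunk = stream[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in sent_cache:
            wire.append(("ref", digest))  # a few bytes instead of a chunk
        else:
            sent_cache.add(digest)
            wire.append(("data", zlib.compress(chunk)))
    return wire

cache = set()
payload = (b"A" * CHUNK) * 3 + (b"B" * CHUNK)  # four chunks, two distinct
frames = optimise(payload, cache)
data_frames = [f for f in frames if f[0] == "data"]
print(len(frames), len(data_frames))  # 4 2
```

Only two of the four chunks are transmitted in full, and those two compress well; the other two travel as short references, which is exactly the saving that makes hybrid links feel faster.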
Very little is really guaranteed in the cloud. Service providers make it perfectly clear that you're responsible for a lot of the security measures in a public cloud environment. Similarly, performance is often down to you, unless you're using an entirely closed-loop software-as-a-service solution where you have no input into the underlying setup. If you're viewing the cloud as a strategic resource that will support your company's applications, it's worth taking time to decide where the high-performance workloads will sit, and how you can choose the right hardware and software combinations to meet your performance requirements.
Sponsored by Intel®