What’s the best time to optimize your cloud infrastructure? All the time

Focus on your applications with Continuous Optimization

Paid feature

The cloud is the future. The problem for most organizations is that when it comes to working out what resources they need for that future – and what they're going to cost – they are usually flying blind.

As organizations move more and more complex workloads to the cloud, they need to be sure they can allocate resources and costs efficiently, and pinpoint potential waste and overspend. The problem is that, at best, the tooling provided by the main cloud platforms amounts to basic visibility tools that show where spending has been and give only the merest hint of what it will look like in the immediate future.

This was the situation facing Spot's founder, Amiram Shachar, when he was working as a director of DevOps while studying computer science. "I had a real problem … our cost of cloud was way off what the calculators we used to predict and project how much money we were going to pay." Cloud vendors' tools let him visualize his bills, but only retrospectively.

Back then, the only way to get a grip on the problem was to take time out with a stack of price lists and a spreadsheet.

And this all ran counter to the continuous improvement philosophy behind DevOps. "I wanted to make it a continuous thing that I'm always optimizing costs," he says. From there, the obvious step was to recognize that "the machine can do it better than me." So, as part of his studies, he developed a project on optimizing cloud costs through sophisticated selection of pricing options for compute instances.

That project became Spot, which aims to help businesses make the most efficient use of the multitude of services and pricing options available in the cloud. The company was snapped up by NetApp in 2020, and Shachar remains its vice president and general manager.

The Spot by NetApp model is straightforward. Cloud vendors offer three pricing models. Shachar describes standard, pay-by-the-hour on-demand pricing as "very linear and the most expensive way to buy cloud". Reserved instances attract a discount in the tens of per cent – but only if a user commits for a period of time. The problem is that "you need to do a lot of planning work."

The biggest potential discounts come on spot instances, the spare capacity that cloud platforms have available at any given moment. These can be as much as 90 per cent cheaper. That's compelling, if you can live with the fact that the cloud vendor will kick you off with little or no notice if it needs that capacity back. This poses an obvious operational challenge, and consequently spot instances were historically considered unfit for serious work.
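The planning burden of reservations comes down to simple arithmetic. Because a commitment is billed for every hour of the term whether or not the instance runs, it only beats on-demand pricing when utilization exceeds one minus the discount. The minimal Python sketch below assumes a hypothetical $1.00/hour on-demand rate, a 40 per cent reservation discount and a 90 per cent spot discount, and it ignores the cost of spot interruptions entirely.

```python
# Illustrative only: rates, discounts and function names are hypothetical.
HOURS_IN_MONTH = 730.0  # 8,760 hours per year / 12


def monthly_costs(on_demand_rate: float, hours_used: float,
                  reserved_discount: float, spot_discount: float) -> dict[str, float]:
    """Cost of running one instance for a month under each pricing model."""
    return {
        # On-demand: pay the full rate, but only for hours actually used.
        "on_demand": on_demand_rate * hours_used,
        # Reserved: discounted rate, but billed for every hour of the term.
        "reserved": on_demand_rate * (1 - reserved_discount) * HOURS_IN_MONTH,
        # Spot: deepest discount, billed per hour used (interruptions ignored).
        "spot": on_demand_rate * (1 - spot_discount) * hours_used,
    }


def reserved_break_even(reserved_discount: float) -> float:
    """Minimum utilization at which a reservation is cheaper than on-demand."""
    return 1 - reserved_discount


# An instance that runs half the month.
print(monthly_costs(1.00, 365, reserved_discount=0.40, spot_discount=0.90))
print(reserved_break_even(0.40))  # about 0.6, i.e. ~60% utilization
```

At 50 per cent utilization the instance sits below the 60 per cent break-even, so the reservation costs more than simply paying on demand.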

It might seem that cloud providers would have no interest in helping customers use the cheapest possible instances. But, says Shachar, this ignores the Jevons paradox: "the more optimized, the more efficient, you become in consuming a resource, the more of that resource you consume."

Also, "cloud providers saw that unoptimized customers tend to churn much faster than optimized customers … they know if they optimize they'll spend more, they'll use more, and they'll stay longer."

So, taking those three pricing models as its starting point, Spot delivers three main services as part of its CloudOps portfolio. Elastigroup offers predictive infrastructure scaling to run workloads on spot instances, Eco offers reserved instance optimization for maximum resource utilization, and Ocean offers automated container infrastructure management. The service currently spans AWS, Azure and GCP, with other public clouds likely to be included in the future.

Optimize, continuously

Spot by NetApp's underlying technology aims to provide the continuous optimization that Shachar envisioned when he was a student, including predictive replacement and rebalancing, application-driven autoscaling, and right-sizing.

At the back end is a machine learning engine that, as Shachar puts it, "learns all the cloud pricing, learns all the cloud volatility." This covers factors such as demand and capacity, when spot, on-demand or reserved servers make most sense for a given workload, and when medium, large or extra-large servers are most appropriate. The answers to all these questions "might change on an hourly basis, sometimes even on a minute basis."
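The kind of decision the engine makes can be shown in a much-simplified form: among the offers that satisfy a workload's CPU and memory needs, pick the one with the lowest expected hourly cost, where the expected cost adds a penalty for the risk of a spot interruption. Everything here is hypothetical (the `Offer` type, the prices, the interruption estimates and the penalty), and it is not Spot's model; a real engine would re-run the choice as prices and capacity shift.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One purchasable option: an instance size under a pricing model (hypothetical)."""
    name: str
    vcpus: int
    memory_gib: float
    hourly_price: float
    interruptions_per_hour: float  # estimated; 0 for on-demand or reserved


def expected_hourly_cost(offer: Offer, interruption_cost: float) -> float:
    # List price plus the expected cost of being interrupted in a given hour.
    return offer.hourly_price + offer.interruptions_per_hour * interruption_cost


def choose_offer(offers: list[Offer], need_vcpus: int, need_memory_gib: float,
                 interruption_cost: float) -> Offer:
    """Cheapest offer that fits the workload, with interruption risk priced in."""
    fitting = [o for o in offers
               if o.vcpus >= need_vcpus and o.memory_gib >= need_memory_gib]
    if not fitting:
        raise ValueError("no offer satisfies the workload's requirements")
    return min(fitting, key=lambda o: expected_hourly_cost(o, interruption_cost))


offers = [
    Offer("large-on-demand", 2, 8, 0.10, 0.0),
    Offer("large-spot", 2, 8, 0.03, 0.05),
    Offer("xlarge-spot", 4, 16, 0.06, 0.01),
]
# A workload that restarts cheaply tolerates the riskier, cheaper spot offer...
print(choose_offer(offers, 2, 8, interruption_cost=0.50).name)  # large-spot
# ...while an expensive restart shifts the choice to a more stable pool.
print(choose_offer(offers, 2, 8, interruption_cost=2.00).name)  # xlarge-spot
```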

At the same time, the platform analyses customers' apps and usage: "The important thing for us is to learn each pattern without the customer telling us what the pattern is, and make a decision based on the pattern." A bank might have large overnight runs, for example, while a ticketing agency might see a spike at 10am every morning as new events are released.
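A crude way to find a recurring pattern such as the ticketing agency's 10am spike, without being told about it, is to average observed load by hour of day and look for the peak. The sketch below uses hypothetical sample data and is far simpler than a learned model; it only illustrates inferring a daily profile from raw usage samples, which a scaler could then use to add capacity shortly before the spike.

```python
from collections import defaultdict
from datetime import datetime


def hourly_profile(samples: list[tuple[datetime, float]]) -> dict[int, float]:
    """Average observed load for each hour of the day (0-23) that has data."""
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for timestamp, load in samples:
        totals[timestamp.hour] += load
        counts[timestamp.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in totals}


def peak_hour(samples: list[tuple[datetime, float]]) -> int:
    """Hour of the day with the highest average load."""
    profile = hourly_profile(samples)
    return max(profile, key=lambda hour: profile[hour])


# Hypothetical ticketing workload: steady load, with a spike at 10am each day.
samples = [
    (datetime(2022, 3, day, hour), 500.0 if hour == 10 else 50.0)
    for day in range(1, 8)
    for hour in range(24)
]
print(peak_hour(samples))  # 10
```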

Spot by NetApp has integrations with platforms and tools such as Chef, Puppet, Terraform, and Jenkins. "We've done a lot of work in integrating with every DevOps tool that is out there to become a natural way for DevOps people to consume us," Shachar says.

When customers log into the Spot console, it presents them with a breakdown of "what their environments can look like when optimized", and offers recommendations on the best options for running their workloads. Customers can modify the recommendations if they wish.

The aim is to deliver ROI from day one. Then, "the automation kicks in. And as the customer continues to trust us, they deploy it in more places, and in more pipelines, in more CI/CD environments."

Customers still get their monthly bill from the cloud provider. Spot by NetApp's pricing is entirely success-based, with no upfront costs: the service takes a percentage of the difference between what customers would have paid without the optimization and what they actually pay. If there is no saving, customers pay nothing.
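Put as arithmetic, the fee is a share of the savings, floored at zero. A minimal sketch, where the 25 per cent rate is purely an assumption (the actual percentage isn't given) and the baseline is itself an estimate of what the unoptimized bill would have been:

```python
def success_fee(baseline_cost: float, actual_cost: float, fee_rate: float) -> float:
    """Fee as a share of the savings; nothing is charged if there are none.

    baseline_cost: estimated bill without optimization.
    actual_cost:   the bill actually paid to the cloud provider.
    fee_rate:      hypothetical share of the savings (e.g. 0.25 for 25%).
    """
    savings = baseline_cost - actual_cost
    return fee_rate * savings if savings > 0 else 0.0


print(success_fee(10_000, 6_000, fee_rate=0.25))  # 1000.0
print(success_fee(5_000, 5_200, fee_rate=0.25))   # 0.0
```

In the first case the customer saves $4,000 and keeps $3,000 of it; in the second, optimization didn't help, so no fee is due.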

Spot by NetApp's revenue mix is changing in line with the workloads companies are deploying and with cloud vendors' offerings. Three years ago, Shachar explains, Elastigroup accounted for the lion's share of turnover. But "right now about 40 per cent of our revenue comes from Eco, 40 per cent comes from Ocean, and 20 per cent is coming from Elastigroup."

Elastigroup will likely continue to shrink, he says, as it targets virtual machines, and people are using more and more containers and serverless. "Ocean just really rides that wave."

Automate, automate, automate

The default approach to deploying containers has been to use an orchestrator, typically Kubernetes, and the cloud providers all offer their own fully managed Kubernetes services. But Spot goes further: it works with all of these while exploiting the separation between a container and the underlying compute it runs on.

"We can manage the compute while intelligently and transparently moving you across compute instances," he says. "As a customer, you're not even aware that the Ocean is making decisions every 10 seconds, to put you on another underlying instance, to optimize your performance and costs."

This works out at about 80 to 90 per cent cheaper, but the key reason customers choose Ocean is the automation and simplicity it offers for deploying containers, because "this is how they can scale their operations."
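Because containers are decoupled from the instances beneath them, a controller can keep re-packing them onto fewer or cheaper nodes. How Ocean actually decides isn't described here; as a stand-in, the sketch below uses the classic first-fit-decreasing bin-packing heuristic on CPU requests only, with hypothetical pod names, sizes and node capacity.

```python
def first_fit_decreasing(pod_cpus: dict[str, float],
                         node_capacity: float) -> list[list[str]]:
    """Pack pods onto as few identical nodes as possible (first-fit decreasing).

    Considers CPU requests only; a real scheduler also weighs memory,
    affinity rules and the price of each node type.
    """
    nodes: list[list[str]] = []  # pod names placed on each node
    free: list[float] = []       # remaining CPU on each node
    # Place the largest pods first; each goes on the first node with room.
    for pod, cpu in sorted(pod_cpus.items(), key=lambda kv: kv[1], reverse=True):
        if cpu > node_capacity:
            raise ValueError(f"pod {pod!r} does not fit on any node")
        for i, remaining in enumerate(free):
            if cpu <= remaining:
                nodes[i].append(pod)
                free[i] -= cpu
                break
        else:
            # No existing node has room: open a new one.
            nodes.append([pod])
            free.append(node_capacity - cpu)
    return nodes


pods = {"api": 2.0, "web": 1.5, "worker": 3.0, "cron": 0.5, "cache": 1.0}
print(first_fit_decreasing(pods, node_capacity=4.0))
# [['worker', 'cache'], ['api', 'web', 'cron']]
```

Here eight vCPUs of requests fit exactly on two 4-vCPU nodes; rerunning the packing as pods come and go is what lets a controller retire under-used nodes.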

Meanwhile, Eco keeps track of which workloads customers should consider moving to committed capacity, to ensure the best possible utilization. The platform uses AI to build a recommended portfolio for customers, which changes dynamically to maximize savings and flexibility, while freeing the customer from the time and staffing overhead of tracking the many options available from cloud providers.
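The trade-off Eco manages can be framed as a small optimization problem: commit too little and peaks are billed at on-demand rates; commit too much and idle reservations are still paid for. Eco's AI-driven method isn't detailed here, so the brute-force sketch below, with hypothetical rates and usage, simply finds the commitment level that would have minimized cost over a usage history.

```python
def best_commitment(hourly_usage: list[int], on_demand_rate: float,
                    reserved_rate: float) -> tuple[int, float]:
    """Pick the number of instances to commit to, given past hourly usage.

    Committed instances are billed every hour whether used or not; usage
    above the commitment is billed at the on-demand rate. Returns the
    cheapest commitment level and its total cost over the history.
    """
    if not hourly_usage:
        raise ValueError("need at least one hour of usage history")
    hours = len(hourly_usage)
    best_level, best_cost = 0, float("inf")
    for level in range(max(hourly_usage) + 1):
        committed = level * reserved_rate * hours
        overflow = sum(max(0, used - level) for used in hourly_usage) * on_demand_rate
        cost = committed + overflow
        if cost < best_cost:
            best_level, best_cost = level, cost
    return best_level, best_cost


# Hypothetical day: 2 instances overnight, 6 during business hours.
usage = [2] * 12 + [6] * 12
print(best_commitment(usage, on_demand_rate=1.00, reserved_rate=0.60))
```

The answer is to commit to the two-instance baseline and cover the daytime peak on demand: a third committed instance would be busy only half the time, below the 60 per cent needed to pay for itself at these rates.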

This all naturally raises the question, "What else can I optimize?" And, indeed, automate. The answer is to look back up the development chain.

As Shachar says, "you build in the cloud, then you care about security, then you care about CI and CD, then you care about automation and monitoring. And then in the end when you get your bill, and people are upset, you care about cost."

So Spot's aim is to meet customers earlier in their journey. It plans to release products in the security and CI/CD space, and to address new use cases such as running containerized big data applications. The ultimate aim is to enable "bigger, broader and more complete, continuous optimization for cloud operations".

This could encompass better integrations, for example, with CI/CD providers. Spot has launched a private preview of Ocean CD, aimed at providing continuous delivery for containers, with automated and customizable delivery workflows, as well as visibility, verification and automated rollbacks.
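Automated verification and rollback generally come down to comparing a new version's health with the current version's before shifting more traffic to it. Ocean CD's actual verification rules aren't described here; the function, the error-rate metric and the thresholds below are hypothetical, a sketch of a single canary-analysis step.

```python
def canary_decision(baseline_errors: int, baseline_requests: int,
                    canary_errors: int, canary_requests: int,
                    max_ratio: float = 1.5, min_error_rate: float = 0.001,
                    min_requests: int = 100) -> str:
    """Decide whether to promote, roll back, or keep observing a canary.

    The canary fails verification if its error rate exceeds max_ratio times
    the baseline's (or min_error_rate, whichever is higher).
    """
    if canary_requests < min_requests:
        return "wait"  # too little traffic to judge yet
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    canary_rate = canary_errors / canary_requests
    limit = max(max_ratio * baseline_rate, min_error_rate)
    return "rollback" if canary_rate > limit else "promote"


print(canary_decision(10, 1_000, 5, 200))  # rollback: 2.5% vs 1.0% baseline
print(canary_decision(10, 1_000, 2, 200))  # promote: 1.0% vs 1.0% baseline
print(canary_decision(10, 1_000, 0, 50))   # wait: only 50 canary requests
```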

And the company has launched Spot Security, covering "all the things you need to know as a DevOps to secure your environment." Also in private preview, the security platform analyses users' cloud utilization, configurations and communications, as well as other information like audit trails. It uses AI to identify risks, including misconfiguration and anomalous behaviors, and then offers guided remediation and prioritization of issues.
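Anomaly detection can be as elaborate as a learned model, but the underlying idea is easy to show with a basic statistical stand-in: flag a measurement, such as an hourly count of API calls in an audit trail, that strays more than a few standard deviations from its history. This z-score sketch is not Spot Security's method, and the history, threshold and metric are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag `observed` if it lies more than `threshold` sample standard
    deviations from the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False  # too little history to estimate normal variation
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return observed != mu  # any change from a perfectly flat history
    return abs(observed - mu) / sigma > threshold


# Hypothetical hourly counts of API calls made by one service account.
api_calls = [10, 12, 11, 9, 10, 11, 12, 10]
print(is_anomalous(api_calls, 11))  # False
print(is_anomalous(api_calls, 40))  # True
```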

Bringing increased automation further up the development chain will no doubt help save customers time and money. But that doesn't mean customers can ever relax into some sort of steady state, says Shachar. "Over time, cloud will be more expensive and more complex. Because you're going to use more of it, and there are new capabilities being added."

And that's precisely why "optimization shouldn't be a once in a while project. It should be a continuous thing. Right?"

Sponsored by Spot by NetApp.

Biting the hand that feeds IT © 1998–2022