CAST AI puts out report on extent of enterprises' cloud resource overspend
Tool relies on platform getting everything right in real time, cautions analyst
A report from cloud startup CAST AI claims companies running cloud-native applications typically spend three times as much as they need on resources because of over-provisioning.
The startup, which hopes to convince users to consider its platform as a way to control the problem, said the overspending sometimes occurs due to an over-cautious approach from the customer, but is often because of the difficulty in knowing exactly how much resource is needed.
The report draws on data CAST AI generated while running infrastructure utilization reports for more than 400 organizations using the three major cloud providers. Its AI-based platform compares the resources an application is using with those it actually requires.
It found that, on average, organizations are spending three times more than they need to. The majority of this can be attributed to CPUs and memory being provisioned but not used, often through users selecting virtual machine instances that are not the optimal match for the requirements of the workload.
According to CAST AI co-founder and chief product officer Laurent Gil, part of the problem is that users often hedge their bets and err on the side of caution against unexpected spikes in demand when provisioning resources.
"The number one job of DevOps is that the application must work all the time 24x7, so what do they do, they look at the worst case scenario over a week, and they tend to provision resources based on that worst case," he said.
Gil cited the example of a customer running an application on AWS M5 instances, where the CAST AI engine recommended switching to a cheaper instance type, C6a.
"The biggest difference between these two is that C6a is AMD, which is slightly less expensive on Amazon. And C means it's compute optimized so it has less memory per core, and the application didn't need all that memory," Gil explained.
The key, however, is that this change meant lower cost without impacting application performance.
Provisioning is still a significant challenge for organizations of all sizes, according to CAST AI, as the major cloud providers have a bewildering array of instance types to choose from, making it a difficult and time-consuming task to pick the right instance for the workload, not to mention ensuring that the cloud-hosted infrastructure is continuously right-sized. The issue is that DevOps professionals would need to continually monitor cloud resources to achieve this.
The company is hoping customers will buy its platform to solve this, of course. The service is free to try out so users can see how much they are over-provisioning, but organizations that sign up as paying customers can allow the platform to take control of their cloud resource provisioning and automate the matching of workloads to the optimum machine instances for the job.
CAST AI works across the three major cloud platforms – AWS, Google Cloud, and Microsoft Azure – and is focused on cloud-native workloads running in containers and using Kubernetes. The firm claims that customers can see savings of 50-75 percent on cloud resources for running their applications.
The startup cited one customer, a French e-commerce startup called La Fourche, which it said was using fifteen t3.2xlarge and two t3.xlarge instances on AWS at a cost of $4,349.95 at the time of analysis. Its platform moved the workloads to five c5a.2xlarge instances instead, which resulted in a cost of $1,310.40, a saving of nearly 70 percent.
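The savings figure checks out against the costs quoted in the report. A quick calculation using only the numbers stated above:

```python
# Figures from CAST AI's La Fourche example, as quoted at the time of analysis.
before = 4349.95   # 15x t3.2xlarge + 2x t3.xlarge on AWS
after = 1310.40    # 5x c5a.2xlarge after right-sizing
saving = before - after
pct = saving / before * 100
print(f"Saved ${saving:.2f} ({pct:.1f}%)")  # Saved $3039.55 (69.9%)
```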
According to Gil, other savings can be made from using spot instances, which make use of spare resources that are available at much less cost than the standard on-demand price for the same machine instance.
"Cloud providers have to over provision resources because they have to be elastic. So some machines are not being used all of the time, and what they don't sell at a given time they typically discount by 60-70 percent," Gil said.
The drawback is that if AWS needs the machine back, it gives just a two-minute warning before reclaiming it, which is likely to deter many from using these heavily discounted instances in case they disappear at short notice.
"For us it's a benefit, because we can automatically detect that a spot instance is about to go down, and immediately replace it by another one. So there is never any outage in our case, because we replace the machine when we see it's going down, and if there are no more spot instances, we will select on demand," Gil told us.
The CAST AI engine can identify which of a customer's containers are "spot friendly," according to Gil, such that if a container is running a DNS server, for example, that would not be allocated to a spot instance. Customers can toggle on or off the option to use spot instances.
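A minimal sketch of how such a classification might feed placement decisions. The workload names and categories here are made up for illustration; only the DNS-server example comes from Gil:

```python
# Hypothetical sketch of "spot friendly" placement; criteria are illustrative.

INTERRUPTIBLE = {"batch-job", "web-frontend"}        # replaceable, can ride spot
NOT_INTERRUPTIBLE = {"dns-server", "primary-database"}  # must not be interrupted

def placement(container, spot_enabled=True):
    """Route replaceable containers to spot capacity; keep critical ones on-demand."""
    if spot_enabled and container in INTERRUPTIBLE:
        return "spot"
    return "on-demand"
```

The `spot_enabled` flag mirrors the customer-facing toggle the article mentions: with it off, everything lands on on-demand instances regardless of classification.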
The co-founder said the statistics CAST AI has gathered on over-provisioning do not differ significantly across the three major cloud providers, indicating that the situation is not due to any particular cloud making it difficult for customers to manage costs.
"I think it's because of human behavior. Like, if you're a DevOps, your number one priority is to make sure the app works. If it doesn't have enough resources you are in trouble, so I think this is a tendency of over provisioning," he said. "They're not evil, right? They're not making you pay more for their service; it's just very hard for humans to consume something when they have so many choices."
Independent analyst Clive Longbottom told us that a monitoring and automated management system such as CAST AI makes a great deal of sense. However, he cautioned that customers are relying on the platform getting everything right in real time.
"Overall, my take would be go for it: carry out the financial calculations to make sure that the costs saved are greater than the cost of subscription to CAST AI, but make sure that you have a Plan B in place should CAST suddenly disappear one day." ®