Companies flush money down the drain with overfed Kubernetes cloud clusters
Just 13% of provisioned CPUs, 20% of memory utilized, study finds
Cloud optimization biz CAST AI says that companies are still overprovisioning resources and paying too much as a consequence. It claims that in Kubernetes clusters of 50 or more CPUs, only 13 percent of provisioned CPUs and 20 percent of memory are typically utilized.
CAST develops a platform to monitor use of Kubernetes resources and compare them with what it calculates the workload actually requires. The figures in its latest study are based on an analysis of more than 4,000 clusters operated by customers prior to optimization.
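As a rough illustration of what such a comparison measures, utilization is simply the share of provisioned capacity that a workload actually uses. The sketch below is not CAST's method; the `ClusterSample` type, the `utilization` helper, and the sample numbers are all hypothetical, chosen only so that the result matches the study's headline averages.

```python
from dataclasses import dataclass


@dataclass
class ClusterSample:
    """Hypothetical snapshot of one cluster's provisioned vs. used capacity."""
    provisioned_cpus: float
    used_cpus: float
    provisioned_mem_gib: float
    used_mem_gib: float


def utilization(used: float, provisioned: float) -> float:
    """Return the fraction of provisioned capacity that is actually in use."""
    if provisioned <= 0:
        raise ValueError("provisioned capacity must be positive")
    return used / provisioned


# Made-up figures picked to reproduce the study's 13% CPU / 20% memory averages.
sample = ClusterSample(
    provisioned_cpus=100, used_cpus=13,
    provisioned_mem_gib=400, used_mem_gib=80,
)

cpu_util = utilization(sample.used_cpus, sample.provisioned_cpus)
mem_util = utilization(sample.used_mem_gib, sample.provisioned_mem_gib)

print(f"CPU utilization: {cpu_util:.0%}, memory utilization: {mem_util:.0%}")
```

On these assumed numbers, 87 of every 100 provisioned CPUs are paid for but sit idle.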
The overprovisioning is largely down to an excess of caution by users, with DevOps teams anxious to avoid running out of memory, and to the difficulty of knowing at the outset exactly how many resources will be needed. Users can also be confused by the sheer choice available, with AWS offering 600 different EC2 instance types, the study says.
CAST reckons this is evident from the percentage of unutilized memory, which is virtually identical across the three major cloud platforms – AWS, Azure, and Google Cloud – meaning the waste is not due to the peculiarities of any particular cloud.
CPU utilization varies little as well, with clusters on AWS and Azure averaging 11 percent, while those on Google Cloud tended to be higher, at 17 percent.
Google Kubernetes Engine (GKE) users have access to custom instances, which allow a more precise CPU/memory ratio than the other two clouds, CAST's study states.
In large deployments – mega clusters of 30,000 CPUs or more – utilization tended to be higher, at 44 percent, which CAST attributes to these being run by larger teams that can pay more attention to management.
As well as overprovisioning, CAST points to a reluctance to use spot instances as another driver of overspending. Many users are hesitant to adopt this type of virtual machine, which is essentially spare capacity made available at a lower price than the standard on-demand rate for the same instance.
The average on-demand cost per CPU is $6.70 per hour, according to CAST, while for spot instances it is $1.80 per hour.
Users are reluctant because the cloud provider can reclaim a spot instance at any time, with minimal warning. However, CAST says its platform can automatically move a customer's workload to another instance.
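To put the quoted per-CPU prices in perspective, the following sketch works out the implied discount and the hourly bill for a cluster that runs part of its CPUs on spot capacity. It takes the article's two figures at face value; the function names, the 100-CPU cluster size, and the `spot_share` parameter are assumptions made purely for illustration.

```python
# Per-CPU hourly prices as quoted from CAST's study.
ON_DEMAND_PER_CPU_HOUR = 6.70
SPOT_PER_CPU_HOUR = 1.80


def spot_savings_fraction(on_demand: float, spot: float) -> float:
    """Fraction of the on-demand price saved by paying the spot price instead."""
    if on_demand <= 0:
        raise ValueError("on-demand price must be positive")
    return (on_demand - spot) / on_demand


def blended_hourly_cost(
    cpus: int,
    spot_share: float,
    on_demand: float = ON_DEMAND_PER_CPU_HOUR,
    spot: float = SPOT_PER_CPU_HOUR,
) -> float:
    """Hourly cost of `cpus` CPUs when `spot_share` of them run on spot capacity."""
    if not 0.0 <= spot_share <= 1.0:
        raise ValueError("spot_share must be between 0 and 1")
    per_cpu = spot_share * spot + (1.0 - spot_share) * on_demand
    return cpus * per_cpu


savings = spot_savings_fraction(ON_DEMAND_PER_CPU_HOUR, SPOT_PER_CPU_HOUR)
print(f"Spot discount versus on-demand: {savings:.0%}")

# Hypothetical 100-CPU cluster at three different spot mixes.
for share in (0.0, 0.5, 1.0):
    cost = blended_hourly_cost(100, share)
    print(f"{share:.0%} spot: ${cost:,.2f} per hour")
```

On the quoted prices, spot capacity costs a little over a quarter of the on-demand rate, so even a partial shift to spot changes the bill noticeably. The sketch ignores the cost of handling reclaimed instances.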
CAST provides a free analysis that shows organizations how far their cloud resources are overprovisioned, while subscribers can have the platform take action to optimize things.
"This year's report makes it clear that companies running applications on Kubernetes are still in the early stages of their optimization journeys, and they're grappling with the complexity of manually managing cloud-native infrastructure," CAST AI co-founder and chief product officer Laurent Gil said in a statement. ®