Devops

Microsoft's Azure Kubernetes Service mucked my cluster!

Redmond blames user error, invites further feedback to improve its service

Wed 8 Aug 2018 // 06:04 UTC

Microsoft's Azure Kubernetes Service (AKS) was launched to world+dog in June, but a few disgruntled customers say the managed container confection isn't fully baked yet.

In a blog post published on Monday, Prashant Deva, creator of an app and infrastructure monitoring service called DripStat, savaged AKS, calling it "an alpha service marked as GA [generally available] by Microsoft."

Deva said he moved his company's production workload to AKS last month, and has been plagued by random DNS failures for domains outside of Azure and hostnames inside the Azure Virtual Network.

He characterized the response from Microsoft support – advice not to use excessive memory and CPU resources – as ridiculous, and said Microsoft failed to respond when told the DNS issues occurred mainly during application startup when memory and CPU usage is minimal.

Then there was the AKS Kubernetes Dashboard, which crashed after a few days and required a reboot of the Kubernetes API Server to fix. And this happened, Deva said, on a daily basis, which meant the constant filing of support tickets.

Have you tried turning your infrastructure off and on again?

When Docker containers crashed, the underlying virtual machine would fail too, according to Deva. Recovery required manually rebooting the VM from the Azure portal. He described the response he got from Azure Support thus: "Yeah this is your problem. Just make sure your containers never crash."

He recounts an unrecoverable cluster crash, and claims the service-level agreement (SLA), which covers the VMs underlying AKS but not AKS itself, was violated.

"Azure Support has been the worst support experience of my life," he said, noting that he's moved to Google Cloud Platform for its Kubernetes service. "...Ignoring the SLA violation is downright fraudulent behavior."

Reached via Twitter's private message system, Deva said his experience was limited to AKS and didn't reflect other Azure services.

"This has been very poorly handled by Microsoft," he told The Register. "The worst part is them trying to blame the user for issues on their end."

In an email to El Reg, a Microsoft spokesperson attributed the problem to Deva running workloads without a memory limit:

In the course of an in-depth engagement by our engineering team, we determined that the customer’s workloads had been overscheduled on the nodes in his cluster, crowding out system services and causing undesirable behavior.

We provided recommendations for how the customer could prevent this from reoccurring and have made corresponding improvements in AKS to ensure that customers cannot inadvertently get into this situation again. We are also continuing to invest in providing better diagnostic and monitoring tools so that customers and our own support engineers can more quickly determine what might be causing problems in a customer’s environment. We are always concerned if a customer has an issue with AKS and we will use this feedback to continue to improve the service and our support process.
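For readers wondering what "overscheduled" might look like in practice, here is a minimal Python sketch of the idea. It is not Microsoft's diagnosis tooling or the Kubernetes scheduler's actual logic; the `Container` class, the `audit_node` function, and all the memory figures are hypothetical and exist only for illustration. It flags containers that declare no memory limit and checks whether the declared limits add up to more than the node has left once an assumed reservation for system services is set aside.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Container:
    """Hypothetical stand-in for a container's memory settings."""
    name: str
    memory_limit_mib: Optional[int]  # None models a container with no memory limit


def audit_node(
    containers: List[Container],
    node_memory_mib: int,
    system_reserved_mib: int,
) -> Tuple[List[str], bool]:
    """Return (names of containers without a limit, whether limits overcommit the node).

    All figures are assumptions for illustration, not values taken from AKS.
    """
    # Containers with no limit can grow until they starve everything else.
    unbounded = [c.name for c in containers if c.memory_limit_mib is None]

    # Sum only the limits that were actually declared.
    limited_total = sum(
        c.memory_limit_mib for c in containers if c.memory_limit_mib is not None
    )

    # Memory left for workloads after setting aside room for system services.
    available = node_memory_mib - system_reserved_mib
    overcommitted = limited_total > available
    return unbounded, overcommitted


if __name__ == "__main__":
    workloads = [
        Container("api", 2048),
        Container("worker", 4096),
        Container("batch", None),
    ]
    unbounded, overcommitted = audit_node(
        workloads, node_memory_mib=7168, system_reserved_mib=1536
    )
    print("no memory limit:", unbounded)
    print("limits exceed available memory:", overcommitted)
```

In this made-up example the "batch" container has no limit at all, and the two declared limits already exceed what the node can offer after the system reservation, which is the kind of crowding-out the statement describes.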

An individual posting under the name QiKe, claiming to be an engineering lead on AKS, offered a similar explanation in a post to Hacker News.

Deva is not the only AKS customer to report misadventures. Colin Jemmott, senior data scientist at Seismic Software, observed via Twitter, "This matches my experience with @Azure managed Kubernetes (AKS)."

In late June, Wojciech Barczyński, a senior software engineer at SMACC, a deep learning and finance biz, described a number of issues that arose using AKS. He hasn't jumped ship; however, he advises people to skip the "first bumpy GA months" and wait until the service becomes more stable.

"The AKS team gets more and more experience with time and the growing number of clients," he observed. "So, the service improves fast."

At the same time, AKS has fans. One person chiming in on the Hacker News thread remarked, "I've had wildly different results. My shop wasn't large by any means but Azure worked pretty much perfectly for us."
