Google admits Kubernetes container tech is so complex, it's had to roll out an Autopilot feature to do it all for you

More expensive, less flexible, but easier and safer to use

Google has recognised that users struggle to configure Kubernetes correctly and introduced a new Autopilot service in an attempt to simplify deployment and management.

Two things everyone knows about Kubernetes are: first, that it has won in the critically important container orchestration space, and second, that its complexity is both a barrier to adoption and a common cause of errors.

Even Google, the inventor and biggest promoter of Kubernetes, admits this is the case. "Despite 6 years of progress, Kubernetes is still incredibly complex," said Drew Bradstock, product lead for Google Kubernetes Engine (GKE). "What we've seen in the past year or so is a lot of enterprises are embracing Kubernetes, but then they run headlong into the difficulty."

GKE is a Kubernetes platform that runs primarily on Google Cloud Platform (GCP), but also on other clouds or on-premises, where it is part of Anthos.

This is the rationale for Autopilot, which is a fully managed deployment to GKE, provided it is running on GCP. But is GKE not already a managed service? It is – the difference is that Autopilot is both more opinionated and more automated than plain GKE.

Kubernetes has the concept of clusters (a set of physical or virtual servers), nodes (individual servers), pods (a management unit representing one or more containers on a node), and containers themselves. GKE is fully managed to the cluster level. Autopilot extends that to nodes and pods.

The best place to look in order to understand Autopilot's features and limitations is here, taking careful note of the options marked "pre-configured", which means they cannot be changed.

In essence it is another way to purchase and manage GKE resources that gives less flexibility but more convenience. Since Google manages more of the configuration, it offers a higher SLA of 99.9 per cent uptime for Autopilot pods in multiple zones.
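To put the 99.9 per cent figure in perspective, here is a back-of-envelope sketch of how much downtime such an SLA permits. It assumes a 30-day month and treats uptime as a simple fraction of elapsed time; Google's actual SLA terms define how uptime is measured and may differ. The function name is illustrative only.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given uptime percentage.

    Assumption: uptime is a plain fraction of elapsed time, which is a
    simplification of how a real SLA is measured.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    minutes = allowed_downtime_minutes(99.9)
    print(f"99.9% over 30 days allows about {minutes:.1f} minutes of downtime")
```

On those assumptions, 99.9 per cent works out to roughly 43 minutes a month.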

GKE Autopilot: the nodes are managed for you

On Google's cloud, regions are composed of three or more zones. Putting all resources in a single zone is less resilient than spreading them over multiple zones, while extending failover to multiple regions maximises resilience. Autopilot clusters are always regional rather than zonal, which is good for resilience but costs more.

Other Autopilot constraints are that the operating system is always Google's own "container optimized" Linux with containerd, rather than Linux with Docker or Windows Server. The maximum number of pods per node is 32, as opposed to 110 on standard GKE.
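The lower pod ceiling affects how many nodes a workload spreads across. As a rough sketch, the snippet below computes the minimum node count implied by the pod limit alone, using the 32 and 110 figures quoted above. The 500-pod workload is a made-up example, and real scheduling also depends on CPU and memory requests, which this ignores.

```python
import math

AUTOPILOT_MAX_PODS_PER_NODE = 32   # figure quoted in the article
STANDARD_MAX_PODS_PER_NODE = 110   # figure quoted in the article


def min_nodes(pod_count: int, max_pods_per_node: int) -> int:
    """Fewest nodes that can hold `pod_count` pods given only the per-node pod cap."""
    if pod_count < 0 or max_pods_per_node <= 0:
        raise ValueError("pod_count must be >= 0 and max_pods_per_node > 0")
    return math.ceil(pod_count / max_pods_per_node)


if __name__ == "__main__":
    pods = 500  # hypothetical workload size
    print("Autopilot:", min_nodes(pods, AUTOPILOT_MAX_PODS_PER_NODE))  # 16
    print("Standard: ", min_nodes(pods, STANDARD_MAX_PODS_PER_NODE))   # 5
```

Since Autopilot bills for pod resources rather than nodes, the higher node count matters less for the bill than it would on standard GKE.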

There is no SSH access to nodes; Autopilot nodes are locked down. "Removing SSH was a big deal," Bradstock said. It is somewhat limiting, but Bradstock told us that the decision is based on research "where people had misconfigured things, with the best intentions." GPU and TPU (Tensor Processing Unit) support is not available in Autopilot either, though it is planned for the future.

Money, money


The pricing model is different too, being based on the CPU, memory, and storage used by pods, rather than on compute engine instances (virtual machines) used. There is also a fee of $0.10 per hour for each Autopilot cluster. GKE standard also has a $0.10 per hour fee for each cluster; you pay one but not both.
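The two billing models can be sketched as follows. Only the $0.10 per hour cluster fee comes from the figures above; the per-vCPU, per-GiB, and per-instance rates are hypothetical placeholders, not Google's published prices, and all class and function names are illustrative.

```python
from dataclasses import dataclass

CLUSTER_FEE_PER_HOUR = 0.10  # USD per cluster per hour, as quoted in the article


@dataclass(frozen=True)
class PodRates:
    """HYPOTHETICAL hourly unit prices in USD; not Google's real rates."""
    per_vcpu: float = 0.05
    per_gib_memory: float = 0.005
    per_gib_storage: float = 0.0001


def cluster_fee(hours: float) -> float:
    """Flat management fee, charged once per cluster on either model."""
    return CLUSTER_FEE_PER_HOUR * hours


def pod_based_cost(vcpu: float, memory_gib: float, storage_gib: float,
                   hours: float, rates: PodRates = PodRates()) -> float:
    """Autopilot-style billing: pay for the resources the pods use."""
    hourly = (vcpu * rates.per_vcpu
              + memory_gib * rates.per_gib_memory
              + storage_gib * rates.per_gib_storage)
    return hourly * hours


def instance_based_cost(instance_count: int, price_per_instance_hour: float,
                        hours: float) -> float:
    """Standard-style billing: pay for every VM, however busy it is."""
    return instance_count * price_per_instance_hour * hours


if __name__ == "__main__":
    hours = 24 * 30  # a 30-day month
    print(f"Cluster fee for a 30-day month: ${cluster_fee(hours):.2f}")
```

Over a 30-day month the cluster fee alone comes to $72, whichever mode is chosen.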

The obvious question of whether Autopilot or GKE standard is more expensive is not easy to answer. Since it is a somewhat premium deployment, Autopilot will cost more than a carefully optimised GKE standard deployment. "There is a premium over regular GKE," said Bradstock, "because we've got full SRE (Site Reliability Engineering) support and SLA support, it's not just the functionality."

That said, a GKE standard deployment which is under-utilised because it is hard to estimate the correct specification for the compute instances could cost more than Autopilot.
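The break-even point can be illustrated with a deliberately simple model. Assume standard GKE charges for provisioned capacity whether or not it is used, and that Autopilot charges only for what pods use, at some multiple of the standard unit price. The 1.25 premium below is invented for illustration; the article does not state the actual premium. The per-cluster fee is left out because it is the same on both.

```python
def break_even_utilisation(autopilot_premium: float) -> float:
    """Utilisation below which standard GKE costs more than Autopilot.

    `autopilot_premium` is Autopilot's unit price as a multiple of the
    standard unit price (a HYPOTHETICAL figure, e.g. 1.25 for 25% more).
    """
    if autopilot_premium <= 0:
        raise ValueError("autopilot_premium must be positive")
    return 1.0 / autopilot_premium


def cheaper_option(utilisation: float, autopilot_premium: float) -> str:
    """Compare cost per unit of resource actually used under the simple model."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    standard_cost_per_used_unit = 1.0 / utilisation   # paying for idle capacity too
    autopilot_cost_per_used_unit = autopilot_premium  # paying only for use, at a premium
    if autopilot_cost_per_used_unit < standard_cost_per_used_unit:
        return "autopilot"
    return "standard"


if __name__ == "__main__":
    premium = 1.25  # hypothetical
    print(f"Break-even utilisation: {break_even_utilisation(premium):.0%}")
    print("At 50% utilisation:", cheaper_option(0.5, premium))
    print("At 90% utilisation:", cheaper_option(0.9, premium))
```

Under this model, a standard cluster running at half capacity loses to Autopilot, while one running at 90 per cent does not.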

Why not just use Cloud Run, which allows container workloads to be deployed and run without any configuration of clusters, nodes, and pods, even though it still runs on GKE? "Cloud Run is a great opinionated developer environment, one app can spin up from zero to 1,000 and back to zero, that's where Cloud Run is targeted," Bradstock told us. "Autopilot is for people who want to do less work but they still want Kubernetes, they want to be able to see everything, they want to use third-party scripts, they want to build their platform."

Is compatibility with add-ons an issue with Autopilot's constrained environment? "There are things that don't work on day one," said Bradstock, though some third-party tools, such as Datadog monitoring, are already supported, as are DaemonSets, a Kubernetes feature for running a service on every node that many add-ons rely on.

The configuration of storage, compute, and the network means "we have to give up some level of flexibility and some connections," he told us. "But we definitely want the third-party ecosystem to run on it."

The Autopilot feature means that Google offers a wider range of Kubernetes options, from most to least hands-on. The trade-off is not only higher cost and less flexibility, but also potential deskilling of enterprise admins, though the argument is that businesses should focus on what provides business value rather than requirements that can be fulfilled by a third party.

Google's engineering has a better reputation than its customer support. Software engineer Kevin Lin, who admittedly is ex-Amazon, wrote recently of his experience as a new customer for AWS versus Google.

Google was slower, less helpful, and ended up referring him to a third-party partner, he said. "The initial onboarding call was entirely about how much money I was planning on spending with Google (as opposed to the Amazon call where they wanted to help me architect my service). Google Cloud has really nice ergonomics and world-class engineers but an awful reputation for customer support. My anecdotal experience seems to support this," he said.

Proof, if any were needed, that good engineering is not the only thing that matters for GCP to increase its cloud market share. ®


Biting the hand that feeds IT © 1998–2022