Google sharpens AI toolset with new chips, GPUs, more at Cloud Next
TPU v5e, A3 VMs, and GKE Enterprise headline first in-person shindig since pandemic
Cloud Next Google is sharply focused on AI at this year's Cloud Next, with a slew of hardware announcements, including TPU updates and GPU options, plus a bevy of software tools to make it all work.
At the first in-person version of the event since before the pandemic, held in the massive Moscone Center in San Francisco, Google let loose details on its Cloud TPU v5e, the latest of its Tensor Processing Unit AI accelerators, plus virtual machine instances powered by Nvidia H100 GPUs.
TPUs are Google's custom silicon for accelerating machine learning, and the Cloud TPU service is built around the company's own TensorFlow machine learning framework in addition to other frameworks, including JAX and PyTorch.
Its previous AI chip, TPU v4, was officially released in 2021, though the search giant had been testing it for several years prior.
With Cloud TPU v5e, Google is claiming to have doubled the training performance per dollar and delivered 2.5 times the inference performance per dollar on large language models (LLMs) and generative AI, when compared with Cloud TPU v4.
The cloud giant uses TPU v4 engines to do inference for its own search engine and ad serving platforms.
Google will be offering eight different virtual machine configurations, ranging from one TPU chip to over 250 within a single slice.
It's not all about hardware, of course. Google is also targeting greater scalability for large AI workloads in Cloud TPU v5e with a feature called Multislice. Currently in preview, this has been developed to allow users to scale models beyond the confines of a single TPU pod to encompass tens of thousands of TPU chips, if necessary. Training jobs were previously limited to a single slice of TPU chips.
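Multislice extends the data-parallel pattern already used within a single slice. As a rough illustration only (not Google's implementation), here is a minimal JAX sketch of per-chip computation followed by an all-reduce across devices; on a machine without TPUs, JAX's CPU devices stand in for the chips:

```python
# Hypothetical sketch: shard a batch across the available accelerator
# devices, compute a partial sum on each, then all-reduce the result.
import jax
import jax.numpy as jnp

n_devices = jax.local_device_count()

# One leading-axis entry per device: pmap shards along this axis.
batch = jnp.arange(n_devices * 4, dtype=jnp.float32).reshape(n_devices, 4)

def per_device_sum(x):
    # Each device sums its own shard; psum all-reduces across devices,
    # so every device ends up holding the global sum.
    return jax.lax.psum(jnp.sum(x), axis_name="i")

summed = jax.pmap(per_device_sum, axis_name="i")(batch)
print(float(summed[0]))  # global sum, identical on every device
```

The same single-program pattern is what larger slices (and, with Multislice, multiple slices) scale up, with the collective running over many more chips.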
Also aimed at demanding AI workloads like LLMs are Google's A3 virtual machine instances, which pack eight Nvidia H100 GPUs, dual 4th Gen Intel Xeon Scalable processors, and 2TB of memory. These instances were first announced at Google I/O back in May, but are now set to be available next month, it said.
With improvements in networking bandwidth due to an offload network adapter and the Nvidia Collective Communications Library (NCCL), Google expects the A3 virtual machines will provide a boost for users looking to build ever more sophisticated AI models.
Cloud Next also yielded details of GKE Enterprise, described as a premium edition of the company's managed Google Kubernetes Engine (GKE) service for containerized workloads.
GKE Enterprise edition, to be available in preview from early September, sports a new multi-cluster capability that lets customers group similar workloads together as "fleets" and apply custom configurations and policy guardrails across the fleet, Google said.
This edition comes with managed security features including workload vulnerability insights, governance and policy controls, plus a managed service mesh. With capabilities drawn from Google's Anthos platform, the company claims that GKE Enterprise edition can span hybrid and multi-cloud scenarios to let users run container workloads on other public clouds and on-premises as well as on GKE.
In addition, GKE itself now supports both Cloud TPU v5e and the A3 virtual machine instances with H100 GPUs for demanding AI workloads, Google said.
Also continuing the AI theme, Google is bringing additions to its Google Distributed Cloud (GDC) offering, plus updated hardware to support the on-prem extension to its cloud platform.
The three new AI and data offerings are Vertex AI integrations, AlloyDB Omni, and Dataproc Spark. The Vertex integrations bring Vertex Prediction and Vertex Pipelines to GDC Hosted, although these will only be available in preview from Q2 2024.
AlloyDB Omni is a new PostgreSQL-compatible database engine, claimed to offer twice the speed of standard PostgreSQL for transactional workloads, and currently available in preview.
Dataproc Spark is a managed service for running analytics workloads on Apache Spark, claimed to offer users lower costs than deploying Spark themselves. It will be available in preview from Q4.
Finally, Google said it is introducing an updated hardware stack for GDC, featuring 4th Gen Intel Xeon Scalable processors and higher performance network fabrics with up to 400Gbps throughput.