Google throws down gauntlet with first compute instances powered by AmpereOne chips
Though this is still a preview, so another provider could yet beat Google to public availability
Interview Google looks set to become the first cloud provider to offer virtual machine instances powered by Ampere's 192-core AmpereOne datacenter chip, which Ampere is now pitching as a solution for AI inferencing workloads.
Ampere launched its latest Arm-based datacenter processor back in May, and since then various cloud providers have been building out infrastructure based on it, according to Ampere. Google, however, is the first to announce AmpereOne-powered compute-optimized C3A instances for public access.
That said, the announcement at Google Cloud Next is for a private preview starting next month, so another provider could conceivably pip Google to actual public availability if it moves quickly.
Google said that the C3A instances will have from 1 to 80 vCPUs with DDR5 memory, local SSD, and up to 100 Gbps networking, and deliver better price-performance than comparable x86-based virtual machines.
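For developers wondering what spinning one up might look like once the preview opens, here is a minimal sketch using Google's google-cloud-compute Python client library. The c3a-standard-8 machine type name and the Arm64 image family are our assumptions for illustration, not names Google has confirmed for the preview.

```python
# Sketch: creating a C3A VM with Google's Python client for the Compute Engine API.
# Assumptions: the "c3a-standard-8" machine type and the Arm64 Debian image family
# are placeholders - actual names may differ while C3A is in private preview.
from google.cloud import compute_v1


def create_c3a_instance(project_id: str, zone: str, name: str):
    client = compute_v1.InstancesClient()

    # Boot disk built from a public Arm64 image (assumed family name)
    boot_disk = compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-12-arm64",
            disk_size_gb=20,
        ),
    )

    # Default VPC network, ephemeral internal IP
    nic = compute_v1.NetworkInterface(network="global/networks/default")

    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{zone}/machineTypes/c3a-standard-8",  # assumed name
        disks=[boot_disk],
        network_interfaces=[nic],
    )

    # Returns an operation that can be polled until the VM is ready
    return client.insert(project=project_id, zone=zone, instance_resource=instance)
```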
"C3A instances are powered by AmpereOne, so this is pretty significant for us because this is the first time that somebody is making publicly available AmpereOne to a bunch of end users," Ampere chief product officer Jeff Wittich told us.
"Obviously we've been shipping for production for a couple months now," Wittich added. "They've been going into datacenters to build out capacity for going public, but Google will be the first of the clouds that are making announcements. We'll see some other clouds follow pretty quickly behind and then we'll see the big parade of ODMs and OEMs."
Cloud providers are Ampere's target market, so its chips are designed around their requirements: large numbers of single-threaded cores optimized to run many workloads in parallel with predictable performance.
Cloud-native workloads that will be well suited for Google's C3A instances are said to include containerized microservices, web serving, high-performance databases, media transcoding, large-scale Java applications, cloud gaming, and high-performance computing (HPC).
However, with AI still the hot topic of the moment, Ampere is keen to promote the suitability of its chips for processing AI workloads, or the inferencing part at least.
In fact, Ampere claims that its many-core chips are the optimal solution for AI inferencing, and has published a white paper and blog post on the topic. It all comes down to "right-sizing", or carefully matching the compute resources to the demands of AI applications, according to the company.
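As a rough illustration of what right-sizing means in practice, the toy calculation below sizes a fleet for a given request rate; the traffic and per-request figures are hypothetical placeholders, not numbers from Ampere's white paper.

```python
import math

# Toy "right-sizing" calculation: how many CPU cores does a given inference
# workload actually need? All figures are hypothetical placeholders.

def cores_needed(requests_per_second: float, cpu_seconds_per_request: float) -> int:
    """Cores required to sustain the target request rate, one request per core."""
    return math.ceil(requests_per_second * cpu_seconds_per_request)

# Example: 2,000 requests/s, each needing 40 ms of CPU time on a single core
print(cores_needed(2_000, 0.040))  # -> 80 cores, under half of one 192-core AmpereOne
```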
"Everyone's been really focused on AI training and getting these massive large language models (LLMs) trained, and to do that you do almost need a supercomputer to go and plow through it because the models are huge," said Wittich.
- Google sharpens AI toolset with new chips, GPUs, more at Cloud Next
- Microsoft still prohibits Google or Alibaba from running O365 Windows Apps
- A closer look at Harvard and Google's HPC heart research project
- Google teases Project IDX, an AI-infused code editing thing
"The problem is that once the model is trained, now you've got to actually run the model and inferencing can be as much as 10 times more compute capacity as the training stage actually was," he claimed.
Inferencing 'considerably less computationally demanding' ... but the scale you need is key – analyst
Can this be correct? The conventional wisdom is that training requires a huge amount of resources such as costly GPUs to crunch through the data, whereas inferencing is supposed to be much less demanding, so we asked an expert.
"Inferencing is considerably less computationally demanding. However, in a lot of use cases, it's necessary to do it at much greater scale than training," Omdia's Alexander Harrowell, Principal Analyst in Advanced Computing for AI, told us.
"The whole idea is that you train the model once and then use it for however many inferences you need. Our survey research puts the multiplier from training to inference at 4-5. But if your workload is something like the YouTube recommendation engine, you can see how that would be quite the compute demand even if the model was a small one."
Harrowell told us that the problem with using top-end GPUs for inferencing is not so much that they fall short, but that they can be overkill and excessively expensive, which is why the idea of specialized inference accelerators is attractive.
If you are thinking in terms of compute across an entire inference server fleet – which Ampere's cloud customers are – then it may well be right that a CPU is the optimal solution, he added.
Ampere's claim is that its many-core processors scale better than rivals, and it says they offer a notable advantage in energy efficiency, although it doesn't quantify this.
The latter would be a key distinction, because in benchmark charts shown to us by Ampere, its existing Altra Max 128-core chip is beaten in inferencing performance by AMD's 96-core 4th Gen Epyc chips, but offers better performance per watt and per dollar, Ampere claims.
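To see how a chip can lose a raw throughput benchmark yet still come out ahead on those metrics, consider the made-up comparison below; the figures are invented for illustration and are not Ampere's or AMD's benchmark data.

```python
# Invented figures: a chip with lower raw throughput can still win on
# performance per watt and per dollar if it draws less power and costs less.
chips = {
    "chip_a": {"throughput": 100.0, "watts": 400.0, "price_usd": 12_000.0},
    "chip_b": {"throughput":  85.0, "watts": 250.0, "price_usd":  6_000.0},
}

for name, c in chips.items():
    perf_per_watt = c["throughput"] / c["watts"]
    perf_per_kusd = c["throughput"] / c["price_usd"] * 1_000
    print(f"{name}: {perf_per_watt:.2f} perf/W, {perf_per_kusd:.2f} perf per $1k")
```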
The company's white paper claims that Ampere CPUs are "the best choice for AI workloads" because they deliver "the best performance, cost-effectiveness, and power efficiency when compared to any other CPU or GPU."
Those are strong claims, which will no doubt be put to the test once the AmpereOne virtual machine instances are available for developers to get to grips with. ®