Scaleway Ampere servers promise AI smarts without breaking the bank
We'll see when they reveal the prices
Ampere Computing and French cloud operator Scaleway are pushing Arm-based servers as a more cost-efficient way of operating AI-based services, especially when it comes to inferencing.
At the ai-PULSE conference in Paris, the two companies announced availability of cost-optimized COP-Arm instances operated by Scaleway using servers based on Ampere's Altra family of Arm-based datacenter processors.
According to Scaleway, the COP-Arm instances are "tailored to meet the demands of AI-driven applications" such as chatbots, real-time analytics, or video content analysis, which it claims can be delivered at a fraction of the cost of other solutions.
"With Ampere Altra Processors, we are offering businesses a powerful and cost-effective alternative, enabling them to achieve high-performance results in the most efficient and sustainable way possible," Scaleway CEO Damien Lucas said in a statement.
But that remains to be seen as the company has yet to disclose pricing.
The craze for generative AI has seen a rise in demand for more powerful servers stuffed with top-end GPUs, which is also leading to greater power consumption and higher prices for anyone wanting to access GPU-accelerated instances.
But while this kind of infrastructure is useful for training models, it isn't required for inference, when the resulting models are put to work, according to Ampere's chief product officer, Jeff Wittich.
"Often when we talk about AI, we forget that AI training and inference are two different tasks," Wittich said. Training is a "one-off, gigantic task that takes a long time," he explained, so while accelerators can make a lot of sense for training, inference workloads don't need to be done on supercomputing hardware.
"In fact, general-purpose CPUs are good at inference, and they always have been," Wittich said. "Inference is your scale model that you're running all the time, so efficiency is more important here."
Wittich claimed that running OpenAI's generative speech recognition model, Whisper, on Ampere's 128-core Altra Max processor consumes 3.6 times less power per inference than running the same workload on an Nvidia A10 Tensor Core GPU.
However, potential customers will not be able to judge the cost-efficiency claim until Scaleway details the configuration of its COP-Arm instances and how much it will charge for them. We have inquired, but the details have not yet been made public.
Ampere pointed us to a quote from an early customer, Paris-based Lampi.ai, which develops an AI copilot platform for enterprises. Co-founder and CEO Guillaume Couturas claimed that using COP-Arm provides 10 times the speed at a tenth of the cost of "competitors in x86 architectures."
Last month, Ampere launched the AI Platform Alliance, bringing together a number of chip startups to promote an open AI ecosystem to take on the likes of Nvidia.
The company also unveiled a more powerful processor this year, the 192-core AmpereOne. This is said to consume more power than the Altra family, at around 1.8 watts per core compared with between 1.25 and 1.4 watts per core for the chips going into Scaleway's COP-Arm instances. ®