Amazon sticks AI inference chip up for rent in the cloud for machine-learning geeks

AWS subscribers, you can forget GPUs (unless you need to train your models)

re:Invent Amazon Web Services has made Inf1, its cloud instance for running machine-learning software on its custom-designed AI chip Inferentia, available to all.

AWS CEO Andy Jassy announced the service on Tuesday during the internet goliath's annual re:Invent conference in Las Vegas. Inf1 is available as an EC2 instance, giving developers access to a virtual machine that contains up to 16 AWS Inferentia chips capable of delivering up to 2,000 TOPS (trillion operations per second). These are paired with Intel’s second-generation Xeon Scalable microprocessors for more grunt as needed.

As the name suggests, the AWS Inferentia is an accelerator chip optimized for running inference workloads, in which a machine-learning model that has already been trained performs a specific task, such as image classification or face and speech recognition. AWS Inferentia was first teased last year. Now, engineers can finally use the hardware via Amazon's cloud.

Inf1 instances are faster and cheaper, according to Jassy. They can deliver up to 3x higher throughput and slash costs by up to 40 per cent per inference compared to Amazon’s G4 instances that use Nvidia’s T4 GPUs with AWS custom Intel Cascade Lake CPUs.

AWS Inferentia specs are a little patchy, but here’s what we know so far. Each chip delivers up to 128 TOPS, supports data represented in FP16, BF16, and INT8 types, and can handle a range of machine-learning frameworks such as TensorFlow, PyTorch, and MXNet. If developers need to train a model, they’ll need to spin up a separate instance on AWS that employs GPUs: don't forget, the Inferentia is aimed at inference.
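For those who like to check the sums, here is a minimal Python sketch relating the two quoted figures: 128 TOPS per chip and up to 16 chips per Inf1 instance. It assumes peak throughput simply adds up across chips, which is our assumption rather than anything AWS has stated, and the function name is ours for illustration.

```python
# Figures quoted in the article.
CHIPS_PER_INSTANCE = 16   # maximum Inferentia chips in one Inf1 instance
TOPS_PER_CHIP = 128       # peak per-chip figure


def peak_instance_tops(chips: int = CHIPS_PER_INSTANCE,
                       tops_per_chip: int = TOPS_PER_CHIP) -> int:
    """Return aggregate peak TOPS, assuming linear scaling across chips.

    Linear scaling is an illustrative assumption, not an AWS statement.
    """
    return chips * tops_per_chip


if __name__ == "__main__":
    print(peak_instance_tops())  # 16 * 128 = 2048
```

Under that assumption the product is 2,048, which is consistent with the rounded "up to 2,000 TOPS" headline number for the largest instance.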

The trained model can then be deployed on Inf1 using AWS Neuron, a package of software tools that compiles the model and optimizes it to run on Inferentia hardware.

In related AWS hardware news, Amazon teased the Graviton2, a 7nm CPU built with 64 64-bit Arm Neoverse N1 cores. It's a sneak preview of what's to come, as the chip is not yet generally available in a cloud instance. We also covered Jassy’s whole keynote in more detail here. ®
