Amazon sticks AI inference chip up for rent in the cloud for machine-learning geeks

AWS subscribers, you can forget GPUs (unless you need to train your models)

re:Invent Amazon Web Services has made Inf1, its cloud instance for running machine-learning software on its custom-designed AI chip Inferentia, available to all.

AWS CEO Andy Jassy announced the service on Tuesday during the internet goliath's annual re:Invent conference in Las Vegas. Inf1 is available as an EC2 instance, giving developers access to a virtual machine that contains up to 16 AWS Inferentia chips capable of up to 2,000 tera-operations per second (TOPS). These are paired with Intel’s second-generation Xeon Scalable microprocessors for more grunt as needed.

As the name suggests, the AWS Inferentia is an accelerator chip optimized for inference workloads: running a machine-learning model that has already been trained to perform a specific task, such as image classification or face and speech recognition. Inferentia was first teased last year. Now, engineers can finally use the hardware via Amazon's cloud service.

Inf1 instances are faster and cheaper, according to Jassy. They can deliver up to 3x higher throughput, and cut costs by up to 40 per cent per inference, compared to Amazon’s G4 instances, which pair Nvidia’s T4 GPUs with custom Intel Cascade Lake CPUs.
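To see how those two claims fit together, here's a rough back-of-the-envelope sketch. The hourly prices and the baseline throughput below are illustrative placeholders, not AWS list prices; the point is only how higher throughput and hourly price combine into cost per inference:

```python
# Illustrative sketch of how "3x throughput" and "40% lower cost per
# inference" relate. Prices and throughputs are placeholder numbers,
# NOT AWS list prices.

def cost_per_million_inferences(hourly_price_usd, inferences_per_second):
    """Cost to run one million inferences at a steady rate."""
    seconds_needed = 1_000_000 / inferences_per_second
    return hourly_price_usd * seconds_needed / 3600

g4_price, g4_throughput = 1.00, 1_000        # hypothetical G4 baseline
inf1_price, inf1_throughput = 1.80, 3_000    # 3x the throughput

g4_cost = cost_per_million_inferences(g4_price, g4_throughput)
inf1_cost = cost_per_million_inferences(inf1_price, inf1_throughput)

print(f"G4:   ${g4_cost:.2f} per million inferences")
print(f"Inf1: ${inf1_cost:.2f} per million inferences")
print(f"Saving per inference: {1 - inf1_cost / g4_cost:.0%}")
```

With these made-up numbers, an Inf1 instance could charge 80 per cent more per hour and still come out 40 per cent cheaper per inference, because each hour does three times the work.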

AWS Inferentia specs are a little patchy, but here’s what we know so far. Each chip delivers up to 128 TOPS, supports data represented in FP16, BF16, and INT8 types, and can handle a range of machine-learning frameworks, including TensorFlow, PyTorch, and MXNet. If developers need to train a model, they’ll have to spin up a separate AWS instance that employs GPUs: don't forget, the Inferentia is aimed at inference.
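The per-chip and per-instance figures line up, by the way: 16 chips at up to 128 TOPS apiece works out to 2,048 TOPS, which matches the quoted "up to 2,000 TOPS" ceiling for the largest Inf1 instance:

```python
# Sanity check on the headline Inf1 figure:
# up to 16 Inferentia chips per instance, up to 128 TOPS per chip.
chips_per_instance = 16
tops_per_chip = 128

instance_tops = chips_per_instance * tops_per_chip
print(instance_tops)  # 2048 — the "up to 2,000 TOPS" headline figure
```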

The trained model can then be deployed on Inf1 using AWS Neuron, a software development kit that compiles trained models and optimizes them to run on Inferentia hardware.
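For a flavour of what that workflow looks like, here's a hedged sketch of the compile step for a PyTorch model, based on the torch-neuron package AWS ships with the Neuron SDK (the package and model names below are assumptions from AWS's documentation, and the snippet falls back gracefully on a machine without the SDK installed):

```python
# Sketch of compiling a PyTorch model for Inferentia with AWS Neuron.
# Assumes the torch-neuron package from the AWS Neuron SDK; on a box
# without it, the snippet just reports that and exits.
try:
    import torch
    import torch_neuron  # noqa: F401 -- AWS Neuron plug-in for PyTorch
    HAVE_NEURON = True
except ImportError:
    HAVE_NEURON = False

if HAVE_NEURON:
    from torchvision import models
    model = models.resnet50(pretrained=True).eval()
    example = torch.zeros([1, 3, 224, 224])  # dummy input for tracing
    # trace() compiles the model for Inferentia via the Neuron compiler
    model_neuron = torch.neuron.trace(model, example_inputs=[example])
    model_neuron.save("resnet50_neuron.pt")  # load this on an Inf1 instance
else:
    print("Neuron SDK not installed; run this on a Neuron-enabled host")
```

The saved artifact is then loaded and served from an Inf1 instance like any TorchScript model.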

In related AWS hardware news, Amazon teased the Graviton2, a 7nm CPU built with 64 64-bit Arm Neoverse N1 cores. It's a sneak preview of what's to come: instances powered by the chip are not yet generally available. We also covered Jassy’s whole keynote in more detail here. ®
