Dropping Nvidia for Amazon's custom chips helped gene therapy startup Metagenomi cut AI bill 56%

It's not the size of your accelerator, it's how you use it

Gene editing startup Metagenomi has tapped AWS's Inferentia 2 accelerators to speed the discovery of potentially life-saving therapies, and says the work cost 56 percent less than it would have using Nvidia GPUs.

Founded in 2018, Metagenomi is using a Nobel Prize-winning approach developed by Jennifer Doudna and Emmanuelle Charpentier called CRISPR, which allows for the targeted editing of gene sequences.

"Gene editing is a new therapeutic modality aimed at treating disease by addressing the cause of disease at the genetic level. So rather than treating the symptoms, actually going after a cure," Chris Brown, VP of discovery at Metagenomi, told El Reg.

These therapies rely on identifying enzymes – essentially biological catalysts that facilitate chemical reactions – that can bind to the RNA sequences that guide them to their destination, cut the target DNA in the right spot, and – critically – fit in the delivery mechanism of choice.

To find these enzymes, the startup is using a class of generative AI known as protein language models (PLMs), like Progen2, to rapidly generate millions of potential candidates.

"It's about finding that one thing in a million. So if you've got access to twice as many, you're doubling your chances of potentially getting a product at the end," Brown said.

Developed by researchers at Salesforce, Johns Hopkins, and Columbia Universities in 2022, Progen2 is an auto-regressive transformer model not unlike GPT-2. But rather than spitting out strings of text, it synthesizes novel protein sequences.
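Conceptually, an autoregressive protein language model samples amino acids one residue at a time, each draw conditioned on everything generated so far, just as GPT-style models sample tokens of text. Here is a minimal toy sketch of that loop; the uniform probabilities are a stand-in for the learned distribution a real model like Progen2 would produce, not its actual API:

```python
import random

# The 20 standard amino acids, one letter each
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def next_residue_probs(prefix):
    """Stand-in for the model: a real PLM would return learned
    probabilities conditioned on the prefix. Uniform here,
    purely for illustration."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def generate_sequence(length, seed=None):
    """Autoregressively sample a protein sequence one residue at
    a time, each draw conditioned on the sequence so far."""
    rng = random.Random(seed)
    seq = []
    for _ in range(length):
        probs = next_residue_probs("".join(seq))
        residues, weights = zip(*probs.items())
        seq.append(rng.choices(residues, weights=weights)[0])
    return "".join(seq)

# Generate one candidate; in practice millions are sampled and
# then filtered for the enzymes that actually do the job.
candidate = generate_sequence(120, seed=42)
print(candidate)
```

In the real pipeline the interesting part is the filtering that follows generation; the sampling loop itself is this simple.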

Weighing in at about 800 million parameters for the base model, Progen2 is tiny compared to modern large language models like GPT-4 or DeepSeek R1, which means running it doesn't require massive quantities of high-bandwidth memory. For the trial, Metagenomi compared AWS's Inferentia 2 accelerator with Nvidia's L40S, which the biotech startup had previously been using to run Progen2.

Launched in 2023, Inferentia 2 is (as its name suggests) an inference-optimized accelerator, with 32GB of HBM, 820 GB/s of memory bandwidth, and 190 teraFLOPS of 16-bit performance.

By comparison, the L40S, based on Nvidia's previous-gen Ada Lovelace GPU architecture, features 48GB of GDDR6 good for 864 GB/s of memory bandwidth and 362 teraFLOPS at 16-bit precision.

But while the L40S outperforms Inferentia 2 on paper, Amazon claims its chip can do the job cheaper by taking advantage of its batch processing service, AWS Batch, and spot instances.

"Spot Instances are generally 70-ish percent lower cost than on demand. Because the workflows that they were optimizing for could be scheduled around spot Instances utilizing AWS Batch, it really simplified these deployments ... and allowed them to schedule different types of experimentation to run around the clock," Kamran Khan, head of business development for the machine learning wing of AWS's Annapurna Labs team, told The Register.

Metagenomi and AWS found that a combination of spot instances, clever batching, and cheap chips could cut Metagenomi's operating costs by up to 56 percent.

A chunk of the savings from using Inferentia came from greater availability. The cloud giant says the interruption rate for its homegrown chip is roughly five percent compared to 20 percent for Nvidia's L40S-based spot instances. In theory, this means that only one in 20 of Metagenomi's protein generation batches should be interrupted, versus one in five for Nvidia's accelerator.
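The economics can be sketched with a back-of-envelope calculation using the figures in this story: a roughly 70 percent spot discount, and interruption rates of about five percent for Inferentia 2 versus 20 percent for L40S spot instances. The assumption that an interrupted batch is rerun from scratch is our simplification, not something the article states:

```python
def effective_cost(on_demand_rate, spot_discount, interrupt_rate):
    """Back-of-envelope effective cost per completed batch.
    Simplifying assumption (ours, not the article's): an
    interrupted batch is rerun from scratch, so the expected
    number of runs per completed batch is 1 / (1 - p)."""
    spot_rate = on_demand_rate * (1 - spot_discount)
    expected_runs = 1 / (1 - interrupt_rate)
    return spot_rate * expected_runs

# Figures from the article: ~70% spot discount; ~5% interruption
# rate for Inferentia 2 vs ~20% for L40S spot instances.
# Rates are normalized to a notional on-demand price of 1.0.
inf2 = effective_cost(1.0, 0.70, 0.05)
l40s = effective_cost(1.0, 0.70, 0.20)
print(f"Inferentia 2: {inf2:.3f}x on-demand, L40S: {l40s:.3f}x")
```

Under these assumptions the lower interruption rate alone shaves several points off the effective cost, before any difference in the per-hour price of the two chips is counted.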

For Brown, Inferentia's lower operating cost translates directly into more science, increasing the likelihood of discovering enzymes capable of targeting different ailments.

"We took a problem where it would have been one project for the year, and instead we turned it into something that my team can do multiple times a day or a week," Brown said.

The collab also highlights that for AI workloads that aren't interactive, faster hardware isn't always better – older, heavily discounted accelerators may offer better value. ®
