Can AI transformer models help design drugs and treat incurable diseases?
From protein prediction to drug generation, neural networks are revolutionizing medication
Special report AI can study chemical molecules in ways scientists can't comprehend, automatically predicting complex protein structures and designing new drugs, despite having no real understanding of science.
The power to design new drugs at scale is no longer limited to Big Pharma. Startups armed with the right algorithms, data, and compute can invent tens of thousands of molecules in just a few hours. New machine learning architectures, including transformers, are automating parts of the design process, helping scientists develop new drugs for difficult diseases like Alzheimer's, cancer, or rare genetic conditions.
In 2017, researchers at Google introduced the transformer, a neural network architecture that made it practical to build ever bigger and more powerful models. Today, transformer-based models are behind some of the largest AI systems and typically learn patterns from vast amounts of text. They're versatile and can process different forms of language, from code to ancient scripts scribbled down thousands of years ago.
These systems are also useful in biology, since proteins can be encoded as text too, Nadav Brandes, a postdoc studying bioinformatics at the University of California, San Francisco, told The Register. These complex molecules are made up of about 20 different amino acids, and each building block can be represented with a letter. Using this analogy, Brandes said, proteins can be thought of as words, and multiple proteins as sentences. But the vocabulary and grammar of these structures aren't comprehensible to humans the way natural language is.
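To make the analogy concrete, here's a toy sketch (our own illustration, not Brandes's code) of how a protein sequence over the 20-letter amino-acid alphabet would be turned into the integer tokens a transformer actually consumes:

```python
# Toy illustration: a protein sequence is just a string over a
# 20-letter amino-acid alphabet, so it can be tokenized like text.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one letter per standard amino acid
token_id = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode(protein: str) -> list[int]:
    """Map each amino-acid letter to an integer token ID."""
    return [token_id[aa] for aa in protein]

# A fragment of human insulin's A chain, encoded as token IDs
print(encode("GIVEQ"))  # [5, 7, 17, 3, 13]
```

From here, a transformer treats the token sequence no differently than it would a sentence.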
Transformers, however, are able to glean information from what appears to be gobbledygook. "Transformers can process all the molecules at different positions at the same time and capture relationships between them over longer distances," he said.
"They're also easier and more efficient to train, and we can scale them up to larger datasets."
When models are fed hundreds of thousands of protein sequences, they can do all sorts of things that would normally take scientists a long time, like mapping their shapes or predicting the effects of genetic mutations.
AlphaFold, developed by researchers at DeepMind, for example, learned how to plot the positions of amino acids in a protein it has never seen in just minutes or hours. Matching that speed is impossible for structural biologists; it often takes them years of extensive lab experiments to accurately map these jumbled, curly, ribbon-like shapes.
Knowing a protein's structure is critical for drug design. Scientists use this information to understand its function in the human body. Proteins interact with other molecules to perform vital tasks such as repairing cells or moving muscles. They have a unique binding site on their surface, where they can connect to other molecules and carry out their particular task.
Drugs are designed to latch onto these binding sites, preventing the sites from working with other molecules to carry out pathological functions that cause disease, like helping cancerous tumor cells grow.
AlphaFold has generated a wealth of information; its database is teeming with nearly a million squiggly protein structures found in all sorts of living organisms, from animals and insects to plants, bacteria, and viruses.
If its predictions are indeed accurate, scientists will have leapfrogged years of physical experiments needed to discover their structures. It also means they'll be able to invent new drugs to target diseases that weren't within reach before.
One weak hit
Researchers from Insilico Medicine, a startup founded in 2014 and based in Hong Kong and New York, are using AlphaFold to do just that. A team claimed it was the first "to identify a confirmed hit for a novel target in early drug discovery" using the predicted structure of a protein that hadn't been experimentally verified.
Starting with an AlphaFold-predicted structure of CDK20, a protein involved in cell growth, the team used machine learning to generate 8,918 candidate molecules for treating liver cancer.
Seven small molecules were synthesized and tested in lab experiments to see how strongly they interacted with the protein's binding site. One ended up being a "weak hit," Alex Zhavoronkov, Insilico's founder and CEO, told The Reg. "That means it looks somewhat promising, but it's not going to be a drug right away."
The whole process, from start to finish, took just 30 days.
It's too early to tell if AlphaFold was useful in this instance; Insilico will have to refine its search further, looking for more molecules that can lock onto the target more effectively. "The team is still working on this project; it's still in the hit identification process, and we will further progress it towards lead optimization, pre-clinical candidate, and hopefully clinical studies if everything goes smoothly. At the same time, more mechanistic studies for this target are ongoing," Zhavoronkov said.
Even if AlphaFold's predictions are accurate, they aren't always helpful to drug designers. They don't model how a protein's binding site changes shape when it interacts with a small-molecule candidate; that's something developers have to figure out on their own using complicated physics-based simulations.
Even if Insilico is still far from a viable new drug, the experiment showed AlphaFold's predictions can be used by drug companies. "It shows it's not just a prototype. It's production ready," Zhavoronkov added.
Scientists can start plugging protein structures predicted by DeepMind's software into their own machine learning models and start creating new molecules to target protein structures yet to be experimentally verified.
DeepMind's co-founder and CEO, Demis Hassabis, knew AlphaFold would be commercially valuable. Last year, he spun out a separate startup, Isomorphic Labs, to develop new drugs using AlphaFold's knowledge of protein folding.
"Isomorphic's mission could not be a more important one: to use AI to accelerate drug discovery, and ultimately, find cures for some of humanity's most devastating diseases," he said at the time.
AI-designed antibodies growing in bacteria
Transformers also have another trick up their sleeves: they can recognize and predict properties in data they haven't explicitly seen before. At Absci, a public drug and target discovery company founded in 2011, researchers are building a model to automatically predict whether an AI-generated antibody will be rejected by a patient's immune system without training on any clinical data.
Antibodies are a type of protein produced by our immune systems. They form naturally in our bodies to fight infections from foreign viruses or bacteria. Antibodies prevent us from getting sick by binding with enemy proteins, blocking them from infecting our cells. If AI can generate novel antibodies, scientists can develop new therapeutics and vaccines.
These made-up proteins, however, must be chemically stable and must be accepted by the body's immune system. If they're rejected, they risk being identified as foreign molecules themselves. They'll be attacked by the body's natural defense mechanisms which could lead to an adverse reaction, making them unsuitable as medical treatments.
"Biological drug discovery is hard; there are more antibody variants that are possible to create than there are atoms in the universe," Joshua Meier, Absci's lead AI scientist, told The Register.
"But only a small fraction are actually biologically viable."
Absci is building a transformer model to narrow that search down, selecting only the most promising variants that seem less likely to be rejected by the immune system.
"It sees hundreds of millions of antibody sequences. And then from there, we can present the model with a new antibody we're designing, and we can ask it, what is the natural order of this?" Meier explained.
The system calculates a "naturalness" score, comparing the structure of the artificially designed antibody to natural ones it has seen during training. Absci's training data includes antibodies from various organisms, from humans to llamas. If it appears natural, it'll probably be less likely to trigger the immune system's defenses, he reckons.
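Absci hasn't published its scoring code, but the idea of a "naturalness" score can be illustrated as the average log-likelihood of a candidate sequence under a model fitted to natural antibodies. In this sketch, a trivial letter-frequency model stands in for the transformer, and the tiny corpus is invented for illustration:

```python
# Hedged sketch: a simple letter-frequency model stands in for the
# trained transformer; the "corpus" of antibody fragments is invented.
from collections import Counter
import math

natural_corpus = ["QVQLVQSG", "EVQLVESG", "QVQLQESG"]  # toy antibody fragments

counts = Counter("".join(natural_corpus))
total = sum(counts.values())

def naturalness(seq: str) -> float:
    """Mean log-probability per residue; higher means more natural-looking."""
    return sum(math.log(counts.get(aa, 0.5) / total) for aa in seq) / len(seq)

# A sequence resembling the training data scores higher than a strange one
print(naturalness("QVQLVQSG") > naturalness("WWWWYYYY"))  # True
```

A real model captures long-range dependencies between residues rather than independent letter frequencies, but the scoring principle is the same: the closer a designed antibody sits to the distribution of natural ones, the higher its score.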
Transformers are also at play when Absci scientists want to see which antibodies will bind to their target proteins most strongly. Unlike small molecule drugs, Absci chemists don't synthesize the antibodies themselves. Instead, they're grown inside genetically engineered E. coli bacteria cells.
DNA contains blueprint instructions on how cells can produce proteins. The company works backwards, figuring out the corresponding DNA sequences for its antibody designs. These DNA sequences are then inserted into the bacteria so the cells will generate the antibody designed by its software.
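That working-backwards step, known as reverse translation, can be sketched as picking one DNA codon per amino acid. The codon choices below are a simplification for illustration; real pipelines optimize codon usage for the host organism:

```python
# Illustrative sketch of "working backwards" from a protein design to
# a DNA sequence that engineered E. coli could express. The codon
# table here is truncated to a few amino acids for brevity.
CODON = {"M": "ATG", "H": "CAT", "A": "GCT", "K": "AAA", "*": "TAA"}

def reverse_translate(protein: str) -> str:
    """Return one DNA sequence encoding the protein, plus a stop codon."""
    return "".join(CODON[aa] for aa in protein + "*")

print(reverse_translate("MHAK"))  # "ATGCATGCTAAATAA"
```

Because most amino acids are encoded by several codons, there are many valid DNA sequences for one antibody, which is part of what lets Absci give every bacterial cell a different variant.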
Biologists can then test these new strains and see if they're effective against a particular protein. "We do this on basically a cell-by-cell basis. Every cell is getting a different sequence of DNA corresponding to a different variant of the antibody, so every bacteria cell ends up producing a different antibody," Meier said.
The company declined to talk about potential antibody treatments under development, but told us it managed to generate candidates that may be more effective at treating breast cancer than existing treatments like Herceptin.
Give me the recipe
Training a neural network to generate drugs is easy. Trying to figure out how to make those molecules is hard. Pharmaceutical companies can't test each candidate; the lab experiments would be too time-consuming and expensive.
Besides, even if they know what ingredients go into cooking a particular drug, they don't always know its recipe. Structural chemists are called in to probe and adjust the structure of the computer-designed molecules, changing it to something they reckon can be synthesized.
Exscientia, a pharmaceutical company founded in 2012 and headquartered in the UK, is developing transformer models to automate this step. The goal is to have a working system capable of ingesting made-up molecules as input, and spitting out the chemical reactions needed to make the molecules as output.
"You need to know not just that a reaction is possible in theory. You also need to know, based on the whole molecule, what's the likelihood this is actually going to work? What's the possible yield?" Adrian Schreyer, the company's VP of AI technology, told The Register.
The process of working backwards, deconstructing molecules into their constituent building blocks, is known as retrosynthesis. Transformers are well suited to this task, said Ben Suutari, a senior AI research scientist. Given a sequence of chemical reactions, engineers can cover up various steps and ask the model to fill in the blanks.
A similar method is used to teach language models to autocomplete text. For example, in the sentence, "the cat sat on the __", the system is taught to assign a higher score to the word "mat" rather than, say, "hat". Instead of learning the order of words, however, Exscientia's retrosynthesis model learns the order of chemical reactions needed to make a finished molecule.
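The fill-in-the-blank objective can be sketched in miniature. A real transformer learns these scores from data; here a simple phrase count over an invented corpus stands in for the model's learned probabilities:

```python
# Toy version of the masked fill-in-the-blank objective described
# above: score candidate words for a blank by how often they complete
# the context in a tiny corpus. The corpus is invented for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat sat on the mat",
    "a hat on the shelf",
]

def score(context_before: str, candidate: str) -> int:
    """Count how often the context followed by the candidate appears."""
    phrase = f"{context_before} {candidate}"
    return sum(doc.count(phrase) for doc in corpus)

print(score("sat on the", "mat"))  # 3
print(score("sat on the", "hat"))  # 0
```

Swap words for reaction steps and the same mechanism lets Exscientia's model learn which step most plausibly fills a gap in a synthesis route.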
Another way to teach the algorithm is to scramble the answer and get it to reconstruct the data, we're told. It can then operate in reverse, dissecting a molecule to lay out a viable pathway to synthesizing drugs never created before.
Scientists and machines working together
The quest for the perfect AI-designed drug capable of treating or maybe even curing diseases is long, hard, and filled with all sorts of technical and regulatory roadblocks. Transformers are only a small part of a drug designer's toolkit; several other machine learning models are involved in the process too. Sometimes these models even compete with each other, generating batches of molecules with different structures.
Transformers don't always come out on top, Insilico's Zhavoronkov told us. The startup's generation system, Chemistry42, currently consists of 32 different models, and includes generative adversarial networks and evolutionary algorithms.
"You want to have many different approaches competing with each other. If you just use transformers, it's very rare that it will perform best in every application. So for every target, you want to have that diversity. Very often people think transformer networks are the answer to everything and they're never the answer to everything," Zhavoronkov said.
AI may help scientists bring new drugs to market at a faster rate, but the drug discovery process can't be completely automated. It requires careful cooperation between humans and machines. New molecules still need to be examined in lab experiments and tested on patients before they're considered safe. Companies like Insilico and Exscientia already have drug candidates designed by their proprietary AI software in clinical trials.
The day an AI-designed drug can be given to a real patient might not be so far away. But it's difficult to predict when that day will arrive. Even if these new drugs are granted approval from regulators, many of these companies still need to find buyers for their IP if they haven't already partnered with a bigger pharmaceutical company to manufacture and sell their products at scale.
Here the incentives become murkier; they'll have to convince Big Pharma to make their medication not just because it'll save lives but because it'll make money. ®