Hype versus reality: What you can't do with DeepMind's AlphaFold in drug discovery
Protein prediction it can do, the next steps not so much – not that it was trained for this
Analysis DeepMind's AlphaFold model has predicted the structures of nearly all proteins known to science, though its ability to help scientists discover new drugs remains unproven.
Proteins are complex molecules created by organisms to carry out the biological functions necessary for life. Generally built from chains of amino acids drawn from 20 standard types, they fold up in countless ways, with their final shape determining how they work and interact with other things.
Determining how a protein will fold is not straightforward. For example, let's say you wanted to synthesize a protein or slightly alter its operation. You can't adjust its amino acids, or come up with a new string of them, and know for sure how the result will fold and function. This is where computers come in.
Advances in AI algorithms and training have led to the development of software, such as AlphaFold, that can accurately predict the 3D shapes of proteins given their amino acid sequences.
AlphaFold is impressive, and has now predicted the structures of more than 200 million proteins from their amino acid strings. Researchers hoped that building such a large database would allow scientists to develop treatments targeting specific proteins associated with diseases such as cancer or dementia. Coming up with such medicines may require you to know the physical structure of the protein, which is where programs like AlphaFold can be used.
An investigation led by academics at MIT in America, however, shows just how difficult the task is in practice. Essentially, the AI software is useful in one step of the process – structure prediction – but can't help in other stages, such as modeling how drugs and proteins would physically interact.
"Breakthroughs such as AlphaFold are expanding the possibilities for in silico (computer simulation) drug discovery efforts, but these developments need to be coupled with additional advances in other aspects of modeling that are part of drug discovery efforts," James Collins, lead author of the study published in Molecular Systems Biology and a bioengineering professor at MIT, said in a statement.
"Our study speaks to both the current abilities and the current limitations of computational platforms for drug discovery."
Collins and his colleagues used AlphaFold's predicted structures to simulate interactions between bacterial proteins and antibacterial compounds, a task known as molecular docking. The goal was to rank the candidate compounds by how strongly they bind to the target protein. A molecule that binds strongly to a protein is more likely to be an effective drug; it could be better at preventing the protein from carrying out a pathogenic function, such as tumor growth.
The team tested AlphaFold's ability to model interactions between 296 essential proteins from E. coli bacteria and 218 antibacterial compounds, including antibiotics such as tetracyclines. The resulting molecular docking simulations were not very accurate.
"Utilizing these standard molecular docking simulations, we obtained an auROC value of roughly 0.5, which basically says you're doing no better than if you were randomly guessing," Collins said.
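To see why an auROC of roughly 0.5 amounts to guessing, it helps to read the metric as the probability that a randomly chosen true binder is scored above a randomly chosen non-binder. The sketch below is a minimal illustration on made-up labels and scores, not the study's data or code; the `auroc` helper, the class sizes, and the convention that a higher score means stronger predicted binding are all assumptions for the example.

```python
import random


def auroc(scores, labels):
    """Fraction of (binder, non-binder) pairs in which the binder
    gets the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (assumption): 200 true binders among 2,000 protein-compound pairs.
labels = [1] * 200 + [0] * 1800

# A "model" that ignores the input and scores every pair at random.
rng = random.Random(0)
random_scores = [rng.random() for _ in labels]

# An idealized model that scores every binder above every non-binder.
perfect_scores = [float(y) for y in labels]

random_auroc = auroc(random_scores, labels)
perfect_auroc = auroc(perfect_scores, labels)

print(f"random scoring:  auROC = {random_auroc:.3f}")
print(f"perfect scoring: auROC = {perfect_auroc:.3f}")
```

Random scores land close to 0.5 because a binder beats a non-binder about half the time by chance, while a perfect ranking reaches 1.0.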
Not the smartest AI on the block
Other machine learning models were more accurate than AlphaFold for some simulations, according to Felix Wong, co-author of the paper and a postdoctoral researcher at MIT.
"The machine-learning models learn not just the shapes, but also chemical and physical properties of the known interactions, and then use that information to reassess the docking predictions," he said. "We found that if you were to filter the interactions using those additional models, you can get a higher ratio of true positives to false positives."
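The filtering idea Wong describes can be sketched in a few lines: keep only the docking hits that a second model also rates as likely binders, then compare the ratio of true positives to false positives before and after. Everything below is hypothetical, including the `Pair` record, the scores, the thresholds, and the ground-truth labels; the numbers are invented for illustration and this is not the paper's pipeline.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    protein: str
    compound: str
    docking_score: float    # higher = stronger predicted binding (assumption)
    rescoring_score: float  # output of a second, hypothetical ML model
    truly_binds: bool       # ground truth, known only in a benchmark


def tp_fp_ratio(predicted):
    """Ratio of true positives to false positives among predicted binders."""
    tp = sum(1 for p in predicted if p.truly_binds)
    fp = len(predicted) - tp
    return float("inf") if fp == 0 else tp / fp


# Invented toy data, for illustration only.
pairs = [
    Pair("p1", "c1", 0.90, 0.8, True),
    Pair("p1", "c2", 0.80, 0.2, False),
    Pair("p2", "c1", 0.70, 0.9, True),
    Pair("p2", "c3", 0.85, 0.1, False),
    Pair("p3", "c2", 0.75, 0.3, False),
    Pair("p3", "c4", 0.95, 0.7, False),
    Pair("p4", "c1", 0.30, 0.9, True),
    Pair("p4", "c3", 0.20, 0.1, False),
]

# Step 1: docking alone calls a pair a hit above a chosen threshold.
docking_hits = [p for p in pairs if p.docking_score >= 0.5]

# Step 2: keep only the hits the second model also considers likely binders.
filtered_hits = [p for p in docking_hits if p.rescoring_score >= 0.5]

ratio_before = tp_fp_ratio(docking_hits)
ratio_after = tp_fp_ratio(filtered_hits)

print(f"docking only:    TP/FP = {ratio_before}")
print(f"after filtering: TP/FP = {ratio_after}")
```

In this toy set the filter removes three of the four false positives while keeping both true positives that docking found. It cannot recover the binder that docking missed in the first place (`p4`/`c1`), which is one trade-off of filtering rather than rescoring every pair.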
Derek Lowe, a longtime drug discovery chemist and science writer, told The Register he wasn't surprised by the results given that AlphaFold was not really trained for molecular docking simulations. "Docking small molecules into a given protein structure is really a different problem than determining that protein structure in the first place," he said.
Being able to model these types of chemical interactions is an unsolved problem. No algorithm is perfect. Even if scientists have a good model of the protein, its shape changes in mysterious ways when it interacts with a potential drug candidate.
"Virtual screening has never yet reached the 'works every time' level - sometimes it provides useful information and sometimes it doesn't, and you are never sure up front which of those regimes you're working in. Added to that is the way that different docking software will give you different answers, and for any given target one of them might give notably more useful answers than another - but again, you don't know up front which of those it'll be," Lowe said.
"Even with perfect protein structures, some of them are going to be better 'fits' for a docking-and-scoring approach than others, and AlphaFold structures, while impressive, are not perfect, either. But to me, this isn't so much on AlphaFold as it is on docking technology."
AlphaFold may prove useful for other parts of the drug discovery pipeline, where comparing protein structures obtained via different methods against the model's predictions is valuable.
"The biggest problems in drug discovery are the ones that contribute to our roughly 85 percent failure rate in the clinic. And those are picking the right targets and getting early warnings about toxicity. Neither of those are helped much at all by knowing protein structures," Lowe added. ®
Updated to add
We had some questions for Felix Wong and Aarti Krishnan, the lead authors of the MIT study. They got back to us with these answers:
The Reg: Why does AlphaFold struggle to simulate docking accurately? Was it trained for this task?
MIT team: "AlphaFold was trained to predict the three-dimensional structures of proteins from amino acid sequences; these structures can then be used in downstream applications, including molecular docking to predict drug-protein interactions. While this is one of AlphaFold's most anticipated use cases, AlphaFold can likely be improved to better facilitate this. For instance, AlphaFold predicts only static and rigid protein structures that are stuck in time, and it's possible that knowing the dynamical and disordered properties of these structures can better enable us to predict drug binding."
What types of software do practitioners in the drug industry use to simulate molecular docking? Why is this task difficult, and how does AlphaFold compare?
"Molecular docking has evolved and improved over the past 40 years, and nowadays open-source (eg, AutoDock Vina and DOCK6) and proprietary (eg, Schrödinger) software are commonly used. Predicting drug binding is probably one of the most difficult tasks in biology: these are many-atom interactions between complex molecules with many potential conformations, and the aim of docking is to pinpoint just one of them.
"We've all heard the analogy of a needle in a haystack, but this is even more challenging because conformational space is huge. Even the first step of guessing the general region of the protein to look at (the binding pocket) is difficult, since having just a 3D toy model doesn't tell you how a protein functions.
"One of AlphaFold's main contributions thus far has been to provide a comprehensive resource of predicted protein structures that we can now use for docking. These predictions complement all the experimental structures that we already had, because there were some structures that we didn't have and we can now go through the structures more holistically.
"When we were performing our benchmarking analyses using 12 well-characterized E. coli proteins, we found that the ability of molecular docking to accurately predict drug-protein interactions was similar when using AlphaFold as opposed to experimentally-determined structures. This strongly suggests that the bottleneck is not the quality of AlphaFold-predicted structures, but rather the molecular docking approach and our current, limited ways of harnessing structural information to accurately predict drug-protein interactions."
Why is it important to measure the binding between a protein and molecule?
"The binding between a protein and a molecule underlies how many drugs, including antibiotics, work. Most antibiotics, like penicillin, are simply small molecules that bind specifically to bacterial proteins.
"By binding to their protein targets, these drugs can interfere with the normal functions of proteins in many ways, including competing against physiological substrates and inducing protein conformational changes that render proteins inactive. For antibiotics, we want these proteins to be needed for the cell to survive, so that the drugs targeting these proteins would lead to bacterial death.
"This paradigm works similarly for anti-cancer and anti-viral drugs, and there are also cases where inhibiting the activity of some protein might be beneficial to a cell.
"In general, being able to measure the binding between a protein and a molecule tells you about how a drug works and is a critical part of any drug development process. Many cases in which a drug succeeds or fails can be informed by knowing the protein target (or targets). A common reason for drugs failing is that they turn out to have multiple targets, and this promiscuity is often associated with drug side-effects."