Proteins have evolved to excel at everything from muscle contraction to digesting food to recognizing viruses. To design better proteins, including antibodies, scientists often iteratively change amino acids (the units arranged in sequence to make up proteins) in different positions until the resulting protein has an improved function, such as eliciting a stronger immune response or more efficiently capturing carbon dioxide from the atmosphere.
But there are more possible amino acid sequences than there are grains of sand in the world. And finding the best protein, and thus the best potential drug, is often expensive or impossible.
Stanford scientists have developed a new method based on machine learning to more quickly and accurately predict the molecular changes that will lead to better antibody drugs. Published in Science July 4, the approach combines the 3D structure of the protein chain with large language models based on the amino acid sequence, and allows researchers to find, in minutes, rare and desirable mutations that would otherwise only be found with exhaustive experiments.
Led by Peter S. Kim, professor of biochemistry and Sarafan ChEM-H Institute researcher, and Brian Hie, assistant professor of chemical engineering, the team showed they could improve a previously FDA-approved anti-SARS-CoV-2 antibody that was abandoned due to its ineffectiveness against a new strain in November 2022. Their approach yielded a 25-fold improvement against the virus.
“A lot of the work in artificial intelligence and drug development is about gathering tons of data about how a certain molecule does a certain task so that a computer can learn enough to design a better version,” Kim said. “What’s remarkable is that we’ve shown that structure can be used in place of a lot of that data, and the computer will continue to learn.”
“Now more antibodies have the opportunity to be optimized,” said Hie, who is also an innovation researcher at the Arc Institute.
Animation showing the 3D structure of an FDA-approved anti-SARS-CoV-2 antibody (green and orange) bound to a protein that appears on the surface of the virus (white). The new approach allowed the team to identify specific changes in the amino acids that make up the antibody (shown as blue and pink spheres) that made the antibody 25 times more effective against the virus. | Varun Shanker
Folded into shape
When faced with the challenge of finding the most effective amino acid sequence, scientists often make millions of candidate sequences and test them in miniaturized and simplified versions of biological systems. They hope that the best drug in a petri dish will also be the best drug in humans.
“It’s a lot of guesswork and verification,” Hie said. “The goal of many smart algorithms is to eliminate uncertainty.”
To speed up the process, scientists have developed ChatGPT-like machine learning algorithms that are trained on the amino acid sequences of millions of proteins to predict desirable mutations.
These models, however, often point scientists toward sequences that, when produced in the lab, are unstable or worse than where they started.
This is partly because the function of a protein depends not only on its sequence of amino acids, but also on the three-dimensional structure that the sequence folds into. For example, to trigger an immune response, antibodies must be shaped properly to bind to molecules on the surface of viruses.
The team reasoned that the key to developing a better prediction algorithm was structure. So they narrowed down the long list of possible beneficial mutations—determined by the large-scale sequence-based language model—to those that would preserve the 3D shape of the starting protein.
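The two-stage idea in the paragraph above can be illustrated with a minimal sketch. It is not the team's published code: the function names, the per-position log-probability tables, and the threshold are all hypothetical, and the article does not specify the actual scoring rule. The sketch assumes a sequence-based model and a structure-aware model each assign a log-probability to every amino acid at every position, so a mutation can be scored as the log-probability of the mutant residue minus that of the original residue.

```python
from typing import Dict, List, Tuple

# (zero-based position, original residue, mutant residue)
Mutation = Tuple[int, str, str]
# One dict per position: amino acid letter -> log-probability (toy values).
LogProbs = List[Dict[str, float]]


def propose_mutations(wild_type: str, seq_log_probs: LogProbs, top_k: int) -> List[Mutation]:
    """Rank single substitutions by a sequence model's log-likelihood ratio."""
    scored = []
    for pos, wt in enumerate(wild_type):
        for aa, log_p in seq_log_probs[pos].items():
            if aa != wt:
                scored.append((log_p - seq_log_probs[pos][wt], (pos, wt, aa)))
    # Highest-scoring substitutions first; keep only the top_k proposals.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [mutation for _, mutation in scored[:top_k]]


def keep_structure_compatible(
    mutations: List[Mutation], struct_log_probs: LogProbs, min_gain: float = 0.0
) -> List[Mutation]:
    """Keep proposals that a structure-aware model scores at least as well as the original."""
    kept = []
    for pos, wt, aa in mutations:
        gain = struct_log_probs[pos][aa] - struct_log_probs[pos][wt]
        if gain >= min_gain:
            kept.append((pos, wt, aa))
    return kept


def label(mutation: Mutation) -> str:
    """Format a mutation in the usual one-based notation, e.g. A1V."""
    pos, wt, aa = mutation
    return f"{wt}{pos + 1}{aa}"


# Toy two-residue "protein" with made-up scores (assumptions, not real model output).
wild_type = "AG"
seq_log_probs = [
    {"A": -1.0, "V": -0.5, "G": -3.0},
    {"G": -1.0, "S": -0.2, "P": -0.8},
]
struct_log_probs = [
    {"A": -1.0, "V": -0.7, "G": -2.0},
    {"G": -0.5, "S": -2.5, "P": -0.4},
]

proposals = propose_mutations(wild_type, seq_log_probs, top_k=3)
kept = keep_structure_compatible(proposals, struct_log_probs)
print([label(m) for m in proposals])
print([label(m) for m in kept])
```

In this toy data the sequence model's favorite substitution, G2S, is dropped because the structure-aware scores mark it as incompatible with the starting shape, while A1V and G2P survive the filter.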
Testing ground
In December 2022, the team tested their approach on a recently abandoned SARS-CoV-2 antibody treatment.
“The prevailing theory was that any attempt to improve this antibody would fail,” said Varun Shanker, a medical student with a degree in biophysics and the study’s lead author. “The virus was too smart. It evolved as it spread among millions of people to know exactly how to mutate to avoid these antibodies.”
Optimizing the protein with purely sequence-based models yielded only a modest, twofold improvement in effectiveness. But with their structure-guided approach, the team saw a 25-fold improvement.
“We were finally catching up with the virus,” said Shanker, who is also a member of the Chemistry/Biology Interface Training Program at Sarafan ChEM-H.
Teaching an old model new tricks
Most efforts to use AI to create better drugs rely on “training” or “supervising” the model, which involves generating massive amounts of data on the function and performance of single protein sequences. This approach is time-consuming and creates a model that is tailored to a specific protein performing a specific task.
The team’s model, by contrast, requires no information about what the protein does or how it does it, and no lab experiments. Because structure is tightly coupled with function, the protein’s coordinates become an indicator of performance. For the COVID antibody work, they constrained the model with the structure of the antibody on its own and with the structure of the antibody bound to the virus. From there, their model “learned” some rules for antibody binding without ever needing to be taught.
Early experiments show that the approach is generalizable to other types of proteins, such as enzymes, which help catalyze chemical reactions in our bodies. So far, the researchers have found that the model points scientists toward dozens of candidate proteins, and on average, half of them perform better than the starting point.
This tool could be useful for responding quickly to emerging or evolving diseases. It would also reduce barriers to making more effective drugs. More powerful drugs require smaller doses, meaning a given amount could benefit more patients. For infectious diseases like HIV, where studies have shown that large but infrequent doses of an antibody can protect patients from infection, this could be a game changer.
The team has made its model and code publicly available.
“This is an exciting example of how deep learning can democratize the process of making better proteins,” Shanker said. “Not only does this enable the development of new drugs, it opens up new areas of scientific exploration that were previously inaccessible.”