A team of researchers from the Pritzker School of Molecular Engineering (PME) at the University of Chicago has created an AI system that can design entirely new, artificial proteins by analyzing large stores of genomic data.
Proteins are macromolecules essential for building the tissues of living things and critical to the life of cells in general. Many proteins act as enzymes, chemical catalysts that drive the reactions cells depend on, while others carry out complex cellular tasks. If scientists can figure out how to reliably engineer artificial proteins, it could open the door to new methods of carbon capture, new ways of harvesting energy, and new disease treatments. Artificial proteins have the power to dramatically alter the world we live in. As reported by EurekAlert!, a recent breakthrough by researchers at the University of Chicago's PME has brought scientists closer to those goals. The PME researchers used machine learning algorithms to develop a system capable of generating novel proteins.
The research team built machine learning models trained on data pulled from various genomic databases. As the models learned, they began to identify common underlying patterns, simple design rules, that make it possible to create artificial proteins. When the researchers used these patterns to synthesize proteins in the lab, they found that the artificial proteins catalyzed chemical reactions about as effectively as naturally occurring proteins.
According to Rama Ranganathan, Joseph Regenstein Professor at PME, the research team found that genome data contains a massive amount of information about the basic functions and structures of proteins. By using machine learning to recognize these common structures, the researchers were “able to bottle nature’s rules to create proteins ourselves.”
The researchers focused on metabolic enzymes for this study, specifically the chorismate mutase family of proteins. This protein family is necessary for life in a wide variety of plants, fungi, and bacteria.
Ranganathan and collaborators realized that genome databases contained insights waiting to be discovered, but that traditional methods of determining the rules of protein structure and function had met with only limited success. The team set out to design machine learning models capable of revealing these design rules. The models’ findings suggest that new artificial sequences can be created by preserving two kinds of information found in natural proteins: which amino acids are conserved at each position, and how pairs of amino acid positions vary together over the course of evolution.
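Both kinds of information can be measured directly from a multiple sequence alignment of a protein family. The sketch below is a minimal Python illustration, assuming a hypothetical four-sequence toy alignment and hypothetical function names; it only computes per-position conservation and pairwise correlations, and it is not the team's model, which the article describes only at this high level.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy alignment (not real data): each string is one homologous
# sequence, aligned so that column i means the same position in every sequence.
ALIGNMENT = [
    "ACDK",
    "ACEK",
    "GCDR",
    "GCER",
]

def site_frequencies(alignment):
    """Return f1[i][a]: fraction of sequences with amino acid a at position i."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    return [
        {aa: count / n_seqs for aa, count in Counter(seq[i] for seq in alignment).items()}
        for i in range(length)
    ]

def pair_frequencies(alignment):
    """Return f2[(i, j)][(a, b)]: fraction of sequences with a at i and b at j."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    pairs = {}
    for i, j in combinations(range(length), 2):
        counts = Counter((seq[i], seq[j]) for seq in alignment)
        pairs[(i, j)] = {ab: c / n_seqs for ab, c in counts.items()}
    return pairs

def correlation(f1, f2, i, j, a, b):
    """Connected correlation C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b), for i < j.

    Positive values mean a at i and b at j co-occur more often than chance;
    zero means the two positions vary independently for this amino acid pair.
    """
    joint = f2[(i, j)].get((a, b), 0.0)
    return joint - f1[i].get(a, 0.0) * f1[j].get(b, 0.0)

if __name__ == "__main__":
    f1 = site_frequencies(ALIGNMENT)
    f2 = pair_frequencies(ALIGNMENT)
    print(f1[1])                                   # position 1 is fully conserved
    print(correlation(f1, f2, 0, 3, "A", "K"))     # co-varying pair of positions
    print(correlation(f1, f2, 0, 2, "A", "D"))     # independent pair of positions
```

In this toy alignment, position 1 is fully conserved (every sequence has C), while positions 0 and 3 vary together (A always pairs with K, G with R), giving a positive correlation of 0.25. Positions 0 and 2 vary independently, so their correlation is zero. A generative model built on these ideas would aim to produce new sequences that reproduce both kinds of statistics.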
The researchers then created synthetic genes encoding these amino acid sequences and introduced them into bacteria. They found that the bacteria used the synthetic proteins in their cellular machinery, where the proteins functioned almost exactly like their natural counterparts.
According to Ranganathan, the simple rules that their AI identified can be used to create artificial proteins of incredible complexity and variety. As Ranganathan explained to EurekAlert!:
“The constraints are much smaller than we ever imagined they would be. There is a simplicity in nature’s design rules, and we believe similar approaches could help us search for models for design in other complex systems in biology, like ecosystems or the brain.”
Ranganathan and collaborators want to generalize their models into a platform that scientists can use to better understand how proteins are constructed and what effects they have. They hope their AI systems will enable other scientists to discover proteins that can tackle important issues like climate change. Ranganathan and Associate Professor Andrew Ferguson have founded a company called Evozyne, which aims to commercialize the technology and promote its use in fields like agriculture, energy, and the environment.
Understanding the commonalities between proteins, and the relationships between structure and function, could also assist in the creation of new drugs and forms of therapy. Though protein folding has long been considered an incredibly difficult problem for computers to crack, insights from models like the one produced by Ranganathan’s team could help speed up these calculations, facilitating the creation of new drugs based on these proteins. Drugs could be developed that block the production of proteins within viruses, potentially aiding in the treatment of even novel viruses like the coronavirus that causes COVID-19.
Ranganathan and the rest of the research team still need to understand how and why their models work and how they produce reliable protein blueprints. The research team’s next goal is to better understand what attributes the models are taking into account to arrive at their conclusions.