

Researchers Use Natural Language Processing Algorithms to Understand Protein Transformation




Researchers from the University of Maryland recently applied natural language processing techniques and machine learning algorithms to gain insight into how protein molecules shift from one shape to another. The paper, published in the journal Nature Communications, marks the first time an AI algorithm has been used to study the dynamics of biomolecular systems in the context of protein transformation.

Protein molecules can take on various forms, but the mechanisms that prompt a protein to shift from one form to another are still somewhat mysterious. A protein’s function is determined by its shape, so a better understanding of the mechanisms that influence a protein’s shape and structure could help scientists design targeted drug therapies and identify the causes of diseases.

Biological molecules aren’t stationary; they are constantly moving in response to events in their environment. Environmental pressures can make molecules shift into different forms, often quite suddenly. A molecule can abruptly refold into a completely different structure, in a process much like the uncoiling of a spring. Different portions of the molecule unfold and fold, and the researchers studied the intermediate stages between the different molecular forms.

Pratyush Tiwary, the paper’s senior author, is an assistant professor in Maryland’s Department of Chemistry and Biochemistry and its Institute for Physical Science and Technology. According to Tiwary, natural language processing can be used to model how molecules transform and adapt. He notes that molecules have a “language” of their own: the movements they make can be translated into an abstract language. Once molecular movements are mapped to language patterns in this way, natural language processing techniques and AI algorithms can be used to “generate biologically truthful stories out of the resulting abstract words.”

When a molecule transitions from one form to another, the change happens extremely fast, sometimes in as little as a trillionth of a second. This speed makes it difficult for scientists to determine which parameters affect the unfolding process using methods like spectroscopy or even high-powered microscopes. To identify those parameters, Tiwary and the rest of the research team built physics-based simulations of proteins. Complex statistical models were used to create protein simulations that emulated the shape, trajectory, and movement of the molecules. The resulting simulation data were then fed to a machine learning algorithm based on natural language processing methods.
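The researchers’ simulations are far more detailed than anything that fits in a few lines, but the basic idea of generating a trajectory from a physics model can be illustrated with a toy. The sketch below is hypothetical and is not the team’s code: it assumes a single coordinate `x` moving in a double-well potential U(x) = (x^2 - 1)^2, whose two minima at x = -1 and x = +1 stand in for two molecular shapes, and integrates overdamped Langevin dynamics. The names `simulate_langevin`, `dt`, and `kT` are illustrative choices.

```python
import math
import random


def double_well_force(x: float) -> float:
    """Force -dU/dx for the toy potential U(x) = (x^2 - 1)^2 (minima at x = -1 and x = +1)."""
    return -4.0 * x * (x * x - 1.0)


def simulate_langevin(n_steps: int, dt: float = 0.01, kT: float = 0.5,
                      x0: float = -1.0, seed: int = 0) -> list[float]:
    """Overdamped Langevin dynamics (hypothetical stand-in for a molecular simulation):
    x_{t+1} = x_t + F(x_t) * dt + sqrt(2 * kT * dt) * gaussian_noise
    """
    rng = random.Random(seed)              # seeded so runs are reproducible
    noise_scale = math.sqrt(2.0 * kT * dt)  # thermal kick size; zero when kT = 0
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        x = x + double_well_force(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = simulate_langevin(50_000)
    # Fraction of time spent in the right-hand well (the "other shape").
    print(sum(x > 0 for x in traj) / len(traj))
```

Raising `kT` makes hops between the two wells more frequent, loosely mimicking how thermal fluctuations drive sudden shape changes; the hops themselves take only a small fraction of the total simulated time.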

The natural language processing models used to train the machine learning system resemble the algorithms behind Gmail’s predictive text feature. The simulated proteins were treated as a language in which molecular movements were translated into “letters.” These letters were then strung together into words and sentences. The machine learning algorithms learned the grammatical and syntactic rules behind the protein structures, determining which shapes and movements tended to follow others. The algorithms could then be used to predict how certain proteins would untangle and which shapes they would take.
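A minimal sketch of the “letters” idea follows, assuming a trajectory is available as a list of floats. The bin edges and the three-letter alphabet are hypothetical choices, and in place of the researchers’ neural network it uses a much simpler bigram counting model: for each letter, it records which letters follow it and predicts the most frequent successor, the same “what comes next” question that predictive text answers.

```python
from collections import Counter, defaultdict


def to_letters(trajectory, edges=(-0.5, 0.5), alphabet="ABC"):
    """Bin each continuous value into a letter (hypothetical bins):
    x < -0.5 -> 'A', -0.5 <= x < 0.5 -> 'B', x >= 0.5 -> 'C'."""
    assert len(alphabet) == len(edges) + 1
    return "".join(alphabet[sum(x >= e for e in edges)] for x in trajectory)


def fit_bigram(text):
    """Count how often each letter is immediately followed by each other letter."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, letter):
    """Return the most frequently observed successor of `letter`, or None if never seen."""
    successors = counts.get(letter)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


if __name__ == "__main__":
    text = to_letters([-1.0, -0.9, 0.0, 0.8, 1.1, 1.0])
    model = fit_bigram(text)
    print(text, predict_next(model, "B"))
```

A bigram model only looks one letter back; an LSTM can in principle carry information across much longer stretches of the “sentence,” which matters when a shape change depends on earlier movements.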

The researchers used a long short-term memory (LSTM) network to analyze the protein-based sentences. The team also tracked the mathematics underlying the network, monitoring its parameters as it learned the dynamics of molecular transformation. According to the results of the study, the network used logic similar to a statistical physics concept known as path entropy. If this finding holds up, it could lead to improvements in LSTM networks. Tiwary explained that the discovery peels back some of the black-box nature of an LSTM, letting researchers better understand which parameters can be tuned for optimal performance.
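One simple way to make “path entropy” concrete, under the simplifying assumption that the letter sequence behaves like a Markov chain, is its entropy rate: the average uncertainty about the next letter given the current one, H = -Σ_i p_i Σ_j P_ij log P_ij. For a stationary Markov chain, the entropy of paths of length T grows roughly as T times this rate. This is an illustrative simplification, not the formulation used in the paper, and `path_entropy_per_step` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict


def transition_matrix(text):
    """Empirical transition probabilities P[a][b] = count(a -> b) / count(a -> anything)."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}


def path_entropy_per_step(text):
    """Entropy rate (nats per step) of a Markov chain fitted to `text`:
    H = -sum_i p_i * sum_j P_ij * log(P_ij),
    where p_i is the empirical frequency of letter i among positions that have a successor."""
    if len(text) < 2:
        return 0.0
    P = transition_matrix(text)
    starts = Counter(text[:-1])
    total = len(text) - 1
    h = 0.0
    for a, row in P.items():
        p_a = starts[a] / total
        h -= p_a * sum(p * math.log(p) for p in row.values())
    return h
```

A perfectly predictable sequence such as "ABABAB" has zero entropy rate, while sequences with more branching in their transitions score higher.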

As a test case for their algorithm, the researchers analyzed a riboswitch, a regulatory segment of RNA. The riboswitch had already been studied using spectroscopy, and the forms predicted by the machine learning system matched those observed with spectroscopy.

Tiwary hopes the findings will help researchers develop targeted drugs with fewer side effects. As Tiwary explained:

“You want to have potent drugs that bind very strongly, but only to the thing that you want them to bind to. We can achieve that if we can understand the different forms that a given biomolecule of interest can take, because we can make drugs that bind only to one of those specific forms at the appropriate time and only for as long as we want.”