Powerful algorithms of the kind used by companies like Netflix, Facebook, and Amazon could have major implications for healthcare. Researchers have shown that such algorithms can learn to predict the biological language of cancer and of neurodegenerative diseases such as Alzheimer’s.
The work was carried out by academics at St. John’s College, University of Cambridge, who fed big data produced over decades of research into a computer language model. The goal was to see whether artificial intelligence (AI) could make more advanced discoveries than humans, and the researchers found that the technology was able to decipher this biological language.
The study, titled “Learning the molecular grammar of protein condensates from sequence determinants and embeddings,” was published in the scientific journal PNAS. According to the experts, the approach could be used to “correct the grammatical mistakes inside cells that cause disease.”
Professor Tuomas Knowles is lead author of the paper and a Fellow at St. John’s College.
“Bringing machine-learning technology into research into neurodegenerative diseases and cancer is an absolute game-changer. Ultimately, the aim will be to use artificial intelligence to develop targeted drugs to dramatically ease symptoms or to prevent dementia happening at all.”
The machine-learning algorithms used by companies like Netflix and Facebook make highly educated predictions about consumers and what they will do next. This is what happens when Netflix recommends a new movie or Facebook recommends a new friend. Voice assistants such as Alexa and Siri can recognize individuals right away and respond.
Dr. Kadi Liis Saar is first author of the paper and a Research Fellow at St. John’s College. She used similar technology to train a large-scale language model, which aimed to identify what is happening to proteins during disease.
“The human body is home to thousands and thousands of proteins and scientists don’t yet know the function of many of them. We asked a neural network based language model to learn the language of proteins,” she said.
“We specifically asked the program to learn the language of shapeshifting biomolecular condensates — droplets of proteins found in cells — that scientists really need to understand to crack the language of biological function and malfunction that cause cancer and neurodegenerative diseases like Alzheimer’s. We found it could learn, without being explicitly told, what scientists have already discovered about the language of proteins over decades of research.”
Scientists believe there are several hundred neurodegenerative diseases, the most common being Alzheimer’s, Parkinson’s, and Huntington’s diseases. Alzheimer’s affects 50 million people around the globe. As the disease progresses, proteins form clumps and kill healthy nerve cells.
Protein Condensates and NLP Technology
In a healthy brain, these masses of proteins can be disposed of effectively. According to more recent findings, scientists now believe that some disordered proteins form condensates, which are liquid-like droplets of proteins. These droplets have no membrane, merge freely with each other, and can form and reform.
“Protein condensates have recently attracted a lot of attention in the scientific world because they control key events in the cell such as gene expression — how our DNA is converted into proteins — and protein synthesis — how the cells make proteins,” Professor Knowles said.
“Any defects connected with these protein droplets can lead to diseases such as cancer. This is why bringing natural language processing technology into research into the molecular origins of protein malfunction is vital if we want to be able to correct the grammatical mistakes inside cells that cause disease,” he continued.
“We fed the algorithm all of the data held on the known proteins so it could learn and predict the language of proteins in the same way these models learn about human language and how WhatsApp knows how to suggest words for you to use,” Dr. Saar said.
“Then we were able to ask it about the specific grammar that leads only some proteins to form condensates inside cells. It is a very challenging problem and unlocking it will help us learn the rules of the language of disease,” Dr. Saar continued.
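The next-word analogy Dr. Saar describes can be illustrated with a toy example. The sketch below is not the study's model, which was a neural-network-based language model trained on the known proteins. It swaps in a much simpler technique, a bigram count model, to show the basic idea of treating a protein as a string of letters (one letter per amino acid) and learning which letter tends to come next. The sequences and the function names `train_bigram_model` and `suggest_next` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy sequences in one-letter amino-acid code; these are
# made up for illustration and are not real proteins.
TOY_SEQUENCES = ["MKGGSGGS", "MKGGSY", "MKRGGS"]


def train_bigram_model(sequences):
    """Count how often each residue is followed by each other residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair every residue with the one that comes right after it.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def suggest_next(counts, residue):
    """Return the residue seen most often after `residue`, or None if unseen."""
    followers = counts.get(residue)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigram_model(TOY_SEQUENCES)
print(suggest_next(model, "M"))  # the most frequent follower of M in the toy data
```

A phone keyboard's word suggestions work on the same principle at the level of words rather than letters. A neural language model of the kind the researchers describe would learn far richer patterns than adjacent-letter counts, but the training signal is similar in spirit: predict what comes next from what has been seen.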
The main drivers behind this advance are the increasing amount of available data, greater computing power, and technical improvements in machine learning itself. Machine learning has the potential to dramatically transform research in these areas, enabling discoveries that could never have been predicted.
According to Dr. Saar, “Machine-learning can be free of the limitations of what researchers think are the targets for scientific exploration and it will mean new connections will be found that we have not even conceived of yet. It is really very exciting indeed.”
The new network is available to researchers all around the world, and a growing number of scientists are getting involved.