A math professor from American University, along with his team of collaborators, developed a statistical model that can detect misinformation in social media posts.
Machine learning is increasingly being used to stop the spread of misinformation, but a major hurdle remains: the black box problem, in which researchers don’t understand how a machine arrives at the same decision as its human trainers.
Detecting Misinformation With Statistical Models
Zois Boukouvalas, assistant professor in AU’s Department of Mathematics and Statistics, used a Twitter dataset with misinformation tweets about COVID-19 to demonstrate how statistical models can detect misinformation in social media during major events like a pandemic or disaster.
Boukouvalas and his colleagues, who include AU student Caitlin Moroney and Computer Science Professor Nathalie Japkowicz, demonstrated in the newly published research how the model’s decisions align with humans’.
“We would like to know what a machine is thinking when it makes decisions, and how and why it agrees with the humans that trained it,” Boukouvalas said. “We don't want to block someone's social media account because the model makes a biased decision.”
The method used by the team is a type of machine learning that relies on statistics. Statistical models are effective and provide another way to combat misinformation.
The model achieved a high prediction performance and classified a testing set of 112 real and misinformation tweets with nearly 90% accuracy.
“What's significant about this finding is that our model achieved accuracy while offering transparency about how it detected the tweets that were misinformation,” Boukouvalas continued. “Deep learning methods cannot achieve this kind of accuracy with transparency.”
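To make the idea of a transparent classifier concrete, here is a minimal Python sketch. It is not the team's model, whose details are not given here. It assumes a simple logistic scorer over a handful of cue words. The word list, the weights, and the toy posts are all hypothetical and were written only for illustration. What it shows is that every prediction can be traced back to the words that pushed the score up or down.

```python
import math

# Hypothetical cue words and weights, chosen only for illustration;
# they are not taken from the study.
WEIGHTS = {"miracle": 1.5, "cure": 1.0, "hoax": 2.0, "study": -1.0, "officials": -1.5}
BIAS = -0.5


def contributions(text):
    """Return each cue word's contribution (count * weight) to the score."""
    tokens = text.lower().split()
    return {w: tokens.count(w) * wt for w, wt in WEIGHTS.items() if w in tokens}


def predict(text):
    """Return (label, probability, contributions) for one post."""
    parts = contributions(text)
    score = BIAS + sum(parts.values())
    prob = 1.0 / (1.0 + math.exp(-score))  # logistic function
    label = "misinformation" if prob >= 0.5 else "real"
    return label, prob, parts


def accuracy(examples):
    """Fraction of (text, true_label) pairs classified correctly."""
    correct = sum(1 for text, truth in examples if predict(text)[0] == truth)
    return correct / len(examples)


# Toy posts written for this sketch, not real tweets.
toy_set = [
    ("miracle cure ends the virus", "misinformation"),
    ("health officials release new study", "real"),
    ("the virus is a hoax", "misinformation"),
    ("officials say cure claims are a hoax", "real"),
]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy_set))
    for text, truth in toy_set:
        label, prob, parts = predict(text)
        print(f"{text!r}: predicted={label}, truth={truth}, why={parts}")
```

In this toy set the last post is misclassified, and the per-word contributions show why: "cure" and "hoax" outweigh "officials". That kind of inspection is what lets a human check whether the model agrees with its trainers for the right reasons.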
Training and Preparing the Model
Because information provided by humans can introduce biases and black boxes, the researchers prepared carefully to train the model before testing it on a dataset.
The tweets were labeled by the researchers as either misinformation or real based on a set of predefined rules about language used in misinformation. The team also considered nuances in human language and linguistic features that are linked to misinformation.
Before the model was trained, sociolinguist Professor Christine Mallinson of the University of Maryland Baltimore County reviewed the tweets for writing styles associated with misinformation, bias, and less reliable sources in news media.
“Once we add those inputs into the model, it is trying to understand the underlying factors that lead to the separation of good and bad information,” Japkowicz said. “It's learning the context and how words interact.”
The researchers will now look to improve the user interface for the model, as well as its ability to detect misinformation in social media posts that include images or other multimedia. To do so, the statistical model will need to learn how a variety of elements interact with each other to create misinformation.
Both Boukouvalas and Japkowicz say that human intelligence and news literacy are key to stopping the spread of misinformation.
“Through our work, we design tools based on machine learning to alert and educate the public in order to eliminate misinformation, but we strongly believe that humans need to play an active role in not spreading misinformation in the first place,” Boukouvalas said.