In recent years, more and more attention has been paid to what scholars and researchers call the replication/reproducibility crisis. Many studies fail to yield the same significant results when others attempt to replicate them, and as a result, the scientific community is concerned that findings are often overstated. The problem affects fields as diverse as psychology and artificial intelligence. In AI, many non-peer-reviewed papers report impressive results that other researchers cannot reproduce. To tackle the problem and reduce the number of non-reproducible studies, researchers have designed an AI model that aims to predict which papers can be replicated.
As reported by Fortune, a new paper from a team of researchers at the Kellogg School of Management and the Northwestern Institute on Complex Systems at Northwestern University presents a deep learning model that may be able to predict which studies are likely to be reproducible and which are not. If the AI system can reliably distinguish reproducible from non-reproducible studies, it could help universities, research institutes, companies, and other organizations sift through thousands of research papers to identify those most likely to be useful and reliable.
The AI system developed by the Northwestern team doesn't use the kind of empirical/statistical evidence that researchers typically rely on to assess the validity of studies. Instead, the model uses natural language processing techniques to try to quantify a paper's reliability. The system extracts patterns in the language a paper's authors use, and some word patterns turn out to indicate greater reliability than others.
The research team drew on psychological research dating back to the 1960s, which found that people often communicate how confident they are in their ideas through the words they use. Building on this idea, the researchers reasoned that authors might unknowingly signal their confidence in their findings when writing their papers. The researchers conducted two rounds of training on different datasets. First, the model was trained on approximately two million abstracts from scientific papers. Second, it was trained on full papers taken from the Reproducibility Project: Psychology, an effort to determine which psychology papers can be reproduced.
After testing, the researchers applied the model to hundreds of other papers from fields such as psychology and economics. They found that their model predicted a paper's reproducibility more reliably than the statistical techniques typically used to judge whether a paper's results can be replicated.
Researcher and Kellogg School of Management Professor Brian Uzzi told Fortune that although he hopes the AI model could someday help researchers gauge how likely results are to be reproduced, the research team is not sure exactly which patterns and details the model learned. Machine learning models are often black boxes, a common problem in AI research, and this opacity could make other scientists hesitant to use the model.
Uzzi explained that the research team hopes the model could be used to help tackle the coronavirus crisis by helping scientists understand the virus more quickly and determine which study results are promising. As Uzzi told Fortune:
“We want to begin to apply this to the COVID issue—an issue right now where a lot of things are becoming lax, and we need to build on a very strong foundation of prior work. It’s unclear what prior work is going to be replicated or not and we don’t have time for replications.”
Uzzi and the other researchers hope to improve the model with additional natural language processing techniques, including ones the team developed to analyze corporate earnings call transcripts. The team has already built a database of approximately 30,000 call transcripts that they plan to analyze for clues. If the team can build a successful model, they might be able to convince analysts and investors to use the tool, which could pave the way for other innovative uses of the model and its techniques.