A team of researchers from Stanford University, Harvard University, and the University of Chicago trained algorithms to diagnose arthritis in X-rays of knees. When patients' own pain reports were used as the training data, the algorithm was more accurate than radiologists at analyzing the X-rays of Black patients.
Problem of Algorithmic Bias
The use of machine learning algorithms in the medical field could improve outcomes for patients suffering from all manner of diseases, but there are also well-documented problems with using AI algorithms to diagnose patients. Studies of deployed AI models have found a number of notable incidents of algorithmic bias. These include algorithms that gave minority patients fewer referrals to cardiology units than White patients, even when their reported symptoms were the same.
One of the study's authors, Professor Ziad Obermeyer of the University of California, Berkeley's School of Public Health, set out to use AI to investigate disparities between radiologists' assessments of X-rays and the amount of pain that patients reported. Although Black patients and low-income patients reported higher levels of pain, their X-rays were scored the same as those of the general population. The data on reported pain levels came from the NIH, and the researchers wanted to find out whether human doctors were missing anything in their analysis of the images.
To identify the potential causes of these differences, Obermeyer and his colleagues built a computer vision model trained on the NIH data, as reported by Wired. The model was designed to analyze X-rays and predict a patient's pain level from the images. It found patterns within the images that were highly correlated with patients' reported pain levels.
When presented with a previously unseen image, the model predicts the patient's level of reported pain. The model's predictions aligned more closely with patients' actual reported pain levels than the scores assigned by radiologists did, and this was especially true for Black patients. Obermeyer told Wired that the computer vision algorithm was able to detect phenomena that were more commonly linked with pain in Black patients.
Properly Training Systems
Reportedly, the criteria used to evaluate these X-rays were originally developed from the results of a small study carried out in northern England in 1957. The population used to develop those osteoarthritis assessment criteria differed greatly from the highly diverse population of the modern United States, so it isn't surprising that the criteria lead to mistakes when they are used to diagnose patients from that population.
The new study demonstrates that properly trained AI algorithms can reduce bias rather than reinforce it. Here, the model was trained on patients' own reports of their pain instead of on expert opinion. Obermeyer and colleagues previously demonstrated that a commonly used healthcare algorithm favored White patients over Black patients; this study shows that training a machine learning system on the right data can help prevent such bias.
A notable caveat to the study is one familiar to many machine learning researchers: the model developed by the team is a black box. The researchers themselves aren't sure what kinds of features the algorithm is detecting in the X-rays, which means they can't tell doctors what features they are missing.
Other radiologists and researchers aim to dig into the black box and uncover the patterns it has learned, which could help doctors understand what they are missing. Judy Gichoya, a radiologist and professor at Emory University, is collecting a larger and more varied set of X-rays to train the AI model. Gichoya will have radiologists write detailed notes on these X-rays, which will then be compared with the model's output to see whether the patterns the algorithm detects can be identified.