As artificial intelligence algorithms and systems become more sophisticated and take on greater responsibilities, it becomes increasingly important to ensure that they avoid dangerous, unwanted behavior. Recently, a team of researchers from the University of Massachusetts Amherst and Stanford published a paper demonstrating how specific AI behaviors can be avoided, using a technique that turns a description of unwanted behavior into precise mathematical instructions that can be used to adjust the behavior of an AI.
According to TechXplore, the research was predicated on the assumption that unfair or unsafe behaviors can be defined with mathematical functions and variables. If this is true, then it should be possible for researchers to train systems to avoid these specific behaviors. The research team aimed to develop a toolkit that users of an AI system could employ to specify which behaviors they want the system to avoid, and that would enable AI engineers to reliably train a system that avoids those unwanted actions in real-world scenarios.
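To make the idea concrete, here is a minimal Python sketch of how an unwanted behavior might be expressed as a function and checked before a model is deployed. It is an illustration under stated assumptions, not the researchers' implementation: the names `hoeffding_upper_bound` and `passes_safety_test` are hypothetical, the behavior is assumed to be scored per data point by a bounded value whose average should be at or below zero, and Hoeffding's inequality is used as a simple stand-in for whatever statistical bound the paper relies on.

```python
import math
from typing import Sequence


def hoeffding_upper_bound(samples: Sequence[float], delta: float,
                          lo: float, hi: float) -> float:
    """Upper bound on the true mean that holds with probability >= 1 - delta.

    Assumes every sample lies in [lo, hi] and the samples are independent.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be strictly between 0 and 1")
    sample_mean = sum(samples) / n
    # Hoeffding's inequality: the margin shrinks as more data is available.
    margin = (hi - lo) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return sample_mean + margin


def passes_safety_test(g_samples: Sequence[float], delta: float,
                       lo: float, hi: float) -> bool:
    """Return True only if we are confident the unwanted behavior is avoided.

    Convention assumed here: each sample scores the unwanted behavior on one
    held-out data point, and an average of 0 or less means acceptable.
    """
    return hoeffding_upper_bound(g_samples, delta, lo, hi) <= 0.0


# With plenty of data the candidate model is accepted ...
plenty = passes_safety_test([-0.5] * 1000, delta=0.05, lo=-1.0, hi=1.0)
# ... with very little data the same average is not convincing enough.
scarce = passes_safety_test([-0.5] * 5, delta=0.05, lo=-1.0, hi=1.0)
print(plenty, scarce)
```

The design choice worth noting is that the check is deliberately conservative: when there is too little data to be confident, the sketch declines to approve the model rather than guessing.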
Philip Thomas, the first author on the paper and an assistant professor of computer science at the University of Massachusetts Amherst, explained that the research team aims to demonstrate that designers of machine learning algorithms can make it easier for users of AI to describe unwanted behaviors, and can make it highly likely that the AI system will avoid those behaviors.
The research team tested their technique by applying it to a common problem in data science: gender bias. They aimed to make the algorithms used to predict college students' GPAs fairer by reducing gender bias. Using an experimental dataset, they instructed their AI system to avoid creating models that systematically underestimated or overestimated GPAs for one gender. As a result of the researchers' instructions, the algorithm created a model that predicted student GPAs more accurately and had substantially less systemic gender bias than previously existing models. Previous GPA prediction models suffered from bias because the bias reduction methods available were often too limited to be useful, or because no bias reduction was used at all.
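One way to picture the instruction given to the system is as a number that measures how differently a model's errors fall on the two groups. The Python sketch below is an assumption-laden illustration, not the paper's code: the function names, the tiny GPA lists, and the tolerance `EPSILON` are all made up for the example.

```python
from statistics import mean
from typing import Sequence

# Hypothetical tolerance: how large a gap between groups is still acceptable.
EPSILON = 0.05


def mean_signed_error(predicted: Sequence[float],
                      actual: Sequence[float]) -> float:
    """Average of (predicted - actual); negative means underestimation."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return mean([p - a for p, a in zip(predicted, actual)])


def gender_bias_gap(pred_a: Sequence[float], actual_a: Sequence[float],
                    pred_b: Sequence[float], actual_b: Sequence[float]) -> float:
    """Absolute difference between the two groups' average prediction errors."""
    return abs(mean_signed_error(pred_a, actual_a)
               - mean_signed_error(pred_b, actual_b))


# Made-up data: group A is underestimated, group B is overestimated.
gap = gender_bias_gap([3.0, 3.5], [3.2, 3.7], [3.1, 2.9], [3.0, 2.8])
violates = gap - EPSILON > 0.0  # True means the model shows the unwanted bias
print(round(gap, 3), violates)
```

A model that underestimates one group and overestimates the other produces a large gap, which is exactly the across-the-board pattern the researchers told their system to avoid.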
The research team also developed a second algorithm, which was implemented in an automated insulin pump and was intended to balance performance and safety. Automated insulin pumps need to decide how large an insulin dose a patient should be given. After the patient eats, the pump will ideally deliver a dose of insulin just large enough to keep blood sugar levels in check. The insulin doses that are delivered must be neither too large nor too small.
Machine learning algorithms are already proficient at identifying patterns in an individual's response to insulin doses, but these existing analysis methods can't let doctors specify outcomes that should be avoided, such as low blood sugar crashes. In contrast, the research team was able to develop a method that could be trained to deliver insulin doses that stay between the two extremes, preventing both underdosing and overdosing. While the system isn't ready for testing in real patients just yet, a more sophisticated AI based on this approach could improve quality of life for those suffering from diabetes.
In the research paper, the researchers refer to the algorithm as a “Seldonian” algorithm. The name refers to Hari Seldon, a character created by the science fiction author Isaac Asimov, who also described the three laws of robotics. The implication is that the AI system “may not injure a human being or, through inaction, allow a human being to come to harm.” The research team hopes that their framework will allow AI researchers and engineers to create a variety of algorithms and systems that avoid dangerous behavior. Emma Brunskill, senior author of the paper and Stanford assistant professor of computer science, explained to TechXplore:
“We want to advance AI that respects the values of its human users and justifies the trust we place in autonomous systems.”