A team of researchers from MIT has developed a deep-learning algorithm intended to help AIs cope with “adversarial” examples, which can cause an AI to make the wrong predictions and carry out the wrong actions. The algorithm designed by the MIT team can help AI systems maintain their accuracy and avoid making mistakes when faced with confusing data points.
AI systems analyze the input features of an event to decide how to respond to that event. An AI responsible for maneuvering an autonomous vehicle has to take data from the vehicle’s cameras and decide what to do based on the data contained in those images. However, there’s the chance that the image data being analyzed by the AI isn’t an accurate representation of the real world. A glitch in the camera system could alter some of the pixels, leading to the AI drawing incorrect conclusions about the appropriate course of action.
“Adversarial inputs” are like optical illusions for an AI system: inputs that confuse the AI in some way. They can be crafted with the express goal of causing an AI to make mistakes, by representing data in a fashion that makes the AI believe that the contents of an example are one thing instead of another. For instance, it is possible to create an adversarial example for a computer vision system by making slight changes to images of cats, causing the AI to misclassify the images as computer monitors. The MIT research team designed an algorithm to help guard against adversarial examples by letting the model maintain a degree of “skepticism” about the inputs it receives.
The MIT researchers called their approach “Certified Adversarial Robustness for Deep Reinforcement Learning,” or CARRL. CARRL is composed of a reinforcement learning network and a traditional deep neural network joined together. Reinforcement learning uses the concept of “rewards” to train a model, giving the model proportionally more reward the closer it comes to hitting its goal. The reinforcement learning model is used to train a Deep Q-Network, or DQN. DQNs function like traditional neural networks, but they also associate input values with a level of reward, much like reinforcement learning systems.
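To make the idea of associating inputs with a level of reward more concrete, here is a minimal sketch of the standard Q-learning update using a lookup table in place of a neural network. It is a simplified stand-in and not the CARRL training code; the state names, action names, and the `alpha` and `gamma` values are assumptions chosen for illustration.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new estimate.

    q maps (state, action) pairs to an estimated long-term reward.
    A DQN replaces this table with a neural network, but the target
    it is trained toward has the same form.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

def greedy_action(q, state, actions):
    """Pick the action with the highest estimated reward in this state."""
    return max(actions, key=lambda a: q[(state, a)])

# Hypothetical example: moving "up" from state "s0" earned a reward of 1.0.
q = defaultdict(float)
actions = ["up", "down"]
new_estimate = q_update(q, "s0", "up", 1.0, "s1", actions)
best = greedy_action(q, "s0", actions)
```

After this single update, the table rates "up" above "down" in state "s0", so a greedy agent would choose "up" there.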
CARRL operates by modeling a range of different possible values for input data. Suppose the AI is trying to track the position of a dot within a larger image. The AI considers that the dot’s observed position could be the result of adversarial influence, and considers the regions where the dot could be instead. The network then makes decisions based on the worst-case scenario for the dot’s position, settling on the action that would produce the highest reward in this worst-case scenario.
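The worst-case decision rule described above can be sketched as follows. This is an illustrative simplification, not the researchers' implementation: it assumes a linear Q-function, for which the lowest possible value over a box of width `eps` around the observation can be computed exactly. The weights, bias, and `eps` values are made up for the example.

```python
def q_values(weights, bias, obs):
    """Nominal Q-value of each action for a linear Q-function."""
    return [sum(w * x for w, x in zip(row, obs)) + b
            for row, b in zip(weights, bias)]

def worst_case_q_values(weights, bias, obs, eps):
    """Lowest Q-value each action can take if every input feature
    may be off by up to eps in either direction.

    For a linear function the minimum over the box is the nominal
    value minus eps times the sum of absolute weights.
    """
    nominal = q_values(weights, bias, obs)
    return [q - eps * sum(abs(w) for w in row)
            for q, row in zip(nominal, weights)]

def robust_action(weights, bias, obs, eps):
    """Choose the action whose worst-case Q-value is highest."""
    lower = worst_case_q_values(weights, bias, obs, eps)
    return max(range(len(lower)), key=lower.__getitem__)

# Hypothetical setup: one input feature (the dot's position), two actions.
# Action 0 has a higher nominal value but is very sensitive to the input.
weights = [[5.0], [1.0]]
bias = [0.0, 3.0]
obs = [1.0]

trusting_choice = robust_action(weights, bias, obs, eps=0.0)   # trusts the input
skeptical_choice = robust_action(weights, bias, obs, eps=0.5)  # allows for error
```

With `eps=0.0` the agent takes the observation at face value and picks action 0. With `eps=0.5` the worst-case values become 2.5 and 3.5, so the agent switches to action 1, which holds up better if the observed position is wrong.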
The typical method of guarding against adversarial examples involves running slightly altered versions of the input image through the AI network to see if the same decision is always made. If alterations to the image don’t dramatically affect the outcome, there’s a good chance the network is resistant to adversarial examples. However, this isn’t a viable strategy for scenarios where quick decisions need to be made, as these are time-intensive, computationally expensive methods of testing. For this reason, the MIT team set out to create a neural network that could make decisions based on worst-case assumptions, one capable of operating in scenarios where safety is critical.
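For contrast, the sampling-based check described above might look like the sketch below. The `decision_is_stable` helper and the toy `threshold_policy` are hypothetical names introduced for illustration, and the perturbation model (uniform noise of size `eps` on each feature) is an assumption.

```python
import random

def decision_is_stable(policy, obs, eps, n_samples=100, seed=0):
    """Return True if the policy picks the same action on every
    randomly perturbed copy of the observation.

    Each sample requires another call to the policy, which is why
    this approach becomes expensive when decisions must be fast.
    """
    rng = random.Random(seed)
    baseline = policy(obs)
    for _ in range(n_samples):
        perturbed = [x + rng.uniform(-eps, eps) for x in obs]
        if policy(perturbed) != baseline:
            return False
    return True

def threshold_policy(obs):
    """Toy policy: action 1 if the first feature is positive, else action 0."""
    return 1 if obs[0] > 0.0 else 0

# An input far from the decision boundary cannot flip within eps.
far_from_boundary = decision_is_stable(threshold_policy, [1.0], eps=0.5)
# An input close to the boundary can be pushed to the other side.
near_boundary = decision_is_stable(threshold_policy, [0.1], eps=0.5)
```

Passing such a check only shows that no sampled perturbation changed the decision. It does not guarantee that none exists, and it costs `n_samples` extra evaluations for every decision.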
The MIT researchers tested their algorithm by having the AI play a game of Pong. They introduced adversarial examples by feeding the AI instances where the ball was displayed slightly farther down the screen than it actually was. As the influence of the adversarial examples grew, the standard corrective techniques began to fail, while CARRL was able to win more games by comparison. CARRL was also tested on a collision avoidance task. The task unfolded in a virtual environment where two different agents tried to switch positions without bumping into each other. The research team altered the first agent’s perception of the second agent, and CARRL was able to successfully steer the first agent around the other agent, even in conditions of high uncertainty. There did come a point, however, where CARRL became too cautious and ended up avoiding its destination altogether.
Even so, MIT Department of Aeronautics and Astronautics postdoc Michael Everett, who led the study, said that the research could have implications for the ability of robots to handle unpredictable situations. As Everett explained via MIT News:
“People can be adversarial, like getting in front of a robot to block its sensors, or interacting with them, not necessarily with the best intentions,” Everett says. “How can a robot think of all the things people might try to do, and try to avoid them? What sort of adversarial models do we want to defend against? That’s something we’re thinking about how to do.”