One of the interesting facts about researching AI is that it can often execute actions and pursue strategies that surprise the very researchers designing them. This happened during a recent virtual game of hide and seek where multiple AI agents were pitted against one another. Researchers at OpenAI, an AI firm based out of San Francisco, were surprised to find that their AI agents started exploiting strategies in the game world that the researchers didn’t even know existed.
OpenAI has trained a group of AI agents to play a hide and seek game with each other. The AI programs are trained with reinforcement learning, a technique where the desired behavior is elicited from the AI algorithms by providing the algorithms with feedback. The AI starts out by taking random actions, and every time it takes an action that gets it closer to its goal, the agent is rewarded. The AI desires to gain the maximum amount of reward possible, so it will experiment to see which actions gain it more reward. Through trial and error the AI is capable of distinguishing strategies that will bring them to victory, those which will give them the most reward.
Reinforcement learning has already demonstrated impressive success at learning the rules of games. OpenAI recently trained a team of AI to play the MMORPG DOTA 2, and the AI defeated a world-champion team of human players last year. A similar thing happened with the game StarCraft when an AI was trained on the game by DeepMind. Reinforcement learning has also been used to teach AI programs to play Pictionary with humans, learning to interpret pictures and use basic common sense reasoning.
In the hide and seek video game created by the researchers, multiple AI agents were pitted against one another. The result was an arms race of sorts, where each agent wants to outperform the other and obtain the most reward points. A new strategy adopted by one agent will cause its opponent to seek a new strategy to counter it, and vice-versa. Igor Mordatch, a researcher at OpenAI, explained to IEEE Spectrum that the experiment demonstrates that this process of trial and error playing between agents “is enough for the agents to learn surprising behaviors on their own—it’s like children playing with each other.”
What were the surprising behaviors exactly? The researchers had four basic strategies that they expected the AI agents to learn, and they learned these fairly quickly, becoming competent in them after just 25 million simulated games. The game took place in a 3d environment full of ramps, blocks, and walls. The AI agents learned to chase each other around, move blocks to build forts they could hide in, and move ramps around. The AI seekers learned to drag ramps around to get inside the hiders’ forts, while the hiders learned to try and take the ramps into their forts so the seekers couldn’t use them.
However, around the benchmark of 380 million games, something unexpected happened. The AI agents learned to use two strategies the researchers didn’t expect. The seeker agents learned that by jumping onto a box and tilting/riding the box towards a nearby fort, they could jump into the fort and find the hider. The researchers hadn’t even realized that this was possible within the physics of the game environment. The hiders learned to deal with this issue by dragging the boxes into place within their fort.
While the unexpected behavior of agents trained on reinforcement learning algorithms is harmless in this instance, it does raise some potential concerns about how reinforcement learning is applied to other situations. A member of the OpenAI research team, Bowen Baker, explained to IEEE Spectrum that these unexpected behaviors could be potentially dangerous. After all, what if robots started behaving in unexpected ways?
“Building these environments is hard,” Baker explained. “The agents will come up with these unexpected behaviors, which will be a safety problem down the road when you put them in more complex environments.”
However, Baker also explained that reinforcement strategies could lead to innovative solutions to current problems. Systems trained with reinforcement learning could solve a wide array of problems with solutions we may not able to even imagine.