Reinforcement Learning

AI Researchers Create Video Game Playing Model That Can Remember Past Events

A team of researchers at Uber’s AI lab has recently developed a system of AI algorithms that outperformed both human players and other AI systems at classic Atari video games. The system is capable of remembering previously successful strategies and creating new strategies based on what worked in the past. The study’s research team believes that the algorithms they developed have potential applications in other technical fields, such as language processing and robotics.

The typical method used to create AI systems capable of playing video games is reinforcement learning. Reinforcement learning algorithms learn to carry out a task by exploring a range of possible actions, and after each action they receive a form of reinforcement (a reward or punishment). Over time, the model learns which actions lead to larger rewards and becomes more likely to choose those actions. Unfortunately, reinforcement learning models run into trouble when they encounter states that differ substantially from anything they have seen before.
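The reward loop described above can be sketched as a minimal tabular Q-learning example. The toy environment, hyperparameters, and names here are illustrative assumptions, not part of the Uber system:

```python
import random

# Toy environment: states 0..4 on a corridor; the agent moves left (-1) or
# right (+1), and reaching state 4 yields a reward of 1. All values here
# are illustrative, not taken from the research described in the article.
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}

def step(state, action):
    nxt = max(0, min(4, state + action))
    reward = 1.0 if nxt == 4 else 0.0
    return nxt, reward, nxt == 4

random.seed(0)
for _ in range(200):  # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < EPS:
            a = random.choice((-1, 1))
        else:
            a = max((-1, 1), key=lambda x: q[(s, x)])
        nxt, r, done = step(s, a)
        # Nudge the estimate toward reward plus discounted future value.
        q[(s, a)] += ALPHA * (r + GAMMA * max(q[(nxt, -1)], q[(nxt, 1)]) - q[(s, a)])
        s = nxt

# The greedy policy learned from the rewards: move right from every state.
policy = {s: max((-1, 1), key=lambda a: q[(s, a)]) for s in range(4)}
```

After enough episodes, the reward signal alone is sufficient for the policy to prefer the rewarding direction, which is the core dynamic the paragraph describes.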

According to the research team, the reason their approach hadn’t been considered by other AI researchers is that it differs from the “intrinsic motivation” strategy typically used in reinforcement learning. The issue with an intrinsic motivation approach is that the model can “forget” about potentially rewarding areas that still merit exploration, a phenomenon referred to as “detachment”: when the model encounters unexpected data, promising but unfinished areas of the state space can drop out of its attention entirely.

According to TechXplore, the research team set out to create a learning model that was more flexible and able to respond to unexpected data. The researchers overcame this problem by introducing an algorithm capable of remembering all of the actions taken by a previous version of the model when it tried to solve a problem. When the AI model encounters a data point that isn’t consistent with what it has learned so far, the model checks its memory map. The model will then identify which strategies succeeded and failed and choose strategies appropriately.
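The memory of past attempts described above can be sketched as an archive that maps each visited state to the best action sequence that reached it, so the model can replay a remembered route instead of starting over. The function and field names are hypothetical, chosen for illustration rather than taken from the paper:

```python
# Hedged sketch: an archive remembering, for each visited state, the action
# sequence that reached it and the best score achieved there. All names
# (remember, restore, state keys) are illustrative assumptions.
archive = {}  # state_key -> {"actions": [...], "score": float}

def remember(state_key, actions, score):
    """Store a trajectory if this state is new or was reached with a better score."""
    best = archive.get(state_key)
    if best is None or score > best["score"]:
        archive[state_key] = {"actions": list(actions), "score": score}

def restore(state_key):
    """Return the saved action sequence for replaying back to this state."""
    return list(archive[state_key]["actions"])

# A successful route is kept; a later, lower-scoring route to the same
# state is ignored rather than overwriting the better memory.
remember("ladder_top", ["right", "right", "jump"], 120.0)
remember("ladder_top", ["right", "jump"], 90.0)
```

Checking the archive before abandoning a region is what lets the model distinguish strategies that already succeeded from ones that failed or were never tried.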

When playing a video game, the model collects screenshots of the game as it plays, making a log of its actions. The images are grouped together based on similarity, forming clear points in time that the model can refer back to. The algorithm can use the logged images to return to an interesting point in time and continue exploring from there. When the model finds that it’s losing, it will refer back to the screenshots taken and try a different strategy.
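Grouping screenshots by similarity can be approximated by downsampling and coarsely quantizing each frame, so that near-identical frames collapse to the same key while visibly different frames do not. This is a hedged sketch of that idea; the frame format (grayscale grids of 0-255 ints) and the parameters are assumptions, not the researchers’ actual representation:

```python
def cell_key(frame, factor=4, levels=8):
    """Downsample a 2-D grayscale frame (lists of 0-255 ints) into a small,
    coarsely quantized tuple usable as a dictionary key. Similar frames
    map to the same key, forming the 'points in time' the model revisits."""
    h, w = len(frame), len(frame[0])
    key = []
    for y in range(0, h, factor):
        row = []
        for x in range(0, w, factor):
            # Average each factor-by-factor block of pixels...
            block = [frame[j][i]
                     for j in range(y, min(y + factor, h))
                     for i in range(x, min(x + factor, w))]
            avg = sum(block) / len(block)
            # ...then quantize the average to a handful of brightness levels.
            row.append(int(avg * levels / 256))
        key.append(tuple(row))
    return tuple(key)

# Two nearly identical frames land in the same cell; a very different
# frame lands in a new one, so it is logged as a distinct point in time.
a = [[100] * 8 for _ in range(8)]
b = [[104] * 8 for _ in range(8)]
c = [[250] * 8 for _ in range(8)]
```

A dictionary keyed on `cell_key(frame)` then gives the algorithm a compact log it can scan to pick an interesting point and resume exploration from there.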

As explained by the BBC, there is also the problem of handling dangerous scenarios for the AI agent playing the game. If the agent runs into a hazard that can kill it, that would prevent it from returning to areas that merit more exploration, a problem called “derailment”. The AI model handles derailment problems through a separate process from the one used to encourage exploration of old areas.

The research team had the model play through 55 Atari games. These games are commonly used to benchmark the performance of AI models, but the researchers added a twist for their model. They introduced additional rules, instructing the model not only to achieve the highest score possible but to try to beat its previous best every time. When the results were analyzed, the researchers found that their AI system outperformed other AIs at the games around 85% of the time. The model performed especially well at Montezuma’s Revenge, a platforming game where the player dodges hazards and collects treasures. There, it beat the human record and scored higher than any other AI system has.

According to the Uber AI researchers, the strategies used by the research team have applications in industries like robotics. Robots benefit from the ability to remember which actions succeeded, which failed, and which haven’t been tried yet.