The human brain often recalls past memories seemingly unprompted: as we go about our day, we experience spontaneous flashes of memory from our lives. While this spontaneous conjuration of memories has long been of interest to neuroscientists, AI research company DeepMind recently published a paper detailing how one of its AI systems replicated this strange pattern of recall.
This conjuration of memories in the brain, known as neural replay, is tightly linked to the hippocampus. The hippocampus is a seahorse-shaped formation in the brain that belongs to the limbic system, and it is associated with the formation of new memories, as well as the emotions that memories spark. Current theories on the role of the hippocampi (there is one in each hemisphere of the brain) state that different regions of the hippocampus are responsible for handling different types of memories. For instance, spatial memory is believed to be handled in the posterior (rear) region of the hippocampus.
As reported by Jesus Rodriguez on Medium, Dr. John O’Keefe is responsible for many contributions to our understanding of the hippocampus, including the discovery of hippocampal “place” cells. Place cells in the hippocampus are triggered by stimuli in a specific environment. As an example, experiments on rats showed that specific neurons would fire when the rats ran through certain portions of a track. Researchers continued to monitor the rats even when they were resting, and they found that the same patterns of neurons denoting a portion of the track would fire, though at an accelerated speed. The rats seemed to be replaying the memories of the track in their minds.
In humans, recalling memories is an important part of the learning process, but the phenomenon has proven difficult to recreate when trying to enable AI to learn.
The DeepMind team set about trying to recreate the phenomenon of recall using reinforcement learning. Reinforcement learning algorithms work by receiving feedback from their interactions with the environment around them, being rewarded whenever they take actions that bring them closer to the desired goal. In this context, the reinforcement learning agent records events and then plays them back at later times, with the system being reinforced to recall past experiences more efficiently.
DeepMind added the replaying of experiences to a reinforcement learning algorithm using a replay buffer that would play back memories/recorded experiences to the system at specific times. Some versions of the system had the experiences played back in random order, while other models had pre-selected playback orders. While the researchers experimented with the order of playback for the reinforcement agents, they also experimented with different methods of replaying the experiences themselves.
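The paper does not spell out the buffer's implementation, but a minimal sketch of such a replay buffer might look like the following, assuming a generic (state, action, reward, next state) transition format; the class and method names here are illustrative, not DeepMind's:

```python
import random
from collections import deque

# A minimal experience replay buffer. Transitions are stored as
# (state, action, reward, next_state) tuples; names are illustrative.
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest memories fall out first

    def record(self, state, action, reward, next_state):
        """Store one experienced transition."""
        self.buffer.append((state, action, reward, next_state))

    def replay_random(self, batch_size=32):
        """Play experiences back in a random order."""
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def replay_ordered(self, indices):
        """Play experiences back in a pre-selected order."""
        return [self.buffer[i] for i in indices]
```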
There are two primary methods used to provide reinforcement learning algorithms with recalled experiences: the imagination replay method and the movie replay method. The DeepMind paper uses an analogy to describe both strategies:
“Suppose you come home and, to your surprise and dismay, discover water pooling on your beautiful wooden floors. Stepping into the dining room, you find a broken vase. Then you hear a whimper, and you glance out the patio door to see your dog looking very guilty.”
As reported by Rodriguez, the imagination replay method doesn’t record events in the order they were experienced. Rather, a probable causal relationship between the events is inferred, based on the agent’s understanding of the world. Meanwhile, the movie replay method stores memories in the order in which the events occurred and replays the sequence of stimuli – “spilled water, broken vase, dog” – preserving the chronological ordering of events.
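To make the contrast concrete, here is an illustrative toy sketch of the two strategies applied to the vase anecdote; the hard-coded causal ordering below is a stand-in for what the imagination method would actually infer from the agent's learned model of the world:

```python
# Events in the order they were experienced.
experienced_order = ["water on floor", "broken vase", "guilty dog"]

def movie_replay(events):
    # Replays stimuli exactly as experienced, preserving chronology.
    return list(events)

# Hypothetical inferred cause-and-effect chain: the dog knocked over
# the vase, which spilled the water.
inferred_causes = ["guilty dog", "broken vase", "water on floor"]

def imagination_replay(events, causal_order):
    # Reorders events by their position in the inferred causal chain
    # rather than by when they were observed.
    return sorted(events, key=causal_order.index)

print(movie_replay(experienced_order))                         # chronological
print(imagination_replay(experienced_order, inferred_causes))  # causal
```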
Research from the field of neuroscience implies that the movie replay method is integral to creating associations between concepts and forging neural connections between events. Yet the imagination replay method could help the agent create new sequences when it reasons by analogy. For instance, the agent could reason that if a barrel is to oil as a vase is to water, a barrel could be spilled by a factory robot instead of a dog. Indeed, when DeepMind probed further into the possibilities of the imagination replay method, they found that their learning agent was able to create impressive, innovative sequences by taking previous experiences into account.
Most of the current progress in reinforcement learning memory is being made with the movie strategy, although researchers have recently begun to make progress with the imagination strategy. Research into both methods of AI memory can not only enable better performance from reinforcement learning agents, but it can also help us gain new insight into how the human mind might function.
DeepMind and Google Brain Aim to Create Methods to Improve Efficiency of Reinforcement Learning
Reinforcement learning systems can be powerful and robust, able to carry out extremely complex tasks through thousands of iterations of training. While reinforcement learning algorithms are capable of enabling sophisticated and occasionally surprising behavior, they take a long time to train and require vast amounts of data. These factors make reinforcement learning techniques rather inefficient, and research teams from Alphabet’s DeepMind and Google Brain have recently endeavored to find more efficient methods of creating reinforcement learning systems.
As reported by VentureBeat, the combined research group recently proposed methods of making reinforcement learning training more efficient. One of the proposed improvements was an algorithm dubbed Adaptive Behavior Policy Sharing (ABPS), while the other was a framework called Universal Value Function Approximators (UVFA). ABPS lets pools of AI agents share their adaptively selected experiences, while UVFA lets those agents simultaneously learn directed exploration policies.
ABPS is intended to expedite the tuning of hyperparameters when training a model. ABPS makes finding the optimal hyperparameters quicker by allowing several agents with different hyperparameters to share their behavior policy experiences. More precisely, ABPS lets a reinforcement learning agent select actions according to a chosen behavior policy, after which the agent is granted a reward and an observation of the following state.
Reinforcement learning agents are trained with various combinations of possible hyperparameters, like decay rate and learning rate. When training a model, the goal is for the model to converge on the combination of hyperparameters that gives it the best performance, in this case including those that improve data efficiency. Efficiency is increased by training many agents at one time and choosing the behavior of only one agent to be deployed during the next time step. The target agent’s policy is used to sample actions. The transitions are then logged within a shared space, and this space is constantly evaluated so that policy selection doesn’t have to occur as often. At the end of training, an ensemble of agents is chosen and the top-performing agents are selected for final deployment.
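The paper's exact algorithm is more involved, but a toy sketch of the core idea, assuming stub agents and a stub environment, might look like this:

```python
import random

# A pool of agents with different hyperparameters takes turns acting,
# while every agent learns from one shared transition buffer. The agent
# and environment below are simplified stand-ins, not the ABPS paper's code.
class ToyAgent:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate   # one hyperparameter per agent
        self.recent_score = 0.0

    def act(self, state):
        return random.choice([0, 1])         # placeholder policy

    def update(self, batch):
        # A real agent would take a gradient step here; we just track the
        # mean reward in the sampled batch as a crude performance score.
        self.recent_score = sum(r for _, _, r, _ in batch) / len(batch)

def toy_env_step(state, action):
    return state + 1, random.random()        # (next_state, reward)

pool = [ToyAgent(lr) for lr in (1e-2, 1e-3, 1e-4)]
shared_buffer, state = [], 0

for step in range(500):
    # Periodically re-select which agent's policy drives the environment,
    # favoring agents that scored well on recent shared experience.
    if step % 50 == 0:
        behavior_agent = max(pool, key=lambda a: a.recent_score)
    action = behavior_agent.act(state)
    next_state, reward = toy_env_step(state, action)
    shared_buffer.append((state, action, reward, next_state))
    if len(shared_buffer) >= 32:
        batch = random.sample(shared_buffer, 32)
        for agent in pool:                   # every agent learns from the
            agent.update(batch)              # shared pool of transitions
    state = next_state

best = max(pool, key=lambda a: a.recent_score)   # kept for final deployment
```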
UVFA, meanwhile, attempts to deal with one of the common problems of reinforcement learning: agents that receive weak or sparse rewards often fail to learn tasks. UVFA attempts to solve the issue by having the agent learn separate exploitation and exploration policies at the same time. Separating the tasks creates a framework that allows the exploratory policies to keep exploring the environment while the exploitation policies continue trying to maximize the reward for the current task. The exploratory policies of UVFA serve as a baseline architecture that will continue to improve even if no natural rewards are being found. In such a condition, a function corresponding to intrinsic rewards is approximated, pushing the agents to explore all states in an environment, even if they often return to familiar states.
As VentureBeat explained, when the UVFA framework is in play, the intrinsic rewards of the system are given directly to the agent as inputs. The agent then keeps track of a representation of all inputs (such as rewards, action, and state) during a given episode. The result is that the reward is preserved over time and the agent’s policy is at least somewhat informed by it at all times.
This is accomplished with the use of an “episodic novelty” module and a “life-long novelty” module. The function of the first module is to hold the current episodic memory and map the current findings to the previously mentioned representation, letting the agent determine an intrinsic episodic reward for every step of training. Afterward, the state linked with the current observation is added to memory. Meanwhile, the life-long novelty module is responsible for influencing how often the agent explores over the course of many episodes.
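A simplified sketch of how these pieces could fit together is shown below; the distance-based novelty measure, the lifelong modulator, and the beta weighting are illustrative stand-ins for the learned components in the actual framework:

```python
import numpy as np

episodic_memory = []   # reset at the start of each episode

def episodic_novelty(embedding, k=10):
    """Large when the current observation's embedding is unlike anything
    stored so far this episode (mean distance to the k nearest neighbors)."""
    if not episodic_memory:
        episodic_memory.append(embedding)
        return 1.0
    dists = sorted(float(np.linalg.norm(m - embedding))
                   for m in episodic_memory)[:k]
    episodic_memory.append(embedding)
    return float(np.mean(dists))

def intrinsic_reward(embedding, lifelong_modulator):
    # The life-long module scales per-episode novelty up or down based on
    # how surprising the state remains across many episodes.
    return episodic_novelty(embedding) * lifelong_modulator

def augmented_reward(extrinsic, intrinsic, beta):
    # beta = 0 yields a pure exploitation policy; larger beta values yield
    # progressively more exploratory policies, all learned simultaneously.
    return extrinsic + beta * intrinsic
```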
According to the Alphabet/Google teams, the new training techniques have already demonstrated the potential for substantial improvement when training a reinforcement learning system. UVFA was able to double the performance of some of the base agents that played various Atari games. Meanwhile, ABPS was able to increase performance on some of the same Atari games, decreasing variance amongst the top-performing agents by approximately 25%. The UVFA-trained algorithm was able to achieve a high score in Pitfall by itself, without any engineered features or human demonstrations.
DeepMind Discovers AI Training Technique That May Also Work In Our Brains
DeepMind recently published a paper detailing how a newly developed type of reinforcement learning could potentially explain how reward pathways within the human brain operate. As reported by NewScientist, the machine learning training method is called distributional reinforcement learning, and the mechanisms behind it seem to plausibly explain how dopamine is released by neurons within the brain.
Neuroscience and computer science have a long history together. As far back as 1951, Marvin Minsky used a system of rewards and punishments to create a computer program capable of solving a maze. Minsky was inspired by the work of Ivan Pavlov, a physiologist who demonstrated that dogs could learn through a series of rewards and punishments. DeepMind’s new paper adds to the intertwining history of neuroscience and computer science by applying a type of reinforcement learning to gain insight into how dopamine neurons might function.
Whenever a person or animal is about to carry out an action, the collections of neurons in their brain responsible for the release of dopamine make a prediction about how rewarding the action will be. Once the action has been carried out and the consequences (rewards) of that action are made apparent, the brain releases dopamine. However, this dopamine release is scaled in accordance with the size of the error in prediction. If the reward is larger/better than expected, a stronger surge of dopamine is triggered. In contrast, a worse-than-expected reward leads to less dopamine being released. Dopamine thus serves a corrective function, prompting the neurons to tune their predictions until they converge on the rewards actually being earned. This is very similar to how reinforcement learning algorithms operate.
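The parallel can be captured in a classic temporal-difference-style update; the numbers here are made up purely for illustration:

```python
# The prediction error (actual minus predicted reward) plays the role of
# the dopamine signal: it scales the correction applied to the estimate.
predicted_value = 5.0   # how rewarding the action is expected to be
actual_reward = 8.0     # the outcome turns out better than expected
learning_rate = 0.1

prediction_error = actual_reward - predicted_value    # +3.0: dopamine surge
predicted_value += learning_rate * prediction_error   # estimate moves toward 8.0
```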
In 2017, DeepMind researchers released an enhanced version of a commonly used reinforcement learning algorithm, and this superior learning method was able to boost performance on many reinforcement learning tasks. The DeepMind team thought that the mechanisms behind the new algorithm could be used to better explain how dopamine neurons operate within the human brain.
In contrast to older reinforcement learning algorithms, DeepMind’s newer algorithm represents rewards as a distribution. Older reinforcement learning approaches represented estimated rewards as just a single number that stood for the average expected result. This change allowed the model to more accurately represent possible rewards and perform better as a result. The superior performance of the new training method prompted the DeepMind researchers to investigate if dopamine neurons in the human brain operate in a similar fashion.
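A minimal sketch of the difference, in the spirit of categorical distributional reinforcement learning (the support values and probabilities below are invented for illustration):

```python
import numpy as np

support = np.array([0.0, 1.0, 2.0, 4.0, 8.0])   # possible reward outcomes
probs = np.array([0.1, 0.2, 0.3, 0.3, 0.1])     # learned probability of each

# Older approaches collapse the estimate to one number: the expectation.
scalar_estimate = float(np.dot(support, probs))      # ~2.8

# Distributional RL keeps the whole distribution over outcomes instead.
distributional_estimate = list(zip(support.tolist(), probs.tolist()))
```

Keeping the full distribution lets the agent represent spread and risk in its predicted rewards rather than just their average, which is what allowed the newer algorithm to model possible outcomes more accurately.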
In order to investigate the workings of dopamine neurons, DeepMind worked alongside Harvard to research the activity of dopamine neurons in mice. The researchers had the mice perform various tasks and gave them rewards based on the roll of dice, recording how their dopamine neurons fired. Different neurons seemed to predict different potential results, releasing different amounts of dopamine. Some neurons predicted rewards lower than those actually received, while others predicted higher. After graphing the distribution of the reward predictions, the researchers found that it was fairly close to the genuine reward distribution. This suggests that the brain does make use of a distributional system when making predictions and adjusting them to better match reality.
The study could inform both neuroscience and computer science. It supports the use of distributional reinforcement learning as a method of creating more advanced AI models. Beyond that, it could have implications for our theories of how the brain’s reward systems operate. If dopamine neurons make distributional predictions, with some being more pessimistic or optimistic than others, understanding these distributions could alter how we approach aspects of psychology like mental health and motivation.
As MIT Technology Review reported, Matt Botvinick, the director of neuroscience research at DeepMind, explained the importance of the findings at a press briefing. Botvinick said:
“If the brain is using it, it’s probably a good idea. It tells us that this is a computational technique that can scale in real-world situations. It’s going to fit well with other computational processes. It gives us a new perspective on what’s going on in our brains during everyday life.”
Ubisoft Trains AI Agent To Drive A Car In A Racing Game
The term “AI” is used a lot in discussions of video games, but it typically refers to the logic that controls non-player characters, rather than to any system driven by what computer scientists would recognize as AI. Actual applications of AI utilizing artificial neural networks are fairly rare within the video game industry, but as VentureBeat reports, gaming company Ubisoft has recently published a paper investigating possible uses for an AI agent trained with reinforcement learning.
While entities like DeepMind and OpenAI have investigated how AIs perform in a variety of video games, like StarCraft 2, Dota 2, and Minecraft, very little research has been done on the use of AI under the specific constraints often faced by game developers. Ubisoft La Forge, the prototyping arm of Ubisoft, recently published a paper detailing an algorithm capable of carrying out predictable actions within a commercial video game. According to the report, the AI algorithms were capable of hitting current benchmarks and performing complex tasks reliably.
The authors of the paper note that while reinforcement learning has been used to great effect in the context of certain video games, often achieving parity with the best human players of said games, the systems created by OpenAI and DeepMind are rarely useful for game developers. The authors note that lack of accessibility is a large issue and that the most impressive results are obtained by research groups with access to large scale computational resources, resources that typically go well beyond what the average game developer has access to. Wrote the researchers:
“These systems have comparatively seen little use within the video game industry, and we believe lack of accessibility to be a major reason behind this. Indeed, really impressive results … are produced by large research groups with computational resources well beyond what is typically available within video game studios.”
The research team from Ubisoft aimed to remedy some of these problems by creating a reinforcement learning approach that optimized for constraints like data sample collection and runtime budgets. Ubisoft’s solution was adapted from research done at the University of California, Berkeley. The Soft Actor-Critic model developed by UC Berkeley researchers can generalize effectively to new conditions and is much more sample-efficient than most models. The Ubisoft team took this approach and adapted it for both discrete and continuous actions.
The Ubisoft research team evaluated the performance of their algorithm on three different games: two soccer games as well as a simple platformer-style game. While the results for these games were slightly worse than the state-of-the-art industry results, another test was conducted in which the algorithm performed much better. The researchers used a driving video game as their test case, having the AI agent follow a given path and negotiate obstacles in an environment the agent hadn’t seen during training. There were two continuous actions, steering and acceleration, as well as one binary action (braking).
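The paper's hybrid Soft Actor-Critic policy is learned, but the action interface it must produce each frame can be sketched as follows; the uniform random sampling here is only a placeholder for the trained policy:

```python
import random

# The hybrid action space: two continuous controls plus one binary control,
# sampled together every frame.
def sample_hybrid_action():
    steering = random.uniform(-1.0, 1.0)     # continuous: full left to full right
    acceleration = random.uniform(0.0, 1.0)  # continuous: no throttle to full
    braking = random.random() < 0.5          # discrete: brake or don't
    return steering, acceleration, braking
```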
The researchers summarized their results in the paper, declaring that the hybrid Soft Actor-Critic approach was successful when training an AI agent to drive at high speeds in a commercially available video game. According to the researchers, their training approach can potentially work for a wide variety of possible interaction approaches. These include instances where the AI agent has the exact same input options that the player has, demonstrating the “practical usefulness of such an algorithm for the video game industry.”