The human brain often recalls past memories (seemingly) unprompted. As we go throughout our day, we have spontaneous flashes of memory from our lives. While this spontaneous conjuration of memories has long been of interest to neuroscientists, AI research company DeepMind recently published a paper detailing how an AI of theirs replicated this strange pattern of recall.
The conjuration of memories in the brain, known as neural replay, is tightly linked with the hippocampus. The hippocampus is a seahorse-shaped formation in the brain that belongs to the limbic system, and it is associated with the formation of new memories, as well as the emotions that memories spark. Current theories on the role of the hippocampi (there is one in each hemisphere of the brain) state that different regions of the hippocampus are responsible for handling different types of memories. For instance, spatial memory is believed to be handled in the rear region of the hippocampus.
As reported by Jesus Rodriguez on Medium, Dr. John O’Keefe is responsible for many contributions to our understanding of the hippocampus, including the hippocampal “place” cells. The place cells in the hippocampus are triggered by stimuli in a specific environment. As an example, experiments on rats showed that specific neurons would fire when the rats ran through certain portions of a track. Researchers continued to monitor the rats even when they were resting, and they found that the same patterns of neurons denoting a portion of the maze would fire, although they fired at an accelerated speed. The rats seemed to be replaying the memories of the maze in their minds.
In humans, recalling memories is an important part of the learning process, but when trying to enable AI to learn, it is difficult to recreate the phenomenon.
The DeepMind team set about trying to recreate the phenomenon of recall using reinforcement learning. Reinforcement learning algorithms work by getting feedback from their interactions with the environment around them, getting rewarded whenever they take actions that bring them closer to the desired goal. In this context, the reinforcement learning agent records events and then plays them back at later times, with the system being reinforced to improve how efficiently it ends up recalling past experiences.
DeepMind added the replaying of experiences to a reinforcement learning algorithm using a replay buffer that would play back recorded experiences to the system at specific times. Some versions of the system had the experiences played back in random order, while other models had pre-selected playback orders. While the researchers experimented with the order of playback for the reinforcement agents, they also experimented with different methods of replaying the experiences themselves.
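To make the idea of a replay buffer concrete, here is a minimal sketch in Python. It is not DeepMind's implementation; the class name `ReplayBuffer`, its method names, and the (state, action, reward, next_state) tuple layout are assumptions chosen for illustration. It shows the two playback orders described above: random order and a fixed, pre-selected order (here, simply oldest first).

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size store of (state, action, reward, next_state) experiences."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest experience once capacity is reached
        self.memory = deque(maxlen=capacity)

    def record(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def replay_random(self, batch_size, rng=random):
        # Random-order playback: sample stored experiences without replacement
        batch_size = min(batch_size, len(self.memory))
        return rng.sample(list(self.memory), batch_size)

    def replay_in_order(self, batch_size):
        # Pre-selected playback order (an assumption here: oldest experiences first)
        return list(self.memory)[:batch_size]


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.record(state=step, action="right", reward=1.0, next_state=step + 1)

ordered = buffer.replay_in_order(2)
shuffled = buffer.replay_random(2, rng=random.Random(0))
```

Because the buffer holds only three experiences, the two oldest of the five recorded steps are discarded before any playback happens.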
There are two primary methods that are used to provide reinforcement algorithms with recalled experiences. These methods are the imagination replay method and the movie replay method. The DeepMind paper uses an analogy to describe both of the strategies:
“Suppose you come home and, to your surprise and dismay, discover water pooling on your beautiful wooden floors. Stepping into the dining room, you find a broken vase. Then you hear a whimper, and you glance out the patio door to see your dog looking very guilty.”
As reported by Rodriguez, the imagination replay method doesn’t record the events in the order that they were experienced. Rather, a probable causal link between the events is inferred, based on the agent’s understanding of the world. Meanwhile, the movie replay method stores memories in the order in which the events occurred and replays the sequence of stimuli: “spilled water, broken vase, dog”. The chronological ordering of events is preserved.
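The difference between the two strategies can be illustrated with a toy sketch built on the analogy above. Everything here is hypothetical: the `assumed_cause` mapping stands in for the agent's understanding of the world, and the function names are invented for this example rather than taken from the DeepMind paper.

```python
# Events in the order they were experienced (from the analogy above)
experienced = ["spilled water", "broken vase", "dog"]

# Hypothetical world model: each effect mapped to its assumed cause
assumed_cause = {"spilled water": "broken vase", "broken vase": "dog"}


def movie_replay(events):
    # Replay exactly as recorded, preserving chronological order
    return list(events)


def imagination_replay(events, cause_of):
    # Start from the event with no known cause, then follow cause -> effect links
    effect_of = {cause: effect for effect, cause in cause_of.items()}
    current = next(e for e in events if e not in cause_of)
    ordered = [current]
    while current in effect_of:
        current = effect_of[current]
        ordered.append(current)
    return ordered


movie = movie_replay(experienced)
imagined = imagination_replay(experienced, assumed_cause)
```

Under these assumptions, movie replay returns the events as they were seen, while imagination replay reorders them into the inferred causal chain, starting with the dog.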
Research from the field of neuroscience implies that the movie replay method is integral to the creation of associations between concepts and the connection of neurons between events. Yet the imagination replay method could help the agent create new sequences when it reasons by analogy. For instance, the agent could reason that if a barrel is to oil as a vase is to water, a barrel could be spilled by a factory robot instead of a dog. Indeed, when DeepMind probed further into the possibilities of the imagination replay method, they found that their learning agent was able to create impressive, innovative sequences by taking previous experiences into account.
Most of the current progress in the area of reinforcement learning memory is being made with the movie strategy, although researchers have recently begun to make progress with the imagination strategy. Research into both methods of AI memory can not only enable better performance from reinforcement learning agents, but it can also help us gain new insight into how the human mind might function.
What Is Reinforcement Learning?
Put simply, reinforcement learning is a machine learning technique that involves training an artificial intelligence agent through the repetition of actions and associated rewards. A reinforcement learning agent experiments in an environment, taking actions and being rewarded when the correct actions are taken. Over time, the agent learns to take the actions that will maximize its reward. That’s a quick definition of reinforcement learning, but taking a closer look at the concepts behind reinforcement learning will help you gain a better, more intuitive understanding of it.
Reinforcement In Psychology
The term “reinforcement learning” is adapted from the concept of reinforcement in psychology. For that reason, let’s take a moment to understand the psychological concept of reinforcement. In the psychological sense, the term reinforcement refers to something that increases the likelihood that a particular response/action will occur. This concept of reinforcement is a central idea of the theory of operant conditioning, initially proposed by the psychologist B.F. Skinner. In this context, reinforcement is anything that causes the frequency of a given behavior to increase. If we think about possible reinforcement for humans, these can be things like praise, a raise at work, candy, and fun activities.
In the traditional, psychological sense, there are two types of reinforcement. There’s positive reinforcement and negative reinforcement. Positive reinforcement is the addition of something to increase a behavior, like giving your dog a treat when it is well behaved. Negative reinforcement involves removing a stimulus to elicit a behavior, like shutting off loud noises to coax out a skittish cat.
Positive and Negative Reinforcement In Machine Learning
Both positive and negative reinforcement increase the frequency of a behavior: positive reinforcement by adding a reward, and negative reinforcement by removing something unpleasant. In general, positive reinforcement is the most common type of reinforcement used in reinforcement learning, as it helps models maximize their performance on a given task. Not only that, but positive reinforcement leads the model to make more sustainable changes, which can become consistent patterns and persist for long periods of time.
In contrast, while negative reinforcement also makes a behavior more likely to occur, it is used for maintaining a minimum performance standard rather than reaching a model’s maximum performance. Negative reinforcement in reinforcement learning can help ensure that a model is kept away from undesirable actions, but it can’t really make a model explore desired actions.
Training A Reinforcement Agent
Imagine that we are training a reinforcement agent to play a platforming video game where the AI’s goal is to make it to the end of the level by moving right across the screen. The initial state of the game is drawn from the environment, meaning the first frame of the game is analyzed and given to the model. Based on this information, the model must decide on an action.
During the initial phases of training, these actions are random, but as the model is reinforced, certain actions will become more common. After the action is taken, the environment of the game is updated and a new state or frame is created. If the action taken by the agent produced a desirable result (let’s say in this case that the agent is still alive and hasn’t been hit by an enemy), some reward is given to the agent and it becomes more likely to take the same action in the future.
This basic system is constantly looped, happening again and again, and each time the agent tries to learn a little more and maximize its reward.
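The loop described above can be sketched in a few lines of Python. This is a deliberately simplified toy, not a real game or a standard algorithm: the one-dimensional "level", the `GOAL` length, the `step` function, and the reward-weighted `preference` table are all assumptions made for illustration.

```python
import random

GOAL = 5  # hypothetical level length: the episode ends when the agent reaches it


def step(position, action):
    """Toy environment: returns (new_position, reward, done)."""
    new_position = max(0, position + (1 if action == "right" else -1))
    reward = 1.0 if new_position > position else 0.0  # reward progress to the right
    return new_position, reward, new_position >= GOAL


def train(episodes=200, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Equal preferences mean the first actions are effectively random
    preference = {"left": 1.0, "right": 1.0}
    for _ in range(episodes):
        position, done = 0, False
        while not done:
            actions = list(preference)
            weights = [preference[a] for a in actions]
            action = rng.choices(actions, weights=weights)[0]  # agent picks an action
            position, reward, done = step(position, action)    # environment updates
            preference[action] += learning_rate * reward       # rewarded actions become likelier
    return preference


prefs = train()
```

After training, the preference for moving right is far larger than the preference for moving left, because only rightward progress was ever rewarded.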
Episodic vs Continuing Tasks
Reinforcement learning tasks can typically be placed in one of two different categories: episodic tasks and continuing tasks.
Episodic tasks carry out the learning/training loop and improve their performance until some end criteria are met and the episode is terminated. In a game, this might be reaching the end of the level or falling into a hazard like spikes. In contrast, continuing tasks have no termination criteria, essentially continuing to train until the engineer chooses to end the training.
Monte Carlo vs Temporal Difference
There are two primary ways of training a reinforcement learning agent. In the Monte Carlo approach, rewards are delivered to the agent (its score is updated) only at the end of the training episode. To put that another way, only when the termination condition is hit does the model learn how well it performed. It can then use this information to update, and when the next training round is started it will respond in accordance with the new information.
The temporal-difference method differs from the Monte Carlo method in that the value estimation, or the score estimation, is updated during the course of the training episode. Once the model advances to the next time step the values are updated.
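The contrast between the two approaches is easiest to see in the update rules themselves. The sketch below uses the standard textbook forms (every-visit Monte Carlo and one-step temporal difference, often written TD(0)); the function names, the step size `alpha`, and the discount factor `gamma` are illustrative choices not mentioned in the text above.

```python
def monte_carlo_update(values, episode, alpha=0.1, gamma=0.9):
    """Update value estimates only after the episode has ended.

    episode is a list of (state, reward) pairs in the order they occurred.
    """
    g = 0.0
    for state, reward in reversed(episode):
        g = reward + gamma * g  # discounted return from this step to the end
        values[state] += alpha * (g - values[state])
    return values


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9, done=False):
    """Update a value estimate immediately after a single time step."""
    # Bootstrap from the current estimate of the next state instead of waiting
    target = reward + (0.0 if done else gamma * values[next_state])
    values[state] += alpha * (target - values[state])
    return values


mc_values = monte_carlo_update({"a": 0.0, "b": 0.0}, [("a", 0.0), ("b", 1.0)])
td_values = td0_update({"a": 0.0, "b": 1.0}, "a", reward=0.0, next_state="b")
```

The Monte Carlo function needs the whole episode before it can compute a return, while the temporal-difference function needs only one transition and the existing estimate for the next state.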
Explore vs Exploit
Training a reinforcement learning agent is a balancing act, involving the balancing of two different metrics: exploration and exploitation.
Exploration is the act of collecting more information about the surrounding environment, while exploitation is using the information already known about the environment to earn reward points. If an agent only explores and never exploits the environment, the desired actions will never be carried out. On the other hand, if the agent only exploits and never explores, the agent will only learn to carry out one action and won’t discover other possible strategies of earning rewards. Therefore, balancing exploration and exploitation is critical when creating a reinforcement learning agent.
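One common way to strike this balance is the epsilon-greedy rule, which the text above does not name but which serves as a simple illustration. In this sketch, `q_values` is an assumed dictionary of estimated rewards per action, and `epsilon` is the probability of exploring.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))      # explore: try any action at random
    return max(q_values, key=q_values.get)     # exploit: take the best-known action


estimates = {"left": 0.2, "right": 0.8}
greedy_action = epsilon_greedy(estimates, epsilon=0.0)
explored_action = epsilon_greedy(estimates, epsilon=1.0, rng=random.Random(0))
```

With `epsilon=0.0` the agent always exploits and picks "right"; with `epsilon=1.0` it always explores. Values in between trade one off against the other.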
Uses For Reinforcement Learning
Reinforcement learning can be used in a wide variety of roles, and it is best suited for applications where tasks require automation.
Automation of tasks to be carried out by industrial robots is one area where reinforcement learning proves useful. Reinforcement learning can also be used for problems like text mining, creating models that are able to summarize long bodies of text. Researchers are also experimenting with using reinforcement learning in the healthcare field, with reinforcement agents handling jobs like the optimization of treatment policies. Reinforcement learning could also be used to customize educational material for students.
Reinforcement learning is a powerful method of constructing AI agents that can lead to impressive and sometimes surprising results. Training an agent through reinforcement learning can be complex and difficult, as it takes many training iterations and a delicate balance of the explore/exploit dichotomy. However, if successful, an agent created with reinforcement learning can carry out complex tasks under a wide variety of different environments.
AI Agents Demonstrate Emergent Intelligence Properties In Virtual Hide And Seek
One of the interesting facts about AI research is that AI systems can often execute actions and pursue strategies that surprise the very researchers designing them. This happened during a recent virtual game of hide and seek where multiple AI agents were pitted against one another. Researchers at OpenAI, an AI firm based out of San Francisco, were surprised to find that their AI agents started exploiting strategies in the game world that the researchers didn’t even know existed.
OpenAI has trained a group of AI agents to play a hide and seek game with each other. The AI programs are trained with reinforcement learning, a technique where the desired behavior is elicited from the AI algorithms by providing the algorithms with feedback. The AI starts out by taking random actions, and every time it takes an action that gets it closer to its goal, the agent is rewarded. The AI seeks to gain the maximum amount of reward possible, so it will experiment to see which actions gain it more reward. Through trial and error, the AI is capable of identifying the strategies that will bring it to victory, those which will give it the most reward.
Reinforcement learning has already demonstrated impressive success at learning the rules of games. OpenAI recently trained a team of AI agents to play the multiplayer online battle arena (MOBA) game Dota 2, and the AI defeated a world-champion team of human players last year. A similar thing happened with the game StarCraft when an AI was trained on the game by DeepMind. Reinforcement learning has also been used to teach AI programs to play Pictionary with humans, learning to interpret pictures and use basic common sense reasoning.
In the hide and seek video game created by the researchers, multiple AI agents were pitted against one another. The result was an arms race of sorts, where each agent wants to outperform the other and obtain the most reward points. A new strategy adopted by one agent will cause its opponent to seek a new strategy to counter it, and vice-versa. Igor Mordatch, a researcher at OpenAI, explained to IEEE Spectrum that the experiment demonstrates that this process of trial and error playing between agents “is enough for the agents to learn surprising behaviors on their own—it’s like children playing with each other.”
What were the surprising behaviors exactly? The researchers had four basic strategies that they expected the AI agents to learn, and the agents learned these fairly quickly, becoming competent in them after just 25 million simulated games. The game took place in a 3D environment full of ramps, blocks, and walls. The AI agents learned to chase each other around, move blocks to build forts they could hide in, and move ramps around. The AI seekers learned to drag ramps around to get inside the hiders’ forts, while the hiders learned to take the ramps into their forts so the seekers couldn’t use them.
However, around the benchmark of 380 million games, something unexpected happened. The AI agents learned to use two strategies the researchers didn’t expect. The seeker agents learned that by jumping onto a box and tilting/riding the box towards a nearby fort, they could jump into the fort and find the hider. The researchers hadn’t even realized that this was possible within the physics of the game environment. The hiders learned to deal with this issue by dragging the boxes into place within their fort.
While the unexpected behavior of agents trained on reinforcement learning algorithms is harmless in this instance, it does raise some potential concerns about how reinforcement learning is applied to other situations. A member of the OpenAI research team, Bowen Baker, explained to IEEE Spectrum that these unexpected behaviors could be potentially dangerous. After all, what if robots started behaving in unexpected ways?
“Building these environments is hard,” Baker explained. “The agents will come up with these unexpected behaviors, which will be a safety problem down the road when you put them in more complex environments.”
However, Baker also explained that reinforcement strategies could lead to innovative solutions to current problems. Systems trained with reinforcement learning could solve a wide array of problems with solutions we may not even be able to imagine.
AI Used To Create Drug Molecule That Could Fight Fibrosis
Creating new medical drugs is a complex process that can take years of research and billions of dollars. Yet it’s also an important investment to make for people’s health. Artificial intelligence could potentially make the discovery of new drugs easier and substantially quicker if the recent work of the startup Insilico Medicine continues to make progress. As reported by SingularityHub, the AI startup has recently utilized AI to design a molecule that could combat fibrosis.
Given how complex and time-consuming the process of discovering new molecules for a drug is, scientists and engineers are constantly looking for ways to expedite it. The idea of using computers to help discover new drugs is nothing new, as the concept has existed for decades. However, progress on this front has been slow, with engineers struggling to find the right algorithms for drug creation.
Deep learning has started to make AI-driven drug discovery more viable, with pharmaceutical companies investing heavily in AI startups over the past few years. One company has managed to use AI to design a molecule that could combat fibrosis, taking only 46 days to dream up a molecule resembling therapeutic drugs. Insilico Medicine combined two different deep learning techniques to achieve this result: reinforcement learning and generative adversarial networks (GANs).
Reinforcement learning is a machine learning method that encourages the machine learning model to make certain decisions by providing the network with feedback that elicits certain responses. The model can be punished for making undesirable choices or rewarded for making desirable choices. By using a combination of both negative and positive reinforcement the model is guided toward making desired decisions, and it will trend towards making decisions that minimize punishment and maximize reward.
Meanwhile, generative adversarial networks are “adversarial” because they consist of two different neural networks pitted against one another. The two networks are given examples of objects to train on, frequently images. The job of one network is to create a counterfeit object, something sufficiently similar to the real object that it can be confused for the genuine article. The job of the second network is to detect counterfeit objects. The two networks try to outperform the other network, and as they are both increasing their performance to overcome the other network, this virtual arms race leads to the counterfeit model generating objects that are nearly indistinguishable from the real thing.
By combining GANs and reinforcement learning algorithms, the researchers were able to have their models produce new drug molecules extremely similar to already existing therapeutic drugs.
The results of Insilico Medicine’s experiments with AI drug discovery were recently published in the journal Nature Biotechnology. In the paper, the researchers discuss how the deep learning models were trained. The researchers took representations of molecules already used in drugs that act on proteins involved in idiopathic pulmonary fibrosis (IPF). These molecules were used as the basis for training, and the combined models were able to generate around 30,000 possible drug molecules.
The researchers then sorted through the 30,000 candidate molecules and selected the six most promising molecules for lab testing. These six finalists were synthesized in the lab and used in a series of tests that tracked their ability to target the IPF protein. One molecule, in particular, seemed promising, as it delivered the kind of results that are desired in a medical drug.
It’s important to note that the fibrosis protein targeted in the experiment has already been extensively researched, with multiple effective drugs already existing for it. The researchers could reference these drugs, and this gave the research team a leg up, as they had a substantial amount of data to train their models on. This doesn’t hold true for many other diseases, and as a result, there is a larger gap to close on these treatments.
Another important fact is that the company’s current drug development model only deals with the initial discovery process, and the molecules generated by their model will still require many tweaks and optimizations before they could potentially be used in clinical trials.
According to Wired, Insilico Medicine’s CEO Alex Zhavoronkov acknowledges that their AI-driven drug isn’t ready for field use, with the current study just being a proof of concept. The goal of this experiment was to see how quickly a drug could be designed with the assistance of AI systems. However, Zhavoronkov notes that the researchers were able to design a potentially useful molecule much faster than they could have if they had used regular drug discovery methods.
Despite the caveats, Insilico Medicine’s research still represents a notable advancement in the usage of AI to create new drugs. The refinement of the techniques used in the study could substantially shorten the amount of time required to develop a new drug. This could prove especially useful in an era where antibiotic-resistant bacteria are proliferating and many previously effective drugs are losing their potency.