
Reinforcement Learning

AI Agents Demonstrate Emergent Intelligence Properties In Virtual Hide And Seek


One of the fascinating aspects of AI research is that systems often execute actions and pursue strategies that surprise the very researchers who design them. This happened during a recent virtual game of hide and seek in which multiple AI agents were pitted against one another. Researchers at OpenAI, an AI firm based in San Francisco, were surprised to find that their agents began exploiting strategies in the game world that the researchers didn’t even know existed.

OpenAI trained a group of AI agents to play a hide and seek game with each other. The agents are trained with reinforcement learning, a technique in which desired behavior is elicited from an algorithm by providing it with feedback. The AI starts out by taking random actions, and every time an action brings it closer to its goal, the agent is rewarded. Because the agent seeks to maximize its total reward, it experiments to discover which actions earn more of it. Through trial and error, each agent learns to identify the strategies that lead to victory, namely those that yield the most reward.
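The reward-driven loop described above can be sketched with a toy tabular Q-learning agent. This is a minimal illustration of reinforcement learning in general, not OpenAI’s actual hide-and-seek setup, which uses large-scale self-play with neural network policies:

```python
import random

# Toy tabular Q-learning: an agent in a short corridor earns reward 1
# for reaching the rightmost cell.
N_STATES = 5          # corridor cells 0..4; the goal is cell 4
ACTIONS = [-1, +1]    # step left or step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

random.seed(0)
alpha, gamma, eps = 0.5, 0.9, 0.1
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # mostly exploit learned values, occasionally explore at random
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        nxt, r = step(s, a)
        # reward feedback nudges the value of the chosen action
        q[(s, a)] += alpha * (r + gamma * max(q[(nxt, b)] for b in ACTIONS) - q[(s, a)])
        s = nxt

# after training, the greedy policy prefers moving right in every cell
policy = [max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

After enough episodes, the learned values favor stepping right everywhere, so the greedy policy walks straight to the goal; no one hard-codes that route, it emerges from the reward signal alone.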

Reinforcement learning has already demonstrated impressive success at learning the rules of games. OpenAI recently trained a team of AI agents to play the multiplayer battle arena game Dota 2, and the AI defeated a world-champion team of human players last year. A similar thing happened with StarCraft II when DeepMind trained an AI on the game. Reinforcement learning has also been used to teach AI programs to play Pictionary with humans, learning to interpret pictures and apply basic common-sense reasoning.

In the hide and seek video game created by the researchers, multiple AI agents were pitted against one another. The result was an arms race of sorts, where each agent wants to outperform the other and obtain the most reward points. A new strategy adopted by one agent will cause its opponent to seek a new strategy to counter it, and vice-versa. Igor Mordatch, a researcher at OpenAI, explained to IEEE Spectrum that the experiment demonstrates that this process of trial and error playing between agents “is enough for the agents to learn surprising behaviors on their own—it’s like children playing with each other.”

What were the surprising behaviors, exactly? The researchers expected the AI agents to learn four basic strategies, and the agents picked these up fairly quickly, becoming competent in them after just 25 million simulated games. The game took place in a 3D environment full of ramps, blocks, and walls. The AI agents learned to chase each other around, move blocks to build forts they could hide in, and move ramps around. The AI seekers learned to drag ramps over to get inside the hiders’ forts, while the hiders learned to take the ramps into their forts so the seekers couldn’t use them.

However, at around the 380 million game mark, something unexpected happened: the AI agents learned two strategies the researchers hadn’t anticipated. The seeker agents discovered that by jumping onto a box and tilting or riding it toward a nearby fort, they could leap into the fort and find the hider. The researchers hadn’t even realized this was possible within the physics of the game environment. The hiders learned to counter the tactic by dragging the boxes into place within their fort.

While the unexpected behavior of agents trained on reinforcement learning algorithms is harmless in this instance, it does raise some potential concerns about how reinforcement learning is applied to other situations. A member of the OpenAI research team, Bowen Baker, explained to IEEE Spectrum that these unexpected behaviors could be potentially dangerous. After all, what if robots started behaving in unexpected ways?

“Building these environments is hard,” Baker explained. “The agents will come up with these unexpected behaviors, which will be a safety problem down the road when you put them in more complex environments.”

However, Baker also explained that reinforcement learning strategies could lead to innovative solutions to current problems. Systems trained with reinforcement learning could solve a wide array of problems with solutions we may not even be able to imagine.


Blogger and programmer with specialties in machine learning and deep learning topics. Daniel hopes to help others use the power of AI for social good.

Reinforcement Learning

DeepMind Creates AI That Replays Memories Like The Hippocampus


The human brain often recalls past memories (seemingly) unprompted. As we go throughout our day, we have spontaneous flashes of memory from our lives. While this spontaneous conjuration of memories has long been of interest to neuroscientists, AI research company DeepMind recently published a paper detailing how an AI of theirs replicated this strange pattern of recall.

The conjuration of memories in the brain, called neural replay, is tightly linked with the hippocampus. The hippocampus is a seahorse-shaped formation in the brain that belongs to the limbic system, and it is associated with the formation of new memories, as well as the emotions that memories spark. Current theories on the role of the hippocampi (there is one in each hemisphere of the brain) state that different regions of the hippocampus are responsible for handling different types of memories. For instance, spatial memory is believed to be handled in the rear region of the hippocampus.

As reported by Jesus Rodriguez on Medium, Dr. John O’Keefe is responsible for many contributions to our understanding of the hippocampus, including the discovery of hippocampal “place” cells. The place cells in the hippocampus are triggered by stimuli in a specific environment. As an example, experiments on rats showed that specific neurons would fire when the rats ran through certain portions of a track. Researchers continued to monitor the rats while they rested and found that the same patterns of neurons denoting portions of the track would fire, although at an accelerated speed. The rats seemed to be replaying their memories of the track in their minds.

In humans, recalling memories is an important part of the learning process, but the phenomenon is difficult to recreate when trying to enable AI to learn.

The DeepMind team set about trying to recreate the phenomenon of recall using reinforcement learning. Reinforcement learning algorithms work by getting feedback from their interactions with the environment around them, getting rewarded whenever they take actions that bring them closer to the desired goal. In this context, the reinforcement learning agent records events and then plays them back at later times, with the system being reinforced to improve how efficiently it ends up recalling past experiences.

DeepMind added the replaying of experiences to a reinforcement learning algorithm using a replay buffer that would playback memories/recorded experiences to the system at specific times. Some versions of the system had the experiences played back in random orders while other models had pre-selected playback orders. While the researchers experimented with the order of playback for the reinforcement agents, they also experimented with different methods of replaying the experiences themselves.
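A replay buffer of the kind described can be sketched as follows; the class name and structure here are illustrative, not DeepMind’s actual code:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores experience tuples and plays them back on demand."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)  # oldest memories drop out when full

    def record(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def replay(self, batch_size, ordered=False):
        if ordered:
            # pre-selected playback: the most recent experiences,
            # in the order they were lived
            return list(self.buffer)[-batch_size:]
        # random playback: a sample with the order of events discarded
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = ReplayBuffer()
for t in range(10):
    buf.record(t, "move", 0.1 * t, t + 1)

ordered = buf.replay(3, ordered=True)
print([e[0] for e in ordered])  # the last three states, in chronological order
```

Switching the `ordered` flag is what lets researchers compare random playback against chronological playback for the same recorded experience.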

There are two primary methods that are used to provide reinforcement algorithms with recalled experiences. These methods are the imagination replay method and the movie replay method. The DeepMind paper uses an analogy to describe both of the strategies:

“Suppose you come home and, to your surprise and dismay, discover water pooling on your beautiful wooden floors. Stepping into the dining room, you find a broken vase. Then you hear a whimper, and you glance out the patio door to see your dog looking very guilty.”

As reported by Rodriguez, the imagination replay method doesn’t record events in the order they were experienced. Rather, a probable causal chain between the events is inferred based on the agent’s understanding of the world. Meanwhile, the movie replay method stores memories in the order in which the events occurred and replays the sequence of stimuli – “spilled water, broken vase, dog” – so the chronological ordering of events is preserved.
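The vase analogy can be made concrete with a toy example (our own illustration of the two orderings, not DeepMind’s implementation). “Movie” replay sorts events by when they were experienced, while “imagination” replay reorders them according to an assumed causal model of the world:

```python
# Each event is tagged with the time at which it was experienced:
# you saw the water first, then the vase, then the dog.
observed = [("broken vase", 2), ("guilty dog", 3), ("spilled water", 1)]

# movie replay: order events by when each was experienced
movie = [event for event, t in sorted(observed, key=lambda x: x[1])]

# imagination replay: reorder by an inferred cause-to-effect chain
# (the causal ranking here is an assumed world model, chosen by hand)
causal_rank = {"guilty dog": 0, "broken vase": 1, "spilled water": 2}
imagination = sorted((event for event, _ in observed), key=causal_rank.get)

print(movie)        # experienced order: water, vase, dog
print(imagination)  # causal order: dog, vase, water
```

The two lists contain the same memories; only the ordering principle differs, which is exactly the distinction the paper’s analogy is drawing.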

Research from the field of neuroscience implies that the movie replay method is integral to creating associations between concepts and linking events together. Yet the imagination replay method could help the agent create new sequences when it reasons by analogy. For instance, the agent could reason that if a barrel is to oil as a vase is to water, a barrel could be spilled by a factory robot instead of a dog. Indeed, when DeepMind probed further into the possibilities of the imagination replay method, they found that their learning agent was able to create impressive, innovative sequences by taking previous experiences into account.

Most of the current progress in reinforcement learning memory is being made with the movie strategy, although researchers have recently begun to make progress with the imagination strategy as well. Research into both methods of AI memory can not only enable better performance from reinforcement learning agents, but can also give us new insight into how the human mind might function.


Healthcare

AI Used To Create Drug Molecule That Could Fight Fibrosis


Creating new medical drugs is a complex process that can take years of research and billions of dollars. Yet it’s also an important investment to make for people’s health. Artificial intelligence could potentially make the discovery of new drugs easier and substantially quicker if the recent work of the startup Insilico Medicine continues to make progress. As reported by SingularityHub, the AI startup has recently utilized AI to design a molecule that could combat fibrosis.

Given how complex and time-consuming the process of discovering new molecules for a drug is, scientists and engineers are constantly looking for ways to expedite it. The idea of using computers to help discover new drugs is nothing new, as the concept has existed for decades. However, progress on this front has been slow, with engineers struggling to find the right algorithms for drug creation.

Deep learning has started to make AI-driven drug discovery more viable, with pharmaceutical companies investing heavily in AI startups over the past few years. Insilico Medicine managed to use AI to design a molecule that could combat fibrosis, taking only 46 days to dream up a molecule resembling therapeutic drugs. The company combined two different deep learning techniques to achieve this result: reinforcement learning and generative adversarial networks (GANs).

Reinforcement learning is a machine learning method that encourages the machine learning model to make certain decisions by providing the network with feedback that elicits certain responses. The model can be punished for making undesirable choices or rewarded for making desirable choices. By using a combination of both negative and positive reinforcement the model is guided toward making desired decisions, and it will trend towards making decisions that minimize punishment and maximize reward.

Meanwhile, generative adversarial networks are “adversarial” because they consist of two neural networks pitted against one another. The networks are given examples of objects to train on, frequently images. The job of one network, the generator, is to create a counterfeit object sufficiently similar to the real objects that it can be confused for the genuine article. The job of the second network, the discriminator, is to detect the counterfeits. Each network tries to outperform the other, and as both improve to overcome their opponent, this virtual arms race leads the generator to produce objects that are nearly indistinguishable from the real thing.
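The adversarial loop can be sketched with a deliberately tiny example: a one-parameter “generator” that shifts random noise toward the real data distribution, and a logistic “discriminator” that scores how real a sample looks. This is a toy illustration of the GAN training dynamic in plain NumPy, not Insilico Medicine’s model:

```python
import numpy as np

rng = np.random.default_rng(0)
REAL_MEAN = 4.0  # the "genuine articles" are draws from N(4, 1)

# one-parameter generator: sample = noise + g_shift
g_shift = 0.0
# logistic-regression discriminator on the raw sample value
d_w, d_b = 0.0, 0.0

def discriminate(x):
    """Discriminator's estimate of P(sample is real)."""
    return 1.0 / (1.0 + np.exp(-(d_w * x + d_b)))

lr = 0.05
for step in range(2000):
    real = rng.normal(REAL_MEAN, 1.0)
    fake = rng.normal(0.0, 1.0) + g_shift

    # discriminator step: push real scores toward 1, fake scores toward 0
    for x, label in ((real, 1.0), (fake, 0.0)):
        p = discriminate(x)
        d_w += lr * (label - p) * x
        d_b += lr * (label - p)

    # generator step: move fakes in the direction the discriminator rewards
    # (gradient of log D(fake) with respect to g_shift)
    p = discriminate(fake)
    g_shift += lr * (1.0 - p) * d_w

print(g_shift)  # drifts toward the real mean as the counterfeits improve
```

As the generator’s output distribution approaches the real one, the discriminator can no longer tell the two apart, which is the equilibrium the arms race in the text describes.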

By combining both GANs and reinforcement learning algorithms, the researchers were able to have their models produce new drug molecules extremely similar to already existing therapeutic drugs.

The results of Insilico Medicine’s experiments with AI drug discovery were recently published in the journal Nature Biotechnology. In the paper, the researchers discuss how the deep learning models were trained. The researchers took representations of molecules already used in drugs that target proteins involved in idiopathic pulmonary fibrosis, or IPF. These molecules were used as the basis for training, and the combined models were able to generate around 30,000 possible drug molecules.

The researchers then sorted through the 30,000 candidate molecules and selected the six most promising for lab testing. These six finalists were synthesized in the lab and used in a series of tests that tracked their ability to target the IPF protein. One molecule in particular seemed promising, as it delivered the kind of results that are desired in a medical drug.
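The winnowing step (generate many candidates, rank them, keep a handful for synthesis) can be sketched as follows; the scoring function here is a random placeholder standing in for whatever affinity model the team actually used, and the molecule names are hypothetical:

```python
import random

random.seed(42)
# stand-ins for the ~30,000 generated candidate molecules
candidates = [f"mol_{i}" for i in range(30_000)]
# placeholder predicted-affinity score for each candidate
scores = {m: random.random() for m in candidates}

# rank by predicted score and keep the six most promising for lab synthesis
finalists = sorted(candidates, key=scores.get, reverse=True)[:6]
print(len(finalists))
```

Whatever the real scoring model is, the pipeline shape is the same: cheap in-silico scoring filters tens of thousands of candidates down to the few worth the cost of wet-lab testing.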

It’s important to note that the fibrosis drug targeted in the experiment has already been extensively researched, with multiple effective drugs already existing for it. The researchers could reference these drugs, and this gave the research team a leg up as they had a substantial amount of data to train their models on. This doesn’t hold true for many other diseases, and as a result, there is a larger gap to close on these treatments.

Another important fact is that the company’s current drug development model only deals with the initial discovery process, and the molecules generated by the model will still require many tweaks and optimizations before they could potentially be used in clinical trials.

According to Wired, Insilico Medicine’s CEO Alex Zhavoronkov acknowledges that their AI-designed drug isn’t ready for field use, with the current study being just a proof of concept. The goal of the experiment was to see how quickly a drug could be designed with the assistance of AI systems. However, Zhavoronkov notes that the researchers were able to design a potentially useful molecule much faster than they could have using regular drug discovery methods.

Despite the caveats, Insilico Medicine’s research still represents a notable advancement in the use of AI to create new drugs. Refining the techniques used in the study could substantially shorten the time required to develop a new drug. This could prove especially useful in an era when antibiotic-resistant bacteria are proliferating and many previously effective drugs are losing their potency.
