
Computer Scientists Use Positive Reinforcement to Teach Robots

Image: Johns Hopkins University

Computer scientists at Johns Hopkins University have applied positive reinforcement, a long-standing training technique often used on animals such as dogs, to a robot so that it could teach itself new tricks. Among those new skills was the ability to stack blocks.

The robot is called Spot, and according to the researchers, it can learn skills within days that traditionally take around a month.

Positive Reinforcement

Positive reinforcement was used by the team to increase the robot’s skill sets. The speed at which the team was able to do this makes it easier for these types of robots to be deployed in the real world.

The work, titled “Good Robot!: Efficient Reinforcement Learning for Multi-Step Visual Tasks with Sim to Real Transfer,” was published in IEEE Robotics and Automation Letters.

Andrew Hundt is a PhD student at Johns Hopkins University and the lead author of the research.

“The question here was how do we get the robot to learn a skill?” he said. “I’ve had dogs so I know rewards work and that was the inspiration for how I designed the learning algorithm.”

One reason positive reinforcement works on computers is that they do not have intuitive brains, meaning they are essentially a blank canvas onto which anything can be projected. In other words, they must learn everything from scratch. One of the most effective ways for computers to learn is trial and error, an approach roboticists are still refining today.

This is exactly what the researchers did when they created a reward system for the robot, similar to training a dog by giving it treats. The difference is that the robot receives numeric points when it completes a task correctly.
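To make the idea of trial and error with numeric points concrete, here is a minimal sketch in Python. It is not the researchers' algorithm. It is a toy learner that tries actions, records the points each one earns, and gradually favors the best-scoring action. The action names, the point values, and the `train` function are all hypothetical and chosen only for illustration.

```python
import random

def train(reward_for, actions, episodes=500, epsilon=0.1, seed=0):
    """Learn by trial and error which action earns the most points.

    Hypothetical toy learner, not the method from the paper.
    """
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}  # running average of points per action
    count = {a: 0 for a in actions}    # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore: try something at random
        else:
            action = max(actions, key=value.get)  # exploit: repeat the best-known action
        reward = reward_for(action)
        count[action] += 1
        # Incremental update of the average reward seen for this action.
        value[action] += (reward - value[action]) / count[action]
    return value

# Hypothetical task: only "grasp" earns a point, everything else earns nothing.
ACTIONS = ["push", "grasp", "wait"]
values = train(lambda a: 1.0 if a == "grasp" else 0.0, ACTIONS)
best = max(ACTIONS, key=values.get)
print(best, values)
```

At the start every action looks equally worthless, so the learner only finds the rewarded action through occasional random exploration. Once it has earned points for an action, it keeps returning to it, much as a dog repeats whatever earned the treat.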

Skills Learned

When it came to learning how to stack blocks, the robot had to learn to focus on constructive actions. Under this method, Spot earned more points for correct behaviors while stacking the blocks and nothing for incorrect ones. It earned the most points by completing a four-block stack with the last block on top.

The researchers saw great success with this method, with the robot learning in days what would have taken weeks in the past. By first training a simulated robot, the team reduced practice time before moving on to the Spot robot.
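The scoring scheme described above can be sketched as a small reward function. This is an illustration under stated assumptions, not the reward used in the paper: the function name `stacking_reward`, its parameters, and the specific point values are hypothetical. It only mirrors the three properties reported here: progress earns points, incorrect behavior earns nothing, and finishing the four-block stack earns the most.

```python
def stacking_reward(height_before: int, height_after: int, goal_height: int = 4) -> float:
    """Score one action by how it changed the stack (hypothetical point values)."""
    if height_after <= height_before:
        return 0.0   # no progress, or the stack was knocked over: no points
    if height_after >= goal_height:
        return 10.0  # completing the full stack earns the largest reward
    return float(height_after)  # taller partial stacks earn more points

# One hypothetical practice run: stack heights observed after each action.
heights = [1, 2, 1, 2, 3, 4]
total = sum(stacking_reward(a, b) for a, b in zip(heights, heights[1:]))
print(total)
```

In this run the action that topples the stack from two blocks back to one contributes nothing, so the only way to raise the total is to keep making constructive moves.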

“The robot wants the higher score,” Hundt said. “It quickly learns the right behavior to get the best reward. In fact, it used to take a month of practice for the robot to achieve 100% accuracy. We were able to do it in two days.” 

Besides learning how to stack blocks, the robot also used positive reinforcement to learn other tasks, such as how to play a simulated navigation game. 

“At the start the robot has no idea what it’s doing but it will get better and better with each practice. It never gives up and keeps trying to stack and is able to finish the task 100% of the time,” Hundt said.

Some of the possible applications for this method include training household robots to complete certain tasks, as well as improving autonomous vehicles.

“Our goal is to eventually develop robots that can do complex tasks in the real world, like product assembly, caring for the elderly and surgery,” said Gregory D. Hager, a professor of computer science at Johns Hopkins and a co-author of the paper. “We don’t currently know how to program tasks like that. The world is too complex. But work like this shows us that there is promise to the idea that robots can learn how to accomplish such real-world tasks in a safe and efficient way.”