In one of the latest developments in robotics, researchers at the University of Southern California (USC) have developed a system that lets robots learn complicated tasks from only a few demonstrations. Even more impressively, some of those demonstrations can be imperfect.
The research, titled “Learning from Demonstrations Using Signal Temporal Logic,” was presented at the Conference on Robot Learning (CoRL) on Nov. 18.
The system measures the quality of each demonstration so it can learn from successes and failures alike. Unlike current methods, which can require at least 100 demonstrations to teach a specific task, the new system needs just a few. Intuitively, the robots learn much the way humans learn from each other: by watching others complete tasks, whether successfully or imperfectly.
Aniruddh Puranic is the lead author of the research and a Ph.D. student in computer science at the USC Viterbi School of Engineering.
“Many machine learning and reinforcement learning systems require large amounts of data and hundreds of demonstrations – you need a human to demonstrate over and over again, which is not feasible,” said Puranic.
“Also, most people don’t have programming knowledge to explicitly state what the robot needs to do, and a human cannot possibly demonstrate everything that a robot needs to know,” he continued. “What if the robot encounters something it hasn’t seen before? This is a key challenge.”
The researchers used “signal temporal logic,” or STL, to evaluate the quality of the demonstrations, ranking them accordingly and creating an inherent reward signal.
There are two main reasons the researchers decided on STL:
- When learning purely from demonstrations, robots can pick up imperfect, undesirable, or even unsafe behaviors.
- Demonstrations can differ in quality depending on the user providing them, and some demonstrations are better indicators of desired behavior than others.
Designed this way, the robot can still learn from imperfect demonstrations, even when they fail to meet the logic requirements. In other words, it draws its own conclusions about a demonstration’s accuracy or success.
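To make the ranking idea concrete, here is a minimal illustrative sketch, not the authors’ implementation: demonstrations are scored by the quantitative robustness of a hypothetical STL-style specification (“always stay above the floor, and eventually reach the goal”), so that better demonstrations get higher scores and unsafe ones score negative. The trajectories, the goal value, and the spec itself are all invented for illustration.

```python
# Illustrative sketch (not the paper's code): rank demonstrations by the
# robustness of an assumed STL-style specification.
# Hypothetical spec: "always (y > 0) AND eventually (x >= goal)".

def robustness(trajectory, goal=10.0):
    """Quantitative satisfaction of the spec on a list of (x, y) samples.
    Positive = satisfied with margin; negative = violated."""
    # "always y > 0": worst-case clearance over the whole trajectory
    always_safe = min(y for _, y in trajectory)
    # "eventually x >= goal": best progress made toward the goal
    eventually_goal = max(x - goal for x, _ in trajectory)
    # conjunction of STL subformulas = minimum of their robustness values
    return min(always_safe, eventually_goal)

# Three hypothetical demonstrations of varying quality.
demos = {
    "good":   [(0, 1.0), (5, 1.2), (11.0, 1.1)],   # safe, reaches goal
    "sloppy": [(0, 0.3), (6, 0.1), (10.5, 0.2)],   # barely safe, just reaches goal
    "unsafe": [(0, 1.0), (4, -0.5), (12.0, 1.0)],  # dips below the floor
}

ranking = sorted(demos, key=lambda k: robustness(demos[k]), reverse=True)
print(ranking)  # -> ['good', 'sloppy', 'unsafe']
```

The useful property is that the score is graded rather than pass/fail: a demonstration that violates the spec still yields a (negative) signal the learner can weigh, rather than being discarded outright.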
Stefanos Nikolaidis is a co-author and a USC Viterbi assistant professor of computer science.
“Let’s say robots learn from different types of demonstrations – it could be a hands-on demonstration, videos, or simulations – if I do something that is very unsafe, standard approaches will do one of two things: either, they will completely disregard it, or even worse, the robot will learn the wrong thing,” Nikolaidis says.
“In contrast, in a very intelligent way, this work uses some common-sense reasoning in the form of logic to understand which parts of the demonstration are good and which parts are not,” he continues. “In essence, this is exactly what humans do as well.”
Signal Temporal Logic
STL is an expressive mathematical symbolic language through which robots can reason about current and future outcomes. Prior to STL, research in this area relied on “linear temporal logic.”
Jyo Deshmukh is a former Toyota engineer and assistant professor of computer science at USC.
“When we go into the world of cyber-physical systems, like robots and self-driving cars, where time is crucial, linear temporal logic becomes a bit cumbersome, because it reasons about sequences of true/false values for variables, while STL allows reasoning about physical signals,” Deshmukh says.
The team of researchers was surprised by the system’s level of success.
“Compared to a state-of-the-art algorithm, being used extensively in robotics applications, you see an order of magnitude difference in how many demonstrations are required,” says Nikolaidis.
According to the researchers, the system could learn from driving simulators and, eventually, from videos. The next step is to test it on real robots, as the initial testing was done in a game simulator. The system will be useful for applications such as those in household environments, warehouses, and space exploration rovers.
“If we want robots to be good teammates and help people, first they need to learn and adapt to human preference very efficiently,” says Nikolaidis. “Our method provides that.”