A team of researchers at Electronic Arts has recently experimented with various artificial intelligence algorithms, including reinforcement learning models, to automate aspects of video game creation. The researchers hope that the AI models can save their developers and animators time on repetitive tasks like coding character movement.
Designing a video game, particularly the large, triple-A video games designed by large game companies, requires thousands of hours of work. As video game consoles, computers, and mobile devices become more powerful, video games themselves become more and more complex. Game developers are searching for ways to produce more game content with less effort. For example, they often choose to use procedural generation algorithms to produce landscapes and environments. Similarly, artificial intelligence algorithms can be used to generate video game levels, automate game testing, and even animate character movements.
Character animations for video games are often completed with the assistance of motion capture systems, which track the movements of real actors to ensure more life-like animations. However, this approach does have limitations. Not only does the code that drives the animations still need to be written, but animators are also limited only to the actions that have been captured.
As Wired reported, researchers from EA set out to automate this process and save both time and money on these animations. The team of researchers demonstrated that a reinforcement learning algorithm could be used to create a human model that moves realistically, without the need to manually record and code the movements. The research team used “Motion Variational Autoencoders” (Motion VAEs) to identify relevant patterns of movement from motion-capture data. After the autoencoders extracted the movement patterns, a reinforcement learning system was trained with the data, with the goal of creating realistic animations based on certain objectives (such as running after a ball in a soccer game). The planning and control algorithms used by the research team were able to generate the desired motions, even producing motions that weren’t in the original set of motion-capture data. This means that after learning how a subject walks, the reinforcement learning model can determine what running looks like.
Julian Togelius, an NYU professor and co-founder of the AI tools company Modl.ai, was quoted by Wired as saying that the technology could be quite useful in the future and is likely to change how content for games is created.
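The division of labor described above can be pictured with a toy control loop. This is only a sketch under stated assumptions: it assumes the controller works by choosing a latent code that a decoder turns into the next pose, which the article does not spell out, and both `decode_next_pose` and `policy` are hand-written, hypothetical stand-ins for what would be trained networks.

```python
def decode_next_pose(prev_pose, latent):
    """Hypothetical stand-in for a trained Motion VAE decoder.

    A real decoder would be a neural network; here the latent code
    simply nudges each pose coordinate by a small amount.
    """
    return [p + 0.1 * z for p, z in zip(prev_pose, latent)]


def policy(pose, target):
    """Hypothetical stand-in for a learned RL policy.

    It picks a latent code that points from the current pose toward
    the target, clipped to [-1, 1] per coordinate.
    """
    return [max(-1.0, min(1.0, t - p)) for p, t in zip(pose, target)]


def rollout(start_pose, target, steps):
    """Generate a motion by repeatedly choosing a latent and decoding it."""
    pose = list(start_pose)
    trajectory = [pose]
    for _ in range(steps):
        latent = policy(pose, target)
        pose = decode_next_pose(pose, latent)
        trajectory.append(pose)
    return trajectory
```

Calling `rollout([0.0, 0.0], [1.0, -1.0], 50)` yields a sequence of poses that drifts toward the target. In a real system the decoder would keep each step looking like plausible captured motion, while the policy would be trained to satisfy the objective.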
“Procedural animation will be a huge thing. It basically automates a lot of the work that goes into building game content,” Togelius said to Wired.
According to professor Michiel van de Panne from UBC, who was involved with the reinforcement learning project, the research team is looking to take the concept further by animating non-human avatars with the same process. Van de Panne said to Wired that although the process of creating new animations can be quite difficult, he is confident the technology will be able to render appealing animations someday.
Other applications of AI in the development of video games include the generation of basic games. For instance, researchers at the University of Toronto managed to design a generative adversarial network that could recreate the game Pac-Man without access to any of the code used to design the game. Elsewhere, researchers from the University of Alberta used AI models to generate video game levels based on the rules of different games like Super Mario Bros. and Mega Man.
Dor Skuler, the CEO and Co-Founder of Intuition Robotics – Interview Series
Dor Skuler is the co-founder and CEO of Intuition Robotics, a company redefining the relationship between humans and machines. They build digital companions including ElliQ – the sidekick for happier aging which improves the lives of older adults.
Intuition Robotics is your fifth venture. What inspired you to launch this company?
Throughout my career, I’ve enjoyed finding brand new challenges that are in need of the latest technology innovations. As technology around us became more sophisticated, we believed that there was a need to redefine the relationship between humans and machines, through digital companion agents. We decided to start with helping older adults stay active and engaged with a social companion. We felt this was an important space to start with, as we could create a solution to help older adults avoid loneliness and social isolation. We’re doing this by focusing on celebrating aging and the joys of this specific period in life, rather than focusing on disabilities.
Intuition Robotics’ first product is ElliQ, a digital assistant for the elderly. How does ElliQ help older adults fight loneliness, dementia, etc.?
90% of older adults prefer to age at home, and we’re seeing a positive trend of “aging in place” at home and within their own communities, as opposed to moving to a senior care facility. We’re also seeing a strong willingness to adopt non-medical approaches to improve quality of life for older adults, including technologies that allow them to thrive and continue living independently, rather than offerings that only treat issues.
Many home assistants on the market today are reactive and command-based; they only respond to questions and do tasks when prompted. This does little to create a relationship and combat loneliness as you feel like you’re just talking to a machine. ElliQ is different in that she intuitively learns users’ interests and proactively makes suggestions. Instead of waiting for someone to ask her to play music, for example, ElliQ will suggest digital content like TED talks, trivia games, or music. She’ll learn her user’s routines and preferences and will prompt them to engage in an activity after ElliQ notices inactivity. ElliQ creates an emotional bond and helps users feel like they aren’t alone.
You’ve stated that proactive, AI-initiated interactions are very important. In one product demo, one of the interesting functions is that ElliQ will randomly introduce a piece of interesting information. Is this simply a way of connecting with the user? What are some of the other advantages of doing this?
Proactivity helps to create a bi-directional relationship with the user. Not only is the user prompting the device, but since the device is a goal-based AI agent wanting to motivate the user to be more connected and engaged in the world around him, she’ll proactively initiate interactions that will promote the agent’s goals. Proactivity also helps the user in relating better to the device and feeling as if this is a lifelike entity and not a piece of hardware.
Being proactive is important, but one of the challenges of a digital assistant is not annoying the user. How do you tackle this challenge?
We have been designing our digital companions to encompass a “do not disturb the user” goal. This goal is part of our decision-making algorithm, which the agent uses to decide what to proactively initiate. This goal competes with the agent’s other goals, such as keeping the user entertained or connected to family members. Based on reinforcement learning, one of these goals “wins”.
Can you discuss designing personality or character in order to enable the human to bond with the machine?
A distinct, character-based personality makes an AI agent more fun, intriguing, and approachable, so the user feels much more comfortable opening up and engaging in a two-way exchange of information. The agent’s personality also provides the unique opportunity to reflect and personify the brand and product that it’s embedded into – we like to think of it as a character in a movie or play. The agent is like an actor that was selected to play a specific role in a movie, serving its unique purpose in its environment (or “scene”) accordingly.
As such, the agent for a car would have a completely different personality and way of communicating than an agent designed for work or home. Nevertheless, its personality should be as distinct and recognizable as possible. We like to describe ElliQ’s personality as a combination of a Labrador and Sam from Lord of the Rings – highly knowledgeable, yet loyal and playful. Discovering the agent’s personality over time helps the user open up and get to know the agent, and the enticement keeps the user coming back for more.
Sometimes an AI may interrupt a conversation or some other event. Is ElliQ programmed to ask for forgiveness? If yes, how is this achieved without further annoying the end user?
ElliQ’s multi modality allows her to express her mistakes. For example, she can bow down her head to signal that she’s apologetic. Overall in designing an AI agent, it is very important to create fail and repair mechanisms that will allow the agent to sophisticatedly apologize for disturbing or not understanding.
One of the interesting things you stated is that ‘users don’t want to anticipate what she (ElliQ) will do next’. Do you believe that people yearn for the type of unpredictability that is normally associated with humans?
We think that users yearn for many elements of human interaction, including quirks like unpredictability, spontaneity, and fun. ElliQ achieves this with unprompted questions, suggestions, and recommendations for activities in the digital world and the physical world. To increase the feeling of having a machine, ElliQ is designed to not repeat herself but to surprise the users with her interactions. This is all to invoke the feeling of being lifelike and to allow the creation of a real bond.
Users will learn to expect ElliQ to anticipate their needs. Do you believe that some type of resentment towards the AI can begin to brew if this anticipation remains unmet?
Yes, I think users will be disappointed if AI around them doesn’t act on their behalf or expectations are unmet. This is why transparency is important when designing such agents – so the user really understands the boundaries of what is possible.
You also stated that users do not see ElliQ as something that is alive, instead they see it as an in-between, something not alive or a machine, but something closer to a presence or a companion. What does this tell us about the human condition, and how should people who design AI systems take this into consideration?
This tells us that as humans, we need interaction and to build relationships in order to feel connected. ElliQ won’t replace other humans, but she can help evoke similar feelings of companionship and help users not feel so lonely or like they’re just talking to a box. She’s much more than an assistant or a machine; she’s a companion with a personality. She’s an emotive entity that users feel is lifelike, even though they truly comprehend that she is actually a device.
Intuition Robotics also has a second product which is an in-car digital companion. Could you give us some details about this product?
In 2019, Toyota Research Institute (TRI) selected Intuition Robotics to collaborate on an in-car AI agent. Through this collaboration, we’re helping TRI create an experience in which the car will engage drivers and passengers in a proactive and personalized way. The experience is powered by our cognitive AI platform, Q. It’s an in-cabin digital companion that creates a unique, personalized experience and aims to accelerate consumers’ trust in autonomy in cars and create a much more engaging and seamless in-car experience.
Thank you for the interview. I really enjoyed learning more about your company and how ElliQ can be such a powerful solution for an elderly population that technology often ignores. Readers who wish to learn more should visit Intuition Robotics, or visit ElliQ.
Google’s AI teaches robots how to move by watching dogs
Even some of the most advanced robots today still move in somewhat clunky, jerky ways. In order to get robots to move in more lifelike, fluid ways, researchers at Google have developed an AI system that is capable of learning from the motions of real animals. The Google research team published a preprint paper that detailed their approach late last week. In the paper and an accompanying blog post, the research team describes the rationale behind the system. The authors of the paper believe that endowing robots with more natural movement could help them accomplish real-world tasks that require precise movement, such as delivering items between different levels of a building.
As VentureBeat reported, the research team utilized reinforcement learning to train their robots. The researchers began by collecting clips of real animals moving and using reinforcement learning (RL) techniques to push the robots towards imitating the movements of the animals in the video clips. In this case, the researchers trained the robots on clips of a dog, designed in a physics simulator, instructing a four-legged Unitree Laikago robot to imitate the dog’s movements. After the robot was trained, it was capable of accomplishing complex motions like hopping, turning, and walking swiftly, at a speed of around 2.6 miles per hour.
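To make the idea of rewarding imitation concrete, here is a minimal sketch of one common way such a reward can be shaped. The article does not give the formula the researchers used, so the exponential form, the `scale` value, and the function name are all assumptions chosen for illustration.

```python
import math


def imitation_reward(robot_pose, reference_pose, scale=5.0):
    """Score how closely the robot's joint angles match the reference clip.

    Returns a value in (0, 1]: exactly 1.0 for a perfect match, decaying
    as the squared joint error grows. `scale` is a hypothetical tuning
    constant, not a value taken from the paper.
    """
    if len(robot_pose) != len(reference_pose):
        raise ValueError("poses must have the same number of joints")
    squared_error = sum(
        (q - q_ref) ** 2 for q, q_ref in zip(robot_pose, reference_pose)
    )
    return math.exp(-scale * squared_error)
```

A reinforcement learning algorithm that maximizes this kind of reward at every time step is pushed toward poses that track the recorded animal motion.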
The training data consisted of approximately 200 million samples of dogs in motion, tracked in a physics simulation. The different motions were then run through reward functions and policies that the agents learned with. After the policies were created in the simulation, they were transferred to the real world using a technique called latent space adaptation. Because the physics simulators used to train the robots could only approximate certain aspects of real-world motion, the researchers randomly applied various perturbations to the simulation, intended to simulate operation under different conditions.
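The random perturbations mentioned above are often implemented by resampling the simulator's physical parameters for each training episode. The sketch below shows that pattern only. The parameter names and ranges are hypothetical, since the article does not list the ones the Google team varied.

```python
import random

# Hypothetical parameter ranges, chosen for illustration only.
PARAM_RANGES = {
    "mass_scale": (0.8, 1.2),
    "friction": (0.5, 1.25),
    "motor_strength_scale": (0.8, 1.2),
    "latency_s": (0.0, 0.04),
}


def sample_dynamics(rng):
    """Draw one perturbed set of simulator parameters."""
    return {
        name: rng.uniform(low, high)
        for name, (low, high) in PARAM_RANGES.items()
    }


def training_episodes(num_episodes, seed=0):
    """Yield a freshly randomized parameter set for each training episode."""
    rng = random.Random(seed)
    for _ in range(num_episodes):
        yield sample_dynamics(rng)
```

A policy trained across many such variations has to cope with a range of conditions rather than one idealized simulation, which is what makes the later transfer to a physical robot more plausible.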
According to the research team, they were able to adapt the simulation policies to the real-world robots utilizing just eight minutes of data gathered from across 50 different trials. The researchers managed to demonstrate that the real-world robots were able to imitate a variety of different, specific motions like trotting, turning around, hopping, and pacing. They were even able to imitate animations created by animation artists, such as a combination hop and turn.
The researchers summarize the findings in the paper:
“We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire [of] behaviors for legged robots. By incorporating sample efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment.”
The control policies used during the reinforcement learning process had their limitations. Because of constraints imposed by the hardware and algorithms, there were a few things the robots simply couldn’t do. They weren’t able to run or make large jumps, for instance. The learned policies also didn’t exhibit as much stability when compared with movements that were manually designed. The research team wants to take the work further by making the controllers more robust and capable of learning from different types of data. Ideally, future versions of the framework will be able to learn from video data.
DeepMind and Google Brain Aim to Create Methods to Improve Efficiency of Reinforcement Learning
Reinforcement learning systems can be powerful and robust, able to carry out extremely complex tasks through thousands of iterations of training. While reinforcement learning algorithms are capable of enabling sophisticated and occasionally surprising behavior, they take a long time to train and require vast amounts of data. These factors make reinforcement learning techniques rather inefficient, and recently research teams from Alphabet’s DeepMind and Google Brain have endeavored to find more efficient methods of creating reinforcement learning systems.
As reported by VentureBeat, the combined research group recently proposed methods of making reinforcement learning training more efficient. One of the proposed improvements was an algorithm dubbed Adaptive Behavior Policy Sharing (ABPS), while the other was a framework called Universal Value Function Approximators (UVFA). ABPS lets pools of AI agents share their adaptively selected experiences, while UVFA lets those AI agents simultaneously investigate directed exploration policies.
ABPS is intended to expedite the customization of hyperparameters when training a model. ABPS makes finding the optimal hyperparameters quicker by allowing several different agents with different hyperparameters to share their behavior policy experiences. To be more precise, ABPS lets a reinforcement learning agent choose from the actions that a policy has deemed acceptable; the agent is then granted a reward and an observation based on the resulting state.
Reinforcement learning agents are trained with various combinations of possible hyperparameters, like decay rate and learning rate. When training a model, the goal is that the model converges on the combination of hyperparameters that gives it the best performance, and in this case those that also improve data efficiency. The efficiency is increased by training many agents at one time and choosing the behavior of only one agent to be deployed during the next time step. The selected agent’s policy is used to sample actions. The transitions are then logged within a shared space, and this space is constantly evaluated so that policy selection doesn’t have to occur as often. At the end of the training, an ensemble of agents is chosen and the top-performing agents are selected to undergo final deployment.
UVFA, for its part, attempts to deal with one of the common problems of reinforcement learning: weakly reinforced agents often fail to learn tasks. UVFA attempts to solve the issue by having the agent learn a separate set of exploitation and exploration policies at the same time. Separating the tasks creates a framework that allows the exploratory policies to keep exploring the environment while the exploitation policies continue to try to maximize the reward for the current task. The exploratory policies of UVFA serve as a baseline architecture that will continue to improve even if there are no natural rewards being found. In such a condition, a function which corresponds to intrinsic rewards is approximated, which pushes the agents to explore all states in an environment, even if they often return to familiar states.
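The sharing scheme described above can be sketched in a few lines. This is a toy illustration and not the published algorithm: the two-action environment, the `ToyAgent` class, the epsilon-greedy rule for picking the behavior agent, and the choice to select once per episode rather than per time step are all assumptions made to keep the example short.

```python
import random


class ToyAgent:
    """Hypothetical stand-in for an RL agent with its own learning rate."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.value = [0.0, 0.0]  # estimated value of each of two actions

    def act(self, rng, epsilon=0.1):
        if rng.random() < epsilon:
            return rng.randrange(2)
        return 0 if self.value[0] >= self.value[1] else 1

    def learn(self, transitions):
        for action, reward in transitions:
            error = reward - self.value[action]
            self.value[action] += self.learning_rate * error


def run_abps_sketch(learning_rates, episodes=200, steps=10, seed=0):
    rng = random.Random(seed)
    agents = [ToyAgent(lr) for lr in learning_rates]
    returns = [0.0] * len(agents)  # running return when used as behavior agent
    shared_buffer = []

    for _ in range(episodes):
        # Pick one agent to act (epsilon-greedy over running returns).
        if rng.random() < 0.2:
            idx = rng.randrange(len(agents))
        else:
            idx = max(range(len(agents)), key=lambda i: returns[i])

        episode = []
        for _ in range(steps):
            action = agents[idx].act(rng)
            reward = 1.0 if action == 1 else 0.0  # toy rule: action 1 is better
            episode.append((action, reward))

        # Log transitions in the shared space and update the selection score.
        shared_buffer.extend(episode)
        episode_return = sum(reward for _, reward in episode)
        returns[idx] += 0.1 * (episode_return - returns[idx])

        # Every agent learns from the shared experience, not just the actor.
        for agent in agents:
            agent.learn(episode)

    best = max(range(len(agents)), key=lambda i: returns[i])
    return agents, returns, shared_buffer, best
```

The point of the design is that only one agent interacts with the environment at a time, yet every hyperparameter setting is trained on the same transitions, so the cost of environment interaction is paid once rather than once per agent.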
As VentureBeat explained, when the UVFA framework is in play, the intrinsic rewards of the system are given directly to the agent as inputs. The agent then keeps track of a representation of all inputs (such as rewards, action, and state) during a given episode. The result is that the reward is preserved over time and the agent’s policy is at least somewhat informed by it at all times.
This is accomplished with the use of an “episodic novelty” module and a “life-long novelty” module. The function of the first module is to hold the current, episodic memory and map the current findings to the previously mentioned representation, letting the agent determine an intrinsic episodic reward for every step of training. Afterward, the state linked with the current observation is added into memory. Meanwhile, the life-long novelty module is responsible for influencing how often the agent explores over the course of many episodes.
According to the Alphabet/Google teams, the new training techniques have already demonstrated the potential for substantial improvement while training a reinforcement learning system. UVFA was able to double the performance of some of the base agents that played various Atari games. Meanwhile, ABPS was able to increase performance on some of the same Atari games, decreasing variance amongst the top-performing agents by approximately 25%. The UVFA-trained algorithm was able to achieve a high score in Pitfall by itself, without any engineered features or human demos.
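The episodic part of this mechanism can be illustrated with a small memory that pays a larger bonus for observations unlike anything seen earlier in the episode. It is a simplified sketch: the nearest-neighbor kernel, the constants `k` and `epsilon`, and the class name are assumptions for illustration, and the life-long novelty module is left out.

```python
import math


class EpisodicNovelty:
    """Toy episodic memory that rewards unfamiliar state embeddings."""

    def __init__(self, k=3, epsilon=1e-3):
        self.memory = []        # embeddings seen so far in this episode
        self.k = k              # number of nearest neighbors to compare with
        self.epsilon = epsilon  # hypothetical kernel constant

    def reward(self, embedding):
        """Return an intrinsic reward for `embedding`, then store it."""
        if not self.memory:
            bonus = 1.0
        else:
            squared_dists = sorted(
                sum((a - b) ** 2 for a, b in zip(embedding, stored))
                for stored in self.memory
            )[: self.k]
            # Similarity is near 1 for each close neighbor, near 0 for far ones.
            similarity = sum(
                self.epsilon / (d + self.epsilon) for d in squared_dists
            )
            bonus = 1.0 / math.sqrt(similarity + 1.0)
        self.memory.append(tuple(embedding))
        return bonus

    def reset(self):
        """Clear the memory at the start of a new episode."""
        self.memory.clear()
```

Revisiting a stored state lowers the bonus, while reaching a distant state keeps it close to its maximum, which gives the exploratory policy something to learn from even when the environment itself offers no reward.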