Over the past few months, Microsoft and other companies researching machine learning challenged teams of AI developers to create an AI system that could play Minecraft and find a diamond within the game. As reported by the BBC, while AI platforms have managed to dominate chess and Go, they have struggled to master this task in Minecraft.
Microsoft’s Minecraft-based AI challenge was called MineRL, and the competition results were formally announced at the recent NeurIPS conference. The competition’s intention was to train an AI through an “imitation learning” approach. Imitation learning is a method in which an AI system learns actions by observing humans carry out those actions. In comparison to reinforcement learning, imitation learning is a much less computationally expensive and substantially more efficient way of training an AI.
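At its simplest, imitation learning reduces to supervised learning: fit a policy that maps observations to the actions a human demonstrator took in those situations. The toy sketch below illustrates the idea with a hypothetical lookup-table policy over symbolic Minecraft-like observations; it is not the MineRL baseline or any competition entry, just a minimal illustration of learning from demonstrations.

```python
# Minimal imitation-learning sketch (hypothetical example, not MineRL code).
# The "policy" simply imitates the most common human action per observation.
from collections import Counter


def train_policy(demonstrations):
    """Fit a lookup policy from (observation, action) demonstration pairs:
    for each observation, pick the action the demonstrator chose most often."""
    counts_by_obs = {}
    for obs, action in demonstrations:
        counts_by_obs.setdefault(obs, Counter())[action] += 1
    return {obs: counts.most_common(1)[0][0]
            for obs, counts in counts_by_obs.items()}


# Toy "gameplay frames": what the human saw, and what the human did.
demos = [
    ("tree_nearby", "chop_wood"),
    ("tree_nearby", "chop_wood"),
    ("have_wood", "craft_pickaxe"),
    ("cave_entrance", "explore_cave"),
]

policy = train_policy(demos)
print(policy["tree_nearby"])  # the agent imitates the human: chop_wood
```

Real systems replace the lookup table with a neural network trained on pixels, but the principle is the same: no trial-and-error exploration is needed, because the human data already demonstrates sensible behavior.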
Reinforcement learning often requires many powerful computers networked together and hundreds or thousands of hours of training to become effective at a task. In contrast, an AI trained with an imitation learning method can be trained much more quickly, as the AI already has a baseline of knowledge to work with courtesy of the human operators who have preceded it.
Imitation learning has practical applications in situations where an AI cannot safely explore until it figures out the correct actions. One such scenario is the training of an autonomous vehicle, as the car cannot simply be allowed to roam the streets until it has learned the desired behaviors. Using a human demonstrator’s data to train the vehicle could potentially make the process faster and safer.
The act of finding a diamond in Minecraft requires carrying out many steps in sequence, such as cutting down trees to make tools, exploring the caves that contain the diamonds, and actually finding a diamond within the cave. Despite the complexity of the task, a human player familiar with the game should be able to get a diamond in around 20 minutes.
Over 660 different AI agents were submitted to the competition, but not a single one was able to find a diamond. The data provided to train the agents was a dataset containing over 60 million frames of gameplay collected from many human players. The locations of diamonds are randomized each time an instance of the game is started, which means the AIs cannot simply look where the human players found diamonds. In other words, the AIs need to form an understanding of how concepts like making tools, using tools, exploring, and finding resources are linked together.
Although none of the AI agents were able to successfully find a diamond, the organizing team was still pleased with the results of the competition, noting that much was learned from the experiment. The research that the AI teams conducted can help advance the field by finding alternatives to reinforcement learning strategies.
Reinforcement learning often gives superior performance over imitation learning, with one notable success of reinforcement learning being DeepMind’s AlphaGo. However, as previously noted, reinforcement learning requires massive computing resources, limiting its use to organizations that can afford computer processors at large scale.
William Guss, a PhD student at Carnegie Mellon University and head organizer of the competition, explained to the BBC that the MineRL competition was intended to investigate alternatives to computationally heavy AI. Said Guss:
“…Throwing massive compute at problems isn’t necessarily the right way for us to push the state of the art as a field… It works directly against democratising access to these reinforcement learning systems, and leaves the ability to train agents in complex environments to corporations with swathes of compute.”