As impressive and useful as virtual assistants like Siri, Alexa, and Google Assistant are, their conversational skills are typically limited to receiving certain commands and delivering pre-defined responses. Companies like Google and Amazon have been pursuing methods of AI training and development that can make AI chatbots more robust and flexible, able to carry on conversations with users in a much more natural way. As reported by DigitalTrends, Google has recently published a paper demonstrating the capabilities of its new chatbot, dubbed “Meena”. According to a blog post from the researchers, Meena can engage in conversation with its users on just about any topic.
Meena is an open-domain chatbot, meaning that it responds to the context of the conversation so far and adapts to inputs in order to deliver more natural responses. Most other chatbots are closed-domain, which means that their responses are themed around certain ideas and limited to accomplishing specific tasks.
According to Google’s report, Meena’s flexibility was the result of a massive training dataset. Meena was trained on around 40 billion words pulled from social media conversations and filtered for the most relevant and representative words. Google aimed to deal with some of the problems found in most voice assistants, such as an inability to handle topics and commands that unfold over multiple turns in the conversation, where the user provides additional input after the bot has responded to an earlier one. In practice, many chatbots are unable to prompt the user for clarification, and when a query can’t be interpreted they often just default to web results.
In order to deal with this particular problem, Google’s researchers enabled the model to keep track of the context of the conversation, meaning that it can generate specific answers. The model uses an encoder that processes what has already been said in the conversation and a decoder that creates a response based on that context. The model was trained on specific and non-specific data. A specific response is one that is closely related to the preceding statement. As the Google post explained:
“For example, if A says, ‘I love tennis,’ and B responds, ‘That’s nice,’ then the utterance should be marked, ‘not specific’. That reply could be used in dozens of different contexts. But if B responds, ‘Me too, I can’t get enough of Roger Federer!’, then it is marked as ‘specific’ since it relates closely to what is being discussed.”
Each training example included up to seven “turns” of conversational context. The model has 2.6 billion parameters and was trained on 341 GB of text data, a dataset around 8.5 times larger than the one used to train the GPT-2 model created by OpenAI.
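To make the idea of multi-turn context concrete, here is a minimal sketch of how a conversation could be split into training pairs. It assumes each example pairs one response with up to seven preceding turns; the function name `build_examples` and the `max_context` parameter are hypothetical and are not taken from Google's code.

```python
from typing import List, Tuple


def build_examples(turns: List[str], max_context: int = 7) -> List[Tuple[List[str], str]]:
    """Split a conversation into (context, response) pairs.

    Assumption: every turn after the first is treated as a response, and its
    context is the window of up to `max_context` turns that came before it.
    """
    examples = []
    for i in range(1, len(turns)):
        # Keep only the most recent `max_context` turns as context.
        context = turns[max(0, i - max_context):i]
        examples.append((context, turns[i]))
    return examples


conversation = [
    "I love tennis.",
    "Me too, I can't get enough of Roger Federer!",
    "Did you watch his last match?",
]
for context, response in build_examples(conversation):
    print(len(context), "turn(s) of context ->", response)
```

An encoder would read each context window and a decoder would be trained to produce the paired response; the real preprocessing pipeline may differ from this sketch.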
Google reported how Meena performed on the Sensibleness and Specificity Average (SSA) metric. The SSA, designed by Google researchers, is intended to quantify the ability of a conversational agent to reply with specific, relevant responses as a conversation goes on.
SSA scores are calculated by testing a model against a fixed number of prompts and tracking how many of its responses are sensible and how many are specific. The model’s score is derived from the percentage of sensible and specific responses it gives to those prompts. Generic responses are penalized. According to Google, an average person scores about 86% on the SSA, while Meena scored 79%. Another well-known chatbot, an agent created by Pandorabots, has won the Loebner Prize, which recognizes human-like conversation. The Pandorabots agent achieved approximately 56% on the SSA test.
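The scoring described above can be sketched in a few lines. This is an illustration only, assuming that SSA is the simple average of two rates (the share of responses judged sensible and the share judged specific, as the metric's name suggests) and that a response which is not sensible cannot count as specific. The `Label` class and `ssa` function are hypothetical names, and the labels below are made-up examples, not Google's data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Label:
    """Human judgment for one model response (hypothetical structure)."""
    sensible: bool
    specific: bool


def ssa(labels: List[Label]) -> Tuple[float, float, float]:
    """Return (sensibleness, specificity, SSA) for a list of labeled responses."""
    if not labels:
        raise ValueError("at least one labeled response is required")
    n = len(labels)
    sensibleness = sum(l.sensible for l in labels) / n
    # Assumption: a response only counts as specific if it is also sensible,
    # so generic or nonsensical replies pull the score down.
    specificity = sum(l.sensible and l.specific for l in labels) / n
    return sensibleness, specificity, (sensibleness + specificity) / 2


labels = [
    Label(sensible=True, specific=True),    # "Me too, I can't get enough of Roger Federer!"
    Label(sensible=True, specific=False),   # "That's nice" (sensible but generic)
    Label(sensible=False, specific=False),  # a reply that makes no sense in context
    Label(sensible=True, specific=True),
]
sensibleness, specificity, score = ssa(labels)
print(f"sensibleness={sensibleness:.2%} specificity={specificity:.2%} SSA={score:.2%}")
```

In this toy example the generic reply still counts toward sensibleness but not toward specificity, which is how a bot that answers everything with "That's nice" would end up with a low overall score.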
Microsoft and Amazon are also trying to make more flexible and natural chatbots. Microsoft has been attempting to create multiturn dialogue in chatbots for two years, acquiring Semantic Machines, an AI startup, to improve Cortana. Amazon recently ran the Alexa Prize challenge, which prompted participants to design a bot capable of conversing for approximately 20 minutes.