Researchers from the University of Maryland have created a set of questions that are easy for people to answer but hard for some of the best computer question-answering systems that exist today. The team generated the questions through human-computer collaboration and assembled a dataset of more than 1,200 questions. A computer system that learns to master these questions would understand human language better than any computer system that currently exists.
The work was published in the journal Transactions of the Association for Computational Linguistics.
Jordan Boyd-Graber, an associate professor of computer science at UMD and senior author of the paper, spoke about the new developments.
“Most question-answering computer systems don’t explain why they answer the way they do, but our work helps us see what computers actually understand,” he said. “In addition, we have produced a dataset to test on computers that will reveal if a computer language system is actually reading and doing the same sorts of processing that humans are able to do.”
Currently, questions for these systems are written either by human authors or by computers. The problem is that human authors are not aware of all the elements of a question that confuse computers. Computer systems, on the other hand, use formulas, write fill-in-the-blank questions, or make mistakes, all of which can produce nonsense.
To let humans and computers cooperate on writing the questions, Boyd-Graber and the research team created a special computer interface. According to the researchers, it reveals what the computer is “thinking” while a human types out a question. The writer can then edit the question to exploit the computer’s weaknesses and confuse it.
As the writer types the question, the computer’s guesses are displayed in ranked order, and the words responsible for those guesses are highlighted.
When the system answers a question correctly, the interface highlights the words or phrases that led to the answer. With that information, the author can edit the question to make it more difficult for the computer while keeping its meaning the same. The computer eventually becomes confused, but expert humans can still answer the question.
Working together, the humans and computers developed 1,213 questions that the computer could not answer. The researchers tested the questions in a competition between human players and computers. The human players included high school trivia teams and “Jeopardy!” champions, and even the weakest human team defeated the strongest computer system.
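The interface's two signals, a ranked list of guesses and highlighted words behind the top guess, can be illustrated with a toy model. Everything in the sketch below is an assumption: the keyword-overlap "answering system", the hypothetical `KNOWLEDGE` table, and the leave-one-out highlighting (delete each word and check whether the top guess's score drops) are stand-ins, not the UMD team's model or interface.

```python
import re

# Hypothetical toy "answering system": each candidate answer has a bag of clue words,
# and a question's score for a candidate is the number of clue words it contains.
# This is an illustrative stand-in, not the researchers' model.
KNOWLEDGE = {
    "Isaac Newton": {"gravity", "apple", "calculus", "principia", "optics"},
    "Albert Einstein": {"relativity", "photoelectric", "gravity", "spacetime"},
    "Charles Darwin": {"evolution", "finches", "beagle", "species"},
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def rank_guesses(question):
    """Return (answer, score) pairs, best first; ties are broken alphabetically."""
    words = set(tokenize(question))
    scores = {ans: len(words & clues) for ans, clues in KNOWLEDGE.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def highlight(question):
    """Leave-one-out attribution: a word is highlighted if deleting it
    lowers the score of the current top guess."""
    top, top_score = rank_guesses(question)[0]
    tokens = tokenize(question)
    important = []
    for i, tok in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        if dict(rank_guesses(reduced))[top] < top_score:
            important.append(tok)
    return top, important

question = "This scientist wrote the Principia and studied gravity and optics"
print(rank_guesses(question))
print(highlight(question))

# The author paraphrases the highlighted clues; the meaning is the same to a human.
rewritten = "This scientist wrote a famous treatise and studied gravity and light"
print(rank_guesses(rewritten)[0])
```

After the paraphrase, the toy model can no longer separate Newton from Einstein and breaks the tie the wrong way, while a human expert would still recognize Newton. This is the editing loop described above in miniature.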
Shi Feng, a UMD computer science graduate student and co-author of the paper, spoke about the research.
“For three or four years, people have been aware that computer question-answering systems are very brittle and can be fooled very easily,” Feng said. “But this is the first paper we are aware of that actually uses a machine to help humans break the model itself.”
The questions revealed six language phenomena that confuse computers, which fall into two categories. The first is linguistic phenomena: paraphrasing, distracting language, and unexpected contexts. The second is reasoning skills: logic and calculation, mental triangulation of elements in a question, and combining multiple steps to reach a conclusion.
“Humans are able to generalize more and to see deeper connections,” Boyd-Graber said. “They don’t have the limitless memory of computers, but they still have an advantage in being able to see the forest for the trees. Cataloguing the problems computers have helps us understand the issues we need to address, so that we can actually get computers to begin to see the forest through the trees and answer questions in the way humans do.”
This research lays a foundation for computer systems that may eventually master human language, and it is likely to be developed and refined further.
“This paper is laying out a research agenda for the next several years so that we can actually get computers to answer questions well,” Boyd-Graber said.
Multimodal Learning Is Becoming Prominent Among AI Developers
The key concept is that “data sets are fundamental building blocks of AI systems,” and that without data sets, “models can’t learn the relationships that inform their predictions.” The ABI Research report predicts that “while the total installed base of AI devices will grow from 2.69 billion in 2019 to 4.47 billion in 2024, comparatively few will be interoperable in the short term.”
This could represent a considerable waste of time, energy, and resources. Rather than combining “the gigabytes to petabytes of data flowing through them into a single AI model or framework,” these devices will “work independently and heterogeneously to make sense of the data they’re fed.”
To overcome this, ABI proposes multimodal learning, a methodology that could consolidate data “from various sensors and inputs into a single system. Multimodal learning can carry complementary information or trends, which often only become evident when they’re all included in the learning process.”
VB offers an example involving images and text captions: “If different words are paired with similar images, these words are likely used to describe the same things or objects. Conversely, if some words appear next to different images, this implies these images represent the same object. Given this, it should be possible for an AI model to predict image objects from text descriptions, and indeed, a body of academic literature has proven this to be the case.”
Despite the possible advantages, ABI notes that even tech giants like IBM, Microsoft, Amazon, and Google continue to focus predominantly on unimodal systems, partly because of the challenges such a switch would present.
Still, the ABI researchers anticipate that “the total number of devices shipped will grow from 3.94 million in 2017 to 514.12 million in 2023, spurred by adoption in the robotics, consumer, health care, and media and entertainment segments.” Among the companies already implementing multimodal learning, they cite Waymo, which is using such approaches to build “hyper-aware self-driving vehicles,” and Intel Labs, whose engineering team is “investigating techniques for sensor data collation in real-world environments.”
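The first half of that intuition, that words paired with similar images probably describe the same thing, can be sketched by representing each word by the images it appears with and comparing those vectors. The captions and image IDs below are hypothetical toy data, and plain cosine similarity over co-occurrence vectors is an assumed simplification, not the method of the literature VB mentions.

```python
from math import sqrt

# Hypothetical toy data: each image ID is paired with the words in its caption.
CAPTIONS = {
    "img1": ["dog", "puppy", "grass"],
    "img2": ["dog", "puppy", "ball"],
    "img3": ["cat", "kitten", "sofa"],
    "img4": ["cat", "kitten", "window"],
}

def word_vectors(captions):
    """Represent each word as a binary vector over images: 1 if the word captions that image."""
    images = sorted(captions)
    vocab = sorted({w for words in captions.values() for w in words})
    return {w: [1 if w in captions[img] else 0 for img in images] for w in vocab}

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vecs = word_vectors(CAPTIONS)
print(cosine(vecs["dog"], vecs["puppy"]))  # words that share images
print(cosine(vecs["dog"], vecs["cat"]))    # words that never share an image
```

"Dog" and "puppy" end up with identical image vectors, so the model treats them as describing the same object, while "dog" and "cat" share no images and score zero.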
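The two ABI forecasts quoted in this section imply very different growth rates. A quick way to compare them is the compound annual growth rate, CAGR = (end / start)^(1 / years) - 1. The rates below are our own arithmetic on the quoted figures, not numbers from the report.

```python
def cagr(start, end, years):
    """Compound annual growth rate: the constant yearly rate that turns start into end."""
    return (end / start) ** (1 / years) - 1

# Start and end figures are those quoted from the ABI report; the rates are derived here.
installed_base = cagr(2.69e9, 4.47e9, 2024 - 2019)          # AI devices, 2019 -> 2024
multimodal_shipments = cagr(3.94e6, 514.12e6, 2023 - 2017)  # devices shipped, 2017 -> 2023

print(f"AI installed base: {installed_base:.1%} per year")
print(f"Devices shipped:   {multimodal_shipments:.1%} per year")
```

The installed base grows by roughly 11% a year, while the shipment forecast implies that volumes more than double every year.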
Intel Labs principal engineer Omesh Tickoo explained to VB: “What we did is, using techniques to figure out context such as the time of day, we built a system that tells you when a sensor’s data is not of the highest quality. Given that confidence value, it weighs different sensors against each [other] at different intervals and chooses the right mix to give us the answer we’re looking for.”
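The idea Tickoo describes, deriving a confidence value for each sensor from context and weighting the sensors accordingly, can be sketched as a confidence-weighted average. The time-of-day rule, the confidence values, and the names `context_confidence` and `fuse` are hypothetical illustrations, not Intel's system.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    sensor: str
    value: float       # the sensor's estimate of some quantity, e.g. distance in metres
    confidence: float  # 0.0 (unusable) to 1.0 (fully trusted)

def context_confidence(sensor, hour):
    """Hypothetical context rule: cameras are less reliable at night; other sensors are not."""
    if sensor == "camera" and (hour < 6 or hour >= 20):
        return 0.2
    return 0.9

def fuse(readings):
    """Confidence-weighted average of the sensors' estimates."""
    total = sum(r.confidence for r in readings)
    if total == 0:
        raise ValueError("no trustworthy sensor data")
    return sum(r.value * r.confidence for r in readings) / total

def readings_at(hour, raw):
    """Attach a context-dependent confidence to each raw sensor value."""
    return [Reading(s, v, context_confidence(s, hour)) for s, v in raw.items()]

raw = {"camera": 12.0, "radar": 10.0}
day = fuse(readings_at(14, raw))    # both sensors trusted equally
night = fuse(readings_at(23, raw))  # the camera is down-weighted
print(day, night)
```

At night the fused estimate moves toward the radar reading, which is the "right mix" behavior the quote describes, though a real system would learn confidences rather than hard-code them.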
VB notes that unimodal learning will remain predominant where it is highly effective – in applications like image recognition and natural language processing. At the same time it predicts that “as electronics become cheaper and compute more scalable, multimodal learning will likely only rise in prominence.”
Google Adds Two New Artificial Intelligence Features To Its Applications
As The Verge and CNET report, Google is adding two new AI features to its applications: Smart Compose for Google Docs users, and the ability to buy movie tickets through its Duplex booking system.
Once Smart Compose is fully available, users will be able to get “AI-powered writing suggestions outside of their inbox.” For now, “only domain administrators can sign up for the beta.”
The feature uses Google’s machine learning models, which study the user’s “past writing to personalize its prompts (in Gmail you can turn this feature off in settings).” In principle, this means Smart Compose should offer writing suggestions that match the user’s own writing style.
The Verge suggests that bringing Smart Compose to Google Docs “could be a big step up for the tool, challenging its AI autosuggestions with a larger range of writing styles.” The tool could be applied to any document created with the application, “from schoolwork to corporate planning documents,” to the first draft of a novel.
Initially, Google will limit Smart Compose’s reach to businesses. As mentioned, Smart Compose for Docs is available only in beta, only in English, and only domain administrators can volunteer to test it.
Another feature Google announced on November 21 is Duplex on the Web, a booking service that lets users buy movie tickets easily.
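Google's actual Smart Compose models are far more sophisticated than anything that fits here, but the basic idea of personalizing suggestions from a user's past writing can be illustrated with a toy bigram counter. The functions `train` and `suggest` and the sample text are hypothetical; this is a stand-in for the concept, not Google's method.

```python
import re
from collections import Counter, defaultdict

def train(past_writing):
    """Count which word follows which in the user's own past text."""
    follows = defaultdict(Counter)
    words = re.findall(r"[a-z']+", past_writing.lower())
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def suggest(follows, typed, k=3):
    """Suggest up to k likely next words after the last word typed, most frequent first."""
    words = re.findall(r"[a-z']+", typed.lower())
    if not words or words[-1] not in follows:
        return []
    return [w for w, _ in follows[words[-1]].most_common(k)]

history = "Thanks for the update. Thanks for the quick reply. Looking forward to the meeting."
model = train(history)
print(suggest(model, "Thanks for"))
print(suggest(model, "See you at the"))
```

Because the counts come only from this user's text, two users typing the same prefix would get different suggestions, which is the personalization the paragraph above describes.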
As CNET notes, the “service is available on Android phones. To use it, you’d ask the Assistant, Google’s digital helper software akin to Amazon’s Alexa and Apple’s Siri, to look up showtimes for a particular movie in your area. The software then opens up Google’s Chrome browser and finds the tickets.”
To offer the service, Google partnered with “70 movie theater and ticket companies, including AMC, Fandango and Odeon.” The company plans to expand the booking system to car rental reservations next.
The AI software in the tool is “patterned after the human speech, using verbal tics like ‘uh’ and ‘um.’ It speaks with the cadence of a real person, pausing before responding and elongating certain words as though it’s buying time to think.” Duplex premiered last year, offering to make bookings at restaurants and hair salons. “Google later said it would build in disclosures so people would know they were talking to automated software.”
The new Duplex version for ordering movie tickets works as follows: “Once you’ve asked the Assistant for movie tickets, the software opens up a ticketing website in Chrome and starts filling in fields. The system enters information in the form by using data culled from your calendar, Gmail inbox and Chrome autofill (like your credit card and login information).
Throughout the process, you see a progress bar, like you’d see if you were downloading a file. Whenever the system needs more information, like a price or seat selection, the process pauses and prompts you to make a selection. When it’s done, you tap to confirm the booking or payment.”
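The control flow in that description is: fill in whatever stored data covers, pause and prompt for anything it doesn't, show progress, and book only after the user confirms. The sketch below models only that flow. The field names, the `USER_DATA` dictionary, and the `ask`/`confirm` callbacks are hypothetical, and nothing here touches a real website or Google service.

```python
# Hypothetical stored data, standing in for calendar, inbox, and autofill sources.
USER_DATA = {"name": "A. User", "email": "a.user@example.com", "card": "**** 4242"}

def book(fields, user_data, ask, confirm):
    """Fill each form field from stored data, pausing to ask the user when data is missing.
    Returns the filled form if the user confirms, otherwise None."""
    filled = {}
    for i, field in enumerate(fields, start=1):
        if field in user_data:
            filled[field] = user_data[field]  # known from stored data
        else:
            filled[field] = ask(field)        # pause and prompt (e.g. seat selection)
        print(f"progress {i}/{len(fields)}")  # crude stand-in for the progress bar
    return filled if confirm(filled) else None  # final tap to confirm

asked = []
def ask(field):
    asked.append(field)
    return "H12"  # the user's hypothetical seat choice

form = ["name", "email", "seat", "card"]
booking = book(form, USER_DATA, ask, confirm=lambda f: True)
print(booking)
```

Passing the prompts in as callbacks keeps the filling logic separate from however the user is actually asked, which is why the sketch can be exercised without any UI.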
New AI-Powered Tool Enables Video Editing From Themed Text Documents
A team of computer science researchers from Tsinghua University and Beihang University in China, IDC Herzliya in Israel, and Harvard University has recently created a tool that generates edited videos based on a text description and a repository of video clips.
Massive amounts of video footage are recorded every day by professional videographers, hobbyists, and regular people. Yet editing this footage into a coherent presentation is still time-consuming and often requires complex editing tools that can manipulate raw footage. The international team of researchers recently developed a tool that takes themed text descriptions and generates videos based on them. The tool examines the video clips in a repository and selects the clips that correspond to the input text describing the storyline. The goal is a tool that is user-friendly yet powerful enough to produce quality videos without extensive video editing skills or expensive video editing software.
While current video editing platforms require knowledge of video editing techniques, the researchers’ tool lets novice video creators make compositions that tell stories in a more natural, intuitive fashion. “Write-A-Video”, as its creators call it, lets users edit videos simply by editing the accompanying text. If a user deletes text, adds text, or moves sentences around, these changes are reflected in the video: corresponding shots are cut or added as the user manipulates the text, and the final video is tailored to the user’s description.
Ariel Shamir, the Dean of the Efi Arazi School of Computer Science at IDC Herzliya, explained that Write-A-Video lets the user interact with the video mainly through text, using natural language processing techniques to match video shots to the semantic meaning of the text. An optimization algorithm then assembles the video by cutting and swapping shots. The tool also lets users experiment with different visual styles, adjusting how scenes are presented through specific film idioms that speed up or slow down the action, or use more or fewer cuts.
The program selects candidate shots based on their aesthetic appeal, which it judges by how the shots are framed, focused, and lit. It prefers shots that are in focus over ones that are blurry or unstable, and it also prioritizes well-lit shots. According to the creators of Write-A-Video, the user can render the generated video at any point and preview it with a voice-over narration that describes the text used to select the clips.
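The text-driven editing loop, in which each sentence selects a matching shot and reordering or deleting sentences re-cuts the video, can be sketched with a crude matcher. This is an assumption-laden illustration: keyword overlap with hypothetical clip tags stands in for the tool's NLP-based visual-semantic matching, and the optimization step and film idioms are not modeled.

```python
import re

# Hypothetical repository: each clip carries descriptive tags, standing in for
# the learned visual-semantic features a real system would extract.
CLIPS = {
    "clip_beach": {"beach", "waves", "sand", "sunset"},
    "clip_city": {"city", "street", "traffic", "night"},
    "clip_food": {"food", "market", "street", "cooking"},
}

def words(sentence):
    return set(re.findall(r"[a-z]+", sentence.lower()))

def best_clip(sentence, clips):
    """Pick the clip whose tags overlap most with the sentence (ties go to the
    alphabetically first clip, since max keeps the first maximum it sees)."""
    return max(sorted(clips), key=lambda c: len(words(sentence) & clips[c]))

def assemble(script, clips):
    """One shot per sentence, in sentence order; re-running after a text edit re-cuts the video."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    return [best_clip(s, clips) for s in sentences]

script = "We wandered the street market tasting food. At sunset we reached the beach."
print(assemble(script, CLIPS))

# Moving a sentence in the text reorders the shots in the video.
edited = "At sunset we reached the beach. We wandered the street market tasting food."
print(assemble(edited, CLIPS))
```

Because the shot list is recomputed from the text every time, the text stays the single source of truth for the edit, which is the interaction model the article describes.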
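The article does not say how Write-A-Video measures or weighs focus, stability, and lighting. The sketch below only shows the general shape of such a ranking: a weighted score over per-shot quality features. The feature names, the weights, and the treatment of mid-brightness (0.5) as ideal are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    name: str
    sharpness: float   # 0..1, higher means better focused
    stability: float   # 0..1, higher means less camera shake
    brightness: float  # 0..1 mean luminance; 0.5 is treated as ideal here

def aesthetic_score(shot, w_focus=0.4, w_stable=0.3, w_light=0.3):
    """Hypothetical weighted score; the lighting term peaks at mid-brightness."""
    lighting = 1.0 - 2.0 * abs(shot.brightness - 0.5)
    return w_focus * shot.sharpness + w_stable * shot.stability + w_light * lighting

def pick_best(candidates):
    """Among shots that match the text equally well, keep the most aesthetic one."""
    return max(candidates, key=aesthetic_score)

candidates = [
    Shot("blurry", sharpness=0.2, stability=0.9, brightness=0.5),
    Shot("sharp_dark", sharpness=0.9, stability=0.9, brightness=0.1),
    Shot("sharp_lit", sharpness=0.9, stability=0.8, brightness=0.55),
]
print(pick_best(candidates).name)
```

A weighted sum keeps the trade-offs explicit: a sharp but underexposed shot can still lose to a slightly shakier, well-lit one.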
According to the research team, their experiment demonstrated that digital techniques that combine aspects of computer vision and natural language processing can assist users in creative processes like the editing of videos.
“Our work demonstrates the potential of automatic visual-semantic matching in idiom-based computational editing, offering an intelligent way to make video creation more accessible to non-professionals,” explained Shamir to TechXplore.
The researchers tested their tool on different video repositories combined with themed text documents, and used user studies and quantitative evaluation to interpret the results. The user studies found that non-professionals could sometimes produce high-quality edited videos with the tool faster than professionals could using frame-based editing software. As reported by TechXplore, the team will present the work in a few days at the ACM SIGGRAPH Asia conference in Australia.
Other companies are also using AI to augment video editing. Adobe has been working on its own AI-powered extensions for Premiere Pro, its editing platform, including a tool that helps ensure changes in aspect ratio don’t cut out important parts of the video.