A neuroscience graduate student at Northwestern University recently created a text-based video game in which all of the text the player reads is generated by AI. The text generator is built on GPT-2, the language model OpenAI released earlier this year.
Many early computer games had no graphics; instead, they used a text-based interface. These text adventure games would take in user commands and deliver a series of pre-programmed responses. The user had to type text commands to solve puzzles and advance through the game, a task that could prove challenging depending on the sophistication of the text parser. Early text-based adventure games could respond to only a very limited range of commands.
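The classic approach can be illustrated with a minimal sketch, assuming a hypothetical two-word verb–noun grammar of the kind early adventure games used (all commands and responses here are invented):

```python
# Minimal sketch of a classic verb-noun text-adventure parser with
# pre-programmed responses (all commands and rooms are hypothetical).
RESPONSES = {
    ("look", "room"): "You are in a dusty library. A ladder leads up.",
    ("take", "lamp"): "You pick up the brass lamp.",
    ("climb", "ladder"): "You climb the ladder into the attic.",
}

def parse(command: str) -> str:
    words = command.lower().split()
    if len(words) != 2:
        return "I only understand two-word commands, like TAKE LAMP."
    # Anything outside the hand-written table is rejected -- the core
    # limitation of early text parsers.
    return RESPONSES.get((words[0], words[1]), "You can't do that here.")

print(parse("take lamp"))     # -> You pick up the brass lamp.
print(parse("eat sandwich"))  # -> You can't do that here.
```

Every playable action has to be anticipated and hand-written, which is exactly the limitation that a generative model sidesteps.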
As reported by ZME Science, Nathan Whitmore, the Northwestern graduate student, has revitalized this game concept, using AI to generate responses in real time instead of drawing on pre-programmed ones. Whitmore was reportedly inspired by the Mind Game in the sci-fi novel Ender’s Game, a game that responded to user input and reshaped its world around the player.
The algorithm that drives the text-based adventure game is GPT-2, created by OpenAI. The predictive text model was trained on a dataset dubbed WebText, over 40 GB of text gathered from pages linked on Reddit. The result was an extremely effective predictive text model that could generate shockingly realistic, natural-sounding paragraphs, achieving state-of-the-art performance on a number of language benchmarks. The model was apparently so effective at generating fake news stories that OpenAI was initially hesitant to release it to the public, fearing its misuse. Thankfully, Whitmore has put the model to a far more benign use than making fake news articles.
Whitmore explained to Digital Trends that to produce the game he had to fine-tune GPT-2 extensively on a number of adventure game scripts, using various algorithms to adjust the model’s parameters until its output resembled the text of adventure games.
What’s particularly interesting about the game is that it is genuinely creative. The user can input almost any text they can think of, regardless of the particular setting or context of the game, and the game will try to adapt and determine what should happen next. Whitmore explained that you can enter almost any random prompt because the model has enough “common sense” to adapt to the input.
Whitmore’s custom GPT-2 model does have some limitations. It easily forgets things the user has already told it, having a short “memory.” In other words, it doesn’t preserve the context of the situation across commands the way a traditional pre-programmed text adventure game would, and, like many passages of AI-generated text, its output doesn’t always make sense.
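That short “memory” comes from the model’s fixed context window: only the most recent tokens of the transcript are fed back in, so earlier events simply fall out of view. A rough sketch of the truncation (the window size and whitespace tokenization are simplified assumptions; GPT-2’s real window is 1024 subword tokens):

```python
# Sketch of why a fixed context window "forgets": only the last
# MAX_CONTEXT tokens of the transcript are ever shown to the model.
MAX_CONTEXT = 8  # tiny for clarity; GPT-2 uses 1024 subword tokens

def build_prompt(history):
    """Return the token window the model would actually condition on."""
    return history[-MAX_CONTEXT:]

transcript = ("you take the sword then walk north into the dark cave "
              "and light a torch").split()
window = build_prompt(transcript)
print(window)
# "sword" has already slid outside the window, so the model cannot
# condition on it when generating the next response.
print("sword" in window)  # -> False
```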
However, the program does markedly well at simulating the structure and style of text adventure games, giving the user descriptions of the setting and even offering options for interacting with the environment it has created.
“I think it’s creative in a very basic way, like how a person playing ‘Apples to Apples’ is creative,” Whitmore explained. “It’s taking things from old adventure games and rearranging them into something that’s new and interesting and different every time. But it’s not actually generating an overall plot or overarching idea. There are a lot of different kinds of creativity and I think it’s doing one: Generating novel environments, but not the other kinds: Figuring out an intriguing plot for a game.”
Whitmore’s project also suggests that GPT-2 is robust enough to be used for a wide variety of purposes beyond generating text intended only to be read. Whitmore demonstrates that the model can drive a system built around user responses and feedback, and it will be interesting to see what other interactive applications of GPT-2 surface in the future.
Multimodal Learning Is Becoming Prominent Among AI Developers
As VentureBeat (VB) reports on a recent ABI Research study, the key concept lies in the fact that “data sets are fundamental building blocks of AI systems,” and that without data sets, “models can’t learn the relationships that inform their predictions.” The ABI report predicts that “while the total installed base of AI devices will grow from 2.69 billion in 2019 to 4.47 billion in 2024, comparatively few will be interoperable in the short term.”
This could represent a considerable waste of time, energy, and resources: “rather than combine the gigabytes to petabytes of data flowing through them into a single AI model or framework, they’ll work independently and heterogeneously to make sense of the data they’re fed.”
To overcome this, ABI proposes multimodal learning, a methodology that could consolidate data “from various sensors and inputs into a single system. Multimodal learning can carry complementary information or trends, which often only become evident when they’re all included in the learning process.”
VB presents a concrete example involving images and text captions. “If different words are paired with similar images, these words are likely used to describe the same things or objects. Conversely, if some words appear next to different images, this implies these images represent the same object. Given this, it should be possible for an AI model to predict image objects from text descriptions, and indeed, a body of academic literature has proven this to be the case.”
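The co-occurrence idea in that example can be sketched with a toy script: words that repeatedly appear with the same images become predictive of the objects those images contain. The caption–image pairs below are invented, and real multimodal systems learn joint embeddings rather than count overlaps:

```python
# Toy sketch of image-text co-occurrence: two words are treated as
# related if they were ever used to caption the same image.
from collections import defaultdict

# (caption word, image id) pairs, as if harvested from captioned photos
pairs = [
    ("dog", "img1"), ("puppy", "img1"),
    ("dog", "img2"), ("leash", "img2"),
    ("cat", "img3"), ("kitten", "img3"),
    ("dog", "img4"), ("puppy", "img4"),
]

word_to_images = defaultdict(set)
for word, img in pairs:
    word_to_images[word].add(img)

def related(word_a, word_b):
    """Two words are 'related' if they share at least one image."""
    return bool(word_to_images[word_a] & word_to_images[word_b])

print(related("dog", "puppy"))   # -> True  (co-occur on img1 and img4)
print(related("dog", "kitten"))  # -> False (never share an image)
```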
Despite the possible advantages, ABI notes that even tech giants like IBM, Microsoft, Amazon, and Google continue to focus predominantly on unimodal systems, in part because of the challenges such a switch would present.
Still, the ABI researchers anticipate that “the total number of devices shipped will grow from 3.94 million in 2017 to 514.12 million in 2023, spurred by adoption in the robotics, consumer, health care, and media and entertainment segments.” Among the examples of companies already implementing multimodal learning, they cite Waymo, which is using such approaches to build “hyper-aware self-driving vehicles,” and Intel Labs, where the company’s engineering team is “investigating techniques for sensor data collation in real-world environments.”
Intel Labs principal engineer Omesh Tickoo explained to VB that “What we did is, using techniques to figure out context such as the time of day, we built a system that tells you when a sensor’s data is not of the highest quality. Given that confidence value, it weighs different sensors against each other at different intervals and chooses the right mix to give us the answer we’re looking for.”
VB notes that unimodal learning will remain predominant where it is highly effective – in applications like image recognition and natural language processing. At the same time it predicts that “as electronics become cheaper and compute more scalable, multimodal learning will likely only rise in prominence.”
Google Adds Two New Artificial Intelligence Features To Its Applications
As The Verge and CNET report, Google is adding two new AI features to its applications. The first is the Smart Compose feature that will help Google Docs users, while the second is the capability for the users to buy movie tickets through its Duplex booking system.
With Smart Compose, when it becomes fully available, the users will be able to access “AI-powered writing suggestions outside of their inbox.” At the moment, “only domain administrators can sign up for the beta.”
This new feature will use Google’s machine learning models, which study the user’s “past writing to personalize its prompts (in Gmail you can turn this feature off in settings).” In theory, this means Smart Compose should give writing suggestions that match the user’s own writing style.
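The personalization idea can be illustrated with a deliberately simple sketch: learn bigram counts from a user’s past writing and suggest the word that user most often types next. Real Smart Compose uses large neural language models, not bigram counts, and the sample text is invented:

```python
# Toy sketch of style personalization: suggest the next word a user
# most frequently typed after the current one in their own history.
from collections import Counter, defaultdict

past_writing = ("kind regards john . best regards john . "
                "kind regards john").split()

next_word = defaultdict(Counter)
for prev, curr in zip(past_writing, past_writing[1:]):
    next_word[prev][curr] += 1

def suggest(word):
    """Most frequent continuation of `word` in this user's history."""
    counts = next_word.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(suggest("regards"))  # -> john  (this user's habitual sign-off)
```

A different user’s history would yield different suggestions from the same code, which is the essence of the personalization described above.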
The Verge suggests that “Smart Compose to Google Docs could be a big step up for the tool, challenging its AI autosuggestions with a larger range of writing styles.” The new tool could be applied to all documents that can be created with the application – “from schoolwork to corporate planning documents,” to first drafts of a novel.
In the beginning, Google will limit Smart Compose’s reach, targeting businesses only. As mentioned, Smart Compose for Docs is available only in beta, only in English, and only domain administrators can volunteer to test it.
Another feature, which Google announced on November 21, is Duplex on the Web, a booking tool that lets users buy movie tickets easily.
As CNET notes, the “service is available on Android phones. To use it, you’d ask the Assistant — Google’s digital helper software akin to Amazon’s Alexa and Apple’s Siri — to look up showtimes for a particular movie in your area. The software then opens up Google’s Chrome browser and finds the tickets.”
To offer the service, Google partnered with “70 movie theater and ticket companies, including AMC, Fandango and Odeon.” The company plans to expand the booking system to car rental reservations next.
The AI software included in the tool is “patterned after human speech, using verbal tics like ‘uh’ and ‘um.’ It speaks with the cadence of a real person, pausing before responding and elongating certain words as though it’s buying time to think.” Duplex actually premiered last year, offering bookings for restaurants and hair salons. “Google later said it would build in disclosures so people would know they were talking to automated software.”
As explained, the new Duplex version for ordering movie tickets works as follows: “Once you’ve asked the Assistant for movie tickets, the software opens up a ticketing website in Chrome and starts filling in fields. The system enters information in the form by using data culled from your calendar, Gmail inbox and Chrome autofill (like your credit card and login information).
Throughout the process, you see a progress bar, like you’d see if you were downloading a file. Whenever the system needs more information, like a price or seat selection, the process pauses and prompts you to make a selection. When it’s done, you tap to confirm the booking or payment.”
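The fill-then-pause flow described above can be sketched in a few lines. The field names and profile data are invented; real Duplex drives a browser and pulls from calendar, Gmail, and Chrome autofill rather than a dictionary:

```python
# Toy sketch of the described Duplex flow: fill ticket-form fields from
# stored user data and pause for user input when something is missing.
profile = {"name": "Alex", "email": "alex@example.com", "card": "4242-4242"}
form_fields = ["name", "email", "card", "seat"]  # "seat" needs the user

def fill_form(fields, profile):
    filled, pending = {}, []
    for field in fields:
        if field in profile:
            filled[field] = profile[field]   # auto-filled from stored data
        else:
            pending.append(field)            # pause and prompt the user
    return filled, pending

filled, pending = fill_form(form_fields, profile)
print(pending)  # -> ['seat']  (the system pauses here for a selection)
```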
New AI Powered Tool Enables Video Editing From Themed Text Documents
A team of computer science researchers from Tsinghua and Beihang University in China, IDC Herzliya in Israel, and Harvard University has recently created a tool that generates edited videos from a text description and a repository of video clips.
Massive amounts of video footage are recorded every day by professional videographers, hobbyists, and regular people. Yet editing this footage into a presentation that makes sense is still a costly time investment, often requiring complex editing tools to manipulate raw footage. The international team of researchers developed a tool that takes themed text descriptions and generates videos based on them. The tool examines clips in a repository and selects those that correspond to the input text describing the storyline. The goal is a tool user-friendly and powerful enough to produce quality videos without extensive video editing skills or expensive video editing software.
While current video editing platforms require knowledge of editing techniques, the researchers’ tool lets novice video creators build compositions that tell stories in a more natural, intuitive fashion. “Write-A-Video,” as its creators dub it, lets users edit a video simply by editing the text that accompanies it. If a user deletes text, adds text, or moves sentences around, the changes are reflected in the video: corresponding shots are cut or added as the user manipulates the text, and the final video is tailored to the user’s description.
Ariel Shamir, Dean of the Efi Arazi School of Computer Science at IDC Herzliya, explained that Write-A-Video lets the user interact with the video mainly through text, using natural language processing techniques to match video shots to the provided semantic meaning. An optimization algorithm then assembles the video by cutting and swapping shots. The tool also lets users experiment with different visual styles, tweaking how scenes are presented through specific film idioms that speed up or slow down the action or make more or fewer cuts.
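The matching step can be caricatured with a toy script that scores each candidate shot against a script sentence by keyword overlap and greedily picks the best shot per sentence. Write-A-Video’s actual visual-semantic matching and optimization are far more sophisticated, and the clip names and tags below are invented:

```python
# Toy sketch of visual-semantic matching: pick, for each sentence of
# the script, the shot whose tags best overlap the sentence's words.
script = ["a dog runs on the beach", "waves crash at sunset"]

shots = {
    "clip_a.mp4": {"dog", "beach", "running"},
    "clip_b.mp4": {"sunset", "waves", "ocean"},
    "clip_c.mp4": {"city", "traffic"},
}

def score(sentence, tags):
    """Number of shot tags mentioned in the sentence."""
    return len(set(sentence.split()) & tags)

def assemble(script, shots):
    """Greedily choose the highest-scoring shot for each sentence."""
    return [max(shots, key=lambda clip: score(s, shots[clip])) for s in script]

print(assemble(script, shots))  # -> ['clip_a.mp4', 'clip_b.mp4']
```

Editing the script text re-runs the matching, which mirrors the described workflow of cutting and swapping shots as sentences change.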
The program selects candidate shots based on their aesthetic appeal, considering how shots are framed, focused, and lit. It prefers shots that are sharply focused over blurry or unstable ones, and it prioritizes shots that are well lit. According to the creators of Write-A-Video, the user can render the generated video at any point and preview it with a voice-over narration of the text used to select the clips.
According to the research team, their experiment demonstrated that digital techniques that combine aspects of computer vision and natural language processing can assist users in creative processes like the editing of videos.
“Our work demonstrates the potential of automatic visual-semantic matching in idiom-based computational editing, offering an intelligent way to make video creation more accessible to non-professionals,” explained Shamir to TechXplore.
The researchers tested their tool on different video repositories paired with themed text documents. User studies and quantitative evaluations were performed to interpret the results. The user studies found that non-professionals using the tool could sometimes produce high-quality edited videos faster than professionals using frame-based editing software. As reported by TechXplore, the team will present their work in a few days at the ACM SIGGRAPH Asia conference in Australia. Other companies are also using AI to augment video editing: Adobe has been working on its own AI-powered extensions for Premiere Pro, its editing platform, including a tool that helps ensure changes in aspect ratio don’t cut out important parts of the video.