
Natural Language Processing

New AI Powered Tool Enables Video Editing From Themed Text Documents


A team of computer science researchers from Tsinghua University and Beihang University in China, IDC Herzliya in Israel, and Harvard University has created a tool that generates edited videos from a text description and a repository of video clips.

Massive amounts of video footage are recorded every day by professional videographers, hobbyists, and regular people. Yet editing this footage down into a presentation that makes sense remains a costly time investment, often requiring complex editing tools that manipulate raw footage. The international team of researchers developed a tool that takes themed text descriptions and generates videos based on them. The tool examines the video clips in a repository and selects the clips that correspond to the input text describing the storyline. The goal is for the tool to be user-friendly and powerful enough to produce quality videos without extensive video editing skills or expensive video editing software.

While current video editing platforms require knowledge of video editing techniques, the tool created by the researchers lets novice video creators create compositions that tell stories in a more natural, intuitive fashion. “Write-A-Video”, as it is dubbed by its creators, lets users edit videos by just editing the text that accompanies the video. If a user deletes text, adds text, or moves sentences around, these changes will be reflected in the video. Corresponding shots are cut or added as the user manipulates the text, and the final video is tailored to the user’s description.

Ariel Shamir, the Dean of the Efi Arazi School of Computer Science at IDC Herzliya, explained that the Write-A-Video tool lets the user interact with the video mainly through text, using natural language processing techniques to match video shots to the semantic meaning of the text. An optimization algorithm is then used to assemble the video by cutting and swapping shots. The tool also allows users to experiment with different visual styles, tweaking how scenes are presented by using specific film idioms that speed up or slow down the action, or make more or fewer cuts.
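As a rough illustration of matching text to shots, the sketch below pairs each sentence of a story with the clip whose tags share the most words with it. This is not the Write-A-Video method: the word-overlap (Jaccard) score stands in for the tool’s semantic matching, and the clip names and tags are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def match_shots(sentences, clips):
    """For each sentence, pick the clip whose tags overlap it the most."""
    plan = []
    for sentence in sentences:
        words = tokenize(sentence)
        best = max(clips, key=lambda name: jaccard(words, tokenize(clips[name])))
        plan.append((sentence, best))
    return plan

# Hypothetical clip repository: file name -> free-text tags.
clips = {
    "beach.mp4": "sunset beach waves ocean",
    "market.mp4": "street market food crowd",
    "hike.mp4": "mountain trail hiking forest",
}
story = [
    "We watched the sunset over the ocean.",
    "Then we ate street food at the market.",
]

for sentence, clip in match_shots(story, clips):
    print(clip, "<-", sentence)
```

Because the plan is rebuilt from the sentence list, reordering or deleting sentences in `story` reorders or drops the matching clips, which mirrors the text-driven editing described above.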

The program selects possible shots based on their aesthetic appeal, which it judges by how shots are framed, focused, and lit. The tool will select shots that are well focused, rather than blurry or unstable, and it will also prioritize shots that are well lit. According to the creators of Write-A-Video, the user can render the generated video at any point and preview it with a voice-over narration of the text used to select the clips.

According to the research team, their experiment demonstrated that digital techniques that combine aspects of computer vision and natural language processing can assist users in creative processes like the editing of videos.

“Our work demonstrates the potential of automatic visual-semantic matching in idiom-based computational editing, offering an intelligent way to make video creation more accessible to non-professionals,” explained Shamir to TechXplore.

The researchers tested their tool on different video repositories combined with themed text documents. User studies and quantitative evaluations were performed to interpret the results of the experiment. The user studies found that non-professionals could sometimes produce high-quality edited videos with the tool faster than professionals could with frame-based editing software. As reported by TechXplore, the team will be presenting their work in a few days at the ACM SIGGRAPH Asia conference held in Australia.

Other entities are also using AI to augment video editing. Adobe has been working on its own AI-powered extensions for Premiere Pro, its editing platform. One such extension helps people ensure that changes in aspect ratio don’t cut out important parts of the video.


Blogger and programmer with specialties in machine learning and deep learning topics. Daniel hopes to help others use the power of AI for social good.

Natural Language Processing

Multimodal Learning Is Becoming Prominent Among AI Developers


VentureBeat (VB) devoted one of its weekly reports to the advantages of multimodal learning in the development of artificial intelligence. The piece was prompted by an ABI Research report on the matter.

The key concept lies in the fact that “data sets are fundamental building blocks of AI systems,” and that without data sets, “models can’t learn the relationships that inform their predictions.” The ABI report predicts that “while the total installed base of AI devices will grow from 2.69 billion in 2019 to 4.47 billion in 2024, comparatively few will be interoperable in the short term.”

This could represent a considerable waste of time, energy, and resources: “rather than combine the gigabytes to petabytes of data flowing through them into a single AI model or framework, they’ll work independently and heterogeneously to make sense of the data they’re fed.”

To overcome this, ABI proposes multimodal learning, a methodology that could consolidate data “from various sensors and inputs into a single system. Multimodal learning can carry complementary information or trends, which often only become evident when they’re all included in the learning process.”

VB presents a viable example that considers images and text captions. “If different words are paired with similar images, these words are likely used to describe the same things or objects. Conversely, if some words appear next to different images, this implies these images represent the same object. Given this, it should be possible for an AI model to predict image objects from text descriptions, and indeed, a body of academic literature has proven this to be the case.”
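The first half of that reasoning can be sketched in a few lines. The example below is only an illustration: the image identifiers and caption words are made up, and sharing the same image stands in for “similar images”. Two words are scored as related when they tend to caption the same images.

```python
from collections import defaultdict

# Hypothetical data: (image id, caption word) pairs.
pairs = [
    ("img_cat_1", "cat"), ("img_cat_1", "kitten"),
    ("img_cat_2", "cat"), ("img_cat_2", "kitten"),
    ("img_car_1", "car"), ("img_car_1", "automobile"),
    ("img_car_2", "car"),
]

def images_per_word(pairs):
    """Map each word to the set of images it was paired with."""
    index = defaultdict(set)
    for image, word in pairs:
        index[word].add(image)
    return index

def word_similarity(index, w1, w2):
    """Jaccard overlap of the image sets of two words (0.0 to 1.0)."""
    a, b = index[w1], index[w2]
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

index = images_per_word(pairs)
print(word_similarity(index, "cat", "kitten"))      # same images every time
print(word_similarity(index, "car", "automobile"))  # partial overlap
print(word_similarity(index, "cat", "car"))         # no shared images
```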

Despite the possible advantages, ABI notes that even tech giants like IBM, Microsoft, Amazon, and Google continue to focus predominantly on unimodal systems. One reason is the challenges such a switch would pose.

Still, the ABI researchers anticipate that “the total number of devices shipped will grow from 3.94 million in 2017 to 514.12 million in 2023, spurred by adoption in the robotics, consumer, health care, and media and entertainment segments.” Among the companies already implementing multimodal learning, they cite Waymo, which is using such approaches to build “hyper-aware self-driving vehicles,” and Intel Labs, where the company’s engineering team is “investigating techniques for sensor data collation in real-world environments.”

Intel Labs principal engineer Omesh Tickoo explained to VB that “What we did is, using techniques to figure out context such as the time of day, we built a system that tells you when a sensor’s data is not of the highest quality. Given that confidence value, it weighs different sensors against each at different intervals and chooses the right mix to give us the answer we’re looking for.”
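One simple way to picture weighing sensors by confidence is a confidence-weighted average. The sketch below is an assumption for illustration only, not Intel’s system: the sensor values, the confidence scores, and the `fuse` function are all hypothetical.

```python
def fuse(readings):
    """Confidence-weighted average of (value, confidence) pairs."""
    total = sum(conf for _, conf in readings)
    if total == 0:
        raise ValueError("no sensor has non-zero confidence")
    return sum(value * conf for value, conf in readings) / total

# Hypothetical distance estimates in meters: (camera, radar).
# In daylight the camera is trusted more; at night the radar is.
daytime = [(10.0, 0.9), (12.0, 0.1)]
night = [(10.0, 0.1), (12.0, 0.9)]

print(fuse(daytime))  # close to the camera's estimate
print(fuse(night))    # close to the radar's estimate
```

The same readings yield different answers depending on context, because the context changes how much each sensor is trusted.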

VB notes that unimodal learning will remain predominant where it is highly effective – in applications like image recognition and natural language processing. At the same time it predicts that “as electronics become cheaper and compute more scalable, multimodal learning will likely only rise in prominence.”


Natural Language Processing

Google Adds Two New Artificial Intelligence Features To Its Applications


As The Verge and CNET report, Google is adding two new AI features to its applications. The first is Smart Compose for Google Docs users, while the second lets users buy movie tickets through Google’s Duplex booking system.

Smart Compose

Once Smart Compose becomes fully available, users will be able to access “AI-powered writing suggestions outside of their inbox.” At the moment, “only domain administrators can sign up for the beta.”

The new feature will use Google’s machine learning models, which will study the user’s “past writing to personalize its prompts (in Gmail you can turn this feature off in settings).” In theory, this means Smart Compose will tailor its writing suggestions to the user’s own writing style.

The Verge suggests that bringing “Smart Compose to Google Docs could be a big step up for the tool, challenging its AI autosuggestions with a larger range of writing styles.” The new tool could be applied to all documents that can be created with the application, “from schoolwork to corporate planning documents,” to first drafts of a novel.

In the beginning, Google will limit Smart Compose’s reach and will target businesses only. As mentioned, Smart Compose for Docs is only available in beta, only in English, and only domain administrators can volunteer to test it.

Google Duplex

Another feature, which Google announced on November 21, is Duplex on the Web, a booking service that lets users buy movie tickets easily.

As CNET notes, the “service is available on Android phones. To use it, you’d ask the Assistant — Google’s digital helper software akin to Amazon’s Alexa and Apple’s Siri — to look up showtimes for a particular movie in your area. The software then opens up Google’s Chrome browser and finds the tickets.”

To offer the service, Google partnered with “70 movie theater and ticket companies, including AMC, Fandango and Odeon.” The company plans to expand the booking system to car rental reservations next.

The AI software included in the tool is “patterned after the human speech, using verbal tics like ‘uh’ and ‘um.’ It speaks with the cadence of a real person, pausing before responding and elongating certain words as though it’s buying time to think.” Duplex actually premiered last year, when it offered bookings for restaurants and hair salons. “Google later said it would build in disclosures so people would know they were talking to automated software.”

The new Duplex version for ordering movie tickets works as follows: “Once you’ve asked the Assistant for movie tickets, the software opens up a ticketing website in Chrome and starts filling in fields. The system enters information in the form by using data culled from your calendar, Gmail inbox and Chrome autofill (like your credit card and login information).

Throughout the process, you see a progress bar, like you’d see if you were downloading a file. Whenever the system needs more information, like a price or seat selection, the process pauses and prompts you to make a selection. When it’s done, you tap to confirm the booking or payment.”


AI 101

What is Natural Language Processing?


Natural Language Processing (NLP) is the study and application of techniques and tools that enable computers to process, analyze, interpret, and reason about human language. NLP is an interdisciplinary field and it combines techniques established in fields like linguistics and computer science. These techniques are used in concert with AI to create chatbots and digital assistants like Google Assistant and Amazon’s Alexa.

Let’s take some time to explore the rationale behind Natural Language Processing, some of the techniques used in NLP, and some common use cases for NLP.

Why Is Natural Language Processing Important?

In order for computers to interpret human language, it must be converted into a form that a computer can manipulate. However, this isn’t as simple as converting text data into numbers. In order to derive meaning from human language, patterns have to be extracted from the hundreds or thousands of words that make up a text document. This is no easy task. There are few hard and fast rules that can be applied to the interpretation of human language. For instance, the exact same set of words can mean different things depending on the context. Human language is complex and often ambiguous, and the same statement can be uttered with sincerity or sarcasm.

Despite this, there are some general guidelines that can be used when interpreting words and characters, such as the character “s” being used to denote that an item is plural. These general guidelines have to be used in concert with each other to extract meaning from the text, to create features that a machine learning algorithm can interpret.

Natural Language Processing involves the application of various algorithms capable of taking unstructured data and converting it into structured data. If these algorithms are applied in the wrong manner, the computer will often fail to derive the correct meaning from the text. This can often be seen in the translation of text between languages, where the precise meaning of the sentence is often lost. While machine translation has improved substantially over the past few years, machine translation errors still occur frequently.

Natural Language Processing Techniques


Photo: Tamur via WikiMedia Commons, Public Domain (https://commons.wikimedia.org/wiki/File:ParseTree.svg)

Many of the techniques that are used in natural language processing can be placed in one of two categories: syntax or semantics. Syntax techniques are those that deal with the ordering of words, while semantic techniques are the techniques that involve the meaning of words.

Syntax NLP Techniques

Examples of syntax techniques include:

  • Lemmatization
  • Morphological Segmentation
  • Part-of-Speech Tagging
  • Parsing
  • Sentence Breaking
  • Stemming
  • Word Segmentation

Lemmatization refers to distilling the different inflections of a word down to a single form. Lemmatization takes things like tenses and plurals and simplifies them. For example, “feet” might become “foot” and “stripes” may become “stripe”. This simplified word form makes it easier for an algorithm to interpret the words in a document.
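A toy version of this idea can be written with a lookup table for irregular forms plus a couple of plural rules. This is a minimal sketch, not a real lemmatizer: the `IRREGULAR` table and the suffix rules are illustrative assumptions, and production tools rely on large dictionaries and part-of-speech information.

```python
# Hypothetical lookup table for irregular forms.
IRREGULAR = {"feet": "foot", "mice": "mouse", "went": "go", "was": "be"}

def lemmatize(word):
    """Reduce a word to a dictionary-style base form using simple rules."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    # "cities" -> "city"
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    # "stripes" -> "stripe", but leave words like "glass" alone
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

for w in ["feet", "stripes", "cities", "glass"]:
    print(w, "->", lemmatize(w))
```

Rules this simple will mangle some words (for example, words that merely end in “s”), which is why real lemmatizers consult a vocabulary.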

Morphological segmentation is the process of dividing words into morphemes, the smallest meaningful units of a word. These units include free morphemes (which can stand alone as words) as well as prefixes and suffixes.

Part-of-speech tagging is simply the process of identifying which part of speech every word in an input document is.
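The sketch below shows the shape of the task with a tiny hand-written lexicon and a few suffix guesses. The lexicon, the tag names, and the fallback rules are all assumptions made for illustration; real taggers are trained statistical models that also use the surrounding words.

```python
# Hypothetical mini-lexicon mapping known words to part-of-speech tags.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "ball": "NOUN",
    "chased": "VERB", "is": "VERB",
}

def tag(tokens):
    """Assign a part-of-speech tag to each token."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            pos = LEXICON[word]
        elif word.endswith("ly"):
            pos = "ADV"    # guess: "-ly" words are usually adverbs
        elif word.endswith("ed") or word.endswith("ing"):
            pos = "VERB"   # guess: common verb endings
        else:
            pos = "NOUN"   # default guess for unknown words
        tagged.append((token, pos))
    return tagged

print(tag("The dog quickly chased a ball".split()))
```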

Parsing refers to analyzing all the words in a sentence and correlating them with their formal grammar labels or doing grammatical analysis for all the words.

Sentence breaking, or sentence boundary segmentation, refers to deciding where a sentence begins and ends.
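A naive sentence breaker can be sketched with a regular expression that splits after sentence-ending punctuation when the next word starts with a capital letter. This is only a rough heuristic chosen for illustration; it will wrongly split after abbreviations such as “Dr.”.

```python
import re

def split_sentences(text):
    """Split text after ., ! or ? when followed by whitespace and a capital."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]

print(split_sentences("NLP is fun. Is it hard? Yes!"))
```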

Stemming is the process of reducing words down to the root form of the word. For instance, connected, connection, and connections would all be stemmed to “connect”.
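The example in the paragraph above can be reproduced with a crude suffix stripper. This is a simplified sketch rather than an established algorithm such as the Porter stemmer; the suffix list and the minimum stem length are arbitrary assumptions.

```python
# Suffixes to strip, longest first so "ions" is tried before "ion" and "s".
SUFFIXES = ("ions", "ion", "ing", "ed", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["connected", "connection", "connections"]:
    print(w, "->", stem(w))
```

Unlike lemmatization, the result of stemming need not be a real word; it only needs to be the same string for related forms.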

Word Segmentation is the process of dividing large pieces of text down into small units, which can be words or stemmed/lemmatized units.
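For languages that separate words with spaces, a basic word segmenter can be sketched with one regular expression. The pattern below is an illustrative assumption: it keeps simple contractions together and emits each punctuation mark as its own token.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]", text)

print(tokenize("Don't panic, it's fine."))
```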

Semantic NLP Techniques

Semantic NLP techniques include techniques like:

  • Named Entity Recognition
  • Natural Language Generation
  • Word-Sense disambiguation

Named entity recognition involves tagging certain text portions that can be placed into one of a number of different preset groups. Pre-defined categories include things like dates, cities, places, companies, and individuals.
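A rule-based sketch of this idea combines a date pattern with a small list of known names. The `GAZETTEER` entries and labels are hypothetical, and modern systems learn to recognize entities from annotated text rather than relying on fixed lists.

```python
import re

# Hypothetical list of known names and their categories.
GAZETTEER = {"Google": "COMPANY", "Amazon": "COMPANY", "Paris": "CITY", "Tokyo": "CITY"}

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_PATTERN = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}(?:, \d{4})?\b")

def find_entities(text):
    """Return (text span, label) pairs for dates and known names."""
    entities = [(m.group(), "DATE") for m in DATE_PATTERN.finditer(text)]
    for name, label in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            entities.append((name, label))
    return entities

print(find_entities("Google opened an office in Paris on November 21, 2019."))
```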

Natural language generation is the process of transforming structured data, such as database records, into natural language. For instance, statistics about the weather, like temperature and wind speed, could be summarized in a natural-language sentence.
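The weather example can be sketched with a template. The field names (`city`, `temperature_c`, `wind_kph`) and the thresholds are assumptions for illustration; template filling is only the simplest form of natural language generation.

```python
def describe_weather(record):
    """Turn a structured weather record into a short English sentence."""
    temp = record["temperature_c"]
    wind = record["wind_kph"]
    if temp < 10:
        feel = "cold"
    elif temp < 25:
        feel = "mild"
    else:
        feel = "hot"
    breeze = "calm" if wind < 15 else "windy"
    return (f"It is {feel} and {breeze} in {record['city']}, "
            f"with a temperature of {temp} degrees Celsius "
            f"and winds of {wind} km/h.")

print(describe_weather({"city": "Oslo", "temperature_c": 4, "wind_kph": 30}))
```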

Word-sense disambiguation is the process of assigning meaning to words within a text based on the context the words appear in.
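One classic approach, in the spirit of the Lesk algorithm, picks the sense whose dictionary definition shares the most words with the surrounding sentence. The sketch below is a simplified version with a hypothetical two-sense inventory for “bank” and a small stopword list.

```python
import re

# Hypothetical sense inventory: word -> {sense name: definition}.
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money",
        "river edge": "the sloping land beside a river or lake",
    }
}
STOPWORDS = {"a", "an", "the", "and", "or", "that", "of", "to", "in", "on", "by", "beside"}

def words(text):
    """Lowercased content words of a text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def disambiguate(word, sentence):
    """Pick the sense whose definition overlaps the sentence the most."""
    context = words(sentence)
    glosses = SENSES[word]
    return max(glosses, key=lambda sense: len(context & words(glosses[sense])))

print(disambiguate("bank", "She deposits money at the bank"))
print(disambiguate("bank", "We sat on the bank of the river"))
```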

Deep Learning Models For Natural Language Processing

Regular multilayer perceptrons are unable to handle the interpretation of sequential data, where the order of the information is important. In order to deal with the importance of order in sequential data, a type of neural network is used that preserves information from previous timesteps in the sequence.

Recurrent Neural Networks (RNNs) are neural networks that loop over the sequence, carrying information from previous timesteps forward and taking it into account when calculating the hidden state and output at the current timestep. Essentially, RNNs have three weight matrices that are used during the forward pass: a matrix applied to the previous hidden state, a matrix applied to the current input, and a matrix between the hidden state and the output. Because RNNs can take information from previous timesteps into account, they can extract relevant patterns from text data by taking earlier words in the sentence into account when interpreting the meaning of a word.

Another type of deep learning architecture used to process text data is a Long Short-Term Memory (LSTM) network. LSTM networks are a type of RNN, but owing to the gating mechanisms in their architecture they tend to perform better than standard RNNs on long sequences. They mitigate a specific problem that often occurs when training standard RNNs, called the vanishing gradient problem.

These deep neural networks can be either unidirectional or bi-directional. Bi-directional networks are capable of taking into account not just the words that come before the current word, but also the words that come after it. While this often leads to higher accuracy, it is more computationally expensive.
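The forward pass of a basic RNN can be sketched in plain Python. This is a minimal illustration under stated assumptions: bias terms are omitted, the weight values are made up rather than learned, and the names `W_xh`, `W_hh`, and `W_hy` are just conventional labels for the input, hidden-state, and output matrices.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_forward(inputs, W_xh, W_hh, W_hy):
    """Run a simple RNN over a sequence and return (outputs, final hidden state)."""
    hidden = [0.0] * len(W_hh)  # initial hidden state is all zeros
    outputs = []
    for x in inputs:
        from_input = matvec(W_xh, x)        # contribution of the current input
        from_memory = matvec(W_hh, hidden)  # contribution of the previous hidden state
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_memory)]
        outputs.append(matvec(W_hy, hidden))
    return outputs, hidden

# Made-up weights: 2 input features, 2 hidden units, 1 output.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
W_hy = [[1.0, -1.0]]

a, b = [1.0, 0.0], [0.0, 1.0]
print(rnn_forward([a, b], W_xh, W_hh, W_hy)[0][-1])
print(rnn_forward([b, a], W_xh, W_hh, W_hy)[0][-1])
```

Feeding the same two inputs in a different order produces a different final output, because the hidden state carries information from the earlier timestep into the later one.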

Use Cases For Natural Language Processing


Photo: mohammed_hassan via Pixabay, Pixabay License (https://pixabay.com/illustrations/chatbot-chat-application-artificial-3589528/)

Because Natural Language Processing involves the analysis and manipulation of human languages, it has an incredibly wide range of applications. Possible applications for NLP include chatbots, digital assistants, sentiment analysis, document organization, talent recruitment, and healthcare.

Chatbots and digital assistants like Amazon’s Alexa and Google Assistant are examples of voice recognition and synthesis platforms that use NLP to interpret and respond to vocal commands. These digital assistants help people with a wide variety of tasks, letting them offload some of their cognitive tasks to another device and free up some of their brainpower for other, more important things. Instead of looking up the best route to the bank on a busy morning, we can just have our digital assistant do it.

Sentiment analysis is the use of NLP techniques to study people’s reactions to and feelings about a phenomenon, as communicated by their use of language. Capturing the sentiment of a statement, such as whether a product review is positive or negative, can provide companies with substantial information about how their product is being received.
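A very small lexicon-based scorer illustrates the idea. The word lists and the one-word negation rule are assumptions for illustration; practical sentiment systems are usually trained classifiers that cope far better with context and sarcasm.

```python
import re

# Hypothetical sentiment word lists.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "broken"}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Label text as positive, negative, or neutral by counting sentiment words."""
    score = 0
    negate = False
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATIONS:
            negate = True  # flip the polarity of the next word only
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)
        score += -value if negate else value
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in ["I love this great phone", "The screen is not good", "It arrived on Tuesday"]:
    print(review, "->", sentiment(review))
```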

Automatically organizing text documents is another application of NLP. Companies like Google and Yahoo use NLP algorithms to classify email documents, putting them in the appropriate bins such as “social” or “promotions”. They also use these techniques to identify spam and prevent it from reaching your inbox.
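Sorting messages into bins can be sketched with a tiny naive Bayes classifier. This is an illustrative sketch only, not how any particular email provider does it: the training messages and labels are made up, and real systems train on far more data and features.

```python
import math
import re
from collections import Counter, defaultdict

def tokens(text):
    """Lowercase alphabetic words of a text."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens(text))
    return word_counts, label_counts

def classify(text, model):
    """Return the most probable label under multinomial naive Bayes."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # prior
        total_words = sum(word_counts[label].values())
        for word in tokens(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out a label.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training messages.
examples = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("sale discount on shoes this week", "promotions"),
    ("new discount codes and sale offers", "promotions"),
    ("your friend tagged you in a photo", "social"),
    ("a friend commented on your photo", "social"),
]
model = train(examples)
print(classify("claim your free prize", model))
print(classify("shoes on sale", model))
```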

NLP techniques are also being used to identify potential job hires based on their relevant skills. Hiring managers are likewise using NLP techniques to help them sort through lists of applicants.

NLP techniques are also being used to enhance healthcare. NLP can be used to improve the detection of diseases. Health records can be analyzed and symptoms extracted by NLP algorithms, which can then be used to suggest possible diagnoses. One example of this is Amazon’s Comprehend Medical platform, which analyzes health records and extracts diseases and treatments. Healthcare applications of NLP also extend to mental health. There are apps such as Woebot, which talks users through a variety of anxiety management techniques based on Cognitive Behavioral Therapy.

To Learn More

Recommended Natural Language Processing Courses

  • Introduction to Artificial Intelligence. Offered by: IBM. Duration: 9 hours. Difficulty: Beginner.
  • Natural Language Processing in TensorFlow. Offered by: Deep Learning AI. Duration: 9 hours. Difficulty: Intermediate.
  • An Introduction to Practical Deep Learning. Offered by: Intel Software. Duration: 12 hours. Difficulty: Intermediate.
  • Natural Language Processing. Offered by: Higher School of Economics. Duration: 34 hours. Difficulty: Advanced.