A team of computer science researchers from Tsinghua and Beihand University in China, IDC Herzilya in Israel, and Harvard University have recently created a tool that generates edited videos based on a text description and a repository of video clips.
Massive amounts of video footage are recorded every day by professional videographers, hobbyists, and regular people. Yet editing this video down into a presentation that makes sense is still a costly time investment, often requiring the use of complex editing tools that can manipulate raw footage. The international team of researchers recently developed a tool that takes themed text descriptions and generates videos based on them. The tool is capable of examining video clips in a repository and selecting the clips that correspond with the input text describing the storyline. The goal is that the tool is user-friendly and powerful enough to produce quality videos without the need for extensive video editing skills or expensive video editing software.
While current video editing platforms require knowledge of video editing techniques, the tool created by the researchers lets novice video creates create compositions that tells stories in a more natural, intuitive fashion. “Write-A-Video”, as it is dubbed by its creators, lets users edit videos by just editing the text that accompanies the video. If a user deletes text, adds text, or moves sentences around, these changes will be reflected in the video. Corresponding shots will be cut or added as the user manipulates the text and the final resulting video tailored to the user’s description.
Ariel Shamir, the Dean of the Efi Arazi School of Computer Science at IDC Herzliya explained that the Write-A-Video tool lets the user interact with the video mainly through text, using natural language processing techniques to match video shots based on the provided semantic meaning. An optimization algorithm is then used to assemble the video by cutting and swapping shots. The tool allows users to experiment with different visual styles as well, tweaking how scenes are presented by using specific film idioms that will speed up or slow down the action, or make more/fewer cuts.
The program selects possible shots based on their aesthetic appeal. The program considers how shots are framed, focused, and light in order to determine the aesthetic appeal. The tool will select shots that are better focused, instead of blurry or unstable, and it will also prioritize shots that are well lit. According to the creators of Write-A-Video, the user can render the generated video at any point and preview it with a voice-over narration that describes the text used to select the clips.
According to the research team, their experiment demonstrated that digital techniques that combine aspects of computer vision and natural language processing can assist users in creative processes like the editing of videos.
“Our work demonstrates the potential of automatic visual-semantic matching in idiom-based computational editing, offering an intelligent way to make video creation more accessible to non-professionals,” explained Shamir to TechXplore.
The researchers tested their tool out on different video repositories combined with themed text documents. User studies and quantitative evaluation was performed to interpret the results of the experiment. The results of the user studies found that non-professionals could sometimes produce high quality edited videos using the tool faster than professionals using frame-based editing software could. As reported by TechXplore, the team will be presenting their work in a few days at the ACM SIGGRAPH Asia conference held in Australia. Other entities are also using AI to augment video editing. Adobe has also been working on its own AI-powered extensions for Premiere Pro, its editing platform. The tool helps people ensure that changes in aspect ratio don’t cut out important pieces of video.