A team of researchers at the University of Washington developed an artificial intelligence (AI) system called Audeo that can create audio from silent piano performances. During testing, music recognition apps like SoundHound correctly identified the music generated by Audeo about 86% of the time.
The research was presented at the NeurIPS 2020 conference on Dec. 8.
Senior author Eli Shlizerman is an assistant professor in the applied mathematics and electrical and computer engineering departments at the university.
“To create music that sounds like it could be played in a musical performance was previously believed to be impossible,” Shlizerman said. “An algorithm needs to figure out the cues, or ‘features,’ in the video frames that are related to generating music, and it needs to ‘imagine’ the sound that’s happening in between the video frames. It requires a system that is both precise and imaginative. The fact that we achieved music that sounded pretty good was a surprise.”
How Audeo Works
The Audeo system works by decoding a video and translating it into music. First, the AI detects which piano keys are pressed in each video frame, gradually building a diagram of the notes played over time. That diagram is then translated into a format a music synthesizer can recognize.
In the next step, the system cleans up this data and adds further information, such as how hard each key was pressed and how long it was held.
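The two steps above can be sketched in miniature. The snippet below is an illustrative assumption, not the authors' code: it takes per-frame key detections (step one) and turns them into note events with onsets and durations (part of the cleanup in step two). The function name and data layout are hypothetical.

```python
def frames_to_notes(frames, fps=25):
    """Convert per-frame key detections into note events.

    frames: list of sets, one per video frame, each holding the MIDI key
            numbers detected as pressed in that frame.
    Returns a sorted list of (key, onset_seconds, duration_seconds).
    """
    notes = []
    active = {}  # key -> frame index where its press was first seen
    for i, pressed in enumerate(frames):
        # A newly seen key starts a note.
        for key in pressed:
            if key not in active:
                active[key] = i
        # A key that is no longer pressed ends its note.
        for key in list(active):
            if key not in pressed:
                onset = active.pop(key)
                notes.append((key, onset / fps, (i - onset) / fps))
    # Close any notes still held at the end of the clip.
    for key, onset in active.items():
        notes.append((key, onset / fps, (len(frames) - onset) / fps))
    return sorted(notes, key=lambda n: (n[1], n[0]))
```

For example, a clip where middle C (MIDI key 60) is held for three frames and E (64) joins for the last two would yield two note events, each with its own onset and duration, ready to be handed to a synthesizer.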
“If we attempt to synthesize music from the first step alone, we would find the quality of the music to be unsatisfactory,” Shlizerman said. “The second step is like how a teacher goes over a student composer’s music and helps enhance it.”
The system was trained and tested using YouTube videos of pianist Paul Barton. The training set consisted of around 172,000 video frames of Barton playing music by classical composers such as Mozart; Audeo was then tested on about 19,000 frames of Barton playing different pieces.
After being trained, Audeo generates a transcript of the music, which is then fed to a synthesizer to translate it into sound. The music sounds different depending on each synthesizer, which is the equivalent of changing the instrument setting on an electric keyboard.
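To make the synthesizer step concrete, here is a deliberately minimal sketch of what "translating a transcript into sound" means: each note event becomes a sine tone at its MIDI pitch's frequency, mixed into a sample buffer. This is an assumption for illustration only; real synthesizers such as those the team used model instrument timbre far more richly.

```python
import math

def synthesize(notes, sample_rate=8000):
    """Render (midi_pitch, onset_s, duration_s) note events to audio samples.

    Returns a list of float samples; each note contributes a sine wave at
    the frequency of its MIDI pitch (A4 = MIDI 69 = 440 Hz).
    """
    end = max(onset + dur for _, onset, dur in notes)
    samples = [0.0] * int(end * sample_rate)
    for pitch, onset, dur in notes:
        freq = 440.0 * 2 ** ((pitch - 69) / 12)  # MIDI pitch -> Hz
        start = int(onset * sample_rate)
        for n in range(int(dur * sample_rate)):
            t = n / sample_rate  # time since note onset
            samples[start + n] += 0.3 * math.sin(2 * math.pi * freq * t)
    return samples
```

Swapping this toy sine generator for a different synthesis model changes the timbre of the same transcript, which is the "instrument setting" effect described above.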
The team used two separate synthesizers.
“Fluidsynth makes synthesizer piano sounds that we are familiar with. These are somewhat mechanical-sounding but pretty accurate,” Shlizerman said. “We also used PerfNet, a new AI synthesizer that generates richer and more expressive music. But it also generates more noise.”
“The goal of this study was to see if artificial intelligence could generate music that was played by a pianist in a video recording — though we were not aiming to replicate Paul Barton because he is such a virtuoso,” Shlizerman continued. “We hope that our study enables novel ways to interact with music. For example, one future application is that Audeo can be extended to a virtual piano with a camera recording just a person’s hands. Also, by placing a camera on top of a real piano, Audeo could potentially assist in new ways of teaching students how to play.”
Kun Su and Xiulong Liu, doctoral students in electrical and computer engineering, were co-authors of the paper.