A team of researchers at the UW has developed an artificial intelligence (AI) system called Audeo that generates music from silent video recordings of piano performances.
The research was carried out by doctoral students Kun Su and Xiulong Liu and Eli Shlizerman, an assistant professor of applied mathematics and of electrical and computer engineering. The music identification software SoundHound recognized Audeo's generated audio samples 85.6% of the time, compared with a 92.6% recognition rate for the original audio.
To reproduce the audio at this level of precision, Audeo processes the visual input in three stages, as outlined in the paper.
First, a neural network processes several consecutive video frames to detect which keys are pressed in the middle frame, repeating this throughout the video.
The second stage corrects errors from the first stage and fills in additional details, such as the gradual decay of the sound produced when a key is held down for a long time.
“Music is much faster and finer than the visual input, which means that there are many details in between the frames that we need to guess,” Shlizerman said.
Once this representation of the audio is complete, musical instrument digital interface (MIDI) synthesizers convert the data into music.
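To make the three stages concrete, here is a minimal sketch of how such a pipeline might be structured in PyTorch. It is illustrative only: the module names Video2Roll and Roll2Midi, the layer sizes, and the GRU-based refinement are assumptions made here for clarity, not the authors' actual architecture.

```python
# Illustrative sketch of a three-stage video-to-music pipeline (not the Audeo code).
import torch
import torch.nn as nn

NUM_KEYS = 88  # standard piano keyboard

class Video2Roll(nn.Module):
    """Stage 1: predict which keys are pressed in the middle frame
    of a short stack of consecutive video frames."""
    def __init__(self, num_frames=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(num_frames, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.head = nn.Linear(64, NUM_KEYS)

    def forward(self, frames):  # frames: (batch, num_frames, height, width)
        # Returns per-key press probabilities for the middle frame.
        return torch.sigmoid(self.head(self.encoder(frames)))

class Roll2Midi(nn.Module):
    """Stage 2: smooth the frame-level key predictions over time and fill in
    details between frames (e.g. how a sustained note decays)."""
    def __init__(self):
        super().__init__()
        self.refine = nn.GRU(NUM_KEYS, 128, batch_first=True, bidirectional=True)
        self.out = nn.Linear(256, NUM_KEYS)

    def forward(self, roll):  # roll: (batch, time, NUM_KEYS)
        hidden, _ = self.refine(roll)
        return torch.sigmoid(self.out(hidden))

# Stage 3 (not shown): threshold the refined roll into note on/off events and
# pass them to an off-the-shelf MIDI synthesizer to render the audio.
```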
While Audeo was primarily tested on videos recorded by pianist Paul Barton, Shlizerman said the team plans to make the system work “in the wild,” meaning it would be adaptable to any other pianist and even other instruments. This requires training the system on a larger dataset.
Additionally, Shlizerman said the team hopes to make Audeo fast enough to be used in real time. He proposed the idea of a virtual piano, where someone without access to a piano could simulate the experience by using the technology to translate their hand movements into the sound they would produce on a real keyboard.
“We are not there yet, but I feel like this is really exciting; this is a new experience that we could have in the virtual world and a new way to interact with music,” Shlizerman said.
Currently, the AI-generated music sounds, as expected, more robotic than a human performance, but the technology's ability to capture the essence of the piece, the sequence of keys pressed, signifies a step forward in uniting the visual and audio streams, according to Shlizerman.
Shlizerman pointed out that professional film editors approach video and audio editing as separate processes, despite the two modalities being deeply interrelated. He suggested that AI technology could be used to generate music to accompany visual scenes, like adding soundtracks to a film.
Though his team is specifically focusing on synthesizing music, Shlizerman said that Audeo could inspire similar technologies for generating speech or other kinds of audio.
“The visual-audio space is quite new, in terms of computational capabilities,” Shlizerman said. “Some automated tools are appearing in both areas separately, but when you try to combine them, it's a much harder task … we're showing that it's possible, and the next step will be all of the cool applications that come out.”
Reach contributing writer Anna Wang at email@example.com. Twitter: @annaw_ng