When it comes to audio-visual editing, mankind has come a long way. From the first “talkie” (a film with sound) in 1927, through the days of manually splicing film, to the CG effects of today, it’s a trade that’s constantly being improved upon. With every new advancement, filmmakers can focus less on the technical hurdles of their medium and more on the stories they want to tell.
Last fall, Adobe sparked controversy at its annual MAX conference when a speaker demonstrated audio editing software that can alter and reconstruct speech. The software, called VoCo, ingests a 20-40 minute clip of a person’s speech and creates a transcript. The user can then alter the transcript by rearranging text or adding new words. By isolating individual syllables of the person’s speech, VoCo is able to reconstruct the audio recording so that it matches the text typed on screen.
The conference took place less than a week before the 2016 United States presidential election, when talk of fake news and unfair media practices was everywhere. The technology was criticized for its threat to security and its potential to further undermine trust in journalism. It certainly does raise legitimate concerns. We’ve seen the way in which Photoshop has altered people’s trust in images and the malicious way in which it’s been used online to shape stories about celebrities, politicians, news events and more. A “Photoshop for audio”, as some are calling it, could easily lend itself to propaganda and amplify the flood of “fake news” that plagues us online.
But when used responsibly, this technology could be revolutionary to the entertainment industry. From a documentary filmmaking perspective, it could be incredibly useful in providing context when interview subjects don’t speak in full sentences or don’t introduce the topic about which they are talking. If a news anchor makes a mistake about a statistic, name or location, this can be edited so that the information is correct and up to date when it is rebroadcast. Of course, this kind of audio editing only works when the person speaking isn’t on-camera. Luckily for us, the answer could be near.
Research teams at universities across the US, including Stanford, University of South Carolina and the University of Washington, have been researching artificial intelligence technologies for facial manipulation and have been able to apply these technologies to video. A team of computer scientists at the University of Washington has successfully generated video footage of Barack Obama giving a talk he actually did give, but in an entirely different context.
The team released a report in July explaining their methodology. They trained an AI neural network, a computer system modelled on the human brain and nervous system, on 17 hours of footage of the former president’s weekly addresses so that it could generate Obama’s unique mouth shapes based on the audio and video it ingested. They were then able to retime the target video, a clip from an entirely different location, time and context, so that Obama’s mouth movements and physical expressions matched the audio from the original clip.
It’s clear that the technology is far from perfect. Even an untrained eye can see the digitization of his mouth’s movements and catch when it occasionally slips out of sync. But give this technology five years and we could have something game-changing. Combined with software like VoCo, facial manipulation technology could become invaluable to video editing. If an ad agency produces a commercial featuring a well-known celebrity and wants to broadcast it globally, it could combine these tools to re-create the script in another language, then use facial manipulation to ensure the actor’s lips sync up with the audio. News and documentary filmmaking could see huge advantages when it comes to correcting misspeaks or fumbles in presentations and providing context in on-screen interviews.
That being said, the implications of these combined technologies are daunting from a journalistic point of view. In the wrong hands, they could destroy political careers, create propaganda and ruin journalistic integrity.
Artificial intelligence is and will continue to be a controversial topic. Where do we draw the line between productive innovation and dangerous technological power? If the technology is out there, surely filmmakers have the right to access it and use it to improve their craft. But as these powerful technologies arrive over the next decade, strict regulations need to be put in place to ensure the tools are used responsibly. From a techie point of view, the future looks pretty darn awesome.
To learn more about this technology, you can listen to Radiolab’s in-depth podcast on the topic.
Or visit http://www.graphics.stanford.edu/~niessner/thies2016face.html to read about Face2Face facial manipulation technology.