Fake images, often produced using sophisticated software like Photoshop or the GIMP, were around long before so-called “fake news” became an issue. They are part and parcel of the Internet’s fast-moving creative culture, and a trap for anyone who passes on striking images without checking their provenance or plausibility. Until now, this kind of artful manipulation has been largely limited to the visual sphere. But a new generation of tools will soon allow entire voice patterns to be cloned from relatively small samples, with such fidelity that it can be hard to tell the results are fake. For example, in November last year, The Verge wrote about Adobe’s Project VoCo:
“When recording voiceovers, dialog, and narration, people would often like to change or insert a word or a few words due to either a mistake they made or simply because they would like to change part of the narrative,” reads an official Adobe statement. “We have developed a technology called Project VoCo in which you can simply type in the word or words that you would like to change or insert into the voiceover. The algorithm does the rest and makes it sound like the original speaker said those words.”
Since then, things have moved on apace. Last week, The Economist wrote about the French company CandyVoice:
Utter 160 or so French or English phrases into a phone app developed by CandyVoice, a new Parisian company, and the app’s software will reassemble tiny slices of those sounds to enunciate, in a plausible simulacrum of your own dulcet tones, whatever typed words it is subsequently fed. In effect, the app has cloned your voice.
The Montreal company Lyrebird has a page full of fascinating demos of its own voice cloning technology, which requires even less in the way of samples:
Lyrebird will offer an API to copy the voice of anyone. It will need as little as one minute of audio recording of a speaker to compute a unique key defining her/his voice. This key will then allow to generate anything from its corresponding voice. The API will be robust enough to learn from noisy recordings. The following sample illustrates this feature, the samples are not cherry-picked.
Please note that those are artificial voices and they do not convey the opinions of Donald Trump, Barack Obama and Hillary Clinton.
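The workflow Lyrebird describes (enroll a short audio sample to derive a unique “voice key”, then generate arbitrary speech from that key) can be sketched in miniature. The sketch below is purely illustrative and not Lyrebird’s actual API: `compute_voice_key` and `synthesize` are hypothetical names, and a hash plus a text label stand in for the speaker embedding and synthesized audio a real system would produce.

```python
import hashlib

def compute_voice_key(sample: bytes) -> str:
    """Toy stand-in for the 'unique key defining a voice'.

    Here it is just a hash of the enrollment audio; a real system
    would fit a speaker embedding from roughly a minute of speech.
    """
    return hashlib.sha256(sample).hexdigest()[:16]

def synthesize(voice_key: str, text: str) -> str:
    """Toy stand-in for generation: returns a label instead of audio."""
    return f"<audio voice={voice_key}: {text!r}>"

# Enroll once, then generate any text "in" that voice.
sample = b"one minute of enrollment audio"
key = compute_voice_key(sample)
print(synthesize(key, "Hello from a cloned voice"))
```

The point of the two-step design is that enrollment happens once per speaker, after which the compact key, not the original recording, is all the service needs to produce new utterances.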
As Techdirt readers will have spotted, this technical development raises big ethical questions, articulated here by Lyrebird:
Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries. Our technology questions the validity of such evidence as it allows to easily manipulate audio recordings. This could potentially have dangerous consequences such as misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else.
The Economist quantifies the problem. According to its article, voice-biometrics software similar to the kind many banks deploy to block unauthorized access to accounts was fooled 80% of the time in tests using the new technology. Humans did little better, spotting that a voice had been cloned only 50% of the time, which is no better than chance. And remember, these figures are for today’s technology. As algorithms improve and Moore’s Law kicks in, it’s not unreasonable to think that it will become almost impossible to tell by ear whether the voice you hear is the real thing or a version generated using the latest cloning technology.