Impressive Artificial Intelligence program that recreates faces from audio

Speech2Face is a study showing that a short fragment of a person’s voice is enough to approximate what their face looks like

Technology continues to advance at a breakneck pace, drawing on diverse fields to investigate new capabilities and features. One of them is the ability to “reconstruct” a person’s face from a voice fragment.

The Speech2Face study, presented in 2019 at the Conference on Computer Vision and Pattern Recognition (CVPR), demonstrated that Artificial Intelligence (AI) can infer aspects of what a person looks like from short audio segments.

According to the paper, the goal of researchers Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, Michael Rubinstein, and Wojciech Matusik of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) is not to reconstruct people’s faces identically, but to create an image with physical characteristics related to the analyzed audio.

To accomplish this, they designed and trained a deep neural network on millions of YouTube videos of people talking. During training, the model learned to associate voices with faces, enabling it to generate images that capture physical characteristics of the speakers, such as age, gender, and ethnicity.

The training was self-supervised: it relied on the natural co-occurrence of faces and voices in Internet videos, without the need to explicitly model detailed physical characteristics of the face.
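To make the idea of learning from co-occurring faces and voices more concrete, here is a minimal toy sketch in Python. It is not the authors’ model: the real system is a deep neural network trained on video, while this example fits a small linear map on synthetic numbers. The feature sizes, the hidden relation inside `make_pairs`, and all function names are assumptions made up for illustration. The only point it shows is that the pairing itself (a voice and a face taken from the same clip) serves as the training signal, with no hand-written labels.

```python
import random

def make_pairs(n, seed=0):
    """Synthetic stand-in for (voice, face) feature pairs from the same clip."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        voice = [rng.uniform(-1, 1) for _ in range(3)]
        # Hypothetical hidden relation between voice and face features.
        # The learner never sees these coefficients, only the pairs.
        face = [0.8 * voice[0] - 0.5 * voice[2], 0.3 * voice[1]]
        pairs.append((voice, face))
    return pairs

def predict(weights, voice):
    """Map a voice feature vector to a predicted face feature vector."""
    return [sum(w * v for w, v in zip(row, voice)) for row in weights]

def mean_squared_error(weights, pairs):
    total = 0.0
    for voice, face in pairs:
        pred = predict(weights, voice)
        total += sum((p - f) ** 2 for p, f in zip(pred, face))
    return total / len(pairs)

def train(pairs, epochs=50, lr=0.1):
    """Stochastic gradient descent on the squared prediction error."""
    weights = [[0.0] * 3 for _ in range(2)]
    for _ in range(epochs):
        for voice, face in pairs:
            pred = predict(weights, voice)
            for i in range(2):
                err = pred[i] - face[i]
                for j in range(3):
                    weights[i][j] -= lr * err * voice[j]
    return weights

pairs = make_pairs(200)
initial_error = mean_squared_error([[0.0] * 3 for _ in range(2)], pairs)
weights = train(pairs)
final_error = mean_squared_error(weights, pairs)
print(f"error before training: {initial_error:.4f}")
print(f"error after training:  {final_error:.6f}")
```

In this toy setting the error drops because the learned weights approach the hidden relation used to generate the pairs. A real system would replace the linear map with a deep network and the synthetic vectors with features extracted from audio and video frames.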

“The correlations between faces and voices are revealed by our reconstructions, which were obtained directly from the audio. We numerically assess and quantify how closely our Speech2Face reconstructions from audio resemble real images of speakers’ faces,” the authors write.
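One common way to put a number on how closely a reconstruction resembles a real face is to compare feature vectors with cosine similarity. The short Python sketch below illustrates that general technique only. The article does not state which metric the study used, and the three vectors here are invented values, not real face features.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors, made up for illustration.
true_face = [1.0, 0.0, 1.0]
close_reconstruction = [0.9, 0.1, 1.1]
unrelated_face = [-1.0, 0.5, 0.0]

close_score = cosine_similarity(true_face, close_reconstruction)
unrelated_score = cosine_similarity(true_face, unrelated_face)
print(f"close reconstruction: {close_score:.3f}")
print(f"unrelated face:       {unrelated_score:.3f}")
```

A higher score means the two vectors point in a more similar direction, so a good reconstruction should score higher against the true face than an unrelated face does.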

They explain that, because the study touches on sensitive issues such as ethnicity and privacy, the reconstructions do not include identifying details of specific individuals, and that, like any other machine learning system, the model should improve over time as it is exposed to more data.

While the published results show that Speech2Face often matches faces to voices, it also has flaws: in some cases the reconstructed face did not match the speaker’s ethnicity, age, or gender.

The model is intended to present statistical correlations between facial features and voice. It should be noted that the AI was trained using YouTube videos, which are not a representative sample of the world’s population; for example, for some languages its output shows discrepancies that trace back to the training data.

In this regard, the study itself recommends at the end of its findings that those who decide to investigate and build on the system take into account a larger sample of people and voices, so that the machine learning model has a broader repertoire for matching and recreating expressions.

The program was also able to render the reconstructed faces as cartoons, which bear a striking resemblance to the speakers in the analyzed audio.

Because this technology could be used for malicious purposes, the reconstruction keeps only a general approximation of the person and does not produce their exact face, as that could be a privacy issue.

Nonetheless, I’ve been astounded by what technology can do with audio samples.

This post was last modified on February 22, 2023 7:13 pm

Geekybar

Linguist-translator by education. I have been working in the field of advertising journalism for over 10 years, and in journalism for over 7 years, half of them as an editor. My weakness is doing mini-investigations on new topics.
