Impressive Artificial Intelligence program that recreates faces from audio

Speech2Face is a study showing that a short fragment of a person’s voice is enough to approximate what their face looks like.

Technology continues to advance at a breakneck pace, drawing on diverse fields to investigate new capabilities and features. One of them is the ability to “reconstruct” a person’s face from a voice fragment.

The Speech2Face study, which was presented in 2019 at the Conference on Computer Vision and Pattern Recognition (CVPR), demonstrated that Artificial Intelligence (AI) can estimate what a person looks like from short audio segments.

According to the paper, the goal of the MIT-led research team (Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, and Michael Rubinstein) is not to reconstruct people’s faces identically, but to create an image with physical characteristics related to the analyzed audio.

To accomplish this, they designed and trained a deep neural network on millions of YouTube videos of people talking. During training, the model learned to associate voices with faces, enabling it to generate images that share physical characteristics with the speakers, such as age, gender, and ethnicity.

The training was self-supervised: it relied on the natural co-occurrence of faces and voices in Internet videos, without the need to explicitly model detailed physical characteristics of the face.
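The idea of learning from faces and voices that appear together in the same video, with no manual labels, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the data is synthetic, the "voice encoder" is a simple linear map rather than a deep neural network, and the names `make_paired_data` and `train_voice_encoder` are hypothetical. It is not the method used in the study.

```python
import random

def make_paired_data(n_pairs=200, voice_dim=4, face_dim=2, seed=0):
    """Synthetic stand-in for (voice, face) feature pairs taken from the same clip."""
    rng = random.Random(seed)
    # Hidden voice-to-face correlation that the model has to discover.
    true_map = [[rng.gauss(0, 1) for _ in range(face_dim)] for _ in range(voice_dim)]
    voices, faces = [], []
    for _ in range(n_pairs):
        voice = [rng.gauss(0, 1) for _ in range(voice_dim)]
        face = [
            sum(voice[i] * true_map[i][k] for i in range(voice_dim)) + rng.gauss(0, 0.1)
            for k in range(face_dim)
        ]
        voices.append(voice)
        faces.append(face)
    return voices, faces

def predict(weights, voice):
    """Predict face features from voice features with a linear map."""
    return [
        sum(voice[i] * weights[i][k] for i in range(len(voice)))
        for k in range(len(weights[0]))
    ]

def mean_squared_error(weights, voices, faces):
    total = 0.0
    for voice, face in zip(voices, faces):
        predicted = predict(weights, voice)
        total += sum((p - f) ** 2 for p, f in zip(predicted, face))
    return total / (len(voices) * len(faces[0]))

def train_voice_encoder(voices, faces, lr=0.05, epochs=200):
    """Gradient descent: the paired face is the only training signal (no labels)."""
    voice_dim, face_dim = len(voices[0]), len(faces[0])
    weights = [[0.0] * face_dim for _ in range(voice_dim)]
    for _ in range(epochs):
        # Accumulate a step direction proportional to the squared-error gradient.
        grad = [[0.0] * face_dim for _ in range(voice_dim)]
        for voice, face in zip(voices, faces):
            predicted = predict(weights, voice)
            for i in range(voice_dim):
                for k in range(face_dim):
                    grad[i][k] += voice[i] * (predicted[k] - face[k]) / len(voices)
        for i in range(voice_dim):
            for k in range(face_dim):
                weights[i][k] -= lr * grad[i][k]
    return weights

voices, faces = make_paired_data()
untrained = [[0.0] * len(faces[0]) for _ in range(len(voices[0]))]
error_before = mean_squared_error(untrained, voices, faces)
weights = train_voice_encoder(voices, faces)
error_after = mean_squared_error(weights, voices, faces)
print(f"error before training: {error_before:.3f}, after: {error_after:.3f}")
```

The point of the sketch is only that the face paired with each voice serves as the training target, so nobody has to annotate age, gender, or any other attribute by hand.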

“The correlations between faces and voices are revealed by our reconstructions, which were obtained directly from the audio. We numerically assess and quantify how closely our Speech2Face reconstructions from audio resemble real images of speakers’ faces,” the authors write.
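One common way to put a number on how closely a reconstruction resembles a real face is to compare feature vectors extracted from both images, for example with cosine similarity. The sketch below assumes such feature vectors already exist; the vectors `true_face`, `reconstruction`, and `unrelated` are made-up values, and this article does not specify which metrics the study actually used.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical face feature vectors, invented for illustration only.
true_face = [0.9, 0.1, 0.4]
reconstruction = [0.8, 0.2, 0.5]
unrelated = [-0.5, 0.9, -0.2]

similarity_to_reconstruction = cosine_similarity(true_face, reconstruction)
similarity_to_unrelated = cosine_similarity(true_face, unrelated)
print(similarity_to_reconstruction, similarity_to_unrelated)
```

A reconstruction that captures the speaker's features should score closer to 1 against the real face than an unrelated face does.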

They explain that because the study touches on sensitive issues such as ethnicity and privacy, the reconstructions do not include specific identifying physical details, and that, like any other machine learning system, it will improve over time as each use adds to its library of knowledge.

While the published test results show that Speech2Face produces many close face-to-voice matches, it also has flaws: in some cases the reconstruction fails to match the ethnicity, age, or gender of the speaker in the voice sample.

The model is intended to capture statistical correlations between facial features and voice. It should be noted that the AI was trained on YouTube videos, which are not a representative sample of the world’s population; for example, for speakers of some languages, its results show discrepancies that reflect the training data.

In this regard, the study itself recommends at the end of its findings that those who decide to investigate and update the system use a larger sample of people and voices, so that the machine learning model has a broader repertoire for matching and recreating expressions.

The program was also able to render the reconstructed faces as cartoons, which bear a striking resemblance to the speakers in the analyzed audio.

Because this technology could be used for malicious purposes, the reconstruction captures only an approximate likeness of the person and does not provide a full, identifiable face, as this could be a privacy issue.

Nonetheless, I’ve been astounded by what technology can do with audio samples.

This post was last modified on February 22, 2023 7:13 pm

Geekybar

Linguist-translator by education. I have been working in the field of advertising journalism for over 10 years, and in journalism for over 7 years, half of them as an editor. My weakness is doing mini-investigations on new topics.
