VALL-E, the artificial intelligence that can imitate any voice after hearing three seconds of audio

As chatbots grow more sophisticated and the debate over artificial intelligence intensifies, Microsoft is developing its own technology. It is called VALL-E, and it can learn and imitate any voice using a three-second recording as a sample.

The company founded by Bill Gates is also working on projects to add ChatGPT, the chatbot developed by OpenAI, to its search engine and Office suite, according to various US media. Among other things, it would be integrated into Word, PowerPoint and Outlook. Microsoft also plans to use it in its Bing search engine and thus compete with Google.

What is VALL-E, Microsoft’s artificial intelligence

The Redmond tech giant presented its AI project VALL-E, a text-to-speech (TTS) language model capable of turning written text into synthesized speech.

“Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work,” explains the company on its website.
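The idea in that quote can be illustrated with a minimal toy sketch in Python. It is not Microsoft's implementation: the codebook size, the end-of-speech token, and the function names `toy_next_code_weights` and `synthesize_codes` are all assumptions made for illustration, and a seeded random scorer stands in for a trained network. What it shows is only the framing: speech is represented as a sequence of discrete codec codes, and each new code is sampled conditioned on the text, a short acoustic prompt, and the codes generated so far, instead of regressing a continuous signal.

```python
import random

CODEBOOK_SIZE = 8      # assumption: tiny toy codebook of discrete audio codes
EOS = CODEBOOK_SIZE    # hypothetical end-of-speech token id


def toy_next_code_weights(text_tokens, prompt_codes, generated):
    """Stand-in for a trained model: weights over the next discrete code.

    The weights depend on the whole context (text, acoustic prompt, codes
    generated so far), which is what makes this a conditional language model.
    """
    seed = 0
    for tok in list(text_tokens) + list(prompt_codes) + list(generated):
        seed = (seed * 31 + tok + 1) % (2 ** 32)
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(CODEBOOK_SIZE)]
    # Toy rule: ending the utterance becomes more likely as the output grows.
    weights.append(0.1 * len(generated))
    return weights


def synthesize_codes(text_tokens, prompt_codes, max_len=50, seed=0):
    """Autoregressively sample discrete codes, one token at a time."""
    sampler = random.Random(seed)
    generated = []
    while len(generated) < max_len:
        weights = toy_next_code_weights(text_tokens, prompt_codes, generated)
        token = sampler.choices(range(len(weights)), weights=weights, k=1)[0]
        if token == EOS:
            break
        generated.append(token)
    # A real system would pass these codes to a codec decoder to obtain audio.
    return generated


if __name__ == "__main__":
    text = [3, 1, 4]      # hypothetical text (phoneme) token ids
    prompt = [2, 7, 1]    # hypothetical codes from a short voice sample
    print(synthesize_codes(text, prompt))
```

Changing the `prompt` list changes the context the sampler sees at every step, which is the toy analogue of steering the output voice with a short recording.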

The model was prepared thoroughly. During pre-training, the TTS training data was scaled up to 60,000 hours of English speech, “hundreds of times larger than existing systems.”

The novelty of this technology developed by Microsoft is its in-context learning ability: from audio recordings of just three seconds, it is able to imitate the voices in those recordings.

“VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt,” they said.

The American multinational is very satisfied with the results, reporting that VALL-E “significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity”.

Its developers also point out that the samples suggest VALL-E could “preserve the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis”.

The news doesn’t end there: VALL-E was developed to work with “other generative AI models”, such as GPT-3. In the not too distant future, this could make it possible to integrate VALL-E into other technologies such as ChatGPT.

Like Microsoft, other industry giants have also entered the field of these smart technologies.

Researchers at Meta (Facebook) recently developed a program called Cicero, named after the Roman statesman.

The software was tested on Diplomacy, a board game that requires participants to show their negotiating skills.

“If you don’t speak like a real person — showing empathy, building relationships and speaking properly — you won’t be able to form alliances with other players,” the social media giant explained in a statement.

character.ai, a startup founded by former Google engineers, launched an experimental chatbot online in October that can take on any personality. Users create characters based on a short description and can then “chat” with a fake Sherlock Holmes, Socrates or even Donald Trump.

This degree of sophistication fascinates, but it also worries many observers, who fear that these technologies could be used to deceive humans, for example by spreading false information or by creating increasingly believable scams.

SL

Source: Clarin
