Microsoft VALL-E

Product

Developers:	Microsoft
System premiere date:	January 2023
Industries:	Information Technology
Technology:	Speech technology

2023: Neural Network Announcement

On January 5, 2023, Microsoft introduced a new artificial intelligence (AI) model that converts text into speech while closely imitating the voice of a specific person. The project is named VALL-E.

Microsoft calls the solution a "neural codec language model." The model can recreate a person's voice from a speech sample only three seconds long. It imitates not only the voice itself but also the speaker's emotional tone.

Microsoft introduced an open neural network that can imitate a person's voice

The VALL-E neural network builds on EnCodec, a technology that Meta (recognized as an extremist organization; its activities are prohibited in the Russian Federation) presented in October 2022. Unlike other text-to-speech methods, which typically manipulate sound waveforms, VALL-E analyzes how a person sounds and breaks that information into discrete components called "tokens." The neural network then draws on what it learned during training to synthesize arbitrary phrases in that voice. It was trained on LibriLight, an audio library assembled by Meta that contains about 60,000 hours of English-language speech from more than 7,000 speakers, mostly taken from public-domain LibriVox audiobooks.
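The idea of breaking audio into tokens can be illustrated with a toy residual vector quantizer, the general family of techniques that neural audio codecs such as EnCodec belong to. The sketch below is not EnCodec's or VALL-E's actual code: the function names and the tiny hand-written codebooks are hypothetical, and the frame rate of 75 frames per second with 8 codebooks is an assumption taken from the configuration reported for VALL-E, not from this article.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(codebooks, vec):
    """Residual vector quantization: each stage quantizes what the previous stage left over."""
    tokens = []
    residual = list(vec)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen entry so the next stage only sees the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(codebooks, tokens):
    """Rebuild an approximation of the frame by summing the chosen entries."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, tokens):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def prompt_token_count(seconds, frame_rate_hz=75, n_codebooks=8):
    """Number of discrete tokens for a clip (assumed: 75 frames/s, 8 codebooks)."""
    return int(seconds * frame_rate_hz) * n_codebooks


# Hypothetical two-stage codebooks over 2-dimensional "frames".
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # stage 1: coarse shape
    [[0.0, 0.0], [0.5, -0.5]],  # stage 2: finer correction
]

tokens = rvq_encode(toy_codebooks, [1.4, 0.6])
print(tokens)                             # -> [1, 1]
print(rvq_decode(toy_codebooks, tokens))  # -> [1.5, 0.5]
print(prompt_token_count(3))              # -> 1800
```

Under these assumptions, a three-second voice sample corresponds to 1,800 tokens, which a language model can then handle much like a sequence of words.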

VALL-E is also reported to reproduce the acoustic environment of the original recording well. If the voice in the sample sounds as though the person is on the phone, the synthesized phrases will sound the same way. The neural network also mimics accents well, at least British and American ones and several European ones.

VALL-E can be used, for example, to simulate actors' voices or to build voice chatbots. On the other hand, such a neural network can become a powerful tool in the hands of attackers. Fraudsters, for example, could phone a person's relatives and imitate that person's speech using only a three-second recording of a conversation. Fake statements in the voices of politicians and other public figures could also be created.^[1]

Notes

1. VALL-E on GitHub

Source: https://tadviser.com/index.php/Product:Microsoft_VALL-E
