
VKontakte: Automatic Speech Recognition (ASR)

Product
Developer: VKontakte
System release date: 2022/09/22
Technology: Speech technology


2022: Announcement of speech recognition technology

Developers can now use VKontakte's speech-to-text technology in their indie projects free of charge. Alexander Tobol, CTO of VKontakte, announced this on September 22, 2022. Automatic Speech Recognition (ASR) can be integrated in a few clicks. According to the company, its neural networks handle audio with background noise, heavy slang and contractions well.

What can a neural network do?

According to the company, users can choose between two recognition models. The neutral model suits clearly articulated speech, such as in a TV show or an interview, while the spontaneous model handles everyday speech with slang and profanity. VKontakte's neural networks process files in a few seconds, can remove noise and pauses from the transcript, understand indistinct speech, and even recognize a standalone "b" sound.

The technology can be tried through a web interface on a dedicated page or integrated through the public VKontakte API. The developer portal offers a wide range of methods for building VKontakte widgets or using them in third-party projects. The solution is aimed at startups, indie projects, and personal pet projects for learning and self-development. The free version, which processes up to 100 minutes of audio per day, can be used for any purpose; for unlimited use, developers can submit an application by e-mail.
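As a rough illustration of API integration, the sketch below builds a request URL for a recognition call and checks a file against the 100-minutes-per-day free limit. It sends no network traffic. The method name `asr.process`, its `audio` and `model` parameters, and the API version string are assumptions to verify against the official VK API documentation; `build_process_url` and `fits_free_quota` are hypothetical helpers, not part of any VK SDK.

```python
from urllib.parse import urlencode

API_BASE = "https://api.vk.com/method/"
API_VERSION = "5.131"  # assumption: check the current version in the VK API docs
FREE_DAILY_LIMIT_MIN = 100  # free version: up to 100 minutes of audio per day
MODELS = {"neutral", "spontaneous"}  # the two models described in the article


def build_process_url(audio_json: str, model: str, token: str) -> str:
    """Build a GET URL for the assumed asr.process method (no request is sent)."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}")
    params = {
        "audio": audio_json,  # assumed: upload result returned by the API
        "model": model,
        "access_token": token,
        "v": API_VERSION,
    }
    return API_BASE + "asr.process?" + urlencode(params)


def fits_free_quota(used_min: float, new_file_min: float) -> bool:
    """Return True if processing this file keeps today's total within the free limit."""
    return used_min + new_file_min <= FREE_DAILY_LIMIT_MIN


if __name__ == "__main__":
    print(build_process_url('{"id": 1}', "spontaneous", "YOUR_TOKEN"))
    print(fits_free_quota(used_min=90, new_file_min=15))
```

Validating the model name locally catches typos before a request is made; a production client would also need to upload the file and poll for the result, steps the article does not describe.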

Every month, VKontakte users send more than 2 billion voice messages, which adds up to millions of hours of audio processed by our neural networks. The technology's uses are limited only by imagination: you can build a voice-controlled game or, with a chatbot, finally add speech recognition to a third-party messenger.

said Alexander Tobol, CTO of VKontakte

VKontakte uses ASR to transcribe voice messages, generate video subtitles, power personal recommendations, and more. Under the hood, the solution combines three neural networks: the first recognizes speech, the second selects the most suitable words, and the third inserts punctuation. The technology is designed to handle hundreds of millions of messages a day, varying in duration, quality and content. Each message is transcribed very quickly, about 1.5 seconds after it is sent.
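The three-network design can be pictured as a chain of stages, each consuming the previous stage's output. The sketch below is a toy model of that data flow only: the recognizer is passed in as a stub, word selection is a simple score lookup, and punctuation just capitalizes and adds a period. None of these stand-ins reflects VKontakte's actual models, which the article does not describe.

```python
from typing import Callable, Dict, List

# Candidate words for each position in the utterance (output of stage 1).
Hypotheses = List[List[str]]


def pick_words(hyps: Hypotheses, word_scores: Dict[str, float]) -> List[str]:
    """Stage 2 stand-in: choose the highest-scoring candidate at each position."""
    return [max(cands, key=lambda w: word_scores.get(w, 0.0)) for cands in hyps]


def punctuate(words: List[str]) -> str:
    """Stage 3 stand-in: capitalize the first letter and end with a period."""
    if not words:
        return ""
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


def transcribe(
    audio: bytes,
    recognize: Callable[[bytes], Hypotheses],
    word_scores: Dict[str, float],
) -> str:
    """Run the three stages in order: recognition -> word selection -> punctuation."""
    return punctuate(pick_words(recognize(audio), word_scores))


if __name__ == "__main__":
    # Hypothetical stage 1: a stub that ignores the audio and returns fixed candidates.
    fake_recognize = lambda audio: [["hello", "hollow"], ["world", "whirled"]]
    scores = {"hello": 0.9, "hollow": 0.1, "world": 0.8, "whirled": 0.2}
    print(transcribe(b"", fake_recognize, scores))  # Hello world.
```

Keeping each stage behind its own function boundary mirrors the separation the article describes: any one network could be swapped or retrained without changing the others.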